Information Theory, Inference and Learning AlgorithmsInformation theory and inference, often taught separately, are here united in one entertaining textbook. These topics lie at the heart of many exciting areas of contemporary science and engineering - communication, signal processing, data mining, machine learning, pattern recognition, computational neuroscience, bioinformatics, and cryptography. This textbook introduces theory in tandem with applications. Information theory is taught alongside practical communication systems, such as arithmetic coding for data compression and sparse-graph codes for error-correction. A toolbox of inference techniques, including message-passing algorithms, Monte Carlo methods, and variational approximations, are developed alongside applications of these tools to clustering, convolutional codes, independent component analysis, and neural networks. The final part of the book describes the state of the art in error-correcting codes, including low-density parity-check codes, turbo codes, and digital fountain codes -- the twenty-first century standards for satellite communications, disk drives, and data broadcast. Richly illustrated, filled with worked examples and over 400 exercises, some with detailed solutions, David MacKay's groundbreaking book is ideal for self-learning and for undergraduate or graduate courses. Interludes on crosswords, evolution, and sex provide entertainment along the way. In sum, this is a textbook on information, communication, and coding for a new generation of students, and an unparalleled entry point into these subjects for professionals in areas as diverse as computational biology, financial engineering, and machine learning. |
Contents
Introduction to Information Theory | 1 |
Probabilities and Inference | 3 |
Probability Entropy and Inference | 22 |
ful theoretical ideas of Shannon but also practical solutions to communica | 34 |
More about Inference | 48 |
4 | 66 |
6 | 93 |
Codes for Integers | 132 |
Introduction to Neural Networks | 468 |
The Single Neuron as a Classifier | 471 |
17 | 473 |
Capacity of a Single Neuron | 483 |
Learning as Inference | 492 |
19 | 496 |
Hopfield Networks | 505 |
Boltzmann Machines | 522 |
8 | 138 |
Further Exercises on Information Theory | 239 |
Clustering | 284 |
More about Inference | 292 |
I | 302 |
4 | 311 |
5 | 321 |
Stream Codes | 334 |
Codes for Integers | 340 |
Model Comparison and Occams Razor | 343 |
Monte Carlo Methods | 357 |
8 | 358 |
Efficient Monte Carlo Methods | 387 |
9 | 390 |
Ising Models | 400 |
10 | 401 |
Exact Monte Carlo Sampling | 413 |
Variational Methods | 422 |
11 | 423 |
Independent Component Analysis and Latent Variable Mod elling | 437 |
Random Inference Topics | 445 |
Decision Theory | 451 |
Bayesian Inference and Sampling Theory | 457 |
14 | 460 |
Neural networks | 467 |
Supervised Learning in Multilayer Networks | 527 |
45 | 534 |
Gaussian Processes | 535 |
Gaussian Processes | 548 |
Deconvolution | 549 |
Sparse Graph Codes | 555 |
LowDensity ParityCheck Codes | 556 |
LowDensity ParityCheck Codes | 557 |
LowDensity ParityCheck Codes | 568 |
Convolutional Codes and Turbo Codes | 574 |
Convolutional Codes and Turbo Codes | 578 |
RepeatAccumulate Codes | 582 |
50 | 584 |
50 Digital Fountain Codes | 588 |
Digital Fountain Codes | 589 |
Appendices | 597 |
A Notation | 598 |
B Some Physics | 601 |
Some Mathematics | 605 |
613 | |
617 | |
620 | |
626 | |
Other editions - View all
Common terms and phrases
approximation assume Bayesian binary symmetric channel blocklength capacity Chapter codeword compression compute covariance data set defined density eigenvalue encoding ensemble entropy equal equation error probability estimate evaluate example flipped Gaussian distribution Gaussian processes Gibbs sampling given gradient graph H₁ Hamiltonian Monte Carlo Hamming code Hopfield network importance sampling inference information content input integer Ising model iterations length likelihood linear codes log2 low-density parity-check codes Markov chain maximum mean Metropolis method Monte Carlo methods neural network neuron nodes objective function obtain Occam factor optimal output packets parameters parity parity-check matrix posterior distribution posterior probability predictions prior probability distribution probability of error random variable rule sequence shown in figure shows slice sampling Solution to exercise source bits space spin standard deviation string sum-product symbol syndrome temperature theorem theory trajectory trellis turbo code update variance variational free energy weight zero