Deep Learning

MIT Press, Nov 18, 2016 - Computers - 800 pages

An introduction to a broad range of topics in deep learning, covering mathematical and conceptual background, deep learning techniques used in industry, and research perspectives.

“Written by three experts in the field, Deep Learning is the only comprehensive book on the subject.”
—Elon Musk, cochair of OpenAI; cofounder and CEO of Tesla and SpaceX

Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning.
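To make the "many layers deep" idea concrete, here is a minimal sketch (not taken from the book) of a tiny feedforward network in Python with NumPy, in which each layer computes its features from the simpler features produced by the layer below. The layer sizes, random weights, and image analogy are arbitrary choices made only for illustration.

```python
# A minimal sketch of the "hierarchy of concepts" idea: a tiny feedforward
# network in which each layer builds its output from the simpler features
# computed by the layer below. Nothing here comes from the book's own code.
import numpy as np

rng = np.random.default_rng(0)

def layer(x, n_out):
    """One fully connected layer with a ReLU nonlinearity (random weights)."""
    w = rng.normal(scale=0.1, size=(x.shape[-1], n_out))
    b = np.zeros(n_out)
    return np.maximum(0.0, x @ w + b)  # ReLU keeps only positive activations

# Raw input, e.g. pixel intensities of a small image, flattened to a vector.
x = rng.normal(size=(1, 64))

h1 = layer(x, 32)   # simple features (edges, in the image analogy)
h2 = layer(h1, 16)  # combinations of simple features (corners, contours)
h3 = layer(h2, 8)   # more abstract concepts (object parts)

print(h3.shape)  # (1, 8) -- the deepest, most abstract representation
```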

The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models.

Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.


What people are saying

User Review

These are comments I've read in other reviews; however, I definitely agree with them.
I have a Bachelor of Science in Mechanical Engineering with a minor in Statistical Quality Control and a Master of Science in Sustainable Energy Technologies. I'm not bragging; I just want to make clear that I have a strong mathematics and statistics background.
After reading books like Introduction to Statistical Learning, Introduction to Machine Learning with Python and Python for Data Analysis, and taking Andrew Ng's machine learning and deep learning specializations on Coursera, I thought it was a good idea to have a textbook to follow up on what I learned from all the very valuable resources I mentioned.
Andrew interviewed Goodfellow and Bengio in his online courses, and given the size of their contributions to the deep learning community, I thought there were no better people to write a book about this incredibly influential field. Unfortunately, the result is a highly technical book in which even the introduction is hard to fully grasp. If you are not very familiar with matrix calculus, and in particular with matrix calculus notation, you're going to have a very hard time with this book.
Something I appreciated about the aforementioned books is that all of their authors found a way to explain, in a very down-to-earth manner (as down to earth as this very complex subject allows), what each algorithm is doing and how it searches for an optimal result. That doesn't happen in this book.
According to Goodfellow, this book is meant for undergrads and postgrads alike. Nevertheless, if, like me, you cannot read and fully understand a typical math Wikipedia article (the Gini coefficient, for example), you will probably find yourself even more confused than when you started.
I eventually gave up around page 150. I referred back to some isolated subjects from time to time (convolutional networks, sequence models, GANs, etc.), and sometimes it was useful. Most of the time it wasn't, though.
If you read the blurbs on the back cover, you'll notice glowing opinions from Geoffrey Hinton, Elon Musk and Yann LeCun. That may encourage you to buy it, but please take into account that these are not 'normal' human beings, and what may seem obvious to them could very well be quite complicated for an average mind, especially without a strong mathematical background.
Bottom line: if you are very well versed mathematically, this may be the best book available. Otherwise, you may be better off with online material such as Andrew Ng's excellent set of specializations.

User Review

Not well written

Contents

1 Introduction
1.1 Who Should Read This Book?
1.2 Historical Trends in Deep Learning

I Applied Math and Machine Learning Basics

2 Linear Algebra
2.2 Multiplying Matrices and Vectors
2.3 Identity and Inverse Matrices
2.4 Linear Dependence and Span
2.5 Norms
2.6 Special Kinds of Matrices and Vectors
2.7 Eigendecomposition
2.8 Singular Value Decomposition
2.9 The Moore-Penrose Pseudoinverse
2.10 The Trace Operator
2.11 The Determinant

3 Probability and Information Theory
3.1 Why Probability?
3.2 Random Variables
3.4 Marginal Probability
3.5 Conditional Probability
3.7 Independence and Conditional Independence
3.9 Common Probability Distributions
3.10 Useful Properties of Common Functions
3.11 Bayes' Rule
3.13 Information Theory
3.14 Structured Probabilistic Models

4 Numerical Computation
4.2 Poor Conditioning
4.4 Constrained Optimization
4.5 Linear Least Squares

5 Machine Learning Basics
5.1 Learning Algorithms
5.2 Capacity, Overfitting and Underfitting
5.3 Hyperparameters and Validation Sets
5.4 Estimators, Bias and Variance
5.5 Maximum Likelihood Estimation
5.6 Bayesian Statistics
5.7 Supervised Learning Algorithms
5.8 Unsupervised Learning Algorithms
5.9 Stochastic Gradient Descent
5.10 Building a Machine Learning Algorithm
5.11 Challenges Motivating Deep Learning

II Modern Practices

6 Deep Feedforward Networks
6.1 Learning XOR
6.2 Gradient-Based Learning
6.3 Hidden Units
6.4 Architecture Design
6.5 Back-Propagation and Other Differentiation Algorithms
6.6 Historical Notes

7 Regularization for Deep Learning
7.1 Parameter Norm Penalties
7.2 Norm Penalties as Constrained Optimization
7.3 Regularization and Under-Constrained Problems
7.4 Dataset Augmentation
7.5 Noise Robustness
7.6 Semi-Supervised Learning
7.7 Multitask Learning
7.8 Early Stopping
7.9 Parameter Tying and Parameter Sharing
7.10 Sparse Representations
7.11 Bagging and Other Ensemble Methods
7.12 Dropout
7.13 Adversarial Training
7.14 Tangent Distance, Tangent Prop and Manifold Tangent Classifier

8 Optimization for Training Deep Models
8.1 How Learning Differs from Pure Optimization
8.2 Challenges in Neural Network Optimization
8.3 Basic Algorithms
8.4 Parameter Initialization Strategies
8.5 Algorithms with Adaptive Learning Rates
8.6 Approximate Second-Order Methods
8.7 Optimization Strategies and Meta-Algorithms

9 Convolutional Networks
9.1 The Convolution Operation
9.2 Motivation
9.3 Pooling
9.4 Convolution and Pooling as an Infinitely Strong Prior
9.5 Variants of the Basic Convolution Function
9.6 Structured Outputs
9.7 Data Types
9.8 Efficient Convolution Algorithms
9.9 Random or Unsupervised Features
9.10 The Neuroscientific Basis for Convolutional Networks
9.11 Convolutional Networks and the History of Deep Learning

10 Recurrent and Recursive Nets
10.1 Unfolding Computational Graphs
10.2 Recurrent Neural Networks
10.3 Bidirectional RNNs
10.4 Encoder-Decoder Sequence-to-Sequence Architectures
10.5 Deep Recurrent Networks
10.6 Recursive Neural Networks
10.7 The Challenge of Long-Term Dependencies
10.8 Echo State Networks
10.9 Leaky Units and Other Strategies for Multiple Time Scales
10.10 The Long Short-Term Memory and Other Gated RNNs
10.11 Optimization for Long-Term Dependencies
10.12 Explicit Memory

11 Practical Methodology
11.1 Performance Metrics
11.2 Default Baseline Models
11.3 Determining Whether to Gather More Data
11.4 Selecting Hyperparameters
11.5 Debugging Strategies
11.6 Multi-Digit Number Recognition

12 Applications
12.2 Computer Vision
12.3 Speech Recognition
12.4 Natural Language Processing
12.5 Other Applications

III Deep Learning Research

13 Linear Factor Models
13.1 Probabilistic PCA and Factor Analysis
13.2 Independent Component Analysis (ICA)
13.3 Slow Feature Analysis
13.4 Sparse Coding
13.5 Manifold Interpretation of PCA

14 Autoencoders
14.1 Undercomplete Autoencoders
14.2 Regularized Autoencoders
14.3 Representational Power, Layer Size and Depth
14.4 Stochastic Encoders and Decoders
14.5 Denoising Autoencoders
14.6 Learning Manifolds with Autoencoders
14.7 Contractive Autoencoders
14.8 Predictive Sparse Decomposition
14.9 Applications of Autoencoders

15 Representation Learning
15.1 Greedy Layer-Wise Unsupervised Pretraining
15.2 Transfer Learning and Domain Adaptation
15.3 Semi-Supervised Disentangling of Causal Factors
15.4 Distributed Representation
15.5 Exponential Gains from Depth
15.6 Providing Clues to Discover Underlying Causes

16 Structured Probabilistic Models for Deep Learning
16.1 The Challenge of Unstructured Modeling
16.2 Using Graphs to Describe Model Structure
16.3 Sampling from Graphical Models
16.4 Advantages of Structured Modeling
16.6 Inference and Approximate Inference
16.7 The Deep Learning Approach to Structured Probabilistic Models

17 Monte Carlo Methods
17.2 Importance Sampling
17.3 Markov Chain Monte Carlo Methods
17.4 Gibbs Sampling
17.5 The Challenge of Mixing between Separated Modes

18 Confronting the Partition Function
18.1 The Log-Likelihood Gradient
18.2 Stochastic Maximum Likelihood and Contrastive Divergence
18.3 Pseudolikelihood
18.4 Score Matching and Ratio Matching
18.5 Denoising Score Matching
18.6 Noise-Contrastive Estimation
18.7 Estimating the Partition Function

19 Approximate Inference
19.1 Inference as Optimization
19.2 Expectation Maximization
19.3 MAP Inference and Sparse Coding
19.4 Variational Inference and Learning
19.5 Learned Approximate Inference

20 Deep Generative Models
20.2 Restricted Boltzmann Machines
20.3 Deep Belief Networks
20.4 Deep Boltzmann Machines
20.5 Boltzmann Machines for Real-Valued Data
20.6 Convolutional Boltzmann Machines
20.7 Boltzmann Machines for Structured or Sequential Outputs
20.8 Other Boltzmann Machines
20.9 Back-Propagation through Random Operations
20.10 Directed Generative Nets
20.11 Drawing Samples from Autoencoders
20.12 Generative Stochastic Networks
20.13 Other Generation Schemes
20.14 Evaluating Generative Models
20.15 Conclusion

Bibliography
Index



About the author (2016)

Ian Goodfellow is Research Scientist at OpenAI. Yoshua Bengio is Professor of Computer Science at the Université de Montréal. Aaron Courville is Assistant Professor of Computer Science at the Université de Montréal.
