The Mathematical Foundations of Machine Learning
Version 1
5/9/2020
This is a map of subjects and the corresponding textbooks that one should study in order to
have a very solid mathematical foundation for doing machine learning work. This is a guide for
learning the math that you will use, not for learning the machine learning algorithms themselves.
Obviously not all of this is necessary, and you can find work in the machine learning field without
knowing all of this. However, if your goal is to have a deep understanding, on both an applied
and theoretical level, of algorithms that are typically used in this field, then following this guide
will enable you to do so. None of the topics here cover the software aspect of machine learning.
There are many applied machine learning courses available online that cover that aspect of the
field. Websites like Coursera or Udemy are a good place to start.
The only prerequisite knowledge this guide will assume is a year of calculus. There are
many resources for learning calculus, so it will not be covered here; the book by Stewart or
Khan Academy is a perfectly fine way to learn the subject. My goal in writing this guide is to
provide someone who wants to read ESL (The Elements of Statistical Learning) and the Deep
Learning book with enough mathematical maturity to do so. Covering the required material will
put you in good shape for that. These subjects are difficult and require a serious level of
dedication. I’ve done my best to provide textbooks that have solutions available. I will not link
to the solution manuals; they can be found with a little bit of searching. In fact, all required
texts have solution manuals available aside from the linear models text by Christensen. Also, my
goal isn’t to provide resources that are free. I’ve picked what I believe are the best-quality
texts for each subject, the ones that will give you the deepest
understanding. With that said, it shouldn’t be hard to find electronic copies of them.
You will need to be very comfortable with proving things; this is non-negotiable. So step
0 would be to go through a book like How to Prove It by Velleman, or Discrete Mathematics
by Rosen. From there we start with linear algebra and introductory analysis. You need a
good grasp of these in order to understand statistics at the level that is required for machine
learning. It is impossible to understand things like linear models and the central limit theorem
without having a good grasp of linear algebra and analysis. For these topics I recommend Linear
Algebra by Friedberg, Insel, and Spence and Understanding Analysis by Abbott. These are
both great textbooks. The linear algebra one is great because it blends applied and theoretical
understanding, and Understanding Analysis helps build intuition for doing further work in
analysis. From here we move on to statistics, and then things branch out. As a machine learning
practitioner, having a working knowledge of probability can be very helpful, but a rigorous
understanding of probability cannot be had unless we first learn some measure theory. Thus
I’ve included that in our path as well. Below is the flowchart of subjects to study and their
corresponding texts. A blue node is a required subject/text and an orange node is an optional
subject/text.
Proofs: How to Prove It (Velleman)
Linear Algebra: Linear Algebra (Friedberg, Insel, Spence)
Optimization: Convex Optimization (Boyd, Vandenberghe)
Advanced Statistics: Statistical Inference (Casella, Berger)
Introductory Analysis: Understanding Analysis (Abbott)
Statistics: Introduction to Mathematical Statistics (Hogg, McKean, Craig)
Analysis: Principles of Mathematical Analysis (Rudin)
Functional Analysis: Introductory Functional Analysis With Applications (Kreyszig)
Introductory Linear Models: Applied Linear Statistical Models (Kutner et al.)
Linear Models: Plane Answers to Complex Questions (Christensen)
GLMs: Generalized, Linear, and Mixed Models (McCulloch, Searle, Neuhaus)
Advanced Linear Models: Advanced Linear Modeling (Christensen)
Further Advanced Statistics: Theoretical Statistics: Topics for a Core Course (Keener)
Asymptotics: Asymptotic Statistics (van der Vaart)
Topology: Topology (Munkres)
Measure Theory: Measures, Integrals, and Martingales (Schilling)
Introductory Probability: A First Look at Rigorous Probability Theory (Rosenthal)
Probability: Probability and Measure Theory (Ash, Doléans-Dade)
As stated before, if you are more comfortable with a standard discrete math text then you can
replace Velleman with the textbook by Rosen. That said, Velleman is great and has many solutions
in the back of the book. If you are finding linear algebra to be difficult then maybe backtrack a
bit and try working through Introduction to Linear Algebra by Strang (with its corresponding
MIT OpenCourseware videos) or Linear Algebra and Its Applications by Lay. For introductory
analysis, Abbott is as good as it gets. If you want more references though, Introduction to Real
Analysis by Bartle and Sherbert and The Way of Analysis by Strichartz are also good. The latter is
very wordy but the author focuses heavily on building intuition so it’s great if you’re not getting
that from Abbott’s text.
Once you’ve worked your way through linear algebra and analysis you should have enough
maturity to work through Hogg’s intro to statistics textbook. It is a great text when paired with
Casella and Berger. I learned from both of these texts and I still reference them from time to
time. If you need some supplemental texts to go along with Hogg try Mathematical Statistics
with Applications by Wackerly, Mendenhall, and Scheaffer, All of Statistics by Wasserman, and
Mathematical Statistics and Data Analysis by Rice. If you can get through the problems in Casella
and Berger (you really only need through chapter 10) then you are more than prepared for doing
work as a data scientist or machine learning engineer (in terms of probability and statistics).
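If you want a concrete picture of the central limit theorem before tackling its proof in these texts, a minimal simulation sketch can help (the Exp(1) choice, the sample sizes, and the seed below are all arbitrary):

```python
import numpy as np

# Standardized means of n Exp(1) draws (a heavily skewed distribution with
# mean 1 and variance 1) should look approximately N(0, 1) for large n.
rng = np.random.default_rng(0)
n, reps = 500, 20_000
samples = rng.exponential(scale=1.0, size=(reps, n))
z = (samples.mean(axis=1) - 1.0) * np.sqrt(n)  # sqrt(n) * (Xbar - mu) / sigma

# z should have roughly zero mean and unit variance despite the skewed input.
print(z.mean(), z.var())
```

Comparing a histogram of `z` against the standard normal density makes the convergence visible, and Hogg or Casella and Berger supply the proof.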
Applied Linear Statistical Models by Kutner et al. is a great text when it comes to learning
linear models. It is incredibly long, clocking in at a little over 1400 pages. However, you can
skip the second half of the book if you are pressed for time, because it covers basic design and
analysis of experiments (ANOVA and the like). Once you’ve finished that you can move on to the
more theoretical aspects of linear models, like distributions of quadratic forms. This is covered
in the book by Christensen. It’s a great textbook but unfortunately there is no solution manual
available. If you are self-studying and need to be able to check your solutions then I would
recommend replacing this text with Linear Models in Statistics by Rencher and Schaalje. The
solutions are in the back. There are also many supplemental texts here. Some notable ones are:
A Primer on Linear Models by Monahan, Linear Models by Searle (solutions are available on the
text’s website), and Linear Statistical Models by Stapleton (solutions are in the back of the book).
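As a small taste of what the theory is about, here is a numpy sketch of the two central objects: the least-squares estimate from the normal equations, and the residual sum of squares written as the quadratic form y'(I - H)y whose distribution the theoretical texts derive (the design matrix and coefficients below are made up for illustration):

```python
import numpy as np

# Toy regression data: intercept plus two standard normal predictors.
rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([2.0, -1.0, 0.5])
y = X @ beta + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # normal equations
resid = y - X @ beta_hat
rss = resid @ resid                            # residual sum of squares

H = X @ np.linalg.solve(X.T @ X, X.T)          # hat (projection) matrix
rss_quadratic = y @ (np.eye(n) - H) @ y        # the same RSS as a quadratic form
```

The two RSS computations agree exactly; the theory in Christensen tells you that, under normal errors, this quadratic form divided by the error variance is chi-squared with n - p degrees of freedom.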
Do not forget Convex Optimization! For a machine learning practitioner, knowing your optimization algorithms is incredibly important, and the text by Boyd and Vandenberghe is considered the bible. It can be a difficult text, though. You should have a very solid foundation in
linear algebra, calculus, introductory analysis, and even some topology when working your way
through it. Supplement with Munkres (the first few chapters on point-set topology) if needed,
since Abbott only covers the topology of R.
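As a glimpse of the kind of method the book analyzes, here is a gradient-descent sketch on a convex least-squares objective (the problem data, seed, and iteration count are arbitrary choices):

```python
import numpy as np

# Gradient descent on the convex quadratic f(x) = 0.5 * ||Ax - b||^2.
# A fixed step of 1/L, with L the largest eigenvalue of A'A (the Lipschitz
# constant of the gradient), is enough to guarantee convergence.
rng = np.random.default_rng(2)
A = rng.normal(size=(50, 5))
b = rng.normal(size=50)
L = np.linalg.eigvalsh(A.T @ A).max()

x = np.zeros(5)
for _ in range(5000):
    x = x - (1.0 / L) * (A.T @ (A @ x - b))    # step along the negative gradient

x_star = np.linalg.lstsq(A, b, rcond=None)[0]  # closed-form minimizer for comparison
```

Boyd and Vandenberghe make precise why this step size works and how fast the iterates converge, which is exactly the analysis that needs the linear algebra and analysis background above.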
Once you’ve covered all that you are in good shape! If you are interested in really understanding probability then you will need a much better understanding of analysis. To do this
you should start by working through the first 7 chapters of Rudin (ignore the rest, they’re not
great). From here you can skip to the introductory functional analysis text by Kreyszig if you want. This
is functional analysis without measure theory so you’re still taking some baby steps here. To do
proper functional analysis we need a working knowledge of measure theory so we have more
to work with than sequence spaces. After you’ve completed the book by Schilling you can look
into the functional analysis texts by Rudin, Conway, or even Stein and Shakarchi. To do proper
probability we need our measure theory. The text by Schilling is a great introduction, and the
author provides a full solution manual on his website. It’s a very thorough textbook with great
proofs. It covers a few bits and pieces of probability, but not enough for our purposes. If you
want a supplement here, or maybe an even gentler introduction, try Measure, Integration
& Real Analysis by Axler. From here we move on to the study of rigorous probability. Start
with the gentle introduction by Rosenthal. It’s a very short text but it has lots of great problems
to work through. The introductory chapter explaining why we need measure theory to properly
define how probability works gives good motivation. Finally, a full-blown probability textbook.
Probability and Measure Theory by Ash and Doléans-Dade was chosen over Probability and Measure by Billingsley because it covers
roughly the same material and has many solutions in the back of the text. Both are equally
good textbooks though, so they can be interchanged. If you’d like further references then the
texts by Chung, Resnick, Durrett, Athreya, and Pollard are good. If you still can’t get enough
probability then your next steps would be: Convergence of Probability Measures by Billingsley,
Real Analysis and Probability by Dudley, and Uniform Central Limit Theorems by Dudley.
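For a concrete anchor before diving into the measure theory, here is the strong law of large numbers, an almost-sure statement that needs measure theory even to formulate precisely, seen empirically (coin flips; the sample size and seed are arbitrary):

```python
import numpy as np

# Running means of fair coin flips settle toward 1/2. The SLLN says this
# happens with probability one; the simulation shows one such sample path.
rng = np.random.default_rng(3)
flips = rng.integers(0, 2, size=100_000)               # 0/1 fair coin flips
running_mean = np.cumsum(flips) / np.arange(1, flips.size + 1)

print(running_mean[-1])
```

What the measure-theoretic texts add is the precise meaning of "with probability one" over the space of infinite flip sequences, which elementary probability cannot express.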
Once we have a solid foundation of rigorous probability theory then we can work on even
more advanced statistics. The recommended text by Keener is a great book with some solutions
provided in the back. I think it is the text used for Stanford’s PhD level theoretical statistics class.
A good supplement here is the book Mathematical Statistics by Jun Shao (and its accompanying
solutions manual), and if you’re looking for a more Bayesian viewpoint then I would recommend
Theory of Statistics by Schervish. Finally, the most widely used text for asymptotic statistics is
the one by van der Vaart. A knowledge of measure theory might not be needed for this text but
it can never hurt. A supplementary text here would be Elements of Large-Sample Theory by
Lehmann.
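One of the basic tools in these asymptotics texts, the delta method, can be checked by simulation (the choice of g(x) = x squared with Exp(1) draws, and the sample sizes, are arbitrary):

```python
import numpy as np

# Delta method: sqrt(n) * (g(Xbar) - g(mu)) is approximately
# N(0, g'(mu)^2 * sigma^2). With g(x) = x^2 and Exp(1) draws
# (mu = sigma^2 = 1), the limiting variance is g'(1)^2 = 4.
rng = np.random.default_rng(6)
n, reps = 1000, 5000
xbar = rng.exponential(size=(reps, n)).mean(axis=1)
stat = np.sqrt(n) * (xbar ** 2 - 1.0)

# The empirical variance of the statistic should be close to 4.
print(stat.var())
```

van der Vaart's text proves this rigorously and shows when the first-order Taylor argument behind it breaks down.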
Lastly there is the topic of GLMs and advanced linear models. The mentioned text for
GLMs is good but it is heavily theoretical. If you’d prefer something more applied then look
into Foundations of Linear and Generalized Linear Models by Agresti and An Introduction to
Generalized Linear Models by Dobson and Barnett. The classic text Generalized Linear Models by
McCullagh and Nelder is recommended as well. If you are just interested in categorical data then
Categorical Data Analysis (not the introduction) by Agresti cannot be beat. Advanced Linear
Modeling by Christensen is really just a survey text covering a lot of advanced techniques, like
penalized estimation and reproducing kernel Hilbert spaces. If you are interested in any of the
specific topics covered in it then there are references provided at the end of each chapter.
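The standard fitting algorithm in these GLM texts, iteratively reweighted least squares (IRLS), fits in a few lines of numpy; the sketch below does logistic regression on toy data (the true coefficients are made up for illustration):

```python
import numpy as np

# Toy logistic regression data: intercept plus two normal predictors.
rng = np.random.default_rng(4)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([0.5, 1.0, -1.5])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))

beta = np.zeros(3)
for _ in range(25):
    mu = 1.0 / (1.0 + np.exp(-X @ beta))    # fitted probabilities
    W = mu * (1.0 - mu)                     # working weights
    z = X @ beta + (y - mu) / W             # working response
    # Weighted least squares step: solve (X'WX) beta = X'Wz.
    beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
```

Each iteration is just a weighted least-squares fit, which is why the linear models background above carries over so directly to GLMs.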
A few recommended textbooks that did not fit into the flowchart fall under econometrics and time
series analysis. For econometrics, Econometrics by Hayashi, Econometric Analysis by Greene,
Econometric Analysis of Cross Section and Panel Data by Wooldridge, and Econometric Theory
and Methods by Davidson and MacKinnon are great texts. For time series analysis I would
recommend Time Series Analysis by Hamilton (very dense) and Time Series Analysis and Its
Applications by Shumway and Stoffer.
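As a tiny taste of the subject, here is an AR(1) process simulated and its coefficient recovered by least squares (the value phi = 0.7, the sample size, and the seed are arbitrary):

```python
import numpy as np

# AR(1): x_t = phi * x_{t-1} + eps_t, the first model in any time series text.
rng = np.random.default_rng(5)
phi, n = 0.7, 5000
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal()

# Regress x_t on x_{t-1}; the slope estimates phi.
phi_hat = (x[:-1] @ x[1:]) / (x[:-1] @ x[:-1])
print(phi_hat)
```

Hamilton and Shumway and Stoffer cover when this estimator is consistent and what its asymptotic distribution looks like, which ties back to the asymptotics material above.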