Uploaded by cmitty1

Machine Learning Study Material

Review Material
Table of Contents
Online Courses
Machine Learning
Python
R
caret package
Tableau
Excel
Optimization
Statistics
Law of Large Numbers (LLN)
Central Limit Theorem (CLT)
Classifiers
k-NN
Performance Metrics for Classifiers
Cohen’s Kappa
Online Courses
Machine Learning
Dive into Deep Learning: https://d2l.ai/chapter_introduction/index.html (also see pdf in ‘books’)
ML Crash Course, Google: https://developers.google.com/machine-learning/crash-course/ml-intro
AI Education: https://ai.google/education/
Python
Basics: https://www.w3schools.com/python/
Python for Data Science, IBM: https://www.coursera.org/learn/python-project-for-data-science#syllabus
R
Data Analysis with R Programming, Google: https://www.coursera.org/learn/data-analysis-r
Data Visualization & Dashboarding with R Specialization, Johns Hopkins:
https://www.coursera.org/specializations/jhu-data-visualization-dashboarding-with-r
Build Data Analysis and Transformation Skills in R using DPLYR, Coursera:
https://www.coursera.org/projects/dplyr
Univ. of Cincinnati Business Analytics Guide: http://uc-r.github.io/page7/ (expand the side menu for more)
caret package
Documentation: https://topepo.github.io/caret/index.html
DataCamp course: https://www.datacamp.com/courses/machine-learning-with-caret-in-r
Webinar by Max Kuhn: https://www.youtube.com/watch?v=7Jbb2ItbTC4
Tableau
Tableau Public samples from UVU: https://public.tableau.com/app/profile/tauna.walrath
Excel
Excel Basics for Data Analysis, IBM: https://www.coursera.org/learn/excel-basics-data-analysis-ibm/home/welcome
Excel Training: https://support.microsoft.com/en-us/office/excel-2013-training-aaae974d-3f47-41d9-895e-97a71c2e8a4a
Optimization
MIT Open Course: https://openlearninglibrary.mit.edu/courses/course-v1:MITx+15.053x+3T2016/about
Statistics
Law of Large Numbers (LLN)
If you sample a random variable independently a large number of times, the sample average converges to the
random variable’s expected value: as n → ∞, x̄ → μ.
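The LLN can be checked with a quick simulation. The sketch below is only an illustration under stated assumptions: it uses a fair six-sided die (expected value 3.5), and the function name mean_of_die_rolls and the seed are hypothetical choices made for this example.

```python
import random

def mean_of_die_rolls(n_rolls, seed=0):
    """Average of n_rolls independent rolls of a fair six-sided die."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0
    for _ in range(n_rolls):
        total += rng.randint(1, 6)  # each face 1..6 is equally likely
    return total / n_rolls

# The sample mean should drift toward mu = 3.5 as n grows.
for n in (10, 1_000, 100_000):
    print(n, mean_of_die_rolls(n))
```

With small n the average can sit noticeably away from 3.5; with large n it should settle very close to it.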
Central Limit Theorem (CLT)
States that the distribution of sample means approaches a normal distribution as the sample size gets larger,
regardless of the shape of the population’s distribution (provided the population has finite variance).
The difference between the LLN and the CLT is that the LLN says where a single sample mean converges (to μ),
whereas the CLT describes the shape of the distribution of the sample means around μ.
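The CLT can also be seen by simulation. The following is a minimal sketch, assuming an Exponential(1) population (mean 1, standard deviation 1, clearly not normal); the sample size of 50, the number of samples, and the helper names sample_means and share_within_one_se are hypothetical choices for this example.

```python
import math
import random
import statistics

def sample_means(sample_size, n_samples=5_000, seed=0):
    """Draw n_samples samples from Exponential(1) and return each sample's mean."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]

def share_within_one_se(means, mu, se):
    """Fraction of sample means that fall within one standard error of mu."""
    return sum(abs(m - mu) <= se for m in means) / len(means)

n = 50
means = sample_means(n)
se = 1.0 / math.sqrt(n)  # standard error = population sd / sqrt(n), with sd = 1

print("mean of sample means:", statistics.fmean(means))  # should be close to 1
print("sd of sample means:  ", statistics.stdev(means))  # should be close to 1/sqrt(50)
print("share within 1 SE:   ", share_within_one_se(means, 1.0, se))  # about 0.68 if normal
```

If the sample means are roughly normal, about 68% of them should lie within one standard error of μ, even though the individual draws come from a skewed distribution.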
Classifiers
k-NN
k-NN with caret, yardstick in R: https://rpubs.com/jkylearmstrong/knn_w_caret
How-tos: https://stats.stackexchange.com/questions/21572/how-to-plot-decision-boundary-of-a-k-nearest-neighbor-classifier-from-elements-o/21602
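For quick review, the core idea of k-NN is to classify a new point by majority vote among its k closest training points. The sketch below is a bare-bones illustration in plain Python, not the caret workflow from the links above; it assumes Euclidean distance, and the function name knn_predict and the toy data are invented for this example.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    # Sort training indices by Euclidean distance to the query point.
    order = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): two well-separated clusters.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.5, 1.5)))
print(knn_predict(points, labels, (6.5, 6.5)))
```

An odd k with two classes avoids tied votes; in practice k is usually chosen by cross-validation.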
Performance Metrics for Classifiers
Cohen’s Kappa
Cohen’s kappa (κ) measures agreement between two raters (or between a classifier’s predictions and the true
labels) and works for both multi-class and imbalanced-class problems. It is generally thought to be a more
robust measure than simple percent agreement, as κ takes into account the possibility of the agreement
occurring by chance. The definition of κ is:
κ = (po − pe) / (1 − pe)
where po is the relative observed agreement among raters, and pe is the hypothetical probability of chance
agreement, calculated from the observed data as the probability of each rater randomly assigning each
category. If the raters are in complete agreement, then κ = 1. If there is no agreement among the raters
other than what would be expected by chance (as given by pe), κ = 0. The statistic can be negative, which
means the agreement between the two raters is worse than would be expected by chance.
An example:
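As a hypothetical illustration (the counts below are invented for this sketch, and the function name cohens_kappa is an assumption, not a library call), κ can be computed from a square agreement table in which rows are one rater's categories and columns are the other's. For the 2x2 table used here, po = 35/50 = 0.7 and pe = 0.5(0.6) + 0.5(0.4) = 0.5, so κ = (0.7 − 0.5) / (1 − 0.5) = 0.4.

```python
def cohens_kappa(table):
    """Cohen's kappa from a square table of counts (rows: rater A, columns: rater B)."""
    size = len(table)
    total = sum(sum(row) for row in table)

    # Observed agreement: share of items on the diagonal.
    p_o = sum(table[i][i] for i in range(size)) / total

    # Chance agreement: sum over categories of (row share) * (column share).
    row_share = [sum(row) / total for row in table]
    col_share = [sum(table[i][j] for i in range(size)) / total for j in range(size)]
    p_e = sum(r * c for r, c in zip(row_share, col_share))

    return (p_o - p_e) / (1 - p_e)

# Hypothetical counts: both raters say "yes" 20 times, both say "no" 15 times,
# and they disagree on the remaining 15 of 50 items.
table = [[20, 5],
         [10, 15]]
print(cohens_kappa(table))  # approximately 0.4
```

The same function handles more than two categories, since it only assumes a square table. It does not guard against pe = 1 (all items in a single category), where κ is undefined.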