Review Material

Table of Contents

Online Courses
    Machine Learning
    Python
    R
        caret package
    Tableau
    Excel
    Optimization
Statistics
    Law of Large Numbers (LLN)
    Central Limit Theorem (CLT)
Classifiers
    k-NN
    Performance Metrics for Classifiers
        Cohen’s Kappa

Online Courses

Machine Learning
Dive into Deep Learning: https://d2l.ai/chapter_introduction/index.html (also see the PDF in ‘books’)
ML Crash Course, Google: https://developers.google.com/machine-learning/crash-course/ml-intro
AI Education: https://ai.google/education/

Python
Basics: https://www.w3schools.com/python/
Python for Data Science, IBM: https://www.coursera.org/learn/python-project-for-data-science#syllabus

R
Data Analysis with R Programming, Google: https://www.coursera.org/learn/data-analysis-r
Data Visualization & Dashboarding with R Specialization, Johns Hopkins: https://www.coursera.org/specializations/jhu-data-visualization-dashboarding-with-r
Build Data Analysis and Transformation Skills in R using DPLYR, Coursera: https://www.coursera.org/projects/dplyr
University of Cincinnati Business Analytics Guide: http://uc-r.github.io/page7/ (expand the side menu for more)

caret package
Documentation: https://topepo.github.io/caret/index.html
DataCamp course: https://www.datacamp.com/courses/machine-learning-with-caret-in-r
Webinar by Max Kuhn: https://www.youtube.com/watch?v=7Jbb2ItbTC4

Tableau
Tableau Public samples from UVU: https://public.tableau.com/app/profile/tauna.walrath

Excel
Excel Basics for Data Analysis, IBM: https://www.coursera.org/learn/excel-basics-data-analysis-ibm/home/welcome
Excel Training: https://support.microsoft.com/en-us/office/excel-2013-training-aaae974d-3f47-41d9-895e-97a71c2e8a4a

Optimization
MIT Open Course: https://openlearninglibrary.mit.edu/courses/course-v1:MITx+15.053x+3T2016/about

Statistics

Law of Large Numbers (LLN)
If you sample a random variable independently a large number of times, the sample average converges to the random variable’s expected value: as $n \to \infty$, $\bar{x} \to \mu$.

Central Limit Theorem (CLT)
The distribution of sample means approximates a normal distribution as the sample size gets larger, regardless of the shape of the population’s distribution (provided it has finite variance).

The difference between the two is that the LLN says where a single sample mean ends up (at $\mu$), whereas the CLT describes the distribution of the sample means.

Classifiers

k-NN
k-NN with caret and yardstick in R: https://rpubs.com/jkylearmstrong/knn_w_caret
How-to (plotting the decision boundary of a k-NN classifier): https://stats.stackexchange.com/questions/21572/how-to-plot-decision-boundary-of-a-k-nearest-neighbor-classifier-from-elements-o/21602

Performance Metrics for Classifiers

Cohen’s Kappa
Cohen’s kappa (κ) is a good measure for both multi-class and imbalanced-class problems. It is generally considered more robust than a simple percent-agreement calculation, because κ takes into account the possibility of the agreement occurring by chance.
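To make the chance-agreement point concrete, here is a minimal sketch with made-up labels. The data (95 negatives, 5 positives) and the always-negative "classifier" are assumptions for illustration only. A model that never predicts the minority class still scores a high percent agreement, which is the weakness a chance-corrected measure is meant to address.

```python
# Hypothetical, hand-made labels: 95 negatives (0) and 5 positives (1).
y_true = [0] * 95 + [1] * 5

# A trivial "classifier" that always predicts the majority class.
y_pred = [0] * len(y_true)

# Percent agreement (accuracy): fraction of positions where the labels match.
agreement = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall on the minority class: fraction of true positives that were found.
preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
minority_recall = sum(preds_for_positives) / len(preds_for_positives)

print(f"percent agreement: {agreement:.2f}")
print(f"minority recall:   {minority_recall:.2f}")
```

The agreement looks strong even though the model has learned nothing about the positive class.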
The definition of κ is:

$$\kappa = \frac{p_o - p_e}{1 - p_e}$$

where $p_o$ is the relative observed agreement among raters, and $p_e$ is the hypothetical probability of chance (random) agreement, using the observed data to calculate the probabilities of each observer randomly seeing each category. If the raters are in complete agreement, then κ = 1. If there is no agreement among the raters other than what would be expected by chance (as given by $p_e$), κ = 0. The statistic can be negative, which implies that there is no effective agreement between the two raters, or that the agreement is worse than random.

An example:
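As a hedged illustration of the formula κ = (p_o − p_e) / (1 − p_e), the sketch below computes κ from a confusion matrix, treating a classifier's predictions and the true labels as the two raters. The counts are hypothetical rather than taken from this document, and `cohen_kappa` is a helper name introduced here for the example.

```python
def cohen_kappa(matrix):
    """Cohen's kappa from a square confusion matrix.

    Rows are one rater's categories (e.g. true labels), columns are the
    other's (e.g. predictions). Undefined (division by zero) when p_e == 1.
    """
    k = len(matrix)
    total = sum(sum(row) for row in matrix)

    # Observed agreement p_o: share of items on the diagonal.
    p_o = sum(matrix[i][i] for i in range(k)) / total

    # Chance agreement p_e: for each category, multiply the two raters'
    # marginal proportions, then sum over categories.
    row_marg = [sum(matrix[i]) / total for i in range(k)]
    col_marg = [sum(matrix[i][j] for i in range(k)) / total for j in range(k)]
    p_e = sum(r * c for r, c in zip(row_marg, col_marg))

    return (p_o - p_e) / (1 - p_e)


# Hypothetical 2x2 confusion matrix (100 items): p_o = 0.85, p_e = 0.50.
confusion = [[45, 5],
             [10, 40]]
kappa = cohen_kappa(confusion)

# Hypothetical model that always predicts the first class on 95/5 data:
# percent agreement is 0.95, but so is chance agreement.
majority_only = [[95, 0],
                 [5, 0]]
kappa_majority = cohen_kappa(majority_only)

print(f"kappa (balanced example): {kappa:.2f}")
print(f"kappa (majority-only):    {kappa_majority:.2f}")
```

The second case shows why κ is preferred on imbalanced data: the observed agreement is high, yet it equals what chance alone would produce, so κ is zero.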