STAT 9073-01. Analysis of Sports Big Data
Spring 2023

Instructor: Sangwook Kang, Ph.D.
Associate Professor, Department of Applied Statistics
DWHMB 529, Phone: (02) 2123-2538; Email: kanggi1@yonsei.ac.kr

Class Hours: Wednesday 9:00 - 11:50am
Class Room: DWHMB 535
Office Hours: TBD

Textbook: Analyzing Baseball Data with R, 2nd Edition, by Max Marchi, Jim Albert, and Benjamin S. Baumer, CRC Press, 2018.

References:
(i) Curve Ball: Baseball, Statistics, and the Role of Chance in the Game, by Jim Albert and Jay Bennett, Copernicus, 2001.
(ii) Full House: The Spread of Excellence from Plato to Darwin, by Stephen J. Gould, Harmony, 1996.
(iii) Moneyball: The Art of Winning an Unfair Game, by Michael Lewis, W. W. Norton & Company, 2004.
(iv) Basketball Data Science: With Applications in R, by Paola Zuccolotto and Marica Manisera, Chapman and Hall/CRC, 2020.

Course Description:

Overview:
• Are you a fan of baseball? Do you like numbers? Then this course is for you. In this course, we talk about numbers, sports, and numbers in sports, especially baseball. Many people have already used numbers to answer their own questions of interest, and we will go over their work. Then it will be your turn to apply statistical concepts and techniques to answer your own questions of interest. For example, will the Doosan Bears be the champions this year? If time permits, we will also talk about basketball data analysis!
• This course is designed for graduate students in the Department of Statistics and Data Science. Other students may still take this course if certain requirements are met (see "Prerequisites" for details).
Topics:
• Introduction to baseball, Introduction to Sabermetrics, Exploring databases, Season-by-Season data, Game-by-Game data, Play-by-Play data and Pitch-by-Pitch data, Relation between runs and wins, Value of plays using run expectancy, Balls and strikes effects, Career trajectories, Computing park factors, Other topics in baseball data analysis, Some basketball data science, and Examples of data analysis in other sports

Prerequisites:
• A passion for sports and numbers, plus introductory statistics, is required. If you are not a fan of baseball (or other sports) and do not want to become one, but still want to take this course, you need to contact the instructor and discuss why you want to take it.

Software: R (and Python, if needed)

Course Materials: We use LearnUs. The course syllabus, lecture notes, homework assignments, and other related course materials will be posted on the course website at LearnUs. Lecture notes will be available before each class. Students are responsible for printing all required course materials.

Course Requirements for Grading Purposes:

Homework:
• There will be roughly bi-weekly homework assignments.
• All homework assignments are due at the beginning of the class period on their due date. Late submissions will not be accepted.
• Students are encouraged to work together (maybe online?) on homework, but copying someone else's work is always an academic honesty violation.

Exams:
• There will be one take-home exam.

Project:
• There will be one team project.
• The project guidelines will be posted later in the semester.
• The project report is due during the last class of the semester.
Topical Outline:

Date                                  Topic of Class
Week 1: Mar 2 - 8                     Introduction to baseball / Introduction to Sabermetrics
Week 2: Mar 9 - 15                    Exploring databases: Season-by-Season data, Game-by-Game data, Play-by-Play data and Pitch-by-Pitch data
Week 3: Mar 16 - 22                   Introduction to R / Graphics
Week 4: Mar 23 - 29                   Relation between runs and wins
Week 5: Mar 30 - Apr 5                Value of plays using run expectancy
Week 6: Apr 6 - 12                    Balls and strikes effects / Career trajectories
Week 7: Apr 13 - 19                   Exploring streaky performances / Computing park factors
Week 8*: Apr 20 - 26 (Midterm week)   Take-home examination
Week 9: Apr 27 - May 3                Extinction of 0.400 hitters in MLB
Week 10: May 4 - 10                   Other topics in baseball data analysis
Week 11: May 11 - 17                  Some basketball data science
Week 12: May 18 - 24                  Examples of data analysis in other sports
Week 13: May 25 - 31                  Students' presentations
Week 14: Jun 1 - 7                    Students' presentations
Week 15: Jun 8 - 14                   Final project due (*Jun 14)

*: The dates of the midterm and final exams will be determined by the school during the exam weeks.

Grading: The grades will be assigned as follows:

Homework Assignments                40%
Course Project                      35%   Due: Jun 14
Midterm Exam                        20%   Apr 20 - 26
Class Attendance & Participation     5%

The distribution of grades will be as follows:

Grade              A+ ∼ A−         B+ ∼ B−
Proportions (%)    60% + α1* %     35% + α2 %

*: α1 is a non-negative number.

To obtain a good grade, you need to successfully complete all assignments, the course project, and exams, show the effort you put into the class, and attend every lecture.

Make-Up Policy: You must contact the instructor in advance if you are unable to take an exam at its scheduled time. Arrangements may then be made for a make-up exam.

Attendance Policy: Attending all classes is highly recommended. Those who miss classes too often may not receive the full 5% credit.

General Disclaimers: The course syllabus is a general plan for the course; deviations announced to the class by the instructor may be necessary.