STAT 5101 Foundations of Data Science

Instructor: Xinyuan Song
Office: LSB 114, 39437929, email: xysong@sta.cuhk.edu.hk

Teaching Assistant: Xiangnan Feng
Office: LSB G32, 39438527, email: fengxiangnan123@gmail.com

Assessment Scheme
  Exercise                20%
  Mid-term examination    30%   October 23, 2013, 7:00-9:00pm (No make-up examination)
  Final examination       50%   December 4, 2013, 7:00-9:00pm

Course Description
This course provides comprehensive coverage of the basic concepts of statistics. Topics include exploratory data analysis, statistical graphics, sampling variability, point and confidence interval estimation, hypothesis testing, and other selected topics. Two software packages, R and Microsoft Excel, will be introduced to describe and analyze data.

Learning Outcomes
After completing the course, students should be able to
  understand basic concepts in statistics;
  use various statistical methods and techniques to summarize, present, and analyze data;
  read statistical reports and recognize when the quantitative information presented is accurate or misleading;
  use computer software (R and Excel) to analyze data and draw conclusions.

Textbook and Reference Books
Textbook
  Levine, D. M., Stephan, D., Krehbiel, T. C. and Berenson, M. L. Statistics for Managers Using Microsoft Excel. 5th Edition, Pearson Prentice Hall, 2008.
Reference books
  1. Siegel, A. F. Practical Business Statistics. 5th Edition, McGraw-Hill, 2003.
  2. Agresti, A. and Franklin, C. Statistics: The Art and Science of Learning from Data. 2nd Edition, Pearson Prentice Hall, 2009.
  3. Fraenkel, J., Wallen, N. and Sawin, E. I. Visual Statistics.
  4. Any other introductory statistics textbook.

Organization of Textbook
Presenting and Describing Information
  Introduction and Data Collection (Chapter 1)
  Presenting Data in Tables and Charts (Chapter 2)
  Numerical Descriptive Measures (Chapter 3)
Drawing Conclusions About Populations Using Sample Information
  Basic Probability (Chapter 4)
  Some Important Discrete Probability Distributions (Chapter 5)
  The Normal Distribution and Other Continuous Distributions (Chapter 6)
  Sampling and Sampling Distributions (Chapter 7)
  Confidence Interval Estimation (Chapter 8)
  Hypothesis Testing (Chapters 9-12)
  Decision Making (Chapter 17)
Making Reliable Forecasts
  Simple Linear Regression (Chapter 13)
  Introduction to Multiple Regression (Chapter 14)
  Multiple Regression Model Building (Chapter 15)
  Time-Series Forecasting (Chapter 16)
Improving Business Processes
  Statistical Applications in Quality Management (Chapter 18)

Course Outline
  Chapter 1 Data Collection and Data Presentation
  Chapter 2 Numerical Descriptive Measures
  Chapter 3 Important Discrete Probability Distributions
  Chapter 4 Important Continuous Distributions
  Chapter 5 Sampling and Sampling Distributions
  Chapter 6 Confidence Interval Estimation
  Chapter 7 Hypothesis Testing: One Sample Tests
  Chapter 8 Two-Sample Tests
  Chapter 9 Chi-Square Tests and Nonparametric Tests
  Chapter 10* Selected topic

Chapter 1 Data Collection and Data Presentation
  Explain key definitions:
    Population vs. Sample
    Primary vs. Secondary Data
    Parameter vs. Statistic
    Descriptive vs. Inferential Statistics
  Describe key data collection methods
  Describe different sampling methods
    Probability Samples vs. Nonprobability Samples
  Identify types of data and levels of measurement
  Use graphical techniques to organize and present data
    ordered array
    stem-and-leaf display
    frequency distribution, polygon, and ogive
    scatter diagrams
    histogram
    bar charts, pie charts

Chapter 2 Numerical Descriptive Measures
  Mean, median, mode
  Range, variance, standard deviation, coefficient of variation
  Five-number summary
  Box-and-whisker plot
  Correlation coefficient

Chapter 3 Important Discrete Probability Distributions
  Define mean and standard deviation
  Explain covariance and its application in finance
  Binomial probability distribution
  Poisson probability distribution
  Hypergeometric probability distribution
  Negative binomial distribution, geometric distribution, multinomial distribution

Chapter 4 Important Continuous Distributions
  Continuous probability distribution
  Characteristics of the normal distribution
  Using a normal distribution table
  Evaluate the normality assumption
  Uniform and exponential distributions
  Gamma and Weibull distributions

Chapter 5 Sampling and Sampling Distributions
  Types of sampling methods
  Sampling distributions
  Sampling distribution of the mean
  Sampling distribution of the proportion
  Central Limit Theorem

Chapter 6 Confidence Interval Estimation
  Point estimate
  Confidence interval estimate
  Confidence interval for a population mean
  Confidence interval for a population proportion
  Determine the required sample size

Chapter 7 Hypothesis Testing: One Sample Tests
  Null and alternative hypotheses
  A decision rule for testing a hypothesis
  Hypothesis testing
  Type I and Type II errors

Chapter 8 Two-Sample Tests
  Test the difference between two independent population means
  Test two means from related samples
  Test the difference between two proportions
  F test for the difference between two variances

Chapter 9 Chi-Square Tests and Nonparametric Tests
  Chi-square test for the difference between two proportions
  Chi-square test for differences in more than two proportions
  Chi-square test for independence
  The Wilcoxon rank sum test for two population medians
  The Kruskal-Wallis H-test for multiple population medians
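
As a small illustration of the chi-square test for independence listed under Chapter 9, the following is a minimal sketch, not part of the course materials. It is written in Python rather than the course software (R and Excel), which is an assumption made for this example only. The 2x2 counts are hypothetical, the function name chi_square_statistic is invented for illustration, and no continuity correction is applied.

```python
# Hypothetical 2x2 contingency table (made-up counts, for illustration only).
# Rows: two groups; columns: outcome yes / no.
observed = [[30, 20],
            [20, 30]]


def chi_square_statistic(table):
    """Return (statistic, degrees of freedom) for a test of independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under independence: row total * column total / n
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


statistic, df = chi_square_statistic(observed)

# Upper 5% critical value of the chi-square distribution with 1 degree of freedom.
CRITICAL_VALUE_05_DF1 = 3.841
reject_null = statistic > CRITICAL_VALUE_05_DF1

print(f"chi-square = {statistic:.2f}, df = {df}, reject H0 at 5%: {reject_null}")
```

With these hypothetical counts every expected cell count is 25, so the statistic is 4.00 on 1 degree of freedom, which exceeds 3.841 and leads to rejecting independence at the 5% level.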