Introduction to Statistical Modeling

Tuesday, January 12, 9:00 a.m.–noon and 1:30 p.m.–4:30 p.m.
Room 2006, 2nd Floor, Moscone Center West

Wei Zhu, State University of New York at Stony Brook
Email: weizhu2000@gmail.com

Abstract

Statistics is a branch of the mathematical sciences that pertains to the collection and analysis of data. The goal of statistical inference is to make a probabilistic statement about the underlying population based on the given sample. A classical statistical model is usually one stochastic equation, or a set of them (linear or nonlinear), linking the relevant observed variables. For example, one may wish to establish a simple linear regression model predicting the height of a son based on the height of his father. To estimate the regression line, one can simply employ the ordinary least squares (OLS) method developed by Gauss and Legendre.

However, when randomness exists in both measurements, the OLS method is no longer suitable. For instance, in gauging the relationship between the concentrations of organic aerosols and anthropogenic carbon monoxide, we found that both quantities, measured by the mass spectrometer and the UV fluorescence analyzer respectively, contain measurement errors and possibly other variability due to air dynamics, so the OLS method is not appropriate in this situation. What are the alternative modeling methods?

In this one-day workshop, intended for those who wish to broaden their horizons (and, where pertinent, their job prospects) by learning more statistics, we will present a summary of classical as well as modern statistical modeling methods. Beginning with the simple linear regression introduced above, we will move on to generalized linear models, categorical data analysis, time series models, survival analysis, and structural equation modeling (also called path analysis). We will discuss pertinent job markets and the knowledge/skills necessary for those markets.
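As a rough illustration of the errors-in-both-variables problem described above, the following Python sketch simulates paired measurements in which both x and y carry random error, then compares the OLS slope with two alternatives named in the course outline: orthogonal regression and geometric mean regression. The sample size, the true slope of 2, the error standard deviations, and all function names are illustrative assumptions, not taken from the workshop materials or from the aerosol study. The orthogonal regression formula used here additionally assumes equal error variances in x and y.

```python
import math
import random


def centered_sums(x, y):
    """Return the centered sums (sxx, syy, sxy) for paired samples."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxx, syy, sxy


def ols_slope(x, y):
    """Ordinary least squares slope of y on x (treats x as error-free)."""
    sxx, _, sxy = centered_sums(x, y)
    return sxy / sxx


def orthogonal_slope(x, y):
    """Slope minimizing perpendicular distances to the line.

    Assumes equal error variances in x and y, and sxy != 0.
    """
    sxx, syy, sxy = centered_sums(x, y)
    d = syy - sxx
    return (d + math.sqrt(d * d + 4.0 * sxy * sxy)) / (2.0 * sxy)


def geometric_mean_slope(x, y):
    """Geometric mean regression slope: sign(sxy) * sqrt(syy / sxx)."""
    sxx, syy, sxy = centered_sums(x, y)
    return math.copysign(math.sqrt(syy / sxx), sxy)


def simulate(n=5000, true_slope=2.0, intercept=1.0, error_sd=1.0, seed=0):
    """Simulate noisy observations of a linear relation (hypothetical values)."""
    rng = random.Random(seed)
    x_obs, y_obs = [], []
    for _ in range(n):
        x_true = rng.gauss(0.0, 2.0)               # latent true x
        y_true = intercept + true_slope * x_true   # exact linear relation
        x_obs.append(x_true + rng.gauss(0.0, error_sd))  # error in x
        y_obs.append(y_true + rng.gauss(0.0, error_sd))  # error in y
    return x_obs, y_obs


if __name__ == "__main__":
    x, y = simulate()
    print("OLS slope:           ", round(ols_slope(x, y), 3))
    print("Orthogonal slope:    ", round(orthogonal_slope(x, y), 3))
    print("Geometric mean slope:", round(geometric_mean_slope(x, y), 3))
```

Under these assumptions, the OLS slope is pulled toward zero because the error in x is ignored, while the orthogonal regression slope stays close to the true value. The geometric mean slope is the geometric mean of the OLS slope of y on x and the reciprocal of the OLS slope of x on y, so it always lies between the two.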
We will conclude the workshop by introducing the bootstrap resampling method and its role in modern statistical modeling and inference. A list of reference books corresponding to these subjects will be provided for the interested audience.

The lecturer, Professor Wei Zhu, has a B.S. in mathematics and a Ph.D. in biostatistics from UCLA. For the past decade, she has applied statistics to a wide spectrum of problems including brain imaging analysis, climate modeling, clinical trials, genetics, and proteomics. She is an active educator whose former doctoral students are currently employed in academia and in the pharmaceutical, Internet, and financial industries. She is the director of the Data Management and Statistical Analysis Core of the Alzheimer's Disease Research Center at New York University. She is also the director of the Bioinformatics Laboratory at the SBU Center of Excellence in Wireless and Information Technology. She collaborates closely with scientists from the Brookhaven National Laboratory, the Cold Spring Harbor Laboratory, and the National Institutes of Health.

Course Outline

Morning: Linear Models and Structural Equation Modeling
1. Simple linear regression analysis
2. Multiple linear regression analysis
3. General linear model and dummy variables
4. Errors-in-variables models – with focus on the mainstream nonparametric methods, orthogonal regression, and geometric mean regression
5. Structural equation modeling
6. Statistics software – with focus on SAS

Afternoon: Generalized Linear Models, Survival Analysis, Time Series Models and Resampling
1. Generalized linear models – with focus on the logistic regression model
2. Survival models – with focus on the proportional hazards model and the accelerated failure time model
3. Time series models – with focus on autoregressive models and moving average models
4. Resampling – the bootstrap and the jackknife
5. Overview of the major job markets
6. Questions and, hopefully, answers

References (selected):
1. A Agresti (2002). Categorical Data Analysis. Second Edition. Wiley-Interscience.
2. PD Allison (1995). Survival Analysis Using SAS: A Practical Guide. SAS Publishing.
3. GEP Box, GM Jenkins, and GC Reinsel (2008). Time Series Analysis: Forecasting and Control. Fourth Edition. Wiley.
4. PJ Brockwell and RA Davis (2010). Introduction to Time Series and Forecasting. Second Edition. Springer.
5. BM Byrne (2006). Structural Equation Modeling With EQS: Basic Concepts, Applications, and Programming (Multivariate Applications).
6. BM Byrne (1998). Structural Equation Modeling With LISREL, PRELIS, and SIMPLIS: Basic Concepts, Applications, and Programming (Multivariate Applications Book Series).
7. RP Cody and JK Smith (2005). Applied Statistics and the SAS Programming Language. Fifth Edition. Prentice Hall.
8. J de Leeuw and E Meijer (2008). Handbook of Multilevel Analysis. Springer.
9. NR Draper and H Smith (1998). Applied Regression Analysis. Wiley-Interscience.
10. B Efron and RJ Tibshirani (1994). An Introduction to the Bootstrap. Chapman & Hall/CRC.
11. JD Kalbfleisch and RL Prentice (2002). The Statistical Analysis of Failure Time Data. Wiley-Interscience.
12. P McCullagh and JA Nelder (1998). Generalized Linear Models. Second Edition. Chapman & Hall/CRC.
13. T Raykov and GA Marcoulides (2006). A First Course in Structural Equation Modeling. Second Edition. Lawrence Erlbaum Associates.
14. GAF Seber and AJ Lee (2003). Linear Regression Analysis. Wiley-Interscience.
15. GAF Seber (2004). Multivariate Observations. Wiley-Interscience.
16. ME Stokes, CS Davis, and GG Koch (2009). Categorical Data Analysis Using the SAS System. SAS Publishing.
17. X Zhou (2008). A Practical Guide To Quantitative Finance Interviews. CreateSpace.