Introduction to Statistical Modeling
Tuesday, January 12, 9:00 a.m. – noon and 1:30 p.m. – 4:30 p.m.
Room 2006, 2nd Floor, Moscone Center West
Wei Zhu, State University of New York at Stony Brook
Email: weizhu2000@gmail.com
Abstract
Statistics is a branch of the mathematical sciences that pertains to the collection and analysis of data.
The goal of statistical inference is to make a probabilistic statement about the underlying population
based on the given sample. A classical statistical model is usually one or a set of stochastic equations
(linear or nonlinear) linking the relevant variables observed. For example, one may wish to establish a
simple linear regression model predicting the height of a son based on the height of his father. To
estimate the regression line, one can simply employ the ordinary least squares (OLS) method developed
by Gauss and Legendre. However, when both variables are measured with random error, the OLS slope
estimate is biased toward zero and the method is no longer suitable. For instance, in gauging the
relationship between the concentrations of organic aerosols and anthropogenic carbon monoxide, we
found that both quantities, measured by a mass spectrometer and a UV fluorescence analyzer
respectively, contain measurement errors and possibly other variability due to air dynamics, so the
OLS method is inappropriate in this situation. What are the alternative modeling methods?
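To make the contrast concrete, here is a minimal sketch in Python (not taken from the workshop materials) that compares the OLS slope with two of the alternatives named in the course outline: orthogonal regression and geometric mean regression. The simulated data, the function names, and the assumption that both measurement errors have equal variance are illustrative choices only.

```python
import numpy as np


def _moments(x, y):
    """Sample variances of x and y and their sample covariance."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sxx = np.var(x, ddof=1)
    syy = np.var(y, ddof=1)
    sxy = np.cov(x, y, ddof=1)[0, 1]
    return sxx, syy, sxy


def ols_slope(x, y):
    """Ordinary least squares slope of y on x (treats x as error-free)."""
    sxx, _, sxy = _moments(x, y)
    return sxy / sxx


def orthogonal_slope(x, y):
    """Orthogonal regression slope, assuming equal error variances in x and y."""
    sxx, syy, sxy = _moments(x, y)
    return (syy - sxx + np.sqrt((syy - sxx) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)


def geometric_mean_slope(x, y):
    """Geometric mean regression slope: sign(sxy) * sd(y) / sd(x)."""
    sxx, syy, sxy = _moments(x, y)
    return np.sign(sxy) * np.sqrt(syy / sxx)


# Hypothetical data: a latent quantity with true slope 2, observed with
# independent unit-variance measurement error in BOTH variables.
rng = np.random.default_rng(0)
n = 5000
latent = rng.normal(0.0, 2.0, n)
x = latent + rng.normal(0.0, 1.0, n)
y = 1.0 + 2.0 * latent + rng.normal(0.0, 1.0, n)

print("OLS slope:            ", ols_slope(x, y))
print("Geometric mean slope: ", geometric_mean_slope(x, y))
print("Orthogonal slope:     ", orthogonal_slope(x, y))
```

Under these simulation settings the population values are 1.6 for OLS (the true slope 2 shrunk by the factor 4/5), about 1.84 for geometric mean regression, and 2 for orthogonal regression, so the printed estimates should fall near those numbers. Orthogonal regression recovers the true slope here only because the two error variances were set equal; with unequal error variances a different correction would be needed.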
In this one-day workshop, intended for those who wish to broaden their horizons (and, where pertinent,
their job prospects) by learning more statistics, we will present a summary of classical as well as modern statistical
modeling methods. Beginning with the simple linear regression introduced above, we will move on to the
generalized linear models, categorical data analysis, time series models, survival analysis, and structural
equation modeling (also called path analysis). We will discuss pertinent job markets and the
knowledge/skills necessary for those markets. We will conclude the workshop by introducing the
bootstrap resampling method and its role in modern statistical modeling and inference. A list of reference
books corresponding to these subjects will be provided for the interested audience.
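As a rough illustration of the bootstrap idea mentioned above, the following Python sketch computes a percentile bootstrap confidence interval for a simple regression slope by resampling (x, y) pairs with replacement. It is an assumption-laden example rather than the workshop's own procedure: the function name, the simulated data, and the choice of the percentile interval are all hypothetical.

```python
import numpy as np


def bootstrap_slope_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the OLS slope of y on x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    slopes = np.empty(n_boot)
    for b in range(n_boot):
        # Resample observation pairs with replacement.
        idx = rng.integers(0, n, size=n)
        # np.polyfit returns coefficients from highest degree down,
        # so element 0 of a degree-1 fit is the slope.
        slopes[b] = np.polyfit(x[idx], y[idx], 1)[0]
    lower, upper = np.quantile(slopes, [alpha / 2.0, 1.0 - alpha / 2.0])
    return lower, upper


# Hypothetical data: true slope 0.5 with unit-variance noise.
rng = np.random.default_rng(1)
x_obs = rng.normal(0.0, 1.0, 200)
y_obs = 0.5 * x_obs + rng.normal(0.0, 1.0, 200)

slope_hat = np.polyfit(x_obs, y_obs, 1)[0]
lower, upper = bootstrap_slope_ci(x_obs, y_obs)
print("OLS slope estimate:", slope_hat)
print("95% percentile bootstrap interval:", (lower, upper))
```

The appeal of the approach is that the interval comes from the empirical distribution of the re-estimated slopes, so no normality assumption on the errors is needed to compute it.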
The lecturer, Professor Wei Zhu, has a B.S. in mathematics and a Ph.D. in Biostatistics from UCLA. For
the past decade, she has applied statistics to a wide spectrum of problems including brain imaging
analysis, climate modeling, clinical trials, genetics and proteomics. She is an active educator whose
former doctoral students are currently employed in academia and in the pharmaceutical, Internet, and
financial industries. She is the director of the Data Management and Statistical Analysis Core of the Alzheimer's
Disease Research Center at New York University. She is also the director of the Bioinformatics
Laboratory at the SBU Center of Excellence in Wireless and Information Technology. She collaborates
closely with scientists from the Brookhaven National Laboratory, the Cold Spring Harbor Laboratory, and
the National Institutes of Health.
Course Outline
Morning: Linear Models and Structural Equation Modeling
1. Simple linear regression analysis
2. Multiple linear regression analysis
3. General linear model and the dummy variables
4. Errors-in-variables models – with focus on the mainstream non-parametric methods, the orthogonal
regression, and the geometric mean regression
5. Structural equation modeling
6. Statistics software – with focus on SAS
Afternoon: Generalized Linear Models, Survival Analysis, Time Series Models and Resampling
1. Generalized linear models – with focus on the logistic regression model
2. Survival models – with focus on the proportional hazards model and the accelerated
failure time model
3. Time series models – with focus on the autoregressive models and the moving average
models
4. Resampling – the Bootstrap and the Jackknife
5. Overview of major job markets
6. Questions and, hopefully, answers
References (selected):
1. A Agresti (2002). Categorical Data Analysis. Second Edition. Wiley-Interscience.
2. PD Allison (1995). Survival Analysis Using SAS: A Practical Guide. SAS Publishing.
3. GEP Box, GM Jenkins, and GC Reinsel (2008). Time Series Analysis: Forecasting and Control.
Fourth Edition. Wiley.
4. PJ Brockwell and RA Davis (2010). Introduction to Time Series and Forecasting. Second
Edition. Springer.
5. BM Byrne (2006). Structural Equation Modeling with EQS: Basic Concepts, Applications, and
Programming (Multivariate Applications).
6. BM Byrne (1998). Structural Equation Modeling with LISREL, PRELIS, and SIMPLIS:
Basic Concepts, Applications, and Programming (Multivariate Applications Book Series).
7. RP Cody and JK Smith (2005). Applied Statistics and the SAS Programming Language (5th
Edition). Prentice Hall.
8. Jan de Leeuw and Erik Meijer (2008). Handbook of Multilevel Analysis. Springer.
9. NR Draper and H Smith (1998). Applied Regression Analysis. Wiley-Interscience.
10. B Efron and RJ Tibshirani (1994). An Introduction to the Bootstrap. Chapman & Hall/CRC.
11. JD Kalbfleisch and RL Prentice (2002). The Statistical Analysis of Failure Time Data. Wiley-Interscience.
12. P McCullagh and JA Nelder (1998). Generalized Linear Models, Second Edition. Chapman &
Hall/CRC.
13. T Raykov and GA Marcoulides (2006). A First Course in Structural Equation Modeling,
Second Edition. Lawrence Erlbaum Associates.
14. GAF Seber and AJ Lee (2003). Linear Regression Analysis. Wiley-Interscience.
15. GAF Seber (2004). Multivariate Observations. Wiley-Interscience.
16. ME Stokes, CS Davis, and GG Koch (2009). Categorical Data Analysis Using the SAS System.
SAS Publishing.
17. X Zhou (2008). A Practical Guide To Quantitative Finance Interviews. CreateSpace.