Applied Statistics and Data Analysis
2007 – 2008 Edition

Teacher: Nicholas T. Longford
email: nick.longford@upf.edu

Overview

The course concentrates on statistical modelling of business and economic data. It covers topics from simple linear regression to generalized regression models, such as logit and Poisson regression, with emphasis on concepts and practice, keeping aside as much of the analytics and the theoretical background of the methods as possible. Special attention will be paid to statistical practice using real data sets and settings. The statistical software STATA will be used, with other software, such as R, as an alternative. To enhance the practical orientation, attention will be paid to training in clear, accurate and detailed reporting of a statistical analysis with a problem-solving orientation.

The final grade will be based on assignments (undertaken weekly), a project or directed reading (small groups of students), and a final exam. The final exam will take place in a computer lab; students will be evaluated on their application of sound statistical principles in practical settings using software and real data.

The principal reference for the course is Hamilton (1992).

Course Outline

- Basic data screening and univariate analysis: variation and distributions. Graphics for displaying and comparing distributions (histograms, stem-and-leaf plots, box plots, quantile-normal plots, etc.). Time series plots, tables and summary statistics, ANOVA and other methods for comparing distributions, missing-data issues. Software for statistical practice.
- Ordinary regression analysis: bivariate and multiple regression, selection of covariates, regression coefficients, t-test and confidence interval, goodness-of-fit measures, regression with categorical covariates, interactions.
- Regression diagnostics for the model assumptions and influential data points: correlation and scatterplot matrices, studentized residuals, outliers, influence analysis, residual plots, multicollinearity, departures from normality.
- Generalized least squares regression and extensions: heteroscedasticity, autocorrelation, measurement error, specification analysis, nonlinear regression.
- Logit and probit regression and other generalized linear models: motivation, estimation, hypothesis testing and confidence intervals, interpretation, influential observations, diagnostic graphs, conditional effects plot, probit regression, Poisson regression.
- Elements of time series analysis.
- Principal component analysis and factor analysis and their applications.
- Introduction to other methods for multivariate analysis.

References

Bowerman, B.L., and O'Connell, R.T. (2003). Business Statistics in Practice. McGraw-Hill, Irwin.
Greene, W.H. (1993). Econometric Analysis. Macmillan Publishing Company.
Hamilton, L.C. (1992). Regression with Graphics: A Second Course in Applied Statistics. Brooks/Cole Publishing Company.
Hamilton, L.C. (1998). Statistics with STATA 5. Duxbury Press.
Hutcheson, G., and Sofroniou, N. (1999). The Multivariate Social Scientist. Sage Publications.
Kutner, M.H., Nachtsheim, C.J., and Neter, J. (2003). Applied Linear Regression Models. Irwin.
Koop, G. (2000). Analysis of Economic Data. Wiley.

Data sets and other materials for the course will be available through ftp.