Data Analysis: Analyzing, Visualizing & Understanding Data
Year 2: Terms I & II (1.0 credit units)

Overview
This module aims to build skills in multivariate regression analysis using a variety of modelling techniques: linear, limited dependent variable, panel, time-series and longitudinal models. Students will become proficient users of statistical software and will be able to identify, analyze and interpret regression output as well as present data visually. The module teaches skills that students can apply across a range of jobs in the public, private and third sectors. Emphasis is placed on using real-world data, 'hands-on' lab sessions, analysis, interpretation and visualisation.

Term 1

1. Review of linear regression
• Theory and practice of simple linear regression and how to interpret the output.
• The components of a simple linear regression model.

2. R – your key to the world!
• The key features of the R statistical programming environment.
• The benefits of scripting your analysis.

3. Multiple regression 1
• Theory and implementation of extending the simple linear model with multiple explanatory variables.
• The basic assumptions underlying the multiple linear regression model, and common problems such as collinearity, outliers/leverage and correlated residuals.

4. Multiple regression 2
• Extending the multiple regression model further by including explanatory dummy variables for nominal/ordinal categories.
• Interaction effects, where the effect of one term is modified according to the level of another term in the model.

5. Multiple regression 3
• Heteroscedasticity
• Feasible generalised least squares (FGLS)

6. Why spatial data are special
• The features of spatial data.
• Examining the importance of location in data analysis.

7. Data on the web – Panning for gold in the 21st century
• The possibilities and pitfalls of datastores
• APIs and web scraping

8. Data management (SQL, database queries)
• Processing and storing large datasets
• Database basics

9. Data visualization best practice
• What are the features of a good graphic?
• The power of maps

10. Sharing insights
• Communicating with a wider audience
• Explaining uncertainty

Term 2

11. Limited dependent variable models (1)
• The most important statistical theory underlying a broad range of models
• Maximum likelihood estimation

12. Limited dependent variable models (2)
• Logit/probit: models for binary dependent variables
• Models explaining individual choices

13. Simulating uncertainty
• Creating uncertainty measures for predictions
• Simulation-based solutions to complex uncertainty problems

14. Limited dependent variable models (3)
• Choice models when an individual faces several options (example: party choice)
• Multinomial logit

15. Repetition week

16. Multilevel modeling (1)
• Models accounting for complex data structure
• Partial pooling and the power of random effects

17. Multilevel modeling (2)
• Varying slopes
• Non-nested models
• Multilevel generalised linear models

18. Longitudinal & panel data methods (1)
• Assumptions and violations
• Correcting standard errors, controlling for autocorrelation

19. Longitudinal & panel data methods (2)
• Fixed effects vs random effects
• Dynamic models with lagged dependent variables

20. Revision
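
Illustration: interaction effects (session 4)
As a small illustration of the interaction effects covered in session 4, the sketch below estimates a model of the form y = b0 + b1*x + b2*d + b3*x*d, where d is a 0/1 dummy variable. It relies on the fact that, with a single dummy and a full interaction, the pooled least-squares coefficients match those from fitting a separate line within each group. The module itself works in R; Python is used here only to keep the example self-contained. The data and the function names (simple_ols, interaction_coefficients) are hypothetical and not taken from the module materials.

```python
from statistics import mean


def simple_ols(xs, ys):
    """Return (intercept, slope) of the least-squares line through (xs, ys)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def interaction_coefficients(x, d, y):
    """Coefficients of y = b0 + b1*x + b2*d + b3*x*d for a 0/1 dummy d.

    Fits one line per group, then expresses the group-1 line as a
    shift (b2) and a change in slope (b3) relative to group 0.
    """
    g0 = [(xi, yi) for xi, di, yi in zip(x, d, y) if di == 0]
    g1 = [(xi, yi) for xi, di, yi in zip(x, d, y) if di == 1]
    a0, s0 = simple_ols(*zip(*g0))
    a1, s1 = simple_ols(*zip(*g1))
    return {"b0": a0, "b1": s0, "b2": a1 - a0, "b3": s1 - s0}


# Hypothetical, noise-free data generated from y = 1 + 2x + 2d + 3xd.
x = [0, 1, 2, 3, 0, 1, 2, 3]
d = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1 + 2 * xi + 2 * di + 3 * xi * di for xi, di in zip(x, d)]

coef = interaction_coefficients(x, d, y)

# The effect of x on y depends on the level of d: b1 when d = 0, b1 + b3 when d = 1.
effect_when_d0 = coef["b1"]
effect_when_d1 = coef["b1"] + coef["b3"]
print(coef, effect_when_d0, effect_when_d1)
```

The interaction coefficient b3 is the difference between the two group slopes, which is the sense in which one term is modified according to the level of another.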