Higher Level Module H8
Module H8 Statistical Modelling

Synopsis
The formulation of statistical models, and their application to address particular types of study objectives, forms the core component of standard approaches to data analysis. This module will cover the basic concepts involved in fitting linear models that explain the variation in data described by a quantitative measurement variable. The module will begin with simple linear regression and then extend these ideas to situations in which several explanatory variables may contribute to explaining the variability in a quantitative response of key interest to the study objectives. Models for comparing means across two or more populations will also be introduced. Finally, students will gain an appreciation of how these models can be extended to a general linear model that allows for both quantitative and categorical explanatory variables.

Objectives
Successful students will be able to:
Explain the general principles underlying statistical modelling for a quantitative response.
Fit models involving both quantitative and categorical explanatory variables.
Apply modelling techniques to identify factors (quantitative and categorical) that can potentially influence a key response of interest in a given practical situation.
Interpret results from fitting linear models.
Report these results in ways that are accessible to a non-statistical audience.

SADC Course in Statistics Module H8 – Page 1

Pre-requisites
Intermediate Level, Module H1 and Module H2, or equivalent. Familiarity with a statistics software package that has facilities for statistical modelling will be needed.

Contents

Session 1.
Setting the scene
Objectives that can be addressed through the use of statistical models. Identifying response and explanatory variables.

Session 2.
Simple Linear Regression
Investigating the relationship between a quantitative response of interest (y) and a quantitative explanatory variable (x), graphically and by fitting a straight line. Interpreting the fitted model.

Session 3.
Inferences about the regression line
Inferences concerning the slope of the regression line using a t-test or an F-test. Introduction to analysis of variance ideas.

Session 4.
Correlation and the coefficient of determination
Linear correlation: its benefits and limitations. Examining how much of the variation in y is explained by x. Calculation and interpretation of the coefficient of determination and its connection to linear correlation.

Session 5.
Assumptions underlying regression analysis
Assumptions associated with inferences drawn from a regression line. Procedures for checking assumptions via a residual analysis. Consequences of failure of assumptions. Dealing with assumption failure.

Session 6.
Multiple linear regression: Introduction
Extensions to more than one explanatory variable, i.e. dealing with multiple regression problems. Interpretation of computer output.

Session 7.
Multiple linear regression: Further issues and anova results
Further examples to illustrate additional issues associated with a multiple regression analysis. Sequential and adjusted sums of squares and their interpretation.

Session 8.
Choosing the “best” model
Methods of variable selection. Dangers of using automatic procedures. Lurking variables.

Session 9.
Predictions from the regression model
Using the regression line for predictions. Standard errors associated with different types of predictions. Example of a real-life situation. Assessing the model. Dangers of extrapolation.

Session 10.
Revision of key regression ideas
Summary and revision of key ideas learnt in previous sessions, with due emphasis on the need to keep study objectives in mind.

Session 11.
Analysis of variance (ANOVA) for comparing population means
ANOVA objectives and underlying principles. Interpretation and presentation of results. Making simple comparisons.

Session 12.
A model for comparing means
A model for the analysis of variance with a single categorical factor. Understanding model parameters and how they relate to comparisons between means. Interpreting computer output. Study of residuals.

Session 13.
Analysis of variance with two categorical factors
A model for the analysis of variance with two categorical factors. Understanding the difference between raw means and adjusted means. Interpreting computer output.

Session 14.
Comparing regressions
Understanding and interpreting components of a linear model with one quantitative variable and one categorical factor. Writing the form of the regression equation for each level of the categorical factor. Fitting an interaction term to decide whether parallel lines or separate lines are more appropriate.

Session 15.
Revision of anova ideas
Summary and revision of ideas learnt in Sessions 11 to 14.

Sessions 16-19.
Case study work
Applying ideas learnt in the module to a specific case study scenario to answer questions posed by its objectives. Exploratory analysis, regression modelling, summarising key findings, interpreting results and reporting. Preparing presentations on case study work.

Session 20.
Extending modelling ideas
An appreciation of the concept of a general linear model in terms of the simple expression data = pattern + residual. Revision of modelling ideas, and an appreciation of how the model extends to non-normally distributed responses, namely Generalised Linear Models. Participants’ presentations of case study findings and discussion.
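Two ideas from the outline can be seen in a small least-squares fit: the expression data = pattern + residual (Session 20) and the connection between the coefficient of determination and linear correlation (Session 4). The following is a minimal sketch in plain Python using only the standard library. It is not part of the course materials. The function names and the x and y values are hypothetical and were invented purely for illustration. In practice, a statistics package with modelling facilities would be used.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b


def decompose(x, y):
    """Split each observation into a fitted value (pattern) and a residual."""
    a, b = fit_line(x, y)
    pattern = [a + b * xi for xi in x]
    residual = [yi - fi for yi, fi in zip(y, pattern)]
    return a, b, pattern, residual


def r_squared(y, residual):
    """Coefficient of determination: 1 - (residual SS) / (total SS)."""
    ybar = mean(y)
    ss_res = sum(e ** 2 for e in residual)
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def pearson_r(x, y):
    """Sample linear (Pearson) correlation between x and y."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical data, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # quantitative explanatory variable
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # quantitative response

a, b, pattern, residual = decompose(x, y)
print(f"fitted line: y = {a:.3f} + {b:.3f}x")
for yi, fi, ei in zip(y, pattern, residual):
    print(f"data {yi:5.2f} = pattern {fi:6.3f} + residual {ei:+.3f}")
print(f"R^2 = {r_squared(y, residual):.4f}, r^2 = {pearson_r(x, y) ** 2:.4f}")
```

The output shows three properties of the fit:

- Adding the pattern and residual for each observation reproduces the data value.
- The residuals sum to zero because the fitted line includes an intercept.
- The two printed values agree because, for a straight-line fit, the coefficient of determination equals the square of the linear correlation.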