Statistical Modelling

Higher Level Module H8
Module H8
Statistical Modelling
Synopsis
The formulation of statistical models, and their application to address particular types of
study objectives, forms the core component of standard approaches to data analysis. This
module will cover the basic concepts involved in fitting linear models appropriate for
explaining the variation in data described by a quantitative measurement variable.
The module will begin with simple linear regression ideas and then extend them to cover
situations involving several explanatory variables that may potentially contribute to
explaining the variability in a quantitative response of key interest to the study objectives.
Models that can be used to compare means across two or more populations will also be
introduced. An appreciation will be given of the way in which these models can be
extended to fit a general linear model that allows for both quantitative and categorical
explanatory variables.
Objectives
Successful students will be able to:

Explain the general principles underlying statistical modelling for a quantitative
response.

Fit models involving both quantitative and categorical explanatory variables.

Identify factors (quantitative and categorical) that can potentially influence a key
response of interest in a given practical situation by the application of modelling
techniques.

Interpret results from fitting linear models.

Report these results in ways that are accessible to a non-statistical audience.
SADC Course in Statistics
Module H8 – Page 1
Pre-requisites
Intermediate Level, Module H1 and Module H2, or equivalent. Familiarity with a statistics
software package that has facilities for statistical modelling is also required.
Contents
Session 1. Setting the scene
Objectives that can be addressed through the use of statistical models. Identifying response
and explanatory variables.
Session 2. Simple Linear Regression
Investigating the relationship between a quantitative response of interest (y), and a
quantitative explanatory variable (x), graphically, and through fitting a straight line.
Interpreting the fitted model.
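As an illustration of the straight-line fit covered in this session, the least-squares intercept and slope can be computed directly from the data. The sketch below uses Python with the standard library only; the function name fit_line and the data values are invented for this example and are not taken from the module materials.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sxx: corrected sum of squares of x; Sxy: corrected sum of cross-products.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(x, y)
print(f"fitted line: y = {intercept:.3f} + {slope:.3f} x")
```

The slope is read as the estimated change in y for a one-unit increase in x, and the intercept as the estimated value of y when x is zero (which is only meaningful if x = 0 lies within the range of the data).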
Session 3. Inferences about the regression line
Inferences concerning the slope of the regression line using a t-test or an F-test.
Introduction to analysis of variance ideas.
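For a single explanatory variable, the t-test and the F-test for the slope are equivalent: the F statistic from the analysis of variance table equals the square of the t statistic. The Python sketch below (standard library only) illustrates this on invented data; the function name slope_tests is hypothetical. It stops at the test statistics, since p-values need the t and F distributions, which a statistics package would supply.

```python
import math


def slope_tests(x, y):
    """Return (t, F) for testing a zero slope in simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    ss_reg = slope * sxy           # regression sum of squares, 1 d.f.
    ss_res = syy - ss_reg          # residual sum of squares, n - 2 d.f.
    ms_res = ss_res / (n - 2)      # residual mean square (estimate of variance)
    t_stat = slope / math.sqrt(ms_res / sxx)   # slope / standard error of slope
    f_stat = ss_reg / ms_res                   # regression MS / residual MS
    return t_stat, f_stat


# Hypothetical data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

t_stat, f_stat = slope_tests(x, y)
print(f"t = {t_stat:.3f}, t squared = {t_stat ** 2:.3f}, F = {f_stat:.3f}")
```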
Session 4. Correlation and the coefficient of determination
Linear correlation, its benefits and limitations. Examining how much of the variation in y
has been explained by x. Calculation and interpretation of the coefficient of determination
and its connection to linear correlation.
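The connection referred to here is that, in simple linear regression, the coefficient of determination equals the square of the linear correlation coefficient. The Python sketch below (standard library only) computes both quantities separately so that they can be compared; the function names and the data are invented for this example.

```python
import math


def pearson_r(x, y):
    """Linear (Pearson) correlation coefficient between x and y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def r_squared(x, y):
    """Proportion of the variation in y explained by the fitted line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual and total sums of squares.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
r2 = r_squared(x, y)
print(f"r = {r:.4f}, r squared = {r ** 2:.4f}, R-squared = {r2:.4f}")
```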
Session 5. Assumptions underlying regression analysis
Assumptions associated with inferences drawn from a regression line. Procedures for
checking assumptions via a residual analysis. Consequences of failure of assumptions.
Dealing with assumption failure.
Session 6. Multiple linear regression: Introduction
Extensions to more than one explanatory variable, i.e. dealing with multiple regression
problems. Interpretation of computer output.
Session 7. Multiple linear regression: Further issues and ANOVA results
Further examples to illustrate additional issues associated with a multiple regression
analysis. Sequential and adjusted sums of squares and their interpretation.
Session 8. Choosing the “best” model
Methods of variable selection. Dangers of using automatic procedures. Lurking variables.
Session 9. Predictions from the regression model
Using the regression line for predictions. Standard errors associated with different types of
predictions. Example of a real-life situation. Assessing the model. Dangers of
extrapolation.
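The outline does not name the types of prediction it has in mind. Assuming they are the two usually distinguished in simple linear regression, namely estimating the mean response at a given value x_0 and predicting a single new observation at x_0, the standard errors take the following textbook form. Here s is the residual standard deviation, n the number of observations, and S_xx the corrected sum of squares of x; this notation is introduced for the example.

```latex
\[
\operatorname{se}\bigl(\hat{\mu}_{0}\bigr)
  = s \sqrt{\frac{1}{n} + \frac{(x_{0} - \bar{x})^{2}}{S_{xx}}},
\qquad
\operatorname{se}\bigl(\hat{y}_{0,\text{new}}\bigr)
  = s \sqrt{1 + \frac{1}{n} + \frac{(x_{0} - \bar{x})^{2}}{S_{xx}}},
\qquad
S_{xx} = \sum_{i=1}^{n} (x_{i} - \bar{x})^{2}.
\]
```

Both standard errors grow as x_0 moves away from the mean of x, which is one way of seeing why extrapolation beyond the observed range is risky.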
Session 10. Revision of key regression ideas
Summary and revision of key ideas learnt in previous sessions with due emphasis on the
need to keep study objectives in mind.
Session 11. Analysis of variance (ANOVA) for comparing population means
ANOVA objectives and underlying principles. Interpretation and presentation of results.
Making simple comparisons.
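The underlying principle of one-way ANOVA is to split the total variation in the response into a between-groups part and a within-groups part, and to compare their mean squares. The Python sketch below (standard library only) carries out that arithmetic; the function name one_way_anova, the group labels and the data are invented for this example.

```python
def one_way_anova(groups):
    """Return (ss_between, ss_within, f_stat) for a dict of label -> values."""
    all_values = [v for values in groups.values() for v in values]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total

    ss_between = 0.0
    ss_within = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        # Variation of the group mean about the grand mean.
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        # Variation of observations about their own group mean.
        ss_within += sum((v - group_mean) ** 2 for v in values)

    ms_between = ss_between / (k - 1)        # k - 1 d.f.
    ms_within = ss_within / (n_total - k)    # N - k d.f.
    return ss_between, ss_within, ms_between / ms_within


# Hypothetical data for three populations, invented for illustration only.
groups = {
    "A": [4.0, 5.0, 6.0],
    "B": [7.0, 8.0, 9.0],
    "C": [10.0, 11.0, 12.0],
}

ss_between, ss_within, f_stat = one_way_anova(groups)
print(f"SS between = {ss_between}, SS within = {ss_within}, F = {f_stat}")
```

A large F value indicates that the group means differ by more than the within-group variation alone would suggest; the corresponding p-value would come from the F distribution in a statistics package.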
Session 12. A model for comparing means
A model for the analysis of variance with a single categorical factor. Understanding model
parameters and how they relate to comparisons between means. Interpreting computer
output. Study of residuals.
Session 13. Analysis of variance with two categorical factors
A model for the analysis of variance with two categorical factors. Understanding the
difference between raw means and adjusted means. Interpreting computer output.
Session 14. Comparing regressions
Understanding and interpreting components of a linear model with one quantitative
variable and one categorical factor. Writing the form of the regression equation for each
level of the categorical factor. Fitting an interaction term to decide whether parallel lines or
separate lines are more appropriate.
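One common way of writing the models compared in this session is sketched below. The notation is assumed for the example and may differ from that used in the module or in a given software package: i indexes the levels of the categorical factor, j the observations within a level, and the first level is taken as the reference level.

```latex
\[
\text{Separate lines:}\quad
y_{ij} = (\beta_{0} + \alpha_{i}) + (\beta_{1} + \gamma_{i})\, x_{ij} + \varepsilon_{ij},
\qquad \alpha_{1} = \gamma_{1} = 0,
\]
\[
\text{Parallel lines:}\quad
y_{ij} = (\beta_{0} + \alpha_{i}) + \beta_{1}\, x_{ij} + \varepsilon_{ij}
\qquad (\gamma_{i} = 0 \text{ for all } i).
\]
```

Under this parameterisation the line for level i has intercept beta_0 + alpha_i and slope beta_1 + gamma_i. The gamma_i terms form the interaction between the quantitative variable and the factor, so testing whether they are all zero is the test of parallel lines against separate lines.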
Session 15. Revision of ANOVA ideas
Summary and revision of ideas learnt in Sessions 11 to 14.
Sessions 16-19. Case Study work
Applying ideas learnt in the module to a specific case study scenario to answer questions
posed by objectives. Exploratory analysis, regression modelling, summarising key findings,
interpreting results and reporting. Preparing presentations on case study work.
Session 20. Extending modelling ideas
An appreciation of the concept of a general linear model in terms of the simple expression
data = pattern + residual. Revision of modelling ideas, and an appreciation of how the
model can be extended to non-normally distributed responses through Generalised Linear
Models. Participants’ presentations of case study findings and discussion.