Higher Level Module H2

Module H2
Introduction to Basic Statistical Concepts

Synopsis

Except where a census is conducted, the production of all National Statistics involves collecting information from a carefully chosen and representative subset (sample) of a target population. Primary interest centres on learning about the characteristics of this population, using inferential methods based on characteristics of the sample. This module is concerned with basic statistical concepts that are central to making “sample to population” inferences in an effective way, while quantifying the extent to which the results reflect the true character of the population. The module will assume that simple random sampling has been used in the sample selection process, so that the key concepts can be understood without concerns about complexities that arise in real-life situations.

The initial focus will be on the production of sample-based estimates, together with the precision of such estimates and recognition of their importance, assuming that samples have been drawn from an infinite population. Procedures underlying the testing of a proposed hypothesis concerning the population will be discussed for simple scenarios, leading to the introduction of several standard statistical distributions (beyond those introduced in Module H1) and the recognition of errors that arise when statistical tests are conducted.

Although the contents of this module look traditional, there will be a problem-based approach to each session to provide the context in which the different statistical concepts become relevant. Throughout the module, illustrative examples and practical exercises will be included. The CAST software will be used to provide more practice and to check that the concepts covered have been well understood.
SADC Course in Statistics Module H2 – Page 1

Learning Objectives

Successful students will be able to:
- Explain the basic statistical concepts underlying the estimation of population characteristics from sample values.
- Calculate and correctly interpret standard errors and confidence intervals for the simple situations covered in the module, as well as explain why they are important and how they can be used in more general situations.
- Use the concepts above in testing hypotheses that compare two means or two proportions, for samples assumed to be drawn at random from infinite populations.
- Conduct tests for independence between two categorical variables, and state the assumptions underlying such tests and the actions to take when those assumptions fail.
- Discuss the use and limitations of significance probabilities when interpreting results from tests of hypotheses concerning population parameters.
- Appreciate what is meant by non-parametric tests and conduct such a test for the simple case corresponding to a one-sample t-test.

Pre-requisites: Students attending this module should be fully familiar with the contents covered in the Intermediate Level modules (Modules I1 to I4). Attendance at Module H1 (at a minimum, Sessions 1-10) is also compulsory before attending this module.

Contents

Session 1. Introduction and study objectives
Several problem scenarios are introduced to set the scene, to emphasise the need to be specific about the study objectives and to provide an overview of module contents. The session will also include a discussion of how the problem objectives can be translated into questions relating to the data.

Session 2. Samples and populations
Distinguishing between a sample and a population. Benefits and limitations of using a sample compared to a full census of the population. Finite and infinite populations. Simple random sampling ideas.

Session 3. Introduction to statistical inference
Addressing the question “What is statistical inference?” Introducing the concept of a sampling distribution and the meaning of a standard error. Finding the standard error of a sample mean.

Session 4. Revision using CAST
Revision of concepts covered in previous sessions and practice with sampling distributions.

Session 5. Meaning and use of confidence intervals
Assessing precision through the use of confidence intervals for a mean. Correct interpretation of a confidence interval. Revision of the normal distribution and introduction to the t-distribution.

Session 6. Further ideas concerning confidence intervals
Effect of sample size and variability on the width of the confidence interval. Confidence interval for a population proportion. Recognising and dealing with complications.

Session 7. Confidence intervals using CAST
Further practice using CAST to produce confidence intervals for both a population mean and a proportion.

Session 8. Basic principles of hypothesis tests
Formulating the null and alternative hypotheses for a range of problems. Different types of errors. Power of a test.

Session 9. Testing a hypothesis
A hypothesis test for a simple problem. Concept of comparing a sample estimate with a population value, relative to “noise” in the sample estimate. Introducing the concept of significance. Interpretation of the p-value and reporting results appropriately.

Session 10. Linking tests to confidence intervals (and other issues)
Seeing how a hypothesis can be tested via computation of an appropriate confidence interval. Test for a population proportion. General issues related to hypothesis tests, including one-sided and two-sided tests.

Session 11. Tests for variances
Hypothesis test concerning one variance. Extensions to comparing two variances. Introducing the chi-square test and the F-test. Further illustrative examples.

Session 12. Comparing means from independent samples
How to compare two population means given sample data. Assumptions. Appropriate ways to present and report test results. Statistical significance versus practical importance.

Session 13. Comparing means from paired samples
Distinguishing between paired and independent samples. Tests with paired samples. Recognising gains in precision with paired samples.

Session 14. Comparing two proportions
Using the normal approximation to compare two population proportions. Comparing two proportions using the chi-square test. Links between the normal approximation and the chi-square test.

Session 15. Comparing several proportions and testing for associations
Using chi-square tests for comparing several proportions. Examples. Formulating the hypotheses in terms of the study of association between two categorical variables. Extending to general r x c tables. Assumptions underlying the chi-square test and what to do when they fail, including the use of the continuity correction and of exact tests.

Session 16. Goodness-of-fit tests (and other issues)
Chi-square tests to study the goodness-of-fit of the distribution of a variable to an assumed probability distribution. This leads to a revision of the binomial and Poisson distributions. Use of standardised residuals. Presentation of results from a chi-square test.

Session 17. Revision on tests for means using CAST
Further practice on tests for means, including simulation exercises to further illustrate the exact meaning of a p-value. Distinguishing between a one-sample and a two-sample test. Distinguishing between paired and independent samples t-tests. Further illustrative examples.

Session 18. Revision on tests for proportions using CAST
Further practice on comparing proportions and the study of associations. The exact meaning of independence. Justification of the formula for the chi-square test statistic.
Simulations to describe the chi-square distribution.

Session 19. Introduction to Non-parametric methods
Meaning of non-parametric tests and when they might be used. Using the Sign Test, including its advantages and limitations. An appreciation of practical problems associated with non-parametric tests.

Session 20. Common Non-parametric tests for comparing two samples
Understand the logic of tests based on ranks. Interpret and understand the Wilcoxon signed rank test. Practical problems with non-parametric methods. Review with a reminder of errors associated with statistical tests.
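
Illustrative sketch. The calculation at the heart of Sessions 3 and 5 (the standard error of a sample mean and a confidence interval based on the t-distribution) can be sketched in a few lines of Python. This is not part of the module materials, which use CAST. The function name, the data values and the choice of Python are assumptions made purely for illustration, and the t critical value is taken from standard tables rather than computed.

```python
import math
import statistics


def mean_confidence_interval(sample, t_critical):
    """Return (mean, standard error, (lower, upper)) for a sample mean.

    t_critical is the two-sided critical value of the t-distribution with
    len(sample) - 1 degrees of freedom, looked up from tables by the caller.
    Assumes a simple random sample from an infinite population.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    # statistics.stdev uses the n - 1 divisor (sample standard deviation).
    std_error = statistics.stdev(sample) / math.sqrt(n)
    margin = t_critical * std_error
    return mean, std_error, (mean - margin, mean + margin)


# Hypothetical data: household sizes for 10 randomly sampled households.
household_sizes = [4, 6, 5, 3, 7, 5, 4, 6, 5, 5]

# 2.262 is the tabulated 97.5% point of the t-distribution with 9 degrees
# of freedom, giving a 95% two-sided interval.
mean, se, (lower, upper) = mean_confidence_interval(household_sizes, t_critical=2.262)
print(f"mean = {mean:.2f}, standard error = {se:.3f}")
print(f"95% confidence interval: ({lower:.2f}, {upper:.2f})")
```

Passing the critical value in as an argument keeps the sketch within the Python standard library, which has no t-distribution quantile function. With a larger sample, the same function can be reused by looking up the value for the new degrees of freedom.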