Module H2

Higher Level Module H2
Introduction to Basic Statistical Concepts
Synopsis
Except where a census is conducted, the production of all National Statistics involves
collecting information from a carefully chosen and representative subset (sample) of a
target population. Primary interest centres on learning about the characteristics of this
population, using inferential methods based on characteristics of the sample.
This module is concerned with basic statistical concepts that are central to making “sample
to population” inferences in an effective way, while quantifying the extent to which the
results reflect the true character of the population. The module will assume that simple
random sampling has been used in the sample selection process, so that the key concepts
can be understood without concerns about complexities that arise in real life situations.
The initial focus will be on the production of sample-based estimates, together with the
precision of such estimates and recognition of their importance, assuming that samples
have been drawn from an infinite population. Procedures underlying the testing of a
proposed hypothesis concerning the population will be discussed for simple scenarios,
leading to the introduction of several standard statistical distributions (beyond those
introduced in Module H1) and the recognition of errors that arise when statistical tests are
conducted.
Although the contents of this module look traditional, there will be a problem-based
approach to each session to provide the context in which the different statistical concepts
become relevant. Throughout the module, illustrative examples and practical exercises will
be included. The CAST software will be used to provide more practice and to ensure
concepts covered have been well understood.
SADC Course in Statistics
Learning Objectives
Successful students will be able to:

Explain the basic statistical concepts underlying the estimation of population
characteristics from sample values.

Calculate and correctly interpret standard errors and confidence intervals for the
simple situations covered in the module, as well as explain why they are important
and how they can be used in more general situations.

Use the concepts above in testing hypotheses that compare two means or two
proportions, for samples assumed to be drawn at random from infinite
populations.

Conduct tests for independence between two categorical variables, state
assumptions underlying such tests and actions to take when assumptions fail.

Discuss the use and limitations of significance probabilities when interpreting
results from tests of hypotheses concerning population parameters.

Have an appreciation of what is meant by non-parametric tests and be able to
conduct such a test for the simple case corresponding to a one-sample t-test.
Pre-requisites:
Students attending this module should have full familiarity with contents covered in
Modules at the Intermediate Level (Modules I1 to I4).
Attendance on Module H1 (at a minimum, Sessions 1-10) is also a compulsory requirement
before attending this module.
Contents
Session 1. Introduction and study objectives
Several problem scenarios are introduced to set the scene, to emphasise the need to be
specific about the study objectives and to provide an overview of module contents. The
session will also include a discussion of how the problem objectives can be translated into
questions relating to the data.
Session 2. Samples and populations
Distinguishing between a sample and a population. Benefits and limitations of using a
sample compared to a full census of the population. Finite and infinite populations. Simple
random sampling ideas.
Session 3. Introduction to statistical inference
Addressing the question “What is statistical inference?”. Introducing the concept of a
sampling distribution and the meaning of a standard error. Finding the standard error of a
sample mean.
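As an illustration of the last topic, the estimated standard error of a sample mean is the sample standard deviation divided by the square root of the sample size. The sketch below is not part of the module materials; it assumes simple random sampling from an infinite population, and the function name and data values are hypothetical.

```python
import math
import statistics


def standard_error_of_mean(values):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    s = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    return s / math.sqrt(n)


# Hypothetical household sizes from a simple random sample (made-up data)
sample = [4, 6, 5, 3, 7, 5, 4, 6]
se = standard_error_of_mean(sample)
print(f"mean = {statistics.mean(sample)}, standard error = {se:.3f}")
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.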
Session 4. Revision using CAST
Revision of concepts covered in previous sessions and practice with sampling distributions.
Session 5. Meaning and use of confidence intervals
Assessing precision through the use of confidence intervals for a mean. Correct
interpretation of a confidence interval. Revision of the normal distribution and
introduction to the t-distribution.
Session 6. Further ideas concerning confidence intervals
Effect of sample size and variability on width of the confidence interval. Confidence
interval for a population proportion. Recognising and dealing with complications.
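A minimal sketch of a confidence interval for a population proportion is given here for orientation. It assumes the usual large-sample normal approximation and simple random sampling; the function name and the counts are hypothetical and not taken from the module.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(successes, n, level=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("require n > 0 and 0 <= successes <= n")
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level = 0.95
    se = math.sqrt(p_hat * (1 - p_hat) / n)    # estimated standard error of p_hat
    return p_hat - z * se, p_hat + z * se


# Same sample proportion (0.30) at two hypothetical sample sizes
low, high = proportion_confidence_interval(120, 400)
low_small, high_small = proportion_confidence_interval(30, 100)
print(f"n = 400: ({low:.3f}, {high:.3f})")
print(f"n = 100: ({low_small:.3f}, {high_small:.3f})")
```

The interval for the smaller sample is wider, which is the sample-size effect named in this session. The approximation is unreliable when the sample is small or the proportion is close to 0 or 1, one of the complications that need separate handling.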
Session 7. Confidence intervals using CAST
Further practice using CAST to produce confidence intervals for both a population mean
and a proportion.
Session 8. Basic principles of hypothesis tests
Formulating the null and alternative hypotheses for a range of problems. Different types of
errors. Power of a test.
Session 9. Testing a hypothesis
A hypothesis test for a simple problem. Concept of comparing a sample estimate with a
population value, relative to “noise” in the sample estimate. Introducing the concept of
significance. Interpretation of the p-value and reporting results appropriately.
Session 10. Linking tests to confidence intervals (and other issues)
Seeing how a hypothesis can be tested via computation of an appropriate confidence
interval. Test for a population proportion. General issues related to tests of
hypotheses, including one-sided and two-sided tests.
Session 11. Tests for variances
Hypothesis test concerning one variance. Extensions to comparing two variances.
Introducing the chi-square test and the F-test. Further illustrative examples.
Session 12. Comparing means from independent samples
How to compare two population means given sample data. Assumptions. Appropriate
ways to present and report test results. Statistical significance versus important findings.
Session 13. Comparing means from paired samples
Distinguishing between paired and independent samples. Tests with paired samples.
Recognising gains in precision with paired samples.
Session 14. Comparing two proportions
Using the normal approximation to compare two population proportions. Comparing
two proportions using the chi-square test. Links between the normal approximation
and the chi-square test.
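The link between the two approaches can be checked numerically: for a 2 x 2 table, the square of the pooled two-proportion z statistic equals the Pearson chi-square statistic computed without a continuity correction. The sketch below is illustrative only; the function names and counts are hypothetical.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


def chi_square_2x2(x1, n1, x2, n2):
    """Pearson chi-square statistic for the 2 x 2 table (no continuity correction)."""
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    row_totals = [n1, n2]
    col_totals = [x1 + x2, (n1 - x1) + (n2 - x2)]
    total = n1 + n2
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


# Hypothetical counts: 45 of 100 in group 1 and 30 of 100 in group 2
z = two_proportion_z(45, 100, 30, 100)
chi2 = chi_square_2x2(45, 100, 30, 100)
print(f"z squared = {z ** 2:.4f}, chi-square = {chi2:.4f}")
```

Since a squared standard normal variable follows a chi-square distribution with one degree of freedom, the two tests give the same two-sided p-value.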
Session 15. Comparing several proportions and testing for associations
Using chi-squared tests for comparing several proportions. Examples. Formulating the
hypotheses in terms of the study of association between two categorical variables.
Extending to general r x c tables. Assumptions underlying the chi-square test and ways to
proceed when these fail, including the use of the continuity correction and use of Exact tests.
Session 16. Goodness-of-fit tests (and other issues)
Chi-square tests to study the goodness-of-fit of the distribution of a variable to an assumed
probability distribution. Revision of Binomial and Poisson distributions as a result. Use of
standardised residuals. Presentation of results from a chi-square test.
Session 17. Revision on tests for means using CAST
Further practice on tests for means, including simulation exercises to further illustrate the
exact meaning of a p-value. Distinguishing between a one-sample and a two-sample test.
Distinguishing between paired and independent samples t-tests. Further illustrative
examples.
Session 18. Revision on tests for proportions using CAST
Further practice on comparing proportions and the study of associations. The exact
meaning of independence. Justification of the formula for the chi-squared test statistic.
Simulations to describe the chi-square distribution.
Session 19. Introduction to Non-parametric methods
Meaning of non-parametric tests and when they might be used. Using the Sign Test,
including its advantages and limitations. An appreciation of practical problems associated
with non-parametric tests.
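As a sketch of the Sign Test in the one-sample setting, the code below tests a hypothesised median by counting observations above and below it and computing an exact two-sided binomial p-value. Dropping observations equal to the hypothesised value is one common convention and is an assumption here; the function name and data are hypothetical.

```python
import math


def sign_test_p_value(values, hypothesised_median):
    """Exact two-sided Sign Test p-value for a hypothesised median."""
    plus = sum(1 for v in values if v > hypothesised_median)
    minus = sum(1 for v in values if v < hypothesised_median)
    n = plus + minus  # observations equal to the hypothesised median are dropped
    if n == 0:
        raise ValueError("all observations equal the hypothesised median")
    k = min(plus, minus)
    # Under H0 the number of plus signs is Binomial(n, 0.5)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical measurements, testing H0: median = 12
data = [12, 15, 11, 18, 14, 16, 13, 17, 19, 10]
p = sign_test_p_value(data, 12)
print(f"two-sided p-value = {p:.4f}")
```

The test uses only the signs of the differences and ignores their sizes. This makes it robust to outliers and to non-normality, but less powerful than a t-test when the t-test's assumptions hold.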
Session 20. Common Non-parametric tests for comparing two samples
Understand the logic of tests based on ranks. Interpret and understand the Wilcoxon
signed rank test. Practical problems with non-parametric methods. Review with a
reminder of errors associated with statistical tests.