Higher Level Module H1

Module H1
Using Probability Ideas in Dealing with Data

Synopsis
This module will provide students with an understanding of basic ideas about probabilities and their manipulation, and of elementary probability distributions and their uses. This should equip them to discuss these ideas intelligently in the context of a National Statistical System, and provide a basis for further training in this programme or thereafter.

The demographic section of the syllabus concerns ways of making effective use of routinely collected population statistics; the role of probability in providing a rational basis for calculations; and the need to combine that methodology with careful consideration of practicalities and a critical approach to interpretation.

Objectives
Successful students will be able to:
Explain the basic concepts of probability and probability distributions needed to underpin later learning of general statistical work.
Distinguish between discrete and continuous measurements, and understand how this distinction plays out in probability contexts.
Recognise the usefulness of probability distribution models for statistical inference, while realising that such models rest on assumptions that may or may not be acceptable in practice.
Use probability concepts, and realistic interpretation of figures derived from data, in the construction and application of life tables, including basic formulations of population projection.

SADC Course in Statistics Module H1 – Page 1

Expected Outcomes
In respect of probability ideas, the module is preparatory and foundational. The demographic strand, in the second half of the module, applies probability ideas but combines them with meticulous arithmetical manipulation of survival data to build up the concept of life expectancy. Students will learn to appreciate its uses.

The module also covers further areas where life table formats find application. First, competing risks are discussed to illustrate how lives (e.g. working lives in a given organisation) can be terminated in several ways, and how this feeds into ideas about multiple decrement tables. Second, as well as mortality, participants will learn that those concerned with population projections also need to consider figures for, and assumptions about, fertility and migration.

Pre-requisites
A good mathematical foundation, such as can be achieved by completing the Arithmetic and Algebra modules, and parts (linear, polynomial, exponential and logarithm functions, and sigma notation) of the Functions, Graphs and Sequences modules, of an excellent set of training materials available at www.mathtutor.ac.uk.

Students attending this module should also be familiar with Excel and with the Excel add-in named SSC-Stat. This is equivalent to the skills gained by attending the Basic Level module B2 and the Intermediate Level module I2. It is also recommended that students taking this module be familiar with Intermediate Level module I5, Basic Demographic and Epidemiological Ideas, or an equivalent course.

Contents

Session 01. Introduction to Probability and Life-Table Ideas: Meaning of probability; probability based on relative frequencies/proportions. Probability concepts in a life table.

Session 02. Laws of Probability: Fundamental laws of probability. Results emerging from these laws. Venn diagrams, events, union and intersection of events, complement of an event, mutually exclusive events.

Session 03. Conditional Probabilities and Independence: Definitions, notation. Independence of events. Conditional probability and independence. Law of Total Probability. Bayes’ Theorem. Addition rule for probabilities. Multiplication rule for independent events. Tree diagrams.

Session 04. Probability Distributions: Discrete and continuous random variables. Meaning of a probability distribution. Examples. Expected value, moments and variance of a random variable. Skewness and kurtosis. Cumulative distribution function.

Session 05. Joint Distributions: Definitions. Joint and marginal distributions. Conditional distributions. Independence. Using two-way tables of frequencies to determine joint, marginal and conditional probabilities.

Session 06. The Binomial Distribution: Introduction to a discrete distribution, the binomial distribution. Importance of recognising the underlying data-generating process. Graphical depiction and interpretation of its probabilities. Examples with varying values of p. Mean and variance of the binomial distribution.

Session 07. The Poisson Distribution: Definition of the Poisson distribution. Worked examples. Mean and variance of the Poisson distribution. Graphs for varying values of the Poisson parameter. Cumulative Poisson probabilities.

Session 08. The Normal Distribution: The normal distribution introduced in its own right. Continuous variables as opposed to discrete. Symmetry. Mean and variance, illustrated with diagrams. Probability as area under the curve. Tables of the standard normal distribution. Cumulative curve.

Session 09. Importance of the Normal Distribution: The underlying notion of normal variation as the resultant of many minor influences up and down. The Central Limit Theorem and its consequences. Normal approximation to the binomial and Poisson distributions. Checking for normality using normal probability plots.

Session 10. Review and Further Practice: Review of the three main distributions. Poisson approximation to the binomial. Identifying variables as following a binomial, Poisson or normal distribution. Practice with further examples.

Session 11. Using Probability Ideas in Life Tables: The widespread use of Life Tables. The data input: a sequence of conditional probabilities of death in a single-year {q_x} or n-year period {nq_x}. The abridged Life Table. Computation of numbers surviving. Demographic “algebra” or “shorthand”.

Session 12. Basic Life Table Computations – I: The calculation of numbers dying by age (group), i.e. {nd_x}, and of years lived, i.e. {nL_x}. Interpretations. Graphs and their usefulness.

Session 13. Basic Life Table Computations – II: Completing the Life Table. The calculation of residual life, i.e. the cumulative total of years lived beyond exact age x, {T_x}, and of life expectancy, {e_x}. Interpretations. Graphs and their usefulness.

Session 14. A Life Table Discussion Topic: An extension of the “normal” basic calculations of the Life Table to {tp_x} and {t|kq_x}, motivated by a published example and discussed through a series of classroom questions, to practise and develop fluency in using Life Table ideas.

Session 15. Putting the Life Table in Context: Discussion of the real interpretation of cohort and cross-sectional Life Tables, and of the stationary population. Discussion of data sources and of the derivation of single-year {q_x} (revising coverage in module I5) and n-year period {nq_x} (an extension of previous discussion). Brief description of the role of Model Life Tables, the need for large data samples, the importance to insurers of allowing for life-style factors, and statisticians’ responsibilities to work with other users.

Session 16. Applications of Stationary Population Ideas: Elementary examples of how Life Table ideas find application in manpower planning, illustrated by two very simple scenarios and discussed through a series of classroom questions, to practise and further develop fluency in using Life Table ideas.

Session 17. Competing Risks and Multiple Decrement Tables: The idea of having more than one exit from the Life Table population. Development of an example of “competing risks” (medical statistical terminology for several disease groups as multiple possible causes of death). Dependent observable rates and the computation of “independent” rates. Illustration of how “what-if” calculations can be based on the independent rates and translated back into dependent rates.

Session 18. Fertility Ideas: Simple cross-sectional rates: crude birth rate, general and age-specific fertility rates. Effect of changes in age at child-bearing. Child-woman ratio. Generation-based rates and age-period-cohort effects. Average completed family size, total fertility rate, gross and net reproduction rates. Critique.

Session 19. Population Projections – I: Projection, not prediction. “Mathematical” models of total population. Component methods. Projecting age forward and accounting for deaths. Migration (discussion based on UK material). Age-specific fertility and births. Infant deaths.

Session 20. Population Projections – II: Reasons for using projections: education planning examples. Politics of migration (UK material). Smoother projections based on information about generation effects. General use of varied assumptions in “steering” projections.
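Illustrative sketch of the Life Table computations (Sessions 11 to 13). The chain of calculations described for those sessions runs from the input probabilities of death to survivors l_x, deaths nd_x, years lived nL_x, cumulative years T_x and life expectancy e_x. The Python below is a minimal sketch of that chain under stated assumptions, not the module's own method (the module itself works in Excel with SSC-Stat). It assumes deaths are spread evenly within each closed age group, so that nL_x = n(l_x + l_{x+n})/2, and it treats the last age group as open-ended, with a user-supplied average number of years lived in it. The function name `abridged_life_table`, the parameter `last_mean_years`, the default radix of 100,000 and the toy inputs are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class LifeTableRow:
    x: int      # exact age at the start of the age group
    n: float    # width of the group in years (inf for the open final group)
    nqx: float  # probability of dying in the group, given alive at exact age x
    lx: float   # survivors to exact age x, out of the radix
    ndx: float  # deaths within the group
    nLx: float  # person-years lived within the group
    Tx: float   # person-years lived beyond exact age x
    ex: float   # expectation of life at exact age x


def abridged_life_table(
    ages: Sequence[int],
    nqx: Sequence[float],
    last_mean_years: float,
    radix: float = 100_000.0,
) -> List[LifeTableRow]:
    """Build an abridged life table from the {nqx} input column.

    Illustrative assumptions, not necessarily the module's conventions:
      * deaths fall evenly within each closed group,
        so nLx = n * (lx + l(x+n)) / 2;
      * the final group is open-ended: everyone entering it dies in it
        (its nqx is set to 1), living `last_mean_years` there on average.
    """
    k = len(ages)
    if k < 2 or len(nqx) != k:
        raise ValueError("need at least two age groups and one nqx per group")

    widths = [ages[i + 1] - ages[i] for i in range(k - 1)] + [float("inf")]
    q = [float(v) for v in nqx[:-1]] + [1.0]  # force q = 1 in the open group

    # Survivors: l(x+n) = lx * (1 - nqx), starting from the radix.
    lx = [radix]
    for i in range(k - 1):
        lx.append(lx[i] * (1.0 - q[i]))

    # Deaths in each group: ndx = lx * nqx (equal to lx - l(x+n) when closed).
    dx = [lx[i] * q[i] for i in range(k)]

    # Person-years lived in each group.
    Lx = [widths[i] * (lx[i] + lx[i + 1]) / 2.0 for i in range(k - 1)]
    Lx.append(lx[-1] * last_mean_years)

    # Tx: cumulate nLx from the oldest group back to the youngest.
    Tx = [0.0] * k
    running = 0.0
    for i in reversed(range(k)):
        running += Lx[i]
        Tx[i] = running

    # ex = Tx / lx (undefined if nobody survives to age x).
    ex = [Tx[i] / lx[i] if lx[i] > 0 else float("nan") for i in range(k)]

    return [
        LifeTableRow(ages[i], widths[i], q[i], lx[i], dx[i], Lx[i], Tx[i], ex[i])
        for i in range(k)
    ]


if __name__ == "__main__":
    # Toy inputs (made-up probabilities, not real data): groups 0, 1-4, 5+.
    table = abridged_life_table([0, 1, 5], [0.1, 0.2, 1.0],
                                last_mean_years=10.0, radix=1000.0)
    for r in table:
        print(r.x, round(r.lx), round(r.ndx), round(r.nLx), round(r.Tx), round(r.ex, 2))
```

For the toy inputs (arbitrary values chosen only to keep the arithmetic easy to follow), the sketch gives survivors of 1000, 900 and 720 and a life expectancy at birth of 11.39 years. Published abridged tables usually replace the even-spread assumption in the first years of life with age-specific separation factors, and derive person-years in the open age group from an observed death rate rather than from an assumed mean.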