Probability distributions
AS91586 Apply probability distributions in solving problems

NZC level 8
• Investigate situations that involve elements of chance
• calculating and interpreting expected values and standard deviations of discrete random variables
• applying distributions such as the Poisson, binomial, and normal

AS91586 Apply probability distributions in solving problems
• Methods include a selection from those related to:
• discrete and continuous probability distributions
• mean and standard deviation of random variables
• distribution of true probabilities versus distribution of model estimates of probabilities versus distribution of experimental estimates of probabilities.

AO8-4 TKI
Calculating and interpreting expected values and standard deviations of discrete random variables:
A statistical data set may contain discrete numerical variables. These have frequency distributions that can be converted to empirical probability distributions. Distributions from both sources have the same set of possible features (centre, spread, clusters, shape, tails, and so on) and we can calculate the same measures (mean, SD, and so on) for them.
• Makes a reasonable estimate of mean and standard deviation from a plot of the distribution of a discrete random variable.
• Solves and interprets solutions of problems involving calculation of mean, variance and standard deviation from a discrete probability distribution.
• Solves and interprets solutions of problems involving linear transformations and sums (and differences) of discrete random variables.

Applying distributions such as the Poisson, binomial, and normal:
• They learn that some situations that satisfy certain conditions can be modelled mathematically. The model may be Poisson, binomial, normal, uniform, triangular, or others, or be derived from the situation being investigated.
• Recognises situations in which probability distributions such as Poisson, binomial, and normal are appropriate models, demonstrating understanding of the assumptions that underlie the distributions.
• Selects and uses an appropriate distribution to model a situation in order to solve a problem involving probability.
• Selects and uses an appropriate distribution to solve a problem, demonstrating understanding of the link between probabilities and areas under density functions for continuous outcomes (for example, normal, triangular, or uniform, but nothing requiring integration).
• Selects and uses an appropriate distribution to solve a problem, demonstrating understanding of the way a probability distribution changes as the parameter values change.
• Selects and uses an appropriate distribution to solve a problem involving finding and using estimates of parameters.
• Selects and uses an appropriate distribution to solve a problem, demonstrating understanding of the relationship between true probability (unknown and unique to the situation), model estimates (theoretical probability) and experimental estimates.
• Uses a distribution to estimate and calculate probabilities, including by simulation.

AS 3.14 summary
• Includes expected value and standard deviation (and variance).
• Includes sums and differences (and linear combinations) of random variables.
• Includes binomial, Poisson and normal, but also includes uniform, triangular distributions and experimental distributions.
• Requires consideration of context as well as appearance of the distribution when selecting a model.

Looking at distributions (simulated normal distribution)
• Small samples do not always have distributions like the population they come from.
• When looking at distributions, a sample of 30 is much too small to give a good picture of the whole population distribution.
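A quick simulation is one way to let students see the small-sample point for themselves. The sketch below is illustrative only: it assumes a hypothetical normal population (mean 170, SD 8, chosen arbitrarily), and the helper names draw_sample and bin_counts are made up for this example. It draws three samples of 30 and prints a rough text histogram of each so that the shapes can be compared.

```python
import random

# Hypothetical population parameters, chosen only for illustration.
MU, SIGMA = 170.0, 8.0
SAMPLE_SIZE = 30


def draw_sample(n, mu, sigma, rng):
    """Draw n values from a normal distribution with mean mu and SD sigma."""
    return [rng.gauss(mu, sigma) for _ in range(n)]


def bin_counts(data, low, high, bins):
    """Count the values falling in each of `bins` equal-width bins on [low, high)."""
    width = (high - low) / bins
    counts = [0] * bins
    for x in data:
        if low <= x < high:
            # min() guards against a rounding error pushing the index to `bins`.
            index = min(int((x - low) / width), bins - 1)
            counts[index] += 1
    return counts


rng = random.Random(1)  # fixed seed so the demonstration is repeatable
for trial in range(3):
    sample = draw_sample(SAMPLE_SIZE, MU, SIGMA, rng)
    counts = bin_counts(sample, MU - 4 * SIGMA, MU + 4 * SIGMA, 8)
    print(f"Sample {trial + 1} (n = {SAMPLE_SIZE})")
    for count in counts:
        print("  " + "#" * count)
```

Changing SAMPLE_SIZE lets students compare the pictures they get from other sample sizes drawn from the same population.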
Looking at distributions (simulated normal distribution)
• Large samples do have distributions like the population they come from.
• When looking at distributions, a sample of about 200 is sufficient to give a picture of the whole population distribution.

Estimating mean and standard deviation
To estimate mean and standard deviation, students need to know that:
• The mean is pulled towards extreme values.
• The SD is stretched by extreme values.
If the distribution is approximately normal, the mean is in the middle and the SD is roughly one sixth of the range (about 99.7% of values lie within μ ± 3σ).

Estimating mean and standard deviation for any distribution
Estimating the mean:
• Estimate the median and adjust towards extreme values.
Estimating the standard deviation:
• Estimate the median distance from the mean and adjust it (stretch it if there are extreme values).

Estimate the mean and standard deviation of the age of students completing the census@school survey.
Mean = 12.3 years, SD = 1.8 years

Words remembered in Kim’s Game
Mean = 13.1, SD = 2.4
Mean = 9.0, SD = 2.8

Text messages sent in a day by stage one university students
Mean = 38 messages, SD = 57 messages

Number of pairs of shoes owned by stage one university students
Mean = 10.4 pairs, SD = 8.9 pairs

Words memorised with music
[Bar chart: frequency against number of words, 1 to 10]
Mean = 5.9 words, SD = 2.5 words

Words memorised without music
[Bar chart: frequency against number of words, 1 to 10]
Mean = 7.0 words, SD = 2.3 words

Introducing distributions
How do you introduce:
• Binomial
• Poisson
• Normal
• Uniform
• Triangular
distributions?

Introducing the binomial distribution
• Combinations and permutations are still in the curriculum, so you can still teach them if you want to.
• You can teach the binomial distribution without using combinations by using trees.
Introduce the binomial distribution as a shortcut for complicated trees.
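The "shortcut for trees" idea can be checked directly. The sketch below is an illustration, not a prescribed method: it walks every branch of a success/failure tree, totals the branch probabilities by number of successes, and compares the result with the binomial formula. The function names and the choice of three trials with p = 1/6 (such as counting a chosen face on three dice) are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product
from math import comb


def tree_distribution(n, p):
    """Total the probabilities of all 2**n tree branches by number of successes."""
    dist = [Fraction(0)] * (n + 1)
    for branch in product((True, False), repeat=n):
        prob = Fraction(1)
        for success in branch:
            # Multiply along the branch: p for a success, 1 - p for a failure.
            prob *= p if success else 1 - p
        dist[sum(branch)] += prob  # sum(branch) counts the successes
    return dist


def binomial_distribution(n, p):
    """The same probabilities from the binomial formula."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


p = Fraction(1, 6)  # assumed: probability of a chosen face on one fair die
from_tree = tree_distribution(3, p)
from_formula = binomial_distribution(3, p)
for k, (t, f) in enumerate(zip(from_tree, from_formula)):
    print(f"P(X = {k}): tree {t}, formula {f}")
```

Exact fractions are used so that the tree totals and the formula can be compared without rounding differences.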
Chuck-a-luck
• A gambling game played at carnivals, played against a banker.
• A player pays a dollar to play and rolls 3 dice.
• If no 4s are rolled, the player loses.
• Otherwise the player gets back one dollar for every 4 rolled and gets their original dollar back.

Introducing the binomial distribution
Once students see the pattern emerge, they can start to generalise it, using Pascal’s triangle or an understanding of combinations to get the coefficients.
For some students, it may be enough to know that the calculator is a shortcut method for working out probabilities from trees like these.

Poisson distributions
• Hokey Pokey ice-cream: is Tip Top really the best?
• Choc chip cookies: number of choc chips visible on an area of cookie (Farmbake Triple Choc works well; do white chips and dark chips separately).

Discrete uniform and triangular distributions
Uniform: roll of one die
Triangular: sum of two dice
[Figures: bar chart for the roll of one die (values 1 to 6); bar chart "sum of two dice", frequency against dice sum]

Continuous probability graphs
What are the units on the vertical axis for a continuous probability function?

Continuous probability graphs are probability density functions
The vertical axis measures the rate probability/x, which is called probability density. Probability density is only meaningful in terms of area.

Bus waiting time (1)
The downtown inner link bus in Auckland arrives at a stop every ten minutes, but has no set times. If I turn up at the bus stop, how long should I expect to wait for a bus? What will the distribution of wait times look like?
[Figure: candidate graphs a, b and c; axis labels 0.1, 0 and 10]
Which is more likely: a wait of between 2 and 5 minutes, or a wait of more than 6 minutes, measured to the nearest minute?
[Figure: graph with axis labels 0.1, 0 and 10]

Bus waiting time (2)
• My own bus route (277) runs only every half hour, and isn’t as reliable as the inner link.
• I know that the bus is most likely to appear on time, but could in fact turn up at any time between the time it is due and half an hour later.
What is the best model for wait time, given the available information?

In the real world:
Uniform models are used for modelling distributions when the only information you have is the maximum and minimum.
Triangular models are used for modelling distributions when the only information you have is the maximum, minimum and average (could be the mode).
[Figure: candidate graphs a, b and c]
What is the probability that I will have to wait longer than 20 minutes for a bus?
[Figure: graph with axis labels 1/15, 0 and 30]

My interpretation of AS 3.14
• Expect more questions giving experimental data to be fitted to a theoretical model.
• Expect more evaluation of how well a theoretical model fits experimental data.
• Expect more interpretation of the application of a model in context.

Teaching and learning
Students should:
• record their hunch every time they start an investigation, and compare the results to their hunch.
• always consider the context and the distribution they would expect in that context, as well as their observations of the data available.
• estimate the mean and the standard deviation every time they look at a distribution (write down the estimate, then check to see how close they were).

Learning could start with:
• Questions to investigate, and gathering data: What is the probability that at least 4 people in a class have the same birth month?
• Data in tables: which distribution (if any) would you use to model it? Estimate probabilities.
• Data in graphs: estimate mean and standard deviation. Which distribution (if any) would model it? Estimate probabilities.

A learning activity
• From Teaching Statistics: a bag of tricks (Gelman and Nolan)

What do you notice?
• Students tend to group their guessed histogram into large groups.
• Different bin widths will give different estimates of probability.
• What else do you notice?
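The bin-width point can be shown with a small calculation. The sketch below is one possible reading of it, not taken from the Gelman and Nolan activity: it groups a made-up data set into bins of two different widths, then estimates the probability of an interval from the bin counts alone, assuming values are spread evenly within each bin. The data, the interval and the function names are all hypothetical.

```python
from math import floor


def bin_data(data, low, width):
    """Count values in bins [low + k*width, low + (k+1)*width), keyed by k."""
    counts = {}
    for x in data:
        k = floor((x - low) / width)
        counts[k] = counts.get(k, 0) + 1
    return counts


def estimate_probability(counts, low, width, a, b):
    """Estimate P(a <= X < b) from bin counts, assuming an even spread within bins."""
    total = sum(counts.values())
    prob = 0.0
    for k, count in counts.items():
        left = low + k * width
        right = left + width
        # Length of the part of this bin that lies inside [a, b).
        overlap = max(0.0, min(b, right) - max(a, left))
        prob += (count / total) * (overlap / width)
    return prob


# Made-up data for illustration only.
data = [1, 2, 3, 4, 6, 7, 8, 9]
a, b = 2, 4

direct = sum(1 for x in data if a <= x < b) / len(data)
print(f"Direct proportion in [{a}, {b}): {direct}")
for width in (5, 2):
    counts = bin_data(data, 0, width)
    estimate = estimate_probability(counts, 0, width, a, b)
    print(f"Bin width {width}: estimate {estimate}")
```

Students could repeat this with their own guessed histograms and compare the estimates with the proportion in the raw data.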
Misunderstanding of probability may be the greatest of all impediments to scientific literacy. Stephen J Gould