Chapter 6
Probability Distributions

Learn:
• To analyze how likely it is that sample results will be “close” to population values
• How probability provides the basis for making statistical inferences

Agresti/Franklin Statistics, 1 of 139

Inferential Statistics
Use sample data to make decisions and predictions about a population.

Section 6.1
How Can We Summarize Possible Outcomes and Their Probabilities?

Randomness
The numerical values that a variable assumes are the result of some random phenomenon:
• Selecting a random sample from a population, or
• Performing a randomized experiment

Random Variable
A random variable is a numerical measurement of the outcome of a random phenomenon.

Random Variable
Use letters near the end of the alphabet, such as x, to symbolize variables.
Use a capital letter, such as X, to refer to the random variable itself.
Use a small letter, such as x, to refer to a particular value of the variable.

Probability Distribution
The probability distribution of a random variable specifies its possible values and their probabilities.

Discrete Random Variable
The possible outcomes are a set of separate numbers (0, 1, 2, …).

Probability Distribution of a Discrete Random Variable
A discrete random variable X takes a set of separate values (such as 0, 1, 2, …).
Its probability distribution assigns a probability P(x) to each possible value x:
• For each x, the probability P(x) falls between 0 and 1.
• The sum of the probabilities for all the possible x values equals 1.

Example: How Many Home Runs Will the Red Sox Hit in a Game?
What is the estimated probability of at least three home runs?
Parameters of a Probability Distribution
Parameters: numerical summaries of a probability distribution.

The Mean of a Probability Distribution
The mean of a probability distribution is denoted by the parameter µ.

The Mean of a Discrete Probability Distribution
The mean of a probability distribution for a discrete random variable is
\[ \mu = \sum x \, p(x) \]
where the sum is taken over all possible values of x.

Expected Value of X
The mean of a probability distribution of a random variable X is also called the expected value of X.
The expected value reflects not what we’ll observe in a single observation, but rather what we expect for the average in a long run of observations.

Example: What’s the Expected Number of Home Runs in a Baseball Game?
Find the mean of this probability distribution.

Example: What’s the Expected Number of Home Runs in a Baseball Game?
The mean:
\[ \mu = \sum x \, p(x) = 0(0.23) + 1(0.38) + 2(0.22) + 3(0.13) + 4(0.03) + 5(0.01) = 1.38 \]

The Standard Deviation of a Probability Distribution
The standard deviation of a probability distribution, denoted by the parameter σ, measures its spread.
Larger values of σ correspond to greater spread.

Continuous Random Variable
A continuous random variable has an infinite continuum of possible values in an interval.
Examples are time, age, and size measures such as height and weight.

Probability Distribution of a Continuous Random Variable
A continuous random variable has possible values that form an interval.
Its probability distribution is specified by a curve.
Each interval has probability between 0 and 1.
The interval containing all possible values has probability equal to 1.

Continuous Variables Are Measured in a Discrete Manner Because of Rounding.

Which Wager Do You Prefer?
You are given $100 and told that you must pick one of two wagers, for an outcome based on flipping a coin:
A. You win $200 if it comes up heads and lose $50 if it comes up tails.
B. You win $350 if it comes up heads and lose your original $100 if it comes up tails.
Without doing any calculation, which wager would you prefer?

You win $200 if it comes up heads and lose $50 if it comes up tails. Find the expected outcome for this wager.
a. $100
b. $25
c. $50
d. $75

You win $350 if it comes up heads and lose your original $100 if it comes up tails. Find the expected outcome for this wager.
a. $100
b. $125
c. $350
d. $275

Section 6.2
How Can We Find Probabilities for Bell-Shaped Distributions?

Normal Distribution
The normal distribution is symmetric, bell-shaped, and characterized by its mean µ and standard deviation σ.
The probability of falling within any particular number of standard deviations of µ is the same for all normal distributions.

Z-Score
Recall: The z-score for an observation is the number of standard deviations that it falls from the mean.
Z-Score
For each fixed number z, the probability within z standard deviations of the mean is the area under the normal curve between µ − zσ and µ + zσ.

Z-Score
For z = 1: 68% of the area (probability) of a normal distribution falls between µ − σ and µ + σ.

Z-Score
For z = 2: 95% of the area (probability) of a normal distribution falls between µ − 2σ and µ + 2σ.

Z-Score
For z = 3: Nearly 100% of the area (probability) of a normal distribution falls between µ − 3σ and µ + 3σ.

The Normal Distribution: The Most Important One in Statistics
It’s important because…
• Many variables have approximately normal distributions.
• It’s used to approximate many discrete distributions.
• Many statistical methods use the normal distribution even when the data are not bell-shaped.

Finding Normal Probabilities for Various Z-values
Suppose we wish to find the probability within, say, 1.43 standard deviations of µ.

Z-Scores and the Standard Normal Distribution
When a random variable has a normal distribution and its values are converted to z-scores by subtracting the mean and dividing by the standard deviation, the z-scores have the standard normal distribution.
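
The 68%, 95%, and "nearly 100%" figures quoted for z = 1, 2, and 3 can be checked numerically. The following is a minimal Python sketch, not part of the original slides. It relies on `statistics.NormalDist` from the standard library in place of a printed normal table, and the names `prob_within`, `mu`, and `sigma`, as well as the values 50 and 10, are illustrative assumptions.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def prob_within(z: float) -> float:
    """Probability that a normal variable falls within z standard deviations of its mean."""
    return STANDARD_NORMAL.cdf(z) - STANDARD_NORMAL.cdf(-z)


for z in (1, 2, 3):
    print(f"within {z} standard deviation(s): {prob_within(z):.4f}")

# Hypothetical normal distribution (mu = 50, sigma = 10 are made-up values),
# used only to illustrate that the probability within z standard deviations
# does not depend on mu or sigma.
mu, sigma = 50.0, 10.0
other = NormalDist(mu=mu, sigma=sigma)
same_area = other.cdf(mu + 2 * sigma) - other.cdf(mu - 2 * sigma)
print(f"within 2 standard deviations for mu=50, sigma=10: {same_area:.4f}")
```

The printed values should be close to 0.68, 0.95, and 0.997, and the last line should agree with the z = 2 result, which is the sense in which converting to z-scores lets one table serve every normal distribution.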
Example: Find the probability within 1.43 standard deviations of µ
Probability below µ + 1.43σ = 0.9236
Probability above µ + 1.43σ = 0.0764
By symmetry, probability below µ − 1.43σ = 0.0764
Total probability under the curve = 1

Example: Find the probability within 1.43 standard deviations of µ
The probability of falling within 1.43 standard deviations of the mean equals:
1 − 2(0.0764) = 1 − 0.1528 = 0.8472, about 85%

How Can We Find the Value of z for a Certain Cumulative Probability?
Example: Find the value of z for a cumulative probability of 0.025.

Example: Find the Value of z for a Cumulative Probability of 0.025
Look up the cumulative probability of 0.025 in the body of Table A.
A cumulative probability of 0.025 corresponds to z = −1.96.
So, a probability of 0.025 lies below µ − 1.96σ.

Example: What IQ Do You Need to Get Into Mensa?
Mensa is a society of high-IQ people whose members have a score on an IQ test at the 98th percentile or higher.

Example: What IQ Do You Need to Get Into Mensa?
How many standard deviations above the mean is the 98th percentile?
• The cumulative probability of 0.980 in the body of Table A corresponds to z = 2.05.
• The 98th percentile is 2.05 standard deviations above µ.

Example: What IQ Do You Need to Get Into Mensa?
What is the IQ for that percentile?
• Since µ = 100 and σ = 16, the 98th percentile of IQ equals:
\[ \mu + 2.05\sigma = 100 + 2.05(16) = 132.8 \approx 133 \]

Z-Score for a Value of a Random Variable
The z-score for a value x of a random variable is the number of standard deviations that x falls from the mean µ.
It is calculated as:
\[ z = \frac{x - \mu}{\sigma} \]

Example: Finding Your Relative Standing on the SAT
Scores on the verbal or math portion of the SAT are approximately normally distributed with mean µ = 500 and standard deviation σ = 100.
The scores range from 200 to 800.

Example: Finding Your Relative Standing on the SAT
If one of your SAT scores was x = 650, how many standard deviations from the mean was it?

Example: Finding Your Relative Standing on the SAT
Find the z-score for x = 650.
\[ z = \frac{x - \mu}{\sigma} = \frac{650 - 500}{100} = 1.50 \]

Example: Finding Your Relative Standing on the SAT
What percentage of SAT scores was higher than yours?
• Find the cumulative probability for the z-score of 1.50 from Table A.
• The cumulative probability is 0.9332.

Example: Finding Your Relative Standing on the SAT
The cumulative probability below 650 is 0.9332.
The probability above 650 is 1 − 0.9332 = 0.0668.
About 6.7% of SAT scores are higher than yours.

Example: What Proportion of Students Get a Grade of B?
On the midterm exam in introductory statistics, an instructor always gives a grade of B to students who score between 80 and 90.
One year, the scores on the exam have approximately a normal distribution with mean 83 and standard deviation 5.
About what proportion of students get a B?

Example: What Proportion of Students Get a Grade of B?
Calculate the z-score for 80 and for 90:
\[ z = \frac{x - \mu}{\sigma} = \frac{90 - 83}{5} = 1.40 \]
\[ z = \frac{x - \mu}{\sigma} = \frac{80 - 83}{5} = -0.60 \]

Example: What Proportion of Students Get a Grade of B?
Look up the cumulative probabilities in Table A.
• For z = 1.40, cum. prob. = 0.9192
• For z = −0.60, cum. prob. = 0.2743
It follows that about 0.9192 − 0.2743 = 0.6449, or about 64%, of the exam scores were in the ‘B’ range.

Using z-scores to Find Normal Probabilities
If we’re given a value x and need to find a probability, convert x to a z-score using:
\[ z = \frac{x - \mu}{\sigma} \]
Use a table of normal probabilities to get a cumulative probability.
Convert it to the probability of interest.

Using z-scores to Find Random Variable x Values
If we’re given a probability and need to find the value of x, convert the probability to the related cumulative probability.
Find the z-score using a normal table.
Evaluate x = zσ + µ.

Example: How Can We Compare Test Scores That Use Different Scales?
When you applied to college, you scored 650 on an SAT exam, which had mean µ = 500 and standard deviation σ = 100.
Your friend took the comparable ACT in 2001, scoring 30. That year, the ACT had µ = 21.0 and σ = 4.7.
How can we tell who did better?

What is the z-score for your SAT score of 650? For the SAT scores: µ = 500 and σ = 100.
a. 2.15
b. 1.50
c. −1.75
d. −1.25

What percentage of students scored higher than you?
a. 10%
b. 5%
c. 2%
d. 7%

What is the z-score for your friend’s ACT score of 30? The ACT scores had a mean of 21 and a standard deviation of 4.7.
a. 1.84
b. −1.56
c. 1.91
d. −2.24

What percentage of students scored higher than your friend?
a. 3%
b. 6%
c. 10%
d. 1%

Standard Normal Distribution
The standard normal distribution is the normal distribution with mean µ = 0 and standard deviation σ = 1.
It is the distribution of normal z-scores.

Section 6.3
How Can We Find Probabilities When Each Observation Has Two Possible Outcomes?

The Binomial Distribution
Each observation is binary: it has one of two possible outcomes.
Examples:
• Accept or decline an offer from a bank for a credit card.
• Have or do not have health insurance.
• Vote yes or no in a referendum.

Conditions for the Binomial Distribution
Each of n trials has two possible outcomes: “success” and “failure”.
Each trial has the same probability of success, denoted by p.
The n trials are independent.
The binomial random variable X is the number of successes in the n trials.

Example: Finding Binomial Probabilities for an ESP Experiment
John Doe claims to possess ESP. An experiment is conducted:
• A person in one room picks one of the integers 1, 2, 3, 4, 5 at random.
• In another room, John Doe identifies the number he believes was picked.
• The experiment is done with three trials.
• Doe got the correct answer twice.

Example: Finding Binomial Probabilities for an ESP Experiment
If John Doe does not actually have ESP and is actually guessing the number, what is the probability that he’d make a correct guess on two of the three trials?

Example: Finding Binomial Probabilities for an ESP Experiment
The three ways John Doe could make two correct guesses in three trials are: SSF, SFS, and FSS.
Each of these has probability \( (0.2)^2(0.8) = 0.032 \).
The total probability of two correct guesses is \( 3(0.2)^2(0.8) = 0.096 \).

Probabilities for a Binomial Distribution
Denote the probability of success on a trial by p.
For n independent trials, the probability of x successes equals:
\[ p(x) = \frac{n!}{x!\,(n - x)!}\, p^x (1 - p)^{n - x}, \quad x = 0, 1, 2, \ldots, n \]

Example: Using the Binomial Formula in the ESP Experiment
The probability of exactly 2 correct guesses is the binomial probability with n = 3 trials, x = 2 correct guesses, and p = 0.2 probability of a correct guess.
\[ p(2) = \frac{3!}{2!\,1!} (0.2)^2 (0.8)^1 = 3(0.04)(0.8) = 0.096 \]

Example: Are Women Passed Over for Managerial Training?
Example: Presence of bias in promotion.
• Large supermarket in Florida.
• Group of women claimed that female employees were passed over for management training.

Example: Are Women Passed Over for Managerial Training?
Large employee pool of more than 1000 people.
Half the employees are male; half are female.
None of the 10 employees chosen for management training were female.

Example: Are Women Passed Over for Managerial Training?
How can we investigate statistically the women’s assertion of gender bias?

Example: Are Women Passed Over for Managerial Training?
If the employees are selected randomly in terms of gender, about half of the employees picked should be females and about half should be males.

Example: Are Women Passed Over for Managerial Training?
Due to ordinary sampling variation, it need not happen that exactly 50% of those selected are females.

Example: Are Women Passed Over for Managerial Training?
If employees were actually selected at random for the training, what are the chances that none of the 10 employees selected were females?
Example: Are Women Passed Over for Managerial Training?
The probability that no females are chosen equals:
\[ p(0) = \frac{10!}{0!\,10!} (0.50)^0 (0.50)^{10} = 0.001 \]

Example: Are Women Passed Over for Managerial Training?
It is very unlikely (one chance in a thousand) that none of the 10 selected for management training would be female.

Do the Binomial Conditions Apply?
Before you use the binomial distribution, check that its three conditions apply:
• Binary data (success or failure).
• The same probability of success for each trial (denoted by p).
• Independent trials.

Do the Binomial Conditions Apply to the Managerial Training Example?
The data are binary.
If employees are selected randomly, the probability of selecting a female on a given trial is 0.50.
With random sampling from a large population, outcomes from trials are independent.

Binomial Mean and Standard Deviation
The binomial probability distribution for n trials with probability p of success on each trial has mean µ and standard deviation σ given by:
\[ \mu = np, \quad \sigma = \sqrt{np(1 - p)} \]

Example: How Can We Check for Racial Profiling?
Study conducted by the American Civil Liberties Union.
Study analyzed whether African-American drivers were more likely than others in the population to be targeted by police for traffic stops.

Example: How Can We Check for Racial Profiling?
Data:
• 262 police car stops in Philadelphia in 1997.
• 207 of the drivers stopped were African-American.
• In 1997, Philadelphia’s population was 42.2% African-American.

Example: How Can We Check for Racial Profiling?
Does the number of African-Americans stopped suggest possible bias, being higher than we would expect (other things being equal, such as the rate of violating traffic laws)?

Example: How Can We Check for Racial Profiling?
Assume:
• 262 car stops represent n = 262 trials.
• Successive police car stops are independent.
• P(driver is African-American) is p = 0.422.

Example: How Can We Check for Racial Profiling?
Calculate the mean and standard deviation of this binomial distribution:
\[ \mu = np, \quad \sigma = \sqrt{np(1 - p)} \]

Example: How Can We Check for Racial Profiling?
\[ \mu = 262(0.422) \approx 111 \]
\[ \sigma = \sqrt{262(0.422)(0.578)} \approx 8 \]

Example: How Can We Check for Racial Profiling?
Recall: Empirical Rule
• When a distribution is bell-shaped, nearly 100% of it falls within 3 standard deviations of the mean.

Example: How Can We Check for Racial Profiling?
\[ \mu - 3\sigma = 111 - 3(8) = 87 \]
\[ \mu + 3\sigma = 111 + 3(8) = 135 \]

Example: How Can We Check for Racial Profiling?
If no racial profiling is happening, we would not be surprised if between about 87 and 135 of the 262 people stopped were African-American.
The actual number stopped (207) is well above these values.
The number of African-Americans stopped is too high, even taking into account random variation.

Example: How Can We Check for Racial Profiling?
Limitation of the analysis:
• Different people do different amounts of driving, so we don’t really know that 42.2% of the potential stops were African-American.

When Is the Binomial Distribution Bell Shaped?
The binomial distribution has close to a symmetric, bell shape when the expected number of successes, np, and the expected number of failures, n(1 − p), are both at least 15.
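
The Section 6.3 calculations can be reproduced in a few lines. The following is a minimal Python sketch, not part of the original slides. It applies the binomial formula, the mean and standard deviation formulas, and the "at least 15" guideline to the numbers from the ESP, managerial training, and traffic stop examples. The function names `binomial_pmf`, `binomial_mean_sd`, and `is_roughly_bell_shaped` are illustrative assumptions.

```python
from math import comb, sqrt


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_mean_sd(n: int, p: float) -> tuple[float, float]:
    """Mean np and standard deviation sqrt(np(1 - p)) of a binomial distribution."""
    return n * p, sqrt(n * p * (1 - p))


def is_roughly_bell_shaped(n: int, p: float, threshold: float = 15) -> bool:
    """Guideline from the slides: np and n(1 - p) should both be at least 15."""
    return n * p >= threshold and n * (1 - p) >= threshold


# ESP experiment: 2 correct guesses in 3 trials with p = 0.2.
esp = binomial_pmf(2, 3, 0.2)

# Managerial training: no women among 10 selections with p = 0.5.
no_women = binomial_pmf(0, 10, 0.5)

# Traffic stops: n = 262 stops, p = 0.422.
mean, sd = binomial_mean_sd(262, 0.422)
low, high = mean - 3 * sd, mean + 3 * sd

print(f"ESP, two correct of three: {esp:.3f}")
print(f"No women among ten: {no_women:.4f}")
print(f"Traffic stops: mean {mean:.1f}, sd {sd:.2f}, range {low:.1f} to {high:.1f}")
print(f"Bell-shaped guideline met: {is_roughly_bell_shaped(262, 0.422)}")
```

Because this sketch does not round the mean and standard deviation before forming the range, its endpoints differ slightly from 87 and 135 but round to the same whole numbers. The observed count of 207 lies far above the upper endpoint either way.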
Section 6.4
How Likely Are the Possible Values of a Statistic? The Sampling Distribution

Statistic
Recall: A statistic is a numerical summary of sample data, such as a sample proportion or a sample mean.

Parameter
Recall: A parameter is a numerical summary of a population, such as a population proportion or a population mean.

Statistics and Parameters
In practice, we seldom know the values of parameters.
Parameters are estimated using sample data.
We use statistics to estimate parameters.

Example: 2003 California Recall Election
Prior to counting the votes, the proportion in favor of recalling Governor Gray Davis was an unknown parameter.
An exit poll of 3160 voters reported that the sample proportion in favor of a recall was 0.54.

Example: 2003 California Recall Election
If a different random sample of about 3000 voters were selected, a different sample proportion would occur.

Example: 2003 California Recall Election
Imagine all the distinct samples of 3000 voters you could possibly get.
Each such sample has a value for the sample proportion.

Statistics and Parameters
How do we know that a sample statistic is a good estimate of a population parameter?
To answer this, we need to look at a probability distribution called the sampling distribution.

Sampling Distribution
The sampling distribution of a statistic is the probability distribution that specifies probabilities for the possible values the statistic can take.

The Sampling Distribution of the Sample Proportion
Look at each possible sample.
Find the sample proportion for each sample.
Construct the frequency distribution of the sample proportion values.
This frequency distribution is the sampling distribution of the sample proportion.

Example: Sampling Distribution
Which Brand of Pizza Do You Prefer?
• Two choices: A or D.
• Assume that half of the population prefers Brand A and half prefers Brand D.
• Take a random sample of n = 3 tasters.

Example: Sampling Distribution
Sample     No. Prefer Pizza A   Proportion
(A,A,A)    3                    1
(A,A,D)    2                    2/3
(A,D,A)    2                    2/3
(D,A,A)    2                    2/3
(A,D,D)    1                    1/3
(D,A,D)    1                    1/3
(D,D,A)    1                    1/3
(D,D,D)    0                    0

Example: Sampling Distribution
Sample Proportion   Probability
0                   1/8
1/3                 3/8
2/3                 3/8
1                   1/8

Mean and Standard Deviation of the Sampling Distribution of a Proportion
For a binomial random variable with n trials and probability p of success for each, the sampling distribution of the proportion of successes has:
\[ \text{mean} = p \quad \text{and} \quad \text{standard deviation} = \sqrt{\frac{p(1 - p)}{n}} \]
To obtain these values, take the mean \( np \) and standard deviation \( \sqrt{np(1 - p)} \) for the binomial distribution of the number of successes and divide by n.

Example: 2003 California Recall Election
Sample: Exit poll of 3160 voters.
Suppose that exactly 50% of the population of all voters voted in favor of the recall.

Example: 2003 California Recall Election
Describe the mean and standard deviation of the sampling distribution of the number in the sample who voted in favor of the recall.
• \( \mu = np = 3160(0.50) = 1580 \)
• \( \sigma = \sqrt{np(1 - p)} = \sqrt{3160(0.50)(0.50)} = 28.1 \)

Example: 2003 California Recall Election
Describe the mean and standard deviation of the sampling distribution of the proportion in the sample who voted in favor of the recall.
\[ \text{Mean} = p = 0.50 \]
\[ \text{Standard deviation} = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{(0.50)(0.50)}{3160}} = \sqrt{0.000079} = 0.0089 \]

The Standard Error
To distinguish the standard deviation of a sampling distribution from the standard deviation of an ordinary probability distribution, we refer to it as a standard error.

Example: 2003 California Recall Election
If the population proportion supporting recall was 0.50, would it have been unlikely to observe the exit-poll sample proportion of 0.54?
Based on your answer, would you be willing to predict that Davis would be recalled from office?

Example: 2003 California Recall Election
Fact: The sampling distribution of the sample proportion has a bell shape with a mean µ = 0.50 and a standard deviation σ = 0.0089.

Example: 2003 California Recall Election
Convert the sample proportion value of 0.54 to a z-score:
\[ z = \frac{0.54 - 0.50}{0.0089} = 4.5 \]

Example: 2003 California Recall Election
The sample proportion of 0.54 is more than four standard errors from the expected value of 0.50.
The sample proportion of 0.54 voting for recall would be very unlikely if the population support were p = 0.50.

Example: 2003 California Recall Election
A sample proportion of 0.54 would be even more unlikely if the population support were less than 0.50.
We therefore have strong evidence that the population support was larger than 0.50.
The exit poll gives strong evidence that Governor Davis would be recalled.
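
Both Section 6.4 examples can be reproduced directly. The following is a minimal Python sketch, not part of the original slides. It builds the exact sampling distribution for the pizza example by listing every possible sample of three tasters, then computes the standard error and z-score for the recall exit poll. The function names `sampling_distribution_of_proportion` and `standard_error`, and the coding of a preference for Brand A as 1, are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product
from math import sqrt


def sampling_distribution_of_proportion(n: int, p: Fraction) -> dict:
    """Exact sampling distribution of the sample proportion, by enumeration.

    Each sample is a tuple of 1s (success) and 0s (failure). Enumeration is
    only practical for small n, as in the pizza example.
    """
    distribution = {}
    for sample in product((1, 0), repeat=n):
        successes = sum(sample)
        probability = p**successes * (1 - p) ** (n - successes)
        proportion = Fraction(successes, n)
        distribution[proportion] = distribution.get(proportion, 0) + probability
    return distribution


def standard_error(p: float, n: int) -> float:
    """Standard error of a sample proportion: sqrt(p(1 - p) / n)."""
    return sqrt(p * (1 - p) / n)


# Pizza example: n = 3 tasters, half the population prefers Brand A.
pizza = sampling_distribution_of_proportion(3, Fraction(1, 2))
for proportion in sorted(pizza):
    print(f"proportion {proportion}: probability {pizza[proportion]}")

# Recall exit poll: n = 3160, supposing p = 0.50, observed proportion 0.54.
se = standard_error(0.50, 3160)
z = (0.54 - 0.50) / se
print(f"standard error {se:.4f}, z-score {z:.2f}")
```

Using `Fraction` keeps the pizza probabilities as exact eighths, so they can be compared with the table entry by entry. The z-score here uses the unrounded standard error, so it comes out marginally below the 4.5 obtained with 0.0089.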
Summary of the Sampling Distribution of a Proportion
For a random sample of size n from a population with proportion p, the sampling distribution of the sample proportion has
\[ \text{mean} = p \quad \text{and} \quad \text{standard error} = \sqrt{\frac{p(1 - p)}{n}} \]
If n is sufficiently large such that the expected numbers of outcomes of the two types, np and n(1 − p), are both at least 15, then this sampling distribution has a bell shape.

Section 6.5
How Close Are Sample Means to Population Means?

The Sampling Distribution of the Sample Mean
The sample mean, \( \bar{x} \), is a random variable.
The sample mean varies from sample to sample.
By contrast, the population mean, µ, is a single fixed number.

Mean and Standard Error of the Sampling Distribution of the Sample Mean
For a random sample of size n from a population having mean µ and standard deviation σ, the sampling distribution of the sample mean has:
• Center described by the mean µ (the same as the mean of the population).
• Spread described by the standard error, which equals the population standard deviation divided by the square root of the sample size: \( \sigma / \sqrt{n} \)

Example: How Much Do Mean Sales Vary From Week to Week?
Daily sales at a pizza restaurant vary from day to day.
The sales figures fluctuate around a mean µ = $900 with a standard deviation σ = $300.

Example: How Much Do Mean Sales Vary From Week to Week?
The mean sales for the seven days in a week are computed each week.
The weekly means are plotted over time.
These weekly means form a sampling distribution.

Example: How Much Do Mean Sales Vary From Week to Week?
What are the center and spread of the sampling distribution?
\[ \mu = \$900, \quad \frac{\sigma}{\sqrt{n}} = \frac{300}{\sqrt{7}} = \$113 \]

Sampling Distribution vs.
Population Distribution

Standard Error
Knowing how to find a standard error gives us a mechanism for understanding how much variability to expect in sample statistics “just by chance.”

Standard Error
The standard error of the sample mean: \( \sigma / \sqrt{n} \)
As the sample size n increases, the denominator increases, so the standard error decreases.
With larger samples, the sample mean is more likely to fall close to the population mean.

Central Limit Theorem
Question: How does the sampling distribution of the sample mean relate, with respect to shape, center, and spread, to the probability distribution from which the samples were taken?

Central Limit Theorem
For random sampling with a large sample size n, the sampling distribution of the sample mean is approximately a normal distribution.
This result applies no matter what the shape of the probability distribution from which the samples are taken.

Central Limit Theorem: How Large a Sample?
The sampling distribution of the sample mean takes more of a bell shape as the random sample size n increases.
The more skewed the population distribution, the larger n must be before the shape of the sampling distribution is close to normal.
In practice, the sampling distribution is usually close to normal when the sample size n is at least about 30.

A Normal Population Distribution and the Sampling Distribution
If the population distribution is approximately normal, then the sampling distribution is approximately normal for all sample sizes.

How Does the Central Limit Theorem Help Us Make Inferences?
For large n, the sampling distribution is approximately normal even if the population distribution is not.
This enables us to make inferences about population means regardless of the shape of the population distribution.

Section 6.6
How Can We Make Inferences About a Population?

Three Distinct Types of Distributions
Population Distribution
Data Distribution
Sampling Distribution

Population Distribution
This is the probability distribution from which we take the sample.
Values of its parameters, such as the population proportion p and the population mean µ, are usually unknown.

Data Distribution
This is the distribution of the sample data.
It’s described by sample statistics, such as a sample proportion or a sample mean.
With random sampling, the larger the sample size n, the more closely it resembles the population distribution.
With larger n, the higher the probability that a sample statistic falls close to the population parameter.

Sampling Distribution
This is the probability distribution of a sample statistic, such as a sample proportion or sample mean.
It provides the key for telling us how close a sample statistic falls to the unknown parameter.
For large n, the sampling distribution is approximately a normal distribution.
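
The central limit theorem and the standard error formula σ/√n can be illustrated by simulation. The following is a rough Python sketch, not part of the original slides. The exponential population (mean 1, standard deviation 1), the sample size of 30, the 5000 replicates, the seed, and the names `simulate_sample_means` and `standard_error_of_mean` are all assumptions chosen for illustration. The exponential population was picked only because it is strongly skewed.

```python
import random
from math import sqrt
from statistics import mean, stdev


def simulate_sample_means(n: int, replicates: int, rng: random.Random) -> list:
    """Draw `replicates` random samples of size n and return their sample means.

    Assumption: the population is exponential with mean 1 and standard
    deviation 1, a right-skewed distribution chosen only for illustration.
    """
    return [
        sum(rng.expovariate(1.0) for _ in range(n)) / n
        for _ in range(replicates)
    ]


def standard_error_of_mean(sigma: float, n: int) -> float:
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 30
sample_means = simulate_sample_means(n, replicates=5000, rng=rng)

theory_se = standard_error_of_mean(1.0, n)
observed_center = mean(sample_means)
observed_se = stdev(sample_means)

# Share of simulated sample means within 2 standard errors of the population mean.
within_2_se = sum(abs(m - 1.0) <= 2 * theory_se for m in sample_means) / len(sample_means)

# Pizza sales example from Section 6.5: sigma = 300, n = 7 days.
pizza_se = standard_error_of_mean(300, 7)

print(f"center of sample means: {observed_center:.3f} (population mean 1)")
print(f"spread of sample means: {observed_se:.3f} (theory {theory_se:.3f})")
print(f"share within 2 standard errors: {within_2_se:.3f}")
print(f"pizza sales standard error: {pizza_se:.0f}")
```

Even though the simulated population is skewed, the sample means should center near the population mean, spread out by roughly σ/√n, and fall within two standard errors about 95% of the time, as a normal sampling distribution would suggest.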