STAT 250
Dr. Kari Lock Morgan

Normal Distribution

Chapter 5
• Normal distribution
• Central limit theorem
• Normal distribution for confidence intervals
• Normal distribution for p-values
• Standard normal

Statistics: Unlocking the Power of Data Lock5

Bootstrap and Randomization Distributions
[Figure: six dot plots of bootstrap and randomization distributions]
• Correlation: Malevolent uniforms
• Slope: Restaurant tips
• Mean: Body temperatures
• Diff means: Finger taps
• Proportion: Owners/dogs
• Mean: Atlanta commutes
What do you notice? All bell-shaped distributions!

Normal Distribution
• The symmetric, bell-shaped curve we have seen for almost all of our bootstrap and randomization distributions is called a normal distribution
[Figure: histogram of a normal distribution; horizontal axis from -3 to 3, vertical axis Frequency]

Central Limit Theorem!
For a sufficiently large sample size, the distribution of sample statistics for a mean or a proportion is approximately normal
www.lock5stat.com/StatKey

Distribution of $\hat{p}$
[Figure: grid of sampling distributions of $\hat{p}$ for n = 1, 10, 30, 50, 100 and p = 0.1, 0.5, 0.7; each horizontal axis runs from 0.0 to 1.0]

CLT for a Mean
[Figure: a population distribution, shown alongside the distribution of sample data and the distribution of sample means for n = 10, 30, 50]

Central Limit Theorem
• The central limit theorem holds for ANY original distribution, although “sufficiently large sample size” varies
• The more skewed the original distribution is (the farther from normal), the larger the sample size has to be for the CLT to work
• For small samples, it is more important that the data itself is approximately normal

Accuracy
• The accuracy of intervals and p-values generated using simulation methods (bootstrapping and randomization) depends on the number of simulations (more simulations = more accurate)
• The accuracy of intervals and p-values generated using formulas and the normal distribution depends on the sample size (larger sample size = more accurate)
• If the distribution of the statistic is truly normal and you have generated many simulated randomizations, the p-values from the two approaches should be very close

Normal Distribution
• The normal distribution is fully characterized by its mean and standard deviation
$N(\text{mean}, \text{standard deviation})$

Bootstrap Distributions
If a bootstrap distribution is approximately normally
distributed, we can write it as
a) N(parameter, sd)
b) N(statistic, sd)
c) N(parameter, se)
d) N(statistic, se)
sd = standard deviation of variable
se = standard error = standard deviation of statistic

Hearing Loss
• In a random sample of 1771 Americans aged 12 to 19, 19.5% had some hearing loss (this is a dramatic increase from a decade ago!)
• What proportion of Americans aged 12 to 19 have some hearing loss? Give a 95% CI.
Rabin, R. “Childhood: Hearing Loss Grows Among Teenagers,” www.nytimes.com, 8/23/10.

Hearing Loss
(0.177, 0.214)

Hearing Loss
N(0.195, 0.0095)

Confidence Intervals
If the bootstrap distribution is normal:
To find a P% confidence interval, we just need to find the middle P% of the distribution N(statistic, SE)
www.lock5stat.com/statkey

Hearing Loss
www.lock5stat.com/statkey
(0.176, 0.214)

Randomization Distributions
If a randomization distribution is approximately normally distributed, we can write it as
a) N(null value, se)
b) N(statistic, se)
c) N(parameter, se)

p-values
If the randomization distribution is normal:
To calculate a p-value, we just need to find the area in the appropriate tail(s) beyond the observed statistic of the distribution N(___, ___)

First Born Children
• Are first born children actually smarter?
• Explanatory variable: first born or not
• Response variable: combined SAT score
• Based on a sample of college students, we find $\bar{x}_{\text{first born}} - \bar{x}_{\text{not first born}} = 30.26$
• From a randomization distribution, we find SE = 37

First Born Children
$\bar{x}_{\text{first born}} - \bar{x}_{\text{not first born}} = 30.26$, SE = 37
What normal distribution should we use to find the p-value?
a) N(30.26, 37)
b) N(37, 30.26)
c) N(0, 37)
d) N(0, 30.26)

Hypothesis Testing
[Figure: distribution of the statistic assuming the null, on an axis from -3 to 3, with the observed statistic and the p-value labeled]

First Born Children
N(0, 37)
www.lock5stat.com/statkey
p-value = 0.207

Standardized Data
• Often, we standardize the data to have mean 0 and standard deviation 1
• This is done with z-scores
From x to z: $z = \dfrac{x - \text{mean}}{\text{sd}}$
From z to x: $x = \text{mean} + z \times \text{sd}$
• Places everything on a common scale

Standard Normal
• The standard normal distribution is the normal distribution with mean 0 and standard deviation 1
$N(0, 1)$
[Figure: standard normal curve; horizontal axis from -3 to 3]

Standardized Data
• Confidence Interval (bootstrap distribution): mean = sample statistic, sd = SE
From z to x (CI): $x = \text{mean} + z \times \text{sd}$
Bootstrap Distribution: N(statistic, SE)
$x = \text{statistic} + z \times \text{SE}$

P% Confidence Interval
1. Find z-scores (-z* and z*) that capture the middle P% of the standard normal
2.
Return to original scale with $\text{statistic} \pm z^* \times \text{SE}$
[Figure: standard normal curve with the middle P% marked between -z* and z*]

Confidence Interval using N(0,1)
If a statistic is normally distributed, we find a confidence interval for the parameter using
$\text{statistic} \pm z^* \times \text{SE}$
where the area between -z* and +z* in the standard normal distribution is the desired level of confidence.

Confidence Intervals
Find z* for a 99% confidence interval.
www.lock5stat.com/statkey
z* = 2.576

z*
• Why use the standard normal?
• z* is always the same, regardless of the data!
• Common confidence levels:
  • 95%: z* = 1.96 (but 2 is close enough)
  • 90%: z* = 1.645
  • 99%: z* = 2.576

Sin Taxes
In March 2011, a random sample of 1000 US adults was asked “Do you favor or oppose ‘sin taxes’ on soda and junk food?” 320 adults responded in favor of sin taxes. Give a 99% CI for the proportion of all US adults who favor these sin taxes.
From a bootstrap distribution, we find SE = 0.015

Sin Taxes
[Figure: two solution slides; content not recoverable]

Standardized Data
• Hypothesis test (randomization distribution): mean = null value, sd = SE
From x to z (test): $z = \dfrac{x - \text{mean}}{\text{sd}}$
Randomization Distribution: N(null value, SE)
$z = \dfrac{x - \text{null}}{\text{SE}}$

p-value using N(0,1)
If a statistic is normally distributed under H0, the p-value is the probability a standard normal is beyond
$z = \dfrac{\text{sample statistic} - \text{null parameter}}{\text{SE}}$

First Born Children
$\bar{x}_{\text{first born}} - \bar{x}_{\text{not first born}} = 30.26$, SE = 37
1) Find the standardized test statistic
2) Compute the p-value

z-statistic
If z = -3, using α = 0.05 we would
(a) Reject the null
(b) Not reject the null
(c) Impossible to tell
(d) I have no idea

z-statistic
• Calculating the number of standard errors a statistic is from the null value allows us to assess extremity on a common scale

Confidence Interval Formula
IF SAMPLE SIZES ARE LARGE…
$\text{sample statistic} \pm z^* \times \text{SE}$
• sample statistic: from original data
• z*: from N(0,1)
• SE: from bootstrap distribution

Formula for p-values
IF SAMPLE SIZES ARE LARGE…
$z = \dfrac{\text{sample statistic} - \text{null value}}{\text{SE}}$
• sample statistic: from original data
• null value: from H0
• SE: from randomization distribution
• Compare z to N(0,1) for p-value

Standard Error
• Wouldn’t it be nice if we could compute the standard error without doing thousands of simulations?
• We can!!!
• Or at least we’ll be able to next class…

To Do
• Read Chapter 5
• Do HW 5 (due Friday, 4/3)
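
Worked check (not part of the original slides)
The two formulas from this deck, statistic ± z* × SE for a confidence interval and z = (statistic - null value) / SE for a p-value, can be tried on the Hearing Loss, Sin Taxes, and First Born examples without StatKey. The sketch below is one possible way to do that using Python's standard-library `statistics.NormalDist`; the function names are hypothetical, and the First Born p-value is taken as an upper-tail area, an assumption chosen because it reproduces the 0.207 shown on the slide.

```python
from statistics import NormalDist

# Standard normal N(0, 1), the common scale used throughout the slides.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def z_star(confidence):
    """Return z* so that the area between -z* and +z* under N(0, 1) is `confidence`."""
    return STANDARD_NORMAL.inv_cdf(0.5 + confidence / 2)


def normal_ci(statistic, se, confidence):
    """Confidence interval: statistic +/- z* x SE."""
    margin = z_star(confidence) * se
    return statistic - margin, statistic + margin


def z_statistic(statistic, null_value, se):
    """Number of standard errors the statistic lies from the null value."""
    return (statistic - null_value) / se


def upper_tail_p_value(z):
    """Area under N(0, 1) to the right of z (one-sided alternative; an assumption)."""
    return 1.0 - STANDARD_NORMAL.cdf(z)


# Hearing Loss: statistic = 0.195, SE = 0.0095, 95% CI.
hearing_low, hearing_high = normal_ci(0.195, se=0.0095, confidence=0.95)

# Sin Taxes: 320 of 1000 in favor, SE = 0.015 from the bootstrap distribution, 99% CI.
p_hat = 320 / 1000
sin_low, sin_high = normal_ci(p_hat, se=0.015, confidence=0.99)

# First Born Children: difference in means 30.26, null value 0, SE = 37.
z = z_statistic(30.26, null_value=0.0, se=37.0)
p_value = upper_tail_p_value(z)

print(f"z* for 99%: {z_star(0.99):.3f}")
print(f"Hearing Loss 95% CI: ({hearing_low:.3f}, {hearing_high:.3f})")
print(f"Sin Taxes 99% CI: ({sin_low:.3f}, {sin_high:.3f})")
print(f"First Born z = {z:.3f}, upper-tail p-value = {p_value:.3f}")
```

For a two-sided alternative, the tail area would be doubled; the slides leave the choice of tail(s) to the hypotheses being tested.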