STAT 250
Dr. Kari Lock Morgan
Normal Distribution
Chapter 5
• Normal distribution
• Central limit theorem
• Normal distribution for confidence intervals
• Normal distribution for p-values
• Standard normal
Statistics: Unlocking the Power of Data
Lock5
Bootstrap and Randomization Distributions
[Figure: six dot plots of bootstrap and randomization distributions. Panels: Correlation: Malevolent uniforms (r); Slope: Restaurant tips (slope, thousandths); Mean: Body temperatures (Nullxbar); Diff means: Finger taps (Diff); Proportion: Owners/dogs (phat); Mean: Atlanta commutes (xbar)]
What do you notice?
All bell-shaped distributions!
Normal Distribution
• The symmetric, bell-shaped curve we have
seen for almost all of our bootstrap and
randomization distributions is called a
normal distribution
[Figure: bell-shaped histogram; horizontal axis from -3 to 3, vertical axis: Frequency]
Central Limit Theorem!
For a sufficiently large sample
size, the distribution of sample
statistics for a mean or a
proportion is normal
www.lock5stat.com/StatKey
Distribution of p̂
[Figure: grid of distributions of the sample proportion for n = 1, 10, 30, 50, 100 and p = 0.1, 0.5, 0.7; horizontal axes from 0.0 to 1.0]
CLT for a Mean
[Figure: histogram of the Population (x), followed by panels for n = 10, n = 30, and n = 50, each showing the Distribution of Sample Data and the Distribution of Sample Means; vertical axes: Frequency]
Central Limit Theorem
• The central limit theorem holds for ANY
original distribution, although “sufficiently large
sample size” varies
• The more skewed the original distribution is
(the farther from normal), the larger the sample
size has to be for the CLT to work
• For small samples, it is more important that the
data itself is approximately normal
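The effect of sample size on a skewed population can be checked with a short simulation. This is only a sketch: the exponential population, the seed, the number of repetitions, and the helper names are all illustrative assumptions, not part of the course materials.

```python
import random
import statistics


def sample_skewness(values):
    """Moment-based skewness: the mean cubed z-score (near 0 for a symmetric shape)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


def simulate_sample_means(n, reps, rng):
    """Draw `reps` samples of size n from a right-skewed population; return their means."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]


rng = random.Random(250)  # fixed seed (arbitrary) so the run is repeatable
skew_by_n = {}
for n in (2, 10, 50):
    means = simulate_sample_means(n, reps=5000, rng=rng)
    skew_by_n[n] = sample_skewness(means)
    print(f"n = {n:2d}: skewness of sample means = {skew_by_n[n]:.2f}")
```

The skewness of the sample means should shrink toward 0 as n grows, which is the CLT at work: the distribution of sample means becomes more symmetric and bell-shaped even though the population is strongly skewed.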
Accuracy
• The accuracy of intervals and p-values generated
using simulation methods (bootstrapping and
randomization) depends on the number of simulations
(more simulations = more accurate)
• The accuracy of intervals and p-values generated
using formulas and the normal distribution depends on
the sample size (larger sample size = more accurate)
• If the distribution of the statistic is truly normal and
you have generated many simulated randomizations,
the p-values from the two approaches should be very close
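A rough illustration of the first and last points, under loud assumptions: instead of a real randomization distribution, the simulated statistics here are drawn directly from N(0, 1), and the observed value 1.5 is hypothetical. The sketch compares the simulated tail proportion with the exact normal tail area.

```python
import random
from statistics import NormalDist

observed_z = 1.5  # hypothetical observed statistic, already on the z scale
exact_p = 1 - NormalDist().cdf(observed_z)  # upper-tail area under N(0, 1)

rng = random.Random(5)  # fixed seed (arbitrary) so the run is repeatable


def simulated_p_value(num_sims):
    """Proportion of simulated statistics at least as large as the observed one."""
    hits = sum(rng.gauss(0, 1) >= observed_z for _ in range(num_sims))
    return hits / num_sims


p_small = simulated_p_value(100)
p_large = simulated_p_value(20_000)
print(f"normal area:        {exact_p:.4f}")
print(f"100 simulations:    {p_small:.4f}")
print(f"20,000 simulations: {p_large:.4f}")
```

With only 100 simulations the estimate can be off by a few percentage points; with 20,000 it should land close to the normal tail area.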
Normal Distribution
• The normal distribution is fully
characterized by its mean and standard
deviation
N(mean, standard deviation)
Bootstrap Distributions
If a bootstrap distribution is
approximately normally distributed, we
can write it as
a) N(parameter, sd)
b) N(statistic, sd)
c) N(parameter, se)
d) N(statistic, se)
sd = standard deviation of variable
se = standard error = standard deviation of statistic
Hearing Loss
• In a random sample of 1771 Americans aged 12
to 19, 19.5% had some hearing loss (this is a
dramatic increase from a decade ago!)
• What proportion of Americans aged 12 to 19
have some hearing loss? Give a 95% CI.
Rabin, R. “Childhood: Hearing Loss Grows Among Teenagers,” www.nytimes.com, 8/23/10.
Hearing Loss
(0.177, 0.214)
Hearing Loss
N(0.195, 0.0095)
Confidence Intervals
If the bootstrap distribution is normal:
To find a P% confidence interval, we just
need to find the middle P% of the
distribution
N(statistic, SE)
www.lock5stat.com/statkey
Hearing Loss
www.lock5stat.com/statkey
(0.176, 0.214)
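The same interval can be reproduced outside StatKey. A minimal sketch using Python's standard library, assuming the bootstrap distribution is well approximated by N(0.195, 0.0095); the function name is illustrative.

```python
from statistics import NormalDist


def normal_percentile_interval(statistic, se, level):
    """Middle `level` of N(statistic, se), returned as (lower, upper)."""
    tail = (1 - level) / 2
    dist = NormalDist(mu=statistic, sigma=se)
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)


low, high = normal_percentile_interval(0.195, 0.0095, 0.95)
print(f"({low:.3f}, {high:.3f})")  # prints (0.176, 0.214)
```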
Randomization Distributions
If a randomization distribution is
approximately normally distributed, we
can write it as
a) N(null value, se)
b) N(statistic, se)
c) N(parameter, se)
p-values
If the randomization distribution is
normal:
To calculate a p-value, we just need to
find the area in the appropriate tail(s)
beyond the observed statistic of the
distribution
N( , )
First Born Children
• Are first born children actually smarter?
• Explanatory variable: first born or not
• Response variable: combined SAT score
• Based on a sample of college students, we
find x̄(first born) − x̄(not first born) = 30.26
• From a randomization distribution, we find
SE = 37
First Born Children
x̄(first born) − x̄(not first born) = 30.26
SE = 37
What normal distribution should we use
to find the p-value?
a) N(30.26, 37)
b) N(37, 30.26)
c) N(0, 37)
d) N(0, 30.26)
Hypothesis Testing
[Figure: Distribution of Statistic Assuming Null; horizontal axis "Statistic" from -3 to 3, with the observed statistic and the p-value marked]
First Born Children
N(0, 37)
www.lock5stat.com/statkey
p-value = 0.207
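The p-value above is the area under N(0, 37) to the right of the observed difference. A small sketch with Python's standard library, assuming an upper-tail alternative (first-born children score higher), which is what the 0.207 on the slide corresponds to:

```python
from statistics import NormalDist

null_dist = NormalDist(mu=0, sigma=37)  # N(null value, SE)
observed = 30.26                        # observed difference in means

p_upper = 1 - null_dist.cdf(observed)   # area in the upper tail
print(round(p_upper, 3))                # prints 0.207
```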
Standardized Data
• Often, we standardize the data to have mean 0
and standard deviation 1
• This is done with z-scores
From x to z: z = (x − mean) / sd
From z to x: x = mean + z × sd
• Places everything on a common scale
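The two conversions are inverses of each other, which a few lines of Python make concrete. The function names and the example numbers (mean 100, sd 15) are hypothetical, chosen only for illustration.

```python
def to_z(x, mean, sd):
    """From x to z: how many standard deviations x lies from the mean."""
    return (x - mean) / sd


def to_x(z, mean, sd):
    """From z to x: return to the original scale."""
    return mean + z * sd


z = to_z(130, mean=100, sd=15)
print(z)                          # prints 2.0
print(to_x(z, mean=100, sd=15))   # prints 130.0
```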
Standard Normal
• The standard normal distribution is
the normal distribution with mean 0 and
standard deviation 1
N(0, 1)
[Figure: standard normal curve; horizontal axis from -3 to 3]
Standardized Data
• Confidence Interval (bootstrap distribution):
mean = sample statistic, sd = SE
Bootstrap Distribution: N(statistic, SE)
From z to x (CI): x = mean + z × sd
x = statistic + z × SE
P% Confidence Interval
1. Find z-scores (–z* and z*) that capture
the middle P% of the standard normal
2. Return to original scale with
statistic ± z* × SE
[Figure: standard normal curve with the middle P% of the area between -z* and z*]
Confidence Interval using N(0,1)
If a statistic is normally distributed, we find a
confidence interval for the parameter using
statistic ± z* × SE
where the area between –z* and +z* in the
standard normal distribution is the desired
level of confidence.
Confidence Intervals
Find z* for a 99% confidence interval.
www.lock5stat.com/statkey
z* = 2.576
z*
• Why use the standard normal?
• z* is always the same, regardless of the data!
• Common confidence levels:
  ◦ 95%: z* = 1.96 (but 2 is close enough)
  ◦ 90%: z* = 1.645
  ◦ 99%: z* = 2.576
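These values do not have to be memorized or looked up in StatKey. A short sketch using Python's standard library (the helper name `z_star` is illustrative): the middle P% leaves (1 − P)/2 in each tail, so z* is the (1 + P)/2 quantile of N(0, 1).

```python
from statistics import NormalDist


def z_star(level):
    """z* such that the area between -z* and +z* under N(0, 1) equals `level`."""
    return NormalDist().inv_cdf((1 + level) / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z* = {z_star(level):.3f}")
# prints:
# 90%: z* = 1.645
# 95%: z* = 1.960
# 99%: z* = 2.576
```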
Sin Taxes
In March 2011, a random sample of 1000 US
adults were asked
“Do you favor or oppose ‘sin taxes’ on soda and
junk food?”
320 adults responded in favor of sin taxes.
Give a 99% CI for the proportion of all US adults
that favor these sin taxes.
From a bootstrap distribution,
we find SE = 0.015
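One way to work this out, as a sketch rather than the official solution: apply statistic ± z* × SE with the sample proportion 320/1000, the given SE of 0.015, and z* for 99% confidence. The variable names are illustrative.

```python
from statistics import NormalDist

p_hat = 320 / 1000                     # sample proportion in favor
se = 0.015                             # standard error from the bootstrap distribution
z_star = NormalDist().inv_cdf(0.995)   # 99% confidence leaves 0.5% in each tail

low = p_hat - z_star * se
high = p_hat + z_star * se
print(f"({low:.3f}, {high:.3f})")      # prints (0.281, 0.359)
```

Under these assumptions, the interval says we are 99% confident that between about 28.1% and 35.9% of all US adults favor these sin taxes.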
Standardized Data
• Hypothesis test (randomization distribution):
mean = null value, sd = SE
Randomization Distribution: N(null value, SE)
From x to z (test): z = (x − mean) / sd
z = (x − null) / SE
p-value using N(0,1)
If a statistic is normally distributed under H0,
the p-value is the probability a standard normal
is beyond
z = (sample statistic − null parameter) / SE
First Born Children
x̄(first born) − x̄(not first born) = 30.26, SE = 37
1) Find the standardized test statistic
2) Compute the p-value
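A possible way to carry out the two steps, sketched with Python's standard library. It assumes a null value of 0 (no difference in means) and an upper-tail alternative, which matches the p-value found earlier from N(0, 37).

```python
from statistics import NormalDist

sample_statistic = 30.26   # observed difference in mean SAT scores
null_value = 0             # assumed: H0 says there is no difference
se = 37                    # standard error from the randomization distribution

# 1) Standardized test statistic
z = (sample_statistic - null_value) / se

# 2) p-value: area under N(0, 1) beyond z (upper tail assumed)
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.3f}, p-value = {p_value:.3f}")  # prints z = 0.818, p-value = 0.207
```

Standardizing first and then using N(0, 1) gives the same p-value as using N(0, 37) directly, because the z-score only rescales the axis.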
z-statistic
If z = –3, using α = 0.05 we would
(a) Reject the null
(b) Not reject the null
(c) Impossible to tell
(d) I have no idea
z-statistic
• Calculating the number of standard
errors a statistic is from the null value
allows us to assess extremity on a
common scale
Confidence Interval Formula
IF SAMPLE SIZES ARE LARGE…
sample statistic ± z* × SE
• sample statistic: from original data
• z*: from N(0,1)
• SE: from bootstrap distribution
Formula for p-values
IF SAMPLE SIZES ARE LARGE…
z = (sample statistic − null value) / SE
• sample statistic: from original data
• null value: from H0
• SE: from randomization distribution
Compare z to N(0,1) for p-value
Standard Error
• Wouldn’t it be nice if we could compute
the standard error without doing
thousands of simulations?
• We can!!!
• Or at least we’ll be able to next class…
To Do
• Read Chapter 5
• Do HW 5 (due Friday, 4/3)