z-score

Inference



We want to know how often students in a
medium-size college go to the mall in a given
year.
We interview an SRS of n = 10.
If we interviewed lots of SRSs, the “average
sample frequency of visits” would be
centered around the true “average population
frequency of visits.”
[Figure: 16 panels, one for each of 16 SRSs of n = 10.]
Sample means of the 16 samples:
49.6896, 49.7618, 50.0742, 49.8520
50.0590, 50.3243, 49.2806, 49.6056
50.4129, 49.3963, 49.3617, 49.7741
49.9237, 50.2201, 49.2904, 50.1797
Inference


Suppose that instead we interviewed an SRS
of n = 400.
Our estimates will be more reliable because
estimates from other SRSs would be similar
… that is, our estimates would be less
variable.
[Figure: 16 panels, one for each of 16 SRSs of n = 400.]
Sample means of the 16 samples:
49.9496, 50.0165, 50.0941, 50.0573
50.1674, 50.1402, 50.0506, 50.0838
49.9865, 50.0195, 49.9752, 49.9439
49.9738, 49.9966, 50.0396, 49.9819
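The contrast between n = 10 and n = 400 can be reproduced with a small simulation. This is only a sketch: it assumes the population is Normal with mean 50 and standard deviation 1 (the slides do not state the population), and the function name and seed are made up for illustration.

```python
import random
import statistics


def sample_means(n, num_samples=16, mu=50.0, sigma=1.0, seed=0):
    """Means of num_samples SRSs of size n from an ASSUMED Normal(mu, sigma) population."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(num_samples)
    ]


means_small = sample_means(n=10)    # 16 samples of 10 students each
means_large = sample_means(n=400)   # 16 samples of 400 students each

# The spread of the sample means is what "reliability" refers to here.
spread_small = statistics.stdev(means_small)
spread_large = statistics.stdev(means_large)

print(f"n = 10:  spread of the 16 sample means = {spread_small:.4f}")
print(f"n = 400: spread of the 16 sample means = {spread_large:.4f}")
```

The sample means from the larger samples should cluster much more tightly around 50.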


Because we didn’t have money for 16
separate samples, we actually only collected
data from the first sample, whose sample
mean is x̄ = 49.9496.
Is the true number actually 50? Is the
difference between 50 and 49.9496 purely a
fluke? Does this result exclude 50 as a
possibility?

The Central Limit Theorem says that if the
entire population has a mean μ and a
standard deviation σ, then in repeated
samples of size n the sample mean
approximately follows a Normal distribution

$$\bar{x} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$$

The first sample had a mean x̄ = 49.9496
and a standard deviation s_x = 1.0264.

Sample     n     Std. dev. of x̄ = s_x/√n
Sample A   10    0.324576 = 1.0264/sqrt(10)
Sample B   400   0.0513 = 1.0264/sqrt(400)
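The two rows of the table can be checked with a few lines of code. This is a minimal sketch; the function name `standard_error` is chosen here for illustration.

```python
import math


def standard_error(s, n):
    """Standard deviation of the sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)


se_a = standard_error(1.0264, 10)    # Sample A
se_b = standard_error(1.0264, 400)   # Sample B

print(f"Sample A (n = 10):  {se_a:.6f}")
print(f"Sample B (n = 400): {se_b:.4f}")
```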



For a Normal distribution, about 95% of all observations fall within
± two standard deviations of the mean.
Likewise, about 95% of all sample means fall within
± two standard deviations of x̄ (that is, 2 s_x/√n) of the true
population mean.
So, for about 1,900 out of 2,000 samples, the interval

$$\bar{x} \pm 2\,\frac{s_x}{\sqrt{n}}$$

will contain the true population
mean.
[Figure: interval extending 2 × (0.0513) on each side of the sample mean.]

Now there are two possibilities. Either
1. the true population mean is contained in the
interval

$$\left(49.9496 - 2\,\frac{1.0264}{\sqrt{400}},\; 49.9496 + 2\,\frac{1.0264}{\sqrt{400}}\right) = (49.8470,\ 50.0522)$$

2. or this is one of those 5% of samples whose
interval x̄ ± 2 s_x/√n does not contain the
true value.


The confidence level C is typically set at 95%, but it’s sometimes
chosen to be 90% or 99%.
If C = 95%, then z* = 1.96: the central 95% of the Standard Normal curve lies between −z* = −1.96 and z* = 1.96.
So don’t use 2 when constructing a 95% CI: use 1.96.
STATA Exercise 1
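The critical values for the usual confidence levels, and the 95% interval for the first sample, can be sketched as follows. The function names are illustrative; `NormalDist` comes from Python's standard `statistics` module.

```python
import math
from statistics import NormalDist


def z_star(confidence):
    """Critical value z* that leaves (1 - C)/2 in each tail of the Standard Normal."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def confidence_interval(xbar, s, n, confidence=0.95):
    """Interval xbar +/- z* . s / sqrt(n)."""
    margin = z_star(confidence) * s / math.sqrt(n)
    return xbar - margin, xbar + margin


for c in (0.90, 0.95, 0.99):
    print(f"C = {c:.0%}: z* = {z_star(c):.3f}")

# First sample from the slides: x-bar = 49.9496, s_x = 1.0264, n = 400
low, high = confidence_interval(49.9496, 1.0264, 400)
print(f"95% CI: ({low:.4f}, {high:.4f})")
```

With 1.96 in place of 2, the interval comes out slightly narrower than (49.8470, 50.0522).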
If the margin of error is too large…

Reduce σ
– σ is determined by the population: a population
with a lot of variability will increase the chance
that a sample contains observations very far from
the true mean.
– This is easier to say than to do.
[Figure: 16 samples drawn from a population with a larger σ.]
16 samples. The σ of the population increases from 1 to 4, increasing
the spread of the sample and the likelihood of getting μ wrong.
If the margin of error is too large…

Increase the sample size (larger n)
If the margin of error is too large…

Be less confident of your estimate …
Use a lower confidence level (make C
smaller, hence a smaller z*)
If the margin of error is too large…



“We’re 99% sure that the President will
receive 51.5% of the votes, with a ±5%
margin of error.”
“We’re 95% sure that the President will
receive 51.5% of the votes, with a ±3%
margin of error.”
“We’re 90% sure that the President will
receive 51.5% of the votes, with a ±1%
margin of error.”
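The three remedies (smaller σ, larger n, lower C) all act on the same formula, margin of error = z* σ/√n. The sketch below uses made-up baseline values (σ = 4, n = 400, C = 95%) purely for illustration.

```python
import math
from statistics import NormalDist


def margin_of_error(sigma, n, confidence):
    """Margin of error z* . sigma / sqrt(n) for confidence level C."""
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z_star * sigma / math.sqrt(n)


# Hypothetical baseline, then one change at a time.
baseline = margin_of_error(sigma=4, n=400, confidence=0.95)
smaller_sigma = margin_of_error(sigma=1, n=400, confidence=0.95)
larger_n = margin_of_error(sigma=4, n=1600, confidence=0.95)
lower_c = margin_of_error(sigma=4, n=400, confidence=0.90)

print(f"baseline:      {baseline:.4f}")
print(f"smaller sigma: {smaller_sigma:.4f}")
print(f"larger n:      {larger_n:.4f}")
print(f"lower C:       {lower_c:.4f}")
```

Note that quadrupling n only halves the margin of error, because n enters through its square root.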
Cautions
1. Is it an SRS?
2. Is the data unbiased (or do we know the bias)?
3. Are there no outliers that influence the sample mean?
4. Is n large? If not, is the underlying population Normally distributed?
5. Do you know the true σ?
Theorems of mathematical statistics are true; statistical methods
are effective only when used with skill.
Cautions

FALSE: “The probability that the true mean
falls within x̄ ± 2 s_x/√n is 95%”
– This is false because either the interval contains
the true population mean (which is not a random
variable), with Pr=1, or it doesn’t, with Pr=0.
TRUE: “The probability that the interval
x̄ ± 2 s_x/√n is one of the ones that contain
the true mean is 95%”
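The TRUE statement is a claim about the procedure over many samples, which a simulation can illustrate. This sketch assumes a Normal population with mean 50 and standard deviation 1 (an assumption, not something the slides specify) and uses 1.96 rather than 2.

```python
import math
import random
import statistics


def coverage(num_samples=2000, n=400, mu=50.0, sigma=1.0, z=1.96, seed=1):
    """Fraction of intervals xbar +/- z . s/sqrt(n) that contain the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        # Sample standard deviation with the usual n - 1 divisor.
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        margin = z * s / math.sqrt(n)
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / num_samples


rate = coverage()
print(f"Share of 2,000 intervals containing the true mean: {rate:.3f}")
```

Each individual interval either contains 50 or it does not; the 95% describes how often the procedure succeeds.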
Tests of Significance
Making claims about the population
parameters

In our sample, we observed a mean of
49.9496 visits to the mall per year.
– Assuming that the true population mean is 50,
how likely is it that we observe a sample mean as
small as 49.9496, or even smaller?
– If the true population mean were 45, how likely is
it that we observe a sample mean as large as
49.9496, or even larger?
Making claims about the population
parameters
$$z = \frac{\bar{x} - \mu}{s_x/\sqrt{n}}$$

$$z = \frac{49.9496 - 50}{1.0264/\sqrt{400}} = -0.0252$$

$$z = \frac{49.9496 - 45}{1.0264/\sqrt{400}} = 2.4748$$
Making claims about the population
parameters
[Figure: Normal curves with the observed x̄ and its z-score marked.]
z = −0.0252: If the true population mean
were 50, how likely is it that
we observe a sample mean
at least as small as
49.9496? Pr=49%
z = 2.4748: If the true population mean
were 45, how likely is it that we
observe a sample mean as
large as 49.9496? Pr=0.68%
Making claims about the population
parameters

We found that if the true mean is 45, the Pr
of observing a sample mean as large as
49.9496 is 0.68%. Either
– we’ve observed a very rare event (our sample is
really unusual), or
– the true mean is not 45. There’s another number
that makes the observed sample more likely.
A sample outcome that would be extreme if a hypothesis were true
is evidence that the hypothesis is not true.
H0: μ = 45
Ha: μ ≠ 45
These are hypotheses about the population.
This is a two-sided alternative hypothesis.
Test Statistics


A test statistic measures compatibility
between the null hypothesis and the data.
The z-score can be used as a test statistic
because we can compare it against 1.96, the
z-score that delimits the central 0.95 area under the
Normal curve.
– 1.96 is called the appropriate “critical value”.
Test Statistics


The Student’s t distribution is used when n is
small.
It approaches the Standard Normal (z) distribution as n gets large.
Test Statistics



We know that about 95% of all values are within
2 standard deviations of the mean.
That is, 95% of all values are between the z-score of −1.96 and the z-score of 1.96.
So if we get a sample outcome whose z-score is greater than 1.96 (in absolute value),
we know that it is unlikely to come from a
population whose parameter is the one stated in the
null hypothesis.

Suppose
– n = 110
– s = 26.4
– x̄ = 8.1
– H0: μ = 0
– Ha: μ ≠ 0

$$z = \frac{8.1 - 0}{26.4/\sqrt{110}} = 3.22$$
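The same calculation, together with the comparison against the critical value 1.96, can be sketched in code. The function name `z_test` is illustrative, and the two-sided p-value is added here as an extra check that the slide does not report.

```python
import math
from statistics import NormalDist


def z_test(xbar, mu0, s, n):
    """z statistic and two-sided p-value for H0: mu = mu0."""
    z = (xbar - mu0) / (s / math.sqrt(n))
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


z, p = z_test(xbar=8.1, mu0=0, s=26.4, n=110)
print(f"z = {z:.2f}")
print(f"|z| > 1.96: {abs(z) > 1.96}")
print(f"two-sided p-value = {p:.4f}")
```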
Exercise

A company makes cellphones using components from
two countries: Ecuador and Canadaguay. Here are
data on days of cellphone durability.
# days till broken
             μ     σ
Ecuador      300   100
Canadaguay   100   50


Your retail shop buys 100 cellphones because the
manufacturer claims they were made in Ecuador. On
average, they stop working after 279 days of use.
Is this difference (279 days versus 300 days)
significant? Is it a fluke or does it mean something?
Exercise
H0: μ = 300
The null hypothesis is that the phone typically lasts 300 days.
Ha: μ < 300
Alternatively, it’s a lower quality phone.

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{279 - 300}{100/\sqrt{100}} = -2.1$$

The z-score can tell us how far this observation is from the mean.
P-value = 0.0179 = 1.79%
Look up in table A the probability of observing a z-score as small as this or smaller.
Exercise

Suppose the parameters were, instead

# days till broken
          μ     σ
Ecuador   300   200

Now, is this difference (279 days versus 300
days) significant? Is it a fluke or does it
mean something?

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{279 - 300}{200/\sqrt{100}} = -1.05$$

P-value = 0.1469 = 14.69%
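Instead of looking up table A, the lower-tail probability can be computed directly. This sketch covers the two versions of the exercise worked so far (σ = 100 and σ = 200); the function name is made up for illustration.

```python
import math
from statistics import NormalDist


def lower_tail_test(xbar, mu0, sigma, n):
    """z statistic and one-sided P-value Pr(Z <= z) for Ha: mu < mu0."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    return z, NormalDist().cdf(z)


z1, p1 = lower_tail_test(xbar=279, mu0=300, sigma=100, n=100)
z2, p2 = lower_tail_test(xbar=279, mu0=300, sigma=200, n=100)

print(f"sigma = 100: z = {z1:.2f}, P-value = {p1:.4f}")
print(f"sigma = 200: z = {z2:.2f}, P-value = {p2:.4f}")
```

The same 21-day shortfall is much less surprising when the population is more variable.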
Exercise


Suppose average durability of the 100
cellphones was, instead, 90 days.
Now, is this difference (90 days versus 300
days) significant? Is it a fluke or does it
mean something?
# days till broken
          μ     σ
Ecuador   300   200
xm
90  300
z

 21
s n 100 100
P  value  0.0000
 0%
We found that if the true mean is 45, the Pr of
observing a sample mean as large as 49.9496 is
0.68%.
Notice that here
H0: μ = 45
Ha: μ > 45
This is a one-sided alternative hypothesis.
Look this up in Table D, 20 − 1
degrees of freedom.
We have to use the Student’s
t because n is small.
Tests for Population Mean
1. State the hypotheses
2. Calculate the test statistic
3. Find the P-value
4. State your conclusion in the context of your specific setting
C = 1 − α for two-sided tests






s = 0.0068
x̄ = 0.8404
H0: μ = 0.86
n = 3

$$t = \frac{0.8404 - 0.86}{0.0068/\sqrt{3}} = -4.99$$

Look in Table D for the critical value t* at a two-tailed 1% significance
level (look in the 0.005 column) for df = 3 − 1.
Is it smaller than 4.99, the absolute value of t?





s = 0.0068
x̄ = 0.8404
H0: μ = 0.86
n = 3
The 99% CI is

$$\left(\bar{x} - t^*_{n-1}\frac{s}{\sqrt{n}},\; \bar{x} + t^*_{n-1}\frac{s}{\sqrt{n}}\right) = (0.8014,\ 0.8794)$$

Look up the t* for df = 3 − 1, upper tail probability 0.005.
. cii 3 0.8404 0.0068, level(99)
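The t statistic and the 99% interval can be checked without Stata. Python's standard library has no Student's t distribution, so this sketch hard-codes the table value t* = 9.925 (df = 2, upper tail probability 0.005); that constant is taken from a standard t table, not from the slides.

```python
import math

# Table value, entered by hand: t* for df = 2 and upper-tail probability 0.005.
T_STAR_DF2_99 = 9.925

xbar, mu0, s, n = 0.8404, 0.86, 0.0068, 3

se = s / math.sqrt(n)        # standard error of the sample mean
t = (xbar - mu0) / se        # one-sample t statistic

# Two-sided test at the 1% level: reject only if |t| exceeds t*.
reject_at_1pct = abs(t) > T_STAR_DF2_99

# 99% confidence interval: xbar +/- t* . s / sqrt(n)
ci_low = xbar - T_STAR_DF2_99 * se
ci_high = xbar + T_STAR_DF2_99 * se

print(f"t = {t:.2f}")
print(f"reject H0 at the 1% level: {reject_at_1pct}")
print(f"99% CI: ({ci_low:.4f}, {ci_high:.4f})")
```

With only two degrees of freedom the critical value is large, so the interval is wide and still contains 0.86.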
P-values versus a fixed α
If the z-score is −4.99, the corresponding p-value is 0.0000006.
The p-value is the smallest level of α at
which the data are significant.
Remember that C = 1 − α for two-sided tests,
and that bigger Confidence means wider CI.
“The smallest level of α” then means the
largest C, and widest CI, that still does not contain
the hypothesized value.
[Figure: Normal curve marking the H0 value of μ, the observed x̄, and the p-value.]
If the P-value is larger
than the chosen
significance level α, we
say that the statistic is not
significant.
If the P-value is smaller
than the chosen
significance level α, we
say that the statistic is
significant.
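The decision rule can be written out in a few lines. This is a sketch under the assumption that the test is two-sided and based on the Standard Normal; the function names are illustrative.

```python
from statistics import NormalDist


def two_sided_p(z):
    """Two-sided p-value for a z-score: area in both tails beyond |z|."""
    return 2 * NormalDist().cdf(-abs(z))


def is_significant(p_value, alpha):
    """A result is significant when the p-value falls below the chosen alpha."""
    return p_value < alpha


p = two_sided_p(-4.99)
print(f"p-value for z = -4.99: {p:.7f}")

for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha}: significant = {is_significant(p, alpha)}")
```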
Using Significance Tests
Is it true that, on average, people who finish earlier tend to do better?
(Notice causality is not determined.)

. tabstat guess grade diff if position<8

   stats |     guess     grade      diff
---------+------------------------------
    mean |        76  98.41428  22.41428
----------------------------------------

. tabstat guess grade diff if position>=8

   stats |     guess     grade      diff
---------+------------------------------
    mean |    64.375  82.49375  18.11875
----------------------------------------
Significance Tests

H0 is our hypothesis: how plausible is it,
given the data, our statistic, and its sampling
variation?
– If a priori H0 seems true, very small p-values will
be needed to convince people that H0 is wrong.

A small p-value means that your
estimated statistic is so far from H0
that it’s unlikely that your statistic is
derived from a population where H0 is
true.
Significance Tests

H0 is our hypothesis: what are the
consequences of rejecting H0?
– If rejecting H0 led to huge changes in our
behavior, with large costs, we’ll need to be very
convinced.
Significance Tests

Decide on a significance level, α.
– Remember α = 1 − C, where C is the confidence
level
Check if the P-value is below your predecided significance level.
[Figure: two Normal curves, each marking the H0 value of μ, the observed x̄, and the p-value.]
Significance Tests


Check for the practical significance (the
actual size of the number) of a statistic that is
statistically significant.
Do exploratory data analysis.
– Check for outliers.
– Check for the Normality of the data.
Report confidence intervals.
Excel and icosahedron exercise 1