Psychology 2010 Lecture 10 Notes: Hypothesis Testing Ch 6

P201 Lecture Notes
Statistical Inference and Estimation Chapter 6
The two processes that make up statistical inference.
I. Estimation.
Question asked: What is the value of the population parameter?
Example:
What % of U.S. citizens believe that Congress should raise the debt ceiling?
What % of persons in Chattanooga believe that kids under 18 should be allowed in
Coolidge Park only under adult supervision?
What % of persons using the Riverwalk believe that dogs should be prohibited?
Answers: A number, called a point estimate, or an interval, called a confidence interval. The
interval is one that has a prespecified probability (usually .95) of surrounding the population
parameter.
So the answer to the first example might be reported as “37% with a 5% margin of error.”
This is a combination of a point estimate (the 37%) and an interval estimate (from 37-5 to 37+5, or from
32% to 42%).
II. Hypothesis Testing.
A. With one population. . .
Deciding whether a particular population parameter (usually the mean) equals a value specified by
prior research or other considerations.
Example: A light bulb is advertised as having an “average lifetime” of 5000 hours.
Question: Is the mean of the population of lifetimes of bulbs produced by the manufacturer equal to
5000 or not?
B. With two populations. . .
Deciding whether corresponding parameters (usually means) of the two populations are equal or not.
Example: Statistics taught with lab vs. Statistics taught without lab.
Question: Is the mean amount learned by the population of students taught with a lab equal to the
mean amount learned by the population of students taught without a lab?
C. Three populations. . .
Deciding whether corresponding parameters (usually means) of the three populations are equal or
not. And on and on and on.
Biderman’s 201 Handouts
P201 Topic 11: Statistical Inference- 1
2/5/2016
Estimation – not covered in Corty
Estimation is using information from samples to guess the value of a population parameter or
difference between parameters. A lot of this goes on during an election season.
Point Estimate
A single value which represents our single best estimate of the value of a population parameter.
Interval Estimate (usually reported as “Margin of error”)
An interval which has a prespecified probability of surrounding the unknown parameter.
This interval estimate is called a confidence interval (CI).
Typically the interval estimate is centered around the point estimate.
[Figure: a confidence interval running from the lower limit to the upper limit, with the point estimate (the sample statistic) at its center.]
The sample statistic most often used as a point estimate is the Sample Mean.
Reporting the result of estimation: (Hypothetical data) . . .
“From the result of the XYZ poll, it is estimated that 17% of adult residents of the U.S. have tried
gluten free diets, with margin of error equal to 3%.”
This means that the pollster is 95% confident that the actual population percentage of persons trying
gluten free diets is between 14% and 20%.
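The arithmetic behind "point estimate plus or minus margin of error" can be sketched in a few lines of Python. This is only an illustration; the function name `interval_from_margin` is made up for these notes and is not part of any statistics package.

```python
def interval_from_margin(point_estimate, margin_of_error):
    """Return (lower limit, upper limit) of an interval centered on the point estimate."""
    return point_estimate - margin_of_error, point_estimate + margin_of_error


# Hypothetical XYZ poll from the notes: 17% with a 3% margin of error.
lower, upper = interval_from_margin(17, 3)
print(f"Between {lower}% and {upper}%")
```

The same function applied to the debt ceiling example (37% with a 5% margin of error) gives the interval from 32% to 42%.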
Introduction to Hypothesis Testing: The mean of a population
Suppose we have a single population, say the population of light bulbs mentioned above.
Suppose we want to determine whether or not the mean of the population equals 5000.
You might wonder: What does it matter whether the mean is 5000?
Answer: The manufacturer might be selling light bulbs whose average lifetime is only 2000
hours and may be counting on the fact that no one really pays attention.
But the difference between 2000 hours and 5000 hours will add up over multiple purchases.
In times when money is tight, those small differences may combine to make a large overall
difference.
Two possibilities
H0. The population mean equals 5000.
H1. The population mean does not equal 5000.
These possibilities are called hypotheses.
The first, the hypothesis of no difference is called the Null Hypothesis. (H0)
The second, the hypothesis of a difference is called the Alternative Hypothesis. (H1)
Our task is to decide which of the two hypotheses is true.
“I reject the null”
vs
“I fail to reject the null” (I will sometimes say, “I retain the null.”)
Why can’t we just know about the population?
Light bulbs – The manufacturer may have simply made up the numbers on the package. They may
have made a mistake in estimation.
Treatment for C.Diff – One doctor says take antibiotics for 2 weeks. He/she believes that the mean
number of C.Diff bacteria will be essentially 0 after 2 weeks.
A 2nd doctor says take them for 6 weeks. He/she believes that the mean number of C.Diff bacteria
will not be zero until 6 weeks.
The point is that no one “knows” the correct value. If we have a belief about a specific population
value, we have to test that belief using hypothesis testing procedures.
Two general approaches.
1. The Bill Gates (Warren Buffett) approach.
Purchase ALL the light bulbs in the population. Measure the lifetime of each bulb. Compute the
mean. If the mean equals 5000, retain the null. If the mean does not equal 5000, reject the null.
Problem: Too many bulbs.
2. The Plan B approach.
Take a sample of light bulbs.
Compute the mean of the sample.
From our study of sampling distributions, we know that the sample mean will not exactly equal the
population mean. But we also know that it’ll be close to the population mean. If the population
mean is 5000, then the sample mean should be close to 5000. So . . .
An intuitively reasonable decision rule
If the value of the sample mean is “close” to 5000, decide that the null must be true.
If the value of the sample mean is “far” from 5000, decide that the null must be false.
But how close is “close”? How far is “far”?
What if the mean of the lifetimes of 25 bulbs were 4999.99? Most rational people would retain the null.
What if the mean of the lifetimes of 25 bulbs were 1003.23? Most rational people would reject the null.
What if the mean of the lifetimes of 25 bulbs were 4876.44? Hmm. A gray area.
Clearly we need some rules, or perhaps we should say that we need an operational definition of
“close” and of “far”.
Close and Far as Probabilities – not covered in Corty
Recall from sampling distributions that means of samples from a population vary around the mean
of that population. This variation is due to sampling error.
When we take samples from a population whose mean is 5000, for example, the means most likely to
occur will be those close to 5000.
The means least likely to occur will be those far from 5000.
This suggests that we could say that a mean is “close” to the hypothesized mean if it is one of those
means that would be likely to be obtained from a population whose mean was the hypothesized
value.
And we could say that a mean is “far” from the hypothesized mean if it is one of those that would
be unlikely to be obtained from a population whose mean was the hypothesized value.
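To put rough numbers on "close" and "far", here is a small Python sketch for the three sample means of 25 bulbs discussed earlier. It assumes a population standard deviation of 300, a value borrowed from the worked light bulb example and not given for these three means, and it assumes the sample means are normally distributed. The function name `prob_as_far` is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

HYPOTHESIZED_MEAN = 5000
SIGMA = 300  # ASSUMPTION: population standard deviation, borrowed from the worked example
N = 25       # sample size used in the three "what if" questions


def prob_as_far(sample_mean):
    """Probability of a sample mean at least this far from 5000 (in either
    direction) if the population mean really were 5000."""
    standard_error = SIGMA / sqrt(N)
    z = (sample_mean - HYPOTHESIZED_MEAN) / standard_error
    return 2 * NormalDist().cdf(-abs(z))


for m in (4999.99, 1003.23, 4876.44):
    print(f"sample mean {m}: probability = {prob_as_far(m):.4f}")
```

Under these assumptions the first mean is about as likely as a mean can be, the second is essentially impossible, and the third has a probability of roughly .04, which is why it feels like a gray area.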
Our Decision Rule Stated in terms of Probabilities
So we can describe our decision rule (“close” vs. “far”) in terms of probabilities . . .
If the sample mean is one of those that would have high probability of occurring if the population
mean were 5000, then we’ll conclude that the population mean must be 5000.
If the sample mean is one of those that would have low probability of occurring if the population
mean were 5000, then we’ll conclude that the population mean must not be 5000.
The p-value.
Statisticians state the decision rule in terms of probabilities.
They formalize the process by computing a special probability called the p-value.
The p-value is the probability of an outcome (e.g., sample mean value) as extreme as the
obtained outcome if the null hypothesis is true.
Statisticians base their decision on the p-value.
Our Decision Rule described in terms of the p-value
Statisticians have agreed that if the p-value is smaller than or equal to an agreed-upon criterion
value then the null hypothesis is to be rejected.
But if the p-value is larger than the agreed-upon criterion value, the null hypothesis will not be
rejected.
Close = High probability = large p-value.
Far= Low probability = small p-value.
Significance Level
The criterion against which the p-value is compared is called the significance level.
Typically, the significance level is (arbitrarily) set at .05.
Our Final Description of the process in terms of p-value and significance level . . .
If the p-value is larger than the significance level, then the null hypothesis is not rejected.
If the p-value is less than or equal to the significance level, then the null hypothesis is rejected.
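The final decision rule is simple enough to write as a short Python function. This is a sketch only; the name `decide` and the wording of the returned strings are choices made for this illustration.

```python
def decide(p_value, significance_level=0.05):
    """Apply the decision rule: reject when p-value <= significance level."""
    if p_value <= significance_level:
        return "Reject the null"
    return "Fail to reject the null"


print(decide(0.0308))  # smaller than .05
print(decide(0.0500))  # exactly equal to .05 still leads to rejection
print(decide(0.0892))  # larger than .05
```

Note that the comparison is "less than or equal to", so a p-value of exactly .05 leads to rejection.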
Corty’s Steps in Hypothesis Testing
Every hypothesis test involves a set of steps.
Every statistical text has a variation on the following list of steps.
These steps are always carried out, regardless of the type of hypothesis being tested.
Step 1. Pick the statistical test appropriate for your hypothesis.
Right now, you don’t know of any statistical tests. That will soon change.
Step 2. Make sure your data meet the assumptions of the test, e.g., unimodality, symmetry, near
normality, no outliers.
Create a frequency distribution with Normal Curve Overlay. Look for outliers – extremely
positive or negative values. Look at skewness values – |skewness| should be <= 1.5.
Step 3. List the null and the alternative hypothesis
See my example.
Step 4. Set the significance level of the test. (Corresponds to Corty’s “critical value” step.)
Significance level will always be .05 for this course. See below for critical values.
Step 5. Compute the value of the test statistic. Also compute its p-value.
See example.
Step 6. Compare the p-value with the significance level and make your decision. Then interpret the
result.
You should memorize these steps for the next exam.
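The skewness check in Step 2 can be sketched in Python as follows. The formula used here is one common definition of sample skewness (the adjusted Fisher-Pearson coefficient); the statistical package used in class may compute it slightly differently. The function names and both data sets are hypothetical.

```python
from math import sqrt


def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness:
    n / ((n - 1)(n - 2)) * sum(((x - mean) / s) ** 3), with s the sample SD."""
    n = len(values)
    mean = sum(values) / n
    s = sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)


def skewness_ok(values, limit=1.5):
    """True if |skewness| is within the limit used in these notes."""
    return abs(sample_skewness(values)) <= limit


symmetric = [1, 2, 3, 4, 5]   # hypothetical data, perfectly symmetric
lopsided = [1, 1, 1, 1, 10]   # hypothetical data with one extreme value

print(skewness_ok(symmetric), skewness_ok(lopsided))
```

The symmetric data pass the check; the data with one extreme positive value do not.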
Worked Out Example – The light bulb example
Suppose that purchasers from across the country were contacted and instructed to go to the nearest
hardware/building supply store and purchase a packet of the bulbs and then to select one bulb
randomly from the packet. Those bulbs were then packed in foam and sent to a centralized testing
facility where 100 of them were randomly selected from the nearly 500 shipped from across the
country. Those 100 were plugged into standard sockets and power was applied. They were allowed
to burn continuously until they failed and the time to failure of each bulb in the sample was
recorded.
Suppose that the manufacturer had substantial evidence that the standard deviation of the population
was equal to 300 and that this value was not related to the mean of the population. We couldn’t actually
know this in real life.
Step 1: Test Statistic: A test statistic appropriate for testing a hypothesis about the mean of a single
population is the Z test. The test is called the One Population Z Test.
Step 2: Assumptions. We’ll assume that the data are essentially unimodal and symmetric, pretty
nearly normally distributed. We’ll assume that there are no outliers.
Step 3: Null hypothesis and Alternative hypothesis.
We want to determine whether the manufacturer’s claim that the “average” lifetime in the population
is 5000 is true or not. This suggests the following . . .
Null Hypothesis:
The mean of the population equals 5000.
µ = 5000.
Alternative Hypothesis:
The mean of the population does not equal 5000.
µ ≠ 5000.
Step 4: Significance Level
This one's easy. It's quite common to use .05 as the criterion for the probability of the observed
outcome. If that probability is less than or equal to .05, we reject. If that probability is larger than
.05, we retain the null. So let the significance level be .05
As we’ll see, a significance level of .05 corresponds to a critical Z value of + or – 1.96.
Step 5: Computed value of test statistic and the p-value
Suppose the mean of the sample of 100 lifetimes was 4935.33 with sample standard deviation equal
to 305.4.
The standard error of the mean is the population standard deviation divided by the square root of
the sample size: 300 / 10 = 30. Then
Z = (X-bar - Hypothesized mean) / Standard error of the mean
  = (4935.33 - 5000) / (300/10)
  = -64.67 / 30
  = -2.16
The p-value for a Z of -2.16 is computed as a Normal Distribution Problem.
Recall that the p-value is the probability of a value as extreme as the obtained value of Z.
The obtained Z is -2.16, so any Z as negative as -2.16 or more negative is “as extreme as” the obtained Z.
But note that any Z as positive as +2.16 or more positive is also “as extreme as” the obtained Z.
So we want the probability of a Z as positive as +2.16 + probability of a Z as negative as -2.16.
To solve it, we get the two areas beyond 2.16 in either direction – the area to the left of the Z as a
negative number and the area to the right of Z as a positive number.
[Figure: distribution of sample means if the null were true, with Z = -2.16 and Z = +2.16 marked. The tail area beyond each Z is .0154, gotten from the Normal Distribution table.]
p-value = .0154 + .0154 = .0308
Step 6: Compare the p-value with the significance level.
.0308 is less than .0500, so we’ll reject the null hypothesis. Our conclusion is that the population
mean is NOT equal to 5000.
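Steps 5 and 6 of the light bulb example can be checked with a short Python sketch. It uses `NormalDist` from Python's standard `statistics` module in place of the Normal Distribution table; the function name `one_population_z_test` is made up for these notes.

```python
from math import sqrt
from statistics import NormalDist


def one_population_z_test(sample_mean, hypothesized_mean, population_sd, n):
    """Return (Z, two-tailed p-value) for the One Population Z Test."""
    standard_error = population_sd / sqrt(n)
    z = (sample_mean - hypothesized_mean) / standard_error
    # Area in both tails beyond |Z| under the standard normal curve.
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value


z, p = one_population_z_test(4935.33, 5000, 300, 100)
print(f"Z = {z:.2f}, p-value = {p:.4f}")
print("Reject the null" if p <= 0.05 else "Fail to reject the null")
```

Because the code does not round Z to two decimals before finding the tail areas, its p-value can differ from the table-based .0308 in the third or fourth decimal place. The decision is the same.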
Critical Values of the Z statistic
The obtained Z value for the above example was -2.16 (2.16 in absolute value). The p-value was .0308.
Here are some other Z values that we could have obtained and the p-value for each of them.
Z value we could have obtained    p-value    Decision
0.50                              0.6170     Do not reject
1.00                              0.3174     Do not reject
1.50                              0.1336     Do not reject
1.70                              0.0892     Do not reject
1.80                              0.0718     Do not reject
1.96                              0.0500     Reject
2.16 (our Z)                      0.0308     Reject
2.50                              0.0124     Reject
3.00                              0.0027     Reject
3.50                              0.0005     Reject
Note the pattern. As Zs get farther from 0, the p-values get smaller and smaller.
Note that there is one “special” Z value whose p is exactly 0.0500. That Z is called the critical Z.
Its value is 1.96. If your Z is -1.96 or +1.96, you know that your p-value is exactly .05. Any Z
larger in absolute value than the critical value will have a p smaller than .05.
This Z will always be the value that divides “Do not reject” from “Reject”, whenever you do a Z
test.
This means that knowledgeable data analysts don’t even bother to compute p-values when they do a
Z test. They remember that the Critical Z is 1.96 and after conducting their research, if their
obtained Z is equal to or more negative than -1.96 or equal to or more positive than + 1.96, they
reject.
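The relationship between Z values, p-values, and the critical Z can be reproduced with a few lines of Python. This is a sketch that uses `NormalDist` from the standard `statistics` module instead of the printed table; the helper name `two_tailed_p` is made up for illustration.

```python
from statistics import NormalDist


def two_tailed_p(z):
    """Two-tailed p-value: area beyond -|Z| plus area beyond +|Z|."""
    return 2 * NormalDist().cdf(-abs(z))


for z in (0.50, 1.00, 1.50, 1.70, 1.80, 1.96, 2.16, 2.50, 3.00, 3.50):
    p = two_tailed_p(z)
    decision = "Reject" if p <= 0.05 else "Do not reject"
    print(f"{z:4.2f}   {p:.4f}   {decision}")

# The critical Z leaves .025 in each tail, so it is the Z with .975 of the area below it.
critical_z = NormalDist().inv_cdf(1 - 0.05 / 2)
print(f"Critical Z = {critical_z:.2f}")
```

Comparing |Z| with the critical Z and comparing the p-value with .05 always lead to the same decision for a two-tailed Z test.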
We will use p-values, not critical values.
If all of our hypothesis tests were Z tests, then we would not bother computing p-values for any of
them. If that were the case, we’d simply compare our obtained Z with 1.96 and base our decision on
the result of that comparison.
But 99+% of our statistical tests will NOT be Z tests.
And computation of critical values for the other types of tests is cumbersome.
Luckily, our computer program will compute the p-value for each of the other tests that we’ll
conduct. So we don’t have to deal with critical values of any of the statistical tests that follow.
(I may mention them in passing, but we’ll let the computer do that work for us.)
Another example.
A drug manufacturer has developed a drug that it believes will affect the duration of cold symptoms.
Suppose that prior careful measurement of colds symptoms has determined that the population
average duration of symptoms without treatment is 8 days with population standard deviation equal
to 2 days.
The manufacturer recruits a sample of persons who sign a waiver allowing the representatives of the
manufacturer to “give them” colds by swabbing their nasal passages with a fluid containing the cold
virus. The time of swab is time 0.
The number of days until a carefully calibrated measure indicated the absence of cold symptoms was
recorded. For 25 persons, the values are listed below . . .
days     Frequency   Percent   Valid Percent   Cumulative Percent
3            3         12.0        12.0              12.0
4            1          4.0         4.0              16.0
5            4         16.0        16.0              32.0
6            7         28.0        28.0              60.0
7            4         16.0        16.0              76.0
8            2          8.0         8.0              84.0
9            1          4.0         4.0              88.0
10           3         12.0        12.0             100.0
Total       25        100.0       100.0
Step 1: Test Statistic: A test statistic appropriate for testing a hypothesis about the mean of a single
population is the Z test.
Step 2: Assumptions.
Check these
Step 3: Null hypothesis and Alternative hypothesis.
We want to determine whether the mean duration of symptoms in the population of persons taking the
drug equals 8 days, the mean without treatment, or not. This suggests the following . . .
Null Hypothesis:
The mean of the population equals 8.
µ = 8.
Alternative Hypothesis:
The mean of the population does not equal 8.
µ ≠ 8.
Step 4: Significance Level
This one's easy. It's quite common to use .05 as the criterion for the probability of the observed
outcome. If that probability is less than or equal to .05, we reject. If that probability is larger than
.05, we retain the null. So let the significance level be .05
As we’ll see, a significance level of .05 corresponds to a critical Z value of + or – 1.96.
Step 5: Computed value of test statistic and the p-value
The mean of the 25 durations in the frequency table is 6.32, and the standard error of the mean is
2 divided by the square root of 25, or 2/5.
Z = (X-bar - Hypothesized mean) / Standard error of the mean
  = (6.32 - 8) / (2/5)
  = -1.68 / 0.4
  = -4.2
The p-value for a Z of -4.20 is computed as a Normal Distribution Problem.
Recall that the p-value is the probability of a value as extreme as the obtained value of Z.
To solve it, we get the two areas beyond 4.2 in either direction – the area to the left of the Z as a
negative number and the area to the right of Z as a positive number.
[Figure: normal curve with Z = -4.20 and Z = +4.20 marked. The tail area beyond each Z is .0000133.]
p-value = .0000133 + .0000133 = .0000266, which is .0000 to four decimal places.
Step 6: Compare the p-value with the significance level.
.0000 is less than .0500, so we’ll reject the null hypothesis. Our conclusion is that the population
mean is NOT equal to 8.
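The cold symptom example can be worked from the frequency table with a short Python sketch. The variable names are made up for illustration, and `NormalDist` from the standard `statistics` module stands in for the Normal Distribution table.

```python
from math import sqrt
from statistics import NormalDist

# Frequency table from the notes: days until symptoms ended -> number of persons.
frequencies = {3: 3, 4: 1, 5: 4, 6: 7, 7: 4, 8: 2, 9: 1, 10: 3}

n = sum(frequencies.values())
sample_mean = sum(days * count for days, count in frequencies.items()) / n

hypothesized_mean = 8   # population mean duration without treatment
population_sd = 2       # population standard deviation without treatment
standard_error = population_sd / sqrt(n)

z = (sample_mean - hypothesized_mean) / standard_error
p_value = 2 * NormalDist().cdf(-abs(z))  # two-tailed

print(f"n = {n}, mean = {sample_mean:.2f}, Z = {z:.2f}, p-value = {p_value:.6f}")
print("Reject the null" if p_value <= 0.05 else "Fail to reject the null")
```

A Z this far from 0 leaves almost no area in the tails, so the p-value is far below .05.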
Possible results of the Hypothesis Testing Process Start here on 10/23/14.
                        State of World
Decision                Null True, µ=5000                     Null False, µ≠5000
Fail to reject Null     Correct Failure to Reject             Incorrect Failure to Reject (Type II Error)
Reject Null             Incorrect Rejection (Type I Error)    Correct Rejection
Correct Failure to reject – a good outcome
The null is true.
µ really does equal 5000.
The manufacturer’s claim is true.
We do not reject the null but instead conclude that the null is true. We make a correct decision.
Correct Rejection – another good outcome
The null is false.
µ really does not equal 5000
The manufacturer’s claim is wrong.
We "detected" the difference between the actual population mean and the manufacturer’s claim.
Incorrect Rejection: Type I Error
The Null is true.
µ really does equal 5000.
The manufacturer’s claim is true.
But unbeknownst to us, because of a random accumulation of factors, our outcome was one which
seemed inconsistent with the null. So we rejected it and incorrectly accused the manufacturer of
lying on its packaging.
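Controlling P(Type I Error): The significance level.
When the null is true, we reject it only when the p-value is less than or equal to the significance
level, and that happens with probability equal to the significance level.
So, in most research, the probability of this error is .05.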
Incorrect Retention: Type II error.
The Null is false.
µ does not equal 5000.
The manufacturer is lying.
But, because of a random accumulation of factors, our outcome was one which seemed consistent
with the null. So we did not reject it and issued a statement saying incorrectly that the
manufacturer’s packaging had truthful language.
Controlling P(Type II error). Having large samples is the primary means of minimizing P(Type II).
Larger sample sizes lead to smaller P(Type II error).
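The effect of sample size on P(Type II error) can be illustrated with a Python sketch for the two-tailed Z test. The true population mean of 4950 is purely hypothetical; it is chosen only to have a specific "null is false" situation to compute with. The standard deviation of 300 is taken from the light bulb example, and the function name `prob_type_2` is made up for these notes.

```python
from math import sqrt
from statistics import NormalDist


def prob_type_2(true_mean, hypothesized_mean, population_sd, n, significance_level=0.05):
    """P(fail to reject the null) for a two-tailed Z test when the population
    mean is really true_mean."""
    std_normal = NormalDist()
    critical_z = std_normal.inv_cdf(1 - significance_level / 2)
    # When the true mean differs from the hypothesized mean, the obtained Z is
    # normally distributed around this shift (with SD 1) instead of around 0.
    shift = (true_mean - hypothesized_mean) / (population_sd / sqrt(n))
    # We fail to reject whenever the obtained Z lands between -critical_z and +critical_z.
    return std_normal.cdf(critical_z - shift) - std_normal.cdf(-critical_z - shift)


# HYPOTHETICAL: suppose the true mean lifetime were 4950 rather than 5000.
for n in (25, 100, 400):
    print(f"n = {n:3d}: P(Type II error) = {prob_type_2(4950, 5000, 300, n):.3f}")
```

With everything else held constant, the printed probabilities shrink as n grows. If the true mean is set equal to the hypothesized mean, the function returns .95, which is 1 minus P(Type I error).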
Two-tailed vs. One-tailed Alternative Hypotheses
For one population tests, the Null is always: “Population mean equals X.”
Most of the time, the alternative hypothesis is, “Population mean does not equal X.”
The “does not equal” means that we will reject the null . . .
if the sample mean is less than the hypothesized value or
if it is larger than the hypothesized value.
Most of the time we’re so ignorant of the situation that we won’t be able to predict whether the
population mean will be less than or larger than the hypothesized value when the null is false.
Occasionally, however, we’ll know that if the null is false, the population mean can only be larger
than the hypothesized value.
In other instances, we’ll know that if the null is false, the population mean can only be smaller than
the hypothesized value.
Example . . .
A manufacturer of food products is trying to determine how much sodium to add to its packaged
meat.
The population mean taste ratings of its meat with 9% sodium is 57 with standard deviation = 7.
An experiment is conducted in which samples of meat with sodium = 12% are rated.
The null hypothesis is: Population mean rating of the 12% meat = 57.
Alternative hypothesis: ?
Suppose the manufacturer knows that the meat with more sodium won’t be rated worse (except on
rare occasions due to sampling error).
In this case, the manufacturer knows that if the sodium has an effect, it will only INCREASE the
mean rating. So the alternative hypothesis would be
(One-tailed) Alternative hypothesis: Population mean > 57.
Using a one-tailed alternative hypothesis changes the p-value.
If the alternative hypothesis is “Pop mean > hypothesized value”, the p-value is the probability of a
value as large as the obtained value.
If the alternative hypothesis is “Pop mean < hypothesized value”, the p-value is the probability of a
value as small as the obtained value.
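How the alternative hypothesis changes the p-value can be sketched in Python. The function name `p_value` and the labels "two-sided", "greater", and "less" are choices made for this illustration, and the Z of 1.70 is just a sample value.

```python
from statistics import NormalDist


def p_value(z, alternative="two-sided"):
    """p-value of an obtained Z under each kind of alternative hypothesis."""
    std_normal = NormalDist()
    if alternative == "greater":    # H1: Pop mean > hypothesized value
        return 1 - std_normal.cdf(z)    # probability of a Z as large as the obtained Z
    if alternative == "less":       # H1: Pop mean < hypothesized value
        return std_normal.cdf(z)        # probability of a Z as small as the obtained Z
    if alternative == "two-sided":  # H1: Pop mean does not equal hypothesized value
        return 2 * std_normal.cdf(-abs(z))
    raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")


for alt in ("two-sided", "greater", "less"):
    print(f"Z = 1.70, alternative = {alt}: p-value = {p_value(1.70, alt):.4f}")
```

For a positive Z, the "greater" p-value is half the two-sided p-value, while the "less" p-value is large because the result is in the wrong direction for that alternative.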
Worked out example:
Ratings of a sample of 25 persons are taken.
Here they are . . .
rating
73
57
56
66
49
59
70
45
56
51
56
48
70
73
47
59
63
59
63
60
56
67
60
60
62
Mean = 59.4
S = 7.778.
Step 1: Test Statistic: A test statistic appropriate for testing a hypothesis about the mean of a single
population is the Z test.
Step 2: Assumptions.
Check these
Step 3: Null hypothesis and Alternative hypothesis.
We want to determine whether the population mean rating of the meat with 12% sodium is 57, the mean
for the 9% meat, or is larger than 57. This suggests the following . . .
Null Hypothesis:
The mean of the population equals 57.
µ = 57.
Alternative Hypothesis:
The mean of the population is larger than 57.
µ > 57.
Step 4: Significance Level
This one's easy. It's quite common to use .05 as the criterion for the probability of the observed
outcome. If that probability is less than or equal to .05, we reject. If that probability is larger than
.05, we retain the null. So let the significance level be .05
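Because the alternative hypothesis is one-tailed (µ > 57), a significance level of .05 corresponds to a
single critical Z value of +1.645, rather than the + or – 1.96 used for two-tailed tests.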
Step 5: Computed value of test statistic and the p-value
Z = (X-bar - Hypothesized mean) / Standard error of the mean
  = (59.4 - 57) / (7/5)
  = 2.4 / 1.4
  = 1.7
The p-value for a Z of 1.70 is computed as a Normal Distribution Problem.
Recall that the p-value is the probability of a value as LARGE as the obtained value of Z (since
we’re employing a one-tailed alternative).
To solve it, we get only the one area beyond 1.70 in the positive direction – the area to the right of Z.
Because the alternative is one-tailed, we do not add the area in the other tail.
[Figure: normal curve with Z = +1.70 marked. The tail area to the right of Z is .0446.]
p-value = .0446
Step 6: Compare the p-value with the significance level.
.0446 is less than .0500, so we’ll reject the null hypothesis. Our conclusion is that the population
mean is larger than 57.
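The whole one-tailed example can be checked from the raw ratings with a short Python sketch. The variable names are made up for illustration, and `NormalDist` from the standard `statistics` module stands in for the Normal Distribution table.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

ratings = [73, 57, 56, 66, 49, 59, 70, 45, 56, 51, 56, 48, 70,
           73, 47, 59, 63, 59, 63, 60, 56, 67, 60, 60, 62]

n = len(ratings)
sample_mean = mean(ratings)
sample_sd = stdev(ratings)   # reported as S in the notes; not used in the Z test

hypothesized_mean = 57       # population mean rating of the 9% sodium meat
population_sd = 7            # population standard deviation of the ratings
standard_error = population_sd / sqrt(n)

z = (sample_mean - hypothesized_mean) / standard_error
p = 1 - NormalDist().cdf(z)  # one-tailed: probability of a Z as LARGE as the obtained Z

print(f"n = {n}, mean = {sample_mean}, S = {sample_sd:.3f}")
print(f"Z = {z:.2f}, one-tailed p-value = {p:.4f}")
print("Reject the null" if p <= 0.05 else "Fail to reject the null")
```

The code keeps the unrounded Z of about 1.71, so its p-value is slightly smaller than the table-based .0446 for Z = 1.70. The decision is the same.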