Week 6 Notes - Department of Statistics and Probability

Statistics for Business and Economics:
Testing of Hypotheses
STT 315: Section 201
Instructor: Abdhi Sarkar
Acknowledgement: I’d like to thank Dr. Ashoke Sinha for allowing me to use
and edit the slides.
Hypothesis Tests: Goal
Within the context of the problem, students will be able to:
• Perform a one-sample z-test for proportion.
• Perform one-sample z-test and t-test for mean.
• Verify the conditions for performing the tests.
• Identify Type I and Type II errors.
• Understand the concept of p-value.
• Interpret the results of the test within the context of the problem.
Example 1
A congressman claims that only 12% of drivers talk on
their cell phone. Standing at a bus stop someone
noticed 4 out of 10 drivers on a cell phone.
• Is this evidence that the congressman is wrong?
• Before answering that question, we must ask:
“Is the sample size large enough to come to any
conclusion about the proportion?”
No, because np = 10 × 0.12 = 1.2 and n(1-p) = 10 × 0.88 = 8.8,
both of which are less than 9.
So we need a larger sample.
Example 1
• What minimum sample size is required that meets the
“large enough” condition?
About 76, because it gives np = 9.12 expected
successes and n(1-p) = 66.88 expected failures,
both larger than 9.
• Suppose 19 out of 100 drivers were found to be on cell
phones. Is this evidence that the congressman is
wrong?
Let us assume that the congressman’s claim is
correct, and find how likely it is for 19% or more of the
drivers to be on a cell phone when the sample size is 100.
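The “large enough” check can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original slides; the function name `min_sample_size` and the `threshold` parameter are hypothetical, and the rule used is the one stated above (np and n(1-p) both larger than 9).

```python
def min_sample_size(p, threshold=9):
    """Smallest n for which n*p and n*(1-p) both exceed the threshold."""
    n = 1
    # Keep increasing n until both expected counts are larger than the threshold.
    while n * p <= threshold or n * (1 - p) <= threshold:
        n += 1
    return n

n = min_sample_size(0.12)
print(f"n = {n}: expected successes = {n * 0.12:.2f}, expected failures = {n * 0.88:.2f}")
```

For p = 0.12 the binding condition is np > 9, which is why the answer depends mainly on the smaller of p and 1-p.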
Example 1
• As n = 100 is large enough [np and n(1-p) > 9],
• the sample proportion is approximately normal
• with mean p = 0.12,
• and standard deviation √(0.12 × 0.88/100) =
0.0325,
• so the chance that the sample proportion will be larger
than 19% is
normalcdf(0.19,100,0.12,0.0325) = 0.016,
i.e. if the congressman’s claim is true, there is only a
1.6% chance that 19 or more out of 100 drivers will be
on a cell phone.
• Hence the congressman’s claim does not seem to be
very likely.
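For readers without a TI calculator, the same tail probability can be approximated with Python's standard library. This is a sketch under the slide's assumptions (normal approximation, p = 0.12, n = 100); `NormalDist` stands in for the calculator's normalcdf and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p0, n = 0.12, 100
sd = sqrt(p0 * (1 - p0) / n)  # standard deviation of the sample proportion under the claim

# Upper-tail probability P(p-hat >= 0.19) under the normal approximation.
tail = 1 - NormalDist(mu=p0, sigma=sd).cdf(0.19)
print(f"sd = {sd:.4f}, P(p-hat >= 0.19) = {tail:.4f}")
```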
A few questions
• We are making our conclusion on the basis of
a sample. Is there any possibility of making an
error?
• How reliable is our conclusion?
In other words, what is the probability of
making an error?
• We are saying that 0.016 (1.6%) is not very
likely. What value(s) should we consider as
likely, and what as unlikely?
Some terminologies
• A hypothesis is a claim/conjecture about some
parameter of the population distribution.
In the driving example, the congressman’s claim that
p = 0.12 is a hypothesis.
• We shall have two competing hypotheses:
H0, the null hypothesis,
Ha, the alternative hypothesis.
• In testing of hypotheses we retain the null
hypothesis unless the sample provides strong evidence
against it, in which case we conclude in favor of
the alternative hypothesis.
Errors in testing
In a hypothesis test we have two possible errors:
• We find evidence to reject the null hypothesis but it
turns out that the null hypothesis is true: Type I error.
• We do not find evidence to reject the null hypothesis
but the null hypothesis turns out to be false: Type II
error.
In the cell phone drivers example, when we had
19 people in the sample who were talking on
their cell phones, we found evidence to reject the
null hypothesis. If the actual percent of drivers
who talk on cell phones is 12%, we have made
A. A Type I error
B. A Type II error
In the cell phone drivers example, when we had
14 people in the sample who were talking on
their cell phones, we did not find evidence to
reject the null hypothesis. If the actual percent of
drivers who talk on cell phones is not 12%, we
have made
A. A Type I error
B. A Type II error
A Few Remarks
• Errors of Type I and II are NOT mistakes.
They happen due to sampling variability.
It is the same reason that some confidence intervals
do not contain the population value: the
conclusions we make based on hypothesis tests
are probabilistic.
• Except in some special cases, both types of error
are possible and cannot be ruled out. Our aim is to keep
the probability of making these errors small.
• Unfortunately, it is often the case that trying to reduce
the chance of one error increases the chance of the other.
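The trade-off in the last remark can be illustrated numerically. The sketch below uses the cell phone setting (H0: p = 0.12, n = 100, right-sided test) and assumes, purely for illustration, that the true proportion is 0.19; that value, the function name `type_ii_error_prob`, and the use of the normal approximation are assumptions, not part of the slides.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error_prob(p0, p_true, n, alpha):
    """Approximate P(Type II error) for the right-sided test of H0: p = p0
    at level alpha when the true proportion is p_true (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    # H0 is rejected when the sample proportion exceeds this cutoff.
    cutoff = p0 + z_alpha * sqrt(p0 * (1 - p0) / n)
    # Sampling distribution of the sample proportion under the (hypothetical) truth.
    sampling_dist = NormalDist(p_true, sqrt(p_true * (1 - p_true) / n))
    # Type II error: the sample proportion stays below the cutoff, so H0 is not rejected.
    return sampling_dist.cdf(cutoff)

for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error_prob(p0=0.12, p_true=0.19, n=100, alpha=alpha)
    print(f"alpha = {alpha:.2f} -> P(Type II error) is about {beta:.3f}")
```

Running it shows the Type II error probability growing as the permitted Type I error probability shrinks, for a fixed sample size.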
Which error is more crucial?
• This is not an easy question. In order to answer this
question, let us think of a different problem.
Suppose you are a jury member and you have to
decide whether someone is guilty or not.
Jury's decision | The person is innocent | The person is guilty
Jury declares the person to be guilty | Type I error | Correct decision
Jury declares the person to be not guilty | Correct decision | Type II error
• Which error is more crucial?
Type I error.
Example
A drug company tests the null hypothesis that a
new drug does not work better than a
placebo.
What is a Type I error in this case?
A. The tests show the new drug works better
than the placebo, but it really does not.
B. The tests show the new drug does not work
better than the placebo, but it really does.
Example
A drug company tests the null hypothesis that a
new drug does not work better than a
placebo.
Which error would be worse to make?
A. The tests show the new drug works better
than the placebo, but it really does not.
(Type I error)
B. The tests show the new drug does not work
better than the placebo, but it really does.
(Type II error)
The Testing Procedure
• Since Type I error is more crucial, we always make sure that the
probability of Type I error is not too high (say, at most 5%) and
then we try to minimize the probability of Type II error.
• Such a test is said to have a level of significance of 5% and it is
called a level-0.05 test.
• So the level of significance (usually denoted by the Greek letter α)
is the maximum permissible probability of Type I error. The
popular choices for α are 1%, 5% and 10%.
• Given the sample, we compute the p-value (also known as the
observed level of significance): the probability, assuming H0 is
true, of getting a value of the test-statistic at least as extreme
as the one actually observed.
• If the p-value is less than the level of significance, we reject H0.
• Alternatively, we compute the rejection region corresponding
to α, and find the value of the test-statistic from the sample.
• We reject H0 if the value of the test-statistic falls in the rejection
region.
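What “level of significance” means in practice can be checked by simulation. The following minimal sketch repeatedly draws samples for which H0 is true (p = 0.12, n = 100, as in the cell phone example) and counts how often a right-sided level-0.05 test rejects. The number of trials, the seed and the function name are arbitrary illustrative choices; because the count of drivers is discrete, the estimated rate is only approximately 0.05.

```python
import random
from statistics import NormalDist

def type_i_error_rate(p0=0.12, n=100, alpha=0.05, trials=5000, seed=0):
    """Estimate how often a right-sided level-alpha z-test rejects a TRUE H0: p = p0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)   # critical value z_alpha
    se = (p0 * (1 - p0) / n) ** 0.5            # std. dev. of the sample proportion under H0
    rejections = 0
    for _ in range(trials):
        # Simulate one sample of n drivers when H0 is true.
        x = sum(rng.random() < p0 for _ in range(n))
        z = (x / n - p0) / se
        if z > z_crit:                         # test-statistic falls in the rejection region
            rejections += 1
    return rejections / trials

rate = type_i_error_rate()
print(f"Estimated probability of Type I error: {rate:.3f}")
```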
The Testing Procedure
• Given a statistical problem, we first decide on
suitable null and alternative hypotheses.
• Note that the hypotheses are decided before the data are
collected; that means
– the sampled data should not influence what the null
and/or alternative hypotheses are,
– but on the basis of the sample we decide whether or not
to reject H0.
• We decide on the level of significance (α).
• We compute the test-statistic and the rejection region and see
if the test-statistic falls in the rejection region.
• Equivalently, we compute the p-value, and if p-value < α, we reject
H0.
On the question of choosing an α level: rules of thumb for p-values
• Popular choices for α are 0.01, 0.05 and 0.10.
• A p-value below 0.01 is very unusual and provides
strong evidence to reject the null hypothesis.
• A p-value between 0.01 and 0.05 is pretty unusual and
provides moderate evidence to reject the null
hypothesis.
• A p-value between 0.05 and 0.10 is unusual and
provides some evidence to reject the null hypothesis.
• In some social sciences, a p-value between 0.10 and
0.25 is considered unusual and evidence to reject the
null hypothesis.
Test for population proportion
Test for $p$
For the null hypothesis $H_0: p = p_0$, we define
Test-statistic: $Z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$, where $\hat{p}$ is the sample proportion.

Alternative hypothesis | Rejection region | Type of test
$H_a: p > p_0$ | $Z > z_\alpha$ | One-sided test (right)
$H_a: p < p_0$ | $Z < -z_\alpha$ | One-sided test (left)
$H_a: p \neq p_0$ | $|Z| > z_{\alpha/2}$ | Two-sided test

The sample has to be large enough to apply this test:
$n p_0 > 9$ and $n(1 - p_0) > 9$.
TI 83/84 function 1-PropZTest computes the value of the test
statistic and the p-value.
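As a rough software counterpart to 1-PropZTest, the test above can be sketched in Python with only the standard library. The function name `one_prop_z_test` and the `alternative` labels are hypothetical choices for this illustration; the formula and the sample-size condition are the ones given on this slide.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_test(x, n, p0, alternative="greater"):
    """One-sample z-test for a proportion; returns (z, p_value).

    alternative: "greater" (Ha: p > p0), "less" (Ha: p < p0)
    or "two-sided" (Ha: p != p0).
    """
    if n * p0 <= 9 or n * (1 - p0) <= 9:
        raise ValueError("sample too small: need n*p0 > 9 and n*(1-p0) > 9")
    p_hat = x / n                                  # sample proportion
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)     # test-statistic
    std_normal = NormalDist()
    if alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":
        p_value = std_normal.cdf(z)
    elif alternative == "two-sided":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, p_value

# Cell phone data: 19 of 100 drivers, H0: p = 0.12 against Ha: p > 0.12.
z, p_value = one_prop_z_test(x=19, n=100, p0=0.12, alternative="greater")
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

The decision is then made by comparing the returned p-value with the chosen α, exactly as with the calculator output.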
Remark
• Testing “𝐻0 : 𝑝 ≤ 𝑝0 against 𝐻𝑎 : 𝑝 > 𝑝0 ” is the same
as testing “𝐻0 : 𝑝 = 𝑝0 against 𝐻𝑎 : 𝑝 > 𝑝0 ”.
• Testing “𝐻0 : 𝑝 ≥ 𝑝0 against 𝐻𝑎 : 𝑝 < 𝑝0 ” is the same
as testing “𝐻0 : 𝑝 = 𝑝0 against 𝐻𝑎 : 𝑝 < 𝑝0 ”.
Example 1
Let us test the problem of Example 1 (calling on cell-phone while
driving) at 5% level of significance. Here we consider
H0: p = 0.12, Ha: p > 0.12.
In our sample 19 of 100 drivers were on cell-phone.
We shall use TI 83/84 Plus to compute p-value.
• Press [STAT].
• Select [TESTS].
• Choose 5: 1-PropZTest….
• Input the following:
o p0: 0.12
o x: 19
o n: 100
o prop: > p0
• Choose Calculate and press [ENTER].
Test of 1-proportion with TI 83/84 Plus
Let us test the problem of Example 1 (calling on cell-phone while
driving) at 5% level of significance. Here we consider
H0: p = 0.12, Ha: p > 0.12.
In our sample 19 of 100 drivers were on cell-phone.
The TI 83/84 Plus output gives us:

Output | Meaning
prop > .12 | Alternative hypothesis (Ha)
z = 2.154101092 | Value of test-statistic
p = .0156160664 | p-value
p̂ = .19 | Value of sample proportion
n = 100 | Sample size

Since p-value = 0.016 and α = 0.05, we reject H0 at the 5% level of
significance, because p-value < α.
Note that we would not have rejected H0 if α = 0.01.
Alternatively, using the rejection region: for α = 0.05, we compute
z0.05 = invNorm(0.95, 0, 1) = 1.645.
Since Z = 2.154 > 1.645, we reject H0 at the 5% level of significance.
Note that z0.01 = 2.326, so we do not reject H0 at α = 0.01.
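The critical values can also be obtained without a calculator. The sketch below uses `NormalDist().inv_cdf` from Python's standard library as a stand-in for invNorm; the helper name `z_critical` is hypothetical, and the value Z = 2.154 is taken from the calculator output for the cell phone example.

```python
from statistics import NormalDist

def z_critical(alpha):
    """Upper-tail critical value z_alpha of the standard normal,
    analogous to invNorm(1 - alpha, 0, 1)."""
    return NormalDist().inv_cdf(1 - alpha)

z = 2.154  # test-statistic from the cell phone example
for alpha in (0.05, 0.01):
    z_alpha = z_critical(alpha)
    decision = "reject H0" if z > z_alpha else "do not reject H0"
    print(f"alpha = {alpha}: z_alpha = {z_alpha:.3f} -> {decision}")
```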
Example 2
The manufacturer of a certain chewing gum claims that
four out of five dentists surveyed prefer their type of
gum. You decide to test their claim against the
alternative that the proportion is less. You find that in a
sample of 200 dentists, 74% actually prefer their gum. Is this
evidence sufficient to doubt the manufacturer’s claim?
Use α = 0.03.
We shall test H0: p = 0.8 against Ha: p < 0.8.
We find (from the TI 83/84) that Z = −2.121 and p-value = 0.017.
Since p-value < 0.03, we reject H0 at α = 0.03.
Also Z = −2.121 < −z0.03 = −1.881, so Z falls in the rejection region.
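As a hand check of the calculator value, the test-statistic for this example can be worked out directly from the formula for the one-proportion z-test. The sample is large enough, since n p0 = 160 and n(1 − p0) = 40 are both larger than 9.

```latex
\[
Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}
  = \frac{0.74 - 0.80}{\sqrt{0.80 \times 0.20 / 200}}
  = \frac{-0.06}{0.02828}
  \approx -2.121
\]
```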
Example 3
In a random sample of 200 education school students, 92 sample
members indicated some measure of agreement with this
statement: “Scores on a standardized entrance exam are less
important for a student’s chance to succeed academically than is
the student’s high school GPA”.
Test at the 5% level of significance the null hypothesis that one-half
of all education school graduates would agree with this
statement against the alternative that the proportion is different
from one-half.
We shall test H0: p = 0.5 against Ha: p ≠ 0.5.
We find (from the TI 83/84) that Z = −1.131 and p-value = 0.258.
Since p-value > 0.05, we do not reject H0 at α = 0.05.
Also |Z| = 1.131 < z0.025 = 1.96, so Z does not fall in the rejection region.
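For a two-sided test the p-value counts both tails. A short worked version of the numbers in this example, rounded to three decimals and using the standard normal tail probability, is sketched below.

```latex
\[
\hat{p} = \frac{92}{200} = 0.46, \qquad
Z = \frac{0.46 - 0.50}{\sqrt{0.50 \times 0.50 / 200}}
  = \frac{-0.04}{0.03536}
  \approx -1.131
\]
\[
\text{p-value} = 2\,P(Z > 1.131) \approx 2 \times 0.129 = 0.258
\]
```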
Tests for population mean
Testing 𝜇
• Consider a population with mean 𝜇 and
standard deviation 𝜎.
• Goal: to test 𝐻0 : 𝜇 = 𝜇0 against a suitable
alternative, using a random sample of size n
from the population.
• As with constructing confidence intervals for 𝜇, the testing
procedure will depend on whether
– the sample size n is large enough or not,
– we know the value of 𝜎 or not.
Large sample tests for 𝝁
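Large sample test for $\mu$
Suppose nothing is known about the population distribution, but the sample size is large ($n \geq 30$).
For the null hypothesis $H_0: \mu = \mu_0$, we define
Test-statistic: $Z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$ (if $\sigma$ known) or $Z = \dfrac{\bar{x} - \mu_0}{S/\sqrt{n}}$ (if $\sigma$ unknown),
where $\bar{x}$ is the sample mean and $S$ is the sample standard deviation.

Alternative hypothesis | Rejection region | Type of test
$H_a: \mu > \mu_0$ | $Z > z_\alpha$ | One-sided test (right)
$H_a: \mu < \mu_0$ | $Z < -z_\alpha$ | One-sided test (left)
$H_a: \mu \neq \mu_0$ | $|Z| > z_{\alpha/2}$ | Two-sided test

TI 83/84 function Z-Test computes the value of the test statistic and
the p-value.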
Facebook Example
In a sample of 82 MSU undergraduates, the mean number of Facebook
friends was 616.95 with a standard deviation of 447.05 friends.
Test whether the average number of Facebook friends of MSU undergraduates
is different from the average number of Facebook friends of all Facebook users,
reported as 130 friends as of March 20, 2010.
To test: H0: µ = 130, Ha: µ ≠ 130.
• Press [STAT].
• Select [TESTS].
• Choose 1: Z-Test
• Inpt: Select Stats with the arrow keys
• Input the following:
o µ0: 130
o σ: 447.05 (the sample standard deviation, used in place of σ because n ≥ 30)
o x̄: 616.95
o n: 82
o µ: ≠ µ0
• Choose Calculate and press [ENTER].
Facebook Example
In the Facebook Example: H0: µ = 130, Ha: µ ≠ 130.
In our sample n = 82, mean = 616.95, std. dev. = 447.05.
Let us test at the 5% level of significance (i.e. α = 0.05).
The TI 83/84 Plus output gives us:

Output | Meaning
µ ≠ 130 | Alternative hypothesis (Ha)
z = 9.863594213 | Test statistic
p = 6.100659E-23 | p-value
x̄ = 616.95 | Sample mean
n = 82 | Sample size

Since the p-value is almost 0 and α = 0.05, we reject H0 at the 5% level of
significance, because p-value < α.
Also, |Z| = 9.86 > z0.025 = 1.96. So we reject H0 at α = 0.05.
Tests for 𝝁 in normal populations
Test for $\mu$ [normal population: known $\sigma$]
Suppose the population is normally distributed with known $\sigma$.
For the null hypothesis $H_0: \mu = \mu_0$, we define
Test-statistic: $Z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$.

Alternative hypothesis | Rejection region | Type of test
$H_a: \mu > \mu_0$ | $Z > z_\alpha$ | One-sided test (right)
$H_a: \mu < \mu_0$ | $Z < -z_\alpha$ | One-sided test (left)
$H_a: \mu \neq \mu_0$ | $|Z| > z_{\alpha/2}$ | Two-sided test

If nothing is known about the population distribution and
$n < 30$, we assume normality and perform the above
test.
TI 83/84 function Z-Test computes the value of the test statistic
and the p-value.
Example 4
The state lottery office claims that the average household
income of those people playing the lottery is about $37,000.
We want to test (at α = 0.10) whether the average income is
actually more than that. Assume that the household income of
those people playing the lottery is normally distributed with a
standard deviation of $5,756.
Suppose that for a sample of 25 households, the average income
was found to be $38,243.
To test: H0: μ = 37000 against Ha: μ > 37000.
Using Z-Test with these figures, we get Z = (38243 − 37000)/(5756/√25) = 1.0797 and p-value = 0.1401.
Here the rejection region is Z > z0.10 = 1.282.
Since Z = 1.0797 < 1.282, we do not reject H0 at α = 0.10.
Also here p-value > 0.10.
Test for $\mu$ [normal population: unknown $\sigma$]
Suppose the population is normally distributed with unknown $\sigma$.
For the null hypothesis $H_0: \mu = \mu_0$, we define
Test-statistic: $T = \dfrac{\bar{x} - \mu_0}{S/\sqrt{n}}$.

Alternative hypothesis | Rejection region | Type of test
$H_a: \mu > \mu_0$ | $T > t_{\alpha;\,n-1}$ | One-sided test (right)
$H_a: \mu < \mu_0$ | $T < -t_{\alpha;\,n-1}$ | One-sided test (left)
$H_a: \mu \neq \mu_0$ | $|T| > t_{\alpha/2;\,n-1}$ | Two-sided test

If nothing is known about the population distribution and
$n < 30$, we assume normality and perform the above
test.
TI 83/84 function T-Test computes the value of the test statistic
and the p-value.
Example 5
The manufacturer of a new product claims that his product will
increase output per machine by at least 29 units per hour. A line
manager adopts the product on 15 of his machines, and finds
that the average increase was only 26 units with a std. dev. of 6.2. Test
whether the output increase is less than claimed (use α = 0.05).
Since n = 15 < 30 and σ is unknown, we assume normality and use the t-test.
To test: H0: μ ≥ 29 against Ha: μ < 29.
The TI 83/84 Plus output gives us:

Output | Meaning
µ < 29 | Alternative hypothesis (Ha)
t = -1.8740242 | Test statistic
p = 0.0409746397 | p-value
x̄ = 26 | Sample mean
Sx = 6.2 | Sample standard deviation
n = 15 | Sample size

Since p-value = 0.04 < 0.05, we reject H0 at α = 0.05.
Also T = −1.87 < −t0.05;14 = −1.761, so we reject H0 at α =
0.05.
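The t statistic in this example is easy to verify by hand or in code. The sketch below computes it with the standard library and compares it with the critical value 1.761 quoted above (taken as given from a t-table rather than computed); the function name `t_statistic` is hypothetical.

```python
from math import sqrt

def t_statistic(x_bar, mu0, s, n):
    """One-sample t statistic T = (x_bar - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    return (x_bar - mu0) / (s / sqrt(n))

t = t_statistic(x_bar=26, mu0=29, s=6.2, n=15)
t_crit = 1.761  # t_{0.05;14}, the table value quoted in the example

print(f"T = {t:.3f}, df = {15 - 1}")
# Left-sided test: reject H0 when T falls below -t_crit.
print("reject H0" if t < -t_crit else "do not reject H0")
```

Python's standard library has no t-distribution CDF, so the p-value is not computed here; if SciPy is available, `scipy.stats.t.cdf(t, df=14)` would give the one-sided p-value.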
Remark
• Testing “𝐻0 : 𝜇 ≤ 𝜇0 against 𝐻𝑎 : 𝜇 > 𝜇0 ” is the same
as testing “𝐻0 : 𝜇 = 𝜇0 against 𝐻𝑎 : 𝜇 > 𝜇0 ”.
• Testing “𝐻0 : 𝜇 ≥ 𝜇0 against 𝐻𝑎 : 𝜇 < 𝜇0 ” is the same
as testing “𝐻0 : 𝜇 = 𝜇0 against 𝐻𝑎 : 𝜇 < 𝜇0 ”.