Uploaded by lcthelc

Intro to Statistics 9.1-9.5 complete

MAT 160 – Chapter 9 – Hypothesis Testing with One Sample
PART I
Consider the following problem:
The Environmental Protection Agency publishes data regarding the miles per gallon of all cars. A researcher
claims that fuel additives increase the miles per gallon of cars driven under highway driving conditions. The
mean miles per gallon of all large cars manufactured in 1999 without the fuel additive is 25.1, based on data
from the EPA. The researcher obtains a simple random sample of 35 large cars manufactured in 1999 and adds
a fuel additive. The sample mean is determined to be 26.8 miles per gallon.
Can we assume from the sample data that the mean has gone up with the addition of the additive?
With what we have been given, it is hard to tell. We need to determine whether or not we have been given
enough evidence to conclude that the mean has gone up.
How can we test the researcher's claim?
We can conduct a hypothesis test. In order to conduct a hypothesis test, we will need a null and an alternate
hypothesis.
The null hypothesis: This is the statement that is assumed to be true until evidence indicates otherwise.
The alternate hypothesis: This is the claim to be proved true if we have enough evidence.
Consider a trial by jury situation. There is a defendant on trial. In this case the null hypothesis is that he is
innocent. The alternate hypothesis is that he is guilty. The null hypothesis is assumed to be true until evidence
indicates otherwise. We are testing his guilt. We are trying to find evidence to prove him guilty. We must either
prove the alternate hypothesis is true (find him guilty) or state that there is not enough evidence to support the
alternate hypothesis (find him not guilty).
H0: defendant is innocent
H1: defendant is guilty
Notice that we never accept the null hypothesis; that is, we never find a defendant innocent.
We could have one of 3 possible situations for an alternate hypothesis:
1. If our decision involves trying to decide if a population parameter is less than some specified value, then the
alternate hypothesis would be written as:
H1: parameter < assumed value
This would be a left-tailed test.
2. If our decision involves trying to decide if a population parameter is more than some specified value, then the
alternate hypothesis would be written as:
H1: parameter > assumed value
This would be a right-tailed test.
3. If our decision involves trying to decide if a population parameter differs from some specified value, then
the alternate hypothesis would be written as:
H1: parameter ≠ assumed value
This would be a two-tailed test.
Your book will allow less than or equal to and greater than or equal to in the null hypothesis. This varies
among statisticians. Many statisticians will only use a statement of equality in the null hypothesis. I will accept
both but it should reflect the wording of the problem.
For our initial example, what would be the null and alternate hypothesis?
H0: μ = 25.1
H1: μ > 25.1
Would this be a two-tailed test, left-tailed test, or right-tailed test?
Right-tailed test
This researcher needs to answer the question: if the mean number of miles per gallon with the additive is really
25.1, how likely is it to get a sample of 35 cars with a mean at least as far above 25.1 as 26.8?
When we conduct a hypothesis test we will make a decision and then we may find out later what was really
true. So, there are 4 possibilities when we conduct a hypothesis test:
1. We could reject the null hypothesis when the null hypothesis is false – correct decision
2. We could reject the null hypothesis when the null hypothesis is true – Type I Error
3. We could fail to reject the null hypothesis when the null hypothesis is false – Type II Error
4. We could fail to reject the null hypothesis when the null hypothesis is true – correct decision
Type I Error: We reject the null hypothesis and it was actually true.
In the trial by jury example, if we rejected the null hypothesis we would find the defendant guilty. If the null
hypothesis was actually true then the defendant would, in fact, be innocent. So, if we find a person guilty when
they are not, this is a type I error.
Type II Error: We fail to reject the null hypothesis and it is actually false.
In a trial by jury situation if we fail to reject the null hypothesis then we would find the defendant not guilty
(remember that we never declare someone is innocent). If the null is false, then we have found a person not
guilty when they are indeed guilty – a type II error.
Of course, the probability that we make an error depends on how confident we need to be before rejecting the
null hypothesis; this is very similar to the confidence level of a confidence interval. If we demand a lot of
confidence, there is little probability of a Type I error. If we demand less confidence, that probability is larger.
In fact, P(Type I Error) = α. The symbol α stands for the level of significance. Researchers decide the
probability of making a Type I error before the sample data are collected.
P(Type I Error)= α
P(Type II Error)= β
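To see what P(Type I Error) = α means in practice, we can simulate many samples from a population where the null hypothesis really is true and count how often the test wrongly rejects it. Below is a minimal Python sketch, assuming a right-tailed z test with the fuel-additive numbers (μ0 = 25.1, σ = 3.9, n = 35; the σ value comes from the formal version of this example in Part II) and the p-value rule from Part II (reject when the p-value is ≤ α). The function name and the random seed are made up for illustration.

```python
import random
from statistics import NormalDist, fmean

def type_one_error_rate(alpha=0.05, mu0=25.1, sigma=3.9, n=35, reps=20_000, seed=1):
    """Estimate how often a right-tailed z test rejects a null hypothesis that is actually true."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    se = sigma / n ** 0.5                    # standard deviation of the sample mean
    rejections = 0
    for _ in range(reps):
        # H0 is true by construction: every sample comes from a population with mean mu0
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / se
        p_value = 1 - std_normal.cdf(z)      # right-tailed p-value
        if p_value <= alpha:                 # rejecting a true H0 is a Type I error
            rejections += 1
    return rejections / reps

rate = type_one_error_rate()
print(f"fraction of true null hypotheses rejected: {rate:.3f}")
```

With α = 0.05 the rejection rate should come out close to 0.05; rerunning with a different alpha moves the rate along with it.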
For a fixed sample size, α and β have an inverse relationship: when one goes up, the other goes down. Notice
that in the court system, since we want to find people guilty beyond a reasonable doubt, we choose to make α
small so that the probability of sending an innocent person to jail is very small. The consequence is a large β,
which means many guilty defendants would go free.
The Power of the Test is 1 − β, the probability of rejecting the null hypothesis when it is false. We would like
high power and a small probability of error. To achieve this, take as large a sample as you can.
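For a right-tailed z test, β can be computed directly, which makes the trade-offs above concrete. The sketch below is hypothetical: it assumes the additive really raises the mean to 26.0 mpg (an invented value, since in practice the true mean is unknown) and reuses μ0 = 25.1 and σ = 3.9 from the fuel example. It prints β and the power 1 − β for two values of α and three sample sizes; beta_right_tailed is a made-up helper name.

```python
from statistics import NormalDist

def beta_right_tailed(mu0, mu_true, sigma, n, alpha):
    """P(Type II error) for a right-tailed z test of H0: mu = mu0 when the true mean is mu_true."""
    se = sigma / n ** 0.5                                  # standard deviation of the sample mean
    # The right-tailed p-value is <= alpha exactly when the sample mean reaches this cutoff
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # Type II error: the sample mean falls short of the cutoff even though mu_true > mu0
    return NormalDist(mu_true, se).cdf(cutoff)

MU0, SIGMA = 25.1, 3.9   # fuel-additive example
MU_TRUE = 26.0           # hypothetical true mean with the additive (made up for illustration)

for alpha in (0.01, 0.05):
    for n in (10, 35, 100):
        beta = beta_right_tailed(MU0, MU_TRUE, SIGMA, n, alpha)
        print(f"alpha = {alpha:.2f}, n = {n:3d}: beta = {beta:.3f}, power = {1 - beta:.3f}")
```

Reading the output, β shrinks as n grows, and for every n the smaller α = 0.01 gives the larger β, which is the inverse relationship described above.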
Example
This is based on something that happened many years ago in a statistics class. My student went skydiving,
had an accident, and crushed his leg. He missed quite a bit of class but managed to come back and make it
successfully through the course. Then, about 10 years later, I went indoor skydiving at iFLY near Chicago
(I don’t jump out of perfectly good airplanes), and he was one of the instructors!
Consider the following set of hypotheses:
H0: parachute is good
H1: parachute is defective
What type of error is involved if you jump out of a plane with a defective parachute?
Type II Error – failed to reject the null hypothesis (decided that the parachute was not defective) and the null
hypothesis ended up being false.
What type of error is involved if you decide a perfectly good parachute is defective and so you don't jump out of
the plane with it?
Type I Error – rejected the null hypothesis (decided the parachute was defective) and the null hypothesis was
actually true.
Which type of error is worse in this situation?
Type II error
What might you choose for α?
α should range from 0.01 to 0.15. Since you want a high α so that you have a low β, you should choose
α = 0.15.
Most of the time you will be given an α.
PART II
P-value: the probability of getting a value of the point estimate at least as inconsistent with the claimed value
of the corresponding population parameter in the null hypothesis as the one we observed. In other words, it is the
probability of getting a sample at least as extreme as the one we got, assuming the null hypothesis is true.
*In 9.4 there are some good pictures and questions about p-values to help you understand what it really is. I
encourage you to pay special attention to that section as it will help you on one of the questions on your final
exam.
The p-value will determine for us whether we should reject the null hypothesis or fail to reject the null
hypothesis.
If the p-value is ≤ α then we reject the null hypothesis and say that there is significant evidence to support the
alternative hypothesis.
If the p-value is > α then we fail to reject the null hypothesis and say that there is not significant evidence to
support the alternative hypothesis.
This is based on the idea that when the p-value is ≤ α, either:
- the null hypothesis is true and a rare event has taken place, or
- the null hypothesis is false.
In hypothesis testing we base our decisions on what is likely to occur by chance. By definition, a rare event is
not likely to occur by chance, so we conclude that the null hypothesis is false whenever the p-value is ≤ α.
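The three forms of the alternate hypothesis from Part I differ only in which tail (or tails) the p-value is measured in. A minimal Python sketch of that idea plus the decision rule above, assuming the test statistic z follows a standard normal distribution when H0 is true (as it does for ZTest below) and Python 3.8+ for statistics.NormalDist; p_value_from_z and decide are hypothetical helper names, not calculator functions.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution: mean 0, standard deviation 1

def p_value_from_z(z, tail):
    """P-value of a standardized test statistic z for the given form of H1."""
    if tail == "left":       # H1: parameter < assumed value
        return STD_NORMAL.cdf(z)
    if tail == "right":      # H1: parameter > assumed value
        return 1 - STD_NORMAL.cdf(z)
    if tail == "two":        # H1: parameter ≠ assumed value, so count both tails
        return 2 * (1 - STD_NORMAL.cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right', or 'two'")

def decide(p_value, alpha):
    """Decision rule from these notes: reject H0 when the p-value is <= alpha."""
    if p_value <= alpha:
        return "reject H0: significant evidence for H1"
    return "fail to reject H0: not significant evidence for H1"

z = 1.8  # a hypothetical standardized test statistic
for tail in ("left", "right", "two"):
    p = p_value_from_z(z, tail)
    print(f"{tail:>5}-tailed: p-value = {p:.4f} -> {decide(p, 0.05)}")
```

For the same z = 1.8 and α = 0.05, the right-tailed test rejects H0 while the two-tailed and left-tailed tests do not, so the wording of H1 matters.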
The five steps to a hypothesis test:
1. State your null and alternate hypotheses.
2. State your level of significance (alpha).
3. State the test you are using and the p-value.
4. Compare your p-value to alpha and make your decision.
5. State your conclusion as a sentence.
We will start by testing a hypothesis about a mean when the population standard deviation is known. Just as we
used ZInterval to find a confidence interval for the population mean when the population standard deviation is
known, we will use ZTest to test a claim about a population mean when the population standard deviation is
known. ZTest is found under the same menu as ZInterval.
Let's do a formal hypothesis test for the following problem:
The Environmental Protection Agency publishes data regarding the miles per gallon of all cars. A researcher
claims that fuel additives increase the miles per gallon of cars driven under highway driving conditions. The
mean miles per gallon of all large cars manufactured in 1999 without the fuel additive is 25.1, based on data
from the EPA. The researcher obtains a simple random sample of 35 large cars manufactured in 1999 and adds
a fuel additive. The sample mean is determined to be 26.8 miles per gallon.
Can we assume from the sample data that the mean has gone up with the addition of the additive? Assume a
population standard deviation of 3.9 and test the claim at the .05 level of significance.
1. H0: μ = 25.1
   H1: μ > 25.1
2. α = 0.05
3. ZTest, p-value = 0.00496
4. P-value < α, reject the null hypothesis.
5. There is significant evidence to conclude that the mean has gone up with the addition of the additive.
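ZTest's arithmetic can be reproduced by hand: standardize the sample mean with z = (x̄ − μ0)/(σ/√n) and take the tail area that matches H1. The sketch below is an assumption-labeled stand-in for the calculator, not its actual implementation; z_test and its alternative strings are made-up names.

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alternative):
    """One-sample z test for a mean with known sigma; returns (z, p_value)."""
    z = (xbar - mu0) / (sigma / sqrt(n))   # sample mean in standard-error units from mu0
    phi = NormalDist().cdf
    if alternative == "less":              # H1: mu < mu0
        p_value = phi(z)
    elif alternative == "greater":         # H1: mu > mu0
        p_value = 1 - phi(z)
    elif alternative == "two-sided":       # H1: mu ≠ mu0
        p_value = 2 * (1 - phi(abs(z)))
    else:
        raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")
    return z, p_value

# Fuel additive example: H0: mu = 25.1, H1: mu > 25.1, sigma = 3.9, n = 35, xbar = 26.8
alpha = 0.05
z, p = z_test(26.8, 25.1, 3.9, 35, "greater")
decision = "reject H0" if p <= alpha else "fail to reject H0"
print(f"z = {z:.3f}, p-value = {p:.5f}: {decision}")
```

It gives z ≈ 2.579 and a p-value of about 0.00496, matching step 3 above.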
Here we were given a value for the level of significance. One advantage of the p-value approach is that the
p-value itself does not depend on alpha, so you can compute it before deciding on a level of significance. If you
are not given a value for alpha in the problem, you can choose one on your own. Alpha is usually between 0.01
and 0.15; base your choice on whether a Type I error or a Type II error would be worse.
Example
Justin is in the market to buy a 2-year-old Jeep Liberty. Before shopping for the car, he wants to determine what
he should expect to pay. The mean price of a 2-year-old Jeep Liberty is $14,585. Justin thinks the mean price is
higher than that in his neighborhood. After visiting 15 neighborhood dealers online, he obtains a mean price of
$15,685. Assuming that the population is normally distributed with a population standard deviation of $2,300,
test Justin's claim at the 0.1 level of significance.
1. H0: μ = 14,585
   H1: μ > 14,585
2. α = 0.1
3. ZTest, p-value = 0.0320
4. P-value < α, reject the null hypothesis.
5. There is significant evidence to conclude that the mean price of a 2-year-old Jeep Liberty is higher in
Justin’s neighborhood.
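The same calculation for Justin's example, as a short Python check. It is a sketch that assumes, as the problem states, a normal population with σ = $2,300, so a z test is appropriate even with only 15 dealers.

```python
from math import sqrt
from statistics import NormalDist

# Justin's Jeep Liberty example: H0: mu = 14585, H1: mu > 14585 (right-tailed)
mu0, xbar, sigma, n, alpha = 14_585, 15_685, 2_300, 15, 0.10

se = sigma / sqrt(n)                # standard deviation of the sample mean
z = (xbar - mu0) / se               # how many standard errors 15,685 sits above 14,585
p_value = 1 - NormalDist().cdf(z)   # right tail, because H1 uses ">"
print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject H0: {p_value <= alpha}")
```

The p-value comes out near 0.0320, below α = 0.1, which agrees with the decision above.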
PART III
Of course, since we did a hypothesis test for one population mean when sigma was known, we should also do one
when sigma is unknown. Since sigma is usually unknown, we can use what we learned in the last chapter to
decide what to do. We know that when a population has a normal distribution and the population standard
deviation is unknown, the standardized sample mean t = (x̄ − μ)/(s/√n) follows a t-distribution with n − 1
degrees of freedom instead of a normal distribution. Therefore we can use TTest in this situation.
Example
To estimate the mean gestation period of domestic dogs, 14 randomly selected dogs are observed during
pregnancy. Their gestation periods, in days, are given below.
62.0  61.4  59.8  62.2  60.3  60.4  59.4
60.2  60.4  60.8  61.8  59.2  61.1  60.9
Is there significant evidence to conclude that the mean gestation period for domestic dogs is less than
65 days? Assume that gestation periods are normally distributed.
1. H0: μ = 65
   H1: μ < 65
2. α = 0.05
3. TTest, p-value = 1.16 × 10⁻¹⁰
4. P-value < α, reject the null hypothesis.
5. There is significant evidence to conclude that the mean gestation period for domestic dogs is less
than 65 days.
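TTest's p-value for the dog data can be reproduced from the t statistic with n − 1 = 13 degrees of freedom. Python's standard library has no t-distribution CDF, so this sketch computes one from the regularized incomplete beta function, evaluated with the continued-fraction (Lentz) method popularized by Numerical Recipes. That numerical technique is swapped in for illustration and is not how the calculator is documented to work; the helper names are hypothetical.

```python
from math import exp, lgamma, log, sqrt
from statistics import fmean, stdev

def _nonzero(v, tiny=1e-300):
    """Guard against division by zero inside the continued fraction."""
    return v if abs(v) >= tiny else tiny

def _beta_cf(a, b, x, max_iter=300, eps=1e-15):
    """Continued fraction for the incomplete beta function, evaluated with Lentz's method."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 / _nonzero(1.0 - qab * x / qap)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 / _nonzero(1.0 + aa * d)
        c = _nonzero(1.0 + aa / c)
        h *= d * c
        # odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 / _nonzero(1.0 + aa * d)
        c = _nonzero(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b

def t_cdf(t, df):
    """P(T <= t) for a t-distribution with df degrees of freedom."""
    lower_tail = 0.5 * reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))  # equals P(T <= -|t|)
    return lower_tail if t < 0 else 1.0 - lower_tail

# Gestation periods (days) of the 14 dogs in the example above
gestation = [62.0, 61.4, 59.8, 62.2, 60.3, 60.4, 59.4, 60.2, 60.4, 60.8, 61.8, 59.2, 61.1, 60.9]
mu0 = 65                       # H0: mu = 65, H1: mu < 65 (left-tailed)
n = len(gestation)
t = (fmean(gestation) - mu0) / (stdev(gestation) / sqrt(n))   # stdev divides by n - 1
p_value = t_cdf(t, n - 1)      # left tail, because H1 uses "<"
print(f"t = {t:.2f}, df = {n - 1}, p-value = {p_value:.3g}")
```

The t statistic is about −17.31, and the p-value is on the order of 10⁻¹⁰. If SciPy happens to be installed, `scipy.stats.ttest_1samp(gestation, 65, alternative="less")` should serve as an optional cross-check.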
We can also do a test about a population proportion, just as we did confidence intervals for population
proportions.
The Sampling Distribution of the Proportion: If np(1 − p) ≥ 10 and n ≤ 0.05N, then the distribution of p̂ is
approximately normal, with mean equal to p and standard deviation equal to √(p(1 − p)/n).
Be careful not to confuse this with the mean and standard deviation of a binomial distribution (np and
√(np(1 − p))). Those were the mean and standard deviation of the random variable X; these are the mean and
standard deviation of the random variable p̂.
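A quick simulation can make the sampling distribution of p̂ concrete. The sketch below assumes p = 0.135 and n = 710 (borrowed from the Prevnar example that follows), draws many random samples, and compares the mean and standard deviation of the simulated p̂ values with p and √(p(1 − p)/n). Independent draws behave like sampling from a very large population, so the n ≤ 0.05N condition is taken for granted; simulate_p_hats and the seed are hypothetical.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

def simulate_p_hats(p, n, reps=4_000, seed=7):
    """Draw `reps` samples of size n from a population with proportion p; return each sample's p-hat."""
    rng = random.Random(seed)
    # each draw is a "success" with probability p; p-hat is the fraction of successes
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

p, n = 0.135, 710   # values borrowed from the Prevnar example, for illustration
print("n*p*(1 - p) =", round(n * p * (1 - p), 1), "(needs to be at least 10)")
p_hats = simulate_p_hats(p, n)
print("simulated mean of p-hat:", round(fmean(p_hats), 4), "  formula:", p)
print("simulated sd of p-hat:  ", round(pstdev(p_hats), 4), "  formula:", round(sqrt(p * (1 - p) / n), 4))
```

The simulated mean and standard deviation should land close to 0.135 and 0.0128, the values the formulas give.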
When the normal approximation above applies, we can use 1-PropZTest to test a claim about a population
proportion. It gives the p-value, which you compare to alpha as we have done previously. 1-PropZTest is found
under the STAT TESTS menu.
Example
The drug Prevnar is a vaccine meant to prevent meningitis. (It also helps control ear infections.) It is typically
administered to infants. In clinical trials, the vaccine was administered to 710 randomly sampled infants
between 12 and 15 months of age. Of the 710 infants, 121 experienced a decrease in appetite. Is there
significant evidence to conclude that the proportion of infants who receive Prevnar and experience a decrease in
appetite is different from 0.135, the proportion of children who experience a decrease in appetite with competing
medications? Test at the 0.01 level of significance.
1. H0: p = 0.135
   H1: p ≠ 0.135
2. α = 0.01
3. 1-PropZTest, p-value = 0.00574
4. P-value < α, reject the null hypothesis.
5. There is significant evidence to conclude that the proportion of children who receive Prevnar and
experience a decrease in appetite differs from 0.135, the proportion of children who experience a
decrease in appetite with competing medications.
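1-PropZTest's p-value can be reproduced from the normal approximation: z = (p̂ − p0)/√(p0(1 − p0)/n), with the tail chosen by H1. A minimal sketch for the Prevnar data, assuming the calculator uses this standard formula with the null value p0 in the standard deviation; one_prop_z_test is a made-up name.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_test(x, n, p0, alternative="two-sided"):
    """One-proportion z test; returns (p_hat, z, p_value)."""
    p_hat = x / n
    se = sqrt(p0 * (1 - p0) / n)   # standard deviation of p-hat, assuming H0 is true
    z = (p_hat - p0) / se
    phi = NormalDist().cdf
    if alternative == "less":
        p_value = phi(z)
    elif alternative == "greater":
        p_value = 1 - phi(z)
    elif alternative == "two-sided":
        p_value = 2 * (1 - phi(abs(z)))
    else:
        raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")
    return p_hat, z, p_value

# Prevnar example: 121 of 710 infants; H0: p = 0.135, H1: p ≠ 0.135
alpha = 0.01
p_hat, z, p_value = one_prop_z_test(121, 710, 0.135)
print(f"p-hat = {p_hat:.3f}, z = {z:.2f}, p-value = {p_value:.5f}, reject H0: {p_value <= alpha}")
```

This gives p̂ ≈ 0.170, z ≈ 2.76, and a two-tailed p-value of about 0.00574, in line with step 3.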