Objectives 6.2, 7.1 Tests of significance (CIS Chapters 11 and 12)
The purpose of significance tests
Stating and assessing hypotheses
The t-statistic and the P-value
Statistical significance
The one-sample t test for a population mean
One-sided versus two-sided tests
Further reading:
http://onlinestatbook.com/2/logic_of_hypothesis_testing/logic_hypothesis.html and
http://onlinestatbook.com/2/tests_of_means/single_mean.html
What is a hypothesis?
Based on observations we want to answer the following question:
Question: Can pigeons fly?
We write this as a hypothesis (conjecture):
H0 : Pigeons cannot fly (we call this the null hypothesis; it is the opposite of
the conjecture).
HA : Pigeons can fly (we call this the alternative hypothesis; this is the
conjecture of interest).
If we can disprove the null – show that the null is unlikely – this in turn
means the alternative seems likely.
Scenario 1: You are watching pigeons and you see a pigeon fly. You
have immediately disproven the null (that pigeons cannot fly) and thus
proven the alternative.
Scenario 2: You are watching pigeons and they are busy eating. None
of them are flying (far too much food to even attempt it). All this is
consistent with the null being true, however it does not prove the null. In
this situation we say there is no evidence to prove the alternative.
The role of the null and alternative hypothesis
In most types of research/investigation we make two hypotheses, the null
and the alternative.
We can only prove the alternative; we cannot prove the null. This is
because we can only prove the alternative by disproving the null, i.e.
by discounting the possibility of the null being true. This is why the
`research hypothesis’ is always the alternative.
In most situations we cannot switch the null and alternative about without
taking great care to rewrite it correctly (comes later).
Unlike the pigeon example, given on the previous slide, in statistics we
cannot prove for certain that the alternative is true. Instead:
We start by writing down the null and alternative hypotheses of interest.
We collect the data.
We calculate how likely it is to obtain the data we observe if the null
hypothesis were true.
If this chance/probability is very small, then the null hypothesis is unlikely to be
true, which in turn suggests that the alternative hypothesis is true. We
say `there is evidence to support the alternative hypothesis’ or, equivalently,
`there is evidence to reject the null’.
Motivation 1 (red wine example)
Let us return to the red wine example from the confidence interval section. It has
been suggested that drinking red wine in moderation may protect against heart
attacks. This is because red wine contains polyphenols which act on blood
cholesterol. To write this in a statistical sense, we let µ denote the mean change in
polyphenol levels if the entire population started to drink moderate amounts of red
wine. If µ > 0, this means that on average polyphenol levels would rise when
drinking red wine (thus proving the conjecture). We write this as:
H0 : µ ≤ 0 (mean levels of polyphenols have stayed the same or reduced)
vs
HA : µ > 0 (mean levels of polyphenols have risen)
Now we collect data. To see if moderate red wine consumption increases the average
blood level of polyphenols, a group of nine randomly selected healthy men were
assigned to drink half a bottle of red wine daily for two weeks.
SCENARIO 1: The difference in polyphenol levels before and after the study is:
-0.60 -1.05 -2.09 -1.23 0.71 -0.53 0.33 -0.48 -1.42. The sample mean for
these 9 men is x̄ = -0.7. Clearly, for these men polyphenol levels have not
increased on average. A sample mean of -0.7 is entirely consistent with the null being true
(the mean level staying the same or reducing); remember the sample mean is
estimating the true mean.
SCENARIO 1 (cont). Since x̄ = -0.7 is consistent with the null hypothesis µ≤0,
there is no evidence to disprove the null. Therefore, there is no evidence in the data
that the polyphenol levels increase with moderate consumption of red wine.
SCENARIO 2: The difference in polyphenol levels before and after the study is: 0.06
-0.36 0.98 0.82 -0.25 2.49 -1.34 1.16 1.53. The sample mean for these 9 guys
is x̄ = 0.56 . Now the sample mean and the population mean µ≤0 are in different
regions. In other words, for this group of people a positive increase in polyphenol
levels is seen. BUT does this disprove the null? Not without some calculations. It is
still possible to get a positive sample mean when µ≤0. We need to calculate how
likely it is to get a sample mean of x̄ = 0.56 when the population mean µ≤0.
SCENARIO 3: The difference in polyphenol levels before and after the study is:
8.45 10.18 10.98 10.35 10.75 8.98 8.84 10.38 9.79. The sample mean is
x̄ = 9.86. It is clear that polyphenol levels have risen for these randomly selected
volunteers and, though it is hard to articulate why at this point, it does not seem to be by lucky
chance. It really seems that this data is completely inconsistent with the null
(population mean µ≤0) and strongly suggests that the alternative is true.
SCENARIO 4: The difference in polyphenol levels before and after the study is:
-0.43 -8.35 -8.31 26.11 4.32 25.02 9.40 11.54 0.71. The sample mean is
x̄ = 6.66. Does this data disprove the null? How likely is it to get this data under the null?
A statistical test allows us to systematically navigate these different scenarios.
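As a quick check of the four scenarios, the sample means can be recomputed directly from the listed differences. This is a minimal sketch using only the Python standard library; the names `scenarios` and `sample_means` are illustrative and not part of the slides.

```python
from statistics import mean

# Differences in polyphenol levels, copied from the four scenarios above.
scenarios = {
    1: [-0.60, -1.05, -2.09, -1.23, 0.71, -0.53, 0.33, -0.48, -1.42],
    2: [0.06, -0.36, 0.98, 0.82, -0.25, 2.49, -1.34, 1.16, 1.53],
    3: [8.45, 10.18, 10.98, 10.35, 10.75, 8.98, 8.84, 10.38, 9.79],
    4: [-0.43, -8.35, -8.31, 26.11, 4.32, 25.02, 9.40, 11.54, 0.71],
}

sample_means = {k: mean(v) for k, v in scenarios.items()}

for k, m in sample_means.items():
    # The null is mu <= 0, so only a positive sample mean points towards the alternative.
    side = "null (mean <= 0)" if m <= 0 else "alternative (mean > 0)"
    print(f"Scenario {k}: sample mean = {m:.2f}, on the side of the {side}")
```

Only Scenario 1 has a sample mean on the null side. For the other three, the sign of the sample mean alone does not settle the question, which is why a test is needed.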
Motivation 2: Does the lady take milk?
Recall the tea story: One lady insisted that the tea tasted different
depending on whether milk was poured into the cup and then the tea or
if the tea was first poured and then the milk. Fisher suggests that this
can be tested, by randomly giving her milk first and tea first cups and
asking her to identify the cup. The competing hypotheses are:
H0: The lady guesses which cup is which by random chance.
HA: The lady is able to select the correct cup.
We collect the data and find that she identifies all 8 cups of tea
correctly. This is the data we observe.
In order to prove the alternative we have to calculate how plausible it is
to identify all the cups of tea correctly under the null that she
was simply guessing. This is seeing whether the data is consistent with
the null being true. If this probability is small, then it suggests that the null is
implausible (we have disproved the null). If the null is implausible, then
this implies the alternative is plausible (there is evidence to suggest the
alternative is true).
Motivation 2 (cont)
The ONLY way to `prove’ the alternative (she has the ability to
correctly identify the cup) is to show that the null is implausible. If
the null is in any way plausible, then we cannot reject the null (and
prove the alternative).
The probability of her identifying all cups correctly is 1/72. This means
there is a 1/72 = 1.39% chance of her identifying the cups of tea by
simply guessing.
If the probability is over a threshold, then the null is deemed plausible
and we cannot reject the null. If it is below the threshold then the null is
deemed implausible and we can reject the null.
Typically, the α=5% significance level is used as the threshold. Since
1/72 = 1.39% is LESS than 5%, at the 5% level we believe the null is
implausible and thus reject it (saying that there is evidence to suggest
the alternative, that she knows her tea, is true).
However, we will never know the truth! There is a 1.39% chance she got the
result by a lucky guess. By using a 5% threshold we are admitting that we are
willing to accept a 5% chance of rejecting the null when it is in fact true.
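The guessing probability depends on how the experiment is set up, which the slides do not spell out. The sketch below assumes the classical design: 4 of the 8 cups are milk-first, the lady knows this, and she must pick which 4 they are. Under that assumption a pure guess is completely right in 1 of 70 equally likely selections (about 1.43%), close to the figure quoted above and leading to the same decision at the 5% level.

```python
from math import comb

cups = 8        # number of cups tasted, as in the slides
milk_first = 4  # ASSUMPTION: classical design with 4 milk-first cups, known to the taster

# Number of ways to choose which 4 of the 8 cups are the milk-first ones.
ways = comb(cups, milk_first)

# Under the null (pure guessing) every selection is equally likely,
# and exactly one selection gets every cup right.
p_all_correct = 1 / ways

alpha = 0.05
reject_null = p_all_correct <= alpha

print(f"{ways} possible selections")
print(f"P(all correct by guessing) = {p_all_correct:.4f}")
print("reject the null at the 5% level:", reject_null)
```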
Motivation 3 (newspapers)
Let p be the proportion of the population that is pro-gay marriage.
We want to investigate whether over 60% of Americans are pro-gay
marriage.
H0 : p ≤ 0.6 (the proportion is at most 60%)
vs
HA : p > 0.6 (the proportion is greater than 60%)
SCENARIO 1. A sample of 500 people was interviewed. In the sample
58% said they were pro-gay marriage. Is there evidence to back the
newspaper’s claim?
For this data set, 58%, is consistent with the null being true. We cannot
disprove the null given the data. Thus there is no evidence to suggest that
the proportion of Americans who are pro-gay marriage is over 60%.
On the other hand, we cannot discount this claim since this is just a
sample. It could be that the population is 60% pro-gay marriage, it is just
not backed up with this sample (the difference could be due to sample
variation).
SCENARIO 2. A random sample of 500 people is interviewed. In that
sample 62% said they were pro-gay marriage. What do you think about
the newspaper’s claim based on this sample?
In this situation the sample proportion is greater than 60%, but we need to
ask ourselves could this be because really over 60% of the population are
pro-gay marriage or the actual population is less than 60% and the 62%
observed in the sample is simply due to random variation?
Example: We know that the proportion of females in the world is 50%. But in
any given sample there could be more or less than 50% females.
Is 62% in the sample consistent with the null being true? To answer this we
calculate the chance of obtaining a sample which gives a sample
proportion of 62% when, in fact, the population proportion is 60%. This is
the principle; in practice we are calculating a probability, and this probability
depends on a few ingredients:
The size of the sample.
The variability in the population (as measured by the standard
deviation).
Further reading:
http://onlinestatbook.com/2/logic_of_hypothesis_testing/intro.html
The underlying principle in a test
In a hypothesis test we always calculate the probability of observing
the data under the null being true.
The underlying idea of a hypothesis test is that rare events are
unlikely to happen. If this probability turns out to be small, it
suggests the assumption (in this case the null hypothesis) is not true
and that there is evidence the alternative is true instead.
In most statistical tests we will encounter, the underlying assumption
will be based on the mean.
This may seem very simple, but it will allow us to test a wide range of
useful hypotheses.
Most calculations rely on the sample mean being approximately normally distributed,
so we always need to check this assumption; otherwise the probability
we calculate will be incorrect.
In the next few slides we will explain how to calculate these
probabilities.
Purpose of Significance Tests
We have seen that the properties of the sampling distribution of x̄ help us
estimate an interval of plausible values for the population mean µ.
We can also rely on the properties of the sampling distribution to test
hypotheses. A test is based on determining how plausible a particular claim is.
Example: You are in charge of quality control in your food company. You
randomly sample fourteen packs of cherry tomatoes, each labeled 227 grams.
The average weight from your fourteen boxes is 224g. Obviously, we cannot
expect boxes filled with whole tomatoes to all weigh exactly 227 grams.
Is the somewhat smaller weight simply due to chance variation?
Or is it evidence that the machine that sorts the cherry
tomatoes into packages needs to be recalibrated?
The null hypothesis is a very specific statement about
parameter(s) of the population(s). It is labeled H0. This is the
hypothesis we assess.
The alternative hypothesis is a more general statement about
the parameter(s) that is exclusive of the null hypothesis. It is labeled
Ha. We accept Ha only if we find H0 to be implausible.
Weight of cherry tomato packs:
H0 : µ = 227g (µ is equal to the value claimed by the produce company)
Ha : µ ≠ 227g (µ is either larger or smaller than the value claimed)
The tomato machine data
Here is the actual data from the machine:
224.1 222.2 226.4 224.8 218.7 231.2 224.8 226.7 217.7 227.6 212.5
236.1 226.6 219.9
For this data the sample mean is 224 (more precisely 224.23) and the sample
standard deviation is 5.89.
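The two summary numbers can be reproduced from the 14 listed weights. A minimal sketch with the Python standard library (the variable names are illustrative):

```python
from statistics import mean, stdev

# The 14 pack weights (in grams) listed above.
weights = [224.1, 222.2, 226.4, 224.8, 218.7, 231.2, 224.8,
           226.7, 217.7, 227.6, 212.5, 236.1, 226.6, 219.9]

n = len(weights)
x_bar = mean(weights)  # sample mean
s = stdev(weights)     # sample standard deviation (divides by n - 1)

print(f"n = {n}, sample mean = {x_bar:.2f} g, sample sd = {s:.2f} g")
```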
The basic prescription
Much of what we do are calculations.
However, in most data analysis there will be no need to do the
calculations by hand. The software (such as Statcrunch) will give you some
probabilities that you need to understand.
Understanding the calculations will help in understanding
what these probabilities actually mean.
After collecting the data, the basic prescription is to make a z/t
transform. That is:
t = (X̄ - µ) / s.e, where µ is the mean under the null.
This measures the number of standard errors the estimator is
from the mean under the null hypothesis. The larger this value is in magnitude, the less plausible the
null hypothesis. To find the probability of observing the data under
the null we need to find the probability associated with the t-transform
by looking it up in the t-tables (or the z-tables if the population
standard deviation is known).
Having set up the hypotheses we collect the data. We find that 14
boxes of tomatoes packed with the machine gave the sample mean
224 grams. The sample standard deviation is 5.89 grams.
Now our objective is to calculate the chance of getting a sample
mean of 224 grams or lower under the null that the mean packing
weight is the same as usual (227g).
The standard error of the sample mean is 5.89/√14 = 1.57.
We assume that the sample size is large enough that the
sample mean is close to normally distributed. If the null is true, we center the
distribution about 227g. Make a plot with 227g in the center.
On this plot place 224g. To find the chance of getting a sample
mean of 224g or less when the mean is 227g we make a t-transform:
t = (224 - 227) / 1.57 = -1.9
The value -1.9 tells us that 224 is 1.9 standard errors to the left of the mean
227 (if this were the true population mean). Looking up the area to
the left of -1.9 (using Statcrunch) gives the probability 4%.
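The t-transform for the tomato example can be written as a small helper. This is only a sketch; the function name `t_statistic` is made up for illustration, and the inputs are the rounded values used on the slide (sample mean 224, s = 5.89, n = 14).

```python
from math import sqrt

def t_statistic(x_bar, mu_0, s, n):
    """Return (t, se): the t-transform and the standard error of the sample mean."""
    se = s / sqrt(n)            # standard error = s / sqrt(n)
    t = (x_bar - mu_0) / se     # number of standard errors x_bar is from mu_0
    return t, se

t, se = t_statistic(x_bar=224, mu_0=227, s=5.89, n=14)
print(f"standard error = {se:.2f}, t = {t:.1f}")
```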
Definition: The P-value (for two sided test)
How unusual is this data, assuming the machine is properly calibrated (null is true)? We
calculated that the sample mean is t = -1.9 standard errors from the mean
under the null. The area to the left of -1.9 is 4%. Samples from a properly
calibrated machine that are at least as unusual as this have a t-value that is either greater
than 1.9 or less than -1.9. The chance of this is the area to the right of 1.9 plus the
area to the left of -1.9, which is 2×4% = 8%.
Definition: We want to quantify the proportion of random samples that are at
least as unusual as our actual result, if the null hypothesis were true. This
quantity is called the p-value. The p-value (for a two-sided test, which this is) is
2×the smallest area.
Tomato Example:
p-value = P(|t| ≥ 1.9) = P(t ≤ -1.9) + P(t ≥ 1.9) = 2 × P(t ≤ -1.9) = 8%
Further reading: http://onlinestatbook.com/2/tests_of_means/single_mean.html
Deciding the conclusion with α
A very small P-value indicates that our results would be unlikely to
occur if the null hypothesis were true, and therefore H0 is
implausible. It should be rejected. In this case we say the evidence is
significant.
The smaller the P-value the stronger the evidence against H0.
The significance level α is the largest P-value for which we are
willing to reject the null hypothesis.
The value of α is decided before conducting the test.
If the P-value is equal to or less than α then we reject H0. This is
when we accept Ha.
If the P-value is greater than α then we fail to reject H0.
Whatever evidence there is, it is not sufficient to accept Ha.
Typically we set α=5%.
Does the packaging machine need recalibration?
Recall our hypotheses.
H0 : µ = 227g (µ is equal to the value claimed by the produce
company and the machine does not need recalibration)
Ha : µ ≠ 227g (µ is either larger than 227g or smaller than 227g, in
which case the machine needs recalibration)
The produce company traditionally uses α = 5% for their quality control. That
is what you choose to do here. Since the P-value is 8%, it is larger than α
and therefore H0 is not rejected, and the decision is to not recalibrate the
machine.
* If α had been chosen as 10%, then the P-value would be significant and H0
would be rejected. The decision would then be to recalibrate the machine.
No matter what we decide, there always is the possibility
that the conclusion is incorrect.
The point/boundary of decision when α=5%
Recall that with α=5%, if the p-value is greater than 5% we cannot reject
the null. However, if the p-value is at most 5% then we can reject the null.
A p-value of 5% is the boundary of the decision. We recall that this
corresponds to 2.16 standard errors from the mean under the null (in
this case 227g).
Therefore, if the sample mean is in the interval (this is not a
confidence interval):
[227 - 2.16 × 5.89/√14, 227 + 2.16 × 5.89/√14] = [223.6, 230.4]
then we cannot reject the null at the 5% level, because the p-value for
any number inside this interval will be larger than 5%.
On the other hand, if the sample mean is outside this interval, then the p-value
will be less than 5%. Thus, when the sample mean is outside this
interval we can reject the null hypothesis (that the mean is 227g) at the
5% level.
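The non-rejection interval can be computed in a few lines. The sketch below takes the critical value 2.16 (the 2.5% point of the t-distribution with 13 degrees of freedom) as given in the slides rather than computing it; `can_reject` is a hypothetical helper name.

```python
from math import sqrt

mu_0, s, n = 227, 5.89, 14
t_crit = 2.16  # 2.5% point of t with 13 df, as quoted in the slides

half_width = t_crit * s / sqrt(n)
lower, upper = mu_0 - half_width, mu_0 + half_width

def can_reject(x_bar):
    """Reject the null at the 5% level when the sample mean falls outside the interval."""
    return not (lower <= x_bar <= upper)

print(f"non-rejection interval: [{lower:.1f}, {upper:.1f}]")
print("sample mean 224 ->", "reject" if can_reject(224) else "cannot reject")
```

The observed mean of 224g lies inside the interval, which agrees with the p-value of 8% being larger than 5%.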
Because the standard deviation s=5.89 was estimated from the data we need to use
the t-tables to obtain the probability.
We use the t-distribution with 13 degrees of freedom rather than the normal. The
area to the left of -1.9 can be calculated using Statcrunch (go to Stat ->
Calculators -> T, select df=13 and place -1.9 in the equation). You should get
0.04. The p-value is 2×4% = 8%.
Alternatively, we can deduce bounds for the p-value using the tables. With 13 degrees
of freedom, 1.9 lies between the 5% point (1.771) and the 2.5% point (2.160) of the
t-distribution, so the one-sided area is between .025 and .05.
Thus the P-value is between 2×.025 = .05 and 2×.05 = .10.
This assesses the “believability” of the null hypothesis, given the
evidence of the random sample.
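For readers without Statcrunch or tables, the area under the t-density can be approximated numerically. This is a rough sketch, not the method used in the slides: it integrates the standard Student t density with Simpson's rule, and the helper names `t_pdf` and `t_cdf` are illustrative.

```python
from math import gamma, sqrt, pi

def t_pdf(x, df):
    """Density of the Student t-distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t), using Simpson's rule on [0, |t|] and the symmetry of the density."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area

left_area = t_cdf(-1.9, df=13)                    # area to the left of -1.9
p_two_sided = 2 * min(left_area, 1 - left_area)   # two-sided p-value

print(f"area to the left of -1.9: {left_area:.3f}")
print(f"two-sided p-value: {p_two_sided:.3f}")
```

The result should agree with the slide values of roughly 4% and 8%, and it falls inside the bounds deduced from the tables.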
More Examples
Let us return to the tomato problem. We want to see whether the
tomato weighing feature is functioning in 4 different machines. So
samples of size 14 are collected from the 4 machines.
Sample 1 is from machine 1, etc. The black dot denotes the sample mean.
Based on this plot and the sample size, which machines do you think will
need readjustment? We will make this analysis precise in the
next few slides.
Tomato boxes using the t-tables
Below we compare the t-transforms with the t-value at the 2.5%
point. From this we can deduce whether we can reject the null or not.

                        Scenario 1         Scenario 2         Scenario 3         Scenario 4
Null                    µ = 227            µ = 227            µ = 227            µ = 227
X̄                       227.1              225.8              227.7              225.4
s                       0.597              0.424              2.43               2.65
s.e = s/√14             0.160              0.113              0.649              0.708
t-transform             (227.1-227)/0.16   (225.8-227)/0.113  (227.7-227)/0.649  (225.4-227)/0.708
                        = 0.63             = -10.6            = 1.07             = -2.26
t13(2.5%)               2.16               -2.16              2.16               -2.16
227 ± s.e × t13(2.5%)   [226.65, 227.35]   [226.76, 227.24]   [225.6, 228.4]     [225.5, 228.5]
p-value                 54%                1.4 × 10^-5 %      30.4%              4.16%
Result                  Cannot Reject      Reject             Cannot Reject      Reject (just about)
Interpretation of the results
We see that for sample 1, the p-value is 54%, which means there is a
large chance of observing the sample mean 227.1 when the population
mean is 227. The spread in the data is small and concentrated about
227. There is no evidence that machine 1 is not working correctly.
In sample 2, the p-value is very, very small. Note: the spread in the data
is small and concentrated quite far from 227. This strongly suggests that
the machine needs to be readjusted, as it seems highly unlikely to
observe the average 225.8 when the mean of the machine is 227.
In sample 3, the p-value is 30.4%. This is large, and means that there
is a good chance of observing the average 227.7 when the population
mean is 227. The data is highly variable and, with such a
small sample size, it is not easy to say whether the machine needs
readjustment or not.
In sample 4, the average is 225.4 and the p-value is 4.16%. This means
there is relatively small chance of observing the mean 225.4 when the
true mean is 227. It is possible that the machine needs readjusting.
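The four decisions can be reproduced from the summary statistics alone, by comparing each |t| with the 2.5% point 2.16. A minimal sketch (the dictionary layout and names are illustrative; the critical value is taken from the slides):

```python
from math import sqrt

n, mu_0 = 14, 227
t_crit = 2.16  # 2.5% point of t with 13 df, as quoted in the slides

# (sample mean, sample standard deviation) for machines 1 to 4, from the table.
machines = {1: (227.1, 0.597), 2: (225.8, 0.424), 3: (227.7, 2.43), 4: (225.4, 2.65)}

results = {}
for k, (x_bar, s) in machines.items():
    se = s / sqrt(n)
    t = (x_bar - mu_0) / se
    results[k] = (t, abs(t) > t_crit)  # two-sided test: reject when |t| exceeds 2.16

for k, (t, reject) in results.items():
    print(f"Machine {k}: t = {t:.2f} ->", "reject" if reject else "cannot reject")
```

Machines 1 and 3 have similar sample means relative to 227, but it is the standard error that decides how many standard errors away they are. Machine 4 is only just past the boundary.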
Using a larger sample size
It is costly to readjust the machine, so we need more compelling
evidence that the machine’s mean is different to 227g. In this case we
need to increase the sample size. We now consider the average over
100 boxes, for each of the machines. A plot of the results is below.
Larger sample sizes
Given that the sample size has increased we now calculate the
likelihood of observing these averages.

                        Scenario 6          Scenario 7         Scenario 8         Scenario 9
Null                    µ = 227             µ = 227            µ = 227            µ = 227
X̄                       226.94              225.9              227.2              226.1
s                       0.499               0.506              3.00               2.76
s.e = s/√100            0.049               0.0506             0.3                0.276
t-transform             (226.94-227)/0.049  (225.9-227)/0.05   (227.2-227)/0.3    (226.1-227)/0.276
                        = -1.22             = -21.7            = 0.66             = -3.33
t99(2.5%)               -1.984              -1.984             1.984              -1.984
227 ± s.e × t99(2.5%)   [226.9, 227.1]      [226.9, 227.1]     [226.4, 227.6]     [226.45, 227.55]
p-value                 22%                 ≈ 0                51%                0.12%
Result                  Cannot Reject       Reject             Cannot Reject      Reject
When comparing these results with the previous results, we see that the
standard errors are smaller. This implies that the non-rejection interval is narrower
and the t-values can be far larger.
Observe that in the cases where we reject the null, the p-values are far smaller
than when the sample size was 14.
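To see how much the non-rejection interval narrows, compare its half-width for two machines with similar spread: Scenario 4 (n = 14, s = 2.65) and Scenario 9 (n = 100, s = 2.76). This sketch uses the critical values 2.16 and 1.984 as quoted in the tables; the function name is illustrative.

```python
from math import sqrt

def half_width(s, n, t_crit):
    """Half-width of the non-rejection interval: t_crit standard errors."""
    return t_crit * s / sqrt(n)

h_14 = half_width(s=2.65, n=14, t_crit=2.16)     # Scenario 4
h_100 = half_width(s=2.76, n=100, t_crit=1.984)  # Scenario 9

print(f"n = 14:  227 +/- {h_14:.2f}")
print(f"n = 100: 227 +/- {h_100:.2f}")
```

With about seven times as many boxes the interval is roughly a third as wide, so smaller departures from 227g can be detected.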
Calculation practice: Annual coffee shop sales
The marketing firm that studied annual coffee shop sales had a statistical
model that had predicted the yearly average to be $2.33 million. Now that they
have data, they want to determine if this prediction was accurate. They choose
to use α = 5%.
Hypotheses: H0 : µ = 2.33 vs. Ha : µ ≠ 2.33 (in millions of dollars).
The sample data are sample mean = 2.67, s=1.03, n=41, df=40.
This gives the standard error = 1.03/√41 = 0.16 and the
t-value = (2.67 - 2.33)/0.16 = 2.125.
From the t-table, using 40 df, the area to the right of 2.125 is less than
2%. Using a computer we obtain the exact probability of 1.6%. So
the P-value is 2 ×1.6% = 3.2%.
Since P-value = 3.2% is less than α = 5%, the firm rejects H0. (The
evidence is weak for H0 and strong for Ha.)
The firm concludes “At the 5% significance level, the evidence
indicates that the model’s prediction was inaccurate.”
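The arithmetic in this example can be checked as follows. The sketch rounds the standard error to two decimals, as the slide does, and then compares |t| with the two-sided 5% critical value for 40 degrees of freedom. That critical value (2.021) is taken from a standard t-table and is not stated in the slides.

```python
from math import sqrt

x_bar, mu_0, s, n = 2.67, 2.33, 1.03, 41

se = round(s / sqrt(n), 2)  # 0.16, rounded as in the slide
t = (x_bar - mu_0) / se

t_crit = 2.021  # ASSUMPTION: 2.5% point of t with 40 df, from a standard t-table
reject = abs(t) > t_crit

print(f"standard error = {se}, t = {t:.3f}")
print("reject H0 at the 5% level:", reject)
```

Comparing |t| with the critical value is equivalent to comparing the two-sided P-value with α, so the decision matches the one reached above.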
Review of the Tests of Significance
1.  State the null hypothesis H0 and the alternative hypothesis Ha.
2.  Calculate the value of the test statistic (such as a t-statistic). This
is a measure of how much the data and H0 differ from each other.
3.  Determine the P-value for the observed data. This is the chance, if
H0 is true, of observing a test statistic at least as extreme/unusual.
4.  Compare the P-value to the significance level α and decide
whether or not there is sufficient evidence (i.e., P-value ≤ α) to
reject the null hypothesis.
5.  State your conclusion in terms your audience will understand, citing
the significance level used to obtain it.
Comments on the decision rule
The objective of a test is to make a decision between the plausibility
of two competing hypotheses.
The p-value is the probability of observing data at least as extreme as ours under the
assumption the null is true.
If the p-value is less than the significance level (often set at 5%),
the decision is to reject the null and go for the alternative instead.
If the p-value is greater than 5% then the data is consistent with the
null being true and we cannot reject the null.
The point is there is a chance we made the wrong decision. We
could have wrongly rejected the null when actually the null is true.
The chance of this happening, when the null is true, is the significance level. In other
words, if we set the significance level at 5%, then when the null is true
there is a 5% chance that we wrongly reject it.
The value at which we set the significance level determines how
willing we are to wrongly reject the null hypothesis.
Examples:
Suppose we are in a tomato packing plant. Our aim is to ensure that the
mean weight of a tomato box is 227g. Every few hours we randomly
sample 14 boxes of tomatoes and do a hypothesis test. Each test is done
at the 5% level. If we do the test 100 times and the null hypothesis is true, then
on average we would falsely reject the null 5 times.
Each time we falsely reject the null, it is called a type I error or in medical
terms a false positive.
Suppose we reduce the significance level to 1%, in this case if the null
were true we would falsely reject the null 1 time out of a hundred.
We will show in Chapter 8 that by increasing the significance level
(from, say 5% to 10%) we increase the number of false positives, but
we are more likely to detect the alternative, if it is true. Decreasing the
significance level will have the opposite effect.
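The false positive rate can be illustrated by simulation. The sketch below repeatedly draws 14 box weights from a machine that really is calibrated at 227g and counts how often a 5% two-sided t test rejects. Several things here are assumptions made for illustration only: the weights are taken to be normal with standard deviation 5.89, the number of repetitions is 4000, and the random seed is fixed.

```python
import random
from math import sqrt

def false_rejection_rate(n_tests=4000, n=14, mu_0=227.0, sigma=5.89,
                         t_crit=2.16, seed=1):
    """Fraction of tests that reject H0: mu = mu_0 when H0 is actually true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_tests):
        # Draw a sample from a correctly calibrated machine (the null is true).
        sample = [rng.gauss(mu_0, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        s = sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
        t = (x_bar - mu_0) / (s / sqrt(n))
        if abs(t) > t_crit:  # two-sided test at the 5% level (13 df)
            rejections += 1
    return rejections / n_tests

rate = false_rejection_rate()
print(f"fraction of false rejections: {rate:.3f}")
```

The fraction should come out close to 0.05. Using a larger critical value (a smaller significance level) would lower it.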
The Significance level
How to choose the significance level?
There is a trade off between not wanting to falsely reject the null but
wanting to detect the alternative.
The lower the significance level, the less likely we are to falsely reject the
null, but this makes detecting the alternative much harder!
Example: Consider the court case H0: Innocent HA: Guilty.
The p-value is the probability of observing the evidence given the null is
actually true. If we set the significance level at 5% and say a person is
guilty if the p-value is less than 5%, this means that we would put in prison
5% of all innocent people who were put on trial!
For a democracy this is just too much! Therefore, in this situation, we need
to place the significance level at a much lower value, to avoid throwing in
jail too many innocent people.
If the significance level is put to zero, this means that no one who is
innocent is put into jail. However, it also means that all guilty people are
free.
We need to tread a line between the two.
A common p-value misconception
A very common misconception about a p-value is that the p-value is the
probability of the null being true (and 1 - p-value is the probability of the
alternative). This is not true.
A p-value is simply the chance of observing what we do under the null
being true.
This misunderstanding about p-values can have severe consequences in
criminal trials.
For example, a juror in a court may hear something like
`The DNA on the weapon matches the defendant, there is a one in
a million probability this could happen by random chance’.
A misinformed juror may interpret this as ‘there is only a 1 in a million
chance that he is innocent’. This is an incorrect understanding of what the
probability means. Typically, the 1 in a million means that approximately 1 in
a million people have the observed DNA. This no longer appears that
improbable – there are 7 billion people in the world so on average 7000 of
them will match this DNA!
One-sided and two-sided tests
A two-sided test of the population mean has these null and
alternative hypotheses:
H0 : µ = [a specific number µ0] Ha : µ ≠ [a specific number µ0]
The tomato packaging and coffee sales examples were two-sided.
A one-sided test of a population mean has one of these pairs of
null and alternative hypotheses:
H0 : µ ≥ [a specific number µ0] Ha : µ < [a specific number µ0]
OR
H0 : µ ≤ [a specific number µ0] Ha : µ > [a specific number µ0]
Does moderate consumption of red wine increase polyphenol levels?
H0: µ≤ 0 against HA : µ>0. This is a one-sided test.
How to choose?
The choice of a one-sided versus a two-sided test depends on our
purpose for doing the investigation in the first place, as determined
before we perform the test of statistical significance.
When appropriate, one-sided tests are preferable.
A health advocacy group suspects that a cigarette manufacturer sells
cigarettes with a nicotine content higher than what they advertise in order
to keep consumers addicted to their products and thus maintain revenues.
Here, the health advocacy group wants to determine whether the mean
nicotine content of a brand of cigarettes is greater than the advertised
value of 1.4 mg. But they will decide and publicize this only if the
evidence is sufficiently strong to rule out the advertised value.
Thus, this is a one-sided test:
H0 : µ ≤ 1.4 mg
Ha : µ > 1.4 mg
It is important to identify both hypotheses before obtaining the data
or else the idea of “significance” becomes meaningless.
Examples: What is the hypothesis?
Question: A 2008 study reported that 88% of students owned a cell
phone. There has been a recent health scare on cell phone use.
You plan to take a SRS of students to see if the percentage has
decreased.
H0 : µ ≥ 88% against HA : µ < 88%.
Question: It is known that a freshman biology class has mean score
75%. A professor thinks that students who attend early morning
classes have a higher mean score. Her early morning class this
year can be considered as a sample of all students who take an
early morning class. So she compares their average score to the
mean score of 75%.
H0 : µ ≤ 75% against HA : µ > 75%.
More examples
Question: Experiments on learning in animals sometimes measure
how long it takes a mouse to find its way through a maze. The mean
time is 20 seconds for one particular maze. A researcher thinks that
playing loud music will cause the mice to complete the maze more slowly.
She measures how long each of 12 mice takes to get through the
maze with the loud music stimulus.
H0 : µ ≤ 20 against HA : µ > 20.
Question: The price of gasoline has changed. Previously the mean
yearly mileage of a vehicle was 4000 miles. I want to see whether
the mean yearly mileage has changed after the price change.
H0 : µ = 4000 against HA : µ ≠ 4000.
Calculation of p-values for one-sided tests
The calculation of p-values for one-sided tests is almost the same
as the calculation of the p-value for a two-sided test.
As in two-sided tests we make the same z or t-transform.
Once the z or t-transform has been made, we have to take care. We
need to look at the direction in which the alternative arrow is pointing.
If the alternative arrow is pointing to the right, e.g. HA : µ > 20, then the p-value
is the area to the right of the z or t-transform.
Unlike the two-sided test case we do not double the probability.
If the alternative arrow is pointing to the left, e.g. HA : µ < 20, then the p-value
is the area to the left of the z or t-transform. Again, do not double
the probability.
Note that for one-sided tests the p-value can be larger than 50%,
when the sample mean is on the `other’ side of the alternative. For
example, if HA : µ > 20 and we have a sample mean = 19, then the
p-value has to be greater than 50%. From a plot you can easily see
why this is true.
In the next few slides we will get some practice in these ideas.
Recap: P-values in one-sided and two-sided tests
One-sided test
for Ha: µ > µ0, P-value = P(T ≥ t) (if t < 0, do not reject H0)
for Ha: µ < µ0, P-value = P(T ≤ t) (if t > 0, do not reject H0)
Two-sided test
for Ha: µ ≠ µ0, P-value = 2×P(T ≥ |t|)
To calculate the P-value for a two-sided test, use symmetry. Find the
P-value for a one-sided test and double it.
One sided tests: Red Wine and Polyphenols (4 scenarios)
Recall our aim was to see whether the consumption of red wine
increased polyphenol levels. We state this as:
H0 : µ ≤ 0 (mean levels of polyphenols have stayed the same or reduced)
vs
HA : µ > 0 (mean levels of polyphenols have risen)
We obtain the results for the four different scenarios considered at the start of this
chapter. The alternative points to the RIGHT so we need the area to the RIGHT.
Data (9 observations per scenario)
Scenario 1: -0.60, -1.05, -2.09, -1.23, 0.71, -0.53, 0.33, -0.48, -1.42
Scenario 2: 0.06, -0.36, 0.98, 0.82, -0.25, 2.49, -1.34, 1.16, 1.53
Scenario 3: 8.45, 10.18, 10.98, 10.35, 10.75, 8.98, 8.84, 10.38, 9.79
Scenario 4: -0.43, -8.35, -8.31, 26.11, 4.32, 25.02, 9.40, 11.54, 0.71

                          Scenario 1               Scenario 2               Scenario 3              Scenario 4
Null                      µ ≤ 0                    µ ≤ 0                    µ ≤ 0                   µ ≤ 0
X̄                         -0.7                     0.56                     9.86                    6.66
s                         0.87                     1.15                     0.90                    12.6
s.e = s/√9                0.29                     0.383                    0.3                     4.2
t-transform               (-0.7 - 0)/0.29 = -2.41  (0.56 - 0)/0.383 = 1.46  (9.86 - 0)/0.3 = 32.8   (6.66 - 0)/4.2 = 1.58
t8(5%)                    1.86                     1.86                     1.86                    1.86
(-∞, 0 + s.e × t8(5%)]    (-∞, 0.53]               (-∞, 0.72]               (-∞, 0.56]              (-∞, 7.8]
p-value                   97.9%                    9.12%                    ≈ 0%                    7.6%
Result                    Cannot Reject            Cannot Reject            Reject                  Cannot Reject
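The hand calculation for the four scenarios can be checked with a short Python sketch. It is not the StatCrunch procedure: it simply recomputes the sample mean, standard error and t-transform from the listed data and compares the t-transform with the critical value 1.86 quoted above. The names `scenarios`, `t_transform` and `CRITICAL_T` are chosen here for illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Change in polyphenol levels, 9 observations per scenario (from the table).
scenarios = {
    "Scenario 1": [-0.60, -1.05, -2.09, -1.23, 0.71, -0.53, 0.33, -0.48, -1.42],
    "Scenario 2": [0.06, -0.36, 0.98, 0.82, -0.25, 2.49, -1.34, 1.16, 1.53],
    "Scenario 3": [8.45, 10.18, 10.98, 10.35, 10.75, 8.98, 8.84, 10.38, 9.79],
    "Scenario 4": [-0.43, -8.35, -8.31, 26.11, 4.32, 25.02, 9.40, 11.54, 0.71],
}
CRITICAL_T = 1.86  # 5% critical value for 8 degrees of freedom, as quoted in the table


def t_transform(data, mu0=0.0):
    """(sample mean - mu0) divided by the standard error s/sqrt(n)."""
    se = stdev(data) / sqrt(len(data))
    return (mean(data) - mu0) / se


results = {}
for name, data in scenarios.items():
    t = t_transform(data)
    # HA: mu > 0 points RIGHT, so we reject only for large positive t.
    results[name] = "Reject" if t > CRITICAL_T else "Cannot reject"
    print(f"{name}: mean = {mean(data):.2f}, t = {t:.2f}, {results[name]}")
```

Rejecting when t exceeds 1.86 is the same decision as rejecting when the p-value is below 5%.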
The t and p-values for the red wine problem
On the left we give the p-values for each of the scenarios. Remember we need to
calculate the area to the RIGHT of each t-value since the alternative hypothesis is
pointing to the RIGHT (HA : µ > 0).
The 4 red wine examples in Statcrunch
Compare the result done by hand with that done in Statcrunch
Match the standard errors, t-stat and P-values with the results done by hand.
Deducing one-sided results from two-sided
We recall for a given data set and population mean we can do three
different tests. However, the results of each test are connected.
For example, suppose we want to test the hypothesis that red wine decreases
polyphenol levels. Then our hypothesis of interest is
H0 : µ ≥ 0 against HA : µ < 0.
We are given the output from the first data set
This is the result of the test H0 : µ ≤ 0 against HA : µ > 0. The p-value for this test
is 98%, and there is no evidence to reject the null (the sample mean is negative).
However, if we test H0 : µ ≥ 0 against HA : µ < 0, the p-value is the area to the
LEFT of -2.45, which is 100% - 98% = 2%. Therefore, there is evidence to suggest
that µ < 0, hence we can reject the null of this hypothesis.
If we test H0 : µ = 0 against HA : µ ≠ 0, the p-value is 4% and there is evidence to
suggest the mean is not zero.
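The connection between the three p-values is simple arithmetic, sketched below in Python. The function name `related_p_values` is hypothetical, and the sketch assumes a symmetric null distribution (normal or t), as used throughout this chapter.

```python
def related_p_values(p_right):
    """Given the p-value for HA: mu > mu0, return the other two p-values."""
    p_left = 1.0 - p_right                    # areas to the left and right add up to 1
    p_two_sided = 2.0 * min(p_left, p_right)  # double the smaller of the two areas
    return p_left, p_two_sided


# First red wine data set: the test pointing RIGHT has p-value 98%.
p_left, p_two = related_p_values(0.98)
print(f"left: {p_left:.2f}, two-sided: {p_two:.2f}")  # left: 0.02, two-sided: 0.04
```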
Example: Gestational diabetes
Let us return to the example of testing for gestational diabetes.
We will use the data collected to test for gestational diabetes. We
know that a patient has gestational diabetes if the mean glucose level
of the patient is over 140. This means we are testing:
H0: µ ≤ 140 against HA: µ > 140.
Question A patient goes to the doctors. We do not know if she has
gestational diabetes (µ is unknown). The glucose level in her blood samples
is assumed to be normally distributed with σ=4. After taking 4 blood samples
her sample mean is 145. Is there evidence that she has gestational
diabetes?
Answer: We want to see whether she has gestational diabetes, this means
discounting the possibility that she does not have gestational diabetes.
We want to test H0: µ≤140 against the alternative HA: µ > 140.
To do this we need to know the variability in the sample mean, this is
quantified by the standard error = 4/√4 = 2.
Next we have to calculate how far her sample mean is from the mean if
she were healthy: z-transform = (145-140)/2 = 2.5 (we call it a z-transform
rather than a t-transform because we know the standard deviation).
Calculation Practice (cont.)
Since the alternative is pointing to the right, we need to calculate the
probability to the right of 2.5. From the z-tables this is 0.6%.
0.6% is quite small. It says the chance of getting a sample mean of 145
or higher, when the patient does not have gestational diabetes, is 6 in
1000.
As this is quite a small chance and is below the standard α=5% threshold,
we reject the null and conclude that the patient has gestational diabetes.
Thus we refer her for more tests. However, there is always a chance we are
making the wrong decision, since there is a 6 in 1000 chance of
observing this data when she does NOT have gestational diabetes.
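The z-table lookup in this example can be reproduced with Python's standard library. This is a minimal sketch using `statistics.NormalDist`; the variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

sigma, n = 4.0, 4          # known standard deviation, number of blood samples
xbar, mu0 = 145.0, 140.0   # sample mean, boundary value of the null

se = sigma / sqrt(n)           # standard error = 4/sqrt(4) = 2
z = (xbar - mu0) / se          # z-transform = 2.5
p = 1.0 - NormalDist().cdf(z)  # area to the RIGHT of z, since HA: mu > 140

print(f"se = {se}, z = {z}, p-value = {p:.4f}")  # p-value is about 0.6%
```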
Example: Low Potassium
Hypokalemia is diagnosed when the blood potassium level is below
3.5mEq/dl. The potassium in a blood sample varies from sample to
sample and follows a normal distribution with unknown mean but
standard deviation is known to be 0.2. We only `diagnose’ low
potassium when we discount the possibility that the potassium levels
are normal.
Question: State the hypothesis of interest.
Answer: H0 : µ ≥ 3.5 against HA : µ<3.5.
Question: A patient has 9 blood samples taken, his sample mean/average
is 3.4, is there evidence to suggest low potassium (use 5% significance
level)?
Answer: The alternative is pointing LEFT so the p-value is the area to the
left of
z = (3.4 - 3.5)/(0.2/√9) = -0.1/0.067 = -1.5 (s.e = 0.2/√9 ≈ 0.067).
Looking up the z-tables (remember the standard deviation is known) gives the
p-value 6.68%. As this is greater than 5% we cannot reject the null. There is
not enough evidence that he has low potassium.
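The same calculation in Python, as a minimal sketch with the standard library. The alternative points LEFT, so the p-value is the area to the left of z.

```python
from math import sqrt
from statistics import NormalDist

sigma, n = 0.2, 9        # known standard deviation, number of blood samples
xbar, mu0 = 3.4, 3.5     # sample mean, boundary value of the null
alpha = 0.05

se = sigma / sqrt(n)     # standard error = 0.2/3, about 0.067
z = (xbar - mu0) / se    # z-transform, about -1.5
p = NormalDist().cdf(z)  # area to the LEFT of z, since HA: mu < 3.5

print(f"z = {z:.2f}, p-value = {p:.4f}, reject H0: {p < alpha}")
```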
The tests in Statcrunch
By giving Statcrunch the sample mean, standard deviation, sample
size and the hypothesis under investigation, Statcrunch will give us
the p-value and it is our job to understand what it means.
To do this, go to Stats -> T-statistics (if the standard deviation is
estimated from the data, else z-statistics) -> One Sample -> With
summary. In the box you input the sample mean, standard deviation
and sample size. In the next box choose the null of interest and also
the alternative of interest (whether it is a two-sided test or one-sided
test).
You should then get output with the p-values.
Lab practice
Load the calf data into Statcrunch. We want to draw inference about
the mean weight of a newborn calf based on the sample mean of 44
calves.
We first make a histogram of the data, to see if there are any major
deviations from normality.
The distribution of weights at birth does not have an obvious skew or
thick tail. This means that the distribution of the sample mean based on
a sample of 44 will be very close to normal. So we can rest assured
that using the t-distribution (since the standard deviation is
unknown) will be reliable.
Now we construct a 95% confidence interval for the mean. We can
do this by going to Stat -> T-statistics -> One sample -> with data
-> putting Weight W0 into right box, then select Confidence interval
and Calculate. This will give you a 95% confidence interval using the
t-distribution.
[90.85,95.58]
This means with 95% confidence the mean weight of new born calves
should lie in this interval.
We now want to see whether there is evidence to suggest the mean
weight of calves is greater than 90 pounds. I.e. H0: µ ≤ 90 against
HA : µ > 90. We can already see from the confidence interval that the
null seems unlikely, since the whole interval lies above 90. Later in
this chapter we will see how tests and confidence intervals are related.
We can also deduce the p-value in Statcrunch. Again Stat -> T-statistics
-> One sample -> with data -> putting Weight W0 into right
box, then pressing next. Select Hypothesis Test. In the Null box,
enter mean = 90, and choose > as the alternative. Then press
calculate. It will calculate the p-value using the t-distribution with 43
degrees of freedom.
You get the p-value 0.44%. This means, at the 1% level we can reject the
null.
Looking at the 98% CI we see that 90 does not lie in this interval, this fits
with the p-value being less than 1% (we cover this in later slides).
More lab practice
212 earthquakes of magnitude 6.0 or higher were observed in the
period 09/01/10 to 08/31/11. We are interested in the depth of these
earthquakes and whether the average depth (µ) exceeds 50 km.
The summary data are [Stat-Summary Stats-Columns]
The data are quite highly skewed. They may even be bimodal.
However with a sample size of 212 it is reasonable to suppose the
sample mean is close to normal (we can also check using the app).
mean earthquake depth (cont.)
We will get a 95% confidence interval for µ.
First get t* = 1.971 (for df = 211, area to left = .975). [Stat-Calculators-T]
Next, compute the interval.
x̄ ± t* × s/√n = 65.73 ± 1.971 × 125.65/√212 = 65.73 ± 17.01 = (48.72, 82.74).
The average depth is between 48.72km and 82.74km, with 95%
confidence.
Or, from StatCrunch,
[Stat-T Statistics-from data]
mean earthquake depth (cont.)
We also will use α = 5% for a hypothesis test of
H0: µ ≤ 50 versus Ha: µ > 50.
First compute the t-statistic
t = (x̄ − 50)/(s/√n) = (65.73 − 50)/(125.65/√212) = 1.823.
Next, look up the P-value = 0.035. [Stat-Calculators-T]
P-value = 3.5% < α = 5%, so we reject H0 and conclude the average
depth is greater than 50 km.
Or, from StatCrunch,
[Stat-Summary Stats-Columns]
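For readers without StatCrunch, the interval and the P-value can be approximated with plain Python. This is a sketch under stated assumptions: the helpers `t_pdf` and `t_cdf` are hypothetical names, and the t-distribution's CDF is obtained by numerically integrating its density with Simpson's rule rather than by a statistical library. The value t* = 1.971 is taken from the slide.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x): integrate the density from 0 to |x| (Simpson's rule, even steps)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                 # area between 0 and |x|
    return 0.5 + math.copysign(area, x)  # the density is symmetric about 0


n, xbar, s, mu0 = 212, 65.73, 125.65, 50.0
se = s / math.sqrt(n)      # standard error of the sample mean
t_star = 1.971             # df = 211, area to the left = 0.975 (from the slide)
ci = (xbar - t_star * se, xbar + t_star * se)

t = (xbar - mu0) / se      # t-statistic for H0: mu <= 50
p = 1.0 - t_cdf(t, n - 1)  # area to the RIGHT, since Ha: mu > 50

print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"t = {t:.3f}, P-value = {p:.3f}")
```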
Calculation practice: Sweetening colas
A cola manufacturer wants to test how much the sweetness of a new
cola drink is affected by storage. The sweetness loss due to storage
will be evaluated and scored by 10 professional tasters (by comparing
the sweetness before and after storage):
We only want to test if storage results in a mean loss of sweetness.
That is, if µ is the mean change in the tasters’ scores, we are only
concerned with whether the evidence will show that it is negative.
So the hypotheses are:
H0: µ ≥ 0 vs. Ha: µ < 0
Note that these are determined prior to obtaining the data.
This choice will affect how the P-value is calculated.
The next step is to obtain the data and compute the t-statistic.
Taster    Sweetness change
1         −2.0
2         −0.4
3         −0.7
4         −2.0
5          0.4
6         −2.2
7          1.3
8         −1.2
9         −1.1
10        −2.3
sample average = −1.02
standard deviation = 1.196
degrees of freedom = 10 – 1 = 9
t = (x̄ − µ0)/(s/√n) = (−1.02 − 0.00)/(1.196/√10) = −2.697.
The large, negative t indicates substantial
evidence in favor of the alternative hypothesis.
Sweetening colas (continued)
Is there sufficient evidence that storage results in sweetness loss for
the new cola recipe at the 0.05 level of significance (α = 5%)?
H0: µ ≥ 0 versus Ha: µ < 0 (one-sided test)
We have t = −2.70 with 9 df. Since the test is one-sided, only values
less than –2.70 are more in favor of Ha than our results (and thus more
relevant).
So P-value = area to the left of –2.70 = the area to the right of 2.70.
From the t-table: 2.398 < 2.70 < 2.821 thus 0.02 > P-value > 0.01.
Since P-value < α = .05, the result is significant and H0 is rejected.
There is a loss of sweetness, on average, following storage.
The t-score associated with probability 0.05 is the critical value tα =
1.833. This represents the smallest value (in magnitude) for which the
null hypothesis would be rejected at significance level α.
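The cola calculation can be reproduced from the raw scores with Python's standard library. This is a minimal sketch: the critical value 1.833 is the t-table value quoted above, and the variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Sweetness change reported by the 10 tasters.
changes = [-2.0, -0.4, -0.7, -2.0, 0.4, -2.2, 1.3, -1.2, -1.1, -2.3]

n = len(changes)
xbar = mean(changes)              # sample average
s = stdev(changes)                # sample standard deviation (n - 1 in the denominator)
t = (xbar - 0.0) / (s / sqrt(n))  # t-statistic with n - 1 = 9 degrees of freedom

T_CRIT = 1.833        # critical value for 9 df at the 5% level (from the t-table)
reject = t < -T_CRIT  # Ha: mu < 0 points LEFT, so reject for large negative t

print(f"mean = {xbar:.2f}, s = {s:.3f}, t = {t:.3f}, reject H0: {reject}")
```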
Two-sided tests and confidence intervals
There is a close connection between confidence intervals and two-sided
tests. Let us return to the one bed apartment in Dallas example.
10 apartments are randomly sampled. The sample mean and the sample
standard deviation based on this sample are 980 dollars and 250 dollars
respectively (both are estimators based on a sample of size ten). The
standard error is 250/√10 ≈ 79, so the 95% confidence
interval for the mean is [980±2.262×79]=[801,1159].
Suppose we want to know whether the price of apartments has changed
since last year, where the mean price was 850 dollars.
Based on this interval we see that 850 dollars is contained in this interval. This
means the mean could be 850 dollars. Therefore, given the sample, it is unclear
whether the mean price of apartments has changed since last year or not.
We can rewrite the above as a statistical test H0: µ = 850 against HA : µ ≠850.
The t-transform is t = (980-850)/79 = 1.64. Looking at the t-distribution, we
see that 1.64 < 2.262 (this is the t-value corresponding to 9df at 2.5%).
Therefore, the p-value is greater than 5%. Thus we cannot reject the null at the
5% level.
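The agreement between the interval and the test can be checked numerically. The sketch below uses only the numbers quoted above (including the critical value 2.262 for 9 df); the variable names are illustrative.

```python
from math import sqrt

n, xbar, s = 10, 980.0, 250.0  # sample size, sample mean, sample standard deviation
mu0 = 850.0                    # last year's mean price (the null value)
T_STAR = 2.262                 # 9 df, 2.5% in each tail (from the slide)

se = s / sqrt(n)               # standard error, about 79
ci = (xbar - T_STAR * se, xbar + T_STAR * se)
t = (xbar - mu0) / se          # t-transform, about 1.64

in_ci = ci[0] <= mu0 <= ci[1]    # is 850 inside the 95% confidence interval?
cannot_reject = abs(t) <= T_STAR # is the two-sided p-value at least 5%?

print(f"CI = [{ci[0]:.0f}, {ci[1]:.0f}], t = {t:.2f}")
print(f"850 in CI: {in_ci}, cannot reject at 5%: {cannot_reject}")
```

The two checks always agree, because "µ0 is within 2.262 standard errors of x̄" and "|t| is at most 2.262" are the same inequality.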
Further reading:
http://onlinestatbook.com/2/logic_of_hypothesis_testing/sign_conf.html
Summarizing these two observations we see that:
850 lies inside the 95% confidence interval [801,1159].
We are unable to reject the null at the 5% level.
If the mean under the null lies in the 95% confidence interval, then
this implies the corresponding p-value will be greater than 5%.
On the other hand, if the mean under the null does not lie in the
95% confidence interval its p-value will be less than 5%.
This is easily seen with an illustration (see later slides).
If 850 is in an interval centered about 980 (where each side has length
178.7), then 980 must be in the interval centered about 850 with sides of
length 178.7. A few slides earlier we showed that this interval
[850±2.262×79]=[671,1028] corresponded to points where we make a
decision to reject the null or not at the 5% level.
In general, if the mean under the null lies in a (1-α)×100%
confidence interval, then the p-value for a two sided test will be
greater than α.
Confidence intervals and one-sided tests
Consider the polyphenol and red wine example considered in Chapter 6. 15
randomly sampled men were asked to drink red wine every day for two
weeks. Their change in polyphenol levels was measured:
0.7, 3.5, 4.0, 4.9, 5.5, 7.0, 7.4, 8.1, 8.4, 3.2, 0.8, 4.3, -0.2, -0.6, 7.5.
The average change is 4.3 and sample standard deviation is 3.06.
Review: Two-sided tests and confidence intervals
The 95% confidence interval for the change in polyphenol levels is
[2.6,5.99]. This means if I am testing the hypothesis H0:µ = 0 against the
alternative HA: µ ≠ 0, since 0 is not in the interval the p-value is less than
100% – 95% = 5%.
The 99% confidence interval for the change in polyphenol levels is
[1.94,6.66]. This means if I am testing the hypothesis H0:µ = 0 against
the alternative HA: µ ≠ 0, since 0 is not in the interval the p-value is less
than 100% – 99% = 1%.
One Sided test (pointing RIGHT) Suppose we are testing that polyphenol
levels increase. This means testing the hypothesis H0:µ ≤ 0 against the
alternative HA: µ > 0. The p-value is the area to the right of 4.3 (see that the
alternative is pointing to the right). From above we have deduced that
in the two-sided test the p-value is less than 5%, so for the one-sided
test the p-value is less than 2.5%.
Why?
Recall the p-value for two-sided tests is the smallest area to the left/right
of the t-transform times 2. In this case it is the area to the right of 4.3 times 2.
For the two-sided test we have deduced that the p-value is
less than 5%, this implies that the area to the RIGHT of
4.3 is less than 5/2 = 2.5%.
The p-value for the one-sided test pointing to the RIGHT
is the area to the right of 4.3. We have just shown that the
area to the right of 4.3 is less than 2.5%. Thus the p-value
for the one-sided test pointing to the RIGHT is less than
2.5%.
One Sided test (pointing LEFT) Suppose we are testing that polyphenol
levels decrease. This means testing the hypothesis H0:µ ≥ 0 against the
alternative HA: µ < 0. Since 0 is not in the 95% confidence interval and the
sample mean 4.3 lies above 0, the p-value is greater than 97.5% (there is no
evidence to reject the null, which is clear since 4.3 lies within the null hypothesis).
Why?
On the previous slide we showed that the p-value for the hypothesis pointing to
the RIGHT is less than 2.5% - the area to the RIGHT of 4.3 is less than
2.5%. The p-value for the test pointing to the LEFT is the area to the LEFT of
4.3. Which has to be greater than 97.5% (since the area to the left plus the area
to the right is 100%).
But this is obvious. The point of a test is
to see how plausible the data is under the
null. If the sample mean is 4.3 and the
null is that the true mean is greater than
or equal to 0, this is highly plausible! If
this is highly plausible we cannot reject
the null.
Illustration, mean in confidence interval
Illustration, mean not in confidence interval
Example 1: CI and testing
Scientists want to understand whether Omega 3 supplements increase
the IQ of people. They randomly sampled 30 people (who previously did
not take any supplementation), took their IQ before the experiment and
asked them to take a daily 1000mg dose of EPA/DHA Omega 3. After
two months they measured the IQ again. They took the difference
between the current IQ (after supplementation) and previous IQ (before
supplementation) and evaluated the average, which was x̄ = 7 (so for
this group there was an overall increase, but we do not know without
statistics whether this is by chance). The 95% CI for the mean change
was [-1,15].
Question We want to test the hypothesis H0:µ =0 against HA: µ ≠ 0,
what are the results of the test using the 5% significance level?
Answer Because 0 is in the 95% CI interval [-1,15], the p-value for the
two sided test is greater than 5% so there is not enough evidence to
reject the null.
Question We want to test the hypothesis H0:µ ≤0 against HA: µ > 0 (in fact
this is really the hypothesis of interest as it asks whether Omega 3 on
average increases IQ) what are the results of the test using the
5% significance level?
Answer In this case, the p-value is the area to the RIGHT of 7. This is
the smallest area. We know that the p-value for the two-sided test is
greater than 5%. This means the p-value for this one-sided test
is greater than 2.5%, so we would NOT be able to reject the null if we did
the test at the 1% level. However, it is unknown whether the p-value is
less than 5%, so we do not know whether or not we can reject the null at the
5% level. Further calculations need to be done to determine the p-value
in this case.
Question We want to test the hypothesis H0:µ ≥0 against HA: µ < 0 (the
hypothesis of interest asks whether Omega 3 on average decreases IQ)
what are the results of the test using the 5% significance level?
Answer Since the average x̄ = 7 lies within the region specified by the null
(µ≥0), there is no evidence to reject the null.
Example 2: CI and testing
Scientists want to understand whether Omega 3 supplements increase the IQ
of people. This time they randomly sampled 100 people (who previously
did not take any supplementation), took their IQ before the experiment
and asked them to take a daily 1000mg dose of EPA/DHA Omega 3.
After two months they measured the IQ again. They took the difference
between the current IQ (after supplementation) and previous IQ (before
supplementation) and evaluated the average, which was x̄ = 6.5.
The 95% CI for the mean change was [2.11,10.88].
Question We want to test the hypothesis H0:µ =0 against HA: µ ≠ 0,
what are the results of the test using the 5% significance level?
Answer Because 0 is not in the 95% CI interval [2.11,10.88], the area
to the RIGHT (we need the smallest area) of 6.5 will be LESS than
2.5%. Thus the p-value is less than 5% and we can reject the null at
the 5% level.
Question We want to test the hypothesis H0:µ ≤0 against HA: µ > 0 (in
fact this is really the hypothesis of interest as it asks whether Omega 3
on average increases IQ) what are the results of the test using
the 5% significance level?
Answer In this case, the p-value is the area to the RIGHT of 6.5, which we
know from the two-sided test is LESS than 2.5%. This means we can
reject the null at the 5% level.
Question We want to test the hypothesis H0:µ ≥0 against HA: µ < 0 (the
hypothesis of interest asks whether Omega 3 on average decreases IQ)
what are the results of the test using the 5% significance level?
Answer Since the average x̄ = 6.5 lies within the region specified by the null
(µ≥0), there is no evidence to reject the null.
It is important to observe that the p-values for both the one-sided tests
will always add up to one.
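The reasoning used in Examples 1 and 2 can be written down as a small rule. The Python sketch below is only an illustration: the function name `p_value_bounds` and its return format are hypothetical, and it assumes the interval is symmetric about the sample mean and that the sample mean differs from the null value.

```python
def p_value_bounds(ci_low, ci_high, mu0, xbar, level=0.95):
    """Bounds on the p-values of the three tests of mu0, read off a CI.

    Returns a dict mapping "two-sided", "right" (HA: mu > mu0) and
    "left" (HA: mu < mu0) to a pair (relation, bound), e.g. ("<", 0.025).
    """
    alpha = round(1.0 - level, 10)
    inside = ci_low <= mu0 <= ci_high
    relation = ">" if inside else "<"
    # One-sided test pointing towards the sample mean: half the two-sided p-value.
    towards = (relation, alpha / 2)
    # One-sided test pointing away from the sample mean: one minus the above.
    away = (">", 0.5) if inside else (">", 1 - alpha / 2)
    right, left = (towards, away) if xbar > mu0 else (away, towards)
    return {"two-sided": (relation, alpha), "right": right, "left": left}


print(p_value_bounds(-1, 15, 0, 7))         # Example 1
print(p_value_bounds(2.11, 10.88, 0, 6.5))  # Example 2
```

For Example 1 the rule only says the p-value of the test pointing RIGHT is greater than 2.5%, which is why further calculation is needed to decide at the 5% level.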
What is wrong with the following?
A random sample of size 30 is taken from a population that is
assumed to have a standard deviation of 5. The standard deviation
of the sample mean (standard error) is 5/30.
Recall, the standard error is 5/√30.
A study where the sample mean is 45 reports statistical
significance (p-value less than α) for H0: µ≤ 55 against HA : µ> 55.
This is an example where you need to consider which way the one-sided
test is pointing. It is clear that with a sample mean of 45, there is no
evidence whatsoever to reject the null. In this case the p-value will be
greater than 50%.
Why? Because the p-value is the area to the right of 45 with the
mean centered at 55. If you make a plot, it is clear to see that the p-value
is greater than 50%.
A test rejected the null hypothesis that the sample mean was equal
to 50.
Hypotheses are always about the population mean, which is
unobserved. It makes no sense to state the hypotheses in terms of the
sample mean which is observed.
A test preparation company wants to test that the average score of
their students on the ACT is better than the national score of 21.5.
They state their alternative hypothesis as HA: µ> 21.5. The z-value
is equal to 0.018. Because this is less than the significance
level 5%, the null hypothesis is rejected.
This is an example where the z-transform has been mistaken for a
probability! We need to deduce the probability (which is the p-value) by
looking up 0.018 in the z-tables. This turns out to be 49% (0.018 is the
number of standard deviations the sample mean is from 21.5). Since it is
so close to the mean, it is clear that the p-value will be just below 50% and
we cannot reject the null.
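The difference between the z-value and the p-value is easy to see numerically. A minimal sketch with Python's standard library:

```python
from statistics import NormalDist

z = 0.018                      # z-transform: standard errors above 21.5
p = 1.0 - NormalDist().cdf(z)  # p-value: area to the RIGHT of z (HA: mu > 21.5)

print(f"z = {z}, p-value = {p:.3f}")  # the p-value is about 49%, not 1.8%
```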
How reliable are these p-values?
Remember the p-values we have calculated so far always use the
normal or t-distribution (depending on whether the population standard
deviation is known or not).
Underlying these calculations is the assumption that the sample mean is
normally distributed (remember we always make a plot of the normal
distribution and center it about the mean under the null).
If the sample size is not large enough, so the central limit
theorem has not `kicked-in’, then the sample mean won’t be
normally distributed. This means the probabilities we have
calculated won’t be reliable – just like the 95% CI for the mean won’t
really be a 95% confidence interval.
In this case we must be cautious in interpreting the results of the test. If
the p-value is extremely small (say 0.0001), it would be small even if the
correct distribution of the sample mean were used. On the other hand, if
the p-value is close to the 5% significance level we need to be careful about
its statistical significance.
Example: Siblings
The university is interested in the (population) mean number of
younger siblings a student has at the university (in the hope that they
will attend the university). They believe that the mean is greater
than 0.25. To test this hypothesis, H0: µ≤ 0.25 against HA: µ> 0.25,
they randomly sample 3 students and ask them how many siblings they
have; they answer 0, 1, 3. The sample mean is 1.33 and the sample
standard deviation is 1.53.
Question: What are the conclusions of the test at the 10% level and
comment on the reliability of the result.
Answer: The t-transform is t = (1.33-0.25)/(1.53/√3) = 1.22. Using the t-tables
(with 2df) we see that the area to the right of 1.22 lies somewhere between
15% and 20%. Since the alternative hypothesis is pointing RIGHT this means
the p-value is between 15% and 20%. As this is greater than 10%, we cannot
reject the null at the 10% level.
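The t-table bracket can be checked exactly, because the t-distribution with 2 degrees of freedom has a closed-form CDF, F(x) = 1/2 + x/(2√(x² + 2)). The sketch below uses that formula; the helper name `t2_cdf` is chosen here for illustration.

```python
from math import sqrt
from statistics import mean, stdev

siblings = [0, 1, 3]  # answers of the 3 sampled students
mu0 = 0.25            # boundary value of the null

n = len(siblings)
t = (mean(siblings) - mu0) / (stdev(siblings) / sqrt(n))  # t-transform, 2 df


def t2_cdf(x):
    """CDF of the t-distribution with 2 degrees of freedom (closed form)."""
    return 0.5 + x / (2.0 * sqrt(x * x + 2.0))


p = 1.0 - t2_cdf(t)   # area to the RIGHT of t, since HA: mu > 0.25

print(f"t = {t:.2f}, p-value = {p:.3f}, reject at 10%: {p < 0.10}")
```

This number is only as trustworthy as the normality assumption behind it.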
Now we comment on the reliability of this p-value. In HW9, Q1 we made a
plot of the sample mean (based on size 3) for younger sibling numbers.
The distribution of the sample mean is the lowest plot on the left, this is clearly not
normal (see also the corresponding QQplot). This means that the p-value is not
correct, it is based on normality when the sample mean is not normal.
This means we have to be very careful when we interpret this p-value.
We recall if the sample size is larger (in Q2, Quiz 9 we looked at sample size n =
150), then the sample mean is close to normal and the corresponding p-value will be
closer to the truth (as if it came from the true distribution of the sample mean).
Accompanying problems associated with this Chapter
Quiz 9
Quiz 9 part 2
Quiz 10
Quiz 11
Part of Homework 4
Homework 5 (Q1-Q6).