STAT 205 slides - University of South Carolina

Elementary Statistics for the
Biological and Life Sciences
STAT 205
University of South Carolina
Columbia, SC
© 2011, University of South Carolina. All rights reserved, except where previous rights
exist. No part of this material may be reproduced, stored in a retrieval system, or
transmitted in any form or by any means — electronic, mechanical, photoreproduction,
recording, or scanning — without the prior written consent of the University of South
Carolina.
Chapter 5:
Sampling Distributions
Selected tables and figures from Samuels, M. L., and Witmer, J. A., Statistics for the
Life Sciences, 3rd Ed. © 2003, Prentice Hall, Upper Saddle River, NJ. Used by permission.
STAT205 – Elementary Statistics for the Biological and Life Sciences
97
Sampling Variability

Question: If Y is random, say Y ~ N(µ, σ²),
and we take a random sample, Y1,Y2,…,Yn,
aren’t the Yi’s also random?

And, if the Yi’s are random, aren’t any
statistics based on them, such as Ȳ
or S², random too?

This is known as SAMPLING VARIABILITY.
Sampling Distributions

A sample statistic has its own probability
dist’n, called the SAMPLING DISTRIBUTION
of the statistic.

Think of it as repeatedly taking a new
sample from the same popl’n and finding
each sample mean, ad infinitum.
• What will the probab. histogram/density
function of the sample mean look like?

The textbook calls this a Meta-Experiment.
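A minimal sketch of the meta-experiment, assuming a N(500, 120²) population and n = 4 (values chosen only for illustration; the function name is hypothetical). It draws many samples and collects each sample mean, so the list of means approximates the sampling distribution:

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Draw `reps` samples of size n from N(mu, sigma^2); return the sample means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
            for _ in range(reps)]

# Illustrative (assumed) population values, not taken from a data set.
means = simulate_sample_means(mu=500, sigma=120, n=4, reps=5000)

# The means should center near mu, with spread near sigma / sqrt(n) = 60.
print(statistics.fmean(means), statistics.stdev(means))
```

A histogram of `means` would be the simulated counterpart of the sampling distribution.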
Binary Data

Recall that for Y ~ Bin(n,p) we can
estimate p if it is unknown using the
SAMPLE PROPORTION:
p̂ = Y/n

Since Y is random, so is this statistic.
What is the sampling dist’n of p̂?
Example 5.4
Ex. 5.4: Y = # of people with 20/15 vision
(“superior”).
Say n = 2. We are given P{superior} = 0.3.

Let p̂ = Y/n. What are its possible values?
Clearly, Y = 0, 1, or 2, so p̂ = 0, 1/2, or 1. Thus, e.g.,
P{p̂ = 1/2} = P[Y = 1]
= 2C1 (.3)^1 (.7)^1
= (2)(.3)(.7) = .42
Example 5.4 (cont’d)
Sampling dist’n of p̂:
j              0      1      2
p̂ = j/2        0      1/2    1
P(Y = j)       .49    .42    .09
P(p̂ = j/2)     .49    .42    .09
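The table can be reproduced for any n with the binomial formula. This is a sketch only; `phat_distribution` is a hypothetical helper name:

```python
from math import comb

def phat_distribution(n, p):
    """Map each possible value j/n of the sample proportion to P(Y = j), Y ~ Bin(n, p)."""
    return {j / n: comb(n, j) * p**j * (1 - p)**(n - j) for j in range(n + 1)}

dist = phat_distribution(2, 0.3)
for value, prob in dist.items():
    print(value, round(prob, 2))
```

Calling `phat_distribution(10, 0.3)` gives the n = 10 case without the hand calculation.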
Large-Sample Dist’n

Example 5.4 gives the sampling dist’n at
n = 2. The effort gets harder as n
increases. (Try it at n = 10….)

Fig. 5.5 shows the effect at larger n:
Continuous Data

Facts: Given a random sample, Y1,Y2,…,Yn,
where E(Yi) = µ and E[(Yi – µ)²] = σ², then
(i) the POPL'N MEAN of Ȳ is E(Ȳ) = µ
(ii) the POPL'N VARIANCE of Ȳ is σ²[Ȳ] = σ²/n
(iii) the POPL'N SD of Ȳ is σ[Ȳ] = σ/√n
Notice: same popl’n mean,
while SD ↓ as n ↑.
Distribution of the
Sample Mean

If Yi ~ i.i.d. N(µ, σ²) for i = 1,…,n, then
Ȳ ~ N(µ, σ²/n)
Once again:
• Same mean
• SD ↓ as n ↑
• So, more precision as n ↑
Example 5.9

Ex. 5.9: Y = weight of seeds ~ N(500, 14400).

Suppose n = 4. Since Y is normal, so is the
sample mean:
Ȳ ~ N(500, 14400/4) = N(500, 3600)
And so, Z = (Ȳ – 500)/√3600 = (Ȳ – 500)/60 ~ N(0,1)
Example 5.9 (cont’d)
So, e.g.,
P[Ȳ > 550] = P[(Ȳ – 500)/√3600 > (550 – 500)/√3600]
= P[Z > 50/60] = P[Z > 0.83]
= 1 – P[Z < 0.83]
= 1 – .7967
= 0.2033
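The same tail probability can be checked by computer. A sketch using Python's standard-library normal distribution (the function name is hypothetical):

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_exceeds(mu, sigma2, n, cutoff):
    """P[Ybar > cutoff] when Ybar ~ N(mu, sigma2 / n)."""
    ybar_dist = NormalDist(mu, sqrt(sigma2 / n))
    return 1 - ybar_dist.cdf(cutoff)

p = prob_mean_exceeds(mu=500, sigma2=14400, n=4, cutoff=550)
print(round(p, 4))
```

The result is close to 0.20; it differs slightly from the table-based 0.2033 because the code does not round Z = 50/60 to 0.83.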
CLT

Theorem: The CENTRAL LIMIT THEOREM
states that for any i.i.d. random sample,
Y1,Y2,…,Yn, where E(Yi) = µ and E[(Yi – µ)²] = σ²,
Ȳ is approximately N(µ, σ²/n)
as n → ∞.

This is approximately true for any finite n,
and the approximation improves as n → ∞.
(A powerful tool!)
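A small simulation sketch of the CLT at work, assuming a strongly right-skewed Exponential(1) population (an illustrative choice, not from the textbook). It measures the skewness of the simulated sample means, which should shrink toward 0 (the normal value) as n grows:

```python
import random
from statistics import fmean

def skewness(xs):
    """Sample skewness m3 / m2^1.5 (0 for a symmetric distribution)."""
    m = fmean(xs)
    m2 = fmean((x - m) ** 2 for x in xs)
    m3 = fmean((x - m) ** 3 for x in xs)
    return m3 / m2 ** 1.5

def sample_means(n, reps=4000, seed=1):
    """Means of `reps` samples of size n from an Exponential(1) population."""
    rng = random.Random(seed)
    return [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

skew_small = skewness(sample_means(n=2))    # still clearly skewed
skew_large = skewness(sample_means(n=32))   # much closer to symmetric
print(skew_small, skew_large)
```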
CLT and Sample Size
Sometimes, the CLT kicks in after only a few
observations (small n).
But, sometimes we need a very large n:

Example 5.13
Ex. 5.13: Y = # eye facets in fruit fly.
• Clearly Y is a count and can’t be exactly
normal (see the idealized plot in Fig. 5.13).
• But, by about n = 32 we’re close to normal:
Chapter 6: Estimation and
Confidence Intervals
Selected tables and figures from Samuels, M. L., and Witmer, J. A., Statistics for the
Life Sciences, 3rd Ed. © 2003, Prentice Hall, Upper Saddle River, NJ. Used by permission.
Unbiased Estimation

Parameters such as µ or p are usually
unknown, and we use the sample data to
estimate them.

DEF’N: If an estimator θ̂ of an unknown
parameter θ has the property that E(θ̂) = θ
we say it is an UNBIASED ESTIMATOR.
(A BIASED estimator is not unbiased.)

For instance, we know E(Ȳ) = µ, so Ȳ is
unbiased for µ.
Standard Error

DEF’N: The STANDARD ERROR of a point
estimator is the estimated SD (the square
root of the variance) of the estimator:
SE(θ̂) = √[Variance(θ̂)]

DEF’N: The STANDARD ERROR OF THE
MEAN (SEM) is the estimated SD of the
sample mean:
SE(Ȳ) = √(S²[Ȳ]) = √(SY²/n) = SY/√n
Examples 6.1-6.2


Ex. 6.1-6.2: Y = stem length of soybean
plants (cm). n = 13:
We find Ȳ = 21.34 cm and S² = 1.486
so SE(Ȳ) = √(1.486/13) = 1.22/√13 = 0.338 cm
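The SEM arithmetic as a short sketch (hypothetical function name; inputs are the sample variance and sample size from the example):

```python
from math import sqrt

def standard_error_of_mean(s2, n):
    """SEM = sqrt(S^2 / n), the estimated SD of the sample mean."""
    return sqrt(s2 / n)

sem = standard_error_of_mean(s2=1.486, n=13)
print(round(sem, 3))  # 0.338
```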
SE vs. SD

DO NOT confuse the SE with the SD !

In Ex. 6.2, the SD of the sample was
S = √1.486 = 1.22,
but the SEM was 1.22/√13 = 0.34.
(Usually, we round SEM to 2 signif. digits.)

Notice here again that as n ↑, SEM ↓
⇒ more precision in larger samples.
Interval Estimation

DEF’N: A 1 – α CONFIDENCE INTERVAL
for an unknown parameter θ is any pair of
statistics L(Y1,…,Yn) and U(Y1,…,Yn) that
satisfy
P{L(Y1,…,Yn) < θ < U(Y1,…,Yn)} = 1 – α
for some fixed 0 < α < 1.

Notice the use of statistics based on the
sample data to build the conf. limits L & U.
Student’s t-Distribution

For Ȳ ~ N(µ, σ²/n) and S² = [1/(n – 1)] ∑ (Yi – Ȳ)² (sum over i = 1,…,n),
the sampling distribution of
T = (Ȳ – µ)/(S/√n)
is called a (Student's) t DISTRIBUTION
with degrees of freedom (df) equal to n – 1.

NOTATION: T ~ t(n–1)
t Density Curve
The t-dist’n density curve is
(i) continuous over – ∞ < t < ∞
(ii) symmetric about t = 0
(iii) unimodal, and hence “bell-shaped”
(iv) slightly heavier in the tails than N(0,1)
Upper-α t Critical Point

DEF’N: The UPPER-α CRITICAL POINT
from T ~ t(n–1) is the value tα such that
P{T > tα} = α.

Find tα for given df by:
• reading from the rows of Table 4 (p. 677);
also see textbook’s back inside cover
• Using a TI-84 t calculator or R

Notice that at df = ∞ we recover zα (bottom row of
Table 4)
(Portion of) Table 4, p.677
Figure 6.8
Upper-α t critical point at α = 0.025:
Confidence Interval for µ

DEF’N: A 1 – α CONFIDENCE INTERVAL
FOR µ when Yi ~ i.i.d. N(µ, σ²) is
Ȳ – tα/2 S/√n < µ < Ȳ + tα/2 S/√n
or, simply Ȳ ± tα/2 S/√n
where tα/2 has df = n – 1.

We will construct these by hand, but
they’re available via the TI-84 and R too.
Example 6.6
Ex. 6.6 (6.1 cont’d): Y = soybean stem length.
We had Ȳ = 21.34 and S² = 1.486. So, a
95% confidence interval on µ is
Ȳ ± t.05/2 S/√n = 21.34 ± t.025 √(1.486/13)
= 21.34 ± (2.179)(0.3381) = 21.34 ± 0.7367
(use Table 4 with df = 12),
i.e., 20.60 < µ < 22.08 cm.
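A sketch of the same interval in code. The critical point is passed in by hand (2.179, read from Table 4 at df = 12), since Python's standard library has no t-quantile function; `t_interval` is a hypothetical name:

```python
from math import sqrt

def t_interval(ybar, s2, n, t_crit):
    """Return (lower, upper) for ybar ± t_crit * sqrt(s2 / n)."""
    moe = t_crit * sqrt(s2 / n)
    return ybar - moe, ybar + moe

# t_crit = 2.179 is the tabulated t(.025) value at df = 12.
lo, hi = t_interval(ybar=21.34, s2=1.486, n=13, t_crit=2.179)
print(round(lo, 2), round(hi, 2))  # 20.6 22.08
```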
Confidence isn’t Probability!

NOTE: Confidence is NOT probability. A
result such as P{20.6 < µ < 22.1} is either
zero, or one (it either does occur, or it does
not).

Confidence has a “frequentist” interpretation: if α = 0.05, then we expect 95% of all
intervals to “cover.” But, any single one of
them either does cover, or it doesn’t.
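The frequentist reading can be illustrated by simulation. This sketch assumes a normal population with µ = 21.0 and σ = 1.2 (made-up values) and n = 13, builds many 95% t-intervals with the tabulated critical point 2.179, and counts how often they cover µ:

```python
import random
from math import sqrt
from statistics import fmean, stdev

def coverage(mu, sigma, n, t_crit, reps=4000, seed=2):
    """Fraction of simulated t-intervals ybar ± t_crit * S / sqrt(n) that contain mu."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(reps):
        y = [rng.gauss(mu, sigma) for _ in range(n)]
        moe = t_crit * stdev(y) / sqrt(n)
        if abs(fmean(y) - mu) < moe:
            covered += 1
    return covered / reps

# mu and sigma are assumed for illustration; t_crit is t(.025) at df = 12.
rate = coverage(mu=21.0, sigma=1.2, n=13, t_crit=2.179)
print(rate)  # should be near 0.95
```

Each individual interval either covers µ or misses it; only the long-run fraction is about 95%.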
Figure 6.10
Frequentist interpretation of coverage:
Margin of Error

DEF’N: The MARGIN OF ERROR (MoE) of a
1 – α conf. interval is the half-width of the
interval.
• Notice that the MoE depends upon α.

Ex. 6.6 (cont’d): At α = .05, the MoE of the
conf. interval on mean stem length is
(22.08 – 20.60)/2 = 1.48/2 = 0.74. Notice
here that this is simply
tα/2 S/√n = 0.7367
C.L.T. Intervals
If Yi is NOT N(µ, σ²), but if n is still large,
the CLT may apply, at which point
Ȳ ± tα/2 S/√n,
where tα/2 has df = n–1, is still a valid 1 – α
conf. interval, if only approximately so for
finite n.
Binomial Data

When Y ~ Bin(n,p) and p is unknown, we
can build conf. intervals for it as well.

We use the same structure as the t-interval:
estimator ± (critical point)(std. error).

For p (and ONLY for building confidence
intervals) we use the point estimator
p̃ = [Y + (1/2)(zα/2)²] / [n + (zα/2)²]
Agresti-Coull Conf. Intervals

DEF’N: When Y ~ Bin(n,p), the AGRESTI-COULL
CONFIDENCE INTERVAL for p is
p̃ ± zα/2 SE(p̃)
where SE(p̃) = √{ p̃(1 – p̃) / [n + (zα/2)²] }
This “AC” interval is recommended for
n ≥ 40.
Wald Conf. Intervals

NOTE: for Y ~ Bin(n,p), some previous
authors use the “Wald interval”:
p̂ ± zα/2 SE(p̂)
where p̂ = Y/n and
SE(p̂) = √[p̂(1 – p̂)/n]

The value p̂ is a good estimator for p, but
the confidence interval is substandard.
⇒ DO NOT USE the Wald interval.
Small-Sample Conf. Intervals
For n < 40, the AC interval is not recommended. Many alternatives exist, including:
• Clopper-Pearson “exact” intervals
• Casella’s refined C-P intervals
• F-based “exact” intervals
• Likelihood-ratio intervals
• Jeffreys’ equal-tailed Bayesian intervals
• Wilson’s continuity-corrected (WCC) intervals

WCC Conf. Intervals
For practical use when n < 40, the Wilson
Continuity-Corrected (WCC) interval can be
recommended:
{ (Y ± 1/2) + (1/2)(zα/2)² ± zα/2 √[ (Y ± 1/2) – (1/n)(Y ± 1/2)² + (1/4)(zα/2)² ] } / [n + (zα/2)²]
(take every ± as + for the upper limit and as – for the lower limit).
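Because the WCC formula is easy to mis-key by hand, a sketch in code may help. It assumes the usual reading of the ± signs (all minus for the lower limit, all plus for the upper limit) and clips the limits to [0, 1]; the function name and the data y = 6, n = 20 are hypothetical:

```python
from math import sqrt

def wcc_interval(y, n, z):
    """Wilson continuity-corrected limits for p, given count y out of n and critical point z."""
    def limit(sign):
        yc = y + sign * 0.5                      # continuity-corrected count
        inside = yc - yc * yc / n + z * z / 4    # quantity under the square root
        root = sqrt(max(inside, 0.0))            # guard against tiny negative values
        return (yc + z * z / 2 + sign * z * root) / (n + z * z)
    lower = max(0.0, limit(-1))
    upper = min(1.0, limit(+1))
    return lower, upper

# Hypothetical small-sample data: 6 successes in 20 trials, 90% confidence (z = 1.645).
lo, hi = wcc_interval(y=6, n=20, z=1.645)
print(round(lo, 3), round(hi, 3))
```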
Example 6.18

Ex. 6.18: Y = # left-handed UK college
students. n = 400. Find y = 40.

The sample proportion here is
p̂ = Y/n = 40/400 = 0.1.
For a 90% conf. interval on p, use AC. At
α = 0.10, we need
(zα/2)² = (z0.10/2)² = (z0.05)² = (1.645)² = 2.706

Example 6.18 (cont’d)
The AC point estimator is
p̃ = [Y + (1/2)(zα/2)²] / [n + (zα/2)²]
= [40 + (1/2)(z0.10/2)²] / [400 + (z0.10/2)²]
= (40 + 1.645²/2) / (400 + 1.645²)
= 41.353/402.706
= 0.103.
Example 6.18 (cont’d)
The MoE is
MoE = zα/2 √{ p̃(1 – p̃) / [n + (zα/2)²] }
= z0.10/2 √{ (.103)(.897) / [400 + (1.645)²] }
= (1.645) √.000229
= (1.645)(0.015)
= 0.025.
Example 6.18 (concluded)
And so, collecting all the components
together, the 90% AC interval for p is
p̃ ± MoE
= 0.103 ± (1.645)(0.015) = 0.103 ± 0.025
or 0.078 < p < 0.128.
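The whole AC calculation can be collected into one short function. This is a sketch with a hypothetical function name; the normal critical point comes from the standard library rather than a table:

```python
from math import sqrt
from statistics import NormalDist

def agresti_coull(y, n, conf=0.95):
    """Return (p_tilde, lower, upper) for the Agresti-Coull interval."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # upper-(alpha/2) normal critical point
    p_tilde = (y + z * z / 2) / (n + z * z)
    se = sqrt(p_tilde * (1 - p_tilde) / (n + z * z))
    return p_tilde, p_tilde - z * se, p_tilde + z * se

p_tilde, lo, hi = agresti_coull(y=40, n=400, conf=0.90)
print(round(p_tilde, 3), round(lo, 3), round(hi, 3))  # 0.103 0.078 0.128
```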
Sample Size Specification

We can use the AC interval to design a
future study with BInS data, by making a
sample size specification.

Suppose we want a 1 – α AC interval to
contain p with a MoE no larger than, say,
eo > 0. (So, the interval’s width is ≤ 2eo.)
AC Sample Size
Set MoE ≤ eo. But MoE = zα/2 SE(p̃), so
SE(p̃) ≤ eo/zα/2
⇒ √{ p̃(1 – p̃) / [n + (zα/2)²] } ≤ eo/zα/2
⇒ p̃(1 – p̃) / [n + (zα/2)²] ≤ eo²/(zα/2)²
⇒ [n + (zα/2)²] / [p̃(1 – p̃)] ≥ (zα/2)²/eo²
⇒ n + (zα/2)² ≥ (zα/2)² p̃(1 – p̃)/eo²
So
n ≥ (zα/2)² p̃(1 – p̃)/eo² – (zα/2)².
AC Sample Size (cont’d)

One problem: we don’t know p

Solution: substitute an initial “guess” at
p, say, po. This gives the AC sample size
n ≥ (zα/2)² po(1 – po)/eo² – (zα/2)².

If po is unknown use po = 0.5, which is
always a conservative solution (cf. Fig.
6.17).
Example 6.20

Ex. 6.20 (modified): Suppose the expt. on
left-handed college students in Ex. 6.18
was only preliminary.

We want better accuracy than given in that
study. Say we want SE(p̃) ≤ 0.01 in a new
90% conf. interval.

Since eo = zα/2 SE(p̃) = z0.05 SE(p̃)
this translates to eo ≤ (1.645)(.01) = .01645.
Example 6.20 (cont’d)

We saw previously that the sample proportion was 40/400 = 0.1, so set po = 0.1.

Then, we have
n ≥ (1.645)²(0.1)(0.9)/(0.01645)² – (1.645)²
or simply n ≥ 897.3. So, use n = 898.

(If we work in ignorance, set po = 0.5. This
leads to n ≥ 2498.)
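The sample size rule as a sketch in code. The function name is hypothetical, and the critical point is passed in as the tabulated 1.645 so the results match the hand calculation:

```python
from math import ceil

def ac_sample_size(p0, e0, z):
    """Smallest integer n with n >= z^2 * p0 * (1 - p0) / e0^2 - z^2."""
    return ceil(z**2 * p0 * (1 - p0) / e0**2 - z**2)

n_guess = ac_sample_size(p0=0.1, e0=0.01645, z=1.645)         # using the pilot estimate
n_conservative = ac_sample_size(p0=0.5, e0=0.01645, z=1.645)  # working in ignorance
print(n_guess, n_conservative)  # 898 2498
```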
Chapter 7: Inferences for Two
Independent Samples
Selected tables and figures from Samuels, M. L., and Witmer, J. A., Statistics for the
Life Sciences, 3rd Ed. © 2003, Prentice Hall, Upper Saddle River, NJ. Used by permission.
Two-Sample Setting
Popl’n 1 parameters: µ1, σ1        Sample 1 (of size n1) statistics: ȳ1, s1
Popl’n 2 parameters: µ2, σ2        Sample 2 (of size n2) statistics: ȳ2, s2
Estimating µ1 – µ2

We estimate µ1 – µ2 via Ȳ1 – Ȳ2

Why? Recall rule E1 from Sec. 3.7:
E(aX + bY) = aE(X) + bE(Y).

Here, this is
E(Ȳ1 – Ȳ2) = (1)E(Ȳ1) + (–1)E(Ȳ2)
= (1)µ1 + (–1)µ2
= µ1 – µ2
⇒ unbiased!
Standard Error

To find the SE of Ȳ1 – Ȳ2 use rule E4 (for independent X and Y):
σ²[X – Y] = σ²[X] + σ²[Y].

Here, we have
σ²[Ȳ1 – Ȳ2] = σ²[Ȳ1] + σ²[Ȳ2] = σ1²/n1 + σ2²/n2
From this, the SE is
SE(Ȳ1 – Ȳ2) = √(S1²/n1 + S2²/n2)
(replace unknown σ² with S²)
Examples 7.3-7.4
Ex. 7.3-7.4: Y = Vital capacity (air exhaled)
Y1 = vital capacity for brass musician
Y2 = vital capacity for control subject
Examples 7.3-7.4 (cont’d)


Data summarize as:
        Brass (j=1)   Control (j=2)
nj      7             5
Ȳj      4.83          4.74
Sj      0.435         0.351
So Ȳ1 – Ȳ2 = 4.83 – 4.74 = 0.09
with SE(Ȳ1 – Ȳ2) = √(0.435²/7 + 0.351²/5)
= √.0517 = 0.227
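A sketch of the unpooled SE calculation from the summary statistics (hypothetical function name; inputs are the sample SDs and sizes):

```python
from math import sqrt

def se_difference(s1, n1, s2, n2):
    """Unpooled SE of Ybar1 - Ybar2: sqrt(S1^2/n1 + S2^2/n2)."""
    return sqrt(s1**2 / n1 + s2**2 / n2)

se = se_difference(s1=0.435, n1=7, s2=0.351, n2=5)
print(round(se, 3))  # 0.227
```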
Homogeneous Variances

Special Case: Suppose we know
σ1² = σ2² = σ² (say).
Then
σ²[Ȳ1 – Ȳ2] = σ²/n1 + σ²/n2 = σ² (1/n1 + 1/n2)

How shall we estimate the single,
unknown variance parameter σ²?
Pooled Variance Estimator

Given σ1² = σ2² = σ², estimate σ² with a
weighted average of the sample variances:
S²pool = [(n1 – 1)S1² + (n2 – 1)S2²] / (n1 + n2 – 2)
Then, SE(Ȳ1 – Ȳ2) = √[S²pool (1/n1 + 1/n2)]
= Spool √(1/n1 + 1/n2)
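A sketch of the pooled calculation. Purely for illustration it reuses the vital-capacity summary statistics (S1 = 0.435, n1 = 7; S2 = 0.351, n2 = 5) and simply assumes the equal-variance condition holds; the function names are hypothetical:

```python
from math import sqrt

def pooled_variance(s1, n1, s2, n2):
    """Weighted average of the two sample variances, weights n_j - 1."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

def pooled_se(s1, n1, s2, n2):
    """SE of Ybar1 - Ybar2 under equal variances: S_pool * sqrt(1/n1 + 1/n2)."""
    return sqrt(pooled_variance(s1, n1, s2, n2) * (1 / n1 + 1 / n2))

se_pool = pooled_se(s1=0.435, n1=7, s2=0.351, n2=5)
print(round(se_pool, 3))  # 0.236
```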
Confidence Interval for µ1 – µ2
We use SE(Ȳ1 – Ȳ2) in constructing conf.
intervals for µ1 – µ2.
Suppose Ȳ1 ~ N(µ1, σ1²/n1) is indep. of
Ȳ2 ~ N(µ2, σ2²/n2).

Apply our usual interval approach:
estimator ± tα/2 (std. error).

But, what are the df for tα/2?
Welch-Satterthwaite df
Approximation

In the general case, we cannot find the
exact df. A highly accurate approximation (if n1 > 5 and n2 > 5) is in Equ. (7.1):
dfws = (S1²/n1 + S2²/n2)² / { S1⁴/[n1²(n1 – 1)] + S2⁴/[n2²(n2 – 1)] }
= [SE²(Ȳ1) + SE²(Ȳ2)]² / [ SE⁴(Ȳ1)/(n1 – 1) + SE⁴(Ȳ2)/(n2 – 1) ]

Always round down.
Example 7.7
Ex. 7.7: Y1 = height (cm) of control plant
Y2 = height (cm) of Ancymidol trt’d plant
Example 7.7 (cont’d)
Set α = 0.05.
WS df approx. is
dfws = (23.04/8 + 22.09/7)² / { 530.84/[(64)(7)] + 487.97/[(49)(6)] }
= … = 12.8

So use dfws = 12. (Some computers
actually accept df = 12.8.)
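The steps hidden behind the “…” can be sketched in code (hypothetical function name; inputs are the sample variances and sizes):

```python
from math import floor

def welch_satterthwaite_df(s2_1, n1, s2_2, n2):
    """Approximate df: (v1 + v2)^2 / (v1^2/(n1-1) + v2^2/(n2-1)), with v_j = S_j^2 / n_j."""
    v1, v2 = s2_1 / n1, s2_2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

df = welch_satterthwaite_df(s2_1=23.04, n1=8, s2_2=22.09, n2=7)
print(round(df, 1), floor(df))  # 12.8 12
```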
Example 7.7 (cont’d)

t-dist’n critical point is t.025 = 2.179 (df =
12) in the 95% conf. interval
ȳ1 – ȳ2 ± t.025 √(S1²/n1 + S2²/n2)
= (15.9 – 11.0) ± 2.179 √(23.04/8 + 22.09/7)

This is 4.9 ± (2.179)(2.46) = 4.9 ± 5.35,
or –0.45 < µ1 – µ2 < 10.25.

(Interval contains zero. Interpretation?)
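A sketch of the two-sample interval in code, with the critical point supplied from Table 4 (2.179 at df = 12); the function name is hypothetical:

```python
from math import sqrt

def two_sample_t_interval(ybar1, s2_1, n1, ybar2, s2_2, n2, t_crit):
    """Return (lower, upper) for (ybar1 - ybar2) ± t_crit * sqrt(S1^2/n1 + S2^2/n2)."""
    diff = ybar1 - ybar2
    moe = t_crit * sqrt(s2_1 / n1 + s2_2 / n2)
    return diff - moe, diff + moe

lo, hi = two_sample_t_interval(15.9, 23.04, 8, 11.0, 22.09, 7, t_crit=2.179)
print(round(lo, 2), round(hi, 2))  # -0.45 10.25
```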
WS Approx. in Practice

In practice the WS approx. works well for
large-enough samples if normality holds.

Also, if normality holds and if σ1 = σ2, then
using the pooled SD, Spool, in
ȳ1 – ȳ2 ± tα/2 Spool √(1/n1 + 1/n2)
(with df = n1 + n2 – 2) gives an exact conf. interval for µ1 – µ2.

If normality is invalid, the CLT may still
apply, but the sample sizes must grow
much larger.
Statistical Hypotheses

DEF’N: A HYPOTHESIS is a specification
about an unknown parameter, θ.

DEF’N: The NULL HYPOTHESIS is a “no
effect” or “null state” hypothesis about θ.
NOTATION: Ho: θ = θo.

DEF’N: The ALTERNATIVE HYPOTHESIS
(a.k.a. RESEARCH HYPOTHESIS) is some
alternative to the null hypothesis about θ.
NOTATION: HA: θ ≠ θo.
Testing µ1 and µ2

For testing µ1 and µ2, the natural
hypotheses are
Ho: µ1 = µ2 ⇔ Ho: µ1 – µ2 = 0
vs.
HA: µ1 ≠ µ2 ⇔ HA: µ1 – µ2 ≠ 0

Ex 7.7 (cont’d): For the Ancymidol growth
expt., take
Ho: µ1 = µ2 ⇔ Ho: no effect of Ancymidol
HA: µ1 ≠ µ2 ⇔ HA: some effect of Ancymidol
Hypothesis Testing

DEF’N: A TEST of HYPOTHESES is a
formal procedure for inferring a
decision about Ho, relative to HA, based
on observed data.

A TEST STATISTIC is used to measure
the departure (if any) from Ho in the
data.
Test Statistic for µ1 and µ2

To assess Ho: µ1 – µ2 = 0, we build the test
statistic from Ȳ1 – Ȳ2

To compensate for variability and scale,
we adjust the difference by its SE:
ts = [(Ȳ1 – Ȳ2) – 0] / √(S1²/n1 + S2²/n2)
• the “0” is the null hypothesized value of µ1 – µ2
• “s” for “sample” to emphasize dependence on random sample
Rejecting Ho

Notice that if ts ≈ 0, then Ȳ1 ≈ Ȳ2,
so the data give us no reason to doubt Ho.

But, if ts >> 0 or ts << 0, evidence exists
in the data that µ1 ≠ µ2, i.e., Ho is false.

So, we REJECT Ho when ts grows too far
from 0.

But, how “big” is “big”?
Null Distribution of ts
• If Y1 and Y2 are both normal (and indep.)
or
• if n1 and n2 are sufficiently large (think:
CLT)
then under Ho
ts ~ t(dfws)
where dfws are the WS df from Equ. (7.1)
Using t-dist’n to Reject Ho
So, if Ho is false, we expect to locate ts in
the “tails” of the t(dfws) null reference distribution:
P-value
DEF’N: The P-VALUE of a test statistic is
the probability under Ho of observing a
result as extreme or more extreme (in the
direction of HA) as that actually observ’d.
Example 7.12
Ex. 7.12: Suppose we test Ho: µ1 = µ2 vs.
HA: µ1 ≠ µ2.
We find ts = 2.34 with dfws = 8.47. So round
down to df = 8.
The P-value is
P{t(8) > 2.34 or t(8) < –2.34}
= P{t(8) > 2.34} + P{t(8) < –2.34}   (disjoint)
= 2 P{t(8) > 2.34}   (symmetric)
= (2)(0.0237) = 0.0474.

(Book says exact value is 2P{t(8.47) > 2.34} = .0454.)
Example 7.12 – P-Value
Example 7.12 (cont’d)

How do we find the t-dist’n tail area?

Answer:
• via computer (e.g., TI-84 t-distribution
calculator, R, Excel, etc).
• indirectly, by bracketing it from tables of
t-critical points (such as Table 4); see
Example 7.15.
(Portion of) Table 4, p.677
Significance Levels

Example 7.12 illustrates the usefulness of
P-values: if P is very small, Ho is called
into question.

In practice, choose a low cut-off prior to
calculating P, say 0 < α < 1/2, and
REJECT Ho if P ≤ α.

We call α the SIGNIFICANCE LEVEL of the
test of Ho.
“Fail to Reject”

By the way, if P > α, Ho may or may not be
plausible, so, technically, we say then “fail
to reject Ho.”

So, suppose we choose the traditional α =
0.05. In Example 7.12, we found P = 0.045
for testing Ho: µ1 = µ2. So, since P = 0.045 ≤
0.05 = α, we reject Ho and conclude there is
a significant difference between µ1 and µ2.
(See Example 7.13.)
Example 7.14

Ex. 7.14 (7.7 cont’d): Recall the Ancymidol
data from Table 7.5, where µj = mean plant
growth (j=1: control; j=2: Ancymidol).
Test Ho: µ1 = µ2 vs. HA: µ1 ≠ µ2. Set α = 0.05.
We had Ȳ1 = 15.9, Ȳ2 = 11.0,
SE(Ȳ1 – Ȳ2) = 2.46, and dfws = 12.8
so ts = [(15.9 – 11.0) – 0]/2.46 = 4.9/2.46 = 1.99
Example 7.14 (cont’d)

Round the df down to df = 12. The P-value is
P{t(12) > 1.99 or t(12) < –1.99}
= P{t(12) > 1.99} + P{t(12) < –1.99}
= 2 P{t(12) > 1.99}
= (2)(0.0349) = 0.0699.

So, since P = .0699 > .05 = α, we fail to reject
Ho and conclude that Ancymidol does not
significantly affect mean growth rates.
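Where no t calculator is at hand, the tail area can be approximated numerically. The sketch below is one possible approach, not the method used by the TI-84 or R: it integrates the t density with Simpson's rule (function names are hypothetical; `steps` must be even):

```python
from math import gamma, sqrt, pi

def t_density(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(x, df, steps=2000):
    """Approximate P{t(df) > x} for x >= 0 via Simpson's rule on [0, x]."""
    h = x / steps
    total = t_density(0, df) + t_density(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 0.5 - total * h / 3   # by symmetry, half the area lies above 0

ts = (15.9 - 11.0) / 2.46
p_two_sided = 2 * t_upper_tail(abs(ts), 12)
print(round(ts, 2), round(p_two_sided, 3))
```

Since `gamma` accepts non-integer arguments, the same function also handles a fractional df such as 12.8.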
Non-Zero Null Values
Note in passing: If the null hypothesis
specifies a non-zero null value, say
Ho: µ1 – µ2 = C,
we simply modify the test statistic to
ts = [(Ȳ1 – Ȳ2) – C] / √(S1²/n1 + S2²/n2)
and proceed in a similar fashion as that
above.
Tautology

Hypothesis tests and conf. intervals are
different forms of the same inference.
• The “events” that lead to a 1 – α conf.
interval statement can be re-expressed as
“events” that fail to reject Ho. (See
illustration on p. 248.)
• Formally: a 1 – α t-based conf. interval on
µ1–µ2 will contain zero if and only if the t-test
of Ho: µ1=µ2 fails to reject at signif. level α.

See Example 7.16.
Error Rates

DEF’N: The FALSE POSITIVE RATE
(a.k.a. the TYPE I ERROR RATE) of a test is the
probability of rejecting Ho when it is true.
NOTATION: α = P{reject Ho | Ho true}

DEF’N: The FALSE NEGATIVE RATE
(a.k.a. the TYPE II ERROR RATE) of a test is the
probability of accepting Ho when it is false.
NOTATION: β = P{accept Ho | Ho false}
Table 7.10:
Types of Testing Errors
Choosing α

Unfortunately, we cannot simultaneously
minimize both α and β. Traditionally, we
attend to α:
• if a false positive error is worse than a false
negative, drive α very low (.01, .005, …);
• if a false negative error is worse than a false
positive, let α rise (.10, or even .15).

If you’re not sure/can’t distinguish, then a
traditional middle ground is α = 0.05.
Example 7.19

Ex. 7.19: Cancer survival study.
• µ1 = mean chemotherapy response
• µ2 = mean chemo. + immunotherapy response

Test Ho: µ1 = µ2 ⇔ Ho: no effect of immunotherapy
vs. HA: µ1 ≠ µ2 ⇔ HA: some effect of immunother.

Here, a false positive is “thinking useless
immunotherapy is worthwhile,” while a false
negative is “dismissing useful immunotherapy.”

If the former is worse, set α low. If the latter is worse,
set α higher.
Power
DEF’N: The POWER of a test is the
probability of rejecting Ho when it is false:
P{reject Ho | Ho false}.
Notice:
P{reject Ho | Ho false} = 1 – P{accept Ho | Ho
false} = 1 – β. So, power is the
complement of false negative error.

We often try to design expts. so that
1 – β ≥ 0.80.
(See Sec. 7.8 for more on statistical power.)
One-Sided Hypotheses

DEF’N: A TWO-SIDED (alternative)
HYPOTHESIS describes departure from Ho
in two directions (e.g., HA: θ ≠ θo); a.k.a.
“non-directional.”

DEF’N: A ONE-SIDED (alternative)
HYPOTHESIS describes departure from Ho
in a specific direction; a.k.a. “directional.”
Possibilities include HA: θ > θo or HA: θ < θo.
Default is Two-Sided

In any hypothesis testing scenario, the
decision to choose a one-sided vs. two-sided
alternative hypothesis MUST be
made prior to sampling the data.

If the subject-matter cannot guide this
decision then use a two-sided
alternative hypothesis, by default.
Example 7.21

Ex. 7.21: Laboratory cancer study.
• µ1 = mean rate of skin tumors in mice
exposed to hair dye
• µ2 = mean rate of skin tumors in
control mice

Against Ho:µ1 = µ2, it is sensible to set
HA:µ1 > µ2, since if any effect occurred
we would expect it to cause more
cancers (on average).
One-Sided Testing

To assess Ho against a one-sided alternative, we just incorporate the direction of
HA into the P-value.

Recall: P = P{result as extreme or more
extreme as ts in the direction of HA}.
• So, if HA:µ1 – µ2 > 0, calculate P from the
upper tail (“greater than 0”) only.
• But, if HA:µ1 – µ2 < 0, calculate P from the
lower tail (“less than 0”) only.
Figure 7.16
Example 7.22

Ex. 7.22: Y1 = lamb wt. after niacin suppl.
Y2 = lamb wt. after no suppl.

(1) Set α = 0.05. (2) Set Ho: µ1 – µ2 = 0. Since
any niacin effect will only increase weight
(if at all), use HA: µ1 – µ2 > 0.

Data give Ȳ1 = 14.0, Ȳ2 = 10.0,
and SE(Ȳ1 – Ȳ2) = 2.2
with dfws = 18.0 from Eq. (7.1).
Example 7.22 (cont’d)
(3) The test statistic is ts = [(14 – 10) – 0]/2.2 = 1.82
and the one-sided P-value is P{t(18) > 1.82}.

Find this P-value using:
• TI-84, R, or another
computer program
• Bracketing via Table 4
Example 7.22 – P-value

P = P{t(18) > 1.82}. Bracketing from Table 4
is illustrated in Table 7.11:
ts = 1.82

(4) Find .04 < P < .05. (Actual P = .043.)
⇒ (5) reject Ho and (6) conclude that niacin
suppl. significantly increases mean weight.
Two-Sample Testing
Summarizing: to test Ho: µ1 = µ2
vs.
• HA: µ1 ≠ µ2,
reject Ho when P = 2P{t(dfws) > |ts|} ≤ α
• HA: µ1 > µ2,
reject Ho when P = P{t(dfws) > ts} ≤ α
• HA: µ1 < µ2,
reject Ho when P = P{t(dfws) < ts} ≤ α
Rejection Regions
ASIDE: an equivalent (but, older) way to
perform hypoth. tests is known as the
REJECTION REGION (or CRITICAL REGION)
approach: reject Ho when ts exceeds a
specific, tabulated “critical point.”
• Advantage: tables of critical points are
common (indeed, Table 4 is one).
• Disadvantage: hard to find (and report)
P-values this way.
t-Test Rejection Regions
To test Ho: µ1 = µ2 using rejection regions,
if:
• HA: µ1 ≠ µ2,
reject Ho when |ts| ≥ tα/2 (with df = dfws)
• HA: µ1 > µ2,
reject Ho when ts ≥ tα (with df = dfws)
• HA: µ1 < µ2,
reject Ho when ts ≤ –tα (with df = dfws)
Distribution-Free Tests

What if the data aren’t normal and the
sample sizes are small (so that the CLT
doesn’t apply)?

In this case, the t-test is invalid, and we
apply a general, two-sample test that
doesn’t depend on N(µ, σ²) ⇒ “distribution
free” (a.k.a. “non-parametric”).

We develop this using the ranks of the
observations.
Distribution-Free
Hypotheses

In the distribution-free setting, we describe
the hypotheses in terms of concepts:
• Ho: the distributions of Y1 and Y2 are the
same
• HA: the distributions of Y1 and Y2 are shifted
in some way
(note that HA can be one-sided or two-sided)

Our method for testing Ho is known as the
Wilcoxon-Mann-Whitney “Rank Sum” test.
Rank-Sum Algorithm
To find the Wilcoxon-Mann-Whitney statistic:
(1) Arrange (“rank”) the data in each group from
smallest to largest;
(2) Count the number of obsv’ns in each group that
are smaller than each obsv’n in the opposite group;
(3) Sum the counts of smaller across-group obsv’ns
(ties get “1/2”): K1 = ∑(obsv’ns smaller in group 2)
K2 = ∑(obsv’ns smaller in group 1)
(4) The test statistic is Us = max{K1 , K2}.
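The counting steps above can be sketched directly in code. The two small data sets are made up for illustration (they are not the textbook's Table 7.18), and `wmw_statistic` is a hypothetical name:

```python
def wmw_statistic(y1, y2):
    """Return (K1, K2, Us) for the Wilcoxon-Mann-Whitney test; ties count 1/2."""
    # K1: for each obsv'n in group 1, count group-2 obsv'ns that are smaller.
    k1 = sum((b < a) + 0.5 * (b == a) for a in y1 for b in y2)
    # K2: for each obsv'n in group 2, count group-1 obsv'ns that are smaller.
    k2 = sum((a < b) + 0.5 * (a == b) for a in y1 for b in y2)
    return k1, k2, max(k1, k2)

# Hypothetical data with one tie (the value 7 appears in both groups).
group1 = [5, 7, 9]
group2 = [4, 6, 7, 10]
k1, k2, us = wmw_statistic(group1, group2)
print(k1, k2, us)  # 6.5 5.5 6.5
```

The sum K1 + K2 equals n1·n2 (here 12) by construction, which gives a quick check on hand calculations.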
Rejection Criterion

Note: a check of the rank sum calculations is
always available: K1 + K2 = n1n2

For fixed α and pre-specified Ho & HA, find the
critical point uα in Table 6 (p. 680). In this
table, n = max{n1 , n2} and n′ = min{n1 , n2}.

Given Us, reject Ho in favor of HA if:
• (Rejection region approach) Us ≥ uα
• (P-value approach) P ≤ α, where P is
bracketed using Table 6 (or found via
computer).
(Portion of) Table 6, p.680
Examples 7.38-7.39

Exs. 7.38-7.39:
Y1 = soil respiration in heavy forest growth
Y2 = soil respir. in forest canopy gap

(1) Set α = .05. (2) Test
Ho: dist’ns of Y1 & Y2 are same vs.
HA: dist’ns of Y1 & Y2 are shifted
Rank-Sum Calculations
Table 7.18
Example 7.39 (cont’d)

Check calculations:
K1 + K2 = 49.5 + 6.5 = 56 = (7)(8) = n1n2

(3) Test statistic is Us = max{49.5 , 6.5} = 49.5.

In Table 6, use n = max{n1,n2} = 8, n′ =
min{n1,n2} = 7. At α = 0.05, the two-tailed
critical point is u.05 = 46 (see next slide).

(5) Rejection region approach: Reject Ho if
Us ≥ u.05. Since 49.5 ≥ 46, we reject Ho and
(6) conclude soil respirat’n differs significantly.
Example 7.39 (P-value)

To use the (4) P-value approach, we bracket
the P-value using Table 6. Table 7.19
shows how:
Us = 49.5

Here, 0.01 < P < 0.02 (two-sided). (P = 0.015
from R) ⇒ P ≤ 0.05 = α so (5) reject Ho.