Student's t test, Inference for variances

"Student's" t-test
Recall: The z-test for means
The Test Statistic
$$z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \approx \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
Note: The replacement of $\sigma$ by $s$ can only be done if the
sample size is large (n > 30). For smaller sample sizes one
has to account for the variability introduced by replacing $\sigma$
by $s$.
Comments
• The sampling distribution of this statistic is the
standard Normal distribution.
• The replacement of $\sigma$ by $s$ leaves this
distribution unchanged only if the sample size
n is large.
For small sample sizes:
The sampling distribution of
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
is called "Student's" t distribution
with n – 1 degrees of freedom
Properties of Student's t distribution
• Similar to Standard normal distribution
– Symmetric
– unimodal
– Centred at zero
• Larger spread about zero. (heavier tails)
– The reason for this is the increased variability introduced
by replacing $\sigma$ by $s$.
• As the sample size increases (degrees of freedom
increases) the t distribution approaches the standard
normal distribution
[Figure: density curves of the t distribution and the standard normal distribution, plotted from -4 to 4]
The Situation
• Let $x_1, x_2, x_3, \dots, x_n$ denote a sample from a
normal population with mean $\mu$ and standard
deviation $\sigma$. Both $\mu$ and $\sigma$ are unknown.
• Let
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \text{the sample mean}$$
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} = \text{the sample standard deviation}$$
• We want to test if the mean, $\mu$, is equal to some
given value $\mu_0$.
The Test Statistic
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
The sampling distribution of the test
statistic is the t distribution with n – 1
degrees of freedom
The Alternative Hypothesis $H_A$      The Critical Region
$H_A: \mu \ne \mu_0$                  $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$
$H_A: \mu > \mu_0$                    $t > t_{\alpha}$
$H_A: \mu < \mu_0$                    $t < -t_{\alpha}$
$t_{\alpha}$ and $t_{\alpha/2}$ are critical values under the t
distribution with n – 1 degrees of
freedom
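Critical values for the t-distribution
[Figure: t density centred at 0, with the upper-tail area $\alpha$ (or $\alpha/2$) shaded to the right of the critical value $t_{\alpha}$ (or $t_{\alpha/2}$)]
Critical values for the t-distribution are
provided in tables. A link to these tables is
given with today's lecture.
To use the table, look up the column for $\alpha$ and the row for the degrees of freedom (df).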
Note: the values tabled for df = ∞ are the same as the values for the
standard normal distribution, z.
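As a rough illustration of the note above, the following sketch computes standard normal critical values (the df = ∞ row of a t table) and sets them beside the 5 d.f. values quoted in these slides. It assumes Python's standard-library `statistics.NormalDist`; the helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist


def z_critical(alpha: float) -> float:
    """Upper-tail critical value z_alpha of the standard normal distribution."""
    return NormalDist().inv_cdf(1.0 - alpha)


# Tabled t critical values for 5 d.f. quoted in these slides.
t_5df = {0.05: 2.015, 0.025: 2.571}

for alpha, t_crit in t_5df.items():
    z_crit = z_critical(alpha)
    # The t value exceeds the z value because of the heavier tails.
    print(f"alpha = {alpha}: z = {z_crit:.3f}, t (5 d.f.) = {t_crit:.3f}")
```

With few degrees of freedom the t critical value is noticeably larger than the corresponding z value; the gap shrinks as the degrees of freedom grow.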
Example
• Let $x_1, x_2, x_3, x_4, x_5, x_6$ denote weight loss
from a new diet for n = 6 cases.
• Assume that $x_1, x_2, x_3, x_4, x_5, x_6$ is a sample
from a normal population with mean $\mu$ and
standard deviation $\sigma$. Both $\mu$ and $\sigma$ are
unknown.
• We want to test:
$H_0: \mu \le 0$ (New diet is not effective)
versus
$H_A: \mu > 0$ (New diet is effective)
The Test Statistic
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
The Critical Region:
Reject if $t > t_{\alpha}$
The Data
  i      1     2     3     4     5     6
  x_i    2.0   1.0   1.4  -1.8   0.9   2.3
The summary statistics:
$\bar{x} = 0.96667$ and $s = 1.462418$
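The Test Statistic
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{0.96667 - 0}{1.462418/\sqrt{6}} = 1.619$$
The Critical Region (using $\alpha = 0.05$)
Reject if $t > t_{0.05} = 2.015$ for 5 d.f.
Conclusion: Since $t = 1.619 < 2.015$, the test statistic is not in the Critical Region, so we accept $H_0$.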
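The arithmetic of this example can be checked with a short sketch. It assumes only Python's standard library; the critical value 2.015 is taken from the t table (5 d.f., $\alpha = 0.05$) rather than computed, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

weight_loss = [2.0, 1.0, 1.4, -1.8, 0.9, 2.3]  # data from the example
mu_0 = 0.0      # hypothesized mean under H0
t_crit = 2.015  # tabled t_0.05 for 5 d.f. (not computed here)

n = len(weight_loss)
x_bar = mean(weight_loss)
s = stdev(weight_loss)  # sample standard deviation (divisor n - 1)
t_stat = (x_bar - mu_0) / (s / sqrt(n))

# One-sided test: reject H0 only for large positive t.
reject_h0 = t_stat > t_crit
print(f"x_bar = {x_bar:.5f}, s = {s:.6f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```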
Confidence Intervals using the t distribution
Confidence Intervals for the mean of a Normal
Population, $\mu$, using the Standard Normal
distribution
$$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$
Confidence Intervals for the mean of a Normal
Population, $\mu$, using the t distribution
$$\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$$
Example
• Let $x_1, x_2, x_3, x_4, x_5, x_6$ denote weight loss
from a new diet for n = 6 cases.
The Data:
  i      1     2     3     4     5     6
  x_i    2.0   1.0   1.4  -1.8   0.9   2.3
The summary statistics:
$\bar{x} = 0.96667$ and $s = 1.462418$
Confidence Intervals (use $\alpha = 0.05$)
$$\bar{x} \pm t_{0.025} \frac{s}{\sqrt{n}} = 0.96667 \pm 2.571 \cdot \frac{1.462418}{\sqrt{6}} = 0.96667 \pm 1.535$$
that is, $-0.57$ to $2.50$.
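A minimal sketch of the same interval calculation follows, assuming only the standard library. The function name `t_confidence_interval` is hypothetical, and the critical value 2.571 ($t_{0.025}$, 5 d.f.) is read from the table rather than computed.

```python
from math import sqrt
from statistics import mean, stdev


def t_confidence_interval(data, t_crit):
    """Return (lower, upper) limits x_bar -/+ t_crit * s / sqrt(n)."""
    n = len(data)
    x_bar = mean(data)
    half_width = t_crit * stdev(data) / sqrt(n)
    return x_bar - half_width, x_bar + half_width


weight_loss = [2.0, 1.0, 1.4, -1.8, 0.9, 2.3]
lower, upper = t_confidence_interval(weight_loss, t_crit=2.571)
print(f"95% confidence limits for the mean: {lower:.2f} to {upper:.2f}")
```

The interval contains 0, which agrees with the t-test above not rejecting $H_0$ at $\alpha = 0.05$ (the test there is one-sided, so the correspondence is only approximate).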
Summary
Statistical Inference
Estimation by Confidence
Intervals
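Confidence Interval for a Proportion
$$\hat{p} \pm z_{\alpha/2}\,\sigma_{\hat{p}}, \qquad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} \approx \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
$z_{\alpha/2}$ = upper $\alpha/2$ critical point
of the standard normal distribution
$$B = z_{\alpha/2}\,\sigma_{\hat{p}} = z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \approx z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \text{Error Bound}$$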
Determination of Sample Size
The sample size that will estimate p with an Error Bound B
and level of confidence P = 1 – $\alpha$ is:
$$n = \frac{z_{\alpha/2}^2\, p^*(1 - p^*)}{B^2}$$
where:
• B is the desired Error Bound
• $z_{\alpha/2}$ is the $\alpha/2$ critical value for the standard normal
distribution
• p* is some preliminary estimate of p.
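The sample-size formula can be sketched as a small function. This is an illustration under stated assumptions: the function name and the input values (B = 0.03, p* = 0.5) are hypothetical and do not come from the lecture, and rounding up to the next whole number is the usual convention rather than something stated on the slide.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(error_bound, p_star, alpha=0.05):
    """n = z_{alpha/2}^2 * p*(1 - p*) / B^2, rounded up to a whole number."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # upper alpha/2 critical value
    return ceil(z ** 2 * p_star * (1.0 - p_star) / error_bound ** 2)


# Hypothetical inputs: error bound 0.03, preliminary estimate p* = 0.5.
n_required = sample_size_for_proportion(error_bound=0.03, p_star=0.5)
print(n_required)
```

Choosing p* = 0.5 maximizes p*(1 – p*), so it gives the most conservative (largest) sample size when no better preliminary estimate is available.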
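Confidence Intervals for the mean
of a Normal Population, $\mu$
$$\bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}} \quad\text{or}\quad \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \quad\text{or}\quad \bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$$
$\bar{x}$ = sample mean
$z_{\alpha/2}$ = upper $\alpha/2$ critical point
of the standard normal distribution
$s$ = sample standard deviation $\approx \sigma$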
Determination of Sample Size
The sample size that will estimate $\mu$ with an Error Bound B
and level of confidence P = 1 – $\alpha$ is:
$$n = \frac{z_{\alpha/2}^2\,\sigma^2}{B^2} \approx \frac{z_{\alpha/2}^2\,(s^*)^2}{B^2}$$
where:
• B is the desired Error Bound
• $z_{\alpha/2}$ is the $\alpha/2$ critical value for the standard normal
distribution
• s* is some preliminary estimate of s.
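As a worked illustration with hypothetical numbers (not taken from the lecture), suppose a preliminary estimate $s^* = 2$ is available and the desired Error Bound is $B = 0.5$ at 95% confidence, so that $z_{0.025} = 1.960$:

$$n \approx \frac{z_{\alpha/2}^2\,(s^*)^2}{B^2} = \frac{(1.960)^2 (2)^2}{(0.5)^2} = \frac{3.8416 \times 4}{0.25} = 61.47$$

Rounding up, as is conventional, gives n = 62 observations.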
Confidence Intervals for the mean of a Normal
Population, $\mu$, using the t distribution
$$\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$$
Hypothesis Testing
An important area of statistical
inference
To define a statistical Test we
1. Choose a statistic (called the test statistic)
2. Divide the range of possible values for the
test statistic into two parts
• The Acceptance Region
• The Critical Region
To perform a statistical Test we
1. Collect the data.
2. Compute the value of the test statistic.
3. Make the Decision:
• If the value of the test statistic is in
the Acceptance Region we decide to
accept H0 .
• If the value of the test statistic is in
the Critical Region we decide to
reject H0 .
Determining the Critical Region
1. The Critical Region should consist of values of
the test statistic that indicate that HA is true.
(hence H0 should be rejected).
2. The size of the Critical Region is determined so
that the probability of making a type I error, $\alpha$,
is at some pre-determined level (usually 0.05 or
0.01). This value is called the significance level
of the test.
Significance level = P[test makes type I error]
To find the Critical Region
1. Find the sampling distribution of the test statistic
when $H_0$ is true.
2. Locate the Critical Region in the tails (either
left or right or both) of the sampling distribution
of the test statistic when $H_0$ is true.
Whether you locate the critical region in the left
tail or right tail or both tails depends on which
values indicate $H_A$ is true.
The tails chosen = values indicating $H_A$.
3. The size of the Critical Region is chosen so that
the area over the critical region and under the
sampling distribution of the test statistic when
$H_0$ is true is the desired level of $\alpha$ = P[type I error].
[Figure: sampling distribution of the test statistic when $H_0$ is true, with the Critical Region shaded; shaded area = $\alpha$]
The z-test for Proportions
Testing the probability of success in a
binomial experiment
Situation
• A success-failure experiment has been
repeated n times
• The probability of success p is unknown.
We want to test either
1. $H_0: p = p_0$ versus $H_A: p \ne p_0$
or
2. $H_0: p \le p_0$ versus $H_A: p > p_0$
or
3. $H_0: p \ge p_0$ versus $H_A: p < p_0$
The Test Statistic
$$z = \frac{\hat{p} - p_0}{\sigma_{\hat{p}}} = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$$
Critical Region (dependent on $H_A$)
Alternative Hypothesis      Critical Region
$H_A: p \ne p_0$            $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
$H_A: p > p_0$              $z > z_{\alpha}$
$H_A: p < p_0$              $z < -z_{\alpha}$
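The test statistic and the three critical regions can be sketched in a single function. This is only an illustration: the function name, the `alternative` labels, and the data (60 successes in 100 trials, $p_0 = 0.5$) are hypothetical, and the critical values come from the standard-library `NormalDist` rather than a table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes, n, p0, alternative="two-sided", alpha=0.05):
    """Return (z, reject_h0) for the z-test of H0 about a proportion p."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1.0 - p0) / n)  # standard error uses p0
    std_normal = NormalDist()
    if alternative == "two-sided":    # H_A: p != p0
        z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
        reject = z < -z_crit or z > z_crit
    elif alternative == "greater":    # H_A: p > p0
        reject = z > std_normal.inv_cdf(1.0 - alpha)
    elif alternative == "less":       # H_A: p < p0
        reject = z < -std_normal.inv_cdf(1.0 - alpha)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, reject


# Hypothetical data: 60 successes in n = 100 trials, testing p0 = 0.5.
z_stat, reject_h0 = proportion_z_test(successes=60, n=100, p0=0.5)
print(f"z = {z_stat:.3f}, reject H0: {reject_h0}")
```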
The z-test for the mean of a Normal
population (large samples)
Situation
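• A sample of n is selected from a normal
population with mean $\mu$ (unknown) and
standard deviation $\sigma$. We want to test either
1. $H_0: \mu = \mu_0$ versus $H_A: \mu \ne \mu_0$
or
2. $H_0: \mu \le \mu_0$ versus $H_A: \mu > \mu_0$
or
3. $H_0: \mu \ge \mu_0$ versus $H_A: \mu < \mu_0$
The Test Statistic
$$z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \approx \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \quad \text{if } n \text{ is large.}$$
Critical Region (dependent on $H_A$)
Alternative Hypothesis      Critical Region
$H_A: \mu \ne \mu_0$        $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
$H_A: \mu > \mu_0$          $z > z_{\alpha}$
$H_A: \mu < \mu_0$          $z < -z_{\alpha}$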
The t-test for the mean of a Normal
population (small samples)
Situation
• A sample of n is selected from a normal
population with mean $\mu$ (unknown) and
standard deviation $\sigma$ (unknown). We want
to test either
1. $H_0: \mu = \mu_0$ versus $H_A: \mu \ne \mu_0$
or
2. $H_0: \mu \le \mu_0$ versus $H_A: \mu > \mu_0$
or
3. $H_0: \mu \ge \mu_0$ versus $H_A: \mu < \mu_0$
The Test Statistic
$$t = \frac{\bar{x} - \mu_0}{s_{\bar{x}}} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
Critical Region (dependent on $H_A$)
Alternative Hypothesis      Critical Region
$H_A: \mu \ne \mu_0$        $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$
$H_A: \mu > \mu_0$          $t > t_{\alpha}$
$H_A: \mu < \mu_0$          $t < -t_{\alpha}$
Testing and Estimation of
Variances
Let $x_1, x_2, x_3, \dots, x_n$ denote a sample from a
Normal distribution with mean $\mu$ and standard
deviation $\sigma$ (variance $\sigma^2$).
The point estimator of the variance $\sigma^2$ is:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$
The point estimator of the standard deviation $\sigma$ is:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$$
Sampling Theory
The statistic
$$U = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{\sigma^2} = \frac{(n-1)s^2}{\sigma^2}$$
has a $\chi^2$ distribution with n – 1 degrees of freedom
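Critical Points of the $\chi^2$ distribution
[Figure: $\chi^2$ density plotted from 0 to 20, with the upper-tail area $\alpha$ shaded to the right of the critical point $\chi^2_{\alpha}$]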
Critical Values for the Chi-squared
($\chi^2$) distribution
Link to Table
These values can also be calculated using
Excel and the function:
CHIINV(alpha, df)
Confidence intervals for $\sigma^2$ and $\sigma$.
It is true that
$$P\left[\chi^2_{1-\alpha/2} \le \frac{(n-1)s^2}{\sigma^2} \le \chi^2_{\alpha/2}\right] = 1 - \alpha$$
[Figure: $\chi^2$ density with area $\alpha/2$ in each tail (below $\chi^2_{1-\alpha/2}$ and above $\chi^2_{\alpha/2}$) and area $1 - \alpha$ between the two critical points]
from which we can show
$$P\left[\frac{n-1}{\chi^2_{\alpha/2}}\,s^2 \le \sigma^2 \le \frac{n-1}{\chi^2_{1-\alpha/2}}\,s^2\right] = 1 - \alpha$$
and
$$P\left[\sqrt{\frac{n-1}{\chi^2_{\alpha/2}}}\;s \le \sigma \le \sqrt{\frac{n-1}{\chi^2_{1-\alpha/2}}}\;s\right] = 1 - \alpha$$
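Hence $(1 - \alpha)100\%$ confidence limits for $\sigma^2$ are:
$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} \quad\text{to}\quad \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$$
and $(1 - \alpha)100\%$ confidence limits for $\sigma$ are:
$$\sqrt{\frac{n-1}{\chi^2_{\alpha/2}}}\;s \quad\text{to}\quad \sqrt{\frac{n-1}{\chi^2_{1-\alpha/2}}}\;s$$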
Example
• In this example the subject is asked to type his
computer password n = 6 times.
• Each time xi = time to type the password is
recorded. The data are tabulated below:
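  i      1      2      3      4      5      6
  x_i    6.63   8.51   9.01   8.69   8.71   8.83
$\sum x_i = 50.38$, $\sum x_i^2 = 426.9062$
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{50.38}{6} = 8.3967$$
$$s = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1}} = \sqrt{\frac{426.9062 - \dfrac{(50.38)^2}{6}}{5}} = 0.881151$$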
95% confidence limits for the mean $\mu$
$$\bar{x} \pm t_{.025} \frac{s}{\sqrt{n}}, \qquad t_{.025} = 2.571 \text{ for 5 d.f.}$$
$$8.3967 \pm (2.571)\frac{0.881151}{\sqrt{6}} \quad\text{or}\quad 8.3967 \pm 0.9249$$
7.472 to 9.322
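95% confidence limits for $\sigma$
$\chi^2_{.975} = 0.8312$, $\chi^2_{.025} = 12.83$ for 5 d.f.
$$\sqrt{\frac{n-1}{\chi^2_{.025}}}\;s \quad\text{to}\quad \sqrt{\frac{n-1}{\chi^2_{.975}}}\;s$$
$$\sqrt{\frac{5}{12.83}}\,(0.881151) \quad\text{to}\quad \sqrt{\frac{5}{0.8312}}\,(0.881151)$$
0.550 to 2.161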
95% confidence limits for $\sigma^2$
$$\frac{(n-1)s^2}{\chi^2_{.025}} \quad\text{to}\quad \frac{(n-1)s^2}{\chi^2_{.975}}$$
$$\frac{5(0.881151)^2}{12.83} \quad\text{to}\quad \frac{5(0.881151)^2}{0.8312}$$
0.303 to 4.671
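The limits for $\sigma^2$ and $\sigma$ in this example can be reproduced with a short sketch. It assumes only the standard library; the function name is hypothetical, and the $\chi^2$ critical values 0.8312 and 12.83 (5 d.f.) are taken from the table rather than computed.

```python
from math import sqrt
from statistics import variance


def variance_confidence_interval(data, chi2_lower, chi2_upper):
    """Limits (n-1)s^2/chi2_upper to (n-1)s^2/chi2_lower for sigma^2.

    chi2_lower is the tabled chi^2_{1-alpha/2}; chi2_upper is chi^2_{alpha/2}.
    """
    n = len(data)
    s2 = variance(data)  # sample variance (divisor n - 1)
    return (n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower


times = [6.63, 8.51, 9.01, 8.69, 8.71, 8.83]  # password typing times
var_lo, var_hi = variance_confidence_interval(times, chi2_lower=0.8312, chi2_upper=12.83)
sd_lo, sd_hi = sqrt(var_lo), sqrt(var_hi)  # limits for sigma

print(f"sigma^2: {var_lo:.3f} to {var_hi:.3f}")
print(f"sigma:   {sd_lo:.3f} to {sd_hi:.3f}")
```

Note that the larger critical value produces the lower limit, because it appears in the denominator.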
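Testing Hypotheses for $\sigma^2$ and $\sigma$.
Suppose we want to test:
$H_0: \sigma^2 = \sigma_0^2$ against $H_A: \sigma^2 \ne \sigma_0^2$
The test statistic:
$$U = \frac{(n-1)s^2}{\sigma_0^2}$$
If $H_0$ is true the test statistic, U, has a $\chi^2$ distribution
with n – 1 degrees of freedom.
Thus we reject $H_0$ if
$$\frac{(n-1)s^2}{\sigma_0^2} < \chi^2_{1-\alpha/2} \quad\text{or}\quad \frac{(n-1)s^2}{\sigma_0^2} > \chi^2_{\alpha/2}$$
[Figure: $\chi^2$ density with area $\alpha/2$ in each tail; Reject below $\chi^2_{1-\alpha/2}$, Accept between the critical points, Reject above $\chi^2_{\alpha/2}$]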
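One-tailed Tests for $\sigma^2$ and $\sigma$.
Suppose we want to test:
$H_0: \sigma^2 \le \sigma_0^2$ against $H_A: \sigma^2 > \sigma_0^2$
The test statistic:
$$U = \frac{(n-1)s^2}{\sigma_0^2}$$
We reject $H_0$ if
$$\frac{(n-1)s^2}{\sigma_0^2} > \chi^2_{\alpha}$$
[Figure: $\chi^2$ density with area $\alpha$ in the upper tail; Accept below $\chi^2_{\alpha}$, Reject above it]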
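Or suppose we want to test:
$H_0: \sigma^2 \ge \sigma_0^2$ against $H_A: \sigma^2 < \sigma_0^2$
The test statistic:
$$U = \frac{(n-1)s^2}{\sigma_0^2}$$
We reject $H_0$ if
$$\frac{(n-1)s^2}{\sigma_0^2} < \chi^2_{1-\alpha}$$
[Figure: $\chi^2$ density with area $\alpha$ in the lower tail; Reject below $\chi^2_{1-\alpha}$, Accept above it]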
Example
• The current method for measuring blood alcohol content has the
following properties
– Measurements are
1. Normally distributed
2. Mean $\mu$ = true blood alcohol content
3. standard deviation 1.2 units
• A new method is proposed that has the first two properties and
it is believed that the measurements will have a smaller standard
deviation.
• We want to collect data to test this hypothesis.
• The experiment will be to collect n = 10 observations on a case
where the true blood alcohol content is 6.0
• The data are tabulated below:
  i      1     2     3     4     5     6     7     8     9     10
  x_i    5.21  6.90  5.69  5.05  5.75  5.90  6.92  6.48  6.85  5.80
$\sum x_i = 60.55$, $\sum x_i^2 = 370.9445$
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = 6.0550$$
$$s = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1}} = 0.692359, \qquad s^2 = 0.479361$$
To test:
$H_0: \sigma^2 \ge 1.2^2$ against $H_A: \sigma^2 < 1.2^2$
The test statistic:
$$U = \frac{(n-1)s^2}{\sigma_0^2} = \frac{9(0.479361)}{1.2^2} = 2.996$$
We reject $H_0$ if $U < \chi^2_{1-\alpha}$, where $\chi^2_{0.95} = 3.325$ for 9 d.f.
Since $U = 2.996 < 3.325$, we reject $H_0$ at $\alpha = 0.05$.
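The calculation in this example can be checked with a brief sketch, assuming only the standard library. The critical value 3.325 ($\chi^2_{0.95}$, 9 d.f.) is taken from the table rather than computed, and the variable names are illustrative.

```python
from statistics import variance

measurements = [5.21, 6.90, 5.69, 5.05, 5.75, 5.90, 6.92, 6.48, 6.85, 5.80]
sigma_0 = 1.2      # standard deviation of the current method
chi2_crit = 3.325  # tabled chi^2_{0.95} for 9 d.f. (not computed here)

n = len(measurements)
s2 = variance(measurements)          # sample variance (divisor n - 1)
u_stat = (n - 1) * s2 / sigma_0 ** 2

# Lower-tailed test: small values of U indicate a smaller variance.
reject_h0 = u_stat < chi2_crit
print(f"s^2 = {s2:.6f}, U = {u_stat:.3f}, reject H0: {reject_h0}")
```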