Inference on a Single
Mean
G. Baker, Department of Statistics
University of South Carolina
Use Calculation from Sample to
Estimate Population Parameter
[Diagram: a sample is selected from the population; a statistic is calculated from the sample and used to estimate the parameter that describes the population.]
Example (proportion): the parameter p = ? is estimated by the statistic $\hat{p} = 63\%$.
Example (mean): the parameter μ = ? is estimated by the statistic $\bar{y} = 2{,}200$ hrs.
Statistic
Describes a sample.
• Always known
• Changes upon repeated sampling.
• Examples: $\bar{y}$, $s$, $s^2$, $\hat{p}$
Parameter
Describes a population.
• Usually unknown
• Is fixed
• Examples: $\mu$, $\sigma$, $\sigma^2$, $p$
A Statistic is a Random Variable
• Upon repeated sampling of the same population, the value of a statistic changes, so it is a variable.
• While we don’t know what the next value will be, we do know the overall pattern over many, many samplings, so it is random.
• The distribution of possible values of a statistic for repeated samples of the same size from a population is called the sampling distribution of the statistic.
Sampling Distribution of $\bar{y}$
• If a random sample of size n is taken from a normal population having mean $\mu_y$ and variance $\sigma_y^2$, then $\bar{y}$ is a random variable which is also normally distributed with mean $\mu_y$ and variance $\sigma_y^2/n$.
Sampling Distribution of $\bar{y}$
[Figure: four density curves, each on a horizontal axis from 80 to 120.
Original Population: n(100, 5)
Averages - Sample Size = 2: n(100, 3.54)
Averages - Sample Size = 10: n(100, 1.58)
Averages - Sample Size = 25: n(100, 1)]
Light Bulbs
• The life of a light bulb is normally distributed with a mean of 2000 hours and standard deviation of 300 hours.
• What is the probability that a randomly chosen light bulb will have a life of less than 1700 hours?
• What is the probability that the mean life of three randomly chosen light bulbs will be less than 1700 hours?
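The two light bulb questions can be checked numerically. The sketch below is only an illustration using Python's standard-library `statistics.NormalDist`; the variable names are ours and are not part of the slides.

```python
from math import sqrt
from statistics import NormalDist

# Population values taken from the slide (hours).
mu, sigma = 2000.0, 300.0

# One bulb: Y ~ n(2000, 300).
p_single = NormalDist(mu, sigma).cdf(1700)

# Mean of n = 3 bulbs: ybar ~ n(2000, 300 / sqrt(3)).
n = 3
p_mean = NormalDist(mu, sigma / sqrt(n)).cdf(1700)

print(f"P(Y < 1700)    = {p_single:.4f}")
print(f"P(ybar < 1700) = {p_mean:.4f}")
```

The average of three bulbs is much less likely than a single bulb to fall below 1700 hours, because its standard deviation is smaller by a factor of $\sqrt{3}$.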
Why Averages Instead of Single
Readings?
• Suppose we are manufacturing light bulbs. The life of these bulbs has historically followed a normal distribution with a mean of 2000 hours and standard deviation of 300 hours.
• We change the filament material and, unknown to us, the average life of the bulbs decreases to 1500 hours. (We will assume that the distribution remains normal with a standard deviation of 300 hours.)
• If we randomly sample 1 bulb, will we realize that the average life has decreased? What if we sample 3 bulbs? 9 bulbs?
[Figures: two normal curves centered at μ = 1500 and μ = 2000 on a horizontal axis from 800 to 2800, drawn for three cases.]
• Single Readings (σ = 300): Y < 1400 would signal shift
• Averages of n = 3 (σ = 173): $\bar{y}$ < 1654 would signal shift
• Averages of n = 9 (σ = 100): $\bar{y}$ < 1800 would signal shift
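How much better do averages detect the shift? The sketch below is a rough illustration, not the author's calculation. It assumes the signal limits on these slides are "old mean minus two standard deviations of the average" (which reproduces 1400, 1654, and 1800), and the function name `detection_probability` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

mu_old, mu_new, sigma = 2000.0, 1500.0, 300.0

def detection_probability(n):
    """Return (signal limit, P(average of n bulbs < limit | mean shifted to mu_new))."""
    se = sigma / sqrt(n)            # standard deviation of the average
    limit = mu_old - 2 * se         # ASSUMED rule: signal below mu_old - 2 * se
    return limit, NormalDist(mu_new, se).cdf(limit)

results = {n: detection_probability(n) for n in (1, 3, 9)}
for n, (limit, prob) in results.items():
    print(f"n = {n}: signal if average < {limit:.0f}, "
          f"P(signal | mu = 1500) = {prob:.4f}")
```

Under this assumed rule, a single reading catches the shift only about a third of the time, while an average of nine bulbs catches it almost always.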
What if the original distribution
is not normal? Consider the roll
of a fair die:
[Figure: bar chart "Rolling A Fair Die"; horizontal axis: # of Dots (1 to 6); vertical axis: Probability (0.00 to 0.20).]
Suppose the single measurements
are not normally distributed.
• Let Y = time to fail of a light bulb in constant failure rate mode.
• Y is exponentially distributed with λ = 0.0005 = 1/2000.
[Figure: exponential density curve that starts at 0.0005 at Y = 0 and decays over a horizontal axis from 0 to 8000.]
[Figure: four panels showing the distribution of single measurements and of averages of 2, 4, and 25 measurements.]
Source: Lawrence L. Lapin, Statistics in Modern Business Decisions, 6th ed., 1993, Dryden Press, Ft. Worth, Texas.
As n increases, what happens to the variance?
A. Variance increases.
B. Variance decreases.
C. Variance remains the same.
[Figure: the same four panels, labeled n = 1, n = 2, n = 4, and n = 25.]
Central Limit Theorem
• If n is sufficiently large, the sample means of random samples from a population with mean μ and standard deviation σ are approximately normally distributed with mean μ and standard deviation $\sigma/\sqrt{n}$.
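The mean and standard deviation in the theorem can be verified exactly for the fair-die example. The sketch below builds the exact distribution of the average of n dice by repeated convolution. It checks only the mean μ = 3.5 and the variance σ²/n (with σ² = 35/12 for one die), not the approach to a normal shape, and the function names are hypothetical.

```python
from fractions import Fraction

def dice_sum_distribution(n):
    """Exact distribution of the sum of n fair dice, built by repeated convolution."""
    dist = {0: Fraction(1)}
    for _ in range(n):
        new = {}
        for total, prob in dist.items():
            for face in range(1, 7):
                key = total + face
                new[key] = new.get(key, Fraction(0)) + prob * Fraction(1, 6)
        dist = new
    return dist

def mean_and_variance_of_average(n):
    """Exact mean and variance of the average of n fair dice."""
    dist = dice_sum_distribution(n)
    mean = sum(Fraction(total, n) * p for total, p in dist.items())
    var = sum((Fraction(total, n) - mean) ** 2 * p for total, p in dist.items())
    return mean, var

for n in (1, 2, 4, 25):
    mean, var = mean_and_variance_of_average(n)
    print(f"n = {n:2d}: mean of average = {float(mean):.2f}, "
          f"variance of average = {float(var):.4f}")
```

The mean stays at 3.5 for every n, while the variance of the average shrinks in proportion to 1/n.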
Random Behavior of Means Summary
• If Y is distributed n(μ, σ), then $\bar{y}_n$ is distributed n(μ, $\sigma/\sqrt{n}$).
• If Y is distributed non-n(μ, σ), then $\bar{y}_{n \ge 30}$ is distributed approximately n(μ, $\sigma/\sqrt{n}$).
If We Can Consider $\bar{y}$ to be Normal …
Recall: If Y is distributed normally with mean μ and standard deviation σ, then
$$Z = \frac{Y - \mu}{\sigma}$$
So if $\bar{y}$ is distributed normally with mean μ and standard deviation $\sigma/\sqrt{n}$, then
$$Z = \frac{\bar{y} - \mu}{\sigma/\sqrt{n}}$$
If the time between industrial accidents
follows an exponential distribution with an
average of 700 days, what is the probability
that the average time between 49 pairs of
accidents will be greater than 900 days?
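One way to set up the accident question is sketched below. It relies on two facts: an exponential distribution's standard deviation equals its mean, and with n = 49 the Central Limit Theorem lets us treat the average as approximately normal. The variable names are ours.

```python
from math import sqrt
from statistics import NormalDist

mean_days = 700.0      # exponential mean, from the slide
sigma = mean_days      # for an exponential distribution, sigma equals the mean
n = 49

se = sigma / sqrt(n)                   # standard deviation of the average
z = (900 - mean_days) / se             # standardized value of 900 days
p_greater = 1 - NormalDist().cdf(z)    # CLT normal approximation

print(f"z = {z:.2f}, P(ybar > 900) is approximately {p_greater:.4f}")
```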
XYZ Bottling Company claims that the distribution of fill on its 16 oz bottles averages 16.2 ounces with a standard deviation of 0.1 oz. We randomly sample 36 bottles and get $\bar{y} = 16.15$. If we assume a standard deviation of 0.1 oz, do we believe XYZ’s claim of averaging 16.2 ounces?
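A quick numerical check of the bottling question is sketched below. It asks how likely a sample mean of 16.15 oz or less would be if the claim were true. Treating the average of 36 bottles as approximately normal is an assumption, and the variable names are ours.

```python
from math import sqrt
from statistics import NormalDist

claimed_mean, sigma, n = 16.2, 0.1, 36
ybar = 16.15

z = (ybar - claimed_mean) / (sigma / sqrt(n))   # standardized sample mean
p_lower = NormalDist().cdf(z)                   # P(ybar <= 16.15 | mu = 16.2)

print(f"z = {z:.2f}, P(ybar <= 16.15 | mu = 16.2) = {p_lower:.5f}")
```

A sample mean three standard errors below the claimed value would be very unusual if the claim were true, which casts doubt on the 16.2 oz average.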
Up Until Now We have been Assuming
that We Knew the True Standard
Deviation (σ), But Let’s Face Facts …

When we use s to estimate σ, then the calculated value
$$\frac{\bar{y} - \mu}{s/\sqrt{n}}$$
follows a t-distribution with n-1 degrees of freedom.
Note: we must be able to assume that we are sampling from a normal population.
Let’s take another look at XYZ
Bottling Company. If we assume
that fill on the individual bottles
follows a normal distribution, does
the following data support the
claim of an average fill of 16.2 oz?
16.1 16.0 16.3 16.2 16.1
In Summary
When we know σ:
$$Z = \frac{\bar{y} - \mu}{\sigma/\sqrt{n}}$$
When we estimate σ with s:
$$t_{df = n-1} = \frac{\bar{y} - \mu}{s/\sqrt{n}}$$
We assume we are sampling from a normal population.
Relationship Between Z and t
Distributions
[Figure: density curves for Z, t with df = 3, and t with df = 1, on a horizontal axis from -4 to 4.]
Internal Combustion Engine

The nominal power produced by a student-designed internal combustion engine should be 100 hp. The student team that designed the engine conducted 10 tests to determine the actual power. The data follow:
98, 101, 102, 97, 101, 98, 100, 92, 98, 100
Assume data came from a normal distribution.
Internal Combustion Engine
Summary Data:
Column   n    Mean   Std. Dev.
hp       10   98.7   2.9
What is the probability of getting a sample mean of 98.7 hp or less if the true mean is 100 hp?
Internal Combustion Engine
$$P(\bar{y} \le 98.7 \mid \mu = 100) = P\left(t_{df=9} \le \frac{98.7 - 100}{2.9/\sqrt{10}}\right) = P(t_{df=9} \le -1.418) = 0.0949$$
[Figure: t(df=9) density on a horizontal axis from -4 to 4, with the lower-tail area 0.0949 marked.]
What did we assume when doing this analysis?
Are you comfortable with the assumption?
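The same probability can be reproduced from the raw data. Python's standard library has no t-distribution, so this sketch integrates the t density numerically with Simpson's rule. The helpers `t_pdf` and `t_cdf` are our own illustrative functions, not a library API. Because the code uses the unrounded standard deviation (about 2.8694) rather than 2.9, its results differ slightly from the −1.418 and 0.0949 shown above.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), via Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

hp = [98, 101, 102, 97, 101, 98, 100, 92, 98, 100]
n = len(hp)
ybar, s = mean(hp), stdev(hp)            # sample mean and sample std. dev.
t_stat = (ybar - 100) / (s / sqrt(n))    # standardized with s, so df = n - 1
p_lower = t_cdf(t_stat, n - 1)           # P(ybar <= 98.7 | mu = 100)

print(f"ybar = {ybar:.1f}, s = {s:.4f}, t = {t_stat:.4f}, p = {p_lower:.4f}")
```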
Can We Assume Sampling from a
Normal Population?

If data are from a normal population, there is a linear relationship between the data and their corresponding Z values.
$$Z = \frac{Y - \mu}{\sigma} \quad\Longrightarrow\quad Y = \mu + \sigma Z$$
If we plot y on the vertical axis and z on the horizontal axis, the y intercept estimates μ and the slope estimates σ.
How to Calculate Corresponding
Z-Values
• Order data.
• Estimate percent of population below each data point:
$$P_i = \frac{i - 0.5}{n}$$
where i is a data point’s position in the ordered set and n is the number of data points in the set.
• Look up Z-Value that has $P_i$ proportion of distribution below it.
Normal Probability (QQ) Plot
Data set: 2, 4, 7, 10
i    y_i   P_i     Z
1    2     .125    -1.15
2    4     .375    -0.32
3    7     .625    +0.32
4    10    .875    +1.15
[Figure: Normal QQ Plot; vertical axis: Data (0 to 12); horizontal axis: Z values (-1.5 to 1.5).]
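The table above can be reproduced in a few lines. This is a minimal sketch using the standard-library `NormalDist.inv_cdf` for the Z look-up. The least-squares line at the end is our addition, included to illustrate the earlier remark that the intercept estimates μ and the slope estimates σ.

```python
from statistics import NormalDist

data = [2, 4, 7, 10]
ordered = sorted(data)
n = len(ordered)
std_normal = NormalDist()          # standard normal, mean 0 and sd 1

rows = []
for i, y in enumerate(ordered, start=1):
    p_i = (i - 0.5) / n            # estimated proportion below y
    z_i = std_normal.inv_cdf(p_i)  # Z-value with p_i of the distribution below it
    rows.append((i, y, p_i, z_i))
    print(f"i = {i}  y = {y:2d}  P = {p_i:.3f}  Z = {z_i:+.2f}")

# Least-squares line y = intercept + slope * z through the QQ points.
z_vals = [row[3] for row in rows]
z_bar = sum(z_vals) / n
y_bar = sum(ordered) / n
slope = (sum((z - z_bar) * (y - y_bar) for z, y in zip(z_vals, ordered))
         / sum((z - z_bar) ** 2 for z in z_vals))
intercept = y_bar - slope * z_bar
print(f"intercept (estimates mu) = {intercept:.2f}, "
      f"slope (estimates sigma) = {slope:.2f}")
```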
Normal Probability (QQ) Plot
[Figure: QQ Plot with Data on Vertical Axis; vertical axis from 0 to 16, horizontal axis from -3 to 3.]
This data is a random sample from a n(10,2) population.
[Figure: a second QQ Plot with Data on Vertical Axis, drawn on the same axes.]
Estimation of the Mean
Point Estimators
• A point estimator is a single number calculated from sample data that is used to estimate the value of a parameter.
• Recall that statistics change value upon repeated sampling of the same population while parameters are fixed, but unknown.
• Examples:
$\hat{p}$ estimates $p$
$\hat{\mu} = \bar{y}$ estimates $\mu$
$\hat{\sigma} = s$ estimates $\sigma$
$\hat{\sigma}^2 = s^2$ estimates $\sigma^2$
In General: $\hat{\theta}$ is an estimator of the arbitrary parameter $\theta$.
What makes a “Good” estimator?
(1) Accuracy: An unbiased estimator of a parameter is one whose expected value is equal to the parameter of interest.
(2) Precision: An estimator is more precise if its sampling distribution has a smaller standard error*.
*Standard error is the standard deviation for the sampling distribution.
Unbiased Estimators
For normal populations, both the sample
mean and sample median are unbiased
estimators of μ.
[Figure: Sampling Distributions for Mean and Median; both curves are centered at µ on a horizontal axis from -8 to 8.]
Most Efficient Estimators

If you have multiple unbiased estimators, then you
choose the estimator whose sampling distribution
has the least variation. This is called the most
efficient estimator.
[Figure: Sampling Distributions for Mean and Median, on a horizontal axis from -8 to 8.]
For normal populations, the sample mean is the most efficient estimator of μ.
Interval Estimate of the Mean
$$Z = \frac{\bar{Y}_n - \mu}{\sigma/\sqrt{n}}$$
follows a standard normal distribution.
$$P\left(-1.96 \le \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95$$
(with a little algebra)
$$P\left(\mu \in \bar{Y} \pm 1.96\,\frac{\sigma}{\sqrt{n}}\right) = 0.95$$
So we say that we are 95% confident that μ is in the interval
$$\bar{Y} \pm 1.96\,\frac{\sigma}{\sqrt{n}}$$
What assumptions have we made?
Interval Estimate of the Mean
[Figure: Standard Normal density on a Z axis from -4 to 4; the central area between -1.96 and 1.96 is 0.95, with area .025 in each tail.]
Interval Estimate of the Mean
• Let’s go from 95% confidence to the general case.
• The symbol $z_\alpha$ is the z-value that has an area of α to the right of it.
$$P\left(-z_{\alpha/2} \le \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right) = (1 - \alpha)$$
$$P\left(\mu \in \bar{Y} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right) = (1 - \alpha)$$
Interval Estimate of the Mean
[Figure: Standard Normal density on an axis from -4 to 4; the central area between $-Z_{\alpha/2}$ and $+Z_{\alpha/2}$ is 1-α, with area α/2 in each tail.]
(1 – α) 100% Confidence Interval
What Does (1 – α) 100% Confidence Mean?
[Figure: the sampling distribution of $\bar{y}$, n(μ, $\sigma/\sqrt{n}$), on a Z axis, drawn above the (1-α)100% confidence intervals from repeated samples (numbered 1 to 8). Each interval is centered at its own sample mean $\bar{y}$ and is compared against the true mean μ.]
If Z0.05 = 1.645, we are _____% confident that the mean is between
$$\bar{y} \pm 1.645\,\frac{\sigma}{\sqrt{n}}$$
A. 99%
B. 95%
C. 90%
D. 85%
Which z-value would you use to
calculate a 99% confidence
interval on a mean?
A. Z0.10 = 1.282
B. Z0.01 = 2.326
C. Z0.005 = 2.576
D. Z0.0005 = 3.291
Plastic Injection Molding Process
• A plastic injection molding process for a part that has a critical width dimension historically follows a normal distribution with a standard deviation of 8.
• Periodically, clogs from one of the feeder lines cause the mean width to change. As a result, the operator periodically takes random samples of size 4.
Plastic Injection Molding
• A recent sample of four yielded a sample mean of 101.4.
• Construct a 95% confidence interval for the true mean width.
• Construct a 99% confidence interval for the true mean width.
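Both intervals follow directly from $\bar{y} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ with σ = 8 known. The sketch below is illustrative only. The function name `z_interval` is hypothetical, and the critical value comes from the standard library's `NormalDist.inv_cdf`.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(ybar, sigma, n, confidence):
    """Confidence interval for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    half_width = z * sigma / sqrt(n)
    return ybar - half_width, ybar + half_width

ci95 = z_interval(101.4, 8, 4, 0.95)
ci99 = z_interval(101.4, 8, 4, 0.99)

print(f"95% CI: ({ci95[0]:.2f}, {ci95[1]:.2f})")
print(f"99% CI: ({ci99[0]:.2f}, {ci99[1]:.2f})")
```

The 99% interval is wider than the 95% interval, since a higher level of confidence requires a larger critical value.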
When going from a 95% confidence
interval to a 99% confidence interval,
the width of the interval will
A. Increase.
B. Decrease.
C. Remain the same.
Interval Width, Level of Confidence
and Sample Size

At a given sample size, as level of
confidence increases, interval width
__________.

At a given level of confidence as sample
size increases, interval width __________.
Calculate Sample Size Before
Sampling!

The width of the interval is determined by:
$$\pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
Suppose we wish to estimate the mean to a maximum error of e:
$$\text{Max error} = e = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \quad\Longrightarrow\quad n = \left(\frac{z_{\alpha/2}\,\sigma}{e}\right)^2$$
Plastic Injection Molding
• A plastic injection molding process for a part that has a critical width dimension historically follows a normal distribution with a standard deviation of 8.
• What sample size is required to estimate the true mean width to within ± 2 units at 95% confidence?
• What sample size is required to estimate the true mean width to within ± 2 units at 99% confidence?
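The sample-size formula $n = (z_{\alpha/2}\,\sigma/e)^2$ can be applied to these two questions as sketched below. The function name `required_sample_size` is hypothetical, and the result is rounded up because a fractional bulb or part cannot be sampled.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma, max_error, confidence):
    """Smallest n whose z-interval half-width is at most max_error."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    return ceil((z * sigma / max_error) ** 2)

n95 = required_sample_size(sigma=8, max_error=2, confidence=0.95)
n99 = required_sample_size(sigma=8, max_error=2, confidence=0.99)

print(f"95% confidence: n = {n95}")
print(f"99% confidence: n = {n99}")
```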
If we don’t have prior knowledge of the
standard deviation, but can assume we
are sampling from a normal
population…

Instead of using a z-value to calculate the
confidence interval…
$$P\left(-t_{\alpha/2} \le \frac{\bar{Y} - \mu}{s/\sqrt{n}} \le t_{\alpha/2}\right) = (1 - \alpha)$$
$$P\left(\mu \in \bar{Y} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}}\right) = (1 - \alpha)$$
Interval Estimate of the Mean
[Figure: t density with df = n-1 on an axis from -4 to 4; the central area between $-t_{\alpha/2}$ and $+t_{\alpha/2}$ is 1-α, with area α/2 in each tail.]
(1 – α) 100% Confidence Interval
Plastic Injection Molding –
Reworded
• A plastic injection molding process for a part that has a critical width dimension historically follows a normal distribution.
• A recent sample of four yielded a sample mean of 101.4 and sample standard deviation of 8.
• Estimate the true mean width with a 95% confidence interval.
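With σ estimated by s, the interval uses a t critical value with n − 1 = 3 degrees of freedom. The standard library has no t-distribution, so this sketch finds the critical value by bisection on a numerically integrated t CDF. The helpers `t_pdf`, `t_cdf`, and `t_ppf` are our own illustrative functions, not a library API.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), via Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_ppf(p, df):
    """Value t with P(T <= t) = p, found by bisection on t_cdf."""
    lo, hi = -100.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

ybar, s, n, confidence = 101.4, 8.0, 4, 0.95
t_crit = t_ppf(1 - (1 - confidence) / 2, n - 1)   # t_{alpha/2} with df = 3
half_width = t_crit * s / sqrt(n)
ci = (ybar - half_width, ybar + half_width)

print(f"t critical = {t_crit:.3f}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```

This interval is noticeably wider than the z-interval for the same data, which reflects the extra uncertainty from estimating σ with only four observations.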
Hypothesis Testing
Combustion Engine
The nominal power produced by a student-designed combustion engine is assumed to be at least 100 hp. We wish to test the alternative that the power is less than 100 hp.
Let µ = nominal power of engine.
A QQ plot shows it is reasonable to assume data came from a normal distribution.
Sample Data: n = 10, $\bar{y} = 98.7$, s = 2.8694
Combustion Engine
(1) State hypotheses, set alpha.
(2) Choose test statistic
(3,4) Designate critical value for test and
draw conclusion.
or
Calculate p-value and draw conclusion.
(3) Designate Critical Region
[Figure: sampling distribution of Y = avg hp, centered at 100 and drawn assuming H0: µ = 100 is true, with a matching t(df=9) axis from -4 to +4. The critical region is the lower tail of area 0.05, below t = -1.833.]
Draw conclusion:
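$$t_{df=9} = \frac{\bar{y} - \mu_0}{s/\sqrt{n}} = \frac{98.7 - 100}{2.8694/\sqrt{10}} = -1.4327$$
[Figure: t(df=9) density on an axis from -4 to 4, marking the test statistic -1.4327 and the critical value -1.833.]
Because -1.4327 is not below the critical value -1.833, the test statistic falls outside the critical region and we fail to reject H0.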
p-value

The p-value is the probability of getting the sample result we got or something more extreme.
[Figure: t(df=9) density on an axis from -4 to 4; the area below -1.4327 is 0.0928.]
p-value
• P(t_{df=9} < -1.4327) = 0.0928
Note:
If p-value < α, reject H0.
If p-value > α, fail to reject H0.
[Figure: t(df=9) density on an axis from -4 to 4; the p-value area 0.0928 below -1.4327 is compared with the area 0.05 below the critical value -1.833.]
Average Life of a Light Bulb
Historically, a particular light bulb has
had a mean life of no more than 2000
hours. We have changed the
production process and believe that
the life of the bulb has increased.
Let μ = mean life.
(1) Set Up Hypotheses
α = 0.05
H0:
Ha:
Average Life of a Light Bulb
(2) Collect Data and calculate test statistic:
$\bar{y} = 2141$, n = 15, s = 216
$$t_{df=14} = \frac{\bar{y} - \mu_0}{s/\sqrt{n}} = \frac{2141 - 2000}{216/\sqrt{15}} = 2.5282$$
[Figure: t(df=14) density on an axis from -4 to 4; the area above the critical value 1.761 is 0.05 and the area above the test statistic 2.5282 is 0.0121.]
p-value = P(t_{df=14} > 2.5282) = 0.0121
Average Life of a Light Bulb
State Conclusion:
A. At 0.05 level of significance there is insufficient evidence to conclude that µ > 2000 hours.
B. At 0.05 level of significance there is sufficient evidence to conclude that µ > 2000 hours.
Mean Width of a Manufactured Part

Test the theory that the mean width of a
manufactured part differs from 100 cm.
Let µ = mean width.
(1) Set up Hypotheses
α = 0.05
Mean Width of a Manufactured Part
(2,3) Collect data and calculate test statistic.
$\bar{y} = 105$, s = 6, n = 20
t_{df=19} =
p-value = 2 * P(t_{df=19} ....
(4) State conclusion.
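The upper-tailed light bulb test and the two-sided width test both follow the same recipe, which the sketch below wraps in a hypothetical helper `one_sample_t_test`. Python's standard library has no t-distribution, so `t_pdf` and `t_cdf` are our own numerical helpers, and the whole block is an illustration under those assumptions rather than the author's worked solution.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), via Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def one_sample_t_test(ybar, s, n, mu0, tail):
    """Return (t statistic, p-value); tail is 'lower', 'upper', or 'two-sided'."""
    t_stat = (ybar - mu0) / (s / sqrt(n))
    df = n - 1
    if tail == "lower":
        p = t_cdf(t_stat, df)
    elif tail == "upper":
        p = 1 - t_cdf(t_stat, df)
    elif tail == "two-sided":
        p = 2 * (1 - t_cdf(abs(t_stat), df))
    else:
        raise ValueError("tail must be 'lower', 'upper', or 'two-sided'")
    return t_stat, p

alpha = 0.05
bulb_t, bulb_p = one_sample_t_test(2141, 216, 15, 2000, "upper")
width_t, width_p = one_sample_t_test(105, 6, 20, 100, "two-sided")

for name, t_stat, p in (("light bulb", bulb_t, bulb_p), ("width", width_t, width_p)):
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(f"{name}: t = {t_stat:.4f}, p-value = {p:.4f}, {decision}")
```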
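Given population parameter µ and value µ0:
For H0: µ = µ0
• Ha: µ ≠ µ0: reject H0 in either tail, with area α/2 in each tail.
• Ha: µ > µ0: reject H0 in the upper tail, with area α.
• Ha: µ < µ0: reject H0 in the lower tail, with area α.
[Figures: three sampling distributions under H0, with the Ha rejection regions shaded as described.]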
There Are Two Errors We Can
Make in a Hypothesis Test
1) Reject H0 when H0 is true. This is called a type I error.
P(Rej H0|H0 is true) = α
2) Fail to Reject H0 when Ha is true at some value. This is called a type II error.
P(Fail to Rej H0|Ha is true at some value) = β
Avg Life of Light Bulb - Type I Error
H0: µ ≤ 2000
Ha: µ > 2000
[Figure: sampling distribution on a Z axis, drawn assuming H0 is true; the "Fail to reject H0" region lies below the critical value and the upper-tail area above it is α.]
α = Probability that we will reject H0 when H0 is true.
Type I and Type II Errors
[Figure: two sampling distributions, one under H0: µ = 2000 and one for "What if µ = 2200".]
α = Probability that we will reject H0 when H0 is true.
β = Probability we will fail to reject H0 when Ha is true at µ = 2200.
How can we control the size of β?
• The value of α.
• Location of our point of interest.
• Sample size.
Calculating β
• If µ = 2200, what is the probability of a type II error?
• Given: α = 0.05 and we are assuming µ = 2000. We will also assume we know σ = 216.
$$P(Z > 1.645) = 0.05$$
$$1.645 = \frac{\bar{y} - 2000}{216/\sqrt{15}} \quad\Longrightarrow\quad \bar{y} \approx 2091$$
Calculating β
[Figure: sampling distributions under H0: µ = 2000 and under "What if µ = 2200"; the cutoff 2091 separates "Fail to Reject H0" (below) from "Reject H0" (above).]
$$\beta = P(\bar{y} < 2091 \mid \mu = 2200)$$
Calculating β
$$\beta = P(\bar{y} < 2091 \mid \mu = 2200) = P\left(z < \frac{2091 - 2200}{216/\sqrt{15}}\right) = P(z < -1.9544) = 0.0254$$
$$\beta = P(\text{Fail to Reject } H_0 \mid \mu = 2200) = 0.0254$$
α, β and Power

• α = P(Reject H0|µ = 2000) = 0.05
• β = P(Fail to Rej H0|µ = 2200) = 0.0254
• We say that the power of this test at µ = 2200 is 1 – 0.0254 = 0.9746
• Power = 1 – β
• Power = P(Rej H0|µ is at some Ha level)
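The β and power calculation can be sketched in code as follows. This is an illustration with our own variable names, using σ = 216 and n = 15 as in the slides. It keeps the unrounded cutoff (about 2091.7) instead of the slides' 2091, so β comes out slightly larger than 0.0254.

```python
from math import sqrt
from statistics import NormalDist

mu0, mu_alt = 2000.0, 2200.0      # mean under H0 and the alternative of interest
sigma, n, alpha = 216.0, 15, 0.05

se = sigma / sqrt(n)                          # standard deviation of ybar
z_crit = NormalDist().inv_cdf(1 - alpha)      # about 1.645 for alpha = 0.05
cutoff = mu0 + z_crit * se                    # reject H0 when ybar > cutoff

# Type II error: ybar falls below the cutoff even though mu = mu_alt.
beta = NormalDist(mu_alt, se).cdf(cutoff)
power = 1 - beta

print(f"cutoff = {cutoff:.1f}")
print(f"beta   = {beta:.4f}")
print(f"power  = {power:.4f}")
```

Rerunning the sketch with a larger n, a larger α, or an alternative mean farther from 2000 shows β shrinking, which matches the three levers listed for controlling β.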
Plastic Injection Molding
• A plastic injection molding process for a part that has a critical width dimension historically follows a normal distribution.
• A recent sample of n = 4 yielded a sample mean of 101.4 and sample standard deviation of 8.
• Does this data support the statement: “The true average width is greater than 95.”?
Plastic Injection Molding
Confidence Interval Approach

95% confidence interval on µ:
$$\bar{y} \pm t_{df=3,\,0.025}\,\frac{s}{\sqrt{n}} = 101.4 \pm 3.182\,\frac{8}{\sqrt{4}} = 101.4 \pm 12.728$$
(88.67, 114.13)
Plastic Injection Molding
Hypothesis Test Approach
H0:
Ha:
α = 0.05
Test statistic is
p-value =
Conclusion: