Chap 4

8/14/2007
Chapter 4: Sampling and
Estimation
© 2007 Pearson Education
Need for Sampling



Very large populations
Destructive testing
Continuous production process
The objective of sampling is to draw a valid inference about
a population.
Sample Design

Sampling Plan – a description of the
approach that will be used to obtain
samples from a population






Objectives
Target population
Population frame
Method of sampling
Operational procedures for data collection
Statistical tools for analysis
Sampling Methods

Subjective



Judgment sampling
Convenience sampling
Probabilistic

Simple random sampling – every subset of
a given size has an equal chance of being
selected
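A minimal sketch of simple random sampling, assuming the population frame fits in a Python list. The frame of 100 ID numbers and the function name `simple_random_sample` are hypothetical, chosen only for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n items without replacement.

    random.sample gives every subset of size n the same chance
    of being selected, which is the defining property above.
    """
    rng = random.Random(seed)
    return rng.sample(list(population), n)

# Hypothetical population frame: ID numbers 1..100
frame = list(range(1, 101))
sample = simple_random_sample(frame, 10, seed=42)
```

Passing a seed only makes the draw repeatable; omit it for a fresh sample each run.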
PHStat Tool
Random Sample Generator

PHStat menu > Sampling > Random
Sample Generator
Enter sample size
Select sampling
method
Excel Data Analysis Tool
Sampling

Excel menu > Tools > Data Analysis >
Sampling
Specify input range
of data
Choose sampling
method
Select output option
Other Sampling Methods




Systematic sampling
Stratified sampling
Cluster sampling
Sampling from a continuous process
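The slide only names these methods. As one illustration, systematic sampling is commonly described as taking every k-th item after a random start; the sketch below assumes that definition, and the function name and frame are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select every k-th item from the frame after a random start.

    Assumes len(frame) >= n so that the step k is at least 1.
    """
    if n <= 0 or len(frame) < n:
        raise ValueError("need 0 < n <= len(frame)")
    k = len(frame) // n                       # sampling interval
    start = random.Random(seed).randrange(k)  # random start in 0..k-1
    return frame[start::k][:n]

# Hypothetical frame of 100 items, sample of 10 -> every 10th item
frame = list(range(1, 101))
sample = systematic_sample(frame, 10, seed=1)
```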
Errors in Sampling

Nonsampling error – poor sample design
Sampling (statistical) error – depends on sample size
Tradeoff between cost of sampling and
accuracy of estimates obtained by
sampling
Estimation



Estimation – assessing the value of a
population parameter using sample data.
Point estimate – a single number used to
estimate a population parameter
Confidence intervals – a range of values
between which a population parameter is
believed to be along with the probability that
the interval correctly estimates the true
population parameter
Common Point Estimates
Theoretical Issues


Unbiased estimator – one for which the
expected value equals the population
parameter it is intended to estimate
The sample variance is an unbiased
estimator for the population variance
$$ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}, \qquad \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} $$
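To make the two denominators concrete, here is a small sketch on hypothetical data. It computes the sample variance (divide by n – 1) and the population variance (divide by N, treating the same numbers as a whole population), and compares them with Python's `statistics.variance` and `statistics.pvariance`.

```python
import statistics

data = [4.0, 8.0, 6.0, 5.0, 7.0]  # hypothetical observations

n = len(data)
x_bar = sum(data) / n
squared_devs = sum((x - x_bar) ** 2 for x in data)

s2_manual = squared_devs / (n - 1)   # sample variance, n - 1 denominator
sigma2_manual = squared_devs / n     # population variance, N denominator

s2 = statistics.variance(data)       # library version, n - 1 denominator
sigma2 = statistics.pvariance(data)  # library version, N denominator
```

For these five values the sample variance is 2.5 and the population variance is 2.0.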
Interval Estimates

Range within which we believe the true
population parameter falls


Example: Gallup poll – percentage of
voters favoring a candidate is 56% with a
3% margin of error.
Interval estimate is [53%, 59%]
Confidence Intervals


Confidence interval (CI) – an interval
estimate that specifies the likelihood that
the interval contains the true population
parameter
Level of confidence (1 – α) – the probability
that the CI contains the true population
parameter, usually expressed as a percentage
(90%, 95%, 99% are most common).
Sampling Distribution of the
Mean
Interval Estimate Containing the
True Population Mean
Interval Estimate Not Containing
the True Population Mean
Confidence Interval for the
Mean, σ Known
A 100(1 – α)% CI is: $\bar{x} \pm z_{\alpha/2}\,(\sigma/\sqrt{n})$
$z_{\alpha/2}$ may be found from Table A.1 or using the
Excel function NORMSINV(1 – α/2)
Example

Compute a 95 percent confidence interval for
the mean number of TV hours/week for the
18-24 age group in the file TV Viewing.xls.
Assume that the population standard
deviation is known to be 10.0. The sample
mean for the n = 45 observations is
computed to be 60.16. For a 95 percent CI,
$z_{\alpha/2} = 1.96$. Therefore, the CI is
$60.16 \pm 1.96\,(10/\sqrt{45}) = 60.16 \pm 2.92$, or [57.24, 63.08]
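The same calculation can be sketched in Python. This assumes the formula $\bar{x} \pm z_{\alpha/2}(\sigma/\sqrt{n})$ and uses `statistics.NormalDist().inv_cdf` in place of Table A.1 or NORMSINV; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(mean, sigma, n, confidence=0.95):
    """CI for the mean when the population standard deviation is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # same role as NORMSINV(1 - alpha/2)
    half_width = z * sigma / sqrt(n)
    return mean - half_width, mean + half_width

# Values from the TV viewing example: x-bar = 60.16, sigma = 10.0, n = 45
low, high = z_confidence_interval(60.16, 10.0, 45)
```

Rounded to two decimals, the result agrees with the interval [57.24, 63.08] above.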
Confidence Interval for the
Mean, σ Unknown
A 100(1 – α)% CI is: $\bar{x} \pm t_{\alpha/2,\,n-1}\,(s/\sqrt{n})$
$t_{\alpha/2,\,n-1}$ is the value from a t-distribution with
n – 1 degrees of freedom, from Table A.2 or
the Excel function TINV(α, n – 1)
Relationship Between Normal
Distribution and t-distribution
The t-distribution yields larger confidence
intervals for smaller sample sizes.
Example

Compute a 95 percent confidence interval for the
mean number of TV hours/week for the 18-24 age
group in the file TV Viewing.xls. Assume that the
population standard deviation is not known but is estimated
from the sample as 10.095. A 95 percent CI
corresponds to α/2 = 0.025. With 45 observations,
the t-distribution has 45 – 1 = 44 df. Using Table
A.2, we find that $t_{0.025,\,44} = 2.0154$, yielding a 95
percent CI for the mean of
$60.16 \pm 2.0154\,(10.095/\sqrt{45}) = 60.16 \pm 3.03$, or [57.13, 63.19]
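A sketch of the same arithmetic in Python. The standard library has no t-distribution quantile function, so the critical value 2.0154 is taken from the example (Table A.2) and passed in as a parameter; the function name is hypothetical.

```python
from math import sqrt

def t_confidence_interval(mean, s, n, t_crit):
    """CI for the mean when sigma is unknown.

    t_crit is the tabulated t value for alpha/2 and n - 1 df,
    supplied by the caller (e.g. from Table A.2 or Excel TINV).
    """
    half_width = t_crit * s / sqrt(n)
    return mean - half_width, mean + half_width

# Values from the example: x-bar = 60.16, s = 10.095, n = 45, t = 2.0154
low, high = t_confidence_interval(60.16, 10.095, 45, t_crit=2.0154)
```

The interval is slightly wider than the one obtained with z = 1.96, as expected for the t-distribution.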
PHStat Tool: Confidence
Intervals for the Mean

PHStat menu > Confidence Intervals >
Estimate for the mean, sigma known…,
or Estimate for the mean, sigma
unknown…
PHStat Tool: Confidence
Intervals for the Mean - Dialog
Enter the confidence level
Choose specification of
sample statistics
Check Finite Population
Correction box if
appropriate
Sampling From Finite
Populations

When n > 0.05N, use a correction
factor in computing the standard error:
$$ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} $$
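A minimal sketch of the finite population correction, assuming the standard error formula $(\sigma/\sqrt{n})\sqrt{(N-n)/(N-1)}$ and the n > 0.05N rule stated above. The population sizes used in the example calls are hypothetical.

```python
from math import sqrt

def standard_error(sigma, n, N=None):
    """Standard error of the mean, with the finite population
    correction applied only when n exceeds 5% of the population size N."""
    se = sigma / sqrt(n)
    if N is not None and n > 0.05 * N:
        se *= sqrt((N - n) / (N - 1))
    return se

se_infinite = standard_error(10.0, 45)         # no correction
se_small_pop = standard_error(10.0, 45, 500)   # 45 > 25, correction applied
se_large_pop = standard_error(10.0, 45, 10000) # 45 <= 500, no correction
```

The correction factor is always below 1, so it shrinks the standard error and narrows the resulting interval.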
PHStat Tool: Confidence
Intervals for the Mean - Results
Confidence Intervals for
Proportions

Sample proportion: p = x/n




x = number in sample having desired
characteristic
n = sample size
The sampling distribution of p has mean π
and variance π(1 – π)/n
When nπ and n(1 – π) are at least 5,
the sampling distribution of p approaches
a normal distribution
Confidence Intervals for
Proportions
A 100(1 – α)% CI is: $p \pm z_{\alpha/2} \sqrt{\dfrac{p(1 - p)}{n}}$
PHStat tool is available under Confidence
Intervals option
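A hedged sketch of the proportion interval, assuming the formula $p \pm z_{\alpha/2}\sqrt{p(1-p)/n}$. The counts (56 of 100) are hypothetical and the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(x, n, confidence=0.95):
    """Normal-approximation CI for a population proportion.

    x is the number in the sample having the characteristic,
    n is the sample size.
    """
    p = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

# Hypothetical sample: 56 of 100 respondents have the characteristic
low, high = proportion_confidence_interval(56, 100)
```

The approximation relies on the sample being large enough for the sampling distribution of p to be close to normal.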
Confidence Intervals and
Sample Size

CI for the mean, σ known
Sample size needed for half-width of at
most E is $n \ge (z_{\alpha/2})^2 \sigma^2 / E^2$
CI for a proportion
Sample size needed for half-width of at
most E is $n \ge (z_{\alpha/2})^2\, \pi(1 - \pi) / E^2$
Use p as an estimate of π or 0.5 for the
most conservative estimate
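The two sample-size rules can be sketched as follows, assuming $n \ge (z_{\alpha/2})^2\sigma^2/E^2$ for the mean and $n \ge (z_{\alpha/2})^2\pi(1-\pi)/E^2$ for a proportion. The inputs in the example calls are hypothetical, and results are rounded up to the next whole observation.

```python
from math import ceil
from statistics import NormalDist

def _z(confidence):
    """Two-sided critical value z_{alpha/2} for the given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def sample_size_for_mean(sigma, E, confidence=0.95):
    """Smallest n giving a half-width of at most E (sigma known)."""
    return ceil((_z(confidence) * sigma / E) ** 2)

def sample_size_for_proportion(E, pi=0.5, confidence=0.95):
    """Smallest n giving a half-width of at most E; pi = 0.5 is
    the most conservative choice because it maximizes pi * (1 - pi)."""
    return ceil(_z(confidence) ** 2 * pi * (1 - pi) / E ** 2)

n_mean = sample_size_for_mean(sigma=10.0, E=2.0)  # hypothetical inputs
n_prop = sample_size_for_proportion(E=0.03)       # 3% margin of error
```

With these inputs the sketch gives 97 observations for the mean and 1068 for the proportion.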
PHStat Tool: Sample Size
Determination

PHStat menu > Sample Size >
Determination for the Mean or
Determination for the Proportion
Enter s, E, and
confidence level
Check Finite
Population Correction
box if appropriate
Confidence Intervals for
Population Total
A 100(1 – α)% CI is: $N\bar{x} \pm t_{n-1,\,\alpha/2}\; N \dfrac{s}{\sqrt{n}} \sqrt{\dfrac{N - n}{N - 1}}$
PHStat tool is available under Confidence
Intervals option
Confidence Intervals for
Differences Between Means
                      Population 1    Population 2
Mean                  μ₁              μ₂
Standard deviation    σ₁              σ₂
Point estimate        x̄₁              x̄₂
Sample size           n₁              n₂
Point estimate for the difference in means,
μ₁ – μ₂, is given by x̄₁ – x̄₂
Independent Samples With
Unequal Variances
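A 100(1 – α)% CI is: $\bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2,\,df^*} \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$
$$ df^* = \frac{\left( \dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2} \right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}} $$
Fractional values of df* are rounded down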
Example

In the Accounting Professionals.xls worksheet,
find a 95 percent confidence interval for the
difference in years of service between males and
females.
Calculations



s1 = 4.39 and n1 = 14 (females),
s2 = 8.39 and n2 = 13 (males)
df* = 17.81, so use 17 as the degrees
of freedom
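The degrees-of-freedom calculation can be checked with a short sketch. It assumes the unequal-variance formula $df^* = (s_1^2/n_1 + s_2^2/n_2)^2 / [(s_1^2/n_1)^2/(n_1-1) + (s_2^2/n_2)^2/(n_2-1)]$ and uses the sample values listed above; the function name is hypothetical.

```python
from math import floor

def unequal_variance_df(s1, n1, s2, n2):
    """Approximate degrees of freedom for two independent samples
    with unequal variances (before rounding down)."""
    v1 = s1 ** 2 / n1
    v2 = s2 ** 2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# Values from the example: females s1 = 4.39, n1 = 14; males s2 = 8.39, n2 = 13
df_star = unequal_variance_df(4.39, 14, 8.39, 13)
df_used = floor(df_star)  # fractional values are rounded down
```

This reproduces df* of about 17.81, which rounds down to 17.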
Independent Samples With
Equal Variances
A 100(1 – α)% CI is: $\bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2,\,n_1+n_2-2}\; s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$
$$ s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} $$
where $s_p$ is a common “pooled” standard deviation. Must
assume the variances of the two populations are equal.
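A sketch of the pooled standard deviation, assuming $s_p = \sqrt{[(n_1-1)s_1^2 + (n_2-1)s_2^2]/(n_1+n_2-2)}$. For illustration it reuses the sample standard deviations and sizes quoted in the earlier unequal-variance calculation (4.39 with n = 14, 8.39 with n = 13). The half-width helper expects a tabulated t value supplied by the caller, and both function names are hypothetical.

```python
from math import sqrt

def pooled_std(s1, n1, s2, n2):
    """Pooled standard deviation under the equal-variance assumption."""
    return sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))

def pooled_half_width(s1, n1, s2, n2, t_crit):
    """Half-width of the CI for the difference in means;
    t_crit is the table value for alpha/2 and n1 + n2 - 2 df."""
    return t_crit * pooled_std(s1, n1, s2, n2) * sqrt(1 / n1 + 1 / n2)

s_p = pooled_std(4.39, 14, 8.39, 13)
df = 14 + 13 - 2  # degrees of freedom for the t value
```

Here the pooled standard deviation is about 6.62 with 25 degrees of freedom.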
Example: Accounting
Professionals
Paired Samples
A 100(1 – α)% CI is: $\bar{D} \pm t_{n-1,\,\alpha/2}\; s_D/\sqrt{n}$
$D_i$ = difference for each pair of observations
$\bar{D}$ = average of differences
$$ s_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}} $$
PHStat tool available in the
Confidence Intervals menu
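A minimal sketch of the paired-sample interval, assuming $\bar{D} \pm t_{n-1,\alpha/2}\, s_D/\sqrt{n}$. The five pairs of values are hypothetical, and the t value 2.776 (α/2 = 0.025, 4 df) is a standard table value supplied by hand because the Python standard library has no t quantile function.

```python
from math import sqrt
from statistics import mean, stdev

def paired_confidence_interval(first, second, t_crit):
    """CI for the mean difference of paired observations.

    t_crit is the tabulated t value for alpha/2 and n - 1 df.
    """
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = mean(diffs)
    s_d = stdev(diffs)  # uses the n - 1 denominator
    half_width = t_crit * s_d / sqrt(n)
    return d_bar - half_width, d_bar + half_width

# Hypothetical paired measurements
actual = [10.0, 12.0, 9.0, 11.0, 13.0]
estimated = [9.0, 11.5, 9.5, 10.0, 12.0]
low, high = paired_confidence_interval(actual, estimated, t_crit=2.776)
```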
Example


Pile Foundation.xls
A 95% CI for the average difference
between the actual and estimated pile
lengths is
Differences Between
Proportions
A 100(1 – α)% CI is:
$$ p_1 - p_2 \pm z_{\alpha/2} \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}} $$
Applies when $n_i p_i$ and $n_i(1 - p_i)$ are greater than 5
Example

In the Accounting Professionals.xls
worksheet, the proportion of females having
a CPA is 8/14 = 0.57, while the proportion of
males having a CPA is 6/13 = 0.46. A 95
percent confidence interval for the difference
in proportions between females and males is
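As a hedged sketch, the interval for these counts can be evaluated directly from the formula $p_1 - p_2 \pm z_{\alpha/2}\sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}$. The function name is hypothetical, and the proportions are left unrounded, so the output may differ slightly from a hand calculation that starts from 0.57 and 0.46.

```python
from math import sqrt
from statistics import NormalDist

def proportion_difference_ci(x1, n1, x2, n2, confidence=0.95):
    """Normal-approximation CI for the difference of two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - half_width, diff + half_width

# Counts from the example: 8 of 14 females and 6 of 13 males hold a CPA
low, high = proportion_difference_ci(8, 14, 6, 13)
```

Under these assumptions the interval runs from roughly -0.27 to 0.48. It contains zero, so the sample alone does not establish a difference between the two groups.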
Sampling Distribution of s


The sample standard deviation, s, is a point
estimate for the population standard
deviation, σ
The sampling distribution of s is described by the chi-square (χ²)
distribution: for a normal population, (n – 1)s²/σ² has a χ²
distribution with n – 1 df



See Table A.3
CHIDIST(x, deg_freedom) returns probability to
the right of x
CHIINV(probability, deg_freedom) returns the
value of x for a specified right-tail probability
Confidence Intervals for the
Variance
A 100(1 – α)% CI is:
$$ \left[ \frac{(n - 1)s^2}{\chi^2_{n-1,\,\alpha/2}},\; \frac{(n - 1)s^2}{\chi^2_{n-1,\,1-\alpha/2}} \right] $$
Note the difference in the
denominators!
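A sketch of the variance interval, assuming the limits are $(n-1)s^2$ divided by the right-tail chi-square values for α/2 and 1 – α/2. The sample (n = 11, s = 2.0) is hypothetical, and the critical values 20.483 and 3.247 for 10 df are standard table values entered by hand, since the Python standard library has no chi-square quantile function.

```python
def variance_confidence_interval(s, n, chi2_upper, chi2_lower):
    """CI for the population variance.

    chi2_upper: chi-square value with right-tail area alpha/2
    chi2_lower: chi-square value with right-tail area 1 - alpha/2
    Both have n - 1 df and come from a table or Excel CHIINV.
    """
    numerator = (n - 1) * s ** 2
    return numerator / chi2_upper, numerator / chi2_lower

# Hypothetical sample: n = 11, s = 2.0, 95% confidence (10 df)
low, high = variance_confidence_interval(2.0, 11, chi2_upper=20.483, chi2_lower=3.247)
```

The larger critical value produces the lower limit, which is why the interval is not symmetric around s².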
PHStat Tool: Confidence
Intervals for Variance - Dialog

PHStat menu > Confidence Intervals >
Estimate for the Population Variance
Enter sample size,
standard deviation,
and confidence level
PHStat Tool: Confidence
Intervals for Variance - Results
Time Series Data

Confidence intervals only make sense
for stationary time series data
Summary and Conclusions


As the confidence level (1 – α)
increases, the width of the confidence
interval also increases.
As the sample size increases, the width
of the confidence interval decreases.
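Both conclusions can be illustrated with a short sketch, assuming the known-σ interval $\bar{x} \pm z_{\alpha/2}(\sigma/\sqrt{n})$, whose full width is $2 z_{\alpha/2}\sigma/\sqrt{n}$. The σ and n values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def ci_width(sigma, n, confidence):
    """Full width of the known-sigma confidence interval for the mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sigma / sqrt(n)

# Fixed n = 45, rising confidence level: widths grow
widths_by_level = [ci_width(10.0, 45, c) for c in (0.90, 0.95, 0.99)]

# Fixed 95% confidence, rising sample size: widths shrink
widths_by_n = [ci_width(10.0, n, 0.95) for n in (25, 100, 400)]
```

Because the width scales with $1/\sqrt{n}$, quadrupling the sample size halves the interval width.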
Probability Intervals


A 100(1 – α)% probability interval for a
random variable X is any interval [a, b]
such that P(a ≤ X ≤ b) = 1 – α
Do not confuse a confidence interval
with a probability interval; confidence
intervals are probability intervals for
sampling distributions, not for the
distribution of the random variable.
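The distinction can be sketched numerically, assuming a normally distributed X with hypothetical mean 60 and standard deviation 10. The first interval describes where individual values of X fall. The second applies the same construction to the sampling distribution of the mean for n = 45, which has standard deviation σ/√n.

```python
from math import sqrt
from statistics import NormalDist

def probability_interval(mu, sigma, confidence=0.95):
    """Central interval [a, b] with P(a <= X <= b) = confidence
    for a normal random variable with mean mu and std dev sigma."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return mu - z * sigma, mu + z * sigma

# Hypothetical X ~ Normal(60, 10)
x_interval = probability_interval(60.0, 10.0)

# Sampling distribution of the mean for n = 45: std dev is sigma / sqrt(n)
xbar_interval = probability_interval(60.0, 10.0 / sqrt(45))
```

The interval for X is about 6.7 times wider (√45) than the one for the sample mean, which is why the two should not be confused.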