Uploaded by alia Alnuamat

Chapter 5 -ESTIMATION-CLT

Al-Ahliyya Amman University
Chapter 5
Central Limit Theorem
CLT
Dr Alia Al nuaimat
What is the point?
Research questions are about population
parameters (these are unknown).
This chapter covers estimation, one of the two types of statistical
inference. As discussed in earlier chapters, statistics, such as means
and variances, can be calculated from samples drawn from
populations. These statistics serve as estimates of the corresponding
population parameters. We expect these estimates to differ by some
amount from the parameters they estimate. This chapter introduces
estimation procedures that take these differences into account,
thereby providing a foundation for statistical inference procedures
discussed in the remaining chapters of the book.
Unbiased estimator
• A statistic is said to be an unbiased estimate of a given parameter
when the mean of the sampling distribution of that statistic can be
shown to be equal to the parameter being estimated.
  • For example, the mean of a sample is an unbiased estimate of the
  mean of the population from which the sample was drawn.
  • We require that the sample fairly represents the population of
  interest.
• We WANT our statistic to be an unbiased estimator of a population
parameter.
• Our research question is about an unknown population, and we will
use statistics (known quantities calculated from our data) to provide
evidence that addresses our research question.
• Having an unbiased estimator gives us good evidence.
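A small enumeration can make the definition concrete. The sketch below uses a made-up five-value population (an assumption for illustration, not data from these slides), lists every possible sample of size 2 drawn with replacement, and checks that the mean of all the sample means equals the population mean.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population (illustrative values only).
population = [2, 4, 6, 8, 15]
mu = Fraction(sum(population), len(population))

# Every equally likely sample of size 2, drawn with replacement.
samples = list(product(population, repeat=2))
sample_means = [Fraction(a + b, 2) for a, b in samples]

# Mean of the sampling distribution of the sample mean.
mean_of_sample_means = sum(sample_means) / len(sample_means)

print("population mean:", mu)
print("mean of all sample means:", mean_of_sample_means)
print("unbiased:", mean_of_sample_means == mu)
```

Individual sample means differ from the population mean, but their average over all possible samples matches it exactly, which is what "unbiased" means.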
Central Limit Theorem (or CLT)
• The fundamental or central theorem of
statistics says that under certain conditions
(i.e., a random sample and a large enough sample
size) the following is true:
“The sampling distribution of a sample mean is
approximately a normal curve.”
Central limit theorem
• Suppose we have a population with mean μ
and standard deviation σ.
• If we take simple random samples of size n
and the sample size is sufficiently large (> 50),
then the sampling distribution of the sample
means is approximately normal with mean = μ
and standard error (i.e., the standard deviation
of the sample means) = σ/√n.
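The statement above can be checked by simulation. The sketch below assumes a skewed exponential population with μ = 1 and σ = 1 (chosen only for illustration), draws many samples of size n = 60, and compares the mean and standard deviation of the sample means with μ and σ/√n.

```python
import random
import statistics
from math import sqrt

random.seed(1)

# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
mu, sigma = 1.0, 1.0
n = 60              # size of each sample (the slide asks for n > 50)
num_samples = 5000  # how many samples are drawn

sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)  # empirical standard error

print(f"mean of sample means: {mean_of_means:.3f} (theory: {mu:.3f})")
print(f"sd of sample means:   {sd_of_means:.3f} (theory: {sigma / sqrt(n):.3f})")
```

Even though the population is far from normal, the two empirical values should land close to the theoretical ones.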
ADDITIONAL MATERIAL TO HELP YOU UNDERSTAND SAMPLING
DISTRIBUTIONS AND THE CENTRAL LIMIT THEOREM (CLT):
• The goal of most research is inference (taking
information from a sample and generalizing it to a
population).
• Valid inference depends on selecting a sample that
fairly represents a population.
• To illustrate the concept of a sampling distribution,
suppose that 25 researchers have the same question:
What is the mean weight of cats in Gainesville, Florida?
Suppose that each researcher finds a random sample of 30 cats, weighs each cat, and
calculates the sample mean. Possible data collected from these experiments are given in
the table on the following slide.
Each sample yields a different value of the sample mean, so the sample mean can be thought of as a random variable.
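The scenario can be imitated with a short simulation. The population values below (mean 4.5 kg, standard deviation 1.0 kg) are hypothetical, since the slides give no actual cat weights; the point is only that 25 researchers obtain 25 different sample means.

```python
import random
import statistics

random.seed(7)

# Hypothetical population of cat weights in kg (not data from the slides).
TRUE_MEAN, TRUE_SD = 4.5, 1.0
NUM_RESEARCHERS, CATS_PER_SAMPLE = 25, 30

sample_means = []
for _ in range(NUM_RESEARCHERS):
    weights = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(CATS_PER_SAMPLE)]
    sample_means.append(statistics.fmean(weights))

for i, m in enumerate(sample_means, start=1):
    print(f"researcher {i:2d}: sample mean = {m:.2f} kg")

print("smallest mean:", round(min(sample_means), 2))
print("largest mean: ", round(max(sample_means), 2))
```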
Next Step!
• We could graph the values of the sample means. The
histogram would give us an idea of the probability
distribution.
• This probability distribution, which shows how to assign
probability to the values of statistics, is called a sampling
distribution.
• The standard deviation of a sampling distribution is called a
standard error.
• Mathematical theory lets us know what the sampling
distributions are for various statistics.
Definition
• Sampling distribution: The probability
distribution of a statistic when the statistic is
considered as a random variable (e.g., the mean
across several samples).
• Standard error: The standard deviation of a
sampling distribution. It reflects how close a
sample mean is likely to be to the population mean.
In practice it is estimated by SE = S/√N, where S is
the sample standard deviation. As N increases, SE
decreases, so the sample mean becomes a more
precise estimate of the true population mean.
General Rule in CLT
• As sample size becomes larger, the distribution of the
sample mean becomes more and more normal.
• If the population data is not normally distributed, the CLT
applies with sample sizes N > 30.
• So you can start with any distribution, take many samples
(each of at least 30 observations), plot the averages of those
samples, and you will end up with an approximately normal
distribution.
• This is why the normal distribution is SO helpful and comes
up so often.
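One way to watch this happen is to track the skewness of the sample means as the sample size grows. The sketch below assumes an exponential population (strongly right-skewed, picked only for illustration) and uses skewness as a rough stand-in for "how non-normal" the shape is; a normal curve has skewness 0.

```python
import random
import statistics

random.seed(3)

def skewness(values):
    """Sample skewness; close to 0 for a symmetric, normal-looking shape."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

NUM_SAMPLES = 5000  # number of sample means per sample size
skew_by_n = {}
for n in (1, 5, 30, 100):
    means = [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(NUM_SAMPLES)
    ]
    skew_by_n[n] = skewness(means)
    print(f"n = {n:3d}: skewness of sample means = {skew_by_n[n]:.2f}")
```

The skewness should shrink toward 0 as n increases, which is the CLT at work. Skewness is only one symptom of non-normality, so this is an informal check rather than a proof.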
Sampling distribution of the Sample Mean
• Derived from samples of the original distribution.
• Will have the same mean as the original distribution.
• But as the sample size gets larger, will get a tighter fit
around the mean.
• When n is small (e.g., n = 1), the distribution will usually not
be normal no matter how many trials you do. As n → ∞, it
approaches a normal distribution.
• The more samples, the closer to the mean the
distribution of your sample means will be? Not quite: it is the
size n of each sample, not the number of samples, that
tightens the distribution.
What will make the sample mean
more accurate?
• We know, the larger the sample (n) the closer
the values to the true mean.
• Also the smaller true σ, the less the spread of
sample means.
Two Factors: n and σ
Standard error of mean
• SE = σ/√N
This does not give the variability of the population; it gives the precision of the estimate
of the mean, i.e., “How close is my sample mean to the TRUE MEAN?”
Example
• Weight of adult women in a population is
normally distributed, with a mean of 75 kg.
Approximately 95 % of all women weigh
between 55kg and 95kg.
• What would the standard error of the mean
for a sample of the weight of 49 women be?
• For 64 women?
• For 625 women?
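• About 95% of a normal population lies within 2σ of the
mean, so 2σ ≈ 20 kg and σ ≈ 10 kg.
• SE of mean for N = 49: 10/√49 ≈ 1.43
• SE of mean for N = 64: 10/√64 = 1.25
• SE of mean for N = 625: 10/√625 = 0.4
• What does this mean? It means that for larger
samples the precision of the sample mean is
better. That is, it is likely to be closer to the true mean.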
• Calculate 95% confidence intervals for each
sample mean.
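The arithmetic for this example can be sketched as follows. The value σ = 10 kg is an assumption derived from the rule of thumb that about 95% of a normal population lies within 2σ of the mean. Because the slides do not give the observed sample means, only the margin E = 1.96 · SE is computed; each interval is then x̄ ± E.

```python
from math import sqrt

# Roughly 95% of a normal population lies within 2 standard deviations
# of the mean, so the 55-95 kg range spans about 4 sigma (rule of thumb).
sigma = (95 - 55) / 4  # = 10 kg

standard_errors = {}
for n in (49, 64, 625):
    se = sigma / sqrt(n)
    standard_errors[n] = se
    margin = 1.96 * se  # half-width of a 95% confidence interval
    print(f"n = {n:3d}: SE = {se:.2f} kg, 95% margin = ±{margin:.2f} kg")
```

The standard errors come out as 1.43, 1.25, and 0.40 kg, so the margin shrinks as the sample grows.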
Confidence Interval for a Mean
x̄ ± Z* σ/√n
There’s a 95% probability that the population
mean μ is within E of the sample mean x̄.
Figure: Distribution of sample means, centered at μ. The central area
between μ − 1.96 σ/√n and μ + 1.96 σ/√n is 0.95, with 0.025 in each tail.
Z0.025 = 1.96, so E = 1.96 σ/√n.
Confidence Interval for a Mean
E = 1.96 σ/√n
E = Error Margin
There’s a 95% probability that x̄, the sample
mean, is within E of the population mean μ.
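The interval x̄ ± E can be computed with a small helper. This is a minimal sketch that assumes σ is known and the sampling distribution is approximately normal; the function name and the input values (x̄ = 75 kg, σ = 10 kg, n = 64) are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(xbar, sigma, n, confidence=0.95):
    """Return (lower, upper) for a z-based confidence interval for the mean.

    Assumes sigma (the population standard deviation) is known and the
    sampling distribution of the mean is approximately normal.
    """
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95%
    margin = z * sigma / sqrt(n)             # E, the error margin
    return xbar - margin, xbar + margin

# Hypothetical inputs: sample mean 75 kg, sigma 10 kg, n = 64.
low, high = mean_confidence_interval(75, 10, 64)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

Passing a different `confidence` (for example 0.90 or 0.99) changes only the z value, and therefore the width of the interval.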
Example:
95% Confidence Interval
Interpretation of 95% CI
• Correct
We have 95% confidence that the true population
mean lies within this interval.
A 95% confidence interval is a range of values that
you can be 95% certain contains the true mean of
the population.
95% of the time, in repeated sampling, the interval
calculated from the same sample size will include
the true mean μ.
• Incorrect
The probability that the mean lies between the
lower and upper limits is 0.95.
WHAT “90% CONFIDENCE” MEANS
• 90% Confidence Interval: Lower Bound < μ < Upper Bound
• What “90% confidence” does not mean:
  • We are 90% confident that the sample mean for the observed
  sample (the data used to obtain the bounds) lies between the
  bounds. ABSOLUTELY FALSE.
  • You can be 100% confident that the sample mean for the
  given data is equal to itself with virtually no error margin.
WHAT “90% CONFIDENCE” MEANS
(When the conditions are satisfied.)
90% of all samples produce an interval that covers
the true mean μ.
We have an interval from one sample, chosen
randomly.
Our interval either does or does not cover μ: in
practice we just don’t know. We do know that the
procedure works 90% of the time.
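The "procedure works 90% of the time" claim can be illustrated by simulation. The sketch below assumes a hypothetical normal population (μ = 50, σ = 8) whose true mean is known to the simulation, builds a 90% interval from each of many samples, and counts how often the interval covers μ.

```python
import random
import statistics
from math import sqrt

random.seed(11)

MU, SIGMA = 50.0, 8.0  # hypothetical population, known only to the simulation
N, TRIALS = 40, 4000
Z_05 = 1.645           # z value for a 90% interval

covered = 0
for _ in range(TRIALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    xbar = statistics.fmean(sample)
    margin = Z_05 * SIGMA / sqrt(N)
    if xbar - margin < MU < xbar + margin:
        covered += 1

coverage = covered / TRIALS
print(f"{coverage:.1%} of intervals covered the true mean")
```

Any single interval either covers μ or it does not; the 90% describes the long-run success rate of the procedure, which is what the printed proportion approximates.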
A 99 percent C.I. for the mean age of Jordanians
was computed to be (29.8, 38.5) years. What
is the interpretation attached to this interval?
(a) We are 99 percent confident that the mean age of Jordanians is between 29.8 and 38.5.
(b) Ninety-nine percent of the residents in our sample had ages between 29.8 and 38.5.
(c) We are 99 percent confident that the mean age of Jordanians in our sample is between 29.8
and 38.5.
(d) All of the above are valid interpretations.
α = tail area    central area = 1 – 2α    zα
0.10             0.80                     z.10 = 1.28
0.05             0.90                     z.05 = 1.645
0.025            0.95                     z.025 = 1.96
0.01             0.98                     z.01 = 2.33
0.005            0.99                     z.005 = 2.58
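The z values in this table can be reproduced from the standard normal distribution. The sketch below uses Python's standard-library `statistics.NormalDist`; for each tail area α it looks up the point with area 1 − α to its left.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

z_table = {}
for alpha in (0.10, 0.05, 0.025, 0.01, 0.005):
    # z_alpha leaves an area of alpha in the upper tail.
    z_table[alpha] = std_normal.inv_cdf(1 - alpha)
    print(f"alpha = {alpha:<5}  central area = {1 - 2 * alpha:.2f}  "
          f"z = {z_table[alpha]:.3f}")
```

The computed values agree with the table once rounded to the precision shown there.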
Table 6.4
Figure 7.5 Locating zα/2 on the standard normal curve
Figure 7.6 The z value (z.05) corresponding to an area equal to .05 in the upper tail of the z-distribution
Figure 7.7 MINITAB output for finding z.05
Table 7.2
Figure 7.9 Standard normal (z) distribution and t-distributions
Table 7.3
Figure 7.10 The t.025 value in a t-distribution with 4 df, and the corresponding z.025 value
Table 7.4
Figure 7.11 SPSS confidence interval for mean blood pressure increase
Figure 7.12 MINITAB printout with descriptive statistics and 99% confidence interval for Example 7.5
Figure 7.15 MINITAB printout with 90% confidence interval for p
Figure 7.16 Relationship between sample size and width of confidence interval: hospital-stay example
Figure 7.17 Specifying the sampling error SE as the half-width of a confidence interval