DEFINITION - Addis Ababa University (USA)

G. W. Teklewolde Math MS
Statistics Basics
Study Note
Part 6
The t-Distribution
In many real-life situations, the population standard deviation is unknown. Moreover, because of various
constraints such as time and cost, it is often not practical to collect samples of size 30 or more. So, how
can you construct a confidence interval for a population mean under such circumstances? If the random
variable is normally distributed (or approximately normally distributed), you can use a t-distribution.
DEFINITION
If the distribution of a random variable x is approximately normal, then
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
follows a t-distribution.
Critical values of t are denoted by $t_c$. Several properties of the t-distribution are as follows.
1. The t-distribution is bell shaped and symmetric about the mean.
2. The t-distribution is a family of curves, each determined by a parameter called the degrees of
freedom. The degrees of freedom are the number of free choices left after a sample statistic
such as $\bar{x}$ is calculated. When you use a t-distribution to estimate a population mean, the degrees
of freedom are equal to one less than the sample size.
$$\text{d.f.} = n - 1 \quad \text{(degrees of freedom)}$$
3. The total area under a t-curve is 1 or 100%.
4. The mean, median, and mode of the t-distribution are equal to zero.
5. As the degrees of freedom increase, the t-distribution approaches the normal distribution. After 30
d.f., the t-distribution is very close to the standard normal z-distribution.
The tails in the t-distribution are “thicker” than those in the standard normal distribution.
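The claims about thicker tails and convergence to the normal curve can be checked numerically. The following is a minimal sketch, assuming the standard closed-form density of Student's t-distribution (which these notes do not state); the function name `t_pdf` and the chosen degrees of freedom are illustrative only.

```python
import math
from statistics import NormalDist


def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    # lgamma is used instead of gamma so that large df does not overflow.
    log_coef = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(t * t / df))


z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Compare the height of each curve far out in the tail (t = 3).
for df in (2, 5, 30, 1000):
    print(f"d.f. = {df:4d}: t density = {t_pdf(3.0, df):.5f}, "
          f"normal density = {z.pdf(3.0):.5f}")
```

For small d.f. the t density at t = 3 is noticeably larger than the normal density, and the gap shrinks as d.f. grows.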
Confidence Intervals and t-Distributions
Constructing a confidence interval using the t-distribution is similar to constructing a confidence
interval using the normal distribution: both use a point estimate $\bar{x}$ and a margin of error E.
GUIDELINES
Constructing a Confidence Interval for the Mean: t-Distribution
1. Identify the sample statistics n, $\bar{x}$, and s.
   In symbols: $\bar{x} = \frac{\sum x}{n}$, $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
2. Identify the degrees of freedom, the level of confidence c, and the critical value $t_c$.
   In symbols: d.f. = n - 1
3. Find the margin of error E.
   In symbols: $E = t_c \frac{s}{\sqrt{n}}$
4. Find the left and right endpoints and form the confidence interval.
   Left endpoint: $\bar{x} - E$
   Right endpoint: $\bar{x} + E$
   Interval: $\bar{x} - E < \mu < \bar{x} + E$
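The four steps above can be sketched in a few lines of Python. This is only an illustration: the sample values are hypothetical, the function name `t_confidence_interval` is not from these notes, and the critical value $t_c$ is passed in by hand because it is assumed to be read from a t-table.

```python
import math
from statistics import mean, stdev


def t_confidence_interval(sample, t_c):
    """Return (left, right) endpoints of the confidence interval for the mean."""
    n = len(sample)
    x_bar = mean(sample)             # step 1: point estimate of the mean
    s = stdev(sample)                # step 1: sample standard deviation (divides by n - 1)
    margin = t_c * s / math.sqrt(n)  # step 3: margin of error E
    return x_bar - margin, x_bar + margin  # step 4: endpoints


# Hypothetical sample of n = 10 measurements, so d.f. = 9 (step 2).
sample = [162, 158, 171, 165, 160, 168, 174, 159, 166, 167]
t_c = 2.262  # t-table value for c = 0.95 with d.f. = 9

left, right = t_confidence_interval(sample, t_c)
print(f"{left:.2f} < mu < {right:.2f}")
```

Passing $t_c$ explicitly mirrors the table lookup in step 2 and keeps the sketch free of third-party libraries.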
Point Estimate for the Population Proportion p
Recall from Section 4.2 that the probability of success in a single trial of a binomial experiment is p. This
probability is a population proportion. To estimate a population proportion p using a confidence interval,
as with confidence intervals for µ, you will start with a point estimate.
DEFINITION
The point estimate for p, the population proportion of successes, is given by the proportion of successes in
a sample and is denoted by
$$\hat{p} = \frac{x}{n}$$
where x is the number of successes in the sample and n is the number in the sample. The point estimate
for the proportion of failures is $\hat{q} = 1 - \hat{p}$. The symbols $\hat{p}$ and $\hat{q}$ are read as “p hat” and “q hat.”
Confidence Intervals for a Population Proportion p
Constructing a confidence interval for a population proportion p is similar to constructing a confidence
interval for a population mean. You start with a point estimate and calculate a margin of error.
DEFINITION
A c-confidence interval for the population proportion p is
$$\hat{p} - E < p < \hat{p} + E$$
where
$$E = z_c \sqrt{\frac{\hat{p}\hat{q}}{n}}$$
The probability that the confidence interval contains p is c.
A binomial distribution can be approximated by the normal distribution if np ≥ 5 and nq ≥ 5. When
$n\hat{p} \ge 5$ and $n\hat{q} \ge 5$, the sampling distribution of $\hat{p}$ is approximately normal with a mean of
$$\mu_{\hat{p}} = p$$
and a standard error of
$$\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}$$
GUIDELINES
Constructing a Confidence Interval for a Population Proportion
1. Identify the sample statistics n and x.
2. Find the point estimate $\hat{p}$.
   $\hat{p} = \frac{x}{n}$
3. Verify that the sampling distribution of $\hat{p}$ can be approximated by the normal distribution.
   $n\hat{p} \ge 5$, $n\hat{q} \ge 5$
4. Find the critical value $z_c$ that corresponds to the given level of confidence c, using the
   Standard Normal Table.
5. Find the margin of error $E = z_c \sqrt{\frac{\hat{p}\hat{q}}{n}}$.
6. Find the left and right endpoints and form the confidence interval.
   Left endpoint: $\hat{p} - E$
   Right endpoint: $\hat{p} + E$
   Interval: $\hat{p} - E < p < \hat{p} + E$
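A short Python sketch of these six steps follows. The counts (290 successes out of 1000) are hypothetical and the function name is illustrative. Instead of the Standard Normal Table, the sketch obtains $z_c$ from the standard library's `NormalDist.inv_cdf`, which is a substitution for the table lookup described above.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(x, n, c):
    """Return (left, right) endpoints of a c-confidence interval for p."""
    p_hat = x / n        # step 2: point estimate of the proportion of successes
    q_hat = 1 - p_hat    # point estimate of the proportion of failures
    # Step 3: check that the normal approximation is reasonable.
    if n * p_hat < 5 or n * q_hat < 5:
        raise ValueError("need n*p_hat >= 5 and n*q_hat >= 5")
    # Step 4: critical value z_c leaves area (1 - c)/2 in each tail.
    z_c = NormalDist().inv_cdf((1 + c) / 2)
    margin = z_c * math.sqrt(p_hat * q_hat / n)  # step 5: margin of error E
    return p_hat - margin, p_hat + margin        # step 6: endpoints


# Hypothetical survey: 290 successes in a sample of 1000.
left, right = proportion_confidence_interval(x=290, n=1000, c=0.95)
print(f"{left:.3f} < p < {right:.3f}")
```

Raising an error when the check in step 3 fails is a design choice of this sketch, so that the interval is never reported when the normal approximation is not justified.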
Increasing Sample Size to Increase Precision
One way to increase the precision of the confidence interval without decreasing the level of confidence is
to increase the sample size.
Finding a Minimum Sample Size to Estimate p
Given a c-confidence level and a margin of error E, the minimum sample size n needed to estimate p is
$$n = \hat{p}\hat{q}\left(\frac{z_c}{E}\right)^2$$
This formula assumes that you have a preliminary estimate for $\hat{p}$ and $\hat{q}$. If not, use $\hat{p} = 0.5$ and $\hat{q} = 0.5$.
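The sample-size formula can be sketched as follows. The function name and the example inputs are hypothetical, $z_c$ is again taken from `NormalDist.inv_cdf` rather than a table, and the result is rounded up because a sample size must be a whole number at least as large as the formula's value.

```python
import math
from statistics import NormalDist


def minimum_sample_size(c, margin, p_hat=0.5):
    """Smallest whole n with n >= p_hat * q_hat * (z_c / E)^2."""
    q_hat = 1 - p_hat
    z_c = NormalDist().inv_cdf((1 + c) / 2)  # critical value for confidence level c
    return math.ceil(p_hat * q_hat * (z_c / margin) ** 2)


# No preliminary estimate available: fall back to p_hat = q_hat = 0.5.
n_no_estimate = minimum_sample_size(c=0.95, margin=0.03)

# Hypothetical preliminary estimate p_hat = 0.2 from an earlier study.
n_with_estimate = minimum_sample_size(c=0.95, margin=0.03, p_hat=0.2)

print(n_no_estimate, n_with_estimate)
```

Because the product $\hat{p}\hat{q}$ is largest at 0.5, the fallback gives the most conservative (largest) sample size.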
The Chi-Square Distribution
In manufacturing, it is necessary to control the amount that a process varies. For instance, an automobile
part manufacturer must produce thousands of parts to be used in the manufacturing process. It is
important that the parts vary little or not at all. How can you measure, and consequently control, the
amount of variation in the parts? You can start with a point estimate.
DEFINITION
The point estimate for $\sigma^2$ is $s^2$ and the point estimate for $\sigma$ is s. $s^2$ is the most unbiased estimate for
$\sigma^2$.
You can use a chi-square distribution to construct a confidence interval for the variance and
standard deviation.
DEFINITION
If the random variable x has a normal distribution, then the distribution of
$$\chi^2 = \frac{(n - 1)s^2}{\sigma^2}$$
forms a chi-square distribution for samples of any size n > 1. Four properties of the chi-square
distribution are as follows.
1. All chi-square values $\chi^2$ are greater than or equal to zero.
2. The chi-square distribution is a family of curves, each determined by the degrees of
freedom. To form a confidence interval for $\sigma^2$, use the $\chi^2$-distribution with degrees of
freedom equal to one less than the sample size.
$$\text{d.f.} = n - 1 \quad \text{(degrees of freedom)}$$
3. The area under each curve of the chi-square distribution equals one.
4. Chi-square distributions are positively skewed.
Chi-square distributions
Confidence Intervals for $\sigma^2$ and $\sigma$
You can use the critical values $\chi^2_R$ and $\chi^2_L$ to construct confidence intervals for a population variance and
standard deviation. As you would expect, the best point estimate for the variance is $s^2$ and the best point
estimate for the standard deviation is s.
DEFINITION
A c-confidence interval for a population variance and standard deviation is as follows.
Confidence interval for $\sigma^2$:
$$\frac{(n - 1)s^2}{\chi^2_R} < \sigma^2 < \frac{(n - 1)s^2}{\chi^2_L}$$
Confidence interval for $\sigma$:
$$\sqrt{\frac{(n - 1)s^2}{\chi^2_R}} < \sigma < \sqrt{\frac{(n - 1)s^2}{\chi^2_L}}$$
The probability that the confidence intervals contain $\sigma^2$ or $\sigma$ is c.
GUIDELINES
Constructing a Confidence Interval for a Variance and Standard Deviation
1. Verify that the population has a normal distribution.
2. Identify the sample statistic n and the degrees of freedom.
   In symbols: d.f. = n - 1
3. Find the point estimate $s^2$.
   In symbols: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
4. Find the critical values $\chi^2_R$ and $\chi^2_L$ that correspond to the given level of confidence c.
   Use the chi-square distribution table.
5. Find the left and right endpoints and form the confidence interval for the population variance.
   Left endpoint < $\sigma^2$ < right endpoint:
   $$\frac{(n - 1)s^2}{\chi^2_R} < \sigma^2 < \frac{(n - 1)s^2}{\chi^2_L}$$
6. Find the confidence interval for the population standard deviation by taking the square root of each
   endpoint.
   Left endpoint < $\sigma$ < right endpoint:
   $$\sqrt{\frac{(n - 1)s^2}{\chi^2_R}} < \sigma < \sqrt{\frac{(n - 1)s^2}{\chi^2_L}}$$
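Steps 2 through 6 can be sketched in Python as below. The sample statistics (n = 30, s = 1.2) are hypothetical and the function name is illustrative. The two critical values are assumed to have been read from a chi-square table for c = 0.99 and d.f. = 29, so they are passed in by hand rather than computed.

```python
import math


def variance_confidence_interval(n, s_squared, chi2_right, chi2_left):
    """Return the confidence intervals for the variance and the standard deviation."""
    df = n - 1                                # step 2: degrees of freedom
    var_left = df * s_squared / chi2_right    # step 5: left endpoint uses chi^2_R
    var_right = df * s_squared / chi2_left    # step 5: right endpoint uses chi^2_L
    # Step 6: square roots of the variance endpoints.
    sd_interval = (math.sqrt(var_left), math.sqrt(var_right))
    return (var_left, var_right), sd_interval


# Hypothetical sample: n = 30 with sample standard deviation s = 1.2.
# Table values for c = 0.99, d.f. = 29 (assumed read from a chi-square table).
var_ci, sd_ci = variance_confidence_interval(
    n=30, s_squared=1.2 ** 2, chi2_right=52.336, chi2_left=13.121
)
print(f"{var_ci[0]:.3f} < sigma^2 < {var_ci[1]:.3f}")
print(f"{sd_ci[0]:.3f} < sigma < {sd_ci[1]:.3f}")
```

Note that the larger critical value $\chi^2_R$ produces the smaller (left) endpoint, since it appears in the denominator.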
Stating a Hypothesis
A verbal statement, or claim, about a population parameter is called a statistical hypothesis. To test a
population parameter, you should carefully state a pair of hypotheses: one that represents the claim and
the other, its complement. When one of these hypotheses is false, the other must be true. Either
hypothesis (the null hypothesis or the alternative hypothesis) may represent the original claim.
DEFINITION
1. A null hypothesis $H_0$ is a statistical hypothesis that contains a statement of equality, such as
≤, =, or ≥.
2. The alternative hypothesis $H_a$ is the complement of the null hypothesis $H_0$. It is a statement that
must be true if $H_0$ is false, and it contains a statement of inequality, such as >, ≠, or <.
$H_0$ is read as “H sub-zero” or “H naught” and $H_a$ is read as “H sub-a.”
To write the null and alternative hypotheses, translate the claim made about the population parameter
from a verbal statement to a mathematical statement. Then, write its complement. For instance, if the
claim value is k and the population parameter is µ, then some possible pairs of null and alternative
hypotheses are
$$\begin{cases} H_0: \mu \le k \\ H_a: \mu > k \end{cases} \qquad \begin{cases} H_0: \mu \ge k \\ H_a: \mu < k \end{cases} \qquad \begin{cases} H_0: \mu = k \\ H_a: \mu \ne k \end{cases}$$
Regardless of which of the three pairs of hypotheses you use, you always assume $\mu = k$ and examine
the sampling distribution on the basis of this assumption.
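The rule for pairing a claim with its complement can be written as a small lookup. This is only a sketch of the bookkeeping: the function name, the ASCII relation strings, and the claim value 15 are all hypothetical choices for illustration.

```python
# Each relation mapped to its logical complement.
COMPLEMENT = {"<=": ">", ">": "<=", ">=": "<", "<": ">=", "=": "!=", "!=": "="}

# Relations that contain a statement of equality belong in the null hypothesis.
CONTAINS_EQUALITY = {"<=", ">=", "="}


def state_hypotheses(claim_relation, k, parameter="mu"):
    """Return (H0, Ha, which hypothesis represents the claim)."""
    if claim_relation not in COMPLEMENT:
        raise ValueError(f"unknown relation: {claim_relation!r}")
    other = COMPLEMENT[claim_relation]
    if claim_relation in CONTAINS_EQUALITY:
        null_rel, alt_rel, claim_is = claim_relation, other, "H0"
    else:
        null_rel, alt_rel, claim_is = other, claim_relation, "Ha"
    return (f"H0: {parameter} {null_rel} {k}",
            f"Ha: {parameter} {alt_rel} {k}",
            claim_is)


# Hypothetical claim: "the mean is greater than 15".
h0, ha, claim_is = state_hypotheses(">", 15)
print(h0, "|", ha, "| claim is", claim_is)
```

The third return value records the point made above that either hypothesis may represent the original claim.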