

1/08/03 252oneal (Open this document in 'Outline' view!)

ECONOMICS 252 COURSE OUTLINE

A. Parameter Estimation

1. Review of the Normal Distribution

See 251greatD, 251distrex2, 251distrex3, 251distrex4

2. Point and Interval Estimation

3. A Confidence Interval for the Mean when the Population Variance is Known.

a. A Two-Sided Confidence Interval

An interval of this type is used in two situations: (i) where the population variance, $\sigma^2$, is in fact known and the sample size is relatively large; or (ii) where the variance is not known and the sample variance, $s^2$, is used to replace $\sigma^2$, but the degrees of freedom are so large that the appropriate value of $t$ is not very different from $z$.

The first of these situations is not very realistic, but serves as a good introduction to confidence intervals. The formula for this type of confidence interval for the mean is $\mu = \bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}}$, where $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$.

Note: If $n > .05N$, use $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$ ($n$ is sample size and $N$ is population size). See 252onealex1.

Don’t use this method unless you know the population variance.
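The two-sided interval with a known population standard deviation can be sketched in Python as follows. This is only an illustration, not part of the course material: the function name `z_interval`, its parameters, and the figures used (sample mean 50, $\sigma = 10$, $n = 100$) are hypothetical, and the critical value $z_{\alpha/2}$ is taken from the standard library's `statistics.NormalDist` rather than from a table.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar, sigma, n, conf=0.95, N=None):
    """Two-sided confidence interval for the mean, population sigma known.

    N is an optional population size; the finite population correction
    is applied only when n exceeds 5% of N.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    se = sigma / sqrt(n)                          # sigma_xbar
    if N is not None and n > 0.05 * N:
        se *= sqrt((N - n) / (N - 1))             # finite population correction
    return x_bar - z * se, x_bar + z * se


# Hypothetical numbers, chosen only for illustration.
low, high = z_interval(x_bar=50.0, sigma=10.0, n=100)
print(round(low, 2), round(high, 2))  # 48.04 51.96
```

Passing a population size such as `N=500` shrinks the standard error, so the interval becomes narrower.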

b. A One-Sided Confidence Interval.

There are two types of one-sided confidence interval for the mean.

These are (i) an upper bound and (ii) a lower bound, and have the form $\mu \le \bar{x} + z_{\alpha}\,\sigma_{\bar{x}}$ and $\mu \ge \bar{x} - z_{\alpha}\,\sigma_{\bar{x}}$. An example is in 252oneaex1a.
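A one-sided bound puts all of $\alpha$ in a single tail, so it uses $z_{\alpha}$ instead of $z_{\alpha/2}$. The sketch below shows both bounds side by side; the function name `one_sided_bounds` and the input figures are hypothetical and serve only as an illustration.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_bounds(x_bar, sigma, n, conf=0.95):
    """Return (lower bound, upper bound) for the mean, sigma known.

    Each bound is a separate one-sided interval at the given confidence
    level; they are not meant to be combined into one two-sided interval.
    """
    z = NormalDist().inv_cdf(conf)  # z_alpha: all of alpha in one tail
    se = sigma / sqrt(n)            # sigma_xbar
    return x_bar - z * se, x_bar + z * se


# Hypothetical numbers, chosen only for illustration.
lower, upper = one_sided_bounds(x_bar=50.0, sigma=10.0, n=100)
print(round(lower, 2), round(upper, 2))  # 48.36 51.64
```

At 95% confidence the one-sided critical value is about 1.645, compared with 1.96 for the two-sided interval.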

4. A Confidence Interval for the Mean when the Population Variance is not Known.

"The variance is not known" implies that there is no previous knowledge or assumption about the value of $\sigma^2$. Knowing $s^2$ is having a guess as to what the variance is; it is not the same as knowing the variance. If the population distribution is normal or approximately normal, the formula for a two-sided confidence interval for the mean is $\mu = \bar{x} \pm t^{(n-1)}_{\alpha/2}\,s_{\bar{x}}$, where $s_{\bar{x}} = \frac{s}{\sqrt{n}}$.

Note: If $n > .05N$, use $s_{\bar{x}} = \frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$.

See 252onealex2 and 252oneaex3.

Note: this is the more common case – if you do not know the population variance and the sample size is not very large, using z instead of t is a very bad idea.
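The following Python sketch illustrates the $t$ interval. Python's standard library has no $t$ distribution, so the critical value is passed in as it would be read from a $t$ table; the function name `t_interval`, the data, and the choice of 95% confidence are all hypothetical. The value 2.306 is the usual table entry for $t_{.025}$ with 8 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev


def t_interval(sample, t_crit):
    """Two-sided CI for the mean when sigma is unknown.

    t_crit is t_{alpha/2} with n - 1 degrees of freedom, looked up in a
    t table by the caller.
    """
    n = len(sample)
    x_bar = mean(sample)
    se = stdev(sample) / sqrt(n)  # s_xbar, using the n - 1 divisor for s
    return x_bar - t_crit * se, x_bar + t_crit * se


# Hypothetical sample of n = 9, so there are 8 degrees of freedom.
data = [12, 15, 11, 14, 13, 16, 10, 14, 12]
low, high = t_interval(data, t_crit=2.306)
print(round(low, 2), round(high, 2))  # 11.51 14.49
```

Using $z_{.025} = 1.96$ in place of 2.306 here would give a noticeably narrower interval, which is the danger the note above warns about.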


5. Deciding on Sample Size when working with a Mean

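The formula usually suggested is $n = \frac{z^2\sigma^2}{e^2}$, where, if $\sigma$ is not known, it can be approximated by $\frac{x_{.001} - x_{.999}}{6}$, that is, about one sixth of the range covered by nearly all of the data.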
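A short Python sketch of the sample-size calculation for a mean follows. It assumes that $e$ is the largest acceptable error (half the width of the interval) and that the result is rounded up to a whole number; the function names and the figures are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_mean(sigma, e, conf=0.95):
    """Smallest n with z^2 * sigma^2 / e^2 <= n, for a two-sided interval."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    return ceil(z**2 * sigma**2 / e**2)


def sigma_from_range(high, low):
    """Rough stand-in for sigma: one sixth of the range of nearly all data."""
    return (high - low) / 6


# Hypothetical numbers: sigma = 10, acceptable error of 2, 95% confidence.
print(sample_size_mean(sigma=10.0, e=2.0))  # 97

# If sigma is unknown but almost all values lie between 20 and 80:
print(sample_size_mean(sigma=sigma_from_range(80.0, 20.0), e=2.0))  # 97
```

Halving the acceptable error roughly quadruples the required sample size, because $e$ enters the formula squared.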

6. A Confidence Interval for a Proportion.

(a. Small Samples.

Table 16 (ConfidenceIntervalsBinominalDistribution.pdf) gives Confidence Intervals for proportions.

These tables are of use when the conditions do not exist in which one can use the normal distribution. For example, if $n = 10$ and $\bar{p} = .5$, and we wish to find a 95% confidence interval, we can look at the horizontal axis of the upper table. There we can find $\bar{p} = .5$ and look up to find the upper and lower curves for $n = 10$. The vertical line at $\bar{p} = .5$ intersects these curves. The lower curve meets the vertical line at about $p = .175$. (Read up the vertical axis.) The upper curve meets the vertical line at about $p = .825$, so that our 95% confidence interval is about $.175 \le p \le .825$.)

b. Large Samples.

More usually, using the normal approximation to the binomial distribution, and using $p$ for the population probability of success and $q$ for the population probability of failure, and letting $\bar{p}$ and $\bar{q}$ be the corresponding sample quantities, we can write $p = \bar{p} \pm z_{\alpha/2}\,s_{\bar{p}}$, where $s_{\bar{p}} = \sqrt{\frac{\bar{p}\,\bar{q}}{n}}$ and $\bar{q} = 1 - \bar{p}$. An example is in 251proport.
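The large-sample interval for a proportion can be sketched in Python as below. The function name `proportion_interval` and the counts (40 successes in 100 trials) are hypothetical and only illustrate the arithmetic.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes, n, conf=0.95):
    """Large-sample (normal approximation) CI for a population proportion."""
    p_bar = successes / n
    q_bar = 1 - p_bar
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    se = sqrt(p_bar * q_bar / n)                  # s_pbar
    return p_bar - z * se, p_bar + z * se


# Hypothetical data: 40 successes in a sample of 100.
low, high = proportion_interval(successes=40, n=100)
print(round(low, 3), round(high, 3))  # 0.304 0.496
```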

c. Deciding on Sample Size.

The usually suggested formula is $n = \frac{pqz^2}{e^2}$, but since $p$ is usually unknown, a conservative choice is to set $p = 0.5$.

This is the formula everyone forgets that we covered.
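As a reminder of how this formula is used, here is a minimal Python sketch. It assumes $e$ is the acceptable error in the proportion and that $n$ is rounded up; the function name `sample_size_proportion` and the figures are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_proportion(e, conf=0.95, p=0.5):
    """Smallest n with p*q*z^2 / e^2 <= n; p = 0.5 is the conservative default."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    return ceil(p * (1 - p) * z**2 / e**2)


# Hypothetical target: error of at most .03 with 95% confidence.
print(sample_size_proportion(e=0.03))  # 1068

# Any other p gives a smaller n, which is why p = 0.5 is conservative.
print(sample_size_proportion(e=0.03, p=0.2) < sample_size_proportion(e=0.03))  # True
```

The product $pq$ is largest at $p = 0.5$, so this choice can only overstate the required sample size.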

7. A Confidence Interval for a Variance.

This method is only appropriate when the population distribution is normal or approximately normal.

For small samples $\frac{(n-1)s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$,

but if the degrees of freedom are too large for the chi-square table use $\frac{s\sqrt{2DF}}{z_{\alpha/2} + \sqrt{2DF}} \le \sigma \le \frac{s\sqrt{2DF}}{-z_{\alpha/2} + \sqrt{2DF}}$. An example is in 252oneaex4.
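Both versions of the interval can be sketched in Python. Two assumptions are made here and should be checked against the course formula table: the subscripts on $\chi^2$ denote upper-tail areas, and the large-sample version is the common approximation $s\sqrt{2DF}/(\pm z_{\alpha/2} + \sqrt{2DF})$ with $DF = n - 1$. The standard library has no chi-square distribution, so table values are passed in by the caller; the function names and sample figures are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def variance_interval(s2, n, chi2_upper, chi2_lower):
    """CI for sigma^2 using chi-square table values.

    chi2_upper is the value with area alpha/2 above it, chi2_lower the
    value with area 1 - alpha/2 above it, both with n - 1 degrees of freedom.
    """
    return (n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower


def sigma_interval_large_df(s, n, conf=0.95):
    """Approximate CI for sigma (not sigma^2) when DF = n - 1 is large."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    root = sqrt(2 * (n - 1))                      # sqrt(2 DF)
    return s * root / (z + root), s * root / (-z + root)


# Hypothetical small sample: n = 11, s^2 = 4. Table values for 10 DF at 95%.
print(variance_interval(4.0, 11, chi2_upper=20.4832, chi2_lower=3.2470))

# Hypothetical large sample: n = 201, s = 5.
print(sigma_interval_large_df(5.0, 201))
```

Note that the first function returns limits for the variance, while the second returns limits for the standard deviation.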


(8. Appendix

A Confidence Interval for a Median.

In a situation where the population distribution is not normal, it is often more appropriate to find the median than the mean. The process of finding a confidence interval for a median is based on one simple fact: the probability that a single number picked at random from a population is above (or below) the median is 50%. Similarly, the probability that any two numbers picked at random from a population are both above (or both below) the median is 25%. This comes from the multiplication rule: If $A$ is the event that the first number is above the median, and $B$ is the event that the second number is above the median, then $P(A \cap B) = P(A)\,P(B) = (.5)(.5) = .25$ if $A$ and $B$ are independent events. If the probability of both numbers being above the median is 25%, and the probability of both numbers being below the median is 25%, then the probability that both numbers are on the same side of the median is 50%. This is due to the addition rule: Let event $C$ be "both numbers are above the median," and event $D$ be "both numbers are below the median." Then event $C \cup D$ is "both numbers are on the same side of the median." The addition rule says that if $C$ and $D$ are mutually exclusive, $P(C \cup D) = P(C) + P(D)$. Finally, if the probability that both numbers are on the same side of the median is 50%, then the probability that the two numbers are on opposite sides of the median is also 50%. This means that, since any two numbers picked from the sample have a 50% chance of bracketing the median, these two numbers constitute a 50% confidence interval.

Note that, since p , the probability that any one number is above the median, is 0.5, and q , the probability that any one number is below the median, is also 0.5, we have a problem that resembles finding the distribution of the number of heads on two tosses of a fair coin. If we call a head a success, the distribution of heads on two tosses is described by the binomial distribution with n (the number of tries) set at 2, and p (the probability of success on one try) set at 0.5.

For convenience, we will use $q$ (the probability of failure on one try) for the probability that one number is below the median or of getting a tail on one toss of a fair coin. It is always true that $q = 1 - p$. The formula for the binomial distribution is $P(x) = C^n_x\,p^x q^{n-x}$, where $x$ is the number of successes. For the probability of two successes (heads) in 2 tries, we find that $P(2) = C^2_2\,(.5)^2(.5)^0 = .25$. We find the probability of two heads or two tails in two tries by noting that the probability of two failures (tails) is $P(0) = C^2_0\,(.5)^0(.5)^2 = .25$. Thus the probability of two heads or two tails is $P(0) + P(2) = .25 + .25 = .50$.
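The binomial calculation for two tosses can be checked with a few lines of Python. The function name `binom_pmf` is hypothetical; `math.comb` supplies the number of combinations.

```python
from math import comb


def binom_pmf(x, n, p=0.5):
    """P(x successes in n tries) = C(n, x) * p^x * q^(n - x)."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)


two_heads = binom_pmf(2, 2)  # both numbers above the median
two_tails = binom_pmf(0, 2)  # both numbers below the median
print(two_heads, two_tails, two_heads + two_tails)  # 0.25 0.25 0.5
```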

This is the same as the probability of two randomly picked numbers both being on the same side of the median.

To take this a bit further, let us assume that we take a sample of $n$ numbers from a population and then take two numbers at equal distances from the ends of the sample (for example, the fourth lowest and the fourth highest of a sample of 20 numbers). We will find that it is relatively easy to figure out the probability that these numbers bracket the median, and this will be our confidence level. This process requires some new thinking because: (i) we find our confidence interval without using a point estimate as we did in every previously studied method for constructing a confidence interval; and (ii) we find the interval first and then figure out its confidence level instead of starting with a confidence level and then figuring out the interval. This process serves as an introduction to the field of nonparametric statistics, which is largely made up of methods that construct intervals and perform tests without assuming that the parent distribution (the distribution of the population from which the sample is drawn) is normal. In the case of finding a median, the process to be explained would be unnecessary if the parent population were normal, because in a normal population the mean and median are identical. Therefore, if the parent population is normal, we could use a method for finding a confidence interval for the mean in place of a method for finding a confidence interval for a median.

Assume that we pick a sample of four from a population, and that this sample, when put in ascending order, is 20, 25, 29, 30. If we use two numbers at equal distances from the ends as our confidence interval, we can use $20 \le \nu \le 30$ or $25 \le \nu \le 29$ ($\nu$ (nu) is our symbol for a population median). The first of these intervals ($20 \le \nu \le 30$) is wrong only if all four numbers in the sample are below the median or all four numbers are above the median. The probability that all four are above the median is the same as the probability of four heads in four tosses, $P(4) = C^4_4\,(.5)^4(.5)^0 = .0625$. The probability that all four numbers are below the median is the same as the probability of four tails on four tosses, $P(0) = C^4_0\,(.5)^0(.5)^4 = .0625$.

We can find the probability of all four being above the median from a cumulative binomial table by noting that, for $n = 4$, $P(x = 4) = P(x \ge 4) = 1 - P(x \le 3)$. The binomial table will tell us that, for $p = .5$, $P(x \le 4) = 1$ and $P(x \le 3) = .9375$, so $P(x = 4) = 1 - .9375 = .0625$. Since the probability that all four numbers are below the median, $P(x = 0)$, is the same as the probability that all four numbers are above the median, the probability that the two numbers do not bracket the median (the probability that we are wrong, or the significance level) is $\alpha = 2P(x = 0) = 2(.0625) = .1250$. The confidence level is thus $1 - \alpha = 1 - 2P(x = 0) = 1 - 2(.0625) = .8750$.
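The cumulative binomial table entries used for the sample of four can be regenerated with a short Python sketch. The function name `binom_cdf_table` is hypothetical; the values it produces for $n = 4$ and $p = .5$ are exact binary fractions, so they print without rounding error.

```python
from math import comb


def binom_cdf_table(n, p=0.5):
    """Return [P(x <= 0), P(x <= 1), ..., P(x <= n)] for a binomial(n, p)."""
    q = 1 - p
    table, running = [], 0.0
    for x in range(n + 1):
        running += comb(n, x) * p**x * q**(n - x)  # add P(x) to the total
        table.append(running)
    return table


cdf = binom_cdf_table(4)
print(cdf)             # [0.0625, 0.3125, 0.6875, 0.9375, 1.0]
print(1 - cdf[3])      # P(x = 4) = 1 - P(x <= 3) = 0.0625
print(1 - 2 * cdf[0])  # confidence level for 20 <= nu <= 30: 0.875
```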

Now try picking the confidence interval $25 \le \nu \le 29$ by choosing the numbers $x_2$ and $x_3$, that is, the second from the bottom and the second from the top in the ordered sample 20, 25, 29, 30. This interval is invalid if (i) the lowest three or more numbers in the sample are below the median (equivalent to three or more tails when a coin is tossed four times), or (ii) the highest three or more numbers in the sample are above the median (equivalent to three or more heads). The probability of the first of these events is (for $n = 4$ and $p = .5$) $P(x \le 1)$, and the probability of the second event is $P(x \ge 3) = 1 - P(x \le 2)$. But, using the binomial table, we find that $P(x \ge 3) = 1 - P(x \le 2) = P(x \le 1) = .3125$. So the probability that the interval does not bracket the median is $\alpha = 2P(x \le 1) = 2(.3125) = .6250$, and the confidence level is $1 - \alpha = 1 - 2P(x \le 1) = 1 - 2(.3125) = .3750$.

Generalize this to a situation where we take a random sample of $n$ items from a population and put the numbers in ascending order so that $x_1 \le x_2 \le x_3 \le \cdots \le x_{n-1} \le x_n$. Now pick $x_k$ and $x_{n-k+1}$, the numbers that are the $k$th from the bottom and the $k$th from the top, respectively. This interval is invalid if (i) all the numbers included in the interval and all the numbers below the interval are below the median or (ii) all the numbers on the interval and all the numbers above the interval are above the median. The probability of the first event is $P(x \le k-1)$ and the probability of the second event is $P(x \ge n-k+1) = P(x \le k-1)$; the equality is due to the symmetry of the binomial distribution for $p = .5$. So $\alpha = 2P(x \le k-1)$, and the confidence level is $1 - \alpha = 1 - 2P(x \le k-1)$. For example, if we take a sample of 100 items and put them in order and then use the interval $x_{38} \le \nu \le x_{63}$, that is, the 38th number from the bottom and the 38th number from the top, the confidence level (from the binomial table for $n = 100$ and $p = .5$) is $1 - \alpha = 1 - 2P(x \le 37) = 1 - 2(.0060) = .9880$.
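The general rule can be written as a small Python function that takes a sample and a position $k$ and returns the two order statistics together with the exact confidence level $1 - 2P(x \le k-1)$. The function name `median_interval` is hypothetical, and the 100-item sample below is just the integers 1 to 100, used so that the positions are easy to read off.

```python
from math import comb


def median_interval(sample, k):
    """Return (x_k, x_{n-k+1}, confidence level) for the population median."""
    data = sorted(sample)
    n = len(data)
    if not 1 <= k <= n // 2:
        raise ValueError("k must be between 1 and n // 2")
    # P(x <= k - 1) for a binomial with p = .5: count outcomes, divide by 2^n.
    tail = sum(comb(n, x) for x in range(k)) / 2**n
    return data[k - 1], data[n - k], 1 - 2 * tail


print(median_interval([20, 25, 29, 30], k=1))  # (20, 30, 0.875)
print(median_interval([20, 25, 29, 30], k=2))  # (25, 29, 0.375)

low, high, conf = median_interval(range(1, 101), k=38)
print(low, high, round(conf, 3))  # 38 63 0.988
```

Moving $k$ toward the middle of the sample narrows the interval but lowers the confidence level, as the two results for the sample of four show.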

There will be some situations in which we cannot find $P(x \le k-1)$ on the cumulative binomial table. Then we must use a normal approximation to the binomial distribution, that is (using a continuity correction), find the normal probability, $\frac{\alpha}{2} = P(x \le k-1) \approx P\left(z \le \frac{k - 1 + \frac{1}{2} - np}{\sqrt{npq}}\right) = P\left(z \le \frac{k - .5 - .5n}{.5\sqrt{n}}\right)$.

(In the last part of this equality, .5 was substituted for both $p$ and $q$.)
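To see how close the normal approximation is, the sketch below compares it with the exact binomial tail for the earlier case of $n = 100$ and $k = 38$. The function names `tail_exact` and `tail_normal` are hypothetical; the normal probability comes from `statistics.NormalDist`.

```python
from math import comb, sqrt
from statistics import NormalDist


def tail_exact(n, k):
    """Exact P(x <= k - 1) for a binomial with p = .5."""
    return sum(comb(n, x) for x in range(k)) / 2**n


def tail_normal(n, k):
    """Normal approximation to P(x <= k - 1) with a continuity correction."""
    z = (k - 0.5 - 0.5 * n) / (0.5 * sqrt(n))
    return NormalDist().cdf(z)


n, k = 100, 38
print(round(tail_exact(n, k), 4))   # 0.006
print(round(tail_normal(n, k), 4))  # 0.0062
```

Here $z = (38 - .5 - 50)/5 = -2.5$, and the two probabilities agree to about three decimal places.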

This takes us back to a more conventional formulation for the confidence interval because we can choose $k$ so that $-z_{\alpha/2} = \frac{k - .5 - .5n}{.5\sqrt{n}}$. If we solve this equation for $k$, we find that $k = \frac{n + 1 - z_{\alpha/2}\sqrt{n}}{2}$.

Thus if we want a 95% confidence interval for the median, and we take a sample of $n = 150$, we pick $k = \frac{150 + 1 - 1.96\sqrt{150}}{2} = 63.4975$, which we round down to $k = 63$. Our interval will then be $x_{63} \le \nu \le x_{88}$. See ttable.)
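The formula for $k$ is easy to put into code. The sketch below assumes that $k$ is rounded down to a whole number, which keeps the confidence level at or above the target under the normal approximation; the function name `median_interval_positions` is hypothetical.

```python
from math import floor, sqrt
from statistics import NormalDist


def median_interval_positions(n, conf=0.95):
    """Return (k, n - k + 1): positions from the bottom of the ordered sample."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    k = floor((n + 1 - z * sqrt(n)) / 2)          # round down (assumption)
    if k < 1:
        raise ValueError("sample too small for this confidence level")
    return k, n - k + 1


print(median_interval_positions(150))  # (63, 88)
```

For a sample of 150 this returns positions 63 and 88, so the interval runs from the 63rd to the 88th value of the ordered sample.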

© 2002 R. E. Bove
