estimation


ESTIMATION

STATISTICAL INFERENCE

Statistical inference is the procedure by which conclusions about a population are drawn from the results obtained from a sample taken from that population.

This can be achieved by:

Hypothesis testing

Estimation: point estimation and interval estimation

Estimation

If the mean and the variance of a normal distribution are known, the probabilities of various events can be determined.

In practice these values are almost never known, and we have to estimate them from the information in a simple random sample.

The process of estimation involves calculating, from the sample data, some "statistic" that approximates the corresponding "parameter" of the population from which the sample was drawn.

POINT ESTIMATION

A point estimate is a single numerical value, obtained from a random sample, that is used to estimate the corresponding population parameter.

The sample mean ($\bar{X}$) is the best point estimate of the population mean ($\mu$).

The sample standard deviation ($s$) is the best point estimate of the population standard deviation ($\sigma$).

The sample proportion ($\tilde{p}$) is the best point estimate of the population proportion ($P$).

However, a point estimate is always subject to sampling error. For the mean, this error is measured by the standard error of the mean, which reflects the precision of the estimated mean.

Because of sampling variation we cannot say that the exact parameter value is some specific number, but we can determine a range of values within which we are confident the unknown parameter lies.
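As a minimal sketch of the point estimates described above, the following Python snippet computes a sample mean, sample standard deviation, estimated standard error of the mean, and sample proportion. The data values are hypothetical and chosen only for illustration; they do not come from the slides. Because the population $\sigma$ is not known here, the standard error is estimated with $s$ in its place.

```python
import math
import statistics

# Hypothetical measurements (illustrative values only, not from the slides).
sample = [5.1, 6.3, 4.8, 7.0, 5.9, 6.4, 5.5, 6.1]

n = len(sample)
mean = statistics.mean(sample)   # point estimate of the population mean
sd = statistics.stdev(sample)    # sample SD (n - 1 denominator), estimates sigma
se_mean = sd / math.sqrt(n)      # estimated standard error of the mean

# Hypothetical binary outcomes: 1 = event observed, 0 = not observed.
outcomes = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]
p_hat = sum(outcomes) / len(outcomes)  # point estimate of the population proportion

print(f"mean = {mean:.3f}, s = {sd:.3f}, SE(mean) = {se_mean:.3f}, p = {p_hat:.2f}")
```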

INTERVAL ESTIMATION

An interval estimate consists of two numerical values defining an interval within which, with a specified degree of confidence, the unknown parameter lies.

The two values depend on the confidence level, which equals $1-\alpha$ ($\alpha$ is the probability of error).

The interval estimate may be expressed as:

Estimator ± (reliability coefficient × standard error)

| Parameter | Estimator | Standard error |
|---|---|---|
| Population mean ($\mu$) | Sample mean ($\bar{X}$) | $\sigma/\sqrt{n}$ |
| Difference between two population means ($\mu_1 - \mu_2$) | Difference between two sample means ($\bar{X}_1 - \bar{X}_2$) | $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$ |
| Population proportion ($P$) | Sample proportion ($\tilde{p}$) | $\sqrt{\tilde{p}(1-\tilde{p})/n}$ (since $P$ is unknown, and we want to estimate it) |
| Difference between two population proportions ($P_1 - P_2$) | Difference between two sample proportions ($\tilde{p}_1 - \tilde{p}_2$) | $\sqrt{\tilde{p}_1(1-\tilde{p}_1)/n_1 + \tilde{p}_2(1-\tilde{p}_2)/n_2}$ |

Reliability Coefficient

The reliability coefficient is the value of $Z_{1-\alpha/2}$ corresponding to the confidence level.

| Confidence level | $\alpha$-value | Z-value |
|---|---|---|
| 90% | 0.10 | 1.645 |
| 95% | 0.05 | 1.96 |
| 99% | 0.01 | 2.58 |
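The tabulated Z-values can be reproduced from the standard normal distribution. The sketch below uses `NormalDist` from Python's standard library; the function name `reliability_coefficient` is a hypothetical helper introduced here for illustration. The unrounded 99% value is about 2.576, which the table rounds to 2.58.

```python
from statistics import NormalDist

def reliability_coefficient(confidence_level: float) -> float:
    """Return Z_{1 - alpha/2} for a two-sided interval (hypothetical helper)."""
    alpha = 1.0 - confidence_level
    # inv_cdf gives the standard normal quantile at probability 1 - alpha/2.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: Z = {reliability_coefficient(level):.3f}")
```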

Confidence Interval

The confidence interval is central and symmetric around the sample mean, so that there is a probability of $\alpha/2$ that the parameter is above the upper limit and a probability of $\alpha/2$ that it is below the lower limit.

CI FOR POPULATION MEAN

The sample mean is an unbiased estimate of the population mean.

If the population variance is known, the CI around $\mu$ is:

$$\bar{X} - Z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

EXERCISE

The mean serum indirect bilirubin level of 16 four-day-old infants was found to be 5.98 mg/dl.

The population SD is $\sigma = 3.5$ mg/dl. Assuming normality, find the 90%, 95%, and 99% CI for $\mu$:

$$\bar{X} - Z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

$$P\left\{\bar{X} - Z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right\} = 1-\alpha$$

90% CI ($1-\alpha = 1-0.1$): $5.98 - 1.645 \times 3.5/\sqrt{16} < \mu < 5.98 + 1.645 \times 3.5/\sqrt{16}$

90% CI: $5.98 - 1.44 < \mu < 5.98 + 1.44$

90% CI: $4.54 < \mu < 7.42$

95% CI: $5.98 - 1.96 \times 3.5/\sqrt{16} < \mu < 5.98 + 1.96 \times 3.5/\sqrt{16}$

95% CI: $5.98 - 1.715 < \mu < 5.98 + 1.715$

95% CI: $4.265 < \mu < 7.695$

99% CI: $5.98 - 2.58 \times 3.5/\sqrt{16} < \mu < 5.98 + 2.58 \times 3.5/\sqrt{16}$

99% CI: $5.98 - 2.258 < \mu < 5.98 + 2.258$

99% CI: $3.72 < \mu < 8.24$
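The arithmetic in this exercise can be checked with a short script. This is a sketch under the exercise's own assumptions (normality, known $\sigma$); the function name `z_interval_mean` is hypothetical, and the Z-values are the rounded ones from the reliability coefficient table.

```python
import math

def z_interval_mean(x_bar: float, sigma: float, n: int, z: float) -> tuple[float, float]:
    """CI for a population mean with known sigma: x_bar +/- z * sigma / sqrt(n)."""
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Bilirubin exercise: x_bar = 5.98 mg/dl, sigma = 3.5 mg/dl, n = 16.
for label, z in (("90%", 1.645), ("95%", 1.96), ("99%", 2.58)):
    low, high = z_interval_mean(5.98, 3.5, 16, z)
    print(f"{label} CI: {low:.3f} < mu < {high:.3f}")
```

Rounded to two decimals, the limits agree with the hand calculation.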

CI for difference between two population means

A sample of 10 twelve-year-old boys and a sample of 10 twelve-year-old girls yielded mean heights of 59.8 inches (boys) and 58.5 inches (girls). Assuming normality, with $\sigma_1 = 2$ inches and $\sigma_2 = 3$ inches, find the 90% CI for the difference in mean height between boys and girls at this age.

$$(\bar{X}_1 - \bar{X}_2) - Z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{X}_1 - \bar{X}_2) + Z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

90% CI: $(59.8 - 58.5) - 1.645\sqrt{2^2/10 + 3^2/10} < \mu_1 - \mu_2 < (59.8 - 58.5) + 1.645\sqrt{2^2/10 + 3^2/10}$

90% CI: $1.3 - 1.88 < \mu_1 - \mu_2 < 1.3 + 1.88$

90% CI: $-0.58 < \mu_1 - \mu_2 < 3.18$
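The same calculation for the height example can be sketched in Python. The function name `z_interval_mean_difference` is hypothetical, and the code assumes, as the example does, normality and known population standard deviations.

```python
import math

def z_interval_mean_difference(x1: float, x2: float,
                               sigma1: float, sigma2: float,
                               n1: int, n2: int, z: float) -> tuple[float, float]:
    """CI for mu1 - mu2 with known sigmas (hypothetical helper)."""
    # Standard error of the difference between two independent sample means.
    se = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    diff = x1 - x2
    return diff - z * se, diff + z * se

# Height example: boys 59.8 in (sigma 2, n 10), girls 58.5 in (sigma 3, n 10).
low, high = z_interval_mean_difference(59.8, 58.5, 2, 3, 10, 10, 1.645)
print(f"90% CI: {low:.2f} < mu1 - mu2 < {high:.2f}")
```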

CI for population proportion

In a survey, 300 adults were interviewed; 123 said they had a yearly medical checkup. Find the 95% CI for the true proportion of adults having a yearly medical checkup.

$$\tilde{p} = \frac{123}{300} = 0.41$$

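$$P\left\{\tilde{p} - Z_{1-\alpha/2}\sqrt{\frac{\tilde{p}(1-\tilde{p})}{n}} < P < \tilde{p} + Z_{1-\alpha/2}\sqrt{\frac{\tilde{p}(1-\tilde{p})}{n}}\right\} = 1-\alpha$$

95% CI: $0.41 - 1.96\sqrt{0.41(1-0.41)/300} < P < 0.41 + 1.96\sqrt{0.41(1-0.41)/300}$

95% CI: $0.41 - 0.06 < P < 0.41 + 0.06$

95% CI: $0.35 < P < 0.47$

95% CI = 35% to 47%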
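A short sketch of the survey calculation follows; `z_interval_proportion` is a hypothetical helper name. Without intermediate rounding the limits come out near 0.354 and 0.466, which matches the slide's 35% to 47% once the margin is rounded to 0.06.

```python
import math

def z_interval_proportion(successes: int, n: int, z: float) -> tuple[float, float]:
    """Normal-approximation CI for a population proportion (hypothetical helper)."""
    p = successes / n                  # sample proportion
    se = math.sqrt(p * (1 - p) / n)    # standard error using the sample proportion
    return p - z * se, p + z * se

# Survey example: 123 of 300 adults reported a yearly medical checkup.
low, high = z_interval_proportion(123, 300, 1.96)
print(f"95% CI: {low:.3f} < P < {high:.3f}")
```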

CI for difference between two population proportions

200 patients suffering from a certain disease were randomly divided into two equal groups.

Of the 100 who received the NEW treatment, 90 recovered within three days. Of the other 100, who received the STANDARD treatment, 78 recovered within three days. Find the 95% CI for the difference between the proportions of recovery in the populations receiving the two treatments.

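Answer

$$\tilde{p}_1 - \tilde{p}_2 = \frac{90}{100} - \frac{78}{100} = 0.12$$

$$(\tilde{p}_1 - \tilde{p}_2) - Z_{1-\alpha/2}\sqrt{\frac{\tilde{p}_1(1-\tilde{p}_1)}{n_1} + \frac{\tilde{p}_2(1-\tilde{p}_2)}{n_2}} < P_1 - P_2 < (\tilde{p}_1 - \tilde{p}_2) + Z_{1-\alpha/2}\sqrt{\frac{\tilde{p}_1(1-\tilde{p}_1)}{n_1} + \frac{\tilde{p}_2(1-\tilde{p}_2)}{n_2}}$$

95% CI $= 0.12 \pm 1.96\sqrt{0.9(1-0.9)/100 + 0.78(1-0.78)/100}$

95% CI $= 0.12 \pm 0.10$

95% CI $= 0.02$ to $0.22$ (2% to 22%)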
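The treatment comparison can be verified with the sketch below. The helper name `z_interval_proportion_difference` is hypothetical; the formula is the one given in the answer above.

```python
import math

def z_interval_proportion_difference(x1: int, n1: int, x2: int, n2: int,
                                     z: float) -> tuple[float, float]:
    """Normal-approximation CI for P1 - P2 (hypothetical helper)."""
    p1, p2 = x1 / n1, x2 / n2
    # Standard error of the difference between two independent sample proportions.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

# NEW treatment: 90 of 100 recovered; STANDARD treatment: 78 of 100 recovered.
low, high = z_interval_proportion_difference(90, 100, 78, 100, 1.96)
print(f"95% CI: {low:.3f} < P1 - P2 < {high:.3f}")
```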

The width of the interval estimate is increased by:

Increasing confidence level (i.e.: decreasing alpha value)

Decreasing sample size
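Both effects can be seen numerically. The sketch below computes the width of a CI for a mean with known $\sigma$; the function name `interval_width` is hypothetical, and $\sigma = 3.5$ is borrowed from the bilirubin exercise purely as an illustrative value.

```python
import math
from statistics import NormalDist

def interval_width(sigma: float, n: int, confidence_level: float) -> float:
    """Width of the CI for a mean with known sigma: 2 * Z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return 2 * z * sigma / math.sqrt(n)

sigma = 3.5  # illustrative value, taken from the bilirubin exercise

# Higher confidence level (smaller alpha) gives a wider interval at fixed n.
for level in (0.90, 0.95, 0.99):
    print(f"n = 16, {level:.0%}: width = {interval_width(sigma, 16, level):.3f}")

# Smaller sample size gives a wider interval at fixed confidence level.
for n in (256, 64, 16):
    print(f"95%, n = {n}: width = {interval_width(sigma, n, 0.95):.3f}")
```

Because the width is proportional to $1/\sqrt{n}$, quadrupling the sample size halves the width.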

A confidence interval can shed light on the following:

1. The range within which the true value of the estimated parameter lies.

2. The statistical significance of a difference (in population means or proportions). If ZERO is included in the interval for such a difference (i.e., the range runs from a negative value to a positive value), then we can state that there is no statistically significant difference between the two population values (parameters) at the corresponding $\alpha$ level, although the sample values (statistics) showed a difference.

3. The sample size. A narrow interval indicates a "large" sample size, while a wide interval indicates a "small" sample size (with a fixed confidence level).
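The zero-inclusion rule in point 2 can be applied to the two worked examples on differences. This is a minimal sketch; `contains_zero` is a hypothetical helper, and the limits are the rounded values from the worked solutions.

```python
def contains_zero(low: float, high: float) -> bool:
    """True when a CI for a difference spans zero (hypothetical helper)."""
    return low < 0 < high

# 90% CI for the difference in mean height (boys minus girls): -0.58 to 3.18.
print(contains_zero(-0.58, 3.18))  # True: no significant difference at alpha = 0.10

# 95% CI for the difference in recovery proportions (new minus standard): 0.02 to 0.22.
print(contains_zero(0.02, 0.22))   # False: significant difference at alpha = 0.05
```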

EXERCISES

In a study to assess the side effects of two drugs, 50 animals were given Drug A (11 showed undesirable side effects), and 50 were given Drug B (8 showed similar side effects).

Find the 95% CI for $P_A - P_B$.

EXERCISES

In a random sample of 100 workers, the mean blood lead level was 90 ppm. The distribution of blood lead level in the worker population is normal with a standard deviation of 10 ppm.

Find the 90%, 95%, and 99% CI for the population mean.

EXERCISE

In assessing the relationship between a certain drug and a certain anomaly in chick embryos, 50 fertilized eggs were injected with the drug on the 4th day of incubation. On the 20th day the embryos were examined, and the abnormality was observed in 12 of them. Find the 90%, 95%, and 99% CI for the population proportion.

EXERCISE

The Hb level of males aged over 10 years is normally distributed with a variance of 1.462 (gm/dl)², and that of males below 10 years is also normally distributed with a variance of 0.867 (gm/dl)². Random samples of 10 older and 20 younger males showed sample means of 14.47 gm/dl and 12.64 gm/dl, respectively. Find the 90%, 95%, and 99% CI for the difference in population means.
