Chap 2


Ratio Estimation and Regression Estimation

(Chapter 4, Textbook, Barnett, V., 1991)

2.1 Estimation of a population ratio: The ratio estimator

In some situations it is useful to estimate a (positive) ratio of two population characteristics: the totals, or means, of two (positive) variables $X$ and $Y$, that is, $R = Y_T/X_T = \bar Y/\bar X$.

2020/4/15 www.uic.edu.hk/~xlpeng 1

Two obvious estimators of $R$ are:

The sample average of ratios, $\bar r = \frac{1}{n}\sum_{i=1}^{n}(y_i/x_i) = \frac{1}{n}\sum_{i=1}^{n} r_i$, which is unbiased for estimating the population mean ratio $\bar R = \frac{1}{N}\sum_{j=1}^{N} R_j = \frac{1}{N}\sum_{j=1}^{N}(Y_j/X_j)$ but biased for estimating $R$.

The ratio of the sample averages, $r = \bar y/\bar x = y_T/x_T$, which is widely used.
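A small numerical sketch may help separate the two estimators. The paired values below are made up purely for illustration (they are not taken from the slides), and the variable names are assumptions.

```python
# Hypothetical paired sample (x_i, y_i); the numbers are invented for illustration.
x = [10.0, 20.0, 40.0]
y = [12.0, 18.0, 50.0]
n = len(x)

# Estimator 1: sample average of the individual ratios r_i = y_i / x_i.
mean_of_ratios = sum(yi / xi for xi, yi in zip(x, y)) / n

# Estimator 2: ratio of the sample averages (equivalently, of the sample totals).
ratio_of_means = (sum(y) / n) / (sum(x) / n)

print(mean_of_ratios, ratio_of_means)
```

The two values differ in general because the first weights every unit equally, while the second gives more weight to units with large $x_i$.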


The bias in estimating R by r

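The bias in estimating $R$ by $r$ is the expectation of the following difference:

$$r - R = \frac{\bar y - R\bar x}{\bar x} \qquad (2.3)$$

$$= \frac{\bar y - R\bar x}{\bar X}\left[1 + \frac{\bar x - \bar X}{\bar X}\right]^{-1} = \frac{\bar y - R\bar x}{\bar X}\left[1 - \frac{\bar x - \bar X}{\bar X} + \left(\frac{\bar x - \bar X}{\bar X}\right)^2 - \cdots\right].$$

Since $E(\bar y - R\bar x) = 0$, keeping the leading terms gives

$$E(r) - R \approx -E\left[(\bar y - R\bar x)\,\frac{\bar x - \bar X}{\bar X^2}\right].$$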


Discussion about the bias
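A sketch of the leading-order bias, assuming simple random sampling without replacement with sampling fraction $f = n/N$, and writing $S_X^2$ and $S_{YX}$ for the population variance and covariance (divisor $N-1$). Because $E(\bar x - \bar X) = 0$, the expectation $E[(\bar y - R\bar x)(\bar x - \bar X)]$ equals $\operatorname{Cov}(\bar y, \bar x) - R\operatorname{Var}(\bar x)$, so

$$E(r) - R \approx -\frac{\operatorname{Cov}(\bar y,\bar x) - R\operatorname{Var}(\bar x)}{\bar X^2} = \frac{1-f}{n\bar X^2}\left(RS_X^2 - S_{YX}\right).$$

Under these assumptions the bias is of order $1/n$, and its leading term vanishes when $S_{YX} = RS_X^2$, that is, when the least-squares line of $Y$ on $X$ passes through the origin.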


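With $Z_j = Y_j - RX_j = (Y_j - \bar Y) - (RX_j - R\bar X)$ and sampling fraction $f = n/N$, the variance of $r$ is approximately

$$\operatorname{Var}(r) \approx \frac{1-f}{n\bar X^2}\,\frac{\sum_{j=1}^{N}(Y_j - RX_j)^2}{N-1} = \frac{1-f}{n\bar X^2}\left(S_Y^2 - 2RS_{YX} + R^2S_X^2\right) \qquad (2.5)$$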


A (slightly) biased estimate of the true variance parameter $\operatorname{Var}(r)$ is

$$\hat V(r) = \frac{1-f}{n\bar x^2}\,\frac{\sum_{i=1}^{n}(y_i - rx_i)^2}{n-1} = \frac{1-f}{n\bar x^2}\,s_r^2 .$$
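The variance estimate $\hat V(r) = (1-f)\,s_r^2/(n\bar x^2)$, with $s_r^2 = \sum (y_i - rx_i)^2/(n-1)$ and $f = n/N$, can be sketched in a few lines. The function name and the sample values in the usage line are hypothetical.

```python
def ratio_variance_estimate(x, y, N):
    """Return (r, v_hat) for a simple random sample of pairs (x_i, y_i).

    N is the population size; the function name is an illustrative assumption.
    """
    n = len(x)
    f = n / N                      # sampling fraction
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    r = y_bar / x_bar              # ratio of the sample averages
    # s_r^2: sum of squared residuals about the line y = r x, divided by n - 1
    s_r2 = sum((yi - r * xi) ** 2 for xi, yi in zip(x, y)) / (n - 1)
    v_hat = (1 - f) * s_r2 / (n * x_bar ** 2)
    return r, v_hat


# Made-up sample of n = 3 units from a population of N = 30.
print(ratio_variance_estimate([10.0, 20.0, 40.0], [12.0, 18.0, 50.0], N=30))
```

When the sample points lie exactly on a line through the origin, every residual $y_i - rx_i$ is zero and the estimate is zero.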

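For large $n$, an approximate $100(1-\alpha)\%$ (symmetric) two-sided confidence interval for the population ratio $R$ is:

$$r - z_{\alpha/2}\sqrt{\hat V(r)} < R < r + z_{\alpha/2}\sqrt{\hat V(r)}$$

where $z_{\alpha/2}$ is the upper $\alpha/2$ point of the standard normal distribution.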
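As a sketch, the large-sample interval $r \pm z\sqrt{\hat V(r)}$ can be computed from an estimate `r` and its estimated variance `v_hat`. Here the critical value is taken as the upper $\alpha/2$ point of the standard normal distribution; the function name and the numbers in the usage line are assumptions for illustration.

```python
import math
from statistics import NormalDist


def ratio_confidence_interval(r, v_hat, alpha=0.05):
    """Approximate two-sided 100(1 - alpha)% interval for R (large n)."""
    # Upper alpha/2 point of N(0, 1); about 1.96 when alpha = 0.05.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * math.sqrt(v_hat)
    return r - half_width, r + half_width


# Hypothetical inputs: r = 1.05 with estimated variance 0.0004.
print(ratio_confidence_interval(1.05, 0.0004))
```

The normal approximation is only as good as the large-$n$ assumption behind it, so for small samples the interval should be read with caution.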

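And the required sample size for estimating $R$ to within $\pm d$ is obtained by setting $\hat V(r) = d^2/z_{\alpha/2}^2$ and solving for $n$:

$$n = \frac{Ns_r^2}{NV + s_r^2}, \qquad V = d^2\bar x^2/z_{\alpha/2}^2 \approx d^2\bar x^2/4,$$

where the approximation takes $z_{\alpha/2} \approx 2$ (95% confidence).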
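A minimal sketch of the sample size calculation, assuming the requirement is $(1-f)\,s_r^2/(n\bar x^2) = (d/z)^2$. Solving for $n$ gives $n = Ns_r^2/(NV + s_r^2)$ with $V = d^2\bar x^2/z^2$. The function name, the default $z = 2$, and the inputs in the usage line are illustrative assumptions; $s_r^2$ and $\bar x$ would normally come from a pilot sample or a previous survey.

```python
import math


def ratio_sample_size(s_r2, x_bar, N, d, z=2.0):
    """Smallest integer n with (1 - n/N) * s_r2 / (n * x_bar**2) <= (d / z)**2."""
    V = (d * x_bar / z) ** 2           # V = d^2 * x_bar^2 / z^2
    n = N * s_r2 / (N * V + s_r2)      # solution of the variance requirement
    return math.ceil(n)                # round up to a whole number of units


# Hypothetical planning values: s_r^2 = 100, x_bar = 10, N = 1000, d = 0.1.
print(ratio_sample_size(100.0, 10.0, 1000, 0.1))
```

Rounding up rather than to the nearest integer keeps the achieved variance at or below the target.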


2.2 Ratio estimation of a population mean or total

$$\bar y_R = r\bar X = (\bar y/\bar x)\,\bar X$$

$$y_{TR} = rX_T = (\bar y/\bar x)\,X_T = N\bar y_R$$
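The ratio estimators of the mean and total scale the sample ratio $r = \bar y/\bar x$ by the known population mean $\bar X$ (or total $X_T = N\bar X$). A sketch, with a hypothetical function name and invented data:

```python
def ratio_estimates(x, y, X_bar, N):
    """Return (r, y_bar_R, y_TR) given a sample and the known population mean of X."""
    r = sum(y) / sum(x)        # ratio of sample totals = ratio of sample means
    y_bar_R = r * X_bar        # ratio estimate of the population mean of Y
    y_TR = N * y_bar_R         # ratio estimate of the population total of Y
    return r, y_bar_R, y_TR


# Made-up sample, with known X_bar = 25 in a population of N = 100 units.
print(ratio_estimates([10.0, 20.0, 40.0], [12.0, 18.0, 50.0], X_bar=25.0, N=100))
```

The auxiliary variable enters only through $\bar X$, so the approach requires the population mean or total of $X$ to be known in advance.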


Variance of ratio estimator
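Since $\bar y_R = r\bar X$ and $y_{TR} = N\bar y_R$, with $\bar X$ and $N$ known constants, the variances follow from $\operatorname{Var}(r)$ by scaling. A sketch, using the large-sample approximation for $\operatorname{Var}(r)$ and $f = n/N$:

$$\operatorname{Var}(\bar y_R) = \bar X^2\operatorname{Var}(r) \approx \frac{1-f}{n}\left(S_Y^2 - 2RS_{YX} + R^2S_X^2\right), \qquad \operatorname{Var}(y_{TR}) = N^2\operatorname{Var}(\bar y_R).$$

A natural estimate replaces the bracketed term by $s_r^2 = \sum_{i=1}^{n}(y_i - rx_i)^2/(n-1)$.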


Example: (Food additive)

A researcher was investigating a new food additive for cattle. Midway through the two-month study, he was interested in estimating the average weight for the entire herd of N = 500 steers. A simple random sample of n = 12 steers was selected from the herd and weighed.

These data and prestudy weights are presented in the accompanying table for all cattle sampled. Assume the prestudy average weight of the herd is $\bar X = 880$ pounds.

Estimate the ratio of present weight to prestudy weight of the herd, and provide an estimate of the standard error for your answer.

Which points have greatest influence on the estimate?


Solution:

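The estimate of the ratio $R$ of the present weight to prestudy weight for the herd is $r = \bar y/\bar x$, with estimated variance and standard error:

$$\operatorname{Var}(r) \approx \frac{1-f}{n\bar X^2}\,S_r^2 = \frac{1}{880^2}\left(1 - \frac{12}{500}\right)\frac{8{,}848.646}{12} = 0.000929$$

$$\operatorname{se}(r) = \sqrt{0.000929} = 0.030485$$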
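The arithmetic in this solution can be checked directly. The sketch below takes the slide's figures at face value ($N = 500$, $n = 12$, $\bar X = 880$, $S_r^2 = 8{,}848.646$); the variable names are assumptions.

```python
import math

N, n = 500, 12          # herd size and sample size
X_bar = 880.0           # prestudy average weight (pounds)
s_r2 = 8848.646         # residual variance S_r^2 as quoted in the solution

# Var(r) = (1 - f) * S_r^2 / (n * X_bar^2), with f = n / N
var_r = (1 - n / N) * s_r2 / (n * X_bar ** 2)
se_r = math.sqrt(var_r)

print(round(var_r, 6), round(se_r, 6))
```

Carrying the unrounded variance into the square root is what produces the quoted standard error of 0.030485; taking the square root of the rounded value 0.000929 would give a slightly smaller figure.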


Sample size determination

Hence we should sample 94 steers to estimate $R$, the change in weight of the herd after the study, with an error bound of 1%.


Example: (Sugar content)

In a study to estimate the total sugar content of a truckload of oranges, an SRS of n = 10 oranges was juiced and weighed. The total weight of all the oranges, obtained by first weighing the truck loaded and then unloaded, was found to be 1800 pounds. Estimate Y, the total sugar content of the oranges, and provide the standard error of the estimate.


Solution:

The scatter plot shows a strong positive association between sugar content and weight, making the ratio estimator a reasonable choice.

An estimate of Y is


Example: (Promotional campaign)

An advertising firm is concerned about the effect of a new regional promotional campaign on the total dollar sales for a particular product. An SRS of n = 20 stores is drawn from the N = 452 regional stores in which the product is sold.

Quarterly sales data are obtained for the current three-month period and the three-month period prior to the new campaign. The total pre-campaign sales for all stores is X = 260,256. Check the scatter plot to see if these stores are in two different size groups.


Solution:

(a) Without using the auxiliary information, the estimate of the average current three-month sales using the ordinary estimator is the sample mean $\bar y$.


(b) When the total pre-campaign three-month sales is known to be X = 260,256, the average pre-campaign three-month sales is $\bar X = X/N = 260{,}256/452 \approx 575.79$.

Then the estimate of the average current three-month sales using the ratio estimator is $\bar y_R = r\bar X$, which represents an average increase of 7.1% in current three-month sales over pre-campaign three-month sales.


The ratio estimator here is much better than the ordinary estimator, since the current three-month sales $y_i$ are closely and positively related to the pre-campaign three-month sales $x_i$, with correlation coefficient 0.9986.


This examines when the variance in (2.10) is smaller or larger than that in (1.9).
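Assuming (2.10) denotes the approximate variance of the ratio estimator $\bar y_R$ and (1.9) the variance of the sample mean $\bar y$ (the equation labels are not restated here, so this mapping is an assumption), the comparison can be sketched for $R > 0$ as:

$$\operatorname{Var}(\bar y_R) - \operatorname{Var}(\bar y) \approx \frac{1-f}{n}\left(R^2S_X^2 - 2RS_{YX}\right) < 0 \iff \rho_{YX} > \frac{RS_X}{2S_Y} = \frac{C_X}{2C_Y},$$

where $C_X = S_X/\bar X$ and $C_Y = S_Y/\bar Y$ are the coefficients of variation. In words, the ratio estimator gains over the sample mean only when the correlation between $X$ and $Y$ is sufficiently strong and positive.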


2.3 Regression estimation

Condition (2.15.1) demands that $X$ and $Y$ be linearly related. If the linear relationship does not pass through the origin, this suggests considering an alternative estimator, known as the regression estimator.


An ideal (perfect) linear relationship is

$$Y_j = \bar Y + b(X_j - \bar X)$$

A practicable simple linear regression model is

$$Y_j = \bar Y + b(X_j - \bar X) + E_j \qquad (2.18)$$


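Consider the average (mean) of either (2.16) or (2.17) over the sample; this suggests the regression estimator of $\bar Y$:

$$\bar y_L = \bar y + b(\bar X - \bar x) \qquad (2.19)$$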


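$$\operatorname{Var}(\bar y_L) = E[(\bar y_L - \bar Y)^2] = E\{[(\bar y - \bar Y) - b(\bar x - \bar X)]^2\} = \frac{1-f}{n}\left(S_Y^2 - 2bS_{YX} + b^2S_X^2\right) \qquad (2.20)$$

For every $b$ this is at least $\frac{1-f}{n}\,S_Y^2\left(1-\rho_{YX}^2\right)$, which in turn is no larger than $\frac{1-f}{n}\,S_Y^2 = \operatorname{Var}(\bar y)$.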


$$\hat V(\bar y_L) = \frac{1-f}{n}\left(s_y^2 - 2bs_{yx} + b^2s_x^2\right) \qquad (2.21)$$


From (2.20),

$$\min_b\{\operatorname{Var}(\bar y_L)\} = \min_b \frac{1-f}{n}\left(S_Y^2 - 2bS_{YX} + b^2S_X^2\right) = \frac{1-f}{n}\,S_Y^2\left(1-\rho_{YX}^2\right)$$

The minimum is obtained with $b = b_{\min} = S_{YX}/S_X^2 = \rho_{YX}S_Y/S_X$.

Thus the most efficient regression estimator of $\bar Y$ is

$$\bar y_L = \bar y + \rho_{YX}(S_Y/S_X)(\bar X - \bar x)$$


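The optimal value of $b$ in (2.22) suggests the obvious estimate:

$$\hat b\ (= \hat b_{\min}) = \frac{s_{yx}}{s_x^2} = \frac{\sum_{i=1}^{n}(y_i - \bar y)(x_i - \bar x)}{\sum_{i=1}^{n}(x_i - \bar x)^2} \qquad (2.24)$$

and the regression estimator

$$\bar y_L = \bar y + \hat b(\bar X - \bar x) \qquad (2.25)$$

which enjoys the following asymptotic properties:

$$E(\bar y_L) = \bar Y + O(n^{-1})$$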
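The regression estimator with an estimated slope, $\bar y_L = \bar y + \hat b(\bar X - \bar x)$ with $\hat b = s_{yx}/s_x^2$, can be sketched as follows. The function name and the toy data are assumptions for illustration only.

```python
def regression_estimate(x, y, X_bar):
    """Return (b_hat, y_bar_L) for a simple random sample and known X_bar."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sample covariance and variance (divisor n - 1; it cancels in the ratio).
    s_yx = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y)) / (n - 1)
    s_x2 = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    b_hat = s_yx / s_x2
    # Adjust the sample mean of y for the gap between X_bar and x_bar.
    y_bar_L = y_bar + b_hat * (X_bar - x_bar)
    return b_hat, y_bar_L


# Toy data lying exactly on y = 1 + 2x, with known X_bar = 4.
print(regression_estimate([1.0, 2.0, 3.0], [3.0, 5.0, 7.0], X_bar=4.0))
```

With data lying exactly on the line $y = 1 + 2x$, the estimate recovers the value of the line at $\bar X$, which is the behaviour the ideal linear relationship describes.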


Asymptotic properties:

$$\operatorname{Var}(\bar y_L) = \frac{1-f}{n}\left(S_Y^2 - S_{YX}^2/S_X^2\right) + O(n^{-3/2})$$

$$\hat V(\bar y_L) = \frac{1-f}{n}\left(s_y^2 - \hat b\,s_{yx}\right) \qquad (2.27)$$
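A sketch of the variance estimate $\hat V(\bar y_L) = \frac{1-f}{n}(s_y^2 - \hat b\,s_{yx})$ with $\hat b = s_{yx}/s_x^2$ and $f = n/N$. The function name and the sample values are hypothetical.

```python
def regression_variance_estimate(x, y, N):
    """Estimated variance of the regression estimator for an SRS of size n from N."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_x2 = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    s_y2 = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
    s_yx = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y)) / (n - 1)
    b_hat = s_yx / s_x2
    # s_y^2 - b_hat * s_yx is the part of s_y^2 not explained by the fitted line.
    return (1 - n / N) / n * (s_y2 - b_hat * s_yx)


# Made-up sample of n = 4 units from a population of N = 40.
print(regression_variance_estimate([1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0], N=40))
```

Because $\hat b\,s_{yx} = s_{yx}^2/s_x^2$, the bracketed term equals $s_y^2(1 - \hat\rho^2)$, where $\hat\rho$ is the sample correlation, so the estimate shrinks as the sample correlation strengthens.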


2.4 Comparison of ratio and regression estimators


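$$\operatorname{Var}(\bar y_R) - \operatorname{Var}(\bar y_L) \approx \frac{1-f}{n}\left(R^2S_X^2 - 2R\rho_{YX}S_YS_X + \rho_{YX}^2S_Y^2\right) = \frac{1-f}{n}\left(RS_X - \rho_{YX}S_Y\right)^2 \ge 0$$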
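The identity $\operatorname{Var}(\bar y_R) - \operatorname{Var}(\bar y_L) \approx \frac{1-f}{n}(RS_X - \rho_{YX}S_Y)^2$ can be checked numerically on a small population. The sketch uses the large-sample variance approximations with the optimal slope for the regression estimator; the population values and function name are invented for illustration.

```python
import math


def compare_variances(X, Y, n):
    """Return (var_R, var_L, gap) using the large-sample approximations."""
    N = len(X)
    f = n / N
    X_bar = sum(X) / N
    Y_bar = sum(Y) / N
    S_X2 = sum((x - X_bar) ** 2 for x in X) / (N - 1)
    S_Y2 = sum((y - Y_bar) ** 2 for y in Y) / (N - 1)
    S_YX = sum((y - Y_bar) * (x - X_bar) for x, y in zip(X, Y)) / (N - 1)
    R = Y_bar / X_bar
    rho = S_YX / math.sqrt(S_X2 * S_Y2)
    var_R = (1 - f) / n * (S_Y2 - 2 * R * S_YX + R ** 2 * S_X2)   # ratio estimator
    var_L = (1 - f) / n * S_Y2 * (1 - rho ** 2)                   # regression estimator
    gap = (1 - f) / n * (R * math.sqrt(S_X2) - rho * math.sqrt(S_Y2)) ** 2
    return var_R, var_L, gap


# Hypothetical population of N = 5 units, sample size n = 2.
var_R, var_L, gap = compare_variances([2.0, 4.0, 6.0, 8.0, 10.0],
                                      [5.0, 7.0, 12.0, 14.0, 20.0], n=2)
print(var_R, var_L, gap)
```

The gap is zero only when $R = \rho_{YX}S_Y/S_X$, that is, when the optimal regression slope equals the ratio, which is the case of a regression line through the origin.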
