Topic 26: Analysis of Covariance

advertisement

Topic 26: Analysis of

Covariance

Outline

• One-way analysis of covariance

– Data

– Model

– Inference

– Diagnostics and rememdies

• Multifactor analysis of covariance

Data for One-Way

ANCOVA

• Y ij is the j th observation on the response variable in the i th group

• X ij is the j th observation on the covariate in the i th group

• i = 1, . . . , r levels (groups) of factor

• j = 1, . . . , n i observations for level i

KNNL Example

(pg 927)

• Y is cases of crackers sold during promotion period

• Factor is the type of promotion (r=3)

– Customers sample crackers in store

– Additional shelf space

– Special display shelves

• n i

=5 different stores per type

• The covariate X is the number of cases of crackers sold at the store in the preceding period

Data data a1; infile 'c:\...\CH22TA01.DAT'; input cases last trt store; proc print data=a1; run;

Output

Obs cases last trt store

1 38 21 1 1

2 39 26 1 2

3 36 22 1 3

4 45 28 1 4

5 33 19 1 5

6 43 34 2 1

7 38 26 2 2

Output

Obs cases last trt store

8 38 29 2 3

9 27 18 2 4

10 34 25 2 5

11 24 23 3 1

12 32 29 3 2

13 31 30 3 3

14 21 16 3 4

15 28 29 3 5

Plot the data title1 'Plot of the data'; symbol1 v='1' i=none c=black; symbol2 v='2' i=none c=black; symbol3 v='3' i=none c=black; proc gplot data=a1; plot cases*last=trt/frame; run;

Background

• Covariates are sometimes called

concomitant variables

• Covariates should be related to the response variable

• Covariates should not be affected by the treatment variable (factor)

• Often they are some kind of baseline or pretest value

Basic ideas

• A covariate can reduce the MSE, thereby increasing power

• A covariate can adjust for differences in characteristics of subjects in the treatment groups

• We assume that the covariate will be linearly related to the response and that the relationship will be the same for all levels of the factor. Similar to comparing regression lines.

Cell Means Model for one-way ancova

• Y

  ij

– the

 ij i

 are iid N(0, σ 2 )

(

X ij

X

..

)

  ij

– Y ij i

( X ij

X

..

) σ 2 ) and indep

• For each i, we have a simple linear regression

• The slopes are the same

• The intercepts can be different

Plot of the data with lines title1 'Plot of the data with lines'; symbol1 v='1' i=rl c=black; symbol2 v='2' i=rl c=black; symbol3 v='3' i=rl c=black; proc gplot data=a1; plot cases*last=trt/frame; run;

Parameters

• The parameters of the model are

– μ i

– β for i = 1 to r

– σ 2

Estimates

• We use multiple regression methods to estimate the μ i and β

• We use the residuals from the model to estimate σ 2

• The estimate is s 2 (equal to the MSE)

Factor Effects Model for one way anova

• Y

  ij

– the

 ij

  i

 

( X ij

X are iid N(0, σ 2 )

..

)

  ij

• The usual constraints are



= 0

• Proc glm sets α r

= 0

Interpretation of model

• Expected value of a Y with level i and X ij

=x is

   i

 

( X

X

..

)

• Expected value of a Y with level i ´ and X ij

=x is

  

• The difference is

 i

 i

 

-

 i ´

( X

X

..

)

• Note that this difference does not depend on the value of x (due to assumption of constant slopes)

Proc glm proc glm data=a1; class trt; model cases=last trt; run;

Model output

Source DF MS F P

Model 3 202.60 57.78 .0001

Error 11 3.50

Total 14

Anova table output

Type I

Source DF SS MS F P last 1 190 190 54.38 <.0001

trt 2 417 208 59.48 <.0001

Type III

Source DF SS MS F P last 1 269 269 76.72 <.0001

trt 2 417 208 59.48 <.0001

Type I vs Type III

• Because X for covariate is not orthogonal (like indep argument before) to X’s for factor

– Order that the X are fit makes a difference. Want to compare means after adjusting for covariate

– General rule to use Type III SS when Type I and III SS differ

Parameter Estimates proc glm data=a1; class trt; model cases=last trt

/solution; run;

Output

Par Est SE t P

Int 4.37 B 2.73 1.60 0.1381

last 0.89 0.10 8.76 <.0001

trt1 12.9 B 1.20 10.76 <.0001

trt2 7.9 B 1.18 6.65 <.0001

trt3 0.0 B

Common slope is 0.89

Interpretation

• Expected value of Y with level i of factor

A and X=x is

 ˆ   ˆ

   i

 

( X

X

..

) i when X is equal to the average covariate value

• This is usually the level of X where the trt means are calculated and compared

• Need to make sure this level of X is reasonable for each level of the factor

LSMEANS

• The L(least)S(square) means can be used to obtain these estimates

– All other categorical values are set at an equal mix for all levels (I.e., average over the other factors)

– All continuous values are set at the overall mean

• These are similar to subpopulation mean estimates

Intepretation for KNNL example

• Y is cases of crackers sold under a particular promotion scenario

• X is the cases of crackers sold during the last period

• The LSMEANS are the estimated cases of crackers that would be sold for a store with the ave number of crackers sold during the last period

LSMEANS Statement proc glm data=a1; class trt; model cases=last trt; lsmeans trt/ stderr tdiff pdiff cl; run;

Output treat LSMEAN SE P

1 39.8 0.8 <.0001

2 34.7 0.8 <.0001

3 26.8 0.8 <.0001

Output

L east Squares Means for Effect treat t for H0: LSMean(i)=LSMean(j) / Pr > |t|

Dependent Variable: cases i/j 1 2 3

1 4.129808 10.76359

0.0017 <.0001

2 -4.12981 6.646871

0.0017 <.0001

3 -10.7636 -6.64687

<.0001 <.0001

Output treat LSMEAN 95% CL

1 39.8 37.9 41.7

2 34.7 32.8 36.6

3 26.8 24.9 28.6

Output

Difference

Between 95% CL for i j Means LSM(i)-LSM(j)

1 2 5.0 2.3 7.7

1 3 12.9 10.3 15.6

2 3 7.9 5.2 10.5

Prep data for plot title1 'Plot of the data with the model'; proc glm data=a1; class trt; model cases=last trt; output out=a2 p=pred;

Prep data for plot data a3; set a2; drop cases pred; if trt eq 1 then do cases1=cases; pred1=pred; output; end;

Prep data for plot if treat eq 2 then do cases2=cases; pred2=pred; output; end; if treat eq 3 then do cases3=cases; pred3=pred; output; end; proc print data=a3; run;

Code for plot symbol1 v='1' i=none c=black; symbol2 v='2' i=none c=black; symbol3 v='3' i=none c=black; symbol4 v=none i=rl c=black; symbol5 v=none i=rl c=black; symbol6 v=none i=rl c=black; proc gplot data=a3; plot (cases1 cases2 cases3 pred1 pred2 pred3)

*last/frame overlay; run;

Check for equality of slopes title1 'Check for equal slopes'; proc glm data=a1; class trt; model cases=last trt last*trt; run;

Output

Type I

Source DF F P last 1 54.44 <.0001

treat 2 59.55 <.0001

last*treat 2 1.01 0.4032

Type III

Source DF F P last 1 69.42 <.0001

treat 2 0.18 0.8379

last*treat 2 1.01 0.4032

Diagnostics and remedies

• Examine the data and residuals

• Look for outliers that are influential

• Transform if needed, consider Box-Cox

• Examine variances (standard deviations)

– Use a BY statement and look at the

MSE

Last slide

• Read KNNL Ch 22

• We used topic26.sas to generate the output for today

Download