Lecture 10


Single-Factor ANOVA

We skip Sec 10.3

Single-factor ANOVA focuses on a comparison of more than two population or treatment means. Let I = the number of populations or treatments being compared, and define

$\mu_1$ = the mean of population 1 or the true average response when treatment 1 is applied

⋮

$\mu_I$ = the mean of population I or the true average response when treatment I is applied

Single-Factor ANOVA

The relevant hypotheses are

$$H_0\!: \mu_1 = \mu_2 = \cdots = \mu_I \quad \text{versus} \quad H_a\!: \text{at least two of the } \mu_i\text{'s are different}$$

If I = 4, $H_0$ is true only if all four $\mu_i$'s are identical. $H_a$ would be true, for example, if $\mu_1 = \mu_2 \neq \mu_3 = \mu_4$, if $\mu_1 = \mu_3 = \mu_4 \neq \mu_2$, or if all four $\mu_i$'s differ from one another.

Notation and Assumptions

Let

$X_{i,j}$ = the random variable (rv) that denotes the jth measurement taken from the ith population, or the measurement taken on the jth experimental unit that receives the ith treatment

$x_{i,j}$ = the observed value of $X_{i,j}$ when the experiment is performed

Notation and Assumptions

Here we'll focus on the case of equal sample sizes; let J denote the number of observations in each sample (J = 6 in Example 1). The data set consists of IJ observations.

The individual sample means will be denoted by $\bar{X}_{1\cdot}, \bar{X}_{2\cdot}, \ldots, \bar{X}_{I\cdot}$. That is,

$$\bar{X}_{i\cdot} = \frac{\sum_{j=1}^{J} X_{ij}}{J} \qquad i = 1, 2, \ldots, I$$

Notation and Assumptions

The dot in place of the second subscript signifies that we have added over all values of that subscript while holding the other subscript value fixed, and the horizontal bar indicates division by J to obtain an average.

Similarly, the average of all IJ observations, called the grand mean, is

$$\bar{X}_{\cdot\cdot} = \frac{\sum_{i=1}^{I}\sum_{j=1}^{J} X_{ij}}{IJ}$$
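For concreteness, here is a small NumPy sketch of these averages; the 3 × 4 data array is invented for illustration.

```python
import numpy as np

# Made-up data: I = 3 treatments (rows), J = 4 observations each (columns).
x = np.array([
    [10.2,  9.8, 10.5, 10.1],
    [11.0, 11.4, 10.9, 11.2],
    [ 9.5,  9.9,  9.7, 10.0],
])

row_means = x.mean(axis=1)   # the sample means, x-bar_{i.}
grand_mean = x.mean()        # the grand mean, x-bar_{..}
print(row_means, grand_mean)
```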

Notation and Assumptions

Assumptions

The I population or treatment distributions are all normal with the same variance $\sigma^2$. That is, each $X_{ij}$ is normally distributed with

$$E(X_{ij}) = \mu_i \qquad V(X_{ij}) = \sigma^2$$

The I sample standard deviations will generally differ somewhat even when the corresponding $\sigma$'s are identical.
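A brief simulation sketch of this model: each $X_{ij}$ is drawn from a normal distribution with mean $\mu_i$ and common standard deviation $\sigma$. The means, $\sigma$, and sample size below are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

mu = [50.0, 50.0, 50.0, 50.0]   # H0 true: all treatment means equal
sigma = 2.0                      # common standard deviation
J = 6                            # observations per treatment

# Each X_ij ~ Normal(mu_i, sigma^2), independent across cells.
x = np.array([rng.normal(m, sigma, size=J) for m in mu])

# The sample standard deviations differ somewhat even though the true
# sigma is the same in every population.
print(x.std(axis=1, ddof=1))
```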

The Test Statistic

Definition

Mean square for treatments is given by

$$\mathrm{MSTr} = \frac{J}{I-1}\left[\left(\bar{X}_{1\cdot} - \bar{X}_{\cdot\cdot}\right)^2 + \cdots + \left(\bar{X}_{I\cdot} - \bar{X}_{\cdot\cdot}\right)^2\right] = \frac{J}{I-1}\sum_{i=1}^{I}\left(\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot}\right)^2$$

and mean square for error is

$$\mathrm{MSE} = \frac{S_1^2 + S_2^2 + \cdots + S_I^2}{I}$$

The test statistic for single-factor ANOVA is F = MSTr/MSE.
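These definitions translate directly into code. The sketch below assumes the data arrive as an I × J NumPy array with one row per treatment; `anova_f` is a hypothetical helper name, not from the text.

```python
import numpy as np

def anova_f(x):
    """Single-factor ANOVA F statistic for an I x J data array
    (equal sample sizes), following the definitions above."""
    I, J = x.shape
    row_means = x.mean(axis=1)
    grand_mean = x.mean()
    # MSTr: J/(I-1) times the sum of squared deviations of row means.
    mstr = J * np.sum((row_means - grand_mean) ** 2) / (I - 1)
    # MSE: the average of the I sample variances.
    mse = np.mean(x.var(axis=1, ddof=1))
    return mstr / mse
```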

The Test Statistic

The terminology “mean square” will be explained shortly.

Notice that uppercase X's and S²'s are used, so MSTr and MSE are defined as statistics.

We will follow tradition and also use MSTr and MSE

(rather than mstr and mse) to denote the calculated values of these statistics.

Each $S_i^2$ assesses variation within a particular sample, so MSE is a measure of within-samples variation.

The Test Statistic

Proposition

When $H_0$ is true,

$$E(\mathrm{MSTr}) = E(\mathrm{MSE}) = \sigma^2$$

whereas when $H_0$ is false,

$$E(\mathrm{MSTr}) > E(\mathrm{MSE}) = \sigma^2$$

That is, both statistics are unbiased for estimating the common population variance $\sigma^2$ when $H_0$ is true, but MSTr tends to overestimate $\sigma^2$ when $H_0$ is false.
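A rough Monte Carlo illustration of the proposition (all numbers invented): averaging MSTr and MSE over many simulated data sets should give values near $\sigma^2 = 4$ under $H_0$, with MSTr inflated under $H_a$.

```python
import numpy as np

rng = np.random.default_rng(1)

def avg_mstr_mse(mu, sigma=2.0, J=6, reps=10_000):
    """Average MSTr and MSE over many simulated I x J data sets."""
    I = len(mu)
    mstrs, mses = [], []
    for _ in range(reps):
        x = np.array([rng.normal(m, sigma, size=J) for m in mu])
        rm, gm = x.mean(axis=1), x.mean()
        mstrs.append(J * np.sum((rm - gm) ** 2) / (I - 1))
        mses.append(np.mean(x.var(axis=1, ddof=1)))
    return np.mean(mstrs), np.mean(mses)

print(avg_mstr_mse([50, 50, 50, 50]))   # H0 true:  both near sigma^2 = 4
print(avg_mstr_mse([50, 50, 52, 54]))   # H0 false: E(MSTr) exceeds 4
```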

F Distributions and the F Test

Theorem

Let F = MSTr/MSE be the test statistic in a single-factor ANOVA problem involving I populations or treatments with a random sample of J observations from each one. When $H_0$ is true and the basic assumptions of this section are satisfied, F has an F distribution with $\nu_1 = I - 1$ and $\nu_2 = I(J - 1)$.

With f denoting the computed value of F, the rejection region $f \geq F_{\alpha,\, I-1,\, I(J-1)}$ then specifies a test with significance level $\alpha$.
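In practice the critical value $F_{\alpha,\nu_1,\nu_2}$ and the P-value come from software. A minimal sketch with SciPy's F distribution, assuming I = 4, J = 6, $\alpha$ = 0.05, and a made-up computed value f:

```python
from scipy.stats import f  # SciPy's F distribution

I, J, alpha = 4, 6, 0.05
nu1, nu2 = I - 1, I * (J - 1)

f_crit = f.ppf(1 - alpha, nu1, nu2)   # F_{alpha, nu1, nu2}
f_obs = 5.12                          # hypothetical computed value of F
p_value = f.sf(f_obs, nu1, nu2)       # upper-tail area beyond f_obs

print(f_crit, f_obs >= f_crit, p_value)
```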

F Distributions and the F Test

The rationale for $\nu_1 = I - 1$ is that although MSTr is based on the I deviations $\bar{X}_{1\cdot} - \bar{X}_{\cdot\cdot}, \ldots, \bar{X}_{I\cdot} - \bar{X}_{\cdot\cdot}$, we have $\sum_{i}(\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot}) = 0$, so only I – 1 of these are freely determined.

Because each sample contributes J – 1 df to MSE and these samples are independent, $\nu_2 = (J - 1) + \cdots + (J - 1) = I(J - 1)$.

Sums of Squares

The introduction of sums of squares facilitates developing an intuitive appreciation for the rationale underlying single-factor and multifactor ANOVAs.

Let $x_{i\cdot}$ represent the sum (not the average, since there is no bar) of the $x_{ij}$'s for i fixed (the sum of the numbers in the ith row of the table) and let $x_{\cdot\cdot}$ denote the sum of all the $x_{ij}$'s (the grand total).

Sums of Squares

Definition

The total sum of squares (SST), treatment sum of squares (SSTr), and error sum of squares (SSE) are given by

$$\mathrm{SST} = \sum_{i=1}^{I}\sum_{j=1}^{J}\left(x_{ij} - \bar{x}_{\cdot\cdot}\right)^2 \qquad \text{df} = IJ - 1$$

$$\mathrm{SSTr} = \sum_{i=1}^{I}\sum_{j=1}^{J}\left(\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot}\right)^2 = J\sum_{i=1}^{I}\left(\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot}\right)^2 \qquad \text{df} = I - 1$$

$$\mathrm{SSE} = \sum_{i=1}^{I}\sum_{j=1}^{J}\left(x_{ij} - \bar{x}_{i\cdot}\right)^2 \qquad \text{df} = I(J - 1)$$

Sums of Squares

Fundamental Identity

SST = SSTr + SSE  (10.1)

Thus if any two of the sums of squares are computed, the third can be obtained through (10.1); SST and SSTr are easiest to compute, and then SSE = SST – SSTr. The proof follows from squaring both sides of the relationship

$$x_{ij} - \bar{x}_{\cdot\cdot} = \left(x_{ij} - \bar{x}_{i\cdot}\right) + \left(\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot}\right) \qquad (10.2)$$

and summing over all i and j. (The cross-product term vanishes because $\sum_{j=1}^{J}(x_{ij} - \bar{x}_{i\cdot}) = 0$ for each i.)
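A quick numerical check of (10.1) on the invented 3 × 4 array used earlier:

```python
import numpy as np

x = np.array([
    [10.2,  9.8, 10.5, 10.1],
    [11.0, 11.4, 10.9, 11.2],
    [ 9.5,  9.9,  9.7, 10.0],
])
I, J = x.shape
row_means = x.mean(axis=1, keepdims=True)
grand_mean = x.mean()

sst = np.sum((x - grand_mean) ** 2)                      # total
sstr = J * np.sum((row_means.ravel() - grand_mean) ** 2)  # treatments
sse = np.sum((x - row_means) ** 2)                        # error

print(np.isclose(sst, sstr + sse))   # True: SST = SSTr + SSE
```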

Sums of Squares

Once SSTr and SSE are computed, each is divided by its associated df to obtain a mean square (mean in the sense of average). Then F is the ratio of the two mean squares:

$$\mathrm{MSTr} = \frac{\mathrm{SSTr}}{I - 1} \qquad \mathrm{MSE} = \frac{\mathrm{SSE}}{I(J - 1)} \qquad F = \frac{\mathrm{MSTr}}{\mathrm{MSE}} \qquad (10.3)$$

Sums of Squares

The computations are often summarized in a tabular format, called an ANOVA table, as displayed in Table 10.2.

Tables produced by statistical software customarily include a P-value column to the right of f.

Table 10.2  An ANOVA Table

Source of Variation    df          Sum of Squares    Mean Square             f
Treatments             I – 1       SSTr              MSTr = SSTr/(I – 1)     MSTr/MSE
Error                  I(J – 1)    SSE               MSE = SSE/[I(J – 1)]
Total                  IJ – 1      SST
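Statistical software will produce f and its P-value directly. For instance, SciPy's `f_oneway` carries out the single-factor ANOVA F test straight from the samples (data invented):

```python
from scipy.stats import f_oneway

sample1 = [10.2,  9.8, 10.5, 10.1]
sample2 = [11.0, 11.4, 10.9, 11.2]
sample3 = [ 9.5,  9.9,  9.7, 10.0]

result = f_oneway(sample1, sample2, sample3)
print(result.statistic, result.pvalue)   # f and its P-value
```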

Multiple Comparisons in ANOVA

When the computed value of the F statistic in single-factor ANOVA is not significant, the analysis is terminated because no differences among the $\mu_i$'s have been identified.

But when $H_0$ is rejected, the investigator will usually want to know which of the $\mu_i$'s are different from one another. A method for carrying out this further analysis is called a multiple comparisons procedure.

Multiple Comparisons in ANOVA

Several of the most frequently used procedures are based on the following central idea. First calculate a confidence interval for each pairwise difference $\mu_i - \mu_j$ with i < j. Thus if I = 4, the six required CIs would be for $\mu_1 - \mu_2$ (but not also for $\mu_2 - \mu_1$), $\mu_1 - \mu_3$, $\mu_1 - \mu_4$, $\mu_2 - \mu_3$, $\mu_2 - \mu_4$, and $\mu_3 - \mu_4$.

Then if the interval for $\mu_1 - \mu_2$ does not include 0, conclude that $\mu_1$ and $\mu_2$ differ significantly from one another; if the interval does include 0, the two $\mu$'s are judged not significantly different.
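One widely used procedure of this kind is Tukey's method; as a sketch, SciPy's `tukey_hsd` returns the simultaneous pairwise CIs described above (samples invented):

```python
from scipy.stats import tukey_hsd

sample1 = [10.2,  9.8, 10.5, 10.1]
sample2 = [11.0, 11.4, 10.9, 11.2]
sample3 = [ 9.5,  9.9,  9.7, 10.0]

res = tukey_hsd(sample1, sample2, sample3)
ci = res.confidence_interval(confidence_level=0.95)

# Each (i, j) interval that excludes 0 flags mu_i and mu_j as
# significantly different.
print(ci.low)
print(ci.high)
```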
