
Combined Analysis of Experiments

■ Basic Research

– The researcher formulates a hypothesis and conducts a single experiment to test it

– The hypothesis is modified and another experiment is conducted

– Combined analysis of experiments is seldom required

– Experiments may be repeated to

• Provide greater precision (increased replication)

• Validate results from the initial experiment

■ Applied Research

– Recommendations to producers must be based on multiple locations and seasons that represent the target environments (soil types, weather patterns)

Multilocational trials

■ Often called METs (multi-environment trials)

■ How do treatment effects change in response to differences in soil and weather throughout a region?

What range of responses can be expected?

■ Detect and quantify treatment x location and treatment x season interactions in the recommendation domain

■ Combined estimates are valid only if locations are randomly chosen within the target area

– Experiments are often carried out on experiment stations

– Researchers generally use the sites that are most accessible or convenient

The data can still be analyzed, but consider possible bias due to restricted site selection when interpreting the results

Preliminary Analysis

■ Complete an ANOVA for each experiment

– Do we have good data from each site?

– Examine residual plots for validity of ANOVA assumptions and for outliers

■ Examine the experimental errors from different locations for heterogeneity

– Perform Hartley's Fmax test or Levene's test for homogeneity of variance

– If the error variances are homogeneous, perform a combined analysis across sites
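As a rough illustration of the two homogeneity checks, the Python sketch below computes Hartley's Fmax from per-site error mean squares and Levene's statistic from per-site residuals. It assumes equal error df at every site (Fmax is only tabulated for that case), the function names and numbers are hypothetical, and it returns statistics only; the critical value or P value would still come from Hartley's table or an F distribution with k − 1 and N − k df.

```python
# Sketch: homogeneity-of-variance checks across sites (hypothetical data).
from statistics import mean


def hartley_fmax(error_ms):
    """Hartley's Fmax: largest site error MS divided by the smallest."""
    return max(error_ms) / min(error_ms)


def levene_w(groups):
    """Levene's W: one-way ANOVA F on |value - site mean|, one group per site."""
    z = [[abs(x - mean(g)) for x in g] for g in groups]  # absolute deviations
    k = len(z)                       # number of sites
    n = sum(len(g) for g in z)       # total observations
    grand = mean([v for g in z for v in g])
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in z) / (k - 1)
    within = sum((v - mean(g)) ** 2 for g in z for v in g) / (n - k)
    return between / within


# Hypothetical error mean squares from three site ANOVAs
print(hartley_fmax([2.0, 3.0, 8.0]))  # 4.0
```

Levene's statistic is computed on deviations from each site's mean; a Brown-Forsythe variant would use the median instead.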

Heterogeneous Variances

■ Differences in means across sites are often greater than treatment effects

■ This does not prevent a combined analysis, but it may contribute to error heterogeneity if site means and variances are associated (e.g., larger error variances at higher-yielding sites)

■ If the error variances are heterogeneous:

– Break sites into homogeneous groups and analyze each group separately

– Use a transformation

– Use a generalized linear mixed model that accounts for the error structure
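To illustrate why a transformation can help when site means and variances are associated, this sketch uses hypothetical yields in which every value at site B is ten times the matching value at site A. The raw within-site variances differ 100-fold, while on the log scale they are equal. A log transformation suits the case where the standard deviation is roughly proportional to the mean; that proportionality is an assumption of this example, not a general rule.

```python
# Sketch: a log transformation removing a mean-variance association.
import math
from statistics import variance

# Hypothetical data: site B is exactly 10x site A
site_a = [10.0, 12.0, 14.0]
site_b = [100.0, 120.0, 140.0]


def variance_ratio(groups):
    """Largest over smallest within-site variance (an Fmax-style ratio)."""
    v = [variance(g) for g in groups]
    return max(v) / min(v)


raw_ratio = variance_ratio([site_a, site_b])
log_ratio = variance_ratio([[math.log(y) for y in g] for g in (site_a, site_b)])
print(raw_ratio, log_ratio)  # raw ratio is 100; log ratio is 1 up to rounding
```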

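MET Linear Model (for an RBD)

$$Y_{ijk} = \mu + \lambda_i + \beta_{j(i)} + \tau_k + (\lambda\tau)_{ik} + \varepsilon_{ijk}$$

where

$\mu$ = mean effect

$\lambda_i$ = ith location effect

$\beta_{j(i)}$ = jth block effect within the ith location

$\tau_k$ = kth treatment effect

$(\lambda\tau)_{ik}$ = interaction of the kth treatment with the ith location

$\varepsilon_{ijk}$ = pooled error

with i = 1, …, l locations, j = 1, …, r blocks per location, and k = 1, …, t treatments.

■ Environments = Locations = Sites (the terms are used interchangeably)

■ Blocks are nested in locations

– SS for blocks is pooled across locations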
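The sums of squares for this model can be computed directly when the data are balanced. This is a minimal sketch, assuming a complete data set stored as nested lists y[i][j][k] (location, block within location, treatment); the function name combined_rbd_anova is hypothetical, and unbalanced data would call for PROC MIXED or another mixed-model tool instead.

```python
# Sketch: SS partition for a combined RBD analysis (balanced data only).


def combined_rbd_anova(y):
    l, r, t = len(y), len(y[0]), len(y[0][0])
    n = l * r * t
    grand = sum(v for loc in y for blk in loc for v in blk) / n
    loc_mean = [sum(v for blk in loc for v in blk) / (r * t) for loc in y]
    blk_mean = [[sum(blk) / t for blk in loc] for loc in y]
    trt_mean = [sum(y[i][j][k] for i in range(l) for j in range(r)) / (l * r)
                for k in range(t)]
    # Location x treatment cell means, averaged over blocks
    cell = [[sum(y[i][j][k] for j in range(r)) / r for k in range(t)]
            for i in range(l)]

    ss_tot = sum((v - grand) ** 2 for loc in y for blk in loc for v in blk)
    ss_l = r * t * sum((m - grand) ** 2 for m in loc_mean)
    # Blocks are nested in locations: deviations from each location's mean
    ss_b = t * sum((blk_mean[i][j] - loc_mean[i]) ** 2
                   for i in range(l) for j in range(r))
    ss_t = l * r * sum((m - grand) ** 2 for m in trt_mean)
    ss_lt = r * sum((cell[i][k] - loc_mean[i] - trt_mean[k] + grand) ** 2
                    for i in range(l) for k in range(t))
    ss_e = ss_tot - ss_l - ss_b - ss_t - ss_lt  # pooled error by subtraction

    rows = {
        "Location": (ss_l, l - 1),
        "Blocks in Loc.": (ss_b, l * (r - 1)),
        "Treatment": (ss_t, t - 1),
        "Loc. x Treatment": (ss_lt, (l - 1) * (t - 1)),
        "Pooled Error": (ss_e, l * (r - 1) * (t - 1)),
    }
    # Each entry: (SS, df, MS = SS / df)
    return {src: (ss, df, ss / df) for src, (ss, df) in rows.items()}
```

With purely additive data (no interaction and no error), the Loc. x Treatment and Pooled Error SS come out as zero, which is a quick sanity check of the partition.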

Treatment x Environment Interaction

■ Obtain a preliminary estimate of the interaction of treatments with environments or seasons

■ Will we be able to make general recommendations about the treatments, or should recommendations be specific to each region or site?

– Because error degrees of freedom are pooled across sites, the combined error term has many df, so interactions are relatively easy to detect

– Consider the magnitude of the treatment MS relative to the interaction MS

– Are there rank changes in treatments across environments (crossover interactions)?
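A simple descriptive check for crossover interactions is to look for pairs of treatments whose order reverses between environments. In this sketch the function crossover_pairs and the data are hypothetical: means maps each environment to its treatment means. It ignores sampling error, so a flagged reversal should still be judged against the interaction test or an LSD before it is called a real crossover.

```python
# Sketch: flagging treatment pairs whose ranking reverses across environments.
from itertools import combinations


def crossover_pairs(means):
    envs = list(means)
    trts = list(means[envs[0]])
    pairs = []
    for a, b in combinations(trts, 2):
        # Sign of (a - b) in each environment: +1, 0, or -1
        signs = {(means[e][a] > means[e][b]) - (means[e][a] < means[e][b])
                 for e in envs}
        if 1 in signs and -1 in signs:
            pairs.append((a, b))
    return pairs


means = {
    "Site1": {"A": 5.0, "B": 4.0, "C": 3.0},
    "Site2": {"A": 4.5, "B": 5.5, "C": 2.0},
}
print(crossover_pairs(means))  # [('A', 'B')]
```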

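Treatments and locations are random

Source             df             SS       MS
Location           l-1            SSL      M1
Blocks in Loc.     l(r-1)         SSB(L)   M2
Treatment          t-1            SST      M3
Loc. x Treatment   (l-1)(t-1)     SSLT     M4
Pooled Error       l(r-1)(t-1)    SSE      M5

Expected MS

$$
\begin{aligned}
E(M_1) &= \sigma^2_e + r\sigma^2_{TL} + t\sigma^2_{R(L)} + rt\sigma^2_L \\
E(M_2) &= \sigma^2_e + t\sigma^2_{R(L)} \\
E(M_3) &= \sigma^2_e + r\sigma^2_{TL} + rl\sigma^2_T \\
E(M_4) &= \sigma^2_e + r\sigma^2_{TL} \\
E(M_5) &= \sigma^2_e
\end{aligned}
$$

No single mean square has the expectation of M1 without its location component, so a quasi-F is formed: E(M1+M5) − E(M2+M4) = rtσ²_L.

F for Locations = (M1+M5)/(M2+M4)

Satterthwaite's approximate df (each mean square is divided by its own df):

$$
N_1' = \frac{(M_1+M_5)^2}{\dfrac{M_1^2}{l-1} + \dfrac{M_5^2}{l(r-1)(t-1)}}
\qquad
N_2' = \frac{(M_2+M_4)^2}{\dfrac{M_2^2}{l(r-1)} + \dfrac{M_4^2}{(l-1)(t-1)}}
$$

F for Treatments = M3/M4

F for Loc. x Treatments = M4/M5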
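The quasi-F for Locations and its Satterthwaite df can be sketched as follows, assuming M1 to M5 are the mean squares from the table above for l locations, r blocks per location, and t treatments (the function names are hypothetical). Adding M5 to the numerator, rather than subtracting it from the denominator, keeps both parts of the ratio positive, which is one common reason for preferring this form.

```python
# Sketch: quasi-F for Locations with Satterthwaite's approximate df.


def satterthwaite_df(ms_df_pairs):
    """Approximate df of a sum of mean squares: (sum M)^2 / sum(M^2 / df)."""
    total = sum(m for m, _ in ms_df_pairs)
    return total ** 2 / sum(m ** 2 / df for m, df in ms_df_pairs)


def location_quasi_f(m1, m2, m4, m5, l, r, t):
    df1, df2 = l - 1, l * (r - 1)
    df4, df5 = (l - 1) * (t - 1), l * (r - 1) * (t - 1)
    f = (m1 + m5) / (m2 + m4)
    n1 = satterthwaite_df([(m1, df1), (m5, df5)])  # numerator df N1'
    n2 = satterthwaite_df([(m2, df2), (m4, df4)])  # denominator df N2'
    return f, n1, n2


# Hypothetical mean squares for 3 locations, 4 blocks, 5 treatments
f, n1, n2 = location_quasi_f(10.0, 2.0, 3.0, 1.0, l=3, r=4, t=5)
print(f, n1, n2)
```

The resulting F would be compared with an F distribution on N1' and N2' df, rounded or interpolated as needed.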

Treatments and locations are fixed

Fixed locations

• constitute the entire population of environments

OR

• represent specific environmental conditions (rainfall, elevation, etc.)

Source             df             SS       MS
Location           l-1            SSL      M1
Blocks in Loc.     l(r-1)         SSB(L)   M2
Treatment          t-1            SST      M3
Loc. x Treatment   (l-1)(t-1)     SSLT     M4
Pooled Error       l(r-1)(t-1)    SSE      M5

Expected MS

$$
\begin{aligned}
E(M_1) &= \sigma^2_e + t\sigma^2_{R(L)} + rt\sigma^2_L \\
E(M_2) &= \sigma^2_e + t\sigma^2_{R(L)} \\
E(M_3) &= \sigma^2_e + rl\sigma^2_T \\
E(M_4) &= \sigma^2_e + r\sigma^2_{LT} \\
E(M_5) &= \sigma^2_e
\end{aligned}
$$

F for Locations = M1/M2

F for Treatments = M3/M5

F for Loc. x Treatments = M4/M5

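Treatments are fixed, Locations are random

Source             df             SS       MS
Location           l-1            SSL      M1
Blocks in Loc.     l(r-1)         SSB(L)   M2
Treatment          t-1            SST      M3
Loc. x Treatment   (l-1)(t-1)     SSLT     M4
Pooled Error       l(r-1)(t-1)    SSE      M5

Expected MS

$$
\begin{aligned}
E(M_1) &= \sigma^2_e + t\sigma^2_{R(L)} + rt\sigma^2_L \\
E(M_2) &= \sigma^2_e + t\sigma^2_{R(L)} \\
E(M_3) &= \sigma^2_e + r\sigma^2_{TL} + rl\sigma^2_T \\
E(M_4) &= \sigma^2_e + r\sigma^2_{TL} \\
E(M_5) &= \sigma^2_e
\end{aligned}
$$

F for Locations = M1/M2

F for Treatments = M3/M4

F for Loc. x Treatments = M4/M5

■ SAS uses slightly different rules for determining Expected MS: its EMS for Locations also includes the Loc. x Treatment component (see the SAS output that follows)

■ Under the SAS rules there is no direct test for Locations in this model; a synthesized error term is required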

SAS Expected Mean Squares

Varieties fixed, Locations random

PROC GLM;
  Class Location Rep Variety;
  Model Yield = Location Rep(Location) Variety Location*Variety;
  Random Location Rep(Location) Location*Variety / Test;
RUN;
QUIT;

Source      Type III Expected Mean Square
Location    Var(Error) + 3 Var(Location*Variety) + 7 Var(Rep(Location)) + 21 Var(Location)

(The coefficients correspond to r = 3 reps and t = 7 varieties, with rt = 21.)

Dependent Variable: Yield

Source      DF        Type III SS    Mean Square    F Value    Pr > F
Location    1         0.505125       0.505125       0.20       0.6745
Error       5.8098    15.027788      2.586644

Error: MS(Rep(Location)) + MS(Location*Variety) - MS(Error)
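The synthesized error term can be checked by adding up expected mean squares. This sketch copies the SAS coefficients for Location and assumes the usual EMS for the other three terms (Rep(Location): Var(Error) + 7 Var(Rep(Location)); Location*Variety: Var(Error) + 3 Var(Location*Variety)); the component names are hypothetical shorthand, not SAS identifiers. The fractional Error DF (5.8098) is the kind of value Satterthwaite's approximation produces for such a combination, and the satterthwaite helper shows that formula; the individual MS values needed to reproduce 5.8098 are not shown in the output.

```python
# Sketch: verifying the synthesized error term for Location.

# Variance component -> coefficient (r = 3 reps, t = 7 varieties)
EMS = {
    "Location": {"Error": 1, "LxV": 3, "Rep": 7, "Location": 21},
    "Rep(Location)": {"Error": 1, "Rep": 7},
    "Location*Variety": {"Error": 1, "LxV": 3},
    "Error": {"Error": 1},
}


def expected(weights):
    """Expected value of sum(w * MS), dropping components that cancel."""
    out = {}
    for source, w in weights.items():
        for comp, c in EMS[source].items():
            out[comp] = out.get(comp, 0) + w * c
    return {k: v for k, v in out.items() if v != 0}


def satterthwaite(terms):
    """Approximate df of sum(a * M) for a list of (a, M, df) terms."""
    num = sum(a * m for a, m, _ in terms) ** 2
    return num / sum((a * m) ** 2 / df for a, m, df in terms)


# MS(Rep(Location)) + MS(Location*Variety) - MS(Error)
denominator = expected({"Rep(Location)": 1, "Location*Variety": 1, "Error": -1})
print(denominator)  # {'Error': 1, 'Rep': 7, 'LxV': 3}
```

The denominator matches the Location EMS in every component except 21 Var(Location), which is what the F test requires.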

Treatments are fixed, Years are random

Source              df             SS       MS
Years               y-1            SSY      M1
Blocks in Years     y(r-1)         SSB(Y)   M2
Treatment           t-1            SST      M3
Years x Treatment   (y-1)(t-1)     SSYT     M4
Pooled Error        y(r-1)(t-1)    SSE      M5

(y = number of years)

Expected MS

$$
\begin{aligned}
E(M_1) &= \sigma^2_e + t\sigma^2_{R(Y)} + rt\sigma^2_Y \\
E(M_2) &= \sigma^2_e + t\sigma^2_{R(Y)} \\
E(M_3) &= \sigma^2_e + r\sigma^2_{TY} + ry\sigma^2_T \\
E(M_4) &= \sigma^2_e + r\sigma^2_{TY} \\
E(M_5) &= \sigma^2_e
\end{aligned}
$$

F for Years = M1/M2

F for Treatments = M3/M4

F for Years x Treatments = M4/M5

Locations and Years in the same trial

■ Can analyze as a factorial

Source                      df
Years                       y-1
Locations                   l-1
Years x Locations           (y-1)(l-1)
Block(Years x Locations)    yl(r-1)

■ Can determine the magnitude of the interactions between treatments and environments

– TxY, TxL, TxYxL

■ For a simpler interpretation, consider all year and location combinations as “sites” and use one of the models presented for multilocational trials
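Extending the factorial to include treatments, the degrees of freedom for every term can be tabulated and checked against the total of ylrt − 1. This sketch assumes years, locations, and treatments are fully crossed with r blocks in every year-location combination; the treatment rows are the TxY, TxL, and TxYxL interactions named above, the pooled error row is assumed to be yl(r−1)(t−1), and the function name is hypothetical.

```python
# Sketch: df for a years x locations factorial MET in an RBD.


def factorial_met_df(y, l, r, t):
    return {
        "Years": y - 1,
        "Locations": l - 1,
        "Years x Locations": (y - 1) * (l - 1),
        "Block(Years x Locations)": y * l * (r - 1),
        "Treatment": t - 1,
        "T x Y": (t - 1) * (y - 1),
        "T x L": (t - 1) * (l - 1),
        "T x Y x L": (t - 1) * (y - 1) * (l - 1),
        "Pooled Error": y * l * (r - 1) * (t - 1),
    }


# 2 years, 3 locations, 4 blocks, 5 treatments: df must sum to 2*3*4*5 - 1
print(sum(factorial_met_df(2, 3, 4, 5).values()))  # 119
```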

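Combined Lab or Greenhouse Study (CRD)

Assume Treatments are fixed, Trials are random

A “trial” is a repetition of a replicated experiment (here l trials, each with r replicates of t treatments)

Source               df            SS      MS
Trial                l-1           SSL     M1
Treatment            t-1           SST     M2
Trial x Treatment    (l-1)(t-1)    SSLT    M3
Pooled Error         lt(r-1)       SSE     M4

Expected MS

$$
\begin{aligned}
E(M_1) &= \sigma^2_e + rt\sigma^2_L \\
E(M_2) &= \sigma^2_e + r\sigma^2_{LT} + rl\sigma^2_T \\
E(M_3) &= \sigma^2_e + r\sigma^2_{LT} \\
E(M_4) &= \sigma^2_e
\end{aligned}
$$

F for Trials = M1/M4 ( SAS would say M1/M3 )

F for Treatments = M2/M3

F for Trials x Treatments = M3/M4

■ If the Trial x Treatment interaction is not significant, consider pooling SSLT with SSE

– Pool only when the interaction P value is large (a conservative cutoff, e.g., P > 0.25 or P > 0.5)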
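The pooling rule can be sketched as a small function: when the interaction P value exceeds the chosen cutoff, the interaction SS and df are added to the error SS and df and a pooled MS is recomputed, which would then serve as the denominator for Treatments. The function name, the default cutoff of 0.25, and the example numbers are assumptions for illustration.

```python
# Sketch: pooling a non-significant Trial x Treatment term into error.


def maybe_pool(ss_int, df_int, ss_err, df_err, p_int, cutoff=0.25):
    if p_int > cutoff:
        # Pool: combine SS and df, then recompute the mean square
        ss, df = ss_int + ss_err, df_int + df_err
        return {"pooled": True, "ss": ss, "df": df, "ms": ss / df}
    # Interaction too large to pool: keep the original error term
    return {"pooled": False, "ss": ss_err, "df": df_err, "ms": ss_err / df_err}


print(maybe_pool(12.0, 6, 36.0, 24, p_int=0.6))  # pooled: 48 SS on 30 df
print(maybe_pool(12.0, 6, 36.0, 24, p_int=0.1))  # not pooled
```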

Preliminary ANOVA

■ Assumptions for this example:

– Locations and blocks are random

– Treatments are fixed

Source             df             SS       MS    F
Total              lrt-1          SSTot
Location           l-1            SSL      M1    M1/M2
Blocks in Loc.     l(r-1)         SSB(L)   M2
Treatment          t-1            SST      M3    M3/M4
Loc. x Treatment   (l-1)(t-1)     SSLT     M4    M4/M5
Pooled Error       l(r-1)(t-1)    SSE      M5

■ If the Loc. x Treatment interaction is significant, interpret main effects combined across all locations with caution

SAS Combined Analysis

PROC MIXED or PROC GLIMMIX

Genotype by Environment Interactions (GEI)

■ When the relative performance of varieties differs from one location or year to another…

– how do you make selections?

– how do you make recommendations to farmers?

Genotype x Environment Interactions (GEI)

■ How much does GEI contribute to variation among varieties or breeding lines?

P = G + E + GE

P is the phenotype of an individual

G is the genotype

E is the environment

GE is the genotype x environment interaction

DeLacy et al., 1990 – summary of results from many crops and locations

70-20-10 rule

E : GE : G ≈ 70 : 20 : 10

About 20% of the total observed variation is due to the interaction of genotype and environment, roughly twice the share due to genotype
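To see where ratios such as 70 : 20 : 10 come from, the total sum of squares of a genotype x environment table can be split into E, GE, and G parts. This sketch assumes a complete table of genotype means with one value per environment (the function name is hypothetical); with replicated plot data an error term would be a fourth component, and the percentages from any particular trial series need not follow the 70-20-10 pattern.

```python
# Sketch: percent of total SS due to E, GE, and G in a table x[g][e].


def partition_gxe(x):
    ng, ne = len(x), len(x[0])
    grand = sum(map(sum, x)) / (ng * ne)
    g_mean = [sum(row) / ne for row in x]
    e_mean = [sum(x[g][e] for g in range(ng)) / ng for e in range(ne)]
    ss_g = ne * sum((m - grand) ** 2 for m in g_mean)
    ss_e = ng * sum((m - grand) ** 2 for m in e_mean)
    # Interaction: what remains after removing both main effects
    ss_ge = sum((x[g][e] - g_mean[g] - e_mean[e] + grand) ** 2
                for g in range(ng) for e in range(ne))
    total = ss_g + ss_e + ss_ge  # equals the total SS of the table
    return {k: 100 * v / total
            for k, v in (("E", ss_e), ("GE", ss_ge), ("G", ss_g))}


# Additive hypothetical table: no interaction, so GE is 0%
print(partition_gxe([[1.0, 3.0], [2.0, 4.0]]))
```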

Stability

■ Many approaches for examining GEI have been suggested since the 1960s

■ Characterization of GEI is closely related to the concept of stability. “Stability” has been interpreted in different ways.

– Static: the performance of a genotype does not change under different environmental conditions (relevant for disease resistance and quality factors)

– Dynamic: genotype performance is affected by the environment, but its relative performance is consistent across environments; it responds to environmental factors in a predictable way.

Measures of stability

■ CV of individual genotypes across locations

■ Regression of genotypes on an environmental index

– Eberhart and Russell, 1966

■ Ecovalence

– Wricke, 1962

■ Superiority measure of cultivars

– Lin and Binns, 1988

■ Many others…
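Four of the stability measures listed above can be computed from a genotype x environment table of means. This is a sketch under stated assumptions: x[g][e] holds one mean per genotype and environment, the environmental index is the environment mean minus the grand mean, only the Eberhart and Russell slope is computed (their deviation-from-regression variance needs the pooled error and is omitted), and the function name is hypothetical. Ecovalence is each genotype's contribution to the GE sum of squares, and the Lin and Binns measure is the mean squared distance to the best genotype in each environment, divided by 2.

```python
# Sketch: CV, regression slope, ecovalence, and superiority per genotype.
from statistics import mean, stdev


def stability_measures(x):
    ng, ne = len(x), len(x[0])
    grand = mean(v for row in x for v in row)
    e_mean = [mean(x[g][e] for g in range(ng)) for e in range(ne)]
    index = [m - grand for m in e_mean]  # environmental index I_j (sums to 0)
    best = [max(x[g][e] for g in range(ng)) for e in range(ne)]  # M_j
    out = []
    for row in x:
        g_mean = mean(row)
        out.append({
            "cv": 100 * stdev(row) / g_mean,
            # Eberhart-Russell slope of Y_ij on I_j
            "b": sum(y * i for y, i in zip(row, index)) / sum(i * i for i in index),
            # Wricke's ecovalence: this genotype's share of the GE SS
            "W": sum((y - g_mean - em + grand) ** 2 for y, em in zip(row, e_mean)),
            # Lin-Binns superiority: sum of squared shortfalls / (2n)
            "P": sum((y - m) ** 2 for y, m in zip(row, best)) / (2 * ne),
        })
    return out


table = [[2, 4, 6], [3, 4, 5], [4, 4, 4]]  # hypothetical genotype means
for g, s in enumerate(stability_measures(table)):
    print(g, s)
```

In this toy table the second genotype has slope 1 and zero ecovalence, i.e. it tracks the environmental index exactly.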

Analysis of GEI – other approaches

■ Rank sum index (nonparametric approach)

■ Cluster analysis

■ Factor analysis

■ Principal component analysis

■ AMMI

■ Pattern analysis

■ Analysis of crossovers

■ Partial Least Squares Regression

■ Factorial Regression
