L6-DESIGN.ppt

Design of Micro-arrays
Lecture Topic 6
Experimental design
• Proper experimental design is needed to
ensure that questions of interest can be
answered and that this can be done
accurately, given experimental constraints,
such as cost of reagents and availability of
mRNA.
Design considerations in micro-arrays
• Design enters a micro-array experiment at 2 main
points:
– Probe Design
– Allocation of RNA to probes
Array/Probe design
• Which gene-representative sequence from which gene
collection to print on the array?
• Where?
• Controls or Not?
• Numbers, how many controls, how many genes?
- Duplicate or replicate spots within a slide position.
Commonly asked questions
• Should we put duplicate spots on a slide?
• What should be the percentage of control spots?
• Where should the control spots be placed?
[These relate to preprocessing steps such as quality assessment
and normalization.]
Probe Design
As Statisticians we often have VERY little say on
the probe designs. The only input may be in
location of control spots. However, we may
have some input in the allocation of RNA
samples to the probes.
Idea behind Experimental Design
• It was introduced by Sir Ronald Fisher in the 1920s to deal
with systematic sources of variation in agricultural field
trials.
• The same ideas are true TODAY for Micro-arrays.
• Fisher’s idea was divided into 3 main principles:
– Randomization
– Replication
– Local Control or Blocking
Let's discuss some terms used in design.
Terms and Definitions
• Treatment/ Condition: any attribute of primary interest
• Unit: Independent Replicate that is subject to the treatment
• Block: any attribute that is believed to have an influence
on the response but NOT of primary interest
• Crossing: assigning all possible combinations of factors to
units
• Confounding: the situation in which the effect of one factor
cannot be separated from the effect of another factor
Designing using Principles
• Randomization: a chance device to assign treatments to
units, essential to reduce any systematic bias
• Replication: including more than one unit per condition,
allows us to estimate random variation and is also used for
reducing bias.
• Local control/blocking: if we believe that there is a
systematic source of variation that may affect the response,
we should identify this source, group the units into blocks
according to it, and randomize the treatments within each
block.
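Randomizing within blocks can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name randomize_within_blocks, the batch labels and the array labels are hypothetical and not part of the lecture, and each block is assumed to hold a multiple of the number of treatments so that the design stays balanced.

```python
import random


def randomize_within_blocks(units_by_block, treatments, seed=None):
    """Assign treatments to units at random, separately inside each block.

    units_by_block: dict mapping a block label to a list of unit labels.
    treatments: list of treatment labels. Each block must contain a
                multiple of len(treatments) units (balanced design).
    Returns a dict mapping each unit to a (block, treatment) pair.
    """
    rng = random.Random(seed)
    assignment = {}
    for block, units in units_by_block.items():
        if len(units) % len(treatments) != 0:
            raise ValueError(f"block {block!r} cannot be balanced")
        # Repeat the treatment labels to cover the block, then shuffle them.
        labels = list(treatments) * (len(units) // len(treatments))
        rng.shuffle(labels)
        for unit, label in zip(units, labels):
            assignment[unit] = (block, label)
    return assignment


# Hypothetical example: two hybridization batches (blocks) of four arrays.
blocks = {
    "batch1": ["a1", "a2", "a3", "a4"],
    "batch2": ["a5", "a6", "a7", "a8"],
}
plan = randomize_within_blocks(blocks, ["H", "C"], seed=1)
```

Because the shuffle is done separately per block, every block receives each treatment equally often, and chance decides only which unit inside the block gets which treatment.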
Crossing and Confounding
• Crossing: assigning every possible combination of factor
levels to the units. A common example is the dye-swap, in
which every experimental condition is labeled with both dyes.
• Confounding: happens when one factor cannot be told
apart from another factor
• Example: you have two conditions, H and C, and 2 slides.
You hybridize condition H with Red dye on both slides and
condition C with Green dye on both slides. Here you cannot
distinguish the effect of dye from the effect of condition.
This is called confounding.
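The H/C example above can be checked mechanically. The following is a minimal Python sketch, assuming each labeled sample is recorded as a small dictionary; the function name is_confounded and the record layout are illustrative choices, not part of the lecture. It flags only complete confounding, where every level of one factor always occurs with a single level of the other.

```python
def is_confounded(runs, factor_a, factor_b):
    """Return True if factor_a and factor_b are completely confounded.

    Two factors are treated as completely confounded when every level of
    one always appears with a single level of the other, and vice versa.
    """
    a_to_b, b_to_a = {}, {}
    for run in runs:
        a_to_b.setdefault(run[factor_a], set()).add(run[factor_b])
        b_to_a.setdefault(run[factor_b], set()).add(run[factor_a])

    def one_to_one(mapping):
        return all(len(levels) == 1 for levels in mapping.values())

    return one_to_one(a_to_b) and one_to_one(b_to_a)


# The slide's example: H is always red and C is always green on both slides.
no_swap = [
    {"slide": 1, "dye": "red", "condition": "H"},
    {"slide": 1, "dye": "green", "condition": "C"},
    {"slide": 2, "dye": "red", "condition": "H"},
    {"slide": 2, "dye": "green", "condition": "C"},
]

# Dye-swap: the conditions are crossed with the dyes.
dye_swap = [
    {"slide": 1, "dye": "red", "condition": "H"},
    {"slide": 1, "dye": "green", "condition": "C"},
    {"slide": 2, "dye": "red", "condition": "C"},
    {"slide": 2, "dye": "green", "condition": "H"},
]

no_swap_confounded = is_confounded(no_swap, "dye", "condition")    # True
dye_swap_confounded = is_confounded(dye_swap, "dye", "condition")  # False
```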
Example:
• Consider a two-channel micro-array experiment done
where we are interested in comparing RNA samples from
healthy mice (H) to that of cancerous (C) mice. For the
experiment we have 2 arrays (A1, A2)
• The treatment is the status of the mice (2 levels: H
and C).
• The unit is the array. (There is some debate about this;
some argue that each gene is the unit.)
• The block is the dye color (2 levels: red and green).
Now let's consider how the experiment was done.
DESIGN 1
• ARRAY AND CONDITION CONFOUNDING
Design          Array A1    Array A2
Channel Red     H           C
Channel Green   H           C
DESIGN 2
• NON DYE-SWAP: COLOR AND CONDITION CONFOUNDING
Design          Array A1    Array A2
Channel Red     H           H
Channel Green   C           C
DESIGN 3
• DYE-SWAP
Design          Array A1    Array A2
Channel Red     H           C
Channel Green   C           H
DESIGN 4
• REFERENCE DESIGN
Design          Array A1    Array A2
Channel Red     H           C
Channel Green   REF         REF
Example continued
• Here we had two colors (blocks)
• Hence each condition (healthy or cancer) should be
randomized within each color.
• Finally, we want more than one slide for each
treatment-block combination.
• The idea is:
• Response = f(treatment effect, block effect, random error)
• Randomization and replication allow us to believe that once
all systematic sources of variation are removed, we are
left with only random error.
• Hence we assume no bias.
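To see why the dye-swap matters for this model, here is a small worked sketch in Python. It assumes a purely additive model on the log scale with no random error, so that the log-ratio on one array is (red condition level) - (green condition level) + (a constant dye bias). The numeric values and the function name log_ratio are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def log_ratio(red_condition, green_condition, level, dye_bias):
    """Log-ratio M = red - green under an assumed additive, noise-free model."""
    return (level[red_condition] - level[green_condition]) + dye_bias


level = {"H": 1.0, "C": 3.0}  # hypothetical log2 expression levels
dye_bias = 0.5                # hypothetical red-minus-green dye effect

# Dye-swap (Design 3): A1 has H in red and C in green, A2 is swapped.
m1 = log_ratio("H", "C", level, dye_bias)
m2 = log_ratio("C", "H", level, dye_bias)
h_minus_c = (m1 - m2) / 2     # the dye bias cancels: equals H - C
dye_estimate = (m1 + m2) / 2  # the condition effect cancels: equals dye bias

# No dye-swap (Design 2): both arrays have H in red and C in green.
n1 = log_ratio("H", "C", level, dye_bias)
n2 = log_ratio("H", "C", level, dye_bias)
no_swap_estimate = (n1 + n2) / 2  # still contains the dye bias
```

With the swap, the half-difference recovers the H - C effect (-2.0 here) and the half-sum recovers the dye bias (0.5). Without the swap, averaging the two arrays gives -1.5, and no combination of the two log-ratios can separate the condition effect from the dye effect.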
DESIGN WITH INDIVIDUAL BIOLOGICAL REPLICATION
• 2 BIOLOGICAL REPLICATES, 4 SLIDES
Design          Array A1    Array A2    Array A3    Array A4
Channel Red     H(1)        C(1)        H(2)        C(2)
Channel Green   C(1)        H(1)        C(2)        H(2)
The main goal is: Avoidance of bias
• The conditions of an experiment (mRNA extraction and
processing, the reagents, the operators, the scanners,
and so on) can leave a “global signature” in the resulting
expression data.
• Hence it is essential to follow the principles of proper
experimentation to avoid bias.
Replication and related issues
What type of replicate is to be used?
Allocation of samples to the slides
A. Types of samples
• Replication: technical vs. biological.
  This always needs to be considered in microarrays since in
  general we often do NOT have biological replication.
• Pooled vs. individual samples.
• Pooled vs. amplified samples.
Biological Replication
• The organisms from which you have taken RNA are your
biological replicates. If you use 3 mice and obtain RNA
from each mouse, you have 3 biological replicates.
• Biological replication allows us to make inferences about
the general population of interest.
• According to McClure and Wit: “the only thing that is
good enough to answer a biological questions are the
so-called biological replicates”
Technical Replicate
• Sometimes it is more convenient to obtain material from 3
organisms, pool it, extract the RNA, and then divide the
RNA into 3 samples to be hybridized.
• This is NOT biological replication; rather, it is called
technical replication after pooling.
• This is more convenient and has less variability (pooling
always decreases variability), but it often leads to bias.
• Another way is to obtain RNA from one organism and
divide the RNA into 3 batches for hybridization. This is an
extreme technical replicate.
More on Technical and Biological Replication
• Having a particular gene (or EST) repeated on a slide (as in
Affy chips) is an example of technical replication.
• This is NOT biological replication since the whole chip is
exposed to the same experimental condition.
• However, technical replicates are useful, since they capture
the variability due to measurement error and to uneven
hybridization across a slide.
• The bottom line is: we are interested in the average
expression level of a particular gene exposed to a particular
condition for a specific biological organism.
How many Replicates?
• This is where the theory of optimal design comes in.
• Deciding HOW many replicates depends upon the
questions you are interested in and the contrasts you want
to estimate.
• In general a rule of thumb is: “at least 3 arrays per
condition”
• One thing to keep in mind is that technical replicates are
in general highly reproducible (r = 0.95), whereas biological
replicates from the same condition often have r ~ 0.30.
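One way to reason about how to split a fixed number of arrays between biological and technical replicates is the textbook variance formula for a nested design. The Python sketch below assumes independent biological and technical variance components and a balanced layout (the same number of technical replicates per organism); the function name and the variance values are hypothetical and are not estimates from the lecture.

```python
def variance_of_condition_mean(var_bio, var_tech, n_bio, n_tech):
    """Variance of a condition mean in a balanced nested design.

    var_bio:  assumed variance between organisms (biological).
    var_tech: assumed variance between hybridizations of the same RNA.
    n_bio:    number of organisms per condition.
    n_tech:   number of technical replicates per organism.
    """
    return var_bio / n_bio + var_tech / (n_bio * n_tech)


# Hypothetical variance components: biological variation dominates.
var_bio, var_tech = 1.0, 0.25

# Two ways to spend 6 arrays on one condition.
six_organisms = variance_of_condition_mean(var_bio, var_tech, n_bio=6, n_tech=1)
two_organisms = variance_of_condition_mean(var_bio, var_tech, n_bio=2, n_tech=3)
```

Under these assumed values, six organisms with one array each give a smaller variance (about 0.21) than two organisms with three arrays each (about 0.54), because technical replication does nothing to reduce the biological term var_bio / n_bio.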
Different design layout
• Scientific aim of the experiment.
• Robustness.
• Extensibility.
• Efficiency.
Taking physical limitations or cost into
consideration:
• The number of slides.
• The amount of material.
Pooled vs. amplified samples
When we do not have enough material from one biological
sample to perform a single array (chip) hybridization,
pooling or amplification is necessary.
• Amplification
- Introduces more noise.
- Possibly non-linear amplification (??): different genes may
be amplified at different rates.
- Able to perform more hybridizations.
• Pooling
- Fewer replicate hybridizations.
Pooled vs individual samples
• Pooling is seen as “biological averaging”.
• Trade off between
- Cost of performing a hybridization.
- Cost of the mRNA samples.
When the cost of mRNA samples << cost per hybridization,
pooling can help reduce the number of hybridizations.
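The trade-off can be made concrete with a simple cost count. This is a sketch only: the function name experiment_cost and all prices are hypothetical, and it assumes that every pool is hybridized exactly once and that the samples divide evenly into pools.

```python
def experiment_cost(n_samples, pool_size, cost_per_sample, cost_per_hyb):
    """Total cost when samples are pooled in groups of pool_size.

    pool_size = 1 means every sample is hybridized individually.
    Each pool is assumed to be hybridized exactly once.
    """
    if n_samples % pool_size != 0:
        raise ValueError("n_samples must be a multiple of pool_size")
    n_hybridizations = n_samples // pool_size
    return n_samples * cost_per_sample + n_hybridizations * cost_per_hyb


# Hypothetical prices: samples are cheap, hybridizations are expensive.
individual = experiment_cost(12, pool_size=1, cost_per_sample=10, cost_per_hyb=500)
pooled = experiment_cost(12, pool_size=3, cost_per_sample=10, cost_per_hyb=500)
```

With these made-up numbers, 12 individual hybridizations cost 6120, while 4 pools of 3 cost 2120. The saving comes entirely from the hybridization term, and it is bought with fewer replicate hybridizations.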
To pool or not to pool?
• Pooling is routinely done when a single organism doesn’t
allow you to have enough RNA for hybridization. So
several organisms are combined to get enough RNA.
• The alternative to pooling is PCR amplification, where you
use PCR techniques to physically amplify the harvested
RNA.
• The literature is not uniform on which is better. Affy
(GeneChip help notes) suggests that pooling causes too much
averaging, so that less significant expression signals can
sometimes be averaged out.
Design Issues:
• Single Channel:
– Identifying conditions of interest
– Obtaining biological replicates
– Preparing hybridization samples
Each condition is considered to be a
separate population from which
biological replicates can be sampled.
Single Channel Issues
• If each ARRAY is considered a “Blocking Factor” then for
one-channel oligo arrays it is NOT possible to apply more
than one condition per array.
• Here each array can only be exposed to ONE condition,
hence the array effects are confounded with the condition
effects.
• Pooling of biological replicates seems to be recommended
as a partial remedy for some of the biases possible in
single channel arrays.