Introduction to DNA Microarrays

Introduction to Gene Chips and
Microarray Expression Data
Dr. Travis Doom
Department of Computer Science and Engineering
Wright State University
Outline

DNA Microarrays
– Fabrication
– Application

Microarray Data
– Analysis Techniques

New Technology &
Open Commentary
Intro to gene chips - 2
Fabrication


Fabrication via PCR/Clone
– DNA sequence stuck to glass substrate
– DNA solution presynthesized in the lab
Fabrication In Situ
– Sequence “built” on the chip itself
– Photolithographic techniques use light to release capping chemicals
– 365 nm light allows 20-µm resolution
DNA Microarrays

Each probe consists of thousands of strands of identical
oligonucleotides
– The DNA sequences at each probe represent important
genes (or parts of genes)

Printing Systems
– Ex: HP, Corning Inc.
– Printing systems can build lengths of DNA up to 60 nucleotides long
– 1.28 x 1.28+ cm glass wafer
• Each “print head” has a ~100 µm diameter, and the heads are separated by ~100 µm (5,000 – 20,000 probes)
Photolithographic Chips
– Ex: Affymetrix (GeneChip)
– 1.28 x 1.28 cm glass/silicon wafer
• 24 x 24 µm probe site (500,000 probes)
– Lengths of DNA up to 25 nucleotides long
– Requires a new set of masks for each new array type
Practical Application of DNA Microarrays

DNA microarrays are used to study gene activity (expression)
– What proteins are being actively produced by a group of cells?
• “Which genes are being expressed?”
How?
– When a cell is making a protein, it transcribes the genes (made of DNA) which code for the protein into RNA used in its production
– The RNA present in a cell can be extracted
– If a gene has been expressed in a cell
• RNA will bind to “a copy of itself” on the array
• RNA with no complementary site will wash off the array
– The RNA can be “tagged” with a fluorescent dye to determine its presence

DNA microarrays provide a high-throughput technique for quantifying
the presence of specific RNA sequences
The Process
[Figure: sample preparation. Cells → poly-A RNA → cDNA → in-vitro transcription (IVT) with 10% biotin-labeled uracil → labeled antisense cRNA → fragment (heat, Mg2+) → labeled fragments → hybridize → wash/stain → scan]
Hybridization and Staining
[Figure: biotin-labeled cRNA is hybridized to the GeneChip; the hybridized array is then stained with SAPE (streptavidin-phycoerythrin)]
The Result
– A light source scans the array, causing the dyes to fluoresce
– The glow is picked up by a sensor and is used to determine the relative abundance of the RNA
– This information must be processed to determine the level of activity for each expressed gene
The Goals

Basic Understanding
– Arrays can take a snapshot of which subset of genes in a cell is actively
making proteins
– Heat shock experiments

Medical diagnosis
– Microarrays can indicate where mutations lie that might be linked to a disease.
Still others are used to determine if a person’s genetic profile would make him
or her more or less susceptible to drug side effects
– 1999 – A GeneChip containing 6800 human genes was used to distinguish between
myeloid leukemia and lymphoblastic leukemia using a set of 50 genes that have
different activity levels

Drug design
– Pharmaceutical firms are in a rush to translate the human genome results into
new products
• Potential profits are huge
• First, though, they must figure out what the genes do, how they interact, and how
they relate to diseases.
– Evaluation, Specificity, Response
The Gains




A decade of rapid advances in biology has swept an avalanche of
genetic information into scientists’ laps.
Mass analysis of the vast set of biologic data is impractical without
high-throughput techniques
DNA microarrays (aka gene chips, biochips) allow researchers to look
for the presence, productivity, or sequence of thousands of genes
simultaneously
Advantages:
– Speed
– Feasibility
– Sensitivity
– Reproducibility
Microarray Data

First, the Problems:
1. The fabrication process is not error free
2. Probes have a maximum length of 25-60 nucleotides
3. Biologic processes such as hybridization are stochastic
4. Background light may skew the fluorescence
5. How do we decide if/how strongly a particular gene is being expressed?

Solutions to these problems are still in their infancy
Features




Problem #1: The fabrication process is not error free
Solution: Each probe does not represent a unique DNA sequence.
Probe set: A set of probes each containing the same DNA sequence (the Feature)
Remove outermost rows and columns to avoid fabrication-based error
Feature Value
 83   112    96    32
 47   382   165    87
 55   246   140    93
104   552   187    65
Remove outermost rows and columns
Find 75th percentile of remaining values
This value is taken as representative of this feature
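The three steps above can be sketched in a few lines of Python. This is only an illustration: the slides do not say how the 75th percentile is interpolated, so linear interpolation between closest ranks is assumed here, and the function names `percentile` and `feature_value` are made up for the example. The 16 values from the slide are treated as a 4 x 4 grid.

```python
def percentile(values, pct):
    """Percentile using linear interpolation between closest ranks.

    The interpolation rule is an assumption; the slides do not specify one.
    """
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of an empty list is undefined")
    rank = (len(ordered) - 1) * pct / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    frac = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def feature_value(grid, pct=75):
    """Drop the outermost rows and columns, then take a percentile of the rest."""
    inner = [row[1:-1] for row in grid[1:-1]]
    flat = [value for row in inner for value in row]
    return percentile(flat, pct)


# The 16 intensities from the slide, laid out as a 4 x 4 grid.
grid = [
    [83, 112, 96, 32],
    [47, 382, 165, 87],
    [55, 246, 140, 93],
    [104, 552, 187, 65],
]

# Inner values are 382, 165, 246 and 140.
print(feature_value(grid))  # 280.0 under the assumed interpolation rule
```

The inner 2 x 2 block is the same whether the 16 values are read row by row or column by column, so the result does not depend on the reading order.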
How Features Are Chosen

Problem #2: Probes have a maximum length of 25-60 nucleotides:
– Solution: Use multiple features per gene
– Affymetrix claims that this redundancy actually improves detection and
quantification of the target gene
[Figure: multiple 25-mer oligo probes (features) chosen along the gene sequence, 5’ to 3’]
Feature Mismatches

Problem #3: Biologic processes such as hybridization are stochastic
– Solution: Include a “control” for each probe – a DNA sequence which differs
only slightly from the feature
– In a 25-mer, the mismatch sequence differs only at the 13th position, where the base is swapped (A↔T or G↔C)
[Figure: multiple 25-mer oligo probes along the gene sequence, 5’ to 3’, each with a Perfect Match and a Mismatch version]
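As a small illustration of the perfect match / mismatch pairing, the sketch below derives a mismatch probe from a perfect-match 25-mer by swapping the base at position 13 (A↔T, G↔C). The function name `mismatch_probe` and the example sequence are hypothetical, not taken from any real array design.

```python
# Base swaps used for the mismatch position (A<->T, G<->C).
SWAP = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mismatch_probe(perfect_match, position=13):
    """Return a copy of the probe with the base at the 1-based position swapped."""
    index = position - 1
    if not 0 <= index < len(perfect_match):
        raise ValueError("position lies outside the probe")
    swapped = SWAP[perfect_match[index].upper()]
    return perfect_match[:index] + swapped + perfect_match[index + 1:]


pm = "ACGTACGTACGTACGTACGTACGTA"  # hypothetical 25-mer
mm = mismatch_probe(pm)
print(pm)
print(mm)  # identical except for the 13th base
```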
Background Noise Removal





Problem #4: Background light may skew the fluorescence
Noise = “measure of non-specific fluorescence attributed to hybridization
conditions and sample”
Solution: Estimate the background noise and subtract it from the intensity
The array is divided into equal sectors (16 is standard)
For each sector
– Find the lowest feature intensities (2%)
– Average these
– Subtract this average from the intensity value of all features in the sector
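The per-sector procedure above might look like the following in Python. This is a rough sketch under stated assumptions: the array is a list of rows of feature intensities, the 16 sectors form a 4 x 4 grid, and at least one feature per sector is always used for the background estimate. The name `subtract_background` and its parameters are illustrative.

```python
def subtract_background(intensities, sectors_per_side=4, lowest_fraction=0.02):
    """Subtract a per-sector background estimate from every feature intensity.

    Assumes a rectangular list of rows; 4 x 4 sectors give the 16 sectors
    mentioned on the slide. Returns a new grid and leaves the input untouched.
    """
    rows, cols = len(intensities), len(intensities[0])
    result = [list(row) for row in intensities]
    for sr in range(sectors_per_side):
        r0 = sr * rows // sectors_per_side
        r1 = (sr + 1) * rows // sectors_per_side
        for sc in range(sectors_per_side):
            c0 = sc * cols // sectors_per_side
            c1 = (sc + 1) * cols // sectors_per_side
            values = sorted(
                intensities[r][c] for r in range(r0, r1) for c in range(c0, c1)
            )
            if not values:
                continue  # sector is empty when the array is very small
            # Lowest 2% of the sector, but never fewer than one feature
            # (the one-feature minimum is an assumption of this sketch).
            count = max(1, int(len(values) * lowest_fraction))
            background = sum(values[:count]) / count
            for r in range(r0, r1):
                for c in range(c0, c1):
                    result[r][c] = intensities[r][c] - background
    return result


# Toy 8 x 8 array: each of the 16 sectors holds only 4 features, so the
# background of a sector is simply its lowest intensity.
raw = [[100 + r * 8 + c for c in range(8)] for r in range(8)]
corrected = subtract_background(raw)
print(corrected[0][:4])
```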
Average Difference Intensity


Problem #5: How do we decide if / how strongly a particular gene is
being expressed?
For a given gene
– For each feature match/mismatch pair for the given gene
• Calculate the difference PM – MM
– Calculate the mean μ and standard deviation σ of this set
– Remove outliers from set
• Ex: abs((PM – MM) – μ) > 3σ
– The average (PM – MM) difference over the set (minus outliers) is the
average difference intensity
– This value can be used to compare expression levels for the gene which
the features represent
$$\mathrm{AvgDiff} = \frac{1}{N} \sum_{i \in \text{pairs in avg}} (PM_i - MM_i), \quad N = \#\,\text{pairs in avg}$$
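A minimal Python sketch of the average difference calculation follows. It assumes the population standard deviation is used for σ and that a pair is dropped when its difference lies more than 3σ from the mean; the slides do not pin down either detail. The function name `average_difference` and the sample data are made up for the example.

```python
from statistics import mean, pstdev


def average_difference(pairs, k=3.0):
    """Mean of PM - MM over all probe pairs after removing outliers.

    pairs is a list of (PM, MM) intensities. A pair is treated as an
    outlier when its difference is more than k standard deviations from
    the mean difference (population standard deviation assumed).
    """
    diffs = [pm - mm for pm, mm in pairs]
    mu = mean(diffs)
    sigma = pstdev(diffs)
    kept = [d for d in diffs if abs(d - mu) <= k * sigma]
    return mean(kept)


# Hypothetical probe set: 19 well-behaved pairs and one extreme pair.
pairs = [(300, 200)] * 19 + [(2500, 500)]
print(average_difference(pairs))  # the extreme pair is dropped
```

With a population standard deviation, a single value can only exceed 3σ when the set has more than 10 pairs, so very small probe sets never lose a pair under this rule.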
Positive & Negative Probe Pairs



Problem #5: How do we decide if / how strongly a particular gene is being
expressed?
For each perfect match/mismatch feature pair in the gene, perform a
standard difference and ratio test
Example ratio (SRT) and difference (SDT) thresholds:
– SRT ≈ 1.5
– SDT ≈ a multiple of intensity μ or σ
If both PM/MM ≥ SRT and PM – MM ≥ SDT are true,
mark probe pair as positive evidence
If both MM/PM ≥ SRT and MM – PM ≥ SDT are true,
mark probe pair as negative evidence
Otherwise,
mark probe pair as inconclusive
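The ratio and difference tests can be written as a short Python function. This is a sketch only: the default SDT of 50 is an arbitrary placeholder (the slides describe it as a multiple of an intensity statistic), intensities are assumed to be positive, and the name `classify_probe_pair` is hypothetical.

```python
def classify_probe_pair(pm, mm, srt=1.5, sdt=50.0):
    """Label one PM/MM pair as positive, negative or inconclusive.

    srt is the ratio threshold and sdt the difference threshold.
    The default sdt is a placeholder, not a recommended value.
    """
    if mm > 0 and pm / mm >= srt and pm - mm >= sdt:
        return "positive"
    if pm > 0 and mm / pm >= srt and mm - pm >= sdt:
        return "negative"
    return "inconclusive"


print(classify_probe_pair(300, 100))  # positive
print(classify_probe_pair(100, 300))  # negative
print(classify_probe_pair(120, 100))  # inconclusive
```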
Voting Methods for Absolute Call

Problem #5: How do we decide if / how strongly a particular gene is
being expressed?
– Solution: Use decision matrix to make absolute call



– Positive/negative ratio: PNR = # pos. calls / # neg. calls
– Positive fraction: PF = # pos. calls / # probe pairs
– Log average ratio: LA = 10 x avg. ( log (PM/MM) )
VOTE!

Metric   Absent/Marginal threshold   Marginal/Present threshold
PNR      3.00                        4.00
PF       0.33                        0.43
LA       0.90                        1.30
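One way to turn the three metrics and their thresholds into a call is sketched below. Several details are assumptions, since the slides only say “VOTE!”: each metric casts one vote, a call needs at least two matching votes (otherwise Marginal), values equal to a threshold fall into the higher category, and the logarithm is base 10. All function names are illustrative.

```python
import math

# (Absent/Marginal threshold, Marginal/Present threshold) from the slide.
THRESHOLDS = {"PNR": (3.00, 4.00), "PF": (0.33, 0.43), "LA": (0.90, 1.30)}


def call_metrics(pairs, calls):
    """Compute PNR, PF and LA for one gene.

    pairs is a list of (PM, MM) intensities, calls the matching list of
    'positive' / 'negative' / 'inconclusive' labels. log10 is assumed.
    """
    pos = calls.count("positive")
    neg = calls.count("negative")
    if neg:
        pnr = pos / neg
    else:
        pnr = math.inf if pos else 0.0  # no negative calls at all
    pf = pos / len(calls)
    la = 10 * sum(math.log10(pm / mm) for pm, mm in pairs) / len(pairs)
    return {"PNR": pnr, "PF": pf, "LA": la}


def vote(value, low, high):
    """Map one metric value to a single vote."""
    if value < low:
        return "Absent"
    if value < high:
        return "Marginal"
    return "Present"


def absolute_call(metrics):
    """Majority vote over the three metrics (the voting rule is an assumption)."""
    votes = [vote(metrics[name], *THRESHOLDS[name]) for name in THRESHOLDS]
    for call in ("Present", "Absent"):
        if votes.count(call) >= 2:
            return call
    return "Marginal"


# Hypothetical gene with five probe pairs.
pairs = [(400, 100), (300, 100), (250, 100), (120, 100), (90, 100)]
calls = ["positive", "positive", "positive", "inconclusive", "inconclusive"]
print(absolute_call(call_metrics(pairs, calls)))  # Present
```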
Average Difference and Absolute Call



Problem #5: How do we decide if / how strongly a particular gene is
being expressed?
Which of these do you base a decision on, for whether a gene is being
expressed?
– Use the absolute call to decide whether a particular gene is being expressed

Metric   Absent/Marginal threshold   Marginal/Present threshold
PNR      3.00                        4.00
PF       0.33                        0.43
LA       0.90                        1.30

– Use average difference to compare how strongly a gene which is
present is expressed
$$\mathrm{AvgDiff} = \frac{1}{N} \sum_{i \in \text{pairs in avg}} (PM_i - MM_i), \quad N = \#\,\text{pairs in avg}$$
Comparison Analysis


Compare probe sets between two gene chips to determine whether gene
expression increased, did not change, or decreased
Comparison analysis has its own set of problems:
– The signals must be adjusted (if necessary) to normalize average signal levels

For each perfect match/mismatch feature pair in the gene, perform a
difference and ratio test
If both true, mark probe pair as evidence of increase from base
– (PM/MM)experiment – (PM/MM)base ≥ Change Threshold
– (PM – MM)experiment / (PM – MM)base ≥ Percentage Change Threshold
If both true, mark probe pair as evidence of decrease from base
– (PM/MM)base – (PM/MM)experiment ≥ Change Threshold
– (PM – MM)base / (PM – MM)experiment ≥ Percentage Change Threshold
Otherwise mark probe pair as unchanged
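The comparison tests for a single probe pair can be sketched in Python as follows. The default threshold values are placeholders chosen for the example, MM intensities are assumed to be positive, and the guard against a non-positive (PM – MM) denominator is an addition of this sketch rather than something stated on the slide. The name `classify_change` is hypothetical.

```python
def classify_change(base, experiment, change_threshold=1.0, percent_threshold=1.5):
    """Label one probe pair as increase, decrease or unchanged.

    base and experiment are (PM, MM) intensities for the same probe pair
    on the two chips. Both default thresholds are placeholder values.
    """
    pm_b, mm_b = base
    pm_e, mm_e = experiment
    ratio_b, ratio_e = pm_b / mm_b, pm_e / mm_e
    diff_b, diff_e = pm_b - mm_b, pm_e - mm_e
    # The diff > 0 guards avoid dividing by zero or a negative difference.
    if (diff_b > 0
            and ratio_e - ratio_b >= change_threshold
            and diff_e / diff_b >= percent_threshold):
        return "increase"
    if (diff_e > 0
            and ratio_b - ratio_e >= change_threshold
            and diff_b / diff_e >= percent_threshold):
        return "decrease"
    return "unchanged"


print(classify_change((200, 100), (500, 100)))  # increase
print(classify_change((500, 100), (200, 100)))  # decrease
print(classify_change((200, 100), (210, 100)))  # unchanged
```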
Voting Methods for Comparison Call





– Increase fraction: IF = # increase calls / # PP used
– Increase ratio: IR = # increase calls / # decrease calls
– Log average ratio change: LAC = LAexp – LAbase

Metric   No Change/Marginal threshold   Marginal/Increase threshold
IF       0.33                           0.43
IR       3.0                            4.0
LAC      0.90                           1.30

If a change is called, use the average difference to measure percent change
Are there better ways to extract patterns from multivariate gene expression
profiles?
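The three comparison metrics are simple to compute once each probe pair has been labeled. The sketch below assumes “PP used” means all probe pairs passed in and that LA has already been computed for both chips; the name `comparison_metrics` and the sample numbers are made up for illustration.

```python
import math


def comparison_metrics(calls, la_experiment, la_base):
    """Compute IF, IR and LAC for one gene.

    calls is a list of 'increase' / 'decrease' / 'unchanged' labels, one
    per probe pair used; la_experiment and la_base are the log average
    ratios of the two chips.
    """
    inc = calls.count("increase")
    dec = calls.count("decrease")
    if dec:
        increase_ratio = inc / dec
    else:
        increase_ratio = math.inf if inc else 0.0  # no decrease calls
    return {
        "IF": inc / len(calls),
        "IR": increase_ratio,
        "LAC": la_experiment - la_base,
    }


# Hypothetical gene: 6 increase, 2 decrease and 2 unchanged probe pairs.
calls = ["increase"] * 6 + ["decrease"] * 2 + ["unchanged"] * 2
print(comparison_metrics(calls, la_experiment=2.5, la_base=1.0))
```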
Does Moore’s Law apply to Gene Chips?

[Figure: density (genes/chip, axis 0 to 200,000) by year, 1994 to 2004]
[Figure: cost (dollars/gene, axis 0 to 1.2) by year, 1994 to 2004]
Ideally, we would like to fit all of an organism’s genes on one chip
– Current estimates for humans are between 30,000 and 40,000 genes
Field-Programmable Microarrays?

Nanogen has produced a silicon chip embedded with 100
“programmable” probe pads
– 80 µm platinum pads (each spaced about 200 µm apart)
– A voltage (-1.3 to 2.0 V) can be applied to each pad

Since DNA carries a negative charge, applying a positive charge on a
pad “corrals” DNA onto that spot
– This is used to build custom arrays by washing the chip in a single-stranded
DNA solution, biasing the desired spot on the chip, and then chemically
fixing the DNA to that spot

The electric charge is also useful during the hybridization reaction
– Pooling the DNA onto the charged pads speeds up the reaction by a factor of
1000
– Reversing the charge “shakes loose” imperfectly matched DNA, leading to
more accurate results
From the Rumor-Mill

Xeotron Corp: Maskless lithography
– An array of micromirrors is used to direct/block light during fabrication

Motorola: 3D microarrays
– Arrays with a coating of acrylamide gel to allow “certain enzymatic
reactions” to occur that might be important to lab-on-a-chip applications

Motorola: Electrical intensity measures
– Arrays contain embedded circuitry to detect hybridization through a change
in conductance rather than fluorescence

Ciphergen Biosystems Inc. & Packard Instrument Co.: Protein chips
– Microarrays of antibodies (rather than DNA) are created to bind and identify
proteins
Acknowledgements




David Paoletti, Ph.D. Student, BIRG Lab, Wright State University.
Berberich, S, and McGorry, M; GeneChip protocols, Wright State University.
Moore, S K; Making chips to probe genes, IEEE Spectrum, March 2001, 54-60.
GeneChip Gene Expression Algorithm Training, Affymetrix.
Questions ?

The End
