Array quantitation for modeling mutations
affecting RNA, protein interactions & cell
proliferation.
CHI Macroresults through Microarrays 3
George Church 1-May-02
Thanks to the Lipper Center for Computational Genetics
Government and private grant agencies: NHLBI,
NSF, ONR, DOE, DARPA, HHMI, Armenise
Corporate collaborators & sponsors:
Affymetrix, GTC, Mosaic, Aventis, Dupont, Cistran
Post-300 genomes & 3D structures
[Title graphic: the caption is framed by DNA sequence text (gggatttagctcagtt, gggagagcgccagact, gaa, gat, ttg, gag, gtcctgtgttcgatcc, acagaattcgcacca).]
Biosystems Measures & Models
Environment
Metabolites
RNAi
Insertions
SNPs
DNA
RNA
Replication rate
Protein: in vivo
& in vitro
interactions
Microbes
Cancer & stem cells
Darwinian
In vitro replication
Small multicellular organisms
Functional Genomics Challenges
• Systems dynamics and optimality modeling.
• Multiple genetic domains per gene: high density
readout of whole genome mutant phenotypes.
• Multiple RNAs & regulatory proteins per gene.
• Many causative genes & haplotypes per disease.
• Polony RNA exon-typing
• Multiplex in situ RNA & protein analyses
• Automated differentiation
• Homologous recombination genome engineering
Human Red Blood Cell ODE model
200 measured parameters
Jamshidi, Edwards, Fahland, Church, Palsson, B.O. (2001) Bioinformatics 17: 286.
(http://atlas.med.harvard.edu/gmc/rbc.html)
[Figure: red blood cell metabolic network. Node labels include GLCe, GLCi, G6P, F6P, FDP, DHAP, GA3P, 1,3 DPG, 2,3 DPG, 3PG, 2PG, PEP, PYR, LACi, LACe, GL6P, GO6P, RU5P, R5P, X5P, S7P, E4P, R1P, PRPP, ADO, ADOe, ADE, ADEe, INO, INOe, IMP, AMP, ADP, ATP, HYPX, NAD, NADH, NADP, NADPH, GSH, GSSG, K+, Na+, HCO3-, ClpH.]
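The red blood cell model is a system of ordinary differential equations over metabolite concentrations. As a minimal sketch of that style of model (not the published model), the following integrates a hypothetical two-metabolite pathway with mass-action kinetics. The pathway, the names `v_in`, `k1`, `k2`, and all values are illustrative assumptions.

```python
# Toy kinetic ODE model: influx -> S -> P -> efflux.
# Hypothetical pathway and parameters; NOT the published red blood cell model.

def derivatives(state, v_in, k1, k2):
    """Mass-action rates: dS/dt = v_in - k1*S, dP/dt = k1*S - k2*P."""
    s, p = state
    return (v_in - k1 * s, k1 * s - k2 * p)

def rk4_step(state, dt, *params):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(b + h * d for b, d in zip(base, slope))
    a = derivatives(state, *params)
    b = derivatives(shifted(state, a, dt / 2), *params)
    c = derivatives(shifted(state, b, dt / 2), *params)
    d = derivatives(shifted(state, c, dt), *params)
    return tuple(
        x + dt / 6 * (da + 2 * db + 2 * dc + dd)
        for x, da, db, dc, dd in zip(state, a, b, c, d)
    )

def simulate(state, t_end, dt, *params):
    """Integrate from t = 0 to t_end and return the final state."""
    steps = int(round(t_end / dt))
    for _ in range(steps):
        state = rk4_step(state, dt, *params)
    return state

# Assumed illustrative values (arbitrary units).
V_IN, K1, K2 = 1.0, 0.5, 0.25
final_s, final_p = simulate((0.0, 0.0), 100.0, 0.01, V_IN, K1, K2)
print(final_s, final_p)
```

At long times this toy system should settle at S = v_in/k1 and P = v_in/k2, which gives a quick check on the integrator before scaling up to a model with hundreds of parameters.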
Modeling suboptimality: Segre, Edwards, Vitkup
Calculated and observed fluxes in wt (Sauer data)
[Figure: FBA fluxes comparison. Scatter plot of calculated LP flux (wt) against observed flux (Sauer wild type), both axes 0 to 200, points numbered 1 to 17. Wild type, C 0.4-limited, CC = 0.97.]
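The CC quoted on the flux comparison reads as a correlation coefficient between calculated and observed fluxes. A minimal sketch of that calculation follows, assuming a Pearson correlation (the slide does not say which coefficient was used). The flux values are made-up placeholders, not the Sauer data.

```python
import math

def pearson_cc(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical calculated vs. observed fluxes (placeholder numbers).
calculated = [100.0, 60.0, 20.0, 150.0, 5.0]
observed = [95.0, 70.0, 25.0, 140.0, 10.0]
print(round(pearson_cc(calculated, observed), 3))
```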
Replication rate of a whole-genome set of
mutants
Badarinarayana, et al. (2001) Nature Biotech. 19: 1060
Replication rate challenge met: multiple
homologous domains
[Figure: probes 1 to 3 across the homologous domains of thrA, metL, and lysC, annotated with selective disadvantage in minimal media; values shown: 1.1, 6.7, 1.8, 1.8, 10.4.]
Multiple mutations per gene
Correlation between two selection experiments
Badarinarayana, et al. (2001) Nature Biotech. 19: 1060
Comparison of selection data with Flux Balance
Optimization predictions on 488 genes
Prediction            Number of genes   Negatively selected   Not negatively selected
essential                   143                 80                     63
reduced growth rate          46                 24                     22
non essential               299                119                    180
P-value Chi Square = 0.004
Slide annotations: "Position effects, toxin accumulation, non-opt?"; "Novel duplicates?"
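The chi-square P-value can be checked from the counts on this slide. The sketch below assumes a standard Pearson chi-square test of independence on the 3 x 2 table (prediction class by selection outcome); the slide does not state the exact test used. For 2 degrees of freedom the chi-square tail probability has the closed form exp(-x/2), so no statistics library is needed.

```python
import math

# Rows: essential, reduced growth rate, non essential (predictions).
# Columns: negatively selected, not negatively selected (counts from the slide).
observed_counts = [
    [80, 63],
    [24, 22],
    [119, 180],
]

def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - expected) ** 2 / expected
    return stat

chi2 = chi_square_statistic(observed_counts)
# (3 - 1) * (2 - 1) = 2 degrees of freedom, where P(X > x) = exp(-x / 2).
p_value = math.exp(-chi2 / 2)
print(round(chi2, 2), round(p_value, 3))
```

Under these assumptions the result comes out near the 0.004 quoted on the slide.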
Biosystems Measures & Models
Environment
Metabolites
RNAi
Insertions
SNPs
DNA
RNA
Replication rate
microbes
cancer & stem cells
In vitro replication
small multicellular organisms
Protein: in vivo
& in vitro
interactions
RNA quantitation issues
Small fold changes in RNA are important.
Example: 1.5-fold in trisomies.
Cross-hybridizing RNAs.
Alternative RNAs, gene families.
Mixed tissues.
In situ hybridization has low multiplex.
Gene Expression database
Aach, Rindone, Church, (2000) Genome Research 10: 431-445.
• Microarrays (1)
  • R/G ratios
  • R, G values
  • quality indicators
• Affymetrix (2)
  • Averaged PM-MM
  • “presence”
  • feature statistics
  • 25-mers
• Lynx-MPSS (3), SAGE (4)
  • Counts of 14-mer sequence tags for each ORF
[Figure labels: experiment, ORF, control, PM, MM, agactagcag]
1 DeRisi, et al., Science 278:680-686 (1997)
2 Lockhart, et al., Nat Biotech 14:1675-1680 (1996)
3 Brenner, et al., Massively Parallel Signature Sequencing, Nat Biotechnol. 18:630-4 (2000)
4 Velculescu, et al., Serial Analysis of Gene Expression, Science 270:484-487 (1995)
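Two of the quantities listed for the database are simple to state in code: the averaged PM-MM difference for a probe set and the R/G ratio for a two-color spot. This is a minimal sketch with hypothetical intensities; the function names are illustrative and are not taken from the database or from any vendor software.

```python
import math

def average_difference(pm, mm):
    """Mean of (perfect match - mismatch) intensities over a probe set."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("PM and MM must be non-empty and the same length")
    return sum(p - m for p, m in zip(pm, mm)) / len(pm)

def log2_ratio(red, green):
    """log2 of the red/green intensity ratio for one two-color spot."""
    return math.log2(red / green)

# Hypothetical probe-set intensities (arbitrary units).
pm_values = [520.0, 610.0, 480.0, 700.0]
mm_values = [210.0, 250.0, 300.0, 240.0]
print(average_difference(pm_values, mm_values))

# A 1.5-fold change, as in the trisomy example, is about 0.585 on a log2 scale.
print(round(log2_ratio(3.0, 2.0), 3))
```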
RNA Cluster Analyses: Cell Cycle
Tavazoie, et al. (1999) Nature Genetics 22: 281.
[Figure: cluster 2, Replication & DNA synthesis (N = 186): expression profile in s.d. from mean; number of MCB and SCB sites by distance from ATG (b.p.); number of ORFs and number of sites per CLUSTER (1 to 30).]
MIPS functional category (total ORFs)    ORFs within functional category (k)    P-value -Log10
DNA synthesis and replication (82)                     23                             16
Cell cycle control and mitosis (312)                   30                              8
Recombination and DNA repair (84)                      11                              5
Nuclear organization (720)                             40                              4
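The -Log10 P-values score how surprising it is to find k ORFs from one functional category inside a cluster. A hypergeometric tail probability is one standard way to compute such a score; the sketch below assumes that model. The genome size `TOTAL_ORFS` is an assumption not stated on the slide, and the example uses k = 23 for the 82-ORF category in the 186-ORF cluster, read from the figure residue.

```python
import math

def hypergeom_tail(k, n, f, g):
    """P(X >= k) when n ORFs are drawn without replacement from g ORFs,
    f of which belong to the functional category."""
    upper = min(n, f)
    favorable = sum(
        math.comb(f, i) * math.comb(g - f, n - i) for i in range(k, upper + 1)
    )
    return favorable / math.comb(g, n)

TOTAL_ORFS = 6220      # assumed genome-wide ORF count (not on the slide)
CLUSTER_SIZE = 186     # N = 186 on the slide
CATEGORY_SIZE = 82     # DNA synthesis and replication (82)
OVERLAP = 23           # k as read from the figure residue

p = hypergeom_tail(OVERLAP, CLUSTER_SIZE, CATEGORY_SIZE, TOTAL_ORFS)
score = -math.log10(p)
print(score)
```

Exact integer binomials keep the tail sum accurate even when the probability is far below what a naive floating-point product would resolve.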
Combining mouse
knockouts with
RNA array analysis
(homeobox gene Crx-/-)
Livesey, Furukawa, Steffen,
Church, Cepko (2000)
Current Biol. 10:301.
Biosystems Measures & Models
Environment
Metabolites
RNAi
Insertions
SNPs
DNA
RNA
Replication rate
microbes
cancer & stem cells
In vitro replication
small multicellular organisms
Protein: in vivo
& in vitro
interactions
Combinatorial arrays for binding constants
Human/Mouse EGR1
HMS: Martha Bulyk, Xiaohua Wang, Martin Steffen
MRC: Yen Choo
ds-DNA
array
[Diagram, built up over two slides: phage displaying combinatorial DNA-binding protein domains, bound to a ds-DNA array. Labels: pIII, pVIII, Antibodies; then Phycoerythrin - 2º IgG.]
Martha Bulyk et al
Interactions of Adjacent Basepairs in EGR1
Zinc Finger DNA Recognition
Isalan et al., Biochemistry (‘98) 37:12026-12033
Wildtype EGR1 Microarray
[Figure labels: high [DNA]; (+) ctrl sequence for wt binding; alignment oligos; etc.]
Motifs weight all 64 Kaapp
[Figure: motifs for Wildtype (RSDHLTT) and variants RGPDLAR, REDVLIR, LRHNLET, KASNLVS. Values shown, in slide order: TGG 2.8 nM; GCG 16 nM; 2.5 nM; TAT 5.7 nM; AAT 240 nM; and the triplet list AAA, AAT, ACT, AGA, AGC, AGT, CAT, CCT, CGA, CTT, TTC, TTT.]
Biosystems Measures & Models
Environment
Metabolites
RNAi
Insertions
SNPs
DNA
RNA
Replication rate
microbes
cancer & stem cells
In vitro replication
small multicellular organisms
Protein: in vivo
& in vitro
interactions
Common diseases: billions of “new” alleles
plus millions of balanced polymorphisms
• 60 new mutations per generation * 5,000 generations since the major
bottleneck(s) that set up the linkage patterns (= 300,000 per genome)
• Each of the 3 Gbp in the genome exists in all SNP forms (A, C, G, T, D):
about 600,000 copies of each SNP on earth (spread over the common haplotypes).
The population frequency will be <0.01%.
(Aach et al, 2001 Nature 409: 856)
• Functional genomics (FG) may provide better leads for
therapies & diagnostics. (Accuracy goal 1 ppb?)
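The allele counts on this slide can be reproduced with a few lines of arithmetic. This is one way to arrive at the slide's figures, not necessarily the derivation used; the world population of about 6 billion is an assumption that does not appear on the slide.

```python
MUTATIONS_PER_GENERATION = 60
GENERATIONS_SINCE_BOTTLENECK = 5_000
GENOME_BP = 3_000_000_000
WORLD_POPULATION = 6_000_000_000  # assumption, roughly the 2002 figure

# New mutations accumulated per genome since the bottleneck.
new_per_genome = MUTATIONS_PER_GENERATION * GENERATIONS_SINCE_BOTTLENECK

# Total new mutations across everyone, spread evenly over genome positions.
copies_per_position = new_per_genome * WORLD_POPULATION // GENOME_BP

# Fraction of people carrying a new allele at a given position.
frequency = copies_per_position / WORLD_POPULATION

print(new_per_genome, copies_per_position, f"{frequency:.2%}")
```

Under these assumptions the upper bound is 0.01%; dividing each position's mutations among several alternative forms pushes any single allele below that, consistent with the slide's "<0.01%".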
Projected costs affect our view of
what is possible.
In 1985, the dawn of the genome project,
$10 per bp, would have been $30B per genome.
In 2002, Perlegen or Lynx: $3M (10^3 bits/$, 4 logs)
In 2001, the cost of video data collection?
10^13 bits/$
Genotyping & functional genomics demand will probably
be as high as permitted by costs.
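The cost figures follow from simple arithmetic. The sketch below assumes a 3 Gbp genome and 2 bits per base pair (four bases); the 2-bit encoding is an assumption used to convert base pairs to bits and is not stated on the slide.

```python
import math

GENOME_BP = 3_000_000_000
BITS_PER_BP = 2  # assumption: four possible bases = 2 bits each

cost_1985 = 10 * GENOME_BP   # $10 per bp
cost_2002 = 3_000_000        # $3M per genome

bits_per_dollar_2002 = GENOME_BP * BITS_PER_BP / cost_2002
logs_of_improvement = math.log10(cost_1985 / cost_2002)

print(cost_1985, bits_per_dollar_2002, logs_of_improvement)
```

This gives $30B for 1985, about 2 x 10^3 bits/$ for 2002, and a cost reduction of 4 orders of magnitude.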
Why lower-cost, high quality
“sequencing”?
Environmental, food, & biodiversity monitoring
Human genome haplotyping
RNA splicing & editing
immune B&T cell receptor spectra
& How?
Femtoliter (10^-15 L) scale & low-cost scanners
Polymerase DNA colonies (polonies)
Fluorescent in situ sequencing (FISSEQ)
Mitra & Church Nucleic Acids Res. 27: e34
Primer A has 5’ immobilizing
(Acrydite) modification.
Single Molecule From Library
Primer is Extended by Polymerase
1st Round of PCR
[Diagram: template strands and extension products labeled with primer sites A’ and B at each stage.]
Sequence polonies by sequential,
fluorescent single-base extensions
1. Remove 1 strand of DNA.
2. Hybridize Universal Primer.
3. Add Red (Cy3) dTTP.
4. Wash; Scan Red Channel
[Diagram: two polony templates with primer sites B and B’, strands marked 3’ and 5’, shown around the dTTP extension step.]
Sequence polonies by sequential,
fluorescent single-base extensions
5. Add Green (FITC) dCTP
6. Wash; Scan Green Channel
[Diagram: the same two polony templates after the dCTP step, with primer sites B and B’ and strands marked 3’ and 5’.]
Primer Extension 26 cycles, 34 Nucleotides
Mean Intensity: 58, 0.5
40, 6.5
0.3, 48
0.4, 43
Polony Template
3’
5’
P’
P
TATTGTTAAAGTGTGTCCTTTGTCGATACTGGTA…5’
A TAACAAT TTCACACAGGAAACAGCTATGAC CAT
FITC ( C )
CY3 ( T )
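The "26 cycles, 34 Nucleotides" figure can be checked against the polony template shown here. The sketch assumes each cycle adds a single nucleotide species and that the primer extends through a whole homopolymer run in that cycle; this is one reading consistent with the slide's numbers, not a statement of the exact protocol.

```python
from itertools import groupby

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Template as printed on the slide, read 3' to 5'.
TEMPLATE_3_TO_5 = "TATTGTTAAAGTGTGTCCTTTGTCGATACTGGTA"

def extension_product(template_3_to_5):
    """Strand synthesized 5' to 3' along a template given 3' to 5'."""
    return "".join(COMPLEMENT[base] for base in template_3_to_5)

def extension_cycles(product):
    """Group the product into homopolymer runs, one run per cycle."""
    return [(base, len(list(run))) for base, run in groupby(product)]

product = extension_product(TEMPLATE_3_TO_5)
cycles = extension_cycles(product)
total_nucleotides = sum(count for _, count in cycles)
print(product)
print(len(cycles), total_nucleotides)
```

Grouping by homopolymer run is what lets fewer cycles than nucleotides cover the read, and it also shows why run lengths would have to be inferred from signal intensity.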
Why lower-cost, high quality
“sequencing”?
Environmental, food, & biodiversity monitoring
•Human genome haplotyping
RNA splicing & editing
immune B&T cell receptor spectra
& How?
Femtoliter (10^-15 L) scale & low-cost scanners
Polymerase DNA colonies (polonies)
Fluorescent in situ sequencing (FISSEQ)
Mitra & Church Nucleic Acids Res. 27: e34
Why lower-cost, high quality
“sequencing”?
Environmental, food, & biodiversity monitoring
Human genome haplotyping
•RNA splicing & editing
immune B&T cell receptor spectra
& How?
Femtoliter (10^-15 L) scale & low-cost scanners
Polymerase DNA colonies (polonies)
Fluorescent in situ sequencing (FISSEQ)
Mitra & Church Nucleic Acids Res. 27: e34
RNA Exon typing
• Single molecules of RNA dispersed.
• Multiplex polonies spanning all likely variable exons.
• Sequential probing of each exon.
Functional Genomics Challenges
• Systems dynamics and optimality modeling.
• Multiple genetic domains per gene: high density
readout of whole genome mutant phenotypes.
• Multiple RNAs & regulatory proteins per gene.
• Many causative genes & haplotypes per disease.
• Polony RNA exon-typing
• Multiplex in situ RNA & protein analyses
• Automated differentiation
• Homologous recombination genome engineering
For more information:
arep.med.harvard.edu