Global Analysis of Genomes and Proteomes Michael Snyder March 2004

>100 Genomes Sequenced

Organism                   Genome Size (Mbp)   # Genes
Mycoplasma genitalium      0.589               470
E. coli                    4.6                 4,500
Saccharomyces cerevisiae   14                  6,000
C. elegans                 100                 19,427
Drosophila melanogaster    180                 15,000
Arabidopsis thaliana       125                 27,000
Rice                       466                 60,000
Humans                     3,000               30,000
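To make the comparison in the genome table concrete, the following is a minimal sketch that computes gene density (genes per Mbp) from the slide's own figures. The dictionary and function names are illustrative only; the numbers are copied from the table and are the slide's approximations, not curated values.

```python
# Genome size (Mbp) and gene count for each organism, copied from the slide.
genomes = {
    "Mycoplasma genitalium": (0.589, 470),
    "E. coli": (4.6, 4500),
    "Saccharomyces cerevisiae": (14, 6000),
    "C. elegans": (100, 19427),
    "Drosophila melanogaster": (180, 15000),
    "Arabidopsis thaliana": (125, 27000),
    "Rice": (466, 60000),
    "Humans": (3000, 30000),
}


def gene_density(size_mbp, n_genes):
    """Return the number of genes per megabase pair."""
    return n_genes / size_mbp


densities = {name: gene_density(size, genes) for name, (size, genes) in genomes.items()}

# Print organisms from the most to the least gene-dense genome.
for name, density in sorted(densities.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:26s} {density:8.1f} genes/Mbp")
```

On these figures the compact microbial genomes carry hundreds of genes per Mbp, while the human genome carries about ten.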
ATGGAGGATCATGGGATTGTAGAAACTTTAAACTTTCTATCATCAACAAAAATCAAAGAGAGAAACAATGCTTTAGATGAGCTAACAACAATTTTAA
AAGAAGATCCGGAAAGGATACCAACCAAGGCCCTATCTACAACGGCAGAAGCTTTGGTAGAGTTACTTGCATCTGAACACACAAAATACTGTGACCT
TCTTCGAAACTTGACAGTGTCAACCACAAACAAGCTATCACTTAGTGAGAACAGACTCTCCACGATATCGTACGTTTTAAGATTATTTGTAGAAAAA
TCATGTGAGAGATTTAAAGTGAAAACGTTGAAGTTACTTTTAGCAGTAGTACCTGAATTAATGGTCAAAGATGGTTCCAAAAGTTTATTGGATGCCG
TTTCAGTACATTTATCGTTTGCTTTGGATGCCCTAATTAAAAGTGACCCTTTCAAACTGAAATTCATGATACACCAATGGATATCCTTAGTCGATAA
AATTTGCGAGTACTTTCAAAGCCAAATGAAATTATCTATGGTAGACAAAACATTGACCAATTTCATATCGATCCTCCTGAATTTATTGGCGTTAGAC
ACAGTTGGTATATTTCAAGTGACAAGGACAATTACTTGGACCGTAATAGATTTTTTGAGGCTCAGCAAAAAAGAAAATGGAAATACGAGATTAATAA
TGTCATTAATAAATCAATTAATTTTGAAGTGCCATTGTTTTAGTGTTATTGATACGCTAATGCTTATAAAAGAAGCATGGAGTTACAACCTGACAAT
TGGCTGTACTTCCAATGAGCTAGTACAAGACCAATTATCACTGTTTGATGTTATGTCAAGTGAACTAATGAACCATAAACTTCCTTATATGATTGGT
CAAGAGAATTATGTTGAAGAGCTTCGGTCCGAATCTCTTGTATCTCTATACCGTGAGTACATTCTACTGCGCTTAAGTAATTATAAGCCTCAATTAT
TTACCGTAAACCATGTGGAATTCTCATATATTCGAGGTTCAAGGGATAAAAATTCATGGTTTGCATTACCTGATTTTAGACTTAGAGATAGGGGAGG
CAGATCGGTGTGGTTAAAAATACTCGGAATTACCAAATCATTGTTAACATATTTTGCATTGAACAGAAAAAATGAAAATTACTCATTATTATTTAAA
AGAAGAAAATGTGATTCGGATATACCTTCTATCCTACGGATTTCTGACGATATGGACACATTTCTTATTCATCTTTTAGAGGAGAACAGCTCACATG
AGTTTGAAGTGCTAGGATTACAATTGTGCTCATTTTATGGAACTTTACAAGACTTCACTAAAAGTTTTGCAGAACAGCTGAAAGAACTTCTGTTTTC
AAAATTCGAAAAAATCCAATGCTTTAATTGGGTTTGTTTTTCTTTTATTCCTTTATTATCCCAAAAAGAATGCGAATTAAGCAATGGCGACATGGCA
CGCCTATTTAAAGTTTGCTTACCATTAGTAAAATCAAATGAATCTTGCCAGTTAAGTTGTCTTTTATTAGCCAACTCCATAAAGTTTTCAAAGCAGC
TTTTATCCGATGAGAAAACTATCAATCAGATATATGATCTTTACGAATTATCCGATATTTTGGGTCCCATATTAGTTACTAATGAATCGTTCATGCT
ATGGGGATACCTTCAGTACGTTGGTAAAGACTTCCAATCTATGAACGGTATATCGTCCGCTGATAGAATTTTTGAGTGGCTAAAATCAAAGTGGAAC
CAGTTGCGCGGAACTGATGCTAAACAGGATCAGTTCTGCAATTTTATATCCTGGTTAGGTAACAAATATGACCCAGAGAACCCTTTCAACGATAAAA
AAGGCGAAGGAGCTAATCCTGTCTCACTATGTTGGGATGAAAGCCACAAGATTTGGCAACATTTTCAAGAGCAGAGGGAATTTCTTTTAGGCGTAAA
ACCAGAAGAAAAGTCAGAATGTTTTAACACTCCCTTTTTTAATTTACCAAAAGTTTCCTTAGACCTCACACGTTATAATGAAATTCTTTACAGATTA
CTGGAAAATATTGAAAGTGATGCATTTTCATCTCCACTACAAAAATTTACTTGGGTAGCAAAATTAATACAAATAGTTGATAATCTTTGTGGAGATT
CCACTTTTTCTGAGTTTATTGCAGCATATAAGAGAACAACCTTAATAACTATTCCACAACTTAGTTTTGATAGCCAAAACTCCTACCAATCATTTTT
TGAGGAGGTTTTATCGATACGGACCATAAATGTAGACCATTTAGTGCTTGACAAAATTAATATGAAGGAAATCGTTAATGATTTTATCAGGATGCAA
AAAAACAAATCTCAAACAGGAACTTCTGCCATCAATTACTTCGAAGCCTCTTCAGAAGACACTACCCAGAATAATAGTCCGTACACAATTGGAGGTA
GATTTCAGAAGCCTCTGCACTCCACTATAGATAAAGCAGTGCGAGCTTACCTATGGTCTTCAAGAAATAAATCCATTTCAGAGCGTTTGGTAGCCAT
ATTGGAATTTTCTGATTGCGTTAGCACAGATGTATTTATATCTTATCTTGGCACTGTTTGCCAGTGGTTAAAACAAGCAATCGGGGAGAAATCTTCT
TACAACAAAATCCTGGAAGAATTCACTGAAGTCTTGGGTGAAAAATTGCTTTGCAACCACTATAGTTCTTCCAATCAAGCTATGCTTTTACTTACAT
CTTATATCGAAGCAATAAGACCTCAATGGTTATCTTACCCCGAGCAGCCTTTGAATTCGGACTGCAATGATATCCTGGACTGGATCATATCTAGATT
TGAGGACAATTCTTTCACTGGTGTGGCCCCTACGGTCAACCTTTCTATGCTGCTGCTTAGCCTACTTCAAAATCATGATCTTTCCCACGGATCAATC
AGAGGTGGGAAGCAGAGAGTCTTTGCAACTTTTATTAAATGCCTGCAAAAGCTAGACTCCTCCAATATTATTAACATAATGAACAGTATTTCGAGTT
ATATGGCCCAAGTGAGCTATAAGAATCAAAGTATCATATTTTATGAGATTAAGAGCTTATTTGGTCCGCCTCAGCAAAGTATTGAAAAGTCCGCTTT
CTACTCTCTTGCAATGTCCATGTTGTCTTTGGTGTCTTACCCAAGCTTAGTTTTTTCTTTGGAGGATATGATGACATACTCTGGCTTCAATCATACT
CGTGCGTTTATCCAACAAGCTCTGAACAAAATTACGGTCGCTTTTCGCTACCAAAACCTTACAGAGCTCTTCGAATATTGTAAGTTTGATTTGATTA
TGTACTGGTTTAACAGAACAAAAGTCCCTACTTCTAAATTGGAGAAAGAATGGGATATATCTCTTTTTGGATTTGCCGATATTCATGAATTTTTAGG
AAGATACTTTGTAGAAATTTCTGCAATCTACTTTTCTCAAGGTTTCAACCAAAAATGGATCTTAGACATGTTACACGCGATTACTGGAAACGGTGAT
GCTTATCTGGTGGATAACAGCTATTACTTGTGTATTCCACTTGCCTTTATCAGTGGCGGTGTGAATGAACTAATATTTGATATATTGCCCCAAATAT
Genomics and Proteomics Projects
• Gene Disruption
• Protein-Protein Interactions
• Bioinformatics
• Gene & Protein Expression
• Identify Genes & Proteins
• Protein Localization
• Gene Regulation
• Biochemical Genomics
• Structural Genomics
S. cerevisiae
• 6000 Protein Coding Genes
• 2/3 of Yeast Proteins Homologous to those of Vertebrates
Yeast Localizome
• >4000 Proteins Localized
• ~1400 Nuclear
• 600 Chromosomal
• Find All Targets
[Figure: Lys21 localization; HA and DAPI panels]
Yeast ChIP-chip
Epitope-tagged and Untagged strains → Crosslink → Lyse → Sonicate → IP → Reverse X-links → Label → Hybridize to Intergenic Array
[Figure label: Nonspecific DNA]
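The final hybridization step compares signal from the epitope-tagged strain against the untagged control. The following is a minimal sketch of how such a comparison might be scored, assuming one intensity value per intergenic region in each channel. The function names, the pseudocount, the median centering, and the 2-fold threshold are all illustrative assumptions, not the analysis used in the talk.

```python
import math
from statistics import median


def log2_ratios(tagged, untagged, pseudocount=1.0):
    """Log2 ratio of tagged-IP to untagged-control signal for each region."""
    return {
        region: math.log2((tagged[region] + pseudocount) / (untagged[region] + pseudocount))
        for region in tagged
    }


def call_targets(tagged, untagged, min_log2=1.0):
    """Return regions enriched at least 2**min_log2 fold over the array median."""
    ratios = log2_ratios(tagged, untagged)
    center = median(ratios.values())  # most regions are assumed to be unbound
    return sorted(region for region, value in ratios.items() if value - center >= min_log2)


# Toy intensities for five hypothetical intergenic regions.
tagged = {"iA": 1999.0, "iB": 499.0, "iC": 520.0, "iD": 480.0, "iE": 510.0}
untagged = {"iA": 499.0, "iB": 499.0, "iC": 499.0, "iD": 499.0, "iE": 499.0}

targets = call_targets(tagged, untagged)
print(targets)
```

Centering on the median treats the bulk of the array as background, so only regions that stand out from nonspecific DNA signal are reported.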
Swi4 ChIP Chip
Summary of Swi4-Binding Targets
163 Total Intergenic Regions
  40% Neighbor an ORF with G1/S periodicity of expression
  70% Contain one or more SCB
181 Potential Gene Targets
  28 Involved in cell wall maintenance (ERG1)
  12 Involved in cell cycle control (CLN1)
  9 Involved in cell polarity and morphogenesis (CLA4)
  3 Involved in DNA synthesis/repair (POL1)
  13 Transcription factors
  4 Histones
  7 Involved in multi-drug resistance
  3 Involved in microtubule function
  102 Other/unknown function
The G1/S Transcription Network
Species Variation
Different Genes vs
Differential Gene Expression
Conserved Morphogenic Pathways
S. cerevisiae
  Pathways: MAPK Signaling Pathway, cAMP Signaling Pathway
  Factors: Ste12p, Tec1p, Sok2p
  Outcome: Pseudohyphal growth
C. albicans
  Pathways: MAPK Signaling Pathway, cAMP Signaling Pathway
  Factors: Cph1p, Tec1p, Efg1p, Cph2p
  Outcome: Dimorphic growth and virulence
chIP chip Target Genes
• Ste12 – 112 targets
• Tec1 – 144 targets
• Sok2 – 207 targets
[Figure: fraction of targets per GO category (Transcription; Cell Wall; Cell cycle; Budding, polarity & morphogenesis), chIP hits vs. genome]
S. cerevisiae Targets: Ste12, Tec1, Sok2
C. albicans chIP chip
• Cph2 – 620 targets
• Cph1 (Ste12) – 433 targets
• Efg1 (Sok2) – 589 targets
• Tec1 (Tec1) – 359 targets
[Figure: fraction of targets per GO category (Pathogenicity; Transcription; Cell wall; Cell cycle; Morphogenesis), chIP hits vs. genome]
Conserved factors bind to different target genes
[Figure: Fraction of homologous genes (axis 0 to 0.7) for each Homolog Family: Genome, Cph1-Ste12, Efg1-Sok2, Tec1-Tec1]
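One way a "fraction of homologous genes" could be computed is sketched below: take the targets of a factor in one species, map them to their homologs in the other species, and ask how many are also bound by the homologous factor there. This is an assumed definition for illustration, not necessarily the calculation behind the chart, and every gene name in the example is hypothetical.

```python
def conserved_fraction(targets_a, targets_b, orthologs):
    """Fraction of species-A targets (with a known ortholog) whose ortholog
    is also a target of the homologous factor in species B."""
    mapped = [orthologs[gene] for gene in targets_a if gene in orthologs]
    if not mapped:
        return 0.0
    shared = sum(1 for gene in mapped if gene in targets_b)
    return shared / len(mapped)


# Hypothetical target sets and ortholog map (placeholder identifiers).
ca_targets = {"caG1", "caG2", "caG3", "caG4", "caG5"}
sc_targets = {"scG1", "scG9"}
orthologs = {"caG1": "scG1", "caG2": "scG2", "caG3": "scG3", "caG4": "scG4"}

frac = conserved_fraction(ca_targets, sc_targets, orthologs)
print(f"{frac:.2f} of mapped targets are shared")
```

Genes without a mapped ortholog (here `caG5`) are left out of the denominator, so the fraction reflects only genes that could be conserved in principle.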
Combinatorial Binding of Factors
S. cerevisiae: Ste12, Tec1, Sok2
C. albicans: Cph1, Cph2, Efg1, Tec1
Conserved (core) vs. C. albicans-specific targets
[Figure: for the C. albicans genome, All chIP targets, and Cph1, Tec1, Efg1, and Cph2 targets, the share of genes classed as core, C.a specific, C.a and S.c only, or core but NOT S.c]
Chromosome 22 Genomic DNA Array: 21,024 PCR products, 820 bp ave. size
Hybridized to Placental polyA+ RNA
50% of Transcribed Regions Are Not Annotated
Transcriptional Activity of Chromosome 22
[Figure: hybridization in annotated and unannotated regions]
Many Unannotated Hybridizing Sequences are Conserved in the Mouse
Mapping NFKB Binding Sites on Chromosome 22
209 Binding Sites
Examples of NFKB Targets
• PIK4CA: Up Regulated
• BASC/MKL1: Down Regulated
• TXN2: No Change
[Figure: binding-site maps for each gene, 5' end marked]
Potential p65 Targets on Ch22
• PDGF
• MIF
• TIMP3
• ATF4
• BIK (Bcl2 interacting killer)
• EWSR1
• IL2R-β
• PPAR
Location of NFKB Binding Sites Relative to Genes

Location                             % of sites
5 kb upstream                        27%
10 kb upstream                       10%
1st intron                           12%
other intron                         27%
exon                                 1%
novel transcript                     1%
5 kb proximal to novel transcript    16%
unannotated                          6%
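As a quick arithmetic check on the location breakdown above, the sketch below sums the slide's percentages and groups them into broader classes. The grouping into "upstream" and "intronic" is an illustrative choice, not one made on the slide.

```python
# Percentage of NFKB binding sites in each location class, copied from the slide.
location_pct = {
    "5 kb upstream": 27,
    "10 kb upstream": 10,
    "1st intron": 12,
    "other intron": 27,
    "exon": 1,
    "novel transcript": 1,
    "5 kb proximal to novel transcript": 16,
    "unannotated": 6,
}

total = sum(location_pct.values())

# Illustrative groupings of the slide's categories.
upstream = location_pct["5 kb upstream"] + location_pct["10 kb upstream"]
intronic = location_pct["1st intron"] + location_pct["other intron"]

print(f"total: {total}%  upstream of genes: {upstream}%  intronic: {intronic}%")
```

The categories account for all sites, and on these figures more sites fall within introns than in the upstream regions combined.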
ChIP chip Summary
• Map binding sites of transcription factors in yeast and humans
• Determination of binding site targets provides new insights into biological functions of factors
• In humans, binding sites lie in many locations relative to target genes
• Regulatory circuits in related organisms can be highly divergent even though the processes and regulators themselves may be highly conserved
Two Types of Protein Microarrays
• Antibody Microarrays: Antigens
• Functional Protein Microarrays: Protein-Protein Interactions, Small Molecule Interactions, Enzymatic Assays (ATP → ADP)
Yeast Protein Kinases
122 Protein Kinase Homologs
- All Members of Ser/Thr Family
- 24 Uncharacterized
Producing the Yeast Proteome
GST-His6::ORF1
5,800 expression clones (93.7%)
~80% full-length proteins
[Figure: gel with molecular weight markers (kDa): 250, 175, 105, 75, 60, 55, 35, 20]
Printing the Yeast Proteome
GST:P1
GST:P2
GST:P3
Source Plate
Protein-Protein
Protein-Lipid
Protein-DNA
The Yeast Proteome Chip
Probed With Anti-GST Antibodies
[Figure: panels A to C; chip images with 2 mm scale bar and a histogram of Number of Spots]
Screens Thus Far
• 15 Protein-Protein Interactions
• 8 Protein-Lipid Interactions
• 3 Nucleic Acids (dsDNA, ssDNA, polyA-mRNA)
• 4 Small Molecule Screens
• 3 Posttranslational Modifications
• 14 Antibodies
Biochemical Assays on Proteome Chips
Probes: α-GST, Calmodulin, PI(3)P, PI(4,5)P2
Calmodulin-Binding Proteins
• 12 Known or Suspected Targets
• 33 New Binding Proteins
• Derived New Consensus Binding Site
[Figure: sequence logo of the consensus binding site]

Protein          Aligned sequence
YFL003C/MSH4     LKETLQSVKSLKDAL
YJR073C/OPI3     HSVDLQSSKFQLAIV
YBR050C/REG2     DEHFIQRLPSTRLNS
YNL202W/SPS19    AKIPLQRLGSTRDIA
YOL016C/CMK2     DDLRLQSQKKGGELT
YBR011C/IPP1     LNPIIQDTKKGKLRF
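To illustrate how a consensus can be read off an alignment, the sketch below tallies residues column by column across the six aligned sequences listed above (with the extraction spacing removed). It assumes the proteins and sequences pair up in the order listed, and it uses a simple majority rule with a hypothetical 50% cutoff; it is not the method used to derive the consensus reported in the talk.

```python
from collections import Counter

# Aligned 15-residue segments, paired with proteins in the order listed.
aligned = {
    "YFL003C/MSH4": "LKETLQSVKSLKDAL",
    "YJR073C/OPI3": "HSVDLQSSKFQLAIV",
    "YBR050C/REG2": "DEHFIQRLPSTRLNS",
    "YNL202W/SPS19": "AKIPLQRLGSTRDIA",
    "YOL016C/CMK2": "DDLRLQSQKKGGELT",
    "YBR011C/IPP1": "LNPIIQDTKKGKLRF",
}


def column_counts(seqs):
    """Count the residues found at each alignment position."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must have equal length")
    return [Counter(s[i] for s in seqs) for i in range(length)]


def consensus(seqs, min_fraction=0.5):
    """Majority residue per column, or 'x' where no residue reaches the cutoff."""
    out = []
    for counts in column_counts(seqs):
        residue, n = counts.most_common(1)[0]
        out.append(residue if n / len(seqs) >= min_fraction else "x")
    return "".join(out)


seqs = list(aligned.values())
motif = consensus(seqs)
print(motif)
```

The only invariant position in these six segments is the glutamine at position 6, preceded by leucine or isoleucine, which matches the IQ-like signal visible in the logo.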
Identification of Drug Targets
[Figure: pathway diagram with Nutrient, Rapa, Fpr1p, Tor1/2p, Drug (SMIR), and an unknown target (???); outputs: Translation Arrest, Glycogen Accumulation, G1 Arrest]
SMIR3: 8 Targets
SMIR4: 30 Targets
J. Huang, H. Zhu, S. Schreiber, M. Snyder
Identification of New DNA Binding Activities
Cy3 labeled genomic DNA
Probe proteome chip
Summary of Genomic DNA Screen
• ~200 Proteins bound DNA probe
• 8 Novel candidates analyzed by ChIP chip
  – 5 No loci enriched
  – 3 Showed enrichment: Mtw1, Dig2, Arg5,6
Arg5,6 ChIP chip Targets

NAME             Location      Chromosome
15S rRNA         3' end        Mitochondria
COX1             upstream      Mitochondria
COX1             1st exon      Mitochondria
COX1             1st intron    Mitochondria
COX1             2nd intron    Mitochondria
COX1             3rd intron    Mitochondria
COX1             4th exon      Mitochondria
COX1             5th exon      Mitochondria
COX1             6th exon      Mitochondria
COX1             last intron   Mitochondria
COX1             last exon     Mitochondria
COB1             1st exon      Mitochondria
COB1             4th intron    Mitochondria
COB1             6th exon      Mitochondria
COX3             Internal      Mitochondria
THI13/YDL244w    Upstream      Chromosome 4
RIM8             Upstream      Chromosome 7
YGL015c/PUF4     Upstream      Chromosome 7
YHL046c          Upstream      Chromosome 8
YLL064c          Upstream      Chromosome 12
PHO23            Upstream      Chromosome 14
MEK1/YOR352w     Upstream      Chromosome 15
Arg5,6 Binds DNA In Vitro
[Figure: A, Purified Arg5,6 proteins (gel, kDa markers 160, 105, 75, 50, 35); B, 5'-COX1 + GST::Arg5,6; C, 5'-COX1 + GST; D, EBNA DNA + GST::Arg5,6. Panels B to D are labeled in bp (DNA size) and nM (protein concentration)]
Arg5,6 Targets Require Arg5,6 Protein for Expression
[Figure: Fold Enrichment, arg5,6Δ / WT (axis -2.50 to 7.50), in three Media (Rich, Nitrogen Depletion, AA Starvation) for Cox1Gen, Cox1Proc, Cob1Gen, Cob1Proc, Cox3, Puf4, Yor352w, Yhl045w, 21s, Cox2, Act1]
Antibody Probing of the Yeast Proteome Microarray

Antibody                              # of +s
Monoclonal (3 Yeast + 3 Control)
  α-Sed3, α-Cox4                      1
  α-Pep12                             4
Anti-Peptide Polyclonal (6)
  α-Hda1                              8
  α-Mad2                              1
Anti-FL Protein Polyclonal (2)
  α-Nap1                              1770
  α-Cdc11                             7

[Figure: chip images probed with α-Sed3p and Anti-Nap1; spots labeled Cdc11, Sed3, Mad2]
Protometrix
Kinase Assay on a Proteome Chip
• 33P-γ-ATP labeling
• 41 positives
• High resolution
• Quantitative & sensitive
• Low background
• Little reagent needed
Kinase Signaling Network
[Figure: network diagram linking Kinases A to E with Proteins 1 to 8]
Acknowledgments
ChIP Chip - Yeast
Christine Horak
Vishy Iyer
Pat Brown
Anthony Borneman
Haiyuan Lu
Nick Luscombe
Jiang Qian
Mark Gerstein
Human Chromo 22
Ghia Euskirchen
John Rinn
Becky Goetsch
Ken Nelson
Steve Hartman
Sherman Weissman
Fred Sayward
Perry Miller
Nick Luscombe
Tom Royce
Mark Gerstein
Protein Chips
Heng Zhu
Metin Bilgin
Rhonda Bangham
Dave Hall
Antonio Casamayor
Scott Bidlingmaier
Ghil Jona
Geeta Devgan
Jason Ptacek
Informatics
Paul Bertone
Ron Jansen
Ning Lan
Xiaowei Zhu
Mark Gerstein
Small Molecule
Jing Huang
Stuart Schreiber
Protometrix
Greg Michaud
Michael Salcius
Fang Zhou
Rhonda Bangham
Jaclyn Bonin
Barry Schweitzer
Paul Predki
http://bioinfo.mbb.yale.edu/proteinchip