Global Analysis of Genomes and Proteomes
Michael Snyder
March 2004

>100 Genomes Sequenced

Organism                   Genome Size (Mbp)   # Genes
Mycoplasma genitalium      0.589               470
E. coli                    4.6                 4,500
Saccharomyces cerevisiae   14                  6,000
C. elegans                 100                 19,427
Drosophila melanogaster    180                 15,000
Arabidopsis thaliana       125                 27,000
Rice                       466                 60,000
Humans                     3,000               30,000

[Slide: raw genomic DNA sequence, several thousand bases, beginning ATGGAGGATCATGGGATTGTAGAAACTTTAAACTTTCTATCATCAACAAAAATC...]

Genomics and Proteomics Projects
• Gene Disruption
• Protein-Protein Interactions
• Bioinformatics
• Gene & Protein Expression
• Identify Genes & Proteins
• Protein Localization
• Gene Regulation
• Biochemical Genomics
• Structural Genomics

S. cerevisiae
• 6000 Protein Coding Genes
• 2/3 of Yeast Proteins Homologous to those of Vertebrates

Yeast Localizome
>4000 Proteins Localized
~1400 Nuclear
600 Chromosomal
Find All Targets
[Figure: Lys21 localization; panels HA and DAPI]

Yeast ChIP-chip
Strains: Epitope-tagged, Untagged
Crosslink → Lyse → Sonicate → IP → Reverse X-links → Label → Hybridize to Intergenic Array
[Figure label: Nonspecific DNA]

Swi4 ChIP Chip

Summary of Swi4-Binding Targets
163 Total Intergenic Regions
  40% Neighbor an ORF with G1/S periodicity of expression
  70% Contain one or more SCB
181 Potential Gene Targets
  28 Involved in cell wall maintenance (ERG1)
  12 Involved in cell cycle control (CLN1)
  9 Involved in cell polarity and morphogenesis (CLA4)
  3 Involved in DNA synthesis/repair (POL1)
  13 Transcription factors
  4 Histones
  7 Involved in multi-drug resistance
  3 Involved in microtubule function
  102 Other/unknown function

The G1/S Transcription Network

Species Variation
Different Genes vs Differential Gene Expression

Conserved Morphogenic Pathways

             S. cerevisiae            C. albicans
Pathways     MAPK Signaling Pathway   MAPK Signaling Pathway
             cAMP Signaling Pathway   cAMP Signaling Pathway
Factors      Ste12p, Tec1p, Sok2p     Cph1p, Tec1p, Efg1p, Cph2p
Outcome      Pseudohyphal growth      Dimorphic growth and virulence

chIP chip Target Genes: S. cerevisiae
• Ste12
  – 112 targets
• Tec1
  – 144 targets
• Sok2
  – 207 targets
[Figure: fraction of targets by GO category (Transcription; Cell Wall; Cell cycle; Budding, polarity & morphogenesis), genome vs. chIP hits]
[Figure: S. cerevisiae targets of Ste12, Tec1, Sok2]

chIP chip Target Genes: C. albicans
• Tec1 (Tec1)
  – 359 targets
• Efg1 (Sok2)
  – 589 targets
• Cph1 (Ste12)
  – 433 targets
• Cph2
  – 620 targets
[Figure: fraction of targets by GO category (Pathogenicity, Transcription, Cell wall, Cell cycle, Morphogenesis), genome vs. chIP hits]

Conserved factors bind to different target genes
[Figure: fraction of homologous genes (axis 0 to 0.7) by homolog family: Genome, Cph1-Ste12, Efg1-Sok2, Tec1-Tec1]

Combinatorial Binding of Factors
S. cerevisiae: Ste12, Tec1, Sok2
C. albicans: Cph1, Cph2, Efg1, Tec1

Conserved (core) vs. C. albicans-specific targets
[Figure: C. albicans genome, all chIP targets, Cph1 targets, Tec1 targets, Efg1 targets, Cph2 targets; legend: core, C.a specific, C.a and S.c only, core but NOT S.c]

Chromosome 22 Genomic DNA Array
21,024 PCR products, 820 bp ave. size. Hybridized to Placental polyA+ RNA.

50% of Transcribed Regions Are Not Annotated

Transcriptional Activity of Chromosome 22
[Figure labels: Hybridization in unannotated region; Hybridization in annotated region]

Many Unannotated Hybridizing Sequences are Conserved in the Mouse

Mapping NFKB Binding Sites on Chromosome 22
209 Binding Sites

Examples of NFKB Targets
PIK4CA: Up Regulated
BASC/MKL1: Down Regulated
TXN2: No Change

Potential p65 Targets on Ch22
• PDGF
• MIF
• TIMP3
• ATF 4
• BIK (Bcl2 interacting killer)
• EWSR1
• IL2R-b
• PPAR

Location of NFKB Binding Sites Relative to Genes

Location                                % of sites
5 kb upstream                           27%
other intron                            27%
5 kb proximal to novel transcript       16%
1st intron                              12%
10 kb upstream                          10%
unannotated                             6%
novel transcript                        1%
exon                                    1%

ChIP chip Summary
• Map binding sites of transcription factors in yeast and humans
• Determination of binding site targets provides new insights into biological functions of factors
• In humans, binding sites lie in many locations relative to target genes
• Regulatory circuits in related organisms can be highly divergent even though the processes and regulators themselves may be highly conserved.
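
The comparison behind "conserved factors bind to different target genes" can be illustrated with a small calculation: take the targets of a factor in one species, map them to their homologs in the other species, and ask what fraction are also bound by the homologous factor there. The sketch below is only an illustration of that arithmetic, not the analysis used in the talk. The function name, the gene identifiers, and the homolog map are all hypothetical placeholders, and real data would need a proper homolog table.

```python
def conserved_fraction(targets_a, targets_b, homologs):
    """Fraction of species-A targets (that have a homolog) whose homolog
    is also a target of the homologous factor in species B."""
    # Map each species-A target to its species-B homolog, skipping genes
    # with no homolog in the (hypothetical) homolog table.
    mapped = {homologs[gene] for gene in targets_a if gene in homologs}
    if not mapped:
        return 0.0
    return len(mapped & targets_b) / len(mapped)


# Hypothetical toy data, invented purely for illustration.
sc_targets = {"geneA", "geneB", "geneC", "geneD"}   # bound in species A
ca_targets = {"homA", "homX", "homY"}               # bound in species B
homologs = {"geneA": "homA", "geneB": "homB", "geneC": "homC"}

fraction = conserved_fraction(sc_targets, ca_targets, homologs)
print(f"{fraction:.2f}")  # prints 0.33
```

In this toy case three of the four targets have a homolog and only one of those homologs is bound in the second species, so the conserved fraction is low even though the regulators themselves are homologous.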
Two Types of Protein Microarrays
Antibody Microarrays
  Antigens
Functional Protein Microarrays
  Protein-Protein Interactions
  Small Molecule Interactions
  Enzymatic Assays (ATP → ADP)

Yeast Protein Kinases
122 Protein Kinase Homologs
- All Members of Ser/Thr Family
- 24 Uncharacterized

Producing the Yeast Proteome
GST-His6::ORF1
5,800 expression clones
93.7%
~80% full-length proteins
[Figure: protein gel; markers (KD) 250, 175, 105, 75, 60, 55, 35, 20]

Printing the Yeast Proteome
Source Plate: GST:P1, GST:P2, GST:P3
Assays: Protein-Protein, Protein-Lipid, Protein-DNA

The Yeast Proteome Chip
Probed With Anti-GST Antibodies
[Figure: panels A, B, C; 2 mm scale bar; histogram of Number of Spots (axis 0 to 500)]

Screens Thus Far
• 15 Protein-Protein Interactions
• 8 Protein-Lipid Interactions
• 3 Nucleic Acids (dsDNA, ssDNA, polyA-mRNA)
• 4 Small Molecule Screens
• 3 Posttranslational Modifications
• 14 Antibodies

Biochemical Assays on Proteome Chips
Probes: α-GST, Calmodulin, PI(3)P, PI(4,5)P2

Calmodulin-Binding Proteins
• 12 Known or Suspected Targets
• 33 New Binding Proteins
• Derived New Consensus Binding Site
[Figure: sequence logo of the consensus binding site (scale 0 to 14) with aligned sequences from YFL003C/MSH4, YJR073C/OPI3, YBR050C/REG2, YNL202W/SPS19, YOL016C/CMK2, YBR011C/IPP1]

Identification of Drug Targets
[Diagram labels: Nutrient, Rapa, Drug (SMIR), Fpr1p, Tor1/2p, ???, Translation, Glycogen Accumulation, Arrest, G1 Arrest]
J. Huang, H. Zhu, S. Schreiber, M. Snyder

SMIR3: 8 Targets
SMIR4: 30 Targets

Identification of New DNA Binding Activities
Cy3 labeled genomic DNA
Probe proteome chip

Summary of Genomic DNA Screen
• ~200 Proteins bound DNA probe
• 8 Novel Candidates Analyzed by ChIP chip
  – 5 No loci enriched
  – 3 Showed enrichment: Mtw1, Dig2, Arg5,6

Arg5,6 ChIP chip Targets

NAME             Location      Chromosome
15S rRNA         3' end        Mitochondria
COX1             upstream      Mitochondria
COX1             1st exon      Mitochondria
COX1             1st intron    Mitochondria
COX1             2nd intron    Mitochondria
COX1             3rd intron    Mitochondria
COX1             4th exon      Mitochondria
COX1             5th exon      Mitochondria
COX1             6th exon      Mitochondria
COX1             last intron   Mitochondria
COX1             last exon     Mitochondria
COB1             1st exon      Mitochondria
COB1             4th intron    Mitochondria
COB1             6th exon      Mitochondria
COX3             Internal      Mitochondria
THI13/YDL244w    Upstream      Chromosome 4
RIM8             Upstream      Chromosome 7
YGL015c/PUF4     Upstream      Chromosome 7
YHL046c          Upstream      Chromosome 8
YLL064c          Upstream      Chromosome 12
PHO23            Upstream      Chromosome 14
MEK1/YOR352w     Upstream      Chromosome 15

Arg5,6 Binds DNA In Vitro
[Figure: panels A to D; purified Arg5,6 proteins (markers in KD: 160, 105, 75, 50, 35) and gel shifts with 5'-COX1+GST::Arg5,6, 5'-COX1+GST, and EBNA DNA+GST::Arg5,6 (markers in bp: 500, 400, 300; protein concentration in nM)]

Arg5,6 Targets Require Arg5,6 Protein for Expression
[Figure: Fold Enrichment, arg5,6Δ / WT (axis -2.50 to 7.50); Media: Rich, Nitrogen Depletion, AA Starvation; genes: Cox1Gen, Cox1Proc, Cob1Gen, Cob1Proc, Cox3, Puf4, Yor352w, Yhl045w, 21s, Cox2, Act1]

Antibody Probing of the Yeast Proteome Microarray

Antibody                                            # of +s
Monoclonal (3 Yeast + 3 Control)
  α-Sed3, α-Cox4                                    1
  α-Pep12                                           4
Anti-Peptide Polyclonal (6)
  α-Hda1                                            8
  α-Mad2                                            1
Anti-FL Protein Polyclonal (2)
  α-Nap1                                            1770
  α-Cdc11                                           7

[Figure labels: Cdc11, Anti-Nap1, Sed3, Mad2, α-Sed3p]
Protometrix

Kinase Assay on a Proteome Chip
• 33P-γ-ATP labeling
• 41 positives
• High resolution
• Quantitative & sensitive
• Low background
• Little reagent needed

Kinase Signaling Network
[Diagram: Kinases A to E linked to Proteins 1 to 8]

Acknowledgments

ChIP Chip - Yeast: Christine Horak, Vishy Iyer, Pat Brown, Anthony Borneman, Haiyuan Lu, Nick Luscombe, Jiang Qian, Mark Gerstein

Human Chromo 22: Ghia Euskirchen, John Rinn, Becky Goetsch, Ken Nelson, Steve Hartman, Sherman Weissman, Fred Sayward, Perry Miller, Nick Luscombe, Tom Royce, Mark Gerstein

Protein Chips: Heng Zhu, Metin Bilgin, Rhonda Bangham, Dave Hall, Antonio Casamayor, Scott Bidlingmaier, Ghil Jona, Geeta Devgan, Jason Ptacek

Informatics: Paul Bertone, Ron Jansen, Ning Lan, Xiaowei Zhu, Mark Gerstein

Small Molecule: Jing Huang, Stuart Schreiber

Protometrix: Greg Michaud, Michael Salcius, Fang Zhou, Rhonda Bangham, Jaclyn Bonin, Barry Schweitzer, Paul Predki

http://bioinfo.mbb.yale.edu/proteinchip