Bioinformatics 95
Lecture 1: Introduction to Bioinformatics
http://pastime.cgu.edu.tw/petang/index.htm

Petrus Tang, Ph.D. (鄧致剛)
Graduate Institute of Basic Medical Sciences and Bioinformatics Center, Chang Gung University
petang@mail.cgu.edu.tw, EXT: 5136
Teaching assistants: 葉元鳴 (ext.), 曾詩涵 (ext. 5690)

Textbook
Bioinformatics: A Practical Guide to the Analysis of Genes & Proteins
432 pages (2001), Wiley-Liss; ISBN: 0471383910
Contents:
  Bioinformatics and the Internet
  The NCBI Data Model
  The GenBank Sequence Database
  Structure Databases
  Genomic Mapping and Mapping Databases
  Information Retrieval from Biological Databases
  Sequence Alignment and Database Searches
  Multiple Sequence Alignment
  Predictive Methods using DNA Sequences
  Predictive Methods using Protein Sequences
  Expressed Sequence Tags
  Sequence Assembly and Finishing Methods
  Phylogenetic Analysis
  Comparative Genome Analysis
  Using Perl to Facilitate Biological Analysis

WHAT IS BIOINFORMATICS?

The answer to this question depends on whether you are talking to a computer scientist who 'does' biology or a molecular biologist who 'does' computing.

Bioinformatics is the application of computer technology to the management of biological information. Computers are used to gather, store, analyze and integrate biological and genetic information, which can then be applied to gene-based drug discovery and development.
Bioinformatics combines techniques from biology, computer science and informatics and applies them to the processing of biochemical data, turning tedious, meaningless data into meaningful, valuable information.

Contributing fields: Biology, Technology, Mathematics, Chemistry, Physics, Information

[Figure: gene structure. Promoter, 5' UTR, protein coding sequence (exon 1, exon 2, ..., exon n-1, exon n), 3' UTR]

Gene prediction: codon usage (single exon)
[Figure: coding versus non-coding signal plotted for Frame 1, Frame 2 and Frame 3, with the coding sequence and the correct start marked]

Gene prediction: codon usage (multiple exons)
[Figure: coding versus non-coding signal plotted for Frame 1, Frame 2 and Frame 3, with splice sites marked]
Exons: 208..295, 1029..1349, 1500..1688, 2686..2934, 3326..3444, 3573..3680, 4135..4309, 4708..4846, 4993..5096, 7301..7389, 7860..8013, 8124..8405, 8553..8713, 9089..9225, 13841..14244

Drosophila: Functional Assignment using Gene Ontology (13,601 genes)
  Nucleic Acid Binding        8%
  Hypothetical               11%
  Enzyme                     18%
  Signal Transduction         4%
  Transporter                 4%
  Structural Protein          2%
  Unknown                    48%
  Ligand Binding or Carrier   2%
  Motor Protein               1%
  Chaperone                   1%
  Cell Adhesion               1%

Gene Number in the Human Genome
[Figure: number of genes (10 K, 20 K, 30 K, 40 K, 50 K) against confidence (4, 3, 2, 1); labels: Known genes, Otto]

Experiment Driven: Hypothesis -> Experiments -> Results
Information Driven: Experiments -> Hypothesis

THE COMPONENTS OF BIOINFORMATICS
TECHNOLOGY, ALGORITHM, ANALYSIS TOOLS, DATABASE, COMPUTING POWER

DNA (Genome) -> RNA (Transcriptome) -> protein (Proteome) -> phenotype

DNA Sequencing
MegaBACE 1000: 96 DNA sequencing reactions in 2 hrs, approximately 600-800 readable bp per read, about 1,000,000 bp in 24 hrs.
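The throughput figure on the DNA sequencing slide can be checked with a short calculation. The sketch below is only a back-of-the-envelope estimate. It assumes that the 600-800 bp figure is the readable length of each of the 96 reads and that runs follow each other without downtime. The function name `bases_per_day` is illustrative and does not come from the slides.

```python
# Back-of-the-envelope throughput for a 96-capillary sequencer.
CAPILLARIES = 96             # reads produced per run (from the slide)
RUN_HOURS = 2                # duration of one run (from the slide)
READ_LENGTH_BP = (600, 800)  # readable bases per read, low and high estimate


def bases_per_day(read_length_bp, capillaries=CAPILLARIES, run_hours=RUN_HOURS):
    """Bases produced in 24 hours, assuming back-to-back runs with no downtime."""
    runs_per_day = 24 // run_hours
    return capillaries * read_length_bp * runs_per_day


low, high = (bases_per_day(length) for length in READ_LENGTH_BP)
print(f"{low:,} to {high:,} bp per 24 h")
```

Under these assumptions the estimate falls somewhat below one million bases per day, so the slide's 1,000,000 bp reads as a rounded upper figure.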
Microarray
10,000 clones per slide

Proteomics
2-dimensional electrophoresis gels: differences that are characteristic of the individual starting states are recognized by comparing two protein patterns. 6,000 protein spots per gel.
MALDI-MS peptide mass fingerprint: identification of proteins separated by 2D electrophoresis.
3D Modeling

DNA: Genome Projects
RNA: Microarray, ESTs, SAGE
protein: 2D Electrophoresis, Protein Modeling, Protein-Protein Interaction
phenotype

Genetic Sequence Data Bank
Aug 15 2006, Release 155.0
65,369,091,950 bases from 61,132,599 reported sequences
Homo sapiens: 12,385,903,706 bases from 10,649,134 sequences
Expressed sequence tags: 7,893,983

Recent years have seen an explosive growth in biological data. Large sequencing projects are producing increasing quantities of nucleotide sequences. The contents of nucleotide databases are doubling in size approximately every 14 months. The latest release of GenBank (V.139) exceeded two billion base pairs. Not only is the volume of sequence data increasing rapidly, but the number of characterized genes from many organisms and the number of protein structures also double about every two years.
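A doubling time translates directly into a growth formula: after t months a database of size N grows to N × 2^(t/14). The sketch below works this through using the release 155.0 size quoted above. It assumes the 14-month doubling time stays constant, which is an extrapolation and not a claim made by the slides. The function names are illustrative.

```python
import math

DOUBLING_MONTHS = 14.0  # doubling time quoted in the text


def projected_size(current_bases, months_ahead, doubling_months=DOUBLING_MONTHS):
    """Size after months_ahead months of steady exponential growth."""
    return current_bases * 2 ** (months_ahead / doubling_months)


def months_to_grow(factor, doubling_months=DOUBLING_MONTHS):
    """Months needed for the database to grow by the given factor."""
    return doubling_months * math.log2(factor)


release_155_bases = 65_369_091_950  # GenBank release 155.0, from the slide
for months in (14, 28, 60):
    size = projected_size(release_155_bases, months)
    print(f"+{months} months: about {size:,.0f} bases")
print(f"Tenfold growth takes about {months_to_grow(10):.1f} months")
```

The same two functions apply to the two-year doubling quoted for characterized genes and protein structures by passing `doubling_months=24`.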
To cope with this great quantity of data, a new scientific discipline has emerged: bioinformatics, also called biocomputing or computational biology.

Entries      Bases          Species
10649134     12385903706    Homo sapiens
6753652      8049817803     Mus musculus
1267882      5747965742     Rattus norvegicus
1663937      3566605068     Bos taurus
1287702      2540551749     Danio rerio
2499723      1998269811     Zea mays
1149146      1500985768     Oryza sativa
226213       1251961979     Strongylocentrotus purpuratus
1236899      1075752229     Sus scrofa
1175934      961525020      Xenopus tropicalis
1426915      893771790      Canis familiaris
655519       845341580      Drosophila melanogaster
800633       770627209      Gallus gallus
1198209      758043364      Arabidopsis thaliana
209185       691252171      Pan troglodytes
868038       507883206      Triticum aestivum
397437       468939096      Medicago truncatula
784170       465881813      Sorghum bicolor
69335        463195893      Macaca mulatta
696319       421330392      Ciona intestinalis

The International Nucleotide Sequence Database Collaboration
GenBank: http://www.ncbi.nlm.nih.gov/ National Center for Biotechnology Information (NCBI)
DDBJ: http://www.ddbj.nig.ac.jp/ National Institute of Genetics (NIG)
EMBL: http://www.ebi.ac.uk European Bioinformatics Institute (EBI)
ExPASy: http://tw.expasy.org Expert Protein Analysis System

GenBank/EMBL/DDBJ International Nucleotide Sequence Database
DDBJ: DNA Data Bank of Japan
CIB: Center for Information Biology and DNA Data Bank of Japan
NIG: National Institute of Genetics
IAM: International Advisory Meeting
ICM: International Collaborative Meeting
NCBI: National Center for Biotechnology Information
NLM: National Library of Medicine
EMBL: European Molecular Biology Laboratory
EBI: European Bioinformatics Institute

Protein Databases

Protein Information Resource (PIR)
http://pir.georgetown.edu/
In 1988 the Protein Information Resource (PIR) established a cooperative effort with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID) to produce the PIR-International Protein Sequence Database (PIR-PSD), a comprehensive, non-redundant, expertly annotated, fully classified and extensively cross-referenced protein sequence database in the public domain. The PIR-PSD, PIR-NREF, iProClass and other PIR auxiliary databases integrate sequence, functional and structural information to support genomics and proteomics research. The PIR-PSD, Current Release 71.04 (March 01, 2002), contains 283,153 entries.

SWISS-PROT
http://www.ebi.ac.uk/swissprot/
The SWISS-PROT Protein Knowledgebase is an annotated protein sequence database established in 1986. It is maintained collaboratively by the Swiss Institute of Bioinformatics (SIB) and the European Bioinformatics Institute (EBI).

ExPASy Molecular Biology Server
http://tw.expasy.org
The ExPASy (Expert Protein Analysis System) proteomics server of the Swiss Institute of Bioinformatics (SIB) is dedicated to the analysis of protein sequences and structures as well as 2-D PAGE.

Protein Data Bank
http://www.rcsb.org
The Protein Data Bank (PDB) is operated by Rutgers, The State University of New Jersey; the San Diego Supercomputer Center at the University of California, San Diego; and the National Institute of Standards and Technology, three members of the Research Collaboratory for Structural Bioinformatics (RCSB). The PDB is supported by funds from the National Science Foundation, the Department of Energy, and two units of the National Institutes of Health: the National Institute of General Medical Sciences and the National Library of Medicine.
Metabolic & Signalling Pathways
BioCarta: http://biocarta.com
Kyoto Encyclopedia of Genes and Genomes (KEGG): http://www.genome.ad.jp/kegg/
The Cancer Genome Anatomy Project (CGAP): http://cgap.nci.nih.gov/

BIOINFORMATICS ANALYSIS TOOLS
Commercial: Vector NTI suite, Omiga, DNAsis
Free: Staden Package, EMBOSS, BLAST, FASTA

On-line analysis tools
http://bioinfo.nhri.org.tw/
National Health Research Institutes (NHRI) Macromolecular Sequence Analysis Service

GCG: nucleic acid or protein sequence analysis in command mode on a Unix system. (telnet://bioinfo.nhri.org.tw)
SeqWeb: connect to SeqWeb to analyze nucleic acid or protein sequences in a web browser. (http://bioinfo.nhri.org.tw/)
EMBOSS: nucleic acid or protein sequence analysis in a web browser. (http://srs.nchc.org.tw/EMBOSS/)
Smith-Waterman fast sequence search system (GenWeb): connect directly to GenWeb to run fast nucleic acid or protein sequence searches in a web browser. Specially designed hardware accelerates the search, and both Smith-Waterman and FrameSearch searches are available. (http://sw.nhri.org.tw/cgi-bin/genweb/bin/login.cgi)
ExPASy (Expert Protein Analysis System): connect to ExPASy to analyze protein sequences in a web browser. (http://tw.expasy.org)

Equipment
Medical Building, 9th floor, room 0917
SunFire 6800, 16 CPU

COMPUTER                    CPU                 NO.   MEMORY
SunFire 6800                Sparc 750 MHz       24    48 GB
Sun V60 Cluster             Xeon 2.8 GHz        20    20 GB
IBM X336 Cluster            Xeon 3.2 GHz        14    14 GB
IBM X225 Cluster            Xeon 2.4 GHz        2     1.5 GB
HP DL580G3 Cluster          Xeon 3.0 GHz        16    16 GB
LinuxWorX Cluster           Xeon 2.4 GHz        8     8 GB
IBM Z-pro Graphic Station   Xeon 3.2 GHz x 2    2     3 GB
Teaching computer           P4 2.4 GHz          15    512 MB
Teaching computer           P4 3.2 GHz          15    1 GB

ITEMS                               SPECIFICATION               NO.
Proware RAID System                 250 GB x 16 (4 TB)          1
Petastor Fibre RAID System          400 GB x 16 (6.4 TB x 4)    4
Proware NAS System                  80 GB x 8 (640 GB)          1
Brocade SilkWorm 2G Fibre switch    12 ports                    1
UPS                                 10 KVA                      1
UPS                                 30 KVA                      2
Video Conference System             Centura                     50
Telephone Conference System         Polycom sound station       1

Equipment
[Vector NTI Advanced Server]
[GENOMAX High-Throughput Sequence Analysis System]
[Paracel BLAST]
[Paracel TranscriptAssembler]
[Bioinformatics Linux Cluster]
[Expressed Sequence Tag Analysis Pipeline]
[Protein Sequence Analysis Pipeline]
[Protein Modeling & Docking System]
[Lead Compound Database]
[The European Molecular Biology Open Software Suite]
[Sequence Retrieval System]
[MetaCore: PPI Network]
[Expressionist]

Steps to Identify a Gene
Gene-Search, Protein-Search, Annotation

Full length ORF of TvEST-14G2
[Figure: nucleotide sequence of the full-length ORF of TvEST-14G2, printed in rows of 100 bases labeled -2, 101, 201, ..., 1601]

Amino Acid Sequence Comparison
[Figure: multiple alignment, positions 1 to 542, of the translations of 01B1(final), CK1-1_full (04E12) and CK1-2_full (14G2) with CK1 from Plasmodium falciparum (PFCK), Schizosaccharomyces pombe (Yeast), Homo sapiens (Human), Mus musculus (Mouse) and Trypanosoma cruzi (TcCK1.1 and TcCK1.2), with a consensus line. Marked regions: kinesin homology domain; casein kinase 1 specific motifs]
PFCK: Plasmodium casein kinase 1
TcCK1.1: Trypanosoma cruzi casein kinase 1.1
TcCK1.2: Trypanosoma cruzi casein kinase 1.2

Similarity of Various CK1s from Different Species

                 04E12  14G2  01B1  TcCK1.1  TcCK1.2  PFCK  Yeast  Mouse  Human
TvEST04E12       100    32    32    34       34       34    37     37     37
TvEST14G2               100   24    24       23       24    24     26     25
TvEST01B1                     100   47       47       48    48     38     38
T. cruzi CK1.1                      100      23       73    24     61     61
T. cruzi CK1.2                               100      74    70     63     63
PFCK                                                  100   69     62     62
Yeast CK1                                                   100    69     67
Mouse CK1                                                          100    99
Human CK1                                                                 100

3-D Structure of TvEST-14G2 and other CK1s
[Figure: 3-D structure panels for TvEST-14G2, TcCK1.1, TcCK1.2, PfCK1, Yeast CK1, Mouse CK1 and Human CK1-δ]

TvEST-14G2 amino acid sequence:
  1 MRKIYGNYIT QKRLGSGSFG EVWEAVSHST GQKVALKLEP RNSSVPQLFF
 51 EAKLYSMFQA SKSTNNSVEP CNNIPVVYAT GQTETTNYMA MELLGKSLED
101 LVSSVPRFSQ KTILMLAGQM ISCVEFVHKH NFIHRDIKPD NFAMGVSENS
151 NKIYIIDFGL SKKYIDQNNR HIRNCTGKSL TGTARYSSIN ALEGKEQSIR
201 DDMESLVYVW VYLLHGRLPW MSLPTTGRKK YEAILMKKRS TKPEELCLGL
251 NSFFVNYLIA VRSLKFEEEP NYAMYRKMIY DAMIADQIPF DYRYDWVKTR
301 IVRPQRENQS QLSERQEGKC PNSAEFDGFS SIKGYSSHRQ VQSPVSSRDV
351 IKNSSSSPSK DILQSSTLDE SSQDKKPIKA VESNQKPYTP PRTINTTETR
401 MRSKTTINTA RTTAKNSSAV KKESSATRTV KKETHPATTK TTKTVNRQLN
451 SSTTKPATTS SHKDSEPASS RRTSTLRSSR RQNDGIRPAK ERTALFTATA
501 SKPPVSYRTG MLPKWMMAPL TSRR

BIOINFORMATICS
Disease prediction and diagnosis; discovery of new genes
Gene evolution, overall gene function and its network regulation systems
Drug design and the structure of biological macromolecules
GENOMICS, GENE EXPRESSION ANALYSIS, PROTEOMICS, MEDICAL INFORMATICS
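The similarity table for the CK1 sequences reports pairwise percentages. The slides do not say how those values were computed, so the sketch below shows only one common definition: percent identity over the alignment columns in which neither sequence has a gap. The function name `percent_identity` and the choice of denominator are assumptions made for illustration. The two input strings are the first 20 residues of the Homo sapiens and Schizosaccharomyces pombe fragments that begin VMELLGPSLE and VMDLLGPSLE in the alignment.

```python
GAP = "-"


def percent_identity(seq_a, seq_b):
    """Percent of identical residues over columns where neither sequence has a gap.

    Both inputs must already be aligned, so they must have equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == GAP or b == GAP:
            continue  # skip columns that contain a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# First 20 residues of two aligned fragments from the comparison slide.
human = "VMELLGPSLEDLFNFCSRKF"
pombe = "VMDLLGPSLEDLFNFCNRKF"
print(f"Human vs S. pombe fragment: {percent_identity(human, pombe):.1f}% identical")
```

A short conserved fragment like this one scores higher than the whole-protein values in the table, because the full-length sequences also include the divergent C-terminal regions.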
Focuses in Bioinformatics
[Figure labels: Perturbation, Dynamic Response, Environment, Medication, Genetic Engineering, Gene Expression, Protein Expression, Virtual Cell, Analysis, BioChip, DataBase, Genotype/Phenotype, Biology, Molecular Biology, Biochemistry, Genetics, Symbolic Algorithms/Computing]

Genome Sequencing Goals Leading Toward Predictive Biology
Gene Sequence Data, Gene Identification, Structure Prediction, Protein Circuit & Regulatory Network Discovery, Biosimulation
[Figure: signalling network leading to apoptosis and cell proliferation. Nodes: IL-3, IL-3R, FAS-L, FAS, FADD/MORT, FLICE, ICE, CPP32, IGF1, IGF1R, IRS1, PI 3-K, AKT/PKB, BAD, Bcl-XL, mitogen, RAS, C-Myc, Max, Mad, Bin-1, P53, P21, P27, P16, P107, pRb, E2F, Cyclin D1, Cdk4, Cyclin E, Cdk2, Cdc25A]

Reconstructing Cellular Functions
Reductionistic Approach (Genome Sequencing, DNA arrays, proteomics): 20th Century Biology
Integrative Approach (Bioinformatics, Systems Science, modeling & simulation): 21st Century Biology

Hallmarks of Cancer
D. Hanahan and R. A. Weinberg. Cell, 100(1):57-70, Review, 2000.

Platform for Systems Biology
• Objective is to link gene response, protein activity and metabolite dynamics to disease and interventions
[Figure: BioSystematics TM platform. Cellular samples (body fluids, tissue); Gene, Protein, Metabolite; quantitative comparisons (protein index, metabolite index); Complex Dynamics (i.e. environmental + time); Targets, Biomarkers; spectrum with an axis from 9 to 0 ppm]

SYSTEMS BIOLOGY
Genomics, Transcriptomics, Proteomics, Metabolomics, Functional Proteomics/Genomics, Systems Biology

Q. As a biologist, what skills do I need to make the transition to bioinformatics?

Many of the jobs currently available involve the design and implementation of programs and systems for the storage, management and analysis of vast amounts of DNA sequence data. Such positions require in-depth programming and relational database skills, which very few biologists possess, so it is largely the computational specialists who are filling these roles.
This is not to say that the computer-savvy biologist doesn't play an important role. As the bioinformatics field matures there will be a huge demand for outreach to the biological community, as well as a need for individuals with the in-depth biological background necessary to sift through gigabases of genomic sequence in search of novel targets. It is in these areas that biologists with the necessary computational skills will find their niche.

A.
  Molecular biology packages (GCG, BLAST, etc.)
  Web and programming skills including HTML, Perl, Java and C++
  Familiarity with a variety of operating systems (especially UNIX)
  Relational database skills such as SQL, Sybase or Oracle
  Statistics
  Structural biology and modeling
  Mathematical optimization
  Computer graphics theory and linear algebra

You will need to be able to readily pick up, use and understand the tools and databases designed by computer programmers, and to communicate biological science requirements to core computer scientists.