Bioinformatics Course: Databases and Web Services
Zhou Du, Bioinformatics, class of 2007, Zhen Su's lab (second-year PhD student)

Concepts
• Bioinformatics
• Computational Biology (Many who draw a distinction between bioinformatics and computational biology portray the former as a tool kit and the latter as science.)
• Database
• Web server
• Web service

Nucleic Acids Research Database and Web Server issues
• Database: http://www.oxfordjournals.org/nar/database/c/
• Web server: http://bioinformatics.ca/links_directory/
• Google!!!

Main bioinformatics journals
• Specialist journals (mainly computational papers): Bioinformatics, PLoS Computational Biology, BMC Bioinformatics, Journal of Computational Biology, BMC Genomics, BMC Systems Biology, Molecular Biology and Evolution, ...
• Semi-specialist journals (some computational papers in nearly every issue): Genome Biology, Nucleic Acids Research, Genome Research, Molecular Systems Biology, American Journal of Human Genetics, ...
• General journals: Nature, Science, PNAS, PLoS ONE, ...
• Others (computational papers appear occasionally): Nature Biotechnology, Nature Genetics, Nature Methods, Cell, Trends in Genetics, PLoS Genetics, ...

Outline
• Part I: Overview of bioinformatics databases and web servers
• Part II: Introduction to bioinformatics web services created in Su Zhen's lab
• Part III: Construction of databases and web services

Three major public DNA databases
• EMBL
• GenBank
• DDBJ
In 1988, these three formed the International Nucleotide Sequence Database Collaboration (INSDC), which stipulates:
1. Data exchange and sharing (once every 24 hours).
2. A unified record format for processing submitted data, so that corresponding records have consistent content across the databases.
3. Data maintenance and updating. Each database updates only the records that were directly submitted to it.

What is an accession number?
An accession number is a label used to identify a record.
Examples (all for retinol-binding protein, RBP4):
• DNA
  – X02775: GenBank genomic DNA sequence (1+5, 2+6)
  – NT_030059: Genomic contig in RefSeq
  – rs7079946: dbSNP (single nucleotide polymorphism)
• RNA
  – N91759.1: An expressed sequence tag (1 of 170)
  – NM_006744: RefSeq DNA sequence (from a transcript)
• Protein
  – NP_007635: RefSeq protein
  – AAC02945: GenBank protein
  – Q28369: SwissProt protein
  – 1KT7: Protein Data Bank structure record

Accession number series in RefSeq
• Experimentally determined sequences
  – NT_123456: Genomic contigs (DNA)
  – NM_123456: mRNA
  – NP_123456: Proteins
• Sequences derived through genome annotation efforts
  – XM_123456: Model mRNAs
  – XP_123456: Model proteins

About NCBI
• NCBI (National Center for Biotechnology Information), established in 1988
• Main tasks
  – Developing databases
  – Conducting computational biology research
  – Developing tools for genome data analysis
  – Disseminating biomedical information
• Databases
  – Maintains databases: GenBank, UniGene, RefSeq, dbSNP, dbEST, OMIM
  – Provides Entrez database retrieval
  – Provides BLAST sequence search and alignment against databases

Using NCBI to retrieve all maize full-length cDNAs
1. Search with the keyword FLI-CDNA.
2. Select Nucleotide.
3. Select the species: maize.
4. Choose a display format (optional).
5. Choose a download option; a FASTA file can be downloaded directly.

Pfam
http://pfam.janelia.org/

Genome Browser
• Browse genome information: raw sequencing reads, gene structure, EST support, transcription factors, sequence conservation, SNPs, and more.
• Drawback: suited only to manual browsing, not to large-scale processing.
• Examples: JBrowse, UCSC

UCSC Introduction
• University of California Santa Cruz (UCSC)
• Genome Browser Database
• URL: http://genome.ucsc.edu/
• Data content:
  – Genome data
  – Alignments between genomes
  – Reference sequences (mRNA, EST)
  – Gene annotation (ENCODE project)

UCSC HomePage
Genome Browser
Customized UCSC Browser

Introduction to the databases and web services of Zhen Su's lab

Plant microRNA database
Zhenhai Zhang, Jingyin Yu, Daofeng Li, Zuyong Zhang, Fengxia Liu, Xin Zhou, Tao Wang, Yi Ling, and Zhen Su. Nucleic Acids Research, 2010, Vol. 38, Database issue, D806-D813.

Soybean functional database

Medicago database
Li D, Su Z, Dong J, Wang T. An expression database for roots of the model legume Medicago truncatula under salt stress. BMC Genomics. 2009 Nov 11;10(1):517.
Plant secreted protein database

Plant ubiquitination system database
Zhou Du, Xin Zhou, Li Li, Zhen Su. plantsUPS: a database of plants' Ubiquitin Proteasome System. BMC Genomics, 2009, 10:227.

Maize signal transduction database
BMC Genomics, 2010.

EasyGO: a GO enrichment analysis platform
Xin Zhou, Zhen Su. EasyGO: Gene Ontology-based annotation and functional enrichment analysis tool for agronomical species. BMC Genomics, 2007, 8:246.

agriGO: a GO enrichment analysis platform for agricultural species
Zhou Du, Xin Zhou, Yi Ling, Zhenhai Zhang and Zhen Su. Nucleic Acids Research, 2010.
Faculty of 1000 Biology: "Recommended"

Technologies that may be needed to build a database or web service
• Biological meaning: literature mining, database
• Computer techniques: LAMP (Linux, Apache, MySQL, PHP/Python/Perl) + HTML (CSS) + JavaScript

Thank you ~
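
Supplementary sketch for the technology slide: the database layer of such a service can be illustrated in a few lines of Python. This is only a minimal sketch under stated assumptions, not how any of the lab's databases are implemented. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server, and the table name `record` and the helpers `build_db` and `lookup` are hypothetical names chosen for illustration. The rows are the RBP4 accession numbers from the accession-number slide.

```python
import sqlite3

# Rows taken from the accession-number slide (all for RBP4).
# Columns: accession, source database, kind of record.
RECORDS = [
    ("X02775", "GenBank", "genomic DNA sequence"),
    ("NT_030059", "RefSeq", "genomic contig"),
    ("NM_006744", "RefSeq", "DNA sequence (from a transcript)"),
    ("NP_007635", "RefSeq", "protein"),
    ("AAC02945", "GenBank", "protein"),
    ("Q28369", "SwissProt", "protein"),
    ("1KT7", "Protein Data Bank", "structure record"),
]


def build_db():
    """Create an in-memory database (sqlite3 standing in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE record ("
        "accession TEXT PRIMARY KEY, "
        "source TEXT NOT NULL, "
        "kind TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO record VALUES (?, ?, ?)", RECORDS)
    conn.commit()
    return conn


def lookup(conn, accession):
    """Return (source, kind) for an accession number, or None if unknown.

    The user-supplied value is passed as a query parameter rather than
    pasted into the SQL string, which matters once a web form feeds it.
    """
    key = accession.strip().upper()
    return conn.execute(
        "SELECT source, kind FROM record WHERE accession = ?", (key,)
    ).fetchone()


if __name__ == "__main__":
    db = build_db()
    for query in ("NM_006744", "1kt7", "UNKNOWN"):
        print(query, "->", lookup(db, query))
```

In a LAMP-style deployment, the same parameterized query would sit behind a PHP, Python, or Perl script served by Apache, with MySQL replacing the in-memory database.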