Online (DNA and Amino Acid) Sequence Information
Lecture 9

Introduction
• Annotation of genes
• Basic bioinformatics databases
• NCBI home page
• Query and return results
• DNA sequence results page
• Protein sequence results page

Bioinformatics Databases
• Biological data generated by various labs is submitted to and stored in specific databases.
• The data are nucleotide sequences (DNA and mRNA (cDNA)) and protein sequences.
• The main “primary” nucleotide sequence databases are:
– United States: GenBank (NCBI)
– Europe: EMBL Nucleotide Sequence Database
– Japan: DNA Data Bank of Japan (DDBJ)
• These databases also contain related sequences:
– Expressed sequence tags (ESTs): short (800 bp) stretches of mRNA that can be used to see which genes are expressed…

Protein Databases
• The main protein database is UniProt (Universal Protein Resource).
• The UniProt Knowledgebase (UniProtKB) contains data from:
– SWISS-PROT (most up-to-date information)
– TrEMBL (translation of coding sequences)
– PIR database
• Both the nucleotide and protein databases contain much more detail than the sequences alone; this detail is referred to as annotation.

Annotation of sequences
• Once the gene sequences have been determined, the data must be annotated (Klug 2010):
– Identify regulatory regions
– Other sequences of interest: exons/introns, coding sequences (CDS), polyA signal
– Protein annotation includes the corresponding mRNA sequences
– Other organisms in which the DNA/amino acid sequence is found
– Journals/references showing where the data came from.
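As a rough illustration of what annotation adds to a raw sequence, the sketch below stores features (exons, an intron, a polyA signal) as coordinates on a sequence and joins the exons to give the spliced sequence. The sequence, the coordinates, and the names `Feature` and `join_exons` are invented for illustration; real GenBank/EMBL records use a much richer feature table.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """One annotated region; coordinates are 1-based and inclusive."""
    kind: str
    start: int
    end: int


def join_exons(sequence, features):
    """Concatenate all exon regions in positional order, skipping everything else."""
    exons = sorted((f for f in features if f.kind == "exon"), key=lambda f: f.start)
    return "".join(sequence[f.start - 1:f.end] for f in exons)


# Toy sequence (made up): exon 1, intron, exon 2, then a polyA signal.
toy_sequence = "ATGGCC" + "GTAAGT" + "TTTTAA" + "AATAAA"
toy_features = [
    Feature("exon", 1, 6),
    Feature("intron", 7, 12),
    Feature("exon", 13, 18),
    Feature("polyA_signal", 19, 24),
]

spliced = join_exons(toy_sequence, toy_features)
print(spliced)  # ATGGCCTTTTAA
```

The point is only that the annotation (the coordinates) is what lets a program separate exons from introns; the sequence alone does not carry that information.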
Bioinformatics Database
• Bioinformatics databases contain information for various types of biological data.
• To facilitate finding information there are a number of specific search engines:
– NCBI has Entrez
– EMBL has SRS
• Consider the following query:
– What is the DNA and amino acid sequence for the following gene: Human BTEB
– More detail on the terms can be found by looking at a sample record: http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord

NCBI Entrez search page

Nucleotide Record

Coding section of gene

The exon/intron structure is also available in graphic form

Protein records

Other databases
• The nucleotide (GenBank and EMBL) and protein (UniProt) databases contain the “raw data” and are referred to as primary databases.
• More specific databases derive data from these and are referred to as secondary databases; examples include protein family and sequence similarity databases such as PROSITE and PRINTS.
• There are databases that contain information about specific organisms such as E. coli, for example the Genomes OnLine Database (GOLD).

Other databases
• Databases for specific types of sequences, such as those associated with promoters and other regulatory elements.
• Others include structural databases such as the Protein Data Bank (PDB).
• Online Mendelian Inheritance in Man (OMIM), which contains information on human genes and genetic disorders.

Bioinformatics Search Engines
• The Entrez (NCBI) search engine retrieves information from NCBI databases and can be used to obtain other information including publications (PubMed), 3D protein structures, Online Mendelian Inheritance in Man…. A tutorial can be found at:
– Entrez: Making use of its power
• EMBL uses the ExPASy site, which utilises the open source application Sequence Retrieval System (SRS); a tutorial can be found at:
– SRS tutorial: quick tour

Other important information sources
• PubMed: literature research: journal articles/conference proceedings/books etc.
– Search under many fields: keyword, author….
– Returns: journal articles/abstracts
– Two types: general/review.
• NCBI account: set up an NCBI account to manage previous searches….
• BTEB PubMed search found at:
– http://www.ncbi.nlm.nih.gov/pubmed?term=BTEB&cmd=DetailsSearch

BTEB PubMed search result
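The same kind of search can also be written as a URL for NCBI's E-utilities, the programmatic interface to Entrez. The sketch below only builds `esearch` URLs for the term BTEB and does not contact NCBI. The helper name `build_esearch_url`, the `retmax` value, and the nucleotide query string are illustrative choices, not part of the lecture.

```python
from urllib.parse import urlencode

# Base address of the E-utilities ESearch service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db, term, retmax=20):
    """Return an ESearch URL for one Entrez database and one query term."""
    params = {"db": db, "term": term, "retmax": retmax}
    return EUTILS_BASE + "?" + urlencode(params)


# Literature search, equivalent in spirit to the PubMed search for BTEB.
pubmed_url = build_esearch_url("pubmed", "BTEB")

# Nucleotide search restricted to human (assumed query, for illustration only).
nucleotide_url = build_esearch_url("nuccore", "BTEB AND human[Organism]")

print(pubmed_url)
print(nucleotide_url)
```

Opening either URL (for example with `urllib.request.urlopen`) would return XML listing the matching record IDs; that step is left out so the example runs offline.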