Online (DNA and Amino Acid) Sequence Information
Lecture 7

Bioinformatics Databases
• Biological data generated by various labs is submitted to and stored in specific databases.
• The data can be:
– Nucleotide: DNA and mRNA (cDNA)
– Protein sequences
• The main nucleotide sequence databases are:
– United States: GenBank (NCBI)
– Europe: Nucleotide Sequence Database (EMBL)
– Japan: DNA Data Bank of Japan (DDBJ)
• These databases also contain sequences related to:
– Expressed sequence tags (ESTs): short sequences (about 800 bp) of mRNA that can be used to see which genes are expressed…

Protein Databases
• The main protein database is UniProt, which contains data from three related databases:
– Swiss-Prot (most up-to-date information)
– TrEMBL (translations of coding sequences)
– PIR (Protein Information Resource)
• Both the nucleotide and protein databases contain much more detail than just sequences. This additional data is referred to as gene annotation data.

The Annotation of Genes
• Once gene sequences have been determined, the data must be annotated. Basic annotation includes (Klug 2010):
– Identification of regulatory regions
– Identification of coding sequences (CDS): the exons/introns (if the sequence is eukaryotic)…
– The amino acid sequence for the gene
– Other organisms in which the DNA/amino acid sequence is found
– Journal references showing where the data came from
– Links to other databases that contain information about the gene

Global Sequence Bioinformatics Database
• To facilitate finding annotated data about genes and proteins, a number of sites provide dedicated search engines:
– NCBI has Entrez
– EMBL has the EBI search page (previously the SRS engine)
– The SIB ExPASy search engine (this is more focused on protein-related information)
• Consider the following query:
– What are the DNA and amino acid sequences for the following gene: Human BTEB
– Type the following into the search text box:
– Human[organism] AND BTEB[title]

NCBI Entrez search page

BTEB NCBI Nucleotide Record

Coding section of gene
The exon/intron structure is also available in graphic form.

Further information
• In the right-hand column you will find links to online analytical resources, e.g. BLAST (PSI-BLAST), a tool to search for similar sequences contained in the database.
• Information on the amino acid sequence obtained for the CDS of the gene. The text box also provides a link to information on the protein in the UniProt database.

An EMBL nucleotide record
• Annotated data can also be found in the EMBL database:
• BTEB EMBL record: shows the main record.
• Clicking on the “text” link at the top right-hand corner gives the essential features of the gene: BTEB-EMBL-EBI_text_record.
• An ExPASy database search gives the following information for this gene: type BTEB, and then BTEB and Human.

The BTEB Protein record
A link to a graphic representation of the protein and the relevant annotated data can be found at: BTEB Human Protein

Other databases
• The nucleotide (GenBank and EMBL) and protein (UniProt) databases contain the “raw data” and are referred to as “primary databases”.
– More specific databases derive their data from these and are referred to as secondary databases; examples include protein family and sequence similarity databases such as PROSITE and PRINTS.
– There are databases that contain information about specific organisms, such as E. coli, accessible through the Genomes Online Database (GOLD).

Other databases
– Databases for specific types of sequences, such as those associated with promoters and other regulatory elements; dbEST; the Homologous Structure Alignment Database.
– Structural databases from the Protein Data Bank
– Online Mendelian Inheritance in Man (OMIM), which contains information on human genes and genetic disorders.
• The January issue of the journal Nucleic Acids Research provides an up-to-date analysis of current online bioinformatics databases: Nucleic Acids Research database issue

Other important information sources
• PubMed: literature research covering journal articles, conference proceedings, books, etc.
– Search under many fields: keyword, author…
– Returns: journal articles/abstracts
– Two types: general/review
– BTEB PubMed search found at:
• http://www.ncbi.nlm.nih.gov/pubmed?term=BTEB&cmd=DetailsSearch
• Users can register for an NCBI account to manage their activity and store the results of gene searches, PubMed searches… This information can be downloaded, emailed…

BTEB PubMed search result

Exercise
• The EMBL-EBI record: BTEB_”text”_record.
• The NCBI record: BTEB NCBI Nucleotide Record
• The DDBJ record: BTEB flatfile Record
• Exercise: write a brief report comparing and contrasting the core elements of these records. Refer to pages 8-16 in Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, 3rd edition; the book can be found in the library.

Exercise
• Search for the following gene “DNA” sequence:
– Human Leukocyte Elastase gene, linear DNA [hint: it should be 5292 bp long].
– Retrieve the record, then download and save the FASTA file.
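
The fielded Entrez query used earlier for BTEB, and the FASTA download asked for in the exercise, can also be done from a script rather than through the web pages. The following is only a minimal sketch and not part of the lecture material: it assumes NCBI's public E-utilities service (the esearch.fcgi and efetch.fcgi endpoints), all function names are hypothetical helpers chosen for illustration, and the download function needs network access, so it is defined but not called here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of NCBI's E-utilities web service (assumed reachable).
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def entrez_term(organism, title):
    """Build a fielded Entrez query such as 'Human[Organism] AND BTEB[Title]'."""
    return f"{organism}[Organism] AND {title}[Title]"


def esearch_url(term, db="nucleotide", retmax=5):
    """URL asking ESearch for the IDs of records that match the query."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS}/esearch.fcgi?" + urlencode(params)


def efetch_fasta_url(record_id, db="nucleotide"):
    """URL asking EFetch for one record in FASTA format."""
    params = {"db": db, "id": record_id, "rettype": "fasta", "retmode": "text"}
    return f"{EUTILS}/efetch.fcgi?" + urlencode(params)


def download(url):
    """Fetch a URL and return its body as text (requires network access)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def parse_fasta(text):
    """Split a single-record FASTA text into (header, sequence)."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not a FASTA record")
    # Header is the first line without '>'; the rest is the sequence.
    return lines[0][1:], "".join(lines[1:])
```

Once a FASTA file has been saved, passing its text to parse_fasta and taking len() of the returned sequence gives a quick check against the 5292 bp hint in the exercise.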