Online (DNA and Amino Acid) Sequence Information

Lecture 9
Introduction
• Annotation of genes
• Basic bioinformatics databases
• NCBI home page
• Query and return results
• DNA sequence results page
• Protein sequence results page
Bioinformatics Databases
• Biological data generated by various labs is
submitted to and stored in specific databases.
• The data are nucleotide sequences (DNA and mRNA/cDNA)
and protein sequences.
• The main “primary” nucleotide sequence
databases are:
– United States: GenBank (NCBI)
– Europe: EMBL Nucleotide Sequence Database (EMBL)
– Japan: DNA Data Bank of Japan (DDBJ)
• These databases also contain sequences related
to:
– Expressed sequence tags (ESTs): short (up to about 800 bp)
sequences derived from mRNA, which can be used to see
which genes are expressed…
Protein Databases
• The main protein database is:
• UniProt (Universal Protein Resource)
• The UniProt Knowledgebase (UniProtKB) contains data from:
– SWISS-PROT (most up-to-date information)
– TrEMBL (translations of coding sequences in EMBL)
– PIR database
• Both the nucleotide and protein databases contain much
more detail than the sequences alone; this detail is
referred to as annotation.
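The idea behind TrEMBL, translating a coding sequence into an amino acid sequence, can be sketched in a few lines. This is only an illustration using the standard genetic code; the function name `translate` and the short example sequence are made up for this sketch and are not taken from any database record or from the UniProt pipeline.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(cds):
    """Translate a DNA coding sequence, stopping at the first stop codon."""
    cds = cds.upper()
    protein = []
    # Step through the sequence one codon (three bases) at a time.
    for position in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[position:position + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical toy coding sequence: ATG GCC TAA
print(translate("ATGGCCTAA"))
```

A real pipeline also has to deal with ambiguous bases, alternative genetic codes and reading-frame information, all of which this sketch ignores.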
Annotation of sequences
• Once the gene sequences have been
determined, the data must be annotated
(Klug 2010):
– Identify regulatory regions
– Other sequences of interest: exons/introns,
coding sequences (CDS), polyA signal
– Protein records are annotated with the corresponding mRNA sequences
– Other organisms in which the DNA/amino acid
sequence is found
– Journals/references to where the data came from
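One way to picture annotation is as a list of labelled coordinate ranges attached to a sequence. The sketch below is a toy model only: the `Feature` class, the helper functions and the 29-base sequence are all hypothetical and are not a real database record or a real file format parser.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    kind: str   # e.g. "exon", "intron", "polyA_signal"
    start: int  # 1-based, inclusive (the convention used in sequence records)
    end: int    # 1-based, inclusive


def extract(sequence, feature):
    """Return the part of the sequence covered by one feature."""
    return sequence[feature.start - 1:feature.end]


def spliced_exons(sequence, features):
    """Join all exons in order of position, leaving the introns out."""
    exons = sorted((f for f in features if f.kind == "exon"),
                   key=lambda f: f.start)
    return "".join(extract(sequence, f) for f in exons)


# Hypothetical toy gene: exon, intron, exon, then a polyA signal.
toy_sequence = "ATGGCC" + "GTAAGTTTCAG" + "GATTGA" + "AATAAA"
toy_features = [
    Feature("exon", 1, 6),
    Feature("intron", 7, 17),
    Feature("exon", 18, 23),
    Feature("polyA_signal", 24, 29),
]

print(spliced_exons(toy_sequence, toy_features))
print(extract(toy_sequence, toy_features[3]))
```

Real records carry many more annotation types (regulatory regions, source organism, literature references), but they follow the same pattern of a label plus a location.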
Global Sequence
Bioinformatics Databases
• Bioinformatics databases contain information on
many kinds of biological data.
• To facilitate finding information there are a
number of specific search engines:
– NCBI has ENTREZ
– EMBL has SRS
• Consider the following query:
– What are the DNA and amino acid sequences for the
following gene: human BTEB?
– More detail on the terms can be found by looking at a
sample record:
http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord
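The same kind of query can also be sent to Entrez from a script through the NCBI E-utilities web interface. The sketch below only builds the request URLs and does not contact NCBI. The helper names, the search term (including the `[Gene Name]` and `[Organism]` field tags) and the `retmax` value are assumptions chosen for illustration, and the term may need adjusting to return the record shown on the slides.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term, retmax=5):
    """Build an ESearch URL that looks up record IDs matching a term."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def efetch_fasta_url(db, record_id):
    """Build an EFetch URL that returns one record as FASTA text."""
    params = {"db": db, "id": record_id, "rettype": "fasta", "retmode": "text"}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


# Assumed search term for the human BTEB gene (illustrative only).
query = "BTEB[Gene Name] AND Homo sapiens[Organism]"

nucleotide_search = esearch_url("nucleotide", query)  # DNA/mRNA records
protein_search = esearch_url("protein", query)        # amino acid records

print(nucleotide_search)
print(protein_search)
```

Opening either URL in a browser returns a list of matching record IDs; an ID from that list can then be passed to `efetch_fasta_url` to obtain the sequence itself.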
NCBI Entrez search page
Nucleic Record
Coding section of gene
The exon-intron structure is also available in graphic form
Protein records
Other databases
• The nucleotide (GenBank and EMBL) and protein
(UniProt) databases contain the “raw data” and are referred
to as primary databases.
• More specific databases derive data from these
and are referred to as secondary databases;
examples include protein family and sequence
similarity databases such as PROSITE and PRINTS.
• There are databases which contain information
about specific organisms, such as E. coli; these can be
found using the Genomes OnLine Database (GOLD).
Other databases
• Databases for specific types of sequences such
as those associated with promoters and other
regulatory elements.
• Others include structural databases from the
Protein Data Bank
• Online Mendelian Inheritance in Man
(OMIM), which contains information on human
genes and genetic disorders.
Bioinformatics Search Engines
• The Entrez (NCBI) search engine retrieves
information from NCBI databases and can be
used to obtain other information including
publications (PubMed), 3D protein structures,
Online Mendelian Inheritance in Man…. A
tutorial can be found at:
– Entrez: Making use of its power
• The EMBL uses the ExPASy site, which utilises the
open source application Sequence Retrieval
System (SRS); a tutorial can be found at:
– SRS tutorial: quick tour
Other important information sources
• PubMed: literature research: journal articles/
conference proceedings/books etc.
– Search under many fields: keyword, author….
– Returns: journal articles/abstracts
– Two types: general/review.
• NCBI account: set up an NCBI account to manage
previous searches….
• BTEB PubMed search found at:
– http://www.ncbi.nlm.nih.gov/pubmed?term=BTEB&cmd=DetailsSearch
BTEB PubMed search result