Accession number

Access to Sequence Data & Literature Information
LESSON 2

BIOINFORMATICS APPLICATIONS
1. Information Search in databases (DBs)
   Scientific literature: PubMed ……
   Gene: GenBank, EMBL-EBI, DDBJ ……
   Protein: UniProt, Protein Data Bank ……
2. Data Analysis using programs (PGs)
   Sequence similarity search: BLAST, FASTA ……

Bioinformatics Portals
1. NCBI (http://www.ncbi.nlm.nih.gov/)
2. EBI (http://www.ebi.ac.uk/)
3. ExPASy (http://www.expasy.org/)

Databases?
• Collections of data
• Sophisticated arrangements of storage
• Data in a structured manner
• Can be manipulated by software
Backbone of bioinformatics research
• Databases can be accessed locally or online and often link to each other
• The data come from different sources

GenBank
DATABASE OF MOST KNOWN NUCLEOTIDE AND PROTEIN SEQUENCES (http://www.ncbi.nlm.nih.gov/genbank/)

Growth of GenBank & WGS
Dec 1982: 606
Dec 2013: 169331407

Top organisms in GenBank
Nucl. Acids Res. (1 January 2014) 42 (D1): D32-D37.

insdc.org - EBI
• Primary sequence databases hold many millions of nucleic acid sequence records.

Access to Information
Nucleotide DATABASES
Nucleotide (http://www.ncbi.nlm.nih.gov/nucleotide/)
Gene (http://www.ncbi.nlm.nih.gov/gene)

Protein-coding region
• Start codon: ATG → Met
• Stop codon: TAA, TAG, TGA
• Coding Segments (= CDS)
• Open Reading Frame (= ORF)
• Untranslated Region (= UTR)

Accession number
• Label for a sequence
• String of letters and/or numbers that corresponds to a molecular sequence.
• DNA sequences and other molecular data are tagged with accession numbers that are used to identify a sequence or other record relevant to molecular data.

Protein coding segment: start, end
To use the nucleotide sequence as input for other programs (PGs).
(FASTA: sequence-alignment-and-database-scanning program)
• Versatile, compact format: one header line followed by a string of nucleotides or amino acids, written in capital letters using the single-letter codes

>My Sequence Name
ARCGTCRGCKINTANDCKINTANDARCGCKINTANDRGCKINT
ANDNTANDARCGCKINTANDARNDBCQEDNBNCDNDNQENNDN

• Capital letters for the one-letter codes
• No space between codes
• Courier font for easy alignment

RefSeq
• RefSeq provides an expertly curated accession number that corresponds to the most stable, agreed-upon “reference” version of a sequence.
• RefSeq identifiers include the following formats:

Complete genome        NC_######
Complete chromosome    NC_######
Genomic contig         NT_######
mRNA (DNA format)      NM_######    e.g. NM_000518
Protein                NP_######    e.g. NP_000509

Gene, mRNA, Protein

“Gene” at NCBI offers a wealth of information
• Genomic context
• Bibliography
• Phenotypes
• Gene Ontology (organizing principles of biological process, molecular function, cellular component)
• Reference sequences
• Additional (non-RefSeq) sequences
• Many, many links to NCBI resources
• Many, many links to external resources

Access to Information
Protein DATABASES
PIR (http://pir.georgetown.edu/)
Protein (http://www.ncbi.nlm.nih.gov/protein/)
EBI Proteins (http://www.ebi.ac.uk/services/proteins)
ExPASy (http://www.expasy.org/)
UniProt (http://www.uniprot.org/): Universal Protein Resource
• TrEMBL: Automatic Translation of European Molecular Biology Laboratory nucleotide sequences

Ontology
• A logic-based organizational structure for knowledge.
• A set of field-specific descriptors that lets communities share the same concepts and definitions for specific terms.
• Scientific data sharing made easy
• Integration of the complex data
• Bridges the gap between different biological communities

Access to Information
GENOME BROWSERS
Ensembl (http://www.ensembl.org/)
UCSC Genome Browser (http://genome.ucsc.edu/cgi-bin/hgGateway)
Map viewer (http://www.ncbi.nlm.nih.gov/mapview/)

GENOME?
• The total genetic content (information) contained in a full set of chromosomes.

Access to Information
Scientific Literature Search
PubMed (http://www.ncbi.nlm.nih.gov/pubmed)
Bookshelf (http://www.ncbi.nlm.nih.gov/books/)
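
Worked example (sequence slides)
The FASTA format, the RefSeq accession prefixes, and the start/stop codon rules from the earlier slides can be tried out with a few lines of Python. This is only a minimal sketch under stated assumptions: the function names (parse_fasta, refseq_type, first_orf) and the toy sequence are made up for illustration and are not part of any NCBI tool, only the four RefSeq prefixes listed in the slides are recognized, and the ORF search looks at the forward strand only.

```python
# Hypothetical helpers for illustration; not part of NCBI or any library.

# RefSeq prefixes exactly as listed in the slides.
REFSEQ_PREFIXES = {
    "NC_": "complete genome or chromosome",
    "NT_": "genomic contig",
    "NM_": "mRNA",
    "NP_": "protein",
}

# Stop codons from the slides; the start codon is ATG (Met).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            # Sequence lines are joined without spaces, in capital letters.
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def refseq_type(accession):
    """Map an accession to a record type by its prefix, or None if unknown."""
    return REFSEQ_PREFIXES.get(accession[:3].upper())


def first_orf(dna):
    """Return the first forward-strand ORF (ATG through a stop codon), or None."""
    dna = dna.upper()
    start = dna.find("ATG")
    while start != -1:
        # Read codon by codon, in frame with this ATG.
        for i in range(start + 3, len(dna) - 2, 3):
            if dna[i:i + 3] in STOP_CODONS:
                return dna[start:i + 3]
        start = dna.find("ATG", start + 1)
    return None


if __name__ == "__main__":
    toy = ">toy_sequence\nCCATGGCT\nTGCTAAGG\n"  # made-up sequence
    for name, seq in parse_fasta(toy):
        print(name, seq, first_orf(seq))
    print(refseq_type("NM_000518"), refseq_type("NP_000509"))
```

The toy record is split over two sequence lines on purpose, to show that a FASTA parser has to join lines before searching for codons.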
