Lecture Slides, Part 1 - UT Health Sciences Library and

NCBI FieldGuide
NCBI Molecular Biology Resources
A Field Guide
Part 1
February 14, 2006
University of Tennessee, Memphis - Health Sciences Center
Bethesda
Created in 1988 as a part of the National Library of Medicine at NIH
– Establish public databases
– Research in computational biology
– Develop software tools for sequence analysis
– Disseminate biomedical information
The National Center for
Biotechnology Information
NCBI Web Traffic
[Figure: users per day, 1998 to 2005 (axis 100,000 to 600,000), plotted alongside World Internet Users and US Internet Users; annotation: Christmas and New Year’s Day.]
[Figure: traffic by origin. Japan 6%, Italy 4%, Canada 3%, Germany 3%, United Kingdom 3%, Netherlands 2%, Spain 2%, Brazil 2%, Sweden 1%, Switzerland 1%, Belgium 1%, Other 14%, U.S. (.com, .net, .org, .gov, .us) 40%.]
Literature Databases
Part 1. The Databases
Part 2. Data Flow and Processing
Part 3. Querying and Linking the Data
Part 4. User Support
A part of the NCBI Bookshelf
The (ever expanding) Entrez System
[Diagram: Entrez at the center, linked to PubMed, PubMed Central, Journals, Books, NLM Catalog, OMIM, PubChem (Compounds, BioAssays, Substances), Structure, 3D Domains, CDD/CDART, Taxonomy, Protein, Nucleotide, Genome, GenomeProjects, Gene, HomoloGene, UniGene, UniSTS, SNP, GEO/GDS, GenSat, PopSet and Cancer Chromosomes.]
● A system of 29 linked databases
● A tool for finding biologically linked data
● A text search and retrieval engine
● A virtual workspace for manipulating large datasets
What is Entrez?
● Each record is assigned a UID.
– A “unique integer identifier” for internal tracking
● All Molecular Database entries are organized by
organism (Taxonomy Database).
● Each record is indexed by data fields.
– [author], [title], [organism], and many others
● Each record is given a Document Summary.
– a summary of the record’s content (DocSum)
● Each record is assigned links to biologically
related UIDs.
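Because each record is indexed by data fields, a search can be narrowed by tagging each term with a field name. The following is a minimal sketch in Python of composing such a query string. The helper name `build_entrez_query` is hypothetical, and joining terms with `AND` is an assumption about the search syntax that the slides do not spell out.

```python
def build_entrez_query(terms):
    """Join (value, field) pairs into a field-tagged query string.

    `terms` is a list of (value, field) tuples, e.g. ("zebrafish", "organism").
    Multi-word values are quoted so they are searched as a phrase.
    """
    parts = []
    for value, field in terms:
        value = value.strip()
        if " " in value:
            value = f'"{value}"'
        parts.append(f"{value}[{field}]")
    # Assumption: terms are combined with the Boolean operator AND.
    return " AND ".join(parts)


query = build_entrez_query([
    ("adenylosuccinate synthase", "title"),
    ("Homo sapiens", "organism"),
])
print(query)
```

The resulting string can be pasted into an Entrez search box or passed as the search term to a script.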
Entrez Databases
Examples of Database Integration at NCBI
[Diagram: links among PubMed (word weight), Taxonomy (phylogeny), Structure (MMDB 3-D structure; VAST), Genomes, Nucleotide sequences (BLASTn) and Protein sequences (BLASTp).]
Links
Follow links to related data in the same database or in others!
Hard Links: Curated links based on biology
for example:
• nucleotide → taxonomy (based on organism identifier)
• protein → domain relatives (based on domain assignment)
• domains → pubmed (based on supporting literature)
Soft Links: Pre-computed analyses
for example:
• nucleotide → related sequences (BLAST neighbors)
• protein → conserved domains (CDD/RPS-BLAST search)
• gene → map viewer (map position of annotated gene)
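Links like these can also be followed from a script through the ELink utility. The sketch below only builds the request URL for a nucleotide → taxonomy link and does not contact NCBI. The base URL and the database names `nuccore` and `taxonomy` are assumptions taken from the current E-utilities service rather than from these slides; the UID is the GI number shown for NM_001126.1 later in the deck.

```python
from urllib.parse import urlencode

# Assumed base URL of the E-utilities service (not given in the slides).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def elink_url(dbfrom, db, uids):
    """Build an ELink URL asking for records in `db` linked to `uids` in `dbfrom`."""
    params = {
        "dbfrom": dbfrom,
        "db": db,
        "id": ",".join(str(uid) for uid in uids),
    }
    return f"{EUTILS_BASE}/elink.fcgi?{urlencode(params)}"


# Which taxonomy record is linked to this nucleotide UID?
url = elink_url("nuccore", "taxonomy", [4557270])
print(url)
```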
Following Links
zebrafish
Types of Molecular Databases
• Primary Databases
– Raw and redundant data: submitted, “owned” and updated by experimentalists
• Examples: GenBank, SNP, GEO, PubChem Substance & BioAssay
• Derivative Databases
– Human-curated (compilation and curation of data)
• Examples: GEO Datasets, Structure & Literature databases
– Computationally-Derived
• Examples: UniGene, HomoloGene, PubChem Compound
– Combination
• Examples: RefSeq, Gene, Genome Assembly, Conserved Domain and Structure databases
Primary vs. Derivative Sequence Databases
[Diagram: labs and sequencing centers submit sequences to GenBank, which is updated ONLY by submitters. Curators and algorithms derive RefSeq, Genome Assembly and UniGene from those records; the derivative databases are updated continually by NCBI.]
GenBank: 1º Sequence Database
• Nucleotide only sequence database
• Archival in nature
• Each record is assigned a stable accession number
• Submission of GenBank Data to NCBI
– Direct submissions of individual records via Web (BankIt, Sequin)
– Batch submissions of bulk sequences via Email (EST, GSS, STS)
– FTP accounts for Sequencing Centers
• Three collaborating databases and other sources of data
The International Sequence Database Collaboration
[Diagram: GenBank (NCBI, NIH; searched through Entrez), DDBJ (CIB, NIG; searched through getentry) and EMBL (EBI; searched through SRS) each receive submissions and updates and exchange them with one another.]
GenBank
Release 151, December 2005
Records: 52,016,762
Nucleotides: 56,037,734,462
Species: >140,000
216 Gigabytes, 890 files
• full release every two months
• incremental and cumulative updates daily
• available only through internet
ftp://ftp.ncbi.nih.gov/genbank/
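As a quick sanity check on the Release 151 figures, the average record length follows directly from the two totals. A minimal sketch in Python; the variable names are illustrative only.

```python
# Totals quoted for GenBank Release 151 (December 2005).
records = 52_016_762
nucleotides = 56_037_734_462

# Mean number of nucleotides per record.
average_length = nucleotides / records
print(f"Average record length: {average_length:.0f} nucleotides")
```

The mean is a little over a thousand bases per record, which fits a database dominated by short bulk sequences such as ESTs.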
GenBank Divisions
Traditional
PRI (29) Primate
ROD (23) Rodent
PLN (17) Plant and Fungal
BCT (13) Bacterial/Archaeal
VRT (10) Other Vertebrate
INV (9) Invertebrate
VRL (5) Viral
MAM (2) Mammalian
PHG (1) Phage
SYN (1) Synthetic
UNA (1) Unannotated
• Direct Submissions (Sequin/BankIt)
• Accurate (~1 error per 10,000 bp)
• Well characterized
• Organized by taxonomy
Bulk
EST (464) Expressed Sequence Tag
GSS (164) Genome Survey Sequence
HTG (69) High Throughput Genomic
PAT (19) Patent sequences
STS (14) Sequence Tagged Site
HTC (10) High Throughput cDNA
ENV (3) Environmental Samples
• From sequencing projects
• Batch submissions (ftp/email)
• Inaccurate
• Poorly Characterized
• Organized by sequence type
Derivative Sequence Database
• The curated “best representative” sequences
• Standardized nomenclature and record structure
• Added annotation (references, sequence features)
RELEASE 15 IS NOW AVAILABLE ON THE FTP SITE!
RefSeq Curation Processes
Curated genomic DNA
(NC, NT, NW)
Scanning....
Curated Model mRNA (XM)
Model protein (XP)
(XR)
Curated mRNA (NM)
(NR)
Protein (NP)
RefSeq Nucleotide
LOCUS       ADSS    1368 bp    mRNA    linear    PRI 27-AUG-2002
DEFINITION  Homo sapiens adenylosuccinate synthase (ADSS), mRNA.
ACCESSION   NM_001126
VERSION     NM_001126.1 GI:4557270

RefSeq Protein
LOCUS       ADSS    455 aa    linear    PRI 27-AUG-2002
DEFINITION  adenylosuccinate synthase; Adenylosuccinate synthetase
            (Ade(-)H-complementing) Homo sapiens.
ACCESSION   NP_001117
VERSION     NP_001117.1 GI:4557271
DBSOURCE    REFSEQ: accession NM_001126.1
COMMENT     REVIEWED REFSEQ: This record has been curated by NCBI
            staff. The reference sequence was derived from X66503.1.
            Summary: Adenylosuccinate synthetase catalyzes the first
            committed step in the conversion of IMP to AMP.

X records: Genome Annotation & Inferred or Predicted
vs
N records: Provisional, Reviewed or Validated
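The two-letter prefix of a RefSeq accession therefore tells you both the molecule type and whether the record is an N record or an X (model) record. Here is a minimal sketch of a lookup based on the prefixes named on these slides. The function name and the returned dictionary layout are hypothetical, and the "non-coding RNA" description for NR and XR is my own reading, since the slides list those prefixes without a label.

```python
import re

# Prefix -> (molecule, record class). Prefixes are the ones shown on the slides;
# the NR/XR molecule description is an assumption, not stated in the deck.
REFSEQ_PREFIXES = {
    "NC": ("genomic DNA", "N"),
    "NT": ("genomic DNA", "N"),
    "NW": ("genomic DNA", "N"),
    "NM": ("mRNA", "N"),
    "NR": ("non-coding RNA", "N"),
    "NP": ("protein", "N"),
    "XM": ("mRNA", "X"),
    "XR": ("non-coding RNA", "X"),
    "XP": ("protein", "X"),
}

# Accession with an optional ".version" suffix, e.g. NM_001126.1
ACCESSION_RE = re.compile(r"^([A-Z]{2})_(\d+)(?:\.(\d+))?$")


def classify_refseq(accession):
    """Split a RefSeq accession into prefix, version, molecule and record class."""
    match = ACCESSION_RE.match(accession.strip())
    if match is None or match.group(1) not in REFSEQ_PREFIXES:
        raise ValueError(f"not a recognized RefSeq accession: {accession!r}")
    prefix, number, version = match.groups()
    molecule, record_class = REFSEQ_PREFIXES[prefix]
    return {
        "accession": f"{prefix}_{number}",
        "version": int(version) if version is not None else None,
        "molecule": molecule,
        "record_class": record_class,
    }


print(classify_refseq("NM_001126.1"))
print(classify_refseq("XM_049992.8"))
```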
Curated RefSeq Records
Third Party Annotation (TPA) Database
NCBI now accepts the submission of new annotations of existing GenBank sequences.
• Submissions must be published in a peer-reviewed journal.
• Facilitates the annotation of sequences by experts.
Examples of sequences appropriate for TPA are:
– Annotation of features on gene and/or mRNA sequences
– Assembled “full length” genes and/or mRNAs
What should not be submitted to TPA?
– Synthetic constructs (such as cloning vectors) that use well-characterized,
publicly available genes, promoters, or terminators
– Updates or changes to existing sequence data
– Sequence annotations without experimental evidence
“Best representative” (reference) sequences
Standardized nomenclature and record structure
Added annotation (references, sequence features)
Mapping Genome Data on an Assembly:
Genome Sequence
(RefSeq: NC, NT, NW)
Transcript regions & ORFs
(RefSeq: NM/NP, XM/XP)
Markers (STS)
Polymorphisms (SNP)
ESTs/Exons (UniGene)
Complete Genomes (as of January 2006)
• Organelles:
– Mitochondria (806)
– Plastids (50)
– Plasmids (850)
– Nucleomorphs (3)
• Viruses (2260)
• Archaebacteria (25)
• Eubacteria (269)
• Eukaryotes (19 complete/83 assemblies)
New!
Genome Projects
Simple Genomes
• Full chromosomal sequences are provided
• Genes are annotated
• The annotation can be shown graphically and linked to sequence records
RefSeq Chromosomes: NC_
Callouts on the record: Genome sequence; Annotation of Gene, CDS, and other features (mutL)
LOCUS       NC_000913    4639221 bp    DNA    circular    BCT 30-JUL-2003
DEFINITION  Escherichia coli K12, complete genome.
ACCESSION   NC_000913
VERSION     NC_000913.1 GI:16127994
KEYWORDS    .
SOURCE      Escherichia coli K12.
  ORGANISM  Escherichia coli K12
            Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales;
            Enterobacteriaceae; Escherichia.
REFERENCE   1 (bases 1 to 4639221)
  AUTHORS   Blattner,F.R., Plunkett,G. III, Bloch,C.A., Perna,N.T., Burland,V.,
            Riley,M., Collado-Vides,J., Glasner,J.D., Rode,C.K., Mayhew,G.F.,
            Gregor,J., Davis,N.W., Kirkpatrick,H.A., Goeden,M.A., Rose,D.J.,
            Mau,R. and Shao,Y.
  TITLE     The complete genome sequence of Escherichia coli K12.
  JOURNAL   Science 277 (5331), 1453-1474 (1997)
  MEDLINE   97426617
  PUBMED    9278503
REFERENCE   2 (bases 1 to 4639221)
  AUTHORS   Blattner,F.R.
  TITLE     Direct submission
  JOURNAL   Submitted (16-JAN-1997) Guy Plunkett III, Laboratory of Genetics,
            University of Wisconsin, 445 Henry Mall, Madison, WI 53706, USA.
            E-mail ecoli@genetics.wisc.edu Phone: 608-262-2543 Fax:
     gene   3954631..3956478
            /gene="mutL"
            /locus_tag="b4170"
            /note="synonym: mut-25"
     CDS    3954631..3956478
            /gene="mutL"
            /locus_tag="b4170"
            /function="methyl-directed mismatch repair"
            /codon_start=1
            /transl_table=11
            /product="MutL"
            /protein_id="NP_418591.1"
            /db_xref="GI:16131992"
            /translation="MPIQVLPPQLANQIAAGEVVERPASVVKELVENSLDAGATRIDI
            DIERGGAKLIRIRDNGCGIKKDELALALARHATSKIASLDDLEAIISLGFRGEALASI
            SSVSRLTLTSRTAEQQEAWQAYAEGRDMNVTVKPAAHPVGTTLEVLDLFYNTPARRKF
            LRTEKTEFNHIDEIIRRIALARFDVTINLSHNGKIVRQYRAVPEGGQKERRLGAICGT
            AFLEQALAIEWQHGDLTLRGWVADPNHTTPALAEIQYCYVNGRMMRDRLINHAIRQAC
            EDKLGADQQPAFVLYLEIDPHQVDVNVHPAKHEVRFHQSRLVHDFIYQGVLSVLQQQL
            ETPLPLDDEPQPAPRSIPENRVAAGRNHFAEPAAREPVAPRYTPAPASGSRPAAPWPN
            AQPGYQKQQGEVYRQLLQTPAPMQKLKAPEPQEPALAANSQSFGRVLTIVHSDCALLE
            RDGNISLLSLPVAERWLRQAQLTPGEAPVCAQPLLIPLRLKVSAEEKSALEKAQSALA
            ELGIDFQSDAQHVTIRAVPLPLRQQNLQILIPELIGYLAKQSVFEPGNIAQWIARNLM
            SEHAQWSMAQAITLLADVERLCPQLVKTPPGGLLQSVDLHPAIKALKDE"
BASE COUNT  978672 a 1011074 c 997153 g 974742 t
ORIGIN
        1 cgtcttcatt gtcagacagc agaatttgta cgcgctgttc ggcttgttgt aatttggcct
        [sequence display continues on the slide through base 1020]
Complex Genomes
• Sequences are provided complete or we help assemble
• Heavy annotation: Genes, transcript regions & ORFs, sequence variations & markers, clones, ESTs, etc.
• The annotation can be shown graphically and linked to other databases using the MapViewer
New! A database for retrieval and analysis of karyotype data: Cancer Chromosomes
RefSeq Records (Contig: NT_ & Chromosome: NC_)
Callouts on the record: Annotation of Gene, mRNA, CDS, and other features; Ordering of draft sequences
1: NT_034400. Homo sapiens chro...[gi:51458694] Links
Click here to see all features and the sequence of this contig record.
LOCUS       NT_034400    1065823 bp    DNA    linear    CON 19-AUG-2004
DEFINITION  Homo sapiens chromosome 1 genomic contig.
ACCESSION   NT_034400
VERSION     NT_034400.3 GI:51458694
KEYWORDS    .
SOURCE      Homo sapiens
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata;
            Euteleostomi; Mammalia; Eutheria; Primates; Catarrhini;
            Hominidae; Homo.
REFERENCE   1 (bases 1 to 1065823)
  AUTHORS   International Human Genome Sequencing Consortium.
  TITLE     The DNA sequence of Homo sapiens
  JOURNAL   Unpublished (2003)
COMMENT     GENOME ANNOTATION REFSEQ: Features on this sequence have
            been produced for build 35 version 1 of the NCBI's genome
            annotation [see documentation].
            On Aug 19, 2004 this sequence version replaced gi:27478327.
            The DNA sequence is part of the third release of the
            finished human reference genome. It was assembled from
            individual clone sequences by the Human Genome Sequencing
            Consortium in consultation with NCBI staff.
            COMPLETENESS: not full length.
     gene   complement(2548206..2591802)
            /gene="ADSS"
            /db_xref="LocusID:159"
            /db_xref="MIM:103060"
     mRNA   complement(join(2548206..2549349,2550998..2551147,
            2555692..2555789,2557339..2557463,2558471..2558625,
            2559881..2560007,2562526..2562607,2563644..2563751,
            2564012..2564078,2572236..2572286,2576516..2576584,
            2577357..2577459,2591326..2591802))
            /gene="ADSS"
            /product="adenylosuccinate synthase"
            /note="Derived by automated computational analysis using gene
            prediction method: BLAST. Supporting evidence includes
            similarity to: 3 mRNAs"
            /transcript_id="XM_049992.8"
            /db_xref="GI:22045950"
            /db_xref="LocusID:159"
            /db_xref="MIM:103060"
CONTIG      join(AL139152.7:1..55543,AL596177.4:1998..91084,
            AL356378.17:1999..202955,AL391904.14:2001..68222,
            AL590667.7:2001..175494,AL359207.7:2001..112707,
            AL365260.11:2001..114412,complement(AL445591.10:1..138092),
            BX537254.7:2001..121309)
//
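The mRNA feature on this contig record is located with a `complement(join(...))` expression, one coordinate pair per exon. The sketch below shows one way to read such a location in Python and compute the spliced transcript length. It is a deliberately small parser, not a full implementation of the feature-table location syntax: it only collects `start..end` pairs and a top-level `complement`, and it ignores single-base positions and partial markers. The function names are hypothetical.

```python
import re


def parse_location(location):
    """Return (strand, [(start, end), ...]) for a simple feature location.

    Handles plain ranges, join(...) and a leading complement(...).
    """
    compact = "".join(location.split())  # drop line breaks and spaces
    strand = "-" if compact.startswith("complement(") else "+"
    spans = [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", compact)]
    return strand, spans


def spliced_length(spans):
    """Total number of bases covered by the listed spans (1-based, inclusive)."""
    return sum(end - start + 1 for start, end in spans)


# Location of the ADSS mRNA feature as shown on the NT_034400 record.
ADSS_MRNA = """complement(join(2548206..2549349,2550998..2551147,
2555692..2555789,2557339..2557463,2558471..2558625,
2559881..2560007,2562526..2562607,2563644..2563751,
2564012..2564078,2572236..2572286,2576516..2576584,
2577357..2577459,2591326..2591802))"""

strand, exons = parse_location(ADSS_MRNA)
print(strand, len(exons), spliced_length(exons))
```

For this record the parser finds 13 exons on the minus strand, 2,756 bases in total, while the gene feature `complement(2548206..2591802)` spans 43,597 bases of the contig.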
Higher Genome MapViews
Gene Expression Databases
New! A new database for localization of proteins in Mouse Brains: GenSat
A new database for information on Expression Reagents: Probe
Submit and update data
Query the database:
• gene identifiers
• field information
• sequence
Browse datasets
Download data
GPL: Platform descriptions. Submitted by Manufacturer*
GSM: Raw/processed spot intensities from a single slide/chip. Submitted by Experimentalists
GSE: Grouping of slide/chip data, “a single experiment”. Submitted by Experimentalists
GDS: Grouping of experiments. Curated by NCBI
Searchable through Entrez GEO and Entrez GEO Datasets
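The three-letter prefix of a GEO accession identifies which of these record types it refers to. A minimal sketch of that lookup in Python, using only the descriptions given on the slide; the function name `geo_record_type` is hypothetical.

```python
import re

# Record types and descriptions as listed on the GEO slide.
GEO_RECORD_TYPES = {
    "GPL": "Platform descriptions",
    "GSM": "Raw/processed spot intensities from a single slide/chip",
    "GSE": "Grouping of slide/chip data (a single experiment)",
    "GDS": "Grouping of experiments",
}


def geo_record_type(accession):
    """Return (prefix, number, description) for a GEO accession such as GDS177."""
    match = re.fullmatch(r"(GPL|GSM|GSE|GDS)(\d+)", accession.strip().upper())
    if match is None:
        raise ValueError(f"not a GEO accession: {accession!r}")
    prefix, number = match.group(1), int(match.group(2))
    return prefix, number, GEO_RECORD_TYPES[prefix]


print(geo_record_type("GDS177"))
```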
GDS177: CMV infection of HFF cells
as of January 2006
SEVERAL Organisms
Expression oriented
UniGene Collections
A Cluster of ESTs:
Arabidopsis serine protease
query
5’ EST hits
3’ EST hits
New!
EST-based Expression Profiles
Pr196507.1
Links
Ribonucleic acid probe (riboprobe) Prnp for Mus musculus gene prion protein (Prnp). Has
been used in the GENSAT project for in situ hybridization.
Probe: Expression probes
Probe: siRNAs & shRNAs
Pr186482.1, Pr001034449.1 (Links)
Small interfering RNA (siRNA) probe for Mus musculus gene prion protein (Prnp). Has been used for RNA interference (RNAi).
Small hairpin RNA (shRNA) probe V2MM_66187 for Mus musculus gene prion protein (Prnp). Developed for RNA interference (RNAi). Reagent is available from Open Biosystems.
Sequences & Structures
Linking Protein Sequence, Structure and Function
Protein: Protein sequences
Conserved Domain (CDD): Conserved functional domains in proteins represented by a PSSM (RPS-BLAST, CDART)
Structure (MMDB): Experimentally-derived 3D structure records from PDB
3D Domain: Compact structural domains of protein folds
Sequence-based Neighbors: Domain Neighbors
Conserved Domains: conserved sequence elements that perform common functions
Curation of protein multiple sequence alignments
with known similar function by conversion
to Position-Specific Scoring Matrices
10
20
30
40
50
60
....*....|....*....|....*....|....*....|....*....|....*....|
consensus
1FGI A
1BYG A
gi 125135
gi 125702
gi 1174437
1
1
1
1
1
1
KWEIPREDLTLGKKLGEGAFGEVYKGTLKGkgd---nkSIDVAVKTLKEDASEeqIKEFL
aWEIPRESLRLEVKLGQGCFGEVWMGTWNG--------TTRVAIKTLKPGTMS--PEAFL
RWELPRDRLVLgkPLGEGAFGQVYLAEAIglgkdkpnrvTKVAVKMLKSDAtedkLSLDI
GWALNMKELKLlqTIGKGEFGDVMLGDYRg---------NKVAVKCIKNDAt---AQAFL
KYEIPRTDLTLkhKLGGGQYGEVYEGVWKky-------sLTVAVKTLKEDTm--eVEEFL
KWEIPRSELTIlrKLGRGNFGEVFYGKWRn--------sIDVAVKTLREGTm--sTAAFL
57
311
74
62
284
325
“Reverse-Position Specific” Sequence Comparisons (RPS-BLAST)
a.k.a.
“Conserved Domain Database” (CDD) Search
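To make the conversion from a multiple sequence alignment to a Position-Specific Scoring Matrix concrete, here is a small sketch in Python. It is a textbook-style log-odds calculation with a flat pseudocount and a uniform background, chosen for brevity; it is not the procedure NCBI uses to build CDD models. The input is the first eight aligned columns of the five non-consensus rows in the alignment above, upper-cased for simplicity, and all function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# First eight aligned columns of five rows from the slide's alignment.
ALIGNMENT = ["AWEIPRES", "RWELPRDR", "GWALNMKE", "KYEIPRTD", "KWEIPRSE"]


def build_pssm(alignment, pseudocount=1.0, background=1 / 20):
    """Return one {residue: log2-odds score} dict per alignment column."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all rows must have the same length")
    denominator = len(alignment) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for column in range(length):
        counts = Counter(row[column] for row in alignment)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / denominator) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score(pssm, segment):
    """Sum the per-position scores of `segment` against the matrix."""
    return sum(column[aa] for column, aa in zip(pssm, segment))


pssm = build_pssm(ALIGNMENT)
print(round(score(pssm, "KWEIPRED"), 2), round(score(pssm, "GGGGGGGG"), 2))
```

A segment resembling the alignment scores well above an unrelated one, which is the idea behind scanning a query protein against a library of such matrices.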
“Conserved Domain Architecture Retrieval Tool”
(CDART)
Modular Architecture of Domains
• Cartoon descriptions of protein domain organization
on the primary sequence
• Allows for comparison with other proteins with the same Domain
Sequence-based Neighbors:
Domain Relatives
NCBI Conserved Domain Summary
CDART: Conserved Domain Architecture Retrieval Tool
Entrez Structure: Molecular Modeling Database
• Derived from experimentally determined PDB records
• Data is added to PDB records including:
– Addition of explicit chemical bonding information
– Validation and indexing of sequence
– Inclusion of Taxonomy, Citation, and other information
– Conversion to ASN.1 data description language
• Searching the Structure Databases:
– Keyword search by Entrez
– Sequence search by BLAST or BLink
– Domain search by CDD/RPS-BLAST
– Structure search by VAST
Structure Summary Page
Structure-based
Neighbors
to get the Cn3D viewer
Sequence-based Neighbors: Conserved Domains (CDD/RPS-BLAST)
Structure-based Domains: (3D Domains)
PubChem: Compound, Substance, BioAssay
New! Entrez PubChem
PC Compound: Derived database of known chemicals from PC Substance records (example shown: zidovudine)
PC Substance: Primary database of chemical samples
PC BioAssay: Primary database of bioactivity screens of samples in PC Substance
Summary pages
of curated information about genetic loci
for organisms in the RefSeq project.
►Graphics
►Gene information
►Bibliography (PubMed links)
►General gene information
►NCBI Reference Sequences
►Related sequences
►Additional Links
The Gene Summary Database
glucose-6-phosphate dehydrogenase: gene symbols by organism
H.sapiens & B.taurus: G6PD
R.norvegicus: G6pdx
M.musculus: G6pd1, G6pdx
D.melanogaster: Zw
A.thaliana: At5g35970
S.pombe: SPAC3C7.13c, SPAC9.01, SPCC794.01c
B.anthracis: BA_3932
H.pylori: HP1101
E.coli, Salmonella, Shigella, Yersinia, Neisseria…: zwf
GENE SYMBOL & name [Organism]
►Bibliography (PubMed links)
GeneRifs
►General gene information:
Gene Ontology
Homology (Mouse, Rat, Human)
Phenotypes
Sequence Tagged Sites
Pathways
►NCBI Reference Sequences
mRNA sequence
Source sequence
Product
Conserved Domains
►Related sequences
(genomic, mRNA & protein)
►Additional Links
Default Display
G6PD glucose-6-phosphate dehydrogenase [Homo sapiens]
►Gene information:
Gene type
Gene name
Gene description
RefSeq status
Organism
Lineage
Gene Aliases
Summary
General protein information
►Graphics:
Transcripts and products
Genomic context
G6PD glucose-6-phosphate dehydrogenase [Homo sapiens]
Gene
Default
Table Display
G6PD glucose-6-phosphate dehydrogenase [Homo sapiens]
SNP: GeneView
FTP Downloads
NCBI Toolbox: In-house source code useful for incorporating NCBI-like functionality into your own programs.
Three main parts: Data Model, Data Encoding
and Programming Libraries.
• Examples: BLAST, Cn3D, Sequin, Data format conversion scripts
http://www.ncbi.nlm.nih.gov/IEB/ToolBox/index.cgi
E-Utilities: Guidelines for Entrez “URL calls” used to access data.
Designed for use in scripts.
• Examples: ESearch, EPost, ESummary, EFetch and ELink
http://www.ncbi.nih.gov/entrez/query/static/eutils_help.html
Caution: Overuse may result in blocked IPs!
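A script that uses the E-Utilities is essentially a series of URL calls with pauses between them. The sketch below builds ESearch and EFetch URLs and provides a small throttle so that requests are spaced out. It does not contact NCBI. The base URL, the `nuccore` database name and the half-second default interval are assumptions of mine and are not taken from these slides; check NCBI's current usage guidelines before running anything in bulk. `eutils_url` and `Throttle` are hypothetical names.

```python
import time
from urllib.parse import urlencode

# Assumed base URL of the E-utilities service (the slides give an older help page).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def eutils_url(utility, **params):
    """Build the URL for one E-utility call, e.g. esearch or efetch."""
    return f"{EUTILS_BASE}/{utility}.fcgi?{urlencode(params)}"


class Throttle:
    """Enforce a minimum interval (in seconds) between successive calls."""

    def __init__(self, min_interval=0.5):  # arbitrary default, an assumption
        self.min_interval = min_interval
        self._last = None

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


# Search for records, then fetch one record as a GenBank flat file.
search_url = eutils_url("esearch", db="nuccore", term="zebrafish[organism]", retmax=5)
fetch_url = eutils_url("efetch", db="nuccore", id="NM_001126", rettype="gb", retmode="text")

throttle = Throttle()
for url in (search_url, fetch_url):
    throttle.wait()  # pause here before each real request
    print(url)
```

Calling `throttle.wait()` immediately before each request keeps the script from sending calls back to back.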
Help for Programmers
To come in Part 2:
• Searching Records with Entrez
• Searching Sequences with BLAST
• Searching Structures with VAST
• An Integrated Example
Intermission