NCBI FieldGuide - UT Health Sciences Library and

A Field Guide
part 2
February 14, 2006
UT-Health Science Center
NCBI FieldGuide
National Center for Biotechnology Information
The Flatfile Format
Header
Feature Table
Sequence
GenBank Records
A Typical GenBank Record
LOCUS       NM_019570    4279 bp    mRNA    linear    INV 28-OCT-2004
DEFINITION  Mus musculus REV1-like (S. cerevisiae) (Rev1l), mRNA    (= Title)
ACCESSION   NM_019570
VERSION     NM_019570.3  GI:50811869
KEYWORDS    .
GenBank Record: Feature Table
GenPept identifier
GenBank Record: Feature Table, cont.
GenBank Record: sequence
Field                 Indexed Terms
[primary accession]   NM_001012399 [accn]
[title]               Bos taurus hemochromatosis (hfe), mRNA.
[organism]            Bos taurus [orgn]
[sequence length]     1168
[modification date]   2005/02/19 [mdat]
[properties]          biomol mrna [prop]
                      gbdiv mam
                      srcdb refseq
Indexing for Nucleotide UID 59958365
HFE
Global Entrez Search: HFE
137 records
[Title]
Not
HFE
Entrez Nucleotide: HFE
hfe[title] AND human[orgn]
42 records
Smarter Query
Curated HFE
splice variants
(11 total)
(con’t)
Primary data
hfe[title] AND human[orgn]
Preview/Index
Gateway to Advanced Searches
Preview/Index
Properties
srcdb
Preview/Index: Properties, srcdb
…AND srcdb refseq[Properties]
Preview/Index: Properties, srcdb
…AND srcdb ddbj/embl/genbank[Properties]
Preview/Index: Properties, srcdb
Query                                          Records
#1  hfe                                        137
#2  hfe[title] AND human[orgn]                  42
#3  #2 AND srcdb refseq[prop]                   11
#4  #2 AND srcdb ddbj/embl/genbank[prop]        31
#5  #4 AND gbdiv pri[prop]                      29   (Primate division)
#6  #4 AND gbdiv est[prop]                       2   (EST division)
Database Queries
Query                              Records
#1  hfe                            116
#2  hfe[title] AND human[orgn]      42
#3  #2 AND biomol mrna[prop]        29   (cDNA)
#4  #2 AND biomol genomic[prop]     13   (Genomic DNA)
Molecule Queries
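The numbered queries above narrow an earlier result set by ANDing on a field-restricted term. The sketch below shows the same composition in Python. It only builds a request URL for the NCBI E-utilities `esearch` service and sends nothing over the network. The helper names `restrict` and `esearch_url` are made up for this illustration, and using `db=nucleotide` with these terms is an assumption based on the slides, not something the deck itself shows.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities esearch service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def restrict(query, term, field):
    """AND a field-restricted term onto an existing query (hypothetical helper)."""
    return f"({query}) AND {term}[{field}]"


def esearch_url(db, term):
    """Build, but do not send, an esearch request URL (hypothetical helper)."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term})


base = "hfe[title] AND human[orgn]"                  # query #2 on the slide
mrna = restrict(base, "biomol mrna", "prop")         # corresponds to query #3
genomic = restrict(base, "biomol genomic", "prop")   # corresponds to query #4
url = esearch_url("nucleotide", mrna)

print(mrna)
print(genomic)
print(url)
```

Wrapping the earlier query in parentheses keeps the added term from binding to only the last clause, which mirrors what the `#2 AND ...` history syntax does in the web interface.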
More Queries…
Entrez Nucleotide
Reviewed RefSeqs with transcript variants:
srcdb refseq reviewed[prop] AND transcript[title] AND variant[title]
Entrez Gene
Topoisomerase genes from Archaea:
topoisomerase[gene name] AND archaea[organism]
Genes on human chromosome 2 with OMIM links:
2[chromosome] AND human[organism] AND "gene omim"[filter]
Membrane proteins linked to cancer:
"integral to plasma membrane"[gene ontology] AND cancer[dis]
Fields are database-specific
Genomic Biology
UniGene
E-PCR
Map Viewer
Trace Archive
Genome Resources
Genomic Biology
Gen Biol: Gen Resources
Map Viewer – Genome Annotation Updates
Genome Projects: microb
13 Eukaryotic Genome Sequencing Projects Selected: Complete - 0, Assembly - 2, In Progress - 11
Gene-oriented clusters of expressed sequences
• Automatic clustering using MegaBlast
• Each cluster represents a unique gene
• Informed by genome hits
• Information on tissue types and map locations
• Useful for gene discovery and selection of mapping reagents
UniGene
A Cluster of ESTs
query
5’ EST hits
3’ EST hits
UniGene Collections
Species
UniGene
UniGene Hs build 188
UniGene Cluster Hs.95351
Lipase, hormone-sensitive (LIPE)
UniGene Cluster Hs.95351: expression
UniGene Cluster Hs.95351: seqs
Get Sequences
ftp://ftp.ncbi.nih.gov/repository/UniGene/Homo_sapiens/
web page
Genomic sequence here
E-PCR
Options
Results
reverse e-pcr
Gene
STS
LY6G6D: lymphocyte antigen 6 complex, locus G6D
List View
Human MapViewer
adar
MapViewer: Human ADAR
5’ UTR
MV Hs ADAR
3’ UTR
--Sequence maps--
Ab initio
Assembly
Repeats
BES_Clone
Clone
NCI_Clone
Contig
Component
CpG island
dbSNP haplotype
Fosmid
GenBank_DNA
Gene
Phenotype
SAGE_Tag
STS
TCAG_RNA
Transcript (RNA)
Hs_UniGene
Hs_EST
Mm_UniGene
Mm_EST
Rn_UniGene
Rn_EST
Ssc_UniGene
Ssc_EST
Bt_UniGene
Bt_EST
Gga_UniGene
Gga_EST
Variation
--Cytogenetic maps--
Ideogram
FISH Clone
Gene_Cytogenetic
Mitelman Breakpoint
Morbid/Disease
--Genetic Maps--
deCODE
Genethon
Marshfield
--RH maps--
= SNP
GeneMap99-G3
GeneMap99-GB4
NCBI RH
Stanford-G3
TNG
Whitehead-RH
Whitehead-YAC
Maps & Options
MapViewer
UniGene
Repeats
Gene
Component
Phenotype
Gene
Variation
Maps & Options
Mouse
ADAR
Human
ADAR
Chimp
ADAR
Trace Archive Page
Ciona savignyi Traces
Potential access to sequences NOT yet in GenBank
Trace Archive BLAST Page
Basic Local Alignment Search Tool
BLAST Web Searches, 2005
200,000
Nucleotide or protein: Related Sequences
BLAST link: BLink
Transcript clusters: UniGene
Protein homologs: HomoloGene
Precomputed BLAST Services
Link to Related Sequences
Most similar
Least similar
Related Sequences
BLink (BLAST Link)
Best hits
3D structures
CDD-Search
BLink Output
Fast
- heuristic approach based on Smith-Waterman
Local alignments
Statistical significance
- Expect value
Versatile
- blastn, blastp, blastx, tblastn, tblastx, rps-blast, psi-blast
- www, standalone, and network clients
Why Is BLAST So Popular?
Seq 1
Seq 2
Global alignment
Seq 1
Seq 2
Local alignment
Global vs Local Alignment
Seq1: WHEREISWALTERNOW (16aa)
Seq2: HEWASHEREBUTNOWISHERE (21aa)

Global
Seq1:  1   W--HEREISWALTERNOW     16
           W  HERE
Seq2:  1 HEWASHEREBUTNOWISHERE    21

Local
Seq1:  1 W--HERE  5
         W  HERE
Seq2:  3 WASHERE  9

Seq1:  1 W--HERE  5
         W  HERE
Seq2: 15 WISHERE 21
Global vs Local Alignment
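To make the contrast concrete, here is a minimal sketch of the two scoring schemes in Python: Needleman-Wunsch for the global case and Smith-Waterman for the local case. The scoring values (match +1, mismatch -1, linear gap -1) are assumptions chosen for illustration and are not taken from the slides; BLAST itself uses a heuristic and does not fill the full table as this code does.

```python
def local_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # A local alignment may restart anywhere, so scores never go below 0.
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best


def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score with the same scoring values."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        cur = [i * gap]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Every residue of both sequences must be accounted for.
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


seq1 = "WHEREISWALTERNOW"
seq2 = "HEWASHEREBUTNOWISHERE"
print("local :", local_score(seq1, seq2))
print("global:", global_score(seq1, seq2))
```

The only structural difference is the floor at zero and tracking the best cell anywhere in the table, which is what lets the local version report the shared HERE region without paying for the unrelated flanks.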
1. Make lookup table of “words” for query
2. Scan database for hits
3. Extend alignment both directions
   – Ungapped extensions of hits (initial HSPs)
   – Gapped extensions (no traceback)
   – Gapped extensions (traceback - alignment details)
How BLAST Works
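The three steps can be sketched in a few lines of Python. This is a simplified illustration for nucleotide sequences with exact word matches only; the function names, the scoring values and the X-drop value are assumptions made for the example, and the gapped extension stages are not shown.

```python
from collections import defaultdict


def build_lookup(query, w):
    """Step 1: map every length-w word in the query to its start positions."""
    table = defaultdict(list)
    for i in range(len(query) - w + 1):
        table[query[i:i + w]].append(i)
    return table


def scan(subject, table, w):
    """Step 2: yield (query_pos, subject_pos) for every exact word hit."""
    for j in range(len(subject) - w + 1):
        for i in table.get(subject[j:j + w], ()):
            yield i, j


def extend_ungapped(query, subject, i, j, w, match=1, mismatch=-3, xdrop=5):
    """Step 3, first stage: extend a hit in both directions without gaps.

    Extension stops once the running score falls more than xdrop below the
    best score seen so far. Returns (score, query_start, query_end).
    """
    def walk(qi, sj, step):
        score = best = 0
        length = best_len = 0
        while 0 <= qi < len(query) and 0 <= sj < len(subject):
            score += match if query[qi] == subject[sj] else mismatch
            length += 1
            if score > best:
                best, best_len = score, length
            if best - score > xdrop:
                break
            qi += step
            sj += step
        return best, best_len

    right, right_len = walk(i + w, j + w, 1)
    left, left_len = walk(i - 1, j - 1, -1)
    return w * match + left + right, i - left_len, i + w + right_len


query = "ACGTACGTAA"
subject = "GGACGTACGTCC"
W = 4
table = build_lookup(query, W)
hits = list(scan(subject, table, W))
best = max(extend_ungapped(query, subject, i, j, W) for i, j in hits)
print(hits)
print(best)
```

Indexing the query rather than the database keeps the table small, and the X-drop rule lets an extension step over an isolated mismatch without running on indefinitely through unrelated sequence.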
Query: GTQITVEDLFYNIATRRKALKN
Word size = 3 (default); word size can only be 2 or 3

Make a lookup table of words:
GTQ
TQI
QIT
ITV
TVE
VED
EDL
DLF
...

Neighborhood words (for ITV): VTV, LTV, VSV, etc.
VTV 12
LTV 11
VSV 8
Neighborhood score threshold
Protein Words
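The neighborhood scores on this slide can be reproduced by summing BLOSUM62 scores position by position against the query word ITV. The sketch below does that in Python over a deliberately tiny alphabet; the five-residue restriction and the threshold of 11 are assumptions made to keep the example short (the partial score table holds BLOSUM62 values for those residues only).

```python
from itertools import product

# Partial BLOSUM62 scores, restricted to five residues to keep the sketch short.
_PAIRS = {
    ("I", "I"): 4, ("I", "L"): 2, ("I", "V"): 3, ("I", "T"): -1, ("I", "S"): -2,
    ("L", "L"): 4, ("L", "V"): 1, ("L", "T"): -1, ("L", "S"): -2,
    ("V", "V"): 4, ("V", "T"): 0, ("V", "S"): -2,
    ("T", "T"): 5, ("T", "S"): 1,
    ("S", "S"): 4,
}
ALPHABET = "ILVTS"


def pair_score(a, b):
    """Look up a symmetric substitution score."""
    return _PAIRS[(a, b)] if (a, b) in _PAIRS else _PAIRS[(b, a)]


def word_score(word, other):
    """Sum of per-position substitution scores for two equal-length words."""
    return sum(pair_score(a, b) for a, b in zip(word, other))


def neighborhood(word, threshold):
    """All words over ALPHABET that score at least threshold against word."""
    hits = {}
    for letters in product(ALPHABET, repeat=len(word)):
        candidate = "".join(letters)
        score = word_score(word, candidate)
        if score >= threshold:
            hits[candidate] = score
    return hits


neighbors = neighborhood("ITV", 11)
print(neighbors)
print("VSV scores", word_score("ITV", "VSV"))
```

With the full 20-letter alphabet the enumeration is 8,000 candidates per 3-letter word, which is why raising the threshold is an effective way to trade sensitivity for speed.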
example query words
Query: IETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESILEV…

Neighborhood words:
YLS 15    HFL 18
YLT 12    HFV 15
YVS 12    HFS 14
YIT 10    HWL 13
etc …     NFL 13
          DFL 12
          HWV 10
          etc …
Neighborhood score threshold T (-f) = 11

Query   1 IETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESI  47
          +E  YA YL K    F+YLSL +SP+ +DVNVHP+K  VHFL+++ I
Sbjct 287 LEETYAKYLHKGASYFVYLSLNMSPEQLDVNVHPSKRIVHFLYDQEI 333

Drop-off score = highest score - current score
-X  X dropoff value for gapped alignment (in bits)
    blastn 30, megablast 20, tblastx 0, all others 15
BLASTP Summary
High-scoring pair (HSP)
Gapped extension with trace back

Query   1 IETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESI-LEV…  50
          +E  YA YL K    F+YLSL +SP+ +DVNVHP+K  VHFL+++ I + +
Sbjct 287 LEETYAKYLHKGASYFVYLSLNMSPEQLDVNVHPSKRIVHFLYDQEIATSI… 337

Final HSP
BLASTP Summary
Identity matrix
      A    G    C    T
A    +1   -3   -3   -3
G    -3   +1   -3   -3
C    -3   -3   +1   -3
T    -3   -3   -3   +1

CAGGTAGCAAGCTTGCATGTCA
|| ||||||||||||  |||||
CACGTAGCAAGCTTG-GTGTCA

[ -r 1 -q -3 ]
raw score = 19-9 = 10
Scoring Systems - Nucleotides
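The raw score on this slide can be checked with a short Python sketch. One assumption is needed: the slide's arithmetic (19 - 9) is only reproduced if the single gap position is charged -3, the same as a mismatch, so that is what the `gap` default below does. Real BLAST searches charge separate gap-opening and gap-extension costs, which this sketch does not model.

```python
def raw_score(top, bottom, reward=1, penalty=-3, gap=-3):
    """Sum match rewards, mismatch penalties and per-position gap charges.

    gap=-3 is an assumption chosen to match the slide's 19 - 9 arithmetic.
    """
    score = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            score += gap
        elif a == b:
            score += reward
        else:
            score += penalty
    return score


top = "CAGGTAGCAAGCTTGCATGTCA"
bottom = "CACGTAGCAAGCTTG-GTGTCA"
matches = sum(a == b for a, b in zip(top, bottom))
print("matches:", matches)
print("raw score:", raw_score(top, bottom))
```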
Position Independent Matrices
PAM Matrices (Percent Accepted Mutation)
• Derived from observation; small dataset of alignments
• Implicit model of evolution
• All calculated from PAM1
• PAM250 widely used
BLOSUM Matrices (BLOck SUbstitution Matrices)
• Derived from observation; large dataset of highly conserved blocks
• Each matrix derived separately from blocks with a defined percent identity cutoff
• BLOSUM62 - default matrix for BLAST
Position Specific Score Matrices (PSSMs)
PSI- and RPS-BLAST
Scoring Systems - Proteins
BLOSUM62
   A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V  X
A  4
R -1  5
N -2  0  6
D -2 -2  1  6
C  0 -3 -3 -3  9
Q -1  1  0  0 -3  5
E -1  0  0  2 -4  2  5
G  0 -2  0 -1 -3 -2 -2  6
H -2  0  1 -1 -3  0  0 -2  8
I -1 -3 -3 -3 -1 -3 -3 -4 -3  4
L -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4
K -1  2  0 -1 -3  1  1 -2 -1 -3 -2  5
M -1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5
F -2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6
P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7
S  1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4
T  0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5
W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11
Y -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7
V  0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4
X  0 -1 -1 -1 -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -2  0  0 -2 -1 -1 -1

Negative for less likely substitutions
Positive for more likely substitutions
Serine/Threonine protein kinases
catalytic loop
PSSM scores
DAF-1
1
5
7
4
4
Position-Specific Score Matrix
PSSM scores for positions 435-458 (the slide labels the catalytic loop within this stretch)

Pos Res    A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
435  K    -1   0   0  -1  -2   3   0   3   0  -2  -2   1  -1  -1  -1  -1  -1  -1  -1  -2
436  E     0   1   0   2  -1   0   2  -1   0  -1  -1   0   0   0  -1   0   0  -1  -1  -1
437  S     0   0  -1   0   1   1   0   1   1   0  -1   0   0   0   2   0  -1  -1   0  -1
438  N    -1   0  -1  -1   1   0  -1   3   3  -1  -1   1  -1   0   0  -1  -1   1   1  -1
439  K    -2   1   1  -1  -2   0  -1  -2  -2  -1  -2   5   1  -2  -2  -1  -1  -2  -2  -1
440  P    -2  -2  -2  -2  -3  -2  -2  -2  -2  -1  -2  -1   0  -3   7  -1  -2  -3  -1  -1
441  A     3  -2   1  -2   0  -1   0   1  -2  -2  -2   0  -1  -2   3   1   0  -3  -3   0
442  M    -3  -4  -4  -4  -3  -4  -4  -5  -4   7   0  -4   1   0  -4  -4  -2  -4  -1   2
443  A     4  -4  -4  -4   0  -4  -4  -3  -4   4  -1  -4  -2  -3  -4  -1  -2  -4  -3   4
444  H    -4  -2  -1  -3  -5  -2  -2  -4  10  -6  -5  -3  -4  -3  -2  -3  -4  -5   0  -5
445  R    -4   8  -3  -4   0  -1  -2  -3  -2  -5  -4   0  -3  -2  -4  -3  -3   0  -4  -5
446  D    -4  -4  -1   8  -6  -2   0  -3  -3  -5  -6  -3  -5  -6  -4  -2  -3  -7  -5  -5
447  I    -4  -5  -6  -6  -3  -4  -5  -6  -5   3   5  -5   1   1  -5  -5  -3  -4  -3   1
448  K     0   0   1  -3  -5  -1  -1  -3  -3  -5  -5   7  -4  -5  -3  -1  -2  -5  -4  -4
449  S     0  -3  -2  -3   0  -2  -2  -3  -3  -4  -4  -2  -4  -5   2   6   2  -5  -4  -4
450  K     0   3   0   1  -5   0   0  -4  -1  -4  -3   4  -3  -2   2   1  -1  -5  -4  -4
451  N    -4  -3   8  -1  -5  -2  -2  -3  -1  -6  -6  -2  -4  -5  -4  -1  -2  -6  -4  -5
452  I    -3  -5  -5  -6   0  -5  -5  -6  -5   6   2  -5   2  -2  -5  -4  -3  -5  -3   3
453  M    -4  -4  -6  -6  -3  -4  -5  -6  -5   0   6  -5   1   0  -5  -4  -3  -4  -3   0
454  V    -3  -3  -5  -6  -3  -4  -5  -6  -5   3   3  -4   2  -2  -5  -4  -3  -5  -3   5
455  K    -2   1   1   4  -5   0  -1  -2   1  -4  -2   4  -3  -2  -3   0  -1  -5  -2  -3
456  N     1   1   3   0  -4  -1   1   0  -3  -4  -4   3  -2  -5  -2   2  -2  -5  -4  -4
457  D    -3  -2   5   5  -1  -1   1  -1   0  -5  -4   0  -2  -5  -1   0  -2  -6  -4  -5
458  L    -3  -1   0  -3   0  -3  -2   3  -4  -2   3   0   1   1  -2  -2  -3   5  -1  -3
Position-Specific Score Matrix
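Unlike BLOSUM62, a PSSM gives the same residue a different score at each position. The Python sketch below scores two three-residue segments against the rows for positions 444-446 of the matrix above. Only four residue columns are copied over to keep it short, and the names `PSSM_EXCERPT` and `pssm_score` are made up for this illustration.

```python
# Rows for positions 444-446 of the slide's PSSM, limited to four residue
# columns (H, R, D, K) so the sketch stays short.
PSSM_EXCERPT = [
    {"H": 10, "R": -2, "D": -3, "K": -3},   # position 444
    {"H": -2, "R": 8, "D": -4, "K": 0},     # position 445
    {"H": -3, "R": -4, "D": 8, "K": -3},    # position 446
]


def pssm_score(segment, pssm):
    """Sum the position-specific score of each residue in the segment."""
    if len(segment) != len(pssm):
        raise ValueError("segment and PSSM must have the same length")
    return sum(column[residue] for residue, column in zip(segment, pssm))


conserved = pssm_score("HRD", PSSM_EXCERPT)
variant = pssm_score("HKD", PSSM_EXCERPT)
print("HRD:", conserved)
print("HKD:", variant)
```

Swapping R for K costs 8 points at position 445 here, whereas BLOSUM62 scores every R-to-K substitution at +2 regardless of where it occurs.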
Expect Value
E = number of database hits with score ≥ S that you expect to find by chance
$E = K m n e^{-\lambda S}$ or $E = m n 2^{-S'}$
m, n = lengths of the query and the database
K = scale for search space
$\lambda$ = scale for scoring system
$S'$ = bitscore = $(\lambda S - \ln K) / \ln 2$
(applies to ungapped alignments)
More info: The Statistics of Sequence Similarity Scores
Local Alignment Statistics
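The two forms of the Expect value formula are equivalent once the raw score is converted to a bit score. The Python sketch below checks this numerically. The values of lambda, K, the lengths and the raw score are placeholders picked for illustration and do not come from the slides.

```python
import math


def bit_score(raw, lam, k):
    """Convert a raw score S to a bit score S' = (lambda*S - ln K) / ln 2."""
    return (lam * raw - math.log(k)) / math.log(2)


def expect_from_raw(raw, m, n, lam, k):
    """E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * raw)


def expect_from_bits(bits, m, n):
    """E = m * n * 2^(-S')."""
    return m * n * 2.0 ** (-bits)


# Placeholder parameters, for illustration only.
lam, k = 0.318, 0.134
m, n = 250, 1_000_000_000   # query length, database length
raw = 100

bits = bit_score(raw, lam, k)
e_raw = expect_from_raw(raw, m, n, lam, k)
e_bits = expect_from_bits(bits, m, n)
print(f"bit score = {bits:.1f}")
print(f"E (raw form) = {e_raw:.3g}")
print(f"E (bit form) = {e_bits:.3g}")
```

Because the bit score absorbs lambda and K, bit scores from searches run with different scoring systems can be compared directly, which raw scores cannot.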
  1 GAATATATGAAGACCAAGATTGCAGTCCTGCTGGCCTGAACCACGCTATTCTTGCTGTTG
    || | || || || |  || || ||   ||  |  ||| |||||| | | || | ||| |
  1 GAGTGTACGATGAGCCCGAGTGTAGCAGTGAAGATCTGGACCACGGTGTACTCGTTGTCG

 61 GTTACGGAACCGAGAATGGTAAAGACTACTGGATCATTAAGAACTCCTGGGGAGCCAGTT
    | || ||     ||  ||| ||  | |||||| || | ||||||  |||||  |     |
 61 GCTATGGTGTTAAGGGTGGGAAGAAGTACTGGCTCGTCAAGAACAGCTGGGCTGAATCCT

121 GGGGTGAACAAGGTTATTTCAGGCTTGCTCGTGGTAAAAAC
    |||| || ||||| ||  ||    |  | ||||  || |||
121 GGGGAGACCAAGGCTACATCCTTATGTCCCGTGACAACAAC

Reason: no contiguous exact match of 7 bp.
An Alignment BLAST Cannot Make
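The stated reason can be checked directly by measuring the longest run of consecutive identical bases along the alignment shown. The Python sketch below does this; it inspects only this one ungapped diagonal, which is a simplification, since a real search considers word hits anywhere between the two sequences.

```python
def longest_exact_run(a, b):
    """Length of the longest stretch of consecutive identical positions."""
    best = run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        best = max(best, run)
    return best


# The two 161-base sequences from the slide, concatenated across its three rows.
top = (
    "GAATATATGAAGACCAAGATTGCAGTCCTGCTGGCCTGAACCACGCTATTCTTGCTGTTG"
    "GTTACGGAACCGAGAATGGTAAAGACTACTGGATCATTAAGAACTCCTGGGGAGCCAGTT"
    "GGGGTGAACAAGGTTATTTCAGGCTTGCTCGTGGTAAAAAC"
)
bottom = (
    "GAGTGTACGATGAGCCCGAGTGTAGCAGTGAAGATCTGGACCACGGTGTACTCGTTGTCG"
    "GCTATGGTGTTAAGGGTGGGAAGAAGTACTGGCTCGTCAAGAACAGCTGGGCTGAATCCT"
    "GGGGAGACCAAGGCTACATCCTTATGTCCCGTGACAACAAC"
)

MIN_WORD = 7   # smallest blastn word size listed in the deck
run = longest_exact_run(top, bottom)
print("longest exact run:", run)
print("can seed a blastn hit on this diagonal:", run >= MIN_WORD)
```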
An Alignment BLAST Can Make
Solution: compare protein sequences; BLASTX
BLAST 2 Sequences (blastx) output:
Score = 290 bits (741), Expect = 7e-77
Identities = 147/331 (44%), Positives = 206/331 (61%), Gaps = 8/331 (2%)
Frame = +3
• Megablast
• Discontiguous Megablast
• PSI-BLAST
• PHI-BLAST
Other BLAST Algorithms
• Long alignments of similar DNA sequences
• Greedy algorithm
• Concatenation of query sequences
• Faster than blastn; less sensitive
Megablast: NCBI’s Genome Annotator
Trade-off: sensitivity vs speed
WORD SIZE    default   minimum
blastn         11         7
megablast      28         8
blastp          3         2
MegaBLAST & Word Size
• Uses discontiguous word matches
• Better for cross-species comparisons
Discontiguous Megablast
W    t    type          template
11   16   coding        1101101101101101
11   16   non-coding    1110010110110111
12   16   coding        1111101101101101
12   16   non-coding    1110110110110111
11   18   coding        101101100101101101
11   18   non-coding    111010010110010111
12   18   coding        101101101101101101
12   18   non-coding    111010110010110111
11   21   coding        100101100101100101101
11   21   non-coding    111010010100010010111
12   21   coding        100101101101100101101
12   21   non-coding    111010010110010010111

W = word size; # matches in template
t = template length
Reference: Ma, B, Tromp, J, Li, M. PatternHunter: faster and more sensitive homology search. Bioinformatics March, 2002; 18(3):440-5
Templates for Discontiguous Words
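A template is applied by requiring identity only at the positions marked 1. The Python sketch below uses the first coding template (W = 11, t = 16) and a made-up 16-base window in which every position the template ignores has been mutated. The window sequence and the function name are assumptions for illustration.

```python
def template_hit(a, b, template):
    """True if a and b agree at every position where the template has a '1'."""
    if not (len(a) == len(b) == len(template)):
        raise ValueError("window and template lengths differ")
    return all(x == y for x, y, t in zip(a, b, template) if t == "1")


CODING_11_16 = "1101101101101101"   # W = 11, t = 16, coding template

window = "ACGTACGTACGTACGT"         # made-up example window
# Mutate every position the template ignores (the '0' positions).
swap = {"A": "G", "G": "A", "C": "T", "T": "C"}
mutated = "".join(
    swap[base] if t == "0" else base
    for base, t in zip(window, CODING_11_16)
)

discontiguous = template_hit(window, mutated, CODING_11_16)
contiguous = template_hit(window, mutated, "1" * 16)
print(mutated)
print("discontiguous template hit:", discontiguous)
print("contiguous 16-mer hit:", contiguous)
```

In this coding template the ignored positions fall on every third base, so differences confined to one codon position do not break the seed, which fits the slide's point that discontiguous words work better for cross-species comparisons.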
Discontiguous (Cross-species) MegaBLAST
Discontiguous Word Options
Query: NM_078651
Drosophila melanogaster CG18582-PA (mbt) mRNA, (3244 bp)
/note= mushroom bodies tiny; synonyms: Pak2, STE20, dPAK2
Database: nr (nt),
Mammalia[orgn]

MegaBLAST = “No significant similarity found.”

Discontiguous megaBLAST = numerous hits . . .
Disco. Megablast Example . . .
Ex: Discontiguous MegaBLAST
Ex: BLASTN
Position-specific Iterated BLAST
Example: Confirming relationships of purine nucleotide metabolism proteins
PSI-BLAST
>gi|113340|sp|P03958|ADA_MOUSE ADENOSINE DEAMINASE (ADENOSINE
MAQTPAFNKPKVELHVHLDGAIKPETILYFGKKRGIALPADTVEELRNIIGMDKPLSLPGF
VIAGCREAIKRIAYEFVEMKAKEGVVYVEVRYSPHLLANSKVDPMPWNQTEGDVTPDDVVD
EQAFGIKVRSILCCMRHQPSWSLEVLELCKKYNQKTVVAMDLAGDETIEGSSLFPGHVEAY
RTVHAGEVGSPEVVREAVDILKTERVGHGYHTIEDEALYNRLLKENMHFEVCPWSSYLTGA
VRFKNDKANYSLNTDDPLIFKSTLDTDYQMTKKDMGFTEEEFKRLNINAAKSSFLPEEEKK
0.005
E value cutoff for PSSM
PSI-BLAST
Same results as protein-protein BLAST; different format
RESULTS: Initial BLASTP
Other purine nucleotide metabolizing enzymes not found by ordinary BLAST
Results of First PSSM Search
Just below threshold, another nucleotide metabolism enzyme
Check to add to PSSM
Tenth PSSM Search: Convergence
>gi|231729|sp|P30429|CED4_CAEEL CELL DEATH PROTEIN 4
MLCEIECRALSTAHTRLIHDFEPRDALTYLEGKNIFTEDHSELISKMSTRLERIANFLRIYRRQASE
LIDFFNYNNQSHLADFLEDYIDFAINEPDLLRPVVIAPQFSRQMLDRKLLLGNVPKQMTCYIREYHV
IKKLDEMCDLDSFFLFLHGRAGSGKSVIASQALSKSDQLIGINYDSIVWLKDSGTAPKSTFDLFTDI
LKSEDDLLNFPSVEHVTSVVLKRMICNALIDRPNTLFVFDDVVQEETIRWAQELRLRCLVTTRDVEI
ASQTCEFIEVTSLEIDECYDFLEAYGMPMPVGEKEEDVLNKTIELSSGNPATLMMFFKSCEPKTFEK
[GA]xxxxGK[ST]
PHI-BLAST
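The pattern on this slide can be tried against the query with an ordinary regular expression. The Python sketch below translates the pattern and searches one line of the CED-4 sequence shown above. The translator handles only the two constructs that appear here (bracketed residue sets and `x` for any residue); it is an illustration, not a full implementation of the pattern syntax PHI-BLAST accepts, and it does not perform the BLAST search that PHI-BLAST anchors on the pattern.

```python
import re


def phi_to_regex(pattern):
    """Translate a simple pattern: 'x' means any residue, [..] is a residue set."""
    out = []
    in_class = False
    for ch in pattern:
        if ch == "[":
            in_class = True
        elif ch == "]":
            in_class = False
        # A lowercase x outside brackets is a wildcard position.
        out.append("." if ch == "x" and not in_class else ch)
    return "".join(out)


# One line of the CED-4 sequence from the slide.
ced4_excerpt = (
    "IKKLDEMCDLDSFFLFLHGRAGSGKSVIASQALSKSDQLIGINYDSIVWLKDSGTAPKSTFDLFTDI"
)

regex = phi_to_regex("[GA]xxxxGK[ST]")
match = re.search(regex, ced4_excerpt)
print(regex)
print(match.group(), "at offset", match.start())
```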
What’s New?
Nucleotide
• refseq_rna = NM_*, XM_*
• refseq_genomic = NC_*, NG_*
• env_nt
– environmental sample[filter], e.g., 16S rRNA
Protein
• refseq = NP_*, XP_*
• env_nr
nr = nr
BLAST Databases
Select lower case
Select red
New Formatter
low complexity sequence filtered
BLAST Output: Alignments & Filter
BLAST Output: CDS Feature
Limit to Organism
all[filter] NOT ma
Example Entrez Queries
all[Filter] NOT mammalia[Organism]
ray finned fishes[Organism]
srcdb refseq[Properties]
Nucleotide only:
biomol mrna[Properties]
biomol genomic[Properties]
Other Advanced
-e 10000    expect value
-v 2000     descriptions
-b 2000     alignments
-e 10000 -v 2000
Advanced Options
Genome BLAST
Genome BLAST via Map Viewer
Human
EST
TGCCTCCTTTGGTGAAGGTGACACATCATGTGACCTCTTCAGTGAC
CACTCTACGGTGTCGGGCCTTGAACTACTACCCCCAGAAC
ATCACCATGAAGTGGCTGAAGGATAAGCAGCCAATGGATGCCAAG
GAGTTCGAACCTAAAGACGTATTGCCCAATGGGGATGGGAC
CTACCAGGGCTGGATAACCTTGGCTGTACCCCCTGGGGAAGAGC
Example: Human Genome BLAST
Human Genome BLAST: Results
Entrez Gene
Human Genome BLAST: MapViewer
>forward
CCATGGCGACCCTGGAAAAGC
>reverse
CAGCAGCGGCTGTGCCTGCGG
Example: Mapping Oligos Onto a Genome
>CCATGGCGACCCTGGAAAAGCNNNNNNNNNNCAGCAGCGGCTGTGCCTGCGG
forward primer + NNNNNNNNNN + reverse primer
-W 7 -e 1000
Map Oligos Onto Genome
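The combined query on this slide is the forward primer, a run of Ns, and the reverse primer joined into one sequence. The Python sketch below builds it. The function name, the default spacer length of ten and the FASTA title `primer_pair` are assumptions for illustration; the search options (-W 7 -e 1000) are set in the BLAST form and are not part of this code.

```python
def join_primers(forward, reverse, spacer_len=10):
    """Join two oligos with a run of Ns so they can be searched as one query."""
    return forward + "N" * spacer_len + reverse


forward = "CCATGGCGACCCTGGAAAAGC"
reverse = "CAGCAGCGGCTGTGCCTGCGG"
query = join_primers(forward, reverse)

print(">primer_pair")   # hypothetical FASTA title
print(query)
```

The N spacer keeps the two primers from being aligned as one continuous stretch, so each primer should appear as its own short alignment in the results.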
Genome BLAST Results
Primer Alignments
reverse primer
forward primer
MapViewer
MapViewer
forward
reverse
Sequence View (sv)
• BLAST: blast-help@ncbi.nlm.nih.gov
• General Help: info@ncbi.nlm.nih.gov
• Wayne Matten: matten@ncbi.nlm.nih.gov
Service Addresses