Sequence-based Similarity Module
(BLAST & CDD only)
&
Horizontal Gene Transfer Module
(Ortholog Neighborhood & GC content only)
Phylogenetic tree of Bacteria
Insert Figure 1 from Handelsman (2004)
Microbiol. Mol. Biol. Rev. 68: 669-685.
➢ Recall: Planctomycetes are one of
the GEBA genomes, representing
an under-represented phylum within
domain Bacteria
GEBA: Genomic Encyclopedia of Bacteria & Archaea
Recent phylogenetic analysis using 23S rRNA gene
supports the monophyletic grouping and branch order
for these four bacterial phyla
Insert Figure 4A from Pilhofer et al. (2008)
Characterization and Evolution of Cell Division and Cell Wall Synthesis
Genes in the Bacterial Phyla Verrucomicrobia, Lentisphaerae, Chlamydiae,
and Planctomycetes and Phylogenetic Comparison with rRNA Genes.
J Bacteriology 190: 3192-3202.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=126
• The Basic Local Alignment
Search Tool (BLAST) finds
regions of local similarity
between two sequences.
• Conserved Domain Database
Search (CDD) finds sequence
similarity with genes in
Clusters of Orthologous Groups
(COGs).
Verifying Function Based on
Sequence Conservation
Different types of BLAST searches
– blastp
– blastn
– blastx
– tblastn
– tblastx
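The five programs differ in the kind of query they accept and the kind of database they search. The lookup table below is a minimal sketch of that distinction; the dictionary name and the helper `programs_for_query` are hypothetical and are not part of any BLAST tool.

```python
# Query type and database type compared by each BLAST program.
# "translated nucleotide" means a nucleotide sequence translated
# in all six reading frames before comparison.
BLAST_PROGRAMS = {
    "blastp": ("protein", "protein"),
    "blastn": ("nucleotide", "nucleotide"),
    "blastx": ("translated nucleotide", "protein"),
    "tblastn": ("protein", "translated nucleotide"),
    "tblastx": ("translated nucleotide", "translated nucleotide"),
}


def programs_for_query(query_type):
    """Return programs that accept a pasted 'protein' or 'nucleotide' query."""
    matches = []
    for name, (query, _database) in BLAST_PROGRAMS.items():
        # A translated nucleotide query is still pasted in as nucleotide.
        if query.endswith(query_type):
            matches.append(name)
    return matches


print(programs_for_query("protein"))
print(programs_for_query("nucleotide"))
```

For the amino acid query used in this module, the two candidates are blastp (protein database) and tblastn (translated nucleotide database).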
>35% identity to experimentally characterized
protein (especially in conserved regions) can be
considered good evidence for function
E-value → less than 10^-3 is significant
→ equal to or less than 10^-15 may
indicate good match
http://www.ncbi.nlm.nih.gov/
Beware!!!
Mindless BLAST – Similarity score and E-value do not tell whole story!
Must also consider length of match (query coverage) & biological
function (organismal context)
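The rules of thumb above can be sketched as a small check. The identity and E-value cutoffs come from the slides; the query coverage cutoff of 70% is an assumption chosen only for illustration, and `assess_blast_hit` is a hypothetical helper. Biological context still has to be judged by hand.

```python
def assess_blast_hit(identity_pct, evalue, query_coverage_pct,
                     min_coverage_pct=70.0):
    """Apply the rule-of-thumb cutoffs to one BLAST hit.

    min_coverage_pct is an illustrative assumption; the slides only say
    that the length of the match must be considered.
    """
    return {
        # E-value less than 10^-3 is significant.
        "significant": evalue < 1e-3,
        # E-value equal to or less than 10^-15 may indicate a good match.
        "good_match": evalue <= 1e-15,
        # More than 35% identity to an experimentally characterized protein.
        "identity_ok": identity_pct > 35.0,
        # Short matches can score well while covering little of the query.
        "coverage_ok": query_coverage_pct >= min_coverage_pct,
    }


# Made-up numbers: a strong E-value but a match covering only 40% of the query.
print(assess_blast_hit(identity_pct=42.0, evalue=1e-20, query_coverage_pct=40.0))
```

A hit that passes the E-value and identity checks but fails the coverage check is the "mindless BLAST" case the slide warns about.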
Be cautious of auto-annotated gene function – GenBank not a curated database
➢ Follow this link from the lab notebook
BLAST:
Altschul et al. (1997)
Nucleic Acids Research 25: 3389-3402.
GenBank:
Benson et al. (2006)
Nucleic Acids Research 35: D21-D25.
Retrieve query sequence
from first module in
imgACT Lab Notebook
Copy amino acid sequence
in FASTA format from
imgACT Lab Notebook
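A FASTA record is a header line beginning with ">" followed by one or more lines of sequence. Before pasting, it can help to confirm that the copied text has that shape. The sketch below is one way to check; the example record is invented, and `parse_fasta_record` is a hypothetical helper rather than part of imgACT or BLAST.

```python
def parse_fasta_record(text):
    """Split a single FASTA record into (header, sequence)."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA record must start with a '>' header line")
    header = lines[0][1:]
    # Sequence lines are joined into one string without line breaks.
    sequence = "".join(lines[1:]).upper()
    if not sequence:
        raise ValueError("FASTA record has no sequence lines")
    return header, sequence


# Invented example record, for illustration only.
example = """>example_gene hypothetical protein
MKTAYIAKQR
QISFVKSHFS
"""
header, sequence = parse_fasta_record(example)
print(header)
print(len(sequence), "residues")
```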
Paste query sequence
into box
“Click”
WHAT YOU SHOULD SEE. . . BLAST RESULTS
Scroll down
Accession ID
Top significant hit
Start with first hit. . .
Click on Accession ID
NOTE: Top hit is
from class organism;
Do not include results
for P. limnophilus
in lab notebook
Accession ID
Next significant hit
Click on Accession ID
Copy/paste this
information into
imgACT notebook
NOTE: Function assigned
by automatic Gene Caller
(not experimentally verified)
Reminder:
Make sure you are in
EDIT mode when
making changes to
imgACT notebook
and SAVE your work
along the way
Return to BLAST
results for this
information
“Click” on Bit score
Sequence length of database hit (not alignment length)
Pair-wise alignment
with statistics
(including E-value)
Copy/paste into imgACT
notebook:
➢ Length of alignment
➢ Score
➢ Expect (E-value)
➢ Identities
➢ Positives
➢ Gaps
➢ Pair-wise alignment
between “Query” and
“Sbjct” sequences.
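The statistics listed above appear in the text header printed over each pair-wise alignment. If that header is copied as text, the values can be pulled out with a few regular expressions. This is a sketch only: the sample text uses made-up numbers laid out like a typical BLAST alignment header, and the exact layout on the results page is assumed rather than guaranteed.

```python
import re


def parse_alignment_stats(text):
    """Extract bit score, E-value, and the x/y counts from an alignment header."""
    stats = {}
    match = re.search(r"Score\s*=\s*([\d.]+)\s*bits", text)
    if match:
        stats["bit_score"] = float(match.group(1))
    match = re.search(r"Expect\s*=\s*([0-9.eE+-]+)", text)
    if match:
        stats["evalue"] = float(match.group(1))
    for key in ("Identities", "Positives", "Gaps"):
        match = re.search(key + r"\s*=\s*(\d+)/(\d+)", text)
        if match:
            stats[key.lower()] = (int(match.group(1)), int(match.group(2)))
    # The denominator of Identities is the length of the alignment.
    if "identities" in stats:
        stats["alignment_length"] = stats["identities"][1]
    return stats


# Made-up header text, for illustration only.
SAMPLE = """ Score = 254 bits (650),  Expect = 2e-45
 Identities = 190/500 (38%), Positives = 280/500 (56%), Gaps = 12/500 (2%)"""

print(parse_alignment_stats(SAMPLE))
```

Note that the alignment length taken from the Identities line is not the same as the sequence length of the database hit.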
NOTE:
You need to modify
your notebook for
requested info
(statistics
include E-value)
➢ REPEAT procedure
with second BLAST hit.
“Click” on Accession ID
“Click” on Bit score
Copy/paste requested information in lab notebook
CDD:
Conserved Domain Database
COG 1 – ion transport
COG 2 – energy production
COG 3 – cell division
etc.
Bi-directional best hit
in curated database
COG genes have
sequence similarity &
functional conservation
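The bi-directional best hit idea can be illustrated with two lookup tables: a pair of genes qualifies only if each is the other's best hit. The sketch below assumes the best hits have already been determined; all gene names are invented, and `bidirectional_best_hits` is a hypothetical helper, not the procedure used to build the COG database.

```python
def bidirectional_best_hits(best_a_to_b, best_b_to_a):
    """Return (gene_a, gene_b) pairs that are each other's best hit.

    best_a_to_b maps each gene in genome A to its best hit in genome B;
    best_b_to_a maps each gene in genome B to its best hit in genome A.
    """
    pairs = []
    for gene_a, gene_b in best_a_to_b.items():
        # Keep the pair only if the reverse search points back to gene_a.
        if best_b_to_a.get(gene_b) == gene_a:
            pairs.append((gene_a, gene_b))
    return pairs


# Invented gene names, for illustration only.
a_to_b = {"a1": "b1", "a2": "b2", "a3": "b2"}
b_to_a = {"b1": "a1", "b2": "a3"}
print(bidirectional_best_hits(a_to_b, b_to_a))
```

Here a2 is dropped: its best hit is b2, but the best hit of b2 is a3.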
Figure from Sanders-Lorenz and Miller (2010)
➢ Return to top of BLAST Results page
CDD:
Marchler-Bauer et al. (2006)
Nucleic Acids Research 35: D237-D240.
“Click” on Conserved Domain image
“Click”
If there are no hits, write “no significant hits” in notebook
If there are hits, scroll down & click the + sign next to the top hit
Click here
Copy top COG hit and COG name into notebook
Modify BOX to include length, bit score, and E-value
COG
description
COG hit
COG name
Length, bit score, and E-value
➢ Change headings
and enter COG
information as shown
for top hit
If you obtain more than one
significant hit, record this info for
at least the top 2 hits
➢ Hint: Look at Score & E-value
Retrieve from
Gene Detail page
How do I return to the Gene Detail page
for my proposed gene?
“Click” on URL saved for your gene
during first module (week 2)
Then what?
Keep the Gene Detail page open
in separate tab while working on
imgACT Lab Notebook modules
Scroll down
“Click” here on
Gene Detail page
Change to 40
Note the red arrow corresponds to your gene
➢ Plus strand genes on top (left to right)
➢ Minus strand genes on bottom (right to left)
Is your gene a stand-alone ORF or is it clustered with other genes
on same DNA strand and in same orientation?
→ Could be evidence that your gene is part of an operon
→ What are the functions of adjacent genes? Do they have
related function?
How conserved is the gene neighborhood?
→ Are there similar patterns in other organisms that contain
a gene from same orthologous group?
→ If considerably different, may be evidence for HGT
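The first question, stand-alone ORF versus same-strand cluster, can be sketched as a scan outward from your gene along the neighborhood. The example assumes the genes are listed in genome order with their strand; the gene names are invented and `same_strand_cluster` is a hypothetical helper. It ignores the distance between genes, so a cluster found this way is only a hint of an operon, not proof.

```python
def same_strand_cluster(genes, target):
    """Return the run of consecutive same-strand genes that contains target.

    genes is a list of (name, strand) tuples in genome order,
    with strand given as "+" or "-".
    """
    names = [name for name, _strand in genes]
    index = names.index(target)
    strand = genes[index][1]
    start = index
    # Extend to the left while neighbors share the strand.
    while start > 0 and genes[start - 1][1] == strand:
        start -= 1
    end = index
    # Extend to the right while neighbors share the strand.
    while end < len(genes) - 1 and genes[end + 1][1] == strand:
        end += 1
    return names[start:end + 1]


# Invented neighborhood, for illustration only.
neighborhood = [("geneA", "+"), ("geneB", "-"), ("geneC", "-"),
                ("geneD", "-"), ("geneE", "+")]
cluster = same_strand_cluster(neighborhood, "geneC")
print(cluster)
print("stand-alone" if len(cluster) == 1 else "clustered")
```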
Need to save individual panels
as JPEG or PNG files.
Include P. limnophilus as well
as 4-5 different organisms
in imgACT notebook.
“Click” here to
insert images
into notebook
Delete ‘gene neighborhood images’
and place cursor in the box
1- Click “Browse” to find image file.
2- Press “Attach” button. Thumbnail
image should appear in window.
3- Repeat for each individual
neighborhood panel until all are loaded
in the window prompt.
4- Next, select one image at a time and
press [OK] to insert them into imgACT
notebook at cursor position.
NOTE: The images should be
inserted in same order that the
organisms were listed in img/edu
Insert next image
Results: Ortholog Neighborhood
Scroll
down
Enter comments about homology & context:
Is your gene a stand-alone ORF or is it clustered with other genes
on same DNA strand and in same orientation?
→ Could be evidence that your gene is part of an operon
→ What are the functions of adjacent genes? Do they have
related function?
How conserved is the gene neighborhood?
→ Are there similar patterns in other organisms that contain
a gene from same orthologous group?
→ If considerably different, may be evidence for HGT
Retrieve from
Organism
Details page
Retrieve from
Gene Detail page
On Gene Detail page, you will find
the GC content for your gene.
To find GC content for the
entire P. limnophilus genome,
select “Find Genomes” tab
from the Gene Detail page.
Search for Planctomyces limnophilus
and click on the corresponding hyperlink.
WHAT YOU SHOULD SEE. . .
Scroll
down
GC content will be listed
under Genome Statistics.
NOTE: A gene with a GC content that is more
than a few percentage points above or below
the average GC content in the genome may have
originated from another organism by HGT. Add a
comment box & make note of this if your gene
meets this criterion.
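The comparison in the note can be sketched as follows. GC content is the percentage of G and C bases among all A, C, G, and T bases. The slides do not say how many points count as "a few", so the 5-point default below is an assumption for illustration only; both function names and all numbers in the example are hypothetical.

```python
def gc_content(sequence):
    """Return GC content as a percentage of the A, C, G, T bases."""
    bases = [base for base in sequence.upper() if base in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G, or T bases")
    gc_count = sum(1 for base in bases if base in "GC")
    return 100.0 * gc_count / len(bases)


def gc_deviates(gene_gc_pct, genome_gc_pct, threshold_points=5.0):
    """True if the gene differs from the genome average by more than
    threshold_points percentage points (threshold is an assumed value)."""
    return abs(gene_gc_pct - genome_gc_pct) > threshold_points


# Made-up values, for illustration only.
print(round(gc_content("ATGCGC"), 1))
print(gc_deviates(gene_gc_pct=60.0, genome_gc_pct=53.0))
```

In the module itself both percentages are read from IMG (Gene Detail page and Genome Statistics), so only the comparison step is needed.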