IMG/M User Tasks List with Answers

IMG User Scenario October 21, 2010
A. Using Genome Browser and Metagenome Details
1. How many metagenomes are there in IMG/M?
Answer: currently 151, but changes with every update.
Explanation: start with the link http://img.jgi.doe.gov/m
Look at the “IMG/M Genomes” table of general statistics.
The number of metagenomes is in the category “Microbiomes”.
2. How many bins are there in Acid Mine Drainage metagenome?
Answer: 5 (according to the method “tetra”).
Explanation: on the above page “IMG top page” click on the number “22”
corresponding to the count of metagenomes sequenced by the JGI. It will bring you
the list of metagenomes sequenced by the JGI, which includes “Acid Mine Drainage”.
Clicking on its name will bring you to the “Microbiome Details” page for Acid Mine
Drainage metagenome.
At the bottom of “Microbiome Information” panel there is a category “Bins (of
scaffolds)”, which displays the binning method (“tetra”) and a list of 5 bins found by
Tetra in this metagenome. Note that more than one binning method may have been applied
to some metagenomes; in this case you would see several “Method” headings and a list of
bins found by each method.
3. To which phylum do the proteins in Acid Mine Drainage metagenome have
the highest number of hits?
Answer: Euryarchaeota (but the majority belongs to the category “Unassigned”).
Explanation: on the above page “Microbiome Details” follow the link “Distribution of
genes binned by BLAST percent identities”. It will bring you to a histogram of the
distribution of best BLAST hits to the genomes from different phyla.
The highest number of hits is to the representatives of Euryarchaeota: the Acid Mine
Drainage community includes 2 Ferroplasmas and a Thermoplasmatales archaeon (see
bins on the previous page). On the other hand, the majority of hits from bacterial
constituents of this community end up in the category “Unassigned”, since
Leptospirillum spp. belong to the phylum Nitrospirae, the only sequenced representative of
which (Thermodesulfovibrio yellowstonii) is too distant to get good hits.
4. How many scaffolds in Acid Mine Drainage metagenome have GC
content below 30%?
Answer: 2.
Explanation: go to the previous page (“Microbiome Details”). At the bottom of the
page there is a “Scaffold Search” tool, which allows searching for scaffolds with
a certain name, length, GC content and read depth. Select the “GC (0.00-1.00)” filter, enter
the values of 0 and 0.3 and click “Go”.
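The same filter can be approximated offline on downloaded scaffold sequences. The sketch below is only an illustration: the scaffold names and sequences are hypothetical, and treating both bounds as inclusive is an assumption about how the “Scaffold Search” tool behaves.

```python
def gc_fraction(sequence):
    """Return the fraction of G and C among called bases (0.0-1.0)."""
    seq = sequence.upper()
    called = sum(seq.count(base) for base in "ACGT")  # ignore N and other codes
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


def filter_by_gc(scaffolds, low=0.0, high=0.3):
    """Return sorted names of scaffolds whose GC fraction is within [low, high]."""
    return sorted(name for name, seq in scaffolds.items()
                  if low <= gc_fraction(seq) <= high)


# Hypothetical toy scaffolds, not real Acid Mine Drainage data.
toy_scaffolds = {
    "scaffold_a": "ATATATATGC",  # 2 of 10 bases are G or C
    "scaffold_b": "GCGCGCATAT",  # 6 of 10 bases are G or C
    "scaffold_c": "AATTNNAATT",  # no G or C among the 8 called bases
}
low_gc = filter_by_gc(toy_scaffolds, 0.0, 0.3)
print(low_gc)
```

Ambiguous bases are left out of the denominator here; a different convention would shift the values slightly for scaffolds with many Ns.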
5. Using Phylogenetic Profiler on Microbiome Details page, find how
many genes with COG hits are present in Methylococcaceae bin of Lake
Washington (combined v2) metagenome, but not in the genome of
Methylococcus capsulatus. Use default settings of Phylogenetic Profiler
for similarity cutoffs.
Answer: 75.
Explanation: go to Phylogenetic Profiler section on Microbiome Details page of
Lake Washington (combined v2) metagenome, click on “Phylogenetic Profiler”.
Select Methylococcaceae bin as your query, select Methylococcus capsulatus as your
reference organism (“Without Homologs In”). You could use browser “Find”
function to get to them faster. Keep default settings for similarity cutoffs, press “Go”.
On the “Phylogenetic Profiler Results” go to “Summary Statistics”.
B. Using Phylogenetic Distribution of Genes and Scaffold Cart.
1. How many genes in Sludge/US, Phrap Assembly metagenome have
the best hit to Alphaproteobacteria with percent identity between 60
and 90%?
Answer: 1330.
Explanation: type “Sludge/US, Phrap” in the “Quick Genome Search” and click
“Go”. Note that this is not a keyword search, so “Sludge US Phrap” won’t retrieve
anything. Click on the name of the metagenome on “Genome Search Results” page,
which will bring you to “Microbiome Details” page. Follow the link “Distribution of
genes binned by BLAST percent identities” below “Microbiome Information” panel.
On the “Distribution of Best Blast Hits” histogram find Alphaproteobacteria and the
column corresponding to the hits with percent identity between 60 and 90%. If you
want to compare cumulative counts (e. g., hits below 60% identity or hits above 30%
identity), add the counts in the corresponding columns. Since the counts are linked to
the corresponding lists of genes with hits in a certain phylum or class, the hits are
separated according to percent identity interval rather than into cumulative list below
or above certain percent identity to make the lists non-redundant.
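Adding interval counts into a cumulative count can be sketched as follows. Only the 60-90% value (1330) comes from the answer above; the other two counts and the interval labels are hypothetical placeholders.

```python
# Best-hit counts per percent identity interval for one phylum or class.
# Only the 60-90% count is taken from the text; the others are made up.
alphaproteobacteria_hits = {"30-60%": 2500, "60-90%": 1330, "90+%": 40}

# Intervals listed from lowest to highest identity.
ORDER = ["30-60%", "60-90%", "90+%"]


def cumulative_above(bins, order, start):
    """Sum the count of `start` and of every interval after it in `order`."""
    index = order.index(start)
    return sum(bins[name] for name in order[index:])


above_60 = cumulative_above(alphaproteobacteria_hits, ORDER, "60-90%")
above_30 = cumulative_above(alphaproteobacteria_hits, ORDER, "30-60%")
print(above_60, above_30)
```

Because the intervals do not overlap, each gene is counted once and the sums are free of double counting.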
2. How many genes in Sludge US Phrap metagenome with the best hit to
Alphaproteobacteria at 60-90% identity belong to the COG Functional
Category of “Amino acid transport and metabolism”?
Answer: 170.
Explanation: click on the count of genes with best hits to Alphaproteobacteria with 60-90%
identity (1330). The list of results is sorted by gene_oid. Re-sort the resulting list of
genes by clicking on “COG Functional Cat.” above the table. The count of genes
belonging to a certain COG Functional Category is shown next to its name. Note that the
count of genes in the top right corner (1437) is different from the count in the previous
table (1330), because proteins identified as fusions can belong to more than one COG and
some COGs belong to multiple COG Functional Categories.
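The difference between the two counts can be illustrated with a toy example. The gene names and category assignments below are hypothetical; the point is only that a gene listed under several categories contributes one row per category.

```python
from collections import Counter

# Hypothetical genes mapped to COG Functional Category letters.
gene_categories = {
    "gene_1": ["E"],
    "gene_2": ["E", "G"],  # e.g. a fusion, or a COG placed in two categories
    "gene_3": ["C"],
    "gene_4": ["G", "P"],
}

# One row per (gene, category) pair, as in a table sorted by category.
per_category = Counter(cat for cats in gene_categories.values() for cat in cats)

unique_genes = len(gene_categories)
category_rows = sum(per_category.values())
print(unique_genes, category_rows, dict(per_category))
```

Here 4 genes produce 6 rows, in the same way that 1330 genes can produce 1437 rows.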
3. Which family of Betaproteobacteria has the highest number of best
hits from the genes in Sludge US Phrap metagenome (cumulative
above 30% identity)?
Answer: Rhodocyclaceae.
Explanation: go back to “Distribution of Best Blast Hits” histogram and click on
“Betaproteobacteria”. It will bring you the same histogram of the distribution of best
BLAST hits, only at the family level. Rhodocyclaceae have more best hits than the
second-highest hit family, Comamonadaceae in the categories 30-60% and 60-90%
identity and slightly fewer hits in >90% category. The sum of 3 categories for
Rhodocyclaceae is the highest.
4. Which archaeal genome has most hits with >90% identity from the
metagenomes of human gut subject 7 and human gut subject 8?
Answer: Methanobrevibacter smithii.
Explanation: from “Microbiome Details” page for each metagenome (human gut
subject 7 and human gut subject 8) go to “Phylogenetic Distribution of Genes” and
“Distribution by BLAST percent identities”. The second column in the histogram
table displays the domain (A for Archaea, B for Bacteria); out of 2 archaeal phyla
(Crenarchaeota and Euryarchaeota) Euryarchaeota have most hits with >90% identity
(1539 for human subject 7 and 1508 for human subject 8). If you click on the name of
the phylum “Euryarchaeota”, it will display the distribution of hits to the families
from this phylum; the family Methanobacteriaceae has the most hits with >90%
identity. Clicking on the name “Methanobacteriaceae” brings the table with the
species from this family. Methanobrevibacter smithii has the most hits with >90%
identity from the metagenomes of both human gut subject 7 (1533 hits) and human
gut subject 8 (1504).
5. What are the functions of genes in the region between 270 kb and 330
kb of isolate Methanobrevibacter smithii ATCC 35061 that are missing
from Methanobrevibacter smithii from human gut subject 7
metagenome? Are these genes also absent from Methanobrevibacter
smithii from human gut subject 8 metagenome?
Answer: ribosomal protein L15, a peptide/nickel ABC transporter, Ni-Fe
hydrogenase, UDP-glucose dehydrogenase, sugar kinase, 2-oxoisovalerate:ferredoxin
oxidoreductase, glutamyl-tRNA(Gln) amidotransferase and several hypothetical
proteins. These genes are also absent from human gut subject 8 metagenome.
Explanation: for the metagenome of human gut subject 7 go down the levels in the
table displaying the distribution of BLAST hits to the family level (“Family
Methanobacteriaceae”) and click on “Methanobrevibacter smithii”. In the table
“Protein Recruitment Plot” click on the “Normal” or “Larger” for “All Scaffolds”.
This plot displays proteins from the metagenome with BLASTp hit to the proteins in
the genome selected and the gaps on the genome that have no hits from the
metagenome can be reviewed. The region between 280 and 330 kb is one of the
largest gaps. Go back to the “Methanobrevibacter smithii” recruitment plot page. Go
to “Reference Genome Context View”, to “Methanobrevibacter smithii ATCC 35061”
(bottom of the page), and select the range “240001-320000” on the scaffold
“Methanobrevibacter smithii ATCC 35061:NC_009515” (which you could have
identified from protein recruitment plot). Mouse over genes to see their functions.
They include ribosomal protein L15, a peptide/nickel ABC transporter, Ni-Fe
hydrogenase, UDP-glucose dehydrogenase, sugar kinase, 2-oxoisovalerate:ferredoxin
oxidoreductase, glutamyl-tRNA(Gln) amidotransferase and several hypothetical
proteins. Go to the metagenome of human gut subject 8, and down the levels of
“Distribution of BLAST hits by percent identity” table to Methanobrevibacter smithii.
Go to “Protein Recruitment Plot” (“Normal” or “Larger” for “All Scaffolds”). You can
change the resolution by selecting “View Range” 184376..366679. The region
between 280 and 330 kb also has a gap in BLAST hits of metagenome to
Methanobrevibacter smithii genome. You can verify the absence of the proteins by
going gene by gene in this fragment of Methanobrevibacter smithii chromosome and
running “IMG Genome BLAST” from the corresponding gene pages. Select “Human
Gut Community Subject 7” and “Human Gut Community Subject 8” in the list of
genomes. Select “Min. percent identity” at 90% to avoid seeing the hits from
organisms other than Methanobrevibacter smithii in metagenomic datasets.
6. What is the range of GC content of contigs assigned to the bin
“Accumulibacter” (binning method PhyloPythia) in Sludge US Phrap
metagenome? What is the range of read depths for this bin?
Answer: 58-69% GC, read depth 1.27-17.52.
Explanation: go to “Microbiome Details” page of Sludge US Phrap metagenome and
find the list of bins in “Microbiome Information” section. “Accumulibacter” bin
(binning method PhyloPythia) has 180 contigs, click on this count. Retrieve all the
CDSs in this bin (4301, don’t forget to change your preferences to increase the size of
the gene list to at least 5000!). Add all these genes to Gene Cart. Select all the genes
and add the corresponding scaffolds to Scaffold Cart. Now your Scaffold Cart
contains 180 contigs. Select all contigs (the button is below the table), go to
Histogram section (even lower). Select GC content as an option and click on “Show
Histogram” button. The histogram will display the range of GC content of the contigs
in “Accumulibacter” bin. Go back to the previous page and select “Read Depth” as an
option, click “Show Histogram”. The histogram of read depths will be retrieved. Both
parameters can be used to estimate the quality of binning, since both the coverage and
GC content for the same microbial population are expected to be in relatively narrow
range. The range of GC content for “Accumulibacter” bin is similar to what we see in
most bacterial genomes, but read depth is quite variable. Outliers can be checked to
eliminate binning and assembly errors.
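If the contig table is exported, the ranges and candidate outliers can also be computed directly. In the sketch below the contig names and intermediate values are hypothetical (only the end points 58-69% GC and 1.27-17.52 read depth come from the answer above), and the outlier rule (more than 2-fold away from the median) is an arbitrary assumption, not an IMG feature.

```python
from statistics import median

# Hypothetical contigs; only the extreme values match the answer above.
contigs = [
    {"name": "c1", "gc": 0.58, "depth": 1.27},
    {"name": "c2", "gc": 0.63, "depth": 6.10},
    {"name": "c3", "gc": 0.65, "depth": 7.40},
    {"name": "c4", "gc": 0.69, "depth": 17.52},
    {"name": "c5", "gc": 0.64, "depth": 6.80},
]


def value_range(records, key):
    """Return (minimum, maximum) of one numeric field."""
    values = [record[key] for record in records]
    return min(values), max(values)


def flag_outliers(records, key, fold=2.0):
    """Return names of records more than `fold` times above or below the median."""
    mid = median(record[key] for record in records)
    return sorted(record["name"] for record in records
                  if record[key] > mid * fold or record[key] < mid / fold)


print(value_range(contigs, "gc"), flag_outliers(contigs, "gc"))
print(value_range(contigs, "depth"), flag_outliers(contigs, "depth"))
```

With these toy values GC content is narrow and flags nothing, while read depth flags the two extreme contigs, which mirrors the observation that read depth is the more variable parameter.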
C. Using Find Functions
1. Which Pfams describe carbohydrate-binding modules (CBM)?
Answer: pfam00553, pfam00686, pfam00734, pfam00942, pfam01607, pfam02013,
pfam02018, pfam02839, pfam02922, pfam03370, pfam03422, pfam03423,
pfam03424, pfam03425, pfam03426, pfam03427, pfam06204, pfam08305,
pfam09212, pfam09478, pfam10633.
Explanation: go to “Find Functions” tab. In “Search Terms and Pathways” page
select “Pfam” as a filter and search with the keyword “CBM”. 21 Pfams are retrieved.
Note that if you have selected a subset of genomes, the search will be performed on
these selected genomes only.
2. Are there any CBM-containing genes in human gut subject 7 and
subject 8?
Answer: yes - 8 pfams (pfam00553, pfam02018, pfam02839, pfam02922,
pfam03422, pfam06204, pfam08305, pfam10633), 46 genes.
Explanation: go back to “Search Terms and Pathways” page. Use the same keyword
and filter (“CBM”, “Pfam”), but select 2 metagenomes, “Human Gut Community
Subject 7” and “Human Gut Community Subject 8” in the scroll-down menu below
(press Ctrl button to select multiple genomes or metagenomes). Only 8 Pfams are
retrieved now. Add gene counts for total. You can find out which families are present
in which metagenome by adding these Pfams to the function cart and comparing
phylogenetic profiles of these two metagenomes.
3. Do A. phosphatis bins in Sludge/Australian, Phrap Assembly and
Sludge/US, Jazz Assembly metagenomes have all COGs assigned to
“Histidine biosynthesis” pathway? Do they have a complete pathway?
Answer: no, but the only missing COG is an alternative implementation of
histidinol phosphatase, so the pathway is likely to be present.
Explanation: go to “Find Genomes” tab, in “Genome Browser” page press “Clear
All” button. Find “Sludge/Australian, Phrap Assembly” and “Sludge/US, Jazz
Assembly” metagenomes, select them, save selections. Go to “Find Functions” tab,
click on “COG” tab in this panel – this would bring a list of all COG Functional
Categories and COG Pathways. Find “Histidine biosynthesis” under “Amino acid
transport and metabolism” Category ([E]), click on it. This will bring the “COG
Pathway Details” page listing all COGs classified to this pathway. Select all of them
and click “Add Selected to Function Cart” button. Go to “Function Profile” option in
“Function Cart” page, select “A. phosphatis” bins in “Sludge/Australian, Phrap
Assembly” and “Sludge/US, Jazz Assembly” metagenomes in the scroll-down menu
(press Ctrl to select multiple bins/genomes/metagenomes). Click on “View Functions
vs Genomes” button. The table displays counts of genes assigned to each COG; the
only COG with no genes is an alternative histidinol phosphatase (COG1387).
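Reading such a profile table amounts to looking for rows that are zero in every selected bin. A minimal sketch, assuming the table has been exported into a dictionary; the bin labels, the placeholder COG names and all counts except the zero row for COG1387 are hypothetical.

```python
# Function profile: COG -> gene count per bin. Only COG1387 is taken from
# the text; "COG_x" and "COG_y" stand in for the other pathway members.
profile = {
    "COG_x": {"bin_australian": 1, "bin_us": 1},
    "COG_y": {"bin_australian": 2, "bin_us": 1},
    "COG1387": {"bin_australian": 0, "bin_us": 0},
}


def missing_everywhere(table):
    """Return COGs that have no genes in any of the selected bins."""
    return sorted(cog for cog, counts in table.items()
                  if all(count == 0 for count in counts.values()))


missing = missing_everywhere(profile)
print(missing)
```

A COG missing from every bin still needs a manual check, since an alternative enzyme may fill the same step, as is the case for histidinol phosphatase here.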
4. Representatives of which phyla are likely to be present in the
metagenome of human gut community subject 7 when COG0200
(Ribosomal protein L15) is used?
Answer: Firmicutes (Lactobacillus), Proteobacteria (Desulfovibrio), Actinobacteria
(Bifidobacteria)
Detailed answer: go to “Find Functions” tab and to “Phylogenetic Marker COGs”.
Select “Human Gut Community Subject 7” from the list of metagenomes, click “Go”.
Select “COG0200 Ribosomal protein L15” from the list of COGs, click “Go”.
Change gene selections if necessary (e. g. unselect the genes from multiple strains of
the same species), click “Run Multalin” button at the bottom of the page. On the
Multalin tree find red entries corresponding to the genes from the metagenome; check
the names of the organisms of their closest neighbors. You can find the names of the
phyla to which they belong by clicking on the corresponding genes, which will take
you to the “Gene Detail” page. On this page click on the genome name in the
“Genome” field of “Gene Information” panel; this link will bring the corresponding
“Organism Details” page, which displays full lineage of the organism (“Lineage”).
Multalin tree is a hierarchical clustering tree rather than phylogenetic tree. In order to
calculate a phylogenetic tree using your method of choice (neighbor-joining,
UPGMA, maximum likelihood, etc.), go to the previous page with the list of genes,
click “Add Selected to Gene Cart”, from which you can export their amino acid
sequences and use any of the available alignment and tree tools. Alternatively you can
use an alignment generated by Multalin, which is provided on the bottom of the page
with the tree.
D. Using Find Genes
1. Which domains are associated with carbohydrate binding module
family 6 (CBM_6, pfam03422) in Soil microbial communities from
Minnesota farm metagenome?
Answer: pfam00942 (CBM_3), pfam00041 (fn3), pfam08305 (NPCBM), pfam00801
(PKD).
Explanation: go to “Gene Search” tab in “Find Genes”. In “Gene Search” page use
keyword “pfam03422”, set filter to “Pfam Domain Search (list)” and select “Soil:
Diversa Silage” in the scroll-down menu below.
This search retrieves a list of genes with this particular pfam (or a combination of
pfams if several comma-separated pfams are listed) and displays all other pfams
found in the same genes. 3 genes in soil metagenome have CBM_6 in combination
with other domains; two of the pfams associated with CBM_6 are other CBMs.
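The logic of this search (find genes with the query pfam, then list the other pfams in the same genes) can be sketched as follows. The Pfam accessions come from the answer above, but the gene names and the way the domains are combined within genes are hypothetical.

```python
from collections import Counter


def co_occurring_domains(gene_domains, query):
    """Count, per domain, the genes in which it occurs together with `query`."""
    counts = Counter()
    for domains in gene_domains.values():
        if query in domains:
            counts.update(set(domains) - {query})
    return counts


# Hypothetical genes and domain architectures.
gene_domains = {
    "gene_a": ["pfam03422", "pfam00942"],
    "gene_b": ["pfam03422", "pfam08305", "pfam00801"],
    "gene_c": ["pfam03422", "pfam00041"],
    "gene_d": ["pfam03422"],   # CBM_6 alone, no partner domains
    "gene_e": ["pfam00005"],   # unrelated gene, ignored by the search
}

partners = co_occurring_domains(gene_domains, "pfam03422")
print(sorted(partners))
```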
E. Using Compare Genomes
1. Using Abundance Profile Search, find how many Pfams are at least
twice as abundant in the metagenome of human gut community subject 7
as compared to human gut community subject 8 using frequency
normalization. Which Pfam has the highest frequency in human gut
community subject 7 metagenome? Is it more abundant than in human
gut community subject 8 metagenome by frequency? By raw counts?
Answer: 557; pfam00005 (ABC_tran); yes, although not by much; no.
Explanation: first, you have to make sure that you have these metagenomes selected,
so go to “Find Genomes” tab and on the “Genome Browser” page click “Select All”
and save selections.
Then go to “Compare Genomes” tab and to “Abundance Profiles”. Follow the link
“Abundance Profile Search”; on this page select “Pfam” as functional classification,
“frequency” as normalization method and set “More Abundant Cutoff” to 2. Select
“Human Gut Community Subject 7” as your query genome (“Find Functions In”) and
“Human Gut Community Subject 8” as your reference genome (“More Abundant
Than In”), click “Go” at the bottom of the page.
To find which Pfam has the highest frequency in Human Gut Community Subject 7,
change “More Abundant Cutoff” on the previous page to 1. This would bring all 1457
pfams found in Human Gut Community Subject 7 and 8 metagenomes. Sort the
table in the order of decreasing counts in human gut community subject 7 by clicking on
the header of the column “Human Gut Community Subject 7”.
Pfam00005 (ABC_tran) has the highest frequency in human gut community subject 7
metagenome; it has slightly higher frequency in subject 7 than in subject 8 (21263 vs
20302), but fewer genes were assigned to this pfam in subject 7 than in subject 8 (451 vs
542). The frequency takes into account not only the number of genes, but also the size of
the metagenome.
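The effect of normalization can be shown with a small calculation. The gene counts (451 and 542) are from the text; the metagenome sizes are hypothetical, and the formula (gene count divided by the total number of genes) is an assumption, since the exact IMG normalization is not given here.

```python
def frequency(gene_count, total_genes):
    """Assumed normalization: share of all genes assigned to the family."""
    return gene_count / total_genes


# Gene counts from the text; total_genes values are hypothetical.
subject7 = {"pfam_genes": 451, "total_genes": 20000}
subject8 = {"pfam_genes": 542, "total_genes": 27000}

freq7 = frequency(subject7["pfam_genes"], subject7["total_genes"])
freq8 = frequency(subject8["pfam_genes"], subject8["total_genes"])

more_by_raw_count = subject7["pfam_genes"] > subject8["pfam_genes"]
more_by_frequency = freq7 > freq8
ratio = freq7 / freq8
passes_cutoff_2 = ratio >= 2.0  # analogue of "More Abundant Cutoff" set to 2

print(more_by_raw_count, more_by_frequency, round(ratio, 3), passes_cutoff_2)
```

A family can therefore rank higher by frequency in the smaller metagenome even though it has fewer genes there, and still fall well short of a 2-fold cutoff.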
2. Using Function Comparisons, find which COG is most overrepresented in
the metagenome of human gut community subject 7 as compared to human
gut community subject 8. Is this result statistically significant? Which COGs
are significantly overrepresented in human gut community subject 7 as
compared to human gut community subject 8? Use D-score as statistic and
gene count.
Answer: COG0629 (ssDNA-binding protein); not statistically significant. There are no
COGs in human gut subject 7 metagenome that are significantly overrepresented as
compared to human gut subject 8 metagenome.
Explanation: go to “Compare Genomes” tab, then to “Abundance Profiles” and follow
“Function Comparisons” link. Select “D-score” as “Output”, “Human Gut Community
Subject 7” metagenome as a query and “Human Gut Community Subject 8” as reference
genome. Select “COG” for functional comparison, “Gene count” and “Show only rows
with at least one non-zero gene count” as your output options. IMG doesn’t have read
depth data for these metagenomes, so “Estimated gene copies” option is irrelevant.
Sort the resulting table in the order of decreasing D-score by clicking on the header of
“Human Gut 8 (R)” column. The higher is the D-score, the more overrepresented is the
corresponding protein family in the query metagenome; the lower is the D-score, the
more overrepresented is the corresponding protein family in the reference metagenome.
The COG with the highest D-score is COG0629 (single-stranded DNA-binding protein),
however, this result is not statistically significant, since the corresponding cells are not
colored (cells corresponding to protein families with statistically significant
overrepresentation are colored yellow or pink).
Two criteria are used to decide whether a result is significant: 1) since an approximate
statistical test is used, D-scores of protein families with fewer than 5 members in each
metagenome are unreliable and considered statistically insignificant, no matter how high
or low the D-score is, and 2) the p-value of the protein family must satisfy the p-value
cutoff, which is based on the false discovery rate of 0.05 and which also depends on the
number of hypotheses tested (number of protein families being analyzed, such as COG,
Pfam, etc.; see the link to an explanation about D-score from the query page). In the case
of COG0629 the first criterion is satisfied, but the p-value is too high to be significant
after false discovery rate correction.
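The two criteria can be sketched in code under several stated assumptions: the D-score is taken to be a two-proportion z statistic with pooled variance, the p-value is two-sided under the normal approximation, the false discovery rate is controlled with the Benjamini-Hochberg procedure, and a family is skipped only when it has fewer than 5 members in both metagenomes. IMG may differ in any of these details, and all counts and totals below are hypothetical.

```python
import math

MIN_MEMBERS = 5  # families below this count in both metagenomes are skipped
FDR = 0.05


def d_score(x1, n1, x2, n2):
    """Two-proportion z statistic with a pooled variance estimate."""
    pooled = (x1 + x2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / std_err


def two_sided_p(z):
    """Two-sided p-value of z under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(p_values, fdr=FDR):
    """Return the set of names accepted at the given false discovery rate."""
    ranked = sorted(p_values.items(), key=lambda item: item[1])
    total = len(ranked)
    last_pass = 0
    for rank, (_, p) in enumerate(ranked, start=1):
        if p <= rank / total * fdr:
            last_pass = rank
    return {name for name, _ in ranked[:last_pass]}


# Hypothetical gene counts: family -> (query count, reference count).
counts = {"fam_A": (60, 30), "fam_B": (40, 50), "fam_C": (2, 3), "fam_D": (10, 45)}
n_query, n_reference = 20000, 27000  # hypothetical metagenome sizes

scores = {
    fam: d_score(q, n_query, r, n_reference)
    for fam, (q, r) in counts.items()
    if not (q < MIN_MEMBERS and r < MIN_MEMBERS)
}
p_values = {fam: two_sided_p(z) for fam, z in scores.items()}
significant = benjamini_hochberg(p_values)
over_in_query = sorted(fam for fam in significant if scores[fam] > 0)
print(sorted(significant), over_in_query)
```

A positive score marks overrepresentation in the query and a negative score in the reference, so a family can be significant without being overrepresented in the query, which is the situation described for the human gut metagenomes.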
In order to find the statistically significant results, go back to query page and select
“Show only rows with significant hits” as your output option. The only COGs that are
significantly different between the two metagenomes are COG0642, COG2972 and
COG4753, but all of them are more abundant in human gut subject 8 than in human gut
subject 7. Therefore no COGs are significantly overrepresented in human gut subject 7
as compared to human gut subject 8.
3. Using Function Category Comparisons, find which COG Pathways are
overrepresented with p-value less than 1.0e-01 in human gut subject 7
metagenome as compared to human gut subject 8 metagenome. How could these
results be interpreted with respect to the overrepresentation of
individual COGs in the same metagenome?
Answer: “Aminoacyl-tRNA synthetases and alternate systems for amino acid activation”
and “Basal replication machinery”. COG Pathway “Basal replication machinery”
includes COG0629, Single-stranded DNA-binding protein, which was found as
overrepresented (although not statistically significant) in the previous test (see the answer
to question E2). Overrepresentation of this single family could skew the results of
Function Category Comparison.
Explanation: go to “Compare Genomes”, “Abundance Profiles”, “Function Category
Comparisons”. Select “Human gut community subject 7” as your query metagenome and
“Human gut community subject 8” as your reference metagenome, select COG Pathways
for your “Function Category” and D-rank as an output option. In the results table click on
the header of the column “Human Gut 8 (R)” to sort the table by this column in
descending order. Two COG Pathways have positive D-rank (i. e. they have higher
abundance in query genome) with p-value less than 1.0e-01, “Aminoacyl-tRNA
synthetases and alternate systems for amino acid activation” and “Basal replication
machinery”. If you click on the gene count corresponding to “Basal replication
machinery” in the column “Human Gut 7 Gene Count (Q)”, and scroll through the list,
you will see multiple occurrences of “Single-stranded DNA-binding protein”
corresponding to COG0629 (annotations of human gut subject 7 and human gut subject 8
metagenomes are based on COGs). This COG has been identified as the one with the
highest D-score in the previous test of COGs most overrepresented in human gut subject
7 metagenome as compared to human gut subject 8 (see the answer to the previous
question). Although D-rank test has been designed to avoid such a situation, it is still
possible that one highly overrepresented protein family may skew the results of
“Function Category Comparisons” and this possibility should be taken into account.
4. Using Genome Clustering, find which of 5 mouse gut community
metagenomes are the closest according to their COG frequency distributions?
According to their Pfam frequency distribution?
Answer: Mouse Gut Community ob1 and Mouse Gut Community lean3 by COG, and
ob1 and ob2 by Pfam profiles.
Explanation: in “Compare Genomes” tab go to “Genome Clustering”. Select 5 mouse
gut community metagenomes, “Mouse Gut Community lean1”, “Mouse Gut
Community lean2”, “Mouse Gut Community lean3”, “Mouse Gut Community ob1”,
“Mouse Gut Community ob2”. Select “COG” for functional profile and “Hierarchical
Clustering” for clustering method. Repeat the same using “Pfam” for functional
profile.
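In agglomerative hierarchical clustering the first two samples to be joined are the pair with the smallest distance, so the "closest" metagenomes can be found by comparing all pairs. The sketch below assumes Euclidean distance between frequency profiles, which may not be the metric IMG uses; the sample names and counts are hypothetical.

```python
import math
from itertools import combinations


def to_frequencies(counts):
    """Convert raw family counts into frequencies that sum to 1."""
    total = sum(counts)
    return [count / total for count in counts]


def closest_pair(profiles):
    """Return the two sample names with the smallest Euclidean distance."""
    best = None
    for first, second in combinations(sorted(profiles), 2):
        distance = math.dist(profiles[first], profiles[second])
        if best is None or distance < best[0]:
            best = (distance, first, second)
    return best[1], best[2]


# Hypothetical counts for three protein families in four samples.
raw_counts = {
    "sample_1": [10, 20, 70],
    "sample_2": [12, 18, 70],
    "sample_3": [40, 40, 20],
    "sample_4": [30, 30, 40],
}
profiles = {name: to_frequencies(counts) for name, counts in raw_counts.items()}
pair = closest_pair(profiles)
print(pair)
```

The closest pair can change with the functional classification, which is why COG and Pfam profiles give different answers for the mouse gut metagenomes.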
5. Using Phylogenetic Distribution, find which phyla are underrepresented
in metagenomes of 2 obese mice as compared to 3 lean mice using “Gene
count” and BLAST hits with identity above 60%. Which phyla are
overrepresented?
Answer: Mouse Gut Community ob1 and Mouse Gut Community ob2 have fewer genes
with best hits to Bacteroidetes and more hits to Firmicutes.
Explanation: in “Compare Genomes” tab go to “Phylogenetic Distribution”, and then to
“Metagenomes Phylogenetic Distribution”. Select “Percent Identity” 60+, “Gene count”
and “Show percentages” as Display option. These metagenomes are completely unassembled,
so the option of “Estimated gene copies” is irrelevant.
Select 5 mouse gut community metagenomes, “Mouse Gut Community lean1”, “Mouse
Gut Community lean2”, “Mouse Gut Community lean3”, “Mouse Gut Community ob1”,
“Mouse Gut Community ob2”. You will get the table with counts of genes in all 5
metagenomes with best hits to isolate genomes from various phyla/classes (similar to
Phylogenetic Distribution tool). You can filter the table by deselecting the phyla/classes
with very low counts in all metagenomes and clicking “Filter” button. 2 obese mice have
lower % of genes with hits to Bacteroidetes than any lean mouse.
F. Using SNP BLAST and SNP VISTA.
1. Using SNP BLAST and SNP VISTA find whether there are any
populations within Leptospirillum sp. group II bin of Acid Mine Drainage
metagenome.
Answer: no, there are no populations of Leptospirillum sp. Group II that could be
distinguished through single nucleotide polymorphisms (however, there could be
populations with large-scale genome rearrangements).
Explanation: go to Microbiome Details page of “Acid Mine Drainage” metagenome
(using “Find Genomes”, “Quick Genome Search” or through the list of metagenomes
from IMG/M Home page), scroll down to the list of bins. Click on the scaffold count
associated with Leptospirillum sp. Group II bin. Click on nucleotide range of any
scaffold (preferably larger one, since shorter scaffolds are more likely to be misclassified
by the binning tools). Click on any gene on the graphical representation of this scaffold.
On the gene page in the section “IMG Sequence Search” click on “SNP BLAST”, which
performs BLASTn of assembled sequence of the scaffold against the database of
sequence reads. You may use the sequence of one gene only or extend the range by
adding the sequence upstream and/or downstream, click on “Run BLAST”.
SNP BLAST produces simple text output, which can be analyzed to see if there are any
groups of reads differing from the consensus sequence of the scaffold.
In the case of Leptospirillum sp. Group II bin no such variation can be found (only single
reads deviating from consensus, likely due to sequencing artifacts), so there are no
populations of this organism detectable by single nucleotide polymorphisms. You can
view graphical presentation of SNP BLAST results by clicking on “SNP VISTA” button
(may not work in some browsers).
2. Is there any evidence of recombination between populations within
Ferroplasma acidarmanus type I bin of Acid Mine Drainage
metagenome?
Answer: yes, there is evidence of recombination between populations of Ferroplasma
acidarmanus.
Explanation: go to Microbiome Details page of Acid Mine Drainage metagenome and
click on the count of scaffolds associated with “Ferroplasma acidarmanus type I” bin.
Click on the coordinate range for any scaffold and then on any gene on the graphical
view of this scaffold. Go to “IMG Sequence Search” section and click on “SNP BLAST”
link, then on “Run BLAST” link.
As shown in the screenshot above, there are two populations of Ferroplasma that can be
distinguished by single nucleotide polymorphisms; one of them is represented by
consensus scaffold amd_scaffold_35, another is represented by amd_scaffold_1.
However there are some reads that are intermediate between two consensus sequences
(e.g., XYG47968.g1, XYG53549.b1 and XYG68254.b1) indicating a possibility of
recombination between the two populations.
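The reasoning applied to the SNP BLAST output can be sketched as follows: positions where the two consensus sequences differ are diagnostic, and a read that agrees with one consensus at some diagnostic positions and with the other consensus at others is a candidate recombinant. The sequences, read names and the ungapped alignment are all hypothetical simplifications; this is not a parser for actual SNP BLAST output.

```python
def diagnostic_positions(consensus_a, consensus_b):
    """Return positions where the two aligned consensus sequences differ."""
    return [i for i, (a, b) in enumerate(zip(consensus_a, consensus_b)) if a != b]


def classify_read(read, offset, consensus_a, consensus_b):
    """Classify a read aligned without gaps, starting at `offset`."""
    labels = set()
    for position in diagnostic_positions(consensus_a, consensus_b):
        index = position - offset
        if 0 <= index < len(read):
            if read[index] == consensus_a[position]:
                labels.add("A")
            elif read[index] == consensus_b[position]:
                labels.add("B")
            # a base matching neither consensus is treated as a likely error
    if not labels:
        return "uninformative"
    if labels == {"A"}:
        return "population A"
    if labels == {"B"}:
        return "population B"
    return "mixed (possible recombinant)"


# Hypothetical consensus sequences differing at positions 2, 7 and 12.
CONSENSUS_A = "ACGTACGTACGTACGT"
CONSENSUS_B = "ACTTACGAACGTCCGT"

reads = {
    "read_1": ("ACGTACGTAC", 0),        # matches A at positions 2 and 7
    "read_2": ("ACGAACGTCCGT", 4),      # matches B at positions 7 and 12
    "read_3": ("ACGTACGAACGTCCGT", 0),  # A at position 2, B at 7 and 12
}
calls = {name: classify_read(seq, start, CONSENSUS_A, CONSENSUS_B)
         for name, (seq, start) in reads.items()}
print(calls)
```

A mixed read is only an indication: chimeric clones and sequencing errors can produce the same pattern, so several independent reads with the same switch point are more convincing than one.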