Examination of the Nematostella vectensis Holobiont by Comparative
Bacterial Genomics and Metatranscriptomics
by
Timothy J. Helbig
B.S. Biological Sciences
Carnegie Mellon University, 2010
SUBMITTED TO THE DEPARTMENT OF MICROBIOLOGY IN PARTIAL
FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
MASTER OF SCIENCE
AT THE
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
September 2013
© 2013 Timothy J. Helbig. All rights reserved.
The author hereby grants to MIT permission to reproduce
and to distribute publicly paper and electronic
copies of this thesis document in whole or in part
in any medium now known or hereafter created.
Signature of Author: Signature redacted
Department of Microbiology
August 29, 2013
Certified by: Signature redacted
Janelle R. Thompson
Doherty Assistant Professor in Ocean Utilization
Thesis Supervisor
Accepted by: Signature redacted
Michael T. Laub
Whitehead Career Development Associate Professor of Biology
Chairman, Committee for Microbiology Graduate Students
Examination of the Nematostella vectensis Holobiont by Comparative
Bacterial Genomics and Metatranscriptomics
by
Timothy J. Helbig
Submitted to the Department of Microbiology
on August 30, 2013 in Partial Fulfillment of the
Requirements for the Degree of Master of Science
ABSTRACT
Previous work has shown that similar microbial populations are associated with the
starlet sea anemone Nematostella vectensis over distinct temporal and geographic locations;
however, the functions these bacteria may be performing within their anemone hosts and
the mechanisms the bacteria may be using to adapt are unknown. To address
these issues, comparative genomic analysis was performed on ten newly sequenced bacterial isolates from
four bacterial populations (Pseudomonas oleovorans, Agrobacterium tumefaciens,
Limnobacter thiooxidans and Stappia stellulata) that are associated with Nematostella in the
laboratory and/or its natural salt marsh habitat, and whole
metatranscriptomes of lab-raised N. vectensis were sequenced and analyzed.
Comparative genomic analysis revealed the isolates from these bacterial
populations to likely be non-clonal, with no evidence that holobiont-specific orthologous
groups (i.e. gene orthologs found only in N. vectensis-associated bacterial genomes and
absent in closely related genomes of the same genus/family) were shared across the
populations examined. Further, no evidence of lateral gene transfer or shared phage or
mobile elements among the isolates was observed. Isolate genomes did, however, reveal
conserved holobiont-specific orthologs within members of the same bacterial population
that could be reflective of the ecology of the anemone holobiont; for instance, three of the four P.
oleovorans isolate genomes showed evidence of holobiont-specific antibiotic production, the
three A. tumefaciens isolates all shared common ion scavenging proteins and both L.
thiooxidans isolates had a holobiont-specific antibiotic resistance protein.
Whole anemone metatranscriptomic analysis based on BLASTx annotation of
sequenced transcripts revealed bacterial expression of housekeeping genes such as those
for replication, ribosomal structure and ATP-synthesis dominated by Proteobacteria, in
particular Gammaproteobacteria. Further recruitment of the transcripts to sequenced
Nematostella associates revealed an active and foraging Limnobacter population expressing
genes for signaling, movement, iron scavenging and carbon storage in the form of PHA
granules. The similarly high expression of iron regulators by Limnobacter and the host
anemone suggests that iron may be a source of structuring within the anemone holobiont and a
good area for further study.
Thesis Supervisor: Janelle R. Thompson
Title: Doherty Assistant Professor in Ocean Utilization
Acknowledgements
Firstly I would like to thank the members of the Thompson Lab and everyone in
Parsons for creating a wonderful and jubilant work environment. Particularly, I'd
like to thank Sonia Timberlake who provided both invaluable programming and
bioinformatics advice and great life advice in general when I needed it.
Further, I would like to thank my advisor Janelle Thompson for patience and the
ability to talk my life out of dark places in my times of most need. She was a
wonderful mentor and one of the most thoughtful and caring people I have come
across in my life.
I would also like to thank my parental unit, who has provided a constant beam of
unconditional love throughout my time in grad school.
Finally, I would like to thank and give love to my crucial support factor in this
interesting past year and a half, my boyfriend Adam. Adam, you are a true rara avis
and the most special person in my life. Here's to more of the happiest times of my
life together now that this thesis is over.
Table of Contents
Figure Key - p. 9
Introduction - p. 11
Methods - p. 15
Comparative Genomics Results and Discussion - p. 23
Metatranscriptomics Results and Discussion - p. 41
Conclusion - p. 61
References - p. 63
Appendices - p. 73
Figure Key
Comparative Genomics of Nematostella vectensis Associated Bacteria
1. Map and distribution of microbial diversity in field and lab-raised N. vectensis as
determined through 16S clone libraries (Har, MS Thesis) - p. 23
2. 16S rRNA Tree of Symbiont Phylogenetic Relatedness - p. 25
3. Shared Gene Contents of Nematostella Associated Isolates - p. 31
4. Holobiont-specific orthologous groups found within multiple members of the
Nematostella isolated populations - p. 33
5. MEGAN Visualization of N. vectensis Isolate Operational Core Genomes - p. 35
6. MEGAN Visualization of N. vectensis Isolate Operational Flexible Genomes - p. 36
7. Analysis of N. vectensis Genome Scaffolds containing and not containing
Pseudomonas DNA - p. 37
8. Shared Phage Elements of N. vectensis Isolates within and among Populations - p. 38
Nematostella vectensis Metatranscriptome Analysis
1. Ribosomal and Contaminant Read Breakdown of Initial Read Pairs - p. 42
2. MEGAN Taxonomic Breakdown of Filtered Reads - p. 44
3. MEGAN Analysis of N. vectensis Metatranscriptome Diversity - p. 45
4. Top 15 most highly represented orthologous groups among transcripts of
Bacterial and Cnidarian binned reads - p. 47
5. Read mapping to second highest annotated "Cnidarian" Read Category,
opiNOG08261 - p. 48
6. Read mapping to representative sequence of highest expressed "Bacterial"
orthologous group NOG323497 - p. 49
7. Genus level assignment of SSU rRNA sequences from the unprocessed control
sample - p. 51
8. Species Level assignment of SSU rRNA sequences from all Metatranscriptome
samples - p. 52
9. COG Category Distribution of Reads Binned By MEGAN as Proteobacteria and
Reads Mapped to Sequenced Limnobacter Genomes - p. 57
10. Orthologous Group comparison of those present in "Bacterial" reads as
determined through MEGAN analysis and those present in the symbiont genome
mapping analysis - p. 58
Introduction
Significance of host-associated microbial communities: Emerging evidence
Multicellular life emerged in a world teeming with microorganisms. Rather
than an obstacle to overcome, this was an opportunity for each type of life
to make use of the other's unique physiological and enzymatic capabilities in order
to better adapt to its surrounding environment. The success of
these multicellular-microorganism partnerships is well illustrated in the fact that all
studied mammals, lower vertebrates, invertebrates and plants are each distinctly
colonized with unique and active microbial communities (Bosch, 2013; Nyholm et
al., 2012; Hooper et al., 2001).
Microbial inhabitants of multicellular creatures have recently come to light
as powerful contributors to the well-being and success of their hosts. They are
critical in the digestion and absorption of nutrients as they have been found to
break down complex plant polymers and polysaccharides in ruminants, termites and
humans (Warnecke et al., 2007; Xu et al., 2003; Mahowald et al. 2009), synthesize
essential amino acids in insects such as aphids (Moran, 2007) and produce vitamins
in mice and humans (Chaucheyras-Durand et al., 2010). They have also been found
to be imperative in the development of particular organs and systems within
animals such as the immune system of mice and humans (Dobber et al. 1992; O'Hara et
al. 2004), the gut of zebrafish (Rawls et al., 2004) and the light-producing organ of the
Hawaiian Bobtail Squid (Rader et al., 2012). Further, they are known to have effects
on complex physiological processes such as obesity in humans and mice (Backhed et
al., 2004; Ley et al., 2005; Ley et al., 2006). Finally, the microbial constituents are
known to be a passive and active deterrent of pathogens within both vertebrates
and invertebrates (Reshef et al., 2006; Bosch, 2013).
The composition, structuring and development of these microbial
communities within their hosts have been found to be species specific in both
vertebrates and invertebrates including hydra, mice, zebrafish, humans and other
hominids (Ley et al., 2008; Ochman et al., 2010; Rawls et al., 2006). In a few cases
critical components of the host structuring mechanism are known, such as the roles
of particular cationic antimicrobial peptides in Drosophila and Hydra (Ryu et al.,
2008; Fraune et al., 2010) and the use of specific surface attachment sites in
nematodes and the squid Euprymna scolopes (Ruby, 2008). However, most of the
mechanisms remain unknown leading researchers to speculate on single or
synergistic effects of nutrients, microbe-microbe interactions (both chemical and
frequency dependent), external environmental effects or host-derived factors as the
source of the control of the bacterial community composition (Bosch, 2013; Bevins
et al. 2011).
These communities of micro and macro organisms and their multicellular
hosts have become collectively known as "holobionts" (Rohwer et al., 2002).
Holobionts are relevant as it is becoming known that the microbial associates of a
multicellular organism are the keys to its success and adaptation in diverse
environments (Rohwer et al., 2002; Vega-Thurber et al., 2009). Indeed, the
holobiont theory of evolution suggests that it is the collective whole of the genetic
material of a multicellular host and its associates that is the unit of selection on
which evolution acts (Singh et al., 2013).
For most of the defined roles of microbial associates, extreme phenotypes of
the host, such as a termite's digestion of wood or the glowing light organ of the
squid Euprymna scolopes, provided the clues as to what the inhabiting
microorganisms were likely doing (Warnecke et al., 2007; Rader et al., 2012). Other
evidence of functional roles of the associates came from the creation and
manipulation of gnotobiotic animal models, or those capable of being raised with or
without microbes; powerful model organisms of this type include mouse, hydra and
zebrafish (Ruby, 2008). For organisms without a distinctive phenotype or
gnotobiotic form, methods for generating hypotheses about the function of microbial
associates (if any) have included correlating behavioral and physiological observations
with monitored changes of complex microbial systems, comparative genomics of full
genome sequences of the associates or inference from metagenomic datasets (Ley et
al., 2006; Mandel et al., 2009; Vega-Thurber et al., 2009). However, advances in
nucleic acid preparation techniques and next generation sequencing technologies
have permitted the use of population genomics and metatranscriptomics as viable
tools for the creation of hypotheses regarding the roles of microbial associates
within multicellular hosts (Coleman and Chisholm, 2010; Stewart et al., 2010).
Model System: Nematostella vectensis
One such multicellular host is the starlet sea anemone Nematostella vectensis,
an animal of Phylum Cnidaria, Class Anthozoa, Subclass Hexacorallia. A sedentary
carnivore, this anemone resides exclusively in estuaries (Hand and Uhlinger, 1994)
including those of extreme salinity (Sheader et al., 1997), temperature (Williams,
1983; Kneib, 1988) and sulfide fluxes (Howes et al., 1985; Bart and Hartman, 2000).
Recently, due to its tractability in the lab, easily induced sexual and asexual
reproduction, sequenced genome including a genomic repertoire replete with innate
immunity genes (Putnam et al., 2007; Genikhovich and Technau, 2009) and a host of
molecular tools designed for it (Renfer et al. 2010), Nematostella has become a
popular model of evolution and development (Stefanik et al., 2013; Reitzel et al.,
2012).
Previous experimental work has shown that similar microbial populations
are associated with Nematostella over distinct geographic locations and timescales
(Har et al., MS Thesis). However, how these microbes are associating, their function
within the host and the factors used to structure the community are unknown. In
order to address these questions, we have used next-generation sequencing tools to
generate hypotheses about the functional roles of the microbial community and
possible ways in which it is structured. The motivation for this work was threefold:
1. To compare the specifics of Nematostella-microbe interactions to other symbiosis models to get a better understanding of canonical host-microbe interaction principles.
2. To learn the specifics of anthozoan host-microbe interactions in order to provide clues for understanding the health of anthozoans.
3. To understand the role and adaptations of microbial associates to N. vectensis.
Based on an assessment of strength of association, we selected ten bacterial
genomes for sequencing: four classified as Pseudomonas oleovorans, three as Agrobacterium
tumefaciens, two as Limnobacter thiooxidans and one as Stappia stellulata. To these
sequenced genomes we have applied the principles of comparative population
genomics in order to understand the unique ways in which they may be functioning
and surviving within the anemone.
Comparative population genomics as a tool for exploring bacterial function
Comparative population genomics for the elucidation of microbial genes and
functions related to host association can be done in two ways. In the first, a genome,
or set of related genomes, of a strain(s) experiencing an environment of interest is
compared to the genome(s) of a related strain(s) not experiencing those factors and
the presence and absence of genomic signatures such as genes, non-coding RNAs,
regulatory regions and mobile genetic elements such as integrons and phages is
observed. This type of approach yielded the discovery of a single gene in Vibrio
fischeri that establishes its host specificity for squid (Mandel et al., 2009), genes of
potential pathogenicity in Mycobacterium tuberculosis (Zakham et al., 2012) and
the discovery of repeat elements influencing plant pathogenicity in the fungus
Pyrenophora tritici-repentis (Manning et al., 2013). The second approach is to
compare genomes of phylogenetically diverse microbes experiencing the same
environment and see if they hold clues signifying a similar unknown pressure they
may both be under. This approach was used to successfully predict that species of
Pelagibacter and Prochlorococcus were both under phosphate limitation in the
Atlantic Ocean (Coleman and Chisholm, 2010). Because of our collection of ten
genomes from four distinct populations, we have employed both methods.
Metatranscriptomic analysis provides insights into gene expression in
complex communities.
In addition to the comparative genomics methods described above, we
used metatranscriptomics, or the large-scale sequencing of RNA from a
mixed community, in order to understand the expressed functions of anemone
microbes. Metatranscriptomics has previously been a method of limited use due to
the cost of sequencing and the low amounts of desired bacterial mRNA in
comparison to the prodigious amounts of rRNA and mRNA of the host and other
organisms of the system (Poretsky et al. 2009, Moran et al. 2013, Gilbert et al. 2008).
However, advancements in rRNA subtraction and bacterial mRNA enrichment
techniques have yielded the ability to perform metatranscriptomics on complex
communities and gain insights into the bacteria residing within them (Stewart et al.,
2010; Shi et al. 2009). For instance, studies examining such complex hosts as humans
(Gosalbes et al. 2011), insects (Xie et al. 2012), mice (Xiong et al. 2012) and sponges
(Radax et al. 2012) have all effectively used metatranscriptomics to understand the
ecology of the microorganisms within the host. Further, metatranscriptomics can be
applicable in practical ways in that metatranscriptome data was used to design a
medium in order to culture a symbiont of the medicinal leech Hirudo verbana
(Bomar et al. 2011). We have thus employed metatranscriptomics on whole, lab-raised anemones, using a variety of rRNA depletion and mRNA enrichment
techniques, not only to get an understanding of the microbial ecology of the
anemone holobiont based on the genes expressed and the types of microbes
expressing them, but to apply the results to the creation of lab testable hypotheses
for known and unknown inhabitants of the anemone microbiome.
Taken together, the following comparative genomics and
metatranscriptomics data reveal complex microbial associations with Nematostella
indicating that antibiotics and elemental iron may be key agents in the structuring of
the anemone microbiome.
Comparative Genomics Methods
N. vectensis Bacterial Isolate Genome Sample Preprocessing and Assembly
Barcode Sort and Removal: Raw 102bp paired-end Fastq reads of all 10 N. vectensis
associated genomes were obtained via Illumina-GAII (Illumina, Inc., San Diego, CA)
and received in one data file. To sort out the reads of the individual genomes, 6
base-pair barcodes on the sequences were identified, those sequences were moved
to specified files and the barcode sequence as well as a linking "T" nucleotide was
trimmed from the forward and reverse sequence of each pair; all of this was
performed using the perl script "bcsortfastq-pe.v3.pl" (Obtained from former
Thompson Lab post-doc Samodha Fernando).
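The barcode sorting step can be illustrated with a minimal Python sketch. This is not the original Perl script; the function name, sample names and barcodes below are hypothetical, and for brevity it handles single reads rather than read pairs. It assumes each read begins with its 6 base-pair barcode followed by the linking "T".

```python
from collections import defaultdict

def demultiplex(records, barcodes, linker="T"):
    """Group FASTQ records by leading barcode and trim barcode + linker.

    records:  iterable of (header, sequence, quality) tuples.
    barcodes: dict mapping barcode sequence -> sample name (hypothetical names).
    """
    by_sample = defaultdict(list)
    for header, seq, qual in records:
        for barcode, sample in barcodes.items():
            prefix = barcode + linker
            if seq.startswith(prefix):
                cut = len(prefix)
                # Trim the same number of positions from bases and qualities.
                by_sample[sample].append((header, seq[cut:], qual[cut:]))
                break  # each read belongs to at most one sample
    return dict(by_sample)

# Toy reads; the third read carries no known barcode and is dropped.
reads = [
    ("@r1", "ACGTACTGGGCCAATT", "IIIIIIIIIIIIIIII"),
    ("@r2", "TTGCAATCCCGGGAAA", "IIIIIIIIIIIIIIII"),
    ("@r3", "GGGGGGTAAAACCCCG", "IIIIIIIIIIIIIIII"),
]
samples = demultiplex(reads, {"ACGTAC": "isolate_A", "TTGCAA": "isolate_B"})
```

For paired-end data the same trimming would be applied to the forward and reverse read of each pair before writing them to the per-sample files.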
Read Preprocessing and Adaptor Removal: Contaminating Illumina adaptor
sequences were trimmed from the reads of each paired-end genome fastq file by
utilizing the "fastx_clipper" tool of the FASTX-Toolkit (Hannon Lab open-source
software). This tool also removed pairs of reads that contained ambiguous "N"
nucleotides. Trimmed reads less than 30 nt were removed, and trimmed reads
greater than 30 nt were moved to a separate file as they were now single-ended, which
made it easier to import them into CLC Genomics Workbench (CLC Bio,
Cambridge, MA). The exact command used to perform this step was:
fastx_clipper -a GAGATCGGA -l 30 -i input_file_name -o output_file_name
Trimmed, now single-end sequences over 30 nt were moved to
separate files by the custom script "fastq-purge.py".
Preprocessing was completed by importing both the file of untrimmed paired-end
fastq sequences and the file of trimmed > 30nt fastq single end sequences into CLC
Genomics Workbench and using the "Quality Trimming" function on default
parameters in order to remove low-quality bases before the assembly; reads < 30
base pairs were removed at this point.
Adaptor Discovery: The fact that proprietary Illumina adaptor sequences were part
of the reads was discovered upon mapping the raw reads back to an alternative
assembly via the "Map Reads to Reference" functionality of CLC Workbench. (That
alternative assembly will not be discussed as the preprocessing pipeline was found
to be inferior to the one described above). Upon mapping the raw reads to this
assembly, particular 25-80nt regions of scaffolds, when using manual inspection,
had thousands fold higher coverage than surrounding regions and these were found
to contain identical sequences later identified as the Illumina adaptor sequence.
Assembly and Assembly Check: Preprocessed trimmed and untrimmed reads were
submitted to the CLC Workbench "De Novo Assembly" Function on default
parameters. Assemblies were checked via manual inspection of the raw reads
mapped back to the constructed scaffolds using CLC Genomics Workbench.
Structural and Functional Annotation of Isolate Genomes
Open reading frames (ORFs) were identified by uploading fasta files of the genome
assemblies to RAST (Aziz et al. 2008) on default settings. Average nucleotide
identities (ANI) among isolate genomes were calculated based on pairs of ORFs
sharing bi-directional best hits. ORFs were annotated by using BLASTp (Altschul et
al. 1997) to compare them to the COG and NOG subsets of the eggNOG Database
(v3.0) (Powell et al. 2011). ORFs were given a particular COG/NOG designation if
their BLASTp alignment to a particular protein of a certain COG or NOG had an e-value < 1e-20 and the alignment included the functional portion of the protein (as
designated in the COG/NOG database). Only one orthologous group annotation was
given per ORF, so if an ORF had 2 or more significant hits, only the NOG/COG with
the lowest e-value was kept.
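The best-hit rule described above can be sketched as follows. This is an illustrative reconstruction, not the script used in this work: the function name and identifiers are hypothetical, the input is assumed to be standard 12-column BLAST tabular output (e-value in the eleventh column), and the check that the alignment covers the functional portion of the protein is omitted.

```python
def assign_orthologous_groups(blast_rows, protein_to_group, max_evalue=1e-20):
    """Keep one COG/NOG per ORF: the significant hit with the lowest e-value.

    blast_rows:       iterable of tab-separated BLAST tabular lines.
    protein_to_group: dict mapping database protein id -> COG/NOG id.
    """
    best = {}  # ORF id -> (e-value, orthologous group)
    for row in blast_rows:
        fields = row.rstrip("\n").split("\t")
        orf, subject, evalue = fields[0], fields[1], float(fields[10])
        group = protein_to_group.get(subject)
        if group is None or evalue >= max_evalue:
            continue  # unannotated subject or hit not significant
        if orf not in best or evalue < best[orf][0]:
            best[orf] = (evalue, group)
    return {orf: group for orf, (evalue, group) in best.items()}

# Toy input: orf1 has two significant hits, orf2 has none.
rows = [
    "orf1\tprotA\t80.0\t200\t40\t0\t1\t200\t1\t200\t1e-50\t300",
    "orf1\tprotB\t90.0\t200\t20\t0\t1\t200\t1\t200\t1e-80\t400",
    "orf2\tprotA\t30.0\t50\t35\t0\t1\t50\t1\t50\t1e-5\t40",
]
annotations = assign_orthologous_groups(
    rows, {"protA": "COG0001", "protB": "COG0002"}
)
```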
N. vectensis Bacterial Isolate 16S rRNA Gene Phylogenetic Analysis
Partial sequences of the 16S rRNA genes were obtained for each of the 10 isolates
from Ju Hyoung Lim and Jia Yi Har (a former post-doc and graduate student of the
Thompson lab respectively). These partial sequences, along with full 16S rRNA gene
sequences of the Alphaproteobacteria C. crescentus NA1000 and S. meliloti 1021,
Betaproteobacteria N. multiformis ATCC 25196 and V. paradoxus,
Gammaproteobacteria P. putida F1 and P. aeruginosa PAO1 and Firmicute B. subtilis
BPD-13 (all obtained from MicrobesOnline (Dehal et al. 2009)), were aligned using
the software Muscle on default parameters (Edgar 2004) and a maximum-likelihood
tree was constructed from the alignment using PHYML (Guindon et al. 2010); 100
non-parametric bootstraps were calculated and reported. B. subtilis served as
outgroup.
Symbiont Population Shared and Unshared Ortholog Determination
Genomes of the same population had their annotated ORFs compared to determine
shared and unshared orthologous groups present within them; this was carried out
using custom python scripts. Results were manually put into Venn diagrams.
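A comparison of this kind reduces to set operations on the orthologous groups annotated in each genome. The sketch below is a hypothetical illustration rather than the custom script itself; it returns, for every combination of genomes, the groups found in exactly that combination, which corresponds to one region of a Venn diagram.

```python
from collections import defaultdict

def venn_regions(genome_groups):
    """Map each combination of genomes to the groups found in exactly those genomes.

    genome_groups: dict mapping genome name -> set of COG/NOG ids.
    """
    membership = defaultdict(set)  # group -> genomes that contain it
    for genome, groups in genome_groups.items():
        for group in groups:
            membership[group].add(genome)
    regions = defaultdict(set)  # frozenset of genomes -> groups
    for group, genomes in membership.items():
        regions[frozenset(genomes)].add(group)
    return dict(regions)

# Toy population of three genomes (names and ids are placeholders).
regions = venn_regions({
    "A": {"COG1", "COG2", "COG3"},
    "B": {"COG1", "COG2"},
    "C": {"COG1", "NOG9"},
})
```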
Holobiont Specific Gene Determination
Holobiont-specific genes were defined as gene orthologs found in N. vectensis-associated bacterial genomes that were absent in closely related genomes of the
same genus/family. In order to determine holobiont-specific gene orthologs, a list
was made of all unique COG and NOG groups present in any of that population's
genomes; this was done via custom python script. Next, the fasta files of proteins
from all sequenced close relatives at a particular taxonomic level were obtained
from MicrobesOnline for each population (Dehal et al. 2009). In particular, all
Pseudomonas genomes available (n=27) were used for analysis of the Pseudomonas
oleovorans anemone isolates, all available sequences of Rhizobiaceae (n=19) were
obtained for analysis of the Agrobacterium tumefaciens isolates, 30 members of the
Burkholderiaceae family were used for assessment of the Limnobacter thiooxidans
isolates and all available Rhodobacteraceae (n=7) as well as the Rhizobiaceae were
references for the Stappia isolate.
All of the relatives' protein files were annotated using the COG and NOG
subsets of the eggNOG database (v3.0) as described in the "Structural and
Functional Annotation of Isolate Genomes" section above. From there, unique
orthologous group lists of all the relatives were compared to those of the
populations to determine which orthologous groups were holobiont population
specific. A list of all holobiont specific orthologous groups and genomes in which
they're present can be found in Appendix I.
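Under the definition above, a holobiont-specific orthologous group is one present in at least one isolate of a population and absent from every reference relative. A minimal sketch of that set difference, with hypothetical genome names and identifiers, might look like this:

```python
def holobiont_specific(isolate_groups, relative_groups):
    """Return {group: isolates containing it} for groups absent from all relatives.

    isolate_groups:  dict mapping isolate name -> set of COG/NOG ids.
    relative_groups: dict mapping relative genome name -> set of COG/NOG ids.
    """
    relatives_union = set().union(*relative_groups.values())
    specific = {}
    for isolate, groups in isolate_groups.items():
        for group in groups - relatives_union:
            specific.setdefault(group, set()).add(isolate)
    return specific

# Toy data: NOG5 is the only group never seen in a relative.
specific = holobiont_specific(
    {"iso1": {"COG1", "NOG5"}, "iso2": {"NOG5", "NOG7"}},
    {"rel1": {"COG1"}, "rel2": {"NOG7", "COG2"}},
)
```

Recording which isolates carry each group, rather than only the group list, gives the kind of per-genome listing reported in Appendix I.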
Determination of population-specific, 'operational core' orthologs and flexible
orthologs
Core genes are defined as the set of genes shared by all members of a population
while flexible genes are present in a subset of population genomes and may
represent recent evolutionary events (i.e. horizontal gene transfer). To investigate
the distribution and origin of core and flexible genes from N. vectensis associated
isolates, while accounting for the partial genome assembly of isolates, we
operationally defined "Core orthologous groups" as those that were present in the
majority of isolates of a population (i.e. three or more of the four Pseudomonas
strains, two or more of the three Agrobacterium strains or both Limnobacter
strains). Use of ortholog groups as the basis of comparison rather than reciprocal
hits of genes allowed genes split by partial assembly to be scored as a match.
Flexible orthologous groups were the remaining set that did not meet those criteria.
The Stappia strain was left out of core/flexible analysis as it was the only sequenced
member of its population.
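The operational definition amounts to a strict-majority rule: a group is core if it occurs in more than half of a population's genomes (three of four, two of three, or two of two). The following sketch, with placeholder genome and group names, shows one way to express it:

```python
from collections import Counter

def core_and_flexible(genome_groups):
    """Split orthologous groups into operational core and flexible sets.

    genome_groups: dict mapping genome name -> set of COG/NOG ids.
    A group is 'core' when present in a strict majority of the genomes.
    """
    n = len(genome_groups)
    counts = Counter(g for groups in genome_groups.values() for g in set(groups))
    core = {group for group, count in counts.items() if count > n / 2}
    flexible = set(counts) - core
    return core, flexible

# Toy population of four genomes: A is in 4, B in 3, D in 2, C in 1.
core, flexible = core_and_flexible({
    "p1": {"A", "B", "C"},
    "p2": {"A", "B"},
    "p3": {"A", "B", "D"},
    "p4": {"A", "D"},
})
```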
MEGAN Analysis of Operational Core and Flexible Ortholog Groups:
In order to observe the phylogenetic origins of the core and flexible orthologous
groups, the genes annotated as particular orthologous groups were classified by
performing a BLASTp against the NCBI NR Database (Altschul et al. 1997) with
default parameters but with tabular (-m8) output. The BLASTp results were
imported into MEGAN (MEtaGenome ANalyzer) using Least Common Ancestor
(LCA) parameters of at least 3 hits to a clade for significance and a minimum
bitscore cutoff of 50 (Huson et al. 2007).
Nematostella Horizontal Gene Transfer Analysis
To assess the likelihood of horizontal gene transfer between Nematostella and the
Pseudomonas, Agrobacterium, Stappia and Limnobacter populations of symbionts, a
nucleotide BLAST was performed between the contigs of all 10 assembled N.
vectensis associate genomes and the scaffolds of the complete N. vectensis genome
(Putnam et al. 2007). Sequence matches were determined using a stringent
expected value cutoff of 1e-30 for accepting a match. N. vectensis scaffolds bearing
bacteria-like DNA were manually inspected and analyzed with custom python
scripts in order to determine their GC content, ambiguous base composition and
size, which were used to assess the likelihood of horizontal gene transfer.
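The per-scaffold statistics can be computed with a few string counts. The sketch below is illustrative only; it assumes that "N" is the only ambiguous base symbol and that GC content is reported over called (non-N) bases, neither of which is specified in the text.

```python
def scaffold_stats(sequence):
    """Return length, GC fraction and ambiguous-base fraction of a scaffold.

    Assumptions (not from the original scripts): only 'N' counts as
    ambiguous, and GC fraction is computed over non-N bases.
    """
    seq = sequence.upper()
    length = len(seq)
    n_count = seq.count("N")
    called = length - n_count
    gc = seq.count("G") + seq.count("C")
    return {
        "length": length,
        "gc_fraction": gc / called if called else 0.0,
        "ambiguous_fraction": n_count / length if length else 0.0,
    }

stats = scaffold_stats("GGCCAATTNN")
```

A short scaffold with a GC content far from the host average, or one that consists largely of ambiguous bases around the bacteria-like region, would point toward contamination or misassembly rather than a genuine transfer.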
Inter-Population Lateral Gene Transfer Analysis
In order to assess whether there was evidence of horizontal gene transfer among the
different symbiont populations, the genes of every pair of genomes from different
populations were compared using BLASTn and checked to see if any of them shared
>95% nucleotide identity, as that would be a strong indicator of recent gene transfer.
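The identity screen can be sketched as a filter over BLASTn tabular output, where percent identity is the third column. The function and gene names below are hypothetical, and no minimum alignment length is applied because none is stated in the text.

```python
def candidate_transfers(blast_rows, gene_population, min_identity=95.0):
    """Return (query, subject, identity) for cross-population hits above the cutoff.

    blast_rows:      iterable of tab-separated BLASTn tabular lines.
    gene_population: dict mapping gene id -> population name.
    """
    hits = []
    for row in blast_rows:
        fields = row.rstrip("\n").split("\t")
        query, subject, identity = fields[0], fields[1], float(fields[2])
        if gene_population[query] == gene_population[subject]:
            continue  # only comparisons between different populations count
        if identity > min_identity:
            hits.append((query, subject, identity))
    return hits

# Toy hits: one cross-population hit at 98.5%, one within-population, one at 80%.
rows = [
    "geneP1\tgeneA1\t98.5\t900\t13\t0\t1\t900\t1\t900\t0.0\t1500",
    "geneP1\tgeneP2\t99.0\t900\t9\t0\t1\t900\t1\t900\t0.0\t1600",
    "geneP2\tgeneA1\t80.0\t900\t180\t0\t1\t900\t1\t900\t1e-90\t400",
]
populations = {"geneP1": "Pseudomonas", "geneP2": "Pseudomonas",
               "geneA1": "Agrobacterium"}
candidates = candidate_transfers(rows, populations)
```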
Symbiont Shared Phage and Prophage Element Analysis
To test if any of the populations shared similar phage or prophage elements, the
proteomes of each isolate were BLASTped against the PHAST phage and prophage
database (Zhou et al. 2011) and the top hits for each gene were compared among
the N. vectensis associates to look for any that were identical among different
populations of associates. Top hits were compared using a custom python script.
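The comparison of top hits can be illustrated as follows. This is a hypothetical sketch, not the custom script: it assumes the top phage database hit for each gene has already been extracted, and it reports any phage protein that is the top hit for genes from more than one population.

```python
from collections import defaultdict

def shared_phage_hits(top_hits, isolate_population):
    """Return {phage protein: populations} for hits seen in more than one population.

    top_hits:           dict mapping isolate -> {gene id: top phage database hit}.
    isolate_population: dict mapping isolate -> population name.
    """
    populations_by_hit = defaultdict(set)
    for isolate, hits in top_hits.items():
        for phage_protein in hits.values():
            populations_by_hit[phage_protein].add(isolate_population[isolate])
    return {hit: pops for hit, pops in populations_by_hit.items() if len(pops) > 1}

# Toy data: phageX_tail is shared across populations, phageY_capsid is not.
shared = shared_phage_hits(
    {
        "Pseudomonas_1": {"g1": "phageX_tail", "g2": "phageY_capsid"},
        "Pseudomonas_2": {"g1": "phageY_capsid"},
        "Limnobacter_1": {"g7": "phageX_tail"},
    },
    {
        "Pseudomonas_1": "Pseudomonas",
        "Pseudomonas_2": "Pseudomonas",
        "Limnobacter_1": "Limnobacter",
    },
)
```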
Transcriptomics Methods
RNA Extraction, rRNA Subtraction Methods and Illumina Library Preparation
Isolation of RNA, ribosomal RNA depletion, cDNA synthesis and preparation of
libraries for Illumina sequencing were carried out by Dr. Samodha Fernando
according to methods adapted and described in detail elsewhere (Timberlake, et al.,
In prep; Penn, et al., Submitted). In brief, 20 lab-raised N. vectensis were washed
with saline, homogenized and their RNA was extracted through use of TRIzol (Life
Technologies) according to manufacturer's instructions, including treatment with
DNAse followed by phenol-chloroform-extraction. The RNA was divided between six
samples that were each subjected to various combinations of rRNA depletion
protocols as an initial screen of protocol effectiveness (Table 1). These depletion
methods designed to eliminate eukaryotic and bacterial rRNAs included treatment
with RNAseH (Personal Comm. Samodha Fernando), use of MICROBEnrich™ Kit
(Ambion Part No. AM1901) and MICROBExpress™ Kit (Ambion Part No. AM1905)
and treatment with duplex-specific nuclease (Personal comm. Chisholm Lab).
Following depletion of rRNA, samples were reverse transcribed to cDNA (SuperScript
Double-Stranded cDNA Synthesis Kit (Catalog # 11917-020)).
Table 1. Bacterial mRNA enrichment techniques and their principles
* Poly(A)Purist - A kit that relies on use of oligo(dT) cellulose to preferentially bind poly(A) tails of eukaryotic mRNA; this is used to remove unwanted eukaryotic mRNAs from our samples.
* RNaseH - Endonuclease that specifically degrades RNA in RNA:DNA hybrids; DNA oligos that bind to specific conserved regions of rRNA are added with it to selectively remove rRNAs.
* mRNA-ONLY - A kit that relies on an endonuclease that selectively degrades RNAs with 5'-monophosphates; rRNAs have this feature while mRNAs do not, so they are selectively degraded.
* MICROBEnrich/MICROBExpress - A pair of kits that rely on a novel capture hybridization protocol to selectively degrade eukaryotic rRNA and bacterial rRNA respectively.
* Duplex-specific nuclease - A nuclease that specifically degrades double-stranded DNA and DNA in DNA:RNA hybrids.
To prepare the Illumina libraries, cDNA for each sample was sheared into pieces
of 100-300 base pairs, purified and ligated to proprietary Illumina adaptor
sequences (Illumina, Inc., San Diego, CA) with unique 6 base-pair barcode sequences
to designate samples for multiplexing within a single lane. Barcoded adaptor-ligated
reads were then subject to size selection to remove self-ligated adaptors. Cleaned
and merged adaptor-ligated reads were then submitted to MIT personnel, who
sequenced them using the Illumina-GAII platform as described in Timberlake et al.,
in prep; Penn et al., Submitted. Sequence and sequence quality data in the form of
FastQ files obtained from the Illumina platform form the basis of
the dataset analyzed in this thesis.
Read Filtering
Barcode Sort and Removal: Raw reads obtained from the Illumina GAII were sorted
into respective samples by barcode and subsequently had that barcode + linking "T"
nucleotide removed via the perl script "bcsort_fastqpev3.pl" (Obtained from
former post-doc in the lab Samodha Fernando).
Removal of rRNAs: Raw read pairs were compared against the Silva large and small
subunit rRNA databases (Quast et al., 2013) using BLASTn and those read pairs
having one or both ends matching a database sequence with a bitscore > 50.0 were
removed and categorized as the type of rRNA matching its highest hit. For removal
of 5S rRNA and Internal Transcribed Spacers (ITS), remaining non-rRNA reads were
compared against custom databases of bacterial and Nematostella 5S rRNA and ITS
using BLASTn, and again, those read pairs having one or both ends matching a
sequence within one of these databases with bit score > 50.0 were removed and
classified as the type of rRNA according to the identity of the highest hit.
Trimming of adaptor-contaminated reads: Visual inspection of the reads showed
that some of them contained adaptor sequence.
Adaptor contamination was removed from the rRNA-free reads by searching for the pattern
"AGATCGG[ACTGN]+?NNN[ACGTN]+?CCGATCT" (Python regular expression)
within merged reads, each built by concatenating the forward read, three
ambiguous nucleotides "NNN" and the reverse complement of the reverse read. If
the pattern was found in the merged read, the merged read was truncated
immediately before the start of the initial adaptor sequence "AGATCGG". This
produced an adaptor-free single-ended sequence. If this sequence was longer
than 25 nucleotides, it was kept for analysis.
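The trimming rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the script used in the thesis: the function names and the example read-through sequence are hypothetical, while the regular expression, the "NNN" spacer and the 25-nucleotide cutoff come from the text.

```python
import re

# Pattern quoted in the text: adaptor start, NNN spacer, adaptor reverse complement.
ADAPTOR_PATTERN = re.compile(r"AGATCGG[ACTGN]+?NNN[ACGTN]+?CCGATCT")
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def trim_adaptor(forward, reverse):
    """Merge a read pair and cut it before the first adaptor occurrence.

    Returns the adaptor-free single-ended sequence, or None when the
    adaptor pattern is absent and the pair needs no trimming.
    """
    merged = forward + "NNN" + reverse_complement(reverse)
    match = ADAPTOR_PATTERN.search(merged)
    if match is None:
        return None
    return merged[:match.start()]


def long_enough(seq, min_length=25):
    """Keep only trimmed sequences longer than min_length nucleotides."""
    return len(seq) > min_length


# Hypothetical 32-nt insert sequenced with 13 nt of adaptor read-through.
insert = "ACGTTGCA" * 4
fwd = insert + "AGATCGGAAGAGC"
rev = reverse_complement(insert) + "AGATCGGAAGAGC"
trimmed = trim_adaptor(fwd, rev)
```

Note that the lazy quantifiers require at least one base between "AGATCGG" and the spacer, so a read ending exactly at the seventh adaptor base would not match this pattern.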
Removal of reads with tandem repeats: Tandem repeats were removed with a custom
Perl script. A read pair was considered to contain a tandem repeat if a 6-mer was
repeated six or more times.
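A sketch of this check follows, in Python rather than the Perl of the original script. It assumes that "tandem" means six or more consecutive copies of the same 6-mer, which the text does not state explicitly; the function names are hypothetical.

```python
import re


def has_tandem_repeat(seq, k=6, min_copies=6):
    """True if some k-mer occurs at least min_copies times back to back.

    Assumption: copies must be consecutive (a true tandem array).
    """
    pattern = re.compile(r"([ACGTN]{%d})\1{%d,}" % (k, min_copies - 1))
    return pattern.search(seq) is not None


def pair_has_tandem_repeat(forward, reverse):
    """Flag a read pair when either end contains a tandem repeat."""
    return has_tandem_repeat(forward) or has_tandem_repeat(reverse)
```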
Merging overlapping paired ends: Paired-end reads that had overlapping sequence
were merged using the software program SHERA with confidence metric >= 0.7
(Rodrigue et al. 2010).
Read filtering results:
Putative mRNA reads that passed the
filtering steps were in one of three forms: 1) single-ended because of adaptor
trimming; 2) single-ended because overlapping paired ends were merged with
SHERA; 3) paired-end because the ends did not overlap and no adaptor was present. For
further analysis, each of these forms counted as one read pair unit.
Taxonomic binning of Read Pair Units through MEGAN
Read pair units were compared against all sequences in the NCBI database using
BLASTx (Altschul et al. 1997) with parameters (-m 8 -W 3 -e 20 -Q 11 -F "m S"). The
BLASTx results were imported into MEGAN with a bit score cutoff of 40.0 and a lowest
common ancestor cutoff of 2 matches (Huson et al. 2007). Read pairs with ends
matching different domains of life were discarded; those matching the same domain
of life were annotated using the end with the more specific taxonomy. Reads binned
at least at the level of Bacteria or Cnidaria were exported for annotation to
understand the functions of the anemone host and its bacterial associates. The reads
binned as "Virus", "Archaea" and "Non-Cnidarian Eukaryote" have been exported
but have yet to be annotated and analyzed.
EggNOG Annotations
Read pair units of interest exported from MEGAN were annotated using BLASTx
against the eggNOG database version 3.0 (Powell et al. 2011); "Bacterial" reads were
compared against the NOG and COG subsets of eggNOG while "Cnidarian" reads
were compared against the opiNOG subset. A single-ended read pair unit was
annotated with the orthologous group and function of its top hit if that hit had a
bit score > 50.0, and was counted as two read
counts. Paired-end read units had each end compared separately by BLASTx against the
desired subset of the eggNOG database version 3.0. Using a custom Python script,
bit scores were recalculated for pairs whose ends hit the same member of the database
(Timberlake et al. in prep); this recalculation accounts for the added
confidence that both ends belong to a particular orthologous group when
they align to the same target sequence. If the merged bit score was greater than
all other scores for the two ends and greater than 50.0, the read pair unit was
annotated with the identity and function of the orthologous group of the top
merged hit and counted as two read counts. If the individual ends had higher bit
scores than the merged hits, the ends were treated and annotated as single
ends with one read count each; if both ends then fell
within the same orthologous group, that group received two
counts. This counting system was used in order to match the functional
annotation counting scheme of MEGAN.
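The counting rules above can be summarized as a short Python sketch. It is illustrative only: the text does not describe how merged bit scores were recalculated (Timberlake et al. in prep), so the default `merge` function below, a simple sum, is an assumption, and all function names and identifiers are hypothetical.

```python
from collections import Counter

MIN_BITSCORE = 50.0  # cutoff stated in the text


def count_single(top_hit):
    """top_hit is (orthologous_group, bitscore) or None; counts twice if kept."""
    counts = Counter()
    if top_hit is not None and top_hit[1] > MIN_BITSCORE:
        counts[top_hit[0]] += 2
    return counts


def count_pair(end1_hits, end2_hits, og_of, merge=lambda a, b: a + b):
    """end*_hits map target -> bitscore; og_of maps target -> orthologous group.

    merge is a placeholder for the unpublished bit score recalculation.
    """
    counts = Counter()
    shared = set(end1_hits) & set(end2_hits)
    merged = {t: merge(end1_hits[t], end2_hits[t]) for t in shared}
    best_single = max(list(end1_hits.values()) + list(end2_hits.values()),
                      default=0.0)
    if merged:
        target = max(merged, key=merged.get)
        if merged[target] > max(best_single, MIN_BITSCORE):
            counts[og_of[target]] += 2  # pair annotated as one unit
            return counts
    for hits in (end1_hits, end2_hits):  # fall back to single-end treatment
        if hits:
            target = max(hits, key=hits.get)
            if hits[target] > MIN_BITSCORE:
                counts[og_of[target]] += 1
    return counts
```

Under the fallback branch, two ends that land in the same orthologous group naturally contribute one count each, which gives the two counts described in the text.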
Method Symbiont Genome Read Mapping
Reads that had been filtered through the metatranscriptomics preprocessing
pipeline were merged into one file and imported into CLC Genomics Workbench
(CLC Bio, Cambridge, MA). Using the "map reads to reference sequence"
function of CLC with parameters "Similarity = 0.9" and "Length Fraction = 0.5",
the reads were aligned to the annotated symbiont reference genomes. Results
were filtered by "Consensus length > 50", which meant that at least half of a full-length
read had to align to the reference in order to be counted. Further, genes with >
5x coverage were manually inspected to ensure an even distribution of reads over
the length of the gene; genes with coverage deemed too skewed were excluded from
further analysis. The filtered results were exported and eggNOG version 3
annotations were added as described in the "EggNOG Annotations" section above.
Comparative Genomics of Nematostella vectensis Associated Bacteria
Introduction and Initial Selection of Bacterial Symbionts for Sequencing
While previous experimental work has shown that similar microbial
populations are associated with Nematostella over distinct temporal and geographic
locations (Figure 1), the functions these bacteria may be performing within their
anemone hosts and the mechanisms the bacteria may be using to associate remain
unknown. To begin to address those questions, cultured bacteria found to
be associated with Nematostella were whole-genome sequenced, both to use
comparative genomics of those sequences for clues to the ecology of the anemone
symbionts and to use the sequences as references for quantitative
metatranscriptomics work; this section focuses exclusively on the comparative
genomics analysis.
[Figure 1 image: map with pie charts labeled Mahone Bay, Nova Scotia; MIT (Laboratory); Great Sippewissett Marsh, Massachusetts; and Clinton Harbor, Connecticut, with sampling labels July 2008, November 2008 and November 2008 sediment. Key: Alphaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Gammaproteobacteria, CFB, Cyanobacteria, Planctomycetes, Verrucomicrobia, Chloroflexi, Chloroplasts, Deferribacteres, OD1, Spirochetes, Tenericutes, Unknown.]
Figure 1 Map and distribution of microbial diversity in field and lab-raised N. vectensis as
determined through 16S clone libraries (Har, MS Thesis). Each Pie-graph represents a distinct 16S
clone library from a particular location and time as labeled. Colors on the graphs correspond to
classes of bacteria as specified in the key.
Bacterial isolates cultured from field or laboratory raised N. vectensis were
identified by 16S rRNA gene sequences (Har, MS Thesis). Isolates were selected for
genome sequencing if they appeared to be stably associated with the anemone host
based on recovery of the 16S rRNA sequence type (>99% nucleotide identity) in
multiple samples (Table 1). Genome sequences were prepared for four
Pseudomonas oleovorans strains (B4, Gab, Isu and 47) representing a 16S rRNA
sequence type recovered from anemones collected in the field and from two
laboratories over a two-year timespan, and for two Limnobacter thiooxidans strains
(F1, FCMA) that also shared a 16S rRNA sequence with populations detected in both
the field-collected and lab-raised anemones. In addition, three Agrobacterium
tumefaciens strains (Isu, D5 and D8) were included in the sequencing effort because
they were observed in both culture and clone libraries of laboratory-raised
anemones over a two-year period. A single Stappia stellulata strain was also included
because it was a close relative (at the genus level) of previously described associates of
Eastern oysters, Crassostrea virginica, and corals (Boettcher et al. 2000, Uchino et al.
1998). It should be noted that only culturable populations could be included
in the current genome sequencing effort. While two 16S rRNA sequence types
similar to strains of Endozoicomonas and Campylobacterales, respectively, have been
observed in N. vectensis at multiple field sites and have been described as stable
associates, neither has been recovered in culture despite substantial effort (JH
Lim, pers. communication, data not shown).
The phylogenetic relationship among the 10 strains chosen for genome
sequencing is depicted in Figure 2; further, a comparison of the strains of each
population to their closest type strain shows they are more closely related to each other
than to any other sequenced genome (see Appendix I). The Pseudomonas and
Limnobacter populations fall within the Gammaproteobacteria and
Betaproteobacteria classes, respectively, while the Agrobacterium and Stappia
populations both fall within the Alphaproteobacteria class.
[Figure 2 image: 16S rRNA phylogenetic tree; most tip labels are not recoverable from the source. Legible labels include Bacillus subtilis, Stappia stellulata and Agrobacterium tumefaciens. Class key: Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria.]
Figure 2 16S rRNA Tree of Symbiont Phylogenetic Relatedness. Maximum-likelihood phylogenetic
tree constructed with the 16S rRNA sequences of strains isolated from Nematostella vectensis (highlighted
in red) and a collection of other Proteobacteria. Different color blocks represent different classes of
Proteobacteria as detailed in the key. The gram-positive Bacillus subtilis serves as the out-group. One
hundred non-parametric bootstraps were calculated; branches with bootstrap support of > 50 are
labeled. (Scale is average substitutions per site)
Genome Assembly, Annotation, and Illumina Adaptor Artifact Detection
The selected genomes were prepared and sequenced using the Illumina GA-II
platform, which, after reads were sorted by barcode and the barcodes removed, yielded
files of paired-end 102 bp sequences with quality score data for each
strain. To preprocess the samples for better assembly, all read pairs
containing "Ns" were eliminated and the reads were trimmed to 60 nucleotides because of
the known decline in quality near the ends of the reads. This resulted in post-processed
read pairs numbering from 689,036 to 1,854,925 among the ten isolates,
with no correlation to population. The number and average length of these read
pairs allowed calculation of the expected coverage of every genome, which
ranged from 13.3x-23.1.8x for the Pseudomonas and Agrobacteria strains to 29.4x
and 64.1x for the Limnobacters, which had smaller genome sizes (Table 2). Clean
reads were assembled into scaffolds using the software Velvet (Zerbino et al. 2008) along with the
optimizing script velvetoptimiser.pl (on default parameters); repetitive sequence
errors were corrected using the AMOS software package.
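The expected coverage follows from the number of read pairs, the trimmed read length and the genome size. The sketch below uses the figures that appear in the Ss_F1 row of the assembly statistics (1,107,834 read pairs, a 4,778,618 bp assembly, 27.8x); treating the assembly size as the genome size is an assumption, and the function name is hypothetical.

```python
def expected_coverage(read_pairs, read_length, genome_size):
    """Expected fold coverage: total sequenced bases divided by genome size."""
    return read_pairs * 2 * read_length / genome_size


# Ss_F1: 1,107,834 read pairs trimmed to 60 nt, 4,778,618 bp assembly.
ss_f1 = expected_coverage(1_107_834, 60, 4_778_618)
print(round(ss_f1, 1))  # 27.8
```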
The initial assemblies suggested that most genomes could be assembled
into fewer than 1000 scaffolds, which was on par with assemblies used in other
comparative genomics studies of incomplete genomes (O'Brien et al. 2011) (Table
2). However, visual inspection of the genomes in CLC Genomics Workbench
made clear that there were problems with the assemblies. When the initial reads used
to assemble the genomes were mapped back to the genome scaffolds, multiple small
regions of extremely high coverage were observed (almost 1000-fold higher than
neighboring regions) that were identical to the Illumina adaptor sequence. We
determined that the Illumina adaptor, a proprietary sequence construct into which
the DNA of interest is ligated to support the basic mechanics of the sequencing reaction,
was sequenced in cases where the DNA insert
was shorter than the read length the machine was programmed to
sequence, causing the Illumina sequencer to read into the adaptor
sequence. It thus appeared that the presence of this internal adaptor sequence
caused the artificial joining of contigs.
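Statistics of Adaptor-Containing Genomic Reads
[Flattened table; column headers were lost in extraction. Values are listed in their original order.]
Ss_F1: 1,107,834 (27.8x); 4,778,618; (65.4%); 735; 12,606; 51,920; 5026
Ss_F1: 717,642 RPs; 4,253; 231,719 ATRs (25.3x); 4,40,534; 3,410; 1,932; 13,551; 5,487; 1,756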
To fix this misassembly, the original reads were subjected to a modified QC
procedure with an added step that removed internal adaptor sequences using the FASTX-Toolkit
(Hannon Lab open access software). Non-adaptor-containing read pairs
(RPs) and adaptor-trimmed sequences (ATRs) were imported into CLC Genomics
Workbench (column 2 of Table 3); once in CLC, the read ends were trimmed using
the CLC quality trimmer function on default parameters and assembled using the
CLC assembler (CLC Bio, Cambridge, MA). This cleaner pre-processing method
produced more fragmented assemblies: the
Pseudomonas and Agrobacteria strains assembled into 4,429 to 6,720 contigs, roughly
3-10 times the 408 to 2,149 scaffolds of the previous assemblies (Table 2
and 3). However, the assemblies were now free of artifacts from contaminating
adaptor sequences, as ascertained by mapping reads back to the scaffolds for each
sample.
Following assembly, open reading frames (ORFs) were identified by
uploading the genome assemblies to RAST (Aziz et al. 2008) and the genes
associated with ORFs were annotated using the COG and NOG subsets of the eggNOG
Database v3.0 (Powell et al. 2011).
Diversity within Holobiont-Associated Microbial Populations
Since the bacterial genomes within each population isolated in this study
share >99% identity of the 16S rRNA gene, we wanted to know whether the genomes
within those populations were clonal or had distinct genomic repertoires. This
question was addressed first by calculating the average nucleotide identity (ANI) of
each pair of genomes within each population, based on orthologs
identified by best bi-directional BLASTn in RAST (Aziz et al., 2008). For the
Pseudomonas population the ANI between strains ranged from 94.91%-97.08%, for
the Agrobacterium strains it ranged from 95.5%-97.46% and the ANI between the
two Limnobacter isolates was 92.5%. This revealed that although the strains within
our populations are quite similar, they are not clonal, as there are nucleotide
differences among their shared genes. An ANI greater than 94-96% has been cited
as evidence that two strains belong to the same species (Richter and Rossello-Mora,
2009). In our study the Pseudomonas and Agrobacterium strains were
each related above this threshold and may be part of the same ecological population,
while the ANI of 92.5% between the two Limnobacter strains may indicate they are
not the same ecological species. However, when compared to the most closely
related genome sequences in the RAST database, it is clear that the isolates from N.
vectensis are more closely related to each other than to near phylogenetic neighbors
(Appendix I).
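As an illustration of the ANI calculation, the sketch below averages percent identity over reciprocal best hits between two genomes. The thesis used the ortholog comparison built into RAST; this code is not that implementation. The tuple layout, the use of bit score to pick best hits and the unweighted mean are all assumptions made for the example, and the gene names are hypothetical.

```python
def best_hits(hits):
    """Keep the highest-bitscore hit per query.

    hits: iterable of (query, subject, percent_identity, bitscore).
    """
    best = {}
    for query, subject, identity, bitscore in hits:
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, identity, bitscore)
    return best


def average_nucleotide_identity(a_vs_b, b_vs_a):
    """Mean identity over gene pairs that are each other's best hit."""
    fwd, rev = best_hits(a_vs_b), best_hits(b_vs_a)
    identities = [identity
                  for query, (subject, identity, _) in fwd.items()
                  if subject in rev and rev[subject][0] == query]
    return sum(identities) / len(identities) if identities else None


# Toy data: a1/b1 and a2/b2 are reciprocal best hits; a3/b3 is not.
a_vs_b = [("a1", "b1", 96.0, 500), ("a1", "b2", 80.0, 200),
          ("a2", "b2", 94.0, 400), ("a3", "b3", 90.0, 300)]
b_vs_a = [("b1", "a1", 96.0, 500), ("b2", "a2", 94.0, 400),
          ("b3", "a1", 85.0, 100)]
ani = average_nucleotide_identity(a_vs_b, b_vs_a)
```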
A further test of the diversity of the populations was an analysis
of shared and unshared gene content. Normally this would be tested by performing
a reciprocal best BLAST hit or reciprocal smallest distance analysis between ORFs
from each pair of genomes within a population and then merging orthologous
groups, permitting the visualization of shared and unshared genes in a Venn
diagram. However, because the relatively low coverage of the genomes prevented
complete assemblies, structural gene annotation likely produced
artificially high numbers of genes, because a gene split between the ends of two
non-contiguous contigs is annotated as two distinct genes. Reciprocal best
BLAST hit analysis would therefore fail to group split genes together and would make clonal
strains appear artificially distinct. Artificially high gene numbers do indeed
appear to be the case for the poorly assembled Nematostella isolate genomes, as the protein
coding gene density (number of protein coding genes per kilobase pair) of the 4 isolated
Pseudomonas strains is ~1.23 whereas the density for four fully assembled
reference strains of Pseudomonas is ~0.90 (Table 4). Heightened coding densities
are also observed for the Agrobacteria (data not shown).
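The gene density figure can be reproduced from the isolate rows of Table 4. The short sketch below uses the gene counts and genome sizes listed there; the function and variable names are hypothetical.

```python
def per_kb(count, genome_size_bp):
    """Number of features per kilobase pair of genome."""
    return count / (genome_size_bp / 1000)


# (protein coding genes, genome size in bp) from the isolate rows of Table 4.
isolates = {
    "Po_B4": (6781, 5410491),
    "Po_Gab": (6372, 5196558),
    "Po_Isu": (6553, 5288085),
    "Po_47": (6359, 5254749),
}
densities = {name: per_kb(genes, size) for name, (genes, size) in isolates.items()}
mean_density = sum(densities.values()) / len(densities)
print(round(mean_density, 2))  # 1.23
```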
Table 4 Comparison of gene densities of partially sequenced Pseudomonas genomes isolated from
Nematostella and fully assembled Pseudomonas reference genomes
Strain                       Protein coding genes   Genome size (bp)   Genes per kb   Unique COG/NOGs   COG/NOGs per kb
Po_B4                        6,781                  5,410,491          1.25           2,206             0.41
Po_Gab                       6,372                  5,196,558          1.23           2,252             0.43
Po_Isu                       6,553                  5,288,085          1.24           2,292             0.43
Po_47                        6,359                  5,254,749          1.21           2,203             0.42
Pseudomonas mendocina ymp    [illegible]            [illegible]        [illegible]    [illegible]       [illegible]
Pseudomonas aeruginosa PA1   [illegible]            [illegible]        [illegible]    [illegible]       [illegible]
Pseudomonas stutzeri A1501   4,128                  4,567,418          [illegible]    2,462             0.54
Pseudomonas putida F1        [illegible]            5,959,964          [illegible]    [illegible]       [illegible]
Information for reference strains (the four Pseudomonas reference rows above, shaded dark orange in the
original) was obtained using the web reference: www.microbesonline.com (Dehal et al. 2009)
To perform a preliminary comparative genome analysis while avoiding
artificial inter-strain differences caused by the high numbers of split genes in poorly
assembled genomes, the genes were functionally annotated by one-way best BLAST
hits to the eggNOG Database (v3.0) of COGs (supervised clusters of orthologous
genes) or NOGs (non-supervised clusters of orthologous genes) (Powell et al. 2011).
This analysis assigned each full or split gene to an orthologous group.
Shared orthologous groups were then determined between two genomes simply by
checking for the presence or absence of a particular COG/NOG group; thus, a
gene artificially split into two fragments in one genome and present as a single
intact gene in another genome would register as a single
match to the same COG/NOG and be counted as shared. Evidence that this is a better
approach is that the unique COG/NOG densities of the N. vectensis Pseudomonas isolates
and the fully sequenced reference Pseudomonas in Table 4 average 0.42 and 0.49,
respectively, much closer than the protein coding gene densities compared above.
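The presence/absence comparison reduces to set operations on orthologous group identifiers. The following sketch uses invented gene names and COG/NOG identifiers purely for illustration; it shows why a gene split into two fragments still counts as a single shared group.

```python
def orthologous_groups(gene_to_og):
    """Collapse a gene -> OG annotation table to the set of OGs present."""
    return set(gene_to_og.values())


# Hypothetical annotations: geneA1 is split across two contigs in genome A.
genome_a = {"geneA1_frag1": "COG0001", "geneA1_frag2": "COG0001",
            "geneA2": "NOG00002"}
genome_b = {"geneB1": "COG0001", "geneB2": "COG0003"}

ogs_a = orthologous_groups(genome_a)
ogs_b = orthologous_groups(genome_b)

shared = ogs_a & ogs_b       # {"COG0001"}: both fragments map to one group
unique_to_a = ogs_a - ogs_b  # {"NOG00002"}
unique_to_b = ogs_b - ogs_a  # {"COG0003"}
```

A reciprocal best hit analysis on the same data would have paired only one of the two fragments with geneB1 and left the other looking strain-specific.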
Analysis of shared COG/NOG groups was carried out within the three
Nematostella associated populations of Pseudomonas, Agrobacterium, and
Limnobacter (Figure 3); the Stappia strain was excluded, as it was the only strain of
its population. For all three of the assessed groups, the genomes
within populations do not appear to be clonal, as each genome has from 60 to 279 unique
COG/NOG orthologous groups. It is possible that some
COG/NOGs appear unique only because the corresponding genes were not sequenced in the
other genomes, and confirmation by PCR is necessary before strain
heterogeneity can be considered established. However, given the multi-fold sequencing
depths of this study, it is
unlikely that such high numbers of unique COG/NOGs would be detected by chance.
Additional evidence for genomic heterogeneity among
strains of the same 16S rRNA gene sequence type is that pooling sequences from
multiple strains failed to improve assembly. If strains were clonal, we
would expect pooling the sequences to improve the assembly; however, in
the case of the Pseudomonas, Agrobacteria, and Limnobacter populations the
opposite occurred and the N50 decreased. The bacteria isolated from Nematostella
thus seem to have richly diverse genomic repertoires.
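For reference, N50 is the contig length at which contigs of that length or longer contain at least half of the assembled bases. A minimal sketch with made-up contig lengths (the function name and numbers are illustrative, not taken from the thesis assemblies):

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half the bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


single_strain = [100, 80, 60, 40, 20]  # hypothetical contig lengths
pooled = [50, 50, 50, 50, 50, 50]      # same total bases, more fragmented
```

Here the hypothetical pooled assembly has the lower N50, the same direction of change reported for the pooled strain assemblies.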
Comparison of Symbiont Genomes
One of the primary goals of sequencing the ten bacterial isolates from Nematostella
was to use the sequences to gain knowledge about the ecology of the
anemone holobiont. The first method employed was to use fully
sequenced close phylogenetic relatives of the Nematostella isolates to determine whether
the anemone strains contained any unique protein coding genes and
whether those unique genes were shared among the four populations.
To do this, reference genomes of fully sequenced phylogenetic
neighbors were selected for each holobiont strain group from the MicrobesOnline
database (Dehal et al. 2009). In particular, all available Pseudomonas genomes
(n=27) were used for analysis of the Pseudomonas anemone isolates, all available
sequences of Rhizobiaceae (n=19) were obtained for analysis of the Agrobacteria
population, 30 members of the Burkholderiaceae family were used for assessment
of the Limnobacters, and all available Rhodobacteraceae (n=7) as well as the
Rhizobiaceae served as references for the Stappia. These reference genomes were
annotated just as the isolates were, by one-way BLAST against the COG and NOG
databases (Powell et al. 2011). To determine whether a gene within an isolate genome
was unique, its COG or
NOG designation was compared against all COGs and NOGs within the reference
strains.
The N. vectensis holobiont Pseudomonas strains collectively contained 116
distinct orthologous groups that were not present in any of the reference
Pseudomonas genomes. The majority of these holobiont-specific orthologous groups
were hypotheticals (COG Category S) and were observed in only one of the four
genomes. Similarly, the three Agrobacterium strains contained 100 holobiont-specific
COG/NOGs, the Limnobacters 58 and the Stappia 59, with the majority of
these corresponding to "hypothetical" orthologous groups.
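The holobiont-specific set is the difference between the orthologous groups found in the isolates and the pooled groups of all reference genomes. A sketch with invented identifiers follows; the function name and data layout are hypothetical.

```python
def holobiont_specific(isolate_ogs, reference_ogs):
    """Map each OG absent from every reference to the isolates carrying it.

    isolate_ogs, reference_ogs: dict of genome name -> set of COG/NOG ids.
    """
    reference_pool = set().union(*reference_ogs.values())
    specific = {}
    for genome, ogs in isolate_ogs.items():
        for og in ogs - reference_pool:
            specific.setdefault(og, set()).add(genome)
    return specific


# Toy example with made-up identifiers.
isolates = {"B4": {"COG1", "NOG9"}, "Gab": {"NOG9", "NOG8"}}
references = {"ref1": {"COG1"}, "ref2": {"COG2"}}
specific = holobiont_specific(isolates, references)
```

Recording which isolates carry each group also gives the number of genomes per holobiont-specific group directly, which is the quantity summarized in Figure 4.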
[Figure 3 image: Venn diagrams of shared COG/NOG groups for (a) L. thiooxidans FCMA and F1, (b) A. tumefaciens D5, D8 and Is, and (c) the P. oleovorans strains; most region counts are not recoverable from the source.]
Figure 3 Shared Gene Contents of Nematostella Associated Isolates. Populations are assessed individually: a =
Limnobacter Population, b = Agrobacteria Population, c = Pseudomonas Population. Shared gene contents were
determined by comparing the presence or absence of particular NOG/COG orthologous groups in genomes
annotated by one-way best BLAST hit to the COG/NOG subset of eggNOG Database version 3.0. Bolded numbers
represent those genes in the operational core.
However, some holobiont-specific orthologous groups did have functions
that may reflect ecological significance, and they were recovered from two or more
members of a particular strain group (Figure 4). For instance, three of the isolated
Pseudomonas strains have both a unique efflux transporter and an antibiotic synthesis
monooxygenase. If antibiotic production by members of the anemone
microbiome is necessary for establishment of a community, these genes could be
important factors for association. Further, two of the isolated Pseudomonas strains
have a plasmid maintenance protein; plasmids are often associated with adaptive
functions such as antibiotic resistance and nutrient utilization, so this too could be a
potential target of interest for further study of interactions within the holobiont.
The Agrobacteria and Limnobacter populations also have unique orthologous
groups of interest present in multiple members. All three of the Agrobacteria have
an isochorismate synthase, a branch point enzyme that can lead to the
creation of several types of siderophores (Figure 4); its presence might reflect a
unique way this population scavenges metal ions from its surroundings. Adding
further evidence for the importance of metals within this population, all three strains
have a mercuric transport protein and two of the three have a mercuric
periplasmic transport protein, suggesting the importance of having a system for
efflux of toxic mercury ions. Perhaps most interesting of all, both Limnobacters
have a holobiont-specific ABC-type multidrug transport system, which again may
reflect the relevance of antibiotics as a structuring agent of the anemone
microbiome.
Finally, we tested whether any of these holobiont-specific orthologous
groups were shared among the four populations, as this would provide strong
evidence of a common holobiont-structuring factor to which the various bacterial
populations were adapting similarly. There are three such sharing events (data not
shown). However, each is between one Pseudomonas strain and one
Agrobacteria strain, and all three of the orthologous groups are hypothetical (COG
Category S), revealing little useful ecological information about the anemone
microbiome.
[Figure 4 image: grid of holobiont-specific OGs arranged by population (vertical axis) and by whether the ortholog was found in 2, 3 or 4 genomes (horizontal axis); inset COG/NOG descriptions are not recoverable from the source.]
Figure 4 Holobiont-specific orthologous groups found within multiple members of the Nematostella
isolated populations, not found in any tested closely related reference strain. Horizontal axis
represents # of genomes of a population a particular orthologous group is found within. Vertical axis
represents the different populations. Inset tables within colored boxes are COG/NOG functional
descriptions. Additional Information of Holobiont-Specific Genes can be found in Appendix II.
Exploring Phylogenetic Origins of Core and Flexible Genomes of N. vectensis
Isolates
The comparison of shared and unshared orthologous groups within the four
populations of bacteria isolated from N. vectensis permitted their assortment
into "core" and "flexible" groups and raised new questions about the phylogenetic origins
and functional characteristics of these categories. Core orthologous
groups were operationally defined as those present in three or more
Pseudomonas strains or in two or more Agrobacteria or
Limnobacter strains; flexible orthologous groups were the remaining set that did not
meet those criteria. The Stappia strain was left out of the core/flexible analysis as it was
the only sequenced member of its population.
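The operational definition can be written as a per-population threshold on the number of genomes carrying each orthologous group. In the sketch below the thresholds come from the text, while the function name, data layout and example identifiers are hypothetical.

```python
from collections import Counter

# Minimum number of genomes in which an OG must occur to count as core.
CORE_THRESHOLD = {"Pseudomonas": 3, "Agrobacterium": 2, "Limnobacter": 2}


def split_core_flexible(population, genome_ogs):
    """genome_ogs: dict of strain name -> set of COG/NOG ids for one population."""
    counts = Counter(og for ogs in genome_ogs.values() for og in ogs)
    threshold = CORE_THRESHOLD[population]
    core = {og for og, n in counts.items() if n >= threshold}
    flexible = set(counts) - core
    return core, flexible


# Toy Agrobacterium population with made-up identifiers.
core, flexible = split_core_flexible(
    "Agrobacterium",
    {"D5": {"COG1", "COG2"}, "D8": {"COG1"}, "Is": {"COG3"}},
)
```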
It was hypothesized that, since they were present in fewer members of the
populations, flexible orthologous groups present in the isolates likely had much
more diverse phylogenetic origins than those of the core. This idea is worth
testing because diverse phylogenetic origins of orthologous groups, and the identities
of those origins, could reveal insights into the community and structure of
the anemone holobiont, and differences between the origins of core and flexible
genes could reveal the magnitude of these structuring effects.
To examine the phylogenetic origins of the core and flexible
orthologous groups, the genes annotated as particular orthologous groups were
classified by performing a BLASTp against the NCBI nr Database (Altschul et al.
1997). The BLASTp results were imported into MEGAN (MEtaGenome ANalyzer),
which binned each gene taxonomically to the closest known sequence using the Lowest
Common Ancestor algorithm (Figures 5 and 6) (Huson et al. 2007).
Examining the core genome (Figure 5), it is evident that the majority of the
orthologous groups bin within the expected class of bacteria (i.e. genes of the
Pseudomonas bin into Gammaproteobacteria, those of the Agrobacteria bin into
Alphaproteobacteria and Limnobacter orthologous groups bin into
Betaproteobacteria). The small exception is 17 orthologous groups that
classify within the Delta and Epsilon subdivisions of Proteobacteria. Although the expected
binning is still present to a large extent, the flexible genomes of the isolates have a
much higher tendency to have orthologous groups binned in other classes of
Proteobacteria (Figure 6). This may represent either substantial lateral gene
transfer among the Proteobacteria or errors in taxonomic binning resulting from the
limited sequences available within the NCBI nr Database.
Diverse bacterial phyla are present in both the core and flexible orthologous groups (OGs);
however, their presence is confined to particular isolates. For
instance, L. thiooxidans FCMA has two flexible OGs classified as Cyanobacteria and A.
tumefaciens IS has three core OGs classified as Actinobacteria. These
results should be viewed with skepticism given the limited number of genomes available in the nr
database and the fact that, although they were core OGs, only one of the three
Agrobacteria strains had these particular OGs classified as being of Actinobacterial origin.
Also of note are four OGs from P. oleovorans B4 that were classified as
Eukaryota; upon uncollapsing the MEGAN hierarchy, these OGs were more
specifically binned as being of Nematostella origin. This has significant
implications in terms of host-symbiont horizontal gene transfer or host genome
DNA contamination and will be explored further in a section below.
Finally, in almost all of the Pseudomonas and Agrobacteria strains, OGs in
either the core or flexible groups were classified as viral in origin. This may imply
that these isolates have similar viral or mobile genetic elements; this too will be
explored further in another section below.
[Figure 5 image: MEGAN Visualization of Isolate Core Genomes. Key: L. thiooxidans F1, L. thiooxidans FCMA, P. oleovorans Gab, P. oleovorans B4, P. oleovorans 47A01, P. oleovorans Isu, A. tumefaciens Is, A. tumefaciens D8, A. tumefaciens D5. Labeled nodes: Root, Cellular Organisms, Bacteria, Proteobacteria, Alphaproteobacteria, Betaproteobacteria, Delta/Epsilon Proteobacteria Subdivisions, Unclassified Proteobacteria, Actinobacteria, Environmental Samples, Eukaryota, Virus.]
Figure 5 MEGAN Visualization of N. vectensis Isolate Core Genomes. Genes fulfilling the definition
of operational core for each population were compared with BLASTp to the NCBI nr Database
(September 2011) and those results were imported into MEGAN (Huson et al. 2007). MEGAN binned
the genes at levels of taxonomy using the LCA algorithm with parameters of at least 3 significant hits
per clade and minimum bit scores of 50 for significant hits. The size of the pie graph at each node is
proportional to the total number of genes binned at or below that node from every genome (i.e. the
size of the root node is proportional to the total number of genes used in the analysis from all
genomes). The colors of the pie graph sections correspond to the colors assigned to the symbionts in
the key found in the top left, and the size of each pie graph section is again proportional to the total
number of genes binned at or below that level of taxonomy from the specified isolate's core
genome.
[Figure 6 image: MEGAN Visualization of Isolate Flexible Genome. Key: L. thiooxidans F1, L. thiooxidans FCMA, P. oleovorans Gab, P. oleovorans B4, P. oleovorans 47A01, P. oleovorans Isu, A. tumefaciens Is, A. tumefaciens D8, A. tumefaciens D5. Labeled nodes: Cellular Organisms, Bacteria, Proteobacteria, Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Cyanobacteria, Virus.]
Figure 6 MEGAN Visualization of N. vectensis Isolate Flexible Genomes. Genes falling outside the definition
of operational core (the flexible genome) for each population were compared with BLASTp to the NCBI nr Database
(September 2011) and those results were imported into MEGAN. MEGAN binned the genes at levels
of taxonomy using the LCA algorithm with parameters of at least 3 significant hits per clade and
minimum bit scores of 50 for significant hits. For interpretation of the figure please see the description
below Figure 5.
Surprisingly, in the analysis above four genes from the Nematostella
associated strains, specifically those of P. oleovorans B4, were annotated by
MEGAN as deriving from the anemone genome. While horizontal gene transfer of
the shikimic acid pathway from bacteria to the N. vectensis genome has previously
been suggested (Starcevic et al. 2008), it may also be possible that anemone-associated
bacteria have incorporated anemone sequences. However, the most parsimonious
explanation is that the Nematostella genome is contaminated with Pseudomonas
DNA incorrectly annotated as Cnidarian, which gives these genes a false
phylogenetic origin. To investigate these possibilities a nucleotide BLAST was
performed between the assembled P. oleovorans genomes and the scaffolds of the
complete N. vectensis genome (Putnam et al. 2007). Using a stringent expected
value cutoff of 1e-30 for accepting a match, Pseudomonas DNA was found in 101
scaffolds of the anemone genome. Those matching scaffolds were examined in
terms of GC content, ambiguous base composition and size (Figure 7) in order to
assess the likelihood of HGT.
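The three scaffold properties examined here are simple to compute from sequence. The sketch below is one way to do so; it assumes GC% is calculated over unambiguous bases only, which the text does not specify, and the function name is hypothetical.

```python
def scaffold_stats(seq):
    """Length, GC% (over called bases) and % ambiguous bases of a scaffold."""
    seq = seq.upper()
    ambiguous = sum(1 for base in seq if base not in "ACGT")
    called = len(seq) - ambiguous
    gc = seq.count("G") + seq.count("C")
    return {
        "length": len(seq),
        "gc_percent": 100 * gc / called if called else 0.0,
        "ambiguous_percent": 100 * ambiguous / len(seq) if seq else 0.0,
    }


stats = scaffold_stats("GGCCAATTNN")  # toy 10-nt "scaffold"
```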
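[Figure 7 image: (a) histogram of GC content for all N. vectensis genome scaffolds; (b) histogram of GC content for scaffolds containing Pseudomonas DNA; axis values are not recoverable from the source.]
(c)
                               All scaffolds    Scaffolds containing Pseudomonas DNA
Total number of nucleotides    3.556 x 10^8     1.046 x 10^6
Average size of scaffold       33,008           10,361
GC%                            46.00%           60.10%
Percentage ambiguous bases     16.60%           77.80%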
Figure 7: Analysis of N. vectensis Genome Scaffolds containing and not containing Pseudomonas DNA.
(a.) Histogram of all Nematostella genome scaffolds and their average GC% (b.) Histogram of
Nematostella scaffolds containing Pseudomonas DNA (c.) Table comparing salient details of all N.
vectensis genome scaffolds and those containing Pseudomonas DNA as determined by BLASTn
analysis with threshold 1e-30.
Several observations indicate that the N. vectensis genome is contaminated with
bacterial DNA, including Pseudomonas DNA. The Nematostella scaffolds containing
Pseudomonas DNA are on average about 20,000 base pairs shorter than the average
scaffold, contain almost 80% ambiguous nucleotides and have an average GC% that
closely matches that of the sequenced Pseudomonas symbionts (Table 2) and is a
full 14 percentage points above the average GC% of the anemone genome. Given
this shortness, high percentage of ambiguous bases and skewed GC%, the genes
identified as Nematostella within the isolates are likely not the result of HGT but
the result of bacterial contamination in the Nematostella genome. Further tests
with the genomes of the Agrobacteria, Limnobacter and Stappia revealed no
evidence of lateral gene transfer with the anemone but additional evidence of
bacterial contamination in the anemone genome. However, unlike the Pseudomonas,
which matched 101 scaffolds, the Agrobacteria, Limnobacter and Stappia
populations matched only 6, 4 and 3 Nematostella scaffolds respectively; all
Nematostella scaffolds identified as bacterial are indicated in Appendix III.
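The scaffold comparison summarized in Figure 7 (size, GC% and ambiguous bases) can
be sketched in a few lines of Python. This is only an illustration, not the code
used in this study: the function names are hypothetical, and two choices are
assumptions, namely that GC% is computed over unambiguous bases only and that
values are pooled over all scaffolds rather than averaged per scaffold.

```python
def scaffold_stats(seq):
    """Return (length, GC count, unambiguous count, ambiguous count) for one scaffold."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    called = gc + at                  # bases that are A, C, G or T
    ambiguous = len(seq) - called     # everything else (e.g. N) counts as ambiguous
    return len(seq), gc, called, ambiguous


def summarize(scaffolds):
    """Summarize a dict of {scaffold_name: sequence} (assumed input layout)."""
    if not scaffolds:
        raise ValueError("no scaffolds given")
    stats = [scaffold_stats(seq) for seq in scaffolds.values()]
    total = sum(s[0] for s in stats)
    gc = sum(s[1] for s in stats)
    called = sum(s[2] for s in stats)
    ambiguous = sum(s[3] for s in stats)
    return {
        "total_nucleotides": total,
        "average_scaffold_size": total / len(stats),
        # Assumption: GC% is taken over unambiguous bases only.
        "gc_percent": 100.0 * gc / called if called else 0.0,
        "ambiguous_percent": 100.0 * ambiguous / total if total else 0.0,
    }


if __name__ == "__main__":
    toy = {"scaffold_1": "GGCCATNNNN", "scaffold_2": "ATAT"}
    print(summarize(toy))
```

Running the same summary once on all scaffolds and once on only the scaffolds with
bacterial BLASTn matches would give the two columns compared in the text.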
A final screen for potential horizontal gene transfer in the holobiont was done
among the bacterial isolates themselves, to examine the hypothesis that these
cultured populations interacted with a common gene pool in the holobiont. The
genes of every pair of genomes from different populations were compared with
BLASTn. No gene pairs from different populations were found with 95% or greater
nucleotide identity, indicating that among the four populations there has not
been any recent horizontal transfer of protein coding genes.
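A screen of this kind could be applied to BLASTn results as in the sketch below,
which assumes the standard 12-column tabular output (BLAST+ "-outfmt 6": query ID,
subject ID, percent identity, and so on). The gene ID convention
("population_isolate|gene") and the helper names are hypothetical, and no
alignment-length requirement is applied because none is stated in the text.

```python
def population_of(gene_id):
    """Hypothetical ID convention: 'population_isolate|gene', e.g. 'pseudo_B4|g1'."""
    return gene_id.split("_", 1)[0]


def cross_population_hits(blast_lines, min_identity=95.0):
    """Return (query, subject, identity) for hits between different populations
    at or above the identity threshold."""
    hits = []
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject, identity = fields[0], fields[1], float(fields[2])
        if population_of(query) == population_of(subject):
            continue  # same population: not evidence of inter-population transfer
        if identity >= min_identity:
            hits.append((query, subject, identity))
    return hits


if __name__ == "__main__":
    example = [
        "pseudo_B4|g1\tagro_A1|g7\t96.50\t900\t30\t1\t1\t900\t1\t900\t0.0\t1500",
        "pseudo_B4|g2\tpseudo_B5|g2\t99.00\t800\t8\t0\t1\t800\t1\t800\t0.0\t1400",
        "limno_L1|g3\tstappia_S1|g4\t82.00\t700\t126\t0\t1\t700\t1\t700\t1e-150\t540",
    ]
    print(cross_population_hits(example))
```

In the toy input only the first line passes both conditions; in the analysis
reported here no gene pair did.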
Shared Phage and Prophage Element Analysis
Since MEGAN visualization of gene annotations revealed viral origins for
some genes of the Pseudomonas and Agrobacterial isolates (Figures 5 and 6), it was
hypothesized that the bacterial populations might share the same phage or other
mobile genetic elements. To test this, the proteomes of each isolate were BLASTed
against the PHAST phage and prophage database (Zhou et al. 2011) and the top hits
for each gene were compared among the N. vectensis associates.
[Figure 8: stacked bar chart; plot content is not recoverable from the extracted
text. Vertical axis runs from 50 to 100. Columns: Alphaproteobacteria,
Gammaproteobacteria, Betaproteobacteria, All Isolates. Legible legend entries:
2 Genomes, 3 Genomes, >3 Genomes.]
Figure 8: Shared Phage Elements of N. vectensis Isolates within and among Populations. Best
BLASTP hits for genes that significantly matched a sequence in the PHAST phage database
(significant match = E-value < 1e-5) were compared within and among all microbial populations
isolated to determine if any were the same phage sequence. The three columns to the left represent
comparisons within populations of N. vectensis isolates; colors represent finding a best BLASTP hit to
the same phage sequence in the database in one to all of the isolates within that population. The
right-most column represents a comparison among all ten strains.
Comparisons of the best BLAST hits to the PHAST phage database (Figure 8)
reveal that a large proportion of mobile genetic elements are found in only a single
isolate, suggesting dynamic evolution of the strains. Two matches were found
among the populations of bacteria, which might suggest lateral transfer of phage
elements within the holobiont. Upon manual inspection, however, they were
determined to be likely of bacterial rather than phage origin, as they were
annotated as an ATP-dependent Clp protease ATP-binding subunit and a TldD
protease, both highly conserved bacterial proteins. Manual curation of the rest of
the matches did, however, reveal phage-specific genes present in each of the four
populations, suggesting that although phage elements may not be shared among the
populations, they play an important role in all of the N. vectensis associated
isolates.
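The comparison behind Figure 8 amounts to asking, for each phage database
sequence that is the best hit of some gene, how many isolates have a gene with
that same best hit. A minimal sketch of that tally follows; the input layout
(isolate name mapped to {gene: best-hit phage sequence}) and all identifiers are
hypothetical, and the best hits are assumed to have already passed the E-value
filter.

```python
from collections import Counter, defaultdict


def sharing_profile(best_hits):
    """Map 'number of isolates sharing a phage sequence' to 'number of such sequences'.

    best_hits: {isolate: {gene_id: phage_sequence_id}} (assumed layout).
    """
    isolates_per_sequence = defaultdict(set)
    for isolate, hits in best_hits.items():
        for phage_sequence in hits.values():
            isolates_per_sequence[phage_sequence].add(isolate)
    return Counter(len(isolates) for isolates in isolates_per_sequence.values())


if __name__ == "__main__":
    toy = {
        "isolate_1": {"gene_a": "phage_seq_X", "gene_b": "phage_seq_Y"},
        "isolate_2": {"gene_c": "phage_seq_X"},
        "isolate_3": {"gene_d": "phage_seq_Z"},
    }
    profile = sharing_profile(toy)
    for n_isolates in sorted(profile):
        print(n_isolates, "isolate(s):", profile[n_isolates], "phage sequence(s)")
```

Restricting the input to the isolates of one population gives a within-population
column; passing all ten isolates gives the among-population comparison.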
Conclusion
This study focused on 10 sequenced isolates from four populations of
bacteria (P. oleovorans, A. tumefaciens, L. thiooxidans and S. stellulata) known to be
associated with the sea anemone Nematostella vectensis in order to gain an
understanding of the ecology of the anemone holobiont. A first look at the relative
diversity within the populations showed that the strains of each population are
likely not clonal but quite diverse, as evidenced by their ANI and apparent
strain-specific OGs, implying that the populations are capable of adapting to
multiple environments including the anemone host.
This diversity did not, however, preclude an understanding of potential
ecological structuring factors of the holobiont. By comparing the shared or "core"
orthologs of each population to an abundance of sequenced relatives, holobiont
specific orthologs were identified for each population. While none of these
holobiont specific orthologous groups were shared among the populations, some
common functional themes emerged which may reflect adaptation to the holobiont
environment. For instance, three Pseudomonas isolates had a holobiont specific
antibiotic biosynthesis monooxygenase and a holobiont specific efflux transporter,
while both Limnobacters had a holobiont specific multidrug resistance efflux pump.
This shared holobiont specific function of efflux across the Limnobacter and
Pseudomonas populations, as well as the antibiotic biosynthesis gene in the
Pseudomonads, may indicate the relevance of antibiotic production and resistance
as a microbial survival and adaptation factor within Nematostella.
Although there are these hints of common functional themes, there are few
strong signals from the holobiont specific genes indicating adaptation of the
populations to similar ecological factors. This is likely because, as indicated by
the ease with which they were cultured and their diverse genomic repertoires, the
10 isolates used for this study are adaptable to many environments and not just
the anemone. Better results would likely be obtained by applying these same
approaches to obligate, less free-living members of the anemone holobiont, which
may be strains that cannot be cultured and can only be studied through single-cell
genomics.
While MEGAN analysis of "flexible" and "core" orthologs hinted at the
possible horizontal transfer of genes between the microbes and the anemone host,
further inspection revealed that this signal is the result of heavy Pseudomonas
contamination within the genome sequence of Nematostella itself (Putnam et al.
2007). However, the MEGAN analysis, along with a BLAST comparison to the PHAST
phage and prophage element database, revealed the relevance of phage and mobile
elements to the evolution of each population and the complexities yet to be
illuminated in the anemone holobiont.
Nematostella vectensis Metatranscriptome Analysis
With the overall goal of my project being to understand the microbial ecology of
the anemone holobiont, I sought a general sense of what was happening at
the transcriptional level for both the anemone host and the microbial associates.
This was done by high-throughput metatranscriptomic sequencing of lab-raised
anemones. Although primarily descriptive, this analysis allows for the creation of
lab-testable hypotheses of associate adaptation within the host. Further, the
detection of transcripts from the cultured bacteria, particularly those of the
Limnobacter population, provides greater evidence of the relevance of the
bacterial culturing and comparative genomics work described in previous sections.
Metatranscriptome Preparation
Lab-raised N. vectensis were prepared for metatranscriptomic analysis by a
postdoctoral associate in the Thompson laboratory, using an initial screen of
several different methods designed to deplete rRNA and thus enrich for bacterial
mRNA. This initial screen of ribosomal depletion methods was carried out to guide
selection of an rRNA depletion strategy for future work (pers. comm. Samodha
Fernando). Ribosomal RNA depletion was done because the expected amount of
bacterial mRNA was small compared to the mRNAs of the host and the rRNAs of both
host and associates that would be sequenced (Stewart et al., 2010). A summary of
the rationale and biological mechanism of each bacterial mRNA enrichment
technique can be found in Table 1. Five anemone metatranscriptomes processed with
combinations of bacterial mRNA enrichment techniques and one unprocessed control
were sequenced to a target depth of 1 million reads using an Illumina GA-II. The
metatranscriptome datasets that are the basis of the analysis presented in this
section consisted of FastQ files including both sequence data and quality scores.
Table 1 Bacterial mRNA enrichment techniques and their principles
Poly(A)purist - A kit that relies on oligo(dT) cellulose to preferentially bind the poly(A) tails of
eukaryotic mRNA; this is used to remove unwanted eukaryotic mRNAs from our samples.
RNaseH - Endonuclease that specifically degrades RNA in RNA:DNA hybrids; DNA oligos that bind to
specific conserved regions of rRNA are added with it to selectively remove rRNAs.
mRNAOnly - A kit that relies on an exonuclease that selectively degrades RNAs with 5'-
monophosphates; rRNAs have this feature while mRNAs do not, so rRNAs are selectively degraded.
MICROBEnrich/MICROBExpress - A pair of kits that rely on a novel capture hybridization protocol to
selectively degrade eukaryotic RNA and bacterial rRNA respectively.
Duplex-Specific Nuclease - A nuclease that specifically degrades dsDNA and DNA in DNA:RNA hybrids;
[remainder of entry not recoverable from the extracted text]
Evaluation of rRNA Depletion Techniques
Table 2 Filtering results of N. vectensis metatranscriptome sample reads for ribosomal RNAs and
other contaminants
Columns: Initial Read Pairs; Reads Filtered as Small Subunit rRNA; Reads Filtered as Large Subunit
rRNA; Reads Filtered as 5S or ITS rRNA; Reads Filtered as Tandem Repeats; Reads Filtered as Adaptor
Contaminants; Read Pair Units* Remaining After All Filtering Steps (% Original)
[Table values are not recoverable from the extracted text.]
Abbreviations: ITS, Internal Transcribed Spacer. *A read pair unit can be one of three things: 1) A
read pair whose ends have both made it through filtering. 2) A pair of reads merged into one read
because of shared overlapping sequence. 3) A pair of reads clipped to one read because of adaptor
contamination.

[Figure 1: stacked bar chart, vertical axis 0% to 100%, one bar per sample (five processed samples
and the unprocessed total RNA control). Legend: Reads Remaining After Filtering; Adaptor
Contaminated Reads; Tandem Repeat Reads; 5S or ITS rRNA; SSU rRNA Reads; LSU rRNA Reads.]
Figure 1 Ribosomal and Contaminant Read Breakdown of Initial Read Pairs. Initial read pairs were
filtered against the SILVA large and small subunit rRNA database and a hand-made database of N.
vectensis 5S and Internal Transcribed Spacer rRNA (database match = BLASTn bit score > 50.0) as
well as filtered for the presence of tandem repeats and adaptor contamination. Remaining reads
(seen above in orange) were to be subject to functional and taxonomic analysis.

For each of these samples, the raw sequencing reads were filtered for internal
adaptor contamination (as described in section Illumina Adaptor Artifact Detection),
tandem repeats and the presence of rRNAs. While all samples started with the
same amount of initial total RNA, the final yield of Illumina sequence reads
varied by less than a factor of 3 (0.6 to 1.6 million read pairs), and this variability
increased to span more than three orders of magnitude after removal of ribosomal
contamination and QC filtering (i.e. 117 to 2.47 x 10^5 reads), suggesting very different
yields for the different processes (Table 2). In brief, the sample processed with the
MICROBEnrich/MICROBExpress kits performed best in terms of the putative mRNA
remaining relative to the initial number of reads: 37.7% of its original
reads remained (246,506 out of 653,926 reads) as putative mRNAs, compared to
8.24% for the unprocessed control (90,721 out of 1,100,418 reads) (Table 2). This
is consistent with the success of the MICROBEnrich/MICROBExpress kits in the
creation of previous metatranscriptomes (Leimena et al., 2013). An unexpected
result was that the samples processed with RNaseH did worse than the
unprocessed control, with a 2-10 fold reduction in the fraction of putative
mRNAs remaining after filtering (Table 2). The reason for this is unknown;
however, too strong a conclusion should not be drawn because of the low number
of samples.
Performing worst of all, the Duplex-Specific Nuclease sample retained only
about 0.1% of the original reads (117 out of 107,964 reads), with the bulk of the
removed reads (95,649 out of 107,964 reads) being classified as large subunit rRNAs
(Figure 1 and Table 2). This was quite unexpected, as anecdotal evidence from
Prochlorococcus transcriptome preparations had shown it to be an extremely
effective method of rRNA depletion. Because of the extremely low number of
remaining reads in the Duplex-Specific Nuclease treated sample, it was
excluded from further analysis.
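The retained fractions quoted in this section follow directly from the read counts
given in the text, and can be rechecked with the short Python sketch below. Only
the counts come from this section; the variable and function names are
illustrative.

```python
# (reads remaining after all filtering steps, initial reads), as quoted in the text
samples = {
    "MICROBEnrich/MICROBExpress": (246506, 653926),
    "Unprocessed control": (90721, 1100418),
    "Duplex-Specific Nuclease": (117, 107964),
}


def percent_retained(remaining, initial):
    """Percentage of the initial reads that survive filtering as putative mRNA."""
    return 100.0 * remaining / initial


if __name__ == "__main__":
    for name, (remaining, initial) in samples.items():
        # The Duplex-Specific Nuclease sample comes out near 0.11%,
        # i.e. a fraction of about 0.001 of its initial reads.
        print(f"{name}: {percent_retained(remaining, initial):.2f}% retained")
```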
MEGAN Taxonomic Binning and Efficiency of Bacterial mRNA Enrichment
After removal of ribosomal RNA, low complexity sequences and Illumina adaptors,
the remaining "filtered" reads from the metatranscriptome samples were compared
against the NCBI non-redundant (NR) protein database using BLASTx (Altschul et al.
1997), and those results were imported into the software package MEGAN (Huson
et al. 2007) in order to bin the reads taxonomically. The reads were first separated
into the broad phylogenetic categories of Bacteria, Archaea, Cnidaria, non-Cnidarian
Eukaryota and Virus. While each of the samples contained reads from all five of
these broad groups, the majority of reads that could be categorized (17.69-43.16%)
were classified as "Cnidarian" (Table 3 and Figure 2), implying that the most
abundant type of mRNA present was that of the host anemone.
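The idea behind lowest-common-ancestor binning can be illustrated with a much
simplified Python sketch. It is not MEGAN's implementation: it only keeps hits
above a bit score threshold (40.0, the value used for significant NR hits in this
analysis) and returns the deepest taxonomic node shared by all of them. The
lineage tuples and function names are hypothetical, and MEGAN's additional
parameters are ignored.

```python
def lowest_common_ancestor(lineages):
    """Longest shared prefix of a list of root-to-leaf lineages."""
    if not lineages:
        return ()
    shared = list(lineages[0])
    for lineage in lineages[1:]:
        depth = 0
        for a, b in zip(shared, lineage):
            if a != b:
                break
            depth += 1
        shared = shared[:depth]
    return tuple(shared)


def bin_read(hits, min_bitscore=40.0):
    """hits: list of (lineage, bitscore). Returns the LCA lineage, or None if no
    hit exceeds the bit score threshold."""
    significant = [lineage for lineage, bitscore in hits if bitscore > min_bitscore]
    if not significant:
        return None
    return lowest_common_ancestor(significant)


if __name__ == "__main__":
    gamma = ("cellular organisms", "Bacteria", "Proteobacteria", "Gammaproteobacteria")
    alpha = ("cellular organisms", "Bacteria", "Proteobacteria", "Alphaproteobacteria")
    cnidaria = ("cellular organisms", "Eukaryota", "Metazoa", "Cnidaria")
    print(bin_read([(gamma, 80.0), (alpha, 60.0)]))     # binned at Proteobacteria
    print(bin_read([(gamma, 80.0), (cnidaria, 75.0)]))  # only the shared top node remains
    print(bin_read([(gamma, 35.0)]))                    # no significant hit
```

A read whose significant hits span several domains of life can only be placed at
the node those domains share, which is why such reads carry little taxonomic
information.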
MEGAN also classified some of the reads into the relevant but less
informative categories of "Root and Cellular Organisms", "Unassigned Reads" and
"Low Complexity" (Table 3). Reads binned into "Root and Cellular Organisms" have
significant matches across so many domains of life that the best MEGAN can do
is classify them as potentially being any kind of cellular organism. Reads classified
as "Unassigned" have significant hits to sequences in the NCBI database; however,
the number of these significant matches does not meet the criteria of MEGAN's
algorithm for assigning a taxonomy. Finally, reads considered "Low Complexity"
are deemed to have too much of one specific nucleotide; MEGAN judges such a
read more likely to be a sequencing artifact than a relevant read and categorizes
it as such.

Table 3 MEGAN Taxonomic Analysis of Metatranscriptome Diversity
[Table values are not recoverable from the extracted text. Legible column headings include Read Pair
Units, No significant hit, Root and Cellular Organisms, MEGAN Unassigned, Low Complexity, Other
Eukaryote and Cnidarian.]
A read pair unit can be one of three things: 1) A read pair whose ends have both made it through
filtering. 2) A pair of reads merged into one read because of shared overlapping sequence. 3) A pair
of reads clipped to one read because of adaptor contamination.

[Figure 2: stacked bar chart, one bar per sample; plot content is not recoverable from the extracted
text. Legend: Viral Binned Reads; Archaeal Binned Reads; Bacterial Binned Reads; Other Eukaryotic
Binned Reads; Cnidarian Binned Reads; Low Complexity Reads; MEGAN Unassigned Reads; Root and
Cellular Organisms Binned Reads; No significant NR Database Hit.]
Figure 2 MEGAN Taxonomic Breakdown of Filtered Reads. Filtered Reads were compared using
BLASTx to the NCBI non-redundant protein database; the results were imported into MEGAN and
binned taxonomically according to the lowest-common ancestor algorithm. The categories above
describe the taxonomic groups within which they have been sorted. In particular, "No significant NR
database hit" = Read pairs having no hit to the NR database of bitscore > 40.0, "Low Complexity
Reads" = Reads determined by MEGAN to be suspicious based on low nucleotide diversity, "MEGAN
Unassigned Reads" = Reads with bit score hits to NR > 40.0 but without enough information to be
classified according to the specified parameters of the lowest common ancestor algorithm, "Root and
Cellular Organisms Binned Reads" = Those reads with BLAST results that can only place them
taxonomically at the root of all known sequences or of all known cellular organisms.

[Figure 3: MEGAN taxonomic tree with pie graphs at each node; image not recoverable from the
extracted text. Sample legend: Poly(A)purist + RNaseH; Poly(A)purist + mRNAonly; Poly(A)purist +
RNaseH + mRNAonly; Poly(A)purist + MICROBEnrich + MICROBExpress + mRNAonly; Unprocessed RNA.]
Figure 3 MEGAN Analysis of N. vectensis Metatranscriptome Diversity: BLASTx results of filtered
reads against the NCBI NR database were imported into the MEGAN software package and binned
taxonomically at the most specific node possible on a universal phylogenetic tree. The radius of the
pie graphs is non-linearly proportional to the amount of reads summarized at and below the current
node (see Table 3 for distribution between Eukaryote, Cnidarian and Bacterial reads). The values
within the pie graphs for each sample are total reads binned at that level normalized to the total
number of reads with significant BLASTx hits to the NR database (Bit score > 40.0) in that sample.
Bacterial results were uncollapsed to the Class Level if possible (No reads in Bacteroidetes were able
to be placed at the Class level; reads from Cyanobacteria could only be classified into Orders). The
remaining groups were left at the domain level, and nodes for "Low Complexity", "Not Assigned" and
"No Hits" were removed for image clarity.
While the MEGAN software was able to bin some of the reads, the largest
fraction of filtered reads had no significant BLASTx hit to the NCBI non-redundant
database (Figure 2); the fraction of these non-annotatable filtered reads ranged
from 44.39% (109,422 out of 246,506 reads) to 73.81% (9,822 out of 13,307 reads)
across our samples (Table 3). This unidentifiable group of putative mRNAs may indicate
the extent of the biological novelty of this anemone holobiont, in that these reads
may be transcripts from organisms for which neither the organism itself nor a
phylogenetically close relative has yet been sequenced. However, it cannot be
discounted that they could be a number of other things, including non-coding RNAs
or unknown and currently undetectable sequencing errors.
In terms of the success of the different processing steps used to
enrich the metatranscriptome samples for bacterial mRNA, different samples were
most successful depending on whether relative or absolute numbers of bacterial
reads are considered. In absolute terms, the "mRNAonly +
MICROBEnrich/MICROBExpress" treatment, which yielded the highest number of filtered
sequences, also yielded the highest number of bacterial reads, with roughly five-fold
more reads than the unprocessed control (i.e. 728 bacterial reads representing
0.30% of filtered reads). In relative terms, all of the treatments (i.e. rRNA depletion
strategies) had roughly 2-fold or higher proportions of bacterial mRNA relative to the
no-treatment control, which had a bacterial proportion of 0.15% (138 out of 90,721
reads). The highest efficiency of bacterial mRNA recovery was achieved by the "RNaseH +
mRNAonly" treatment, which yielded 0.69% bacterial reads (472 out of 68,143
reads) (Table 3), about 4.6 times the proportion in the no-treatment control. It
should be noted that without replication the differences observed cannot be
interpreted as statistically significant. However, the data from this initial screen
supported adoption of the mRNA-only + Microbe Express/Enrich protocol for
preparation of metatranscriptomes in diverse samples (e.g. Penn et al. in prep,
Timberlake et al. in prep), yielding similar proportions of sequences and successful
enrichment of bacterial mRNAs among complex targets (J. Thompson, pers. comm.).
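The distinction between absolute and relative enrichment can be made concrete
with the read counts quoted above. The Python sketch below recomputes the
bacterial proportions and the fold changes against the unprocessed control; only
the counts are taken from the text, and the names are illustrative.

```python
# (bacterial reads, filtered reads) per sample, as quoted in the text
counts = {
    "mRNAonly + MICROBEnrich/MICROBExpress": (728, 246506),
    "RNaseH + mRNAonly": (472, 68143),
    "Unprocessed control": (138, 90721),
}


def bacterial_percent(bacterial, filtered):
    """Bacterial reads as a percentage of all filtered reads in a sample."""
    return 100.0 * bacterial / filtered


def fold_over_control(sample, control="Unprocessed control"):
    """Return (absolute fold change, relative fold change) versus the control."""
    bacterial, filtered = counts[sample]
    control_bacterial, control_filtered = counts[control]
    absolute = bacterial / control_bacterial
    relative = bacterial_percent(bacterial, filtered) / bacterial_percent(
        control_bacterial, control_filtered
    )
    return absolute, relative


if __name__ == "__main__":
    for name in counts:
        absolute, relative = fold_over_control(name)
        print(f"{name}: {bacterial_percent(*counts[name]):.2f}% bacterial, "
              f"{absolute:.2f}x absolute, {relative:.2f}x relative")
```

The two treatments rank differently under the two measures, which is the point
made in the paragraph above.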
Descriptive Analysis of Anemone Transcriptome
The reads within each of the five metatranscriptome samples (Table 3) classified as
"Cnidarian" were pooled and annotated using BLASTx comparisons against the
eggNOG version 3 database of opiNOGs (Powell et al. 2011), a set of orthologous
genes of sequenced Opisthokonts (Metazoans/Fungi). The opiNOGs, rather than the
entire eggNOG database, were used to ensure that reads of the same orthologous
group were not artificially separated owing to the redundancy of the eggNOG
database. To ensure accurate annotations, a read was only considered to be part of a
particular opiNOG if its sequence matched a functionally relevant portion of
one of the proteins of that opiNOG. As a further check, top opiNOGs were
manually curated to confirm that the reads assigned to them were evenly distributed
across the length of a protein when mapped to a particular protein of that opiNOG;
for instance, the reads of the second most abundant function of the anemone
transcriptome (opiNOG08261 - Proteins involved in cellular iron ion homeostasis)
map evenly across the N. vectensis protein 45351.JGI237552, a constituent of that
particular opiNOG (Figure 5). The orthologous groups were then counted and
ranked to determine the most expressed transcripts of the anemone host (Figure 4).
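The counting and ranking step can be sketched as follows. This is an illustration
rather than the pipeline used here; the input layout (read ID mapped to its
assigned orthologous group) and the function name are hypothetical.

```python
from collections import Counter


def rank_orthologous_groups(read_assignments, top_n=15):
    """Rank orthologous groups by number of assigned reads.

    read_assignments: {read_id: orthologous_group_id} (assumed layout).
    Returns a list of (group, read count, percent of annotated reads).
    """
    counts = Counter(read_assignments.values())
    total = sum(counts.values())
    return [
        (group, n, 100.0 * n / total)
        for group, n in counts.most_common(top_n)
    ]


if __name__ == "__main__":
    toy = {
        "read_1": "opiNOG00002",
        "read_2": "opiNOG00002",
        "read_3": "opiNOG00002",
        "read_4": "opiNOG08261",
        "read_5": "opiNOG08261",
        "read_6": "opiNOG06546",
    }
    for group, n, percent in rank_orthologous_groups(toy):
        print(f"{group}\t{n}\t{percent:.1f}%")
```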
[Figure 4: table of top orthologous groups; only part of the table is recoverable from the
extracted text. Columns: orthologous group, COG category, description, percentage of reads.]

Top Cnidarian Functions
opiNOG00002   NA   Protein involved in microtubule-based process                          1.79
opiNOG08261   P    Protein involved in cellular iron ion homeostasis                      1.74
opiNOG06546   S    Glycoprotein 2 (zymogen granule membrane)
opiNOGOS941   NA   Protein involved in negative regulation of inclusion body assembly
                   (ID garbled in source)
opiNOG22116   S    (description not recoverable)

Top Bacterial Functions
NOG323497          Hypothetical                                                           1.73
COG0050       J    GTPases - translation elongation factors
NOG12793      S    Calcium ion binding protein
COG3203       M    Outer membrane protein (porin)
COG0749       L    DNA polymerase I - 3'-5' exonuclease and polymerase domains
COG0188       L    Type IIA topoisomerase (DNA gyrase/topo II, topoisomerase IV), A subunit

Other legible percentages, whose rows cannot be determined: 1.24, 1.4, 1.35, 0.92, 0.86.
Figure 4 Top 15 most highly represented orthologous groups among transcripts of Bacterial and
Cnidarian binned reads annotated with eggNOG version 3.0 database (Cnidarian reads with opiNOG
subset and Bacterial with COG/NOG subset). For top 15 function determination, annotated read
counts of all five samples were merged and ranked. COG Categories: W - Extracellular structures,
P - Inorganic ion transport and metabolism, S - Poorly characterized, I - Lipid transport and
metabolism, Z - Cytoskeleton, J - Translation, ribosomal structure and biogenesis, M - Cell
wall/membrane/envelope biogenesis, L - Replication, recombination and repair, C - Energy
production and conversion, U - Intracellular trafficking, secretion and vesicular transport, E - Amino
acid transport and metabolism, T - Signal transduction mechanisms, O - Posttranslational
modification, protein turnover, chaperones, NA - Not placed into a functional category.
The top annotation, making up 1.79% of Cnidarian classified reads (3,873
reads out of 215,900), is opiNOG00002, which consists of proteins involved in
microtubule processes and tubulin itself (Huson et al. 2007). This result, along with
some of the other top expressed Cnidarian functions, including actin binding
proteins (opiNOG02967) and cell adhesion factors (opiNOG11729), not only reflects
how cell structure and cytoskeletal maintenance are some of the most active
processes within the anemone but also gives confidence in our transcriptomics
pipeline, in that the top functions of other metazoan whole-organism
transcriptomes, including zebrafish and mouse, are known to be of this variety
(Francis et al., 2013). Also of interest is the second highest transcript, which is a
protein involved in iron homeostasis (1.74% of all opiNOG annotated reads). The
high level of expression of this type of regulator likely indicates the importance of
iron in the microenvironment of the anemone.
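[Figure 5: read mapping against 45351.JGI237552, with position ticks at 200, 400, 600 and 800 and
tracks labeled Consensus and Coverage; image not recoverable from the extracted text.]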
Figure 5 Read mapping to second highest annotated "Cnidarian" Read Category, opiNOG08261.
"Cnidarian" reads assigned to this opiNOG were mapped to the N. vectensis protein 45351.JGI237552
(a ferritin and member of opiNOG08261) to determine the validity of this assignment. Numbers at
the top represent the nucleotide position of the mRNA; the gray histogram represents the coverage of
a particular nucleotide position. Physical representations of reads are shown below the histogram;
green reads are the "forward" read of paired end reads while red are the "reverse".
Descriptive Analysis of Bacterial Transcriptome Top Functions
Reads that were annotated as "Bacterial" were pooled among the samples
(Total 1853 bacterial mRNA sequences) and annotated into orthologous groups
using BLASTx comparisons versus the COG and NOG subsets of the eggNOG
database v3.0 (Huson et al. 2007). These identified, expressed orthologous groups
were then ranked to determine the top bacterial functions within the anemone
holobiont (Figure 4).
The most expressed orthologous group, containing 1.73% of the "Bacterial"
read counts (32 out of 1853 read counts), is NOG323497, which contains hypothetical
conserved proteins. To determine the validity of this classification, the reads
assigned to this orthologous group were mapped to the transcript of the member
protein of the group that was most similar to our reads, protein 216895.VV1_0587
from the genome of Vibrio vulnificus CMCP6 (Figure 6); this is the same
method used to verify the top "Cnidarian" opiNOG assignments. The
reads distribute along the length of the gene, lending strength to our classification of
these reads within this particular orthologous group. Further, this highly expressed
bacterial transcript from a hypothetical gene is an intriguing target for further
exploration of bacterial association and community structuring factors within the
anemone.
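Whether reads "distribute along the length of the gene" can be checked
numerically as well as by eye. The sketch below builds a per-position coverage
profile from read alignment intervals and reports the fraction of the reference
covered by at least one read. It is only an illustration: the interval
convention (0-based start, end-exclusive) and the function names are assumptions,
and the thesis relied on visual inspection of the mapping.

```python
def coverage_profile(alignments, reference_length):
    """Per-position read depth for alignments given as (start, end) intervals,
    0-based and end-exclusive (assumed convention)."""
    depth = [0] * reference_length
    for start, end in alignments:
        for position in range(max(start, 0), min(end, reference_length)):
            depth[position] += 1
    return depth


def breadth_of_coverage(depth):
    """Fraction of reference positions covered by at least one read."""
    if not depth:
        return 0.0
    return sum(1 for d in depth if d > 0) / len(depth)


if __name__ == "__main__":
    toy_alignments = [(0, 4), (2, 6)]
    depth = coverage_profile(toy_alignments, reference_length=8)
    print(depth)
    print(breadth_of_coverage(depth))
```

Reads piled onto one short region would give a low breadth despite a high read
count, which is the pattern the manual check is meant to rule out.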
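[Figure 6: read mapping against 216895.VV1_0587, with position ticks at 100, 200, 300 and 400 and
tracks labeled Consensus and Coverage; image not recoverable from the extracted text.]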
Figure 6 Read mapping to representative sequence of highest expressed "Bacterial" orthologous group
NOG323497. To determine the evenness of spread of the reads assigned to this orthologous group, the
reads were mapped against the nucleotide coding sequence of the protein member most similar to our
reads (Protein 216895.VV1_0587 from the genome of Vibrio vulnificus CMCP6). Numbers at the top
represent the nucleotide position of the mRNA; the gray and pink histogram represents the coverage
of a particular nucleotide position. Physical representations of reads are shown below the
histogram; green reads are the "forward" read of paired end reads while red are the "reverse".
A majority of the rest of the top expressed bacterial functions are
"housekeeping" genes, including those that code for translation elongation factors,
DNA polymerases, topoisomerase, ribosomal proteins, ATPases, alkaline
phosphatase and aconitase (Figure 4). While they provide little information about
the interactions of bacterial associates within the host, the presence of these
functions indicates that bacteria are not just residing within the anemone but
actively growing.
Several top functions may indicate activities that mediate acclimation and/or
interaction within the holobiont. One function that may indicate bacterial adaptation
to the holobiont is COG0841, an orthologous group that contains many acriflavin
resistance genes and accounts for 9 out of 1853 of our read counts. Acriflavin is a
known antibacterial compound; if Nematostella were to use it as a microbiome
structuring agent, resistance would imply an adaptation required to inhabit the
anemone space. Another interesting observation is that both lists of top "Bacterial"
and "Cnidarian" functions contain orthologous groups annotated as calcium ion
binding proteins (Figure 4). This, however, does not provide any evidence about the
ecology of the anemone, as calcium ion binding proteins are involved in many basic
cellular pathways and processes.
Other ecologically relevant top expressed orthologous groups of the bacterial
transcriptome include COG0841, which consists of cation/multidrug efflux proteins;
this may represent a mechanism of antibiotic resistance used by the bacteria for
adaptation within the anemone microbiome. Another possible mechanism of
antibiotic resistance or general stress resistance may come from COG3203 and
COG0477, orthologous groups containing outer membrane porins and permeases
respectively. Both of these types of proteins permit the diffusion of solutes across
the cell wall and thus allow a bacterium to interact with the solutes of the external
environment. These three cell-wall/membrane associated orthologous groups, along
with COG1289, a potential membrane protein, make up four of the fifteen top
expressed bacterial functions and illustrate the likely importance, for
bacterial members of the holobiont, of regulating the barrier between themselves and
the anemone.
Bacterial Transcriptome Taxonomic Analysis and Comparison to 16S rRNA
gene Clone Library Data
The MEGAN analysis was not only able to group the "Bacterial" reads at the
domain level but also at more precise categories of taxonomy from phylum all the
way to the species level (Figure 3). Further, the putative SSU rRNA reads filtered
from the metatranscriptome samples during the preprocessing steps (Table 2) were
also able to be phylogenetically analyzed. This presented opportunities of validating
both the MEGAN taxonomic classification and earlier work done on the bacterial
inhabitants of N. vectensis that included constructed 16S rRNA clone libraries (Har,
MS Thesis) in the hopes of getting an even fuller understanding of the bacterial
associates of Nematostella.
In order to classify the putative SSU rRNAs filtered from the
metatranscriptomes in the pre-processing steps, the sequences were uploaded and
compared to known SSU rRNA sequences in the RDP database using MG-RAST. They
were classified at the genus level with a threshold of at least 70 nt matching
and an e-value of less than 1e-20, and at the species level with a threshold of at
least 90 nt matching and an e-value of less than 1e-40. The unprocessed
metatranscriptome sample was assessed first, as it was the least likely to be biased
by one of the rRNA subtraction protocols. At the genus level, only 60 reads could
be classified, with the majority (58 of the 60 reads; 97%) annotated as a
genus of Proteobacteria and the bulk of those 58 identified as "Vibrio" or
"unclassified Alphaproteobacteria" (Figure 7). This large proportion of
Proteobacteria matches the MEGAN taxonomic binning of mRNA, in which 1,616 of 1,853
binned "Bacterial" reads (87%) were further grouped as "Proteobacteria". The large
proportion of Proteobacteria among the SSU rRNA reads analyzed with MG-RAST is also
partly corroborated by the previous 16S clone library work, as some
clone libraries, particularly those generated from lab-raised anemones, tended to be
dominated by Proteobacteria (Har, MS Thesis).
One of the non-proteobacterial reads was classified as Synechococcus, which
is likely not aberrant as Synechococcus species were also found in the previous 16S
clone library work (Har, MS Thesis). Synechococcus was not found in the MEGAN
phylogenetic results as the Cyanobacterial results were only robust at the phylum
level.
[Figure 7: horizontal bar chart; horizontal axis "# of Reads" from 0 to 16; bar lengths are not
recoverable from the extracted text. Genus labels on the vertical axis: unclassified (derived from
Alphaproteobacteria), Vibrio, unclassified (derived from unclassified sequences), Spiroplasma,
Maricaulis, Burkholderia, Thiobacillus, unclassified (derived from Alteromonadaceae), Aquimarina,
Croceibacter, Cyanothece, Synechococcus, Pirellula, Cohaesibacter, Marivita, unclassified (derived
from Rhodobacteraceae), Desulfuromusa, Shewanella, Alcanivorax, Pseudomonas, Solimonas.]
Figure 7 Genus level assignment of SSU rRNA sequences from the unprocessed control sample.
Putative SSU rRNA reads filtered from the "unprocessed control" metatranscriptome sample were
compared to the RDP database of SSU rRNA sequences using MEGAN; reads were annotated at the
genus level if they aligned with a sequence in the database with minimum length 70nt and e-value
less than -20. Genus names are on the vertical axis; counts of reads are on the horizontal.
To leverage the power of greater read depth, SSU rRNA filtered reads from all
six samples were classified at the species level using a minimum alignment length of
90 nucleotides and a maximum e-value cutoff of 1E-40. Although SSU reads from the
rRNA-depleted samples are most likely biased by the selective removal of rRNAs
based on sequence homology to proprietary probes (Microbe Enrich/Express) or
amplified rDNA (RNAseH or DSN), the richness and taxonomic groups revealed can
still be used to gain insight into the composition of the N. vectensis-associated
microbial community (Figure 8). As in the analysis of the unprocessed reads alone,
the majority of the reads from samples treated for rRNA depletion were classified as
Proteobacteria, with Vibrio in particular as the dominant type. These criteria
revealed sequence types most similar to Vibrio parahaemolyticus and Maricaulis
maris as present in multiple samples.
While the previously made clone libraries did not include these particular
species of bacteria, they did contain related strains. Several species of
Vibrio were present in the 16S rRNA gene libraries, as well as an unspecified
Rhodobacteraceae, which is in the same family as Maricaulis maris. These two
species are also not specifically found in the MEGAN taxonomic analysis; however,
as with the clone libraries, reads classified as Vibrio and Rhodobacteraceae are present.
Figure 8 Species level assignment of SSU rRNA sequences from all metatranscriptome samples.
Putative SSU rRNA reads filtered from the metatranscriptome samples were compared to the RDP
database of SSU rRNA sequences using MEGAN; reads were annotated at the species level if they
aligned with a sequence in the database with minimum length 90 nt and e-value less than 1E-40. Species
names are on the vertical axis; counts of reads are on the horizontal. The different
metatranscriptome samples are colored as indicated by the key and the relative abundances of each
species are displayed as a stacked bar chart. Bacterial order names are the leftmost labels of the
grey and white boxes.
Of particular interest is the presence, in two of the six metatranscriptome
samples, of SSU rRNAs classified by MG-RAST as Stappia stellulata. Although this
strain is not found in the clone libraries, it has been found in culture-based analysis
of Nematostella and is one of the ten sequenced isolates used for comparative
genomic analysis.
While the SSU rRNA reads analyzed by MG-RAST were mostly
grouped as Proteobacteria, the MEGAN taxonomic analysis revealed the small but
relevant presence of other bacterial phyla including Bacteroidetes, Planctomycetes,
Firmicutes, Cyanobacteria and Actinobacteria. Two of these phyla (Actinobacteria
and Firmicutes) were found in four or more of the five MEGAN-imported samples,
lending weight to their true presence within the anemone holobiont
(Figure 3); however, these phyla were not detected within the 16S clone libraries.
Further, while Planctomycetes and Bacteroidetes were found in only
one or two MEGAN samples, these phyla make up a significant portion of the clone
libraries, illustrating the complexity of the N. vectensis microbial
inhabitants.
N. vectensis Bacterial Associate Genome Recruitment Mapping
Filtered reads were also recruited, under stringent conditions (90% identity
over 50% of the read length), to the sequenced and annotated genomes of the 10
cultured N. vectensis-associated bacteria in order to determine whether those particular
bacteria were present and active within the metatranscriptomes of the lab
anemones. The Pseudomonas and Limnobacter populations recruited 18
and 127 reads respectively, suggesting they may be relevant and active members of
the microbial community. The Agrobacterium and Stappia genomes had zero and one
reads mapped to them respectively, providing inconclusive evidence regarding
whether they are active within the anemone holobiont.
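The recruitment criterion (90% identity over 50% of the read length) can be sketched as follows. This is a minimal illustration under stated assumptions: the tuple layout of `alignments`, the function names, and the choice of inclusive cutoffs are all hypothetical, and the alignment tool actually used is not specified here.

```python
def is_recruited(percent_identity, aln_len, read_len,
                 min_identity=90.0, min_coverage=0.5):
    """True if an alignment meets both the identity and read-coverage cutoffs."""
    if read_len <= 0:
        raise ValueError("read_len must be positive")
    return percent_identity >= min_identity and aln_len / read_len >= min_coverage


def count_recruited(alignments):
    """Count recruited reads per genome label.

    `alignments` is an iterable of
    (read_id, genome, percent_identity, aln_len, read_len) tuples.
    A read is counted once per genome even if it aligns there several times.
    """
    seen = set()
    counts = {}
    for read_id, genome, pid, aln_len, read_len in alignments:
        if is_recruited(pid, aln_len, read_len) and (read_id, genome) not in seen:
            seen.add((read_id, genome))
            counts[genome] = counts.get(genome, 0) + 1
    return counts
```

Counting each read once per genome avoids inflating totals when a read aligns to several copies of a conserved gene, which matters when the totals are as small as the 18 and 127 reads reported above.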
In the Pseudomonas genomes, the reads mapped primarily to
conserved housekeeping genes (Table 4); these included translation initiation and
elongation factors, ribosomal proteins, and synthases of ATP and nucleotides. Two
genes of potential ecological relevance were detected: a chemotaxis
protein and a multidrug-transport protein. However, each of these genes had only
one read recruited to it, making it hard to draw strong conclusions from their
presence.
The recruited reads of the Limnobacter population, however, do offer some
insight into its ecology within the anemone (Table 5). The top function for the
Limnobacter population is a member of the phasin protein family, a group of
proteins responsible for the synthesis and structure of polyhydroxyalkanoate (PHA)
granules, which are insoluble spherical inclusions of polyesters used as energy and
carbon storage reserves within bacterial cells. PHA granules have recently been
determined, through direct genetic manipulation of symbiont bacteria, to play a
critical role in allowing a species of Burkholderia to establish a community within
the bean bug Riptortus pedestris (Kim et al., 2013); when the PHA synthesis genes
were knocked out, the density of the Burkholderia communities declined sharply
within the bean bugs, and host size was smaller. Further, the PHA-deficient bacteria
were more vulnerable to general environmental stresses. Given the strong expression of
this particular phasin protein, the presence of COG3243, a synthase of the
polyester units that compose the granules (Table 5), and this recent work with bean
bug symbionts, it is possible to hypothesize that PHA granules may play a similarly
important role in Limnobacter adapting to the anemone holobiont.
Table 4 Top functions of metatranscriptomic reads recruited to the Pseudomonas population of the
sequenced anemone associates.
COG/NOG         # Reads Mapped  COG category  Function
COG0050         6               J             GTPases - translation elongation factors
COG0090         3               J             Ribosomal protein L2
Hypothetical_1  1               NA            -
COG0093         1               J             Ribosomal protein L14
COG0644         1               C             Dehydrogenases (flavoproteins)
COG0290         1               J             Translation initiation factor 3 (IF-3)
COG0842         1               V             ABC-type multidrug transport system, permease component
COG0209         1               F             Ribonucleotide reductase, alpha subunit
COG0840         1               T             Methyl-accepting chemotaxis protein
Hypothetical_2  1               NA            -
COG0055         1               C             F0F1-type ATP synthase, beta subunit
Reads were mapped to the gene sequences of the four sequenced Pseudomonas associates that were
annotated using the COG and NOG subsets of the eggNOG database (version 3.0). Mapped reads to
transcripts unable to be annotated in the above manner are labeled "Hypothetical_#". NA = Not
classified into a COG category. See Figure 4 for relevant COG category definitions.
Other top mapped functions of the Limnobacter population include phosphate and
iron transporters and flagellar motility proteins (Table 5). The expression of these
particular transcripts portrays the Limnobacter population as an active and foraging
member of the anemone microbiome. Further, the high expression of iron
regulatory genes, similar to that seen in the annotated "Cnidarian" reads, provides
further evidence of iron as a potential structuring factor within the anemone.
Table 5 Top 40 functions of metatranscriptomic reads recruited to the Limnobacter population of the
sequenced anemone associates.
NOG45042
22
S
Hypothetical_1
COG3211
COG1629
Hypothetical_2
10
6
6
6
NA
COG0226
4
COG3604
4
T
ABC-type phosphate transport system, periplasmic component
Transcriptional regulator containing GAF, AAA-type ATPase, and DNA binding
domains
COG2885
Hypothetical_3
NOG47765
Hypothetical_4
Hypothetical_5
4
3
3
M
Outer membrane protein and related peptidoglycan-associated (lipo)proteins
COG3203
COG2908
NOG69967
3
3
3
3
2
R
P
NA
P
NA
NA
M
S
S
S
K
2
COG0183
COG2063
2
2
N
NOG12793
Hypothetical_6
Hypothetical_7
2
S
2
2
NA
NA
COG1960
Hypothetical_8
2
2
NA
COG0834
2
T
COG4774
2
2
2
2
P
COG0784
COG1426
NOG16078
Predicted phosphatase
Outer membrane receptor proteins, mostly Fe transport
NA
S
NOG268346
COG1386
2
Phasin family protein
Outer membrane protein (porin)
Uncharacterized protein conserved in bacteria
Galactose oxidase
Predicted transcriptional regulator containing the HTH domain
Acetyl-CoA acetyltransferase
Flagellar basal body L-ring protein
Calcium ion binding protein
Acyl-CoA dehydrogenases
T
S
S
ABC-type amino acid transport/signal transduction systems, periplasmic
component/domain
Outer membrane receptor for monomeric catechols
FOG: CheY-like receiver
Uncharacterized protein conserved in bacteria
3-hydroxyisobutyrate dehydrogenase and related beta-hydroxyacid
COG2084
COG1278
COG1999
K
COG1520
COG0234
COG0194
COG4105
COG0773
COG0102
S
0
F
R
R
M
J
COG3243
NOG08290
E
dehydrogenases
Cold shock proteins
involved in biogenesis of respiratory and photosynthetic systems
FOG: WD40-like repeat
Co-chaperonin GroES (HSP10)
Guanylate kinase
DNA uptake lipoprotein
UDP-N-acetylmuramate-alanine ligase
Ribosomal protein L13
Poly(3-hydroxyalkanoate) synthetase
L-Ectoine synthase
Reads were mapped to the gene sequences of the two sequenced Limnobacter associates that were
annotated using the COG and NOG subsets of the eggNOG database (version 3.0). Mapped reads to
transcripts unable to be annotated in the above manner are labeled "Hypothetical_#". NA = Not
classified into a COG category. See Figure 4 for relevant COG category definitions.
Bacterial Life Strategy Assessment: Limnobacter vs. Other Proteobacteria
The results of the analysis of top expressed "Bacterial" reads annotated by MEGAN
and those annotated through read mapping to sequenced associate genomes
suggest the presence of two divergent bacterial strategies for residing within the
anemone holobiont. In the first, seen in the top "Bacteria" expressed
functions (Figure 4), mapped Pseudomonas reads (Table 4) and top "Proteobacteria"
expressed functions (Table 6), the bacteria are actively regulating solute
levels with the external environment, synthesizing ribosomes and ATP, and
replicating. In the second, seen in the top expressed functions of Limnobacter
mapped reads, the bacteria are foraging, adapting and interacting within
the holobiont.
To explore this divergence of strategies further, the relative proportions of
reads annotated into particular COG categories were compared between those
classified as "Proteobacteria" by MEGAN and those classified as Limnobacter by
genome mapping (Figure 9). The results show that the Limnobacter
population is expressing disproportionately higher amounts (by at least two-fold) of
COG categories T (Signal Transduction Mechanisms), P (Inorganic Ion Transport
and Metabolism), N (Cell Motility) and I (Lipid Transport and Metabolism), whereas
the general "Proteobacteria" are expressing at least two-fold higher amounts of
categories J (Translation, Ribosomal Structure and Biogenesis), C (Energy
Production and Conversion), G (Carbohydrate Transport and Metabolism) and L
(Replication, Recombination and Repair).
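The comparison of relative COG category proportions with a two-fold cutoff can be sketched as below. This is an illustration rather than the procedure actually used: the function names are hypothetical, the input is assumed to be one COG category letter per annotated read, and reporting categories that are absent from the comparison set as enriched is a design assumption.

```python
from collections import Counter


def category_fractions(categories):
    """Map each COG category letter to its fraction of all annotated reads."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}


def enriched(fractions_a, fractions_b, fold=2.0):
    """Categories whose fraction in A is at least `fold` times that in B.

    Categories absent from B are reported as enriched in A.
    """
    out = []
    for cat, fa in fractions_a.items():
        fb = fractions_b.get(cat, 0.0)
        if fb == 0.0 or fa / fb >= fold:
            out.append(cat)
    return sorted(out)
```

Working with fractions rather than raw counts matters here because the two read sets differ greatly in size (127 mapped Limnobacter reads versus 1,616 Proteobacteria-binned reads).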
Table 6 Top 10 Annotated NOG/COG Orthologous Groups of "Proteobacteria" Binned Reads
COG/NOG    COG category  Value  Function
NOG323497  NA            1.98   Hypothetical
NOG12793   S             1.55   Calcium ion binding protein
COG0050    J             1.24   GTPases - translation elongation factors
COG3203    M             1.05   Outer membrane protein (porin)
COG0749    L             0.99   DNA polymerase I - 3'-5' exonuclease and polymerase domains
COG0055    C             0.80   F0F1-type ATP synthase, beta subunit
COG0090    J             0.74   Ribosomal protein L2
COG0188    L             0.74   Type IIA topoisomerase (DNA gyrase/topo II, topoisomerase IV), A subunit
COG0477    R             0.68   Permeases of the major facilitator superfamily
COG0056    C             0.62   F0F1-type ATP synthase, alpha subunit
Reads binned as "Bacteria" by MEGAN analysis were further subdivided into Bacterial phyla. The
1,616 out of 1,853 total "Bacterial" reads that were classified as "Proteobacteria" had their
annotations filtered from the total "Bacterial" pool and are seen above. See Figure 4 for relevant COG
category information.
This suggests that the Limnobacter population may be adapting to the anemone in its
own distinct way, in that it is signaling, moving and scavenging lipids and ions
(mostly iron) to withstand the microenvironment of its host. In addition, from the
view provided by MEGAN classification of BlastX annotations, we also see that a
more generalized proteobacterial population is successfully inhabiting the anemone
by producing energy and dividing.
Figure 9 COG Category Distribution of Reads Binned By MEGAN as Proteobacteria and Reads
Mapped to Sequenced Limnobacter Genomes. Total count of reads mapped to Limnobacter genomes
= 127; total count of reads binned by MEGAN as Proteobacteria = 1,616. See Figure 4 for COG category
descriptions.
Assessment of Sequenced Symbiont Genome Annotation Benefit
One of the initial goals for sequencing bacterial genomes of microbes associated
with Nematostella was to use them as reference genomes for metatranscriptomic
studies of the anemone holobiont. It was hypothesized that, since
those microbes were found to be present on the anemone, transcripts
from those bacteria would likely be found within its metatranscriptome, and that
having their genome sequences might be the only way of identifying them. Thus, it was
desirable to assess how much these genomes helped to annotate
metatranscriptomic reads.
To measure this level of success, orthologous groups were compared
between the collected "Bacterial" reads, as determined through MEGAN analysis and
COG/NOG annotation, and those reads that mapped to any of the 10 symbiont
genomes at 90% identity over 50% of the read. Whereas MEGAN-determined
bacterial reads were classified into 719 distinct NOG or COG orthologous groups,
reads mapping to symbiont genomes represented only 49 distinct orthologous
groups, with 31 shared between both methods of assessment (Figure 10). The
symbiont genomes thus added only 18 orthologous groups to the
metatranscriptome assessment, with all 18 of them gained just from having the
Limnobacter genomes.
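The overlap between the two annotation routes amounts to a set comparison, sketched below. The function name and the toy identifiers (`OG_A` and so on) are hypothetical and are not taken from the thesis data; only the totals used in the final consistency check (49 mapped groups, 31 shared) come from the text.

```python
def compare_ogs(megan_ogs, mapped_ogs):
    """Split orthologous group IDs into shared and method-specific sets."""
    megan, mapped = set(megan_ogs), set(mapped_ogs)
    return {
        "shared": megan & mapped,
        "megan_only": megan - mapped,
        "mapped_only": mapped - megan,
    }


# Toy identifiers, for illustration only.
megan_ogs = ["OG_A", "OG_B", "OG_C", "OG_D"]
mapped_ogs = ["OG_C", "OG_D", "OG_E"]
result = compare_ogs(megan_ogs, mapped_ogs)

# Consistency check on the counts reported in the text:
# groups found only by genome mapping = mapped total - shared.
mapped_total, shared = 49, 31
added_by_mapping = mapped_total - shared
```

The subtraction reproduces the 18 orthologous groups that the text attributes to symbiont genome mapping alone.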
While the number of unique orthologous groups gained by symbiont
mapping is not large, when explored along with orthologous groups with at least 2-fold
higher expression in symbiont mapping than in MEGAN analysis, these groups
represent most of the unique and interesting ecological insights into activities in the
anemone holobiont previously discussed (Table 7). For instance, the majority of the
strongly expressed NOG45042 reads and all of the COG3243 reads, both related to
the synthesis and regulation of PHA granules (as described in the "Genome
Mapping" section above), were determined through read mapping. Further, the
flagellar reads (COG2063) and the iron-scavenging TonB-dependent siderophore
receptor reads (COG4774) were only annotated in the presence of the Limnobacter
genomes. So, while the symbiont genomes added relatively few
reads and orthologous group annotations, the ones they did add gave strong
evidence of the ways in which bacteria of the Limnobacter population are adapting
and surviving within the anemone.
Figure 10 Orthologous Group comparison of those present in "Bacterial" reads as determined
through MEGAN analysis (Orange circle) and those present in the symbiont genome mapping
analysis (Yellow circle). Numbers represent the number of unique orthologous groups present within
each particular analysis or shared between them (overlap of circles).
Table 7 Orthologous Groups present in Metatranscriptome Data Unique to Limnobacter Mapping
Analysis or Present at Least 2-Fold Higher in Limnobacter Mapping Analysis
COG/NOG   COG category  Total Counts (MEGAN Analysis)  Total Counts (Limnobacter Mapping Analysis)  Function
NOG45042  S             2                              22                                           Phasin family protein
COG3211   R             2                              6                                            Predicted phosphatase
COG3604   T             0                              4                                            Transcriptional regulator containing GAF, AAA-type ATPase, and DNA binding domains
NOG47765  S             0                              3                                            -
NOG69967  S             0                              2                                            Galactose oxidase
COG2063   N             0                              2                                            Flagellar basal body L-ring protein
COG4774   P             0                              2                                            Outer membrane receptor for monomeric catechols
COG1426   S             0                              2                                            Uncharacterized protein conserved in bacteria
NOG16078  S             0                              2                                            -
COG1520   S             0                              2                                            FOG: WD40-like repeat
COG3243   I             0                              2                                            Poly(3-hydroxyalkanoate) synthetase
COG3490   S             0                              2                                            Uncharacterized protein conserved in bacteria
COG0664   T             0                              2                                            cAMP-binding proteins - catabolite gene activator and regulatory subunit of cAMP-dependent protein kinases
COG2262   R             0                              1                                            GTPases
COG3046   R             0                              1                                            Uncharacterized protein related to deoxyribodipyrimidine photolyase
COG1905   C             0                              1                                            NADH:ubiquinone oxidoreductase 24 kD subunit
NOG44894  S             0                              1                                            -
COG0067   E             0                              1                                            Glutamate synthase domain 1
NOG13288  S             0                              1                                            -
COG0739   M             0                              1                                            Membrane proteins related to metalloendopeptidases
Orthologous groups present in the "Bacterial" reads analyzed by MEGAN and in those reads
mapping to sequenced symbiont genomes were compared; those present in just the mapped
reads or at least 2-fold higher in the mapped reads are displayed in the table above. See Figure
4 for relevant COG category definitions.
Conclusion
Metatranscriptomic analysis of whole Nematostella vectensis animals was
performed in order to gain an understanding of the transcriptional activity of the
host and microbes within the anemone holobiont. Because the sought bacterial
mRNA is such a small component of the pool of total RNA within a complex
milieu of organisms such as the anemone holobiont, various methods of eliminating
unwanted components, particularly rRNAs, were tested (Stewart et al., 2010). It was
found that, in absolute terms, use of the kits MicrobENRICH/MicrobEXPRESS and
mRNAOnly resulted in the greatest number of recovered bacterial mRNA reads,
while on a relative scale, use of RNAseH and mRNAOnly was superior. However,
with a sample size of one for each of the enrichment/subtraction technique trials, it
is difficult to draw strong conclusions.
Assessment of the highest Cnidarian expression activity revealed mostly
transcripts coding for cell structure regulation and components of the cytoskeleton,
as well as other "housekeeping" functions such as ribosome synthesis and energy
production; of possible interest was a highly expressed orthologous group for iron
regulation and metabolism.
The bacteria, mostly dominated by the phylum Proteobacteria as determined
by MEGAN taxonomic analysis (Huson et al., 2007) and classification of SSU rRNAs
using MG-RAST, had high expression of many ecologically relevant
cell-wall/membrane related functions, including efflux pumps and permeases, as well as
a variety of replication and ribosomal synthesis genes. While these top functions
indicated the presence of an actively growing, dividing and interacting population of
bacteria, genome recruitment mapping to sequenced N. vectensis associates
revealed a specialized population of the genus Limnobacter expressing motility, iron
regulation, nutrient scavenging and antibiotic resistance genes. Of particular
interest within the Limnobacter population is that its most highly expressed gene
regulates PHA granules: large, insoluble and intracellular stores of carbon known to
be associated with symbiosis in insects (Kim et al., 2013).
Metatranscriptomics can be used as a descriptive method in order to create
more explicit lab-testable hypotheses. For the anemone holobiont,
metatranscriptomics has revealed the possible importance of iron as an
environmental factor, as both the Cnidarian host and bacteria expressed genes to
respond to and regulate it. Further, it has revealed evidence of specialization of
Limnobacter within the microbiome and the potential role PHA granules
may play in their ability to persist within the holobiont.
Conclusions
The starlet sea anemone Nematostella vectensis is an emerging model of
evolution and development, which, like all multicellular organisms, can be viewed as
a "holobiont" consisting of a host organism (the anemone) and microbial associates
that influence the host's physiology and evolution (Stefanik et al., 2013; Reitzel et al.,
2012; Rohwer et al., 2002; Vega-Thurber et al., 2009). Previous experimental work
has shown that similar microbial populations are associated with Nematostella over
distinct geographic locations and timescales (Har et al., MS Thesis). However, how
these microbes associate with the host, their function within it and the factors that
structure the community are unknown. In order to address these questions,
comparative genomics of known bacterial associates and metatranscriptomics of
whole lab-raised anemones were performed.
Comparative genomic analysis was carried out on 10 isolates from four
bacterial populations (P. oleovorans, A. tumefaciens, L. thiooxidans and S. stellulata)
known to be associated with the anemone. While no genes, phage or mobile genetic
elements were found to be horizontally transferred among these populations or
between these populations and the anemone host, analysis of holobiont-specific
genes (i.e. gene orthologs found only in N. vectensis-associated bacterial genomes
and absent in closely related genomes of the same genus/family) within the
populations revealed common functional themes that may indicate adaptation of
bacteria within the holobiont. Specifically, two of the populations, P. oleovorans and
L. thiooxidans, contained a holobiont-specific efflux transporter and multi-drug
efflux pump respectively, and the P. oleovorans population contained a
holobiont-specific antibiotic biosynthesis monooxygenase. This provides some evidence
for the idea that antibiotic synthesis/resistance is a mechanism of adaptation within the
microbial communities of the anemone and provides a starting point for future
lab-testable work on anemone holobiont structuring factors.
Metatranscriptomic analysis of whole lab-raised anemones revealed that the
bacteria, mostly dominated by the phylum Proteobacteria as determined by MEGAN
taxonomic analysis (Huson et al., 2007) and classification of SSU rRNAs using
MG-RAST, had high expression of many ecologically relevant cell-wall/membrane
related functions, including efflux pumps and permeases, as well as a variety of
replication and ribosomal synthesis genes. These functions indicate the presence of
an actively growing, dividing and interacting population of bacteria, but they do
little to describe particular bacterial adaptations to the holobiont. Genome
recruitment mapping to sequenced N. vectensis associates, unlike the broader
MEGAN analysis, revealed a specialized population of the genus Limnobacter
expressing motility, iron regulation, nutrient scavenging and antibiotic resistance
genes. Of particular interest within this Limnobacter population is that its most
highly expressed gene regulates PHA granules: large, insoluble and intracellular
stores of carbon known to be associated with symbiosis in insects (Kim et al., 2013).
Discovery of these specialized transcripts from the Limnobacter population among
the anemone metatranscriptomes illustrates the power of known, sequenced
microbial associates in the analysis of a holobiont. To better understand bacterial
adaptation within the anemone, more work should be done to culture and
sequence less free-living, more anemone-specific strains, as these, unlike the 10
isolates used in this analysis, are likely to contain clearer genomic signals of the selective
environment of the anemone holobiont. Not only will their genomes likely provide a
reference for annotating more reads from the metatranscriptome, but assessment of
their holobiont-specific genes using the methods established here will provide
stronger evidence of microbial community structuring factors within the anemone.
References
Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller and D. J.
Lipman. "Gapped Blast and Psi-Blast: A New Generation of Protein Database
Search Programs." Nucleic Acids Res 25, no. 17 (1997): 3389-402.
Aziz, R. K., D. Bartels, A. A. Best, M. DeJongh, T. Disz, R. A. Edwards, K. Formsma, S.
Gerdes, E. M. Glass, M. Kubal, F. Meyer, G. J. Olsen, R. Olson, A. L. Osterman, R.
A. Overbeek, L. K. McNeil, D. Paarmann, T. Paczian, B. Parrello, G. D. Pusch, C.
Reich, R. Stevens, O. Vassieva, V. Vonstein, A. Wilke and O. Zagnitko. "The Rast
Server: Rapid Annotations Using Subsystems Technology." BMC Genomics 9,
(2008): 75.
Backhed, F., H. Ding, T. Wang, L. V. Hooper, G. Y. Koh, A. Nagy, C. F. Semenkovich and
J. I. Gordon. "The Gut Microbiota as an Environmental Factor That Regulates
Fat Storage." Proc Natl Acad Sci U S A 101, no. 44 (2004): 15718-23.
Bevins, C. L. and N. H. Salzman. "The Potter's Wheel: The Host's Role in Sculpting Its
Microbiota." Cell Mol Life Sci 68, no. 22 (2011): 3675-85.
Boettcher, K. J., B. J. Barber and J. T. Singer. "Additional Evidence That Juvenile
Oyster Disease Is Caused by a Member of the Roseobacter Group and
Colonization of Nonaffected Animals by Stappia Stellulata-Like Strains." Appl
Environ Microbiol 66, no. 9 (2000): 3924-30.
Bomar, L., M. Maltz, S. Colston and J. Graf. "Directed Culturing of Microorganisms
Using Metatranscriptomics." MBio 2, no. 2 (2011): e00012-11.
Bosch, T. C. "Cnidarian-Microbe Interactions and the Origin of Innate Immunity in
Metazoans." Annu Rev Microbiol, (2013).
Chapman, J. A., E. F. Kirkness, O. Simakov, S. E. Hampson, T. Mitros, T. Weinmaier, T.
Rattei, P. G. Balasubramanian, J. Borman, D. Busam, K. Disbennett, C.
Pfannkoch, N. Sumin, G. G. Sutton, L. D. Viswanathan, B. Walenz, D. M.
Goodstein, U. Hellsten, T. Kawashima, S. E. Prochnik, N. H. Putnam, S. Shu, B.
Blumberg, C. E. Dana, L. Gee, D. F. Kibler, L. Law, D. Lindgens, D. E. Martinez, J.
Peng, P. A. Wigge, B. Bertulat, C. Guder, Y. Nakamura, S. Ozbek, H. Watanabe,
K. Khalturin, G. Hemmrich, A. Franke, R. Augustin, S. Fraune, E. Hayakawa, S.
Hayakawa, M. Hirose, J. S. Hwang, K. Ikeo, C. Nishimiya-Fujisawa, A. Ogura, T.
Takahashi, P. R. Steinmetz, X. Zhang, R. Aufschnaiter, M. K. Eder, A. K. Gorny,
W. Salvenmoser, A. M. Heimberg, B. M. Wheeler, K. J. Peterson, A. Bottger, P.
Tischler, A. Wolf, T. Gojobori, K. A. Remington, R. L. Strausberg, J. C. Venter, U.
Technau, B. Hobmayer, T. C. Bosch, T. W. Holstein, T. Fujisawa, H. R. Bode, C.
N. David, D. S. Rokhsar and R. E. Steele. "The Dynamic Genome of Hydra."
Nature 464, no. 7288 (2010): 592-6.
Chaucheyras-Durand, F. and H. Durand. "Probiotics in Animal Nutrition and Health."
Benef Microbes 1, no. 1 (2010): 3-9.
Coleman, M. L. and S. W. Chisholm. "Ecosystem-Specific Selection Pressures
Revealed through Comparative Population Genomics." Proc Natl Acad Sci U S
A 107, no. 43 (2010): 18634-9.
Dehal, P. S., M. P. Joachimiak, M. N. Price, J. T. Bates, J. K. Baumohl, D. Chivian, G. D.
Friedland, K. H. Huang, K. Keller, P. S. Novichkov, I. L. Dubchak, E. J. Alm and
A. P. Arkin. "Microbesonline: An Integrated Portal for Comparative and
Functional Genomics." Nucleic Acids Res 38, no. Database issue (2010): D396-400.
Dobber, R., A. Hertogh-Huijbregts, J. Rozing, K. Bottomly and L. Nagelkerken. "The
Involvement of the Intestinal Microflora in the Expansion of Cd4+ T Cells
with a Naive Phenotype in the Periphery." Dev Immunol 2, no. 2 (1992): 141-50.
Edgar, R. C. "Muscle: Multiple Sequence Alignment with High Accuracy and High
Throughput." Nucleic Acids Res 32, no. 5 (2004): 1792-7.
Franzenburg, S., S. Fraune, P. M. Altrock, S. Kunzel, J. F. Baines, A. Traulsen and T. C.
Bosch. "Bacterial Colonization of Hydra Hatchlings Follows a Robust
Temporal Pattern." ISME J 7, no. 4 (2013): 781-90.
Fraune, S., R. Augustin, F. Anton-Erxleben, J. Wittlieb, C. Gelhaus, V. B. Klimovich, M.
P. Samoilovich and T. C. Bosch. "In an Early Branching Metazoan, Bacterial
Colonization of the Embryo Is Controlled by Maternal Antimicrobial
Peptides." Proc Natl Acad Sci U S A 107, no. 42 (2010): 18067-72.
Fraune, S. and T. C. Bosch. "Long-Term Maintenance of Species-Specific Bacterial
Microbiota in the Basal Metazoan Hydra." Proc Natl Acad Sci U S A 104, no. 32
(2007): 13146-51.
Fraune, S. and T. C. Bosch. "Why Bacteria Matter in Animal Development and
Evolution." Bioessays 32, no. 7 (2010): 571-80.
Frias-Lopez, J., Y. Shi, G. W. Tyson, M. L. Coleman, S. C. Schuster, S. W. Chisholm and
E. F. Delong. "Microbial Community Gene Expression in Ocean Surface
Waters." Proc Natl Acad Sci U S A 105, no. 10 (2008): 3805-10.
Gan, H. M., A. O. Hudson, A. Y. Rahman, K. G. Chan and M. A. Savka. "Comparative
Genomic Analysis of Six Bacteria Belonging to the Genus Novosphingobium:
Insights into Marine Adaptation, Cell-Cell Signaling and Bioremediation."
BMC Genomics 14, (2013): 431.
Gilbert, J. A., D. Field, Y. Huang, R. Edwards, W. Li, P. Gilna and I. Joint. "Detection of
Large Numbers of Novel Sequences in the Metatranscriptomes of Complex
Marine Microbial Communities." PLoS One 3, no. 8 (2008): e3042.
Gosalbes, M. J., A. Durban, M. Pignatelli, J. J. Abellan, N. Jimenez-Hernandez, A. E.
Perez-Cobas, A. Latorre and A. Moya. "Metatranscriptomic Approach to
Analyze the Functional Human Gut Microbiota." PLoS One 6, no. 3 (2011):
e17447.
Har,
J.Y. "Introducing the starlet sea anemone
Nematostella vectensis as a model for
and disease in hexacorals." Civil
of
health
investigating microbial mediation
and Environmental Engineering. Cambridge, MA, Massachusetts Institute of
Technology (2009): 115.
Hayashi, T., K. Makino, M. Ohnishi, K. Kurokawa, K. Ishii, K. Yokoyama, C. G. Han, E.
Ohtsubo, K. Nakayama, T. Murata, M. Tanaka, T. Tobe, T. Iida, H. Takami, T.
Honda, C. Sasakawa, N. Ogasawara, T. Yasunaga, S. Kuhara, T. Shiba, M.
Hattori and H. Shinagawa. "Complete Genome Sequence of
Enterohemorrhagic Escherichia Coli O157:H7 and Genomic Comparison with
a Laboratory Strain K-12." DNA Res 8, no. 1 (2001): 11-22.
Helbling, D. E., M. Ackermann, K. Fenner, H. P. Kohler and D. R. Johnson. "The
Activity Level of a Microbial Community Function Can Be Predicted from Its
Metatranscriptome." ISME J 6, no. 4 (2012): 902-4.
Hislop, N. R., D. de Jong, D. C. Hayward, E. E. Ball and D. J. Miller. "Tandem
Organization of Independently Duplicated Homeobox Genes in the Basal
Cnidarian Acropora Millepora." Dev Genes Evol 215, no. 5 (2005): 268-73.
Hooper, L. V. and J. I. Gordon. "Commensal Host-Bacterial Relationships in the Gut."
Science 292, no. 5519 (2001): 1115-8.
Huson, D. H., A. F. Auch, J. Qi and S. C. Schuster. "Megan Analysis of Metagenomic
Data." Genome Res 17, no. 3 (2007): 377-86.
Konstantinidis, K. T., A. Ramette and J. M. Tiedje. "The Bacterial Species Definition in
the Genomic Era." Philos Trans R Soc Lond B Biol Sci 361, no. 1475 (2006):
1929-40.
Kusserow, A., K. Pang, C. Sturm, M. Hrouda, J. Lentfer, H. A. Schmidt, U. Technau, A.
von Haeseler, B. Hobmayer, M. Q. Martindale and T. W. Holstein. "Unexpected
Complexity of the Wnt Gene Family in a Sea Anemone." Nature 433, no. 7022
(2005): 156-60.
Leimena, M. M., J. Ramiro-Garcia, M. Davids, B. van den Bogert, H. Smidt, E. J. Smid, J.
Boekhorst, E. G. Zoetendal, P. J. Schaap and M. Kleerebezem. "A
Comprehensive Metatranscriptome Analysis Pipeline and Its Validation
Using Human Small Intestine Microbiota Datasets." BMC Genomics 14,
(2013): 530.
Ley, R. E., F. Backhed, P. Turnbaugh, C. A. Lozupone, R. D. Knight and J. I. Gordon.
"Obesity Alters Gut Microbial Ecology." Proc Natl Acad Sci U S A 102, no. 31
(2005): 11070-5.
Ley, R. E., M. Hamady, C. Lozupone, P. J. Turnbaugh, R. R. Ramey, J. S. Bircher, M. L.
Schlegel, T. A. Tucker, M. D. Schrenzel, R. Knight and J. I. Gordon. "Evolution of
Mammals and Their Gut Microbes." Science 320, no. 5883 (2008): 1647-51.
Ley, R. E., P. J. Turnbaugh, S. Klein and J. I. Gordon. "Microbial Ecology: Human Gut
Microbes Associated with Obesity." Nature 444, no. 7122 (2006): 1022-3.
Mahowald, M. A., F. E. Rey, H. Seedorf, P. J. Turnbaugh, R. S. Fulton, A. Wollam, N.
Shah, C. Wang, V. Magrini, R. K. Wilson, B. L. Cantarel, P. M. Coutinho, B.
Henrissat, L. W. Crock, A. Russell, N. C. Verberkmoes, R. L. Hettich and J. I.
Gordon. "Characterizing a Model Human Gut Microbiota Composed of
Members of Its Two Dominant Bacterial Phyla." Proc Natl Acad Sci U S A 106,
no. 14 (2009): 5859-64.
Mandel, M. J., M. S. Wollenberg, E. V. Stabb, K. L. Visick and E. G. Ruby. "A Single
Regulatory Gene Is Sufficient to Alter Bacterial Host Range." Nature 458, no.
7235 (2009): 215-8.
Manning, V. A., I. Pandelova, B. Dhillon, L. J. Wilhelm, S. B. Goodwin, A. M. Berlin, M.
Figueroa, M. Freitag, J. K. Hane, B. Henrissat, W. H. Holman, C. D. Kodira, J.
Martin, R. P. Oliver, B. Robbertse, W. Schackwitz, D. C. Schwartz, J. W.
Spatafora, B. G. Turgeon, C. Yandava, S. Young, S. Zhou, Q. Zeng, I. V. Grigoriev,
L. J. Ma and L. M. Ciuffetti. "Comparative Genomics of a Plant-Pathogenic
Fungus, Pyrenophora Tritici-Repentis, Reveals Transduplication and the
Impact of Repeat Elements on Pathogenicity and Population Divergence." G3
(Bethesda) 3, no. 1 (2013): 41-63.
Meyer, F., D. Paarmann, M. D'Souza, R. Olson, E. M. Glass, M. Kubal, T. Paczian, A.
Rodriguez, R. Stevens, A. Wilke, J. Wilkening and R. A. Edwards. "The
Metagenomics Rast Server - a Public Resource for the Automatic
Phylogenetic and Functional Analysis of Metagenomes." BMC Bioinformatics
9, (2008): 386.
Miller, D. J., E. E. Ball and U. Technau. "Cnidarians and Ancestral Genetic Complexity
in the Animal Kingdom." Trends Genet 21, no. 10 (2005): 536-9.
Mira, A., H. Ochman and N. A. Moran. "Deletional Bias and the Evolution of Bacterial
Genomes." Trends Genet 17, no. 10 (2001): 589-96.
Moran, M. A., B. Satinsky, S. M. Gifford, H. Luo, A. Rivers, L. K. Chan, J. Meng, B. P.
Durham, C. Shen, V. A. Varaljay, C. B. Smith, P. L. Yager and B. M. Hopkinson.
"Sizing up Metatranscriptomics." ISME J 7, no. 2 (2013): 237-43.
Moran, N. A. "Symbiosis as an Adaptive Process and Source of Phenotypic
Complexity." Proc Natl Acad Sci U S A 104 Suppl 1, (2007): 8627-33.
Nyholm, S. V. and J. Graf. "Knowing Your Friends: Invertebrate Innate Immunity
Fosters Beneficial Bacterial Symbioses." Nat Rev Microbiol 10, no. 12 (2012):
815-27.
O'Brien, H. E., Y. Gong, P. Fung, P. W. Wang and D. S. Guttman. "Use of Low-Coverage,
Large-Insert, Short-Read Data for Rapid and Accurate Generation of
Enhanced-Quality Draft Pseudomonas Genome Sequences." PLoS One 6, no.
11 (2011): e27199.
O'Hara, A. M. and F. Shanahan. "The Gut Flora as a Forgotten Organ." EMBO Rep 7,
no. 7 (2006): 688-93.
Ochman, H., M. Worobey, C. H. Kuo, J. B. Ndjango, M. Peeters, B. H. Hahn and P.
Hugenholtz. "Evolutionary Relationships of Wild Hominids Recapitulated by
Gut Microbial Communities." PLoS Biol 8, no. 11 (2010): e1000546.
Penn, K., J. Wang, S. C. Fernando and J. R. Thompson. (Submitted). "Secondary
Metabolite Gene Expression and Interplay of Bacterial Functions in a
Freshwater Cyanobacterial Bloom."
Poretsky, R. S., S. Gifford, J. Rinta-Kanto, M. Vila-Costa and M. A. Moran. "Analyzing
Gene Expression from Marine Microbial Communities Using Environmental
Transcriptomics." J Vis Exp, no. 24 (2009).
Powell, S., D. Szklarczyk, K. Trachana, A. Roth, M. Kuhn, J. Muller, R. Arnold, T. Rattei,
I. Letunic, T. Doerks, L. J. Jensen, C. von Mering and P. Bork. "Eggnog V3.0:
Orthologous Groups Covering 1133 Organisms at 41 Different Taxonomic
Ranges." Nucleic Acids Res 40, no. Database issue (2012): D284-9.
Putnam, N. H., M. Srivastava, U. Hellsten, B. Dirks, J. Chapman, A. Salamov, A. Terry,
H. Shapiro, E. Lindquist, V. V. Kapitonov, J. Jurka, G. Genikhovich, I. V.
Grigoriev, S. M. Lucas, R. E. Steele, J. R. Finnerty, U. Technau, M. Q. Martindale
and D. S. Rokhsar. "Sea Anemone Genome Reveals Ancestral Eumetazoan
Gene Repertoire and Genomic Organization." Science 317, no. 5834 (2007):
86-94.
Quast, C., E. Pruesse, P. Yilmaz, J. Gerken, T. Schweer, P. Yarza, J. Peplies and F. O.
Glockner. "The Silva Ribosomal Rna Gene Database Project: Improved Data
Processing and Web-Based Tools." Nucleic Acids Res 41, no. Database issue
(2013): D590-6.
Radax, R., T. Rattei, A. Lanzen, C. Bayer, H. T. Rapp, T. Urich and C. Schleper.
"Metatranscriptomics of the Marine Sponge Geodia Barretti: Tackling
Phylogeny and Function of Its Microbial Community." Environ Microbiol 14,
no. 5 (2012): 1308-24.
Rader, B. A. and S. V. Nyholm. "Host/Microbe Interactions Revealed through "Omics"
in the Symbiosis between the Hawaiian Bobtail Squid Euprymna Scolopes
and the Bioluminescent Bacterium Vibrio Fischeri." Biol Bull 223, no. 1
(2012): 103-11.
Rawls, J. F., M. A. Mahowald, R. E. Ley and J. I. Gordon. "Reciprocal Gut Microbiota
Transplants from Zebrafish and Mice to Germ-Free Recipients Reveal Host
Habitat Selection." Cell 127, no. 2 (2006): 423-33.
Rawls, J. F., B. S. Samuel and J. I. Gordon. "Gnotobiotic Zebrafish Reveal
Evolutionarily Conserved Responses to the Gut Microbiota." Proc Natl Acad
Sci U S A 101, no. 13 (2004): 4596-601.
Reitzel, A. M., J. F. Ryan and A. M. Tarrant. "Establishing a Model Organism: A Report
from the First Annual Nematostella Meeting." Bioessays 34, no. 2 (2012): 158-61.
Renfer, E., A. Amon-Hassenzahl, P. R. Steinmetz and U. Technau. "A Muscle-Specific
Transgenic Reporter Line of the Sea Anemone, Nematostella Vectensis." Proc
Natl Acad Sci U S A 107, no. 1 (2010): 104-8.
Reshef, L., O. Koren, Y. Loya, I. Zilber-Rosenberg and E. Rosenberg. "The Coral
Probiotic Hypothesis." Environ Microbiol 8, no. 12 (2006): 2068-73.
Richter, M. and R. Rossello-Mora. "Shifting the genomic gold standard for the
prokaryotic species definition." Proc Natl Acad Sci U S A 106 (2009): 19126-31.
Rodrigue, S., A. C. Materna, S. C. Timberlake, M. C. Blackburn, R. R. Malmstrom, E. J.
Alm and S. W. Chisholm. "Unlocking Short Read Sequencing for
Metagenomics." PLoS One 5, no. 7 (2010): e11840.
Ruby, E. G. "Symbiotic Conversations Are Revealed under Genetic Interrogation."
Nat Rev Microbiol 6, no. 10 (2008): 752-62.
Ryu, J. H., S. H. Kim, H. Y. Lee, J. Y. Bai, Y. D. Nam, J. W. Bae, D. G. Lee, S. C. Shin, E. M.
Ha and W. J. Lee. "Innate Immune Homeostasis by the Homeobox Gene
Caudal and Commensal-Gut Mutualism in Drosophila." Science 319, no. 5864
(2008): 777-82.
Sanders, J. G., R. A. Beinart, F. J. Stewart, E. F. Delong and P. R. Girguis.
"Metatranscriptomics Reveal Differences in in Situ Energy and Nitrogen
Metabolism among Hydrothermal Vent Snail Symbionts." ISME J 7, no. 8
(2013): 1556-67.
Shapiro, B. J. and E. Alm. "The Slow:Fast Substitution Ratio Reveals Changing
Patterns of Natural Selection in Gamma-Proteobacterial Genomes." ISME J 3,
no. 10 (2009): 1180-92.
Shapiro, B. J., L. A. David, J. Friedman and E. J. Alm. "Looking for Darwin's Footprints
in the Microbial World." Trends Microbiol 17, no. 5 (2009): 196-204.
Shi, Y., G. W. Tyson and E. F. DeLong. "Metatranscriptomics Reveals Unique
Microbial Small Rnas in the Ocean's Water Column." Nature 459, no. 7244
(2009): 266-9.
Singh, Y., J. Ahmad, J. Musarrat, N. Z. Ehtesham and S. E. Hasnain. "Emerging
Importance of Holobionts in Evolution and in Probiotics." Gut Pathog 5, no. 1
(2013): 12.
Starcevic, A., S. Akthar, W. C. Dunlap, J. M. Shick, D. Hranueli, J. Cullum and P. F. Long.
"Enzymes of the Shikimic Acid Pathway Encoded in the Genome of a Basal
Metazoan, Nematostella Vectensis, Have Microbial Origins." Proc Natl Acad
Sci U S A 105, no. 7 (2008): 2533-7.
Stefanik, D. J., L. E. Friedman and J. R. Finnerty. "Collecting, Rearing, Spawning and
Inducing Regeneration of the Starlet Sea Anemone, Nematostella Vectensis."
Nat Protoc 8, no. 5 (2013): 916-23.
Stewart, F. J., E. A. Ottesen and E. F. DeLong. "Development and Quantitative
Analyses of a Universal Rrna-Subtraction Protocol for Microbial
Metatranscriptomics." ISME J 4, no. 7 (2010): 896-907.
Timberlake, S. C., S. C. Fernando, K. Penn, F. L. Thompson, and J. R. Thompson. (In
Preparation). "Holobiont metatranscriptomics in Brazilian reef-building
corals (gen. Mussismilia): Unraveling the functional dynamics of Coral host,
Symbiodinium, and Microbiota during health and disease."
Uchino, Y., A. Hirata, A. Yokota and J. Sugiyama. "Reclassification of Marine
Agrobacterium Species: Proposals of Stappia Stellulata Gen. Nov., Comb. Nov.,
Stappia Aggregata Sp. Nov., Nom. Rev., Ruegeria Atlantica Gen. Nov., Comb.
Nov., Ruegeria Gelatinovora Comb. Nov., Ruegeria Algicola Comb. Nov., and
Ahrensia Kieliense Gen. Nov., Sp. Nov., Nom. Rev." J Gen Appl Microbiol 44, no.
3 (1998): 201-210.
Vega Thurber, R. L., K. L. Barott, D. Hall, H. Liu, B. Rodriguez-Mueller, C. Desnues, R.
A. Edwards, M. Haynes, F. E. Angly, L. Wegley and F. L. Rohwer. "Metagenomic
Analysis Indicates That Stressors Induce Production of Herpes-Like Viruses
in the Coral Porites Compressa." Proc Natl Acad Sci U S A 105, no. 47 (2008):
18413-8.
Vega Thurber, R., D. Willner-Hall, B. Rodriguez-Mueller, C. Desnues, R. A. Edwards, F.
Angly, E. Dinsdale, L. Kelly and F. Rohwer. "Metagenomic Analysis of Stressed
Coral Holobionts." Environ Microbiol 11, no. 8 (2009): 2148-63.
Warnecke, F., P. Luginbuhl, N. Ivanova, M. Ghassemian, T. H. Richardson, J. T. Stege,
M. Cayouette, A. C. McHardy, G. Djordjevic, N. Aboushadi, R. Sorek, S. G.
Tringe, M. Podar, H. G. Martin, V. Kunin, D. Dalevi, J. Madejska, E. Kirton, D.
Platt, E. Szeto, A. Salamov, K. Barry, N. Mikhailova, N. C. Kyrpides, E. G.
Matson, E. A. Ottesen, X. Zhang, M. Hernandez, C. Murillo, L. G. Acosta, I.
Rigoutsos, G. Tamayo, B. D. Green, C. Chang, E. M. Rubin, E. J. Mathur, D. E.
Robertson, P. Hugenholtz and J. R. Leadbetter. "Metagenomic and Functional
Analysis of Hindgut Microbiota of a Wood-Feeding Higher Termite." Nature
450, no. 7169 (2007): 560-5.
Whitaker, R. J. and J. F. Banfield. "Population Genomics in Natural Microbial
Communities." Trends Ecol Evol 21, no. 9 (2006): 508-16.
Wilmes, P., S. L. Simmons, V. J. Denef and J. F. Banfield. "The Dynamic Genetic
Repertoire of Microbial Communities." FEMS Microbiol Rev 33, no. 1 (2009):
109-32.
Xie, W., Q. S. Meng, Q. J. Wu, S. L. Wang, X. Yang, N. N. Yang, R. M. Li, X. G. Jiao, H. P.
Pan, B. M. Liu, Q. Su, B. Y. Xu, S. N. Hu, X. G. Zhou and Y. J. Zhang.
"Pyrosequencing the Bemisia Tabaci Transcriptome Reveals a Highly Diverse
Bacterial Community and a Robust System for Insecticide Resistance." PLoS
One 7, no. 4 (2012): e35181.
Xiong, X., D. N. Frank, C. E. Robertson, S. S. Hung, J. Markle, A. J. Canty, K. D. McCoy, A.
J. Macpherson, P. Poussier, J. S. Danska and J. Parkinson. "Generation and
Analysis of a Mouse Intestinal Metatranscriptome through Illumina Based
Rna-Sequencing." PLoS One 7, no. 4 (2012): e36009.
Xu, J., M. K. Bjursell, J. Himrod, S. Deng, L. K. Carmichael, H. C. Chiang, L. V. Hooper
and J. I. Gordon. "A Genomic View of the Human-Bacteroides
Thetaiotaomicron Symbiosis." Science 299, no. 5615 (2003): 2074-6.
Zakham, F., O. Aouane, D. Ussery, A. Benjouad and M. M. Ennaji. "Computational
Genomics-Proteomics and Phylogeny Analysis of Twenty One Mycobacterial
Genomes (Tuberculosis & Non Tuberculosis Strains)." Microb Inform Exp 2,
no. 1 (2012): 7.
Zerbino, D. R. and E. Birney. "Velvet: Algorithms for De Novo Short Read Assembly
Using De Bruijn Graphs." Genome Res 18, no. 5 (2008): 821-9.
Zhao, Z., H. Liu, C. Wang and J. R. Xu. "Comparative Analysis of Fungal Genomes
Reveals Different Plant Cell Wall Degrading Capacity in Fungi." BMC Genomics
14, (2013): 274.
Appendix I: Comparison of N. vectensis Isolate genomes with closest genome
sequenced type-strains
[Figure 1 image: three circular genome comparison diagrams, one per subject isolate.
Subjects: Pseudomonas oleovorans 47_CLC (4455 orthologs); Agrobacterium tumefaciens
D5 (5407 orthologs); Limnobacter thiooxidans strain F1 (3850 orthologs).
Strains named in the ring legend (rings run outer to inner): Pseudomonas oleovorans
B4_CLC, Agrobacterium tumefaciens D8, Agrobacterium tumefaciens IsCLC, Pseudomonas
oleovorans GabCLC, Stappia stellulata FlCLC, Pseudomonas oleovorans IsCLC,
(T) Agrobacterium tumefaciens C58, (T) Pseudomonas mendocina str. ymp,
Limnobacter thiooxidans FCMA, (T) Burkholderia cenocepacia AU 1054,
(T) Ralstonia solanacearum GMI1000.
Color scales: percent protein sequence identity for bidirectional and unidirectional
best hits, spanning roughly 100 to 10 percent.]
Figure 1: Comparison of closest genome-sequenced type strains (T) to holobiont isolates.
Assemblies of the N. vectensis isolate genomes were imported into RAST and
annotated via its built-in ORF finder function (Aziz et al. 2008). Nucleotide
sequences of shared genes (determined by bi- and unidirectional BLAST analysis)
were compared within each population and against its closest available type strain.
In each diagram, one isolate from the population is chosen as the subject and is
shown at the top of the circle (Po_47, At_D5 and Lt_F1). The circles represent the
average nucleotide identities of the genes, with the color scale indicating the range
of this value. The closest sequenced type strain, the inner circle of each diagram,
has visibly lower average nucleotide identity to the isolates than the isolates within
a population have to one another, which illustrates the relatedness within our
isolate populations.
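
The distinction between bidirectional and unidirectional best hits used in Figure 1 can be illustrated with a short sketch. This is not the RAST implementation; it is a minimal example that assumes BLAST hits have already been parsed into tuples of (query gene, subject gene, percent identity, bit score). The function names and the toy gene identifiers are hypothetical and chosen only for illustration.

```python
def best_hits(hits):
    """Keep, for each query gene, the hit with the highest bit score."""
    best = {}
    for query, subject, identity, bitscore in hits:
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, identity, bitscore)
    return {q: (s, i) for q, (s, i, _) in best.items()}


def classify_hits(a_vs_b, b_vs_a):
    """Split genes of genome A into bidirectional and unidirectional best hits."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    bidirectional, unidirectional = {}, {}
    for gene_a, (gene_b, identity) in forward.items():
        # A hit is bidirectional when gene_b's best hit points back to gene_a.
        if reverse.get(gene_b, (None, None))[0] == gene_a:
            bidirectional[gene_a] = (gene_b, identity)
        else:
            unidirectional[gene_a] = (gene_b, identity)
    return bidirectional, unidirectional


def mean_identity(pairs):
    """Average percent identity over a set of gene pairs (None if empty)."""
    if not pairs:
        return None
    return sum(identity for _, identity in pairs.values()) / len(pairs)


# Toy input (hypothetical genes): subject isolate A against compared strain B.
a_vs_b = [
    ("a1", "b1", 99.5, 900.0),
    ("a1", "b2", 80.0, 300.0),
    ("a2", "b2", 95.0, 700.0),
    ("a3", "b1", 70.0, 200.0),
]
b_vs_a = [
    ("b1", "a1", 99.5, 900.0),
    ("b2", "a2", 95.0, 700.0),
]

bidirectional, unidirectional = classify_hits(a_vs_b, b_vs_a)
print(sorted(bidirectional), sorted(unidirectional), mean_identity(bidirectional))
```

In this toy case genes a1 and a2 are bidirectional best hits, while a3 is only a unidirectional hit because b1 pairs back to a1. Averaging identity over the bidirectional pairs gives the kind of per-strain value that the color scale in the figure summarizes.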
Appendix II: List of Holobiont-Specific Orthologous Groups
All designated as COG/NOG: COG category: Functional Description
Pseudomonas - Present in all 4 Strains
NOG09865    NA   Protein involved in cGMP biosynthetic process
NOG81651    R    Alpha/Beta protein
Pseudomonas - Present in 3 Strains
NOG127992   U    Efflux transporter, RND family, MFP subunit
COG3305     S    Predicted membrane protein
COG01658    R    Alpha/beta superfamily hydrolase
NOG238032   R    Antibiotic biosynthesis monooxygenase
Pseudomonas - Present in 2 Strains
NOG43354    S    Accessory processing protein
NOG82489    NA   Nuclear transport factor 2
NOG150660   L    Protein involved in plasmid maintenance
COG1223     R    Predicted ATPase (AAA+ superfamily)
COG5331     S    Uncharacterized protein conserved in bacteria
COG3623     G    Putative L-xylulose-5-phosphate 3-epimerase
Pseudomonas - Present in 1 Strain
NOG76531    R    Phospholipase D/transphosphatidylase
NOG67790    L    Integrase
COG3723     L    Recombinational DNA repair protein (RecE pathway)
NOG138108   P    Small multidrug resistance protein
NOG260384   S    Ring-Infected erythrocyte surface antigen protein
COG5588     S    Uncharacterized conserved protein
COG01389    M    Glycosyltransferase
NOG236196   NA   Photosystem I reaction center subunit VIII
NOG42892    R    Plasmid stabilization system protein
COG5322     R    Predicted dehydrogenase
COG5256     J    Translation elongation factor EF-1alpha (GTPase)
COG5486     S    Predicted metal-binding integral membrane protein
COG4231     C    Indolepyruvate ferredoxin oxidoreductase, alpha and beta subunits
NOG68061    R    Unsaturated glucuronyl hydrolase
NOG136402   NA   Major facilitator protein
NOG80712    R    Methyltransferase
COG0615     I    Cytidylyltransferase
COG1887     M    Putative glycosyl/glycerophosphate transferases
COG4922     S    Uncharacterized protein conserved in bacteria
NOG72023    R    Radical S-adenosyl methionine domain containing protein
Agrobacterium - Present in All 3 Strains
NOG41617    S    Helix-Turn-Helix protein, CopG family
COG5283     S    Phage-related tail protein
NOG45800    P    Mercuric transport protein
COG1021     Q    Peptide arylation enzymes
COG1169     Q    Isochorismate synthase
Agrobacterium - Present in 2 Strains
COG00181    E    ABC-type dipeptide/oligopeptide/nickel transport system, ATPase component
NOG115657   S    Prevent-Host-Death protein
NOG150960   S    Endonuclease
NOG27742    R    Udp-N-Acetylmuramoylalanyl-D-Glutamyl-2,6-Diaminopimelate-D-Alanyl-D-Alanine ligase
NOG145827   L    DNA methylase N-4/N-6 domain-containing protein
NOG86690    M    Cdp-Glycerol glycerophosphotransferase
NOG127640   NA   Bifunctional DNA primase/polymerase
NOG81601    P    Mercuric transport protein periplasmic component
COG3498     R    Phage tail tube protein FII
NOG47648    R    Gcn5-Related N-acetyltransferase
Agrobacterium - Present in 1 Strain
NOG241729   O    Tripartite motif-containing 63 protein
NOG47954    L    Crispr-Associated protein, VVA1548 family
NOG09685    R    Nucleotidyltransferase substrate binding protein
NOG68654    NA   Diacylglycerol kinase, catalytic region
COG1337     L    Uncharacterized protein predicted to be involved in DNA repair (RAMP superfamily)
COG1336     L    Uncharacterized protein predicted to be involved in DNA repair (RAMP superfamily)
NOG236395   S    Parallel beta-helix repeats
NOG38892    R    DNA polymerase beta domain protein region
NOG39129    R    N-Acetyltransferase
NOG79506    S    DNA primase
COG1367     L    Uncharacterized protein predicted to be involved in DNA repair (RAMP superfamily)
NOG145870   NA   Helicase-Like
COG1353     R    Predicted hydrolase of the HD superfamily (permuted catalytic motifs)
NOG44923    L    Crispr-Associated protein, NE0113 family
COG2859     S    Uncharacterized protein conserved in bacteria
Limnobacter - Present in Both Strains
COG01467    V    ABC-type multidrug transport system, permease component
COG4270     S    Predicted membrane protein
NOG124444   R    Gcn5-Related N-acetyltransferase
Limnobacter - Present in 1 Strain
NOG241729   O    Tripartite motif-containing 63 protein
NOG47954    L    Crispr-Associated protein, VVA1548 family
NOG09685    R    Nucleotidyltransferase substrate binding protein
NOG68654    NA   Diacylglycerol kinase, catalytic region
COG1337     L    Uncharacterized protein predicted to be involved in DNA repair (RAMP superfamily)
COG1336     L    Uncharacterized protein predicted to be involved in DNA repair (RAMP superfamily)
NOG236395   S    Parallel beta-helix repeats
NOG38892    R    DNA polymerase beta domain protein region
NOG39129    R    N-Acetyltransferase
NOG79506    S    DNA primase
COG1367     L    Uncharacterized protein predicted to be involved in DNA repair (RAMP superfamily)
NOG145870   NA   Helicase-Like
COG1353     R    Predicted hydrolase of the HD superfamily (permuted catalytic motifs)
NOG44923    L    Crispr-Associated protein, NE0113 family
COG2859     S    Uncharacterized protein conserved in bacteria
Stappia - Present in 1 Strain
NOG131267   K    Transcriptional regulator protein
NOG125598   NA   Chorismate mutase
COG00976    O    Protein-L-isoaspartate carboxylmethyltransferase
COG01364    C    Radical SAM superfamily enzyme
NOG127558   NA   Major facilitator superfamily MFS_1 protein
NOG05352    NA   Udp-N-Acetylglucosamine-Lysosomal-Enzyme N-acetylglucosaminephosphotransferase
COG2364     S    Predicted membrane protein
COG04817    M    CMP-N-acetylneuraminic acid synthetase
COG08079    R    AAA+ ATPase domain containing protein
NOG255023   NA   Plays a central role in 2-thiolation of mcm(5)SU at tRNA wobble positions of tRNA, tRNA and tRNA. May act by forming a heterodimer with protei
COG00516    R    Predicted flavin-nucleotide-binding protein
NOG139698   S    TRAP-T family transporter, DctQ (4 TMs) subunit
NOG123944   K    Helix-Turn-Helix domain protein
NOG77058    NA   3-Hydroxyanthranilate 3,4-dioxygenase
COG04213    I    Myo-inositol-1-phosphate synthase
COG01054    C    Pyruvate/2-oxoglutarate dehydrogenase complex, dehydrogenase (E1) component, eukaryotic type, alpha subunit
Appendix III: List of scaffolds from the JGI assembly of Nematostella vectensis that
are likely bacterial contaminants, based on BLASTn alignments to the 10 sequenced
N. vectensis-associated isolates
jgi|Nemve1|156831|e_gw.12797.7.1
jgi|Nemve1|156351|e_gw.10270.1.1
jgi|Nemve1|144785|e_gw.1158.3.1
jgi|Nemve1|3316|gw.7009.1.1
jgi|Nemve1|144756|e_gw.1152.1.1
jgi|Nemve1|148716|e_gw.2574.7.1
jgi|Nemve1|60966|gw.2574.4.1
jgi|Nemve1|148718|e_gw.2574.12.1
jgi|Nemve1|62952|gw.1345.3.1
jgi|Nemve1|68884|gw.9505.6.1
jgi|Nemve1|68384|gw.9505.4.1
jgi|Nemve1|156025|e_gw.9580.4.1
jgi|Nemve1|156293|e_gw.10013.4.1
jgi|Nemve1|9216|gw.10013.1.1
jgi|Nemve1|157196|e_gw.14966.6.1
jgi|Nemve1|145135|e_gw.1325.6.1
jgi|Nemve1|152225|e_gw.4762.1.1
jgi|Nemve1|61676|gw.3793.4.1
jgi|Nemve1|149172|e_gw.2834.1.1
jgi|Nemve1|145019|e_gw.1269.1.1
jgi|Nemve1|46382|gw.9892.1.1
jgi|Nemve1|144255|e_gw.981.2.1
jgi|Nemve1|155448|e_gw.8594.3.1
jgi|Nemve1|68429|gw.12503.2.1
jgi|Nemve1|153358|e_gw.6033.2.1
jgi|Nemve1|144964|e_gw.1244.3.1
jgi|Nemve1|68529|gw.1228.1.1
jgi|Nemve1|157171|e_gw.14678.2.1
jgi|Nemve1|157272|e_gw.15471.3.1
jgi|Nemve1|73238|gw.4271.1.1
jgi|Nemve1|78763|gw.18200.1.1
jgi|Nemve1|62331|gw.863.2.1
jgi|Nemve1|148720|e_gw.2574.3.1
jgi|Nemve1|157428|e_gw.16429.2.1
jgi|Nemve1|156988|e_gw.13707.1.1
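
A list of this kind can be derived from BLASTn tabular output by keeping only host sequences that align to an isolate genome with high identity over a long stretch. The sketch below is an illustration under stated assumptions, not the procedure used in this thesis: it assumes BLAST+ tabular output (-outfmt 6) in its default column order, and the identity and length cutoffs, the function name flag_contaminants, and the sequence identifiers in the toy input are all hypothetical.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def flag_contaminants(handle, min_identity=95.0, min_length=200):
    """Map each query passing both cutoffs to the set of isolate contigs it hit.

    The cutoffs are illustrative placeholders, not values from the thesis.
    """
    flagged = {}
    reader = csv.DictReader(handle, fieldnames=FIELDS, delimiter="\t")
    for row in reader:
        identity = float(row["pident"])
        length = int(row["length"])
        if identity >= min_identity and length >= min_length:
            flagged.setdefault(row["qseqid"], set()).add(row["sseqid"])
    return flagged


# Toy input with hypothetical identifiers: one strong hit and one weak hit.
example = (
    "host_model_1\tisolate_A_contig_3\t99.1\t850\t8\t0\t1\t850\t120\t969\t0.0\t1500\n"
    "host_model_2\tisolate_A_contig_9\t78.0\t60\t13\t0\t5\t64\t10\t69\t1e-5\t52.8\n"
)
flagged = flag_contaminants(io.StringIO(example))
print(flagged)
```

Only host_model_1 passes both cutoffs in this toy input. In practice the cutoffs would need to be chosen with care, since genes acquired by horizontal transfer can also produce high-identity alignments to bacterial genomes.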