project description - Faculty Support Site

advertisement
PROJECT DESCRIPTION
1. Results from prior support
Due to space constraints we will report here only on results from the first Nematode AToL award "ATOL:
Phylum Nematoda: Integrating Multidisciplinary Expertise and Infrastructure for Resolving Relationships
in a Major Branch of the Tree of Life" (DEB 0228692, $2,512,500, 10/1/02 - 9/30/07 + 1 year no-cost
extension) as the single most directly relevant prior support to our collaborative team.
1.1. Trainees: A total of 7 graduate students, 29 undergraduate students and 7 postdoctoral researchers
received training and conducted research as part of the first Nematode ATOL; over half of these trainees
were female and/or from ethnic minorities. We also collectively hosted 9 international professors on
sabbatical or fellowship visiting for 6-9 months each, as well as 15 additional international short-term
visitors for 1-8 weeks (including 7 graduate students).
1.2. Publications: We have published a total of 50 manuscripts during the course of the first NemATOL
project (see section 1 in references). and will list only some of the highlights here. A robust phylogeny
with good representation of Rhabditina (part of Clade V) was a featured cover article in Current Biology
and recommended as a "Must Read" by the Faculty of 1000 (Kiontke et al., 2007). Contributions in the
online third volume on Caenorhabditis elegans biology introduced the nematode model research
community to the latest developments in nematode phylogeny and to the methods and database
developed by our project (Fitch, 2005; De Ley, 2006). Invited reviews on the evolution of nematode
parasitism similarly informed, respectively, phytopathological and parasitological audiences (Baldwin et
al., 2004; Blaxter et al., 2004). In addition to direct project-related papers, data and database structures
produced during our project contributed to crossover research in image recognition algorithms (Wei et al.,
2008).
1.3. Workshops and symposia: We have organized five workshops with a collective attendance by 61
national and international experts (in addition to the 5 project PIs) as well as 8 graduate students. These
workshops brought together experts to create lists of priority taxa for sequencing, to review the current
state of taxonomic understanding in terms of available and unpublished data within each clade, discuss
provisional phylogenies based on the foregoing data, and (where considered possible) begin the process
of assembling morphological matrices for possible future analysis. In addition to the workshops, our
project CoPIs and graduate students chaired dedicated symposium sessions on nematode phylogeny
and Nematode AToL project updates at the annual meetings of the Society of Nematologists in 2003,
2004, 2005 and 2007. Each of these symposia had an audience of over 50 nematologists and included
brief clade-by-clade overviews of progress and problems with phylogenetic analyses.
1.4. Data advancements to nematode phylogeny: We collectively produced 500 novel near-complete SSU
sequences, 1020 new sequence fragments of approximately 600-1000bp of LSU and 669 new sequences
of various other loci of known or potential value for phylogenetic analyses. As originally planned, the first
Nematode ATOL proposed to obtain 25 orthologous gene sequences from 50 selected nematode species
by producing cDNA libraries for each of those species. However, as the project went through its first two
years it quickly became evident that cDNA library-based sequencing was no longer the optimal approach,
both from theoretical and practical standpoints. Opinion in the field of multigene phylogeny has evolved to
a stated preference for phylogenies based on hundreds of genes rather than dozens. Moreover,
massively parallel sequencing devices (see below) led to a revolution in molecular biology. Last but not
least, complete genome projects were planned (some in consultation with our project) and instigated for
at least a dozen additional nematode species.
We produced several “traditional” cDNA libraries (e.g., for Anguina sp. and Romanomermis culicivorax)
and conducted exploratory sequencing. In the course of this work, it became evident that much greater
gains could be expected from newer genomic techniques. PI Thomas therefore undertook a series of
preliminary analyses with massively parallel sequencing of cDNA (see 1.5. below as well as sections 3.1
and 3.2). This yielded datasets with hundreds of genes obtained at a fraction of the traditional cost from
each of five distantly related nematodes, including contigs representing nearly 200 orthologous genes
that are known or expected to be highly informative for phylogenetic analysis. Furthermore, several
orthologs were developed by CoPI Fitch for a PCR-based approach to phylogenetic analysis in Clade V.
CoPI Nadler has hosted visiting scholar Joong-ki Park in a collaborative exploration of the potential
usefulness of complete mitochondrial gene sequences for phylogenetic inference of nematodes. Results
of combined analyses of the 12 protein-coding mtDNA genes for 23 nematode species show strong
support for clades, and some disagreement with results based on SSU sequences.
During the workshops for Clades V and IV, sets of morphological characters and states were defined,
scored for >50 representatives of each clade and organized into a character matrix for each clade. During
the workshop for Clade I, international collaborator Dr. Peña Santiago summarized his ongoing
development of a morphological character matrix for freeliving members of that clade. Although many of
the morphological characters thus defined do not provide sufficient resolution by themselves, they often
contribute to resolution in important ways when combined with the molecular characters; several
morphological characters were found to be apomorphic for particular subclades. Here, we propose to
extend relevant data on morphological characters to representatives of major groups and to explore the
use of selected developmental charctersto resolve fundamental branches within the phylum Nematoda.
1.5. Methodological advancements to nematode phylogeny: We have collectively developed a number of
new or improved protocols and tools that greatly simplify and/or improve the quality and amount of
phylogenetic data obtainable from individual nematode species or even individual nematodes. CoPI De
Ley has experimented with a previously ignored preservative based on DMSO, EDTA and saturated NaCl
and demonstrated that it is particularly useful for preserving and transporting nematode specimens that
are still amenable to both PCR amplification and morphological identification (Yoder et al., 2006). In
combination with multifocal image vouchering via high-definition camcorder microscopy, this has opened
up a wide range of highly affordable investigations of combined morphological and molecular analyses of
nematode relationships (demonstrated in e.g. Tandingan De Ley et al., 2007). In parallel, CoPIs Thomas,
Nadler, and Baldwin have assembled electronically controlled microscopy and imaging systems that
provide a highly standardized approach to multifocal imaging and are an essential step closer towards
automation and high-throughput morphological vouchering for ecological or phylogenetic surveys of
nematodes (see e.g. Abebe et al., 2004). Yet another innovation pertains to genomic amplification of
individual nematodes or other interstitial invertebrates (De Ley et al., 2005). Our most recent innovation is
the testing of high-yield mRNA extraction and RT-PCR protocols on minute amounts of nematode
biomass (down to a single 5 mm long nematode), in combination with massively parallel sequencing of
the resulting cDNA (bypassing library construction) and improved assembly algorithms for the resulting
millions of sequence fragments per cDNA sample. Results from this far-reaching enhancement of
molecular capabilities are presented in sections 3.1 and 3.2.
1.6. The Nematol database: We have pursued a collaborative databasing strategy that aims to provide
open access to our sequence data as well as to the voucher images of our own project as well as those
of our national and international colleagues in the discipline. The first Nematode ATOL produced a
versatile and extensive website called Nematol, which not only acts as a database but also provides a
wide range of other utilities including online phylogenetic pipelines, nematode alignment and tree viewing
tools (http://nematol.unh.edu/tree/tree.php), and geographical mapping functions linked with the source
locations of sequenced and/or imaged nematode specimens. To date, Nematol has collections of 11,688
DNA sequences (6057, 18S rDNA; 3050, 28S rDNA; and 2581 ITS) and 4,549 image and video files
(images 138 files, videos 4411 files) of 627 nematode specimens, which were isolated from 273 samples
collected from 208 distinct locations in 26 countries. These nematodes belong to 274 genera and 87
families). It remains clear that because of the quality of the morphological information contained in the
vouchers that they are extremely useful for communicating morphological information and are becoming a
'gold standard' in taxonomy (Eyualem et al., 2006).
2. Specific aims of the new Nematode Tree of Life proposal:
The previous nematode AToL was highly successful in mobilizing the community of nematode
researchers and producing extensive datasets of ribosomal sequences from major and minor clades
across the entire phylum. Our published and ongoing analyses demonstrate, however, that several
important evolutionary radiations within the phylum cannot be adequately resolved with presently
available data. We therefore propose to extend the collective results obtained during the first nematode
AToL project by bringing to bear on these outstanding problems in nematode phylogeny a much wider
range of data and methodologies.
Aim 1: To resolve major relationships by vastly increasing the number of orthologous molecular
characters
During the first nematode AToL, we developed and tested several protocols that allow us to start from
extremely small amounts of nematode tissue and obtain sequences from large numbers of orthologous
genes with high information content for phylogenetic analysis. We now propose to apply the most
successful and robust among these protocols to a much wider range of species across the phylum than
was previously possible, emphasizing especially the areas of low resolution that correspond mainly to the
origins and early evolution of the whole phylum, as well as the origins and early evolution of Rhabditida
sensu lato, which is a monophylum that includes most of the medically and economically important
parasitic nematode clades.
Aim 2: To use morphological, developmental, and certain other characters to test hypotheses of
relationship in particular parts of the tree where experience suggests they will be relevant
In addition to obtaining multigene data across the phylum but emphasizing these major and as yet
unresolved radiations, we will also bring together published data and obtain new information for a breadth
of other types of characters from mitochondrial gene orders, morphological and ultrastructural studies, as
well as developmental lineages and patterns.
Since truly phylum-wide study and analysis of such character suites is not possible within the limited time
and resources of one project, we will focus specifically on those aspects for which previous and ongoing
research has shown the greatest direct relevance to the unraveling of the major radiations. In this
manner, our new project will have major impacts on science and education by consolidating the
infrastructure creations and collaborations established during the first nematode AToL project. It will
moreover integrate and reciprocally inform a much wider range of biological disciplines by targeted
inclusion of these characters.
3. Plan of Work
3.1. Archiving images and material for specimens representing taxa
3.1.1. Multifocal imaging/Video Capture and Editing (VCE).
NemATOL I implemented novel approaches to link molecular data to morphological vouchers, insuring
future verification of correct species identification, even after destruction of the actual specimen for DNA
extraction; this is accomplished by through-focus digital images linked with sequence data. In the present
study, live nematodes selected for Solexa will be assigned a unique ID, immobilized (using routine
mounting techniques; De Ley et al., 2005), and multifocal images recorded by VCE. Alternatively, in rare
cases where live worms are not available, killed worms for Solexa, shipped for example in RNAlater
buffer (http://www.ambion.com/techlib/resources/RNAlater/), will be otherwise vouchered for identification
(see below). Alternatively, nematodes selected for DNA amplifications also will be assigned a unique ID
and heat-immobilized or preserved in DESS (Yoder et al., 2006; Tandingan De Ley et al., 2007) for
similar video recording prior to DNA extraction and PCR amplification (De Ley et al., 2005). Key
diagnostic characters of the specimens will be imaged as described in De Ley & Bert (2002) and updated
in Yoder et al. (2006). Most of the collaborating PIs already have appropriate digital cameras for this
purpose. The resulting images are edited, archived in multi-platform video formats, and integrated into
the NemATOL website database. When feasible, digital images will be made for some specimens
subsequently preserved as conventional vouchers and deposited in museum collections; yet the digital
images have the advantage of preserving data from fresh material and in their capability to be the precise
morphological voucher of an individual exactly as it appeared when first observed with light microscopy.
We propose that all material for the project be vouchered, including specimens destructively sampled for
molecular data as: 1) through focus videos available online, 2) conventional slide mounted voucher
specimens deposited in a recognized, curated collection, such as the UC Riverside or UC Davis
nematode collections. Voucher images and/or slide mounted specimens will be linked to all metadata
appropriate to a specimen or sample.
3.1.2. Frozen strains and material
Where possible, live strains will be cultured (and if possible, cryogenically preserved alive), as we have
done for many Clade V species (e.g., the NYU Rhabditid Collection). For species that can be isolated in
bulk but not cultured (e.g., many parasites and some terrestrial freeliving species), material will be kept at
-80C or in liquid nitrogen dewars. In many cases (e.g. marine nematodes and other single-isolate cases),
neither archive method will be possible and only image archiving can be used.
3. 2. Multigene molecular data
3.2.1. Nemalogs: pan-orthologous genes for phylogenetic analysis of Nematoda
The genetic loci of choice for inferring phylogenetic frameworks for the phylum Nematoda have
been the nuclear ribosomal RNA genes (SSU and LSU), with occasional attempts to incorporate other
putative nuclear othologs (Kiontke, et al., 2007). The broadest evolutionary frameworks for the phylum
have all been based primarily or exclusively on SSU sequences (e.g. Blaxter et al. 1998, Meldal et al.
2006, Holterman et al. 2007). While much has been gained from these analyses, there are three general
reasons why additional orthologous loci are critical to advancing our understanding of the relationships in
the phylum. First is the need for more phylogenetically informative characters. The SSU and LSU loci
have a finite amount of signal, often insufficient to resolve deep and/or closely spaced branches. Second,
we need many putative orthologs across the phylum because any single gene’s history can be
significantly misleading. The definition of orthologs is imperfect and greatly affected by gene duplication
and loss, processes that are common in genome evolution (Lynch and Conery, 2003; Sebat et al., 2004,
Tuzun et al., 2005; Korbel et al., 2007). Consequently, any single putative ortholog could produce
erroneous inferences and the power of the approach is achieved only by establishing a consensus among
putative ortholog gene trees. Finally, because evolutionary rates vary between genes and among data
partitions within genes; different putative orthologs may be useful for addressing different phylogenetic
questions at different timescales.
As part of our efforts to expand the number of genes for phylogenetic inference, we have refined and
automated an approach developed by Blair et al. (2005) to identify single-copy, pan-ortholog sequences
across a large set of genomes using both complete genome sequences and unigene sets derived from
large-scale EST projects. The goal of this project was to develop a relational database containing precomputed sets of pair-wise gene comparisons that can be queried to give all known orthologous genes
for the entire phylum or any subset of taxa. The tool developed utilizes two distinct methodologies for
prediction of orthologous genes, a method based on Reciprocal Best Blast (RBB) as implemented by
Blair et al. (2005) and an alternative approach based on a threshold percent maximal bit score as
developed by Lerat and colleagues (2003). In anticipation of the growth of genome wide datasets (as
proposed here) we have developed a pipeline for analysis that includes a relational database and a web
interface (http://nematol.unh.edu/ortholog/index.php) allowing users to select taxa and methods of
analysis, and then download alignment ready putative orthologs in real time from a precomputed set for
all nematodes with significant genome-wide sequence representation.
3.2.2. Datasets:
Table 1. Datasets included in our preliminary development of the Nemalog Database
Genes analyzed
Clade
Species
Genera
per species
In the development of this approach,
V
9
6
2654-25775
we leveraged several sources of data including
IVB
12
5
488-11174
predicted genes from completely sequenced
IVA
3
2
3478-5234
genomes and unigene sets from cDNA/EST
III
5
4
2113-17843
sequencing projects. For the EST libraries, the
II
0
0
0
data included in this analysis was derived from
I
2
2
5448-5951
the unigene clusters available at
http://www.nematode.net. The data for both C. elegans and C. briggsae was based on complete genome
sequences (http://wormbase.org). The B. malayi data was derived from two sources, the ESTs from
http://nematode.net and a set of predicted genes available from the genome sequencing project. These
genomes and EST projects have been excellent sources of comparative information but do not represent
a thorough sampling of the biological/phylogenetic diversity of this large group. Most specifically Clade II
is as yet completely missing from these resources and with the exception of multiple Caenorhabditis and
Pristionchus species in Clade V, all available genome wide datasets are from parasitic nematodes. The
broad expansion of orthologous genes for phylogenetics will require both the generation of genome/EST
datasets from many more as yet unsampled lineages and methods to determine orthologous genes
among those datasets.
Figure 1. BLAST pipeline for determining intersection sets among datasets.
3.2.3. Pipeline for discovery of
putative orthologs.
A diagram of our pipeline for panortholog discovery (Figure 1)
begins with a pair-wise protein
sequence based BLAST of all
genes/unigenes from each
organism against all other
organisms. All of the results (top
five hits) are cached. This includes
self BLASTs to allow identification
of inparalogs. From these results
we are able to implement multiple
methods of analysis as queries of
the database and currently support
both RBB and the method of Lerat
et al. (2003).
The web interface allows users to
select taxa and methods. As implemented in our analysis the results are very similar to those based on
much smaller numbers of taxa using the same methodology (Blair et al 2005). By applying these
approaches to the current datasets it has been possible to identify large numbers of putative orthologs.
As expected, the number of putative orthologs found depends greatly on the number of genes/unigene
known for the inclusive set, the divergence of taxa selected, and the number of taxa selected. However,
even with the relatively limited resources and very divergent lineages we have identified 14 nematode
pan-orthologs.
Figure 2. Selected
results from the
Nemalogs pipeline
showing the number of
putative orthologs
identified based on
subsets of species
(marked *) with genome
wide resources placed
upon the SSU
phylogeny for those
taxa.
3.2.4. Outcome and limitations:
In an analysis of the 14 putative nematode panorthologs derived from 8 species we find that the
concatenated set results in a branching pattern that is the same as that predicted by the SSU analysis
presented Figure 2. Inference based on individual genes did not always support all clades but no
significant conflicts were identified with any single gene. Based on these analyses we now feel we have
a useful method for expanding the number of orthologous genes to be analyzed using current resources.
However, it is also obvious that we will need to dramatically increase the number of taxa for which we
have large-scale gene sequences. Furthermore, alignment of the orthologs resulting from this analysis
demonstrates that the design of conserved/degenerate primers for assay of other divergent nematodes
will be limited to a very small subset of genes. That approach has not proven workable in this group and
ultimately is not cost effective. These observations are a major reason we are proposing the dramatic
expansion of EST resources from 160 nematode species carefully sampled across the phylum
3.3. Solexa-Based EST Sequencing: a novel, rapid and low-cost solution for obtaining hundreds
of gene sequences from 160 nematode species
3.3.1. Logic of the approach:
Recent innovations in sequencing have resulted in two mature technologies Illumina Genome Analyzer
(Solexa) and the Roche Genome Sequencer FLX (454, Margulies et al. 2005) both of which now make it
possible to generate vast amounts of sequence data from small amounts of starting material without
cloning. While the sequence reads generated in these approaches are short (36-250bp), the per-base
cost is a fraction of traditional sequencing. Toward our goal of expanding the genes for phylogenetic
inference across this diverse phylum we are evaluating both massively parallel sequencing approaches.
However, based on our preliminary success described below with the Solexa approach and its already
very low cost, we are concentrating our efforts on that technology because we have determined that it
has the potential to rapidly expand our genome-wide, nematode datasets for discovery of orthologs for
phylogenetic analysis.
Our approach aims to discover large numbers of orthologous genes by generating EST-like sequences
from approximately 160 nematode taxa representing key lineages across the phylum. Using the Solexa
method to sequence cDNAs, it is theoretically possible to generate more unigene sets than are currently
available for even the best EST databases using traditional approaches. The major hurdle for the Solexa
approach is to cluster and assemble the unigenes from millions of short reads (currently 36bp, however
new systems being set up at UC-Riverside and UC-Davis will produce 50bp reads). In theory and in
practice with template from small, non-repetitive, homozygous genomes (e.g. E. coli) these reads can be
very effective for assembly of relatively large contigs given sufficient sequence coverage (Chaisson et al.,
2004; Whiteford et al., 2005). While the coding sequences of eukaryotic genomes represented in the
cDNA are less repetitive than the genome as a whole, and are only a fraction of the total size of the
complete genome, these cDNAs nevertheless represent specific challenges to assembly.
Table 2. Solexa sequencing results and contigs formed for 5 diverse nematode species.
SSAKE contigs
(set 1/set 2)
35,572/35,382
36,182/36,600
25,122
40,207
Reads
(set 1/set 2)
6,860,114/5,898,906
6,577,851/6,349,153
5,233,765
7,211,175
First, in real samples,
including normalized cDNA
preparations, each gene is
represented by different
numbers of cDNA copies
with over-coverage of some
genes and under-coverage
36,157
6,489,748
of most. Furthermore, in
44,620,712
samples from natural
isolates of diploid
organisms as proposed herein, heterozygosity is expected to significantly interfere with assembly of
contigs, a phenomenon that becomes limiting with short reads. Overall, the extraordinary number of
reads in each Solexa run is expected to maximize gene discovery, even though the effect of
heterozygosity is expected to limit the initial steps of contig formation. Contaminant cDNA from non-target
organisms may also create specific problems, but various steps can be taken to avoid or minimize these.
Prokaryotic mRNA can be largely excluded a priori by targeting polyA tails during cDNA synthesis, and
eukaryote contaminants that are not close relatives of the target organism can be filtered out a posteriori
on the basis of codon usage analysis among genes. In cases where closely related nematode DNA is
likely to be included (e.g. as prey tissues in the gut of predatory nematodes), problems can be avoided by
starving the target organism for several days before mRNA and cDNA preparation.
To test the feasibility of this approach using real samples we have generated EST datasets based on
Solexa reads from high-yield cDNA preps produced from minimal quantities of tissue (1-100 individuals)
Species
Plectus minimus
Romanomermis culcivorax
Hemicycliophora sp.
Thoracostoma
microfenestratum
Diplolaimelloides meyli
Total
for 5 diverse nematode species. With the specific goal of finding sequences representing orthologous
genes, we developed an analysis pipeline. This pipeline begins with the raw reads. These reads are first
assembled using the program SSAKE (Warren et al. 2007) to produce sequence contigs. The output
from SSAKE is then treated as typical EST sequence and assembled into putative unigenes using the
CAP3 assembler (Huang and Madan, 1999). Finally, these unigenes are then compared to the other
annotated genomic resources for nematodes to add them to the database of nematode orthologs. In our
initial tests to evaluate gene discovery using a common metric for all samples we have compared the
unigenes produced by our analysis to a set of 285 pan-orthologs from 9 metazoan species (Blair et al.,
2005). In practice, we will simply add these unigenes to the Nemalog pipeline. There are key aspects of
this approach that make it particularly useful for our goals of discovering and using orthologous
sequences. First, orthologous genes show a strong tendency toward common housekeeping functions
and generally high levels of expression (Lee et al., 2002). Consequently, orthologs are more commonly
found in EST sequencing databases. Second, while the Solexa sequences are short, they are long
enough that when translated in silico to generate significant amino acid matches in protein-sequence
based BLASTs. To test these ideas we sequenced cDNAs from 5 different nematode species. (Table 2).
From these species, a total of more than 44 million 36-base pair reads were generated. The reads from
each species were subjected to the analysis pipeline described above. In two cases multiple flow cells
were used for one cDNA sample. Due to processing time constraints we have only combined the
analysis of both read sets for Plectus minimus.
3.3.2. Test cases and development of parameters for assembly:
One of the limitations to this analysis is the processing time associated with SSAKE contig formation. To
speed up the evaluation and optimization process in our preliminary analysis, we reduced the total
dataset for Romanomermis by selecting only those reads that were from the mitochondrial genome of this
species by blasting all reads against the published Romanomermis mitochondrial genome (Azevedo &
Hyman, 1993). This analysis reduced the reads to be assembled and clustered from 14 million to 256K
allowing rapid assessment of contig quality and optimization of parameters. Relative to the
Romanomermis mitochondrial genome these reads represent >200-fold coverage, and as expected,
subsequent alignment of these contigs with the published genome gave on average hundreds of
overlapping contigs from SSAKE for each of the 12 mitochondrial, protein-coding genes. The consensus
of which represents deep coverage of all 12 genes with some loss of coverage on the extreme 5’ and 3’
ends of each coding sequence, as expected. Based on the evaluation using this reduced dataset we
have refined our parameters for the initial contig formations and followed this approach on complete
datasets. To evaluate our ortholog discovery we used the 286 animal wide pan-orthologs (Blair et al.,
2005) as BLAST targets to assign all clusters to specific genes. This approach permits the generation of
overlapping sets of clusters that can be combined to form putative orthologous gene sequences.
Table 3. Number of sequence contigs assigned to each of the 285 pan-orthologs for each species.
Pan-orthologs 1
or more contigs
>3
10+
of 285 possible
contigs
Contigs
Clade
Species
Plectus minimus
284
258
179
most probable outgroup
to III+IV+V
Romanomermis
187
26
0
I
culcivorax
Hemicycliophora sp.
142
81
18
IV
Thoracostoma
258
134
20
II
microfenestratum
Diplolaimelloides meyli
222
60
7
outgroup to
Plectida+III+IV+V
The numbers in Table 3 represent the number of putative unigenes in the Solexa analysis pipeline (Post
CAP3) that hit one of the 285 putative metazoan panorthologs (Blair, 2005). These results represent the
first-ever multigene data obtained from a Clade II nematode, as well as first multigene data from the
phylogenetically crucial outgroups to the Clades III, IV and V. They demonstrate the versatility and
applicability of our approach across phyletic diversity and suggest a very high rate of discovery of likely
pan-orthologs. This is particularly true for the Plectus minimus sample where we have successfully
analyzed more than 12 million reads to date (almost twice as many as for the other species). In this case
all but one of the likely pan-orthologs had at least one matching unigene sequence from the Solexa
analysis of that species, while the majority had more than 10 matches.
3.3.3. Next steps:
This analysis demonstrates that the Solexa reads can be used effectively to find gene sequences that are
likely to be panorthologs from these species and from many others that were hitherto inaccessible or
highly impractical for genome-scale sequencing. Our intention is to extend this analysis to use the
unigenes produced by the analysis pipeline for Solexa sequences in the generation of Nemalogs
(described above). To do this we must first develop consensus sequences for unigenes that appear to
match a single gene in other nematode taxa. As is clear in Table 2 many genes are represented by
multiple contigs from the analysis pipeline. This is largely due to an inability to assemble short contigs into
single unigenes, resulting often in largely overlapping contigs that failed to assemble with one another
due to sequence mismatches resulting from heterozygosity (or sequencing errors). While we can extend
these contigs by reducing the stringency of assembly we have chosen instead to conduct this final step
after the BLAST analysis in the Nemalog process by modifying that pipeline. The advantage of this step
is that we will avoid making chimeric assemblies and allow for the production of gapped assembly where
a transcript is not completely represented by sequence contigs. Because the goal of this analysis is to
generate orthologs that can be used in phylogenetic analysis, contigs that do not align well with their
nematode homologs are of minimal use in phylogeny. Similarly, regions of coding sequence that do not
match well (and will be selectively excluded in this analysis) were of little use to our goals as they will be
excluded in the alignment. However, these contigs from partially assembled genes will provide
information for primer design to allow the filling of gaps in these sequences. To address recognized
bottlenecks in the analysis, specifically the assembly of Solexa reads into contigs using SSAKE, we have
engaged the cooperation of the ToL CIPRES group (http://www.phylo.org) and supercomputing facility in
San Diego.
3.4. Analysis of mitochondrial gene sequences and gene order
As mitochondrial genes evolve at rates different from each other and from nuclear loci, the 12 protein
coding and two organelle-specific ribosomal RNA genes found in nematode mtDNA are likely to have
phylogenetic utility at several different levels within the nematode tree. The phylogenetic signal in these
genes may well enable improved resolution of branching order in parts of the nematode tree that are left
unresolved in nematode clades with especially slowly evolving SSU and LSU sequences as occur
especially in Spirurina (Clade III), Tylenchina (Clade IV) and Dorylaimia (Clade I). Analysis of
mitochondrial DNA (mtDNA) sequence data has long been a popular and useful tool for molecular
phylogenetic analysis and discussions continue unabated of its relative merits and weaknesses compared
to informative nuclear loci (Boore, 2006). Phylogenetic analyses of complete mtDNA genomes for
nematodes have included few species to date (Hu et al., 2003); our preliminary analyses show strong
support for clades (Figure 3), and certain conflicts in comparison to nuclear ribosomal DNA trees (e.g.
Blaxter et al., 1998; Holterman et al., 2006; Meldal et al., 2006). We propose to include this important
approach not only as part of the multigene dataset per se, but also to provide hypotheses for
mitochondrial gene order evolution and to compare the evolutionary rates of nuclear and mitochondrial
genes. The overall size of the multigene dataset to be accumulated in the course of our project creates
unique opportunities for addressing many questions specific to phylogenetic affinities among nematodes
as well as larger questions regarding metazoan mitochondrial genome evolution.
Accepted wisdom is that mitochondrial transcriptional organization, or gene order, is generally stable at
lower taxonomic levels (e.g., families) but can vary at higher ranks, as among phyla (Boore and Brown,
1998, Boore, 1999). Gene order has been advocated as a uniquely homoplasy-free source of cladistic
characters and typically employed to resolve ancient relationships (e.g. Ryu et al., 2007) but with varying
degrees of success (Boore 2006). However, rates of gene order change within the two major nematode
taxonomic classes are decidedly different (Hyman, 2007; Tang , 2006). While syntenic relationships are
often conserved among the 14 Chromadorean mitochondrial genomes sequenced, no two gene orders
are the same among the eight available complete Enoplean mtDNA sequences, nor among three
congeners from the genus Romanomermis. The use of mitochondrial gene order as a first-choice
phylogenetic tool is at best controversial, and we are aware of the many pitfalls and unknowns that must
be considered (Boore, 2006). However, in the case of Chromadorea, we believe that the occurrence of
large blocks of synteny reflects a tractable history of individual mutations, matching the premises of
Rokas and Holland (2000) about the phylogenetic advantages of rare genomic changes. The opportunity
is now available for selective application of this method across a targeted range of chromadorean species
where we expect it to be particularly valuable.
Figure 3. Combined analysis of all 12
mtDNA genes of all currently available
nematode species (Soonthornpong, Nadler
& Joong-Ki, unpublished; Hyman et al.,
unpublished). Identical topologies are
obtained maximum likelihood, maximum
parsimony, and neighbor-joining methods at
amino acid level and catenated nucleotide
sequences including two ribosomal RNA
genes with the 12 protein coding genes
(excluding 3rd codon positions).
Nematode mtDNA genes which are among
the most highly expressed RNAs in tissues,
based on the prelimary analyses described
above will be well-represented in data from
the high-throughput Solexa sequencing
effort. As such, large scale mitochondrial
gene-by-gene comparisons can be
accomplished for all 160 taxa analyzed.
While immature mitochondrial transcripts are
typically poly-cistronic and would be useful
for deducing adjacent genes and gene
orders, it is the processed, polyA+
transcripts that provide template for Solexa
high-throughput runs. As such, it seems
unlikely that Solexa sequencing will enable
direct assembly of mitochondrial gene orders. However, divergently oriented primers can be developed
for each mitochondrial gene sequence derived from the Solexa sequencing effort, enabling a PCR map of
adjacent gene pairs with successful amplifications. This strategy creates a simple two-step process (if
necessary) for constructing gene orders in a facile, high-throughput fashion. Analysis of mitochondrial
gene orders in nematodes has also been greatly facilitated by our applications of rolling circle
amplification (Tang and Hyman, 2005, 2007), and it is anticipated that larger scale mtDNA sampling
combined with selective studies of gene order will prove especially valuable in helping us to resolve the
phylogenetic relationship of the outgroup orders Plectida and Monhysterida to the three major clades (III,
IV and V) contained within the order Rhabditida. Collectively these three orders constitute a deeply
embedded monophylum within the Chromadorea.
We therefore expect that levels of synteny in this clade will be consistent with previous findings within this
class, and appropriate for unambiguous retrieval of tractable gene rearrangements. This solution
requires resolution of relatively ancient associations; mitochondrial gene order has found utility for this
purpose in many studies (reviewed by Boore et al., 2005).
3.5. Selective analysis of phylogenetically informative development processes
Data on developmental variation among species can be fairly independent of morphological variation (e.g.
Kiontke et al., 2007) in the final adult and are expected to be independent of DNA sequence data. Like
morphology, however, much of the developmental data obtained to date have revealed a striking degree
of homoplasy and when used by themselves to construct phylogenetic trees leave little that is resolved
(because homoplasy in one character is usually independent of homoplasy in another developmental
character; Kiontke et al., 2007; Brauchle, Kiontke, Fitch & Piano, submitted). For example, multiple
convergences and reversals have occurred in male tail tip development, yet most changes occur in the
ancestors of well-marked clades (see Kiontke & Fitch, 2005). Despite this homoplasy when analyzed
alone, developmental datasets—like morphological ones—tend to increase the resolution of the
molecular-based trees when combined in a "total evidence" approach (Kiontke & Fitch, unpub.),
presumably because the multiple changes in homoplasious characters can serve as apomorphies for
multiple clades supported by the preponderance of the data. That is, the homoplasy of these characters
is not due to significant conflict in phylogenetic signal between morphology, development, and molecules.
Importantly, there are several developmental characters that have been found to change rarely and are
very clearly apomorphic for particular clades. For example, the orientation of a cell division in the 2-cell
embryo flipped 90˚ in an ancestor of the clade that includes Protorhabditis, Prodontorhabditis, and
Diploscapter (Brauchle et al., unpub.; Dolinski et al., 2001). A change in anteroposterior (AP) axis
specification due to sperm entry point is a synapomorphy linking Panagrolaimorpha with Tylenchina
(Goldstein et al., 1998; De Ley & Blaxter, 2004). Other early developmental events may also be useful in
indicating basal clades (Schierenberg, 2005). A novel ventral migration of the gonad's distal tip cell is
synapomorphic between diplogastrids and Rhabditoides inermis (Kiontke & Fitch, unpub.). The posterior
positioning and the timing of migration of the phasmids relative to male genital papillae are also
informative developmental characters marking particular clades, at least in rhabditids. The latter events
are clearly revealed with immunofluoresence using an anti-adherens antibody, MH27, which we have
demonstrated has a broad species range (at least Clades IV and V). Thus, an aim of the present grant is
to incorporate some developmental characters that appear to change rarely into a "total evidence"
approach to test if these datasets will aid resolution of the major branches of nematodes. Such
characters will include some easily scored early developmental processes (e.g., AP axis specification
events as correlated with P granule segregation; orientation of the earliest cleavage division planes;
certain clear cell migration events, including those involved in positioning male genital papillae). Taxa will
be chosen for developmental analysis by their representation of major clades on the molecular tree and
by other criteria, such as ability to culture embryos). For scoring these characters, we will use currently
available imaging systems in our labs. In the total evidence analysis, we will check for robustness of the
phylogenetic conclusions by assessing the effects of different character weighting schemes (e.g., each
character set equally weighted vs. all characters equally weighted), different methods (parsimony,
Bayesian), and different models for character-state changes (e.g., unordered vs. ordered, if there are
multistate characters in which a stepmatrix model can be envisioned).
3.6. The role of morphology in resolving major radiations
3.6.1. Using morphology to inform phylogeny.
Our experience underscores that diverse and often independent datasets are complementary and inform
one another toward resolving phylogenies, molecular and morphological character sets being the
prominent example. NemATOL I leveraged clade-specific workshops to discuss and define characters
and states to be assessed for phylogenetic utility. These character sets emphasized traditional aspects
of morphology to be collected and verified by individual labs in conjunction with relevant new data on
character homologies and polarity from developmental and other evidence. Morphological characters
were primarily evaluated and used for phylogenetic inference by groups of investigators working within
clades rather than across the entire phylum. For some groups (e.g. Rhabditina, Clade V) there is a more
detailed characterization of certain classical morphological characters, and a more advanced
understanding of their informativeness for phylogenetic analysis (Kiontke, Sudhaus and Fitch, unpub.).
For example, a change from a pore-shaped to a slit-shaped vulva is apomorphic for a clade we
temporarily call” Eurhabditis”. That the Rhabditis-Adenobia group is the sister taxon of Oscheius is
supported by the loss of the pharyngeal median bulb. All species in the “Mesorhabditis group” share a
posterior vulva, prodelphic gonad, and complex oviduct and uterus. However, in most cases, molecular
frameworks suggest that many characters traditionally used to define taxa are proving to be
homoplasious and would be uninformative or even misleading for defining phylogenies if used by
themselves. Despite this homoplasy, our morphological character sets tend to increase the resolution of
several clades when combined with molecular data in a “total evidence” approach (Kiontke and Fitch,
unpub.). Thus for morphological characters that are represented across the phylum, we proposed to
extend these analyses to other taxa representing additional major clades (i.e., ~20 members of each of
the major clades I-V). Specific taxa will be chosen by their representation of molecularly defined
subclades and availability of good specimens.
Problems with morphological characters (which often lack detailed characterization) include
misinterpretation, phenotypic plasticity, convergence, non-independence and epigenetic factors as well as
the large number of continuous characters that can be difficult to recode (Fitch, 1997, Subbotin et al.
2006 Subbotin et al. in review1). Although some of these problems also affect molecular data, resolving
them for nematode morphology must take into account that these traditional characters are often at the
limits of light microscope resolution (thus leading easily to misinterpretation). Against this background, in
the context of NemATOL I, we have demonstrated the value of employing SEM, TEM (including 3D
modeling), and confocal-LM, to discover new (and reinterpret classical) characters in representatives of
major clades including Rhabditomorpha, Cephaolobmorpha, and Tylenchomorpha, (Baldwin et al., 2004;
Bumbarger et al., 2006, 2007, submitted, Dolinski and Baldwin, 2003). These approaches have
characterized features that appear homologous and promising for testing specific hypotheses of
relationships within Nematol II, including special emphasis on slowly evolving morphological features that
may help resolve nodes that have weak support in current molecular trees.
3.6.2. Selection of taxa for targeted morphological investigation
The scope of Nematol II limits the number of taxa that can be evaluated with technically demanding
morphological tools; thus to maximize overall phylogenetic insight for the phylum, it is crucial that these
representatives be selected to best reflect character states across clades and outgroups; this process is
supported in Nematol II as major clades are increasingly resolved. Based on our preliminary studies as
to what will be the most productive we propose to target: 1) Aspects of lip patterns as observed by SEM,
2) Cellular basis of feeding structures via TEM reconstruction. This includes, in the stoma region, anterior
hypodermal, arcade and anterior pharyngeal cells as well as strategic regions of the pharynx including the
basal region/bulb, 3) The basic topography of the nervous system in key areas including the head region
(sensory organs), and innervation of the male tail (e.g. comparing underlying topography of males with
and without caudal sensilla (e.g., papillae) expressed at the surface); this includes the phasmid which is
considered a synapomorphy for the Secernentea sensu lato (= Rhabditida) radiation and may or may not
include homologs outside Secernentea2 4) Architecture of the secretory/excretory system across the
phylum with respect to patterns of glands and canals and expanding on the basis for classical bifurcation
of the phylum (e.g. Secernentea and Adenophorea sensu lato; Chitwood and Chitwood, 1977)3. This
region is also promising for resolving branching of clades within the Secernentea radiation, including
apparent synapomorphies in the particular topography of lateral canals 4. 5) Additional promising
characters relevant to major branches throughout Nematoda is suggested by ongoing work of
collaborator Yushin from TEM of diverse sperm (Yushin and Malaknov, 2004, Yushin and Coomans,
2005), and 6) In patterns of early development by collaborator Schierenberg from 4D resolution of early
embryos (Schierenburg, 2006). In addition to participants Yushin and Schierenburg, collaborators GiblinDavis and Poiras, have long interacted with Baldwin on several projects including classical taxonomy with
SEM and TEM; this expertise combined with their ongoing surveys focusing on groups underrepresented
(i.e. insect associates) and potentially key to some important radiations, will be invaluable to both
morphological and molecular aspects of the proposed projects.
1
For example, in NemATOL I testing clade Pratylenchus demonstrated a large set of classical
characters to lack phylogenetic information. However a very small set of redefined classical features and
new characters from SEM of lip patterns, while too small for a combined analysis with molecular data, did
provide a basis for demonstrating a striking level of congruence using Baysian approaches and pie charts
to demonstrate and convey the strength of each node and the strength of each character state at the
node (Subbotin et al., submitted).
2
By example it would be phylogenetically informative if the pattern of innervation essential for
phasmids was conserved in taxa such as Plectida that are sister to the Secernentea, and furthermore if
that same conservation does or does not extend more broadly to additional outgroups.
3
The excretory system may also prove informative within clades. Its structure was one of the few
morphological characters congruent with molecular trees for Ascaridoidea (Nadler and Hudspeth, 2000)
4
By contrast to tradition TEM methods, fixation by freeze substitution (Bumbarger et al, 2006,
2007) especially preserves excretory canals that are open and readily resolved.
Whereas the morphological selections are undoubtedly ambitious in the context of NemATOL II, we
propose that the work is tractable because 1) NemATOL II will extend the phylogenetic framework as a
basis to efficiently target the most useful representatives of clades, 2) we are building upon long
experience of PIs and collaborators including that gained through NemATOL I, 3) areas of special
morphological interest can often be elucidated simultaneously. By example, innervation of the anterior
region is resolved in conjunction with reconstruction of the stoma (Bumbarger et al., 2006, 2007, Figure 4)
and regions of the pharyngeal base and excretory system involve investigating (as for TEM sections) the
same region, 4) with development of emerging software and microscopy tools which are already available
in our labs, technical processes can often be simplified to access characters across a wider range of taxa.
For example, phylogenetically informative patterns of radial pharyngeal muscle cells (De Ley et al, 1995,
Baldwin and Eddleman, 1995, Baldwin et al, 1997, Dolinski, and Baldwin, 2003) first elucidated with TEM
reconstruction can now be used to interpret character states across a range of taxa using more rapid
confocal procedures in conjunction with specific florescence of membrane junctional complexes. The
demonstration of confocal microscopy as a useful shortcut has been developed in the Baldwin lab by
collaborator Burr, and has built upon methods previously used by PI Fitch (Fitch and Emmons,1995, Burr
and Baldwin, unpublished). Indeed, characters first elucidated with detailed TEM reconstruction often
"open our eyes" to visualize key features using simpler and more efficient instrumentation. In short,
phylogenetic approaches in morphology are designed with proven techniques and the most cost effective
and efficient methods possible to 1) document specimens used for molecular data with corresponding
records of phenotypes and 2) engage diverse expertise (PIs and collaborating participants) to target the
taxa and features most promising toward resolving major radiations within the limited scope of this
project.
Figure 4. Serial TEM
reconstructions depicting
cell-cell homologies across
taxa with quick time
movies A, B anterior
nervous system resp.
epidermal cells
(Cephalobina) C. Anterior
epidermal cells and stylet
(Tylenchida). (Bumbarger
et al. 2006, 2007,
Ragsdale et al. submitted)
3.7. Phylogenetic analysis – Molecular characters
3.7.1. Overall strategy.
As part of NemAToL I, more than 1,000 SSU (18S) rRNA sequences were used to identify
unambiguously supported clades reflecting the diversity of Nematoda (e.g. Kiontke et al. 2007). However,
measures of confidence in certain major clades in SSU trees are sometimes low, and results can vary
according to inference method and other factors, including the multiple alignment (Smythe et al, 2006). In
virtually all cases where it has been applied, combined analysis of near-complete SSU plus LSU
sequences provides substantially increased resolution (and clade support) for the nuclear rDNA gene tree
(e.g.Kiontke et al. 2004, 2007; Kiontke & Fitch, 2005; Nadler et al., 2006). As part of the nuclear rRNA
transcription unit, the LSU is present in many copies per genome and therefore is comparably easy to
amplify and sequence. Therefore, one goal is to sample the same 160 species that are targeted for
single-copy nuclear gene sequencing (i.e., Solexa and directed methods) for near-complete (all but the
extreme 5’ and 3’ ends) LSU rRNA sequences. This will maximize the number of informative characters
available for inferring the rRNA gene tree, which has become a benchmark for nematodes, not unlike 16S
rRNA in prokaryotes. These same taxa will be targeted for Solexa sequencing of highly expressed
single-copy nuclear genes and the 12 protein-coding mtDNA genes, and this combination of molecular
datasets should provide enhanced resolution for deep splits (and certain other weakly resolved nodes) in
the nematode tree of life.
Phylogenetic analyses (details follow) will be performed independently on each molecular dataset (i.e.,
each presumptive locus) to assess at what levels different loci are contributing to the analysis, and to
determine if certain potentially problematic features associated with a few taxa (e.g., compositional bias)
are repeated across loci. However, the emphasis will be on combined analyses of molecular datasets,
which will include two approaches. First, a “total evidence” analysis of all combined sequence datasets
(Kluge, 1998). Second, congruence tests (Partition homogeneity test; Farris et al. 1995) will be
performed, and gene partitions that are not found incongruent will be combined for analysis. Both
approaches are likely to provide resolution of portions of the tree poorly resolved in the analysis of
individual datasets such as the aforementioned SSU rDNA tree (e.g., see Fitch 1997; de Queiroz et al.
1995). Comparison of topologies inferred from “total evidence” and congruence-tested datasets will reveal
which clades are sensitive to the assumptions underlying these very different philosophical approaches to
combining data partitions. These different philosophical “camps” are present even among the NemATol
collaborators, so this dual approach to combined analysis is needed for our own satisfaction.
3.7.2. Characters used for phylogenetic inference.
For any particular level of phylogenetic inquiry, we will only use unambiguously aligned regions of the
sequences. Previously, we have used secondary structure models for nematode rRNA (SSU and LSU) to
guide alignments over difficult regions (e.g. Fitch et al., 1995, Kiontke et al., 2004, Meldal et al., 2007;
Subbotin et al., 2007). These models will also be used for guiding the rDNA alignments for these
analyses, and rRNA characters will be partitioned into "stems" versus "loops" (see Fitch et al. 1995,
Subbotin et al.) to test the effects of differential weighting of such sites on tree inference. Unambiguous
regions can be selected without investigator bias by employing the program ProAlign (Löytynoja and
Milinkovitch. 2003) to filter characters by their average minimum posterior probability (caveat: some
datasets with long internal gaps will not work for ProAlign), or alternatively, by using the Cutter tool
developed for this purpose as part of NemAToL I (Cutter is available and described on the NemAToL
website). We also intend to test the sensitivity of our phylogenetic conclusions to the use of gap
characters by conservatively recoding unambiguous gaps as an additional character state (e.g., see Fitch
et al. 1995). Only characters that have states assigned for all of the included taxa will be used for
molecular analysis. For example, with Solexa sequencing, this may preclude using some sequence at
the 5’ and 3’ termini of certain genes. For the protein-coding genes, the inferred amino acid sequence
alignments will be used to guide the sequence alignment of nucleotides, with gaps placed from peptide
alignments using RevTrans (http://www.cbs.dtu.dk/services/RevTrans/). Protein-coding genes will be
partitioned by codon position to assess the effects of including all positions, versus just 1 st and 2nd
positions on tree topology. Similarly, phylogenetic hypotheses based on amino acid characters will also
be explored because these characters may be more informative for relationships deeper in the tree.
3.7.3. Phylogenetic methods
Analysis will initially be performed on the each dataset separately, with the exception of the mtDNA
genes, which will be considered to represent a single linked genome for the purposes of analysis. The
partition homogeneity (ILD) test will be used to determine if different datasets are significantly
incongruent. Genes lacking incongruence will be combined for analysis. If the datasets are incongruent,
we will try to determine the characters or taxa responsible (e.g., by analyses with taxon deletions or
character elision). Although the ILD test can reveal that different datasets should not be combined, there
are reasonable differences in philosophy about approaches to combining data in developing phylogenetic
hypotheses (e.g., Huelsenbeck et al., 1996). Therefore, even when incongruence is revealed, a "total
evidence" analysis will be performed to determine how different the resulting hypotheses are. In
combination with the ILD results, topologies from individual gene analyses may indicate which partitions
are mainly responsible for differences between “combined congruent” and total evidence trees. We will
use parsimony (MP) maximum likelihood (ML) and Bayesian (BPP) methods as tools to explore these
data in different ways. All of these methods are tractable for the size of the datasets to be analyzed (~165
taxa) given current implementations. MP offers an efficient method to explore tree space using relatively
simple data models. For example, comparing tree topologies for rRNA gene trees with and without
stem:loop character weighting or stepmatrices that incorporate estimates of transition:transversion ratios.
Heuristic search strategies will be required given the number of taxa planned for these analyses. For
unweighted parsimony analysis, the parsimony ratchet (Nixon, 1999) will be used to increase the odds of
escaping local optima and finding shorter trees. Bootstrap resampling (with heuristic searches) will be the
primary tool for assessing the relative reliability of clades for parsimony analyses.
Bayesian and ML approaches offer the potential advantages of using an inference method with explicit
(and more complex) substitution models. Especially for cases where significant variation exists among
branch lengths (including some nematodes), choice of the appropriate model is important for accurately
recovering phylogenies (Cunningham et al. 1998). We will compare and choose among models for each
gene using the Akaike information criterion using ModelTest (Posada and Crandall, 1998) or other modelfit programs now available as online resources. Because of its relative insensitivity to rate differences
along branches and its ability to incorporate corrections of superimposed substitutions at different rates
over different sites (Huelsenbeck 1997; Cunningham et al. 1998), ML will be our primary tool for modelbased phylogenetic inference for individual genes. Due to the moderate size of these datasets (~165
taxa), implementation of ML will be most feasible using RAxML as implemented through the Cipres portal
web server (http://8ball.sdsc.edu:8889/cipres-web/Home.do), which also permits Bootstrap-RAxML
analysis. Bayesian ML (Huelsenbeck 2000) will be our preferred tool for analyzing combined datasets
since it is particularly effective for implementing a mixed/partitioned model analysis wherein different
genes have different best-fit models. The Cipres portal now hosts a parallel implementation of MrBayes
as do other sites such as the High-Performance Computing Institute at Cornell
(http://cbsuapps.tc.cornell.edu/mrbayes.aspx). Bayesian analysis will be used to test the stability of
branches as the posterior probabilities of trees collected after convergence of a search to a stable
likelihood value (Huelsenbeck, 2000), which may require several million generations given these
datasets. Additionally, once the most-highly resolved branches are established, the clades defined by
these branches can be focused on for more intensive tree evaluations and stability tests, including
statistical tests of alternative topologies. These approaches, including alternative topology testing, have
been used by us in testing nematode phylogenetic hypotheses (Nadler and Hudspeth, 2000; Sudhaus
and Fitch 2001). In all cases, tree rooting will be performed by the outgroup method. For the most
comprehensive analyses, this will involve using taxa outside of Nematoda (e.g., Nematomorpha and
others) as informed by the Protostome ToL project. Finally, new procedures or improvements for multiple
alignment, model estimation, phylogenetic inference and alternative topology testing continue to be
published at a rapid rate. These include research and development by other ToL projects, for example,
advances in methods to perform simultaneous multiple alignment and phylogenetic inference for large
datasets (Holder et al. ToL project). We hope to take advantage of these new computation resource tools
as appropriate to Nematode ToL project.
4. Implications for Classification
Nematode classification has a long history of controversy and disagreement among experts, often based
on seemingly arbitrary choices of preferred taxonomic principles and preferences (see review in De Ley &
Blaxter, 2002). Many of these controversies mirror larger discussions and changes in the theory and
practice of animal systematics in general, but with especially aggravating factors such as: sheer diversity
of species and taxa; fragmentation of expertise along boundaries of zoological subdisciplines (e.g. plant
nematology vs helminthology vs meiobenthology vs soil ecology); fragmentation along major language
and geography boundaries (e.g. western european vs american vs former soviet bloc vs south asian);
difficulties in observing and interpreting characters at the microscopic edge of optical resolution; high
preponderance of homoplasy and intraspecific variability. The first SSU rDNA phylogenies to appear in
print provided an opportunity to change the nature of these controversies by moving away from the
lifetime experience and individual authority of recognized world experts, towards more formal and less
personal questions of sequence analysis and comprehensive representation of known diversity. De Ley &
Blaxter (2002, 2004) proposed a first classification based primarily on molecular phylogeny, which was
relatively compatible with different aspects of different prior morphological systems and proved useful for
framing more precise questions in the context of the first Nematode AToL project. However, being very
largely based on a single gene, this system was constrained by the apparent limitations of the gene in
question, including the lack of resolution of the major clades near the origin of the phylum, as well as
around the origin of nematodes with phasmids.
Our proposed efforts to resolve these as yet polytomous major radiations (by means of a wide range of
data from carefully selected taxa) will also provide added value in permitting a comprehensive reappraisal
of this and preceding systems. We propose to do this in a manner that is best suited to attain a relative
degree of consensus, through established mechanisms and collaborations continuing from the first
Nematode AToL, by encouraging a community-wide discussion of the multigene and multidata trees
generated during our new AToL project. Fundamental questions to be addressed will include the relative
usefulness and theoretical strengths vs weaknesses of Linnean nomenclature as compared to phylocode
proposals, the degree to which phylogenies should inform vs define classifications, how to accommodate
different usage of taxon names in different subdisciplines, etc. These discussions will be channeled
through our workshops and through symposia at forthcoming conferences, as well as through creation of
a publicly accessible section of NemForum, an online forum tool currently used as restricted-acces tool
for project management via the Nematol database (see management section). While we are hopeful that
a general consensus about nematode classification has now become much easier to reach than in past
decades, we will prioritize the production of a comprehensively revised system and make the necessary
decisions by consensus within our team of CoPIs, if wider consensus seems to fall prey to protracted or
circular debates.
5. Impact and Outreach
A key impact of the previous and proposed nematode ATOL is far reaching promotion of worldwide
linkages for communication, integration and comparison across different scientific cultures and
disciplines (see previous section). This is accomplished by bringing together into a single evolutionary
and taxonomic framework the unparalleled diversity of nematode taxa, including worldwide
representation of freeliving species as well as parasites of plants and animals. The outreach is extended
by worldwide accessible online databases (http://nematol.unh.edu/), forums and modern information
technology at all stages (e.g., digital imaging, multiple types of optical and electron microscopy, testing
homologies scoring and polarizing of classical and novel morphological characters, using established
and cutting edge methods of sequencing diverse genes and phylogeny reconstruction). The CoPIs each
have long-established international ties, and the project engages collections, and collaborators worldwide
through workshops and laboratory exchanges. Online resources, including video images and SEM have
been and will be further incorporated into teaching programs and workshops worldwide. These
resources, databases and enhanced museum collections all provide infrastructure that supports and
integrates diverse nematology research beyond the scope of the proposed project. The nematode treeof-life has and will continue to expand on a superb track record in supporting graduate and published
undergraduate research with a high proportion of participation by underrepresented groups. The
proposed project expands this excellent record by incorporating research participation of minority
undergraduate students at Elizabeth City State University, NC, as well as at least two female graduate
students at UCR including one African American.
Beyond phylogenetics, the nematode ATOL supports researchers working on comparative and applied
aspects ranging from genomics to parasite control. The phylogenetic framework allows them to find better
models for studying the biology of their target organisms. Soil ecologists and biodiversity surveyors are
aided by new tools supporting identification and phylogenetic classification, while evolutionary ecologists
gain new opportunities to explore the evolution of functional relationships between decomposers (fungi
and bacteria), primary producers and their respective nematode regulators. Greater phylogenetic
documentation of the interweaving of plant and animal parasitism allows parasitologists to better
investigate principles of parasite-host interaction. A phylogenetic framework for parasites will further
improve communication with Caenorhabditis genomicists, neurobiologists and embryologists, and
especially so for those seeking to expand the diversity of model species compared with C. elegans.
Better understanding of the nematode branch also supports new insight into outgroups i.e. the broader
Tree-of-Life. The nematode ATOL has been and will continue to be a catalyst for development of
technologies for applications beyond phylogenetic analysis. At least two of our collaborators from the
previous nematode ATOL are leveraging molecular phylogenetic information in developing assays to
monitor nematode communities. One ATOL collaborator (Holterman et al., 2008) has developed methods
based on rDNA sequences to assay the abundance of specific plant pathogens in soils, a process that
has taken hold in the private sector testing environment. Similarly, collaborator Giblin-Davis and
colleagues (Porazinska et al., 2007) have developed massively parallel sequencing approaches based on
rDNA sequences to assess nematode communities. Databases, alignments and phylogenetic analyses
stemming from our studies are critical to the development of these uses and their evaluation.
Download