genomeweb article - Department of Botany and Plant Pathology

Team Introduces Targeted Capture Method
for Use in Groups of Related Plants
September 16, 2014
By Andrea Anderson
NEW YORK (GenomeWeb) – Plant scientists from the US and the Czech Republic have
developed a targeted sequencing and genome skimming strategy aimed at assessing both low-copy plant genes and high-copy genetic elements such as sequences encoded in plant organelles.
The method, known as Hyb-Seq, scours transcriptome and genome sequences from a
representative plant in a given lineage and uses information from these sequences to design
probes for targeted capture and sequencing in related plants, Oregon State University botany and
plant pathology researcher Aaron Liston told In Sequence.
The idea was to come up with a targeted sequencing approach that was "specific enough that it
would pick up primarily low-copy genes," he explained, "but broad enough that it could work
across a range [of plants] — across an entire genus or a group of related genera."
As they described in a protocol note appearing in the journal Applications in Plant Sciences,
Liston and his co-authors used Hyb-Seq to look at sequences from plants in the same lineage as
the milkweed plant Asclepias syriaca. Using milkweed transcriptome sequences and a draft
version of the plant's genome, they designed 80 to 120 base pair probes to capture and sequence
thousands of exons from the genomes of a dozen related plant species or genera.
Through Illumina MiSeq sequencing on these enriched samples, the team successfully assembled
sequences coinciding with the majority of the genes and exons targeted. From the off-target
reads, meanwhile, it put together sequences representing the plants' plastomes (plastid genomes)
and other high-copy sequences.
Although targeted sequencing has been used for humans and other animals for many years, the
approach is more difficult to apply across plant species due to the enormous complexity of plant
genomes, which are prone to duplication.
"For targeting, you want to target things that are 'single-copy' in your genome," Liston explained.
"That's one of the big challenges: whole-genome duplication."
Plants also lack the sort of phylogenetically informative ultra-conserved elements that are often
targeted for animal studies, he noted. Consequently, targeted capture techniques and related
phylogenetic analyses have generally been developed on a species-by-species basis.
"The challenge was to make something that would work across a range of close relatives —
different genera, for example," Liston said.
To that end, the team turned to its own genome and leaf/bud transcriptome data for the milkweed
— a representative plant from its initial lineage of interest.
By mining this unpublished sequence data, the researchers designed targeted capture probes
corresponding to 3,385 milkweed exons. These coding sequences, in turn, coincided with coding
sequences for 768 genes suspected of being present in single copies in the milkweed genome.
Possible gene paralogs were weeded out of the probe design process by excluding targets within
milkweed that share 90 percent sequence similarity or higher. Likewise, the team tossed potential
single-copy targets that spanned fewer than 120 bases or so in an effort to enrich for plant
sequences that were at least as long as the original probes.
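The two filters described above can be sketched in a few lines of Python. This is only an
illustration under stated assumptions, not the team's actual pipeline: the function name
filter_targets, the input layout, and the decision to drop both members of a similar pair are
hypothetical, and the pairwise identities are presumed to come from a separate alignment step.

```python
from typing import Dict, List, Tuple


def filter_targets(
    lengths: Dict[str, int],
    pairwise_identity: Dict[Tuple[str, str], float],
    min_length: int = 120,
    max_identity: float = 0.90,
) -> List[str]:
    """Return candidate targets that pass a length filter and a paralog filter.

    lengths: candidate target name -> length in bases.
    pairwise_identity: (name_a, name_b) -> fractional identity between the two
        candidates, computed elsewhere (hypothetical input).
    """
    # Length filter: drop targets shorter than the probe length (about 120 bases).
    long_enough = {name for name, n in lengths.items() if n >= min_length}

    # Paralog filter: flag any pair at or above 90 percent identity.
    # Assumption: both members of such a pair are excluded.
    paralog_like = set()
    for (a, b), identity in pairwise_identity.items():
        if a != b and identity >= max_identity:
            paralog_like.update((a, b))

    return sorted(long_enough - paralog_like)


# Made-up example data, for illustration only.
example_lengths = {"exonA": 150, "exonB": 300, "exonC": 90, "exonD": 200}
example_identity = {("exonA", "exonB"): 0.95, ("exonA", "exonD"): 0.60}
kept = filter_targets(example_lengths, example_identity)
print(kept)
```

In this toy input, exonC fails the length filter and exonA and exonB are treated as possible
paralogs, so only exonD would move on to probe design.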
Through solution hybridization, the investigators used the final probe set to enrich for related
sequences from 10 other Asclepias species and two plants from nearby plant genera,
Calotropis procera and Matelea cynanchoides, before sequencing the resulting libraries with
Illumina's MiSeq.
With the help of a reference-guided assembly approach, they then aligned the targeted capture
reads to the original milkweed sequences.
The approach made it possible to pick up some part of nearly 93 percent of the exons the team had
targeted. Together, those sequences offered a look at 99.7 percent of the genes initially sought
after.
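One way to see why the gene-level figure can exceed the exon-level figure is sketched below in
Python. It assumes, as an interpretation rather than something the article states, that a gene
counts as recovered when at least one of its exons is recovered. The function name
recovery_rates and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def recovery_rates(
    exon_to_gene: Dict[str, str], recovered_exons: Set[str]
) -> Tuple[float, float]:
    """Return (exon recovery rate, gene recovery rate) for a capture experiment."""
    # Group the targeted exons by the gene they belong to.
    exons_by_gene = defaultdict(set)
    for exon, gene in exon_to_gene.items():
        exons_by_gene[gene].add(exon)

    # Exon level: fraction of targeted exons with any recovered sequence.
    exon_rate = len(recovered_exons & set(exon_to_gene)) / len(exon_to_gene)

    # Gene level (assumption): a gene is recovered if any of its exons is recovered.
    genes_hit = sum(1 for exons in exons_by_gene.values() if exons & recovered_exons)
    gene_rate = genes_hit / len(exons_by_gene)
    return exon_rate, gene_rate


# Toy data: two genes with two exons each, three of the four exons recovered.
targets = {"g1_e1": "g1", "g1_e2": "g1", "g2_e1": "g2", "g2_e2": "g2"}
rates = recovery_rates(targets, {"g1_e1", "g2_e1", "g2_e2"})
print(rates)
```

Here three of four exons are recovered, yet both genes are represented, which mirrors the gap
between the exon and gene percentages reported for the study.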
From the 760 or so loci that they began with, for instance, the investigators lost 60 to 70 percent
of markers when looking at more distant members of the family, Liston noted. Nevertheless,
enough information remained to begin looking at relationships within the milkweed lineage.
Between almost 2 percent and 13 percent of sequences spanning the original 768 genes varied
amongst the plants included in the study, for example, offering clues to phylogenomic
relationships in the milkweed lineage.
It remains to be seen whether there is an optimum number of Hyb-Seq markers for delineating
relationships in this or other plant lineages, Liston noted. He and his colleagues are currently
putting together a phylogenetic tree for this plant group that's built around sequences coinciding
with roughly 1 percent of the milkweed's exome.
While that may be modest in a whole-genome context, the Hyb-Seq strategy provides far more
resolution than that available from plant family trees built with data at just one or a few genes,
Liston argued. "Just getting the sample size up to 1 percent is, I think, going to lead to much
more robust phylogenies in the future."
Even if some markers cannot be detected in some or all of the plants tested, the remaining loci
continue to provide phylogenetic clues, he explained, whereas existing PCR-based gene-by-gene
approaches to looking at such plant relationships are "very onerous, very time-consuming."
Given that the targeted sequencing technique currently has 50 percent efficiency or so, around
half of the sequences coming out of Hyb-Seq experiments correspond to targeted regions of the
plant genomes, Liston explained.
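As a rough sketch of that arithmetic, capture efficiency can be read as the share of reads that
map to targeted regions. The function name and the read counts below are made up for
illustration and do not come from the study.

```python
def capture_efficiency(on_target_reads: int, total_reads: int) -> float:
    """Fraction of sequenced reads that correspond to targeted regions."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= on_target_reads <= total_reads:
        raise ValueError("on_target_reads must be between 0 and total_reads")
    return on_target_reads / total_reads


# Hypothetical sample: one million reads, half of them on target.
total = 1_000_000
on_target = 500_000
efficiency = capture_efficiency(on_target, total)

# The remaining reads are the off-target pool available for genome skimming.
off_target = total - on_target
print(efficiency, off_target)
```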
The remaining sequences detected typically stem from high-copy sequences such as nuclear
ribosomal DNA and the genomes of plant organelles, namely mitochondria and plastids. Plastid
sequences tend to be particularly common, he noted, given that these genomes are between 10 and
100 times as prevalent as mitochondrial genomes.
Those high-copy sequences "can be readily assembled" from the sequence data, Liston said,
noting that the analytical pipelines used to look at low-copy plant genes from the nucleus and
high-copy organellar sequences are slightly different.
At the moment, there do not seem to be differences in the applicability of Hyb-Seq within
different types of plants or plant lineages. Prior to the paper's publication, the authors shared
details of the Hyb-Seq approach with investigators working on grasses, legumes, and a range of
other land plants.
"You could do this for any plant of interest," Liston said. "With a transcriptome and a genome
skim, you can basically have all your data to design a set of probes that will work across that
entire genus."
For the current study, he and his colleagues generated their own draft genome sequence for
milkweed. Genome quality is not particularly important for this application, according
to Liston, as long as the bulk of the plant's gene space is represented.
Nevertheless, the availability of both transcriptome and genome sequences from the plant used
for probe design is important for accuracy, he explained, particularly given the variability that
tends to spring up in the intronic sequences that fall between coding portions of related plant
genes.
Introns "are too variable to reliably use as a probe when you want to go across species," Liston
insisted, explaining that probes designed to match these variable sequences are less likely to
effectively enrich for sequences across different plant species. By targeting the rapidly evolving
introns, "you're going to lose the flanking exon too."
Instead, he recommends targeting adjacent coding sequences, since the exon-targeting probes
will inevitably pick up at least some neighboring intron sequences. "It's part of the whole splash
zone idea. You target the exon, but depending on your read length … you can easily pick up 250
base pairs on either side of your exon," Liston said.
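The "splash zone" idea can be expressed as a simple interval calculation. The sketch below is
illustrative only: the function name splash_zone, the 0-based half-open coordinates, and the
optional contig length are assumptions, and the default flank of 250 bases is taken from
Liston's quote rather than from a fixed property of the method.

```python
from typing import Optional, Tuple


def splash_zone(
    exon_start: int,
    exon_end: int,
    flank: int = 250,
    contig_length: Optional[int] = None,
) -> Tuple[int, int]:
    """Return the region likely covered when an exon is targeted.

    Coordinates are 0-based and half-open: [exon_start, exon_end).
    """
    # Extend upstream, but never past the start of the contig.
    start = max(0, exon_start - flank)
    # Extend downstream, clipping to the contig end when it is known.
    end = exon_end + flank
    if contig_length is not None:
        end = min(end, contig_length)
    return start, end


# A 180-base exon in the middle of a long contig.
mid_contig = splash_zone(1000, 1180)
# The same size exon near both ends of a short, 400-base contig.
short_contig = splash_zone(100, 280, contig_length=400)
print(mid_contig, short_contig)
```

In the first case, a 180-base exon yields a 680-base region, most of which would be flanking
intron or other non-coding sequence.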
On the other hand, probes designed using transcriptome data alone are apt to inadvertently span
sequences interrupted by introns in the genome.
"Computationally, you can predict where the introns are," Liston said. "But … the introns change
rapidly, so having the genome is the best way to find the introns."
For the time being, Hyb-Seq probe design is expected to be most effective when both genome
and transcriptome data are available, he argued, though that may change as more and more plant
transcriptomes and genomes become available and intron predictions improve.
For their part, members of Liston's Oregon State University group are applying Hyb-Seq to a
wide range of phylogenetic and targeted gene experiments. For instance, they hope to get a better
look at not only the nature of plant relationships with one another, but also the extent to which
certain gene duplications are shared or distinct between closely related species.
The team believes the technique has potential for other applications as well, including efforts
aimed at exploring the biological features of the organelles producing the high-copy sequences
detected by Hyb-Seq. Population genetic studies are another possibility, Liston pointed out,
though the extent to which it can be applied across large sample sizes may still be somewhat
limited by cost.
The price of the approach changes over time and depends on the technology used. Generally
speaking, though, the team has been able to do the Hyb-Seq analysis for between $50 and $100
per sample.
Most of the samples the researchers are considering at the moment are multiplexed at both the
hybridization and sequencing levels.
So far, they have relied exclusively on Illumina sequencing technologies, though the same
general approach is expected to be compatible with other high-throughput sequencing
technologies such as Ion Torrent.
The team is pleased with the way the method itself is performing, though Liston noted that there
may be room for improvement in curbing the cost of library preparation and/or making the
analytic side of the pipeline more straightforward.
"The [analytical] methods are so much still in development," he said. "Making the methods more
available to a broad audience would be good."
Andrea Anderson is a senior science reporter for GenomeWeb Daily
News, covering genomics research studies and translational
research.