Biodiversity Assessment of Insect from Environmental Samples Using qPCR
and Next-Generation Parallelized Sequencing of DNA Barcodes
by
Saina Taidi
A Thesis
Presented to
The University of Guelph
In partial fulfillment of requirements
for the degree of
Master of Science
in
Integrative Biology
Guelph, Ontario, Canada
© Saina Taidi, August, 2012
ABSTRACT
Biodiversity Assessment of Insect from Environmental Samples Using qPCR
and Next-Generation Parallelized Sequencing of DNA Barcodes
Saina Taidi
University of Guelph, 2012
Advisor:
Mehrdad Hajibabaei
This thesis employs three bioindicator species of mayfly (Insecta: Ephemeroptera) and
three of caddisfly (Insecta: Trichoptera) as models to develop a reliable biodiversity and
biomonitoring assessment approach using quantitative PCR (qPCR) and next-generation
sequencing (NGS) technology. Quantitative PCR was employed to assess the
efficiency of species-specific PCR primers in amplifying their target species versus other
taxa from closely or distantly related taxonomic groups from benthic habitats. Results
showed that qPCR can be used as a practical test for evaluating PCR primers for amplifying
specific taxa in mixed environmental samples, although it might be influenced by
amplification bias. Target-specific primers are an alternative to presumably universal
primers. Each primer set can be tested and optimized using qPCR prior to use in
next-generation sequencing. The qPCR results were corroborated by 454 pyrosequencing data,
which indicates that qPCR is a useful tool for selecting primers during NGS amplicon
preparation and can be used in the experimental design of NGS-based biomonitoring.
ACKNOWLEDGMENTS
First and foremost I offer my sincerest gratitude to my supervisor, Dr. Mehrdad Hajibabaei,
who has supported me throughout my thesis with his patience and knowledge. I attribute the
achievement of my Master’s degree to his encouragement and assistance. Without his advice,
this thesis would not have been written or completed. One simply could not wish for a better
or more friendly supervisor.
I would like to especially thank Dr. Teresa Crease for all her support both as my committee
member and as Graduate Coordinator. I will never forget her kind support; nobody could
wish for a better professor.
Many thanks to my co-adviser, Dr. Paul Hebert, who supported me with his constructive
opinions; it was a great honour to have the opportunity to receive his guidance throughout
my project. I also had an unforgettable time in Dr. Donald Baird’s lab. Many
thanks to Dr. Baird and his team, especially Kristie Heard, who supervised my very first
experience in sample collection and species identification.
Heartfelt thanks to my dear friends and lab mates Claudia, Jennifer, Steve, Connor, Joel, Ian
and Stephanie, who blessed my everyday work with brainstorming and good times
together. Special thanks to Shannon, who was always there to support me in more
ways than anyone could expect. Many thanks to Dr. Shady Shokarallah, who helped me
throughout this journey not only with his scientific knowledge, but also with his great attitude and
encouragement to keep going. I definitely would not be here without his great support.
I am deeply lucky to have friends who helped me maintain the courage to write and to move
forward. I would like to thank all of them, near or far, for their support and encouragement.
My special thanks to Ahmed Al-Wattar, Margaret Hundleby and Shawn Kehoe, who read
over my thesis and provided great comments, explaining their concerns and paying careful
attention.
Thanks to the great help of Xin Zhou and Terri Porter who helped me in learning the
strategies for classic taxonomic identification and bioinformatics analysis of my data.
Susan Mannhardt, with her extra busy schedule, was always there to answer all questions and
support me in all aspects of the administrative process. Also I would like to thank Mary-Ann
Davis, Karen White, Lori Ferguson and all the IB department staff.
I would like to thank all of my colleagues and friends at the Biodiversity Institute of Ontario,
especially Natalia Ivanova, for aid with laboratory protocols and workshops on sequence
editing.
Finally, I would especially like to thank my mother and father for all the love and support
they have given me in my life, and also my sisters and brother for their love and support.
Table of Contents
LIST OF TABLES
LIST OF FIGURES
LIST OF APPENDICES
INTRODUCTION
The challenges of biodiversity analysis
Biodiversity and biomonitoring
DNA information and biodiversity analysis
DNA barcoding: standardized molecular biodiversity analysis
Next-generation sequencing for biomonitoring
Quantitative PCR
Why qPCR?
Objectives
MATERIAL AND METHODS
Target species selection and specimen collection
DNA extraction
Primer design and optimization
Sanger sequencing validation of amplicons
Quantitative PCR
1. Template selection and normalization
2. Experimental design
3. Reaction conditions for qPCR experiments
4. Data analysis
454 pyrosequencing
1. Experimental design
2. Multiplexing amplicons
3. Amplicon preparation
4. 454 pyrosequencing amplicon library preparation
5. 454 data analysis framework
Automated sequence filtering
Manual sequence analysis
RESULTS
Quantitative PCR results
Relative Amplified Copies (RAC) analysis
Quantitative and qualitative analysis of pyrosequencing reads
DISCUSSION
Primer behaviour in multi-template PCR
Quantitative PCR as a tool for target identification
Optimal NGS analysis of target genes and taxa
Comparing qPCR and 454 results
Towards a standardized approach for metagenomics analysis of environmental DNA
REFERENCES
TABLES
Table 1: Species-specific oligonucleotide primers
Table 2: gDNA extract concentrations from target species
Table 3: 454 pyrosequencing tagged primers
Table 4: CT values of target species, Set A (Trichoptera)
Table 5: CT values of target species, Set A (Ephemeroptera)
Table 6: CT values of target species, Set B (Trichoptera)
Table 7: CT values of target species, Set B (Ephemeroptera)
Table 8: Summary results from qPCR and 454 pyrosequencing analysis
Table 9: Slope and efficiency rates for primer Sets A and B, amplicon-based material
Table 10: Read numbers for gDNA-based material, automated analysis
Table 11: Read numbers for amplicon-based material, automated analysis
Table 12: Read numbers for gDNA-based material, manual analysis
Table 13: Read numbers for amplicon-based material, manual analysis
FIGURES
Figure 1: Amplification plot sample
Figure 2: The workflow used in qPCR experiments
Figure 3: 454 pyrosequencing experimental workflow
Figure 4: Exemplar standard curves for qPCR experiments (gDNA-based)
Figure 5: Exemplar standard curves for qPCR experiments (amplicon-based)
Figure 6: Exemplar Relative Amplified Copies (RAC)
Figure 7: MID distribution for gDNA-based material
Figure 8: MID distribution for amplicon-based material
APPENDIX 1: Standard curves for target samples, gDNA-based
APPENDIX 2: Standard curves for target samples, amplicon-based
APPENDIX 3: 454 pyrosequencing analysis results
INTRODUCTION
Biodiversity is the diversity of genes, species and ecosystems, or the variety of all
living organisms, and can be defined at many different levels, from allelic diversity and
heterozygosity to the variation of population distribution in a region (Lovejoy, 1997). Today, the
concept of biodiversity within conservation biology is focused not only on species
diversity or endangered species but also on aspects of biodiversity with practical
applications, such as water quality analysis, conservation or measuring the health of
biological resources.
Biodiversity and its impact on other fields of the biological sciences have long been a subject
of fascination for scientists around the world. Modern biodiversity analysis started with the work of
Linnaeus almost 250 years ago, and yet even today only a small fraction of the world’s species are
known to humanity. The greatest diversity exists among insects, which account for more than one
million of the planet's named animal species. From the canopy of the tropical rain forests to the ocean
floor, it is estimated that millions of undescribed species of insects and other organisms exist (Mora et
al. 2011).
Altogether, the earth's oceans and continents support close to 50,000-55,000 species of
vertebrate animals and 300,000-500,000 species of plants, with anywhere from 10 to 100 million
species still to be identified (Mora et al. 2011). A recent study used a statistical approach to estimate
the total number of species at 8.7 million (Mora et al. 2011). However, its authors recognize the
limitations of current direct methods for estimating biodiversity.
The challenges of biodiversity analysis
Biodiversity is fundamentally concerned with measuring the number of species and how
they combine to form communities and ecosystems. The most common way of studying this is to
characterize the differences between species using traits such as body size, physiological
tolerance, body shape or habitat preference (Bonada et al. 2006). However, these
characteristics are difficult to measure easily and accurately for
biodiversity analysis. Bottlenecks include the difficulty of identifying species at
different life stages (larvae, for example). Measurement becomes harder still when
community parameters must also be estimated, such as species richness (the number of species
present) and evenness (how equally individuals are distributed among those species).
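As an illustration of the community parameters mentioned above, the following minimal sketch computes species richness, the Shannon diversity index and Pielou's evenness from a list of identified specimens. The function name `diversity_summary` and the specimen counts are hypothetical and are not data from this study; the indices themselves are standard ecological measures, shown here only to make the terms concrete.

```python
import math
from collections import Counter


def diversity_summary(specimens):
    """Return (richness, Shannon index H, Pielou evenness J) for a list of species labels."""
    counts = Counter(specimens)
    total = sum(counts.values())
    richness = len(counts)
    # Shannon index: H = -sum(p_i * ln p_i) over the proportion of each species
    shannon = -sum((n / total) * math.log(n / total) for n in counts.values())
    # Pielou evenness: J = H / ln(S); undefined for a single species, so return None
    evenness = shannon / math.log(richness) if richness > 1 else None
    return richness, shannon, evenness


# Hypothetical counts, for illustration only
even_sample = ["C. bronta"] * 10 + ["C. sparna"] * 10 + ["Chimarra obscura"] * 10
skewed_sample = ["C. bronta"] * 28 + ["C. sparna"] * 1 + ["Chimarra obscura"] * 1

print(diversity_summary(even_sample))
print(diversity_summary(skewed_sample))
```

Both samples have the same richness (three species), but the second is dominated by a single species and therefore has a lower evenness, which is why richness alone does not describe a community.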
Although biodiversity measurements are based on counting the abundance of species in a
target environment, the ability of research scientists to conduct measurements on a large scale is an
important factor in the efficacy of any method. When considering species-rich ecosystems such as in
the tropics, analyses become more complicated and the nature of these complex ecosystems makes
biodiversity assessment much more difficult.
Biodiversity and biomonitoring
Biological monitoring or biomonitoring is the systematic utilization of biological
responses to assess and monitor changes in the environment with the intention of using that
information in environmental assessment programs (Bonada et al. 2006). The utilization of
environmental bio-indicators has become one of the common methods for evaluating the health of a
target environment. In general, bio-indicators are defined as taxa that can respond to environmental
changes or disturbances in a way that can be observed and measured (quantified). The sensitivity of
an organism’s reactions to environmental changes and the capacity of scientists to measure them are
important factors in selecting bio-indicators (Hajibabaei, et al., 2011; Nash, 1989; Noss, 1990).
Biomonitoring of water quality can occur in freshwater or marine water. Freshwater
biomonitoring can occur in lentic (lakes and ponds) or lotic (rivers and streams) inland waters.
Organisms that live in the bottom substrates (sediments, debris, logs, macrophytes, filamentous algae,
etc.) of freshwater habitats (lentic and lotic) for at least part of their life cycle are considered benthic.
Benthic macroinvertebrates are animals that inhabit the bottom substrate for at least part of
their life cycle and are retained by mesh sizes ≥ 200 to 500 µm (Rosenberg et al., 1993; Suess,
1982; Ward et al., 1986).
The processing of benthic macroinvertebrate specimens using classical taxonomic
approaches is an important barrier to the development of biomonitoring processes especially when
applied to large-scale programs such as the biomonitoring of freshwater to indicate the quality of a
target stream. Moreover, this type of bottleneck can also occur at the sample collection, sorting and
preparation stages. The identification of larvae has always been a major bottleneck in biomonitoring
studies involving benthic macroinvertebrates (Bonada et al. 2006).
The routine biomonitoring process relies on the identification of one specimen at a time,
which requires experienced technicians and sufficient time and funds to complete the process. Another
difficulty within taxonomy-based biomonitoring is the depth of identification. Although
keys exist for the identification of species, they are not comprehensive and do not
describe all life stages of target species.
DNA information and biodiversity analysis
Without genetic diversity, a population loses the ability to evolve and adapt to
environmental changes. Genetic diversity has an impact on intraspecific levels of biodiversity.
Hence, the study of genetic variation is central to biodiversity analysis. In order to accurately
identify species based on genetic information one needs to focus on genetic information that varies
between species and not among members of the same species. Traditionally, however, the
characterization of species has been based on morphological characteristics.
Morphological inconsistency is one of the main issues that scientists face: diagnostic
characteristics may not be apparent at all life stages of an organism’s development, and their
appearance may be influenced by environmental factors. Today, many different genetic markers and
techniques have been introduced to assess genetic variation as a complementary tool to aid
traditional approaches (Roesch et al. 2007; Gill et al. 2006; Limpiyakorn et al. 2006).
Molecular biology tools have provided useful information on the diversity of target organisms
through the detection of variation at the molecular level (mainly DNA and proteins). The reliable
identification of organisms is an essential and important ability that these techniques can provide
within evolutionary, ecological and environmental studies. There are many instances in which
genetic tools could give better resolution in the identification of species when barriers in
identification processes exist.
There are a number of different techniques which are available for genetic identification.
The priority of choosing one technique over another is dependent on the material that is being
studied or the nature of the questions to be addressed. DNA barcoding is one of the DNA-based
techniques that have been used for studying biodiversity and molecular evolution.
DNA Barcoding: Standardized molecular biodiversity analysis
DNA barcoding (Floyd et al., 2002; Hebert et al., 2003) is a relatively new molecular
approach that uses a short uniform sequence of DNA to identify species across taxonomic groups. A
650 base pair region near the 5’ end of the mitochondrial gene cytochrome c oxidase 1 (COI) has
been suggested as a DNA barcode for animals. Subsequently (Hebert and Gregory, 2005; Smith, et
al., 2008), DNA barcoding has gained momentum in biodiversity studies as a standard species
identification method (Frézal and Leblois, 2008; Hajibabaei et al., 2006). DNA barcoding can
differentiate between morphologically cryptic species more efficiently than other methods; however
it does not eliminate the need for traditional taxonomy. Beyond its use as an identification technique,
it has been suggested that DNA barcoding can be used to expand our understanding of phylogenetic
and population-level differentiation, although DNA barcode sequences are often not appropriate for
comprehensive phylogenetic analyses. Some studies have questioned the ability of COI barcodes to
distinguish between species in certain cases, such as hybrids and recently diverged species
(Munch et al., 2008). These critics propose that COI should be used in concert with nuclear genes
to yield more robust conclusions. Additionally, alternative genes have been proposed as DNA
barcodes for plants and fungi (Hollingsworth, et al., 2009). In cases where DNA in a specimen is
degraded, it has been shown that even a partial fragment of DNA barcode, a mini-barcode, can
provide species-level resolution (Meusnier et al., 2008). These mini-barcodes can often provide
DNA barcode information in situations where a full-length barcode cannot be retrieved. These cases
include museum samples with potentially degraded DNA as well as environmental samples in which
next-generation sequencing methods (which currently produce sequence reads of less than 500 bases)
are needed.
Next-generation sequencing for biomonitoring
Although DNA barcoding contributes to taxonomic research and biodiversity analysis by
identifying unknown specimens, some important issues need to be considered concerning the
possible applications of barcoding to the analysis of bulk environmental samples. For example, is it
possible to analyze and barcode all species in an environmental sample without separating them to
individuals? If so, would it then be possible to quantify species abundance by analyzing bulk
samples? Next-generation sequencing (NGS) platforms may aid in answering these questions.
While Sanger sequencers work on single specimens, NGS devices such as the 454-FLX (Margulies et
al., 2005) can read the sequences of thousands to millions of DNA fragments. One of these
technologies, massively parallelized pyrosequencing, which is currently implemented in the Roche
454 device, has three characteristics that make it suitable for the analysis of biodiversity in a large
number of DNA templates, such as DNA extracted from bulk environmental samples: 1) high
throughput, 2) the capacity for parallel sequencing, and 3) the ability to read relatively long
sequences (currently 250-400 bases). The third characteristic is especially important for accurate
identification of biota in environmental samples, as the alternative technologies produce short
sequence reads incapable of distinguishing taxa in complex environmental samples (Claesson et al.
2010). Therefore, through the use of a 454 pyrosequencer, it is possible to gain sequence information
from DNA barcodes and to use bioinformatics to compare this information to standard barcode
libraries to assess biodiversity in an environmental sample. 454 pyrosequencing produces large
amounts of data at low cost and provides a method for sequencing environmental DNA
without a prior cloning step. To date, 454 pyrosequencing technology has mainly been used in
environmental studies involving bacteria. While the use of DNA barcoding combined with
next-generation sequencing offers great potential for broadening the application of DNA barcodes, such
protocols have not been fully developed.
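The comparison of sequence reads against a standard barcode library can be sketched as follows. This is a deliberately simplified illustration and not the bioinformatics pipeline used in this thesis: it assumes that reads and reference barcodes are already aligned and cover the same region, it ignores gaps, and the sequences, species labels and the 98% identity cut-off are hypothetical.

```python
def percent_identity(read, reference):
    """Percentage of matching positions over the compared length (no gaps allowed)."""
    length = min(len(read), len(reference))
    if length == 0:
        return 0.0
    matches = sum(1 for a, b in zip(read, reference) if a == b)
    return 100.0 * matches / length


def assign_read(read, library, min_identity=98.0):
    """Return (best species, identity), or (None, identity) if below the cut-off."""
    best_name, best_score = None, 0.0
    for name, reference in library.items():
        score = percent_identity(read, reference)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= min_identity:
        return best_name, best_score
    return None, best_score


# Toy 20-base "barcodes"; real COI barcodes are hundreds of bases long
library = {
    "species_A": "ACGTACGTACGTACGTACGT",
    "species_B": "ACGTTCGTACGAACGTACCT",
}

print(assign_read("ACGTACGTACGTACGTACGT", library))  # exact match to species_A
print(assign_read("ACGTACGTACGTACGTACGA", library))  # one mismatch, below 98%
```

In practice, reads would be aligned or searched against the library with dedicated tools, and the identity threshold would need to be justified for the taxa in question.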
The goal of a new technology development project at the Biodiversity Institute of Ontario is to
optimize protocols for data generation and bioinformatics analyses of an environmental barcoding
system for biomonitoring applications. The 454-FLX pyrosequencing facility has been generating
data from sentinel groups of benthic macroinvertebrates, including mayflies (Ephemeroptera),
stoneflies (Plecoptera), and caddisflies (Trichoptera), collectively called “EPTs”. Because of their sensitivity to
environmental changes, EPTs are key taxa in environmental biomonitoring studies for freshwater
quality assessment (Bonada et al. 2006). If these taxa are to be used in environmental barcoding
with a pyrosequencing approach, we need to understand and optimize the recovery of their DNA
barcode sequences directly from environmental samples. Although groundbreaking work at BIO has
shown this approach to be feasible (Hajibabaei et al. 2011), molecular tools for assessing multi-template
polymerase chain reaction (PCR) prior to pyrosequencing analysis are not available. This thesis
employs six bioindicator species of Ephemeroptera and Trichoptera (three species from each order)
as a model to assess various factors in developing a reliable biodiversity and biomonitoring
assessment approach by using pyrosequencing. Quantitative PCR technology will be employed to
assess the behavior and efficiency of PCR primers used in the multi-template PCR necessary to
perform amplicon-based pyrosequencing.
Quantitative PCR
The polymerase chain reaction (PCR) can produce millions of copies of a particular DNA
sequence in approximately 1.5-2 hours. This automated process avoids the need to clone DNA in
bacteria in order to amplify it. Real-time polymerase chain reaction, or quantitative polymerase chain
reaction (qPCR), is similar to conventional PCR, but the PCR amplicons are detected and quantified as
they are generated. Hence, qPCR has been used for quantifying the PCR product of one or more
specific sequences in a DNA sample.
Early efforts to harness the quantifying power of PCR faced limitations: data had to be
generated by removing an aliquot of the reaction at specific cycles, by making a serial dilution
of the PCR product or, in some cases, by including an internal control (Becker et al., 1996; Kennedy,
2011; Ozawa et al., 1990; Piatak et al., 1993; Roux, 2009). Although these methods are able to
quantify the PCR product to some extent, they are time-consuming and labour-intensive, so their
use has been limited.
Quantitative PCR has had a great impact on molecular biology and has simplified quantification. The
technique is based on monitoring the amount of fluorescence in each cycle, which
is produced by a dye that binds to the PCR amplicon as it is generated. The amount of PCR product
can be plotted as a function of cycle number. With this method there is no longer a need to
sample a reaction at various cycles or to use labour-intensive techniques to predict the
exponential phase. The exponential region is recognized by plotting fluorescence on a
logarithmic scale. The cycle at which the fluorescence level first rises significantly
above background levels reflects the initial template amount. The quantification cycles
(Cqs) are determined by a fluorescence threshold: the “CT value” is the number of cycles
required for each template to pass the threshold. Figure 1 provides an example of the differences in
CT value and cycle number which may be detected in a qPCR experiment.
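The threshold principle described above can be illustrated with a short sketch. It uses an idealized, simulated amplification curve (perfect doubling per cycle up to a plateau) rather than instrument data, and the function names, starting signals and threshold are assumptions chosen only for illustration. The CT value is taken as the fractional cycle at which fluorescence first crosses the threshold, estimated by linear interpolation between two consecutive readings.

```python
def simulate_curve(start_signal, cycles=40, plateau=1.0):
    """Idealized fluorescence after each cycle: doubling until a plateau is reached."""
    return [min(start_signal * 2 ** c, plateau) for c in range(1, cycles + 1)]


def threshold_cycle(fluorescence, threshold):
    """Fractional cycle at which the signal first crosses the threshold.

    fluorescence[i] is the reading after cycle i + 1. Returns None if the
    threshold is never crossed between two consecutive readings.
    """
    for i in range(1, len(fluorescence)):
        prev, curr = fluorescence[i - 1], fluorescence[i]
        if prev < threshold <= curr:
            fraction = (threshold - prev) / (curr - prev)
            return i + fraction  # prev is the reading after cycle i
    return None


# Two hypothetical templates; the first starts with 8 times more signal
high_template = simulate_curve(8e-6)
low_template = simulate_curve(1e-6)

ct_high = threshold_cycle(high_template, threshold=0.1)
ct_low = threshold_cycle(low_template, threshold=0.1)
print(ct_high, ct_low, ct_low - ct_high)
```

Under perfect doubling, an eight-fold difference in starting template corresponds to a difference of three cycles in CT value, which is the relationship that makes CT values informative about initial template amounts.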
Why qPCR?
Although PCR-based techniques have had a great influence on the field of molecular biology, the
post-PCR methods used to analyze their results are limited. Gel electrophoresis is one of the
most common techniques for visualizing PCR products. Although it is fast, easy and inexpensive, it
cannot distinguish between different products with the same molecular weight.
Soon after its introduction in 1996, qPCR became an everyday tool in molecular
labs. qPCR instruments have simplified amplicon recognition by providing the ability
to monitor amplifications during each cycle. All available instruments designed for qPCR
experiments measure the progress of PCR amplification by tracking the changes in the
fluorescence level coming from each amplicon, in each cycle within each PCR reaction. In
addition, these measures can be taken without opening the instrument so the risk of
contamination decreases significantly.
Quantitative PCR offers many advantages for quantitative analysis and detection of
specific target genes and has been widely used in research and diagnostics. Its advantages
include the ability to monitor the reaction continuously, rapid running time, potential for
high-throughput analysis, high sensitivity (~3 pg or 1 genome equivalent of DNA) and a wide
dynamic range, as it can detect from 10 to 10^10 copies of target DNA. Conversely, the
disadvantages of this technique include a limited capacity for multiplexing, the requirement for
high levels of optimization and the need for technical skills above those required for conventional
PCR.
In this study, I employ qPCR to evaluate primer-binding affinities in different primer sets used in
multi-template PCR amplification of bulk environmental samples prior to pyrosequencing.
Objectives
The objective of this study is to improve the present understanding of how molecular
information from DNA barcodes can be used in biodiversity assessment,
using species from two orders of the class Insecta as models. More specifically, I
examine the use of barcoding as a tool for biodiversity assessment and biomonitoring
of environmental samples. I predict that pyrosequencing will provide a more robust and
comprehensive species-level biodiversity measure from bulk samples, at a much
faster pace, than other approaches such as cloning and Sanger sequencing.
I also predict that primers that bind to specific sites (100% matching) in the target species
will lead to better amplification efficiency as reflected in qPCR analysis. Moreover, the
proportion of pyrosequencing reads obtained from a mixed-template PCR analysis will reflect the
amplification efficiency of qPCR for each target-specific primer set.
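The reasoning behind the second prediction can be made concrete with a textbook model of PCR, in which a template with starting copy number N0 and per-cycle efficiency E (between 0 and 1) yields N0 × (1 + E)^n copies after n cycles. The sketch below is an idealization under that assumption and is not the analysis performed in this thesis; the species labels, copy numbers and efficiencies are hypothetical.

```python
def expected_proportions(start_copies, efficiencies, cycles):
    """Expected share of each template in the final product, assuming
    N = N0 * (1 + E) ** cycles with a constant efficiency E per template."""
    products = {
        name: start_copies[name] * (1.0 + efficiencies[name]) ** cycles
        for name in start_copies
    }
    total = sum(products.values())
    return {name: amount / total for name, amount in products.items()}


# Hypothetical mixed template: equal starting copies, unequal primer efficiency
start = {"perfect_match": 1000, "mismatched": 1000}
efficiency = {"perfect_match": 1.0, "mismatched": 0.8}

for n in (0, 10, 30):
    print(n, expected_proportions(start, efficiency, n))
```

Even a modest difference in efficiency compounds over the cycles, so a template that starts at equal abundance can end up strongly under-represented in the amplicon pool, and hence in the sequence reads.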
MATERIAL AND METHODS
Target species selection and specimen collection
Three local species from the insect order Trichoptera (Ceratopsyche bronta, C. sparna,
and Chimarra obscura) and three local species from the insect order Ephemeroptera
(Maccaffertium interpunctatum, M. modestum, and Caenis diminuta) were selected to test the
effect of primer bias. In both orders, two species were selected from the same genus and one
species was selected from a distantly related genus in the same order. These insect orders
were selected because of their importance in freshwater biomonitoring programs. Target species
were chosen because of their abundance and availability, which allows access to fresh material
for downstream analyses.
Three sampling sites were selected for this study. The first two were near Fredericton,
New Brunswick. These sites were the Marysville Bridge on the Nashwaak River (45°59'4.19"N,
66°35'29.40"W), and the Renous River (46°47'46.65"N, 66°11'58.52"W). The third site was on
the Grand River in Ontario (43°50'0"N, 80°25'0"W) close to the Elora Conservation Area. Both
adult and larval insect samples were obtained from all three sites during the spring and summer
of 2009. A light trap technique was used to collect adults, and each individual was placed in a
1.5 ml tube containing 95% ethanol. A total of 140 Trichoptera individuals from the two New
Brunswick sites were placed in separate empty tubes, frozen overnight and pinned and identified
using the taxonomic key on the next day.
To select target samples, a total of 279 individual insects from the six species, either pinned
or sorted in ethanol, were assembled from the three sites, tentatively classified on the basis
of morphological characteristics, and arranged in three 96-well plates.
DNA extraction
A single leg from each individual was placed into a 10 MP lysing matrix tube (MP
Biomedicals Inc., Solon, Ohio USA) and homogenized using the MP FastPrep-24 Instrument
(MP Biomedicals Inc.) set at “6” for 30 seconds. DNA was extracted from each homogenized
tissue sample using a NucleoSpin tissue kit (MACHEREY-NAGEL Inc., Bethlehem,
Pennsylvania, USA) following the manufacturer’s instructions. The DNA was eluted with 70 µl
of molecular biology grade water pre-warmed to 70 °C.
Primer design and optimization
Routine DNA barcoding of target samples followed standard COI barcoding protocols
(Hajibabaei et al., 2005). A full-length COI DNA barcode was amplified using the
LCO1490/HCO2198 primers (Folmer et al., 1994). In order to evaluate primer binding bias,
additional primers were designed with 100% match to the sequence of the target species.
Previous studies focusing on the amount of DNA barcode sequence information needed for
species differentiation and resolution have shown that a partial fragment of the standard COI
barcoding region can be informative enough to discriminate species in most groups (Hajibabaei
et al., 2006; Hollingsworth et al., 2009; Janzen et al., 2005). Following these studies and by
taking advantage of available barcode sequences (for primer design), the species-specific primers
were designed within the COI standard barcode region.
After aligning the available barcodes for the target species, two regions for designing
primers were selected. Twelve primer sets were designed in total: six near the 5’ end of the
COI DNA barcode region (Set A) and six at the 3’ end (Set B). Primer Set A targeted a 143 bp
amplicon of the COI barcode region and Primer Set B targeted a longer fragment of 305 bp at the
opposite end of the COI barcode region. Routine primer design conventions, including high G+C
content (more than 50%), minimal secondary structure, primer length and self-complementarity,
were considered (Aird et al., 2011; Lakes, 2001). Primers were checked against these
conventions using tools available on the Integrated DNA Technologies, Inc. (Coralville, Iowa,
USA) website and were synthesized by the same company. All primers were received lyophilized
in tubes and diluted to 10 mM working solutions in molecular biology grade water. Table 1
provides details of
primer codes and their nucleotide sequences.
The PCR mixture consisted of 17.5 µl molecular biology grade water, 2.5 µl 10X
reaction buffer, 2 mM MgCl2 (from a 50 mM stock), 0.2 mM dNTP mix (from a 10 mM stock),
0.2 μM forward primer and 0.2 μM reverse primer (each from a 10 μM stock) and Invitrogen’s
Platinum Taq polymerase (5 U/μl) in a total volume of 25 μl. The amplification regime was set
to initial denaturing at 94°C for one min, followed by 4 cycles of denaturing at 94°C for 40 s,
annealing at 45°C for 40 s and extension at 72°C for one min. For the next 35 cycles, the
annealing temperature was increased to 50°C, followed by final extension at 72°C for 10 min.
Amplicons were visualized on 1.5% agarose gel using 0.3 µl of
ethidium bromide for 5 μl of each PCR product in TE 10X buffer.
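The relationship between the stock and final concentrations listed above follows the usual C1·V1 = C2·V2 dilution rule. The short Python sketch below is only an illustration of that arithmetic for a 25 µl reaction; the function name `stock_volume_ul` and the dictionary layout are hypothetical and are not part of the original protocol.

```python
def stock_volume_ul(final_conc, stock_conc, total_volume_ul):
    """Return the stock volume (in µl) needed to reach final_conc.

    Uses C1 * V1 = C2 * V2. final_conc and stock_conc must be given
    in the same unit (for example both in mM or both in µM).
    """
    if stock_conc <= 0:
        raise ValueError("stock concentration must be positive")
    return final_conc * total_volume_ul / stock_conc


TOTAL_UL = 25.0  # total reaction volume stated in the text

# (final concentration, stock concentration) pairs taken from the text
reagents = {
    "MgCl2 (mM)": (2.0, 50.0),
    "dNTP mix (mM)": (0.2, 10.0),
    "forward primer (uM)": (0.2, 10.0),
    "reverse primer (uM)": (0.2, 10.0),
}

for name, (final, stock) in reagents.items():
    volume = stock_volume_ul(final, stock, TOTAL_UL)
    print(f"{name}: {volume:.2f} ul of stock per {TOTAL_UL:.0f} ul reaction")
```

Under these assumptions the MgCl2 stock contributes 1 µl and the dNTP and primer stocks 0.5 µl each per reaction.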
A consensus optimal condition (considering factors affecting PCR) was selected by
running test PCRs for each primer set for each species and selecting the condition where all
primer sets provided amplicons with relatively similar intensity on agarose gels. For example,
an optimal annealing temperature of 50°C was selected after gradient PCR was done at varying
annealing temperatures of 40°C, 43.5°C, 46°C, 50°C and 55°C.
Sanger sequencing validation of amplicons
Amplicons were verified to correspond to the targeted fragment of the COI barcode
region by direct sequencing using a bidirectional Sanger sequencing approach utilizing BigDye
chemistry version 3.1 (Applied Biosystems). Excess primers and dNTPs were removed from the
sequencing reaction using EdgeBio’s AutoDTR96 (Gaithersburg, MD, USA), after which the
purified products were run on an ABI 3730xl sequencer (Applied Biosystems, Foster City,
CA, USA).
Quantitative PCR
Figure 2 provides an overview of the qPCR experimental workflow. Below I provide the
details of major steps in this workflow.
1. Template selection and normalization
Quantitative PCR experiments were performed using three dilutions of DNA extracts (10^-1,
10^-2 and 10^-3) starting with the same concentration (250 ng/µl) in all tested specimens.
Additionally, normalized and purified amplicons from each species (amplified barcode region
using standard barcoding primers) were used as the DNA template for qPCR in six different
normalized dilutions.
2. Experimental design
Measurements with the NanoDrop spectrophotometer gave the DNA concentration
acquired from each target species (Table 2). Quantitative PCR optimization was performed for
dilutions of 10^-1, 10^-2 and 10^-3 of normalized genomic DNA extracts (250 ng/µl), whereas for
purified amplicon-based material (70 ng/µl), six dilutions (10^-1, 10^-2, 10^-3, 10^-4, 10^-5,
and 10^-6) were used. The experiment was designed as a matrix so that the template from each
species was tested with its own primers and with the primers of every other species. The matrix
layout also allowed the primer behavior among all target species to be studied.
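The matrix layout can be made concrete by enumerating every primer/template combination. The following Python sketch is a hypothetical illustration only: it assumes the six-dilution amplicon series, both primer sets and triplicate reactions, and the dictionary field names are invented for the example.

```python
from itertools import product

# The six target species named in this study
SPECIES = [
    "Chimarra obscura",
    "Ceratopsyche bronta",
    "Ceratopsyche sparna",
    "Caenis diminuta",
    "Maccaffertium modestum",
    "Maccaffertium interpunctatum",
]
PRIMER_SETS = ["A", "B"]
DILUTIONS = [10.0 ** -k for k in range(1, 7)]  # 10^-1 ... 10^-6
REPLICATES = 3  # assumption: every combination is run in triplicate


def build_matrix(species, primer_sets, dilutions, replicates):
    """List every primer-species x template-species reaction."""
    reactions = []
    for primer_sp, template_sp, pset, dilution, rep in product(
        species, species, primer_sets, dilutions, range(1, replicates + 1)
    ):
        reactions.append(
            {
                "primer_species": primer_sp,
                "template_species": template_sp,
                "primer_set": pset,
                "dilution": dilution,
                "replicate": rep,
                # True when the primers are tested on their own target
                "on_target": primer_sp == template_sp,
            }
        )
    return reactions


reactions = build_matrix(SPECIES, PRIMER_SETS, DILUTIONS, REPLICATES)
on_target = sum(r["on_target"] for r in reactions)
print(len(reactions), "reactions in total,", on_target, "of them on-target")
```

Listing the full matrix in this way makes it easy to see how many wells a complete cross-species test would need before committing plates to it.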
Three dilutions (1000, 250 and 50 pg/µl) were subsequently tested in qPCR (see below).
To obtain a presumably equal number of the target DNA template and to avoid fluctuations in
gene/mitochondrial copy number, normalized DNA extracts were used as a template to produce
an amplicon from the standard barcode region (Figure 2) that was then used as template for
subsequent qPCR analyses. The primer set LCO1490/HCO2198 (Folmer et al., 1994) was used to
amplify the full-barcode amplicons. The PCR conditions used to prepare amplicons for
Sanger sequencing were also used to prepare the amplicon-based material. All amplicons were
purified using the QIAquick 96 PCR Purification Kit (Qiagen Inc., Toronto, Ontario, Canada) and
subsequently quantified using the NanoDrop ND-1000 spectrophotometer (V3.3.0), and
normalized on the basis of the least concentrated amplicon.
3. Reaction conditions for qPCR experiments
QuantiTect SYBR® Green PCR kit (Qiagen) and Eppendorf Mastercycler® ep realplex Thermal
Cyclers were used for all qPCR experiments. Based on primer optimization results, the annealing
temperature was set at 50°C for all subsequent qPCR experiments. Other PCR variables were
optimized as well. For example, the concentration of MgCl2 was set to 7 mM final concentration
instead of 2 mM. Likewise, primer concentration was set to 900 nM after testing 300 nM, 600 nM,
900 nM and 1200 nM. PCR reactions also included 2× QuantiTect SYBR Green PCR master
mix (12.5 µl per reaction), 2 µl of DNA template (for both genomic DNA and full-barcode
amplicons as template) and RNase-free water to a total volume of 25 µl for each reaction.
4. Data analysis
All qPCR experiments were performed in triplicate to determine the stability of the
results and the average of the three replicates was used for the qPCR analysis (Rieu and Powers,
2009; Udvardi et al., 2008). Standard curves were generated with the instrument's default
software and the logarithm of relative amplification and threshold cycle (CT) values were
determined. The CT value is commonly used in reporting qPCR results and corresponds to the
cycle number at which the fluorescent signal of the reaction crosses the threshold line. The CT
value is inversely related to the amount of starting template: the more template present, the
fewer cycles are needed to reach the threshold. Assuming that PCR is operating with 100%
efficiency, the copy number of amplicons doubles every cycle.
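As an illustration of how a standard curve relates to amplification efficiency, the Python sketch below fits CT against log10 of the template dilution by ordinary least squares and converts the slope to an efficiency with the commonly used relation E = 10^(-1/slope) - 1. This is a minimal sketch and not the instrument software's actual algorithm; the function names and the CT values are hypothetical and were constructed to represent a perfectly doubling reaction.

```python
import math


def fit_standard_curve(dilutions, ct_values):
    """Least-squares fit of CT = slope * log10(dilution) + intercept."""
    if len(dilutions) != len(ct_values) or len(dilutions) < 2:
        raise ValueError("need at least two matching (dilution, CT) pairs")
    xs = [math.log10(d) for d in dilutions]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amplification_efficiency(slope):
    """Efficiency from the slope; 1.0 means copies double each cycle."""
    return 10.0 ** (-1.0 / slope) - 1.0


# Hypothetical CT values for a six-point tenfold dilution series in
# which each tenfold dilution costs log2(10) (about 3.32) cycles.
dilutions = [10.0 ** -k for k in range(1, 7)]
ct_values = [15.0 + math.log2(10) * k for k in range(1, 7)]

slope, intercept = fit_standard_curve(dilutions, ct_values)
print(f"slope = {slope:.3f}, efficiency = {amplification_efficiency(slope):.1%}")
```

A slope close to -3.32 therefore corresponds to the 100% efficiency case described above, while shallower or steeper slopes indicate departures from perfect doubling.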
The Eppendorf analysis software (Eppendorf mastercycler ep, realplex 2.0) was used to
analyze the results; CT values were recorded with a default threshold setting of 100 and an
automatic mode baseline setting for all target specimens. To ensure consistency of qPCR
experiments in different target species and primer combinations, a standard curve was generated
for each primer/species using the CT value with the threshold set at 100 in 6 different dilutions.
To describe the difference between the CT value of the target species and the CT value of a
non-target species amplified with the same primers (both for the COI gene), the ∆CT value was
calculated:
∆CT = CT (target species with specific primers) - CT (non-target species with the same primers)
I used 2^∆CT to calculate the copy number of generated amplicons in sample A relative to
that in sample B. For example, if ∆CT between species A and B is 7 cycles (it takes 7 more
cycles to see amplification of A), then there is:
2^7 = 128 times more B than A
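The worked example above can be reproduced with a few lines of Python. This is a minimal sketch under the stated assumption of 100% efficiency (a base of 2); the function names and the CT values 27 and 20 are hypothetical and chosen only so that the difference is 7 cycles.

```python
def delta_ct(ct_a, ct_b):
    """Difference in threshold cycles between sample A and sample B."""
    return ct_a - ct_b


def fold_difference(d_ct, efficiency=1.0):
    """Relative amount of B versus A implied by a CT difference.

    With efficiency = 1.0 the product doubles each cycle, so the
    base of the exponent is 2.
    """
    return (1.0 + efficiency) ** d_ct


# Hypothetical CT values: A crosses the threshold 7 cycles after B
d = delta_ct(27.0, 20.0)
print(fold_difference(d))  # 128.0
```

If the efficiency estimated from a standard curve is below 100%, passing it as `efficiency` gives a smaller fold difference for the same ∆CT.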
454 pyrosequencing
1. Experimental design
Amplicon-based metagenomics analysis is one of the major applications of next
generation sequencing (NGS) technology in biodiversity science. The amount of data produced
by NGS technology provides insights into the diversity of organisms in bulk samples in an
unprecedented way. Specifically, for amplicon-based analysis of biodiversity, Roche 454
pyrosequencing technology has been the most practical choice since this technology produces
longer reads as compared to other available NGS options, namely Illumina and SOLiD (Pandey
et al., 2011).
Since 454 pyrosequencing and other NGS approaches are becoming the main tools for
the analysis of mixed environmental samples, I used two experimental mixtures to test
primer-binding properties in 454 experiments. The first mixture consisted of an equimolar pool of the
DNA extracts from all six target species, while the second included an equimolar pool of purified
full-length COI DNA barcode amplicons of the target species (following the same procedure as
amplicon-based qPCR analysis described above). Full-length DNA barcode amplicons of each
target were normalized to 70 ng/µl, and a 10^-3 dilution (1 µl of PCR product in 999 µl of water) was used
to prepare the equimolar pool (Figure 3).
2. Multiplexing amplicons
In order to combine sequencing reactions for multiple specimens in a single 454
sequencing lane and then separate and track individual 454 sequencing reads, Multiplex
Identifier (MID) sequence tags, or molecular barcodes (Binladen et al., 2007), were designed for
each target species and incorporated in each species-specific primer set (A and B).
MIDs were also employed because the sequences of the primers themselves were not fully
discriminatory, and in order to rule out wrong assignments caused by mismatches or sequencing
errors. Each MID is a 10-base oligonucleotide (Table 3).
The 454 experiment was completed in two physically separated lanes in a 16-lane 454
picotiter plate. One lane was used for genomic DNA-based analysis (for primer sets A and B)
and the other for PCR product based material (for primer sets A and B).
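The way MIDs allow pooled reads to be traced back to a primer set can be sketched in a few lines of Python. The example below is only an illustration: it assumes that the 10-base MID sits at the very start of each read and that only exact matches are accepted, and the tag sequences and sample labels are placeholders rather than the MIDs listed in Table 3.

```python
def demultiplex(reads, mid_to_sample, mid_length=10):
    """Assign reads to samples by an exact match on the leading MID.

    Returns (bins, unassigned): bins maps each sample label to its
    reads with the MID removed; unassigned keeps reads whose first
    mid_length bases match no known MID.
    """
    bins = {sample: [] for sample in mid_to_sample.values()}
    unassigned = []
    for read in reads:
        tag = read[:mid_length].upper()
        sample = mid_to_sample.get(tag)
        if sample is None:
            unassigned.append(read)
        else:
            bins[sample].append(read[mid_length:])
    return bins, unassigned


# Placeholder MIDs and reads, for illustration only
mids = {"ACGAGTGCGT": "T1", "ACGCTCGACA": "T2"}
reads = [
    "ACGAGTGCGTTTGGAACT",  # exact match to the first tag
    "ACGCTCGACAGGGTTA",    # exact match to the second tag
    "ACGAGTGCGATTGGAACT",  # one mismatch in the tag: left unassigned
]

bins, unassigned = demultiplex(reads, mids)
print({sample: len(seqs) for sample, seqs in bins.items()}, len(unassigned))
```

Requiring an exact match discards some usable reads, but it avoids assigning a read to the wrong primer set when a sequencing error falls inside the tag.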
3. Amplicon preparation
The first PCR was performed with target specific primers. Each PCR reaction contained 2
µl pooled DNA templates (250 ng/µl each), 17.5 µl molecular biology grade water, 2.5 µl 10×
reaction buffer, 1 µl 50× MgCl2 (50 mM), 0.5 µl dNTPs mix (10 mM), 0.5 µl forward primer (10
mM), 0.5 µl reverse primer (10 mM), and 0.5 µl Invitrogen's Platinum Taq polymerase (5 U/µl)
in a total volume of 25 µl. The PCR started with heated lid at 95°C for 5 min, followed by 15
cycles of 94°C for 40 sec, 43.5°C for 1 min, and 72°C for 30 sec, a final extension step at 72°C
for 5 min, and hold at 4°C. All target species amplicons were purified using Qiagen's MinElute
PCR purification columns and eluted in 50 µl molecular biology grade water. The amplicons
from the first PCR were used as template in the second PCR with similar conditions using 454
fusion-tailed primers in a 30-cycle amplification regime. The second PCR was used to attach
fusion tails to the amplicons to allow them to bind to the beads in the 454 emulsion PCR
(described below). For all PCRs the Eppendorf Mastercycler gradient S thermal cycler was used.
The results for PCR success were visualized by agarose gel electrophoresis (1.5%) and negative
controls were included in all experiments.
4. 454 Pyrosequencing amplicon library preparation
In 1.5 ml tubes, 22.5 µl of the generated amplicons were mixed with 22.5 µl of molecular
grade water. To this mix, 72 µl of AMPure beads were added and vortexed well. The mixture was
stored at room temperature for 10 minutes in a Magnetic Particle Concentrator (MPC). Unused
reagents and primer dimers were washed away with 70% ethanol and fragments were eluted with
10 µl of 1× Tris EDTA (TE) buffer.
Subsequently, the quantified libraries were amplified in micro-reactors through emulsion
PCR (emPCR) followed by Streptavidin bead enrichment and emulsion breaking. The beads
attached to amplified DNA fragments were denatured with 1N sodium hydroxide solution and
annealed to a specific sequencing primer. All these steps and subsequent sequencing steps on the
454 instrument were performed according to Roche-454 GS FLX amplicon sequencing manual
protocol updated in October 2009 and revised by November 2010 (Roche 2009).
5. 454 Data analysis framework
The FASTA files (FNA) and the quality score files (QUAL) were obtained from the 454
FLX Sequencer after signal processing. Both FNA and QUAL files were generated through
Roche signal processing software using amplicon processing with default settings.
Data analysis was performed using two approaches:
A. Manual analysis: sequences were inspected by eye in sequence editing software such as
BioEdit (Hall, 1999) and the automated quality-filtering step was omitted in favor of manual
filtering. This approach allowed the retrieval of a maximal number of reads for subsequent
analysis (see Results for details). I used all the generated sequences to count the number of
sequences generated by each primer set for each target species.
B. Automated analysis: In this approach the SeqTrim software (Falgueras et al. 2010) was used
for filtering low quality sequences based on set criteria (see below).
Automated sequence filtering
After obtaining both FNA and QUAL files, all MIDs were sorted with zero mismatches.
Using quality filter software SeqTrim (Falgueras et al., 2010), the sequences were filtered as
follows:
A quality filter with a 10 bp sliding window was applied to the sequences. If the Phred score
(Ewing et al. 1998; Ewing and Green 1998) was less than 20 for any window of 10 bp, the
sequence was deleted. After quality filtering, all sequences were sorted based on their
amplification primers and all sequences shorter than 80 bp were removed. The remaining
sequences were clustered using the UClust program (Edgar, 2010) and all clusters with fewer than
3 reads were removed. Finally, all sequences were compared against the reference library using
MegaBLAST and the number of reads for each target species was determined. The above routine was
performed using a Perl script (Wall et al., 2000) and filtering was completed using SeqTrim
filtering software.
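The sliding-window criterion can be illustrated with a short Python sketch. It is not SeqTrim's implementation and not the Perl script used here: it assumes that the window score is the mean Phred value of the 10 bases in the window, it applies the 80 bp length cut-off directly after the quality check, and it leaves out primer sorting, clustering and the MegaBLAST search. All names are hypothetical.

```python
def passes_sliding_window(quals, window=10, min_mean_q=20):
    """Return True if every window of `window` bases has mean Phred >= min_mean_q."""
    if len(quals) < window:
        return False  # too short to evaluate a single full window
    threshold = min_mean_q * window
    window_sum = sum(quals[:window])
    if window_sum < threshold:
        return False
    # Slide one base at a time, updating the running sum
    for i in range(window, len(quals)):
        window_sum += quals[i] - quals[i - window]
        if window_sum < threshold:
            return False
    return True


def filter_reads(records, min_length=80):
    """Keep (read_id, sequence) pairs that pass quality and length filters.

    `records` is an iterable of (read_id, sequence, quals) tuples, where
    quals is a list of integer Phred scores, one per base.
    """
    kept = []
    for read_id, seq, quals in records:
        if len(seq) >= min_length and passes_sliding_window(quals):
            kept.append((read_id, seq))
    return kept


# Hypothetical reads: one clean, one with a poor stretch, one too short
records = [
    ("read1", "A" * 100, [30] * 100),
    ("read2", "C" * 100, [30] * 50 + [10] * 10 + [30] * 40),
    ("read3", "G" * 60, [35] * 60),
]
print([read_id for read_id, _ in filter_reads(records)])
```

Because a single poor window removes the whole read, a filter of this kind is strict, which is consistent with only a small share of reads surviving automated filtering.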
Manual sequence analysis
In the manual sequence analysis method I omitted the automated filtering step to keep all
sequences and used BioEdit and MEGA to sort sequences and eliminate low quality sequences. I
sorted the sequences based on the multiplex identifiers (MIDs) with zero mismatches. After
sorting the reads for each MID based on the forward and reverse primer sequences, all MIDs and
primers were trimmed and the remaining sequences were sorted by length, keeping those of at
least 100 bp, to be prepared for alignment. Sequences were then aligned using available
reference sequences of the six target species. Finally, all sequences were clustered by
constructing a neighbor-joining (NJ) tree from Kimura 2-parameter sequence divergence estimates
in MEGA4 (Tamura et al., 2007). I used this tree to cluster my sequences so I could count the
sequences belonging to each species more effectively.
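For readers unfamiliar with the Kimura 2-parameter (K2P) model, the distance between two aligned sequences is d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of sites differing by a transition and by a transversion. The Python sketch below illustrates this formula only; it is not MEGA4's code, the function name is hypothetical, and it assumes that sites with gaps or ambiguous bases are simply skipped.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    term1 = 1.0 - 2.0 * p - q
    term2 = 1.0 - 2.0 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1) - 0.25 * math.log(term2)


# Two made-up 10-base sequences differing by one transition (A -> G)
print(round(k2p_distance("ACGTACGTAC", "GCGTACGTAC"), 4))
```

A matrix of such pairwise distances is the input from which a neighbor-joining tree is built.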
RESULTS
Quantitative PCR Results
The 1000 pg/µl and 50 pg/µl template dilutions gave CT values that were either too low
(≤ 10 cycles) or too high (≥ 38 cycles), respectively, using a fluorescence threshold of 100.
The 250 pg/µl dilution gave CT values in the expected range (≥ 10 and ≤ 38).
The results from qPCR experiments using total genomic DNA as template did not show a
general trend that could either support or refute the expected higher efficiency of species-specific
primers in amplifying target species in any of the six species tested. Moreover, I could not
generate standard curves based on genomic DNA results because of the lack of data points in
several qPCR runs for different combinations; the results were therefore not sufficiently
consistent to allow generation of a standard curve.
In fact, there were cases of target species being less efficiently amplified as compared to
non-targets (Figure 4, Appendix 1). Thus, primer match may not be the only factor at play in this
experimental design and availability of target mitochondrial DNA might vary to the point that it
may offset potential primer mismatch. Hence, normalized PCR products were used to test the
primer binding bias.
Based on the results from experiments using genomic DNA templates, it was
hypothesized that the fluctuations and non-linear results might be due to variation in the
mitochondrial copy number or non-specific amplification.
Using the full-length DNA barcode amplicons as template for qPCR allowed me to
generate consistent standard curves for different target species (Figure 5, Appendix 2). It is
important to note that there are fluctuations in the slope of standard curves, which may indicate
different efficiencies of primer binding at different concentrations of template DNA (Figure 5).
In the qPCR experiments using genomic DNA as template, amplification only occurred
with 10^-1 and 10^-2 dilutions of the template DNA (250 ng/µl) with the exception of E1 (C.
diminuta) primers (sets A and B) that produced detectable amplification with a template dilution
of 10^-3 as well (Tables 4 and 5). As previously mentioned, I noted a lack of consistency in
standard curve calculations of genomic DNA-based experiments (see above), and with the small
number of data points in the actual cross species qPCR experiments, I decided not to pursue this
line of experimentation further.
Unlike standard curves generated using genomic DNA as the template, standard curves
using full-length amplicon templates were consistent across primer sets (Figure 5, Tables 6 and 7).
Hence, I predict that amplicon-based qPCR analysis of cross species primer tests should provide
reliable results on the effect of primer specificity in qPCR efficiency.
Results of the amplicon-based qPCR in set A supported this hypothesis. With the
exception of two primer sets, all target-specific primers amplified their target species more
efficiently than non-target species (Tables 4 and 5). The first exceptional case was the primer set
designed for Maccaffertium modestum (E2mod), which amplified Maccaffertium interpunctatum
(E3int) in earlier cycles (i.e. more efficiently) than its own target species. The second exception
involved the primer set designed for Caenis diminuta (E1dim), which amplified Ceratopsyche
bronta (T2bro), Maccaffertium modestum (E2mod) and Maccaffertium interpunctatum (E3int) in
earlier cycles than it amplified its own target. This observation is important because primer
E1dim was designed for an Ephemeroptera species but in fact amplified a Trichoptera species more
efficiently (Tables 4 and 5).
The number of species that passed the threshold in all six dilutions in qPCR
was higher when using primer set B than set A (Tables 4 to 7). For example, using primer set A
for Chimarra obscura (T1obs), no species other than the target passed the threshold in all
dilutions. However, the set B primers designed for this species produced positive qPCR results
for other species (Tables 6 to 8). In the majority of experiments using primer set B, target
species amplified more efficiently (passed the threshold at lower cycle numbers), except for
primers designed for C. diminuta (E1dim) and M. modestum (E2mod).
Relative Amplified Copies (RAC) Analysis
The Relative Amplified Copies (RAC) approach shows the rate of amplification of each
species as compared to the target species of a specific primer set (Jolla, 2004). Based on qPCR
results from experiments using full-length barcode amplicons as template, RAC plots were
generated for each target species for both primer sets (A and B) and for all dilutions. The plot for
C. obscura as the target species (Figure 6) shows the importance of dilution in relative
amplification of non-target species. All non-target species were amplified less efficiently as
compared to target species. However, substantial differences exist in the relative amplification of
the non-target species with RAC values ranging from 212 for C. bronta to 76 million for C.
sparna at a template dilution of 0.1. Moreover, only three of the five non-target species amplified
with the 10^-2 template dilution and none amplified at higher dilutions. Similar analyses were
conducted for all other combinations of target species and primer sets A and set B (Appendix 3).
Quantitative and qualitative analysis of pyrosequencing reads
Using both primer sets A and B, 454 pyrosequencing reads were obtained for amplicons
directly generated from genomic DNA mixtures of the target species and from mixtures of
full-length COI barcode amplicons. A total of 10,034 reads were generated from genomic DNA
templates and 13,681 reads from full-length COI barcode amplicons.
The distribution of sequence read lengths has been used as a measure to evaluate
pyrosequencing run quality. I sequenced two amplicons of 143 bp (set A) and 305 bp (set B).
However, the addition of PCR primers, pyrosequencing fusion tail and MID tags increases the
total size of each amplicon by about 50 bases. Hence, sequence reads should optimally be
distributed around 193 bp (set A) and 355 bp (set B).
Automated sequence analysis conducted by SeqTrim software greatly reduced the
number of sequence reads as compared to raw sequences obtained. Only 6.5% and 4.6% of the
reads passed SeqTrim in genomic DNA-based and amplicon-based analyses, respectively. This
rather small proportion of reads did not provide a stable trend for target species specificity of
primers and compatibility of 454 analysis and qPCR results. For example, in the automated
analysis of genomic DNA templates, both primer sets designed for C. obscura (T1) showed
results only for their target species (108 sequences for set A and 35 sequences for set B) (Table
13). Conversely, primer set A for C. bronta (T2) did not produce any results for the target
species and set B only produced 34 reads. However, primer set B for T2 produced 179 reads for
non-target Ephemeroptera species, C. diminuta (E1), which is much more than the number of
reads produced for its target species.
Fewer reads were obtained after SeqTrim filtering of genomic DNA templates compared
to pyrosequenced COI amplicons (Tables 11 and 12; 10,034 reads from pooled genomic DNA
templates and 13,681 reads from pooled full-length barcode amplicon templates). In 4 out of 6
cases, target species produced more reads than non-targets, but these results were only obtained
by one of the two primer sets in each case (Table 13). On the other hand, there were 4 cases in
which the target species did not show the highest amount of amplification. As an example, 4
reads were obtained for the target species using C. obscura (T1) primer set B while 8 reads were
obtained for C. diminuta (E1) and 30 reads for C. bronta (T2) (Table 13).
The manual analysis of 454 sequences provided a higher number of sequence reads
compared to SeqTrim (Table 13). In other words, many sequences that did not pass SeqTrim
filter were retrieved after manual inspection and editing of each pyrosequence read.
Consequently, 21.5% of sequences obtained from genomic DNA templates and 31.4% of
sequence reads from amplicon templates passed manual inspection and were used for subsequent
comparisons.
In the manual analysis of the pyrosequences obtained from DNA-based pooled material,
target species produced more reads in both primer sets with the exception of C. bronta (T2). In
this case, E2 (M. modestum) and E1 (C. diminuta) produced more reads, for primer sets A and B,
respectively (Table 13). In manual analysis of amplicon-based material, target species produced
more reads in both primer sets with two exceptions. In one case, T1 (C. obscura) COI amplicons
produced 1.6X more reads than the target species using T2 (C. bronta) primer set B (Table 13).
The second exceptional case was manual analysis of DNA-based material using E2 (M.
modestum) primer set B, for which T1 (C. obscura) COI amplicons produced 1.9X more reads
than the target species.
An important factor in the utility of NGS is the ability to parallelize the analysis of many
templates in one sequencing reaction. Aside from using this approach in analyzing mixed DNA
templates such as environmental samples, sets of specific oligonucleotide tags (MIDs) can be
used for mixing amplicons and then retrieving corresponding sequences bioinformatically.
However, the efficiency of this MID approach needs to be evaluated before it can be used
reliably in applications. Here, we used 6 MIDs for our target species primer sets (A and
B). Based on the analysis of raw 454 reads, it is clear that the MID approach can provide a rather
uniform distribution of sequence reads for each MID (Figure 7, Table 1 in Appendix 3).
However, we observed some fluctuations in the number of reads per MID in amplicon-based
material (Figure 8).
DISCUSSION
Since the early days of NGS, most of its applications in biodiversity science have been
focused on discovering unknown biodiversity from the bottom of the ocean (Sogin et al., 2006)
to the human microbiome (Gilbert et al., 2008). These applications have mainly been focused on
data generation and biological interpretations by using much higher sequencing capacity offered
by NGS platforms. However, some recent studies have illuminated the importance of NGS data
quality and the fact that low quality data may lead to misleading biological interpretations
(Quince et al., 2009). NGS workflow and potential biases associated with it become even more
critical in applications that involve targeting specific groups of organisms, especially in socioeconomically important taxa such as pathogens, pests and bioindicator species. This study was
conducted to specifically address the issue of amplification bias in NGS analysis of DNA
barcodes (and similar marker gene amplicons) from two sets of closely related target species
(freshwater bioindicator species in this case).
NGS technologies, in general, have made the genomic analysis of environmental samples
such as benthos, soil, water or bulk samples of terrestrial or marine biota more feasible. For
example, several recent studies have demonstrated the accuracy and reproducibility of the 454
pyrosequencing results (Hajibabaei et al., 2011; Schwartz et al., 2011; Shokralla et al., 2012).
More specifically, short fragments of COI DNA barcodes were successful in providing data for
identification of freshwater invertebrates for biomonitoring purposes (Hajibabaei et al., 2011).
The purpose of this study was to advance our understanding of genomics analysis of
mixed environmental samples by developing a qPCR-based approach with customized primers to
quantify species from mixed samples and to optimize and select primers for downstream
next-generation sequencing analysis. This work will hopefully help us use NGS technologies in
real-world biomonitoring applications. The majority of studies using qPCR have focused on gene
expression analysis and methods developed to analyze and interpret qPCR results are mainly
geared towards gene expression (Livak & Schmittgen, 2001; Ohtsu et al., 2007; Selinger et al.,
1998; Torres et al., 2008; Wang, 2003). However, in recent years qPCR has been used in the
molecular diagnosis of infectious diseases or genetic defects (Francois et al., 2003; Menard et
al., 2008). Because this study aimed at evaluating qPCR as a method to test efficiency and
behavior of primers in multi-template amplifications and no reference gene or target was
involved, I decided to use an alternate approach to analyze the data.
Primer behavior in multi-template PCR
Previous studies on multi-template PCR bias in template-to-product ratios on bacteria
suggested that there are numerous uncertainties about the source of this problem (Polz and
Cavanaugh 1998; Thompson et al., 2002; Acinas et al. 2005). However, aside from a number of
studies mainly conducted prior to the introduction of NGS, the issue of primer selection for
multi-template PCR remains understudied. Perhaps an important factor that has contributed to
this problem is the notion of universal primers and that selecting genomic targets with conserved
primer binding sites is the only solution to achieve optimal amplification (Sogin et al., 2006).
Consequently, the majority of studies that target environmental samples for NGS analysis use
ribosomal markers such as 16S rDNA and 18S rDNA genes for targeting prokaryotes and
eukaryotes, respectively. The proponents of these genes have suggested that the difficulty in
designing primers for other genes such as COI DNA barcodes is a reason to abandon using these
markers in NGS studies of environmental samples (Creer et al., 2010; Wang et al., 2007).
However, recent work has shown that differential amplification (PCR bias) is problematic in
NGS analysis even for ribosomal genes (Hajibabaei et al., 2011; Schwartz et al., 2011). It is
widely accepted that quantitative analysis of NGS amplicon results should be interpreted with
caution. In many cases, a number of specific taxonomic or functional genes are targets of NGS
analysis (Hajibabaei et al., 2011). These cases demand better understanding of primer behavior
in multi-template PCR.
Quantitative PCR as a tool for target identification
Different commercially available qPCR tests are increasingly used for measuring levels
of gene expression and for target identification in molecular diagnostic tests of genetic diseases
or infectious agents. These tests typically use differential amplification as a measure for a
specific gene expression or gene target validation. Because qPCR instruments and reagents are
relatively cheap and tests can be performed rather quickly and do not require large lab
operations, qPCR is now a workhorse in many molecular biology labs. In this study, I
demonstrated that qPCR has the potential to be used for validation of PCR primers before they
are used in more expensive NGS analysis of multi-template DNA such as bulk environmental
samples. However, my experimental design challenged the sensitivity of qPCR when genomic
DNA was used as the template. Hence, results obtained from these comparisons could not
provide conclusive evidence for primer specificity. Nevertheless, when more uniform amplicons
were used as templates, qPCR behaved mainly as expected and target species showed more
efficient amplification in the majority of cases.
This study is the first attempt to use qPCR for validating primers for NGS analysis of
multi-template PCR for taxonomic identifications. However, qPCR has recently been suggested
as a method for library quantification in NGS analysis of whole genomes (Buehler et al., 2010).
In this case, specific primers target adaptor sequences (common among all genomic fragments)
at different dilutions and qPCR analysis is conducted at different steps of library preparation to
provide a guide for selecting the optimal dilution for downstream NGS. However, in the current
study, primers that target different taxa were selected based on their target specificity in qPCR
analysis of amplicons, which offsets fluctuations in target copy number in genomic DNA.
Primers with the best target specificity can then be used in NGS experiments.
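As a minimal illustration of this selection step, the following Python sketch computes the difference in CT between each non-target template and the target template for one primer set. The function name `delta_ct` and the use of `None` for reactions that did not pass threshold are assumptions made for this example and are not part of the thesis workflow. The CT values are those listed for the T1 primers (set A) at the 0.1 dilution in Table 4.

```python
def delta_ct(ct_by_template, target):
    """Return CT(non-target) - CT(target) for every non-target template.

    A positive value means the target crossed the threshold earlier than the
    non-target. Templates that never crossed the threshold (None) are reported
    as None, because no CT difference can be computed for them.
    """
    target_ct = ct_by_template[target]
    if target_ct is None:
        raise ValueError("target template did not pass threshold")
    return {
        template: (None if ct is None else ct - target_ct)
        for template, ct in ct_by_template.items()
        if template != target
    }


# CT values for the T1 primers (set A) at the 0.1 dilution, as listed in Table 4.
t1_set_a = {"T1": 3.68, "T2": 11.41, "T3": 29.86, "E1": 17.76, "E2": 26.44, "E3": 27.83}

differences = delta_ct(t1_set_a, target="T1")

# Assumed decision rule: the primer set favours its target at this dilution
# if every computable difference is positive.
favours_target = all(d is None or d > 0 for d in differences.values())
```

Under this assumed rule, the smallest positive difference identifies the non-target template most likely to compete with the target in a mixed reaction.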
Optimal NGS analysis of target genes and taxa
Although NGS approaches have the capacity to generate a large volume of DNA
sequences, they often involve tedious workflows and require highly skilled bioinformaticians to
handle the data. Additionally, available software may not provide the optimal tools for data
filtering and analysis, as I observed in this study. Lack of efficient software dictated a rather
tedious manual approach in data editing, but this approach allowed recovery of many additional
sequences for downstream analysis.
Results from pyrosequencing analysis provided evidence for the utility of specific primer
sets for targeting genes and species of interest. In contrast to universal primers, combinations of
species-specific primers may provide a more reliable solution to avoid false negatives in NGS
analysis of bulk environmental samples. Fluctuations in gene copy number or differences in
biomass can influence the utility of any primer set in a mixture. However, even in our analysis of
genomic templates (which are potentially more prone to gene copy number fluctuation) we were
able to detect our target species using a combination of two target-specific primer sets (Table 8).
Moreover, the efficiency and slope of each primer set were calculated and are shown in Table 9.
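The text does not spell out how slope and efficiency were derived. A common convention, assumed here, is to fit CT against the log10 of the dilution and to convert the slope to an amplification efficiency with E = 10^(-1/slope) - 1. The Python sketch below follows that convention; the function names are hypothetical and the CT values are idealized, not measured data.

```python
def standard_curve(log_dilutions, ct_values):
    """Least-squares slope and intercept of CT against log10 dilution."""
    n = len(log_dilutions)
    mean_x = sum(log_dilutions) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log_dilutions)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_dilutions, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amplification_efficiency(slope):
    """Convert a standard-curve slope to efficiency (1.0 = doubling per cycle)."""
    return 10 ** (-1.0 / slope) - 1.0


# Idealized series (assumption): CT rises by 3.32 cycles per tenfold dilution.
log_dilutions = [-1, -2, -3, -4, -5, -6]
ct_values = [10.0 + 3.32 * i for i in range(6)]

slope, intercept = standard_curve(log_dilutions, ct_values)
efficiency = amplification_efficiency(slope)
```

A slope near -3.32 corresponds to an efficiency near 1.0 (100%); shallower or steeper slopes indicate departures from perfect doubling.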
Quantitative analysis of bulk samples using mitochondrial markers is challenging. The
mtDNA copy number per reaction can vary between species and tissue types in mixed
environmental samples. In my experiments, I controlled for the effect of variable gene copy number
per unit of biomass on amplification dynamics by performing another set of
experiments using normalized full-length DNA barcode amplicons as templates for my species-specific
PCRs (Figure 2). In a real environmental sample containing individuals of widely varying size,
this approach can help to interpret the NGS results and relate the number of sequences
generated to what is known about the biomass of each individual.
Automated SeqTrim analysis greatly reduced the number of sequences that were used in
comparative analysis of primers. Moreover, there was a trend towards target specificity using
some primer sets, suggesting that it is not reasonable to use only a few sequences in these
comparisons. However, substantially more sequences passed manual inspection and provided the
basis for our comparative analysis. We investigated two types of material as template for PCR,
genomic DNA and COI amplicons. Analysis of both genomic and amplicon templates shows a
rather strong trend towards more efficient sequencing (as reflected in number of sequence reads
in each comparison; Tables 9 to 12) by target-specific primers. However, there are a few
exceptions in both the genomic DNA and amplicon-based analyses. These exceptions may be due to a higher
number of available templates, especially for genomic DNA, as a consequence of variation in
mitochondrial DNA copy number. However, in the only exceptional case using genomic templates
(C. bronta (T2)), two different non-target species produced more sequences than the target species
for the two primer sets A and B. Had a single non-target species outcompeted the target
species with both primer sets, a higher mitochondrial copy number in that non-target species
would have been the more likely explanation. On the other hand, the only two exceptions in the analysis of amplicon templates
are linked to primer set B, and in both cases the non-target species that outcompetes the target
species is T1 (C. obscura).
Comparing qPCR and 454 results
I had hypothesized that qPCR results obtained for each target species using its specific
primer sets would be reflected in the corresponding pyrosequencing reads (Table 8). In each case
I ascertained if the target species was more efficiently amplified in qPCR and pyrosequenced.
These comparisons show more consistency between qPCR and pyrosequencing using amplicon
templates. However, results obtained for each primer set are somewhat different. In primer set A,
we observed an almost perfect agreement between qPCR and 454 results, supporting the
hypothesis that target species are amplified and sequenced more efficiently using their specific
(i.e. 100% matching) primers. In this comparison, with the exception of qPCR using the T2
primers, all other target species were more efficiently amplified and pyrosequenced. On the other
hand, qPCR and pyrosequencing data showed the same pattern using primer set B with the
exception of the T3 primers. However, the target species amplified and pyrosequenced more
efficiently in only half the cases.
Perhaps the most realistic (applicable) comparison is between qPCR
analysis of amplicon templates and pyrosequencing of genomic templates. In other words,
because qPCR testing of primers can potentially be used as a guide to select optimal target-specific
primers for pyrosequencing analysis, these comparisons can provide important insights
concerning the utility of this approach in a wider context. The majority of target species showed
a consistent pattern between qPCR of amplicon templates and pyrosequencing of genomic
templates. However, two target species, T3 and E3, were exceptions: for the primer sets of
both species, the qPCR and pyrosequencing results did not agree.
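The comparison described in this section can be sketched in Python as a simple concordance check. The decision rules below (lowest CT for qPCR, highest read count for pyrosequencing) and all function names are assumptions chosen for illustration, and the numbers are hypothetical, not values from the thesis tables.

```python
def favoured_by_qpcr(ct_by_template, target):
    """True if the target has the lowest CT among templates that amplified."""
    amplified = {t: ct for t, ct in ct_by_template.items() if ct is not None}
    return bool(amplified) and min(amplified, key=amplified.get) == target


def favoured_by_reads(reads_by_template, target):
    """True if the target yielded strictly more reads than every other template."""
    target_reads = reads_by_template[target]
    return all(target_reads > n for t, n in reads_by_template.items() if t != target)


def concordant(ct_by_template, reads_by_template, target):
    """True if qPCR and pyrosequencing give the same call for this primer set."""
    return favoured_by_qpcr(ct_by_template, target) == favoured_by_reads(
        reads_by_template, target
    )


# Hypothetical values for one primer set whose target is T1.
ct = {"T1": 12.0, "T2": 20.5, "E1": None}   # None = did not pass threshold
reads_agree = {"T1": 850, "T2": 120, "E1": 0}
reads_differ = {"T1": 100, "T2": 400, "E1": 0}

agreement_case = concordant(ct, reads_agree, target="T1")
disagreement_case = concordant(ct, reads_differ, target="T1")
```

Counting the primer sets for which this check holds gives a simple summary of agreement between the two methods.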
Towards a standardized approach for metagenomic analysis of environmental DNA
Based on recent advances in genomics instrumentation and bioinformatics tools it is clear
that biological sciences and a wide range of socio-economic applications will rely on genomics
information captured from environmental DNA. For example, a special issue of Molecular
Ecology (April 2012) was devoted to Environmental DNA focusing on recent advancements and
applications of NGS in ecological research (Baird & Hajibabaei, 2012; Shokralla et al., 2012;
Taberlet et al., 2012). The excitement about using NGS tools for many different applications has led
to many primary publications and may potentially lead to better technologies and tools.
However, the user communities (i.e. ecologists) should work with genomics and bioinformatics
experts to overcome technical challenges that can limit the usability of NGS in larger-scale
studies. Most studies on the ecological use of NGS are in fact one-off or proof-of-concept studies
(Callaway, 2012).
A recent Genome Web article highlights challenges in moving NGS tools to real-world
diagnostics and the fact that many industry leaders believe NGS is still far from being applicable
in standard diagnostics settings (Karow, 2012). These challenges mainly involve difficulties (and
black boxes) in data generation and workflows, as well as data quality and the lack of efficient
standard software to differentiate accurate sequence information from errors. In fact, this thesis
confirms the above-mentioned issues both at the level of molecular biology (PCR bias and
primer specificity) and bioinformatics analysis (automated versus manual sequence filtering).
This line of work will hopefully set the stage for the use of available tools such as qPCR and
specific primers for more efficient and standardized application of NGS in biodiversity analysis.
Based on my study, qPCR can be used efficiently when designing primers for specific target
groups of organisms chosen according to the environmental or ecological question. Commonly used
bioindicators for freshwater bioassessment, such as Trichoptera, Ephemeroptera and Plecoptera,
would be good potential targets for this type of study. With this approach, the designed
primers could be tested through qPCR across different individual species of these groups,
starting with both gDNA and amplicon material.
This study showed that qPCR can be used as a proxy for testing the efficiency of PCR
primers for amplifying mixed environmental samples for biomonitoring applications.
The primers designed for this study were able to perform relatively efficiently;
however, for further studies, slight changes to the design of the primers for in-groups are
recommended.
Genomic DNA material illustrates how the primers behave with individual samples, as in
real-life applications, while amplicon material makes it possible to deal efficiently with
sequence variation alone. Amplicon-based material also eliminates the variation in
mitochondrial copy number between different species. Once an optimal primer design is reached
that can amplify the majority of the targets in a relatively uniform pattern,
the results obtained from qPCR could be used to develop optimized amplicon-based 454 pyrosequencing
analysis of bulk environmental samples.
REFERENCES
Acinas, S. G., Sarma-Rupavtarm, R., Klepac-Ceraj, V., & Polz, M. F. (2005). PCR-Induced
sequence artifacts and bias: Insights from comparison of two 16S rRNA clone libraries
constructed from the same sample. American Society of Microbiology, 71(12), 8966-8969.
Aird, D., Ross, M. G., Chen, W.-S., Danielsson, M., Fennell, T., Russ, C., Jaffe, D. B., et al.
(2011). Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries.
Genome Biology, 12(2), R18.
Applied Biosystems. (2008). Guide to performing relative quantitation of gene expression using
real-time quantitative PCR. Applied Biosystems.
Baird, D. J., & Hajibabaei, M. (2012). Biomonitoring 2.0: a new paradigm in ecosystem
assessment made possible by next-generation DNA sequencing. Molecular ecology, 21(8),
2039-44.
Baird, D. J., Pascoe, T. J., Zhou, X., & Hajibabaei, M. (2011). Building freshwater
macroinvertebrate DNA-barcode libraries from reference collection material: formalin
preservation vs. specimen age. Journal of the North American Benthological Society, 30(1),
125-130.
Becker, A., Reith, A., Napiwotzki, J., & Kadenbach, B. (1996). A quantitative method of
determining initial amounts of DNA by polymerase chain reaction cycle titration using
digital imaging and a novel DNA stain. Analytical Biochemistry, 237(2), 204-207.
Binladen, J., Gilbert, M. T. P., Bollback, J. P., Panitz, F., Bendixen, C., Nielsen, R., &
Willerslev, E. (2007). The use of coded PCR primers enables high-throughput sequencing
of multiple homolog amplification products by 454 parallel sequencing. PLoS ONE, 2(2),
e197.
Bonada, N., Prat, N., Resh, V. H., & Statzner, B. (2006). Developments in aquatic insect
biomonitoring: a comparative analysis of recent approaches. Annual Review of
Entomology, 51, 495-523.
Buehler, B., Hogrefe, H. H., Scott, G., Ravi, H., Pabón-Peña, C., O’Brien, S., Formosa, R., et al.
(2010). Rapid quantification of DNA libraries for next-generation sequencing. Methods,
50(4), 15-18.
Callaway, E. (2012). A bloody boon for conservation. Nature News. Available:
http://www.nature.com/news/a-bloody-boon-for-conservation-1.10499
Claesson, M. J., Wang, Q., O’Sullivan, O., Greene-Diniz, R., Cole, J. R., Ross, R. P., & O’Toole,
P. W. (2010). Comparison of two next-generation sequencing technologies for resolving
highly complex microbiota composition using tandem variable 16S rRNA gene regions.
Nucleic Acids Research, 38(22), e200.
Creer, S., Fonseca, V. G., Porazinska, D. L., Giblin-Davis, R. M., Sung, W., Power, D. M.,
Packer, M., et al. (2010). Ultrasequencing of the meiofaunal biosphere: practice, pitfalls and
promises. Molecular Ecology, 19(s1), 4-20.
Edgar, R. C. (2010). Search and clustering orders of magnitude faster than BLAST.
Bioinformatics, 26(19), 2460-2461.
Ewing, B., & Green, P. (1998). Base-Calling of automated sequencer traces using Phred. II .
Error probabilities. Genome Research, 8(3), 186-194.
Ewing, B., Hillier, L., Wendl, M. C., & Green, P. (1998). Base-Calling of automated sequencer
traces Using Phred. I . Accuracy assessment. Genome Research, 8(3), 175-185.
Falgueras, J., Lara, A., Fernandez-Pozo, N., Canton, F., Perez-Trabado, G., & Claros, M. G.
(2010). SeqTrim: a high-throughput pipeline for preprocessing any type of sequence reads.
BMC Bioinformatics, 11(1), 38.
Floyd, R., Abebe, E., Papert, A., & Blaxter, M. (2002). Molecular barcodes for soil nematode
identification. Molecular ecology, 11(4), 839–50.
Folmer, O., Black, M., Hoeh, W., Lutz, R., & Vrijenhoek, R. (1994). DNA primers for
amplification of mitochondrial cytochrome c oxidase subunit I from diverse metazoan
invertebrates. Molecular Marine Biology and Biotechnology, 3(5), 294-299.
Francois, P., Pittet, D., Bento, M., Pepey, B., Vaudaux, P., Lew, D., & Schrenzel, J. (2003).
Rapid detection of methicillin-resistant Staphylococcus aureus directly from sterile or
nonsterile clinical samples by a new molecular assay. Journal of Clinical Microbiology,
41(1), 254-260.
Frézal, L., & Leblois, R. (2008). Four years of DNA barcoding: current advances and prospects.
Infection, genetics and evolution : Journal of Molecular Epidemiology and Evolutionary
Genetics in Infectious Diseases, 8(5), 727-36.
Gilbert, M. T. P., Kivisild, T., Grønnow, B., Andersen, P. K., Metspalu, E., Reidla, M., Tamm,
E., et al. (2008). Paleo-Eskimo mtDNA genome reveals matrilineal discontinuity in
Greenland. Science, 320(5884), 1787-9.
Gill, S. R., Pop, M., Deboy, R. T., Eckburg, P. B., Turnbaugh, P. J., Samuel, B. S., Gordon, J. I.,
et al. (2006). Metagenomic analysis of the human distal gut microbiome. Science,
312(5778), 1355-9.
Hajibabaei, M., DeWaard, J. R., Ivanova, N. V., Ratnasingham, S., Dooh, R. T., Kirk, S. L.,
Mackie, P. M., et al. (2005). Critical factors for assembling a high volume of DNA
barcodes. Philosophical Transactions of the Royal Society of London - Series B: Biological
Sciences, 360(1462), 1959-1967.
Hajibabaei, M., Shokralla, S., Zhou, X., Singer, G. A. C., & Baird, D. J. (2011). Environmental
barcoding: A Next-Generation sequencing approach for biomonitoring applications using
river benthos. PLoS ONE, 6(4), e17497.
Hajibabaei, M., Smith, M. A., Janzen, D. H., Rodriguez, J. J., Whitfield, J. B., & Hebert, P. D. N.
(2006). A minimalist barcode can identify a specimen whose DNA is degraded. Molecular
Ecology Notes, 6(4), 959-964.
Hall, T. A. (1999). BioEdit: a user-friendly biological sequence alignment editor and analysis
program for Windows 95/98/NT. Nucleic Acids Symposium Series, 41(41), 95-98.
Hebert, P. D. N., Cywinska, A., Ball, S. L., & DeWaard, J. R. (2003). Biological identifications
through DNA barcodes. Proceedings of the Royal Society B: Biological Sciences,
270(1512), 313–321.
Hebert, P. D. N., & Gregory, T. R. (2005). The promise of DNA barcoding for taxonomy.
Systematic Biology, 54(5), 852-859.
Hollingsworth, P. M., Forrest, L. L., Spouge, J. L., Hajibabaei, M., Ratnasingham, S., Van Der
Bank, M., Chase, M. W., et al. (2009). A DNA barcode for land plants. Proceedings of the
National Academy of Sciences of the United States of America, 106(31), 12794-12797.
Janzen, D. H., Hajibabaei, M., Burns, J. M., Hallwachs, W., Remigio, E., & Hebert, P. D. N.
(2005). Wedding biodiversity inventory of a large and complex Lepidoptera fauna with
DNA barcoding. Philosophical Transactions of the Royal Society of London - Series B:
Biological Sciences, 360(September), 1835-1845.
Karow, J. (2012). Experts discuss challenges of moving next-generation sequencing into
diagnostics. Genome Web. Available: http://www.genomeweb.com/sequencing/experts-discuss-challenges-moving-next-gen-sequencing-diagnostics
Kennedy, S. (2011). PCR troubleshooting and optimization: The essential guide. Wydmondham:
Caister Academic Press. 235p.
Lakes, F. (2001). Optimization of annealing temperature to reduce bias caused by a primer
mismatch in multitemplate PCR. American Society of Microbiology, 67(8), 3753-5.
Limpiyakorn, T., Kurisu, F., & Yagi, O. (2006). Development and application of real-time PCR
for quantification of specific ammonia-oxidizing bacteria in activated sludge of sewage
treatment systems. Applied Microbiology and Biotechnology, 72(5), 1004-13.
Livak, K. J., & Schmittgen, T. D. (2001). Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) Method. Methods, 25(4), 402-8.
Lovejoy, T. E. (1997). Biodiversity: what is it? In: ML Reaka-Kudla, DE Wilson & EO Wilson,
editors. Biodiversity II: understanding and protecting our biological resources, Joseph
Henry Press, Washington D.C., pp. 7-14.
Margulies, M., Egholm, M., Altman, W. E., Attiya, S., Bader, J. S., Bemben, L. A, Berka, J., et
al. (2005). Genome sequencing in microfabricated high-density picolitre reactors. Nature,
437(7057), 376-80.
Menard, J.-P., Fenollar, F., Henry, M., Bretelle, F., & Raoult, D. (2008). Molecular
quantification of Gardnerella vaginalis and Atopobium vaginae loads to predict bacterial
vaginosis. Clinical Infectious Diseases, 47(1), 33-43.
Meusnier, I., Singer, G. A., Landry, J.-F., Hickey, D. A., Hebert, P. D., & Hajibabaei, M. (2008).
A universal DNA mini-barcode for biodiversity analysis. BMC Genomics, 9(1), 214.
Mora, C., Tittensor, D. P., Adl, S., Simpson, A. G. B., & Worm, B. (2011). How many species
are there on Earth and in the ocean? PLoS Biology, 9(8), e1001127.
Munch, K., Boomsma, W., Willerslev, E., & Nielsen, R. (2008). Fast phylogenetic DNA
barcoding. Philosophical transactions of the Royal Society of London. Series B, Biological
sciences, 363(1512), 3997-4002.
Nash, R. (1989). The rights of nature: a history of environmental ethics. Madison: University of
Wisconsin Press. 304p.
Noss, R. F. (1990). Indicators for monitoring biodiversity: A hierarchical approach. Conservation
Biology, 4(4), 355-364.
Ohtsu, K., Smith, M. B., Emrich, S. J., Borsuk, L. a, Zhou, R., Chen, T., Zhang, X., et al. (2007).
Global gene expression analysis of the shoot apical meristem of maize (Zea mays L.). The
Plant Journal: for Cell and Molecular Biology, 52(3), 391-404.
Ozawa, T., Tanaka, M., Ikebe, S., Ohno, K., Kondo, T., & Mizuno, Y. (1990). Quantitative
determination of deleted mitochondrial DNA relative to normal DNA in parkinsonian
striatum by a kinetic PCR analysis. Biochemical and Biophysical Research
Communications, 172(2), 483-489.
Pfaffl, M. W. (2001). Quantification strategies in real-time PCR, In: Bustin S.A, editor. A-Z of
quantitative PCR. La Jolla: International University Line. pp 87-112.
Pandey, R. V., Nolte, V., Boenigk, J., & Schlotterer, C. (2011). CANGS DB: a stand-alone web-based database tool for processing, managing and analyzing 454 data in biodiversity studies.
BMC Research Notes, 4(1), 227.
Pang, S., Koyanagi, Y., Miles, S., Wiley, C., Vinters, H. V., & Chen, I. S. (1990). High levels of
unintegrated HIV-1 DNA in brain tissue of AIDS dementia patients. Nature, 343(6253), 85-89.
Piatak, M., Saag, M. S., Yang, L. C., Clark, S. J., Kappes, J. C., Luk, K. C., Hahn, B. H., et al.
(1993). Determination of plasma viral load in HIV-1 infection by quantitative competitive
polymerase chain reaction. AIDS, 7 Suppl 2, S65-S71.
Polz, M. F., & Cavanaugh, C. M. (1998). Bias in template-to-product ratios in multitemplate
PCR. Applied and Environmental Microbiology, 64(10), 3724-30.
QIAGEN. (2006). Critical factors for successful real-time PCR. QIAGEN. Available:
http://www.qiagen.com/selectlocation.aspx?redirect=%2fliterature%2frender.aspx%3fid%3d23490
Quince, C., Lanzen, A., Curtis, T. P., Davenport, R. J., Hall, N., Head, I. M., Read, L. F., & Sloan, W. T.
(2009). Accurate determination of microbial diversity from 454 pyrosequencing data. Nature
Methods, 6, 639–641.
Qu, X. D., Song, M. Y., Park, Y. S., Oh, Y. N., & Chon, T. S. (2008). Species abundance
patterns of benthic macroinvertebrate communities in polluted streams. Annales de
Limnologie International Journal of Limnology, 44(2), 119-133.
Rieu, I., and Powers, S. J. (2009). Real-time quantitative RT-PCR: design, calculations, and
statistics. American Society of Plant Biologists, 21(4), 1031-3.
Roche. (2006). Sequencing Method Manual, GS FLX Titanium Series. Available:
http://454.com/downloads/my454/documentation/gs-flx/method-manuals/GS-FLX-Titanium-Sequencing-Method-Manual-%28Nov2010%29.pdf
Roesch, L. F. W., Fulthorpe, R. R., Riva, A., Casella, G., Hadwin, A. K. M., Kent, A. D.,
Daroub, S. H., et al. (2007). Pyrosequencing enumerates and contrasts soil microbial
diversity. The International Society of Microbial Ecology, 1(4), 283-90.
Rosenberg, D. M., & Resh, V. H. (1993). Introduction to freshwater biomonitoring and benthic
macroinvertebrates. In: D. M. Rosenberg & V. H. Resh, editors. Freshwater Biomonitoring
and Benthic Macroinvertebrates. New York: Chapman and Hall. pp. 1-9.
Roux, K. H. (2009). Optimization and troubleshooting in PCR. Cold Spring Harbor protocols,
ip66.
Schmieder, R., & Edwards, R. (2011). Quality control and preprocessing of metagenomic
datasets. Bioinformatics, 27(6), 863-864.
Schwartz, S., Oren, R., & Ast, G. (2011). Detection and removal of biases in the analysis of next-generation sequencing reads. PLoS ONE, 6(1), e16685.
Selinger, L. B., Khachatourians, G. G., Byers, J. R., & Hynes, M. F. (1998). Expression of a
Bacillus thuringiensis delta-endotoxin gene by Bacillus pumilus. Canadian Journal of
Microbiology, 44(3), 259-69.
Smith, P. J., McVeagh, S. M., & Steinke, D. (2008). DNA barcoding for the identification of
smoked fish products. Journal of Fish Biology, 72(2), 464-471.
Shokralla, S., Spall, J. L., Gibson, J. F., & Hajibabaei, M. (2012). Next-generation sequencing
technologies for environmental DNA research. Molecular Ecology, 21(8), 1794-805.
Sogin, M. L., Morrison, H. G., Huber, J. A, Welch, D. M., Huse, S. M., Neal, P. R., Arrieta, J.
M., et al. (2006). Microbial diversity in the deep sea and the underexplored “rare
biosphere”. Proceedings of the National Academy of Sciences of the United States of
America, 103(32), 12115-20.
Suess, M. J. (1982). Examination of water for pollution control. Oxford: Pergamon Press. 554 p.
Su, Z., Ning, B., Fang, H., Hong, H., Perkins, R., Tong, W., & Shi, L. (2011). Next-generation
sequencing and its applications in molecular diagnostics. Expert Review of Molecular
Diagnostics, 11(3), 333-343.
Taberlet, P., Coissac, E., Hajibabaei, M., & Rieseberg, L. H. (2012). Environmental DNA.
Molecular Ecology, 21(8), 1789-93.
Tamura, K., Dudley, J., Nei, M., & Kumar, S. (2007). MEGA4: Molecular Evolutionary
Genetics Analysis (MEGA) software version 4.0. Molecular Biology and Evolution, 24(8),
1596-1599.
Thompson, J. R., Marcelino, L. A, & Polz, M. F. (2002). Heteroduplexes in mixed-template
amplifications: formation, consequence and elimination by “reconditioning PCR”. Nucleic
Acids Research, 30(9), 2083-8.
Torres, T. T., Metta, M., Ottenwälder, B., & Schlötterer, C. (2008). Gene expression profiling by
massively parallel sequencing. Genome Research, 18(1), 172-7.
Udvardi, M. K., Czechowski, T., & Scheible, W. R. (2008). Eleven golden rules of quantitative
RT-PCR. American Society of Plant Biologists, 20(7), 1736-7.
Wall, L., Christiansen, T. & Orwant, J. (2000). Programming Perl (3rd edition). O’Reilly and
Associates. 1104p.
Wang, C., Mitsuya, Y., Gharizadeh, B., Ronaghi, M., & Shafer, R. W. (2007). Characterization
of mutation spectra with ultra-deep pyrosequencing: application to HIV-1 drug resistance.
Genome Research, 17(8), 1195-201.
Wang, X. (2003). A PCR primer bank for quantitative gene expression analysis. Nucleic Acids
Research, 31(24), 154e-154.
Ward, R. C., Loftis, J. C., & McBride, G. B. (1986). The “data-rich but information-poor”
syndrome in water quality monitoring. Environmental Management, 10(3), 291-297.
TABLES:
Table 1. Species-specific oligonucleotide primers targeting two fragments of cytochrome c oxidase 1 (COI) gene for Set A
(40F/183R) and set B (240F/545R). T1=Chimarra obscura, T2=Ceratopsyche bronta, T3=Ceratopsyche sparna, E1=Caenis diminuta,
E2=Maccaffertium modestum, E3=Maccaffertium interpunctatum.
Set A (40F/183R)
Species   Primer code
T1        T_40_F, T_183_R
T2        T_40_F, T_183_R
T3        T_40_F, T_183_R
E1        E_40_F, E_183_R
E2        E_40_F, E_183_R
E3        E_40_F, E_183_R

Set B (240F/545R)
Species   Primer code   Sequence (5'-3')        Length (nt)
T1        T_240_F       CCAGACATAGCCTTCCCTCG    20
          T_545_R       GCTCCTGCTAATACAGG       17
T2        T_240_F       CCAGATATAGCATTCCCCCG    20
          T_545_R       GCTCCGGCTAAAACAGG       17
T3        T_240_F       CCTGATATAGCTTTTCCTCG    20
          T_545_R       GCTCCAGCAAGAACAGG       17
E1        E_240_F       CCAGATATGGCATTCCCCCG    20
          E_545_R       GCTCCTGCTAAAACAGG       17
E2        E_240_F       CCTGATATAGCCTTCCCACG    20
          E_545_R       GCTCCTGCTAATACAGG       17
E3        E_240_F       CCTGATATGGCCTTCCCCCG    20
          E_545_R       GCCCCTGCCAATACAGG       17
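To make the degree of primer divergence between species concrete, the following Python sketch counts position-wise differences between the set B forward primers listed in Table 1. It assumes, as a simplification, that each primer matches its own species perfectly, so that primer-to-primer differences stand in for primer-to-template mismatches. The function name `count_mismatches` is hypothetical.

```python
def count_mismatches(primer_a, primer_b):
    """Count positions at which two equal-length primers differ."""
    if len(primer_a) != len(primer_b):
        raise ValueError("primers must have equal length for a position-wise comparison")
    return sum(1 for a, b in zip(primer_a, primer_b) if a != b)


# Set B forward primers (240F), as listed in Table 1.
SET_B_FORWARD = {
    "T1": "CCAGACATAGCCTTCCCTCG",
    "T2": "CCAGATATAGCATTCCCCCG",
    "T3": "CCTGATATAGCTTTTCCTCG",
    "E1": "CCAGATATGGCATTCCCCCG",
    "E2": "CCTGATATAGCCTTCCCACG",
    "E3": "CCTGATATGGCCTTCCCCCG",
}

# Differences between the T1 forward primer and every forward primer in the set.
mismatches_to_t1 = {
    species: count_mismatches(SET_B_FORWARD["T1"], sequence)
    for species, sequence in SET_B_FORWARD.items()
}
```

A plain character comparison like this does not handle degenerate (IUPAC) bases or mismatch position relative to the 3' end, both of which can affect amplification.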
Table 2. Concentration of Genomic DNA extracts obtained from each target species as measured
by NanoDrop.
Target species                      Voucher number   DNA conc. (ng/µl)   Amplicon conc. (ng/µl)
Chimarra obscura (T1)               STRI20091        31.6                136.5
Ceratopsyche sparna (T2)            STRI20092        41.1                281.1
Ceratopsyche bronta (T3)            STRI20093        48.2                242.8
Caenis diminuta (E1)                SEPH20091        6.0                 210.5
Maccaffertium modestum (E2)         SEPH20092        0.9                 71.8
Maccaffertium interpunctatum (E3)   SEPH20093        1.3                 109.7
Table 3. 454 pyrosequencing tagged primer, species-specific primers modified by adding
Multiplex Identifier sequence tags (MID) were employed in 454 pyrosequencing experiments.
T1=Chimarra obscura, T2 =Ceratopsyche bronta, T3=Ceratopsyche sparna, E1=Caenis
diminuta, E2=Maccaffertium modestum, E3=Maccaffertium interpunctatum.
Set   Name MID code      Primer code         Sequence (5'-3')
A     MID16-TCACGTACTA   Tagged_T_1_40_F     TCACGTACTATTGATCAAGAATATTAGG
A     MID16-TCACGTACTA   Tagged_T_1_183_R    TCACGTACTACCYCCAATTATGATGGG
B     MID16-TCACGTACTA   Tagged_T_1_240_F    TCACGTACTACCAGACATAGCCTTCCCTCG
B     MID16-TCACGTACTA   Tagged_T_1_545_R    TCACGTACTAGCTCCTGCTAATACAGG
A     MID50_ACTAGCAGTA   Tagged_T_2_40_F     ACTAGCAGTATTGATCAGGTCTAGTAGG
A     MID50_ACTAGCAGTA   Tagged_T_2_183_R    ACTAGCAGTACCCCCAATTATAATAGG
B     MID50_ACTAGCAGTA   Tagged_T_2_240_F    ACTAGCAGTACCAGATATAGCATTCCCCCG
B     MID50_ACTAGCAGTA   Tagged_T_2_545_R    ACTAGCAGTAGCTCCGGCTAAAACAGG
A     MID51_AGCTCACGTA   Tagged_T_3_40_F     AGCTCACGTATTGATCAGGATTAGTAGG
A     MID51_AGCTCACGTA   Tagged_T_3_183_R    AGCTCACGTACCCCCAATTATAATTGG
B     MID51_AGCTCACGTA   Tagged_T_3_240_F    AGCTCACGTACCTGATATAGCTTTTCCTCG
B     MID51_AGCTCACGTA   Tagged_T_3_545_R    AGCTCACGTAGCTCCAGCAAGAACAGG
A     MID54_AGTGCTACGA   Tagged_E_1_40_F     AGTGCTACGATTGATCTGGGATAGTAGG
A     MID54_AGTGCTACGA   Tagged_E_1_183_R    AGTGCTACGACCCCCAATTATGATGGG
B     MID54_AGTGCTACGA   Tagged_E_1_240_F    AGTGCTACGACCAGATATGGCATTCCCCCG
B     MID54_AGTGCTACGA   Tagged_E_1_545_R    AGTGCTACGAGCTCCTGCTAAAACAGG
A     MID56_CGCAGTACGA   Tagged_E_2_40_F     CGCAGTACGATTGATCAGGGATGGTAGG
A     MID56_CGCAGTACGA   Tagged_E_2_183_R    CGCAGTACGACCTCCAATCATAATAGG
B     MID56_CGCAGTACGA   Tagged_E_2_240_F    CGCAGTACGACCTGATATAGCCTTCCCACG
B     MID56_CGCAGTACGA   Tagged_E_2_545_R    CGCAGTACGAGCTCCTGCTAATACAGG
A     MID61_CTATAGCGTA   Tagged_E_3_40_F     CTATAGCGTATTGATCGGGGATGGTAGG
A     MID61_CTATAGCGTA   Tagged_E_3_183_R    CTATAGCGTACCTCCAATCATAATAGG
B     MID61_CTATAGCGTA   Tagged_E_3_240_F    CTATAGCGTACCTGATATGGCCTTCCCCCG
B     MID61_CTATAGCGTA   Tagged_E_3_545_R    CTATAGCGTAGCCCCTGCCAATACAGG
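The tagged primers in Table 3 consist of a 10-base MID followed by the species-specific primer. The Python sketch below shows how such a primer is assembled and how a read could be assigned to a primer set by its leading MID. The exact-prefix matching rule and the function names are assumptions for illustration; the thesis does not describe its demultiplexing procedure in this section, and the trailing bases of the example read are hypothetical.

```python
# MID tags from Table 3, keyed by the species whose primers carry them.
MID_TAGS = {
    "T1": "TCACGTACTA",
    "T2": "ACTAGCAGTA",
    "T3": "AGCTCACGTA",
    "E1": "AGTGCTACGA",
    "E2": "CGCAGTACGA",
    "E3": "CTATAGCGTA",
}


def tag_primer(mid, primer):
    """Prepend a MID tag to a primer sequence (both written 5'-3')."""
    return mid + primer


def assign_by_mid(read, mid_tags):
    """Return the label whose MID is an exact prefix of the read, else None."""
    for label, mid in mid_tags.items():
        if read.startswith(mid):
            return label
    return None


# Rebuild Tagged_T_1_240_F from its MID and the T1 set B forward primer.
tagged_t1_240f = tag_primer(MID_TAGS["T1"], "CCAGACATAGCCTTCCCTCG")

# Hypothetical read: T2 tagged forward primer followed by made-up insert bases.
example_read = tag_primer(MID_TAGS["T2"], "CCAGATATAGCATTCCCCCG") + "ACGTACGT"
assigned = assign_by_mid(example_read, MID_TAGS)
```

Exact matching discards reads with sequencing errors in the tag; tolerating a mismatch would recover more reads at some risk of misassignment.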
Table 4. CT values obtained in qPCR analysis for each Trichoptera primer set A. The templates
are shown in different dilutions in all Trichoptera (starting from 70 pg/ µl). A full length COI
barcode amplicon was used as template in qPCR for each target species. T1=Chimarra obscura,
T2=Ceratopsyche bronta, T3 = Ceratopsyche sparna, E1=Caenis diminuta, E2=Maccaffertium
modestum, E3= Maccaffertium interpunctatum. CT values in bold indicate the primer set tested
matches the template DNA. -- did not pass threshold.
log dilution
Dilution
T1 Primer on T1 amplicon
T1 Primer on T2 amplicon
T1 Primer on T3 amplicon
T1 Primer on E1 amplicon
T1 Primer on E2 amplicon
T1 Primer on E3 amplicon
log dilution
Dilution
T2 Primer on T1 amplicon
T2 Primer on T2 amplicon
T2 Primer on T3 amplicon
T2 Primer on E1 amplicon
T2 Primer on E2 amplicon
T2 Primer on E3 amplicon
-1
0.1
3.68
11.41
29.86
17.76
26.44
27.83
-1
0.1
29.78
13.41
20.92
35.58
22.85
23.07
-2
-3
-4
-5
-6
0.01 0.001 0.0001 0.00001 0.000001
10.68 11.67 17.87
21.65
26.1
17.17
---------23.18 27.38 30.78
34.57
-----34.89
---39.39 39.9
-2
0.01
32.21
18.53
22.2
37.61
23.68
22.82
-3
-4
-5
-6
0.001 0.0001 0.00001 0.000001
---34.34
20.2 21.34
22.2
26.08
31.74 35.91
37.31
--36.55
--27.13 31.16
34.05
36.8
34.78 37.41
37.98
--
log dilution               -1      -2      -3      -4       -5        -6
Dilution                   0.1     0.01    0.001   0.0001   0.00001   0.000001
T3 Primer on T1 amplicon   6.03    13.09   5.04    12.88    5.15      5.64
T3 Primer on T2 amplicon   3.79    11.51   4.31    11.37    10.72     4.42
T3 Primer on T3 amplicon   3.12    3.98    4.66    4.2      4.37      10.86
T3 Primer on E1 amplicon   12.74   14.8    11      5.68     5.5       11.05
T3 Primer on E2 amplicon   4.98    14.31   5.2     5.05     12.52     4.06
T3 Primer on E3 amplicon   6.2     15.67   5.26    5.81     4.39      5.7
Table 5. CT values obtained in qPCR analysis for each Ephemeroptera primer set A. The
templates are shown in different log dilutions from all six target species (starting from 70 pg/
µl). A full length COI barcode amplicon was used as template in qPCR for each target species.
E1=Caenis diminuta, E2=Maccaffertium modestum, E3= Maccaffertium interpunctatum,
T1=Chimarra obscura, T2=Ceratopsyche bronta, T3 = Ceratopsyche sparna. CT values in bold
indicate target species for the primer set tested. -- did not pass threshold.
log dilution               -1      -2      -3      -4       -5        -6
Dilution                   0.1     0.01    0.001   0.0001   0.00001   0.000001
E1 Primer on T1 amplicon   6.03    28.6    31.03   37.87    32.71     --
E1 Primer on T2 amplicon   3.79    32.49   25.94   29.73    31.93     30.24
E1 Primer on T3 amplicon   3.12    32.29   29.92   34.89    --        --
E1 Primer on E1 amplicon   8.3     9.34    11.61   12.74    15.55     18.65
E1 Primer on E2 amplicon   4.98    21.27   23.75   27.1     30.34     29.89
E1 Primer on E3 amplicon   6.2     23.35   35.02   31.75    31.92     31.28

log dilution               -1      -2      -3      -4       -5        -6
Dilution                   0.1     0.01    0.001   0.0001   0.00001   0.000001
E2 Primer on T1 amplicon   39.98   32.17   35.65   32.17    35.65     --
E2 Primer on T2 amplicon   34.52   38.36   30.21   33.99    37.96     37.96
E2 Primer on T3 amplicon   38.33   37.74   --      --       --        --
E2 Primer on E1 amplicon   28.84   33.57   38.38   35.65    --        --
E2 Primer on E2 amplicon   19.15   21.49   25.66   28.41    32.73     34.74
E2 Primer on E3 amplicon   14.62   15.62   25.26   23.74    27.52     30.45

log dilution               -1      -2      -3      -4       -5        -6
Dilution                   0.1     0.01    0.001   0.0001   0.00001   0.000001
E3 Primer on T1 amplicon   --      --      --      23.37    27.16     29.91
E3 Primer on T2 amplicon   --      --      --      21.68    25.32     28.23
E3 Primer on T3 amplicon   --      --      --      32.15    33.75     --
E3 Primer on E1 amplicon   26.46   31.89   35.29   27.66    31.06     33.58
E3 Primer on E2 amplicon   11.89   15.33   19.57   22.33    25.78     28.7
E3 Primer on E3 amplicon   4.74    10.12   15.55   17.32    19.45     21.88
Table 6. CT values obtained in qPCR analysis for each primer set B. The templates are shown in
different log dilutions in all Trichoptera (starting from 70 pg/ µl). A full length COI barcode
amplicon was used as template in qPCR for each target species. T1=Chimarra obscura,
T2=Ceratopsyche bronta, T3 = Ceratopsyche sparna, E1=Caenis diminuta, E2=Maccaffertium
modestum, E3= Maccaffertium interpunctatum. CT values in bold indicate target species for the
primer set tested. -- did not pass threshold.
log dilution
Dilution
T1 Primer on T1 amplicon
T1 Primer on T2 amplicon
T1 Primer on T3 amplicon
T1 Primer on E1 amplicon
T1 Primer on E2 amplicon
T1 Primer on E3 amplicon
log dilution
Dilution
T2 Primer on T1 amplicon
T2 Primer on T2 amplicon
T2 Primer on T3 amplicon
T2 Primer on E1 amplicon
T2 Primer on E2 amplicon
T2 Primer on E3 amplicon
-1
0.1
3.22
15.57
32.52
19.9
18.12
21.45
-1
0.1
11.24
24.07
27.06
2.6
21.23
19.28
-2
0.01
10.55
18
34.52
22.15
19.06
22.13
-3
-4
-5
-6
0.001 0.0001 0.00001 0.000001
13.23 20.86
25.33
37.09
31.88 34.86
36.42
36.67
35.5 36.26
36.04
37.45
31.24 34.42
36.55
38.72
19.88 25.27
28.99
36.74
31.62 35.99
37.46
37.11
-2
0.01
16.75
26
31.78
9.38
26.13
21.54
-3
-4
-5
-6
0.001 0.0001 0.00001 0.000001
19.73 26.91
32.48
--26.35 30.19
35.99
37.82
---12.07 16.99
21.21
30.2
28.74 33.51
36.3
-14.61 19.54
23.85
32.64
log dilution
-1
-2
Dilution
0.1
0.01
T3 Primer on T1 amplicon
8.48 20.33
T3 Primer on T2 amplicon
3.64 11.93
T3 Primer on T3 amplicon 3.81 12.57
T3 Primer on E1 amplicon
28.42 30.66
T3 Primer on E2 amplicon
--T3 Primer on E3 amplicon
36.35 37.49
-3
-4
-5
-6
0.001 0.0001 0.00001 0.000001
36.02
---37.71
-----29.14 35.28
21.63 26.54
31.08
39.77
----34.9 37.74
---
Table 7. CT values obtained in qPCR analysis for each primer set B. The templates are shown in
different dilutions in all Ephemeroptera (starting from 70 pg/ µl). A full length COI barcode
amplicon was used as template in qPCR for each target species. E1=Caenis diminuta,
E2=Maccaffertium modestum, E3= Maccaffertium interpunctatum, T1=Chimarra obscura,
T2=Ceratopsyche bronta, T3 = Ceratopsyche sparna. CT values in bold indicate target species
for the primer set tested. -- did not pass threshold.
log dilution
Dilution
E1 Primer on T1 amplicon
E1 Primer on T2 amplicon
E1 Primer on T3 amplicon
E1 Primers on E1 amplicon
E1 Primers on E2 amplicon
E1 Primers on E3 amplicon
-1
0.1
12.43
22.53
16.42
5.42
27.06
9.44
-2
0.01
18.48
26.49
21.58
9.58
30.05
12.52
-3
-4
-5
-6
0.001 0.0001 0.00001 0.000001
19.93 26.36
--20.35 25.21
--32.97 35.31
----10.01 13.75
31.83 31.26
--25.7 29.33
---
log dilution
Dilution
E2 Primer on T1 amplicon
E2 Primer on T2 amplicon
E2 Primer on T3 amplicon
E2 Primers on E1 amplicon
E2 Primers on E2 amplicon
E2 Primers on E3 amplicon
-1
0.1
4.14
16.79
29.22
20.02
11.66
12.55
-2
0.01
11.25
20.49
32.64
23.49
15.32
15.34
-3
-4
-5
-6
0.001 0.0001 0.00001 0.000001
14.13 21.05
25.84
37.68
25.31 29.13
32.48
-35.51 36.7
-29.1 33.84
38.73
--25.72 30.76
33.84
17.13 21.9
25.6
34.89
log dilution
Dilution
E3 Primer on T1 amplicon
E3 Primer on T2 amplicon
E3 Primer on T3 amplicon
E3 Primers on E1 amplicon
E3 Primers on E2 amplicon
E3 Primers on E3 amplicon
-1
0.1
19.08
27.3
21.81
16.87
13.02
3.48
-2
0.01
22.63
33.23
25.94
20.87
17.62
5.5
-3
-4
-5
-6
0.001 0.0001 0.00001 0.000001
25.75 32.93
35.14
38.84
28.41 31.61
34.93
38.88
36.11 38.27
33.94
38.72
31.63 33.8
37.91
37.53
30.37 34.8
31.04
35.83
8.69 13.39
16.39
26.31
Table 8. Summary of the results from both qPCR and 454 pyrosequencing analysis. “Yes” for
qPCR analysis means the target sample passed the threshold in an earlier cycle than any
non-target samples. “Yes” for 454 pyrosequencing means the target species produced a higher
number of reads than non-target species. “No” represents the opposite pattern. T1: Chimarra
obscura, T2: Ceratopsyche bronta, T3: Ceratopsyche sparna, E1: Caenis diminuta, E2:
Maccaffertium modestum, E3: Maccaffertium interpunctatum.

Target     gDNA (setA)     gDNA (setB)     Amplicon (setA)   Amplicon (setB)
species    qPCR    454     qPCR    454     qPCR    454       qPCR    454
T1         Yes     Yes     Yes     Yes     Yes     Yes       Yes     Yes
T2         Yes     No      No      No      Yes     Yes       Yes     No
T3         No      No      No      Yes     Yes     Yes       Yes     Yes
E1         Yes     Yes     Yes     Yes     No      Yes       Yes     Yes
E2         No      Yes     No      No      No      Yes       No      No
E3         Yes     No      No      No      Yes     Yes       Yes     Yes
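The "Yes"/"No" rule for qPCR stated in the Table 8 caption can be written as a small check. The following is only a sketch: the function name is hypothetical, `None` is used here to stand for a template that did not pass the threshold, and the example values are the CT values listed in Table 6 for the T1 primer at the 0.1 dilution, read in the order of the row labels.

```python
from typing import Mapping, Optional


def target_amplifies_first(ct_by_template: Mapping[str, Optional[float]],
                           target: str) -> bool:
    """Return True when the target CT is lower than every non-target CT.

    A CT of None stands for a template that did not pass the threshold.
    """
    target_ct = ct_by_template[target]
    if target_ct is None:
        return False  # the target never crossed the threshold
    non_target = [ct for name, ct in ct_by_template.items()
                  if name != target and ct is not None]
    return all(target_ct < ct for ct in non_target)


# CT values listed in Table 6 for the T1 primer at the 0.1 dilution.
t1_primer = {"T1": 3.22, "T2": 15.57, "T3": 32.52,
             "E1": 19.9, "E2": 18.12, "E3": 21.45}
print("Yes" if target_amplifies_first(t1_primer, "T1") else "No")
```

The same function can be applied per dilution; how the thesis combined the dilutions into a single "Yes" or "No" is not stated in the caption, so that step is left out here.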
Table 9. The slope and efficiency of each primer set for amplicon-based material set A and set
B. T1: Chimarra obscura T2: Ceratopsyche bronta T3: Ceratopsyche sparna E1: Caenis
diminuta E2: Maccaffertium modestum E3: Maccaffertium interpunctatum
T1 primer, set A
SLOPE            EFFICIENCY
-6.323428571     0.439269283
-4.678285714     0.635887793
-0.856285714     13.71751548
-4.013714286     0.774785149
-3.665142857     0.874306715
-3.676           0.870832132

E1 primer, set A
SLOPE            EFFICIENCY
-4.324           0.703206654
-0.19            183297.0711
-6.806           0.402584968
-2.542           1.473950642
-1.438           3.959184798
-7.285           0.371729124

T1 primer, set B
SLOPE            EFFICIENCY
-4.32            0.704046656
-5.76            0.491458285
0                0
-3.3             1.009233003
-8.45            0.313237257
-6.035           0.464536105

E1 primer, set B
SLOPE            EFFICIENCY
-2.28            1.745342234
-1.06            7.778013136
-4.649           0.640967658
-0.2014          92307.84074
-2.9969          1.156145846
-4.441           0.679478739

T2 primer, set A
SLOPE            EFFICIENCY
-5.264           0.548708225
-2.803           1.273843726
-5.38            0.534170424
-5.097428571     0.571004194
-3.752           0.847245087
-2.247428571     1.785819424

E2 primer, set A
SLOPE            EFFICIENCY
-6.239714286     0.446317857
-4.002           0.777767909
-2.531           1.483709227
-4.777           0.619333885
-4.207142857     0.728586041
-5.98            0.469684386

T2 primer, set B
SLOPE            EFFICIENCY
-2.28            1.745342234
-1.06            7.778013136
-4.649           0.640967658
-0.2014          92307.84074
-2.9969          1.156145846
-4.441           0.679478739

E2 primer, set B
SLOPE            EFFICIENCY
-2.28            1.745342234
-1.06            7.778013136
-4.649           0.640967658
-0.2014          92307.84074
-2.9969          1.156145846
-4.441           0.679478739
T3 primer, set A
SLOPE            EFFICIENCY
-13.77           0.182011335
-17.035          0.144728963
-11.098          0.230570006
-1.797714286     2.599663628
0                0
-0.158           2133603.527

E3 primer, set A
SLOPE            EFFICIENCY
-4.100285714     0.753417931
-1.891428571     2.37832095
-3.163142857     1.070814847
-4.474           0.67306816
-4.329142857     0.702129538
-4.750285714     0.623729396

T3 primer, set B
SLOPE            EFFICIENCY
-2.28            1.745342234
-1.06            7.778013136
-4.649           0.640967658
-0.2014          92307.84074
-2.9969          1.156145846
-4.441           0.679478739

E3 primer, set B
SLOPE            EFFICIENCY
-2.28            1.745342234
-1.06            7.778013136
-4.649           0.640967658
-0.2014          92307.84074
-2.9969          1.156145846
-4.441           0.679478739
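The slope and efficiency pairs in Table 9 are consistent with the usual standard-curve relation, efficiency = 10^(-1/slope) - 1. The sketch below only illustrates that relation; the function name is hypothetical. A slope of 0 is undefined in this formula (Table 9 lists 0 for those rows), so the function rejects it rather than returning a value.

```python
def amplification_efficiency(slope: float) -> float:
    """Convert a standard-curve slope (CT per log10 dilution) to efficiency.

    An efficiency of 1.0 corresponds to perfect doubling in every cycle,
    which is reached at a slope of about -3.32.
    """
    if slope == 0:
        raise ValueError("efficiency is undefined for a slope of 0")
    return 10 ** (-1.0 / slope) - 1.0


# Two slopes listed in Table 9 under "T1 primer, set A".
for slope in (-6.323428571, -3.676):
    print(slope, round(amplification_efficiency(slope), 6))
```

Slopes close to zero give very large values under this formula, which is where entries such as 183297.0711 in Table 9 come from.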
Table 10. The number of reads for gDNA-based material with automated analysis approach for
set A and B. The number of sequences captured by each tag is shown as well. T1 represents
Chimarra obscura, T2 Ceratopsyche bronta, T3 Ceratopsyche sparna, E1 Caenis diminuta, E2
Maccaffertium modestum and E3 is Maccaffertium interpunctatum.
Sample type: DNA. Analysis method: Automated.
Set A primers: 40F/183R. Set B primers: 240F/545R.

Primer used                     Tag (reads total)   Detected   # of sequences   # of sequences   # of sequences
                                                    species    per tag          Set A            Set B
Chimarra obscura                Tag 16 (111)        T1         111              108              3
                                                    T2         0                0                0
                                                    T3         0                0                0
                                                    E1         0                0                0
                                                    E2         0                0                0
                                                    E3         0                0                0
Ceratopsyche bronta             Tag 50 (217)        T1         4                0                4
                                                    T2         34               0                34
                                                    T3         0                0                0
                                                    E1         179              0                179
                                                    E2         0                0                0
                                                    E3         0                0                0
Ceratopsyche sparna             Tag 51 (69)         T1         0                0                0
                                                    T2         0                0                0
                                                    T3         69               0                69
                                                    E1         0                0                0
                                                    E2         0                0                0
                                                    E3         0                0                0
Caenis diminuta                 Tag 54 (205)        T1         0                0                0
                                                    T2         14               0                14
                                                    T3         0                0                0
                                                    E1         191              69               121
                                                    E2         0                0                0
                                                    E3         0                0                0
Maccaffertium modestum          Tag 56 (16)         T1         16               0                16
                                                    T2         0                0                0
                                                    T3         0                0                0
                                                    E1         0                0                0
                                                    E2         0                0                0
                                                    E3         0                0                0
Maccaffertium interpunctatum    Tag 61 (0)          T1         0                0                0
                                                    T2         0                0                0
                                                    T3         0                0                0
                                                    E1         0                0                0
                                                    E2         0                0                0
                                                    E3         0                0                0
Table 11. Number of pyrosequencing reads from automated analysis of COI amplicon templates.
The number of sequences captured by each tag is shown as well. T1 = Chimarra obscura, T2
=Ceratopsyche bronta, T3 =Ceratopsyche sparna, E1 =Caenis diminuta, E2 =Maccaffertium
modestum E3 = Maccaffertium interpunctatum.
Sample type: Amplicon. Analysis method: Automated.
Set A primers: 40F/183R. Set B primers: 240F/545R.

Primer used                     Tag (reads total)   Detected   # of sequences   # of sequences   # of sequences
                                                    species    per tag          Set A            Set B
Chimarra obscura                Tag 16 (83)         T1         45               41               4
                                                    T2         30               0                30
                                                    T3         0                0                0
                                                    E1         8                0                8
                                                    E2         0                0                0
                                                    E3         0                0                0
Ceratopsyche bronta             Tag 50 (159)        T1         0                0                0
                                                    T2         122              37               85
                                                    T3         0                0                0
                                                    E1         37               0                37
                                                    E2         0                0                0
                                                    E3         0                0                0
Ceratopsyche sparna             Tag 51 (152)        T1         0                0                0
                                                    T2         21               0                21
                                                    T3         109              0                109
                                                    E1         22               0                22
                                                    E2         0                0                0
                                                    E3         0                0                0
Caenis diminuta                 Tag 54 (195)        T1         0                0                0
                                                    T2         25               0                25
                                                    T3         0                0                0
                                                    E1         170              32               138
                                                    E2         0                0                0
                                                    E3         0                0                0
Maccaffertium modestum          Tag 56 (52)         T1         0                0                0
                                                    T2         49               0                49
                                                    T3         0                0                0
                                                    E1         3                0                3
                                                    E2         0                0                0
                                                    E3         0                0                0
Maccaffertium interpunctatum    Tag 61 (0)          T1         0                0                0
                                                    T2         0                0                0
                                                    T3         0                0                0
                                                    E1         0                0                0
                                                    E2         0                0                0
                                                    E3         0                0                0
Table 12. Number of reads for gDNA-based material with manual analysis approach for set A
and B. The number of sequences captured by each tag is shown as well. T1 represents Chimarra
obscura, T2 Ceratopsyche bronta, T3 Ceratopsyche sparna, E1 Caenis diminuta, E2
Maccaffertium modestum and E3 is Maccaffertium interpunctatum.
Sample type: DNA. Analysis method: Manual.
Set A primers: 40F/183R. Set B primers: 240F/545R.

Target species                  Tag (reads total)   Detected   # of sequences   # of sequences   # of sequences
                                                    species    per tag          Set A            Set B
Chimarra obscura                Tag 16 (591)        T1         450              320              130
                                                    T2         19               0                19
                                                    T3         15               0                15
                                                    E1         0                0                0
                                                    E2         47               0                47
                                                    E3         60               51               9
Ceratopsyche bronta             Tag 50 (648)        T1         160              4                156
                                                    T2         7                7                0
                                                    T3         0                0                0
                                                    E1         466              1                465
                                                    E2         15               15               0
                                                    E3         0                0                0
Ceratopsyche sparna             Tag 51 (17)         T1         11               0                1
                                                    T2         0                0                0
                                                    T3         4                0                4
                                                    E1         0                0                0
                                                    E2         2                0                2
                                                    E3         0                0                0
Caenis diminuta                 Tag 54 (294)        T1         2                0                2
                                                    T2         1                1                0
                                                    T3         0                0                0
                                                    E1         288              100              188
                                                    E2         6                1                5
                                                    E3         2                0                2
Maccaffertium modestum          Tag 56 (516)        T1         333              0                333
                                                    T2         0                0                0
                                                    T3         0                0                0
                                                    E1         0                0                0
                                                    E2         180              5                175
                                                    E3         3                0                3
Maccaffertium interpunctatum    Tag 61 (53)         T1         0                0                0
                                                    T2         0                0                0
                                                    T3         53               0                53
                                                    E1         0                0                0
                                                    E2         0                0                0
                                                    E3         0                0                0
Table 13. The number of reads for amplicon-based material with manual analysis approach of
COI amplicon template for set A and B. The number of sequences captured by each tag is shown
as well. T1 represents Chimarra obscura, T2 Ceratopsyche bronta, T3 Ceratopsyche sparna, E1
Caenis diminuta, E2 Maccaffertium modestum and E3 is Maccaffertium interpunctatum.
Sample type: Amplicon. Analysis method: Manual.
Set A primers: 40F/183R. Set B primers: 240F/545R.

Primer used                     Tag (reads total)   Detected   # of sequences   # of sequences   # of sequences
                                                    species    per tag          Set A            Set B
Chimarra obscura                Tag 16 (909)        T1         509              192              317
                                                    T2         129              0                129
                                                    T3         0                0                0
                                                    E1         27               0                27
                                                    E2         175              0                175
                                                    E3         69               0                69
Ceratopsyche bronta             Tag 50 (1021)       T1         317              0                317
                                                    T2         197              68               129
                                                    T3         12               12               0
                                                    E1         27               0                27
                                                    E2         394              22               175
                                                    E3         74               5                69
Ceratopsyche sparna             Tag 51 (302)        T1         0                0                0
                                                    T2         74               10               64
                                                    T3         168              34               138
                                                    E1         56               0                56
                                                    E2         0                0                0
                                                    E3         4                0                4
Caenis diminuta                 Tag 54 (909)        T1         185              0                185
                                                    T2         138              2                136
                                                    T3         29               22               7
                                                    E1         411              60               351
                                                    E2         121              19               102
                                                    E3         25               1                24
Maccaffertium modestum          Tag 56 (623)        T1         157              13               144
                                                    T2         116              46               70
                                                    T3         29               26               3
                                                    E1         8                1                7
                                                    E2         213              115              98
                                                    E3         100              53               47
Maccaffertium interpunctatum    Tag 61 (537)        T1         71               5                66
                                                    T2         9                7                2
                                                    T3         28               28               0
                                                    E1         2                2                0
                                                    E2         78               73               5
                                                    E3         349              143              206
FIGURES:
Figure 1. Amplification plots showing threshold and baseline values of fluorescence. The
threshold (dotted line) is set either by the machine itself or by the researcher, based on the
experiment's needs. Rn is the emission intensity of the reporter dye divided by the emission
intensity of a passive reference dye, measured in each cycle; ∆Rn is Rn minus the baseline. The
CT value is the number of cycles required for a template to pass the threshold; the CT value
indicated is for the sample shown by the purple line (QIAGEN 2006).
[Plot not reproduced. Annotation: CT Value.]
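The definition of the CT value in the Figure 1 caption can be illustrated with a short sketch. It assumes a list of baseline-corrected fluorescence readings, one per cycle, and uses linear interpolation between the two cycles that bracket the threshold. Both the function name and the interpolation step are assumptions made for illustration; qPCR instruments may determine the crossing point differently.

```python
from typing import Optional, Sequence


def threshold_cycle(delta_rn: Sequence[float], threshold: float) -> Optional[float]:
    """Return the fractional cycle at which the signal first reaches the threshold.

    delta_rn[i] is the baseline-corrected fluorescence after cycle i + 1.
    Returns None when the curve never reaches the threshold.
    """
    if delta_rn and delta_rn[0] >= threshold:
        return 1.0  # already above the threshold in the first cycle
    for i in range(1, len(delta_rn)):
        before, after = delta_rn[i - 1], delta_rn[i]
        if before < threshold <= after:
            # 'before' was measured at cycle i, 'after' at cycle i + 1.
            fraction = (threshold - before) / (after - before)
            return i + fraction
    return None


# Hypothetical curve that doubles each cycle, for illustration only.
curve = [0.01, 0.02, 0.04, 0.08, 0.16, 0.32]
print(threshold_cycle(curve, threshold=0.1))
```

A return value of `None` plays the role of the "did not pass threshold" entries in the CT tables.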
Figure 2. The workflow used in qPCR experiments involves two different approaches: running a
qPCR on genomic DNA as template and using full-length DNA barcode amplicons as template.
[Workflow diagram not reproduced. Labels: Target species; Genomic DNA; PCR (Folmer primer);
Full length amplicon, purified and normalized; qPCR; Matrix dilution of different primers;
Measurement; Standard curve.]
Figure 3. 454 pyrosequencing experimental workflow. Equimolar amounts of each tagged
amplicon generated using the primers were merged into a single lane of the 454 flow cell (1/16
run) and sorted bioinformatically after sequencing (for both set A and B).
[Workflow diagram not reproduced. Labels: templates T1, T2, T3, E1, E2, E3; DNA mix or PCR mix;
Primer T1, Primer T2, Primer T3, Primer E1, Primer E2, Primer E3; T1 to E3 primer products;
Normalization (16% each); Emulsion PCR and Data Analysis; Tag 1, T1 primer; Tag 2, T2 primer;
Tag 3, T3 primer; Tag 4, E1 primer; Tag 5, E2 primer; Tag 6, E3 primer; Set A (F, R); Set B (F, R).]
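The "sorted bioinformatically after sequencing" step in Figure 3 amounts to binning reads by the tag at the start of each read. The sketch below is a minimal illustration under strong assumptions: the tag sequences are made-up placeholders (the real MID sequences are not given in this section), matching is exact with no error tolerance, and the function name is hypothetical. It is not the pipeline used in the thesis.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def sort_reads_by_tag(reads: Iterable[str], tags: Dict[str, str]) -> Dict[str, List[str]]:
    """Bin reads by the tag they start with and strip the tag from each read.

    Reads that match no tag are collected under "unassigned".
    """
    bins: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        for name, tag in tags.items():
            if read.startswith(tag):
                bins[name].append(read[len(tag):])
                break
        else:
            bins["unassigned"].append(read)
    return dict(bins)


# Placeholder tag sequences, invented for this example.
example_tags = {"Tag 1": "AAAAC", "Tag 2": "CCCCA"}
example_reads = ["AAAACGGTT", "CCCCATTGA", "GGGGGAAAA"]
print(sort_reads_by_tag(example_reads, example_tags))
```

Counting the reads in each bin gives per-tag totals of the kind reported in Tables 10 to 13.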
Figure 4. Exemplar standard curves for qPCR experiments using genomic DNA templates. Six
dilutions were made for each curve to check the consistency of primer behavior. (T1) Chimarra
obscura (Trichoptera, T1obs). (T2) Ceratopsyche bronta (Trichoptera, T2bro). (T3)
Ceratopsyche sparna (Trichoptera, T3spa).
[Three panels not reproduced: Standard Curve for T1 Primer set B; Standard Curve for T2 primer
set B; Standard Curve for T3 primer set B. Axes: log of Dilution (x), CT Value (y).]
Figure 5. Standard curves for Chimarra obscura (Trichoptera, T1obs) (top panel), Ceratopsyche
bronta (Trichoptera, T2bro) (middle panel) and Ceratopsyche sparna (Trichoptera, T3spa)
(bottom panel) primers in amplicon based qPCR experiments. Six dilutions were made for each
case to check the consistency of primer behavior.
[Three panels not reproduced: Standard Curve for T1 set B; Standard Curve for T2 set B;
Standard Curve for T3 set B. Axes: log of Dilution (x), CT Value (y).]
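A standard curve of this kind is a straight line fitted to CT against log dilution. The sketch below uses an ordinary least-squares fit written out by hand; the function name is hypothetical, and the example data are the six CT values listed in Table 6 for the T1 primer on the T1 amplicon, read in the order of the dilution columns. With those values the fitted slope is about -6.32, which coincides with the first slope listed in Table 9.

```python
from typing import Sequence, Tuple


def fit_standard_curve(log_dilutions: Sequence[float],
                       ct_values: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line CT = slope * log_dilution + intercept."""
    n = len(log_dilutions)
    mean_x = sum(log_dilutions) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log_dilutions)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log_dilutions, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


log_dilutions = [-1, -2, -3, -4, -5, -6]
# CT values listed in Table 6 for the T1 primer on the T1 amplicon.
ct_t1_on_t1 = [3.22, 10.55, 13.23, 20.86, 25.33, 37.09]
slope, intercept = fit_standard_curve(log_dilutions, ct_t1_on_t1)
print(round(slope, 4), round(intercept, 3))
```

Dilutions that did not pass the threshold would have to be dropped before fitting, since they have no CT value.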
Figure 6. Exemplar Relative Amplified Copies (RAC) of COI from Chimarra obscura (T1)
compared to the 5 other target species. For example, at the 10^-1 dilution the T1 species yields
212 times more amplified copies than T2, 76 million times more than T3, 5 million times more than
E1, 7 million times more than E2 and 18.6 million times more than E3. A missing bar indicates the
lack of a RAC value in a comparison. T2: Ceratopsyche bronta, T3: Ceratopsyche sparna, E1: Caenis
diminuta, E2: Maccaffertium modestum, E3: Maccaffertium interpunctatum.
[Chart not reproduced: Chimarra obscura (T1)-setA. Axes: PCR template concentration series,
10^-1 to 10^-6 (x); log RAC (y). Series: Ceratopsyche bronta (T2), Ceratopsyche sparna (T3),
Caenis diminuta (E1), Maccaffertium modestum (E2), Maccaffertium interpunctatum (E3).]
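The RAC formula is not restated in this section. One reading that fits the reported numbers is RAC = 2^(CT of non-target minus CT of target), which assumes perfect doubling in every cycle. For example, the CT values listed in Table 6 for the T1 primer at the 0.1 dilution (3.22 on T1 and 15.57 on T2) give about 5220, the value listed for T2 in the set B comparison for Chimarra obscura in Appendix 3. The sketch below encodes that assumption; the function name is hypothetical.

```python
def relative_amplified_copies(ct_target: float, ct_non_target: float) -> float:
    """Fold difference in amplified copies, assuming doubling every cycle."""
    return 2.0 ** (ct_non_target - ct_target)


# T1 primer at the 0.1 dilution: CT on T1 (target) and on T2 (non-target).
rac_t2 = relative_amplified_copies(ct_target=3.22, ct_non_target=15.57)
print(round(rac_t2, 1))
```

Because the calculation assumes 100% efficiency, primer pairs whose efficiencies are far from 1 would give RAC values that overstate or understate the true fold difference.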
Figure 7. MID distribution for gDNA based material. In the pie chart T1 represents Chimarra
obscura, T2: Ceratopsyche bronta, T3: Ceratopsyche sparna, E1: Caenis diminuta, E2:
Maccaffertium modestum, E3: Maccaffertium interpunctatum.
[Pie chart not reproduced: DNA Based MID Distribution. Legend: T1, T2, T3, E1, E2, E3. Slice
labels as extracted: 18%, 16%, 19%, 18%, 14%, 15%.]
Figure 8. MID distribution for amplicon based material. In the pie chart T1 represents Chimarra
obscura, T2: Ceratopsyche bronta, T3: Ceratopsyche sparna, E1: Caenis diminuta, E2:
Maccaffertium modestum, E3: Maccaffertium interpunctatum.
[Pie chart not reproduced: Amplicon Based MID Distribution. Legend: T1, T2, T3, E1, E2, E3.
Slice labels as extracted: 9%, 24%, 17%, 14%, 18%, 18%.]
APPENDIX 1. Standard curves for target samples, gDNA based
DNA based qPCR experiment SetA - Trichoptera. Standard curves for Chimarra obscura (T1obs,
top), Ceratopsyche bronta (T2bro, middle) and Ceratopsyche sparna (T3spa, bottom) primers in
DNA based qPCR experiment. Six dilutions were made for each case to check the consistency of
primer behavior.
[Three panels not reproduced: Standard Curve for T1obs primer Set A; Standard Curve for T2bro
primer Set A; Standard Curve for T3spa primer Set A. Axes: log of Dilution (x), CT Value (y).]
DNA based qPCR experiment SetA- Ephemeroptera
Standard curves for Caenis diminuta (E1dim, top), Maccaffertium modestum (E2mod, middle) and
Maccaffertium interpunctatum (E3int, bottom) primers in DNA based qPCR experiment. Six
dilutions were made for each case to check the consistency of primer behavior.
[Three panels not reproduced: Standard Curve for E1dim primer Set A; Standard Curve for E2mod
primer Set A; Standard Curve for E3int primer Set A. Axes: log of Dilution (x), CT Value (y).]
DNA based qPCR experiment SetB- Trichoptera
Standard curves for Chimarra obscura (T1obs, top), Ceratopsyche bronta (T2bro, middle) and
Ceratopsyche sparna (T3spa, bottom) primers in DNA based qPCR experiment. Six dilutions were
made for each case to check the consistency of primer behavior.
[Three panels not reproduced: Standard Curve for T1obs primer set B; Standard Curve for T2bro
primer set B; Standard Curve for T3spa primer set B. Axes: log of Dilution (x), CT Value (y).]
DNA based qPCR experiment SetB- Ephemeroptera
Standard curves for Caenis diminuta (E1dim, top), Maccaffertium modestum (E2mod, middle) and
Maccaffertium interpunctatum (E3int, bottom) primers in DNA based qPCR experiment. Six
dilutions were made for each case to check the consistency of primer behavior.
[Three panels not reproduced: Standard Curve for E1dim primer set B; Standard Curve for E2mod
primer set B; Standard Curve for E3int primer set B. Axes: log of Dilution (x), CT Value (y).]
APPENDIX 2. Standard curve for target samples, amplicon based
Amplicon based qPCR experiment material setA - Trichoptera. Standard curves for Chimarra
obscura (T1obs, top), Ceratopsyche bronta (T2bro, middle) and Ceratopsyche sparna (T3spa,
bottom) primers in amplicon based qPCR experiment. Six dilutions were made for each case to
check the consistency of primer behavior.
[Three panels not reproduced: Standard Curve for T1obs primer set A; Standard Curve for T2bro
primer set A; Standard Curve for T3spa primer set A. Axes: log of Dilution (x), CT Value (y).]
Amplicon based qPCR experiment material setA- Ephemeroptera
Standard curves for Caenis diminuta (E1dim, top), Maccaffertium modestum (E2mod, middle) and
Maccaffertium interpunctatum (E3int, bottom) primers in amplicon based qPCR experiment. Six
dilutions were made for each case to check the consistency of primer behavior.
[Three panels not reproduced: Standard Curve for E1dim primer set A; Standard Curve for E2mod
primer set A; Standard Curve for E3int primer set A. Axes: log of Dilution (x), CT Value (y).]
Amplicon based qPCR experiment SetB-Trichoptera
Standard curves for Chimarra obscura (T1obs, top), Ceratopsyche bronta (T2bro, middle) and
Ceratopsyche sparna (T3spa, bottom) primers in amplicon based qPCR experiment. Six dilutions
were made for each case to check the consistency of primer behavior.
[Three panels not reproduced: Standard Curve for T1obs primer set B; Standard Curve for T2bro
primer set B; Standard Curve for T3spa primer set B. Axes: log of Dilution (x), CT Value (y).]
Amplicon based qPCR experiment SetB-Ephemeroptera
Standard curves for Caenis diminuta (E1dim, top), Maccaffertium modestum (E2mod, middle) and
Maccaffertium interpunctatum (E3int, bottom) primers in amplicon based qPCR experiment. Six
dilutions were made for each case to check the consistency of primer behavior.
[Three panels not reproduced: Standard Curve for E1dim primer set B; Standard Curve for E2mod
primer set B; Standard Curve for E3int primer set B. Axes: log of Dilution (x), CT Value (y).]
APPENDIX 3. 454 pyrosequencing analysis results
Table 1. MID distribution in 454 pyrosequencing material. The Species column gives the target
primer: T1 represents Chimarra obscura, T2: Ceratopsyche bronta, T3: Ceratopsyche sparna,
E1: Caenis diminuta, E2: Maccaffertium modestum, E3: Maccaffertium interpunctatum. The last
two columns show the percentage of the target species that could be generated by the target
primer in the automated and manual analysis methods.

Sample type   Species   Tag     # Raw reads   % Automated   % Manual
DNA           T1        MID16   1051          10.56         56.23
DNA           T2        MID50   1256          17.28         51.6
DNA           T3        MID51   1011          6.82          6.62
DNA           E1        MID54   889           23.1          33.1
DNA           E2        MID56   1191          1.34          43.32
DNA           E3        MID61   1176          0             4.5
Amplicons     T1        MID16   2587          3.2           35.13
Amplicons     T2        MID50   1937          8.2           42.54
Amplicons     T3        MID51   1915          7.93          16
Amplicons     E1        MID54   1450          13.44         62.7
Amplicons     E2        MID56   1758          3             35.43
Amplicons     E3        MID61   908           0             59.1
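The raw read counts in this table can be turned into each tag's share of the pooled reads. The sketch below does that for the DNA rows; the function name is hypothetical. The rounded shares come out as 16, 19, 15, 14, 18 and 18 percent, the same six values that appear as slice labels in the gDNA MID distribution pie chart (Figure 7), which suggests, without confirming, that the pie charts show shares of raw reads per tag.

```python
from typing import Dict


def read_share_percent(raw_reads: Dict[str, int]) -> Dict[str, int]:
    """Share of the pooled raw reads per tag, rounded to whole percent."""
    total = sum(raw_reads.values())
    return {tag: round(100 * count / total) for tag, count in raw_reads.items()}


# "# Raw reads" for the DNA rows of Table 1 in Appendix 3.
dna_raw_reads = {"T1": 1051, "T2": 1256, "T3": 1011,
                 "E1": 889, "E2": 1191, "E3": 1176}
print(read_share_percent(dna_raw_reads))
```

An even split across six tags would be about 16.7% each, so the spread of these shares gives a rough sense of how well the equimolar pooling worked.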
Comparison between the relative amplification copies obtained from qPCR method (first graph
and table) and the number of reads obtained from 454 FLX pyrosequencer for sample Chimarra
obscura (second graph and table). T1 represents Chimarra obscura, T2: Ceratopsyche bronta,
T3: Ceratopsyche sparna, E1: Caenis diminuta, E2: Maccaffertium modestum and E3 is
Maccaffertium interpunctatum.
[Graph not reproduced: Chimarra obscura (T1)-setA. Axes: PCR template concentration series (x),
log RAC (y). Series: Ceratopsyche bronta (T2), Ceratopsyche sparna (T3), Caenis diminuta (E1),
Maccaffertium modestum (E2), Maccaffertium interpunctatum (E3).]

Dilution    T2     T3            E1           E2            E3
0.1         212    76,026,550    221,969      7,103,014     18,615,486
0.01        90     --            9,503,318    19,406,007    --
0.001       --     --            --           --            --
0.0001      --     --            --           --            --
0.00001     --     --            --           --            --
0.000001    --     --            --           --            --

[Graph not reproduced: Next gen-Chimarra obscura(T1)-setA. Axes: Generated Sequence (x),
Number of Reads (y). Series: Seqtrim analysis, Manual analysis.]

T1-MID 16-Amplicon based, 40/183
                    T1obs   T2bro   T3spa   E1dim   E2mod   E3int
SeqTrim analysis    0       0       0       8       0       0
Manual analysis     192     0       0       0       0       0
Comparison between the relative amplification copies obtained from qPCR method (first graph
and table) and the number of reads obtained from 454 FLX pyrosequencer for sample
Ceratopsyche bronta (second graph and table). T1 represents Chimarra obscura, T2:
Ceratopsyche bronta, T3: Ceratopsyche sparna, E1: Caenis diminuta, E2: Maccaffertium
modestum and E3 is Maccaffertium interpunctatum.
[Graph not reproduced: Ceratopsyche bronta (T2)-setA. Axes: PCR template concentration series (x),
log RAC (y). Series: Chimarra obscura (T1), Ceratopsyche sparna (T3), Caenis diminuta (E1),
Maccaffertium modestum (E2), Maccaffertium interpunctatum (E3).]
T1
T3
0.1
765
1.64
0.01
1871
2
0.001
6383
1052.7
0.0001
-13400
0.00001
-2402
0.000001
---
E1
--
--
--
--
--
--
E2
E3
6.27
7.31
5
3
43
8659
498
37902
251
3822
---
[Graph not reproduced: Next gen-Ceratopsyche bronta(T2)-setA. Axes: Generated Sequence (x),
Number of Reads (y). Series: Seqtrim analysis, Manual analysis.]
T2-MID 50Amplicon
based
40 F
T1obs
T2bro
T3spa
E1dim
E2mod
E3int
SeqTrim
analysis
0
0
0
0
0
0
Manual
analysis
0
68
12
0
22
5
Comparison between the relative amplification copies obtained from qPCR method (first graph
and table) and the number of reads obtained from 454 FLX pyrosequencer for sample
Ceratopsyche sparna (second graph and table). T1 represents Chimarra obscura, T2:
Ceratopsyche bronta, T3: Ceratopsyche sparna, E1: Caenis diminuta, E2: Maccaffertium
modestum and E3 is Maccaffertium interpunctatum.
[Graph not reproduced: Ceratopsyche sparna(T3)-setA. Axes: PCR template concentration series (x),
log RAC (y). Series: Chimarra obscura (T1), Ceratopsyche bronta (T2), Caenis diminuta (E1),
Maccaffertium modestum (E2), Maccaffertium interpunctatum (E3).]
T1
T2
E1
0.1
7
1
786
0.01
885
296
289,6.30
0.001
2
1
111
0.0001
478
168
3
E2
E3
3
8
2,062
5,293
2
2
2
3
0.00001 0.000001
2
2
92
1
2
102
319
1
0.80
2
0.00001
0.000001
[Graph not reproduced: Next gen-Ceratopsyche sparna(T3)-40F. Axes: Generated sequence (x),
Number of Reads (y). Series: Seqtrim analysis, manual analysis.]
T1obs
T2bro
T3spa
E1dim
E2mod
E3int
T3-MID 51PCR
40/183
SeqTrim
manual
analysis
analysis
0
0
0
10
0
34
0
0
0
0
0
0
Comparison between the relative amplification copies obtained from qPCR method (first graph
and table) and the number of reads obtained from 454 FLX pyrosequencer for sample Caenis
diminuta (second graph and table). T1 represents Chimarra obscura, T2: Ceratopsyche bronta,
T3: Ceratopsyche sparna, E1: Caenis diminuta, E2: Maccaffertium modestum and E3 is
Maccaffertium interpunctatum.
[Graph not reproduced: Caenis diminuta(E1)-setA. Axes: PCR template concentration series (x),
log RAC (y). Series: Chimarra obscura (T1), Ceratopsyche bronta (T2), Ceratopsyche sparna (T3),
Maccaffertium modestum (E2), Maccaffertium interpunctatum (E3).]
T1
T2
0.1
65
14
0.01
627,823
9,307,743
0.001
6,956,836
204,253
T3
E2
E3
8
31
73
8,102,861
0.003
3,902
44762
16,497
110,540,515
0.0001
80,361,436
284,881
0.0003
46020
1,155,431
0.00001
2
85,284
0.000001
17,079
3,082
2,117,3.91 7,739,7.54
28,329
2,418
84,695
6,338
[Graph not reproduced: Next gen-Caenis diminuta(E1)-setA. Axes: Generated Sequence (x),
Number of Reads (y). Series: Seqtrim analysis, Manual analysis.]
T1obs
T2bro
T3spa
E1dim
E2mod
E3int
E1-MID 54PCR
40/183
SeqTrim
manual
analysis
analysis
32
0
0
2
0
22
0
60
0
19
0
1
Comparison between the relative amplification copies obtained from qPCR method (first graph
and table) and the number of reads obtained from 454 FLX pyrosequencer for sample
Maccaffertium modestum (second graph and table). T1 represents Chimarra obscura, T2:
Ceratopsyche bronta, T3: Ceratopsyche sparna, E1: Caenis diminuta, E2: Maccaffertium
modestum and E3 is Maccaffertium interpunctatum.
[Graph not reproduced: Maccaffertium modestum(E2)-setA. Axes: PCR template concentration series (x),
log RAC (y). Series: Chimarra obscura (T1), Ceratopsyche bronta (T2), Ceratopsyche sparna (T3),
Caenis diminuta (E1), Maccaffertium interpunctatum (E3).]
T1
T2
T3
E1
E3
0.1
43,064,627
978,356
13,722,119
19,082
23
0.01
-7,005,225
4,558,096
253,214
58
0.001
---8902
1
0.0001
345
89
-3848
25
0.00001
280
89
--37
0.000001
-182
--19
[Graph not reproduced: Next gen-Maccaffertium modestum(E2)-setA. Axes: Generated Sequence (x),
Number of Reads (y). Series: Seqtrim analysis, manual analysis.]
T1obs
T2bro
T3spa
E1dim
E2mod
E3int
E2-MID 56Amplicon
based
40/183
SeqTrim
manual
analysis
analysis
0
13
0
46
0
26
0
1
0
115
0
53
Comparison between the relative amplification copies obtained from qPCR method (first graph
and table) and the number of reads obtained from 454 FLX pyrosequencer for sample
Maccaffertium interpunctatum (second graph and table). T1 represents Chimarra obscura, T2:
Ceratopsyche bronta, T3: Ceratopsyche sparna, E1: Caenis diminuta, E2: Maccaffertium
modestum and E3 is Maccaffertium interpunctatum.
[Graph not reproduced: Maccaffertium interpunctatum(E3)-setA. Axes: PCR template concentration
series (x), log RAC (y). Series: Chimarra obscura (T1), Ceratopsyche bronta (T2), Ceratopsyche
sparna (T3), Caenis diminuta (E1), Maccaffertium modestum (E2).]
T1
T2
T3
0.1
884,324,121
579,406,248
139,917,386
E1
E2
3,454,391
142
0.01
----
0.001
----
0.0001
225
70
99,334
3,576,210 2,567,49.2
37
4
4,420
109
0.00001 0.000001
209
261
58
81
20,171
-3,125
80
3,326
112
0.000001
[Graph not reproduced: Next gen-Maccaffertium interpunctatum(E3)-setA. Axes: Generated Sequence (x),
Number of Reads (y). Series: Seqtrim analysis, Manual analysis.]
T1obs
T2bro
T1obs
T2bro
T3spa
E1dim
E2mod
E3int
T3spa
E1dim
Generated Sequence
E2mod
E3-MID 61Amplicon
based
40/183
Seqtrim
manual
analysis
analysis
0
5
0
7
0
28
0
2
0
73
0
143
E3int
T1obs
T2bro
T3spa
E1dim
E2mod
E3int
Comparison between the relative amplification copies obtained from qPCR method (first graph
and table) and the number of reads obtained from 454 FLX pyrosequencer for sample Chimarra
obscura (second graph and table). T1 represents Chimarra obscura, T2: Ceratopsyche bronta,
T3: Ceratopsyche sparna, E1: Caenis diminuta, E2: Maccaffertium modestum and E3 is
Maccaffertium interpunctatum.
Chimarra obscura (T1)-setB
14.00
Ceratopsyche bronta(T2)
12.00
Ceratopsyche sparna(T3)
log RAC
10.00
Caenis diminuta(E1)
8.00
Maccafferrtium modestum(E2)
6.00
Maccafferrtium(E3)
4.00
2.00
0.00
0.1
-2.00
T2
T3
E1
E2
E3
-1
5220.60
660965624.00
978356.00
30573.63
307451.64
0.01
0.001
0.0001
0.00001
0.000001
PCR template concentration series
-2
174.85
16,431,945
4,653,871
364
3,061
-3
44,453
3,245,479
48,115,553
910
546,552
-4
16384
43,238
22,985,420,368
21
35,858
-5
2,179
1,675
100,611,202,922
12
4,482
-6
0.75
1.28
452,773,950,009
0.78
1
[Graph not reproduced: Next gen-Chimarra obscura(T1)-setB. Axes: Generated Sequence (x),
Number of Reads (y). Series: Seqtrim analysis, Manual analysis.]
T1-MID 16-Amplicon based
240/545
Seqtrim analysis
manual analysis
T1obs
0
317
T2bro
10
129
T3spa
0
0
E1dim
66
27
E2mod
0
175
E3int
0
69
Comparison between the relative amplification copies obtained from qPCR method (first graph
and table) and the number of reads obtained from 454 FLX pyrosequencer for sample
Ceratopsyche bronta (second graph and table). T1 represents Chimarra obscura, T2:
Ceratopsyche bronta, T3: Ceratopsyche sparna, E1: Caenis diminuta, E2: Maccaffertium
modestum and E3 is Maccaffertium interpunctatum.
[Graph not reproduced: Ceratopsyche bronta (T2)-setB. Axes: dilution, 0.1 to 0.000001 (x); log RAC (y).
Series: Chimarra obscura (T1), Ceratopsyche sparna (T3), Caenis diminuta (E1), Maccaffertium
modestum (E2), Maccaffertium interpunctatum (E3).]
T1
T3
E1
E2
E3
PCR template concentration series
-1
0.00
7.94
0.00
0.14
0.04
-2
0.00
27
0.00
0.55
0.02
-3
0.00
56
0.00
2
11
-4
0.00
1
0.00
0.18
0.00
-5
-6
5,990,378,433
---2,425,750
1,233,405,466
84,603,599,871
-15,120,473
6,692,972,775
[Graph not reproduced: Next gen-Ceratopsyche bronta (T2)-setB. Axes: Generated Sequences (x),
Number of Reads (y). Series: Seqtrim analysis, Manual analysis.]
T2-MID 50Amplicon
based
240/545
T1obs
T2bro
T3spa
E1dim
E2mod
E3int
SeqTrim
analysis
0
10
0
66
0
0
manual analysis
317
129
0
27
175
69
Comparison between the relative amplification copies obtained from qPCR method (first graph
and table) and the number of reads obtained from 454 FLX pyrosequencer for sample
Ceratopsyche sparna (second graph and table). T1 represents Chimarra obscura, T2:
Ceratopsyche bronta, T3: Ceratopsyche sparna, E1: Caenis diminuta, E2: Maccaffertium
modestum and E3 is Maccaffertium interpunctatum.
[Graph not reproduced: Ceratopsyche sparna (T3)-setB. Axes: dilution, 0.1 to 0.000001 (x); log RAC (y).
Series: Chimarra obscura (T1), Ceratopsyche bronta (T2), Caenis diminuta (E1), Maccaffertium
modestum (E2), Maccaffertium interpunctatum (E3).]
T1
T2
E1
E2
E3
-1
25
0.89
25606380
2449771426
6244764411
PCR template concentration series
-2
216
0.64
279,018
45,205,657
31,744,426
-3
34
17,079
3,147
218,913
71,715
-4
0.00
12
0.00
9
0.00
-5
--2,269,928,957
---
-6
--937,481,977,746
---
[Graph not reproduced: Next gen-Ceratopsyche sparna (T3)-setB. Axes: Generated Sequence (x),
Number of Reads (y). Series: Seqtrim analysis, Manual analysis.]
T1obs
T2bro
T3spa
E1dim
E2mod
E3int
T3-MID 51Am0plicon
based
240/545 F
Seqtrim
manual
analysis
analysis
1
0
146
64
277
138
338
56
0
0
0
4
Comparison between the relative amplification copies obtained from qPCR method (first graph
and table) and the number of reads obtained from 454 FLX pyrosequencer for sample Caenis
diminuta (second graph and table). T1 represents Chimarra obscura, T2: Ceratopsyche bronta,
T3: Ceratopsyche sparna, E1: Caenis diminuta, E2: Maccaffertium modestum and E3 is
Maccaffertium interpunctatum.
[Graph not reproduced: Caenis diminuta (E1)-setB. Axes: dilution, 0.1 to 0.000001 (x); log RAC (y).
Series: Chimarra obscura (T1), Ceratopsyche bronta (T2), Ceratopsyche sparna (T3), Maccaffertium
modestum (E2), Maccaffertium interpunctatum (E3).]
T1
T2
T3
E2
E3
-1
129
141457
2048
3268053
16
PCR template concentration series
-2
478
123145
4096
1452392
8
-3
760
1355130
89525
11945799
33,923
-4
6,251
2,817
3,091,766
186,653
48,983
-5
843
359
0.00
891
5,293
-6
------
[Graph not reproduced: Next gen-Caenis diminuta (E1)-setB. Axes: Generated Sequence (x),
Number of Reads (y). Series: Seqtrim analysis, manual analysis.]
T1obs
T2bro
T3spa
E1dim
E2mod
E3int
E1-MID 54Amplicon
based
240/545
Seqtrim
manual
analysis
analysis
0
185
25
136
0
7
141
351
0
102
0
24
Comparison between the relative amplification copies obtained from qPCR method (first graph
and table) and the number of reads obtained from 454 FLX pyrosequencer for sample
Maccaffertium modestum (second graph and table). T1 represents Chimarra obscura, T2:
Ceratopsyche bronta, T3: Ceratopsyche sparna, E1: Caenis diminuta, E2: Maccaffertium
modestum and E3 is Maccaffertium interpunctatum.
[Graph not reproduced: Maccaffertium modestum (E2)-setB. Axes: dilution, 0.1 to 0.000001 (x); log RAC (y).
Series: Chimarra obscura (T1), Ceratopsyche bronta (T2), Ceratopsyche sparna (T3), Caenis
diminuta (E1), Maccaffertium interpunctatum (E3).]
T1
T2
T3
E1
E3
PCR template concentration series
-1
0.00
19
104272
177
0.54
-2
0.06
35
161368
284
0.99
-3
0.03
38
45387
40
134
-4
0.55
150
28526
3929
464
-5
1.18
118
0.00
8964
302
-6
7
-----
[Graph not reproduced: Next gen-Maccaffertium modestum (E2)-setB. Axes: Generated Sequence (x),
Number of Reads (y). Series: Seqtrim analysis, Manual analysis.]
T1obs
T2bro
T3spa
E1dim
E2mod
E3int
E2-MID 56Amplicon
based
240/545
Seqtrim
manual
analysis
analysis
0
144
49
70
0
3
3
7
0
98
0
47
Comparison between the relative amplification copies obtained from qPCR method (first graph
and table) and the number of reads obtained from 454 FLX pyrosequencer for sample
Maccaffertium interpunctatum (second graph and table). T1 represents Chimarra obscura, T2:
Ceratopsyche bronta, T3: Ceratopsyche sparna, E1: Caenis diminuta, E2: Maccaffertium
modestum and E3 is Maccaffertium interpunctatum.
[Graph not reproduced: Maccaffertium interpunctatum (E3)-setB. Axes: dilution, 0.1 to 0.000001 (x);
log RAC (y). Series: Chimarra obscura (T1), Ceratopsyche bronta (T2), Ceratopsyche sparna (T3),
Caenis diminuta (E1), Maccaffertium modestum (E2).]
T1
T2
T3
E1
E2
PCR template concentration series
-1
67
19893
443
14
5
-2
32
50012
319
9
0.33
-3
0.11
76
0.75
0.01
--
-4
0.27
0.11
11
0.50
--
-5
17
15
7
117
--
-6
8
8
7
3
--
[Graph not reproduced: Next gen-Maccaffertium interpunctatum (E3)-setB. Axes: Generated Sequence (x),
Number of Reads (y). Series: Seqtrim analysis, Manual analysis.]
T1obs
T2bro
T3spa
E1dim
E2mod
E3int
E3-MID 61Amplicon
based
240/545
Seqtrim
manual
analysis
analysis
0
66
0
2
0
0
0
0
0
5
0
206