De novo Pogonus chalceus one

advertisement
PLos one
OPEN 3 ACCESS Freely available online
De novo Transcriptome Assembly and SNP Discovery
in the Wing Polymorphic Salt Marsh Beetle Pogonus
chalceus (Coleoptera, Carabidae)
Steven M. Van Belleghem 1'2*, Dick Roelofs3, Jeroen Van Houdt4, Frederik Hendrickx1,2
1 Terrestrial Ecology Unit, Biology Departm ent, Ghent University, Gent, Belgium, 2 D epartm ent Entomology, Royal Belgian Institute of Natural Sciences, Brussel, Belgium,
3 D epartm ent o f Ecological Science, VU University Amsterdam, Amsterdam, The Netherlands, 4 Laboratory of Cytogenetics and G enom e Research, Leuven, Belgium
Abstract
Background: The salt marsh beetle Pogonus chalceus represents a unique o p p o rtu n ity to understand and study the origin
and evolution o f dispersal polym orphism s as remarkable Inter-population divergence In dispersal related traits (e.g. w ing
developm ent, body size and m etabolism) has been shown to persist In face o f strong hom ogenizing gene flow . Sequencing
and assembling the transcriptom e o f P. chalceus Is a first step In developing large scale genetic Inform ation th a t w ill allow us
to furthe r study the recurrent phenotypic evolution In dispersal traits In these natural populations.
M ethodology/Results: We used the Illum ina HISeq2000 to sequence 37 Gbases o f the transcriptom e and perform ed de
novo transcriptom e assembly w ith the T rinity short read assembler. This resulted In 65,766 contlgs, clustering Into 39,393
unique transcripts (unlgenes). A subset o f 12,987 show sim ilarity (BLAST) to known proteins In the NCBI database and 7,589
are assigned Gene O ntology (GO). Using hom ology searches we Identified all reported genes Involved In w ing
developm ent, juvenile- and ecdysterold horm one pathways In Tribolium castaneum . A bo ut half (56.7%) o f the unique
assembled genes are shared am ong three life stages (third-instar larva, pupa, and ¡mago). We Identified 38,141 single
nucleotide polym orphism s (SNPs) In these unlgenes. O f these SNPs, 26,823 (70.3%) were found In a predicted open reading
frame (ORF) and 6,998 (18.3%) were nonsynonymous.
Conclusions: The assembled transcriptom e and SNP data are essential genom ic resources for furthe r study o f the
developm ental pathways, genetic mechanisms and m etabolic consequences o f adaptive divergence In dispersal pow er In
natural populations.
C itation : Van Belleghem SM, Roelofs D, Van H oudt J, Hendrickx F (2012) De novo Transcriptome Assembly and SNP Discovery in the Wing Polymorphic Salt Marsh
Beetle Pogonus chalceus (Coleoptera, Carabidae). PLoS ONE 7(8): e42605. doi:10.1371/journal.pone.0042605
Editor: Zhanjiang Liu, Auburn University, United States o f America
Received May 17, 2012; A ccepted July 9, 2012; Published August 1, 2012
C opyrigh t: © 2012 Van Belleghem e t al. This is an open-access article distributed under the term s of th e Creative Commons Attribution License, which permits
unrestricted use, distribution, and reproduction in any medium, provided th e original author and source are credited.
Funding: Funding was received from th e FWO-Flanders (PhD grant to Steven Van Belleghem) and th e Belgian Science Policy (MO/36/025) and partly conducted
within th e framework o f th e Interuniversity Attraction Poles program m e IAP (SPEEDY) - Belgian Science Policy. The funders had no role in study design, data
collection and analysis, decision to publish, or preparation o f th e manuscript.
C om peting Interests: The authors have declared th a t no com peting interests exist.
* E-mail: Steven.VanBelleghem@UGent.be
Introduction
a n d in w hich o rd e r they are involved in adaptive differentiation
[15,16,17], (ii) w h eth er adap tatio n s a n d the evolution o f distinct
dispersal phenotypes are m ainly the result o f m utations in coding
regions o f the genom e or ra th e r due to differences in gene
expression (i.e. regulatory changes) [18,19,20,21], (iii) if the
re cu rre n t ap p ea ran c e o f this trait is caused b y in d ep en d en t
m utations o r ra th e r by introgression o f standing genetic variation
[22,23] or the release o f cryptic genetic v a riation b y changes in
epistatic interactions [24,25], a n d (iv) how disruptive selection in
dispersal traits affects m etabolic pathw ays resulting in genetically
co rrelated changes in o th er life history traits [26], Such
in form ation is particularly crucial to link the proxim ate a n d
ultim ate m echanism s underlying the re cu rre n t intra- a n d in te r­
specific evolution o f dispersal phenotypes.
T h e en d an g e red halobiontic g ro u n d beetle Pogonus chalceus
(M arsham , 1802) is a m ost suitable system to study the m olecular
m echanism s b e h in d adaptive divergence in dispersal traits. T h e
species exhibits a clear w ing polym orphism w ith b o th short-w inged
individuals (brachypterous), long-w inged individuals (m acropter-
A vast n u m b e r o f insect species are c h aracterized by rem arkable
a n d often discontinuous m orphological variatio n in traits related to
dispersal capacity [1,2]. As v a riation in such traits determ ines the
ability o f populations a n d species to persist in b o th p a tc h y a n d
c hanging landscapes [3,4,5,6], research on the ultim ate a n d
proxim ate causes o f dispersal is a central them e in b o th
evolutionary ecology a n d conservation biology [7,8], T heoretical
a n d em pirical research o n the ultim ate cause o f dispersal
d e m o n stra ted th a t such dispersal polym orphism s are the result
o f disruptive selection in heterogeneous landscapes in response to
h a b ita t persistence [5,9,10] a n d fitness hom ogenization u n d e r
spatiotem poral pop u latio n fluctuations [1 l,1 2,13,14](H endrickx et
al. u n d e r rev.).
Still, only little is know n ab o u t the m olecular basis o f this
p ro fo u n d phenotypic variation. F or instance, it is unclear w h eth er
(i) divergence in dispersal traits is caused by a small set genes th at
exert large effects o r b y m an y genes w ith m o d erate to small effect,
PLoS ONE I w w w .p lo s o n e .o rg
1
A u g u s t 2 01 2 | V o lu m e 7 | Issue 8 | e 4 2 6 0 5
T ra n s c rip to m ic s o f a W in g P o ly m o rp h ic B eetle
ous), as well as interm ediate form s [27]. T hese differences in
dispersal pow er have b e en show n to be related to differences in
h a b ita t stability a n d persistence, w ith long w inged individuals
o ccurring prim arily in unstable a n d relatively recen t salt m arsh
areas. T h e d e term in atio n o f w ing size in this species is polygenic as
crosses b etw een brachy- a n d m acropterous p opulations result in
the p ro d u c tio n o f individuals w ith interm ediate w ing sizes [28].
D ivergent selection o n w ing size likely results in sim ultaneous
selection in o th er life history traits, as suggested b y a strong
correlation am o n g populations betw een average w ing size a n d
frequencies o f the m etabolic enzym e isoform s o f the isocitrate
dehydrogenase 2 (IDH 2) p ro tein [6,29]. M oreover, w ithin a salt
m arsh situated a t the A dantic coast in the G u é ran d e region in
F rance, individuals o f P. chalceus occur chiefly in two h a b ita t types
interlaced a t a very small scale, i.e. ponds a n d canals [29]. Salt
ex traction ponds are m ostly occupied by long w inged individuals
w ith larger body size a n d the ID H 2-B allozym e. T h e bo rd ers o f
tidal canals th a t lead sea w ater to these ponds are occupied by
sm aller short w inged individuals w ith the ID H 2 -D allozym e.
W hile signals o f strong divergent n a tu ra l selection are observed
betw een the ecotypes for the ID H 2 allozym es, dispersal pow er a n d
body size, no differentiation could be d etected for n e u tra l m arkers,
suggesting high levels o f gene flow a m o n g b o th ecotypes [6,29,30].
T hese findings a n d the incipient stage o f divergence m ake the salt
m arsh beetle P. chalceus attractive for genetic studies o f selection,
a d ap tatio n , a n d gene flow.
It has b e en show n th a t p ortions o f the w ing developm ent gene
netw ork are largely conserved am o n g holom etabolous insect
orders [31,32]. A n u m b e r o f genes involved in the patterning,
grow th a n d differentiation o f the w ing in Pfrosophila have been
identified [33] a n d c h aracterized in T. castaneum [34], F u rth e r­
m ore, genes involved in the juvenile h o rm o n e (JH) a n d ecdysteroid
(ECD) p a th w ay have also b e e n show n to be relevant for the study
o f insect polym orphism s, including w ing polym orphism s
[35,36,37,38,39]. Flow ever, little genom ic resources are available
to study the genetic arch itectu re o f dispersal polym orphism s in
n a tu ra l p opulations o f g ro u n d beetles, in w hich intraspecific
dispersal polym orphism s can be found ab u n d an tly [40,41,42].
C onsidering g ro u n d beetles (C arabidae), N C B I reports 306 E ST s
from a study co m p arin g seven co leopteran species [43] a n d a
m itochondrial genom e o f a Calosoma species [44]. O th e r genom ic
resources com prise m ostly single bar-co d in g gene sequences, such
as cytochrom e oxidase a n d ribosom al R N A , used for phylogenetic
studies. T h e only co leopteran species for w hich the genom e has
b e en sequenced is the re d flour beetle Tribolium castaneum [34],
belonging to the P olyphaga suborder. T h e evolutionary distance o f
this suborder to the A dep h ag a suborder, com prising C a rab id ae
species, is estim ated to be m o re th a n 200 M a [45].
S h o rt re ad de now transcriptom e analysis has pro v en to be a
valuable first step to study genetic characteristics a n d allow ed
researchers to o b tain sequence inform ation a n d expression levels
o f genes involved in developm ental a n d m etabolic pathw ays,
insecticide resistance, can d id ate transcripts for diapauses p re p a ra ­
tion based o n hom ology w ith related organism s a n d to discover
single nucleotide polym orphism (SNP) in all kinds o f m odel a n d
non-m odel organism s [46,47,48,49],
In this study, w e used Illum ina short re ad sequencing for de novo
transcriptom e assem bly a n d analysis o f the salt m arsh beetle P.
chalceus. W e co nstructed th ree libraries covering th ree life stages,
one third-instar larva, one p u p a a n d one a d u lt m ale beetle. W e
m atc h ed these sequences in a B L A ST search to know n proteins o f
the N C B I database a n d aligned the sequences to the genom e o f T.
castaneum. M atches include a n u m b e r o f genes relevant to the study
o f w ing developm ent a n d dispersal polym orphism . F urth erm o re,
PLoS ONE I w w w .p lo s o n e .o rg
we screened the transcriptom e for b o th conservative SN Ps a n d
SN Ps resulting in am ino acid changes, w hich will allow genom e
w ide screening o f v a riation betw een different ecotypes. T h e
resulting assem bled a n d a n n o ta te d transcriptom e sequences
constitute com prehensive genom ic resources, available for further
studies a n d m ay provide a fast a p p ro ac h for identifying genes
involved in developm ental pathw ays (i.e. w ing developm ent, J H ,
a n d E C D ) relevant to adaptive divergence in this species.
Materials and M ethods
Tissue material and nucleic acid Isolation
T h e geographical distribution o f P. chalceus extends along the
A tlantic coasts from D e n m ark u p to a n d including the m ajo r p a rt
o f the M e d ite rran e a n coasts [50]. Beetles w ere c ap tu red in the
G u é ran d e region, F rance. N o specific perm its w ere re q u ire d for
the described field study. Eggs w ere o b tain e d from the canal
ecotype (short-winged) a n d raised in a c o m m o n environm ent. A
larva (third-instar), p u p a a n d im ago (male) resulting from the sam e
m o th e r w ere frozen in liquid nitro g en a n d subsequently used for
sequencing. T h e sex d e term in atio n is p ro b ab ly o f the X Y type
P i]T o ta l R N A was isolated from a com plete larv a (third-instar),
p u p a a n d newly em erged m ale im ago. R N A was ex tracted using
the SV T o ta l R N A isolation System (Prom ega, M adison, USA)
according to m an u fa ctu re r’s instructions a n d genom ic D N A was
rem oved by on-colum n digest w ith D N ase I. R N A was quantified
by m easuring the ab sorbance a t 260 n m using a N a n o D ro p
sp ectrophotom eter (T herm o Fisher Scientific, Inc.). T h e p u rity o f
the R N A sam ples was assessed a t a n ab sorbance ratio o f O D 26o/
280 a n d O D 26o/230 a n d the integrity was confirm ed on a n A gilent
2100 B ioanalyzer (Agilent T echnologies, Inc.).
Illumina paired-end cDNA library construction and
sequencing
T h e cD N A libraries w ere co nstructed for the larva, p u p a a n d
im ago using the T r u S e q ™ R N A Sam ple P re p a ra tio n K it
(Illum ina, Inc.) according to the m an u fa ctu re r’s instructions.
Poly-A c o ntaining m R N A was purified from 2 pg o f total R N A
using oligo(dT) m agnetic beads a n d fragm ented into 2 0 0 -5 0 0 b p
pieces using divalent cations a t 94°C for 5 m in. T h e cleaved R N A
fragm ents w ere copied into first stran d cD N A using S u p e rsc rip t II
reverse transcriptase (Life T echnologies, Inc.) a n d ra n d o m
prim ers. A fter second strand cD N A synthesis, fragm ents w ere
en d rep aired , a-tailed a n d indexed ad ap ters w ere ligated. T h e
products w ere purified a n d en rich ed w ith P C R to create the final
cD N A library. T h e tagged cD N A libraries w ere p ooled in equal
ratios a n d used for 2 x 1 0 0 b p p a ire d -en d sequencing on a single
lane o f the Illum ina H iS eq2000 (G enom ics C ore, U Z L euven,
Belgium). A fter sequencing, the sam ples w ere dem ultiplexed a n d
the indexed a d a p te r sequences w ere trim m ed using the C A SA V A
v l.8 .2 software (Illum ina, Inc.).
De novo transcriptom e assembly
T h e transcriptom e reads w ere de novo assem bled using T rin ity
(release 20111126) [52] on the S T E V IN S u p erco m p u ter In fra ­
structure a t G h e n t U niversity (48 cores, 350 G o f m em ory). T h e
th ree sam ples (i.e. larva, p u p a , a n d imago) w ere assem bled a n d
analyzed as a p ooled dataset. As the T rin ity assem bler discards low
coverage Um ers, no quality trim m in g o f the reads was p erform ed
p rio r to the assem bly. T rin ity was ru n o n the p a ire d -en d sequences
w ith the fixed default k-m er size o f 25, m in im u m contig length o f
200, p a ire d fragm ent length o f 500, 12 C PU s, a n d a butterfly
H e ap S p ace o f 25G (i.e. allocated m em ory). P rio r to subm ission o f
2
A u g u s t 2 01 2 | V o lu m e 7 | Issue 8 | e 4 2 6 0 5
T ra n s c rip to m ic s o f a W in g P o ly m o rp h ic B eetle
the d a ta to the T ran scrip to m e S hotgun Assem bly Sequence
D atabase (TSA), assem bled transcripts w ere blasted to N C B I’s
U niV ec
datab ase
(h ttp ://w w w .n c b i.n lm .n ih .g o v /V e c S c re e n /
U niV ec.htm l) to identify segm ents w ith a d a p te r contam in atio n
a n d trim m ed w h en significant hits w ere found. T his a d ap ter
c o n tam in atio n m ay result from sequencing into the 3 ' ligated
a d a p te r o f small fragm ents (< 1 0 0 bp). H u m a n a n d bacterial
sequence c o n ta m in a tio n was investigated using the w eb-based
version o f D econS eq [53], w ith a qu ery coverage a n d sequence
identity threshold o f 90% .
for n u m b e r o f th read s (-t) = 8 a n d m axim um n u m b e r o f alignm ents
(sam pe -n) = 40. U n d e r these settings, re ad pairs m ap p in g to
m ultiple equally best positions are p laced random ly. P roperly
p a ire d reads w ith a m ap p in g quality o f a t least 20 (-q = 20) w ere
extracted from the resulting B A M file using SA M tools [67] for
fu rth er analyses. P roperly p a ire d is defined as b o th left a n d right
reads m ap p e d in opposite directions o n the sam e tran scrip t a t a
distance com patible w ith the expected m ea n size o f the fragm ents.
T h e high m ap p in g quality ensures reliable (unique) m ap p in g o f the
reads, w hich im p o rta n t for v a ria n t calling.
As reads c an m ap to m ultiple genes or isoform s a n d we have no
available reference genom e, w e used the R S E M software [68] to
assign reads to genes a n d isoform s a n d to co u n t transcript
abundances. R S E M requires gap-free alignm ents a n d therefore
the Bowtie aligner (older version, n o t Bowtie 2) was used a n d
pro p erly p a ire d reads w ere extracted. R S E M a n d Bowtie w ere
used as im p lem en ted in the T rin ity software package [52]. Bowtie
m ap p in g p a ram ete rs w ere set as follows: m ax im u m n u m b e r o f
m ism atches allow ed (-v) = 2, n u m b e r o f valid alignm ents p e r re ad
p a ir (-k) = 40. S etting the - k p a ra m e te r allows reads to align
against u p to 40 different locations. T h e old version o f Bowtie does
n o t re p o rt m ap p in g quality and, hence, does n o t enable filtering
on this p a ram ete r. W e c o m p a red the th ree developm ental stages
for tran scrip t com position. U niquely expressed genes for each life
stage w ere co u n ted a n d investigated for G ene O ntology (GO)
com position.
Functional annotation
T h e assem bled transcripts w ere subjected to sim ilarity search
against N C B I’s n o n -re d u n d a n t (nr) database using the B L A ST x
algorithm [54], w ith a cut-off E -value o f < IO-3 a n d a H S P (highscoring segm ent pairs) length cut-off o f 33. T h e publicly available
platform in d ep e n d en t ja v a im plem entation o f the B last2G O
software [55] was used for blasting a n d to retrieve associated
gene ontology (GO) term s describing biological processes,
m olecular functions, a n d cellular com ponents [56]. T o p 20 blast
hits w ith a cut-off E-value o f < 1 0 6 a n d sim ilarity cut-off o f 55%
w ere considered for G O a n n o tatio n . N ext, to get a n idea o f the
a m o u n t o f genes o f the T. castaneum transcriptom e are covered by
P. chalceus transcripts, assem bled transcripts w ere aligned to the
T rib o liu m Official G ene Set [34,57] using the P R O m e r pipeline
o f the M U M m e r 3.0 software [58] w ith default p aram eters. T h e
presence o f o p en read in g fram es (ORFs) was investigated using the
O R F -p red ic to r server w ith a n O R F cut-off length o f 200 b p [59].
Variant analysis
O n ly reliable pro p erly p a ire d BW A m ap p e d reads w ere
considered for Single N ucleotide Polym orphism (SNP) calling.
Indels w ere n o t considered because alternative splicing im pedes
reliable indel discovery. SN Ps w ere called using the SA M tools
software package [67]. G enotype likelihoods w ere c o m p u ted using
the SA M tools utilities a n d variable positions in the aligned reads
c o m p a red to the reference w ere called w ith the B CFtools utilities
[69]. U sing the v arF ilter com m and, SN Ps w ere called only for
positions w ith a m inim al m ap p in g quality (-Q) a n d coverage (-d) o f
25. T h e m ax im u m re ad d e p th (-D) was set a t 200. T h e reference is
based o n all three sam ples com bined. T h erefo re, to co m p are the
variational com position o f the sam ples, w e ex tracted only
heterozygous SN P positions (i.e. M ax-likelihood estim ate o f the
site allele freq u e n c y ~ 0 .5 ) from each sam ple for the unigenes.
U n iq u e a n d shared SN Ps w ere extracted w ith the V C F tools
software [70]. SN Ps located in a n o p en read in g fram e (ORF)
< 2 0 0 b p w ere extracted. A custom peri script was used to test
w h eth er these SN Ps resulted in a n am ino acid change in the
p red icted O R F .
Genes o f interest
T o guide o u r search for w ing developm ent genes, we used a
previously g e n era te d list o f Tribolium castaneum (T able S 13b
R ich ard s et al. 2008 [34]). T o find P. chalceus w ing developm ent
orthologs, we used T. castaneum p ro tein sequences in a local
B L A ST search (tBLASTn) querying the assem bled P. chalceus
transcriptom e sequences. H its w ith a n E-value less th a n l e - 15
w ere exam ined. T h e m ost significant h it was considered to be the
putative P. chalceus orthologue o f the w ing d evelopm ent gene in T.
castaneum. SubsequenÜy, the P. chalceus tran scrip t sequence was
used in a reciprocal b last to the N C B I n r database. I f the B L A ST
a n d reciprocal B L A ST m atched, we assigned orthology to th a t
sequence. F or the apterous gene, we extracted sequences o f D.
melanogaster, T. castaneum, A. mellifera a n d A.pisum from G enB ank a n d
constructed a neighbor-joining tree o f the p ro tein sequences w ith
M E G A 5.0 [60], b o o tstrap p e d 1000 tim es. T h e m ethodology used
is sim ilar to th a t o f Brisson et al. 2010 [61].
N ext, genes involved in the juvenile h o rm o n e (JH) [62] a n d
ecdysteroid (ECD) [63] path w ay in T. castaneum w ere extracted
from the K E G G path w ay datab ase [64] a n d the sam e p ro c ed u re
for orthologue discovery for w ing d evelopm ent genes was followed.
T h e assem bled transcriptom e was also investigated for the
presence o f the isocitrate dehydrogenase 2 (IDH 2) gene, w hich
has b e e n show n to b e strongly co rrelated w ith dispersal pow er in P.
chalceus [6,29]. F or this; the T. castaneum p ro tein sequence o f the
gene hom ologues to ID H 2 (X P_970446) was blasted to the P.
chalceus transcript.
Results and Discussion
Sequencing, transcriptom e assembly and validation
T h re e developm ental stages (one third-instar larva, p u p a a n d
m ale a d u lt beetíe) w ere b arco d e tagged a n d sequenced o n one lane
o f a n Illum ina H iS eq2000 sequencer. S equencing o f cD N A
libraries gen erated a total o f 184,749,261 raw p a ire d e n d reads
w ith a length o f 101 bp, resulting in a total o f 37.32 giga bases.
T h e raw sequence reads w ere o f good quality (< 2 0 P h re d score). A
sum m ary o f sequencing, assem bly a n d a n n o ta tio n results for the
th ree sam ples a n d the pooled reads dataset is presen ted in T ab le 1.
F or the p u p a sam ple, rem arkably less reads w ere sequenced.
R eads w ere assem bled using the R N A seq de novo assem bler T rin ity
[52]. T h e com plete re ad dataset assem bled into 65,766 contigs,
clustering into 39,393 isoform clusters (i.e. unigenes). W e selected
the longest tran scrip t as the representative for each cluster. T h e
M apping reads to reference transcriptome
T o align the reads b ack to the assem bled reference tran scrip ­
tom e the B urrow s— W h eeler A ligner (BWA) p ro g ra m [65] a n d the
Bowtie aligner [66] w ere used. BW A was used for v a ria n t analysis.
R eads w ere m ap p e d for each sam ple (i.e. larva, p u p a , a n d imago)
separately to the assem bled transcriptom e based o n the pooled
re ad d ata. T h e BW A default values for m ap p in g w ere used, except
PLoS ONE I w w w .p lo s o n e .o rg
3
A u g u s t 2 01 2 | V o lu m e 7 | Issue 8 | e 4 2 6 0 5
T ra n s c rip to m ic s o f a W in g P o ly m o rp h ic B eetle
T a b le 1. P. c h a lc e u s t r a n s c r i p t o m e s e q u e n c i n g , a s s e m b l y a n d a n n o t a t i o n s u m m a r y .
Stage
Sequencing
Assembly
Annotation
Larva
Pupa
Im ago
ALL
Sequencing reads (101 bp paired end)
66,595,267
48,251,298
69,902,696
184,749,261
Bases (Gbp)
13.45
9.75
14.12
37.32
Trinity assembly (Transcripts)
65,766
Unigenes (Isoform clusters)
39,393
N50 length (bp) (Unigenes)*
1,904
Max length (bp) (Transcripts)
19,606
Max length (bp) (Unigenes)
19,606
Mean length (bp) (Transcripts)
1,044
Mean length (bp) (Unigenes)
868
Median length (bp) (Transcripts)
422
Median length (bp) (Unigenes)
365
Transcripts with BLAST results
29,358
Unigenes with BLAST results
12,987
Transcripts annotated with GO terms
17,756
Unigenes annotated with GO terms
7,589
Mapping
Read m appings (properly paired)
83,539,754
53,814,547
85,597,567
(BWA)**
Properly paired reads (%)
92.6
90.4
93.1
Mean coverage (properly paired)
93.7
55.2
111.6
Median coverage (properly paired)
0.93
0.91
2.27
Mapping
Read m appings
143,056,584
97,896,830
156,747,118
(Bowtie)**
Properly paired reads (%)
86.8
87.2
87.7
Mean coverage (properly paired)
132.98
78.54
150.71
Median coverage (properly paired)
1.95
2.21
4.67
*Contig length for which half o f all bases in th e assem bled sequences are in a sequence equal or longer than this contig length.
**Reads o f each sample w ere m apped to th e assem bled transcriptom e of the pooled data (ALL).
doi:10.1371 /journal.pone.0042605.t001
size o f the contigs ra n g ed from 200 (m inim um contig length) up to
19,606 bp, w ith a m ean length o f 1,046 b p a n d totaling
68,799,644 b p for all contigs (Figure 1) a n d a m ea n length o f
869 b p totaling 34,249,556 b p for the unigenes. T h e top longest
(> 1 6 ,0 0 0 bp) assem bled sequences w ere inspected for correctness.
O verall these extrem ely long transcripts m atc h ed long gene
sequences presen t in N C B I’s n r database, indicating th a t these
sequences a re not the result o f chim erical assem bly errors due to
re p ea t regions in the genes. T h e longest transcript (19,606 bp) also
m atches the D. melanogaster d u m p y gene, a gigantic extracellular
p ro tein re q u ire d to m ain tain tension a t ep iderm al cuticle
a tta ch m en t sites [71].
B acterial a n d h u m a n transcriptom e c o n tam in atio n was negli­
gible. Fifty a n d fifty-seven unigenes w ere identified b y D econS eq
[53] as bacterial a n d h u m a n c o n ta m in a n t sequences, respectively.
Flow ever, these sequences w ere short in length (289 b p (SD = 148)
a n d 251 b p (SD = 60) for b acterial a n d h u m a n contam inants,
respectively) a n d m ost likely represent conserved p ro tein regions.
All sequencing reads w ere deposited into the S hort R e ad
A rchive (SRA) o f the N atio n al C e n te r for B iotechnology
In fo rm atio n (NCBI), a n d can be accessed u n d e r the accession
n u m b e r SR A 050429. T h e assem bled transcriptom e was subm itted
to the T ran scrip to m e S hotgun Assem bly S equence D atabase
(TSA) a n d can b e accessed th ro u g h the G enB ank accession
num bers JU 4 0 4 6 8 7 ^JU 4 7 0 4 5 2 .
PLoS ONE I w w w .p lo s o n e .o rg
Functional annotation
F ro m the assem bled unigenes, 12,987 (33.0%) show ed signifi­
can t sim ilarity (E v a lu e < le ) to proteins in N C B I’s nonre d u n d a n t (nr) database, w ith a n average best-hit am ino acid
identity o f 70.5% (SD = 14.2). As expected, the m ajority o f the
sequences h a d top hits to T. castaneum proteins (54.5%) (Figure 2),
the only C o leo p tera species for w hich a com plete genom e is
available. O th e r insects resem bling P. chalceus sequences are
divided across different insect orders, the m ost relevant being
H y m e n o p te ra (,Nasonia vitripennis (2.85%), Camponotus floridanus
(2.41%), Apis mellifera (2.15%), Harpagnathos saltator (1.86%)),
L ep id o p tera (Danaus plexippus (2.48%)), H e m ip te ra (Acyrthosiphon
pisum (2.24%)), a n d D ip tera (Aedes aegipty (1.88%)). T h e only nonA rth ro p o d a species w ith top blast hits w o rth m entio n in g is Hydra
magnipapillata (0.53% ).In total 7,589 (19.3%) P. chalceus unigenes
w ere assigned G ene O ntology (GO ) term s based on B L A ST
m atches to sequences w ith know n function. T h e functional
classification based o n biological process, m olecular function a n d
cellular c o m p o n en t is d epicted in Figure 3. A m ong th e biological
process term s, a significant percen tag e o f genes w ere assigned to
cellular (22.1%) a n d m etabolic (18.0%) processes. M olecular
functions w ere for a high percen tag e assigned to b in d in g (44.8%)
a n d catalytic activity (36.4%), w hereas m an y genes w ere assigned
to cell p a rt (48.2%) a n d organelle (27.5%) for the functional class
cellular com ponent. T hese observations are in acco rd an ce w ith
observations o f m etabolic processes in o th er transcriptom ic studies
o n insects [4 7 ,48,72,73,74].R edundancy is expected in the
A u g u s t 2 01 2 | V o lu m e 7 | Issue 8 | e 4 2 6 0 5
T ra n s c rip to m ic s o f a W in g P o ly m o rp h ic B eetle
6000
to
Œ)
C
o
o
4000
to
JD
E
2000
3.0
3.5
C o n tig length (Iog10(bp))
Figure 1. C ontig length d istrib ution o f T rin ity assem bly fo r P o g o n u s c h a lc e u s . All a s s e m b le d c o n tig s w e re in c lu d e d .
d o i:1 0 .1 3 7 1 /jo u rn a l.p o n e .0 0 4 2 6 0 5 .g 0 0 1
assem bled transcriptom e due to the stochastic process o f
sequencing a n d the heuristic n a tu re o f the assem bly process,
w hich can result in the fragm ented assem bly o f genes. T o assess
how m an y actual unique genes we have found in o u r data, we
aligned the o b tain e d unigenes to the 16,645 official genes re p o rte d
for T. castaneum. O f these Tribolium genes, 6,883 w ere covered b y P.
chalceus transcripts based o n the P R O m e r alignm ents [58], w ith a
m ea n p ercen t sim ilarity o f 76.2% (SD = 10.4). N ext, m in in g the
alignm ents shows th a t 7 64 o f these Tribolium gene hits have m ore
th a n one hit by uniq u e P. chalceus transcripts (com prising 1,837
unigenes). F or the transcripts w ith a P R O m e r alignm ent to a
Tribolium gene this corresponds to a m axim al red u n d an c y o f
15.6% ((l,837-764)/6,883). H ow ever, fu rth er investigating these
m ultiple hits show ed th a t m ost com prise genes th a t belong to the
sam e gene fam ily (i.e. paralogs). O nly 272 Tribolium genes are
m atc h ed by m ultiple n on-overlapping P. chalceus contigs (com pris­
ing 649 unigenes) a n d align to different portions o f the sam e gene.
T his reduces the re d u n d an c y to 5.5% ((649-272)/6,882>). H ence,
the contig sets th a t are different p ortions o f the sam e gene do
inflate the gene counts for P. chalceus to only a m in o r extent.
W e calculated the “ ortholog hit ra tio ” as described in O ’N eil et
al. 2010 [75] b y dividing the length o f the p u tative coding region o f
a unigene by the length o f the orth o lo g fo und for th a t unigene. F or
this, each unigene a n d its best B L A ST x hit w ere considered
orthologs a n d the hit region in the unigene is considered to be a
conservative estim ator o f the “p u tative coding region” . In this
way, the ortholog hit ratio gives a n estim ate o n the a m o u n t o f a
transcript th a t is rep resen ted b y each unigene. R atios greater th an
1.0 can indicate insertions in unigenes. Figure 4(A) shows th a t the
com pleteness o f the assem bled transcripts decreases for very long
genes. H ow ever, for genes w ith a length < 1 2 ,0 0 0 b p this
relationship disappears, w hich shows th a t the sequencing design
a n d T rin ity assem bler succeed well in assem bling b o th short a n d
long transcripts. T h e distribution o f orth o lo g hit ratios is
rep resen ted in Figure 4(B). O verall, unigenes w ith B L A ST x results
have high ratios, indicating high com pleteness o f these transcripts.
PLoS ONE I w w w .p lo s o n e .o rg
O f the 12,987 transcripts w ith B L A S T x results, 4,567 genes have a
ratio > 0 .9 a n d 8,300 have a ratio > 0 .5 .
A high percentage o f unigenes (31,804; 80.7% ) could n o t be
assigned a G O term . E xam ining the length a n d coverage
distribution o f these a n n o ta te d a n d u n a n n o ta te d uniq u e tra n ­
scripts shows th a t m ost reads (68.8%) are, how ever, m ap p e d to
a n n o ta te d transcripts. F u rth e rm o re , a m ajo r p o rtio n o f the
u n a n n o ta te d transcripts consist o f assem bled transcripts w ith very
low coverage values a n d short length (Figure 5). F o r instance,
23,497 (59.6% o f all unigenes) o f these u n a n n o ta te d transcripts
have a length shorter th a n 500 b p a n d only 3.1% o f all reads m ap
to these transcripts. T hese short low coverage transcripts m ay
represent chim eric sequences resulting from assem bly errors,
fragm ented transcripts corresponding to lowly expressed genes, as
well as u n tran sla ted regions. T h e rem ain in g 8,427 u n a n n o ta te d
sequences are m o re likely to represent tru e gene sequences, w hich
m ay represent novel genes o r less conserved genes for w hich no
a n n o ta tio n is found. 15,765 (40.0%) o f the unigenes h a d a n O R F
(open read in g fram e) > 2 0 0 bp, w ith a n average length o f 1,040 b p
a n d a m edian length o f 659 bp. 7,203 (45.7%) o f these unique
sequences w ith O R F s w ere assigned G O annotations. T h e
rem ain in g sequences w ith a n O R F > 2 0 0 b p th a t lack a n n o ta tio n
results m ight represent tru e gene sequences. F ro m the d a p h n ia
genom e sequence it was discovered th a t significant genom ic
regions w ithout assigned open read in g fram es are actively
tran scrib ed [76], T h e functional significance o f these regions
rem ains to b e elucidated, b u t such transcripts m ay also be present
in the Pogonus transcriptom e, w hich c an n o t b e functionally
analyzed. F u rth erm o re, high n u m b ers o f u n a n n o ta te d contigs
a re frequently found in o th er transcriptom e sequencing projects
[72,73,74,77] a n d m ay give som e indication o f the lim itation o f
inferring the relevant functions o f transcripts assem bled from
sequence d a ta from species w ith very lim ited genom ic resources or
w ith long evolutionary distances to m odel species. O n the o th er
han d , T rin ity succeeds in assem bling a reasonable set o f a n n o ta te d
genes despite low coverage values (Figure 5).
A u g u s t 2 01 2 | V o lu m e 7 | Issue 8 | e 4 2 6 0 5
T ra n s c rip to m ic s o f a W in g P o ly m o rp h ic B eetle
■ Tribolium castaneum
15
m Nasonia vitripennis
mDanausplexippus
0.36%
■ Camponotus floridanus
m Pediculus humanus
0 52%
0 53%
m Acyrthosiphon pisum
0 .8 4 % ^ 1
mApismellifera
1 33%
m Aedes aegypti
m Harpegnathos saltator
m Solenopsis invicta
m Bombus impatiens
■ Bombus terrestiis
mAcromyrmex echinatior
1 .8 6 %
m Anopheles gambiae
■ Culex quinquefasciatus
2 15 *?
m Bombyx mon
■ Drosophila melanogaster
2 24%
■ Hydra magnipapillata
2 30%
Daphnia pulex
Anopheles dariingi
Otheis
Figure 2. Species d istrib ution o f top BLASTx results. T h e p ie c h a rt s h o w s th e s p e c ie s d is trib u tio n o f u n ig e n e s t o p BLASTx re s u lts a g a in s t th e nr
p ro te in d a ta b a s e w ith a c u to ff E v a lu e < 1 e ~ 3.
d o i:1 0 .1 3 7 1 /jo u rn a l.p o n e .0 0 4 2 6 0 5 .g 0 0 2
60,00%
50,00%
40,00%
30,00%
20 ,00 %
0,
1
■1
10 ,00 %
n „ . l n
li .
, 1
n
- ■ 1 I - 1
\e -
B iological Process
M olecular Function
C ellula r C om po n en t
Figure 3. Gene O n to lo g y (GO) categories o f th e unigenes. D istrib u tio n o f t h e GO c a te g o r ie s a s s ig n e d to th e P ogonus chalceus tra n s c rip to m e .
U n iq u e tra n s c rip ts (u n ig e n e s) w e re a n n o ta te d in th r e e c a te g o rie s : cellu lar c o m p o n e n ts , m o le c u la r fu n c tio n s , b io lo g ic a l p ro c e ss.
d o i:1 0 .1 3 7 1 /jo u rn a l.p o n e .0 0 4 2 6 0 5 .g 0 0 3
8 . PLoS ONE I w w w .p lo s o n e .o rg
A u g u s t 2 01 2 | V o lu m e 7 | Issue 8 | e 4 2 6 0 5
T ra n s c rip to m ic s o f a W in g P o ly m o rp h ic B eetle
1200
1000
O
!_
800
4—'
C
=3
O
O
cn
o
°
_c=
0 .6
-
O
600
400
0 .4 -
0 .2
200
-
0.0
2.0
2 .5
3.0
3 .5
4 .0
0 .0
0 .2
O rtholog length (log10(Am ino acids))
0 .4
0 .6
0 .8
1.0
1 .2
1 .4
Ortholog hit ratio
Figure 4. Relationship b etw een o rtho log hit ratio and o rtho log length (A) and distrib ution o f o rtho log h it ratios (B). O rth o lo g hit
ra tio s w e re c a lc u la te d fo r c o n tig s w ith BLASTx re su lts. A ra tio o f 1.0 in d ic a te s th e g e n e is likely fully a ss e m b le d .
d o i:1 0 .1 3 7 1 /jo u rn a l.p o n e .0 0 4 2 6 0 5 .g 0 0 4
Genes o f interest
As we are interested in the adaptive divergence o f w ing length in
populations o f P. chalceus, we beg an o u r investigation b y searching
the assem bled transcriptom e for orthologous genes know n to be
involved in w ing developm ent in the fruit fly Drosophila melanogaster.
In particular, we used a previously gen erated list o f the w ing
developm ent genes re p o rte d in the genom e o f the re d flour beetle
Tribolium castaneum (T able S13b o f R ichards et al. 2008 [34]), w hich
was based o n Drosophila w ing developm ent studies. W e found
A nnotated
2.5
3.0
3.5
orthologous genes for every w ing developm ent gene th a t we
looked for in the assem bled P. chalceus transcriptom e w ith high
confidence (T able 2). E ngrailed (en) a n d in v erted (inv) blasted to
the sam e P. chalceus tran scrip t a n d reciprocal blast o f this
co m p o n en t re tu rn e d engrailed. T his is not surprising considering
their sim ilarity in sequences a n d function [78], R etrieving
orthologous genes for the apterous (ap) gene was problem atic as
this gene exhibits a duplication in T. castaneum a n d Acyrthosiphon
pisum [61,79]. T h erefo re, w e aligned the am ino acid sequences o f
U nannotated
4.0
2.5
3.0
3.5
4.0
C ontig length (Iog 10 (bp ))
Figure 5. C ontour p lot o f length and coverage distrib ution o f a nn otated (left) and u nann otated (right) unigenes. T ran sc rip ts w e re
a n n o ta te d u sin g Blast2G O . R eads w e re m a p p e d u sin g BWA. For th e a n n o ta te d tra n s c rip ts , m e a n le n g th a n d c o v e r a g e w a s 2,1 3 9 a n d 932,
re sp ec tiv e ly . For th e u n a n n o ta te d tra n s c rip ts , m e a n le n g th a n d c o v e r a g e w a s 5 6 7 a n d 224, re sp ec tiv e ly . T h e c o lo r b a r s h o w s th e Io g 1 0 tra n s fo rm e d
c o u n t v a lu e s.
d o i:1 0 .1 3 7 1 /jo u rn a l.p o n e .0 0 4 2 6 0 5 .g 0 0 5
PLoS ONE I w w w .p lo s o n e .o rg
A u g u s t 2 01 2 | V o lu m e 7 | Issue 8 | e 4 2 6 0 5
T ra n s c rip to m ic s o f a W in g P o ly m o rp h ic B eetle
apterous genes from D. melanogaster (N P_724428), T. castaneum (apA:
N P _ 0 0 1 139341, apB: A C N 43342), Apis mellifera (X P_392622) a n d
A. pisum (apA: X P _001946004, apB: X P _001949543) w ith those
retrieved from B L A ST hits to the P. chalceus transcriptom e
(Figure 6). T h e apterous gene is a hox transcription factor a n d
contains tw o conserved dom ains; the hom eo d o m ain a n d the L IM con tain in g region [80]. As we did n o t retrieve the hom eo dom ain
for apB o f P. chalceus, we only c o m p a red the conserved L IM
d o m ain region o f the apterous genes as rep o rte d in [61]. T o ro o t the
tree, we a d d ed the closely related L IM -co n tain in g gene tailup (tup)
o f A. pisum (X P_001944557) a n d T. castaneum (X P_001815525).
T h e phylogenetic inference indicates th a t P. chalceus exhibits b o th
apterous paralogs th a t are presen t in T. castaneum a n d A. pisum
genom e, w hich w ere lost in the holom etabolous insects Drosophila
a n d Apis. T h e relationships are sim ilar as the ones re p o rte d by
[61]Subsequently, w e p erfo rm e d sim ilar sim ilarity analyses for genes
involved in th e Ju v e n ile h o rm o n e a n d ecdysteroid p athw ay. W e
fo u n d orthologous candidates w ith h igh certain ty for each gene
re p o rte d in th e K E G G insect h o rm o n e biosynthesis p a th w ay
(T able 3). T h e length o f the O R F o f th e P. chalceus m atch,
c o m p a red to th e O R F length in T. castaneum is also reported.
Finally w e identified th e full coding sequence o f th e isocitrate
dehydrogenase 2 (ID H 2) gene (P c_ co m p l5 6 0 _ c0 _ seq l) b a se d on
hom ology to th e T. castaneum p ro te in sequence (EFA 04299; Evalue = 0, b it score = 760). T h e b last result also identified the
isocitrate dehydrogenase 1 (ID H 1) gene (Pc_com p296_c0_seql),
b u t w ith less su p p o rt (E-value = e-172, b it score = 602).
T a b le 2 . L ist o f w i n g d e v e l o p m e n t g e n e s f o u n d in P. c h a lc e u s o r t h o l o g o u s t o T. c a s ta n e u m .
Function
Anterior/Posterior
Accession P. chalceus
G ene
Engrailed
(en)
Pc_comp5821_c0_seq 1
Invected
(inv)
Pc_comp5821_cO_seq 1
H edgehog
(hh)
Pc_co m p8905_c0_seq 1
Cubitus interruptus
Dorsal/Ventral
Pc_co m p4 719_c0_seq 1
Bodywall/wing
Hox
O rth o lo g hit
ratio
1.27
56
1.31
0.96
60
Patched
(ptc)
Pc_comp7372_c1_seq1
Decapentaplegic
(dpp)
Pc_comp8429_c0_seq2
Daughters against
(dad)
Pc_comp5722_c0_seq1
1.08
Brinker
(brk)
Pc_co m p8966_c0_seq 1
0.29
Optomotor-blind-like
(omb)
Pc_comp6103_c0_seq1
0.68
Spalt-like protein
(sal)
Pc_co m p7794_c0_seq 1
0.87
Apterous a
(ap A)
Pc_comp9155_c1_seq1
0.76
Apterous b
(ap B)
Pc_co m p 10531 _c0_seq 1
0.69
Pc_co m p3149_c0_seq 1
1.02
Notch
Vein and sensory
A m in o acid
id e n tity (% )
0.62
64
0.85
Serrate
(Ser)
Pc_co m p6451_c0_seq 1
80
1.00
Wingless
(wg)
Pc_co m p9580_c0_seq 1
96
0.74
Distal-less
(Dll)
Pc_co m p7089_c0_seq 1
Serum response factor
(srf)
Pc_co m p3744_c0_seq2
96
0.36
1.08
Rhomboid
(rho)
Pc_comp9713_c0_seq1
96
0.72
Knirps
(kni)
Pc_comp8029_c0_seq2
Knot transcription factor
(knot)
Pc_comp 14479_c0_seq 1
liroquois
Oro)
Pc_comp4855_c0_seq2
Abrupt
(ab)
Pc_comp3738_c0_seq3
1.00
Noradrenaline transporter
(net)
Pc_comp9252_c0_seq1
0.94
Delta
(Dl)
Pc_co m p8811 _c0_seq 1
Extra m acrochaetae
(emc)
Pc_co m p778_c0_seq 1
86
1.04
A chaete-scute
(ASH)
Pc_co m p5966_c0_seq 1
67
1.09
Asense
(ase)
Pc_co m p 12489_c0_seq 1
54
1.07
Teashirt
(tsh)
Pc_co m p7294_c0_seq 1
69
Homothorax
(hth)
Pc_comp2739_c0_seq1
87
Nubbin
(nub)
Pc_co m p7766_c0_seq 1
0.36
Ventral vein lacking
(w l)
Pc_co m p4049_c0_seq 1
1.05
Vestigial
(vg)
Pc_co m p7899_c0_seq 1
Sex com bs reduced Ser
(Cx)
Pc_co m p5657_c0_seq 1
Prothoraxless
(ptl)
Ultrabithorax
(Ubx)
0.83
84
0.61
1.04
0.95
1.04
69
0.74
Pc_comp8727_c0_seq1
100
0.31
Pc_co m p6090_c0_seq 1
84
0.97
1.07
doi:10.1371 /journal.pone.0042605.t002
PLoS ONE I w w w .p lo s o n e .o rg
August 2012 | Volume 7 | Issue 8 | e42605
T ra n s c rip to m ic s o f a W in g P o ly m o rp h ic B eetle
10
1
I
20
30
40
Tc_apB
Pc_a pB
ApapB
Dm a p
Ap_apA
Am_ap
TcapA
Pc_apA
CAG CGG R IQ DRY YL LAVDRQWHASCLKC CECKL PLDTELT
CAG CGG R IH DRYYL LAVDRQWHAPCLKC CECKVPLDTELT
CAGCGRSIDDEFYLSAVDMCWHTGCLQCAECKLPLDTET.V
C SG CG RQ IO DRF YL s AVEKRWH ASC LQ C YAC RQ RLERES S
CCGCGLKILDRYYLFAVDKRWHASCLQCSQCTRTLASEIK
CAGCG T.RI SDRF YLQ AVDRRWHAACLQ CS HCRQG LDGEVT
CAGCGMRITDRFYLQAVDRRWHASCLQCCQCRNTLDGEIT
CAGCGMRIADRFYLQAVDRRWHAACLQCCQCRNTLDGEIT
50i
60 ___ ___
70
80
___
Tc_apB
Pc_apB
Ap_apB
Dm_ap
Ap_apA
Am_ap
TcapA
Pc_apA
CF ARDGNIYCKEDY YRMFA-VTRCGROQ AG I SANELVMRA
CFARDGNIYCKSDYYRMFA-VTROGROQAG ISANELVMRA
CYSRHGNIYCKQDYYRLFS-IKRCAROQGGIGACDLVMRA
C YSR DG NIY CK ND Y YSFFG -TR RC SRC LA SISSNELV M R A
CF YRDGNIYCKADYQRLYG-IRRCGRCIIAGISPSELVM R A
CF SRCG N I YCKKDY'YRMFG SMKRCARCH A A I LAS ELVMRA
c f s r c g n iy c k k d y y r l f g - m k r c a r o q a t iis s e l v m r a
CFSRDGNIYCKKDYYRLFG-MKRCGROQAAILSSELVMRA
90
100
110
120
Tc_apB
Pc_apB
Ap_apB
Dm_ap
Ap_apA
Am ap
TcapA
Pc^apA
RDSVYHLHCFSCTSCGMPLSKGDHFGMRDGLIYCRPHYEL
RDLVYHLHCF TCASCNVPLSKG DHFGMREG LVYCRPHYEI
KDLVYHVECFACYAOGAVLCKGDYYGVRDGAVFCRPDYER
RNLVFH VN C FC CTV C HTPLTK GD 2YG IID ALIY CR TH YSI
99
— Tc_apA
99
■Pc_apA
Am_ap
79
Dm_ap
42
58
- Ap_apA
-T c_ a p B
100
Pc_apB
Ap_apB
- Ap_tup
100
-Tc_tup
rdtvfh vpcfsctvclavltk g dq fg m rdg avfcq h h yq q
r d l v fh v r c fsc a a c a v pltk g d h fg m r d g a v lc r lh y em
RDLVFHVllCFSCAVCNSPLTKGDUFGMRDGAVLCRLILFEM
0.1
r elv fh vh cfsc avc ncpltk g d h fg m r dg avlc rlh fem
Figure 6. Phylogenetic analysis o f the U M d om ain of th e a p t e r o u s
gene.
(A) A lig n m e n t o f p ro te in s e q u e n c e s o f t h e UM d o m a in re g io n o f th e
a p te ro u s (ap) o rth o lo g s a n d p a ra lo g s o f T rib o liu m ca sta n e u m (Tc), A c irth o s y p h o n p is u m (Ap), D ro s o p h ila m e la n o g a s te r (Dm), A p is m e llife ra (Am) w ith
th e p re s u m e d p a ra lo g s f o u n d in th e P og on u s chalceus (P c_apA a n d Pc_apB) tra n s c r ip to m e . (B) N e ig b o u r-jo in in g tr e e o f a p p ro te in s e q u e n c e s , ro o te d
w ith ta ilu p (tu p ). B o o tstra p s u p p o r t v a lu e s a re g iv e n a t e a c h n o d e .
d o i:1 0 .1 3 7 1 /jo u rn a l.p o n e .0 0 4 2 6 0 5 .g 0 0 6
M apping
R ead s for each sam ple (i.e. larva, p u p a , adult) w ere m ap p e d
b ack to the assem bled reference transcriptom e based o n the
pooled d a ta a n d pro p erly p a ire d reads w ere extracted (T able 1;
Figure 7). Based on the BW A m appings [65], 92.6% , 90.4% a n d
93.1% o f the m ap p e d reads w ere aligned pro p erly p a ire d w hen
aligning the reads o f the larva, p u p a a n d a d u lt sam ple,
respectively, to the assem bled reference transcriptom e. T h e m ean
coverage d e p th (reads covering each base pair) for the larva, p u p a
a n d a d u lt sam ple is respectively 93.7, 55.2 a n d 111.6. T h e Bowtie
aligner resulted in a h igher m ea n coverage, ow ing to reads being
m ap p e d to m ultiple positions. T h e p u p a sam ple has less m ean
coverage d e p th resulting from less sequenced reads.
Som e transcripts w ere rep resen ted by m an y reads. M oreover,
50% o f the reads m ap p e d to only 146 tran scrip t sequences a n d
90% m ap p e d to 2,971 transcripts. M a p p in g o f the reads shows
th a t re a d coverage is very high. H ow ever, the fact th a t only 149
transcripts consum e 50% o f all reads m ay indicate th a t n o rm al­
ization can b e useful for transcriptom e assem bling. T h e top tw enty
o f these w ere investigated a n d are show n in T ab le 4. A m ongst
these transcripts, several are associated w ith energy m etabolism
(cytochrom e c oxidase subunit II a n d III, succinate a n d N A D H
dehydrogenase a n d A D P /A T P translocase), locom otion (actin a n d
T a b le 3 . List o f Insect horm one biosynthesis genes.
NCBI ge n e lD T.
castaneum
Accession P. chalceus
A m ino acid O rth o lo g hit
id e n tity (% ) ratio
(JHE)
658208
Pc_comp7235_c0_seq 1
62
0.97
(JHAMT)
662961
Pc_comp8820_c0_seq 1
65
1.01
juvenile horm one epoxide
hydrolase
(JHEH)
659305
Pc_comp841 _c0_seq 1
74
0.98
Function
G ene
Juvenile horm one
juvenile-horm one esterase
juvenile horm one acid
m ethyltransferase
cytochrom e P450, family 15
(CYP15A1)
658858
Pc_comp2578_c2_seq2
77
0.95
Molting horm one
ecdysteroid 25-hydroxylase
(PHM)
656884
Pc_comp6141_c0_seq 1
72
0.98
(ecdysone)
ecdysteroid 22-hydroxylase
(DIB)
663098
Pc_comp7215_c0_seq2
73
0.70
ecdysteroid 2-hydroxylase
(SAD)
658665
Pc_comp5946_c0_seq 1
64
0.75
ecdysone 20-monooxygenase (SHD)
661451
Pc_comp8625_c0_seq2
73
0.69
cytochrom e P450, family 307
(Spo/spok)
658081
Pc_comp9046_c0_seq 1
79
0.93
cytochrom e P450, family 18
(CYP18A1)
656794
Pc_comp3811_c0_seq1
86
0.52
Note: Genes w ere extracted from T. castaneum through th e KEGG pathw ay database.
doi:10.1371 /journal.pone.0042605.t003
PLoS ONE I w w w .p lo s o n e .o rg
A u g u s t 2 01 2 | V o lu m e 7 | Issue 8 | e 4 2 6 0 5
T ra n s c rip to m ic s o f a W in g P o ly m o rp h ic B eetle
Larva
Pupa
Is: 42,711 (64,8% )
U: 25,712 (65,0% )
Is: 41,292 (6 2 ,6 '
U: 24,653 (62,3 “
«\=\
&
/b' <&■
'A'
Is: 2,894 (4,4 %}
U: 1,504(3,8% )
Is: 30,200 (45.8 %)
U: 18,462(56,7% )
Is; 6,171 (9,4% )
U: 3,867(9,8% )
C o m p a ris o n o f th e sam p le s
R e ad s w ere m ap p e d w ith Bowtie [66] a n d assigned to genes a n d
isoform s w ith the R S E M softw are [68]. S h a re d a n d uniq u e
presence o f genes a n d isoform s is show n in F igure 6. 30,200
(45.8% ) a n d 18,462 (56.7%) o f th e isoform s a n d unigenes
respectively w ere sh ared a m o n g life stages. 1,879 (4.8%), 1,403
(3.5%) a n d 7,086 (17.9%) o f th e unigenes a re uniquely expressed
in the larva, p u p a a n d a d u lt stage, respectively. O f these uniquely
expressed unigenes, only 170, 106, a n d 243 respectively w ere
assigned G O term s (Figure 8). O verall, th e G O te rm com position
o f these uniquely expressed transcripts in eac h life stage
corresponds well to the G O te rm com position o f th e com plete
transcriptom e. N o statistical differences in G O te rm com position
w ere fo und b etw een these sets o f uniquely expressed genes
(F D R < 0 .1 ). T h e h ig h er a m o u n t o f uniquely expressed genes in
th e a d u lt stage m o st likely resulted from m o re sh o rt transcripts
b e in g assem bled.
O '* .
Is: 5,795(8,8% )
U: 3.284 (8,3 %)
Is: 10,384(15,7% )
U: 7 ,086(17.9% )
Is: 52,550(79,7% )
U: 32,699(82,7% )
Figure 7. U nique and shared transcript presence o f the three
d evelopm ental stages. T he v e n n d ia g ra m s h o w s th e u n iq u e a n d
V a ria n t c a llin g
F o r S N P calling, BW A was used to m a p th e reads o f each
sam ple to th e reference transcriptom e. In total, SA M tools [67]
detected 38,141 different heterozygous S N P position in uniq u e
tran scrip t sequences using th e stringent p a ram ete rs (i.e. coverage
a n d m ap p in g quality o f 25) (Figure 9). T h is is a b o u t one S N P p e r
nine h u n d re d b p o f u n iq u e tran scrip t sequence (1/898). O f these
SN Ps, 26,823 (70.3%) w ere fo und in a p re d ic te d o p e n read in g
fram e (O RF) > 2 0 0 b p a n d 6,998 (18.3%) resulted in a a m in o acid
ch an g e (nonsynonym ous S N P (nsSNP)) a n d a re fo und in 2,907
different unigenes. T h is results in a p e rce n ta g e o f nonsynonym ous
changes in th e co d in g region o f 26.1% , w hich is low er c o m p a red
s h a r e d tra n s c r ip t p re s e n c e o f th e th re e d e v e lo p m e n ta l s ta g e s (larva,
p u p a a n d a d u lt), b a s e d o n RSEM c o u n ts . R ead s w e re a ss ig n e d to
iso fo rm s (Is) o r u n ig e n e s (U). W h en RSEM r e p o r te d a c o u n t o f a t le a st
o n e , th e tra n s c r ip t w a s r e p o r te d a s p re s e n t.
d o i:1 0 .1 3 7 1 /jo u rn a l.p o n e .0 0 4 2 6 0 5 .g 0 0 7
m yosin lig h t chain), tran scrip tio n (D N A topoisom erase 1) a n d
translation (elongation factor 1 a n d 2). F erritin is a p ro tein th a t
stores a n d buffers iro n [81] a n d its h igh a b u n d an c e m ay resem ble
a n a cc o m m o d a tio n to h igh re d u ce d iro n concentrations a n d high
oxidative stress in salt m arshes [82,83] o r a stress response.
T a b le 4 . T o p t w e n t y t r a n s c r i p t s w ith m o s t r e a d s a s s i g n e d .
Accession P. chalceus
Nr. reads
Length (bp)
Pc_comp0_c1 _seq1
21905861
1,272
Pc_comp5_c0_seq 1
4116337
Pc_com p 18_c0_seq 1
3016196
3,942
Melanization -related protein
Pc_comp23_c1 _seq1
2836940
3,453
Unknown
Pc_com p 7_c0_seq 1
2585095
1,672
Myosin light chain 23
Pc_comp32_c0_seq 1
1912972
3,409
NADH dehydrogenase subunit 43
Pc_com p4_c3_seq 1
1842608
651
Unknown
Pc_comp30_c0_seq 1
1823110
1,598
Pc_com p 4 1_c0_seq 1
1788846
1,961
Elongation factor 1-alpha3
Pc_comp 1_c0_seq3
1511917
1,714
Actin3
Pc_comp39_c0_seq 1
1501260
2,011
Unknown
Pc_com p 14_c0_seq 1
1505364
6,711
DNA topoisom erase 1
Pc_com p 16_c0_seq 1
1501260
2,186
Muscular protein 20
Pc_comp58_c0_seq 1
1419825
1,732
ADP/ATP trans locase3
Pc_com p 13_c0_seq 1
1346169
759
Unknown
Pc_com p 10_c4_seq 1
1217481
1,679
Cytochrome c Oxidase subunit III (coxlll)-
Pc_comp26_c0_seq 1
1178489
3,236
Elongation factor 23
Pc_comp2_c0_seq 1
1128159
634
Unknown
Pc_com p 19_c 1_seq 1
1124751
821
Cytochrome c Oxidase subunit II (coxii)3
A n n o tatio n
Unknown
Succinate dehydrogenase3
Alpha-tubulin
*Associated with mitochondria, energy metabolism and electron transport chain.
**Associated with muscles and m ovement.
***Associated with translation or transcription.
doi:10.13717journal.pone.0042605.t004
p.
PLoS ONE I w w w .p lo s o n e .o rg
10
A u g u s t 2 0 1 2 | V o lu m e 7 | Issue 8 | e 4 2 6 0 5
T ra n s c rip to m ic s o f a W in g P o ly m o rp h ic B eetle
120
■ L arva
nPupa
■ A d u lt
I
y .^ u l L i u J l J lu. , ..d
Biological Process
I
I 1 . üU ul
,
Molecular Function
E
d
E
Cellular Component
Figure 8. Gene O n to lo g y (GO) d istrib ution assigned to unigenes th a t are fou nd u niqu ely in each life stage. R eads w e re m a p p e d w ith
B ow tie a n d a s s ig n e d to g e n e s a n d iso fo rm s w ith th e RSEM so ftw a re .
d o i:1 0 .1 3 7 1 /jo u rn a l.p o n e .0 0 4 2 6 0 5 .g 0 0 8
Conclusion
to studies re p o rtin g u p to 57.3% nsSN Ps in coding regions in a
single individual o f Ja p a n e se native cattle [84] a n d 41 to 47% in
h u m a n individual resequencing studies [85,86], b u t com p arab le to
ratios found in o th er studies [87,88],
In the presen t study, we sequenced a n d characterized the
transcriptom e in the w ing polym orphic beetle P. chalceus. T h e
assem bled sequence d a ta com prising 39,393 unique transcripts
provides valuable resources to study w ing polym orphism a n d the
Larva
Pupa
S N P : 1 6 ,0 2 7 (42.0
( 4 2 .0 % )
O R F: 11.511 (30.2%
(3 0 .2 % )
n sS N P : 2 .8
,8 1 0
(7.4%
7.4% )
\
S N P : 16,578 (43.5% )
O R F: 1 2 .042
.0 4 2 (3 1 .6 % )
n s S N P : 2 .8 4 2 (7.5% )
SN P : 2 ,9 8 5 (7.8% )
O R F : 2 .1 9 9 (5.6% )
n sS N P :
4 6 5 (1 .2 % )
S N P : 1,9 3 7 (5.1% )
O R F: 1 ,4 1 3 (3 .7 % )
n s S N P : 3 0 7 (0.8% )
S N P : 2 ,0 1 4 (5 .3 % )
O R F: 1,4 5 9 (3.8% )
n sS N P :
3 5 6 (0.9% )
S N P : 4,2 7 9 (11.2% )
O R F: 3,171 (8.3% )
nsS N P :
7 19 (1.9% )
SN P : 1 0 ,4 5 8 (27.4% )
O R F : 6 ,8 8 2 (1 8 .0 % )
n s S N P : 2,1 1 8 (5.6% )
Adult
S N P : 1 8 ,6 8 8 (4 9 .0 %)
O R F: 1 2 .9 2 5 (33.9% )
n s S N P : 3 .5 0 0
(9.2% )
Figure 9. Shared and uniqu e SNPs. O nly H e te ro z y g o u s SNPs a re c o n s id e re d fro m u n ig e n e s . T h e to ta l a m o u n t o f h e te ro z y g o u s SNPs called in th e
th r e e s a m p le s is 3 8 ,141. 70.3% (26,823) o f th e s e SNPs w e re fo u n d in a n o p e n re a d in g fra m e (ORF) a n d 18.3% (6,998) re s u lte d in a n a m in o acid c h a n g e
(nsSNP).
d o i:1 0 .1 3 7 1 /jo u rn a l.p o n e .0 0 4 2 6 0 5 .g 0 0 9
PLoS ONE I w w w .p lo s o n e .o rg
11
A u g u s t 2 01 2 | V o lu m e 7 | Issue 8 | e 4 2 6 0 5
T ra n s c rip to m ic s o f a W in g P o ly m o rp h ic B eetle
Acknowledgm ents
adaptive divergence in the face o f strong gene flow found in P.
chalceus. W e c h aracterized a large set o f genes relevant to w ing
developm ent a n d dispersal polym orphism w ith high significance,
including paralogs, giving a n in dication o f the integrity a n d
com pleteness o f the assem bled P. chalceus transcriptom e resulting
from short re ad Illum ina sequencing.
W e found a high n u m b e r o f putative SN Ps (37,492). T h e
com bination o f SN P calling w ith O R F pred ictio n allow ed us to
infer th a t a large p a r t o f the SN Ps located in a coding fragm ent
(26,757) result in nonsynonym ous nucleotide substitutions (23.2%).
T h e results show th a t it is possible to com bine transcriptom e
assem bly a n d characterizatio n w ith the discovery o f b o th
synonym ous a n d nonsynonym ous SNPs, p roviding a fram ew ork
for fu rth er po p u latio n genom ic studies to indentify the m olecular
basis underlying p henotypic v a riation o f ecologically relevant traits
in a n on-m odel species.
W e thank Janine M arien, affiliated to the Vrije Universiteit Am sterdam ,
for her help in preparation of the samples. This work was carried out using
the Stevin Supercom puter Infrastructure a t G hent University, funded by
G hent University, the Hercules Foundation and the Flemish G overnm ent
—departm ent EW I and we are grateful to the IC T D epartm ent of G hent
U niversity for assistance w ith ou r com putations. S equencing was
perform ed by the Genomics Core o f the University Hospital o f Leuven,
Belgium.
Author Contributions
Conceived and designed the experiments: SVB F H D R J V H . Perform ed
the experiments: SVB F H JV H . Analyzed the data: SVB. C ontributed
reagents/m aterials/analysis tools: SVB F H D R JV H . W rote the paper:
SVB FH.
References
1. R o f f D A (1986) T h e E v o lu tio n o f W in g D im o rp h is m in In sects. E v o lu tio n 40:
1 0 0 9 -1 0 2 0 .
2. R o f f D A , F a irb a irn D J (2007) T h e e v o lu tio n a n d g e n e tic s o f m ig ra tio n in insects.
B io scien ce 57: 1 5 5 -1 6 4 .
3.
4.
5.
6.
H e n d ric k x F , M a e lfa it J - P , D e s e n d e r K , A v iro n S, B ailey D , e t al. (2009)
P e rv asiv e effects o f d isp e rsal lim ita tio n o n w ith in - a n d a m o n g -c o m m u n ity sp ecies
ric h n e ss in a g ric u ltu ra l lan d sc a p e s. G lo b a l E c o lo g y a n d B io g e o g ra p h y 18: 6 0 7 —
616.
K o k k o H , L o p e z -S e p u lc re A (2006) F r o m in d iv id u a l d isp e rsa l to sp ecies ran g es:
P e rsp e c tiv e s fo r a c h a n g in g w o rld . S c ien c e 313: 78 9 —791.
D e n n o R E , R o d e ric k G K , P e te rs o n M A , H u b e r ty A F , D o b e l H G , e t al. (1996)
H a b ita t p e rs is te n c e u n d e rlie s in tra sp e c ific v a ria tio n in th e d isp e rsal stra te g ie s o f
25.
G ib s o n G , D w o rk in I (2004) U n c o v e rin g c ry p tic g e n e tic v a ria tio n . N a tu re
26.
R e v ie w s G e n e tic s 5: 6 8 1 -U 6 1 1 .
S tev en s V M , T r o c h e t A , V a n D y ck H , C l o b e r t J , B a g u e tte M (2011) H o w is
d isp e rsa l in te g r a te d in life h isto ries: a q u a n tita tiv e an aly sis u sin g bu tte rflie s.
27.
E c o lo g y L e tte rs 15: 7 4 -8 6 .
D e s e n d e r K (1985) W in g p o ly m o rp h is m a n d re p ro d u c tiv e b io lo g y in th e
h a lo b io n t c a ra b id b e e d e Pogonus chalceus (M a rsh a m ) (C o le o p te ra , C a ra b id a e ).
28.
B i o l j b D o d o n a e a 53: 8 9 —100.
D e s e n d e r K (1987) H e rita b ility E s tim a te s fo r D iffe re n t M o rp h o lo g ic a l T r a its
R e la te d to W in g D e v e lo p m e n t a n d B o d y Size in th e H a lo b io n t a n d W in g
P o ly m o rp h ic C a r a b id B e e d e P o g o n u s -C h a lc e u s M a rs h a m (C o le o p te ra , C a r a b i­
d ae). A c ta P h y to p a th o lo g ic a E t E n to m o ló g ic a H u n g a r ic a 22: 8 5 —101.
p la n th o p p e rs . E c o lo g ic a l M o n o g ra p h s 66: 3 8 9 —f 0 8 .
D h u y v e tte r H , G a u b lo m m e E , D e s e n d e r K (2004) G e n e tic d iffe re n tia tio n a n d
lo c a l a d a p ta tio n in th e s a lt-m a rs h b e e d e P o g o n u s c h alceu s: a c o m p a ris o n
b e tw e e n a llo z y m e a n d m ic ro sa te llite lo ci. M o le c u la r E c o lo g y 13: 1 0 6 5 —1074.
29.
7. V a n D y c k H , M a tth y s e n E (1999) H a b ita t f ra g m e n ta tio n a n d in se c t flight: a
c h a n g in g ‘d e s ig n ’ in a c h a n g in g la n d sc a p e ? T r e n d s in E c o lo g y & E v o lu tio n 14:
1 7 2 -1 7 4 .
R o n c e O (2007) H o w d o e s it feei to b e like a ro llin g sto n e? T e n q u e stio n s a b o u t
d isp e rsa l e v o lu tio n . A n n u a l R e v ie w o f E c o lo g y E v o lu tio n a n d S y stem a tic s 38:
2 3 1 -2 5 3 .
9. d e n B o e r P J (1968) S p re a d in g o f risk a n d th e s ta b iliz a tio n o f a n im a l n u m b e rs .
A c ta B io th e o re tic a 18: 1 6 5 -1 9 4 .
30.
E v o lu tio n 61: 184—193.
D e s e n d e r K , B a c k e lja u T , D e la h a y e K , D e M e e s te r L (1998) A ge a n d size o f
E u r o p e a n s a ltm a rsh e s a n d th e p o p u la tio n g e n e tic c o n s e q u e n c e s fo r g ro u n d
31.
b e e d es. O e c o lo g ia 114: 5 0 3 -5 1 3 .
A b o u h e if E , W ra y G A (2002) E v o lu tio n o f th e g e n e n e tw o rk u n d e rly in g w in g
p o ly p h e n is m in a n ts. S c ien c e 297: 2 49—252.
8.
10.
R o f f D A (1994) H a b ita t P e rsiste n c e a n d th e E v o lu tio n o f W in g D im o rp h is m in
In sects. A m e ric a n N a tu ra lis t 144: 7 7 2 -7 9 8 .
11.
M c P e e k M A , H o lt R D (1992) T h e E v o lu d o n o f D is p e rs a l in S p a tia lly a n d
T e m p o ra lly V a r y in g E n v iro n m e n ts . A m e ric a n N a tu ra lis t 140: 1010—1027.
12.
H o lt R D , M c P e e k M A (1996) C h a o tic p o p u la tio n d y n a m ic s fav o rs th e e v o lu tio n
o f d isp ersal. A m e ric a n N a tu ra lis t 148: 7 0 9 —718.
M a th ia s A , K isd i E , O liv ie ri I (2001) D iv e rg e n t e v o lu tio n o f d isp e rsa l in a
13.
14.
32.
33.
34.
35.
h e te ro g e n e o u s la n d s c a p e . E v o lu tio n 55: 24 6 —259.
D o e b e li M , R u x to n G D (1997) E v o lu tio n o f d isp e rsa l ra te s in m e ta p o p u la tio n
m odels: B ra n c h in g a n d cyclic d y n a m ic s in p h e n o ty p e space. E v o lu tio n 5 1 : 1 7 3 0 —
1741.
15.
O r r H A (2005) T h e g e n e tic th e o ry o f a d a p ta tio n : A b r ie f h isto ry . N a tu re
R e v ie w s G e n e tic s 6: 1 1 9 -1 2 7 .
16. M ic h e l A P , S im S, P o w ell T H Q , T a y lo r M S , N o sil P , e t al. (2010) W id e s p re a d
g e n o m ic d iv e rg e n c e d u r in g s y m p a tric sp é cia tio n . P ro c e e d in g s o f th e N a tio n a l
A c a d e m y o f S cien ces o f t h e U n ite d S ta te s o f A m e ric a 107: 9 7 2 4 —9 7 2 9 .
17. H o e k s tra H E , H ir s c h m a n n R J, B u n d e y R A , In se l P A , G ro s s la n d J P (2006) A
single a m in o a c id m u ta tio n c o n trib u te s to a d a p tiv e b e a c h m o u se c o lo r p a tte rn .
Sc ien c e 313: 1 0 1 -1 0 4 .
S te in e r C C , W e b e r J N , H o e k s tra H E (2007) A d a p tiv e v a ria tio n in b e a c h m ic e
p r o d u c e d b y tw o in te r a c tin g p ig m e n ta tio n g enes. P los B io lo g y 5: 1880—1889.
19. W e s t- E b e rh a r d M J (2005) D e v e lo p m e n ta l p lastic ity a n d th e o rig in o f sp ecies
d ifferences. P ro c e e d in g s o f th e N a tio n a l A c a d e m y o f S cien ces o f th e U n ite d
S ta te s o f A m e ric a 102: 6 5 4 3 -6 5 4 9 .
20. H o e k s tra H E , C o y n e J A (2007) T h e lo cu s o f ev o lu tio n : E v o d e v o a n d th e
g e n e tic s o f a d a p ta tio n . E v o lu tio n 61: 9 9 5 —1016.
R ic h a rd s S, G ib b s R A , W e in s to c k G M , B ro w n SJ, D e n e ll R , e t al. (2008) T h e
g e n o m e o f th e m o d e l b e e d e a n d p e s t T r ib o liu m c a sta n e u m . N a tu r e 452: 9 4 9 —
955.
E m le n D J , N ijh o u t H F (1 9 9 9 ) H o r m o n a l c o n tr o l o f m a le h o r n le n g th
d im o rp h is m in th e d u n g b e e d e O n th o p h a g u s ta u r u s (C o le o p te ra : S c a ra b a e i­
dae). J o u r n a l o f In se c t P h y sio lo g y 45: 4 5 —53.
37.
Z e r a A J (2003) T h e e n d o c rin e re g u la tio n o f w in g p o ly m o rp h is m in insects: S ta te
o f th e a rt, r e c e n t s u rp rise s, a n d f u tu re d ire c d o n s. In te g ra tiv e a n d C o m p a ra tiv e
38.
B io lo g y 43: 6 0 7 - 6 1 6 .
Z e r a A J (2007) E n d o c r in e analysis in e v o lu tio n a ry -d e v e lo p m e n ta l stu d ie s o f
39.
in se c t p o ly m o rp h is m : h o r m o n e m a n ip u la tio n v ersu s d ire c t m e a s u r e m e n t o f
h o r m o n a l re g u la to rs. E v o lu tio n & D e v e lo p m e n t 9: 4 9 9 —51 3 .
Z e r a A f D e n n o R F (1997) P h y sio lo g y a n d eco lo g y o f d isp e rsal p o ly m o rp h is m in
40.
insects. A n n u a l R e v ie w o f E n to m o lo g y 42: 2 0 7 —230.
d e n B o e r I J (1970) O n th e sig n ific a n c e o f d isp e rsal p o w e r fo r p o p u la tio n s o f
42.
21. V a n S tra a le n N M , R o e lo fs D (2006) A n in tro d u c tio n to ec o lo g ic al g en o m ics.
N e w Y ork: O x f o r d U n iv e rsity P ress. 3 07 p.
PLoS ONE I w w w .p lo s o n e .o rg
p a tte rn s . C u r r e n t B io lo g y 9: 109—115.
W e ih e U , M ilá n M , C o h e n S M (2005) Drosophila L im b D e v e lo p m e n t. In : G ilb e rt
L I, e d ito r. In s e c t D e v e lo p m e n t: M o rp h o g e n e s is , M o ltin g a n d M e ta m o rp h o s is,
first ed . L o n d o n , U K : E lsev ier, p p . 730.
Ish ik a w a A , O g a w a K , G o to h H , W a ls h T K , T a g u D , e t al. (2012) J u v e n ile
h o r m o n e titre a n d re la te d g e n e e x p re s s io n d u r in g th e c h a n g e o f re p ro d u c tiv e
m o d e s in th e p e a a p h id . In se c t M o le c u la r B io lo g y 21: 4 9 —60.
41.
43.
B a r re tt R D H , S c h lu te r D (2008) A d a p ta tio n fro m s ta n d in g g e n e tic v a ria tio n .
T r e n d s in E c o lo g y & E v o lu tio n 23: 38—44.
23. A r e n d t J , R e z n ic k D (2008) C o n v e rg e n c e a n d p a ra lle lis m rec o n s id e re d : w h a t
h a v e w e le a r n e d a b o u t th e g e n e tic s o f a d a p ta tio n ? T r e n d s in E c o lo g y &
E v o lu tio n 23: 2 6 -3 2 .
24. L e R o u z ic A , C a r lb o r g O (2008) E v o lu tio n a ry p o te n tia l o f h id d e n g e n e tic
v a ria tio n . T r e n d s in E c o lo g y & E v o lu tio n 23: 33—37.
W e a th e rb e e S D , N ijh o u t H F , G r u n e r i L W , H a id e r G , G a la n t R , e t al. (1999)
U ltra b ith o r a x fu n c d o n in b u tte rfly w in g s a n d th e e v o lu d o n o f in se c t w in g
36.
18.
22.
D h u y v e tte r H , H e n d r ic k x F , G a u b lo m m e E , D e s e n d e r K (2007) D iffe re n d a tio n
b e tw e e n tw o sa lt m a rs h b e e d e eco ty p es: E v id e n c e fo r o n g o in g s p é cia tio n .
44.
c a ra b id -b e e d e s . O e c o lo g ia 4: 1—28.
d e n B o e r I J (1980) W in g p o ly m o rp h is m a n d d im o rp h is m in g r o u n d b e e d e s as
stag es in a n e v o lu tio n a ry p ro ce ss (C o le o p te ra , C a ra b id a e ). E n to m o l G e n 6: 107—
134.
D e s e n d e r K (1988) F lig h t-M u sc le D e v e lo p m e n t a n d D isp e rsa l in th e L ife-C ycle
o f C a r a b id B eed es. A n n a le s D e L a S o ciété R o y a le Z o o lo g iq u e D e B e lg iq u e 118:
7 8 -7 9 .
T h e o d o r id e s K , D e R iv a A , G o m e z -Z u rita J , F o ste r P G , V o g le r A P (2002)
C o m p a ris o n o f E S T lib ra rie s fro m sev en b e e d e species: to w a rd s a fra m e w o rk fo r
p h y lo g e n o m ic s o f th e C o le o p te ra . In s e c t M o le c u la r B iology 11: 4 6 7 —475.
S o n g H , S h effield N G , C a m e r o n SL, M ille r K B , W h itin g M F (2010) W h e n
p h y lo g e n e tic a s su m p tio n s a re v io la ted : b a s e c o m p o s itio n a l h e te ro g e n e ity a n d
a m o n g -s ite r a te v a ria tio n in b e e d e m ito c h o n d ria l p h y lo g en o m ics. S ystem a tic
E n to m o lo g y 35: 4 2 9 —448.
12
A u g u s t 2 01 2 | V o lu m e 7 | Issue 8 | e 4 2 6 0 5
T ra n s c rip to m ic s o f a W in g P o ly m o rp h ic B eetle
45.
46.
47.
48.
49.
50.
51.
52.
53.
54.
55.
56.
57.
58.
59.
H u n t T , B e rg ste n J , L e v k a n ic o v a Z . P a p a d o p o u lo u A , J o h n O S , e t al. (2007) A
c o m p re h e n siv e p h y lo g e n y o f b e e d e s rev eals th e e v o lu tio n a ry o rig in s o f a
s u p e rra d ia tio n . S c ien c e 318: 1913—1916.
S lo a n D B , K e lle r S R , B e ra rd i A E , S a n d e rs o n BJ, K a r p o v ic h J F , e t al. (2012) D e
n o v o tra n s c rip to m e a sse m b ly a n d p o ly m o rp h is m d e te c tio n in th e flo w erin g p la n t
S ilene v u lg aris (C a ry o p h y llac e a e ). M o le c u la r E c o lo g y R e s o u rc e s 12: 3 3 3 —343.
X u e J , B a o Y -Y , L i B-l, C h e n g Y -B , P e n g Z -Y , e t al. (2011) T ra n s c r ip to m e
A nalysis o f th e B ro w n P la n th o p p e r N ila p a r v a ta lu g en s. P los O n e 5.
M itta p a lli O , B ai X , M a m id a la P , R a ja ra p u SP, B o n ello P, e t al. (2010) T issu e Specific T ra n s c rip to m ic s o f th e E x o tic In v asiv e In s e c t P e s t E m e ra ld A sh B o re r
(A grilus p lan ip e n n is). P los O n e 5.
P o e lc h a u M F , R e y n o ld s J A , D e n lin g e r D L , E lsik C G , A rm b ru s te r P A (2011) A
d e n o v o tra n s c rip to m e o f th e A sia n tig e r m o sq u ito , A e d e s a lb o p ic tu s, to id en tify
c a n d id a te tra n s c rip ts fo r d ia p a u s e p r e p a r a tio n . B m c G e n o m ic s 12.
T u r in H (2000) D e N e d e r la n d s e lo o p k e v e rs , v e rs p r e id in g e n o e c o lo g ie
(C o le o p te ra . C a ra b id a e ), N e d e rla n d s e F a u n a 3 .: N a tio n a a l N a tu u rh is to ris c h
M u s e u m N a tu ra lis , K N N V U itg ev e rij & E IS N e d e r la n d , L e id en .
S e rra n o J (1981) A C h r o m o s o m e S tu d y o f S p a n ish B e m b id iid a e a n d O th e r
C a r a b o id e a (C o le o p te ra A d e p h a g a ). G e n e tic a 57: 119—129.
G r a b h e rr M G , H a a s BJ, Y a s s o u r M , L e v in J Z , T h o m p s o n D A , e t al. (2011) Fullle n g th tra n s c rip to m e a sse m b ly fro m R N A -S e q d a ta w ith o u t a re fe re n c e g e n o m e .
N a tu r e B io te c h n o lo g y 29: 6 4 4 -U 1 3 0 .
S c h m ie d e r R , E d w a rd s R (2011) F a s t I d e n tific a tio n a n d R e m o v a l o f S e q u e n c e
67.
68.
69.
70.
71.
72.
73.
74.
75.
76.
C o n ta m in a tio n fro m G e n o m ic a n d M e ta g e n o m ic D a ta se ts. Plos O n e 6.
A ltsc h u l S F, G ish W , M ille r W , M y e rs E W , L ip m a n D J (1990) B asic L o c a l
A lig n m e n t S e a rc h T o o l. J o u r n a l o f M o le c u la r B iology 215: 4 0 3 ^ 4 1 0 .
G o tz S, G a rc ia -G o m e z J M , T e r o l J , W illia m s T D , N a g a ra j S H , e t al. (2008)
H ig h - th r o u g h p u t f u n c tio n a l a n n o ta tio n a n d d a ta m in in g w ith th e B la st2 G O
77.
78.
suite. N u c le ic A cids R e s e a rc h 36: 3 4 2 0 -3 4 3 5 .
A s h b u r n e r M , B all C A , B lak e J A , B o tste in D , B u d e r H , e t al. (2000) G e n e
O n to lo g y : to o l fo r th e u n ific a tio n o f b io lo g y . N a tu r e G e n e tic s 25: 25—29.
K im H S , M u rp h y T , X i a J , C a r a g e a D , P a rk Y , e t al. (2010) B e e d e B a se in 2010:
re v is io n s to p ro v id e c o m p r e h e n s iv e g e n o m ic i n fo r m a tio n fo r T r ib o liu m
79.
c a sta n e u m . N u c le ic A cid s R e s e a rc h 38: D 4 3 7 - D 4 4 2 .
K u r tz S, P h illip p y A , D e lc h e r A L , S m o o t M , S h u m w a y M , e t al. (2004) V e rsa tile
a n d o p e n so ftw are fo r c o m p a r in g la rg e g e n o m e s. G e n o m e B io lo g y 5.
M in X J , B u d e r G , S to rm s R , T s a n g A (2005) O rfP re d ic to r: p r e d ic tin g p ro te in -
80.
c o d in g reg io n s in E S T -d e riv e d se q u en c e s. N u c le ic A cid s R e s e a rc h 33: W 6 7 7 —
W 680.
60. T a m u r a K , P e te rs o n D , P e te rs o n N , S te c h e r G , N e i M , e t al. (2011) M E G A 5 :
M o le c u la r E v o lu tio n a ry G e n e tic s A n a ly sis U s in g M a x im u m L ik e lih o o d ,
E v o lu tio n a ry D is ta n c e , a n d M a x im u m P a rsim o n y M e th o d s . M o le c u la r B iology
61.
62.
63.
64.
65.
66.
81.
82.
a n d E v o lu tio n 28: 2 7 3 1 -2 7 3 9 .
B risso n J A , Ish ik a w a A , M iu r a T (2010) W in g d e v e lo p m e n t g e n e s o f th e p e a
a p h id a n d d iffe re n tia l g e n e e x p re ssio n b e tw e e n w in g e d a n d u n w in g e d m o rp h s .
In s e c t M o le c u la r B iology 19: 6 3 - 7 3 .
Belles X , M a rtin D , P iu la c h s M D (2005) T h e m e v a lo n a te p a th w a y a n d th e
83.
84.
synthesis o f ju v e n ile h o r m o n e in insects. A n n u a l R e v ie w o f E n to m o lo g y , p p .
1 8 1 -1 9 9 .
W a r r e n J T , P e try k A , M a rq u e s G , P a r v y J P , S h in o d a T , e t al. (2004) P h a n to m
e n c o d e s th e 2 5 -h y d ro x y la se o f D ro s o p h ila m e la n o g a ste r a n d B o m b y x m o ri: a
P 4 5 0 e n z y m e c ritic a l in e c d y so n e b io sy n th e sis. In s e c t B io c h e m is try a n d
to m ic s o f th e B e d B u g (C im e x lectu lariu s). P los O n e 6.
W a n g X -W , L u a n J -B , LÍ J - M , B a o Y -Y , Z h a n g G -X , e t al. (2010) D e n o v o
c h a ra c te r iz a tio n o f a w h itefly tra n s c rip to m e a n d an aly sis o f its g e n e e x p re ssio n
d u r in g d e v e lo p m e n t. B m c G e n o m ic s 11.
S h e n G -M , D o u W , N i u J - Z , J i a n g H -B , Y a n g W -J, e t al. (2011) T ra n s c r ip to m e
A n aly sis o f th e O r ie n ta l F r u it Fly (B a c tro c e ra dorsalis). Plos O n e 6.
O ’N e il S T , D z u ris in J D K , G a rm ic h a e l R D , L o b o N F , E m ric h SJ, e t al. (2010)
P o p u la d o n -le v e l tra n s c rip to m e s e q u e n c in g o f n o n m o d e l o rg a n ism s E ry n n is
p r o p e r d u s a n d P a p ilio ze lic ao n . B m c G e n o m ic s 11.
C o l b o u r n e J K , P fre n d e r M E , G ilb e rt D , T h o m a s W K , T u c k e r A , e t al. (2011)
T h e E c o re sp o n siv e G e n o m e o f D a p h n ia p u lex . S c ien c e 331: 5 5 5 —561.
K a ra to lo s N , P a u c h e t Y , W ilk in so n P , G h a u h a n R , D e n h o lm I, e t al. (2011)
P y r o s e q u e n c in g th e tra n s c rip to m e o f th e g r e e n h o u s e w h itefly, T ria le u ro d e s
v a p o ra r io r u m rev e a ls m u ld p le tra n s c rip ts e n c o d in g in se c d c id e ta rg e ts a n d
d eto x ify in g e n z y m e s. B m c G e n o m ic s 12.
G u s ta v so n E , G o ld s b o ro u g h A S , A li Z , K o r n b e r g T B (1996) T h e D ro s o p h ila
e n g ra ile d a n d in v e c te d genes: P a rtn e rs in re g u la tio n , e x p re ssio n a n d f u n c d o n .
G e n e tic s 142: 8 9 3 -9 0 6 .
S h ig e n o b u S, B ickel R D , B risso n J A , B u tts T , C h a n g G G , e t al. (2010)
C o m p re h e n s iv e su rv e y o f d e v e lo p m e n ta l g e n e s in th e p e a a p h id , A c y rth o s ip h o n
p isu m : f re q u e n t lin e a g e -sp e c ific d u p lic a tio n s a n d losses o f d e v e lo p m e n ta l genes.
In s e c t M o le c u la r B iology 19: 4 7 - 6 2 .
C o h e n B, M c G u f d n M E , Pfeifle G , S eg al D , C o h e n S M (1992) A p te ro u s , a G e n e
R e q u ire d fo r Im a g in a i D isk D e v e lo p m e n t in D ro s o p h ila E n c o d e s a M e m b e r o f
th e T im F a m ily o f D e v e lo p m e n ta l R e g u la to ry P ro te in s. G e n e s & D e v e lo p m e n t
6: 7 1 5 - 7 2 9 .
T h e il E C (1987) F e r rid n - S tru c tu re , G e n e -R e g u la d o n , a n d C e llu la r F u n c d o n in
A n im als, P la n ts, a n d M ic ro o rg a n is m s . A n n u a l R e v ie w o f B io c h e m istry 56: 289—
315.
O r in o K , L e h m a n L, T su ji Y , A y ak i H , T o r ti S V , e t al. (2001) F e rritin a n d th e
res p o n s e to o x id a d v e stress. B io c h e m ic a l J o u r n a l 357: 24 1 —247.
O d u m W E (1988) C o m p a ra tiv e E c o lo g y o f T id a l F re s h w a te r a n d S a lt M a rsh e s.
A n n u a l R e v ie w o f E c o lo g y a n d S y stem a tic s 19: 147—176.
K a w a h a r a -M ik i R , T s u d a K , S h iw a Y , A ra i-K ic h is e Y , M a ts u m o to T , e t al.
(2011) W h o le -g e n o m e re s e q u e n c in g show s n u m e ro u s g e n e s w ith n o n s y n o n ­
y m o u s S N P s in th e J a p a n e s e n a tiv e c a td e K u c h in o s h im a -U sh i. B m c G e n o m ic s
12.
85.
M o le c u la r B iology 34: 9 9 1 —1010.
K a n e h is a M , G o to S (2000) K E G G : K y o to E n c y c lo p e d ia o f G e n e s a n d
G e n o m e s. N u c le ic A cid s R e s e a rc h 28: 27—30.
L i H , D u r b in R (2009) F a s t a n d a c c u r a te s h o rt r e a d a lig n m e n t w ith B u rro w sW h e e le r tra n s fo rm . B io in fo rm a tic s 25: 1754—1760.
L a n g m e a d B, T r a p n e ll G , P o p M , S a lz b e rg S L (2009) U ltra f a s t a n d m e m o ry efficie n t a lig n m e n t o f s h o rt D N A se q u en c e s to th e h u m a n g e n o m e . G e n o m e
B iology 10.
PLoS ONE I w w w .p lo s o n e .o rg
L i H , H a n d s a k e r B, W y so k er A , F e n n e ll T , R u a n J , e t al. (2009) T h e S e q u e n c e
A lig n m e n t/M a p f o rm a t a n d S A M to o ls. B io in fo rm a tic s 25: 2 0 7 8 -2 0 7 9 .
L i B, D e w e y C N (2011) R S E M : a c c u r a te tra n s c rip t q u a n tific a tio n fro m R N A S e q d a ta w ith o r w ith o u t a re fe re n c e g e n o m e . B m c B io in fo rm a tic s 12.
L i H (2011) A sta tistic a l fra m e w o rk fo r S N P c a llin g, m u ta tio n discovery,
a sso c ia tio n m a p p in g a n d p o p u la tio n g e n e tic a l p a ra m e te r e s tim a tio n fro m
s e q u e n c in g d a ta . B io in fo rm a tic s 27: 2 9 8 7 —2993.
D a n e c e k P, A u to n A , A b e c a sis G , A lb e rs C A , B a n k s E , e t al. (2011) T h e v a ria n t
c all f o rm a t a n d V C F to o ls. B io in fo rm a d c s 27: 2 1 5 6 -2 1 5 8 .
W ilk in M B , B ec k e r M N , M u lv e y D , P h a n I, C h a o A , e t al. (2000) D ro s o p h ila
D u m p y is a g ig a n tic e x tra c e llu la r p r o te in r e q u ir e d to m a in ta in te n s io n a t
e p id e rm a l-c u d c le a tta c h m e n t sites. C u r r e n t B io lo g y 10: 5 5 9 —56 7.
B a i X , M a m id a la P, R a ja r a p u SP, J o n e s S G , M itta p a lli O (2011) T r a n s c r ip ­
86.
87.
88.
13
E c k S H , B e n e t-P a g e s A , Flisikow ski K , M e itin g e r T , F rie s R , e t al. (2009) W h o le
g e n o m e s e q u e n c in g o f a single Bos ta u ru s a n im a l fo r single n u c le o tid e
p o ly m o rp h is m d isco v ery . G e n o m e B io lo g y 10.
K i m J - I , J u Y S , P a rk H , K im S, L e e S, e t al. (2009) A h ig h ly a n n o ta te d w holeg e n o m e s e q u e n c e o f a K o r e a n in d iv id u a l. N a tu re 460: 1011—U 1 0 9 6 .
B e n tle y D R , B a la s u b r a m a n ia n S, S w erd lo w H P , S m ith G P , M ilto n J , e t al.
(2008) A c c u ra te w h o le h u m a n g e n o m e s e q u e n c in g u sin g rev e rsib le te rm in a to r
c h e m istry . N a tu r e 456: 5 3 - 5 9 .
L ev y S, S u tto n G , N g P C , F e u k L , H a lp e rn A L , e t al. (2007) T h e d ip lo id g e n o m e
s e q u e n c e o f a n in d iv id u a l h u m a n . P los B io lo g y 5: 2 1 1 3 —2 1 4 4 .
A u g u s t 2 01 2 | V o lu m e 7 | Issue 8 | e 4 2 6 0 5
Download