Structural features of conopeptide genes inferred from partial sequences of the Conus tribblei genome
Molecular Genetics and Genomics
Neda Barghi
Marine Science Institute, University of the Philippines-Diliman, Quezon City 1101, Philippines
Gisela P. Concepcion
Philippine Genome Center, University of the Philippines, Quezon City 1101, Philippines
Marine Science Institute, University of the Philippines-Diliman, Quezon City 1101, Philippines
Baldomero M. Olivera
Department of Biology, University of Utah, Salt Lake City, UT-USA
Arturo O. Lluisma
Philippine Genome Center, University of the Philippines, Quezon City 1101, Philippines
Marine Science Institute, University of the Philippines-Diliman, Quezon City 1101, Philippines
Corresponding author:
Arturo O. Lluisma
aolluisma@upd.edu.ph
Fig. S1. Alignment of the complete conopeptide precursors encoded in C. tribblei genomic scaffolds with
the conopeptide transcripts. The mature regions are underlined, the signal regions are shown in bold, and
the cysteine residues are shown in bold italic red. Sequences of the first, second and third exons of the
conopeptide gene are highlighted in blue, pink, and green, respectively. The name of each genomic scaffold
is shown as ‘scaffold#_$’, where # is an arbitrary number assigned to the scaffold by the assembler and $
is the name of the conopeptide superfamily or group. The conopeptide transcripts of C. tribblei are from
Barghi et al. (2015a, 2015b).
a) B1 superfamily
Ctr_106_T         MQFFTCLSLLVPLMLFFHLTHSVTADHDHEAAATEVRSADHTLKRLVHHH
scaffold678003_B1 MQFFTCLCLLVPLMLFFHLTHSVTADHDHEAAATEVRSADHTLKRLVHH-
Ctr_106_T         VHQAPKRGPGLSRTSAVHGLHNDDNDNDGGKTKRTDDLSVTLDLWLLAEQ
scaffold678003_B1 VHRAPKRGPVLSRTSTVHGLHNDDNDNDGDQKKRTGNLSVNLALEIWVKQ
Ctr_106_T         L-NLEKKIKESKAKLDSLGR
scaffold678003_B1 MKNQQKKMKGAKTRLDVLGR
b) J superfamily
Ctr_45_T         MTPVWSVTCCCLLWPMLSVQLVTPGSPAPAQEGILVDSVEECPEMCEEGGVDPLCECPTT
scaffold868323_J MTPVWSVTCCCLLWPMLSVQLVTPGSPAPAQEGILVDSVEECPEMCEEGGVDPLCECPTT
Ctr_J_1          MTPVWSVTCCCLLWPMLSVQLVTPGSPAPAQEGILVDSVEECPEMCKEGGVDPLCECPTT
Ctr_45_T         WNDLPPWIGRRKMSTVA
scaffold868323_J WNDLPPWIGRRKMSTVA
Ctr_J_1          WNDLPPWIGRRKMSTVA
Ctr_153_T        MQSVQSVTRCCLLVLLLPALCVNPHPLGISQPLPQQLNTERGDPSGLKYC
scaffold40779_J  MQSVQSVTRCCLLVLLLPALCVNPHPLGISQPLPQQLNTERGDPSGLKYC
Ctr_153_T        NKLCAQHTPTKVCTEKVCSKLPDVVDDRRKRTELPMAP
scaffold40779_J  NKLCAQHTPTKVCTEKVCSKLPDVVDDRRKRTELPMAP
c) Y2-like group
Ctr_147_T             MATGLLSPLLVTMLGFLLHVHVARAGLEHTCTLETRLQGAHPRGICGSKL
scaffold34965_Y2-like MATGLLSPLLVTMLGFLLHVHVARAGLEHTCTLETRLQGAHPRGICGSKL
Ctr_147_T             PNIIHTVCQVMGRGYAGGQRQLRKRTSMINSDDMEADEGSVGGFLMSKRR
scaffold34965_Y2-like PNIIHTVCQVMGRGYAGGQRQLRKRTSMINSDDMEADEGSVGGFLMSKRR
Ctr_147_T             ALSYLQKETNPLVMAGYERRGLQKRHGGQGITCECCYNFCSFRELVQYCN
scaffold34965_Y2-like ALSYLQKETNPLVMAGYERRGLQKRHGGQGITCECCYNFCSFRELVQYCN
d) O1 superfamily
scaffold448978_O1 MKLTCVLTIAALFLTACQLITASSDTRDLQEFPRRKRSHRTLKKKAEEEPCIPGGLQCDV
Ctr_38_N          MKLTCVLTIAALFLTACQLITASSDTRDLQEFPRRKRSHRTLKKKAEEEPCIPGGLQCDV
scaffold448978_O1 LDDKCCNSCSLFWCT
Ctr_38_N          LDDKCCNSCSLFWCT
e) G-like group
Ctr_GL_1             MSKSGMLLFVLLLVWPLAFPKLVPVQRSLARRYGDLGAKRDVPTGCVSPSTSNLQGPWEN
Ctr_28_N             MSKSGMLLFVLLLVWPLAFPKLVPVQRSLARRYGDLGAKRDVPTGCVSPSTSNLQGPWEN
Ctr_41_N             MSKSGMLLFVLLLVLPLAFPKLVPVQRSLARRYGDLAAKRDVATDCVSPSTPNLQGPWQN
scaffold15221_G-like MSKSGMLLFVLLLVLPLAFPKLVPVQRSVARRYGDLGVKRSGTSSCVSQSTPNLQGPWED
Ctr_GL_1             KKCCNTKRCSPTNCCASSSCTCSGSTCYCPGR
Ctr_28_N             KKCCNTKRCSPTNCCASSSCTCSGTACYCSGR
Ctr_41_N             KKCCLTKRCGPTNCCVSSSCTCSGSTCYCPGR
scaffold15221_G-like KKCCLTRRCGPTNCCPSSSCTCSGTTCNCPGG
f) R superfamily
Ctr_142_T        MRASTWLSGRMVITVLPSLRVSVAISTLSGVSLVRSRLLLSTLTARVRASSRLVSPSLYS
Ctr_R_1          MRASTWLSGRMVITVLPSLRVSVAISTLSGVSLVRSRLLLSTLTARVRASSRLVSPSLYS
scaffold188032_R MRASTWLSGRMVITVLPSLRVSVAISTLSGVSLVRSRLLLSTLTARVRASSRLVSPSLYS
Ctr_142_T        CFSRDTALALLLPMHVAFQPP
Ctr_R_1          CFSRDTALALLLPMHVAFQPP
scaffold188032_R CFSRDTAFALLLPMHVAFQPP
Fig. S2. Schematic representation of the partial structures of conopeptide genes. Untranslated regions
(UTR) are shown in gray rectangles, coding DNA sequences (CDS) in pink wedges, and introns in
dashed lines. Intron phases (0, 1, and 2) are shown on the introns. Regions of the conotoxin precursors
are shown in red (signal), blue (pro) and green (mature) lines. The lengths of the lines representing
conopeptide regions are not to scale. The name of each scaffold is shown as scaffold#_$ where # is an
arbitrary number assigned to the scaffold by the assembler, and $ is the name of conopeptide
superfamily or group. The gene structures were constructed using GSDS (http://gsds.cbi.pku.edu.cn/).
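The intron phases annotated in Fig. S2 follow the usual convention: a phase 0 intron lies between two
codons, a phase 1 intron follows the first base of a codon, and a phase 2 intron follows the second base.
The Python sketch below illustrates that calculation, assuming the coding length of each exon is already
known. The function name and the example lengths are hypothetical and are not taken from the scaffolds
shown in the figure.

```python
def intron_phases(cds_lengths):
    """Return the phase of each intron in a gene.

    cds_lengths: coding lengths (bp) of consecutive exons, 5' to 3',
    counting only translated bases (UTR bases excluded).
    The phase of an intron is the number of coding bases upstream
    of it, modulo 3.
    """
    phases = []
    coding_bases_so_far = 0
    # The last exon is not followed by an intron, so it is skipped.
    for length in cds_lengths[:-1]:
        coding_bases_so_far += length
        phases.append(coding_bases_so_far % 3)
    return phases


# Hypothetical three-exon gene: 61 + 125 + 90 coding bases.
print(intron_phases([61, 125, 90]))
```

For the hypothetical lengths above, the first intron interrupts a codon after its first base (61 mod 3 = 1)
and the second falls between codons (186 mod 3 = 0).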
Fig. S3. Schematic representation of the partial structures of conopeptide genes. The notation is as
described in Fig. S2. The superfamily abbreviations are Ikot: Con-ikot-ikot, and Kunitz: Conkunitzin.
Fig. S4. Sequence logos of (a) the intron donor sites, and (b) the intron acceptor sites of conopeptide
genes. For the donor site logo, the last 10 bp of exons and the first 10 bp of introns are shown. For the
acceptor site logo, the last 10 bp of introns and the first 10 bp of exons are shown. The sequence logos
were constructed using WebLogo (Crooks et al. 2004).
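The windows described in the Fig. S4 caption can be illustrated with a short Python sketch. It assumes a
scaffold sequence held as a string and 0-based intron coordinates (start inclusive, end exclusive). The
function names, the coordinate convention, and the toy sequence are assumptions made for illustration;
this is not the procedure or the data used to build the logos.

```python
from collections import Counter


def splice_site_windows(seq, intron_start, intron_end, flank=10):
    """Return (donor, acceptor) windows around one intron.

    donor    = last `flank` bp of the upstream exon + first `flank` bp of the intron
    acceptor = last `flank` bp of the intron + first `flank` bp of the downstream exon
    Coordinates are 0-based; intron_end is exclusive.
    """
    if intron_start < flank or intron_end + flank > len(seq):
        raise ValueError("not enough flanking exon sequence")
    if intron_end - intron_start < flank:
        raise ValueError("intron shorter than the window")
    donor = seq[intron_start - flank:intron_start + flank]
    acceptor = seq[intron_end - flank:intron_end + flank]
    return donor, acceptor


def position_counts(windows):
    """Count bases at each position across equal-length windows,
    the per-column tallies that a sequence logo summarises."""
    return [Counter(column) for column in zip(*windows)]


# Toy example (not real data): 10 bp exon, 20 bp intron (GT...AG), 10 bp exon.
toy = "ACGTACGTAC" + "GT" + "A" * 16 + "AG" + "TTGCATGCAA"
donor, acceptor = splice_site_windows(toy, 10, 30)
print(donor)
print(acceptor)
```

With 10 bp flanks, the first two intron bases sit at window positions 11 and 12 of the donor window, and the
last two intron bases sit at positions 9 and 10 of the acceptor window.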
References:
Crooks GE, Hon G, Chandonia JM, Brenner SE (2004) WebLogo: a sequence logo generator.
Genome Res 14:1188-1190.