Structural features of conopeptide genes inferred from partial sequences of the Conus tribblei genome

Molecular Genetics and Genomics

Neda Barghi
Marine Science Institute, University of the Philippines-Diliman, Quezon City 1101, Philippines

Gisela P. Concepcion
Philippine Genome Center, University of the Philippines, Quezon City 1101, Philippines
Marine Science Institute, University of the Philippines-Diliman, Quezon City 1101, Philippines

Baldomero M. Olivera
Department of Biology, University of Utah, Salt Lake City, UT, USA

Arturo O. Lluisma
Philippine Genome Center, University of the Philippines, Quezon City 1101, Philippines
Marine Science Institute, University of the Philippines-Diliman, Quezon City 1101, Philippines

Corresponding author: Arturo O. Lluisma
aolluisma@upd.edu.ph

Fig. S1. Alignment of the complete conopeptide precursors from C. tribblei genomic scaffolds with the conopeptide transcripts. The mature regions are underlined; the signal regions are shown in bold and the cysteine residues in bold italic red. Sequences of the first, second and third exons of the conopeptide gene are highlighted in blue, pink, and green, respectively. The name of each genomic scaffold is shown as 'scaffold#_$', where # is an arbitrary number assigned to the scaffold by the assembler, and $ is the name of the conopeptide superfamily or group. The conopeptide transcripts of C. tribblei are from Barghi et al. (2015a, 2015b).
a) B1 superfamily

Ctr_106_T              MQFFTCLSLLVPLMLFFHLTHSVTADHDHEAAATEVRSADHTLKRLVHHH
scaffold678003_B1      MQFFTCLCLLVPLMLFFHLTHSVTADHDHEAAATEVRSADHTLKRLVHH-

Ctr_106_T              VHQAPKRGPGLSRTSAVHGLHNDDNDNDGGKTKRTDDLSVTLDLWLLAEQ
scaffold678003_B1      VHRAPKRGPVLSRTSTVHGLHNDDNDNDGDQKKRTGNLSVNLALEIWVKQ

Ctr_106_T              L-NLEKKIKESKAKLDSLGR
scaffold678003_B1      MKNQQKKMKGAKTRLDVLGR

b) J superfamily

Ctr_45_T               MTPVWSVTCCCLLWPMLSVQLVTPGSPAPAQEGILVDSVEECPEMCEEGGVDPLCECPTT
scaffold868323_J       MTPVWSVTCCCLLWPMLSVQLVTPGSPAPAQEGILVDSVEECPEMCEEGGVDPLCECPTT
Ctr_J_1                MTPVWSVTCCCLLWPMLSVQLVTPGSPAPAQEGILVDSVEECPEMCKEGGVDPLCECPTT

Ctr_45_T               WNDLPPWIGRRKMSTVA
scaffold868323_J       WNDLPPWIGRRKMSTVA
Ctr_J_1                WNDLPPWIGRRKMSTVA

Ctr_153_T              MQSVQSVTRCCLLVLLLPALCVNPHPLGISQPLPQQLNTERGDPSGLKYC
scaffold40779_J        MQSVQSVTRCCLLVLLLPALCVNPHPLGISQPLPQQLNTERGDPSGLKYC

Ctr_153_T              NKLCAQHTPTKVCTEKVCSKLPDVVDDRRKRTELPMAP
scaffold40779_J        NKLCAQHTPTKVCTEKVCSKLPDVVDDRRKRTELPMAP

c) Y2-like group

Ctr_147_T              MATGLLSPLLVTMLGFLLHVHVARAGLEHTCTLETRLQGAHPRGICGSKL
scaffold34965_Y2-like  MATGLLSPLLVTMLGFLLHVHVARAGLEHTCTLETRLQGAHPRGICGSKL

Ctr_147_T              PNIIHTVCQVMGRGYAGGQRQLRKRTSMINSDDMEADEGSVGGFLMSKRR
scaffold34965_Y2-like  PNIIHTVCQVMGRGYAGGQRQLRKRTSMINSDDMEADEGSVGGFLMSKRR

Ctr_147_T              ALSYLQKETNPLVMAGYERRGLQKRHGGQGITCECCYNFCSFRELVQYCN
scaffold34965_Y2-like  ALSYLQKETNPLVMAGYERRGLQKRHGGQGITCECCYNFCSFRELVQYCN

d) O1 superfamily

scaffold448978_O1      MKLTCVLTIAALFLTACQLITASSDTRDLQEFPRRKRSHRTLKKKAEEEPCIPGGLQCDV
Ctr_38_N               MKLTCVLTIAALFLTACQLITASSDTRDLQEFPRRKRSHRTLKKKAEEEPCIPGGLQCDV

scaffold448978_O1      LDDKCCNSCSLFWCT
Ctr_38_N               LDDKCCNSCSLFWCT

e) G-like group

Ctr_GL_1               MSKSGMLLFVLLLVWPLAFPKLVPVQRSLARRYGDLGAKRDVPTGCVSPSTSNLQGPWEN
Ctr_28_N               MSKSGMLLFVLLLVWPLAFPKLVPVQRSLARRYGDLGAKRDVPTGCVSPSTSNLQGPWEN
Ctr_41_N               MSKSGMLLFVLLLVLPLAFPKLVPVQRSLARRYGDLAAKRDVATDCVSPSTPNLQGPWQN
scaffold15221_G-like   MSKSGMLLFVLLLVLPLAFPKLVPVQRSVARRYGDLGVKRSGTSSCVSQSTPNLQGPWED

Ctr_GL_1               KKCCNTKRCSPTNCCASSSCTCSGSTCYCPGR
Ctr_28_N               KKCCNTKRCSPTNCCASSSCTCSGTACYCSGR
Ctr_41_N               KKCCLTKRCGPTNCCVSSSCTCSGSTCYCPGR
scaffold15221_G-like   KKCCLTRRCGPTNCCPSSSCTCSGTTCNCPGG

f) R superfamily

Ctr_142_T              MRASTWLSGRMVITVLPSLRVSVAISTLSGVSLVRSRLLLSTLTARVRASSRLVSPSLYS
Ctr_R_1                MRASTWLSGRMVITVLPSLRVSVAISTLSGVSLVRSRLLLSTLTARVRASSRLVSPSLYS
scaffold188032_R       MRASTWLSGRMVITVLPSLRVSVAISTLSGVSLVRSRLLLSTLTARVRASSRLVSPSLYS

Ctr_142_T              CFSRDTALALLLPMHVAFQPP
Ctr_R_1                CFSRDTALALLLPMHVAFQPP
scaffold188032_R       CFSRDTAFALLLPMHVAFQPP

Fig. S2. Schematic representation of the partial structures of conopeptide genes. Untranslated regions (UTR) are shown as gray rectangles, coding DNA sequences (CDS) as pink wedges, and introns as dashed lines. Intron phases (0, 1, and 2) are shown on the introns. Regions of the conotoxin precursors are shown as red (signal), blue (pro) and green (mature) lines. The lengths of the lines representing conopeptide regions are not to scale. The name of each scaffold is shown as 'scaffold#_$', where # is an arbitrary number assigned to the scaffold by the assembler, and $ is the name of the conopeptide superfamily or group. The gene structures were constructed using GSDS (http://gsds.cbi.pku.edu.cn/).

Fig. S3. Schematic representation of the partial structures of conopeptide genes. The notation is the same as in Fig. S2. The superfamily abbreviations are Ikot: Con-ikot-ikot, and Kunitz: Conkunitzin.

Fig. S4. Sequence logos of (a) the intron donor sites and (b) the intron acceptor sites of conopeptide genes. The donor site logo shows the last 10 bp of the exons and the first 10 bp of the introns. The acceptor site logo shows the last 10 bp of the introns and the first 10 bp of the exons. The sequence logos were constructed using WebLogo (Crooks et al. 2004).

References:

Crooks GE, Hon G, Chandonia JM, Brenner SE (2004) WebLogo: A sequence logo generator. Genome Res 14:1188-1190.
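
Note on the Fig. S4 windows. The 20-bp donor and acceptor windows described in the Fig. S4 caption can be sketched as below. This is an illustrative sketch only, not the procedure used to produce the figure: the function name splice_site_windows, the 0-based half-open exon coordinates, and the toy sequence are all assumptions introduced here. Windows that would not fit (intron shorter than the flank, or too close to a scaffold end) are skipped.

```python
def splice_site_windows(seq, exons, flank=10):
    """Return (donor_windows, acceptor_windows) for consecutive exon pairs.

    seq   : genomic sequence as a string
    exons : sorted list of (start, end) pairs, 0-based, end exclusive
            (hypothetical coordinate convention chosen for this sketch)
    flank : number of bases taken on each side of the exon/intron boundary
    """
    donors, acceptors = [], []
    for (_, exon_end), (next_start, _) in zip(exons, exons[1:]):
        intron_len = next_start - exon_end
        # Skip boundaries where a full window cannot be extracted.
        if intron_len < flank or exon_end < flank or next_start + flank > len(seq):
            continue
        # Donor: last `flank` bp of the exon + first `flank` bp of the intron.
        donors.append(seq[exon_end - flank:exon_end + flank])
        # Acceptor: last `flank` bp of the intron + first `flank` bp of the exon.
        acceptors.append(seq[next_start - flank:next_start + flank])
    return donors, acceptors


# Toy example with made-up sequence (not from the C. tribblei scaffolds).
toy_intron = "GT" + "C" * 10 + "AG"
toy_seq = "A" * 12 + toy_intron + "T" * 12
toy_exons = [(0, 12), (26, 38)]
toy_donors, toy_acceptors = splice_site_windows(toy_seq, toy_exons)
```

The resulting equal-length windows could then be written out, one per line, as input for a sequence logo tool.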