Supplementary Information

Supplementary Methods

(a) Illumina-based transcriptome sequencing and library construction

Sequencing was conducted using RNA extracted from dissected whole midgut regions (M1-M4) of five symbiotic and five aposymbiotic bugs fed on their natural diet of cottonseeds, resulting in two pooled samples. Prior to sequencing, the extracted RNA was subjected to poly-A enrichment. RNA sequencing was performed by a commercial service provider (Fasteris; http://www.fasteris.com) using 5 μg total RNA per sample on the HiSeq™ 2000 Sequencing System from Illumina (http://www.illumina.com/) with 100 bp paired-end reads. The reads were quality-trimmed using Trimmomatic [1]. The leading and trailing bases of each read were cut off if their quality values were below the default threshold. Additionally, reads shorter than 30 base pairs were discarded.

Following quality checks, the trimmed reads were assembled de novo into contigs using Trinity [2]. The minimal contig length was set to 200 base pairs and the k-mer length to 25 base pairs. The read libraries of symbiotic and aposymbiotic bugs were pooled and assembled into a single backbone. After the assembly with Trinity, the contigs were clustered by CD-HIT-EST according to their sequence similarity to remove potential duplicates: sequences with more than 99% sequence similarity to other contigs were collapsed. To assign expression values to each assembled transcript in the respective library, the original reads were mapped back to the backbone assembly using Bowtie2 [3]. The generated output was processed using SAMtools [4] to create BAM files and assess the coverage depth as the number of reads mapped to each transcript.
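Raw read counts of this kind scale with both sequencing depth and transcript length, so they are usually converted to a normalized measure such as RPKM (reads per kilobase of transcript per million mapped reads) before libraries are compared. The following is a minimal Python sketch of that transformation, not the authors' own script; the function name, contig names, and all numbers are hypothetical illustration values.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("transcript length and library size must be positive")
    kilobases = transcript_length_bp / 1_000
    millions = total_mapped_reads / 1_000_000
    return read_count / (kilobases * millions)


# Hypothetical inputs (not taken from the study): two contigs with the same
# read count but different lengths, in a library of 10 million mapped reads.
counts = {"contig_A": 500, "contig_B": 500}
lengths_bp = {"contig_A": 1_000, "contig_B": 2_000}
library_size = 10_000_000

values = {
    name: rpkm(counts[name], lengths_bp[name], library_size)
    for name in counts
}
print(values)
```

With equal read counts, the contig that is twice as long receives half the RPKM value, which is the length correction the transformation is meant to provide.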
The files were parsed with the custom script nucdepth.R (supplementary materials) using the R package Rsamtools [5] for further analysis. Biases due to different sequencing depths across treatments and different transcript sizes were corrected using the RPKM (reads per kilobase of transcript per million mapped reads) transformation to obtain estimates of relative expression levels. Homology searches (BLASTx and BLASTn) of unique sequences and functional annotation were performed using the BLAST2GO software suite v2.4.1 (http://www.blast2go.de).

(b) Validation of host gene expression with quantitative PCR

Quantitative PCRs (qPCRs) for the candidate host genes involved in B vitamin metabolism were conducted across (i) aposymbiotic and symbiotic bugs reared on their natural diet of cottonseeds (to confirm the transcriptome sequencing results), and (ii) the four experimental treatments of aposymbiotic and symbiotic bugs reared on a complete or vitamin-deficient artificial diet, to examine whether similar expression patterns are observed under controlled conditions with vitamin availability as the only nutritional difference between dietary treatments. Primers were designed based on the candidate gene sequences available from the transcriptome, and checked for specificity in vitro using capillary sequencing of amplified PCR products (Table S4).

The qPCR reactions were performed using a RotorGene®-Q cycler (Qiagen, Hilden, Germany), with the same individual cDNA extracts used for the diagnostic PCR screens. The final reaction volume of 25 µl included the following components: 1 µl of cDNA template, 2.5 µl of each primer (10 µM), 6.5 µl of autoclaved distilled H2O, and 12.5 µl of SYBR Green Mix (Qiagen, Hilden, Germany).
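The component volumes listed above can be checked against the stated 25 µl total and scaled to a master mix for several reactions. The sketch below is only an illustration of that arithmetic in Python: the per-reaction volumes come from the text, while the function name and the 10% pipetting overage are assumptions that are not part of the described protocol.

```python
# Per-reaction volumes in microlitres, as listed in the text
# ("each primer" is taken to mean one forward and one reverse primer).
PER_REACTION_UL = {
    "cDNA template": 1.0,
    "forward primer (10 uM)": 2.5,
    "reverse primer (10 uM)": 2.5,
    "distilled H2O": 6.5,
    "SYBR Green Mix": 12.5,
}


def master_mix(n_reactions, overage=0.10):
    """Scale the shared components for n reactions.

    The template is excluded because it differs per sample. The overage
    (extra volume for pipetting loss) is a hypothetical choice.
    """
    factor = n_reactions * (1.0 + overage)
    return {
        name: round(volume * factor, 2)
        for name, volume in PER_REACTION_UL.items()
        if name != "cDNA template"
    }


total_ul = sum(PER_REACTION_UL.values())  # should equal the stated 25 µl
mix = master_mix(10)
print(total_ul, mix)
```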
Conditions for qPCR were optimized using a VWR® Gradient Thermocycler (VWR, Radnor, PA, USA) at various annealing temperatures (60-68 °C). Standard curves for absolute quantification in the qPCR (10-fold dilution series from 1 ng/µl to $10^{-6}$ ng/µl) were generated using purified PCR products for all primer pairs after measuring the PCR product concentrations using a NanoDrop™ 1000 spectrophotometer (Peqlab). The following cycling parameters were used: 95 °C for 10 min, followed by 45 cycles of 68 °C for 30 s, 72 °C for 20 s, and 95 °C for 15 s. Subsequently, a melting curve analysis was conducted by increasing the temperature from 60 °C to 95 °C within 20 min. For each primer pair, six replicates of one of the standard concentrations were used to configure and calibrate the standard curve. The resulting averages were then used to correct for possible errors in the DNA concentration measurements. Absolute copy numbers were calculated based on the standard curve. The linear correlation coefficient ($R^2$) of all measured genes ranged from 0.981 to 1.000. According to the slopes of the standard curves, amplification efficiencies of the standards used in our study ranged from 93% to 102%, based on the formula $E = 10^{-1/\mathrm{slope}} - 1$.

Gene copy numbers estimated from qPCRs were first normalized against the 60S ribosomal protein L13a (Table S4), then compared using the Mann-Whitney U-test and the Kruskal-Wallis test with Dunn post hoc tests, respectively, to assess levels of significance (P < 0.05) across treatments.

Supplementary References

1. Lohse M, Bolger AM, Nagel A, Fernie AR, Lunn JE, Stitt M, Usadel B. 2012 RobiNA: a user-friendly, integrated software solution for RNA-Seq-based transcriptomics. Nucleic Acids Res. 40, W622-627. (doi:10.1093/nar/gks540).
2.
Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q, et al. 2011 Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 29(7), 644-652. (doi:10.1038/nbt.1883).
3. Langmead B, Salzberg SL. 2012 Fast gapped-read alignment with Bowtie 2. Nat. Methods 9(4), 357-359. (doi:10.1038/nmeth.1923).
4. Huang Y, Niu B, Gao Y, Fu L, Li W. 2010 CD-HIT Suite: a web server for clustering and comparing biological sequences. Bioinformatics 26(5), 680-682. (doi:10.1093/bioinformatics/btq003).
5. Delhomme N, Padioleau I, Furlong EE, Steinmetz LM. 2012 easyRNASeq: a bioconductor package for processing RNA-Seq data. Bioinformatics 28(19), 2532-2533. (doi:10.1093/bioinformatics/bts477).

Supplementary tables and figures

Table S1: Summary of all B vitamin related transporter and intracellular processing isoforms detected in the transcriptome of symbiotic and aposymbiotic D. fasciatus reared on cottonseeds. Transcripts with Aposymbiotic/Symbiotic fold change > 2 are marked with an asterisk.

| Seq. Name | Seq. Description | Symbiotic (RPKM) | Aposymbiotic (RPKM) | Fold (Aposymbiotic/Symbiotic) |
|---|---|---|---|---|
| Intracellular Processing | | | | |
| Dfas-06032 | Thiamine pyrophosphokinase (TPK) (B1) | 0.9 | 2.0 | 2.3* |
| Dfas-08844 | Riboflavin kinase (RFK) (B2) | 23.5 | 41.6 | 1.8 |
| Dfas-46813 | Nicotinamide mononucleotide adenylyltransferase (NMNAT) (B3) | 9.8 | 11.6 | 1.2 |
| Dfas-29450 | Pantothenate kinase (PANK) (B5) | 2.9 | 4.9 | 1.7 |
| Dfas-50613 | Pyridoxal kinase (PK) (B6) | 13.6 | 19.7 | 1.5 |
| Dfas-11244 | Biotin-protein ligase (BPL) (B7) | 0.3 | 0.8 | 2.4* |
| Dfas-40090 | Dihydrofolate reductase (DHFR) (B9) | 6.2 | 9.6 | 1.6 |
| Dfas-21030 | Transcobalamin 2 (TCII) (B12) | 257.8 | 237.3 | 0.9 |
| Transport | | | | |
| Dfas-27947 | Proton-coupled folate transporter (PCFT) (B9) | 1.4 | 3.4 | 2.5* |
| Dfas-08740 | Thiamine alkaline phosphatase (ALKP) (B1) | 5.5 | 25.7 | 4.7* |
| Dfas-21251 | Thiamine transporter 2 (THTR2) (B1) | 6.3 | 32.1 | 5.1* |

Table S2: Summary of all stress-related isoforms detected in the transcriptome of symbiotic and aposymbiotic D. fasciatus reared on cottonseeds. Transcripts with Aposymbiotic/Symbiotic fold change > 2 are marked with an asterisk.

| Seq. Name | Seq. Description | Symbiotic (RPKM) | Aposymbiotic (RPKM) | Fold (Aposymbiotic/Symbiotic) |
|---|---|---|---|---|
| Dfas-25287 | Heat shock protein 70 | 432.19 | 1162.36 | 2.69* |
| Dfas-38373 | Stress-activated protein kinase jnk | 2.40 | 2.51 | 1.05 |
| Dfas-54540 | Heat shock protein 90 | 107.91 | 227.00 | 2.10* |
| Dfas-38957 | Heat shock protein 70 | 27.80 | 14.23 | 0.51 |
| Dfas-28415 | Heat shock protein 70 | 3.27 | 4.21 | 1.28 |
| Dfas-40833 | Heat shock protein 90 | 135.47 | 381.08 | 2.81* |
| Dfas-08437 | Related to glyoxal oxidase precursor | 2980.39 | 3231.26 | 1.08 |
| Dfas-41199 | Related to glyoxal oxidase precursor | 171.70 | 338.03 | 1.97 |
| Dfas-11475 | Related to glyoxal oxidase precursor | 211.52 | 412.40 | 1.95 |
| Dfas-19692 | Peroxiredoxin 6 | 733.49 | 668.13 | 0.91 |
| Dfas-10108 | Ccaat enhancer-binding | 59.72 | 63.70 | 1.07 |
| Dfas-47994 | Superoxide dismutase 2 | 16.61 | 17.92 | 1.08 |
| Dfas-48940 | Superoxide dismutase | 220.71 | 165.31 | 0.75 |
| Dfas-23888 | Superoxide dismutase | 0.87 | 2.75 | 3.15* |
| Dfas-22754 | Superoxide dismutase | 2.40 | 3.39 | 1.41 |
| Dfas-52644 | Mitochondrial manganese superoxide dismutase | 36.37 | 43.21 | 1.19 |
| Dfas-39352 | Tryparedoxin peroxidase | 0.40 | 0 | N/A |
| Dfas-12213 | Peroxiredoxin- mitochondrial-like | 161.79 | 122.79 | 0.76 |
| Dfas-01237 | Peroxiredoxin 4 | 0.82 | 2.21 | 2.70* |
| Dfas-36450 | Thioredoxin peroxidase 2-like | 0.31 | 2.10 | 6.75* |
| Dfas-39210 | Thioredoxin family trp26 | 5.07 | 6.73 | 1.33 |
| Dfas-38112 | Venom allergen 5-like | 0.33 | 3.38 | 10.13* |
| Dfas-35010 | Trypsin-like protease | 1.96 | 0 | N/A |
| Dfas-31914 | Glucose transporter type 1-like | 0.94 | 1.33 | 1.41 |
| Dfas-19251 | Solute carrier family facilitated glucose transporter member 8 | 25.94 | 63.95 | 2.47* |

Table S3. Composition of the artificial diet. To generate the vitamin-deficient diet, the B vitamin solution was substituted with water.
| Component | Amount |
|---|---|
| soybean protein | 15.0 g |
| potato starch | 7.5 g |
| dextrose | 7.5 g |
| sucrose | 2.5 g |
| cellulose | 12.5 g |
| B vitamin stock solution | 15.0 ml |
| thiamine (B1) | (0.25 g) |
| riboflavin (B2) | (0.5 g) |
| nicotinamide (B3) | (1 g) |
| calcium pantothenate (B5) | (1 g) |
| pyridoxine (B6) | (0.25 g) |
| biotin (B7) | (0.02 ml) |
| folic acid (B9) | (0.25 g) |
| cobalamin (B12) | (1 g) |
| water | (ad 1000 ml) |
| soybean oil | 20 ml |
| wheat germ | 10 g |
| water | 25 ml |

Table S4: Summary of all primer pairs utilized in this study.

| Primer | Primer Sequence | Orientation | Target Gene |
|---|---|---|---|
| TPK fwd | TCTTTCCGAAGGATTTGGTG | Fwd. | Thiamine pyrophosphokinase (TPK) |
| TPK rev | TTTCGGGACAAATTCGAGAG | Rev. | Thiamine pyrophosphokinase (TPK) |
| RFK fwd | GATGGGCGAAACTTGAAGAA | Fwd. | Riboflavin kinase (RFK) |
| RFK rev | AGGGTTCCAACCAACACTCA | Rev. | Riboflavin kinase (RFK) |
| NMNAT fwd | CTGGGAATGCAGTCAGGAAA | Fwd. | Nicotinamide mononucleotide adenylyltransferase (NMNAT) |
| NMNAT rev | TGGCTCGTCATTCTCATTCA | Rev. | Nicotinamide mononucleotide adenylyltransferase (NMNAT) |
| PANK fwd | GGAGACAGCACAAGCTGGAC | Fwd. | Pantothenate kinase (PANK) |
| PANK rev | ACCGCTCTACCCATTCCTTC | Rev. | Pantothenate kinase (PANK) |
| PK fwd | ATAGCGCTCCATGCTTCATC | Fwd. | Pyridoxal kinase (PK) |
| PK rev | GATCCGGTAATGGGTGACAA | Rev. | Pyridoxal kinase (PK) |
| BPL fwd | GGATGTGCTACGTTTTCTCTCC | Fwd. | Biotin-protein ligase (BPL) |
| BPL rev | TAAGGCCTCAAGTCCGTGTT | Rev. | Biotin-protein ligase (BPL) |
| DHFR fwd | GAGTGTCTGGATAATCGGAGGA | Fwd. | Dihydrofolate reductase (DHFR) |
| DHFR rev | TCTTCTTGGACTTCCGTTGG | Rev. | Dihydrofolate reductase (DHFR) |
| TCII fwd | CTTTAAGAAGCCGGCAACAG | Fwd. | Transcobalamin 2 (TCII) |
| TCII rev | TTGACAGGCATAAGGGTCGT | Rev. | Transcobalamin 2 (TCII) |
| ALKP fwd | GACATATGCGGCGAACAAAC | Rev. | Thiamine alkaline phosphatase (ALKP) |
| ALKP rev | GTCGGGCCTCTTGTTTAAGG | Fwd. | Thiamine alkaline phosphatase (ALKP) |
| THTR2 fwd | GCTTCGACAAGTCCATTCCA | Rev. | Thiamine transporter 2 (THTR2) |
| THTR2 rev | GATGTTCTGGTGGGCGTTAG | Fwd. | Thiamine transporter 2 (THTR2) |
| PCFT fwd | AGACGAGGCAAACTGTTCCA | Rev. | Proton-coupled folate transporter (PCFT) |
| PCFT rev | GGCGTCTTCTCTGTGCTGTT | Fwd. | Proton-coupled folate transporter (PCFT) |
| GLUT8 fwd | AGGGTGGAAGGTTTGCTTCT | Fwd. | Glucose transporter (GLUT8) |
| GLUT8 rev | GAAAGCCCTAATGGTGCTGA | Rev. | Glucose transporter (GLUT8) |
| Hsp70 fwd | GGATGCCGGTACAATTTCTG | Fwd. | Heat shock protein (Hsp) 70 |
| Hsp70 rev | GGTTCCACCACCAAGATCAA | Rev. | Heat shock protein (Hsp) 70 |
| RPL13A fwd | CGAGGATAAGACGGAACTTGG | Rev. | 60S ribosomal protein L13a |
| RPL13A rev | CATGAAGGCTATGGGTCTGG | Fwd. | 60S ribosomal protein L13a |

Figure Captions

Figure S1.
Midgut transcriptome of cottonseed-fed symbiotic and aposymbiotic D. fasciatus. (a) Illumina transcriptome sequencing and assembly statistics. (b) Histogram of contig lengths as assembled by Trinity. (c) Summary of the BLAST and GO annotation results. (d) Heatmap of the normalized expression (RPKM) of all differentially expressed transcripts. For each row, the expression was centered according to the mean and standard deviation. The row dendrogram represents the Pearson correlation of all genes.

Figure S2. Differential expression of non-vitamin transporter and housekeeping genes by qPCR for symbiotic and aposymbiotic D. fasciatus reared on a complete or vitamin-deficient artificial diet (normalized to the 60S ribosomal protein L13a). (a) Non-B vitamin transporter genes. (b) Elongation factor 1 alpha (EF1a). Shading of boxes signifies the experimental treatments (see legend). Lines represent medians, boxes comprise the 25–75 percentiles, and whiskers denote the range. Significant differences were assessed based on the normalized expression in reference to the 60S ribosomal protein L13a with the Kruskal-Wallis test and Dunn post hoc tests.
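The normalized qPCR values summarized in Figure S2 rest on the standard-curve quantification and reference-gene normalization described in Supplementary Methods (b). As a rough Python sketch of that arithmetic, assuming a linear standard curve of the form Ct = slope × log10(quantity) + intercept: the efficiency formula is the one given in the methods, while the function names, slope, intercept, and Ct values below are hypothetical and not measurements from the study.

```python
def amplification_efficiency(slope):
    """E = 10^(-1/slope) - 1, with slope taken from the standard curve."""
    return 10 ** (-1.0 / slope) - 1.0


def quantity_from_ct(ct, slope, intercept):
    """Invert the assumed standard curve Ct = slope * log10(quantity) + intercept."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical standard curve; a slope near -3.32 corresponds to ~100% efficiency.
slope, intercept = -3.3219, 30.0
efficiency = amplification_efficiency(slope)

# Hypothetical Ct values for a target gene and the reference gene (RPL13A)
# measured in the same cDNA extract. One curve is reused for both genes only
# to keep the example short; in practice each primer pair has its own curve.
target_quantity = quantity_from_ct(20.0, slope, intercept)
reference_quantity = quantity_from_ct(17.0, slope, intercept)

# Normalized expression: target quantity relative to the reference gene.
normalized = target_quantity / reference_quantity
print(round(efficiency, 3), round(normalized, 3))
```

Under this formula, the reported efficiencies of 93% to 102% correspond to standard-curve slopes between roughly -3.5 and -3.3.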