SUPPLEMENTARY MATERIALS AND METHODS

Choanoflagellate Culture Conditions
Cultures of the loricate choanoflagellates Diaphanoeca grandis (Ellis 1930) and Stephanoeca diplocostata (Ellis 1929) were obtained from Barry Leadbeater (University of Birmingham, U.K.). Each species was cultured in artificial seawater medium (36.5 g L-1 Marin Salts (Dr. Biener Aquarientechnik, Wartenberg, Germany) in ddH2O). The artificial seawater was vacuum-filtered through a 0.22 µm Stericup GP Express Plus filter (Millipore, Massachusetts, U.S.A.) into a sterile 1 L screw-top glass bottle (Schott Duran) and then sterilized by autoclaving. New cultures were split under a Labcaire PCR 6 Workstation hood (Labcaire Systems, Avon, U.K.) to reduce the risk of contamination by foreign microorganisms. Cultures were grown in 100 ml, 250 ml and 500 ml glass bottles with plastic screw tops (Schott Duran). Starting cultures of 50-200 ml were topped up with sterile artificial seawater to 80% of the volume of the culture vessel, and cultures were split every 3-5 weeks. Up to three grains of dry-autoclaved white long-grain rice were added to each culture to provide nutrition for the prey bacteria. Cultures of both species were maintained in an incubator at 13.5 °C.

RNA Preparation
RNA was extracted from cultures of S. diplocostata using a TRIzol-based method (Invitrogen), as employed in [1]. No antibiotic treatment or filtration-based purification was used, in case either interfered with normal choanoflagellate gene expression, in particular the transcription of biomineralization-related genes. The concentration and integrity of each RNA sample were assessed using a 2100 Bioanalyser (Agilent Technologies, Waldbronn, Germany). The RNA 6000 Nano chip and 2100 Expert software (Agilent Technologies, Waldbronn, Germany) were used to generate an electropherogram of each RNA sample according to the manufacturer's instructions. Degraded RNA samples were rejected.
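The culture-medium quantities given under Choanoflagellate Culture Conditions reduce to simple arithmetic. The sketch below, using hypothetical helper names that are not part of the original protocol, computes the mass of Marin Salts needed for a given volume at 36.5 g L-1 and the volume of sterile seawater needed to bring a starting culture up to 80% of the vessel volume.

```python
# Hypothetical helpers illustrating the culture-medium arithmetic described above.
SALT_G_PER_L = 36.5   # Marin Salts concentration (g L-1), from the protocol
FILL_FRACTION = 0.80  # cultures are topped up to 80% of the vessel volume


def salt_mass_g(volume_ml: float) -> float:
    """Mass of Marin Salts (g) needed to make `volume_ml` of artificial seawater."""
    return SALT_G_PER_L * volume_ml / 1000.0


def top_up_volume_ml(vessel_ml: float, culture_ml: float) -> float:
    """Volume of sterile seawater (ml) to add so the culture fills 80% of the vessel."""
    target = FILL_FRACTION * vessel_ml
    if culture_ml > target:
        raise ValueError("starting culture already exceeds the 80% fill volume")
    return target - culture_ml


if __name__ == "__main__":
    print(f"{salt_mass_g(1000):.1f} g of salts per 1 L of medium")
    for vessel, start in [(250, 50), (500, 200)]:
        print(f"{vessel} ml vessel, {start} ml culture: add {top_up_volume_ml(vessel, start):.0f} ml")
```

For example, a 50 ml starting culture in a 250 ml bottle needs about 150 ml of fresh medium.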
RNA samples were pooled to give 55 µg of total culture RNA, of which approximately 10 µg was S. diplocostata total RNA.

cDNA Library Preparation
Because the RNA samples were inevitably contaminated with large amounts of rRNA and with RNA from the prey bacteria present in the cultures, two rounds of poly(A) mRNA enrichment were performed using the Dynabeads mRNA Purification Kit (Invitrogen). Residual rRNA contamination was then determined by running 1 µl of the enriched mRNA on a 2100 Bioanalyser using an RNA 6000 Pico chip (Agilent Technologies, Waldbronn, Germany). A total of 180 ng of enriched mRNA (<10% rRNA contamination) was used to construct a 454 transcriptome library as outlined in the cDNA Rapid Library Preparation Method (Roche). Library quality was assessed by running 1 µl of the library on a DNA High Sensitivity LabChip (Agilent), and the number of viable library molecules per µl was determined using the KAPA 454 qPCR library quantification kit (Kapa Biosystems) on a StepOne qPCR machine (Applied Biosystems).

454 Sequencing
For full-scale emulsion PCR (emPCR), a ratio of 1.3 library molecules per bead was used. Eight SV oil tubes from the GS Titanium SV emPCR Kit (Lib-L) were used to generate sufficient enriched, templated beads for 454 sequencing. Approximately 2 × 10^6 enriched templated beads were subjected to 454 pyrosequencing on half of a PicoTiterPlate on the GS FLX sequencer (Roche) using GS FLX Titanium chemistry, according to the manufacturer's protocol.

Assembly Method
Post-run sequence outputs were viewed in gsRunBrowser to verify their metrics and confirm that sequencing had been successful. An assembly of the S. diplocostata sequence data was generated using Newbler v2.3 (Roche).

Bioinformatic Analysis
The S. diplocostata EST contigs were filtered to remove contigs ≤10 bp in length, which removed 528 contigs from the dataset.
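The length filter described above can be sketched in a few lines. The example below is a minimal Python sketch, not the authors' pipeline: the FASTA parser is a hand-written stand-in (a library parser would serve equally well), and only the ≤10 bp cut-off is taken from the text. Applied to the 26,325 Newbler contigs, this step removed 528 and left the 25,797 contigs analysed below.

```python
from typing import Dict, Iterable, List, Optional, Tuple

MAX_REMOVED_LEN = 10  # contigs of <= 10 bp are removed, as described above


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {contig_id: sequence} dict."""
    records: Dict[str, str] = {}
    current: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                records[current] = "".join(chunks)
            current = line[1:].split()[0]  # ID is the first word of the header
            chunks = []
        else:
            chunks.append(line)
    if current is not None:
        records[current] = "".join(chunks)
    return records


def filter_short(contigs: Dict[str, str],
                 max_removed_len: int = MAX_REMOVED_LEN) -> Tuple[Dict[str, str], int]:
    """Drop contigs of length <= max_removed_len; return kept contigs and the number removed."""
    kept = {cid: seq for cid, seq in contigs.items() if len(seq) > max_removed_len}
    return kept, len(contigs) - len(kept)


if __name__ == "__main__":
    demo = [">c1", "ACGTACGTACGT", ">c2", "ACGT", ">c3", "ACGTAC", "GTACGT"]
    kept, removed = filter_short(read_fasta(demo))
    print(sorted(kept), removed)
```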
Custom-written BioPerl scripts were used to assign a source organism to each of the remaining 25,797 contigs in two steps: first, each contig was searched by tBLASTx [2] against a local copy of NCBI's non-redundant (nr) nucleotide database (October 2010 release), accepting hits with an E-value < 0.01; second, the best hit for each contig was looked up in the Entrez nucleotide database (http://www.ncbi.nlm.nih.gov/nucleotide) to find its taxonomic identity within the Genbank taxonomy (http://www.ncbi.nlm.nih.gov/taxonomy). In this way, a probable taxonomic identity was assigned to just over half of the contigs (13,716 of 25,797; 53%), of which 3,376 were identified as choanoflagellate in origin. A stand-alone copy of InterProScan v4.6 [3] was used to obtain InterPro and GO annotations for the contigs (InterPro database version 27.0; http://www.ebi.ac.uk/interpro).

Diaphanoeca grandis Genomic DNA Extraction and Analysis
To reduce bacterial contamination, cultures of D. grandis were treated for 36 hours with a combination of 2.4 ng/ml ampicillin (Sigma), 1.2 ng/ml kanamycin (Sigma) and 1.2 ng/ml streptomycin-penicillin (Gibco). In a further attempt to remove part of the natural bacterial population, 50 ml of culture was then filtered through a 20 µm nylon mesh (Small Parts Inc., Florida, USA) and 15 ml of the filtrate was collected for gDNA extraction. Approximately 20 µg of gDNA was extracted, using a CTAB buffer-based method [4], from the D. grandis cultures observed to have the highest amount of choanoflagellate material relative to bacterial contamination. The extracted gDNA was sequenced as 120 bp paired-end reads on an Illumina HiSeq 2000 (Illumina Inc.). The reads were assembled into contigs with ABySS v1.2.5 [5] using default settings, and the assembled genomic dataset was searched with tBLASTx [2] to detect sequence similarity to individual genes.
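The first step of the contig classification described above (keeping, for each contig, the single best tBLASTx hit with E-value < 0.01) is sketched below in Python rather than BioPerl. It assumes the search results were written in the standard 12-column tabular BLAST format (-m 8 in legacy BLAST, -outfmt 6 in BLAST+: query, subject, % identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score); that format and the bit-score tie-break are assumptions, not details of the original scripts. The subsequent Entrez taxonomy lookup is not reproduced.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple

E_VALUE_CUTOFF = 0.01  # acceptance threshold used above


def best_hits(rows: Iterable[List[str]],
              cutoff: float = E_VALUE_CUTOFF) -> Dict[str, Tuple[str, float, float]]:
    """Return {query: (subject, evalue, bitscore)}, keeping the best hit per query.

    Each row holds the 12 tabular BLAST columns; index 10 is the E-value
    and index 11 is the bit score.
    """
    best: Dict[str, Tuple[str, float, float]] = {}
    for row in rows:
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject = row[0], row[1]
        evalue, bits = float(row[10]), float(row[11])
        if evalue >= cutoff:
            continue  # not significant at the chosen threshold
        current = best.get(query)
        # Prefer the lower E-value; break ties with the higher bit score.
        if current is None or (evalue, -bits) < (current[1], -current[2]):
            best[query] = (subject, evalue, bits)
    return best


if __name__ == "__main__":
    demo = (
        "contig1\tgi|1\t90.0\t100\t10\t0\t1\t300\t1\t300\t1e-50\t200\n"
        "contig1\tgi|2\t80.0\t100\t20\t0\t1\t300\t1\t300\t1e-20\t90\n"
        "contig2\tgi|3\t50.0\t40\t20\t0\t1\t120\t1\t120\t0.5\t30\n"
    )
    hits = best_hits(csv.reader(io.StringIO(demo), delimiter="\t"))
    print(hits)
```

Contigs absent from the returned dictionary correspond to those with no significant similarity, which are counted separately in Table S2.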
A wider taxonomic assignment was carried out with the metagenomic classification program PhymmBL v3.2 [6] to classify contigs as bacterial or choanoflagellate in origin. The choanoflagellate reference dataset comprised the choanoflagellate sequences available in the EMBL/Genbank WGS genome and non-redundant nucleotide databases; the prokaryotic reference dataset was the bacterial/archaeal genome database supplied with PhymmBL v3.2. Contigs were arbitrarily divided into those <1 kb and those >1 kb in length, and the two sets were run as separate queries, both with default PhymmBL settings.

SUPPLEMENTARY RESULTS

Stephanoeca diplocostata EST Dataset
RNA samples extracted from S. diplocostata cultures were treated with oligo-dT beads to enrich for polyadenylated eukaryotic mRNA. Enrichment was assessed using the Agilent Bioanalyser RNA 6000 Pico chip and Pico assay software. Compared with control eukaryotic RNA samples, the first round of enrichment produced a large reduction in contamination, and a second round of poly(A) selection markedly reduced the rRNA content: almost all of the rRNA peaks disappeared while a broad mRNA peak was retained. The remaining rRNA was measured at 7% of the total sample, below the 10% threshold recommended for 454 EST sequencing. After two rounds of poly(A) enrichment, the total choanoflagellate RNA amounted to 180 ng.

The results of the EST sequencing and assembly are summarized in Table S1. The 454 Titanium run produced 0.261 Gb of sequence. The mean read length was 329 bases (standard deviation ±110 bases; median 347 bases) and the maximum read length was 659 bases, broadly in line with the expected metrics for this platform. The Q40 score (the percentage of bases called with at least 99.99% accuracy) was 94.4%. The Newbler assembly produced 26,325 contigs with a mean length of 962 bp; the longest contig was 12.6 kb.
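Two metrics quoted in this section can be made concrete. The Phred scale defines quality as Q = -10 log10(P), where P is the probability that a base call is wrong, so Q40 corresponds to P = 10^-4 (99.99% accuracy). The N50 reported in Table S1 (and for the D. grandis assembly below) is the length L such that contigs of length ≥ L together contain at least half of all assembled bases. The Python sketch below implements these textbook definitions; it is illustrative only and is not the code used by Newbler or ABySS.

```python
import math
from typing import Iterable


def phred_to_accuracy(q: float) -> float:
    """Base-call accuracy implied by Phred quality q, where q = -10 * log10(P_error)."""
    return 1.0 - 10 ** (-q / 10.0)


def accuracy_to_phred(accuracy: float) -> float:
    """Phred quality corresponding to a base-call accuracy (0 < accuracy < 1)."""
    return -10.0 * math.log10(1.0 - accuracy)


def n50(lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L contain at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contigs supplied")
    half = sum(ordered) / 2.0
    running = 0
    for length in ordered:
        running += length  # accumulate bases from the longest contig downwards
        if running >= half:
            return length
    return ordered[-1]  # only reached if lengths are non-positive


if __name__ == "__main__":
    for q in (20, 30, 40):
        print(f"Q{q}: {phred_to_accuracy(q):.4%} accuracy")
    print("N50 of toy assembly:", n50([2, 3, 4, 5, 6]))
```

In the toy assembly the contigs total 20 bases; the two longest (6 and 5) already cover 11 of them, so the N50 is 5.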
tBLASTx Analysis
The tBLASTx search of the EST dataset against the full EMBL/Genbank non-redundant nucleotide database was used to assign (a) similarity and (b) taxonomic identity to each contig. Top hits to selected taxonomic groups are given in Table S2, both as counts and as percentages of the 13,716 contigs with a significant hit. Note that these are top hits only; in the vast majority of cases, equally or only marginally less significant hits to M. brevicollis were also returned. The results in Table S2 demonstrate that the poly(A) enrichment reduced bacterial, archaeal, viral and rRNA contamination. The enrichment is estimated to have reduced the prokaryotic content from approximately 80% of the starting material to 13.5% of the contigs with hits (those with a bacterial top hit). The true number of bacterial contigs may be lower still, given the extent of prokaryote-to-eukaryote lateral gene transfer reported in choanoflagellates [7,8]. Of the tBLASTx hits, 241 were to known ribosomal RNA sequences: 37 (15%) to eukaryotic rRNA and the remaining 204 (85%) to bacterial rRNA, again demonstrating the success of the mRNA enrichment. All predicted eukaryotic rRNA contigs produced perfect hits (E-value = 0.0) to the S. diplocostata 18S and 28S sequences in the EMBL/Genbank databases.

There is no evidence of contamination from other eukaryotes: apart from M. brevicollis, no single species attracted a large number of hits, and no non-choanoflagellate housekeeping genes were found. Only 62 top hits were to human sequences, each having low E-values, indicating that the RNA samples were not contaminated by laboratory workers during RNA extraction or cDNA library construction. Approximately 24% of top hits were to M. brevicollis sequences, with a further 0.6% to other choanoflagellate sequences in the EMBL/Genbank database (this analysis pre-dates the submission of the Salpingoeca rosetta genome to Genbank).
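The percentages in Table S2 are expressed relative to the 13,716 contigs that returned a hit, not to all 25,797 contigs searched. The short Python check below uses counts copied from Table S2 to reproduce the reported values and to show how much smaller they become when the full contig set is used as the denominator; the dictionary and function names are illustrative.

```python
# Contigs whose top tBLASTx hit fell in each taxon (counts copied from Table S2).
TOP_HIT_COUNTS = {
    "Fungi/Metazoa (Opisthokonts)": 9352,
    "Metazoa": 5457,
    "Choanoflagellates": 3376,
    "Monosiga brevicollis": 3286,
    "Viridiplantae": 926,
    "Stramenopiles": 749,
    "Bacteria": 1850,
}
CONTIGS_WITH_HITS = 13_716
CONTIGS_WITHOUT_HITS = 12_081
ALL_CONTIGS = 25_797


def percent(count: int, denominator: int) -> float:
    """Percentage of `denominator` represented by `count`, rounded to one decimal place."""
    return round(100.0 * count / denominator, 1)


if __name__ == "__main__":
    for taxon, n in TOP_HIT_COUNTS.items():
        print(f"{taxon:30s} {percent(n, CONTIGS_WITH_HITS):5.1f}% of hits "
              f"{percent(n, ALL_CONTIGS):5.1f}% of all contigs")
```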
These hits included housekeeping genes expected to be conserved within clades, e.g. ribosomal RNA and alpha-tubulin [9]. M. brevicollis was the single species with the largest number of hits, most of which had highly significant E-values, confirming that loricate choanoflagellate genes had been successfully sequenced. The metazoans were the clade producing the most top hits (39.8%). This largely reflects the composition of the tBLASTx target database, which contained over 50 fully sequenced animal species but only one fully sequenced choanoflagellate, Monosiga brevicollis [10], as well as many more smaller-scale animal sequence submissions to EMBL/Genbank. The high number of hits to metazoan sequences (and to opisthokont sequences, 68.2%) again supports the opisthokont affinity of loricate choanoflagellates and the close evolutionary relationship between choanoflagellates and metazoans [11–13].

A further notable finding of the tBLASTx analysis of the S. diplocostata EST dataset is the large number of hits to sequences from other, distantly related eukaryotic groups, most prominently the Viridiplantae (archaeplastids; 6.8%) and the stramenopiles (5.5%). Given the relatively sparse sampling of these groups by large-scale sequencing projects, bias due to sequence availability cannot fully explain these results. One explanation is loss, in the close relatives of loricate choanoflagellates, of genes inherited from the last eukaryotic common ancestor; gene loss has been observed in non-loricate choanoflagellates [14] and in metazoans [15]. Another explanation is eukaryote-to-eukaryote lateral gene transfer, which has been reported as a prominent feature of choanoflagellate genomes [16,17].

Diaphanoeca grandis Genome Dataset
Illumina sequencing of genomic DNA from D. grandis cultures provided 329,237,297 bp of sequence data.
The reads were assembled into 921,181 contigs (sequence dataset available from the authors on request); however, these were mostly short (N50 = 725 bp; mean contig length = 357.41 bp). Local tBLASTx searches detected 100% identity matches to known D. grandis sequences [1,9], confirming that D. grandis genomic material had been successfully sequenced. PhymmBL analysis found that the vast majority (>98%) of contigs longer than 1 kb, and all contigs longer than 10 kb, were of bacterial origin. The dataset contained too few choanoflagellate genes, and too few complete genes, to merit further large-scale taxonomic or protein domain analysis. The partial genome sequence did, however, allow tBLASTx detection of contigs with significant similarity to parts of the SdSITa sequence (see Results). PCR primers were designed from the longest such contig (see Table S4), and the amplified PCR product (DgSITa) was cloned, sequenced and used for all further analyses (see Materials and Methods, Results).

REFERENCES
1 Steenkamp, E. T., Wright, J. & Baldauf, S. L. 2006 The protistan origins of animals and fungi. Molecular Biology and Evolution 23, 93-106.
2 Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D. J. 1997 Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25, 3389-3402.
3 Quevillon, E., Silventoinen, V., Pillai, S., Harte, N., Mulder, N., Apweiler, R. & Lopez, R. 2005 InterProScan: protein domains identifier. Nucleic Acids Research 33, W116-W120.
4 Doyle, J. J. & Doyle, J. L. 1987 A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochemical Bulletin 19, 11-15.
5 Simpson, J. T., Wong, K., Jackman, S. D., Schein, J. E., Jones, S. J. M. & Birol, I. 2009 ABySS: a parallel assembler for short read sequence data. Genome Research 19, 1117-1123.
6 Brady, A. & Salzberg, S. L. 2009 Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models. Nature Methods 6, 673-676.
7 Torruella, G., Suga, H., Riutort, M., Peretó, J. & Ruiz-Trillo, I. 2009 The evolutionary history of lysine biosynthesis pathways within eukaryotes. Journal of Molecular Evolution 69, 240-248.
8 Sun, G. & Huang, J. 2011 Horizontally acquired DAP pathway as a unit of self-regulation. Journal of Evolutionary Biology 24, 587-595.
9 Carr, M., Leadbeater, B. S. C., Hassan, R., Nelson, M. & Baldauf, S. L. 2008 Molecular phylogeny of choanoflagellates, the sister group to Metazoa. Proceedings of the National Academy of Sciences of the United States of America 105, 16641-16646.
10 King, N. et al. 2008 The genome of the choanoflagellate Monosiga brevicollis and the origin of metazoans. Nature 451, 783-788.
11 Nitsche, F., Carr, M., Arndt, H. & Leadbeater, B. S. C. 2011 Higher level taxonomy and molecular phylogenetics of the Choanoflagellatea. Journal of Eukaryotic Microbiology 58, 452-462.
12 Ruiz-Trillo, I., Roger, A. J., Burger, G., Gray, M. W. & Lang, B. F. 2008 A phylogenomic investigation into the origin of metazoa. Molecular Biology and Evolution 25, 664-672.
13 Torruella, G., Derelle, R., Paps, J., Lang, B. F., Roger, A. J., Shalchian-Tabrizi, K. & Ruiz-Trillo, I. 2011 Phylogenetic relationships within the Opisthokonta based on phylogenomic analyses of conserved single-copy protein domains. Molecular Biology and Evolution 29, 531-544.
14 Sebé-Pedrós, A., de Mendoza, A., Lang, B. F., Degnan, B. M. & Ruiz-Trillo, I. 2010 Unexpected repertoire of metazoan transcription factors in the unicellular holozoan Capsaspora owczarzaki. Molecular Biology and Evolution 28, 1241-1254.
15 Chauve, C., Doyon, J.-P. & El-Mabrouk, N. 2008 Gene family evolution by duplication, speciation, and loss. Journal of Computational Biology 15, 1043-1062.
16 Sun, G., Yang, Z., Ishwar, A. & Huang, J. 2010 Algal genes in the closest relatives of animals. Molecular Biology and Evolution 27, 2879-2889.
17 Nedelcu, A. M., Miles, I. H., Fagir, A. M. & Karol, K. 2008 Adaptive eukaryote-to-eukaryote lateral gene transfer: stress-related genes of algal origin in the closest unicellular relatives of animals. Journal of Evolutionary Biology 21, 1852-1860.

Supplemental Information Tables

Table S1. Summary statistics for the EST sequencing results. Statistics refer to the EST dataset after assembly by Newbler.
%A | %T | %G | %C | No. of contigs | Mean contig length (bp) | Median contig length (bp) | Max. contig length (bp) | N50 (bp)
21 | 25 | 25 | 22 | 26,325 | 962 | 720 | 12,628 | 1,300

Table S2. Top hits to contigs from the tBLASTx search against the EMBL/Genbank database. Classifications follow the Entrez taxonomy (as of October 2010). The search query contained 25,797 contigs in total; 13,716 contigs produced a hit, and 12,081 had no hit with E-value < 0.01.
Group/Species | No. of contigs with top hit to this taxon | % of total contigs with hits
Fungi/Metazoa (i.e. Opisthokonts) | 9352 | 68.2
Metazoa | 5457 | 39.8
Eumetazoa | 5210 | 38
Homo sapiens | 62 | 0.5
Choanoflagellates | 3376 | 24.6
Monosiga brevicollis | 3286 | 24
Fungi | 517 | 3.8
Amoebozoa | 230 | 1.7
Euglenozoa | 74 | 0.5
Viridiplantae | 926 | 6.8
Haptophyta | 4 | 0.03
Stramenopiles | 749 | 5.5
Diatoms | 304 | 2.2
Rhizaria | 4 | 0.03
Alveolata | 223 | 1.6
Bacteria | 1850 | 13.5
Archaea | 60 | 0.4
Viruses | 29 | 0.2

Table S3. BLAST and InterProScan results for SIT-like S. diplocostata EST contigs. Sequences are deposited in EMBL/Genbank BioProject PRJEB1282.
EMBL/Genbank Accession No. HAAH01000001 No.
of Reads 322 64 HAAH01000002 9 HAAH01000003 HAAH01000004 324 331 HAAH01000005 64 HAAH01000006 HMMPfam Domain [Region] Score PF03842 Silicon Transporter [52-491] 7.3e-76 PF03842 Silicon transporter [11-193] 3.4e-30 PF03842 Silicon transporter [2-259] 2.3e-36 PF03842 Silicon transporter [45-260] 1.1e-36 PF03842 Silicon transporter [56-494] 3.9e-76 PF03842 Silicon transporter [9-233] 3.2e-35 tBLASTx top E-Value hit PsiBLAST Top Hit E-Value P. 9.00E-27 P. 7.00E-66 tricornutum tricornutum SIT2-2 SIT3 GI:215398379 GI:215398382 7.00E-11 S. acus SIT GI:227460943 1.00E-23 S. acus SIT3 GI:227460944 1.00E-17 2.00E-31 P. N. pelliculosa tricornutum SIT1 SIT3 GI:82527174 GI:215398382 S. acus SIT 2.00E-24 P. 6.00E-28 GI:227460943 tricornutum Cell Surface Receptor Protein GI:219116172 1.00E-27 1.00E-66 P. P. tricornutum tricornutum SIT2-2 SIT3 GI:215398379 GI:215398382 9.00E-11 P. 4.00E-28 tricornutum SIT3 S. acus SIT GI:215398382 GI:227460943

Table S4. Primer sequences designed for the amplification of S. diplocostata SIT-like (SdSIT) and D. grandis SIT-like (DgSIT) genes. All sequences are given 5'-3'. Primers were synthesized by Sigma.
Primer | Sequence (5'-3')
DgSITa_R | GGCATGAGCACGGTGTAGTACGC
DgSITa_F | AACAATGGAACAACCCTCCATGGG
SdSIT_24102_F | CCATCATCTAGAAGATCCTCAAAG
SdSIT_24102_R | CGTATTTAAGTAATGAAACGATAGTGT
SdSIT_00527_F1 | CACCCGACCACAAGGACCAG
SdSIT_00527_F2 | ACAATGGATAAGAGCCACATCC
SdSIT_00527_R1 | GTGGAAATAATAAAGATTTAATGAGAGTAC
SdSIT_00527_R2 | AAAGATTTAATGAGAGTACAACAATTACCC
SdSIT_10214_F | AACATGGAGAAAAGCCACG
SdSIT_R | GGCTGGTGCAGGTCAAATGGT

Table S5. Successful primer combinations and their resulting products.
Product name | Forward primer | Reverse primer
SdSITa | SdSIT_24102_F | SdSIT_24102_R
SdSITb.1 | SdSIT_00527_F1 | SdSIT_00527_R2
SdSITb.2 | SdSIT_00527_F2 | SdSIT_00527_R1
SdSITc | SdSIT_10214_F | SdSIT_R
DgSITa | DgSITa_F | DgSITa_R

Table S6. Choanoflagellate SIT similarity and protein domain search results. All analyses were conducted using default settings.
BLAST searches were done against the EMBL/Genbank non-redundant databases.
Gene (EMBL/Genbank Accession No.) | tBLASTx top hit | E-value | PsiBLAST top hit | E-value | InterProScan HMMPfam domain [region] | Score
SdSITa (HE981735) | P. tricornutum SIT2-2 GI:215398379 | 2e-31 | P. tricornutum SIT2-2 GI:215398382 | 4e-68 | PF03842 Silicon Transporter [42-491] | 9.4e-76
SdSITb (HE981736) | P. tricornutum SIT2-2 GI:215398379 | 3e-27 | P. tricornutum SIT2-2 GI:215398382 | 5e-68 | PF03842 Silicon Transporter [40-475] | 2.8e-76
SdSITc (HE981737) | S. acus SIT GI:227460943 | 2e-24 | P. tricornutum SIT2-2 GI:215398382 | 2e-67 | PF03842 Silicon Transporter [40-474] | 4.4e-75
DgSITa (HE981738) | C. fusiformis SIT5 GI:3283037 | 3e-17 | C. fusiformis SIT3 GI:3283034 | 7e-22 | PF03842 Silicon Transporter [3-180] | 2.2e-30

Table S7. Results of WoLF PSORT analysis of S. diplocostata SITs. For all three available eukaryotic subcellular location databases, the majority prediction was localization to the plasma membrane.
Gene | Prediction vs. Animal database | Prediction vs. Plant database | Prediction vs. Fungal database
SdSITa | 31 Plasma Membrane; 1 Golgi Membrane | 11 Plasma Membrane; 2 E.R.; 1 Vacuole | 23 Plasma Membrane; 3 E.R.
SdSITb | 32 Plasma Membrane | 11 Plasma Membrane; 2 E.R.; 1 Vacuole | 26 Plasma Membrane; 1 E.R.
SdSITc | 32 Plasma Membrane | 11 Plasma Membrane; 2 E.R.; 1 Vacuole | 26 Plasma Membrane; 1 E.R.

Table S8. Significant tBLASTx hits to SdSIT genes. These 156 sequences were used in a ClustalX alignment to identify conserved protein motifs and functionally relevant (charged or hydroxylated) residues.
EMBL/Genbank Gene Identifier Number gi|215398382 gi|219116172 gi|3283034 gi|1480867 gi|3283030 gi|3283038 gi|3283036 gi|3283032 gi|227460944 gi|82527177 gi|219128344 gi|219126028 gi|82527195 gi|82527191 gi|82527197 gi|224004538 gi|82527193 gi|224003147 gi|224002056 gi|82527175 gi|82527161 gi|82527185 gi|82527183 gi|82527179 gi|82527181 gi|82527169 gi|82527167 Group Species Pennate Diatom Pennate Diatom Pennate Diatom Pennate Diatom Pennate Diatom Pennate Diatom Pennate Diatom Pennate Diatom Pennate Diatom Pennate Diatom Pennate Diatom Pennate Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Pennate Diatom Pennate Diatom Pennate Diatom Pennate Diatom Pennate Diatom Pennate Diatom Pennate Diatom Pennate Diatom Phaeodactylum tricornutum Phaeodactylum tricornutum Cylindrotheca fusiformis Cylindrotheca fusiformis Cylindrotheca fusiformis Cylindrotheca fusiformis Cylindrotheca fusiformis Cylindrotheca fusiformis Synedra acus Nitzschia alba Phaeodactylum tricornutum Phaeodactylum tricornutum Thalassiosira pseudonana Skeletonema costatum Thalassiosira pseudonana Thalassiosira pseudonana Thalassiosira pseudonana Thalassiosira pseudonana Thalassiosira pseudonana Fistulifera pelliculosa Phaeodactylum tricornutum Nitzschia sp. 
KKT-2005 Nitzschia alba Nitzschia alba Nitzschia alba Fistulifera pelliculosa Fistulifera pelliculosa gi|94983079 gi|94983081 gi|94983087 gi|94983177 gi|94983155 gi|94983169 gi|82527199 gi|94983211 gi|94983089 gi|94983085 gi|94983191 gi|94983153 gi|94983141 gi|82527201 gi|94983171 gi|82527163 gi|82527173 gi|94983193 gi|94983143 gi|82527165 gi|94983229 gi|94983165 gi|94983181 gi|94983133 gi|94983093 gi|94983235 gi|94983111 gi|94983167 gi|94983097 gi|94983129 gi|94983103 gi|94983227 gi|94983091 gi|94983149 gi|94983223 gi|94983209 gi|82527171 gi|94983175 gi|94983131 gi|94983233 gi|94983095 gi|94983189 gi|94983105 gi|94983151 gi|94983219 gi|94983199 gi|94983107 gi|94983299 Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Pennate Diatom Pennate Diatom Centric Diatom Centric Diatom Pennate Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Pennate Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Pennate Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Thalassiosira pseudonana Thalassiosira pseudonana Porosira glacialis Thalassiosira weissflogii Thalassiosira weissflogii Minidiscus trioculatus Bacterosira sp. CCMP991 Thalassiosira weissflogii Porosira pseudodenticulata Porosira glacialis Thalassiosira rotula Minidiscus trioculatus Thalassiosira nodulolineata Thalassiosira weissflogii Bacterosira sp. CCMP991 Fistulifera pelliculosa Fistulifera pelliculosa Thalassiosira rotula Thalassiosira nodulolineata Fistulifera pelliculosa Bacterosira bathyomphala Thalassiosira minima Thalassiosira sp. 
CCMP1065 Thalassiosira gessneri Lauderia annulata Skeletonema menzellii Cyclotella cf. meneghiniana Thalassiosira minima Thalassiosira punctigera Thalassiosira gessneri Cyclotella striata Bacterosira bathyomphala Porosira pseudodenticulata Thalassiosira sp. CCMP353 Detonula pumila Thalassiosira weissflogii Fistulifera pelliculosa Thalassiosira weissflogii Thalassiosira gessneri Skeletonema subsalsum Thalassiosira punctigera Thalassiosira rotula Cyclotella striata Thalassiosira sp. CCMP353 Shionodiscus ritscheri Thalassiosira pacifica Cyclotella cf. meneghiniana Stephanodiscus minutulus gi|94983101 gi|94983275 gi|94983237 gi|94983249 gi|94983243 gi|82527189 gi|94983245 gi|94983145 gi|94983225 gi|94983289 gi|94983119 gi|94983293 gi|94983217 gi|94983203 gi|94983241 gi|94983291 gi|94983099 gi|94983221 gi|94983259 gi|94983127 gi|94983163 gi|94983247 gi|94983161 gi|94983121 gi|94983231 gi|94983123 gi|94983285 gi|94983301 gi|94983257 gi|94983173 gi|94983255 gi|94983269 gi|94983281 gi|94983125 gi|94983277 gi|94983297 gi|94983265 gi|94983279 gi|94983239 gi|94983267 gi|94983287 gi|94983251 gi|94983283 gi|94983215 gi|94983253 gi|82527187 gi|94983295 gi|94983271 Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Cyclotella cryptica Stephanodiscus neoastraea Skeletonema japonicum Cyclostephanos tholiformis Stephanodiscus 
binderanus Skeletonema costatum Stephanodiscus parvus Thalassiosira nodulolineata Detonula pumila Stephanodiscus hantzschii Cyclotella cf. meneghiniana Stephanodiscus hantzschii Thalassiosira sp. CC03-04 Thalassiosira pacifica Stephanodiscus agassizensis Stephanodiscus sp. Y98-1 Thalassiosira pseudonana Shionodiscus ritscheri Stephanodiscus minutulus Cyclotella distinguenda Thalassiosira antarctica Stephanodiscus parvus Thalassiosira antarctica Cyclotella cf. meneghiniana Skeletonema grethae Cyclotella distinguenda Stephanodiscus minutulus Stephanodiscus minutulus Stephanodiscus minutulus Thalassiosira oceanica Stephanodiscus niagarae Stephanodiscus niagarae Stephanodiscus reimerii Cyclotella distinguenda Stephanodiscus neoastraea Stephanodiscus yellowstonensis Discostella cf. pseudostelligera Stephanodiscus reimerii Stephanodiscus agassizensis Discostella stelligera Cyclostephanos sp. WTC16 Cyclostephanos invisitatus Stephanodiscus reimerii Thalassiosira sp. CC03-04 Cyclostephanos invisitatus Paralia sulcata Stephanodiscus yellowstonensis Discostella pseudostelligera gi|94983261 gi|76594269 gi|94983273 gi|20799543 gi|94983263 gi|94983139 gi|94983157 gi|94983117 gi|94983137 gi|94983187 gi|94983179 gi|94983213 gi|94983115 gi|94983135 gi|146746039 gi|94983183 gi|94983083 gi|94983159 gi|94983205 gi|94983207 gi|125661882 gi|94983147 gi|148250143 gi|125661884 gi|94983109 gi|94983185 gi|94983195 gi|82919490 gi|94983197 gi|94983113 gi|148250145 gi|70797601 gi|71152682 Centric Diatom Centric Diatom Centric Diatom Pennate Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Pennate Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Centric Diatom Pennate Diatom Centric Diatom Pennate Diatom Pennate Diatom Centric Diatom Centric Diatom Centric Diatom Chrysophyte (Non-diatom Stramenopile) Centric Diatom Centric Diatom Pennate Diatom Pennate Diatom Pennate 
Diatom Cyclotella bodanica Chaetoceros muelleri Discostella pseudostelligera Synedra acus var. radians Cyclotella bodanica Thalassiosira anguste-lineata Thalassiosira aestivalis Cyclotella atomus Thalassiosira anguste-lineata Thalassiosira sp. CCMP1093 Thalassiosira weissflogii Thalassiosira weissflogii Cyclotella sp. L1844 Thalassiosira anguste-lineata Achnanthes exigua Thalassiosira sp. CCMP1065 Thalassiosira pseudonana Thalassiosira aestivalis Thalassiosira pacifica Thalassiosira pacifica Rhopalodia gibba Thalassiosira eccentrica Synedra vaucheriae Epithemia zebra Cyclotella meneghiniana Thalassiosira sp. CCMP1093 Thalassiosira punctigera Ochromonas ovalis Thalassiosira punctigera Cyclotella meneghiniana Nitzschia communis Pseudo-nitzschia multiseries Pseudo-nitzschia pungens

Table S9. Stramenopile sequences used in the maximum likelihood and Bayesian analyses of S. diplocostata SITa-c and D. grandis SITa. Taxonomy and SIT classifications are based on the EMBL/Genbank annotations.
Group | Species & sequence | Accession No.
Synurophyte | Synura petersenii SIT | GI:76594265
Chrysophyte | Ochromonas ovalis SIT | GI:82919490
Centric Diatom | Thalassiosira pseudonana SIT3 | GI:82527197
Centric Diatom | Thalassiosira pseudonana SIT2 | GI:224004538
Centric Diatom | Thalassiosira pseudonana SIT1 | GI:82527193
Centric Diatom | Skeletonema costatum SIT2 | GI:82527191
Pennate Diatom | Phaeodactylum tricornutum SIT3 | GI:215398382
Pennate Diatom | Phaeodactylum tricornutum SIT2 | GI:219128344
Pennate Diatom | Phaeodactylum tricornutum SIT1 | GI:219126028
Pennate Diatom | Cylindrotheca fusiformis SIT3 | GI:3283034
Pennate Diatom | Cylindrotheca fusiformis SIT1 | GI:3283030
Pennate Diatom | Cylindrotheca fusiformis SIT5 | GI:3283038
Pennate Diatom | Cylindrotheca fusiformis SIT4 | GI:3283036
Pennate Diatom | Cylindrotheca fusiformis SIT2 | GI:3283032
Pennate Diatom | Synedra acus SIT | GI:227460944
Pennate Diatom | Nitzschia alba SIT1 | GI:82527177

Supplemental Information Figures
Figure S1.
Alignment of choanoflagellate SITs and predicted transmembrane domains. Topology prediction programs predicted multiple transmembrane domains (TMDs) but disagreed on their number. TMPred (black line) predicted ten TMDs in each of the three SdSIT proteins, and TMHMM (red line) predicted nine in each. HMMTop (blue line) predicted eleven TMDs in SdSITa and SdSITb, but nine in SdSITc. All programs predicted the N-terminus to be cytoplasmic. The dotted magenta line shows the combined results for DgSITa from TMPred (five TMDs), HMMTop (six TMDs) and TMHMM (four TMDs). The locations of these regions broadly agree with those in the SdSIT sequences; however, as the DgSITa protein sequence is incomplete, these predictions cannot be treated as definitive. A vertical line at the side denotes where a transmembrane domain continues onto the next line of the alignment. The alignment was produced using ClustalX with its default residue colour scheme.
Figure S2. Bayesian phylogenetic analysis of choanoflagellate and stramenopile SITs, showing that the choanoflagellate SITs form a monophyletic group nested within the stramenopile SITs. The highest-likelihood tree was produced by Bayesian MCMC analysis of an alignment of 769 positions under the WAG substitution model, with four gamma-distributed rate categories (alpha estimated) and an estimated proportion of invariant sites. Numbers at nodes indicate posterior probabilities. The scale bar indicates the average number of amino acid substitutions per site. Blue = pennate diatom; green = centric diatom; brown = non-diatom stramenopile; red = choanoflagellate.

Supplemental Information Alignment
Alignment S1. Alignment of the 156 non-choanoflagellate significant tBLASTx hits to SdSITa-c and DgSITa, produced using ClustalX and provided in FASTA format. The non-choanoflagellate sequences used in the alignment are listed in Table S8.