Supplementary Material

SUPPLEMENTARY MATERIALS AND METHODS
Choanoflagellate Culture Conditions
Cultures of the loricate choanoflagellates Diaphanoeca grandis (Ellis 1930) and
Stephanoeca diplocostata (Ellis 1929) were obtained from Barry Leadbeater (University of
Birmingham, U.K.). Each species was cultured in artificial seawater medium (36.5 g L⁻¹
Marin Salts (Dr. Biener Aquarientechnik, Wartenberg, Germany) in ddH2O). The artificial
seawater was vacuum-filtered through a 0.22 µm Steritop GP Express Plus filter (Millipore,
Massachusetts U.S.A.) into a sterile 1L screw-top glass bottle (Schott Duran). The filtered
artificial seawater was then sterilized by autoclaving. New cultures were split under a
Labcaire PCR 6 Workstation hood (Labcaire Systems, Avon, U.K.) to reduce the risk of
contamination by foreign microorganisms. Cultures were grown in 100ml, 250ml and 500ml
glass bottles with plastic screw-tops (Schott Duran). Starting cultures containing 50-200ml
were topped up with sterile artificial seawater to 80% of the volume of the culture vessel.
Splitting of cultures occurred every 3-5 weeks. Up to three grains of dry-autoclaved white
long grain rice were added to provide nutrition for the prey bacteria in the cultures. For both
species, cultures were maintained at 13.5 °C in an incubator.
RNA preparation
RNA was extracted from cultures of S. diplocostata using a TRIzol (Invitrogen) based
method (as employed in [1]). No antibiotic or filtration purification methods were employed
in case they interfered with the normal choanoflagellate gene expression and in particular
with transcription of biomineralization-related genes. Each RNA sample was tested for
concentration and integrity using a 2100 Bioanalyser (Agilent Technologies, Waldbronn,
Germany). The RNA 6000 Nano Chip and 2100 Expert software (Agilent Technologies,
Waldbronn, Germany) were used to generate an electropherogram of the RNA samples as per
the manufacturer’s instructions. Degraded RNA samples were rejected. RNA samples were
pooled to give 55µg of total culture RNA, of which approximately 10µg was S. diplocostata
total RNA.
cDNA Library Preparation
As the RNA samples were inevitably contaminated with large amounts of rRNA and
RNA from prey bacteria present in the cultures, two rounds of poly(A) mRNA enrichment
were performed using the Dynabeads mRNA Purification Kit (Invitrogen) and subsequent
rRNA contamination determined by running 1µl of the enriched mRNA on a 2100
Bioanalyser using an RNA 6000 Picochip (Agilent Technologies, Waldbronn, Germany).
180ng of enriched mRNA (<10% rRNA contamination) was then used to construct a 454
transcriptome library as outlined in the cDNA Rapid Library Preparation Method (Roche).
Library quality was assessed by running 1µl of the library on a DNA High Sensitivity
Labchip (Agilent), and the number of viable library molecules per µl determined using the
KAPA 454 qPCR library quantification kit (Kapabiosystems) on a Step-One qPCR machine
(Applied Biosystems).
454 Sequencing
For full scale emulsion PCR a ratio of 1.3 molecules per bead was employed.
Eight SV oil tubes from the GS Titanium SV emPCR Kit (Lib-L) were used to generate
sufficient enriched templated beads for 454 sequencing. Approximately 2 × 10⁶ enriched
templated beads were subjected to 454 pyrosequencing on half of a picotitre plate on the GS
FLX sequencer (Roche) using the GS FLX Titanium Chemistry according to the
manufacturer's protocol.
Assembly method
Post-run sequence outputs were viewed in gsRunBrowser in order to verify their
metrics and confirm that the sequencing was successful. An assembly for the S. diplocostata
sequence data was generated using the Newbler v2.3 software (Roche).
Bioinformatic Analysis
The S. diplocostata EST contigs were filtered to remove those contigs ≤10 bp in
length, resulting in the removal of 528 contigs from the dataset. Custom-written BioPerl
scripts were used to classify the source organism for each of the remaining 25,797 contigs:
first via tBLASTx [2] against a local copy of the NCBI’s non-redundant (nr) database
(October 2010 release), accepting anything with a threshold E-value < 0.01, and
secondly by taking the best hit for each contig and interrogating the Entrez nucleotide
database (http://www.ncbi.nlm.nih.gov/nucleotide) to find its taxonomic identity within the
Genbank taxonomy (http://www.ncbi.nlm.nih.gov/taxonomy).
In this way, a probable taxonomic identity was assigned to over half of the contigs
(13,716), allowing the identification of 3,376 contigs from choanoflagellates. A stand-alone copy
of InterProScan v 4.6 [3] was used to obtain InterPro and GO annotation for the contigs
(InterPro database version 27.0; http://www.ebi.ac.uk/interpro).
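The top-hit classification described above can be sketched as follows. This is a minimal Python illustration, not the custom BioPerl scripts used for the analysis. It assumes BLAST results in 12-column tabular form with the E-value in the eleventh column, and the `taxon_of` dictionary is a hypothetical stand-in for the Entrez taxonomy lookup.

```python
import csv
import io

E_VALUE_CUTOFF = 0.01  # threshold stated in the text


def best_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Map each query contig to its lowest-E-value hit below the cutoff."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue >= cutoff:
            continue  # hit is not significant at the chosen threshold
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


def classify(hits, taxon_of):
    """Assign each contig the taxon of its best hit (hypothetical lookup table)."""
    return {q: taxon_of.get(subject, "unclassified") for q, (subject, _) in hits.items()}


def make_row(query, subject, evalue):
    """Build one 12-column tabular line with placeholders in the unused columns."""
    return "\t".join([query, subject] + ["0"] * 8 + [str(evalue), "0"])


# Invented example data: contig1 has two hits, contig2 only a non-significant one.
example = "\n".join([
    make_row("contig1", "subjA", 1e-30),
    make_row("contig1", "subjB", 1e-5),
    make_row("contig2", "subjC", 0.5),
])
hits = best_hits(example)
taxa = classify(hits, {"subjA": "Choanoflagellates"})
```

Contigs with no hit below the threshold are simply absent from the result, which corresponds to the "no significant similarity" class reported in Table S2.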
Diaphanoeca grandis Genomic DNA Extraction and Analysis
Cultures of D. grandis were treated with a combination of 2.4ng/ml ampicillin
(Sigma), 1.2ng/ml kanamycin (Sigma) and 1.2ng/ml streptomycin-penicillin (Gibco) for 36
hours in order to reduce the amount of bacterial contamination. 50ml of culture was then
filtered through a 20μm nylon mesh (Small Parts Inc., Florida, USA) and 15ml of the filtrate
collected for gDNA extraction, in a further attempt to remove a portion of the natural
bacterial contamination present in the cultures. Approximately 20µg of gDNA was extracted
from those cultures of D. grandis that were observed to have the highest amount of
choanoflagellate material compared to bacterial contamination. DNA was extracted by a
CTAB Buffer based method [4].
The extracted gDNA was sequenced using 120bp paired-end reads with Illumina
HiSeq2000 sequencing (Illumina Inc.). The sequence reads produced were assembled into
contigs with ABySS v1.2.5 [5] using the default settings. The assembled genomic dataset was
analyzed further by tBLASTx [2] to detect sequence similarity to individual genes. A wider
taxonomic assignment was conducted using the metagenomic analysis program PhymmBL
v3.2 [6] to classify contigs as being of bacterial or choanoflagellate origin. The
choanoflagellate reference dataset comprised choanoflagellate sequences available in the
EMBL/Genbank WGS genomes and non-redundant nucleotide sequences databases. The
prokaryotic reference dataset used was the bacterial/archaeal genome database included with
PhymmBL v3.2. Contigs were arbitrarily divided into those <1kb and those >1kb in size.
These two datasets were used as separate queries and for both query datasets the default
PhymmBL settings were used.
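The size-based split into two query datasets can be sketched in a few lines of Python. The text does not say which set received contigs of exactly 1 kb, so placing them in the longer set is an assumption of this sketch, as are the function and variable names.

```python
def split_by_length(contigs, threshold=1000):
    """Partition {name: sequence} into (shorter than threshold, threshold or longer)."""
    short, long_ = {}, {}
    for name, seq in contigs.items():
        # Assumption: contigs of exactly `threshold` bases go to the longer set.
        (short if len(seq) < threshold else long_)[name] = seq
    return short, long_


# Invented example sequences of 300, 1000 and 2500 bases.
example = {"c1": "A" * 300, "c2": "C" * 1000, "c3": "G" * 2500}
short_set, long_set = split_by_length(example)
```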
SUPPLEMENTARY RESULTS
Stephanoeca diplocostata EST Dataset
RNA samples extracted from S. diplocostata cultures were oligo-dT bead treated to
enrich for poly(A) tagged eukaryotic mRNA. The success of this process was tested for using
the Agilent Bioanalyser Picochip and Pico assay software. By comparison to control
eukaryotic RNA samples, the first round of enrichment produced a large reduction in
contamination and a second round of poly(A) selection produced a marked reduction in
rRNA content, with almost all of the rRNA peaks disappearing but a broad mRNA peak
being retained. The remaining rRNA was measured at 7% of the total sample, below the 10%
threshold recommended for 454 EST sequencing. The total choanoflagellate RNA after two
rounds of poly(A) enrichment amounted to 180ng.
The results of the EST sequencing and assembly are summarized in table S1. The 454
Titanium sequencing produced 0.261Gb. The average read length was 329 bases (standard
deviation ±110 bases, median read length= 347 bases) with a maximum read length of 659
bases, roughly in keeping with the predicted metrics for this sequencing platform. The Q40
score (a base identification of 99.99% accuracy) was 94.4%. The Newbler assembly of the
reads produced 26325 contigs of mean length 962bp. The longest contig was 12.6kb long.
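For reference, a Phred-scaled quality score Q corresponds to an estimated base-call error probability of 10^(-Q/10). The sketch below applies that standard relation; the function names are illustrative and are not taken from the sequencing software.

```python
def phred_error_probability(q):
    """Estimated probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def phred_accuracy_percent(q):
    """Estimated base-call accuracy, in percent, for Phred score q."""
    return 100 * (1 - phred_error_probability(q))


# Accuracy implied by a few commonly quoted quality thresholds.
accuracies = {q: phred_accuracy_percent(q) for q in (20, 30, 40)}
```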
tBLASTx Analysis
The tBLASTx search of the EST dataset against the full EMBL/Genbank non-redundant
nucleotide database was used to assign (a) similarity and (b) taxonomic identity to
each contig. Hits to selected taxonomic groups are expressed as the total number and as a
percentage of the contigs with hits in Table S2. It should be noted that these are only top hits and in
the vast majority of cases equally or only marginally less significant hits to M. brevicollis
were also returned.
The tBLASTx findings in Table S2 demonstrate the success of the poly(A) enrichment
procedures in reducing the levels of bacterial, archaeal, viral and rRNA contamination. The
enrichment is estimated to have reduced prokaryotic content from approximately 80% in the
starting material to 13.5% of the final contigs. The true number of bacterial contigs may be
even lower, given the proliferation of prokaryotic-to-eukaryotic lateral gene transfer found in
choanoflagellates [7,8]. Of the tBLASTx hits 241 were to known ribosomal RNA sequences,
with 15% (37) eukaryotic rRNA and the remaining 85% (204) bacterial. Again this
demonstrates the success of the mRNA enrichment procedures. All predicted eukaryotic
rRNA contigs produced perfect (E-value= 0.0) hits to the S. diplocostata 18S and 28S
sequences from the EMBL/Genbank databases.
There is no evidence for contamination from other eukaryotes. There were no large
numbers of hits to one species (apart from M. brevicollis) and an absence of
non-choanoflagellate housekeeping genes. Only 62 top hits were to human sequences, each
having low E-values, indicating that RNA samples were not contaminated by lab workers
during RNA extraction or cDNA library construction.
Approximately 24% of top hits were to M. brevicollis sequences, with a further 0.6%
coming from other choanoflagellate sequences in the EMBL/Genbank database (note that this
analysis pre-dates the submission of the S. rosetta genome to Genbank). These hits included
housekeeping genes that would be expected to be conserved within clades, e.g. ribosomal
RNA, alpha tubulin [9]. M. brevicollis represented the single species with the largest number
of hits, the majority of which had highly significant E-values confirming the successful
sequencing of loricate choanoflagellate genes.
The metazoans were the largest clade producing top hits, with 39.8% of hits. This is
largely due to the tBLASTx query database containing over 50 fully sequenced animal
species versus only one fully sequenced choanoflagellate species, Monosiga brevicollis
[10], and the many more animals with smaller scale sequence depositions into
EMBL/Genbank. The high number of hits to metazoan sequences (and to opisthokont
sequences, 68.2%) once again confirms the opisthokont affinity of loricate choanoflagellates
and the evolutionary relationship between the choanoflagellates and metazoans [11–13].
A further notable finding of the tBLASTx analysis of the S. diplocostata EST dataset
is that there are a large number of hits to sequences from other, distantly related eukaryotic
groups. The most prominent of these are the stramenopiles (5.5%) and Viridiplantae
(archaeplastids) (6.8%). Given the low levels of sampling in these groups with respect to
large-scale sequencing projects, bias due to sequence availability cannot be used to fully
explain these results. One explanation is gene loss in the close relatives of loricate
choanoflagellates from the eukaryotic last common ancestor. Gene loss has been observed in
the non-loricate choanoflagellates [14] and metazoans [15]. Another explanation for the
tBLASTx results is eukaryotic-eukaryotic lateral gene transfer, known to be a prominent
feature of choanoflagellate genomes [16,17].
Diaphanoeca grandis Genome Dataset
The Illumina sequencing of genomic DNA from D. grandis cultures provided
329,237,297bp of sequence data. The sequenced reads were assembled into 921,181 contigs
(sequence dataset available from the authors on request). However, these were mainly short
contigs (N50=725bp, mean contig length= 357.41bp). Local tBLASTx searches detected
100% matches to known sequences from D. grandis [1,9], confirming that D. grandis
genomic material had been successfully sequenced. PhymmBL analysis found that the vast
majority (>98%) of contigs greater than 1kb in length, and all contigs >10kb in length, were
of bacterial origin. The dataset did not contain sufficient choanoflagellate genes, nor genes of
sufficient completeness, to merit further large-scale taxonomic or protein domain analysis.
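The contig summary statistics quoted above follow standard definitions. The sketch below assumes the common definition of N50: the contig length at which the length-sorted contigs first cover at least half of the total assembly length. Assemblers may apply a minimum-length cut-off before computing it, so this need not reproduce the ABySS figure exactly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (common definition)."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # reached at least half of the total length
            return length


def mean_length(total_bases, n_contigs):
    """Mean contig length from the total assembled bases and the contig count."""
    return total_bases / n_contigs


# Totals quoted in the text for the D. grandis assembly.
reported_mean = round(mean_length(329_237_297, 921_181), 2)
```

Dividing the reported 329,237,297 bp by the 921,181 contigs gives a mean of about 357.41 bp, consistent with the value quoted in the text.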
The partial genome sequence data did allow detection of contigs with significant
similarity to parts of the SdSITa sequence using tBLASTx searches (see Results). PCR
primers were designed from the longest contig sequence (see table S4) and the amplified PCR
product (DgSITa) cloned, sequenced and used for all further analyses (see Materials and
Methods, Results).
REFERENCES
1
Steenkamp, E. T., Wright, J. & Baldauf, S. L. 2006 The protistan origins of animals
and fungi. Molecular Biology and Evolution 23, 93-106.
2
Altschul, S. F., Madden, T. L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W. &
Lipman, D. J. 1997 Gapped BLAST and PSI-BLAST: a new generation of protein
database search programs. Nucleic Acids Research 25, 3389-3402.
3
Quevillon, E., Silventoinen, V., Pillai, S., Harte, N., Mulder, N., Apweiler, R. &
Lopez, R. 2005 InterProScan: protein domains identifier. Nucleic Acids Research 33,
W116-W120.
4
Doyle, J. & Doyle, J. 1987 A rapid DNA isolation method for small quantities of fresh
tissues. Phytochemical Bulletin 19, 11-15.
5
Simpson, J. T., Wong, K., Jackman, S. D., Schein, J. E., Jones, S. J. M. & Birol, I.
2009 ABySS: a parallel assembler for short read sequence data. Genome Research 19,
1117-1123.
6
Brady, A. & Salzberg, S. L. 2009 Phymm and PhymmBL: metagenomic phylogenetic
classification with interpolated Markov models. Nature Methods 6, 673-676.
7
Torruella, G., Suga, H., Riutort, M., Peretó, J. & Ruiz-Trillo, I. 2009 The evolutionary
history of lysine biosynthesis pathways within eukaryotes. Journal of Molecular
Evolution 69, 240-248.
8
Sun, G. & Huang, J. 2011 Horizontally acquired DAP pathway as a unit of
self-regulation. Journal of Evolutionary Biology 24, 587-595.
9
Carr, M., Leadbeater, B. S. C., Hassan, R., Nelson, M. & Baldauf, S. L. 2008
Molecular phylogeny of choanoflagellates, the sister group to Metazoa. Proceedings of
the National Academy of Sciences of the United States of America 105, 16641-16646.
10
King, N. et al. 2008 The genome of the choanoflagellate Monosiga brevicollis and the
origin of metazoans. Nature 451, 783-788.
11
Nitsche, F., Carr, M., Arndt, H. & Leadbeater, B. S. C. 2011 Higher Level Taxonomy
and Molecular Phylogenetics of the Choanoflagellatea. Journal of Eukaryotic
Microbiology 58, 452-462.
12
Ruiz-Trillo, I., Roger, A. J., Burger, G., Gray, M. W. & Lang, B. F. 2008 A
phylogenomic investigation into the origin of metazoa. Molecular Biology and
Evolution 25, 664-672.
13
Torruella, G., Derelle, R., Paps, J., Lang, B. F., Roger, A. J., Shalchian-Tabrizi, K. &
Ruiz-Trillo, I. 2011 Phylogenetic relationships within the Opisthokonta based on
phylogenomic analyses of conserved single copy protein domains. Molecular Biology
and Evolution 29, 531-544.
14
Sebé-Pedrós, A., de Mendoza, A., Lang, B. F., Degnan, B. M. & Ruiz-Trillo, I. 2010
Unexpected repertoire of metazoan transcription factors in the unicellular holozoan
Capsaspora owczarzaki. Molecular Biology and Evolution 28, 1241-1254.
15
Chauve, C., Doyon, J.-P. & El-Mabrouk, N. 2008 Gene family evolution by
duplication, speciation, and loss. Journal of Computational Biology 15, 1043-1062.
16
Sun, G., Yang, Z., Ishwar, A. & Huang, J. 2010 Algal genes in the closest relatives of
animals. Molecular Biology and Evolution 27, 2879-2889.
17
Nedelcu, A. M., Miles, I. H., Fagir, A. M. & Karol, K. 2008 Adaptive
eukaryote-to-eukaryote lateral gene transfer: stress-related genes of algal origin in the closest
unicellular relatives of animals. Journal of Evolutionary Biology 21, 1852-1860.
Supplemental Information Tables
Table S1. Summary statistics for the EST sequencing results. Statistics refer to the EST
dataset after assembly by Newbler.
Statistic                    Value
%A                           21
%T                           25
%G                           25
%C                           22
No. of contigs               26,325
Mean contig length (bp)      962
Median contig length (bp)    720
Max. contig size (bp)        12,628
N50 (bp)                     1,300
Table S2. Top hits to contigs from the tBLASTx search against the EMBL/Genbank
database. Classifications are as per the Entrez taxonomy (as of October 2010). The total
number of contigs in the search query was 25,797. The number of contigs that provided a hit
was 13,716 and the number of contigs with no significant similarity at E-value < 0.01 was
12,081.
Group/Species                        Number of contigs with    % of Total Contigs
                                     top hit to this taxon     with hits
Fungi/Metazoa (i.e. Opisthokonts)    9352                      68.2
Metazoa                              5457                      39.8
Eumetazoa                            5210                      38
Homo sapiens                         62                        0.5
Choanoflagellates                    3376                      24.6
Monosiga brevicollis                 3286                      24
Fungi                                517                       3.8
Amoebozoa                            230                       1.7
Euglenozoa                           74                        0.5
Viridiplantae                        926                       6.8
Haptophyta                           4                         0.03
Stramenopiles                        749                       5.5
Diatoms                              304                       2.2
Rhizaria                             4                         0.03
Alveolata                            223                       1.6
Bacteria                             1850                      13.5
Archaea                              60                        0.4
Viruses                              29                        0.2
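The percentages in Table S2 can be reproduced from the contig counts. The sketch below uses a subset of the tabulated counts and treats the 13,716 contigs with hits as the denominator, as the column heading indicates. Variable and function names are illustrative.

```python
TOTAL_QUERY_CONTIGS = 25_797   # contigs used as tBLASTx queries (Table S2 caption)
CONTIGS_WITH_HITS = 13_716     # contigs returning a hit below the E-value threshold

# Subset of the counts listed in Table S2.
top_hit_counts = {
    "Fungi/Metazoa (Opisthokonts)": 9352,
    "Metazoa": 5457,
    "Choanoflagellates": 3376,
    "Monosiga brevicollis": 3286,
    "Bacteria": 1850,
}


def percent_of_hits(count, total=CONTIGS_WITH_HITS):
    """Percentage of contigs with hits whose top hit falls in a given taxon."""
    return round(100 * count / total, 1)


percentages = {taxon: percent_of_hits(n) for taxon, n in top_hit_counts.items()}
contigs_without_hit = TOTAL_QUERY_CONTIGS - CONTIGS_WITH_HITS
```

The same arithmetic links the contig totals: 26,325 assembled contigs minus the 528 removed by the length filter leaves the 25,797 contigs used as queries.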
Table S3 BLAST and InterProScan results from SIT-like S. diplocostata EST contigs.
Sequences are deposited within EMBL/Genbank BioProject PRJEB1282.
EMBL/Genbank Accession No.  No. of Reads  HMMPfam Domain [Region]               Score     tBLASTx top hit                     E-Value   PsiBLAST Top Hit                                           E-Value
HAAH01000001                322           PF03842 Silicon Transporter [52-491]  7.3e-76   P. tricornutum SIT2-2 GI:215398379  9.00E-27  P. tricornutum SIT3 GI:215398382                           7.00E-66
HAAH01000002                64            PF03842 Silicon transporter [11-193]  3.4e-30   S. acus SIT GI:227460943            7.00E-11  S. acus SIT3 GI:227460944                                  1.00E-23
HAAH01000003                9             PF03842 Silicon transporter [2-259]   2.3e-36   N. pelliculosa SIT1 GI:82527174     1.00E-17  P. tricornutum SIT3 GI:215398382                           2.00E-31
HAAH01000004                324           PF03842 Silicon transporter [45-260]  1.1e-36   S. acus SIT GI:227460943            2.00E-24  P. tricornutum Cell Surface Receptor Protein GI:219116172  6.00E-28
HAAH01000005                331           PF03842 Silicon transporter [56-494]  3.9e-76   P. tricornutum SIT2-2 GI:215398379  1.00E-27  P. tricornutum SIT3 GI:215398382                           1.00E-66
HAAH01000006                64            PF03842 Silicon transporter [9-233]   3.2e-35   S. acus SIT GI:227460943            9.00E-11  P. tricornutum SIT3 GI:215398382                           4.00E-28
Table S4 Primer Sequences designed for the amplification of S. diplocostata SIT-like
(SdSIT) and D. grandis SIT-like (DgSIT) genes. All sequences are given 5’-3’. Primers
were synthesized by Sigma.
Primer            Sequence
DgSITa_R          GGCATGAGCACGGTGTAGTACGC
DgSITa_F          AACAATGGAACAACCCTCCATGGG
SdSIT_24102_F     CCATCATCTAGAAGATCCTCAAAG
SdSIT_24102_R     CGTATTTAAGTAATGAAACGATAGTGT
SdSIT_00527_F1    CACCCGACCACAAGGACCAG
SdSIT_00527_F2    ACAATGGATAAGAGCCACATCC
SdSIT_00527_R1    GTGGAAATAATAAAGATTTAATGAGAGTAC
SdSIT_00527_R2    AAAGATTTAATGAGAGTACAACAATTACCC
SdSIT_10214_F     AACATGGAGAAAAGCCACG
SdSIT_R           GGCTGGTGCAGGTCAAATGGT
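Basic properties of the primers in Table S4 can be checked with a short script. The sketch below computes length, GC fraction and reverse complement, plus a rough melting temperature from the Wallace rule (2 °C per A/T, 4 °C per G/C). That rule is a coarse approximation intended for short oligonucleotides and is included only for illustration; the source does not state how the primers were designed or evaluated.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature in degrees C by the Wallace rule (approximation only)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# The D. grandis primer pair from Table S4.
primers = {
    "DgSITa_F": "AACAATGGAACAACCCTCCATGGG",
    "DgSITa_R": "GGCATGAGCACGGTGTAGTACGC",
}
summary = {name: (len(s), round(gc_fraction(s), 3)) for name, s in primers.items()}
```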
Table S5 Successful primer combinations and their resulting products.
Product Name    Forward Primer    Reverse Primer
SdSITa          SdSIT_24102_F     SdSIT_24102_R
SdSITb.1        SdSIT_00527_F1    SdSIT_00527_R2
SdSITb.2        SdSIT_00527_F2    SdSIT_00527_R1
SdSITc          SdSIT_10214_F     SdSIT_R
DgSITa          DgSITa_F          DgSITa_R
Table S6 Choanoflagellate SIT similarity and protein domain search results. All
analyses were conducted using default settings. BLAST searches were done against the
EMBL/Genbank non-redundant databases.
Gene (EMBL/Genbank Accession No.)  tBLASTx Top Hit                            PsiBLAST Top Hit                           InterProScan HMMPfam Domain
SdSITa (HE981735)                  P. tricornutum SIT2-2 GI:215398379 2e-31   P. tricornutum SIT2-2 GI:215398382 4e-68   Silicon Transporter PF03842 [42-491] 9.4e-76
SdSITb (HE981736)                  P. tricornutum SIT2-2 GI:215398379 3e-27   P. tricornutum SIT2-2 GI:215398382 5e-68   Silicon Transporter PF03842 [40-475] 2.8e-76
SdSITc (HE981737)                  S. acus SIT GI:227460943 2e-24             P. tricornutum SIT2-2 GI:215398382 2e-67   Silicon Transporter PF03842 [40-474] 4.4e-75
DgSITa (HE981738)                  C. fusiformis SIT5 GI:3283037 3e-17        C. fusiformis SIT3 GI:3283034 7e-22        Silicon Transporter PF03842 [3-180] 2.2e-30
Table S7 Results of WoLF PSORT analysis of S. diplocostata SITs. The majority prediction
was for localization to the plasma membrane from all three available eukaryotic subcellular
location databases.
Gene      Prediction vs. Animal Database          Prediction vs. Plant Database            Prediction vs. Fungal Database
SdSITa    31 Plasma Membrane; 1 Golgi Membrane    11 Plasma Membrane; 2 E.R.; 1 Vacuole    23 Plasma Membrane; 3 E.R.
SdSITb    32 Plasma Membrane                      11 Plasma Membrane; 2 E.R.; 1 Vacuole    26 Plasma Membrane; 1 E.R.
SdSITc    32 Plasma Membrane                      11 Plasma Membrane; 2 E.R.; 1 Vacuole    26 Plasma Membrane; 1 E.R.
Table S8 Significant tBLASTx hits to SdSIT genes. These 156 sequences were used in a
ClustalX alignment for the purposes of identifying conserved protein motifs and functionally
relevant residues (charged or hydroxylated).
EMBL/Genbank Gene Identifier Number  Group                                  Species
gi|215398382                         Pennate Diatom                         Phaeodactylum tricornutum
gi|219116172                         Pennate Diatom                         Phaeodactylum tricornutum
gi|3283034                           Pennate Diatom                         Cylindrotheca fusiformis
gi|1480867                           Pennate Diatom                         Cylindrotheca fusiformis
gi|3283030                           Pennate Diatom                         Cylindrotheca fusiformis
gi|3283038                           Pennate Diatom                         Cylindrotheca fusiformis
gi|3283036                           Pennate Diatom                         Cylindrotheca fusiformis
gi|3283032                           Pennate Diatom                         Cylindrotheca fusiformis
gi|227460944                         Pennate Diatom                         Synedra acus
gi|82527177                          Pennate Diatom                         Nitzschia alba
gi|219128344                         Pennate Diatom                         Phaeodactylum tricornutum
gi|219126028                         Pennate Diatom                         Phaeodactylum tricornutum
gi|82527195                          Centric Diatom                         Thalassiosira pseudonana
gi|82527191                          Centric Diatom                         Skeletonema costatum
gi|82527197                          Centric Diatom                         Thalassiosira pseudonana
gi|224004538                         Centric Diatom                         Thalassiosira pseudonana
gi|82527193                          Centric Diatom                         Thalassiosira pseudonana
gi|224003147                         Centric Diatom                         Thalassiosira pseudonana
gi|224002056                         Centric Diatom                         Thalassiosira pseudonana
gi|82527175                          Pennate Diatom                         Fistulifera pelliculosa
gi|82527161                          Pennate Diatom                         Phaeodactylum tricornutum
gi|82527185                          Pennate Diatom                         Nitzschia sp. KKT-2005
gi|82527183                          Pennate Diatom                         Nitzschia alba
gi|82527179                          Pennate Diatom                         Nitzschia alba
gi|82527181                          Pennate Diatom                         Nitzschia alba
gi|82527169                          Pennate Diatom                         Fistulifera pelliculosa
gi|82527167                          Pennate Diatom                         Fistulifera pelliculosa
gi|94983079                          Centric Diatom                         Thalassiosira pseudonana
gi|94983081                          Centric Diatom                         Thalassiosira pseudonana
gi|94983087                          Centric Diatom                         Porosira glacialis
gi|94983177                          Centric Diatom                         Thalassiosira weissflogii
gi|94983155                          Centric Diatom                         Thalassiosira weissflogii
gi|94983169                          Centric Diatom                         Minidiscus trioculatus
gi|82527199                          Centric Diatom                         Bacterosira sp. CCMP991
gi|94983211                          Centric Diatom                         Thalassiosira weissflogii
gi|94983089                          Centric Diatom                         Porosira pseudodenticulata
gi|94983085                          Centric Diatom                         Porosira glacialis
gi|94983191                          Centric Diatom                         Thalassiosira rotula
gi|94983153                          Centric Diatom                         Minidiscus trioculatus
gi|94983141                          Centric Diatom                         Thalassiosira nodulolineata
gi|82527201                          Centric Diatom                         Thalassiosira weissflogii
gi|94983171                          Centric Diatom                         Bacterosira sp. CCMP991
gi|82527163                          Pennate Diatom                         Fistulifera pelliculosa
gi|82527173                          Pennate Diatom                         Fistulifera pelliculosa
gi|94983193                          Centric Diatom                         Thalassiosira rotula
gi|94983143                          Centric Diatom                         Thalassiosira nodulolineata
gi|82527165                          Pennate Diatom                         Fistulifera pelliculosa
gi|94983229                          Centric Diatom                         Bacterosira bathyomphala
gi|94983165                          Centric Diatom                         Thalassiosira minima
gi|94983181                          Centric Diatom                         Thalassiosira sp. CCMP1065
gi|94983133                          Centric Diatom                         Thalassiosira gessneri
gi|94983093                          Centric Diatom                         Lauderia annulata
gi|94983235                          Centric Diatom                         Skeletonema menzellii
gi|94983111                          Centric Diatom                         Cyclotella cf. meneghiniana
gi|94983167                          Centric Diatom                         Thalassiosira minima
gi|94983097                          Centric Diatom                         Thalassiosira punctigera
gi|94983129                          Centric Diatom                         Thalassiosira gessneri
gi|94983103                          Centric Diatom                         Cyclotella striata
gi|94983227                          Pennate Diatom                         Bacterosira bathyomphala
gi|94983091                          Centric Diatom                         Porosira pseudodenticulata
gi|94983149                          Centric Diatom                         Thalassiosira sp. CCMP353
gi|94983223                          Centric Diatom                         Detonula pumila
gi|94983209                          Centric Diatom                         Thalassiosira weissflogii
gi|82527171                          Pennate Diatom                         Fistulifera pelliculosa
gi|94983175                          Centric Diatom                         Thalassiosira weissflogii
gi|94983131                          Centric Diatom                         Thalassiosira gessneri
gi|94983233                          Centric Diatom                         Skeletonema subsalsum
gi|94983095                          Centric Diatom                         Thalassiosira punctigera
gi|94983189                          Centric Diatom                         Thalassiosira rotula
gi|94983105                          Centric Diatom                         Cyclotella striata
gi|94983151                          Centric Diatom                         Thalassiosira sp. CCMP353
gi|94983219                          Centric Diatom                         Shionodiscus ritscheri
gi|94983199                          Centric Diatom                         Thalassiosira pacifica
gi|94983107                          Centric Diatom                         Cyclotella cf. meneghiniana
gi|94983299                          Centric Diatom                         Stephanodiscus minutulus
gi|94983101                          Centric Diatom                         Cyclotella cryptica
gi|94983275                          Centric Diatom                         Stephanodiscus neoastraea
gi|94983237                          Centric Diatom                         Skeletonema japonicum
gi|94983249                          Centric Diatom                         Cyclostephanos tholiformis
gi|94983243                          Centric Diatom                         Stephanodiscus binderanus
gi|82527189                          Centric Diatom                         Skeletonema costatum
gi|94983245                          Centric Diatom                         Stephanodiscus parvus
gi|94983145                          Centric Diatom                         Thalassiosira nodulolineata
gi|94983225                          Centric Diatom                         Detonula pumila
gi|94983289                          Centric Diatom                         Stephanodiscus hantzschii
gi|94983119                          Centric Diatom                         Cyclotella cf. meneghiniana
gi|94983293                          Centric Diatom                         Stephanodiscus hantzschii
gi|94983217                          Centric Diatom                         Thalassiosira sp. CC03-04
gi|94983203                          Centric Diatom                         Thalassiosira pacifica
gi|94983241                          Centric Diatom                         Stephanodiscus agassizensis
gi|94983291                          Centric Diatom                         Stephanodiscus sp. Y98-1
gi|94983099                          Centric Diatom                         Thalassiosira pseudonana
gi|94983221                          Centric Diatom                         Shionodiscus ritscheri
gi|94983259                          Centric Diatom                         Stephanodiscus minutulus
gi|94983127                          Centric Diatom                         Cyclotella distinguenda
gi|94983163                          Centric Diatom                         Thalassiosira antarctica
gi|94983247                          Centric Diatom                         Stephanodiscus parvus
gi|94983161                          Centric Diatom                         Thalassiosira antarctica
gi|94983121                          Centric Diatom                         Cyclotella cf. meneghiniana
gi|94983231                          Centric Diatom                         Skeletonema grethae
gi|94983123                          Centric Diatom                         Cyclotella distinguenda
gi|94983285                          Centric Diatom                         Stephanodiscus minutulus
gi|94983301                          Centric Diatom                         Stephanodiscus minutulus
gi|94983257                          Centric Diatom                         Stephanodiscus minutulus
gi|94983173                          Centric Diatom                         Thalassiosira oceanica
gi|94983255                          Centric Diatom                         Stephanodiscus niagarae
gi|94983269                          Centric Diatom                         Stephanodiscus niagarae
gi|94983281                          Centric Diatom                         Stephanodiscus reimerii
gi|94983125                          Centric Diatom                         Cyclotella distinguenda
gi|94983277                          Centric Diatom                         Stephanodiscus neoastraea
gi|94983297                          Centric Diatom                         Stephanodiscus yellowstonensis
gi|94983265                          Centric Diatom                         Discostella cf. pseudostelligera
gi|94983279                          Centric Diatom                         Stephanodiscus reimerii
gi|94983239                          Centric Diatom                         Stephanodiscus agassizensis
gi|94983267                          Centric Diatom                         Discostella stelligera
gi|94983287                          Centric Diatom                         Cyclostephanos sp. WTC16
gi|94983251                          Centric Diatom                         Cyclostephanos invisitatus
gi|94983283                          Centric Diatom                         Stephanodiscus reimerii
gi|94983215                          Centric Diatom                         Thalassiosira sp. CC03-04
gi|94983253                          Centric Diatom                         Cyclostephanos invisitatus
gi|82527187                          Centric Diatom                         Paralia sulcata
gi|94983295                          Centric Diatom                         Stephanodiscus yellowstonensis
gi|94983271                          Centric Diatom                         Discostella pseudostelligera
gi|94983261                          Centric Diatom                         Cyclotella bodanica
gi|76594269                          Centric Diatom                         Chaetoceros muellerii
gi|94983273                          Centric Diatom                         Discostella pseudostelligera
gi|20799543                          Pennate Diatom                         Synedra acus var. radians
gi|94983263                          Centric Diatom                         Cyclotella bodanica
gi|94983139                          Centric Diatom                         Thalassiosira anguste-lineata
gi|94983157                          Centric Diatom                         Thalassiosira aestivalis
gi|94983117                          Centric Diatom                         Cyclotella atomus
gi|94983137                          Centric Diatom                         Thalassiosira anguste-lineata
gi|94983187                          Centric Diatom                         Thalassiosira sp. CCMP1093
gi|94983179                          Centric Diatom                         Thalassiosira weissflogii
gi|94983213                          Centric Diatom                         Thalassiosira weissflogii
gi|94983115                          Centric Diatom                         Cyclotella sp. L1844
gi|94983135                          Centric Diatom                         Thalassiosira anguste-lineata
gi|146746039                         Pennate Diatom                         Achnanthes exigua
gi|94983183                          Centric Diatom                         Thalassiosira sp. CCMP1065
gi|94983083                          Centric Diatom                         Thalassiosira pseudonana
gi|94983159                          Centric Diatom                         Thalassiosira aestivalis
gi|94983205                          Centric Diatom                         Thalassiosira pacifica
gi|94983207                          Centric Diatom                         Thalassiosira pacifica
gi|125661882                         Pennate Diatom                         Rhopalodia gibba
gi|94983147                          Centric Diatom                         Thalassiosira eccentrica
gi|148250143                         Pennate Diatom                         Synedra vaucheriae
gi|125661884                         Pennate Diatom                         Epithemia zebra
gi|94983109                          Centric Diatom                         Cyclotella meneghiniana
gi|94983185                          Centric Diatom                         Thalassiosira sp. CCMP1093
gi|94983195                          Centric Diatom                         Thalassiosira punctigera
gi|82919490                          Chrysophyte (Non-diatom Stramenopile)  Ochromonas ovalis
gi|94983197                          Centric Diatom                         Thalassiosira punctigera
gi|94983113                          Centric Diatom                         Cyclotella meneghiniana
gi|148250145                         Pennate Diatom                         Nitzschia communis
gi|70797601                          Pennate Diatom                         Pseudo-nitzschia multiseries
gi|71152682                          Pennate Diatom                         Pseudo-nitzschia pungens
Table S9 Stramenopile sequences used in the maximum likelihood and Bayesian
analyses of S. diplocostata SITa-c and D. grandis SITa. Taxonomy and SIT classifications
are based on the EMBL/Genbank annotations.
Group             Species & Sequence                Accession No.
Synurophyte       Synura petersenii SIT             GI:76594265
Chrysophyte       Ochromonas ovalis SIT             GI:82919490
Centric Diatom    Thalassiosira pseudonana SIT3     GI:82527197
Centric Diatom    Thalassiosira pseudonana SIT2     GI:224004538
Centric Diatom    Thalassiosira pseudonana SIT1     GI:82527193
Centric Diatom    Skeletonema costatum SIT2         GI:82527191
Pennate Diatom    Phaeodactylum tricornutum SIT3    GI:215398382
Pennate Diatom    Phaeodactylum tricornutum SIT2    GI:219128344
Pennate Diatom    Phaeodactylum tricornutum SIT1    GI:219126028
Pennate Diatom    Cylindrotheca fusiformis SIT3     GI:3283034
Pennate Diatom    Cylindrotheca fusiformis SIT1     GI:3283030
Pennate Diatom    Cylindrotheca fusiformis SIT5     GI:3283038
Pennate Diatom    Cylindrotheca fusiformis SIT4     GI:3283036
Pennate Diatom    Cylindrotheca fusiformis SIT2     GI:3283032
Pennate Diatom    Synedra acus SIT                  GI:227460944
Pennate Diatom    Nitzschia alba SIT1               GI:82527177
Supplemental Information Figures
Figure S1. Alignment of Choanoflagellate SITs and Predicted Transmembrane
Domains. Topology prediction programs predicted multiple transmembrane domains
(TMDs), but they disagreed as to the number of TM domains present. TMPred (black line)
predicted ten TMDs in all three SdSIT proteins. TMHMM (red line) predicted nine TMDs in
each. HMMTop predicted eleven TMDs in SdSITa and SdSITb (blue line), but nine in
SdSITc. All programs predicted the N terminal to be cytoplasmic. The dotted magenta line
shows the amalgamated results for DgSITa from TMPred (five TMDs), HMMTop (six
TMDs) and TMHMM (four TMDs). The locations of these regions are broadly in
agreement with those from the SdSIT sequences; however, as the DgSITa protein sequence is
incomplete these predictions cannot be treated as definitive. A vertical line at the side denotes
where the transmembrane domain continues onto the next line of the alignment. Alignment
produced using ClustalX and the default standard residue colour convention.
Figure S2. Bayesian phylogenetic analysis of choanoflagellate and stramenopile SITs
demonstrates that choanoflagellate SITs are monophyletic within the stramenopile
SITs. The highest-likelihood tree was produced using Bayesian MCMC from an alignment of
769 positions with the WAG substitution model, four gamma-distributed rate categories with
the alpha-value estimated, and the proportion of invariant sites estimated. Numbers at nodes
indicate posterior probability. The scale bar indicates the average number of amino acid
substitutions per site. Blue= Pennate Diatom, Green= Centric Diatom, Brown= Non-diatom
Stramenopile, Red= Choanoflagellate.
Supplemental Information Alignment
Alignment S1. Alignment of 156 non-choanoflagellate significant tBLASTx hits to
SdSITa-c and DgSITa. Sequence alignment was produced using ClustalX and is in fasta
format. The non-choanoflagellate sequences used in the alignment are listed in Table S8.
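An alignment in fasta format can be screened for fully conserved charged or hydroxylated residues with a short script. The sketch below is illustrative only: the residue sets (D, E, K, R as charged; S, T, Y as hydroxylated) are an assumption about how those classes were defined, and the example sequences are invented.

```python
CHARGED = set("DEKR")        # assumed definition of charged residues
HYDROXYLATED = set("STY")    # assumed definition of hydroxylated residues


def parse_fasta(text):
    """Parse fasta-formatted text into {name: sequence}."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # use the first word of the header as the name
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def conserved_functional_columns(alignment):
    """Return (1-based column, residue) for fully conserved charged/hydroxylated columns."""
    seqs = list(alignment.values())
    if not seqs:
        return []
    width = len(seqs[0])
    if any(len(s) != width for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    hits = []
    for i in range(width):
        column = {s[i].upper() for s in seqs}
        if len(column) == 1:  # identical residue in every sequence
            residue = next(iter(column))
            if residue in CHARGED | HYDROXYLATED:
                hits.append((i + 1, residue))
    return hits


# Invented two-sequence alignment for illustration.
example = parse_fasta(">a example\nMK-SE\n>b\nMKASD\n")
conserved = conserved_functional_columns(example)
```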