Supplemental Table S1. The 21 organisms used in the phylogenetic and orthology analyses

Taxonomic Group   Full Name                    No. of protein sequences
Bacteria          Escherichia coli             4,240
Bacteria          Prochlorococcus marinus      1,906
Archaea           Sulfolobus acidocaldarius    2,223
Archaea           Halobacterium sp. NRC-1      2,075
Metazoa           Homo sapiens                 34,180
Metazoa           Drosophila melanogaster      20,923
Fungi             Saccharomyces cerevisiae     6,736
Diplomonadida     Giardia lamblia              6,500
Parabasalidea     Trichomonas vaginalis        59,672
Stramenopiles     Phytophthora ramorum         15,743
Stramenopiles     Thalassiosira pseudonana     11,390
Ciliophora        Tetrahymena thermophila      27,424
Ciliophora        Paramecium tetraurelia       39,588
Cryptophyta       Guillardia theta             1,663
Apicomplexa       Cryptosporidium parvum       3,805
Apicomplexa       Plasmodium falciparum        5,460
Apicomplexa       Theileria annulata           3,795
Apicomplexa       Toxoplasma gondii            7,793
Viridiplantae     Arabidopsis thaliana         32,016
Rhodophyta        Cyanidioschyzon merolae      5,016
Euglenozoa        Trypanosoma brucei           9,152

Supplemental Table S2. Distribution of BLASTX hits (E-value ≤ 1e-5) between P. marinus clustered EST sequences from standard and serum-supplemented medium cultures.

                                       Number of Sequence Hits
Protein Type                           Standard Medium   Oyster Serum-Supplemented Medium
Ribosomal proteins                     22                23
Binding proteins                       0                 1
Hydrolase                              3                 0
Transcriptional regulatory proteins    1                 10
Transporter proteins                   5                 10
Unknown or Hypothetical proteins       9                 34
Aldolase                               0                 1
Protease                               0                 3
Histone specific proteins              0                 2
Lipase                                 0                 1
Oxidoreductase                         0                 1
Serine/threonine protein kinase        0                 1
Ligase                                 0                 1
Heat shock proteins                    2                 1
Deaminase                              0                 1

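The E-value cutoff in the Table S2 caption can be illustrated with a short filter. This is a minimal sketch, not the authors' pipeline: it assumes the BLASTX results are in the standard 12-column tabular layout (BLAST+ `-outfmt 6`, legacy `-m 8`), where the E-value is the 11th column. The function name `significant_queries` and the sample rows are hypothetical.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-5  # threshold stated in the Table S2 caption


def significant_queries(handle, cutoff=E_VALUE_CUTOFF):
    """Return the set of query IDs with at least one hit at or below the cutoff."""
    kept = set()
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines
        if not row or row[0].startswith("#"):
            continue
        # Assumed layout: column 1 is the query ID, column 11 is the E-value
        query_id, evalue = row[0], float(row[10])
        if evalue <= cutoff:
            kept.add(query_id)
    return kept


# Made-up rows for illustration only (not data from the study)
sample = io.StringIO(
    "est1\tprotA\t45.0\t120\t60\t2\t1\t360\t5\t124\t2e-20\t98.0\n"
    "est2\tprotB\t30.0\t80\t50\t3\t1\t240\t10\t89\t0.5\t30.1\n"
    "est3\tprotC\t38.0\t100\t58\t1\t1\t300\t7\t106\t1e-5\t50.2\n"
)
hits = significant_queries(sample)
print(sorted(hits))
```

Queries that pass the filter would then be binned by protein type to produce per-medium counts like those in the table.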
Supplemental Table S3. Distribution of Sequences in GenBank nr and dbEST by Taxonomic Group.

Number of protein sequences present, by taxonomic group, in NCBI nr, May 2009

Taxonomic Group   Protein sequences
Alveolata         332,861
Metazoa/Fungi     3,083,595
Stramenopiles     59,106
Viridiplantae     1,110,444
Bacteria          11,094,968
Amoebozoans       68,320
Euglenozoans      119,505
Parabasalidea     120,122
Archaea           360,344
Virus             953,646

Number of EST sequences present, by taxonomic group, in NCBI dbEST, May 2009

Taxonomic Group   EST sequences
Alveolata         873,480
Metazoa/Fungi     37,269,034
Stramenopiles     606,109
Viridiplantae     19,422,149
Bacteria          402
Amoebozoans       252,004
Euglenozoans      100,968
Parabasalidea     27,398
Jakobids          36,869
Rhizarids         9,775
Diplomonadids     45,564

Number of protein sequences (NCBI nr) and EST sequences (NCBI dbEST) for the 3 major phyla within the Alveolata, May 2009

Phylum            Protein sequences (nr)   EST sequences (dbEST)
Apicomplexa       193,603                  454,256
Ciliophora        136,431                  261,834
Dinophyceae       2,636                    125,873

Supplemental Table S4. Distribution of P. marinus orthologous protein-encoding genes among other taxa. P. marinus EST ORFs were compared to proteins from 21 additional taxa (Supplemental Table S1). Groups of orthologous genes were identified with OrthoMCL. Row 1 lists all gene groups unique to P. marinus (paralogs). Rows 2-10 are ortholog clusters that differ by increasing number of taxa containing the shared genes.

Row   Orthologous Protein Distribution       Number of orthologous   Total number of protein sequences
                                             gene groups             found in the ortholog groups
1     Ortholog group unique to P. marinus    1,715                   3,957
2     Ortholog group found in >= 2 taxa      1,163                   23,841
3     Ortholog group found in >= 3 taxa      1,093                   23,572
4     Ortholog group found in >= 4 taxa      1,042                   23,272
5     Ortholog group found in >= 5 taxa      961                     22,153
6     Ortholog group found in >= 8 taxa      785                     20,285
7     Ortholog group found in >= 12 taxa     611                     17,200
8     Ortholog group found in >= 16 taxa     350                     10,983
9     Ortholog group found in >= 20 taxa     46                      1,620
10    Ortholog group found in >= 22 taxa     3                       114
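
The kind of tally reported in Table S4 can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it assumes group lines of the form `group_id: taxon|protein taxon|protein ...`, counts only groups that contain the focal taxon, and treats each threshold as "present in at least k taxa". The taxon codes (`pmar`, `tgon`, `pfal`, `hsap`), the example groups, and the function names are hypothetical.

```python
def parse_group(line):
    """Split 'group_id: taxon|protein ...' into (group_id, [(taxon, protein), ...])."""
    group_id, members = line.split(":", 1)
    pairs = [tuple(m.split("|", 1)) for m in members.split()]
    return group_id.strip(), pairs


def tally_groups(lines, focal_taxon, thresholds):
    """Count groups, and their member sequences, that contain the focal taxon.

    Returns a dict mapping "unique" and each threshold k to (n_groups, n_sequences),
    where k means the group spans at least k taxa.
    """
    result = {"unique": [0, 0]}
    result.update({k: [0, 0] for k in thresholds})
    for line in lines:
        if not line.strip():
            continue
        _, pairs = parse_group(line)
        taxa = {taxon for taxon, _ in pairs}
        if focal_taxon not in taxa:
            continue  # group has no member from the focal taxon
        if len(taxa) == 1:
            # Only the focal taxon is present: a paralog-only group
            result["unique"][0] += 1
            result["unique"][1] += len(pairs)
            continue
        for k in thresholds:
            if len(taxa) >= k:
                result[k][0] += 1
                result[k][1] += len(pairs)
    return {key: tuple(val) for key, val in result.items()}


# Made-up groups for illustration only (not data from the study)
example = [
    "G1: pmar|a1 pmar|a2",
    "G2: pmar|b1 tgon|b2",
    "G3: pmar|c1 tgon|c2 pfal|c3 hsap|c4",
    "G4: tgon|d1 pfal|d2",
]
counts = tally_groups(example, "pmar", thresholds=[2, 3, 4])
print(counts)
```

Because the thresholds are cumulative, a group spanning four taxa is counted in the ">= 2", ">= 3", and ">= 4" rows alike, which is why the group counts in the table decrease monotonically from row 2 to row 10.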