Supplemental Table S1. The 21 organisms used in the phylogenetic and orthology analyses.

Taxonomic Group   Full Name                     No. of protein sequences
Bacteria          Escherichia coli              4,240
Bacteria          Prochlorococcus marinus       1,906
Archaea           Sulfolobus acidocaldarius     2,223
Archaea           Halobacterium sp. NRC-1       2,075
Metazoa           Homo sapiens                  34,180
Metazoa           Drosophila melanogaster       20,923
Fungi             Saccharomyces cerevisiae      6,736
Diplomonadida     Giardia lamblia               6,500
Parabasalidea     Trichomonas vaginalis         59,672
Stramenopiles     Phytophthora ramorum          15,743
Stramenopiles     Thalassiosira pseudonana      11,390
Ciliophora        Tetrahymena thermophila       27,424
Ciliophora        Paramecium tetraurelia        39,588
Cryptophyta       Guillardia theta              1,663
Apicomplexa       Cryptosporidium parvum        3,805
Apicomplexa       Plasmodium falciparum         5,460
Apicomplexa       Theileria annulata            3,795
Apicomplexa       Toxoplasma gondii             7,793
Viridiplantae     Arabidopsis thaliana          32,016
Rhodophyta        Cyanidioschyzon merolae       5,016
Euglenozoa        Trypanosoma brucei            9,152

Supplemental Table S2. Distribution of BLASTX hits (E-value ≤ 1e-5) between P. marinus clustered EST sequences from standard and serum-supplemented medium cultures.

                                        Number of Sequence Hits
Protein Type                            Standard Medium   Oyster Serum-Supplemented Medium
Ribosomal proteins                      22                23
Binding proteins                        0                 1
Hydrolase                               3                 0
Transcriptional regulatory proteins     1                 10
Transporter proteins                    5                 10
Unknown or Hypothetical proteins        9                 34
Aldolase                                0                 1
Protease                                0                 3
Histone specific proteins               0                 2
Lipase                                  0                 1
Oxidoreductase                          0                 1
Serine/Threonine protein kinase         0                 1
Ligase                                  0                 1
Heat shock proteins                     2                 1
Deaminase                               0                 1

Supplemental Table S3. Distribution of Sequences in GenBank nr and dbEST by Taxonomic Group.

Number of protein sequences present, by taxonomic group, in NCBI nr, May 2009

Taxonomic Group   No. of protein sequences
Alveolata         332,861
Metazoa/Fungi     3,083,595
Stramenopiles     59,106
Viridiplantae     1,110,444
Bacteria          11,094,968
Amoebozoans       68,320
Euglenozoans      119,505
Parabasalidea     120,122
Archaea           360,344
Virus             953,646

Number of EST sequences present, by taxonomic group, in NCBI dbEST, May 2009

Taxonomic Group   No. of EST sequences
Alveolata         873,480
Metazoa/Fungi     37,269,034
Stramenopiles     606,109
Viridiplantae     19,422,149
Bacteria          402
Amoebozoans       252,004
Euglenozoans      100,968
Parabasalidea     27,398
Jakobids          36,869
Rhizarids         9,775
Diplomonadids     45,564

Number of protein sequences for the 3 major phyla within the Alveolata in NCBI nr, May 2009

Phylum            No. of protein sequences
Apicomplexa       193,603
Ciliophora        136,431
Dinophyceae       2,636

Number of EST sequences for the 3 major phyla within the Alveolata in NCBI dbEST, May 2009

Phylum            No. of EST sequences
Apicomplexa       454,256
Ciliophora        261,834
Dinophyceae       125,873

Supplemental Table S4. Distribution of P. marinus orthologous protein-encoding genes among other taxa. P. marinus EST ORFs were compared to proteins from 21 additional taxa (Supplemental Table S1). Groups of orthologous genes were identified with OrthoMCL. Row 1 lists all gene groups unique to P. marinus (paralogs). Rows 2-10 are ortholog clusters that span an increasing number of taxa sharing the genes.

Row   Number of orthologous   Total number of protein sequences   Orthologous Protein Distribution
      gene groups             found in the ortholog groups
1     1,715                   3,957                               Ortholog group unique to P. marinus
2     1,163                   23,841                              Ortholog group found in >= 2 taxa
3     1,093                   23,572                              Ortholog group found in >= 3 taxa
4     1,042                   23,272                              Ortholog group found in >= 4 taxa
5     961                     22,153                              Ortholog group found in >= 5 taxa
6     785                     20,285                              Ortholog group found in >= 8 taxa
7     611                     17,200                              Ortholog group found in >= 12 taxa
8     350                     10,983                              Ortholog group found in >= 16 taxa
9     46                      1,620                               Ortholog group found in >= 20 taxa
10    3                       114                                 Ortholog group found in >= 22 taxa
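
The tallies in Supplemental Table S4 (number of groups, and total sequences in those groups, at each minimum taxon count) can be illustrated with a minimal Python sketch. It is not the pipeline used for this study. It assumes a groups listing with one group per line in the form "group_id: taxon|protein taxon|protein ...", hypothetical taxon codes ("pmar", "hsap", "scer"), and that the sequence total counts every member of a qualifying group rather than only the P. marinus members.

```python
def parse_groups(lines):
    """Parse 'group_id: taxon|protein ...' lines into {group_id: [(taxon, protein), ...]}."""
    groups = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        group_id, members = line.split(":", 1)
        groups[group_id.strip()] = [tuple(m.split("|", 1)) for m in members.split()]
    return groups


def tally_unique(groups, focal_taxon):
    """Count groups (and their sequences) whose members all come from the focal taxon."""
    n_groups = n_seqs = 0
    for members in groups.values():
        if {taxon for taxon, _ in members} == {focal_taxon}:
            n_groups += 1
            n_seqs += len(members)
    return n_groups, n_seqs


def tally_by_min_taxa(groups, focal_taxon, thresholds):
    """For each threshold t, count groups that include the focal taxon and span >= t taxa."""
    rows = []
    for t in thresholds:
        n_groups = n_seqs = 0
        for members in groups.values():
            taxa = {taxon for taxon, _ in members}
            if focal_taxon in taxa and len(taxa) >= t:
                n_groups += 1
                n_seqs += len(members)  # assumption: all members are counted
        rows.append((t, n_groups, n_seqs))
    return rows


# Toy input with hypothetical identifiers, not data from the study.
example = [
    "G1: pmar|p1 pmar|p2",
    "G2: pmar|p3 hsap|h1",
    "G3: pmar|p4 hsap|h2 scer|s1 scer|s2",
    "G4: hsap|h3 scer|s3",
]
groups = parse_groups(example)
unique_row = tally_unique(groups, "pmar")
shared_rows = tally_by_min_taxa(groups, "pmar", [2, 3])
```

Because the thresholds are cumulative (>= t taxa), each row's group count can only stay equal or fall as t rises, which matches the monotone decrease in rows 2-10 of the table.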