PLASTID-TARGETED PROTEINS ARE ABSENT FROM THE PROTEOMES OF ACHLYA HYPOGYNA AND THRAUSTOTHECA CLAVATA (OOMYCOTA, STRAMENOPILA): IMPLICATIONS FOR THE ORIGIN OF CHROMALVEOLATE PLASTIDS AND THE ‘GREEN GENE’ HYPOTHESIS

Lindsay Rukenbrod

A Thesis Submitted to the University of North Carolina Wilmington in Partial Fulfillment of the Requirements for the Degree of Master of Science

Center for Marine Science
University of North Carolina Wilmington
2012

Approved by Advisory Committee: D. Wilson Freshwater, Jeremy Morgan, Allison Taylor, J. Craig Bailey (Chair)

Accepted by Robert Roer, Dean, Graduate School (digitally signed 2012.11.27)

This thesis has been prepared in the style and format consistent with the Journal of Eukaryotic Microbiology.

TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGMENTS
DEDICATION
LIST OF TABLES
LIST OF FIGURES
CHAPTER 1: Implications for the origin of chromalveolate plastids
    INTRODUCTION
    METHODS
    RESULTS AND DISCUSSION
        Revised Hypotheses for the Evolution of Chromalveolate Plastids
CHAPTER 2: Do chromalveolate genomes encode ‘green genes’?
    INTRODUCTION
    METHODS
    RESULTS AND DISCUSSION
        Green Genes in Oomycetes and Other Chromalveolates?
SUPPLEMENTAL INFORMATION
LITERATURE CITED

ABSTRACT

Chapter 1

The chromalveolate hypothesis predicts that extant nonphotosynthetic stramenopiles are secondarily nonphotosynthetic and derived from ancestors bearing a secondary red-type plastid. To test this hypothesis, proteomes of the oomycetes Achlya hypogyna and Thraustotheca clavata were canvassed for plastid-targeted genes. Proteins for each species encoding putative plastid-targeting signal peptides were identified, annotated, and assigned to protein families if possible. Forty-six candidate proteins were culled from the two genomes. Bioinformatic analyses revealed that the proteomes of Achlya and Thraustotheca do not encode plastid-targeted genes acquired by endosymbiotic gene transfer. All proteins possessing non-mitochondrial-targeting signal peptides identified were judged to belong to the secretome (i.e., extracellularly secreted proteins). These results indicate that oomycetes are ancestrally aplastidic stramenopiles and do not support the chromalveolate theory of plastid evolution. Revised hypotheses for the origin of plastids characterized by chlorophylls a and c and fucoxanthin are presented.
It is concluded that alveolate and stramenopile plastids are likely tertiary or higher order plastids, not secondary plastids.

Chapter 2

The hypothesis that a green algal symbiosis preceded the red algal symbiont that gave rise to red-type plastids in the ancestors of the chromalveolates is reexamined. A network approach was used to detect nuclear-encoded proteins from the genomes of Achlya hypogyna, Thraustotheca clavata, other oomycetes, and other chromalveolates that cluster with green algal genes. Twelve oomycete proteins clustering with green algal genes at high stringency were annotated and selected for further analyses. Representative homologs from all other eukaryotic taxa available were aligned to sequences comprising each network and maximum likelihood trees were constructed from these alignments. Protein trees derived from these data exhibited obvious errors resulting from taxon biases and heterotachy. These results argue that ‘green genes’ detected in phylogenomics studies are artifactual and not indicative of endosymbiotic gene transfer.

ACKNOWLEDGMENTS

My thanks go to my advisor, Dr. J. Craig Bailey, whose enthusiasm about molecular protistology caught my interest in the very beginning of my scientific education. His continuous encouragement, wit, and sense of humor made this journey an enjoyable one. Ian Misner and Dr. Chris Lane of the University of Rhode Island have also been instrumental in my education, providing feedback and technical support in my research. I’d also like to thank my committee members, Dr. D. Wilson Freshwater, Dr. Jeremy Morgan, and Dr. Allison Taylor, for their encouragement and flexibility throughout this process. My lab mates past and present, particularly Cory Dashiell, Erika Shwarz, Ashley Hayes, and Allison Martin, helped me maintain my focus over the years throughout failed DNA extractions, computer malfunctions, approaching deadlines, and many other graduate school related challenges.
The Department of Biology and Marine Biology, the Center for Marine Science and the National Science Foundation provided financial support for my education and research. Finally, I’d like to thank my parents and my husband for supporting me every step of the way.

DEDICATION

I’d like to dedicate this to my mother, whose endless patience has allowed me to explore life with few restrictions and overwhelming love and support.

LIST OF TABLES

Chapter 1
1. Protein IDs for 46 hypothetical proteins detected in the genomes of Thraustotheca and/or Achlya
2. Protein ID numbers, annotations and protein family designations
3. Proteins sorted into one of 14 unique protein families
4. List of seven proteins from the Achlya and Thraustotheca genomes and putative homologs found in the Arabidopsis thaliana plastid proteome

Chapter 2
1. List of 12 annotated proteins from the Achlya and/or Thraustotheca proteomes or other oomycetes found in EGNs

LIST OF FIGURES

Chapter 1
1. Hypotheses for the origin of complex, higher order chlorophyll a+c-containing plastids in chromalveolates

Chapter 2
1. Three examples of putative green genes in oomycete genomes based on EGN analysis
2. DEXDc ML tree
3. RPB ML tree
4. ALDH ML tree
5. TOR-containing kinase ML tree
6. YAK1 ML tree
7. ALS ML tree

CHAPTER 1: Implications for the origin of chromalveolate plastids

INTRODUCTION

The evolutionary origin and subsequent movement of secondary and higher order plastids among photosynthetic eukaryotes is the subject of intense debate. The principal key to unraveling the evolutionary history of plastids is an accurate understanding of the relationships among both host and plastid lineages (Archibald 2009; Green 2011). This goal is hampered by the mosaic nature of eukaryotic genomes, which comprise lineage-specific genes inherited vertically, thousands of genes acquired by endosymbiotic gene transfer (EGT), and genes obtained via lateral gene transfer (LGT) (Archibald 2008; Green 2011; Keeling 2009; Larkum 2007).

The chromalveolate hypothesis posits that the alveolates, cryptomonads, haptophytes and stramenopiles are monophyletic and that the last common ancestor of these lineages was a photosynthetic alga bearing a red-type plastid (Cavalier-Smith 1999; 2003). This notion is supported, in the first instance, by the fact that photosynthetic members of these chlorophyll a+c-containing groups all possess red-type plastids surrounded by three or four unit membranes [the so-called chloroplast endoplasmic reticulum, or CER], a feature indicative of secondary endosymbiosis (Dodge 1975; Foth and McFadden 2003; Guillot and Gibbs 1980a, b; Gibbs 1981a, b; Köhler et al. 1997).
Second, nuclear-encoded plastid-targeted proteins in these algae are characterized by the presence of a 5’ bipartite signal sequence that directs gene products to the plastid and across the outer and inner pairs of plastid membranes (Kroth 2002; Soll and Schleiff 2004). Third, in terms of coding capacity, gene content, and organization, the plastid genomes of chromalveolates resemble those of red algae far more closely than they resemble the plastid genomes of green algae (Delwiche 1999; Keeling 2004; Yoon et al. 2002). Cavalier-Smith (1999) originally emphasized that the chromalveolate hypothesis is consistent with the idea that the chloroplast endoplasmic reticulum (CER) and complex protein-trafficking systems that characterize chromalveolates are unlikely to have evolved independently on different occasions (see Kroth 2002; Ralph et al. 2004).

Over the last decade, the ‘chromalveolate’ concept has been tested, implicitly or explicitly, in numerous broad-scale phylogenetic studies. The chromalveolates have not been recovered as a monophyletic group in any study (Archibald 2009; Baurain et al. 2010). More recent studies imply that the relationships among chromalveolate host cells and their plastids are more complex than originally supposed, perhaps involving tertiary and higher-order transfers among hosts (Archibald 2009; Bodyl 2005; Keeling 2004; Sanchez-Puerta and Delwiche 2008).

In this paper the chromalveolate hypothesis is re-examined in light of new genomic data available for nonphotosynthetic members of the Stramenopila. The stramenopiles, one of the four principal taxa included in the Chromalveolata, are divided into two groups. (i) The ‘photosynthetic stramenopiles’, ‘heterokont algae’ or ‘ochrophytes’ comprise chlorophyll a+c-containing photosynthetic algae including phaeophytes, chrysophytes, diatoms, eustigmatophytes, pelagophytes, and xanthophytes (Lee et al. 2000).
(ii) Nonphotosynthetic organisms that are bacterivorous, parasitic or saprobic heterotrophs in nature, including bicosoecids, hyphochytrids, labyrinthulids, oomycetes, and thraustochytrids, among others (Lee et al. 2000). The oomycetes are the most diverse, well studied, and economically important of all nonphotosynthetic stramenopiles.

The chromalveolate hypothesis implies that extant aplastidic stramenopiles are derived from ancestors that once possessed a secondary red-type plastid. However, there is no ultrastructural or DNA evidence suggesting that bicosoecids, hyphochytrids, labyrinthulids, oomycetes, or thraustochytrids possess, or possessed in the past, a plastid. Furthermore, ultrastructural or DNA sequence evidence for cryptic plastids in these organisms is absent or controversial (Lee et al. 2000; Reyes-Prieto et al. 2008; Slamovits and Keeling 2008; Stiller et al. 2009). In this study the proteomes of the oomycetes Achlya hypogyna and Thraustotheca clavata were canvassed in search of photosynthesis-related genes.

METHODS

Full length predicted proteins were obtained from ongoing genome sequencing projects for Achlya hypogyna (ATCC 48635) and Thraustotheca clavata (ATCC 34112), estimated to encode 17,430 and 12,154 predicted proteins, respectively; additional details will be published separately. The Achlya and Thraustotheca proteomes were searched for possible plastid-targeted genes using the signal peptide prediction program ChloroP (v.1.1) (Emanuelsson et al. 1999). Hypothetical proteins returned from these searches were subsequently analyzed using SignalP (v.4.0) (Petersen et al. 2011), annotated, and assigned to protein families if possible using the Conserved Domain Database (CDD) (Marchler-Bauer et al. 2007). Mitochondria-targeted proteins and proteins possessing transmembrane regions identified using TMHMM (v.2.0) were removed from the data set (Krogh et al. 2001).
Searches for heterokont-like bipartite plastid-targeting peptides, consisting of both signal and transit peptide motifs, were conducted using HECTAR (Gruber et al. 2007; Gschloessl et al. 2008; Waller et al. 2000). Finally, the oomycete proteins were BLASTed against the Arabidopsis thaliana plastid proteome database (which includes plastid- and nuclear-encoded plastid-targeted proteins) using plprot v.2.3 (Baginsky et al. 2005; Kleffmann et al. 2004; 2006).

RESULTS AND DISCUSSION

The chromalveolate hypothesis implies that the ancestors of oomycetes were photosynthetic organisms bearing red-type plastids, and putative plastid-related genes have been reported from the genomes of the plant pathogens Phytophthora ramorum and P. sojae (Tyler et al. 2006). The competing hypothesis is the long-held view that oomycetes are ancestrally aplastidic. It is possible that the ancestors of oomycetes were photosynthetic but that extant members of the group have not retained any plastid-associated genes. On the other hand, empirical data including studies of apicomplexans, dinoflagellates and other taxa imply that plastid-associated genes are unlikely to be completely purged from the genome even in organisms where a vestigial, nonphotosynthetic plastid is absent (Barbrook et al. 2006; de Koning and Keeling 2004; Matsuzuki et al. 2008; Wilson 2004; Sanchez-Puerta et al. 2007).

Thirty hypothetical proteins from the Achlya genome and 16 from the Thraustotheca genome putatively possessing a 5’ plastid-targeting signal peptide were identified (Table 1). Of these 46 proteins, 22 are presently characterized as hypothetical proteins of unknown function; the remaining 24 were annotated (E-value < 1.00E-25) and found to represent 14 unique protein families (Tables 2, 3). BLASTp queries revealed that none of the oomycete proteins (Table 2) are encoded by the 271 eukaryotic plastid genomes sequenced to date.
None of the 46 presequences examined here possess the ASAFP (Y/W/L) motif necessary for plastid import in diatoms, although the significance of this observation is unclear (Gruber et al. 2007) (see supplementary Tables S1 and S2).

Putative homologs to seven of the oomycete proteins were detected in the A. thaliana plastid proteome (Table 4). These seven oomycete proteins are more-or-less distant relatives of three A. thaliana genes. Both Achlya and Thraustotheca encode proteins similar to the zinc-finger type WRKY1 DNA-binding transcription factor that plays a role in disease resistance in A. thaliana (Dong et al. 2003; Shindo et al. 2012). Three Achlya proteins and one Thraustotheca protein are putative homologs of the cysteine proteinase RD21A found in the A. thaliana plastid proteome. Finally, a single Achlya protein distantly related (6E-17) to A. thaliana aldehyde dehydrogenase (ALDH) was also detected. These three genes are not indicators for photosynthesis per se because homologs have been detected from across the tree of life in photosynthetic (e.g., plants and green algae) and nonphotosynthetic organisms (e.g., eubacteria, animals, fungi, and the amoebozoan Dictyostelium). Homologs of each of these putative genes that are more closely related to the Achlya and Thraustotheca proteins have previously been detected in the genomes of Phytophthora infestans, P. sojae (Pythiales) and the white rust Albugo laibachii (Tyler et al. 2006).

The annotated proteins recovered in this study include nine known to belong to oomycete secretomes, and six of these are common proteases such as chitinase and cellulase (Tables 2, 3; Birch et al. 2006; Gaulin et al. 2008; Kamoun 2006; Levesque et al. 2010). One of the proteins belongs to the elicitin family, a family of virulence genes unique to oomycetes (Jiang et al. 2006). Based upon these data, plastid-associated genes are not present in the Achlya or Thraustotheca predicted proteomes.
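The screening steps described in the Methods (a ChloroP search, SignalP analysis of the hits, and removal of mitochondria-targeted and transmembrane proteins) amount to a simple filter. The sketch below is illustrative only and is not the code used in this study. It assumes that the predictions from each program have already been parsed into one record per protein; the record fields, the function name, and the example identifiers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class TargetingCalls:
    """Parsed predictions for one protein (all field names are hypothetical)."""
    protein_id: str
    chlorop_hit: bool    # ChloroP reported a putative transit peptide
    signalp_hit: bool    # SignalP reported a signal peptide
    mitochondrial: bool  # predicted to be mitochondria-targeted
    tm_helices: int      # number of predicted transmembrane helices


def screen_candidates(calls: Iterable[TargetingCalls]) -> List[TargetingCalls]:
    """Return ChloroP hits that are neither mitochondria-targeted nor transmembrane."""
    kept = []
    for c in calls:
        if not c.chlorop_hit:
            continue  # never entered the candidate set
        if c.mitochondrial or c.tm_helices > 0:
            continue  # removed from the data set, as described in the Methods
        kept.append(c)
    return kept


# Toy records with made-up identifiers, for illustration only.
example = [
    TargetingCalls("prot_A", True, True, False, 0),
    TargetingCalls("prot_B", True, False, True, 0),
    TargetingCalls("prot_C", True, True, False, 2),
    TargetingCalls("prot_D", False, True, False, 0),
]
candidates = screen_candidates(example)
candidate_ids = [c.protein_id for c in candidates]
# Candidates that also carry a SignalP call are consistent with secretion.
secretome_like = [c.protein_id for c in candidates if c.signalp_hit]
```

Proteins that survive such a filter are only candidates; as in the analysis above, they would still need to be annotated and compared against plastid proteomes before any conclusion about plastid targeting is drawn.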
Revised Hypotheses for the Evolution of Chromalveolate Plastids

These data, as well as the study by Stiller et al. (2009), indicate that oomycetes are ancestrally aplastidic despite reports to the contrary (Tyler et al. 2006). This information and the results of recent phylogenomics investigations have been synthesized, and revised hypotheses for the evolution of chromalveolate plastids are presented in Figure 1. These diagrams reflect a number of assumptions that are enumerated for the sake of clarity. (i) The Chromalveolata sensu stricto is paraphyletic (e.g., Iida et al. 2007; Khan et al. 2007; reviewed in Green 2011; Rogers et al. 2007). (ii) Oomycetes, all other heterotrophic stramenopiles, as well as the ciliates are ancestrally aplastidic (Archibald 2008; Reyes-Prieto et al. 2008; Tyler et al. 2006). (iii) The SAR clade is recognized as natural (Burki et al. 2007; Hackett et al. 2007; Lane & Archibald 2008). (iv) Recent studies imply that SAR and Hacrobia host cells are likely distantly related (Baurain et al. 2010; Hackett et al. 2007; Parfrey et al. 2010). For these reasons, no specifically defined relationship between SAR and Hacrobia host cells is implied in Figure 1. The diagrams comprising Figure 1 are drawn under the assumption that the Hacrobia is monophyletic (Burki et al. 2007; Hackett et al. 2007; Harper et al. 2005; Patron et al. 2007).

These hypotheses share elements in common with prior models of chromalveolate plastid evolution in which multiple plastid acquisitions (or plastid replacements) are inferred via serial endosymbiotic transfer (Archibald 2008; Bodyl 2005; Bodyl et al. 2009; Bodyl and Moszczynski 2006; Sanchez-Puerta & Delwiche 2008). Two predictions derived from these models bear emphasizing: (1) alveolates and stramenopiles likely possess tertiary or quaternary plastids and (2) it is conceivable that one of these taxa, the alveolates or stramenopiles, may have obtained their plastid from the other (Fig. 1).
Finally, it is noted that the number of membranes surrounding higher-order, complex plastids seems to be fixed at four or fewer.

Fig. 1. Hypotheses for the origin of complex, higher order chlorophyll a+c-containing plastids in chromalveolates. (A) Independent acquisition of a tertiary (3°) plastid in the alveolate and stramenopile lineages from the Hacrobia lineage. (B) Serial endosymbiotic transfer resulting in a quaternary (4°) alveolate plastid from the 3° stramenopile plastid. (C) Serial endosymbiotic transfer resulting in a 4° stramenopile plastid from the 3° alveolate plastid.

Table 1. Protein IDs for 46 hypothetical proteins detected in the genomes of Thraustotheca and/or Achlya characterized by the presence of a putative plastid-targeting 5’ signal peptide sequence. ChloroP was used to detect classical plastid transit peptides. HECTAR was used to search for bipartite plastid targeting leader sequences characteristic of stramenopiles and other 3° chromalveolates (Kilian and Kroth 2003; McFadden and van Dooren 2004; Vesteg et al. 2009).
Columns: Protein ID; SignalP; ChloroP; HECTAR

Thraustotheca clavata
Protein IDs: THRCLA_02069, THRCLA_03737, THRCLA_03876, THRCLA_04285, THRCLA_04386, THRCLA_04952, THRCLA_05863, THRCLA_06099, THRCLA_07047, THRCLA_08011, THRCLA_10855, THRCLA_10997, THRCLA_11248
SignalP and ChloroP: Y Y Y Y Y N Y Y Y Y N N Y Y Y Y Y Y Y Y Y Y
Protein IDs: THRCLA_11271, THRCLA_11391, THRCLA_11516
SignalP and ChloroP: Y Y Y Y Y Y
HECTAR: Chloroplast; Signal peptide; Signal peptide; Signal peptide; Signal peptide; Signal peptide; Signal peptide; Signal peptide; Signal peptide; Signal peptide; Signal peptide; Signal peptide; No N-terminal target peptide found; Chloroplast; Signal peptide; Signal peptide

Achlya hypogyna
Protein IDs: ACHHYP_00269, ACHHYP_01095, ACHHYP_01226, ACHHYP_01546, ACHHYP_02169, ACHHYP_02305, ACHHYP_03044, ACHHYP_03052, ACHHYP_04549, ACHHYP_04706, ACHHYP_04908, ACHHYP_05005
SignalP and ChloroP: Y Y Y Y Y Y Y N Y Y Y Y Y Y Y Y Y Y Y Y Y
HECTAR: Signal peptide; Signal peptide; Signal peptide; Signal peptide; Chloroplast; Signal peptide; Signal peptide; Signal peptide; Chloroplast; Signal peptide; Signal peptide; Signal peptide

Table 1 (cont.)
Protein IDs: ACHHYP_05180, ACHHYP_05326, ACHHYP_05770, ACHHYP_06287, ACHHYP_06505, ACHHYP_06977, ACHHYP_07400, ACHHYP_08323, ACHHYP_09221, ACHHYP_09519, ACHHYP_10824, ACHHYP_11025, ACHHYP_11286, ACHHYP_11397, ACHHYP_12628, ACHHYP_13722, ACHHYP_14385, ACHHYP_15409
SignalP and ChloroP: Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y
HECTAR: Signal peptide; Signal peptide; Signal peptide; Signal peptide; Signal peptide; Signal peptide; Chloroplast; Signal peptide; Chloroplast; Chloroplast; Signal peptide; Chloroplast; Signal peptide; Signal peptide; Chloroplast; Chloroplast; Signal peptide; Chloroplast

Table 2. Protein ID numbers, annotations (E-value < 1.00E-25), and protein family designations for 46 proteins from the Thraustotheca and Achlya genomes putatively possessing 5’ plastid-targeting signal peptides.
Gene/Protein ID | Annotation | pfam

Thraustotheca clavata
THRCLA_02069 | putative GPI-anchored serine-rich hypothetical protein | _
THRCLA_03737 | cd05384: SCP_PRY1_like [COG2340] | pfam00188
THRCLA_03876 | hypothetical protein, with EGF-like motif | _
THRCLA_04285 | Kazal-type serine proteinase inhibitor | pfam07648
THRCLA_04386 | hypothetical protein | _
THRCLA_04952 | hypothetical protein | _
THRCLA_05863 | hypothetical protein | _
THRCLA_06099 | putative GPI-anchored serine-rich hypothetical protein | _
THRCLA_07047 | hypothetical protein | _
THRCLA_08011 | cysteine protease family C01A, putative | pfam00112
THRCLA_10855 | hypothetical protein | _
THRCLA_10997 | chitinase D-like | pfam00704
THRCLA_11248 | hypothetical protein, unknown function | _
THRCLA_11271 | hypothetical protein, elicitin superfamily | pfam00964
THRCLA_11391 | beta-N-acetylglucosaminidase | pfam00728
THRCLA_11516 | hypothetical protein, unknown function | _

Achlya hypogyna
ACHHYP_00269 | putative GPI-anchored serine-rich hypothetical protein | _
ACHHYP_01095 | beta-N-acetylglucosaminidase | pfam00728
ACHHYP_01226 | hypothetical protein | pfam12937
ACHHYP_01546 | hypothetical protein | _
ACHHYP_02169 | trypsin-like serine protease | pfam13365
ACHHYP_02305 | putative GPI-anchored serine-rich hypothetical protein | _
ACHHYP_03044 | putative chitinase-like carbohydrate-binding protein | pfam00704
ACHHYP_03052 | hypothetical protein | _
ACHHYP_04549 | hypothetical protein | _
ACHHYP_04706 | hypothetical protein encoding ricin_B_lectin | pfam00652
ACHHYP_04908 | hypothetical protein | _
ACHHYP_05005 | putative D-lactate dehydrogenase | pfam01565
ACHHYP_05180 | hypothetical protein | _
ACHHYP_05326 | Cellulase | pfam00150
ACHHYP_05770 | hypothetical protein | _
ACHHYP_06287 | hypothetical protein | _
ACHHYP_06505 | papain family cysteine protease | pfam00112
ACHHYP_06977 | hypothetical protein | _
ACHHYP_07400 | hypothetical protein | _
ACHHYP_08323 | hypothetical protein containing PAN domain | pfam00024
ACHHYP_09221 | hypothetical protein | _
ACHHYP_09519 | hypothetical protein encoding ricin_B_lectin | pfam00652
ACHHYP_10824 | ankyrin repeat protein | pfam12796
ACHHYP_11025 | hypothetical protein | _
ACHHYP_11286 | aldehyde dehydrogenase | pfam0017
ACHHYP_11397 | hypothetical protein | _
ACHHYP_12628 | papain-like cysteine protease C1 | pfam00112
ACHHYP_13722 | hypothetical protein | _
ACHHYP_14385 | hypothetical protein | _
ACHHYP_15409 | papain-like cysteine protease C1 | pfam00112

Table 3. Proteins investigated in this study were sorted into one of 14 unique protein families, which are listed below. Note that all proteins investigated (see Table 2) are predicted to have a 5’ signal peptide and that nine of the 14 families include secreted proteins. Six of the families include proteases. The elicitin family of virulence proteins is secreted extracellularly and is unique to oomycetes.

pfam ID | Protein ID | Protein family / Conserved domains
00188 | THRCLA_03737 | Cysteine-rich secretory protein family
07648 | THRCLA_04285 | Kazal_2: Kazal-type serine protease inhibitor domain
00112 | THRCLA_08011, ACHHYP_06505, ACHHYP_12628, ACHHYP_15409 | Peptidase_C1: Papain family cysteine protease
00964 | THRCLA_11271 | Elicitin
00728 | THRCLA_11391, ACHHYP_01095 | Glyco_hydro_20: Glycosyl hydrolase family 20, catalytic domain
12937 | ACHHYP_01226 | F-box-like
13365 | ACHHYP_02169 | Trypsin_2: Trypsin-like peptidase domain
00704 | THRCLA_10997, ACHHYP_03044 | Glyco_hydro_18: Glycosyl hydrolases family 18
00652 | ACHHYP_04706, ACHHYP_09519 | Ricin_B_lectin: Ricin-type beta-trefoil lectin domain
01565 | ACHHYP_05005 | FAD_binding_4: FAD binding domain
00150 | ACHHYP_05326 | Cellulase: Cellulase (glycosyl hydrolase family 5)
00024 | ACHHYP_08323 | PAN_1: PAN domain
12796 | ACHHYP_10824 | Ank_2: Ankyrin repeats
0017 | ACHHYP_11286 | aldehyde dehydrogenase superfamily (ALDH-SF)

Table 4. List of seven proteins from the Achlya hypogyna and Thraustotheca clavata genomes and putative homologs found in the Arabidopsis thaliana plastid proteome. Reference refers to functional studies of the genes identified in this analysis.

Oomycete protein ID | A. thaliana plastid proteome ID | Gene annotation | E-value | Reference
ACH_05770 | plp_at_01492 | disease resistance protein related to DNA-binding protein WRKY1 | 2.00E-20 |
THR_04952 | plp_at_01492 | disease resistance protein related to DNA-binding protein WRKY1 | 7.00E-23 |
THR_08011 | plp_at_00089 | cysteine proteinase RD21A (=thiol protease RD21A) | 4.00E-53 | Shindo et al. 2012
ACH_15409 | plp_at_00089 | cysteine proteinase RD21A (=thiol protease RD21A) | 1.00E-24 | Shindo et al. 2012
ACH_12628 | plp_at_00089 | cysteine proteinase RD21A (=thiol protease RD21A) | 1.00E-18 | Shindo et al. 2012
ACH_06505 | plp_at_00089 | cysteine proteinase RD21A (=thiol protease RD21A) | 9.00E-47 | Shindo et al. 2012
ACH_11286 | plp_at_00466 | aldehyde dehydrogenase (ALDH) | 6.00E-17 |

CHAPTER 2: Do chromalveolate genomes encode ‘green genes’?

INTRODUCTION

One of the most vexing problems in eukaryote systematics concerns the interrelationships among the so-called ‘chromalveolates’ (Archibald 2008; Cavalier-Smith 1999; Green 2011; Keeling 2004). The Chromalveolata is a paraphyletic taxon whose members can be divided into two groups. The first group (the SAR clade) includes the alveolates (apicomplexans, dinoflagellates, and ciliates), which are sister to the stramenopiles (including phaeophytes, chrysophytes, and oomycetes). In turn, these two clades are sister to the Rhizaria, a group principally comprised of free-living amoebae (Burki et al. 2007; Hackett et al. 2007; Lane and Archibald 2008; Rogers et al. 2007). The second group, the Hacrobia, includes cryptomonads and haptophytes and lesser-known relatives such as the telonemids, centrohelids, and picobiliphytes (Burki et al. 2007; Elias and Archibald 2009; Hackett et al. 2007; Okamoto et al. 2009; Rice and Palmer 2006; Patron et al. 2007). The exact relationship between host cells and plastids belonging to members of the SAR and Hacrobia clades is unclear (Baurain et al. 2010; Harper et al. 2005).
Despite these uncertainties, it is clear that all photosynthetic chromalveolates possess secondary or higher-order plastids, bounded by three or four membranes, that are ultimately derived from a red alga (Hackett et al. 2004; Janouskovec et al. 2010; Khan et al. 2007; Yoon et al. 2002; 2004; Sanchez-Puerta et al. 2007). How these plastids were acquired is a contentious issue, but most recent models reflect a growing consensus that multiple independent origins and/or serial endosymbiotic events best explain most recent data (Bodyl 2005; Bodyl and Moszczynski 2006; Sanchez-Puerta and Delwiche 2008).

The understanding of the evolutionary history of chromalveolates has recently been further complicated by the unexpected discovery of so-called ‘green genes’ in chromalveolate genomes. Whole genome sequencing and EST studies have revealed that the genomes of chromalveolate species encode hundreds or thousands of genes apparently derived from within the green algal lineage (Moustafa et al. 2009; Tyler et al. 2006; Woehle et al. 2011). For example, the genomes of the diatoms Phaeodactylum and Thalassiosira reportedly contain thousands of genes whose phylogenetic affinities lie within green algae (Armbrust et al. 2004; Bowler et al. 2008; Chan et al. 2011; Moustafa et al. 2009). Putative ‘green genes’ (albeit fewer in number) have also been detected in the genomes of other chromalveolates examined (Cock et al. 2010). The presence of ‘green genes’ has led some authorities to speculate that the last common ancestor of the chromalveolates once harbored a green algal symbiont that was later replaced by a red algal symbiont that gave rise to the chlorophyll a+c-containing red-type plastids that characterize most extant chromalveolates (Armbrust 2009; Dorrell & Smith 2011; Frommolt et al. 2008; Moustafa et al. 2009).
In short, the green genes found in chromalveolate genomes are hypothesized to have been obtained via endosymbiotic gene transfer (EGT) (Huang et al. 2004; Reyes-Prieto et al. 2008; Slamovits and Keeling 2008; Tyler et al. 2006).

Other studies imply, implicitly or explicitly, that the green phylogenetic signal in chromalveolate (particularly diatom) genomes may be more apparent than real. Biases associated with the heuristic phylogenomics pipelines needed to construct genome-level trees and the uneven distribution of protein sequences for eukaryotic taxa have been previously described (Stiller et al. 2009; Woehle et al. 2011). In this study, two chromalveolates, the nonphotosynthetic stramenopiles Achlya and Thraustotheca, were canvassed for proteins of putative green algal origin. These proteins were annotated and combined with homologs from other oomycete genomes or expressed sequence tag (EST) databases and with homologs representing all other available eukaryotic taxa. The phylogenetic trees obtained were used to (1) determine if nonphotosynthetic, aplastidic oomycetes encode green algal genes similar to those found in diatoms and other chromalveolates (note that if oomycetes are ancestrally nonphotosynthetic then their genomes should not encode ‘green genes’) and (2) critically reassess the veracity of green genes found in chromalveolates in toto.

METHODS

The genomes of Achlya hypogyna (ATCC 48635) and Thraustotheca clavata (ATCC 34112) were sequenced and assembled, yielding 17,430 and 12,154 predicted proteins, respectively. Green genes possibly obtained by HGT or EGT events were identified using evolutionary gene network (EGN) analyses as described in Bittner et al. (2010). In brief, all sequences were BLASTed against one another.
Sequences were connected in the EGN connected components graph when they showed a minimum similarity, defined as a BLASTp E-value below a threshold and a sequence identity (BLAST identity percentage) equal to or exceeding user-determined limits. For example, an EGN with user-defined parameters of ‘1E-20 at 80% similarity’ connects sequences that have BLASTp E-values below 1E-20 and sequence identities equal to or greater than 80%.

In this study batches of networks were separately constructed with minimum threshold protein identities of 35, 45 and 65% and an E-value threshold of 1E-20. Networks including oomycete proteins and one or more protein sequences derived from representatives of (1) the green algal lineage (GAL) or (2) Fungi were selected for further investigation. Annotations for candidate HGT/EGT proteins in the Achlya and Thraustotheca genomes were then refined using NCBI’s conserved domain (CDD) and KOG databases (Marchler-Bauer et al. 2007; Tatusov et al. 2003) and then used to drive BLASTp searches aimed at recovering more distantly related eukaryotic homologs from GenBank. Homologous sequences from representatives of all available eukaryotic lineages were selected and aligned using “Geneious Alignment” with default settings in Geneious v5.5 (Drummond et al. 2011) and manually edited as necessary. Thus, each protein alignment included all sequences in the EGN of interest, as well as a number of more distant homologs from other eukaryotes. Maximum likelihood trees for each protein alignment were constructed using PhyML (Guindon et al. 2010) with the WAG substitution model (Whelan & Goldman 2001) to account for heterotachy, and with 500 bootstrap replicates. Bayesian posterior probabilities were calculated using the MrBayes plugin for Geneious, run with default settings and the WAG substitution model.
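The rule used to connect sequences in an EGN (an E-value below a threshold and an identity at or above a minimum) and the subsequent selection of networks that mix oomycete and green algal sequences can be illustrated with a short sketch. This is not the EGN software used in this study. It assumes that all-against-all BLASTp results are available as (query, subject, E-value, percent identity) tuples; the function names, taxon labels, and toy hits are hypothetical.

```python
from collections import defaultdict


def build_network(hits, max_evalue=1e-20, min_identity=35.0):
    """Connect two sequences when E-value < max_evalue and identity >= min_identity."""
    graph = defaultdict(set)
    for query, subject, evalue, identity in hits:
        if query == subject:
            continue  # ignore self-hits
        if evalue < max_evalue and identity >= min_identity:
            graph[query].add(subject)
            graph[subject].add(query)
    return dict(graph)


def connected_components(graph):
    """Return the connected components of an undirected graph as sorted lists."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            members.add(node)
            stack.extend(graph[node] - seen)
        components.append(sorted(members))
    return sorted(components)


def mixed_networks(components, taxon_of, wanted=("oomycete", "green")):
    """Keep components containing at least one sequence from every wanted taxon."""
    return [c for c in components if set(wanted) <= {taxon_of[s] for s in c}]


# Toy all-against-all hits: (query, subject, E-value, percent identity).
hits = [
    ("oom_1", "green_1", 1e-30, 70.0),
    ("oom_1", "oom_2", 1e-50, 90.0),
    ("oom_3", "fungus_1", 1e-25, 40.0),
    ("oom_3", "green_2", 1e-10, 80.0),  # fails the E-value threshold
]
taxon_of = {
    "oom_1": "oomycete", "oom_2": "oomycete", "oom_3": "oomycete",
    "green_1": "green", "green_2": "green", "fungus_1": "fungus",
}

# One batch of networks per identity threshold (35, 45 and 65%).
batches = {
    identity: connected_components(build_network(hits, 1e-20, identity))
    for identity in (35.0, 45.0, 65.0)
}
green_candidates = mixed_networks(batches[65.0], taxon_of)
```

Raising the identity threshold removes weakly supported edges, so a network that links oomycete and green algal sequences at 35% may split or disappear at 65%. Candidates retained at the higher threshold are the ones worth carrying forward to tree building.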
RESULTS AND DISCUSSION

Because they are ancestrally aplastidic, oomycetes are a perfect foil for examining the hypothesis that chromalveolate genomes harbor varying numbers of green genes acquired via EGT from an ancient green algal endosymbiont (Dorrell & Smith 2011; Moustafa et al. 2009). Genes of cyanobacterial and/or red algal origin were originally reported for the genomes of Phytophthora ramorum and P. sojae, but it has since been demonstrated that these genes are very unlikely to reflect cyanobacterial or red algal contributions to these genomes (Tyler et al. 2006; Stiller et al. 2009; Woehle et al. 2011).

In this study, 12 protein-encoding genes from the Achlya, Thraustotheca, or other oomycete genomes that, based on EGN analyses, are closely related to genes found in green algae were examined (Table 1). Three exemplary EGNs are depicted in Figure 1. These networks indicate that Phytophthora spp. share one or more copies of the pyruvate phosphate dikinase (PPDK) gene with the green algae Chlamydomonas and Volvox (Fig. 1a). The PPDK gene is, however, absent from the genomes of Achlya and Thraustotheca, and this observation, coupled with the current understanding of oomycete systematics, implies that PPDK was likely acquired in the Phytophthora lineage following the pythialean/saprolegnialean divergence (Beakes & Sekimoto 2009; Sekimoto et al. 2009). In any event, the PPDK network clearly demonstrates a putative green algal gene in Phytophthora spp. that is unknown in other oomycetes. If Phytophthora spp. PPDK genes were acquired via EGT, then this observation is most parsimoniously interpreted as a recent event, not one that can be associated with the presence of an ancient green algal symbiont.
All oomycetes examined encode single copies of eukaryotic translation initiation factor 5B and an aldehyde dehydrogenase whose most similar homologs are putatively found in the bryophyte Physcomitrella patens (Fig. 1b and 1c, respectively).

Maximum likelihood (ML) trees for six of the 12 oomycete proteins of putative green algal origin examined in this study are depicted in Figures 2–7. These six were selected for demonstration because they are the most taxon-replete and best supported; trees for the remaining six proteins are equally problematic, or worse (see below).

A tree composed of DEXDc homologs is presented in Fig. 2. The EGN for DEXDc implies a green origin for this gene in oomycetes, specifically uniting oomycete homologs with the sequence for Chlamydomonas reinhardtii (not shown). Note, however, that in the tree the C. reinhardtii DEXDc sequence terminates a very long branch and that, when other eukaryotic homologs are added, the oomycete/green relationship becomes less clear. In fact, this tree implies that oomycetes share a common ancestor with the Opisthokonts (fungi and animals), a result clearly at odds with the current understanding of eukaryotic systematics. In summary, (at least) two phylogenetic errors are apparent in the DEXDc tree: long branch attraction and a topological error that can likely be traced to problems associated with taxon sampling, i.e., clear homologs to the algal, plant, oomycete, and fungal DEXDc genes have yet to be identified in other eukaryotes. The same issue, taxon sampling, and specifically the differential distribution of homologs among eukaryotic lineages, also plagues the RPB tree (Fig. 3).
Bearing in mind that protein sequences for animals, fungi, and plants far outnumber those available for other organisms, the RPB subunit II tree implies that the alveolates are sister to a clade including stramenopiles (brown algae, diatoms, and oomycetes), animals, and green algae + land plants (Fig. 3). This topological error is likely compounded by the fact that the alveolate sequences terminate long branches whereas the embryophytes terminate shorter branches; heterotachy is a well-known source of phylogenetic error (Kolaczkowski and Thornton 2008; Pagel and Meade 2008; Philippe et al. 2008; Shalchian-Tabrizi et al. 2006). The ALDH tree implies that the stramenopiles are not monophyletic; green algal sequences are nested within a clade including sequences for alveolates and stramenopiles (Fig. 4).

These same types of phylogenetic errors are evident in Figures 5–7 and are not described again in detail. What these trees clearly demonstrate, however, is the pervasive influence that the vast number of sequences available for fungi (80+ complete genomes) may have on phylogenomics studies (cf. Stiller et al. 2009). The TOR-containing kinase tree suggests that green algae may not be monophyletic and that green algae and stramenopiles are, again, sister to animals and fungi (Opisthokonts) (Fig. 5). The unorthodox relationships among green algae, oomycetes, and fungi are also recovered in the YAK1 tree (Fig. 6). The ALS tree is equally vexing and seems to suggest that the chromalveolates (in toto?) may have obtained their copy of this gene via horizontal gene transfer from fungi (Fig. 7).

Green Genes in Oomycetes and Other Chromalveolates?

On the basis of the data collected, the notion that chromalveolate genomes encode hundreds or thousands of genes derived from green algae is false.
Critical analyses of protein-encoding sequences of putative green algal origin from oomycetes and other chromalveolates yielded trees seriously compromised by a number of obvious and well-known sources of phylogenetic error. These included, at minimum, biased taxon sampling, long branch attraction, and heterotachy. This argument is bolstered by the curious fact that so-called ‘green genes’ can be detected in oomycetes even though these organisms are ancestrally aplastidic. These results, and those of Stiller et al. (2009), suggest that these biases are currently so prevalent that broad-scale evolutionary scenarios drawn from phylogenomics studies need to be interpreted with a higher level of skepticism.

Table 1. List of 12 annotated proteins from the Achlya and/or Thraustotheca proteomes or other oomycetes found in EGN connected components graphs clustering with homologs from green algae.

Protein                               Annotation
TOR-phosphatidylinositol kinase       phosphatidylinositol kinase, putative target of rapamycin (TOR)
Yak1                                  PKc-like superfamily, Yak1-like protein kinase
acetolactate synthase (ALS or AHAS)   TPP_AHAS [cd02015], thiamine pyrophosphate (TPP) family, acetohydroxyacid synthase (AHAS) subfamily
DEXDc                                 DEXDc superfamily, pre-mRNA-splicing factor ATP-dependent RNA helicase PRP16
RPB                                   putative RNA polymerase beta subunit, cd00653: RNA_pol_B_RPB2
RRM                                   RRM superfamily, PREDICTED: cleavage stimulation factor subunit 2-like
RRM2                                  RRM superfamily, PREDICTED: similar to RNA binding motif protein
Sm_D1                                 Sm-like superfamily, small nuclear ribonucleoprotein D1
Sm_E                                  Sm-like superfamily, small nuclear ribonucleoprotein E
thioredoxin peroxidase                thioredoxin-like superfamily, cd03015: PRX_Typ2cys
threonine protease                    threonine protease family T01A putative, cd01911: proteasome_alpha
ALDH                                  ALDH-SF superfamily, cd07084: ALDH_KGSADH-like

Fig. 1.
Three examples of putative green genes in oomycete genomes based on EGN analysis conducted at 65% protein identity. (A) All species of Phytophthora in this analysis share a copy of pyruvate phosphate dikinase (PPDK; P. infestans gene ID 03724) with Chlamydomonas reinhardtii and Volvox carteri, two microscopic green algae. Note that PPDK is not encoded in the Achlya or Thraustotheca genomes. (B) The moss Physcomitrella patens shares both eukaryotic translation initiation factor 5B (P. infestans gene ID 20386) and (C) an aldehyde dehydrogenase (P. infestans gene ID 00034) with all oomycetes included in this analysis.

Fig. 2. DEXDc ML tree: Oomycetes are shown sister to animals, sharing a common ancestor with fungi. The phylogenetic errors demonstrated include long branch attraction and topological error due to sampling bias.

Fig. 3. RPB ML tree: Alveolates are shown as sister to a clade including stramenopiles, animals, and GAL. Long branches in the alveolate clade and short branches in the GAL, stramenopile, and animal clade are indicative of topological error due to heterotachy.

Fig. 4. ALDH ML tree: Stramenopiles and GAL are shown as non-monophyletic. Long branch attraction between GAL, stramenopiles, and alveolates is likely responsible for the phylogenetic error.

Fig. 5. TOR-containing kinase ML tree: Stramenopiles are sister to GAL and are shown sharing a common ancestor with animals. Heterotachy and topological error due to sampling bias are demonstrated.

Fig. 6. YAK1 ML tree: GAL and stramenopiles are shown sharing a common ancestor with fungi. Long branch attraction between GAL and stramenopiles, heterotachy, and topological error due to sampling bias are demonstrated.

Fig. 7. ALS ML tree: A two-clade tree is shown, making inferences about the relationship between the two clades impossible. Phylogenetic error is likely due to the abundance of available fungal genome data (sampling bias).
31 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 SUPPLEMENTAL INFORMATION Table S1. Selected hypothetical proteins (n=16) from the Thraustotheca clavata genome possessing putative 5’ transit peptides. Chloroplast transit peptides predicted using ChloroP (v.1.1) are shown in bold face. Transit peptide sequences predicted using SignalP (v.4.0) are underlined. >THRCLA_02069 MVRISALLGTFALIHAQTTTAPPASASNSWTMTTVNSIQARVVSDAATWDATNKKFG LVMKQNTVTFPDQYRAAMDTVNTASVEGALFYVQTEGINKQFDVNCMRKTNMSYIWF LNVTIVQPTFAIAEYADNGGVVPEYGKFIAMDNGQCTPLDTKGTMSDECMTLGGLNYH ANIGPFIGGEPRKEHLLAKYPDNIWFSYPNSCFTKTFIAKDTKCREAQKGGLCPLGVQP DGIKCTYSFDILGYIRIDELVGITNLTNSQTGQKYKDRVEFCKDSKVEFDFSTMKSDLTF WDNPTDEAANTNRTTKMLELYNNLIKTGTGDAAYMKSLPTAAELTAKNPPCWKNSPIC ATAEFGCRRKLTAQICEKCTSASPDCKKPTSSDSVPPKLTKAVAPPLPTDASGKTTVP RNPTGAGGNGNAAAAESSASSLVAFTSLIITLAALFA >THRCLA_03737 MKSTFVLLAAISLVNASSSTKLRGAAPCPNSNSGSSDNSSDYSGSESNWDSGSGSD WDDCGSGSTSTSDSGSNDYPSNWDSNSGSDTTEEPATYAPAPTSAPTSAPTETPAT SKGTLKEQIIHQTNLIRAAHGLGPVKWNDELAAKMQAWANSDPQQNGGGHGGPPGN QNLASFDVCNDNCMRMTGPAWAWYSGEEKLWDYDANKSRDGIWETTGHFSNSMDP GVNEIACGYSTFYNPQIGHDDSLVWCNYLGGNNGVIPRPRIDQATLEKQLTSAY >THRCLA_03876 MNLKAWILSVAIASAAAASGSSSGSGSTTDAPLTQENLSSRPGLCNTSKDCAKYTKG SNVYSCIAVKSNIVNLTTLKQCVLGDGCSGGKAGSCPTFTSWPQKFRQVQPVCAFVA VPNCNSAVNSQGQVVSVRSLREQAAKPGNVTCFQAKFGSNSSSSDDSATVYGIYQCV DKKLYAEKNLGYLDNTPKQLQSCAGNVTVVNGQSVSNVLCNGHGTCVPQTDFSDIYK CLCSTGYSDKDNCGAATGNVCSAFGQCGNGNCNPDTGKCVCPYGSTGDQCSKCDP AQNNNASVTNMCNGNGKCGIDGTCQCSDGYLGTNCETQIKKNSTASSATGSTTSSKK SAASGLHEASIAIFSIATIFAAALI >THRCLA_04285 MQIKSIIATLTLAALAQADNNNCEKSCTKELSPLCASNNETYNNLCLFQIAQCQQPTLTI SANQSCSTNVKFCTRLCPTVYQPVCGSDNTTYPTECDLKNKACNNPSLTVTKQGACD NCPKACLEILAPVCGSDGKTYDNTCFLLKTACANPSLNLTFVSTGSCTNGNNTTTTAPP SGTTLPPSGTTLPPTTSGNPSTTTTPPTTKPASSATTAMLSLMSAAAIAITYML >THRCLA_04386 MKWQVALLSLVTSGIAQDHCGSTTVPTIVPTPAPTLAPTPAPTPAPTPAPTPAPTPAPT PAPTPAPTPAPTPAPTPAPTPAPTPAPTPAPTPAPTPAPTPAPTPAPLAVATATWTNLW 
SDIVQVATDNTQICIRETNGDVDCKPWSTDSSLPTVYGGHSSNFLATGGGWSISTVNN VNYLVVISPLYNANVMVLDEAILYAATDGATCCITTSTFRCASQKLDMTFVKMTDKYITS SSIYNAVIYGVDAQGKLYKGSTASISTGVANWQEVSTPCPFTQVSYDGTTLCGLYAST NTIVCTSGTLSLQPNWVALQSNKWKQFSITQSYIYAVDTSNNVQRLQISQPIAVAP >THRCLA_04952 MTLASSPTFSRPLLLPPLTSALSPSIAQQMKRQHECEGGGSVKRHCSTFPYMEMPRL PSITQPSSHIGYLSESYYPSPTSLPMLPPASTLLQQATRKSMDLVPSNAYAPTLPEPCT LYKSNENTKPSPSNEEVRGECLDAQCHNSVKHRGYCKLHGGARRCDVPGCPKGVQG 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 GNLCIGHGGGKRCRFPGCSKATQSQGLCKAHGGGVRCKYDGCNKSSQGGGFCRRH GGGKRCSVAGCPRGAQRGTTCAQHGGKAQCMIDGCVRADRGGGYCEVHRKDKVC RQGYCNRLARIKCEGYCTQHHREFCITSPPQ >THRCLA_05863 MGSVLVLLFSPLHAWLTSSSSSSLPCLQTSFLLSQNASLDQIATAQPRINDAIVSFLAQ SNVQSRIQWTNVDITTSTIDGNGPAIVQMCLYVPPNTSVNQVAMAMSATVNWSGLKTS ISTLHRRFQLFDLTTPLQVLNQVTQYNFQRIPFPYVQWYLVVQGRFDYFWPQIKIKHAIA VLLNISSSSVIPQDIIFPPYDAYNDIATILPFAITQVNSSTFARTANTLTGPLQDILALHGILL LTQFPDPNGNGKLQQSVPWPEYPQLDPTSFYPFHNWTPVPNSFVVKLIYGGLLTLTN MSSVILQVLDVLDSPQTANFTDFQTLTLTYPPNNGTATFESSRYNTLDFIVAGDRSTLE ANQQTLGESLYQIGVSIFDVIDINSTMQTAQWYPYMQLDCPYNLSALASIIQRIALAAFF SIPLSSIQLIEIATNSTTFEIACNDTLEQRYLKKQLKETTRWSTVMNNFTANSAFCTIGGE SLAYPPMFPGSTYGWSQPSSSMDNTCSVNTIELTACDQCDRYLNAVCFTNPNCYQTQ TTLLSQLLVSSNASSVFQQLSLSTSANTKTLNTLALYYSCIAAFQCLIAPNTSIITSDEVYT IDINANGANFSTTLYYPQDDIYLVLNDQTTLEEIQINLSSSISNSIFVNVSGTSSSFNVTM DSVVIPFQLPVIAYSTVPATIQRISASIPQLVFLSNSSNDTTVLLNGKCTTCLTQMDECK MSPSCPSIAICWSNVVESAISQLDSVYSTLEISTQLISCYENASLEDFEMFLRVQKCLLQ SSCPISPTLESIVKGTMIVLRSTTGFQTIELTPTPAVTLTIGTESIILSSNSISGLQATMINFL SPLCQASIQSNTANLTIQFNDFGAPILPTINGTIYSQMPRIFLDRMPLDSSRFGFSYQSY KQLSPSSLPNAFTTTLNSNCQMCQNLFDQCLLSSFCASIISNFQNTIAGATNAFIGWSV ALQRLSFDIPEWDQFAQTLSCFEIHNCPINSTISMLKNGRMLLLSSTPVVLSVTFSSSPF EAAIYVQRFRQPINVSSNSSAAYIQGQFQMNFGSLALTNVSITNTSMELSLNSYYGPTP EFMVTSSEFSNKTIILGTSMVSVVSYSPAAYFPY >THRCLA_06099 MKFALVSSLAVLASAQTNNSSAGSNSNVNCPLQFTSACANTQECGTLNGYPLECQV YGSVKQCVCSKENANCQNSTNIANTIPQFGVCTGGKQCAGSGFKALQTPVRTCSEQL 
VCIPQYASGNELQSICHTCSSCKQQNKPDATGRLIFNCTQICPLGQGDPIVTIPPVTTAP TNSTKKNDSSKGSGSTAGSKPKSAATSIVAGVATVAIVAIASLF >THRCLA_07047 MILINLLFGLRLCTDGVSLLQQQVPRKPSKRTKQSRCKHVPFVASTALKPTHETLAPL MPLVVYQEVTENDMAHLISLVDNQDNQEDNEEITENVVADVFVPLVDNQVSQEANEN VVVDFADNQDNQEANENVAEFEPLVNYQDNQENVAEFVSLVDNQGSQEASENVVVE LVPSVVCRDSFEPTEEDVAAVLHGRFAANQAALLRVSSFQPADDRSLTAIQLIRYFELY HLVRMDYNQLRHLEPSRLEKIQLVRLSILERQAIEAMLSDVAELWSRQPNDVSSAKKLQ WFKNLQYGLMWDMLELLEHQKPDHHCARGLCPQLYQEKLDIIYSE >THRCLA_08011 MKTIFLTTALLASTSCALQMTNKERNEILDELNKWKQSAVGKAALVHNFLPSSQRQEG LSIDAKQDLEITRFAHTKKVVEQLNKEHKGSAVFSTNNMFALMSDEEYKKWVKGAFGR DHKKRQLRGENIQLELTAEQREASGIDWTSNKCMPAVKNQGQCGSCWTFASVGAAE MAHCLVTGNLLDLAEQQLVDCASDAGQGCQGGWPTKALQYITQTGMCTSRDYPYTA SDGQCNNSCKKTKLSIGEPVDIQGESALQSALNKQPISVVVEAGNDVWRNYQSGIVQQ CPGAQSDHAVIAVGYGSDGGDYFKIRNSWGAEWGEQGYIRLRRGVGGKGMCNVAE GPSYPSMSGKPNPDGPTDEPSNDPTDEPSNDPTDEPSNDPTDEPSNDPTDDPSDDP TDDPWNGSNDWDWGN >THRCLA_10855 33 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 MIVPVVLMALTGISITGVTLRWCCSHRQKTSWKEERKEPLLATPVPLPRIKTDVFIERSI AMDVPLMETCSGCGAWIDPSLAAIANGLCVVCSYQTIPSLEIDIDENISENDETSNDKDS DKPILTDEDTTADIESPKDVEIQIEESNFEDEMEDISTTSQDMVIPISQDNNCNEGDEDV EIAEEALALVQDMWDIAYQAHLGVGDDPTADIFVEMALDLDATAEAIKKEPHLLSESFH FLSLSLASLMELVPEAWVAHVEATELKFEALQFRYHSKLTVENCLDLATHLYELVECAQ EFGVDPAVASSLMDGLEELVEAIEETPCELVSWLAYLAATVKLLKSYQRDFEQAEMWD TVVECERNLEPLEMHCWEIYSPC >THRCLA_10997 MKASLCIATLAAMGSIASSRNIRHHAESVMGNPVQRRSESTRLPTHPLTGYWHDFPN PAGDTYPLTQITKDWDVIVVAFANSLGSGKVGFDVDPKAGSETQFIKDISTLKAAGKTIV LSLGGQNGAVTLNDATETANFVSSVYDLIKKFGFDGIDLDLENGISKDLPIINNLITAVKQ LKQKVGDSFYLSMAPTYGGIWGAYLPIIDGLRNELTQIHVQYYNNGGFVYTDGRTLNE GTVDCLVGGSVMLIEGFQTNYGNGWKFNGLRPDQVSFGVPSGTSAAGRGFVTPEVV KRALTCLVQGVGCDTVKPPKTYPTYRGAMTWSINWDSHDGYVFSRPARQALDSLGG SPPQPNPTAVNPTDAPNPLTNPPTSRPTNTPTVTPTQSPRPTSQPTSLPTSSPSSVPTI NPTPIPTSVAPQPTQAPSSSC >THRCLA_11248 MANTIQWLFIYCVIVASQGPPNNGERTCSVTLGGPVSQTSTAGTMSFCTAFPQERCC 
LPVHDEYVKSTFYALLDSGYICASATNTAIAHLQTMFCLACDPSMSLYLTPPRNTTFFS APQTLKVCRALAISFKQHIDAVSPYYFSDCGLTYAGDRNNLCIPKTAISPNMVFPGCSE GQNICYSTTQGYYSPIWYCSSSPCGPDTPFGLNDIPCSGPTCTPAFQFLNDNRAAKPP FFEPFAVEIIDESTCAPGESSCCMTDSSIVPTS >THRCLA_11271 MKTTAFVLALASTAAASSPCTGSAVITAVTPLIAQATTCSTDSGFDLVALISGTTPTDA QKQKFLTAESCKTLYASVQKSLAGITPACTIGDIDTSGWSTVSMDKGLDALIKSLPSLLA SSGATNSTSNSTANSTISSTTVSPSSTTAAPAKSGVAATGVTIAAVALTTAILHLNANKQ QEIHEHLRLTIKESDVETLGEVMSMSLIPAAEAHQFI >THRCLA_11391 MKLSILLAAFGVVASSSIPKHTYKCNDGVCVQTPLNGAGVSLGSPLLSLRMCEMTCG AGSLWPYPASVSLGTTATAIDTNKVSHSIKINGAEATSTLTNSIVQTFNEGVKAKTKWV RGQSEIGAISHSIYGTISSNNEVLGQDTDESYELSIDGPRVKINAATIYGYRHALTTLNQL IDYDELTNSVKMISKATISDKPAYSHRGIVLDTSRNFYPIESLKRMIDTMGANKLNTFHW HMTDSSSFPIEINGEPRLTTYGAYSAEQIYTQDQIRDLVQFAKARGVRIIPELDAPAHAG AGWQWGPKAGYGDLTLCYGADPWMNYCLEPPCGQLNPLNKQVYSVLDTVYKELTSL FDGDVFHMGGDEVSIPCWNSSKVITDHLKDTNKPGAFFDLWGDFQTKAAAMLNKKVM VWSSDLTTDPYLKYFEPNNTIIQLWGGSTDGDATRITSQGYDVVASYWDAYYLDCGFG GWVSKGNGWCAPYKSWQVIYDLDITANMTAANAKHVLGSEVAMWSEIADAHVVETKV WPRAAALAERLWTNPKTDWKSAMGRMRIQRDRIADAGIGADAVHPLWCRQNPGKCQ LV >THRCLA_11516 YTCVAVQTAIAGIALASQCVLGTTCGGNSAGQCPTFSSWSSSYQKIQPVCAFVNVTN CVNFIKAGSEAKATSGSGSTSTVNCYQATFSANNISQVVSGIYKCVDSGLYVSQNLGAI KNLTTTQMDVCAGNLTTSVGALCNGHGTCAPTAAFSSKYQCICNEGYSATDNCNVAT SNVCNAFGSCGAGNTCDTTSKQCSCTTGTTGPQCSLCDPTASSSVVCNGNGVCSSS GTCTCNSDYTGSLCSRTATTNSTGSNKSSSSSHLVASLATIATCLLAILM 34 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 Table S2. Selected hypothetical proteins (n=30) from the Achlya hypogyna genome possessing putative 5’ transit peptides. Chloroplast transit peptides predicted using ChloroP (v.1.1) are shown in bold face. Transit peptide sequences predicted using SignalP (v.4.0) are underlined. 
>ACHHYP_00269 MVRTLSLLLLAAGVAGQTSTTPVPTPAVSNPPFTMTLVNSIQARVVAEAATWDETNQ KFGLVLKQNTNTFEERYRAVMDTVNTASVEGALYYVQTEGIDKPLQTGCMRKTNMSYI WFLNITMVQPTFAIAEYQDNGGVVPEYGKFVAMDGGLCTPVGTETPLECLTYGGLNFN KNLGQWVGGEARKKNGRANYDDNYWFSFPNSCYTMRFDAKTKACRDLQKGGLCPIG TQPDGVKCTYSFDVLGYLAIDDLVGITSMKNTLTGQNFKGFSEFCKAGKTEYNFADSS SDLTFWNDPLEPAANANRTKVMMQKYNDLVQNGVGDQKHMKALPSVEELTKANPPC WKNSPRCATAANGCRRKLLSQICEVCSAPADDCKKPGPNDKAAPMLNKQFQPALPTD ATGNTKQPRAPNAAPLDAPAGGAGGNVIKGSGAAATSLILATAVGLVALAV >ACHHYP_01095 MLARLAALIGVAAALQVPFTTYECVRGRCEPRPRSFSPPDSASSLRLCEMTCGAGNL WPLPTSVSLGTTTRVVSVDYVSHTVTFLDNSVPISPLVGAIQRIFDNTLALKATECALAS VGGAELAVTASIESGNEVRDYFRTFTMAADDNTMVQELELETDESYTLTIVDGAATIHA ATVYGYRHALTTLSQLIEYDELSHDMHIISAVTITDAPHFAHRGIVLDTSRQYYSVPAIKR LLDGMGATKLNSFHWHFTDTASFPIEIKGEPRLTAFGAYHPRSVYTQQAMRDIVAYAR ARGVRVIPEVDAPSHVGAGWQWGKDAGLGELAVCFGHNPWTEACVEPPCGQLNPF NPHVYDVLETVYEELNEIFDSDVFHMGGDEVHLGCWNMSAAVTAHMTDRSPDAFYRV WGRFQMQARQLVGEKKIAVWTSDLTNAPYLRKYFDPASTIIQMWTLSTGSDAARFTA QGYPVIASYYDAYYLDCGFGNWLLKGADWCTPYHHWSVLYDLDVLHNVPAAQRNLVL GGEVALWSEEVDEATMDAKIWPRAAAAAERWWSNPVNGTWKDAIDRMRIQRDRLVD IGLQADALQPLWCRQNAGDLSQGSGISISATVKSKSEALTVDTDESYELSIDGPKVSIN AATVYGYRHALTTLNQLIDYDEISNSVKMIAKAKIADKPAYSHRGIVLDTARNYYSIDSLK RLVDTMGANKLNTFHWHFSDSSSFPFEIKSEPRLTSYGAYSKDQVYTQDQIRDFVQFA KARGVRIIPELDAPSHAGAGWQWGPKAGYGELTLCYGSDPWMDYCLEPPCGQLNPL NDHVYDILKTVFEEMHGLFDSNVFHMGGDEVSVPCWNSSKVITDHLKNTTSNAPFFDL WGTFQTKAGALIEKANKKIMVWTSDLTTDPYLKYFKPSNTIVQLWGGSTDGDAERLTS KGYEVVASYWDAYYLDCGFGGWVSKGNGWCAPYKSWQVIYDLDVRANLTATNAKRV LGSEVAMWSEIADEKAVEAKIWPRAAALAERLWTNPKTNWKSAMTRMRIQRDRIADA GVGTDAVHPLWCRQNPGKCTLV >ACHHYP_01226 MTALADAVWLAVMAFLDGQDLSRLMRVSRAHWRRLQAQVRRWREIQLGLGLGHWV QRNVRLTINTQVQEAQSLAVQRSPDARVPPRVETIQKELGPIEAERSVHRLTATTPLFT ATQQAVLVLSFDCTSADTKPLLVHTSQRARTLYTTLTLTIFDRTLRRHVYHKASGDLAT VPVAEKQAWTNAGATLRCDVASNDKSCQVQLGLPARLDGKIDCYHIERVDFTLHKREL YPVFSLPLEPSLPTCWIHLQFHDLARAQCLARVSAPCHALLEMAASRTDDTNHPARRT AVEQLEVATFRSTQPTSLPDISSLAKPGMISMVISGPERHQAFYHTAFGHSGATRKSDS AHVLAATWVPGVLEFAMYPDTLNRRVLKGIFTLEFAVSGALTSLVVLAQHLSPRRLLRY 
NARVASYSRRPEAERNEDA >ACHHYP_01546 35 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 MVALFLGTAIALALASSATGSFTGLAMPAANSSEPKSGQCKLMKLLPRATQFNVALS PRHYGRGGHCGRCVQTQCDRCAASAPIIAQVTDRASDVGLSKPMLRALFGSGAPSAV TWDFVDCPVNDPIALCTKPRNTSAYIIYVQPTNTVAGVQNMTIDGFRGRLTNASYHFKA PMPANWSNVRVSMKSFTGDAIAASVALRPGRCVTIPHQFSPSPAAASGTPAVIDYDGD EDADSITVPPPYK >ACHHYP_02169 MAWIVVLGILAHVATALQSSLCATSSAFSPPGCHANRRLATWSRAIVRLNAGGHVCT GWFVGSEGHILTAHHCIHKARAVEVVVEETPAQTCPPRTIRGRMTTGIDVVAFSVALDY ALLRPLNRSVRGPVHLQLHSSAADIVGLEAIVAQHVDASSPVVLSEAGRIVSTTFAGCG RRDRLAYALDTKASASGSPILSTATGAVLGLHTCGGIHCHGKSVPMWIVIGCSSEPGH WNSGAVAADVVADLRQRHHLPPDAVAHETLSAPTPSTIIVERGRLVQRAANTTSVDAY LLTMAMPGRVTLDLLAWTMDAQGRWHDLRRDCDGSFFDTKVILAVVDDADGRPLLRR IAENDNDTRHQGMGDGSIDNRDAFLDVYLASPGDYYVLVGTAAMLLPAVFAPRLSAPT DGGQHLYGCGNTRATEANYNLRITTDDGTLQRIEAPFPRTAACSSSARKCPAAHADTA LTLDAVVAGTLHRTYSSGTSMDHISFELTKAGRIAIDVVSYQEHTNGSIAIDGLHDVCGR AYLDTVLYVFGATIPSGEYLDPAALVATASDRPPTHVASQRYRSVSTRDPYVEVDLPA GNFTLVVGQQPLSLFEAVRVLYPGSRETDAPLLCGRPHPFGHYHVFFWVQHRRMLSA TMPGSFDHAACTHEVCSDSML >ACHHYP_02305 MKFTTLLVATVFGQNTTTAPSSAPTPAPTKCLLQFTSPCKSSSECGDLNGFNLTCIKS GSNKQCNFNGGSTVAKDNQFKAADNLVYQFGDCSTASCTTGHGFTEGLPTTVTCQE PLVCVKEINDNPGVVLKSQCHTCGSCKAQSLKDTRFDCSKVCPLTPAPTTKAPKVPGA TGSAASSGSGSETSAPATRAPKTGTPAPTAASSASTALVSGIAVVALAFAQLC >ACHHYP_03044 MAGLIVGILAAVGTFSGSGESISTGTSSTPAPTTHTPTTLSPSPTTKPTTVTPTPTLAN GLCPLRGMYLSGTSCVACPTPKKTFSVFWESQVDCSTFATSSAAAYVTHIYWSFALID PTTGTVSSTFQGSSATLKACIAAARAKCIKNYVSIGGATMRQTFVALNSSAQLTTFALS AAQVVQEYGFDGVDIDDESGNLLAGGDWKANALPNVLVYLQGLKTQLAALPRAATEP KYQITWDEFPTSLSTGCDLASGDYLRCFDVRIANIVDQVNIMMYNSASSTDYDNFLNVV TPTEWATAMPASKIVIGGCVGPIGTIGGCAFGAAPTATQLKAYASLLDPALHERLSRMD LGFMLDLARDELLVLLESEQAHNPGVAVREGEGREDKQQQRRVQREVGAEEVDEAH VGEERVEGGVRRDLAGVEQQ >ACHHYP_03052 MAAVSNPLLPLQLALADLLERPIHAALDDALRQPSNEQHLHHCVRSLPPSATVDALD ASLAFVVHARALLTICSDYLDQHIAPQHALKKITDLLSVSREIANDAEVNATADDADVDE AATDDSDQFASPKGEPPVGPWSGSETPAAPTSRQSWWAQIWGGDEDNDSAGDDVS 
APPEEETLPSLPVEVANTIASLAAFPTNLKLQLHGLEALVEYVHGPCCCESVGPLYAAP DMLPAVLHAISSLAQSKRAQIAGLSLLANPSSPKANMPMLPANLPTQQVRRLILRAMQR FKAHAQIQGLGCLALSNLCRGPAISESHALKARGCRLVWSSWLLALICASSGTSMRAH PLTGGPEDMQYAVLDAGSVAVVEAASRRFQDDDRVRKHADMALREMLQKHASRRAP QCAFQ >ACHHYP_04549 MRARAFFVLAGCATAAASPPLPWQSSCQVCAHTGRCGGASSPIKFCGTWPTGACC CSANVNCPTPGVHATCDCGFLADYPVDAALPPVADVLGYNFS >ACHHYP_04706 36 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 MRASVLAIAATVAAAANNQTATTKVFSLEVGTVGVHASRNQDSVLIPCKSNVCVPTG SATLEFCRKACNRETGEHDCTTNCACNGTTPGYMCAGICNKAKTADECGSPVFQTCS GEDLVPDYECANYKCTNHQRTNYLGANNRCANYERAAYPLPDIHARVRFVKLLNPGE AHLIEYYTGLYFGPGQNNANDGFIWNPSVGSIKSISGNSCLDAYVAVDHNVYVHTYPC DDSNPNQWWLYDSSLHQLRHKTHSTMCLDADPNDANKKVQMYLCSPGNANQYFDM RPILS >ACHHYP_04908 MTSVVAVTACLLSWLQRSRASPPVAYSAPNSVAFPAEIVHIKVILSRRRSSVLANGVL PPVAPPRRAAHHEGHLAPLTRDLLSDKSGNAPP >ACHHYP_05005 MSHCHFAFFVPMLARSLASFTRASRRCFSTEGPFEHRAVSAEVIAELKALYGDRVSTA ASVREHHGTDESYHTPSPPDVVVYADSTEEVSKILQIASASKTPVIPFGAGSSLEGHISA LHGGISLDLTNMKSVISVEQENMSCRVQCGVTRLQLESELRATGLFFPVDPGADATLG GMVATNASGTTTVRYGNMKSNVLGLTAVMADGKIIKTGSKARKSSAGYDLTRLFIGSE GTLAVVTEVELRLQGVPEAQKIAVCSFPTIQDAVDTCTVIMQMGIPVARMEFMDHKAIE ATNSYSKLNNIVSPCLVIEMNGTPEEIEHHTATVQALAEEYSVQRMSWAATEEDRKELL KARHSAWYATMNLVPGSRALSTDVCVPISNLTQVIVDTQADLEASNLVGTIVGHVGDG NFHVMLPFLPEDEPAVRAFSDRLVERALAADGTCTGEHGIGSGKIKYLRMEHGDSVDV MRTIKQALDPHNILNPSKLF >ACHHYP_05180 MYNTADSVAFLSLLTSTVRAITPLPPLQFRVQAKFATGPLPASKPSPSSFISVRFVWNI LVRLVVYRRRATPTPVDMAQERTVLA >ACHHYP_05326 MHCTFFLSIVTAALAGVAGHVQQRIRSGAVKARGVNLGSWLVTEHFMMPQSPIYQNV SADLQPLGEYVVTTALGRAVADPLFKAHRSSWITENDIKEIASFGLNTVRVPVGWWIYE DPNDSDWQAYSPGGIQYLDALINDWALKYNVAVLVGMHGAKGSQNGEGHSAPQLPG ESHFTDDADNVYTTMQSAKFIMSRYQSSVAFLGLEMLNEPTITPGRVYNIDRTKLIIYYT NLYSKLRAICSSCIIMLSPLLNEQYESFGNQWANVLPTGSNNWIDWHKYLIWGFENWS MKDIINTGTQWIANDITLWQSRRSAPIFVGEWSLAAAEGILGELKNGTNLNTYANRALA AMKEAKAGWTYWSWKVNATDWRSYGWNMQALLRAGVIDLKNA >ACHHYP_05770 MSKLSLAFLLHPTALACPPGPEAYVCPLSPETIVCPLSPRVSPASSARAKPKRSPPA 
PRSRPCKEPGCTKYAVTRGHCIAHGGGKRCSVEQCPSGAKSNGLCWKHGGSKTCS FPKCSNRSKTYGVCWSHGGGKQCADPNCTKTALRHGFCWAHGGGKRCRTEGCQR PAYERNDNLCDVHCAKAS >ACHHYP_06287 MQLSHILLFATAAAAQHTLLDSGTPEDRPSSWGSPVTKQIPSAVRFRSSGLCGEAQTI DYVDFMVNTDLADIKANATWIGVEICPSVEDVPACPPTSVAEQIPIEVRGKRTTLHWVP ATPKVLEPESLYWFIVSSNVENALQAVSWYPGSKRYGTDNDPKSDVASATRMLVPWG GMDWVVEPSGGVAPLDHRRVPNAKIVVKA >ACHHYP_06505 MIKSFTITATLLASASSLQMTNKERNELIDELNQWKKSQAGKTALVQGLLPPHPKTESF DANAKLEAELVRFATTKKVVEKLNAEHNGSAVFSTDNQFALMTDDEFKKYVQGAFGK PHKKRQLRGENIQLELTPAQREASGKDWTTSKCMPAVKNQGSCGSCWSFAAVGASA MAHCLVSGKLIDLSEQQLVSCASSAGQGCQGGWPNKALEYIAQTGVCTAADFPYTQS NGQCKQSCRKNKLSIGRPVDIRGESALQSALDKQPVTVVVEAGNNVWRNYKSGIVKS 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 CPGAQSDHAVIAVGYGNGFFKIRNSWGANWGEQGYMRLQKGSGGNGMCNVAEAPS YPSMSGSPKPNNDDNNMPDDNDD >ACHHYP_06977 MKISRVAVIGLLFVAARSTRAQSTSSSTQSNTETTSTESTPFSSSSSSGPAPIVDVIAAA IDAGATPKQAAIVAVAADTGASLAAIIQTAVDAGVSPSIASAVASAANSAAGSGADDVTS APITTVADAAVDAGATTAQAAAIANAASSGVSGDDLVNVAISVGVPASIASSVASAAGS AAGTPAPIADVIQAALDSGASLNQAAAIAVAVSAGVSVDDIQTAAIQQGLPASVASSIAS AVQSTIASAAGSTSADALAANGLGVTSASSTSYVPPSEVTPLKLTGAKDPEAASDVNS PEAYSFSAPMTSGSTKSSESPLSGISGMFNNIVALVTSAPSPAEEPKPRLRASCRTA >ACHHYP_07400 MKTPAFLASALFAVATGERPACGPDTPSPTMTPTADPTFAPTSGPTFPPTPAPGQWT SLGGFAHDISFDGTNVCVKNGDGAFCGFAGQPFDQWKPVATQLKDIEQVACAKGVAF VWGRSSGDLVMKTINLKTGEEHDAKMQDGESPRQFSTDGSVVCGTTNSRLFGAKVT NGALGAYSTISEDHEIYKTAVAGEFLIVAGYDGALQATLLDAENWDTFSFDVVPVDLRA REISTDGVDLCIVTYELDIACSKLSSGLEKWTKVPGEWKTVAVSNNTIYGVDFKSSEIRY TYLK >ACHHYP_08323 MVAWAWLPAAAAVVAATETHWSHLGNASSDRGLRIHTPITRADLHDEYNDAPVTQR RLSGSAASLFRAVAGYGFRGLSNAAIFSGVTLDMCASACVTDARCLSFDYEASTCYIA HTDRYAYPADFVPRATSTYYEWQGAAATPTIEPNGGRLTSYGAFQLFTTSRAAAMYY QFKSLENGTVTVYTLYSPGTTVTLPEYPCVVQAYTTKAGLSDSIVLVSNAFTVYAARYA YLVPFYNGLGFHGLVTRVQLDVQGVKRPRPSRVLEFTDINSTLGIGPFRGQLSTINLTA YDARLAGFFDAFTGITTTLCPQVESRVAVSTVTYVNVSLQVFQNASRWVLVPAPLYAS APGDLVFSSSVSLVEEYLYLCPHQNAKGHAGVIAKVNLRAFNATSHLPFQPAIEMLDLT 
VIDPSLTGFGSCFANRNYGYFVQRRNAAGLAGQIVRVNLDLFAQPALAVTVLNATTFD ARFVGFSGAVVYKNVAYLVPFERNKVGLELNPNYKYFPTPTSSIMGRLDLTTFSTVTPV DLSVLDVKYACGYFGGFTVSYYVYLVPNMWTTDTTSPGVNPYHGLVARLNTLTMNVE SLDLTLVDPSLKGFMRGFAFGRYAILVPHRNGLTTELPVRLNKSQKNNLGTIVAIDTDNF TPSGVRYLDLTLALRSQIPNMPDADLRGFIGGGVSGEYGFFVPYFNGVRFSGKVVRVN LRKFGEVQVLDMTQVHTSLRGFTNAVFPQLYEPTVTSLWNYVIPDGTQTPYTFITVDV >ACHHYP_09221 MVSVTTPSMTLLGAIALVAGQATVAPTTATPSAPSASPTKGPWAFKSVRTVQARVQA DVPVWDAAHKEWVAVFPQNTVTFEQRYRAAMDTINTATVEGALFYVQTEGIDKAVQA ANGCMRKSNMSYIWYYDIEVVQPVYSVAEFGQNTGYAPEYGPFIAMDNGMCTPTSGT TVPQGCMQFTGLAGNIALGNYIGGEPRTKHQYANYANNYWFSYPNSCFTKSFTAKTD ACRNSPMQKGGLCPYGTKPDGINCTYSFSVLGYLSIDDLVGITSTVNPQTGKAFSNHM EFCKAGKYEWDFTTSTGLPFWADPLNVTANAARSAKMMDLYTAKVAAGVGEYANMK PFPKVSELVAQNPSCSDNSPYCAKQPHGCQRSLLGQICVPCSSASPSCKPPTRAFPA LPVATTPPPVTDAAGNVVPMSTNLLGQAVPATSSASTVAFSATAAILVLALA >ACHHYP_09519 MIVSAIVFAVLASAAGQSPLKIASSVPYALTIDGSAPVSTVISNTRATSLSVHIASMNLP PGATLTIGTVDGKDKVVYTGAHTNLVSDYFIQNKVVVSYAAASYSNNTTPLVAIDKYFA GTPDAGGLESICSTTGDLSRPAACYATSEPVKYAKARAIARLVIGGSSLCTGWLFGSE GHLLTNNHCINNDRLAASTQVEFGAECASCSDGSNNVQLACKGTIVASNVTLLATSSK LDFALVKINLNAGVDLSKYGYLQARDSAPVLNEPVWLAGHPQGDPLRMAVATSNNAE GAIVSTNVTDSCKDNQVGYLLDTQGGSSGSPVMSTVDNSVVAIHNCGGCDSETPSNG 38 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 GIPLTKILAYLRANNIALPKNSVSAAPAPTTAKTTTASPATPAPSTSAPATAAPKPPTFTL CSVSNKVISEYYTGLYVAPAGHTANEQFSYSPDTGAIQVQSNGQCLDAYWGGSSFLV HTWPCDRGNNNQKWTVANNQVMHRVHGVCLTSVAGSKSLGVAPCNAADVRQWIYT NCDTANVRNFVQLRTPRGALVSEWYSSVLAKQPQSSWTELWEINGQQMRSFSGSTC LDAYWDNSRFQVHTWQCDPTNGNQQWRVGNSVVAHATHSNLCLDVDPTDPRQAAQ VWGCHSATINSNQLFDVVAF >ACHHYP_10824 MASISQWLCLSCWAPMSTPKTTMATDAWCGTFWKHMLMSVSVTPPPACDCSTDGA TALFFAAQRGHSDIVYLLMSAGATAEESTLGISPKQIAQANGHTIVAAIFDTLPPPLPHRL HWERSSVLFLSSFLVYRCNLLLLRH >ACHHYP_11025 MHARFFAPVLGTLSLVAGSATTLAVNSSRTPQVNAQVRRLSKRALPRDMGKSSTSA QAPEGSSKPDMMKDFPIFLFTIE >ACHHYP_11286 MASESTPLLALLELPLLKPTSAETIQGHVTALRASFISGAMRPLAARKAQLRAIRALVE 
DGCEILQAAMWKDLHKHAAETFVTETSSVLLEVQDHLDNLDDWAAPHKVGTNLLNLP GSSYIRSDPLGVACIMDTWNYPIMLLLMPLIGAI >ACHHYP_11397 MDRLLLLSALATAVAVDDAAPRPSRAPLPTTLVPWGSPLAAPTAPCTWGGRAHALD WNLTTSVPGSRQCFPNLFAADQPLEFPYPRSSYNYDLDPPVVGPRVQVQWTNGVTN VTAPVAAFDYRTFEMTGDELLFHALPDAPGVYRLAVQAFDWDRASSECRACLAVTDQ VRPRATVARAGLCGASTTAPYSPEALAAADDRVRALVRYRATATNNDACSDRRCDAV TVAQTGFLSAFPTAVVDGANAAVDAVPDGWLGCLAAPLSARERQRLTTPLALVDDAR DYFVALQELYTPFRCGAPPGRPTCAGAASETCALMQAVVLPASHLVARVAVKLKATAG HIADPAAAFPGAGYLPPSARHLHLAIPCYPTNASFSSFCADTVEWRVSDLFELSAELNA SQPWGFDAAAPLVTWFVQQGPAWVAVADNKRLAFDKFQDTLVFRAMTPCGQVGEDI AWTVFSHRAEALSVDAWWNSLWSCGGCNVPKADFSVCRFRFDPTSPLVSAMLHPPA SCRDAAGRSCRNGCLARGQCNGRSTAASCGQQAGATWCDARGSALLAAAVPRYSL RSLQCVWQYANTSSANWSVAVDVAVDTAFALKLRNADATELSVSCTLTFDPDTGEPA VVKTRSLALSLRNCDGPRFEDHALAFVKDRCDASWRPGVGRQPAPRQACAGHLVFP STTDAAATVLLTPADDLACCSGPVAAFSCQPLPGHPGLKQCQRADTATALLAAEPQA WPPVALAASLALVFVLVRRRRQPSDTDLSRPLIDGDRC >ACHHYP_12628 MIVQILALAATASAFTKCHIRHPNRTEVLSTPCPHEYVTELPASFDWRNVNGTNFVTV SRNQHVPHYCGSCWAFAATSALSDRVRIARERNSEGKDRVLVTRQVNLSPQVLLNC DKEDMGCHGGEGLSAYRYIHENGIPEEGCQRYLATGHDVGNTCTAIDVCRNCEPSKG CFPQPSYDTYHVSEYGAVDGEAKMMAEIFARGPIVCGVAVTDEFLNYSGGVIDDKSGR TDIDHDISIVGWGVDGSGTKYWVGRNSWGTYWGEEGWFRLRRGNNNLGVETDCAF GVPADDGWPKRHTETTSPAKAAVWSGEIKSLLQPSRAQAKSRAPVHFVGGEKVLSPR PHEEIDVLALPKQWDWRNIAGINYVTWDKNQHIPQYCGSCWAQATTSALSDRIAILRN ASWPEIALSPQVVVNCHGGGSCEGGNPGAVYEYAHRHGIPDQTCQAYVAKDGQCNA LGVCETCWPTNSSFTPGKCVAVPKFKSYYVAEYGHVRGADKMKAELYKRGPIGCGM HVTDKFEAYTGGIYSEKTWFPIPNHEISIAGWGFDEATQTEYWIGRNSWGTYWGENG WFRIKMHSDNLGIEGDCDWGVPIPDGSQPLL >ACHHYP_13722 39 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 MKCFAVLAFAAFAAAATSEQAATTQPATTTAAVTTAAPNTTTVVPLVSTKAPNTTTVTP APTTKAVTTVPVTTVKANTTAPVTTVPVVTQTNVTSPDETETPEPVIEQPTDAPLPVPT KKKSNATTVPPSASASISMLSVASVAVAVAAYVM >ACHHYP_14385 MATSVLALCFSSLTANSTNTPEPKYQTRTVDTVVYESSAKWPKYMGKGSAIQMYTTA ALSAQILVSFPETTTVLEKVATVGPLVSLSAVIFFGAKYLGERVITNVTSCRTVGQRGIT DAIYLYLDEFLKIQVAGGLRPKTFECYPKGVSALRLISYLKLVSKDENGMCNVKINRTTF WLDLGKAQVHQEQSLKILLDGKPLLVRKGKIKKAARA >ACHHYP_15409 
MGLFAPVLAFATVAVAGSSSTTLPTAPASLSTTRSVPLTDRAALIQELAKWKDSKAGK YAAANGFLKLSRLESAGDAEAELAAFAETKATVEALNQQYPLARFSTENPFALLTNDEF ATWVSGGRDKVQRKVPEASTTQSTTASIAPGTVDWTMSGCVASVRSQGVCGSCFAF AAVAAAESAYCLLHDRHLTPFSDQQVLSCGPGNGCMGGWSDQSLAWMASHGVCTG ASYPHTNDWNTTAAACIPECKALSMPYSSVASVAGEHELEAAIALQPVAVDISATSPVF KNYESGIITGGCNVDFNHVVLGVGYGVAEVPYFKMKNSWGDWWGEGGFVRLQRGV GGVGTCGLARHAAYPVVFPMPFNLVTFRGVVISEYYSNLFASAKQGSVNELWTYDAIT RHITVGSNHQCLDAYPTGSSYAVHTYSCDAKNDNQKWVIDSANHAIKHAVHPTLCLDV DPNQNNKVQVWSCSPGNQNQWVAVSEERVKLWNVNGNFLASDGNLIQFYSPSSPSY EWAVSNLDHTWRARSNVGAPDLCLDAYEPWNGGAVHLYTCDSTNGNQKWIYDAKTQ QLRHLTHVGFCLDMRTALGDKAHLWTCNTPANSLQKFQYKSLTFPA 40 LITERATURE CITED Archibald, J. M. 2008. The origin and spread of eukaryotic photosynthesis: evolving views in light of genomics. Bot. Mar., 52:95--103. Archibald, J. M. 2009. The puzzle of plastid evolution. Curr. Biol., 19:R81--R88. Armbrust, E. V. 2009. The life of diatoms in the world’s oceans. Nature, 459:185--192. Armbrust, E. V., Berges, J. A., Bowler, C., Green, B. R., Martinez, D., Putnam, N. H., Zhou, S., Allen, A. E., Apt, K. E., Bechner, M., Brzezinski, M. A., Chaal, B. K., Chiovitti, A., Davis, A. K., Demarest, M. S., Detter, J. C., Glavina, T., Goodstein, D., Hadi, M. Z., Hellsten, U., Hildebrand, M., Jenkins, B. D., Jurka, J., Kapitonov, V. V., Kröger, N., Lau, W. W. Y., Lane, T. W., Larimer, F. W., Lippmeier, J. C., Lucas, S., Medina, M., Montsant, A., Obornik, M., Parker, M. S., Palenik, B., Pazour, G. J., Richardson, P. M., Rynearson, T. A., Saito, M. A., Schwartz, D. C., Thamatrakoln, K., Valentin, K., Vardi, A., Wilkerson, F. P. & Rokhsar, D. S. 2004. The genome of the diatom Thalassiosira pseudonana: Ecology, evolution and metabolism. Science, 306:79-86. Baginsky, S., Kleffmann, T., von Zychlinski, A & Gruissem, W. 2005. Analysis of shotgun proteomics and RNA profiling data from Arabidopsis thaliana chloroplasts. J. Prot. Res., 4:637--640. Barbrook, A. C., Howe, C. J. & Purton, S. 2006. 
Why are plastid genomes retained in non-photosynthetic organisms? Trends Plant Sci., 11:101--108.
Baurain, D., Brinkmann, H., Petersen, J., Rodríguez-Ezpeleta, N., Stechmann, A., Demoulin, V., Roger, A. J., Burger, G., Lang, B. F. & Philippe, H. 2010. Phylogenomic evidence for separate acquisition of plastids in cryptophytes, haptophytes and stramenopiles. Mol. Biol. Evol., 27:1698--1709.
Beakes, G. W. & Sekimoto, S. 2009. The evolutionary phylogeny of oomycetes: insights gained from studies of holocarpic parasites of algae and invertebrates. In: K. Lamour and S. Kamoun (ed.), Oomycete Genetics and Genomics: Diversity, Interactions, and Research Tools. John Wiley & Sons, Inc., Hoboken, NJ, USA. doi: 10.1002/9780470475898.ch1.
Birch, P. R. J., Rehmany, A. P., Pritchard, L., Kamoun, S. & Beynon, J. L. 2006. Trafficking arms: oomycete effectors enter host plant cells. Trends Microbiol., 14:8--11.
Bittner, L., Halary, S., Payri, C., Cruaud, C., de Reviers, B., Lopez, P. & Bapteste, E. 2010. Some considerations for analyzing biodiversity using integrative metagenomics and gene networks. Biol. Direct, 5:doi:10.1186/1745-6150-5-47.
Bodyl, A. & Moszczynski, K. 2006. Did the peridinin plastid evolve through tertiary endosymbiosis? A hypothesis. Eur. J. Phycol., 41:435--448.
Bodyl, A. 2005. Do plastid-related characters support the chromalveolate hypothesis? J. Phycol., 41:712--719.
Bodyl, A., Stiller, J. W. & Mackiewicz, P. 2009. Chromalveolate plastids: direct descent or multiple endosymbiosis. Trends Ecol. Evol., 3:119--121.
Bowler, C., Allen, A. E., Badger, J. H., Grimwood, J., Jabbari, K., Kuo, A., Maheswari, U., Martens, C., Maumus, F., Otillar, R. P., Rayko, E., Salamov, A., Vandepoele, K., Beszteri, B., Gruber, A., Heijde, M., Katinka, M., Mock, T., Valentin, K., Verret, F., Berges, J. A., Brownlee, C., Cadoret, J. P., Chiovitti, A., Choi, C. J., Coesel, S., De Martino, A., Detter, J. C., Durkin, C., Falciatore, A., Fournet, J., Haruta, M., Huysman, M. J., Jenkins, B.
D., Jiroutova, K., Jorgensen, R. E., Joubert, Y., Kaplan, A., Kroger, N., Kroth, P. G., La Roche, J., Lindquist, E., Lommer, M., Martin-Jezequel, V., Lopez, P. J., Lucas, S., Mangogna, M., McGinnis, K., Medlin, L. K., Montsant, A., Oudot-Le Secq, M. P., Napoli, C., Obornik, M., Parker, M. S., Petit, J. L., Porcel, B. M., Poulsen, N., Robison, M., Rychlewski, L., Rynearson, T. A., Schmutz, J., Shapiro, H., Siaut, M., Stanley, M., Sussman, M. R., Taylor, A. R., Vardi, A., von Dassow, P., Vyverman, W., Willis, A., Wyrwicz, L. S., Rokhsar, D. S., Weissenbach, J., Armbrust, E. V., Green, B. R., Van de Peer, Y. & Grigoriev, I. V. 2008. The Phaeodactylum genome reveals the evolutionary history of diatom genomes. Nature, 456:239--244.
Burki, F., Shalchian-Tabrizi, K., Minge, M., Skjaeveland, A., Nikolaev, S. I., Jakobsen, K. S. & Pawlowski, J. 2007. Phylogenomics reshuffles the eukaryotic supergroups. PLoS ONE, 2:e790.
Cavalier-Smith, T. 1999. Principles of protein and lipid targeting in secondary symbiogenesis: euglenoid, dinoflagellate, and sporozoan plastid origins and the eukaryote family tree. J. Eukaryot. Microbiol., 46:347--366.
Cavalier-Smith, T. 2003. Genomic reduction and evolution of novel genetic membranes and protein-targeting machinery in eukaryote-eukaryote chimaeras (meta-algae). Philos. Trans. R. Soc. Lond. B. Biol., 359:109--134.
Chan, C. X., Reyes-Prieto, A. & Bhattacharya, D. 2011. Red and green algal origin of diatom membrane transporters: insights into environmental adaptation and cell evolution. PLoS ONE, 6(12):e29138. doi:10.1371/journal.pone.0029138.
Cock, J. M., Sterck, L., Rouze, P., Scornet, D., Allen, A. E., Amoutzias, G., Anthouard, V., Artiguenave, F., Aury, J. M., Badger, J. H., Beszteri, B., Billiau, K., Bonnet, E., Bothwell, J. H., Bowler, C., Boyen, C., Brownlee, C., Carrano, C. J., Charrier, B., Cho, G. Y., Coelho, S. M., Collen, J., Corre, E., Da Silva, C., Delage, L., Delaroque, N., Dittami, S.
M., Doulbeau, S., Elias, M., Farnham, G., Gachon, C. M. M., Gschloessl, B., Heesch, S., Jabbari, K., Jubin, C., Kawai, H., Kimura, K., Kloareg, B., Küpper, F. C., Lang, D., Le Bail, A., Leblanc, C., Lerouge, P., Lohr, M., Lopez, P. J., Martens, C., Maumus, F., Michel, G., Miranda-Saavedra, D., Morales, J., Moreau, H., Motomura, T., Nagasato, Ch., Napoli, C. A., Nelson, D. R., Nyvall-Collén, P., Peters, A. F., Pommier, C., Potin, P., Poulain, J., Quesneville, H., Read, B., Rensing, S. A., Ritter, A., Rousvoal, S., Samanta, M., Samson, G., Schroeder, D. C., Ségurens, B., Strittmatter, M., Tonon, T., Tregear, J. W., Valentin, K., von Dassow, P., Yamagishi, T., Van de Peer, Y. & Wincker, P. 2010. The Ectocarpus genome and the independent evolution of multicellularity in brown algae. Nature, 465:617--621.
De Koning, A. P. & Keeling, P. J. 2004. Nucleus-encoded genes for plastid-targeted proteins in Helicosporidium: functional diversity of a cryptic plastid in a parasitic alga. Eukaryot. Cell, 3:1198--1205.
Delwiche, C. F. 1999. Tracing the thread of plastid diversity through the tapestry of life. Am. Nat., 154:S164--S177.
Dodge, J. D. 1975. A survey of chloroplast ultrastructure in the Dinophyceae. Phycologia, 14:253--263.
Dong, J., Chen, C. & Chen, Z. 2003. Expression profiles of the Arabidopsis WRKY gene superfamily during plant defense response. Plant Mol. Biol., 51:21--37.
Dorrell, R. G. & Smith, A. G. 2011. Do red and green make brown?: perspectives on plastid acquisitions within chromalveolates. Eukaryot. Cell, 10:856--868.
Drummond, A. J., Ashton, B., Buxton, S., Cheung, M., Cooper, A., Duran, C., Field, M., Heled, J., Kearse, M., Markowitz, S., Moir, R., Stones-Havas, S., Sturrock, S., Thierer, T. & Wilson, A. 2011. Geneious v5.5. www.geneious.com.
Elias, M. & Archibald, J. M. 2009. Sizing up the genomic footprint of endosymbiosis. BioEssays, 31:1273--1279.
Emanuelsson, O., Nielsen, H. & von Heijne, G. 1999.
ChloroP, a neural network-based method for predicting chloroplast transit peptides and their cleavage sites. Prot. Sci., 8:978--984.
Foth, B. J. & McFadden, G. I. 2003. The apicoplast: a plastid in Plasmodium falciparum and other apicomplexan parasites. Int. Rev. Cytol., 224:57--110.
Gaulin, E., Madoui, M. A., Bottin, A., Jacquet, C., Mathé, C., Couloux, A., Wincker, P. & Dumas, B. 2008. Transcriptome of Aphanomyces euteiches: new oomycete putative pathogenicity factors and metabolic pathways. PLoS ONE, 3:e1723. doi:10.1371/journal.pone.0001723.
Gibbs, S. 1981a. The chloroplast endoplasmic reticulum: structure, function, and evolutionary significance. Int. Rev. Cytol., 72:49--99.
Gibbs, S. 1981b. The chloroplast of some algal groups may have evolved from endosymbiotic eukaryotic algae. Ann. N.Y. Acad. Sci., 361:193--208.
Green, B. R. 2011. After the primary endosymbiosis: an update on the chromalveolate hypothesis and the origins of algae with Chl c. Photosynth. Res., 107:103--115.
Gruber, A., Vugrinec, S., Hempel, F., Gould, S. B., Maier, U. G. & Kroth, P. G. 2007. Protein targeting into complex diatom plastids: functional characterisation of a specific targeting motif. Plant Mol. Biol., 64:519--530.
Gschloessl, B., Guermeur, Y. & Cock, J. M. 2008. HECTAR: A method to predict subcellular targeting in heterokonts. BMC Bioinformatics, 9:393. doi:10.1186/1471-2105-9-393.
Guillot, M. & Gibbs, S. 1980a. Evidence that the chloroplast and nucleomorph of cryptomonads are remnants of a eukaryotic symbiont. J. Cell Biol., 87:186.
Guillot, M. & Gibbs, S. 1980b. The cryptomonad nucleomorph: its ultrastructure and evolutionary significance. J. Phycol., 16:558--568.
Guindon, S., Dufayard, J. F., Lefort, V., Anisimova, M., Hordijk, W. & Gascuel, O. 2010. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol., 59:307--321.
Hackett, J. D., Yoon, H. S., Li, S., Reyes-Prieto, A., Rümmele, S. E. & Bhattacharya, D. 2007.
Phylogenomic analysis supports the monophyly of cryptophytes and haptophytes and the association of Rhizaria with chromalveolates. Mol. Biol. Evol., 24:1702--1713.
Hackett, J. D., Yoon, H. S., Soares, M. B., Bonaldo, M. F., Casavant, T. L., Sheetz, T. E., Nosenko, T. & Bhattacharya, D. 2004. Migration of the plastid genome to the nucleus in a peridinin dinoflagellate. Curr. Biol., 14:213--218.
Harper, J. T., Waanders, E. & Keeling, P. J. 2005. On the monophyly of chromalveolates using a six-protein phylogeny of eukaryotes. Int. J. Syst. Evol. Micr., 55:487--496.
Huang, J., Mullapudi, N., Lancto, C. A., Scott, M., Abrahamsen, M. S. & Kissinger, J. C. 2004. Genomic evidence supports past endosymbiosis, intracellular and horizontal gene transfer in Cryptosporidium parvum. Genome Biol., 5(11):R88.
Iida, K., Takishita, K., Ohshima, K. & Inagaki, Y. 2007. Assessing the monophyly of chlorophyll-c containing plastids by multi-gene phylogenies under the unlinked model conditions. Mol. Phylogenet. Evol., 45:227--238.
Janouskovec, J., Horak, A., Obornik, M., Lukes, J. & Keeling, P. J. 2010. A common red algal origin of the apicomplexan, dinoflagellate, and heterokont plastids. Proc. Natl. Acad. Sci. USA, 107:10949--10954.
Jiang, R. H., Tyler, B. M., Whisson, S. C., Hardham, A. R. & Govers, F. 2006. Ancient origin of elicitin gene clusters in Phytophthora genomes. Mol. Biol. Evol., 23(2):338--351.
Kamoun, S. 2006. A catalogue of the effector secretome of plant pathogenic oomycetes. Annu. Rev. Phytopathol., 44:41--60.
Keeling, P. J. 2004. Diversity and evolutionary history of plastids and their hosts. Am. J. Bot., 91:1481--1493.
Keeling, P. J. 2009. Role of horizontal gene transfer in the evolution of photosynthetic eukaryotes and their plastids. Methods Mol. Biol., 532:501--515.
Khan, H., Parks, N., Kozera, C., Curtis, B. A., Parsons, B. J., Bowman, S. & Archibald, J. M. 2007.
Plastid genome sequence of the cryptophyte alga, Rhodomonas salina CCMP1319: lateral transfer of putative DNA replication machinery and a test of chromist plastid phylogeny. Mol. Biol. Evol., 24:1832--1842.
Kleffmann, T., Russenberger, D., von Zychlinski, A., Christopher, W., Sjolander, K., Gruissem, W. & Baginsky, S. 2004. The Arabidopsis thaliana chloroplast proteome reveals pathway abundance and novel protein functions. Curr. Biol., 14:354--362.
Kleffmann, T., Hirsch-Hoffmann, M., Gruissem, W. & Baginsky, S. 2006. plprot: a comprehensive proteome database for different plastid types. Plant Cell Physiol., 47:432--436.
Köhler, S., Delwiche, C. F., Denny, P. W., Tilney, L. G., Webster, P., Wilson, R. J., Palmer, J. D. & Roos, D. S. 1997. A plastid of probable green algal origin in apicomplexan parasites. Science, 275:1485--1489.
Kolaczkowski, B. & Thornton, J. W. 2008. A mixed branch length model of heterotachy improves phylogenetic accuracy. Mol. Biol. Evol., 25:1054--1066.
Kroth, P. G. 2002. Protein transport into secondary plastids and the evolution of primary and secondary plastids. Int. Rev. Cytol., 221:191--255.
Lane, C. E. & Archibald, J. M. 2008. The eukaryotic tree of life: endosymbiosis takes its TOL. Trends Ecol. Evol., 23(5):268--275.
Larkum, A. W. D., Lockhart, P. J. & Howe, C. J. 2007. Shopping for plastids. Trends Plant Sci., 12:189--195.
Lee, J. J., Leedale, G. F. & Bradbury, P. (eds.) 2000. Illustrated Guide to the Protozoa. 2nd ed., Society of Protozoologists, Allen Press, Lawrence, Kansas.
Marchler-Bauer, A., Anderson, J. B., Derbyshire, M. K., DeWeese-Scott, C., Gonzales, N. R., Gwadz, M., Hao, L., He, S., Hurwitz, D. I., Jackson, J. D., Ke, Z., Krylov, D., Lanczycki, C. J., Liebert, C. A., Liu, C., Lu, F., Lu, S., Marchler, G. H., Mullokandov, M., Song, J. S., Thanki, N., Yamashita, R. A., Yin, J. J., Zhang, D. & Bryant, S. H. 2007. CDD: a conserved domain database for interactive domain family analysis. Nucleic Acids Res., 35:D237--240.
Moustafa, A., Beszteri, B., Maier, U. G., Bowler, C., Valentin, K. & Bhattacharya, D. 2009. Genomic footprints of a cryptic plastid endosymbiosis in diatoms. Science, 324:1724--1726.
Okamoto, N., Chantangsi, C., Horák, A., Leander, B. S. & Keeling, P. J. 2009. Molecular phylogeny and description of the novel katablepharid Roombia truncata gen. et sp. nov., and establishment of the Hacrobia taxon nov. PLoS ONE, 4:e7080. doi:10.1371/journal.pone.0007080.
Pagel, M. & Meade, A. 2008. Modelling heterotachy in phylogenetic inference by reversible-jump Markov chain Monte Carlo. Phil. Trans. R. Soc. B., 363:3955--3964.
Parfrey, L. W., Grant, J., Tekle, I. Y., Lasek-Nesselquist, E., Morrison, H. G., Sogin, M. L., Patterson, D. J. & Katz, L. A. 2010. Broadly sampled multigene analyses yield a well-resolved eukaryotic tree of life. Syst. Biol., 59:518--533.
Patron, N. J., Inagaki, Y. & Keeling, P. J. 2007. Multiple gene phylogenies support the monophyly of cryptomonad and haptophyte host lineages. Curr. Biol., 17:887--891.
Petersen, T. N., Brunak, S., von Heijne, G. & Nielsen, H. 2011. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nature Methods, 8:785--786.
Philippe, H., Zhou, Y., Brinkmann, H., Rodrigue, N. & Delsuc, F. 2005. Heterotachy and long-branch attraction in phylogenetics. BMC Evol. Biol., 5:50. doi:10.1186/1471-2148-5-50.
Ralph, S. A., van Dooren, G. G., Waller, R. F., Crawford, M. J., Fraunholz, J. J., Foth, B. J., Tonkin, C. J., Roos, D. S. & McFadden, G. I. 2004. Metabolic maps and functions of the Plasmodium falciparum apicoplast. Nature Rev. Microbiol., 2:203--216.
Reyes-Prieto, A., Moustafa, A. & Bhattacharya, D. 2008. Multiple genes of apparent algal origin suggest ciliates may once have been photosynthetic. Curr. Biol., 18(13):956--962.
Rice, D. W. & Palmer, J. D. 2006. An exceptional gene transfer in plastids: gene replacement by a distant bacterial paralog and evidence that haptophyte and cryptophyte plastids are sisters. BMC Biol., 4:31.
Rogers, M. B., Patron, N. J. & Keeling, P. J. 2007.
Horizontal transfer of a eukaryotic plastid-targeted protein gene to cyanobacteria. BMC Biol., 5:26.
Sanchez-Puerta, M. V. & Delwiche, C. F. 2008. A hypothesis for plastid evolution in chromalveolates. J. Phycol., 44:1097--1107.
Sanchez-Puerta, M. V., Lippmeier, J. C., Apt, K. E. & Delwiche, C. F. 2007. Plastid genes in a non-photosynthetic dinoflagellate. Protist, 158:105--117.
Sekimoto, S., Klochkova, T. A., West, J. A., Beakes, G. W. & Honda, D. 2009. Olpidiopsis bostrychiae sp. nov.: an endoparasitic oomycete that infects Bostrychia and other red algae (Rhodophyta). Phycologia, 48:460--472.
Shindo, T., Misas-Villamil, J. C., Hörger, A. C., Song, J. & van der Hoorn, R. A. L. 2012. A role in immunity for Arabidopsis cysteine protease RD21, the ortholog of the tomato immune protease C14. PLoS ONE, 7:e29317. doi:10.1371/journal.pone.0029317.
Slamovits, C. H. & Keeling, P. J. 2008. Plastid-derived genes in the nonphotosynthetic alveolate Oxyrrhis marina. Mol. Biol. Evol., 25:1297--1306.
Soll, J. & Schleiff, E. 2004. Protein import into chloroplasts. Nature Rev. Mol. Cell Biol., 5:198--208.
Stiller, J. W., Huang, J., Ding, Q., Tian, J. & Goodwillie, C. 2009. Are algal genes in nonphotosynthetic protists evidence of historical plastid endosymbiosis? BMC Genomics, 10:484. doi:10.1186/1471-2164-10-484.
Tatusov, R. L., Natale, D. A., Fedorova, N. D., Jackson, J., Jacobs, A., Krylov, D. M., Mekhedov, S. L., Nikolskaya, A. N., Rao, B. S., Wolf, Y. I., Aravind, L., Lanczycki, C., Mazumder, R., Sreekumar, K., Vasudevan, S., Walker, D. R., Tatusova, T. A., Yao, K., Yin, J. & Koonin, E. V. 2003. The COG database: an updated version includes eukaryotes. BMC Bioinformatics, 4:41.
Tyler, B. M., Tripathy, S., Zhang, X., Dehal, P., Jiang, R. H. Y., Aerts, A., Arredondo, F. P., Baxter, L., Bensasson, D., Beynon, J. L., Chapman, J., Damasceno, C. M. B., Dorrance, A. E., Dou, D., Dickerman, A. W., Dubchak, I. L., Garbelotto, M., Gijzen, M., Gordon, S. G., Govers, F., Grunwald, N. J., Huang, W., Ivors, K.
L., Jones, R. W., Kamoun, S., Krampis, K., Lamour, K. H., Lee, M. K., McDonald, W. H., Medina, M., Meijer, H. J. G., Nordberg, E. K., Maclean, D. J., Ospina-Giraldo, M. D., Morris, P. F., Phuntumart, V., Putnam, N. H., Rash, S., Rose, J. K. C., Sakihama, Y., Salamov, A. A., Savidor, A., Scheuring, C. F., Smith, B. M., Sobral, B. W. S., Terry, A., Torto-Alalibo, T. A., Win, J., Xu, Z., Zhang, H., Grigoriev, I. V., Rokhsar, D. S. & Boore, J. L. 2006. Phytophthora genome sequences uncover evolutionary origins and mechanisms of pathogenesis. Science, 313:1261--1266.
Whelan, S. & Goldman, N. 2001. A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol. Biol. Evol., 18:691--699.
Wilson, R. J. M. 2004. Plastid functions in the Apicomplexa. Protist, 155:11--12.
Woehle, C., Dagan, T., Martin, W. F. & Gould, S. B. 2011. Red and problematic green phylogenetic signals among thousands of nuclear genes from the photosynthetic and Apicomplexa-related Chromera velia. Genome Biol. Evol., 3:1220--1230.
Yoon, H. S., Hackett, J. D., Ciniglia, C., Pinto, G. & Bhattacharya, D. 2004. A molecular timeline for the origin of photosynthetic eukaryotes. Mol. Biol. Evol., 21:809--818.
Yoon, H. S., Hackett, J. D., Pinto, G. & Bhattacharya, D. 2002. The single, ancient origin of chromist plastids. Proc. Natl. Acad. Sci. USA, 99:15507--15512.