Analysis and Results of Core Motifs and Conserved Peptides In further examination of the P. reichenowi and P. falciparum DBLα domain homology blocks, we revealed multiple regions of high sequence homology, classified here in two groups: core motifs and conserved peptides. The core motifs are defined as regions of 100% sequence identity that are found in at least one P. reichenowi gene matching two or more from P. falciparum (unique pairs of matches are considered conserved peptides, details below) and correspond to one of the five known degenerate motifs [1] that characterize the DBLα domains. These domains, listed starting from the N-terminus side are HB4, HB3, HB5, HB2, and HB1 [1]. We recovered 18 sequences corresponding to the conserved motifs, between 10 and 29 residues, in 36 different P. reichenowi var genes and shared with 85 of those from 3D7 and HB3. The HB2 motifs are the most frequently represented in this group, corresponding to 11 of the 18 different motifs, and were found in 26 P. reichenowi and 63 P. falciparum paralogs total, with a mid-sized motif (19 residues) found in 16 out of 49 standard (non-csa) P.reichenowi PFEMP1 sequences. The HB3 motifs are found in two different forms, in 12 P. reichenowi and 54 P. falciparum domains, and HB5 in four forms in 9 P. reichenowi genes and 19 from the P. falciparum sets. The HB4 had only one identical form matching two P. falciparum (one each in 3D7 and HB3). Notably, HB1 motifs were not present in these results. Although one match was found for this motif, it is not counted among these results in this group as it was only found in a unique pair, and it was between the P. reichenowi and HB3 var1csa genes, a locus which is already known to have a relatively recombination rate and high degree of conservation, quite distinct from the standard PfEMP1s (these results are summarized in Table S7). The var genes are classified as upsaA, upsB, or upsC based on conserved upstream sequences. A combination of all the results of the P. falciparum matches from all the motifs shows a total of 555 hits, with individual genes represented between 1 and 14 times. Of these 555 hits, 13 were from uspA genes, 368 were from upsB, and 164 from upsC. Because there are a total of 18 upsA, 55 upsB, and 21 upsC genes in the analysis, this indicates that upsB and upsC genes show greater levels of conservation, with upsA genes more divergent [2]. In addition to these core motifs, we also found 53 regions that define conserved peptides, determined by pairwise matches of high identity/similarity between P. reichenowi and P. falciparum DBLα domains. These peptides range between 24 and 140 residues in length, and between 70% and 100% identity (identical amino acids) and 80% to 100% similarity (amino acids with the same or similar functional characteristics). The conserved regions are drawn from 48 non-redundant matches, with the five additional peptides each drawn from a smaller portion of a conserved peptide with even higher identity. While many P. reichenowi loci revealed none of this highly conserved sequence, some loci were represented in multiple matches, the most frequent being Preich_4 with 6 non-redundant regions plus one additional sub-region of high P. reichenowi/P. falciparum homology, Preich_11 with four peptides and one additional sub-region of high identity, and Preich_7.2, and Preich_84, both with four matches 1. 2. Rask TS, Hansen DA, Theander TG, Gorm Pedersen A, Lavstsen T: Plasmodium falciparum Erythrocyte Membrane Protein 1 Diversity in Seven Genomes: Divide and Conquer. PLoS Comput Biol 2010, 6(9):e1000933. Lavstsen T, Salanti A, Jensen ATR, Theander TG, Copenhagen D: Sub-grouping of Plasmodium falciparum 3D7 var genes based on sequence analysis of coding and non-coding regions. Malaria Journal 2003, 2:27. Sequence VFEVWLGNQQEAFKKQK FDYVPQYLRWF FDYVPQYLRWFEEW FDYVPQYLRWFEEWAE DYVPQYLRWFEEWAEDFCR FDYVPQYLRWFEEWAEDFC FDYVPQYLRWFEEWAEDFCR PTYFDYVPQYLRWFEEWAEDFCR VPTYFDYVPQYLRWFEEWAEDFCRK DQVPTYFDYVPQYLRWFEEWAEDFCRK PTYFDYVPQYLRWFEEWAEDFCRKKKK PTYFDYVPQYLRWFEEWAEDFCRLRKHKL LARSFADIGDIVRG CTVLARSFADIGDIVRGRDLY APYRRRHICDYNLHHINENN WWTANRETVW KLREDWWTANR LREDWWTANRETVW KLREDWWTANRETVW Table S7: Summary of core motifs Matches Motif 2 87 78 75 69 70 67 63 40 12 23 8 66 5 3 16 20 15 8 HB1 HB2 HB2 HB2 HB2 HB2 HB2 HB2 HB2 HB2 HB2 HB2 HB3 HB3 HB4 HB5 HB5 HB5 HB5 Length No. Matches: P. reichenowi 17 11 14 16 19 19 20 23 25 27 27 29 14 21 20 10 11 14 15 1 24 19 18 16 16 14 11 7 3 2 2 12 1 1 8 5 8 4 No. Matches: P. falciparum 1 63 59 57 53 54 53 52 33 9 21 6 54 4 2 8 15 7 4 P. falciparum HB3var16 HB3var16 HB3var16 HB3var17 HB3var18 HB3var18 HB3var21 HB3var22 HB3var24 HB3var25 HB3var27 HB3var27 HB3var2 HB3var2 HB3var34 HB3var34* HB3var34** HB3var36 HB3var36 HB3var36* HB3var6 HB3var7 HB3var8 HB3var8_2 MAL7P1.187 MAL7P1.56 MAL7P1.56* PF07_0050 PF07_0050 PF07_0050* PF070051 PF080103 PF080103* PF080103 PF080140 PF100406 PF110521 PF140773_truncated PF140773_truncated PFB1055c PFB1055c P. reichenowi Preich_24 Preich_126 Preich_102 Preich_7.2 Preich_9 Preich_7.2 Preich_7.2 Preich_84 Preich_40 Preich_17 Preich_4 Preich_61 Preich_5.2 Preich_8.2 Preich_5 Preich_5* Preich_5** Preich_26 Preich_4 Preich_4* Preich_8.2 Preich_1 Preich_46 Preich_46 Preich_7 Preich_4 Preich_4* Preich_11 Preich_7.2 Preich_11* Preich_4 Preich_133 Preich_4.2* Preich_4.2 Preich_11 Preich_61 Preich_11 Preich_5 Preich_61 Preich_25 Preich_84 Identical Aas 48 61 51 101 63 105 103 72 42 33 37 35 44 47 76 47 27 37 102 81 30 26 79 24 31 45 36 70 39 22 37 115 32 65 23 28 23 29 53 49 31 Length 60 84 73 138 82 139 136 95 48 40 39 35 50 59 83 55 27 47 142 101 36 29 100 29 36 63 40 99 46 29 46 145 35 87 29 34 30 37 67 59 37 Percent Identity 80 73 70 73 77 76 76 76 88 83 95 100 88 80 81 85 100 79 72 80 83 90 79 83 86 71 90 71 85 76 80 79 91 75 79 82 77 78 79 83 84 Percent Similar AAs Similarity 56 93 75 89 62 85 114 83 74 90 116 83 118 87 79 83 47 98 36 90 39 100 x x 48 96 55 93 73 88 51 93 x x 42 89 121 85 92 91 36 100 26 90 85 85 27 93 33 92 51 81 38 95 82 83 43 93 26 90 40 87 126 87 35 100 77 89 26 90 30 88 29 97 35 95 58 87 53 90 34 92 PFC0005w PFC0005w PFC0005w PFC0005w PFC1120c PFD1245c PFE1640w_truncated PFL0005w PFL1955w PFL1970w PFL2665c PFL2665c Preich_32 Preich_46 Preich_11 Preich_84 Preich_84 Preich_9 Preich_8.2 Preich_5 Preich_40 Preich_78 Preich_129 Preich_4 57 72 21 37 43 47 48 28 38 67 97 21 73 101 30 41 58 56 58 37 43 92 139 24 78 71 70 90 74 84 83 76 88 73 70 88 61 81 26 41 50 52 52 34 41 74 117 24 84 80 87 100 86 93 90 92 95 80 84 100 Table S8: Summary of conserved peptide matches. Stars (*) indicate sub sections of higher homology