Supplemental Information

Classification of var genes for study

Var genes are large, two-exon structures, with exon 1 coding for the surface-exposed portion of PfEMP1. This region is generally highly variable in length (2.7-10.4 kb) and sequence, both within and between isolates. Based on the 3D7 genome sequence, exon 1 of most var genes is composed of Duffy Binding-like (DBL) domains of multiple sequence groups α, β, γ, δ, ε or X, and cysteine-rich interdomain regions (CIDR; see Smith 2000 for nomenclature). The 57 complete var genes of 3D7 that encode an N-terminal domain of type DBL1 followed by a CIDR domain are very polymorphic within the 3D7 genome [(Gardner et al., 2002) and latest release at www.sanger.ac.uk], and they are also not shared as intact genes in other parasite genomes [(Taylor et al., 2000) and Kraemer et al submitted].

In 3D7, 23 of the 57 polymorphic var genes are in chromosome-central clusters and tend to have 5' promoter sequences of type upsB or upsC. Most of the remaining polymorphic var genes are located adjacent to and transcribed away from the telomere, and have upsB type 5' sequences. A small number of var genes are adjacent to these upsB-type vars, transcribed towards the telomere, and have upsA1 (formerly upsA) type 5' promoters. The highly polymorphic var genes in 3D7 are thus the most abundant var type, and they have upsA1, upsB or upsC 5' promoters.

In contrast, a total of 5 var genes in the 3D7 genome are highly conserved between isolates, having upsA1 (formerly upsA; type 3 var), upsA2 (formerly upsD; var1csa; pseudogene PFE1640w) and upsE (var2csa) type promoters (Rowe and Kyes, 2004; Trimnell et al., 2006). Although var2csa and type 3 var appear to be regulated by mutually exclusive gene expression, the upsA2 type var1csa gene is expressed independently of phenotype in parasites where the gene is intact (Kyes et al., 2003).
We considered these semi-conserved, subtelomeric genes to be unusual representatives of the var gene family, on the basis of having unusually short introns and/or unusual 5' promoter regions, both sites being implicated in gene regulation (Calderwood et al., 2003; Deitsch et al., 1999; Kraemer and Smith, 2003; Lavstsen et al., 2003). However, they are investigated here for completeness.

Map and sequence of the A4varICAM/R29R+var1 locus

The 3D7 isolate, on which the published genome is based, is particularly unsuitable for deriving phenotypically homogeneous parasites, due to low levels of PfEMP1 expression at the RBC surface. Instead, we use the IT isolate, for which cytoadherent phenotypes have been well-characterized. We wanted to examine RNA polymerase activity both within and near the A4varICAM gene, because IT parasite populations can be selected to high homogeneity for expression of the protein this gene encodes. This most closely approximates a population that is 'clonal' for var expression. The genome sequence of IT is not yet available, so we first mapped, cloned and sequenced the region surrounding A4varICAM (Figure S1).

Using a combination of PCR and vectorette cloning, with confirmation by Southern blot and restriction mapping, we constructed a contig sequence for one end of the IT genome chromosome 13, corresponding to the 'left' end of 3D7 chromosome 13 (mapping by hybridisation to the same end as the gene for glycophorin binding protein homologue 2; gbph2; accession no X69769; PF13_0010). We showed previously by mapping that the A4varICAM and upsA type R29R+var1 genes are located in tail-to-tail orientation at this end of chromosome 13, in most IT parasites except those that express the R29R+var1 gene (Horrocks et al, 2004).

Non-coding sequences near var genes show much more similarity than coding regions (Taylor et al, 2000; Kraemer et al submitted).
Conservation in non-coding sequences may indicate conservation of function, or may simply reflect that the coding segments, exposed at the host-parasite interface, are under constant selection. Whatever the underlying cause, conserved non-coding genomic contexts for these variant antigen families in 3D7 allow prediction of gene organization in new isolates. From the 3D7 genome information we were able to predict rif gene positions relative to the var genes. In 3D7, most subtelomeric upsB/upsA var gene pairs have a rif sequence located between them, and most upsA vars are arranged head-to-head with a rif gene (Gardner et al 2002). The distance between A4varICAM and R29R+var1 is only ~1kbp, with no open reading frames in the intergenic sequence. However, we amplified nine rif genes from PFG-purified chromosome 13 DNA, and by chromosome pulsed field gel hybridisation and restriction mapping we determined that two of these rif genes were within ~25kbp of the R29R+var1 gene. We mapped one of these rif genes, ITrif13.1, as closest to the R29R+var1 gene, and we were able to extend the contig sequence out to this gene on the prediction, based on similar var pairs in 3D7, that it would be in head-to-head orientation with R29R+var1. The second rif gene, ITrif13.2, lies telomere-distal of ITrif13.1, within ~12kbp. Although we were able to amplify several stevor sequences from A4 chromosome 13 DNA, none of these mapped to the same ApaI fragment as A4varICAM, and therefore these are probably located at the other telomere.

In the R29 clone, the A4varICAM gene has been deleted, and the telomere is adjacent to the 3' end of R29R+var1, leaving only ~800bp between the stop codon of exon2 and the telomere repeat sequence. In R29 genomic DNA, the chromosome 13 'left' telomere repeats end at a standard CA breakpoint, determined using telomere PCR.
Unfortunately, this technique is not sufficiently sensitive to detect deletions in heterogeneous populations, even if that population is expressing R29R+var1. From clone R29 genomic DNA sequence, we can define the sufficient length of 3' downstream sequence for proper regulation of var gene expression as being 790bp. 3' RACE suggests that the transcript ends approximately 410-460nt after the stop codon (not shown).

Only two subtelomeric upsB/upsA1 var pairs in 3D7 are similar to A4varICAM/R29R+var1 in organization, having no rif gene between them: PF11_0007/PF11_0008 (intergenic distance 1067bp) and PF08_0141/PF08_0142 (1046bp). Although the intergenic distances in these two subtelomeric, tail-to-tail 3D7 var pairs are similar to that in A4/R29 (1030bp), the sequences are not similar. Better matches for the A4var-R29var intergenic sequence are found in chromosome-central var intergenic sequence. The A4varICAM upsB sequence is a common upsB type, with the full 1500bp highly similar to other upsB sequences for subtelomeric vars. The R29 5' untranslated upsA1 type sequence is most similar to that for the conserved 'type3' vars.

Supplemental Information Methods

Genomic mapping of expressed var genes, and telomere PCR

Chromosome pulsed field gel (PFG) Southern blots, PFG separation blots of DNA digested in agarose blocks with rare-cutting restriction enzymes, or linear electrophoresed blots of liquid DNA digested with frequent-cutting enzymes were prepared as previously described (Smith et al., 1995). The distance to the end of the chromosome from the R29R+var1 gene in R29 parasites was estimated by restriction digest mapping with the rare cutter BglI and with BglII and EcoRI. PCR with a primer 200bp 3' of the end of R29exon2 (R29-3UTF: 5'-ATTTTGTATTTATTTGACAC) to a telomere repeat primer (5'-TGAACCCTGAACCCTGAACCC) was used to confirm this distance.
PCR using previously reported telomere repeat primers either failed or did not work as efficiently as this primer. Conditions for PCR: 1.5mM MgCl2, 200µM dNTPs, 1u Perkin Elmer Taq polymerase per 50µl reaction, each primer at 1µM, and 100ng genomic DNA; 95°C 3min, followed by 30 cycles of 94°C 30sec, 45°C 30sec, 65°C 4min, followed by 65°C 10min. A4 and R29 genomic DNA templates were compared, and a fragment unique to R29 was gel-purified then cloned into pCR2.1 TA vector (Invitrogen); sequences were determined by BigDye sequencing/ABI (Applied Biosystems) analysis.

Chromosome 13 subtelomeric rif gene identification

Localization of specific rif genes neighboring the A4varICAM and R29R+var1 genes was performed by separating chromosomes on PFG, staining representative lanes in ethidium bromide, then excising regions of the gel (not stained) corresponding to chromosome 13. A small segment of gel was then equilibrated in 10mM Tris, 1mM EDTA for 10 minutes at room temperature, the equilibration buffer was removed, and the tube was placed in a boiling water bath for several minutes, until the agarose was molten. Approximately 9 volumes of sterile water were added, mixed, and the tube placed in a boiling water bath for 30 sec. This was used as a PCR template. PCR for rif genes was performed with generic, degenerate primers designed to a single class of rif genes:
rifF4: ATTCCA/CACATGTA/GTA/TTG
rifR1: CTTCAA/TTTTA/GTTA/TTTTC/TG/TG/A/TCGATAACG
Primers were designed to a second class of rif genes, and although they amplified a product from genomic DNA, they never yielded any product on RT-PCR, so this class was not included in this study. Reactions were performed using standard conditions, 3mM MgCl2, with 95°C 3min, followed by 30 cycles of 94°C 30sec, 42°C 30sec, 65°C 60sec. PCR fragments were cloned and sequenced as above.
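The degenerate rif primers above are written in slash notation, which can be hard to read at a glance. The following is a minimal sketch, not part of the original methods, that expands such a primer into its individual sequences. It assumes that each "/" joins single-base alternatives at one position (so "A/C" means A or C, and "G/A/T" means G, A or T); the function names are hypothetical.

```python
from itertools import product


def parse_slash_primer(primer):
    """Split a slash-notation primer into one string of allowed bases per position.

    Assumption: '/' always separates single-base alternatives at a single
    position, e.g. 'CA/CT' -> ['C', 'AC', 'T'].
    """
    positions = []
    join_next = False  # True when the previous character was '/'
    for base in primer.replace(" ", "").upper():
        if base == "/":
            if not positions:
                raise ValueError("primer cannot start with '/'")
            join_next = True
        elif join_next:
            positions[-1] += base  # add an alternative to the last position
            join_next = False
        else:
            positions.append(base)
    return positions


def expand_primer(primer):
    """Return every unambiguous sequence encoded by a slash-notation primer."""
    return ["".join(combo) for combo in product(*parse_slash_primer(primer))]


RIF_F4 = "ATTCCA/CACATGTA/GTA/TTG"
RIF_R1 = "CTTCAA/TTTTA/GTTA/TTTTC/TG/TG/A/TCGATAACG"

if __name__ == "__main__":
    for name, primer in (("rifF4", RIF_F4), ("rifR1", RIF_R1)):
        variants = expand_primer(primer)
        print(name, len(parse_slash_primer(primer)), "nt,", len(variants), "variants")
```

Under this reading, rifF4 is a 17-mer with three two-fold degenerate positions (8 sequences), and rifR1 is a 27-mer with five two-fold and one three-fold degenerate positions (96 sequences).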
Unique sequences were then hybridised to PFG chromosome separation blots, and to ApaI and ApaI/BglI digest blots, for confirmation of position.

RTPCR, probe labeling and Southern blots for var expression analysis

We confirmed expression of A4varICAM in parasites with high monoclonal antibody Bc6 positivity by DBL1alpha-tag RTPCR of ring stage RNA (Bull et al., 2005). RTPCR products were 32P-labeled (Megaprime, GE Healthcare/Amersham Biosciences, as per manufacturer's instructions) and hybridised to a Southern blot containing a panel of A4 genomic DNA-derived DBL1alpha tag PCR fragments. Blots were exposed to regular speed film (Hyperfilm MP; GE Healthcare/Amersham Biosciences) or fast film (Biomax MS; Kodak), for various lengths of time. GenBank accession numbers AJ319680-AJ319712 correspond to tag numbers A4AFBR1-43 (consecutive as listed in Figure S2). A4AFBR tags missing from the list, e.g. A4AFBR3, were not submitted to GenBank due to sequence chimeras and discrepancies, and are therefore not shown here.

Hybridisations

PCR fragments of var and rif genes, and single-copy markers contained in plasmids or as PCR fragments, were labeled with alpha32P-dATP (Megaprime, Amersham). Hybridisations were generally at 60°C (but 65°C for gene-specific exon1 probes and alphaAF'-BR RTPCR probes) in 7% SDS, 0.5M Na-phosphate buffer pH 7.2, 2% dextran sulfate, 1mM EDTA; washes were at 68°C in 0.1xSSC (0.1xSSC = 0.015M NaCl, 1.5mM Na citrate, pH7), 0.1% SDS.

Supplemental Information Figure legends

Figure S1. Restriction map of A4varICAM/R29R+var1 locus, and relationship to published and new sequences.
We used restriction mapping to orientate these genes within the parasite genome, and found that they were both located on chromosome 13, with slight differences between A. A4 cloned parasites and B. R29 clone parasites.
The 5' end of the R29R+var1 gene lies on a 110kb ApaI/BglI fragment in both A4 and R29 clone genomic DNA. This same ApaI/BglI fragment hybridizes to GBPH2, glycophorin binding protein homologue 2. The 3' end of the R29R+var1 gene is on a 40kb BglI fragment in A4 clone parasites, and a 6kb fragment in R29 clone parasites. Correspondingly, the entire A4varICAM gene is on a 40kb BglI fragment in A4 parasites. Although DBL1, DBL2 and part of DBL3 of A4varICAM are present in R29, this fragment of A4varICAM has rearranged to a different chromosome (not shown).
In A4 and related parasites, orientation of the A4varICAM gene has been confirmed as tail-to-tail with R29 by restriction digestion, then by PCR to span the distance between the genes. In R29 parasites, the 790bp from the 3' end of the R29R+var1 gene to the telomere repeats has been cloned and sequenced (the breakpoint occurs between positions 12693 and 12694 in the contig sequence, accession no. AM411451). ITrif13.1 lies on a 15kb EcoRI fragment adjacent to the R29R+var1 gene; PCR and sequencing confirmed that this rif is arranged in head-to-head orientation with the var gene. A second rif gene, ITrif13.2, mapped to the same end of chromosome 13, centromeric to ITrif13.1, on the same ApaI fragment. R = EcoRI, B = BglI, ApaI. Published sequences indicated by horizontal bars. Accession numbers: A4varICAM: L42244; R29R+var1: Y13402; R29R+var1 exon2: Y13402, AJ535777; R29 5': AJ582223; Contig of entire locus: AM411451; partial ITrif13.2: AM411450.

Figure S2. At 10 hours post invasion, RNA polymerase activity is detected for only the expected dominant var gene, not multiple var genes.
Run-on probe was prepared from a relatively homogeneous IT parasite population (selected three times on monoclonal antibody Bc6) at ring stage, approximately 10 hours post-invasion. Only A4varICAM appears to be transcribed across all DBL domains.
PCR fragments indicated are var gene DBLs (numbers) and tag (t): A4varICAM (A4varICAM); CS2 (CS2var; DBL1 sequence is identical to that of A4varICAM); Tres (A4TresICAMvar); 17 (tag for ITg-ICAMvar); R29 (R29R+var1); D1 (Dd2var1); var1 (var1csa). Stage-specific genes: T (MSP1); R (KAHRP).

References

Bull, P.C., Berriman, M., Kyes, S., Quail, M.A., Hall, N., Kortok, M.M., et al. (2005) Plasmodium falciparum variant surface antigen expression patterns during malaria. PLoS Pathog 1: e26.
Calderwood, M.S., Gannoun-Zaki, L., Wellems, T.E., and Deitsch, K.W. (2003) Plasmodium falciparum var genes are regulated by two regions with separate promoters, one upstream of the coding region and a second within the intron. J Biol Chem 278: 34125-34132.
Deitsch, K.W., del Pinal, A., and Wellems, T.E. (1999) Intra-cluster recombination and var transcription switches in the antigenic variation of Plasmodium falciparum. Mol Biochem Parasitol 101: 107-116.
Gardner, M.J., Hall, N., Fung, E., White, O., Berriman, M., Hyman, R.W., et al. (2002) Genome sequence of the human malaria parasite Plasmodium falciparum. Nature 419: 498-511.
Kraemer, S.M., and Smith, J.D. (2003) Evidence for the importance of genetic structuring to the structural and functional specialization of the Plasmodium falciparum var gene family. Mol Microbiol 50: 1527-1538.
Lavstsen, T., Salanti, A., Jensen, A.T., Arnot, D.E., and Theander, T.G. (2003) Subgrouping of Plasmodium falciparum 3D7 var genes based on sequence analysis of coding and non-coding regions. Malar J 2: 27.
Rowe, J.A., and Kyes, S.A. (2004) The role of Plasmodium falciparum var genes in malaria in pregnancy. Mol Microbiol 53: 1011-1019.
Smith, J.D., Chitnis, C.E., Craig, A.G., Roberts, D.J., Hudson-Taylor, D.E., Peterson, D.S., et al. (1995) Switches in expression of Plasmodium falciparum var genes correlate with changes in antigenic and cytoadherent phenotypes of infected erythrocytes. Cell 82: 101-110.
Taylor, H.M., Kyes, S.A., Harris, D., Kriek, N., and Newbold, C.I. (2000) A study of var gene transcription in vitro using universal var gene primers. Mol Biochem Parasitol 105: 13-23.
Trimnell, A.R., Kraemer, S.M., Mukherjee, S., Phippard, D.J., Janes, J.H., Flamoe, E., et al. (2006) Global genetic diversity and evolution of var genes associated with placental and severe childhood malaria. Mol Biochem Parasitol 148: 169-180.

Table S1. Oligonucleotides used for PCR fragments on Southern blots, and for probes; 89% Bc6+ run-on results.

Fragment | Template | Forward primer sequence (5'-3') | Reverse primer sequence (5'-3') | Conditions* | Expected product size | Summary run-on signal
HRP1 | genomic | CAACAAATGCTGCTACACCAG | TTTAACCACAGCATCCTC | 2.5/50 | 450bp | rings
MSP1 | genomic | GTCAAAAAACTAGAAGCTTTAG | ATCAATTAAATATTTGAAACC | 1.5/50 | 450bp | trophs
A4-5UT | plasmid | AATATGGAAGTAACGGAAT | GCTATCCAATACATGTTTGGCATC | 1.5/50 | 989bp | rings
A4-DBL1 | YAC | ATGAATATCATACTAATGTTA | ATATTCCGTATGAGAAAATGT | 3.0/50 | 1.2kbp | rings
A4-DBL3 | YAC | ACCAAGTTGGATGTGTGCGCC | AGAAGAATAACCTTTTTCTTTTAG | 2.0/50 | 1.1kbp | rings
A4-DBL5 | YAC | TCTATTTTAGACAGTACATTTG | TGTCCTATCCTGTGTATATAAT | 2.0/50 | 900bp | rings
A4-intron | genomic | GGCATTAGGATCCATTGC | GTCGACAGGGTGTTTAG | 2.5/45 | 1kbp | rings
A4-exon2 | plasmid | GTTACACCGATCATTATAGTG | CTCATTTTCCCACTCTT | 1.5/50 | 1.6kb | rings
A4-3UT | plasmid | CAAATTGGTGAAAGAG | ATAATATCAAATATATATATC | 1.5/45 | 300bp | rings
R29-3UT | plasmid | TCCTATATCAGATGTATG | TATACAAATAATCAAATGTGC | 1.5/45 | 350bp | trophs
R29-exon2 | plasmid | GGAAGGAGATTCAGATGA | TAGGTGTATCCACGTTTG | 3.5/50 | 890bp | rings/trophs
R29-intron | plasmid | ACAACCATTCCTTTTGGAG | GTATGTATGTATATATGTATGTA | 1.5/45 | 750bp | trophs
R29-DBL4 | plasmid | GATGTTTTATACTTTAGG | CTCTTATCACTCACAAGC | 1.5/50 | 1.1kbp | trophs
R29-DBL1 | genomic | GGGAATTCGAGTACACCGAAGGTAGAAAG | GGGAATTCTTCACAATATCCTGAAGGACC | 2.5/45 | 1kbp | negative
R29-5UT | plasmid | TGTTATTAGCAGTACAATG | AATTTCAATAAACATGTTCTC | 2.5/50 | 1kbp | negative
var-rif-intergenic | plasmid | TTCTATTATGTTCAATTA | GTGTTGTATTCATTCAAG | 2.5/45 | 1.2kbp | negative
ITrif13.1 | plasmid | CTCATGGGAAGTTGTTGC | TAAAACTATAGCTAGTATTGT | 1.5/50 | 430bp | negative
ITrif13.2 | plasmid | GCGGCGATGCCTGAAGTG | AGAAGTCTGAAAACTAGT | 2.5/45 | 450bp | negative
stevor | plasmid | AAATGTTATTGTTTAC | CCAAAGCTGCAATACCAC | 3.0/45 | 700bp | negative
A4Tres DBL1 | plasmid | CGGAATTCAGACAACCGGTTCGATTTTCC | CGGAATTCCTAAGATGAACTTTGCGTCTG | 1.5/50 | 800bp | negative
A4Tres DBL3 | plasmid | CGGAATTCACAGAGGACGCAAAATGGAA | CGGAATTCCTATGTATAATCCAACGATGC | 1.5/50 | 800bp | negative
ITgICAM DBL1 tag | plasmid | GCACGA/CAGTTTT/CGC | GCCCATTCG/CTCGAACCA | 2.5/45 | 400bp | negative
ITgICAM DBL2 | genomic | GGTTTAAAATAGGAACAC | GTTAGAAGCCATTTGTGC | 1.5/50 | 900bp | rings
Dd2var1 DBL1 | genomic | CAAGGACGTTTGTCAGAAGC | GATTACATGCATACAAACAG | 3.5/50 | 1kbp | rings
Dd2var1 DBL4 | genomic | ATACGGCAAAACCGCACC | TCCATTTACACATTTGTC | 3.5/45 | 1.3kbp | negative
FCR3-var1 5UT | genomic | AAAGAAAGAACGTGACGC | TCTAATGATGATGCTGCATTCC | 1.5/50 | 650bp | negative
FCR3-var1 DBL1 | genomic | TCTACGCGAGTAAATAAGC | GACAAATTTGTTATCGTTCG | 1.5/50 | 1kbp | rings, trophs
FCR3-var1 DBL3 | genomic | CAAGTAGAAGATTGTCATCC | CTGTTCAAGTAATCTGTTGC | 3.5/45 | 1kbp | rings, trophs
FCR3-var1 DBL7 | genomic | AATCCATTGGATAATTGTCC | AACTCCAAAGCGCATTGAG | 1.5/50 | 800bp | (negative/PCR product failure)
FCR3-var1 intron | genomic | AAGATCAATCTTCAG | AGGCATTCCATACTCTC | 1.5/50 | 750bp | negative
FCR3-var1 exon2 | genomic | TTCAAATCGTCTGTGGAC | TATCAATAGGTTTAGCAC | 1.5/50 | 650bp | rings/trophs
upsC | genomic | ACAAACATAGTGACTACC | GCCCATTCSTCGAACCA | 2.5/50 | 1.2kb | rings
ITvar1 DBL1 | genomic | GTCGAAATCAATGTACCAG | TCACATAGCGATGGCACG | 3.0/45 | 820bp | negative
ITvar1 DBL4 | genomic | TAAGGAAAACATAGACACTG | ATGGAATGCGTCACTTCACG | 3.5/45 | 840bp | negative
A4var-DBL2 | genomic | ACGAACCAATATTCCAATGCT | ATTTTTTGCATGTAGGTATGAT | 2.0/50 | 850bp | rings
CS2var-DBL2 | genomic | GACAACAGTCATAGTGGAGC | GAGGGTACAAGCGTCATCC | 2.0/50 | 1.1kbp | negative
CS2var-DBL3 | genomic | GTACCCTCAAATATAGTG | CATGGATCACAATAATCTG | 2.0/50 | 1.1kbp | negative
3D7 PFL0030c5UT | 3D7 genomic | ACTATAAGATAAATTTAAGAGA | TATCATATTTCTTGTAATAGC | 2.5/45 | 700bp | negative
3D7 PFL0030c5utORF | 3D7 genomic | CTAAATAGTTAGACATATAAC | ACTTGATTTATCCATTTTGTC | 1.5/45 | 650bp | negative
3D7 PFL0030cDBL1 | 3D7 genomic | CTTGTGATAGAATACC | TTTGTTGATATAATTCTG | 2.5/45 | 800bp | negative
3D7 PFL0030cDBL3 | 3D7 genomic | ATACTATAATACATGGAG | CATTATTAGTGCATGCGTC | 2.5/45 | 700bp | negative
3D7 PFL0030cDBL6 | 3D7 genomic | CTTCGGACATTAATAAAGGTGTGC | CAATTATTTTTAACTTCTGTGTCATC | 1.5/45 | 800bp | negative
3D7 PFL0030cexon2 | 3D7 genomic | ACGTGTACTTGATATACC | TTTCCATCTGATCGTCAC | 2.5/45 | 1.2kbp | negative
A4-var3 5'UT | A4 genomic | ATATTATGGATAATACAGATAG | TCTATACCAAATGATTGCCAT | 1.5/50 | 500bp | negative
A4-var3 DBL1 | A4 genomic | ACAGTAATGCTGGAGCATGT | CACACTTTGGATGTGTCAA | 2.0/50 | 500bp | negative
A4-var3 DBL2 | A4 genomic | ATCCATTAGAAAAATGTC | GTTTTGTTACGTATGATG | 1.5/45 | 872bp | negative
A4-var3 intron | A4 genomic | GGTGTCGCCTTAACTCTA | ATTTGTAATATCCGTATCATAT | 3.5/50 | 242bp | negative
A4-var3 exon2 | A4 genomic | TGAAGTAGATATGATACG | GAAATTGTTGTTCCAACG | 1.5/45 | 1.2kbp | rings
A4-var3 3'UT | A4 genomic | TGTCATTGTACATAATTCAATA | GGTATTTTACAACTTATGATAC | 3.5/50 | 840bp | negative

* Concentration MgCl2/Annealing temperature, °C. Genomic = A4 genomic DNA. YAC = A4var-containing YAC.
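
The compact entries in Table S1 (for example "2.5/50" for conditions and "1.2kbp" for product size) can be unpacked programmatically when the table is reused. The following is a minimal sketch, not part of the original supplement. It assumes the MgCl2 value is in mM, as in the Methods, and that "kb" and "kbp" both mean 1000 bp; the names `PcrCondition`, `parse_condition` and `parse_size_bp` are hypothetical.

```python
import re
from typing import NamedTuple


class PcrCondition(NamedTuple):
    mgcl2_mM: float   # MgCl2 concentration (assumed to be mM)
    anneal_C: float   # annealing temperature in degrees Celsius


def parse_condition(text):
    """Parse a Table S1 conditions entry such as '2.5/50'."""
    mgcl2, anneal = text.strip().split("/")
    return PcrCondition(float(mgcl2), float(anneal))


def parse_size_bp(text):
    """Convert a product size such as '450bp', '1.6kb' or '1.2kbp' to base pairs."""
    match = re.fullmatch(r"([\d.]+)\s*(kbp|kb|bp)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    value, unit = match.groups()
    factor = 1 if unit == "bp" else 1000
    return round(float(value) * factor)


if __name__ == "__main__":
    # A4-DBL1 row of Table S1: conditions 3.0/50, expected product 1.2kbp
    print(parse_condition("3.0/50"), parse_size_bp("1.2kbp"))
```

Keeping the two fields as typed values makes it straightforward to group primer pairs that share a reaction setup, for example all fragments amplified at 1.5mM MgCl2 with a 50°C annealing step.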