Malaria genomic epidemiology research
Dr. Alyssa Barry
Malaria Genomic Epidemiology Lab., Centre for Population Health

What is Malaria?
• A disease caused by infection with Plasmodium spp. parasites
• Carried from person to person by anopheline mosquitoes
• Six species of Plasmodium cause malaria
  – P. vivax, P. falciparum, P. malariae, P. ovale curtisi, P. ovale wallikeri, P. knowlesi
  – P. falciparum causes most morbidity and mortality
• Symptoms include fever, nausea, vomiting, diarrhoea, tissue damage, multiple organ failure, severe anaemia, coma (cerebral malaria) and death

The Malaria Parasite Lifecycle – human host

The Burden of Malaria
• ~50% of the global population is at risk of malaria
• Half a billion clinical attacks each year
• At least 1 million deaths each year
• Two or three people die of malaria every minute!

Who Is Most at Risk?
• Children under 5 years old
  – Malaria is among the top 5 causes of death
• Pregnant women
  – 400 million births/yr in malaria-affected areas
• Other non-immunes
  – natural disaster
  – war
  – environmental change
  – climate change

Effects of Malaria
Besides direct morbidity and mortality:
  – Reduced school attendance
  – Lower productivity
  – Impaired intellectual development
  – Developmental abnormalities
  – 2% less GDP growth in malarious countries
  – Costs Africa about US$12 billion a year

Malaria Genomic Epidemiology
• Genomic epidemiology (definition): the systematic investigation of how variation in the human genome, and in the genomes of human pathogens, affects the occurrence and clinical outcome of disease
• We are investigating patterns of genomic diversity within natural malaria parasite populations to:
  – Monitor patterns and routes of transmission (molecular epidemiology, population genetics, ecology)
  – Design malaria vaccines (what strains circulate?)
  – Understand parasite evolution (changes over time, immune selection, interactions with host molecules)
  – Understand how humans naturally acquire immunity to diverse malaria parasites

Malaria parasite diversity
[Figure: variant-specific antibodies; vaccine]
• A diverse parasite population will be more resilient to interventions
• Partial efficacy of single-strain vaccines: a malaria vaccine may need to contain multiple variants
• Rapid evolution of drug resistance and other advantageous traits

Malaria parasite population structure
[Figure: parasite populations A, B, C and D]
• Gene flow: movement of different strains between populations
  – Speed and direction of the dissemination of advantageous traits
• Variability in allele frequencies; unique alleles
  – Population-specific approaches to malaria control (e.g. tailored vaccines, efforts targeted to specific foci)

Polymorphism 101
• Derived from the Greek language
  – Poly = many (πολύ)
  – Morph = form (μορφή)
• The occurrence in a population (or among populations) of several phenotypic forms associated with alleles (variants, types) of one gene
• Genetic polymorphism: the occurrence together in the same population of two or more alleles or genetic markers (e.g. a nucleotide or string of nucleotides) at the same locus (position in the genome)
• Therefore, genetic variation results in several different forms or types of individuals among the members of a single species (diversity)
• e.g. in humans: blood group, hair colour, eye colour, disease status
• e.g. in microorganisms: drug sensitivity/resistance, growth characteristics, antigenic diversity (strains)
• Caused by mutation

Types of polymorphism
Fragment size/pattern analysis (electrophoresis):
• AFLP: Amplified Fragment Length Polymorphism
• RFLP: Restriction Fragment Length Polymorphism
• SSLP: Simple Sequence Length Polymorphism
  – Microsatellites: tandem repeats of short units (1-4 bp)
  – Minisatellites: tandem repeats of longer units (typically 10-60 bp)
Sequence analysis (sequencing, but also RFLP, SSLP):
• SNP: single nucleotide polymorphism
• Indel: insertion or deletion
• Simple sequence repeats: polynucleotides (AAAA), microsatellites (TATATA), etc.

Microsatellites
e.g. TA TA TA TA TA TA TA TA
• Arrays of short tandem repeats with 1-4 bp repeat units
• A class of variable number tandem repeat (VNTR) used in DNA fingerprinting
• Also known as simple sequence repeats (SSRs)
• Abundant and rapidly evolving
• Highly polymorphic
• Detected by size variation
• Fairly evenly spaced through the genome
• Cheap to analyze

Chromatin Binding Protein: intronic microsatellite polymorphism (TA)n
Alignment of seven isolates (":" = alignment gap); the slide also labels the 5' regulatory domain, exon I and intron I.

#61
3D7(GB) AATTAAATAG GATTAAAATA ATTGTCATAA AAAAAATTAT ATATACTTGA AAAAGCAAAT tatatatat: :::::::att taatataaac aaaatatatt
3D7     AATTAAATAG GATTAAAATA ATTGTCATAA AAAAAATTAT ATATACTTGA AAAAGCAAAT tatatatat: :::::::att taatataaac aaaatatatt
HB3     AATTAAATAG GATTAAAATA ATTGTCATAA AAAAAATTAT ATATACTTGA AAAAGCAAAT tatatatat: :::::::att taatataaac aaaatatatt
W2      AATTAAATAG GATTAAAATA ATTGTCATAA AAAAAATTAT ATATACTTGA AAAAGCAAAT tatatat::: :::::::att taatataaac aaaatatatt
Muz12   AATTAAATAG GATTAAAATA ATTGTCATAA AAAAAATTAT ATATACTTGA AAAAGCAAAT tatatatat: :::::::att taatataaac aaaatatatt
Muz37   AATTAAATAG GATTAAAATA ATTGTCATAA AAAAAATTAT ATATACTTGA AAAAGCAAAT tatatatat: :::::::att taatataaac aaaatatatt
Muz51   AATTAAATAG GATTAAAATA ATTGTCATAA AAAAAATTAT ATATACTTGA AAAAGCAAAT tatatatata tatat::att taatataaac aaaatatatt

#121
3D7(GB) GACTGATTTT TTAAGgtatg aataaaatga atataatata
3D7     GACTGATTTT TTAAGGtatg aataaaatga atataatata
HB3     GACTGATTTT TTAAGGtatg aataaaatga atataatata
W2      GACTGATTTT TTAAGGtatg aataaaatga atataatata
Muz12   GACTGATTTT TTAAGGtatg aataaaatga atataatata
Muz37   GACTGATTTT TTAAGGtatg aataaaatga atataatata
Muz51   GACTGATTTT TTAAGGtatg aataaaatga atataatata

#181
3D7(GB) taacctaaga tatatatgtt ttttcatata atagttaata
3D7     taacctaaga tatatatgtt ttttcatata atagttaata
HB3     taacctaaga tatatatgtt ttttcatata atagttaata
W2      taacctaaga tatatatgtt ttttcatata atagttaata
Muz12   taacctaaga tatatatgtt ttttcatata atagttaata
Muz37   taacctaaga tatatatgtt ttttcatata atagttaata
Muz51   taacctaaga tatatatgtt ttttcatata atagttaata

Single Nucleotide Polymorphisms (SNPs)
• Point mutation: variation at a single nucleotide position
  – e.g. A/C, G/A, etc.
• Clustered in rapidly evolving genes, e.g. human MHC genes, P. falciparum var genes, HIV env
• A good SNP map is useful for population genetics and linkage analysis
• Rapid, high-throughput detection is possible but can be expensive

P. falciparum Erythrocyte Binding Antigen 175 (EBA175)

Population genetic markers for P. falciparum
• Different markers show different patterns
• The P. falciparum genome: SNP "islands" [Figure: coding vs non-coding regions]
• Markers under selection
  – Vaccine candidate antigens
    • Inform vaccine design
    • Novel vaccine candidates: immune selection?
  – Drug resistance genes and their genetic background
    • Is resistance spreading (how fast, in which direction), or are there multiple independent origins?
• Neutral markers
  – Genome-wide microsatellites and SNPs
    • Population biology, e.g. how diverse (fit) is the parasite population? How much gene flow is there? i.e. how difficult will the parasite population be to control?

Population biology of P. falciparum in PNG
• Intense year-round transmission of P. falciparum in the lowlands (50-60%), epidemics in the highlands
• Any species (~80%), P. vivax (~50%), P. malariae (~20%), P. ovale (~5%)
• Diverse micro-epidemiology: spatially variable transmission, host genetics, vector species, malaria control (bednets)
• Complex population genetics?

Collecting samples
Volunteers → isolate blood → extract genomic DNA → analysis

"Wet" lab methods
• gDNA: n ~ 3000
• Screen for P. falciparum infection (msp2 PCR, multiplex): n ~ 1500
• Count the number of msp2 bands: mean multiplicity of infection (MOI) = 1.7 (range 1-13)
• Whole genome amplification of single infections: n ~ 700
• Microsatellite genotyping; antigen gene PCR and sequencing
• Data analysis

Microsatellite protocol
• 50 + 50 bp flanking sequence + 16 bp repeat = 116 bp PCR product = 8 repeat units
• 50 + 50 bp flanking sequence + 20 bp repeat = 120 bp PCR product = 10 repeat units (CACACACACACACACACACA / GTGTGTGTGTGTGTGTGTGT)
• Small size difference (4 bp): cannot be resolved by agarose gel electrophoresis
• Solution: sequencing or, for cheaper high-throughput analysis, run the PCR products on an ABI sequencer
• The latter solution requires products to be fluorescently labeled

Approach: fragment analysis on an ABI sequencer
• Dye attached to the 5' end of a primer
• Fluorescent dye is incorporated into the PCR product

Microsatellite genotyping
• Multilocus genotyping: different product sizes and different coloured dyes increase the number of loci that can be analyzed in a single run, and the sensitivity of the assay
[Figure: sequencing gel and chromatogram; isolates genotyped at loci 1 to 7]
• The haplotype, a string of alleles (e.g. the number of repeats at each locus, 15_12_6_8_10_6), is then determined for each isolate

“In silico” analysis
[Figure: allelic richness for Utu, Malala, Mugil, Wosera and Total]

Population biology of P. falciparum in PNG
Factors that may influence the distribution of parasites:
  – Mugil/Karkar Island ferry
  – Malala boarding school
  – vector species
  – language groups
  – human genetics
• High but variable diversity and population structure
• Implications for control, elimination and the spread of vaccine and drug resistance (mapping routes of transmission)
• Currently sequencing several vaccine candidate antigens in two of these populations to inform vaccine design

Acknowledgements
PNG communities and volunteers
PNGIMR: Peter Siba, Ivo Mueller, Nicholas Senn, Livingstone Tavul, Ore Toporua, Benson Kiniboro, Joe Nale, Thomas Adiguma, Elias Namosha
Burnet Institute: Lee Schultz, Pilate Ntsuke, Johanna Wapling, John Reeder
Harvard School of Public Health: Caroline Buckee
Funding
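Worked example (microsatellite sizing). The microsatellite protocol slide converts a PCR product size into a repeat count by subtracting the 50 + 50 bp of flanking sequence and dividing the remaining repeat region by the 2 bp CA unit. The Python sketch below reproduces that arithmetic and joins per-locus repeat counts into the underscore-separated haplotype string from the genotyping slide. The function names and the defaults `flank_bp=100` and `unit_bp=2` are assumptions taken from the slide's example locus. Other loci have different flanks, and sizes called on an ABI sequencer are not exact integers, so real data would need allele binning first.

```python
def repeat_units(product_bp: int, flank_bp: int = 100, unit_bp: int = 2) -> int:
    """Convert a microsatellite PCR product size (bp) into a repeat count.

    Hypothetical helper: assumes the product is flank_bp of non-repeat
    flanking sequence (50 + 50 bp in the slide's example) plus whole
    repeat units of unit_bp each.
    """
    repeat_bp = product_bp - flank_bp
    if repeat_bp < 0 or repeat_bp % unit_bp != 0:
        raise ValueError(f"{product_bp} bp is not flank + whole {unit_bp} bp units")
    return repeat_bp // unit_bp


def haplotype(repeats_per_locus: list[int]) -> str:
    """Join per-locus repeat counts into an underscore-separated haplotype string."""
    return "_".join(str(n) for n in repeats_per_locus)


print(repeat_units(116))                 # 8
print(repeat_units(120))                 # 10
print(haplotype([15, 12, 6, 8, 10, 6]))  # 15_12_6_8_10_6
```

Alleles one repeat unit apart differ by only 2 bp, and the slide's 8- and 10-repeat alleles by 4 bp. Differences this small are why the protocol relies on capillary fragment sizing rather than agarose gels.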
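Worked example (counting a (TA)n run). In the Chromatin Binding Protein alignment, the isolates differ in how many TA units they carry, and gap characters (":") pad the shorter alleles. The following minimal sketch strips alignment gaps and reports the longest uninterrupted run of a given repeat unit. It assumes perfect repeats (no interruptions), and `longest_tandem_run` is a hypothetical helper. Apart from the 10 × CA repeat taken from the protocol slide, the inputs in the usage lines are made-up sequences.

```python
def longest_tandem_run(seq: str, unit: str, gap_chars: str = ":-") -> int:
    """Return the largest number of consecutive, uninterrupted copies of `unit` in `seq`.

    Comparison is case-insensitive, and alignment gap characters are removed
    first so that gapped and ungapped versions of an allele give the same count.
    """
    if not unit:
        raise ValueError("unit must be non-empty")
    # Drop alignment gaps, then compare in upper case.
    s = "".join(c for c in seq if c not in gap_chars).upper()
    u = unit.upper()
    best = 0
    # Try every start position; step forward one unit at a time while it matches.
    for start in range(len(s)):
        count, pos = 0, start
        while s.startswith(u, pos):
            count += 1
            pos += len(u)
        best = max(best, count)
    return best


print(longest_tandem_run("ggtatataCC", "TA"))      # 3
print(longest_tandem_run("gg:ta:tata::cc", "ta"))  # 3
print(longest_tandem_run("CA" * 10, "CA"))         # 10 (the 20 bp repeat on the protocol slide)
```

Comparing this count across isolates gives the same length polymorphism that fragment sizing detects, but read directly from sequence.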
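Worked example (scanning an alignment for polymorphisms). SNPs and indels can be read off an alignment column by column. A column where every isolate has the same base is invariant, a column with two or more different bases is a SNP, and a column containing a gap is an indel. The sketch below assumes pre-aligned sequences of equal length and treats ":" and "-" as gaps. As a simplification, any column with a gap is classed as an indel. It also ignores letter case, on the assumption that case marks annotation (for example exon vs intron) rather than a different base, as with "TTAAGgtatg" in 3D7(GB) versus "TTAAGGtatg" in the other isolates. `polymorphic_sites` and the toy alignment are hypothetical.

```python
def polymorphic_sites(alignment: dict[str, str], gap_chars: str = ":-") -> list[tuple[int, str]]:
    """List variable columns of a pre-aligned set of sequences.

    Returns (0-based column index, "SNP" or "indel") for each column where the
    sequences disagree. Case is ignored (assumed to be annotation only).
    """
    seqs = [s.upper() for s in alignment.values()]
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    sites = []
    for i in range(length):
        column = {s[i] for s in seqs}
        if len(column) == 1:
            continue  # invariant column: every isolate shares one base
        kind = "indel" if column & set(gap_chars) else "SNP"
        sites.append((i, kind))
    return sites


toy = {  # hypothetical isolates, not data from the slides
    "isolate_1": "ACGTA:TA",
    "isolate_2": "ACTTATTA",
    "isolate_3": "acgta:ta",
}
print(polymorphic_sites(toy))  # [(2, 'SNP'), (5, 'indel')]
print(polymorphic_sites({"3D7(GB)": "TTAAGgtatg", "3D7": "TTAAGGtatg"}))  # []
```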