In search of susceptibility genes for psychiatric illness

Andrea Christoforou
Medical Genetics Section, Molecular Medicine Centre
MVM Research Symposium: November 3, 2008

Introduction

Two devastating psychiatric illnesses:
Bipolar Disorder (BP): Mania; Depression
Schizophrenia (SCZ): Positive (delusions, hallucinations); Negative (lack of affect); Cognitive (speech poverty, disorganised thought)
Each affects 1% of the general population

Complex Genetics

[Figure: % concordance (0 to 50) for SCZ and BPAD by relationship: general population, first cousins, uncles/aunts, nephews/nieces, grandchildren, half-siblings, parents, siblings, children, DZ twins, MZ twins. Adapted from Chpts 9&10 of Psychiatric Genetics and Genomics]

Complex pattern of inheritance
Difficult to identify causative genetic factors:
Gender
Environment
Variable age at onset
Genetic heterogeneity
Poly/oligo-genic inheritance
Unclear diagnostic boundaries (BP, SCZ)

Single LARGE Family with BP

[Figure: family F22, Chr 4; LODs: 4.4, 0.88, 1.97, 3.24]

A single LARGE family
→ Reduce genetic heterogeneity

Linkage analysis (eg MERLIN, SUPERLINK,…)
→ identify large segments of DNA that segregate with a particular phenotype/illness within a family
→ genome-wide
→ no a priori information
→ Significant LOD score = gene of major effect

Blackwood et al., 1996; Le Hellard et al., 2007; Houlihan, 2008

Association Analysis of the Chromosome 4p15-p16 Candidate Region for Bipolar Disorder and Schizophrenia.
Refining the locus

[Figure: F22 linkage region on Chr 4 divided into Regions A, B, C and D; LODs: 4.4, 0.88, 1.97, 3.24]

“First linkage, then association”
→ In unrelated cases and controls
→ More powerful
→ Better resolution
(Risch & Merikangas, Science 1996)

Hypothesis: Gene X
Prioritised: 8.3Mb of 20Mb F22 linkage region
Analysis is underway for the remaining region

Outline of Methods

Marker Selection
Genotyping
Data processing
Data Analysis
Replication

SNP Selection

Catalogue of common genetic variants (single nucleotide polymorphisms, SNPs) in humans: http://www.hapmap.org/

[Figure: UNDERLYING THEORY, example nucleotide sequences illustrating SNPs and haplotypes]

→ Downloaded all available SNP genotype data for Regions B and D
→ CEU sample data
→ Selected based on linkage disequilibrium (LD) ~ correlation between SNPs
→ Using Haploview
→ SNPs selected to tag haplotypes of SNPs -> htSNPs

Summary of SNPs Genotyped

Chromosome 4 htSNPs
Region B: 149
Region D: 259
Total: 408

Samples from the Scottish population
BP: 368
SCZ: 386
Controls: 458

Methods

Marker Selection (HapMap)
Genotyping (Illumina BeadArray)
Data processing & QC (~700,000 genotypes)
Data Analysis

Single marker χ2 test
→ Allele test (χ2, 1 df) and genotype test (AA, AT, TT; χ2, 2 df)
→ Allele and genotype P-values to determine whether cases and controls differ

Haplotype (Cocaphase v2.4)
→ Sliding windows: 2-5 SNPs
→ Global and individual (EM)
→ Likelihood ratio test (LRT)

By Diagnosis and Gender
→ BP, SCZ and All cases
→ Male, Female and Both

[Figure: schematic of SNP1 to SNP4 genotyped in cases and controls, with single-marker and sliding-window haplotype tests]

Single-marker analysis

[Figure: Distribution of allele p-values. P-value (-log scale) against position along Chromosome 4 for Region B and Region D; series: ● BP, ■ SCZ, ○ All; reference lines at 0.05 and 0.01]

What is an appropriate significance threshold?
→ P ≤ 0.05?
→ P ≤ 0.01?

How do we account for multiple testing?

Bonferroni: ~ 0.05/# tests
→ inappropriate

Permutation: Shuffle case/control status
→ GOLD STANDARD
→ BUT computational and time constraints due to software

SNPSpD (D. Nyholt)
→ Meff

Single-marker analysis

“Nyholt-corrected” significance thresholds
Region B: 149 -> 108 Meff, P ≤ 0.0005 (≈ 0.05/108)
Region D: 259 -> 191 Meff, P ≤ 0.0003 (≈ 0.05/191)
→ Supported by permutation correction

FOR HAPLOTYPE ANALYSIS
Permuted with Cocaphase 2.4: haplotypes with global P ≤ threshold and ≤ 3 SNPs (1000 permutations)

[Figure: Distribution of allele p-values. P-value (-log scale) against position along Chromosome 4 for Region B (threshold line at 0.0005) and Region D (threshold line at 0.0003); series: ● BP, ■ SCZ, ○ All; reference lines at 0.05 and 0.01]

Haplotype Analysis Region B: P ≤ 0.0005

[Figure: P-value (-log scale, 0.01 to 0.00001) against Region B project SNP number (0 to 140); series: BP M global haplotype, BP F global haplotype, BP M individual haplotype, BP F individual haplotype; highlighted haplotypes B-1 and B-4; interval labels 577kb and 239kb; gene labels not recoverable from the extraction]

Most of these haplotypes survived the permutation correction.

Haplotype Analysis Region D: P ≤ 0.0003

[Figure: P-value (-log scale, 0.001 to 0.000001) against Region D project SNP number (0 to 250); series: global and individual haplotypes for BP F, BP MF, SCZ M, SCZ MF and All F; highlighted haplotypes D-2 and D-7; interval labels 17kb and 25kb; gene labels not recoverable from the extraction]

Most of these haplotypes survived the permutation correction.

REPLICATION IS ESSENTIAL!
SCOTTISH Sample
GERMAN Sample

Permutation analysis

General Principle
→ Disrupt the relationship being tested (eg frequencies in cases vs controls) multiple times to create a null distribution.
→ See where the actual result falls within the empirical null distribution.
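The general principle above can be sketched in a few lines of Python. This is only an illustration for a single SNP, not how Cocaphase implements its permutation procedure: the function names `allelic_chi2` and `permutation_p`, the 0/1/2 genotype coding, and the (hits + 1)/(n + 1) convention for the empirical P-value are all assumptions made for the example.

```python
import random


def allelic_chi2(genotypes, is_case):
    """Chi-square statistic (1 df) for a 2x2 table of allele counts.

    genotypes: copies of the minor allele (0, 1 or 2) for each individual.
    is_case:   True for cases, False for controls, in the same order.
    """
    # table[row][col]: row 0 = controls, row 1 = cases;
    # col 0 = major-allele count, col 1 = minor-allele count.
    table = [[0, 0], [0, 0]]
    for g, case in zip(genotypes, is_case):
        row = 1 if case else 0
        table[row][1] += g
        table[row][0] += 2 - g
    total = sum(map(sum, table))
    chi2 = 0.0
    for r in range(2):
        for c in range(2):
            expected = sum(table[r]) * (table[0][c] + table[1][c]) / total
            if expected > 0:
                chi2 += (table[r][c] - expected) ** 2 / expected
    return chi2


def permutation_p(genotypes, is_case, n_perm=1000, seed=0):
    """Empirical P-value obtained by shuffling case/control status."""
    rng = random.Random(seed)
    observed = allelic_chi2(genotypes, is_case)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        # Shuffling the labels breaks any genotype/phenotype relationship.
        rng.shuffle(labels)
        if allelic_chi2(genotypes, labels) >= observed:
            hits += 1
    # Count the observed statistic as one member of the null distribution.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical toy data: 10 cases and 10 controls.
    genos = [2, 2, 1, 2, 1, 2, 2, 1, 2, 2, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1]
    status = [True] * 10 + [False] * 10
    print(permutation_p(genos, status))
```

For a correction across many correlated SNPs, one common variant of the same idea (again an assumption here, not a description of the software used in the talk) is to record the largest statistic over all SNPs in each permutation and compare each observed statistic against that distribution.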
“Gold standard” for multiple-testing correction
→ no assumptions
→ corrects for hidden correlation

BUT time consuming
→ in Cocaphase, run serially: Perm1 -> Perm2 -> Perm3 -> …
→ because of this, we only permuted
   → SNPs with P ≤ Nyholt threshold
   → Haplotypes with global P ≤ Nyholt threshold and ≤ 3 SNPs in size

Solution
→ parallelize Cocaphase so that permutations are run in parallel
→ problem tackled by student Omer Jilani, MSc in Computer Science

Cocaphase on the Grid (ECDF)

[Figure: panels for 5-SNP haplotypes, 259 SNPs, and the full dataset. Figures provided by Omer Jilani]

A further approach…

[Figure: F22 LODs (4.4, 0.88, 1.97, 3.24) with three converging approaches: Linkage Analysis, Association Analysis, Microarray Analysis]

Microarray Expression Study

[Figure: F22 pedigree with individual sample identifiers; Genotype Data Available. Legend: Affected w/ “disease haplotype”; Affected w/o “disease haplotype”; Unaffected w/ “disease haplotype”; Married-in Controls]

Differentially expressed genes
Expression QTLs (eQTLs): SNP loci that control expression of genes (cis/trans)

Conclusion

Acknowledgments

David Porteous Kathy Evans Albert Tenesa CRF Pippa Thomson Stewart Morris Naomi Wray Ian White Dave Liewald Steve Cass Psychiatric Genetics Section Helen Torrance Susan Anderson Lorna Houlihan Douglas Blackwood Walter Muir Sven Cichon Lee Murphy Angie Fawkes Alison Condie Medical Genetics Section Varrie Ogilvie Laura Hyndman ECDF Omer Jilani Jon Weiss Sam Skipsey Mike Baker John Blair-Fish