Statistical Methods to Prioritize GWAS Results by Integrating Pleiotropy and Annotation
Hongyu Zhao
Yale School of Public Health
June 25, 2014
Joint work with Min Chen, Lin Hou, Tianzhou Ma, Can Yang, Dong-Jun Chung, Cong Li, Judy Cho, Joel Gelernter

What we have learned from GWAS
• Genes/Variants associated with phenotypes
• Genetic risk prediction
• Genetic architecture

Crohn’s Disease
pathway_name: IL22 Soluble Receptor Signaling Pathway (same for all 53 rows; label on slide: jewish)

gene_symbol   pvalue
IL23R         0.002297
SOCS1         0.010415
IL2RA         0.017337
PRLR          0.019376
STAT2         0.033827
TYK2          0.052902
IL10RB        0.060543
CNTFR         0.068332
IL12RB2       0.072698
IL20RA        0.077203
IFNAR2        0.085782
IL22          0.10299
IL22RA2       0.113906
IL6ST         0.124483
IL21R         0.125142
IL6R          0.125529
SOCS2         0.131336
IL13RA2       0.142406
IL7R          0.146245
JAK2          0.166414
IL11RA        0.16868
GHR           0.191144
CSF3R         0.191723
IFNGR2        0.208994
IL12RB1       0.267659
IL28RA        0.294141
JAK1          0.317088
STAT6         0.349177
LEPR          0.391859
IFNAR1        0.392715
IL15RA        0.414013
SOCS6         0.442633
SOCS3         0.444405
IL22RA1       0.469906
STAT1         0.503734
STAT4         0.504923
EPOR          0.553102
SOCS4         0.556056
IL2RB         0.61677
STAT5A        0.661919
IL2RG         0.672769
IFNGR1        0.676117
JAK3          0.702464
IL4R          0.746998
STAT3         0.780401
IL5RA         0.78238
LIFR          0.803115
SOCS5         0.807055
CSF2RB        0.903223
STAT5B        0.906422
IL10RA        0.924236
OSMR          0.928906
IL13RA1       0.973552

Network-Based Analysis
• Start from a known interaction/co-expression network [N: assumed to be known]
• Each gene is either associated or not associated with a phenotype [D: unknown]
• Each gene has observed statistical evidence for association [Z: observed]
• Goal: Infer D conditional on N and Z
Chen, Cho, Zhao (2011) PLoS Genetics

Application to CD GWAS
Chen, Cho, Zhao (2011) PLoS Genetics

Co-Expression Networks
Zhou et al. (2002) PNAS

Guilt by Rewiring: Motivation
• Gene networks differ between healthy controls and diseased individuals.
• The differences are as important as, or even more important than, the commonalities.
[Figure: example networks on genes A, B, C, D for Control and Disease, and the resulting rewiring network]
Hou et al. (2014) Human Molecular Genetics

MRF model leads to better replication rates between independent studies
• Negative control:
  – Non-specific microarray dataset (brown line, left figure)
Hou et al. (2014) Human Molecular Genetics

Signal enrichments in DHS sites
Hou, Ma, Zhao (2014)

Better replication rates at DHS sites
Hou, Ma, Zhao (2014)

Weighted scheme to integrate DHS site information to prioritize SNPs

http://dongjunchung.github.io/GPA/

GPA formulation

GPA: Single GWAS
Chung et al. (2014) PLoS Genetics, under revision

GPA: Modeling Pleiotropy

GPA: Modeling Annotation Data

Modeling Pleiotropy and Annotation

Key Assumptions for GPA

Simulations

Comparisons with conditional FDR approach

GPA: Enrichment Testing
• Pleiotropy and enrichment for annotation can be checked conveniently using the hypothesis testing procedure incorporated into the GPA framework.
• Null hypothesis for pleiotropy: H0: (π10 + π11)(π01 + π11) = π11

  G1/G2    Null   Assoc.
  Null     π00    π01
  Assoc.   π10    π11

• Hypothesis testing for annotation enrichment: H0: q0 = q1

GPA: Hypothesis Testing

Comparisons with GSEA

Five Psychiatric Disorders
• Five psychiatric disorders:
  – ADHD
  – Autism spectrum disorder (ASD)
  – Bipolar disorder (BIP)
  – Major depressive disorder (MDD)
  – Schizophrenia (SCZ)
• Strong pleiotropy exists for BIP-SCZ, MDD-SCZ, ASD-SCZ, and BIP-MDD.

Five Psychiatric Disorders
[Figure panels: BIP: separate analysis; BIP: joint analysis]

Five Psychiatric Disorders
[Figure panels: SCZ: separate analysis; SCZ: joint analysis]

Comparisons with Linear Mixed Models

• Integration of bladder cancer GWAS data with ENCODE DNase-seq data from 125 cell lines.
• Annotations from 11 cell lines are significantly enriched at α = 0.01 after Bonferroni correction.

Acknowledgements
Medicine: Judy Cho (Mount Sinai)
Psychiatry: Joel Gelernter
Yale Center for Statistical Genomics and Proteomics: Min Chen (UT Dallas), Lin Hou, Tianzhou Ma (U. Pittsburgh), Can Yang (HKBU), Dong-Jun Chung (MUSC), Cong Li
Various NIH and NSF grants
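
Backup sketch: two quantities from the talk, namely the pleiotropy null H0: (π10 + π11)(π01 + π11) = π11 and the Bonferroni threshold for 125 cell lines at α = 0.01, can be checked with a few lines of Python. This is only an illustration. The function names and the example proportions are hypothetical. The code evaluates the null identity on given proportions and is not the hypothesis testing procedure built into GPA.

```python
def pleiotropy_gap(pi00, pi01, pi10, pi11):
    """Return pi11 minus the product of the two marginal association proportions.

    Under H0: (pi10 + pi11)(pi01 + pi11) = pi11, association with the two
    phenotypes is independent and the returned gap is zero.
    """
    total = pi00 + pi01 + pi10 + pi11
    if abs(total - 1.0) > 1e-9:
        raise ValueError("the four proportions must sum to 1")
    assoc_g1 = pi10 + pi11  # proportion of SNPs associated with the first phenotype
    assoc_g2 = pi01 + pi11  # proportion of SNPs associated with the second phenotype
    return pi11 - assoc_g1 * assoc_g2


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Hypothetical proportions (not estimates from any real GWAS).
# Case 1: marginals 0.10 and 0.20, joint 0.02 = 0.10 * 0.20, so H0 holds.
gap_independent = pleiotropy_gap(0.72, 0.18, 0.08, 0.02)
# Case 2: joint proportion 0.10 exceeds 0.15 * 0.15, suggesting pleiotropy.
gap_shared = pleiotropy_gap(0.80, 0.05, 0.05, 0.10)

# Threshold for testing annotations from 125 cell lines at alpha = 0.01.
per_test_alpha = bonferroni_threshold(0.01, 125)

print(gap_independent, gap_shared, per_test_alpha)
```

A positive gap means more SNPs are jointly associated than independence would predict. Whether such a gap is statistically significant still requires a formal test on the fitted model.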