Homogenizing data using ENIGMA-MEGA analysis
Peter Kochunov
University of Maryland, School of Medicine

Introduction
• What is mega-analysis
• MEGA-Analysis algorithm developed by ENIGMA
• Examples:
– MEGA analysis of additive genetic effects
– MEGA analysis of SCZ effects on white matter integrity

Mega-analysis
• Combining of "raw" data from multiple studies
• Pros:
– Additive increase in degrees of freedom
– Simplified analysis structure
– Uniform weighting by subjects (removes the weighting uncertainty of meta-analysis of genetic data)
– Ideal for familial genetics studies where analysis is performed per family

Mega-Analysis: Cons
• Data may have a site-specific bias
[Figure: histogram (%) of average FA values, 0.3 to 0.55, for three cohorts: TAOS (ages 13-15), UCLA-QTIM (ages 20-30), GOBS (ages 18-85)]

Mega-Analysis Algorithm
• Developed by Neda and me
• Coded in SOLAR-Eclipse
• Tried in the following analyses
– Additive Genetic Analysis (Heritability)
– Additive Genetic Analysis (Genetic correlation)
– Association Analyses (GWA)
– Disorder effect analyses: effects of Schizophrenia on white matter

Step 1. Regression of nuisance covariates is performed per site
Remove effects of the covariates that don't act as "contrast" to make data "equivalent" per site
[Figure: the histogram of average FA values per cohort (TAOS, UCLA-QTIM, GOBS) next to the per-cohort distributions expressed as Z-scores, -3 to 3]

Step 2 → Inverse normalization
[Figure: distribution of the pooled data as Z-scores]

Step 3 → Testing for Stratification
ANOVA of heritability estimates
Test of homogeneity of the effect per group
Measure h² per sample. Perform ANOVA
UCLA = 0.56±0.25; p=0.0001
TAOS = 0.49±0.23; p=0.04
GOBS = 0.45±0.07; p=10^-8
No difference among groups
We can combine them into a single pedigree with the weight assigned based on the relatedness and the pedigree strength.

Significance of additive effects: Mega vs. Meta
• Mega Analysis: lowest SE and highest significance
– h²=0.47±0.02; p=10^-16
• Meta Analysis StdErr-Weighted
– h²=0.48±0.09; p=0.004
– Greatly influenced by the small samples
• Meta Analysis N-Weighted
– h²=0.44±0.03; p=10^-6
– Difficult to justify given that subjects don't contribute equally

Similar trends in voxel-wise data
[Figure: P-values for heritability estimates (-log10)]

Effects of SCZ on white matter integrity
• Apply mega-analysis to study effects of disorder on FA values
• Use three samples collected on three scanners
• Some cross-over of subjects to directly study effects of data transform

Effects per Site: Site 1 (N=350)
Raw significance p=2*10^-6
Transformed significance p=10^-6
[Figure: histograms for Controls and Patients, raw FA values and transformed Z-scores]

Effects per Site: Site 2 (N=220)
Raw significance p=0.03
Transformed significance p=0.01
[Figure: histograms for Controls and Patients, raw FA values and transformed Z-scores]

Effects per Site: Site 3 (N=120)
Raw significance p=0.03
Transformed significance p=0.03
[Figure: histograms for Controls and Patients, raw FA values and transformed Z-scores]

Homogeneity of effect per site
Mega-analysis
[Figure: transformed distributions for Site 1, Site 2 and Site 3, and for the combined sample]
Combined Mega significance p=6*10^-9

Regional Specificity?
[Figure: pairwise scatter plots of regional significance, -log(p), between Site 1, Site 2 and Site 3. Linear fits shown: y = 0.4117x + 0.5196, R² = 0.23196; y = 0.2826x + 0.1678, R² = 0.37474; y = 0.0173x + 1.0281, R² = 0.00192]

Mega-analysis results of regional effects
Greatest impact with Schizophrenia
• Anterior corona radiata p<10^-11
• Genu of Corpus Callosum p<10^-6
• Inferior Frontal Occipital p<10^-5
• Superior Corona radiata p<10^-5

Mega-analysis results of regional effects
Least impact with Schizophrenia
• Cortico-Spinal Tract p=0.2
• Superior-Frontal Occipital p=0.05
• Uncinate fasciculus p=0.02

Regional MEGA vs site
[Figure: three panels, Site 1 (N=350), Site 2 (N=220) and Site 3 (N=120), each plotting per-site regional significance against Mega p-values, with ACR labeled. Linear fits shown: y = 0.3448x + 3.122, R² = 0.162; y = 0.1886x + 0.7025, R² = 0.2275; y = 0.1108x + 0.6827, R² = 0.10733]

How does this work on individual subjects?
N=35 subjects were imaged at Site 1 and 2 in studies 5 years apart
[Figure: Site 2 vs. Site 1 scatter plots for the same subjects, one in transformed Z-scores and one in raw FA values; correlations shown: R=0.55 and R=0.43]

Limitations: Normality
• Data for patients and controls has to be transformable to "normal" state
– Violated if patients have bi-modal distribution

Excessive Kurtosis
[Figure: two distributions plotted on Z-score axes, -4 to 4]
• Caused by bi-modal distribution of FLAIR lesions in patients
• Use inverse normal mapping parameters from the Controls
• Use bi-Gaussian fit to probabilistically separate patients

Acknowledgement
• ENIGMA Team
– Paul Thompson, Neda Jahanshad, Sinead Kelly, Jessica Turner
• The PIs of the GOBS project: John Blangero and David Glahn
• NIH
– R01s MH085646, R01DA027680, R01EB015611, MH078111, MH0708143 and MH083824
– U54EB020403 and P50MH103222
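
Illustration: Steps 1 and 2 in code
Steps 1 and 2 of the mega-analysis algorithm (per-site regression of nuisance covariates, then inverse normalization) can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the SOLAR-Eclipse implementation: it assumes a single nuisance covariate (age) with a linear effect, it assumes a rank-based inverse normal transform with a (rank - 0.5)/n offset, and the function names, site names and numbers are all hypothetical.

```python
from statistics import NormalDist, mean


def residualize(values, covariate):
    """Step 1 (assumed form): remove the linear effect of one nuisance
    covariate by ordinary least squares and return the residuals."""
    mx, my = mean(covariate), mean(values)
    sxx = sum((x - mx) ** 2 for x in covariate)
    sxy = sum((x - mx) * (y - my) for x, y in zip(covariate, values))
    slope = sxy / sxx if sxx else 0.0  # constant covariate: only center the data
    return [y - my - slope * (x - mx) for x, y in zip(covariate, values)]


def inverse_normal(values):
    """Step 2 (assumed form): rank-based inverse normal transform.
    Ranks are mapped to standard normal quantiles; ties are broken by
    input order, which a real analysis may want to handle differently."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    z = [0.0] * n
    for rank, i in enumerate(order, start=1):
        z[i] = NormalDist().inv_cdf((rank - 0.5) / n)
    return z


def homogenize(sites):
    """Apply both steps to every site separately.
    sites maps a site name to (fa_values, ages)."""
    return {name: inverse_normal(residualize(fa, age))
            for name, (fa, age) in sites.items()}


# Made-up FA values and ages for two hypothetical sites.
sites = {
    "site_A": ([0.52, 0.50, 0.47, 0.49, 0.45], [13, 14, 15, 13, 15]),
    "site_B": ([0.40, 0.38, 0.41, 0.36, 0.39], [22, 30, 20, 28, 25]),
}
z_scores = homogenize(sites)

# After the transform, the sites share one scale and can be pooled.
pooled = [z for name in sites for z in z_scores[name]]
```

Because each site is transformed on its own, site-specific offsets in raw FA are removed before pooling; the homogeneity test in Step 3 would still be needed before the pooled values are analyzed together.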