SUPPLEMENTARY INFORMATION

Identification of Germline Susceptibility Loci in ETV6-RUNX1-Rearranged Childhood Acute Lymphoblastic Leukemia

Short title: Identification of risk loci in ALL subtype

Eva Ellinghaus MSc 1,*, Prof. Martin Stanulla MD 2,*,‡, Gesa Richter PhD 1, David Ellinghaus MSc 1, Geertruy te Kronnie PhD 3, Gunnar Cario MD 2, Giovanni Cazzaniga PhD 4, Martin Horstmann MD 5, Prof. Renate Panzer Grümayer MD 6, Hélène Cavé MD 7, Prof. Jan Trka MD PhD 8, Ondrej Cinek PhD 9, Andrea Teigler-Schlegel PhD 10, Abdou ElSharawy PhD 1, Robert Häsler PhD 1, Prof. Almut Nebel PhD 1, Barbara Meissner MD 2, Thies Bartram MSc 2, Francesco Lescai PhD 11, Prof. Claudio Franceschi MD 12, Marco Giordan PhD 3, Prof. Peter Nürnberg PhD 13, Birger Heinzow MD 14, Martin Zimmermann PhD 15, Prof. Stefan Schreiber MD 1,16,17,*, Prof. Martin Schrappe MD 2,*, Prof. Andre Franke PhD 1,*,‡

1 Institute of Clinical Molecular Biology, Christian-Albrechts-University Kiel, Kiel, Germany
2 Department of Pediatrics, University Hospital Schleswig-Holstein, Campus Kiel, Kiel, Germany, on behalf of the German Berlin-Frankfurt-Münster Study Group for Treatment of Childhood Acute Lymphoblastic Leukemia
3 Department of Pediatrics, Laboratory of Pediatric Hematology/Oncology, University of Padua, Padua, Italy
4 M. Tettamanti Research Center, Children’s Hospital, University of Milan-Bicocca, Monza, Italy
5 Clinic of Pediatric Hematology and Oncology, University Medical Center, and Research Center Children’s Cancer Center, Hamburg, Germany
6 St. Anna Children’s Hospital and Children’s Cancer Research Institute, Vienna, Austria
7 Department of Genetics, Hôpital Robert Debré, Paris, France
8 Department of Pediatric Hematology/Oncology, Second Faculty of Medicine, Charles University Prague, Prague, Czech Republic
9 Department of Pediatrics, Second Faculty of Medicine, Charles University Prague, Prague, Czech Republic
10 Oncogenetic Laboratory, Department of Pediatric Hematology and Oncology, Justus-Liebig-University, Giessen, Germany
11 Division of Research Strategy, University College London, London, United Kingdom
12 Department of Experimental Pathology, University of Bologna, Bologna, Italy
13 Cologne Center for Genomics, University of Cologne, Cologne, Germany
14 State Social Services Agency Schleswig-Holstein, Kiel, Germany and University of Notre Dame, Sydney Medical School, Sydney, Australia
15 Pediatric Hematology and Oncology, Hannover Medical School, Hannover, Germany
16 Department of General Internal Medicine, University Hospital Schleswig-Holstein, Christian-Albrechts-University Kiel, Kiel, Germany
17 Popgen Biobank, University Hospital Schleswig-Holstein, University Kiel, Kiel, Germany

* These authors contributed equally to this work
‡ To whom correspondence should be addressed:

Prof. Dr. rer. nat. Andre Franke (a.franke@mucosa.de)
Institute of Clinical Molecular Biology
Christian-Albrechts-University Kiel
Schittenhelmstr. 12
D-24105 Kiel, Germany
Tel.: +49-431-597-4138
Fax.: +49-431-597-2196

Prof. Dr. med. Martin Stanulla, M.Sc. (martin.stanulla@uk-sh.de)
Department of Pediatrics
University Hospital Schleswig-Holstein
Campus Kiel
Arnold-Heller-Str. 3
D-24105 Kiel, Germany
Tel.: +49-431-597-1628
Fax.: +49-431-597-3966

Figure S1. Quality control of genome-wide association data.
(A) Statistical power to detect a given allelic disease association (for carriership of the rarer SNP allele) in screening panel A (419 cases and 474 controls) was calculated with PS Power and Sample Size v3.012 (1). Calculations were performed for different allele frequencies (denoted p0). The power is given as a function of the odds ratio, and the red dotted line marks the threshold of 80% power.

(B) To display whether the study generated more significant results than expected by chance, the quantile-quantile (Q-Q) plot of the association test statistic was calculated for all SNPs that passed quality control (n=355,750). The genomic inflation factor, based on the median chi-squared (λGC=1.14), indicated no or minimal undetected population stratification or cryptic relatedness.

(C) The de Finetti diagram shows the genotype distributions of all quality-controlled SNPs in the case-control population. Any point within the de Finetti triangle corresponds to a specific combination of the three genotype frequencies p11, p12 and p22 in relation to each other. The curved red line is referred to as the Hardy-Weinberg parabola and depicts the genotype distributions that strictly fulfill Hardy-Weinberg equilibrium. The black band of genotype distributions represents an area where the deviation from Hardy-Weinberg equilibrium is not too strong.

(D) To detect "outliers", pair-wise percentage identity-by-state (IBS) values were computed with PLINK (2). The distribution of the IBS values for each individual was compared with the combined IBS distribution of the entire population. We detected and removed 22 individuals that were less related to the entire population than expected. For these individuals, >60% of the IBS values were smaller than the median minus three times the interquartile range (3×IQR) of the population distribution. No cryptically related individuals with at least one observed IBS value above the median plus 3×IQR were detected.
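
The outlier rule described in (D) can be sketched in a few lines of Python. This is an illustration only and not the pipeline used in the study, where the IBS values were computed with PLINK. The function name `flag_low_ibs_outliers`, its parameters, and the input layout (a symmetric matrix of pairwise IBS values) are hypothetical. The sketch further assumes that the "combined IBS distribution" pools every unordered pair once and that quartiles are taken with the default method of Python's `statistics.quantiles`; other quartile conventions may shift the cutoff slightly.

```python
from statistics import median, quantiles


def flag_low_ibs_outliers(ibs, frac=0.60, k=3.0):
    """Flag individuals that are less related to the sample than expected.

    ibs  : symmetric n x n list of lists with pairwise IBS values
           (hypothetical input layout; the diagonal is ignored).
    frac : minimum share of an individual's IBS values that must fall
           below the cutoff (0.60 mirrors the ">60%" rule in the legend).
    k    : multiple of the interquartile range subtracted from the median.

    Returns (list of flagged row indices, lower cutoff).
    """
    n = len(ibs)

    # Assumption: the population-wide distribution pools each unordered
    # pair of individuals exactly once.
    pooled = [ibs[i][j] for i in range(n) for j in range(i + 1, n)]

    # Quartiles of the pooled distribution (default 'exclusive' method).
    q1, _, q3 = quantiles(pooled, n=4)
    lower = median(pooled) - k * (q3 - q1)

    outliers = []
    for i in range(n):
        # All IBS values involving individual i, excluding the diagonal.
        own = [ibs[i][j] for j in range(n) if j != i]
        share_low = sum(value < lower for value in own) / len(own)
        if share_low > frac:
            outliers.append(i)
    return outliers, lower
```

The mirrored check for cryptic relatedness in the legend (at least one IBS value above the median plus 3×IQR) would follow the same pattern with the comparison reversed and applied to single pairs rather than to a share of an individual's values.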
(E) The multidimensional scaling (MDS) plot of the first two principal components shows genuine European ancestry for the cleaned GWAS panel A, which was plotted together with the three distinct HapMap sample populations (see legend box in the plot). After exclusion of the 22 "outliers" (see D), 59 samples showed evidence of non-European ancestry and were removed as well.

Figure S2. Results of genome-wide association analysis.
For each SNP, the negative decadic logarithm of the P-value from the allelic test of the genome-wide association study is shown, ordered by chromosome. All markers that passed the quality control criteria, before clumping and visual inspection of the cluster plots, were used for plotting. The four novel susceptibility loci for ALL indicated by our study are highlighted by arrows. The plot was created with the software environment R version 2.11.1 (3).

Figure S3. Signal intensity/cluster plots.
Scatter plots of normalized summary probe intensities for the 100 SNPs selected for verification. Each point represents one individual and is colored according to the genotype assigned by the calling algorithm (blue or red: homozygous for one of the two alleles; green: heterozygous; black circle: ‘null’ or missing call). The aim of examining a cluster plot is twofold: 1) to determine whether a given SNP has been genotyped well, in particular whether clearly distinct clusters corresponding to the three genotypes can be identified on the plot, and 2) to determine whether the calling algorithm has called the clusters correctly. If both requirements are fulfilled, the genotype counts can usually be assumed to be sufficiently accurate. If not, any observed disease association of such a SNP may be due to incorrect genotype counts.

References
1. Dupont WD, Plummer WD. PS power and sample size program available for free on the Internet. Controlled Clin Trials. 1997;18:274.
2. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007 Sep;81(3):559-75.
3. R Development Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. 2007.