
SUPPLEMENTARY INFORMATION
Identification of Germline Susceptibility Loci in ETV6-RUNX1-Rearranged
Childhood Acute Lymphoblastic Leukemia
Short title: Identification of risk loci in ALL subtype
Eva Ellinghaus MSc 1,*, Prof. Martin Stanulla MD 2,*,‡, Gesa Richter PhD 1, David Ellinghaus
MSc 1, Geertruy te Kronnie PhD 3, Gunnar Cario MD 2, Giovanni Cazzaniga PhD 4, Martin
Horstmann MD 5, Prof. Renate Panzer Grümayer MD 6, Hélène Cavé MD 7, Prof. Jan Trka
MD PhD 8, Ondrej Cinek PhD 9, Andrea Teigler-Schlegel PhD 10, Abdou ElSharawy PhD 1,
Robert Häsler PhD 1, Prof. Almut Nebel PhD 1, Barbara Meissner MD 2, Thies Bartram MSc 2,
Francesco Lescai PhD 11, Prof. Claudio Franceschi MD 12, Marco Giordan PhD 3, Prof. Peter
Nürnberg PhD 13, Birger Heinzow MD 14, Martin Zimmermann PhD 15, Prof. Stefan
Schreiber MD 1,16,17,*, Prof. Martin Schrappe MD 2,*, Prof. Andre Franke PhD 1,*,‡
1 Institute of Clinical Molecular Biology, Christian-Albrechts-University Kiel, Kiel,
Germany
2 Department of Pediatrics, University Hospital Schleswig-Holstein, Campus Kiel, Kiel,
Germany, on behalf of the German Berlin-Frankfurt-Münster Study Group for Treatment
of Childhood Acute Lymphoblastic Leukemia
3 Department of Pediatrics, Laboratory of Pediatric Hematology/Oncology, University of
Padua, Padua, Italy
4 M. Tettamanti Research Center, Children’s Hospital, University of Milan-Bicocca,
Monza, Italy
5 Clinic of Pediatric Hematology and Oncology, University Medical Center, and Research
Center Children’s Cancer Center, Hamburg, Germany
6 St. Anna Children’s Hospital and Children’s Cancer Research Institute, Vienna, Austria
7 Department of Genetics, Hôpital Robert Debré, Paris, France
8 Department of Pediatric Hematology/Oncology, Second Faculty of Medicine, Charles
University Prague, Prague, Czech Republic
9 Department of Pediatrics, Second Faculty of Medicine, Charles University Prague,
Prague, Czech Republic
10 Oncogenetic Laboratory, Department of Pediatric Hematology and Oncology,
Justus-Liebig-University, Giessen, Germany
11 Division of Research Strategy, University College London, London, United Kingdom
12 Department of Experimental Pathology, University of Bologna, Bologna, Italy
13 Cologne Center for Genomics, University of Cologne, Cologne, Germany
14 State Social Services Agency Schleswig-Holstein, Kiel, Germany and University of
Notre Dame, Sydney Medical School, Sydney, Australia
15 Pediatric Hematology and Oncology, Hannover Medical School, Hannover, Germany
16 Department of General Internal Medicine, University Hospital Schleswig-Holstein,
Christian-Albrechts-University Kiel, Kiel, Germany
17 Popgen Biobank, University Hospital Schleswig-Holstein, Christian-Albrechts-University
Kiel, Kiel, Germany
* These authors contributed equally to this work
‡ To whom correspondence should be addressed:
Prof. Dr. rer. nat. Andre Franke (a.franke@mucosa.de)
Institute of Clinical Molecular Biology
Christian-Albrechts-University Kiel
Schittenhelmstr. 12
D-24105 Kiel, Germany
Tel.: +49-431-597-4138
Fax.: +49-431-597-2196
Prof. Dr. med. Martin Stanulla, M.Sc. (martin.stanulla@uk-sh.de)
Department of Pediatrics
University Hospital Schleswig-Holstein
Campus Kiel
Arnold-Heller-Str. 3
D-24105 Kiel, Germany
Tel.: +49-431-597-1628
Fax.: +49-431-597-3966
Figure S1. Quality control of genome-wide association data.
(A) Statistical power to detect a given allelic disease association (for carriership of the rarer
SNP allele) in screening panel A (419 cases and 474 controls) was calculated with PS Power
and Sample Size v3.012 (1). Calculations were performed for different allele frequencies
(denoted p0). The power is given as a function of the odds ratio and the red dotted line shows
the threshold of 80% power. (B) To display whether the study generated more significant
results than expected by chance, the quantile-quantile (Q-Q) plot of the association test
statistic was calculated for all SNPs that passed quality control (n=355,750). The genomic
inflation factor, based on the median chi-squared (λGC=1.14), indicated no or minimal
undetected population stratification or cryptic relatedness. (C) The de Finetti diagram shows
genotype distributions of all quality-controlled SNPs in the case-control population. Any
point within the de Finetti triangle corresponds to a specific combination of the three
genotype frequencies p11, p12 and p22 in relation to each other: The curved red line is referred
to as the Hardy-Weinberg parabola and depicts the genotype distributions strictly fulfilling
Hardy-Weinberg equilibrium. The black band of genotype distributions represents an area
where deviation from Hardy-Weinberg is not too strong. (D) To detect "outliers", pair-wise
percentage identity-by-state (IBS) values were computed with PLINK (2). The distribution of
the IBS values for each individual was compared with the combined IBS distribution of the
entire population. We detected and removed 22 individuals that were less related to the entire
population than expected. For these individuals >60% of the IBS values were smaller than the
median minus three times the interquartile range (3×IQR) of the population distribution. No
cryptically related individuals with at least one observed IBS value above the median plus
3×IQR were detected. (E) The multidimensional scaling (MDS) plot shows genuine
European ancestry for the cleaned GWAS panel A, which was plotted together with the three
distinct HapMap sample populations (see box with legend in plot) for the first two principal
components. After exclusion of 22 "outliers" (see D), 59 samples showed evidence of
non-European ancestry and were removed as well.
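Two of the quality-control quantities described in (B) and (D) can be illustrated with a minimal Python sketch. It is not the authors' pipeline (the legend states that IBS values were computed with PLINK). The function names, the input layout (a dictionary mapping each individual to its list of pairwise IBS values), pooling all values to form the population distribution, and the quartile convention used for the IQR are assumptions made here for illustration only.

```python
from statistics import NormalDist, median, quantiles

# Median of a chi-squared distribution with 1 degree of freedom (about 0.455),
# computed as the squared 75th-percentile z-score of the standard normal.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(chi2_stats):
    """Return lambda_GC: median observed statistic divided by the expected median."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def flag_ibs_outliers(ibs, frac_threshold=0.6, k=3.0):
    """Flag individuals that are less related to the population than expected.

    `ibs` (hypothetical layout) maps each individual to its pairwise IBS values.
    An individual is flagged when more than `frac_threshold` of its values lie
    below (median - k * IQR) of the pooled distribution of all values.
    """
    pooled = [v for values in ibs.values() for v in values]
    q1, _, q3 = quantiles(pooled, n=4)  # quartile convention is an assumption
    cutoff = median(pooled) - k * (q3 - q1)
    flagged = []
    for individual, values in ibs.items():
        below = sum(v < cutoff for v in values)
        if below / len(values) > frac_threshold:
            flagged.append(individual)
    return flagged
```

With the defaults, `flag_ibs_outliers` applies the ">60% of values below the median minus 3×IQR" rule from (D); a value of `genomic_inflation` close to 1 corresponds to little inflation of the test statistics.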
Figure S2. Results of genome-wide association analysis.
For each SNP, the negative decadic logarithm (−log10) of the P-value from the allelic
test of the genome-wide association study is shown, ordered by chromosome. All
markers that passed quality control criteria before clumping and visual inspection of the
cluster plots were used for plotting. The four novel susceptibility loci for ALL indicated by
our study are highlighted by arrows. The plot was created with the software
environment R version 2.11.1 (3).
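The y-axis transformation used in such a plot can be sketched as follows. The original figure was produced in R; this Python snippet only illustrates the transformation, and the function name is hypothetical.

```python
import math


def neg_log10_p(p_values):
    """Return -log10(p) for each P-value; smaller P-values give larger values."""
    result = []
    for p in p_values:
        if not 0.0 < p <= 1.0:
            # log10 is undefined at 0, and P-values cannot exceed 1.
            raise ValueError(f"P-value out of range: {p}")
        result.append(-math.log10(p))
    return result
```

For example, a P-value of 1e-8 maps to a plotted height of about 8.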
Figure S3. Signal intensity/cluster plots.
Scatter plots of normalized summary probe intensities for the 100 SNPs selected for
verification. Each point represents one individual and is colored according to the genotype
assignment by the calling algorithm (blue or red: homozygous for one of the two alleles;
green: heterozygous; black circle: ‘null’ or missing call). The aim of examining a cluster plot
is twofold: 1) to determine whether a given SNP has been genotyped well, in particular
whether clearly distinct clusters corresponding to the three genotypes can be identified on
the plot, and 2) to determine whether the calling algorithm has called the clusters correctly.
If both of these requirements are fulfilled, genotype counts can usually be assumed to be
sufficiently accurate. If not, any observed disease association of such a SNP may be due to
incorrect genotype counts.
References
1. Dupont WD, Plummer WD. PS power and sample size program available for free on
the Internet. Controlled Clin Trials. 1997;18:274.
2. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK:
a tool set for whole-genome association and population-based linkage analyses. Am J
Hum Genet. 2007 Sep;81(3):559-75.
3. R Development Core Team. R: A language and environment for statistical computing.
R Foundation for Statistical Computing, Vienna, Austria. 2007.