Supplemental Methods
Genotyping
DNA was extracted from whole blood using standard protocols as described previously 1.
All DNA samples were tested for their quality by spectrophotometry (NanoDrop
1000, Thermo Scientific, USA) and agarose gel electrophoresis. DNA concentrations
were determined using Quant-iT PicoGreen dsDNA reagent (Invitrogen, USA) on a
Genios fluorometer (Tecan, Switzerland) or by spectrophotometry. In stage 1, whole-genome
genotyping of cases was performed using the Human SNP Array 5.0
(Affymetrix, USA). Affymetrix CEL-files from whole-genome genotyping of the
control samples were retrieved from KORA (chip type: GeneChip Human Mapping
500K) and PopGen (Human SNP Array 5.0) consortia. Call rates and genotyping
calls for each array were analyzed by Affymetrix Genotyping Console software using
BRLMM-P algorithms except for the KORA samples, which were called by the
BRLMM algorithm optimized for Human Mapping 500K chips. SNPs exhibiting
minor allele frequencies < 3%, call rates ≤ 95%, or deviations from Hardy-Weinberg
equilibrium considering a significance level of 0.05 for controls and 0.001 for cases
failed quality control criteria and were excluded from further analyses.
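The stage 1 exclusion criteria above can be illustrated with a minimal Python sketch. It is not the software used in the study: the function names are hypothetical, the minor allele frequency and call rate are computed on pooled cases and controls (an assumption, since the text does not say which samples were used), and a 1-df chi-square goodness-of-fit test stands in for whichever HWE test was applied at this step.

```python
import math

def maf(n_aa, n_ab, n_bb):
    """Minor allele frequency from the three genotype counts."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    freq_a = (2 * n_aa + n_ab) / total_alleles
    return min(freq_a, 1.0 - freq_a)

def hwe_chi2_p(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test for deviation from HWE.

    Assumption: this approximation replaces the test actually used.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: no deviation can be tested
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))

def passes_qc(case_counts, control_counts, n_missing,
              min_maf=0.03, min_call_rate=0.95,
              hwe_alpha_controls=0.05, hwe_alpha_cases=0.001):
    """Apply the thresholds quoted in the text to one SNP."""
    pooled = tuple(a + b for a, b in zip(case_counts, control_counts))
    n_called = sum(pooled)
    call_rate = n_called / (n_called + n_missing)
    if maf(*pooled) < min_maf:          # MAF < 3% fails
        return False
    if call_rate <= min_call_rate:      # call rate <= 95% fails
        return False
    if hwe_chi2_p(*control_counts) < hwe_alpha_controls:
        return False
    if hwe_chi2_p(*case_counts) < hwe_alpha_cases:
        return False
    return True
```

Genotype counts are passed as (homozygous A, heterozygous, homozygous B) tuples, so `passes_qc((25, 50, 25), (25, 50, 25), 0)` evaluates a SNP with no missing calls.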
Direct genotyping of selected SNPs in cases and controls of stages 2 and 3
was carried out with TaqMan fluorogenic 5’ nuclease assays (Applied Biosystems,
USA) on an ABI 7900 sequence detector (Applied Biosystems, USA) using 2 ng of
genomic DNA per SNP. Genotypes were generated by automatic calling using SDS
software version 2.3 (Applied Biosystems, USA). The missing-genotype rate of cases
was < 4% and Hardy-Weinberg equilibrium (HWE) was met for all SNPs. Among
French control samples, HWE was not met for rs9262636 (p=0.03; for QC statistics
see supplemental table 3). HWE was met for all SNPs of the SHIP cohort after
additional genotyping of 2 SNPs (rs4713429 and rs9262635) in 4,081 controls.
For eQTL analysis, a subset of the SHIP-TREND cohort (n = 986) was
genotyped using the Illumina Human Omni 2.5 array. Processing of genomic DNA
and array hybridization were performed in accordance with the manufacturer’s standard
recommendations at the Helmholtz Zentrum München (Germany). The genetic data
analysis workflow was created using the software InforSense. Genetic data were
stored using the database Caché (InterSystems). Genotypes were determined using the
GenomeStudio Genotyping Module v1.0 (GenCall algorithm). All 986 arrays
produced genotyping rates of at least 94 %. The overall genotyping efficiency was
99.67 %. Imputation of genotypes in SHIP was performed with the software
IMPUTEv2 based on HapMap II (CEU v22, Build 36). The imputation quality of
rs9262636 was 0.98, where 1 represents complete and 0 no imputation accuracy.
Total RNA was prepared from peripheral whole-blood samples collected in PAXgene
tubes (BD) using a QIAcube device in combination with the Blood miRNA Kit (both
from QIAGEN, Hilden, Germany) according to the manufacturer’s protocols. Subsequent
RNA sample processing and hybridization with the Illumina HumanHT-12 v3
Expression BeadChip were performed as described by the manufacturer (Illumina) at
the Helmholtz Zentrum München.
Statistical analysis
To account for potential confounders, the analyses were adjusted for sex and age of the
individuals by logistic regression. The genomic inflation factor for the screening stage
was calculated as the median of the chi-square test statistics of all SNPs divided by the
median of a chi-square distribution with 1 df. PLINK was used to correct p-values of the
association analyses for genomic control (GC) as well as for multiple testing using the
Bonferroni method (cutoff p = 1.7 × 10^-7, which is the significance threshold of 0.05
corrected for multiple testing of 292,367 tested SNPs). Hardy-Weinberg equilibrium
in cases and controls was tested using an exact test 2 as implemented in PLINK. SNPs
from the screening phase were candidates for replication if they passed genotyping
quality control criteria and if an association with DCM was observed at a
significance level of 10^-6 in an additive model either with or without adjustment for
age and sex. Out of 16 SNPs in the dataset meeting these criteria, 3 SNPs
(rs17098042, rs7864098, rs697055) were discarded due to noticeable imbalances in
allele frequencies between controls of different origin. Another SNP (rs7192626) was
not considered for replication because no appropriate TaqMan assay was available.
Since no TaqMan assay was available for SNP rs2523883 either, rs2517471, which is in
close LD, was genotyped instead. The final set of selected SNPs for replication is shown in table
2. Since replication samples of stage 2 originate from Germany and Italy, association
analysis of this stage was additionally adjusted for place of origin.
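The genomic inflation factor and the Bonferroni cutoff described in this paragraph reduce to simple arithmetic. The following Python sketch is only an illustration under stated assumptions: the function names are hypothetical, the input is assumed to be a list of 1-df chi-square association statistics, and the study itself used PLINK for these corrections.

```python
import statistics

# Median of a chi-square distribution with 1 df (approximately 0.4549).
CHI2_1DF_MEDIAN = 0.454936

def genomic_inflation(chi2_stats):
    """Median observed test statistic divided by the expected median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN

def bonferroni_cutoff(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests

# 0.05 corrected for the 292,367 SNPs tested in the screening stage;
# the result is roughly 1.7e-07, matching the cutoff quoted in the text.
cutoff = bonferroni_cutoff(0.05, 292367)
```

An inflation factor close to 1 indicates that the bulk of the test statistics follows the null distribution.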
Results from the screening stage and replication stage 2 were combined using
Fisher’s combined probability test. Association of haplotypes was calculated by using
the case/control association test implemented in Haploview version 4.2 3. The
attributable risk (AR) of haplotypes is estimated as
with x1 (y1) indicating haplotype counts in cases (controls) and x0 (y0) individuals
showing different haplotypes in cases (controls) 4. In order to investigate variants that
were not directly genotyped in our study, we imputed genotypes based on the CEU
population in HapMap (Phase I+II, release 24) and on the directly genotyped SNPs
around the strongest signal of association. Imputation relied on inference of
haplotypes by means of an expectation-maximization (EM) algorithm with partially
missing data 5. Briefly, in the E-step, frequencies of partially missing genotypes were
updated looping through all possible genotypes. In the M-step, all existing haplotypes
that had alleles identical to the non-missing alleles of this haplotype were updated.
The accuracy of imputation using HapMap data was evaluated by cross-validation and
summarized as the negative logarithm of the p-value for the null hypothesis that Cohen’s
kappa between true and imputed genotypes equals zero. Originally, a ±500 kb region
centered on the strongest association signal was investigated. The low recombination
rate found to the right of this region motivated an extension by 900 kb (supplementary
figure 1B). Selection of variants for subsequent logistic regression relied on the visual
inspection of recombination rates and on the estimated imputation accuracies. In the
analyses of association with DCM, uncertainty in the imputed genotypes was taken
into account by bootstrapping (100 replicates) from the multinomial
distribution of the expected genotypes given the observed genotypes in HapMap. Probability
values referred to an additive logistic regression model including age and sex.
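The paragraph above opens with the combination of screening and stage 2 results by Fisher’s combined probability test. A minimal Python sketch of that test follows. The function name is hypothetical, and the p-values are assumed to be independent and strictly greater than zero. The statistic -2 Σ ln(p_i) is referred to a chi-square distribution with 2k degrees of freedom, whose upper tail has a closed form for even degrees of freedom.

```python
import math

def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    Assumes every p-value lies in (0, 1]; a value of 0 would make
    the logarithm undefined.
    """
    k = len(p_values)
    stat = -2.0 * sum(math.log(p) for p in p_values)
    half = stat / 2.0
    # Upper tail of chi-square with 2k df:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total

# Two stages, as in the combination of screening and stage 2 results.
combined = fisher_combined_p([0.05, 0.05])
```

With a single p-value the function returns that value unchanged, which is a quick sanity check on the tail formula.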
eQTL association analyses were carried out under an additive model using
linear regression with the normalized log2 expression values as the dependent
variable. Analyses were adjusted for sex, age and the first 50 principal components
obtained from PCA over the expression values to reduce variance related to technical
variability and batch effects during the processing of expression arrays. Analyses were
performed and plots were generated in R.
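The additive eQTL model can be illustrated with a deliberately simplified Python sketch. It is an assumption-laden toy rather than the R analysis used in the study: the function name is hypothetical, genotypes are coded as allele dosages (0, 1 or 2), and the adjustment for sex, age and the 50 principal components is omitted, so only the unadjusted per-allele effect on log2 expression is estimated.

```python
def additive_eqtl_fit(dosages, log2_expr):
    """Ordinary least squares fit of expression on allele dosage.

    Returns (slope, intercept); the slope is the estimated change in
    log2 expression per additional copy of the counted allele.
    Covariates are NOT included in this simplified sketch.
    """
    n = len(dosages)
    mean_g = sum(dosages) / n
    mean_e = sum(log2_expr) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    sxy = sum((g - mean_g) * (e - mean_e)
              for g, e in zip(dosages, log2_expr))
    slope = sxy / sxx
    intercept = mean_e - slope * mean_g
    return slope, intercept

# Toy data: expression rises by 0.5 log2 units per allele copy.
slope, intercept = additive_eqtl_fit(
    [0, 1, 2, 0, 1, 2],
    [5.0, 5.5, 6.0, 5.0, 5.5, 6.0],
)
```

In a full analysis the covariates would enter the same regression as additional predictors.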
References
1. Meder B, Haas J, Keller A, Heid C, Just S, Borries A, Boisguerin V,
Scharfenberger-Schmeer M, Stahler P, Beier M, Weichenhan D, Strom TM, Pfeufer
A, Korn B, Katus HA, Rottbauer W. Targeted Next-Generation Sequencing for the
Molecular Genetic Diagnostics of Cardiomyopathies. Circ Cardiovasc Genet 2011.
2. Wigginton JE, Cutler DJ, Abecasis GR. A note on exact tests of
Hardy-Weinberg equilibrium. Am J Hum Genet 2005;76(5):887-93.
3. Barrett JC, Fry B, Maller J, Daly MJ. Haploview: analysis and visualization
of LD and haplotype maps. Bioinformatics 2005;21(2):263-5.
4. Lui KJ. Statistical Estimation of Epidemiological Risk: John Wiley & Sons,
New York; 2004.
5. Marchini J, Howie B, Myers S, McVean G, Donnelly P. A new multipoint
method for genome-wide association studies by imputation of genotypes. Nat
Genet 2007;39(7):906-13.