ESM 1 - Nederlands Tweelingen Register

Supplementary Material
QIMR Genotyping and Quality Control
Individuals were genotyped as part of several different genotyping projects undertaken at
QIMR. Depending on the project in which they were included, subjects were genotyped
on the Illumina 317K, 370K or 610K platforms. Strict quality control (QC) procedures were
applied to the genotype data to reduce the potential for false positives during the analysis.
All samples were imputed to the International HapMap 2 project reference panel using the
software MaCH (Li et al., 2010), based on 274,604 SNPs common to all genotyping platforms.
Further QC steps were carried out to remove poorly imputed SNPs. Full details of the
genotyping, imputation and QC procedures are given in Medland et al. (2009). After QC and
removal of non-genotyped individuals, 564 cases and 1,571 controls remained, of whom 486
cases and 1,056 controls were unrelated and were used in the primary analysis.
QC steps included testing for deviation from Hardy-Weinberg equilibrium (HWE; p < 10⁻⁶),
checking for Mendelian errors, and removing individuals and SNPs with >5% missing
genotypes. SNPs with minor allele frequency (MAF) < 0.01 or genotype quality score < 0.7
were removed. Individuals showing evidence of non-European ancestry based on the
genotype data were also removed.
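The three SNP-level filters above (missingness, MAF and Hardy-Weinberg deviation) can be sketched as below. This is an illustration under stated assumptions, not the QIMR pipeline described in Medland et al. (2009). It assumes genotypes are coded as 0/1/2 copies of one allele, with `None` for a missing call. The function names are hypothetical. A 1-df chi-square HWE test stands in for the exact test that genotype QC tools commonly use. The Mendelian-error, genotype-quality and ancestry filters are not shown.

```python
import math
from typing import Optional, Sequence


def hwe_chisq_p(n_hom_a: int, n_het: int, n_hom_b: int) -> float:
    """1-df chi-square test of Hardy-Weinberg equilibrium from genotype counts."""
    n = n_hom_a + n_het + n_hom_b
    p = (2 * n_hom_a + n_het) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_a, n_het, n_hom_b)
    chisq = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chisq / 2.0))


def passes_snp_qc(
    genotypes: Sequence[Optional[int]],
    max_missing: float = 0.05,
    min_maf: float = 0.01,
    min_hwe_p: float = 1e-6,
) -> bool:
    """Apply missingness, MAF and HWE filters to one SNP (0/1/2 coding, None = missing)."""
    called = [g for g in genotypes if g is not None]
    missing_rate = 1.0 - len(called) / len(genotypes)
    if not called or missing_rate > max_missing:
        return False
    n0, n1, n2 = (called.count(k) for k in (0, 1, 2))
    freq = (n1 + 2 * n2) / (2 * len(called))  # frequency of the counted allele
    if min(freq, 1.0 - freq) < min_maf:
        return False
    return hwe_chisq_p(n2, n1, n0) >= min_hwe_p
```

A SNP passes only if it survives all three thresholds, mirroring the removal criteria listed in the text.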
NESDA/NTR Genotyping and QC
Participants were genotyped on the Affymetrix Perlegen 5.0 platform, and 1,235,109 SNPs
from the HapMap3 CEU+TSI populations were imputed using Beagle 3.0.4.
STR Genotyping and Quality Control
Quality control filtering included removal of SNPs with more than 3% missing data, minor
allele frequency < 1% or deviation from Hardy-Weinberg equilibrium (p ≤ 1 × 10⁻⁷). It also
removed individuals with more than 3% missing genotypes, a failed sex check, heterozygosity
more than 5 SD from the sample mean, or unresolved cryptic relatedness. Imputation to
HapMap 2 (NCBI build 36) was performed using IMPUTE2.
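A simple version of the heterozygosity filter is sketched below. The source does not say how heterozygosity was measured. This sketch therefore assumes the per-individual fraction of heterozygous calls, with 0/1/2 coding and `None` for missing calls. The function names are hypothetical. Many pipelines instead use an inbreeding coefficient based on observed versus expected homozygosity.

```python
from statistics import mean, stdev
from typing import Dict, List, Optional


def heterozygosity_rate(genotypes: List[Optional[int]]) -> float:
    """Fraction of called genotypes that are heterozygous (coded 1); None = missing."""
    called = [g for g in genotypes if g is not None]
    return sum(1 for g in called if g == 1) / len(called)


def heterozygosity_outliers(
    samples: Dict[str, List[Optional[int]]], n_sd: float = 5.0
) -> List[str]:
    """IDs of individuals whose heterozygosity rate is more than n_sd SD from the mean."""
    rates = {sample_id: heterozygosity_rate(g) for sample_id, g in samples.items()}
    values = list(rates.values())
    mu = mean(values)
    sd = stdev(values)
    return [sample_id for sample_id, r in rates.items() if abs(r - mu) > n_sd * sd]
```

With a 5 SD cut-off, a single outlier can only be flagged when the sample contains a few dozen individuals or more. With fewer individuals, the outlier inflates the SD too much for any point to lie that far from the mean.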
ALSPAC
ALSPAC genotyping was carried out at the Centre National de Génotypage (CNG) using the
Illumina Human660W-quad array. Quality control measures included the removal of SNPs
with >5% missing data, MAF < 1% or deviation from HWE (p < 1.0 × 10⁻⁶). Individuals
were excluded if they had >5% missing data, evidence of non-European ancestry
from principal component analysis of the GWAS data, or indeterminate X-chromosome
heterozygosity. A total of 8,340 participants and 526,688 SNPs passed these quality control
filters. Autosomal SNPs were imputed to the HapMap CEU population (release 22) using
MaCH (v1.0.16). X-chromosome SNPs were imputed to HapMap 3 release 2 (NCBI build 36,
Feb 2009) using Minimac (v4.4.3). Prior to analysis, quality control was applied to the
imputed genotypes: SNPs with MAF < 1% or imputation R² ≤ 0.3 were excluded.
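Imputation R² summarises how much of the true genotype variance is captured by the imputed dosages. This sketch uses a common approximation, assumed here rather than taken from the ALSPAC pipeline. It divides the observed variance of the dosages by the variance 2p(1 - p) expected for true genotypes under Hardy-Weinberg equilibrium, where p is the allele frequency implied by the mean dosage. The sketch then applies the two post-imputation filters from the text. The function names are hypothetical.

```python
from statistics import fmean, pvariance
from typing import Sequence


def dosage_rsq(dosages: Sequence[float]) -> float:
    """Estimated imputation R^2: observed dosage variance / variance expected under HWE."""
    p = fmean(dosages) / 2.0              # allele frequency implied by the mean dosage
    expected_var = 2.0 * p * (1.0 - p)    # variance of true 0/1/2 genotypes under HWE
    if expected_var == 0.0:
        return 0.0                        # monomorphic: nothing to measure
    return pvariance(dosages) / expected_var


def keep_imputed_snp(
    dosages: Sequence[float], min_maf: float = 0.01, max_excluded_rsq: float = 0.3
) -> bool:
    """Keep SNPs with MAF >= 1% and estimated R^2 > 0.3 (R^2 <= 0.3 is dropped)."""
    p = fmean(dosages) / 2.0
    return min(p, 1.0 - p) >= min_maf and dosage_rsq(dosages) > max_excluded_rsq
```

Hard calls in HWE give an estimated R² of 1. Dosages that all sit at the population mean carry no information and give 0.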
Association Analysis
Association analysis was initially performed in the Australian discovery sample, followed
by an attempt to replicate the most significant SNPs in the Australian replication sample.
Data from the Dutch, Swedish and UK samples became available after the initial replication
attempt. A meta-analysis of the Australian discovery, NESDA/NTR, Swedish and UK samples
was then carried out.
Statistical Power
Our study was underpowered to detect common risk variants with effect sizes typical of
those found for other psychiatric disorders (0.3% power to detect a risk allele of frequency
0.2-0.8 and a genotype relative risk of 1.15).
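The sketch below shows how power figures of this kind arise. It approximates the power of a two-sided 1-df allelic test under a multiplicative model. In that model, the case allele frequency is pγ / (pγ + 1 - p) for allele frequency p and per-allele relative risk γ. The sketch assumes controls carry the population allele frequency (a rare-disease approximation) and uses a normal approximation. It is therefore not the calculation behind the figures quoted above and will not reproduce them exactly. The function name is hypothetical.

```python
from statistics import NormalDist


def allelic_test_power(
    p: float, grr: float, n_cases: int, n_controls: int, alpha: float = 5e-8
) -> float:
    """Approximate power of a two-sided 1-df allelic test under a multiplicative model."""
    # Risk-allele frequency in cases when each copy multiplies risk by `grr`.
    p_case = p * grr / (p * grr + 1.0 - p)
    # Assumption: controls carry the population frequency (rare-disease approximation).
    p_ctrl = p
    n1, n0 = 2 * n_cases, 2 * n_controls  # number of alleles in each group
    p_bar = (n1 * p_case + n0 * p_ctrl) / (n1 + n0)
    se = (p_bar * (1.0 - p_bar) * (1.0 / n1 + 1.0 / n0)) ** 0.5
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z = abs(p_case - p_ctrl) / se
    # Probability that the test statistic falls beyond either critical value.
    return NormalDist().cdf(z - z_crit) + NormalDist().cdf(-z - z_crit)
```

The call `allelic_test_power(0.2, 1.15, 486, 1056)` uses the unrelated Australian discovery sample sizes. It returns a power far below 1%, which agrees in direction with the statement above.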
Logistic regression of PPD on SNP imputation dosage scores was conducted in unrelated
individuals from the Australian discovery sample using R. Association testing of genotyped
SNPs in the Australian replication sample was performed using PLINK. Association testing
between SNP dosage scores and PPD in both the GAIN and Swedish replication samples
was performed in PLINK (Purcell et al., 2007), with ancestry principal components included as
covariates in all analyses. Meta-analysis of the Australian discovery sample and the Dutch
and Swedish replication samples (total number of cases = 1,420, total number of controls =
9,473) was performed using PLINK (Purcell et al., 2007) for all SNPs analysed in at least two
studies (n = 2,473,712). A total of 967,839 SNPs were analysed in all four studies. The
estimate from each study was weighted by the inverse of the squared standard error of the
log odds ratio (inverse-variance weighting). Results from both fixed-effects and
random-effects meta-analyses were computed.
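The fixed- and random-effects pooling described above can be sketched as follows. The fixed-effects estimate weights each study's log odds ratio by 1/SE². For the random-effects model, this sketch uses the DerSimonian-Laird estimate of the between-study variance τ². That estimator is one standard choice and is assumed here; it is not a statement about PLINK's internals. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Estimate = Tuple[float, float, float]  # (pooled log OR, standard error, two-sided p)


def _pool(betas: Sequence[float], weights: Sequence[float]) -> Estimate:
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))  # two-sided normal p-value
    return beta, se, p


def meta_analyse(log_ors: Sequence[float], ses: Sequence[float]) -> Tuple[Estimate, Estimate]:
    """Fixed-effects (inverse-variance) and DerSimonian-Laird random-effects estimates."""
    w = [1.0 / s ** 2 for s in ses]  # inverse-variance weights
    fixed_fx = _pool(log_ors, w)

    # Cochran's Q measures heterogeneity around the fixed-effects estimate.
    q = sum(wi * (b - fixed_fx[0]) ** 2 for wi, b in zip(w, log_ors))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    df = len(log_ors) - 1
    # Between-study variance tau^2 (DerSimonian-Laird), truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    random_fx = _pool(log_ors, [1.0 / (s ** 2 + tau2) for s in ses])
    return fixed_fx, random_fx
```

The inputs are per-study log odds ratios and their standard errors for a single SNP. When the studies agree, τ² is truncated at zero and the two models coincide. Heterogeneous studies inflate the random-effects standard error.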
Our two-stage design in the Australian samples has 80% power to detect a risk allele of
frequency 0.1-0.8 with a genotype relative risk of 1.5 at the genome-wide significance level
(p < 5 × 10⁻⁸) (Skol et al., 2006). The meta-analysis of all four samples provides >90% power
to detect a risk allele of frequency 0.2-0.8 with a genotype relative risk of 1.35 for a SNP that
is included in all samples. Secondary analyses, including sign tests across the four cohorts
and gene-based analyses using VEGAS (Liu et al., 2010), were also performed.
We performed a sign test to assess whether the most strongly associated SNPs in the
discovery sample show the same direction of effect in the replication samples. A positive
result would support the hypothesis that many of the most significant SNPs are truly
associated with PPD, but that the sample size is too small to detect them at the genome-wide
significance level. We used a binomial test to evaluate the probability of observing the
number of SNPs with the same sign in the replication samples.
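The binomial sign test takes only a few lines. This sketch assumes a one-sided test for an excess of concordant signs. Under the null, each SNP is assumed to match independently with probability 0.5. The function name is hypothetical.

```python
from math import comb


def sign_test_p(n_same: int, n_total: int, p_null: float = 0.5) -> float:
    """One-sided binomial p-value: P(X >= n_same) for X ~ Binomial(n_total, p_null)."""
    return sum(
        comb(n_total, k) * p_null ** k * (1.0 - p_null) ** (n_total - k)
        for k in range(n_same, n_total + 1)
    )
```

For example, `sign_test_p(9, 20)` is about 0.75. Nine concordant SNPs out of 20 therefore give no evidence of excess concordance.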
The gene-based test VEGAS (Liu et al., 2010) was applied in the Australian discovery sample
to identify genes associated with PPD. Each gene was defined as the region from 50 kb
upstream of its start site to 50 kb downstream of its stop site. The test combines the test
statistics for all SNPs in a gene into a single gene-based test statistic, accounting for the
correlation between SNPs due to linkage disequilibrium. In total, 17,787 genes were tested,
and genes with p < 2.8 × 10⁻⁶ (0.05/17,787) were considered significant.
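The sketch below illustrates the idea behind VEGAS. The gene statistic is the sum of the 1-df chi-square statistics (z²) of its SNPs. Its null distribution is obtained by simulating correlated z-scores from a multivariate normal distribution whose correlation matrix is the SNPs' LD matrix. The sketch is simplified: it assumes a fixed number of simulations and an LD matrix supplied by the caller. VEGAS itself obtains LD from a reference panel and adapts the number of simulations. The function names are hypothetical, and `GENE_ALPHA` is the Bonferroni threshold from the text.

```python
import numpy as np

GENE_ALPHA = 0.05 / 17787  # Bonferroni threshold from the text (about 2.8e-6)


def snps_in_gene(positions: np.ndarray, start: int, stop: int, flank: int = 50_000) -> np.ndarray:
    """Boolean mask of SNP positions within the gene extended by `flank` bp on each side."""
    return (positions >= start - flank) & (positions <= stop + flank)


def vegas_like_p(snp_z: np.ndarray, ld_r: np.ndarray, n_sims: int = 100_000, seed: int = 1) -> float:
    """Gene p-value for the sum of SNP chi-squares, simulated under the SNPs' LD structure."""
    observed = float(np.sum(snp_z ** 2))  # sum of 1-df chi-square statistics
    rng = np.random.default_rng(seed)
    # Null z-scores drawn from a multivariate normal whose correlation matrix is the LD matrix.
    null_z = rng.multivariate_normal(np.zeros(len(snp_z)), ld_r, size=n_sims)
    null_stats = np.sum(null_z ** 2, axis=1)
    # Empirical p-value with the conventional +1 correction.
    return float((np.sum(null_stats >= observed) + 1) / (n_sims + 1))
```

Correlated SNPs share signal, so simulating under the LD matrix prevents a gene from being credited several times for one association.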
Australian Replication Sample Genotyping
Sequences in regions containing SNPs selected for the replication study were downloaded
from the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/) and
cross-checked against Sequenom databases (https://mysequenom.com) before assay
design. Multiplexed assays were designed for 62 SNPs using the Sequenom MassARRAY
Assay Design software (version 3.1). SNPs were typed using iPLEX™ Gold chemistry and
analysed using a Sequenom MassARRAY Compact Mass Spectrometer (Sequenom Inc, San
Diego, CA, USA). The SAP and iPLEX reactions were performed according to the
manufacturer's instructions, while the PCR stage was performed using a 2.5 µL half reaction.
Post-PCR products were spotted on a Sequenom SpectroChip 2, and the data were
processed and analysed using Sequenom MassARRAY TYPER 4.0 software. Two SNPs
produced ambiguous calls and were removed from the analysis. A total of 21 individuals
were removed from the replication sample for missingness > 0.05.
Association analysis results
The association results from the Australian discovery sample with p < 0.001 are listed in
Supplementary Table 2, and the Q-Q plot is shown in Supplementary Figure 1. Twenty
independent SNPs (r² < 0.65) were associated at p < 10⁻⁴. SNP rs9360356 on chromosome 6 is
located in a plausible brain-expressed candidate gene (DeRosse et al., 2008), BAI3
(brain-specific angiogenesis inhibitor 3) (Kee et al., 2004). SNPs representing 19
independent regions associated at p < 10⁻⁴ were genotyped in the Australian replication
sample, which was drawn from the same phenotyped sample. The minimum association p-value
was 0.03, which is not significant given the level of multiple testing (not reported)
(Supplementary Table 3). None of the SNPs were significant in the Dutch, Swedish and UK
replication samples after correcting for multiple testing (minimum uncorrected p = 0.02).
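Independent SNPs at an r² threshold are often selected greedily. SNPs are visited in order of p-value, and a SNP is kept only if its r² with every SNP already kept is below the threshold. The sketch below assumes pairwise r² values are supplied in a dictionary, with missing pairs treated as r² = 0. It also ignores the physical-distance window that clumping tools such as PLINK apply. The names are hypothetical.

```python
from typing import Dict, List, Tuple


def clump(
    pvalues: Dict[str, float],
    r2: Dict[Tuple[str, str], float],
    r2_max: float,
    p_max: float = 1.0,
) -> List[str]:
    """Greedily keep SNPs, most significant first, whose r^2 with all kept SNPs is < r2_max."""

    def ld(a: str, b: str) -> float:
        # Pairs may be stored in either order; unlisted pairs are treated as unlinked.
        return r2.get((a, b), r2.get((b, a), 0.0))

    kept: List[str] = []
    for snp in sorted(pvalues, key=pvalues.get):
        if pvalues[snp] >= p_max:
            break  # remaining SNPs are even less significant
        if all(ld(snp, other) < r2_max for other in kept):
            kept.append(snp)
    return kept
```

With the thresholds used above, a call would look like `clump(pvals, r2, r2_max=0.65, p_max=1e-4)`.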
No genes reached significance in the gene-based test (Supplementary Table 4). However,
the two most strongly associated genes were two closely linked genes encoding
pregnancy-specific beta-1-glycoproteins, PSG8 and PSG3 (p = 0.0001 and p = 0.0002,
respectively). The association signal for both genes arose from an overlapping set of SNPs,
of which rs8112446 was the most strongly associated (p = 0.0003) (Supplementary Figure 3).
After conditioning on rs8112446, no SNPs in either gene were associated at nominal
significance. The human pregnancy-specific glycoproteins (PSGs) are transcribed from a
family of 11 genes (Teglund et al., 1994) located in a 700 kb region of chromosome 19. They
are primarily synthesized by the syncytiotrophoblast during pregnancy (Zhou et al., 1997).
Reduced serum levels of PSGs during the first trimester of pregnancy have been associated
with small-for-gestational-age (SGA) fetuses and spontaneous preterm delivery (Pihl et al.,
2009). Low levels of PSGs later in pregnancy are also associated with the same outcomes
(Gordon et al., 1977; Tamsen et al., 1983; Westergaard et al., 1985). Replication was
attempted for 10 SNPs in the region. These included the most significant SNPs in both genes,
4 nonsynonymous SNPs identified using SNPper (Riva and Kohane, 2002), and 4 SNPs known to
be cis-eQTLs for the PSG3 gene, identified using the seeQTL database (Xia et al., 2012). No
SNPs in the region showed nominal significance in the replication sample. In particular,
rs8112446, the top SNP in the region in the discovery analysis, had p = 0.84 in the
Australian replication sample.
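The source does not say how the conditional analysis on rs8112446 was run. One common approach, sketched below, refits the logistic regression for each SNP with the conditioning SNP's dosage added as a covariate. Ancestry principal components can be added in the same way. The fit uses Newton-Raphson, which for logistic regression is the same iteration as the IRLS used by R's `glm(family = binomial)`. The function name and the coding are assumptions.

```python
import math
from typing import Optional, Tuple

import numpy as np


def logistic_wald(
    y: np.ndarray, snp: np.ndarray, covariates: Optional[np.ndarray] = None, max_iter: int = 50
) -> Tuple[float, float, float]:
    """Fit logit P(y = 1) = b0 + b1 * snp + covariates @ g by Newton-Raphson.

    Returns the SNP coefficient b1, its standard error and a two-sided Wald p-value.
    """
    columns = [np.ones(len(y)), np.asarray(snp, dtype=float)]
    if covariates is not None:
        cov = np.asarray(covariates, dtype=float)
        columns.append(cov[:, None] if cov.ndim == 1 else cov)
    X = np.column_stack(columns)

    def info_and_score(beta: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))           # fitted probabilities
        info = X.T @ (X * (mu * (1.0 - mu))[:, None])     # Fisher information
        return info, X.T @ (y - mu)                       # and score vector

    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        info, score = info_and_score(beta)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    info, _ = info_and_score(beta)  # information at the final estimate
    se = math.sqrt(np.linalg.inv(info)[1, 1])
    z = beta[1] / se
    return float(beta[1]), se, math.erfc(abs(z) / math.sqrt(2.0))
```

A conditional estimate would come from a call such as `logistic_wald(y, dosage_snp, covariates=np.column_stack([dosage_rs8112446, pcs]))`, where `dosage_rs8112446` and `pcs` are hypothetical arrays. If a SNP's signal is entirely due to LD with the conditioning SNP, its coefficient shrinks towards zero.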
A meta-analysis of SNPs present in at least two of the studies (n = 2,473,712) did not
identify any genome-wide significant SNPs. Results for the most significant independent
(r² < 0.5) SNPs in the meta-analysis are shown in Supplementary Table 5. The most
significant SNP was rs6918856 (p = 8.9 × 10⁻⁸), which approached genome-wide significance.
This SNP is located on chromosome 6 in a region with no annotated RefSeq genes. It lies
close to an H3K27Ac site identified by ENCODE that harbors binding sites for the GATA-2,
c-FOS and p300 transcription factors. Furthermore, the SNP is in strong LD with SNPs located
within these transcription factor binding sites.
The Australian, Swedish and UK samples were imputed to HapMap II, so the top
20 independent SNPs in the Australian sample were all found in the Swedish and UK
samples. Only 9 of the 20 SNPs had the same direction of effect in the Swedish sample,
fewer than the 10 expected by chance. Furthermore, only 8 of the top 20 SNPs from the
discovery sample were directly typed or imputed in the GAIN sample. A further 4 of the top
20 SNPs had a proxy SNP (r² > 0.8) in the GAIN sample, giving a total of 12 SNPs that could
be evaluated in the sign test. Of these 12 SNPs, 6 had the same direction of effect as in the
discovery sample. Nineteen of the top 20 SNPs in the Australian sample were typed or
imputed in the UK sample, and only 5 of these 19 had the same direction of effect there.
To further test for evidence of a polygenic architecture for PPD, we meta-analysed the
results from the Australian, Dutch and Swedish cohorts and then extracted the 200 most
significant independent (r² < 0.5) SNPs. We then tested whether these SNPs had the same
direction of effect in the UK sample. Only 94 of the 200 SNPs had the same direction of
effect in the UK sample, which was not significant.
Permutation Analysis in NESDA/NTR
We tested the significance of this result by randomly sampling 208 cases (both PPD and
non-PPD MDD cases) and 761 controls 1,000 times from the entire NESDA/NTR MDD
case/control sample. For each replicate, we performed the profile scoring analysis using all
clumped BPD SNPs. When sampling from the entire NESDA/NTR cohort, only 18 of the 1,000
replicates had a lower p-value or a higher R² than the observed p-value and R² (p = 0.018).
The same sampling approach was then repeated with cases and controls restricted to
females. Restricting sampling to females should oversample PPD cases relative to the
previous analysis. In this analysis, 23 of the 1,000 replicates had a lower p-value or higher
R² than the observed values (p = 0.023).
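The resampling procedure above can be sketched as below. The profile-scoring details are not given here, so the sketch makes two assumptions. First, it uses the squared correlation between a precomputed risk score and 0/1 case status as a stand-in for the profile-score R². Second, it counts replicates whose R² is at least the observed value, assuming that with equal-sized replicates, ranking by R² tracks ranking by p-value. The function names and inputs are hypothetical.

```python
import numpy as np


def score_r2(score: np.ndarray, status: np.ndarray) -> float:
    """Squared correlation between a risk score and 0/1 case status."""
    return float(np.corrcoef(score, status)[0, 1] ** 2)


def resampling_p(
    score: np.ndarray,
    is_case: np.ndarray,
    observed_r2: float,
    n_cases: int,
    n_controls: int,
    n_reps: int = 1000,
    seed: int = 1,
) -> float:
    """Fraction of random case/control draws whose R^2 is at least the observed R^2."""
    rng = np.random.default_rng(seed)
    is_case = np.asarray(is_case, dtype=bool)
    case_pool = np.flatnonzero(is_case)
    control_pool = np.flatnonzero(~is_case)
    status = np.concatenate([np.ones(n_cases), np.zeros(n_controls)])
    hits = 0
    for _ in range(n_reps):
        # Draw cases and controls without replacement from the full cohort.
        idx = np.concatenate([
            rng.choice(case_pool, size=n_cases, replace=False),
            rng.choice(control_pool, size=n_controls, replace=False),
        ])
        if score_r2(score[idx], status) >= observed_r2:
            hits += 1
    # Same convention as the text: hits / replicates (18 of 1,000 gives 0.018).
    return hits / n_reps
```

The returned value follows the convention in the text. A common alternative is `(hits + 1) / (n_reps + 1)`, which avoids reporting p = 0.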
Supplementary Figure 1. Q-Q plot from the GWAS analysis in the Australian
Discovery Sample.
Supplementary Figure 2. Q-Q plot of meta-analysis of all 4 samples.
Supplementary Figure 3
References
DEROSSE, P., LENCZ, T., BURDICK, K. E., SIRIS, S. G., KANE, J. M. & MALHOTRA, A. K.
2008. The genetics of symptom-based phenotypes: toward a molecular
classification of schizophrenia. Schizophr Bull, 34, 1047-53.
GORDON, Y. P., GRUDZINSKAS, J. G., JEFFREY, D. & CHARD, T. 1977. Concentrations
of pregnancy-specific beta 1-glycoprotein in maternal blood in normal
pregnancy and in intrauterine growth retardation. Lancet, 1, 331-3.
KEE, H. J., AHN, K. Y., CHOI, K. C., WON SONG, J., HEO, T., JUNG, S., KIM, J. K., BAE, C.
S. & KIM, K. K. 2004. Expression of brain-specific angiogenesis inhibitor 3
(BAI3) in normal brain and implications for BAI3 in ischemia-induced brain
angiogenesis and malignant glioma. FEBS Lett, 569, 307-16.
LI, Y., WILLER, C. J., DING, J., SCHEET, P. & ABECASIS, G. R. 2010. MaCH: using
sequence and genotype data to estimate haplotypes and unobserved
genotypes. Genet Epidemiol, 34, 816-34.
LIU, J. Z., MCRAE, A. F., NYHOLT, D. R., MEDLAND, S. E., WRAY, N. R., BROWN, K. M.,
HAYWARD, N. K., MONTGOMERY, G. W., VISSCHER, P. M., MARTIN, N. G. &
MACGREGOR, S. 2010. A versatile gene-based test for genome-wide
association studies. Am J Hum Genet, 87, 139-45.
MEDLAND, S. E., NYHOLT, D. R., PAINTER, J. N., MCEVOY, B. P., MCRAE, A. F., ZHU, G.,
GORDON, S. D., FERREIRA, M. A., WRIGHT, M. J., HENDERS, A. K., CAMPBELL,
M. J., DUFFY, D. L., HANSELL, N. K., MACGREGOR, S., SLUTSKE, W. S., HEATH,
A. C., MONTGOMERY, G. W. & MARTIN, N. G. 2009. Common variants in the
trichohyalin gene are associated with straight hair in Europeans. Am J Hum
Genet, 85, 750-5.
PIHL, K., LARSEN, T., LAURSEN, I., KREBS, L. & CHRISTIANSEN, M. 2009. First trimester
maternal serum pregnancy-specific beta-1-glycoprotein (SP1) as a marker of
adverse pregnancy outcome. Prenat Diagn, 29, 1256-61.
PURCELL, S., NEALE, B., TODD-BROWN, K., THOMAS, L., FERREIRA, M. A., BENDER, D.,
MALLER, J., SKLAR, P., DE BAKKER, P. I., DALY, M. J. & SHAM, P. C. 2007.
PLINK: a tool set for whole-genome association and population-based linkage
analyses. Am J Hum Genet, 81, 559-75.
RIVA, A. & KOHANE, I. S. 2002. SNPper: retrieval and analysis of human SNPs.
Bioinformatics, 18, 1681-5.
SKOL, A. D., SCOTT, L. J., ABECASIS, G. R. & BOEHNKE, M. 2006. Joint analysis is more
efficient than replication-based analysis for two-stage genome-wide
association studies. Nat Genet, 38, 209-13.
TAMSEN, L., AXELSSON, O. & JOHANSSON, S. G. 1983. Serum levels of pregnancy-specific
beta 1-glycoprotein (SP1) in women with pregnancies at risk. Gynecol
Obstet Invest, 16, 253-60.
TEGLUND, S., OLSEN, A., KHAN, W. N., FRANGSMYR, L. & HAMMARSTROM, S. 1994.
The pregnancy-specific glycoprotein (PSG) gene cluster on human
chromosome 19: fine structure of the 11 PSG genes and identification of 6
new genes forming a third subgroup within the carcinoembryonic antigen
(CEA) family. Genomics, 23, 669-84.
WESTERGAARD, J. G., TEISNER, B., HAU, J., GRUDZINSKAS, J. G. & CHARD, T. 1985.
Placental function studies in low birth weight infants with and without
dysmaturity. Obstet Gynecol, 65, 316-8.
XIA, K., SHABALIN, A. A., HUANG, S., MADAR, V., ZHOU, Y. H., WANG, W., ZOU, F.,
SUN, W., SULLIVAN, P. F. & WRIGHT, F. A. 2012. seeQTL: a searchable
database for human eQTLs. Bioinformatics, 28, 451-2.
ZHOU, G. Q., BARANOV, V., ZIMMERMANN, W., GRUNERT, F., ERHARD, B.,
MINCHEVA-NILSSON, L., HAMMARSTROM, S. & THOMPSON, J. 1997. Highly
specific monoclonal antibody demonstrates that pregnancy-specific
glycoprotein (PSG) is limited to syncytiotrophoblast in human early and term
placenta. Placenta, 18, 491-501.