Bipolar GWA Study

SUPPLEMENTAL MATERIAL

Study Subjects

Bipolar cases.

Cases were selected from those collected and characterized by the Bipolar Consortium over the past 18 years. These subjects were collected in 5 waves. Waves 1-4 were families collected for linkage studies, while wave 5 is a large set of primarily unrelated cases collected for large-scale association studies.

Waves 1-4 comprise the Bipolar Family Dataset (BFD), which includes 2,936 subjects from 646 pedigrees, each ascertained via a proband with a Bipolar I (BPI) diagnosis and an additional first-degree relative with a BPI or Schizoaffective Disorder, Bipolar Type (SABP) diagnosis.

All subjects were diagnosed with a standard best estimate (BEFD) procedure (see below). For the BiGS GWA study we selected unrelated Diagnostic and Statistical Manual (DSM) IV-defined BPI subjects (1) from these families. Waves 1 and 2 involved recruitment of primarily large pedigrees through four sites (Indiana, Johns Hopkins, the NIMH Intramural Program, and Washington University at St. Louis). In Waves 3 and 4, smaller pedigrees with a minimal ascertainment criterion of a BPI-BPI affected sibling pair were recruited through an expanded series of sites (i.e., those above, as well as University of Pennsylvania, University of California at San Diego, University of California at Irvine, University of California at San Francisco, University of Iowa, University of Chicago, and Rush University). BiGS subjects include 175 unrelated European ancestry (EA) cases from Waves 1 and 2 and 396 unrelated cases from Waves 3 and 4. In addition, the BiGS sample included 430 cases from the ‘Wave 5’ data collection, which included 4,089 DNA samples from 3,655 families ascertained through probands with DSM IV-defined BPI disorder.

The BiGS cases abstracted from Waves 1-5 of the BiGS consortium dataset totaled 1,041 EA individuals. EA status was determined based on the subject’s self-report that all four grandparents were of EA heritage. Forty EA subjects were removed due to a non-BPI/SABP best estimate diagnosis or low diagnostic confidence. The final BiGS sample thus included 1,001 EA cases, of which 951 had a diagnosis of BPI and 50 had a diagnosis of SABP. African ancestry (AA) status was based on self-report of at least one grandparent being of AA heritage. A total of 345 of these subjects were ultimately included in the BiGS analyses after review of best estimate diagnoses; these included 315 BPI AA cases and 30 SABP AA cases.

Controls.

Controls were ascertained separately through a NIMH-supported contract mechanism between Dr. Pablo Gejman and Knowledge Networks, Inc.; this mechanism allowed the ascertainment of 4,586 subjects across the U.S. who agreed to donate a blood sample for transformation into lymphoblastoid cell lines and to respond to a medical questionnaire. All participating subjects, including 3,303 EA and 1,283 AA subjects, were given the questionnaire. Only individuals with complete or near-complete psychiatric questionnaire data who did not fulfill diagnostic criteria for major depression and denied a history of psychosis or bipolar disorder (BD) were included as controls for the BiGS analyses. Potential controls were matched for gender and ethnicity with the BiGS EA cases, and the control counts were 1,034 EA subjects and 716 AA subjects.

Clinical Assessment.

All case subjects were interviewed with the Diagnostic Interview for Genetic Studies (DIGS) (2). The DIGS was revised (DIGS 4.0) between Waves 4 and 5 to allow collection of additional data on posttraumatic stress disorder and adult attention deficit disorder, as well as additional phenotypic information on BD. A change was also made in the ‘Best Estimate Final Diagnosis’ (BEFD) process at the start of Wave 5 to incorporate clinician judgment of multiple phenotypic indicators. These included diagnosis by DSM IV, DSM IIIR, and the Research Diagnostic Criteria (RDC) (3), as well as age of onset, number of episodes for depression, hypomania and mania, temporal relationship of mood disorder to substance abuse and psychosis, evidence of mixed episodes and rapid cycling, and a summary of the family history information. All of these indicators were scored independently by a senior clinician (generally a psychiatrist) based on all available information, including medical records, interviewer observations, the coded DIGS, and the Family Instrument for Genetic Studies (‘FIGS,’ developed for the NIMH Genetics Initiative; available at http://www.nimhgenetics.org/).

The FIGS incorporates clinician judgment on family patterns of illness, including presence or absence of BD, unipolar disorder, and/or other psychiatric disorders in first- and second-degree relatives. Final BEFD judgments based on all available criteria, including level of interviewer agreement and certainty, are available for all BiGS subjects (www.nimhgenetics.org). In addition, all genotypic and phenotypic data for the BiGS subjects, as detailed above, are available through the GAIN dbGaP database (http://www.ncbi.nlm.nih.gov/gap) and the NIMH data repository (http://www.nimhgenetics.org). This resource for the study of BD has been widely utilized by academic groups both within and outside of the United States, as well as by pharmaceutical and biotechnology companies. Additional subject self-report data, including the Akiskal Temperament Scale (4), the Basic Language Morningness Scale (5), the Childhood Life Events Scale (Lawson & Gershon, unpublished), the Lifetime History of Aggression measure (6), the Questionnaire on Genetic Risk (Nurnberger & Lawson, unpublished), the Temperament and Character Inventory (7), a Visual Analogue Scale on Mood, the Wender Attention Deficit Scale (8), and the Zuckerman-Kuhlman Personality Questionnaire (9) are not yet publicly posted but are available from the authors upon request.

Genotyping and Quality Control

Genotyping and quality control of data available on dbGaP.

Genotyping was carried out by The Broad Institute Center for Genotyping and Analysis (http://www.broad.mit.edu/node/306). DNA quantity was checked using PicoGreen fluorometry, and sample quality was initially assessed by genotyping a 24-SNP panel, which includes a sex-determining assay, on the Sequenom iPLEX platform. Samples were plated at 50 ng/ul in 96-well plates at the Rutgers University Cell and DNA Repository.

In addition, the Centre d'Etude du Polymorphisme Humain (CEPH; www.cephb.fr/en/cephdb/) sample NA12144 was placed on each production plate at the Broad Institute. Because this Bipolar project and the Genome-Wide Association Study of Schizophrenia (dbGaP; Study Accession: phs000021.v2.p1) shared controls, bipolar cases were often genotyped on different plates than controls. This unbalanced distribution of BD cases and controls produced systematic differences in the processing that may have affected allele calling. One plate, containing only 16 EA BD samples, was not included in the analysis because allele calling for many single nucleotide polymorphisms (SNPs) was aberrant. Genotyping of the EA and AA samples was carried out separately, using the Affymetrix Genome-Wide Human SNP Array 6.0 (10).

Samples for which fewer than 86% of the quality control (FQC) SNPs produced genotypes were rerun. Allele calling was performed using the Birdseed algorithm (11) in Affymetrix Power Tools version apt-1.8.6 (http://www.affymetrix.com/partners_programs/programs/developer/tools/powertools.affx#1_3) and the cluster models (‘priors’) file (found here: http://www.broad.mit.edu/~dbmirel/ncbi/Affy6.0_birdseed1.31.priors.tsv). Scans from the same production plate were clustered together. Concordance between genotypes from the array and those from the initial QC panel was evaluated to confirm sample ID.

Further quality controls were carried out separately for the EA and AA samples. Samples were not used in the analysis if they failed any of several quality metrics: low call rate (below 98.5% for EA and 97.8% for AA), excessively high or low heterozygosity (outside the range 0.344 to 0.363 for EA or 0.29 to 0.324 for AA), or incompatibility between reported gender and genetically determined gender. Samples were also checked for unexpected familial relationships using pairwise IBD estimation in PLINK (12). SNPs were not analyzed if the minor allele frequency was < 0.01, if the call rate was < 95%, if the SNP violated Hardy-Weinberg equilibrium (p < 1 x 10^-6) in control samples within an ancestry group, if there were 3 or more Mendelian errors, or if there was more than one discrepancy among duplicate samples. Each plate in the study, including those in the GAIN Schizophrenia GWA study, was compared to all other plates. SNP allele frequency variation between the plates was examined using a chi-squared association test. This was performed with PLINK by setting all the individuals on one plate to “case” and all other individuals to “control”. If the association p-value was less than 10^-8 for any one plate or 10^-4 for two or more plates, the SNP was removed. This resulted in 39,817 SNPs being removed from the EA dataset and 8,026 SNPs being removed from the AA dataset. The total number of SNPs passing all initial QC tests was 729,454 for EA and 845,814 for AA.
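The plate-effect removal rule can be illustrated with a short Python sketch. It assumes the per-plate p-values (one plate set to "case", all other plates to "control") have already been computed, for example with PLINK; the function name and input layout are hypothetical and are not taken from the study's own scripts.

```python
def snp_fails_plate_test(plate_pvalues, strict=1e-8, lenient=1e-4):
    """Return True if a SNP would be removed for plate effects.

    plate_pvalues: one association p-value per plate, each from a test of
    that plate (as "case") against all other plates (as "control").
    Rule as described in the text: remove the SNP if p < 1e-8 for any one
    plate, or if p < 1e-4 for two or more plates.
    """
    any_strict = any(p < strict for p in plate_pvalues)
    n_lenient = sum(1 for p in plate_pvalues if p < lenient)
    return any_strict or n_lenient >= 2
```

A SNP with a single plate at p = 5 x 10^-5 would be kept under this rule, while a SNP with two such plates would be removed.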

Additional Quality Control after downloading of data from dbGaP

SNP quality assessment.

Data were downloaded from the database of Genotypes and Phenotypes (dbGaP; http://www.ncbi.nlm.nih.gov/gap). Each dataset (i.e., EA and AA) underwent a second round of quality control limited to the samples of interest for our BD GWA study. Since we analyzed only a subset of the genotyped samples (i.e., the BD and control samples), we reapplied similar QC thresholds to the final set of individuals, retaining SNPs that met the following criteria: < 1 homozygote-to-homozygote error in all duplicates; < 2 homozygote-to-heterozygote errors in all duplicates; > 90% SNP call rate; minor allele frequency > 0.01; and no departure from Hardy-Weinberg equilibrium in control samples within an ancestry group (departure defined as p < 1 x 10^-6). We tested whether plate effects were due to problems with genotype calling that were driven by a handful of poorly genotyped individual samples (i.e., individuals that may have altered genotype clustering based on allelic intensity values). We also merged our data with data obtained for the GAIN Schizophrenia GWA study, removed all individuals that had been removed for any QC reason, and recalled all the genotypes by plate using Birdseed version 2 in Affymetrix Power Tools (version 1.10.0). This procedure did not remove the observed plate effect, and so the genotypes as originally called were used in the final analysis.
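The SNP-level criteria of this second QC round can be written as a single predicate. The following is a minimal Python sketch under the assumption that the per-SNP summary statistics are already available; the `SnpStats` container and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SnpStats:
    call_rate: float         # fraction of samples with a genotype call
    maf: float               # minor allele frequency
    hwe_p_controls: float    # HWE test p-value in controls of one ancestry group
    hom_hom_dup_errors: int  # homozygote-to-homozygote discordances in duplicates
    hom_het_dup_errors: int  # homozygote-to-heterozygote discordances in duplicates


def passes_second_round_qc(s: SnpStats) -> bool:
    """Apply the thresholds listed in the text to one SNP."""
    return (
        s.hom_hom_dup_errors < 1
        and s.hom_het_dup_errors < 2
        and s.call_rate > 0.90
        and s.maf > 0.01
        and s.hwe_p_controls >= 1e-6  # p < 1e-6 counts as an HWE violation
    )
```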

The final dataset consisted of 724,067 SNPs in the EA dataset and 840,730 SNPs in the AA dataset. The two datasets had 702,044 SNPs in common, and these were used to perform analyses addressing SNP and SNP x genetic background interactions in the combined sample.

Analysis of SNP genotypes for individuals.

Individual ancestry and admixture levels were assessed by the program Local Ancestry in Admixed Populations (LAMP) (13). Admixture was assessed for both the EA and AA samples. LAMP uses a sliding window approach to estimate ancestry of local regions of the genome, which can then be averaged to obtain genome-wide levels of admixture. It is most effective when analyzing populations of recent admixture, such as African Americans, and has been shown to be faster and more accurate than STRUCTURE (13). We used the HapMap (version 2, release 23a) CEU and YRI collections as ancestral populations.
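The averaging step, from local ancestry estimates to a genome-wide admixture level, can be sketched as follows. This is only an illustration of the averaging idea: it does not reproduce LAMP's algorithm or its output format, and the function name, the input layout, and the optional per-window weights are assumptions.

```python
def genome_wide_ancestry(local_estimates, weights=None):
    """Average per-window ancestry proportions into one genome-wide value.

    local_estimates: ancestry proportion (0.0 to 1.0) for each local window.
    weights: optional per-window weights (e.g. window length); equal weights
    are used when omitted.
    """
    if not local_estimates:
        raise ValueError("at least one local estimate is required")
    if weights is None:
        weights = [1.0] * len(local_estimates)
    if len(weights) != len(local_estimates):
        raise ValueError("weights and local_estimates must have equal length")
    total = sum(weights)
    return sum(a * w for a, w in zip(local_estimates, weights)) / total
```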

Population stratification.

We performed analyses to control for population stratification using several available methods, including genomic control-adjusted p-values, logistic regression analyses using a LAMP-generated ancestry estimate as a covariate, and logistic regression analyses using multidimensional scaling (MDS) values (i.e., the top 4 coordinates) as covariates. We also compared LAMP admixture estimates to admixture estimates generated with STRUCTURE v2.2 (14, 15), using either the same ancestral populations (i.e., HapMap subjects) or an alternative set consisting of subjects from the Human Genome Diversity Panel (HGDP) (e.g., see 16). Specifically, a random panel of 3,505 SNPs present in all datasets was chosen from chromosome 1 with inter-marker distances of > 15,000 bp. To avoid strand issues between markers genotyped on different platforms, CG and AT SNPs were excluded. Ancestral allele frequencies were estimated using either genotypes from 60 CEPH and 60 Yoruban subjects from HapMap (version 2, release 23a) or genotypes from 158 European and 102 African subjects from the HGDP. All STRUCTURE runs were performed using a burn-in period of 5,000 iterations, followed by 5,000 MCMC repeats, and were based on the admixture model and the F model of correlation in allele frequencies across clusters. A supervised analysis with K = 2 clusters was performed with individuals from Europe and Africa forced into separate clusters. Individual admixture was estimated as the average of the membership coefficients across five replicate runs. LAMP and STRUCTURE ancestry estimates were similar (Figure S4). Ancestry estimates were not affected by the choice of HapMap or HGDP populations.
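The two marker-selection constraints described for the STRUCTURE panel (no CG or AT SNPs, and inter-marker distances above 15,000 bp) can be sketched in Python. Note the assumptions: the study drew a random panel, whereas this sketch uses a simple greedy left-to-right pass to enforce the spacing; the function name and the input tuple layout are hypothetical.

```python
# Strand-ambiguous allele pairs: the complement of C/G is G/C and of A/T is T/A,
# so the strand cannot be inferred from the alleles alone.
AMBIGUOUS = {frozenset("CG"), frozenset("AT")}


def select_panel(snps, min_gap=15000):
    """Pick SNP ids from one chromosome that satisfy both constraints.

    snps: iterable of (snp_id, position_bp, allele1, allele2).
    Keeps a SNP only if it is not strand-ambiguous and lies more than
    min_gap bp beyond the previously kept SNP.
    """
    kept = []
    last_pos = None
    for snp_id, pos, a1, a2 in sorted(snps, key=lambda s: s[1]):
        if frozenset((a1, a2)) in AMBIGUOUS:
            continue
        if last_pos is not None and pos - last_pos <= min_gap:
            continue
        kept.append(snp_id)
        last_pos = pos
    return kept
```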

Replication Genotyping

A subset of 85 SNPs from our BD GWA study was selected for replication genotyping based on several criteria, with a primary focus on the allelic association p-value in the larger EA sample. Other factors considered in this prioritization were statistical support for association from neighboring SNPs and evidence for association in the AA GWA study sample. The 85 SNPs were genotyped using Sequenom mass spectrometry technology in 1,749 EA subjects from 250 multiplex bipolar families which have been previously described (17). There was a modest amount of overlap between individuals in this familial sample and individuals in our BD GWA study EA sample, with 199 EA cases drawn from the familial cohort. The SNPs in this set were also genotyped using the same technique in an independent EA cohort consisting of 1,263 BPI cases and 431 controls. Family-based transmission analysis was performed in the family sample as previously described (17), using both a strict (affecteds-only) and a more permissive definition (individuals not meeting criteria for BPI, BPII, or recurrent depression) of unaffected status. The case-control cohort was analyzed in the same fashion as our BD GWA study described above. Meta-analysis was performed to evaluate evidence for association across the primary EA sample and the two replication genotyping cohorts using the METAL method (Willer and Abecasis, http://www.sph.umich.edu/csg/abecasis/Metal/index.html). Effective sample sizes for each of the three studies were obtained using a published method (18).
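For orientation, a sample-size-weighted combination of per-study Z-scores can be sketched as below. The text does not state which METAL weighting scheme was used; this sketch assumes the sample-size-weighted scheme, with weights equal to the square root of each study's effective sample size, and assumes that all Z-scores are signed with respect to the same reference allele. The function name is hypothetical.

```python
import math


def weighted_z_meta(z_scores, effective_ns):
    """Combine per-study Z-scores with sqrt(effective N) weights.

    Returns the combined Z-score and its two-sided normal p-value.
    """
    if len(z_scores) != len(effective_ns) or not z_scores:
        raise ValueError("need one effective sample size per Z-score")
    weights = [math.sqrt(n) for n in effective_ns]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    z = numerator / denominator
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under N(0, 1)
    return z, p
```

Two equally sized studies that each give Z = 2.0 combine to Z = 2 x sqrt(2), which is how consistent modest signals across cohorts can add up to stronger joint evidence.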

Statistical Analysis

Imputation.

Imputation was performed for the cleaned EA dataset using MACH v.1.0.16 (http://www.sph.umich.edu/csg/abecasis/mach/index.html) with HapMap rel21 phased haplotypes as a reference. Model parameters were estimated with a random subset of 200 individuals before imputation on the entire dataset. Association between estimated genotype expectations and BD was tested in R using logistic regression, with the top 4 MDS components as covariates.
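As an illustration of what a genotype expectation looks like, the expected allele count for one individual at one imputed SNP can be computed from posterior genotype probabilities. This sketch assumes that "estimated genotype expectations" refers to the expected allele dosage; the function name and argument layout are hypothetical and do not reflect MACH's file formats.

```python
def expected_dosage(p_het, p_hom_alt):
    """Expected count (0 to 2) of the counted allele.

    p_het: posterior probability of the heterozygous genotype.
    p_hom_alt: posterior probability of the homozygous genotype carrying
    two copies of the counted allele.
    """
    if not (0.0 <= p_het <= 1.0 and 0.0 <= p_hom_alt <= 1.0):
        raise ValueError("probabilities must lie in [0, 1]")
    if p_het + p_hom_alt > 1.0 + 1e-9:
        raise ValueError("probabilities must not sum to more than 1")
    return p_het + 2.0 * p_hom_alt
```

The resulting value would then enter the logistic regression as a continuous predictor alongside the MDS covariates.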

Supplemental Table 1 (S1): Subject demographics.

Demographic                      African        African        European       European
                                 Ancestry       Ancestry       Ancestry       Ancestry
                                 Cases          Controls       Cases          Controls
                                 N = 345        N = 670        N = 1,001      N = 1,033

Gender
  Female                         239 (69%)      398 (59%)      501 (50%)      501 (48%)
  Male                           106 (31%)      272 (41%)      500 (50%)      532 (52%)

Diagnosis
  Bipolar I (BPI)                315 (91%)      --             951 (95%)      --
  Schizoaffective disorder,
    bipolar type (SABP)          30 (9%)        --             50 (5%)        --

Age at onset or study entry
  <= 18 years                    201 (58%)      3 (0.4%)       541 (54%)      10 (1%)
  > 18 years                     131 (38%)      667 (99.6%)    426 (43%)      1023 (99%)
  Unknown                        13 (4%)        0 (0%)         34 (3%)        0 (0%)
  Average/median                 18.0/16        45.8/46        19.3/18        52.2/53

Supplemental Table 2 (S2): Consistency of top hits across the different study samples.

                                                                         AA LAMP              EA MDS               EA/AA Combined
                                                                                                                   Main Effect MDS
Analysis              SNP/Allele      Location             MAF (AA/EA)   OR    p-value        OR    p-value        OR    p-value

AA                    rs2111504/T     DPY19L3 (19q13.11)   0.23/0.17     1.73  1.7 x 10^-6    1.12  0.20           1.31  8.6 x 10^-5
LAMP adjusted         rs2769605/T     NTRK2 (9q21.33)      0.22/0.57     0.61  4.5 x 10^-5    0.92  0.20           0.84  1.5 x 10^-3

EA                    rs1825828/C     3q11.2               0.58/0.26     0.91  0.33           0.70  7.0 x 10^-7    0.77  5.3 x 10^-6
MDS adjusted          rs5907577/T     Xq27.1               0.10/0.31     1.08  0.67           1.48  1.6 x 10^-6    1.27  9.7 x 10^-5
                      rs10193871/G    NAP5 (2q21.2)        0.06/0.13     0.75  0.18           0.65  9.8 x 10^-6    0.67  4.2 x 10^-6

EA/AA Combined        rs4825220/C     Xq27.1               0.1/0.52      0.62  0.01           0.74  2.8 x 10^-5    0.71  2.6 x 10^-7
Main Effect
MDS adjusted

We investigated the association for our top SNPs in each sub-sample within the other sub-samples studied. This table shows, for each SNP identified, the association score in each of our three sub-samples.

Supplemental Table 3 (S3).

Annotation of regions showing high haplotype heterogeneity.

Chromosome   Start        End          Genes within/nearby region
1            196366461    196672977    NEK7
1            225792670    226014057    ZNF678, C1orf142, JMJD4
6            64735225     64943243     PHF3
6            98667547     98821885     POU3F2
7            46118553     46178058     IGFBP3
8            109360409    109478088    EIF3E, TTC35
10           61748209     61913110     ANK3
11           104024629    104164746    PDGFD, CASP12
12           46696784     47173860     SENP1, PFKM, ASB8, C12orf68, OR10AD1, H1FNT, ZNF641, ANP32D, C12orf54
17           36697627     36746143     KRTAP cluster
23           78870379     79016438     ITM2A, TBX22

Supplemental Table 4 (S4).

Results for SNPs reported in Baum et al. (2008).

CHR   SNP          BP          Allele   Cases   Controls   CHISQ   P         OR     Gene
1     rs12032218   235262729   A        0.18    0.21       6.16    0.01307   0.82   RYR2
2     rs13414801   28143851    A        0.33    0.35       3.47    0.06255   0.88   BRE
2     rs4853066    74943663    A        0.17    0.20       5.96    0.01461   0.82   HK2
2     rs6732834    85359113    A        0.07    0.07       0.03    0.86280   0.98   TCF7L1
2     rs3100624    232646827   C        0.09    0.07       6.71    0.00959   1.35   MGC42174
3     rs7610043    29362635    T        0.48    0.52       6.80    0.00912   0.85   RBMS3
3     rs9869826    50973856    C        0.08    0.08       0.13    0.72000   1.04   DOCK3
3     rs7620081    51475631    C        0.08    0.08       0.49    0.48340   1.09   VprBP
3     rs1388612    62219484    A        0.17    0.18       1.22    0.26880   0.91   PTPRG
4     rs4411993    7517366     A        0.11    0.14       6.78    0.00923   0.78   SORCS2
4     rs7683874    7526324     T        0.06    0.07       1.16    0.28110   0.87   SORCS2
4     rs7660807    71423140    G        0.10    0.08       2.95    0.08589   1.21   UNQ689
4     rs7660807    71423140    G        0.10    0.08       2.95    0.08589   1.21   UNQ689
4     rs2162126    129998526   T        0.14    0.15       1.45    0.22810   0.90   PHF17
4     rs3736456    187359349   C        0.06    0.05       0.63    0.42670   1.12   CYP4V2
4     rs3736456    187359349   C        0.06    0.05       0.63    0.42670   1.12   CYP4V2
7     rs815952     55529378    A        0.08    0.09       1.98    0.15950   0.86   GASP
7     rs10949703   157735765   C        0.25    0.24       0.18    0.66930   1.03   PTPRN2
8     rs1561158    97684411    A        0.19    0.19       0.00    0.99080   1.00   SDC2
8     rs2255317    99113704    T        0.11    0.11       0.08    0.78100   1.03   MATN2
9     rs16929770   116314950   G        0.08    0.07       1.12    0.28980   1.13   DFNB31
10    rs9804190    61509837    T        0.21    0.24       5.63    0.01763   0.84   ANK3
10    rs1980869    112754584   T        0.17    0.19       3.93    0.04742   0.85   SHOC2
13    rs9315885    41540810    G        0.32    0.32       0.19    0.66420   0.97   DGKH
16    rs10500336   6237506     C        0.14    0.11       7.40    0.00653   1.30   A2BP1
16    rs4398100    80371845    T        0.19    0.24       16.68   0.00004   0.73   PLCG2
17    rs2360111    778417      A        0.39    0.43       5.63    0.01762   0.86   NXN
20    rs4813030    1045609     T        0.19    0.22       3.52    0.06079   0.86   PSMF1
X     rs6625561    69141924    G        0.30    0.30       0.02    0.87440   1.01   EDA

Of the 88 replicated SNPs reported in the previous study, 29 were directly genotyped in the present study. All allele-wise results are shown, with SNPs in physical order (UCSC hg 18, dbSNP build 125).
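The relationship between the CHISQ and P columns can be checked with a few lines of Python, assuming the statistic is a chi-squared test with 1 degree of freedom (as for an allele-wise test on a 2 x 2 table of allele counts). The function name is hypothetical; only the standard library is used.

```python
import math


def chisq1_pvalue(chisq):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom.

    For 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    if chisq < 0:
        raise ValueError("chi-squared statistic must be non-negative")
    return math.erfc(math.sqrt(chisq / 2.0))
```

Under this assumption, the largest statistic in the table (16.68) corresponds to a p-value of roughly 4 x 10^-5, in line with the smallest tabulated P of 0.00004, and a statistic of 7.40 gives a value close to the tabulated 0.00653.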

SUPPLEMENTAL FIGURE LEGENDS

Supplemental Figure 1 (S1): QQ and multidimensional scaling (MDS) plots for each study population.

Unadjusted –log(p-values) are shown in black, with MDS (top 4 components) adjustment in red and LAMP adjustment in blue. In the top left-hand corner of each plot, the top 2 MDS components for each analysis are plotted against each other. Genomic control values are shown for the current study size and corrected for a study size of 1000 cases and 1000 controls. Where EA and AA individuals are analyzed together, unadjusted genomic control values are elevated because of different ratios of cases and controls in the two populations. All individuals shown in the MDS plots in the upper left corners were included in the analyses.
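The legend does not state how the genomic control values were computed or rescaled. A minimal Python sketch of one commonly used convention is given here as an assumption: the inflation factor is the median observed 1-df chi-squared statistic divided by its expected median under the null (about 0.4549), and the rescaling to 1000 cases and 1000 controls uses the ratio of inverse sample sizes. Both function names are hypothetical.

```python
import statistics

# Median of the chi-squared distribution with 1 degree of freedom.
CHISQ1_MEDIAN = 0.4549364


def genomic_control_lambda(chisq_stats):
    """Inflation factor: median observed statistic over the null median."""
    return statistics.median(chisq_stats) / CHISQ1_MEDIAN


def lambda_1000(lam, n_cases, n_controls):
    """Rescale an inflation factor to 1000 cases and 1000 controls."""
    scale = (1.0 / n_cases + 1.0 / n_controls) / (1.0 / 1000 + 1.0 / 1000)
    return 1.0 + (lam - 1.0) * scale
```

With this convention, an inflation factor of 1.10 observed in 2000 cases and 2000 controls would rescale to 1.05.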

Supplemental Figure 2 (S2): Regions near top hits and areas of interest.

Regions +/- 250 kb around each SNP listed in Table 1 are shown. P-values are from the analysis where the SNP was identified. Genotyped SNPs are shown as circles, while imputed SNPs are shown as smaller diamonds. The primary SNP of interest is large and colored in black. Other SNPs are colored according to linkage disequilibrium levels with the primary SNP (r^2), as calculated from Phase 3 HapMap data using CEU (for EA and EA + AA combined) or YRI (for AA) populations. Recombination rate (HapMap) is shown on the second y-axis in blue. RefSeq genes are shown with all possible exons; arrows indicate transcript direction. In the upper left-hand corner of each graph, the genotype intensity plots are shown, with each color indicating the final genotype call (blue and red for homozygotes and purple for the heterozygote).

Supplemental Figure 3 (S3): Top regions characterized in Ferreira et al., +/- 250 kb.

Each circle represents a SNP, with the first y-axis indicating the p-value for BD in EA individuals (MDS adjusted) for this study. Genotyped SNPs are indicated with circles, while imputed SNPs are indicated with diamonds. The SNP most strongly associated in the previous study is indicated by a large black circle. SNPs are colored shades of red depending on their linkage disequilibrium with that SNP (r^2, calculated from HapMap CEU Phase 3 using Haploview). Recombination rate (HapMap) is shown on the second y-axis in blue. Green horizontal lines indicate haplotype association p-values from a 10-SNP sliding window. Below the plot are genotyped SNPs from the WTCCC Bipolar Disorder and STEP-BP studies, with p-value indicated by color. RefSeq genes are shown with all possible exons; arrows indicate transcript direction.

Supplemental Figure 4 (S4): Comparison of European ancestry estimates.

Red: HapMap and HGDP (with STRUCTURE, 3503 SNPs on Chr. 1); Green: STRUCTURE and LAMP subset (HapMap, 3503 SNPs on Chr. 1); Blue: STRUCTURE and LAMP full set (HapMap, 3503 SNPs on Chr. 1 and all markers for LAMP).
