Data source | Disease | Ancestry | (Cases, controls) | Cross-validation | Functional SNP sets for 2D PRS
WTCCC | Bipolar disorder | Europe | (1817, 2928) | 5-fold | blood eSNPs [1,2], CR-SNPs [3]
WTCCC | Coronary artery disease | Europe | (1878, 2928) | 5-fold | blood eSNPs [1,2], CR-SNPs [3]
WTCCC | Crohn’s disease | Europe | (1729, 2928) | 5-fold | blood eSNPs [1,2], CR-SNPs [3]
WTCCC | Hypertension | Europe | (1934, 2928) | 5-fold | blood eSNPs [1,2], CR-SNPs [3]
WTCCC | Rheumatoid arthritis | Europe | (1894, 2928) | 5-fold | blood eSNPs [1,2], CR-SNPs [3]
WTCCC | Type 1 diabetes | Europe | (1939, 2928) | 5-fold | blood eSNPs [1,2], CR-SNPs [3]
Three cancer GWAS with individual genotype data | Bladder cancer | Europe | (5937, 10862) | 10-fold | blood eSNPs [1,2], CR-SNPs [3], active histone marks H3K4me3 and H3K9-14Ac in HAEC [4], active histone marks in bladder cell lines downloaded from the ROADMAP project, lung-related functional SNPs (eSNPs [5] and meSNPs [6] in lung tissues, H3K4me3 and H3K9-14Ac in HAEC [4])
Three cancer GWAS with individual genotype data | Lung cancer, Asian non-smoking females | Asian | (5510, 4544) | 10-fold | blood eSNPs [1,2], CR-SNPs [3], eSNPs [5] and meSNPs [6] in lung tissues, active histone marks H3K4me3 and H3K9-14Ac in HAEC [4], pleiotropic SNPs with p<0.01 (denoted as PT-0.01) or p<0.001 (denoted as PT-0.001) in at least one other trait
Three cancer GWAS with individual genotype data | Pancreatic cancer | Europe | (5066, 8807) | 10-fold | CR-SNPs [3], eSNPs/meSNPs in adipose [7,8], combined active histone mark (H3K4me3, H3K9-14Ac, H3K36me3, H3K4me1, H3K9ac and H3K9me3) SNPs in pancreatic islet cells and primary pancreatic cells downloaded from the ROADMAP project, PT-0.01 and PT-0.001 SNPs

Table S1A: Disease GWAS with individual genotype data used for evaluating risk prediction performance.
Disease | Discovery data sources | Discovery (cases, controls) | Validation data source | Validation (cases, controls) | Ancestry | Functional SNP sets for 2D PRS
Type 2 diabetes | DIAGRAM, GERA | (17,802, 105,109) | GERA | (1500, 1500) | Europe | CR-SNPs [3], eSNPs/meSNPs in adipose [7,8], histone mark SNPs in pancreatic islet cells
Lung cancer | TRICL | (11,300, 15,952) | PLCO | (1237, 1330) | Europe | blood eSNPs [1,2], eSNPs [5] and meSNPs [6] in lung tissues, CR-SNPs [3], H3K4me3 in SAEC [9], PT-0.01 and PT-0.001 SNPs
Schizophrenia | PGC2 | (31,560, 42,951) | MGS | (2681, 2653) | Europe | blood eSNPs [1,2], CR-SNPs [3], PT-0.01 and PT-0.001 SNPs
Colorectal cancer | GECCO | (9,719, 10,937) | PLCO | (1000, 2302) | Europe | blood eSNPs [1,2], CR-SNPs [3], PT-0.01 and PT-0.001 SNPs, histone mark SNPs in colon/rectal cells in the ROADMAP project
Prostate cancer | PRACTICAL, ELLIPSE | (38,703, 40,796) | Pegasus | (4600, 2941) | Europe, African, Japanese, Latino | blood eSNPs [1,2], CR-SNPs [3], PT-0.01 and PT-0.001 SNPs, TCF7L2/H3K27Ac (-DHT)/H3K27Ac (+DHT) in LNCaP cells [10]

Table S1B: Disease GWAS with independent validation samples for evaluating prediction performance.
Each cell lists the values without winner’s curse correction (NO), with LASSO-type correction and with MLE correction, in that order.

Disease | PRS | Prediction R² (NO / LASSO / MLE) | Nagelkerke R² (NO / LASSO / MLE) | AUC (NO / LASSO / MLE)
Bipolar disorder | 1D | 5.59% / 5.64% / 5.62% | 7.59% / 7.65% / 7.62% | 0.635 / 0.636 / 0.635
Bipolar disorder | 2D, blood eSNPs | 5.75% / 5.74% / 5.72% | 7.80% / 7.79% / 7.76% | 0.637 / 0.636 / 0.636
Bipolar disorder | 2D, CR-SNPs | 5.72% / 5.75% / 5.76% | 7.76% / 7.80% / 7.81% | 0.637 / 0.637 / 0.637
Coronary artery disease | 1D | 1.63% / 1.58% / 1.61% | 2.22% / 2.14% / 2.18% | 0.572 / 0.571 / 0.572
Coronary artery disease | 2D, blood eSNPs | 1.73% / 1.67% / 1.72% | 2.34% / 2.26% / 2.34% | 0.575 / 0.574 / 0.575
Coronary artery disease | 2D, CR-SNPs | 1.79% / 1.72% / 1.70% | 2.43% / 2.33% / 2.31% | 0.578 / 0.574 / 0.572
Crohn’s disease | 1D | 6.65% / 8.22% / 7.60% | 9.25% / 11.32% / 10.43% | 0.646 / 0.660 / 0.656
Crohn’s disease | 2D, blood eSNPs | 7.71% / 8.75% / 8.40% | 10.59% / 12.10% / 11.55% | 0.658 / 0.667 / 0.663
Crohn’s disease | 2D, CR-SNPs | 6.85% / 8.25% / 7.75% | 9.34% / 11.40% / 10.61% | 0.651 / 0.660 / 0.656
Hypertension | 1D | 3.04% / 3.02% / 3.07% | 4.12% / 4.09% / 4.16% | 0.597 / 0.597 / 0.598
Hypertension | 2D, blood eSNPs | 3.33% / 3.28% / 3.27% | 4.52% / 4.45% / 4.44% | 0.601 / 0.600 / 0.600
Hypertension | 2D, CR-SNPs | 3.23% / 3.15% / 3.2 | 4.38% / 4.27% / 4.34% | 0.600 / 0.600 / 0.600
Rheumatoid arthritis | 1D | 7.24% / 8.60% / 7.60% | 9.77% / 11.59% / 10.30% | 0.653 / 0.669 / 0.659
Rheumatoid arthritis | 2D, blood eSNPs | 7.12% / 8.69% / 7.74% | 9.63% / 11.71% / 10.44% | 0.650 / 0.671 / 0.658
Rheumatoid arthritis | 2D, CR-SNPs | 7.50% / 8.68% / 7.84% | 10.11% / 11.69% / 10.56% | 0.657 / 0.670 / 0.661
Type 1 diabetes | 1D | 18.20% / 18.50% / 18.20% | 26.09% / 26.99% / 26.03% | 0.754 / 0.758 / 0.754
Type 1 diabetes | 2D, blood eSNPs | 18.30% / 18.70% / 18.40% | 26.31% / 26.87% / 26.54% | 0.755 / 0.758 / 0.756
Type 1 diabetes | 2D, CR-SNPs | 18.50% / 18.70% / 18.50% | 26.70% / 27.20% / 26.64% | 0.757 / 0.758 / 0.756

Table S2: Prediction R² (= cor(y, PRS)²), Nagelkerke R² and AUC in the WTCCC data, based on five-fold cross-validation.
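The three performance measures named in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, NumPy is assumed, and the logistic model needed for Nagelkerke R² is fitted with a plain Newton-Raphson loop chosen here for self-containment.

```python
import numpy as np

def prediction_r2(y, prs):
    """Prediction R2 as defined in the caption: squared correlation of y and PRS."""
    return float(np.corrcoef(y, prs)[0, 1] ** 2)

def auc(y, prs):
    """Probability that a random case outscores a random control (ties count 1/2)."""
    cases, controls = prs[y == 1], prs[y == 0]
    diff = cases[:, None] - controls[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

def fit_logistic(y, x, n_iter=25):
    """Newton-Raphson fit of logit P(y=1 | x) = a + b*x; returns array [a, b]."""
    X = np.column_stack([np.ones(len(x)), x])
    coef = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ coef))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        coef = coef + np.linalg.solve(hess, grad)
    return coef

def nagelkerke_r2(y, prs):
    """Cox-Snell R2 rescaled by its maximum attainable value."""
    n = len(y)
    a, b = fit_logistic(y, prs)
    p1 = 1.0 / (1.0 + np.exp(-(a + b * prs)))
    ll1 = np.sum(y * np.log(p1) + (1 - y) * np.log(1 - p1))  # fitted model
    p0 = y.mean()
    ll0 = n * (p0 * np.log(p0) + (1 - p0) * np.log(1 - p0))  # intercept only
    cox_snell = 1.0 - np.exp(2.0 * (ll0 - ll1) / n)
    return float(cox_snell / (1.0 - np.exp(2.0 * ll0 / n)))
```

Here `y` is a 0/1 NumPy array and `prs` the matching array of scores from the held-out fold; the Newton loop assumes cases and controls are not perfectly separated by the score.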
Each cell lists the thresholds without winner’s curse correction (NO), with LASSO-type correction and with MLE correction, in that order.

Disease | PRS | High-priority SNPs for 2D PRS | NO | LASSO | MLE
Bipolar disorder | 1D | | 0.4 | 0.5 | 0.3
Bipolar disorder | 2D | Blood eSNPs | (0.3,0.4) | (0.6,0.5) | (0.3,0.4)
Bipolar disorder | 2D | CR SNPs | (0.3,0.1) | (0.4,0.2) | (0.9,0.2)
Coronary artery disease | 1D | | 0.6 | 0.7 | 0.6
Coronary artery disease | 2D | Blood eSNPs | (0.5,0.4) | (0.8,0.4) | (0.8,0.5)
Coronary artery disease | 2D | CR SNPs | (0.5, 0.01) | (0.8,0.2) | (0.7,0.3)
Crohn’s disease | 1D | | 0.0001 | 0.005 | 0.005
Crohn’s disease | 2D | Blood eSNPs | (0.001,0.0001) | (0.005,0.005) | (0.005, 0.0005)
Crohn’s disease | 2D | CR SNPs | (0.0001,0.00005) | (0.005,0.005) | (0.001, 0.0005)
Hypertension | 1D | | 0.3 | 0.4 | 0.4
Hypertension | 2D | Blood eSNPs | (0.4,28) | (0.9,0.3) | (0.6,0.3)
Hypertension | 2D | CR SNPs | (0.2,0.3) | (0.4,0.5) | (0.4,0.3)
Rheumatoid arthritis | 1D | | 0.000001 | 0.001 | 0.00005
Rheumatoid arthritis | 2D | Blood eSNPs | (0.00005,0.00005) | (0.005,0.001) | (0.0001,0.00005)
Rheumatoid arthritis | 2D | CR SNPs | (0.00001, 0.00001) | (0.001,0.005) | (0.000000005,0.00005)
Type 1 diabetes | 1D | | 0.00001 | 0.005 | 0.0001
Type 1 diabetes | 2D | Blood eSNPs | (0.00001,0.000001) | (0.01,0.005) | (0.0005,0.0001)
Type 1 diabetes | 2D | CR SNPs | (0.000001,0.00005) | (0.001,0.005) | (0.00001,0.00005)
Type 2 diabetes | 1D | | 0.1 | 0.2 | 0.2
Type 2 diabetes | 2D | Blood eSNPs | (0.05,0.1) | (0.3,0.2) | (0.4,0.02)
Type 2 diabetes | 2D | CR SNPs | (0.7,0.1) | (0.8,0.2) | (0.6,0.1)

Table S3: Optimal P-value thresholds for including SNPs for 1D and 2D PRS for WTCCC data.
This table corresponds to the results reported in Figure 3 and Supplemental Table S2. For each disease, we performed five-fold cross-validation. In each cross-validation fold, we determined the optimal threshold for the 1D PRS and a pair of thresholds for the 2D PRS. The reported values are the medians of the five cross-validation results.
Prediction R2
Nagelkerke R2
AUC
Winner’s curse correction
Winner’s curse correction
Winner’s curse correction
NO
LASSO
MLE
NO
MLE
NO
1D
2.20%
2.54%
2.41%
3.26%
3.68%
3.47%
0.587
0.594
0.590
2D, CR-SNPs
2.56%
2.88%
2.74%
3.81%
4.24%
4.07%
0.596
0.601
0.598
2D, histone SNPs, pancreatic islet
2.57%
2.86%
2.72%
3.79%
4.18%
3.99%
0.597
0.601
0.598
2D, histone SNPs, pancreatic
2.44%
2.86%
2.66%
3.57%
4.18%
3.88%
0.592
0.600
0.597
2D, PT-0.001 SNPs
2.50%
2.64%
2.60%
3.63%
3.83%
3.75%
0.594
0.597
0.595
2D, PT-0.01 SNPs
2.48%
2.68%
2.58%
3.61%
3.89%
3.77%
0.593
0.596
0.595
2D, eSNPs/meSNPs in adipose
2.59%
2.81%
2.73%
PRS and high-priority
SNPs for 2D PRS
Disease
Pancreatic
cancer
Asian lung
LASSO
MLE
3.75%
4.06%
3.95%
0.593
0.599
0.598
1D
2.35%
2.51%
2.50%
3.15%
3.36%
3.35%
0.586
0.591
0.590
2D, blood SNPs
2.58%
2.63%
2.62%
3.46%
3.51%
3.50%
0.592
0.593
0.592
2D, CR-SNPs
2.42%
2.58%
2.57%
3.24%
3.46%
3.43%
0.588
0.591
0.591
2D, PT-0.01
2.70%
2.72%
2.82%
3.61%
3.64%
3.77%
0.593
0.594
0.595
2D, PT-0.001
2.70%
2.69%
2.76%
3.61%
3.60%
3.69%
0.594
0.594
0.595
2D, H3K4me3, HAEC
2.76%
2.74%
2.84%
4.09%
4.07%
4.20%
0.595
0.595
0.596
2D, H3K9-14Ac, HAEC
2.65%
2.69%
2.76%
4.02%
4.07%
4.15%
0.593
0.594
0.596
2.63%
1.29%
2.62%
1.22%
4.01%
3.98%
4.08%
0.591
0.592
0.592
1D
2.55%
1.12%
1.53%
1.78%
1.68%
0.561
0.565
0.563
2D, CR-SNPs
1.34%
1.33%
1.34%
1.84%
1.83%
1.84%
0.568
0.566
0.566
2D, blood eSNPs
1.30%
1.46%
1.36%
1.79%
2.00%
1.87%
0.566
0.569
0.568
2D, H3K4me3, HAEC
1.47%
1.61%
1.57%
2.03%
2.21%
2.17%
0.570
0.574
0.573
2D, H3K9-14Ac, HAEC
1.45%
1.55%
1.47%
1.99%
2.13%
2.02%
0.570
0.573
0.571
2D, histone SNPs, ROADMAP bladder
1.46%
1.57%
1.53%
2D, functional SNPs in lung tissues
1.54%
1.64%
1.62%
2.01%
2.13%
2.15%
2.25%
2.11%
2.23%
0.571
0.572
0.572
0.575
0.571
0.573
2D, eSNPs and meSNPs in lung
Bladder
LASSO
Table S4: Prediction R² (= cor(y, PRS)²), Nagelkerke R² and AUC in the three cancer GWAS data sets, based on 10-fold cross-validation.
PRS and high-priority
SNPs for 2D PRS
Disease
Prediction R2
Winner’s curse correction
NO
1D
Pancreatic cancer
Bladder cancer
MLE
0.176
0.104
0.176
0.107
0.145
0.109
0.059
0.071
2D, histone SNPs, pancreatic
0.149
0.063
0.055
2D, PT-0.001 SNPs
0.076
0.125
0.023
2D, PT-0.01 SNPs
0.044
0.131
0.023
2D, eSNPs/meSNPs in adipose
0.132
0.144
0.135
0.220
0.124
2D, CR-SNPs
2D, histone SNPs, pancreatic islet
1D
Asian lung
LASSO
2D, blood SNPs
0.113
0.111
0.058
2D, CR-SNPs
0.250
0.140
0.052
2D, PT-0.01
0.112
0.092
0.028
2D, PT-0.001
0.078
0.082
0.025
2D, H3K4me3, HAEC
0.155
0.114
0.097
2D, H3K9-14Ac, HAEC
0.200
0.221
0.137
2D, eSNPs and meSNPs in lung
0.220
0.121
0.072
1D
0.50
0.15
0.13
2D, CR-SNPs
0.06
0.06
0.02
2D, blood eSNPs
0.15
0.12
0.13
2D, H3K4me3, HAEC
0.08
0.02
0.05
2D, H3K9-14Ac, HAEC
0.09
0.07
0.10
2D, histone SNPs, ROADMAP bladder
0.15
0.12
0.15
0.13
0.07
0.11
2D, functional SNPs in lung tissues
Table S5: P-values for testing whether a PRS statistically significantly improved the risk prediction for three
cancer GWAS. P-values were calculated based on a t-statistic with standard deviation estimated by 10-fold
cross-validation.
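The caption does not spell out the test statistic. One plausible reading, sketched below under that assumption, is a one-sided paired t-test on the per-fold difference in prediction R² between a candidate PRS and the reference PRS. The function names are hypothetical, and the t-distribution tail is integrated numerically so that only the standard library is needed.

```python
import math

def t_density(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T >= t), using Simpson's rule on [0, |t|] and the symmetry of the density."""
    a = abs(t)
    h = a / steps
    total = t_density(0.0, df) + t_density(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    upper = max(0.5 - total * h / 3, 0.0)  # tail mass beyond |t|
    return upper if t >= 0 else 1.0 - upper

def cv_improvement_pvalue(r2_new, r2_ref):
    """One-sided p-value that the mean per-fold R2 gain is positive.

    Assumes the per-fold differences are not all identical (nonzero SD).
    """
    diffs = [a - b for a, b in zip(r2_new, r2_ref)]
    k = len(diffs)
    mean = sum(diffs) / k
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (k - 1))
    return t_upper_tail(mean / (sd / math.sqrt(k)), k - 1)
```

With 10 folds the statistic has 9 degrees of freedom. Folds share training data, so the fold-wise differences are not strictly independent and the resulting p-value should be read as approximate.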
PRS and high-priority
SNPs for 2D PRS
Disease
NO
5×10-6
LASSO
10-3
MLE
10-4
2D, CR-SNPs
(10-6,5×10-6)
(0.001, 5×10-4)
(5×10-4, 10-5)
2D, histone SNPs, pancreatic islet
(10-5,5×10-6)
(0.002, 10-4)
(5×10-4,5×10-6)
2D, histone SNPs, pancreatic
(5×10-5, 10-5)
(0.005, 5×10-5)
(10-3, 5×10-5)
2D, PT-0.001 SNPs
(10-5, 10-6)
(10-3, 10-4)
(5×10-4, 10-5)
2D, PT-0.01 SNPs
(5×10-6,5×10-6)
(10-3, 5×10-4)
(5×10-5, 10-5)
(10-5, 5×10-7)
(0.002, 10-4)
(10-4,5×10-6)
5×10-6
10-4
10-5
(5×10-6, 5×10-7)
(5×10-4, 10-4)
(10-5, 10-5)
2D, CR-SNPs
(7,5×10-6)
(5×10-5, 10-4)
(10-5, 10-5)
2D, PT-0.01
(10-5, 5×10-7)
(5×10-4, 5×10-5)
(10-4, 10-5)
1D
Pancreatic
cancer
2D, eSNPs/meSNPs in adipose
1D
2D, blood SNPs
Asian lung
2D, PT-0.001
(10 , 10 )
(10 , 5×10 )
(10-4, 10-5)
2D, H3K4me3, HAEC
(10-6, 10-6)
(5×10-4, 5×10-5)
(5×10-6, 10-5)
2D, H3K9-14Ac, HAEC
(1,5×10-6)
(10-5, 10-4)
(5×10-5, 5×10-5)
1D
(10-6, 5×10-7)
5×10-6
(5×10-4, 10-4)
10-4
(5×10-6, 10-5)
5×10-5
2D, CR-SNPs
(5×10-6, 10-7)
(5×10-4, 10-4)
(5×10-5, 10-5)
(5×10-6,5×10-6)
(10-3, 10-4)
(10-4, 0.00005)
2D, H3K4me3, HAEC
(10-4,5×10-6)
(0.005, 10-4)
(0.002, 10-5)
2D, H3K9-14Ac, HAEC
(10-5, 5×10-7)
(0.005, 10-4)
(10-3,5×10-6)
(10-3, 10-5)
(0.005, 5×10-4)
(5×10-4, 10-4)
2D, eSNPs and meSNPs in lung
2D, blood eSNPs
Bladder
Winner’s curse correction
2D, histone SNPs, ROADMAP bladder
-5
-7
-3
-5
(10-4, 10-6)
(0.002, 10-4)
(5×10-4, 10-5)
2D, functional SNPs in lung tissues
Table S6: Optimal P-value thresholds for including SNPs for 1D and 2D PRS for three cancer GWAS.
This table corresponds to the results reported in Figure 3 and Supplemental Table S4. For each disease, we have
performed 10-fold cross-validation. For each cross-validation, we determined the optimal threshold for 1D PRS and
a pair of thresholds for 2D PRS. The reported data were the median of the ten cross-validation results.
Each cell lists the values without winner’s curse correction (NO), with LASSO-type correction and with MLE correction, in that order.

Disease | PRS and high-priority SNPs for 2D PRS | Prediction R² (NO / LASSO / MLE) | Nagelkerke R² (NO / LASSO / MLE) | AUC (NO / LASSO / MLE)
T2D | 1D | 2.29% / 3.10% / 2.67% | 3.05% / 4.13% / 3.56% | 0.582 / 0.597 / 0.590
T2D | 2D, CR-SNPs | 2.73% / 3.32% / 3.11% | 3.64% / 4.43% / 4.15% | 0.594 / 0.600 / 0.600
T2D | 2D, histone SNPs, pancreatic islet | 2.58% / 3.23% / 2.81% | 3.44% / 4.32% / 3.75% | 0.590 / 0.600 / 0.594
T2D | 2D, eSNPs/meSNPs | 2.58% / 3.28% / 2.83% | 3.44% / 4.38% / 3.78% | 0.587 / 0.600 / 0.593
T2D | 2D, eSNPs/meSNPs and H3K4me3 in islet | 2.90% / 3.53% / 3.13% | 3.87% / 4.71% / 4.17% | 0.598 / 0.605 / 0.598
T2D | 2D, eSNPs/meSNPs, CR-SNPs | 2.92% / 3.48% / 3.30% | 3.89% / 4.65% / 4.41% | 0.594 / 0.602 / 0.601
EUR lung | 1D | 1.13% / 1.12% / 1.12% | 1.52% / 1.48% / 1.50% | 0.564 / 0.563 / 0.563
EUR lung | 2D, CR-SNPs | 1.17% / 1.23% / 1.16% | 1.55% / 1.64% / 1.55% | 0.564 / 0.564 / 0.564
EUR lung | 2D, eSNPs and meSNPs in lung | 1.14% / 1.22% / 1.13% | 1.52% / 1.63% / 1.51% | 0.563 / 0.566 / 0.563
EUR lung | 2D, eSNPs and meSNPs | 1.14% / 1.31% / 1.12% | 1.52% / 1.75% / 1.49% | 0.564 / 0.571 / 0.563
EUR lung | 2D, PT-0.01 SNPs | 1.14% / 1.12% / 1.14% | 1.52% / 1.49% / 1.52% | 0.564 / 0.563 / 0.563
EUR lung | 2D, PT-0.001 SNPs | 1.15% / 1.21% / 1.14% | 1.54% / 1.61% / 1.52% | 0.567 / 0.567 / 0.563
EUR lung | 2D, H3K4me3, SAEC | 1.13% / 1.35% / 1.21% | 1.51% / 1.80% / 1.61% | 0.560 / 0.569 / 0.565
EUR lung | 2D, eSNPs, meSNPs and H3K4me3 in SAEC | 1.14% / 1.65% / 1.25% | 1.52% / 1.98% / 1.67% | 0.566 / 0.574 / 0.567
Prostate | 1D | 6.94% / 6.87% / 6.98% | 9.43% / 9.35% / 9.48% | 0.654 / 0.652 / 0.654
Prostate | 2D, blood eSNPs | 6.95% / 6.93% / 7.15% | 9.44% / 9.43% / 9.72% | 0.654 / 0.653 / 0.656
Prostate | 2D, CR-SNPs | 6.95% / 7.05% / 6.98% | 9.44% / 9.58% / 9.49% | 0.654 / 0.653 / 0.654
Prostate | 2D, PT-0.001 | 6.94% / 7.10% / 6.98% | 9.43% / 9.67% / 9.49% | 0.654 / 0.655 / 0.654
Prostate | 2D, PT-0.01 | 6.94% / 7.04% / 7.02% | 9.43% / 9.58% / 9.55% | 0.654 / 0.654 / 0.654
Prostate | 2D, H3K27Ac, -DHT | 7.02% / 7.10% / 7.10% | 9.54% / 9.65% / 9.65% | 0.655 / 0.655 / 0.655
Prostate | 2D, H3K27Ac, +DHT | 6.95% / 7.06% / 6.98% | 9.45% / 9.60% / 9.48% | 0.654 / 0.654 / 0.653
Prostate | 2D, TCF7L2 | 6.96% / 6.90% / 7.00% | 9.45% / 9.38% / 9.51% | 0.654 / 0.652 / 0.654
CRC | 1D | 1.37% / 1.33% / 1.26% | 1.93% / 1.87% / 1.78% | 0.571 / 0.570 / 0.568
CRC | 2D, blood eSNPs | 1.40% / 1.40% / 1.41% | 1.97% / 1.96% / 1.98% | 0.570 / 0.571 / 0.572
CRC | 2D, CR-SNPs | 1.34% / 1.33% / 1.28% | 1.92% / 1.86% / 1.78% | 0.570 / 0.570 / 0.568
CRC | 2D, PT-0.001 | 1.41% / 1.39% / 1.32% | 1.93% / 1.92% / 1.81% | 0.570 / 0.571 / 0.569
CRC | 2D, PT-0.01 | 1.38% / 1.35% / 1.28% | 1.97% / 1.93% / 1.84% | 0.571 / 0.571 / 0.570
CRC | 2D, H3K27ac | 1.44% / 1.47% / 1.51% | 2.04% / 2.07% / 2.11% | 0.571 / 0.570 / 0.571
CRC | 2D, H3K36me3 | 1.36% / 1.32% / 1.31% | 1.93% / 1.86% / 1.84% | 0.571 / 0.570 / 0.569
CRC | 2D, H3K4me1 | 1.40% / 1.38% / 1.42% | 1.98% / 1.95% / 2.00% | 0.571 / 0.571 / 0.570
CRC | 2D, H3K4me3 | 1.39% / 1.33% / 1.27% | 1.96% / 1.88% / 1.80% | 0.572 / 0.570 / 0.569
CRC | 2D, H3K9ac | 1.38% / 1.37% / 1.29% | 1.96% / 1.92% / 1.82% | 0.571 / 0.571 / 0.569
SCZ | 1D | 14.01% / 14.94% / 14.89% | 18.75% / 19.99% / 19.91% | 0.717 / 0.724 / 0.724
SCZ | 2D, blood eSNPs | 14.10% / 14.94% / 14.91% | 18.88% / 19.99% / 19.94% | 0.718 / 0.724 / 0.723
SCZ | 2D, CR-SNPs | 14.25% / 15.37% / 15.15% | 19.03% / 20.56% / 20.24% | 0.718 / 0.727 / 0.725
SCZ | 2D, PT-0.001 SNPs | 14.09% / 15.00% / 14.95% | 18.83% / 20.02% / 20.00% | 0.717 / 0.724 / 0.724
SCZ | 2D, PT-0.01 SNPs | 14.07% / 14.97% / 14.95% | 18.85% / 19.99% / 19.95% | 0.718 / 0.724 / 0.724

Table S7: Prediction R² (= cor(y, PRS)²), Nagelkerke R² and AUC for five large-scale GWAS summary statistics with independent validation data.
P-values are listed by winner’s curse correction (NO, LASSO, MLE). No value is reported for the 1D PRS without correction.

Disease | PRS and high-priority SNPs for 2D PRS | NO | LASSO | MLE
CRC | 1D | - | 0.5879 | 0.7412
CRC | 2D, blood eSNPs | 0.4152 | 0.4404 | 0.4245
CRC | 2D, CR-SNPs | 0.6179 | 0.5755 | 0.6736
CRC | 2D, PT-0.001 | 0.3040 | 0.4621 | 0.5941
CRC | 2D, PT-0.01 | 0.4638 | 0.5442 | 0.6736
CRC | 2D, H3K27ac | 0.3632 | 0.3556 | 0.2951
CRC | 2D, H3K36me3 | 0.5362 | 0.6038 | 0.6239
CRC | 2D, H3K4me1 | 0.4207 | 0.4790 | 0.3962
CRC | 2D, H3K4me3 | 0.4207 | 0.5793 | 0.7007
CRC | 2D, H3K9ac | 0.4715 | 0.5000 | 0.6631
SCZ | 1D | - | 1.5E-11 | 2.2E-09
SCZ | 2D, blood eSNPs | 1.7E-01 | 1.5E-11 | 6.0E-08
SCZ | 2D, CR-SNPs | 2.0E-01 | 3.2E-10 | 5.8E-06
SCZ | 2D, PT-0.001 SNPs | 3.5E-02 | 3.1E-10 | 1.6E-08
SCZ | 2D, PT-0.01 SNPs | 2.7E-01 | 9.9E-10 | 8.8E-08
T2D | 1D | - | 0.00173 | 0.02748
T2D | 2D, CR-SNPs | 0.04529 | 0.00030 | 0.00313
T2D | 2D, histone SNPs, pancreatic islet | 0.07353 | 0.00059 | 0.01513
T2D | 2D, eSNPs/meSNPs | 0.13234 | 0.00048 | 0.01890
T2D | 2D, eSNPs/meSNPs and H3K4me3 in islet | 0.01468 | 0.00002 | 0.00256
T2D | 2D, eSNPs/meSNPs, CR-SNPs | 0.01222 | 0.00004 | 0.00038
EUR lung | 1D | - | 0.5285 | 0.5662
EUR lung | 2D, CR-SNPs | 0.4166 | 0.3446 | 0.4300
EUR lung | 2D, eSNPs and meSNPs in lung | 0.4778 | 0.3478 | 0.5000
EUR lung | 2D, eSNPs and meSNPs | 0.4778 | 0.2169 | 0.5199
EUR lung | 2D, PT-0.01 SNPs | 0.4693 | 0.5222 | 0.4668
EUR lung | 2D, PT-0.001 SNPs | 0.4532 | 0.3085 | 0.4715
EUR lung | 2D, H3K4me3, SAEC | 0.5000 | 0.1694 | 0.3581
EUR lung | 2D, eSNPs, meSNPs and H3K4me3 in SAEC | 0.4878 | 0.1399 | 0.3413
Prostate | 1D | - | 0.6306 | 0.2866
Prostate | 2D, blood eSNPs | 0.4602 | 0.5173 | 0.0401
Prostate | 2D, CR-SNPs | 0.4327 | 0.3162 | 0.3581
Prostate | 2D, PT-0.001 | 0.5000 | 0.2767 | 0.3792
Prostate | 2D, PT-0.01 | 0.5000 | 0.3446 | 0.2692
Prostate | 2D, H3K27Ac, -DHT | 0.2119 | 0.2611 | 0.1587
Prostate | 2D, H3K27Ac, +DHT | 0.4594 | 0.3222 | 0.4070
Prostate | 2D, TCF7L2 | 0.3170 | 0.5721 | 0.2209

Table S8: P-values for testing whether a PRS statistically significantly improved the risk prediction for five large-scale GWAS summary statistics, based on bootstrap.
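The caption states only that these p-values come from a bootstrap. A minimal sketch of one way such a test could be set up is given below, assuming that validation individuals are resampled with replacement and that the improvement is measured as the difference in prediction R². The function name, the resampling scheme and the one-sided definition of the p-value are assumptions, not the authors' documented procedure.

```python
import numpy as np

def bootstrap_improvement_pvalue(y, prs_ref, prs_new, n_boot=1000, seed=0):
    """Fraction of bootstrap replicates in which prs_new does NOT beat prs_ref.

    y, prs_ref and prs_new are equal-length NumPy arrays for the validation
    sample; the statistic is the gain in squared correlation with y.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    not_better = 0
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample individuals with replacement
        r2_new = np.corrcoef(y[idx], prs_new[idx])[0, 1] ** 2
        r2_ref = np.corrcoef(y[idx], prs_ref[idx])[0, 1] ** 2
        if r2_new - r2_ref <= 0:
            not_better += 1
    return not_better / n_boot
```

The smallest nonzero p-value this scheme can return is 1/n_boot, so very small values such as those in the SCZ rows would need either a much larger number of replicates or a normal approximation based on the bootstrap standard error.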
PRS and high-priority
SNPs for 2D PRS
Disease
MLE
0.008
0.01
(0.002, 5×10 )
(0.02,0.005)
(0.01, 0.00005)
2D, histone SNPs, pancreatic islet
(0.1,0.002)
(0.03,0.008)
(0.02,0.005)
2D, eSNPs/meSNPs
(0.02,0.002)
(0.03,0.008)
(0.02,0.002)
-5
2D, CR-SNPs
EUR lung
-8
(0.03,0.005)
(0.02,0.005)
10-4
(0.02, 0.002)
(0.01,0.0001)
10-7
2D, CR-SNPs
(10-10, 5×10-9)
(0.02, 5×10-4)
(10-10, 10-6)
2D, eSNPs and meSNPs in lung
(5×10-9, 10-10)
(10-4, 5×10-10)
(10-7, 10-10)
2D, eSNPs and meSNPs
(5×10-9, 10-10)
(0.01, 5×10-6)
(10-7, 10-10)
2D, PT-0.01 SNPs
(10-10, 5×10-9)
(5×10-5,5×10-6)
(5×10-8, 10-6)
2D, PT-0.001 SNPs
(0.001, 5×10-9)
(0.002,5×10-6)
(5×10-8, 10-6)
1D
2D, blood eSNPs
-9
(0.002, 5×10 )
(0.008, 10 )
(0.005, 10-7)
(0.001, 10-6)
(0.008,5×10-6)
(0.005,5×10-6)
5×10-6
0.002
5×10-5
(5×10-6,5×10-6)
(0.03,0.005)
(5×10-5,0.001)
-6
(5×10 , 10 )
(0.02,0.001)
(5×10-5, 10-5)
2D, PT-0.001
(5×10-6,5×10-6)
(0.06,0.002)
(5×10-5, 10-5)
2D, PT-0.01
(5×10-6,5×10-6)
(0.02,0.002)
(5×10-5, 10-5)
2D, H3K27Ac, -DHT
(5×10-4,5×10-6)
(0.07,0.002)
(0.04, 5×10-5)
(10 ,5×10 )
(0.08,0.005)
(0.04, 5×10-5)
(0.005,5×10-6)
(0.002,0.02)
(0.005, 5×10-5)
2D, TCF7L2
1D
-6
-5
-5
2D, CR-SNPs
2D, H3K27Ac, +DHT
-6
0.005
0.008
0.008
2D, blood eSNPs
(0.008,0.005)
(0.008,0.02)
(0.008,0.03)
2D, CR-SNPs
(0.008,0.005)
(0.01,0.008)
(0.008,0.005)
2D, PT-0.001
(0.008,0.005)
(0.02,0.008)
(0.03,0.008)
2D, PT-0.01
(0.005,0.005)
(0.01,0.008)
(0.03,0.008)
2D, H3K27ac
(0.03,0.005)
(0.04,0.008)
(0.03,0.005)
2D, H3K36me3
(0.008,0.005)
(0.01,0.008)
(0.03,0.008)
2D, H3K4me1
(0.005,0.005)
(0.01,0.008)
(0.03,0.008)
2D, H3K4me3
(0.002,0.005)
(0.008,0.008)
(0.008,0.005)
2D, H3K9ac
(0.005,0.005)
(0.01,0.008)
(0.01,0.008)
0.2
0.3
0.2
2D, blood eSNPs
(0.04,0.2)
(0.3,0.3)
(0.09,0.2)
2D, CR-SNPs
(0.5,0.05)
(0.8,0.3)
(0.5,0.1)
2D, PT-0.001 SNPs
(0.4,0.2)
(0.3,0.7)
(0.9,0.3)
1D
SCZ
0.002
(0.01, 5×10 )
(0.01, 5×10-5)
5×10-9
2D, eSNPs, meSNPs and H3K4me3 in SAEC
CRC
NO
2D, eSNPs/meSNPs and H3K4me3 in islet
2D, eSNPs/meSNPs, CR-SNPs
1D
2D, H3K4me3, SAEC
Prostate
Winner’s curse correction
LASSO
1D
T2D
P-value thresholds
2D, PT-0.01 SNPs
(0.01,0.2)
(0.3,0.3)
(0.07,0.2)
Supplemental Table S9: Optimal P-value thresholds for including SNPs for 1D and 2D PRS for five diseases with
large-scale discovery data and independent validation samples. This table corresponds to the results reported in
Figure 3 and Table S7.
PRS | Δ=2 | Δ=3 | Δ=4
1D | 5 × 10^-5 | 5 × 10^-5 | 5 × 10^-5
1D-LASSO | 5 × 10^-4 | 5 × 10^-4 | 5 × 10^-4
1D-MLE | 5 × 10^-4 | 5 × 10^-4 | 5 × 10^-4
2D-random | (0.01, 10^-5) | (0.04, 5 × 10^-6) | (0.08, 5 × 10^-6)
2D-random-LASSO | (0.03, 1 × 10^-4) | (0.08, 5 × 10^-5) | (0.2, 10^-4)
2D-random-MLE | (0.04, 5 × 10^-5) | (0.08, 10^-5) | (0.2, 5 × 10^-5)
2D-CR | (5 × 10^-5, 5 × 10^-5) | (5 × 10^-4, 5 × 10^-5) | (10^-4, 10^-5)
2D-CR-LASSO | (5 × 10^-3, 10^-4) | (0.002, 10^-4) | (0.001, 10^-4)
2D-CR-MLE | (5 × 10^-3, 10^-4) | (0.002, 10^-4) | (0.001, 5 × 10^-5)

Table S10: Optimal P-value thresholds for including SNPs for 1D and 2D PRS in simulation studies. For each parameter setting, 50 simulations were performed, and the P-value thresholds reported in the table are the medians of the 50 simulations. This table corresponds to the results reported in Figure 4. For 2D PRS, the two P-value thresholds correspond to the high-priority SNP set and the low-priority SNP set.
“1D” denotes 1D PRS without winner’s curse correction; “1D-LASSO(MLE)” denotes 1D PRS with LASSO-type (MLE) correction; “2D-random” indicates 2D PRS with functional SNP sets randomly selected from the LD-pruned SNPs in the genome; “2D-CR” indicates 2D PRS using SNPs in conserved regions as functional SNPs. Δ is the enrichment fold change for the high-priority SNPs.
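The two-threshold rule described in this note can be sketched as follows. The function and argument names are hypothetical, NumPy is assumed, and the inclusion rule "P-value at or below the threshold" is an assumption, since the table reports only the thresholds themselves.

```python
import numpy as np

def prs_2d(genotypes, betas, pvalues, high_priority, t_high, t_low):
    """Polygenic risk score with separate P-value thresholds per SNP set.

    genotypes:     (n_samples, n_snps) array of risk-allele counts
    betas:         per-SNP effect estimates (log odds ratios) from discovery data
    pvalues:       per-SNP discovery P-values
    high_priority: boolean mask marking the high-priority (functional) SNP set
    t_high, t_low: thresholds for high-priority and low-priority SNPs
    """
    threshold = np.where(high_priority, t_high, t_low)
    keep = pvalues <= threshold  # assumed inclusion rule
    return genotypes[:, keep] @ betas[keep]
```

Setting `t_high` equal to `t_low` recovers a 1D PRS with a single threshold. In the 2D-CR rows above, the first threshold is typically more liberal than the second, which is what lets more high-priority SNPs enter the score.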
A: WTCCC data. Reported values are based on the average of five-fold cross validation.
Winner’s curse
correction
BD
CAD
CD
HT
RA
T1D
NO
0.043
0.021
0.364
0.036
0.710
0.649
LASSO
0.071
0.026
1.111
0.052
1.204
0.867
MLE
0.051
0.024
0.495
0.043
0.905
T2D
0.303
1.255
0.637
0.355
B: For pancreatic cancer, Asian nonsmoking female lung cancer and bladder cancer, the reported values are based on the average of 10-fold cross-validation. For the other five diseases, the values are based on independent validation samples.

Winner’s curse correction | Pancreatic cancer | Bladder cancer | Lung cancer, Asian | Lung cancer, EUR | T2D | Schizophrenia | Prostate cancer | Colorectal cancer
NO | 0.67 | 0.75 | 0.91 | 0.66 | 0.22 | 0.17 | 0.60 | 0.17
LASSO | 1.83 | 1.82 | 1.88 | 0.96 | 0.74 | 0.31 | 0.86 | 0.86
MLE | 0.69 | 0.66 | 0.99 | 0.67 | 0.26 | 0.23 | 0.61 | 0.28

Table S11: Calibration comparison for 1D PRS modeling with or without winner’s curse correction. Reported values are the coefficient of the PRS in the logistic regression. A value close to one represents a well-calibrated prediction model. The calibration results for 2D PRS are similar to those for 1D PRS and are not reported here. The LASSO-type winner’s curse correction has the smallest bias overall.
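The calibration coefficient reported in this table is the slope from a logistic regression of disease status on the PRS. A minimal sketch of that computation is shown below; the function name is hypothetical, NumPy is assumed, and the Newton-Raphson fit is a stand-in for whatever regression software was actually used.

```python
import numpy as np

def calibration_slope(y, prs, n_iter=25):
    """Slope b from fitting logit P(y=1 | prs) = a + b * prs by Newton-Raphson."""
    X = np.column_stack([np.ones(len(prs)), prs])
    coef = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ coef))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        coef = coef + np.linalg.solve(hess, grad)
    return float(coef[1])
```

If the PRS is on the log-odds scale, a slope of one means the score predicts risk without rescaling. A slope below one indicates that the SNP effects in the score are overstated, which is the expected direction of winner's curse bias; a slope above one indicates over-shrinkage.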
(A) Type 2 diabetes: number of samples (out of 1500 validation samples) with k-fold of population-average risk

k | Standard 1D PRS, theoretical calculation | Standard 1D PRS, empirical calculation | Best PRS, theoretical calculation | Best PRS, empirical calculation | Ratio (best PRS / standard PRS), theoretical calculation
2 | 10.4 | 10 | 30.4 | 31 | 2.93
3 | 0.1 | 0 | 1.3 | 2 | 12.52
4 | 0.0014 | 0 | 0.069 | 0 | 51.26
5 | 0.000026 | 0 | 0.005 | 0 | 190.61

(B) Lung cancer in European population: number of samples (out of 1333 validation samples) with k-fold of population-average risk

k | Standard 1D PRS, theoretical calculation | Standard 1D PRS, empirical calculation | Best PRS, theoretical calculation | Best PRS, empirical calculation | Ratio (best PRS / standard PRS), theoretical calculation
2 | 0.62 | 0 | 2.17 | 4 | 3.29
3 | 1.7E-4 | 0 | 0.0029 | 0 | 17.37
4 | 6.3E-8 | 0 | 0.0000055 | 0 | 88.55
5 | 4.3E-11 | 0 | 0.000000017 | 0 | 405.3

Table S12: Implication of identifying high-risk subjects based on PRS.
Here, we calculate the proportion of samples in the general population that is identified as high-risk based on a given PRS distribution. For a given PRS, we assume that the PRS risk score follows a centered normal distribution, i.e. $s \sim N(0, \sigma^2)$, with parameters estimated from the validation sample. We first perform calibration by fitting a logistic regression $\mathrm{logit}(y \mid s) = \alpha + \beta s$ to derive $\hat{\beta}$. The calibrated risk score is then $\exp(\hat{\beta} s)$. The average risk in the population is then $A = \int \exp(\hat{\beta} s)\, \phi(s; 0, \sigma)\, ds$. To identify samples with projected risk greater than k-fold of the population-average risk, we need to find a cutoff $s_0$ such that $\exp(\hat{\beta} s_0) = kA$, i.e., $s_0 = \log(kA) / \hat{\beta}$. The theoretical proportion of samples is then calculated as $P(s \ge s_0)$, assuming a normal distribution $N(0, \sigma^2)$.
Here, we use two datasets, the T2D and lung cancer GWAS data, to illustrate the calculation. The parameter $\sigma$ was estimated from the control samples in the validation sample. We calculated the number of samples with k-fold greater risk out of the validation samples using the above theoretical calculations. We also calculated this number empirically from the PRS in the validation sample. For each disease, we compared our best PRS with the standard 1D PRS without winner’s curse correction or integration of functional data. For T2D, the best PRS is the 2D PRS with eSNPs/meSNPs and H3K4me3 SNPs in the pancreatic islet cell line. For lung cancer, the best PRS is the 2D PRS with eSNPs, meSNPs and H3K4me3 SNPs in SAEC.
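The theoretical calculation above can be sketched in a few lines. Under the stated normal assumption, the integral for the average risk has the closed form A = exp(β̂²σ²/2) (the moment-generating function of a centered normal), which the sketch uses in place of numerical integration. The function names and the example values of β̂ and σ are hypothetical, and the cutoff formula assumes β̂ > 0.

```python
import math

def normal_upper_tail(z):
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def average_risk(beta, sigma):
    """A = E[exp(beta * s)] for s ~ N(0, sigma^2), in closed form."""
    return math.exp(0.5 * (beta * sigma) ** 2)

def high_risk_cutoff(k, beta, sigma):
    """s0 such that exp(beta * s0) = k * A (requires beta > 0)."""
    return math.log(k * average_risk(beta, sigma)) / beta

def high_risk_proportion(k, beta, sigma):
    """P(s >= s0): share of the population above k-fold of average risk."""
    return normal_upper_tail(high_risk_cutoff(k, beta, sigma) / sigma)

# Hypothetical calibrated score with beta * sigma = 0.3, in a sample of 1500.
expected_count = 1500 * high_risk_proportion(2, beta=1.0, sigma=0.3)
```

The proportion depends on β̂ and σ only through their product, so a PRS with a larger calibrated spread places more individuals above any given k-fold cutoff.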
Figure S1: Randomly selected SNPs and SNPs related to conserved genomic regions (CR-SNPs) have different local linkage disequilibrium (LD) patterns. For each given SNP (either randomly selected from LD-pruned SNPs or CR-SNPs after pruning), we counted the number of SNPs located less than 1 Mb from the SNP with $r^2 \ge 0.8$. Shown are the histograms of the number of LD SNPs for the two SNP sets. Mean=6.4 and median=2 for randomly selected SNPs, while mean=22.4 and median=12 for CR-SNPs. Thus, CR-SNPs have a much stronger local LD pattern than randomly selected SNPs.
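The counting step behind these histograms can be sketched as below. This is an illustration under stated assumptions: the function name and inputs are hypothetical, NumPy is assumed, and r² is approximated by the squared correlation of genotype dosages rather than computed from phased haplotypes or a reference LD panel.

```python
import numpy as np

def count_ld_partners(genotypes, positions, index, max_dist=1_000_000, min_r2=0.8):
    """Count SNPs within max_dist bp of SNP `index` that have r^2 >= min_r2 with it.

    genotypes: (n_samples, n_snps) array of allele counts on one chromosome
    positions: base-pair position of each SNP on that chromosome
    """
    near = np.abs(positions - positions[index]) < max_dist
    near[index] = False  # do not count the focal SNP itself
    focal = genotypes[:, index]
    count = 0
    for j in np.flatnonzero(near):
        r = np.corrcoef(focal, genotypes[:, j])[0, 1]
        if r * r >= min_r2:
            count += 1
    return count
```

Applying the function to every SNP in a set and plotting the resulting counts would give a histogram of the kind the caption describes.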
Figure S2: The prediction R² for four diseases with large-scale discovery samples. For each disease, the left panel reports the 1D PRS R² with varying P-value threshold; the right panel reports the 2D PRS R² for the high-priority (HP) SNP set achieving the highest prediction.
Additional acknowledgements
Funding for GECCO (Genetics and Epidemiology of Colorectal Cancer) Consortium
GECCO: National Cancer Institute, National Institutes of Health, U.S. Department of Health and Human
Services (U01 CA137088; R01 CA059045). ASTERISK: a Hospital Clinical Research Program (PHRC)
and supported by the Regional Council of Pays de la Loire, the Groupement des Entreprises Françaises
dans la Lutte contre le Cancer (GEFLUC), the Association Anne de Bretagne Génétique and the Ligue
Régionale Contre le Cancer (LRCC). COLO2&3: National Institutes of Health (R01 CA60987). DACHS:
German Research Council (Deutsche Forschungsgemeinschaft, BR 1704/6-1, BR 1704/6-3, BR 1704/6-4
and CH 117/1-1), and the German Federal Ministry of Education and Research (01KH0404 and
01ER0814). DALS: National Institutes of Health (R01 CA48998 to M. L. Slattery). HPFS is supported by
the National Institutes of Health (P01 CA 055075, UM1 CA167552, R01 137178, R01 CA151993 and
P50 CA127003), NHS by the National Institutes of Health (UM1 CA186107, R01 CA137178, P01
CA87969, R01 CA151993 and P50 CA127003) and PHS by the National Institutes of Health (R01
CA042182). MEC: National Institutes of Health (R37 CA54281, P01 CA033619, and R01 CA63464).
OFCCR: National Institutes of Health, through funding allocated to the Ontario Registry for Studies of
Familial Colorectal Cancer (U01 CA074783); see CCFR section above. Additional funding toward
genetic analyses of OFCCR includes the Ontario Research Fund, the Canadian Institutes of Health
Research, and the Ontario Institute for Cancer Research, through generous support from the Ontario
Ministry of Research and Innovation. PMH: National Institutes of Health (R01 CA076366 to P.A.
Newcomb). VITAL: National Institutes of Health (K05 CA154337). WHI: The WHI program is funded
by the National Heart, Lung, and Blood Institute, National Institutes of Health, U.S. Department of Health
and Human Services through contracts HHSN268201100046C, HHSN268201100001C,
HHSN268201100002C, HHSN268201100003C, HHSN268201100004C, and HHSN271201100004C.
GECCO: The authors would like to thank all those at the GECCO Coordinating Center for helping bring
together the data and people that made this project possible. The authors acknowledge Dave Duggan and
team members at TGEN (Translational Genomics Research Institute), the Broad Institute, and the
Génome Québec Innovation Center for genotyping DNA samples of cases and controls, and for scientific
input for GECCO. ASTERISK: We are very grateful to Dr. Bruno Buecher without whom this project
would not have existed. We also thank all those who agreed to participate in this study, including the
patients and the healthy control persons, as well as all the physicians, technicians and students. DACHS:
We thank all participants and cooperating clinicians, and Ute Handte-Daub, Utz Benscheid, Muhabbet
Celik and Ursula Eilber for excellent technical assistance.
HPFS, NHS and PHS: We would like to acknowledge Patrice Soule and Hardeep Ranu of the Dana
Farber Harvard Cancer Center High-Throughput Polymorphism Core who assisted in the genotyping for
NHS, HPFS, and PHS under the supervision of Dr. Immaculata Devivo and Dr. David Hunter, Qin
(Carolyn) Guo and Lixue Zhu who assisted in programming for NHS and HPFS, and Haiyan Zhang who
assisted in programming for the PHS. We would like to thank the participants and staff of the Nurses'
Health Study and the Health Professionals Follow-Up Study, for their valuable contributions as well as
the following state cancer registries for their help: AL, AZ, AR, CA, CO, CT, DE, FL, GA, ID, IL, IN,
IA, KY, LA, ME, MD, MA, MI, NE, NH, NJ, NY, NC, ND, OH, OK, OR, PA, RI, SC, TN, TX, VA,
WA, WY. The authors assume full responsibility for analyses and interpretation of these data. PMH: The
authors would like to thank the study participants and staff of the Hormones and Colon Cancer study.
WHI: The authors thank the WHI investigators and staff for their dedication, and the study participants
for making the program possible. A full listing of WHI investigators can be found at:
http://www.whi.org/researchers/Documents%20%20Write%20a%20Paper/WHI%20Investigator%20Short%20List.pdf
PanScan I, II and III authors: Brian M. Wolpin1, 2, Cosmeri Rizzato3, Peter Kraft4, 5, Charles
Kooperberg6, Gloria M. Petersen7, Zhaoming Wang8, 9, Alan A. Arslan10, 11, 12, Laura Beane-Freeman8,
Paige M. Bracci13, Julie Buring14,15, Federico Canzian3, Eric J. Duell16, Steven Gallinger17, Graham G.
Giles18, 19, 20, Gary E. Goodman6, Phyllis J. Goodman21, Eric J. Jacobs22, Aruna Kamineni23, Alison P.
Klein24, 25, Laurence N. Kolonel26, Matthew H. Kulke1, Donghui Li27, Núria Malats28, Sara H. Olson29,
Harvey A. Risch30, Howard D. Sesso4, 14, 15, Kala Visvanathan31, Emily White32, 33, Wei Zheng34, 35,
Christian C. Abnet8, Demetrius Albanes8, Gabriella Andreotti8, Melissa A. Austin33, Richard Barfield5,
Daniela Basso36, Sonja I. Berndt8, Marie-Christine Boutron-Ruault37, 38, 39, Michelle Brotzman40, Markus
W. Büchler41, H. Bas Bueno-de-Mesquita42, 43, 44, Peter Bugert45, Laurie Burdette8, 9, Daniele Campa46,
Neil E. Caporaso8, Gabriele Capurso47, Charles Chung8, 9, Michelle Cotterchio48, 49, Eithne Costello50,
Joanne Elena51, Niccola Funel52, J. Michael Gaziano14, 15, 53, Nathalia A. Giese41, Edward L. Giovannucci4, 54, 55,
Michael Goggins56, 57, 58, Megan J. Gorman1, Myron Gross59, Christopher A. Haiman60, Manal
Hassan27, Kathy J. Helzlsouer61, Brian E. Henderson62, Elizabeth A. Holly13, Nan Hu8, David J. Hunter2, 63, 64,
Federico Innocenti65, Mazda Jenab66, Rudolf Kaaks46, Timothy J. Key67, Kay-Tee Khaw68, Eric A.
Klein69, Manolis Kogevinas70, 71, 72, Vittorio Krogh73, Juozas Kupcinskas74, Robert C. Kurtz75, Andrea
LaCroix6, Maria T. Landi8, Stefano Landi76, Loic Le Marchand77, Andrea Mambrini78, Satu Mannisto79,
Roger L. Milne18, 19, Yusuke Nakamura80, Ann L. Oberg81, Kouros Owzar82, Alpa V. Patel22, Petra H. M.
Peeters83, 84, Ulrike Peters85, Raffaele Pezzilli86, Ada Piepoli87, Miquel Porta71, 88, 89, Francisco X. Real90, 91,
Elio Riboli44, Nathaniel Rothman8, Aldo Scarpa92, Xiao-Ou Shu34, 35, Debra T. Silverman8, Pavel Soucek93,
Malin Sund94, Renata Talar-Wojnarowska95, Philip R. Taylor8, George E. Theodoropoulos96, Mark
Thornquist6, Anne Tjønneland97, Geoffrey S. Tobias8, Dimitrios Trichopoulos4, 98, 99, Pavel Vodicka100,
Jean Wactawski-Wende101, Nicolas Wentzensen8, Chen Wu4, Herbert Yu77, Kai Yu8, Anne Zeleniuch-Jacquotte11, 12, Robert Hoover8, Patricia Hartge8, Charles Fuchs1, 54, Stephen J. Chanock8, 9, Rachael S.
Stolzenberg-Solomon8, Laufey T. Amundadottir8
1
Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
2
Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston,
Massachusetts, USA
3
Genomic Epidemiology Group, German Cancer Research Center (DKFZ), Heidelberg, Germany
4
Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts, USA
5
Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, USA
6
Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington,
USA
7
Division of Epidemiology, Department of Health Sciences Research, Mayo Clinic, Rochester,
Minnesota, USA
8
Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health,
Bethesda, Maryland, USA
9
Cancer Genomics Research Laboratory, National Cancer Institute, Division of Cancer Epidemiology
and Genetics, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research,
Frederick, Maryland, USA
10
Department of Obstetrics and Gynecology, New York University School of Medicine, New York, New
York, USA
11
Department of Environmental Medicine, New York University School of Medicine, New York, New
York, USA
12
New York University Cancer Institute, New York, New York, USA
13
Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco,
California, USA
14
Division of Preventive Medicine, Department of Medicine, Brigham and Women’s Hospital and
Harvard Medical School, Boston, Massachusetts, USA
15
Division of Aging, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical
School, Boston, Massachusetts, USA
16
Unit of Nutrition, Environment and Cancer, Cancer Epidemiology Research Program, Bellvitge
Biomedical Research Institute (IDIBELL), Catalan Institute of Oncology (ICO), Barcelona, Spain
17
Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Ontario, Canada
18
Cancer Epidemiology Centre, Cancer Council Victoria, Melbourne, Victoria, Australia
19
Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The
University of Melbourne, Victoria, Australia
20
Department of Epidemiology and Preventive Medicine, Monash University, Melbourne, Victoria,
Australia
21
Southwest Oncology Group Statistical Center, Fred Hutchinson Cancer Research Center, Seattle,
Washington, USA
22
Epidemiology Research Program, American Cancer Society, Atlanta, Georgia, USA
23
Group Health Research Institute, Seattle, Washington, USA
24
Department of Oncology, the Johns Hopkins University School of Medicine, Baltimore, Maryland,
USA
25
Department of Epidemiology, the Bloomberg School of Public Health, Baltimore, Maryland, USA
26
The Cancer Research Center of Hawaii (retired), Honolulu, Hawaii, USA
27
Department of Gastrointestinal Medical Oncology, University of Texas M.D. Anderson Cancer Center,
Houston, Texas, USA
28
Genetic and Molecular Epidemiology Group, CNIO-Spanish National Cancer Research Centre, Madrid,
Spain
29
Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, New York,
New York, USA
30
Department of Chronic Disease Epidemiology, Yale School of Public Health, New Haven, Connecticut,
USA
31
Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
32
Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
33
Department of Epidemiology, University of Washington, Seattle, Washington, USA
34
Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
35
Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, Tennessee, USA
36
Department of Laboratory Medicine, University Hospital of Padova, Padua, Italy
37
Inserm, Centre for Research in Epidemiology and Population Health (CESP), U1018, Nutrition,
Hormones and Women’s Health Team, F-94805, Villejuif, France
38
University Paris Sud, UMRS 1018, F-94805, Villejuif, France
39
IGR, F-94805, Villejuif, France
40
Westat, Rockville, Maryland, USA
41
Department of General Surgery, University Hospital Heidelberg, Heidelberg, Germany
42
National Institute for Public Health and the Environment (RIVM), Bilthoven, The Netherlands
43
Department of Gastroenterology and Hepatology, University Medical Centre Utrecht, Utrecht, The
Netherlands
44
Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London,
London, United Kingdom
45
Institute of Transfusion Medicine and Immunology, Heidelberg University, Medical Faculty
Mannheim, German Red Cross Blood Service Baden-Württemberg-Hessen, Mannheim, Germany
46
Division of Cancer Epidemiology, German Cancer Research Center (DKFZ), Heidelberg, Germany
47
Digestive and Liver Disease Unit, ‘Sapienza’ University of Rome, Rome, Italy
48
Cancer Care Ontario, University of Toronto, Toronto, Ontario, Canada
49
Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
50
National Institute for Health Research Liverpool Pancreas Biomedical Research Unit, University of
Liverpool, Liverpool, United Kingdom
51
Division of Cancer Control and Population Sciences, National Cancer Institute, National Institutes of
Health, Bethesda, Maryland, USA
52
Department of Surgery, Unit of Experimental Surgical Pathology, University Hospital of Pisa, Pisa,
Italy
53
Massachusetts Veteran’s Epidemiology, Research, and Information Center, Geriatric Research
Education and Clinical Center, Veterans Affairs Boston Healthcare System, Boston, Massachusetts, USA
54
Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital,
and Harvard Medical School, Boston, Massachusetts, USA
55
Department of Nutrition, Harvard School of Public Health, Boston, Massachusetts, USA
56
Department of Pathology, Sidney Kimmel Cancer Center and Johns Hopkins University, Baltimore,
Maryland, USA
57
Department of Medicine, Sidney Kimmel Cancer Center and Johns Hopkins University, Baltimore,
Maryland, USA
58
Department of Oncology, Sidney Kimmel Cancer Center and Johns Hopkins University, Baltimore,
Maryland, USA
59
Laboratory of Medicine and Pathology, University of Minnesota, Minneapolis, Minnesota, USA
60
Preventive Medicine, University of Southern California, Los Angeles, California, USA
61
Prevention and Research Center, Mercy Medical Center, Baltimore, Maryland, USA
62
Cancer Prevention, University of Southern California, Los Angeles, California, USA
63
Harvard School of Public Health, Boston, Massachusetts, USA
64
Harvard Medical School, Boston, Massachusetts, USA
65
The University of North Carolina Eshelman School of Pharmacy, Center for Pharmacogenomics and
Individualized Therapy, Lineberger Comprehensive Cancer Center, School of Medicine, Chapel Hill,
North Carolina, USA
66
International Agency for Research on Cancer, Lyon, France
67
Cancer Epidemiology Unit, University of Oxford, Oxford, United Kingdom
68
School of Clinical Medicine, University of Cambridge, United Kingdom
69
Glickman Urological and Kidney Institute, Cleveland Clinic, Cleveland, OH, USA
70
Centre de Recerca en Epidemiologia Ambiental (CREAL), CIBER Epidemiología y Salud Pública
(CIBERESP), Spain
71
Hospital del Mar Institute of Medical Research (IMIM), Barcelona, Spain
72
National School of Public Health, Athens, Greece
73
Epidemiology and Prevention Unit, Fondazione IRCCS Istituto Nazionale dei Tumori, Milan, Italy
74
Department of Gastroenterology, Lithuanian University of Health Sciences, Kaunas, Lithuania
75
Department of Medicine, Memorial Sloan-Kettering Cancer Center, New York, New York, USA
76
Department of Biology, University of Pisa, Pisa, Italy
77
Cancer Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI, USA
78
Oncology Department, ASL1 Massa Carrara, Massa Carrara, Italy
79
National Institute for Health and Welfare, Department of Chronic Disease Prevention, Helsinki, Finland
80
Human Genome Center, Institute of Medical Science, The University of Tokyo, Tokyo, Japan
81
Alliance Statistics and Data Center, Division of Biomedical Statistics and Informatics, Department of
Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
82
Alliance Statistics and Data Center, Department of Biostatistics and Bioinformatics, Duke Cancer
Institute, Duke University Medical Center, Durham, North Carolina, USA
83
Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The
Netherlands
84
Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London,
London, United Kingdom
85
Epidemiology, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
86
Pancreas Unit, Department of Digestive Diseases and Internal Medicine, Sant’Orsola-Malpighi
Hospital, Bologna, Italy
87
Department of Gastroenterology, Scientific Institute and Regional General Hospital “Casa Sollievo
della Sofferenza”, Opera di Padre Pio da Pietrelcina, San Giovanni Rotondo, Italy
88
School of Medicine, Universitat Autònoma de Barcelona, Spain
89
CIBER de Epidemiología y Salud Pública (CIBERESP), Spain
90
Epithelial Carcinogenesis Group, CNIO-Spanish National Cancer Research Centre, Madrid, Spain
91
Departament de Ciències i de la Salut, Universitat Pompeu Fabra, Barcelona, Spain
92
ARC-NET: Centre for Applied Research on Cancer, University and Hospital Trust of Verona, Verona,
Italy
93
Toxicogenomics Unit, Center for Toxicology and Safety, National Institute of Public Health, Prague,
Czech Republic
94
Department of Surgical and Perioperative Sciences, Umeå University, Umeå, Sweden
95
Department of Digestive Tract Diseases, Medical University of Łódź, Łódź, Poland
96
1st Propaideutic Surgical Department, Hippocration University Hospital, Athens, Greece
97
Institute of Cancer Epidemiology, Danish Cancer Society, Copenhagen, Denmark
98
Bureau of Epidemiologic Research, Academy of Athens, Athens, Greece
99
Hellenic Health Foundation, Athens, Greece
100
Department of Molecular Biology of Cancer, Institute of Experimental Medicine, Academy of
Sciences of the Czech Republic, Prague, Czech Republic
101
Department of Social and Preventive Medicine, University at Buffalo, Buffalo, New York, USA
MGS Consortium
The Molecular Genetics of Schizophrenia Consortium includes P.V. Gejman, A.R. Sanders, J. Duan
(North Shore University Health System and University of Chicago), C.R. Cloninger, D.M. Svrakic
(Washington University, St. Louis), N.G. Buccola (Louisiana State University Health Sciences Center,
New Orleans), D.F. Levinson, J. Shi (Stanford University, Stanford, Calif.; Dr. Shi is now at the National
Cancer Institute), B.J. Mowry (Queensland Centre for Mental Health Research, Brisbane, and Queensland
Brain Institute, University of Queensland, Brisbane), R. Freedman, A. Olincy (University of Colorado
Denver), F. Amin (Atlanta Veterans Affairs Medical Center and Emory University, Atlanta), D.W. Black
(University of Iowa Carver College of Medicine, Iowa City), J.M. Silverman (Mount Sinai School of
Medicine, New York), and W.F. Byerley (University of California, San Francisco).
1. Battle, A. et al. Characterizing the genetic basis of transcriptome diversity through RNA-sequencing of 922 individuals. Genome Research 24, 14-24 (2014).
2. Westra, H.J. et al. Systematic identification of trans eQTLs as putative drivers of known disease associations. Nature Genetics 45, 1238-U195 (2013).
3. Lindblad-Toh, K. et al. A high-resolution map of human evolutionary constraint using 29 mammals. Nature 478, 476-482 (2011).
4. Marconett, C., Zhou, B., Rieger, M., Selamat, S. & Mickael Dubourd, X.F., Sean K. Lynch, Kimberly D. Siegmund, Benjamin P. Berman, Zea Borok, Ite A. Laird-Offringa. Integrated transcriptomic and epigenomic analysis reveals novel pathways regulating distal lung epithelial cell differentiation. PLoS Genet (2013).
5. Hao, K. et al. Lung eQTLs to Help Reveal the Molecular Underpinnings of Asthma. PLoS Genetics 8 (2012).
6. Shi, J. et al. Characterizing the genetic basis of methylome diversity in histologically normal human lung tissue. Nat Commun 5, 3365 (2014).
7. Grundberg, E. et al. Mapping cis- and trans-regulatory effects across multiple tissues in twins. Nat Genet 44, 1084-9 (2012).
8. Grundberg, E. et al. Global analysis of DNA methylation variation in adipose tissue from twins reveals links to disease-associated variants in distal regulatory elements. Am J Hum Genet 93, 876-90 (2013).
9. Consortium, T.E.P. A user's guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol 9 (2011).
10. Hazelett, D.J. et al. Comprehensive Functional Annotation of 77 Prostate Cancer Risk Loci. PLoS Genetics 10 (2014).