Supplementary Materials

A Systematic Heritability Analysis of the Human Whole Blood
Transcriptome
Tianxiao Huan1,2, Chunyu Liu1,2, Roby Joehanes1,2,3, Xiaoling Zhang1, Brian H. Chen1,2, Andrew D.
Johnson1, Chen Yao1,2, Paul Courchesne1,2, Christopher J. O'Donnell1, Peter J. Munson3, and Daniel Levy1,2
1 The National Heart, Lung and Blood Institute's Framingham Heart Study, Framingham, MA
2 Population Sciences Branch, Division of Intramural Research, National Heart, Lung and Blood Institute,
Bethesda, MD
3 Mathematical and Statistical Computing Laboratory, Division of Computational Bioscience, Center for
Information Technology, National Institutes of Health, Bethesda, MD
Correspondence should be addressed to:
Daniel Levy, MD
Framingham Heart Study
Population Sciences Branch
National Heart, Lung, and Blood Institute
73 Mt. Wayte Avenue, Suite 2
Framingham, MA 01702
Email: Levyd@nih.gov
Phone: 508-935-3458
Fax: 508-872-2678
Running title: Heritability of the Blood Transcriptome
Key words: heritability, eQTL, transcriptome, gene expression
Word count of the abstract: 245
Word count of the body without references or tables: 5637
Word count of the body including references and tables: 9814
Numbers of figures: 5
Numbers of tables: 4
Supplementary Notes
Heritability estimation in relation to cohorts and sample size
It is well known that heritability estimates are affected by sample size and family structure (Tenesa and
Haley 2013). Previous studies reported heritability for gene expression in smaller sample groups
(n≈1000) and with relatively simple family structures (Emilsson et al. 2008; Göring et al. 2007;
Grundberg et al. 2012; Price et al. 2011; Stranger et al. 2007). For example, Grundberg et al. reported
heritability for gene expression in twins (Grundberg et al. 2012), and Price et al. reported heritability for
gene expression in a population sample via identity-by-descent (IBD) in related or unrelated individuals
(Price et al. 2011).
We found that 40% of gene transcript levels are heritable and the average heritability of global
gene expression genome-wide is estimated to be 0.07 (for all 18,000 genes) and 0.13 (for 7,161 genes
with $h^2_{geneExp}$>0). Several studies reported that 40-70% of gene transcripts are heritable with
$h^2_{geneExp}$>0 (Emilsson et al. 2008; Göring et al. 2007; Price et al. 2011; Stranger et al. 2007).
Dixon et al. reported that the average heritability for global gene expression is 0.2 (Dixon et al. 2007),
and the average heritability for transcripts with $h^2_{geneExp}$>0 ranges between 0.15 and 0.30
(Emilsson et al. 2008; Price et al. 2011; Stranger et al. 2007). These heritability estimates are
considerably larger than our findings.
We further compared the heritability estimates at the individual transcript level between our study
and the study by Göring et al. (Göring et al. 2007). We used the following formula to assess whether
heritability estimates were smaller in FHS than in the study by Göring et al. (Göring et al. 2007):

$$\Delta = \frac{h^2_{FHS} - h^2_{Goring}}{h^2_{Goring}} \times 100\%$$

where $h^2_{FHS}$ and $h^2_{Goring}$ represent the heritability estimates in FHS and in the study by
Göring et al. (Göring et al. 2007), respectively, for the same gene transcripts. Supplementary Table S3
shows that 62% of 838 genes with $h^2_{geneExp}$>0.2 display lower heritability estimates in FHS than in
the study by Göring et al. (Göring et al. 2007) using the $\Delta$ < -10% rule. The Pearson correlation
between the heritability estimates in the two studies is only 0.3. However, as the heritability estimates
become higher, a smaller proportion of genes display lower heritability estimates in FHS than in the study
by Göring et al. (Göring et al. 2007). In addition, the heritability estimates become more concordant
between the two studies for genes with higher heritability estimates. The Pearson correlation between the
heritability estimates of genes in both studies increased from 0.3 to 0.85 as the heritability estimates
increased from 0.2 to 0.8.
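
The comparison described above can be sketched in a few lines of Python. This is only an illustration of the relative-difference rule and the Pearson correlation: the function names and the five pairs of heritability values are hypothetical and are not data from either study.

```python
from math import sqrt

def relative_difference(h2_fhs, h2_goring):
    """Percent difference of the FHS estimate relative to the Goring et al. estimate."""
    return (h2_fhs - h2_goring) / h2_goring * 100.0

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical heritability estimates for the same five transcripts (illustration only).
h2_fhs = [0.15, 0.30, 0.45, 0.62, 0.80]
h2_goring = [0.25, 0.28, 0.55, 0.60, 0.82]

deltas = [relative_difference(f, g) for f, g in zip(h2_fhs, h2_goring)]
# A transcript counts as "lower in FHS" when delta is below -10%.
fraction_lower = sum(d < -10.0 for d in deltas) / len(deltas)
r = pearson(h2_fhs, h2_goring)

print(f"fraction with delta < -10%: {fraction_lower:.2f}")
print(f"Pearson correlation: {r:.2f}")
```

Restricting both lists to transcripts above a heritability threshold before calling `pearson` would mirror the threshold-dependent correlations reported above.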
The overall distribution of $h^2_{geneExp}$ for genes in the FHS offspring cohort and the third
generation cohort is similar (Supplementary Figure S2). The average heritability is slightly higher in the
offspring cohort (average $h^2_{geneExp}$=0.087 and median $h^2_{geneExp}$=0.052) than in the third
generation cohort (average $h^2_{geneExp}$=0.078 and median $h^2_{geneExp}$=0.044). To investigate
whether sample size and family structure affect heritability estimates, we randomly selected n=50, 100
and 200 nuclear families from the full sample. We estimated heritability using two family structures:
1) offspring only (that is, only offspring individuals in these n=50, 100, and 200 nuclear families) and
2) both offspring and parents (that is, the entire nuclear families). To minimize fluctuations, we
performed 500 random samplings in selected nuclear families. We randomly selected 10 genes in each of
seven $h^2_{geneExp}$ levels (70 genes in total), $h^2_{geneExp}$ = (0, 0.1), (0.1, 0.2), (0.2, 0.3),
(0.3, 0.4), (0.4, 0.5), (0.5, 0.6), and (0.6, 1), to represent the full range of gene expression
heritability. Supplementary Figure S3 displays $h^2_{geneExp}$ estimates from different numbers of
nuclear families for the 70 genes. The standard errors of $h^2_{geneExp}$ are 0.16, 0.11 and 0.07 on
average when sampling 50, 100, and 200 full families, and are 0.18, 0.12, and 0.08 when sampling 50, 100,
and 200 siblings from nuclear families. This result indicates that a larger sample size and a more
comprehensive family structure provide more accurate heritability estimates.
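
The resampling logic can be illustrated with a small simulation. The sketch below is not the LME analysis used in this study: it simulates a hypothetical trait with a true heritability of 0.4 in nuclear families with two offspring, and swaps in two textbook estimators (the midparent-offspring regression slope for full families, and twice the full-sib correlation for offspring only). All names and parameter values are assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import stdev

def simulate_family(h2, rng):
    """Return (midparent phenotype, sib 1 phenotype, sib 2 phenotype) for one family."""
    env_sd = sqrt(1.0 - h2)  # total phenotypic variance is 1
    a_father = rng.gauss(0.0, sqrt(h2))
    a_mother = rng.gauss(0.0, sqrt(h2))
    y_father = a_father + rng.gauss(0.0, env_sd)
    y_mother = a_mother + rng.gauss(0.0, env_sd)
    mid_a = (a_father + a_mother) / 2.0
    # Offspring additive value: midparent value plus Mendelian sampling (variance h2/2).
    sibs = [mid_a + rng.gauss(0.0, sqrt(h2 / 2.0)) + rng.gauss(0.0, env_sd)
            for _ in range(2)]
    return (y_father + y_mother) / 2.0, sibs[0], sibs[1]

def covariance(x, y):
    """Sample covariance of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def h2_full_family(families):
    """Slope of mean offspring phenotype on midparent phenotype."""
    midparent = [f[0] for f in families]
    mean_off = [(f[1] + f[2]) / 2.0 for f in families]
    return covariance(midparent, mean_off) / covariance(midparent, midparent)

def h2_offspring_only(families):
    """Twice the full-sib correlation (assumes no shared environment or dominance)."""
    s1 = [f[1] for f in families]
    s2 = [f[2] for f in families]
    return 2.0 * covariance(s1, s2) / sqrt(covariance(s1, s1) * covariance(s2, s2))

def sampling_sd(pool, n_families, n_rep, rng):
    """Spread of both estimators over repeated random draws of n_families families."""
    full, sibs = [], []
    for _ in range(n_rep):
        sample = rng.sample(pool, n_families)
        full.append(h2_full_family(sample))
        sibs.append(h2_offspring_only(sample))
    return stdev(full), stdev(sibs)

rng = random.Random(0)
pool = [simulate_family(0.4, rng) for _ in range(2000)]
results = {n: sampling_sd(pool, n, 500, rng) for n in (50, 100, 200)}
for n, (sd_full, sd_sibs) in results.items():
    print(f"{n:>3} families: SD full = {sd_full:.3f}, SD offspring-only = {sd_sibs:.3f}")
```

Under these assumptions the spread of the estimates shrinks as more families are sampled, and it is smaller when parents are included, which is the qualitative pattern described above.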
Heritability estimation in relation to cell types
Previous heritability estimates were reported for gene expression measured in different tissues [19, 20]. In
the FHS, gene expression levels were measured using whole blood derived RNA, which contains multiple
cell types. To gain insight into how different cell types in whole blood affect heritability estimates, we
compared the heritability estimates for transcripts (n=18,000) with and without adjusting for cell types
(i.e. white blood cells [WBCs], neutrophils, lymphocytes, monocytes, platelets, eosinophils and
basophils) as additional covariates in the LME models. The cell count proportions in whole blood were
measured in 2138 individuals. We imputed cell count proportions of all FHS individuals by Partial Least
Squares (PLS) regression (Boardman et al. 1981). The imputed cell count proportion values were highly
consistent with the measured cell count proportions (Joehanes R, PhD, in preparation). The average
$h^2_{geneExp}$ remained unchanged, that is, the average $h^2_{geneExp}$=0.072 before and after
accounting for cell counts.
Furthermore, we selected individuals falling within the top 25% and bottom 25% for each of three cell
types (WBCs, neutrophils, and lymphocytes), and compared the heritability estimates in the top and bottom
levels for these three cell types. We used LME models implemented in lmekin() (Abecasis et al. 2001) to
test the associations between cell types and transcripts. The transcripts associated with cell types at
Bonferroni-corrected p<0.05 were used to compare their heritability levels between samples from the top
and bottom cell type proportion levels. The WBC count affected heritability estimates the most. The
average $h^2_{geneExp}$ was higher in the individuals in the top quartile of WBC (average
$h^2_{geneExp}$=0.13) than in those in the bottom quartile (average $h^2_{geneExp}$=0.10); the
heritability estimates are significantly different (Student's t-test, p=5.7e-11). Similarly, the
heritability estimates were slightly higher in individuals in the top quartile of lymphocyte proportions
(average $h^2_{geneExp}$=0.10) than in those in the bottom quartile (average $h^2_{geneExp}$=0.09)
(Student's t-test, p=1.5e-4). The average $h^2_{geneExp}$ for individuals in the top vs. bottom
quartiles of neutrophil proportions was not different (average $h^2_{geneExp}$=0.10, p=0.6).
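
A minimal sketch of the two steps in this comparison follows: picking the top and bottom quartiles of a cell count, and computing a two-sample Student's t statistic on per-transcript heritability estimates. The participant IDs, cell counts, and heritability values are hypothetical, and the pooled-variance form of the test is an assumption, since the text does not specify the variant used. The p-value is omitted because the Python standard library has no t-distribution function.

```python
from math import sqrt

def top_bottom_quartiles(values_by_id):
    """Return (bottom 25% IDs, top 25% IDs) ranked by the measured value."""
    ranked = sorted(values_by_id, key=values_by_id.get)
    k = len(ranked) // 4
    if k == 0:
        raise ValueError("need at least four individuals")
    return ranked[:k], ranked[-k:]

def student_t(x, y):
    """Two-sample Student's t statistic with pooled variance."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ssx = sum((v - mx) ** 2 for v in x)
    ssy = sum((v - my) ** 2 for v in y)
    pooled_var = (ssx + ssy) / (nx + ny - 2)
    return (mx - my) / sqrt(pooled_var * (1.0 / nx + 1.0 / ny))

# Hypothetical WBC counts keyed by participant ID (illustration only).
wbc = {"p1": 4.2, "p2": 5.1, "p3": 6.0, "p4": 6.8,
       "p5": 7.5, "p6": 8.1, "p7": 9.3, "p8": 10.4}
bottom_ids, top_ids = top_bottom_quartiles(wbc)

# Hypothetical per-transcript heritability estimates obtained in each group.
h2_top = [0.14, 0.12, 0.13, 0.15, 0.11]
h2_bottom = [0.10, 0.09, 0.11, 0.12, 0.08]
t_stat = student_t(h2_top, h2_bottom)

print(bottom_ids, top_ids)
print(f"t = {t_stat:.2f}")
```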
Summary of eQTLs results
The eQTL data were generated using gene expression results from 5257 individuals with both gene
expression profiling and genotype dosage data imputed from the 1000 Genomes SNP set. In total, we
identified 2,206,283 cis and 160,183 trans eQTLs for 12,407 genes at FDR<0.05. Details of eQTLs will be
reported in a separate paper (Joehanes et al. in preparation).
Supplementary Figures
Supplementary Figure S1: Heritability distribution of the transcriptome in Framingham cohorts. A)
Heritability (h2) distribution of transcripts without SNP-probes (n=6,059, see methods); B) Summary of
genes in different h2 sub-categories
Supplementary Figure S2: Heritability distribution of transcripts in Framingham offspring cohort
and the third generation cohort.
Supplementary Figure S3: Heritability estimates of transcripts in relation to sample size and family
structure. The x-axis indicates the 70 genes, 10 genes in each of seven $h^2_{geneExp}$ levels,
$h^2_{geneExp}$ = (0, 0.1), (0.1, 0.2), (0.2, 0.3), (0.3, 0.4), (0.4, 0.5), (0.5, 0.6), and (0.6, 1).
The y-axis is $h^2_{geneExp}$.
Supplementary Figure S4: Effect size of cis/trans eQTLs in the discovery and replication sets. Cis
eQTLs were defined as eQTLs located within 1 Mb upstream or downstream of the transcription start site of
the eQTL gene, and trans eQTLs were defined as eQTLs located on a different chromosome from the eQTL
gene. In this study, we excluded eQTLs located on the same chromosome as the eQTL gene but beyond the
1 Mb window.
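
The cis/trans rule in this caption can be written as a small classifier. The sketch below is illustrative only: the function name, argument names, and the example coordinates are hypothetical, and treating a distance of exactly 1 Mb as cis is an assumption.

```python
CIS_WINDOW_BP = 1_000_000  # 1 Mb on either side of the transcription start site

def classify_eqtl(snp_chrom, snp_pos, gene_chrom, gene_tss):
    """Label a SNP-gene pair as 'cis', 'trans', or 'excluded' under the caption's rule."""
    if snp_chrom != gene_chrom:
        return "trans"
    if abs(snp_pos - gene_tss) <= CIS_WINDOW_BP:
        return "cis"
    # Same chromosome but outside the 1 Mb window: left out of this comparison.
    return "excluded"

# Hypothetical coordinates (illustration only).
print(classify_eqtl("1", 1_500_000, "1", 1_000_000))   # within 1 Mb of the TSS
print(classify_eqtl("2", 1_500_000, "1", 1_000_000))   # different chromosome
print(classify_eqtl("1", 5_000_000, "1", 1_000_000))   # same chromosome, beyond 1 Mb
```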
Supplementary Figure S5: Comparison of t-test Statistics of cis (A) and trans (B) eQTLs at
FDR<1e-4 before and after accounting for cell count effects. We compared the ratios of BETA/SE
(t-value) of eQTLs identified by adjusting for cell types to those without adjusting for cell types in
regression models. The t-values remained similar with or without adjusting for cell types. Pearson
correlations for t-values were 0.99 and 0.98 for cis and trans eQTLs, respectively.
Supplementary Figure S6: Relationship between Heritability Estimates and proportion of genes
with cis/trans eQTLs. A) Proportion of transcripts having cis eQTLs in different heritability levels; B)
Proportion of transcripts having trans eQTLs in different heritability levels. (This result is based on
eQTLs from 6,059 genes without SNP-probes, see methods)
Supplementary Figure S7: Proportion of Transcript Variance Explained by Peak eQTLs. A)
Variance proportion of a transcript explained by a single peak cis eQTL; B) Variance proportion of a
transcript explained by a single peak trans eQTL. (This result is based on eQTLs from 6,059 genes
without SNP-probes, see methods)
Supplementary Tables
Supplementary Table S1: Technical covariates utilized for gene expression data
normalization
Supplementary Table S2: Pedigree structure formed by Framingham Heart Study
participants
Supplementary Table S3: Comparison of h2 estimates in the FHS and in the study by Göring et al.
Supplementary Table S4: Gene Expression Heritability and Their Peak cis eQTL
Supplementary Table S5: Gene Expression Heritability and Their Peak trans eQTL
Supplementary Table S6: Gene Expression Heritability and Their Peak long-range cis
eQTL
Supplementary Table S7: Differentially expressed genes for body mass index at FDR<0.05
References
Abecasis GR, Cardon LR, Cookson WO, Sham PC, Cherny SS (2001) Association analysis in a variance
components framework. Genet Epidemiol 21 Suppl 1: S341-6
Boardman AE, Hui BS, Wold H (1981) The partial least squares-fix point method of estimating
interdependent systems with latent variables. Communications in statistics-theory and methods
10: 613-639
Dixon AL, Liang L, Moffatt MF, Chen W, Heath S, Wong KC, Taylor J, Burnett E, Gut I, Farrall M,
Lathrop GM, Abecasis GR, Cookson WO (2007) A genome-wide association study of global
gene expression. Nat Genet 39: 1202-7
Emilsson V, Thorleifsson G, Zhang B, Leonardson AS, Zink F, Zhu J, Carlson S, Helgason A, Walters
GB, Gunnarsdottir S, Mouy M, Steinthorsdottir V, Eiriksdottir GH, Bjornsdottir G, Reynisdottir
I, Gudbjartsson D, Helgadottir A, Jonasdottir A, Styrkarsdottir U, Gretarsdottir S, Magnusson
KP, Stefansson H, Fossdal R, Kristjansson K, Gislason HG, Stefansson T, Leifsson BG,
Thorsteinsdottir U, Lamb JR, Gulcher JR, Reitman ML, Kong A, Schadt EE, Stefansson K
(2008) Genetics of gene expression and its effect on disease. Nature 452: 423-8
Göring HHH, Curran JE, Johnson MP, Dyer TD, Charlesworth J, Cole SA, Jowett JBM, Abraham LJ,
Rainwater DL, Comuzzie AG, Mahaney MC, Almasy L, MacCluer JW, Kissebah AH, Collier
GR, Moses EK, Blangero J (2007) Discovery of expression QTLs using large-scale
transcriptional profiling in human lymphocytes. Nature Genetics 39: 1208-1216
Grundberg E, Small KS, Hedman AK, Nica AC, Buil A, Keildson S, Bell JT, Yang TP, Meduri E, Barrett
A, Nisbett J, Sekowska M, Wilk A, Shin SY, Glass D, Travers M, Min JL, Ring S, Ho KR,
Thorleifsson G, Kong A, Thorsteindottir U, Ainali C, Dimas AS, Hassanali N, Ingle C, Knowles
D, Krestyaninova M, Lowe CE, Di Meglio P, Montgomery SB, Parts L, Potter S, Surdulescu G,
Tsaprouni L, Tsoka S, Bataille V, Durbin R, Nestle FO, O'Rahilly S, Soranzo N, Lindgren CM,
Zondervan KT, Ahmadi KR, Schadt EE, Stefansson K, Smith GD, McCarthy MI, Deloukas P,
Dermitzakis ET, Spector TD, Multiple Tissue Human Expression Resource (MuTHER) Consortium (2012)
Mapping cis- and trans-regulatory effects across multiple tissues in twins. Nature Genetics 44:
1084-1089
Price AL, Helgason A, Thorleifsson G, McCarroll SA, Kong A, Stefansson K (2011) Single-Tissue and
Cross-Tissue Heritability of Gene Expression Via Identity-by-Descent in Related or Unrelated
Individuals. Plos Genetics 7
Stranger BE, Nica AC, Forrest MS, Dimas A, Bird CP, Beazley C, Ingle CE, Dunning M, Flicek P, Koller
D, Montgomery S, Tavare S, Deloukas P, Dermitzakis ET (2007) Population genomics of human
gene expression. Nature Genetics 39: 1217-1224
Tenesa A, Haley CS (2013) The heritability of human disease: estimation, uses and abuses. Nat Rev
Genet 14: 139-49