Supplementary Materials for Tung et al, “The genetic architecture of gene expression
levels in wild baboons”
J Tung, X Zhou, SC Alberts, M Stephens, Y Gilad
1. Supplementary Text
Text S1: Read mapping and SNP quality filtering pipeline
Text S2: Estimated accuracy of SNP genotypes using human RNA-seq data
Text S3: Possible confounds associated with eQTL mapping using RNA-seq data
Text S4: Demographic and environmental variables
2. Supplementary Tables
Table S1: Read mapping summary
Table S2: Gene Ontology analysis for genes with no eQTL in baboon or YRI
Table S3: Gene Ontology analysis for genes with eQTL in either or both baboon and
YRI
Table S4: Demographic and environmental data
3. Supplementary Figures
Figure S1: Detailed workflow for gene expression level estimation
Figure S2: Elimination of GC bias via quantile normalization
Figure S3: Detailed workflow for SNP genotyping
Figure S4: Location of analyzed SNPs relative to genes
Figure S5: Accuracy of genotype calls for SNPs independently typed in HapMap3
Figure S6: PCA projection of YRI samples using the RNA-seq-based pipeline versus
independently typed SNPs
Figure S7: Power simulations for masked eQTL based on effect size
Figure S8: Correlation between eQTL detection and mean phyloP scores based on 100way vertebrate comparison
Figure S9: PVE explained by demographic and early environmental variables
Figure S10: Coverage by genotype call
Text S1: Read mapping and SNP quality filtering pipeline
Read alignment was performed against the baboon genome, Panu2.0, using bwa 0.5.9 [1].
Alignments utilized a seed length of 25 bases, maximum edit distance of 2 mismatches in the
seed, read trimming quality score threshold of 20, and the default maximum edit distance (4%
after trimming, which allows no more than 3 mismatches in a 76 base pair read). Only uniquely
mapped reads with mapping quality ≥10 were retained at this step. Because reads that did not
map to the genome could reflect reads spanning exon-exon junctions, we used the program
jfinder [2,3] to separately align the first 35 and last 35 base pairs of each initially unmapped read,
using a seed length of 20 bases. Total read counts for each individual thus reflected the sum of: i)
reads that uniquely mapped directly to the genome; ii) those that uniquely mapped on both ends
after splitting the reads with jfinder; and iii) those that uniquely mapped on one end after
splitting with jfinder. For gene expression analysis, we evaluated the total read counts for each
gene and individual as the sum of the number of reads for that individual that overlapped that
gene. We defined a “gene” as the union of all exon base pairs assigned to a given gene, based on
the NCBI annotation run of the current baboon genome (‘ref_Panu_2.0_top_level.gff3’,
downloaded September 2012).
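The gene-level counting rule described above (a gene is the union of its exon base pairs, and a read counts if it overlaps that union) can be illustrated with a minimal sketch. The coordinates, the half-open interval convention, and the function names `merge_exons` and `count_overlapping_reads` are hypothetical and are not taken from the actual pipeline.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates on one chromosome


def merge_exons(exons: List[Interval]) -> List[Interval]:
    """Collapse overlapping or abutting exons into the union of exon base pairs."""
    merged: List[Interval] = []
    for start, end in sorted(exons):
        if merged and start <= merged[-1][1]:
            # Overlaps (or abuts) the previous block: extend that block.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def count_overlapping_reads(gene_exons: List[Interval], reads: List[Interval]) -> int:
    """Count reads that overlap the merged exon blocks by at least one base."""
    blocks = merge_exons(gene_exons)
    return sum(
        any(r_start < b_end and b_start < r_end for b_start, b_end in blocks)
        for r_start, r_end in reads
    )


# Hypothetical gene whose annotated transcripts share partially overlapping exons.
exons = [(100, 200), (150, 260), (400, 500)]
reads = [(90, 166), (255, 331), (300, 376), (480, 556)]
print(merge_exons(exons))                     # [(100, 260), (400, 500)]
print(count_overlapping_reads(exons, reads))  # 3
```

Merging the exons first means that a read spanning two overlapping annotated exons of the same gene is counted once rather than once per exon.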
For variant discovery and genotyping, we used the Genome Analysis Toolkit v. 1.2.6
(GATK: [4,5]). GATK relies on a known set of variants to perform recalibration of base quality
scores; in the absence of known variants, all non-reference base calls are assumed to reflect
sequencing errors. Because no validated reference set of known genetic variants is available for
baboon, we performed an iterative bootstrapping procedure for base quality score recalibration.
Specifically, after processing to remove possible PCR duplicates and performing sample-level
read realignment at potential indels, we performed an initial round of base quality score
recalibration and identified a set of variants using GATK’s UnifiedGenotyper and
VariantFiltration walker (focusing on variants within 20kb of annotated exons, where most of
our reads fell). From this call set, we constructed a set of high confidence variants with quality
score ≥100 that passed all filters for variant confidence (variants failed if QD <2.0), mapping
quality (variants failed if MQ <35.0), strand bias (variants failed if FS >60.0), haplotype score
(variants failed if HaplotypeScore >13.0), mapping quality rank sum (variants failed if MQRankSum <-12.5), and read position bias (variants failed if ReadPosRankSum <-8.0).
We then used this high confidence set as the set of “known sites” in a second round of
base quality score recalibration. We repeated this process until the number of variants identified
in consecutive rounds of recalibration stabilized (six rounds of recalibration and variant
identification). We again filtered the final call set as described above, and kept only sites that
were polymorphic in Amboseli and did not deviate from Hardy-Weinberg equilibrium (p >
0.001). We also removed any site for which genotype data were missing from more than 20% of
the individuals in the data set and for which mean coverage was <5x per individual. Finally, we
removed sites with a quality score less than 100 and sites distant from expressed genes (>200 kb
away), as we were primarily interested in eQTL mapping in cis.
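As an illustration of how the variant-level hard filters listed above combine, the following sketch flags a variant that fails any one of them. It is not the GATK VariantFiltration implementation. The dictionary-based variant representation and the names `FAIL_IF` and `failed_filters` are hypothetical, the MQRankSum cutoff is assumed to be negative (-12.5), and missing annotations are assumed not to fail a site.

```python
from typing import Dict, List

# Thresholds as listed in Text S1; a variant fails if any condition holds.
# Assumption: the MQRankSum cutoff is read as -12.5 (negative sign).
FAIL_IF = {
    "QD": lambda v: v < 2.0,               # variant confidence by depth
    "MQ": lambda v: v < 35.0,              # mapping quality
    "FS": lambda v: v > 60.0,              # strand bias
    "HaplotypeScore": lambda v: v > 13.0,  # haplotype score
    "MQRankSum": lambda v: v < -12.5,      # mapping quality rank sum
    "ReadPosRankSum": lambda v: v < -8.0,  # read position bias
}
MIN_QUAL = 100.0  # minimum variant quality score for the high confidence set


def failed_filters(qual: float, info: Dict[str, float]) -> List[str]:
    """Return the names of all filters a variant fails (empty list = passes)."""
    failed = [] if qual >= MIN_QUAL else ["QUAL"]
    for name, fails in FAIL_IF.items():
        # Assumption of this sketch: an annotation absent from `info` is skipped.
        if name in info and fails(info[name]):
            failed.append(name)
    return failed


# Hypothetical annotation values for two variants.
good = {"QD": 18.4, "MQ": 58.0, "FS": 1.2, "HaplotypeScore": 0.9,
        "MQRankSum": 0.3, "ReadPosRankSum": -0.5}
bad = {"QD": 1.1, "MQ": 58.0, "FS": 75.0}
print(failed_filters(250.0, good))  # []
print(failed_filters(80.0, bad))    # ['QUAL', 'QD', 'FS']
```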
Text S2: Estimated accuracy of SNP genotypes using human RNA-seq data
To assess the likely accuracy of the RNA-seq based genotyping calls we performed in the
baboons, we investigated a similarly sized data set of RNA-seq reads from a human population
[2]. Because this data set focused on samples from the HapMap consortium (n = 69 members of
the Yoruba population from Ibadan, Nigeria), we were able to compare genotypes called using
the RNA-seq pipeline to independently collected genotype data from HapMap Phase 3 (r27) [6].
To do so, we focused on 9,919 variants that were genotyped in both data sets. We then calculated
the correlation between genotypes called in the RNA-seq-based pipeline and genotypes from
HapMap, for each individual. When including imputed genotypes in this analysis, the median
correlation across individuals was 0.984 (mean = 0.976 ± 0.026); excluding imputed genotypes
yielded a median correlation of 0.987 (mean = 0.980 ± 0.024) (Figure S5a). Similar to the results
of [cites], our data support highly accurate genotype calls from RNA-seq reads. However, we
also found that low accuracy was correlated with the level of apparent homozygosity in the
genotype data: the three lowest accuracy YRI individuals were outliers in this regard (Figure
S5b). In the baboon data, we had no individuals with unusually low homozygosity, but six
individuals with unusually high homozygosity (>80% of genotype calls). These outliers were
missing a median of 10.6% of data (requiring imputation), whereas all other individuals were
missing a median of 0.6% of data. Notably, removing these six individuals from our analysis
yielded results very similar to those from the full data set: 87.6% of the genes (n = 1566) identified
when using all individuals were also identified with this subset.
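The per-individual concordance measure described above can be sketched as follows. The sketch assumes genotypes are coded as alternate allele counts (0, 1, 2), that the correlation is a Pearson correlation, and that sites missing in either call set are dropped. These choices, and the function name `genotype_correlation`, are assumptions made for illustration.

```python
from math import sqrt
from typing import Optional, Sequence


def genotype_correlation(calls_a: Sequence[Optional[int]],
                         calls_b: Sequence[Optional[int]]) -> float:
    """Pearson correlation between two genotype vectors for one individual.

    Genotypes are alternate allele counts (0/1/2); None marks a missing call.
    """
    pairs = [(a, b) for a, b in zip(calls_a, calls_b)
             if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two sites typed in both call sets")
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined when all genotypes are identical")
    return cov / sqrt(var_a * var_b)


# Hypothetical calls for one individual at eight shared sites.
rnaseq_calls = [0, 1, 2, 1, 0, 2, None, 1]
array_calls = [0, 1, 2, 2, 0, 2, 1, 1]
r = genotype_correlation(rnaseq_calls, array_calls)
print(round(r, 3))
```

A single heterozygote called as a homozygote among seven comparable sites already pulls the correlation well below 1, which is why many shared sites per individual are needed for a stable estimate.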
Importantly, the available data from humans also support accurate variant discovery. Of
the 30,938 sites that we identified from the RNA-seq data and that passed all of our filters, only
3.1% (967) did not have an assigned rsID in dbSNP release 138. These sites were likely enriched
for false positives, as the transition/transversion ratio for this set was 1.42, versus 2.80 for the set
of 30,938 sites as a whole.
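For reference, a transition/transversion ratio of the kind reported above can be computed as in the sketch below. It assumes biallelic SNPs given as (reference, alternate) base pairs; the example alleles and the function names are hypothetical.

```python
from typing import Iterable, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    pair = {ref.upper(), alt.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)


def ts_tv_ratio(snps: Iterable[Tuple[str, str]]) -> float:
    """Ratio of transitions to transversions across biallelic SNPs."""
    snps = list(snps)
    transitions = sum(is_transition(ref, alt) for ref, alt in snps)
    transversions = len(snps) - transitions
    if transversions == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return transitions / transversions


# Hypothetical SNPs: four transitions and two transversions.
example = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("T", "C"), ("G", "T")]
print(ts_tv_ratio(example))  # 2.0
```

Because random sequencing or alignment errors are not expected to favor transitions, a call set enriched for false positives tends to show a lower ratio than a set of true variants.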
Text S3: Possible confounds associated with eQTL mapping using RNA-seq data
We report the results of eQTL mapping in the Amboseli baboon population using both
gene expression level and genotype data derived from the same RNA-seq data set. This strategy
has the potential to introduce four possible confounds.
First, for genes with large effect cis-eQTLs, reads from heterozygotes at eQTL-linked
sites might be biased towards the allele associated with higher gene expression levels. If so,
heterozygotes might be mistakenly genotyped as homozygotes for the high expressing allele,
resulting in an underrepresentation of heterozygous genotypes relative to neutral expectations.
To control for this possibility, we eliminated sites that violated Hardy-Weinberg expectations (n
= 2,386) from our analyses. We note, however, that this scenario would not introduce false
positives. Instead, it would lead to more conservative detection of additive eQTL effects, with
the direction of an estimated eQTL effect still consistent with the true effect.
Second, SNP calling might be biased towards the reference allele. If so, more reads
would be required to support a genotype call of homozygote alternate than a genotype call of
homozygote reference. This bias would result in higher apparent expression levels for alternate
allele homozygotes and lower expression levels for reference allele homozygotes, which could
create false positive eQTLs. However, we observe no evidence for this scenario in our data set.
For all tested SNPs (n = 64,432) and for eQTL SNPs only (n = 1,693), alternate allele
homozygotes tend to have slightly lower coverage than reference allele homozygotes, and
heterozygotes tend to have the highest coverage (because more reads are required to support
inference of heterozygosity) (Figure S10). Thus, coverage and genotype do not covary additively,
and this potential confound is unlikely to produce false positive eQTLs.
Third, read mapping might be biased towards the reference allele, such that reads
carrying the alternate allele are less likely to map because they contain more mismatches to the
reference genome. This possibility is consistent with our observation that alternate allele
homozygotes tend to have slightly less coverage than reference allele homozygotes (Figure
S10a,b). While this difference in coverage is significant (Kolmogorov-Smirnov test: p < 2.2 x 10^-16
for all SNPs; p = 3.9 x 10^-5 for eQTL SNPs), the magnitude of the effect itself is modest
(Figure S10c,d), probably because we allowed reads to map with up to 3 mismatches; Wittkopp
and colleagues have shown that reference allele mapping bias is largely obviated by allowing
reads to map with more mismatches. Further, systematic calling of false positive eQTLs due to
biased read mapping would predict a bias towards negative effect sizes (i.e., eQTL effects
suggesting that the alternate allele is associated with lower expression levels). Our data are not
consistent with such a pattern: 47% of eQTL betas are negative, whereas 53% are positive. We
note that reference allele mapping biases are, however, more likely to affect ASE analysis,
producing a pattern of greater expression in the reference allele. Indeed, we do observe a bias
towards negative betas in the ASE analysis (67.2% of n = 510 genes), although the magnitude
and direction of ASE data agree well with eQTL evidence.
Fourth, lower mean coverage in homozygotes of either type relative to heterozygotes
could induce false positive eQTLs in which the major allele was associated with lower gene
expression levels. To test this possibility, we recoded eQTL effects to reflect the effect of the
major allele instead of the effect of the alternate allele (i.e., a genotype of 0 = homozygous
minor and a genotype of 2 = homozygous major). We observed a modest excess of eQTL for
which the major allele was associated with lower gene expression levels (56%, binomial test p =
1.15 x 10^-7). This bias did not differ depending on whether the major allele was the reference
allele or the alternate allele (Fisher’s Exact Test, p = 0.28), supporting minimal read mapping
biases in our data. Instead, it appears to be primarily driven by SNPs with low minor allele
frequencies (proportion of negative betas for the lowest quartile of MAFs = 62.8%, p = 7.49 x
10^-8; highest quartile of MAFs = 48.6%, p = 0.602). At these sites, eQTL inference relies
primarily on two genotype classes (the major allele homozygotes and heterozygotes) rather than
three genotype classes. Because heterozygotes tend to have slightly higher coverage than
homozygotes of both classes, spurious relationships between genotype and gene expression
levels are much less likely to be observed when both types of homozygotes are well represented
(i.e., MAFs are larger).
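The recoding and sign test described above can be sketched as follows, assuming an additive model in which switching the counted allele simply flips the sign of the effect estimate. The effect sizes and allele frequencies are hypothetical, and `recode_to_major` and `sign_test_p` are illustrative names rather than functions from the analysis code.

```python
from math import comb
from typing import List, Tuple


def recode_to_major(beta_alt: float, alt_freq: float) -> float:
    """Express an additive effect per copy of the major allele.

    If the alternate allele is the minor allele, the sign of beta flips.
    """
    return beta_alt if alt_freq > 0.5 else -beta_alt


def sign_test_p(k: int, n: int) -> float:
    """Exact two-sided binomial p-value for k of n outcomes under p = 0.5."""
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    # The null distribution is symmetric, so double the smaller tail (capped at 1).
    tail = min(k, n - k)
    lower = sum(comb(n, i) for i in range(tail + 1))
    return min(1.0, 2 * lower / 2 ** n)


# Hypothetical eQTL: (effect of the alternate allele, alternate allele frequency).
eqtls: List[Tuple[float, float]] = [
    (0.8, 0.20), (-0.5, 0.70), (0.3, 0.10), (0.6, 0.40),
    (-0.2, 0.30), (0.9, 0.15), (0.4, 0.25), (-0.7, 0.80),
]
major_betas = [recode_to_major(beta, freq) for beta, freq in eqtls]
n_negative = sum(beta < 0 for beta in major_betas)
print(n_negative, len(major_betas))                # 7 8
print(sign_test_p(n_negative, len(major_betas)))   # 0.0703125
```

The same test can be applied separately within minor allele frequency quartiles to check whether any excess of negative effects is concentrated at low frequency sites.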
Along with the high genotype accuracy rates estimated from the Yoruba data, our
analyses thus indicate that the set of eQTL we identified is largely robust to RNA-seq-specific
confounds. The most salient source of error stems from low MAF sites, which are also the cases
most vulnerable to sampling error and winner’s curse effects more generally (main text Fig 5)—a
problem that is not confined to RNA-seq-based eQTL mapping. Taken together, our findings
argue that, as a general rule, eQTL associated with lower MAF SNPs should be treated with
increased caution.
Text S4: Demographic and environmental variables
In addition to genetic effects, we also estimated the contribution of demographic
variables (age and sex) and early environment-related variables (early life social status and social
connectedness) to variation in gene expression levels. All individuals used in the PVE analysis
were either born into the Amboseli study population or immigrated into the population as adults
(males in this species disperse from their group of birth upon sexual maturity).
Sex was known from direct observation of the study subjects. As a consequence of the
long-term nature of research on this population (monitoring has been conducted on a near-daily
basis since 1971), ages were known to within a few days’ error for 52 of the 63 individuals in the
data set. For the remaining 11 individuals used in the PVE analysis, 6 animals had birth dates
estimated to be accurate within 1 year, 4 animals had birth dates estimated to be accurate within
2 years, and 1 had a birth date estimated to be less accurate than 2 years. Early social status was
measured using the proportional dominance rank of the individual’s mother, at the time of that
individual’s conception. We chose to use proportional dominance rank instead of ordinal
dominance rank in this analysis to break up the correlation between ordinal dominance rank
values and group size, which could measure density-dependent resource competition instead
(only larger groups contain the lowest rank values). Dominance ranks are assigned monthly
using ad libitum observations of dyadic agonistic (aggressive or competitive) encounters within
social groups [7,8]. Maternal social connectedness values were defined as the social
connectedness of the individual’s mother, in the year of that female’s life during which the focal
individual was born. Social connectedness is calculated on a yearly basis as the frequency with
which a female is involved in affiliative interactions, relative to the median for all females in the
population at the same time and controlling for observer effort (see [9,10]). Social connectedness
is measured for females, but can focus on either female-female relationships (SCI-F) or a
female’s relationship with adult males (SCI-M), which have independent effects on longevity in
this population [9]. For SCI-F, affiliative interactions included both grooming interactions and
close spatial proximity to other females. For SCI-M, only grooming interactions were used.
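To illustrate why proportional rank removes the dependence on group size, the sketch below uses one common convention, the proportion of other group members that an individual outranks. This formula is an assumption made for illustration; the exact definition used in the analysis is not given in this text, and the function name is hypothetical.

```python
def proportional_rank(ordinal_rank: int, group_size: int) -> float:
    """Proportion of other ranked group members that an individual outranks.

    Assumed convention: ordinal rank 1 is the top rank and maps to 1.0,
    while the lowest rank in any group maps to 0.0.
    """
    if group_size < 2 or not 1 <= ordinal_rank <= group_size:
        raise ValueError("need group_size >= 2 and 1 <= ordinal_rank <= group_size")
    return 1.0 - (ordinal_rank - 1) / (group_size - 1)


# The same ordinal rank means different things in small and large groups.
print(proportional_rank(3, 5))    # 0.5
print(proportional_rank(3, 21))   # 0.9
print(proportional_rank(5, 5))    # 0.0
print(proportional_rank(21, 21))  # 0.0
```

Under this convention the lowest-ranking female scores 0 whether her group contains 5 or 21 ranked females, whereas her ordinal rank (5 versus 21) would partly reflect group size.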
REFERENCES
1. Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows–Wheeler
transform. Bioinformatics 25: 1754-1760.
2. Pickrell JK, Marioni JC, Pai AA, Degner JF, Engelhardt BE, et al. (2010) Understanding
mechanisms underlying human gene expression variation with RNA sequencing. Nature
464: 768-772.
3. Pickrell JK, Pai AA, Gilad Y, Pritchard JK (2010) Noisy splicing drives mRNA isoform
diversity in human cells. PLoS genetics 6: e1001236.
4. DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, et al. (2011) A framework for
variation discovery and genotyping using next-generation DNA sequencing data. Nature
genetics 43: 491-498.
5. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, et al. (2010) The Genome
Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA
sequencing data. Genome research 20: 1297-1303.
6. International HapMap 3 Consortium (2010) Integrating common and rare genetic variation in diverse
human populations. Nature 467: 52-58.
7. Alberts SC, Watts HE, Altmann J (2003) Queuing and queue-jumping: long-term patterns of
reproductive skew in male savannah baboons, Papio cynocephalus. Animal
Behaviour 65: 821-840.
8. Hausfater G (1974) Dominance and reproduction in Baboons (Papio cynocephalus).
Contributions to primatology 7: 1-150.
9. Archie EA, Tung J, Clark M, Altmann J, Alberts SC (In review) Social affiliation matters:
both same-sex and opposite-sex relationships predict survival in wild female baboons.
Proceedings of the Royal Society B: Biological Sciences.
10. Runcie DE, Wiedmann RT, Archie EA, Altmann J, Wray GA, et al. (2013) Social
environment influences the relationship between genotype and gene expression in wild
baboons. Philosophical Transactions of the Royal Society B: Biological Sciences 368:
20120345.
Table S1. Read mapping summary
index | ID | Total | Mapped | Unmapped | Unique | Exons | Junctions | Total_Exons
1 | LAO | 28235620 | 18570502 | 9665118 | 13889782 | 9370873 | 3201045 | 12571918
2 | HAV | 27488867 | 18507097 | 8981770 | 13125119 | 8101067 | 2931097 | 11032164
3 | NET | 27024563 | 16368164 | 10656399 | 12407791 | 6761125 | 2171006 | 8932131
4 | LAT | 27077953 | 18859256 | 8218697 | 14759720 | 8765292 | 2821173 | 11586465
5 | LAW | 28080996 | 16806868 | 11274128 | 11607624 | 6804911 | 2300480 | 9105391
6 | THR | 29522248 | 20771001 | 8751247 | 15700967 | 9776180 | 3125754 | 12901934
7 | RIK | 27677280 | 19784305 | 7892975 | 15063744 | 9901904 | 3119680 | 13021584
8 | DHO | 28074777 | 19456545 | 8618232 | 12228926 | 8830434 | 2932484 | 11762918
9 | NIK | 27367851 | 19397674 | 7970177 | 15532909 | 9047441 | 2814469 | 11861910
10 | VEI | 28620648 | 19380650 | 9239998 | 15791116 | 5144057 | 2240917 | 7384974
11 | ODE | 29649699 | 20666904 | 8982795 | 14628129 | 10024546 | 3256687 | 13281233
12 | SCE | 28157829 | 18970024 | 9187805 | 13898564 | 7491658 | 2796945 | 10288603
13 | OTI | 27906913 | 19181873 | 8725040 | 13064031 | 8020712 | 2682590 | 10703302
14 | OPH | 29274145 | 20556667 | 8717478 | 15255511 | 9606604 | 3006900 | 12613504
15 | LOF | 29307323 | 21174539 | 8132784 | 18160007 | 7716050 | 2163227 | 9879277
16 | OCE | 26997164 | 15663374 | 11333790 | 11008130 | 6483175 | 2524012 | 9007187
17 | LUT | 26129071 | 18401539 | 7727532 | 13043592 | 7803181 | 2444454 | 10247635
18 | MAD | 26286502 | 16898883 | 9387619 | 10329614 | 7502823 | 2697335 | 10200158
19 | KOR | 29992134 | 20534392 | 9457742 | 14499508 | 9975416 | 3629263 | 13604679
20 | KIZ | 26386405 | 16208282 | 10178123 | 8645866 | 6895382 | 3028682 | 9924064
21 | FAX | 27385179 | 17335149 | 10050030 | 12059163 | 8062395 | 2993049 | 11055444
22 | NAP | 29001094 | 17220457 | 11780637 | 10610563 | 7006916 | 2750349 | 9757265
23 | LOG | 26780027 | 18378069 | 8401958 | 12156957 | 8792079 | 2221263 | 11013342
24 | NOK | 28151763 | 17225372 | 10926391 | 10516283 | 8121055 | 3077334 | 11198389
25 | WOL | 33737482 | 18531740 | 15205742 | 13669393 | 6733362 | 2496220 | 9229582
26 | LUX | 30575711 | 19679305 | 10896406 | 11105474 | 7554823 | 3042602 | 10597425
27 | VIN | 35726525 | 24220741 | 11505784 | 18843582 | 11034495 | 3834221 | 14868716
28 | QUA | 33324526 | 21685880 | 11638646 | 14286996 | 8642849 | 3545658 | 12188507
29 | VAZ | 34379907 | 22436096 | 11943811 | 16612442 | 12237810 | 4700787 | 16938597
30 | WIZ | 34198847 | 21131631 | 13067216 | 10529981 | 8351489 | 3278531 | 11630020
31 | EAS | 34670118 | 24124470 | 10545648 | 15496561 | 10233100 | 3446621 | 13679721
32 | HOK | 31844219 | 21838132 | 10006087 | 14169633 | 9817243 | 3464400 | 13281643
33 | LAX | 32597159 | 20484922 | 12112237 | 12852326 | 9561573 | 3993455 | 13555028
34 | ZAM | 32942091 | 21818152 | 11123939 | 13219985 | 9288886 | 3569765 | 12858651
35 | TOR | 35812525 | 24898710 | 10913815 | 18405442 | 11591846 | 3910056 | 15501902
36 | CAR | 31707653 | 13579628 | 18128025 | 7629947 | 4554517 | 2476654 | 7031171
37 | LYC | 30401145 | 19630729 | 10770416 | 12158177 | 9304996 | 3444296 | 12749292
38 | FAM | 33275300 | 20803140 | 12472160 | 13417108 | 8765329 | 3975458 | 12740787
39 | VEX | 34018145 | 22756483 | 11261662 | 15827752 | 10134845 | 3677327 | 13812172
40 | WEN | 30264148 | 19162207 | 11101941 | 15257538 | 7830779 | 2562990 | 10393769
41 | HUC | 32952047 | 18236173 | 14715874 | 12078558 | 8420801 | 4897666 | 13318467
42 | DED | 34682291 | 22531415 | 12150876 | 12084907 | 9658020 | 3821578 | 13479598
43 | LOU | 33701586 | 20655828 | 13045758 | 12609485 | 8397256 | 3877048 | 12274304
44 | PIT | 35081496 | 24360475 | 10721021 | 16694750 | 11055248 | 3657317 | 14712565
45 | NOZ | 34554749 | 21394629 | 13160120 | 13516368 | 9964822 | 4427702 | 14392524
46 | BAT | 35431207 | 24638835 | 10792372 | 16752338 | 11057809 | 3915190 | 14972999
47 | BUT | 32879707 | 21959497 | 10920210 | 13707959 | 9411461 | 3587097 | 12998558
48 | YOK | 31025689 | 20570165 | 10455524 | 15330851 | 8852849 | 3610366 | 12463215
49 | LYB | 33083886 | 20827582 | 12256304 | 12670207 | 9175415 | 3898163 | 13073578
50 | WON | 33687192 | 21390373 | 12296819 | 16557747 | 9585575 | 3688205 | 13273780
51 | EUR | 34647992 | 16342620 | 18305372 | 9854659 | 6609676 | 4497395 | 11107071
52 | AMO | 30315547 | 20864959 | 9450588 | 16454442 | 11328705 | 3680624 | 15009329
53 | EVA | 33797805 | 24094410 | 9703395 | 18719723 | 11273140 | 3434348 | 14707488
54 | NJU | 34515910 | 24265603 | 10250307 | 19442223 | 9167778 | 2848803 | 12016581
55 | COO | 32102635 | 21157218 | 10945417 | 12916975 | 9366381 | 3396693 | 12763074
56 | WRI | 32528970 | 21436838 | 11092132 | 14669143 | 9957703 | 3697744 | 13655447
57 | SCO | 13191431 | 8586679 | 4604752 | 5651976 | 3655760 | 1288132 | 4943892
58 | KIJ | 31724195 | 21128559 | 10595636 | 12863693 | 10575562 | 3558757 | 14134319
59 | WHE | 21111426 | 13960560 | 7150866 | 9073011 | 5609626 | 1958378 | 7568004
60 | CAC | 26029270 | 17109577 | 8919693 | 10976405 | 8326047 | 3211354 | 11537401
61 | GAM | 18855838 | 11051054 | 7804784 | 6188928 | 4907306 | 2505598 | 7412904
62 | WEI | 16003239 | 10531947 | 5471292 | 7102739 | 4740896 | 1951922 | 6692818
63 | BOL | 26790552 | 18155252 | 8635300 | 13080794 | 8689416 | 3468565 | 12157981
mean | | 29980099 | 19497616 | 10482483 | 13340696 | 8594134 | 3194093 | 11788228
standard deviation | | 4508501 | 3309226 | 2381469 | 2970185 | 1817659 | 705120 | 2370100
Table S2. Gene Ontology analysis for genes with no eQTL in baboon or YRI
GO ID | Category | Expected | Observed | B-H FDR | Direction of enrichment
GO:0003676 | nucleic acid binding | 1216.8 | 1300 | 4.81E-07 | up
GO:0005634 | nucleus | 2135.27 | 2231 | 1.10E-06 | up
GO:0060255 | regulation of macromolecule metabolic process | 1295.23 | 1376 | 1.39E-06 | up
GO:0019222 | regulation of metabolic process | 1459.36 | 1543 | 1.42E-06 | up
GO:0031323 | regulation of cellular metabolic process | 1390.63 | 1471 | 2.63E-06 | up
GO:0016070 | RNA metabolic process | 1005.78 | 1075 | 4.22E-06 | up
GO:0080090 | regulation of primary metabolic process | 1332.42 | 1410 | 4.22E-06 | up
GO:0019219 | regulation of nucleobase, nucleoside, nucleotide and nucleic acid metabolic process | 1095.53 | 1166 | 6.57E-06 | up
GO:0051171 | regulation of nitrogen compound metabolic process | 1102.8 | 1173 | 7.25E-06 | up
GO:0010467 | gene expression | 1541.02 | 1620 | 7.83E-06 | up
GO:0090304 | nucleic acid metabolic process | 1505.44 | 1583 | 9.89E-06 | up
GO:0044260 | cellular macromolecule metabolic process | 2328.5 | 2415 | 1.06E-05 | up
GO:0009889 | regulation of biosynthetic process | 1102.8 | 1170 | 1.97E-05 | up
GO:0010556 | regulation of macromolecule biosynthetic process | 1072.08 | 1138 | 2.31E-05 | up
GO:0031326 | regulation of cellular biosynthetic process | 1097.95 | 1164 | 2.63E-05 | up
GO:0051252 | regulation of RNA metabolic process | 617.7 | 669 | 2.63E-05 | up
GO:0006350 | transcription | 1016.29 | 1080 | 2.72E-05 | up
GO:0010468 | regulation of gene expression | 1103.61 | 1169 | 3.17E-05 | up
GO:0045449 | regulation of transcription | 968.592 | 1030 | 3.92E-05 | up
GO:0006351 | transcription, DNA-dependent | 660.551 | 711 | 7.69E-05 | up
GO:0003677 | DNA binding | 801.231 | 855 | 0.000132994 | up
GO:0032774 | RNA biosynthetic process | 662.976 | 712 | 0.00015385 | up
GO:0006139 | nucleobase, nucleoside, nucleotide and nucleic acid metabolic process | 1705.14 | 1776 | 0.000161347 | up
GO:0006355 | regulation of transcription, DNA-dependent | 595.87 | 642 | 0.000189046 | up
GO:0009059 | macromolecule biosynthetic process | 1413.27 | 1477 | 0.000403065 | up
GO:0034645 | cellular macromolecule biosynthetic process | 1395.48 | 1459 | 0.000403065 | up
GO:0043234 | protein complex | 1032.46 | 1089 | 0.000403065 | up
GO:0044451 | nucleoplasm part | 299.956 | 331 | 0.000705323 | up
GO:0034641 | cellular nitrogen compound metabolic process | 1795.7 | 1862 | 0.0007835 | up
GO:0006996 | organelle organization | 643.572 | 688 | 0.000802833 | up
GO:0032991 | macromolecular complex | 1296.04 | 1355 | 0.000927737 | up
GO:0030528 | transcription regulator activity | 537.657 | 577 | 0.00160019 | up
GO:0050789 | regulation of biological process | 2334.97 | 2402 | 0.00167445 | up
GO:0006807 | nitrogen compound metabolic process | 1828.84 | 1892 | 0.00175189 | up
GO:0009058 | biosynthetic process | 1692.21 | 1754 | 0.00175189 | up
GO:0044249 | cellular biosynthetic process | 1659.06 | 1720 | 0.00192109 | up
GO:0016055 | Wnt receptor signaling pathway | 67.9146 | 81 | 0.00219049 | up
GO:0050794 | regulation of cellular process | 2233.1 | 2298 | 0.00219049 | up
GO:0043170 | macromolecule metabolic process | 2474.84 | 2537 | 0.00539685 | up
GO:0005667 | transcription factor complex | 77.6167 | 91 | 0.00595511 | up
GO:0044237 | cellular metabolic process | 2925.99 | 2988 | 0.00595511 | up
GO:0003723 | RNA binding | 350.892 | 380 | 0.00645126 | up
GO:0043229 | intracellular organelle | 3642.33 | 3700 | 0.00826617 | up
GO:0031981 | nuclear lumen | 759.189 | 800 | 0.00837838 | up
GO:0043226 | organelle | 3645.56 | 3703 | 0.00837838 | up
GO:0065007 | biological regulation | 2442.5 | 2502 | 0.00873111 | up
GO:0043231 | intracellular membrane-bounded organelle | 3375.52 | 3434 | 0.00887101 | up
GO:0005654 | nucleoplasm | 479.445 | 512 | 0.00973108 | up
GO:0043227 | membrane-bounded organelle | 3377.14 | 3435 | 0.00975924 | up
GO:0005488 | binding | 4302.88 | 4352 | 0.0106872 | up
GO:0010629 | negative regulation of gene expression | 216.68 | 238 | 0.0154158 | up
GO:0044424 | intracellular part | 4209.09 | 4258 | 0.0154158 | up
GO:0044428 | nuclear part | 949.996 | 992 | 0.0174508 | up
GO:0005622 | intracellular | 4308.54 | 4354 | 0.0259786 | up
GO:0051254 | positive regulation of RNA metabolic process | 185.148 | 204 | 0.0262778 | up
GO:0015030 | Cajal body | 29.1063 | 36 | 0.0264874 | up
GO:0003712 | transcription cofactor activity | 173.021 | 191 | 0.0296204 | up
GO:0009987 | cellular process | 4162.2 | 4208 | 0.0342128 | up
GO:0045893 | positive regulation of transcription, DNA-dependent | 181.914 | 200 | 0.0367732 | up
GO:0010605 | negative regulation of macromolecule metabolic process | 328.254 | 352 | 0.0440579 | up
GO:0045935 | positive regulation of nucleobase, nucleoside, nucleotide and nucleic acid metabolic process | 237.701 | 258 | 0.0440579 | up
GO:0051173 | positive regulation of nitrogen compound metabolic process | 244.978 | 265 | 0.0588737 | up
GO:0006366 | transcription from RNA polymerase II promoter | 363.02 | 387 | 0.0646731 | up
GO:0044446 | intracellular organelle part | 2092.42 | 2140 | 0.0654513 | up
GO:0045934 | negative regulation of nucleobase, nucleoside, nucleotide and nucleic acid metabolic process | 215.063 | 233 | 0.08962 | up
GO:0010604 | positive regulation of macromolecule metabolic process | 367.871 | 391 | 0.0934134 | up
GO:0008380 | RNA splicing | 167.361 | 183 | 0.0983268 | up
GO:0009892 | negative regulation of metabolic process | 347.658 | 370 | 0.0983268 | up
GO:0004175 | endopeptidase activity | 101.063 | 77 | 6.48E-05 | down
GO:0031224 | intrinsic to membrane | 1237.82 | 1179 | 0.00130188 | down
GO:0044425 | membrane part | 1522.42 | 1461 | 0.00175189 | down
GO:0016021 | integral to membrane | 1216 | 1159 | 0.00176057 | down
GO:0070011 | peptidase activity, acting on L-amino acid peptides | 167.361 | 144 | 0.00452098 | down
GO:0008233 | peptidase activity | 174.638 | 151 | 0.00494679 | down
GO:0000323 | lytic vacuole | 109.149 | 91 | 0.00873111 | down
GO:0005764 | lysosome | 109.149 | 91 | 0.00873111 | down
GO:0005886 | plasma membrane | 856.209 | 812 | 0.00908116 | down
GO:0016020 | membrane | 1978.42 | 1925 | 0.0218727 | down
GO:0016757 | transferase activity, transferring glycosyl groups | 88.9358 | 74 | 0.0262778 | down
GO:0005773 | vacuole | 128.553 | 111 | 0.0268413 | down
GO:0034599 | cellular response to oxidative stress | 25.0637 | 17 | 0.0440579 | down
GO:0044459 | plasma membrane part | 485.913 | 456 | 0.0462556 | down
GO:0003824 | catalytic activity | 2020.46 | 1974 | 0.0828394 | down
GO:0019882 | antigen processing and presentation | 11.3191 | 6 | 0.0843294 | down
Table S3. Gene Ontology analysis for genes with eQTL in either or both baboon and YRI
GO ID
Category
Expected
Observed
B-H FDR
Direction of
enrichment
GO:0004175
endopeptidase activity
23.9366
48
2.77E-05
up
GO:0031224
intrinsic to membrane
293.175
352
0.000556877
up
GO:0044425
membrane part
360.58
422
0.00074937
up
GO:0016021
288.005
345
0.000753081
up
1.14895
6
0.00185626
up
1.14895
6
0.00185626
up
GO:0070011
integral to membrane
antigen processing and presentation of
endogenous peptide antigen
antigen processing and presentation of
endogenous peptide antigen via MHC class I
peptidase activity, acting on L-amino acid
peptides
39.6389
63
0.00188989
up
GO:0008233
peptidase activity
41.3624
65
0.00206896
up
GO:0000323
lytic vacuole
25.8515
44
0.00366681
up
GO:0005764
lysosome
25.8515
44
0.00366681
up
GO:0005886
plasma membrane
antigen processing and presentation of
endogenous antigen
202.791
247
0.00381631
up
1.34045
6
0.00730263
up
468.582
522
0.00906815
up
GO:0016757
membrane
transferase activity, transferring glycosyl
groups
21.0642
36
0.0109097
up
GO:0005773
vacuole
30.4473
48
0.0113149
up
GO:0034599
cellular response to oxidative stress
5.93627
14
0.0185944
up
GO:0044459
plasma membrane part
115.087
145
0.0195254
up
GO:0003824
catalytic activity
478.54
525
0.0349915
up
GO:0019882
antigen processing and presentation
2.68089
8
0.0356264
up
GO:0004252
serine-type endopeptidase activity
5.93627
13
0.0476448
up
GO:0042995
cell projection
antigen processing and presentation of peptide
antigen
54.9583
74
0.0479857
up
2.29791
7
0.0482462
up
2.87239
8
0.0505191
up
GO:0002474
brush border
antigen processing and presentation of peptide
antigen via MHC class I
1.91492
6
0.0712938
up
GO:0004197
cysteine-type endopeptidase activity
7.6597
15
0.0712938
up
GO:0005887
61.4691
80
0.0743323
up
GO:0016763
integral to plasma membrane
transferase activity, transferring pentosyl
groups
4.9788
11
0.0749643
up
GO:0031226
intrinsic to plasma membrane
62.4266
81
0.0749643
up
GO:0003676
nucleic acid binding
288.196
205
2.06E-07
down
GO:0005634
505.732
410
4.72E-07
down
GO:0060255
nucleus
regulation of macromolecule metabolic
process
306.771
226
5.95E-07
down
GO:0019222
regulation of metabolic process
345.644
262
6.09E-07
down
GO:0031323
regulation of cellular metabolic process
329.367
249
1.13E-06
down
GO:0016070
RNA metabolic process
238.217
169
1.80E-06
down
GO:0080090
315.58
238
1.80E-06
down
GO:0019219
regulation of primary metabolic process
regulation of nucleobase, nucleoside,
nucleotide and nucleic acid metabolic process
259.472
189
2.81E-06
down
GO:0051171
regulation of nitrogen compound metabolic
261.196
191
3.10E-06
down
GO:0002483
GO:0019885
GO:0019883
GO:0016020
GO:0048002
GO:0005903
process
GO:0010467
gene expression
364.985
286
3.35E-06
down
GO:0090304
nucleic acid metabolic process
356.559
279
4.23E-06
down
GO:0044260	cellular macromolecule metabolic process	551.498	465	4.54E-06	down
GO:0009889	regulation of biosynthetic process	261.196	194	8.43E-06	down
GO:0010556	regulation of macromolecule biosynthetic process	253.919	188	9.87E-06	down
GO:0031326	regulation of cellular biosynthetic process	260.047	194	1.13E-05	down
GO:0051252	regulation of RNA metabolic process	146.3	95	1.13E-05	down
GO:0006350	transcription	240.706	177	1.16E-05	down
GO:0010468	regulation of gene expression	261.387	196	1.36E-05	down
GO:0045449	regulation of transcription	229.408	168	1.68E-05	down
GO:0006351	transcription, DNA-dependent	156.449	106	3.29E-05	down
GO:0003677	DNA binding	189.769	136	5.69E-05	down
GO:0032774	RNA biosynthetic process	157.024	108	6.58E-05	down
GO:0006139	nucleobase, nucleoside, nucleotide and nucleic acid metabolic process	403.858	333	6.90E-05	down
GO:0006355	regulation of transcription, DNA-dependent	141.13	95	8.09E-05	down
GO:0009059	macromolecule biosynthetic process	334.729	271	0.00017241	down
GO:0034645	cellular macromolecule biosynthetic process	330.516	267	0.00017241	down
GO:0043234	protein complex	244.536	188	0.00017241	down
GO:0044451	nucleoplasm part	71.0437	40	0.000301701	down
GO:0034641	cellular nitrogen compound metabolic process	425.305	359	0.000335141	down
GO:0006996	organelle organization	152.428	108	0.000343411	down
GO:0032991	macromolecular complex	306.962	248	0.000396838	down
GO:0030528	transcription regulator activity	127.343	88	0.000684478	down
GO:0050789	regulation of biological process	553.03	486	0.000716243	down
GO:0006807	nitrogen compound metabolic process	433.156	370	0.00074937	down
GO:0009058	biosynthetic process	400.794	339	0.00074937	down
GO:0044249	cellular biosynthetic process	392.943	332	0.000821743	down
GO:0050794	regulation of cellular process	528.902	464	0.000959832	down
GO:0043170	macromolecule metabolic process	586.159	524	0.00225831	down
GO:0005667	transcription factor complex	18.3833	5	0.00249422	down
GO:0044237	cellular metabolic process	693.011	631	0.00249422	down
GO:0003723	RNA binding	83.1077	54	0.0027032	down
GO:0043229	intracellular organelle	862.674	805	0.00346513	down
GO:0031981	nuclear lumen	179.811	139	0.00351492	down
GO:0043226	organelle	863.44	806	0.00351492	down
GO:0065007	biological regulation	578.499	519	0.00366681	down
GO:0043231	intracellular membrane-bounded organelle	799.481	741	0.0037268	down
GO:0005654	nucleoplasm	113.555	81	0.00409069	down
GO:0043227	membrane-bounded organelle	799.864	742	0.00410375	down
GO:0005488	binding	1019.12	970	0.00449524	down
GO:0010629	negative regulation of gene expression	51.32	30	0.00648772	down
GO:0044424	intracellular part	996.91	948	0.00648772	down
GO:0044428	nuclear part	225.004	183	0.00730263	down
GO:0005622	intracellular	1020.46	975	0.0107756	down
GO:0051254	positive regulation of RNA metabolic process	43.8518	25	0.0109097	down
GO:0003712	transcription cofactor activity	40.9794	23	0.0124891	down
GO:0009987	cellular process	985.803	940	0.0144284	down
GO:0045893	positive regulation of transcription, DNA-dependent	43.0858	25	0.0155112	down
GO:0010605	negative regulation of macromolecule metabolic process	77.746	54	0.0185944	down
GO:0045935	positive regulation of nucleobase, nucleoside, nucleotide and nucleic acid metabolic process	56.2988	36	0.0185944	down
GO:0051173	positive regulation of nitrogen compound metabolic process	58.0222	38	0.0248561	down
GO:0006366	transcription from RNA polymerase II promoter	85.9801	62	0.0273092	down
GO:0044446	intracellular organelle part	495.583	448	0.0276423	down
GO:0045934	negative regulation of nucleobase, nucleoside, nucleotide and nucleic acid metabolic process	50.937	33	0.0378674	down
GO:0010604	positive regulation of macromolecule metabolic process	87.1291	64	0.0394761	down
GO:0008380	RNA splicing	39.6389	24	0.0415643	down
GO:0009892	negative regulation of metabolic process	82.3418	60	0.0415643	down
GO:0003700	transcription factor activity	67.0224	47	0.0431062	down
GO:0006397	mRNA processing	39.4475	24	0.0431062	down
GO:0008134	transcription factor binding	58.7882	40	0.0431062	down
GO:0010628	positive regulation of gene expression	51.703	34	0.0431062	down
GO:0016071	mRNA metabolic process	46.9157	30	0.0431062	down
GO:0051172	negative regulation of nitrogen compound metabolic process	51.703	34	0.0431062	down
GO:0016604	nuclear body	27.9579	15	0.0435841	down
GO:0043565	sequence-specific DNA binding	40.5964	25	0.0438237	down
GO:0009893	positive regulation of metabolic process	90.9589	68	0.0448117	down
GO:0044422	organelle part	499.412	455	0.0448117	down
GO:0016481	negative regulation of transcription	45.3837	29	0.0449439	down
GO:0045941	positive regulation of transcription	50.171	33	0.0455109	down
GO:0031325	positive regulation of cellular metabolic process	87.3206	65	0.0471322	down
GO:0005515	protein binding	742.225	698	0.0520077	down
GO:0010557	positive regulation of macromolecule biosynthetic process	58.0222	40	0.0521694	down
GO:0009891	positive regulation of biosynthetic process	60.3201	42	0.0528017	down
GO:0048285	organelle fission	33.5112	20	0.0599789	down
GO:0006396	RNA processing	72.5757	53	0.062525	down
GO:0016563	transcription activator activity	44.4263	29	0.062525	down
GO:0031974	membrane-enclosed lumen	223.28	191	0.0633893	down
GO:0031328	positive regulation of cellular biosynthetic process	59.7457	42	0.0636819	down
GO:0005694	chromosome	52.4689	36	0.0688441	down
GO:0043233	organelle lumen	219.642	188	0.0688441	down
GO:0044427	chromosomal part	42.8943	28	0.0688441	down
GO:0070013	intracellular organelle lumen	218.493	187	0.0688441	down
GO:0000775	chromosome, centromeric region	17.4258	8	0.0712938	down
GO:0045944	positive regulation of transcription from RNA polymerase II promoter	31.5963	19	0.0749643	down
GO:0016043	cellular component organization	253.728	221	0.0751558	down
GO:0007049	cell cycle	110.683	88	0.081491	down
GO:0000122	negative regulation of transcription from RNA polymerase II promoter	23.7451	13	0.0826468	down
GO:0006357	regulation of transcription from RNA polymerase II promoter	67.9798	50	0.0826468	down
GO:0000087	M phase of mitotic cell cycle	33.7027	21	0.0860795	down
GO:0003713	transcription coactivator activity	26.043	15	0.0938963	down
GO:0000280	nuclear division	32.1707	20	0.0974628	down
GO:0007067	mitosis	32.1707	20	0.0974628	down
GO:0010558	negative regulation of macromolecule biosynthetic process	54.7669	39	0.0974628	down
Table S4. Demographic and environmental data
index	ID	sex	age	early status	early rainfall	SCI-F	SCI-M
1	LAO	F	7.36	4	270.4	-0.3856093	-0.52241845
2	HAV	F	7.11	13	280.1	0.461661203	1.3030539
3	NET	M	11.97	4	702.2	-0.20947336	0.7208306
4	LAT	M	11.59	3	716.6	-0.0853144	-0.878015
5	LAW	M	8.92	4	199.2	0.506491543	1.7584365
6	THR	M	10.55	8	422	-0.319276067	-2.3777085
7	RIK	M	9.23	NA	380.1	NA	NA
8	DHO	F	6.34	4	198	0.454224	-0.96813245
9	NIK	F	11.01	7	547.3	0.21232889	1.11919545
10	VEI	M	13.53	4	262.2	NA	NA
11	ODE	M	5.8	5	212.4	-0.406493733	-1.230359
12	SCE	F	10.97	2	424.3	-0.520617833	1.629578
13	OTI	M	18.75	NA	271.9	NA	NA
14	OPH	F	9.96	12	461.3	-0.188829937	-0.02389575
15	LOF	M	16.8	16	392.8	0.179244073	-0.8860635
16	OCE	M	13.23	9	443.8	1.325138933	-0.8945281
17	LUT	M	6.22	7	239.6	0.354122883	0.252442565
18	MAD	F	8.66	NA	220.5	NA	NA
19	KOR	M	7.25	13	340.7	0.9380716	0.3850899
20	KIZ	M	15.66	NA	462.8	NA	NA
21	FAX	F	11.32	15	337	0.169666853	0.07530965
22	NAP	F	15.77	13	412.6	NA	NA
23	LOG	M	10.64	11	350.7	-0.43243739	-0.133389
24	NOK	M	6.67	6	280.1	0.1812232	0.36759935
25	WOL	M	7.27	6	267.2	0.0255269	-0.9990805
26	LUX	F	6.69	15	275.2	-0.50772177	0.8703787
27	VIN	F	19.76	19	317.4	NA	NA
28	QUA	M	14.07	NA	214.5	NA	NA
29	VAZ	M	14.74	5	276	NA	NA
30	WIZ	M	11.85	2	606.6	0.4980208	-0.6448934
31	EAS	F	6.46	8	257.8	0.127537233	1.211794
32	HOK	F	11.72	10	423.2	-0.100373433	0.36817205
33	LAX	M	12.65	8	415.9	0.6420283	-0.018190715
34	ZAM	M	11.99	NA	337	NA	NA
35	TOR	M	16.66	NA	326.2	NA	NA
36	CAR	M	7.4	9	343.3	0.424152667	0.6364931
37	LYC	F	7.12	15	334.7	-0.352513733	-0.56316986
38	FAM	M	6.84	20	204	-1.307986233	-0.0343694
39	VEX	F	11.59	8	337	-0.466001957	0.92390655
40	WEN	F	21.37	5	363.6	0.47977995	-2.6580745
41	HUC	F	7.72	16	264.6	0.47783111	-0.82024385
42	DED	F	5.78	5	317.7	0.26435965	-1.1607869
43	LOU	M	7.25	9	214	0.438560367	1.8825625
44	PIT	M	6.48	2	338.9	-0.7560449	1.325622
45	NOZ	M	9.99	7	517.9	0.223508633	1.17933235
46	BAT	F	4.91	17	251.3	0.202330133	-2.03858335
47	BUT	M	8.36	NA	281.5	NA	NA
48	YOK	M	6.75	9	247	0.604935033	0.84237355
49	LYB	F	4.22	13	412.3	0.317352433	1.4190425
50	WON	F	6.83	3	319.2	0.466852983	-0.0283486
51	EUR	F	4.54	7	551.9	0.144907933	0.0824483
52	AMO	M	13.76	11	322.2	NA	NA
53	EVA	F	8.58	6	518.5	0.471769867	0.52521566
54	NJU	M	12.01	5	763	0.06423019	-1.5682155
55	COO	M	8.15	8	262.2	0.550405133	-1.6419525
56	WRI	M	10.73	2	245.2	0.7824479	0.13655645
57	SCO	M	11.23	NA	459.9	NA	NA
58	KIJ	M	6.26	15	214	1.519485367	1.3677687
59	WHE	F	13.75	2	216.3	-0.030356642	-1.159825015
60	CAC	F	5.42	9	349.7	0.126639667	0.297957
61	GAM	M	10.51	NA	448.2	NA	NA
62	WEI	F	4.94	4	229.6	-0.166366	-0.40710915
63	BOL	M	6.47	NA	259	NA	NA
Raw reads from 63 known Amboseli baboons (~30M/individual)
Map to Panu2.0 using bwa; filter for uniquely mapped reads with q >= 10
Uniquely mapped reads (~13.3M/individual; non-uniquely mapped and low quality reads discarded)
Count reads that overlap Panu2.0 NCBI RefSeq exons from ref_Panu_2.0_top_level.gff3
Uniquely mapped reads in exons (~8.6M/individual; non-exonic reads discarded)
Combine exon reads into per-gene read counts
Reads that do not map to the genome (~10.5M/individual)
Count reads that map uniquely (q >= 10) on one or both ends (with a gap)
Uniquely mapped reads associated with possible exon junctions
Count reads that overlap Panu2.0 NCBI RefSeq exons from ref_Panu_2.0_top_level.gff3
Uniquely mapped reads overlapping exon junctions (~3.2M/individual; non-exonic reads discarded)
Combine exon reads into per-gene read counts
Matrix of raw read counts per individual per gene
Calculate RPKM and quantile normalize across individuals and within genes; retain genes associated with segregating genetic variants
Matrix of normalized read counts per individual (n = 63) per gene (n = 10,409)
Bayesian sparse linear mixed model (BSLMM) to estimate sources of variance in gene expression levels
Cis, trans contributions to additive genetic variance; sex, age, and early social environmental effects
Linear mixed effects models (GEMMA) for eQTL mapping
Map cis-eQTL within 200 kb for each gene (1,787 genes with eQTL)
Beta binomial model to detect ASE
ASE in genes with multiple exonic heterozygotes (510 genes with ASE)
Figure S1. Detailed workflow for gene expression level estimation.
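The normalization step in the workflow above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, RPKM uses the standard reads-per-kilobase-per-million definition, and "quantile normalize" is assumed here to mean mapping ranks onto standard normal quantiles, which this section does not spell out.

```python
from statistics import NormalDist


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene model per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def quantile_normalize(values):
    """Map values onto standard normal quantiles by rank.

    Assumption: ties are broken by input order. The same function could be
    applied to one individual's genes or to one gene across individuals.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    normalized = [0.0] * n
    for rank, i in enumerate(order):
        # (rank + 0.5) / n keeps the probability strictly inside (0, 1)
        normalized[i] = NormalDist().inv_cdf((rank + 0.5) / n)
    return normalized
```

Under this reading, the normalized values depend only on the rank order of the inputs, so a monotone bias that shifts all RPKM values for one individual would leave the output unchanged.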
Figure S2. Elimination of GC bias via quantile normalization. Each plot shows gene GC
content (x-axis) versus the log of the ratio of the individual’s RPKM for that gene to mean
RPKM across all individuals. Data for three individuals are shown in pairs (A and B, C and D, E
and F), before (left) and after (right) quantile normalization.
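The two quantities plotted in Figure S2 can be computed along these lines. This is a sketch under stated assumptions: the helper names are hypothetical, GC content is taken as the fraction of G and C bases in a gene's sequence, and the natural logarithm is used because the caption does not state the log base.

```python
from math import log


def gc_content(sequence):
    """Fraction of bases in the sequence that are G or C."""
    seq = sequence.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def log_ratio_to_mean(rpkm_by_individual, individual):
    """Log of one individual's RPKM over the mean RPKM across individuals.

    rpkm_by_individual maps individual ID -> RPKM for a single gene.
    """
    mean_rpkm = sum(rpkm_by_individual.values()) / len(rpkm_by_individual)
    return log(rpkm_by_individual[individual] / mean_rpkm)
```

A trend in the log ratio as a function of GC content would indicate GC bias for that individual; a flat relationship is what the right-hand panels are meant to show.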
Uniquely mapped reads (q >= 10) for 63 baboons
Perform indel realignment and remove apparent PCR duplicates
Realigned uniquely mapped reads with unique start/end coordinates
Base quality recalibration with no known variant sites
Realigned, uniquely mapped reads with recalibrated base quality scores
Call putative variable sites using GATK UnifiedGenotyper
Unfiltered SNPs and genotypes
Hard filter for mapping quality of supporting reads, base pair position, strand bias
Filtered SNPs and genotypes
Base quality recalibration using high quality sites
Realigned, uniquely mapped reads with recalibrated base quality scores
Repeat until number of high quality SNPs stabilizes (x5)
Filter out monomorphic sites, violations of HWE, high percentage (>=19%) missing data, mean depth <5x
Set of filtered SNPs for eQTL mapping (n = 64,432)
BEAGLE imputation of remaining missing genotypes
Set of filtered genotypes used for eQTL mapping and ASE analysis
Figure S3. Detailed workflow for SNP genotyping.
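The final site-level filters in the workflow above (monomorphic sites, HWE violations, >=19% missing data, mean depth <5x) can be sketched as a single predicate. This is an illustration only: the function names are hypothetical, genotypes are assumed to be coded as 0, 1, 2 copies of the alternate allele with None for missing calls, and the chi-square cutoff for HWE (3.841, i.e. p < 0.05 with 1 degree of freedom) is an assumed value that this section does not specify.

```python
def hwe_chi_square(n_ref_hom, n_het, n_alt_hom):
    """Chi-square statistic comparing genotype counts to HWE expectations."""
    n = n_ref_hom + n_het + n_alt_hom
    p = (2 * n_ref_hom + n_het) / (2 * n)  # reference allele frequency
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_ref_hom, n_het, n_alt_hom)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def passes_site_filters(genotypes, mean_depth, max_missing=0.19,
                        min_depth=5.0, hwe_chi2_cutoff=3.841):
    """Return True if a SNP survives all four filters."""
    called = [g for g in genotypes if g is not None]
    if 1 - len(called) / len(genotypes) >= max_missing:
        return False  # too much missing data
    if mean_depth < min_depth:
        return False  # insufficient coverage
    counts = [called.count(k) for k in (0, 1, 2)]
    alt_alleles = counts[1] + 2 * counts[2]
    if alt_alleles == 0 or alt_alleles == 2 * len(called):
        return False  # monomorphic among called genotypes
    # hwe_chi2_cutoff is an assumed threshold, not taken from the source
    return hwe_chi_square(*counts) <= hwe_chi2_cutoff
```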
[Figure S4 image: density (y-axis) of all SNPs and of SNPs associated with eQTL genes, plotted against position relative to gene in kb (x-axis ticks at -200, -100, TSS, TES, 100, 200).]
Figure S4. Location of analyzed SNPs relative to genes. The locations of all SNPs tested in the
eQTL analysis are shown in gold relative to the 5’ most gene transcription start site (TSS) and
the 3’ most gene transcription end site (TES) for all 10,409 genes. The location of all SNPs
tested in association with eQTL genes are overplotted in blue. Gray shaded rectangle denotes the
region bounded by the TSS and TES, with gene lengths divided into 20 bins for visibility.
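The coordinate scheme described in the Figure S4 caption can be sketched as below. This is a hypothetical helper, not the authors' plotting code: it assumes a gene on the forward strand with tss < tes, reports flanking positions in kb relative to the nearer gene boundary, and assigns positions inside the gene to one of 20 equal-width bins.

```python
def locate_snp(pos, tss, tes, n_bins=20):
    """Classify a SNP position relative to a gene spanning tss..tes.

    Returns ("upstream", kb), ("downstream", kb), or ("gene", bin_index).
    Assumes a forward-strand gene with tss < tes (an illustrative choice).
    """
    if pos < tss:
        return ("upstream", (pos - tss) / 1000.0)    # negative kb
    if pos > tes:
        return ("downstream", (pos - tes) / 1000.0)  # positive kb
    fraction = (pos - tss) / (tes - tss)
    # Clamp so a SNP exactly at the TES falls in the last bin
    return ("gene", min(int(fraction * n_bins), n_bins - 1))
```

Binning the gene body by fraction of gene length puts genes of different lengths on a common axis, which is why the shaded region in the figure has a fixed width.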
Figure S5. Accuracy of genotype calls for SNPs independently typed in HapMap3. (A)
Distribution of correlations between SNPs called using RNA-seq data and SNPs called
independently by HapMap3 (n = 9,919 variants). (B) Estimated homozygosity levels for n = 69
YRI individuals at the same set of sites; outliers (denoted with red stars) reflect those individuals
with the lowest correlation between RNA-seq-based genotypes and HapMap3 genotypes. The
four starred outliers in (B) include the 3 lowest accuracy individuals in the boxplots in (A).
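The two per-individual summaries in Figure S5 can be sketched as follows. The caption does not say which correlation measure was used, so this example assumes a Pearson correlation on allele dosages (0, 1, 2) and defines homozygosity as the fraction of called genotypes that are homozygous; both function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length genotype vectors.

    Undefined (division by zero) if either vector is constant.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def homozygosity(genotypes):
    """Fraction of called genotypes (non-None) coded 0 or 2."""
    called = [g for g in genotypes if g is not None]
    return sum(g in (0, 2) for g in called) / len(called)
```

An individual whose heterozygous sites are often miscalled as homozygous would show both a lower correlation and an inflated homozygosity estimate, which is consistent with the overlap between outliers in panels (A) and (B).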
[Figure S6 image: panels A and B, with axes PC 1 (2.3%) and PC 2 (1.9%); each numbered sample is plotted for both the HapMap and RNA-seq genotype sets.]
Figure S6. PCA projection of YRI samples using the RNA-seq-based pipeline versus
independently typed SNPs. PCA projection of genotype data from the RNA-seq-based pipeline
and the HapMap3 data place individual samples very close together. (A) and (B) show the same
data, but (B) zooms in on the central cluster for better visibility.
Figure S7. Power simulations for masked eQTL based on effect size
Figure S7. Relationship between power to detect eQTL and simulated effect size, when the
true eQTL is masked. Purple line shows the baboon data; pink line shows the baboon data with
SNP density subsampled to match the YRI; orange line shows the YRI data. Masking the
simulated eQTL SNP demonstrates that the baboon data set has greater power to detect eQTL
due to both increased cis-regulatory SNP density and more extended LD. Subsampling the SNP
density in baboon to the level of the YRI data set reduces the difference in power but does not
remove it completely.
Figure S8. Correlation between eQTL detection and mean phyloP scores based on 100-way
vertebrate comparison. Genes with eQTL in both data sets or in one data set are less conserved
across vertebrates than genes for which no eQTL were detected (p = 7.69 x 10^-22).
Figure S9. PVE explained by demographic and early environmental variables. (A) QQ plots
of PVE explained by a variable of interest versus PVE explained by that variable with permuted
data, for age (upper left panel) and maternal social connectedness to males (SCI-M, upper right
panel). Bottom panels show the difference between evidence for significant PVE by sex for
genes on autosomes (bottom left) versus genes on the X chromosome (bottom right). (B)
Distribution of PVE explained by additive genetic variance, age, sex, and maternal social
connectedness to males across all genes.
Figure S10. Coverage by genotype call. Mean coverage by genotype class for (A) all SNPs
tested in the baboon eQTL analysis (n = 64,432), and (B) SNPs identified as eQTL (n = 1,693).
QQ plot of mean coverage in homozygotes for the reference allele versus homozygotes for the
alternate allele for (C) all SNPs and (D) SNPs identified as eQTL. The magnitude of increased
coverage in reference allele homozygotes indicates the degree of systematic reference allele
mapping bias (dashed line shows the expectation for no mapping bias). Reference allele
homozygotes tend to have higher coverage, on average, than alternate allele homozygotes (K-S
test: p < 2.2 x 10^-16 for all SNPs; p = 3.9 x 10^-5 for eQTL SNPs), suggesting some degree of
mapping bias; however, the effect is smaller for eQTL SNPs than for all SNPs (K-S D =
0.167 for all SNPs; K-S D = 0.084 for eQTL SNPs).
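The K-S D statistic reported above is the largest gap between two empirical cumulative distribution functions. A minimal sketch of the two-sample statistic is shown below; the function name and inputs are hypothetical (for example, per-SNP mean coverage in reference homozygotes versus alternate homozygotes), and the p-value calculation is omitted.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: maximum distance between ECDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value <= x
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d
```

D ranges from 0 (identical empirical distributions) to 1 (no overlap), so the smaller D for eQTL SNPs corresponds to a smaller coverage difference between genotype classes.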