Supplementary Materials for Tung et al, “The genetic architecture of gene expression levels in wild baboons”

J Tung, X Zhou, SC Alberts, M Stephens, Y Gilad

1. Supplementary Text
Text S1: Read mapping and SNP quality filtering pipeline
Text S2: Estimated accuracy of SNP genotypes using human RNA-seq data
Text S3: Possible confounds associated with eQTL mapping using RNA-seq data
Text S4: Demographic and environmental variables

2. Supplementary Tables
Table S1: Read mapping summary
Table S2: Gene Ontology analysis for genes with no eQTL in baboon or YRI
Table S3: Gene Ontology analysis for genes with eQTL in either or both baboon and YRI
Table S4: Demographic and environmental data

3. Supplementary Figures
Figure S1: Detailed workflow for gene expression level estimation
Figure S2: Elimination of GC bias via quantile normalization
Figure S3: Detailed workflow for SNP genotyping
Figure S4: Location of analyzed SNPs relative to genes
Figure S5: Accuracy of genotype calls for SNPs independently typed in HapMap3
Figure S6: PCA projection of YRI samples using the RNA-seq-based pipeline versus independently typed SNPs
Figure S7: Power simulations for masked eQTL based on effect size
Figure S8: Correlation between eQTL detection and mean phyloP scores based on 100way vertebrate comparison
Figure S9: PVE explained by demographic and early environmental variables
Figure S10: Coverage by genotype call

Text S1: Read mapping and SNP quality filtering pipeline

Read alignment was performed against the baboon genome, Panu2.0, using bwa 0.5.9 [1]. Alignments utilized a seed length of 25 bases, maximum edit distance of 2 mismatches in the seed, read trimming quality score threshold of 20, and the default maximum edit distance (4% after trimming, which allows no more than 3 mismatches in a 76 base pair read). Only uniquely mapped reads with mapping quality ≥10 were retained at this step.
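The retention rule at the end of the alignment step can be illustrated with a minimal Python sketch. It is not the pipeline actually used: it assumes SAM-formatted text records and assumes that unique hits carry the `XT:A:U` optional tag written by bwa's aln/samse workflow. The function names `is_retained` and `filter_sam` are hypothetical.

```python
# Sketch only: keep uniquely mapped reads with mapping quality >= 10.
# Assumption: input lines are SAM records, and unique hits are marked with
# the XT:A:U optional tag (as written by bwa aln/samse).

MIN_MAPQ = 10  # threshold stated in Text S1


def is_retained(sam_line, min_mapq=MIN_MAPQ):
    """Return True if a SAM record is mapped, unique, and passes MAPQ."""
    if sam_line.startswith("@"):
        return False  # header line, not a read
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    mapq = int(fields[4])
    if flag & 0x4:
        return False  # read is unmapped
    unique = "XT:A:U" in fields[11:]  # optional tags start at column 12
    return unique and mapq >= min_mapq


def filter_sam(lines, min_mapq=MIN_MAPQ):
    """Return only the records that pass the retention rule."""
    return [line for line in lines if is_retained(line, min_mapq)]


example = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t37\t4M\t*\t0\t0\tACGT\tIIII\tXT:A:U\tNM:i:0",
    "r2\t0\tchr1\t200\t5\t4M\t*\t0\t0\tACGT\tIIII\tXT:A:U\tNM:i:1",
    "r3\t0\tchr1\t300\t0\t4M\t*\t0\t0\tACGT\tIIII\tXT:A:R\tNM:i:0",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
kept = filter_sam(example)
```

In this toy input only `r1` is both unique and above the quality threshold, so it is the only record kept.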
Because reads that did not map to the genome could reflect reads spanning exon-exon junctions, we used the program jfinder [2,3] to separately align the first 35 and last 35 base pairs of each initially unmapped read, using a seed length of 20 bases. Total read counts for each individual thus reflected the sum of: i) reads that uniquely mapped directly to the genome; ii) those that uniquely mapped on both ends after splitting the reads with jfinder; and iii) those that uniquely mapped on one end after splitting with jfinder.

For gene expression analysis, we evaluated the total read counts for each gene and individual as the sum of the number of reads for that individual that overlapped that gene. We defined a “gene” as the union of all exon base pairs assigned to a given gene, based on the NCBI annotation run of the current baboon genome (‘ref_Panu_2.0_top_level.gff3’, downloaded September 2012).

For variant discovery and genotyping, we used the Genome Analysis Toolkit v. 1.2.6 (GATK: [4,5]). GATK relies on a known set of variants to perform recalibration of base quality scores; in the absence of known variants, all non-reference base calls are assumed to reflect sequencing errors. Because no validated reference set of known genetic variants is available for baboon, we performed an iterative bootstrapping procedure for base quality score recalibration. Specifically, after processing to remove possible PCR duplicates and performing sample-level read realignment at potential indels, we performed an initial round of base quality score recalibration and identified a set of variants using GATK’s UnifiedGenotyper and VariantFiltration walker (focusing on variants within 20kb of annotated exons, where most of our reads fell).
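The gene-level counting rule described in this section (a gene is the union of its exon base pairs, and a read counts toward a gene if it overlaps that union) can be sketched as follows. This is an illustrative Python example under stated assumptions, not the code used in the study: coordinates are assumed to be half-open `(start, end)` intervals on a single chromosome, and the function names are hypothetical.

```python
# Sketch only: count reads overlapping the union of a gene's exons.
# Assumption: intervals are half-open (start, end) on one chromosome.


def merge_intervals(exons):
    """Collapse overlapping or adjacent exon intervals into their union."""
    merged = []
    for start, end in sorted(exons):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlaps(read, intervals):
    """True if the read shares at least one base with any interval."""
    r_start, r_end = read
    return any(r_start < end and start < r_end for start, end in intervals)


def count_reads(reads, exons):
    """Number of reads overlapping the union of the gene's exons."""
    union = merge_intervals(exons)
    return sum(1 for read in reads if overlaps(read, union))


# Toy gene with two overlapping exons and one separate exon.
toy_exons = [(100, 200), (150, 300), (500, 600)]
toy_reads = [(90, 110), (300, 376), (250, 326), (590, 666), (400, 476)]
gene_union = merge_intervals(toy_exons)
gene_count = count_reads(toy_reads, toy_exons)
```

Merging exons first means a read that spans two overlapping exon annotations is counted once per gene rather than once per exon.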
From this call set, we constructed a set of high confidence variants with quality score ≥100 that passed all filters for variant confidence (variants failed if QD < 2.0), mapping quality (variants failed if MQ < 35.0), strand bias (variants failed if FS > 60.0), haplotype score (variants failed if HaplotypeScore > 13.0), mapping quality rank sum (variants failed if MQRankSum < -12.5) and read position bias (variants failed if ReadPosRankSum < -8.0). We then used this high confidence set as the set of “known sites” in a second round of base quality score recalibration. We repeated this process until the number of variants identified in consecutive rounds of recalibration stabilized (six rounds of recalibration and variant identification).

We again filtered the final call set as described above, and kept only sites that were polymorphic in Amboseli and did not deviate from Hardy-Weinberg equilibrium (p > 0.001). We also removed any site for which genotype data were missing from more than 20% of the individuals in the data set and for which mean coverage was <5x per individual. Finally, we removed sites with a quality score less than 100 and sites distant from expressed genes (>200 kb away), as we were primarily interested in eQTL mapping in cis.

Text S2: Estimated accuracy of SNP genotypes using human RNA-seq data

To assess the likely accuracy of the RNA-seq based genotyping calls we performed in the baboons, we investigated a similarly sized data set of RNA-seq reads from a human population [2]. Because this data set focused on samples from the HapMap consortium (n = 69 members of the Yoruba population from Ibadan, Nigeria), we were able to compare genotypes called using the RNA-seq pipeline to independently collected genotype data from HapMap Phase 3 (r27) [6]. To do so, we focused on 9,919 variants that were genotyped in both data sets. We then calculated the correlation between genotypes called in the RNA-seq-based pipeline and genotypes from HapMap, for each individual.
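The per-individual comparison described above can be sketched in Python as follows. This is a minimal illustration, not the analysis code: it assumes genotypes are coded as 0/1/2 alternate-allele counts with `None` for missing calls, uses a Pearson correlation (the text does not state which correlation was used), and the function names and toy genotypes are hypothetical.

```python
# Sketch only: correlation between RNA-seq-based and reference genotypes,
# computed separately for each individual over shared, non-missing sites.
# Assumptions: genotypes coded 0/1/2, None = missing, Pearson correlation.
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def per_individual_correlation(rnaseq, reference):
    """Map each individual to the correlation between its two call sets."""
    result = {}
    for individual, calls in rnaseq.items():
        pairs = [
            (a, b)
            for a, b in zip(calls, reference[individual])
            if a is not None and b is not None
        ]
        result[individual] = pearson(
            [p[0] for p in pairs], [p[1] for p in pairs]
        )
    return result


# Toy data: individual "A" is fully concordant; "B" has one discordant
# call and one missing call.
toy_rnaseq = {"A": [0, 1, 2, 1, 0], "B": [0, 1, 2, 2, None]}
toy_reference = {"A": [0, 1, 2, 1, 0], "B": [0, 1, 2, 1, 0]}
toy_r = per_individual_correlation(toy_rnaseq, toy_reference)
```

Summaries such as the median or mean across individuals can then be taken over the values of the returned dictionary.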
When including imputed genotypes in this analysis, the median correlation across individuals was 0.984 (mean = 0.976 ± 0.026); excluding imputed genotypes yielded a median correlation of 0.987 (mean = 0.980 ± 0.024) (Figure S5a). Similar to the results of [cites], our data support highly accurate genotype calls from RNA-seq reads. However, we also found that low accuracy was correlated with the level of apparent homozygosity in the genotype data: the three lowest accuracy YRI individuals were outliers in this regard (Figure S5b). In the baboon data, we had no individuals with unusually low homozygosity, but six individuals with unusually high homozygosity (>80% of genotype calls). These outliers were missing a median of 10.6% of data (requiring imputation), whereas all other individuals were missing a median of 0.6% of data. Notably, removing these six individuals from our analysis yielded results very similar to those from the full data set: 87.6% of the genes (n = 1566) identified when using all individuals were also identified with this subset.

Importantly, the available data from humans also support accurate variant discovery. Of the 30,938 sites that we identified from the RNA-seq data and that passed all of our filters, only 3.1% (967) did not have an assigned rsID in dbSNP release 138. These sites were likely enriched for false positives, as the transition/transversion ratio for this set was 1.42, versus 2.80 for the set of 30,938 sites as a whole.

Text S3: Possible confounds associated with eQTL mapping using RNA-seq data

We report the results of eQTL mapping in the Amboseli baboon population using both gene expression level and genotype data derived from the same RNA-seq data set. This strategy has the potential to introduce four possible confounds.

First, for genes with large effect cis-eQTLs, reads from heterozygotes at eQTL-linked sites might be biased towards the allele associated with higher gene expression levels.
If so, heterozygotes might be mistakenly genotyped as homozygotes for the high expressing allele, resulting in an underrepresentation of heterozygous genotypes relative to neutral expectations. To control for this possibility, we eliminated sites that violated Hardy-Weinberg expectations (n = 2,386) from our analyses. We note, however, that this scenario would not introduce false positives. Instead, it would lead to more conservative detection of additive eQTL effects, with the direction of an estimated eQTL effect still consistent with the true effect.

Second, SNP calling might be biased towards the reference allele. If so, more reads would be required to support a genotype call of homozygote alternate than a genotype call of homozygote reference. This bias would result in higher apparent expression levels for alternate allele homozygotes and lower expression levels for reference allele homozygotes, which could create false positive eQTLs. However, we observe no evidence for this scenario in our data set. For all tested SNPs (n = 64,432) and for eQTL SNPs only (n = 1,693), alternate allele homozygotes tend to have slightly lower coverage than reference allele homozygotes, and heterozygotes tend to have the highest coverage (because more reads are required to support inference of heterozygosity) (Figure S10). Thus, coverage and genotype do not covary additively, and this potential confound is unlikely to produce false positive eQTLs.

Third, read mapping might be biased towards the reference allele, such that reads carrying the alternate allele are less likely to map because they contain more mismatches to the reference genome. This possibility is consistent with our observation that alternate allele homozygotes tend to have slightly less coverage than reference allele homozygotes (Figure S10a,b).
While this difference in coverage is significant (Kolmogorov-Smirnov test: p < 2.2 x 10^-16 for all SNPs; p = 3.9 x 10^-5 for eQTL SNPs), the magnitude of the effect itself is modest (Figure S10c,d), probably because we allowed reads to map with up to 3 mismatches; Wittkopp and colleagues have shown that reference allele mapping bias is largely obviated by allowing reads to map with more mismatches. Further, systematic calling of false positive eQTLs due to biased read mapping would predict a bias towards negative effect sizes (i.e., eQTL effects suggesting that the alternate allele is associated with lower expression levels). Our data are not consistent with such a pattern: 47% of eQTL betas are negative, whereas 53% are positive. We note that reference allele mapping biases are, however, more likely to affect ASE analysis, producing a pattern of greater expression in the reference allele. Indeed, we do observe a bias towards negative betas in the ASE analysis (67.2% of n = 510 genes), although the magnitude and direction of ASE data agree well with eQTL evidence.

Fourth, lower mean coverage in homozygotes of either type relative to heterozygotes could induce false positive eQTLs in which the major allele was associated with lower gene expression levels. To test this possibility, we recoded eQTL effects to reflect the effect of the major allele instead of the effect of the alternate allele (i.e., a genotype of 0 = homozygous minor and a genotype of 2 = homozygous major). We observed a modest excess of eQTL for which the major allele was associated with lower gene expression levels (56%, binomial test p = 1.15 x 10^-7). This bias did not differ depending on whether the major allele was the reference allele or the alternate allele (Fisher’s Exact Test, p = 0.28), supporting minimal read mapping biases in our data.
Instead, it appears to be primarily driven by SNPs with low minor allele frequencies (proportion of negative betas for the lowest quartile of MAFs = 62.8%, p = 7.49 x 10^-8; highest quartile of MAFs = 48.6%, p = 0.602). At these sites, eQTL inference relies primarily on two genotype classes (the major allele homozygotes and heterozygotes) rather than three genotype classes. Because heterozygotes tend to have slightly higher coverage than homozygotes of both classes, spurious relationships between genotype and gene expression levels are much less likely to be observed when both types of homozygotes are well represented (i.e., MAFs are larger).

Along with the high genotype accuracy rates estimated from the Yoruba data, our analyses thus indicate that the set of eQTL we identified is largely robust to RNA-seq-specific confounds. The most salient source of error stems from low MAF sites, which are also the cases most vulnerable to sampling error and winner’s curse effects more generally (main text Fig 5), a problem that is not confined to RNA-seq-based eQTL mapping. Taken together, our findings argue that, as a general rule, eQTL associated with lower MAF SNPs should be treated with increased caution.

Text S4: Demographic and environmental variables

In addition to genetic effects, we also estimated the contribution of demographic variables (age and sex) and early environment-related variables (early life social status and social connectedness) to variation in gene expression levels. All individuals used in the PVE analysis were either born into the Amboseli study population or immigrated into the population as adults (males in this species disperse from their group of birth upon sexual maturity). Sex was known from direct observation of the study subjects.
As a consequence of the long-term nature of research on this population (monitoring has been conducted on a near-daily basis since 1971), ages were known to within a few days’ error for 52 of the 63 individuals in the data set. For the remaining 11 individuals used in the PVE analysis, 6 animals had birth dates estimated to be accurate within 1 year, 4 animals had birth dates estimated to be accurate within 2 years, and 1 had a birth date estimated to be less accurate than 2 years.

Early social status was measured using the proportional dominance rank of the individual’s mother, at the time of that individual’s conception. We chose to use proportional dominance rank instead of ordinal dominance rank in this analysis to break up the correlation between ordinal dominance rank values and group size, which could measure density-dependent resource competition instead (only larger groups contain the lowest rank values). Dominance ranks are assigned monthly using ad libitum observations of dyadic agonistic (aggressive or competitive) encounters within social groups [7,8].

Maternal social connectedness values were defined as the social connectedness of the individual’s mother, in the year of that female’s life during which the focal individual was born. Social connectedness is calculated on a yearly basis as the frequency with which a female is involved in affiliative interactions, relative to the median for all females in the population at the same time and controlling for observer effort (see [9,10]). Social connectedness is measured for females, but can focus on either female-female relationships (SCI-F) or a female’s relationship with adult males (SCI-M), which have independent effects on longevity in this population [9]. For SCI-F, affiliative interactions included both grooming interactions and close spatial proximity to other females. For SCI-M, only grooming interactions were used.

REFERENCES

1. Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25: 1754-1760.
2. Pickrell JK, Marioni JC, Pai AA, Degner JF, Engelhardt BE, et al. (2010) Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature 464: 768-772.
3. Pickrell JK, Pai AA, Gilad Y, Pritchard JK (2010) Noisy splicing drives mRNA isoform diversity in human cells. PLoS Genetics 6: e1001236.
4. DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, et al. (2011) A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genetics 43: 491-498.
5. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, et al. (2010) The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research 20: 1297-1303.
6. International HapMap 3 Consortium (2010) Integrating common and rare genetic variation in diverse human populations. Nature 467: 52-58.
7. Alberts SC, Watts HE, Altmann J (2003) Queuing and queue-jumping: long-term patterns of reproductive skew in male savannah baboons, Papio cynocephalus. Animal Behaviour 65: 821-840.
8. Hausfater G (1974) Dominance and reproduction in Baboons (Papio cynocephalus). Contributions to Primatology 7: 1-150.
9. Archie EA, Tung J, Clark M, Altmann J, Alberts SC (In review) Social affiliation matters: both same-sex and opposite-sex relationships predict survival in wild female baboons. Proceedings of the Royal Society B: Biological Sciences.
10. Runcie DE, Wiedmann RT, Archie EA, Altmann J, Wray GA, et al. (2013) Social environment influences the relationship between genotype and gene expression in wild baboons. Philosophical Transactions of the Royal Society B: Biological Sciences 368: 20120345.

Table S1.
Read mapping summary

index  ID  Total  Mapped  Unmapped  Unique  Exons  Junctions  Total_Exons
1  LAO  28235620  18570502  9665118  13889782  9370873  3201045  12571918
2  HAV  27488867  18507097  8981770  13125119  8101067  2931097  11032164
3  NET  27024563  16368164  10656399  12407791  6761125  2171006  8932131
4  LAT  27077953  18859256  8218697  14759720  8765292  2821173  11586465
5  LAW  28080996  16806868  11274128  11607624  6804911  2300480  9105391
6  THR  29522248  20771001  8751247  15700967  9776180  3125754  12901934
7  RIK  27677280  19784305  7892975  15063744  9901904  3119680  13021584
8  DHO  28074777  19456545  8618232  12228926  8830434  2932484  11762918
9  NIK  27367851  19397674  7970177  15532909  9047441  2814469  11861910
10  VEI  28620648  19380650  9239998  15791116  5144057  2240917  7384974
11  ODE  29649699  20666904  8982795  14628129  10024546  3256687  13281233
12  SCE  28157829  18970024  9187805  13898564  7491658  2796945  10288603
13  OTI  27906913  19181873  8725040  13064031  8020712  2682590  10703302
14  OPH  29274145  20556667  8717478  15255511  9606604  3006900  12613504
15  LOF  29307323  21174539  8132784  18160007  7716050  2163227  9879277
16  OCE  26997164  15663374  11333790  11008130  6483175  2524012  9007187
17  LUT  26129071  18401539  7727532  13043592  7803181  2444454  10247635
18  MAD  26286502  16898883  9387619  10329614  7502823  2697335  10200158
19  KOR  29992134  20534392  9457742  14499508  9975416  3629263  13604679
20  KIZ  26386405  16208282  10178123  8645866  6895382  3028682  9924064
21  FAX  27385179  17335149  10050030  12059163  8062395  2993049  11055444
22  NAP  29001094  17220457  11780637  10610563  7006916  2750349  9757265
23  LOG  26780027  18378069  8401958  12156957  8792079  2221263  11013342
24  NOK  28151763  17225372  10926391  10516283  8121055  3077334  11198389
25  WOL  33737482  18531740  15205742  13669393  6733362  2496220  9229582
26  LUX  30575711  19679305  10896406  11105474  7554823  3042602  10597425
27  VIN  35726525  24220741  11505784  18843582  11034495  3834221  14868716
28  QUA  33324526  21685880  11638646  14286996  8642849  3545658  12188507
29  VAZ  34379907  22436096  11943811  16612442  12237810  4700787  16938597
30  WIZ  34198847  21131631  13067216  10529981  8351489  3278531  11630020
31  EAS  34670118  24124470  10545648  15496561  10233100  3446621  13679721
32  HOK  31844219  21838132  10006087  14169633  9817243  3464400  13281643
33  LAX  32597159  20484922  12112237  12852326  9561573  3993455  13555028
34  ZAM  32942091  21818152  11123939  13219985  9288886  3569765  12858651
35  TOR  35812525  24898710  10913815  18405442  11591846  3910056  15501902
36  CAR  31707653  13579628  18128025  7629947  4554517  2476654  7031171
37  LYC  30401145  19630729  10770416  12158177  9304996  3444296  12749292
38  FAM  33275300  20803140  12472160  13417108  8765329  3975458  12740787
39  VEX  34018145  22756483  11261662  15827752  10134845  3677327  13812172
40  WEN  30264148  19162207  11101941  15257538  7830779  2562990  10393769
41  HUC  32952047  18236173  14715874  12078558  8420801  4897666  13318467
42  DED  34682291  22531415  12150876  12084907  9658020  3821578  13479598
43  LOU  33701586  20655828  13045758  12609485  8397256  3877048  12274304
44  PIT  35081496  24360475  10721021  16694750  11055248  3657317  14712565
45  NOZ  34554749  21394629  13160120  13516368  9964822  4427702  14392524
46  BAT  35431207  24638835  10792372  16752338  11057809  3915190  14972999
47  BUT  32879707  21959497  10920210  13707959  9411461  3587097  12998558
48  YOK  31025689  20570165  10455524  15330851  8852849  3610366  12463215
49  LYB  33083886  20827582  12256304  12670207  9175415  3898163  13073578
50  WON  33687192  21390373  12296819  16557747  9585575  3688205  13273780
51  EUR  34647992  16342620  18305372  9854659  6609676  4497395  11107071
52  AMO  30315547  20864959  9450588  16454442  11328705  3680624  15009329
53  EVA  33797805  24094410  9703395  18719723  11273140  3434348  14707488
54  NJU  34515910  24265603  10250307  19442223  9167778  2848803  12016581
55  COO  32102635  21157218  10945417  12916975  9366381  3396693  12763074
56  WRI  32528970  21436838  11092132  14669143  9957703  3697744  13655447
57  SCO  13191431  8586679  4604752  5651976  3655760  1288132  4943892
58  KIJ  31724195  21128559  10595636  12863693  10575562  3558757  14134319
59  WHE  21111426  13960560  7150866  9073011  5609626  1958378  7568004
60  CAC  26029270  17109577  8919693  10976405  8326047  3211354  11537401
61  GAM  18855838  11051054  7804784  6188928  4907306  2505598  7412904
62  WEI  16003239  10531947  5471292  7102739  4740896  1951922  6692818
63  BOL  26790552  18155252  8635300  13080794  8689416  3468565  12157981
mean    29980099  19497616  10482483  13340696  8594134  3194093  11788228
standard deviation    4508501  3309226  2381469  2970185  1817659  705120  2370100

Table S2. Gene Ontology analysis for genes with no eQTL in baboon or YRI

GO ID  Category  Expected  Observed  B-H FDR  Direction of enrichment
GO:0003676  nucleic acid binding  1216.8  1300  4.81E-07  up
GO:0005634  nucleus  2135.27  2231  1.10E-06  up
GO:0060255  regulation of macromolecule metabolic process  1295.23  1376  1.39E-06  up
GO:0019222  regulation of metabolic process  1459.36  1543  1.42E-06  up
GO:0031323  regulation of cellular metabolic process  1390.63  1471  2.63E-06  up
GO:0016070  RNA metabolic process  1005.78  1075  4.22E-06  up
GO:0080090  regulation of primary metabolic process  1332.42  1410  4.22E-06  up
GO:0019219  regulation of nucleobase, nucleoside, nucleotide and nucleic acid metabolic process  1095.53  1166  6.57E-06  up
GO:0051171  regulation of nitrogen compound metabolic process  1102.8  1173  7.25E-06  up
GO:0010467  gene expression  1541.02  1620  7.83E-06  up
GO:0090304  nucleic acid metabolic process  1505.44  1583  9.89E-06  up
GO:0044260  cellular macromolecule metabolic process  2328.5  2415  1.06E-05  up
GO:0009889  regulation of biosynthetic process  1102.8  1170  1.97E-05  up
GO:0010556  regulation of macromolecule biosynthetic process  1072.08  1138  2.31E-05  up
GO:0031326  regulation of cellular biosynthetic process  1097.95  1164  2.63E-05  up
GO:0051252  regulation of RNA metabolic process  617.7  669  2.63E-05  up
GO:0006350  transcription  1016.29  1080  2.72E-05  up
GO:0010468  regulation of gene expression  1103.61  1169  3.17E-05  up
GO:0045449  regulation of transcription  968.592  1030  3.92E-05  up
GO:0006351  transcription, DNA-dependent  660.551  711  7.69E-05  up
GO:0003677  DNA binding  801.231  855  0.000132994  up
GO:0032774  RNA biosynthetic process  662.976  712  0.00015385  up
GO:0006139  nucleobase, nucleoside, nucleotide and nucleic acid metabolic process  1705.14  1776  0.000161347  up
GO:0006355  regulation of transcription, DNA-dependent  595.87  642  0.000189046  up
GO:0009059  macromolecule biosynthetic process  1413.27  1477  0.000403065  up
GO:0034645  cellular macromolecule biosynthetic process  1395.48  1459  0.000403065  up
GO:0043234  protein complex  1032.46  1089  0.000403065  up
GO:0044451  nucleoplasm part  299.956  331  0.000705323  up
GO:0034641  cellular nitrogen compound metabolic process  1795.7  1862  0.0007835  up
GO:0006996  organelle organization  643.572  688  0.000802833  up
GO:0032991  macromolecular complex  1296.04  1355  0.000927737  up
GO:0030528  transcription regulator activity  537.657  577  0.00160019  up
GO:0050789  regulation of biological process  2334.97  2402  0.00167445  up
GO:0006807  nitrogen compound metabolic process  1828.84  1892  0.00175189  up
GO:0009058  biosynthetic process  1692.21  1754  0.00175189  up
GO:0044249  cellular biosynthetic process  1659.06  1720  0.00192109  up
GO:0016055  Wnt receptor signaling pathway  67.9146  81  0.00219049  up
GO:0050794  regulation of cellular process  2233.1  2298  0.00219049  up
GO:0043170  macromolecule metabolic process  2474.84  2537  0.00539685  up
GO:0005667  transcription factor complex  77.6167  91  0.00595511  up
GO:0044237  cellular metabolic process  2925.99  2988  0.00595511  up
GO:0003723  RNA binding  350.892  380  0.00645126  up
GO:0043229  intracellular organelle  3642.33  3700  0.00826617  up
GO:0031981  nuclear lumen  759.189  800  0.00837838  up
GO:0043226  organelle  3645.56  3703  0.00837838  up
GO:0065007  biological regulation  2442.5  2502  0.00873111  up
GO:0043231  intracellular membrane-bounded organelle  3375.52  3434  0.00887101  up
GO:0005654  nucleoplasm  479.445  512  0.00973108  up
GO:0043227  membrane-bounded organelle  3377.14  3435  0.00975924  up
GO:0005488  binding  4302.88  4352  0.0106872  up
GO:0010629  negative regulation of gene expression  216.68  238  0.0154158  up
GO:0044424  intracellular part  4209.09  4258  0.0154158  up
GO:0044428  nuclear part  949.996  992  0.0174508  up
GO:0005622  intracellular  4308.54  4354  0.0259786  up
GO:0051254  positive regulation of RNA metabolic process  185.148  204  0.0262778  up
GO:0015030  Cajal body  29.1063  36  0.0264874  up
GO:0003712  transcription cofactor activity  173.021  191  0.0296204  up
GO:0009987  cellular process  4162.2  4208  0.0342128  up
GO:0045893  positive regulation of transcription, DNA-dependent  181.914  200  0.0367732  up
GO:0010605  negative regulation of macromolecule metabolic process  328.254  352  0.0440579  up
GO:0045935  positive regulation of nucleobase, nucleoside, nucleotide and nucleic acid metabolic process  237.701  258  0.0440579  up
GO:0051173  positive regulation of nitrogen compound metabolic process  244.978  265  0.0588737  up
GO:0006366  transcription from RNA polymerase II promoter  363.02  387  0.0646731  up
GO:0044446  intracellular organelle part  2092.42  2140  0.0654513  up
GO:0045934  negative regulation of nucleobase, nucleoside, nucleotide and nucleic acid metabolic process  215.063  233  0.08962  up
GO:0010604  positive regulation of macromolecule metabolic process  367.871  391  0.0934134  up
GO:0008380  RNA splicing  167.361  183  0.0983268  up
GO:0009892  negative regulation of metabolic process  347.658  370  0.0983268  up
GO:0004175  endopeptidase activity  101.063  77  6.48E-05  down
GO:0031224  intrinsic to membrane  1237.82  1179  0.00130188  down
GO:0044425  membrane part  1522.42  1461  0.00175189  down
GO:0016021  integral to membrane  1216  1159  0.00176057  down
GO:0070011  peptidase activity, acting on L-amino acid peptides  167.361  144  0.00452098  down
GO:0008233  peptidase activity  174.638  151  0.00494679  down
GO:0000323  lytic vacuole  109.149  91  0.00873111  down
GO:0005764  lysosome  109.149  91  0.00873111  down
GO:0005886  plasma membrane  856.209  812  0.00908116  down
GO:0016020  membrane  1978.42  1925  0.0218727  down
GO:0016757  transferase activity, transferring glycosyl groups  88.9358  74  0.0262778  down
GO:0005773  vacuole  128.553  111  0.0268413  down
GO:0034599  cellular response to oxidative stress  25.0637  17  0.0440579  down
GO:0044459  plasma membrane part  485.913  456  0.0462556  down
GO:0003824  catalytic activity  2020.46  1974  0.0828394  down
GO:0019882  antigen processing and presentation  11.3191  6  0.0843294  down

Table S3. Gene Ontology analysis for genes with eQTL in either or both baboon and YRI

GO ID  Category  Expected  Observed  B-H FDR  Direction of enrichment
GO:0004175  endopeptidase activity  23.9366  48  2.77E-05  up
GO:0031224  intrinsic to membrane  293.175  352  0.000556877  up
GO:0044425  membrane part  360.58  422  0.00074937  up
GO:0016021  integral to membrane  288.005  345  0.000753081  up
GO:0002483  antigen processing and presentation of endogenous peptide antigen  1.14895  6  0.00185626  up
GO:0019885  antigen processing and presentation of endogenous peptide antigen via MHC class I  1.14895  6  0.00185626  up
GO:0070011  peptidase activity, acting on L-amino acid peptides  39.6389  63  0.00188989  up
GO:0008233  peptidase activity  41.3624  65  0.00206896  up
GO:0000323  lytic vacuole  25.8515  44  0.00366681  up
GO:0005764  lysosome  25.8515  44  0.00366681  up
GO:0005886  plasma membrane  202.791  247  0.00381631  up
GO:0019883  antigen processing and presentation of endogenous antigen  1.34045  6  0.00730263  up
GO:0016020  membrane  468.582  522  0.00906815  up
GO:0016757  transferase activity, transferring glycosyl groups  21.0642  36  0.0109097  up
GO:0005773  vacuole  30.4473  48  0.0113149  up
GO:0034599  cellular response to oxidative stress  5.93627  14  0.0185944  up
GO:0044459  plasma membrane part  115.087  145  0.0195254  up
GO:0003824  catalytic activity  478.54  525  0.0349915  up
GO:0019882  antigen processing and presentation  2.68089  8  0.0356264  up
GO:0004252  serine-type endopeptidase activity  5.93627  13  0.0476448  up
GO:0042995  cell projection  54.9583  74  0.0479857  up
GO:0048002  antigen processing and presentation of peptide antigen  2.29791  7  0.0482462  up
GO:0005903  brush border  2.87239  8  0.0505191  up
GO:0002474  antigen processing and presentation of peptide antigen via MHC class I  1.91492  6  0.0712938  up
GO:0004197  cysteine-type endopeptidase activity  7.6597  15  0.0712938  up
GO:0005887  integral to plasma membrane  61.4691  80  0.0743323  up
GO:0016763  transferase activity, transferring pentosyl groups  4.9788  11  0.0749643  up
GO:0031226  intrinsic to plasma membrane  62.4266  81  0.0749643  up
GO:0003676  nucleic acid binding  288.196  205  2.06E-07  down
GO:0005634  nucleus  505.732  410  4.72E-07  down
GO:0060255  regulation of macromolecule metabolic process  306.771  226  5.95E-07  down
GO:0019222  regulation of metabolic process  345.644  262  6.09E-07  down
GO:0031323  regulation of cellular metabolic process  329.367  249  1.13E-06  down
GO:0016070  RNA metabolic process  238.217  169  1.80E-06  down
GO:0080090  regulation of primary metabolic process  315.58  238  1.80E-06  down
GO:0019219  regulation of nucleobase, nucleoside, nucleotide and nucleic acid metabolic process  259.472  189  2.81E-06  down
GO:0051171  regulation of nitrogen compound metabolic process  261.196  191  3.10E-06  down
GO:0010467  gene expression  364.985  286  3.35E-06  down
GO:0090304  nucleic acid metabolic process  356.559  279  4.23E-06  down
GO:0044260  cellular macromolecule metabolic process  551.498  465  4.54E-06  down
GO:0009889  regulation of biosynthetic process  261.196  194  8.43E-06  down
GO:0010556  regulation of macromolecule biosynthetic process  253.919  188  9.87E-06  down
GO:0031326  regulation of cellular biosynthetic process  260.047  194  1.13E-05  down
GO:0051252  regulation of RNA metabolic process  146.3  95  1.13E-05  down
GO:0006350  transcription  240.706  177  1.16E-05  down
GO:0010468  regulation of gene expression  261.387  196  1.36E-05  down
GO:0045449  regulation of transcription  229.408  168  1.68E-05  down
GO:0006351  transcription, DNA-dependent  156.449  106  3.29E-05  down
GO:0003677  DNA binding  189.769  136  5.69E-05  down
GO:0032774  RNA biosynthetic process  157.024  108  6.58E-05  down
GO:0006139  nucleobase, nucleoside, nucleotide and nucleic acid metabolic process  403.858  333  6.90E-05  down
GO:0006355  regulation of transcription, DNA-dependent  141.13  95  8.09E-05  down
GO:0009059  macromolecule biosynthetic process  334.729  271  0.00017241  down
GO:0034645  cellular macromolecule biosynthetic process  330.516  267  0.00017241  down
GO:0043234  protein complex  244.536  188  0.00017241  down
GO:0044451  nucleoplasm part  71.0437  40  0.000301701  down
GO:0034641  cellular nitrogen compound metabolic process  425.305  359  0.000335141  down
GO:0006996  organelle organization  152.428  108  0.000343411  down
GO:0032991  macromolecular complex  306.962  248  0.000396838  down
GO:0030528  transcription regulator activity  127.343  88  0.000684478  down
GO:0050789  regulation of biological process  553.03  486  0.000716243  down
GO:0006807  nitrogen compound metabolic process  433.156  370  0.00074937  down
GO:0009058  biosynthetic process  400.794  339  0.00074937  down
GO:0044249  cellular biosynthetic process  392.943  332  0.000821743  down
GO:0050794  regulation of cellular process  528.902  464  0.000959832  down
GO:0043170  macromolecule metabolic process  586.159  524  0.00225831  down
GO:0005667  transcription factor complex  18.3833  5  0.00249422  down
GO:0044237  cellular metabolic process  693.011  631  0.00249422  down
GO:0003723  RNA binding  83.1077  54  0.0027032  down
GO:0043229  intracellular organelle  862.674  805  0.00346513  down
GO:0031981  nuclear lumen  179.811  139  0.00351492  down
GO:0043226  organelle  863.44  806  0.00351492  down
GO:0065007  biological regulation  578.499  519  0.00366681  down
GO:0043231  intracellular membrane-bounded organelle  799.481  741  0.0037268  down
GO:0005654  nucleoplasm  113.555  81  0.00409069  down
GO:0043227  membrane-bounded organelle  799.864  742  0.00410375  down
GO:0005488  binding  1019.12  970  0.00449524  down
GO:0010629  negative regulation of gene expression  51.32  30  0.00648772  down
GO:0044424  intracellular part  996.91  948  0.00648772  down
GO:0044428  nuclear part  225.004  183  0.00730263  down
GO:0005622  intracellular  1020.46  975  0.0107756  down
GO:0051254  positive regulation of RNA metabolic process  43.8518  25  0.0109097  down
GO:0003712  transcription cofactor activity  40.9794  23  0.0124891  down
GO:0009987  cellular process  985.803  940  0.0144284  down
GO:0045893  positive regulation of transcription, DNA-dependent  43.0858  25  0.0155112  down
GO:0010605  negative regulation of macromolecule metabolic process  77.746  54  0.0185944  down
GO:0045935  positive regulation of nucleobase, nucleoside, nucleotide and nucleic acid metabolic process  56.2988  36  0.0185944  down
GO:0051173  positive regulation of nitrogen compound metabolic process  58.0222  38  0.0248561  down
GO:0006366  transcription from RNA polymerase II promoter  85.9801  62  0.0273092  down
GO:0044446  intracellular organelle part  495.583  448  0.0276423  down
GO:0045934  negative regulation of nucleobase, nucleoside, nucleotide and nucleic acid metabolic process  50.937  33  0.0378674  down
GO:0010604  positive regulation of macromolecule metabolic process  87.1291  64  0.0394761  down
GO:0008380  RNA splicing  39.6389  24  0.0415643  down
GO:0009892  negative regulation of metabolic process  82.3418  60  0.0415643  down
GO:0003700  transcription factor activity  67.0224  47  0.0431062  down
GO:0006397  mRNA processing  39.4475  24  0.0431062  down
GO:0008134  transcription factor binding  58.7882  40  0.0431062  down
GO:0010628  positive regulation of gene expression  51.703  34  0.0431062  down
GO:0016071  mRNA metabolic process  46.9157  30  0.0431062  down
GO:0051172  negative regulation of nitrogen compound metabolic process  51.703  34  0.0431062  down
GO:0016604  nuclear body  27.9579  15  0.0435841  down
GO:0043565  sequence-specific DNA binding  40.5964  25  0.0438237  down
GO:0009893  positive regulation of metabolic process  90.9589  68  0.0448117  down
GO:0044422  organelle part  499.412  455  0.0448117  down
GO:0016481  negative regulation of transcription  45.3837  29  0.0449439  down
GO:0045941  positive regulation of transcription  50.171  33  0.0455109  down
GO:0031325  positive regulation of cellular metabolic process  87.3206  65  0.0471322  down
GO:0005515  protein binding  742.225  698  0.0520077  down
GO:0010557  positive regulation of macromolecule biosynthetic process  58.0222  40  0.0521694  down
GO:0009891  positive regulation of biosynthetic process  60.3201  42  0.0528017  down
GO:0048285  organelle fission  33.5112  20  0.0599789  down
GO:0006396  RNA processing  72.5757  53  0.062525  down
GO:0016563  transcription activator activity  44.4263  29  0.062525  down
GO:0031974  membrane-enclosed lumen  223.28  191  0.0633893  down
GO:0031328  positive regulation of cellular biosynthetic process  59.7457  42  0.0636819  down
GO:0005694  chromosome  52.4689  36  0.0688441  down
GO:0043233  organelle lumen  219.642  188  0.0688441  down
GO:0044427  chromosomal part  42.8943  28  0.0688441  down
GO:0070013  intracellular organelle lumen  218.493  187  0.0688441  down
GO:0000775  chromosome, centromeric region  17.4258  8  0.0712938  down
GO:0045944  positive regulation of transcription from RNA polymerase II promoter  31.5963  19  0.0749643  down
GO:0016043  cellular component organization  253.728  221  0.0751558  down
GO:0007049  cell cycle  110.683  88  0.081491  down
GO:0000122  negative regulation of transcription from RNA polymerase II promoter  23.7451  13  0.0826468  down
GO:0006357  regulation of transcription from RNA polymerase II promoter  67.9798  50  0.0826468  down
GO:0000087  M phase of mitotic cell cycle  33.7027  21  0.0860795  down
GO:0003713  transcription coactivator activity  26.043  15  0.0938963  down
GO:0000280  nuclear division  32.1707  20  0.0974628  down
GO:0007067  mitosis  32.1707  20  0.0974628  down
GO:0010558  negative regulation of macromolecule biosynthetic process  54.7669  39  0.0974628  down

Table S4.
Demographic and environmental data

index  ID   sex  age    early status  early rainfall  SCI-F         SCI-M
1      LAO  F    7.36   4             270.4           -0.3856093    -0.52241845
2      HAV  F    7.11   13            280.1           0.461661203   1.3030539
3      NET  M    11.97  4             702.2           -0.20947336   0.7208306
4      LAT  M    11.59  3             716.6           -0.0853144    -0.878015
5      LAW  M    8.92   4             199.2           0.506491543   1.7584365
6      THR  M    10.55  8             422             -0.319276067  -2.3777085
7      RIK  M    9.23   NA            380.1           NA            NA
8      DHO  F    6.34   4             198             0.454224      -0.96813245
9      NIK  F    11.01  7             547.3           0.21232889    1.11919545
10     VEI  M    13.53  4             262.2           NA            NA
11     ODE  M    5.8    5             212.4           -0.406493733  -1.230359
12     SCE  F    10.97  2             424.3           -0.520617833  1.629578
13     OTI  M    18.75  NA            271.9           NA            NA
14     OPH  F    9.96   12            461.3           -0.188829937  -0.02389575
15     LOF  M    16.8   16            392.8           0.179244073   -0.8860635
16     OCE  M    13.23  9             443.8           1.325138933   -0.8945281
17     LUT  M    6.22   7             239.6           0.354122883   0.252442565
18     MAD  F    8.66   NA            220.5           NA            NA
19     KOR  M    7.25   13            340.7           0.9380716     0.3850899
20     KIZ  M    15.66  NA            462.8           NA            NA
21     FAX  F    11.32  15            337             0.169666853   0.07530965
22     NAP  F    15.77  13            412.6           NA            NA
23     LOG  M    10.64  11            350.7           -0.43243739   -0.133389
24     NOK  M    6.67   6             280.1           0.1812232     0.36759935
25     WOL  M    7.27   6             267.2           0.0255269     -0.9990805
26     LUX  F    6.69   15            275.2           -0.50772177   0.8703787
27     VIN  F    19.76  19            317.4           NA            NA
28     QUA  M    14.07  NA            214.5           NA            NA
29     VAZ  M    14.74  5             276             NA            NA
30     WIZ  M    11.85  2             606.6           0.4980208     -0.6448934
31     EAS  F    6.46   8             257.8           0.127537233   1.211794
32     HOK  F    11.72  10            423.2           -0.100373433  0.36817205
33     LAX  M    12.65  8             415.9           0.6420283     -0.018190715
34     ZAM  M    11.99  NA            337             NA            NA
35     TOR  M    16.66  NA            326.2           NA            NA
36     CAR  M    7.4    9             343.3           0.424152667   0.6364931
37     LYC  F    7.12   15            334.7           -0.352513733  -0.56316986
38     FAM  M    6.84   20            204             -1.307986233  -0.0343694
39     VEX  F    11.59  8             337             -0.466001957  0.92390655
40     WEN  F    21.37  5             363.6           0.47977995    -2.6580745
41     HUC  F    7.72   16            264.6           0.47783111    -0.82024385
42     DED  F    5.78   5             317.7           0.26435965    -1.1607869
43     LOU  M    7.25   9             214             0.438560367   1.8825625
44     PIT  M    6.48   2             338.9           -0.7560449    1.325622
45     NOZ  M    9.99   7             517.9           0.223508633   1.17933235
46     BAT  F    4.91   17            251.3           0.202330133   -2.03858335
47     BUT  M    8.36   NA            281.5           NA            NA
48     YOK  M    6.75   9             247             0.604935033   0.84237355
49     LYB  F    4.22   13            412.3           0.317352433   1.4190425
50     WON  F    6.83   3             319.2           0.466852983   -0.0283486
51     EUR  F    4.54   7             551.9           0.144907933   0.0824483
52     AMO  M    13.76  11            322.2           NA            NA
53     EVA  F    8.58   6             518.5           0.471769867   0.52521566
54     NJU  M    12.01  5             763             0.06423019    -1.5682155
55     COO  M    8.15   8             262.2           0.550405133   -1.6419525
56     WRI  M    10.73  2             245.2           0.7824479     0.13655645
57     SCO  M    11.23  NA            459.9           NA            NA
58     KIJ  M    6.26   15            214             1.519485367   1.3677687
59     WHE  F    13.75  2             216.3           -0.030356642  -1.159825015
60     CAC  F    5.42   9             349.7           0.126639667   0.297957
61     GAM  M    10.51  NA            448.2           NA            NA
62     WEI  F    4.94   4             229.6           -0.166366     -0.40710915
63     BOL  M    6.47   NA            259             NA            NA

Raw reads from 63 known Amboseli baboons (~30M/individual)
Map to Panu2.0 using bwa; filter for uniquely mapped reads with q >= 10
Uniquely mapped reads (~13.3M/individual; non-uniquely mapped and low quality reads discarded)
Reads that do not map to the genome (~10.5M/individual)
Count reads that overlap Panu2.0 NCBI RefSeq exons from ref_Panu_2.0_top_level.gff3
Count reads that map uniquely (q >= 10) on one or both ends (with a gap)
Uniquely mapped reads in exons (~8.6M/individual; non-exonic reads discarded)
Combine exon reads into per-gene read counts
Uniquely mapped reads associated with possible exon junctions
Combine exon reads into per-gene read counts
Count reads that overlap Panu2.0 NCBI RefSeq exons from ref_Panu_2.0_top_level.gff3
Uniquely mapped reads overlapping exon junctions (~3.2M/individual; non-exonic reads discarded)
Matrix of raw read counts per individual per gene
Calculate RPKM and quantile normalize across individuals and within genes; retain genes associated with segregating genetic variants
Matrix of normalized read counts per individual (n = 63) per gene (n = 10,409)
Bayesian sparse linear mixed model (BSLMM) to estimate sources of variance in gene expression levels
Linear mixed effects models (GEMMA) for eQTL mapping
Beta binomial model to detect ASE
Map cis-eQTL within 200 kb for each gene (1,787 genes with eQTL)
ASE in genes with multiple exonic heterozygotes (510 genes with ASE)
Cis, trans contributions to additive genetic variance; sex, age, and early social environmental effects
Figure S1. Detailed workflow for gene expression level estimation.

Figure S2. Elimination of GC bias via quantile normalization. Each plot shows gene GC content (x-axis) versus the log of the ratio of the individual’s RPKM for that gene to the mean RPKM across all individuals. Data for three individuals are shown in pairs (A and B, C and D, E and F), before (left) and after (right) quantile normalization.

Uniquely mapped reads (q >= 10) for 63 baboons
Perform indel realignment and remove apparent PCR duplicates
Realigned uniquely mapped reads with unique start/end coordinates
Base quality recalibration with no known variant sites
Realigned, uniquely mapped reads with recalibrated base quality scores
Realigned, uniquely mapped reads with recalibrated base quality scores
Call putative variable sites using GATK UnifiedGenotyper
Unfiltered SNPs and genotypes
Repeat until number of high quality SNPs stabilizes (x5)
Hard filter for mapping quality of supporting reads, base pair position, strand bias
Filtered SNPs and genotypes
Filter out monomorphic sites, violations of HWE, high percentage (>=19%) missing data, mean depth <5x
Set of filtered SNPs for eQTL mapping (n = 64,432)
BEAGLE imputation of remaining missing genotypes
Set of filtered genotypes used for eQTL mapping and ASE analysis
Base quality recalibration using high quality sites
Figure S3. Detailed workflow for SNP genotyping.

[Figure S4 plot: density (y-axis) versus position relative to gene in kb (x-axis, -200 to 200, with TSS and TES marked); legend: all SNPs, SNPs associated with eQTL genes.]
Figure S4. Location of analyzed SNPs relative to genes. The locations of all SNPs tested in the eQTL analysis are shown in gold relative to the 5’ most gene transcription start site (TSS) and the 3’ most gene transcription end site (TES) for all 10,409 genes.
The locations of all SNPs tested in association with eQTL genes are overplotted in blue. The gray shaded rectangle denotes the region bounded by the TSS and TES, with gene lengths divided into 20 bins for visibility.

Figure S5. Accuracy of genotype calls for SNPs independently typed in HapMap3. (A) Distribution of correlations between SNPs called using RNA-seq data and SNPs called independently by HapMap3 (n = 9,919 variants). (B) Estimated homozygosity levels for n = 69 YRI individuals at the same set of sites; outliers (denoted with red stars) reflect those individuals with the lowest correlation between RNA-seq-based genotypes and HapMap3 genotypes. The four starred outliers in (B) include the three lowest-accuracy individuals in the boxplots in (A).

[Figure S6 plot, panels A and B: PC 1 (2.3%) on the x-axis versus PC 2 (1.9%) on the y-axis; legend: HapMap, RNA-seq; points are labeled by sample number.]
Figure S6. PCA projection of YRI samples using the RNA-seq-based pipeline versus independently typed SNPs. PCA projections of genotype data from the RNA-seq-based pipeline and from the HapMap3 data place individual samples very close together. (A) and (B) show the same data, but (B) zooms in on the central cluster for better visibility.

Figure S7. Power simulations for masked eQTL based on effect size. Relationship between power to detect eQTL and simulated effect size, when the true eQTL is masked. Purple line shows the baboon data; pink line shows the baboon data with SNP density subsampled to match the YRI; orange line shows the YRI data. Masking the simulated eQTL SNP demonstrates that the baboon data set has greater power to detect eQTL due to both increased cis-regulatory SNP density and more extended LD. Subsampling the SNP density in baboon to the level of the YRI data set reduces the difference in power but does not remove it completely.

Figure S8. Correlation between eQTL detection and mean phyloP scores based on 100-way vertebrate comparison. Genes with eQTL in both data sets or in one data set are less conserved across vertebrates than genes for which no eQTL were detected (p = 7.69 x 10^-22).

Figure S9. PVE explained by demographic and early environmental variables. (A) QQ plots of PVE explained by a variable of interest versus PVE explained by that variable with permuted data, for age (upper left panel) and maternal social connectedness to males (SCI-M, upper right panel). Bottom panels show the difference between evidence for significant PVE by sex for genes on autosomes (bottom left) versus genes on the X chromosome (bottom right). (B) Distribution of PVE explained by additive genetic variance, age, sex, and maternal social connectedness to males across all genes.

Figure S10. Coverage by genotype call. Mean coverage by genotype class for (A) all SNPs tested in the baboon eQTL analysis (n = 64,432), and (B) SNPs identified as eQTL (n = 1,693). QQ plot of mean coverage in homozygotes for the reference allele versus homozygotes for the alternate allele for (C) all SNPs and (D) SNPs identified as eQTL. The magnitude of increased coverage in reference allele homozygotes indicates the degree of systematic reference allele mapping bias (dashed line shows the expectation for no mapping bias).
Reference allele homozygotes tend to have higher coverage, on average, than alternate allele homozygotes (K-S test: p < 2.2 x 10^-16 for all SNPs; p = 3.9 x 10^-5 for eQTL SNPs), suggesting some degree of mapping bias; however, the effect is smaller for eQTL SNPs than for all SNPs (K-S D = 0.167 for all SNPs; K-S D = 0.084 for eQTL SNPs).
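
The K-S D statistic quoted for Figure S10 is the largest vertical gap between two empirical cumulative distribution functions. The following is a minimal sketch in Python using only the standard library. The function name `ks_statistic` and the coverage values are hypothetical and purely illustrative; they are not the authors' code or data, and the sketch computes only D, not the associated p-value.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: the largest gap between two empirical CDFs."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # Both ECDFs are step functions that change only at observed values,
    # so evaluating them at every pooled data point is sufficient.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)  # fraction of sample A <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of sample B <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical mean coverage per SNP (made-up numbers for illustration only).
ref_hom_coverage = [12, 15, 18, 22, 30, 41, 45, 50]  # reference allele homozygotes
alt_hom_coverage = [8, 10, 13, 15, 19, 24, 33, 36]   # alternate allele homozygotes

d_stat = ks_statistic(ref_hom_coverage, alt_hom_coverage)
print(f"K-S D = {d_stat:.3f}")
```

D ranges from 0 (identical empirical distributions) to 1 (no overlap), so a smaller D, as reported for eQTL SNPs relative to all SNPs, means the two coverage distributions lie closer together.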
