S1 Text. Supplementary Methods

Filtering of Genomic Regions
We filtered the assembly considered in analyses of allelic expression biases to
identify regions where we have high confidence in our SNP calls. To do so, we
identified, first, genomic regions with evidence for large-scale copy-number
variation; second, repeats and selfish genetic elements; and third, genomic regions
with unusually high proportions of heterozygous genotype calls in the inbred
C. rubella line Cr1GR1, which is expected to be highly homozygous. Regions with a
high proportion of repeats, evidence for copy-number variation, or a high proportion
of heterozygous calls in Cr1GR1 were removed from consideration in further analyses
of allele-specific expression.
To identify regions with large-scale copy-number variation, we used the
software Control-FREEC, which uses information on read depth to call copy number
variants (Boeva et al. 2011, http://bioinfo-out.curie.fr/projects/freec/). Control-FREEC
does not require sequences from a reference sample, and controls for variation in GC
content, a major source of variability in read depth. We ran Control-FREEC 6.4 in
sliding windows of 50 kb on .bam files filtered to retain only primary alignments. We
then removed all genomic regions with copy number variant calls in any of the samples
for which we had genomic data.
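As an illustration of this last step, the sketch below marks every 50 kb window that
overlaps a copy number variant call in at least one sample. It is a minimal Python
sketch and not the pipeline used here: the function names, the (scaffold, start, end)
tuple layout, the 0-based half-open coordinates, and the toy calls are all assumptions
made for the example.

```python
WINDOW_SIZE = 50_000  # window size in bp, as stated in the text


def windows_overlapping(start, end, window_size=WINDOW_SIZE):
    """Indexes of windows overlapped by the 0-based half-open interval [start, end)."""
    return range(start // window_size, (end - 1) // window_size + 1)


def cnv_windows(calls_by_sample, window_size=WINDOW_SIZE):
    """Union of (scaffold, window index) pairs with a CNV call in any sample."""
    flagged = set()
    for calls in calls_by_sample.values():
        for scaffold, start, end in calls:
            for index in windows_overlapping(start, end, window_size):
                flagged.add((scaffold, index))
    return flagged


# Hypothetical calls for two samples (assumed layout, not real data).
example_calls = {
    "sample_A": [("scaffold_1", 0, 50_000)],
    "sample_B": [("scaffold_1", 120_000, 160_000), ("scaffold_2", 49_999, 50_001)],
}
flagged = cnv_windows(example_calls)
```

Taking the union across samples mirrors the rule that a region is removed if any
sample carries a call there.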
To identify regions with repeats, we ran RepeatModeler 1.0.5
(www.repeatmasker.org) on the C. rubella reference genome to build a custom library
of repeats. We then ran RepeatMasker 4.0.1 (www.repeatmasker.org) using this
custom library to identify repetitive regions. We assessed the cumulative distribution
of repeat content across the genome in 50 kb windows and set a filtering threshold
such that the 30% most repetitive windows were removed. This corresponded to
removing all 50 kb windows with more than ~17% of sites assigned as repeats by
RepeatMasker.
Finally, we removed all genomic regions containing a high proportion of
heterozygous genotype calls in the C. rubella line Cr1GR1, which has been inbred in
the lab for at least six generations. As for repeats, we used the cumulative
distribution of the fraction of heterozygous calls across the genome, in 50 kb
windows, and set the cutoff such that the 20% most heterozygous windows were
removed. This corresponded to removing all 50 kb windows with more than ~9%
heterozygous calls in C. rubella Cr1GR1.
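The percentile-based cutoffs used for repeats and heterozygous calls can be sketched
as follows. This is a minimal Python illustration under stated assumptions, not the
code used for the analysis: the function names are hypothetical, the per-window
fractions are made-up toy values, and ties at the cutoff are kept rather than removed.

```python
def top_fraction_cutoff(values, top_fraction):
    """Cutoff such that roughly `top_fraction` of windows lie strictly above it."""
    if not values:
        raise ValueError("need at least one window")
    ordered = sorted(values)
    n_removed = round(len(ordered) * top_fraction)
    n_kept = max(len(ordered) - n_removed, 1)  # always keep at least one window
    return ordered[n_kept - 1]  # largest value among the retained windows


def keep_mask(values, top_fraction):
    """Return the cutoff and a per-window flag (True = window is retained)."""
    cutoff = top_fraction_cutoff(values, top_fraction)
    return cutoff, [value <= cutoff for value in values]


# Toy per-window repeat fractions (assumed values, one entry per 50 kb window).
repeat_fraction = [0.01, 0.05, 0.10, 0.17, 0.02, 0.30, 0.50, 0.03, 0.04, 0.25]
cutoff, kept = keep_mask(repeat_fraction, top_fraction=0.30)
```

The same function would apply to the fraction of heterozygous calls with
`top_fraction=0.20`.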
Supplementary Figs S2-S5 show the regions kept for analysis, as well as the
distribution of repeats, copy number variant calls, and fraction of heterozygous SNP
calls in C. rubella across the eight main scaffolds of the C. rubella assembly. For all
scaffolds, the proportion of sites assigned as repeats was elevated around centromeric
regions, and genomic windows with evidence for copy number variation often
overlapped with windows with high repeat content and high proportions of
heterozygous calls in C. rubella Cr1GR1. After filtering, we retained approximately
55% of the assembly, where we had high confidence in our SNP calls.
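How the three filters combine into a retained fraction can be sketched as below. This
is a hedged Python illustration only: the `Window` record, the function names, and the
toy windows are assumptions, the cutoffs are the approximate values quoted above, and
the fraction is computed per window (it ignores shorter windows at scaffold ends).

```python
from dataclasses import dataclass

REPEAT_CUTOFF = 0.17  # approximate repeat cutoff quoted in the text
HET_CUTOFF = 0.09     # approximate heterozygous-call cutoff quoted in the text


@dataclass(frozen=True)
class Window:
    scaffold: str
    index: int
    has_cnv_call: bool      # CNV call in any sample
    repeat_fraction: float  # fraction of sites assigned as repeats
    het_fraction: float     # fraction of heterozygous calls in Cr1GR1


def is_retained(window):
    """A window is kept only if it passes all three filters."""
    return (
        not window.has_cnv_call
        and window.repeat_fraction <= REPEAT_CUTOFF
        and window.het_fraction <= HET_CUTOFF
    )


def retained_fraction(windows):
    """Fraction of windows retained, treating all windows as equal in size."""
    windows = list(windows)
    if not windows:
        raise ValueError("need at least one window")
    return sum(is_retained(w) for w in windows) / len(windows)


# Toy windows (assumed values): only the first passes every filter.
toy_windows = [
    Window("scaffold_1", 0, False, 0.05, 0.01),
    Window("scaffold_1", 1, True, 0.05, 0.01),   # removed: CNV call
    Window("scaffold_1", 2, False, 0.40, 0.02),  # removed: repeats
    Window("scaffold_1", 3, False, 0.10, 0.20),  # removed: heterozygous calls
]
toy_fraction = retained_fraction(toy_windows)
```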
Validation of ASE Results by qPCR
We validated ASE results by qPCR. We used the TaqMan® Reverse Transcription
Reagents (LifeTechnologies, Carlsbad, CA, USA) with oligo(dT)16 primers to convert
mRNA into cDNA following the manufacturer's protocol, and performed qPCR with the
Custom TaqMan® Gene Expression Assay (LifeTechnologies, Carlsbad, CA, USA) with
the dyes FAM and VIC, again following the manufacturer's protocol. The qPCR for
both alleles was multiplexed in one well to directly compare the two alleles using a
Bio-Rad CFX96 Touch™ Real-Time PCR Detection System (Bio-Rad, Hercules, CA, USA).
Primers were designed to match both alleles, whereas probes were designed to
overlap SNPs that distinguished the two alleles. In addition, either the forward or
the reverse primer was designed to span an intron, so that only RNA-derived template
was included in the analysis and not any remaining DNA contamination. To exclude
dye bias, we tested five genes using reciprocal probes (S11 Table) with the VIC and
FAM dyes and assessed whether we saw a difference in the expression signal.
Differences in expression signal were assessed from the relative expression
difference between the two alleles, as well as from the quantification cycle (Cq)
values (Table 4).
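The text does not specify how the relative expression difference was computed from Cq
values. One common approach, shown here purely as an assumption, is the delta-Cq
calculation with an amplification efficiency of 2 (perfect doubling per cycle). The
function names and Cq values below are hypothetical.

```python
import math


def allele_log2_ratio(cq_allele_1, cq_allele_2, efficiency=2.0):
    """log2 expression of allele 1 relative to allele 2 from two Cq values.

    A lower Cq means more starting template, hence cq_allele_2 - cq_allele_1.
    `efficiency` is the assumed fold increase per cycle (2.0 = perfect doubling).
    """
    return (cq_allele_2 - cq_allele_1) * math.log2(efficiency)


def dye_bias(log2_ratio_original, log2_ratio_swapped):
    """Difference in allelic log2 ratio between the two dye assignments.

    Values close to zero suggest the dye assignment does not affect the result.
    """
    return log2_ratio_original - log2_ratio_swapped


# Hypothetical Cq values for one gene measured with reciprocal probes.
original = allele_log2_ratio(24.0, 25.0)  # allele 1 on FAM, allele 2 on VIC
swapped = allele_log2_ratio(24.1, 25.0)   # dyes exchanged between alleles
bias = dye_bias(original, swapped)
```

Because both alleles are amplified in the same well, no separate reference gene is
needed in this sketch; the comparison is between the two reporter signals directly.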
References
Boeva V, Zinovyev A, Bleakley K, Vert J-P, Janoueix-Lerosey I, Delattre O, Barillot
E. 2011. Control-free calling of copy number alterations in deep-sequencing data
using GC-content normalization. Bioinformatics 27:268–269.