Mutations and Epimutations
A story of two cultivars and their children.
Matteo Pellegrini

Nipponbare and 93-11
• Nipponbare
  – Oryza sativa japonica
  – Primarily Japan, China, Indonesia
  – Agronomic differences: days to heading
• 93-11
  – Oryza sativa indica
  – India, Bangladesh, Nepal, China
  – Submerged growth
  – Agronomic differences: seed fertility, long grain, taller (83 cm)

Why Study Crosses?
• Crosses of Indica and Japonica are often sterile
• Show hybrid vigor in agronomic traits

Overview
• 2 rice ecotypes: Nipponbare and 93-11
• Generated BS-seq data for NPB, 93-11, and 2 reciprocal crosses
• Identify SNPs between ecotypes.
  – SNP generation
• Identify epimutations between ecotypes.
  – Identify methyl-inheritance
• Identify allele-specific expression
• Identify RNA editing
[Diagram: parents (P) NPB and 9311, and their F1 crosses]

Detecting Cytosine Methylation
Bases: A, C (unmethylated), C (methylated), G, T
    m     mm
… ACCCGTACCCGATTAG …   (original; m marks a methylated C)
… ATCTGTATCCGATTAG …   (after bisulfite treatment)
• Apply sodium bisulfite and amplify: unmethylated C → T, methylated C (and A/G/T) unchanged
• Try to align new sequence to known reference; compare

Mapping Approach: BS Seeker
• BS reads are C/T converted, so normal aligners are not applicable
• Three-letter alignment:
  – Convert C to T
      BS read:      AATCGTA      →  AATTGTA
      Ref. genome:  CTAATCGCAGG  →  TTAATTGTAGG
  – Bowtie mapping of the converted read (AATTGTA) against the converted reference
  – Restore to 4 letters and compare alignments (m = methylated, u = unmethylated)
         m u
      AATCGTA
    CTAATCGCAGG
Chen et al (2010) BMC Bioinformatics

Methylation levels at single-base resolution
                             tagtgcgtggtg
                        cattttagtgcgtgg
                          ttttagcgcgtggtg
Ref. genome: 5’--attgagacatcctagcgcgtggtgacaataata--3’
• At the two covered CG cytosines: 1/(1+2) = 33.3% and 3/(3+0) = 100%
• Calculate methylation level at each covered cytosine
• Methylation level = #C/(#C+#T)

Workflow
• Alignments
  – BS-Seeker mapping of NPB and 9311 samples to NPB reference genome.
  – Maps 9311 genome to NPB coordinates
• Parent genomes
  – Each read generates a small implied sequence fragment.
  – Use this to generate a parent genome.
• F1 read matching
  – Map reads to NPB reference genome to get location.
  – Compare each read to NPB and 9311 parent genomes and determine better match.

Detecting Allele-Specific Methylation
[Figure: BS-seq reads assigned to parent1 or parent2 by a parent1/parent2 SNP; methylation level at CG sites shown for each parent]

Library statistics
Methyl-Seq      Reads   Mapped   % Mapped   Coverage
NPB             298M    134M     45%        17.58
93-11           157M    74M      47%        10.14
NPB x 93-11     594M    279M     47%        20.04
  -NPB                                      6.51
  -93-11                                    6.08
93-11 x NPB     543M    236M     43%        25.77
  -NPB                                      7.45
  -93-11                                    6.59

RNA-Seq         Reads   Mapped   % Mapped
NPB             42M     17M      42%
93-11           42M     13M      31%
NPB x 93-11     48M     12M      26%
  -NPB
  -93-11
93-11 x NPB     43M     11M      25%
  -NPB
  -93-11

Identifying SNPs
• If sites have:
  – > 3 reads/strand
  – > 90% agreement within ecotype
  – Strands agree with each other (compensate for Cs).
  – Ecotypes (obviously) disagree with each other.
• Will miss indels, duplications, inversions, other chromosomal rearrangements.
• Will miss long runs of SNPs (> 3 within ~55 bp) (BS-Seeker limit)

SNPs - NPB vs 93-11
        A            C            G            T
A       86,677,300   42,553       216,135      42,513
C       43,336       65,771,387   34,146       226,045
G       226,045      34,146       65,771,387   43,336
T       42,513       216,135      42,553       86,677,300
• 1,209,456 mutations / 306,106,830 sites with mutual base calls
• ~ 1/253 bases
• Mostly (73%) transitions: C->T (or G->A if the C->T is on the opposite strand), or T->C and A->G if the change is in the other ecotype

SNPs - NPB vs F1 (9N-NPB)
        A           C           G           T
A       3,188,414   -           3           -
C       -           2,695,005   -           3
G       2           -           2,548,205   -
T       -           4           -           3,253,196
• 12 mutations
• Are these real or false?
• Similar numbers amongst all F1 comparisons

Identifying epimutations
Min/max
• Use the binomial distribution to build min, max, and mean percent methylation at each C.
• Confidence intervals at 5% are min, max
• As the number of reads increases, the interval size decreases
[Plot; x-axis: Reads]

Identifying epimutations (cont)
• Called different if:
  – mean(sample1) < min(sample2) & mean(sample2) > max(sample1)

Epimutation rate
• 1 in 300 CG sites spontaneously mutate across one generation

Epimutation clusters
[Figure tracks: 9311 cross, 9311 cross, 9311 parent, NPB cross, NPB cross, NPB parent]

Epimutation clusters II
[Figure tracks: 9311 cross, 9311 cross, 9311 parent, NPB cross, NPB cross, NPB parent]

Epimutations are enriched in regions where parents differ
• Half of the epimutations between parents and crosses occur at sites where parents differ

Epimutations (continued)
• Epimutations within genes
  – 498 genes were significantly enriched for epimutations
  – GO terms across ecotypes indicate: ATP synthesis-related activity (ATP synthesis coupled proton transport, hydrogen transport, ion transmembrane transport, etc.)

Expression
• Many genes (~7800/25640) are differentially expressed between ecotypes.
• GO terms: chloroplast-related terms, response to cadmium ion.

Expression cont.
• Across generations, only 78 genes are differentially expressed
• Of these, only 2 were differentially expressed in the parents

Allele Specific Expression
• 681 examples of allele specific expression
• Partially explain hybrid vigor?
[Figure tracks: NPB cross, NPB parent, 9311 parent, 9311 cross, NPB cross, 9311 cross]

Allele-Specific Genes Accumulate Mutations
[Figure: SNP density, all genes vs. allele-specific genes]

Allele-specific Expression cont.
• And are also enriched for differentially methylated sites

RNA Editing
• Cytidine deamination: C to U
• Adenosine deamination: A to I (read as G)

How Widespread
• Recent studies indicate that RNA editing may be more widespread than originally thought
• Others have disputed this claim (Schrider et al, PLoS ONE)
• In plants RNA editing is thought to take place in the mitochondria and plastids
• Is there editing in nuclear genes?
Science. 2011 Jul 1;333(6038):53-8.
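
Sketch: what the two deamination types on the RNA Editing slide look like in sequencing data. In cDNA reads U is reported as T and inosine as G, and a transcript from the minus strand shows the complementary change on the reference strand. The names `EDIT_SIGNATURES` and `classify_mismatch` are hypothetical and not from the talk; this is a minimal illustration, not the pipeline used for the rice data.

```python
# Candidate editing signatures, keyed by (DNA base, RNA base) on the
# reference strand. U is sequenced as T and inosine as G; a minus-strand
# transcript shows the complementary substitution.
EDIT_SIGNATURES = {
    ("C", "T"): "C-to-U (plus-strand transcript)",
    ("G", "A"): "C-to-U (minus-strand transcript)",
    ("A", "G"): "A-to-I (plus-strand transcript)",
    ("T", "C"): "A-to-I (minus-strand transcript)",
}


def classify_mismatch(dna_base, rna_base):
    """Return the editing type a DNA/RNA mismatch is consistent with, or None."""
    dna_base, rna_base = dna_base.upper(), rna_base.upper()
    if dna_base == rna_base:
        return None  # no mismatch, so nothing to classify
    return EDIT_SIGNATURES.get((dna_base, rna_base))


if __name__ == "__main__":
    for pair in [("C", "T"), ("T", "C"), ("A", "C")]:
        print(pair, "->", classify_mismatch(*pair))
```

A mismatch that fits one of these signatures is only a candidate: SNPs and misalignments produce the same substitutions.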

RNA Editing in Rice
             NPB - RNA
NPB - DNA    A         C         G         T
A            5535334   6907      3063      2219
C            4758      4436282   4279      7054
G            3777      2437      4382636   4213
T            2210      3227      6949      5577323
• Initially we found lots of examples…

On Closer Inspection…
• Alignments are often off by one or more bases at splice sites

But a Few Real Ones Remain?

But more Filtering Should be done…
[Plot; x-axis: position of edit site along read]

Current Numbers

Conclusions
• Epimutation rates are one in 300 CG sites across one generation
  – Clusters of epimutations are present
  – Are enriched in sites where parental epigenomes differ
• Allele-specific expression is widespread and associated with
  – Increased SNP densities
  – Higher differential methylation
• Find some evidence for RNA editing but…

Acknowledgements
– Krishna Chodavarapu (Pellegrini Lab)
– Suhua Feng (Steve Jacobsen Lab)
– Blake Myers, Guo-liang Wang, Yulin Jia
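
Backup sketch: epimutation calling. This illustrates the rule from the "Identifying epimutations" slides: methylation level = #C/(#C+#T), a min/max interval per cytosine, and a site called different when each sample's mean falls outside the other's interval. The slides do not say how the binomial interval is computed, so a Wilson score interval with z = 1.96 stands in here as an assumption. Checking both directions of the rule is also an assumption, and the function names are hypothetical.

```python
import math


def methylation_interval(n_c, n_t, z=1.96):
    """Return (min, mean, max) methylation level at one cytosine.

    n_c: reads showing C (methylated); n_t: reads showing T (unmethylated).
    The interval is a Wilson score interval (an assumption of this sketch).
    """
    n = n_c + n_t
    if n == 0:
        raise ValueError("site has no coverage")
    mean = n_c / n  # methylation level = #C / (#C + #T)
    denom = 1 + z * z / n
    center = (mean + z * z / (2 * n)) / denom
    half = z * math.sqrt(mean * (1 - mean) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), mean, min(1.0, center + half)


def called_different(site1, site2):
    """Apply mean(s1) < min(s2) & mean(s2) > max(s1), in either direction.

    Each site is a (n_c, n_t) pair of read counts.
    """
    lo1, mean1, hi1 = methylation_interval(*site1)
    lo2, mean2, hi2 = methylation_interval(*site2)
    gained = mean1 < lo2 and mean2 > hi1
    lost = mean2 < lo1 and mean1 > hi2
    return gained or lost


if __name__ == "__main__":
    print(methylation_interval(1, 2))           # the 1/(1+2) site from the slides
    print(called_different((0, 20), (20, 0)))   # fully unmethylated vs fully methylated
    print(called_different((5, 5), (6, 4)))     # similar levels, low coverage
```

As on the slides, the interval narrows as read depth grows, so deeper coverage lets smaller methylation differences be called.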