High throughput genome-wide scan for epistasis with implementation to Recombinant Inbred Lines (RIL) populations

Pavel Goldstein
Dr. Anat Reiner-Benaim
Prof. Abraham Korol

Outline
• Problem description
• Modeling epistasis: NOIA, the model for epistasis identification
• Dimensionality:
  • Multi-trait complexes
  • Two-stage hypothesis testing
  • Hierarchical FDR control in eQTL analysis
• Proposed algorithm for epistasis identification
• Results:
  • Simulation study
  • Implementation on Arabidopsis data
• Conclusions and discussion

eQTL analysis
• The goal: find loci of which genotypic variation has an effect on the quantitative trait of interest, using gene expressions as phenotype and molecular markers as genotype information.

Problem description
• Epistasis: nonadditivity in the contributions of several genes to a trait.
• The number of tests involved is enormous
• Error control

Statistical epistasis
[Figure panels: no epistasis; epistasis]

Natural and Orthogonal Interactions (NOIA) model (Alvarez-Castro and Carlborg, 2007) for RIL population
For loci A and B, trait t, loci-pair l and replicate i:
[Equation annotations: design matrix; gene expression; indicator of genotype combinations for two loci; vector of genetic effects; phenotypes]

The Weighted Gene Co-Expression Network Analysis (WGCNA) (Zhang and Horvath, 2005)
• Top-down hierarchical clustering.
• Dynamic Tree Cut algorithm: branch cutting method for detecting gene modules, depending on their shape
• Building up meta-genes by taking the first principal component of the genes from every cluster.

Two-stage hypothesis testing
[Figure labels: framework marker; secondary markers]

False Discovery Rate (FDR) in eQTL analysis
• FDR is the expected proportion of erroneously identified epistasis effects among all identified ones.
Hierarchical FDR control (Yekutieli, 2008):
• Full-tree FDR: all epistasis discoveries, whether in framework or in secondary marker pairs.
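
The FDR control described above can be made concrete with a small sketch. The Python below assumes that the Benjamini-Hochberg (BH) step-up procedure is applied within each family of hypotheses: first to the framework marker pairs, then to the secondary pairs under each framework pair that was rejected. The function names, the dictionary layout and the use of one level q at both stages are illustrative assumptions, not the authors' implementation.

```python
def benjamini_hochberg(p_values, q=0.1):
    """Return a list of booleans marking the hypotheses BH rejects at level q."""
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the hypotheses, ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


def two_stage_scan(framework_p, secondary_p, q=0.1):
    """Hypothetical two-stage scan.

    framework_p: p-values, one per framework marker pair.
    secondary_p: dict mapping a framework-pair index to the p-values of
                 the secondary marker pairs nested under it (assumed layout).
    """
    stage1 = benjamini_hochberg(framework_p, q)
    stage2 = {}
    for parent, is_rejected in enumerate(stage1):
        # Secondary pairs are tested only under rejected framework pairs.
        if is_rejected and parent in secondary_p:
            stage2[parent] = benjamini_hochberg(secondary_p[parent], q)
    return stage1, stage2
```

Testing the secondary pairs only under rejected framework pairs is what keeps the number of tests small. In the hierarchical setting, the level passed as `q` would be an adjusted level rather than the nominal one.
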

Hierarchical FDR control
A universal upper bound is derived for the full-tree FDR: [equation not recovered]
An upper bound for 𝜹* may be estimated using: [equation not recovered]
where R_t^{P_i=0} and R_t^{P_i=1} are the numbers of discoveries in τ_t, given that H_i is a true null hypothesis in τ_t and a false null hypothesis, respectively.

Simulation study
• 5 clusters of 10 traits each were simulated with different forms of epistasis or no epistasis
• Six configurations: effect size (1%, 2%, 3%) × two/four epistatic clusters
• Replicated 1000 times
• Heritability (effect size): [equation not recovered]

The WGCNA hierarchical clustering
[Figure]

Heritability gain
[Figure]

Power gain
[Figure]

Real data of West et al., 2006
• A sample of 210 RIL population individuals was derived from a cross between two inbred Arabidopsis thaliana accessions, Bayreuth-0 (Bay-0) and Shahdara (Sha).
• The genotype map consists of 579 markers
• Genome-wide transcript (mRNA) levels were quantified using Affymetrix whole-genome microarrays
• Total of 22,810 gene expressions from all five chromosomes.

Preprocessing
• The Variance Stabilization Normalization
• Gene expression filtering: 7,244 genes out of 22,810
• Markers preprocessing

Two-stage hierarchical testing for epistasis
• Identified 314 gene clusters (WGCNA)
• 47 sparse "framework" markers that are within 10 cM of each other
• 10-12 "secondary" markers related to each "framework" marker
• First step: 1081 marker pairs × 314 meta-genes = 339,434 tests

Hierarchical FDR control
• A universal upper bound is derived for the full-tree FDR:
  𝜹* = 1.015 (SE = 0.008)
  q* = q/(2𝜹*) with q = 0.1, reported as q* = 0.0472

Two-stage hierarchical testing for epistasis
• First stage: 11 significant epistatic areas
• Second stage: 1141 significant epistatic effects out of 1673 (68%)

Epistasis detected, superimposed on the Arabidopsis markers map
[Figure]

Computational advantage
• Using the two-stage algorithm on meta-genes, 341,107 hypotheses were tested
• Naive analysis: 121,278 loci pairs for each of 7,244 traits, namely 878,537,832 tests would have been performed
• Reduction of the number of tests by a factor of about 2575

Epistasis heritability: meta-genes vs single genes
[Figure panels: Meta-genes; Single genes]

Total heritability: meta-genes vs single genes
[Figure panels: Meta-genes; Single genes]

T-values of epistatic effects: meta-genes vs single genes
[Figure panels: Meta-genes; Single genes]

Further research
• The method by which markers are chosen may take the genome-wide marker distribution into consideration.
• Generalization of the NOIA model
• Using GO for the validation of the approach

Acknowledgements
Dr. Anat Reiner-Benaim
Prof. Abraham Korol

Thank you
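
The test counts quoted under "Computational advantage" can be checked with a few lines of arithmetic. The sketch below assumes that the first stage tests every unordered pair of the 47 framework markers, C(47, 2) = 1081 pairs, which is the pair count that reproduces the reported 339,434 first-stage tests over 314 meta-genes. It also assumes that the 1673 second-stage tests are added to that total. The variable names are illustrative.

```python
from math import comb

# Figures taken from the slides.
n_framework_markers = 47
n_meta_genes = 314
stage2_tests = 1673          # second-stage tests reported in the slides
naive_loci_pairs = 121278    # loci pairs in the naive analysis
n_single_genes = 7244        # genes kept after filtering

# Assumption: stage one tests all unordered pairs of framework markers.
stage1_pairs = comb(n_framework_markers, 2)
stage1_tests = stage1_pairs * n_meta_genes

# Total for the two-stage procedure and for the naive scan.
two_stage_total = stage1_tests + stage2_tests
naive_total = naive_loci_pairs * n_single_genes

# Factor by which the two-stage procedure reduces the number of tests.
reduction_factor = naive_total / two_stage_total

print(stage1_pairs, stage1_tests, two_stage_total, naive_total)
print(f"reduction factor: {reduction_factor:.1f}")
```
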
