Introduction Biological Background Method Results Moving Forward Modeling gene expression using five histone modifications Fifth Annual Primes MIT Conference Lalita Devadas Mentor: Angela Yen May 17, 2015 Lalita Devadas — Modeling gene expression using five histone modifications 1/31 Introduction Biological Background Method Results Moving Forward Outline 1 Biological Background 2 Method 3 Results 4 Moving Forward Lalita Devadas — Modeling gene expression using five histone modifications 2/31 Introduction Biological Background Method Results Moving Forward Outline 1 Biological Background 2 Method 3 Results 4 Moving Forward Lalita Devadas — Modeling gene expression using five histone modifications 3/31 Introduction Biological Background Method Results Moving Forward Gene Expression Central dogma of molecular biology Lalita Devadas — Modeling gene expression using five histone modifications 4/31 Introduction Biological Background Method Results Moving Forward Gene Expression Relevance Important to understanding biological activity Crucial to advances in medicine Detection, prevention, and treatment of disease Lalita Devadas — Modeling gene expression using five histone modifications 5/31 Introduction Biological Background Method Results Moving Forward Gene Expression Regulation Genetic Sequences of nucleotides (ACTG) Lalita Devadas — Modeling gene expression using five histone modifications 6/31 Introduction Biological Background Method Results Moving Forward Gene Expression Regulation Epigenetic Changes to environment surrounding DNA Lalita Devadas — Modeling gene expression using five histone modifications 7/31 Introduction Biological Background Method Results Moving Forward Epigenetics Histone modifications Chemical changes to histone protein core or protruding tail Lalita Devadas — Modeling gene expression using five histone modifications 8/31 Introduction Biological Background Method Results Moving Forward Epigenomes Roadmap Project Lalita Devadas — Modeling gene expression using five histone modifications 9/31 Introduction Biological Background Method Results Moving Forward Outline 1 Biological Background 2 Method 3 Results 4 Moving Forward Lalita Devadas — Modeling gene expression using five histone modifications 10/31 Introduction Biological Background Method Results Moving Forward Data Pipeline Objective Density of Histone Modifications Prediction Lalita Devadas — Modeling gene expression using five histone modifications Level of Gene Expression 11/31 Introduction Biological Background Method Results Moving Forward Data Pipeline Overview All Histone Data Best-binning Relevant Histone Data Classification “Off” genes (Unexpressed) Predict at 0 Classification “On” genes (Expressed) Regression Predicted Expression Lalita Devadas — Modeling gene expression using five histone modifications 12/31 Introduction Biological Background Method Results Moving Forward Data Pipeline Best-bin approach All Histone Data Best-binning Relevant Histone Data Classification “Off” genes (Unexpressed) Predict at 0 Classification “On” genes (Expressed) Regression Predicted Expression Lalita Devadas — Modeling gene expression using five histone modifications 13/31 Introduction Biological Background Method Results Moving Forward Best-bin approach Dividing genes Lalita Devadas — Modeling gene expression using five histone modifications 14/31 Introduction Biological Background Method Results Moving Forward Best-bin approach Choosing best bin epigenome X, histone mark Y bin 1 bin 2 bin 3 bin 4 . . bin p . . . bin 81 expression gene a gene b gene c . . . . . p = best bin Lalita Devadas — Modeling gene expression using five histone modifications strongest correlation 15/31 Introduction Biological Background Method Results Moving Forward Data Pipeline Classification All Histone Data Best-binning Relevant Histone Data Classification “Off” genes (Unexpressed) Predict at 0 Classification “On” genes (Expressed) Regression Predicted Expression Lalita Devadas — Modeling gene expression using five histone modifications 16/31 Introduction Biological Background Method Results Moving Forward Types of Models Random Forest Random Forest model Returns majority vote of classification determined by a group of decision trees Lalita Devadas — Modeling gene expression using five histone modifications 17/31 Introduction Biological Background Method Results Moving Forward Types of Models Random Forest Random Forest model Returns majority vote of classification determined by a group of decision trees Lalita Devadas — Modeling gene expression using five histone modifications 18/31 Introduction Biological Background Method Results Moving Forward Data Pipeline Regression All Histone Data Best-binning Relevant Histone Data Classification “Off” genes (Unexpressed) Predict at 0 Classification “On” genes (Expressed) Regression Predicted Expression Lalita Devadas — Modeling gene expression using five histone modifications 19/31 Introduction Biological Background Method Results Moving Forward Types of Models Linear Model Linear model Finds a linear correlation between predictors and response Lalita Devadas — Modeling gene expression using five histone modifications 20/31 Introduction Biological Background Method Results Moving Forward Data Pipeline Overview All Histone Data Best-binning Relevant Histone Data Classification “Off” genes (Unexpressed) Predict at 0 Classification “On” genes (Expressed) Regression Predicted Expression Lalita Devadas — Modeling gene expression using five histone modifications 21/31 Introduction Biological Background Method Results Moving Forward Outline 1 Biological Background 2 Method 3 Results 4 Moving Forward Lalita Devadas — Modeling gene expression using five histone modifications 22/31 Introduction Biological Background Method Results Moving Forward Epigenomes Roadmap Project Lalita Devadas — Modeling gene expression using five histone modifications 23/31 Introduction Biological Background Method Results Moving Forward Data Pipeline Objective Density of Histone Modifications Prediction Level of Gene Expression H3K9me3 H3K4me3 H3K4me1 H3K36me3 H3K27me3 Lalita Devadas — Modeling gene expression using five histone modifications 24/31 Introduction Biological Background Method Results Moving Forward Results of Pipeline Conclusions Models created for cultured epigenomes have a much higher predictive power than those created for tissue samples H3K36me3 is the most important histone mark used for prediction Lalita Devadas — Modeling gene expression using five histone modifications 25/31 Introduction Biological Background Method Results Moving Forward Results of Pipeline Graph Lalita Devadas — Modeling gene expression using five histone modifications 26/31 Introduction Biological Background Method Results Moving Forward Specifics of Best Model Classification Accuracy Lalita Devadas — Modeling gene expression using five histone modifications 27/31 Introduction Biological Background Method Results Moving Forward Specifics of Best Model Regression Accuracy every data point represents one gene r2 value: 0.640 Lalita Devadas — Modeling gene expression using five histone modifications 28/31 Introduction Biological Background Method Results Moving Forward Outline 1 Biological Background 2 Method 3 Results 4 Moving Forward Lalita Devadas — Modeling gene expression using five histone modifications 29/31 Introduction Biological Background Method Results Moving Forward Next Steps Improve predictive power Broaden scope of predictors and response Further analysis of current results Apply procedure to different data Release code as a tool for other researchers Lalita Devadas — Modeling gene expression using five histone modifications 30/31 Introduction Biological Background Method Results Moving Forward Acknowledgements I would like to thank: My mentor, Angela Yen Prof. Manolis Kellis Roadmap Project PRIMES program My family Lalita Devadas — Modeling gene expression using five histone modifications 31/31