BD2K-LINCS DATA COORDINATION AND INTEGRATION CENTER
Training and Outreach Efforts

Webinar for the NIH BD2K Working Group on Training
May 29, 2015

Avi Ma’ayan, PhD (Contact PI), Associate Professor
Sherry Jenkins, MS, Program Manager
Department of Pharmacology and Systems Therapeutics
Icahn School of Medicine at Mount Sinai
New York, New York

LINCS PHASE II
Library of Integrated Network-based Cellular Signatures

[Diagram: LINCS Phase II centers arranged around the DCIC]
• High Throughput Transcriptomics: L1000, Connectivity Map
• Microenvironment Effects on Cancer Cells: Transcriptomics, Imaging, Proteomics
• NeuroLINCS: ALS; Imaging, Proteomics, Transcriptomics
• High Throughput Imaging, Proteomics, Phenotypes: Cancer Cells, Modeling Cell Signaling
• Drug Combinations Mitigating Side Effects: Proteomics, Transcriptomics
• High Throughput Proteomics: P100, Epigenomics

BD2K-LINCS Data Coordination and Integration Center
Ma’ayan, Medvedovic, Schurer

[Diagram: DCIC components]
• DSR: Internal & External Data Science Research Projects
• IKE: Metadata, APIs, Visualization, Integration Tools
• CTO: Training and Outreach
• CCA: Coordination, Infrastructure
Activities and resources shown: iDSRs; eDSRs; Summer Research Training Program; Webinars; Mini-symposium, seminars and workshops; lincs-dcic.org; lincsproject.org; LINCS Working Groups

BD2K-LINCS Data Coordination and Integration Center – Scientific Objectives
- Understand how different layers of human cellular regulatory networks, i.e., transcriptomics and proteomics, correlate and interact.
- Develop methods to benchmark computational and experimental methods to objectively evaluate their quality and extract more knowledge from the data.
- Understand the inherent biases within low- and high-content experiments, and develop methods to correct for such biases.
- Map the dimensionality of all possible global molecular states of human cells in normal physiology, disease, and in response to perturbations by small molecules and genetic manipulations.
- Develop methods to connect cellular and organismal phenotypes with molecular cellular signatures.

BD2K-LINCS Data Coordination and Integration Center – Data Science Objectives
- Organize, curate and serve for search and download the largest possible collection of annotated molecular cellular signatures, networks and attribute tables.
- Develop novel data visualization methods for dynamically interacting with large genomics and proteomics datasets.
- Develop educational and outreach activities for training and engaging the next generation of data scientists.
- Develop ontologies and other methods for data integration across diverse sets of experimental data collected by different laboratories, centers and large-scale projects utilizing different high-content profiling assays.

Community Training and Outreach (CTO)
lincs-dcic.org

DCIC Outreach Activities
• Courses
  • MOOCs on Coursera:
    1. Network Analysis in Systems Biology
    2. Big Data Science with the BD2K-LINCS DCIC
  • ISMMS Graduate Courses:
    1. BD2K-LINCS DCIC - Programming for Big Data Biomedicine
    2. BD2K-LINCS DCIC - Data Mining in Systems Biology
• Big Data Biostatistics PhD Program (at the University of Cincinnati College of Medicine)
• Summer Research Training Program in Biomedical Big Data
• Data Science Research Webinars
• Crowdsourcing Projects Portal
• External Data Science Projects
• Mini-symposium, Seminars and Workshops
• Funding Opportunities

Network Analysis in Systems Biology
MOOC on Coursera
https://class.coursera.org/netsysbio-002

Description:
A graduate-level course that introduces Big Data analysis in systems biology. It covers statistical methods for identifying differentially expressed genes, several types of enrichment analysis, and clustering algorithms.
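
As an illustration of the enrichment analyses named in the description, here is a minimal sketch of one common form, an over-representation test based on the hypergeometric distribution. It is not taken from the course materials. The function names, gene identifiers, gene sets and background size are all hypothetical and chosen only for the example.

```python
from math import comb


def hypergeom_enrichment_p(overlap, list_size, set_size, background_size):
    """Return P(X >= overlap) for X ~ Hypergeometric.

    background_size: number of genes in the background (assumed value)
    set_size:        number of genes in the annotated gene set
    list_size:       number of genes in the input list
    overlap:         number of input genes that fall in the gene set
    """
    max_overlap = min(list_size, set_size)
    total = comb(background_size, list_size)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(set_size, k) * comb(background_size - set_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


def enrich(gene_list, gene_sets, background_size):
    """Rank gene sets by over-representation p-value (smallest first)."""
    genes = set(gene_list)
    results = []
    for name, members in gene_sets.items():
        overlap = len(genes & members)
        p = hypergeom_enrichment_p(
            overlap, len(genes), len(members), background_size
        )
        results.append((name, overlap, p))
    return sorted(results, key=lambda r: r[2])


# Hypothetical toy data: gene names and sets are placeholders, not real annotations.
toy_sets = {"set_A": {"g1", "g2", "g3"}, "set_B": {"g7", "g8"}}
toy_list = ["g1", "g2", "g3", "g4"]
ranked = enrich(toy_list, toy_sets, background_size=20)
```

In practice the p-values would also be corrected for multiple testing across gene sets; that step is omitted here to keep the sketch short.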

Course Features:
• 8 weeks / 7 modules
• Weekly overviews
• 34 short video lectures
• 24 auto-graded short quizzes
• Crowdsourcing tasks
• Auto-graded final exam
• Discussion forum

Last session: January 5, 2015 – March 3, 2015

Network Analysis in Systems Biology
Course Analytics

[Charts: Engagement; Content]
~600 students passed the course to obtain a statement of accomplishment

Big Data Science with the BD2K-LINCS Data Coordination and Integration Center
MOOC on Coursera
https://www.coursera.org/course/bd2klincs
Session: Sep 15 – Nov 9, 2015

Syllabus:
• Overview of the NIH Common Fund LINCS Program
• Overview of the Data and Signature Generation Centers (experiments and data)
• Meta-Data and Ontologies
• Data Normalization
• Unsupervised Learning Methods: Data Clustering
• Supervised Learning Methods
• Enrichment Analyses
• Bayesian Data Integration
• Network Analysis and Network Visualization
• Cheminformatics
• Serving data through RESTful APIs and JSON
• Interactive Data Visualization of LINCS Data

BD2K-LINCS DCIC: Programming for Big Data Biomedicine
ISMMS Graduate Course
Spring 2015
Course Dates: Feb 24 – May 4, 2015

Ten-week mini-course taught by Avi Ma’ayan, PhD, and members of his research team within the BD2K-LINCS DCIC at the Icahn School of Medicine at Mount Sinai

Topics:
• Agent Based Modeling with NetLogo
• Agent Based Modeling with MATLAB
• Python
• Python and Matplotlib
• HTML and CSS
• JavaScript and PHP
• MySQL
• MongoDB
• Bootstrap Templates
• R
• Final Project

BD2K-LINCS DCIC: Data Mining in Systems Biology
ISMMS Graduate Course
Fall 2015 / Fall 2014
Course Dates: September 16 – December 2, 2014

Ten-week mini-course taught by Avi Ma’ayan, PhD, and members of his research team within the BD2K-LINCS DCIC at the Icahn School of Medicine at Mount Sinai

Topics:
• Self Organizing Maps
• Hierarchical Clustering
• PCA
• Linear Regression
• Decision Trees
• Graph Theory Concepts
• Support Vector Machines
• Final Project

BD2K-LINCS DCIC Summer Research Training Program in Biomedical Big Data Science
Summer 2015
Program Dates: June 1 – August 7
http://lincs-dcic.org/#/srp

Ten-week training program for undergraduate and master’s students interested in research projects aimed at solving data-intensive biomedical problems.

Summer 2015 | Training Sites
• Icahn School of Medicine at Mount Sinai, Ma’ayan Laboratory of Computational Systems Biology: Dynamic Data Visualization, Machine Learning, Data Harmonization
• University of Washington, Yeung / Computational Systems Biology Group: Machine Learning, Data Integration, Network Visualization Plugins
• Carnegie Mellon University, Bar-Joseph / Systems Biology Group: Machine Learning, Time Series Analysis, Transcriptional Regulatory Networks

2015 Cohort Summary
• 6 trainees
• 2 master’s / 3 undergraduate / 1 high school
• 4 women / 2 men
• All future plans include STEM graduate degrees

Trainee home institutions (host laboratory):
• Carnegie Mellon University, Biological Sciences (Bar-Joseph)
• Carnegie Mellon University, Computational Biology (Ma’ayan)
• University of Washington, Computer Engineering (Yeung)
• University of Washington, Computer Science (Ma’ayan)
• The City College of New York, Bioinformatics (Ma’ayan)
• Yorktown High School (Ma’ayan)

Data Science Research Webinars

Purpose / Target Audience
Serve as a general forum to engage data scientists within and outside of the LINCS project to work on problems related to LINCS data analysis and integration.
• Open to the data science research community
• Advertised on the DCIC website, LINCS portal, Twitter, and Google group
• Schedule and connection details posted on the DCIC website and LINCS portal
• Past webinar videos posted on the DCIC’s YouTube channel

BD2K-LINCS DCIC | @BD2KLINCSDCIC | http://lincs-dcic.org/#/webinars | www.lincsproject.org/community/webinars/

BD2K-LINCS DCIC Crowdsourcing Portal
http://www.maayanlab.net/crowdsourcing/

Community Science Project: Building a Database of Gene Expression Signatures Extracted from Single Gene Knockout/Knockdown Studies
http://www.maayanlab.net/crowdsourcing/

Data Science Research Collaborations with the BD2K-LINCS DCIC
http://www.lincs-dcic.org/#/edsr

Mini-symposium, Seminars and Workshops
Winter 2014 - 2015

Mini-symposium | January 7, 2015
Symposium was co-sponsored by the BD2K-LINCS DCIC and Mount Sinai’s Knowledge Management Center for Illuminating the Druggable Genome

Invited Seminar Speakers
• December 5, 2014
  Reverse Engineering a more Reliable Translational Pipeline with Patient-Derived iPSC Models of Neurodegenerative Disease, Robotic Longitudinal Single Cell Analysis and Deep Learning
  Steven Finkbeiner, MD, PhD / NeuroLINCS Center
• January 14, 2015
  The PAGE Study and Coordinating Center (Population Architecture using Genomics and Epidemiology)
  Tara Matise, PhD / PAGE Coordinating Center

Works in Progress Seminar Series
• January 15, 2015
  Enrichr and GEO2Enrichr: Tools to Extract and Analyze Signatures
  Gregory Gundersen and Matthew Jones / BD2K-LINCS DCIC

Outreach Session at the Society of Toxicology’s Annual Meeting
• March 23, 2015
  BD2K-LINCS Outreach Session: Turning Big Data to Knowledge (BD2K-LINCS): A discussion of the NIH BD2K initiative and how it might advance the practice of Toxicology and Risk Assessment
  John Reichard, PhD, Mario Medvedovic, PhD / BD2K-LINCS DCIC
• Poster Session: Big Data to Knowledge (BD2K) - A Graphical Approach for Data Coordination and Integration
  J.F. Reichard, M. Medvedovic, S. Sivagas / BD2K-LINCS DCIC

Calendar of events on lincs-dcic.org

Genomic and Computational Approaches for Biomarker and Drug Discovery
WORKSHOP | June 19, 2015

Workshop hosted by the NIAAA
Location: Grand Hyatt, San Antonio, TX
Room: Travis C/D
Time: 2:00 – 5:00pm
http://www.lincsproject.org/news/

Hands-on Session: Web Apps and Tools
• Enrichr: Search engine for gene lists and signatures
  http://amp.pharm.mssm.edu/Enrichr/
• GEO2Enrichr: Differential Expression Analysis Tool
  http://maayanlab.net/g2e
• L1000CDS2: L1000 Characteristic Direction Signature Search Engine
  http://amp.pharm.mssm.edu/L1000CDS2/
• PAEA: Principal Angle Enrichment Analysis
  http://amp.pharm.mssm.edu/PAEA/

Acknowledgements
The BD2K-LINCS DCIC is co-funded by BD2K and the NIH Common Fund
NIH Grant Number: U54HL127624

Follow BD2K-LINCS DCIC
BD2K-LINCS DCIC WEBSITE: lincs-dcic.org
LINCS CONSORTIUM PORTAL: lincsproject.org
BD2K-LINCS DCIC | @BD2KLINCSDCIC | +BD2K-LINCS