BD2K-LINCS DATA COORDINATION AND INTEGRATION CENTER
Training and Outreach Efforts
Webinar for the NIH BD2K Working Group on Training
May 29, 2015
Avi Ma’ayan, PhD (Contact PI)
Associate Professor
Sherry Jenkins, MS
Program Manager
Department of Pharmacology and Systems Therapeutics
Icahn School of Medicine at Mount Sinai
New York, New York
LINCS PHASE II
Library of Integrated Network-based Cellular Signatures
• High Throughput Transcriptomics (L1000), Connectivity Map
• Microenvironment Effects on Cancer Cells: Transcriptomics, Imaging, Proteomics
• NeuroLINCS (ALS): Imaging, Proteomics, Transcriptomics
• High Throughput Imaging, Proteomics, Phenotypes: Cancer Cells, Modeling Cell Signaling
• Drug Combinations Mitigating Side Effects: Proteomics, Transcriptomics
• High Throughput Proteomics (P100) and Epigenomics
• DCIC
BD2K-LINCS Data Coordination and Integration Center
Ma’ayan, Medvedovic, Schurer
• DSR: Internal & External Data Science Research Projects (iDSRs, eDSRs)
• IKE: Metadata, APIs, Visualization, Integration Tools
• CTO: Training and Outreach (Summer Research Training Program; Webinars; Mini-symposium, seminars and workshops)
• CCA: Coordination, Infrastructure
lincs-dcic.org | lincsproject.org | LINCS Working Groups
BD2K-LINCS Data Coordination and Integration Center – Scientific Objectives
- Understand how different layers of human cellular regulatory networks, e.g., transcriptomics and proteomics, correlate and interact.
- Develop approaches to benchmark computational and experimental methods in order to objectively evaluate their quality and extract more knowledge from the data.
- Understand the inherent biases within low- and high-content experiments, and develop methods to correct for such biases.
- Map the dimensionality of all possible global molecular states of human cells in normal physiology, in disease, and in response to perturbations by small molecules and genetic manipulations.
- Develop methods to connect cellular and organismal phenotypes with molecular cellular signatures.
BD2K-LINCS Data Coordination and Integration Center – Data Science Objectives
- Organize, curate, and serve for search and download the largest possible collection of annotated molecular cellular signatures, networks, and attribute tables.
- Develop novel data visualization methods for dynamically interacting with large genomics and proteomics datasets.
- Develop educational and outreach activities for training and engaging the next generation of data scientists.
- Develop ontologies and other methods for integrating diverse sets of experimental data collected by different laboratories, centers, and large-scale projects using different high-content profiling assays.
Community Training and Outreach (CTO)
lincs-dcic.org
DCIC Outreach Activities
• Courses
  - MOOCs on Coursera:
    1. Network Analysis in Systems Biology
    2. Big Data Science with the BD2K-LINCS DCIC
  - ISMMS Graduate Courses:
    1. BD2K-LINCS DCIC - Programming for Big Data Biomedicine
    2. BD2K-LINCS DCIC - Data Mining in Systems Biology
  - Big Data Biostatistics PhD Program (at the University of Cincinnati College of Medicine)
• Summer Research Training Program in Biomedical Big Data
• Data Science Research Webinars
• Crowdsourcing Projects Portal
• External Data Science Projects
• Mini-symposium, Seminars and Workshops
Funding Opportunities
Network Analysis in Systems Biology
MOOC on Coursera
https://class.coursera.org/netsysbio-002
Description:
A graduate-level course that serves as an introduction to Big Data analysis in systems biology. It covers statistical methods for identifying differentially expressed genes, various types of enrichment analyses, and clustering algorithms.
Course Features:
• 8 weeks / 7 modules
• Weekly overviews
• 34 short video lectures
• 24 auto-graded short quizzes
• Crowdsourcing tasks
• Auto-graded final exam
• Discussion forum
Last session: January 5, 2015 – March 3, 2015
Network Analysis in Systems Biology
Course Analytics
[Figure panels: Engagement; Content]
~600 students passed the course and obtained a statement of accomplishment.
Big Data Science with the BD2K-LINCS Data Coordination and Integration Center
MOOC on Coursera
https://www.coursera.org/course/bd2klincs
Session: Sep 15 – Nov 9, 2015
Syllabus:
• Overview of the NIH Common Fund LINCS Program
• Overview of the Data and Signature Generation Centers (experiments and data)
• Metadata and Ontologies
• Data Normalization
• Unsupervised Learning Methods: Data Clustering
• Supervised Learning Methods
• Enrichment Analyses
• Bayesian Data Integration
• Network Analysis and Network Visualization
• Cheminformatics
• Serving Data through RESTful APIs and JSON
• Interactive Data Visualization of LINCS Data
BD2K-LINCS DCIC: Programming for Big Data Biomedicine
ISMMS Graduate Course
Spring 2015
Course Dates: Feb 24 – May 4, 2015
Ten-week mini-course taught by Avi Ma’ayan, PhD, and members of his research team within the BD2K-LINCS DCIC at the Icahn School of Medicine at Mount Sinai
Topics:
• Agent-Based Modeling with NetLogo
• Agent-Based Modeling with MATLAB
• Python
• Python and Matplotlib
• HTML and CSS
• JavaScript and PHP
• MySQL
• MongoDB
• Bootstrap Templates
• R
• Final Project
BD2K-LINCS DCIC: Data Mining in Systems Biology
ISMMS Graduate Course
Fall 2015
Fall 2014 Course Dates: September 16 – December 2, 2014
Ten-week mini-course taught by Avi Ma’ayan, PhD, and members of his research team within the BD2K-LINCS DCIC at the Icahn School of Medicine at Mount Sinai
Topics:
• Self-Organizing Maps
• Hierarchical Clustering
• Principal Component Analysis (PCA)
• Linear Regression
• Decision Trees
• Graph Theory Concepts
• Support Vector Machines
• Final Project
BD2K-LINCS DCIC Summer Research Training Program in Biomedical Big Data Science
Summer 2015 Program Dates: June 1 – August 7
Ten-week training program for undergraduate and master’s students interested in research projects aimed at solving data-intensive biomedical problems.
Summer 2015 | Training Sites
• Icahn School of Medicine at Mount Sinai, Ma’ayan Laboratory of Computational Systems Biology: Dynamic Data Visualization, Machine Learning, Data Harmonization
• University of Washington, Yeung / Computational Systems Biology Group: Machine Learning, Data Integration, Network Visualization Plugins
• Carnegie Mellon University, Bar-Joseph / Systems Biology Group: Machine Learning, Time Series Analysis, Transcriptional Regulatory Networks
2015 Cohort Summary
• 6 trainees
• 2 master’s / 3 undergraduate / 1 high school
• 4 women / 2 men
• All future plans include STEM graduate degrees
Trainees (home institution, field of study, host lab):
• Carnegie Mellon University, Biological Sciences (Bar-Joseph)
• Carnegie Mellon University, Computational Biology (Ma’ayan)
• University of Washington, Computer Engineering (Yeung)
• University of Washington, Computer Science (Ma’ayan)
• The City College of New York, Bioinformatics (Ma’ayan)
• Yorktown High School (Ma’ayan)
http://lincs-dcic.org/#/srp
Data Science Research Webinars
Purpose / Target Audience
Serve as a general forum to engage data scientists within and outside of the LINCS project to work on problems related to LINCS data analysis and integration.
• Open to the data science research community
• Advertised on the DCIC website, the LINCS portal, Twitter, and the Google group
• Schedule and connection details posted on the DCIC website and the LINCS portal
• Past webinar videos posted on the DCIC’s YouTube channel
BD2K-LINCS DCIC | @BD2KLINCSDCIC | http://lincs-dcic.org/#/webinars | www.lincsproject.org/community/webinars/
BD2K-LINCS DCIC Crowdsourcing Portal
http://www.maayanlab.net/crowdsourcing/
Community Science Project: Building a Database of Gene Expression Signatures Extracted from Single Gene Knockout/Knockdown Studies
http://www.maayanlab.net/crowdsourcing/
Data Science Research Collaborations with the BD2K-LINCS DCIC
http://www.lincs-dcic.org/#/edsr
Mini-symposium, Seminars and Workshops
Winter 2014 – 2015
Mini-symposium | January 7, 2015
The symposium was co-sponsored by the BD2K-LINCS DCIC and Mount Sinai’s Knowledge Management Center for Illuminating the Druggable Genome.
Invited Seminar Speakers
December 5, 2014
Reverse Engineering a More Reliable Translational Pipeline with Patient-Derived iPSC Models of Neurodegenerative Disease, Robotic Longitudinal Single Cell Analysis and Deep Learning
Steven Finkbeiner, MD, PhD / NeuroLINCS Center
January 14, 2015
The PAGE Study and Coordinating Center (Population Architecture using Genomics and Epidemiology)
Tara Matise, PhD / PAGE Coordinating Center
Works in Progress Seminar Series
January 15, 2015
Enrichr and GEO2Enrichr: Tools to Extract and Analyze Signatures
Gregory Gundersen and Matthew Jones / BD2K-LINCS DCIC
Outreach Session at the Society of Toxicology’s Annual Meeting
March 23, 2015
BD2K-LINCS Outreach Session: Turning Big Data to Knowledge (BD2K-LINCS): A discussion of the NIH BD2K initiative and how it might advance the practice of Toxicology and Risk Assessment
John Reichard, PhD, Mario Medvedovic, PhD / BD2K-LINCS DCIC
Poster Session: Big Data to Knowledge (BD2K) - A Graphical Approach for Data Coordination and Integration
J.F. Reichard, M. Medvedovic, S. Sivagas / BD2K-LINCS DCIC
Calendar of events on lincs-dcic.org
Genomic and Computational Approaches for Biomarker and Drug Discovery
WORKSHOP | June 19, 2015
Hands-on Session: Web Apps and Tools
• Enrichr: Search engine for gene lists and signatures
  http://amp.pharm.mssm.edu/Enrichr/
• GEO2Enrichr: Differential expression analysis tool
  http://maayanlab.net/g2e
• L1000CDS2: L1000 Characteristic Direction Signature Search Engine
  http://amp.pharm.mssm.edu/L1000CDS2/
• PAEA: Principal Angle Enrichment Analysis
  http://amp.pharm.mssm.edu/PAEA/
Workshop hosted by the NIAAA
Location: Grand Hyatt San Antonio, Room Travis C/D, San Antonio, TX
Time: 2:00 – 5:00 pm
http://www.lincsproject.org/news/
Acknowledgements
The BD2K-LINCS DCIC is co-funded by BD2K
and the NIH Common Fund
NIH Grant Number: U54HL127624
Follow BD2K-LINCS DCIC
BD2K-LINCS DCIC Website: lincs-dcic.org
LINCS Consortium Portal: lincsproject.org
BD2K-LINCS DCIC | @BD2KLINCSDCIC | +BD2K-LINCS