FDA and M-CERSI Workshop
Complex Data Analytics
Henry “Skip” Francis, M.D.
Data Mining Research and Evaluation Team, FDA/CDER
September 11, 2015

Pharmacovigilance IT Tools Should Not Replace Our Own Eyes or Good Clinical Judgment
(Slide courtesy of June Almenoff, M.D., Ph.D.)
• Human judgment is irreplaceable
DIA www.diahome.org

Workshop Questions
From a data mining and informatics viewpoint
• Mobile health and social media.
• Tools and technologies for real time analysis.
• Methodological opportunities and challenges.
• Understanding trust and influence in social media.
• Standards and interoperability.

Complex Data Analytics: Unifying Goal
• Convergence is an approach to problem solving that integrates expertise from life sciences with physical, mathematical and computational sciences, medicine and engineering to form comprehensive synthetic frameworks that merge areas of knowledge from multiple fields to address specific challenges.
• National Research Council. Convergence: Facilitating Transdisciplinary Integration of Life Sciences, Physical Sciences, Engineering, and Beyond. Washington, DC: The National Academies Press, 2014.

Converging data sources

FDA Complex Data Analytic Methods
• Components of data mining program (CRISP-DM model)
• Data mining universe and its problems

[Diagram: Enterprise Platform (Oracle 11g). Labels: Business process management; Visualization tools; In-database analytics; Database; Data warehouse; Decision operations; Descriptive analysis; Predictive analysis; Optimization; Business rules; Adaptive control; Key performance indicators]
Source: Operational Analytics: Putting Analytics to Work in Operational Systems, 2010, BeyeNETWORK and Decision Management Solutions, http://www.oracle.com/us/products/applications/hyperion/operational-analytics-report-081829.pdf, page 14.

Workshop Questions
From a data mining and informatics viewpoint
• Mobile health and social media.
• Tools and technologies for real time analysis.
• Methodological opportunities and challenges.
• Understanding trust and influence in social media.
• Standards and interoperability.

[Figure: RAPID logos, 03/13/2014 and 02/20/2014]

Peramivir use during H1N1 influenza A epidemic
Using a combined system for convergent data
Epidemic:
• 14 to 34 million cases
• 63,000 to 153,000 hospitalizations
• 7,000 to 14,000 deaths
Peramivir:
• 1371 requests for drug
• 369 AE reports
• 344 unique patients
• 900 adverse events

Visual Look at RAPID Stakeholders

RAPID MOBILE APPS
MOBILE DEVELOPMENT
• NIH National Library of Medicine: George Thoma, Sameer Antani, Stacey Arnesen
• FDA RAPID Contractor Team: Booz Allen Hamilton, Program Manager Dinesh Kolla

CHIO Cloud Infrastructure
• Taha Kass-Hout (CHIO)
• Jim Milto
• AWS Contractor Team

Security
• Lewis Watson (CISO)
• Shawn Porter

FDA RAPID LEADERSHIP
• Dr Henry Francis
• Richard Zhang
• Bruce Weaver
• Syed Haider

FDA COMMUNICATIONS
Office of Communications
• Paul Buckman
• Sherunda Lister
• Kim Rawlings
Division of Drug Information (DDI)
• Mary Kremzer
• Catherine Chew

MedDRA / Data SMEs
• Sonja Brajovic
• Roger Goetsch
• Krishna Chary (E2B)
• Mitra Roca (HL7)

3500A / MEDWATCH
• Joseph Tonning
• Dr Robert Ball

RAPID CLOUD ENVIRONMENT
OIM (Streaming)
• Josh Lehman
Web Services, Database, Dashboards
• FDA RAPID Contractor Team
DMEP (Emergency)
• Mary Beth Roberts

Workflow
1 Mobile Data Collection: Clinicians / reporters enter the MCM AE and attach a digital file (i.e., photo); the app auto-captures geolocation (lat/long).
2 Data Transferred to FDA: Data is submitted from the mobile device and sent over the cellular network or WiFi to FDA’s Cloud Environment.
3 Data Processed in Cloud: Data is processed and stored in the FDA Cloud, adhering to FDA security and data best practices. Dashboards provide leadership, comm staff and QC views into MCM AE data.
4 Response Sent to Reporter: Within 24 hours a targeted response is sent via email containing a link to a digital file (i.e., podcast) with additional information.
5 Perform Analytics: Utilize existing tools for location-based analysis (ArcGIS) and signal detection (Empirica) of captured MCM AE data.

EXISTING ANALYTICAL TOOLS
ArcGIS (Geolocation from Mobile Devices, Heat Maps)
• Martha O’Connor
• Wayne Gorski
• Nathan Beck
• Newland Agbenowosi (RAPID Contractor Team)
Empirica (Signal Detection)
• Ana Szarfman
• Marilyn Pitts
• Michael Johnston

Workshop Questions
From a data mining and informatics viewpoint
• Mobile health and social media.
• Tools and technologies for real time analysis.
• Methodological opportunities and challenges.
• Understanding trust and influence in social media.
• Standards and interoperability.

App Flow
The cloud-based back end design will support the ingestion, analysis and visualization of streaming adverse event data for pharmacovigilance study.

Visualization Layer (© 2011 Oracle Corporation)
Main Screen
Module groups: Single Drug Analysis Modules; Multi-Drug Analysis Modules
Modules and the questions they answer:
• View Age Group Breakdowns
• Analyze Drug-AE Combinations Stratified into Age or Gender Groups: What are the drug-AE combinations and SDR metrics for a particular age group or gender?
• Analyze Drug-Event Combination per SDR Metric per Drug Class: What are all the drug-AE combinations per SDR metric per drug class?
• Search Across All AE Reports
• View a Sector Map
• Analyze the AE profile for a particular drug
• Analyze Drug-Event Associations

[Screenshot: age group breakdown]
• The graph shows the number of reports for the drug alone compared to the number of reports for the drug-event combination
• The bars are broken out into nine age groups
• Discrepancies in these distributions are often worthy of follow up

[Screenshot: sector map; text partly cut off in the source]
A sector map for Em… management data is a presentation of data across all System Organ Classes (SOCs).
• Each System Organ Class is represented by a tile ("SOC tile") in the sector map
• Smaller tiles ("P… SOC tile repres…
• These are rank order of values

Further questions answered by the analysis modules:
• What information can one find on any drug, AE, or drug-AE pair by searching across a database of AE reports?
• What is the AE profile of a particular drug across all System Organ Classes?
• Are there associations between different groups of drug-AE pairs based on their prevalence in AE reports?

[Figure: sector maps, Propylthiouracil (left) vs. Methimazole (right)]

3. RAPID Phase II
The proposed RAPID Biosurveillance Platform includes a cloud-based, open source big data analytic tool to facilitate detection of adverse event signals in near real time.

[Dashboard mockup built with Cubism.js]
Alerts: Display drugs with > 50% increase in the number of reports in the last 2 weeks
My Projects: Project 1, Project 2, Project 3, Create Project
Drug-AE Filter:
• All drugs in which 1 or more cases are fatal
• All drugs in which 50% of the adverse event cases are reported in pediatrics
Disproportionality Metric Filter:
• ROR
• PRR
• GPS
• IC
• Corresponding ranking statistic: e.g., 5 per cent quantile Q0.05 of the posterior distribution

Time Series Visualization for Number of Reports for Drug of Interest
Drug-Event Tracker: Adverse Event Data and Social Media Data

Drug | % Change in Number of Reports (daily) | Algorithm & Ranking Statistic (all data)
Peramivir-H1N1 | +3 | 2.0
Avandia-Diabetes | 0.17 | 1.05
MPA-Steroid Injections | 1.0 | 1.3
Zanamivir-H1N1 | 0.17 | 1
ALL | +10 | 2.5

Clicking on a drug in the list above activates the time series and geographic visualization.

[Figure: Cubism.js time series chart, time axis from 09:23, with four series and their inspected values: FAERS (10), ProMED-mail (14), FAERS + ProMED-mail (24), FAERS - ProMED-mail (-4)]
*Mouseover or use the arrow keys to inspect values

Cubism.js is a D3 plugin for visualizing time series. Use Cubism to construct better realtime dashboards, pulling data from Graphite, Cube and other sources. Cubism is available under the Apache License on GitHub.
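The Phase II mockup above names ROR and PRR as disproportionality filters and an alert for drugs with a more than 50% increase in reports over two weeks. The slides do not give formulas, so what follows is only a minimal sketch using the standard 2x2-table definitions of PRR and ROR. The function names, the comparison of one two-week window against the preceding one, and the example counts are all assumptions made for illustration, not taken from RAPID or Empirica. GPS and IC are Bayesian shrinkage measures and are not covered here.

```python
import math


def prr(a, b, c, d):
    """Proportional reporting ratio from a 2x2 table of report counts.

    a: reports with the drug of interest and the event of interest
    b: reports with the drug of interest and any other event
    c: reports with any other drug and the event of interest
    d: reports with any other drug and any other event
    Zero cells are not handled in this sketch (they raise ZeroDivisionError).
    """
    return (a / (a + b)) / (c / (c + d))


def ror(a, b, c, d):
    """Reporting odds ratio from the same 2x2 table."""
    return (a * d) / (b * c)


def ror_ci95(a, b, c, d):
    """Approximate 95% interval for the ROR, computed on the log scale."""
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_ror = math.log(ror(a, b, c, d))
    return math.exp(log_ror - 1.96 * se), math.exp(log_ror + 1.96 * se)


def report_increase_alert(previous_count, recent_count, threshold=0.5):
    """True if reports rose by more than `threshold` between two windows.

    Assumption: the "last 2 weeks" are compared with the 2 weeks before.
    If the earlier window had no reports, any new report counts as an
    increase; that choice is also an assumption.
    """
    if previous_count == 0:
        return recent_count > 0
    return (recent_count - previous_count) / previous_count > threshold


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    counts = (20, 80, 100, 9800)
    print("PRR:", round(prr(*counts), 2))
    print("ROR:", round(ror(*counts), 2))
    print("ROR 95% interval:", ror_ci95(*counts))
    print("Alert:", report_increase_alert(previous_count=10, recent_count=16))
```

With these hypothetical counts, 20% of the reports for the drug mention the event against about 1% for all other drugs, which is what the two ratios summarize. A ranking statistic such as a lower interval bound is often preferred over the point estimate when counts are small.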
Scalable: Cubism fetches time series data incrementally: after the initial display, Cubism reduces server load by polling only the most recent values. Cubism renders incrementally, too, using Canvas to shift charts one pixel to the left. This approach lets Cubism scale easily to hundreds of metrics updating every ten seconds! Despite asynchronous fetching, rendering is synchronized so that charts update simultaneously, further improving performance and readability.

[Figure: RAPID GIS visualization (map)]
*Mouseover bubbles to view information on adverse events reported at different geographic locations

Growing volumes of rich and complex data sets have outstripped basic tools and methods, resulting in a ‘data analysis gap’

[Diagram: analytic complexity (vertical axis) against volume, variety and velocity of data (horizontal axis)]

Analytic complexity | Small amounts of data or samples (megabytes to gigabytes) | Large (or all) amounts of data (gigabytes to petabytes)
Predictive and real-time analytics capabilities | ADVANCED ANALYTICS | CLOUD ANALYTICS
Accurate historical observations | BASIC ANALYTICS | BIG DATA ANALYTICS

Existing Capabilities (Basic/Advanced Analytics):
• Analyze the average number of violations at a single manufacturer over time
• Simulate the impact of a foodborne illness on the wider population
Future Capabilities (Big Data/Cloud Analytics):
• Predict which manufacturers are at greatest risk for producing contaminated products based on a variety of data sources
• Analyze relationships between contaminated produce and multiple environmental variables

Workshop Questions
From a data mining and informatics viewpoint
• Mobile health and social media.
• Tools and technologies for real time analysis.
• Methodological opportunities and challenges.
• Understanding trust and influence in social media.
• Standards and interoperability.

Social Media World

Mining Social Media for Product Safety
• Description of the problem
  - Like drinking from a firehose
  - 90% of data stored in the world today has been created in the past two years. (Kass-Hout, T. and Alhinnawi, H.
British Medical Bulletin 2013; 108: 5-24)
• Solutions to the problem
  - Informatics, data mining, bidirectional communication
  - “Real time” adaptation to public health events
• How will the solution impact FDA?
  - Develop a multidimensional vision of medication use and drug safety

What’s the Value of Social Media?
• Responsibility to monitor public discussions
• Expectation that someone is listening
• Massive volume of discussion and patient reported outcomes of unknown depth (and quality)
• Unique stream of intelligence that is not necessarily captured in other data sources
• Not intended to supplant traditional post-marketing surveillance

Vernacular-to-Regulatory Translation

Workshop Questions
From a data mining and informatics viewpoint
• Mobile health and social media.
• Tools and technologies for real time analysis.
• Methodological opportunities and challenges.
• Understanding trust and influence in social media.
• Standards and interoperability.
• Where do you do the work?

The cloud-based RAPID Biosurveillance System will support collaboration between FDA and other Federal agencies to enhance monitoring of emerging health threats

RAPID Biosurveillance System
Tier 1: Regulatory Action/Guidance
• FDA and external partners issue guidance to ensure patient safety
Tier 2: Data Visualization
• Dashboards summarizing key information provide safety alerts
• Disproportionality metrics and detailed analyses allow FDA and collaborators to understand emerging issues
Tier 3: Data Management & Analytics
• RAPID data is combined with existing MedWatch and MedWatcher reports and data from external collaborators
• Advanced analytics support AE signal detection
Tier 4: Processing of Adverse Events
• Healthcare professionals submit AE data via the RAPID mobile app
• AE data is stored in a “data lake” to support real-time access

[Logos: MedWatcher, Drug Shortage]

The cloud-based RAPID Biosurveillance System will support collaboration between FDA and other Federal agencies to
enhance monitoring of investigational therapies for Ebola and other emerging health threats

RAPID Biosurveillance System
Tier 4: Processing of Adverse Events
• Healthcare professionals submit AE data via the RAPID mobile app
• AE data is stored in a “data lake” to support real-time access
Tier 1: Regulatory Action/Guidance
• FDA and external partners issue guidance to ensure patient safety

Four points worth discussion
• Convergence of information sources
• What solutions will solve convergence issues
• Each information source has its own message
• Each information source has its own problems

Holistic Drug Safety

Pharmacovigilance IT Tools Should Not Replace Our Own Eyes or Good Clinical Judgment
(Slide courtesy of June Almenoff, M.D., Ph.D.)
Distinguishing humans and non-humans
DIA www.diahome.org

Data Mining team
• Syed Haider
• Alfred Sorbello
• Ana Szarfman
• Joseph Tonning
• Bruce Weaver
• Lin Yu (collaborator)
• Richard Zhang