IBM Big Data Projects with Ontario Universities
July 16, 2014
© 2012 IBM Corporation

Phase     Focus                        #Projects   SME
Phase1    FastStart                    7           30%
Phase2    Academic-led                 24          70%
Phase3    Industry and Academic-led    11          90%

Status: 25 running/initiated, 13 scheduled 3Q, 2 Q114, 2 TBD*

Focus Area               #Projects
Health                   20
Energy (* new Mining)    8
Water                    5
Cities                   4
Agile                    5 + 6 multi = 13

Institution                                  #Projects
McMaster University                          4 (all Agile)
University of Ottawa                         3 (1 Agile)
Queen’s University                           4 (0 Agile)
University of Ontario                        3 (0 Agile)
University of Toronto                        12 (3 Agile)
University of Waterloo                       6 (1 Agile)
Western University (1 IBM lead Mining)       9 (4 Agile)
Carleton University (IBM Lead)               1 (0 Agile)
TOTAL                                        42

Platform          #Projects
Blue Gene/Q       15
Cloud             13
Agile             13
Multi-Platform    6 (1 non SOSCIP)

* 2 projects deferred pending Sustainability plan + resources
** 1 project on hold pending UofT/UHN/IBM IP agreement

HEALTH

NANOPHOTONIC DEVICES FOR EARLY DISEASE DETECTION
Create a computational model of nanophotonic devices that could improve the early detection of disease at a cellular level.

PANDEMIC MODELING, PREDICTION, AND CONTROL
Based on citizen behaviors and local GTA health care policies, create models to help decision makers control a pandemic outbreak of a contagious disease.

INFECTIOUS DISEASE MODELING SOFTWARE
Develop a software package that enables easier mathematical modeling of certain factors of infectious disease, such as infection rates, incubation periods, and initial conditions for a new infection.

COMMON SIGNATURES IN LUNG DISEASE
Design an algorithm to identify common signatures in lung cancer patients, advancing diagnosis, prognosis and treatment capabilities.

INTEGRATED RADIOLOGY AND PATHOLOGY
Leverage cloud-based smart platform solutions to integrate medical imaging from radiology, pathology, and oncology departments, providing a patient-centred, holistic approach to the diagnosis and treatment of cancer.
DETECTING RADIATION EXPOSURE
Develop software that will enable rapid, accurate detection of radiation exposure for large numbers of people.

REAL TIME ANALYSIS OF HUMAN BRAIN NETWORKS
Apply stream analytics to functional MRI data to analyze brain activity in near real time, improving patient experience and reducing medical costs and timelines.

ARTEMIS EXPANSION
Build a cloud-based service using streaming analytics to predict the health status of individuals, particularly premature children. Bring the sophistication of urban teaching hospitals to remote communities and extend early detection of infections to the adult ICU.

BUILDING AND CERTIFYING SAFE AND SECURE INSULIN PUMPS
Build a toolkit for software certification. The first application will target medical devices.

DATA PRIVACY AND SHARING MEDICAL DATA
Develop a new healthcare privacy and security framework to address both patient care and medical studies.

ANALYTICS AS A SERVICE
Develop a new toolkit to support ultra-large-scale services for big data analytics. The initial target is health applications, but the toolkit will be applicable to other domains.

ENERGY

RENEWABLE SOLAR ENERGY
Develop a new, low-cost, paint-on, solar-cell-based renewable energy source.

COMBUSTION SYSTEM SIMULATION
Develop a high-performance computing algorithm that simulates combustion systems to improve the design of gas-turbine engines in transportation and power-generation applications.

CREATING SUSTAINABLE ENERGY FROM ARTIFICIAL PHOTOSYNTHESIS
Examine artificial photosynthesis and water-splitting at a nanoscale level to better understand the chemical and physical properties of these processes. This level of understanding will contribute to the development of clean, renewable, and sustainable sources of energy in Ontario.

SMART METER DATA ANALYTICS
Develop software for small/medium enterprises that helps identify smart ways to reduce energy consumption.
WIND-FORECASTING MODEL
Develop a wind-forecasting model that makes renewable wind energy generation in Ontario more proactive and operationally cost-effective.

A MODEL FOR SHORT AND LONG TERM ENERGY PLANNING
Improve models for power distribution reliability, building on IBM’s weather forecast model.

WATER

REAL-TIME, REMOTE SENSOR, WATERSHED DATA ANALYSIS
Design communications software that will transmit real-time data from remote sensors in the Grand River Watershed. The data will improve both the understanding of water behavior and water management tactics.

CLIMATE CHANGE IMPACT ON WATER RESOURCES
Develop a 3D hydrological model that represents the impact of the climate-change forecast on the quality and quantity of the surface and subsurface water resources in the Grand River Watershed.

WATER QUALITY MONITORING
Create a low-cost, easy-to-use, real-time sensor system for water quality monitoring, including biological and chemical contamination detection.

MISSION CRITICAL INFRASTRUCTURE MONITORING
Create disaster management response systems that account for the interdependencies among all aspects of risk mitigation, disaster preparedness, and post-disaster planning. Develop reliable monitoring systems for the real-time detection of mission-critical infrastructure failures.

REAL-TIME DRINKING WATER MANAGEMENT/MONITORING
Create a real-time water data processing system to support water contamination alerts, ensure enough water is supplied to the public, and control residential water use.

CITIES

SMART URBAN SYSTEM DESIGN
Research transportation and urban activity systems in the Greater Toronto Area (GTA) to improve the decision-making ability of urban planning designers in Ontario.

AUTOMATIC DETECTION OF MAN-MADE OBJECTS FROM IMAGE DATA
Develop a cloud-based tool that will automatically identify features (buildings, roads, forest types, etc.) from high-resolution image data for use in areas like urban planning or forest management.
CLIMATE MODEL
Create new detailed climate projections and drive hydrological models to assess the impact of global warming, using dynamic downscaling specific to the Grand River watershed.

AGILE

MAKING HIGH-PERFORMANCE COMPUTING ACCESSIBLE
Design a program that enables software application developers with minimal hardware skills to leverage agile, high-performance computing, resulting in faster development cycles.

ASTRONOMICAL DATA MINING
Leverage agile, high-performance computing to boost the processing capability of data obtained from Ontario’s world-leading astronomical radio telescope at the Algonquin Radio Observatory.

REAL-TIME NETWORK CAPACITY ADJUSTMENT
Leverage agile computing to develop an accelerated ray-tracing algorithm, allowing network operators to adjust network capacity in real time based on changes in network state and improving quality of service.

IMPROVING SMART GRID DATA EXCHANGE
Improve the security and efficiency of handling massive data exchanges in smart grid infrastructures through hardware-accelerated computing.

DESIGN PATTERNS FOR HETEROGENEOUS COMPUTING
Develop a set of design patterns that will make high-performance computing more accessible to software developers creating complex applications.

HARDWARE ACCELERATION THROUGH AGILE COMPUTING
Using agile computing, develop a hardware-based system that will accelerate the ability of the IT industry to solve optimization problems, such as routing and scheduling of airplanes and urban transportation systems.

SME Lead

PATIENT-CENTRED UNIVERSAL HEALTH RECORDS
Develop a cloud-based solution to aggregate, analyze, and standardize patient health records.

WEATHER PROJECTIONS FOR SMART CITIES
Integrate high-resolution weather projections with city infrastructure (buildings, transportation networks, etc.) to improve its design, sustainability, and resiliency.
PREDICTING LEUKEMIA INHIBITORS
Develop a tool that simulates molecular behaviour to accelerate the selection of drugs for the treatment of leukemia.

IBM Lead

IN SILICO PROTEIN SYNTHESIZER
Perform genetic analysis to sequence proteins that aid in drug design. This will help to better understand diseases and treatments.

ANALYZING GEOSPATIAL PATTERNS IN THE CLOUD: APPLICATION TO MINERAL EXPLORATION AND MINING IN CANADA
Aggregate data and perform statistical analysis, data mining, scoring, and ranking, using Monte Carlo methods and Bayesian statistics, to identify areas of promise without requiring extraction of new core samples.

NEW PROJECTS

CYTOGENETIC DECISION-SUPPORT TOOL
Design and implement a decision-support tool to assist cytogeneticists in the selection of appropriate probes to improve the speed and accuracy of DNA microarray testing for the diagnosis and treatment of chromosome/cell-related disorders.

PHOTODYNAMIC CANCER THERAPY
Develop a new, minimally invasive photodynamic therapy for the treatment of head and neck cancers.

MISSION-CRITICAL INFRASTRUCTURE MONITORING
Develop real-time detection of mission-critical infrastructure failures, such as loss of energy supply, water contamination, dam failure, or the collapse of structures such as buildings or bridges.

NEWEST PROJECTS

PREDICTIVE CARDIOTOXICITY USING MACHINE LEARNED MODELING FROM SIMPLE BIOLOGICAL INPUTS
Identify novel ECG patterns using advanced mathematical techniques that assess dynamic alterations in cardiac conduction and repolarization, along with alterations in vascular and autonomic function.

CHARACTERIZATION OF PROTEIN-DRUG INTERACTION NETWORKS FOR RARE-DISEASE REPURPOSING (FDA)
Perform structural alignment of test sets of millions of compounds and molecular dynamics simulations involving tens or hundreds of proteins from the FDA’s Rare Disease Repurposing Database.
QUANTIFIED PSYCHIATRY: TIME SERIES PREDICTIVE MODELING IN MENTAL HEALTH DIAGNOSTICS
Perform analysis of pharmaceutical clinical trial data for antidepressant medications and gather behavioural markers to segment psychopathology. The analysis will generate insight into patterns underlying patient behaviour and into placebo response in the context of these trials.

FPGAs and Big Data

Dimensions of Parallelism
Direct data ingest: network, storage
Pipeline parallelism
Multiple pipelines
Multiple kernels/functions
Attachment to host CPU: I/O attachment, or coherent attachment via CAPI

Performance
[Charts: floating-point performance (GFLOPS), power consumption (W), and GFLOPS/W for CPU, GPU, and FPGA (single-precision fused multiply-add); annotation "~2×"]

Enabling Technologies
Lime
POWER8 CAPI

Real-time fMRI Brain Analytics
Mark Daley, Western University (London, ON)

The problem: brain activity scans take days to analyze
The solution: a real-time analytics engine
FPGA replaces 48 x86 cores and implements a superior motion correction algorithm
IBM InfoSphere Streams on POWER7 constructs graphs of brain networks 40x faster than a single process on x86
Graph updates every 0.6-0.8 s
Planning replacement of CPU-based graph analytics with POWER8 and a CAPI-attached FPGA accelerator
Results in seconds instead of days!
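
The fMRI slide says that graphs of brain networks are rebuilt as data streams in, but it does not say how. The sketch below illustrates one common way such a graph could be maintained, purely as an assumption: keep a sliding window of recent samples, compute the Pearson correlation between every pair of brain regions, and connect pairs whose correlation exceeds a threshold. The function names (`pearson`, `build_graph`, `stream_graphs`), the window size, the threshold, and the toy data are all hypothetical; this is plain Python, not InfoSphere Streams code and not the project's actual method.

```python
from collections import deque
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def build_graph(window, threshold):
    """Return the set of region pairs (i, j), i < j, with |correlation| >= threshold.

    `window` is a sequence of samples; each sample holds one value per region.
    """
    series = list(zip(*window))  # transpose: one time series per region
    return {
        (i, j)
        for i, j in combinations(range(len(series)), 2)
        if abs(pearson(series[i], series[j])) >= threshold
    }


def stream_graphs(samples, window_size=4, threshold=0.8):
    """Yield an updated graph for each new sample once the window is full.

    window_size and threshold are illustrative defaults, not values from the slides.
    """
    window = deque(maxlen=window_size)  # oldest sample drops out automatically
    for sample in samples:
        window.append(sample)
        if len(window) == window_size:
            yield build_graph(window, threshold)


if __name__ == "__main__":
    # Toy data, not real fMRI: regions 0 and 1 move together, region 2 does not.
    samples = [
        (1.0, 2.0, 5.0),
        (2.0, 4.0, 1.0),
        (3.0, 6.0, 4.0),
        (4.0, 8.0, 2.0),
        (5.0, 10.0, 3.0),
    ]
    for edges in stream_graphs(samples):
        print(sorted(edges))
```

On the toy data, each full window yields a graph with the single edge between regions 0 and 1. Recomputing every pairwise correlation on each update costs time quadratic in the number of regions, which is the kind of workload the slides propose offloading to accelerators.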