Distributed Big Data and Analytics (DBDA)
Internet2 CINO Initiative Working Group
Co-Chair Meeting
10 August 2015

Chairs
Alex Feltus, Clemson
Sam Gustman, USC
Marc Hoit, NC State

Meeting Objectives
• Discussion of eight submitted use cases
• Discussion of next steps

Use case input from the working group

Use Case # | Use Case Title                                   | Name                                                  | Institution
1          | Data Analytics of Campus-Scale Power System      | Submitted by Alex Feltus; Contact Dan Noneaker        | Clemson
2          | Intelligent Maintenance Systems Center           | Submitted by Jane Combs; Contact Prof. Jay Lee        | Univ of Cincinnati
3          | Machine Tool Ball Screw Health Monitoring        | Submitted by Jane Combs; Contact Prof. Jay Lee        | Univ of Cincinnati
4          | Bioinformatics                                   | Submitted by Jane Combs                               | Univ of Cincinnati
5          | Computational Fluid Dynamics Research: Aerospace | Submitted by Jane Combs                               | Univ of Cincinnati
6          | Geography/Climate                                | Submitted by Jane Combs                               | Univ of Cincinnati
7          | High Energy Physics                              | Submitted by Jane Combs                               | Univ of Cincinnati
8          | Modeling and Simulations                         | Submitted by Jane Combs                               | Univ of Cincinnati

Data Analytics of Campus-Scale Power System
Submitted by Alex Feltus at Clemson
Contact Dan Noneaker at Clemson
• Project/Research Title: Data Analytics of Campus-Scale Power System
• Industry Sector: Electric Power Utility
• Science Sub-domain: Electrical Engineering
• Short Description of Project & Relation to Big Data: The local electric power grid will be heavily instrumented on a campus containing a mix of residential sites, office spaces, industrial-scale electromechanical systems, and distributed energy sources. The instruments will be networked to a server that provides data for analytics focused on electric energy consumption, electric-service reliability, power quality, and local grid planning and design. The analytics will support research in areas such as local-grid technologies, distributed control of the electric grid, and power electronics.
• Potential Industrial Partners: Duke Energy (Clemson's electric service provider), other electric utilities, power-industry instrumentation and electronics manufacturers, power-system monitoring and control vendors
• Other Faculty Involved: All power faculty at Clemson, power research staff at Clemson's CURI site in Charleston, SoC faculty working in data analytics
• Best Contact: Dan Noneaker, ECE Dept. Chair, dnoneak@clemson.edu
• Big Data Attributes: sensor, near-realtime, distributed, geospatial
• Aggregate Data Size: Now: 2 TB; 2016: 4 TB; 2017: 16 TB; 2020: 1 PB

Intelligent Maintenance Systems (IMS) Center
Submitted by Jane Combs at University of Cincinnati
Contact Prof. Jay Lee at University of Cincinnati
• Project/Research Title: Utilizing Prognostics & Health Management (PHM) Cloud Technology to Improve Band Sawing Process
• Industry Sector: Manufacturing / Industrial Machinery
• Science Sub-domain: Data Analytics / Prognostics & Health Management
• Short Description of Project & Relation to Big Data: The goal of this project is to acquire a large amount of operating data from band saw machines, both in the field and from an in-house test bed. This data is then analyzed using the Watchdog Agent® toolkit to assess and predict the health condition of the monitored band saws. Once validated, this approach will be used to construct a commercial cloud-based platform and mobile app for the project sponsor.
• Best Contact: Professor Jay Lee
• Big Data Attributes: Sensor, Near Real-time
• Aggregate Data Size: Now: 5 TB; 2016: 10 TB; 2017: not provided; 2020: not provided

Machine Tool Ball Screw Health Monitoring
Submitted by Jane Combs at University of Cincinnati
Contact Prof. Jay Lee at University of Cincinnati
• Project/Research Title: Machine Tool Ball Screw Health Monitoring
• Industry Sector: Manufacturing / Industrial Machinery
• Science Sub-domain: Data Analysis / Prognostics & Health Management
• Short Description of Project & Relation to Big Data: The goal of this project is to conduct multiple run-to-failure tests using commercially available machine tool ball screws and motors, collect the resulting data, and design a data-driven model for health monitoring and prediction of such ball screws. Data from these tests is transferred to and stored on a central server. A mobile app will be developed for monitoring these tests.
• Best Contact: Professor Jay Lee
• Big Data Attributes: Sensor, Near Real-time
• Aggregate Data Size: Now: 5 TB; 2016: 15 TB; 2017: 30 TB; 2020: not provided

Bioinformatics
Submitted by Jane Combs at University of Cincinnati
• Project/Research Title: NIH BD2K-LINCS Perturbation Data Coordination and Integration Center
• Industry Sector: Health care / Environmental Health / Biomedical Research
• Science Sub-domain: Bioinformatics and Data Science
• Short Description of Project & Relation to Big Data: The Library of Integrated Network-Based Cellular Signatures (LINCS) project is expected to produce masses of data collected from human cells and tissues perturbed with drugs and other molecules. The center's role is to develop new methods to integrate big data, devise intelligent ways to mine and analyze it, build intuitive tools to interact with it, and educate the research community on how best to leverage this trove of information for biomedical research.
• Best Contact: Jane Combs, combsje@uc.edu
• Big Data Attributes: In biomedical research, these data sources include the diverse, complex, disorganized, massive, and multimodal data being generated by researchers, hospitals, and mobile devices around the world.
• Aggregate Data Size: Now, 2016, 2017, 2020: not provided

Computational Fluid Dynamics Research: Aerospace
Submitted by Jane Combs at University of Cincinnati
• Project/Research Title: Study of Active and Passive Flow Control Techniques over Turbine Blades
• Industry Sector: Aerospace
• Science Sub-domain: Mechanical Engineering, Comp Fluid Dynamics
• Short Description of Project & Relation to Big Data: Collaborative immersive visualization of large datasets and simulation trajectories to support the study of active and passive flow control techniques and turbine-blade cooling. The goal of such simulations is to provide predictive performance analysis of physical systems that may contain many integrated components and that are described by multiple interacting physical processes.
• Best Contact: Jane Combs, combsje@uc.edu
• Big Data Attributes: Simulation datasets and data visualization
• Aggregate Data Size: Now: 2 TB; 2016: 4 TB; 2017: 6 TB; 2020: not provided

Geography/Climate
Submitted by Jane Combs at University of Cincinnati
• Project/Research Title: Toward a Circumarctic Lakes Observation Network (CALON): Multiscale Observations of Lacustrine Systems
• Industry Sector: Geography
• Science Sub-domain: Climate
• Short Description of Project & Relation to Big Data: Expand on existing lake monitoring sites in northern Alaska by developing a network of regionally representative lakes along environmental gradients, from which we will collect baseline data to assess current physical, chemical, and biological lake characteristics. Download and process hundreds of processed satellite image data sets of up to 1 TB with four Internet2 universities and the NSF National Snow and Ice Data Center. Develop and refine data management, visualization, and archiving activities with ACADIS.
• Best Contact: Jane Combs, combsje@uc.edu
• Big Data Attributes: Satellite image data sets
• Aggregate Data Size: Now: 1 TB; 2016: 3 TB; 2017: not provided; 2020: not provided

High Energy Physics
Submitted by Jane Combs at University of Cincinnati
• Project/Research Title: Large Hadron Collider beauty (LHCb) experiment at CERN studying heavy flavor physics
• Industry Sector: Nuclear Physics, Energy
• Science Sub-domain: Physics
• Short Description of Project & Relation to Big Data: The physics focus is studying oscillations of matter into anti-matter and studying the differences in decay rates of matter and corresponding anti-matter to "mirror-image" final states. These address the nature of fundamental interactions between the basic constituents of matter. We move large data files from host laboratories to computers at UC for final analysis. For example, a single file with an LHCb NTUPLE from a small fraction of the data is 3 GB. The size of the data set to be transferred for this analysis will be on the order of 1 TB.
• Best Contact: Jane Combs, combsje@uc.edu
• Big Data Attributes: LHCb NTUPLE data sets
• Aggregate Data Size: Now: 3 TB; 2016: 6 TB; 2017: 6 TB; 2020: not provided

Modeling and Simulation
Submitted by Jane Combs at University of Cincinnati
• Project/Research Title: Study of Active and Passive Flow Control Techniques over Turbine Blades
• Industry Sector: Consumer Products
• Science Sub-domain: Mechanical Engineering, Modeling and Simulation
• Short Description of Project & Relation to Big Data: The UC Simulation Center, in collaboration with Procter & Gamble, focuses on high-fidelity numerical simulation of complex flow phenomena with a wide range of length and time scales. The UC Simulation Center is a partnership in which students work on M&S of complex industrial problems associated with porous media, multiphase flows, etc. Predictive performance analysis of these multidisciplinary, multi-scale systems generates terabytes of data.
• Best Contact: Jane Combs, combsje@uc.edu
• Big Data Attributes: Simulation datasets and data visualization
• Aggregate Data Size: Now: 2 TB; 2016: 4 TB; 2017: 6 TB; 2020: not provided

Next Steps
• Schedule deep dive calls between Co-Chairs and Use Case POCs
• Determine co-chair presenters for August 31 Joint Collaborative Innovation Community call
• Preparation for in-person meetings at TechEx, October 4-7 in Cleveland, Ohio
  – DBDA Innovation Working Group meeting (75 minutes)
    • Review current status, use cases, gather new ideas
  – Collaborative Innovation Community meeting, all 3 working groups together (90 minutes)
    • Each working group presents status
    • Invite new participants and new ideas
    • Innovation hackathon over lunch for new ideas in current innovation areas or new ones
• Monthly team meeting: next one September 12

Thank You
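
Supplementary note on the High Energy Physics use case: the figures quoted there (3 GB per LHCb NTUPLE file, a data set on the order of 1 TB) can be turned into a rough file count and transfer time. The sketch below is illustrative only. The link speeds of 1 and 10 Gbit/s, the `efficiency` parameter, the decimal definitions of GB and TB, and the function names are all assumptions made for this example; none of them come from the use case submission.

```python
GB = 10**9   # assumed decimal gigabyte, in bytes
TB = 10**12  # assumed decimal terabyte, in bytes


def file_count(dataset_bytes: int, file_bytes: int) -> int:
    """Number of files of size file_bytes needed to hold dataset_bytes (rounded up)."""
    return -(-dataset_bytes // file_bytes)  # integer ceiling division


def transfer_hours(dataset_bytes: int, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move dataset_bytes over a link of link_gbps gigabits per second.

    efficiency is a hypothetical fraction (0 < efficiency <= 1) of the nominal
    link rate that is actually achieved end to end.
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be positive and efficiency in (0, 1]")
    bits = dataset_bytes * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    dataset = 1 * TB      # "on the order of 1 TB" from the use case
    ntuple_file = 3 * GB  # single NTUPLE file size from the use case
    print("files:", file_count(dataset, ntuple_file))
    for gbps in (1.0, 10.0):  # assumed link speeds, not from the source
        print(f"{gbps:g} Gbit/s: {transfer_hours(dataset, gbps):.2f} h")
```

Lowering `efficiency` below 1.0 gives a more conservative estimate for shared or long-distance paths. The same helper could be applied to the aggregate data sizes listed for the other use cases.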