Big Data Presentation and Discussion

National Science Foundation
Cyberinfrastructure for Materials Discovery and Innovation*
*Some Assembly Required
Daryl W. Hess
Program Director, DMR
dhess@nsf.gov
Big Data Request for Input Closes November 14
https://www.nitrd.gov/bigdata/rfi/02102014.aspx
Opportunities to Shape How Science is Done
Data plays a key role …
• The way computation, data, experiment & theory interact
• Enhanced Connectivity
– Unprecedented Communication of Results
– What didn’t work, not just what did
• The Data is getting bigger ….
– Simulations
– New instruments
• How do we handle BIG data?
– Distributed, Inhomogeneous
– Provenance? Access? “Presentation?” Will it be there tomorrow?
– Discovery tool?
– A gateway to new problems?
– ….
OSTP: Big Data is a Big Deal
President: … an “all hands on deck” effort.
“By improving our ability to extract knowledge and insights from large
and complex collections of digital data, the initiative promises to help
accelerate the pace of discovery in science and engineering,
strengthen our national security, and transform teaching and
learning.”
• more than $200 million from 6 Federal departments and agencies
• Big Data: data sets so large, complex, or rapidly-generated that they can’t
be processed by traditional information and communication technologies
• Year 2: Obama Administration encourages federal agencies,
industry, academia, state and local governments to develop and
participate in Big Data initiatives.
• advance core Big Data technologies;
• use Big Data to advance national goals
• use competitions and challenges
To help businesses discover, develop, and deploy new materials
twice as fast, we’re launching what we call the Materials Genome
Initiative. The invention of silicon circuits and lithium ion batteries
made computers and iPods and iPads possible, but it took years to
get those technologies from the drawing board to the market place.
We can do it faster.
-President Obama
Carnegie Mellon University, June 2011
The Materials Genome Initiative
Discovery-to-market in less than half the time at half the cost
A Materials Innovation Infrastructure is Required
MGI for Global Competitiveness
A strategy for acceleration
[Figure: number of materials “to market” versus time (yrs), with numbered stages 1 to 7 starting from “1. Discovery”; label: 3-5 years.]
Nanotechnology Knowledge Infrastructure:
Enabling National Leadership in Sustainable Design
A Nanotechnology Signature Initiative
Thrust: Foster an agile modeling network for multidisciplinary collaboration.
Thrust: Create a robust digital nanotechnology data and information infrastructure
Thrust: Build a sustainable nanotechnology cybertoolbox
Thrust: Nurture a diverse collaborative community to create nanotechnology to
meet national challenges
A community-based knowledge infrastructure to accelerate
nanotechnology discovery and innovation
http://www.nano.gov/NSINKI
The Science Drives The Cyberinfrastructure
NSF Workshop
The Materials Genome Initiative: The Interplay of Experiment, Theory and Computation
“… a seamless interplay between
experiment, computation and
demonstration is essential.”
A key role for data
J.J. de Pablo, B. Jones, C. Lind-Kovacs, V. Ozolins, A. P. Ramirez.
Current Opinion in Solid State and Materials Science 18, 99–117 (April, 2014)
NSF Workshop
The Materials Genome Initiative: The Interplay of Experiment, Theory and Computation
Consider a university researcher with an idea for a new battery material
…
She queries a large database of experimental and computational data,
accesses an online simulation service, and suggests other promising
compounds …
… [Others] make some of the materials … and upload energy storage
performance data to the database.
… other computational researchers show why the new materials are
effective.
The new material is made in several labs and … stimulating industry.
Commercial scale fabrication and scale-up are taken into consideration
by industrial scientists and engineers as materials are being developed.
All of this happens faster than the time it takes for a single research
paper to be published today. (highly abridged)
J.J. de Pablo, B. Jones, C. Lind-Kovacs, V. Ozolins, A. P. Ramirez.
Current Opinion in Solid State and Materials Science 18, 99–117 (April, 2014)
Designing Materials to Revolutionize and Engineer our Future (DMREF)
NSF’s signature response to the MGI
What: … activities that accelerate materials discovery and
development by building the fundamental knowledge base needed to
progress towards designing and making a material with a specific and
desired function or property from first principles.
DMREF goal: to control material properties through design: this is to
be accomplished by understanding the interrelationships of composition,
processing, structure, properties, performance, and process control.
How: … the proposed research must be a
collaborative and iterative process wherein theory
guides computational simulation, computational
simulation guides experiments, and experiments
further guide theory.
Benefits from existing software and data and will
contribute new data, models, and software
Software Infrastructure for Sustained Innovation (SI2)
Innovation ← Reusable and Sustainable Software
NSF 14-520
• Through Division of Advanced Cyberinfrastructure
• ~50 Elements & Frameworks projects & 13 potential Institutes
planning projects (http://bit.ly/sw-ci for current SI2 projects)
SI2-SSI Collaborative Research:
A Computational Materials Data and Design Environment
Goal: Develop modular and extensible high-throughput ab-initio tools and critical property
databases for materials design
Developing tools to model
• Defect energetics/thermodynamics
• Solid state diffusion
• Charged surfaces
Participants: Dane Morgan, Kristin Persson, Gerbrand Ceder, Alan K Dozier, Raphael Finkel
[Figure panels: Formation energies; Migration energies]
CyberInfrastructure Impact
• Open source tools for easy public use
• Workshops on high-throughput computation and the Materials Genome framework.
• Integration with Materials Project at LBNL
• Training of next generation in automated computational materials design tools
SSI: Scalable, Extensible, and Open Framework for
Ground and Excited State Properties of Complex Systems
Sohrab Ismail-Beigi, Yale (139804)
Laxmikant Kale, UIUC (1339715)
Glenn Martyna, IBM (collaborator)
• Develop a first-principles ground
state/excited state and response code
with good scaling properties on HPC
platforms
• For study of new materials science,
condensed matter physics, and chemistry
problems.
• An American electronic structure code
• Highly scalable codes for HPC that would
open new frontiers through the ability to
tackle larger scale problems.
• Helps support sophisticated materials
design efforts and the Materials Genome
Initiative.
DFT-GGA predicted structure of the ordered
nanoscale heterojunction of P3HT polymers (gold,
above) anchored covalently by sulfur atoms to the
[10-10] surface of a ZnO nanowire (red and
purple, bottom). A GW/BSE prediction of the
electronic bands and excitons of this interface
would help for photo-voltaic applications.
S2I2 Software Institutes
What does materials research need?
• Long-term hubs of excellence in software
infrastructure and technologies, research
and application communities of substantial
size and disciplinary breadth.
• ~ $1-2 Mil./yr.
• Conceptualization awards possible
~13 awarded across NSF
• Collaborative Research: Scientific Software Innovation Institute for
Advanced Analysis of X-Ray and Neutron Scattering Data (SIXNS)
- Brent Fultz, Caltech
• Collaborative Research: A Scientific Software Innovation Institute
for Computational Chemistry and Materials Modeling (S2I2C2M2)
- Tom Crawford, Virginia Tech
Data & NSF: Enabling a Knowledge
Infrastructure
Warning: Emphasis will change!
• BIGDATA (NSF 14-542) [$200K/yr. - $500K/yr.]
Critical Techniques and Technologies for
Advancing Big Data Science & Engineering
– "Foundations" (F): developing or studying fundamental
techniques, theories, methodologies, and technologies of broad
applicability to Big Data problems.
– "Innovative Applications" (IA): developing techniques,
methodologies and technologies of key importance to a Big Data
problem directly impacting at least one specific application.
Data & NSF: Enabling a Knowledge
Infrastructure
Warning: Emphasis will change!
• DIBBs (NSF 14-530)
Data Infrastructure Building Blocks
“… development of robust and shared data-centric cyberinfrastructure
capabilities to accelerate interdisciplinary and collaborative research in
areas of inquiry stimulated by data.”
– Pilot Demonstration Awards (up to $500K/yr.):
addresses needs of a large number of researchers within a domain
– Early Implementation Awards (up to $1 Million/yr.):
multiple research communities in multiple S&E domains
Community Input
Grappling with the *issues
Workshops
– NKI: Data Sharing Workshop
• Diverse participants: nanotechnology data producers &
users, journal editors, industry, ….
• Cross-cutting needs of the communities
– DMR: Data Workshop
• Data Sharing in the context of materials research
* Data curation, completeness, relevance, quality, “code
publication,” ontologies …
Big Data Request for Input
https://www.nitrd.gov/bigdata/rfi/02102014.aspx
“This request encourages feedback from multiple big
data stakeholders to inform the development of a
framework, set of priorities, and ultimately a strategic
plan for the National Big Data R&D Initiative.”
VISION STATEMENT: We envision a Big Data innovation ecosystem in
which the ability to analyze, extract information from, and make decisions
and discoveries based upon large, diverse, and real-time data sets enables
new capabilities for federal agencies and the nation at large; accelerates the
process of scientific discovery and innovation; leads to new fields of
research and new areas of inquiry that would otherwise be impossible;
educates the next generation of 21st century scientists and engineers; and
promotes new economic growth.
Moving Forward
A useful way to think about data?
DMR MGI Workshop
– Shared Data is a focal point for
collaborative research
Data infrastructure across the scales …
Small Group
Collaboration
Center-Center
Collaboration
Intra-Center
Collaborations
Many Mixed
Collaborations
Thank You!
Nanotechnology Knowledge Infrastructure:
Enabling National Leadership in Sustainable Design
A Nanotechnology Signature Initiative
Thrust Areas:
• A diverse collaborative community of
scientists, engineers, and technical staff to
support research, development, and
applications of nanotechnology to meet
national challenges
• An agile modeling network for
multidisciplinary intellectual collaboration that
effectively couples experimental basic
research, modeling, and applications
development
• A sustainable cyber-toolbox to enable
effective application of models and knowledge
to nanomaterials design
• A robust digital nanotechnology data and
information infrastructure to support
effective data sharing, collaboration, and
innovation across disciplines and applications
Accelerating Nanotechnology discovery and innovation
http://www.nano.gov/NSINKI