Computational Science as an enabler
for sustainable FEW Systems
Baskar Ganapathysubramanian
Iowa State University
NSF FEW Workshop: Oct 12-13, 2015, ISU
Computational Science and Engineering Group
What we do:
1) Algorithm design and software implementation
2) Application-driven research: a curiosity-driven group
Overview of research activities related to Plant Sciences
Feature extraction: Data for crop models
Data for validation/input/calibration
Data deluge due to advances in sensors and data collection
Heterogeneous data spanning multiple length and time scales
Noisy, gappy data
Need to extract traits used for various downstream tasks
Must do this in an automated, high-throughput, and efficient way
Similar issues are faced by other disciplines: astronomy, particle physics, driverless automobiles, security and defense applications
Machine learning approaches are very promising
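As a minimal sketch of the automated repair-then-extract step, assuming a single sensor time series (for example, one plant's canopy height over a season) with missing readings stored as None: gaps are filled by linear interpolation, the series is smoothed with a centered moving average, and two illustrative "traits" are read off. The function names, the window size, and the choice of peak value and time of peak as traits are hypothetical, not part of the original work.

```python
def fill_gaps(values):
    """Fill None entries by linear interpolation; edge gaps copy the nearest observation."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no observations")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:            # leading gap: copy first observation
            filled[i] = values[right]
        elif right is None:         # trailing gap: copy last observation
            filled[i] = values[left]
        else:                       # interior gap: interpolate between neighbours
            w = (i - left) / (right - left)
            filled[i] = (1 - w) * values[left] + w * values[right]
    return filled


def moving_average(values, window=3):
    """Centered moving average; the window shrinks at the series ends."""
    half = window // 2
    out = []
    for i in range(len(values)):
        seg = values[max(0, i - half): i + half + 1]
        out.append(sum(seg) / len(seg))
    return out


def extract_traits(series):
    """Hypothetical traits: peak of the cleaned series and the index where it occurs."""
    smooth = moving_average(fill_gaps(series))
    peak = max(smooth)
    return {"peak": peak, "time_of_peak": smooth.index(peak)}


series = [1.0, None, 3.0, 4.0, None, 2.0]  # made-up readings with two gaps
print(fill_gaps(series))
print(extract_traits(series))
```

Here fill_gaps turns the example into [1.0, 2.0, 3.0, 4.0, 3.0, 2.0], and the smoothed peak falls at index 3. Real pipelines must handle multi-sensor, multi-scale data, but the same pattern (repair, denoise, then extract) applies.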
Machine Learning
Goal of ML is to generalize beyond the training data
(Figure from opsrules.com)
Pattern recognition, perception, and control tasks
Very difficult to manually encode all features
Breakthroughs in learning algorithms; prominent examples include ‘deep networks’
[Figures: MNIST dataset; NVIDIA cuDNN website; TIMIT dataset]
More data, better computing infrastructure
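Generalizing beyond the training data is usually checked by scoring a model on samples it never saw during fitting. The sketch below assumes synthetic two-class data and uses a nearest-centroid classifier only because it is short; any learner, including a deep network, would be evaluated with the same kind of held-out split.

```python
import random


def nearest_centroid_fit(X, y):
    """Compute the mean feature vector (centroid) of each class label."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def nearest_centroid_predict(centroids, x):
    """Assign x to the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


def accuracy(centroids, X, y):
    return sum(nearest_centroid_predict(centroids, x) == t for x, t in zip(X, y)) / len(y)


# Synthetic data (hypothetical): class 0 near (0, 0), class 1 near (3, 3).
rng = random.Random(0)
data = [([rng.gauss(0, 1), rng.gauss(0, 1)], 0) for _ in range(100)]
data += [([rng.gauss(3, 1), rng.gauss(3, 1)], 1) for _ in range(100)]
rng.shuffle(data)

# Hold out 30% of the samples; the model never sees them while fitting.
split = int(0.7 * len(data))
train, test = data[:split], data[split:]
model = nearest_centroid_fit([x for x, _ in train], [t for _, t in train])
train_acc = accuracy(model, [x for x, _ in train], [t for _, t in train])
test_acc = accuracy(model, [x for x, _ in test], [t for _, t in test])
print(f"train accuracy {train_acc:.2f}, held-out accuracy {test_acc:.2f}")
```

A large gap between training and held-out accuracy would signal memorization rather than generalization.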
Machine Learning Examples
Learning feature labels in scenes: convolutional networks
From LeCun group, Hinton group, Ng group
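The core operation in a convolutional network slides a small weight array (a filter) over the image and records a weighted sum at every position; deep-learning libraries implement this as cross-correlation and call it convolution. A minimal pure-Python sketch, with a hand-picked vertical-edge filter standing in for weights that a real network would learn:

```python
def conv2d_valid(image, kernel):
    """'Valid' 2D cross-correlation, the operation CNN layers call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            # Weighted sum of the kh x kw patch whose top-left corner is (i, j).
            out[i][j] = sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw)
            )
    return out


def relu(feature_map):
    """Elementwise max(0, v), the usual nonlinearity after a convolution."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# A 4x4 image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-picked vertical-edge detector; in a CNN these weights are learned.
kernel = [
    [-1, 1],
    [-1, 1],
]
print(relu(conv2d_valid(image, kernel)))
```

Every output row has its only nonzero response (2) in the middle column, which lines up with the edge. Stacking many learned filters, nonlinearities, and pooling layers yields the feature hierarchies these groups demonstrated.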
Machine Learning Examples
Learning a hierarchy of features: feature extraction using autoencoders, sparse autoencoders, deep belief networks, and deep neural networks
From LeCun group, Hinton group, Ng group
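An autoencoder learns features by being trained to reproduce its input through a narrow "bottleneck"; the bottleneck activation is the learned feature. The sketch below is the smallest possible case: a single linear bottleneck unit trained with plain stochastic gradient descent on hypothetical 2D points lying on the line x2 = 2 x1. Deep and sparse autoencoders stack nonlinear versions of this and add penalty terms, which this sketch omits.

```python
def encode(w, x):
    """One-unit linear encoder: h = w . x (the learned feature)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def decode(v, h):
    """Linear decoder: reconstruct x_hat = v * h."""
    return [vi * h for vi in v]


def mse(w, v, data):
    """Mean squared reconstruction error over the dataset."""
    total = 0.0
    for x in data:
        x_hat = decode(v, encode(w, x))
        total += sum((a - b) ** 2 for a, b in zip(x_hat, x))
    return total / len(data)


def train(data, w, v, lr=0.02, epochs=500):
    """Plain SGD on the per-sample loss ||v * (w . x) - x||^2."""
    for _ in range(epochs):
        for x in data:
            h = encode(w, x)
            r = [xh - xi for xh, xi in zip(decode(v, h), x)]  # residual
            # Gradients (using the pre-update weights): dL/dv = 2 r h, dL/dw = 2 (v . r) x
            vr = sum(vi * ri for vi, ri in zip(v, r))
            v = [vi - lr * 2 * ri * h for vi, ri in zip(v, r)]
            w = [wi - lr * 2 * vr * xi for wi, xi in zip(w, x)]
    return w, v


# Hypothetical data: 11 points on the line x2 = 2 * x1, with x1 from -1 to 1.
data = [[t, 2 * t] for t in (k / 5 - 1 for k in range(11))]
w0, v0 = [0.5, 0.1], [0.3, 0.4]  # arbitrary fixed starting weights
before = mse(w0, v0, data)
w, v = train(data, w0, v0)
after = mse(w, v, data)
print(f"reconstruction MSE: {before:.3f} -> {after:.2e}")
```

Because the data are exactly one-dimensional, a single unit can reconstruct them, so the error should fall from roughly 1.4 to essentially zero.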
ML: Agricultural Examples
Basic hypothesis: Use high-throughput phenotyping to enable extraction of detailed characteristics of tassels.
P. Schnable
Challenges: identification of tassel locations, followed by extraction of tassel features from close to a million images!
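Locating tassels and then measuring them maps onto a classic two-step pattern: segment the image into a binary mask, then label connected regions and compute per-region features. The sketch assumes the mask already exists (from thresholding or a trained segmentation model, not shown), with 1 marking tassel pixels; the chosen features (area, centroid, bounding box) are illustrative rather than the traits used in the project.

```python
from collections import deque


def find_objects(mask):
    """Label 4-connected foreground regions in a binary mask and summarize each.

    Returns a list of dicts with pixel area, centroid (row, col) and
    bounding box (min_row, min_col, max_row, max_col), in scan order.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    objects = []
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue, pixels = deque([(r0, c0)]), []
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and mask[nr][nc] and not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            rs = [p[0] for p in pixels]
            cs = [p[1] for p in pixels]
            objects.append({
                "area": len(pixels),
                "centroid": (sum(rs) / len(rs), sum(cs) / len(cs)),
                "bbox": (min(rs), min(cs), max(rs), max(cs)),
            })
    return objects


# Toy mask with two separate "tassel" regions.
mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
for obj in find_objects(mask):
    print(obj)
```

At close to a million images, this per-image step would be run in parallel across files, and a vectorized library routine such as scipy.ndimage.label would typically replace the pure-Python loop.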
ML: Agricultural Examples
Basic hypothesis: Use high-throughput phenotyping to understand features affecting (a)biotic stress tolerance
Example application:
Iron Deficiency Chlorosis (IDC)
A. Singh, S. Sarkar
IDC: inability of plants to absorb iron from soil
Current methods are visual:
- Time-consuming
- Labor-intensive
- Reliability/consistency issues
[Figure: Standard Area Diagram (A. Singh), panels labeled 1 to 5]
ML tools for rapid identification, deployed as apps
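A deliberately simple, hypothetical stand-in for such a tool: estimate the fraction of yellowish (chlorotic) pixels on a segmented leaf, then assign the rating of the nearest Standard Area Diagram panel, i.e. a 1-nearest-neighbour match against reference values. The colour rule, the 0.8 ratio, and the reference fraction per rating are invented for illustration; a real app would learn these from labelled images.

```python
# Hypothetical reference values: chlorotic fraction assumed for each
# Standard Area Diagram panel (1 = no chlorosis ... 5 = most severe).
REFERENCE = {1: 0.0, 2: 0.1, 3: 0.3, 4: 0.6, 5: 0.9}


def is_chlorotic(pixel, ratio=0.8):
    """Hypothetical colour rule: a pixel looks yellow if red is close to green and green exceeds blue."""
    r, g, b = pixel
    return r >= ratio * g and g > b


def chlorotic_fraction(pixels):
    """Fraction of leaf pixels flagged as chlorotic."""
    return sum(is_chlorotic(p) for p in pixels) / len(pixels)


def idc_rating(pixels, reference=REFERENCE):
    """Return the rating whose reference fraction is closest to the measured one."""
    frac = chlorotic_fraction(pixels)
    return min(reference, key=lambda k: abs(reference[k] - frac))


green, yellow = (40, 160, 40), (200, 200, 50)  # made-up RGB values
leaf = [yellow] * 3 + [green] * 7
print(idc_rating(leaf))  # 30% chlorotic pixels -> rating 3
```

Replacing the fixed colour rule and reference table with a classifier trained on rated field images is what would turn this into an ML tool.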
ML for Yield Prediction
Yield forecasting: a combination of knowledge-based computer programs (which simulate plant-weather-soil-management interactions), soil and environmental data, and targeted surveys.
D. Hayes
S. Sarkar
D. Nettleton
Companies such as Climate Corp and other big data firms may now be able to beat the USDA at yield forecasting, which could lead to detrimental market asymmetry.
A publicly available, high-quality yield prediction tool will enable producers to make informed decisions, thereby ensuring a symmetric market.
Goals:
1) Collect and curate a dataset of economic, agricultural, meteorological, and crop management traits used to make predictions
2) Develop and deploy a suite of statistical and ML tools on the data
3) Create a workflow that enables the larger community to use the data and test methods
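The third goal calls for a workflow in which the community can plug in methods and compare them on equal terms. One common way to do that with a short yield history is leave-one-season-out cross-validation. The sketch assumes a single hypothetical predictor (a seasonal rainfall index) and synthetic yields; it compares a mean-yield baseline with ordinary least squares, and any other method exposing the same fit(xs, ys) -> predict signature could be scored the same way.

```python
def fit_mean(xs, ys):
    """Baseline: predict the average training yield regardless of input."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b * x with a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def leave_one_out_rmse(fit, xs, ys):
    """Hold out each season in turn, fit on the rest, and score the held-out prediction."""
    sq_errors = []
    for i in range(len(xs)):
        predict = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        sq_errors.append((predict(xs[i]) - ys[i]) ** 2)
    return (sum(sq_errors) / len(sq_errors)) ** 0.5


rain = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical seasonal rainfall index
noise = [0.5, -0.5, 0.3, -0.3, 0.2, -0.2, 0.1, -0.1]
yields = [100 + 5 * x + e for x, e in zip(rain, noise)]  # synthetic, not real data

for name, fit in [("mean baseline", fit_mean), ("linear", fit_linear)]:
    print(name, round(leave_one_out_rmse(fit, rain, yields), 2))
```

Keeping the scoring harness separate from the models is the design point: new statistical or ML methods only need to supply a fit function to be compared against existing ones.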
Optimization: Trait identification for productivity
Simple physiological model of an adult maize plant
Validated in the field by Matthew Gilbert (UC Davis)
D. Attinger
Several field-testable traits: stomatal conductance; root, stem, and leaf conductance
Input: hourly weather data
Outputs: water use, photosynthetic yield
M. Gilbert
Software engineering
Code optimization
Integrate with parallel optimization framework
Deploy on HPC systems
Optimization: Trait identification for productivity
Explored trait performance under well-irrigated vs. drought conditions.
Pareto front computed from more than 3 million tested configurations. Ran on XSEDE (TACC) and local HPC resources (unpublished, 2015).
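For two objectives, such as simulated yield under well-irrigated conditions and under drought (both assumed here to be maximized), the Pareto front can be extracted in O(n log n) time with one sort and one sweep, which matters at the scale of millions of configurations. A minimal sketch, not the group's code; the configuration scores in the example are made up.

```python
def pareto_front_2d(points):
    """Non-dominated (maximized) points for two objectives via one sort and one sweep.

    Exact duplicates collapse to a single representative.
    """
    # Sort by objective 1 descending, breaking ties by objective 2 descending.
    ordered = sorted(points, key=lambda p: (-p[0], -p[1]))
    front, best_second = [], float("-inf")
    for p in ordered:
        # Every earlier point is at least as good on objective 1, so p survives
        # only if it beats all of them on objective 2.
        if p[1] > best_second:
            front.append(p)
            best_second = p[1]
    return front


# Hypothetical (yield_irrigated, yield_drought) scores for six trait configurations.
configs = [(10, 2), (8, 5), (6, 6), (7, 4), (9, 1), (5, 6)]
print(pareto_front_2d(configs))
```

The pairwise dominance check is simpler to write but costs O(n^2) comparisons, which becomes impractical at millions of points; with three or more objectives, dedicated non-dominated sorting algorithms are typically used instead.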
Concluding Observations
1) Leverage rapid developments in machine learning
2) Learn from progress and best practices in other fields
3) Use fast ML models as surrogates for exploration and uncertainty quantification
4) Visualization and data management are becoming important
5) Data exchange, sharing, and interoperability protocols need to be established
6) It is critical to incorporate software engineering practices (code reuse, modularity) into the workflow
7) Need sustained support for software development and maintenance
8) Need to be ready for next-generation cyberinfrastructure
9) Community-based approach?
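The surrogate-model observation (item 3) can be sketched in a few lines: spend a small budget of runs of an expensive model, fit a cheap surrogate, and push input uncertainty through the surrogate by Monte Carlo sampling. Everything here is hypothetical: expensive_model is a stand-in quadratic, the input is assumed normally distributed with mean 1.5 and standard deviation 0.3, and a quadratic least-squares fit plays the role that an ML model (a neural network or Gaussian process, say) would play in practice.

```python
import random


def expensive_model(x):
    """Hypothetical stand-in for a slow simulation (e.g. one crop-model run)."""
    return 3.0 + 2.0 * x - 0.5 * x * x


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_quadratic(xs, ys):
    """Least-squares surrogate y ~ c0 + c1*x + c2*x^2 via the normal equations."""
    feats = [[1.0, x, x * x] for x in xs]
    A = [[sum(f[i] * f[j] for f in feats) for j in range(3)] for i in range(3)]
    b = [sum(f[i] * y for f, y in zip(feats, ys)) for i in range(3)]
    c = solve(A, b)
    return lambda x: c[0] + c[1] * x + c[2] * x * x


# 1) Spend a small budget of expensive runs at design points.
design = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
surrogate = fit_quadratic(design, [expensive_model(x) for x in design])

# 2) Propagate input uncertainty through the cheap surrogate by Monte Carlo.
rng = random.Random(42)
samples = [surrogate(rng.gauss(1.5, 0.3)) for _ in range(20000)]
mean = sum(samples) / len(samples)
std = (sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)) ** 0.5
print(f"output mean {mean:.3f}, std {std:.3f}")
```

Because the stand-in model is itself quadratic, the surrogate reproduces it almost exactly and the Monte Carlo mean lands near the analytical value of 4.83; with a real simulator, the surrogate's error would also have to be checked against held-out runs.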