Computational Science as an Enabler for Sustainable FEW Systems
Baskar Ganapathysubramanian, Iowa State University
NSF FEW Workshop: Oct 12-13, 2015, ISU

Computational Science and Engineering Group
What do we do?
1) Algorithm design and software implementation
2) Application-driven research
A curiosity-driven group. This talk gives an overview of research activities related to the plant sciences.

Feature extraction: Data for crop models
- Data for validation, input, and calibration
- Data deluge due to sensor advances and improvements in data collection
- Heterogeneous data spanning multiple length and time scales
- Noisy, gappy data
- Need to extract traits used for various downstream tasks
- Must be done in an automated, high-throughput, and efficient way
- Similar issues are faced by other disciplines: astronomy, particle physics, driverless automobiles, security and defense applications
- Machine learning approaches are very promising

Machine Learning
- The goal of ML is to generalize beyond the training data (figure from opsrules.com)
- Pattern recognition, perception, and control tasks: very difficult to manually encode all features
- Breakthroughs in learning algorithms; prominent examples include 'deep networks' (e.g., MNIST and TIMIT benchmarks; NVIDIA cuDNN)
- More data, better computing infrastructure

Machine Learning Examples
Learning feature labels in scenes: convolutional networks
(examples from the LeCun, Hinton, and Ng groups)

Machine Learning Examples
Learning a hierarchy of features: feature extraction using autoencoders, sparse encoders, deep belief networks, and deep neural networks
(examples from the LeCun, Hinton, and Ng groups)

ML: Agricultural Examples (with P. Schnable)
Basic hypothesis: use high-throughput phenotyping to enable extraction of detailed characteristics of tassels.
Challenges: identification of tassel locations, followed by extraction of tassel features, for close to a million images.

ML: Agricultural Examples (with A. Singh, S. Sarkar)
Basic hypothesis: use high-throughput phenotyping to understand features affecting (a)biotic stress tolerance.
Example application: Iron Deficiency Chlorosis (IDC), the inability of plants to absorb iron from the soil.
Current rating methods are visual (standard area diagram, severity ratings 1-5):
- Time consuming
- Labor intensive
- Reliability/consistency issues
ML tools enable rapid identification and can be deployed as apps.

ML for Yield Prediction (with D. Hayes, S. Sarkar, D. Nettleton)
Yield forecasting: a combination of knowledge-based computer programs (which simulate plant-weather-soil-management interactions) with soil and environment data and targeted surveys.
Companies such as Climate Corp and other big-data firms may now be able to beat the USDA at yield forecasting, leading to detrimental asymmetric markets. A publicly available, high-quality yield prediction tool will enable producers to make informed decisions, thereby ensuring a symmetric market.
Goal:
1) Collect and curate a dataset of economic, agricultural, meteorological, and crop-management traits that is used to make predictions
2) Develop and deploy a suite of statistical and ML tools on the data (a minimal sketch follows below)
3) Create a workflow that will enable the larger community to utilize the data and test methods
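To make goal item 2) concrete, here is a minimal sketch of the kind of statistical/ML tool envisioned: a random-forest regressor trained on records of weather and management variables to predict yield. The feature set, the synthetic data, and the yield response below are illustrative placeholders, not the project's actual curated dataset or model.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500

# Hypothetical county-year records: growing-season rainfall (mm), mean temperature (C),
# planting date (day of year), nitrogen applied (kg/ha). Synthetic stand-in for the
# curated economic/agronomic/meteorological dataset described above.
X = np.column_stack([
    rng.normal(500, 100, n),   # rainfall
    rng.normal(22, 2, n),      # mean temperature
    rng.normal(120, 10, n),    # planting day of year
    rng.normal(150, 30, n),    # nitrogen rate
])

# Synthetic yield response (bu/ac) with noise; a real workflow would use surveyed yields.
y = (100 + 0.08 * X[:, 0] - 2.0 * np.abs(X[:, 1] - 22)
     - 0.3 * np.abs(X[:, 2] - 115) + 0.2 * X[:, 3] + rng.normal(0, 8, n))

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = RandomForestRegressor(n_estimators=300, random_state=0)
model.fit(X_train, y_train)
pred = model.predict(X_test)
print("Held-out MAE (bu/ac):", round(mean_absolute_error(y_test, pred), 2))
```

In a full workflow, the held-out error above would be compared against existing forecasts (e.g., USDA estimates) and the model exposed to the community through the shared data/workflow infrastructure described in the goal.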
Optimization: Trait identification for productivity (with D. Attinger, M. Gilbert)
A simple physiological model of an adult maize plant, validated in the field by Matthew Gilbert (UC Davis).
Several field-testable traits: stomatal conductance; root, stem, and leaf conductance.
Input: hourly weather data. Outputs: water use and photosynthetic yield.
Software engineering: code optimization, integration with a parallel optimization framework, deployment on HPC systems.

Optimization: Trait identification for productivity
Explored traits that perform well under well-irrigated vs. drought conditions. Pareto front with more than 3 million configurations tested (a minimal sketch of such a multi-objective sweep appears after the concluding observations). Ran on XSEDE (TACC) and local HPC resources (unpublished, 2015).

Concluding Observations
1) Leverage (rapid) machine learning developments
2) Learn from progress and best practices in other fields
3) Fast ML models as surrogate models for exploration and uncertainty quantification
4) Visualization and data management become important
5) Data exchange/sharing/interoperability protocols have to be set
6) Critical to incorporate software engineering practices into the workflow (code reuse, modularity)
7) Need sustained support for software development and maintenance
8) Need to be ready for next-generation cyberinfrastructure
9) Community-based approach?
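As referenced on the Pareto-front slide above, the sketch below shows the shape of such a multi-objective trait sweep: sample trait configurations (stomatal, root, stem, and leaf conductances), evaluate each with a plant model, and keep the non-dominated set that trades off water use against photosynthetic yield. The toy_plant_model function is a hypothetical placeholder, not the validated maize physiology model from the slides, and the scale here is thousands of configurations rather than millions.

```python
import math
import random

def toy_plant_model(g_s, g_root, g_stem, g_leaf):
    """Hypothetical stand-in for the validated maize model (not the real model).

    Returns (water_use, yield_proxy). Higher conductances raise both outputs,
    with diminishing returns on yield, so the two objectives conflict.
    """
    # Root, stem, and leaf conductances act roughly in series along the water pathway.
    supply = 1.0 / (1.0 / g_root + 1.0 / g_stem + 1.0 / g_leaf)
    water_use = g_s * supply
    yield_proxy = supply * (1.0 - math.exp(-5.0 * g_s))
    return water_use, yield_proxy

def pareto_front(points):
    """Return non-dominated (water_use, yield) pairs: minimize water, maximize yield."""
    front = []
    for i, (w_i, y_i) in enumerate(points):
        dominated = any(
            w_j <= w_i and y_j >= y_i and (w_j < w_i or y_j > y_i)
            for j, (w_j, y_j) in enumerate(points) if j != i
        )
        if not dominated:
            front.append((w_i, y_i))
    return front

random.seed(0)
# Randomly sample trait configurations; the actual study swept millions on HPC resources.
configs = [tuple(random.uniform(0.05, 1.0) for _ in range(4)) for _ in range(2000)]
objectives = [toy_plant_model(*c) for c in configs]
front = sorted(pareto_front(objectives))
print(f"{len(front)} non-dominated configurations out of {len(configs)}")
```

The brute-force dominance check is quadratic and is used here only for clarity; a production sweep of millions of configurations would use a sort-based or archive-based front update and distribute model evaluations across the parallel optimization framework mentioned above.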