The bycatch of Bayes Nets
Kerrie Mengersen
QUT Australia

Australian Research Council Centre of Excellence
Mathematical & Statistical Frontiers: Big Data, Big Models, New Insights
7 year horizon
6 Universities
7 Partner Organisations
18 CIs, 8 PIs, 23 AIs, 18 RAs, 40 PhDs

Bayesian Research and Applications Group (BRAG)
Our vision: To engage in world-class, relevant fundamental and collaborative statistical research, training and application through Bayesian (and other) modelling + fast computation + translation

Bayesian stats + food security
• Process modelling for plant biosecurity
• Conservation
• Surveillance design
• “Intelli-sensing”, e.g. satellite data and UAVs

Spiralling Whitefly (Aleurodicus dispersus)
The Problem
• Major tropical plant pest
• Lives on 100+ hosts
• Restricts market access to other states
Information
• Literature: characteristics, growth, spread
• Detectability (inspectors)
• Surveillance data (> 30 000 records)
Scope of modelling
• Local, district and statewide

Countries where spiralling whitefly has been detected. Administrative regions within some countries are shown when documented. Source (CABI 2004, Monteiro et al. 2005, CABI 2006). Personal communications (J.H. Martin, 2008, B.M.
Waterhouse, 2008)

Hierarchical Bayesian model
• Data Model: Pr(data | incursion process and data parameters)
  – How data is observed given underlying pest extent
• Process Model: Pr(incursion process | process parameters)
  – Potential extent given epidemiology / ecology
• Parameter Model: Pr(data and process parameters)
  – Prior distribution to describe uncertainty in detectability, exposure, growth …
• The posterior distribution of the incursion process (and parameters) is related to the prior distribution and data by:
  Pr(process, parameters | data) ∝ Pr(data | process, parameters) × Pr(process | parameters) × Pr(parameters)

Early Warning Surveillance
Priors
Surveillance data
Posterior learning
– modest reduction in area freedom
– large reduction in estimated extent
– residual “risk” maps to target surveillance

Invasion Parameter Estimates
Useful for local management

Observation parameter estimates
Also learn about:
• Host suitability
• Inspector efficiency

Conservation and food security
Modelling complex systems

Systems Models
“There's so much talk about the system. And so little understanding.”
Robert Pirsig, Zen and the Art of Motorcycle Maintenance
“Move away from indicators reported separately towards methods based on understanding complexity and emergence.”
Tony Morton

Bayesian Networks
Example network with nodes F, E and G

F        low 0.7   medium 0.2   high 0.1

E given F     yes    no
low           0.4    0.6
medium        0.2    0.8
high          0.1    0.9

G given F     normal  high
low           0.5     0.5
medium        0.6     0.4
high          0.4     0.6

Why BNs?
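One practical reason is that a BN turns questions about the system into routine calculations. The sketch below is a minimal illustration in Python, not the software used in the talk. It assumes the example tables above describe a network F → E and F → G, with the yes/no table read as P(E | F) and the normal/high table as P(G | F). That reading of the slide layout is an assumption, and the names P_F, P_E_GIVEN_F, P_G_GIVEN_F, marginal, posterior_f and predict_g_given_e are illustrative.

```python
# CPT values copied from the example slide. Reading them as F -> E and
# F -> G (E: yes/no, G: normal/high) is an assumption about the layout.
P_F = {"low": 0.7, "medium": 0.2, "high": 0.1}
P_E_GIVEN_F = {
    "low": {"yes": 0.4, "no": 0.6},
    "medium": {"yes": 0.2, "no": 0.8},
    "high": {"yes": 0.1, "no": 0.9},
}
P_G_GIVEN_F = {
    "low": {"normal": 0.5, "high": 0.5},
    "medium": {"normal": 0.6, "high": 0.4},
    "high": {"normal": 0.4, "high": 0.6},
}


def marginal(child_cpt):
    """Marginal of a child of F: sum over f of P(F=f) * P(child | F=f)."""
    states = list(next(iter(child_cpt.values())))
    return {s: sum(P_F[f] * child_cpt[f][s] for f in P_F) for s in states}


def posterior_f(child_cpt, observed):
    """Bayes' rule: P(F | child = observed)."""
    joint = {f: P_F[f] * child_cpt[f][observed] for f in P_F}
    total = sum(joint.values())
    return {f: p / total for f, p in joint.items()}


def predict_g_given_e(e_state):
    """P(G | E = e_state): update F on the evidence, then propagate to G."""
    post = posterior_f(P_E_GIVEN_F, e_state)
    states = list(next(iter(P_G_GIVEN_F.values())))
    return {g: sum(post[f] * P_G_GIVEN_F[f][g] for f in post) for g in states}


if __name__ == "__main__":
    print("P(E):", marginal(P_E_GIVEN_F))
    print("P(F | E=yes):", posterior_f(P_E_GIVEN_F, "yes"))
    print("P(G | E=yes):", predict_g_given_e("yes"))
```

Enumeration like this only scales to very small networks; larger BNs rely on dedicated inference algorithms, but the quantities being computed are the same.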
• Be able to model the system
  – Include many diverse factors and their interactions
  – Bring together disparate knowledge, including data, model outputs, expert information, etc.
  – Include costs, benefits, utility
• Use the model to:
  – Identify key drivers
  – Explore scenarios of change (“what if…?”)
  – Identify critical control points
  – Suggest optimal strategies for improved outcomes
  – Understand impact of management and policy decisions

Systems models (BNs) related to food security
• Conservation
• Water quality
• Recycled water and health
• Dairy sustainability
• Plant biosecurity risk

Study 1: viability of wild cheetah population in Namibia
Human Factors Subnetwork
Biological Factors Subnetwork
Ecological Factors Subnetwork
Combined “Object Oriented” BN (OOBN)

Study 2: Sustainability scorecard
Measuring the complex interactions of sustainability
Aim: to develop a sustainability scorecard to measure Triple Bottom Line (TBL – economic, social and environmental) performance of agricultural systems.
Collaboration with Dairy Australia
Sustainability Measurement Review
– Key Dairy Stakeholder Review
– 2009 Dairy Sustainability Project
– 2011 Materiality Survey (NetBalance)
– 2007/08 Australian Dairy Manufacturing Industry Sustainability Report (DMSC)
– Stakeholder TBL reports
Vital Capital Survey, SAFE framework, DairySAT, Fonterra Sustainability Indicators, Unilever Sustainable Code, Nestle, Lactalis / Parmalat / Pauls, Danone Sustainability Report, Dutch Dairy Farming, RISE, GRI

Dairy Scorecard – Conceptual BN
Social Farm
Economic Farm
Environmental Farm
Measurement of indicator

Initial Sustainability at the Farm
Using the quantified BN submodels and putting them together gives the initial predictive scores for sustainability at the farm level.

What if …?
• Now able to ask questions of the model, e.g.
1. If we improve social sustainability, how will it affect overall sustainability at the farm level?
   High: 20% → 39%, Medium: 48% → 33%, Low: 32% → 28%

What if …?
2. If we improve sustainability at the farm level, what is the effect on the TBL?
Economic H,M,L: 5%, 51%, 43% → 13%, 60%, 27%
H,M,L: 25%, 39%, 26% → 70%, 18%, 12%
H,M,L: 25%, 62%, 13% → 48%, 48%, 4%
Social
Environmental

Sustainability scorecard
Scores are listed per indicator in the column order Farm, Factory, Market, Rating.

Economic
  Commodity prices: 0.8, 0.6, 0.6, 0.7
  Legal and administrative environment: 0.0, 0.1, 0.8, 0.3
  Access to capital and labour: 0.6, 0.8, 0.8, 0.7
  Profitability: 0.3, 0.8, 0.7, 0.6
  Workforce capability: 0.1, 0.8, 0.6, 0.5
  Economic sustainability rating: 0.4, 0.6, 0.7, 0.6
Social
  Lifestyle and community: 0.0, 0.5, 0.1, 0.3
  Health and well being: 0.6, 0.9, 0.8, 0.7
  Value and contribution: 0.1, 0.6, 0.6
  Product, safety and production: 0.8, 0.0, 0.0, 0.1
  Social relevance: 0.6, 0.8, 0.8, 0.7
  Social sustainability rating: 0.4, 0.6, 0.3, 0.5
Environment
  Energy, effluent and water: 0.6, 0.2, 0.2, 0.3
  Materials, suppliers and transport: 0.2, 0.8, 0.6
  Products and services: 0.8, 0.8, 0.2, 0.6
  Biodiversity: 0.2, 0.8, 0.6, 0.6
  Compliance: 0.2, 0.0, 0.0, 0.1
  Environment sustainability rating: 0.4, 0.5, 0.2, 0.4
Dairy Industry sustainability rating: 0.4, 0.6, 0.4, 0.5

Study 3: Water quality
Initiation of lyngbya in Moreton Bay
The policy questions
• What is the overall scientific consensus about the drivers of lyngbya?
• What management actions should be taken to reduce lyngbya in Moreton Bay, Australia?
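Management questions of this kind are usually put to a BN as “what-if” queries: fix one factor at each of its states in turn and see how far the probability of the outcome moves. The sketch below is a minimal illustration in Python under loud assumptions: the two-parent network, every probability in it, and the names PRIORS, P_BLOOM, p_bloom and what_if_ranges are hypothetical, chosen only to show the mechanics, and none of them come from the lyngbya study.

```python
from itertools import product

# Hypothetical priors for two root nodes (NOT values from the study).
PRIORS = {
    "nutrients": {"enough": 0.3, "not_enough": 0.7},
    "light": {"adequate": 0.3, "inadequate": 0.7},
}
# Hypothetical CPT: P(bloom = yes | nutrients, light).
P_BLOOM = {
    ("enough", "adequate"): 0.9,
    ("enough", "inadequate"): 0.5,
    ("not_enough", "adequate"): 0.1,
    ("not_enough", "inadequate"): 0.02,
}


def p_bloom(evidence=None):
    """P(bloom = yes), optionally with some root nodes fixed to one state."""
    evidence = evidence or {}
    names = list(PRIORS)
    total = 0.0
    for combo in product(*(PRIORS[n] for n in names)):
        weight = 1.0
        for name, state in zip(names, combo):
            if name in evidence:
                # A fixed node puts all its probability on the chosen state.
                weight *= 1.0 if evidence[name] == state else 0.0
            else:
                weight *= PRIORS[name][state]
        total += weight * P_BLOOM[combo]
    return total


def what_if_ranges():
    """For each factor, the (min, max) of P(bloom) across its states."""
    ranges = {}
    for name, states in PRIORS.items():
        values = [p_bloom({name: s}) for s in states]
        ranges[name] = (min(values), max(values))
    return ranges


if __name__ == "__main__":
    print(f"baseline P(bloom) = {p_bloom():.3f}")
    for name, (lo, hi) in what_if_ranges().items():
        print(f"{name}: change {100 * (hi - lo):.0f} ({100 * lo:.0f}% - {100 * hi:.0f}%)")
```

Ranking factors by the width of this range is one simple way to identify the most influential nodes; it considers one factor at a time and so says nothing about joint interventions.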
INITIATION MODEL
Nodes and state probabilities (%):
Particulates (Nutr): Low 45.1, High 54.9 (2.8 ± 3.3)
No. of previous dry days: Low 10.0, Medium 50.0, High 40.0 (75.6 ± 110)
Rain - Present: Low 62.0, Medium 26.0, High 12.0 (142 ± 190)
Ground Water Amount: Low 73.1, High 26.9
Wind direction: North 21.0, SE 24.0, Other 55.0
Wind Speed: Low 59.9, High 40.1
Light Quality: Poor 10.0, Borderline 40.0, High 50.0
Light Quantity: Optimal 20.0, SubOptimal 80.0
Air: Low 57.4, High 42.6
Point Sources: Low 26.3, Medium 30.1, High 43.7
Dissolved P Concentration: Low 62.1, High 37.9 (199 ± 300)
Land Run-off Load: Low 51.6, High 48.4
Dissolved Fe Concentration: Low 56.7, High 43.3
Dissolved N Concentration: Low 49.6, High 50.4
Tide: Spring 50.0, Neap 50.0
Sediment Nutrient Climate: NonReducing 58.4, Reducing 41.6
Dissolved Organics: Low 51.0, High 49.0
Bottom Current Climate: Low 48.0, High 52.0
Turbidity: Low 45.4, High 54.6
Light Climate: Inadequate 71.3, Adequate 28.7 (20.7 ± 12)
Avail nutrient pool (dissolved): Enough 33.6, Not enough 66.4
Temperature: Low 49.5, High 50.5 (19.6 ± 9)
Bloom Initiation: No 76.4, Yes 23.6

Most influential factors
1. Available Nutrient Pool
2. Bottom Current Climate
3. Sediment Nutrients
4. Dissolved Iron
5. Dissolved Phosphorus
6. Light
7. Temperature

MANAGEMENT ACTIONS

“What-if” scenarios
Factor                        Change in P(Bloom) (%)
Available Nutrient Pool       77 (3% - 80%)
Bottom Current Climate        28 (15% - 43%)
Sediment Nutrient Climate     17 (21% - 38%)
Dissolved Fe                  16 (21% - 37%)
Dissolved P                   15 (23% - 38%)
Light Climate                 14 (18% - 32%)
Temperature                   14 (21% - 35%)
Dissolved N                   13 (22% - 35%)
Rain – present                10 (25% - 35%)
Light Quantity                9 (21% - 30%)

From Science to Management

Study 4: Recycled Water and Health
Handbook

Study 5: “Beyond Compliance”
An integrated approach to pest risk management
STDF – WTO funded project
5 SEA partners + OC: + QUT
Mumford et al.
• Production Chain
• Decision Support
• Control Point BN (CP-BN)

1: Production chain
Exporting Malaysian jackfruit to China

Decision support spreadsheet
Key Factors (Score; Uncertainty)
A2.01 Overall rating - Entry: Unlikely; Low
A2.02 Overall rating - Establishment: Moderately unlikely; Low
A2.03 Overall rating - Spread: Moderate; Low
A2.04 Overall rating - Impact: Minor; Low
A2.05 How easy is it to detect the key organisms on the commodity / pathway? Easy
A2.06 How easy is it to identify the key organisms? With some difficulty
A2.07 How well organised is the sector at risk in the importing country? Mod. well organised
A2.08 What is the estimated prevalence of the pest in the area where commodity is cultivated?
Remaining Score and Uncertainty entries for A2.05 to A2.08, in extracted order: High, Medium, Medium, Medium, Low

Decision support spreadsheet
Risk management measures available
Columns:
1.1 a) Efficacy: What is its potential contribution to risk reduction?
1.1 b) Uncertainty
1.2 a) Verification: The measure can be verified?
1.2 b) Uncertainty
Graphic (automatically read in from Table B2)
Measures:
– Sterile insect technique (SIT)
– Pesticides spray program
– Male annihilation, utilizing the attraction of males to methyl eugenol baits
– Culling of over-crowded and disease infested fruits
– Bagging of fruits 14 days after fruit set
– Harvesting at right maturity index to prevent infestation of fruit fly
[Bar charts for each measure: efficacy on a VH / H / M / L / VL scale and verification on a VE / E / SD / D / VD scale, vertical axis 0 to 1]

CP-BN Economics add-on
• The final target node gives the probability of infestation at the point of export.
This must be sufficiently low to comply with the requirements of the dragon fruit importer concerned.
• We also need to include the equally important issues of loss to fruit production due to this infestation, and costs of control or preventive measures
• That is, what is the net value of the crop?

Economics: adding costs via utility nodes

Economics: adding losses via utility nodes

J. Holt, A. W. Leach, S. Johnson, D. M. Tu, D. T. Nhu, N. T. Anh, L. N. Quang, M. M. Quinlan, P. J. L. Whittle, K. Mengersen and J. D. Mumford (in prep.) Bayesian networks to compare pest control interventions on commodities along agricultural production chains.

Methods Questions
1. How to elicit information from experts?
2. How to combine information from multiple experts?
3. How to assess the validity and reliability of a BN?
4. How to incorporate uncertainty into BNs?
5. How to combine BNs?

1. Eliciting expert information
• Train experts prior to elicitation
• Elicit using “outside-in” method
  – Extrema: absolute lower and upper limits
  – Quantiles: realistic limits (L, U) + uncertainty/sureness around these bounds
  – Mode: most plausible value
• Record as count, percentage or multiplicative factor
• Encode via least squares as normal, lognormal, extended beta etc.

2. Combining expert judgements
– Delphi method
– Pooling
– Modelling

Pooling
1. Average expert opinions for each node and propagate the averages through the network
2. Average after transforming probability to log odds
3. Propagate the opinions through the network for each expert and average the outputs across experts
Average = linear or geometric, weighted or unweighted
Add a random effect for between-expert deviations

Modelling
• Random effects model
• Measurement error model
• Item response model
[Equation annotations: Probability in node l; Overall; Node effect; Expert effect]
• Can obtain estimates of combined probabilities, node differences, expert differences

3.
Validity and reliability of a BN
Psychometric approach
Nomological: sits well within current academic thought
Face: valid representation of the underlying system
Content: includes all potentially relevant factors
Concurrent: related measures in time/space vary similarly
Convergent: theoretically related measures match
Discriminant: theoretically unrelated measures are different
Pitchforth, 2013

4. Incorporating uncertainty
• Add prior distributions to nodes
• Propagate populations through the BN (Donald et al. ANZJS 2015)
Prob. gastroenteritis (95% CI) = 0.030 (0.026, 0.034)

5. Combining BNs
Many perspectives = many potential models
How to combine outputs?
Model averaging approach
– Obtain an estimate of goodness of fit for each BN
– Generate probabilities or ‘data’ from each BN
– Obtain a weighted average of the desired measures
How to combine structures? TBC…

Conclusion: Why BNs?
Because sometimes the solutions are not where we are looking
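
As a closing sketch of the model-averaging steps listed under “5. Combining BNs”, the Python below weights the output of each BN by its goodness of fit and averages the results. The talk does not say which fit measure or weighting scheme is used, so the log-scale scores, the exponential weighting, the model names bn_a and bn_b, and all numbers are assumptions made for illustration.

```python
import math


def fit_weights(log_scores):
    """Convert per-model fit scores (log scale, higher is better) to weights.

    Exponentiating and normalising is one common choice; it is an assumption
    here, not a scheme stated in the talk.
    """
    best = max(log_scores.values())
    unnormalised = {k: math.exp(v - best) for k, v in log_scores.items()}
    total = sum(unnormalised.values())
    return {k: u / total for k, u in unnormalised.items()}


def average_outputs(outputs, weights):
    """Weighted average of each model's output distribution, state by state."""
    states = list(next(iter(outputs.values())))
    return {s: sum(weights[k] * outputs[k][s] for k in outputs) for s in states}


if __name__ == "__main__":
    # Hypothetical fit scores and outputs for two candidate BNs.
    log_scores = {"bn_a": 0.0, "bn_b": math.log(3.0)}
    outputs = {
        "bn_a": {"yes": 0.2, "no": 0.8},
        "bn_b": {"yes": 0.4, "no": 0.6},
    }
    weights = fit_weights(log_scores)
    print("weights:", weights)
    print("averaged output:", average_outputs(outputs, weights))
```

This combines outputs only; it does not address the open question of how to combine network structures.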