The bycatch of Bayes Nets Kerrie Mengersen QUT Australia

Australian Research Council Centre of Excellence
Mathematical & Statistical Frontiers:
Big Data, Big Models, New Insights
7 year horizon
6 Universities
7 Partner Organisations
18 CIs, 8 PIs, 23 AIs, 18 RAs, 40 PhDs
Bayesian Research and Applications
Group (BRAG)
Our vision: To engage in world-class, relevant fundamental
and collaborative statistical research, training and
application through Bayesian (and other) modelling + fast
computation + translation
Bayesian stats + food security
• Process modelling for plant biosecurity
• Conservation
• Surveillance design
• “Intelli-sensing”, e.g. satellite data and UAVs
Spiralling Whitefly
Aleurodicus dispersus

The Problem
• Major tropical plant pest
• Lives on 100+ hosts
• Restricts market access to other states

Information
• Literature: characteristics, growth, spread
• Detectability (inspectors)
• Surveillance data (> 30 000 records)

Scope of modelling
• Local, district and statewide

Figure: Countries where spiralling whitefly has been detected. Administrative regions within some countries are shown when documented. Sources: CABI 2004; Monteiro et al. 2005; CABI 2006; personal communications (J.H. Martin, 2008; B.M. Waterhouse, 2008).
Hierarchical Bayesian model
• Data Model: Pr(data | incursion process and data parameters)
– How data is observed given underlying pest extent
• Process Model: Pr(incursion process | process parameters)
– Potential extent given epidemiology / ecology
• Parameter Model: Pr(data and process parameters)
– Prior distribution to describe uncertainty in detectability, exposure, growth …
• The posterior distribution of the incursion process (and parameters) is related to the prior distribution and data by:
Pr(process, parameters | data) ∝ Pr(data | process, parameters) Pr(process | parameters) Pr(parameters)
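The three model levels can be illustrated with a deliberately tiny sketch for a single site. Everything here is an assumption made for illustration, not the model used in the study: the process is simply "pest present or absent", the data are a run of negative inspections, and the only uncertain parameter is inspector detectability, given a discrete prior.

```python
def posterior_presence(n_negative, prior_presence, detect_prior):
    """Posterior Pr(pest present | n_negative inspections, all negative).

    prior_presence : process model, Pr(present) before seeing data (hypothetical)
    detect_prior   : parameter model, dict {detectability: prior weight}
    """
    # Data model: Pr(all inspections negative | present, d) = (1 - d) ** n.
    # Detectability d is integrated out over its prior.
    like_present = sum(w * (1.0 - d) ** n_negative for d, w in detect_prior.items())
    # If the pest is absent, negative inspections are certain.
    like_absent = 1.0
    numerator = prior_presence * like_present
    return numerator / (numerator + (1.0 - prior_presence) * like_absent)


# Hypothetical inputs: 20% prior chance of presence, detectability either 0.3 or 0.6.
detect_prior = {0.3: 0.5, 0.6: 0.5}
for n in (0, 1, 5, 10):
    print(n, round(posterior_presence(n, 0.2, detect_prior), 4))
```

With no inspections the posterior equals the prior, and each additional negative inspection lowers the probability of presence, which is the "posterior learning" idea in miniature.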
Early Warning Surveillance

Priors

Surveillance data

Posterior learning
• modest reduction in area freedom
• large reduction in estimated extent
• residual “risk” maps to target surveillance
Invasion Parameter Estimates
Useful for local management
Observation parameter
estimates
Also learn about:
• Host suitability
• Inspector efficiency
Conservation and food security
Modelling complex systems
Systems Models
“There's so much talk about the system.
And so little understanding.”
Robert Pirsig
Zen and the Art of Motorcycle Maintenance
“Move away from indicators reported
separately towards methods based on
understanding complexity and emergence.”
Tony Morton
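Bayesian Networks
[Figure: example Bayesian network with three nodes, F, E and G, each shown with its probability table.]
Node F: low 0.7, medium 0.2, high 0.1
Node E (states: yes, no) and node G (states: normal, high) each have a conditional probability table with one row per state of F. In the order they appear on the slide, the two tables are:
  F = low: 0.4, 0.6; F = medium: 0.2, 0.8; F = high: 0.1, 0.9
  F = low: 0.5, 0.5; F = medium: 0.6, 0.4; F = high: 0.4, 0.6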
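A short sketch of the two basic calculations a Bayesian network supports: the marginal probability of a child node and the updated belief about its parent once the child is observed. It uses the probabilities for node F from the slide and assumes, since the extracted layout is ambiguous, that the first conditional table is Pr(E | F) with columns (yes, no).

```python
# Prior for the parent node F (from the slide).
p_f = {"low": 0.7, "medium": 0.2, "high": 0.1}

# Assumed reading of the first conditional table: Pr(E = yes | F).
p_e_yes_given_f = {"low": 0.4, "medium": 0.2, "high": 0.1}


def marginal_yes(prior, cpt_yes):
    """Pr(E = yes), summing over the states of the parent."""
    return sum(prior[state] * cpt_yes[state] for state in prior)


def posterior_parent(prior, cpt_yes):
    """Pr(F | E = yes) by Bayes' rule."""
    evidence = marginal_yes(prior, cpt_yes)
    return {state: prior[state] * cpt_yes[state] / evidence for state in prior}


print(round(marginal_yes(p_f, p_e_yes_given_f), 3))
print({s: round(p, 3) for s, p in posterior_parent(p_f, p_e_yes_given_f).items()})
```

Under this reading, observing E = yes shifts belief further towards F = low, because "yes" is most likely when F is low.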
Why BNs?
• Be able to model the system
– Include many diverse factors and their interactions
– Bring together disparate knowledge, including data, model outputs, expert information, etc.
– Include costs, benefits, utility
• Use the model to:
– Identify key drivers
– Explore scenarios of change (“what if…?”)
– Identify critical control points
– Suggest optimal strategies for improved outcomes
– Understand impact of management and policy decisions
Systems models (BNs) related to food security
• Conservation
• Water quality
• Recycled water and health
• Dairy sustainability
• Plant biosecurity risk
Sustainability scorecard

Category     Indicator                              Farm  Factory  Market  Rating
Economic     Commodity prices                       0.8   0.6      0.6     0.7
Economic     Legal and administrative environment   0.0   0.1      0.8     0.3
Economic     Access to capital and labour           0.6   0.8      0.8     0.7
Economic     Profitability                          0.3   0.8      0.7     0.6
Economic     Workforce capability                   0.1   0.8      0.6     0.5
Economic     Economic sustainability rating         0.4   0.6      0.7     0.6
Social       Lifestyle and community                0.0   0.5      0.1     0.3
Social       Health and well being                  0.6   0.9      0.8     0.7
Social       Value and contribution                 0.1, 0.6, 0.6 (*)
Social       Product, safety and production         0.8   0.0      0.0     0.1
Social       Social relevance                       0.6   0.8      0.8     0.7
Social       Social sustainability rating           0.4   0.6      0.3     0.5
Environment  Energy, effluent and water             0.6   0.2      0.2     0.3
Environment  Materials, suppliers and transport     0.2, 0.8, 0.6 (*)
Environment  Products and services                  0.8   0.8      0.2     0.6
Environment  Biodiversity                           0.2   0.8      0.6     0.6
Environment  Compliance                             0.2   0.0      0.0     0.1
Environment  Environment sustainability rating      0.4   0.5      0.2     0.4
Overall      Dairy Industry sustainability rating   0.4   0.6      0.4     0.5

(*) Only three values appear in the source for this row; their column positions are uncertain.
Study 1: viability of wild cheetah
population in Namibia
Human Factors Subnetwork
Biological Factors Subnetwork
Ecological Factors Subnetwork
Combined “Object Oriented” BN (OOBN)
Study 2: Sustainability scorecard
Measuring the complex interactions of sustainability
Aim: to develop a sustainability scorecard to measure
Triple Bottom Line (TBL – economic, social and
environmental) performance of agricultural systems.
Collaboration with Dairy Australia
Sustainability Measurement Review
– Key Dairy Stakeholder Review
– 2009 Dairy Sustainability Project
– 2011 Materiality Survey (NetBalance)
– 2007/08 Australian Dairy Manufacturing Industry
Sustainability Report (DMSC)
– Stakeholder TBL reports
Vital Capital Survey, SAFE framework, DairySAT, Fonterra
Sustainability Indicators, Unilever Sustainable Code, Nestle, Lactalis /
Parmalat / Pauls, Danone Sustainability Report, Dutch Dairy Farming,
RISE, GRI
Dairy Scorecard – Conceptual BN
Social Farm
Economic Farm
Environmental Farm
Measurement of indicator
Initial Sustainability at the Farm

Using the quantified BN submodels & putting
them together gives the initial predictive scores for
sustainability at the farm level
What if …..?
• Now able to ask questions of the model, e.g.
1. If we improve social sustainability, how will it affect
overall sustainability at the farm level?
High: 20% → 39%, Medium: 48% → 33%, Low: 32% → 28%
What if …. ?
2. If we improve sustainability at the farm level, what is the
effect on the TBL?
Economic
H,M,L: 5%, 51%, 43% → 13%, 60%, 27%
H,M,L: 25%, 39%, 26% → 70%, 18%, 12%
H,M,L: 25%, 62%, 13% → 48%, 48%, 4%
Social
Environmental
Study 3: Water quality
Initiation of lyngbya in Moreton Bay
The policy questions
What is the overall scientific consensus
about the drivers of lyngbya?
What management actions should be
taken to reduce lyngbya in Moreton Bay,
Australia?
INITIATION MODEL
[Figure: Bayesian network for lyngbya bloom initiation. Node states and probabilities (%) are listed below; a "±" summary is given where one follows the node in the source.]

Node                              States (%)                               Summary
Particulates (Nutr)               Low 45.1, High 54.9                      2.8 ± 3.3
No. of previous dry days          Low 10.0, Medium 50.0, High 40.0         75.6 ± 110
Rain - Present                    Low 62.0, Medium 26.0, High 12.0         142 ± 190
Ground Water Amount               Low 73.1, High 26.9
Wind direction                    North 21.0, SE 24.0, Other 55.0
Wind Speed                        Low 59.9, High 40.1
Light Quality                     Poor 10.0, Borderline 40.0, High 50.0
Light Quantity                    Optimal 20.0, SubOptimal 80.0
Air                               Low 57.4, High 42.6
Point Sources                     Low 26.3, Medium 30.1, High 43.7
Dissolved P Concentration         Low 62.1, High 37.9                      199 ± 300
Land Run-off Load                 Low 51.6, High 48.4
Dissolved Fe Concentration        Low 56.7, High 43.3
Dissolved N Concentration         Low 49.6, High 50.4
Tide                              Spring 50.0, Neap 50.0
Sediment Nutrient Climate         NonReducing 58.4, Reducing 41.6
Dissolved Organics                Low 51.0, High 49.0
Bottom Current Climate            Low 48.0, High 52.0
Turbidity                         Low 45.4, High 54.6
Light Climate                     Inadequate 71.3, Adequate 28.7           20.7 ± 12
Avail nutrient pool (dissolved)   Enough 33.6, Not enough 66.4
Temperature                       Low 49.5, High 50.5                      19.6 ± 9
Bloom Initiation                  No 76.4, Yes 23.6
Most influential factors
1. Available Nutrient Pool
2. Bottom Current Climate
3. Sediment Nutrients
4. Dissolved Iron
5. Dissolved Phosphorous
6. Light
7. Temperature
MANAGEMENT ACTIONS
“What-if” scenarios

Factor                      Change in P(Bloom) (%)
Available Nutrient Pool     77 (3% - 80%)
Bottom Current Climate      28 (15% - 43%)
Sediment Nutrient Climate   17 (21% - 38%)
Dissolved Fe                16 (21% - 37%)
Dissolved P                 15 (23% - 38%)
Light Climate               14 (18% - 32%)
Temperature                 14 (21% - 35%)
Dissolved N                 13 (22% - 35%)
Rain – present              10 (25% - 35%)
Light Quantity              9 (21% - 30%)
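The "change in P(Bloom)" figures are ranges: the bloom probability when a factor is fixed at its least favourable state versus its most favourable state. The sketch below shows that calculation on a toy network with two independent parent nodes. The node names, priors and conditional probabilities are all hypothetical and are not taken from the lyngbya model.

```python
from itertools import product

# Hypothetical priors for two root nodes.
priors = {
    "nutrients": {"enough": 0.3, "not_enough": 0.7},
    "temperature": {"low": 0.5, "high": 0.5},
}

# Hypothetical Pr(bloom = yes | nutrients, temperature).
p_bloom = {
    ("enough", "low"): 0.6,
    ("enough", "high"): 0.9,
    ("not_enough", "low"): 0.02,
    ("not_enough", "high"): 0.05,
}


def p_bloom_given(evidence):
    """Pr(bloom = yes) with some root nodes fixed to a state (the "what if")."""
    names = list(priors)
    total = 0.0
    for combo in product(*(priors[name] for name in names)):
        weight = 1.0
        for name, state in zip(names, combo):
            if name in evidence:
                # A fixed node puts all its weight on the chosen state.
                weight *= 1.0 if evidence[name] == state else 0.0
            else:
                weight *= priors[name][state]
        total += weight * p_bloom[combo]
    return total


def what_if_range(factor):
    """(min, max) of Pr(bloom) as the factor is set to each of its states."""
    probs = [p_bloom_given({factor: state}) for state in priors[factor]]
    return min(probs), max(probs)


print("baseline", round(p_bloom_given({}), 4))
for factor in priors:
    low, high = what_if_range(factor)
    print(factor, round(100 * (high - low)), f"({low:.1%} - {high:.1%})")
```

Ranking factors by the width of this range is one simple way to produce a "most influential factors" list; other sensitivity measures (for example mutual information) could give a different ordering.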
From Science to Management
Study 4: Recycled Water and
Health Handbook
Study 5: “Beyond Compliance”
An integrated approach to pest risk management
STDF – WTO funded project
5 SEA partners + OC: + QUT
Mumford et al.
• Production Chain
• Decision Support
• Control Point BN (CP-BN)
1: Production chain
Exporting Malaysian jackfruit to China
Decision support spreadsheet

Ref    Key factor                                                               Score                 Uncertainty
A2.01  Overall rating - Entry                                                   Unlikely              Low
A2.02  Overall rating - Establishment                                           Moderately unlikely   Low
A2.03  Overall rating - Spread                                                  Moderate              Low
A2.04  Overall rating - Impact                                                  Minor                 Low
A2.05  How easy is it to detect the key organisms on the commodity / pathway?   Easy
A2.06  How easy is it to identify the key organisms?                            With some difficulty
A2.07  How well organised is the sector at risk in the importing country?       Mod. well organised
A2.08  What is the estimated prevalence of the pest in the area where commodity is cultivated?

Remaining cell entries, in source order (exact cell assignment is not recoverable): High, Medium, Medium, Medium, Low
Decision support spreadsheet: risk management measures
Columns: Risk management measures available (automatically read in from Table B2); 1.1 a) What is its potential contribution to risk reduction? (Efficacy); 1.1 b) Uncertainty; 1.2 a) The measure can be verified? (Verification); 1.2 b) Uncertainty; Graphic; Graphic
Measures listed:
• Sterile insect technique (SIT)
• Harvesting at right maturity index to prevent infestation of fruit fly
• Bagging of fruits 14 days after fruit set
• Culling of over-crowded and disease infested fruits
• Male annihilation, utilizing the attraction of males to methyl eugenol baits
• Pesticides spray program
[Individual ratings and bar-chart graphics are not recoverable. The charts use the axis codes VH, H, M, L, VL for efficacy and VE, E, SD, D, VD for verification, on a 0 to 1 scale.]
CP-BN
Economics add-on
• The final target node gives the probability of infestation at the point
of export. This must be sufficiently low to comply with the
requirements of the dragon fruit importer concerned.
• We also need to include the equally important issues of loss to fruit
production due to this infestation, and costs of control or preventive
measures
• That is, what is the net value of the crop?
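The net-value question can be made concrete with a small expected-value calculation. This is a sketch only: the crop value, loss fraction, infestation probabilities and control costs are hypothetical, and in the CP-BN the infestation probability would come from the network's target node rather than being typed in.

```python
def expected_net_value(crop_value, p_infested, loss_fraction, control_cost):
    """Expected crop value after infestation losses and control costs.

    p_infested    : probability of infestation (e.g. output of the BN target node)
    loss_fraction : share of crop value lost if infestation occurs
    """
    expected_loss = crop_value * p_infested * loss_fraction
    return crop_value - expected_loss - control_cost


# Hypothetical control options: (probability of infestation, cost of control).
options = {
    "no control": (0.40, 0.0),
    "bagging": (0.10, 80.0),
    "bagging + spray": (0.05, 200.0),
}

crop_value = 1000.0
loss_fraction = 0.5
values = {
    name: expected_net_value(crop_value, p, loss_fraction, cost)
    for name, (p, cost) in options.items()
}
best = max(values, key=values.get)
print(values, "best:", best)
```

The option with the lowest infestation probability is not automatically the one with the highest net value, which is why costs and losses are added to the network as utility nodes.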
Economics
adding costs via utility nodes
Economics
adding losses via utility nodes
J. Holt, A. W. Leach, S. Johnson, D. M. Tu, D. T. Nhu, N. T. Anh, L. N. Quang, M. M. Quinlan, P. J. L. Whittle, K. Mengersen and J. D. Mumford (in prep.) Bayesian networks to compare pest control interventions on commodities along agricultural production chains.
Methods Questions
1. How to elicit information from experts?
2. How to combine information from multiple
experts?
3. How to assess the validity and reliability
of a BN?
4. How to incorporate uncertainty into BNs?
5. How to combine BNs?
1. Eliciting expert information
• Train experts prior to elicitation
• Elicit using “outside-in” method
– Extrema: absolute lower and upper limits
– Quantiles: realistic limits (L, U) + uncertainty/sureness around these bounds
– Mode: most plausible value
• Record as count, percentage or multiplicative factor
• Encode via least squares as normal, lognormal, extended beta etc
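The encoding step can be sketched as a least-squares fit of a distribution's quantiles to the elicited values. The example below makes several simplifying assumptions that are not in the slides: the expert's lower limit, most plausible value and upper limit are treated as the 5%, 50% and 95% quantiles, and only the normal and lognormal cases are shown. For these two families the fit is a straight-line regression of elicited values on standard normal quantiles.

```python
import math
from statistics import NormalDist


def fit_normal(probs, values):
    """Least-squares normal fit: minimise sum (value - (mu + sigma * z_p))^2."""
    z = [NormalDist().inv_cdf(p) for p in probs]
    n = len(z)
    z_bar = sum(z) / n
    v_bar = sum(values) / n
    sxx = sum((zi - z_bar) ** 2 for zi in z)
    sxy = sum((zi - z_bar) * (vi - v_bar) for zi, vi in zip(z, values))
    sigma = sxy / sxx
    mu = v_bar - sigma * z_bar
    return mu, sigma


def fit_lognormal(probs, values):
    """Same fit on the log scale; returns (mu, sigma) of log(value)."""
    return fit_normal(probs, [math.log(v) for v in values])


# Hypothetical elicitation: lower limit 20, most plausible 50, upper limit 120.
probs = [0.05, 0.50, 0.95]
elicited = [20.0, 50.0, 120.0]
mu_log, sigma_log = fit_lognormal(probs, elicited)
print(round(mu_log, 3), round(sigma_log, 3))
```

An extended beta or other family would need a numerical optimiser instead of this closed form, and the mode would normally be matched to the fitted mode rather than the median.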
2. Combining expert judgements
– Delphi method
– Pooling
– Modelling
Pooling
1. Average expert opinions for each node and propagate
the averages through the network
2. Average after transforming probability to log odds
3. Propagate the opinions through the network for each
expert and average the outputs for each expert
Average = linear or geometric, weighted or unweighted
Add a random effect for between-expert deviations
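The first two pooling options differ only in the scale on which the averaging is done. A minimal sketch, with hypothetical expert probabilities and optional weights:

```python
import math


def _normalise(weights, n):
    """Equal weights by default; otherwise rescale to sum to one."""
    if weights is None:
        return [1.0 / n] * n
    total = sum(weights)
    return [w / total for w in weights]


def linear_pool(probs, weights=None):
    """Weighted arithmetic average of the experts' probabilities."""
    w = _normalise(weights, len(probs))
    return sum(wi * p for wi, p in zip(w, probs))


def log_odds_pool(probs, weights=None):
    """Average on the log-odds scale, then transform back to a probability."""
    w = _normalise(weights, len(probs))
    mean_logit = sum(wi * math.log(p / (1.0 - p)) for wi, p in zip(w, probs))
    return 1.0 / (1.0 + math.exp(-mean_logit))


# Three hypothetical experts giving a probability for the same node state.
experts = [0.05, 0.10, 0.40]
print(round(linear_pool(experts), 4), round(log_odds_pool(experts), 4))
```

The log-odds pool is pulled less by a single high (or low) opinion than the linear pool, which is one reason to compare both. The third option, propagating each expert's network separately and averaging the outputs, would apply the same averaging functions to the network outputs instead of the node inputs.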
Modelling
• Random effects model
• Measurement error model
• Item response model
Probability in node l
Overall Node effect
Expert effect
• Can obtain estimates of combined probabilities, node
differences, expert differences
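One common way to write a random effects model of this kind is shown below. The notation is an assumption for illustration and may differ from the form on the original slide: p_{le} is the probability given by expert e for node l, modelled on the logit scale as an overall level plus a node effect plus an expert effect.

```latex
\operatorname{logit}(p_{le}) = \mu + \alpha_l + \beta_e + \varepsilon_{le},
\qquad
\beta_e \sim N(0, \sigma_\beta^2),
\qquad
\varepsilon_{le} \sim N(0, \sigma^2)
```

Under this form, the combined probability for node l is obtained by back-transforming \mu + \alpha_l, differences between the \alpha_l describe node differences, and the \beta_e describe how far each expert sits above or below the group.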
3. Validity and reliability of a BN
Psychometric approach
Nomological: sits well within current academic thought
Face: valid representation of the underlying system
Content: includes all potentially relevant factors
Concurrent: related measures in time/space vary similarly
Convergent: theoretically related measures match
Discriminant: theoretically unrelated measures are different
Pitchforth, 2013
4. Incorporating uncertainty
• Add prior distributions to nodes
• Propagate populations through the BN
(Donald et al. ANZJS 2015)
Prob. gastroenteritis (95% CI) = 0.030 (0.026, 0.034)
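One way to obtain an interval like this is Monte Carlo propagation: draw the uncertain node probabilities from their priors, push each draw through the network, and summarise the resulting outputs. The sketch below is a generic illustration with a hypothetical two-node network and hypothetical Beta priors. It is not the model or the numbers from Donald et al.

```python
import random


def propagate(n_draws, seed=1):
    """Return (mean, lower, upper) for Pr(ill), with a 95% interval."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        # Hypothetical Beta priors on the uncertain node probabilities.
        p_exposed = rng.betavariate(2, 8)
        p_ill_if_exposed = rng.betavariate(3, 27)
        p_ill_if_unexposed = rng.betavariate(1, 99)
        # Propagate this draw through the (two-node) network.
        p_ill = p_exposed * p_ill_if_exposed + (1 - p_exposed) * p_ill_if_unexposed
        draws.append(p_ill)
    draws.sort()
    mean = sum(draws) / n_draws
    lower = draws[int(0.025 * n_draws)]
    upper = draws[int(0.975 * n_draws) - 1]
    return mean, lower, upper


mean, lower, upper = propagate(5000)
print(f"Pr(ill) = {mean:.3f} ({lower:.3f}, {upper:.3f})")
```

The fixed seed makes the run repeatable; more draws give a smoother interval at the cost of run time.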
5. Combining BNs
Many perspectives = many potential models
How to combine outputs?
Model averaging approach
– Obtain an estimate of goodness of fit for each
BN
– Generate probabilities or ‘data’ from each BN
– Obtain a weighted average of the desired
measures
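The three steps of the model averaging approach reduce to a weighted mean once each network has a fit score. In this sketch the weights are simply the fit scores rescaled to sum to one. That choice, and the numbers, are assumptions for illustration; the slides do not say how goodness of fit is measured or converted to weights.

```python
def model_average(estimates, fit_scores):
    """Weighted average of one output measure across several BNs.

    estimates  : the same measure (e.g. Pr(outcome)) from each BN
    fit_scores : non-negative goodness-of-fit scores, larger = better fit
    """
    total = sum(fit_scores)
    weights = [score / total for score in fit_scores]
    averaged = sum(w * e for w, e in zip(weights, estimates))
    return averaged, weights


# Hypothetical outputs from three BNs built from different perspectives.
estimates = [0.20, 0.35, 0.30]
fit_scores = [1.0, 3.0, 2.0]
averaged, weights = model_average(estimates, fit_scores)
print(round(averaged, 4), [round(w, 3) for w in weights])
```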
How to combine structures?
TBC…
Conclusion: Why BNs?
Because sometimes the solutions are not
where we are looking