
Learning From the Past and the Present
"Those who refuse to learn from the past are doomed to repeat its failures."
Presented by D.E. Moon
CDT Core Decision Technologies, Inc.
Let's Put Inventory Data Into Perspective
The Knowledge Progression
 Data → (Organization) → Information
 Information → (Interpretation / Analysis / Integration / Model) → Knowledge / Prediction
 Knowledge / Prediction → (Criteria) → Decision
 Each step is a process that turns one level into the next.
© 2001 CDT Core Decision Technologies Inc.
Slide #2
A Brief History of Land Resource Inventory in Canada
 1950s
  Traditional agricultural soil inventories
 1960s
  Canada Land Inventory (external paying client)
  Systematic, 1:250K nationwide land capability
   Forestry, Agriculture, Recreation, Wildlife
   Based on climatic zonation, distribution of landscapes within zones, and soils within landscapes
  Criteria, procedures, data, and products were clearly defined before the inventory was started.
  It was very well done in only 10 years
Slide #3
A Brief History (cont.)
 1970s
  Upgrade of CLI maps to "1:100K, Multi-purpose Soil Inventories" (sold on the success of the CLI)
   sold as a one-time effort because soils are stable!
   able to answer a wide range of suitability questions!
  Landscape / hydrology / genetic bias, soil associations
  modal data, many attributes, few sites
  beautiful, well-edited technical reports and maps
  when potential clients came to us, we could not answer their questions.
And so closed the 70s.
Slide #4
A Brief History (cont.)
 1980s
  we told potential clients that we could not answer their questions because we needed more detail
  we moved from 1:100K to 1:20K and
  we tried to answer everything!
   at 10-20x the cost of answering the client's question
   and the extra questions were answered poorly
1987 brought the closure of the B.C. Soil Survey program, the largest provincial soil inventory program in Canada
Slide #5
[Diagram: two pyramids, each with the tiers Data → Information → Knowledge/Prediction → Decision]
We had moved from here, a few decisions based on sound data,
to here, many decisions based on unsound/inappropriate data.
Slide #6
A Brief History (cont.)
 1990s
  Reverse direction
  1:1 million Soil Landscapes of Canada
   minimum data set but complete national coverage
  But we had large gaps in coverage and data
   missing coverage mapped with no verification
   missing data estimated by expert systems or regional regressions applied nationally
  We now claimed to have complete national coverage and were ready to answer questions!
Slide #7
A Brief History (the final chapter)
 1994
  We got a real, internal client. It needed indicators of environmental sustainability.
  indicator procedures were developed by our scientists
  our "data" was evaluated but found wanting
  "pedo-transfer" functions were developed by inventory experts to infer required parameters from our previous estimates
   resulting in an internally contradictory, dialectical disunity
   e.g. we had more soil water than voids to hold it
Slide #8
A Brief History (the epilogue)
 1995
 National
Soil database supported by a National
Research Centre
 120 FTEs, 11 Regional Offices
 Annual Budget $ 13 million
 1997
 Supported
by a section of a regional research centre
 11
FTEs
 Annual Budget $ 0.5 million
© 2001 CDT Core Decision Technologies Inc.
Slide #9
The final irony
 1997
 Statistics
Canada launched a program to
recover the digital database for the 1970s
Canada Land Inventory.
 Why?
 It
provides complete national coverage
 It allows comparison of resource values
 It is consistent, non-contradictory, and does what it
was designed to do
It
is better than anything produced since!
© 2001 CDT Core Decision Technologies Inc.
Slide #10
So were we stupid or what?
 The closure of the Province of B.C.'s soil survey program in 1987 was a wake-up call for some, but not many.
 We tried a number of things:
  we looked at our clients,
  we looked at our mandate,
  how we did our jobs,
  how we packaged the results, and
  how we promoted our products.
Slide #11
Our clients
 Until the end, the inventory community was our only paying client; therefore, until the end,
  we defined our own needs,
  defined our own procedures,
  set our own priorities,
  performed our own QA/QC, and
  did our own evaluations.
 Boy, did we look good!
Slide #12
Our mandate
 Our job was to collect and interpret land resource data (we decided for what purpose).
 We convinced senior management that our products were widely needed. In the end, we did not deliver and had no real clients.
  We believed that we knew what potential clients needed better than they did.
  We were, after all, the land resource experts!
Slide #13
How we did our job
 We actually did evaluate some of our inventory procedures:
  map units (what we drew lines around)
  inventory procedures (cost effectiveness)
  data management procedures and systems
  reliability
 The results failed to inspire confidence, so
  they were deemed non-representative and irrelevant.
Slide #14
How we packaged it
 We moved to "productization":
  standard map and report formats
  standard interpretations
  standard packaging
  computer automation
  electronic distribution
  promotion and advertising, but
 we still had inappropriate, inaccurate, and incomplete data that could not answer clients' questions reliably.
Slide #15
So what was missing?
 Paying clients (who could hold us accountable),
  with real problems and needs to provide direction!
 When we finally accepted a client (1994):
  We let them define the problem.
  We adopted or developed procedures to solve it.
  We did a functional analysis to determine data needs,
   and we discovered that we did not have the required data.
  Too late. We had told everyone that we had the data, so we made it up.
 So was it all inevitable? No!
Slide #16
What should we have done?
 Determined what we could reliably map,
  not what we wanted or hoped to map.
 Found a formal client with clear objectives,
  one to whom we would be accountable.
 Involved the client in defining deliverables.
 Determined the degree to which we could meet the client's needs, and at what cost.
Slide #17
 Been honest about:
  what we could and could not do, and
  what was really needed to answer the question.
 Imposed rigorous internal and external QA/QC.
 Put the client's needs and aspirations, not our own, first.
 Conducted follow-up:
  How well were the client's needs being met?
  How could the needs have been better met?
Slide #18
Why Did We Fail?
 Management reasons
 Technical reasons
 Political reasons
 Human resource reasons
 Human nature reasons
Slide #19
Management Causes
 Mandate was inventory, not decision support.
  Not tied to departmental business functions.
 Project approval was based on internal priorities.
 No external clients or accountability.
 Inappropriate performance criteria.
 No executive commitment to needed change.
Slide #20
Technical Causes
 Interpretations and decision support were secondary and, therefore, poorly served.
 We had spent a bundle in the '60s and '70s on:
  soil classification
   the only national taxonomy in Canada
  soil interpretation
   everything from crop suitability to suitability for septic fields, based on site data
 and we were damn well going to map it,
  at scales of 1:15K to 1:1 million.
Slide #21
 Legacy systems constrained progress.
  Designed to do the wrong thing, but since the systems were available we continued to use them.
  Data which the legacy systems did not handle were ignored, generalized, or forced to fit.
  Decision support procedures which the systems could not handle were rejected.
Slide #22
Political causes
 Line and middle management power politics (aided by senior management's lack of commitment to change):
  managers feared the success of subordinates who were allowed to use new skills and techniques,
  managers feared loss of control if they could not develop or master the new procedures,
  managers feared that the introduction of new procedures would imply that the previous procedures were wrong.
Slide #23
Human Resources
 Demographics:
  last significant staffing was late 1970s.
  academically qualified people not available.
   staffed rather than lose the position.
   indoctrinated new staff into the current system.
  present managers were the product of 1970s indoctrination.
  attitudes and positions of the 1970s and 1980s became entrenched and then retrenched.
Note similar conditions in information technology today.
Slide #24
Human nature
 Fear of no longer being able to do their job.
 Fear of losing hard-won intellectual equity if it is made obsolete by new procedures.
 Fear of losing professional status and influence if no longer "the expert".
 Resistance to learning new ways of doing what you already know how to do.
Slide #25
So what else should we have done?
 Commit to adaptive change (senior and middle management).
 Develop a client-centric mode of operation:
  let them define their problems,
  jointly develop procedures to solve the problems.
 Develop relevant performance measures:
  reward innovation and adaptive change,
  incorporate client satisfaction,
  require rigorous problem analysis.
Slide #26
Learning from the Present?
 Your present is looking a lot like our past.
  High credibility based on the success of BEC zones and sub-zones and the identification of site series.
 You have a tremendous volume of legacy data.
  They are at best suspect, at worst wrong, and most probably inadequate.
 You are being pressured to use them to answer today's site management questions at low cost.
 You are selling an untested product.
  Wanting and believing does not make it so.
Slide #27
Highlights of a Problem Analysis of Input Data Quality for PEM
 An initial assessment of input data quality identified potentially serious problems with:
  spatial accuracy and resolution
  thematic accuracy and resolution
Slide #28
Spatial Issues
[Diagram: spatial overlay of a terrain polygon (A7B3) on a soil polygon (X5Y5); alluvial fans that should correspond do not align.]
Slide #29
Thematic Issues
 Compound map unit overlay (terrain unit A7B3 over soil unit X5Y5):
  AX from 21% to 50%
  AY from 21% to 50%
  BX from 0% to 30%
  BY from 0% to 30%
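Ranges of this kind can be recovered as classical Fréchet-style bounds on the overlap of two compound units: with A covering 70% of the polygon (A7) and X covering 50% (X5), their joint extent must lie between max(0, 70 + 50 - 100) = 20% and min(70, 50) = 50%, depending on how the components happen to coincide spatially. A minimal sketch (the function name is ours):

```python
def overlap_bounds(pct_a: float, pct_b: float) -> tuple[float, float]:
    """Frechet bounds on the joint extent of two map-unit components.

    pct_a, pct_b: percent of the polygon covered by each component.
    Returns the (minimum, maximum) possible percent covered by both.
    """
    lower = max(0.0, pct_a + pct_b - 100.0)  # overlap forced by limited area
    upper = min(pct_a, pct_b)                # one component nested in the other
    return lower, upper

# Terrain unit A7B3 (A = 70%, B = 30%) over soil unit X5Y5 (X = Y = 50%):
for terrain, t_pct in [("A", 70), ("B", 30)]:
    for soil, s_pct in [("X", 50), ("Y", 50)]:
        lo, hi = overlap_bounds(t_pct, s_pct)
        print(f"{terrain}{soil}: {lo:.0f}% to {hi:.0f}%")
```

The raw bounds this produces (e.g. 20-50% for AX) are close to, though not identical with, the ranges quoted on the slide; without spatial correlation between the two maps, nothing narrower can be claimed.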
Slide #30
Thematic Reliability
Estimate and 95% confidence interval for the area of defined error classes:

Class         Area   Confidence Interval
Correct       17%    8-27
Similar       18%    9-21
Dissimilar     7%    2-12
Contrasting   61%    48-73
Diss + Cont   68%    57-78
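Asymmetric intervals of this kind arise when the sampled check sites are treated as binomial trials. As an illustration (not the original computation; the sample size below is assumed), the Wilson score interval for a class-area proportion:

```python
import math

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion.

    successes: check sites falling in the class; n: total check sites;
    z: normal quantile (1.96 gives a 95% interval).
    """
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# e.g. 10 of 60 check sites mapped correctly (sample size assumed for illustration)
lo, hi = wilson_ci(10, 60)
print(f"correct: {10/60:.0%}, 95% CI {lo:.0%} to {hi:.0%}")
```

Like the table, the interval is asymmetric about the point estimate for proportions far from 50%, which is why the quoted ranges are not centered on the area estimates.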
Slide #31
Approach to an Input Data Evaluation Framework
 Documented mapping concepts used in inventories
  map units and reliability
 Documented the elements of data quality
  spatial and thematic
 Defined the effect of mapping procedures on data quality
 Specified the metadata required to evaluate data quality, and then
 Developed a framework and criteria for evaluating data inputs to PEM
Slide #32
Conclusion
 Input data quality is highly variable.
 The strong appeal of PEM will require a tightly reasoned, widely accepted input data evaluation framework if PEM is to see wide application.
 The framework developed in the Input Data Quality Report provides an effective basis for both PEM input and PEM output data quality evaluation.
Slide #33
Knowledge-based systems
 Knowledge-based systems incorporate data and information with rules, relationships, probabilities, and logic or models to:
  predict outcomes
  support decisions
  classify unknowns
  identify new relationships and patterns
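The definition above can be made concrete with a toy rule set (all attribute names and thresholds below are invented for illustration, not drawn from any system mentioned in this talk):

```python
# Minimal rule-based classifier: each rule is a (condition, conclusion) pair.
# All names and thresholds are hypothetical, for illustration only.
rules = [
    (lambda s: s["drainage"] == "poor" and s["ph"] < 5.5, "unsuitable"),
    (lambda s: s["slope_pct"] > 30,                        "unsuitable"),
    (lambda s: s["drainage"] == "well" and s["ph"] >= 6.0, "suitable"),
]

def classify(site: dict) -> str:
    """Fire rules in order; the first matching condition wins."""
    for condition, conclusion in rules:
        if condition(site):
            return conclusion
    return "undetermined"

print(classify({"drainage": "well", "ph": 6.5, "slope_pct": 5}))  # suitable
```

Even this tiny sketch shows the dependency the talk keeps returning to: the rules are only as good as the site data fed into them.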
Slide #34
Examples of older and current knowledge-based systems
 Taxonomic models (intuitive, empirical, no causality)
 Statistical models (empirical, no causality)
 Process models (causal, no feedback)
 Systems models (causal, with feedback mechanisms)
 Expert systems (heuristic, no causality; e.g. Eldar)
 Artificial neural networks (non-rigorous statistics)
 Belief matrices and decision trees (heuristic; e.g. Ecogen)
Slide #35
The new knowledge paradigm
 Ontology
  A representational vocabulary for a shared domain of discourse: definitions of classes, relations, functions, processes, and other objects.
 Ontologies will be the basis for knowledge integration,
  analogous to data dictionaries and data models for data management and integration.
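The analogy to data dictionaries can be sketched directly: an ontology is, at minimum, a set of named classes, a subclass hierarchy, and typed relations between classes (the vocabulary below is a hypothetical placeholder, not a real TEM/PEM ontology):

```python
# Tiny ontology sketch: classes, a subclass hierarchy, and named relations.
# All vocabulary below is hypothetical, for illustration only.
classes = {"LandUnit", "SoilPolygon", "TerrainPolygon", "SiteSeries"}
subclass_of = {"SoilPolygon": "LandUnit", "TerrainPolygon": "LandUnit"}
relations = {("SoilPolygon", "occurs_on", "TerrainPolygon"),
             ("SiteSeries", "characterizes", "SoilPolygon")}

def is_a(cls: str, ancestor: str) -> bool:
    """Walk the subclass chain, as a data dictionary would for data types."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = subclass_of.get(cls)
    return False

print(is_a("SoilPolygon", "LandUnit"))  # True
```

Two knowledge bases that agree on such a shared vocabulary can exchange rules; without it, as the following slides argue, sharing and reuse stall.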
Slide #36
Knowledge management
 The definition and maintenance of knowledge standards and protocols is a necessary precursor to knowledge management.
 The management of knowledge-based systems will require:
  a knowledge syntax and semantic lexicon
  domain ontologies for the knowledge area
Slide #37
The current situation
 Knowledge management is following an evolution similar to data and information management.
  Many organizations are building knowledge bases in an uncoordinated and ad hoc manner.
  There is growing recognition of the need for, and value of, knowledge sharing and reuse.
Slide #38
The current situation
 As with data management, the major impediments to knowledge sharing and reuse are:
  inconsistent concepts, definitions, terminology, structures, and formats;
  in addition, there is no standard syntax or semantics of inference to support communication;
  standards and protocols to enable knowledge sharing and reuse are only just emerging.
Slide #39
Conclusions
 It is feasible to establish a generic knowledge structure that will accommodate most, if not all, evolving knowledge models, including those used in TEM and PEM.
  This structure would be able to store, retrieve, and share disparate knowledge bases.
  Integration into a true knowledge management system, with a common syntax and semantics of knowledge inference and retrieval, is also feasible but would be much more difficult and costly.
Slide #40
Conclusions
 The generic knowledge structure would accommodate the ontology of the TEM and PEM approaches.
  The ontology would include the definitions of classes, concepts, relations, functions, and processes assumed to produce the classes recognized in the TEM classification, and would also accommodate the inferences used in PEM.
Slide #41
Our end,
your beginning?
Slide #42
When interpretation was attempted we discovered that:
 Although the maps were reasonably accurate,
  the necessary data had not been collected, or
  the data was in the wrong format, or
  we had used the wrong method of analysis, or
  class limits or precision were inappropriate, or
  the map units and scale severely limited the site specificity of the interpretation (e.g. a wide range of response could be expected in most polygons).
Slide #43
Agriculture Land Reserve
 Survey cost: $3-10 million
  reliability of predicting reserve status: 92%
 Post-completion, we did a pilot project using air-photo-interpreted land use as an indicator of reserve status:
  the reliability of predicting reserve status was 96%.
  The estimated cost of a full survey: $0.3 million
Slide #44
Now when interpretation was attempted we discovered that:
 The necessary data had been collected,
 the data was in the correct format,
 we had used the appropriate method of analysis, and
 we had used appropriate soil definitions, class limits, and precision for the required interpretations, but
Slide #45
 map units and scale were still inappropriate
  Compound map units (TEM convention) gave conflicting interpretations
 The maps were unreliable
  thematic reliability < 30%
  mappers were restricted to naming 3 soils
  maps had a modal value of 7 soils / polygon
 Interpretations required greater precision than could be mapped at 1:20K
Slide #46
Minimum data set
 Review of the National Soil Database:
  average of 2,500 attributes per site
  about 12,000 sites
  only 10 attributes were used routinely in interpretations
   only 30% of data sets had all 10
  collection of these 10 takes about 1/8 - 1/4 of the field time and 1/20 of the lab costs.
  deemed inappropriate to collect the missing data!
Slide #47
Selling the Product
 We used models developed at the plot level (10 m²) to make predictions at 1:1 million (10-50 km²).
  The average polygon had 10-30 major soils.
  The algorithm accessed 1 detailed soil description and extrapolated the results to tens of polygons and hundreds to thousands of hectares.
  The required data were most frequently estimated.
 One enterprising fellow sold Health Canada on a 1:1 million map of suitability for carcass burial.
Slide #48
Data Inventory Requirements
 Understand the questions to be answered.
 Determine the data needed to answer the questions.
 Evaluate existing data for adequacy.
  It may be that a sub-optimal but adequate procedure can be developed for existing data.
  It may be that adequate data does not exist and must be collected.
 If the needed data cannot be collected at appropriate cost and reliability, the requirement cannot be met.
  Look for alternative approaches.
  It did not matter how badly Reagan wanted Star Wars; it just could not be built.
Slide #49