Getting Data to Applications: Why Do We Fail, and How We Can Do Better?
Arnon Rosenthal, Frank Manola, Scott Renner
Toward an Industrial Revolution for Data Interoperability
Incremental, (full) Interfaces, Incentives
Goal: A Common Operational Picture (COP)
[Figure: a two-tier architecture. View tier: logistics, mapmaker, intelligence, and operations users; each user sees data values, assembled and expressed in the user's own terms. Between the tiers: the “Common Operational Picture” warehouse or federation, an integrated subset of information sources with presentations for different users. Source tier: sensor, naval, NIMA info products, ground, air.]
5
Current Status
 Read-only access is insufficiently ambitious for a guiding vision, but it is driving many industrial solutions
 Proposed architectures (e.g., messaging) often don’t fit
- Metadata
- Operations: update / annotate / subscribe
- Fusion
 Numerous initiatives are likely to fail
- e.g., common operational pictures
 Today’s technology: costly, little reuse, skill-intensive
7
Toward Attainable Goals
(and more realistic slogans)
 “Give everyone transparent (read) access to all data”.
(Any success stories?)
- The vision of perfection crowds out the ability to live with imperfection!
 Restate the challenge: Prepare data/software systems to work with partners -- including unknown future ones?
 Connection-creation as a core competence for IT
- Describe each service that is offered or wanted (e.g., some operation on some data)
- Reduce the cost of establishing the software connection
- Reuse knowledge captured when a connection is built
8
What Do We Mean by “Industrial Revolution”?
 Small tasks
 Each with one skill
 Many atomic steps become automatable
 Each produces reusable knowledge
(as opposed to motivating a few lines within a program)
 “Market-driven” (as connections are made) rather than giant initiatives
9
Future of Large Info Management Architectures
 Consensus among researchers for scalable sharing
- Each data resource describes what it offers
- Each consumer describes what it wants
- Discovery and brokering processes create a connection
(prototypes automate some cases)
 Is it really so different from today?
- Each functional task is already performed by today’s developers
- Key difference: “describe and generate”
10
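The describe-and-generate idea above can be made concrete with a small sketch. It assumes a hypothetical description format (`Description`, with party, operation, concept, and attribute names) used identically for offers and wants, and a naive discovery step that pairs each want with offers of the same operation and concept, reporting which wanted attributes each candidate lacks. Real brokers would need far richer descriptions; this is only an illustration.

```python
from dataclasses import dataclass

# Hypothetical description format: a service is "some operation on some data".
# The same structure describes what a provider offers and what a consumer wants.
@dataclass(frozen=True)
class Description:
    party: str              # provider or consumer name
    operation: str          # e.g., "query", "subscribe"
    concept: str            # e.g., "FuelDepot"
    attributes: frozenset   # attribute names offered or wanted

def discover(offers, wants):
    """Pair each want with offers that could (at least partly) satisfy it.

    Returns {consumer: [(provider, missing_attributes), ...]}, best match first.
    """
    result = {}
    for want in wants:
        candidates = []
        for offer in offers:
            if offer.operation == want.operation and offer.concept == want.concept:
                missing = want.attributes - offer.attributes
                candidates.append((offer.party, sorted(missing)))
        # Fewer missing attributes = better candidate (stable sort keeps ties in order).
        candidates.sort(key=lambda c: len(c[1]))
        result[want.party] = candidates
    return result

offers = [
    Description("depotDB", "query", "FuelDepot", frozenset({"id", "location", "capacity"})),
    Description("natoFeed", "query", "FuelDepot", frozenset({"id", "location"})),
    Description("trackServer", "subscribe", "Track", frozenset({"id", "position"})),
]
wants = [Description("logisticsView", "query", "FuelDepot", frozenset({"id", "capacity"}))]

print(discover(offers, wants))
```

Reporting the missing attributes, rather than only exact matches, lets a partial match still be used while its shortfall is made explicit.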
A word from our sponsor: We’re Hiring
 Researchers/consultants, prototypers, systems engineers
(or make us an offer)
 Main offices: suburbs of Boston and Washington DC
- Also jobs in Norfolk, Montgomery, St. Louis, San Diego, …
+ Europe, Asia
 We’re a nonprofit working mostly for the US government
(A good place to learn. So you’ll get more stock options later)
 US Citizens and Permanent residents only
(so MITRE can get you a security clearance)
12
Talk Outline
 Why do current approaches so often fail?
- We act as if we believe ridiculous things, in architectures and in design discussions
 Where should we try to go? Incremental Interoperability
- Aim to revolutionize -- incrementally
 How to Start Moving in this Direction?
- Scope of talk:
+ Create logical connectivity -- development and logical admin
+ Omits: systems planning, execution performance (cache selection, indexing, dissemination)
14
Tacit Assumptions -- and Antidotes -- 2
 “End State” fallacies:
- Architectures are for a perfect end state (?)
- Systems conform, and consumers benefit, only when the transition is complete (?)
- You’ll add flexibility later (?)
 Config. mgt. is a sufficient strategy for change (?)
Advice Nuggets
 Architect for manageable, adaptable, imperfect systems (for 2001, 2002, … 2999)
- Transitional states are within the architecture
 Architect for adaptability. How to contract for it?
- Config. management is only a brake
15
Tacit Assumptions -- and Antidotes -- 3
 Mandates will elicit good quality metadata (?)
- Local administrators will rush to keep you up to date (?)
Advice Nuggets
 Active (operational) metadata is kept accurate
- Passive metadata is untested, and soon too obsolete to drive automated processing (except browsing)
 More carrots, fewer sticks
- If your tools use the metadata to ease the providers’ tasks, you’ll get better metadata
- Calls for metadata should include an exploitation plan
16
Tacit Assumptions -- and Antidotes -- 4
 “Midpoint” fallacy: Design a compromise interface (msg?), then build around and above it (?)
 “Message interface” fallacy: “Send message Mxyz” is a fine interface between systems (?)
- Support interfaces procedurally (e.g., Java + parser) (?)
 Describe the “natural” interface.
- One interface supports all subsets.
- Connectors are separate & declarative (e.g., SQL + functions?)
 On the consumer’s interface, generate
- operations (e.g., query, update, subscribe)
- metadata, e.g., units, error, access controls
18
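A declarative connector can be sketched as data rather than code. In this minimal illustration, assuming Python's standard `sqlite3` module and a hypothetical spec format (`CONNECTOR`, `generate_view_sql`, `generate_metadata`), one description of the consumer's attributes generates both a query operation (a SQL view) and consumer-side metadata (units). The spec is interpolated into SQL text, so it must come from a trusted administrator, not from end users.

```python
import sqlite3

# Hypothetical declarative connector: each consumer attribute is defined by a
# source SQL expression plus metadata (units). No procedural glue code.
CONNECTOR = {
    "source_table": "depot",
    "view_name": "consumer_depot",
    "attributes": {
        "depot_id": {"expr": "id", "units": None},
        "range_km": {"expr": "range_miles * 1.609344", "units": "km"},
    },
}

def generate_view_sql(spec):
    """Generate the consumer's query interface (a SQL view) from the spec.
    The spec is trusted: its strings are pasted directly into SQL."""
    cols = ", ".join(f"{a['expr']} AS {name}" for name, a in spec["attributes"].items())
    return f"CREATE VIEW {spec['view_name']} AS SELECT {cols} FROM {spec['source_table']}"

def generate_metadata(spec):
    """Generate consumer-side metadata (units) from the same spec."""
    return {name: a["units"] for name, a in spec["attributes"].items()}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE depot (id TEXT, range_miles REAL)")
conn.execute("INSERT INTO depot VALUES ('D1', 100.0)")
conn.execute(generate_view_sql(CONNECTOR))

rows = conn.execute("SELECT depot_id, range_km FROM consumer_depot").fetchall()
print(rows, generate_metadata(CONNECTOR))
```

Because the view and the metadata come from one description, changing a unit conversion changes both together.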
Tacit Assumption 6:
Interoperability Metaphor: Universal Plug
[Figure: a plug with two prongs, labeled “Too Simple”]
Important element of truth:
Design to plug into the “infosphere”, not into one neighbor
19
A Better Interoperability Metaphor:
A Multi-Pin Connector
[Figure: a multi-pin connector with 25 numbered pins; labeled pins include CORBA/DCOM, transactions, XML, and SQL]
All the pins have to fit -- and many are compound
- Data: each attribute has semantics, format, quality
Track resolution of each pin’s issues
20
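Tracking each pin can be as simple as comparing two descriptions aspect by aspect. The sketch below is hypothetical throughout: the pin names, the `KNOWN_TRANSFORMS` set, and `check_pins` are illustrative assumptions. Compound pins (here, a data attribute with semantics and units) are checked sub-pin by sub-pin, and each result is `ok`, `transform` (a known transform bridges the gap), or `unresolved`.

```python
# Hypothetical "pins": each aspect on which two systems must agree.
provider = {
    "protocol": "CORBA",
    "format": "XML",
    "transactions": "none",
    "attr:distance": {"semantics": "great-circle distance", "units": "miles"},
}
consumer = {
    "protocol": "DCOM",
    "format": "XML",
    "transactions": "none",
    "attr:distance": {"semantics": "great-circle distance", "units": "km"},
}
# Transforms we know how to supply, keyed by (pin kind, from, to).
KNOWN_TRANSFORMS = {("protocol", "CORBA", "DCOM"), ("units", "miles", "km")}

def _status(kind, p, c, transforms):
    if p == c:
        return "ok"
    if (kind, p, c) in transforms:
        return "transform"
    return "unresolved"

def check_pins(provider, consumer, transforms):
    """Return {pin: status} for every pin either side describes."""
    report = {}
    for pin in sorted(set(provider) | set(consumer)):
        p, c = provider.get(pin), consumer.get(pin)
        if isinstance(p, dict) and isinstance(c, dict):
            # Compound pin: check each sub-pin (semantics, units, ...).
            for sub in sorted(set(p) | set(c)):
                report[f"{pin}.{sub}"] = _status(sub, p.get(sub), c.get(sub), transforms)
        else:
            report[pin] = _status(pin, p, c, transforms)
    return report

print(check_pins(provider, consumer, KNOWN_TRANSFORMS))
```

A pin described on only one side compares against `None` and shows up as `unresolved`, which keeps gaps in the descriptions visible.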
Organization of the Section
 Why do current approaches so often fail?
 Where Should We Want to Go?
- Approach
- Taxonomy of needed capabilities
 How to Start Moving in this Direction?
 Research Agenda: Risk Mitigation
21
Transition is the steady state,
with good ways to cope
 Descriptions of sources and consumers exist -- sometimes
- When you build the next connection, capture more
- You’re still funded to build connections
- No giant process cutover
- Discovery and brokering tools work with whatever descriptions they find
 Integration contractors already do discovery and brokering!
- Manually, with too little reuse!
 For everything, there are multiple ways to do it
- Choose one, but work with those who chose differently
- Connections and transforms are partially known
22
Steps to Connect a Consumer to Provider(s):
(with metadata reuse)
 Obtain descriptions of each player
- Use the same form for consumers’ needs as for providers’ offers
- May employ intermediary vocabularies
 Discover potential (source, consumer) pairs
 Obtain transforms for
- Element representations (e.g., miles → km; jpeg → gif)
- Object and set representations (e.g., ODBC → XML)
- Protocols (e.g., DCOM → CORBA)
- Pull versus push, whole versus changes
 Generate the entire connection (tuned for efficiency)
What vendor can supply the framework?
24
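The “obtain transforms” step can reuse a library of small transforms by chaining them. A minimal sketch, assuming a hypothetical library keyed by (from, to) representation pairs: breadth-first search finds the shortest chain, and the generated converter applies it. Only unit conversions are shown; protocol or object-representation transforms would need the same lookup over different keys.

```python
from collections import deque

# Hypothetical transform library: (from_repr, to_repr) -> function.
TRANSFORMS = {
    ("miles", "km"): lambda x: x * 1.609344,
    ("nmi", "km"): lambda x: x * 1.852,
    ("km", "m"): lambda x: x * 1000.0,
}

def find_chain(src, dst, transforms):
    """Breadth-first search for the shortest chain of transforms from src to dst."""
    queue = deque([(src, [])])
    seen = {src}
    while queue:
        node, path = queue.popleft()
        if node == dst:
            return path
        for (a, b) in transforms:
            if a == node and b not in seen:
                seen.add(b)
                queue.append((b, path + [(a, b)]))
    return None  # no known chain: this connection cannot be generated automatically

def generate_converter(src, dst, transforms):
    """Compose the chain into a single converter function."""
    chain = find_chain(src, dst, transforms)
    if chain is None:
        raise LookupError(f"no transform chain from {src} to {dst}")
    def convert(x):
        for step in chain:
            x = transforms[step](x)
        return x
    return convert

to_m = generate_converter("nmi", "m", TRANSFORMS)
print(to_m(1.0))
```

Each new transform added to the library can make many more conversions reachable, which is the reuse the slide is after.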
Metadata Drives Connection Creation
(when there is enough metadata)
[Figure: components of connection creation: new “wants” from the consumer; repository/discovery process; knowledge base; brokering process; transform library + execute]
25
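One way to read the diagram as code is a pipeline in which every decision comes from metadata. In this sketch, `REPOSITORY`, `KNOWLEDGE_BASE`, `TRANSFORM_LIBRARY`, and `create_connection` are hypothetical stand-ins for the repository, knowledge base, and transform library boxes; when the metadata is insufficient, generation fails and a developer must step in.

```python
# Hypothetical repository of provider descriptions: attribute -> units.
REPOSITORY = {"depotDB": {"rng_mi": "miles", "depot": None}}
# Hypothetical knowledge base: which provider attribute means the wanted concept.
KNOWLEDGE_BASE = {("depotDB", "range"): "rng_mi", ("depotDB", "depot_id"): "depot"}
# Hypothetical transform library keyed by (from_units, to_units).
TRANSFORM_LIBRARY = {("miles", "km"): lambda x: x * 1.609344}

def create_connection(want, repository, kb, library):
    """want: {concept: units}. Returns (provider, function mapping a provider
    row into the consumer's terms); raises LookupError if metadata is lacking."""
    for provider in repository:                        # discovery
        plan = {}
        for concept, units in want.items():            # brokering
            attr = kb.get((provider, concept))
            if attr is None:
                break
            src_units = repository[provider][attr]
            if src_units == units:
                plan[concept] = (attr, None)
            elif (src_units, units) in library:
                plan[concept] = (attr, library[(src_units, units)])
            else:
                break
        else:  # every wanted concept was brokered for this provider
            def connection(row, plan=plan):            # execute
                return {c: (f(row[a]) if f else row[a]) for c, (a, f) in plan.items()}
            return provider, connection
    raise LookupError("not enough metadata to generate this connection")

provider, conn = create_connection(
    {"depot_id": None, "range": "km"}, REPOSITORY, KNOWLEDGE_BASE, TRANSFORM_LIBRARY)
print(provider, conn({"depot": "D1", "rng_mi": 10.0}))
```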
Connection Creation Drives Metadata
[Figure: connection-creation components with metadata-capture tools added: new “wants” from the consumer; repository/discovery process with metadata-capture tools; knowledge base with metadata-capture tools; brokering process; transform library + execute]
26
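Metadata capture can be a side effect of ordinary connection work. A minimal sketch, assuming a hypothetical knowledge base that maps (provider, attribute) pairs to a vocabulary term and units: an integrator's hand-built mapping is recorded by `capture`, and `reusable_for` shows which parts of a later connection to the same provider can start from that record.

```python
# Hypothetical knowledge base of captured mappings:
# (provider, provider_attr) -> (vocabulary term, units).
knowledge_base = {}

def capture(provider, mapping, kb):
    """Record what an integrator learned while hand-building one connection.
    mapping: {provider_attr: (term, units)}."""
    for attr, (term, units) in mapping.items():
        kb[(provider, attr)] = (term, units)

def reusable_for(provider, wanted_terms, kb):
    """Which wanted terms can a later connection to `provider` take from the KB?"""
    known = {term: attr for (p, attr), (term, _units) in kb.items() if p == provider}
    return {t: known[t] for t in wanted_terms if t in known}

# First connection: built by hand, but its mapping is captured, not buried in code.
capture("depotDB", {"rng_mi": ("range", "miles"), "depot": ("depot_id", None)}, knowledge_base)

# The next connection to the same provider starts from the captured metadata;
# "owner" was never captured, so it still needs manual work.
print(reusable_for("depotDB", ["depot_id", "range", "owner"], knowledge_base))
```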
Connection Creation Drives Vocabularies (?)
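[Figure: connection-creation components with vocabulary tools and an optimizer added: vocabulary and interface creation tools; new “wants” from the consumer; repository/discovery process with metadata-capture tools; knowledge base with metadata-capture tools; brokering process; transform library + execute; optimizer]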
27
Toward an “industrial revolution” for IT:
Re-imagine Existing Processes as Simpler Steps
 Each step should
- Require just one or two skills
- Benefit from existing resources -- metadata and transforms
- Be fully automated (sometimes)
- Produce reusable resources for later steps
 Key challenges:
- Incentives:
+ It must be easier to generate from resource atoms than to code it all yourself!
- To support these incentives, we may need tools that assemble the atomic components into a solution
28
Data Descriptions: A Taxonomy
(foil 1 of 2)
 Data admin for requirements parallels admin for offers!
- Use the same constructs, which enables (partly) automated comparisons
 Interpretation: element semantics, element representation, schema
 Scope and completeness of what you provide (population), e.g., images of
+ all US air-fuel depots, since 1970
+ some NATO fuel depots since 1990
 Delivery style (push/pull, whole / changes)
(Is offer/need model adequate for update transactions?)
29
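Delivery style is one aspect that generic tools can broker. The sketch below, assuming records identified by a stable key, converts between the “whole” and “changes” styles: `diff_snapshots` derives a change list from two successive snapshots (for example, from a pull-only source), and `apply_changes` rebuilds the whole view on the consumer side. Both function names are hypothetical.

```python
def diff_snapshots(old, new):
    """Turn two 'whole' snapshots (key -> record) into a list of changes."""
    changes = []
    for key in sorted(old.keys() - new.keys()):
        changes.append(("delete", key, None))
    for key in sorted(new.keys()):
        if key not in old:
            changes.append(("insert", key, new[key]))
        elif old[key] != new[key]:
            changes.append(("update", key, new[key]))
    return changes

def apply_changes(snapshot, changes):
    """Consumer side: rebuild the 'whole' view from a change stream."""
    result = dict(snapshot)
    for op, key, record in changes:
        if op == "delete":
            result.pop(key, None)
        else:  # insert or update
            result[key] = record
    return result

old = {"D1": {"fuel": 100}, "D2": {"fuel": 50}}
new = {"D1": {"fuel": 80}, "D3": {"fuel": 10}}
changes = diff_snapshots(old, new)
print(changes)
```

Whether such an adapter is adequate for update transactions, as the slide asks, is a separate question: it only reconstructs state, not the transaction boundaries.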
Data Descriptions’ Taxonomy
(foil 2 of 2)
 Quality of service
- Data quality, timeliness, attribution, completeness, obligation (to continue providing), cost, …
 Guidance for data merging (match-up, conflict resolution)
 Server information (a catch-all), e.g.:
- Access language, protocols, address, security domains, …
32
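Using the same constructs for offers and needs is what makes automated comparison possible. A minimal sketch, assuming a hypothetical `DataDescription` class that encodes a few taxonomy entries (interpretation, scope, delivery style, timeliness, server protocols): `shortfalls` lists each way an offer falls short of a need, rather than returning a bare yes/no, so partial matches can still be used and explained.

```python
from dataclasses import dataclass, field

@dataclass
class DataDescription:
    """Hypothetical encoding of the taxonomy; one class describes offers AND needs."""
    concept: str                                   # interpretation: element semantics
    units: dict = field(default_factory=dict)      # interpretation: representation
    regions: frozenset = frozenset()               # scope / population
    since_year: int = 0                            # scope / population
    delivery: str = "pull-whole"                   # delivery style
    max_latency_s: float = float("inf")            # quality of service: timeliness
    protocols: frozenset = frozenset()             # server information

def shortfalls(offer, need):
    """List the ways an offer falls short of a need (empty list = full match)."""
    if offer.concept != need.concept:
        return [f"concept {offer.concept!r} != {need.concept!r}"]
    problems = []
    for attr, u in need.units.items():
        if offer.units.get(attr) != u:
            problems.append(f"{attr}: units {offer.units.get(attr)!r}, need {u!r}")
    missing = need.regions - offer.regions
    if missing:
        problems.append(f"regions not covered: {sorted(missing)}")
    if offer.since_year > need.since_year:
        problems.append(f"history starts {offer.since_year}, need {need.since_year}")
    if offer.delivery != need.delivery:
        problems.append(f"delivery {offer.delivery}, need {need.delivery}")
    if offer.max_latency_s > need.max_latency_s:
        problems.append(f"latency {offer.max_latency_s}s, need <= {need.max_latency_s}s")
    if need.protocols and not (offer.protocols & need.protocols):
        problems.append("no common protocol")
    return problems

offer = DataDescription("FuelDepotImage", regions=frozenset({"US"}), since_year=1970,
                        protocols=frozenset({"SQL"}))
need = DataDescription("FuelDepotImage", regions=frozenset({"US", "NATO"}), since_year=1990,
                       protocols=frozenset({"SQL", "XML"}))
print(shortfalls(offer, need))
```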
Talk Outline
 Why do current approaches so often fail?
 Discussion of a “low risk” approach
- What the goal system looks like
- How it evolves
- Tool and technology details
 How to Start Moving in this Direction? How to:
- Simplify the task of interfacing to a particular system
- Establish more connections
- Make created interfaces “first class”
 Research Agenda: Risk Mitigation
33
Getting Started along the New Road
 Provide help in creating needed interfaces
- Focus on individual programs, small initiatives
- Give incremental benefits, to keep all aboard
 What’s the minimum needed to give some benefits?
 Separate existing work into atomic tasks that require fewer skills and are sometimes automatable
- No giant cutovers, with massive retraining, coordination
 Issues
- What does each program need to do?
- What requires coalitions, or central funding?
(e.g., repository, brokers)
34
Tasks (examples)
 Define vocabularies for
- Metadata
(how to say “means the same”, “distanceUnits = km”, or “Corba3.0 interface”)
- Aspects to be brokered (of scope, representation, …)
- Frequently exchanged domain data (Part#, Facility#)
 Describe portions of systems in terms of these vocabs
- Be opportunistic, e.g., when building new connections
 Provide transforms among major representations, protocols
 Provide brokers for various aspects (simple brokers first)
- “Partial brokering” must help metadata providers
35
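Vocabulary assertions such as “means the same” compose transitively, so one assertion per new term can connect it to everything already related. The sketch below stores such assertions in a union-find structure; the `Vocabulary` class and the prefixed term names are hypothetical.

```python
class Vocabulary:
    """Hypothetical store of 'means the same' assertions, kept as equivalence
    classes in a union-find structure so that equivalence is transitive."""

    def __init__(self):
        self.parent = {}

    def _find(self, term):
        self.parent.setdefault(term, term)
        while self.parent[term] != term:
            self.parent[term] = self.parent[self.parent[term]]  # path halving
            term = self.parent[term]
        return term

    def means_the_same(self, a, b):
        """Record one assertion; merges the two equivalence classes."""
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            self.parent[ra] = rb

    def same(self, a, b):
        return self._find(a) == self._find(b)

vocab = Vocabulary()
vocab.means_the_same("supply:PartNo", "repair:Part#")
vocab.means_the_same("repair:Part#", "procurement:PartNumber")
print(vocab.same("supply:PartNo", "procurement:PartNumber"))
```

Two assertions relate three systems here; no one had to state the supply-procurement equivalence directly.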
Who Will Be Most Interested?
(Suggested Initial Targets)
 Find a system that needs multiple interfaces (to customers and/or feeders)
 Good candidates
- Non-dominant players who must connect to multiple others
- Dominant players that are hard to connect to (MIDB?)
 Issue: How soon will it be helpful?
- Generate, based on the system’s own entries in the metadata repository
- Transformers are quickly helpful (esp. harder ones, e.g., coordinates, image formats)
 Perhaps attach to DBMS, or to XML engine?
36
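Coordinate transforms are a typical reusable atom. As an illustration, assuming coordinates written as degrees, minutes, and seconds with a hemisphere letter, the sketch converts them to signed decimal degrees and back (to whole seconds). It handles notation only; datum or projection changes, the genuinely hard part, are out of scope here. Function names are hypothetical.

```python
import re

# Degrees, minutes, seconds, hemisphere, e.g. 38°53'23"N
DMS_PATTERN = re.compile(r"""(\d+)°\s*(\d+)'\s*([\d.]+)"\s*([NSEW])""")

def dms_to_decimal(text):
    """Convert e.g. 38°53'23"N to signed decimal degrees (S and W are negative)."""
    m = DMS_PATTERN.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not a DMS coordinate: {text!r}")
    deg, minutes, seconds, hemi = int(m[1]), int(m[2]), float(m[3]), m[4]
    value = deg + minutes / 60 + seconds / 3600
    return -value if hemi in "SW" else value

def decimal_to_dms(value, axis):
    """Inverse transform, rounded to whole seconds; axis is 'lat' or 'lon'."""
    if axis == "lat":
        hemi = "N" if value >= 0 else "S"
    else:
        hemi = "E" if value >= 0 else "W"
    total = round(abs(value) * 3600)          # whole seconds
    deg, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{deg}°{minutes}'{seconds}\"{hemi}"

print(dms_to_decimal('38°53\'23"N'))
print(decimal_to_dms(dms_to_decimal('77°2\'10"W'), "lon"))
```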
Example Initiatives (and their benefits)
 Publish interface in one formalism (with description), e.g., SQL
- Tools generate the additional interfaces (e.g., XML, CORBA, DCOM, HTML, …) without disturbing the original publisher
 Publish interface in one vocabulary, for all exported info, e.g., Supply
- Tools generate the “closest feasible” interface in other vocabularies that have been related to it, e.g., Repair, Procurement, Defense finance, …
- Transform representations (image format, coord system)
 Provide interfaces as (root concept, well known modifier)
 Derive metadata, additional operations (e.g., update)
40
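Generating an additional interface without disturbing the publisher can be as small as a read-only adapter. A minimal sketch, assuming Python's standard `sqlite3` and `xml.etree.ElementTree` modules and column names that are valid XML tag names: `sql_to_xml` (a hypothetical helper) runs a query against the published SQL interface and emits an XML rendering.

```python
import sqlite3
import xml.etree.ElementTree as ET

def sql_to_xml(conn, query, root_tag, row_tag):
    """Render the result of a query on the published SQL interface as XML.
    Read-only: the publisher's tables are not modified."""
    cur = conn.execute(query)
    columns = [d[0] for d in cur.description]   # column names become element tags
    root = ET.Element(root_tag)
    for row in cur:
        elem = ET.SubElement(root, row_tag)
        for col, val in zip(columns, row):
            ET.SubElement(elem, col).text = "" if val is None else str(val)
    return ET.tostring(root, encoding="unicode")

# The "publisher": an ordinary SQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE supply (part TEXT, qty INTEGER)")
conn.executemany("INSERT INTO supply VALUES (?, ?)", [("P-17", 4), ("P-22", 0)])

xml_text = sql_to_xml(conn, "SELECT part, qty FROM supply ORDER BY part", "supply", "item")
print(xml_text)
```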
Summary: Try an approach that hasn’t failed consistently!
 Identified pitfalls that are too rarely avoided
 Described incremental steps toward large-scale data admin for diverse, changing, incomplete systems
 Generate connections from reusable resources (system metadata, vocabulary metadata, transforms)
- Active metadata
- Separation of skills, use point and click
- Incentives: make providing resources and generating connections easier than writing connecting code
 Connection-creation creates more reusable resources
- Projects cooperate to create vocabularies, acquire tools
 It’s a low-risk approach -- begin prototyping
41
Challenges for Database Researchers
 Better brokering for matching requirements to sets of views
- Assume multiple ontologies, spotty connection, incremental improvement
- Explain the shortfalls, understandably
 Scalable fusion (to match objects, resolve data conflicts) without n x n pairwise administration
 Pragmatic
- Acquisition guidance, e.g., metrics on flexibility (what should be in each acquisition contract?)
- Combine techniques for learning metadata?
- No more discovery heuristics!
 Automate physical DBA work (caching, optimization)
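One standard way to avoid n x n pairwise administration for attribute mappings is a hub: each source maps once to a shared intermediary vocabulary, and pairwise translations are composed through it. The sketch covers only that ingredient, not object matching or conflict resolution; `TO_HUB`, the hub terms, and `translate` are hypothetical.

```python
# Hypothetical per-source mappings into one shared hub vocabulary:
# n sources need n mappings, not n*(n-1) pairwise ones.
TO_HUB = {
    "supply":      {"PartNo": "part_id", "Qty": "quantity"},
    "repair":      {"Part#": "part_id", "Count": "quantity"},
    "procurement": {"ItemId": "part_id"},
}

def translate(record, src, dst, to_hub=TO_HUB):
    """Translate a record from src's terms to dst's terms via the hub.
    Attributes with no hub mapping on either side are dropped and reported."""
    from_hub = {hub: local for local, hub in to_hub[dst].items()}
    out, unmapped = {}, []
    for local, value in record.items():
        hub = to_hub[src].get(local)
        if hub in from_hub:
            out[from_hub[hub]] = value
        else:
            unmapped.append(local)
    return out, unmapped

print(translate({"PartNo": "P-17", "Qty": 4}, "supply", "repair"))
print(translate({"PartNo": "P-17", "Qty": 4}, "supply", "procurement"))
```

Adding a fourth source means writing one new mapping, and the unmapped list makes each remaining shortfall explicit.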