12. Models of Business Information [2] (35)

DE + IA (INFO 243) - 3 March 2008
Bob Glushko
1 of 35
Plan for Today's Class
"Operation Clean Data" case studies
Authority Control
Data Warehouses
"Interoperability Costs in Auto Supply Chain" case study
Hub languages
The Universal Business Language
2 of 35
But First... Schedule for Assignments
Assignment 3, Business patterns (assigned today 3/3, due 3/12)
Assignment 4, Requirements and Source Inventory (assigned 3/12, due 3/24)
Assignment 5, Process Analysis (assigned 3/31, due 4/9)
Assignment 6, Document Analysis (assigned 4/16, due 4/23)
3 of 35
We're Both "Shipping Containers"
"The expense of resolving ambiguous business terms over and over on
a daily basis pales in comparison with the expense of NOT realizing
there is an ambiguity in the term" (Farish)
4 of 35
Controlled Vocabularies
The words people use to describe things or concepts are "embodied" in
their context and experiences... so they are often different or even "bad"
with respect to the words used by others
These naturally-occurring words are an "uncontrolled vocabulary"
Information retrieval or other processes with uncontrolled vocabularies
are often ineffective and error-prone
A controlled vocabulary is an artificial language, created by:
1. Choosing an authoritative form of a term, name or identifier
2. Ensuring that the term is distinctive
3. Mapping all the variant forms to the authoritative one
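The three steps above can be sketched as a lookup table. This is a minimal illustration, assuming a hypothetical authority file for units of measure (the terms and the `build_lookup`/`normalize` names are invented for the example). The dictionary keys are the authoritative forms (step 1), a variant that would point at two different terms is rejected as not distinctive (step 2), and every variant maps to its authoritative form (step 3).

```python
from typing import Dict, List, Optional

# Hypothetical authority file for units of measure: preferred term -> known variants.
UNITS: Dict[str, List[str]] = {
    "kilogram": ["kg", "kgs", "kilo", "kilogramme"],
    "each": ["ea", "ea.", "unit", "pc"],
}

def build_lookup(authority: Dict[str, List[str]]) -> Dict[str, str]:
    """Step 3: map every variant (and the preferred form itself) to the preferred form."""
    lookup: Dict[str, str] = {}
    for preferred, variants in authority.items():
        for form in [preferred, *variants]:
            key = form.strip().casefold()
            # Step 2: a variant that already points at a different term is not distinctive.
            if key in lookup and lookup[key] != preferred:
                raise ValueError(f"{form!r} maps to both {lookup[key]!r} and {preferred!r}")
            lookup[key] = preferred
    return lookup

def normalize(term: str, lookup: Dict[str, str]) -> Optional[str]:
    """Return the authoritative form, or None if the term is outside the vocabulary."""
    return lookup.get(term.strip().casefold())

LOOKUP = build_lookup(UNITS)
print(normalize(" KGS", LOOKUP))  # kilogram
print(normalize("box", LOOKUP))   # None
```

Returning None for unknown terms (rather than guessing) leaves the decision to a person who can either add a variant or reject the input.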
5 of 35
"Operation Clean Data" -- British Military Case
What were the symptoms or implications of "dirty" data in the British
army's supply chains?
What were the primary causes of this "dirty" data?
Which data items were the focus of the data cleanup effort? Why?
What technologies or tools were used in the data cleanup effort?
6 of 35
"Operation Clean Data" -- Carlson Wagonlit Case
What were the symptoms or implications of "dirty" data for the Carlson
Wagonlit travel agency?
What were the primary causes of this "dirty" data?
How is Carlson Wagonlit improving its data quality?
7 of 35
"Operation Clean Data" -- Cendant Case
What were the symptoms or implications of "dirty" data for Cendant?
What were the primary causes of this "dirty" data?
How is Cendant improving its data quality?
8 of 35
Normative Name Forms
When names appear in multiple forms, one form needs to be chosen
using criteria that include:
Fullness (e.g., full names vs. initials only)
Language of the name
Spelling (choose predominant form)
Entry element
"Smith, John" not "John Smith"
"Mao Zedong" or "Zedong, Mao" or "Mao Tse-tung" or ?
9 of 35
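Two of the criteria on the Normative Name Forms slide, spelling and entry element, can be sketched in a few lines. This is an illustrative sketch only; the helper names are hypothetical, and the inversion rule naively assumes the family name is the last token, which is exactly why "Mao Zedong" needs special handling.

```python
from collections import Counter
from typing import List

def predominant_form(observed: List[str]) -> str:
    """Spelling criterion: pick the most frequent form; ties go to the form seen first."""
    return Counter(observed).most_common(1)[0][0]

def entry_form(name: str, family_name_first: bool = False) -> str:
    """Entry-element criterion: put the family name first, e.g. 'John Smith' -> 'Smith, John'.

    The naive rule assumes the family name is the last token. Names already written
    family-name-first (such as 'Mao Zedong') must be flagged, or they are wrongly
    inverted to 'Zedong, Mao'.
    """
    parts = name.split()
    if family_name_first or len(parts) < 2:
        return name
    return f"{parts[-1]}, {' '.join(parts[:-1])}"

observed = ["John Smith", "J. Smith", "John Smith", "Smith J"]
print(entry_form(predominant_form(observed)))            # Smith, John
print(entry_form("Mao Zedong"))                          # Zedong, Mao (wrong)
print(entry_form("Mao Zedong", family_name_first=True))  # Mao Zedong
```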
Authority Control for Places
Variant forms: St. Petersburg, Санкт-Петербург, Saint-Pétersbourg
Multiple names: Cluj, in Romania / Roumania / Rumania, is also called
Klausenburg and Kolozsvar
Name changes: Bombay -> Mumbai
Homographs: Vienna, VA, and Vienna, Austria; 50 Springfields
Anachronisms: No Germany (as a unified state) before 1871
Vague names, e.g., Midwest, Silicon Valley
Unstable boundaries: 19th century Poland; Balkans; USSR
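A place authority file has to handle variants, multiple names, name changes and homographs at once. The sketch below is a hypothetical, much-simplified authority file built from the examples on this slide. Each record has a qualified heading, so the two Viennas stay distinct, and lookup returns every candidate rather than silently picking one.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class Place:
    heading: str               # authoritative heading, qualified so homographs stay distinct
    variants: Tuple[str, ...]  # other names and spellings that refer to this place

# Hypothetical authority records built from the examples above.
PLACES = [
    Place("Vienna (Austria)", ("Vienna", "Wien")),
    Place("Vienna (Virginia)", ("Vienna", "Vienna, VA")),
    Place("Mumbai (India)", ("Mumbai", "Bombay")),
    Place("Cluj (Romania)", ("Cluj", "Klausenburg", "Kolozsvár")),
]

def build_index(places: List[Place]) -> Dict[str, List[str]]:
    """Index every heading and variant (case-insensitively) to the headings it may denote."""
    index: Dict[str, List[str]] = {}
    for place in places:
        for form in (place.heading, *place.variants):
            index.setdefault(form.casefold(), []).append(place.heading)
    return index

def resolve(name: str, index: Dict[str, List[str]]) -> List[str]:
    """Return every candidate heading; more than one means a homograph that needs context."""
    return index.get(name.strip().casefold(), [])

INDEX = build_index(PLACES)
print(resolve("Bombay", INDEX))  # ['Mumbai (India)']
print(resolve("Vienna", INDEX))  # ['Vienna (Austria)', 'Vienna (Virginia)']
```

Anachronisms, vague regions and unstable boundaries would need time periods and geometry on each record, which this sketch deliberately leaves out.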
10 of 35
"Operation Clean Data" -- US govt agencies
How are the US Census Bureau and CDC improving data quality?
How do these processes differ for printed and electronic surveys/forms?
11 of 35
Some General Questions about Data Quality
Are the data quality problems primarily technology ones or
process/management ones?
Why are "homonyms" worse than "synonyms" in a set of item
identifiers?
Does data have to be perfectly clean? Can it ever be?
How can your own actions contribute to data quality problems or to their
resolution?
12 of 35
Principles and Processes for Quality Information
Prioritize the data items
Involve the data owners
Keep future data clean (enough)
Find the data owners and the "headwaters"
Validate at the time of capture or creation
Set realistic goals for data quality
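"Validate at the time of capture or creation" can be illustrated with a small gatekeeper function. This is a sketch under stated assumptions: the record fields, the rules and the unit vocabulary are all hypothetical, and a real system would take its rules from the data owners.

```python
from dataclasses import dataclass
from typing import List

ALLOWED_UNITS = {"each", "kilogram", "litre"}  # hypothetical controlled vocabulary

@dataclass
class ItemRecord:
    item_id: str
    description: str
    unit: str
    quantity: int

def validate(record: ItemRecord) -> List[str]:
    """Return a list of problems; an empty list means the record may be stored."""
    errors: List[str] = []
    if not record.item_id.strip():
        errors.append("item_id is required")
    if not record.description.strip():
        errors.append("description is required")
    if record.unit not in ALLOWED_UNITS:
        errors.append(f"unit {record.unit!r} is not in the controlled vocabulary")
    if record.quantity <= 0:
        errors.append("quantity must be positive")
    return errors

def capture(record: ItemRecord, store: List[ItemRecord]) -> None:
    """Reject bad data at the point of entry instead of cleaning it downstream."""
    errors = validate(record)
    if errors:
        raise ValueError("; ".join(errors))
    store.append(record)

store: List[ItemRecord] = []
capture(ItemRecord("A-100", "hex bolt", "each", 10), store)
print(validate(ItemRecord("", "hex bolt", "box", 0)))
```

Reporting all problems at once, rather than stopping at the first, lets the person entering the data fix everything in one pass.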
13 of 35
Data Warehouses
A data warehouse is a "subject-oriented, integrated, time-varying,
non-volatile collection of data used in organizational decision making"
Data warehouses extract data from ERP systems or other transactional
applications into a separate repository
It is common practice to "stage" data prior to merging it into a data
warehouse with an "Extract, Transform, and Load" (ETL) application
The data model for the warehouse, designed to enable efficient ad hoc
data analysis and reporting, is sometimes called a "hypercube"
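The extract / stage / transform / load flow and the "hypercube" idea can be sketched with plain Python lists. Everything here is hypothetical (two invented sources with different date and region conventions). `rollup` sums a measure over any chosen subset of dimensions, which is the kind of ad hoc aggregate a cube-oriented warehouse model is designed to answer quickly.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, str, str, int]  # (date, region, product, qty) in the warehouse's common model
DIMENSIONS = ("date", "region", "product")

# Hypothetical rows from two transactional systems with different conventions.
ERP_ROWS = [
    {"date": "2008-03-03", "region": "west", "product": "P1", "qty": "5"},
    {"date": "2008-03-03", "region": "WEST", "product": "P2", "qty": "3"},
]
POS_ROWS = [{"day": "03/03/2008", "area": "West ", "sku": "P1", "units": 2}]

def extract() -> List[Tuple[str, dict]]:
    """Extract: copy source rows into a staging list, tagged with their origin."""
    return [("erp", r) for r in ERP_ROWS] + [("pos", r) for r in POS_ROWS]

def transform(staged: List[Tuple[str, dict]]) -> List[Row]:
    """Transform: reconcile each source's formats into the common model."""
    rows: List[Row] = []
    for source, r in staged:
        if source == "erp":
            date, region, product, qty = r["date"], r["region"], r["product"], int(r["qty"])
        else:  # POS dates are MM/DD/YYYY
            month, day, year = r["day"].split("/")
            date, region, product, qty = f"{year}-{month}-{day}", r["area"], r["sku"], int(r["units"])
        rows.append((date, region.strip().lower(), product, qty))
    return rows

def load(rows: List[Row], warehouse: List[Row]) -> None:
    """Load: append the cleaned rows to the (non-volatile) warehouse."""
    warehouse.extend(rows)

def rollup(warehouse: List[Row], dims: Tuple[str, ...]) -> Dict[Tuple[str, ...], int]:
    """Sum quantities over the chosen dimensions: one aggregate view of the cube."""
    totals: Dict[Tuple[str, ...], int] = defaultdict(int)
    for row in warehouse:
        key = tuple(row[DIMENSIONS.index(d)] for d in dims)
        totals[key] += row[3]
    return dict(totals)

warehouse: List[Row] = []
load(transform(extract()), warehouse)
print(rollup(warehouse, ("region",)))   # {('west',): 10}
print(rollup(warehouse, ("product",)))  # {('P1',): 7, ('P2',): 3}
```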
14 of 35
Generic Enterprise Information Integration Architecture with Warehouse (Gantz, XML 2004)
15 of 35
ETL vs ELT
The traditional ETL (Extract-Transform-Load) approach relies on
proprietary ETL engines being deployed between sources and targets.
Relational databases are rapidly eliminating the ETL category by
incorporating transformation functionality
So ETL is becoming ELT (Extract-Load-Transform), with all the
complex processing of data occurring inside the database
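For contrast with the ETL sketch, here is the ELT ordering using Python's built-in `sqlite3` module as a stand-in for the target database. The table and column names are hypothetical. Raw rows are loaded untouched, and the transformation (date reformatting, region cleanup, type casting) then runs as SQL inside the database engine.

```python
import sqlite3
from typing import List, Tuple

def elt(conn: sqlite3.Connection, raw_rows: List[Tuple[str, str, str, str]]) -> None:
    # Extract + Load: source rows go into the database as-is, still in source formats.
    conn.execute("CREATE TABLE raw_sales (day TEXT, area TEXT, sku TEXT, units TEXT)")
    conn.executemany("INSERT INTO raw_sales VALUES (?, ?, ?, ?)", raw_rows)
    # Transform: done afterwards by the database engine itself, in SQL.
    # 'day' is MM/DD/YYYY; SQLite's substr() is 1-indexed.
    conn.execute("""
        CREATE TABLE sales AS
        SELECT substr(day, 7, 4) || '-' || substr(day, 1, 2) || '-' || substr(day, 4, 2) AS date,
               lower(trim(area)) AS region,
               sku AS product,
               CAST(units AS INTEGER) AS qty
        FROM raw_sales
    """)
    conn.commit()

conn = sqlite3.connect(":memory:")
elt(conn, [("03/05/2008", " West", "P1", "5"), ("03/05/2008", "WEST", "P2", "2")])
print(conn.execute("SELECT date, region, SUM(qty) FROM sales GROUP BY date, region").fetchall())
```

Keeping `raw_sales` around is a side benefit of ELT: if a transformation rule turns out to be wrong, it can be rerun in the database without going back to the source systems.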
16 of 35
The Virtual Warehouse
A virtual warehouse is created "on demand" by centralizing and
normalizing metadata about the data sources rather than the data itself.
The data is left in its original location and extracted only when needed,
which makes more "real time" analysis possible
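A virtual warehouse can be sketched as a metadata catalog plus on-demand fetches. In this hypothetical example, the only centralized, normalized thing is `CATALOG`, which records which source holds each entity and how its field names map to common ones. The source functions stand in for live ERP and CRM queries and are invented for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical live sources; in practice these would query the ERP and CRM systems directly.
def erp_orders() -> List[dict]:
    return [{"cust": "C1", "amt": 100}, {"cust": "C2", "amt": 50}, {"cust": "C1", "amt": 25}]

def crm_customers() -> List[dict]:
    return [{"id": "C1", "region": "west"}, {"id": "C2", "region": "east"}]

# What is centralized and normalized is metadata: where each entity lives and
# how its source fields map to common names. No data is copied here.
CATALOG: Dict[str, dict] = {
    "orders": {"fetch": erp_orders, "fields": {"customer_id": "cust", "amount": "amt"}},
    "customers": {"fetch": crm_customers, "fields": {"customer_id": "id", "region": "region"}},
}

def query(entity: str) -> List[dict]:
    """Extract an entity from its source only when asked, renaming fields per the catalog."""
    meta = CATALOG[entity]
    fetch: Callable[[], List[dict]] = meta["fetch"]
    return [{common: row[src] for common, src in meta["fields"].items()} for row in fetch()]

def sales_by_region() -> Dict[str, int]:
    region_of = {c["customer_id"]: c["region"] for c in query("customers")}
    totals: Dict[str, int] = {}
    for order in query("orders"):
        region = region_of[order["customer_id"]]
        totals[region] = totals.get(region, 0) + order["amount"]
    return totals

print(sales_by_region())  # {'west': 125, 'east': 50}
```

The trade-off versus a physical warehouse: every query pays the cost of reaching the live sources, in exchange for seeing their current state.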
17 of 35
Virtual Warehouse Via Metadata Repository (Gantz, XML 2004)
18 of 35
"Interoperability Costs in the US Auto Supply Chain"
Excellent case study about how a concurrent engineering business
model escalates the information exchanges and interoperability
problems in the "ecosystem"
Analyzes various alternatives for data transfer, and finds that the
choices made are not the optimal ones
Concepts and lessons apply to other industries with "data
exchange-intensive" supply chains
19 of 35
Alternatives for Data Transfer Between Two Systems
Manual re-entry
Everyone has to learn to "speak" all the languages
Native format transfer
Point-to-point translation
Everyone has to learn just one new language but it has to be the same
one
Dominant players impose their language on their ecosystem
Multiple vocabularies exist, but there is at least one "interchange" or "hub"
language designed to facilitate translations between "native" vocabularies
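The arithmetic behind "learn all the languages" versus "learn one hub language" is easy to make concrete. A minimal count, assuming each of n systems or formats must exchange data with every other one in both directions: point-to-point translation needs a one-way translator for every ordered pair, while a hub needs one mapping into and one out of the hub per format.

```python
def point_to_point_translators(n: int) -> int:
    """One one-way translator for every ordered pair of distinct formats: n(n - 1)."""
    return n * (n - 1)

def hub_translators(n: int) -> int:
    """One mapping into the hub and one out of it for each format: 2n."""
    return 2 * n

for n in (2, 3, 5, 10):
    print(n, point_to_point_translators(n), hub_translators(n))
# 2 2 4
# 3 6 6
# 5 20 10
# 10 90 20
```

The two counts are equal at n = 3, so the hub only saves work once more than three formats must interoperate; after that the gap grows quadratically.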
20 of 35
CAD / CAM Systems Proliferation
21 of 35
Juran's "Quality Costs" Framework
Joseph Juran's "Quality Control Handbook" (1951) -- a "cost of quality"
framework for determining how much to spend on quality at each point in the
"quality system"
The costs of preventing and finding quality problems (avoidance) ...
Prevention costs (design reviews, training, guidelines, knowledge...)
Appraisal costs (tests, process control measurements, reports,
evaluations,...)
... must be balanced against the costs associated with those quality
problems (mitigation):
Internal failure costs (costs incurred before the product or service is
delivered: scrap, rework, lost time, unused capacity, ...)
External failure costs (costs incurred when quality problems reach customers:
returns, recalls, complaints, field services, warranty repairs, liability
lawsuits,...)
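Juran's framework reduces to summing four cost categories and comparing strategies by their total. The sketch below uses entirely hypothetical figures (they are not from the auto supply chain study) to show how spending more on avoidance can lower the overall cost of quality.

```python
from dataclasses import dataclass

@dataclass
class QualityCosts:
    prevention: float        # design reviews, training, guidelines
    appraisal: float         # tests, measurements, evaluations
    internal_failure: float  # scrap, rework, lost time before delivery
    external_failure: float  # returns, recalls, warranty repairs after delivery

    def avoidance(self) -> float:
        return self.prevention + self.appraisal

    def mitigation(self) -> float:
        return self.internal_failure + self.external_failure

    def total(self) -> float:
        return self.avoidance() + self.mitigation()

# Hypothetical figures for two spending strategies (not from the case study).
strategies = {
    "minimal avoidance": QualityCosts(prevention=10, appraisal=5, internal_failure=40, external_failure=80),
    "invest in avoidance": QualityCosts(prevention=30, appraisal=15, internal_failure=20, external_failure=25),
}
best = min(strategies, key=lambda name: strategies[name].total())
print(best)  # invest in avoidance
```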
22 of 35
The Case for Investing in Avoidance
23 of 35
Interoperability Avoidance Costs
24 of 35
Interoperability Mitigation and Delay Costs
25 of 35
Estimated Interoperability Costs
26 of 35
An Interchange or Hub Language
27 of 35
Hub Languages for e-Business
(early 1990s) - Ad hoc efforts in EDIFACT to "harmonize" core
components across verticals
1997 - XML Common Business Library (xCBL): the first horizontal XML
vocabulary; incorporated EDIFACT semantics and code lists
1999 - ebXML: initiative of UN/CEFACT and OASIS to develop
syntax-neutral "core components"
2001 - Universal Business Language: effort begins, building on xCBL and
ebXML Core Components
28 of 35
Universal Business Language
DOCUMENT ARCHITECTURE: A generic XML interchange format for
business documents that can be extended to meet the requirements of
particular industries
CORE COMPONENTS: A library of XML schemas for reusable data
components such as "Address," "Item," and "Payment" -- the common
data elements of everyday business documents
STANDARD DOCUMENTS: A small set of XML schemas for common
business documents such as "Order," "Despatch Advice," and "Invoice"
that are constructed from the UBL library components and can be used
in a generic order-to-invoice trading context
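The relationship between core components and standard documents can be illustrated by building two documents from the same reusable pieces. This is a deliberately simplified sketch using Python's `xml.etree.ElementTree`. The element names are loosely modeled on UBL naming but carry no namespaces, and the output is not valid against the actual UBL schemas. The party data is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

Address = Tuple[str, str, str]  # (street, city, postal zone)
PartyInfo = Tuple[str, Address]  # (name, address)

def add_postal_address(parent: ET.Element, address: Address) -> ET.Element:
    """Reusable 'Address' component: identical structure in every document that uses it."""
    street, city, zone = address
    addr = ET.SubElement(parent, "PostalAddress")
    ET.SubElement(addr, "StreetName").text = street
    ET.SubElement(addr, "CityName").text = city
    ET.SubElement(addr, "PostalZone").text = zone
    return addr

def add_party(parent: ET.Element, role: str, name: str, address: Address) -> ET.Element:
    """Reusable 'Party' component, itself built from the Address component."""
    party = ET.SubElement(parent, role)
    ET.SubElement(party, "PartyName").text = name
    add_postal_address(party, address)
    return party

def make_order(order_id: str, buyer: PartyInfo, seller: PartyInfo) -> ET.Element:
    doc = ET.Element("Order")
    ET.SubElement(doc, "ID").text = order_id
    add_party(doc, "BuyerParty", *buyer)
    add_party(doc, "SellerParty", *seller)
    return doc

def make_invoice(invoice_id: str, order_id: str, buyer: PartyInfo, seller: PartyInfo) -> ET.Element:
    doc = ET.Element("Invoice")
    ET.SubElement(doc, "ID").text = invoice_id
    ET.SubElement(doc, "OrderReference").text = order_id  # simplified reference to the order
    add_party(doc, "BuyerParty", *buyer)
    add_party(doc, "SellerParty", *seller)
    return doc

buyer = ("Example Buyer Inc.", ("1 Main St", "Berkeley", "94720"))
seller = ("Example Seller Ltd.", ("2 High St", "Oxford", "OX1 1AA"))
order = make_order("PO-1", buyer, seller)
invoice = make_invoice("INV-1", "PO-1", buyer, seller)
print(ET.tostring(invoice.find("BuyerParty"), encoding="unicode"))
```

Because both documents call the same component builders, a party's address has the same structure in the Order and the Invoice, which is what lets software written for one document reuse its handling of the shared components.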
29 of 35
UBL 1.0 Document / Process Scope
30 of 35
How a Hub Language Increases the XML Advantage over EDI
31 of 35
How a Hub Language Shortens the Time to the XML Payoff
32 of 35
Document Exchange Context with UBL
33 of 35
Mapping in and out of Hub Language
If all parties/applications/services rely on a hub language for their
external interfaces, a quadratic interoperability challenge (a translator for
every pair of formats) becomes a linear one (one mapping into and one out of
the hub per format)
Mapping tools for transforming instances from one information model to
another (e.g., from an internal model to the hub language) are ubiquitous,
both as standalone tools and as parts of application servers
EXAMPLE: Altova MapForce
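Mapping through a hub can be sketched by hand, which shows the shape of what a mapping tool produces. In this hypothetical example, a plain dictionary stands in for a hub-language document such as a UBL Order, and partners A and B have invented internal field names. Each party registers exactly two mappings, and any-to-any translation always passes through the hub.

```python
from typing import Callable, Dict

# Hypothetical internal order formats of two trading partners, A and B.
def a_to_hub(rec: dict) -> dict:
    return {"order_id": rec["PONum"], "buyer": rec["Customer"], "quantity": int(rec["Qty"])}

def hub_to_a(hub: dict) -> dict:
    return {"PONum": hub["order_id"], "Customer": hub["buyer"], "Qty": str(hub["quantity"])}

def b_to_hub(rec: dict) -> dict:
    return {"order_id": rec["orderNo"], "buyer": rec["buyerName"], "quantity": rec["units"]}

def hub_to_b(hub: dict) -> dict:
    return {"orderNo": hub["order_id"], "buyerName": hub["buyer"], "units": hub["quantity"]}

# Each party registers exactly two mappings: into the hub and out of it.
TO_HUB: Dict[str, Callable[[dict], dict]] = {"A": a_to_hub, "B": b_to_hub}
FROM_HUB: Dict[str, Callable[[dict], dict]] = {"A": hub_to_a, "B": hub_to_b}

def translate(rec: dict, source: str, target: str) -> dict:
    """Any-to-any translation always goes source -> hub -> target."""
    return FROM_HUB[target](TO_HUB[source](rec))

a_order = {"PONum": "PO-7", "Customer": "Acme", "Qty": "12"}
b_order = translate(a_order, "A", "B")
print(b_order)  # {'orderNo': 'PO-7', 'buyerName': 'Acme', 'units': 12}
```

Adding a third partner means writing two more functions and two more registry entries, regardless of how many partners already exist.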
34 of 35
For Wednesday March 5
Chapter 5 of Document Engineering
"E-Government Architecture in Ireland" Sean McGrath and Fergal
Murray, XML 2004 Conference
"Mobile Telemedicine System for Home Care and Patient Monitoring"
M. V. M. Figueredo and J. S. Dias, Proceedings of the 26th Annual
Conference of the IEEE EMBS (September 2004)
"Redefining the Patient Record Paradigm" MedicAlert Foundation,
Whitepaper (January 2005)
35 of 35