DAMA New York
The Impact of Data Governance and Data Quality on MDM
2013-01-17
Peter R. Benson
Project leader for ISO 22745 and ISO 8000
Copyright © 2013 ECCMA
All rights reserved
Information is Power
Data is Truth
Differentiating Information and Data
data: fixed form into which information is transformed so that it can be
stored or moved.
Peter R. Benson
Figure: information is transformed into data, and data back into information.
Copyright covers only the “fixed form” (you can copyright data, not information).
Data quality is an essential component of quality information, but not the only component.
Quality information must be based on quality data, but quality data does not necessarily yield quality information.
Quality data may not be timely or relevant; these are characteristics of information quality, not data quality.
Copyright © 2012 by Peter R. Benson
Data Maturity Model
Global Data Excellence
Maximize the Business Value of Enterprise Data
Part 1 - Introduction to data quality
Defining Quality
Portable data
Standards and contracts for quality data
What does “Quality” Mean?
When you order seafood from
Quality Fresh Seafood, you can be
confident that you are receiving the
very best quality of seafood and
delivery.
ISO 9000 Definition of “Quality”
3.1.1 quality: degree to which a set of inherent characteristics fulfils requirements (ISO 9000:2005(E))
ISO 9001 Quality Management System Requirements (ISO 9001:2000(E))
Requirements Define Quality
3.1.2 requirement: need or expectation that is stated, generally implied or obligatory (ISO 9000:2005(E))
Quality is about meeting requirements.
Quality data is data that meets stated requirements, nothing more and nothing less!
(Data that exceeds stated requirements does not increase the quality of the data.)
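The idea that quality means meeting a stated requirement, no more and no less, can be made concrete with a small check. This is a minimal sketch under a loudly simplifying assumption: a "data requirement" is treated as nothing but a list of property names that must have non-empty values. The function names and the requirement itself are hypothetical; real ISO 22745-30 data requirements carry far more.

```python
# Minimal sketch. ASSUMPTION: a data requirement is just a list of required
# property names; real ISO 22745-30 data requirements are much richer.
from typing import Dict, List


def missing_properties(record: Dict[str, str], required: List[str]) -> List[str]:
    """Return the required properties that are absent or have an empty value."""
    return [prop for prop in required if not record.get(prop, "").strip()]


def meets_requirement(record: Dict[str, str], required: List[str]) -> bool:
    """A record meets the requirement when no required property is missing."""
    return not missing_properties(record, required)


# Hypothetical requirement, using property names that appear in this deck.
requirement = ["CLASS NAME", "NOMINAL THREAD DIAMETER", "HEAD HEIGHT"]

complete = {
    "CLASS NAME": "BOLT",
    "NOMINAL THREAD DIAMETER": "1.0 INCH",
    "HEAD HEIGHT": "0.591 INCHES",
}
# A property the requirement does not ask for changes nothing.
over_complete = {**complete, "WIDTH ACROSS FLATS": "1.450 INCHES"}
incomplete = {"CLASS NAME": "BOLT", "HEAD HEIGHT": "0.591 INCHES"}

for name, record in [("complete", complete), ("over_complete", over_complete),
                     ("incomplete", incomplete)]:
    print(name, meets_requirement(record, requirement),
          missing_properties(record, requirement))
```

The result is a yes/no against the stated requirement: the extra WIDTH ACROSS FLATS value neither raises nor lowers the outcome.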
These are data requirements
The quality of the data depends on how you ask for it
The Need for Semantic Encoding
In South Africa a traffic light is called a robot!
ECCMA Open Technical Dictionary (eOTD)
Just as with music notation and engineering symbols, the eOTD concept identifiers
are simply used to communicate more accurately in a language independent
environment.
Figure: music notation, engineering symbols and eOTD concept identifiers shown side by side.
A unique public domain identifier is assigned to a concept:
0161-1#01-089388#1   table
0161-1#01-086445#1   chair
0161-1#02-018635#1   weight
0161-1#02-005808#1   length
0161-1#07-277660#1   Monday
0161-1#05-001122#1   kilogram
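The identifiers above share a visible structure: a prefix (0161-1), a two-digit code, a six-digit item number and a trailing version. The sketch below splits an identifier into those parts. It is a hedged illustration, not an official ISO 22745 parser: the regular expression only covers the shape seen in these examples, and the labels for the two-digit codes (01 class, 02 property, 05 unit of measure, 07 value) are an assumption inferred solely from the examples on this slide.

```python
# Hedged sketch, not an official ISO 22745 parser.
import re
from dataclasses import dataclass

# ASSUMPTION: identifiers look like PREFIX#CC-NNNNNN#V, as in 0161-1#01-089388#1.
EOTD_ID = re.compile(
    r"^(?P<prefix>\d{4}-\d)#(?P<code>\d{2})-(?P<item>\d{6})#(?P<version>\d+)$"
)

# ASSUMPTION: labels inferred only from the examples above (table, chair,
# weight, length, Monday, kilogram); not taken from the standard.
CONCEPT_TYPES = {"01": "class", "02": "property", "05": "unit of measure", "07": "value"}


@dataclass(frozen=True)
class ConceptId:
    prefix: str   # e.g. "0161-1"
    code: str     # two-digit concept-type code, e.g. "01"
    item: str     # six-digit item number, e.g. "089388"
    version: int  # number after the final '#'

    @property
    def concept_type(self) -> str:
        return CONCEPT_TYPES.get(self.code, "unknown")


def parse_concept_id(text: str) -> ConceptId:
    """Split an eOTD-style identifier into its parts, or raise ValueError."""
    match = EOTD_ID.match(text)
    if match is None:
        raise ValueError(f"not an eOTD-style concept identifier: {text!r}")
    return ConceptId(match["prefix"], match["code"], match["item"], int(match["version"]))


for identifier in ["0161-1#01-089388#1", "0161-1#02-018635#1", "0161-1#05-001122#1"]:
    cid = parse_concept_id(identifier)
    print(identifier, cid.concept_type, cid.item, cid.version)
```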
Publicly Visible Terminology in a Standard Model
The eOTD (ECCMA Open Technical Dictionary) is an ISO 22745-20 compliant central
registry of terminology. Each concept and terminological component in the eOTD is
assigned a unique and permanent public domain identifier.
Users create corporate preferred subsets of the eOTD and use the eOTD concept
identifiers to manage concept equivalence mapping with the concepts used by their
trading partners.
Figure: ISO 22745 ECCMA Open Technical Dictionary (eOTD), in which the terminology (terms, abbreviations, definitions and images) is linked to a public domain concept identifier of the form 0161-1#xx-xxxxxx#1.
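Concept equivalence mapping works because two parties never need to agree on words, only on identifiers. The sketch below is a hedged illustration: the corporate and partner term lists are hypothetical (the partner's terms are French), and only the concept identifiers come from the slides. Each of our terms is mapped to the partner term that carries the same eOTD concept identifier; terms with no shared identifier are simply left unmapped.

```python
# Hedged sketch: term lists are hypothetical; identifiers come from the slides.
from typing import Dict

# Our corporate preferred subset: preferred term -> eOTD concept identifier.
corporate_terms: Dict[str, str] = {
    "TABLE": "0161-1#01-089388#1",
    "CHAIR": "0161-1#01-086445#1",
    "WEIGHT": "0161-1#02-018635#1",
    "LENGTH": "0161-1#02-005808#1",
}

# A trading partner's terms (here in French), keyed to the same identifiers.
partner_terms: Dict[str, str] = {
    "CHAISE": "0161-1#01-086445#1",
    "POIDS": "0161-1#02-018635#1",
    "LONGUEUR": "0161-1#02-005808#1",
}


def build_equivalence(ours: Dict[str, str], theirs: Dict[str, str]) -> Dict[str, str]:
    """Map each of our terms to the partner term sharing its concept identifier."""
    by_concept = {concept: term for term, concept in theirs.items()}
    return {
        term: by_concept[concept]
        for term, concept in ours.items()
        if concept in by_concept  # no shared concept means no mapping
    }


mapping = build_equivalence(corporate_terms, partner_terms)
print(mapping)
```

TABLE drops out of the mapping because the partner's subset has no term for that concept, which is exactly the gap a mapping exercise should surface.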
Publicly Visible Terminology in a Standard Model
Figure: industry, government and SDO terminology linked through a single public registry.
• Public domain concept identifiers
• Free identifier resolution to underlying terminology (web services)
• Hyperlink to source standards
• Multilingual
• Multiple terms, definitions and images linked to a single concept identifier
Building the Corporate Dictionary and Data Requirements
Figure: building the corporate dictionary and data requirements. Components shown: the corporate data requirements registry (ISO 22745-30) and the eDRR (ECCMA Data Requirements Registry); the corporate dictionary (ISO 22745-10) and the eOTD (ECCMA Open Technical Dictionary), including new eOTD concept or terminology registration; the corporate metadata registry (ISO/IEC 11179); corporate classifications; and spreadsheets, reports, forms and data models.
The property-value pair
Everything (individuals, organizations, locations, goods, services, processes, rules, transactions) can be described using property-value pairs.
A bolt:
CLASS NAME               BOLT
PRODUCT NUMBER           3225020037
NOMINAL THREAD DIAMETER  1.0 INCH
WIDTH ACROSS FLATS       1.450 INCHES
WIDTH ACROSS CORNERS     1.653 INCHES
HEAD HEIGHT              0.591 INCHES
COUNT PER PACK           10
PACK PRICE               0.80 DOLLARS

A card transaction:
DATE                     2012-08-12
CREDIT CARD NUMBER       3225020037
MERCHANT NUMBER          15689345
AMOUNT                   156.45 DOLLARS

A statute:
STATE                    PA
TITLE                    75
CHAPTER                  33
PARAGRAPH                3362
NAME                     Maximum speed limits

A person:
FIRST NAME               JOE
LAST NAME                SMITH
TITLE                    MANAGER
EMAIL                    JOE.SMITH@COMPANY.COM
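Because every example above reduces to the same shape, one small data structure can hold all of them. This is a minimal sketch, assuming an item is an ordered list of (property, value) string pairs; the helper names are hypothetical. A list of pairs rather than a dictionary keeps the original order and allows a property to repeat.

```python
# Minimal sketch: an item is an ordered list of (property, value) pairs.
from typing import List, Tuple

PropertyValue = Tuple[str, str]

bolt: List[PropertyValue] = [
    ("CLASS NAME", "BOLT"),
    ("NOMINAL THREAD DIAMETER", "1.0 INCH"),
    ("HEAD HEIGHT", "0.591 INCHES"),
]
person: List[PropertyValue] = [
    ("FIRST NAME", "JOE"),
    ("LAST NAME", "SMITH"),
    ("TITLE", "MANAGER"),
]


def render(item: List[PropertyValue]) -> str:
    """Render an item as one 'PROPERTY: VALUE' line per pair."""
    return "\n".join(f"{prop}: {value}" for prop, value in item)


def value_of(item: List[PropertyValue], prop: str) -> str:
    """Return the value of the first pair with the given property name."""
    for name, value in item:
        if name == prop:
            return value
    raise KeyError(prop)


print(render(bolt))
print(render(person))
```

Note that TITLE names a statute number in one example and a job title in another. Plain property names are ambiguous in this way, which is one reason the deck pairs them with concept identifiers.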
Taxonomy of Data
Figure: taxonomy of data. Data divides into metadata, transaction data and master data. Master data divides into identification data, descriptive data and classification data. Descriptive data covers physical characteristics, performance characteristics and data exchange characteristics.
ISO 8000 Family of Standards
ISO 8000 general principles
• Part 1 Introduction
• Part 2 Terminology
Master data (as distinct from transaction data)
• Part 100 Introduction
• Part 110 Syntax, semantic encoding and meeting requirements
• Part 120 Provenance
• Part 130 Accuracy
• Part 140 Completeness
Intrinsic and Extrinsic Data Quality
“Mass is an intrinsic property of any physical object, weight is an extrinsic property
that varies depending on the strength of the gravitational field in which the
respective object is placed.”
Intrinsic (internal) data quality characteristics
Characteristics that you can evaluate by looking at the data itself (an expectation that is generally implied):
• Is the data complete: what percentage of cells is empty?
• Is the data consistent: are the values formatted consistently?
Extrinsic (external) data quality characteristics
Characteristics that can only be evaluated against something outside the data. ISO 8000 quality data:
• Is the syntax referenced, and is the data compliant with the syntax?
• Is the semantic encoding referenced, and is the data encoded?
• Is the data requirement referenced, and does the data meet the requirement?
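The two intrinsic checks need nothing but the data itself, unlike the ISO 8000 questions, which need a referenced syntax, dictionary or requirement. The sketch below computes both for a small table of rows. It is a hedged illustration: the rows are hypothetical, "consistent formatting" is assumed to mean "matches one regular expression", and an empty column is treated as fully consistent.

```python
# Hedged sketch of two intrinsic checks; sample rows are hypothetical.
import re
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def percent_empty(rows: List[Row], columns: List[str]) -> float:
    """Completeness: percentage of empty cells across the given columns."""
    cells = [row.get(col) for row in rows for col in columns]
    empty = sum(1 for cell in cells if cell is None or not cell.strip())
    return 100.0 * empty / len(cells) if cells else 0.0


def percent_consistent(rows: List[Row], column: str, pattern: str) -> float:
    """Consistency: percentage of non-empty values that match one format."""
    regex = re.compile(pattern)
    values = [row.get(column) or "" for row in rows]
    values = [value for value in values if value.strip()]
    if not values:
        return 100.0  # nothing present, so nothing inconsistent
    matching = sum(1 for value in values if regex.fullmatch(value))
    return 100.0 * matching / len(values)


ISO_DATE = r"\d{4}-\d{2}-\d{2}"  # ASSUMPTION: the expected date format

rows: List[Row] = [
    {"DATE": "2012-08-12", "AMOUNT": "156.45"},
    {"DATE": "08/12/2012", "AMOUNT": ""},
    {"DATE": "2012-08-13", "AMOUNT": None},
    {"DATE": None, "AMOUNT": "12.00"},
]

print(percent_empty(rows, ["DATE", "AMOUNT"]))
print(percent_consistent(rows, "DATE", ISO_DATE))
```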
ISO 8000 Quality Data
Data that meets Requirements
Data that is Portable
(syntax and semantic encoding)
Portable Data
If we believe that:
• the software we will be using tomorrow will be different from
what we are using today, and
• we will need access to our data … forever, and
• data that cannot be separated from licensed software is also
licensed data
then
We should ensure that our corporate data is
portable, i.e., independent of any licensed
software application.
ISO 8000 quality data is portable data!
Vision of the Future – The Service Provider’s View
Figure: a customer using several applications, each of which stores its own copy of the customer data.
Storing customer data within the application creates “customer lock-in.”
Vision of the Future – The Customer’s View
Figure: the customer at the center, holding ISO 8000 data that any application can use.
Data standards are the antidote to application “lock-in.”
Operating Systems as a Solution to Hardware “Lock-in”
Figure: a computer system as layers of user, application, operating system and hardware.
An operating system is the infrastructure software
component of a computer system responsible for
sharing the limited resources of the computer. The
operating system acts as a host for applications that are
run on the machine.
1969 - UNIX (we also landed on the moon!)
1974 - CP/M
1981 - MS-DOS
1985 - Windows 1.0
1995 - Microsoft Bob
2000 - Windows ME
2001 - Windows XP
2007 - Windows Vista
2008 - Android
2009 - Windows 7
ISO Data Standards are the Antidote to Application “Lock-in”
Figure: the user, application, operating system and hardware stack, with ISO 22745 and ISO 8000 portable data shared by independent applications such as a data quality analysis app, a data de-dup app and a data validation app.
Quality data is portable data; it is
independent of the software application
and accessible by any application.
Portable Data
data that cannot be separated from licensed software is also
licensed data
Figure: images and their metadata (source: Royalty Free Photos).
Proprietary metadata and proprietary identifiers are subject to copyright.
We can expect to see copyright
rigorously enforced in the years ahead
ISO 8000 Quality Data is Portable Data that Meets Requirements
Motivation for ISO 22745 and ISO 8000
quality data saves money
“From a logistics information
perspective…
an F-15 is just 171,000 parts flying in very
close formation.”
Cataloging and Standardization Act, Public Law 82-436, as codified in United States Code, Title 10, Chapter 145 (Cataloging and Standardization), Sec. 2451:
(a) The Secretary of Defense shall develop a single catalog system and related program of
standardizing supplies for the Department of Defense.
(b) In cataloging, the Secretary shall name, describe, classify, and number each item
recurrently used, bought, stocked, or distributed by the Department of Defense, so that
only one distinctive combination of letters or numerals, or both, identifies the same item
throughout the Department of Defense. Only one identification may be used for each item
for all supply functions from purchase to final disposal in the field or other area.
Faster, Better, Lower Cost Master Data
Standards as Business Constraints
LAWS (mandatory)
STANDARDS (voluntary, but compliance can be verified)
CONVENTIONS (voluntary)
ISO 8000 referenced in a NATO contract
ASD Specification 2000M (S2000M) is a standard that specifies the information
exchange requirements for most materiel management functions commonly
performed in supporting international projects. S2000M is based on a business model
agreed between military customers and industry suppliers.
As of March 2011, S2000M Chapter 1B, section 3.1 includes the following statement:
The Contractor shall supply identification and characteristic data in accordance with
ISO 8000-110:2009 on any of the selected items covered in his contract. Following
an initial codification request as specified in section 3.2, the NATO Codification
Bureau (NCB) shall present a list of the required properties in accordance with the
US Federal Item Identification Guides.
Motivation for ISO 22745 and ISO 8000
Controlling costs requires better asset, product, component and process visibility. This is achieved through faster, better and lower cost access to authoritative characteristic data.
ISO 8000 Referenced in a Commercial Contract
The supplied data shall be ISO 8000-110:2009 compliant.
• The data shall comply with registered ISO 22745-30 compliant data
requirements
• The data shall be encoded using concept identifiers from an ISO 22745
compliant open technical dictionary that supports free resolution to concept
definitions.
• The data shall be provided in ISO 22745-40 compliant Extensible Markup
Language (xml).
• Creating ISO 8000-110:2009 compliant data does not require the payment of any license fees or the use of specialized software; it is within the technical ability of all suppliers regardless of their size.
Part 2 – Drivers and Impact of Data Quality on MDM
Data quality drivers
Impact of quality standards
Natural identifiers
Current Data Quality Drivers
1. Big Data
2. Compliance
3. Data Governance
Companies no longer need convincing that measuring and managing the quality of their data is a good idea; they want to know how to do it.
ISO 8000 is the international standard
for data quality
Big Data?
(Source: 1to1 Media)
Compliance
ISO 8000 would not solve the highlighted problem!
Data Governance
Figure: data governance as the policy governing how people use technology to access data.
Emerging Data Quality Drivers
1. Data Portability
2. Provenance
3. Authoritative data
Companies are discovering how to manage a data
supply chain to acquire and distribute authoritative
high-quality data
ISO 22745 is the international standard
for the exchange of quality data
ISO 8000-120 Data Warehouse
Managing a dictionary
ISO 8000-120 Data Warehouse used to Manage Content
Multilingual Product Descriptions
New item request: supplier name and part number, description
Figure: creating a material record from a new item request.
• The request supplies external reference data (supplier ID and part number) and characteristic data (description).
• The item is assigned an eOTD class name and eOTD class identifier; an eOTD classifications lookup supplies the classifications C1, C2, C3, Cn.
• The eDRR data requirement (name and identifier) defines the properties of the ISO 22745-40 material record: P1 (CLASS NAME), P2 (MATERIAL GROUP), P3, P4, Pn, each recorded as an eOTD property name, an eOTD property identifier and a value.
• The ISO 22745 description generation engine applies ISO 22745-45 description rules to the record:
  Material name (short description): maximum 40 characters; P1 value, then P3 abbreviated name + " " + value.
  PO description (long description): maximum 256 characters; P1 value, P3 name=value, P4 name=value.
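The two description rules can be sketched as a tiny generation engine. This is a hedged illustration under several assumptions that are not stated on the slide: parts are joined with ", ", over-long text is truncated rather than rejected, the property values for P2 to P4 and the abbreviation DIA are hypothetical, and the identifier field just repeats the slide's 0161-1#xx-xxxxxx#1 template as a placeholder. It is not the ISO 22745-45 rule syntax.

```python
# Hedged sketch of a description generation engine; see assumptions above.
from dataclasses import dataclass
from typing import Dict

PLACEHOLDER_ID = "0161-1#xx-xxxxxx#1"  # placeholder, not a real identifier


@dataclass
class PropertyValue:
    name: str        # eOTD property name
    identifier: str  # eOTD property identifier
    value: str


def short_description(record: Dict[str, PropertyValue],
                      abbreviations: Dict[str, str], max_len: int = 40) -> str:
    """Material name: P1 value, then P3 abbreviated name + ' ' + value."""
    p1, p3 = record["P1"], record["P3"]
    text = f"{p1.value}, {abbreviations[p3.name]} {p3.value}"
    return text[:max_len]  # ASSUMPTION: truncate when over the limit


def long_description(record: Dict[str, PropertyValue], max_len: int = 256) -> str:
    """PO description: P1 value, P3 name=value, P4 name=value."""
    parts = [record["P1"].value]
    parts += [f"{record[key].name}={record[key].value}" for key in ("P3", "P4")]
    return ", ".join(parts)[:max_len]


# Hypothetical material record built from properties used in this deck.
bolt_record = {
    "P1": PropertyValue("CLASS NAME", PLACEHOLDER_ID, "BOLT"),
    "P2": PropertyValue("MATERIAL GROUP", PLACEHOLDER_ID, "FASTENERS"),
    "P3": PropertyValue("NOMINAL THREAD DIAMETER", PLACEHOLDER_ID, "1.0 INCH"),
    "P4": PropertyValue("HEAD HEIGHT", PLACEHOLDER_ID, "0.591 INCHES"),
}
ABBREVIATIONS = {"NOMINAL THREAD DIAMETER": "DIA"}  # hypothetical abbreviation

print(short_description(bolt_record, ABBREVIATIONS))
print(long_description(bolt_record))
```

Keeping the rules as code over a structured record means both descriptions are regenerated, not retyped, whenever a property value changes.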
Developing the data requirement (ISO 22745-30)
Data requirements support business functions. Granting access to a computer, a website or a software program, simply asking for the data needed to deliver the right product or service, or complying with a regulation: these all rest on data requirements.
Be careful what you ask for: data quality starts with the quality of the request for data.
Data requirement: eOTD-i-xml (ISO 22745-30)
Developing request for data (ISO 22745-35)
Clear and unambiguous requests for:
1. Reference data (identifiers)
2. Characteristic data (descriptions)
3. The validation of reference and characteristic data
ISO 22745-35 is a standard format for the generation
and distribution of requests for data in a simple XML
format that can be automated by the sender and
recipient to create an integrated data exchange system.
Request for data: eOTD-q-xml (ISO 22745-35)
Developing a reply to a request for data (ISO 22745-40)
ISO 22745-40 is a standard format for the exchange of
reference and characteristic data in a simple XML
format that can be automated by the sender and
recipient to create an integrated data exchange system.
Figure: the three ISO 22745 exchanges: data requirement (eOTD-i-xml, ISO 22745-30), request for data (eOTD-q-xml, ISO 22745-35) and data exchange (eOTD-r-xml, ISO 22745-40).
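The request-and-reply pattern can be sketched with Python's standard XML library. To be clear, the element and attribute names below (request, property, reply, value, item) are hypothetical and are NOT the eOTD-q-xml or eOTD-r-xml schemas; only the property identifiers come from the slides, and the item reference and weight value are made up. The sketch shows the shape of the exchange: the request lists property identifiers, the reply answers what it can, and the requester checks what is still unanswered.

```python
# Hedged sketch; element names are hypothetical, not the ISO 22745 schemas.
import xml.etree.ElementTree as ET
from typing import Dict, List

WEIGHT = "0161-1#02-018635#1"  # eOTD identifiers shown on an earlier slide
LENGTH = "0161-1#02-005808#1"


def build_request(item_ref: str, property_ids: List[str]) -> str:
    """Requester: ask for values of the given properties for one item."""
    root = ET.Element("request", {"item": item_ref})
    for pid in property_ids:
        ET.SubElement(root, "property", {"id": pid})
    return ET.tostring(root, encoding="unicode")


def build_reply(request_xml: str, known: Dict[str, str]) -> str:
    """Provider: answer every requested property it has a value for."""
    request = ET.fromstring(request_xml)
    reply = ET.Element("reply", {"item": request.get("item", "")})
    for prop in request.findall("property"):
        pid = prop.attrib["id"]
        if pid in known:
            ET.SubElement(reply, "value", {"property": pid}).text = known[pid]
    return ET.tostring(reply, encoding="unicode")


def unanswered(request_xml: str, reply_xml: str) -> List[str]:
    """Requester: list requested properties the reply did not supply."""
    requested = [p.attrib["id"] for p in ET.fromstring(request_xml).findall("property")]
    answered = {v.attrib["property"] for v in ET.fromstring(reply_xml).findall("value")}
    return [pid for pid in requested if pid not in answered]


request_xml = build_request("SUPPLIER-A:PN-1234", [WEIGHT, LENGTH])  # hypothetical item
reply_xml = build_reply(request_xml, {WEIGHT: "2.5 KILOGRAM"})
print(reply_xml)
print(unanswered(request_xml, reply_xml))
```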
Automating the data supply chain using ISO 22745
A data provider may not have all the data requested, so it in turn sends a request through its own data supply chain using the same ISO 22745 standard exchanges.
Figure: a data requester sends a request for data (eOTD-q-xml, ISO 22745-35) based on a data requirement (eOTD-i-xml, ISO 22745-30); the data provider passes a sub-request for data to its own suppliers, and data flows back along each link as a data exchange (eOTD-r-xml, ISO 22745-40).
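The recursive shape of a data supply chain can be sketched in a few lines: each provider answers what it knows and sends a sub-request for the remainder to its own suppliers. This is a hedged illustration: the DataProvider class, provider names and values are hypothetical, message formats are ignored, and the chain is assumed to be acyclic (a supplier loop would recurse forever).

```python
# Hedged sketch; providers and values are hypothetical. Assumes no cycles.
from typing import Dict, List, Optional

WEIGHT = "0161-1#02-018635#1"
LENGTH = "0161-1#02-005808#1"


class DataProvider:
    """A participant in a data supply chain (hypothetical)."""

    def __init__(self, name: str, known: Dict[str, str],
                 suppliers: Optional[List["DataProvider"]] = None) -> None:
        self.name = name
        self.known = known
        self.suppliers = suppliers or []

    def answer(self, requested: List[str]) -> Dict[str, str]:
        # Answer from our own data first.
        reply = {pid: self.known[pid] for pid in requested if pid in self.known}
        missing = [pid for pid in requested if pid not in reply]
        # Send a sub-request for whatever is still missing down the chain.
        for supplier in self.suppliers:
            if not missing:
                break
            reply.update(supplier.answer(missing))
            missing = [pid for pid in missing if pid not in reply]
        return reply


manufacturer = DataProvider("manufacturer", {WEIGHT: "2.5 KILOGRAM"})
distributor = DataProvider("distributor", {LENGTH: "1.2 METRE"}, [manufacturer])

print(distributor.answer([WEIGHT, LENGTH]))
```

The requester sees one reply, even though half of it came from a supplier it never contacted directly.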
Conducting an ISO 8000 data governance and data quality evaluation
• Is there a data governance policy registry?
• Is there a corporate dictionary?
• Is the corporate dictionary mapped to an open technical dictionary?
• Is there a data requirements registry?
• Is stored data encoded using the dictionary?
• Does stored or exchanged data contain provenance at the data element level?
• Does the provenance data identify the authoritative source of the data?
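The last two questions concern provenance held with each data element rather than with a whole file. The sketch below shows one way to represent that and to flag elements that fail either check. It is a hedged illustration: the field names are hypothetical and are not the ISO 8000-120 data model, and "has provenance" is assumed to mean that both a supplier and a receipt date are recorded.

```python
# Hedged sketch; field names are hypothetical, not the ISO 8000-120 model.
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional

WEIGHT = "0161-1#02-018635#1"
LENGTH = "0161-1#02-005808#1"


@dataclass(frozen=True)
class DataElement:
    property_id: str                     # eOTD property identifier
    value: str
    supplied_by: Optional[str]           # organization the value came from
    received_on: Optional[date]          # when it was received
    authoritative_source: Optional[str]  # organization authoritative for the value


def has_provenance(element: DataElement) -> bool:
    """ASSUMPTION: provenance means both supplier and date are recorded."""
    return element.supplied_by is not None and element.received_on is not None


def evaluate(elements: List[DataElement]) -> Dict[str, List[str]]:
    """Return the property identifiers that fail each provenance question."""
    return {
        "without_provenance": [e.property_id for e in elements if not has_provenance(e)],
        "without_authoritative_source": [
            e.property_id for e in elements if e.authoritative_source is None
        ],
    }


elements = [
    DataElement(WEIGHT, "2.5 KILOGRAM", "DISTRIBUTOR-A", date(2013, 1, 17), "MANUFACTURER-B"),
    DataElement(LENGTH, "1.2 METRE", "DISTRIBUTOR-A", date(2013, 1, 17), None),
]
print(evaluate(elements))
```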
Natural and “Managed” Identifiers
Identifier: an object identifier is created when an object is defined. The
identifier is then used to reference the object (see Microsoft et al.).
Reference: the combination of an identifier of the organization issuing
an identifier and the object identifier.
Natural identifier: based on a defined process; if the process is proprietary, the identifier is proprietary. A natural open identifier is based on an open standard process.
Managed (proprietary) identifier: issued by a defined organization and belongs to the organization that issued it; copyrighted and subject to license.
Natural and “Managed” Identifiers
Developing an international open standard for creating and resolving globally unambiguous Property Natural Identifiers (PNI)
ECCMA 2010-10-21 revised 2012-09-26
Property Natural Identifier - Lot (PNIL)
• Lot boundary represented as one or more
polygons in an Earth coordinate system
expressed in KML and compressed to
form a string.
134x3QC8P14Zv3GT…..*
*the identifier is expected to be very large
Property Natural Identifier – Unit (PNIU)
• A combination of a PNIL and a three
dimensional elevation of the point of
demarcation (front door) in an Earth coordinate
system expressed in KML and compressed to
form a string.
+
134x3QC8P14Zv3GT…../0GR98WR593C8SQ4…..*
*the identifier is expected to be very large
Requirements
• Natural identifiers - not assigned by a registry
– The identifiers should be unique to each lot and each
unit space
– The identifier should be spatially enabled
– The algorithm for the generation of the identifiers
should be in the public domain
– The identifiers should be in the public domain
– The algorithm for the conversion of the identifiers to
the location and representation of the lot and unit
space should be in the public domain
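The PNIL and PNIU descriptions and the requirements above (generated by a public algorithm, reversible back to a location, no registry) can be sketched with standard tools. This is a hedged illustration, not the ECCMA algorithm: the slide says only that KML is "compressed to form a string", so zlib compression plus URL-safe base64 is an assumption, the sketch handles a single polygon, and the coordinates are arbitrary. URL-safe base64 never produces "/", so "/" can safely separate the lot and unit parts, as in the slide's example identifier.

```python
# Hedged sketch: zlib + URL-safe base64 is an assumed compression scheme.
import base64
import zlib
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def polygon_kml(ring: List[Point]) -> str:
    """Express one closed lot boundary as a minimal KML Polygon."""
    coords = " ".join(f"{lon},{lat}" for lon, lat in ring)
    return ("<Polygon><outerBoundaryIs><LinearRing><coordinates>"
            f"{coords}</coordinates></LinearRing></outerBoundaryIs></Polygon>")


def encode(text: str) -> str:
    """Compress text and turn it into a URL-safe string (no '/' characters)."""
    packed = zlib.compress(text.encode("utf-8"), 9)
    return base64.urlsafe_b64encode(packed).decode("ascii").rstrip("=")


def decode(identifier: str) -> str:
    """Reverse encode(): anyone can recover the KML from the identifier."""
    padded = identifier + "=" * (-len(identifier) % 4)
    return zlib.decompress(base64.urlsafe_b64decode(padded)).decode("utf-8")


def lot_identifier(ring: List[Point]) -> str:
    """Sketch of a PNIL: the encoded lot boundary."""
    return encode(polygon_kml(ring))


def unit_identifier(lot_id: str, door: Tuple[float, float, float]) -> str:
    """Sketch of a PNIU: the PNIL plus the encoded 3-D point of demarcation."""
    lon, lat, elevation = door
    point = f"<Point><coordinates>{lon},{lat},{elevation}</coordinates></Point>"
    return f"{lot_id}/{encode(point)}"


# Arbitrary example boundary; the first and last points close the ring.
ring = [(-73.9857, 40.7484), (-73.9850, 40.7484), (-73.9850, 40.7480),
        (-73.9857, 40.7480), (-73.9857, 40.7484)]
lot = lot_identifier(ring)
unit = unit_identifier(lot, (-73.9855, 40.7482, 12.0))
print(lot)
print(unit)
```

The same boundary always yields the same identifier only if coordinates are written the same way every time, so a real scheme would also need a public rule for normalizing coordinates.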
Information is Power
Data is Truth
ISO 8000 Quality Data: Portable Data that Meets Requirements
ECCMA ISO 8000 Master Data Quality Manager (MDQM) Certification
An open-book test of 60 questions designed to assess your ability to recall basic and fundamental knowledge related to the workshop and your ability to think critically about the subject.
Contact: vicky.falcone@eccma.org
Questions?
Peter R. Benson
Executive Director
Peter.Benson@eccma.org