DAMA

Chicago

Tracking the Success of Data Quality

2012-12-12

Peter R. Benson

Project leader for ISO 22745 and ISO 8000

Copyright © 2012 ECCMA

All rights reserved Slide 1

Information is Power

Data is Truth

Introductions

Quality

Quality drivers

Data quality successes

Next steps


What does “Quality” Mean?

When you order seafood from Quality Fresh Seafood, you can be confident that you are receiving the very best quality of seafood and delivery.


ISO 9000 Definition of “Quality”

3.1.1

quality: degree to which a set of inherent characteristics fulfils requirements

ISO 9000:2005(E)


Requirements

ISO 9001 Quality Management System

ISO 9001:2000(E)


Requirements Define Quality

3.1.2

requirement: need or expectation that is stated, generally implied or obligatory

ISO 9000:2005(E)

Quality is about meeting requirements

Quality data is data that meets stated requirements, nothing more and nothing less!

(Data that exceeds stated requirements does not increase the quality of the data.)


These are data requirements


The quality of the data depends on how you ask for it


The Need for Semantic Encoding

In South Africa a traffic light is called a robot!

Robot


ECCMA Open Technical Dictionary (eOTD)

Just as with music notation and engineering symbols, eOTD concept identifiers are simply used to communicate more accurately in a language-independent environment.

(Figure: music notation, engineering symbols and eOTD concept identifiers side by side)

A unique public domain identifier is assigned to a concept.

0161-1#01-089388#1 table

0161-1#01-086445#1 chair

0161-1#02-018635#1 weight

0161-1#02-005808#1 length

0161-1#07-277660#1 Monday

0161-1#05-001122#1 kilogram
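The identifiers above share a visible shape: a prefix (0161-1), a two-digit code, a six-digit item code and a version. The minimal sketch below splits an identifier into those parts. The meaning given to each two-digit code is an assumption inferred only from the six examples on this slide (01: table, chair; 02: weight, length; 05: kilogram; 07: Monday), and the regular expression matches these examples rather than any normative grammar.

```python
import re
from typing import NamedTuple

# ASSUMPTION: code-space meanings inferred from the slide's examples only.
CODE_SPACES = {"01": "class", "02": "property", "05": "unit of measure", "07": "value"}

# prefix, two-digit code space, six-digit item code, version (matches the examples)
EOTD_ID = re.compile(r"(?P<rai>\d{4}-\d)#(?P<space>\d{2})-(?P<item>\d{6})#(?P<version>\d+)")


class ConceptId(NamedTuple):
    rai: str
    code_space: str
    item_code: str
    version: int

    @property
    def kind(self) -> str:
        return CODE_SPACES.get(self.code_space, "unknown")


def parse_concept_id(text: str) -> ConceptId:
    match = EOTD_ID.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not an eOTD-style identifier: {text!r}")
    return ConceptId(match["rai"], match["space"], match["item"], int(match["version"]))


for identifier, term in [("0161-1#02-018635#1", "weight"), ("0161-1#05-001122#1", "kilogram")]:
    concept = parse_concept_id(identifier)
    print(term, "->", concept.kind, concept.item_code)
# weight -> property 018635
# kilogram -> unit of measure 001122
```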


Publicly Visible Terminology in a Standard Model

The eOTD (ECCMA Open Technical Dictionary) is an ISO 22745-20 compliant central registry of terminology. Each concept and terminological component in the eOTD is assigned a unique and permanent public domain identifier.

Users create corporate preferred subsets of the eOTD and use the eOTD concept identifiers to manage concept equivalence mapping with the concepts used by their trading partners.
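Concept equivalence mapping works because both parties point their own terms at the same concept identifier. The sketch below assumes two hypothetical vocabularies (a corporate one in English, a partner one in German) keyed to the weight and length identifiers shown earlier, and renames a partner record's fields to corporate terms through the shared identifier.

```python
# Hypothetical corporate preferred terms, keyed by eOTD concept identifier.
corporate_terms = {
    "0161-1#02-018635#1": "WEIGHT",
    "0161-1#02-005808#1": "LENGTH",
}

# Hypothetical partner vocabulary: partner term -> eOTD concept identifier.
partner_terms = {
    "Gewicht": "0161-1#02-018635#1",
    "Länge": "0161-1#02-005808#1",
}


def to_corporate(partner_record: dict[str, str]) -> dict[str, str]:
    """Rename a partner's fields to corporate terms via the shared concept identifier."""
    result = {}
    for partner_term, value in partner_record.items():
        concept = partner_terms.get(partner_term)
        if concept is None or concept not in corporate_terms:
            raise KeyError(f"no equivalence mapping for {partner_term!r}")
        result[corporate_terms[concept]] = value
    return result


print(to_corporate({"Gewicht": "12 kg", "Länge": "3 m"}))
# {'WEIGHT': '12 kg', 'LENGTH': '3 m'}
```

Neither party needs to adopt the other's terminology; each maintains only its own mapping to the public identifiers.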

ISO 22745 - ECCMA Open Technical Dictionary (eOTD)

Diagram: terminology (terms, abbreviations, definitions, images) linked to a public domain concept identifier of the form 0161-1#xx-xxxxxx#1.

Publicly Visible Terminology in a Standard Model

Diagram: industry, government and SDO (standards development organization) terminology linked through the eOTD.

• Public domain concept identifiers

• Free identifier resolution to underlying terminology (web services)

• Hyperlink to source standards

• Multilingual

• Multiple terms, definitions and images linked to a single concept identifier
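A minimal sketch of one concept identifier carrying several terms per language, resolved to a preferred term with a fallback language. It reuses the "robot" example from earlier; the identifier, the definition text, the choice of the first listed term as "preferred" and the fallback rule are all illustrative assumptions, not eOTD behavior.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One concept identifier with any number of terms and definitions per language."""
    identifier: str
    terms: dict[str, list[str]] = field(default_factory=dict)   # language tag -> terms
    definitions: dict[str, str] = field(default_factory=dict)   # language tag -> definition


def preferred_term(concept: Concept, language: str, fallback: str = "en-US") -> str:
    """Resolve the identifier to a term; the first listed term is treated as preferred."""
    for lang in (language, fallback):
        if concept.terms.get(lang):
            return concept.terms[lang][0]
    raise LookupError(f"no term for {concept.identifier} in {language!r} or {fallback!r}")


traffic_light = Concept(
    identifier="0161-1#01-000000#1",  # hypothetical placeholder identifier
    terms={
        "en-US": ["traffic light", "traffic signal"],
        "en-ZA": ["robot", "traffic light"],
    },
    definitions={"en-US": "signalling device that controls traffic at an intersection"},  # illustrative
)

print(preferred_term(traffic_light, "en-ZA"))  # robot
print(preferred_term(traffic_light, "fr-FR"))  # traffic light (falls back to en-US)
```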


Building the Corporate Dictionary and Data Requirements

Diagram: spreadsheets, reports, forms and data models; Corporate Data Requirements Registry (ISO 22745-30); Corporate Metadata Registry (ISO/IEC 11179); Corporate Classifications; Corporate Dictionary (ISO 22745-10); new eOTD concept or terminology registration; ECCMA Data Requirements Registry (eDRR); ECCMA Open Technical Dictionary (eOTD).

ISO 8000 Family of Standards

• General principles: Part 1 Introduction; Part 2 Terminology

• Master data: Part 100 Introduction; Part 110 Syntax, semantic encoding and meeting requirements; Part 120 Provenance; Part 130 Accuracy; Part 140 Completeness

• Transaction data

Intrinsic and Extrinsic Data Quality

Mass is an intrinsic property of any physical object; weight is an extrinsic property that varies with the strength of the gravitational field in which the object is placed.

Intrinsic (internal) data quality characteristics

Characteristics that you can evaluate by looking at the data itself (expectation that is generally implied):

• Is the data complete: what is the % of empty cells?

• Is the data consistent: are the values consistent in their formatting?
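Both intrinsic questions can be answered from the data alone. The sketch below computes the percentage of empty cells and the share of values in one column that match an expected format. The sample rows and the weight format (a number, a space, then "kg") are hypothetical.

```python
import re

rows = [
    {"part_no": "A-100", "weight": "12.5 kg", "color": ""},
    {"part_no": "A-101", "weight": "12,5kg", "color": "red"},
    {"part_no": "A-102", "weight": "", "color": "blue"},
]

WEIGHT_FORMAT = r"\d+(\.\d+)? kg"  # hypothetical house format for weights


def empty_cell_percentage(rows: list[dict[str, str]]) -> float:
    """Completeness: percentage of cells that are empty or whitespace only."""
    cells = [value for row in rows for value in row.values()]
    empty = sum(1 for value in cells if not value.strip())
    return 100.0 * empty / len(cells) if cells else 0.0


def format_consistency(rows: list[dict[str, str]], column: str, pattern: str) -> float:
    """Consistency: percentage of non-empty values in `column` matching `pattern`."""
    values = [row[column] for row in rows if row.get(column, "").strip()]
    matching = sum(1 for value in values if re.fullmatch(pattern, value))
    return 100.0 * matching / len(values) if values else 100.0


print(f"{empty_cell_percentage(rows):.1f}% empty cells")  # 22.2% empty cells
print(f"{format_consistency(rows, 'weight', WEIGHT_FORMAT):.1f}% consistent weights")  # 50.0% consistent weights
```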

Extrinsic (external) data quality characteristics

ISO 8000 quality data:

• Is the syntax referenced – is the data compliant with the syntax?

• Is the semantic encoding referenced – is the data encoded?

• Is the data requirement referenced – does the data meet the requirement?
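Extrinsic checks need something outside the data: a referenced syntax, dictionary and requirement. The sketch below only tests whether a record carries those references and whether its property keys are encoded as concept identifiers; validating against the referenced documents themselves would need those documents. The field names (`syntax_ref`, `dictionary_ref`, `requirement_ref`, `properties`) are hypothetical, and the identifier pattern is the one visible in the eOTD examples, not a normative grammar.

```python
import re

CONCEPT_ID = re.compile(r"\d{4}-\d#\d{2}-\d{6}#\d+")  # shape of the eOTD examples (assumption)


def extrinsic_findings(record: dict) -> list[str]:
    """List missing references and property keys that are not concept identifiers."""
    findings = []
    for ref in ("syntax_ref", "dictionary_ref", "requirement_ref"):  # hypothetical field names
        if not record.get(ref):
            findings.append(f"missing {ref}")
    for key in record.get("properties", {}):
        if not CONCEPT_ID.fullmatch(key):
            findings.append(f"property {key!r} is not encoded with a concept identifier")
    return findings


record = {
    "syntax_ref": "ISO 22745-40",
    "dictionary_ref": "eOTD",
    "requirement_ref": None,
    "properties": {"0161-1#02-018635#1": "12 kg", "color": "red"},
}
print(extrinsic_findings(record))
# ['missing requirement_ref', "property 'color' is not encoded with a concept identifier"]
```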


ISO 8000 Quality Data

Data that meets Requirements

Data that is Portable

(syntax and semantic encoding)


Portable Data

If we believe that:

• the software we will be using tomorrow will be different from what we are using today, and

• we will need access to our data … forever, and

• data that cannot be separated from licensed software is also licensed data,

then we should ensure that our corporate data is portable, i.e., independent of any licensed software application.

ISO 8000 quality data is portable data!


Vision of the Future – The Service Provider’s View

Diagram: the customer's data stored inside each of the applications the customer uses.

Storing customer data within the application creates “customer lock-in.”


Vision of the Future – The Customer’s View

Diagram: the customer's ISO 8000 data held independently of the applications that use it.

Data standards are the antidote to application “lock-in.”


Operating Systems as a Solution to Hardware “Lock-in”

Diagram (top to bottom): user, application, operating system, hardware.

An operating system is the infrastructure software component of a computer system responsible for sharing the limited resources of the computer. The operating system acts as a host for applications that are run on the machine.

1969 - UNIX

(we also landed on the moon!)

1974 - CP/M

1981 - MS-DOS

1985 - Windows 1.0

1995 - Microsoft Bob

2000 - Windows ME

2001 - Windows XP

2007 - Windows Vista

2009 - Windows 7

2009 - Android


ISO Data Standards are the Antidote to Application “Lock-in”

Diagram (top to bottom): user; applications such as a data quality analysis app, a data validation app and a data de-dup app; portable data (ISO 22745, ISO 8000); operating system; hardware.

Quality data is portable data; it is independent of the software application and accessible by any application.

Portable Data

Data that cannot be separated from licensed software is also licensed data.

(Figure: images and their metadata. Source: Royalty Free Photos)

Proprietary metadata and proprietary identifiers are copyrighted.

We can expect to see copyright rigorously enforced in the years ahead


ISO 8000 Quality Data is Portable Data that Meets Requirements

Motivation for ISO 22745 and ISO 8000

Quality data saves money

“From a logistics information perspective… an F-15 is just 171,000 parts flying in very close formation.”

Cataloging and Standardization Act, Public Law 82-436, as codified by United States Code, Title 10, Chapter 145 – Cataloging and Standardization, Sec. 2451.

(a) The Secretary of Defense shall develop a single catalog system and related program of standardizing supplies for the Department of Defense.

(b) In cataloging, the Secretary shall name, describe, classify, and number each item recurrently used, bought, stocked, or distributed by the Department of Defense, so that only one distinctive combination of letters or numerals, or both, identifies the same item throughout the Department of Defense. Only one identification may be used for each item for all supply functions from purchase to final disposal in the field or other area.
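The rule in (b), one identification per item, amounts to: if an item with the same name, class and description has already been numbered, reuse that number. The sketch below is a hypothetical in-memory catalog that keys items on class plus normalized characteristic data. The case-folding normalization and the `ITEM-nnnnnn` numbering scheme are illustrative assumptions, not the actual federal cataloging procedure or stock number format.

```python
import itertools


class Catalog:
    """Hypothetical single catalog: the same item (same class and same
    characteristic data) always receives the same number."""

    def __init__(self) -> None:
        self._numbers: dict[tuple, str] = {}
        self._counter = itertools.count(1)

    @staticmethod
    def _key(item_class: str, characteristics: dict[str, str]) -> tuple:
        # Naive normalization: case-insensitive and independent of property order.
        normalized = sorted(
            (name.strip().upper(), value.strip().upper())
            for name, value in characteristics.items()
        )
        return item_class.strip().upper(), tuple(normalized)

    def number_for(self, item_class: str, characteristics: dict[str, str]) -> str:
        key = self._key(item_class, characteristics)
        if key not in self._numbers:
            # Illustrative numbering scheme only.
            self._numbers[key] = f"ITEM-{next(self._counter):06d}"
        return self._numbers[key]


catalog = Catalog()
a = catalog.number_for("bolt", {"thread": "M8", "length": "40 mm"})
b = catalog.number_for("BOLT", {"length": "40 MM", "thread": "m8"})
c = catalog.number_for("bolt", {"thread": "M10", "length": "40 mm"})
print(a, b, c)  # ITEM-000001 ITEM-000001 ITEM-000002
```

In practice the hard part is the description itself: two suppliers only produce the same key if they describe the item with the same properties, which is what a shared data requirement provides.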

Faster, Better, Lower Cost Master Data


ISO 8000 referenced in a NATO contract

ASD Specification 2000M (S2000M) is a standard that specifies the information exchange requirements for most materiel management functions commonly performed in supporting international projects. S2000M is based on a business model agreed between military customers and industry suppliers.

As of March 2011, ASD S2000M Chapter 1B section 3.1 includes the following statement:

The Contractor shall supply identification and characteristic data in accordance with ISO 8000-110:2009 on any of the selected items covered in his contract. Following an initial codification request as specified in section 3.2, the NATO Codification Bureau (NCB) shall present a list of the required properties in accordance with the US Federal Item Identification Guides.

Quality data saves money


Motivation for ISO 22745 and ISO 8000

Controlling costs requires better asset, product, component and process visibility. This is achieved through faster, better and lower cost access to authoritative characteristic data.

Quality data saves money


ISO 8000 Referenced in a Commercial Contract

The supplied data shall be ISO 8000-110:2009 compliant.

• The data shall comply with registered ISO 22745-30 compliant data requirements

• The data shall be encoded using concept identifiers from an ISO 22745 compliant open technical dictionary that supports free resolution to concept definitions.

• The data shall be provided in ISO 22745-40 compliant Extensible Markup Language (XML).

• Creating ISO 8000-110:2009 compliant data does not require the payment of any license fees or the use of specialized software; it is within the technical ability of all suppliers regardless of their size.

Quality data saves money


Current Data Quality Drivers

1. Big Data

2. Compliance

3. Data Governance

Companies no longer need convincing that measuring and managing the quality of their data is a good idea; they want to know how to do it.

ISO 8000 is the international standard for data quality

Big Data?

1to1 Media


Compliance

Data Governance

Diagram: under a data governance policy, people use technology to access data.

Emerging Data Quality Drivers

1. Data Portability

2. Provenance

3. Authoritative data

Companies are discovering how to manage a data supply chain to acquire and distribute authoritative, high-quality data.

ISO 22745 is the international standard for the exchange of quality data

ISO 8000-120 Data Warehouse


Managing a dictionary


ISO 8000-120 Data Warehouse used to Manage Content

Multilingual Product Descriptions


Original SAP Material Name and Description:

External reference data and characteristic data:

• Classifications C1, C2, C3: eOTD class name, eOTD class identifier, eDRR data requirement name, eDRR data requirement identifier

• ISO 22745-40 master data: properties P1, P2, P3, … each with an eOTD property name, an eOTD property identifier and a value

ISO 22745-45 description rules:

• Short (max = 50): Class name; P1 value, P2 value → SAP Material Name

• Long (max = 256): Class name; External reference, P1 name=value, P3 name=value, P2 value → SAP Purchase Order Description
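The two description rules are templates over the master data, so they can be generated rather than typed. The sketch below assembles both forms in the property order shown on the slide. Treating the maximum as a character count and dropping trailing parts that would overflow it (rather than cutting mid-word) are assumptions; the class name, property values and external reference are hypothetical.

```python
def build_description(class_name: str, parts: list[str], max_len: int) -> str:
    """'Class name; part1, part2, ...', stopping before the first part that would not fit."""
    text = class_name[:max_len]
    for i, part in enumerate(parts):
        separator = "; " if i == 0 else ", "
        if len(text) + len(separator) + len(part) > max_len:
            break
        text += separator + part
    return text


def short_description(class_name: str, p1_value: str, p2_value: str) -> str:
    # Short rule: Class name; P1 value, P2 value (max 50)
    return build_description(class_name, [p1_value, p2_value], 50)


def long_description(class_name: str, external_ref: str,
                     p1: tuple[str, str], p3: tuple[str, str], p2_value: str) -> str:
    # Long rule: Class name; External reference, P1 name=value, P3 name=value, P2 value (max 256)
    parts = [external_ref, f"{p1[0]}={p1[1]}", f"{p3[0]}={p3[1]}", p2_value]
    return build_description(class_name, parts, 256)


print(short_description("BOLT", "M8", "STEEL"))
# BOLT; M8, STEEL
print(long_description("BOLT", "ACME PN 4711", ("THREAD", "M8"), ("LENGTH", "40 MM"), "STEEL"))
# BOLT; ACME PN 4711, THREAD=M8, LENGTH=40 MM, STEEL
```

Because the text is derived from encoded property values, the same master data can also be rendered in another language by substituting translated class and property names.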

Good quality data sells!

One king Hyatt Grand Bed and one queen sofa bed. Contemporary decor. Desk.

Complimentary wireless and wired high-speed Internet access. 42-inch, flat-panel, high-definition TV with satellite channels and connectivity for laptop, MP3 player, or DVD player. Two telephones (one cordless) with voice mail. AM/FM radio/alarm clock.

Complimentary newspaper. Video check-out.

Refrigerator and coffeemaker. Wet bar.

Bathroom with granite countertops, hair dryer, and Portico bath amenities. Iron/ironing board.

Please create name and description for 300,000 products in 29 languages

Developing the data requirement (ISO 22745-30)

Data requirements support business functions. Granting access to a computer, a website or a software program, or simply asking for the data needed to deliver the right product or service or to comply with a regulation: these all involve data requirements.

Be careful what you ask for – data quality starts with the quality of the request for data

Data requirement: eOTD-i-xml (ISO 22745-30)
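A data requirement says which properties a class of item needs and what form their values must take. The sketch below models one in memory, using the table, weight, length and kilogram identifiers from the eOTD examples, and checks a record against it. It is not the ISO 22745-30 data model: the fields, the use of Python types as stand-ins for data types, and the (value, unit) record layout are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PropertyRequirement:
    property_id: str                  # eOTD property identifier
    required: bool
    value_types: tuple[type, ...]     # Python types standing in for a data type (assumption)
    unit_id: Optional[str] = None     # expected eOTD unit-of-measure identifier, if any


@dataclass(frozen=True)
class DataRequirement:
    class_id: str                     # eOTD class identifier
    properties: tuple[PropertyRequirement, ...]

    def check(self, values: dict[str, tuple[object, Optional[str]]]) -> list[str]:
        """values maps property id -> (value, unit id or None); returns a list of problems."""
        problems = []
        for req in self.properties:
            if req.property_id not in values:
                if req.required:
                    problems.append(f"missing required property {req.property_id}")
                continue
            value, unit = values[req.property_id]
            if not isinstance(value, req.value_types):
                problems.append(f"wrong value type for {req.property_id}")
            if req.unit_id is not None and unit != req.unit_id:
                problems.append(f"wrong unit for {req.property_id}")
        return problems


TABLE_REQUIREMENT = DataRequirement(
    class_id="0161-1#01-089388#1",  # table
    properties=(
        # weight, required, in kilograms
        PropertyRequirement("0161-1#02-018635#1", True, (int, float), "0161-1#05-001122#1"),
        # length, optional; unit check omitted for brevity
        PropertyRequirement("0161-1#02-005808#1", False, (int, float)),
    ),
)

print(TABLE_REQUIREMENT.check({"0161-1#02-018635#1": (12.5, "0161-1#05-001122#1")}))  # []
print(TABLE_REQUIREMENT.check({"0161-1#02-005808#1": ("long", None)}))
```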

Developing request for data (ISO 22745-35)

Clear and unambiguous requests for:

1. Reference data (identifiers)

2. Characteristic data (descriptions)

3. The validation of reference and characteristic data

ISO 22745-35 is a standard format for the generation and distribution of requests for data in a simple XML format that can be automated by the sender and recipient to create an integrated data exchange system.

Request for data: eOTD-q-xml (ISO 22745-35)
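To show what "automated by the sender" can look like, the sketch below serializes a request for data with Python's standard `xml.etree.ElementTree`. The element and attribute names (`request_for_data`, `data_requirement`, `items`, `item`, `reference`) and the identifiers are hypothetical placeholders; they are not the ISO 22745-35 schema, which should be taken from the standard itself.

```python
import xml.etree.ElementTree as ET


def build_request(request_id: str, requirement_id: str, item_references: list[str]) -> str:
    """Serialize a request for characteristic data on each referenced item.
    Element names are hypothetical, not the ISO 22745-35 schema."""
    root = ET.Element("request_for_data", {"id": request_id})
    # The data requirement the reply is expected to satisfy
    ET.SubElement(root, "data_requirement", {"ref": requirement_id})
    items = ET.SubElement(root, "items")
    for reference in item_references:
        # Reference data (e.g. a manufacturer part number) identifying each item
        ET.SubElement(items, "item", {"reference": reference})
    return ET.tostring(root, encoding="unicode")


xml_text = build_request("RQ-0001", "DR-EXAMPLE-1", ["ACME PN 4711", "ACME PN 4712"])
print(xml_text)
```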

Developing a reply to a request for data (ISO 22745-40)

ISO 22745-40 is a standard format for the exchange of reference and characteristic data in a simple XML format that can be automated by the sender and recipient to create an integrated data exchange system.

Diagram: data requirement eOTD-i-xml (ISO 22745-30), request for data eOTD-q-xml (ISO 22745-35) and data exchange eOTD-r-xml (ISO 22745-40).
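On the receiving side, automation means reading the reply straight into records keyed by concept identifier. The sketch below parses a small reply with `xml.etree.ElementTree`. As with the request sketch, the element and attribute names (`data_exchange`, `item`, `property`, `id`, `value`, `unit`) are hypothetical stand-ins, not the ISO 22745-40 schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical reply layout; not the ISO 22745-40 schema.
REPLY = """<data_exchange requirement="DR-EXAMPLE-1">
  <item reference="ACME PN 4711">
    <property id="0161-1#02-018635#1" value="12.5" unit="0161-1#05-001122#1"/>
    <property id="0161-1#02-005808#1" value="0.4"/>
  </item>
</data_exchange>"""


def parse_reply(xml_text: str) -> dict[str, dict[str, tuple]]:
    """Map item reference -> {property id: (value, unit id or None)}."""
    root = ET.fromstring(xml_text)
    records = {}
    for item in root.iter("item"):
        records[item.get("reference")] = {
            prop.get("id"): (prop.get("value"), prop.get("unit"))
            for prop in item.findall("property")
        }
    return records


print(parse_reply(REPLY))
```

Because properties arrive keyed by concept identifiers rather than by column names, the same parser serves every supplier that uses the same dictionary.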

Automating the data supply chain using ISO 22745

A data provider may not have all the data requested, so it in turn sends a request through its own data supply chain using the same ISO 22745 standard exchanges.

Diagram: the data requester sends a data requirement (eOTD-i-xml, ISO 22745-30) and a request for data (eOTD-q-xml, ISO 22745-35) to the data provider, which forwards a request for data (eOTD-q-xml, ISO 22745-35) to a sub-provider; data flows back up the chain as data exchange files (eOTD-r-xml, ISO 22745-40).
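The chain can be sketched as a recursive request: each provider answers what it holds and forwards the rest to its own suppliers, and every value carries the name of the provider that actually supplied it. The provider names and values below are hypothetical, and the sketch assumes the chain has no cycles.

```python
from dataclasses import dataclass, field


@dataclass
class Provider:
    name: str
    data: dict[str, str] = field(default_factory=dict)          # property id -> value held locally
    suppliers: list["Provider"] = field(default_factory=list)    # this provider's own data suppliers

    def fulfil(self, requested: set[str]) -> dict[str, tuple[str, str]]:
        """Answer what we hold, forward the rest down the chain.
        Returns {property id: (value, name of supplying provider)}; assumes no cycles."""
        answer = {pid: (self.data[pid], self.name) for pid in requested if pid in self.data}
        for supplier in self.suppliers:
            missing = requested - set(answer)
            if not missing:
                break
            answer.update(supplier.fulfil(missing))
        return answer


manufacturer = Provider("Manufacturer", {"0161-1#02-005808#1": "0.4 m"})
distributor = Provider("Distributor", {"0161-1#02-018635#1": "12.5 kg"}, [manufacturer])

reply = distributor.fulfil({"0161-1#02-018635#1", "0161-1#02-005808#1"})
print(reply)
```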

ISO 8000 would not solve the highlighted problem!

Conducting an ISO 8000 data governance and data quality evaluation

• Is there a data governance policy registry?

• Is there a corporate dictionary?

• Is the corporate dictionary mapped to an open technical dictionary?

• Is there a data requirements registry?

• Is stored data encoded using the dictionary?

• Does stored or exchanged data contain provenance at the data element level?

• Does the provenance data identify the authoritative source of the data?
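The last two questions can be made concrete: if every data element records who supplied it, a policy listing the authoritative source per property lets you flag values that came from elsewhere. In the sketch below, the `DataElement` fields, the `AUTHORITATIVE_SOURCE` policy and the sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DataElement:
    property_id: str
    value: str
    source: str      # organization that supplied this value (element-level provenance)
    as_of: date      # date the value was supplied


# Hypothetical policy: the authoritative source for each property.
AUTHORITATIVE_SOURCE = {"0161-1#02-018635#1": "Manufacturer"}


def provenance_findings(elements: list[DataElement]) -> list[str]:
    findings = []
    for element in elements:
        expected = AUTHORITATIVE_SOURCE.get(element.property_id)
        if expected is None:
            findings.append(f"{element.property_id}: no authoritative source defined")
        elif element.source != expected:
            findings.append(
                f"{element.property_id}: value from {element.source}, "
                f"authoritative source is {expected}"
            )
    return findings


record = [
    DataElement("0161-1#02-018635#1", "12.5 kg", "Distributor", date(2012, 11, 30)),
    DataElement("0161-1#02-005808#1", "0.4 m", "Manufacturer", date(2012, 10, 1)),
]
for line in provenance_findings(record):
    print(line)
```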


Developing an international open standard for creating and resolving globally unambiguous Property Natural Identifiers (PNI)

ECCMA 2010-10-21 revised 2012-09-26

Property Natural Identifier - Lot (PNIL)

• Lot boundary represented as one or more polygons in an Earth coordinate system expressed in KML and compressed to form a string.

134x3QC8P14Zv3GT…..*

*the identifier is expected to be very large


Property Natural Identifier – Unit (PNIU)

• A combination of a PNIL and a three dimensional elevation of the point of demarcation in an Earth coordinate system expressed in KML and compressed to form a string.

+

134x3QC8P14Zv3GT…../0GR98WR593C8SQ4…..*

*the identifier is expected to be very large
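The PNIL and PNIU descriptions imply a reversible, registry-free encoding: the boundary geometry itself becomes the identifier, and anyone with the public algorithm can convert it back. The sketch below illustrates that idea under stated assumptions: zlib compression plus URL-safe Base64 (whose alphabet excludes "/", so the "/" separator seen in the PNIU example stays unambiguous) and minimal KML fragments. It does not reproduce the example strings above; ECCMA's actual compression algorithm is not given here. zlib output is deterministic for a given zlib version and level, but that is not guaranteed across versions.

```python
import base64
import zlib

Point = tuple[float, ...]  # (longitude, latitude) or (longitude, latitude, altitude)


def _compress(text: str) -> str:
    # ASSUMPTION: zlib + URL-safe Base64 stands in for the unspecified compression step.
    return base64.urlsafe_b64encode(zlib.compress(text.encode("utf-8"), 9)).decode("ascii")


def _decompress(identifier: str) -> str:
    return zlib.decompress(base64.urlsafe_b64decode(identifier)).decode("utf-8")


def _coordinates(points: list[Point]) -> str:
    # KML <coordinates>: comma-separated lon,lat[,alt] tuples separated by spaces
    return " ".join(",".join(f"{c:.7f}" for c in p) for p in points)


def pnil(polygons: list[list[Point]]) -> str:
    """Lot identifier from one or more closed boundary rings (first point == last point)."""
    kml = "<MultiGeometry>" + "".join(
        "<Polygon><outerBoundaryIs><LinearRing><coordinates>"
        f"{_coordinates(ring)}"
        "</coordinates></LinearRing></outerBoundaryIs></Polygon>"
        for ring in polygons
    ) + "</MultiGeometry>"
    return _compress(kml)


def pniu(lot_identifier: str, demarcation_point: Point) -> str:
    """Unit identifier: the lot identifier plus a 3-D point, joined with '/'."""
    kml = ("<Point><altitudeMode>absolute</altitudeMode><coordinates>"
           f"{_coordinates([demarcation_point])}</coordinates></Point>")
    return f"{lot_identifier}/{_compress(kml)}"


def resolve(identifier: str) -> list[str]:
    """Convert a PNIL or PNIU back into its KML fragment(s)."""
    return [_decompress(part) for part in identifier.split("/")]


ring = [(-87.6300, 41.8800), (-87.6290, 41.8800), (-87.6290, 41.8810), (-87.6300, 41.8800)]
lot = pnil([ring])
unit = pniu(lot, (-87.6295, 41.8805, 182.5))
print(resolve(unit)[1])
```

The identifier grows with the number of boundary points, which is consistent with the note that the identifier is expected to be very large.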


Requirements

• Natural identifiers - not assigned by a registry

– The identifiers should be unique to each lot and each unit space

– The identifier should be spatially enabled

– The algorithm for the generation of the identifiers should be in the public domain

– The identifiers should be in the public domain

– The algorithm for the conversion of the identifiers to the location and representation of the lot and unit space should be in the public domain


Questions?

Peter R. Benson

Executive Director

Peter.Benson@eccma.org

Information is Power

Data is Truth