DAMA New York The Impact of Data Governance and Data Quality on MDM 2013-01-17 Peter R. Benson Project leader for ISO 22745 and ISO 8000 Copyright © 2013 ECCMA All rights reserved Slide 1 Information is Power Data is Truth Differentiating Information and Data data: fixed form into which information is transformed so that it can be stored or moved. Peter R. Benson Information Data Information Copyright only covers “fixed form” (you can only copyright data not information) Data quality is an essential component of quality information but not the only component Quality information must be based on quality data but quality data does not necessarily yield quality information Quality data may not be timely or relevant, these are characteristics of information quality not data quality Copyright © 2012 by Peter R. Benson Slide 3 Data Maturity Model Global Data Excellence Maximize the Business Value of Enterprise Data Part 1 - Introduction to data quality Defining Quality Portable data Standards and contracts for quality data What does “Quality” Mean? When you order seafood from Quality Fresh Seafood, you can be confident that you are receiving the very best quality of seafood and delivery. Copyright © 2013 by ECCMA Slide 7 ISO 9000 Definition of “Quality” 3.1.1 quality degree to which a set of inherent characteristics fulfils requirements ISO 9000:2005(E) Slide 8 ISO 9001 Quality Management System Requirements ISO 9001:2000(E) Slide 9 Requirements Define Quality 3.1.2 requirement need or expectation that is stated, generally implied or obligatory ISO 9000:2005(E) Quality is about meeting requirements Quality data is data that meets stated requirements nothing more and nothing less! 
(data that exceeds stated requirements does not increase the quality of the data)
Slide 10

These are data requirements
Slide 11

The quality of the data depends on how you ask for it
Slide 12

The Need for Semantic Encoding
In South Africa a traffic light is called a robot!
Slide 13

ECCMA Open Technical Dictionary (eOTD)
Just as with music notation and engineering symbols, the eOTD concept identifiers are simply used to communicate more accurately in a language independent environment.
Music, Engineering, eOTD
A unique public domain identifier is assigned to a concept.
0161-1#01-089388#1    table
0161-1#01-086445#1    chair
0161-1#02-018635#1    weight
0161-1#02-005808#1    length
0161-1#07-277660#1    Monday
0161-1#05-001122#1    kilogram
Slide 14

Publicly Visible Terminology in a Standard Model
The eOTD (ECCMA Open Technical Dictionary) is an ISO 22745-20 compliant central registry of terminology. Each concept and terminological component in the eOTD is assigned a unique and permanent public domain identifier. Users create corporate preferred subsets of the eOTD and use the eOTD concept identifiers to manage concept equivalence mapping with the concepts used by their trading partners.
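A minimal sketch of how the concept identifiers listed above might be split into parts and resolved to a term. The segment names (registry, code_space, code, version) are labels chosen for this illustration and are not taken from ISO 22745; the TERMS table is a local stand-in for the free web-service resolution the slides mention, holding only the six identifiers shown in the deck.

```python
import re
from typing import NamedTuple, Optional


class ConceptId(NamedTuple):
    registry: str    # e.g. "0161-1" (label assumed for illustration)
    code_space: str  # e.g. "01"; the slide shows 01, 02, 05 and 07
    code: str        # six-digit concept code, e.g. "089388"
    version: str     # trailing segment, "1" in every slide example


# Pattern inferred from the slide examples only, not from the standard.
_PATTERN = re.compile(r"(\d{4}-\d+)#(\d{2})-(\d{6})#(\d+)")

# Local stand-in for identifier resolution; entries copied from the slide.
TERMS = {
    "0161-1#01-089388#1": "table",
    "0161-1#01-086445#1": "chair",
    "0161-1#02-018635#1": "weight",
    "0161-1#02-005808#1": "length",
    "0161-1#07-277660#1": "Monday",
    "0161-1#05-001122#1": "kilogram",
}


def parse_concept_id(text: str) -> ConceptId:
    """Split an identifier into its segments, or raise ValueError."""
    match = _PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a recognised concept identifier: {text!r}")
    return ConceptId(*match.groups())


def resolve(text: str) -> Optional[str]:
    """Return the term for a well-formed identifier, or None if unknown."""
    parse_concept_id(text)  # validates the shape first
    return TERMS.get(text.strip())
```

Because the identifier, not the term, is what gets exchanged, a trading partner in South Africa and one in the United States could each attach their own preferred term to the same identifier.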
ISO 22745 - ECCMA Open Technical Dictionary (eOTD) Terms Abbreviations Terminology Public Domain Concept Identifier 0161-1#xx-xxxxxx#1 Copyright © 2013 by ECCMA Definitions Images Publicly Visible Terminology in a Standard Model Industry terminology Terminology Terminology Government terminology Terminology SDO terminology • Public domain concept identifiers • Free identifier resolution to underlying Terminology Copyright © 2013 by ECCMA SDO terminology • • • terminology (web services) Hyperlink to source standards Multilingual Multiple terms, definitions and images linked to single concept identifier Slide 16 Building the Corporate Dictionary and Data Requirements ISO 22745-30 Corporate Data Requirements Registry eDRR ECCMA Data Requirements Registry ISO 22745-10 Corporate Dictionary eOTD ECCMA Open Technical Dictionary Spreadsheets ISO/IEC 11179 Corporate Metadata Registry Reports Corporate Classifications Forms New eOTD concept or terminology registration Data models Copyright © 2013 by ECCMA Slide 17 The property-value pair Everything (Individuals, organizations, locations, goods, services, processes, rules, transactions) can be described using property-value pairs. 
Property                  Value
CLASS NAME                BOLT
PRODUCT NUMBER            3225020037
NOMINAL THREAD DIAMETER   1.0 INCH
WIDTH ACROSS FLATS        1.450 INCHES
WIDTH ACROSS CORNERS      1.653 INCHES
HEAD HEIGHT               0.591 INCHES
COUNT PER PACK            10
PACK PRICE                0.80 DOLLARS

Property                  Value
DATE                      2012-08-12
CREDIT CARD NUMBER        3225020037
MERCHANT NUMBER           15689345
AMOUNT                    156.45 DOLLARS

Property                  Value
STATE                     PA
TITLE                     33
CHAPTER                   75
PARAGRAPH                 3362
NAME                      Maximum speed limits

Property                  Value
FIRST NAME                JOE
LAST NAME                 SMITH
TITLE                     MANAGER
EMAIL                     JOE.SMITH@COMANY.COM
Slide 18

Taxonomy of Data
Diagram labels: data; metadata; transaction data; master data; identification data; descriptive data; classification data; physical characteristics; performance characteristics; data exchange characteristics
Slide 19

ISO 8000 Family of Standards
ISO 8000 General Principles
  Part 1 Introduction
  Part 2 Terminology
Master Data
  Part 100 Introduction
  Part 110 Syntax, Semantic encoding, Meets requirements
  Part 120 Provenance
  Part 130 Accuracy
  Part 140 Completeness
Transaction data
Slide 20

Intrinsic and Extrinsic Data Quality
“Mass is an intrinsic property of any physical object, weight is an extrinsic property that varies depending on the strength of the gravitational field in which the respective object is placed.”

Intrinsic (internal) data quality characteristics
Characteristics that you can evaluate by looking at the data itself (expectation that is generally implied)
• Is the data complete: what is the % of empty cells?
• Is the data consistent: are the values consistent in their formatting?

Extrinsic (external) data quality characteristics
ISO 8000 Quality Data:
• Is the syntax referenced – is the data compliant with the syntax?
• Is the semantic encoding referenced – is the data encoded?
• Is the data requirement referenced – does the data meet the requirement?
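The two intrinsic checks named above (percentage of empty cells, consistency of formatting) can be sketched roughly as follows. This is an illustration under stated assumptions, not a method from ISO 8000: the function names, the sample rows, and the choice of a regular expression as the "expected format" are all hypothetical.

```python
import re
from typing import Iterable, Mapping, Optional, Sequence


def _is_empty(value: Optional[object]) -> bool:
    """Treat None and whitespace-only strings as empty cells."""
    return value is None or str(value).strip() == ""


def percent_empty(rows: Sequence[Mapping[str, object]], columns: Sequence[str]) -> float:
    """Completeness check: percentage of cells that are empty."""
    total = len(rows) * len(columns)
    if total == 0:
        return 0.0
    empty = sum(1 for row in rows for col in columns if _is_empty(row.get(col)))
    return 100.0 * empty / total


def percent_matching(values: Iterable[object], pattern: str) -> float:
    """Consistency check: percentage of non-empty values matching one format."""
    non_empty = [str(v).strip() for v in values if not _is_empty(v)]
    if not non_empty:
        return 100.0
    regex = re.compile(pattern)
    matching = sum(1 for v in non_empty if regex.fullmatch(v))
    return 100.0 * matching / len(non_empty)


# Hypothetical sample rows; only the first row echoes values from the slides.
rows = [
    {"DATE": "2012-08-12", "AMOUNT": "156.45"},
    {"DATE": "12/08/2012", "AMOUNT": ""},
    {"DATE": "2012-08-13", "AMOUNT": None},
    {"DATE": "", "AMOUNT": "20.00"},
]

empty_pct = percent_empty(rows, ["DATE", "AMOUNT"])
iso_date_pct = percent_matching((r["DATE"] for r in rows), r"\d{4}-\d{2}-\d{2}")
```

Neither check needs a dictionary or a data requirement, which is what makes them intrinsic; the extrinsic questions can only be answered against an external reference.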
Copyright © 2013 by ECCMA Slide 21 ISO 8000 Quality Data Data that meets Requirements Data that is Portable (syntax and semantic encoding) Copyright © 2013 by ECCMA Slide 22 Portable Data If we believe that: • the software we will be using tomorrow will be different from what we are using today, and • we will need access to our data …. forever, and • data that cannot be separated from licensed software is also licensed data then We should ensure that our corporate data is portable, i.e., independent of any licensed software application. ISO 8000 quality data is portable data! Copyright © 2013 by ECCMA Slide 23 Vision of the Future – The Service Provider’s View Application Application Application Customer data Customer Application Customer data Storing customer data within the application creates “customer lock-in” Copyright © 2013 by ECCMA Slide 24 Vision of the Future – The Customer’s View Data Data Data ISO 8000 Data Application Customer Application Data standards are the antidote to application “lock-in.” Copyright © 2013 by ECCMA Slide 25 Operating Systems as a Solution to Hardware “Lock-in” User Application Operating system Hardware Copyright © 2013 by ECCMA An operating system is the infrastructure software component of a computer system responsible for sharing the limited resources of the computer. The operating system acts as a host for applications that are run on the machine. 1969 - UNIX (we also landed on the moon!) 1974 - CP/M 1981 - MSDOS 1985 - Windows 1.0 1995 - BOB 2000 - ME 2001 - XP 2007 - VISTA 2009 - Windows 7 2009 - Android Slide 26 ISO Data Standards are the Antidote to Application “Lock-in” Data quality analysis app User Application Operating system Hardware Copyright © 2013 by ECCMA ISO 22745 ISO 8000 Portable data Data de-dup app Data validation app Quality data is portable data; it is independent of the software application and accessible by any application. 
Slide 27 Portable Data data that cannot be separated from licensed software is also licensed data Images Metadata Source: Royalty Free Photos Proprietary metadata and proprietary identifiers are copyright We can expect to see copyright rigorously enforced in the years ahead Slide 28 ISO 8000 Quality Data is Portable Data that Meets Requirements Motivation for ISO 22745 and ISO 8000 quality data saves money “From a logistics information perspective… an F-15 is just 171,000 parts flying in very close formation.” Cataloging and Standardization Act, Public Law 82-436 as codified by United States Code, Title 10, Chapter 145 – Cataloging and Standardization Sec. 2451. (a) The Secretary of Defense shall develop a single catalog system and related program of standardizing supplies for the Department of Defense. (b) In cataloging, the Secretary shall name, describe, classify, and number each item recurrently used, bought, stocked, or distributed by the Department of Defense, so that only one distinctive combination of letters or numerals, or both, identifies the same item throughout the Department of Defense. Only one identification may be used for each item for all supply functions from purchase to final disposal in the field or other area. Faster, Better, Lower Cost Master Data Slide 30 Standards as Business Constraints LAWS (Mandatory) STANDARDS (Voluntary, but compliance can be verified) CONVENTIONS (Voluntary) Copyright © 2013 by ECCMA Slide 31 ISO 8000 referenced in a NATO contract ASD Specification 2000M (S2000M) is a standard that specifies the information exchange requirements for most materiel management functions commonly performed in supporting international projects. S2000M is based on a business model agreed between military customers and industry suppliers. 
As of March 2011, ASD 2000M Chapter 1B section 3.1 includes the following statement: The Contractor shall supply identification and characteristic data in accordance with ISO 8000-110:2009 on any of the selected items covered in his contract. Following an initial codification request as specified in section 3.2, the NATO Codification Bureau (NCB) shall present a list of the required properties in accordance with the US Federal Item Identification Guides.
quality data saves money
Slide 32

Motivation for ISO 22745 and ISO 8000
Controlling costs requires better asset, product, component and process visibility. This is achieved through faster, better and lower cost access to authoritative characteristic data.
quality data saves money
Slide 33

ISO 8000 Referenced in a Commercial Contract
The supplied data shall be ISO 8000-110:2009 compliant.
• The data shall comply with registered ISO 22745-30 compliant data requirements.
• The data shall be encoded using concept identifiers from an ISO 22745 compliant open technical dictionary that supports free resolution to concept definitions.
• The data shall be provided in ISO 22745-40 compliant Extensible Markup Language (XML).
• Creating ISO 8000-110:2009 compliant data does not require the payment of any license fees or the use of specialized software; it is within the technical ability of all suppliers regardless of their size.
quality data saves money
Slide 34

Part 2 – Drivers and Impact of Data Quality on MDM
• Data quality drivers
• Impact of quality standards
• Natural identifiers

Current Data Quality Drivers
1. Big Data
2. Compliance
3. Data Governance
Companies no longer need convincing that measuring and managing the quality of their data is a good idea; they want to know how to do it.
ISO 8000 is the international standard for data quality

Big Data?
1to1 Media
Slide 37

Compliance
ISO 8000 would not solve the highlighted problem!
Data Governance
People use Technology to access Data
Data Governance Policy

Emerging Data Quality Drivers
1. Data Portability
2. Provenance
3. Authoritative data
Companies are discovering how to manage a data supply chain to acquire and distribute authoritative high quality data
ISO 22745 is the international standard for the exchange of quality data

ISO 8000-120 Data Warehouse
Slide 42

Managing a dictionary
Slide 43

ISO 8000-120 Data Warehouse used to Manage Content
Multilingual Product Descriptions
Slide 44

New item request: Supplier name and part number, description
External reference data (supplier ID and part #)
Characteristic data (description)
eOTD Class name, eOTD class identifier
Classifications C1, C2, C3, Cn (eOTD - Classifications lookup)
eDRR data requirement name, eDRR identifier
ISO 22745-40 Material record
  P1  eOTD Property Name CLASS NAME, eOTD property identifier, Value
  P2  eOTD Property Name MATERIAL GROUP, eOTD property identifier, Value
  P3  eOTD Property Name, eOTD property identifier, Value
  P4  eOTD Property Name, eOTD property identifier, Value
  Pn  eOTD Property Name, eOTD property identifier, Value
ISO 22745-45 Description rules
  Short  Max=40; P1 value, P3 abbreviated name + " " + value
  Long   Max=256; P1 value, P3 name=value, P4 name=value
ISO 22745 Description generation engine
  Material name (short description)
  PO description (long description)

Developing the data requirement (ISO 22745-30)
Data requirements support a business function. Granting access to a computer, a website or a software program, or simply asking for the data needed to deliver the right product or service or to comply with a regulation: these are all data requirements.
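The short and long description rules shown earlier for the description generation engine (Max=40 and Max=256) could be sketched as below. This is an illustration under assumptions rather than the ISO 22745-45 rule syntax: the ", " separator, truncation at the maximum length, the abbreviation table, and the choice of P3 and P4 properties (borrowed from the bolt example) are all hypothetical.

```python
from typing import Dict, Mapping, Tuple

# A material record as position -> (property name, value). P3 and P4 are not
# named on the slide; the bolt properties here are illustrative only.
Record = Mapping[str, Tuple[str, str]]

record: Dict[str, Tuple[str, str]] = {
    "P1": ("CLASS NAME", "BOLT"),
    "P3": ("NOMINAL THREAD DIAMETER", "1.0 INCH"),
    "P4": ("HEAD HEIGHT", "0.591 INCHES"),
}

# Hypothetical abbreviations; a real dictionary would supply these.
ABBREVIATIONS = {"NOMINAL THREAD DIAMETER": "THD DIA"}


def short_description(rec: Record, max_len: int = 40) -> str:
    """Short rule: P1 value, P3 abbreviated name + " " + value."""
    p1_value = rec["P1"][1]
    p3_name, p3_value = rec["P3"]
    abbreviated = ABBREVIATIONS.get(p3_name, p3_name)
    text = f"{p1_value}, {abbreviated} {p3_value}"
    return text[:max_len]  # truncation policy is an assumption


def long_description(rec: Record, max_len: int = 256) -> str:
    """Long rule: P1 value, P3 name=value, P4 name=value."""
    parts = [rec["P1"][1]]
    for position in ("P3", "P4"):
        name, value = rec[position]
        parts.append(f"{name}={value}")
    return ", ".join(parts)[:max_len]


material_name = short_description(record)   # short description
po_description = long_description(record)   # long description
```

Generating both descriptions from the same property-value pairs means a corrected value only has to be changed in one place.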
Be careful what you ask for – data quality starts with the quality of the request for data
Data requirement: eOTD-i-xml, ISO 22745-30

Developing a request for data (ISO 22745-35)
Clear and unambiguous requests for:
1. Reference data (identifiers)
2. Characteristic data (descriptions)
3. The validation of reference and characteristic data
ISO 22745-35 is a standard format for the generation and distribution of requests for data in a simple XML format that can be automated by the sender and recipient to create an integrated data exchange system.
Request for data: eOTD-q-xml, ISO 22745-35

Developing a reply to a request for data (ISO 22745-40)
ISO 22745-40 is a standard format for the exchange of reference and characteristic data in a simple XML format that can be automated by the sender and recipient to create an integrated data exchange system.
Data requirement: eOTD-i-xml, ISO 22745-30
Request for data: eOTD-q-xml, ISO 22745-35
Data exchange: eOTD-r-xml, ISO 22745-40

Automating the data supply chain using ISO 22745
A data provider may not have all the data requested, so they in turn send a request through their data supply chain using the same ISO 22745 standard exchanges.
Data requester to data provider: Data requirement (eOTD-i-xml, ISO 22745-30), Request for data (eOTD-q-xml, ISO 22745-35)
Data provider onward: Sub Request for data (eOTD-q-xml, ISO 22745-35)
Replies: Data exchange (eOTD-r-xml, ISO 22745-40)

Conducting an ISO 8000 data governance and data quality evaluation
• Is there a data governance policy registry?
• Is there a corporate dictionary?
• Is the corporate dictionary mapped to an open technical dictionary?
• Is there a data requirements registry?
• Is stored data encoded using the dictionary?
• Does stored or exchanged data contain provenance at the data element level?
• Does the provenance data identify the authoritative source of the data?
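The last two evaluation questions, on provenance at the data element level and on the authoritative source, lend themselves to an automated check. The sketch below is one possible shape for it; the class and field names are hypothetical and are not taken from ISO 8000-120, and the sample values and source name are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass(frozen=True)
class Provenance:
    source: str          # who supplied the value (hypothetical field)
    recorded_on: str     # date the value was supplied, ISO 8601
    authoritative: bool  # True if the source is the authoritative one


@dataclass(frozen=True)
class DataElement:
    property_id: str                      # dictionary concept identifier
    value: str
    provenance: Optional[Provenance] = None


def evaluate_provenance(elements: Sequence[DataElement]) -> Dict[str, object]:
    """Count elements carrying provenance and those from an authoritative source."""
    with_provenance = [e for e in elements if e.provenance is not None]
    authoritative = [e for e in with_provenance if e.provenance.authoritative]
    missing: List[str] = [e.property_id for e in elements if e.provenance is None]
    return {
        "elements": len(elements),
        "with_provenance": len(with_provenance),
        "with_authoritative_source": len(authoritative),
        "missing_provenance": missing,
    }


# Invented sample data; the property identifiers are the weight and length
# concepts from the dictionary slide.
sample = [
    DataElement("0161-1#02-018635#1", "2.5 kilogram",
                Provenance("Example Manufacturer", "2013-01-17", True)),
    DataElement("0161-1#02-005808#1", "40 centimetre",
                Provenance("Example Distributor", "2013-01-10", False)),
    DataElement("0161-1#01-089388#1", "table"),
]

report = evaluate_provenance(sample)
```

A report like this answers the two questions per element rather than per record, which is the level the checklist asks about.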
Copyright © 2013 by ECCMA Slide 50 Natural and “Managed” Identifiers Identifier: an object identifier is created when an object is defined. The identifier is then used to reference the object (see Microsoft et al.). Reference: the combination of an identifier of the organization issuing an identifier and the object identifier. Natural identifier: based on a defined process, if the process is proprietary, the identifier is proprietary. A natural open identifier is based on an open standard process. Managed (proprietary identifier): issued by a defined organization, belongs to the organization that issued it. Copyright and subject to license. Copyright © 2013 by ECCMA Slide 51 Natural and “Managed” Identifiers Slide 52 Developing an international open standard for creating and resolving globally unambiguous Property Natural Identifiers (PNI) ECCMA 2010-10-21 revised 2012-09-26 Property Natural Identifier - Lot (PNIL) • Lot boundary represented as one or more polygons in an Earth coordinate system expressed in KML and compressed to form a string. 134x3QC8P14Zv3GT…..* *the identifier is expected to be very large ECCMA 2010-10-21 54 Property Natural Identifier – Unit (PNIU) • A combination of a PNIL and a three dimensional elevation of the point of demarcation (front door) in an Earth coordinate system expressed in KML and compressed to form a string. 
+ 134x3QC8P14Zv3GT…../0GR98WR593C8SQ4…..* *the identifier is expected to be very large ECCMA 2010-10-21 55 Requirements • Natural identifiers - not assigned by a registry – The identifiers should be unique to each lot and each unit space – The identifier should be spatially enabled – The algorithm for the generation of the identifiers should be in the public domain – The identifiers should be in the public domain – The algorithm for the conversion of the identifiers to the location and representation of the lot and unit space should be in the public domain ECCMA 2010-10-21 56 Information is Power Data is Truth ISO 8000 Quality Data Portable Data that Meets Requirements ECCMA ISO 8000 Master Data Quality Manager (MDQM) Certification An open book test consisting of 60 questions designed to assess your ability to remember or recall basic and fundamental pieces of knowledge related to the workshop as well as assess your ability to think critically about the subject. Contact: vicky.falcone@eccma.org Questions? Peter R. Benson Executive Director Peter.Benson@eccma.org