Data Modeling & Metadata Management
Donna Burbank, Global Data Strategy Ltd.
Lessons in Data Modeling DATAVERSITY Series
October 27th, 2016
Global Data Strategy, Ltd. 2016

Donna Burbank
Donna is a recognized industry expert in information management with over 20 years of experience in data strategy, information management, data modeling, metadata management, and enterprise architecture. She is currently the Managing Director at Global Data Strategy, Ltd., an international information management consulting company that specializes in the alignment of business drivers with data-centric technology.

In past roles, she has served in a number of positions related to data modeling & metadata:
• Metadata consultant (US, Europe, Asia, Africa)
• Product Manager, PLATINUM Metadata Repository
• Director of Product Management, ER/Studio
• VP of Product Marketing, Erwin
• Data modeling & data strategy implementation & consulting
• Author of 2 books on data modeling & contributor to 1 book on metadata management, plus numerous articles
• OMG committee member of the Information Management Metamodel (IMM)

As an active contributor to the data management community, she is a long-time DAMA International member and is the President of the DAMA Rocky Mountain chapter. She has worked with dozens of Fortune 500 companies worldwide in the Americas, Europe, Asia, and Africa and speaks regularly at industry conferences. She has co-authored two books, Data Modeling for the Business and Data Modeling Made Simple with ERwin Data Modeler, and is a regular contributor to industry publications such as DATAVERSITY, EM360, & TDAN. She can be reached at donna.burbank@globaldatastrategy.com

Donna is based in Boulder, Colorado, USA.

Follow on Twitter @donnaburbank
Today’s hashtag: #LessonsDM
Lessons in Data Modeling Series
This Year’s Line Up
• July 28th: Why a Data Model is an Important Part of your Data Strategy
• August 25th: Data Modeling for Big Data
• September 22nd: UML for Data Modeling – When Does it Make Sense?
• October 27th: Data Modeling & Metadata Management
• December 6th: Data Modeling for XML and JSON

Agenda
What we’ll cover today
• How data modeling fits within a larger metadata management landscape
• When data modeling can provide “just enough” metadata management
• Key data modeling artifacts for metadata
• Organization, roles & implementation considerations
• Summary & questions

Metadata is Hotter Than Ever
A Growing Trend
In a recent DATAVERSITY survey, over 80% of respondents stated that metadata is as important as, if not more important than, it was in the past.

What is Metadata?
Metadata is Data In Context

Metadata is the “Who, What, Where, Why, When & How” of Data
Who
• Who created this data?
• Who is the Steward of this data?
• Who is using this data?
• Who “owns” this data?
• Who is regulating or auditing this data?
What
• What is the business definition of this data element?
• What are the business rules for this data?
• What is the security level or privacy level of this data?
• What is the abbreviation or acronym for this data element?
• What are the technical naming standards for database implementation?
Where
• Where is this data stored?
• Where did this data come from?
• Where is this data used & shared?
• Where is the backup for this data?
• Are there regional privacy or security policies that regulate this data?
Why
• Why are we storing this data?
• What is its usage & purpose?
• What are the business drivers for using this data?
When
• When was this data created?
• When was this data last updated?
• How long should it be stored?
• When does it need to be purged/deleted?
How
• How is this data formatted? (character, numeric, etc.)
• How many databases or data sources store this data?

Metadata is Part of a Larger Enterprise Landscape
A Successful Data Strategy Requires Many Inter-related Disciplines
• “Top-Down” alignment with business priorities
• Managing the people, process, policies & culture around data
• Leveraging & managing data for strategic advantage
• Coordinating & integrating disparate data sources
• “Bottom-Up” management & inventory of data sources

Metadata Across & Beyond the Organization
• Metadata exists in many sources across & beyond the organization: COBOL, JCL, legacy systems, data models, spreadsheets, media, databases, data in motion, documents, social media, open data, and IoT.

Types of Metadata
• The DATAVERSITY Emerging Trends in Metadata survey revealed some interesting findings about what types of metadata organizations will be managing now and in the future.
[Chart: types of metadata managed now vs. in the future; marked types are supported by most data modeling tools]

Data Models are a Good Source of Metadata
• Data models are another good source of both business & technical metadata for relational databases.
• They store structural metadata as well as business rules & definitions.
• Key relationships are also stored to provide lineage & impact analysis.
[Figure: a Customer entity showing business metadata (attribute names) alongside technical metadata (data types and nullability)]
Customer
  Customer_ID     CHAR(18)  NOT NULL
  First Name      CHAR(18)  NOT NULL
  Last Name       CHAR(18)  NOT NULL
  City            CHAR(18)  NULL
  Date Purchased  CHAR(18)  NULL

Data vs. Metadata
Customer (the column headers are metadata; the rows are data)
First Name | Last Name | Company | City | Year Purchased
Joe | Smith | Komputers R Us | New York | 1970
Mary | Jones | The Lord’s Store | London | 1999
Proful | Bishwal | The Lady’s Store | Mumbai | 1998
Ming | Lee | My Favorite Store | Beijing | 2001

Data vs. Metadata?
Customer (the same data under cryptic column headers: is this still useful metadata?)
STR01 | STR02 | TXT123 | TXT127 | DT01
Joe | Smith | Komputers R Us | New York | 1970
Mary | Jones | The Lord’s Store | London | 1999
Proful | Bishwal | The Lady’s Store | Mumbai | 1998
Ming | Lee | My Favorite Store | Beijing | 2001

Metadata adds Context & Definition
The same Customer table, annotated with metadata for the Last Name column:
• Definition: Last Name represents the surname or family name of an individual.
• Business Rules: In the Chinese market, family name is listed first in salutations.
• Format: VARCHAR(30)
• Abbreviation: LNAME
• Required: YES
• Plus numerous other technical & business metadata, including security, privacy, nullability, primary key, etc.
Without such a definition, a column like City stays ambiguous: is this the city where the customer lives or where the store is located?

Technical & Business Metadata
• Technical Metadata describes the structure, format, and rules for storing data.
• Business Metadata describes the business definitions, rules, and context for data.
• Data represents actual instances (e.g. John Smith).

Technical Metadata
CREATE TABLE EMPLOYEE (
    employee_id    INTEGER     NOT NULL,
    department_id  INTEGER     NOT NULL,
    employee_fname VARCHAR(50) NULL,
    employee_lname VARCHAR(50) NULL,
    employee_ssn   CHAR(9)     NULL);

CREATE TABLE CUSTOMER (
    customer_id      INTEGER      NOT NULL,
    customer_name    VARCHAR(50)  NULL,
    customer_address VARCHAR(150) NULL,
    customer_city    VARCHAR(50)  NULL,
    customer_state   CHAR(2)      NULL,
    customer_zip     CHAR(9)      NULL);

Business Metadata
Term | Definition
Employee | An employee is an individual who currently works for the organization or who has been employed by it within the past 6 months.
Customer | A customer is a person or organization who has purchased from the organization within the past 2 years and has an active loyalty card or maintenance contract.

Data
John Smith
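To make the split between data, technical metadata, and business metadata concrete, here is a minimal sketch in Python (3.9+). It assumes the standard-library sqlite3 module as a stand-in for a production DBMS: the CREATE TABLE statements above define the technical metadata, the database catalog (read through SQLite's PRAGMA table_info) exposes it, a plain dictionary holds the business definitions, and an inserted row is the data. The names ColumnMetadata, extract_technical_metadata, and describe are hypothetical, chosen only for illustration; other databases expose the same information through their own catalogs (for example, information_schema views).

```python
import sqlite3
from dataclasses import dataclass

# Technical metadata: the DDL from the slide, run against an in-memory SQLite
# database that stands in for the production DBMS.
DDL = """
CREATE TABLE EMPLOYEE (
    employee_id    INTEGER     NOT NULL,
    department_id  INTEGER     NOT NULL,
    employee_fname VARCHAR(50) NULL,
    employee_lname VARCHAR(50) NULL,
    employee_ssn   CHAR(9)     NULL);
CREATE TABLE CUSTOMER (
    customer_id      INTEGER      NOT NULL,
    customer_name    VARCHAR(50)  NULL,
    customer_address VARCHAR(150) NULL,
    customer_city    VARCHAR(50)  NULL,
    customer_state   CHAR(2)      NULL,
    customer_zip     CHAR(9)      NULL);
"""

# Business metadata: glossary definitions from the slide, keyed by table name.
BUSINESS_DEFINITIONS = {
    "EMPLOYEE": "An individual who currently works for the organization "
                "or who has been employed by it within the past 6 months.",
    "CUSTOMER": "A person or organization who has purchased from the organization "
                "within the past 2 years and has an active loyalty card or "
                "maintenance contract.",
}


@dataclass(frozen=True)
class ColumnMetadata:
    table: str
    column: str
    data_type: str
    nullable: bool


def extract_technical_metadata(conn: sqlite3.Connection) -> list[ColumnMetadata]:
    """Read column names, declared types and nullability from the catalog."""
    tables = [name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    columns = []
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        # Table names come from the catalog itself, so interpolating them is safe here.
        for _cid, name, declared_type, notnull, _default, _pk in conn.execute(
                f"PRAGMA table_info({table})"):
            columns.append(ColumnMetadata(table, name, declared_type, not notnull))
    return columns


def describe(table: str, columns: list[ColumnMetadata]) -> str:
    """Combine business and technical metadata into one readable entry."""
    lines = [f"{table}: {BUSINESS_DEFINITIONS.get(table, '(no definition)')}"]
    for col in columns:
        if col.table == table:
            null_rule = "NULL" if col.nullable else "NOT NULL"
            lines.append(f"  {col.column} {col.data_type} {null_rule}")
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
# Data: an actual instance stored in a table row, not in the catalog.
conn.execute("INSERT INTO EMPLOYEE VALUES (1, 10, 'John', 'Smith', NULL)")

metadata = extract_technical_metadata(conn)
print(describe("EMPLOYEE", metadata))
```

The catalog query never reads the John Smith row: technical metadata describes the container, while the row itself is data.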
Business vs. Technical Metadata
• The following are examples of types of business & technical metadata.
Business Metadata
• Definitions & Glossary
• Data Steward
• Organization
• Privacy Level
• Security Level
• Acronyms & Abbreviations
• Business Rules
• Etc.
Technical Metadata
• Column structure of a database table
• Data Type & Length (e.g. VARCHAR(20))
• Domains
• Standard abbreviations (e.g. CUSTOMER -> CUST)
• Nullability
• Keys (primary, foreign, alternate, etc.)
• Validation Rules
• Data Movement Rules
• Permissions
• Etc.

Levels of Data Modeling
• Conceptual: Business Concepts
• Logical: Data Entities
• Physical: Physical Tables
[Figure: the three levels span a range from Business Metadata (conceptual) to Technical Metadata (physical)]

Business Definitions
[Figure from Data Modeling for the Business by Hoberman, Burbank, Bradley, Technics Publications, 2009]

Non-Traditional Sources
Not all metadata is in a relational database

Human Metadata
Avoid the dreaded “I just know”
• Much business metadata, and much of the history of the business, exists in employees’ heads, e.g. “Part Number is what used to be called Component Number before the acquisition.”
• It is important to capture this metadata in an electronic format (business glossary, metadata repository, data models, etc.) for sharing with others.

Data Modeling in the Big Data Ecosystem
[Diagram: structured, semi-structured (JSON, XML), and unstructured data sources feeding the Hadoop framework: HDFS file system, Hive (HQL), HBase, and MapReduce / analytics]

COBOL Copybook Metadata
• What is a COBOL Copybook?
  – In COBOL, a copybook file is used to define data elements that can be referenced by many programs.
• What is COBOL Copybook Metadata?
  – Structure and definition: the metadata describes the structure & format of the data.
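As an illustration of how copybook metadata can be harvested, the following simplified Python sketch parses level numbers, data names, and PIC clauses into field records. The copybook is hypothetical, and the parser is deliberately minimal: it assumes one data description entry per line, handles only the X, A, 9, S, and V picture symbols, and ignores clauses such as REDEFINES, OCCURS, and USAGE (binary or packed fields would need different length rules). CopybookField, describe_picture, and parse_copybook are illustrative names, not part of any COBOL toolkit.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical copybook used for illustration only.
COPYBOOK = """
      * Customer master record
       01  CUSTOMER-RECORD.
           05  CUST-ID          PIC 9(8).
           05  CUST-NAME        PIC X(30).
           05  CUST-BALANCE     PIC S9(7)V99.
"""

# One character position per X, A or 9 symbol; "(n)" repeats the symbol n times.
# S (sign, when not SIGN SEPARATE) and V (implied decimal point) take no storage.
_PIC_SYMBOL = re.compile(r"([XA9])(?:\((\d+)\))?")


@dataclass(frozen=True)
class CopybookField:
    level: int
    name: str
    picture: Optional[str]   # None for group items
    category: Optional[str]  # "alphanumeric", "numeric" or None
    length: Optional[int]    # display length in characters


def describe_picture(picture: str) -> tuple[str, int]:
    """Derive a simplified category and display length from a PIC string."""
    pic = picture.upper()
    length = sum(int(count) if count else 1
                 for _symbol, count in _PIC_SYMBOL.findall(pic))
    # Simplification: any X or A makes the field alphanumeric.
    category = "alphanumeric" if ("X" in pic or "A" in pic) else "numeric"
    return category, length


def parse_copybook(text: str) -> list[CopybookField]:
    """Extract structural metadata from a simple, one-entry-per-line copybook."""
    fields = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("*"):
            continue  # blank line or comment line
        tokens = line.rstrip(".").split()
        level, name = int(tokens[0]), tokens[1]
        picture = category = length = None
        for keyword in ("PIC", "PICTURE"):
            if keyword in tokens:
                picture = tokens[tokens.index(keyword) + 1]
                category, length = describe_picture(picture)
                break
        fields.append(CopybookField(level, name, picture, category, length))
    return fields


for field in parse_copybook(COPYBOOK):
    print(field)
```

The result is the same kind of structural metadata a data model holds for a relational table (name, type, length), which is what makes copybooks a useful input for metadata discovery.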
ERP/CRM and Packaged Application Metadata
• Packaged applications such as CRM and ERP systems (e.g. Salesforce, PeopleSoft, etc.) are typically based on a relational database system.
• Therefore, they hold important metadata about both the physical table structures (technical metadata) and the business names & definitions (business metadata).

Relationship Metadata
Showing How Information Interrelates

Data Lineage: Data Warehousing Example
• In the data warehouse example below, metadata for CUSTOMER exists in a number of tools & data stores.
• This lineage can be tracked in most data modeling tools.
[Diagram: CUSTOMER traced from the Business Glossary and Logical Data Model through Dimensional and Physical Data Models, ETL tools, and database tables (CUSTOMER, CUST, TBL_C1) to a BI tool and a Sales Report]

Metadata Discovery Tools
• Metadata discovery tools extract metadata from source systems and rationalize it to a common metamodel and storage facility.
[Diagram: Metadata Discovery Tools -> Metadata Population -> Metadata Storage (Repository), organized by Metamodel(s) and backed by Metadata Storage (Database)]

Impact Analysis & Where Used
• Impact analysis shows the relationship between a piece of metadata and the other sources that rely on it, in order to assess the impact of a potential change.
• For example, if I change the length & name of a field, which other systems that reference that field will be affected?
What happens if I change the name & length of the “Brand” field?
[Diagram: the field appears as Brand CHAR(10) and as MyBrand VARCHAR(30) across a Customer Database (Oracle), a Sales Application, a Sales Database (DB2), and an ETL Staging Area]

Design Layer Relationships
• In a data model there are several design layers that describe a given data concept.
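Both lineage and impact analysis can be viewed as walks over a graph of "feeds" relationships between metadata items: lineage walks upstream from an item to its sources, while impact analysis walks downstream to everything that depends on it. The Python sketch below assumes a hypothetical flow for the Brand field (Customer Database -> ETL Staging Area -> Sales Database -> Sales Application and a sales report); the node names and the direction of flow are illustrative assumptions, not output from a real tool.

```python
from collections import deque

# Hypothetical "feeds" relationships (source -> targets) for the Brand field.
# Node names and flow direction are assumptions made for illustration.
FEEDS = {
    "CustomerDB.Brand": ["ETLStaging.Brand"],
    "ETLStaging.Brand": ["SalesDB.Brand"],
    "SalesDB.Brand": ["SalesApp.MyBrand", "SalesReport.Brand"],
}


def reachable(graph, start):
    """Breadth-first walk returning every node reachable from start, nearest first."""
    seen = {start}          # guards against revisiting nodes (and against cycles)
    queue = deque([start])
    found = []
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                found.append(neighbour)
                queue.append(neighbour)
    return found


def reverse(graph):
    """Flip source -> target edges so we can walk from a target back to its sources."""
    flipped = {}
    for source, targets in graph.items():
        for target in targets:
            flipped.setdefault(target, []).append(source)
    return flipped


def impact_of_change(graph, field):
    """Impact analysis: everything downstream that depends on field."""
    return reachable(graph, field)


def lineage_of(graph, field):
    """Data lineage: everything upstream that field was derived from."""
    return reachable(reverse(graph), field)


print(impact_of_change(FEEDS, "CustomerDB.Brand"))
print(lineage_of(FEEDS, "SalesReport.Brand"))
```

Renaming or lengthening CustomerDB.Brand would therefore flag the staging area, the sales database, the application's MyBrand field, and the report; a modeling tool or repository stores these same relationships as metadata.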
Organization, Roles & Implementation Considerations
Ensuring that metadata is used effectively across the organization

Who Uses Metadata?
• In addition to sharing metadata between tools and via export, many users across both IT & the business want to view the metadata through reports, portals, etc.
• Developer: “If I change this field, what else will be affected?”
• Business Person (e.g. Finance): “What’s the definition of ‘Regional Sales’?”
• Data Architect: “What is the approved data structure for storing customer data?”
• Auditor: “How was ‘Total Sales’ calculated? Show me the lineage.”
• Data Warehouse Architect: “What are the source-to-target mappings for the DW?”
• Business Person (e.g. HR): “How can I get new staff up to speed on our company’s business terminology?”

Metadata is Needed by Business Stakeholders
Making business decisions on accurate and well-understood data
80% of users of metadata are from the business, according to the recent DATAVERSITY survey.

Metadata Publication & Reporting – Business Glossary
• A business glossary is a common way to publish business terms & their definitions.
• When sourced from a common repository, these terms are integrated with the wider data landscape.
• Most data modeling tools can take the definitions from logical and/or conceptual data models and publish them in a glossary-style format, via web portals or reports.
Business Term | Abbreviation | Data Steward | Security Level | Definition
BFPO Number | BFPO Num | Accounting | Unclassified | BFPO stands for British Forces Post Office. A BFPO Number can be used in UK and overseas addresses.
Interest | Int | Finance | Unclassified | The growth in capital of a monetary investment
PO Box | POB | Accounting | Unclassified | A numbered box in a post office assigned to a person or organization, where mail for them is kept until collected
A feedback mechanism is important to gather valuable input & updates from users.
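A minimal Python sketch of glossary publication, assuming the terms have already been exported from a logical or conceptual model into simple records: it renders them as an alphabetized CSV report (one possible glossary-style format) and offers an abbreviation lookup of the kind a new employee might use. GlossaryTerm, publish_glossary_csv, and expand_abbreviation are hypothetical names; real modeling tools publish through their own report and portal features.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class GlossaryTerm:
    term: str
    abbreviation: str
    steward: str
    security_level: str
    definition: str


# Entries from the Business Glossary slide; in practice these would be read
# from a model export rather than typed in by hand.
TERMS = [
    GlossaryTerm("BFPO Number", "BFPO Num", "Accounting", "Unclassified",
                 "BFPO stands for British Forces Post Office. "
                 "A BFPO Number can be used in UK and overseas addresses."),
    GlossaryTerm("Interest", "Int", "Finance", "Unclassified",
                 "The growth in capital of a monetary investment"),
    GlossaryTerm("PO Box", "POB", "Accounting", "Unclassified",
                 "A numbered box in a post office assigned to a person or "
                 "organization, where mail for them is kept until collected"),
]


def publish_glossary_csv(terms):
    """Render terms as CSV, sorted alphabetically, e.g. for a portal or report."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Business Term", "Abbreviation", "Data Steward",
                     "Security Level", "Definition"])
    for t in sorted(terms, key=lambda t: t.term.lower()):
        writer.writerow([t.term, t.abbreviation, t.steward,
                         t.security_level, t.definition])
    return buffer.getvalue()


def expand_abbreviation(terms, abbreviation):
    """Return the business term behind an abbreviation (case-insensitive), or None."""
    lookup = {t.abbreviation.lower(): t.term for t in terms}
    return lookup.get(abbreviation.lower())


print(publish_glossary_csv(TERMS))
print(expand_abbreviation(TERMS, "pob"))
```

The csv module quotes the PO Box definition automatically because it contains a comma, so the report opens cleanly in a spreadsheet.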
Metadata Publication & Reporting – Lineage
• Data lineage can be visualized through a web portal or reports.
• With web-based reporting, users can drill down into each data source and investigate further lineage.

Metadata Publication & Reporting – Data Structures
• Having a common view of standard data structures is helpful for data architects, developers, etc.
• This can all be sourced from a data model.
Table Name | Column Name | Attribute Name | Data Type | Nullability | Primary Key | Definition
CUSTOMER | CUST_ID | Customer Identifier | VARCHAR(20) | NOT NULL | Yes | Customer ID is the unique identifier that locates a customer
CUSTOMER | F_NAME | First Name | VARCHAR(30) | NOT NULL | No | The given name of an individual
CUSTOMER | L_NAME | Last Name | VARCHAR(40) | NOT NULL | No | The family name of an individual
ORDER | ORDER_ID | Order Identifier | VARCHAR(10) | NOT NULL | Yes | The number assigned to an order from the FIX10 system that locates a unique order
Etc.

Data Models can provide “Just Enough” Metadata Management
• While data modeling tools are not metadata repositories, nor designed to be, they offer many features shared with these repository solutions:
  • Metadata storage, data lineage visualization, business glossary, integration with BI tools, ETL tools, etc.
• Metadata repositories have a broader range of metadata sources & dedicated metadata management support.
• And data modeling tools, of course, have the added benefit of doing data modeling!
• A further benefit is that much of the needed metadata already resides in these data models.
[Table: feature comparison of Data Modeling Tools (e.g. Erwin, SAP PowerDesigner, Idera ER/Studio), Metadata Repositories (e.g. ASG, Adaptive), Data Governance Tools (e.g. Collibra, Diaku), and Spreadsheets, each marked X or x against: Metadata Storage; Metadata Lifecycle & Versioning; Data Modeling; Metadata Discovery & Integration w/ Other Tools; Customizable Metamodel; Data Lineage Visualization; Business Glossary]

Key Components of Metadata Management
Metadata Strategy
• Alignment with business goals & strategy
• Identification of & feedback from key stakeholders
• Prioritization of key activities aligned with business needs & technical capabilities
• Prioritization of key data elements/subject areas
Metadata Capture & Storage
• Identification of all internal & external metadata sources
• Population/import mechanism for all identified sources
• Identification of existing metadata storage
• Definition of enterprise metadata storage strategy
Metadata Integration & Publication
• Identification of all technical metadata sources
• Identification of key stakeholders & audiences (internal & external)
• Integration mechanism for key technologies (direct integration, export, etc.)
• Publication mechanism for each audience
• Feedback mechanism for each audience
Metadata Management & Governance
• Metadata roles & responsibilities defined
• Metadata standards created
• Metadata lifecycle management defined & implemented
• Metadata quality statistics defined & monitored
• Metadata integrated into operational activities & related data management projects
• Communication Plan developed

Implementing a Metadata Strategy
• A successful metadata strategy draws on input from several areas:
  • Business Drivers & Motivation
  • Stakeholders & Audience
  • Metadata Management Maturity
  • Metadata Sources & Technology

Stakeholder Feedback
• Determine key business issues & drivers through direct feedback, for example:
  • “There is limited ownership or enforcement of common practices and standards across the projects.”
  • “I didn’t know we had any documented data standards.”
  • “We have 15 customer databases, with many duplications.”
  • “I just joined the company and don’t understand all of the acronyms!”
  • “I need a central, accurate view of all my customers worldwide.”
  • “There was an error in reporting products by customer & region that was noticed by upper management.”
  • “$12m has been spent on projects to clean up the data over the past 2-3 years.”
  • “Where do I go to get the definition of ‘default banking standard’?”
  • “Key subject matter experts are relied upon to review detailed data from various systems to ensure accuracy.”
  • “What are the data structures used in the application?”

Mapping Business Drivers to Metadata Management Capabilities
Business Drivers
• External Drivers: Digital Self Service; Online Community & Social Media; Increasing Regulatory Pressures; Community Building; Brand Reputation
• Internal Drivers: Targeted Marketing; 360 View of Customer
Stakeholder Challenges
• Integrating Data: siloed systems; no common view of key information
• Efficient IT: system redundancy; no reuse or standards
• No Audit Trails: no lineage of changes; fines had been levied in the past for lack of compliance
• Data Quality Issues: bad customer info causing brand damage; completeness & accuracy needed
• Cost of Data Management: manual entry increases costs
• Lack of Business Alignment: data spend not aligned to business plans; business users not involved with data
• Big Data Exploitation: exploiting unstructured data; access to external & social data
Metadata Capabilities: Metadata Strategy; Metadata Capture & Storage; Metadata Integration & Publication; Metadata Management & Governance
[Figure: “heat map” of priorities showing which stakeholder challenges each metadata capability addresses]

Inventory & Usage Mapping
• It’s also important to determine which teams are using these technologies to create a “heat map” of usage & priority.
[Table: metadata sources marked X against the teams that use them (Leadership, Sales, Finance, Marketing, Support, R&D, HR, Legal, Compliance); sources include relational databases (MySQL, Oracle, SQL Server, Sybase, etc.), BI tools (Tableau, Qlik, etc.), and open data (Data.gov agricultural data, etc.)]

Metadata Roles & Responsibilities
• It’s important to establish formal roles & responsibilities for your metadata effort.
• Some roles may be part-time and some full-time, but all should be clearly defined and communicated so that staff understand, and are accountable for, their roles.
• Executive Sponsor/Champion: Understands & communicates the importance of metadata management across the organization.
• Steering Group: As part of a metadata management effort, or part of a larger data governance effort, the steering group prioritizes & sets direction for key activities.
• Data Stewards: Responsible for business definitions & rules for key data elements.
• Metadata Repository Administrator: Manages the administration, population, and interfaces of a metadata repository.
• Metadata Publicist: Establishes reports & publication methods for end users.
• Metadata Consumers: Actively use metadata as part of their daily jobs, and are held accountable for using published standards. Examples include:
  • Data Modelers
  • Developers
  • Business Users
  • Report Developers
  • Etc.

Monitoring Metadata Quality & Metrics
• Metadata is a key driver of data quality, and to support this, the metadata itself must be of high quality.
• To ensure that quality metadata is maintained, it must be actively managed and monitored. Dashboards & reports can be used to monitor key quality indicators.
• Key metadata quality indicators include:
  • Completeness: e.g. Do definitions exist for all key data elements?
  • Accuracy: e.g. Are current definitions correct? Do data types accurately represent currently implemented standards?
  • Currency/Timeliness: e.g. Are metadata definitions current or outdated?
  • Consistency: e.g. Are metadata standards defined, published & implemented consistently across the organization?
  • Accountability: e.g. Are data stewards or owners defined?
  • Integrity: e.g. Are linkages and relationships established between critical metadata items?
  • Privacy: e.g. Is any metadata subject to privacy restrictions?
  • Usability: e.g. Are people actually using this metadata?

Summary
• Metadata is more important than ever
• Data models are a rich source of metadata
• While metadata repositories are valuable, data models & associated functionality can often provide “just enough” metadata management:
  • Business definitions
  • Technical data structures
  • Data lineage & impact analysis
  • Visual models
• Organizational considerations are critical to achieve success:
  • Understanding business drivers
  • Defining roles & responsibilities
  • Monitoring metadata quality & metrics
• Have fun! Metadata is for the cool kids.

About Global Data Strategy, Ltd
Data-Driven Business Transformation
• Global Data Strategy is an international information management consulting company that specializes in the alignment of business drivers with data-centric technology.
• Our passion is data, and helping organizations enrich their business opportunities through data and information.
• Our core values center around providing solutions that are:
  • Business-Driven: We put the needs of your business first, before we look at any technology solution.
  • Clear & Relevant: We provide clear explanations using real-world examples.
  • Customized & Right-Sized: Our implementations are based on the unique needs of your organization’s size, corporate culture, and geography.
  • High Quality & Technically Precise: We pride ourselves on excellence of execution, with years of technical expertise in the industry.
Business Strategy Aligned With Data Strategy
Visit www.globaldatastrategy.com for more information

Contact Info
• Email: donna.burbank@globaldatastrategy.com
• Twitter: @donnaburbank @GlobalDataStrat
• Website: www.globaldatastrategy.com
• Company LinkedIn: https://www.linkedin.com/company/global-data-strategy-ltd
• Personal LinkedIn: https://www.linkedin.com/in/donnaburbank

White Paper: Emerging Trends in Metadata Management
Free Download
• Download from www.dataversity.net
• Under ‘Whitepapers’

DATAVERSITY Training Center
Online Training Courses
New Metadata Management Course
• Learn the basics of metadata management and practical tips on how to apply metadata management in the real world. This online course hosted by DATAVERSITY provides a series of six courses:
  • What is Metadata
  • The Business Value of Metadata
  • Sources of Metadata
  • Metamodels and Metadata Standards
  • Metadata Architecture, Integration, and Storage
  • Metadata Strategy and Implementation
• Purchase all six courses for $399 or individually at $79 each.
• Other courses available on Data Governance & Data Quality
Visit: http://training.dataversity.net/lms/

Lessons in Data Modeling Series
Join us next time
• December 6th: Data Modeling for XML and JSON

Questions? Thoughts? Ideas?