THE DATA MANAGEMENT COOKBOOK
A POCKET GUIDE FOR IMPLEMENTATION OF DATA MANAGEMENT

Irina Steenbeek

“All roads lead to data.”

Copyright © 2018 Irina Steenbeek. All rights reserved.
Published by Data Crossroads. www.datacrossroads.com
First edition, 2018.

This book or any portion thereof may not be reproduced or used in any manner whatsoever without the express written permission of the publisher except for the use of brief quotations in a book review.

Book design by Natalia Zhuravska, Dreamalizer.
ISBN: 1984149938
ISBN-13: 978-1984149930

Table of contents

Introduction
Setting up the table: FIND YOUR DRIVER
First course: apéritif. DEFINE YOUR DATA NEEDS
Second course: appetizers. DIVIDE TASKS AND RESPONSIBILITIES
Third course: entree. ORGANIZE YOUR DATA HOUSE
    Part 1: Understand your data
    Part 2: Locate your data
    Part 3: Manage data flow
    Part 4: Improve data quality
Fourth course: main course. ASSESS THE GAPS
Fifth course: dessert. KEEP GOING
EXTRA MATERIALS:
    Case study: Data Quality
    Step-by-step checklist

Introduction

The amount of data available in your company and in the business environment grows exponentially. Thus the task of keeping the data under control becomes more and more difficult with time.

A lot of companies realize that data is an important asset and has to be managed accordingly. They would also like to get value from data. Everyone wants to be ‘data-driven’ these days. What lies beneath this idea is the wish to make the decision-making process easier and more effective. It means delivering the required data of acceptable quality to the decision makers when and where they need it.

In short: a lot of companies understand the vital necessity of proper management of their data. The main question now is: how do you put this into practice?

Knowing the potential of your data and managing it correctly is the key to a successful business.
As a result of well-implemented data management, you will be able to reduce risks and costs, increase efficiency, and ensure business continuity and successful growth.

We propose a 5-step system which will guarantee successful implementation of data management.

In this book, we invite you for a five-course dinner. During each course we will explain the steps of our 5-step system one by one. The content and the order of the steps are a result of practical experience but, like any recipe, they are more of a guideline than a list of strict instructions. Tastes (and businesses) differ, so feel free to choose and adjust, make iterations, or even skip steps, if that fits your purpose.

A FEW THINGS YOU SHOULD KNOW ABOUT COOKING DATA

1. There are several recognized international guidelines on setting up data management. These are:
• Data Management Body of Knowledge, 2nd edition (DAMA-DMBOK 2), by DAMA International [1];
• Data Management Capability Assessment Model (DCAM), by EDM Council [2];
• The Open Group Architecture Framework, version 9.1 (TOGAF 9.1), by The Open Group [3].
2. The good news is: you do not have to read all of them in detail. Our method contains the essence of the above-mentioned guidelines. It covers the feasible minimum of information that you need for an effective implementation of data management in your company.
3. Regardless of their size, most companies are dealing with the same limited list of urgent tasks in the area of data management.
4. You need to define which data management functions are feasible and which match your company’s profile and requirements. There is no one particular approach that has been widely accepted by the data management community. You need to find your own way, with our support, of course!

Good luck, or should we say: ‘bon appétit’!

Setting up the table
FIND YOUR DRIVER

Before diving into the development of a data management function, let us align our understanding of the basics.
TOGAF 9.1 stipulates that ‘business function delivers business capabilities closely aligned to the organization’ [4]. It also defines ‘capability’ as ‘an ability that organization, person, or system possesses. Capabilities…typically…require a combination of organization, people, processes and technology to achieve’ [5].

The following conclusion derives from these statements: data management is a business function, and in order for it to operate properly, you need to combine organization, processes, technology, and data.

First things first

To activate the growth of your business, you have to understand where exactly you need to focus your energy. The key is to choose the most important and influential business areas for generating overall success.

A data management business function almost certainly already exists in your organization, in some formal or informal shape. Usually, the idea to set up a formal data management function does not just ‘come up’; it comes from a long process of thinking, analyzing and evaluating the needs of the company. This all leads to a certain point in time when the necessity of setting up or extending a formal implementation of this function cannot be overlooked any longer. Very often, this realization comes along with some urgent challenges. These challenges are, to some extent, the business drivers we are talking about.

Choosing the right tableware

You probably already have a few ideas in mind about what the main drivers for your company might be, but let us assist you in structuring these thoughts. It is crucial to have a clear goal before you start ‘cooking’.

1. Create a list of all possible drivers. Think of:
• regulatory compliance (e.g. GDPR requirements);
• data quality issues;
• business intelligence and data warehousing activities;
• big data perspective;
• predictive analytics;
• impact analysis of changes in software and reporting practices;
• business continuity.
2.
Investigate the most attractive selling points of your data management plan. These could be:
• cost reduction;
• risk reduction;
• improvement of organizational efficiency and productivity;
• protection and improvement of the organization’s reputation.
3. Reduce the list to the 1-2 most important drivers.
4. Look for ‘sponsors’ for the idea(s) amongst influential stakeholders.
5. Sell your ideas to your management.

After finishing this very first step, you should have a pretty clear idea of the main drivers and goals for the data management initiative. Hopefully you have also become acquainted with a few main stakeholders, which will come in very handy during the following steps.

First course: apéritif
DEFINE DATA NEEDS

All business units within your company deal with data in one way or another. They produce data, transform and transport it, and, most importantly, they use it for making decisions. In every business, data is the main means of communication.

Ingredients
• data stakeholders
• data challenges
• data requirements
• data quality

Preparation

BEFORE YOU START
Every company deals with some issues around data. It is often difficult to find volunteers who want to take on the responsibility for managing them. Uncertainty about such responsibilities will result in a waste of time and resources.

INSTRUCTIONS
1. Identify all stakeholders within your company. They are your main partners in making your data manageable. Usually the main stakeholders are top management, IT, finance, sales, production, and other business units.
2. Collect and align the requirements of different stakeholders. Do not be surprised if all of you have different needs and requirements for the data, its quality, the frequency of its delivery and the tools and devices involved in data delivery.
RESULTS
As a result of your efforts you will:
• know who the most influential stakeholders are and how to approach them;
• have improved the efficiency of communication with your business partners;
• be able to protect your future business and information needs and requirements.

DELIVERABLES
1. Stakeholders’ map and assessment.
2. Communication approach.
3. Business and data requirements.

It is advisable to revise data requirements at least once a year. There are several reasons for this, the main one being that regulators may issue new requirements almost every year, so your management reports will need regular updates. You should organize the revision before the start of the budgeting cycle. By this time, you will have better insight into additional investments regarding new technologies, applications, projects, developments of reporting, etc.

Second course: appetizer
DIVIDE TASKS AND RESPONSIBILITIES

Data management is a shared responsibility between data professionals and other stakeholders [6]. You need to establish data governance rules, starting with defining the main data management principles.

Ingredients
• data management principles
• data governance
• data owner
• shared responsibility
• rules, roles, and tasks

Preparation

BEFORE YOU START
The finance department often sees data as its responsibility, although you know now that every other department also has its own data needs. Assigning a data owner and defining a clear set of accountabilities will result in more efficient data processing.

INSTRUCTIONS
1. Set up data management principles.
There are a lot of principles which your company can adopt and implement. For data management, you need to choose those which match your company’s goals and culture.
2. Agree on governance rules and procedures.
This action will allow you to define accountable functions per specific data management task.
Data governance consists of:
• roles and responsibilities;
• data management tasks assigned to specific roles;
• processes and procedures;
• governance bodies.
This step is iterative. During the first stage, when the data management function is not fully designed, you will compile a preliminary list of tasks and allocate responsibilities. Later on you might need to revise this list.
3. Identify the function responsible for data management.
Although data management is a shared responsibility, you still need to decide who will have the ultimate accountability for the coordination of the tasks. There is no agreed vision on the place of the data management function in the company. Some companies consider it an IT function, some assign it directly to the CIO (if present), others give the responsibility to the financial business unit. It is your company’s choice. The most significant challenge for the person in this role is balancing the (often conflicting) interests of different stakeholders and guiding all of them to common success.
4. Identify data-related roles, including the owners’ accountabilities.
Aside from ‘data owners’, you must have heard of ‘system owners’, ‘process owners’, ‘product owners’, etc. The distinction between such functions is not always clear. It is crucial to align the responsibilities of the various roles and to assign them to business functions.

RESULTS
As soon as the data management principles, rules and roles are set up and agreed upon, you will:
• know who you can approach to discuss your data-related (including data quality) issues and solutions;
• be aware of all the concerns the main data stakeholders have;
• be sure that all the issues with data will be resolved according to an agreed procedure.

DELIVERABLES
1. Data management principles.
2. Data policy, governance bodies, procedures, roles, responsibilities.
3. Matrix: data management tasks vs roles.
4. Matrix: roles vs functions.
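To make the ‘data management tasks vs roles’ matrix more tangible, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the task names, the role names, and the ‘A’ (accountable) / ‘R’ (responsible) marking are invented, and the rule that each task should have exactly one accountable role is a common convention rather than something this guide prescribes.

```python
# Hypothetical task-vs-role matrix. "A" marks the accountable role and
# "R" a responsible (contributing) role. Tasks and roles are invented examples.
TASKS_VS_ROLES = {
    "maintain business glossary": {"data owner": "A", "data steward": "R"},
    "resolve data quality issues": {"data owner": "A", "IT": "R", "data steward": "R"},
    "document data lineage": {"IT": "R", "data steward": "R"},
}


def tasks_without_single_owner(matrix):
    """List the tasks that do not have exactly one accountable ("A") role."""
    return [
        task
        for task, roles in matrix.items()
        if list(roles.values()).count("A") != 1
    ]


if __name__ == "__main__":
    # Tasks listed here still need an accountability decision.
    print(tasks_without_single_owner(TASKS_VS_ROLES))
```

A spreadsheet serves the same purpose. What matters is that gaps in accountability become visible before the governance rules are agreed.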
Third course: entree
ORGANIZE YOUR DATA HOUSE
Part 1: UNDERSTAND YOUR DATA

Ingredients
• professional language
• internal and external communication
• business glossary
• report flow

Clear communication between your colleagues and business partners is crucial in any business. You want to spend less time explaining what you mean and avoid any unnecessary duplications in your reports. How do you achieve this?

Preparation

BEFORE YOU START
Speaking the same language with your co-workers and external parties is not as easily achieved as it seems. There are usually two issues which you can encounter while communicating on a professional level:
• You use the same term as your colleague does, but it means something entirely different.
• You use different terms which all mean the same thing.
This is often the case with internal as well as external communication, and it is also reflected in the quality of the data and reports that are being exchanged. The reports received by the stakeholders might not always be as comprehensible to them as you intend. This, in turn, can have an undesirable effect on their decision-making process. It also creates problems when reconciling reports and figures from different departments and when building an enterprise data warehouse (DWH) and business intelligence (BI) solution.

INSTRUCTIONS
1. Create a business glossary by analysing:
• company policies;
• regulations relevant for your business;
• main reports.
2. Agree with stakeholders on the definitions of terms.
3. Create a catalogue of main management reports circulating in your company.
4. Create a report flow.

RESULTS
By now, you should be:
• aware which and how many reports contain duplicate information;
• able to optimize the number of your reports and remove the duplicate information;
• able to reduce the effort of reconciling reports from different departments;
• able to communicate more productively and efficiently with your co-workers and business partners.

DELIVERABLES
1. Company’s business glossary.
2.
Report catalogue.
3. Report flow.

Third course: entree
ORGANIZE YOUR DATA HOUSE
Part 2: LOCATE YOUR DATA

Ingredients
• critical data elements
• applications
• a ‘golden’ source

Your dream is that pressing one button will make all required information magically appear on your screen. Unfortunately, such a tool does not exist yet. It is not about the tooling though: the key to easy access to data is knowing its exact location.

Preparation

BEFORE YOU START
You have already identified how many reports are circulating in your company. Not all the data in these reports is equally critical to your business. Now it is time to limit your efforts to a feasible minimum by identifying the critical data elements that constitute the reports. It is also time to establish how many systems and applications are in use in your company.

INSTRUCTIONS
The main task is to identify the location of the most important elements within the applications involved in data processing, and to pinpoint the ‘golden’ sources where the data elements were initially put into processing.
1. Identify critical data elements that have the biggest influence on your work results.
2. Catalogue all of the main applications, services, and interfaces.
3. Map applications and critical data elements.
4. Identify the ‘golden’ sources for critical data elements.
5. Extend the list of data elements and repeat actions 1 to 4.

RESULTS
By now, you should know how to:
• search for the sources of information faster and more efficiently;
• save time on reconciliation of reports as you can acquire data from the ‘golden’ sources;
• decrease the number of processes, reduce duplicate applications, and save money in the process.

DELIVERABLES
1. Catalogues of:
• (critical) data elements;
• applications;
• ‘golden’ sources.
2. Matrices:
• application vs (critical) data elements;
• ‘golden’ sources vs (critical) data elements.
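The two matrices above can be kept in a spreadsheet, but a small script shows the idea just as well. The Python sketch below is illustrative only: the application names, the data element names, and the ‘created’ / ‘copied’ labels are hypothetical. It also assumes that a ‘golden’ source is the single application where an element is first entered, which is one reading of ‘where the data elements were initially put into processing’.

```python
from collections import defaultdict

# Hypothetical inventory: which application holds which critical data element,
# and whether the element is first entered there ("created") or only "copied".
# All names are invented for illustration.
USAGE = [
    ("CRM", "customer_name", "created"),
    ("Billing", "customer_name", "copied"),
    ("Billing", "invoice_amount", "created"),
    ("DWH", "customer_name", "copied"),
    ("DWH", "invoice_amount", "copied"),
]


def build_matrix(usage):
    """Return {element: {application: role}}, the 'application vs element' matrix."""
    matrix = defaultdict(dict)
    for application, element, role in usage:
        matrix[element][application] = role
    return dict(matrix)


def golden_sources(matrix):
    """Return {element: application} for elements with exactly one point of creation."""
    result = {}
    for element, apps in matrix.items():
        creators = [app for app, role in apps.items() if role == "created"]
        if len(creators) == 1:
            result[element] = creators[0]
    return result


if __name__ == "__main__":
    matrix = build_matrix(USAGE)
    print(golden_sources(matrix))
```

Elements that end up with no creator, or with more than one, are left out of the result on purpose: they are the ones to discuss with the data owners before a ‘golden’ source is recorded.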
Third course: entree
ORGANIZE YOUR DATA HOUSE
Part 3: MANAGE THE DATA FLOW

Ingredients
• data flow
• data transformation
• data life cycle
• data lineage

Once in a while, your internal audit or external regulators ask you to explain how you arrived at certain data in your reports. The same data can lead to different results, depending on the transformations it undergoes along the way. The key to knowing your data is understanding these transformations and the way the data travels from the source to the end user.

Preparation

BEFORE YOU START
You and your colleagues from other departments use the same data, but sometimes it leads to different outcomes. Who is right? It costs a lot of time and effort to investigate what causes these issues, but such investigations still happen almost daily. Due to new regulations (e.g. GDPR), the number of such tasks will only grow.
Maintaining data flow information is one of the most complicated data-related tasks that many companies deal with. The bigger the company, the more complex and costly the solution will be. Every company has to find a feasible solution depending on its size and resources. Success does not lie in the right software solution. The key is to get all staff involved in sharing their information and willing to keep it maintainable.

INSTRUCTIONS
1. Document or find a technical solution for the presentation of the data flow.
Data lineage describes the changes that data undergoes from its source to its end user. The main components of data lineage are business processes, application landscape, business roles, and technical metadata.
2. There are three possible solutions for documenting data lineage:
Solution 1: Data lineage by design.
This requires highly automated software solutions that exist on the market. There are several providers that offer such solutions.
Solution 2: Descriptive data lineage.
You analyze and describe your business processes, applications, aggregated data, and controls.
The most important challenges are:
• centralization of the information to make it available to all stakeholders;
• involvement of all stakeholders in the process;
• making this process maintainable.
Solution 3: A combination of solutions 1 and 2.

RESULTS
Making data flow information available will result in:
• saving a lot of time that would otherwise be spent investigating data quality (and other) issues;
• decreased operational risk due to the decreased number of issues with data quality;
• increased work efficiency of your staff.

DELIVERABLES
The main deliverables will depend on the type of solution you have chosen. As a minimum, you have to ensure centrally located and well-described documentation that links the following components of data lineage:
• business processes;
• business roles;
• applications;
• (critical) data elements;
• business rules and controls.

Third course: entree
ORGANIZE YOUR DATA HOUSE
Part 4: IMPROVE DATA QUALITY

Ingredients
• data quality
• manual corrections
• data quality dimensions
• measurement criteria

The saying ‘garbage in, garbage out’ is a precise description of the relationship between the quality of your data and the information on which decisions are based. You need to ensure the quality of your data in order to make your business grow and prosper.

Preparation

BEFORE YOU START
Correcting errors made during data input or processing costs a lot of time and energy: it means extra hours for the staff doing manual adjustments, repeated corrections at least on a monthly basis, etc. You want your staff to focus on the analysis of the data, instead of investing their time in endless corrections.

INSTRUCTIONS
1. Define who will be responsible for data quality within business units.
Unless, of course, you have already done it while setting up data governance (Step 2). If you have, well done: now you can focus on the next points!
2. Define critical data elements for which you will examine and improve data quality.
3.
Define the measurement criteria for data quality.
There are several data quality dimensions, such as accuracy, completeness, consistency, timeliness, etc. [7] For the maximum benefit of your company, you need to prioritize the criteria which best reflect the needs of your business.
4. Define techniques you wish to use for the ‘root-cause’ analysis.
There can be a lot of different reasons why your data is of insufficient quality. Some of the common sources of such issues are:
• data entry process;
• data processing, including manual operations;
• system (mal)functions. [8]
5. Set up a data quality issue log.
6. Start executing internal data quality initiatives.

RESULTS
Working with data of better quality, you will:
• reduce operational risk;
• improve the decision-making process;
• save resources on reconciliations and fixing reporting issues.

DELIVERABLES
1. A list of data quality responsibilities assigned to business functions.
2. A catalogue of data issues.
3. Data quality issues resolution plan.
4. Data quality issues resolution business process embedded in daily activities.
5. Fine-tuned analysis techniques for data quality issues.

Fourth course: main course
ASSESS THE GAPS

It is highly probable that some data management functions already exist in your company. Think, for example, of information security. At this point you probably have a plan for further development. Now try to evaluate your current position. Once you know where you stand, you can start making changes towards your goal.

Ingredients
• business function
• data management capability
• gap analysis
• roadmap

Preparation

BEFORE YOU START
It is important to be on the same page with your business partners regarding the feasibility and necessity of the changes. You also need to know how to achieve the required results using minimal resources and effort.

INSTRUCTIONS
You have to make a gap analysis between the current situation and the situation ‘to be’. The gap analysis consists of the following steps:
1.
Define which current business functions in your company are related to data management.
2. Finalize the list of the data management functions you wish to develop (the revision we were talking about on p. 13).
3. Find the gaps between ‘now’ and ‘to be’.
You can use the checklist at the end of this book (Appendix 2) to assess which data management capabilities are in need of further development.
4. Outline a roadmap to close these gaps.

RESULTS
As a result, you and your company’s management will have a clear strategic vision on:
• which data management capabilities your company needs;
• how long it will take to reach the desired situation and what it will cost;
• what your company will gain as the result of this effort.

DELIVERABLES
1. An overview of the required data management functions within your company.
2. A gap analysis between the current and desired future situation.
3. A roadmap.
4. A filled-in checklist for data capabilities (see Appendix 2).

Fifth course: dessert
KEEP GOING

You already have an approved roadmap and an idea of what your company can achieve. It is time to put theory into practice!

Ingredients
• maintenance and development

Preparation

BEFORE YOU START
All the preliminary preparation is done: stop dreaming and start doing.

INSTRUCTIONS
There are various approaches you can choose from. You can work on a project basis, or you can go Agile-style. It does not matter which approach you choose, as long as you keep the following key success factors in mind:
• Data management needs to be set up as a business function.
• Data management is a shared responsibility: the staff from different departments need to be involved on a daily basis.
• Top management has to be the main sponsor and supporter of the implementation.
• Once set up, it requires permanent maintenance and development.
• Some tasks of data management (e.g. data quality, data flow) are ongoing processes.
• There are not many highly experienced data management specialists on the market. It would be wise to train and develop your own staff.
• You need to concentrate on small deliverables that can immediately improve your current processes and deliver results.

RESULTS
As a result of this last step, you will:
• get a clear vision of the tasks to be done over different time horizons;
• know which deliverables and results you have to request from your staff;
• know when you can expect improvements in your daily work, and which investments and cost reductions you need to plan.

DELIVERABLES
The main deliverable is a clear operational model for data management in your company.

Notes
1. DAMA International. DAMA-DMBOK: Data Management Body of Knowledge, 2nd edition. Technics Publications, 2017.
2. EDM Council. “Data Management Capability Assessment Model, DCAM 1.2.2. (Assessor’s Guide)” EDM Council, 23 Jan. 2018, www.edmcouncil.org/dcam.
3. The Open Group. “TOGAF Version 9.1”, The Open Group Standard no. G116, 2011.
4. TOGAF 9.1, 23.
5. TOGAF 9.1, 23.
6. DAMA-DMBOK 1, 5.
7. DAMA-DMBOK2, 465.
8. DAMA-DMBOK2, 467-469.

Works cited
DAMA International. DAMA Guide to the Data Management Body of Knowledge, 1st edition. Technics Publications, 2010.
DAMA International. DAMA-DMBOK: Data Management Body of Knowledge, 2nd edition. Technics Publications, 2017.
EDM Council. “Data Management Capability Assessment Model, DCAM 1.2.2. (Assessor’s Guide)” EDM Council, 23 Jan. 2018, www.edmcouncil.org/dcam.
The Open Group. “TOGAF Version 9.1”, The Open Group Standard no. G116, 2011.

Extra materials
CASE STUDY: DATA QUALITY
STEP-BY-STEP CHECKLIST

Appendix 1
Case study

DRIVER: DATA QUALITY

Define data needs
1. Define main stakeholders that have concerns regarding data quality (DQ) issues.
2. Identify communication strategies with the stakeholders.
3. Document stakeholders’ data-related needs and requirements.

Divide tasks and responsibilities
4.
Define which parts of your company are to be involved.
5. Capture data (management) principles as the basis for DQ governance.
6. Make a self-assessment of current data management capabilities.
7. Identify required DQ management processes, procedures, tasks, and roles.
8. Develop or adjust the data management roadmap, strategy, and policy with regard to DQ tasks.

Organize your data house
9. List reports that should be in scope.
10. Identify critical data domains and elements.
11. List reports involved in the scope.
12. Create a DQ issues log.
13. Specify definitions of critical data elements by putting them into the business glossary and/or by creating conceptual or logical data models.
14. Prepare an action plan for DQ issues resolution according to the DQ governance procedures.
15. In order to execute root-cause analysis, you need to document data lineage for the critical data elements. This process can be broken down into the following steps:
a. identifying related business processes;
b. creating a catalogue of applications;
c. identifying the location of data elements in applications by documenting database metadata;
d. documenting business rules;
e. analyzing existing business processes and data quality controls.
16. Identify data quality requirements of different data users.
17. Align your activities with main stakeholders, especially IT.
18. Clean historical data if needed.
19. Develop data quality checks and controls, based on DQ requirements.
20. Adjust existing data processing flows.

Assess the gaps
21. Re-assess the initial plans through gap analysis.
22. Based on gap analysis, verify the feasibility of the chosen approach.

Take action
23. Realize your strategy for fit-for-purpose data delivery with the required level of quality.

Appendix 2
Checklist

DRIVER
• Regulatory compliance, e.g.
GDPR
• Data quality issues
• Implementation of advanced analytics techniques
• Improvement of business and financial planning and forecasting
• Other

DATA NEEDS
• Stakeholder map
• Stakeholder assessment
• Stakeholder communication approach
• Business needs and requirements
• Data needs and requirements

TASKS & RESPONSIBILITIES
• Data management principles
• Data policy
• Data governance roles and responsibilities
• Data governance procedures
• Data management tasks vs roles
• Data roles vs organizational functions

DATA HOUSE
• Company business glossary
• Report catalogue
• Report flow
• Catalogue of (critical) data elements
• Data models (conceptual and logical)
• Catalogue of applications
• Data dictionary (physical data model)
• Catalogue of ‘golden’ sources
• Matrix application vs (critical) data element
• Matrix ‘golden’ source vs (critical) data element
• Descriptive data lineage (including business processes, roles, applications, data elements, business rules and controls)
• List of data quality responsibilities assigned to business functions
• Catalogue of data issues
• Data quality resolution plan
• Data quality issues resolution process
• Fine-tuned data quality issues analysis techniques

THE GAPS
• Gap analysis between existing and desired (future) data management functions and tasks
• Roadmap on data management development

FINAL PRODUCT
• Implemented operational data management function

A lot of companies realize that data is an invaluable asset and has to be managed accordingly. They would also like to get value from data. Everyone wants to be ‘data-driven’ these days. What lies beneath this idea is the wish to make the decision-making process easier and more effective. It means delivering the required data of acceptable quality to the relevant decision makers when and where they need it.

In short: a lot of companies need to manage their data properly. The main question is: how do you put this into practice?
Knowing the potential of your data and managing it correctly is the key to an effective and successful business. As a result of well-implemented data management, you will be able to reduce risks and costs, increase efficiency, and ensure business continuity and successful growth.

In this book, we invite you for a five-course dinner. During each course we will explain the steps of our 5-step programme which guarantees successful implementation of data management.

ABOUT THE AUTHOR

Dr. Irina Steenbeek is a dedicated and tenacious Senior Data Management, Finance and IT Professional with 15+ years of extensive experience. Her areas of expertise are data management, software implementation, financial and business control, project management, business process re-engineering, and management consulting and training.

Throughout the years, she has worked for various medium and large multinational organizations, including The World Bank, ABN AMRO Bank, Amsterdam Trade Bank, and International Card Services (ICS).

In 2016 she founded Data Crossroads, a consulting agency in the area of data management and predictive analytics. She has developed several models for implementation of data management which are based on industry reference guidelines and are universal for every business. Her approach is highly customizable, and ensures effective results in any type of organization. Data Crossroads connects experts within various industries in order to ensure the highest quality of consultation to all clients.