Big Data: Impacts and Benefits

About ISACA®
With more than 100,000 constituents in 180 countries, ISACA® (www.isaca.org) is a leading global provider of knowledge, certifications, community, advocacy and education on information systems (IS) assurance and security, enterprise governance and management of IT, and IT-related risk and compliance. Founded in 1969, the nonprofit, independent ISACA hosts international conferences, publishes the ISACA® Journal, and develops international IS auditing and control standards, which help its constituents ensure trust in, and value from, information systems. It also advances and attests IT skills and knowledge through the globally respected Certified Information Systems Auditor® (CISA®), Certified Information Security Manager® (CISM®), Certified in the Governance of Enterprise IT® (CGEIT®) and Certified in Risk and Information Systems Control™ (CRISC™) designations. ISACA continually updates and expands the practical guidance and product family based on the COBIT® framework. COBIT helps IT professionals and enterprise leaders fulfill their IT governance and management responsibilities, particularly in the areas of assurance, security, risk and control, and deliver value to the business.

Disclaimer
ISACA has designed and created Big Data: Impacts and Benefits (the “Work”) primarily as an educational resource for governance and assurance professionals. ISACA makes no claim that use of any of the Work will assure a successful outcome. The Work should not be considered inclusive of all proper information, procedures and tests or exclusive of other information, procedures and tests that are reasonably directed to obtaining the same results. In determining the propriety of any specific information, procedure or test, governance and assurance professionals should apply their own professional judgment to the specific circumstances presented by the particular systems or information technology environment.
Reservation of Rights
© 2013 ISACA. All rights reserved. No part of this publication may be used, copied, reproduced, modified, distributed, displayed, stored in a retrieval system or transmitted in any form by any means (electronic, mechanical, photocopying, recording or otherwise) without the prior written authorization of ISACA. Reproduction and use of all or portions of this publication are permitted solely for academic, internal and noncommercial use and for consulting/advisory engagements, and must include full attribution of the material’s source. No other right or permission is granted with respect to this work.

ISACA
3701 Algonquin Road, Suite 1010
Rolling Meadows, IL 60008 USA
Phone: +1.847.253.1545
Fax: +1.847.253.1443
Email: info@isaca.org
Web site: www.isaca.org

Provide feedback: www.isaca.org/Big-Data-WP
Participate in the ISACA Knowledge Center: www.isaca.org/knowledge-center
Follow ISACA on Twitter: https://twitter.com/ISACANews
Join ISACA on LinkedIn: ISACA (Official), http://linkd.in/ISACAOfficial
Like ISACA on Facebook: www.facebook.com/ISACAHQ

Acknowledgments
ISACA wishes to recognize:

Project Development Team
Richard Chew, CISA, CISM, CGEIT, Emerald Management Group, USA
Keith Genicola, KPMG LLP, USA
Brian Li, Ernst & Young LLP, CFE, CMCON, USA
Jothi Philip, CISA, ACA, CISSP, Bank of England, UK
Tichaona Zororo, CISA, CISM, CGEIT, CRISC, CIA, EGIT | Enterprise Governance of IT (PTY) Ltd., South Africa

Expert Reviewers
Joanne De Vito De Palma, BCMM, The Ardent Group LLC, USA
Russell Fairchild, CISA, CRISC, CISSP, PMP, SecureIsle, USA
Rammiya Perumal, CISA, CISM, CRISC, Sumitomo Mitsui Bank, USA
Lily M. Shue, CISA, CISM, CGEIT, CRISC, CCP, LMS Associates LLP, USA

ISACA Board of Directors
Gregory T.
Grocholski, CISA, The Dow Chemical Co., USA, International President
Allan Boardman, CISA, CISM, CGEIT, CRISC, ACA, CA (SA), CISSP, Morgan Stanley, UK, Vice President
Juan Luis Carselle, CISA, CGEIT, CRISC, Wal-Mart, Mexico, Vice President
Christos K. Dimitriadis, Ph.D., CISA, CISM, CRISC, INTRALOT S.A., Greece, Vice President
Ramses Gallego, CISM, CGEIT, CCSK, CISSP, SCPM, Six Sigma Black Belt, Dell, Spain, Vice President
Tony Hayes, CGEIT, AFCHSE, CHE, FACS, FCPA, FIIA, Queensland Government, Australia, Vice President
Jeff Spivey, CRISC, CPP, PSP, Security Risk Management Inc., USA, Vice President
Marc Vael, Ph.D., CISA, CISM, CGEIT, CRISC, CISSP, Valuendo, Belgium, Vice President
Kenneth L. Vander Wal, CISA, CPA, Ernst & Young LLP (retired), USA, Past International President
Emil D’Angelo, CISA, CISM, Bank of Tokyo-Mitsubishi UFJ Ltd. (retired), USA, Past International President
John Ho Chi, CISA, CISM, CRISC, CBCP, CFE, Ernst & Young LLP, Singapore, Director
Krysten McCabe, CISA, The Home Depot, USA, Director
Jo Stewart-Rattray, CISA, CISM, CGEIT, CRISC, CSEPS, BRM Holdich, Australia, Director

Knowledge Board
Marc Vael, Ph.D., CISA, CISM, CGEIT, CRISC, CISSP, Valuendo, Belgium, Chairman
Rosemary M. Amato, CISA, CMA, CPA, Deloitte Touche Tohmatsu Ltd., The Netherlands
Steven A. Babb, CGEIT, CRISC, Betfair, UK
Thomas E. Borton, CISA, CISM, CRISC, CISSP, Cost Plus, USA
Phil J. Lageschulte, CGEIT, CPA, KPMG LLP, USA
Jamie Pasfield, CGEIT, ITIL V3, MSP, PRINCE2, Pfizer, UK
Salomon Rico, CISA, CISM, CGEIT, Deloitte LLP, Mexico

Guidance and Practices Committee
Phil J.
Lageschulte, CGEIT, CPA, KPMG LLP, USA, Chairman
Dan Haley, CISA, CGEIT, CRISC, MCP, Johnson & Johnson, USA
Yves Marcel Le Roux, CISM, CISSP, CA Technologies, France
Aureo Monteiro Tavares Da Silva, CISM, CGEIT, Vista Point, Brazil
Jotham Nyamari, CISA, Deloitte, USA
Connie Lynn Spinelli, CISA, CRISC, CFE, CGMA, CIA, CISSP, CMA, CPA, BKD LLP, USA
Siang Jun Julia Yeo, CISA, CPA (Australia), Visa Worldwide Pte. Limited, Singapore
Nikolaos Zacharopoulos, CISA, DeutschePost–DHL, Germany

ISACA and IT Governance Institute® (ITGI®) Affiliates and Sponsors
Information Security Forum
Institute of Management Accountants Inc.
ISACA chapters
ITGI France
ITGI Japan
Norwich University
Socitum Performance Management Group
Solvay Brussels School of Economics and Management
Strategic Technology Management Institute (STMI) of the National University of Singapore
University of Antwerp Management School
ASIS International
Hewlett-Packard
IBM
Symantec Corp.

Introduction
Big data is both a marketing and a technical term referring to a valuable enterprise asset—information. Big data represents a trend in technology that is leading the way to a new approach in understanding the world and making business decisions. These decisions are made based on very large amounts of structured, unstructured and complex data (e.g., tweets, videos, commercial transactions), which have become difficult to process using basic database and warehouse management tools. Managing and processing the ever-increasing data set requires running specialized software on multiple servers. For some enterprises, big data is counted in hundreds of gigabytes; for others, it is in terabytes or even petabytes, with a frequent and rapid rate of growth and change (in some cases, almost in real time).
In essence, big data refers to data sets that are too large or too fast-changing to be captured, managed and processed within a reasonable elapsed time using traditional relational or multidimensional database techniques or commonly used software tools.

According to COBIT® 5, information is effective if it meets the needs of the information consumer (who is considered a stakeholder). In the case of big data, the enterprise is the stakeholder, and one of its primary stakes is information quality. The stakes can be related to information goals in the COBIT 5 enabler model, which divides them into three subdimensions of quality, described later in this white paper. The better the quality of the data, the better the decisions based on the data—ultimately creating value for the enterprise. Therefore, big data management must ensure the quality of the data throughout the data life cycle.

Data are collected to be analyzed to find patterns and correlations that may not be initially apparent, but may be useful in making business decisions. This process is called big data analytics. These data are often personal data that are useful from a marketing perspective in understanding the likes and dislikes of potential buyers and in analyzing and predicting their buying behavior. Personal data can be categorized as:
• Volunteered data—Created and explicitly shared by individuals (e.g., social network profiles)
• Observed data—Captured by recording the actions of individuals (e.g., location data when using cell phones)
• Inferred data—Data about individuals based on analysis of volunteered or observed information (e.g., credit scores)

The primary objective of analyzing big data is to support enterprises in making better business decisions.
Data scientists and other users analyze large amounts of transaction data as well as other data sources that may be ignored by traditional business intelligence software, such as web server logs, social media activity reports, cell phone records and data obtained via sensors. Data analytics can enable a targeted marketing approach that gives the enterprise a better understanding of its customers—an understanding that will influence internal processes and, ultimately, increase profit, which provides the competitive edge most enterprises are seeking.

This white paper provides an overview of the impact that big data collection and analytics can have on an enterprise. It identifies potential business benefits, challenges, risk, governance and risk management practices, and provides an overview of relevant assurance considerations related to big data analytics.

Impact of Big Data on the Enterprise
Big data can impact current and future process models in many ways. Beyond a business impact, the aggregation of data can affect governance and management over planning, utilization, assurance and privacy:
• Governance—What data should be included and how should governance of big data be defined and delivered? (These topics are explored later in this white paper.)
• Planning—Planning involves the process of collecting and organizing outcomes to:
– Justify process adjustments or improvements which until recently could be identified using specialized research techniques such as predictive modeling.
– Design a trading program predicated on certain conditions that trigger events.
– Encourage target purchase patterns while a buyer is researching products and services.
– Use location-based information in combination with other collected data to guide customer loyalty, route traffic, identify new product demands, etc.
– Manage just-in-time (JIT) inventory based on seasonal or demand changes. For example, a manufacturing enterprise may adjust production levels for a particular item after the part number is not ordered for two consecutive days.
– Manage operations of logistics and transportation firms based on real-time performance.
– Manage unplanned IT infrastructure and policy changes that disrupt the direction of IT support.
• Utilization—Use of big data can vary from one enterprise to another depending on the enterprise’s culture and maturity. A small enterprise may be slower to adopt big data because it may not have the necessary infrastructure to support the new processes involved. Companies such as IBM®, Hewlett-Packard Company (HP) and Amazon.com®, on the other hand, have changed direction over the last few years from selling products to providing services and using information to guide business decisions. Companies that have embraced big data have made the necessary investments to become information mavens capable of identifying new product and service demands using data mining—information that they then turn into a competitive advantage by being the first to market. Infrastructures built to support big data are also cross-marketed to support cloud computing services, in a way making customers business partners (causing the rise of phrases such as “frenemies” and “coopetition”). In other words, big data customers may be competitors in one arena and cooperative partners in another, as with Netflix using the Amazon.com cloud infrastructure to support its media streaming.
• Assurance—Experience leads enterprises to develop better assurance practices. Once leadership develops a strategy that leverages big data, the enterprise can focus on defining an assurance framework to control and protect big data. The main concern for the assurance organization is data quality, addressed by topics such as normalization, harmonization and rationalization.
(These topics are technical and pertinent to publications on tools and techniques, and are not covered in this white paper.)
• Privacy—Privacy protection has always been handled differently by geographic regions, governments and enterprises. Laws protect the privacy of individuals and any information collected about them, even if people share confidential information inappropriately, for example, posting nonpublic or private information (e.g., pictures of credit cards, birthdays, phone numbers, personal preferences) in social media outlets. Regardless of the authenticity of information collected from social media, its collection requires protection from nefarious users as well as over-controlling governments.

Business Benefits of Big Data
Big data opportunities are significant, as are the challenges. Enterprises that master the emerging discipline of big data management can reap significant rewards and differentiate themselves from their competitors. Indeed, research conducted by Erik Brynjolfsson, an economist at the Sloan School of Management at the Massachusetts Institute of Technology (USA), shows that companies that use “data-directed decision making” enjoy a five to six percent boost in productivity.¹ Proper use of big data goes beyond collecting and analyzing large quantities of data; it also requires understanding how and when to use the data in making crucial decisions.

Competitive advantage can be greatly improved by leveraging the right data.
According to a research report by McKinsey,² the potential value from data in the US health care sector could be more than US $300 billion every year, two-thirds of which would be in the form of reducing national health care expenditures by approximately eight percent. Financial benefits can be realized when data management processes are aligned with the enterprise’s strategy, which may require top management involvement to set direction and oversee major decisions.

Big data analytics can positively impact:
• Product development
• Market development
• Operational efficiency
• Customer experience and loyalty
• Market demand predictions

The process for accessing organization-specific commercial insights from big data is shown in figure 1.

Figure 1—Addressing Organization-specific Commercial Insights
[Figure 1 labels: Any Data and Exhaust Data (social media, enterprise records, Data as a Service [DaaS], competitor data); vast amounts of information collected from every imaginable source; Acquire, Organize, Analyze, Discover, Predict, Plan; Business Benefits: Better Decisions, Faster Action, Greater Innovation, Stronger Competitive Advantage]

¹ Swalwell, John; “Big Data and Intelligent Image Capture Platforms,” Technology First, USA, August 2012
² Manyika, James; Michael Chui; Brad Brown; Jacques Bughin; Richard Dobbs; Charles Roxburgh; Angela Hung Byers; “Big data: The next frontier for innovation, competition, and productivity,” McKinsey Global Institute, McKinsey & Company, USA, May 2011

Should the enterprise pursue big data wholeheartedly or start small with target opportunities? Buy or outsource? These are strategies that should be implemented based on the strategic goals and existing capabilities of each enterprise. For enterprises ready to turn big data from a revenue-hemorrhaging liability into a revenue-enhancing asset, a four-tier plan is proposed:
1. Take time to strategize—Work with key stakeholders and business units to understand their data needs.
Incorporate their feedback to improve processes across the business.
2. Think analytically—Improve the analytical support team and ensure that managers have the applications and access they need to examine business-critical information firsthand.
3. Ask for what is needed—Leverage industry-specific applications and software, where available. If needs are not being met, alert the management team and/or industry suppliers.
4. Invest to improve—Arm the enterprise with the appropriate technology, staff and systems/processes needed to optimize information for true business intelligence.

Risks and Concerns With Big Data
Enterprises are investing considerable capital to develop and deploy big data analytics and measurement to obtain an early competitive advantage. Although big data can supply a competitive advantage and other benefits, it also carries significant risk. Now that enterprises have huge amounts of structured and unstructured data available, management should be asking:
• Where should we store the data?
• How are we going to protect the data?
• How are we going to utilize the data safely and lawfully?

In the following section, risk and concerns associated with big data are highlighted.

The concept of big data risk management is still at the infancy stage for many enterprises, and security policies and procedures are still developing in many areas. Numerous business executives might not recognize that the faster and easier it is to access big data, the greater the risk to all of that valuable information. For the data to be utilized productively, executives must pay special attention to corporate data life cycle processes; big data insights are only as good as the data themselves. According to the COBIT 5 information
enabler, the full life cycle of information needs to be considered and different approaches may be necessary, depending on the phase within the life cycle. The COBIT 5 information enabler identifies four different phases (i.e., plan, design, build/acquire and use/operate).

Inaccurate, incomplete or fraudulently manipulated data pose an increasing risk as enterprises become more dependent on the data to drive decision making and assess results. The need to manage data risk within the enterprise may not be clearly communicated and understood at all management levels.

It is essential to point out that addressing big data risk and concerns cannot be seen exclusively as an information technology exercise. Participation from the entire enterprise, including legal, finance, compliance, internal audit and other business departments, allows everyone to focus on the business goals in the planning stage. Enterprises can then focus on both the technical and business aspects of big data.

At times enterprises may resist periodic reviews of big data strategies and security policies and procedures because top management believes that the current practice is “sufficient” and is reluctant to spend more if it is not “necessary.” This philosophy, however, is inaccurate. Security and privacy play an increasingly important role in big data, and all stakeholders should be aware of the implications of storing and cross-analyzing large amounts of sensitive, disparate data. Furthermore, it is imperative to understand that some data should be considered “toxic” in the sense that loss of control over these data could be damaging to the enterprise.
Examples of potentially “toxic” data are:
• Private or custodial information such as credit card numbers, personally identifiable information such as Social Security numbers, and personal health information
• Strategic information such as intellectual property, business plans and product designs
• Information such as key performance indicators, sales figures, financial metrics and production metrics used to make critical decisions

Data vulnerabilities are especially acute for enterprises that rely on personal data that are generated or can be modified by the public. For instance, social media data can be a highly valuable source for assessing customer sentiment, tracking the effectiveness of marketing campaigns and learning more about consumers. However, utilizing this type of personal data will require addressing current uncertainties and points of tension:
• Privacy—Individual needs for privacy vary. Policy makers face a complex challenge while developing legislation and regulations.
• Global governance—There is a lack of global legal interoperability, with each country evolving its own legal and regulatory frameworks.
• Personal data ownership—The concept of property rights is not easily extended to data, creating challenges in establishing usage rights.
• Transparency—Too much transparency too soon presents as much of a risk to destabilizing the personal data ecosystem as too little transparency.
• Value distribution—Even before value can be shared more equitably, more clarity is required on what truly constitutes value for each stakeholder.

To minimize the potential for damages resulting from inaccurate or fraudulent data, enterprises should take inventory of all the data sources they are pulling into their analyses and assess each source for vulnerabilities. Are the data publicly generated? Who has access to the data at any point before they enter the analysis? Are there incentives to manipulate the data?
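The three screening questions above can be recorded per data source so that the inventory is comparable across sources. The following Python sketch is illustrative only: the `DataSource` fields, the example sources, the one-flag-per-question scoring and the "more than one party with access" threshold are all assumptions made for this example and are not taken from COBIT 5 or from this white paper.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    """One entry in a hypothetical data source inventory."""
    name: str
    publicly_generated: bool      # Are the data publicly generated?
    pre_analysis_accessors: int   # Parties with access before the analysis
    manipulation_incentive: bool  # Are there incentives to manipulate the data?


def vulnerability_score(source: DataSource) -> int:
    """Count how many screening questions raise a flag (0 to 3).

    The threshold of more than one accessor is an assumed cut-off.
    """
    score = 0
    if source.publicly_generated:
        score += 1
    if source.pre_analysis_accessors > 1:
        score += 1
    if source.manipulation_incentive:
        score += 1
    return score


def rank_sources(sources):
    """Return sources ordered from most to least flagged."""
    return sorted(sources, key=vulnerability_score, reverse=True)


# Hypothetical inventory used only to exercise the functions.
inventory = [
    DataSource("enterprise sales records", False, 1, False),
    DataSource("social media feed", True, 3, True),
    DataSource("purchased broker data", False, 2, True),
]

for source in rank_sources(inventory):
    print(f"{source.name}: {vulnerability_score(source)}")
```

A simple count like this only orders sources for review; it does not replace a judgment about how damaging a compromised source would be.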
In the case of vulnerable data sources, classification techniques can be employed to detect potentially fraudulent data points and remove them prior to further dissemination.

Strategies for Addressing Big Data Risk
The main strategy for addressing risk is aligning the technology solution to business needs. The COBIT 5 framework addresses this in the goals cascade by aligning stakeholder drivers and stakeholder needs. These needs cascade to the enterprise goals, then to the IT-related goals, and ultimately to the enabler goals. There are seven enablers that should be applied to assist the enterprise in addressing risk and improving its ability to meet its business objectives and create value for its stakeholders.

When new initiatives, such as adoption of big data, are properly aligned to the business, existing governance structures can be easily adjusted to address security, assurance and a general approach to embracing new technologies. These steps should include building the talent base, invoking alignment of information security concerns related to big data, and starting pilot programs to determine whether the need is to build internally or consume benefits of prior big data wisdom. The COBIT 5 people, skills and competencies enabler, which suggests that the enterprise should know what its current skill base is and plan what it needs to be, will be helpful in building the talent base.

Building the talent base internally is a fundamental cornerstone of best practice. Who can understand enterprise culture, processes and the behavior of enterprise data better than staff?
Power users and their tools are an excellent start to:
• Determine what internal resources and capacities are available to digest existing information.
• Determine what tools are needed to enhance the information acquisition and digestion process.
• Address how information will be used to achieve both tactical and strategic goals, if the determination is made that new and/or different information is needed.
• Develop or obtain training programs for the team.
• Determine whether a data scientist is needed.
• Establish realistic expectations and create a tactical plan.

Integrating big data analytics into business risk management and security operations is not an easy task. While big data in general has transformed competitive dynamics in an enterprise, it has also transformed the enterprise’s information security programs, including how the security programs are developed and executed. It is prudent to set expectations with stakeholders at every step of the journey. This helps mitigate the risk of losing focus on the “shared vision” of strategic business alignment.

Risk can also be mitigated by ensuring the quality of the data. The COBIT 5 information enabler guides the enterprise through the information cycle by suggesting that business processes generate and process data, converting them into information and knowledge, and ultimately producing value for the enterprise by delivering quality data. The information enabler also lays out the approach by suggesting that the first step is to identify stakeholders as well as their stakes (i.e., why they care or are interested in the information). The stakes can be related to information goals. Goals of information are divided into three subdimensions of quality (figure 2).

Figure 2—Data Quality Subdimensions
Intrinsic Quality
• Accuracy
• Objectivity
• Believability
• Reputation
Contextual and Representational Quality
• Relevancy
• Completeness
• Currency
• Appropriate amount of information
• Concise representation
• Consistent representation
• Interpretability
• Understandability
• Ease of manipulation
Security/Accessibility Quality
• Availability/timeliness
• Restricted access

Choosing a partner is a major step toward deciding what processes are to be embraced eventually. It is the “make or buy” decision of every facet of the journey from training and information protection, to pilot project and intellectual property transfer. The immediate adoption of outsourcing denies an enterprise the intellectual property it needs to partner, manage and control the big data journey. Every enterprise should, at a minimum, experience some facets of big data to gain knowledge and expertise for future reference. Big data may change the way enterprises do business, and it will affect their business, culture and processes. It should also be a catalyst for how the enterprise selects and changes partners.

Selection is a critical first step and can incorporate several strategies in addition to selection of the big data vendor:
• It can result in a strategic alliance with one or more big data technology providers.
• It can ensure that training classes are taught by practitioners, not by those who cannot answer fundamental questions, and training infrastructures that support hands-on interaction are used.
• It can ensure that course information is shared with, and thoroughly reviewed by, the big data team.
• The pilot project can encompass the instructor and the company big data team, in recognition that the project is really a work in progress.
• Third-party processes, project management and goals can be aligned to enterprise goals and expertise.
• Stakeholders in business and risk management can be involved to ensure that appropriate controls are in place with the third-party vendor/partner.
Once a company knows what it wants, it must determine how to obtain the information it needs. A data broker is a possible source. Some companies already in the business of brokering information about enterprises include Bloomberg, Thomson Reuters, Simmons Market Research and The Nielsen Company.

If the enterprise elects to build, it must decide:
• Whether it should use a broker
• Whether to use a training partner for the project
• Whether it takes small steps or giant leaps of faith as it acquires terabytes
• What options are available as partners
• What the project deliverables should be

Project documentation should be a deliverable to:
• Prevent vendor/partner lock-in.
• Demonstrate ownership of intellectual property.

Governance for Big Data
Governance ensures that stakeholders’ needs, conditions and options are evaluated to determine balanced, agreed-on enterprise objectives to be achieved. It further supports setting direction through prioritization and decision making, and monitoring performance and compliance against agreed-on direction and objectives. The scope of an enterprise’s governance, risk and compliance would most likely be expanded to create a unified system to consolidate silos and business functions to enable access to all the data.

The end-to-end governance approach that is at the foundation of COBIT 5 is depicted in figure 3, showing the key components of a governance system.

Figure 3—End-to-end Governance
[Figure 3 labels: Governance Objective: Value Creation (Benefits Realisation, Risk Optimisation, Resource Optimisation); Governance Enablers; Governance Scope; Roles, Activities and Relationships]
Source: COBIT 5, ISACA, USA, 2012, figure 8

Without a proper data governance process, big data projects can unleash a lot of trouble, including misleading data and unexpected costs.
The role of data governance in keeping the big data house in order is just starting to be understood given the relatively recent emergence of the technology and its allocation to the IT department. Consequently, governance of big data environments is at an early stage of maturity and there are few widespread prescriptions for how to do it effectively. One fundamental problem is that pools of big data are oriented more to data exploration and discovery than they are to conventional business intelligence reporting and analysis.

Data governance programs provide a framework for setting data-usage policies and implementing controls designed to ensure that information remains accurate, consistent and accessible. Clearly, a significant challenge in the process of governing big data is categorizing, modeling and mapping the data as they are captured and stored, particularly because of the unstructured nature of much of the information.

Data often come from external sources, and accuracy cannot always be easily validated; also, the meaning and context of text data are not necessarily self-evident. For many enterprises, big data involves a collective learning curve for all concerned: IT managers, programmers, data architects, data modelers and data governance professionals. To help ensure that the data are mapped properly, the task should be assigned to a senior data architect whose experience and IT background will prove invaluable in this complex activity.

During the exploratory phase of big data projects, which defines expected business value and leads to formal initiatives, enterprises should consider the fundamental questions (as articulated by IBM) within information management:
• Do we fully recognize the responsibilities associated with handling big data?
• How does big data change the traditional concept of information as a corporate asset?
• What are the emerging requirements around privacy?
• How do the big data technologies relate to our current IT infrastructure?

The discussion surrounding big data may raise more questions for the chief information officer (CIO) than he or she is prepared to answer. Many enterprises justify the lack of adequate governance policies by claiming that big data is somehow “different,” which sidesteps the issue. Simply stated, as big data technologies become operational (as opposed to exploratory), they need the same governance disciplines that are applied to traditional approaches to data management.

When implementing an information governance program, the current (as-is) state should be assessed and the future (to-be) state should be developed. COBIT 5 can help the enterprise address this task and others inherent in governing big data, ultimately guiding the enterprise’s efforts to create value by striking a balance between realizing benefits and maintaining risk at an acceptable level.

Assurance Considerations for Big Data

Controls around big data can be grouped into four categories:
• Approach and understanding
• Quality
• Confidentiality and privacy
• Availability

Approach and Understanding

This category addresses demonstrating the right tone at the top of the enterprise. A critical facet in this effort is the establishment and implementation of a data policy. The policy (and associated procedures) should define the data in scope; establish a system of governance and assurance over data quality; and identify qualitative and quantitative criteria to assess accuracy, reliability, completeness and timeliness of data.

Taking an inventory of all data sources, assessing vulnerabilities, and implementing policies and procedures will most certainly cost the enterprise time and money.
Such costs are necessary when managing risk and should be considered the cost of doing business.

The assurance process should begin by creating an inventory of the data. After the inventory, the data should be classified for sensitivity and relevance and a data flow created. A process should then be developed to identify vulnerabilities in the data flow, an activity that begins with creation of a multidimensional data flow diagram supported by a data dictionary³ that maps the data landscape across the enterprise. This process should capture internal and external sources of data, the various automated and manual processes (e.g., transformation, aggregation) performed on each data set, and their ultimate destination and use.

Each vulnerability identified should be entered into an established data deficiency governance process for analysis of impact and probability, escalation to senior management where necessary, and a strategic or tactical resolution. In addition, each vulnerability needs an owner, someone who is responsible for the data.

Materiality criteria should be established that will enable those responsible for data governance to identify the most relevant data sets and items on which to focus their efforts. This process will also help create an escalation path for data deficiency management.

Data Quality

Controls should be established and implemented across the data flow to assess data against the accuracy, reliability, completeness and timeliness criteria defined in the data policy and associated standards.
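As an illustration only, a control that assesses a data set against completeness and timeliness criteria might be sketched as follows. The field names (`customer_id`, `amount`, `updated_at`), the thresholds and the `QualityCriteria` structure are hypothetical and are not taken from this paper or from COBIT 5; a real data policy would define its own criteria, including accuracy and reliability checks that are not shown here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class QualityCriteria:
    """Hypothetical criteria that a data policy might define for one data set."""
    required_fields: tuple      # fields that must be populated in every record
    max_age: timedelta          # how old a record may be and still count as timely
    min_completeness: float     # minimum fraction of complete records (0.0 to 1.0)
    min_timeliness: float       # minimum fraction of timely records (0.0 to 1.0)


def assess_quality(records, criteria, now):
    """Measure completeness and timeliness of records against the criteria."""
    total = len(records)
    if total == 0:
        # An empty data set cannot satisfy the criteria.
        return {"completeness": 0.0, "timeliness": 0.0, "passed": False}

    # A record is complete when every required field is present and non-empty.
    complete = sum(
        1 for r in records
        if all(r.get(f) not in (None, "") for f in criteria.required_fields)
    )
    # A record is timely when its last update falls within the allowed age.
    timely = sum(
        1 for r in records
        if r.get("updated_at") is not None
        and now - r["updated_at"] <= criteria.max_age
    )

    completeness = complete / total
    timeliness = timely / total
    passed = (completeness >= criteria.min_completeness
              and timeliness >= criteria.min_timeliness)
    return {"completeness": completeness, "timeliness": timeliness, "passed": passed}


# Illustrative data: one record is missing a value and one is stale.
now = datetime(2013, 1, 15, 12, 0)
records = [
    {"customer_id": "C1", "amount": 10.0, "updated_at": now - timedelta(hours=1)},
    {"customer_id": "C2", "amount": None, "updated_at": now - timedelta(hours=2)},
    {"customer_id": "C3", "amount": 5.0, "updated_at": now - timedelta(days=3)},
    {"customer_id": "C4", "amount": 7.5, "updated_at": now - timedelta(minutes=30)},
]
criteria = QualityCriteria(
    required_fields=("customer_id", "amount"),
    max_age=timedelta(days=1),
    min_completeness=0.9,
    min_timeliness=0.7,
)
result = assess_quality(records, criteria, now)
print(result)
```

A data set that fails such a check would be a natural candidate for the data deficiency governance process described above.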
Where data are sourced from a third party, the enterprise should establish a contractually bound process to gain confidence over the quality of the data. This could be through an independent validation of data quality controls at the third party or through independent checks on any material data received.

Ownership and responsibilities associated with each material data set should be assigned. Appropriate training should be rolled out to all relevant personnel to make them aware of their data-related responsibilities. For example, two roles that could be defined are data producer and data consumer. A data producer provides data to the data consumer according to predefined quality requirements. The consumer must define and communicate the expected quality requirements for the data and validate against them when the data are received. The roles change as data move across the data flow.

³ The data dictionary should also document all material data items and their relationship with each other, their source, and their usage, so that a consistent understanding can be established throughout the enterprise.

Data Confidentiality/Privacy

Through the data risk management process, all sensitive data should be identified and appropriate controls put in place. The nature of the sensitive information could vary from personal information to competitive secrets. A number of rules, regulations and standards, such as the 1998 UK Data Protection Act and the Payment Card Industry Data Security Standard (PCI DSS), govern how sensitive data should be secured in storage and transit. Logical and physical access security controls are needed to prevent unauthorized access to sensitive data.
These include classic Information Technology General Controls (ITGC) such as password settings, masking or partially masking sensitive data, periodic user access review, firewalls, server room door security, server access logs, administrative access privileges and screen saver lockout. Encryption technologies must be used to store and transfer highly sensitive information within and outside the enterprise.

Data Availability

Reliable (i.e., tested) disaster recovery arrangements should be in place to ensure that data are available in accordance with the data recovery point objective (RPO) and recovery time objective (RTO) criteria defined in a business impact analysis.

Conclusion

Constant change and innovation are challenges that the enterprise and data science team must manage. Innovation threatens the traditional “comfort zone” of stability and longevity. Accountability is also a fine line to manage. The enterprise culture, which either fights or embraces innovation, requires a big data leader who understands his or her role in innovation or enterprise direction. In addition, the leader must:
• Manage expectations
• Reward behaviors rather than results
• Shield data scientists from the detailed scrutiny of management and investors
• Manage projects
• Communicate well to span the enterprise channels

It is not unusual for various levels of leadership to disagree. Soft skills that stimulate a focus on shared goals and a desire to avoid failure, rather than the disagreement itself, are needed to navigate conflicts within the enterprise and among the big data team members.

Additional Resources and Feedback

Visit www.isaca.org/Big-Data-WP for additional resources and use the feedback function to provide your comments and suggestions on this document.
Your feedback is a very important element in the development of ISACA guidance for its constituents and is greatly appreciated.