CHAPTER 3 Data and Knowledge Management
3.1 Managing Data
3.2 The Database Approach
3.3 Database Management Systems
3.4 Data Warehouses and Data Marts
3.5 Knowledge Management
Copyright John Wiley & Sons Canada

LEARNING OBJECTIVES
1. Identify three common challenges in managing data, and describe one way organizations can address each challenge using data governance.
2. Name six problems that can be minimized by using the database approach.
3. Demonstrate how to interpret relationships depicted in an entity-relationship diagram.
4. Discuss at least one main advantage and one main disadvantage of relational databases.
5. Identify the six basic characteristics of data warehouses and data marts.
6. Demonstrate the use of a multidimensional model to store and analyze data.
7. List two main advantages of using knowledge management, and describe the steps in the knowledge management system cycle.

OPENING CASE 3.1 BIG DATA
The Problem
• The amount of digital data increases tenfold every five years. Scientists say that we are undergoing a new revolution, the “Industrial Revolution of Data,” and they have coined the term “Big Data” to describe the superabundance of data available today. This abundance creates problems with storage space, processing speed and time, and the structure, quantity, and quality of data.

THE SOLUTION
For many organizations, the first step in managing Big Data was to deal with the problem of information silos: information that is stored in isolation within separate functional areas. Organizations began to integrate this information into a database environment and then to develop data warehouses to serve as decision-making tools. Next, they turned their attention to the business of data and information management; that is, making sense of their proliferating data.
Seeing a market need for data management, Oracle, IBM, Microsoft, and SAP together have spent more than $15 billion in recent years to purchase software firms specializing in data management and business intelligence.

THE RESULTS
• The way information is managed touches all areas of life.
• Today, the availability of abundant yet small-scale data enables companies to cater to niche markets, and even individual customers, anywhere in the world.
• Some industries have led the way in gathering and exploiting data. For example, credit card companies monitor every purchase and can accurately identify fraudulent ones, using rules derived by analyzing billions of transactions.

DISCUSSION
• What market do you believe will experience the most growth in “Big Data”? Smartphones? Tablets?
• What type of “Big Data” is used at a university?

3.1 MANAGING DATA
• The Difficulties of Managing Data
• Data Governance

DIFFICULTIES IN MANAGING DATA
• The amount of data increases exponentially over time.
• Data are scattered throughout organizations.
• Data are obtained from multiple internal and external sources.
• Data degrade over time.
• Data are subject to data rot.
• Data security, quality, and integrity are critical, yet easily jeopardized.
• Information systems that do not communicate with each other can result in inconsistent data.
• Organizations must comply with federal regulations.

DATA GOVERNANCE
• Data governance
• Master data management
• Master data

MASTER DATA MANAGEMENT
• John Stevens registers for Introduction to Management Information Systems (ISMN 3140) from 10 AM until 11 AM on Mondays and Wednesdays in Room 41 Smith Hall, taught by Professor Rainer.

Transaction data | Master data
John Stevens | Student
Intro to Management Information Systems | Course
ISMN 3140 | Course No.
10 AM to 11 AM | Time
Mondays and Wednesdays | Weekday
Room 41 Smith Hall | Location
Professor Rainer | Instructor

3.2 THE DATABASE APPROACH
• Databases minimize the following problems:
– Data redundancy: The same data are stored in many places.
– Data isolation: Applications cannot access data associated with other applications.
– Data inconsistency: Various copies of the data do not agree.

DATABASE APPROACH (CONTINUED)
• Database management systems (DBMS) maximize the following:
– Data security: Databases have security measures in place to deter mistakes and attacks.
– Data integrity: Data meet certain constraints, such as no alphabetic characters in a Social Insurance Number field.
– Data independence: Applications and data are independent of each other, so that all applications are able to access the same data.

DATABASE MANAGEMENT SYSTEMS
Figure 3.1 University Database Management System

DATA HIERARCHY
• Bit (binary digit): the smallest unit of data a computer can process.
• Byte: a group of eight bits that represents a single character.
• Field: a logical grouping of related characters.
• Record: a logical grouping of related fields.
• File (or table): a logical grouping of related records.
• Database: a logical grouping of related files.

HIERARCHY OF DATA FOR A COMPUTER-BASED FILE
Figure 3.2 Hierarchy of data in University database

DATA HIERARCHY (CONTINUED)
• Bits (binary digits): 1 0 0 1
• Byte (eight bits): 01101010

DATA HIERARCHY (CONTINUED)
Example of Field and Record

DESIGNING THE DATABASE
• Data model
– Entity: a person, place, thing, or event about which an organization maintains information.
– Instance: a specific, unique representation of the entity.
– Attribute: a characteristic or quality of a particular entity.
– Primary key: a field that uniquely identifies a record.
– Secondary keys: other fields that have some identifying information but typically do not identify the record with complete accuracy.

ENTITY-RELATIONSHIP MODELING
• Database designers plan the database design in a process called entity-relationship (ER) modeling.
• ER diagrams consist of entities, attributes, and relationships.
– Entity classes
– Instances
– Identifiers

RELATIONSHIPS BETWEEN ENTITIES
Figure 3.3 Cardinality and Modality Symbols

ENTITY-RELATIONSHIP DIAGRAM MODEL

3.3 DATABASE MANAGEMENT SYSTEMS
• Database management system (DBMS)
• Relational database model
– Structured Query Language (SQL)
– Query by Example (QBE)
• Data dictionary

STUDENT DATABASE EXAMPLE
Figure 3.5 Example of Student Database

NORMALIZATION
• Normalization
– Minimizes redundancy
– Maximizes data integrity
– Optimizes processing performance
• Data are normalized when the attributes in a table depend only on the primary key.

NON-NORMALIZED RELATION

NORMALIZING THE DATABASE (PART A)

NORMALIZING THE DATABASE (PART B)

NORMALIZATION PRODUCES ORDER

3.4 DATA WAREHOUSING AND DATA MARTS
• Characteristics of data warehouses and data marts:
– Organized by business dimension or subject
– Use online analytical processing (OLAP)
– Integrated
– Time variant
– Nonvolatile
– Multidimensional

THE ENVIRONMENT FOR DATA WAREHOUSING AND DATA MARTS
• Source systems that provide data to the data warehouse or data mart
• Data integration technology and processes that are needed to prepare the data for use
• Different architectures for storing data in an organization’s data warehouse or data marts
• Different business intelligence (BI) tools and applications for the variety of users
• The need for metadata, data quality, and governance processes to be in place to ensure that the data warehouse or data mart meets its purposes

DATA WAREHOUSE FRAMEWORK

RELATIONAL DATABASES
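The relational model stores data as rows in two-dimensional tables, and SQL is its query language. As a minimal sketch, assuming a hypothetical SALES table whose products, regions, years, and unit counts are invented purely for illustration, the Python code below uses the standard-library sqlite3 module to create such a table with a composite primary key and then totals it along one business dimension at a time with SQL GROUP BY. The helper names `build_sales_table` and `units_by` are hypothetical, not part of any product.

```python
import sqlite3

# Hypothetical sales facts: (product, region, year, units sold).
# These values are illustrative only; they do not come from the slides.
ROWS = [
    ("Nuts", "East", 2012, 50),
    ("Nuts", "West", 2012, 60),
    ("Screws", "East", 2012, 40),
    ("Screws", "West", 2012, 70),
    ("Nuts", "East", 2013, 55),
    ("Screws", "West", 2013, 80),
]

# The business dimensions a user may summarize by.
DIMENSIONS = ("product", "region", "year")


def build_sales_table(conn):
    """Create a relational SALES table: one row per product/region/year fact."""
    conn.execute(
        "CREATE TABLE sales ("
        " product TEXT NOT NULL,"
        " region TEXT NOT NULL,"
        " year INTEGER NOT NULL,"
        " units INTEGER NOT NULL,"
        # Composite primary key: each product/region/year combination appears once.
        " PRIMARY KEY (product, region, year))"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", ROWS)


def units_by(conn, dimension):
    """Total units sold along one business dimension, sorted by that dimension."""
    if dimension not in DIMENSIONS:
        raise ValueError(f"unknown dimension: {dimension}")
    # Column names cannot be bound as SQL parameters, so the allow-list
    # check above is what keeps this f-string safe.
    query = (
        f"SELECT {dimension}, SUM(units) FROM sales "
        f"GROUP BY {dimension} ORDER BY {dimension}"
    )
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
build_sales_table(conn)
print(units_by(conn, "product"))
print(units_by(conn, "region"))
```

With the sample rows above, `units_by(conn, "product")` returns `[('Nuts', 165), ('Screws', 190)]`. In the relational table, every summary must be computed with a query over the rows; a multidimensional database instead organizes the same facts directly by product, region, and year.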
MULTIDIMENSIONAL DATABASE

EQUIVALENCE BETWEEN RELATIONAL AND MULTIDIMENSIONAL DATABASES

DATA INTEGRATION (ETL)
• Extract data from source systems, transform them, and load them into a data mart or warehouse.
• ETL can be performed by hand-written code (e.g., SQL queries) or by commercial data-integration software.
• Data can be transformed during ETL to make them more useful.

STORING THE DATA
• The most common architecture is one central enterprise data warehouse, without data marts.
• Independent data marts store data for a single application or a few applications, such as marketing or finance.
• Hub and spoke stores data in a central data warehouse while simultaneously maintaining dependent data marts that obtain their data from the central repository.

STORING DATA (CONTINUED)
• Metadata: data about data.
• Data quality: the quality of the data in the warehouse must be adequate to satisfy users’ needs.
• Governance: requires that people, committees, and processes be in place.
• Users: there are a large number of potential BI users, including IT developers; front-line workers; analysts; information workers; managers and executives; and suppliers, customers, and regulators.

3.5 KNOWLEDGE MANAGEMENT
• Knowledge management (KM)
• Knowledge
• Intellectual capital (or intellectual assets)

KNOWLEDGE MANAGEMENT (CONTINUED)
• Explicit knowledge: objective, rational, technical knowledge that has been documented.
– Examples: policies, procedural guides, reports, products, strategies, goals, core competencies
• Tacit knowledge: the cumulative store of subjective or experiential learning.
– Examples: experiences, insights, expertise, know-how, trade secrets, understanding, skill sets, and learning

KNOWLEDGE MANAGEMENT (CONTINUED)
• Knowledge management systems (KMSs)
• Best practices

KNOWLEDGE MANAGEMENT SYSTEM CYCLE
• Create knowledge
• Capture knowledge
• Refine knowledge
• Store knowledge
• Manage knowledge
• Disseminate knowledge

CHAPTER CLOSING
• Organizations can use knowledge management to develop best practices, the most effective and efficient ways of doing things, and to make these practices readily available to a wide range of employees.
• The database approach minimizes data redundancy, data isolation, and data inconsistency, and it maximizes data security, data integrity, and data independence.
• Master data management provides companies with the ability to store, maintain, exchange, and synchronize a consistent, accurate, and timely “single version of the truth” for the company’s core master data.

Copyright © 2014 John Wiley & Sons Canada, Ltd. All rights reserved. Reproduction or translation of this work beyond that permitted by Access Copyright (the Canadian copyright licensing agency) is unlawful. Requests for further information should be addressed to the Permissions Department, John Wiley & Sons Canada, Ltd. The purchaser may make back-up copies for his or her own use only and not for distribution or resale. The author and the publisher assume no responsibility for errors, omissions, or damages caused by the use of these files or programs or from the use of the information contained herein.