Information Systems
Unit Name: Information Systems for Business
Unit Code: JQU0001

Topic 9: Data and Knowledge Management

[ CHAPTER OUTLINE ]
1. Managing Data
2. The Database Approach
3. Fundamentals of Relational Database Operations
4. Big Data
5. Data Warehouses and Data Marts
6. Knowledge Management

1. Managing Data
• The Difficulties of Managing Data
• Data Governance

Difficulties in Managing Data
• Data increase exponentially with time.
• Data come from multiple sources.
• Data rot and data degradation.
• Data security, quality, and integrity.
• Government regulation.

Multiple Sources of Data
• Internal sources: corporate databases and company documents.
• Personal sources: personal thoughts, opinions, and experiences.
• External sources: commercial databases, government reports, and corporate websites.
• New sources of data: for example, blogs, podcasts, videocasts, RFID tags, and other wireless sensors.

Difficulties in Managing Data (continued)
• Data degradation: data become outdated over time (e.g., customers move to new addresses or change their names).
• Data rot: refers primarily to problems with the media on which the data are stored. Over time, temperature, humidity, and exposure to light can cause physical problems with storage media and thus make it difficult to access the data.
• Data security, quality, and integrity are critical.

Government Regulation
• Legal requirements change frequently and differ among countries and industries.
• Federal regulations: the Sarbanes-Oxley Act of 2002 requires that public companies evaluate and disclose the effectiveness of their internal financial controls, and that independent auditors for these companies agree to this disclosure.
Data Governance
• Data governance: an approach to managing information across an entire organization.
• Master data: a set of core data (e.g., customer, product, employee, vendor, and geographic location) that spans the enterprise information systems.
• Master data management: a strategic data governance process for managing a company's master data consistently and accurately.

2. The Database Approach
• The Data Hierarchy
• The Relational Database Model

Data Hierarchy
The data hierarchy is a basic concept in data and database theory. It shows the relationships between the smaller and larger components in a database or data file. The concept is a starting point for seeing what makes up data and whether data have a structure (i.e., for seeing these terms as smaller or larger components in a hierarchy). Data organization involves characters, fields, records, files, and so on.

Data Hierarchy Components
Bit, Byte, Field, Record, File (or Table), Database
• Binary digit (bit): the basic unit of information, either 0 or 1.
• Byte: a unit of data that is eight binary digits long; the unit most computers use to represent a single character, such as a letter.
• Data field: holds a single fact or attribute of an entity, e.g., "19 September 2004".
• Record: a collection of related fields. For example, an employee record contains name fields, address fields, a birthdate field, and so on.
• File (or table): a collection of related records. Files (or tables) are integrated into a database by using a database management system.

Database Management System (DBMS)
• A DBMS is a software package designed to store and manage databases.
• It controls access to the physical data.
• The DBMS is the interface between applications and the physical data.
• A database is the actual collection of data; a DBMS manages the data.
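The hierarchy can be illustrated with a short Python sketch. The bit and byte part uses the standard `ord` and `format` built-ins. The employee names and values are hypothetical, and plain lists and dictionaries stand in for the files and database that a real DBMS would store and manage on disk.

```python
# Sketch of the data hierarchy: bit -> byte -> field -> record -> file (table) -> database.
# All names and values below are hypothetical, chosen only for illustration.

# Bit and byte: the letter "A" is stored as one byte, i.e. eight binary digits.
char = "A"
code = ord(char)            # numeric code of the character (65)
bits = format(code, "08b")  # the eight bits that make up that byte
print(char, code, bits)     # A 65 01000001

# Field: a single fact (attribute) about an entity.
birth_date_field = "19 September 2004"

# Record: a collection of related fields describing one employee.
employee_record = {
    "name": "Jane Doe",
    "address": "1 Example Street",
    "birth_date": birth_date_field,
}

# File (table): a collection of related records.
employee_file = [
    employee_record,
    {"name": "John Roe", "address": "2 Sample Road", "birth_date": "3 March 1999"},
]

# Database: related files (tables) integrated together, keyed here by table name.
database = {"employees": employee_file}

print(len(bits), len(database["employees"]))  # 8 2
```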
Database Management Approach
Once the database has been built, the DBMS becomes the interface between the database and the database applications, which use forms, reports, queries, and application programs to report on and manipulate the data.

Database Management Systems Minimize Three Main Problems
• Data redundancy: the same data are stored in multiple locations.
• Data isolation: applications cannot access data associated with other applications.
• Data inconsistency: various copies of the data do not agree.

Database Management Systems Maximize Three Things
• Data security: reduces the risk of data loss, corruption, and attacks by hackers.
• Data integrity: data meet certain constraints; e.g., no alphabetic characters are allowed in a Social Security number field.
• Data independence: applications and data are independent of one another; that is, applications and data are not linked to each other, so all applications are able to access the same data.

Database Management Systems: Advantages and Disadvantages
Advantages
• Data consistency and integrity, by controlling access and minimizing data duplication.
• Application program independence, by storing data in a uniform fashion.
• Data sharing: by controlling access to data items, the DBMS lets many users access data concurrently.
• Backup and recovery, security, and privacy.
Disadvantages
• Expensive and complicated to set up and maintain.
• Specialized staff are required (a database administrator).
• This cost and complexity must be offset by the organization's need for a DBMS.

3. Fundamentals of Relational Database Operations
• Query Languages
• Entity Relationship Modeling
• Normalization and Joins

Query Languages
• Structured Query Language (SQL): the most popular query language used for interacting with a database.
  SQL allows people to perform complicated searches by using relatively simple statements or keywords.
• Query By Example (QBE): the user fills out a grid or template (also known as a form) to construct a sample or a description of the data desired.

The Relational Database Model: Key Terms
• Database Management System
• Relational Database Model
• Entity
• Instance
• Attribute
• Primary Key
• Secondary Key
• Foreign Key

The Relational Database Model
• Relational database model: based on the concept of two-dimensional tables. A relational database is usually designed as a number of related tables, each of which contains records (listed in rows) and attributes (listed in columns).
• Entity: a person, place, thing, or event (e.g., a customer, an employee, or a product).
• Attribute: each characteristic or quality of a particular entity.

Designing the Database
• Entity: a person, place, thing, or event about which information must be kept. Examples: students, customers, an order to buy a product.
• Attribute (field): a fact about a particular entity. Examples: Order Date, Quantity, Price, Student ID. Do not confuse the attribute name (for example, Order Date) with the field value (for example, "02/08/2012").

The Relational Database Model (continued)
• Primary key: a field in a database that uniquely identifies each record so that it can be retrieved, updated, and sorted.
• Secondary key: a field that has some identifying information but typically does not identify the record with complete accuracy, and therefore cannot serve as the primary key.
• Foreign key: a field (or group of fields) in one table that uniquely identifies a row of another table. It is used to establish and enforce a link between two tables.
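The key types above, together with the data integrity rule mentioned earlier (no alphabetic characters in a Social Security number field), can be tried out in a small sketch. It uses SQLite through Python's standard `sqlite3` module as a stand-in DBMS; the tables, columns, suburb values, and the `rejected` helper are all hypothetical, and the nine-digit SSN format is an assumption made for the example.

```python
import sqlite3

# In-memory database; all table names, column names, and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on

conn.executescript(
    """
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,  -- primary key: exactly one row per value
        name   TEXT NOT NULL,
        suburb TEXT,                 -- secondary key: useful for searching, not unique
        -- integrity constraint: nine characters, none of which may be a non-digit
        ssn    TEXT NOT NULL CHECK (length(ssn) = 9 AND ssn NOT GLOB '*[^0-9]*')
    );
    CREATE TABLE timesheet (
        sheet_id INTEGER PRIMARY KEY,
        week     TEXT NOT NULL,
        emp_id   INTEGER NOT NULL REFERENCES employee(emp_id)  -- foreign key
    );
    INSERT INTO employee VALUES (1, 'Jane Doe', 'Northside', '123456789'),
                                (2, 'John Roe', 'Northside', '987654321');
    INSERT INTO timesheet VALUES (100, '2012-02-08', 1);
    """
)

def rejected(sql):
    """Run one statement; return True if the DBMS refuses it for an integrity reason."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# The primary key retrieves exactly one record; the secondary key may retrieve several.
by_primary = conn.execute("SELECT name FROM employee WHERE emp_id = 1").fetchall()
by_secondary = conn.execute(
    "SELECT name FROM employee WHERE suburb = 'Northside' ORDER BY name"
).fetchall()
print(by_primary)    # [('Jane Doe',)]
print(by_secondary)  # [('Jane Doe',), ('John Roe',)]

# A letter in the SSN field violates the integrity constraint.
print(rejected("INSERT INTO employee VALUES (3, 'Amy Poe', 'Eastside', '12345678A')"))  # True
# A timesheet for a non-existent employee violates the foreign key.
print(rejected("INSERT INTO timesheet VALUES (101, '2012-02-15', 99)"))  # True
```

Because the DBMS enforces these rules itself, every application that uses the database gets the same protection without repeating the checks in its own code.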
Designing the Database: Primary Key
• A primary key (identifier), or key field, is one attribute (or a set of attributes) in a record that uniquely identifies instances of that record so that it can be retrieved, updated, or sorted. Only one record will be retrieved.

Entity Relationship Modeling
• Entity Relationship Diagram (ERD)
• Business Rules
• Data Dictionary
• Relationships: unary, binary, ternary
• Cardinality
• Connectivity

Entity Relationship Modeling
• Entity-relationship (ER) modeling: a process by which designers plan and create databases using an entity-relationship diagram.
• ER diagrams (ERDs): consist of entities, attributes, and relationships. To properly identify entities, attributes, and relationships, database designers first identify the business rules for the particular data model.
• Relationships: illustrate an association between entities.
• Degree of a relationship: the number of entities associated with a relationship.
  - Unary relationship: exists when an association is maintained within a single entity.
  - Binary relationship: exists when two entities are associated.
  - Ternary relationship: exists when three entities are associated.

Components of an Entity-Relationship Data Model
• Entities: something users want to track, e.g., order, customer, salesperson, item, volunteer, donation.
• Attributes: describe characteristics of an entity, e.g., OrderNumber, CustomerNumber, VolunteerName, PhoneNumber.
• Unique identifier: an attribute that distinguishes one entity instance from other instances, e.g., Student_ID_Number.

Cardinality and Optionality
• Cardinality: the maximum number of times an instance of one entity can be associated with an instance in the related entity (for example, a one-to-many relationship).
• Optionality: cardinality shows the type of relationship, but it does not state whether the relationship is mandatory or optional. Cardinality can be mandatory single, optional single, mandatory many, or optional many.
  - A mandatory relationship is one in which there must be at least one matching record in each entity.
  - An optional relationship is one in which there may or may not be a matching record in each entity.

Figure 5A.1 Cardinality Symbols

Types of E-R Modeling
Figure 5A.2 One-to-One Relationship
Figure 5A.3 One-to-Many Relationship
Figure 5A.4 Many-to-Many Relationship

Business Rules
• Business rules: precise descriptions of policies, procedures, or principles in any organization that stores and uses data to generate information. Business rules are derived from a description of an organization's operations, and they help create and enforce business processes in that organization.
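A many-to-many relationship (Figure 5A.4) cannot be stored directly between two relational tables. A common approach, assumed here, is a junction (associative) table that holds the primary keys of both entities, which turns one many-to-many relationship into two one-to-many relationships. The sketch uses SQLite through Python's standard `sqlite3` module; the student and course tables, course codes, and rows are hypothetical.

```python
import sqlite3

# In-memory database; table names, course codes, and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE course  (course_code TEXT PRIMARY KEY, title TEXT NOT NULL);

    -- Junction table: each row links one student to one course.
    -- The composite primary key stops the same enrolment being stored twice.
    CREATE TABLE enrolment (
        student_id  INTEGER NOT NULL REFERENCES student(student_id),
        course_code TEXT    NOT NULL REFERENCES course(course_code),
        PRIMARY KEY (student_id, course_code)
    );

    INSERT INTO student VALUES (1, 'Ann'), (2, 'Ben');
    INSERT INTO course  VALUES ('ISB101', 'Information Systems'), ('MTH101', 'Mathematics');
    INSERT INTO enrolment VALUES (1, 'ISB101'), (1, 'MTH101'), (2, 'ISB101');
    """
)

# One student is linked to many courses ...
ann_courses = conn.execute(
    "SELECT course_code FROM enrolment WHERE student_id = 1 ORDER BY course_code"
).fetchall()

# ... and one course is linked to many students (a join through the junction table).
isb_students = conn.execute(
    """
    SELECT s.name
    FROM student AS s
    JOIN enrolment AS e ON e.student_id = s.student_id
    WHERE e.course_code = 'ISB101'
    ORDER BY s.name
    """
).fetchall()

print(ann_courses)   # [('ISB101',), ('MTH101',)]
print(isb_students)  # [('Ann',), ('Ben',)]
```

A business rule such as "a student may enrol in many courses, and a course may have many students" is what tells the designer that this relationship is many-to-many in the first place.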
Data Dictionary
• A data dictionary provides information on each attribute, such as its name, whether it is a primary key, its type of data (alphanumeric, numeric, dates, etc.), and its valid values. Data dictionaries can also provide information on relationships, which illustrate associations between entities.

Normalization and Joins
Normalization is a method for analyzing and reducing a relational database to its most streamlined form to ensure minimum redundancy, maximum data integrity, and optimal processing performance.
• Functional Dependencies
• First Normal Form
• Second Normal Form
• Third Normal Form

Functional Dependencies
A functional dependency is a means of expressing that the value of one particular attribute is associated with a specific single value of another attribute. For example, for Student Number 05345 at a university, there is exactly one Student Name, John C. Jones, associated with it. Student Number is therefore referred to as the determinant, because its value determines the value of the other attribute. We can also say that Student Name is functionally dependent on Student Number.

Figure 5A.5 Raw Data Gathered from Orders at the Pizza Shop
Figure 5A.6 Functional Dependencies in Pizza Shop Example
Figure 5A.7 First Normal Form for Data from Pizza Shop
Figure 5A.8 Second Normal Form for Data from Pizza Shop
Figure 5A.9 Third Normal Form for Data from Pizza Shop
Figure 5A.10 The Join Process with the Tables of Third Normal Form to Produce an Order
FIGURE 5.3 Student database example.
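The Student Number example can be checked mechanically. The Python sketch below tests whether one attribute determines another in a given set of records; the function name `functionally_determines`, the second student, and the course codes are hypothetical.

```python
def functionally_determines(rows, determinant, dependent):
    """Return True if each value of `determinant` maps to exactly one value of `dependent`.

    rows: a list of dicts, one per record (a hypothetical in-memory table).
    """
    seen = {}
    for row in rows:
        key, value = row[determinant], row[dependent]
        if key in seen and seen[key] != value:
            return False  # same determinant value, two different dependent values
        seen[key] = value
    return True

students = [
    {"student_number": "05345", "student_name": "John C. Jones", "course": "ISB101"},
    {"student_number": "05345", "student_name": "John C. Jones", "course": "MTH101"},
    {"student_number": "07112", "student_name": "Mary Smith", "course": "ISB101"},
]

print(functionally_determines(students, "student_number", "student_name"))  # True
print(functionally_determines(students, "student_number", "course"))        # False
```

A sample of records can only disprove a functional dependency; whether Student Number really determines Student Name is a business rule that must hold for all possible data. The repeated "John C. Jones" values also show the redundancy that normalization removes by moving student names into a table of their own.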
Database applications also consist of:
• Tables
• Forms
• Reports
• Queries

Database Application Systems
[Example screenshots: a table, a form, a query, and a report]

4. Big Data
• Defining Big Data
• Characteristics of Big Data
• Managing Big Data
• Leveraging Big Data

Defining Big Data
Big Data generally consist of:
• Traditional enterprise data
• Machine-generated/sensor data
• Social data
• Images captured by billions of devices located around the world (digital cameras, camera phones, medical scanners, and security cameras)

Big Data:
• Have variety: they include structured, unstructured, and semi-structured data.
• Are generated at high volume and velocity, with an uncertain pattern.
• Do not fit neatly into traditional, structured, relational databases.
• Can be captured, processed, transformed, and analyzed in a reasonable amount of time only by sophisticated information systems.

At its core, Big Data is about predictions. Predictions do not come from "teaching" computers to "think" like humans. Instead, predictions come from applying mathematics to huge quantities of data to infer probabilities.

Unstructured data is information that either does not have a predefined data model or is not organized in a predefined manner. It may be textual or non-textual, and human- or machine-generated.

Issues with Big Data
• Untrusted data sources.
• Big Data is dirty: dirty data refers to inaccurate, incomplete, incorrect, duplicate, or erroneous data.
• Big Data changes, especially in data streams: organizations must be aware that data quality in an analysis can change, or the data themselves can change, because the conditions under which the data are captured can change.

Managing Big Data
When properly analyzed, Big Data can reveal valuable patterns and information.
• Database environment
• Open-source solutions
• Traditional relational databases versus NoSQL databases

Putting Big Data to Use
• Making Big Data available: making Big Data available to relevant stakeholders can help organizations gain value.
• Enabling organizations to conduct experiments.
• Micro-segmentation of customers: dividing customers into groups that share one or more characteristics.
• Creating new business models: for example, telematics and e-commerce.
• Organizations can analyze far more data: organizations can even process all the data in a population relating to a particular phenomenon, meaning that they do not have to rely as much on sampling.

5. Data Warehouses and Data Marts
• Data warehouse: a repository of historical data that are organized by subject to support decision makers in the organization. Data warehouses are primarily used by large companies.
• Data mart: a low-cost, scaled-down version of a data warehouse designed for end-user needs in a strategic business unit (SBU) or an individual department.

FIGURE 5.4 Data warehouse framework (source systems, data integration, storing data, users).

Basic Characteristics of Data Warehouses and Data Marts
• Organized by business dimension or subject: for example, by customer, vendor, product, price level, and region. This arrangement differs from transactional systems, where data are organized by business process, such as order entry, inventory control, and accounts receivable.
• Use of online analytical processing (OLAP): OLAP performs multidimensional analysis of business data and provides the capability for complex calculations, trend analysis, and sophisticated data modeling.
• Integrated: data are collected from multiple systems and then integrated around subjects rather than organized by business process (e.g., transaction).
• Time variant: data warehouses and data marts maintain historical data (i.e., data that include time as a variable) rather than only real-time data.
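OLAP's multidimensional analysis can be sketched in a few lines of Python. The sales figures, product and region names, and the `roll_up` helper are hypothetical. A dictionary keyed by (product, region, year) stands in for the multidimensional structure a real data warehouse would use, and summing out dimensions mimics the "roll-up" and "slice" views commonly offered by OLAP tools.

```python
from collections import defaultdict

# Hypothetical fact records: three business dimensions plus one measure (units sold).
sales = [
    ("Nuts",   "East", 2012, 50),
    ("Nuts",   "West", 2012, 60),
    ("Screws", "East", 2012, 40),
    ("Nuts",   "East", 2013, 70),
    ("Screws", "West", 2013, 30),
]

# Build the "cube": each cell is addressed by (product, region, year).
cube = defaultdict(int)
for product, region, year, units in sales:
    cube[(product, region, year)] += units

def roll_up(cube, keep):
    """Aggregate the cube over every dimension whose index is not in `keep`."""
    totals = defaultdict(int)
    for dims, units in cube.items():
        totals[tuple(dims[i] for i in keep)] += units
    return dict(totals)

# Total units by region (product and year summed out).
print(roll_up(cube, keep=(1,)))  # {('East',): 160, ('West',): 90}

# A "slice": fix year = 2012, then view product x region.
slice_2012 = {(p, r): u for (p, r, y), u in cube.items() if y == 2012}
print(slice_2012)
```

The same totals could be produced with SQL `GROUP BY` queries on a two-dimensional relational table; the multidimensional view simply makes each business dimension an axis that can be summed out or fixed.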
Basic Characteristics of Data Warehouses and Data Marts (continued)
• Nonvolatile: data warehouses and data marts are nonvolatile; that is, users cannot change or update the data.
• Multidimensional: typically, the data warehouse or data mart uses a multidimensional data structure. Recall that relational databases store data in two-dimensional tables.

FIGURE 5.5 Relational databases.
FIGURE 5.6 Data cube.
FIGURE 5.7 Equivalence between relational and multidimensional databases.

6. Knowledge Management
• Concepts and Definitions
• Knowledge Management Systems
• The KMS Cycle

Concepts and Definitions
• Knowledge: information that is contextual, relevant, and useful. It is information in action. Intellectual capital (or intellectual assets) is another term for knowledge.
• Knowledge management (KM): a process that helps organizations manipulate important knowledge that comprises part of the organization's memory, usually in an unstructured format.
• Explicit knowledge: more objective, rational, and technical knowledge. In an organization, explicit knowledge consists of the policies, procedural guides, reports, products, strategies, goals, core competencies, and IT infrastructure of the enterprise.
• Tacit knowledge: the cumulative store of subjective or experiential learning. In an organization, tacit knowledge consists of an organization's experiences, insights, expertise, know-how, trade secrets, skill sets, understanding, and learning. It is generally imprecise and costly to transfer.

Knowledge Management Systems (KMS)
KMSs refer to the use of modern information technologies (the Internet, intranets, extranets, and databases) to systematize, enhance, and expedite intrafirm and interfirm knowledge management. KMSs are intended to help an organization cope with turnover, rapid change, and downsizing by making the expertise of the organization's human capital widely accessible.

FIGURE 5.8 The knowledge management system cycle.

The KMS Cycle
1. Create: knowledge is created as people develop new ways of doing things and new know-how.
2. Capture: new knowledge must be identified as valuable.
3. Refine: new knowledge must be placed in context so that it is actionable. This is where tacit qualities (human insights) must be captured along with explicit facts.
4. Store: useful knowledge must be stored in a reasonable format in a knowledge repository so that other people in the organization can access it.
5. Manage: like a library, the knowledge must be kept current. It must be reviewed regularly to verify that it is relevant and accurate.
6. Disseminate: knowledge must be made available in a useful format to anyone in the organization who needs it, anywhere and anytime.