
Week 5. Data Management (1)

Information systems
Unit Name: Information Systems for Business
Unit Code: JQU0001
*
Topic 9
Data and Knowledge
Management
*
[ CHAPTER OUTLINE ]
1.Managing Data
2.The Database Approach
3.Fundamentals of Relational Database
Operations
4.Big Data
5.Data Warehouses and Data Marts
6.Knowledge Management
*
1. Managing Data
The Difficulties of Managing Data
Data Governance
4
Difficulties in Managing Data
Data increases exponentially with time
Multiple sources of data
Data rot, or data degradation
Data security, quality, and integrity
Government Regulation
5
Multiple Sources of Data
Internal Sources
Corporate databases, company documents
Personal Sources
Personal thoughts, opinions, experiences
External Sources
Commercial databases, government reports, and
corporate Web sites
New sources of data (e.g., blogs, podcasts, videocasts, and
RFID tags and other wireless sensors)
*
Difficulties in Managing Data
Data Degradation:
Data become outdated over time (e.g., customers move to new
addresses, change their names, etc.)
Data Rot:
refers primarily to problems with the media on which the data are
stored. Over time, temperature, humidity, and exposure to light
can cause physical problems with storage media and thus make it
difficult to access the data.
Data security, quality, and integrity:
all three are critical
*
Difficulties in Managing Data
Government Regulation:
• Legal requirements change frequently and
differ among countries and industries
• Federal regulations:
The Sarbanes–Oxley Act of 2002
requires that:
public companies evaluate and disclose the
effectiveness of their internal financial controls
independent auditors for these companies agree to
this disclosure.
*
Data Governance
An approach to managing information across an
entire organization
Master Data
a set of core data (e.g., customer, product, employee,
vendor, geographic location, etc.) that spans the
enterprise information systems.
Master Data Management
Strategic process of data governance to manage the
company's master data consistently and accurately
9
2. The Database Approach
The Data Hierarchy
The Relational Database Model
10
Data Hierarchy
Data hierarchy
A basic concept in data and database theory that helps to
show the relationships between smaller and larger components
in a database or data file.
This concept is a starting point for seeing what makes up data
and whether data has a structure (i.e., these terms are seen as
smaller or larger components in a hierarchy).
Data organization involves characters, fields, records, files,
and so on.
11
Data Hierarchy Components
Bit
Byte
Field
Record
Data File or Table
Database
12
Data Hierarchy Components
Binary digit (bit):
basic unit of information, either 0 or 1
Byte
is a unit of data that is eight binary digits long; the unit most computers use
to represent a character such as a letter, number, or symbol
Data field
holds a single fact or attribute of an entity, e.g. "19 September 2004"
Record
is a collection of related fields, e.g. an employee record contains name
field(s), address fields, a birthdate field, and so on
File (or table)
is a collection of related records.
Database
is a collection of files (or tables) that have been integrated. This is done
using a Database Management System
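The hierarchy above can be walked through in a few lines of Python. This is only an illustrative sketch: the employee names, addresses, and the second birthdate are made-up values, and plain Python lists and dictionaries stand in for a real file or table.

```python
# Illustrative walk up the data hierarchy: bit -> byte -> field -> record -> file.
# All names and values below are hypothetical examples.

# A byte is eight bits; this one encodes the character "A" (code 65).
bits = "01000001"
byte_value = int(bits, 2)
character = chr(byte_value)

# A field holds a single fact about an entity.
birthdate_field = "19 September 2004"

# A record is a collection of related fields.
employee_record = {
    "name": "Jane Doe",
    "address": "1 Example Street",
    "birthdate": birthdate_field,
}

# A file (or table) is a collection of related records.
employee_file = [
    employee_record,
    {"name": "John Roe", "address": "2 Sample Road", "birthdate": "3 March 1999"},
]

print(character)           # the character stored in one byte
print(len(employee_file))  # number of records in the file
```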
13
14
Database Management System (DBMS)
A Database Management System (DBMS) is a
software package designed to store and manage
databases.
It controls access to the physical data.
The DBMS is an interface between applications
and the physical data.
A database is the actual collection of data.
A DBMS manages the data.
*
Database Management Approach
*
Database Management Systems

Once we have built the database, the DBMS then becomes the interface between the
database and the database applications that use Forms, Reports, Queries and
Application Programs to report on and manipulate the data.
*
Database Management Systems Minimize Three Main
Problems
Data Redundancy:
the same data are stored in multiple locations
Data Isolation:
Applications cannot access data associated
with other applications.
Data Inconsistency:
Various copies of the data do not agree.
*
Database Management Systems Maximize
Three Things
Data Security:
Reduces the risk of data loss, corruption, and attacks by hackers.
Data Integrity:
Data meet certain constraints; e.g. No alphabetic
characters in a Social Security number field.
Data Independence:
Applications and data are independent of one
another; that is, applications and data are not
linked to each other, so all applications are able
to access the same data.
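The data integrity point can be made concrete with a small sketch using Python's built-in sqlite3 module. The employee table and its column names are assumptions made for illustration; the CHECK constraint is one possible way to have the DBMS reject alphabetic characters in a Social Security number field.

```python
import sqlite3

# Hypothetical table: the DBMS itself enforces the integrity constraint,
# so every application that writes to the table is held to the same rule.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        ssn TEXT NOT NULL CHECK (ssn NOT GLOB '*[A-Za-z]*')
    )
    """
)


def try_insert(ssn):
    """Return True if the DBMS accepts the value, False if it rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO employee (ssn) VALUES (?)", (ssn,))
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert("123-45-6789")   # digits and dashes only
rejected = try_insert("12A-45-6789")   # contains a letter
print(accepted, rejected)
```

Because the rule lives in the database rather than in one program, it also illustrates data independence: any application using this table gets the same check.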
*
Database Management Systems
Advantages
Data Consistency and Integrity – by controlling access and
minimizing data duplication
Application program independence – by storing data in a
uniform fashion
Data Sharing – by controlling access to data items, many
users can access data concurrently
Backup and Recovery, Security and Privacy
Disadvantages
Expensive and complicated to set up and maintain; specialized
staff required (e.g., a Database Administrator)
This cost and complexity must be justified by the organization's need
*
3. Fundamentals of Relational
Database Operations
Query Languages
Entity Relationship Modeling
Normalization and Joins
21
Query Languages
Structured Query Language (SQL):
the most popular query language used for
interacting with a database. SQL allows people
to perform complicated searches by using
relatively simple statements or key words
Query By Example (QBE):
the user fills out a grid or template—also known
as a form—to construct a sample or a description
of the data desired.
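As a minimal sketch of how a short SQL statement expresses a search, the example below uses Python's built-in sqlite3 module. The student table, its columns, and the rows are hypothetical and exist only to give the query something to run against.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    "student_id INTEGER PRIMARY KEY, name TEXT, major TEXT, gpa REAL)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [
        (1, "Ana", "Accounting", 3.6),
        (2, "Ben", "Marketing", 2.9),
        (3, "Cho", "Accounting", 3.1),
    ],
)

# SELECT names the attributes wanted, FROM names the table,
# and WHERE states the conditions a record must meet.
rows = conn.execute(
    "SELECT name FROM student WHERE major = ? AND gpa > ? ORDER BY name",
    ("Accounting", 3.0),
).fetchall()
names = [row[0] for row in rows]
print(names)
```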
22
The Relational Database Model
Key Terms
Database Management System
Relational Database Model
Entity
Instance
Attribute
Primary Key
Foreign or Secondary Keys
23
The Relational Database Model
Relational Database Model:
is based on the concept of two-dimensional tables
and is usually designed as a number of related
tables, each of which contains records
(listed in rows) and attributes (listed in columns).
Entity:
a person, place, thing, or event (e.g., customer, an
employee, or a product).
Attribute:
each characteristic or quality of a particular entity.
24
Designing the Database
Entity
A person, place, thing, or event about which information must be kept
Example:
Students
Customers
An order to buy a product
Attribute (field)
A fact about a particular entity
Example:
Order Date
Quantity
Price
Student ID
Do not confuse the attribute name (Order Date) with the field value
(for example “02/08/2012”)
*
The Relational Database Model
Primary Key:
a field in a database that uniquely identifies each record
so that it can be retrieved, updated, and sorted.
Secondary Key:
a field that has some identifying information, but
typically does not identify the record with complete
accuracy and therefore cannot serve as the Primary
Key.
Foreign Key:
a field (or group of fields) in one table that uniquely
identifies a row of another table. It is used to establish
and enforce a link between two tables.
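The following sketch shows primary and foreign keys at work, using Python's built-in sqlite3 module. The customer and orders tables and their columns are assumptions chosen for illustration. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    """
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        order_date TEXT NOT NULL,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
    )
    """
)
conn.execute("INSERT INTO customer VALUES (1, 'Ana')")
conn.execute("INSERT INTO orders VALUES (100, '2012-08-02', 1)")


def insert_fails(sql):
    """Return True if the DBMS rejects the statement with an integrity error."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# A second customer with primary key 1 would make the key non-unique.
duplicate_key_rejected = insert_fails("INSERT INTO customer VALUES (1, 'Ben')")
# An order pointing at customer 99 has no matching row in the customer table.
orphan_order_rejected = insert_fails("INSERT INTO orders VALUES (101, '2012-08-03', 99)")
print(duplicate_key_rejected, orphan_order_rejected)
```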
26
Designing the Database
A Primary Key (identifier) / Key Field
One attribute (or a set of attributes) in a record that
uniquely identifies instances of that record so that it can be
retrieved, updated, or sorted.
Only one record will be retrieved.
*
Entity Relationship Modeling
Entity Relationship Diagram (ERD)
Business Rules
Data Dictionary
Relationships
Unary, Binary, Ternary
Cardinality
Connectivity
29
Entity Relationship Modeling
Entity–Relationship (ER) Modeling:
A process by which designers plan and create
databases using an entity–relationship diagram.
ER Diagrams (ERD):
consist of entities, attributes, and relationships.
To properly identify entities, attributes, and
relationships, database designers first identify
the business rules for the particular data model.
30
Entity Relationship Modeling
Relationships: illustrate an association between
entities.
Degree of a Relationship indicates the number
of entities associated with a relationship.
Unary Relationship: exists when an association is
maintained within a single entity.
Binary Relationship: exists when two entities are
associated.
Ternary Relationship: exists when three entities are
associated.
33
Entity Relationship Modeling
Components of an Entity-Relationship Data Model:
Entities
Something users want to track, e.g. order, customer,
salesperson, item, volunteer, donation
Attributes
Describe characteristics of an entity, e.g.
OrderNumber, CustomerNumber, VolunteerName,
PhoneNumber
Unique Identifier
Attribute that uniquely identifies one entity instance
from other instances, e.g. Student_ID_Number
34
Entity Relationship Modeling
Cardinality:
Refers to the maximum number of times an instance
of one entity can be associated with an instance in
the related entity. (One to Many Relationship example
below)
36
Entity Relationship Modeling
Optionality:
The cardinality shows the type of relationship, but it
does not state whether the relationship is mandatory or
optional.
Cardinality can be mandatory single, optional single,
mandatory many, or optional many.
A Mandatory relationship is where there must be at
least one matching record in each entity.
An Optional relationship is where there may or may
not be a matching record in each entity.
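One common way to express cardinality and optionality in a relational schema is through foreign key columns. The sketch below uses Python's built-in sqlite3 module; the department, parking_permit, and employee tables are hypothetical. A NOT NULL foreign key models the mandatory side for the employee, and a nullable one models an optional link. It only enforces the rule from the employee's side, so it is a partial illustration rather than a full treatment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE parking_permit (permit_id INTEGER PRIMARY KEY, label TEXT NOT NULL);
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        -- mandatory: every employee must belong to exactly one department
        dept_id INTEGER NOT NULL REFERENCES department(dept_id),
        -- optional: an employee may or may not hold a permit
        permit_id INTEGER REFERENCES parking_permit(permit_id)
    );
    """
)

conn.execute("INSERT INTO department VALUES (1, 'Sales')")
# One department, many employees: a one-to-many relationship.
conn.execute("INSERT INTO employee VALUES (1, 'Ana', 1, NULL)")
conn.execute("INSERT INTO employee VALUES (2, 'Ben', 1, NULL)")

try:
    # Leaving out the department violates the mandatory side.
    conn.execute("INSERT INTO employee VALUES (3, 'Cho', NULL, NULL)")
    mandatory_enforced = False
except sqlite3.IntegrityError:
    mandatory_enforced = True

employees_in_sales = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE dept_id = 1"
).fetchone()[0]
print(mandatory_enforced, employees_in_sales)
```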
37
Figure 5A.1 Cardinality Symbols
*
Types of E-R Modeling
Figure 5A.2 One-to-one Relationship
*
Figure 5A.3 One-to-Many Relationship
*
Figure 5A.4 Many-to-Many Relationship
*
Entity Relationship Modeling
Business Rules:
precise descriptions of policies, procedures,
or principles in any organization that stores
and uses data to generate information.
Business rules are derived from a description
of an organization’s operations, and help
create and enforce business processes in
that organization.
51
Entity Relationship Modeling
Data Dictionary: provides
information on each
attribute, such as its
name, whether it is a primary key,
the type of data (alphanumeric,
numeric, dates, etc.), and
valid values.
Data dictionaries can also
provide information on
relationships, which illustrate an
association between entities.
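Many DBMSs can report this kind of per-attribute information themselves. As a rough sketch, SQLite exposes it through PRAGMA table_info, which Python's built-in sqlite3 module can read. The student table here is a hypothetical example, and the resulting list covers only name, type, and key information, not valid values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        birthdate TEXT
    )
    """
)

# PRAGMA table_info returns one row per column:
# (cid, name, declared type, notnull flag, default value, primary key position)
dictionary = [
    {
        "name": name,
        "type": col_type,
        "required": bool(notnull),
        "primary_key": bool(pk),
    }
    for _cid, name, col_type, notnull, _default, pk in conn.execute(
        "PRAGMA table_info(student)"
    )
]

for entry in dictionary:
    print(entry)
```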
61
Normalization and Joins
Normalization
is a method for analyzing and reducing a
relational database to its most streamlined form
to ensure minimum redundancy, maximum data
integrity, and optimal processing performance.
Functional Dependencies
First Normal Form
Second Normal Form
Third Normal Form
62
Normalization and Joins
Functional Dependencies:
a means of expressing that the value of one
particular attribute is associated with a specific single
value of another attribute.
For example, for a Student Number 05345 at a
university, there is exactly one Student Name, John
C. Jones, associated with it. That is, Student Number
is referred to as the determinant because its value
determines the value of the other attribute. We can
also say that Student Name is functionally dependent
on Student Number.
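A functional dependency can be checked mechanically against sample data: every value of the determinant must map to exactly one value of the dependent attribute. The sketch below assumes rows stored as Python dictionaries; the second student and the course codes are made up for illustration.

```python
def is_functionally_dependent(rows, determinant, dependent):
    """Return True if each determinant value maps to exactly one dependent value."""
    seen = {}
    for row in rows:
        key, value = row[determinant], row[dependent]
        # setdefault records the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True


students = [
    {"student_number": "05345", "student_name": "John C. Jones", "course": "IS101"},
    {"student_number": "05345", "student_name": "John C. Jones", "course": "IS202"},
    {"student_number": "07812", "student_name": "Mary Smith", "course": "IS101"},
]

# Student Number determines Student Name ...
name_depends_on_number = is_functionally_dependent(
    students, "student_number", "student_name"
)
# ... but not the course, because one student takes several courses.
course_depends_on_number = is_functionally_dependent(
    students, "student_number", "course"
)
print(name_depends_on_number, course_depends_on_number)
```

A check like this can only disprove a dependency from sample data; whether it truly holds is a matter of the business rules.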
66
Figure 5A.5 Raw Data Gathered from Orders at the Pizza
Shop
*
Figure 5A.6 Functional Dependencies in Pizza Shop
Example
*
Figure 5A.7 First Normal Form for Data from Pizza Shop
*
Figure 5A.8 Second Normal Form for Data from Pizza Shop
*
Figure 5A.9 Third Normal Form for Data from Pizza Shop
*
Figure 5A.10 The Join Process with the tables of third
normal form to produce an order
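The join process can be sketched with Python's built-in sqlite3 module. The original figures are not reproduced here, so the four tables, their columns, and the sample rows below are assumptions about what a normalized pizza shop design might look like, not the figure's actual contents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical normalized tables: each fact is stored once.
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE pizza (pizza_code TEXT PRIMARY KEY, pizza_name TEXT, price REAL);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        order_date TEXT,
        customer_id INTEGER REFERENCES customer(customer_id)
    );
    CREATE TABLE order_pizza (
        order_id INTEGER REFERENCES orders(order_id),
        pizza_code TEXT REFERENCES pizza(pizza_code),
        quantity INTEGER,
        PRIMARY KEY (order_id, pizza_code)
    );
    INSERT INTO customer VALUES (1, 'Ana');
    INSERT INTO pizza VALUES ('P', 'Pepperoni', 11.0), ('V', 'Veggie', 9.0);
    INSERT INTO orders VALUES (100, '2024-01-15', 1);
    INSERT INTO order_pizza VALUES (100, 'P', 2), (100, 'V', 1);
    """
)

# The join follows the keys to reassemble one complete order from the tables.
lines = conn.execute(
    """
    SELECT o.order_id, c.name, p.pizza_name, op.quantity, op.quantity * p.price
    FROM orders AS o
    JOIN customer AS c ON c.customer_id = o.customer_id
    JOIN order_pizza AS op ON op.order_id = o.order_id
    JOIN pizza AS p ON p.pizza_code = op.pizza_code
    WHERE o.order_id = 100
    ORDER BY p.pizza_name
    """
).fetchall()
order_total = sum(line[4] for line in lines)

for line in lines:
    print(line)
print(order_total)
```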
*
FIGURE 5.3 Student database example.
Database applications
also consist of:
• Tables
• Forms
• Reports
• Queries
*
Database Application Systems
• Tables
• Form
• Query
• Report
80
4. Big Data
Defining Big Data
Characteristics of Big Data
Managing Big Data
Leveraging Big Data
81
Defining Big Data
Big Data generally consists of:
Traditional enterprise data
Machine-generated/sensor data
Social Data
Images captured by billions of devices
located around the world
Digital cameras, camera phones, medical
scanners, and security cameras
82
Defining Big Data
• Variety, includes structured, unstructured, and
semi-structured data
• Generated at high Volumes and Velocity with
an uncertain pattern
• Do not fit neatly into traditional, structured,
relational databases
• Can be captured, processed, transformed, and
analyzed in a reasonable amount of time only
by sophisticated information systems.
*
Defining Big Data
At its core, Big Data is about predictions.
Predictions do not come from “teaching”
computers to “think” like humans.
Instead, predictions come from applying
mathematics to huge quantities of data to infer
probabilities
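As a toy illustration of inferring probabilities from data, the sketch below estimates how likely a purchase is after a product view by simple relative frequency. The eight observations are invented, and real Big Data systems apply far more sophisticated mathematics to far more data.

```python
from collections import Counter

# Hypothetical observations: (product viewed, whether a purchase followed)
events = [
    ("laptop", True), ("laptop", False), ("laptop", True), ("laptop", True),
    ("phone", False), ("phone", False), ("phone", True), ("phone", False),
]

views = Counter(product for product, _ in events)
purchases = Counter(product for product, bought in events if bought)

# Estimated probability of purchase = purchases / views for each product
purchase_probability = {
    product: purchases[product] / views[product] for product in views
}
print(purchase_probability)
```

With only a handful of observations such estimates are unreliable; the estimate becomes more trustworthy as the quantity of data grows.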
*
Defining Big Data
• Unstructured data is information that either does not
have a pre-defined data model or is not organized in a
pre-defined manner. It may be textual or non-textual, and
human- or machine-generated.
*
Issues with Big Data
Untrusted data sources
Big Data is dirty:
dirty data refers to inaccurate, incomplete, incorrect,
duplicate, or erroneous data.
Big Data changes, especially in data streams:
Organizations must be aware that data quality in an
analysis can change, or the data itself can change,
because the conditions under which the data are
captured can change
86
Managing Big Data
When properly analyzed, big data can
reveal valuable patterns and information.
Database environment
Open source solutions
Traditional relational databases versus
NoSQL databases
87
Putting Big Data to Use
Making Big Data Available:
making Big Data available to relevant stakeholders can help
organizations gain value.
Enabling Organizations to Conduct
Experiments
Micro-segmentation of Customers:
dividing them up into groups that share one or
more characteristics.
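Micro-segmentation amounts to grouping customers by the characteristics they share. A minimal sketch follows, assuming customers are held as Python dictionaries; the names, age bands, and regions are hypothetical.

```python
from collections import defaultdict

customers = [
    {"name": "Ana", "age_band": "18-25", "region": "North"},
    {"name": "Ben", "age_band": "18-25", "region": "North"},
    {"name": "Cho", "age_band": "26-40", "region": "North"},
    {"name": "Dee", "age_band": "26-40", "region": "South"},
]


def segment(customers, *characteristics):
    """Group customer names by the values of one or more shared characteristics."""
    groups = defaultdict(list)
    for customer in customers:
        key = tuple(customer[c] for c in characteristics)
        groups[key].append(customer["name"])
    return dict(groups)


by_age = segment(customers, "age_band")
by_age_and_region = segment(customers, "age_band", "region")
print(by_age)
print(by_age_and_region)
```

Adding characteristics makes the segments smaller and more specific, which is the sense of "micro" here.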
88
Putting Big Data to Use
Creating New Business Models:
Telematics, e-commerce
Organizations Can Analyze Far More Data:
organizations can even process all the data in a
population relating to a particular phenomenon,
meaning that they do not have to rely as much
on sampling.
89
5. Data Warehouses and Data Marts
Data Warehouse
A repository of historical data that are organized
by subject to support decision makers in the
organization. Primarily used by large companies.
Data Mart
A low-cost, scaled-down version of a data
warehouse designed for end-user needs in a
strategic business unit (SBU) or individual
department.
90
FIGURE 5.4 Data warehouse framework.
SOURCE SYSTEMS
DATA
INTEGRATION
STORING
DATA
USERS
*
Basic Characteristics of Data Warehouses & Data
Marts
Data Organized by business dimension or
subject
For example, by customer, vendor, product, price
level, and region. This arrangement differs from
transactional systems, where data is organized by
business process, such as order entry, inventory
control, and accounts receivable.
Use of Online Analytical Processing (OLAP):
performs multidimensional analysis of business data
and provides the capability for complex calculations,
trend analysis, and sophisticated data modeling
92
Basic Characteristics of Data Warehouses & Data
Marts
Integrated
Data is collected from multiple systems and then
integrated around subjects; not organized by
business process (e.g. transaction)
Time variant
Data warehouses and data marts maintain historical
data (i.e., data that include time as a variable), rather
than real-time data.
93
Basic Characteristics of Data Warehouses & Data
Marts
Nonvolatile
Data warehouses and data marts are nonvolatile; that
is, users cannot change or update the data.
Multidimensional
Typically the data warehouse or mart uses a
multidimensional data structure.
Recall that relational databases store data in two-dimensional tables.
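The difference between the two structures can be sketched in plain Python. The product, region, and year dimensions and the sales figures are hypothetical, and a dictionary keyed by a tuple stands in for a real multidimensional store; roll_up is a helper written for this example, not part of any OLAP product.

```python
from collections import defaultdict

# Hypothetical sales facts in relational (two-dimensional) form: one row per fact.
rows = [
    ("Nuts", "East", 2023, 50),
    ("Nuts", "West", 2023, 60),
    ("Bolts", "East", 2023, 40),
    ("Nuts", "East", 2024, 70),
    ("Bolts", "West", 2024, 90),
]

# The same facts as a cube: each cell is addressed by (product, region, year).
cube = defaultdict(int)
for product, region, year, sales in rows:
    cube[(product, region, year)] += sales


def roll_up(cube, dimension_index):
    """Total the sales along one dimension (0 = product, 1 = region, 2 = year)."""
    totals = defaultdict(int)
    for key, sales in cube.items():
        totals[key[dimension_index]] += sales
    return dict(totals)


sales_by_product = roll_up(cube, 0)
sales_by_year = roll_up(cube, 2)

# A "slice" fixes some dimensions and looks at what remains.
east_2023 = {
    key[0]: sales
    for key, sales in cube.items()
    if key[1] == "East" and key[2] == 2023
}
print(sales_by_product, sales_by_year, east_2023)
```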
94
FIGURE 5.5 Relational databases.
*
FIGURE 5.6 Data cube.
*
FIGURE 5.7 Equivalence between relational and
multidimensional databases.
*
6. Knowledge Management
Concepts and Definitions
Knowledge Management Systems
The KMS Cycle
98
Concepts & Definitions
Knowledge:
information that is contextual, relevant, and
useful. It is information in action. Intellectual
capital (or intellectual assets) is another term for
knowledge.
Knowledge Management (KM)
A process that helps manipulate important
knowledge that comprises part of the
organization’s memory, usually in an
unstructured format.
99
Concepts & Definitions
Explicit Knowledge:
more objective, rational, and technical knowledge. In
an organization, explicit knowledge consists of the
policies, procedural guides, reports, products,
strategies, goals, core competencies, and IT
infrastructure of the enterprise.
Tacit Knowledge:
the cumulative store of subjective or experiential
learning. In an organization, tacit knowledge consists
of an organization’s experiences, insights, expertise,
know-how, trade secrets, skill sets, understanding,
and learning.
It is generally imprecise and costly to transfer.
100
Knowledge Management Systems (KMS)
Refer to the use of modern information
technologies – the Internet, intranets,
extranets, databases – to systematize,
enhance, and expedite intra-firm and inter-firm
knowledge management.
KMSs are intended to help an organization
cope with turnover, rapid change, and
downsizing by making the expertise of the
organization’s human capital widely
accessible.
101
FIGURE 5.8 The knowledge management system cycle.
*
The KMS Cycle
1. Create:
Knowledge is created when new ways of doing things and
new know-how are developed
2. Capture:
New knowledge is identified as valuable
3. Refine:
New knowledge is placed in a context that makes it actionable;
this is where tacit qualities (human insights) must be
captured along with explicit facts
103
The KMS Cycle
4. Store:
Useful knowledge must be stored in a reasonable format in a knowledge
repository so that other people in the organization can
access it.
5. Manage:
Like a library, the knowledge must be kept current. It
must be reviewed regularly to verify that it is relevant
and accurate.
6. Disseminate:
Knowledge must be made available in a useful format
to anyone in the organization who needs it, anywhere
and anytime.
104