Lecture Note 4

IS 630: Accounting Information Systems
http://www.csun.edu/~dn58412/IS530/IS530_F15.htm
Relational Databases & Data Modeling with ERD
Lecture 4
Learning Objectives
▪ Limitations of traditional application approaches to managing data
▪ Advantages of the centralized database approach
▪ The REAL framework for capturing relevant business data
▪ Data modeling with Entity-Relationship Diagrams (ERD)
▪ Advanced database applications in decision support and knowledge management
IS 530 : Lecture 4
Why Databases?
▪ Business information systems are built on databases of business event data.
▪ Accounting information is one of many outputs of business event data.
▪ Larger organizations store information in data warehouses in ways that let managers analyze it to gain important insights.
▪ Sophisticated reporting systems, based on data warehouses and business event databases, help managers make better decisions.
Application Approach
To Business Event Processing
Database Approach
To Business Event Processing
Difficulties of Non-Relational Data Files
▪ Update anomaly: a data item stored in many places is changed in some occurrences but not all, leaving inconsistent copies.
▪ Insert anomaly: recording a new fact forces an invalid (null-filled) record into the database.
▪ Delete anomaly: deleting a record fails to remove all the information about it that is stored in other places.
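The update anomaly can be sketched in a few lines. This is only an illustration, assuming a hypothetical flat file in which every order row repeats the customer's address; the names `orders_flat`, `change_address_once`, and `addresses_on_file`, and the sample addresses, are made up for the example.

```python
# Hypothetical flat file: the customer's address is repeated on every order row.
orders_flat = [
    {"order_id": 1, "customer": "Acme Mfg.", "address": "12 Oak St"},
    {"order_id": 2, "customer": "Acme Mfg.", "address": "12 Oak St"},
    {"order_id": 3, "customer": "First Corp.", "address": "9 Elm Ave"},
]


def change_address_once(rows, customer, new_address):
    """Simulate a careless update that changes only the first matching row."""
    for row in rows:
        if row["customer"] == customer:
            row["address"] = new_address
            return  # stops early, so the other copies stay stale


def addresses_on_file(rows, customer):
    """Return the set of distinct addresses stored for one customer."""
    return {row["address"] for row in rows if row["customer"] == customer}


change_address_once(orders_flat, "Acme Mfg.", "77 Pine Rd")
acme_addresses = addresses_on_file(orders_flat, "Acme Mfg.")
print(sorted(acme_addresses))  # one customer, two conflicting addresses
```

Storing the address once, in a separate customer table, would leave only one place to update.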
Difficulties with Applications Approach
▪ Each application collects and manages its own data in dedicated, separate, physically distinguishable files.
▪ Data redundancy leads to inconsistencies (integrity problems) among copies of the same data in different files.
▪ Storing multiple versions of the same data in different files increases costs.
▪ Data residing in separate files are not shareable because each file's fixed record layout was created for a particular application.
Centralized Database Approach
▪ Facts about events are stored in relational database tables instead of separate files.
▪ Improves efficiency, eliminates data redundancies, and improves data integrity.
▪ Enables integrated business information systems that include data about all of a company’s operations.
▪ Multiple users from throughout the organization can view and aggregate event data in a manner most conducive to their needs.
Database Management Systems
▪ Database management system (DBMS): a set of integrated programs designed to simplify the tasks of creating, accessing, and managing a centralized database.
▪ Integrates a collection of files that are independent of application programs and are available to satisfy a number of different processing needs.
▪ Supports normal data processing needs and provides data useful to managers.
Key DBMS Concepts
▪ Data independence: data are decoupled from the system applications, so the data stand independent of any one application or user.
▪ Three-tier architecture: presentation (user interface), logic (applications), and data (database).
▪ Query language: a programming language used to create and access a database and to produce inquiry reports.
▪ SQL (Structured Query Language): the standard for DBMS query languages.
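A small sketch of the three uses of a query language (create, access, report), written as SQL statements run through Python's built-in `sqlite3` module. The table `customer`, its columns, and the ID values are assumptions made for illustration; SQLite stands in for whatever DBMS is actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Create: define a table.
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, cust_name TEXT)")

# Access: add rows to the table.
conn.executemany(
    "INSERT INTO customer (cust_id, cust_name) VALUES (?, ?)",
    [(1, "First Corp."), (2, "Acme Mfg.")],
)

# Report: an inquiry listing customer names in alphabetical order.
report = conn.execute(
    "SELECT cust_name FROM customer ORDER BY cust_name"
).fetchall()
print(report)
conn.close()
```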
Advantages of DBMS
▪ Eliminating data redundancy
▪ Ease of maintenance
▪ Reduced labor and storage costs
▪ Data integrity
▪ Data independence
▪ Privacy
Disadvantages of DBMS
▪ Expensive to implement.
▪ Requires specialized expertise.
▪ If the DBMS fails, all of the organization’s information processing halts.
▪ Increased potential for damage from unauthorized access to a central location.
▪ Database recovery and contingency planning are more important than in the applications approach.
Disadvantages of DBMS . . .
▪ “Contention” or “concurrency” problems when more than one user attempts to access data at the same time.
▪ Territorial disputes over “data ownership”: who is responsible for data maintenance?
▪ A CIO and/or database administrator function is needed to deal with these and other problems.
Evolution of Database Systems
▪ File Management (Flat File) Systems
▪ Hierarchical Databases
▪ Network Databases
▪ Relational Databases
▪ Object-Oriented Databases
▪ Data Warehouse
File Management Systems
[Figure: three application programs (EMPLOYEE UPDATE PROGRAM, EMPLOYEE REPORT PROGRAM, CHECK-WRITING PROGRAM) with file description (FD) entries, and two data files (EMPLOYEE MASTER FILE, TIMECARD FILE).]
Hierarchical Databases
[Figure: tree diagram of a Car broken down into parts: Engine, Body, Chassis, Left Door, Right Door, Hood, Roof, Handle, Window, Lock.]
Hierarchical Database Model
▪ Hierarchical database model: records are organized in a pyramid (tree) structure.
▪ Child records: records that are included in a record one level above them (a parent record). A child may have only one parent record and is linked to it through “pointers.”
▪ Parent records: include the lower-level child records.
▪ Cannot support complex data structures.
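The parent/child pointers can be sketched as follows. This is an illustration only: the class `Record` is hypothetical, and the particular chain Car > Body > Left Door > Handle is an assumed reading of the car diagram, not something the slides spell out.

```python
class Record:
    """A record in a hierarchy: exactly one parent pointer, any number of children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # single pointer to the parent record (None for the root)
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def path(self):
        """Follow parent pointers from this record up to the root."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return " > ".join(reversed(names))


car = Record("Car")
body = Record("Body", parent=car)
left_door = Record("Left Door", parent=body)
handle = Record("Handle", parent=left_door)

handle_path = handle.path()
print(handle_path)
```

Because each record holds a single parent pointer, a part shared by two assemblies cannot be represented without duplicating it, which is the limitation noted above.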
Network Databases
[Figure: network of CUSTOMERS (Acme Mfg., First Corp.), PRODUCTS (Size 4 Widget, 4D Bolt), and ORDERS, with linked records numbered #11231 through #11235.]
Network Database Model
▪ Network database model: child records can have more than one parent record.
▪ Overcomes problems of the hierarchical model.
▪ Eclipsed by relational databases.
Relational Databases
[Figure: CUSTOMERS (CUST ID) and PRODUCTS (PRODUCT ID) each relate 1:M to ORDERS (ORDER #, CUST ID, PRODUCT ID, QUANTITY).]
Relational Database Model
▪ Relational database model: data are logically organized into two-dimensional tables (i.e., “relations”).
▪ Improvement over hierarchical or network database models.
▪ Able to handle complex queries (information drawn from many tables/files).
▪ In its basic form, stores only text and numerical data; complex object types such as graphics, audio, video, or geographic information are not supported.
Object-Oriented Databases
[Figure: CUSTOMERS (CUST ID, CUST NAME, ADDRESS) and PRODUCTS (PRODUCT ID, PRICE, QTY-ON HAND) each relate 1:* to ORDERS (ORDER #, CUST ID, PRODUCT ID, QUANTITY). Methods shown on the objects: Add Customer, Drop Customer, Change Customer, New Product, Buy Product, Sell Product, Take Order, Update Order.]
Object-Oriented Database Model
▪ Object-oriented database model: allows the storage of both simple and complex objects.
▪ An object stores attributes together with instructions for actions (methods) that can be performed on the object or its attributes. It is a complete “application with its own data.”
▪ Objects are reusable.
▪ Object-relational databases: a relational DBMS framework with the capability to store complex data types.
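The idea of an object carrying both attributes and methods can be sketched with a plain Python class. This is a loose illustration, not an object-oriented DBMS: the class `Product` and the reading of "Buy Product" as restocking and "Sell Product" as reducing stock are assumptions.

```python
class Product:
    """An object bundling attributes with the methods that act on them."""

    def __init__(self, product_id, price, qty_on_hand):
        self.product_id = product_id
        self.price = price
        self.qty_on_hand = qty_on_hand

    def buy(self, quantity):
        """Buy Product (assumed meaning): received stock increases quantity on hand."""
        self.qty_on_hand += quantity

    def sell(self, quantity):
        """Sell Product (assumed meaning): refuse the sale if stock is insufficient."""
        if quantity > self.qty_on_hand:
            raise ValueError("insufficient stock")
        self.qty_on_hand -= quantity
        return quantity * self.price  # sale amount


widget = Product(product_id=1, price=5, qty_on_hand=10)
widget.buy(5)               # stock goes from 10 to 15
revenue = widget.sell(12)   # stock goes from 15 to 3
print(revenue, widget.qty_on_hand)
```

The rule "never sell more than is on hand" travels with the object, so any application reusing `Product` gets the same behavior.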
Data Warehouse
What Info to Keep?
REAL Framework
• Resources
• Events
• Agents
• Locations
A Model of Business Event
[Figure: a Business Event linked to Resources, Internal Agents, External Agents, and Location.]
• What happened?
• When did it happen?
• Who was involved?
• What resources were involved?
• Where did it occur?
REAL framework
[Figure: REAL template. Event 1 and Event 2 are each linked to a Resource, an Internal Agent, an External Agent, and a Location.]
REAL Model for Retailing Business
[Figure: two events, Sell Merchandise and Receive Customer Payment, linked to Merchandise, Salesperson, Counter, Customer, and Cash.]
Entities
▪ An entity is a group of attributes describing the same conceptual thing about which we need to capture and store data (in a file/table).
▪ An entity is a set of instances (members) of the object that it represents (records).
▪ An entity must have a unique name, a unique identifier, and at least one attribute (the identifier itself is sufficient).
Entities: Attributes
▪ An attribute is a descriptive property or characteristic of interest of an entity. Also called a field.
• The data type of an attribute defines what type of data can be stored in that attribute.
• The domain of an attribute defines what values the attribute can legitimately take on.
• The default value of an attribute is the value that will be recorded if the user does not specify one.
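Data type, domain, and default value map naturally onto a table definition. The sketch below uses SQLite through Python's `sqlite3` module; the table `employee`, its columns, the default state 'CA', and the 0 to 80 hour range are all hypothetical choices for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employee (
        emp_id   INTEGER PRIMARY KEY,                      -- data type: integer
        emp_name TEXT NOT NULL,                            -- data type: text
        state    TEXT DEFAULT 'CA',                        -- default value
        hours    INTEGER CHECK (hours BETWEEN 0 AND 80)    -- domain: 0 to 80
    )
    """
)

# No state is given, so the default value is recorded.
conn.execute("INSERT INTO employee (emp_id, emp_name, hours) VALUES (1, 'Lee', 40)")
default_state = conn.execute(
    "SELECT state FROM employee WHERE emp_id = 1"
).fetchone()[0]

# 200 hours is outside the domain, so the DBMS refuses the row.
try:
    conn.execute("INSERT INTO employee (emp_id, emp_name, hours) VALUES (2, 'Kim', 200)")
    domain_rejected = False
except sqlite3.IntegrityError:
    domain_rejected = True

print(default_state, domain_rejected)
```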
Entities: Identification
▪ A key is an attribute, or a group of attributes, that assumes a unique value for each entity member (Student ID, SSN, driver's license number).
• Why are First Name and Last Name NOT valid keys?
▪ A key made up of a group of attributes that together uniquely identify a member of an entity is called a composite key.
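One way to see why names fail as keys is to test candidate attributes for uniqueness on sample data. The records and the helper `is_valid_key` below are hypothetical; note that passing the test on a sample does not prove an attribute is a key, while failing it does rule the attribute out.

```python
# Hypothetical student records; two different students share the same name.
students = [
    {"student_id": 101, "first_name": "Ana", "last_name": "Lopez"},
    {"student_id": 102, "first_name": "Ana", "last_name": "Lopez"},
    {"student_id": 103, "first_name": "Ben", "last_name": "Tran"},
]


def is_valid_key(rows, attributes):
    """True if no two rows share the same combination of values for `attributes`."""
    values = [tuple(row[a] for a in attributes) for row in rows]
    return len(values) == len(set(values))


id_is_key = is_valid_key(students, ["student_id"])
name_is_key = is_valid_key(students, ["first_name", "last_name"])
print(id_is_key, name_is_key)
```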
Alternative ERD Notation
[Figure: Entity 1 and Entity 2, each drawn with its attributes attached, joined by a 1:N relationship.]
Entities . . .
ENTITY NAME
- entity id
- attribute 1
- attribute 2
- …………..
- attribute n
CUSTOMER
- Customer_ID
- Cust_Name
- Cust_Address
- Cust_Phone
Relationships: Degree
▪ The degree of a relationship defines how many entities are involved in the relationship (according to a business rule):
• Recursive (unary), binary, ternary
• A relationship may carry specific data of its own
Relationships: Degree . . .
▪ Recursive relationship: members of the same entity have a relationship with one another.
[Figure: INDIVIDUAL (ID, Name) with the recursive relationship Marry (Date); STUDENT (StudentID, StudentName) with the recursive relationship Be Friend.]
Relationships: Degree . . .
▪ Binary relationship
[Figure: EMPLOYEE (Emp_ID, Emp_Name, Emp_Title) related to PROJECT (Project_ID, Proj_Name, Proj_Due) through Lead (Date).]
Relationships: Degree . . .
▪ Ternary relationship
[Figure: EMPLOYEE (EmpID, Emp_Name, Emp_Title), PROJECT (ProjectID, Proj_Name, Proj_Due), and TASK (TaskID, TaskName) joined by the relationship Assign (Date).]
Relationships: Cardinalities
▪ Cardinalities document how many members of one entity can relate to a single member of another entity in a relationship.
• Stated as the minimum and maximum number of members (min, max)
• Reflect business policies or general business practices (e.g., how many classes a student can take; how many students a class can hold).
[Figure: Student and Class joined by Enroll, annotated with the cardinality pairs (16, 37) and (1, 5).]
Max Cardinalities
One-to-One (1:1) (Binary) Relationship
[Figure: Sales, Pay, Cash Collections]
Ex: Cash sales
One-to-Many (1:M) (Binary) Relationship
[Figure: Sales, Pay, Cash Collections]
Ex: Installment payments
Max Cardinalities . . .
Many-to-One (M:1) (Binary) Relationship
[Figure: Sales, Pay, Cash Collections]
Ex: Pay many credit purchases in full
Many-to-Many (M:N) (Binary) Relationship
[Figure: Sales, Pay, Cash Collections]
Ex: Pay credit purchases with partial payments over some months
Data Modeling & DB Design
▪ Data modeling: what information we need to keep and how the pieces relate to one another
▪ Database design: tables must be organized with few or no redundancies (normalization)
▪ Keys in a relational DB
• Primary key: for identification (Student ID)
• Combination primary key (composite key)
• Foreign key: to link one table to another
• Surrogate key: a single-value key used as an alternative to a composite key
• [Secondary key: for grouping (major, gender)]
• [Candidate key: an alternative attribute that could be used as the identifier (SSN, driver's license number)]
Database Design
▪ Relational data model (data schema)
• Primary key (PK): for record identification (Customer ID), (Order ID)
• Foreign key (FK): for a 1:M relationship, placed on the M-side (Orders) to link to the 1-side (the Customer who places the Orders)
• Associative table (junction table) with a composite key (CK) for M:N relationships
Foreign Keys in Relational Database
A foreign key (FK) in entity E1 (CustID in ORDER) is the primary key of another entity E2 (CustID in CUSTOMER). It identifies (links) a 1:M relationship between E2 and E1 (CUSTOMER and ORDER).
The foreign key is placed on the many side (a CUSTOMER has many ORDERS, so ORDER carries CustID as an FK to show which customer placed that order).
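The 1:M link can be sketched in SQLite through Python's `sqlite3` module. The table and column names are illustrative; the order table is called `orders` here only because ORDER is a reserved word in SQL, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(
    """
    CREATE TABLE customer (
        cust_id   INTEGER PRIMARY KEY,          -- PK on the one side
        cust_name TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        cust_id  INTEGER NOT NULL
                 REFERENCES customer (cust_id)  -- FK on the many side
    );
    INSERT INTO customer VALUES (1, 'Acme Mfg.'), (2, 'First Corp.');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
    """
)

# Follow the foreign key to count each customer's orders.
orders_per_customer = conn.execute(
    """
    SELECT c.cust_name, COUNT(o.order_id)
    FROM customer AS c
    JOIN orders AS o ON o.cust_id = c.cust_id
    GROUP BY c.cust_id, c.cust_name
    ORDER BY c.cust_id
    """
).fetchall()
print(orders_per_customer)
```

The customer's name is stored once; each order row carries only the key value that points to it.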
Foreign Key
[Figure: CUSTOMER (CustomerID as primary key) in a 1:M relationship with ORDER (OrderID as primary key, CustomerID as foreign key).]
Foreign Keys in Relational Database . . .
In an M:N relationship, an associative (junction) table with a composite key is used to capture the relationship.
• An ORDER involves many PRODUCTS, and a PRODUCT is involved in many ORDERS. The composite key ProductID-OrderID of LINE ITEM indicates which product is involved in which sale.
▪ Each part of the composite key serves as a foreign key.
Sometimes a “surrogate” key (RecordNo) is used as the primary key to simplify record identification.
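A junction table with a composite key might look like the sketch below, again in SQLite via Python's `sqlite3` module. The column `quantity`, the table name `orders` (ORDER is a reserved word), and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE orders  (order_id   INTEGER PRIMARY KEY);
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE line_item (
        order_id   INTEGER REFERENCES orders (order_id),     -- part 1, also an FK
        product_id INTEGER REFERENCES product (product_id),  -- part 2, also an FK
        quantity   INTEGER NOT NULL,
        PRIMARY KEY (order_id, product_id)                   -- composite key
    );
    INSERT INTO orders VALUES (1), (2);
    INSERT INTO product VALUES (100, 'Size 4 Widget'), (200, '4D Bolt');
    INSERT INTO line_item VALUES (1, 100, 5), (1, 200, 20), (2, 100, 1);
    """
)

# One order involves many products ...
products_in_order_1 = [row[0] for row in conn.execute(
    "SELECT p.name FROM line_item AS li "
    "JOIN product AS p ON p.product_id = li.product_id "
    "WHERE li.order_id = 1 ORDER BY p.product_id"
)]

# ... and one product is involved in many orders.
orders_with_widget = [row[0] for row in conn.execute(
    "SELECT order_id FROM line_item WHERE product_id = 100 ORDER BY order_id"
)]
print(products_in_order_1, orders_with_widget)
```

A surrogate key such as RecordNo could replace the composite primary key; the two foreign key columns would stay.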
Composite Key
[Figure: ORDER (OrderID as primary key) and PRODUCT (ProductID as primary key) in an M:N relationship, resolved by the junction table LINE_ITEM (RecordNo, OrderID, ProductID), where OrderID-ProductID forms the composite key.]
Database Integrity
▪ Entity integrity: an identifier (primary key) must be unique so that it identifies a specific member of the entity.
▪ Referential integrity: a foreign key value in the many-side table must match a primary key value in the one-side table (create an ORDER only for an existing CUSTOMER; a customer must be added before we do business with him/her).
▪ Domain integrity: a field value must fall within the defined range/type; a value outside it is an error.
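A DBMS can enforce entity and referential integrity itself. The sketch below shows SQLite, through Python's `sqlite3` module, refusing a duplicate primary key and an order for a nonexistent customer; the tables, the helper `is_rejected`, and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
conn.executescript(
    """
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, cust_name TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        cust_id  INTEGER NOT NULL REFERENCES customer (cust_id)
    );
    INSERT INTO customer VALUES (1, 'Acme Mfg.');
    """
)


def is_rejected(sql):
    """Return True if the DBMS refuses the statement with an integrity error."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Entity integrity: customer 1 already exists.
duplicate_key_rejected = is_rejected("INSERT INTO customer VALUES (1, 'First Corp.')")
# Referential integrity: there is no customer 99 to order for.
orphan_order_rejected = is_rejected("INSERT INTO orders VALUES (10, 99)")
# A valid order for an existing customer is accepted.
valid_order_rejected = is_rejected("INSERT INTO orders VALUES (11, 1)")

print(duplicate_key_rejected, orphan_order_rejected, valid_order_rejected)
```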
Data Dictionary

EMPLOYEE
Attributes     Types      Size           Description             Authorization
EmpID          Numeric    6              Identifier              HR Manager
EmpFirstName   Text       10             Employee First Name     HR Manager
EmpLastName    Text       10             Employee Last Name      HR Manager
Address        Text       50             Employee Address        HR Manager
City           Text       10             Employee City           HR Manager
State          Text       2              Employee State          HR Manager
Zip            Text       XXXXX          Employee Zip Code       HR Manager
Phone          Text       XXX-XXX-XXXX   Employee Phone          HR Manager
Date Hired     Date       MM/DD/YY       Date Employee Hired     HR Manager
Position       Text       15             Position of Employee    HR Manager

EXPENSE
Attributes     Types      Size           Description             Authorization
EntryNumber    Numeric    6              Identifier              Project Manager
EntryDate      Date       MM/DD/YY       Date of Entry           Project Manager
HoursWorked    Numeric    2              Hours per Task          Project Manager
CostOfHotel    Currency   3              Fund Spent on Hotel     HR Clerk
CostOfTravel   Currency   3              Fund Spent on Travel    HR Clerk
CostOfMeals    Currency   3              Fund Spent on Food      HR Clerk
Approved       Y/N        1              Approved / Not Yet      Project Manager
From REAL Model . . .
Resources: Product, Cash
Events: Sales, Cash Collection
Agents: Salesperson, Customer, Cashier
From Logical Data Model
Entity-Relationship Diagram:
[Figure: Customer (Cust No) place Order (Order No) contain Product (Product No)]
Relational Data Model (Data Schema):
CUSTOMER (Cust No, ….)
ORDER (Order No, Cust No, ….)
PRODUCT (Product No, …)
ORDER-PRODUCT (Order No, Product No, …)
. . . to Physical Implementation with MS Access
Elements of Relational Databases
▪ Tables: the place where data are stored.
▪ Queries: tools that allow users to access the data stored in various tables and to transform data into information.
▪ Forms: onscreen presentations that allow users to view data in tables or collected by queries from one or more tables, and to input new data.
▪ Reports: printed lists and summaries of data stored in tables or collected by queries from one or more tables.
Elements of Relational Databases . . .
[Figure: a database front-end (Form Builder, Report Writer, Interactive Query Tool, Application Program) works through the Database Engine, which manages the Database and connects through a Database Gateway to other computer systems and other DBMS brands.]
Database Normalization
▪ Normalization: a technique for making complex databases more efficient and more easily handled by the DBMS
• Eliminates data redundancy
• Each entity stores information about one thing/object only
▪ The structure of tables must comply with several rules called normal forms. Normalization transforms data tables that are not in normal form into tables that comply with the rules.
▪ Failure to normalize results in anomalies: errors when adding, changing, or deleting data stored in the database.
Normalization
First normal form (1NF) – an entity whose attributes have no more than one value for a single instance of that entity.
• Any attributes that can have multiple values actually describe a separate entity, possibly an entity and a relationship.
Second normal form (2NF) – an entity whose non-primary-key attributes are dependent on the full primary key.
• Any nonkey attributes that are dependent on only part of the primary key should be moved to an entity where that partial key is actually the full key. This may require creating a new entity and relationship on the model.
Third normal form (3NF) – an entity whose non-primary-key attributes are not dependent on any other non-primary-key attributes.
• Any nonkey attributes that are dependent on other nonkey attributes must be moved or deleted. Again, new entities and relationships may have to be added to the data model.
Normalization in Plain English !!!
▪ First normal form (1NF):
• No repeating group of the same attribute (multi-valued attribute)
• If not: create a new entity/record for this group.
▪ Second normal form (2NF):
• Attributes should depend on the whole (composite) key, not part of it (no partial functional dependency).
• If not: create a new entity for these partially dependent attributes.
▪ Third normal form (3NF):
• Attributes should depend on the (primary) key only, not on another non-key attribute (no transitive dependency).
• If not: create a new entity for these transitively dependent attributes.
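The three steps can be walked through on a tiny data set. This is a sketch only: the order records, product names, and variable names are hypothetical, and plain Python dictionaries stand in for tables, with the dictionary key playing the role of the primary key.

```python
# Hypothetical unnormalized order records: "items" is a repeating group.
unnormalized = [
    {"order_id": 1, "cust_id": 7, "cust_name": "Acme Mfg.",
     "items": [(100, "Widget", 5), (200, "Bolt", 20)]},
    {"order_id": 2, "cust_id": 7, "cust_name": "Acme Mfg.",
     "items": [(100, "Widget", 1)]},
]

# 1NF: one row per (order_id, product_id); no multi-valued attributes remain.
first_nf = [
    (o["order_id"], o["cust_id"], o["cust_name"], pid, pname, qty)
    for o in unnormalized
    for (pid, pname, qty) in o["items"]
]

# 2NF: attributes that depend on only part of the composite key get their own table.
product = {pid: pname for (_, _, _, pid, pname, _) in first_nf}              # product_id only
orders_2nf = {oid: (cid, cname) for (oid, cid, cname, _, _, _) in first_nf}  # order_id only
line_item = {(oid, pid): qty for (oid, _, _, pid, _, qty) in first_nf}       # the full key

# 3NF: cust_name depends on cust_id, a non-key attribute of the order, so split it out.
customer = {cid: cname for (cid, cname) in orders_2nf.values()}
orders = {oid: cid for oid, (cid, _) in orders_2nf.items()}

print(customer, orders, product, line_item, sep="\n")
```

After the last step each fact is stored once: the customer name appears only in `customer`, and each product name only in `product`.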
Unnormalized Relation
Observation: Repeating groups / multi-value attributes !!!
Relation in 1NF
Observation: Attributes depend on a part of the key !!!
Relations in 2NF
Observation: Attributes depend
on a non-key attribute !!!
Relations in 3NF
Observation: Each table stores
data about one thing only.
Example of Relational Database
Example of Relational Database . . .
Data Warehouses for Data Mining
▪ Data warehousing: the use of IT/IS to collect, organize, integrate, and store entity-wide data, giving users easy access to large quantities of varied data from across the organization to improve decision-making capabilities.
▪ Data mart: a subset of a data warehouse that stores special-purpose data.
▪ Metadata: an index of the database: what data it holds, in what format, and where.
▪ Data mining: exploration, aggregation, and analysis of data in data warehouses using analytical tools and exploratory techniques.
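The aggregation step can be illustrated with a single GROUP BY query. This is a toy stand-in for a warehouse, using SQLite via Python's `sqlite3` module; the table `sales_fact`, its columns, and the figures are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales_fact (region TEXT, product TEXT, amount INTEGER);
    INSERT INTO sales_fact VALUES
        ('West', 'Widget', 500), ('West', 'Bolt', 200),
        ('East', 'Widget', 300), ('East', 'Widget', 100);
    """
)

# Aggregate event-level rows into one total per region.
sales_by_region = conn.execute(
    "SELECT region, SUM(amount) FROM sales_fact GROUP BY region ORDER BY region"
).fetchall()
print(sales_by_region)
```

Grouping by `product` instead of `region` would give a different view of the same stored events.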
Data Warehouse
Knowledge Management (KM)
▪ Explicit knowledge: anything that can be documented, archived, or codified, often with the help of information systems
▪ Tacit knowledge: the processes and procedures for effectively performing a particular task, stored in a person's mind
▪ Knowledge assets: all underlying skills, routines, practices, principles, formulas, methods, heuristics, and intuitions, whether explicit or tacit
▪ Knowledge management (KM): the process an organization uses to gain the greatest value from its knowledge assets
Decision Aids
▪ Decision aids: information systems that help decision makers with aggregate information, what-if analyses, and so on.
▪ Includes:
• Decision Support Systems
• Executive Information Systems
• Expert Systems
• Intelligent Agents
Decision Support Systems (DSS)
▪ Decision support systems (DSS): information systems that assist managers with unstructured decisions by retrieving data and generating information.
• Possess interactive capabilities (what-if analyses).
• Can answer ad hoc inquiries.
• Provide data modeling facilities.
▪ Can imitate human decision making (i.e., artificial intelligence) when confronting complex and ambiguous situations (tacit knowledge, underlying nonlinear relationships from historical data).
Executive Information Systems (EIS)
▪ Executive Information Systems (EIS) / Executive Support Systems (ESS): information systems, often considered a subset of DSS, that combine information from the organization and the environment, organize and analyze the information, and present it to the manager in an aggregate form to assist decision making.
Group Support Systems (GSS)
▪ Group Support Systems (GSS) / Group Decision Support Systems (GDSS): computer-based systems that support collaborative intellectual work such as idea generation, elaboration, analysis, synthesis, information sharing, and decision making.
▪ Support brainstorming (a method for freely and creatively generating as many ideas as possible without undue regard for their practicality or realism).
Expert Systems (ES) and Neural Networks (NN)
▪ Expert Systems (ES): decision support systems for complex decisions where consistency is desirable and the aim is to minimize decision time while maximizing quality. They emulate the problem-solving techniques of human experts.
▪ Neural Networks (NN): computer hardware and software systems that mimic the human brain’s ability to recognize patterns or predict outcomes using less-than-complete information.
Intelligent Agents (IA)
▪ Intelligent Agent (IA): a software program that may be integrated into DSS or other software tools (such as word processing, spreadsheet, or database packages).
▪ Once set in motion, these so-called “bots,” or “robots,” continue to perform their tasks without further direction from the user.
Business Intelligence (BI)
▪ Business intelligence (BI): uses state-of-the-art information technologies for storing and analyzing data to help managers make the best possible decisions for their companies.
▪ BI systems are specifically designed to support managers in making tactical and strategic decisions.
▪ BI is often installed into an existing ERP as an additional module.