lec 14 IT

INFORMATION AND DATABASES
Part 2
Entities: Identification
A key is an attribute, or a group of attributes,
that assumes a unique value for each entity
member (Student ID, SSN, Driver License).
• Why are First Name and Last Name NOT valid keys?
(Two members can share the same name, so a name does not
take a unique value for each member.)
• A group of attributes that together uniquely identify a
member of an entity is called a composite key.
• A secondary key is an attribute whose values
divide all entity members into useful
subgroups (Major, Gender, etc.).
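The key definitions above can be checked mechanically. The following is a minimal sketch, assuming a small invented student list; the function names `is_unique_key` and `group_by_secondary_key` and the sample rows are hypothetical and not part of the lecture.

```python
from collections import Counter

def is_unique_key(rows, attributes):
    """Return True if the attribute(s) take a distinct value for every row."""
    values = [tuple(row[a] for a in attributes) for row in rows]
    return all(count == 1 for count in Counter(values).values())

def group_by_secondary_key(rows, attribute):
    """Divide members into subgroups by a secondary key such as Major."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row["StudentID"])
    return groups

# Hypothetical sample data, invented for illustration only.
students = [
    {"StudentID": 1001, "FirstName": "Ann", "LastName": "Lee", "Major": "MIS"},
    {"StudentID": 1002, "FirstName": "Ann", "LastName": "Lee", "Major": "FIN"},
    {"StudentID": 1003, "FirstName": "Bob", "LastName": "Kim", "Major": "MIS"},
]

print(is_unique_key(students, ["StudentID"]))              # True
print(is_unique_key(students, ["FirstName", "LastName"]))  # False
print(group_by_secondary_key(students, "Major"))
```

Two students named Ann Lee are enough to disqualify the name pair as a key, while Major is useful only for grouping.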
Relationships: Degree
• Degree of Relationship defines how many entities are
involved in a relationship (according to a business
rule):
• Recursive (Unary), Binary, Ternary
• May carry specific data on the relationship
Relationships: Degree...
• Recursive Relationship: members in the same entity
have relationship with each other (one another)
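[Figure: two recursive relationships.
INDIVIDUAL (ID, Name) -- Marry (Date) -- INDIVIDUAL: 1:1, cardinalities (0,1) and (0,1).
STUDENT (StudentID, StudentName) -- Be Friend -- STUDENT: M:N, cardinalities (0,M) and (0,N).]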
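One possible way to store the two recursive relationships is sketched below with Python's built-in `sqlite3` module. The column names `SpouseID` and `FriendID`, the table name `BE_FRIEND`, and all sample rows are assumptions made for illustration; the slides only name the entities and their attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE INDIVIDUAL (
    ID        INTEGER PRIMARY KEY,
    Name      TEXT NOT NULL,
    -- Recursive 1:1: optional, and UNIQUE so at most one spouse points here
    SpouseID  INTEGER UNIQUE REFERENCES INDIVIDUAL(ID),
    MarryDate TEXT
);
CREATE TABLE STUDENT (
    StudentID   INTEGER PRIMARY KEY,
    StudentName TEXT NOT NULL
);
-- Recursive M:N: both columns refer back to STUDENT
CREATE TABLE BE_FRIEND (
    StudentID INTEGER NOT NULL REFERENCES STUDENT(StudentID),
    FriendID  INTEGER NOT NULL REFERENCES STUDENT(StudentID),
    PRIMARY KEY (StudentID, FriendID)
);
""")

conn.executemany("INSERT INTO INDIVIDUAL (ID, Name) VALUES (?, ?)",
                 [(1, "Pat"), (2, "Sam"), (3, "Lou")])
conn.execute("UPDATE INDIVIDUAL SET SpouseID = 2, MarryDate = '2020-06-01' WHERE ID = 1")
conn.execute("UPDATE INDIVIDUAL SET SpouseID = 1, MarryDate = '2020-06-01' WHERE ID = 2")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?)",
                 [(10, "Ann"), (11, "Bob"), (12, "Cy")])
conn.executemany("INSERT INTO BE_FRIEND VALUES (?, ?)",
                 [(10, 11), (10, 12), (11, 12)])

# Self-join: the same table plays both roles in the relationship.
married = conn.execute("""
    SELECT a.Name, b.Name
    FROM INDIVIDUAL a JOIN INDIVIDUAL b ON a.SpouseID = b.ID
    ORDER BY a.ID
""").fetchall()
friends_of_ann = conn.execute("""
    SELECT s.StudentName
    FROM BE_FRIEND f JOIN STUDENT s ON s.StudentID = f.FriendID
    WHERE f.StudentID = 10
    ORDER BY s.StudentID
""").fetchall()
print(married, friends_of_ann)
```

Lou has no spouse, which the (0,1) minimum of zero allows.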
Relationships: Degree ...
• Binary Relationship
[Figure: EMPLOYEE (Emp_ID, Emp_Name, Emp_Title) 1 (1,1) -- Lead (Date) -- M (0,M) PROJECT (Project_ID, Proj_Name, Proj_Due)]
Relationships: Degree ...
• Ternary relationship
[Figure: Assign (Date) connects three entities:
EMPLOYEE (EmpID, Emp_Name, Emp_Title), M, (1,M)
PROJECT (ProjectID, Proj_Name, Proj_Due), N, (1,N)
TASK (TaskID, TaskName), P, (1,P)]
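A ternary relationship can be sketched as one table that references all three entities at once. This is a minimal illustration using `sqlite3`; the table name `ASSIGN`, the column `AssignDate`, and the sample rows are assumptions, not the lecture's implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE EMPLOYEE (EmpID INTEGER PRIMARY KEY, Emp_Name TEXT, Emp_Title TEXT);
CREATE TABLE PROJECT  (ProjectID INTEGER PRIMARY KEY, Proj_Name TEXT, Proj_Due TEXT);
CREATE TABLE TASK     (TaskID INTEGER PRIMARY KEY, TaskName TEXT);
-- Each row ties one employee, one project, and one task together.
CREATE TABLE ASSIGN (
    EmpID      INTEGER NOT NULL REFERENCES EMPLOYEE(EmpID),
    ProjectID  INTEGER NOT NULL REFERENCES PROJECT(ProjectID),
    TaskID     INTEGER NOT NULL REFERENCES TASK(TaskID),
    AssignDate TEXT,
    PRIMARY KEY (EmpID, ProjectID, TaskID)
);
INSERT INTO EMPLOYEE VALUES (1, 'Ann', 'Analyst'), (2, 'Bob', 'Developer');
INSERT INTO PROJECT  VALUES (100, 'Payroll', '2025-12-31');
INSERT INTO TASK     VALUES (7, 'Design'), (8, 'Testing');
INSERT INTO ASSIGN   VALUES (1, 100, 7, '2025-01-15'), (2, 100, 8, '2025-02-01');
""")

assignments = conn.execute("""
    SELECT e.Emp_Name, p.Proj_Name, t.TaskName
    FROM ASSIGN a
    JOIN EMPLOYEE e ON e.EmpID = a.EmpID
    JOIN PROJECT  p ON p.ProjectID = a.ProjectID
    JOIN TASK     t ON t.TaskID = a.TaskID
    ORDER BY e.EmpID
""").fetchall()
print(assignments)
```

The assignment date belongs to the relationship itself, not to any single entity, which is why it sits in the `ASSIGN` table.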
Relationships: Cardinalities
• Cardinalities document how many members of one
entity can relate to a single member of another entity in
a relationship.
• Max / Min number of members
• Reflect business policies or general business
practices (e.g., how many classes a student can take,
how many students a class can hold).
[Figure: STUDENT M (25,40) -- Enroll -- N (1,5) CLASS; a class holds 25 to 40 students, and a student takes 1 to 5 classes]
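The minimum and maximum limits can be checked against actual enrollment data. The sketch below assumes enrollments arrive as (student, class) pairs and reads the limits as 1 to 5 classes per student and 25 to 40 students per class; the function name and sample data are hypothetical.

```python
from collections import Counter

def check_cardinality(enrollments, student_range=(1, 5), class_range=(25, 40)):
    """Return (students, classes) whose counts fall outside the (min, max) limits.

    Only members that appear in at least one pair are counted, so a student
    with zero classes would need a separate check against the full roster.
    """
    per_student = Counter(student for student, _ in enrollments)
    per_class = Counter(cls for _, cls in enrollments)
    bad_students = sorted(s for s, n in per_student.items()
                          if not student_range[0] <= n <= student_range[1])
    bad_classes = sorted(c for c, n in per_class.items()
                         if not class_range[0] <= n <= class_range[1])
    return bad_students, bad_classes

# Hypothetical data: IT101 has 25 students, IT102 has only 3.
enrollments = [(f"S{i}", "IT101") for i in range(25)]
enrollments += [("S0", "IT102"), ("S1", "IT102"), ("S2", "IT102")]

print(check_cardinality(enrollments))
```

Here every student is within the 1 to 5 limit, but IT102 falls below the class minimum of 25.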
One-to-One
• One-to-One (1:1) – A relationship between two
entities in which an instance of entity A can be related
to only one instance of entity B and entity B can be
related to only one instance of entity A
[Figure: Sales 1 (1,1) -- Pay -- 1 (1,1) Cash Collections. Ex: Cash Sales]
One-to-Many
• One-to-Many (1:M) – A relationship between two
entities in which an instance of entity A can be
related to zero, one, or more instances of entity B,
and an instance of entity B can be related to only
one instance of entity A
[Figure, Ex 1: Sales 1 (1,1) -- Pay -- M (1,M) Cash Collections. Installment Payments]
[Figure, Ex 2: Sales M (1,M) -- Pay -- 1 (1,1) Cash Collections. Pay many credit purchases in full]
Many-to-Many
• Many-to-Many (M:N) – A relationship between
two entities in which an instance of entity A can be
related to zero, one, or more instances of entity B
and entity B can be related to zero, one, or more
instances of entity A
[Figure: Sales M (1,M) -- Pay -- N (1,N) Cash Collections. Ex: Pay credit purchases with partial payments over some months]
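The three cardinality patterns can be told apart by looking at which side ever pairs with more than one member of the other side. The sketch below is illustrative only: `relationship_type` is a hypothetical helper, and observed data can only show the pattern seen so far, whereas the real cardinality comes from the business rule.

```python
def relationship_type(pairs):
    """Classify observed (sale, collection) pairs as '1:1', '1:M', or 'M:N'."""
    a_to_b, b_to_a = {}, {}
    for a, b in pairs:
        a_to_b.setdefault(a, set()).add(b)
        b_to_a.setdefault(b, set()).add(a)
    a_has_many = any(len(bs) > 1 for bs in a_to_b.values())
    b_has_many = any(len(as_) > 1 for as_ in b_to_a.values())
    if a_has_many and b_has_many:
        return "M:N"
    if a_has_many or b_has_many:
        return "1:M"
    return "1:1"

# Hypothetical (Sale, Cash Collection) pairs matching the slide examples.
cash_sales = [("S1", "C1"), ("S2", "C2")]
installments = [("S1", "C1"), ("S1", "C2")]
pay_in_full = [("S1", "C1"), ("S2", "C1")]
partial_payments = [("S1", "C1"), ("S1", "C2"), ("S2", "C2")]

for name, pairs in [("cash sales", cash_sales), ("installments", installments),
                    ("pay in full", pay_in_full), ("partial", partial_payments)]:
    print(name, relationship_type(pairs))
```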
Data Modeling & DB Design
• Database Design
• Must be organized
• Few or no redundancies
• Data model: what information we need to keep and how the
pieces relate to one another
• Keys
• Primary key: for identification (Student ID, SSN)
• Combination primary key (Composite key)
• Secondary key: for grouping (major, gender)
• Foreign key: to link one table to another
Dealing with Many-to-Many Relationships
The relational data model cannot handle Many-to-Many
relationships directly
– It is limited to one-to-one and one-to-many relationships
– Many-to-many relationships need to be replaced with a
collection of one-to-many relationships (Cf # 63)
Composite Entities
• Composite entities - Entities that exist to represent the
M:N relationship between two other entities
• Example:
• There is a many-to-many relationship between an ITEM
and an ORDER
• An ORDER can contain many ITEM(s) and over time,
the same ITEM can appear on many ORDER(s)
Composite Entities
Entity-Relationship Diagram Model
Database Design
• Relational Data Model
• Primary key (PK): for record identification (Customer), (Order)
• Foreign key (FK): for 1:M relationship, on M-side (Orders) links to
1-side (the Customer who places Orders)
• Associative Table (Junction table) with Composite Key (CK) for M:N
relationships
Foreign Keys in Relational Database
• A foreign key (FK) in entity E1 (CustID in
ORDER) is the primary key of another entity E2
(CustID in CUSTOMER). It is used to
link the 1:M relationship between E2
and E1 (CUSTOMER and ORDER).
• The foreign key is placed on the many side
(a CUSTOMER has many ORDERS, therefore
ORDER carries CustID as FK to show which
Customer placed that Order)
Foreign Key
[Figure: 1:M relationship. CUSTOMER (CustomerID: primary key) 1 -- M ORDER (OrderID: primary key, CustomerID: foreign key)]
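The CUSTOMER and ORDER link can be sketched with `sqlite3` as follows. The `Name` column and the sample rows are assumptions added for illustration; `ORDER` is quoted because it is a reserved word in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE CUSTOMER (
    CustomerID INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL
);
-- The foreign key lives on the many side: each order names its customer.
CREATE TABLE "ORDER" (
    OrderID    INTEGER PRIMARY KEY,
    CustomerID INTEGER NOT NULL REFERENCES CUSTOMER(CustomerID)
);
INSERT INTO CUSTOMER VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO "ORDER" VALUES (500, 1), (501, 1), (502, 2);
""")

# Following the foreign key back to the one side with a join.
orders_per_customer = conn.execute("""
    SELECT c.Name, COUNT(o.OrderID)
    FROM CUSTOMER c JOIN "ORDER" o ON o.CustomerID = c.CustomerID
    GROUP BY c.CustomerID
    ORDER BY c.CustomerID
""").fetchall()
print(orders_per_customer)
```

Nothing is stored on the CUSTOMER side about its orders; the relationship is recovered entirely from the foreign key values in ORDER.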
Foreign Keys in Relational Database. . .
• In an M:N relationship, an associative/junction
table with a composite key is used to
capture the relationship.
• An ORDER involves many PRODUCTS, and a PRODUCT is involved in
many ORDERS. The composite key ProductID-OrderID in LINE
ITEM indicates which product is involved in which sale.
• Each part of the composite key serves as a
foreign key.
• Sometimes a "surrogate" key (RecordNo) is
used as the primary key to simplify the identification
of records.
Composite Key
[Figure: M:N relationship. ORDER (OrderID: primary key) N -- M PRODUCT (ProductID: primary key), resolved through the junction table LINE_ITEM (RecordNo, OrderID, ProductID), where OrderID + ProductID form the composite key]
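A junction table of this shape can be sketched with `sqlite3`. This is one possible layout, assuming `RecordNo` is the surrogate primary key and the OrderID/ProductID pair is kept unique; the `Name` column, helper functions, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE "ORDER" (OrderID INTEGER PRIMARY KEY);
CREATE TABLE PRODUCT (ProductID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE LINE_ITEM (
    RecordNo  INTEGER PRIMARY KEY,                            -- surrogate key
    OrderID   INTEGER NOT NULL REFERENCES "ORDER"(OrderID),   -- acts as a foreign key
    ProductID INTEGER NOT NULL REFERENCES PRODUCT(ProductID), -- acts as a foreign key
    UNIQUE (OrderID, ProductID)                               -- the composite key
);
INSERT INTO "ORDER" VALUES (500), (501);
INSERT INTO PRODUCT VALUES (1, 'Pen'), (2, 'Ink');
INSERT INTO LINE_ITEM (OrderID, ProductID) VALUES (500, 1), (500, 2), (501, 2);
""")

def products_on_order(order_id):
    """One order, many products: the first 1:M side of the junction table."""
    rows = conn.execute("""
        SELECT p.Name FROM LINE_ITEM li
        JOIN PRODUCT p ON p.ProductID = li.ProductID
        WHERE li.OrderID = ? ORDER BY p.ProductID
    """, (order_id,))
    return [name for (name,) in rows]

def orders_with_product(product_id):
    """One product, many orders: the second 1:M side."""
    rows = conn.execute(
        "SELECT OrderID FROM LINE_ITEM WHERE ProductID = ? ORDER BY OrderID",
        (product_id,))
    return [order_id for (order_id,) in rows]

print(products_on_order(500), orders_with_product(2))
```

The M:N relationship is thus replaced by two 1:M relationships, ORDER to LINE_ITEM and PRODUCT to LINE_ITEM.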
Database Integrity
• Entity integrity: An identifier (primary key) must
be unique so that it identifies one specific member of the
entity.
• Referential integrity: A foreign key value in a
many-side table should match a primary key
value in the one-side table (create an ORDER
only for an existing CUSTOMER; that is, we have to
add a customer first before doing business
with him/her)
• Domain integrity: A field value must fall within
the allowed range and type; a value outside them is an error
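The three integrity rules can be demonstrated by letting the database reject bad rows. The sketch below uses `sqlite3`; the `Quantity` column, its 1 to 100 range, and the helper `violates` are assumptions chosen only to have a domain rule to break.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE CUSTOMER (CustID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE "ORDER" (
    OrderID  INTEGER PRIMARY KEY,
    CustID   INTEGER NOT NULL REFERENCES CUSTOMER(CustID),
    Quantity INTEGER NOT NULL CHECK (Quantity BETWEEN 1 AND 100)
);
INSERT INTO CUSTOMER VALUES (1, 'Acme');
INSERT INTO "ORDER" VALUES (500, 1, 10);
""")

def violates(sql):
    """Return True if the statement is rejected by an integrity constraint."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# Entity integrity: CustID 1 already exists.
print(violates("INSERT INTO CUSTOMER VALUES (1, 'Duplicate')"))
# Referential integrity: customer 99 does not exist.
print(violates('INSERT INTO "ORDER" VALUES (501, 99, 5)'))
# Domain integrity: 500 is outside the allowed range.
print(violates('INSERT INTO "ORDER" VALUES (502, 1, 500)'))
# A valid row is accepted.
print(violates('INSERT INTO "ORDER" VALUES (503, 1, 5)'))
```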
Database Design …
• Data Dictionary
• Provides information about each attribute in the
database including:
- Name (What data is about?)
- Key (Is it a key or part of a key?)
- Data Type (date, alpha-numeric, numeric, etc.)
- Valid Value (the format or numbers allowed)
• Can be used to enforce Business Rules to prevent
illegal or illogical values from entering the database.
(e.g. who has authority to enter certain kinds of data;
can’t enter characters in numeric field …)
Database Design …
• Data Dictionary …
• Data type (especially data types of keys)
• Data size (especially data sizes of keys)
• Description (what for)
• Authorization (who can create/update)
Data Dictionary

EMPLOYEE
Attribute    | Type    | Size / Format | Description            | Authorization
EmpID        | Numeric | 6             | Identifier             | HR Manager
EmpFirstName | Text    | 10            | Employee First Name    | HR Manager
EmpLastName  | Text    | 10            | Employee Last Name     | HR Manager
Address      | Text    | 50            | Employee Address       | HR Manager
City         | Text    | 10            | Employee City          | HR Manager
State        | Text    | 2             | Employee State         | HR Manager
Zip          | Text    | XXXXX         | Employee Zip Code      | HR Manager
Phone        | Text    | XXX-XXX-XXXX  | Employee Phone Number  | HR Manager
Date Hired   | Date    | MM/DD/YY      | Date Employee Was Hired| HR Manager
Position     | Text    | 15            | Position of Employee   | HR Manager

EXPENSE
Attribute    | Type     | Size / Format | Description           | Authorization
EntryNumber  | Numeric  | 6             | Identifier            | Project Manager
EntryDate    | Date     | MM/DD/YY      | Date of Entry         | Project Manager
HoursWorked  | Numeric  | 2             | Hours per Task        | Project Manager
CostOfHotel  | Currency | 3             | Funds Spent on Hotel  | Project Manager
CostOfTravel | Currency | 3             | Funds Spent on Travel | Project Manager
CostOfMeals  | Currency | 3             | Funds Spent on Food   | Project Manager
Approved     | Y/N      | 1             | Approved / Not Yet    | Project Manager
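A data dictionary can drive validation directly. The sketch below encodes a few EMPLOYEE entries as a Python dictionary and checks a record against them. The regular expressions are an assumed reading of the XXXXX and XXX-XXX-XXXX formats as digits, and `validate` is a hypothetical helper, not a feature of any particular DBMS.

```python
import re

# A subset of the EMPLOYEE data dictionary; patterns are assumptions.
EMPLOYEE_DICTIONARY = {
    "EmpID":        {"type": "numeric", "size": 6},
    "EmpFirstName": {"type": "text", "size": 10},
    "State":        {"type": "text", "size": 2},
    "Zip":          {"type": "text", "pattern": r"\d{5}"},
    "Phone":        {"type": "text", "pattern": r"\d{3}-\d{3}-\d{4}"},
}

def validate(record, dictionary):
    """Return a list of field names whose values break a dictionary rule."""
    bad_fields = []
    for name, rule in dictionary.items():
        value = str(record.get(name, ""))
        if rule["type"] == "numeric" and not value.isdigit():
            bad_fields.append(name)      # characters in a numeric field
        elif "size" in rule and len(value) > rule["size"]:
            bad_fields.append(name)      # value longer than the declared size
        elif "pattern" in rule and not re.fullmatch(rule["pattern"], value):
            bad_fields.append(name)      # value does not match the format
    return bad_fields

good = {"EmpID": "100234", "EmpFirstName": "Ann", "State": "CA",
        "Zip": "90210", "Phone": "310-555-0100"}
bad = {"EmpID": "12A", "EmpFirstName": "Ann", "State": "Texas",
       "Zip": "1234", "Phone": "310-555-0100"}

print(validate(good, EMPLOYEE_DICTIONARY))
print(validate(bad, EMPLOYEE_DICTIONARY))
```

Keeping the rules in data rather than in code means a change to the dictionary changes the validation without touching the program.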
From Logical Data Model
Entity-Relationship Diagram:
[Figure: Customer (Cust No) 1 -- place -- M Order (Order No) M -- contain -- N Product (Product No)]
Relational Data Model:
CUSTOMER (Cust No, ….)
ORDER (Order No, Cust No, ….)
PRODUCT (Product No,…)
ORDER-PRODUCT (OrderNo, ProductNo, …)
…to “MS Access” Implementation.
Another Example: Enrollment
Entity-Relationship Diagram:
[Figure: Student M -- Enroll -- N Class N -- Assign -- M Instructor]
Relational Data Model:
STUDENT (Student ID, ….)
CLASS (Course ID, ….)
INSTRUCTOR (Instructor ID, …)
ENROLLMENT (Student ID, Course ID, …)
ASSIGNMENT (Course ID, Instructor ID, …)
Data Analysis with DBMS : Queries
• Structured Query Language (SQL)
• Query by Example (QBE)
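As a brief illustration of SQL, the sketch below runs two queries through Python's `sqlite3` module: one that filters rows and one that summarizes them. The table reuses a few EMPLOYEE attributes from the data dictionary; the rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (
    EmpID INTEGER PRIMARY KEY, EmpLastName TEXT, State TEXT, Position TEXT
);
INSERT INTO EMPLOYEE VALUES
    (1, 'Lee',  'CA', 'Analyst'),
    (2, 'Kim',  'CA', 'Manager'),
    (3, 'Diaz', 'NY', 'Analyst');
""")

# Selection: which employees hold a given position?
analysts = conn.execute(
    "SELECT EmpLastName FROM EMPLOYEE WHERE Position = ? ORDER BY EmpLastName",
    ("Analyst",),
).fetchall()

# Aggregation: how many employees per state?
by_state = conn.execute(
    "SELECT State, COUNT(*) FROM EMPLOYEE GROUP BY State ORDER BY State"
).fetchall()

print(analysts, by_state)
```

A QBE tool builds the same kind of query from a filled-in grid instead of typed SQL.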
How Organizations Get the
Most from Their Data
• Data Warehousing: A logical collection of information
gathered from many different operational databases
that supports business analysis activities and
decision-making tasks
• Integrating multiple large databases into a single
repository
• Queries, analysis, and processing
• Purpose: put key business information into the hands
of decision makers
How Organizations Get the
Most from Their Data …
• Data Marts
• Instead of one large data warehouse, many
organizations create multiple data marts.
• A data mart is a small data warehouse,
designed for the end-user needs in a strategic
business unit (SBU) or a department.
• Each contains a subset of the data: finance,
inventory, personnel
• Each data mart is customized for particular DSS
applications
DATA MARTS
Performing Business Analysis
with Data Marts
• Extraction, transformation, and loading
(ETL) – A process that extracts information
from internal and external databases,
transforms the information using a common
set of enterprise definitions, and loads the
information into a data warehouse
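The three ETL steps can be sketched in a few lines. Everything below is illustrative: the two source extracts, their field names, the common definition (customer name, amount, ISO date), and the `SALES_FACT` table are assumptions, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3
from datetime import datetime

# Extract: hypothetical rows from two sources with different conventions.
sales_east = [{"cust": "acme ", "amount": "1,200.50", "date": "01/15/2025"}]
sales_west = [{"customer_name": "Globex", "total": 300.0, "sold_on": "2025-01-20"}]

# Transform: map each source onto one common definition.
def transform_east(row):
    return (
        row["cust"].strip().title(),
        float(row["amount"].replace(",", "")),
        datetime.strptime(row["date"], "%m/%d/%Y").date().isoformat(),
    )

def transform_west(row):
    return (row["customer_name"].strip().title(), float(row["total"]), row["sold_on"])

# Load: write the unified rows into the warehouse table.
def run_etl(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS SALES_FACT (Customer TEXT, Amount REAL, SaleDate TEXT)"
    )
    rows = [transform_east(r) for r in sales_east]
    rows += [transform_west(r) for r in sales_west]
    conn.executemany("INSERT INTO SALES_FACT VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")
loaded = run_etl(conn)
warehouse_rows = conn.execute(
    "SELECT Customer, Amount, SaleDate FROM SALES_FACT ORDER BY SaleDate"
).fetchall()
print(loaded, warehouse_rows)
```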
How Organizations Get the
Most from Their Data …
• Data Mining
• Information on customers, products, markets, etc. from
historical data
• Drill down: from summary to more detailed data
• Sort and extract information
• Trends, correlations, forecasting, statistics
Data Warehousing
• Data warehouses are organized by business
dimension or subject.
• Data warehouses are multidimensional.
• Data warehouses are historical.
• Data warehouses use online analytical processing.
Data Warehouse Framework & Views
Benefits of Data Warehousing
• End users can access data quickly and easily via Web
browsers because the data are located in one place.
• End users can conduct extensive analysis with data in
ways that may not have been possible before.
• End users have a consolidated view of organizational
data.
Business Intelligence
• Improving the quality of business decisions has a
direct impact on costs and revenue
• BI enables business users to receive data for
analysis that is:
• Reliable
• Consistent
• Understandable
• Easily manipulated
Business Intelligence
BI Can Answer Tough Questions
Multidimensional Analysis
• Databases contain information in a series of two-
dimensional tables
• In a data warehouse and data mart, information is
multidimensional: it contains layers of columns
and rows
• Dimension – A particular attribute of information
• Cube – Common term for the representation of
multidimensional information
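A cube can be pictured as facts indexed by several dimensions at once, which can then be summarized along any subset of them. The sketch below assumes three dimensions (region, product, quarter) and invented sales amounts; `roll_up` is a hypothetical helper.

```python
from collections import defaultdict

DIMENSIONS = ("region", "product", "quarter")

# Hypothetical facts: one amount per (region, product, quarter) cell.
facts = [
    ("East", "Laptop", "Q1", 100),
    ("East", "Laptop", "Q2", 150),
    ("East", "Phone",  "Q1", 80),
    ("West", "Laptop", "Q1", 120),
]

def roll_up(facts, keep):
    """Sum amounts over the cube, keeping only the named dimensions."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for *dims, amount in facts:
        totals[tuple(dims[i] for i in positions)] += amount
    return dict(totals)

print(roll_up(facts, []))                     # grand total
print(roll_up(facts, ["region"]))             # summary by region
print(roll_up(facts, ["region", "product"]))  # drill down into each region
```

Moving from the region summary to region-by-product is a drill down from summary to more detailed data; going the other way is a roll up.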
Multidimensional Analysis
Cubes of Information
Information Cleansing
• Information cleansing / scrubbing – A process
that weeds out and fixes or discards inconsistent,
incorrect, or incomplete information
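A cleansing pass might look like the sketch below: it fixes inconsistent spacing and capitalization, and discards records that are incomplete, incorrect, or duplicated. The two-letter state rule, the field names, and the sample records are assumptions for illustration.

```python
def cleanse(records):
    """Return cleaned records, fixing what can be fixed and discarding the rest."""
    cleaned, seen = [], set()
    for rec in records:
        # Fix inconsistent formatting.
        name = (rec.get("name") or "").strip().title()
        state = (rec.get("state") or "").strip().upper()
        # Discard incomplete (no name) or incorrect (not a 2-letter state) records.
        if not name or len(state) != 2:
            continue
        # Discard duplicates that only differed in formatting.
        key = (name, state)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "state": state})
    return cleaned

# Hypothetical raw records.
raw = [
    {"name": " ann lee ", "state": "ca"},
    {"name": "Ann Lee", "state": "CA"},        # duplicate after fixing
    {"name": "", "state": "NY"},               # incomplete
    {"name": "Bob Kim", "state": "New York"},  # incorrect format
]
print(cleanse(raw))
```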
Uncovering Trends and Patterns
with Data Mining
• Data mining – The process of analyzing data to
extract information not offered by the raw data alone
• Data-mining tools – use a variety of techniques to
find patterns and relationships in large volumes of
information
• Classification
• Estimation
• Affinity grouping
• Clustering
Uncovering Trends and Patterns
with Data Mining . . .
• Structured data – Data already in a database or a
spreadsheet
• Unstructured data – Data that does not exist in a fixed
location and can include text documents, PDFs, voice
messages, emails
• Text mining – Analyzes unstructured data to find
trends and patterns in words and sentences
• Web mining – Analyzes unstructured data
associated with websites to identify consumer
behavior and website navigation
Uncovering Trends and Patterns
with Data Mining . . .
• Common forms of data-mining analysis
capabilities include
• Cluster analysis
• Association detection
• Statistical analysis
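Association detection can be illustrated with a simple count of which items appear together. The sketch below computes the support of each item pair (the share of baskets containing both) and keeps the pairs above a threshold. The baskets and the 0.5 threshold are invented, and real data-mining tools use more scalable algorithms than this brute-force count.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(baskets, min_support):
    """Return {pair: support} for item pairs found in enough baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort so each pair is always counted under the same ordering.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    total = len(baskets)
    return {pair: n / total for pair, n in counts.items() if n / total >= min_support}

# Hypothetical market baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "milk", "eggs"],
    ["milk", "eggs"],
    ["bread", "milk"],
]
print(frequent_pairs(baskets, min_support=0.5))
```

Bread and milk appear together in three of the four baskets, so that pair has the highest support.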
THANK YOU