Week11- Designing Data

DATA SCIENCE
MIS0855 | Spring 2016
Designing Data
SungYong Um
sungyong.um@temple.edu
Pivot Table Data Use a Flat File Structure
EmpNo  Ename    DeptNo  DeptName    Date      Expenses
101    Abigail  10      Marketing   Jan 2014  $1,000
102    Bob      20      Purchasing  Jan 2014  $500
103    Carolyn  10      Marketing   Jan 2014  $1,500
101    Abigail  10      Marketing   Feb 2014  $250
102    Bob      20      Purchasing  Feb 2014  $1,000
103    Carolyn  10      Marketing   Feb 2014  $900
101    Abigail  10      Marketing   Mar 2014  $400
102    Bob      20      Purchasing  Mar 2014  $1,750
103    Carolyn  10      Marketing   Mar 2014  $2,000
All related data is in the same column
This allows you to aggregate data by column value
EmpNo  Ename    DeptNo  DeptName    Date      Expenses
101    Abigail  10      Marketing   Jan 2014  $1,000
102    Bob      20      Purchasing  Jan 2014  $500
103    Carolyn  10      Marketing   Jan 2014  $1,500
101    Abigail  10      Marketing   Feb 2014  $250
102    Bob      20      Purchasing  Feb 2014  $1,000
103    Carolyn  10      Marketing   Feb 2014  $900
101    Abigail  10      Marketing   Mar 2014  $400
102    Bob      20      Purchasing  Mar 2014  $1,750
103    Carolyn  10      Marketing   Mar 2014  $2,000
How many people work in marketing?
How much did Abigail expense in January through March?
Who expensed the most in March?
Which month had the greatest amount expensed?
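The four questions above can each be answered by grouping or filtering on a single column. The following is a minimal sketch in plain Python, not a pivot table itself; the rows are copied from the flat file at the start of the deck (Abigail's March expense taken as $400), and the helper name `sum_by` is made up for illustration.

```python
from collections import defaultdict

# Flat-file rows: (EmpNo, Ename, DeptNo, DeptName, Date, Expenses in dollars)
ROWS = [
    (101, "Abigail", 10, "Marketing", "Jan 2014", 1000),
    (102, "Bob", 20, "Purchasing", "Jan 2014", 500),
    (103, "Carolyn", 10, "Marketing", "Jan 2014", 1500),
    (101, "Abigail", 10, "Marketing", "Feb 2014", 250),
    (102, "Bob", 20, "Purchasing", "Feb 2014", 1000),
    (103, "Carolyn", 10, "Marketing", "Feb 2014", 900),
    (101, "Abigail", 10, "Marketing", "Mar 2014", 400),
    (102, "Bob", 20, "Purchasing", "Mar 2014", 1750),
    (103, "Carolyn", 10, "Marketing", "Mar 2014", 2000),
]


def sum_by(rows, key_index, value_index=5):
    """Total the value column for each distinct value of the key column."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[value_index]
    return dict(totals)


# How many people work in marketing? Count distinct names, not rows.
marketing_staff = {row[1] for row in ROWS if row[3] == "Marketing"}

# How much did Abigail expense in January through March?
abigail_total = sum(row[5] for row in ROWS if row[1] == "Abigail")

# Who expensed the most in March?
march_rows = [row for row in ROWS if row[4] == "Mar 2014"]
march_by_employee = sum_by(march_rows, key_index=1)
top_march = max(march_by_employee, key=march_by_employee.get)

# Which month had the greatest amount expensed?
by_month = sum_by(ROWS, key_index=4)
top_month = max(by_month, key=by_month.get)

print(len(marketing_staff), abigail_total, top_march, top_month)
```

Because every related value sits in the same column, each answer needs only a filter or a group-by on that column.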
Relational Data Model
 Data Model: a formal way to express data relationships to a database
 Relational Data Model: a type of model that represents its information in the form of logically-related 2-D tables
 The most common, intuitive, de facto model
 consists of Entities, Attributes, and Relationships
 Entity-Relationship Diagram (ERD)
Entities and Attributes (1/2)
 Entity, aka table, is an object or event about which information is stored.
 Object: e.g. Customer, Product, Employee, Factory
 Event: e.g. Order, Registration, Contract, Payment
 A company is usually not an entity unless the database stores information about multiple companies.
 Attributes, aka fields or columns, are characteristics or properties of an entity.
 A customer (entity) can be described by customer number, name, address, and phone number (attributes).
Entities and Attributes (2/2)
 How to distinguish between Entity and Attribute?
 An Attribute is a characteristic or property of an Entity.
 An Attribute becomes a column of a Table.
Instances
 An entity consists of multiple Instances, aka records or rows.
[Figure: a table labeled with Entity (the table), Instance (a row), and Attribute (a column)]
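One way to picture the three terms in code: a class plays the role of the entity, its fields are the attributes, and each object is an instance. This is only an analogy sketched with a Python dataclass; the field names follow the customer example above, and the sample values are invented.

```python
from dataclasses import dataclass, fields


@dataclass
class Customer:            # entity (table)
    customer_number: int   # attributes (columns)
    name: str
    address: str
    phone_number: str


# Each object is one instance (record, row) of the entity. Values are made up.
customers = [
    Customer(1, "Abigail", "1 Example St", "555-0100"),
    Customer(2, "Bob", "2 Example Ave", "555-0101"),
]

attribute_names = [f.name for f in fields(Customer)]
print(attribute_names)
print(len(customers), "instances")
```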
Entity Identifier (Primary Key)
 Entity Identifier, aka primary key, is an attribute that ensures each instance has a unique value that distinguishes it from every other instance.
 Every entity must have an entity identifier.
 TUID #
 Social Security #
 Plate #
 other examples?
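A small sketch of what uniqueness means in practice, using Python's built-in `sqlite3` module. The `student` table, its columns, and the ID values are hypothetical; the point is only that the database refuses a second instance with the same primary key, while non-key attributes such as a name may repeat.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# tuid is the entity identifier (primary key); the table is hypothetical.
conn.execute("CREATE TABLE student (tuid TEXT PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO student VALUES ('900000001', 'Abigail')")

# Same name, different identifier: accepted, since names need not be unique.
conn.execute("INSERT INTO student VALUES ('900000002', 'Abigail')")

# Same identifier again: rejected by the primary key constraint.
try:
    conn.execute("INSERT INTO student VALUES ('900000001', 'Bob')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(duplicate_rejected, row_count)
```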
How to draw Entities and Attributes (1/2)
[Figure labels: Entity, Identifier]
How to draw Entities and Attributes (2/2)
[Figure: four entities with their attributes; the first attribute listed for each is its entity identifier]
Entity: Customer
Attributes: Customer Number, First Name, Last Name, Street, City, State, Zip Code, Phone Number, Credit Card #, Credit Card Exp.
Entity: Order
Attributes: Order Number, Customer Number, Order Date, Order Filled
Entity: Item
Attributes: Item Number, Title, Distributor #, Price, Release Date, Genre
Entity: Distributor
Attributes: Distributor #, Name, Street, City, State, Zip Code, Phone Number, Contact Name, Contact Phone
Example
SportTech Events puts on athletic events for local high school athletes. The company
needs a database designed to keep track of the sponsor for the event and where the
event is located.
Each event needs a description, date, and cost. Separate costs are negotiated for each
event. The company would also like to have a list of potential sponsors that includes each
sponsor’s contact information such as the name, phone number, and address.
Each event will have a single sponsor, but a particular sponsor may sponsor more than
one event. Each location will need an ID, contact person, and phone number. A particular
event will use only one location, but a location may be used for multiple events.
[Figure: entities and attributes for the example]
Event: ID, Description, Date, Cost
Sponsor: ID, Name, Phone, Address
Location: ID, Contact, Phone
Relationship
 An entity should be related to one or more other entities.
 Any relationship between the entities below?
One-to-One (1:1) Relationship
 A relationship between two entities in which
 an instance of entity A can be related to only one instance of entity B and
 an instance of entity B can be related to only one instance of entity A.
Name of relationship: usually a verb
One-to-Many (1:M) Relationship
 A relationship between two entities, in which
 an instance of entity A can be related to one or more instances of entity B and
 an instance of entity B can be related to only one instance of entity A.
Many-to-Many (M:N) Relationship
 A relationship between two entities, in which
 an instance of entity A can be related to one or more instances of entity B and
 an instance of entity B can be related to one or more instances of entity A.
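The three definitions above differ only in whether an instance on each side may be linked to more than one instance on the other side. The sketch below applies that test to a list of observed (A, B) pairs. The function name and the sample pairs are invented for illustration, and note the limitation: data can only show what has happened so far, while the true relationship type comes from the business rules.

```python
from collections import defaultdict


def classify_relationship(pairs):
    """Classify observed (a, b) instance pairs as '1:1', '1:M', 'M:1', or 'M:N'."""
    b_per_a = defaultdict(set)  # which B instances each A instance relates to
    a_per_b = defaultdict(set)  # which A instances each B instance relates to
    for a, b in pairs:
        b_per_a[a].add(b)
        a_per_b[b].add(a)

    a_has_many = any(len(bs) > 1 for bs in b_per_a.values())
    b_has_many = any(len(as_) > 1 for as_ in a_per_b.values())

    if a_has_many and b_has_many:
        return "M:N"
    if a_has_many:
        return "1:M"
    if b_has_many:
        return "M:1"
    return "1:1"


# Made-up sample data for each case.
person_passport = [("P1", "Passport1"), ("P2", "Passport2")]
sponsor_event = [("S1", "E1"), ("S1", "E2"), ("S2", "E3")]
student_course = [("Ann", "MIS"), ("Ann", "STAT"), ("Bob", "MIS")]

print(classify_relationship(person_passport))
print(classify_relationship(sponsor_event))
print(classify_relationship(student_course))
```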
[Figure: ERD for the SportTech example]
Event: ID, Description, Date, Cost
Sponsor: ID, Name, Phone, Address
Location: ID, Contact, Phone
Sponsor (1) -- Support -- (M) Event
Location (1) -- Host -- (M) Event
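One common way to store the Support and Host relationships is to put a reference to the "one" side (Sponsor, Location) on each row of the "many" side (Event). The sketch below assumes that approach and uses Python's built-in `sqlite3`; the column names `sponsor_id` and `location_id` and all sample values are hypothetical, not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks references only when enabled

conn.executescript("""
CREATE TABLE sponsor  (id INTEGER PRIMARY KEY, name TEXT, phone TEXT, address TEXT);
CREATE TABLE location (id INTEGER PRIMARY KEY, contact TEXT, phone TEXT);
CREATE TABLE event (
    id          INTEGER PRIMARY KEY,
    description TEXT,
    date        TEXT,
    cost        REAL,
    sponsor_id  INTEGER NOT NULL REFERENCES sponsor(id),   -- Support: one sponsor per event
    location_id INTEGER NOT NULL REFERENCES location(id)   -- Host: one location per event
);
""")

# Made-up sample rows: one sponsor and one location shared by two events.
conn.execute("INSERT INTO sponsor VALUES (1, 'Sample Sponsor', '555-0100', '1 Example St')")
conn.execute("INSERT INTO location VALUES (1, 'Sample Contact', '555-0101')")
conn.executemany(
    "INSERT INTO event VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Track meet", "2016-04-02", 500.0, 1, 1),
        (2, "Swim meet", "2016-05-07", 750.0, 1, 1),
    ],
)

events_per_sponsor = conn.execute(
    "SELECT COUNT(*) FROM event WHERE sponsor_id = 1"
).fetchone()[0]

# An event pointing at a sponsor that does not exist is rejected.
try:
    conn.execute("INSERT INTO event VALUES (3, 'Orphan', '2016-06-01', 100.0, 99, 1)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(events_per_sponsor, orphan_rejected)
```

Since each event row holds exactly one sponsor and one location, an event cannot have two of either, while nothing stops a sponsor or location from appearing on many event rows.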
Relationship Depends on Business Rules
 There is no single right answer for a relationship type. It depends on the business description and rules.
 If a customer can place only one order, and an order belongs to only one customer, it is a 1:1 relationship.
 If a customer can place multiple orders and an order can have multiple customers, it is M:N.
Database Normalization (1/2)
EmpNo  Ename    DeptNo  DeptName    Date      Expenses
101    Abigail  10      Marketing   Jan 2014  $1,000
102    Bob      20      Purchasing  Jan 2014  $500
103    Carolyn  10      Marketing   Jan 2014  $1,500
101    Abigail  10      Marketing   Feb 2014  $250
102    Bob      20      Purchasing  Feb 2014  $1,000
104    Kevin    30      R&D         Feb 2014  $900
101    Abigail  10      Marketing   Mar 2014  $400
102    Bob      20      Purchasing  Mar 2014  $1,750
103    Carolyn  10      Marketing   Mar 2014  $2,000
 There is too much redundant data (department names, employee names). What if there are millions of records?
 Normalization: a process to remove redundant data
Database Normalization (2/2)
Dept
DeptNo  DeptName
10      Marketing
20      Purchasing
30      R&D

Employee
EmpNo  Ename    DeptNo
101    Abigail  10
102    Bob      20
103    Carolyn  10
104    Kevin    30

Expense
EmpNo  Date      Expenses
101    Jan 2014  $1,000
102    Jan 2014  $500
103    Jan 2014  $1,500
101    Feb 2014  $250
102    Feb 2014  $1,000
104    Feb 2014  $900
101    Mar 2014  $400
102    Mar 2014  $1,750
103    Mar 2014  $2,000
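The split into three tables can be sketched in a few lines of plain Python. This is an illustration under simple assumptions (each EmpNo always maps to one name and department, each DeptNo to one department name), not a general normalization procedure; the variable names are made up. The last step rebuilds the flat file from the three tables to show that no information is lost.

```python
# Flat file from the Normalization (1/2) slide:
# (EmpNo, Ename, DeptNo, DeptName, Date, Expenses in dollars)
FLAT = [
    (101, "Abigail", 10, "Marketing", "Jan 2014", 1000),
    (102, "Bob", 20, "Purchasing", "Jan 2014", 500),
    (103, "Carolyn", 10, "Marketing", "Jan 2014", 1500),
    (101, "Abigail", 10, "Marketing", "Feb 2014", 250),
    (102, "Bob", 20, "Purchasing", "Feb 2014", 1000),
    (104, "Kevin", 30, "R&D", "Feb 2014", 900),
    (101, "Abigail", 10, "Marketing", "Mar 2014", 400),
    (102, "Bob", 20, "Purchasing", "Mar 2014", 1750),
    (103, "Carolyn", 10, "Marketing", "Mar 2014", 2000),
]

dept = {}       # DeptNo -> DeptName
employee = {}   # EmpNo  -> (Ename, DeptNo)
expense = []    # (EmpNo, Date, Expenses)

for emp_no, ename, dept_no, dept_name, date, amount in FLAT:
    dept[dept_no] = dept_name            # each department name stored once
    employee[emp_no] = (ename, dept_no)  # each employee name stored once
    expense.append((emp_no, date, amount))

# Rebuild the flat file by following EmpNo -> Employee and DeptNo -> Dept.
rebuilt = []
for emp_no, date, amount in expense:
    ename, dept_no = employee[emp_no]
    rebuilt.append((emp_no, ename, dept_no, dept[dept_no], date, amount))

print(len(dept), len(employee), len(expense))
print(rebuilt == FLAT)
```

Names and department names now appear once per employee or department rather than once per expense row, which is where the savings come from as the number of records grows.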
[Figure: ERD for the normalized tables]
Employee: EmpNo, Ename
Dept: DeptNo, Dept Name
Expense: ExpID, Date, Expense
Employee (M) -- Belong to -- (1) Dept
Employee (1) -- Make -- (M) Expense