Logical Database Design

Chapter 4
G. Green
Agenda
• Chapter 1 pgs 25 – 28
• Chapter 9 pgs 409 – 418
• Relational Database Model
• Transforming ERDs into Relations
• Evolution of Data Models
• Referential Integrity
• Normalization
The Evolution of Data Models
• Hierarchical
• Network
• Relational
• Object oriented
• Multi-dimensional
• NoSQL
Relational Data Model
• Developed by E. F. Codd (IBM) in 1970
• Represents data in the form of tables
• Based on mathematical theory (set theory and first-order predicate logic)
• 3 components:
  › relational database structure
  › relational rules (integrity)
  › relational operations (manipulation)
Relational Data Model
• Advantages
  › Improved conceptual simplicity
  › Easier database design, implementation, management, and use
  › Structural independence
  › Ad hoc query capability
  › Mathematical foundation
• Disadvantages
  › Hardware and system software overhead
  › Can facilitate poor design and implementation
  › May promote “islands of information” problems
Relational Theory Components
1. Relational Database Structures
2. Rules of Relations
3. Relational Operators
1. Relational Database Structure
• Equivalent terms:
  › Tables, Rows, Columns
  › Files, Records, Fields
  › Relations, Tuples, Attributes
• Primary key must be designated
• Foreign keys must be designated for relationships

CLASS TABLE
CRN (PK) | CourseNo | SecNo | RoomNo (FK) | Days
13109    | MIS1305  | 02    | HCB229      | TTh
15225    | MKT2307  | 05    | HCB229      | MWF
13206    | MIS3305  | 01    | HSB210      | MWF

ROOM TABLE
RoomNo (PK) | Owning Dept | # of Seats | DeskPCs
ROG111      | ECS         | 30         | Y
HSB210      | MIS         | 50         | N
HCB229      | MIS         | 24         | N
2. Rules of Relations
• Relation names must be unique
• Entries in columns are atomic (single-valued)
• Entries in a column are from the same domain
• Each row is unique
• Ordering of rows and columns is insignificant

CLASS Table
CRN (PK) | CourseNo | SecNo | RoomNo (FK) | Days
13109    | MIS1305  | 02    | HCB229      | TTh
15225    | MKT2307  | 05    | HCB229      | MWF
13206    | MIS3305  | 01    | HSB210      | MWF
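Several of these rules can be checked mechanically on sample data. The sketch below is a minimal illustration, assuming a table is represented as a header tuple plus a list of row tuples and approximating a "domain" by a Python type; the names `is_atomic` and `check_relation` are hypothetical. It checks atomic entries, one domain per column, and unique rows; unique relation names and insignificant ordering are not checked.

```python
# Sketch: check sample rows against some of the rules of relations.
# Assumption: a table is a header tuple plus a list of row tuples.

def is_atomic(value):
    """Treat Python collections as multi-valued (non-atomic) entries."""
    return not isinstance(value, (list, tuple, set, frozenset, dict))


def check_relation(header, rows):
    """Return a list of rule violations found in the sample rows."""
    problems = []
    # Rule: entries in columns are atomic (single-valued).
    for i, row in enumerate(rows):
        for column, value in zip(header, row):
            if not is_atomic(value):
                problems.append(f"row {i}, column {column}: value is not atomic")
    # Rule: entries in a column come from the same domain
    # (approximated here as "all values in the column share one Python type").
    for j, column in enumerate(header):
        types = {type(row[j]).__name__ for row in rows}
        if len(types) > 1:
            problems.append(f"column {column}: mixed domains {sorted(types)}")
    # Rule: each row is unique (list scan, since rows may hold unhashable values).
    seen = []
    for row in rows:
        if row in seen:
            problems.append(f"duplicate row {row}")
        else:
            seen.append(row)
    return problems


CLASS_HEADER = ("CRN", "CourseNo", "SecNo", "RoomNo", "Days")
class_rows = [
    (13109, "MIS1305", "02", "HCB229", "TTh"),
    (15225, "MKT2307", "05", "HCB229", "MWF"),
    (13206, "MIS3305", "01", "HSB210", "MWF"),
]
print(check_relation(CLASS_HEADER, class_rows))  # []

bad_rows = class_rows + [
    (13109, "MIS1305", "02", "HCB229", "TTh"),       # duplicate row
    (14001, "MIS1305", "03", "HCB229", ["M", "W"]),  # hypothetical multi-valued Days
]
for problem in check_relation(CLASS_HEADER, bad_rows):
    print(problem)
```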
2. Rules of Relations, cont…
• Data in tables should be added, updated, and deleted without errors
  › avoid inconsistency ==> referential integrity
    • insertion
    • update
    • deletion
  › avoid anomalies ==> normalization
    • insertion
    • update
    • deletion
3. Relational Operators
• Relational Algebra*
  › UNION (+)
  › INTERSECTION
  › DIFFERENCE (-)
  › PRODUCT (x)
  › SELECT (chooses tuples)
  › PROJECT (chooses attributes)
  › JOIN (a PRODUCT followed by SELECT and PROJECT)
[Diagram: the relational algebra operators]
*Diagram adapted from Hyperion presentation, http://infolab.stanford.edu/infoseminar/Archive/FallY99/russakovskii-slides/sld001.htm
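As a rough illustration of how these operators compose, here is a small Python sketch over sets of tuples. The representation (a header tuple plus a frozenset of rows) and all function names are assumptions made for illustration; note that `join` is literally built from PRODUCT, SELECT, and PROJECT.

```python
from itertools import product as cartesian_product

# Sketch of the relational algebra operators.
# Assumption: a relation is a (header, rows) pair, where header is a tuple of
# attribute names and rows is a frozenset of tuples (rows unique and unordered).


def _require_compatible(r, s):
    if r[0] != s[0]:
        raise ValueError("relations are not union-compatible")


def union(r, s):
    _require_compatible(r, s)
    return (r[0], r[1] | s[1])


def intersection(r, s):
    _require_compatible(r, s)
    return (r[0], r[1] & s[1])


def difference(r, s):
    _require_compatible(r, s)
    return (r[0], r[1] - s[1])


def product(r, s):
    """Every row of r paired with every row of s."""
    return (r[0] + s[0], frozenset(a + b for a, b in cartesian_product(r[1], s[1])))


def select(r, predicate):
    """Keep the tuples (rows) for which predicate(row_as_dict) is true."""
    header, rows = r
    return (header, frozenset(t for t in rows if predicate(dict(zip(header, t)))))


def project(r, attributes):
    """Keep only the named attributes (columns); duplicate rows collapse."""
    header, rows = r
    idx = [header.index(a) for a in attributes]
    return (tuple(attributes), frozenset(tuple(t[i] for i in idx) for t in rows))


def rename(r, mapping):
    return (tuple(mapping.get(a, a) for a in r[0]), r[1])


def join(r, s, attribute):
    """Equi-join on a shared attribute, built from PRODUCT, SELECT, and PROJECT."""
    other = "s." + attribute  # temporary name so the two columns don't clash
    combined = product(r, rename(s, {attribute: other}))
    matched = select(combined, lambda t: t[attribute] == t[other])
    keep = r[0] + tuple(a for a in s[0] if a != attribute)
    return project(matched, keep)


CLASS = (("CRN", "CourseNo", "RoomNo"), frozenset({
    (13109, "MIS1305", "HCB229"),
    (15225, "MKT2307", "HCB229"),
    (13206, "MIS3305", "HSB210"),
}))
ROOM = (("RoomNo", "Seats"), frozenset({("ROG111", 30), ("HSB210", 50), ("HCB229", 24)}))

header, rows = join(CLASS, ROOM, "RoomNo")
print(header)  # ('CRN', 'CourseNo', 'RoomNo', 'Seats')
for row in sorted(rows):
    print(row)
```

Because rows live in a frozenset, duplicate rows collapse and row order carries no meaning, which matches the rules of relations.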
Converting ERD to Relational Model
• Represent entities as relations
• Represent relationships as either:
  › foreign keys in relations
  › new relations
• Provide sample data
• Normalize relations
Representing Entities as Tables
• Each entity is converted to a relational table
• Example entity instances are rows of the table
• Attributes become columns
• Primary key must be designated
  › regular entities have atomic keys
  › associative entities have composite keys
  › subtype entities have the same key as the supertype
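A minimal SQL sketch of two of these key rules, run through Python's built-in `sqlite3` module. The EMPLOYEE and HOURLY_EMPLOYEE tables and their columns are hypothetical, chosen only to show a regular entity with a single-attribute key and a subtype that reuses the supertype's key (which is then also a foreign key to the supertype).

```python
import sqlite3

# Sketch: a supertype and a subtype that share the same primary key.
# EMPLOYEE / HOURLY_EMPLOYEE and their columns are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on
conn.executescript("""
CREATE TABLE employee (
    emp_id   INTEGER PRIMARY KEY,   -- regular entity: single-attribute (atomic) key
    emp_name TEXT NOT NULL
);
CREATE TABLE hourly_employee (
    emp_id      INTEGER PRIMARY KEY REFERENCES employee(emp_id),  -- same key as supertype, also an FK
    hourly_rate REAL NOT NULL
);
""")
conn.execute("INSERT INTO employee VALUES (1, 'Ada')")
conn.execute("INSERT INTO hourly_employee VALUES (1, 18.5)")

# Joining on the shared key reassembles the full subtype instance.
row = conn.execute("""
    SELECT e.emp_id, e.emp_name, h.hourly_rate
    FROM employee AS e JOIN hourly_employee AS h ON h.emp_id = e.emp_id
""").fetchone()
print(row)  # (1, 'Ada', 18.5)

# A subtype row whose key has no matching supertype row is rejected.
try:
    conn.execute("INSERT INTO hourly_employee VALUES (2, 20.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)  # True
```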
ERD Example Problem Revisited
• A company sells products to customers
• Customer requests generate orders
• Orders may consist of many ordered products
• Products may be contained on many orders, or no orders at all
ERD Example Converted to Tables
[Figure: the example ERD converted to relational tables]
Representing Relationships
• 1:1
  › merge attributes into a single table, OR
  › create a foreign key (FK) in either relation
• 1:M
  › create a foreign key (FK) in the relation on the “many” side of the relationship
• M:M
  › should’ve been eliminated on the ERD!!!
  › create a new relation with the PKs of the related entities as (1) a concatenated PK, and (2) FKs in the new relation
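One possible translation of the customer/order/product example into SQL tables, again via `sqlite3`, showing the 1:M and M:M rules. The table and column names (`customer`, `orders`, `product`, `ordered_product`, `quantity`, and so on) and the data are assumptions and may not match the tables in the slide's figure.

```python
import sqlite3

# Sketch: relationship-mapping rules applied to a hypothetical version
# of the customer/order/product example.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL
);
-- 1:M (a customer places many orders): FK on the "many" side.
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    order_date  TEXT NOT NULL,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);
CREATE TABLE product (
    product_id  INTEGER PRIMARY KEY,
    description TEXT NOT NULL
);
-- M:M (orders contain many products and vice versa): a new relation whose
-- concatenated PK is made of the related PKs, each of which is also an FK.
CREATE TABLE ordered_product (
    order_id   INTEGER NOT NULL REFERENCES orders(order_id),
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    quantity   INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
""")
conn.execute("INSERT INTO customer VALUES (1, 'Acme Corp')")
conn.execute("INSERT INTO product VALUES (10, 'Widget'), (11, 'Gadget'), (12, 'Gizmo')")
conn.execute("INSERT INTO orders VALUES (100, '2024-01-15', 1)")
conn.execute("INSERT INTO ordered_product VALUES (100, 10, 5), (100, 11, 2)")

# Product 12 is on no orders at all, which the M:M mapping permits.
rows = conn.execute("""
    SELECT p.description, COUNT(op.order_id)
    FROM product AS p
    LEFT JOIN ordered_product AS op ON op.product_id = p.product_id
    GROUP BY p.product_id
    ORDER BY p.product_id
""").fetchall()
print(rows)  # [('Widget', 1), ('Gadget', 1), ('Gizmo', 0)]
```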
Referential Integrity
• For every (non-null) value of a foreign key there must be an existing primary key with that value
• Maintains consistency between data in related tables
• Create rules/constraints for:
  › insertion of foreign keys
  › update and deletion of primary keys
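Most relational DBMSs can enforce this rule declaratively. Here is a small `sqlite3` sketch with hypothetical `customer` and `orders` tables and a hypothetical helper `try_insert`. Note that SQLite checks foreign keys only after `PRAGMA foreign_keys = ON`, and that a NULL foreign key value is not checked at all.

```python
import sqlite3

# Sketch: the DBMS enforces referential integrity when FK values are inserted.
# Tables and the try_insert helper are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id)
);
INSERT INTO customer VALUES (1, 'Acme Corp');
""")


def try_insert(order_id, customer_id):
    """Return True if the order row was accepted, False if RI rejected it."""
    try:
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, customer_id))
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert(100, 1))     # True: customer 1 exists
print(try_insert(101, 99))    # False: no customer 99, so the FK value would be orphaned
print(try_insert(102, None))  # True: a NULL FK is not checked (no relationship recorded)
```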
Adding Referential Integrity Constraints
[Diagram: related tables with primary keys (PK) and foreign keys (FK); each relationship is labeled with its RI rules, e.g., "D:R, U:C"]
Adding Referential Integrity Constraints, cont…
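The diagram labels each relationship with rules such as "D:R, U:C". If these are read as *delete: restrict* and *update: cascade* (an assumption about the notation, since the diagram itself is not reproduced here), they map onto SQL's `ON DELETE RESTRICT` and `ON UPDATE CASCADE`. A `sqlite3` sketch with hypothetical tables:

```python
import sqlite3

# Sketch of "D:R, U:C" as declarative RI rules, assuming
# D:R = ON DELETE RESTRICT and U:C = ON UPDATE CASCADE. Tables are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL
        REFERENCES customer(customer_id)
        ON DELETE RESTRICT   -- D:R: cannot delete a customer who still has orders
        ON UPDATE CASCADE    -- U:C: a changed customer_id flows to the orders
);
INSERT INTO customer VALUES (1, 'Acme Corp'), (2, 'Globex');
INSERT INTO orders VALUES (100, 1), (101, 1);
""")

# U:C: updating the parent key cascades to every referencing order.
conn.execute("UPDATE customer SET customer_id = 10 WHERE customer_id = 1")
print(conn.execute("SELECT order_id, customer_id FROM orders ORDER BY order_id").fetchall())
# [(100, 10), (101, 10)]

# D:R: deleting a customer who is still referenced is refused.
try:
    conn.execute("DELETE FROM customer WHERE customer_id = 10")
    delete_refused = False
except sqlite3.IntegrityError:
    delete_refused = True
print(delete_refused)  # True

# A customer with no orders can still be deleted.
conn.execute("DELETE FROM customer WHERE customer_id = 2")
```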
Normalization
• Convert complex relations into simpler relations
• Why?
  › Ensures relations conform to the rules
  › Ensures each relation contains facts about one “theme”
  › Reveals/corrects redundancies, errors, and ambiguities in the data model
  › Only a simple check IF a good data model already exists
• Normal Forms
  › state of a relation
  › rid relations of potential anomalies
Normalization, cont…
• Insertion Anomalies
  › are experienced when we attempt to store a value for one field but cannot because the value of another field is unknown
  › e.g., cannot add a new customer’s information until an order number is ready to be entered
Order ID (PK) | Order Date | Customer ID | Customer Name | Customer Address
Normalization, cont…
• Deletion Anomalies
  › are experienced when a value for one field we wish to keep is unexpectedly removed when a value for another field is deleted
  › e.g., cannot delete the sole order for a customer without also deleting the only copy of the customer’s information
Order ID (PK) | Order Date | Customer ID | Customer Name | Customer Address
Normalization, cont…
• Update Anomalies
  › are experienced when changes to multiple records of a table are needed to effect an update to a single value of a field
  › e.g., cannot completely update a customer’s address without changing it for every order placed by that customer
Order ID (PK) | Order Date | Customer ID | Customer Name | Customer Address
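All three anomalies can be reproduced on the unnormalized ORDER layout used above (Order ID, Order Date, Customer ID, Customer Name, Customer Address). Here is a `sqlite3` sketch; the snake_case column names and the data values are hypothetical.

```python
import sqlite3

# Sketch: insertion, deletion, and update anomalies in an unnormalized ORDER
# table that also stores customer data. Data values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE order_unnormalized (
    order_id         TEXT PRIMARY KEY NOT NULL,
    order_date       TEXT,
    customer_id      TEXT,
    customer_name    TEXT,
    customer_address TEXT
)""")
conn.executemany(
    "INSERT INTO order_unnormalized VALUES (?, ?, ?, ?, ?)",
    [("O1", "2024-03-01", "C1", "Acme Corp", "12 Main St"),
     ("O2", "2024-03-05", "C1", "Acme Corp", "12 Main St"),
     ("O3", "2024-03-07", "C2", "Globex", "9 Elm Ave")],
)

# Insertion anomaly: a new customer cannot be stored without an order ID.
try:
    conn.execute("INSERT INTO order_unnormalized VALUES (NULL, NULL, 'C3', 'Initech', '1 Oak Rd')")
    insert_blocked = False
except sqlite3.IntegrityError:
    insert_blocked = True

# Deletion anomaly: deleting C2's sole order also deletes the only copy of C2's data.
conn.execute("DELETE FROM order_unnormalized WHERE order_id = 'O3'")
c2_rows = conn.execute(
    "SELECT COUNT(*) FROM order_unnormalized WHERE customer_id = 'C2'"
).fetchone()[0]

# Update anomaly: changing C1's address on only one of its orders leaves
# two conflicting addresses for the same customer.
conn.execute("UPDATE order_unnormalized SET customer_address = '55 New Rd' WHERE order_id = 'O1'")
c1_addresses = conn.execute(
    "SELECT COUNT(DISTINCT customer_address) FROM order_unnormalized WHERE customer_id = 'C1'"
).fetchone()[0]

print(insert_blocked, c2_rows, c1_addresses)  # True 0 2
```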
Steps in Normalization
[Figure: steps in normalization]
Normalization, cont…
• Every non-key attribute is dependent on:
  › the key (1NF)
  › the WHOLE key (2NF)
  › and nothing but the key (3NF)
1NF
• The table is a relation
  › Primary key
  › No repeating values or groups
  › only atomic values
  › All column values from the same domain
• To correct:
  › define new (usually associative) entity
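Here is a small Python sketch of the 1NF correction, assuming a hypothetical order record that holds a repeating group of (product, quantity) pairs. The group moves to a new associative relation keyed by (order_id, product_id), so every value becomes atomic.

```python
# Sketch: removing a repeating group to reach 1NF.
# Assumption: the unnormalized records and field names below are hypothetical.
unnormalized = [
    {"order_id": 100, "customer_id": 1, "products": [(10, 5), (11, 2)]},
    {"order_id": 101, "customer_id": 2, "products": [(10, 1)]},
]

# Keep the single-valued facts in ORDER ...
orders = [(rec["order_id"], rec["customer_id"]) for rec in unnormalized]

# ... and move the repeating group to a new (associative) relation whose
# concatenated key (order_id, product_id) makes every row atomic and unique.
ordered_products = [
    (rec["order_id"], product_id, quantity)
    for rec in unnormalized
    for product_id, quantity in rec["products"]
]

print(orders)            # [(100, 1), (101, 2)]
print(ordered_products)  # [(100, 10, 5), (100, 11, 2), (101, 10, 1)]
```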
2NF
• 1NF + No partial functional dependencies
  › (Full) Functional dependency: when the value of one attribute can be determined based on the value of another attribute
  › Partial functional dependency: when a non-key attribute is functionally dependent on a part of the PK
• Already in 2NF if:
  › PK is NOT concatenated, OR
  › Relation contains no non-key attributes
• To correct:
  › Decompose into 2 or more relations (if not already)
    • one with the original (concatenated) key + attributes
    • one (or more) with the “depended on” partial key as PK + attributes
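Functional dependencies are business rules, but sample data can at least reveal candidates or rule them out. This sketch uses hypothetical ORDER_LINE data and hypothetical function names and tests every proper part of a concatenated PK against each non-key attribute. Data can only *disprove* a dependency: one that holds in these four rows might not hold in general.

```python
from itertools import combinations

# Sketch: spotting partial dependencies in sample data (hypothetical rows).


def fd_holds(rows, lhs, rhs):
    """True if, in these rows, each lhs value maps to exactly one rhs value."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def partial_dependencies(rows, pk, non_key):
    """Non-key attributes that appear to depend on a proper subset of the PK."""
    found = []
    for size in range(1, len(pk)):  # empty range when the PK is not concatenated
        for part in combinations(pk, size):
            for attr in non_key:
                if fd_holds(rows, part, (attr,)):
                    found.append((part, attr))
    return found


order_line = [
    {"order_id": 100, "product_id": 10, "quantity": 5, "description": "Widget"},
    {"order_id": 100, "product_id": 11, "quantity": 2, "description": "Gadget"},
    {"order_id": 101, "product_id": 10, "quantity": 1, "description": "Widget"},
    {"order_id": 101, "product_id": 11, "quantity": 3, "description": "Gadget"},
]
found = partial_dependencies(order_line, ("order_id", "product_id"), ("quantity", "description"))
print(found)  # [(('product_id',), 'description')]

# 2NF fix: move the partially dependent attribute into its own relation.
product = {(r["product_id"], r["description"]) for r in order_line}
order_line_2nf = [(r["order_id"], r["product_id"], r["quantity"]) for r in order_line]
print(sorted(product))  # [(10, 'Widget'), (11, 'Gadget')]
```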
3NF
• 2NF + No transitive dependencies
• Transitive dependency
  › a functional dependency between 2 non-key attributes
  › when a non-key attribute is functionally dependent on another non-key attribute
• Already in 3NF if:
  › only 0 or 1 non-key attributes in relation
• To correct:
  › Decompose into 2 or more relations (if not already)
    • one with the original PK + attributes
    • one (or more) with the “depended on” non-key attribute as PK + attributes
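Here is a minimal sketch of the 3NF correction using hypothetical order data in which order_id determines customer_id, and customer_id in turn determines customer_name (a transitive dependency). Splitting on the "depended on" attribute stores each customer name once. Re-joining confirms that no information was lost.

```python
# Sketch: removing a transitive dependency (3NF) and checking that the
# decomposition loses no information. Rows and column names are hypothetical.
orders = {
    # order_id: (order_date, customer_id, customer_name)
    100: ("2024-03-01", 1, "Acme Corp"),
    101: ("2024-03-05", 1, "Acme Corp"),
    102: ("2024-03-07", 2, "Globex"),
}

# order_id -> customer_id -> customer_name is transitive, so split on customer_id.
order_3nf = {oid: (date, cid) for oid, (date, cid, _name) in orders.items()}
customer_3nf = {cid: name for _date, cid, name in orders.values()}

# The split is only valid if customer_id -> customer_name really holds.
assert all(customer_3nf[cid] == name for _date, cid, name in orders.values())

# Re-joining on customer_id reproduces the original relation exactly.
rejoined = {oid: (date, cid, customer_3nf[cid]) for oid, (date, cid) in order_3nf.items()}
print(rejoined == orders)  # True
print(customer_3nf)        # {1: 'Acme Corp', 2: 'Globex'}
```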
OTHER DATA MODELS
Hierarchical Data Model
• Each parent can have many children
• Each child has only one parent
• Tree defined by path that traces parent segments to child segments, beginning from the left
• Hierarchical path
  › Ordered sequencing of segments tracing hierarchical structure
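To make "each child has only one parent" and the left-to-right hierarchical path concrete, here is a Python sketch of a segment tree. The segment names are hypothetical. The traversal is an ordinary preorder walk, which is one way to produce the ordered sequencing described above.

```python
# Sketch of a hierarchical (tree) structure and its hierarchical path.
# Assumption: segments are identified by name; the segments below are hypothetical.
children = {
    "DEPT": ["COURSE", "EMPLOYEE"],
    "COURSE": ["SECTION"],
    "SECTION": ["STUDENT"],
    "EMPLOYEE": [],
    "STUDENT": [],
}

# Each child has exactly one parent: build the child -> parent map and
# fail loudly if a segment is listed under two parents.
parent = {}
for p, kids in children.items():
    for kid in kids:
        if kid in parent:
            raise ValueError(f"{kid} would have two parents")
        parent[kid] = p


def hierarchical_sequence(segment):
    """Preorder traversal: visit a parent, then its children from left to right."""
    order = [segment]
    for kid in children[segment]:
        order.extend(hierarchical_sequence(kid))
    return order


print(hierarchical_sequence("DEPT"))
# ['DEPT', 'COURSE', 'SECTION', 'STUDENT', 'EMPLOYEE']
```

Listing STUDENT under a second parent would trigger the `ValueError`, which is the child-with-multiple-parents problem in miniature.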
Problem: Child with Multiple Parents
[Figure: child with multiple parents]
Hierarchical Data Model, cont…
• Advantages
  › Database security
  › Performance, efficiency
  › Data independence
• Disadvantages
  › Complex implementation
  › Structural dependence
  › Complex applications programming and use
  › Lack of standards
Network Data Model
• Created to:
  › Represent complex M:M data relationships
    • Child can have many parents
  › Impose a database standard
• Resembles hierarchical model
  › Collection of records in 1:M relationships
• Sets
  › Implement relationships
  › Composed of:
    • Owner
    • Member
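Here is a rough Python model of owner/member sets, with hypothetical record and set names. A link record (ENROLLMENT) is a member of two set types, so it has two owners. That is how the network model expresses an M:M relationship between STUDENT and COURSE using only 1:M sets.

```python
from dataclasses import dataclass, field

# Sketch of network-model sets: each set occurrence links one owner record to
# many member records, and a record may be a member of several set types.
# Record types, keys, and set names are hypothetical.


@dataclass(frozen=True)
class Record:
    record_type: str
    key: str


@dataclass
class SetOccurrence:
    set_type: str
    owner: Record
    members: list = field(default_factory=list)


ann = Record("STUDENT", "Ann")
mis = Record("COURSE", "MIS3305")
mkt = Record("COURSE", "MKT2307")
enr1 = Record("ENROLLMENT", "Ann-MIS3305")
enr2 = Record("ENROLLMENT", "Ann-MKT2307")

sets = [
    SetOccurrence("STUDENT_ENROLLS", ann, [enr1, enr2]),
    SetOccurrence("COURSE_HAS", mis, [enr1]),
    SetOccurrence("COURSE_HAS", mkt, [enr2]),
]


def owners_of(member):
    """All owners (parents) of a member record, one per set it belongs to."""
    return [(s.set_type, s.owner.key) for s in sets if member in s.members]


print(owners_of(enr1))  # [('STUDENT_ENROLLS', 'Ann'), ('COURSE_HAS', 'MIS3305')]
```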
Network Data Model, cont…
• Advantages
  › Handles more relationship types
  › Conformance to standards
• Disadvantages
  › System complexity
  › Lack of (popular) product support
Big Data
• Big Data = more data than you are able to process effectively
• Influenced by Mobile, Social Networking, Web analytics, RFID, Atmospheric, Medical Research data, …
• Issue: ability of traditional RDBMSs to handle “big data”
Big Data RDBMS Issues
• Traditional RDBMS Solutions ==> Traditional RDBMS Problems
  › Transaction-focus: requires schema ==> maintenance issue
  › ACID-focus: requires locks, db constraints, joins ==> performance & availability issues
  › "Relatively" small amounts of operational data: exceptions require complex, $ actions ==> scalability issue
Reference: http://www.slideshare.net/dondemsak/intro-to-big-data-and-nosql
Complex Data Landscape
[Diagram: complex data landscape]
NOTE: This diagram is for effect ONLY; it is incomplete (e.g., no MDDB, no OODB) AND contains some inaccuracies
Big Data Solutions
• Columnar Databases
• NewSQL Databases
• Hadoop
• NoSQL Databases
Big Data Solutions
NewSQL Databases
• Relational-based
• Good when high scalability is needed with relational DBMSs
• Re-written, highly optimized storage engines
• "Sharding" (i.e., horizontal partitioning across multiple DB instances)
• In-memory databases
• Distributed query processing
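Here is a minimal sketch of sharding by hashing a key. It assumes three hypothetical DB instances and MD5-plus-modulo routing, which is an illustrative choice and not any particular NewSQL product's scheme.

```python
import hashlib

# Sketch of hash-based sharding (horizontal partitioning): each row is routed
# to one DB instance by hashing its key. Shard names are hypothetical.
SHARDS = ["db0", "db1", "db2"]


def shard_for(key: str) -> str:
    # Use a stable hash; Python's built-in hash() of str varies between runs.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


placement = {}
for customer_id in ["C1", "C2", "C3", "C4", "C5", "C6"]:
    placement.setdefault(shard_for(customer_id), []).append(customer_id)
print(placement)  # which shard holds which customers
```

With plain modulo routing, adding a shard relocates most keys. Production systems commonly use range partitioning or consistent hashing to limit that movement.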
Big Data Solutions
Columnar Databases
• Good for data warehouses, analytics
  › computing aggregates on a few columns
• File contains:
  › all values of a specific column (vs. all values of all columns, as in a row-oriented file)
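This sketch shows why a column layout suits aggregates over a few columns. It stores the ROOM table both ways in memory and deliberately ignores the compression and file formats that real columnar databases rely on.

```python
# Sketch contrasting row-oriented and column-oriented layouts for the ROOM table.
header = ("RoomNo", "OwningDept", "Seats", "DeskPCs")
row_store = [
    ("ROG111", "ECS", 30, "Y"),
    ("HSB210", "MIS", 50, "N"),
    ("HCB229", "MIS", 24, "N"),
]

# Column store: one sequence per column (conceptually, one "file" per column).
column_store = {name: list(values) for name, values in zip(header, zip(*row_store))}

# Row layout: every full row is visited just to sum one field.
total_row = sum(row[header.index("Seats")] for row in row_store)

# Column layout: only the Seats column is read.
total_col = sum(column_store["Seats"])

print(total_row, total_col)  # 104 104
```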
Multidimensional Data Model
• Modeling and analysis of data across dimensions
• Data represented as a cube
  › Cube depicts business measures analyzed by dimensions
• Optimized for decision-making vs. transaction processing
• Data storage
  › Pre-aggregation
  › De-normalization
• Basis for data warehouses
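Here is a toy cube in Python. "Sales" is the business measure, and product, region, and quarter are the dimensions. All values are hypothetical, and `ALL` is an assumed marker for a rolled-up dimension. Every roll-up is computed once up front (pre-aggregation), trading storage for fast decision-support queries.

```python
from collections import defaultdict
from itertools import product

# Sketch of a tiny data cube: a sales measure indexed by three dimensions,
# with every roll-up pre-aggregated. Dimension values and figures are hypothetical.
facts = [
    # (product, region, quarter, sales)
    ("Widget", "East", "Q1", 100),
    ("Widget", "West", "Q1", 80),
    ("Gadget", "East", "Q1", 50),
    ("Widget", "East", "Q2", 120),
    ("Gadget", "West", "Q2", 70),
]

ALL = "*"  # marker meaning "all values of this dimension" (rolled up)

# Add each fact to every cell it contributes to, so later queries are lookups.
cube = defaultdict(int)
for prod, region, quarter, sales in facts:
    for key in product((prod, ALL), (region, ALL), (quarter, ALL)):
        cube[key] += sales

print(cube[("Widget", ALL, ALL)])  # 300: Widget, all regions, all quarters
print(cube[(ALL, "East", "Q1")])   # 150: East region in Q1, all products
print(cube[(ALL, ALL, ALL)])       # 420: grand total
```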
Big Data Solutions
Hadoop
• Good for storing and retrieving large amounts of semi- and unstructured data in batch/offline mode
  › HDFS (Hadoop Distributed File System): data distributor
  › MapReduce: request/processing distributor
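The division of labor between mapping and reducing can be imitated in a few lines of Python. This is an in-process simulation of the MapReduce idea (a word count over hypothetical input splits), not Hadoop's actual API. In Hadoop the splits would live on HDFS and the phases would run on many nodes.

```python
from collections import defaultdict
from itertools import chain

# In-process simulation of MapReduce (not the Hadoop API): map runs per split,
# results are shuffled (grouped) by key, and reduce combines each key's values.
splits = [
    ["big data needs new tools"],
    ["new data new tools"],
]


def map_phase(lines):
    """Emit (word, 1) for every word in this split."""
    return [(word, 1) for line in lines for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    return key, sum(values)


mapped = chain.from_iterable(map_phase(split) for split in splits)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
# {'big': 1, 'data': 2, 'needs': 1, 'new': 3, 'tools': 2}
```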
NoSQL Databases
• Focus (in most cases):
  › Scalability via physical distribution and replication of data
  › No fixed schema
  › "Individual query systems" instead of SQL
  › Support for semi- and un-structured data
• Some provide consistency
  › Apache's HBase
  › Amazon's DynamoDB
• Most provide "eventual consistency"
  › Google’s Bigtable
  › Facebook's Cassandra
  › Amazon's SimpleDB
• Use a variety of data models…
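Here is a toy Python simulation of eventual consistency. It uses two hypothetical replicas and an explicit `replicate()` step that stands in for asynchronous replication. Reading from the second replica before replication returns stale data; afterwards the replicas agree.

```python
# Sketch of eventual consistency: a write lands on one replica and is copied
# to the other later, so a read from the other replica can briefly be stale.
# The Replica class, write(), and replicate() are hypothetical.


class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}


primary, secondary = Replica("A"), Replica("B")
pending = []  # replication log not yet applied to the secondary


def write(key, value):
    primary.data[key] = value
    pending.append((key, value))  # shipped to the secondary later


def replicate():
    while pending:
        key, value = pending.pop(0)
        secondary.data[key] = value


write("cart:42", ["book"])
stale = secondary.data.get("cart:42")  # None: the update has not arrived yet
replicate()
fresh = secondary.data.get("cart:42")  # ['book']: replicas have converged
print(stale, fresh)  # None ['book']
```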
Big Data/NoSQL, cont…
• NoSQL Physical Data Models
  › Column (Family) Store
  › Key Value Store
  › Document Store
  › Graph
• Advantages
  › Highly scalable
  › Good for many writes
  › Support for semi- and un-structured data
  › Data model (schema) does not have to be defined up-front
  › Many are open source
  › Cloud options available (e.g., Amazon's SimpleDB)
• Disadvantages
  › No common query language
  › Data inconsistency (“dirty reads”)
  › Reliance on client applications for data validation, consistency, etc.
Summary
• Data Models
  › Hierarchical
  › Network
  › Multidimensional
  › NoSQL/Big Data
    • ACID, BASE
• Relational Model Components
  › Structure
  › Rules
  › Manipulation
• Transforming ERDs to Relations
  › Representing entities and relationships
  › understand foreign keys
• Referential Integrity
  › understand RI constraints
• Normalization
  › Purpose
  › understand anomalies
  › 3 normal forms