Chapter 4: Logical Database Design
G. Green

Agenda
• Evolution of Data Models
  › Chapter 1, pp. 25–28
  › Chapter 9, pp. 409–418
• Relational Database Model
• Transforming ERDs into Relations
• Referential Integrity
• Normalization

The Evolution of Data Models
• Hierarchical
• Network
• Relational
• Object oriented
• Multi-dimensional
• NoSQL

Relational Data Model
• Developed by Codd (IBM) in 1970
• Represents data in the form of tables
• Based on mathematical theory
• 3 components:
  › relational database structure
  › relational rules (integrity)
  › relational operations (manipulation)

Relational Data Model
• Advantages
  › Structural independence
  › Improved conceptual simplicity
  › Easier database design, implementation, management, and use
  › Ad hoc query capability
  › Mathematical foundation
• Disadvantages
  › Hardware and system software overhead
  › Can facilitate poor design and implementation
  › May promote "islands of information" problems

Relational Theory Components
1. Relational Database Structures
2. Rules of Relations
3. Relational Operators

1. Relational Database Structure
• Relations, Tuples, Attributes
  › Tables, Rows, Columns
  › Files, Records, Fields
• Primary Key must be designated
• Foreign Keys must be designated for relationships

CLASS TABLE
CRN (PK)   CourseNo   SecNo   RoomNo (FK)   Days
13109      MIS 1305   02      HCB 229       TTh
15225      MKT 2307   05      HCB 229       MWF
13206      MIS 3305   01      HSB 210       MWF

ROOM TABLE
RoomNo (PK)   Owning Dept   # of Seats   DeskPCs
ROG 111       ECS           30           Y
HSB 210       MIS           50           N
HCB 229       MIS           24           N

2. Rules of Relations
• Relation names must be unique
• Entries in columns are atomic (single valued)
• Entries in a column are from the same domain
• Each row is unique
• Ordering of rows and columns is insignificant
Rules of Relations example: CLASS Table
CRN (PK)   CourseNo   SecNo   RoomNo (FK)   Days
13109      MIS 1305   02      HCB 229       TTh
15225      MKT 2307   05      HCB 229       MWF
13206      MIS 3305   01      HSB 210       MWF

2. Rules of Relations, cont.
• Data in tables should be added, updated, and deleted without errors
  › avoid inconsistency (insertion, update, deletion) ==> referential integrity
  › avoid anomalies (insertion, update, deletion) ==> normalization

3. Relational Operators
• Relational Algebra*
  › UNION (+)
  › INTERSECTION
  › DIFFERENCE (-)
  › PRODUCT (x)
  › SELECT (tuples)
  › PROJECT (attributes)
  › JOIN (PRODUCT, SELECT, PROJECT)
*Diagram adapted from Hyperion presentation, http://infolab.stanford.edu/infoseminar/Archive/FallY99/russakovskii-slides/sld001.htm

Converting ERD to Relational Model
• Represent entities as relations
• Represent relationships as either:
  › foreign keys in relations
  › new relations
• Provide sample data
• Normalize relations

Representing Entities as Tables
• Each entity converted to a relational table
  › attributes become columns
  › primary key must be designated
    - regular entities have atomic keys
    - associative entities have composite keys
    - subtype entities have same key as supertype
  › example entity instances are rows of table

ERD Example Problem Revisited
• A company sells products to customers
• Customer requests generate orders
• Orders may consist of many ordered products
• Products may be contained on many orders, or no orders at all

ERD Example Converted to Tables
(Diagram not reproduced in this text.)

Representing Relationships
• 1:1
  › merge attributes into single table;
  › OR create foreign key (FK) in either relation
• 1:M
  › create foreign key (FK) in relation on "many" side of relationship
• M:M
  › should've been eliminated on ERD!!!
  › create new relation with PKs of related entities as (1) concatenated PK, and (2) FKs in new relation

Referential Integrity
• For every value of a foreign key there must be an existing primary key with that value
• Maintains consistency between data in related tables
• Create rules/constraints for:
  › insertion of foreign keys
  › update and deletion of primary keys

Adding Referential Integrity Constraints
(Diagram: tables with their PK and FK columns marked; each relationship is annotated D:R, U:C, i.e. delete: restrict, update: cascade.)

Adding Referential Integrity Constraints, cont.
(Diagram not reproduced in this text.)

Normalization
• Convert complex relations into simpler relations
• Why?
  › Ensures relations conform to rules
  › Ensures relation contains facts about one "theme"
  › Reveals/corrects redundancies, errors, ambiguities in data model
  › Only a simple check IF good data model exists
• Normal Forms
  › state of a relation
  › rids relations of potential anomalies

Normalization, cont.
• Insertion Anomalies
  › are experienced when we attempt to store a value for one field but cannot because the value of another field is unknown
  › e.g., cannot add a new customer's information until an order number is ready to be entered

  Order ID (PK)   Order Date   Customer ID   Customer Name   Customer Address

Normalization, cont.
• Deletion Anomalies
  › are experienced when a value for one field we wish to keep is unexpectedly removed when a value for another field is deleted
  › e.g., cannot delete the sole order for a customer without also deleting the only copy of the customer's information

  Order ID (PK)   Order Date   Customer ID   Customer Name   Customer Address

Normalization, cont.
• Update Anomalies
  › are experienced when changes to multiple records of a table are needed to effect an update to a single value of a field
  › e.g., cannot completely update a customer's address without changing it for every order placed by that customer

  Order ID (PK)   Order Date   Customer ID   Customer Name   Customer Address

Steps in Normalization
(Diagram not reproduced in this text.)

Normalization, cont.
• Every attribute is dependent on:
  › the key (1NF)
  › the WHOLE key (2NF)
  › and nothing but the key (3NF)

1NF
• The table is a relation
  › Primary key
  › No repeating values or groups (only atomic values)
  › All column values from same domain
• To correct:
  › define new (usually associative) entity

2NF
• 1NF + No partial functional dependencies
• (Full) functional dependency
  › when the value of one attribute can be determined based on the value of another attribute
• Partial functional dependency
  › when a non-key attribute is functionally dependent on a part of the PK
• Already in 2NF if:
  › PK is NOT concatenated, or
  › Relation contains no non-key attributes
• To correct:
  › Decompose into 2 or more relations (if not already)
    - one with original (concatenated) key + the attributes that depend on the whole key
    - one (or more) with the "depended on" partial key as PK + the attributes that depend on it

3NF
• 2NF + No transitive dependencies
• Transitive dependency
  › a functional dependency between 2 non-key attributes
  › when a non-key attribute is functionally dependent on another non-key attribute
• Already in 3NF if:
  › only 0 or 1 non-key attributes in relation
• To correct:
  › Decompose into 2 or more relations (if not already)
    - one with original PK + the attributes that depend directly on it
    - one (or more) with "depended on" non-key attribute as PK + the attributes that depend on it

OTHER DATA MODELS

The Evolution of Data Models
• Hierarchical
• Network
• Relational
• Object oriented
• Multi-dimensional
• NoSQL

Hierarchical Data Model
• Each parent can have many children
• Each child has only one parent
• Tree defined by path that traces parent segments to child segments, beginning from the left
• Hierarchical path
  › Ordered sequencing of segments tracing hierarchical structure

Problem: Child with Multiple Parents
(Diagram not reproduced in this text.)

Hierarchical Data Model, cont.
• Advantages
  › Database security
  › Performance, efficiency
  › Data independence
• Disadvantages (to weigh against the advantages listed above)
  › Complex implementation
  › Structural dependence
  › Complex applications programming and use
  › Lack of standards

Network Data Model
• Created to:
  › Represent complex M:M data relationships (child can have many parents)
  › Impose a database standard
• Resembles hierarchical model
  › Collection of records in 1:M relationships
• Sets
  › Implement relationships
  › Composed of: Owner, Member

Network Data Model, cont.
• Advantages
  › Handles more relationship types
  › Conformance to standards
• Disadvantages
  › System complexity
  › Lack of (popular) product support

Big Data
• Big Data = more than you're able to effectively process
• Influenced by Mobile, Social Networking, Web analytics, RFID, Atmospheric, Medical Research data, …
• Issue: ability of traditional RDBMSs to handle "big data"

Big Data RDBMS Issues
• Traditional RDBMS Solutions ==> Traditional RDBMS Problems
  › Transaction-focus: requires schema ==> maintenance issue
  › ACID-focus: requires locks, db constraints, joins ==> performance & availability issues
  › "Relatively" small amounts of operational data: exceptions require complex, costly ($) actions ==> scalability issue
Reference: http://www.slideshare.net/dondemsak/intro-to-big-data-and-nosql

Complex Data Landscape
(Diagram not reproduced in this text.)
NOTE: This diagram is for effect ONLY; it is incomplete (e.g., no MDDB, no OODB) AND contains some inaccuracies

Big Data Solutions
• NewSQL Databases
• Columnar Databases
• Hadoop
• NoSQL Databases

Big Data Solutions: NewSQL Databases
• Relational-based
• Good when high scalability needed with relational DBMSs
  › Re-written, highly optimized storage engines
  › "Sharding" (i.e. horizontal partitioning across multiple DB instances)
  › In-memory databases
  › Distributed query processing

Big Data Solutions: Columnar Databases
• Good for data warehouses, analytics computing aggregates on a few columns
• File contains: all values of a specific column vs. all values of all columns

Multidimensional Data Model
• Data represented as cube
• Cube depicts business measures analyzed by dimensions
• Modeling and analyzing of data across dimensions
• Optimized for decision-making vs. transaction processing
  › Data storage
  › Pre-aggregation
  › De-normalization
• Basis for data warehouses

Big Data Solutions: Hadoop
• Good for storing, retrieving large amounts of semi- and unstructured data in batch/offline mode
  › HDFS: data distributor
  › MapReduce: request/processing distributor

NoSQL Databases
• Focus (in most cases):
  › Scalability via Physical Distribution and Replication of data
  › No fixed schema
  › "Individual query systems" instead of SQL
  › Support for semi- and unstructured data
• Some provide consistency
  › Apache's HBase
  › Amazon's DynamoDB
• Most provide "eventual consistency"
  › Google's BigTable
  › Facebook's Cassandra
  › Amazon's SimpleDB
• Uses a variety of data models…

Big Data/NoSQL, cont.
• NoSQL Physical Data Models
  › Column (Family) Store
  › Key Value Store
  › Document Store
  › Graph
• Advantages
  › Highly scalable
  › Good for many writes
  › Support for semi- and unstructured data
  › Data Model (schema) does not have to be defined up-front
  › Many are open source
  › Cloud options available (e.g., Amazon's SimpleDB)
• Disadvantages
  › No common query language
  › Data inconsistency ("dirty reads")
  › Reliance on client applications for data validation, consistency, etc.

Summary
• Data Models
  › Hierarchical
  › Network
  › Multidimensional
  › NoSQL/Big Data (ACID, BASE)
• Relational Model Components
  › Structure
  › Rules
  › Manipulation
• Transforming ERDs to Relations
  › Representing entities and relationships
  › understand foreign keys
• Referential Integrity
  › understand RI constraints
• Normalization
  › Purpose
  › understand anomalies
  › 3 normal forms
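
Worked sketch: RI constraints and anomalies
Two points from the summary, RI constraints and the anomalies that normalization removes, can be tried out with a small script. The sketch below splits the single Order relation from the anomaly slides (Order ID, Order Date, Customer ID, Customer Name, Customer Address) into a customer table and an orders table. It assumes SQLite through Python's standard sqlite3 module. The table names, column names, sample rows, and the helper functions build_demo_db, is_rejected, and demo are illustrative assumptions, not part of the course material. The D:R, U:C annotation is read here as delete: restrict, update: cascade.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding the Order data as two 3NF tables."""
    # isolation_level=None selects autocommit mode, so every statement is
    # checked against the foreign key rules as soon as it runs.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(
        """
        CREATE TABLE customer (
            customer_id      TEXT PRIMARY KEY NOT NULL,
            customer_name    TEXT NOT NULL,
            customer_address TEXT NOT NULL
        );
        CREATE TABLE orders (
            order_id    INTEGER PRIMARY KEY,
            order_date  TEXT NOT NULL,
            customer_id TEXT NOT NULL
                REFERENCES customer (customer_id)
                ON DELETE RESTRICT  -- D:R (assumed meaning: delete restrict)
                ON UPDATE CASCADE   -- U:C (assumed meaning: update cascade)
        );
        """
    )
    return conn


def is_rejected(conn: sqlite3.Connection, sql: str) -> bool:
    """Run one statement; report True if a constraint blocked it."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


def demo() -> dict:
    """Exercise the insertion, deletion, and update rules on made-up rows."""
    conn = build_demo_db()
    conn.execute("INSERT INTO customer VALUES ('C1', 'Ann Lee', '12 Oak St')")
    conn.execute("INSERT INTO orders VALUES (100, '2024-01-15', 'C1')")
    conn.execute("INSERT INTO orders VALUES (101, '2024-02-03', 'C1')")
    results = {}

    # Insertion rule: a foreign key value must match an existing primary key.
    results["orphan_insert_rejected"] = is_rejected(
        conn, "INSERT INTO orders VALUES (102, '2024-02-10', 'C9')"
    )
    # D:R: a customer that still has orders cannot be deleted.
    results["delete_restricted"] = is_rejected(
        conn, "DELETE FROM customer WHERE customer_id = 'C1'"
    )
    # U:C: changing the primary key is propagated to the matching foreign keys.
    conn.execute("UPDATE customer SET customer_id = 'C7' WHERE customer_id = 'C1'")
    results["order_customer_ids"] = [
        row[0]
        for row in conn.execute("SELECT customer_id FROM orders ORDER BY order_id")
    ]
    # No update anomaly: the address is stored once, so one UPDATE covers all orders.
    conn.execute(
        "UPDATE customer SET customer_address = '9 Elm Ave' WHERE customer_id = 'C7'"
    )
    results["addresses_seen"] = sorted(
        {
            row[0]
            for row in conn.execute(
                "SELECT c.customer_address FROM orders o "
                "JOIN customer c ON c.customer_id = o.customer_id"
            )
        }
    )
    # No insertion anomaly: a customer can be stored before any order exists.
    conn.execute("INSERT INTO customer VALUES ('C2', 'Bo Chen', '4 Pine Rd')")
    results["customer_count"] = conn.execute(
        "SELECT COUNT(*) FROM customer"
    ).fetchone()[0]
    conn.close()
    return results


if __name__ == "__main__":
    for name, value in demo().items():
        print(name, "=", value)
```

Customer facts live in one table and order facts in another, so each relation covers one "theme". The foreign key on orders is what lets the DBMS, rather than the application, reject an order for an unknown customer.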