Logical Database Design

Chapter 4: Logical Database Design
G. Green

Agenda
• Chapter 1, pgs 25–28
• Chapter 9, pgs 409–418
• Relational Database Model
• Transforming ERDs into Relations
• Evolution of Data Models
• Referential Integrity
• Normalization

The Evolution of Data Models
• Hierarchical
• Network
• Relational
• Object oriented
• Multi-dimensional
• NoSQL

Relational Data Model
• Developed by Codd (IBM) in 1970
• Represents data in the form of tables
• Based on mathematical theory (set theory and predicate logic)
• 3 components:
  › relational database structure
  › relational rules (integrity)
  › relational operations (manipulation)

Relational Data Model, cont…
• Advantages
  › Improved conceptual simplicity
  › Easier database design, implementation, management, and use
  › Structural independence
  › Ad hoc query capability
  › Mathematical foundation
• Disadvantages
  › Hardware and system software overhead
  › Can facilitate poor design and implementation
  › May promote "islands of information" problems

Relational Theory Components
1. Relational Database Structures
2. Rules of Relations
3. Relational Operators

1. Relational Database Structure
• Relations, Tuples, Attributes
  › also called Tables, Rows, Columns
  › or Files, Records, Fields
• Primary key must be designated
• Foreign keys must be designated for relationships

CLASS table
CRN (PK)   CourseNo   SecNo   RoomNo (FK)   Days
13109      MIS1305    02      HCB229        TTh
15225      MKT2307    05      HCB229        MWF
13206      MIS3305    01      HSB210        MWF

ROOM table
RoomNo (PK)   OwningDept   # of Seats   DeskPCs
ROG111        ECS          30           Y
HSB210        MIS          50           N
HCB229        MIS          24           N

2. Rules of Relations
• Relation names must be unique
• Entries in columns are atomic (single valued)
• Entries in a column are from the same domain
• Each row is unique
• Ordering (of rows and of columns) is insignificant
(illustrated on the CLASS table above)

2. Rules of Relations, con't...
• Data in tables should be added, updated, and deleted without errors
  › avoid inconsistency (insertion, update, deletion) ==> referential integrity
  › avoid anomalies (insertion, update, deletion) ==> normalization

3. Relational Operators
• Relational Algebra*
  › UNION (+)
  › INTERSECTION
  › DIFFERENCE (-)
  › PRODUCT (x)
  › SELECT (chooses tuples)
  › PROJECT (chooses attributes)
  › JOIN (a PRODUCT followed by SELECT and PROJECT)
*Diagram adapted from Hyperion presentation, http://infolab.stanford.edu/infoseminar/Archive/FallY99/russakovskii-slides/sld001.htm

Converting ERD to Relational Model
• Represent entities as relations
• Represent relationships as either:
  › foreign keys in relations
  › new relations
• Provide sample data
• Normalize relations

Representing Entities as Tables
• Each entity is converted to a relational table
  › example entity instances are rows of the table
• Attributes become columns
• Primary key must be designated
  › regular entities have atomic keys
  › associative entities have composite keys
  › subtype entities have the same key as their supertype

ERD Example Problem Revisited
• A company sells products to customers
• Customer requests generate orders
• Orders may consist of many ordered products
• Products may be contained on many orders, or on no orders at all

ERD Example Converted to Tables (figure)

Representing Relationships
• 1:1
  › merge attributes into a single table, OR
  › create a foreign key (FK) in either relation
• 1:M
  › create a foreign key (FK) in the relation on the "many" side of the relationship
• M:M
  › should have been eliminated on the ERD!
  › create a new relation with the PKs of the related entities as (1) its concatenated PK and (2) FKs in the new relation

Referential Integrity
• Every (non-null) foreign key value must match an existing primary key value in the related table
• Maintains consistency between data in related tables
• Create rules/constraints for:
  › insertion of foreign keys
  › update and deletion of primary keys

Adding Referential Integrity Constraints (two diagram slides: PKs and FKs marked on each relation; each relationship annotated with its rules, e.g., D:R, U:C, i.e., delete restrict, update cascade)

Normalization
• Convert complex relations into simpler relations
• Why?
  › Ensures relations conform to the rules of relations
  › Ensures a relation contains facts about one "theme"
  › Reveals/corrects redundancies, errors, and ambiguities in the data model
  › Only a simple check IF a good data model already exists
• Normal forms
  › describe the state of a relation
  › rid relations of potential anomalies

Normalization, con't...
Example relation for the anomalies below:
Order ID (PK)   Order Date   Customer ID   Customer Name   Customer Address

Insertion Anomalies
• are experienced when we attempt to store a value for one field but cannot because the value of another field is unknown
• e.g., cannot add a new customer's information until an order number is ready to be entered (Order ID is the PK, so it cannot be left empty)

Deletion Anomalies
• are experienced when a value for one field we wish to keep is unexpectedly removed when a value for another field is deleted
• e.g., cannot delete the sole order for a customer without also deleting the only copy of the customer's information
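Splitting the ORDER relation used in the anomaly examples into a customer relation and an orders relation removes the insertion and deletion anomalies, and the foreign key between them is where the D:R, U:C rules get declared. The sketch below assumes Python's built-in sqlite3 module; the lowercase table and column names and the sample rows are hypothetical, not from the course. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

# Hypothetical decomposition of the ORDER relation from the anomaly examples:
# customer facts move to their own table; orders keep only a foreign key to them.
SCHEMA = """
CREATE TABLE customer (
    customer_id      INTEGER PRIMARY KEY,
    customer_name    TEXT NOT NULL,
    customer_address TEXT NOT NULL
);
CREATE TABLE orders (                    -- ORDER is an SQL keyword, hence orders
    order_id    INTEGER PRIMARY KEY,
    order_date  TEXT NOT NULL,
    customer_id INTEGER NOT NULL
        REFERENCES customer (customer_id)
        ON DELETE RESTRICT               -- D:R
        ON UPDATE CASCADE                -- U:C
);
"""


def rejected(conn: sqlite3.Connection, sql: str) -> bool:
    """Return True if the statement is refused by an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on
conn.executescript(SCHEMA)

# No insertion anomaly: a customer can be stored before placing any order.
conn.execute("INSERT INTO customer VALUES (1, 'Ada', '12 Elm St')")
conn.execute("INSERT INTO orders VALUES (100, '2024-01-15', 1)")

# No deletion anomaly: deleting the customer's sole order keeps the customer row.
conn.execute("DELETE FROM orders WHERE order_id = 100")
customers_left = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]

# RI on insert: an order for a customer that does not exist is refused.
bad_fk_refused = rejected(conn, "INSERT INTO orders VALUES (101, '2024-01-16', 99)")

# D:R: a customer who still has orders cannot be deleted.
conn.execute("INSERT INTO orders VALUES (102, '2024-02-01', 1)")
delete_refused = rejected(conn, "DELETE FROM customer WHERE customer_id = 1")

# U:C: changing the customer's PK is carried into the order's FK.
conn.execute("UPDATE customer SET customer_id = 2 WHERE customer_id = 1")
new_fk = conn.execute("SELECT customer_id FROM orders WHERE order_id = 102").fetchone()[0]

print(customers_left, bad_fk_refused, delete_refused, new_fk)  # 1 True True 2
```

RESTRICT was chosen for deletes so that a customer with open orders cannot silently disappear, while CASCADE on updates keeps every order pointing at the customer after a key change.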
Update Anomalies
• are experienced when changes to multiple records of a table are needed to effect an update to a single value of a field
• e.g., cannot completely update a customer's address without changing it for every order placed by that customer

Steps in Normalization (figure)

Normalization, con't...
• Every non-key attribute is dependent on:
  › the key (1NF)
  › the WHOLE key (2NF)
  › and nothing but the key (3NF)

1NF
• The table is a relation:
  › primary key designated
  › no repeating values or groups
  › only atomic values
  › all column values from the same domain
• To correct:
  › define a new (usually associative) entity

2NF
• 1NF + no partial functional dependencies
  › Functional dependency: the value of one attribute can be determined from the value of another attribute; it is a full functional dependency when it depends on the whole PK
  › Partial functional dependency: a non-key attribute is functionally dependent on only part of the PK
• Already in 2NF (given 1NF) if:
  › the PK is NOT concatenated, OR
  › the relation contains no non-key attributes
• To correct:
  › decompose into 2 or more relations (if not already):
    - one with the original (concatenated) key + the attributes that depend on the whole key
    - one (or more) with the "depended on" partial key as PK + the attributes that depend on it

3NF
• 2NF + no transitive dependencies
  › Transitive dependency: a functional dependency between 2 non-key attributes, i.e., a non-key attribute is functionally dependent on another non-key attribute (e.g., Order ID → Customer ID → Customer Name)
• Already in 3NF (given 2NF) if:
  › the relation has only 0 or 1 non-key attributes
• To correct:
  › decompose into 2 or more relations (if not already):
    - one with the original PK + its remaining attributes
    - one (or more) with the "depended on" non-key attribute as PK + the attributes that depend on it

OTHER DATA MODELS
Hierarchical Data Model
• Each parent can have many children
• Each child has only one parent
• Tree defined by a path that traces parent segments to child segments, beginning from the left
• Hierarchical path
  › ordered sequencing of segments tracing the hierarchical structure

Problem: Child with Multiple Parents (figure)

Hierarchical Data Model, cont…
• Advantages
  › Database security
  › Performance, efficiency
  › Data independence
• Disadvantages
  › Complex implementation
  › Structural dependence
  › Complex applications programming and use
  › Lack of standards

Network Data Model
• Created to:
  › represent complex M:M data relationships (a child can have many parents)
  › impose a database standard
• Resembles the hierarchical model
  › collection of records in 1:M relationships
• Sets
  › implement relationships
  › composed of an owner and a member

Network Data Model, cont…
• Advantages
  › Handles more relationship types
  › Conformance to standards
• Disadvantages
  › System complexity
  › Lack of (popular) product support

Big Data
• Big Data = more data than you are able to effectively process
• Influenced by mobile, social networking, web analytics, RFID, atmospheric, medical research data, …
• Issue: ability of traditional RDBMSs to handle "big data"

Big Data RDBMS Issues
• Traditional RDBMS solutions → traditional RDBMS problems
  › Transaction focus → requires a schema → maintenance issue
  › ACID focus → requires locks, db constraints, joins → performance & availability issues
  › "Relatively" small amounts of operational data → exceptions require complex, costly actions → scalability issue
Reference: http://www.slideshare.net/dondemsak/intro-to-big-data-and-nosql

Complex Data Landscape (diagram)
NOTE: This diagram is for effect ONLY; it is incomplete (e.g., no MDDB, no OODB) AND contains some inaccuracies.
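The ACID focus listed under Big Data RDBMS Issues is what a traditional RDBMS guarantees, and enforcing it (locks, constraints) is part of its cost at scale. As a minimal sketch of the "A" (atomicity), assuming Python's sqlite3 module and a hypothetical account table: a transfer either applies both updates or neither, so a CHECK constraint failure on the debit also undoes the credit that ran before it.

```python
import sqlite3

# Hypothetical account table; the CHECK constraint forbids negative balances.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 0)])
conn.commit()


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Credit dst and debit src as ONE transaction; return True if it committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


first_ok = transfer(conn, "A", "B", 30)    # both updates commit
second_ok = transfer(conn, "A", "B", 500)  # debit violates CHECK, so the credit is undone too
balances = dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))
print(first_ok, second_ok, balances)  # True False {'A': 70, 'B': 30}
```

The credit deliberately runs first so that the rollback of an already-applied change is visible; with the order reversed, the failure would occur before anything changed.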
Green NewSQL Databases Hadoop NoSQL Databases 38 Big Data Solutions • Relational-based • Good when high scalability needed with relational DBMSs G. Green NewSQL Databases  Re-written, highly optimized storage engines  "Sharding" (i.e. horizontal partitioning across multiple DB instances)  In-memory databases  Distributed query processing 39 Big Data Solutions • Good for data warehouses, analytics  computing aggregates on a few columns G. Green Columnar Databases • File contains:  all values of a specific column vs. all values of all columns 40 Multidimensional Data Model  Data represented as cube  Cube depicts business measures analyzed by dimensions G. Green  Modeling and analyzing of data across dimensions  Optimized for decision-making vs. transaction processing  Data storage  Pre-aggregation  De-normalization  Basis for data warehouses 41 Big Data Solutions • Good for storing, retrieving large amounts of semiand unstructured data in batch/offline mode  HDFS: data distributor  MapReduce: request/processing distributor G. Green Hadoop 42 NoSQL Databases • • • • Scalability via Physical Distribution and Replication of data No fixed schema "Individual query systems" instead of SQL Support for semi- and un-structured data G. Green • Focus (in most cases): • Some provide consistency • Apache's Hbase • Amazon's DynamoDB • Most provide "eventual consistency" • Google’s BigTable • Facebook's Cassandra • Amazon's SimpleDB • Uses a variety of data models… 43 Big Data/NoSQL, cont… • • • • Column (Family) Store Key Value Store Document Store Graph G. 
• NoSQL advantages
  › Highly scalable
  › Good for many writes
  › Support for semi- and unstructured data
  › Data model (schema) does not have to be defined up-front
  › Many are open source
  › Cloud options available (e.g., Amazon's SimpleDB)
• NoSQL disadvantages
  › No common query language
  › Data inconsistency ("dirty reads")
  › Reliance on client applications for data validation, consistency, etc.

Summary
• Data Models
  › Relational Model Components: Structure, Rules, Manipulation
  › Hierarchical
  › Network
  › Multidimensional
  › NoSQL/Big Data: ACID, BASE
• Transforming ERDs to Relations
  › representing entities and relationships
  › understand foreign keys
• Referential Integrity
  › understand RI constraints
• Normalization
  › purpose
  › understand anomalies
  › 3 normal forms
