The Relational Data Model Properties of Relations Keys and Constraints CG084&085 /01-02 2/ 1 Relational Data Base • The data base is a collection of relations or tables. • Data organisation is based on the relational data model. • A relation is actually a set of tuples – but we can visualise it as a table with columns and rows. • Each tuple in a relation has the same number and type of attributes. CG084&085 /01-02 2/ 2 Relational Model • A data model is a collection of constructs used to organise data. • The Relational Model is based on a simple and uniform construct – the relation (Codd, 1970). • A relation is a mathematical concept based on the idea of sets. • The strength of the relational model comes from the formal foundation provided by the theory of relations. CG084&085 /01-02 2/ 3 Relational Concepts – Relations • A relation consists of a relation scheme and a relation instance. • The relation scheme can be thought of as a heading for the table. • It consists of the relation name, and a set of pairs (attribute name and domain), where • A domain represents the set of (atomic) values that the attribute can take. CG084&085 /01-02 2/ 4 Relations – cont… • e.g. Student (Name: string, RegNo: string, Address: string, PhoneNo: string, Age: integer, AvgeMark: real) • A relation instance can be visualised as the “body” of a table. • It consists of a set of rows (tuples) each containing values corresponding to the various attributes in the relation scheme. CG084&085 /01-02 2/ 5 Relations – cont… • Student Table • Alternative terminologies for the relational model Informal Terms Table Column Row Values in a column Table Definition Populated Table CG084&085 /01-02 Formal Terms Relation Attribute/Domain Tuple Domain Relation Scheme Relation Instance/Extension 2/ 6 Properties of Relations • No duplicate tuples in a relation – because a relation is a set of tuples • The tuples in a relation have no order – ditto • No duplicate attributes – because the relation scheme contains a set of attributes • The attributes have no order – they are identified by name, not position • Every tuple must have attribute values drawn from each of the domains of the relation scheme (or alternatively the special “value” NULL) CG084&085 /01-02 2/ 7 Position is Not Important A1 A4 A3 A2 A1 A4 A3 A2 Tuples are unordered If you change the place of a row in a table, the meaning of the data remains the same. A1 A2 A3 A4 A1 A4 A3 A2 Attributes are unordered If you change the place of a column in a table, the meaning of the data remains the same. CG084&085 /01-02 2/ 8 Mathematical Relations Mathematical definition of a relation: • Consider two sets, Cars and Colours. – Cars = {NissanMicra, FordEscort} – Colours = {Red, Blue, Silver} (the notation { … } denotes a set) • Let’s find all combinations of elements: first element from Cars, and second from Colours. – {(NissanMicra, Red), (NissanMicra, Blue), (NissanMicra, Silver), (FordEscort, Red), (FordEscort, Blue), (FordEscort, Silver)} • Mathematically, this set is called the Cartesian Product of Cars and Colours. CG084&085 /01-02 2/ 9 Mathematical Relations – cont… A relation is a sub-set of the Cartesian Product of its domains, e.g. BlueCars = {(NissanMicra, Blue), (FordEscort, Blue)} The above set can be written mathematically using conditions as: {(x, y) such that x belongs to Cars and y belongs to Colours and y = Blue} Cars = {NissanMicra, FordEscort} Colours = {Red, Blue, Silver} BlueCars = {(NissanMicra, Blue), (FordEscort, Blue)} CG084&085 /01-02 2/ 10 Mathematical Relations – cont… Consider another set for models: Years = {2008, 2009, 2006, 2007} Let’s define a relation for LatestBlueCars: {(x, y, z) such that x belongs to Cars and y belongs to Colours and y = Blue and z belongs to Years and z > 2007} So we can write: LatestBlueCars = {(NissanMicra, Blue, 2008), (NissanMicra, Blue, 2009), (FordEscort, Blue, 2008), (FordEscort, Blue, 2009)} CG084&085 /01-02 2/ 11 Relational Concepts – Domains • A domain is a set of atomic values, e.g. – Gender = {“Male”, “Female”} – DepartmentCodes = {“CG”, “EN”, “BE”, …} – NationalPhoneNums = {“0191-227 3521”, “0161-275 6138”, …} • An atomic value is indivisible. – A data item is treated as a whole for simplicity. – Although “0191-227-3521” can be divided into 0191, 227, and 3521, in the relational model it is considered as one atomic value. CG084&085 /01-02 2/ 12 Domains – cont… • A data type may be used to define a domain, e.g. integer, real. • A format can be defined to specify the pattern of the legal values, e.g. – NationalPhoneNums: dddd-ddd dddd – Date: MM/DD/YYYY (where MM = {01, 02, …, 12}, DD = {01, 02, …, 31}, YYYY = {0000, 0001, …, 1999, 2000, 2001, …, 9999}) CG084&085 /01-02 2/ 13 Relational Constrains Constraints are conditions that must hold on all relation instances. • Domain Constraint. – Values of attributes must belong to the appropriate domain. • Key Constraint. – A sub-set (Primary Key) of the attributes of a relation should identify each tuple uniquely. • Entity Integrity Constraint. – No Primary Key value can be null. • Referential Integrity Constraint. – Specifies data dependencies between relations. CG084&085 /01-02 2/ 14 Domain Constraint • The value of each attribute A must be an atomic value drawn from the domain of A. • E.g. if a Person relation has a Sex attribute defined on the Gender domain, the only allowable values are “Male” and “Female”. • Each domain has an associated data type. • Standard data types include INT(EGER), DECIMAL, DATE, CHAR, VARCHAR, etc. CG084&085 /01-02 2/ 15 Key Constraint • Since a relation is a set, all tuples must be unique (no duplicates). • No two tuples can have the same combinations of values for all of their attributes. • The Key Constraint makes sure that all tuples in a relation are unique. • A sub-set of the attributes of a relation may be used to guarantee that no two tuples have the same values for these attributes. CG084&085 /01-02 2/ 16 Relational Keys Superkey • A set of attributes (one or more) that uniquely identifies a tuple within a relation. • Every relation must have at least one superkey: the set of all its attributes. Key • A minimum set of attributes that guarantees uniqueness of each tuple within a relation. (No proper sub-set of the key does so). E.g. RegNo is a key of the Student relation. CG084&085 /01-02 2/ 17 Relational Keys – cont… Candidate Key • A relation may have more than one key. Each such key is called a Candidate Key. Primary Key • A Candidate Key chosen to identify each tuple in a relation uniquely – usually underlined. CG084&085 /01-02 2/ 18 Entity Integrity Constraint NULL • Represents a value for an attribute that is currently unknown or is not applicable for this tuple. • Deals with incomplete or exceptional data. • Represents the absence of a value and is not the same as zero or spaces, which are values. CG084&085 /01-02 2/ 19 Entity Integrity Constraint • No Primary Key (PK) attribute can be NULL. • A NULL PK would mean that a tuple cannot be identified, which is against the concept of PK, i.e. to identify tuples uniquely. • If two tuples had NULLs in the PK, there would be no way of distinguishing them. • Non-key attributes may or may not allow null values. CG084&085 /01-02 2/ 20 Relational Keys – cont… Single attribute key Every value in this column is defined to be different from all other values in that column. Multi-attribute key (composite key) Every combination of values in these columns is defined to be different from all other combined values in these columns. CG084&085 /01-02 2/ 21 Referential Integrity Constraint • A constraint involving two relations: the referencing relation and the referenced relation (other constraints involve just a single relation). • Tuples in the referencing relation R1 have attributes FK (called foreign key attributes) that reference the primary key attributes PK of the referenced relation R2. • A tuple t1 in R1 is said to reference a tuple t2 in R2 if t1[FK] = t2[PK]. CG084&085 /01-02 2/ 22 Referential Integrity – cont… Foreign Key • A set of attributes in a relation that exactly “matches” a Primary Key in another relation. • The names of the attributes don’t have to be the same, but they must be of the same domain. • A Foreign Key in a relation A, matching a Primary Key in a relation B, represents a relationship between A and B. CG084&085 /01-02 2/ 23 Referential Integrity – cont… Primary Key value here These values are all the same CG084&085 /01-02 Foreign key values here 2/ 24 Referential Integrity – cont… For example, consider two relations: Student (Name, RegNo, Address, …, TutorId) Lecturer (Name, IdNo, RoomNo, …) Student [TutorId] is a FK referencing Lecturer [SSN], the PK of Lecturer. CG084&085 /01-02 2/ 25 Referential Integrity – cont… Referential Integrity says • A FK value cannot reference a non-existent PK. • Therefore we cannot delete a tuple from Lecturer whilst a tuple in Student has a FK value the same as the PK of that tuple. But a FK can be NULL • If a Lecturer tuple is deleted, the TutorId values of all his / her tutees can be set to NULL, meaning that those students have no Tutor. CG084&085 /01-02 2/ 26 Referential Integrity – cont. CG084&085 /01-02 2/ 27 Relational Schemas A relational data base usually contains many relations. A relational schema S contains many relation schemes and a set of integrity constraints IC: • S = {R1, R2, …, Rm} A relational data base state DB over S is a set of the relation instances: • DB = {r1, r2, …, rm} • Each relation instance in DB satisfies the integrity constraints defined in IC. CG084&085 /01-02 2/ 28 Company Data Base Schema CG084&085 /01-02 2/ 29 Company Data Base Schema – Constraints CG084&085 /01-02 2/ 30 Company Database State CG084&085 /01-02 2/ 31 Relational Data Model Implementation • The Relational Data Model is a formalism and an abstract structure for data organisation. • A particular DBMS provides implementation of the Relational Data Model in general. • SQL (Structured Query Language) is the standard for commercial DBMSs. • It is composed of three sub-languages. – DDL: Data Definition Language. – DML: Data Manipulation Language. – DCL: Data Control Language. CG084&085 /01-02 2/ 32 Data Definition in DDL • DDL is used to specify relations (as tables) and the various types of constraints. • We will use the SQL2 standard, a version of SQL proposed in 1992. • Before SQL2, all tables were considered to be part of one global schema. • A schema is defined using the CREATE SCHEMA command. CG084&085 /01-02 2/ 33 Data Definition in DDL • For the Company data base, the schema might be created by: CREATE SCHEMA Company AUTHORIZATION NEliot; • NEliot is the authorisation identifier of the user. • Not implemented in Oracle! CG084&085 /01-02 2/ 34 Data Definition in DDL – cont. • SQL2 provides data types for string, numeric, date, and time data. – CHAR, VARCHAR – INT(EGER), NUMERIC or DEC(IMAL) – FLOAT or REAL – DATE, TIME, TIMESTAMP, INTERVAL • Domains can be defined using CREATE DOMAIN. – Suppose we want to define a domain for the RegNo attribute of Student: CREATE DOMAIN RegNoType AS CHAR(8); – CHAR(8) means a string of 8 characters. – Oracle doesn’t implement domains! CG084&085 /01-02 2/ 35 Data Definition in DDL – cont. Defining relations. • A relation becomes a TABLE. • Relations are defined using CREATE TABLE. • CREATE TABLE specifies both Attributes and Constraints. • Each attribute is given a name, a data type or domain, and may have constraints specified, e.g. NOT NULL. • Constraints on the whole table, e.g. composite Primary and Foreign Keys, are specified after all of the attributes have been defined. CG084&085 /01-02 2/ 36 Data Definition in DDL – cont. Creating Employee Table EMPLOYEE (FNAME, MINIT, LNAME, SSN, BDATE, ADDRESS, SEX, SALARY, SUPERSSN*, DNO*) CG084&085 /01-02 2/ 37 Data Definition in DDL – cont. Alternative, specifying default values and referential actions: CG084&085 /01-02 2/ 38 Summary DBMS requires structured data. • Structure is recorded in the Relation Scheme. • Structure depends on the kind of data model. • A Relational DBMS uses the Relational Model. • A Relation is a mathematically defined concept. • A Relation is visualised as a table with special properties. Data model implementation using SQL DDL. • Relations become Tables. • Relations, attributes, and constraints are defined using CREATE TABLE. CG084&085 /01-02 2/ 39