Opsummering Basic Definitions Database: A collection of related data. Data: Known facts that can be recorded and have an implicit meaning. E.g. “John B. Smith” a name123456789 a number ---two pieces of data If they are used in a query like “who is the head of the department and what is his ssn” the data will turn into information (give an implicit meaning) Mini-world: Some part of the real world about which data is stored in a database. For example, student grades and transcripts at a university. 2 User/programmers Database System Application programs/queries DBMS Software Software to process queries/programs Software to access storede data Stored database definition Meta data Stored database 3 Additional Implications of Using the Database Approach Flexibility to change data structures: database structure may evolve as new requirements are defined. Availability of up-to-date information – very important for on-line transaction systems such as airline, hotel, car reservations. 5 Data Models Data Model: A set of concepts to describe the structure of a database, and certain constraints that the database should obey. Data Model Operations: Operations for specifying database retrievals and updates by referring to the concepts of the data model. Operations on the data model may include basic operations and user-defined operations. 6 Schemas versus Instances • Database Schema: The description of a database. Includes descriptions of the database structure and the constraints that should hold on the database. • Schema Diagram: A diagrammatic display of (some aspects of) a database schema (fig 2.1). • Schema Construct: A component of the schema or an object within the schema, e.g., STUDENT, COURSE. • Database Instance: The actual data stored in a database at a particular moment in time. Also called database state (or occurrence). 7 End users External view External view Conceptual Schema Internal Schema Stored database 8 DBMS Languages • Data Definition Language (DDL): Used by the DBA and database designers to specify the conceptual schema of a database. In many DBMSs, the DDL is also used to define internal and external schemas (views). In some DBMSs, separate storage definition language (SDL) and view definition language (VDL) are used to define internal and external schemas. 9 DBMS Languages • Data Manipulation Language (DML): Used to specify database retrievals and updates. • DML commands (data sublanguage) can be embedded in a general-purpose programming language (host language), such as Java or C# • Alternatively, stand-alone DML commands can be applied directly (query language). 10 DBMS Languages • High Level or Non-procedural Languages: e.g., SQL, are set-oriented and specify what data to retrieve than how to retrieve. Also called declarative languages. • Low Level or Procedural Languages: recordat-a-time; they specify how to retrieve data and include constructs such as looping. 11 Conceptual Data Models A conceptual model of the data on which the IT systems of an organisation are based Independent of implementation Stable over time Conceptual data structure doesn't change as much as functionality Conceptual models are to be transformed to a database model as the relational model ER Model: Concepts Entities Relations Attributes Atomic Composite Multi valued Attribute values Entity types Keys Domains Cardinality ratio Participation (total / partial) Relations may have attributes Weak Entity Types Identifying owner Identifying relation Partial key A weak entity always has total participation in the identifying relation. ER Diagram for the Company Database EE/R-diagram Overlapping subclasses Den relationelle model Den relationelle model er en datamodel med specielt sigte på relationsdatabaser Den relationelle model er en logisk datamodel, der beskriver hvordan data struktureres i relationsdatabaser Den relationelle model Den relationelle model beskrives ved hjælp af en række veldefinerede begreber: domæner relationelle skemaer relationer attributter tupler primærnøgler, fremmednøgler begrænsninger (constraints) Eksempel på tabeller som repræsentation af relationer Attributter Relation navn Ansat Records/ Tupler Afdeling Ansat Nr 19 34 2 123 23 102 Afdeling Nr 1 2 3 Fornavn Efter navn Peter Knudsen Mads Michelsen Anne Andersen Marianne Jensen Svend Michelsen Hans Pedersen Henrik Foedsels dato 240165 081151 230245 240165 111253 021170 Uddannelse Maaneds Loen Murer 20000 Murer 23000 Sekretær 19000 Ingeniør 25000 Tømrer 20000 Byggetekniker 23000 AfdelingNavn Chef ChefStartDato Nybygning Renovering Nedrivning 34 123 102 110164 230671 061193 Afdeling 1 1 1 2 2 3 Nøglebegrebet En nøgle er en attributkombination, som entydigt identificerer en forekomst i en tabel. En nøgle er minimal, dvs.. fjernes een attribut, er den ikke længere entydig. Alle attributter fra tabellen vil tilsammen altid være en (evt.. ikke-minimal) nøgle, kaldet en supernøgle. Der kan være flere forskellige kandidatnøgler i en tabel Der vælges altid en primærnøgle fra mængden af kandidatnøgler Tabelsammenhænge repræsenteres ved fremmednøgler en fremmednøgle er een eller flere attributter i en tabel, som svarer til primærnøglen i en anden tabel en fremmednøgle peger på en forekomst i en anden tabel og fortæller, at her ligger resten af oplysningerne fremmednøglen og primærnøgleattributterne i den tabel, der refereres til, skal have samme domæne. Integritetsregler Integritet: at være sammenhængende Domæneregel: Værdien af en attribut skal være en atomisk værdi fra dom(A) Entitetsintegritet: En primærnøgle må ikke indeholde NULL-værdier Referenceintegritet: En fremmednøgle skal enten være NULL eller referere til en forekomst med en tilsvarende primærnøgleværdi Semantisk integritet: Forskellige regler, der i modsætning til de andre former for integritet, afhænger af den bestemte database. DBMS-understøttelse DBMS’et bør understøtte: 1. Domæneintegritet 2. Entitetsintegritet 3. Referenceintegritet 4. Semantisk integritet Udbredte relationelle DBMS understøtter kun 1 og 4 i begrænset omfang. Datamanipulation i den relationelle model - relationsalgebraen Det er det, man forstår ved en algebra! Arbejder på hele tabeller dvs. alle operationer tager tabeller som input og returnerer nye tabeller Hermed kan operationer sammensættes til udtryk (som almindelige regneudtryk) Operationer: rækkeudvælgelse (RESTRICT/SELECT) søjleudvælgelse (PROJECT) sammensætning af tabeller (JOIN) mængdeoperationer (UNION, INTERSECTION, MINUS, PRODUCT) avancerede operationer (OUTER (LEFT/RIGTH) JOIN) Relational Algebra - Overview Table Design Transformation from E/R-model to Relational Model Eigth Steps Algorithm Does not always yield an optimal design, but provides a good starting point for the final design of tables Step 1: For each regular entity create a table •For composite attributes only the components are included. •Multi-value attributes are not included (they are considered in step 6). •Choose a primary key. Step 2: For each weak entity type create a table •All attributes from the weak entity are included. •The primary key from the owner is included as foreign key. •The primary key is composed by the owner’s primary key and the partial key. Step 3: For each (binary) 1:1-relation type include primary key of one participant as foreign key in the other •Any attribute on the relation type is included with the key. •If possible, include on a side with total participation. Step 4: For each (binary) 1:n relation type include primary key of 1side as foreign key on n-side •Any attribute is included with the key on the n-side. Step 5: For each (binary) n:m relation type create table with participating entity types primary keys as foreign keys •Any attribute on the relation is included in the new table. •Primary key is composed of the foreign keys. •This may also be applied to binary 1:1- and 1:n relations – in particularly if there are relatively few instants of the relation type. Step 6: For each multi value attribute create table with primary key of entity type as foreign key and the multi value attribute •The primary key of the new table is composed of the foreign key and the multi value attribute. Step 7: For each n-ary (n>2) relation type create a table with the primary keys of all participating entity types as foreign keys •Any attribute on the relation type is included. •The primary key is composed of the included foreign keys. Step 8: B. Pull-down (only in case of disjoint, total specialisation): Create a table for each subclass Include (“pull down”) all attributes from the super class in each table Use the primary key from the super class as primary key in the new tables Step 8: C. Pull-up-1: (only in case of disjoint specialisation): Create one table for the superclass Include (pull up) all attributes from the subclasses Add a type attribute Step 8: D. Pull-up-2: (in case of overlapping specialisation): Create one table for the superclass Include (pull up) all attributes from the subclasses Add a type flag for each subclass Normalisation Normal forms are the formal way to state design guidelines. Normalisation is the process. 6 normal forms (NF) are defined: 1st, 2nd, 3rd, and Boyce-Codd (BCNF). 4th and 5th NF BCNF is the one of most practical interest. Guideline for Normalisation All attributes are to depend on the key, the whole key, and nothing but the key. So help me Codd. SQL DDL create definition af table, view alter tilføje felter, ændre felter tilføje constraint drop grant / revoke DML insert update delete select /* automatik autoincrement pa primaer noeglen */ /* create table test (id int IDENTITY(1,1) navn varchar(20)); */ primary key, Comparing NULL’s to Values The logic of conditions in SQL is really 3-valued logic: TRUE, FALSE, UNKNOWN. When any value is compared with NULL, the truth value is UNKNOWN. But a query only produces a tuple in the answer if its truth value for the WHERE clause is TRUE (not FALSE or UNKNOWN). Use is or is not select Fname, lname from employee where superssn is null Instead of = or != Since sql considers each null value as being distinct from every other null value AGGREGATE FUNCTIONS Include COUNT, SUM, MAX, MIN, and AVG Query 15: Find the maximum salary, the minimum salary, and the average salary among all employees. Q15: SELECT MAX(SALARY), MIN(SALARY), AVG(SALARY) FROM EMPLOYEE Some SQL implementations may not allow more than one function in the SELECT-clause GROUPING (cont.) Query 20: For each department, retrieve the department number, the number of employees in the department, and their average salary. Q20:SELECT FROM GROUP BY DNO, COUNT (*), AVG (SALARY) EMPLOYEE DNO In Q20, the EMPLOYEE tuples are divided into groups--each group having the same value for the grouping attribute DNO The COUNT and AVG functions are applied to each such group of tuples separately The SELECT-clause includes only the grouping attribute and the functions to be applied on each group of tuples A join condition can be used in conjunction with grouping THE HAVING-CLAUSE Sometimes we want to retrieve the values of these functions for only those groups that satisfy certain conditions The HAVING-clause is used for specifying a selection condition on groups (rather than on individual tuples) THE HAVING-CLAUSE (cont.) Query 22: For each project on which more than two employees work , retrieve the project number, project name, and the number of employees who work on that project. Q22: SELECT (*) FROM WHERE GROUP BY HAVING PNUMBER, PNAME, COUNT PROJECT, WORKS_ON PNUMBER=PNO PNUMBER, PNAME COUNT (*) > 2 Inner join select pnumber, dnum, fname, lname, address from ((project join department on dnum = dnumber) join Employee on mgrssn = ssn) where plocation = 'Stafford' Outer join select fname, lname, dname as lederAf from (employee left join department on ssn = mgrssn) John FrankLin Joyce Ramesh James Jennifer Ahmad Alicia Smith NULL Wong Research English NULL Narayalan NULL Borg Headquarters Wallace Administration Jabbar NULL Zelaya NULL Select select < attribute and function list> from < tablelist> [where < condition>] [group by <grouping attributelist>] [Having <group condition>] [Order by < attributelist>] SQL Views: An Example CREATE view ViewWorksOn(fname, lname, pname, hours) AS (SELECT FNAME, LNAME, PNAME, HOURS FROM EMPLOYEE, PROJECT, WORKS_ON WHERE SSN=ESSN AND PNO=PNUMBER GROUP BY fname, lname, PNAME,hours)