Summary
Basic Definitions
Database: A collection of related data.
Data: Known facts that can be recorded and have an
implicit meaning.
E.g. “John B. Smith” (a name) and 123456789 (a number) are two
pieces of data.
If they are used in a query such as “Who is the head of the
department and what is his SSN?”, the data turns into
information (it is given an implicit meaning).
Mini-world: Some part of the real world about which
data is stored in a database. For example, student
grades and transcripts at a university.
Figure (database system): users/programmers; application programs/queries;
DBMS software (software to process queries/programs, software to access stored data);
stored database definition (meta-data); stored database.
Additional Implications of Using the Database Approach
Flexibility to change data structures: database
structure may evolve as new requirements
are defined.
Availability of up-to-date information – very
important for on-line transaction systems
such as airline, hotel, car reservations.
Data Models
Data Model: A set of concepts to describe the
structure of a database, and certain constraints
that the database should obey.
Data Model Operations: Operations for
specifying database retrievals and updates by
referring to the concepts of the data model.
Operations on the data model may include
basic operations and user-defined operations.
Schemas versus Instances
• Database Schema: The description of a
database. Includes descriptions of the
database structure and the constraints that
should hold on the database.
• Schema Diagram: A diagrammatic display of
(some aspects of) a database schema (fig
2.1).
• Schema Construct: A component of the
schema or an object within the schema, e.g.,
STUDENT, COURSE.
• Database Instance: The actual data stored in
a database at a particular moment in time.
Also called database state (or occurrence).
Figure (three-schema architecture): end users → external views →
conceptual schema → internal schema → stored database.
DBMS Languages
• Data Definition Language (DDL): Used by
the DBA and database designers to specify the
conceptual schema of a database. In many
DBMSs, the DDL is also used to define internal
and external schemas (views). In some
DBMSs, separate storage definition
language (SDL) and view definition
language (VDL) are used to define internal
and external schemas.
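A minimal sketch of DDL at work, assuming Python's standard sqlite3 module stands in for the DBMS (the slides do not name SQLite; the department table and dept_names view are hypothetical). The CREATE statements only describe structure and constraints; the DBMS stores these definitions as meta-data in its catalog, which SQLite exposes as the sqlite_master table.

```python
import sqlite3


def create_schema() -> sqlite3.Connection:
    """Run a few DDL statements against a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    # Conceptual schema: a table with its constraints (hypothetical names)
    conn.execute("""
        CREATE TABLE department (
            dnumber INTEGER PRIMARY KEY,
            dname   TEXT NOT NULL UNIQUE
        )""")
    # External schema: a view over the table
    conn.execute("CREATE VIEW dept_names AS SELECT dname FROM department")
    return conn


def catalog(conn: sqlite3.Connection) -> list:
    """Read the stored schema definition (meta-data) back from the catalog."""
    return conn.execute(
        "SELECT type, name FROM sqlite_master "
        "WHERE type IN ('table', 'view') ORDER BY name"
    ).fetchall()


if __name__ == "__main__":
    print(catalog(create_schema()))  # [('table', 'department'), ('view', 'dept_names')]
```

The UNIQUE constraint also makes SQLite create an internal index, which is why the catalog query filters on type.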
DBMS Languages
• Data Manipulation Language (DML): Used to
specify database retrievals and updates.
• DML commands (data sublanguage) can
be embedded in a general-purpose
programming language (host language),
such as Java or C#
• Alternatively, stand-alone DML commands
can be applied directly (query language).
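The slides mention Java and C# as host languages. As an assumed stand-in, the sketch below embeds DML in Python through the standard sqlite3 module; the employee table and the hire and well_paid helpers are hypothetical. Host-language values are bound through ? placeholders rather than concatenated into the SQL text.

```python
import sqlite3


def hire(conn: sqlite3.Connection, fname: str, salary: int) -> None:
    # Embedded DML: the SQL text is fixed; values from the host language
    # are passed as parameters instead of being pasted into the string.
    conn.execute("INSERT INTO employee (fname, salary) VALUES (?, ?)",
                 (fname, salary))


def well_paid(conn: sqlite3.Connection, limit: int) -> list:
    cur = conn.execute(
        "SELECT fname FROM employee WHERE salary > ? ORDER BY fname", (limit,))
    # The host program then consumes the result one row at a time.
    return [row[0] for row in cur]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (fname TEXT, salary INTEGER)")
    for name, salary in [("John", 30000), ("Franklin", 40000), ("Alicia", 25000)]:
        hire(conn, name, salary)
    print(well_paid(conn, 28000))  # ['Franklin', 'John']
```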
DBMS Languages
• High-level or non-procedural languages:
e.g., SQL; they are set-oriented and specify what
data to retrieve rather than how to retrieve it. Also
called declarative languages.
• Low-level or procedural languages: record-at-a-time; they specify how to retrieve data and
include constructs such as loops.
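To make the contrast concrete, here is a sketch (Python with the standard sqlite3 module as an assumed DBMS, hypothetical data) that answers the same question twice: once as a set-oriented SQL query and once as a record-at-a-time loop.

```python
import sqlite3

# Hypothetical sample rows: (fname, dno, salary)
ROWS = [("John", 5, 30000), ("Franklin", 5, 40000), ("Alicia", 4, 25000)]


def declarative(conn: sqlite3.Connection) -> list:
    # States WHAT is wanted; the DBMS chooses how (table scan, index, ...)
    return [r[0] for r in conn.execute(
        "SELECT fname FROM employee WHERE dno = 5 ORDER BY fname")]


def procedural(rows: list) -> list:
    # States HOW: an explicit loop that looks at one record at a time
    result = []
    for fname, dno, _salary in rows:
        if dno == 5:
            result.append(fname)
    result.sort()
    return result


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (fname TEXT, dno INTEGER, salary INTEGER)")
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", ROWS)
    return conn


if __name__ == "__main__":
    print(declarative(make_db()), procedural(ROWS))  # ['Franklin', 'John'] ['Franklin', 'John']
```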
Conceptual Data Models
A conceptual model of the data on which the IT
systems of an organisation are based
Independent of implementation
Stable over time
The conceptual data structure changes less over time
than the functionality does
Conceptual models are to be transformed into a
database model, such as the relational model
ER Model: Concepts
Entities
Relationships
Attributes
  Atomic
  Composite
  Multivalued
Attribute values
Entity types
Keys
Domains
Cardinality ratio
Participation (total / partial)
Relationships may have attributes
Weak Entity Types
Identifying owner
Identifying relationship
Partial key
A weak entity always has total
participation in the identifying
relationship.
ER diagram for the COMPANY database (figure)
EER diagram: overlapping subclasses (figure)
The Relational Model
The relational model is a data model aimed specifically
at relational databases
The relational model is a logical data model that
describes how data is structured in
relational databases
The relational model is described by a number of well-defined concepts:
domains
relational schemas
relations
attributes
tuples
primary keys, foreign keys
constraints
Example of Tables Representing Relations
(In the figure, each table is labelled with its relation name, the column
headings are the attributes, and each row is a record/tuple.)

Employee (Ansat)
Nr   FirstName    LastName   BirthDate  Education                MonthlySalary  Department
19   Peter        Knudsen    240165     Bricklayer               20000          1
34   Mads         Michelsen  081151     Bricklayer               23000          1
2    Anne         Andersen   230245     Secretary                19000          1
123  Marianne     Jensen     240165     Engineer                 25000          2
23   Svend        Michelsen  111253     Carpenter                20000          2
102  Hans Henrik  Pedersen   021170     Construction technician  23000          3

Department (Afdeling)
DepartmentNr  DepartmentName    Manager  ManagerStartDate
1             New construction  34       110164
2             Renovation        123      230671
3             Demolition        102      061193

(Dates are written as DDMMYY.)
The Key Concept
A key is a combination of attributes that uniquely
identifies a row in a table.
A key is minimal, i.e., if one attribute is removed, it
no longer identifies rows uniquely.
All the attributes of a table taken together always form a
(possibly non-minimal) key, called a superkey.
A table can have several different candidate keys.
A primary key is always chosen from the set of
candidate keys.
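The key definitions can be tried out on the Employee (Ansat) example rows. This is an illustrative sketch with hypothetical helper names; it only checks one table instance, so it can show that an attribute set is not a key but never prove that it is one. Here (lastname, birthdate) happens to be unique in the sample data, which is exactly why keys are declared in the schema from the meaning of the data rather than read off the current rows.

```python
from itertools import combinations

# Rows from the Employee (Ansat) example table, with English column names
EMPLOYEES = [
    {"nr": 19,  "lastname": "Knudsen",   "birthdate": "240165"},
    {"nr": 34,  "lastname": "Michelsen", "birthdate": "081151"},
    {"nr": 2,   "lastname": "Andersen",  "birthdate": "230245"},
    {"nr": 123, "lastname": "Jensen",    "birthdate": "240165"},
    {"nr": 23,  "lastname": "Michelsen", "birthdate": "111253"},
    {"nr": 102, "lastname": "Pedersen",  "birthdate": "021170"},
]
ATTRS = ("nr", "lastname", "birthdate")


def is_superkey(rows, attrs) -> bool:
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def candidate_keys(rows, attributes):
    """Minimal superkeys: remove any attribute and uniqueness is lost."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            # A superset of a key already found cannot be minimal
            if any(set(k) <= set(combo) for k in keys):
                continue
            if is_superkey(rows, combo):
                keys.append(combo)
    return keys


if __name__ == "__main__":
    print(is_superkey(EMPLOYEES, ATTRS))     # True: all attributes together
    print(candidate_keys(EMPLOYEES, ATTRS))  # [('nr',), ('lastname', 'birthdate')]
```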
Relationships Between Tables
are represented by foreign keys
a foreign key is one or more attributes in a table
that correspond to the primary key of another table
a foreign key points to a row in another
table and indicates that the rest of the information is stored there
the foreign key and the primary key attributes of the
referenced table must have the same domain.
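As a sketch of following a foreign key, assuming SQLite via Python's sqlite3 and English column names (nr, fname, dept, dnr, dname) chosen here for the Employee/Department example, a join on the foreign key fetches the department row for an employee.

```python
import sqlite3


def build() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE department (dnr INTEGER PRIMARY KEY, dname TEXT)")
    # dept is a foreign key with the same domain (INTEGER) as department.dnr
    conn.execute("""CREATE TABLE employee (
        nr    INTEGER PRIMARY KEY,
        fname TEXT,
        dept  INTEGER REFERENCES department(dnr))""")
    conn.executemany("INSERT INTO department VALUES (?, ?)",
                     [(1, "New construction"), (2, "Renovation"), (3, "Demolition")])
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                     [(19, "Peter", 1), (123, "Marianne", 2), (102, "Hans Henrik", 3)])
    return conn


def department_of(conn: sqlite3.Connection, emp_nr: int) -> str:
    # Follow the foreign key to the row that holds "the rest of the information"
    row = conn.execute("""
        SELECT d.dname
        FROM employee AS e JOIN department AS d ON e.dept = d.dnr
        WHERE e.nr = ?""", (emp_nr,)).fetchone()
    return row[0]


if __name__ == "__main__":
    print(department_of(build(), 123))  # Renovation
```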
Integrity Rules
Integrity: being consistent
Domain rule: The value of an attribute A must be an
atomic value from dom(A)
Entity integrity: A primary key must not contain
NULL values
Referential integrity: A foreign key must either
be NULL or refer to a row with a
matching primary key value
Semantic integrity: Various rules that, unlike
the other kinds of integrity,
depend on the particular database.
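The rules can be tried in SQLite through Python's sqlite3 module (an assumed choice of DBMS; table and column names are hypothetical). NOT NULL plus PRIMARY KEY enforces entity integrity, REFERENCES enforces referential integrity, and a CHECK constraint expresses a simple semantic rule. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite skips FK checks unless enabled
    conn.execute("CREATE TABLE department (dnumber INTEGER PRIMARY KEY, dname TEXT)")
    conn.execute("""CREATE TABLE employee (
        ssn    TEXT NOT NULL PRIMARY KEY,              -- entity integrity
        salary INTEGER CHECK (salary > 0),             -- a semantic rule
        dno    INTEGER REFERENCES department(dnumber)  -- referential integrity
    )""")
    conn.execute("INSERT INTO department VALUES (1, 'Research')")
    return conn


def accepted(conn: sqlite3.Connection, ssn, salary, dno) -> bool:
    """True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", (ssn, salary, dno))
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    conn = make_db()
    print(accepted(conn, "111", 30000, 1))     # True
    print(accepted(conn, None, 30000, 1))      # False: NULL primary key
    print(accepted(conn, "222", 30000, 9))     # False: no department 9
    print(accepted(conn, "333", 30000, None))  # True: a NULL foreign key is allowed
    print(accepted(conn, "444", -5, 1))        # False: violates the CHECK rule
```

The explicit NOT NULL matters in SQLite: for historical reasons it allows NULL in a non-integer PRIMARY KEY column unless NOT NULL is declared.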
DBMS Support
The DBMS should support:
1. Domain integrity
2. Entity integrity
3. Referential integrity
4. Semantic integrity
Widely used relational DBMSs support 1
and 4 only to a limited extent.
Data Manipulation in the Relational
Model: the Relational Algebra
Works on whole tables, i.e., all operations take
tables as input and return new tables
Hence operations can be composed into expressions
(like ordinary arithmetic expressions)
That is what is
meant by an
algebra!
Operations:
row selection (RESTRICT/SELECT)
column selection (PROJECT)
combining tables (JOIN)
set operations (UNION, INTERSECTION, MINUS,
PRODUCT)
advanced operations (OUTER (LEFT/RIGHT) JOIN)
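The closure property (tables in, tables out) is what lets the operators nest. Below is a toy model in plain Python, assuming a relation is a set of tuples plus a list of attribute names; the Relation class and the function names are hypothetical, and real SQL tables differ in allowing duplicate rows.

```python
from typing import Callable, NamedTuple


class Relation(NamedTuple):
    attrs: tuple     # attribute names, in column order
    rows: frozenset  # a set of tuples: no duplicates, no order


def select(r: Relation, pred: Callable[[dict], bool]) -> Relation:
    """RESTRICT/SELECT: keep the rows that satisfy pred."""
    return Relation(r.attrs, frozenset(
        t for t in r.rows if pred(dict(zip(r.attrs, t)))))


def project(r: Relation, attrs: tuple) -> Relation:
    """PROJECT: keep only the listed columns; duplicates collapse."""
    idx = [r.attrs.index(a) for a in attrs]
    return Relation(tuple(attrs), frozenset(
        tuple(t[i] for i in idx) for t in r.rows))


def natural_join(r: Relation, s: Relation) -> Relation:
    """JOIN on every attribute the two relations have in common."""
    common = [a for a in r.attrs if a in s.attrs]
    extra = [a for a in s.attrs if a not in common]
    rows = set()
    for t in r.rows:
        tr = dict(zip(r.attrs, t))
        for u in s.rows:
            us = dict(zip(s.attrs, u))
            if all(tr[a] == us[a] for a in common):
                rows.add(t + tuple(us[a] for a in extra))
    return Relation(r.attrs + tuple(extra), frozenset(rows))


def union(r: Relation, s: Relation) -> Relation:
    """UNION: only defined for union-compatible relations."""
    if r.attrs != s.attrs:
        raise ValueError("relations are not union compatible")
    return Relation(r.attrs, r.rows | s.rows)


EMP = Relation(("name", "dept"),
               frozenset({("Peter", 1), ("Anne", 1), ("Hans Henrik", 3)}))
DEPT = Relation(("dept", "dname"),
                frozenset({(1, "New construction"), (3, "Demolition")}))

if __name__ == "__main__":
    # Every result is again a Relation, so operations nest like arithmetic
    result = project(select(natural_join(EMP, DEPT),
                            lambda t: t["dname"] == "New construction"),
                     ("name",))
    print(sorted(result.rows))  # [('Anne',), ('Peter',)]
```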
Relational Algebra - Overview
Table Design
Transformation from E/R-model to
Relational Model
Eight-Step Algorithm
Does not always yield an optimal design,
but provides a good starting point for the
final design of tables
Step 1: For each regular entity create a table
•For composite attributes only the components are included.
•Multivalued attributes are not included (they are handled in step 6).
•Choose a primary key.
Step 2: For each weak entity type, create a table
•All attributes of the weak entity type are included.
•The primary key of the owner is included as a foreign key.
•The primary key is composed of the owner’s primary key and the partial
key.
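Step 2 applied to a weak entity type, sketched in SQLite through Python's sqlite3 (an assumed DBMS). The DEPENDENT/EMPLOYEE names follow the COMPANY example, but the exact columns and data are assumptions. The composite primary key lets two owners each have a dependent with the same name while rejecting a duplicate for the same owner.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE employee (ssn TEXT PRIMARY KEY, fname TEXT)")
    # Step 2: the weak entity type DEPENDENT gets its own table.
    # essn = owner's primary key (foreign key); dependent_name = partial key.
    conn.execute("""CREATE TABLE dependent (
        essn           TEXT NOT NULL REFERENCES employee(ssn),
        dependent_name TEXT NOT NULL,
        relationship   TEXT,
        PRIMARY KEY (essn, dependent_name))""")
    conn.executemany("INSERT INTO employee VALUES (?, ?)",
                     [("123456789", "John"), ("333445555", "Franklin")])
    return conn


def add_dependent(conn: sqlite3.Connection, essn, name, relationship) -> bool:
    """True if accepted; False if the composite key or the FK is violated."""
    try:
        conn.execute("INSERT INTO dependent VALUES (?, ?, ?)",
                     (essn, name, relationship))
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    conn = make_db()
    print(add_dependent(conn, "123456789", "Alice", "Daughter"))  # True
    print(add_dependent(conn, "333445555", "Alice", "Daughter"))  # True: other owner
    print(add_dependent(conn, "123456789", "Alice", "Spouse"))    # False: same owner and partial key
    print(add_dependent(conn, "999999999", "Tom", "Son"))         # False: no such owner
```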
Step 3: For each (binary) 1:1 relationship type, include the primary key of
one participant as a foreign key in the other
•Any attribute of the relationship type is included with the key.
•If possible, place the foreign key on a side with total participation.
Step 4: For each (binary) 1:n relationship type, include the primary key of the 1-side as a foreign key on the n-side
•Any attribute of the relationship type is included with the key on the n-side.
Step 5: For each (binary) n:m relationship type, create a table with the
participating entity types’ primary keys as foreign keys
•Any attribute of the relationship type is included in the new table.
•The primary key is composed of the foreign keys.
•This may also be applied to binary 1:1 and 1:n relationship types, in particular if there are relatively few
instances of the relationship type.
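A sketch of Step 5 for the WORKS_ON relationship between EMPLOYEE and PROJECT, in SQLite via Python's sqlite3 (an assumed DBMS; columns and data are hypothetical). The hours attribute belongs to the relationship, so it lives in the new table, and the primary key is the pair of foreign keys.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE employee (ssn TEXT PRIMARY KEY, fname TEXT)")
    conn.execute("CREATE TABLE project (pnumber INTEGER PRIMARY KEY, pname TEXT)")
    # Step 5: the n:m relationship type WORKS_ON becomes its own table
    conn.execute("""CREATE TABLE works_on (
        essn  TEXT    NOT NULL REFERENCES employee(ssn),
        pno   INTEGER NOT NULL REFERENCES project(pnumber),
        hours REAL,                -- attribute of the relationship itself
        PRIMARY KEY (essn, pno))""")
    conn.executemany("INSERT INTO employee VALUES (?, ?)",
                     [("1", "John"), ("2", "Joyce")])
    conn.executemany("INSERT INTO project VALUES (?, ?)",
                     [(1, "ProductX"), (2, "ProductY")])
    conn.executemany("INSERT INTO works_on VALUES (?, ?, ?)",
                     [("1", 1, 32.5), ("1", 2, 7.5), ("2", 1, 20.0)])
    return conn


def projects_of(conn: sqlite3.Connection, ssn: str) -> list:
    return [r[0] for r in conn.execute("""
        SELECT p.pname
        FROM works_on AS w JOIN project AS p ON w.pno = p.pnumber
        WHERE w.essn = ?
        ORDER BY p.pname""", (ssn,))]


if __name__ == "__main__":
    conn = make_db()
    print(projects_of(conn, "1"), projects_of(conn, "2"))  # ['ProductX', 'ProductY'] ['ProductX']
```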
Step 6: For each multivalued attribute, create a table with the primary
key of the entity type as a foreign key, plus the multivalued attribute
•The primary key of the new table is composed of the foreign key and the multivalued
attribute.
Step 7: For each n-ary (n>2) relationship type, create a table with the
primary keys of all participating entity types as foreign keys
•Any attribute of the relationship type is included.
•The primary key is composed of the included foreign keys.
Step 8:
B. Pull-down (only for disjoint, total specialisation):
•Create a table for each subclass.
•Include (“pull down”) all attributes of the superclass in each table.
•Use the primary key of the superclass as the primary key of the new tables.
C. Pull-up-1 (only for disjoint specialisation):
•Create one table for the superclass.
•Include (“pull up”) all attributes of the subclasses.
•Add a type attribute.
D. Pull-up-2 (for overlapping specialisation):
•Create one table for the superclass.
•Include (“pull up”) all attributes of the subclasses.
•Add a type flag for each subclass.
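A sketch of option D (pull-up-2) in SQLite via Python's sqlite3, assuming a hypothetical PERSON superclass with overlapping EMPLOYEE and STUDENT subclasses. Each subclass gets a 0/1 flag, so one row may belong to both.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Pull-up-2: one table for the superclass, subclass attributes pulled up,
    # and one 0/1 flag per subclass so a row may belong to several subclasses.
    conn.execute("""CREATE TABLE person (
        id          INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        is_employee INTEGER NOT NULL DEFAULT 0 CHECK (is_employee IN (0, 1)),
        salary      INTEGER,   -- only meaningful when is_employee = 1
        is_student  INTEGER NOT NULL DEFAULT 0 CHECK (is_student IN (0, 1)),
        major       TEXT       -- only meaningful when is_student = 1
    )""")
    conn.executemany(
        "INSERT INTO person (name, is_employee, salary, is_student, major) "
        "VALUES (?, ?, ?, ?, ?)",
        [("Anne", 1, 19000, 0, None),
         ("Mads", 0, None, 1, "Engineering"),
         ("Svend", 1, 20000, 1, "Architecture")])  # overlapping: both flags set
    return conn


def members(conn: sqlite3.Connection, flag_column: str) -> list:
    # flag_column is one of the fixed column names above, never user input
    return [r[0] for r in conn.execute(
        f"SELECT name FROM person WHERE {flag_column} = 1 ORDER BY name")]


if __name__ == "__main__":
    conn = make_db()
    print(members(conn, "is_employee"), members(conn, "is_student"))
```

With pull-up-1 the two flags would instead be a single type attribute, which can only name one subclass per row and so matches the disjointness requirement.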
Normalisation
Normal forms are the formal way to state design
guidelines.
Normalisation is the process.
Six normal forms (NF) are defined:
1st, 2nd, 3rd, and Boyce-Codd (BCNF),
4th and 5th NF.
BCNF is of the most practical interest.
Guideline for Normalisation
All attributes are to depend on the
key,
the whole key,
and nothing but the key.
So help me Codd.
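A functional dependency X → Y holds when rows that agree on X also agree on Y. The sketch below (plain Python, hypothetical EMP_DEPT data in the style of the COMPANY example) checks that on one instance; as with keys, a sample can refute a dependency but not prove it. Here dname depends on dnumber, a non-key attribute, so the design breaks "nothing but the key"; splitting it into EMPLOYEE(ssn, ename, dnumber) and DEPARTMENT(dnumber, dname) removes the repeated department name.

```python
EMP_DEPT = [
    {"ssn": "123456789", "ename": "John",     "dnumber": 5, "dname": "Research"},
    {"ssn": "333445555", "ename": "Franklin", "dnumber": 5, "dname": "Research"},
    {"ssn": "999887777", "ename": "Alicia",   "dnumber": 4, "dname": "Administration"},
]


def fd_holds(rows, lhs, rhs) -> bool:
    """Check the functional dependency lhs -> rhs on this set of rows."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Two rows that agree on lhs must also agree on rhs
        if seen.setdefault(x, y) != y:
            return False
    return True


if __name__ == "__main__":
    print(fd_holds(EMP_DEPT, ["ssn"], ["ename", "dnumber", "dname"]))  # True
    print(fd_holds(EMP_DEPT, ["dnumber"], ["dname"]))  # True: dname depends on a non-key
    print(fd_holds(EMP_DEPT, ["dname"], ["ssn"]))      # False
```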
SQL
DDL
create: definition of table, view
alter: add fields, change fields,
add constraint
drop
grant / revoke
DML
insert
update
delete
select
/* automatic auto-increment on the primary key (SQL Server syntax) */
create table test
(id   int IDENTITY(1,1) primary key,
 navn varchar(20));   -- navn = name
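For comparison, a sketch of the same idea in SQLite via Python's sqlite3 (an assumption; IDENTITY(1,1) is SQL Server syntax). In SQLite a column declared INTEGER PRIMARY KEY is an alias for the row id and is filled in automatically when an INSERT leaves it out.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite: an INTEGER PRIMARY KEY column gets a value automatically when
    # the INSERT leaves it out (SQL Server uses IDENTITY(1,1) for this)
    conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, navn TEXT)")
    return conn


def add(conn: sqlite3.Connection, navn: str) -> int:
    cur = conn.execute("INSERT INTO test (navn) VALUES (?)", (navn,))
    return cur.lastrowid  # the id the DBMS generated


if __name__ == "__main__":
    conn = make_db()
    print(add(conn, "Peter"), add(conn, "Anne"))  # 1 2
```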
Comparing NULLs to Values
The logic of conditions in SQL is really 3-valued
logic: TRUE, FALSE, UNKNOWN.
When any value is compared with NULL, the truth
value is UNKNOWN.
But a query only produces a tuple in the answer if
its truth value for the WHERE clause is TRUE
(not FALSE or UNKNOWN).
Use IS NULL or IS NOT NULL
instead of = or !=,
since SQL considers each NULL value as
being distinct from every other NULL value:
select fname, lname
from employee
where superssn is null
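A sketch of three-valued logic in SQLite through Python's sqlite3 (an assumed DBMS, hypothetical rows). Comparing with = NULL never yields TRUE, and a row whose superssn is NULL is also dropped by <>, because that comparison is UNKNOWN too.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (fname TEXT, superssn TEXT)")
    conn.executemany("INSERT INTO employee VALUES (?, ?)",
                     [("James", None), ("Franklin", "888665555"),
                      ("John", "333445555")])
    return conn


def names(conn: sqlite3.Connection, condition: str) -> list:
    # condition is a fixed demo string here, never user input
    sql = f"SELECT fname FROM employee WHERE {condition} ORDER BY fname"
    return [row[0] for row in conn.execute(sql)]


if __name__ == "__main__":
    conn = make_db()
    print(names(conn, "superssn = NULL"))          # []: UNKNOWN, never TRUE
    print(names(conn, "superssn IS NULL"))         # ['James']
    print(names(conn, "superssn <> '333445555'"))  # ['Franklin']: James is UNKNOWN
```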
AGGREGATE FUNCTIONS
Include COUNT, SUM, MAX, MIN, and AVG
Query 15: Find the maximum salary, the minimum salary,
and the average salary among all employees.
Q15:
SELECT MAX(SALARY), MIN(SALARY), AVG(SALARY)
FROM EMPLOYEE
Some SQL implementations may not allow more than one
function in the SELECT-clause
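A sketch of Q15 in SQLite via Python's sqlite3 (an assumed DBMS, hypothetical salaries). SQLite is one implementation that does accept several aggregate functions in the same SELECT clause.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (fname TEXT, salary INTEGER)")
    conn.executemany("INSERT INTO employee VALUES (?, ?)",
                     [("John", 30000), ("Franklin", 40000),
                      ("Alicia", 25000), ("Jennifer", 43000)])
    return conn


def salary_stats(conn: sqlite3.Connection) -> tuple:
    # Q15: several aggregate functions in one SELECT clause
    return conn.execute(
        "SELECT MAX(salary), MIN(salary), AVG(salary) FROM employee").fetchone()


if __name__ == "__main__":
    print(salary_stats(make_db()))  # (43000, 25000, 34500.0)
```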
GROUPING (cont.)
Query 20: For each department, retrieve the department number, the
number of employees in the department, and their average salary.
Q20: SELECT DNO, COUNT(*), AVG(SALARY)
FROM EMPLOYEE
GROUP BY DNO
In Q20, the EMPLOYEE tuples are divided into groups--each
group having the same value for the grouping attribute DNO
The COUNT and AVG functions are applied to each such group
of tuples separately
The SELECT-clause includes only the grouping attribute and the
functions to be applied on each group of tuples
A join condition can be used in conjunction with grouping
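Q20 run against hypothetical data in SQLite through Python's sqlite3 (an assumed setup). ORDER BY is added only to make the output order predictable.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (fname TEXT, dno INTEGER, salary INTEGER)")
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                     [("John", 5, 30000), ("Franklin", 5, 40000),
                      ("Alicia", 4, 25000), ("Jennifer", 4, 43000),
                      ("James", 1, 55000)])
    return conn


def dept_summary(conn: sqlite3.Connection) -> list:
    # Q20: one result row per DNO group; COUNT and AVG are computed per group
    return conn.execute("""
        SELECT dno, COUNT(*), AVG(salary)
        FROM employee
        GROUP BY dno
        ORDER BY dno""").fetchall()


if __name__ == "__main__":
    print(dept_summary(make_db()))  # [(1, 1, 55000.0), (4, 2, 34000.0), (5, 2, 35000.0)]
```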
THE HAVING-CLAUSE
Sometimes we want to retrieve the values of
these functions for only those groups that
satisfy certain conditions
The HAVING-clause is used for specifying a
selection condition on groups (rather than on
individual tuples)
THE HAVING-CLAUSE (cont.)
Query 22: For each project on which more than two
employees work, retrieve the project number,
project name, and the number of employees who
work on that project.
Q22: SELECT PNUMBER, PNAME, COUNT(*)
FROM PROJECT, WORKS_ON
WHERE PNUMBER=PNO
GROUP BY PNUMBER, PNAME
HAVING COUNT(*) > 2
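Q22 against hypothetical PROJECT and WORKS_ON rows in SQLite via Python's sqlite3 (an assumed setup; ORDER BY added for a stable order). The WHERE condition is applied to individual joined rows before grouping, while HAVING discards whole groups afterwards.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE project (pnumber INTEGER PRIMARY KEY, pname TEXT)")
    conn.execute("CREATE TABLE works_on (essn TEXT, pno INTEGER, hours REAL)")
    conn.executemany("INSERT INTO project VALUES (?, ?)",
                     [(1, "ProductX"), (2, "ProductY"), (10, "Computerization")])
    conn.executemany("INSERT INTO works_on VALUES (?, ?, ?)",
                     [("a", 1, 10), ("b", 1, 20), ("c", 1, 5),
                      ("a", 2, 10), ("b", 2, 10),
                      ("a", 10, 5), ("b", 10, 10), ("c", 10, 15), ("d", 10, 10)])
    return conn


def busy_projects(conn: sqlite3.Connection) -> list:
    # Q22: WHERE filters joined rows, HAVING filters the resulting groups
    return conn.execute("""
        SELECT pnumber, pname, COUNT(*)
        FROM project, works_on
        WHERE pnumber = pno
        GROUP BY pnumber, pname
        HAVING COUNT(*) > 2
        ORDER BY pnumber""").fetchall()


if __name__ == "__main__":
    print(busy_projects(make_db()))  # [(1, 'ProductX', 3), (10, 'Computerization', 4)]
```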
Inner join
select pnumber, dnum, fname, lname, address
from ((project join department on dnum = dnumber)
      join employee on mgrssn = ssn)
where plocation = 'Stafford'
Outer join
select fname, lname, dname as lederAf   -- lederAf = "manager of"
from (employee left join department on
      ssn = mgrssn)
Result:
fname     lname    lederAf
John      Smith    NULL
Franklin  Wong     Research
Joyce     English  NULL
Ramesh    Narayan  NULL
James     Borg     Headquarters
Jennifer  Wallace  Administration
Ahmad     Jabbar   NULL
Alicia    Zelaya   NULL
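A small sketch contrasting the inner and the left outer join in SQLite via Python's sqlite3 (an assumed DBMS, hypothetical rows). With the left join, employees who manage no department are kept with NULL (Python None) in lederAf; the inner join drops them.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (fname TEXT, ssn TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE department (dname TEXT, mgrssn TEXT)")
    conn.executemany("INSERT INTO employee VALUES (?, ?)",
                     [("John", "1"), ("Franklin", "2"), ("James", "3")])
    conn.executemany("INSERT INTO department VALUES (?, ?)",
                     [("Research", "2"), ("Headquarters", "3")])
    return conn


def managers(conn: sqlite3.Connection, join_kind: str) -> list:
    # join_kind is "JOIN" or "LEFT JOIN" (fixed strings, never user input)
    return conn.execute(f"""
        SELECT fname, dname AS lederAf
        FROM employee {join_kind} department ON ssn = mgrssn
        ORDER BY fname""").fetchall()


if __name__ == "__main__":
    conn = make_db()
    print(managers(conn, "LEFT JOIN"))  # John is kept with None
    print(managers(conn, "JOIN"))       # John is dropped
```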
Select
select <attribute and function list>
from <table list>
[where <condition>]
[group by <grouping attribute list>]
[having <group condition>]
[order by <attribute list>]
SQL Views: An Example
CREATE view ViewWorksOn(fname, lname, pname, hours)
AS
(SELECT FNAME, LNAME, PNAME, HOURS
FROM EMPLOYEE, PROJECT, WORKS_ON
WHERE SSN=ESSN AND PNO=PNUMBER
GROUP BY fname, lname, PNAME,hours)
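A sketch of the view in SQLite via Python's sqlite3 (an assumed setup with hypothetical rows). The GROUP BY from the slide is left out for simplicity; without aggregates it mainly acts like DISTINCT. Because a view stores only its defining query, rows added to WORKS_ON later show up when the view is queried again.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE employee (ssn TEXT PRIMARY KEY, fname TEXT, lname TEXT);
        CREATE TABLE project  (pnumber INTEGER PRIMARY KEY, pname TEXT);
        CREATE TABLE works_on (essn TEXT, pno INTEGER, hours REAL);
        INSERT INTO employee VALUES ('1', 'John', 'Smith'), ('2', 'Joyce', 'English');
        INSERT INTO project  VALUES (1, 'ProductX'), (2, 'ProductY');
        INSERT INTO works_on VALUES ('1', 1, 32.5), ('2', 2, 20.0);
        -- The view stores only its defining query, not a copy of the data
        CREATE VIEW ViewWorksOn (fname, lname, pname, hours) AS
            SELECT fname, lname, pname, hours
            FROM employee, project, works_on
            WHERE ssn = essn AND pno = pnumber;
    """)
    return conn


def on_project(conn: sqlite3.Connection, pname: str) -> list:
    return [r[0] for r in conn.execute(
        "SELECT fname FROM ViewWorksOn WHERE pname = ? ORDER BY fname", (pname,))]


if __name__ == "__main__":
    conn = make_db()
    print(on_project(conn, "ProductX"))  # ['John']
    conn.execute("INSERT INTO works_on VALUES ('2', 1, 10.0)")
    print(on_project(conn, "ProductX"))  # ['John', 'Joyce']
```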