Sql

advertisement

Database Management Systems

1

Overview

● we studied:

● how memory is managed within the virtual memory space of a process how messages can be exchanged between processes, using inter-process communication

2

Next

Storage remains unchanged beyond the end of the process execution.

RTE provides services to processes to support persistent data storage.

3

Persistent storage

● several properties:

It is persistent

It can be shared among several processes

We discuss in this lecture two types of persistent storage services:

1.

2.

the file system database services.

4

File System (FS) Services (interface)

Map names to storage locations:

● storage location=storage on disk.

FS maps names ("/usr/local/file.ext") to concrete locations on disk. name mapping interface = sys. calls: open(name), create(name), delete(name) query names space getfiles(name-pattern).

FS organizes space of file names in hierarchical structure of embedded directories.

5

File System (FS) Services (interface)

Access:

● stream interface to read and write a file as a stream of bytes. seek within the file, read its content and write new content. allocation of space / automatic increase storage space

6

File System (FS) Services (interface)

● locking files:

● allow multiple processes to synchronize access on shared files one can lock a whole file or just a segment of bytes within the content of a file.

7

FS

FS = abstract interface of system calls

(open, read, write, seek etc)

● implemented by specific drivers which know how to interact with specific hardware devices.

8

Distributed FS

● process can request access to files stored on a remote machine:

● implementation of FS in RTE passes the system calls (open, read, etc) to FS-server on server distributed file systems (for example, NFS and

SAMBA) have complex locking mechanisms.

9

Database Management System (DBMS)

● way of storing persistent data other than FS

DBMS is present as a service processes connect through inter-process communication protocol (generally over TCP).

DBMS manages persistent storage, often by accessing FS on its side

10

File systems vs. DBMS

Data Model

● in FS is very simple: a stream of bytes with the open/close, read/write and lock operation.

DBMS defines a rich data model, manages serialization programmer must define the specific data model for the data he is interested in storing.

11

File systems vs. DBMS

Data Independence

DBMSs provide an abstract interface for data storage/access. programmer does not need to define in which file, at which offset data is to be stored. one can query the DBMS by content (e.g.,

"retrieve all content that satisfies certain criteria").

12

File systems vs. DBMS

Efficient concurrency

DBMSs are built to support thousands of concurrent users. ensure data is kept consistent provide efficient management of such high concurrency.

13

File systems vs. DBMS

Reduced application development time

DBMSs allow data manipulation using a simple

API.

DBMS manages storage, query optimization, concurrency and integrity management

14

DBMS Services

● management:

● verify that users are authorized to access data.

Specific rights can be granted for each part of the data and each group of users.

15

DBMS Services

Session and Transaction management:

● transaction = complex data operation - read and modify many different objects viewed from the outside as single atomic operation- completely succeeds or not performed

16

DBMS Services

Query optimization and execution:

● clients interact with DBMS by submitting complex queries, in a language

DBMS execute queries in the most efficient manner possible.

techniques in the world of relational database

servers.

17

Back-end services

● not directly visible to the user but performed by the DBMS as part of its implementation

File and Access methods

Buffer management

Disk space management

18

Transaction

● unit of work performed against a database management system treated in a coherent and reliable way independent of other transactions.

atomic, consistent, isolated and durable

(ACID acronym). composed of independent units of work, each reading and/or writing information to DB

19

Transaction

● provide "all-or-nothing" - work units must complete or take no effect whatsoever.

must be isolated from other transactions results must conform to existing constraints in database if completed successfully must be committed to durable storage.

20

Transaction pattern

Begin transaction

Execute data manipulations and queries

If no errors occur then commit transaction

If errors occur rollback the transaction

21

Transactions safety

Identify resources that will be accessed during the transaction.

SQL: full tables or parts of tables (rows).

Acquire locks on all resources before transaction starts.

22

Transactions safety

Perform transaction.

● modifications are performed on data snapshot in memory.

isolation: data in transaction is not affected by other threads. partial modifications not committed are not visible

23

Transactions safety

● transaction committed: modifications are merged to persistent storage if transaction is rollbacked: modifications are deleted and persistent storage is not modified.

24

Data Models

● data model = which data values can be sent for storage

OO method:

● primitive data types (int, bool, char etc) complex data types

DBMS - relational model

25

Case Study: A Data Model for Colleges

● data model of an academic college. different departments, each specializing in teaching a certain domain.

Computer Science / Physics department department has a department head and a geographical location. students may be enrolled in one department and registered to courses given by it.

26

Relational Model

"Relation" = "table", - "relational" = "based on tables“… logical representation of information. logical schema - declared constraints in the database design - consistency database normalization design with desirable properties from a set of logically equivalent alternatives.

27

building blocks

Relation - set of records of same fields. header of a table

Our example: students, departments, etc.

Attribute - field within a relation (data type, name)

Our example: student name, department head,

28

building blocks

Domain - restriction of types

● our example: student ID is 9 digits 0-9 each.

Records - tuple, single line within a relation.

Integrity constraints - constraint on attribute with regard to all records

● our example: Student ID is unique.

29

Relational Algebra

Selection - select specific rows

Projection - projects specific columns

Set operations - union, intersection, crossproduct, set-difference

● on relations that have the same structure (same attributes)

Join - cross product followed by selection and projection.

Renaming operation - change name of columns.

30

● take existing relations as arguments and return new relations as a value.

columns of tables are restricted to being simple values

● columns must be primitive data types no embedded tables or array. complex data - primary and foreign keys

31

SQL (Structured Query Language)

● query language - interface with a relational

DBMS. sub-languages:

DDL: Data Definition Language define schemas, relations and domains.

DML: Data Manipulation Language - queries, insertions, updates and deletions

32

● client interacts with SQL server by sending

SQL queries (TCP protocol)

● receiving as answers a result-set. communication protocol between an SQL server and its clients is stateful and session oriented.

33

Keys

● subset of the attributes of the relation which uniquely identify rows.

Example: relation for students with attributes name, birthday, address and Social Security

Number (SSN).

Name is not a key (two students could have the same name). SSN is a key (this is why it was invented:). several attributes in a key: courses (course name, department, semester, lecturer) course name is not a key / but the pair (course name, semester) could

34

Database Normalization and Foreign Keys

● minimize duplication of information (also known as "Don't Repeat Yourself" - the DRY principle

Duplication = data appear twice in the domain of the problem modeled.

Example: relation student (ID, Name, Course) a student is enrolled to 3 courses = 3 records duplicating the ID and Name

35

Solution: student data will be stored only in a single relation. refer to students- keys to remove data repetition: use a single attribute SSN to refer to a student.

36

Join

● retrieve the information on the student given a tuple from the courses relation, we need to operate a join operation on the SSN attribute (which is shared by the 2 relations)

The LEFT JOIN keyword returns all rows from the left table (table_name1), even if there are no matches in the right table (table_name2).

37

38

Foreign key

● in student relation, SSN is a key. in course relation, StudentSSN is not a key

● several courses taken by a single student

StudentSSN refers to the key of another table.

= foreign key in the course relation - refers to the student relation.

39

Data Integrity Constraints

● cannot create a tuple in the course relation that refers to a student that does not exist.

● new row in course relation: check foreign key exists in student relation.

update SSN field of course relation: check new

StudentSSN exists in student relation.

delete row in student relation: verify no rows in course that refer to it update row in student relation: verify no rows in course that refer to old value

40

When defining a data model on a server, user declares which constraints are part of the model:

– keys and foreign keys.

DBMS server enforces constraints and verifies insert, update, delete operations –no conflict with constraints

41

NULL

● belongs to all domains indicates lack of knowledge - we do not know what its value is

● example (1 = NULL) is false and (1 != NULL) is also false! (NULL = NULL) is also false. compare NULL with value, result is always false.

42

Indexing

● records in relation in an query efficient data structure with respect to certain attributes.

B-Tree / hash-tables

● query efficient

● cost time when inserting or deleting data.

Implicit indexing - employed by DBMS for attributes defined as unique (keys).

Explicit indexing of an attribute is possible

43

Designing a Data Model

A normalized data model provides efficient access to data with respect to the most typical scenarios

44

Design Steps

45

Step A – Identify simple correlation among attributes

A Student has a name and ID. ignore Courses and Department - not simple hold correlation to other attributes as well:

● department has a name, location and head.

Course has a name.

46

Step B – Identify unique attributes

● at least one attribute in a relation that is unique

Primary Key - a unique attribute that must have a value (NULL is not allowed).

student relation: ID = primary key. courses relation = artificial one: a course number. department ID.

47

● can define several attributes as a unique key any relation that should be later queried efficiently must have a primary key.

in rare cases, no need to define a primary key.

● example: logs stored in DB, only insertions are performed occasionally based on a time range.

– no requirement to uniquely identify a single row of a log - so there is no need to define a key on such a table.

48

Step C – Associate simple relations

● associations between tables. where one table refers to another: adding in a table an attribute that refers to rows in another relation.

● example, a student is registered to a department

(assume only one). add the primary key of the department relation as an attribute in the students relation foreign key - many-to-one relationship

49

● foreign key identifies a column (set) in

(referencing) table that refers to a column

(set) in another (referenced) table. values in one row of the referencing columns must occur in a single row in referenced table.

A row in the referencing table cannot contain values that don't exist in the referenced table (except NULL - which means that we do not know what the value of the column is).

50

step D – Represent Collections

● describe an attribute as a collection.

● our example: courses to which a student is registered is a collection of values. model association - create table that correlates courses and students with foreign keys.

define student2course with attributes: courseId as foreign key to courses.ID and studentId as a foreign key to Students.ID. model many-to-many associations - called "cross tables".

51

The College model

Tables: departments, students, courses, student2course departments: ID integer as a PRIMARY KEY.

Name short string that must be UNIQUE,

Building integer students: ID integer as a PRIMARY KEY,

Name short string, DepID integer as a

FOREIGN KEY to departments.ID

52

● courses: ID integer as a PRIMARY KEY,

Name short string that must not be NULL student2course: SID integer as a FOREIGN

KEY to students.ID , CID integer as a

FOREIGN KEY to courses.ID and Grade integer.

53

We add an explicit index on Students.Name

and on Courses.Name as we know many queries will be performed on those attributes

54

Download