● We studied:
– how memory is managed within the virtual memory space of a process
– how messages can be exchanged between processes, using inter-process communication
● Persistent storage remains unchanged beyond the end of the process execution.
● The RTE provides services to processes to support persistent data storage.
● Such storage has several properties:
– It is persistent
– It can be shared among several processes
● We discuss in this lecture two types of persistent storage services:
1. the file system
2. database services
● Map names to storage locations:
– storage location = storage on disk.
– The FS maps names ("/usr/local/file.ext") to concrete locations on disk.
– Name-mapping interface = system calls: open(name), create(name), delete(name).
– Query the name space: getfiles(name-pattern).
● The FS organizes the space of file names in a hierarchical structure of nested directories.
● Access:
– Stream interface to read and write a file as a stream of bytes.
– Seek within the file, read its content and write new content.
– Allocation of space: storage space is increased automatically as the file grows.
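A minimal sketch of the stream interface, assuming Python's built-in file API as a stand-in for the open/read/write/seek system calls. The temporary file and the `stream_demo` helper are illustrative only.

```python
import os
import tempfile


def stream_demo() -> bytes:
    """Write, seek, overwrite and read back a file as a stream of bytes."""
    fd, path = tempfile.mkstemp()      # create(name): the FS allocates a new, empty file
    os.close(fd)
    try:
        with open(path, "r+b") as f:   # open(name) for reading and writing
            f.write(b"hello world")    # the file grows automatically as we write
            f.seek(6)                  # move to byte offset 6
            f.write(b"WORLD")          # overwrite 5 bytes in place
            f.seek(0)                  # back to the start of the stream
            return f.read()
    finally:
        os.remove(path)                # delete(name)


content = stream_demo()
print(content)  # b'hello WORLD'
```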
● Locking files:
– allows multiple processes to synchronize access to shared files
– one can lock a whole file or just a segment of bytes within the content of a file.
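A hedged sketch of locking a segment of bytes, assuming a POSIX system where Python's `fcntl.lockf` is available (it is not on Windows). These locks are advisory: they only coordinate processes that also ask for the lock. The `lock_segment_demo` helper is hypothetical.

```python
import fcntl
import os
import tempfile


def lock_segment_demo() -> bytes:
    """Lock bytes 2..5 of a file, modify them, then release the lock."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"0123456789")
        # Exclusive lock on a segment only: length 4, starting at offset 2.
        fcntl.lockf(fd, fcntl.LOCK_EX, 4, 2, os.SEEK_SET)
        try:
            os.pwrite(fd, b"ABCD", 2)  # safe: cooperating processes wait for the lock
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN, 4, 2, os.SEEK_SET)
        return os.pread(fd, 10, 0)
    finally:
        os.close(fd)
        os.remove(path)


locked_result = lock_segment_demo()
```

Passing a length of 0 (the default) would lock from the start offset to the end of the file, which is how a whole-file lock is expressed with this call.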
● FS = abstract interface of system calls (open, read, write, seek, etc.)
● implemented by specific drivers which know how to interact with specific hardware devices.
● A process can request access to files stored on a remote machine:
– the implementation of the FS in the RTE passes the system calls (open, read, etc.) to an FS server on the remote machine
– distributed file systems (for example, NFS and Samba) have complex locking mechanisms.
● A DBMS is a way of storing persistent data other than the FS.
● The DBMS is present as a service.
● Processes connect to it through an inter-process communication protocol (generally over TCP).
● The DBMS manages persistent storage, often by accessing the FS on its side.
● Data Model
– The data model in the FS is very simple: a stream of bytes with open/close, read/write and lock operations.
– A DBMS defines a rich data model and manages serialization.
– The programmer must define the specific data model for the data to be stored.
● Data Independence
– DBMSs provide an abstract interface for data storage/access.
– The programmer does not need to define in which file, at which offset, data is to be stored.
– One can query the DBMS by content (e.g., "retrieve all content that satisfies certain criteria").
● Efficient concurrency
– DBMSs are built to support thousands of concurrent users.
– They ensure data is kept consistent.
– They provide efficient management of such high concurrency.
● Reduced application development time
– DBMSs allow data manipulation using a simple API.
– The DBMS manages storage, query optimization, concurrency and integrity management.
● Security management:
– verify that users are authorized to access data.
– Specific rights can be granted for each part of the data and each group of users.
● Session and Transaction management:
– transaction = complex data operation that reads and modifies many different objects
– viewed from the outside as a single atomic operation: it either completely succeeds or is not performed at all
● Query optimization and execution:
– clients interact with the DBMS by submitting complex queries, in a query language
– the DBMS executes queries in the most efficient manner possible.
– techniques in the world of relational database servers.
● Not directly visible to the user, but performed by the DBMS as part of its implementation:
– File and access methods
– Buffer management
– Disk space management
● Transaction: a unit of work performed against a database management system.
● It is treated in a coherent and reliable way, independent of other transactions.
● Transactions are atomic, consistent, isolated and durable (the ACID acronym).
● A transaction is composed of independent units of work, each reading and/or writing information to the DB.
● Transactions provide "all-or-nothing" semantics: work units must either complete or take no effect whatsoever.
● A transaction must be isolated from other transactions.
● Its results must conform to the existing constraints in the database.
● If it completes successfully, it must be committed to durable storage.
● Begin transaction
● Execute data manipulations and queries
● If no errors occur, commit the transaction
● If errors occur, roll back the transaction
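The four steps above can be sketched with Python's standard `sqlite3` module. This is an illustration under assumptions: the `accounts` table, the `CHECK` constraint and the `transfer` helper are hypothetical, and SQLite stands in for a DBMS server.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts as one atomic transaction."""
    try:
        with conn:  # begin; commit on success, roll back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # the credit to `dst` was rolled back together with the failed debit


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, "
             "balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 0)])
conn.commit()

first = transfer(conn, 1, 2, 60)    # succeeds and is committed
second = transfer(conn, 1, 2, 60)   # debit would go negative: whole transaction is undone
balances = conn.execute("SELECT balance FROM accounts ORDER BY id").fetchall()
```

In the second call the credit is applied before the debit fails, so the unchanged final balances show that the rollback undid the partial work.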
● Identify the resources that will be accessed during the transaction.
– SQL: full tables or parts of tables (rows).
● Acquire locks on all resources before the transaction starts.
● Perform transaction.
– modifications are performed on a data snapshot in memory.
– isolation: data in the transaction is not affected by other threads.
– partial modifications that are not yet committed are not visible outside the transaction.
● If the transaction is committed: modifications are merged into persistent storage.
● If the transaction is rolled back: modifications are discarded and persistent storage is not modified.
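A small sketch of the visibility rules, assuming SQLite with two connections to the same database file. SQLite's internal mechanism differs from the in-memory snapshot described above; the example only shows what a second client can observe. The `log` table and the `visible_rows` helper are hypothetical.

```python
import os
import shutil
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "demo.db")
writer = sqlite3.connect(path)
reader = sqlite3.connect(path)
writer.execute("CREATE TABLE log (msg TEXT)")
writer.commit()


def visible_rows(conn):
    """Number of rows this connection can currently see."""
    return conn.execute("SELECT COUNT(*) FROM log").fetchall()[0][0]


writer.execute("INSERT INTO log VALUES ('pending')")  # implicitly begins a transaction
seen_before_commit = visible_rows(reader)             # uncommitted change is invisible
writer.commit()
seen_after_commit = visible_rows(reader)              # merged into persistent storage

writer.execute("INSERT INTO log VALUES ('discarded')")
writer.rollback()
seen_after_rollback = visible_rows(reader)            # storage was not modified

writer.close()
reader.close()
shutil.rmtree(workdir)
```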
● data model = which data values can be sent for storage
● OO method:
– primitive data types (int, bool, char, etc.)
– complex data types
● DBMS - relational model
● Example: the data model of an academic college.
● The college has different departments, each specializing in teaching a certain domain.
– e.g., the Computer Science or Physics department
● A department has a department head and a geographical location.
● Students may be enrolled in one department and registered to courses given by it.
● "Relation" = "table"; "relational" = "based on tables".
● Logical representation of information.
● Logical schema.
● Constraints declared in the database design ensure consistency.
● Database normalization: choose a design with desirable properties from a set of logically equivalent alternatives.
● Relation - a set of records with the same fields, described by the header of a table.
– Our example: students, departments, etc.
● Attribute - a field within a relation (data type, name)
– Our example: student name, department head, etc.
● Domain - a restriction on types
– our example: a student ID is 9 digits, each 0-9.
● Record - a tuple, a single line within a relation.
● Integrity constraint - a constraint on an attribute with regard to all records
– our example: the student ID is unique.
● Selection - select specific rows
● Projection - project specific columns
● Set operations - union, intersection, cross product, set difference
– union, intersection and set difference apply to relations that have the same structure (same attributes)
● Join - cross product followed by selection and projection.
● Renaming operation - change the names of columns.
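The operations above map onto SQL roughly as follows. This is a sketch using Python's `sqlite3`; the tables, rows and the `q` helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (id INTEGER, name TEXT, dep_id INTEGER);
CREATE TABLE departments (id INTEGER, name TEXT);
INSERT INTO students VALUES (1, 'Dana', 10), (2, 'Omer', 20), (3, 'Noa', 10);
INSERT INTO departments VALUES (10, 'Computer Science'), (20, 'Physics');
""")


def q(sql):
    """Run a query and return all rows."""
    return conn.execute(sql).fetchall()


# Selection: keep only some rows.
selection = q("SELECT * FROM students WHERE dep_id = 10 ORDER BY id")
# Projection: keep only some columns.
projection = q("SELECT name FROM students ORDER BY name")
# Set difference between two relations with the same structure.
difference = q("SELECT id FROM students EXCEPT "
               "SELECT id FROM students WHERE dep_id = 10")
# Cross product: every student paired with every department.
cross = q("SELECT * FROM students, departments")
# Join: cross product, then selection on matching keys, then projection.
join = q("SELECT s.name, d.name FROM students s, departments d "
         "WHERE s.dep_id = d.id ORDER BY s.id")
# Renaming: the result column is called student_name instead of name.
renamed = conn.execute("SELECT name AS student_name FROM students").description[0][0]
```

Each query returns a new relation, which is why the operations can be nested and combined freely.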
● Operations take existing relations as arguments and return new relations as values.
● Columns of tables are restricted to being simple values:
– columns must be primitive data types
– no embedded tables or arrays
– complex data is modeled with primary and foreign keys
● SQL: query language, the interface with a relational DBMS. Sub-languages:
– DDL: Data Definition Language - defines schemas, relations and domains.
– DML: Data Manipulation Language - queries, insertions, updates and deletions
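A short sketch of the two sub-languages, run through Python's `sqlite3`. The `courses` table and its rows are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define a relation and the domains of its attributes.
conn.execute("CREATE TABLE courses (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# DML: insertions, an update, a deletion and a query.
conn.execute("INSERT INTO courses (id, name) VALUES (1, 'Databases')")
conn.execute("INSERT INTO courses (id, name) VALUES (2, 'Algebra')")
conn.execute("UPDATE courses SET name = 'Linear Algebra' WHERE id = 2")
conn.execute("DELETE FROM courses WHERE id = 1")
remaining = conn.execute("SELECT id, name FROM courses").fetchall()
```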
● A client interacts with an SQL server by sending SQL queries (over a TCP-based protocol)
– and receiving result sets as answers.
– The communication protocol between an SQL server and its clients is stateful and session-oriented.
● Key: a subset of the attributes of a relation which uniquely identifies rows.
● Example: a relation for students with attributes name, birthday, address and Social Security Number (SSN).
– Name is not a key (two students could have the same name).
– SSN is a key (this is why it was invented).
● A key may consist of several attributes: in courses (course name, department, semester, lecturer), course name alone is not a key, but the pair (course name, semester) could be.
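A sketch of a key made of several attributes, assuming the pair (course name, semester) is chosen as the key of the courses relation. Column names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE courses (
    course_name TEXT,
    department  TEXT,
    semester    TEXT,
    lecturer    TEXT,
    PRIMARY KEY (course_name, semester)   -- the pair is the key
)""")

conn.execute("INSERT INTO courses VALUES ('Databases', 'CS', '2024A', 'Levi')")
# Same course name in another semester: allowed, course_name alone is not a key.
conn.execute("INSERT INTO courses VALUES ('Databases', 'CS', '2024B', 'Cohen')")

try:
    # Same (course_name, semester) pair again: rejected by the DBMS.
    conn.execute("INSERT INTO courses VALUES ('Databases', 'CS', '2024A', 'Mizrahi')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM courses").fetchall()[0][0]
```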
Database Normalization and Foreign Keys
● Goal: minimize duplication of information (also known as "Don't Repeat Yourself", the DRY principle).
● Duplication = the same data appears twice in the domain of the problem being modeled.
● Example: in a relation student (ID, Name, Course), a student enrolled in 3 courses yields 3 records, duplicating the ID and Name.
● Solution: student data will be stored in a single relation only.
● Other relations refer to students by key to remove data repetition: use a single attribute, SSN, to refer to a student.
● To retrieve the information on a student given a tuple from the courses relation, we perform a join on the SSN attribute (which is shared by the two relations).
● The LEFT JOIN keyword returns all rows from the left table (table_name1), even if there are no matches in the right table (table_name2).
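The difference between a plain join and a LEFT JOIN can be seen in a small sketch using Python's `sqlite3`. The sample students and courses are invented; Omer is deliberately registered to no course.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (SSN TEXT PRIMARY KEY, Name TEXT);
CREATE TABLE course (CourseName TEXT, StudentSSN TEXT);
INSERT INTO student VALUES ('111', 'Dana'), ('222', 'Omer');
INSERT INTO course VALUES ('Databases', '111'), ('Algebra', '111');
""")

# Join on the shared SSN: student information for every course tuple.
inner = conn.execute("""
    SELECT c.CourseName, s.Name
    FROM course c JOIN student s ON s.SSN = c.StudentSSN
    ORDER BY c.CourseName
""").fetchall()

# LEFT JOIN: every student appears, even one with no matching course row.
left = conn.execute("""
    SELECT s.Name, c.CourseName
    FROM student s LEFT JOIN course c ON c.StudentSSN = s.SSN
    ORDER BY s.Name, c.CourseName
""").fetchall()
```

In `left`, the row for the student without courses carries NULL (Python `None`) in the course column.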
● In the student relation, SSN is a key.
● In the course relation, StudentSSN is not a key:
– several courses may be taken by a single student
– StudentSSN refers to the key of another table.
● StudentSSN is a foreign key in the course relation: it refers to the student relation.
● We cannot create a tuple in the course relation that refers to a student that does not exist.
– new row in the course relation: check that the foreign key exists in the student relation.
– update of the StudentSSN field of the course relation: check that the new StudentSSN exists in the student relation.
– delete a row in the student relation: verify that no rows in course refer to it.
– update a row in the student relation: verify that no rows in course refer to the old value.
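The four checks above can be observed with SQLite, under the assumption that foreign key enforcement is switched on (in SQLite it is off by default and enabled per connection with `PRAGMA foreign_keys = ON`). The `violates` helper and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable enforcement
conn.executescript("""
CREATE TABLE student (SSN TEXT PRIMARY KEY, Name TEXT);
CREATE TABLE course (CourseName TEXT, StudentSSN TEXT REFERENCES student(SSN));
INSERT INTO student VALUES ('111', 'Dana');
INSERT INTO course VALUES ('Databases', '111');
""")


def violates(sql):
    """Return True if the DBMS rejects the statement with a constraint error."""
    try:
        with conn:
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


results = {
    "insert_unknown_student": violates("INSERT INTO course VALUES ('Algebra', '999')"),
    "update_to_unknown_student": violates(
        "UPDATE course SET StudentSSN = '999' WHERE CourseName = 'Databases'"),
    "delete_referenced_student": violates("DELETE FROM student WHERE SSN = '111'"),
    "update_referenced_key": violates("UPDATE student SET SSN = '333' WHERE SSN = '111'"),
    "insert_known_student": violates("INSERT INTO course VALUES ('Algebra', '111')"),
}
```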
● When defining a data model on a server, the user declares which constraints are part of the model:
– keys
– foreign keys.
● The DBMS server enforces the constraints: it verifies that insert, update and delete operations do not conflict with them.
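● NULL belongs to all domains.
● It indicates lack of knowledge: we do not know what the value is.
● Example: (1 = NULL) is not true and (1 != NULL) is also not true! (NULL = NULL) is not true either.
● Comparing NULL with any value never yields true: in SQL the result is unknown (NULL), which a condition treats as false. Use IS NULL to test for NULL.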
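A quick check of NULL comparisons, sketched with Python's `sqlite3`, where SQL NULL is returned as `None`. The table `t` and the `evaluate` helper are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def evaluate(expr):
    """Evaluate a constant SQL expression and return its value."""
    return conn.execute(f"SELECT {expr}").fetchall()[0][0]


eq = evaluate("1 = NULL")            # None: the result is unknown, not true
ne = evaluate("1 != NULL")           # None as well
null_eq = evaluate("NULL = NULL")    # None: even NULL does not equal NULL
is_null = evaluate("NULL IS NULL")   # 1: the dedicated test for NULL

conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,)])
# A WHERE condition keeps only rows for which it is true, so "= NULL" matches nothing.
matched_eq = conn.execute("SELECT COUNT(*) FROM t WHERE x = NULL").fetchall()[0][0]
matched_is = conn.execute("SELECT COUNT(*) FROM t WHERE x IS NULL").fetchall()[0][0]
```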
● Index: stores the records of a relation in a query-efficient data structure with respect to certain attributes.
● B-trees / hash tables
– efficient for queries
– cost extra time when inserting or deleting data.
● Implicit indexing - employed by the DBMS for attributes defined as unique (keys).
● Explicit indexing of an attribute is also possible.
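A sketch of explicit indexing, assuming SQLite and its `EXPLAIN QUERY PLAN` statement to see whether a query uses the index. The plan text is SQLite-specific and varies between versions, so the example only looks for the index name. The index name and the `plan` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)")


def plan(sql):
    """Return SQLite's textual query plan for `sql` (last column of each plan row)."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


query = "SELECT * FROM students WHERE name = 'Dana'"
before = plan(query)   # no index on name: the whole table is scanned

conn.execute("CREATE INDEX idx_students_name ON students(name)")
after = plan(query)    # the plan now mentions idx_students_name
```

The primary key `id` needs no such statement: as a key it is indexed implicitly.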
● A normalized data model provides efficient access to data with respect to the most typical scenarios.
Step A – Identify simple correlation among attributes
● A Student has a name and ID.
● Ignore Courses and Department for now - they are not simple, since they hold correlations to other attributes as well:
– A Department has a name, location and head.
– A Course has a name.
● There should be at least one attribute in a relation that is unique.
● Primary Key - a unique attribute that must have a value (NULL is not allowed).
● student relation: ID = primary key.
● courses relation: an artificial key, a course number; likewise a department ID for the departments relation.
● Several attributes can be defined together as a unique key.
● Any relation that should later be queried efficiently must have a primary key.
● In rare cases, there is no need to define a primary key.
– example: logs stored in a DB, where only insertions are performed, with occasional queries based on a time range.
– there is no requirement to uniquely identify a single row of a log, so there is no need to define a key on such a table.
● Associations between tables, where one table refers to another: add to a table an attribute that refers to rows in another relation.
● Example: a student is registered to a department (assume only one).
● Add the primary key of the department relation as an attribute in the students relation.
● This foreign key models a many-to-one relationship.
● A foreign key identifies a column (or set of columns) in the referencing table that refers to a column (or set of columns) in another (referenced) table.
● The values in one row of the referencing columns must occur in a single row of the referenced table.
● A row in the referencing table cannot contain values that don't exist in the referenced table (except NULL, which means that we do not know what the value of the column is).
● Describing an attribute that is a collection:
– our example: the courses to which a student is registered are a collection of values.
– to model the association, create a table that correlates courses and students with foreign keys.
– define student2course with attributes courseId, a foreign key to courses.ID, and studentId, a foreign key to students.ID.
– tables that model many-to-many associations are called "cross tables".
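A minimal sketch of a cross table in use, with Python's `sqlite3`. The sample rows and the two helper functions are hypothetical; the table and attribute names follow the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE courses (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE student2course (
    studentId INTEGER REFERENCES students(ID),
    courseId  INTEGER REFERENCES courses(ID)
);
INSERT INTO students VALUES (1, 'Dana'), (2, 'Omer');
INSERT INTO courses VALUES (10, 'Databases'), (20, 'Algebra');
INSERT INTO student2course VALUES (1, 10), (1, 20), (2, 10);
""")


def courses_of(student_name):
    """Names of the courses a student is registered to (the 'collection')."""
    rows = conn.execute("""
        SELECT c.Name FROM students s
        JOIN student2course sc ON sc.studentId = s.ID
        JOIN courses c ON c.ID = sc.courseId
        WHERE s.Name = ? ORDER BY c.Name""", (student_name,))
    return [r[0] for r in rows]


def students_in(course_name):
    """Names of the students registered to a course (the other direction)."""
    rows = conn.execute("""
        SELECT s.Name FROM courses c
        JOIN student2course sc ON sc.courseId = c.ID
        JOIN students s ON s.ID = sc.studentId
        WHERE c.Name = ? ORDER BY s.Name""", (course_name,))
    return [r[0] for r in rows]


dana_courses = courses_of("Dana")
db_students = students_in("Databases")
```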
● Tables: departments, students, courses, student2course
● departments: ID integer as a PRIMARY KEY, Name short string that must be UNIQUE, Building integer
● students: ID integer as a PRIMARY KEY, Name short string, DepID integer as a FOREIGN KEY to departments.ID
● courses: ID integer as a PRIMARY KEY, Name short string that must not be NULL
● student2course: SID integer as a FOREIGN KEY to students.ID, CID integer as a FOREIGN KEY to courses.ID, and Grade integer.
● We add an explicit index on students.Name and on courses.Name, as we know many queries will be performed on those attributes.
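The complete schema could be declared as follows. This is one possible rendering in SQLite syntax, run through Python's `sqlite3`: the `VARCHAR(50)` length for "short string" and the index names are assumptions, not part of the design above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE departments (
    ID       INTEGER PRIMARY KEY,
    Name     VARCHAR(50) UNIQUE,
    Building INTEGER
);
CREATE TABLE students (
    ID    INTEGER PRIMARY KEY,
    Name  VARCHAR(50),
    DepID INTEGER REFERENCES departments(ID)
);
CREATE TABLE courses (
    ID   INTEGER PRIMARY KEY,
    Name VARCHAR(50) NOT NULL
);
CREATE TABLE student2course (
    SID   INTEGER REFERENCES students(ID),
    CID   INTEGER REFERENCES courses(ID),
    Grade INTEGER
);
-- Explicit indexes for the attributes that are queried often.
CREATE INDEX idx_students_name ON students(Name);
CREATE INDEX idx_courses_name ON courses(Name);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enforce foreign keys
conn.executescript(SCHEMA)

tables = sorted(r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
indexes = sorted(r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' AND name LIKE 'idx%'"))
```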