Datalog and storage management

Datalog
Another formalism for expressing queries:
- cleaner
- closer to a “logic” notation
- more convenient for analysis
- equivalent in power to relational algebra
- will later allow us to consider queries with recursion
Predicates and Atoms
- relations are represented by predicates
- tuples are represented by atoms.
Purchase( “joe”, “bob”, “Nike Town”, “Nike Air”, 2/2/98)
- arithmetic atoms:
X < 100,
X+Y+5 > Z/2
- negated atoms:
NOT Product(“Brooklyn Bridge”, $100, “Microsoft”)
Datalog Rules and Queries
A datalog rule has the following form:
head :- atom1, atom2, …, atomN
ExpensiveProduct(X) :- Product(X,Y,P) & P > $100
BritishProduct(X) :- Product(X,Y,P) & Company(P, “UK”, SP)
P(X,Y) :- Between(X,Y,Z) & NOT Direct(X,Z)
The Meaning of Datalog Rules
ExpensiveProduct(X) :- Product(X,Y,P) & P > $100
Consider every assignment of constants in the database to the
variables in the body.
If the assignment makes every atom in the body true, then add the
corresponding head tuple to the relation of the head.
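The meaning above can be sketched as a naive evaluation loop. This is an illustration only, not how a real engine evaluates rules: the sample Product tuples and the function name expensive_product are made up, and prices are plain integers instead of dollar amounts.

```python
from itertools import product

# Hypothetical sample data: Product(X, Y, P), with P read as a price.
PRODUCT = {
    ("gizmo", "gadgets", 150),
    ("widget", "gadgets", 40),
    ("laptop", "computers", 900),
}

def expensive_product(product_rel):
    """Naively evaluate ExpensiveProduct(X) :- Product(X,Y,P) & P > 100."""
    # The constants in the database: every value occurring in some tuple.
    constants = {value for row in product_rel for value in row}
    result = set()
    # Consider every assignment of constants to the body variables X, Y, P.
    for x, y, p in product(constants, repeat=3):
        relational_atom_true = (x, y, p) in product_rel
        arithmetic_atom_true = isinstance(p, int) and p > 100
        if relational_atom_true and arithmetic_atom_true:
            result.add((x,))  # add the head tuple to the head relation
    return result

answer = expensive_product(PRODUCT)
print(sorted(answer))
```

Enumerating all assignments is exponential in the number of variables; it is shown here only because it mirrors the definition directly.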
Rule Safety
Every variable that appears anywhere in the query must appear
also in a relational, nonnegated atom in the query.
Q(X,Y,Z) :- R1(X,Y) & X < Z
not safe: Z appears only in an arithmetic atom
Q(X,Y,Z) :- R1(X,Y) & NOT R2(X,Y,Z)
not safe: Z appears only in a negated atom
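The safety condition can be checked mechanically. The sketch below assumes a made-up rule encoding: each body atom is a (kind, args) pair with kind 'rel', 'neg' or 'arith', and variables are strings that start with an uppercase letter. The function names are hypothetical.

```python
def is_variable(term):
    # Assumed convention: variables are strings starting with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()

def is_safe(head_args, body):
    """Return True if every variable occurs in a relational, nonnegated atom."""
    all_vars = {t for t in head_args if is_variable(t)}
    positive_vars = set()
    for kind, args in body:
        atom_vars = {t for t in args if is_variable(t)}
        all_vars |= atom_vars
        if kind == "rel":  # only relational, nonnegated atoms bind variables
            positive_vars |= atom_vars
    return all_vars <= positive_vars

# Q(X,Y,Z) :- R1(X,Y) & X < Z
rule1 = (("X", "Y", "Z"), [("rel", ("X", "Y")), ("arith", ("X", "Z"))])
# Q(X,Y,Z) :- R1(X,Y) & NOT R2(X,Y,Z)
rule2 = (("X", "Y", "Z"), [("rel", ("X", "Y")), ("neg", ("X", "Y", "Z"))])
# ExpensiveProduct(X) :- Product(X,Y,P) & P > 100
rule3 = (("X",), [("rel", ("X", "Y", "P")), ("arith", ("P", 100))])

for head, body in (rule1, rule2, rule3):
    print(head, "safe" if is_safe(head, body) else "not safe")
```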
Composing Datalog Rules
Extensional predicates: represent relations appearing in the database.
Intensional predicates: defined by rules. These can be thought of as
being views.
Datalog rules may be composed in order to express more complex
queries.
An Example Query
Find employees participating in projects that don’t involve
their department heads:
EmpInvolve ( X, P, H) :- Project(P,X, S, E, B, D) &
Employee( X, N ) &
Department( N, H)
DHInvolve ( X, P, H) :- Project( P, H, S, E, B, D) &
Department( N, H) &
Employee( X, N )
Answer (X) :-
EmpInvolve(X, P, H) &
NOT DHInvolve( X, P, H).
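As a rough illustration, the three rules can be evaluated with set comprehensions. Everything concrete here is an assumption: the sample tuples, the reading of Project(P, X, ...) as "X participates in project P", and the reading of the DHInvolve rule as joining Employee on X, so that every variable in its head is bound in its body.

```python
# Hypothetical sample data; the column meanings are assumptions.
# Project(P, X, S, E, B, D): X participates in project P (S, E, B, D unused).
PROJECT = {
    ("apollo", "ann", None, None, None, None),
    ("apollo", "hal", None, None, None, None),
    ("zeus", "bob", None, None, None, None),
}
EMPLOYEE = {("ann", "sales"), ("bob", "sales"), ("hal", "sales")}  # Employee(X, N)
DEPARTMENT = {("sales", "hal")}                                    # Department(N, H)

# EmpInvolve(X,P,H) :- Project(P,X,S,E,B,D) & Employee(X,N) & Department(N,H)
emp_involve = {
    (x, p, h)
    for (p, x, *_rest) in PROJECT
    for (x2, n) in EMPLOYEE if x2 == x
    for (n2, h) in DEPARTMENT if n2 == n
}

# DHInvolve(X,P,H) :- Project(P,H,S,E,B,D) & Department(N,H) & Employee(X,N)
dh_involve = {
    (x, p, h)
    for (p, h, *_rest) in PROJECT
    for (n, h2) in DEPARTMENT if h2 == h
    for (x, n2) in EMPLOYEE if n2 == n
}

# Answer(X) :- EmpInvolve(X,P,H) & NOT DHInvolve(X,P,H)
answer = {x for (x, p, h) in emp_involve if (x, p, h) not in dh_involve}
print(sorted(answer))
```

In this sample, ann and hal work on a project that the department head hal also works on, so only bob remains in the answer.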
From Relational Algebra to Datalog
We can translate any relational algebra operation to datalog:
- projection
- selection
- union
- intersection
- join
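One way to see the translation is to write each operation as a Datalog rule and compare it with a set comprehension. The relations R, S, T and the rule names below are invented for illustration; the comments show one possible Datalog rule per operation.

```python
# Hypothetical relations: R(A,B), S(A,B), T(B,C).
R = {(1, "x"), (2, "y"), (3, "z")}
S = {(2, "y"), (4, "w")}
T = {("x", 10), ("y", 20)}

# Projection:   Proj(A) :- R(A,B)
projection = {(a,) for (a, b) in R}

# Selection:    Sel(A,B) :- R(A,B) & A > 1
selection = {(a, b) for (a, b) in R if a > 1}

# Union:        U(A,B) :- R(A,B)
#               U(A,B) :- S(A,B)     (two rules with the same head)
union = {row for row in R} | {row for row in S}

# Intersection: I(A,B) :- R(A,B) & S(A,B)
intersection = {(a, b) for (a, b) in R if (a, b) in S}

# Join:         J(A,B,C) :- R(A,B) & T(B,C)   (shared variable B joins the atoms)
join = {(a, b, c) for (a, b) in R for (b2, c) in T if b2 == b}

for name, rel in [("projection", projection), ("selection", selection),
                  ("union", union), ("intersection", intersection), ("join", join)]:
    print(name, sorted(rel))
```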
Architecture of a DBMS
Query optimization and execution
Relational operators
Files and access methods
Buffer management
Disk space management
The Memory Hierarchy
Main Memory
• Volatile
• Limited address spaces
• Expensive
• Average access time: 5 microseconds
Disks
• 5-10 MB/s transmission rates
• 2-10 GB storage
• Average time to access a block: 10-15 msecs.
• Need to consider seek, rotation, transfer times.
• Keep records “close” to each other.
Tapes
• 1.5 MB/s transfer rate
• 280 GB typical capacity
• Only sequential access
• Not for operational data
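The disk figures can be made concrete with a little arithmetic: block access time is roughly seek time plus rotational delay plus transfer time. The seek time, rotation speed and block size below are assumed values chosen for illustration; only the 5 MB/s transfer rate comes from the range quoted for disks.

```python
# Assumed disk parameters (illustrative, not measured values).
SEEK_MS = 9.0                   # average seek time
RPM = 7200                      # rotation speed
TRANSFER_BYTES_PER_S = 5e6      # 5 MB/s, low end of the quoted range
BLOCK_BYTES = 4096              # assumed block (page) size

# On average the head waits half a rotation for the block to come around.
rotation_ms = 0.5 * 60_000 / RPM
transfer_ms = 1000 * BLOCK_BYTES / TRANSFER_BYTES_PER_S

def random_read_ms(num_blocks):
    """Each block needs its own seek and rotational delay."""
    return num_blocks * (SEEK_MS + rotation_ms + transfer_ms)

def sequential_read_ms(num_blocks):
    """Adjacent blocks: one seek and one rotational delay, then pure transfer."""
    return SEEK_MS + rotation_ms + num_blocks * transfer_ms

print(f"one block:             {random_read_ms(1):.2f} ms")
print(f"100 scattered blocks:  {random_read_ms(100):.2f} ms")
print(f"100 adjacent blocks:   {sequential_read_ms(100):.2f} ms")
```

Under these assumptions the adjacent layout is more than ten times faster for 100 blocks, which is the reason for keeping records close to each other.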
Disk Space Manager
Task: manage the location of pages on disk (page = block)
Provides commands for:
• allocating and deallocating a page on disk
• reading and writing pages.
Why not use the operating system for this task?
• Portability
• Limited size of address space
• May need to span several disk devices.
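A minimal sketch of the four commands, assuming a single backing file, a 4 KB page and a simple free list. The class and method names are hypothetical, and a real disk space manager would also persist its free-space information and handle several devices.

```python
import tempfile

PAGE_SIZE = 4096  # assumed page (block) size in bytes

class DiskSpaceManager:
    """Toy disk space manager over one file-like object."""

    def __init__(self, backing_file):
        self.file = backing_file
        self.num_pages = 0
        self.free_pages = []  # page ids that were deallocated and can be reused

    def allocate_page(self):
        if self.free_pages:
            # Reuse a freed page; its old contents are still on disk.
            return self.free_pages.pop()
        page_id = self.num_pages
        self.num_pages += 1
        self.write_page(page_id, bytes(PAGE_SIZE))  # extend the file with zeros
        return page_id

    def deallocate_page(self, page_id):
        self.free_pages.append(page_id)

    def read_page(self, page_id):
        self.file.seek(page_id * PAGE_SIZE)
        return self.file.read(PAGE_SIZE)

    def write_page(self, page_id, data):
        if len(data) != PAGE_SIZE:
            raise ValueError("a page must be exactly PAGE_SIZE bytes")
        self.file.seek(page_id * PAGE_SIZE)
        self.file.write(data)

with tempfile.TemporaryFile() as backing:
    dsm = DiskSpaceManager(backing)
    first = dsm.allocate_page()
    second = dsm.allocate_page()
    dsm.write_page(second, b"r" * PAGE_SIZE)
    read_back = dsm.read_page(second)
    dsm.deallocate_page(first)
    reused = dsm.allocate_page()

print(first, second, reused, read_back[:4])
```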
Buffer Manager
Manages buffer pool: the pool provides space for a limited
number of pages from disk.
Needs to decide on page replacement policy.
Enables the higher levels of the DBMS to assume that the
needed data is in main memory.
Why not use the operating system for this task?
- DBMS may be able to anticipate access patterns
- Hence, may also be able to perform prefetching
- DBMS needs the ability to force pages to disk.
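The responsibilities listed for the buffer manager can be sketched as a small pool with pin counts and dirty flags. This is an illustration under assumptions: the replacement policy is LRU (one choice among many), the disk is a plain dict standing in for the disk space manager, and all names are hypothetical. Prefetching is not shown.

```python
from collections import OrderedDict

class BufferPool:
    """Toy buffer pool: fixed number of frames, LRU replacement (assumed policy)."""

    def __init__(self, disk, capacity):
        self.disk = disk              # dict page_id -> bytes, stand-in for the disk
        self.capacity = capacity
        self.frames = OrderedDict()   # page_id -> [data, pin_count, dirty], LRU first

    def pin(self, page_id):
        """Bring the page into the pool if needed and return its in-memory copy."""
        if page_id in self.frames:
            self.frames.move_to_end(page_id)   # mark as most recently used
        else:
            if len(self.frames) >= self.capacity:
                self._evict()
            self.frames[page_id] = [bytearray(self.disk[page_id]), 0, False]
        frame = self.frames[page_id]
        frame[1] += 1
        return frame[0]

    def unpin(self, page_id, dirty=False):
        frame = self.frames[page_id]
        frame[1] -= 1
        frame[2] = frame[2] or dirty

    def force(self, page_id):
        """Write the page to disk now if it was modified."""
        frame = self.frames[page_id]
        if frame[2]:
            self.disk[page_id] = bytes(frame[0])
            frame[2] = False

    def _evict(self):
        # Least recently used page that nobody has pinned.
        victim = next((pid for pid, f in self.frames.items() if f[1] == 0), None)
        if victim is None:
            raise RuntimeError("all frames are pinned")
        self.force(victim)            # write back before dropping the frame
        del self.frames[victim]

disk = {0: b"a", 1: b"b", 2: b"c"}
pool = BufferPool(disk, capacity=2)

page = pool.pin(0)
page[0:1] = b"A"                      # modify page 0 in memory only
pool.unpin(0, dirty=True)
pool.pin(1); pool.unpin(1)
pool.pin(2); pool.unpin(2)            # pool is full, so page 0 is evicted

print(list(pool.frames), disk[0])
```

The explicit force method reflects the last bullet: the DBMS, not the operating system, decides when a modified page reaches the disk.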
Managing Files
The abstraction used by the higher levels of the DBMS is of files.
For example, a relation is stored in a file.
A file will typically consist of many pages.
Main issue: how to organize the records in a file?
Approaches:
• heap files
• ordered files
• hashed files
• additional index structures.
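To compare the first three organizations, the sketch below counts how many pages an equality lookup reads in each. It is a toy model under assumptions: pages are Python lists of integer keys, four records fit on a page, and each hash bucket is exactly one page with no overflow. All names are hypothetical.

```python
RECORDS_PER_PAGE = 4  # assumed page capacity

def paginate(records):
    return [records[i:i + RECORDS_PER_PAGE]
            for i in range(0, len(records), RECORDS_PER_PAGE)]

def heap_lookup(pages, key):
    """Heap file: no order, so scan page by page. Returns (found, pages_read)."""
    for reads, page in enumerate(pages, start=1):
        if key in page:
            return True, reads
    return False, len(pages)

def ordered_lookup(pages, key):
    """Ordered file: binary search over pages using each page's first and last key."""
    lo, hi, reads = 0, len(pages) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        page = pages[mid]
        reads += 1
        if key < page[0]:
            hi = mid - 1
        elif key > page[-1]:
            lo = mid + 1
        else:
            return key in page, reads
    return False, reads

def hashed_lookup(buckets, key):
    """Hashed file: the hash function names the single bucket page to read."""
    return key in buckets[key % len(buckets)], 1

# 64 keys in a fixed scrambled order (37 is coprime with 64, so this is a permutation).
keys = [(k * 37) % 64 for k in range(64)]
heap_pages = paginate(keys)
ordered_pages = paginate(sorted(keys))
buckets = [[] for _ in range(16)]
for k in keys:
    buckets[k % 16].append(k)

for name, result in [("heap", heap_lookup(heap_pages, 27)),
                     ("ordered", ordered_lookup(ordered_pages, 27)),
                     ("hashed", hashed_lookup(buckets, 27))]:
    print(name, result)
```

The model ignores insert and update cost, where heap files are cheapest, and range queries, which hashed files do not support well.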