Datalog

Another formalism for expressing queries:
- cleaner
- closer to a “logic” notation
- more convenient for analysis
- equivalent in power to relational algebra
- will later allow us to consider queries with recursion

Predicates and Atoms

- relations are represented by predicates
- tuples are represented by atoms:
  Purchase(“joe”, “bob”, “Nike Town”, “Nike Air”, 2/2/98)
- arithmetic atoms: X < 100, X+Y+5 > Z/2
- negated atoms: NOT Product(“Brooklyn Bridge”, $100, “Microsoft”)

Datalog Rules and Queries

A datalog rule has the following form:

head :- atom1, atom2, ..., atomn

ExpensiveProduct(X) :- Product(X,Y,P) & P > $100
BritishProduct(X) :- Product(X,Y,P) & Company(P, “UK”, SP)
P(X,Y) :- Between(X,Y,Z) & NOT Direct(X,Z)

The Meaning of Datalog Rules

ExpensiveProduct(X) :- Product(X,Y,P) & P > $100

Consider every assignment from the variables in the body to the constants in the database. If each of the atoms in the body is made true by the assignment, then add the tuple for the head into the relation of the head.

Rule Safety

Every variable that appears anywhere in the query must appear also in a relational, nonnegated atom in the query.

Q(X,Y,Z) :- R1(X,Y) & X < Z    not safe (Z appears only in an arithmetic atom)
Q(X,Y,Z) :- R1(X,Y) & NOT R2(X,Y,Z)    not safe (Z appears only in a negated atom)

Composing Datalog Rules

Extensional predicates: represent relations appearing in the database.
Intensional predicates: defined by rules. These can be thought of as being views.

Datalog rules may be composed in order to express more complex queries.

An Example Query

Find employees participating in projects that don’t involve their department heads:

EmpInvolve(X, P, H) :- Project(P, X, S, E, B, D) & Employee(X, N) & Department(N, H)
DHInvolve(X, P, H) :- Project(P, H, S, E, B, D) & Department(N, H) & Employee(X, N)
Answer(X) :- EmpInvolve(X, P, H) & NOT DHInvolve(X, P, H).
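The rule semantics described above (try every assignment of body variables to database constants, and add the head tuple whenever all body atoms hold) can be sketched in Python. This is a naive illustration under stated assumptions, not how a real datalog engine evaluates rules: the helper names `active_domain` and `evaluate_rule`, the sample tuples, and the reading of Product's columns as (name, maker, price) are all hypothetical.

```python
from itertools import product

def active_domain(*relations):
    """Collect every constant that occurs in the given relations."""
    return {value for rel in relations for row in rel for value in row}

def evaluate_rule(head_vars, body_vars, body_holds, constants):
    """Try every assignment of body variables to constants and keep the head
    tuple for each assignment that makes the whole body true."""
    derived = set()
    for values in product(constants, repeat=len(body_vars)):
        env = dict(zip(body_vars, values))
        if body_holds(env):
            derived.add(tuple(env[v] for v in head_vars))
    return derived

# Hypothetical sample data; the column meanings are assumptions.
product_rel = {("shoe", "Nike", 120), ("gum", "Acme", 1), ("tv", "Sony", 400)}
between = {("sea", "nyc", "chi"), ("sea", "sfo", "pdx")}
direct = {("sea", "pdx")}

# ExpensiveProduct(X) :- Product(X,Y,P) & P > $100
expensive = evaluate_rule(
    ["X"], ["X", "Y", "P"],
    # The relational atom is tested first, so P is known to be a price
    # before the arithmetic atom compares it.
    lambda e: (e["X"], e["Y"], e["P"]) in product_rel and e["P"] > 100,
    active_domain(product_rel),
)

# P(X,Y) :- Between(X,Y,Z) & NOT Direct(X,Z)
p_result = evaluate_rule(
    ["X", "Y"], ["X", "Y", "Z"],
    lambda e: (e["X"], e["Y"], e["Z"]) in between
              and (e["X"], e["Z"]) not in direct,
    active_domain(between, direct),
)

print(sorted(expensive))
print(sorted(p_result))
```

Both rules are safe, so every variable is bound by a positive relational atom and the set of derived tuples is finite. With an unsafe rule such as Q(X,Y,Z) :- R1(X,Y) & X < Z, the answer would depend on which constants happen to be in the domain, which is the problem the safety condition rules out.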
From Relational Algebra to Datalog

We can translate any relational algebra operation to datalog:
- projection
- selection
- union
- intersection
- join

Architecture of a DBMS

Layers, from top to bottom:
• Query optimization and execution
• Relational operators
• Files and access methods
• Buffer management
• Disk space management

The Memory Hierarchy

Main Memory
• volatile
• limited address spaces
• expensive
• average access time: 5 microseconds

Disks
• 5-10 MB/s transmission rates
• 2-10 GB storage
• average time to access a block: 10-15 ms
• Need to consider seek, rotation, transfer times.
• Keep records “close” to each other.

Tapes
• 1.5 MB/s transfer rate
• 280 GB typical capacity
• Only sequential access
• Not for operational data

Disk Space Manager

Task: manage the location of pages on disk (page = block)

Provides commands for:
• allocating and deallocating a page on disk
• reading and writing pages.

Why not use the operating system for this task?
• Portability
• Limited size of address space
• May need to span several disk devices.

Buffer Manager

Manages buffer pool: the pool provides space for a limited number of pages from disk.

Needs to decide on page replacement policy.

Enables the higher levels of the DBMS to assume that the needed data is in main memory.

Why not use the operating system for this task?
- DBMS may be able to anticipate access patterns
- Hence, may also be able to perform prefetching
- DBMS needs the ability to force pages to disk.

Managing Files

The abstraction used by the higher levels of the DBMS is of files. For example, a relation is stored in a file. A file will typically consist of many pages.

Main issue: how to organize the records in a file?

Approaches:
• heap files
• ordered files
• hashed files
• additional index structures.
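To make the buffer manager's role more concrete, here is a minimal Python sketch of a buffer pool that holds a limited number of pages, evicts an unpinned page when full, and can force a page to disk. Everything in it is an assumption for illustration: the slides do not name a replacement policy (least-recently-used is chosen here), the `BufferPool` class and its `pin`/`unpin`/`force` methods are hypothetical names, and a plain dict stands in for the disk space manager.

```python
from collections import OrderedDict

class BufferPool:
    """Toy buffer pool with LRU replacement (the policy is an assumption)."""

    def __init__(self, capacity, disk):
        self.capacity = capacity      # maximum number of pages held in memory
        self.disk = disk              # dict page_id -> data, stands in for the disk
        self.frames = OrderedDict()   # page_id -> frame, least recently used first

    def pin(self, page_id):
        """Bring the page into the pool if needed and mark it as in use."""
        if page_id in self.frames:
            self.frames.move_to_end(page_id)   # now the most recently used
        else:
            if len(self.frames) >= self.capacity:
                self._evict()
            self.frames[page_id] = {"data": self.disk[page_id],
                                    "dirty": False, "pins": 0}
        frame = self.frames[page_id]
        frame["pins"] += 1
        return frame

    def unpin(self, page_id, dirty=False):
        """Release the page; remember whether the caller modified it."""
        frame = self.frames[page_id]
        frame["pins"] -= 1
        frame["dirty"] = frame["dirty"] or dirty

    def force(self, page_id):
        """Write the page back to disk if it has unsaved changes."""
        frame = self.frames[page_id]
        if frame["dirty"]:
            self.disk[page_id] = frame["data"]
            frame["dirty"] = False

    def _evict(self):
        """Drop the least recently used page that nobody has pinned."""
        victim = next((pid for pid, f in self.frames.items()
                       if f["pins"] == 0), None)
        if victim is None:
            raise RuntimeError("all pages in the pool are pinned")
        self.force(victim)
        del self.frames[victim]

# Usage: a pool of two frames over a three-page "disk".
disk = {1: "a", 2: "b", 3: "c"}
pool = BufferPool(capacity=2, disk=disk)
frame = pool.pin(1)
frame["data"] = "A"            # modify page 1 in memory
pool.unpin(1, dirty=True)
pool.pin(2)
pool.unpin(2)
pool.pin(3)                    # pool is full: page 1 is written back and evicted
```

Higher layers only call `pin` and `unpin`, so they can behave as if the data were always in main memory. The explicit `force` method reflects the point that the DBMS, unlike a general-purpose operating system cache, needs control over when a page reaches disk.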