223_103_Final - Department of Computer Science

Database Principles-CS 257
Deepti Bhardwaj
Roll No. 223_103
SJSU ID:
006521307
CS 257 – Dr. T. Y. Lin
13.1.1 The Memory Hierarchy




 Several components are available for data storage, with different capacities and different costs per byte
 The devices with the smallest capacity offer the fastest access speed, at the highest cost per bit
Cache
 The lowest level of the hierarchy; it holds a limited amount of data
 Data items in the cache are copies of certain locations of main memory
 Sometimes values in the cache are changed and the corresponding changes to main memory are delayed
 The machine looks in the cache for instructions, as well as for the data those instructions use
Memory Hierarchy Diagram
[Figure: the memory hierarchy, from cache through main memory and disk (used as virtual memory) up to tertiary storage, with programs, the DBMS, and files placed at the appropriate levels.]
13.1.1 The Memory Hierarchy con’t


 In a single-processor computer, there is no need to update the data in main memory immediately when a cached copy changes
 In a multiprocessor, updating main memory immediately on every cache write is called write-through
Main Memory




 Main memory is at the center of the action: we may think of everything that happens in the computer, instruction executions and data manipulations, as working on information that is resident in main memory
 Main memory is random access: one can obtain any byte in the same amount of time
 Typical times to move data between main memory and the processor or cache are in the 10-100 nanosecond range
Secondary Storage





 Essentially every computer has some sort of secondary storage, which is both significantly slower and significantly more capacious than main memory
 Used to store data and programs when they are not being processed
 More permanent than main memory: data and programs are retained when the power is turned off
 E.g., magnetic disks (hard disks)
 The time to transfer a single byte between disk and main memory is around 10 milliseconds
Tertiary Storage




 Holds data volumes measured in terabytes
 Used for databases much larger than what can be stored on disk: as capacious as a collection of disk units can be, there are databases larger than what can be stored on the disk(s) of a single machine, or even of a substantial collection of machines
 Tertiary storage is characterized by significantly higher read/write times than secondary storage, but also by much larger capacity and lower cost per byte than magnetic disks
13.1.2 Transfer of Data between Levels






 Data moves between adjacent levels of the hierarchy
 At the secondary and tertiary levels, accessing the desired data or finding the desired place to store data takes far longer than the transfer itself
 The disk is organized into blocks; entire blocks are moved to or from a region of main memory called a buffer
 A key technique for speeding up database operations is to arrange the data so that when one piece of data on a block is needed, it is likely that other data on the same block will be needed at the same time
 The same idea applies to the other levels of the hierarchy
13.1.3 Volatile and Non Volatile Storage



 A volatile device forgets what is stored on it when the power is turned off
 A non-volatile device holds its data even when the device is turned off
 All secondary and tertiary storage devices are non-volatile, while main memory is volatile
13.1.4 Virtual Memory





 Typical software executes in virtual memory: when we write programs, the data we use, the variables of the program, the files being read, and so on occupy a virtual address space
 The address space is typically 32 bits, i.e., 2^32 bytes or 4 GB
 The operating system manages virtual memory, keeping some of it in main memory and the rest on disk; transfer between memory and disk is in units of blocks
13.2.1 Mechanics of Disks
 Use of secondary storage is one of the important characteristics of a DBMS
 A disk consists of two moving pieces:
 1. the disk assembly
 2. the head assembly
 The disk assembly consists of one or more platters, which rotate around a central spindle
 Bits are stored on the upper and lower surfaces of the platters
 Each surface is organized into tracks; the tracks at a fixed radius from the center, across all surfaces, form one cylinder
 Tracks are organized into sectors: segments of the circle separated by gaps
A typical disk format from the textbook is shown below:
[Figure: top view of a disk surface, showing its tracks and the sectors within a track.]
13.2.2 Disk Controller


 One or more disks are controlled by a disk controller
 Disk controllers are capable of:
  Controlling the mechanical actuator that moves the head assembly
  Selecting a surface from which to read or write, and selecting the sector from the track on that surface that is under the head
  Transferring bits between the desired sector and main memory
  Possibly buffering an entire track
An example of a simple single-processor computer system from the textbook is shown below:
[Figure: a simple computer system, with a processor and main memory connected through a bus to disk controllers and their disks.]
13.2.3 Disk Access Characteristics

Accessing (reading/writing) a block requires three steps:
  The disk controller positions the head assembly at the cylinder containing the track on which the block is located. This is the seek time.
  The disk controller waits while the first sector of the block moves under the head. This is the rotational latency.
  All the sectors and the gaps between them pass under the head while the disk controller reads or writes data in these sectors. This is the transfer time.
The sum of the seek time, rotational latency, and transfer time is the latency of the disk.
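A minimal sketch of this cost model in Python; the numbers are illustrative assumptions loosely matching the Megatron 747 example used later in these slides:

    # Latency of the disk = seek time + rotational latency + transfer time.
    AVG_SEEK_MS = 6.46     # assumed average seek time
    ROTATION_MS = 8.33     # assumed time for one full rotation
    TRANSFER_MS = 0.5      # assumed time for the block to pass under the head

    def block_access_latency_ms():
        rotational_latency = ROTATION_MS / 2   # on average, half a rotation
        return AVG_SEEK_MS + rotational_latency + TRANSFER_MS

    print(block_access_latency_ms())   # about 11 ms, as in Ex 13.3 below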
13.3 Accelerating Access to Secondary Storage
 Several approaches for more efficiently accessing data in secondary storage:
  Place blocks that are accessed together on the same cylinder.
  Divide the data among multiple disks.
  Mirror disks.
  Use disk-scheduling algorithms.
  Prefetch blocks into main memory.
 Scheduling latency: the added delay in accessing data caused by a disk-scheduling algorithm.
 Throughput: the number of disk accesses per second that the system can accommodate.
13.3.1 The I/O Model of Computation


 The number of block accesses (disk I/O's) is a good approximation of the time an algorithm takes, and should be minimized.
 Ex 13.3: Suppose we have an index on R that identifies the block on which the desired tuple appears, but not where on the block it resides.
  For the Megatron 747 (M747) example, it takes 11 ms to read a 16K block.
  A standard microprocessor can execute millions of instructions in 11 ms, making the time spent searching the block for the desired tuple negligible.
13.3.2 Organizing Data by Cylinders


 If we read all blocks on a single track or cylinder consecutively, we can neglect all but the first seek time and the first rotational latency.
 Ex 13.4: We request 1024 blocks of the M747.
  If the data is randomly distributed, the average latency is 10.76 ms per block by Ex 13.2, making the total latency about 11 s.
  If all blocks are stored consecutively on one cylinder:
   6.46 ms (1 average seek) + 8.33 ms (time per rotation) * 16 (# rotations) = 139 ms
13.3.3 Using Multiple Disks



 If we have n disks, read/write performance increases by a factor of n.
 Striping: distributing a relation across multiple disks following this pattern:
  Data on disk 1: blocks 1, 1+n, 1+2n, ...
  Data on disk 2: blocks 2, 2+n, 2+2n, ...
  ...
  Data on disk n: blocks n, 2n, 3n, ...
 Ex 13.5: We request 1024 blocks with n = 4.
  6.46 ms (1 average seek) + 8.33 ms (time per rotation) * (16/4) (# rotations) = 39.8 ms
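A small sketch of the striping pattern, assuming blocks and disks are numbered from 0 (the slide numbers them from 1):

    def stripe(block_no, n_disks):
        # (disk holding the block, position of the block on that disk)
        return block_no % n_disks, block_no // n_disks

    # With n = 4 disks, consecutive blocks 0..7 fall on disks 0,1,2,3,0,1,2,3,
    # so a request for many consecutive blocks keeps all four disks busy.
    print([stripe(b, 4)[0] for b in range(8)])   # [0, 1, 2, 3, 0, 1, 2, 3]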
13.3.4 Mirroring Disks




 Mirroring disks: having two or more disks hold identical copies of the data.
 Benefit 1: If n disks are mirrors of one another, the system can survive the crash of up to n-1 disks.
 Benefit 2: If we have n disks, read performance increases by a factor of n.
 Performance increases further by having the controller select, for each read, the disk whose head is currently closest to the desired data block.
13.3.5 Disk Scheduling and the Elevator Algorithm
 The disk controller runs this algorithm to select which of several pending requests to process next: the head sweeps in one direction, servicing requests as it passes their cylinders, and reverses direction when no requests remain ahead, like an elevator.
 A minimal runnable version of the pseudocode (cylinder numbers stand for request locations):

    # Elevator (SCAN) scheduling: service pending cylinder requests in the
    # current sweep direction, reversing at the end of each sweep.
    def elevator(head, requests):
        pending = set(requests)   # all non-processed data requests
        order, direction = [], +1
        while pending:
            ahead = [c for c in pending if (c - head) * direction >= 0]
            if not ahead:              # head reached the end: reverse direction
                direction = -direction
                continue
            head = min(ahead, key=lambda c: abs(c - head))  # move head to next location
            order.append(head)         # retrieve data at this cylinder
            pending.remove(head)       # remove it from the pending requests
        return order

    print(elevator(50, [10, 60, 55, 20, 90]))   # [55, 60, 90, 20, 10]
13.3.6 Prefetching and Large-Scale Buffering
 If, at the application level, we can predict the order in which blocks will be requested, we can load them into main memory before they are needed.
13.4 Disk Failure - Types of Errors




 Intermittent error: a read or write is unsuccessful, but a retry may succeed.
 Media decay: a bit or bits become permanently corrupted.
 Write failure: we can neither write the sector nor retrieve the previously written data.
 Disk crash: the entire disk becomes unreadable.
13.4.1 Intermittent Failures








 The most common form of failure.
 Occurs when we try to read a sector, but the correct content of that sector is not delivered to the disk controller.
 The controller checks whether a sector read is good or bad; a write can be checked by reading the sector back.
 Parity checks (checksums) can be used to detect this kind of failure: if the controller can tell that a sector is bad, it can reissue the read request until good data is delivered.
Media Decay




 A more serious form of failure.
 A bit or bits are permanently corrupted.
 It is impossible to read the sector correctly, no matter how many times we retry.
 The stable storage technique for organizing a disk is used to cope with this failure.
Write failure




 An attempt to write a sector fails, and the previously written data in that sector cannot be retrieved either.
 A possible reason is a power outage during the writing of the sector.
 The stable storage technique can be used to cope with this failure.
Disk Crash



 The most serious form of disk failure.
 The entire disk becomes unreadable, suddenly and permanently.
 RAID techniques can be used for coping with disk crashes.
13.4.2 Checksums







 A technique used to determine the good/bad status of a sector.
 Each sector has some additional bits, called the checksum, which are set depending on the values of the data bits stored in that sector.
 With checksums, the probability that a bad sector read goes undetected is small.
 The simplest checksum is a single parity bit, chosen so that the total number of 1's among the data bits and the parity bit is always even:
  If the data bits have an odd number of 1's, the parity bit is 1.
  If the data bits have an even number of 1's, the parity bit is 0.
13.4.2. Checksums –con’t

Example:
 1. The sequence 01101000 has an odd number of 1's, so the parity bit is 1 and the stored sequence is 011010001.
 2. The sequence 11101110 has an even number of 1's, so the parity bit is 0 and the stored sequence is 111011100.
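A tiny sketch of the even-parity rule, with bit sequences as '0'/'1' strings:

    def add_parity(bits):
        # Append a parity bit so the total number of 1's is even.
        return bits + ('1' if bits.count('1') % 2 else '0')

    def parity_ok(stored):
        # A stored sequence is consistent if its number of 1's is even.
        return stored.count('1') % 2 == 0

    assert add_parity('01101000') == '011010001'   # example 1 above
    assert add_parity('11101110') == '111011100'   # example 2 above
    assert not parity_ok('011010011')              # one flipped bit is detected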
13.4.2. Checksums –con’t






 Any one-bit error in reading or writing the bits of a sector and its parity bit results in a sequence of bits with odd parity, so the error can be detected: the disk controller counts the 1's and determines that the sector is bad.
 Error detection can be improved by keeping one parity bit for each byte.
 In a massive error, the probability is 50% that any one parity bit will detect the error, so the chance that none of the eight parity bits does so is only 1 in 2^8, or 1/256.
 In the same way, if n independent parity bits are used, the probability of missing an error is only 1/2^n.
13.4.3. Stable Storage





 Checksums can detect an error but cannot correct it.
 Sometimes we overwrite the previous contents of a sector and then cannot read the new contents correctly.
 To deal with these problems, a stable storage policy can be implemented on the disks.
 Sectors are paired, and each pair represents one sector's contents X.
 The left copy of the sector is denoted XL and the right copy XR.
13.4.3. Stable Storage Assumptions

 We assume that the copies are written with a sufficient number of parity bits that the chance of a bad sector looking good when the parity checks are considered is very small.
 Also, if the read function returns a good value w for either XL or XR, then w is assumed to be the true value of X.
13.4.3. Stable Storage – Writing Policy

 1. Write the value of X into XL. Check that the value has status "good", i.e., that the parity-check bits are correct in the written copy. If not, repeat the write. If after a set number of write attempts we have not successfully written X into XL, assume there is a media failure in this sector; a fix-up such as substituting a spare sector for XL must be adopted.
 2. Repeat (1) for XR.
13.4.3. Stable Storage – Reading Policy

 The policy is to alternate trying to read XL and XR until a good value is returned.
 If a good value is not returned after a prechosen number of tries, then X is assumed to be truly unreadable.
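A minimal sketch of the two policies, assuming hypothetical helpers write_sector(copy, value) and read_sector(copy) that return (value, good), where good reflects the sector's parity checks:

    MAX_TRIES = 5   # assumed retry budget, the "set number of attempts"

    def stable_write(x, write_sector, read_sector):
        # Writing policy: write x to XL, verify, then repeat for XR.
        for copy in ("XL", "XR"):
            for _ in range(MAX_TRIES):
                write_sector(copy, x)
                value, good = read_sector(copy)
                if good and value == x:
                    break             # this copy was written successfully
            else:
                return copy           # media failure: substitute a spare sector
        return None                   # both copies written

    def stable_read(read_sector):
        # Reading policy: alternate XL and XR until a good value is returned.
        for i in range(2 * MAX_TRIES):
            value, good = read_sector("XL" if i % 2 == 0 else "XR")
            if good:
                return value
        raise IOError("X is truly unreadable")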
13.4.4. Error Handling Capabilities of Stable Storage
 Media failures: if one of XL and XR fails, X can still be read from the other copy; X is unreadable only if both fail, and the probability of that is very small.
 Write failure (e.g., a power outage during a write):
 1. If the failure occurred while writing XL, then XR remains good, and X can be read from XR.
 2. If the failure occurred after XL was written, we can read X from XL; XR may or may not hold the correct copy of X.
13.4.5 Recovery from Disk Crashes

 The most serious mode of failure for disks is the "head crash", in which data is permanently destroyed.
 To reduce the risk of data loss from disk crashes, there are a number of schemes known as RAID (Redundant Arrays of Independent Disks) schemes.
 Each scheme starts with one or more disks that hold the data, and adds one or more disks whose contents are completely determined by the contents of the data disks; these added disks are called redundant disks.
13.4.6. Mirroring as a Redundancy Technique

 The mirroring scheme is referred to as RAID level 1 protection against data loss.
 In this scheme we mirror each disk: one of the disks is called the data disk and the other the redundant disk.
 In this case, the only way data can be lost is if the second disk crashes while the first crash is being repaired.
13.4.7 Parity Blocks

 The RAID level 4 scheme uses only one redundant disk, no matter how many data disks there are.
 In the redundant disk, the ith block consists of the parity checks for the ith blocks of all the data disks.
 That is, for each j, the jth bits of the ith blocks of all disks, data and redundant, must have an even number of 1's; the redundant disk's bit is chosen to make this condition true.
13.4.7 Parity Blocks – Reading disk
Reading a block of a data disk is the same as reading a block from any disk.
• Alternatively, we could read the corresponding block from each of the other disks and compute the desired block as their modulo-2 sum. For example:
  disk 2: 10101010
  disk 3: 00111000
  disk 4: 01100010
 Taking the modulo-2 sum of the bits in each column, we get
  disk 1: 11110000
13.4.7 Parity Block - Writing

 When we write a new block to a data disk, we need to change the corresponding block of the redundant disk as well.
 One approach is to read the corresponding blocks of all the other data disks, compute the modulo-2 sum together with the new block, and write the result to the redundant disk.
 This approach requires n-1 reads of data blocks, a write of the data block, and a write of the redundant disk's block: n+1 disk I/O's in total.
Continued: Parity Block - Writing
• A better approach requires only four disk I/O's:
 1. Read the old value of the data block being changed.
 2. Read the corresponding block of the redundant disk.
 3. Write the new data block.
 4. Recalculate and write the block of the redundant disk.
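A small sketch of the four-I/O update rule. The key identity is that the new parity block equals the old parity block XOR'd with the change in the data block, so only the changed data block and the parity block need to be read (blocks shown as Python ints for brevity):

    def raid4_update(old_data, new_data, old_parity):
        # new parity = old parity XOR old data XOR new data
        return old_parity ^ old_data ^ new_data

    # Using the example bit patterns from the reading slide:
    d2, d3, d4 = 0b10101010, 0b00111000, 0b01100010
    parity = d2 ^ d3 ^ d4              # 0b11110000, as computed above
    new_d2 = 0b11001100                # hypothetical new contents of disk 2
    parity = raid4_update(d2, new_d2, parity)
    assert parity == new_d2 ^ d3 ^ d4  # same as recomputing from scratch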
Parity Blocks – Failure Recovery
If any data disk crashes, we can recover each of its blocks as the modulo-2 sum of the corresponding blocks of the remaining disks.
Suppose disk 2 fails. We need to recompute each block of the replacement disk. We are given the corresponding blocks of the first and third data disks and the redundant disk, so the situation looks like:
 disk 1: 11110000
 disk 2: ????????
 disk 3: 00111000
 disk 4: 01100010
Taking the modulo-2 sum of each column, we deduce that the missing block of disk 2 is 10101010.
13.4.8 An Improvement: RAID 5

 RAID 4 is effective in preserving data unless there are two simultaneous disk crashes.
 Whatever scheme we use for updating the disks, we need to read and write the redundant disk's block on every write. If there are n data disks, the number of writes to the redundant disk will be n times the average number of writes to any one data disk.
 However, we do not have to treat one disk as the redundant disk and the others as data disks. Rather, we could treat each disk as the redundant disk for some of the blocks. This improvement is often called RAID level 5.
Con’t: An Improvement: RAID 5
• For instance, if there are n + 1 disks numbered 0 through n, we could treat the ith cylinder of disk j as redundant if j is the remainder when i is divided by n+1.
• For example, with n = 3 there are 4 disks. The disk numbered 0 is redundant for its cylinders numbered 0, 4, 8, 12, and so on, because these are the numbers that leave remainder 0 when divided by 4.
• The disk numbered 1 is redundant for cylinders 1, 5, 9, and so on; disk 2 is redundant for cylinders 2, 6, 10, ...; and disk 3 is redundant for cylinders 3, 7, 11, ...
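A one-line sketch of this placement rule, assuming disks and cylinders are numbered from 0:

    def redundant_disk(cylinder, n_plus_1=4):
        # Disk holding the parity block for this cylinder under RAID 5.
        return cylinder % n_plus_1

    # Cylinders 0..7 on 4 disks: the parity role rotates 0,1,2,3,0,1,2,3.
    print([redundant_disk(i) for i in range(8)])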
13.4.9 Coping With Multiple Disk
Crashes
• The theory of error-correcting codes, specifically Hamming codes, leads to RAID level 6.
• With this strategy, two simultaneous disk crashes are correctable. In the textbook's seven-disk example, disks 1-4 hold data and disks 5-7 are redundant:
  The bits of disk 5 are the modulo-2 sum of the corresponding bits of disks 1, 2, and 3.
  The bits of disk 6 are the modulo-2 sum of the corresponding bits of disks 1, 2, and 4.
  The bits of disk 7 are the modulo-2 sum of the corresponding bits of disks 1, 3, and 4.
Coping With Multiple Disk Crashes –
Reading/Writing

 We may read data from any data disk normally.
 To write a block of some data disk, we compute the modulo-2 sum of the new and old versions of that block. These bits are then added, in a modulo-2 sum, to the corresponding blocks of all those redundant disks that have 1 in the row in which the written disk also has 1.
13.5 Arranging data on disk

 Data elements are represented as records, which are stored in consecutive bytes of the same disk block.
 Basic layout techniques for storing data:
 1. Fixed-length records
 2. Allocation criterion: data should start at a word boundary.
 A fixed-length record header contains:
 1. A pointer to the record schema.
 2. The length of the record.
 3. Timestamps indicating the last modification or last read.
Data on disk - Example
CREATE TABLE employee(
name CHAR(30) PRIMARY KEY,
address VARCHAR(255),
gender CHAR(1),
birthdate DATE );
A record starts at a word boundary and contains a header and the four fields name, address, gender, and birthdate.
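A small sketch of the layout arithmetic for this schema; the 12-byte header, 4-byte word size, and 10-byte DATE are assumptions for illustration:

    def align(offset, word=4):
        # Round offset up to the next word boundary.
        return (offset + word - 1) // word * word

    # (field, size in bytes) for the employee record; sizes are assumptions.
    fields = [("header", 12), ("name", 30), ("address", 255),
              ("gender", 1), ("birthdate", 10)]

    offset = 0
    for name, size in fields:
        offset = align(offset)
        print(f"{name:9s} starts at byte {offset}")
        offset += size
    print("record length:", align(offset))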
13.5 Packing Fixed-Length Records into Blocks
 Records are stored in blocks on the disk, and they are moved into main memory when we need to access or update them.
 A block header is written first, and it is followed by a series of records.
13.5 Block header contains the following information
 Links to one or more other blocks that are part of a network of blocks.
 Information about the role played by this block in such a network.
 Information about which relation the tuples in this block belong to.
 A "directory" giving the offset of each record in the block.
 Timestamp(s) indicating the time of the block's last modification and/or access.
13.5 Block header -Example
Along with the header, we pack as many records as fit into one block, as shown in the figure; the remaining space is unused.
13.6 Representing Block and Record Addresses
 Address of a block and of a record:
  In main memory:
   The address of the block is the virtual-memory address of its first byte.
   The address of a record within the block is the virtual-memory address of the record's first byte.
  In secondary storage: a sequence of bytes describes the location of the block in the overall system, e.g., the device ID for the disk, the cylinder number, and so on.
13.6.1 Addresses in Client-Server Systems
 The addresses in an address space are represented in two ways:
  Physical addresses: byte strings that determine the place within the secondary storage system where the record can be found.
  Logical addresses: arbitrary strings of bytes of some fixed length.
 Physical address bits are used to indicate:
  The host to which the storage is attached
  An identifier for the disk
  The number of the cylinder
  The number of the track
  The offset of the beginning of the record
Addresses in Client-Server Systems (Contd..)
 A map table relates logical addresses to physical addresses.
[Figure: a map table pairing each Logical Address with its Physical Address.]
13.6.2 Logical and Structured Addresses
 Purpose of a logical address?
 It gives more flexibility when we:
  Move the record around within the block
  Move the record to another block
 It also gives us options for deciding what to do when a record is deleted.
[Figure: a block whose header is followed by an offset table pointing to records 1-4, with the unused space in the middle.]
13.6.3 POINTER SWIZZLING
 Having pointers is common in object-relational database systems
 It is important to learn about the management of pointers
 Every data item (block, record, etc.) has two addresses:
  its database address: the address on disk, and
  its memory address, if the item is currently in virtual memory
Pointer Swizzling (Contd…)
 Translation table: maps database addresses to memory addresses
 All addressable items in the database have entries in the map table, while only those items currently in memory are mentioned in the translation table
[Figure: a translation table with columns Dbaddr and Mem-addr, pairing each database address with a memory address.]
Pointer Swizzling (Contd…)
 A pointer consists of the following two fields:
  A bit indicating the type of address (database or memory)
  The database or memory address itself
[Figure: Block 1, copied into memory, holds a swizzled pointer to a record in the memory copy of Block 1 and an unswizzled pointer to Block 2, which remains on disk.]
Example 13.7
 Block 1 has a record with pointers to a second record on the same block and to a record on another block
 If Block 1 is copied to memory:
  The first pointer, which points within Block 1, can be swizzled so it points directly to the memory address of the target record
  Since Block 2 is not in memory, we cannot swizzle the second pointer
Pointer Swizzling (Contd…)
 Three types of swizzling:
  Automatic swizzling: as soon as a block is brought into memory, swizzle all relevant pointers.
  Swizzling on demand: only swizzle a pointer if and when it is actually followed.
  No swizzling: pointers are never swizzled; items are always accessed using the database address.
Programmer Control Of Swizzling
 Unswizzling
  When a block is moved from memory back to disk, all pointers must be converted back to database (disk) addresses
  The translation table is used again, so it is important to have an efficient data structure for it
Pinned Records And Blocks
 A block in memory is said to be pinned if it cannot be written back to disk safely.
 If block B1 has a swizzled pointer to an item in block B2, then B2 is pinned.
 To unpin a block, we must unswizzle any pointers to it:
  Keep in the translation table the places in memory holding swizzled pointers to each item
  Unswizzle those pointers, using the translation table to replace the memory addresses with database (disk) addresses
13.7.1 Records with Variable Fields
An effective way to represent variable-length records is as follows:
  Fixed-length fields are kept ahead of the variable-length fields
 The record header contains:
  The length of the record
  Pointers to the beginnings of all variable-length fields except the first one
Records with Variable Length Fields
[Figure 2: A MovieStar record with name and address implemented as variable-length character strings. The header holds the record length and a pointer to the start of address; the fixed-length fields gender and birthdate precede the variable-length name and address.]
13.7.2 Records with Repeating Fields
A record may contain a variable number of occurrences of a field F:
  All occurrences of field F are grouped together, and the record header contains a pointer to the first occurrence of F
  L bytes are devoted to each instance of field F
  To locate the occurrences of F within the record, add to the offset of the first occurrence the integer multiples of L (0, L, 2L, 3L, and so on), stopping at the end of the record
13.7.2 Records with Repeating Fields
[Figure 3: A record with a repeating group of references to movies. The header holds the record length, a pointer to address, and a pointer to the movie pointers; the name, address, and pointers to movies follow.]
13.7.2 Records with Repeating Fields
[Figure 4: Storing variable-length fields separately from the record. The record header holds, for each such field, a pointer and a length: to name / length of name, to address / length of address, to movie references / number of references.]
13.7.2 Records with Repeating Fields
Advantage
  Keeping the record itself fixed-length allows records to be searched more efficiently, minimizes the overhead in block headers, and allows records to be moved within or among blocks with minimum effort.
Disadvantage
  Storing variable-length components on another block increases the number of disk I/O's needed to examine all components of a record.
13.7.2 Records with Repeating Fields
A compromise strategy is to allocate a fixed portion of the record for the repeating fields:
  If the number of repeating fields is smaller than the allocated space, some of the space is unused.
  If the number of repeating fields is greater than the allocated space, the extra fields are stored in a different location, and a pointer to that location and a count of the additional occurrences are stored in the record.
13.7.3 Variable Format Records
 Records that do not have a fixed schema
 Variable-format records are represented by a sequence of tagged fields
 Each tagged field carries information about itself:
 • The attribute or field name
 • The type of the field
 • The length of the field
 • The value of the field
 Why use tagged fields?
 • Information-integration applications
 • Records with a very flexible schema
13.7.3 Variable Format Records
[Figure 5: A record with tagged fields. First field: code for name (N), code for string type (S), length 14, value "Clint Eastwood". Second field: code for restaurant owned (R), code for string type (S), length 16, value "Hog's Breath Inn".]
13.7.4 Records that do not fit in a block
 When the length of a record is greater than the block size, the record is divided and placed into two or more blocks
 The portion of the record in each block is referred to as a RECORD FRAGMENT
 A record with two or more fragments is called a SPANNED RECORD
 A record that does not cross a block boundary is called an UNSPANNED RECORD
Spanned Records
Spanned records require the following extra header information:
 • A bit indicating whether it is a fragment or not
 • A bit indicating whether it is the first or last fragment of a record
 • Pointers to the next and/or previous fragment of the same record
13.7.4 Records that do not fit in a block
[Figure 6: Storing spanned records across blocks. Block 1 holds record 1 and fragment 2-a; block 2 holds fragment 2-b and record 3; each block has a block header and each record or fragment a record header.]
13.7.5 BLOBS
 Large binary objects, such as audio and video files, are called BLOBs
 Storage of BLOBs: a BLOB typically occupies a long sequence of blocks, possibly striped across several disks for efficient retrieval
 Retrieval of BLOBs: a client may need only a portion of the BLOB, so an index into positions within the BLOB is useful
13.8 Record Modification

What is a record?
 A record is a single, implicitly structured data item in a database table. A record is also called a tuple.
What is record modification?
 We say records are modified when a data manipulation operation is performed on them.
Modification types: insertion, deletion, update.
13.8 Insertion

Insertion of records without order:
 Records can be placed in a block with empty space or in a new block.
Insertion of records in a fixed order:
  If space is available in the right block, insert there.
  If no space is available in the block, the record must be placed outside the block (see the next slides).
Complications: structured addresses and pointers to a record from outside the block must remain valid.
13.8 Insertion in fixed order
Space available within the block:
  Use an offset table in the header of each block, with pointers to the location of each record in the block.
  The records are slid within the block, and the pointers in the offset table are adjusted.
[Figure: a block with a header, an offset table, records 1-4, and unused space.]
13.8 Insertion in fixed order
No space available within the block (outside the block):
 Find space on a "nearby" block:
 • If no space is available on the block, look at the following block in the sorted order of blocks.
 • If space is available there, move the highest records of block 1 to block 2, and slide the records around on both blocks.
 Create an overflow block:
 • Records can be stored in the overflow block.
 • Each block has a place in its header for a pointer to an overflow block.
 • The overflow block can itself point to a second overflow block, and so on.
[Figure: block B with a pointer in its header to the overflow block for B.]
13.8 Deletion

Recovering space after deletion:
  When an offset table is used, the records can be slid around the block so there is one unused region, in the center, that can be recovered.
  If we cannot slide records, an available-space list can be maintained in the block header: the list head goes in the block header, and the available regions themselves hold the links of the list.
13.8 Deletion
Use of tombstones:
 A tombstone is placed where a record is deleted, so that pointers to the deleted record do not end up pointing to a new record that reuses the space.
 The tombstone is permanent: it must remain until the entire database is reconstructed.
 Where a tombstone is placed depends on the nature of the record pointers:
  If pointers go to fixed locations from which the location of the record is found, we put the tombstone in that fixed location (see the examples).
  A map table may be used to translate a logical record address into a physical address.
13.8 Deletion

Use of tombstones (continued):
 If we need to replace records by tombstones, we can place the bit that serves as the tombstone at the beginning of the record.
 This bit remains at the record's location, and the subsequent bytes can be reused for another record.
[Figure: record 1's space can be reused, but its tombstone bit remains; record 2 has no tombstone and can be seen when we follow a pointer to it.]
13.8 Update


Fixed-length update:
 No effect on the storage system, as the updated record occupies the same space as before.
Variable-length update:
 The new value may be longer or shorter than the old one.
13.8 Update
Variable-length update (longer value):
  If the record stays on the same block:
   Slide the records
   Create an overflow block
  If the record is stored on another block:
   Move records around on that block
   Create a new block for storing variable-length fields
Variable-length update (shorter value):
  Same as deletion:
   Recover the freed space
   Consolidate space
14.2 BTrees & Bitmap Indexes
14.2 BTree Structure

 A BTree is a balanced tree, meaning that all paths from the root to a leaf have the same length.
 There is a parameter n associated with each BTree block: each block has space for n search keys and n+1 pointers.
 The root may have as few as one key, but all other blocks must be at least half full.
14.2 Structure
● A typical node has up to n search keys and n+1 pointers.
● A typical interior node has pointers to nodes at the next level of the tree, separated by the key values.
● A typical leaf has pointers to the records of the data file.
14.2 Application

 The search key of the BTree is the primary key of the data file, which is sorted by its primary key.
 Alternatively, the data file is sorted by an attribute that is not a key, and this attribute is the search key of the BTree.
14.2 Lookup
If at an interior node, choose the correct pointer to follow by comparing the keys to the search value.
If at a leaf node, find the key that matches what you are looking for and follow its pointer to the data.
14.2 Insertion

 When inserting, find the correct leaf node in which to put the key and the pointer to the data.
 If the node is full, create a new node and split the keys between the two.
 Recursively move up: if a parent is too full to hold the pointer to the new node, split it as well.
 This can end with the creation of a new root node, if the current root was full.
14.2 Deletion

 Perform a lookup to find the key to delete, and delete it.
 If the node is no longer at least half full, either join it with an adjacent node and recursively delete up, or move a key over from an adjacent node that has keys to spare and adjust the pointer in the parent.
14.2 Efficiency

 BTrees allow lookup, insertion, and deletion of records using very few disk I/O's.
 Each level of a BTree requires one read; then we follow the final pointer to the data.
 Three levels are usually sufficient: if each block holds 255 pointers, 255^3 is about 16.6 million.
 Disk I/O's can be reduced further by keeping a level of the BTree in main memory: keeping the root block (255 pointers) reduces a lookup to 2 reads, and it may even be possible to keep the next 255 blocks of pointers in memory, reducing a lookup to 1 read.
14.7 Bitmap Indexes - Definition
A bitmap index for a field F is a collection of bit-vectors of length n, one for each possible value that may appear in that field F.[1]
14.7 What does that mean?

 Assume a relation R with:
  Two attributes, A and B; attribute A is of type integer and B is of type string.
  Six records, numbered 1 through 6, as shown:

 record | A  | B
 -------+----+----
 1      | 30 | foo
 2      | 30 | bar
 3      | 40 | baz
 4      | 50 | foo
 5      | 40 | bar
 6      | 30 | baz
14.7 Example Continued…
 A bitmap index for attribute B is:

 Value | Vector
 ------+-------
 foo   | 100100
 bar   | 010010
 baz   | 001001

 (Records 1 and 4 have B = foo, so the foo vector has 1's in positions 1 and 4, and similarly for bar and baz.)
14.7 Where do we reach?

 A bitmap index is a special kind of database index that uses bitmaps.[2]
 Bitmap indexes have traditionally been considered to work well for data such as gender, which has a small number of distinct values, e.g., male and female, but many occurrences of each value.[2]
14.7 A little more…



 A bitmap index for attribute A of relation R is:
  A collection of bit-vectors
  The number of bit-vectors = the number of distinct values of A in R
  The length of each bit-vector = the cardinality of R
  The bit-vector for value v has 1 in position i if the ith record has v in attribute A, and 0 there if not.[3]
 Records are allocated permanent numbers.[3]
 There is a mapping between record numbers and record addresses.[3]
14.7 Motivation for Bitmap Indexes

 Bitmap indexes are very efficient when used for partial-match queries.[3]
 They offer the advantage of buckets,[2] letting us find tuples with several specified attribute values without first retrieving all the records that match on each attribute separately.
 They can also help answer range queries.[3]
14.7 Another Example
A set of records with two fields of different types:
{(5,d),(79,t),(4,d),(79,d),(5,t),(6,a)}

 5  = 100010      d = 101100
 79 = 010100      t = 010010
 4  = 001000      a = 000001
 6  = 000001
14.7 Example Continued…
{(5,d),(79,t),(4,d),(79,d),(5,t),(6,a)}
Searching for items is easy: just AND the vectors together.
To search for (5,d):
 5 = 100010
 d = 101100
 100010 AND 101100 = 100000
The location of the record has been found: only record 1 matches.
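A tiny sketch of this lookup with each bit-vector stored as a Python int (record 1 is the leftmost of the 6 bits):

    vectors = {5: 0b100010, 79: 0b010100, 4: 0b001000, 6: 0b000001,
               'd': 0b101100, 't': 0b010010, 'a': 0b000001}

    def matches(keys, n=6):
        # AND the bit-vectors together, then list the record numbers set to 1.
        bits = vectors[keys[0]]
        for k in keys[1:]:
            bits &= vectors[k]
        return [i + 1 for i in range(n) if bits & (1 << (n - 1 - i))]

    print(matches([5, 'd']))   # [1]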
14.7 Compressed Bitmaps

 Assume:
  The number of records in R is n
  Attribute A has m distinct values in R
 The size of a bitmap index on attribute A is then m*n bits.
 If m is large, the fraction of 1's in each bit-vector is only around 1/m, so there is an opportunity to encode the vectors compactly.
 A common encoding approach is called run-length encoding.[1]
Run-length encoding

 The encoding represents runs, where a run is a sequence of i 0's followed by a 1, by a suitable binary encoding of the integer i:
  First compute how many bits are needed to represent i; call this k.
  Then represent the run by k-1 1's and a single 0, followed by the k bits of i in binary.
  For example, the encoding of i = 1 is 01 (k = 1), and the encoding of i = 0 is 00 (k = 1).
 We concatenate the codes for each run together, and the resulting sequence of bits is the encoding of the entire bit-vector.
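A minimal sketch of the encoder for one bit-vector given as a '0'/'1' string (runs after the last 1 are dropped, since trailing 0's can be inferred):

    def encode_run(i):
        # A run of i 0's ending in 1: (k-1) 1's, one 0, then i in k bits.
        bits = bin(i)[2:]
        k = len(bits)
        return '1' * (k - 1) + '0' + bits

    def encode_vector(vec):
        # Each '1' terminates a run of 0's; ignore anything after the last 1.
        return ''.join(encode_run(len(run)) for run in vec.split('1')[:-1])

    # The vector with runs 13, 0, 3 (decoded in the example below):
    print(encode_vector('0000000000000110001'))   # 11101101001011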
Understanding with an Example
 Let us decode the sequence 11101101001011.
 Starting at the beginning (leftmost bit):
  First run: the first 0 is at position 4, so k = 4. The next 4 bits are 1101, so the first integer is i = 13. Remaining: 001011.
  Second run: the leading bit is 0, so k = 1. The next bit is 0, so i = 0. Remaining: 1011.
  Last run: 10 gives k = 2. The next 2 bits are 11, so i = 3.
 The run lengths are thus 13, 0, 3, hence the bit-vector is:
 0000000000000110001
Managing Bitmap Indexes
1) How do you find a specific bit-vector for a value efficiently?
2) After selecting results that match, how do you retrieve the records efficiently?
3) When the data is changed, how do you alter the bitmap index?
1) Finding bit vectors
 Think of each bit-vector as a value with its attribute value as the key.[1]
 Any secondary-storage index technique will be efficient in retrieving the vectors.[1]
 Create a secondary index with the attribute value as the search key,[3] e.g.:
  a BTree
  a hash table
2) Finding Records

 Create a secondary index with the record number as the search key.[3]
 In other words, once you learn that you need record k, you can use a secondary index with k as the search key to locate the record.[1]
3) Handling Modifications
Two things to remember:
Record numbers must remain fixed once assigned
Changes to data file require changes to bitmap index
14.7 Deletion
 Tombstone replaces deleted record
 Corresponding bit is set to 0
14.7 Insertion
 The new record is assigned the next record number.
 A bit is appended to each bit-vector: 1 for the vector of the record's value, 0 for all the others.
 If the new record contains a new value of the attribute, add a new bit-vector.
14.7 Modification
 Change the bit corresponding to the old value
of the modified record to 0
 Change the bit corresponding to the new value
of the modified record to 1
 If the new value is a new value of A, then
insert a new bit-vector.
Chapter 15
15.1 Query Execution
15.1 What is a Query Processor

 The group of components of a DBMS that converts user queries and data-modification commands into a sequence of database operations, and executes those operations.
 The query processor must supply the details of how the query is to be executed.
15.1 Major parts of Query processor
Query execution: the algorithms that manipulate the data of the database.
The focus is on the operations of the extended relational algebra.
15.1Outline of Query Compilation
Query compilation:
  Parsing: a parse tree for the query is constructed.
  Query rewrite: the parse tree is converted to an initial query plan and transformed into a logical query plan expected to require less execution time.
  Physical plan generation: the logical query plan is converted into a physical query plan by selecting algorithms for each operator and an order of execution.
15.1Physical-Query-Plan Operators

 Physical operators are implementations of the operators of relational algebra.
 Physical plans also use operators that are not part of relational algebra, like "scan", which scans a table, that is, brings each tuple of some relation into main memory.
15.1 Scanning Tables


 One of the most basic things we can do in a physical query plan is read the entire contents of a relation R.
 A variation of this operator involves a simple predicate: read only those tuples of the relation R that satisfy the predicate.
 Basic approaches to locating the tuples of a relation R:
  Table-scan
   Relation R is stored in secondary memory with its tuples arranged in blocks
   We get the blocks one by one
  Index-scan
   If there is an index on any attribute of relation R, we can use this index to get all the tuples of R
15.1 Sorting While Scanning Tables

 There are a number of reasons to sort a relation:
  The query could include an ORDER BY clause, requiring that a relation be sorted.
  Algorithms implementing relational-algebra operations may require one or both arguments to be sorted relations.
 The physical-query-plan operator sort-scan takes a relation R and the attributes on which the sort is to be made, and produces R in that sorted order.
15.1 Computation Model for Physical Operators
 Physical-plan operators should be selected wisely; this is essential for a good query processor.
 The "cost" of each operator is estimated by the number of disk I/O's for the operation.
 The total cost of an operation also depends on the size of the answer; the cost of writing the final result is considered separately from the cost of the operators themselves.
15.1 Parameters for Measuring Costs




 Parameters that affect the performance of a query:
  The buffer space available in main memory at the time the query executes
  The size of the input and the size of the output generated
  The size of a block on disk and the number of blocks that fit in main memory
 Notation:
  B: the number of blocks needed to hold all the tuples of relation R; also denoted B(R)
  T: the number of tuples in relation R; also denoted T(R)
  V: the number of distinct values that appear in a column of relation R; V(R, a) is the number of distinct values of column a in R
15.1. I/O Cost for Scan Operators

 If relation R is clustered, then the number of disk I/O's for the table-scan operator is approximately B.
 If relation R is not clustered, then the number of required disk I/O's is generally much higher.
 An index on a relation R occupies many fewer than B(R) blocks.
 Therefore, a scan of the entire relation R, which takes at least B disk I/O's, requires more I/O's than reading the entire index.
15.1. Iterators for Implementation of
Physical Operators

 Many physical operators can be implemented as an iterator.
 The three methods forming the iterator for an operation are:
 1. Open():
   Starts the process of getting tuples
   Initializes any data structures needed to perform the operation
 2. GetNext():
   Returns the next tuple in the result
   If there are no more tuples to return, GetNext returns the special value NotFound
 3. Close():
   Ends the iteration after all tuples have been returned
   Calls Close on any arguments of the operator
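A minimal sketch of the iterator interface for a table-scan over an in-memory list of tuples, with None standing in for NotFound:

    class TableScan:
        # Iterator over the tuples of a relation, in Open/GetNext/Close style.
        def __init__(self, tuples):
            self.tuples = tuples

        def open(self):
            self.pos = 0              # initialize the scan state

        def get_next(self):
            if self.pos >= len(self.tuples):
                return None           # NotFound
            t = self.tuples[self.pos]
            self.pos += 1
            return t

        def close(self):
            pass                      # nothing to release in this sketch

    scan = TableScan([(1, 'a'), (2, 'b')])
    scan.open()
    while (t := scan.get_next()) is not None:
        print(t)
    scan.close()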
15.2 One-Pass Algorithms for Database Operations - Introduction
 The choice of an algorithm for each operator is an essential part of the process of transforming a logical query plan into a physical query plan.
 Main classes of algorithms:
  Sorting-based methods
  Hash-based methods
  Index-based methods
 Division based on degree of difficulty and cost:
  1-pass algorithms
  2-pass algorithms
  3-or-more-pass algorithms
15.2. One-Pass Algorithm Methods
 Tuple-at-a-time, unary operations (selection and projection)
 Full-relation, unary operations (duplicate elimination and grouping)
 Full-relation, binary operations (set and bag versions of union, intersection, difference, product, and join)
15.2 One-Pass Algorithms for Tuple-at-a-Time Operations
 The tuple-at-a-time operations are selection and projection:
  Read the blocks of R one at a time into an input buffer
  Perform the operation on each tuple
  Move the selected tuples or the projected tuples to the output buffer
 The disk I/O requirement for this process depends only on how the argument relation R is provided: if R is initially on disk, the cost is whatever it takes to perform a table-scan or index-scan of R.
15.2 A selection or projection being performed on a relation R
[Figure: blocks of R pass through an input buffer, the unary operation is applied to each tuple, and the results collect in an output buffer.]
15.2 One-Pass Algorithms for Unary, Full-Relation Operations
 Duplicate Elimination
  To eliminate duplicates, we can read each block of R one at a time, but for each tuple we need to decide whether:
   It is the first time we have seen this tuple, in which case we copy it to the output, or
   We have seen the tuple before, in which case we must not output it.
  One memory buffer holds one block of R's tuples, and the remaining M - 1 buffers are used to hold a single copy of every tuple seen so far.
15.2 Managing memory for a one-pass duplicate elimination
[Figure: one input buffer holds the current block of R; the other M - 1 buffers hold one copy of each distinct tuple seen so far.]
15.2. Duplicate Elimination


 When a new tuple from R is considered, we compare it with all the tuples seen so far:
  If it is not equal to any of them, we copy it to the output and add it to the in-memory list of tuples we have seen.
  If there are n tuples in main memory, each new tuple takes processor time proportional to n, so the complete operation takes processor time proportional to n^2.
 We therefore need a main-memory structure that allows each of these operations to be performed efficiently:
  Add a new tuple, and
  Tell whether a given tuple is already there.
15.2. Duplicate Elimination (contd.)

 Main-memory structures that support these operations efficiently are:
  A hash table
  A balanced binary search tree
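A small sketch of the one-pass algorithm, using a Python set (a hash table) as the main-memory structure and lists of tuples as blocks:

    def delta(blocks):
        # One-pass duplicate elimination over an iterable of blocks of R.
        seen = set()                 # stands in for the M-1 buffers of tuples
        for block in blocks:         # one block of R in the input buffer
            for t in block:
                if t not in seen:    # first time we have seen this tuple
                    seen.add(t)
                    yield t          # copy it to the output

    R = [[(1, 'a'), (2, 'b')], [(1, 'a'), (3, 'c')]]
    print(list(delta(R)))            # [(1, 'a'), (2, 'b'), (3, 'c')]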
15.2 One-Pass Algorithms for Unary, Full-Relation Operations
 Grouping
  The grouping operation gives us zero or more grouping attributes and presumably one or more aggregated attributes.
  If we create in main memory one entry for each group, we can scan the tuples of R one block at a time.
  The entry for a group consists of values for the grouping attributes and an accumulated value or values for each aggregation.
15.2. Grouping

 The accumulated value is:
  For MIN(a) or MAX(a): record the minimum or maximum a-value seen so far, respectively.
  For any COUNT aggregation: add 1 for each tuple of the group.
  For SUM(a): add the value of attribute a to the accumulated sum for its group.
  AVG(a) is the hard case: we must maintain two accumulations, the count of the tuples in the group and the sum of their a-values, each computed as for a COUNT and SUM aggregation, respectively. After all tuples of R have been seen, take the quotient of the sum and the count to obtain the average.
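A small sketch of one-pass grouping for SUM and AVG, keeping one (count, sum) entry per group in a dict; tuples are assumed to be (group-key, a) pairs:

    def gamma_sum_avg(tuples):
        # One-pass grouping: per group, accumulate a count and a sum of a.
        acc = {}                                 # group key -> (count, sum)
        for key, a in tuples:
            cnt, s = acc.get(key, (0, 0))
            acc[key] = (cnt + 1, s + a)
        # AVG is derived from the two accumulations at the end.
        return {key: {'sum': s, 'avg': s / cnt}
                for key, (cnt, s) in acc.items()}

    rows = [('x', 10), ('y', 4), ('x', 20)]
    print(gamma_sum_avg(rows))   # {'x': {'sum': 30, 'avg': 15.0}, 'y': ...}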
15.2. One-Pass Algorithms for Binary Operations
 Binary operations include:
  Union
  Intersection
  Difference
  Product
  Join
 In each case, we assume the smaller relation, S, fits in M - 1 buffers of main memory.
15.2. Set Union

 We read S into M - 1 buffers of main memory and build a search structure whose search key is the entire tuple.
 All these tuples are also copied to the output.
 We then read each block of R into the Mth buffer, one at a time.
 For each tuple t of R, we see if t is in S; if not, we copy t to the output, and if so, we skip t.
15.2. Set Intersection

 Read S into M - 1 buffers and build a search structure with full tuples as the search key.
 Read each block of R, and for each tuple t of R, see if t is also in S. If so, copy t to the output; if not, ignore t.
15.2. Set Difference




 Read S into M - 1 buffers and build a search structure with full tuples as the search key.
 To compute R -S S, read each block of R and examine each tuple t on that block. If t is in S, ignore t; if it is not in S, copy t to the output.
 To compute S -S R, read the blocks of R and examine each tuple t in turn. If t is in S, delete t from the copy of S in main memory; if t is not in S, do nothing.
 After considering each tuple of R, copy to the output those tuples of S that remain.
15.2. Bag Intersection



 Read S into M - 1 buffers, but do not store multiple copies of a tuple t individually; rather, store one copy of t and associate with it a count equal to the number of times t occurs.
 Next, read each block of R, and for each tuple t of R, see whether t occurs in S. If not, ignore t; it cannot appear in the intersection. If t appears in S and the count associated with t is still positive, output t and decrement the count by 1. If t appears in S but its count has reached 0, do not output t; we have already produced as many copies of t in the output as there were copies in S.
15.2. Bag Difference

 To compute S -B R, read the tuples of S into main memory and count the number of occurrences of each distinct tuple. Then read R; check each tuple t to see whether t occurs in S, and if so, decrement its associated count. At the end, copy to the output each tuple in main memory whose count is positive, copying it as many times as that count.
 To compute R -B S, read the tuples of S into main memory and count the number of occurrences of each distinct tuple.
15.2. Bag Difference (…contd.)

 Think of a tuple t of S with a count of c as c reasons not to copy t to the output as we read the tuples of R.
 Read a tuple t of R and check whether t occurs in S. If not, copy t to the output. If t does occur in S, look at the current count c associated with t: if c = 0, copy t to the output; if c > 0, do not copy t to the output, but decrement c by 1.
15.2. Product

 Read S into M - 1 buffers of main memory.
 Then read each block of R, and for each tuple t of R, concatenate t with each tuple of S in main memory.
 Output each concatenated tuple as it is formed.
 This algorithm may take a considerable amount of processor time per tuple of R, because each such tuple must be matched against M - 1 blocks full of tuples. However, the output size is also large, and the time per output tuple is small.
15.2. Natural Join
 Convention: R(X, Y) is joined with S(Y, Z), where Y represents all the attributes that R and S have in common, X is all attributes of R not in the schema of S, and Z is all attributes of S not in the schema of R. Assume that S is the smaller relation.
 To compute the natural join, do the following:
  Read all the tuples of S and form them into a main-memory search structure with the attributes of Y as the search key. A hash table or balanced tree is a good example of such a structure; use M - 1 blocks of memory for this purpose.
  Read each block of R into the one remaining main-memory buffer.
  For each tuple t of R, find the tuples of S that agree with t on all attributes of Y, using the search structure.
  For each matching tuple of S, form a tuple by joining it with t, and move the resulting tuple to the output.
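A compact sketch of this one-pass join, with tuples as Python dicts and Y the list of common attributes:

    def one_pass_join(R, S, Y):
        # Build the search structure on S (the smaller relation), probe with R.
        index = {}
        for s in S:
            index.setdefault(tuple(s[a] for a in Y), []).append(s)
        for t in R:                                  # one block of R at a time
            for s in index.get(tuple(t[a] for a in Y), []):
                yield {**t, **s}                     # join matching tuples

    R = [{'x': 1, 'y': 'a'}, {'x': 2, 'y': 'b'}]
    S = [{'y': 'a', 'z': 10}, {'y': 'a', 'z': 20}]
    print(list(one_pass_join(R, S, ['y'])))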
15.5 Two Pass algorithms based on
Hashing

 Hashing is used when the data is too big to fit in the main-memory buffers:
  Hash all the tuples of the argument(s) using an appropriate hash key.
  For all the common operations, there is a way to select the hash key so that all the tuples that need to be considered together when we perform the operation have the same hash value.
  This reduces the size of the operand(s) by a factor equal to the number of buckets.
15.5 Partitioning Relations by Hashing
Algorithm:
  initialize M-1 buckets using M-1 empty buffers;
  FOR each block b of relation R DO BEGIN
      read block b into the Mth buffer;
      FOR each tuple t in b DO BEGIN
          IF the buffer for bucket h(t) has no room for t THEN BEGIN
              copy the buffer to disk;
              initialize a new empty block in that buffer;
          END;
          copy t to the buffer for bucket h(t);
      END;
  END;
  FOR each bucket DO
      IF the buffer for this bucket is not empty THEN
          write the buffer to disk;
15.5 Duplicate Elimination

 For the operation δ(R), hash R into M-1 buckets. (Note that two copies of the same tuple t will hash to the same bucket.)
 Do duplicate elimination on each bucket Ri independently, using the one-pass algorithm.
 The result is the union of δ(Ri), where Ri is the portion of R that hashes to the ith bucket.
15.5 Requirements

 Number of disk I/O's: 3*B(R).
 The two-pass, hash-based algorithm works only if B(R) ≤ M(M-1), i.e., roughly B(R) ≤ M^2.
 For this to work, we need:
  a hash function h that evenly distributes the tuples among the buckets, and
  each bucket Ri to fit in main memory (to allow the one-pass algorithm).
15.5 Grouping and Aggregation

 Hash all the tuples of relation R into M-1 buckets, using a hash function that depends only on the grouping attributes. (Note: all tuples in the same group end up in the same bucket.)
 Use the one-pass algorithm to process each bucket independently.
 This uses 3*B(R) disk I/O's and requires B(R) ≤ M^2.
15.5 Union, Intersection, and Difference
 For a binary operation, we use the same hash function to hash the tuples of both arguments.
 For R ∪ S, R ∩ S, and R - S, we hash R into M-1 buckets R1, ..., RM-1 and S into M-1 buckets S1, ..., SM-1, 2(M-1) buckets in all, and then perform the one-pass operation on each pair (Ri, Si).
 Requires 3(B(R)+B(S)) disk I/O's.
 The two-pass, hash-based algorithms require min(B(R), B(S)) ≤ M^2.
15.5 Hash-Join Algorithm
 Use the same hash function for both relations; the hash function must depend only on the join attributes.
 Hash R into M-1 buckets R1, R2, ..., RM-1.
 Hash S into M-1 buckets S1, S2, ..., SM-1.
 Do a one-pass join of Ri and Si, for each i.
 Cost: 3*(B(R) + B(S)) disk I/O's; requires min(B(R), B(S)) ≤ M^2.
15.5 Sort based Vs Hash based
 For binary operations, hash-based algorithms limit the memory requirement to the minimum of the argument sizes, not their sum.
 Sort-based algorithms can produce output in sorted order, which can be helpful later in the query.
 Hash-based algorithms depend on the buckets being of nearly equal size.
 Sort-based algorithms can experience reduced rotational latency or seek time, because sublists can be written to and read from consecutive blocks.
15.6 Index-Based Algorithms - Clustering and Nonclustering Indexes
 Clustered relation: the tuples are packed into roughly as few blocks as can possibly hold them.
 Clustering index: an index on an attribute such that all the tuples with a fixed value for the search key of the index appear on roughly as few blocks as can hold them.
 A relation that is not clustered cannot have a clustering index, but a clustered relation can have nonclustering indexes.
15.6 Index-Based Selection

 For a selection σC(R), suppose C is of the form a = v, where a is an attribute.
 For a clustering index on R.a, the number of disk I/O's will be about B(R)/V(R,a).
 The actual number may be higher because:
 1. the index is not kept entirely in main memory,
 2. the matching tuples may spread over more blocks than the minimum, and
 3. R may not be packed as tightly as possible into blocks.
15.6 Example





 Let B(R) = 1000 and T(R) = 20,000. The number of I/O's required:
 1. R clustered, no index: B(R) = 1000
 2. R not clustered, no index: T(R) = 20,000
 3. If V(R,a) = 100 and the index is clustering: B(R)/V(R,a) = 10
 4. If V(R,a) = 10 and the index is nonclustering: T(R)/V(R,a) = 2,000
15.6 Joining by Using an Index

 Natural join R(X, Y) ⋈ S(Y, Z), using an index on Y for S.
 Number of I/O's to read R:
  R clustered: B(R)
  R not clustered: T(R)
 Number of I/O's to fetch the matching tuples of S, over all tuples t of R:
  S clustered: T(R)·B(S)/V(S,Y)
  S not clustered: T(R)·T(S)/V(S,Y)
15.6 Example

 R(X,Y) occupies 1000 blocks, and S(Y,Z) occupies 500 blocks.
 Assume 10 tuples per block, so T(R) = 10,000 and T(S) = 5,000; let V(S,Y) = 100.
 If R is clustered and there is a clustering index on Y for S:
  the number of I/O's for R is 1000, and
  the number of I/O's for S is T(R)·B(S)/V(S,Y) = 10,000 * 500 / 100 = 50,000.
15.6 Joins Using a Sorted Index



 Natural join R(X, Y) ⋈ S(Y, Z) with a sorted index on Y for either R or S.
 Extreme case: the zig-zag join, in which sorted indexes on Y exist for both relations and we hop between them, skipping over Y-values that cannot match.
 Example: relations R(X,Y) and S(Y,Z) with indexes on Y for both relations:
  search keys (Y-values) for R: 1, 3, 4, 4, 5, 6
  search keys (Y-values) for S: 2, 2, 4, 6, 7, 8
15.7 Buffer Management - What does a buffer manager do?
Assume the operators on relations need M main-memory buffers to store needed data.
In practice:
1) buffers are rarely allocated in advance, and
2) the value of M may vary depending on system conditions.
Therefore, a buffer manager is used to allow processes to get the memory they need while minimizing delays and unsatisfiable requests.
15.7. The role of the buffer manager
[Figure 1: The role of the buffer manager: it responds to read/write requests for main-memory access to disk blocks, managing a pool of buffers.]
15.7.1 Buffer Management Architecture
Two broad architectures for a buffer manager:
1) The buffer manager controls main memory directly.
 • Typical of relational DBMSs.
2) The buffer manager allocates buffers in virtual memory, allowing the OS to decide which buffers stay in main memory.
 • Typical of "main-memory" DBMSs and "object-oriented" DBMSs.
15.7.1 Buffer Pool
Key setting for the buffer manager to be efficient:
The buffer manager should limit the number of buffers in use so that they fit in the available main memory, i.e., it should not exceed the available space.
The number of buffers is a parameter set when the DBMS is initialized.
No matter which architecture of buffering is used, we simply assume that there is a fixed-size buffer pool, a set of buffers available to queries and other database actions.
15.7.1 Buffer Pool
[Figure: the buffer pool in main memory receives page requests from higher levels; disk pages are read from the DB on disk into free frames, with the choice of frame dictated by the replacement policy.]
 Data must be in RAM for the DBMS to operate on it!
 The buffer manager hides the fact that not all data is in RAM.
15.7.2 Buffer Management Strategies
Buffer-replacement strategies answer the question:
When a buffer is needed for a newly requested block and the buffer pool is full, which block should be thrown out of the buffer pool?
15.7.2 Buffer-replacement strategy LRU
Least-Recently Used (LRU):
Throw out the block that has not been read or written for the longest time.
 • Requires more maintenance, but is effective: the access time must be recorded on every access.
 • Blocks that have not been used for a long time are less likely to be accessed soon than recently used blocks.
15.7.2 Buffer-replacement strategy -FIFO
First-In-First-Out (FIFO):
The buffer that has held the same block the longest is emptied and used for the new block.
 • Requires less maintenance (only the load time is kept) but can make more mistakes.
 • The oldest block is not necessarily unlikely to be accessed.
 Example: the root block of a B-tree index is needed constantly, no matter how long ago it was loaded.
15.7.2 Buffer-replacement strategy – "Clock"
The "Clock" Algorithm ("Second Chance"):
Think of the buffers (say, 8 of them) as arranged in a circle, as shown in Figure 3.
Each buffer has a flag, 0 or 1:
 buffers with a 0 flag are OK to have their contents sent back to disk, i.e., OK to be replaced;
 buffers with a 1 flag are not.
15.7.2 Buffer-replacement strategy – "Clock"
[Figure 3: the clock algorithm. Eight buffers arranged in a circle with flags 0 or 1. A "hand" sweeps clockwise from its start point looking for a 0 flag; the first buffer found with a 0 flag is replaced, and any 1 flags passed on the way are set to 0. If a buffer's contents are not accessed by the time the hand reaches it again, its flag is still 0 and it will be replaced: that is the "second chance".]
15.7.2 Buffer-replacement strategy - Clock
A buffer's flag is set to 1 when:
a block is read into the buffer
the contents of the buffer are accessed
A buffer's flag is set to 0 when:
the buffer manager needs a buffer for a new block: it looks
for the first 0 it can find, rotating clockwise, and if it
passes 1's, it sets them to 0.
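A compact sketch of the clock policy in Python (illustrative only;
frame contents and disk I/O are elided):

    class ClockPool:
        """Toy clock ("second chance") replacement over fixed frames."""
        def __init__(self, nframes):
            self.blocks = [None] * nframes   # block occupying each frame
            self.flags = [0] * nframes       # second-chance bits
            self.hand = 0                    # position of the clock hand

        def access(self, frame):
            self.flags[frame] = 1            # give the buffer a second chance

        def choose_victim(self):
            # Rotate until a frame with flag 0 is found, clearing 1's on the way.
            while True:
                if self.flags[self.hand] == 0:
                    victim = self.hand
                    self.hand = (self.hand + 1) % len(self.flags)
                    return victim
                self.flags[self.hand] = 0
                self.hand = (self.hand + 1) % len(self.flags)

        def load(self, block_id):
            frame = self.choose_victim()
            self.blocks[frame] = block_id    # the disk read would happen here
            self.flags[frame] = 1            # newly read blocks start at 1
            return frame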
15.7 System Control helps Buffer-replacement strategy
System Control
The query processor or other components of a DBMS can give
advice to the buffer manager in order to avoid some of the
mistakes that would occur with a strict policy such as
LRU, FIFO, or Clock.
For example:
A "pinned" block is one that can't be moved to disk without first
modifying certain other blocks that point to it.
In FIFO, use pinning to force the root of a B-tree to remain in
memory at all times.
15.7.3 The Relationship Between Physical
Operator Selection and Buffer Management
Problem:
A physical operator expects a certain number of buffers M
for its execution.
However, the buffer manager may not be able to
guarantee that these M buffers are available.
15.7.3 The Relationship Between Physical
Operator Selection and Buffer Management
Questions:
Can the algorithm adapt to changes in M, the number of
main-memory buffers available?
When fewer than M buffers are available and some blocks
have to be kept on disk instead of in memory, how does the
buffer-replacement strategy impact performance
(i.e., the number of additional I/O's)?
15.7 Example
FOR each chunk of M-1 blocks of S DO BEGIN
    read these blocks into main-memory buffers;
    organize their tuples into a search structure whose
        search key is the common attributes of R and S;
    FOR each block b of R DO BEGIN
        read b into main memory;
        FOR each tuple t of b DO BEGIN
            find the tuples of S in main memory that join with t;
            output the join of t with each of these tuples;
        END;
    END;
END;
Figure 15.8: The nested-loop join algorithm
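As a concrete illustration, here is a minimal Python rendering of
Figure 15.8's pseudocode, under simplifying assumptions (relations are
lists of blocks, blocks are lists of dict tuples, there is a single
common join attribute, and the names are hypothetical):

    def nested_loop_join(S_blocks, R_blocks, key, M):
        """Block-based nested-loop join; S is the outer (build) side."""
        result = []
        chunk = M - 1                                  # buffers kept for S
        for i in range(0, len(S_blocks), chunk):
            # Read M-1 blocks of S and index their tuples on the join key.
            search = {}
            for block in S_blocks[i:i + chunk]:
                for s in block:
                    search.setdefault(s[key], []).append(s)
            # Scan R one block at a time with the remaining buffer.
            for block in R_blocks:
                for t in block:
                    for s in search.get(t[key], []):
                        result.append({**s, **t})      # join s with t
        return result

    # Example: R = [[{'y': 1, 'z': 10}]], S = [[{'x': 0, 'y': 1}]]
    # nested_loop_join(S, R, key='y', M=2) -> [{'x': 0, 'y': 1, 'z': 10}]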
15.7 Example
The number of iterations of the outer loop depends on the average
number of buffers available at each iteration.
The outer loop uses M-1 buffers, and 1 buffer is reserved for a block
of R, the relation of the inner loop.
If we pin the M-1 blocks we use for S on one iteration of
the outer loop, we shall not lose their buffers during the round.
Also, more buffers may become available, and then we could
keep more than one block of R in memory.
Will these extra buffers improve the running time?
15.7 Example
CASE 1: NO
Buffer-replacement strategy: LRU
Buffers for R: k
We read each block of R in order into the buffers.
By the end of an iteration of the outer loop, the last k blocks of
R are in the buffers.
However, the next iteration starts from the beginning of R again.
Therefore, the k buffers holding the last blocks of R must all be
replaced.
15.7 Example
CASE 2: YES
Buffer-replacement strategy: LRU
Buffers for R: k
We read the blocks of R in an order that alternates:
first→last and then last→first.
In this way, we save k disk I/O's on each iteration of
the outer loop except the first iteration.
15.7 Other Algorithms and M buffers
Other algorithms are also impacted by M and the buffer-replacement
strategy.
Sort-based algorithms
If M shrinks, we can change the size of a sublist.
Unexpected result: too many sublists to allocate each
sublist a buffer.
Hash-based algorithms
If M shrinks, we can reduce the number of buckets, as long
as the buckets still fit in M buffers.
15.8 Algorithms Using More Than Two Passes
Reason that we use more than two passes:
Two passes are usually enough; however, for the largest
relations, we use as many passes as necessary.
Multipass Sort-based Algorithms
Suppose we have M main-memory buffers available to
sort a relation R, which we assume is stored clustered.
Then we do the following:
15.8 Algorithms Using More Than Two Passes
BASIS:
If R fits in M blocks (i.e., B(R) <= M):
1. Read R into main memory.
2. Sort it using any main-memory sorting algorithm.
3. Write the sorted relation to disk.
INDUCTION:
If R does not fit into main memory:
1. Partition the blocks holding R into M groups, which
we shall call R1, R2, ..., RM.
2. Recursively sort Ri for each i = 1, 2, ..., M.
3. Merge the M sorted sublists.
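A small Python sketch of this recursion, assuming in-memory lists of
records stand in for disk-resident blocks and that M sorted runs can be
merged at once (heapq.merge plays the role of the M-way merge):

    import heapq

    def multipass_sort(R, M, fits_in_memory):
        """Recursive multiway merge sort in the style of Section 15.8.
        R: list of records; M: number of available buffers;
        fits_in_memory(R): True if R fits in M blocks of main memory."""
        if fits_in_memory(R):                  # BASIS: sort in memory
            return sorted(R)
        # INDUCTION: partition into M groups, sort recursively, merge.
        size = (len(R) + M - 1) // M
        sublists = [multipass_sort(R[i:i + size], M, fits_in_memory)
                    for i in range(0, len(R), size)]
        return list(heapq.merge(*sublists))

    # Example: multipass_sort(list(range(100, 0, -1)), M=4,
    #                         fits_in_memory=lambda r: len(r) <= 10)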
15.8 Algorithms Using More Than Two Passes
If we are not merely sorting R, but performing a unary
operation such as δ or γ on R, we can modify the above so that
at the final merge we perform the operation on the tuples at
the front of the sorted sublists.
That is:
For a δ, output one copy of each distinct tuple, and skip
over copies of the tuple.
For a γ, sort on the grouping attributes only, and
combine the tuples with a given value of these grouping
attributes.
15.8 Algorithms Using More Than Two Passes
Conclusion
The two-pass algorithms based on sorting or hashing
have natural recursive analogs that take three or more
passes and will work for larger amounts of data.
The Query Compiler
16.1 Parsing and Preprocessing
16.1 Parsing and Preprocessing - Query
compilation is divided into three steps
 1. Parsing: parse the SQL query into a parse tree.
 2. Logical query plan: transform the parse tree into an
expression tree of relational algebra.
 3. Physical query plan: transform the logical query
plan into a physical query plan, which indicates:
 the operations performed
 the order of operations
 the algorithm used for each operation
 the way in which stored data is obtained and
passed from one operation to another
16.1.1 Syntax Analysis and Parse Tree
The parser takes the SQL query and converts it to a parse
tree. Nodes of the parse tree:
1. Atoms: lexical elements such as keywords, constants,
parentheses, operators, and other schema elements.
2. Syntactic categories: subparts that play a
similar role in a query, such as <Query> and <Condition>.
16.1.2.Grammar for Simple Subset of SQL
<Query> ::= <SFW>
<Query> ::= (<Query>)
<SFW> ::= SELECT <SelList> FROM <FromList> WHERE <Condition>
<SelList> ::= <Attribute>,<SelList>
<SelList> ::= <Attribute>
<FromList> ::= <Relation>, <FromList>
<FromList> ::= <Relation>
<Condition> ::= <Condition> AND <Condition>
<Condition> ::= <Tuple> IN <Query>
<Condition> ::= <Attribute> = <Attribute>
<Condition> ::= <Attribute> LIKE <Pattern>
<Tuple> ::= <Attribute>
Notation: atoms (constants), <syntactic categories> (variables),
::= (can be expressed/defined as)
16.1 Query and Parse Tree
StarsIn(title,year,starName)
MovieStar(name,address,gender,birthdate)
Query:
Give titles of movies that have at least one star born in
1960
SELECT title FROM StarsIn WHERE starName IN
(
SELECT name FROM MovieStar WHERE
birthdate LIKE '%1960%'
);
16.1 Query and Parse Tree
(Figure: parse tree for the query above.)
16.1.3 The Preprocessor
Functions of the Preprocessor
. If a relation used in the query is a virtual view, then each use
of this relation in the from-list must be replaced by a parse tree
that describes the view.
. It is also responsible for semantic checking:
1. Checks relation uses: every relation mentioned in the FROM-clause
must be a relation or a view in the current schema.
2. Checks and resolves attribute uses: every attribute
mentioned in the SELECT or WHERE clause must be an attribute
of some relation in the current scope.
3. Checks types: all attributes must be of a type appropriate
to their uses.
16.1.4 Preprocessing Queries Involving Views
When an operand in a query is a virtual view, the
preprocessor needs to replace the operand by a piece of
parse tree that represents how the view is constructed
from base tables.
Base table: Movies(title, year, length, genre, studioName,
producerC#)
View definition:
CREATE VIEW ParamountMovies AS
SELECT title, year FROM Movies
WHERE studioName = 'Paramount';
Example query based on the view:
SELECT title FROM ParamountMovies WHERE year = 1979;
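After expansion, the view use behaves as if the query were written over
the base table; in relational-algebra terms, a sketch of what the
preprocessor produces is:
πtitle(σyear=1979(πtitle,year(σstudioName='Paramount'(Movies))))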
16.2 Algebraic Laws For Improving Query Plans
16.2 Optimizing the Logical Query Plan
 The translation rules converting a parse tree to a logical
query tree do not always produce the best logical query
tree.
 It is often possible to optimize the logical query tree by
applying relational algebra laws to convert the original
tree into a more efficient logical query tree.
 Optimizing a logical query tree using relational algebra
laws is called heuristic optimization
16.2 Relational Algebra Laws
These laws often involve the properties of:
 commutativity - operator can be applied to operands
independent of order.
 E.g. A + B = B + A - The “+” operator is commutative.
 associativity - operator is independent of operand
grouping.
 E.g. A + (B + C) = (A + B) + C - The “+” operator is
associative.
16.2 Associative and Commutative Operators
 The relational algebra operators of cross-product (×), join
(⋈), union (∪), and intersection (∩) are all associative and
commutative.
Commutative            Associative
R × S = S × R          (R × S) × T = R × (S × T)
R ⋈ S = S ⋈ R          (R ⋈ S) ⋈ T = R ⋈ (S ⋈ T)
R ∪ S = S ∪ R          (R ∪ S) ∪ T = R ∪ (S ∪ T)
R ∩ S = S ∩ R          (R ∩ S) ∩ T = R ∩ (S ∩ T)
16.2 Laws Involving Selection
 Complex selections involving AND or OR can be broken
into two or more selections (splitting laws):
 σC1 AND C2(R) = σC1(σC2(R))
 σC1 OR C2(R) = σC1(R) ∪S σC2(R)
Example
 R = {a,a,b,b,b,c}
 p1 satisfied by a,b; p2 satisfied by b,c
 σp1 OR p2(R) = {a,a,b,b,b,c}
 σp1(R) = {a,a,b,b,b}
 σp2(R) = {b,b,b,c}
 σp1(R) ∪ σp2(R) = {a,a,b,b,b,c}
16.2 Laws Involving Selection (contd.)
 Selection is pushed through both arguments for union:
 σC(R ∪ S) = σC(R) ∪ σC(S)
 Selection is pushed to the first argument and optionally
the second for difference:
 σC(R - S) = σC(R) - S
 σC(R - S) = σC(R) - σC(S)
 All other operators require selection to be pushed to only
one of the arguments.
 For joins, we may not be able to push selection to both
arguments if an argument does not have the attributes the
selection requires.
 σC(R × S) = σC(R) × S
 σC(R ∩ S) = σC(R) ∩ S
 σC(R ⋈ S) = σC(R) ⋈ S
 σC(R ⋈D S) = σC(R) ⋈D S
16.2 Laws Involving Selection (contd.)
Example
Consider relations R(a,b) and S(b,c) and the expression
 σ(a=1 OR a=3) AND b<c (R ⋈ S)
Pushing the selection down in steps:
 σa=1 OR a=3(σb<c(R ⋈ S))
 σa=1 OR a=3(R ⋈ σb<c(S))
 σa=1 OR a=3(R) ⋈ σb<c(S)
16.2 Laws Involving Projection
 Like selections, it is also possible to push projections down
the logical query tree. However, the performance gain is less
than for selections, because projections just reduce the number
of attributes instead of reducing the number of tuples.
 Laws for pushing projections through joins:
 πL(R × S) = πL(πM(R) × πN(S))
 πL(R ⋈ S) = πL(πM(R) ⋈ πN(S))
 πL(R ⋈D S) = πL(πM(R) ⋈D πN(S))
 (M and N are the attributes of R and S, respectively,
that are needed for L or for the join.)
 There are corresponding laws for pushing projections through
set operations:
 Projection can be performed entirely before a bag union:
 πL(R ∪B S) = πL(R) ∪B πL(S)
 Projection can be pushed below selection as long as we
also keep all attributes needed for the selection (M = L ∪
attr(C)):
 πL(σC(R)) = πL(σC(πM(R)))
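For instance (a small worked example, not from the slides): with
R(a,b,c), πa(σb<10(R)) = πa(σb<10(πa,b(R))); here L = {a} and
attr(C) = {b}, so M = {a,b} and column c can be dropped before the
selection is applied.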
16.2 Laws Involving Joins
 We have previously seen these important rules about joins:
 Joins are commutative and associative.
 Selection can be distributed into joins.
 Projection can be distributed into joins.
16.2 Laws Involving Duplicate Elimination
 The duplicate-elimination operator (δ) can be pushed
through many operators, but not all.
 Example: if R has two copies of tuple t and S has one copy of t:
 δ(R ∪ S) = one copy of t
 δ(R) ∪ δ(S) = two copies of t
 Laws for pushing the duplicate-elimination operator (δ):
 δ(R × S) = δ(R) × δ(S)
 δ(R ⋈ S) = δ(R) ⋈ δ(S)
 δ(R ⋈D S) = δ(R) ⋈D δ(S)
 δ(σC(R)) = σC(δ(R))
 δ can also be pushed through bag intersection, but not through
union, difference, or projection in general:
 δ(R ∩ S) = δ(R) ∩ δ(S)
16.2 Laws Involving Grouping
 The laws for the grouping operator (γ) depend on the aggregate
operators used.
 There is one general rule, however: grouping subsumes duplicate
elimination, since γL produces no duplicate tuples:
 δ(γL(R)) = γL(R)
 When considering whether δ can be pushed below γ, note that some
aggregate functions are unaffected by duplicates (MIN and MAX),
while others are affected (SUM, COUNT, and AVG).
16.3 From Parse Trees to Logical Query Plans - Review
Query → Parser → Preprocessor (Section 16.1) →
Logical query plan generator → Query rewriter (Section 16.3) →
Preferred logical query plan
16.3 Two Steps to Turn a Parse Tree
into a Preferred Logical Query Plan
 Replace the nodes and structures of the parse tree, in
appropriate groups, by an operator or operators of
relational algebra.
 Take the relational-algebra expression and turn it into an
expression that we expect can be converted to the most
efficient physical query plan.
16.3 Conversion to Relational Algebra
 If we have a <Query> with a <Condition> that has no
subqueries, then we may replace the entire construct –
the select-list, from-list, and condition – by a
relational-algebra expression.
 The relational-algebra expression consists of the
following, from bottom to top:
 The product of all the relations mentioned in the
<FromList>, which is the argument of:
 A selection σC, where C is the <Condition> expression
in the construct being replaced, which in turn is the
argument of:
 A projection πL, where L is the list of attributes in the
<SelList>.
16.3 A Query: Example
 SELECT movieTitle
FROM StarsIn, MovieStar
WHERE starName = name AND
birthdate LIKE '%1960';
16.3 Parse Tree
(Figure: the parse tree for the query above.)
16.3 Translation to an Algebraic Expression Tree
(Figure: the corresponding relational-algebra expression tree.)
16.3 Removing Subqueries From Conditions
 For parse trees with a <Condition> that has a subquery,
we use an intermediate operator – a two-argument selection.
 It is intermediate between the syntactic categories of
the parse tree and the relational-algebra operators that
apply to relations.
16.3 Using a Two-Argument σ
(Figure: πmovieTitle at the top; below it a two-argument σ whose left
child is StarsIn and whose right child is the <Condition>:
<Tuple> (<Attribute> starName) IN πname(σbirthdate LIKE '%1960'(MovieStar)).)
16.3 Two-Argument Selection with a Condition Involving IN
 Now suppose we have a two-argument selection whose first
argument is some relation R and whose second argument is a
<Condition> of the form t IN S, where:
 't' – a tuple composed of some attributes of R
 'S' – an uncorrelated subquery
 Steps to be followed (see the figure below):
 Replace the <Condition> by the tree that is the
expression for S (δ is used to remove duplicates).
 Replace the two-argument selection by a one-argument
selection σC, where C equates t to the corresponding
tuple of S.
 Give σC an argument that is the product of R and S.
16.3 Two-Argument Selection with a Condition Involving IN
(Figure: the transformation – a two-argument σ whose arguments are R
and the <Condition> 't IN S' becomes a one-argument σC applied to
the product R × δ(S).)
16.3 The Effect
(Figure: the rewritten expression tree after the transformation.)
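Applied to the running example, the result is (a sketch):
πmovieTitle(σstarName=name(StarsIn × δ(πname(σbirthdate LIKE '%1960'(MovieStar)))))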
16.3 Improving the Logical Query Plan
 Algebraic laws used to improve logical query plans:
 Selections can be pushed down the expression tree
as far as they can go.
 Similarly, projections can be pushed down the tree,
or new projections can be added.
 Duplicate eliminations can sometimes be removed, or
moved to a more convenient position in the tree.
 Certain selections can be combined with a product
below to turn the pair of operations into an equijoin.
16.3 Grouping Associative/Commutative Operators
 An operator that is associative and commutative may be
thought of as having any number of operands.
 We need to reorder these operands so that a multiway
join is executed as a sequence of binary joins.
 Executing them in the order suggested by the parse tree
may be more time-consuming.
 For each portion of the subtree that consists of nodes with
the same associative and commutative operator (natural
join, union, or intersection), we group those nodes
into a single node with many children.
16.3 The Effect of Query Rewriting
(Figure: the improved plan – πmovieTitle over the join
StarsIn ⋈(starName = name) σbirthdate LIKE '%1960'(MovieStar).)
16.3 Final Step in Producing the Logical Query Plan
(Figure: a cascade of binary unions over R, S, T, V, W is grouped into
a single union node with children R, S, T, V, W.)
16.3 An Example to Summarize
 "Find movies where the average age of the stars was at
most 40 when the movie was made."
 SELECT DISTINCT m1.movieTitle, m1.movieYear
FROM StarsIn m1
WHERE m1.movieYear - 40 <= (
    SELECT AVG(birthdate)
    FROM StarsIn m2, MovieStar s
    WHERE m2.starName = s.name AND
          m1.movieTitle = m2.movieTitle AND
          m1.movieYear = m2.movieYear
);
16.3 An Example to Summarize (cont.)
(Figures: successive transformations of the example's plan –
selections combined with a product to turn the pair of operations into
an equijoin, and conditions pushed up the expression tree.)
16.4 Estimating the Cost of Operations
 After arriving at the logical query plan, we turn it into a
physical plan.
 We consider all the possible physical plans and estimate their
costs – this evaluation is known as cost-based enumeration.
 The plan with the least estimated cost is the one selected to
be passed to the query-execution engine.
16.4 Selections for Each Physical Plan
 We select for each physical plan:
 An order and grouping for associative-and-commutative
operations like joins, unions, and intersections.
 An algorithm for each operator in the logical plan, for
instance, deciding whether a nested-loop join or a
hash join should be used.
 Additional operators – scanning, sorting, etc. – that
are needed for the physical plan but were not
present explicitly in the logical plan.
 The way in which arguments are passed from one
operator to the next.
16.4 Estimating Sizes of Intermediate Relations
Rules for estimating the sizes of intermediate relations should:
1. Give accurate estimates.
2. Be easy to compute.
3. Be logically consistent; that is, the size estimate for an
intermediate relation should not depend on how that
relation is computed.
16.4 Estimating the Size of a Projection
 We treat a classical, duplicate-eliminating projection as a
bag projection.
 The size of the result can be computed exactly.
 There may be a reduction in size (due to eliminated
components) or an increase in size (due to new components
created as combinations of attributes).
16.4 Estimating the Size of a Selection
 A selection may reduce the number of tuples, but the sizes
of the tuples remain the same.
 For an equality selection
S = σA=c(R),
where A is an attribute of R and c is a constant,
 the recommended estimate is
T(S) = T(R)/V(R,A)
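For instance (an illustrative calculation, not from the slides): if
T(R) = 10,000 and V(R,A) = 50 distinct values of A, then T(σA=c(R))
is estimated as 10,000/50 = 200 tuples.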
16.4 Estimating Sizes of Other Operations
 Union
 Intersection
 Difference
 Duplicate Elimination
 Grouping and Aggregation
16.6 Choosing an Order for Joins - Introduction
 This section focuses on a critical problem in cost-based
optimization: selecting an order for the natural join of
three or more relations.
 Compared to other binary operations, joins take more
time and therefore need effective optimization techniques.
16.6 Significance of Left and Right Join Arguments
 The argument relations in a join determine the cost of the join.
 The left argument of the join is:
 Called the build relation
 Assumed to be smaller
 Stored in main memory
 The right argument of the join is:
 Called the probe relation
 Read a block at a time
 Its tuples are matched with those of the build relation
 Join algorithms that distinguish between the arguments include:
 One-pass join
 Nested-loop join, index join
16.6 Join Trees
 The order of arguments is important when joining two relations.
 The left argument, since it is stored in main memory, should be
smaller.
 With two relations there are only two choices of join tree.
 With more than two relations, there are n! ways to order
the arguments, and therefore many join trees, where n is the
number of relations.
16.6 Join Trees
 The total number of tree shapes T(n) for n relations is given
by the recurrence:
 T(1) = 1
 T(n) = Σ(i=1 to n-1) T(i) × T(n-i)
 giving T(1) = 1, T(2) = 1, T(3) = 2, T(4) = 5, ... etc.
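A quick sanity check of the recurrence (illustrative Python):

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def tree_shapes(n):
        # T(1) = 1; T(n) = sum of T(i)*T(n-i) for i = 1..n-1
        if n == 1:
            return 1
        return sum(tree_shapes(i) * tree_shapes(n - i)
                   for i in range(1, n))

    print([tree_shapes(n) for n in range(1, 6)])  # [1, 1, 2, 5, 14]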
16.6 Left-Deep Join Trees
 Consider 4 relations. Example join trees are shown below.
(Figure: join trees for four relations – (a) a left-deep tree,
(b) a bushy tree, (c) a right-deep tree.)
 In fig (a) all the right children are leaves. This is a
left-deep tree.
 In fig (c) all the left children are leaves. This is a
right-deep tree.
 Fig (b) is a bushy tree.
 Considering only left-deep trees is advantageous when deciding
join orders.
16.6 Join Order
 Join order selection for relations A1, A2, A3, ..., An:
 Left-deep join trees: (...((A1 ⋈ A2) ⋈ A3) ... ⋈ An)
 Dynamic programming:
 The best plan is computed for each subset of the relations.
 Best plan(A1, ..., An) = min-cost plan among:
 Best plan(A2, ..., An) ⋈ A1
 Best plan(A1, A3, ..., An) ⋈ A2
 ...
 Best plan(A1, ..., An-1) ⋈ An
16.6 Dynamic Programming to Select a Join Order and Grouping
 Three choices for picking an order for the join of many
relations:
 Consider all of the orders
 Consider a subset
 Use a heuristic to pick one
 Dynamic programming is used either to consider all orders or
a subset:
 Construct a table of costs based on relation sizes.
 Remember only the minimum entry, which is all that is
required to proceed.
16.6 Dynamic Programming to Select a Join Order and Grouping
(Figures: dynamic-programming cost tables for selecting a join
order and grouping.)
16.6 A Greedy Algorithm for Selecting a Join Order
 It is expensive to use an exhaustive method like
dynamic programming.
 A better approach is to use a join-order heuristic for
query optimization.
 A greedy algorithm is an example: make one decision at a time
about the order of joins, and never backtrack on a decision
once made (a sketch follows below).
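A sketch of the greedy heuristic in Python, assuming a user-supplied
size-estimation function est_size(joined_so_far, relation); all names
here are illustrative, not from the slides:

    def greedy_join_order(relations, est_size):
        """Pick a left-deep join order: at each step, join in the
        relation that yields the smallest estimated result."""
        remaining = set(relations)
        # Start with the pair whose join is estimated smallest.
        first = min(((a, b) for a in remaining for b in remaining if a != b),
                    key=lambda p: est_size([p[0]], p[1]))
        order = [first[0], first[1]]
        remaining -= {first[0], first[1]}
        while remaining:
            nxt = min(remaining, key=lambda r: est_size(order, r))
            order.append(nxt)            # greedy choice, never revisited
            remaining.remove(nxt)
        return order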
16.7 Completing the Physical Query Plan
 Three topics related to turning the logical plan into a
complete physical plan:
 Choosing physical implementations, such as
selection and join methods
 Decisions regarding intermediate results
(materialized or pipelined)
 Notation for physical-query-plan operators
16.7 I. Choosing a Selection Method (A)
 Algorithms for each selection operator:
 1. Can we use an existing index on an attribute?
If yes, index-scan; otherwise, table-scan.
 2. After retrieving all the tuples satisfying the indexed
condition in (1), filter them with the rest of the
selection conditions.
16.7 Choosing a Selection Method (A) (cont.)
How costs for various plans are estimated for a σC(R) operation:
1. Cost of the table-scan algorithm:
   a) B(R) if R is clustered
   b) T(R) if R is not clustered
2. Cost of a plan picking an equality term (e.g., a = 10) with an
   index-scan:
   a) B(R) / V(R,a) for a clustering index
   b) T(R) / V(R,a) for a nonclustering index
3. Cost of a plan picking an inequality term (e.g., b < 20) with an
   index-scan:
   a) B(R) / 3 for a clustering index
   b) T(R) / 3 for a nonclustering index
16.7 Example
Selection: σx=1 AND y=2 AND z<5 (R)
- where the parameters of R(x,y,z) are:
  T(R) = 5000, B(R) = 200, V(R,x) = 100, and V(R,y) = 500
- Relation R is clustered.
- x and y have nonclustering indexes; only the index on z is
  clustering.
16.7 Example (cont.)
Selection options:
1. Table-scan, then filter on x, y, z. Cost is B(R) = 200, since R
   is clustered.
2. Use the index on x = 1, then filter on y and z. Cost is 50, since
   T(R)/V(R,x) = 5000/100 = 50 tuples and the index is not clustering.
3. Use the index on y = 2, then filter on x and z. Cost is 10, since
   T(R)/V(R,y) = 5000/500 = 10 tuples using a nonclustering index.
4. Index-scan on the clustering index with z < 5, then filter on
   x and y. Cost is about B(R)/3 = 67.
16.7 Example (cont.)
 Costs: option 1 = 200; option 2 = 50; option 3 = 10; option 4 = 67.
 The lowest cost is option 3.
 Therefore, the preferred physical plan:
 retrieves all tuples with y = 2,
 then filters for the remaining two conditions (x and z).
16.7 II. Choosing a Join Method
 Determine the costs associated with each join algorithm:
1. A one-pass join or a nested-loop join is chosen when enough
buffers can be devoted to the join.
2. A sort-join is preferred when the attributes are pre-sorted,
or when two or more joins are on the same attribute, such as
 (R(a,b) ⋈ S(a,c)) ⋈ T(a,d)
- where sorting R and S on a produces a result of R ⋈ S that is
sorted on a and can be used directly in the next join.
3. An index-join is used for a join with a high chance of using an
index created on the join attribute, such as R(a,b) ⋈ S(b,c).
4. A hash join is the best choice for unsorted, non-indexed
relations that need a multipass join.
16.7 III. Pipelining Versus Materialization
 Materialization (the naive way):
 Store the (intermediate) result of each operation on disk.
 Pipelining (the more efficient way):
 Interleave the execution of several operations; the
tuples produced by one operation are passed directly to
the operation that uses them.
 Store the (intermediate) result of each operation in buffers,
which are implemented in main memory.
16.7 IV. Pipelining Unary Operations
 Unary operations work a tuple at a time or on the full relation.
 Selection and projection are the best candidates for pipelining.
(Figure: each unary operation reads from an input buffer and writes
to an output buffer; R is passed through a cascade of unary
operations using M-1 buffers.)
16.7 Pipelining Unary Operations (cont.)
 Pipelined unary operations are implemented by iterators.
16.7 V. Pipelining Binary Operations
 Binary operations: ∪, ∩, −, ⋈, ×
 The results of binary operations can also be pipelined.
 Use one buffer to pass the result to its consumer, one block
at a time.
 The extended example below shows the tradeoffs and opportunities.
16.7 Example
 Consider a physical query plan for the expression
 (R(w,x) ⋈ S(x,y)) ⋈ U(y,z)
 Assumptions:
 R occupies 5,000 blocks; S and U each occupy 10,000 blocks.
 The intermediate result R ⋈ S occupies k blocks for some k.
 Both joins will be implemented as hash joins, either
one-pass or two-pass depending on k.
 There are 101 buffers available.
16.7 Example (cont.)
 First consider the join R ⋈ S; neither relation fits in the
buffers.
 We need a two-pass hash join: partition R into 100 buckets
(the maximum possible), each bucket having 50 blocks.
 The second pass of the hash join uses 51 buffers, leaving 50
buffers for joining the result of R ⋈ S with U.
16.7 Example (cont.)
 Case 1: suppose k ≤ 49, so the result of R ⋈ S occupies
at most 49 blocks.
 Steps:
1. Pipeline R ⋈ S into 49 buffers.
2. Organize its tuples for lookup as a hash table.
3. Use the one remaining buffer to read each block of U in turn.
4. Execute the second join as a one-pass join.
16.7 Example (cont.)
 The total number of I/O's is 55,000:
 45,000 for the two-pass hash join of R and S
 10,000 to read U for the one-pass hash join of
(R ⋈ S) ⋈ U
16.7 Example (cont.)
 Case 2: suppose k > 49 but k < 5,000. We can still
pipeline, but we need another strategy: the intermediate
result joins with U in a 50-bucket, two-pass hash join.
 Steps:
1. Before starting on R ⋈ S, hash U into 50 buckets of 200
blocks each.
2. Perform the two-pass hash join of R and S using 51 buffers
as in case 1, placing the results in the 50 remaining buffers
to form 50 buckets for the join of R ⋈ S with U.
3. Finally, join R ⋈ S with U bucket by bucket.
16.7 Example (cont.)
 The number of disk I/O's is:
 20,000 to read U and write its tuples into buckets
 45,000 for the two-pass hash join R ⋈ S
 k to write out the buckets of R ⋈ S
 k + 10,000 to read the buckets of R ⋈ S and U in the
final join
 The total cost is 75,000 + 2k.
16.7 Example (cont.)
 Compare the increase in I/O's between case 1 and case 2:
 k ≤ 49 (case 1): 55,000 disk I/O's
 49 < k < 5,000 (case 2):
 k = 50: I/O's = 75,000 + (2×50) = 75,100
 k = 51: I/O's = 75,000 + (2×51) = 75,102
 k = 52: I/O's = 75,000 + (2×52) = 75,104
Notice: the I/O count jumps discontinuously as k increases
from 49 to 50.
16.7 Example (cont.)
 Case 3: suppose k > 5,000. We cannot perform the two-pass
join in the 50 available buffers if the result of R ⋈ S is
pipelined. Steps:
1. Compute R ⋈ S using a two-pass join and store the result
on disk.
2. Join the result of (1) with U, using another two-pass join.
 The number of disk I/O's is:
 45,000 for the two-pass hash join of R and S
 k to store R ⋈ S on disk
 30,000 + 3k for the two-pass join of R ⋈ S with U
 The total cost is 75,000 + 4k.
16.7 Example (cont.)
 In summary, the cost of the physical plan as a function of k,
the size of R ⋈ S:
 k ≤ 49: 55,000; 49 < k ≤ 5,000: 75,000 + 2k;
k > 5,000: 75,000 + 4k.
16.7 VI. Notation for Physical Query Plans
Several types of operators:
1. Operators for leaves
2. (Physical) operators for selection
3. (Physical) sort operators
4. Other relational-algebra operations
 In practice, each DBMS uses its own internal notation
for physical query plans.
16.7 Notation for Physical Query Plans (cont.)
1. Operators for leaves
 A leaf operand R is replaced in the LQP tree by:
 TableScan(R): read all blocks of R
 SortScan(R, L): read R in order according to the attribute
list L
 IndexScan(R, C): scan R via an index on attribute A, for a
condition C of the form Aθc
 IndexScan(R, A): scan R via an index on attribute R.A; this
behaves like TableScan but may be more efficient if R is not
clustered.
16.7 Notation for Physical Query Plans (cont.)
2. (Physical) operators for selection
 The logical operator σC(R) is often combined with access
methods.
 If σC(R) is replaced by Filter(C), and there is no
index on R or on an attribute mentioned in condition C:
 Use TableScan(R) or SortScan(R, L) to access R.
 If condition C can be written as Aθc AND D for some
condition D, and there is an index on R.A, then we may:
 Use the operator IndexScan(R, Aθc) to access R, and
 Use Filter(D) in place of the selection σC(R).
16.7 Notation for Physical Query Plans (cont.)
3. (Physical) sort operators
 Sorting can occur at any point in the physical plan; a stored
relation sorted on retrieval uses the notation SortScan(R, L).
 It is common to use an explicit operator Sort(L) to
sort a relation that is not stored.
 Sort can be applied at the top of the physical-query-plan
tree if the result needs to be sorted because of an
ORDER BY clause (τ).
16.7 Notation for Physical Query Plans (cont.)
4. Other relational-algebra operations
 Descriptive text and annotations elaborate:
 The operation performed, e.g., join or grouping.
 Necessary parameters, e.g., the condition of a
theta-join or the list of elements in a grouping.
 A general strategy for the algorithm, e.g., sort-based,
hash-based, or index-based.
 A decision about the number of passes to be used,
e.g., one-pass, two-pass, or multipass.
 An anticipated number of buffers the operation
will require.
16.7 Notation for Physical Query Plans (cont.)
 Example of a physical query plan:
 (Figure: the physical query plan of Example 16.36 for the case
k > 5,000 – TableScans, two-pass hash joins, materialization
(double line), and a Store operator.)
 Another example:
 (Figure: the physical query plan of Example 16.36 for the case
k ≤ 49 – TableScans, two two-pass hash joins, pipelining, different
buffer needs, and a Store operator.)
 A third example:
 (Figure: the physical query plan of Example 16.35 – use the index
on the condition y = 2 first, then Filter with the remaining
conditions.)
16.7 VII. Ordering of Physical Operations
 The PQP is represented as a tree structure with an implied
order of operations.
 Still, the order of evaluation of interior nodes may not
always be clear:
 Iterators are used in a pipelined manner.
 The overlapped execution of various nodes can make a
strict "ordering" meaningless.
16.7 Ordering of Physical Operations (cont.)
 Three rules summarize the ordering of events in a PQP tree:
 Break the tree into subtrees at each edge that
represents materialization; execute one subtree at a time.
 Order the execution of the subtrees bottom-up and
left-to-right.
 All nodes of each subtree are executed simultaneously.
16.8 COMPILATION OF QUERIES
 Compilation means turning a query into a physical query
plan, which can be implemented by the query engine.
 Steps of query compilation:
 Parsing
 Semantic checking
 Selection of the preferred logical query plan
 Generating the best physical plan
16.8 THE PARSER
 The first step of SQL query processing.
 Generates a parse tree.
 Nodes in the parse tree correspond to SQL constructs.
 Similar to the compiler of a programming language.
16.8 VIEW EXPANSION
 A very critical part of query compilation.
 Expands the view references in the query tree to the
actual view definition.
 Provides opportunities for query optimization.
16.8 SEMANTIC CHECKING
 Checks the semantics of a SQL query.
 Examines a parse tree.
 Checks:
 Attributes
 Relation names
 Types
 Resolves attribute references.
16.8 CONVERSION TO A LOGICAL QUERY PLAN
 Converts a semantically checked parse tree to an algebraic
expression.
 The conversion is straightforward, but subqueries need
special handling.
 The two-argument selection approach can be used.
16.8 ALGEBRAIC TRANSFORMATION
 There are many different ways to transform a logical query
plan using algebraic transformations.
 The laws used for these transformations:
 Commutative and associative laws
 Laws involving selection
 Pushing selections
 Laws involving projection
 Laws about joins and products
 Laws involving duplicate elimination
 Laws involving grouping and aggregation
16.8 ESTIMATING SIZES OF RELATIONS
 True running time is taken into consideration when
selecting the best logical plan.
 Two factors affect size estimates the most:
 The sizes of the relations (number of tuples)
 The number of distinct values for each attribute of each
relation
 Histograms are used by some systems.
16.8 COST-BASED OPTIMIZING
 The best physical query plan is the least costly plan.
 Factors that decide the cost of a query plan:
 The order and grouping of operations like joins, unions,
and intersections.
 Whether nested-loop or hash joins are used.
 Scanning and sorting operations.
 Storing intermediate results.
16.8 PLAN ENUMERATION STRATEGIES
 Common approaches for searching the space of physical plans:
 Dynamic programming: tabulating the best plan for each
subexpression.
 Selinger-style optimization: like dynamic programming, but
also keeping plans whose results have a useful sort order.
 Greedy approaches: making a series of locally optimal
decisions.
 Branch-and-bound: using the cost of a known plan as a
bound, and pruning enumerated plans that cannot beat it.
16.8 LEFT-DEEP JOIN TREES
 Left-deep join trees are binary trees with a single
spine down the left edge and with leaves as right children.
 This strategy reduces the number of plans to be
considered for the best physical plan.
 Restrict the search to left-deep join trees when
picking a grouping and order for the join of several
relations.
16.8 PHYSICAL PLANS FOR SELECTION
 A selection can be broken into an index-scan of the relation
followed by a filter operation.
 The filter then examines the tuples retrieved by the
index-scan.
 It allows only those tuples to pass that meet the remaining
portions of the selection condition.
16.8 PIPELINING VERSUS MATERIALIZING
 An operator that consumes the result of another operator can
receive it directly through main memory; controlling this flow
of data between operators implements "pipelining".
 Alternatively, intermediate results can be written to disk to
free main memory for other operators; this technique is
"materialization".
 Both pipelining and materialization should be considered by
the physical-query-plan generator.
Chapter -18
Concurrency Control
18.1 Concurrency Control
 Concurrency control in database management systems
(DBMS) ensures that database transactions are performed
concurrently without violating the data integrity of the
database.
 Executed transactions should follow the ACID rules. The
DBMS must guarantee that only serializable (unless
serializability is intentionally relaxed), recoverable
schedules are generated.
 It also guarantees that no effect of committed transactions
is lost, and no effect of aborted (rolled back) transactions
remains in the database.
18.1.Transaction ACID rules
Atomicity - Either the effects of all or none of its operations
remain when a transaction is completed - in other words, to
the outside world the transaction appears to be indivisible,
atomic.
Consistency - Every transaction must leave the database in a
consistent state.
Isolation - Transactions cannot interfere with each other.
Providing isolation is the main goal of concurrency control.
Durability - Successful transactions must persist through
crashes.
18.1 Serial and Serializable Schedules
 In the field of databases, a schedule is a list of actions
(i.e., reading, writing, aborting, committing) from a set of
transactions.
 In this example, Schedule D is the set of 3 transactions
T1, T2, T3. The schedule describes the actions of the
transactions as seen by the DBMS.
 T1 reads and writes object X, then T2 reads and writes
object Y, and finally T3 reads and writes object Z. This is
an example of a serial schedule, because the actions of the
3 transactions are not interleaved.
18.1.Serial and Serializable Schedules
 A schedule that is equivalent to a serial schedule has the
serializability property.
 In schedule E, the order in which the actions of the
transactions are executed is not the same as in D, but in
the end, E gives the same result as D.
Serial schedule, T1 precedes T2 (A = B = 25 initially):
T1: Read(A); A := A+100; Write(A);   A = 125
T1: Read(B); B := B+100; Write(B);   B = 125
T2: Read(A); A := A*2;   Write(A);   A = 250
T2: Read(B); B := B*2;   Write(B);   B = 250

Serial schedule, T2 precedes T1 (A = B = 25 initially):
T2: Read(A); A := A*2;   Write(A);   A = 50
T2: Read(B); B := B*2;   Write(B);   B = 50
T1: Read(A); A := A+100; Write(A);   A = 150
T1: Read(B); B := B+100; Write(B);   B = 150

A serializable, but not serial, schedule (A = B = 25 initially):
T1: Read(A); A := A+100; Write(A);   A = 125
T2: Read(A); A := A*2;   Write(A);   A = 250
T1: Read(B); B := B+100; Write(B);   B = 125
T2: Read(B); B := B*2;   Write(B);   B = 250
r1(A); w1(A); r2(A); w2(A); r1(B); w1(B); r2(B); w2(B);

A nonserializable schedule (A = B = 25 initially):
T1: Read(A); A := A+100; Write(A);   A = 125
T2: Read(A); A := A*2;   Write(A);   A = 250
T2: Read(B); B := B*2;   Write(B);   B = 50
T1: Read(B); B := B+100; Write(B);   B = 150

A schedule that is serializable only because of the detailed
behavior of the transactions (A = B = 25 initially; T2' multiplies
by 1):
T1:  Read(A); A := A+100; Write(A);   A = 125
T2': Read(A); A := A*1;   Write(A);   A = 125
T2': Read(B); B := B*1;   Write(B);   B = 25
T1:  Read(B); B := B+100; Write(B);   B = 125
 Regardless of the consistent initial state: the final state will
be consistent.
18.Non-Conflicting Actions
Two actions are non-conflicting if whenever they occur
consecutively in a schedule, swapping them does not affect
the final state produced by the schedule. Otherwise, they
are conflicting.
18.Conflicting Actions: General Rules
 Two actions of the same transaction conflict (their order is
fixed):
 r1(A) w1(B)
 Two actions over the same database element conflict if one of
them is a write:
 r1(A) w2(A)
 w1(A) w2(A)
18.Conflicting Actions
 Two or more actions are said to be in conflict if:
 The actions belong to different transactions,
 At least one of the actions is a write operation, and
 The actions access the same object (read or write).
 The following set of actions is conflicting:
 T1:R(X), T2:W(X), T3:W(X)
 While the following sets of actions are not:
 T1:R(X), T2:R(X), T3:R(X)
 T1:R(X), T2:W(Y), T3:R(X)
18.Conflict Serializability
 We may take any schedule and make as many nonconflicting
swaps as we wish, with the goal of turning the schedule into
a serial schedule.
 If we can do so, then the original schedule is serializable,
because its effect on the database state remains the same as
we perform each of the nonconflicting swaps.
18.Conflict Serializability
 A schedule is said to be conflict-serializable when it is
conflict-equivalent to one or more serial schedules.
 Equivalently, a schedule is conflict-serializable if and only
if its precedence (serializability) graph is acyclic.
 The example below is conflict-equivalent to the serial
schedule <T1,T2>, but not to <T2,T1>.
18.Conflict Equivalence / Conflict Serializability
 Let Ai and Aj be consecutive non-conflicting actions
that belong to different transactions. We can swap Ai
and Aj without changing the result.
 Two schedules are conflict-equivalent if they can be
turned one into the other by a sequence of non-conflicting
swaps of adjacent actions.
 We shall call a schedule conflict-serializable if it is
conflict-equivalent to a serial schedule.
18.Conflict-Serializable: Example
Starting schedule:
 r1(A); w1(A); r2(A); r1(B); w2(A); w1(B); r2(B); w2(B);
Swapping adjacent non-conflicting actions moves all of T1's actions
ahead of T2's:
 r1(A); w1(A); r1(B); r2(A); w2(A); w1(B); r2(B); w2(B);
 r1(A); w1(A); r1(B); r2(A); w1(B); w2(A); r2(B); w2(B);
 r1(A); w1(A); r1(B); w1(B); r2(A); w2(A); r2(B); w2(B);
 (the serial schedule <T1,T2>)
18.4 Locking Systems with Several Lock Modes
 In Section 18.3, a transaction must lock a database element X
whether it reads or writes X.
 But there is no reason why several transactions could not read
X at the same time, as long as none writes X.
 So we introduce locking schemes with:
 Shared/Read locks (for reading)
 Exclusive/Write locks (for writing)
18.4.1 Shared & Exclusive Locks
 Consistency of transactions:
 Cannot write without holding an exclusive lock.
 Cannot read without holding some lock.
 A lock for writing is considered "stronger" than one for
reading.
 This basically works on 2 principles:
 1. A read action can only proceed under a shared or an
exclusive lock.
 2. A write action can only proceed under an exclusive lock.
 All locks need to be unlocked before commit.
18.4.1 Shared & Exclusive Locks (cont.)
 Two-phase locking (2PL) of transactions:
 in each transaction Ti, all lock actions precede all unlock
actions (Lock → R/W → Unlock).
 Notation:
 sli(X) – Ti requests a shared lock on DB element X
 xli(X) – Ti requests an exclusive lock on DB element X
 ui(X) – Ti relinquishes whatever lock it holds on X
18.4.1 Shared & Exclusive Locks (cont.)
 Legality of schedules:
 An element may be locked exclusively by one write
transaction, or in shared mode by several read transactions,
but not both.
18.4.2 Compatibility Matrices
 A convenient way to describe lock-management policies:
 Rows correspond to a lock already held on an element by
another transaction.
 Columns correspond to the mode of lock requested.
 Example:
               Lock requested
 Lock held     S     X
 S             YES   NO
 X             NO    NO
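A minimal sketch of how a scheduler might consult such a matrix
(illustrative Python; the grant rule checks the requested mode against
every lock currently held on the element):

    # COMPAT[held][requested] is True if the request can be granted.
    COMPAT = {
        'S': {'S': True,  'X': False},
        'X': {'S': False, 'X': False},
    }

    def can_grant(requested_mode, held_modes):
        """Grant only if compatible with every currently held lock."""
        return all(COMPAT[held][requested_mode] for held in held_modes)

    # Examples: can_grant('S', ['S', 'S']) -> True
    #           can_grant('X', ['S'])      -> False

The same function works for the update- and increment-lock matrices
below by swapping in the corresponding COMPAT table.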
18.4.3 Upgrading Locks
 A transaction T taking a shared lock is friendly toward
other transactions.
 When T wants to read and then write a new value of X:
 1. T takes a shared lock on X.
 2. T performs its operations on X (which may take a long time).
 3. When T is ready to write the new value, it "upgrades" its
shared lock to an exclusive lock on X.
18.4.3 Upgrading Locks (cont.)
 (Figure: an example schedule in which T1's upgrade request on B
must wait; once the other lock on B is released, T1 retries and
succeeds.)
 T1 cannot take an exclusive lock on B until all other locks on B
are released.
18.4.3 Upgrading Locks (cont.)
 Upgrading can easily cause a deadlock:
 if two transactions both hold shared locks on the same
element and both want to upgrade, each waits for the other.
 Both transactions will wait forever!
18.4.4 Update Locks
 A third lock mode resolves this deadlock problem. Its rules are:
 Only an "update lock" can be upgraded to a write
(exclusive) lock later; a shared lock cannot.
 An "update lock" may be granted on X when there
are already shared locks on X.
 Once there is an "update lock" on X, it prevents any
additional locks from being granted, and it may later be
changed to a write (exclusive) lock.
 Notation: uli(X)
18.4.4 Update Locks (cont.)
 (Figure: an example schedule using update locks.)
18.4.4 Update Locks (cont.)
• Compatibility matrix (asymmetric):
               Lock requested
 Lock held     S     X     U
 S             YES   NO    YES
 X             NO    NO    NO
 U             NO    NO    NO
18.4.5 Increment Locks
 A useful lock mode for transactions that increase or decrease
a value,
 e.g., a money transfer between two bank accounts.
 If 2 transactions (T1, T2) add constants to the same
database element X:
 It doesn't matter which goes first, but no reads are
allowed in between the increments.
 See the following cases.
18.4.5 Increment Locks (cont.)
CASE 1: A = 5; T1: INC(A,2) yields A = 7; then T2: INC(A,10)
yields A = 17.
CASE 2: A = 5; T2: INC(A,10) yields A = 15; then T1: INC(A,2)
yields A = 17.
18.4.5 Increment Locks (cont.)
 What if both transactions read A = 5 before either writes?
 T1 computes A = 5+2 = 7 and T2 computes A = 5+10 = 15;
whichever write happens last leaves A = 7 or A = 15,
so A != 17 and one increment is lost.
18.4.5 Increment Locks (cont.)
 INC(A,c) –
 an increment action writing on database element A,
which is an atomic execution consisting of:
 1. READ(A,t);
 2. t := t + c;
 3. WRITE(A,t);
 Notation:
 ili(X) – the action of Ti requesting an increment lock on X
 inci(X) – the action of Ti incrementing X by some constant;
we don't care about the value of the constant.
18.4.5 Increment Locks (cont.)
 (Figure: an example schedule using increment locks.)
18.4.5 Increment Locks (cont.)
 Compatibility matrix:
               Lock requested
 Lock held     S     X     I
 S             YES   NO    NO
 X             NO    NO    NO
 I             NO    NO    YES
18.5 Concurrency Control – Scheduler Architecture
 A simple scheduler architecture is based on the following
principles:
 Insert lock actions into the stream of reads, writes,
and other actions.
 Release locks when the transaction manager tells the
scheduler that the transaction will commit or abort.
18.5 Scheduler That Inserts Lock Actions into the Transactions'
Request Stream
 (Figure: a two-part scheduler that inserts lock actions into the
request stream.)
18.5 Scheduler That Inserts Lock Actions
The scheduler has two parts:
 Part I: takes the stream of requests generated by the
transactions and inserts appropriate lock actions ahead of all
database-access operations (read, write, or update).
 Part II: takes the actions (lock requests or database
operations) passed from Part I and executes them.
It determines the transaction T to which an action belongs and
the status of T (delayed or not). If T is not delayed, a
database-access action is transmitted to the database and
executed.
18.5 Scheduler That Inserts Lock Actions
2. If a lock action is received by Part II, it checks the lock
table to see whether the lock can be granted:
 i) If granted, the lock table is modified to include the
granted lock.
 ii) If not granted, the lock table is updated to record the
requested lock, and Part II delays transaction T.
3. When T commits or aborts, Part I is notified by the
transaction manager and releases all of T's locks.
If any transactions are waiting for those locks, Part I
notifies Part II.
4. Part II, when notified that a lock on some DB element is
available, determines the next transaction T' to get the lock
and allows it to continue.
18.5 The Lock Table
 A relation that associates database elements with locking
information about each element.
 Implemented with a hash table, using database elements
as the hash key.
 Its size is proportional to the number of locked elements only,
not to the size of the entire database.
 (Figure: DB element A maps to the lock information for A.)
18.5 Lock Table Entry Structure
Information found in a lock-table entry:
1) Group mode:
 - S: only shared locks are held
 - X: one exclusive lock and no other locks
 - U: one update lock and possibly one or more shared locks
2) Waiting: whether some transaction is waiting for a lock on A
3) A list: the transactions that currently hold locks on A or
are waiting for a lock on A
18.5 Handling Lock Requests
 Suppose transaction T requests a lock on A:
 If there is no lock-table entry for A, then there are no
locks on A, so create the entry and grant the lock request.
 If the lock-table entry for A exists, use the group mode
to guide the decision about the lock request.
18.5 Handling Lock Requests
 If the group mode is U (update) or X (exclusive):
 No other lock can be granted.
 Deny the lock request by T.
 Place an entry on the list saying T requests a lock,
with Wait? = 'yes'.
 If the group mode is S (shared):
 Another shared or update lock can be granted.
 Grant the request for an S or U lock.
 Create an entry for T on the list with Wait? = 'no'.
 Change the group mode to U if the new lock is an update
lock.
18.5 Handling Unlock Requests
 Now suppose transaction T unlocks A:
 Delete T's entry on the list for A.
 If T's lock is not the same as the group mode, there is no
need to change the group mode.
 Otherwise, examine the entire list to compute the new group
mode:
 released S: the new group mode is S (if other shared locks
remain) or nothing
 released U: the new group mode is S or nothing
 released X: nothing (no other locks could have been held)
18.5 Handling Unlock Requests
 If the value of Wait? is 'yes', we need to grant one or more
waiting locks, using one of the following approaches:
First-come-first-served:
 Grant the lock to the longest-waiting request.
 No starvation (no transaction waits forever for a lock).
Priority to shared locks:
 Grant all waiting S locks, then one U lock.
 Grant an X lock only if no others are waiting.
Priority to upgrading:
 If there is a U lock waiting to upgrade to an X lock,
grant that first.
18.6 Managing Hierarchies of Database Elements
 Two problems arise with locks when there is a tree
structure to the data:
 When the tree structure is a hierarchy of lockable
elements:
 how to grant locks for both large elements (relations)
and smaller elements (blocks containing tuples, or
individual tuples).
 When the data itself is organized as a tree (e.g., B-tree
indexes):
 this is discussed in the next section.
18.6 Locks with Multiple Granularity
 A "database element" can be a relation, a block, or a tuple.
 Different systems use different sizes of database elements
as the unit of locking.
 Thus some systems lock small elements such as tuples or
blocks, and others lock large elements such as relations.
18.6 Example of Multiple-Granularity Locks
 Consider a database for a bank:
 Choosing relations as the lockable elements means one lock
for an entire relation.
 If we were dealing with a relation of account balances,
this kind of lock would be very inflexible and thus
provide very little concurrency.
 Why? Because balance updates require exclusive locks,
so only one transaction could run against the whole
relation at a time.
 But as each account is independent of the others, we
could perform transactions on different accounts
simultaneously.
…(contd.)
 Thus it makes sense to use blocks as the lockable elements,
so that two accounts on different blocks can be updated
simultaneously.
 Another example is that of a document:
 by a similar argument, it is better to use a large
element (the complete document) as the lockable unit in
this case.
18.6 Warning (Intention) Locks
 These are required to manage locks at different granularities:
 In the bank example, if a shared lock is obtained on the
relation while there are exclusive locks on individual
tuples, unserializable behavior can occur.
 The rules for managing locks on a hierarchy of database
elements constitute the warning protocol.
18.6 Database Elements Organized in a Hierarchy
 (Figure: a hierarchy with relations at the top, blocks below
them, and tuples at the leaves.)
18.6 Rules of the Warning Protocol
 These involve both ordinary (S and X) and warning (IS
and IX) locks.
 The rules are:
 Begin at the root of the hierarchy.
 Request the S/X lock if we are at the desired element.
 If the desired element is further down the hierarchy,
place a warning lock (IS if S, and IX if X).
 When the warning lock is granted, proceed to the
appropriate child node and repeat the above steps until
the desired node is reached.
18.6 Compatibility Matrix for Shared, Exclusive, and Intention Locks
               Lock requested
 Lock held     IS    IX    S     X
 IS            Yes   Yes   Yes   No
 IX            Yes   Yes   No    No
 S             Yes   No    Yes   No
 X             No    No    No    No
• The above matrix applies only to locks held by
other transactions.
18.6 Group Modes of Intention Locks
 An element can be locked in S and IX modes at the same time
if both requests come from the same transaction (to read the
entire element and then modify sub-elements).
 This can be considered another lock mode, SIX, having the
restrictions of both locks, i.e., compatible with nothing
except IS.
 SIX then serves as the group mode.
18.6 Example
 Consider a transaction T1:
 SELECT * FROM table WHERE attribute1 = 'abc'
 Here, an IS lock is first acquired on the entire relation;
then, moving to the individual tuples with attribute1 = 'abc',
an S lock is acquired on each of them.
 Consider another transaction T2:
 UPDATE table SET attribute2 = 'def' WHERE attribute1 = 'ghi'
 Here, T2 requires an IX lock on the relation; since T1's IS
lock is compatible, IX is granted.
 On reaching the desired tuple ('ghi'), as there is no lock on
it, T2 gets an X lock too.
 If T2 were updating the same tuple as T1, it would have to
wait until T1 released its S lock.
18.6 Phantoms and Handling Insertions Correctly
 A problem arises when transactions create new sub-elements of
lockable elements:
 since we can lock only existing elements, the new elements
fail to be locked.
 The problem is illustrated by the following example.
 Consider a transaction T3:
 SELECT SUM(length) FROM table WHERE attribute1 = 'abc'
 This calculates the total length of all tuples having
attribute1 = 'abc'.
 Thus, T3 acquires IS on the relation and S on the targeted
tuples.
 Now, if another transaction T4 inserts a new tuple having
attribute1 = 'abc', the result of T3 becomes incorrect.
18.6 Example (…contd.)
 This by itself is not a concurrency problem, since the serial
order (T3, T4) is maintained.
 But if both T3 and T4 also write a common element X, it can
lead to unserializable behavior:
 r3(t1); r3(t2); w4(t3); w4(X); w3(L); w3(X)
 r3 and w3 are read and write actions by T3, w4 are write
actions by T4, and L is the total length calculated by T3
(t1 + t2).
 At the end, the result of T3 is the sum of the lengths of
t1 and t2, and X has the value written by T3.
 This is not right: if the value of X is considered to be
that written by T3 (i.e., T4 before T3), then for the
schedule to be serializable, the sum of the lengths of
t1, t2, and t3 should have been computed.
18.6 Example (…contd.)
 Conversely, if the sum is the total length of t1 and t2 only
(i.e., T3 before T4), then for the schedule to be
serializable, X should have the value written by T4.
 This problem arises because the relation has a phantom tuple
(the newly inserted tuple), which should have been locked but
wasn't, since it didn't exist at the time the locks were
taken.
 The occurrence of phantoms can be avoided if all insertion
and deletion transactions are treated as write operations on
the whole relation.
18.7 TREE PROTOCOL
 A kind of graph-based protocol.
 An alternative to two-phase locking (2PL).
 Database elements are disjoint pieces of data.
 The nodes of the tree do NOT form a hierarchy based on
containment.
 The way to get to a node is through its parent.
 Example: a B-tree.
Advantages:
 Unlocking takes less time compared to 2PL.
 Freedom from deadlocks.
18.7.1 Motivation for Tree-Based Locking
 Consider a B-tree index, treating individual nodes as lockable
database elements.
 Concurrent use of the B-tree is not possible with a standard
set of locks and 2PL.
 Therefore, a protocol is needed which can assure
serializability while allowing access to elements at the
bottom of the tree even though 2PL is violated.
Reason why concurrent use of a B-tree is not possible with a
standard set of locks and 2PL:
 every transaction must begin by locking the root node, and
 2PL transactions cannot unlock the root until all their
required locks have been acquired.
18.7.2 ACCESSING TREE-STRUCTURED DATA
 Assumptions:
 Only one kind of lock.
 Consistent transactions.
 Legal schedules.
 No 2PL requirement on the transactions.
 RULES:
 The first lock may be at any node.
 Subsequent locks may be acquired only if the parent node
is currently locked.
 Nodes may be unlocked at any time.
 A node may not be relocked, even if the node's parent is
still locked.
18.7.3 WHY THE TREE PROTOCOL WORKS
 The tree protocol implies a serial order on the transactions
in the schedule.
 Order of precedence: Ti <s Tj if Ti locks the root before Tj;
then Ti locks every node it has in common with Tj before Tj does.
18.7 ORDER OF PRECEDENCE
 (Figure: the precedence order among transactions.)
18.8 What is Timestamping?
 The scheduler assigns each transaction T a unique number, its
timestamp TS(T).
 Timestamps must be issued in ascending order, at the
time a transaction first notifies the scheduler that it
is beginning.
18.8 Timestamp TS(T)
 Two methods of generating timestamps:
 Use the value of the system clock as the timestamp.
 Use a logical counter that is incremented after each new
timestamp is assigned.
 The scheduler maintains a table of currently active
transactions and their timestamps, irrespective of the
method used.
18.8 Timestamps for a Database Element X and the Commit Bit
 RT(X): the read time of X, the highest timestamp of a
transaction that has read X.
 WT(X): the write time of X, the highest timestamp of a
transaction that has written X.
 C(X): the commit bit for X, which is true if and only if the
most recent transaction to write X has already committed.
18.8 Physically Unrealizable Behavior
Read too late:
 A transaction U that started after transaction T wrote a
value for X before T read X.
 (Figure: timeline – T starts, then U starts, U writes X, and
only then does T try to read X.)
18.8 Physically Unrealizable Behavior
Write too late:
 A transaction U that started after T read X before T
got a chance to write X.
 (Figure: timeline – T starts, then U starts, U reads X, and
only then does T try to write X too late.)
18.8 Dirty Read
 It is possible that after T reads the value of X written by U,
transaction U will abort.
 (Figure: timeline – U starts, T starts, U writes X, T reads X,
then U aborts. T performs a dirty read if it reads X at the
point shown.)
18.8 Rules for Timestamp-Based Scheduling
1. The scheduler receives a request rT(X):
 a) If TS(T) ≥ WT(X), the read is physically realizable.
  1. If C(X) is true, grant the request; if TS(T) > RT(X),
set RT(X) := TS(T), otherwise do not change RT(X).
  2. If C(X) is false, delay T until C(X) becomes true or the
transaction that wrote X aborts.
 b) If TS(T) < WT(X), the read is physically unrealizable.
Roll back T.
2. The scheduler receives a request wT(X):
 a) If TS(T) ≥ RT(X) and TS(T) ≥ WT(X), the write is physically
realizable and must be performed:
  1. Write the new value for X,
  2. Set WT(X) := TS(T), and
  3. Set C(X) := false.
 b) If TS(T) ≥ RT(X) but TS(T) < WT(X), then the write is
physically realizable, but there is already a later value in X:
  a. If C(X) is true, the previous writer of X is committed;
simply ignore the write by T.
  b. If C(X) is false, we must delay T.
 c) If TS(T) < RT(X), then the write is physically unrealizable,
and T must be rolled back.
3. The scheduler receives a request to commit T. It must find all
the database elements X written by T and set C(X) := true. Any
transactions waiting for X to be committed are allowed to
proceed.
4. The scheduler receives a request to abort T, or decides to roll
back T. Then any transaction that was waiting on an element X
that T wrote must repeat its attempt to read or write.
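A condensed sketch of the read/write rules in Python (illustrative
only; delays and commit-bit handling are reduced to return values and
an exception):

    class Rollback(Exception):
        pass

    class TimestampScheduler:
        """Toy timestamp scheduler for the rules above."""
        def __init__(self):
            self.rt = {}   # RT(X): highest timestamp that has read X
            self.wt = {}   # WT(X): highest timestamp that has written X
            self.c = {}    # C(X): commit bit of X's last writer

        def read(self, ts, x):
            if ts < self.wt.get(x, 0):
                raise Rollback("read too late")      # rule 1b
            if not self.c.get(x, True):
                return "delay"                       # rule 1a-2
            self.rt[x] = max(self.rt.get(x, 0), ts)  # rule 1a-1
            return "grant"

        def write(self, ts, x):
            if ts < self.rt.get(x, 0):
                raise Rollback("write too late")     # rule 2c
            if ts < self.wt.get(x, 0):
                # a later value already exists in X  # rule 2b
                return "ignore" if self.c.get(x, True) else "delay"
            self.wt[x] = ts                          # rule 2a
            self.c[x] = False
            return "grant"

        def commit(self, written_elements):
            for x in written_elements:
                self.c[x] = True                     # rule 3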
18.8 Multiversion Timestamps
 Multiversion schemes keep old versions of data items to
increase concurrency.
 Each successful write results in the creation of a new
version of the data item written.
 Timestamps are used to label versions.
 When a read(X) operation is issued, select an appropriate
version of X based on the timestamp of the transaction,
and return the value of the selected version.
18.8 Timestamps Versus Locking
 Generally, timestamping performs better than locking in
situations where:
 Most transactions are read-only.
 It is rare that concurrent transactions try to read
and write the same element.
 In high-conflict situations, locking performs better than
timestamps.
18.9. Concurrency Control by Validation
- Introduction
What is optimistic concurrency control?
 It assumes that no unserializable behavior will occur, and
allows T to access data without locks.
 Two optimistic approaches: timestamp-based scheduling
and validation-based scheduling.
18.9.Validation-Based Scheduling
 The scheduler keeps a record of what the active transactions are
doing.
 Each transaction executes in 3 phases:
1. Read: the transaction reads the elements in its read set RS(T)
and computes results in its local address space.
2. Validate: the scheduler compares the transaction's read and
write sets with those of other transactions.
3. Write: the transaction writes the elements in its write set WS(T).
 The order in which transactions validate serves as an assumed
serial order of the transactions.
 The scheduler maintains three sets:
1. START( ): transactions that have started but not yet completed
validation.
2. VAL( ): transactions that have validated but not yet finished the
writing phase.
3. FIN( ): transactions that have finished.
18.9.Expected Exceptions
1. Suppose there is a transaction U such that:
 U is in VAL or FIN; that is, U has validated,
 FIN(U) > START(T); that is, U did not finish before T started, and
 RS(T) ∩ WS(U) ≠ φ; let the intersection contain database element X.
2. Suppose there is a transaction U such that:
 U is in VAL; that is, U has successfully validated,
 FIN(U) > VAL(T); that is, U did not finish before T entered its validation phase, and
 WS(T) ∩ WS(U) ≠ φ; let X be in both write sets.
18.9.Validation Rules
 Check that RS(T) ∩ WS(U) = φ for any previously validated
U that did not finish before T started, i.e.,
FIN(U) > START(T).
 Check that WS(T) ∩ WS(U) = φ for any previously
validated U that did not finish before T validated, i.e.,
FIN(U) > VAL(T).
18.9.Solution
(The read and write sets recovered from the checks below: WS(U) = {D};
RS(T) = {A,B}, WS(T) = {A,C}; RS(V) = {B}, WS(V) = {D,E};
RS(W) = {A,D}, WS(W) = {A,C}.)
 Validation of U:
Nothing to check.
 Validation of T:
RS(T) ∩ WS(U) = {A,B} ∩ {D} = φ
WS(T) ∩ WS(U) = {A,C} ∩ {D} = φ
 Validation of V:
RS(V) ∩ WS(T) = {B} ∩ {A,C} = φ
WS(V) ∩ WS(T) = {D,E} ∩ {A,C} = φ
RS(V) ∩ WS(U) = {B} ∩ {D} = φ
 Validation of W:
RS(W) ∩ WS(T) = {A,D} ∩ {A,C} = {A}
RS(W) ∩ WS(V) = {A,D} ∩ {D,E} = {D}
WS(W) ∩ WS(V) = {A,C} ∩ {D,E} = φ
(W is not validated, because the first two intersections are nonempty.)
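The two validation checks can be replayed in Python (the read/write sets are the ones recovered above; which transactions overlap which is assumed from the example's timeline):

RS = {'T': {'A', 'B'}, 'V': {'B'}, 'W': {'A', 'D'}}
WS = {'U': {'D'}, 'T': {'A', 'C'}, 'V': {'D', 'E'}, 'W': {'A', 'C'}}

def validate(t, overlap_start, overlap_val):
    # overlap_start: validated U with FIN(U) > START(t) -> check RS(t), WS(U)
    # overlap_val:   validated U with FIN(U) > VAL(t)   -> check WS(t), WS(U)
    if any(RS[t] & WS[u] for u in overlap_start):
        return False
    if any(WS[t] & WS[u] for u in overlap_val):
        return False
    return True

print(validate('T', ['U'], ['U']))        # True
print(validate('V', ['T', 'U'], ['T']))   # True
print(validate('W', ['T', 'V'], ['V']))   # False: RS(W) & WS(T) = {'A'}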
18.9.Comparison
Locks
 Storage utilization: Space in the lock table is proportional to the
number of database elements locked.
 Delays: Delays transactions but avoids rollbacks.
Timestamps
 Storage utilization: Space is needed for read and write times for
every database element, whether or not it is currently accessed.
 Delays: Does not delay transactions, but causes them to roll back
unless interference is low.
Validation
 Storage utilization: Space is used for timestamps and read/write
sets for each currently active transaction, plus a few more
transactions that finished after some currently active transaction
began.
 Delays: Does not delay transactions, but causes them to roll back
unless interference is low.
Chapter 21
Information Integration
21.1.Need for Information Integration
 Ideally, all the data in the world could be put in a single
database (the ideal database system).
 In the real world, a single database is impossible:
 databases are created independently, and
 it is hard to design a database to support future use.
21.1.University Database
 Registrar: to record students and grades
 Bursar: to record tuition payments by students
 Human Resources Department: to record employees
 Other departments…
21.1.Inconvenient
 Record grades only for students who have paid tuition?
 Want to swim in the SJSU aquatic center for free during
summer vacation?
(none of the cases above can be handled by a single one of
the separate databases)
Solution: one database
21.1.How to Integrate
 Start over: build one database that contains all the legacy
data, and rewrite all the applications.
Result: painful.
 Build a layer of abstraction (middleware)
on top of all the legacy databases.
This layer is often defined by a collection of classes.
BUT…
21.1.Heterogeneity Problem
 What is the heterogeneity problem?
Aardvark Automobile Co. has 1000 dealers, each with its
own database.
To find a model at another dealer, can we use this one
command against every database?
SELECT * FROM CARS WHERE MODEL = 'A6';
21.1.Types of Heterogeneity
 Communication Heterogeneity
 Query-Language Heterogeneity
 Schema Heterogeneity
 Data-Type Differences
 Value Heterogeneity
 Semantic Heterogeneity
21.1.Conclusion
 One universal database system would be perfect, but it is
impossible.
 Independent databases are inconvenient.
 To integrate databases:
1. start over, or
2. build middleware,
 and in either case face the heterogeneity problem.
21.2. Modes of Information Integration
- Federations
 The simplest architecture for integrating several DBs.
 One-to-one connections between all pairs of DBs.
 If n DBs talk to each other, n(n-1) wrappers are needed.
 Good when communication between DBs is limited.
Wrapper: software that translates incoming queries and
outgoing answers. As a result, it allows information
sources to conform to some shared schema.
21.2.1.Federations Diagram
Figure: DB1, DB2, DB3, and DB4, with a pair of wrappers (one for each
direction) between every two of them.
A federated collection of 4 DBs needs 4 × 3 = 12 wrapper components to
translate queries from one to another.
21.2.1.Example
Car dealers want to share their inventory. Each dealer queries the
other's DB to find a needed car.
Dealer-1's DB relation: NeededCars(model, color, autoTrans)
Dealer-2's DB relations: Autos(serial, model, color)
Options(serial, option)
Figure: Dealer-1's DB and Dealer-2's DB, each behind its own wrapper.
21.2.1.Example…
Dealer 1 queries Dealer 2 for needed cars:
for (each tuple (:m, :c, :a) in NeededCars) {
    if (:a = TRUE) { /* automatic transmission wanted */
        SELECT serial
        FROM Autos, Options
        WHERE Autos.serial = Options.serial AND Options.option = 'autoTrans'
              AND Autos.model = :m AND Autos.color = :c;
    }
    else { /* automatic transmission not wanted */
        SELECT serial
        FROM Autos
        WHERE Autos.model = :m AND Autos.color = :c AND
              NOT EXISTS (SELECT * FROM Options
                          WHERE serial = Autos.serial AND option = 'autoTrans');
    }
}
21.2.2.Data Warehouse
 Sources are translated from their local schemas to a
global schema and copied to a central DB.
 Transparent to the user: the user queries the data
warehouse just like an ordinary DB.
 The user is not allowed to update the data warehouse.
21.2.2.Warehouse Diagram
Figure: the user sends a query to, and receives a result from, the
warehouse; a combiner loads the warehouse with data produced by an
extractor at each of Source 1 and Source 2.
21.2.2.Example
Construct a data warehouse from the source DBs of 2 car dealers:
Dealer-1's schema: Cars(serialNo, model, color, autoTrans, cdPlayer, …)
Dealer-2's schema: Autos(serial, model, color)
Options(serial, option)
Warehouse's schema:
AutosWhse(serialNo, model, color, autoTrans, dealer)
Extractor --- query to extract data from Dealer-1's data:
INSERT INTO AutosWhse(serialNo, model, color, autoTrans, dealer)
SELECT serialNo, model, color, autoTrans, 'dealer1'
FROM Cars;
21.2.2.Example…
Extractor --- queries to extract data from Dealer-2's data:
INSERT INTO AutosWhse(serialNo, model, color, autoTrans, dealer)
SELECT serial, model, color, 'yes', 'dealer2'
FROM Autos, Options
WHERE Autos.serial = Options.serial AND option = 'autoTrans';
INSERT INTO AutosWhse(serialNo, model, color, autoTrans, dealer)
SELECT serial, model, color, 'no', 'dealer2'
FROM Autos
WHERE NOT EXISTS (SELECT * FROM Options
                  WHERE serial = Autos.serial AND option = 'autoTrans');
21.2.2.Construct Data Warehouse
There are mainly 3 ways of constructing the data in the
warehouse:
1) Periodically reconstruct the warehouse from the current
data in the sources, once a night or at even longer intervals.
Advantages:
simple algorithms.
Disadvantages:
1) need to shut down the warehouse;
2) data can become out of date.
21.2.2.Construct Data Warehouse
2) Update the warehouse periodically based on the changes
(e.g., each night) made at the sources.
Advantages:
involves smaller amounts of data (important when the
warehouse is large and must be modified in a short period).
Disadvantages:
1) the process of calculating the changes to the warehouse
is complex;
2) data can become out of date.
21.2.2.Construct Data Warehouse
3) Change the warehouse immediately, in response to each
change or a small set of changes at one or more of the
sources.
Advantages:
data does not become out of date.
Disadvantages:
requires too much communication, and is therefore
generally too expensive
(practical only for warehouses whose underlying sources
change slowly).
21.2.3.Mediators
 A mediator is a virtual warehouse: it supports a virtual view,
or a collection of views, that integrates several sources.
 A mediator doesn't store any data.
 The mediator's tasks:
1) receive the user's query,
2) send queries to the wrappers,
3) combine the results from the wrappers, and
4) send the final result to the user.
21.2.3.A Mediator Diagram
Figure: the user's query goes to the mediator; the mediator sends a
query to, and receives a result from, the wrapper of each source
(Source 1 and Source 2); the mediator then returns the combined result
to the user.
21.2.3.Example
With the same data sources as in the data warehouse example, the mediator
integrates the two dealers' sources into a view with schema:
AutosMed(serialNo, model, color, autoTrans, dealer)
Suppose the user poses the query:
SELECT serialNo, model
FROM AutosMed
WHERE color = 'red';
21.2.3.Example…
In this simple case, the mediator forwards the same query to each
of the two wrappers.
Wrapper 1: Cars(serialNo, model, color, autoTrans, cdPlayer, …)
SELECT serialNo, model
FROM Cars
WHERE color = 'red';
Wrapper 2: Autos(serial, model, color); Options(serial, option)
SELECT serial, model
FROM Autos
WHERE color = 'red';
The mediator interprets serial as serialNo, and then
returns the union of these sets of data to the user.
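A minimal sketch of the mediator's combining step for this query (the wrapper functions are stand-ins for real source connections, and the rows are made-up examples):

def query_wrapper1(color):      # Dealer 1: Cars(serialNo, model, ...)
    return [{'serialNo': 1001, 'model': 'Gobi'}]         # illustrative rows

def query_wrapper2(color):      # Dealer 2: Autos(serial, model, color)
    return [{'serial': 2002, 'model': 'Gobi'}]

def red_cars():
    result = list(query_wrapper1('red'))
    for row in query_wrapper2('red'):                    # rename serial ->
        result.append({'serialNo': row['serial'],        # serialNo, then
                       'model': row['model']})           # take the union
    return result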
21.2.3.Example…
The mediator may have different options for forwarding the
user's query. For example, suppose the user asks whether a
car of a specific model and color (e.g., a blue Gobi) is
available. The mediator can decide whether the 2nd query is
needed based on the result of the 1st query: if dealer 1 has
the specific car, the mediator doesn't have to query dealer 2.
21.3 Wrappers in Mediator-Based Systems
 Wrappers here are more complicated than those in most
data warehouse systems.
 A wrapper must be able to accept a variety of queries from
the mediator and translate them into the terms of the source.
 It must then communicate the result to the mediator.
How to design a wrapper?
Classify the possible queries that the mediator can ask
into templates, which are queries with parameters that
represent constants.
21.3. Templates for Query Patterns
 We use the notation T => S to express the idea that the
template T is turned by the wrapper into the source query S.
Example 1:
Dealer 1 holds the relation
Cars(serialNo, model, color, autoTrans, navi, …)
for use by a mediator with schema
AutosMed(serialNo, model, color, autoTrans, dealer)
21.3. Templates for Query Patterns
 If we denote the code representing a color by the
parameter $c, then the template is:
SELECT * FROM AutosMed WHERE color = '$c';
=>
SELECT serialNo, model, color, autoTrans, 'dealer1'
FROM Cars WHERE color = '$c';
(template T => source query S)
 There will be a total of 2^n templates if we have the option of
specifying any of n attributes.
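One way a wrapper might instantiate such a template is plain parameter substitution; Python's string.Template conveniently uses the same $-notation (a sketch, not the book's mechanism):

from string import Template

source_query = Template(
    "SELECT serialNo, model, color, autoTrans, 'dealer1' "
    "FROM Cars WHERE color = '$c';")

print(source_query.substitute(c='blue'))
# -> SELECT serialNo, model, color, autoTrans, 'dealer1' FROM Cars WHERE color = 'blue';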
21.3. Wrapper Generators
 The wrapper generator creates a table that holds the various
query patterns contained in the templates,
 together with the source query associated with each pattern.
21.3. Wrapper Generators
A driver is used in each wrapper; the tasks of the driver are to:
 Accept a query from the mediator.
 Search the table for a template that matches the query.
 Send the associated source query to the source, again using a
"plug-in" communication mechanism.
 Process the response and pass the result back to the mediator.
Filter
 A wrapper can include a filter in order to support more queries
than its templates directly cover.
21.3. Wrapper Generators
 Example 2
Suppose the wrapper were designed with a more complicated
template whose queries specify both model and color:
SELECT * FROM AutosMed WHERE model = '$m' AND color = '$c';
=>
SELECT serialNo, model, color, autoTrans, 'dealer1'
FROM Cars WHERE model = '$m' AND color = '$c';
 Now suppose instead that the only template we have is the color
template, yet the mediator asks the wrapper to find blue cars of the
Gobi model.
21.3. Wrapper Generators
Solution:
1. Use the color template with $c = 'blue' to find all blue cars,
and store them in a temporary relation:
TempAutos(serialNo, model, color, autoTrans, dealer)
2. The wrapper then returns to the mediator the desired
set of automobiles by executing the local query:
SELECT * FROM TempAutos WHERE model = 'Gobi';
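The filtering step is easy to sketch in Python (fetch_by_color stands in for the color template's source query; the tuples are illustrative):

def fetch_by_color(color):                    # the one template we have
    return [('S1', 'Gobi', 'blue'), ('S2', 'Ranger', 'blue')]   # stand-ins

temp_autos = fetch_by_color('blue')           # step 1: instantiate $c = 'blue'
answer = [t for t in temp_autos if t[1] == 'Gobi']   # step 2: local filter
print(answer)                                 # [('S1', 'Gobi', 'blue')]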
21.4 Capability-Based Optimization
 Introduction
 A typical DBMS estimates the cost of each query plan and
picks the one it believes to be best.
 A mediator, however, has little or no knowledge of how long
its sources will take to answer.
 Optimization of mediator queries therefore cannot rely on a
cost measure alone to select a query plan.
 Instead, optimization by the mediator follows capability-based
optimization.
21.4.1 The Problem of Limited Source
Capabilities
 Many sources have only Web-based interfaces.
 Web sources usually allow querying only through a query form.
 E.g., the Amazon.com interface allows us to query about books in
many different ways.
 But we cannot ask questions that are too general,
e.g., SELECT * FROM Books;
21.4.1 The Problem of Limited Source
Capabilities (con’t)
 Reasons why a source may limit the ways in which queries
can be asked:
 The earliest databases did not use relational DBMSs that
support SQL queries.
 Indexes on a large database may make certain queries
feasible, while others are too expensive to execute.
 Security reasons:
 e.g., a medical database may answer queries about
averages, but won't disclose the details of a particular
patient's information.
21.4.2 A Notation for Describing
Source Capabilities
 For relational data, the legal forms of queries are
described by adornments.
 Adornments are sequences of codes that represent the
requirements on the attributes of the relation, in their
standard order:
 f (free) – a value for the attribute may be specified or not.
 b (bound) – a value for the attribute must be specified, but
any value is allowed.
 u (unspecified) – a value for the attribute may not be
specified.
21.4.2 A Notation for Describing Source Capabilities (cont'd)
 c[S] (choice from set S) means that a value must be
specified, and that value must come from the finite set S.
 o[S] (optional from set S) means that we either specify no
value, or specify a value from the finite set S.
 A prime (e.g., f') indicates that the attribute is not part of the
output of the query.
 A capabilities specification is a set of adornments.
 A query must match one of the adornments in the source's
capabilities specification.
 E.g., Dealer 1 is a source of data in the form:
Cars(serialNo, model, color, autoTrans, navi)
The adornment for this query form is b'uuuu (a serial number
must be specified and is not output; no other attribute may be
specified).
21.4.3 Capability-Based Query-Plan
Selection
 Given a query at the mediator, a capability-based query
optimizer first considers what queries it can ask at the
sources to help answer that query.
 The process is repeated until either:
 Enough queries have been asked at the sources to resolve
all the conditions of the mediator query, so the query is
answered. Such a plan is called feasible.
 No more valid forms of source queries can be constructed,
yet the mediator query still cannot be answered. In this case,
the mediator query is impossible to answer.
21.4.3 Capability-Based Query-Plan
Selection (cont’d)
 The simplest form of mediator query for which we need to
apply the above strategy is a join of relations.
 E.g., we have these sources for dealer 2:
 Autos(serial, model, color)
 Options(serial, option)
 Suppose ubf is the sole adornment for Autos, and Options
has two adornments, bu and uc[autoTrans, navi].
 The query: find the serial numbers and colors of Gobi
models with a navigation system.
21.4.4 Adding Cost-Based Optimization
 The mediator's query optimizer is not done when the
capabilities of the sources have been examined.
 Having found the feasible plans, it must choose among them.
 Making an intelligent, cost-based choice requires that the
mediator know a great deal about the costs of the queries
involved.
 Because the sources are independent of the mediator, those
costs are difficult to estimate.
21.5 Optimizing Mediator Queries
 Chain algorithm – a greedy algorithm that finds a way to
answer the query by sending a sequence of requests to its
sources.
 It will always find a solution, assuming at least one
solution exists.
 The solution it finds may not be optimal.
21.5.1 Simplified Adornment Notation
 A query at the mediator is limited to b (bound) and f
(free) adornments.
 We use the following convention for describing
adornments:
 name^adornment(attributes)
 where:
 name is the name of the relation, and
 the number of characters in the adornment equals the
number of attributes.
21.5.2 Obtaining Answers for Subgoals
 Rules for matching subgoals to sources:
 Suppose we have the subgoal R^x1x2…xn(a1, a2, …, an),
and the source adornment for R is y1y2…yn.
 The subgoal's adornment matches the source's adornment if:
 whenever yi is b or c[S], then xi = b;
 whenever xi = f, then yi is not output-restricted; and
 in the remaining positions (yi is f, u, or o[S]), xi may be
either b or f.
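These matching rules translate directly into a small predicate (a sketch; source adornments are given as a list of code strings, with a trailing quote marking output-restricted positions, and the extra requirement that a c[S] binding actually lie in S is omitted for brevity):

def matches(subgoal, source):
    # subgoal: string of 'b'/'f'; source: e.g. ['b', "f'", 'u', 'c[2,3,5]']
    for x, y in zip(subgoal, source):
        code, restricted = y.rstrip("'"), y.endswith("'")
        if code.startswith(('b', 'c')) and x != 'b':
            return False      # source requires a binding we do not have
        if x == 'f' and restricted:
            return False      # we need the value, but it is not output
    return True

print(matches('bf', ['b', 'f']))   # True: R^bf matches source bf
print(matches('ff', ['b', 'u']))   # False: first position must be bound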
21.5.3 The Chain Algorithm
 The algorithm maintains 2 types of information:
 an adornment for each subgoal, and
 a relation X that is the join of the relations for all the
subgoals resolved so far.
 Initially, the adornment on a subgoal is b if and only if the
mediator query provides a constant binding for the
corresponding argument of that subgoal.
 Initially, X is a relation over no attributes, containing just
the empty tuple.
 First, initialize the adornments of the subgoals and X.
 Then, repeatedly select a subgoal that can be resolved.
Let R^α(a1, a2, …, an) be that subgoal:
21.5.3 The Chain Algorithm (con’t)
 1. Wherever α has a b, the corresponding argument of R is
either a constant or a variable in the schema of X.
 Project X onto those of its variables that appear in the
subgoal R.
 2. For each tuple t in the projection of X, issue a query to the
source as follows (β is the source adornment):
 If a component of β is b, then the corresponding
component of α is b; use the corresponding component
of t in the source query.
 If a component of β is c[S], and the corresponding
component of t is in S, then the corresponding component
of α is b; use the corresponding component of t in the
source query.
21.5.3 The Chain Algorithm (con’t)
 If a component of β is f, and the corresponding
component of α is b, provide the constant value as a
binding in the source query.
 If a component of β is u, then provide no binding for
this component in the source query.
 If a component of β is o[S], and the corresponding
component of α is f, then treat it as if it were f.
 If a component of β is o[S], and the corresponding
component of α is b, then treat it as if it were c[S].
 3. Every variable among a1, a2, …, an is now bound. For
each remaining unresolved subgoal, change its
adornment so that any position holding one of these
variables is b.
21.5.3 The Chain Algorithm (con’t)
 4. Replace X with X ⋈ πS(R), where S is the set of all
variables among a1, a2, …, an.
 5. Project out of X all components that correspond to
variables that do not appear in the head or in any
unresolved subgoal.
 If every subgoal is resolved, then X is the answer.
 If some subgoal remains unresolved, the algorithm fails.
21.5.3 The Chain Algorithm Example
 Mediator query:
 Q: Answer(c) ← R^bf(1,a) AND S^ff(a,b) AND T^ff(b,c)
Example data and source adornments:
Relation:    R           S            T
Schema:      (w, x)      (x, y)       (y, z)
Data:        (1, 2)      (2, 4)       (4, 6)
             (1, 3)      (3, 5)       (5, 7)
             (1, 4)                   (5, 8)
Adornment:   bf          c'[2,3,5]f   bu
21.5.3 The Chain Algorithm Example
(con’t)
 Initially, the adornments on the subgoals are those written in
Q, and X contains just the empty tuple.
 S and T cannot be resolved: they have ff adornments, but
their sources require either a b or a c in the first position.
 R(1,a) can be resolved, because its adornment bf is
matched by the source's adornment bf.
 Send R(w,x) with w = 1 to the source, obtaining the three
R-tuples shown in the table above.
21.5.3 The Chain Algorithm Example
(con’t)
 Project the subgoal's relation onto its second component,
since only the second component of R(1,a) is a variable,
giving the unary relation {(2), (3), (4)} over attribute a.
 Joining this with the (empty-schema) X makes X equal to
this relation.
 Change the adornment on S from ff to bf.
21.5.3 The Chain Algorithm Example
(con’t)
 Now we resolve S^bf(a,b):
 Project X onto a (X is already the unary relation over a).
 For each binding of a, search S for tuples whose first
component matches; the bindings a = 2 and a = 3 are in S's
choice set [2,3,5] (a = 4 is not, so no source query can be
sent for it), yielding:
(a, b): (2, 4), (3, 5)
 Join this relation with X, and remove a, because it appears
neither in the head nor in any unresolved subgoal:
(b): (4), (5)
21.5.3 The Chain Algorithm Example
(con’t)
 Now we resolve T^bf(b,c), obtaining:
(b, c): (4, 6), (5, 7), (5, 8)
 Join this relation with X, and project onto the c attribute to
get the relation for the head.
 The solution is {(6), (7), (8)}.
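The whole example can be replayed as a chain of joins and projections in Python:

R = [(1, 2), (1, 3), (1, 4)]      # R(w, x)
S = [(2, 4), (3, 5)]              # S(x, y)
T = [(4, 6), (5, 7), (5, 8)]      # T(y, z)

X = [(x,) for (w, x) in R if w == 1]                     # resolve R(1,a)
X = [(y,) for (a,) in X for (x, y) in S if x == a]       # resolve S^bf(a,b)
answer = [(z,) for (b,) in X for (y, z) in T if y == b]  # resolve T^bf(b,c)
print(answer)                     # [(6,), (7,), (8,)]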
21.5.4 Incorporating Union Views at
the Mediator
 This implementation of the Chain Algorithm does not
consider that several sources can contribute tuples to the
same relation.
 If some sources can contribute tuples that others cannot,
handling the union adds complexity.
 To resolve this, we can either consult all sources, or make a
best effort to return all the answers.
21.5.4 Incorporating Union Views at
the Mediator (con’t)
 Consulting All Sources
 We can resolve a subgoal only when each source for its
relation has an adornment matched by the current
adornment of the subgoal.
 This is less practical: it makes queries harder to answer,
and impossible if any source is down.
 Best Efforts
 We need only 1 source with a matching adornment to
resolve a subgoal.
 The Chain Algorithm must be modified to revisit each
subgoal when that subgoal acquires new bound requirements.
21.6.Local-as-View Mediators.
 In an LAV mediator, global predicates are defined, but they
are not defined as views of the source data.
 Instead, for each source, expressions involving the global
predicates describe the tuples that the source is able to
produce.
 Queries are answered at the mediator by discovering all
possible ways to construct the query using the views
provided by the sources.
21.6.Motivation for LAV Mediators
 Sometimes the relationship between what the mediator
should provide and what the sources provide is more subtle.
 For example, consider the predicate Par(c, p), meaning that p
is a parent of c, which represents the set of all child-parent
facts that could ever exist.
 The sources will provide information about whatever
child-parent facts they know.
 There can be sources that provide child-grandparent
facts but no child-parent facts at all.
 Such a source can never be used to answer the child-parent
query under a GAV mediator.
 LAV mediators allow us to say that a certain source provides
grandparent facts.
21.6.Terminology for LAV Mediation.
 The queries at the mediator and the queries that describe
the sources are single Datalog rules.
 A query that is a single Datalog rule is often called a
conjunctive query.
 The global predicates of the LAV mediator are used as the
subgoals of mediator queries.
 There are also conjunctive queries that define the views.
 Their heads each have a unique view predicate, which is the
name of a view.
 Each view definition has a body consisting of global
predicates and is associated with a particular source.
 It is assumed that each view can be constructed with an
all-free adornment.
21.6.Example..
 Consider the global predicate Par(c, p), meaning that p is a
parent of c.
 One source produces parent facts. Its view is defined by the
conjunctive query:
V1(c, p) ← Par(c, p)
 Another source produces some grandparent facts. Its
conjunctive query is:
V2(c, g) ← Par(c, p) AND Par(p, g)
21.6.Example contd..
 The query at the mediator asks for the great-grandparent
facts obtainable from the sources. The mediator query is:
Q(w, z) ← Par(w, x) AND Par(x, y) AND Par(y, z)
 One solution uses the parent view V1 directly, three times:
Q(w, z) ← V1(w, x) AND V1(x, y) AND V1(y, z)
 Other solutions combine V1 (parent facts) with V2
(grandparent facts):
Q(w, z) ← V1(w, x) AND V2(x, z)
or
Q(w, z) ← V2(w, y) AND V1(y, z)
21.6.Expanding Solutions.
 Consider a query Q and a solution S whose body's subgoals
are views, each view V being defined by a conjunctive query
with V as its head.
 The body of V's conjunctive query can be substituted for a
subgoal in S that uses the predicate V, so that S's body comes
to consist of global predicates only.
21.6.Expansion Algorithm
 Suppose a solution S has a subgoal V(a1, a2, …, an), where the
ai's can be any variables or constants.
 Let the view V be of the form:
V(b1, b2, …, bn) ← B
where B represents the entire body.
 V(a1, a2, …, an) can be replaced in solution S by a version
of body B that has all the subgoals of B, with variables
possibly altered.
21.6.Expansion Algorithm contd..
The rules for altering the variables of B are:
1. First identify the local variables of B, i.e., the variables that
appear in the body but not in the head.
2. If any local variable of B also appears in S, replace it by a
distinct new variable that appears nowhere in the rule for V
or in S.
3. In the body B, replace each bi by ai, for i = 1, 2, …, n.
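A sketch of the substitution step in Python (subgoals are represented as (predicate, args) tuples; appending a quote is a stand-in for generating a truly fresh variable name):

def expand(head_args, body, call_args, used_vars):
    # Rule 1: local variables appear in the body but not in the head.
    local = {v for _, args in body for v in args} - set(head_args)
    # Rule 2: rename local variables that clash with variables of S.
    subst = {v: (v + "'" if v in used_vars else v) for v in local}
    # Rule 3: replace each head variable b_i by the call argument a_i.
    subst.update(dict(zip(head_args, call_args)))
    return [(p, tuple(subst.get(a, a) for a in args)) for p, args in body]

# Expanding V2(x, z), where V2(c, g) <- Par(c, p) AND Par(p, g):
body = [('Par', ('c', 'p')), ('Par', ('p', 'g'))]
print(expand(('c', 'g'), body, ('x', 'z'), {'w', 'x', 'z'}))
# -> [('Par', ('x', 'p')), ('Par', ('p', 'z'))]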
21.6.Example.
 Consider the view definitions:
V1(c, p) ← Par(c, p)
V2(c, g) ← Par(c, p) AND Par(p, g)
 One of the proposed solutions S is:
Q(w, z) ← V1(w, x) AND V2(x, z)
 The first subgoal, with predicate V1, expands to Par(w, x);
V1's definition has no local variables.
 The V2 subgoal has a local variable p, which appears neither
in S nor as a local variable in another substitution, so p can
be left as it is.
 Only x and z are substituted for the variables c and g.
 The solution S now expands to:
Q(w, z) ← Par(w, x) AND Par(x, p) AND Par(p, z)
21.6.Containment of Conjunctive Queries
A containment mapping from Q to E is a function τ from the
variables of Q to the variables and constants of E such that:
1. If x is the ith argument of the head of Q, then τ(x) is
the ith argument of the head of E.
2. If P(x1, x2, …, xn) is a subgoal of Q, then
P(τ(x1), τ(x2), …, τ(xn)) is a subgoal of E.
(We add to τ the rule that τ(c) = c for any constant c.)
21.6.Example
 Consider two conjunctive queries:
 Q1: H(x, y) ← A(x, z) AND B(z, y)
 Q2: H(a, b) ← A(a, c) AND B(d, b) AND A(a, d)
 Apply the substitution τ(x) = a, τ(y) = b, τ(z) = d: the head
of Q1 becomes H(a, b), which is the head of Q2.
 So there is a containment mapping from Q1 to Q2:
 the first subgoal of Q1 becomes A(a, d), which is the third
subgoal of Q2, and
 the second subgoal of Q1 becomes the second subgoal
of Q2.
 There is also a containment mapping from Q2 to Q1, so
the two conjunctive queries are equivalent.
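A containment mapping can be found by brute-force search over all variable assignments (a sketch; queries are (head_args, subgoal_set) pairs, and all arguments here are variables):

from itertools import product

def containment_mapping(q1, q2):
    head1, subs1 = q1
    head2, subs2 = q2
    vars1 = sorted({v for _, args in subs1 for v in args} | set(head1))
    terms2 = sorted({v for _, args in subs2 for v in args} | set(head2))
    for choice in product(terms2, repeat=len(vars1)):
        t = dict(zip(vars1, choice))
        if tuple(t[v] for v in head1) != tuple(head2):
            continue                 # heads must correspond
        if all((p, tuple(t[v] for v in args)) in subs2 for p, args in subs1):
            return t                 # every subgoal of Q1 maps into Q2
    return None

q1 = (('x', 'y'), {('A', ('x', 'z')), ('B', ('z', 'y'))})
q2 = (('a', 'b'), {('A', ('a', 'c')), ('B', ('d', 'b')), ('A', ('a', 'd'))})
print(containment_mapping(q1, q2))   # {'x': 'a', 'y': 'b', 'z': 'd'}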
21.6.Why the Containment-Mapping
Test Works
 Suppose there is a containment mapping τ from Q1 to Q2.
 When Q2 is applied to the database, we look for
substitutions σ for all the variables of Q2.
 The substitution for the head becomes a tuple t that is
returned by Q2.
 If we compose τ and then σ, we have a mapping from the
variables of Q1 to tuples of the database that produces the
same tuple t for the head of Q1, so t is also returned by Q1.
21.6.Finding Solutions to a Mediator
Query
 There can be an infinite number of solutions built from the
views, using any number of subgoals and variables.
 The LMSS Theorem limits the search:
 If a query Q has n subgoals, then any answer
produced by any solution is also produced by a solution
that has at most n subgoals.
 Moreover, if the conjunctive query that defines a view V has
in its body a predicate P that does not appear in the body of
the mediator query, then we need not consider any solution
that uses V.
21.6.Example.
 Recall the query
Q1: Q(w, z) ← Par(w, x) AND Par(x, y) AND Par(y, z)
 This query has three subgoals, so we don't have to look at
solutions with more than three subgoals.
21.6.Why the LMSS Theorem Holds
 Suppose we have a query Q with n subgoals, and there is a
solution S with more than n subgoals.
 The expansion E of S must be contained in the query Q, which
means there is a containment mapping from Q to E.
 Remove from S all subgoals whose expansion was not the
target of one of Q's subgoals under the containment
mapping.
 The result is a new conjunctive query S' with at most n
subgoals.
 If E' is the expansion of S', then E' is contained in Q.
 Also, S is contained in S', since the identity mapping is a
containment mapping from S' to S.
 Thus S need not be among the solutions considered for query Q.
Thank You