
Advanced Databases

UNIT 1
PARALLEL DATABASES




Introduction to Parallel Databases
Companies need to handle huge amounts of data at high data transfer rates. Client-server and centralized systems are not efficient enough for this, and the need to improve efficiency gave birth to the concept of Parallel Databases.
A parallel database system improves the performance of data processing by using multiple resources in parallel: multiple CPUs and disks work concurrently.
It also parallelizes many operations, such as data loading and query processing.
Goals of Parallel Databases




The concept of Parallel Databases was built with the following goals:
Improve performance: The performance of the system can be improved by connecting multiple CPUs and disks in parallel. Many small processors can also be connected in parallel.
Improve availability of data: Data can be copied to multiple locations to improve its availability. For example, if a module contains a relation (a table in the database) that is unavailable, it is important to make the relation available from another module.
Improve reliability: The reliability of the system is improved by the completeness, accuracy, and availability of data.
Provide distributed access to data: Companies with branches in multiple cities can access data with the help of a parallel database system.
Parallel Database Architectures
Today everybody is interested in storing the information they collect. Even small organizations gather data and maintain large databases. Though databases consume space, they are helpful in many ways; for example, they support decision making through a decision support system. Handling such voluminous data through a conventional centralized system is complex, and even simple queries become time consuming. The solution is to handle those databases through Parallel Database Systems, where a table or database is distributed among multiple processors, possibly equally, so that queries can be performed in parallel. A system that shares resources to handle massive data, in order to increase the performance of the whole system, is called a Parallel Database System.
We need architectures that can handle data through data distribution and parallel query execution, and thereby produce good throughput of queries or transactions. Figures 1, 2, and 3 show the different architectures proposed and successfully implemented in the area of parallel database systems. In the figures, P represents a Processor, M represents Memory, and D represents a Disk or disk setup.
1. Shared Memory Architecture
Figure 1 - Shared Memory Architecture
In Shared Memory architecture, a single memory is shared among many processors, as shown in Figure 1. Several processors are connected through an interconnection network to the main memory and disk setup. The interconnection network is usually a high-speed network (it may be a Bus, Mesh, or Hypercube) that makes data sharing (transport) easy among the various components (processors, memory, and disks).
Advantages:

Simple implementation

Establishes effective communication between processors through a single memory address space.

The above point leads to low communication overhead.
Disadvantages:

A higher degree of parallelism (more concurrent operations on different processors) cannot be achieved, because all the processors share the same interconnection network to reach memory. This causes a bottleneck in the interconnection network (interference), especially in the case of a bus interconnection network.

Adding a processor slows down the existing processors.

Cache coherency must be maintained. That is, if any processor reads data used or modified by other processors, we need to ensure that the data is the latest version.

The degree of parallelism is limited; too many parallel processes might degrade performance.
2. Shared Disk Architecture
Figure 2 - Shared Disk Architecture
In Shared Disk architecture, a single disk or disk setup is shared among all the available processors, and each processor has its own private memory, as shown in Figure 2.
Advantages:

Failure of any processor does not stop the entire system (fault tolerance).

The interconnection to memory is not a bottleneck (it was a bottleneck in the Shared Memory architecture).

Supports a larger number of processors (compared to the Shared Memory architecture).
Disadvantages:

The interconnection to the disk is a bottleneck, as all processors share a common disk setup.

Inter-processor communication is slow. The reason is that all the processors have their own memories, so communication between processors requires reading data from other processors' memories, which needs additional software support.
Example real-world Shared Disk implementation

DEC clusters (VMScluster) running Rdb
3. Shared Nothing Architecture
Figure 3 - Shared Nothing Architecture
In Shared Nothing architecture, every processor has its own memory and disk setup. This setup may be considered a set of individual computers connected through a high-speed interconnection network, using regular network protocols and switches, for example, to share data between computers. (This architecture is used in Distributed Database Systems.) In a Shared Nothing parallel database system, we insist on similar nodes, that is, homogeneous systems. (In a distributed database system we may use heterogeneous nodes.)
Advantages:

The number of processors is scalable. That is, the design is flexible enough to add more computers.

Unlike in the other two architectures, only data requests that cannot be answered by the local processor need to be forwarded through the interconnection network.
Disadvantages:

Non-local disk accesses are costly. That is, if a server receives a request and the required data is not available locally, the request must be routed to the server where the data resides, which is slightly complex.

Communication cost is involved in transporting data among computers.
Example real-world Shared Nothing implementations

Teradata

Tandem

Oracle nCUBE
Data Partitioning Strategies in Parallel Database Systems
Partitioning the tables/databases is a very important step in parallelizing database activities. By partitioning the data equally across many processors' workloads, we can achieve better performance (better parallelism) of the whole system. We have the following types of fragmentation (partitioning) techniques in parallel databases;
Horizontal Fragmentation – Partitioning a table using conditions specified through the WHERE clause of an SQL query, to distribute sets of tuples (records). For example, consider the following schema;
STUDENT (Regno, SName, Address, Branch, Phone)
In this table, the Branch column stores the academic branch in which the student is admitted. Suppose there are various branches like 'BTech CIVIL', 'BTech MECH', 'BTech CSE', and so on; then a query of the following form would be used to derive each horizontal partition.
SELECT * FROM student WHERE Branch = branch name;
In the result, there is no change in the schema of the table; i.e., the structure of the table is unaltered. Only, the 'BTech CIVIL' students are fragmented into one partition, the 'BTech CSE' students into another, and so on.
Vertical Fragmentation – Partitioning a table using decomposition rules to break it into multiple partitions vertically (with different schemas) is called Vertical Fragmentation.
For example, breaking STUDENT into two tables STUDENT(Regno, SName, Address, Branch) and STU_PHONE(Regno, Phone), i.e., into two different schemas on columns, is vertical partitioning.
Among the fragmentation techniques discussed above, partitioning a relation for parallel databases involves Horizontal Partitioning. Horizontal partitioning helps us distribute the data onto several processors so that queries can be executed on them simultaneously.
Partitioning Strategies:
There are various partitioning strategies proposed to manage the data distribution into multiple
processors evenly.
Let us assume that in our parallel database system we have n processors P0, P1, P2, …, Pn-1 and n
disks D0, D1, D2, …, Dn-1 where we partition our data. The value of n is chosen according to the
degree of parallelism required. The partitioning strategies are,
Round-Robin Partitioning
This is the simplest partitioning strategy. Here, record i of the relation is sent to disk Di mod n. If the number of available disks n is 10, then the first record goes to D1 (1 mod 10 = 1), the second record goes to D2 (2 mod 10 = 2), and so on; the 10th record goes to D0 (10 mod 10 = 0) and the 11th record goes to D1 (11 mod 10 = 1). This scheme distributes data evenly across all the disks.
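A minimal sketch of this placement rule (a single-process illustration; plain Python lists stand in for the disks):

```python
def round_robin_partition(records, n):
    """Send record i (numbered from 1) to disk D(i mod n)."""
    disks = [[] for _ in range(n)]
    for i, record in enumerate(records, start=1):
        disks[i % n].append(record)
    return disks

# With n = 10 disks, record 1 lands on D1, record 10 on D0, record 11 on D1.
disks = round_robin_partition(list(range(1, 12)), 10)
```

Note that every disk receives either floor(m/n) or ceil(m/n) of the m records, which is exactly the even spread the strategy promises.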
Hash Partitioning
This strategy identifies one or more attributes as partitioning attributes (which is not required in Round-Robin partitioning). We need a carefully chosen hash function (careful to avoid data skew) that takes the identified partitioning attributes as input.
For example, consider the following table;
EMPLOYEE(ENo, EName, DeptNo, Salary, Age)
If we choose DeptNo attribute as the partitioning attribute and if we have 10 disks to distribute
the data, then the following would be a hash function;
h(DeptNo) = DeptNo mod 10
If we have 10 departments, then according to the hash function, all the employees of
department 1 will go into disk 1, department 2 to disk 2 and so on.
As another example, if we choose the EName of the employees as partitioning attribute, then
we could have the following hash function;
h(EName) = (Sum of ASCII value of every character in the name) mod n,
where n is the number of disks/partitions needed.
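The two hash functions above can be sketched as follows (function names are our own; the formulas are the ones given in the text):

```python
def h_deptno(dept_no, n=10):
    # Partition on DeptNo: all employees of one department land on one disk.
    return dept_no % n

def h_ename(name, n):
    # Sum of the ASCII values of every character in the name, mod n.
    return sum(ord(ch) for ch in name) % n

# Department 7 maps to disk 7; a name maps to a disk derived from its characters.
d1 = h_deptno(7)
d2 = h_ename("SMITH", 3)
```

One caveat visible in the sketch: the ASCII-sum function ignores character order, so anagram names always share a partition, which is one way such a function can introduce skew.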
Range Partitioning
In Range Partitioning we identify one or more attributes as partitioning attributes. Then we choose a range-partitioning vector to distribute the table over n disks. The vector contains boundary values of the partitioning attribute.
For example, for the EMPLOYEE relation given above, if the partitioning attribute is Salary, then
the vector would be one as follows;
[5000, 15000, 30000],
where each value marks the inclusive upper boundary of a range. That is, 5000 represents the first range (0 – 5000), 15000 represents the second range (5001 – 15000), 30000 represents the third range (15001 – 30000), and there is an implicit final range (30001 and above). Hence, a vector with 3 values represents 4 disks/partitions.
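A sketch of how a range vector maps a value to a partition (Python's bisect module is a stand-in for the lookup; the boundary handling follows the inclusive upper bounds described above):

```python
import bisect

def range_partition(value, vector):
    """A vector of k boundary values defines k + 1 partitions;
    each boundary is the inclusive upper end of its range."""
    return bisect.bisect_left(vector, value)

vector = [5000, 15000, 30000]
# 5000 -> partition 0, 5001 -> partition 1, 30000 -> partition 2, 30001 -> partition 3
```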
The above discussed partition strategies must be chosen carefully according to the workload of
your parallel database system. The workload may involve many components like, which
attribute is frequently used in any queries as filtering condition, the number of records in a
table, the size of the database, approximate number of incoming queries in a day, etc.
Detailed Example:
Let us start with the following table, Emp_table. This instance of Emp_table has 14 records, and every record stores the name of an employee, his/her work grade, and the department name. Assume that we have 3 processors, P0, P1, and P2, and 3 disks associated with those processors, D0, D1, and D2.
Emp_table

ENAME    GRADE   DNAME
SMITH    1       RESEARCH
BLAKE    4       SALES
FORD     4       RESEARCH
KING     5       ACCOUNTING
SCOTT    4       RESEARCH
MILLER   2       ACCOUNTING
TURNER   3       SALES
WARD     2       SALES
MARTIN   2       SALES
ADAMS    1       RESEARCH
JONES    4       RESEARCH
JAMES    1       SALES
CLARK    4       ACCOUNTING
ALLEN    3       SALES

Table 1 – Emp_table
A. Round-Robin Partitioning:
In this strategy we partition records in a round-robin manner using the function i mod n, where i is the record's position in the table and n is the number of partitions/disks, which is 3 in our case. Applying this technique, the first record goes into D1, the second record into D2, the third record into D0, the fourth record into D1, and so on. After distributing the records, we get the following partitions;
Emp_table_Partition0

ENAME    GRADE   DNAME
FORD     4       RESEARCH
MILLER   2       ACCOUNTING
MARTIN   2       SALES
JAMES    1       SALES

Table 2 – Records 3, 6, 9, 12 (i mod 3 = 0)
Emp_table_Partition1

ENAME    GRADE   DNAME
SMITH    1       RESEARCH
KING     5       ACCOUNTING
TURNER   3       SALES
ADAMS    1       RESEARCH
CLARK    4       ACCOUNTING

Table 3 – Records 1, 4, 7, 10, 13 (i mod 3 = 1)
Emp_table_Partition2

ENAME    GRADE   DNAME
BLAKE    4       SALES
SCOTT    4       RESEARCH
WARD     2       SALES
JONES    4       RESEARCH
ALLEN    3       SALES

Table 4 – Records 2, 5, 8, 11, 14 (i mod 3 = 2)
B. Hash Partitioning:
Let us take the GRADE attribute of Emp_table to explain hash partitioning, and choose the following hash function;
h(GRADE) = (GRADE mod n)
where GRADE is the value of the GRADE attribute of a record and n is the number of partitions, which is 3 in our case. Applying this hash partitioning on GRADE, we get the following partitions of Emp_table. For example, the GRADE of 'SMITH' is 1, and the hash function sends it to partition 1 (1 mod 3 = 1). The GRADE of 'BLAKE' is 4, and (4 mod 3) also directs it to partition 1. The GRADE of 'KING' is 5, which directs it to partition 2 (5 mod 3 = 2).
Emp_table_Partition0

ENAME    GRADE   DNAME
TURNER   3       SALES
ALLEN    3       SALES

Table 5 – GRADE 3 (3 mod 3 = 0)
Emp_table_Partition1

ENAME    GRADE   DNAME
SMITH    1       RESEARCH
BLAKE    4       SALES
FORD     4       RESEARCH
SCOTT    4       RESEARCH
ADAMS    1       RESEARCH
JONES    4       RESEARCH
JAMES    1       SALES
CLARK    4       ACCOUNTING

Table 6 – GRADEs 1 and 4 (mod 3 = 1)
Emp_table_Partition2

ENAME    GRADE   DNAME
KING     5       ACCOUNTING
MILLER   2       ACCOUNTING
WARD     2       SALES
MARTIN   2       SALES

Table 7 – GRADEs 2 and 5 (mod 3 = 2)
C. Range Partitioning:
Let us consider the GRADE attribute of Emp_table for range partitioning. To apply a range partition, we first need to identify the partitioning vector, [v0, v1, …, vn-2]. Let us choose the following range-partitioning vector for our case;
[2, 4]
According to this vector, records with a GRADE value of 2 or less go into partition 0, records with a value greater than 2 and less than or equal to 4 go into partition 1, and all other values (greater than 4) go into partition 2, as depicted in the following tables.
Emp_table_Partition0

ENAME    GRADE   DNAME
SMITH    1       RESEARCH
MILLER   2       ACCOUNTING
WARD     2       SALES
MARTIN   2       SALES
ADAMS    1       RESEARCH
JAMES    1       SALES

Table 8 – GRADE values 1 and 2
Emp_table_Partition1

ENAME    GRADE   DNAME
BLAKE    4       SALES
FORD     4       RESEARCH
SCOTT    4       RESEARCH
TURNER   3       SALES
JONES    4       RESEARCH
CLARK    4       ACCOUNTING
ALLEN    3       SALES

Table 9 – GRADE values 3 and 4
Emp_table_Partition2

ENAME    GRADE   DNAME
KING     5       ACCOUNTING

Table 10 – GRADE value 5 and above
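All three partitionings of Emp_table above can be reproduced with a short single-process sketch (the nested lists play the role of disks D0, D1, D2; only ENAME and GRADE are kept, since DNAME does not affect any of the three strategies):

```python
import bisect

EMP = [("SMITH", 1), ("BLAKE", 4), ("FORD", 4), ("KING", 5), ("SCOTT", 4),
       ("MILLER", 2), ("TURNER", 3), ("WARD", 2), ("MARTIN", 2), ("ADAMS", 1),
       ("JONES", 4), ("JAMES", 1), ("CLARK", 4), ("ALLEN", 3)]  # (ENAME, GRADE)
n = 3

# A. Round-robin: record i (numbered from 1) goes to partition i mod 3.
rr = [[] for _ in range(n)]
for i, (name, grade) in enumerate(EMP, start=1):
    rr[i % n].append(name)

# B. Hash partitioning: h(GRADE) = GRADE mod 3.
hp = [[] for _ in range(n)]
for name, grade in EMP:
    hp[grade % n].append(name)

# C. Range partitioning on GRADE with vector [2, 4]: <=2, 3..4, >4.
rp = [[] for _ in range(n)]
for name, grade in EMP:
    rp[bisect.bisect_left([2, 4], grade)].append(name)
```

Running this yields exactly the contents of Tables 2 through 10.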
Interquery and Intraquery Parallelism in Parallel Database
Inter-query Parallelism
It is a form of parallelism where many different queries or transactions are executed in parallel with one another on many processors.
Advantages
It increases transaction throughput. That is, the number of transactions executed in a given time can be increased.
It scales up the transaction processing system. Hence, it is best suited for On-Line Transaction Processing (OLTP) systems.
Supported Parallel Database Architectures
It is easy to implement in a Shared Memory parallel system. Lock tables and log information are maintained in the same memory, so it is easy to handle transactions that share locks with other transactions, and locking and logging can be done efficiently.
In the other parallel architectures, Shared Disk and Shared Nothing, locking and logging must be done through message passing between processors, which is a costly operation compared to the Shared Memory parallel architecture, and the cache coherency problem can occur.
Example Database systems which support Inter-query Parallelism
Oracle 8 and Oracle Rdb
Intra-Query Parallelism
It is the form of parallelism where a single query is executed in parallel on many processors.
Advantages
It speeds up single complex, long-running queries.
It is best suited for complex scientific calculations (queries).
Supported Parallel Database Architectures
Shared Memory, Shared Disk, and Shared Nothing parallel architectures are all supported. We need not worry about locking and logging, because only a single query is being parallelized.
Types
Intra-operation parallelism – speeding up a query by parallelizing the execution of individual operations. The operations that can be parallelized include Sort, Join, Projection, Selection, and so on.
Inter-operation parallelism – speeding up a query by parallelizing the various operations that are part of the query. For example, a query that involves a join of 4 tables can be executed in parallel on two processors in such a way that each processor joins two relations locally, and the two results can then be joined to produce the final result.
Example Database systems which support Intra-query Parallelism
Informix, Teradata
Parallel Database - Intraquery Parallelism
Intraquery Parallelism
It is about executing a single query in parallel using multiple processors or disks. This can be done by dividing the data into many smaller units and executing the query on those smaller tables. Many queries are complex and consume much time and many resources. For example, consider the query given below;
SELECT * FROM Email ORDER BY Start_Date;
This query sorts the records of the Email table in ascending order on the attribute Start_Date. Assume that the Email table has 10000 records. Sorting the whole table requires a great many comparisons of Start_Date values if the query is executed on a single processor. If the same query is executed on several processors simultaneously, we can finish sorting in less time. For example, if the 10000 records are distributed equally over 10 processors, 1000 each, then every processor can sort its 1000 records in less time, and the sorted runs can later be combined/merged using other algorithms to produce the final sorted result.
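The divide, sort locally, then merge idea can be sketched in a few lines (a single-process illustration; heapq.merge stands in for the final merge step, and the round-robin split stands in for distributing records to processors):

```python
import heapq

def partition_sort_merge(records, n, key):
    # Distribute records round-robin across n "processors".
    chunks = [records[i::n] for i in range(n)]
    # Each processor sorts its own, much smaller, chunk.
    runs = [sorted(chunk, key=key) for chunk in chunks]
    # The sorted runs are merged into the final result.
    return list(heapq.merge(*runs, key=key))

# A tiny hypothetical Email table as (id, Start_Date) pairs:
emails = [("a", 20210105), ("b", 20200311), ("c", 20220720), ("d", 20190101)]
by_date = partition_sort_merge(emails, 2, key=lambda r: r[1])
```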
In Intraquery Parallelism, we have two types;

Intraoperation Parallelism

Interoperation Parallelism
Parallel Database - Intraoperation Parallelism
Intra-operation Parallelism
It is about parallelizing a single relational operation given in a query.
SELECT * FROM Email ORDER BY Start_Date;
In the above query, the relational operation is Sorting. Since a table may have a large number of records in it, the operation can be performed on different subsets of the table in multiple processors, which reduces the time required to sort.
Consider another query,
SELECT * FROM Student, CourseRegd WHERE Student.Regno = CourseRegd.Regno;
In this query, the relational operation is Join. The query joins two tables, Student and CourseRegd, on the common attribute Regno. Parallelism is valuable here when the tables are very large. Usually, the order of tuples does not matter in a DBMS; hence, with tables stored in random order, every record of one table must be matched against every record of the other table to complete the join. For example, if Student has 10000 records and CourseRegd has 60000 records, the join requires 10000 X 60000 comparisons. If we exploit parallelism here, we can achieve much better performance.
There are many such relational operations which can be executed in parallel using many
processors on subsets of the table/tables mentioned in the query. The following list includes
the relational operations and various techniques used to implement those operations in
parallel.
Parallel Sort
    o Range-Partitioning Sort
    o Parallel External Sort-Merge
Parallel Join
    o Partitioned Join
    o Fragment and Replicate Join
Parallel Sort - Range Partitioning Sort
Range Partitioning Sort
Assumptions:
Assume n processors, P0, P1, …, Pn-1 and n disks D0, D1, …, Dn-1.
Disk Di is associated with Processor Pi.
Relation R is partitioned into R0, R1, …, Rn-1 using the round-robin, hash, or range partitioning technique (if range partitioned, on some attribute other than the sorting attribute).
Objective:
Our objective is to sort a relation (table) R, which resides across the n disks, on an attribute A in parallel.
Steps:
Step 1: Range partition the relation Ri on the sorting attribute A at every processor, using a range vector v. Send the partitioned records that fall in the ith range to processor Pi, where they are temporarily stored on Di.
Step 2: Sort each partition locally at each processor Pi, and send the sorted results for the final concatenation, which is a trivial process because the ranges are already ordered relative to one another.
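The two steps can be sketched as follows (a single-process stand-in; the nested lists play the role of the n disks, and a key function selects the sorting attribute A):

```python
import bisect

def range_partitioning_sort(partitions, vector, key):
    n = len(vector) + 1
    # Step 1: every processor sends each of its records to the range it falls in.
    ranges = [[] for _ in range(n)]
    for part in partitions:
        for record in part:
            ranges[bisect.bisect_left(vector, key(record))].append(record)
    # Step 2: each processor sorts its range locally; concatenation is trivial
    # because the ranges are already ordered relative to one another.
    return [r for rng in ranges for r in sorted(rng, key=key)]

# Three "disks" of salary values, range vector [14000, 24000]:
salaries = [[30000, 9000], [16000, 40000], [12000, 22000]]
result = range_partitioning_sort(salaries, [14000, 24000], key=lambda s: s)
```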
Parallel Database - Parallel External Sort-Merge Technique
Parallel External Sort-Merge
Assumptions:
Assume n processors, P0, P1, …, Pn-1 and n disks D0, D1, …, Dn-1.
Disk Di is associated with Processor Pi.
Relation R is partitioned into R0, R1, …, Rn-1 using Round-robin technique or Hash
Partitioning technique or Range Partitioning technique (partitioned on any attribute)
Objective:
Our objective is to sort a relation (table) R on an attribute A in parallel, where R resides across n disks.
Steps:
Step 1: Sort the relation partition Ri which is stored on disk Di on the sorting attribute of the
query.
Step 2: Identify a range partition vector v and range partition every Ri into processors, P0, P1, …,
Pn-1 using vector v.
Step 3: Each processor Pi performs a merge on the incoming range-partitioned data from all the other processors. (The data are actually transferred in order: all processors send their first partition to P0, then all processors send their second partition to P1, and so on.)
Step 4: Finally, concatenate all the sorted data from different processors to get the final result.
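The four steps can be sketched as follows (a single-process stand-in; each inner list of streams plays the role of one sender's sorted stream arriving at a destination processor):

```python
import bisect
import heapq

def parallel_external_sort_merge(partitions, vector, key):
    n_ranges = len(vector) + 1
    # Step 1: sort each partition Ri locally on its own processor.
    runs = [sorted(p, key=key) for p in partitions]
    # Step 2: range-partition every sorted run; records arrive at each
    # destination processor as one sorted stream per sender.
    streams = [[[] for _ in runs] for _ in range(n_ranges)]
    for j, run in enumerate(runs):
        for record in run:
            streams[bisect.bisect_left(vector, key(record))][j].append(record)
    # Step 3: each processor merges its incoming sorted streams.
    merged = [list(heapq.merge(*streams[i], key=key)) for i in range(n_ranges)]
    # Step 4: concatenating the per-range results is trivial.
    return [r for part in merged for r in part]
```

Comparing this with the previous sketch of Range-Partitioning Sort shows the essential difference: here sorting happens before redistribution, so the receivers merge sorted streams instead of sorting raw records.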
Point to note:
The range partition must be done using a good range-partitioning vector; otherwise, skew might become a problem.
Example:
Let us explain the above process with a simple example. Consider the following relation schema Employee;
Employee (Emp_ID, EName, Salary)
Assume that relation Employee is permanently partitioned using Round-robin technique into 3
disks D0, D1, and D2 which are associated with processors P0, P1, and P2. At processors P0, P1,
and P2, the relations are named Employee0, Employee1 and Employee2 respectively. This initial
state is given in Figure 1.
Figure 1 - Partitioned Employee
Assume that the following sorting query is initiated.
SELECT * FROM Employee ORDER BY Salary;
As already said, the table Employee is not partitioned on the sorting attribute Salary. Then, the
Parallel External Sort-Merge technique works as follows;
Step 1:
Sort the data stored in every partition (every disk) on the ordering attribute Salary. (Sorting of the data in every partition is done temporarily.) At this stage every Employeei contains salary values ranging from its local minimum to its local maximum. The partitions sorted in ascending order are shown below in Figure 2.
Figure 2 - Partitioned Employee in ascending order on Salary attribute
Step 2:
We have to identify a range vector v on the Salary attribute. The range vector is of the
form v[v0, v1, …, vn-2]. For our example, let us assume the following range vector;
v[14000, 24000]
This range vector represents 3 ranges, range 0 (14000 and less), range 1 (14001 to 24000) and
range 2 (24001 and more).
Redistribute every partition (Employee0, Employee1, and Employee2) into the 3 disks temporarily using this range vector. The status of Temp_Employee 0, 1, and 2 after distributing Employee0 is given in Figure 3.
Figure 3 - Distribution of Employee 0 according to the range vector
Step 3:
Actually, the above said distribution is executed at all processors in parallel such that
processors P0, P1, and P2 are sending the first partition of Employee 0, 1, and 2 to disk 0. Upon
receiving the records from various partitions, the receiving processor P0 merges the sorted
data. This is shown in Figure 4.
Figure 4 - Merging data of range <=14000 at D0 from all the disks
The above said process is done at all processors for different partitions. The final version of
Temp_Employee 0, 1, and 2 are shown in Figure 5.
Figure 5 - Status of Temp_Employee 0, 1, and 2 after distributing all ranges from all disks
Step 4:
The final concatenation of sorted data from all the disks is trivial.
For comparison, the same example can be worked with the Range-Partitioning Sort technique described earlier. The setup is unchanged: the Employee relation is permanently partitioned using the round-robin technique into the 3 disks D0, D1, and D2 (relations Employee0, Employee1, and Employee2 at processors P0, P1, and P2, as in Figure 1), and the same sorting query is initiated:
SELECT * FROM Employee ORDER BY Salary;
Since Employee is not partitioned on the sorting attribute Salary, the Range-Partitioning technique works as follows;
Step 1:
At first we have to identify a range vector v on the Salary attribute. The range vector is
of the form v[v0, v1, …, vn-2]. For our example, let us assume the following range
vector;
 v[14000, 24000]
This range vector represents 3 ranges, range 0 (14000 and less), range 1 (14001 to
24000) and range 2 (24001 and more).
Redistribute the relations Employee0, Employee1 and Employee2 using these range
vectors into 3 disks temporarily. After this distribution disk 0 will have range 0 records
(i.e, records with salary value less than or equal to 14000), disk 1 will have range 1
records (i.e, records with salary value greater than 14000 and less than or equal to
24000), and disk 2 will have range 2 records (i.e, records with salary value greater than
24000).
This redistribution according to range vector v is represented in Figure 2 as links to all
the disks from all the relations. Temp_Employee0, Temp_Employee1, and
Temp_Employee2, are the relations after successful redistribution. These tables are
stored temporarily in disks D0, D1, and D2. (They can also be stored in Main memories
(M0, M1, M2) if they fit into RAM).
Figure 2
Step 2:
Now we have temporary relations at all the disks after redistribution.
At this point, every processor sorts the data assigned to it in ascending order of Salary individually. Performing the same operation in parallel on different sets of data is called Data Parallelism.
Final Result:
After the processors completed the sorting, we can simply collect the data from different
processors and merge them. This merge process is straightforward as we have data already
sorted for every range. Hence, collecting sorted records from partition 0, partition 1 and
partition 2 and merging them will give us final sorted output.
Parallel Join Technique - Partitioned Join
Partitioned Join
The relational tables to be joined are partitioned on the joining attributes of both tables, using the same partitioning function, so that the join operation can be performed in parallel.
How does Partitioned Join work?
Assume that relational tables r and s need to be joined on attributes r.A and s.B. The system partitions both r and s into n partitions using the same partitioning technique, with the joining attributes A and B used as the partitioning attributes of r and s respectively. r is partitioned into r0, r1, r2, …, rn-1 and s is partitioned into s0, s1, s2, …, sn-1. Then the system sends partitions ri and si to processor Pi, where the join is performed locally.
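A minimal sketch of a partitioned join (partitioning on the join attributes with a simple mod function, assuming integer join keys; a nested-loop join stands in for whatever local join method each processor would actually use):

```python
def partitioned_join(r, s, key_r, key_s, n):
    # Partition both relations with the same function on the join attributes.
    r_parts = [[] for _ in range(n)]
    s_parts = [[] for _ in range(n)]
    for t in r:
        r_parts[key_r(t) % n].append(t)
    for t in s:
        s_parts[key_s(t) % n].append(t)
    # Each processor Pi joins ri with si locally.
    result = []
    for i in range(n):
        for tr in r_parts[i]:
            for ts in s_parts[i]:
                if key_r(tr) == key_s(ts):
                    result.append(tr + ts)
    return result
```

Because both relations use the same partitioning function on the joining attributes, matching tuples can never end up in different partitions, which is exactly why this works only for equality conditions.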
What type of joins can we perform in parallel using Partitioned Join?
Only joins such as Equi-Joins and Natural Joins can be performed using the Partitioned Join technique.
Why only Equi-Join and Natural Join?
An Equi-Join or Natural Join is done between two tables using an equality condition such as r.A = s.B. The tuples satisfying this condition, i.e., having the same value for both A and B, are joined together; the others are discarded. Hence, if we partition the relations r and s by applying the same partitioning technique to both A and B, the tuples having the same value for A and B will end up in the same partition. Let us analyze this with a simple example;
RegNo   SName    Gen   Phone
1       Sundar   M     9898786756
3       Karthik  M     8798987867
4       John     M     7898886756
2       Ram      M     9897786776

Table 1 - STUDENT

RegNo   Courses
4       Database
2       Database
3       Data Structures
1       Multimedia

Table 2 – COURSES_REGD
Let us assume the following;
The RegNo attributes of tables STUDENT and COURSES_REGD are used for joining. Observe the order of tuples in both tables: they are in no particular order, i.e., stored in random order on RegNo.
We partition the tables on the RegNo attribute using hash partitioning. We have 2 disks and we need to partition the relational tables into two (possibly equal) partitions, so n is 2. The hash function is h(RegNo) = (RegNo mod n) = (RegNo mod 2). Applying this hash function, the tables STUDENT and COURSES_REGD are partitioned into Disk0 and Disk1 as shown below.
Partition 0:

STUDENT_0
RegNo   SName   Gen   Phone
4       John    M     7898886756
2       Ram     M     9897786776

COURSES_REGD_0
RegNo   Courses
4       Database
2       Database

Partition 1:

STUDENT_1
RegNo   SName    Gen   Phone
1       Sundar   M     9898786756
3       Karthik  M     8798987867

COURSES_REGD_1
RegNo   Courses
3       Data Structures
1       Multimedia
From the above tables, it is clear that equal RegNo values of STUDENT and COURSES_REGD are sent to the same partition. Now the join can be performed locally at every processor in parallel.
One more interesting fact about this join: only 4 (2 STUDENT records X 2 COURSES_REGD records) comparisons need to be done in each partition for our example. Hence, we need a total of 8 comparisons in the partitioned join, against 16 (4 X 4) in the conventional join.
The above discussed process is shown in Figure 1.
Figure 1 - Partitioned Join
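The comparison counts claimed above can be checked in a few lines (RegNo values only, with the hash function h(RegNo) = RegNo mod 2 from the example):

```python
students = [1, 3, 4, 2]   # RegNo column of STUDENT
courses = [4, 2, 3, 1]    # RegNo column of COURSES_REGD
n = 2

s_parts = [[x for x in students if x % n == i] for i in range(n)]
c_parts = [[x for x in courses if x % n == i] for i in range(n)]

# Comparisons per partition are |STUDENT_i| x |COURSES_REGD_i|.
partitioned = sum(len(a) * len(b) for a, b in zip(s_parts, c_parts))
naive = len(students) * len(courses)
# partitioned = 8, naive = 16
```

With n equal-sized partitions the comparison count drops by roughly a factor of n, which is where the speedup of a partitioned join comes from.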
Fragment and Replicate Join - a parallel joining technique
Fragment and Replicate Join
Introduction
We know about different types of joins in terms of the nature of the join condition and the operators used: Equi-Join and Non-Equi Join. An equi-join is performed by checking an equality condition between the joining attributes of different tables; a non-equi join is performed by checking an inequality condition between the joining attributes.
Equi-join is of the form,
SELECT columns FROM list_of_tables WHERE table1.column = table2.column;
whereas, Non-equi join is of the form,
SELECT columns FROM list_of_tables WHERE table1.column < table2.column;
(Or, any other operators >, <>, <=, >= etc. in the place of < in the above example)
We have discussed Partitioned Join in the previous post, where we partitioned the relational
tables that are to be joined, into equal partitions and we performed join on individual partitions
locally at every processor. Partitioning the relations on the joining attribute and join them will
work only for joins that involve equality conditions.
Clearly, joining the tables by partitioning works only for equi-joins or natural joins. For inequality joins, partitioning will not work. Consider a join condition as given below;
r ⋈r.a > s.b s
In this non-equi join condition, the tuples (records) of r must be joined with the records of s wherever the value of attribute r.a is greater than s.b. In other words, all the records of r may join with some records of s and vice versa; that is, all records of one relation must be compared against records of the other relation. For a clear example, see the Non-equi join post.
What does fragment and replicate mean?
Fragment means partitioning a table either horizontally or vertically (horizontal and vertical fragmentation). Replicate means duplicating a relation, i.e., generating identical copies of a table. This join is performed by fragmenting and replicating the tables to be joined.
Asymmetric Fragment and Replicate Join
(How does Asymmetric Fragment and Replicate Join work?)
It is a variant of Fragment and Replicate join. It works as follows;
1. The system fragments table r into n fragments r0, r1, r2, …, rn-1, where r is one of the
tables to be joined and n is the number of processors. Any partitioning technique
(round-robin, hash, or range partitioning) can be used to partition the relation.
2. The system replicates the other table, say s, to all n processors. That is, the system
generates n copies of s and sends one to each processor.
3. Now we have r0 and s at processor P0, r1 and s at processor P1, r2 and s at processor P2, …,
and rn-1 and s at processor Pn-1. Each processor Pi performs the join of ri with s locally.
Figure 1 given below shows the process of Asymmetric Fragment-and-Replicate join (it may not
be the most appropriate example, but it clearly shows the process);
Figure 1 - Process of Asymmetric Fragment-and-Replicate Parallel Join
Points to Note:
1. Non-equi joins can be performed in parallel.
2. If one of the relations to be joined is already partitioned across n processors, this technique is
well suited, because we only need to replicate the other relation.
3. Unlike in Partitioned Join, any partitioning technique can be used.
4. The technique performs best when one of the relations to be joined is very small.
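The three steps above can be sketched on a simulated shared-nothing system, with processors modeled as list indices. The table contents, the round-robin fragmentation, and the greater-than join condition are illustrative assumptions:

```python
# Sketch of Asymmetric Fragment-and-Replicate join.
# Processors are simulated as slots in a list; data is illustrative.
n = 3  # number of processors

r = [(1, "x"), (7, "y"), (4, "z"), (9, "w")]   # larger table: fragmented
s = [(2,), (6,)]                                # smaller table: replicated

# Step 1: fragment r into n pieces (round-robin partitioning).
fragments = [r[i::n] for i in range(n)]

# Step 2: replicate s, sending a full copy to every processor.
processors = [(fragments[i], list(s)) for i in range(n)]

# Step 3: each processor Pi joins its fragment ri with its copy of s
# locally, here on the non-equi condition r.key > s.key.
result = []
for r_i, s_copy in processors:
    for t_r in r_i:
        for t_s in s_copy:
            if t_r[0] > t_s[0]:
                result.append((t_r, t_s))

print(len(result))
```

Because every fragment of r meets a complete copy of s, no matching pair is lost even though the join condition is an inequality.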
Fragment and Replicate Join
(How does Fragment and Replicate Join work?)
It is the general case of the Asymmetric Fragment-and-Replicate join technique. The asymmetric
technique is best suited when one of the relations to be joined is small enough to fit in memory.
If both relations to be joined are large, and the join is a non-equi join, then we need to use the
general Fragment-and-Replicate Join. It works as follows;
1. The system fragments table r into m fragments r0, r1, r2, …, rm-1, and s into n
fragments s0, s1, s2, …, sn-1. Any partitioning technique (round-robin, hash, or range
partitioning) can be used to partition the relations.
2. The values of m and n are chosen based on the number of available processors: we need at
least m*n processors to perform the join.
3. Now we have to distribute all the partitions of r and s into available processors.
And, remember that we need to compare every tuple of one relation with every tuple of the other
relation. That is, the records of partition r0 must be compared with all partitions of s, the
records of partition s0 must be compared with all partitions of r, and so on for all the
partitions of r and s. Hence, the data distribution is done as follows;
a. As we need m*n processors, let us assume that we have processors P0,0, P0,1, …, P0,n-1,
P1,0, P1,1, …, Pm-1,n-1. Thus, processor Pi,j performs the join of ri with sj.
b. To ensure the comparison of every partition of r with every partition of s, we
replicate ri to the processors Pi,0, Pi,1, Pi,2, …, Pi,n-1, where 0, 1, 2, …, n-1 are the partitions of s.
This replication ensures the comparison of every ri with the complete s.
c. To ensure the comparison of every partition of s with every partition of r, we
replicate si to the processors P0,i, P1,i, P2,i, …, Pm-1,i, where 0, 1, 2, …, m-1 are the partitions of r.
This replication ensures the comparison of every si with the complete r.
4. Pi,j computes the join locally to produce the join result.
Figure 2 given below shows the process of general case Fragment-and-Replicate join (it may not
be the appropriate example, but it clearly shows the process);
Figure 2 - process of general case Fragment-and-Replicate Join
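The m*n grid distribution described above can be sketched as follows; the tables and the values of m and n are illustrative assumptions, and the grid is simulated with nested loops rather than real processors:

```python
# Sketch of the general Fragment-and-Replicate join: r is split into m
# fragments, s into n fragments, and processor P(i,j) joins ri with sj.
m, n = 2, 3

r = [(1,), (7,), (4,), (9,)]
s = [(2,), (6,), (5,), (3,), (8,), (0,)]

r_frags = [r[i::m] for i in range(m)]   # round-robin fragmentation of r
s_frags = [s[j::n] for j in range(n)]   # round-robin fragmentation of s

# The m*n processor grid: P(i,j) receives a copy of ri and a copy of sj,
# so every fragment of r meets every fragment of s exactly once.
result = []
for i in range(m):
    for j in range(n):
        for t_r in r_frags[i]:
            for t_s in s_frags[j]:
                if t_r[0] > t_s[0]:     # non-equi join condition
                    result.append((t_r, t_s))

# Sanity check: the grid produces the same pairs as a direct nested loop.
direct = [(t_r, t_s) for t_r in r for t_s in s if t_r[0] > t_s[0]]
print(sorted(result) == sorted(direct))
```

The sanity check at the end illustrates why the replication scheme is correct: every (ri, sj) pair is assigned to exactly one processor, so the union of local results equals the full join.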
Partitioned Parallel Hash Join
The sequential hash join technique discussed in the post Hash Join technique in DBMS can be
parallelized.
The Technique
Please go through the post Hash Join technique in DBMS for a better understanding of Parallel
Hash Join.
Assume that we have:
• n processors, P0, P1, …, Pn-1
• two tables, r and s, already partitioned across the disks of the n processors
• s is the smaller table (the build relation)
Each processor redistributes its local tuples of both r and s by applying a hash function h to
the join attribute, sending each tuple to the processor identified by the hash value. After
redistribution, all tuples that can join reside at the same processor, so each processor Pi
performs a sequential hash join locally on its partitions ri and si.
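A minimal single-process sketch of a partitioned parallel hash join: both relations are redistributed by hashing the join key, then each "processor" runs a local hash join. The tables, keys, and the use of modulo as a stand-in hash function are illustrative assumptions:

```python
# Sketch of a Partitioned Parallel Hash Join on n simulated processors.
n = 4  # number of processors

r = [(1, "r1"), (2, "r2"), (3, "r3"), (2, "r4")]  # probe relation
s = [(2, "s1"), (3, "s2"), (5, "s3")]             # build relation (smaller)

# Step 1: redistribution - each tuple is sent to processor key % n
# (modulo stands in for a real hash function on the join attribute).
r_parts = [[t for t in r if t[0] % n == i] for i in range(n)]
s_parts = [[t for t in s if t[0] % n == i] for i in range(n)]

# Step 2: each processor builds an in-memory hash table on its partition
# of s and probes it with its partition of r - a local sequential hash join.
result = []
for i in range(n):
    build = {}
    for key, val in s_parts[i]:
        build.setdefault(key, []).append(val)
    for key, val in r_parts[i]:
        for s_val in build.get(key, []):
            result.append((key, val, s_val))

print(sorted(result))
```

Because the same hash function is applied to both relations, any r-tuple and s-tuple that share a join key land on the same processor, so the union of the local joins is the complete equi-join result.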
Interoperation Parallelism
It is about executing different operations of a query in parallel. A single query may involve
multiple operations at once. We may exploit parallelism to achieve better performance of such
queries. Consider the example query given below;
SELECT AVG(Salary) FROM Employee GROUP BY Dept_Id;
It involves two operations: grouping and aggregation. For executing
this query,
We need to group all the employee records based on the attribute Dept_Id first.
Then, for every group, we apply the AVG aggregate function to get the final result.
We can use Interoperation parallelism concept to parallelize these two operations.
[Note: Intra-operation parallelism is about executing a single operation of a query using
multiple processors in parallel]
The following are the variants using which we would achieve Interoperation Parallelism;
1. Pipelined Parallelism
2. Independent Parallelism
1. Pipelined Parallelism
In Pipelined Parallelism, the idea is that the result produced by one operation is consumed
directly by the next operation in the pipeline. For example, consider the following operation;
r1 ⋈ r2 ⋈ r3 ⋈ r4
The above expression shows a natural join operation. This actually joins four tables. This
operation can be pipelined as follows;
Perform temp1 ← r1 ⋈ r2 at processor P1 and send the result temp1 to processor P2, which
performs temp2 ← temp1 ⋈ r3 and sends the result temp2 to processor P3, which performs
result ← temp2 ⋈ r4. The advantage is that we do not need to store the intermediate results;
instead, the result produced at one processor is consumed directly by the next. Hence, we
start receiving result tuples well before P1 completes the join assigned to it.
Disadvantages:
1. Pipelined parallelism is not a good choice when the degree of parallelism is high.
2. It is useful only with a small number of processors.
3. Not all operations can be pipelined. For example, consider the query given in the first section:
the grouping operation must finish at least one complete department's employees before its
output can be given to the aggregate operation at the next processor.
4. We cannot expect a full speedup.
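The streaming behaviour of a pipeline can be imitated on a single machine with Python generators: each stage consumes tuples from the previous stage as they are produced, without ever materializing the intermediate result. The four single-column "tables" below are illustrative:

```python
# Sketch of pipelined execution of r1 ⋈ r2 ⋈ r3 ⋈ r4 using generators.
# Each stage streams its output to the next; intermediate results
# (temp1, temp2) are never stored as whole tables.
r1, r2, r3, r4 = [1, 2, 3], [2, 3, 4], [3, 4], [3, 5]

def join(left, right):
    """Stream matching values (a natural join on the single column)."""
    right = list(right)       # only the base (inner) table is materialized
    for v in left:            # 'left' may be another stage's generator
        for w in right:
            if v == w:
                yield v       # emit each joined tuple immediately

# Build the pipeline: the output of one join feeds the next stage.
pipeline = join(join(join(r1, r2), r3), r4)
print(list(pipeline))
```

Nothing computes until the final stage pulls tuples, and each matching value flows through all three stages before the first stage has scanned its whole input, which is exactly the pipelining effect described above.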
2. Independent Parallelism:
Operations that do not depend on each other can be executed in parallel at different
processors. This is called Independent Parallelism.
For example, in the expression r1 ⋈ r2 ⋈ r3 ⋈ r4, the portion r1 ⋈ r2 can be done in one
processor, and r3 ⋈ r4 can be performed in the other processor. Both results can be pipelined
into the third processor to get the final result.
Disadvantages:
It does not work well when the degree of parallelism is high.
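The independent evaluation of r1 ⋈ r2 and r3 ⋈ r4 can be sketched with two worker threads standing in for two processors; the tables and the thread-pool setup are illustrative assumptions:

```python
# Sketch of Independent Parallelism: r1 ⋈ r2 and r3 ⋈ r4 do not depend
# on each other, so they run concurrently on separate worker threads;
# a third step combines the two results.
from concurrent.futures import ThreadPoolExecutor

r1, r2, r3, r4 = [1, 2, 3], [2, 3, 4], [2, 3, 5], [3, 5, 6]

def join(a, b):
    """Natural join on the single column: keep values present in both."""
    return [v for v in a for w in b if v == w]

with ThreadPoolExecutor(max_workers=2) as pool:
    f_left = pool.submit(join, r1, r2)    # "processor 1": r1 ⋈ r2
    f_right = pool.submit(join, r3, r4)   # "processor 2": r3 ⋈ r4
    # "processor 3" joins the two independently computed results.
    result = join(f_left.result(), f_right.result())

print(result)
```

The two `submit` calls return immediately, so both partial joins proceed at the same time; only the final combining join has to wait for them, mirroring the three-processor plan described above.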
Query Optimization
Query optimizers account in large measure for the success of relational technology. Recall that
a query optimizer takes a query and finds the cheapest execution plan among the many
possible execution plans that give the same answer. Query optimizers for parallel query
evaluation are more complicated than query optimizers for sequential query evaluation.
First, the cost models are more complicated, since partitioning costs have to be accounted for,
and issues such as skew and resource contention must be taken into account. More important
is the issue of how to parallelize a query. Suppose that we have somehow chosen an expression
(from among those equivalent to the query) to be used for evaluating the query.
The expression can be represented by an operator tree. To evaluate an operator tree in a
parallel system, we must make the following decisions:
• How to parallelize each operation, and how many processors to use for it.
• What operations to pipeline across different processors, what operations to execute
independently in parallel, and what operations to execute sequentially, one after the
other.
These decisions constitute the task of scheduling the execution tree. Determining the resources
of each kind—such as processors, disks, and memory— that should be allocated to each
operation in the tree is another aspect of the optimization problem.
Design of Parallel Systems
Parallel loading of data from external sources is an important requirement, if we are to handle
large volumes of incoming data. A large parallel database system must also address these
availability issues:
• Resilience to failure of some processors or disks.
• Online reorganization of data and schema changes.
When we are dealing with large volumes of data (ranging in the terabytes), simple operations,
such as creating indices, and changes to schema, such as adding a column to a relation, can take
a long time—perhaps hours or even days. Therefore, it is unacceptable for the database system
to be unavailable while such operations are in progress. Most database systems allow such
operations to be performed online, that is, while the system is executing other transactions.
Parallelism on Multicore Processors
Parallelism has become commonplace on most computers today, even some of the smallest,
due to current trends in computer architecture. As a result, virtually all database systems today
run on a parallel platform.
Parallelism versus Raw Speed
Since the dawn of computers, processor speed has increased at an exponential rate, doubling
every 18 to 24 months. This increase results from an exponential growth in the number of
transistors that could be fit within a unit area of a silicon chip, and is known popularly as
Moore’s law, named after Intel co-founder Gordon Moore. Technically, Moore’s law is not a
law, but rather an observation and a prediction regarding technology trends.
As a result, modern processors typically are not one single processor but rather consist of
several processors on one chip. To maintain a distinction between on-chip multiprocessors and
traditional processors, the term core is used for an on-chip processor. Thus we say that a
machine has a multicore processor.
Cache Memory and Multithreading
Each core is capable of processing an independent stream of machine instructions. However,
because processors are able to process data faster than it can be accessed from main memory,
main memory can become a bottleneck that limits overall performance. For this reason,
computer designers include one or more levels of cache memory in a computer system. Cache
memory is more costly than main memory on a per-byte basis, but offers a faster access time.
In multilevel cache designs, the levels are called L1, L2, and so on, with L1 being the fastest
cache (and thus the most costly per byte and therefore the smallest), L2 the next fastest, and so
on. The result is an extension of the storage hierarchy to include the various levels of cache
below main memory.
Adapting Database System Design for Modern Architectures
It would appear that database systems are an ideal application to take advantage of large
numbers of cores and threads, since database systems support large numbers of concurrent
transactions. However, some components of a database system are shared by all transactions,
and contention for these shared components can limit scalability.
The adaptation of database system design and database query processing to multicore and
multithreaded systems remains an area of active research.