UNIT 1 PARALLEL DATABASES

Introduction to Parallel Databases
Companies need to handle huge amounts of data with high data transfer rates, and centralized or client-server systems are not efficient enough for this. The need to improve efficiency gave birth to the concept of Parallel Databases. A parallel database system improves the performance of data processing by using multiple resources in parallel: multiple CPUs and disks operate concurrently, and many operations, such as data loading and query processing, are parallelized.

Goals of Parallel Databases
The concept of the Parallel Database was built with the following goals:
- Improve performance: The performance of the system can be improved by connecting multiple CPUs and disks in parallel. Many small processors can also be connected in parallel.
- Improve availability of data: Data can be copied to multiple locations to improve its availability. For example, if a module contains a relation (a table in the database) that is unavailable, it is important to make it available from another module.
- Improve reliability: Reliability of the system is improved by the completeness, accuracy, and availability of data.
- Provide distributed access to data: Companies having branches in multiple cities can access data with the help of a parallel database system.

Parallel Database Architectures
Today nearly everyone is interested in storing the information they gather; even small organizations collect data and maintain large databases. Although these databases consume space, they are helpful in many ways, for example in taking decisions through a decision support system. Handling such voluminous data through a conventional centralized system is complex; even simple queries become time-consuming. The solution is to handle those databases through a Parallel Database System, in which a table or database is distributed among multiple processors, possibly equally, so that queries can be performed in parallel. A system that shares resources to handle massive data, in order to increase the performance of the whole system, is called a Parallel Database System.

We therefore need architectures that can handle data through data distribution and parallel query execution, and thereby produce good throughput of queries or transactions. Figures 1, 2, and 3 show the different architectures proposed and successfully implemented in the area of parallel database systems. In the figures, P represents a processor, M represents memory, and D represents a disk or disk setup.

1. Shared Memory Architecture
Figure 1 - Shared Memory Architecture
In the Shared Memory architecture, a single memory is shared among many processors, as shown in Figure 1. Several processors are connected through an interconnection network to the main memory and the disk setup. The interconnection network is usually a high-speed network (it may be a bus, mesh, or hypercube) which makes data sharing (transporting) easy among the components (processors, memory, and disks).
Advantages:
- Simple to implement.
- Establishes effective communication between processors through a single memory address space.
- The above leads to low communication overhead.
Disadvantages:
- A higher degree of parallelism (a larger number of concurrent operations on different processors) cannot be achieved, because all the processors share the same interconnection network to reach memory.
- The shared network causes a bottleneck in the interconnection network (interference), especially in the case of a bus interconnection network; adding a processor slows down the existing processors.
- Cache coherency must be maintained. That is, if any processor reads data used or modified by other processors, we need to ensure that it sees the latest version of the data.
- The degree of parallelism is limited; a larger number of parallel processes may degrade performance.

2. Shared Disk Architecture
Figure 2 - Shared Disk Architecture
In the Shared Disk architecture, a single disk or disk setup is shared among all the available processors, and every processor has its own private memory, as shown in Figure 2.
Advantages:
- Failure of any processor does not stop the entire system (fault tolerance).
- The interconnection to memory is not a bottleneck (it was a bottleneck in the Shared Memory architecture).
- Supports a larger number of processors (compared to the Shared Memory architecture).
Disadvantages:
- The interconnection to the disk is a bottleneck, since all processors share a common disk setup.
- Inter-processor communication is slow. Because all the processors have their own memory, communication between processors requires reading data from other processors' memories, which needs additional software support.
Example real-world Shared Disk implementation:
- DEC clusters (VMScluster) running Rdb

3. Shared Nothing Architecture
Figure 3 - Shared Nothing Architecture
In the Shared Nothing architecture, every processor has its own memory and disk setup. This setup may be considered a set of individual computers connected through a high-speed interconnection network, using regular network protocols and switches, for example, to share data between computers. (This architecture is also used in distributed database systems.) In a Shared Nothing parallel database implementation we insist on similar nodes, that is, homogeneous systems. (In a distributed database system we may use heterogeneous nodes.)
Advantages:
- The number of processors is scalable; that is, the design is flexible enough to add more computers.
- Unlike the other two architectures, only data requests that cannot be answered by the local processor need to be forwarded through the interconnection network.
Disadvantages:
- Non-local disk accesses are costly. That is, if one server receives a request and the required data is not available locally, the request must be routed to the server where the data is available, which is slightly complex.
- Communication cost is involved in transporting data among computers.
Example real-world Shared Nothing implementations:
- Teradata
- Tandem
- Oracle nCUBE

Data Partitioning Strategies in Parallel Database Systems
Partitioning the tables/databases is a very important step in parallelizing database activities. By partitioning the data evenly across the workloads of many different processors, we can achieve better performance (better parallelism) for the whole system. We have the following types of fragmentation (partitioning) techniques in parallel databases:

Horizontal Fragmentation - Partitioning a table using conditions, such as those specified through the WHERE clause of an SQL query, to distribute subsets of tuples (records). For example, consider the following schema:
STUDENT (Regno, SName, Address, Branch, Phone)
In this table, the Branch column stores the academic branch in which the student is admitted.
Suppose there are various branches such as 'BTech CIVIL', 'BTech MECH', 'BTech CSE', and so on; then the following query pattern would be used to partition the table horizontally:
SELECT * FROM student WHERE Branch = branch name;
In the result there is no change in the schema of the table; i.e., the structure of the table is unaltered. Only, the 'BTech CIVIL' students go into one partition, the 'BTech CSE' students into another partition, and so on.

Vertical Fragmentation - Partitioning a table using decomposition rules to break and distribute it into multiple partitions vertically (with different schemas) is called vertical fragmentation. For example, breaking STUDENT into two tables STUDENT(Regno, SName, Address, Branch) and STU_PHONE(Regno, Phone), i.e., into two different schemas on columns, is vertical partitioning.

Of the fragmentation techniques discussed above, partitioning a relation in a parallel database involves horizontal partitioning. Horizontal partitioning helps us distribute the data over several processors so that queries can be executed on the fragments simultaneously.

Partitioning Strategies:
Various partitioning strategies have been proposed to distribute the data evenly over multiple processors. Let us assume that our parallel database system has n processors P0, P1, P2, ..., Pn-1 and n disks D0, D1, D2, ..., Dn-1 over which we partition the data. The value of n is chosen according to the degree of parallelism required. The partitioning strategies are:

Round-Robin Partitioning
This is the simplest form of partitioning strategy. The records are distributed over the disks in round-robin fashion: the first record to the first disk, the second record to the second disk, and so on. If the number of available disks n is 10, then the first record goes to D1 (1 mod 10 = 1), the second record goes to D2 (2 mod 10 = 2), and so on; the 10th record goes to D0 (10 mod 10 = 0) and the 11th record to D1 (11 mod 10 = 1). This scheme distributes the data evenly over all the disks.

Hash Partitioning
This strategy identifies one or more attributes as partitioning attributes (which is not required in round-robin partitioning). We need a carefully chosen hash function (careful, to avoid data skew) that takes the identified partitioning attributes as input. For example, consider the following table:
EMPLOYEE(ENo, EName, DeptNo, Salary, Age)
If we choose DeptNo as the partitioning attribute and have 10 disks over which to distribute the data, then the following would be a suitable hash function:
h(DeptNo) = DeptNo mod 10
If we have 10 departments, then according to this hash function all the employees of department 1 go to disk 1, department 2 to disk 2, and so on. As another example, if we choose the EName of the employees as the partitioning attribute, we could use the following hash function:
h(EName) = (sum of the ASCII values of every character in the name) mod n,
where n is the number of disks/partitions needed.

Range Partitioning
In range partitioning we identify one or more attributes as partitioning attributes, and then choose a range partition vector to partition the table over n disks. The vector consists of boundary values of the partitioning attribute. For example, for the EMPLOYEE relation given above, if the partitioning attribute is Salary, then a possible vector is:
[5000, 15000, 30000]
where every value bounds an individual salary range. That is, 5000 bounds the first range (0 - 5000), 15000 the second range (5001 - 15000), and 30000 the third range (15001 - 30000); in addition there is the final range (30001 and above). Hence a vector with 3 values describes 4 disks/partitions.
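The three strategies amount to three small mapping functions from a record (or its partitioning attribute) to a partition number. The following Python sketch illustrates them using the examples above; the function names are ours, and the code is a simplified illustration rather than the internals of any real system.

    from bisect import bisect_left

    def round_robin(i, n):
        # the i-th record (counting from 1) goes to disk D(i mod n)
        return i % n

    def hash_dept(dept_no, n=10):
        # h(DeptNo) = DeptNo mod 10
        return dept_no % n

    def hash_ename(ename, n=10):
        # h(EName) = (sum of the ASCII values of the characters) mod n
        return sum(ord(c) for c in ename) % n

    def range_salary(salary, vector=(5000, 15000, 30000)):
        # a vector with 3 boundary values defines 4 partitions:
        # <= 5000 -> 0, 5001..15000 -> 1, 15001..30000 -> 2, > 30000 -> 3
        return bisect_left(vector, salary)

    print(round_robin(11, 10))   # 1: the 11th record goes to D1
    print(hash_dept(7))          # 7: department 7 goes to disk 7
    print(range_salary(15000))   # 1: falls in the range 5001..15000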
The partitioning strategy must be chosen carefully according to the workload of the parallel database system. The workload involves many factors, such as which attributes are frequently used as filtering conditions in queries, the number of records in a table, the size of the database, the approximate number of incoming queries per day, and so on.

Detailed Example:
Let us start with the following table Emp_table. This instance has 14 records, and every record stores the name of an employee, his/her work grade, and the department name. Assume that we have 3 processors, namely P0, P1, P2, and 3 disks associated with those processors, namely D0, D1, D2.

Emp_table
ENAME   GRADE  DNAME
SMITH   1      RESEARCH
BLAKE   4      SALES
FORD    4      RESEARCH
KING    5      ACCOUNTING
SCOTT   4      RESEARCH
MILLER  2      ACCOUNTING
TURNER  3      SALES
WARD    2      SALES
MARTIN  2      SALES
ADAMS   1      RESEARCH
JONES   4      RESEARCH
JAMES   1      SALES
CLARK   4      ACCOUNTING
ALLEN   3      SALES
Table 1 - Emp_table

A. Round-Robin Partitioning:
In this strategy we distribute the records in round-robin manner using the function i mod n, where i is the position of the record in the table and n is the number of partitions/disks, in our case 3. On applying this technique, the first record goes to D1, the second record to D2, the third record to D0, the fourth record to D1, and so on. After distributing the records we get the following partitions:

Emp_table_Partition0
ENAME   GRADE  DNAME
FORD    4      RESEARCH
MILLER  2      ACCOUNTING
MARTIN  2      SALES
JAMES   1      SALES
Table 2 - Records 3, 6, 9, 12 (i mod 3 = 0)

Emp_table_Partition1
ENAME   GRADE  DNAME
SMITH   1      RESEARCH
KING    5      ACCOUNTING
TURNER  3      SALES
ADAMS   1      RESEARCH
CLARK   4      ACCOUNTING
Table 3 - Records 1, 4, 7, 10, 13 (i mod 3 = 1)

Emp_table_Partition2
ENAME   GRADE  DNAME
BLAKE   4      SALES
SCOTT   4      RESEARCH
WARD    2      SALES
JONES   4      RESEARCH
ALLEN   3      SALES
Table 4 - Records 2, 5, 8, 11, 14 (i mod 3 = 2)

B. Hash Partitioning:
Let us take the GRADE attribute of Emp_table to explain hash partitioning, and choose the hash function
h(GRADE) = GRADE mod n
where GRADE is the value of the GRADE attribute of a record and n is the number of partitions, which is 3 in our case. Applying this hash function on GRADE gives the following partitions of Emp_table. For example, the GRADE of 'SMITH' is 1, and the hash function sends it to partition 1 (1 mod 3 = 1). The GRADE of 'BLAKE' is 4, so (4 mod 3) also directs it to partition 1. The GRADE of 'KING' is 5, which directs it to partition 2 (5 mod 3 = 2).

Emp_table_Partition0
ENAME   GRADE  DNAME
TURNER  3      SALES
ALLEN   3      SALES
Table 5 - GRADE 3 (3 mod 3 = 0)

Emp_table_Partition1
ENAME   GRADE  DNAME
SMITH   1      RESEARCH
BLAKE   4      SALES
FORD    4      RESEARCH
SCOTT   4      RESEARCH
ADAMS   1      RESEARCH
JONES   4      RESEARCH
JAMES   1      SALES
CLARK   4      ACCOUNTING
Table 6 - GRADEs 1 and 4 (mod 3 = 1)

Emp_table_Partition2
ENAME   GRADE  DNAME
KING    5      ACCOUNTING
MILLER  2      ACCOUNTING
WARD    2      SALES
MARTIN  2      SALES
Table 7 - GRADEs 2 and 5 (mod 3 = 2)

C. Range Partitioning:
Let us consider the GRADE attribute of Emp_table for range partitioning. To apply a range partition, we first need to identify a partitioning vector [v0, v1, ..., vn-2]. Let us choose the following range partitioning vector for our case:
[2, 4]
According to this vector, records having a GRADE value of 2 or less go into partition 0, records with GRADE greater than 2 and up to 4 go into partition 1, and all other records (GRADE greater than 4) go into partition 2, as depicted in the following tables.

Emp_table_Partition0
ENAME   GRADE  DNAME
SMITH   1      RESEARCH
MILLER  2      ACCOUNTING
WARD    2      SALES
MARTIN  2      SALES
ADAMS   1      RESEARCH
JAMES   1      SALES
Table 8 - GRADE values 1 and 2

Emp_table_Partition1
ENAME   GRADE  DNAME
BLAKE   4      SALES
FORD    4      RESEARCH
SCOTT   4      RESEARCH
TURNER  3      SALES
JONES   4      RESEARCH
CLARK   4      ACCOUNTING
ALLEN   3      SALES
Table 9 - GRADE values 3 and 4

Emp_table_Partition2
ENAME   GRADE  DNAME
KING    5      ACCOUNTING
Table 10 - GRADE value 5 and above
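As a quick check, the hash and range partitions of this example can be reproduced with a few lines of Python; the record list below is just the (ENAME, GRADE) pairs of Emp_table.

    emp = [("SMITH", 1), ("BLAKE", 4), ("FORD", 4), ("KING", 5), ("SCOTT", 4),
           ("MILLER", 2), ("TURNER", 3), ("WARD", 2), ("MARTIN", 2), ("ADAMS", 1),
           ("JONES", 4), ("JAMES", 1), ("CLARK", 4), ("ALLEN", 3)]

    hash_parts = {0: [], 1: [], 2: []}
    range_parts = {0: [], 1: [], 2: []}
    for name, grade in emp:
        hash_parts[grade % 3].append(name)      # h(GRADE) = GRADE mod 3
        # range vector [2, 4]: <= 2 -> 0, 3..4 -> 1, > 4 -> 2
        range_parts[0 if grade <= 2 else 1 if grade <= 4 else 2].append(name)

    print(hash_parts[0])    # ['TURNER', 'ALLEN'], as in Table 5
    print(range_parts[2])   # ['KING'], as in Table 10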
Interquery and Intraquery Parallelism in Parallel Databases

Inter-query Parallelism
This is a form of parallelism in which many different queries or transactions are executed in parallel with one another on many processors.
Advantages:
- It increases transaction throughput; that is, the number of transactions executed in a given time can be increased.
- It scales up the transaction processing system, and is hence best suited for On-Line Transaction Processing (OLTP) systems.
Supported parallel database architectures:
- It is easy to implement in a Shared Memory parallel system. Lock tables and log information are maintained in the same memory, so it is easy to handle transactions that share locks with other transactions, and locking and logging can be done efficiently.
- In the other parallel architectures, Shared Disk and Shared Nothing, locking and logging must be done through message passing between processors, which is a costly operation compared to the Shared Memory parallel architecture. In addition, the cache coherency problem arises.
Example database systems that support inter-query parallelism: Oracle 8 and Oracle Rdb.
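Inter-query parallelism needs no cooperation between the queries themselves; the system simply runs independent statements concurrently. The sketch below imitates this with a Python thread pool; run_query is a hypothetical stand-in for submitting one statement to a database engine, not a real DBMS API.

    import time
    from concurrent.futures import ThreadPoolExecutor

    def run_query(sql):
        # hypothetical stand-in for shipping one statement to the DBMS
        time.sleep(0.1)               # pretend the query takes some time
        return "result of " + sql

    queries = ["SELECT ...", "UPDATE ...", "SELECT ..."]
    with ThreadPoolExecutor(max_workers=3) as pool:
        # three independent transactions execute at the same time,
        # raising throughput without speeding up any single query
        for result in pool.map(run_query, queries):
            print(result)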
Intra-Query Parallelism
This is the form of parallelism in which a single query is executed in parallel on many processors.
Advantages:
- It speeds up single complex, long-running queries.
- It is best suited for complex scientific calculations (queries).
Supported parallel database architectures:
- Shared Memory, Shared Disk, and Shared Nothing parallel architectures are all supported. We need not worry about locking and logging, because it involves parallelizing a single query.
Types:
- Intra-operation parallelism - speeding up a query by parallelizing the execution of individual operations. The operations that can be parallelized include Sort, Join, Projection, Selection, and so on.
- Inter-operation parallelism - speeding up a query by parallelizing the various operations that are part of the query. For example, a query that joins 4 tables can be executed in parallel on two processors in such a way that each processor joins two relations locally, and the two results can then be joined further to produce the final result.
Example database systems that support intra-query parallelism: Informix, Teradata.

Parallel Database - Intraquery Parallelism
Intraquery Parallelism
Intraquery parallelism is about executing a single query in parallel using multiple processors or disks. This can be done by dividing the data into many smaller units and executing the query on those smaller tables. Many queries are complex and consume considerable time and resources. For example, consider the query given below:
SELECT * FROM Email ORDER BY Start_Date;
This query sorts the records of the Email table in ascending order on the attribute Start_Date. Assume that the Email table has 10000 records. To sort the table on a single processor we need on the order of 10000 x log(10000) comparisons of Start_Date values. If the same query is executed on several processors simultaneously, we can finish the sorting in less time. For example, if the 10000 records are distributed equally over 10 processors, 1000 each, then every processor can sort its 1000 records in much less time, and the sorted runs can later be combined/merged using other algorithms to produce the final sorted result.

In intraquery parallelism we have two types:
- Intraoperation Parallelism
- Interoperation Parallelism

Parallel Database - Intraoperation Parallelism
Intra-operation Parallelism
Intra-operation parallelism is about parallelizing a single relational operation given in a query.
SELECT * FROM Email ORDER BY Start_Date;
In the above query, the relational operation is sorting. Since a table may have a large number of records, the operation can be performed on different subsets of the table on multiple processors, which reduces the time required to sort. Consider another query:
SELECT * FROM Student, CourseRegd WHERE Student.Regno = CourseRegd.Regno;
In this query, the relational operation is a Join. The query joins the two tables Student and CourseRegd on the common attribute Regno. Parallelism helps here when the size of the tables is very large. Usually, the order of tuples does not matter in a DBMS; hence, with tables stored in random order, every record of one table must be matched against every record of the other table to complete the join. For example, if Student has 10000 records and CourseRegd has 60000 records, then the join requires 10000 x 60000 comparisons. If we exploit parallelism here, we can achieve much better performance.

There are many such relational operations which can be executed in parallel using many processors, working on subsets of the table or tables mentioned in the query. The following list includes the relational operations and the various techniques used to implement them in parallel:
- Parallel Sort
  - Range-Partitioning Sort
  - Parallel External Sort-Merge
- Parallel Join
  - Partitioned Join
  - Fragment and Replicate Join

Parallel Sort - Range-Partitioning Sort
Range-Partitioning Sort
Assumptions:
- Assume n processors P0, P1, ..., Pn-1 and n disks D0, D1, ..., Dn-1, where disk Di is associated with processor Pi.
- Relation R is partitioned into R0, R1, ..., Rn-1 using the round-robin technique, the hash partitioning technique, or the range partitioning technique (if range partitioned, on some attribute other than the sorting attribute).
Objective: to sort, in parallel, a relation (table) R that resides on n disks, on an attribute A.
Steps:
Step 1: Partition the fragments Ri on the sorting attribute A at every processor using a range vector v, and send the records that fall in the ith range to processor Pi, where they are temporarily stored on Di.
Step 2: Sort each partition locally at its processor Pi, in parallel. Combining the sorted partitions afterwards is a trivial process, because the ranges are already in order.
Parallel Database - Parallel External Sort-Merge Technique
Parallel External Sort-Merge
Assumptions:
- Assume n processors P0, P1, ..., Pn-1 and n disks D0, D1, ..., Dn-1, where disk Di is associated with processor Pi.
- Relation R is partitioned into R0, R1, ..., Rn-1 using the round-robin, hash, or range partitioning technique (partitioned on any attribute).
Objective: to sort, in parallel, a relation (table) R that resides on n disks, on an attribute A.
Steps:
Step 1: Sort each relation fragment Ri, stored on disk Di, on the sorting attribute of the query.
Step 2: Identify a range partition vector v and range-partition every sorted Ri across the processors P0, P1, ..., Pn-1 using v.
Step 3: Each processor Pi performs a merge on the incoming range-partitioned data from all the other processors. (The data are transferred in order; that is, all processors send their first range to P0, their second range to P1, and so on.)
Step 4: Finally, concatenate the sorted data from the different processors to get the final result.
Point to note: the range partition must be done using a good range-partitioning vector; otherwise skew might become a problem.

Example:
Let us explain the above process with a simple example. Consider the following relation schema Employee:
Employee (Emp_ID, EName, Salary)
Assume that the relation Employee is permanently partitioned using the round-robin technique over 3 disks D0, D1, and D2, which are associated with processors P0, P1, and P2. At processors P0, P1, and P2 the fragments are named Employee0, Employee1, and Employee2 respectively. This initial state is given in Figure 1.
Figure 1 - Partitioned Employee
Assume that the following sorting query is initiated:
SELECT * FROM Employee ORDER BY Salary;
As already said, the table Employee is not partitioned on the sorting attribute Salary. The Parallel External Sort-Merge technique then works as follows.
Step 1: Sort the data stored in every partition (every disk) on the ordering attribute Salary. (The sorting of data in every partition is done temporarily.) At this stage every Employeei contains Salary values ranging from its minimum to its maximum. The partitions sorted in ascending order are shown in Figure 2.
Figure 2 - Partitioned Employee in ascending order on the Salary attribute
Step 2: Identify a range vector v on the Salary attribute, of the form v[v0, v1, ..., vn-2]. For our example, let us assume the following range vector:
v[14000, 24000]
This range vector represents 3 ranges: range 0 (14000 and less), range 1 (14001 to 24000), and range 2 (24001 and more). Redistribute every partition (Employee0, Employee1, and Employee2) according to these ranges over the 3 disks temporarily. The status of Temp_Employee 0, 1, and 2 after distributing Employee0 is given in Figure 3.
Figure 3 - Distribution of Employee0 according to the range vector
Step 3: In fact, this redistribution is executed at all processors in parallel, such that processors P0, P1, and P2 send the first range of Employee 0, 1, and 2 to disk 0. Upon receiving the records from the various partitions, the receiving processor P0 merges the sorted data. This is shown in Figure 4.
Figure 4 - Merging data of range <= 14000 at D0 from all the disks
The same process is carried out at all processors for the other ranges. The final versions of Temp_Employee 0, 1, and 2 are shown in Figure 5.
Figure 5 - Status of Temp_Employee 0, 1, and 2 after distributing all ranges from all disks
Step 4: The final concatenation of the sorted data from all the disks is trivial.
Again, the range partition must be done using a good range-partitioning vector; otherwise skew might become a problem.
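The following single-process Python sketch mimics the four steps on toy Salary data: three round-robin partitions and the range vector [14000, 24000] from the example. Lists stand in for disks and processors; the data values are made up.

    import heapq

    VECTOR = (14000, 24000)           # range vector from the example
    def dest(salary):
        # range 0: <= 14000, range 1: 14001..24000, range 2: > 24000
        return 0 if salary <= VECTOR[0] else 1 if salary <= VECTOR[1] else 2

    # three round-robin partitions of Salary values (made-up data)
    partitions = [[21000, 9000, 30000], [15000, 8000], [26000, 12000, 22000]]

    # Step 1: sort every partition locally
    runs = [sorted(p) for p in partitions]

    # Steps 2-3: processor Pi receives, from every sorted run, the sub-run
    # falling in its range, and merges these already-sorted streams
    merged = [list(heapq.merge(*([x for x in run if dest(x) == i] for run in runs)))
              for i in range(3)]

    # Step 4: concatenating the per-range results gives the final sorted relation
    print(merged[0] + merged[1] + merged[2])
    # [8000, 9000, 12000, 15000, 21000, 22000, 26000, 30000]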
Example (Range-Partitioning Sort):
The Range-Partitioning Sort technique introduced earlier can be illustrated with the same setup. Consider again the relation schema Employee:
Employee (Emp_ID, EName, Salary)
Assume that the relation Employee is permanently partitioned using the round-robin technique over 3 disks D0, D1, and D2, which are associated with processors P0, P1, and P2. At processors P0, P1, and P2 the fragments are named Employee0, Employee1, and Employee2 respectively. This initial state is given in Figure 1.
Figure 1
Assume that the following sorting query is initiated:
SELECT * FROM Employee ORDER BY Salary;
As already said, the table Employee is not partitioned on the sorting attribute Salary. The Range-Partitioning Sort technique then works as follows.
Step 1: First identify a range vector v on the Salary attribute, of the form v[v0, v1, ..., vn-2]. For our example, let us assume the following range vector:
v[14000, 24000]
This range vector represents 3 ranges: range 0 (14000 and less), range 1 (14001 to 24000), and range 2 (24001 and more). Redistribute the relations Employee0, Employee1, and Employee2 according to these ranges over the 3 disks temporarily. After this distribution, disk 0 holds the range 0 records (i.e., records with Salary value less than or equal to 14000), disk 1 holds the range 1 records (Salary greater than 14000 and up to 24000), and disk 2 holds the range 2 records (Salary greater than 24000). This redistribution according to the range vector v is represented in Figure 2 as links from all the relations to all the disks. Temp_Employee0, Temp_Employee1, and Temp_Employee2 are the relations after successful redistribution; they are stored temporarily on disks D0, D1, and D2. (They can also be kept in the main memories M0, M1, M2 if they fit into RAM.)
Figure 2
Step 2: Now we have temporary relations at all the disks after the redistribution. At this point, all the processors sort the data assigned to them, individually, in ascending order of Salary. Performing the same operation in parallel on different sets of data is called Data Parallelism.
Final result: after the processors complete their sorting, we simply collect the data from the different processors and concatenate them. This step is straightforward, as the data are already sorted within every range; collecting the sorted records of partition 0, partition 1, and partition 2 and concatenating them gives the final sorted output.
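In code, the contrast with the external sort-merge sketch above is simply the order of the two phases: here the data is range-partitioned first and each range is then sorted locally, so the final step is plain concatenation with no merging. A minimal sketch on the same toy data:

    VECTOR = (14000, 24000)
    partitions = [[21000, 9000, 30000], [15000, 8000], [26000, 12000, 22000]]

    # Step 1: redistribute every record to the disk owning its range
    temp = [[], [], []]               # Temp_Employee 0, 1, 2
    for p in partitions:
        for salary in p:
            i = 0 if salary <= VECTOR[0] else 1 if salary <= VECTOR[1] else 2
            temp[i].append(salary)

    # Step 2: every processor sorts its own range (data parallelism)
    sorted_ranges = [sorted(t) for t in temp]

    print(sorted_ranges[0] + sorted_ranges[1] + sorted_ranges[2])
    # [8000, 9000, 12000, 15000, 21000, 22000, 26000, 30000]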
Parallel Join Technique - Partitioned Join
Partitioned Join
The relational tables to be joined are partitioned on the joining attributes of both tables, using the same partitioning function, so that the join operation can be performed in parallel.
How does Partitioned Join work?
Assume that relational tables r and s need to be joined on attributes r.A and s.B. The system partitions both r and s into n partitions using the same partitioning technique, with the joining attributes A and B used as the partitioning attributes of r and s respectively: r is partitioned into r0, r1, r2, ..., rn-1 and s is partitioned into s0, s1, s2, ..., sn-1. The system then sends partitions ri and si to processor Pi, where the join is performed locally.
What types of join can we perform in parallel using Partitioned Join?
Only joins such as equi-joins and natural joins can be performed using the partitioned join technique.
Why only equi-join and natural join?
An equi-join or natural join matches two tables on an equality condition such as r.A = s.B. The tuples satisfying this condition, i.e., having the same value for both A and B, are joined together; the others are discarded. Hence, if we partition the relations r and s by applying the same partitioning technique to both A and B, then tuples having the same value for A and B end up in the same partition. Let us analyze this with a simple example.

STUDENT
RegNo  SName    Gen  Phone
1      Sundar   M    9898786756
3      Karthik  M    8798987867
4      John     M    7898886756
2      Ram      M    9897786776
Table 1 - STUDENT

COURSES_REGD
RegNo  Courses
4      Database
2      Database
3      Data Structures
1      Multimedia
Table 2 - COURSES_REGD

Let us assume the following:
- The RegNo attributes of the tables STUDENT and COURSES_REGD are used for joining.
- Observe the order of tuples in both tables: they are in no particular order, stored randomly with respect to RegNo.
- We partition the tables on the RegNo attribute using hash partitioning.
- We have 2 disks and need to partition the tables into two (possibly equal) partitions; hence n is 2.
- The hash function is h(RegNo) = RegNo mod n = RegNo mod 2.
Applying the hash function, the tables STUDENT and COURSES_REGD are partitioned over Disk0 and Disk1 as stated below.

Partition 0:
STUDENT_0
RegNo  SName  Gen  Phone
4      John   M    7898886756
2      Ram    M    9897786776
COURSES_REGD_0
RegNo  Courses
4      Database
2      Database

Partition 1:
STUDENT_1
RegNo  SName    Gen  Phone
1      Sundar   M    9898786756
3      Karthik  M    8798987867
COURSES_REGD_1
RegNo  Courses
3      Data Structures
1      Multimedia

From the tables above it is very clear that equal RegNo values of both STUDENT and COURSES_REGD are sent to the same partition, so the join can now be performed locally at every processor in parallel. One more interesting fact about this join is that only 4 comparisons (2 STUDENT records x 2 COURSES_REGD records) need to be done in each partition in our example. Hence we need a total of 8 comparisons in the partitioned join, against 16 (4 x 4) in the conventional join. The process discussed above is shown in Figure 1.
Figure 1 - Partitioned Join
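A small Python sketch of the same example: both tables are hash-partitioned on RegNo with h(RegNo) = RegNo mod 2, and each partition pair is then joined locally; the nested loops stand in for the local joins at P0 and P1.

    student = [(1, "Sundar"), (3, "Karthik"), (4, "John"), (2, "Ram")]
    courses = [(4, "Database"), (2, "Database"),
               (3, "Data Structures"), (1, "Multimedia")]

    n = 2
    def hash_partition(rows):
        parts = [[] for _ in range(n)]
        for row in rows:
            parts[row[0] % n].append(row)     # h(RegNo) = RegNo mod 2
        return parts

    s_parts = hash_partition(student)
    c_parts = hash_partition(courses)
    for i in range(n):                        # the join runs locally at each Pi
        for regno, sname in s_parts[i]:
            for regno2, course in c_parts[i]:
                if regno == regno2:           # only 4 comparisons per partition
                    print(regno, sname, course)

Exactly as argued above, each partition performs 4 comparisons, 8 in total, instead of the 16 a single-site nested-loop join would need.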
Fragment and Replicate Join - a parallel joining technique
Fragment and Replicate Join
Introduction
We know the different types of joins in terms of the nature of the join condition and the operators used: equi-join and non-equi join. An equi-join is performed by checking an equality condition between the joining attributes of the tables, while a non-equi join is performed by checking an inequality condition between them. An equi-join is of the form:
SELECT columns FROM list_of_tables WHERE table1.column = table2.column;
whereas a non-equi join is of the form:
SELECT columns FROM list_of_tables WHERE table1.column < table2.column;
(or any of the other operators >, <>, <=, >=, etc. in place of < in the above example).
We discussed the Partitioned Join above, where the relational tables to be joined are partitioned into equal partitions and the join is performed locally on each pair of partitions at every processor. Partitioning the relations on the joining attribute and joining them works only for joins that involve equality conditions; clearly, joining the tables by partitioning works only for equi-joins or natural joins. For inequality joins, partitioning does not work. Consider a join condition as given below:
r ⋈ (r.a > s.b) s
In this non-equi join, the tuples (records) of r must be joined with the records of s wherever the value of attribute r.a is greater than s.b. In other words, all the records of r join with some of the records of s and vice versa; that is, all the records of one relation must be compared against some of the records of the other relation. For a clear example, see the separate material on non-equi joins.

What do fragment and replicate mean?
Fragmenting means partitioning a table either horizontally or vertically (horizontal or vertical fragmentation). Replicating means duplicating a relation, i.e., generating identical copies of a table. This join is performed by fragmenting and replicating the tables to be joined.

Asymmetric Fragment and Replicate Join (how does it work?)
It is a variant of the Fragment and Replicate join. It works as follows:
1. The system fragments table r into n fragments r0, r1, r2, ..., rn-1, where r is one of the tables to be joined and n is the number of processors. Any partitioning technique (round-robin, hash, or range partitioning) can be used to partition the relation.
2. The system replicates the other table, say s, to all n processors; that is, it generates n copies of s and sends one to each processor.
3. Now we have r0 and s at processor P0, r1 and s at processor P1, ..., rn-1 and s at processor Pn-1, and each processor Pi performs the join locally on ri and s.
Figure 1 below shows the process of the Asymmetric Fragment-and-Replicate join (it may not be the most appropriate example, but it clearly shows the process).
Figure 1 - Process of the Asymmetric Fragment-and-Replicate parallel join
Points to note:
1. Non-equi joins can be performed in parallel.
2. If one of the relations to be joined is already partitioned over n processors, this technique is well suited, because we only need to replicate the other relation.
3. Unlike the Partitioned Join, any partitioning technique can be used.
4. The technique performs best when one of the relations to be joined is very small.

Fragment and Replicate Join (how does the general case work?)
This is the general case of the Asymmetric Fragment-and-Replicate join technique. The asymmetric technique is best suited when one of the relations to be joined is small enough to fit into memory. If both relations to be joined are large and the join is a non-equi join, then we need the general Fragment-and-Replicate join. It works as follows:
1. The system fragments table r into m fragments r0, r1, r2, ..., rm-1, and s into n fragments s0, s1, s2, ..., sn-1. Any partitioning technique (round-robin, hash, or range partitioning) can be used to partition the relations.
2. The values of m and n are chosen based on the availability of processors; we need at least m*n processors to perform the join.
3. Now we distribute all the partitions of r and s over the available processors, remembering that every tuple of one relation must be compared with every tuple of the other relation: the records of partition r0 must be compared with all partitions of s, the records of partition s0 must be compared with all partitions of r, and likewise for all the other partitions of r and s. The data distribution is done as follows:
a. Since we need m*n processors, assume we have processors P0,0, P0,1, ..., P0,n-1, P1,0, P1,1, ..., Pm-1,n-1. Processor Pi,j performs the join of ri with sj.
b. To ensure the comparison of every partition of r with every partition of s, we replicate ri to the processors Pi,0, Pi,1, Pi,2, ..., Pi,n-1, which hold the n partitions of s. This replication ensures the comparison of every ri with the complete s.
c. Likewise, to ensure the comparison of every partition of s with every partition of r, we replicate sj to the processors P0,j, P1,j, P2,j, ..., Pm-1,j, which hold the m partitions of r. This replication ensures the comparison of every sj with the complete r.
4. Each Pi,j computes the join locally, and together these local joins produce the join result.
Figure 2 below shows the process of the general case of the Fragment-and-Replicate join (it may not be the most appropriate example, but it clearly shows the process).
Figure 2 - Process of the general case Fragment-and-Replicate join
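Both variants can be sketched with plain loops; the outer loops play the role of the processors. The data below is made up, and the join condition is the non-equi condition r.a > s.b.

    r = [5, 9, 2, 7]                          # values of r.a
    s = [3, 8, 1]                             # values of s.b

    # Asymmetric variant: fragment r over n processors, replicate s to each
    n = 2
    r_frags = [r[i::n] for i in range(n)]     # round-robin fragments of r
    for i in range(n):                        # processor Pi joins ri with all of s
        for a in r_frags[i]:
            for b in s:                       # s is replicated at Pi
                if a > b:
                    print("asymmetric:", a, b)

    # General variant: fragment r into m and s into n pieces; processor P(i,j)
    # holds ri and sj, so every ri meets every sj and no pair is missed
    m, n = 2, 2
    r_frags = [r[i::m] for i in range(m)]
    s_frags = [s[j::n] for j in range(n)]
    for i in range(m):
        for j in range(n):
            for a in r_frags[i]:              # the local join at P(i,j)
                for b in s_frags[j]:
                    if a > b:
                        print("general:", a, b)

Both variants print the same set of joined pairs; the general case merely spreads the comparisons over m*n local joins instead of n.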
Partitioned Parallel Hash Join
The sequential hash join technique (discussed in the separate material on the hash join technique in a DBMS) can be parallelized; reviewing that material first helps in understanding the parallel hash join.
The Technique
Assume that we have:
- n processors P0, P1, ..., Pn-1;
- two tables r and s, already partitioned over the disks of the n processors;
- s, the smaller of the two tables.

Interoperation Parallelism
Interoperation parallelism is about executing the different operations of a query in parallel. A single query may involve multiple operations at once, and we may exploit parallelism to achieve better performance for such queries. Consider the example query given below:
SELECT AVG(Salary) FROM Employee GROUP BY Dept_Id;
It involves two operations: a grouping and an aggregation. To execute this query, we first need to group all the employee records on the attribute Dept_Id; then, for every group, we apply the AVG aggregate function to get the final result. We can use interoperation parallelism to parallelize these two operations. [Note: intra-operation parallelism, by contrast, executes a single operation of a query using multiple processors in parallel.]
The following are the variants through which we achieve interoperation parallelism:
1. Pipelined Parallelism
2. Independent Parallelism

1. Pipelined Parallelism
In pipelined parallelism, the idea is that the result produced by one operation is consumed directly by the next operation in the pipeline. For example, consider the following operation:
r1 ⋈ r2 ⋈ r3 ⋈ r4
This expression is a natural join of four tables. It can be pipelined as follows: perform temp1 ← r1 ⋈ r2 at processor P1 and send the result temp1 to processor P2, which performs temp2 ← temp1 ⋈ r3 and sends the result temp2 to processor P3, which performs result ← temp2 ⋈ r4. The advantage is that we do not need to store the intermediate results; the result produced at one processor is consumed directly by the next, so P3 starts receiving tuples well before P1 completes the join assigned to it.
Disadvantages:
1. Pipelined parallelism is not a good choice if the degree of parallelism is high.
2. It is useful only with a small number of processors.
3. Not all operations can be pipelined. For example, consider the GROUP BY query in the first section above: at least one department's employees must be completely grouped before any output can be given to the aggregate operation at the next processor.
4. We cannot expect a full speedup.

2. Independent Parallelism
Operations that do not depend on each other can be executed in parallel at different processors. This is called independent parallelism. For example, in the expression r1 ⋈ r2 ⋈ r3 ⋈ r4, the portion r1 ⋈ r2 can be computed at one processor while r3 ⋈ r4 is computed at another; both results can then be pipelined into a third processor to produce the final result.
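The two forms can be contrasted in a short Python sketch. Generators imitate the pipeline (each tuple flows to the next join as soon as it is produced), and a thread pool imitates the independent joins; join here is a toy equi-join on the first field of each tuple, not a DBMS operator.

    from concurrent.futures import ThreadPoolExecutor

    def join(r, s):
        # toy equi-join on the first field, concatenating the remaining fields
        return [row + srow[1:] for row in r for srow in s if row[0] == srow[0]]

    def join_stream(r_stream, s):
        # pipelined version: consumes r tuple by tuple and yields results early
        for row in r_stream:
            for srow in s:
                if row[0] == srow[0]:
                    yield row + srow[1:]

    r1 = [(1, "a"), (2, "b")]; r2 = [(1, "c"), (2, "d")]
    r3 = [(1, "e")];           r4 = [(1, "f")]

    # Pipelined parallelism: temp1 = r1 JOIN r2 feeds the next join as a stream
    temp1 = join_stream(iter(r1), r2)
    temp2 = join_stream(temp1, r3)
    print(list(join_stream(temp2, r4)))       # [(1, 'a', 'c', 'e', 'f')]

    # Independent parallelism: r1 JOIN r2 and r3 JOIN r4 do not depend on each
    # other, so two workers compute them simultaneously; their results are
    # joined last
    with ThreadPoolExecutor(max_workers=2) as pool:
        f1 = pool.submit(join, r1, r2)
        f2 = pool.submit(join, r3, r4)
    print(join(f1.result(), f2.result()))     # [(1, 'a', 'c', 'e', 'f')]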
Disadvantage of Independent Parallelism:
- It does not work well in the case of a high degree of parallelism.

Query Optimization
Query optimizers account in large measure for the success of relational technology. Recall that a query optimizer takes a query and finds the cheapest execution plan among the many possible execution plans that give the same answer. Query optimizers for parallel query evaluation are more complicated than query optimizers for sequential query evaluation. First, the cost models are more complicated, since partitioning costs have to be accounted for, and issues such as skew and resource contention must be taken into account. More important is the issue of how to parallelize a query. Suppose that we have somehow chosen an expression (from among those equivalent to the query) to be used for evaluating the query. The expression can be represented by an operator tree. To evaluate an operator tree in a parallel system, we must decide:
- how to parallelize each operation, and how many processors to use for it;
- which operations to pipeline across different processors, which operations to execute independently in parallel, and which operations to execute sequentially, one after the other.
These decisions constitute the task of scheduling the execution tree. Determining the resources of each kind (such as processors, disks, and memory) that should be allocated to each operation in the tree is another aspect of the optimization problem.

Design of Parallel Systems
Parallel loading of data from external sources is an important requirement if we are to handle large volumes of incoming data. A large parallel database system must also address these availability issues:
- Resilience to failure of some processors or disks.
- Online reorganization of data and schema changes.
When we are dealing with large volumes of data (ranging into the terabytes), simple operations such as creating indices, and changes to the schema such as adding a column to a relation, can take a long time, perhaps hours or even days. It is therefore unacceptable for the database system to be unavailable while such operations are in progress. Most database systems allow such operations to be performed online, that is, while the system is executing other transactions.

Parallelism on Multicore Processors
Due to current trends in computer architecture, parallelism has become commonplace on most computers today, even some of the smallest. As a result, virtually all database systems today run on a parallel platform.

Parallelism versus Raw Speed
Since the dawn of computers, processor speed has increased at an exponential rate, doubling every 18 to 24 months. This increase results from an exponential growth in the number of transistors that can be fit within a unit area of a silicon chip, and is known popularly as Moore's law, named after Intel co-founder Gordon Moore. Technically, Moore's law is not a law, but rather an observation of, and a prediction about, technology trends. In recent years, however, raw processor speed has stopped growing at that pace, chiefly because of power and heat constraints, and the growing transistor budget is instead used to put several processors on a single chip. As a result, modern processors typically are not one single processor but consist of several processors on one chip. To maintain a distinction between on-chip multiprocessors and traditional processors, the term core is used for an on-chip processor; thus we say that a machine has a multicore processor.

Cache Memory and Multithreading
Each core is capable of processing an independent stream of machine instructions.
However, because processors are able to process data faster than it can be fetched from main memory, main memory can become a bottleneck that limits overall performance. For this reason, computer designers include one or more levels of cache memory in a computer system. Cache memory is more costly than main memory on a per-byte basis, but offers a faster access time. In multilevel cache designs, the levels are called L1, L2, and so on, with L1 being the fastest cache (and thus the most costly per byte, and therefore the smallest), L2 the next fastest, and so on. The result is an extension of the storage hierarchy to include the various levels of cache below main memory.

Adapting Database System Design for Modern Architectures
It would appear that database systems are an ideal application for taking advantage of large numbers of cores and threads, since database systems already support large numbers of concurrent transactions. At the same time, some components of a database system are shared by all transactions, which limits how independently the cores can proceed. The adaptation of database system design and database query processing to multicore and multithreaded systems remains an area of active research.