A Novel Way of Determining the Optimal Location of a Fragment in a DDBS: BGBR Ashkan Bayati Database Research Group Pedram Ghodsnia Database Research Group Reza Basseda Database Research Group Faculty of Electrical and Computer Eng. Faculty of Electrical and Computer Eng. Faculty of Electrical and Computer Eng. School of Engineering School of Engineering School of Engineering University of Tehran abayati1@acm.ut.org University of Tehran wwwpedram@yahoo.com University of Tehran R.Basseda@ece.ut.ac.ir Maseud Rahgozar Database Research Group Control and Intelligent Processing Center of Excellence Faculty of Electrical and Computer Eng. School of Engineering University of Tehran Rahgozar@ut.ac.ir ABSTRACT This paper addresses the problem of determining the optimal location to place a fragment in a distributed non-replicated database. The algorithm defined takes into consideration a changing environment with changing access patterns. This paper contributes by allocating data fragments to their optimal location, in a distributed network, based on the access patterns for that fragment. The mechanism for achieving this optimality relies on knowing the costs involved in moving data fragments from one site to the other. Embedded into the algorithm is a mechanism to prioritize fragments so that if there is a limited space on a specific node where many fragments are chosen to be allocated the ones with higher priority are placed before the lower priority fragments. Keywords Distributed Database Systems, Dynamic Fragment Allocation. 1. INTRODUCTION Due to the advancements in current networking technology, distributed database systems (DDBS) have emerged. In a distributed system data is stored across several sites, and each site is typically managed by a database management system (DBMS). The individual database systems run independently but all sites cooperate to provide a coherent view of the overall system [1]. Since relations are stored across several sites, accessing a relation in a remote site incurs message passing costs and, to reduce this overhead, a single relation may be partitioned or fragmented across several sites. Determining the location of a fragment is normally performed by placing the fragment in the location it is mostly accessed. The fragment allocation problem is very similar to the file allocation problem which has been studied using both replicated and non-replicated models [7, 15, 16]. The algorithms and policies adopted in these papers is based on a static environment and known access patterns. However, with the growth of internet and networking technologies our environment and access patterns are continuously changing; hence, a dynamic algorithm is more appropriate. [6, 9, 17] provide great insight into dynamic algorithms. Data fragmentation can either be in vertical, horizontal or mixed [2, 3]. In vertical fragmentation, each fragment consists of a subset of columns of the original relation. In horizontal fragmentation, each fragment consists of a subset of rows of the original relation and mixed is an approach that uses both horizontal and vertical fragmentation. A major cost in executing queries in a distributed database system is the data transfer cost incurred in transferring relations (fragments) accessed by a query from different sites to the site where the query is initiated. The objective of a data allocation algorithm is to determine the assignment of fragments at different sites so as to minimize the total data transfer cost incurred in executing a set of queries. This is equivalent to minimizing the query execution time plus the fragment migration time, which is of primary importance in a wide class of distributed conventional or multimedia database systems [10]. Various approaches have already described the data Fragment Location technique in distributed systems [8]. Some of these possess limitations in either the theoretical or implementation details [4, 5]. Others fail to prove their performance in various network topologies [10]. In most studies, fragment allocation has been based on static data access patterns and/or static query patterns. In a static environment where the access probabilities of nodes to the fragments never change, a static allocation of fragments provides the best solution. However, in a dynamic environment where the access patterns are continuously changing, the static allocation solution would degrade the database performance [11]. A dynamic mechanism for fragment allocation commonly used in commercial database systems is named the optimal algorithm [12]. The algorithm works by keeping a counter for each node that has accessed an object. The counter is incremented after a remote access to an object. The object is moved if the site with the highest counter value is a site other than the current storage site. The process to monitor the counters is run at regular intervals. Essentially the optimal algorithm works by moving an object to the site that has most frequently accessed it. 2. METHODOLGY Although it does make sense to move an object close to nodes with frequent access to that object, the optimal choice is not necessarily determined by selecting a node that has accessed that object. To further illustrate this point please refer to figure 1. Assume that object O1 is stored in Node A. Both Nodes A and D have accessed this node with the number of accesses 13 and 14 respectively. In this case, the optimal algorithm would choose Node D to move object O1. However, that would make accesses from Node A to object O1 an expensive operation. The problem with the optimal algorithm is that the algorithm fails to evaluate any intermediate nodes. In the below example if the cost of the links are all identical Node C would be a better choice than Node D. The reason for this being is that Node C is a node close to Node D but not too far from Node A and since the number of accesses by Nodes A and D to object O1 is almost equivalent it is best to consider moving the object to an intermediate node. Section 2 will provide greater insight to why Node C was chosen and not Node B. Initially at startup or after a topology change the algorithm begins by running Dijkstra’s algorithm to obtain the shortest path from one node to every other node [13]. This algorithm is continuously repeated until we have a shortest path matrix that determines the shortest path from every node to every other node. Assume we have a weighted directed graph G= (V, E), with a weight function W: E → R mapping edges to real valued weights, with nodes A, B, C and D. The shortest path matrix would look like the following: The BGBR algorithm is performed in three steps: 1. Determination of Shortest Paths 2. Aggregate Bottom Up 3. Determine the Minimum Cost Determination of Shortest Paths: A B C D A AA BA CA DA B AB BB CB DB C AC BC CC DC D AD BD CD DD Figure 2 The entry in position (X, Y) is the value determined by adding all the weights along the shortest path from Node X to Node Y. Figure 1 The NNA (Near Neighborhood Allocation) algorithm presented in [11], a variation to the optimal algorithm, has shown better performance than the optimal algorithm. NNA works in the same way as the optimal algorithm, by keeping a counter for each node accessing each object, except when a node has accessed an object more times then the owner of that object it does not move the fragment to than node in one step. It gradually moves the object along the shortest path between the two nodes one hop at a time. Although NNA has better overall system performance than the optimal algorithm it still suffers by not determining the best location for an object considering the overall access patterns of all nodes on a specific object. The only two nodes it considers is the owner of the node and the node that has accessed the object more times than the owner. In this paper we are going to present a novel approach to dynamically determine the location of a fragment that provides an overall optimal system performance. The algorithm is called BGBR, an abbreviation for the first letters of the author’s last names. The algorithm proves to have better performance than the commonly commercially used optimal and NNA algorithms. The rest of this paper is organized as following: In section 2 we will describe about our method, section 3 consists of our simulation environment and its results. Section 4 is future work and section 5 is our conclusion. Aggregate Bottom Up: The BGBR algorithm keeps track of the number of times a fragment has been accessed by each and every node. This is simply implemented with the use of counters. However, once a fragment is relocated then its counters all become zero. The idea in this step is to determine the highest priority fragment to be relocated. This is an important step due to the limited physical hardware space in typical systems. If two fragments want to reside in a location that only has space for one of these fragments, it is important to choose the fragment with higher priority. Algorithms normally define the priority of a fragment based on individually assessing which object has been accessed the most by which node without considering the relationship among the various nodes accessing the same object [14]. The following figure shows why this is not the best strategy: Node ID C1 C2 C3 Object ID X Y Y num of Accesses 10 9 8 Figure 3 In the above figure fragment X has been accessed the most (a total of 10 times) if we only consider the number of accesses by individual nodes. However, if we sum the number of accesses and group by the Object ID we see that object Y has been accessed a total of 17 times, where as Object X has been accessed 10 times. Hence, the object with higher priority should be object X not object Y. Example: Assume we have the following weighted graph (Figure 4) with its corresponding shortest path matrix (figure 5). After aggregation we sort the results in descending order. While we are sorting we make sure that only objects with an aggregated value of over a predefined threshold are kept. All others are disregarded. The reason for the threshold is to prevent too much object mobility. Moving objects that are accessed so rarely to an optimal location in the topology will instead of improving performance will degrade performance. This is because it is not justifiable to bare the high costs of moving objects that are rarely being accessed. Determine the Minimum Cost: Once we have determined and sorted the objects based on their priority we need to dequeue objects from the priority queue and select the optimal location for each. The optimal location can be calculated by selecting the location that provides the minimum cost. How do we determine the minimal cost? More importantly to determining the minimal cost, is to determine the cost of placing an object on a node. Figure 4 The idea behind determining the cost is to place objects so that they are close to the nodes that access them. Hence, the cost of Node X accessing an object on Node Y is the value of the shortest path from Node X to Node Y multiplied by the number of times this access has taken place. Therefore the total cost of placing an object, O, on Node X can be calculated with the following formula: Assume Nodes {W, X, Y, Z} have accessed object O. Let SP(Xi) denote the shortest path from Node i to Node X. Let A(Oi) denote the number of accesses Node i has made to object O. Let SP(XX)=0. A A B C D SP(XW).A(OW) + SP(XX).A(OX) + SP(XY).A(OY)+ SP(XZ).A(OZ) ∑ (SP(Xi).A(Oi)) where i={W,X,Y,Z} 1-1 6 0 11 8 O1 8 13 16 18 55 A B C D Sum Let Loc denote the Location. Let i denote the set {W, X, Y, Z} = Min (0x8 + 6x13 + 5x16 + 14x18, Optimal Value= Min(Loc A, Loc B, Loc C, Loc D) The minimal cost is achieved by taking the Minimum of: 6x8 + 0x13 + 11x16 + 8x18, ∑(SP(Yi).A(Oi)), 5x8 + 11x13 + 0x16 + 9x18, 1-2 The location of where the minimum value took place provides the optimal solution. 6 14 9 11 40 As shown in Figure 6 object O1 has a higher aggregated value then O2; hence, O1’s location is determined before O2. To find the optimal location for O1, we use formula 1-2. The following are the results: Assume Nodes have {W, X, Y, Z} have accessed object O. (∑(SP(Wi).A(Oi)), ∑(SP(Xi).A(Oi)), ∑(SP(Zi).A(Oi))) O2 Figure 6 5 11 0 9 D 14 8 9 0 Also assume that we have the following access patterns for objects O1 and O2. Or The above formula provides the cost of placing an object on a node. To determine the minimal cost for object O we need to use the following formula: 0 6 5 14 C Figure 5 Therefore the cost of placing object O on Node X is: B 14x8 + 8x14 + 9x16 + 0x18) = Min ( 410, 368, 345, 368) = 345 = Loc C From the above results we can see that Location C is the optimal location for fragment (or Object) O1. 18000 16000 3. EXPERIMENTAL ENVIRONMENT AND SIMULATION RESULTS . 12000 Total Query Cost To show the performance of BGBR versus optimal and NNA we developed a simulator that can simulate the various algorithms and show results based on changing fragments size and query rate. 14000 10000 Optimal NNA BGBR 8000 6000 The topology under test for all of our experiments is showed in figure 7. The numbers presented on the links represents the cost of using the link. This cost is equivalent to the weight in Dijsktra shortest path algorithm. The type of experiment we performed was to increase the fragment size and observe the overall results. 4000 2000 16000 15500 15000 14000 14000 13500 13000 12500 12000 11500 11000 9500 10500 9000 10000 8500 8000 7500 7000 6500 6000 5500 5000 4500 4000 3500 3000 2500 2000 500 1500 1000 0 Fragmentation Size Figure 8 As demonstrated in figure 8, BGBR outperforms both NNA and the optimal algorithm when considering the Total Query Cost. Comparing the NNA algorithm with Optimal you will notice that NNA starts to perform better when the fragment sizes are greater then 9MB. However, BGBR outperforms both in all fragment sizes. This is precisely because fragments are placed in a location that provides the overall minimum access cost. 3000 . 2500 Figure 7 Total Fragmentation Migration Cost 2000 Optimal NNA BGBR 1500 1000 500 3.1 Changing the Fragment Size 16000 15500 15000 14500 14000 13500 13000 12500 12000 11500 11000 10500 10000 9500 9000 8500 8000 7500 7000 6500 6000 5500 5000 4500 4000 3500 3000 2500 2000 500 1500 0 1000 This experiment involves increasing the fragment size in 500KB intervals. We started with 500KB size fragments and increased the fragment sizes to a maximum of 16MB. The results of our experiment have been evaluated by two parameters: Total Query Cost and Total Fragmentation Migration Cost. Query cost is the cost of running queries on a particular fragment as it is being allocated to various places in the network. The cost is calculated by multiplying the result query size by the shortest path from the query node to where the fragment is located. Hence, the total query cost is the sum of queries run on all fragments. Fragmentation Migration cost is the cost of moving a fragment from one node to the other node. This is calculated by multiplying the size of the fragment by the shortest path from the source node to the destination node. Hence, the Total Fragmentation Migration Cost is the sum of moving all fragments around the network. The overall system performance can be measured by adding the Total Query Cost and Total Fragmentation Migration Cost. This cost is referred to as the Total Cost in this paper. The three experiments shown in figures 8, 9 and 10 show the Total Query Cost, Total Fragmentation Migration Cost and Total Cost results, respectively, as the Y coordinate. The X coordinate shows the changes to the fragmentation size. Fragmentation Size Figure 9 Figure 9 illustrates how BGBR has better performance than both NNA and Optimal when considering the Total Fragmentation Migration Cost. The problem with the optimal algorithm is that fragments continuously are moved from one node to the other as the access patterns change. Optimal does not consider moving fragments in a location which is beneficial to all nodes that have accessed a specific object. It only considers the owner of a fragment and the node who has accessed an object more times than the owner. Evidently, this algorithm results in a lot of fragment mobility. NNA on the other hand tries to resolve this problem by not making drastic changes. Fragments are moved hop by hop. The reason that BGBR is much better than the other two is precisely because BGBR picks a location that will not lead to much fragment mobility. This location is determined based on the overall access patterns. Second, we have written a graphical tool that easily allows a user to define a topology. This tool is to be linked with the simulator in order to show the performance of BGBR versus NNA and Optimal under various network topologies. 25000 15000 Total Cost . . 20000 Optimal NNA BGBR 10000 5000 16000 15500 15000 14000 14000 13500 13000 12500 12000 11500 11000 9500 10500 9000 10000 8500 8000 7500 7000 6500 6000 5500 5000 4500 4000 3500 3000 2500 2000 500 1500 1000 0 Fragmentation Size Figure 10 The total cost of the system or the overall performance relies on two factors: query cost and fragmentation mobility cost. These two factors are not completely independent since as the fragmentation mobility cost increases query cost also increases. This is due to locking a fragment while it is on the move. The locks are removed once the fragment has reached its destination node. Since BGBR outperforms both Optimal and NNA in the Total Query Cost and Total Fragmentation Mobility Cost it therefore has better overall system performance. This is clearly demonstrated in Figure 10. 5. CONCLUSION In this study we presented a novel mechanism to determine the optimal location to place a fragment in a non-replicated distributed database. This novel mechanism is called BGBR. The optimal choice is made under the assumption that the access patterns that have happened in history will continue to happen in the near future. Unlike other sophisticated algorithms BGBR has a time complexity of O(N), where N is the number of nodes in our network. This algorithm was then further compared with the commercially used Optimal algorithm, and a variation to the Optimal algorithm called NNA. Using our own simulator we have shown that BGBR outperforms both Optimal and NNA algorithm when observing the Total Query Cost and Total Fragmentation Migration Cost. 6. ACKNOWLEDGEMENT We would like to send our thanks to Afshin Bayati, Shaghayegh Rahimizadeh, Kasra Karbazchi, Zahra Baseda, Farid Amir Ghiasvand, and Pooya Tavakoly for their useful comments in this project. 4. FUTURE WORK There are several opportunities for future research. The first factor that could deal with improvement is a mechanism to analyze and keep all or some of history. As you know our algorithm completely erases all access patterns when a node has moved. The problem here to be solved is how much of history should be logged? Should all of it be kept or should some records be filtered? If all of history is kept what happens as the system evolves? Searching through history may be more time consuming then moving fragments. The second area that could use improvement is on the algorithm for determining the node with the lowest cost. As noticed our algorithm has a time complexity of O (N), where N is the number of the nodes. Assume there are 100 nodes in our network but only 3 of the nodes have accessed a fragment. Does it make sense to check all the nodes in order to find the minimum? How can we guarantee that by not checking all the nodes we can pick the node with the minimum cost? Do we necessarily need to find the node with a minimum cost? Is there a node with similar cost to the minimum which can be obtained without an O (N) algorithm? The third area for future research is on experimental results with a changing topology. Experiments should be performed to expand the number of nodes in the topology and observe the system performance of BGBR versus NNA and Optimal. The fourth area is to add replication to the algorithms presented in this paper and observe the behavior of BGBR versus NNA and Optimal. We are currently actively pursuing two of these future research opportunities: first, we have come up with an algorithm that, in most cases, leads to an O(1) time complexity. This algorithm is still under test but we believe that it will provide positive results. 7. REFERENCES [1] Ramakrishnan, Gehrke, Database Management Systems Third Edition, McGraw Hill Higher Education 2003. [2] Navathe, S. B., Ceri, S., Wiederhold, G., and Dou, J., Vertical Partitioning Algorithms for Database Design, ACM Transaction on Database Systems, 9, 1984, 680-710. [3] Meghinin, C., Thanos, C., The Complexity of Operations on a Fragmented Relation, ACM Transactions on Database Systems (TODS), ACM, March 1991. [4] Apers, P. M. G. , “Data allocation in distributed database systems,” ACM Transactions on Database Systems, vol. 13, no. 3, 1988, 263–304. [5] Huang, Y. F. and Chen, J. H., Fragment Allocation in Distributed Database Design, Journal of Information Science and Engineering 17, 2001, 491-506. [6] Smith, A. J., Long-term File Migration: Development and Evaluation of Algorithms, Communications of ACM, 24, 1981, 512-532. [7] Morgan, H.L., and Levin, K. D., Optimal Program and Data Locations in Computer Networks, Communications of ACM, 20, 1977, 315-321. [8] Voulgaris, S., Steen, M. V., Baggio, A., and Ballintjn, G., Transparent Data Relocation in Highly Available Distributed Systems. Studia Informatica Universalis. 2002. [9] Kraus, S., Automated Negotiation on Data Allocation in Multi-Agent Environments, Springer Verlag Berlin Heidelberg, 2001, 150-172. [10] Ahmad, I., Karlapalem, K., Kwok, Y. K., and So, S. K. Evolutionary Algorithms for Allocating Data in Distributed Database Systems, International Journal of Distributed and Parallel Databases, 11: 5-32, The Netherlands, 2002. [11] R. Baseda, S. Tasharofi, M. Rahgozar, Near Neighborhood Allocation: A Novel Dynamic Data Allocation Algorithm in DDB, CSICC 2006 [12] Brunstroml, A., Leutenegger, S. T. and Simhal, R., Experimental Evaluation of Dynamic Data Allocation Strategies in a Distributed Database with changing Workloads, ACM Transactions on Database Systems, 1995. [13] T. H. Cormen, C. E. Leisorson, R. L. Rivest, C. Stein, Introduction to Algorithms 2nd Edition, McGraw Hill 2001. [14] Tang, M., Lee, B., Yeo, C., Tang, X. “Dynamic Replication Algorithms For The Multitier Data Grid” Elsevier, October 2004. [15] Paris, J., Long, D.E, Glockner, A., A Realistic Evaluation of Consistency for Replicated Files, Proceedings of the 21st annual symposium on Simulation ANSS '88, ACM , January 1988. [16] Awerbuch, B., Bartal, Y., Fiat, A., Competetive Distributed File Allocation, Proceedings of the twenty-fifth annual ACM symposium on Theory of computing, ACM, June 1993 [17] Corcoran, A.L., Hale, J., A Genetic Algorithm for Fragment Allocation in a Distributed Database System, Proceedings of the 1994 ACM symposium on Applied computing, ACM, June 1994.