Self Adjusted Data Structures Self-adjusting Structures Consider the following AVL Tree 44 17 78 32 50 48 88 62 Self-adjusting Structures Consider the following AVL Tree 44 17 78 32 50 48 88 62 Suppose we want to search for the following sequence of elements: 48, 48, 48, 48, 50, 50, 50, 50, 50. Self-adjusting Structures Consider the following AVL Tree In this case, is this a good structure? 44 17 78 32 50 48 88 62 Suppose we want to search for the following sequence of elements: 48, 48, 48, 48, 50, 50, 50, 50, 50. Self-adjusting Structures • So far we have seen: • BST: binary search trees – Worst-case running time per operation = O(N) – Worst case average running time = O(N) » Think about inserting a sorted item list • AVL tree: – Worst-case running time per operation = O(logN) – Worst case average running time = O(logN) – Does not adapt to skew distributions Self-adjusting Structures • The structure is updated after each operation • Consider a binary search tree. If a sequence of insertions produces a leaf in the level O(n), a sequence of m searches to this element will represent a time complexity of O(mn) Use an auto-adjusting strucuture Self adjustable lists • Move to Front – Whenever an element is accessed it is moved to the front of the list • Transposition – Whenever an element x is accessed, we switch x with its predecessor • Frequency counter – The list is always sorted by decreasing order of frequencies Self adjustable lists EXAMPLE Self-adjusting Structures • Algoritmos Admissíveis – Um método é dito admissível se no i-ésimo acesso ele move o elemento acessado ki posições para frente – Classe de algoritmos que engloba todos os métodos apresentados. Análise do Move to Front Teorema. Seja H um método admissível e seja s uma sequência de m acessos. Então CustoMF(S) <= 2CustoH(s) –m, Aonde CustoMF(S) e CustoH(s) são, respectivamente, os custos de MF e H para processar uma sequência s de requisições Análise do Move to Front Prova. Empregamos o método da função potencial Di :o numero de inversões da lista mantida pelo MF em relação a lista do algoritmo H após o i-ésimo acesso. Lista de H: a b c f e d Lista de MF: b a f d e c Inversões: (b,a) , (f,c), (d,e), (d,c) ο¨ Di =4 Temos que c’i = ci +Di –Di-1 Análise do Move to Front Prova. Elemento x é acessado pelo MF. x: k-ésimo elemento da lista de H x: j-ésimo elemento da lista de MF p: número de elementos que precedem x na lista de MF e sucedem x na lista de H • Quando o MF coloca x na primeira posição, j-1-p inversões são criadas e p são destruídas • Quando H move o elemento x ei operações para frente, ei inversões são destruídas e nenhuma é criada Análise do Move to Front Prova. Da análise anterior c’i = j +(j-1-p) – (p + ei) = 2(j-1-p) - ei + 1 Temos que j-1-p <=k-1 já que o número de inversões criadas é limitado pela posição de x na lista de H. Logo, c’i <= 2(k-1) - ei + 1 Somando c’i π ′ π=1 π π Segue que ≤ 2ππ’π π‘ππ» − ππππππ π» + π π π π=1 π ≤ 2ππ’π π‘ππ» − ππππππ π» + π + π·0 − π·π Bad Sequences for Transposition and Frequency Counter • Transposition – We insert n elements a1,…,an and then we access the element an n times. – Transposition pays O(n2) while MF pays O(n) • Frequency Counter – List with elements (an,…,a1). We access an n times, an-1 (n1) times and so on. FC does not reorganize the list. – FC pays O(n3) while MF pays O(n2) Further Reading Randomization avoids malicious adversaries Self-adjusting Structures Splay Trees (Tarjan and Sleator 1985) • Binary search tree. • Every accessed node is brought to the root • Adapt to the access probability distribution Splay trees: Basic Idea • Try to make the worst-case situation occur less frequently. • In a Binary search tree, the worst case situation can occur with every operation. (while inserting a sorted item list). • In a splay tree, when a worst-case situation occurs for an operation: – The tree is re-structured (during or after the operation), so that the subsequent operations do not cause the worst-case situation to occur again. Splay trees: Basic idea • The basic idea of splay tree is: • After a node is accessed, it is pushed to the root by a series of AVL tree-like operations (rotations). • For most applications, when a node is accessed, it is likely that it will be accessed again in the near future (principle of locality). A first attempt • A simple idea – When a node k is accessed, push it towards the root by the following algorithm: • On the path from k to root: – Do a singe rotation between node k’s parent and node k itself. A first attempt k5 Accessing node k1 k4 F k3 E k2 D k1 A B B access path A first attempt k5 After rotation between k2 and k1 k4 F k3 E k1 D k2 C A B A first attempt After rotation between k3 and k1 k5 k4 F k1 A E k3 k2 B C D A first attempt After rotation between k4 and k1 k5 k1 F k4 k2 k3 A E B C D A first attempt k1 k1 is now root k5 k2 k4 A B F k3 E C D But k3 is nearly as deep as k1 was. An access to k3 will push some other node nearly as deep as k3 is. So, this method does not work ... Splaying - algorithm • Assume we access a node. • We will splay along the path from access node to the root. • At every splay step: – We will selectively rotate the tree. – Selective operation will depend on the structure of the tree around the node in which rotation will be performed Implementing Splay(x, S) • Do the following operations until x is root. – ZIG: If x has a parent but no grandparent, then rotate(x). – ZIG-ZIG: If x has a parent y and a grandparent, and if both x and y are either both left children or both right children. – ZIG-ZAG: If x has a parent y and a grandparent, and if one of x, y is a left child and the other is a right child. Implementing Splay(x, S) • Do the following operations until x is root. – ZIG: If x has a parent but no grandparent, then rotate(x). – ZIG-ZIG: If x has a parent y and a grandparent, and if both x and y are either both left children or both right children. – ZIG-ZAG: If x has a parent y and a grandparent, and if one of x, y is a left child and the other is a right child. y x C A B Implementing Splay(x, S) • Do the following operations until x is root. – ZIG: If x has a parent but no grandparent, then rotate(x). – ZIG-ZIG: If x has a parent y and a grandparent, and if both x and y are either both left children or both right children. – ZIG-ZAG: If x has a parent y and a grandparent, and if one of x, y is a left child and the other is a right child. root x y y x C A B ZIG(x) A B C Implementing Splay(x, S) • Do the following operations until x is root. – ZIG: If x has a parent but no grandparent, then rotate(x). – ZIG-ZIG: If x has a parent y and a grandparent, and if both x and y are either both left children or both right children. – ZIG-ZAG: If x has a parent y and a grandparent, and if one of x, y is a left child and the other is a right child. root y x x y C A B ZAG(x) A B C Implementing Splay(x, S) • Do the following operations until x is root. – ZIG: If x has a parent but no grandparent, then rotate(x). – ZIG-ZIG: If x has a parent y and a grandparent, and if both x and y are either both left children or both right children. – ZIG-ZAG: If x has a parent y and a grandparent, and if one of x, y is a left child and the other is a right child. z y D x C A B Implementing Splay(x, S) • Do the following operations until x is root. – ZIG: If x has a parent but no grandparent, then rotate(x). – ZIG-ZIG: If x has a parent y and a grandparent, and if both x and y are either both left children or both right children. – ZIG-ZAG: If x has a parent y and a grandparent, and if one of x, y is a left child and the other is a right child. z x y y A D ZIG-ZIG x C A B z B C D Implementing Splay(x, S) • Do the following operations until x is root. – ZIG: If x has a parent but no grandparent, then rotate(x). – ZIG-ZIG: If x has a parent y and a grandparent, and if both x and y are either both left children or both right children. – ZIG-ZAG: If x has a parent y and a grandparent, and if one of x, y is a left child and the other is a right child. z y A x D B C Implementing Splay(x, S) • Do the following operations until x is root. – ZIG: If x has a parent but no grandparent, then rotate(x). – ZIG-ZIG: If x has a parent y and a grandparent, and if both x and y are either both left children or both right children. – ZIG-ZAG: If x has a parent y and a grandparent, and if one of x, y is a left child and the other is a right child. z x y z A y ZIG-ZAG x D B C A B C D Splay Example Apply Splay(1, S) to tree S: 10 9 8 7 6 5 4 3 2 1 ZIG-ZIG Splay Example Apply Splay(1, S) to tree S: 10 9 8 7 6 5 4 1 2 3 ZIG-ZIG Splay Example Apply Splay(1, S) to tree S: 10 9 8 7 6 ZIG-ZIG 1 4 5 2 3 Splay Example Apply Splay(1, S) to tree S: 10 9 8 1 ZIG-ZIG 6 7 4 2 5 3 Splay Example Apply Splay(1, S) to tree S: 10 1 8 6 2 5 3 9 7 4 ZIG Splay Example Apply Splay(1, S) to tree S: 1 10 8 6 7 4 2 5 3 9 Apply Splay(2, S) to tree S: 2 1 10 1 4 8 6 8 3 9 10 6 9 Splay(2) 7 4 2 5 3 5 7 Splay Tree Analysis • Define the potential function • Associate a positive weight to each node v: w(v) • W(v)= ο₯ w(y), y belongs to a subtree rooted at v • Rank(v) = log W(v) Splay Tree Analysis • • • • • Define the potential function Associate a positive weight to each node v: w(v) W(v)= ο₯ w(y), y belongs to a subtree rooted at v Rank(v) = log W(v) The tree potential is: ο₯ rank(v) ο’v Upper bound for the amortized time of a complete splay operation • To estimate the time of a splay operation we are going to use the number of rotations Upper bound for the amortized time of a complete splay operation • To estimate the time of a splay operation we use the number of rotations Lemma: The amortized time for a complete splay operation of a node x in a tree of root r is at most 1 + 3[rank(r) – rank(x)] where rank(x) is the rank of x before the splay and rank(r) is the rank of r after the splay. Upper bound for the amortized time of a complete splay operation Proof: The amortized cost a is given by a=t + ο¦after – ο¦before t : number of rotations executed in the splaying Upper bound for the amortized time of a complete splay operation Proof: The amortized cost a is given by a=t + ο¦after – ο¦before a = o1 + o2 + o3 + ... + ok oi : amortized cost of the i-th operation during the splay ( zig or zig-zig or zig-zag) Upper bound for the amortized time of a complete splay operation Proof: ο¦i : potential function after i-th operation ranki : rank after i-th operation oi = ti + ο¦i – ο¦i-1 Splay Tree Analysis • Operations – Case 1: zig( zag) – Case 2: zig-zig (zag-zag) – Case 3: zig-zag (zag-zig) Splay Tree Analysis – Case 1: Only one rotation (zig) r root x Splay Tree Analysis – Case 1: Only one rotation (zig) r root x w.l.o.g. x r r x C A B ZIG(x) A B C Splay Tree Analysis – Case 1: Only one rotation (zig) r root x w.l.o.g. x r r x C A ZIG(x) B After the operation only rank(x) and rank(r) change A B C Splay Tree Analysis – Since potential is the sum of every rank: ο¦i - ο¦i-1 = ranki(r) + ranki(x) – ranki-1(r) – ranki-1(x) ti = 1 (time of one rotation) Amort. Complexity: oi = 1 + ranki(r) + ranki(x) – ranki-1(r) – ranki-1(x) Splay Tree Analysis Amort. Complexity: oi = 1 + ranki(r) + ranki(x) – ranki-1(r) – ranki-1(x) x r r x C A B ZIG(x) A B C Splay Tree Analysis Amort. Complexity: oi = 1 + ranki(r) + ranki(x) – ranki-1(r) – ranki-1(x) ranki-1(r) ο³ ranki(r) x r r x C A B ZIG(x) ranki (x) ο³ ranki-1(x) A B C Splay Tree Analysis Amort. Complexity: oi <= 1 + ranki(x) – ranki-1(x) ranki-1(r) ο³ ranki(r) x r r x C A B ZIG(x) ranki (x) ο³ ranki-1(x) A B C Splay Tree Analysis Amort. Complexity: oi <= 1 + 3[ ranki(x) – ranki-1(x) ] ranki-1(r) ο³ ranki(r) q r r q C A B ZIG(x) ranki (x) ο³ ranki-1(x) A B C Splay Tree Analysis – Case 2: Zig-Zig z x y y A D ZIG-ZIG x C A B z B C D Splay Tree Analysis – Case 2: Zig-Zig oi = 2 + ranki (x) + ranki (y)+ranki (z) – ranki-1(x) – ranki-1(y) – ranki-1(z) z x y y A D ZIG-ZIG x C A B z B C D Splay Tree Analysis – Case 2: Zig-Zig oi = 2 + ranki(x) + ranki (y)+ranki(z) – ranki-1(x) – ranki-1(y) – ranki-1(z) z x ranki-1(z) = ranki (x) y y A D ZIG-ZIG x C A B z B C D Splay Tree Analysis – Case 2: Zig-Zig oi = 2 + ranki (y)+ranki (z) – ranki-1(x) – ranki-1(y) z x y y A D ZIG-ZIG x C A B z B C D Splay Tree Analysis – Case 2: Zig-Zig oi = 2 + ranki (y)+ranki (z) – ranki-1 (x) – ranki-1 (y) ranki (x) ο³ ranki (y) ranki -1(y) ο³ ranki-1 (x) z x y y A D ZIG-ZIG x C A B z B C D Splay Tree Analysis – Case 2: Zig-Zig oi ο£ 2 + ranki (x)+ranki (z) – 2ranki-1 (x) z x y y A D ZIG-ZIG x C A B z B C D Splay Tree Analysis Devemos mostrar que 2 + ranki (x)+ranki (z) – 2ranki-1 (x) <=3(ranki(x)-ranki-1(x)) Ou seja, -2 ο£ ranki-1 (x)+ranki (z) – 2ranki (x) Devemos estudar o comportamento da função ranki-1 (x)+ranki (z) – 2ranki (x) Splay Tree Analysis Temos que ranki-1 (x)+ranki (z) – 2ranki (x) = log( wi-1(x)/ wi (x))+log( wi(z)/ wi(x) ) Examinando a árvore percebemos que wi-1(x)/ wi (x)+ wi(z)/ wi(x)<=1 Portanto, log( wi-1(x)/ wi (x))+log( wi(z)/ wi(x) ) é maior ou igual a min log a + log b sujeito a a+b<=1. Segue da convexidade da função log que o mínimo é atingido em a=b=1/2. Portanto, o mínimo é maior ou igual a -2. Splay Tree Analysis – Case 2: Zig-Zig oi ο£ 3[ ranki (x) – ranki-1 (x) ] z x y y A D ZIG-ZIG x C A B z B C D Splay Tree Analysis – Case 3: Zig-Zag ( analysis similar to the case 2) oi ο£ 3[ ranki (x) – ranki-1 (x) ] Splay Tree Analysis Putting the three cases together and telescoping a = o1 + o2 + ... + ok ο£ 3[rank(r)-rank(x)]+1 Splay Tree Analysis For proving different types of results we must set the weights accordingly Splay Tree Analysis Theorem. The cost of m accesses is O(m log n +n log n), where n is the number of items in the tree Splay Tree Analysis Theorem. The cost of m accesses is O(m log n+n logn), where n is the number of items in the tree Proof: • Define every weight as 1/n. • Then, the amortized cost is at most 3 log n + 1. • The potential variation is at most n log n •Thus, by summing over all accesses we conclude that the cost is at most 3m log n + n log n +m Static Optimality Theorem Theorem: Let q(i) be the number of accesses to item i. If every item is accessed at least once, then total cost is at most n ο¦ ο¦ m οΆοΆ ο·ο· ο·ο· Oο§ο§ m ο« ο₯ q(i) log ο§ο§ i ο½1 ο¨ q(i) οΈ οΈ ο¨ Static Optimality Theorem Proof. Assign a weight of q(i)/m to item i. Then, • rank(r)=0 and rank(i) ο³ log(q(i)/m) Thus, 3[rank(r) – rank(i)] +1 ο£ 3log(m/q(i)) + 1 In addition, |ο¦| ο£ n ο₯ log( m / q(i)) i ο½1 Thus, the overall cost is n ο¦ ο¦ m οΆοΆ ο·ο· ο·ο· Oο§ο§ m ο« ο₯ q(i) log ο§ο§ i ο½1 ο¨ q(i) οΈ οΈ ο¨ Static Optimality Theorem Theorem: The cost of an optimal static binary search tree is m ο¦ ο¦ m οΆοΆ ο§ ο·ο· ο·ο· οο§ m ο« ο₯ q(i) log ο§ο§ i ο½1 ο¨ q(i) οΈ οΈ ο¨ Static Finger Theorem Theorem: Let i,...,n be the items in the splay tree. Let the sequence of accesses be 1,...,m. If f is a fixed item, the total access time is m O(n log n ο« m ο« ο₯ log(| i j ο f | ο«1)) i ο½1 Static Finger Theorem Proof. Assign a weight 1/(|i –f|+1)2 to item i. Then, • rank(r)= O(1). • rank(ij)=O( log( |ij – f +1|) Since the weight of every item is at least 1/n 2, then |ο¦| ο£ n log n Working Set Theorem Theorem: Let i,...,n be the items in the splay tree. Let the sequence of accesses be 1,...,m. Let i(j) be the item accessed at the j-th access and let let t(j) be the number of distinct itens accessed since the previous access to i(j). Then, m O(n log n ο« m ο« ο₯ log(t ( j ) ο« 1)) i ο½1 Dynamic Optimality Conjecture Conjecture Consider any sequence of successful accesses on an n-node search tree. Let A be any algorithm that carries out each access by traversing the path from the root to the node containing the accessed item, at a cost of one plus the depth of the node containing the item, and that between accesses performs an arbitrary number of rotations anywhere in the tree, at a cost of one per rotation. Then the total time to perform all the accesses by splaying is no more than O(n) plus a constant times the time required by the algorithm. Dynamic Optimality Conjecture: best attempt Tango Trees: O(log log n) competitive ratio Dynamic optimality - almost. E. Demaine, D. Harmon, J. Iacono, and M. Patrascu. In Foundations of Computer Science (FOCS), 2004 Insertion and Deletion Most of the theorems hold ! Paris Kanellakis Theory and Practice Award Award 1999 Splay Tree Data Structure Daniel D.K. Sleator and Robert E. Tarjan Amortized running time • Ordinary Complexity: determination of worst case complexity. Examines each operation individually Amortized running time • Ordinary Complexity: determination of worst case complexity. Examines each operation individually • Amortized Complexity: analyses the average complexity of each operation. Amortized Analysis: Physics Approach • It can be seen as an analogy to the concept of potential energy Amortized Analysis: Physics Approach • It can be seen as an analogy to the concept of potential energy • Potential function ο¦ which maps any configuration E of the structure into a real number ο¦(E), called potential of E. Amortized Analysis: Physics Approach • It can be seen as an analogy to the concept of potential energy • Potential function ο¦ which maps any configuration E of the structure into a real number ο¦(E), called potential of E. It can be used to to limit the costs of the operations to be done in the future Amortized cost of an operation a = t + ο¦(E’) - ο¦(E) Amortized cost of an operation Structure configuration after the operation a = t + ο¦(E’) - ο¦(E) Real time of the operation Structure configuration before the operation Amortized cost of a sequence of operations a = t + ο¦(E’) - ο¦(E) m m ο t = οi=1(ai - ο¦i + ο¦i-1) i=1 i Amortized cost of a sequence of operations a = t + ο¦(E’) - ο¦(E) m m ο t = οi=1(ai - ο¦i + ο¦i-1) i=1 i By telescopic m = ο¦0 - ο¦m +i=1ο ai Amortized cost of a sequence of M operations a = t + ο¦(E’) - ο¦(E) m m ο t = οi=1(ai - ο¦i + ο¦i-1) i=1 i By telescopic m = ο¦0 - ο¦m +i=1ο ai The total real time does not depend on the intermediary potential Amortized cost of a sequence of operations ο T = ο (a - ο¦ + ο¦ ) If the final potential is greater or equal than the initial, then the amortized i i-1 i=1 iboundi to estimate i=1 as complexity can be used an upper the total real time. Amortized running time • Definition: For a series of M consecutive operations: – If the total running time is O(M*f(N)), we say that the amortized running time (per operation) is O(f(N)). • Using this definition: – A splay tree has O(logN) amortized cost (running time) per operation. Implementing Splay(x, S) z y D x C A B Implementing Splay(x, S) y z y z x D x C A A B B C D Implementing Splay(x, S) y z y z x D x C A A B B C D Implementing Splay(x, S) y z x y y z x D A z x B C A A B B C D C D