x - PUC-Rio

advertisement
Self Adjusted Data Structures
Self-adjusting Structures
Consider the following AVL Tree
44
17
78
32
50
48
88
62
Self-adjusting Structures
Consider the following AVL Tree
44
17
78
32
50
48
88
62
Suppose we want to search for the following sequence of
elements: 48, 48, 48, 48, 50, 50, 50, 50, 50.
Self-adjusting Structures
Consider the following AVL Tree
In this case,
is this a good structure?
44
17
78
32
50
48
88
62
Suppose we want to search for the following sequence of
elements: 48, 48, 48, 48, 50, 50, 50, 50, 50.
Self-adjusting Structures
• So far we have seen:
• BST: binary search trees
– Worst-case running time per operation = O(N)
– Worst case average running time = O(N)
» Think about inserting a sorted item list
• AVL tree:
– Worst-case running time per operation = O(logN)
– Worst case average running time = O(logN)
– Does not adapt to skew distributions
Self-adjusting Structures
• The structure is updated after each operation
• Consider a binary search tree. If a sequence of
insertions produces a leaf in the level O(n), a
sequence of m searches to this element will
represent a time complexity of O(mn)
Use an auto-adjusting strucuture
Self adjustable lists
• Move to Front
– Whenever an element is accessed it is moved to
the front of the list
• Transposition
– Whenever an element x is accessed, we switch x
with its predecessor
• Frequency counter
– The list is always sorted by decreasing order of
frequencies
Self adjustable lists
EXAMPLE
Self-adjusting Structures
• Algoritmos Admissíveis
– Um método é dito admissível se no i-ésimo acesso ele
move o elemento acessado ki posições para frente
– Classe de algoritmos que engloba todos os métodos
apresentados.
Análise do Move to Front
Teorema. Seja H um método admissível e seja s
uma sequência de m acessos. Então
CustoMF(S) <= 2CustoH(s) –m,
Aonde CustoMF(S) e CustoH(s) são,
respectivamente, os custos de MF e H para
processar uma sequência s de requisições
Análise do Move to Front
Prova.
Empregamos o método da função potencial
Di :o numero de inversões da lista mantida pelo MF em relação a
lista do algoritmo H após o i-ésimo acesso.
Lista de H: a b c f e d
Lista de MF: b a f d e c
Inversões: (b,a) , (f,c), (d,e), (d,c)  Di =4
Temos que
c’i = ci +Di –Di-1
Análise do Move to Front
Prova.
Elemento x é acessado pelo MF.
x: k-ésimo elemento da lista de H
x: j-ésimo elemento da lista de MF
p: número de elementos que precedem x na lista de MF e
sucedem x na lista de H
• Quando o MF coloca x na primeira posição, j-1-p inversões são
criadas e p são destruídas
• Quando H move o elemento x ei operações para frente, ei
inversões são destruídas e nenhuma é criada
Análise do Move to Front
Prova. Da análise anterior
c’i = j +(j-1-p) – (p + ei) = 2(j-1-p) - ei + 1
Temos que j-1-p <=k-1 já que o número de inversões criadas é limitado pela
posição de x na lista de H. Logo,
c’i <= 2(k-1) - ei + 1
Somando c’i
π‘š
′
𝑖=1 𝑐 𝑖
Segue que
≤ 2π‘π‘’π‘ π‘‘π‘œπ» − π‘‡π‘Ÿπ‘œπ‘π‘Žπ‘ π» + π‘š
π‘š
𝑐
𝑖=1
𝑖
≤ 2π‘π‘’π‘ π‘‘π‘œπ» − π‘‡π‘Ÿπ‘œπ‘π‘Žπ‘ π» + π‘š + 𝐷0 − π·π‘š
Bad Sequences for Transposition and
Frequency Counter
• Transposition
– We insert n elements a1,…,an and then we access
the element an n times.
– Transposition pays O(n2) while MF pays O(n)
• Frequency Counter
– List with elements (an,…,a1). We access an n times, an-1 (n1) times and so on. FC does not reorganize the list.
– FC pays O(n3) while MF pays O(n2)
Further Reading
Randomization avoids malicious adversaries
Self-adjusting Structures
Splay Trees (Tarjan and Sleator 1985)
• Binary search tree.
• Every accessed node is brought to the root
• Adapt to the access probability distribution
Splay trees: Basic Idea
• Try to make the worst-case situation occur less
frequently.
• In a Binary search tree, the worst case situation can
occur with every operation. (while inserting a sorted
item list).
• In a splay tree, when a worst-case situation occurs for
an operation:
– The tree is re-structured (during or after the operation), so that
the subsequent operations do not cause the worst-case
situation to occur again.
Splay trees: Basic idea
• The basic idea of splay tree is:
• After a node is accessed, it is pushed to the root by a
series of AVL tree-like operations (rotations).
• For most applications, when a node is accessed, it is
likely that it will be accessed again in the near future
(principle of locality).
A first attempt
• A simple idea
– When a node k is accessed, push it towards the
root by the following algorithm:
• On the path from k to root:
– Do a singe rotation between node k’s parent and
node k itself.
A first attempt
k5
Accessing node k1
k4
F
k3
E
k2
D
k1
A
B
B
access path
A first attempt
k5
After rotation between k2 and k1
k4
F
k3
E
k1
D
k2
C
A
B
A first attempt
After rotation between k3 and k1
k5
k4
F
k1
A
E
k3
k2
B
C
D
A first attempt
After rotation between k4 and k1
k5
k1
F
k4
k2
k3
A
E
B
C
D
A first attempt
k1
k1 is now root
k5
k2
k4
A
B
F
k3
E
C
D
But k3 is nearly as deep as k1 was.
An access to k3 will push some other node nearly as deep as k3 is.
So, this method does not work ...
Splaying - algorithm
• Assume we access a node.
• We will splay along the path from access node
to the root.
• At every splay step:
– We will selectively rotate the tree.
– Selective operation will depend on the structure
of the tree around the node in which rotation will
be performed
Implementing Splay(x, S)
• Do the following operations until x is root.
– ZIG: If x has a parent but no grandparent, then
rotate(x).
– ZIG-ZIG: If x has a parent y and a grandparent,
and if both x and y are either both left children or
both right children.
– ZIG-ZAG: If x has a parent y and a grandparent,
and if one of x, y is a left child and the other is a
right child.
Implementing Splay(x, S)
• Do the following operations until x is root.
– ZIG: If x has a parent but no grandparent, then rotate(x).
– ZIG-ZIG: If x has a parent y and a grandparent, and if both
x and y are either both left children or both right children.
– ZIG-ZAG: If x has a parent y and a grandparent, and if one
of x, y is a left child and the other is a right child.
y
x
C
A
B
Implementing Splay(x, S)
• Do the following operations until x is root.
– ZIG: If x has a parent but no grandparent, then rotate(x).
– ZIG-ZIG: If x has a parent y and a grandparent, and if both
x and y are either both left children or both right children.
– ZIG-ZAG: If x has a parent y and a grandparent, and if one
of x, y is a left child and the other is a right child.
root
x
y
y
x
C
A
B
ZIG(x)
A
B
C
Implementing Splay(x, S)
• Do the following operations until x is root.
– ZIG: If x has a parent but no grandparent, then rotate(x).
– ZIG-ZIG: If x has a parent y and a grandparent, and if both
x and y are either both left children or both right children.
– ZIG-ZAG: If x has a parent y and a grandparent, and if one
of x, y is a left child and the other is a right child.
root
y
x
x
y
C
A
B
ZAG(x)
A
B
C
Implementing Splay(x, S)
• Do the following operations until x is root.
– ZIG: If x has a parent but no grandparent, then rotate(x).
– ZIG-ZIG: If x has a parent y and a grandparent, and if both x and y
are either both left children or both right children.
– ZIG-ZAG: If x has a parent y and a grandparent, and if one of x, y is a
left child and the other is a right child.
z
y
D
x
C
A
B
Implementing Splay(x, S)
• Do the following operations until x is root.
– ZIG: If x has a parent but no grandparent, then rotate(x).
– ZIG-ZIG: If x has a parent y and a grandparent, and if both x and y
are either both left children or both right children.
– ZIG-ZAG: If x has a parent y and a grandparent, and if one of x, y is a
left child and the other is a right child.
z
x
y
y
A
D
ZIG-ZIG
x
C
A
B
z
B
C
D
Implementing Splay(x, S)
• Do the following operations until x is root.
– ZIG: If x has a parent but no grandparent, then rotate(x).
– ZIG-ZIG: If x has a parent y and a grandparent, and if both x and y are
either both left children or both right children.
– ZIG-ZAG: If x has a parent y and a grandparent, and if one of x, y is a
left child and the other is a right child.
z
y
A
x
D
B
C
Implementing Splay(x, S)
• Do the following operations until x is root.
– ZIG: If x has a parent but no grandparent, then rotate(x).
– ZIG-ZIG: If x has a parent y and a grandparent, and if both x and y are
either both left children or both right children.
– ZIG-ZAG: If x has a parent y and a grandparent, and if one of x, y is a
left child and the other is a right child.
z
x
y
z
A
y
ZIG-ZAG
x
D
B
C
A
B
C
D
Splay Example
Apply Splay(1, S) to tree S:
10
9
8
7
6
5
4
3
2
1
ZIG-ZIG
Splay Example
Apply Splay(1, S) to tree S:
10
9
8
7
6
5
4
1
2
3
ZIG-ZIG
Splay Example
Apply Splay(1, S) to tree S:
10
9
8
7
6
ZIG-ZIG
1
4
5
2
3
Splay Example
Apply Splay(1, S) to tree S:
10
9
8
1
ZIG-ZIG
6
7
4
2
5
3
Splay Example
Apply Splay(1, S) to tree S:
10
1
8
6
2
5
3
9
7
4
ZIG
Splay Example
Apply Splay(1, S) to tree S:
1
10
8
6
7
4
2
5
3
9
Apply Splay(2, S) to tree S:
2
1
10
1
4
8
6
8
3
9
10
6
9
Splay(2)
7
4
2
5
3
5
7
Splay Tree Analysis
• Define the potential function
• Associate a positive weight to each node v:
w(v)
• W(v)= οƒ₯ w(y), y belongs to a subtree rooted at
v
• Rank(v) = log W(v)
Splay Tree Analysis
•
•
•
•
•
Define the potential function
Associate a positive weight to each node v: w(v)
W(v)= οƒ₯ w(y), y belongs to a subtree rooted at v
Rank(v) = log W(v)
The tree potential is:
οƒ₯ rank(v)
ο€’v
Upper bound for the amortized time of
a complete splay operation
• To estimate the time of a splay operation we
are going to use the number of rotations
Upper bound for the amortized time of
a complete splay operation
• To estimate the time of a splay operation we use
the number of rotations
Lemma: The amortized time for a complete splay
operation of a node x in a tree of root r is at most
1 + 3[rank(r) – rank(x)]
where rank(x) is the rank of x before the splay and
rank(r) is the rank of r after the splay.
Upper bound for the amortized time of
a complete splay operation
Proof: The amortized cost a is given by
a=t + after – before
t : number of rotations executed in the
splaying
Upper bound for the amortized time of
a complete splay operation
Proof: The amortized cost a is given by
a=t + after – before
a = o1 + o2 + o3 + ... + ok
oi : amortized cost of the i-th operation during
the splay ( zig or zig-zig or zig-zag)
Upper bound for the amortized time of
a complete splay operation
Proof:
i : potential function after i-th operation
ranki : rank after i-th operation
oi = ti + i – i-1
Splay Tree Analysis
• Operations
– Case 1: zig( zag)
– Case 2: zig-zig (zag-zag)
– Case 3: zig-zag (zag-zig)
Splay Tree Analysis
– Case 1: Only one rotation (zig)
r root
x
Splay Tree Analysis
– Case 1: Only one rotation (zig)
r root
x
w.l.o.g.
x
r
r
x
C
A
B
ZIG(x)
A
B
C
Splay Tree Analysis
– Case 1: Only one rotation (zig)
r root
x
w.l.o.g.
x
r
r
x
C
A
ZIG(x)
B
After the operation only rank(x) and rank(r) change
A
B
C
Splay Tree Analysis
– Since potential is the sum of every rank:
i - i-1 = ranki(r) + ranki(x) – ranki-1(r) – ranki-1(x)
ti = 1 (time of one rotation)
Amort. Complexity:
oi = 1 + ranki(r) + ranki(x) – ranki-1(r) – ranki-1(x)
Splay Tree Analysis
Amort. Complexity:
oi = 1 + ranki(r) + ranki(x) – ranki-1(r) – ranki-1(x)
x
r
r
x
C
A
B
ZIG(x)
A
B
C
Splay Tree Analysis
Amort. Complexity:
oi = 1 + ranki(r) + ranki(x) – ranki-1(r) – ranki-1(x)
ranki-1(r) ο‚³ ranki(r)
x
r
r
x
C
A
B
ZIG(x)
ranki (x) ο‚³ ranki-1(x)
A
B
C
Splay Tree Analysis
Amort. Complexity:
oi <= 1 + ranki(x) – ranki-1(x)
ranki-1(r) ο‚³ ranki(r)
x
r
r
x
C
A
B
ZIG(x)
ranki (x) ο‚³ ranki-1(x)
A
B
C
Splay Tree Analysis
Amort. Complexity:
oi <= 1 + 3[ ranki(x) – ranki-1(x) ]
ranki-1(r) ο‚³ ranki(r)
q
r
r
q
C
A
B
ZIG(x)
ranki (x) ο‚³ ranki-1(x)
A
B
C
Splay Tree Analysis
– Case 2: Zig-Zig
z
x
y
y
A
D
ZIG-ZIG
x
C
A
B
z
B
C
D
Splay Tree Analysis
– Case 2: Zig-Zig
oi = 2 + ranki (x) + ranki (y)+ranki (z) – ranki-1(x) – ranki-1(y) – ranki-1(z)
z
x
y
y
A
D
ZIG-ZIG
x
C
A
B
z
B
C
D
Splay Tree Analysis
– Case 2: Zig-Zig
oi = 2 + ranki(x) + ranki (y)+ranki(z) – ranki-1(x) – ranki-1(y) – ranki-1(z)
z
x
ranki-1(z) = ranki (x)
y
y
A
D
ZIG-ZIG
x
C
A
B
z
B
C
D
Splay Tree Analysis
– Case 2: Zig-Zig
oi = 2 + ranki (y)+ranki (z) – ranki-1(x) – ranki-1(y)
z
x
y
y
A
D
ZIG-ZIG
x
C
A
B
z
B
C
D
Splay Tree Analysis
– Case 2: Zig-Zig
oi = 2 + ranki (y)+ranki (z) – ranki-1 (x) – ranki-1 (y)
ranki (x) ο‚³ ranki (y)
ranki -1(y) ο‚³ ranki-1 (x)
z
x
y
y
A
D
ZIG-ZIG
x
C
A
B
z
B
C
D
Splay Tree Analysis
– Case 2: Zig-Zig
oi ο‚£ 2 + ranki (x)+ranki (z) – 2ranki-1 (x)
z
x
y
y
A
D
ZIG-ZIG
x
C
A
B
z
B
C
D
Splay Tree Analysis
Devemos mostrar que
2 + ranki (x)+ranki (z) – 2ranki-1 (x) <=3(ranki(x)-ranki-1(x))
Ou seja,
-2 ο‚£ ranki-1 (x)+ranki (z) – 2ranki (x)
Devemos estudar o comportamento da
função
ranki-1 (x)+ranki (z) – 2ranki (x)
Splay Tree Analysis
Temos que
ranki-1 (x)+ranki (z) – 2ranki (x) =
log( wi-1(x)/ wi (x))+log( wi(z)/ wi(x) )
Examinando a árvore percebemos que
wi-1(x)/ wi (x)+ wi(z)/ wi(x)<=1
Portanto, log( wi-1(x)/ wi (x))+log( wi(z)/ wi(x) ) é maior ou igual a
min log a + log b sujeito a a+b<=1. Segue da convexidade da
função log que o mínimo é atingido em a=b=1/2. Portanto, o
mínimo é maior ou igual a -2.
Splay Tree Analysis
– Case 2: Zig-Zig
oi ο‚£ 3[ ranki (x) – ranki-1 (x) ]
z
x
y
y
A
D
ZIG-ZIG
x
C
A
B
z
B
C
D
Splay Tree Analysis
– Case 3: Zig-Zag ( analysis similar to the case 2)
oi ο‚£ 3[ ranki (x) – ranki-1 (x) ]
Splay Tree Analysis
Putting the three cases together and telescoping
a = o1 + o2 + ... + ok ο‚£ 3[rank(r)-rank(x)]+1
Splay Tree Analysis
For proving different types of results we must set
the weights accordingly
Splay Tree Analysis
Theorem. The cost of m accesses is O(m log n +n log n), where
n is the number of items in the tree
Splay Tree Analysis
Theorem. The cost of m accesses is O(m log n+n logn),
where n is the number of items in the tree
Proof:
• Define every weight as 1/n.
• Then, the amortized cost is at most 3 log n + 1.
• The potential variation is at most n log n
•Thus, by summing over all accesses we conclude that
the cost is at most 3m log n + n log n +m
Static Optimality Theorem
Theorem: Let q(i) be the number of accesses to
item i. If every item is accessed at least once, then
total cost is at most
n

 m οƒΆοƒΆ
οƒ·οƒ· οƒ·οƒ·
O m  οƒ₯ q(i) log 
i ο€½1
 q(i) οƒΈ οƒΈ

Static Optimality Theorem
Proof. Assign a weight of q(i)/m to item i. Then,
• rank(r)=0 and rank(i) ο‚³ log(q(i)/m)
Thus, 3[rank(r) – rank(i)] +1 ο‚£ 3log(m/q(i)) + 1
In addition, || ο‚£
n
οƒ₯ log( m / q(i))
i ο€½1
Thus, the overall cost is
n

 m οƒΆοƒΆ
οƒ·οƒ· οƒ·οƒ·
O m  οƒ₯ q(i) log 
i ο€½1
 q(i) οƒΈ οƒΈ

Static Optimality Theorem
Theorem: The cost of an optimal static binary search tree is
m

 m οƒΆοƒΆ

οƒ·οƒ· οƒ·οƒ·
 m  οƒ₯ q(i) log 
i ο€½1
 q(i) οƒΈ οƒΈ

Static Finger Theorem
Theorem: Let i,...,n be the items in the splay tree. Let
the sequence of accesses be 1,...,m. If f is a fixed item,
the total access time is
m
O(n log n  m  οƒ₯ log(| i j ο€­ f | 1))
i ο€½1
Static Finger Theorem
Proof. Assign a weight 1/(|i –f|+1)2 to item i. Then,
• rank(r)= O(1).
• rank(ij)=O( log( |ij – f +1|)
Since the weight of every item is at least 1/n 2, then
|| ο‚£ n log n
Working Set Theorem
Theorem: Let i,...,n be the items in the splay tree. Let
the sequence of accesses be 1,...,m. Let i(j) be the item
accessed at the j-th access and let let t(j) be the
number of distinct itens accessed since the previous
access to i(j). Then,
m
O(n log n  m  οƒ₯ log(t ( j )  1))
i ο€½1
Dynamic Optimality Conjecture
Conjecture Consider any sequence of successful accesses on
an n-node search tree. Let A be any algorithm that carries
out each access by traversing the path from the root to the
node containing the accessed item, at a cost of one plus the
depth of the node containing the item, and that between
accesses performs an arbitrary number of rotations
anywhere in the tree, at a cost of one per rotation. Then the
total time to perform all the accesses by splaying is no more
than O(n) plus a constant times the time required by the
algorithm.
Dynamic Optimality Conjecture: best
attempt
Tango Trees: O(log log n) competitive ratio
Dynamic optimality - almost.
E. Demaine, D. Harmon, J. Iacono, and M. Patrascu.
In Foundations of Computer Science (FOCS), 2004
Insertion and Deletion
Most of the theorems hold !
Paris Kanellakis Theory and Practice
Award Award 1999
Splay Tree Data
Structure
Daniel D.K. Sleator and Robert
E. Tarjan
Amortized running time
• Ordinary Complexity: determination of worst
case complexity. Examines each operation
individually
Amortized running time
• Ordinary Complexity: determination of worst
case complexity. Examines each operation
individually
• Amortized Complexity: analyses the average
complexity of each operation.
Amortized Analysis: Physics Approach
• It can be seen as an analogy to the concept of
potential energy
Amortized Analysis: Physics Approach
• It can be seen as an analogy to the concept of
potential energy
• Potential function  which maps any
configuration E of the structure into a real
number (E), called potential of E.
Amortized Analysis: Physics Approach
• It can be seen as an analogy to the concept
of potential energy
• Potential function  which maps any
configuration E of the structure into a real
number (E), called potential of E.
It can be used to to limit the costs of the
operations to be done in the future
Amortized cost of an operation
a = t + (E’) - (E)
Amortized cost of an operation
Structure configuration
after the operation
a = t + (E’) - (E)
Real time
of the operation
Structure configuration
before the operation
Amortized cost of a sequence of
operations
a = t + (E’) - (E)
m
m

t = i=1(ai - i + i-1)
i=1 i
Amortized cost of a sequence of
operations
a = t + (E’) - (E)
m
m

t = i=1(ai - i + i-1)
i=1 i
By telescopic
m
= 0 - m +i=1 ai
Amortized cost of a sequence of M
operations
a = t + (E’) - (E)
m
m

t = i=1(ai - i + i-1)
i=1 i
By telescopic
m
= 0 - m +i=1 ai
The total real time does not depend on the intermediary potential
Amortized cost of a sequence of
operations
 T =  (a -  +  )
If the final potential is greater or equal than the initial, then the amortized
i
i-1
i=1 iboundi to estimate
i=1 as
complexity can be used
an upper
the total real time.
Amortized running time
• Definition: For a series of M consecutive
operations:
– If the total running time is O(M*f(N)), we say that
the amortized running time (per operation) is
O(f(N)).
• Using this definition:
– A splay tree has O(logN) amortized cost (running
time) per operation.
Implementing Splay(x, S)
z
y
D
x
C
A
B
Implementing Splay(x, S)
y
z
y
z
x
D
x
C
A
A
B
B
C
D
Implementing Splay(x, S)
y
z
y
z
x
D
x
C
A
A
B
B
C
D
Implementing Splay(x, S)
y
z
x
y
y
z
x
D
A
z
x
B
C
A
A
B
B
C
D
C
D
Download