Connectivity
A Semi-External Algorithm
Analysis:
• Scan vertex set to load vertices into main
memory
• Scan edge set to carry out algorithm
• O(scan(|V| + |E|)) I/Os
Theorem: If |V| ≤ M, the connected
components of a graph can be computed
in O(scan(|V| + |E|)) I/Os.
1
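A minimal in-memory sketch of this semi-external approach (my own illustration, not code from the lecture): the component labels live in a union-find structure that fits in main memory, and the edge set is consumed as a single stream, so the I/O cost is one scan of V plus one scan of E.

#include <numeric>
#include <utility>
#include <vector>
using namespace std;

// Union-find over the vertex set; assumed to fit in main memory (|V| <= M).
struct UnionFind {
    vector<int> parent, comp_size;
    explicit UnionFind(int n) : parent(n), comp_size(n, 1) {
        iota(parent.begin(), parent.end(), 0);
    }
    int find(int x) { return parent[x] == x ? x : parent[x] = find(parent[x]); }
    void unite(int a, int b) {
        a = find(a); b = find(b);
        if (a == b) return;
        if (comp_size[a] < comp_size[b]) swap(a, b);
        parent[b] = a; comp_size[a] += comp_size[b];
    }
};

// One scan over the edge stream (read block by block from disk in the real
// setting), then one pass over the vertices to emit component labels.
vector<int> connected_components(int n, const vector<pair<int,int>>& edge_stream) {
    UnionFind uf(n);
    for (auto [u, v] : edge_stream) uf.unite(u, v);
    vector<int> label(n);
    for (int v = 0; v < n; ++v) label[v] = uf.find(v);
    return label;
}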
Connectivity
The General Case
Idea [Chiang et al 1995]:
• If |V| ≤ M
– Use semi-external algorithm
• If |V| > M
– Identify simple connected subgraphs of G
– Contract these subgraphs to obtain graph
G’ = (V’, E’) with |V’| ≤ c|V|, c < 1
– Recursively compute connected components
of G’
– Obtain labelling of connected components
of G from labelling of components of G’
2
Connectivity
The General Case
[Figure: worked example of the contraction step. Vertices are grouped with
their smallest neighbours (groups labelled 1 and 2), each group is contracted
into a single supervertex A–E, the connected components of the contracted
graph are computed, and the resulting labels are copied back to the original
vertices.]
3
Connectivity
The General Case
Main steps:
• Find smallest neighbors
• Compute connected components of graph
H induced by selected edges
• Contract each component into a single vertex
• Call the procedure recursively
• Copy label of every vertex v ∈ G’ to all
vertices in G represented by v
4
Finding smallest neighbors
To find smallest neighbor w(v) of every vertex v:
Scan edges and replace each undirected edge {u,v} with
directed edges (u,v) and (v,u)
Sort directed edges lexicographically
This produces adjacency lists
Scan the adjacency list of v and return the first vertex in the list as w(v)
This takes overall O(sort(|E|)) I/Os
To produce edge set of (undirected) graph H, sort and scan
edges {v, w(v)} to remove duplicates
This takes another O(sort(|V|)) I/Os
5
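As an illustration of these steps, here is an in-memory stand-in (my own sketch), with std::sort playing the role of the O(sort(·))-I/O external sort and the loops standing in for the scans:

#include <algorithm>
#include <utility>
#include <vector>
using namespace std;

// Given the undirected edge list of G, return the edge set of H,
// i.e., the undirected edges {v, w(v)} with duplicates removed.
vector<pair<int,int>> smallest_neighbor_edges(const vector<pair<int,int>>& edges) {
    vector<pair<int,int>> dir;                        // directed copies (u,v) and (v,u)
    for (auto [u, v] : edges) {
        dir.push_back({u, v});
        dir.push_back({v, u});
    }
    sort(dir.begin(), dir.end());                     // lexicographic sort = adjacency lists
    vector<pair<int,int>> H;
    for (size_t i = 0; i < dir.size(); ++i)
        if (i == 0 || dir[i].first != dir[i - 1].first) {  // first entry of v's adjacency list
            int v = dir[i].first, w = dir[i].second;       // w = w(v), the smallest neighbour
            H.push_back({min(v, w), max(v, w)});           // store as an undirected edge
        }
    sort(H.begin(), H.end());                         // sort and scan to remove duplicates
    H.erase(unique(H.begin(), H.end()), H.end());
    return H;
}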
Computing Conn Comps of H
Cannot use same algorithm recursively
(didn’t reduce vertex set)
Exploit following property:
Lemma: Graph H is a forest.
Assume not. Then H must contain a cycle x_0, x_1, …, x_k = x_0. Since there are no
duplicate edges, k ≥ 3. Since each vertex v has at most one incident
edge {v, w(v)} in H, w.l.o.g. x_{i+1} = w(x_i) for 0 ≤ i < k. Then the
existence of {x_{i-1}, x_i} implies that x_{i-1} > x_{i+1}. Similarly, x_{k-1} > x_1.
If k is even: x_0 > x_2 > … > x_k = x_0 yields a contradiction.
If k is odd: x_0 > x_2 > … > x_{k-1} > x_1 > x_3 > … > x_k = x_0 yields a
contradiction.
6
Exploit Property that H is a Forest
Apply Euler tour to H in order to transform each tree into a list
Now compute connected components using ideas from list ranking:
Find large independent set I of H and remove vertices in I from H
Recursively find connected components of smaller graphs
Reintegrate vertices in I (assign component label of neighbor)
This takes O(sort(|H|)) = O(sort(|V|)) I/Os
7
Recursive Calls
Every connected component of H has size at least 2
⇒ |V’| ≤ |V|/2
⇒ O(log(|V|/M)) recursive calls
Theorem: The connected components of a graph G = (V,E) can
be computed in O(sort(|V|) + sort(|E|) log(|V|/M)) I/Os.
8
Improved Connectivity via BFS
• BFS in O(|V| + sort(|E|)) I/Os [Munagala & Ranade 99]
⇒ BFS can be used to identify connected components
• When |V| = |E|/B, algorithm takes O(sort(|E|)) I/Os
• Same alg. but stop recursion before, when # of vertices
reduced to |E|/B (after log (|V|B/|E|) recursive calls)
• At this point, apply BFS rather than semi-external
connectivity
Theorem: The connected components of a graph
G = (V,E) can be computed in
O(sort(|V|) + sort(|E|) log(|V|B / |E|)) I/Os.
9
Minimum Spanning Tree (MST)
Can push same ideas to work on MSTs:
Theorem: A MST of a graph G = (V,E) can be
computed in O(sort(|V|) + sort(|E|) log (|V|/M)) I/Os.
Theorem: A MST of a graph G = (V,E) can be found
in O(sort(|V|) + sort(|E|) log(|V|B / |E|)) I/Os.
10
Three Techniques for Graph Algs
• Time-forward processing:
– Express graph problems as evaluation problems of
DAGs
• Graph contraction:
– Reduce the size of G while maintaining the properties of
interest
– Solve problem recursively on compressed graph
– Construct solution for G from solution for compressed
graph
• Bootstrapping:
– Switch to generally less efficient algorithm as soon as
(part of the) input is small enough
11
Cache Oblivious
Algorithms
Typical Cache Configuration
13
Cache Oblivious Model
Introduced by Frigo, Leiserson, Prokop &
Ramachandran [FLPR99, Pro99].
Its principal idea is simple: design external-memory
algorithms without knowing B and M (internal
details of the hierarchical memory)
But this simple idea has several surprisingly
powerful consequences.
14
Consequences of Cache Oblivious
If cache-oblivious alg. performs well between two levels
of the memory hierarchy, then it must automatically
work well between any two adjacent levels of memory
hierarchy.
Self-tuning: a cache-oblivious algorithm should work
well on all machines without modification (still subject
to some tuning, e.g., where to trim base case of
recursion)
In contrast to external-memory model, algs in the
cache-oblivious model cannot explicitly manage the
cache
15
Assumptions of Cache Oblivious
How can we design algs that minimize number of block
transfers if we do not know the page-replacement
strategy?
An adversarial page replacement strategy could always
evict next block that will be accessed…
Cache oblivious model assumes an ideal cache: page
replacement is optimal, and cache is fully associative.
16
Assumptions of Cache Oblivious
Optimal Page Replacement:
Page replacement strategy knows the future and always
evicts page that will be accessed farthest in future.
Real-world caches do not know the future, and employ
more realistic page replacement strategies such as evicting
the least-recently-used block (LRU) or evicting the oldest
block (FIFO).
17
Assumptions of Cache Oblivious
Full Associativity
Any block can be stored anywhere in cache.
In contrast, most caches have limited associativity: each
block belongs to a cluster and at most some small constant c
of blocks from a common cluster can be stored in cache
at once.
Typical real-world caches are either direct-mapped (c = 1)
or 2-way associative (c = 2). Some caches have more
associativity—4-way or 8-way—but constant c is certainly
limited
18
Justification of Ideal Cache
Frigo et al. [FLPR99,Pro99] justify the ideal-cache model by
a collection of reductions that modify an ideal-cache alg to
operate on a more realistic cache model.
Running time of the alg. degrades somewhat, but in most
cases by only a constant factor.
Will outline major steps, without going into the details of the
proofs.
19
Justification of Ideal Cache
Replacement Strategy:
The first reduction removes optimal (omniscient) replacement
strategy that uses information about future requests.
Lemma [FLPR99]. If an alg makes T memory transfers on
cache of size M/2 with optimal replacement, then it makes at
most 2T memory transfers on cache of size M with LRU or
FIFO replacement (and same block size B).
I.e., LRU and FIFO do just as well as optimal replacement up
to constant factors of memory transfers and wastage of the
cache. This competitiveness property of LRU and FIFO goes
back to a 1985 paper of Sleator and Tarjan.
20
Another Assumption: Tall Cache
Commonly assumed that cache taller than wide, i.e., number
of blocks, M/B, larger than size of each block, B:
M = Ω(B²)
Particularly important in more sophisticated cache-oblivious
algs: ensures that cache provides polynomially large
“buffer” for guessing block size slightly wrong.
Also commonly assumed in external-memory algorithms.
22
Ideal Cache Oblivious Model
Focus on two levels:
Level 1 has size M
Level 2 transfers blocks of
size B.
Algorithm designer does
not need to know
parameters M and B
explicitly
Sometimes, tall cache
assumption: M = Ω(B²),
usually true in practice.
23
(Easy) Cache Oblivious Algs
Scanning N elements stored in a contiguous segment of
memory costs at most ⌈N/B⌉ + 1 memory transfers.
Reversing an array has the same cost as scanning.
24
Matrix Transposition
for (int i = 0; i < N; i++)
    for (int j = i + 1; j < N; j++)
        swap(A[i][j], A[j][i]);   // column access A[j][i] touches a new block almost every time
How many cache misses?
O(N²) in the worst case.
How to improve this?
Recursion (divide & conquer) may be helpful.
25
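One way to realize the divide & conquer (my own sketch over a flat row-major array; the base-case cutoffs 16 and 256 are arbitrary tuning constants, not values from the slides): recurse on the two diagonal quadrants and swap the off-diagonal pair of quadrants, always splitting along the longer side.

#include <algorithm>
#include <cstddef>
#include <vector>
using namespace std;

// Swap the block A[r0..r1) x [c0..c1) with its mirror A[c0..c1) x [r0..r1),
// transposing both (the two blocks are disjoint, strictly off-diagonal).
void swap_blocks(vector<int>& A, size_t N, size_t r0, size_t r1, size_t c0, size_t c1) {
    if ((r1 - r0) * (c1 - c0) <= 256) {                 // small block: plain loops
        for (size_t i = r0; i < r1; ++i)
            for (size_t j = c0; j < c1; ++j)
                swap(A[i * N + j], A[j * N + i]);
        return;
    }
    if (r1 - r0 >= c1 - c0) {                           // split the longer dimension
        size_t rm = (r0 + r1) / 2;
        swap_blocks(A, N, r0, rm, c0, c1);
        swap_blocks(A, N, rm, r1, c0, c1);
    } else {
        size_t cm = (c0 + c1) / 2;
        swap_blocks(A, N, r0, r1, c0, cm);
        swap_blocks(A, N, r0, r1, cm, c1);
    }
}

// In-place transpose of the square block A[lo..hi) x [lo..hi).
void transpose_rec(vector<int>& A, size_t N, size_t lo, size_t hi) {
    size_t n = hi - lo;
    if (n <= 16) {                                      // base case: simple loops
        for (size_t i = lo; i < hi; ++i)
            for (size_t j = i + 1; j < hi; ++j)
                swap(A[i * N + j], A[j * N + i]);
        return;
    }
    size_t mid = lo + n / 2;
    transpose_rec(A, N, lo, mid);                       // top-left quadrant
    transpose_rec(A, N, mid, hi);                       // bottom-right quadrant
    swap_blocks(A, N, lo, mid, mid, hi);                // off-diagonal quadrant pair
}

// Call transpose_rec(A, N, 0, N) for an N x N matrix stored row-major in A.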
Cache Oblivious Matrix Transposition
[Figure: the current submatrix, with corners (x, y) and (x+dx, y+dy), split at the middle column xmid.]
Which problem must be solved recursively?
26
Cache Oblivious Matrix Transposition
O(N²/B) cache misses
27
Rough Experiments
[Plots: naive vs. recursive transposition on an Athlon 1 GHz, 512 MB RAM, Linux]
28
Stop Recursion Earlier
Stop recursion when problem size becomes less than a certain
block size and use simple for loop implementation inside block.
Using different block sizes seems to have little effect on running time.
29
Why Does Divide & Conquer Work?
Divide & conquer repeatedly refines problem size.
Eventually, problem will fit in cache (size ≤ M), and
later will fit in single block (size ≤ B).
For divide & conquer recursion dominated by leaf
costs, algorithm will usually use within a constant
factor of the optimal number of memory transfers.
If divide and merge can be done using few memory
transfers, then divide & conquer approach efficient
even when cost not dominated by leaves.
30
Divide & Conquer OK: Selection
Median and Selection: find k-th item in unsorted sequence
Classical (internal memory) algorithm [Blum et al]:
Recurrence on running time T(N) is:
T(N) = T(N/5) + T(7N/10) + O(N) = O(N)
31
Cache Oblivious Implementation
Step 1 conceptual; do nothing
Step 2 in two parallel scans:
one reads array 5 items at a time,
other writes new array of computed medians.
Assuming M ≥ 2B, that’s O(1 + N/B) memory transfers.
Step 3 recursive call of size N/5.
Step 4 in three parallel scans:
one reads array,
two others write partitioned arrays.
Again, parallel scans use O(1 + N/B) memory transfers (M ≥ 3B)
Step 5 recursive call of size at most 7N/10
Recurrence on memory transfers T(N) is:
T(N) = T(N/5) + T(7N/10) + O(1 + N/B)
32
Failed Attempt in the Analysis
Recurrence on memory transfer T(N) is:
T(N) = T(N/5) + T(7N/10) + O(1 + N/B)
Wish to prove O(1 + N/B) memory transfers
If T(O(1)) = O(1), each leaf incurs a constant number of
memory transfers.
How many leaves does the recurrence tree have?
L(N) total number of leaves:
L(N) = L(N/5) + L(7N/10)
If L(N) = N^c, then (1/5)^c + (7/10)^c = 1, i.e., c ≈ 0.8397803
But then T(N) is Ω(N^c), which is still larger than O(1 + N/B)
(when B ≤ N ≤ B^{1/(1−c)} ≈ B^6.24)
33
Refined Analysis
Recurrence on memory transfer T(N) is:
T(N) = T(N/5) + T(7N/10) + O(1 + N/B)
Luckily, can use base case stronger than T(O(1)) = O(1):
T(O(B)) = O(1)
(once problem fits into O(1) blocks, all 5 steps incur only
constant number of memory transfers)
Stop recursion at size O(B): then there are only (N/B)^c leaves in
the recursion tree, which cost only O((N/B)^c) = o(N/B) memory
transfers. Thus cost per level decreases geometrically from
root, so total cost is cost of root: O(1 + N/B).
34
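Spelling out the recursion-tree sum behind this slide (my own expansion): the two recursive calls together touch at most N/5 + 7N/10 = 9N/10 elements, and since every internal subproblem now has size Ω(B), the cost of level i is O((N/B)(9/10)^i); adding the (N/B)^c leaves gives

T(N)  ≤  Σ_{i ≥ 0} O(N/B)·(9/10)^i + O((N/B)^c)  =  O(1 + N/B),    with c ≈ 0.84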
Cache Oblivious Implementation
Theorem. The worst-case linear-time median algorithm,
implemented with appropriate scans, uses O(1 + N/B)
memory transfers, provided M ≥ 3B.
Key part of analysis is to identify relevant base case, so that
“overhead term” does not dominate cost for small problem
sizes relative to cache.
Other than the new base case, analysis is same as classic
(internal memory) algorithm.
35
Divide & Conquer KO: Binary Search
Binary search has the following recurrence:
T(N) = T(N/2) + O(1)
Cost of leaves balance with cost of root: cost of every level is the
same, so extra log N factor
Hope to reduce log N factor in a blocked setting by using
stronger base case T(O(B)) = O(1)
However, stronger base case does not help much: only reduce
number of levels in the recursion tree by an additive Θ(log B)
In this case, solution to recurrence becomes:
T(N) = log N − Θ(log B)
Will see later how to get O(logB N) with a different layout
than the sorted one
36
Matrix Multiplication
Wish to compute C = A · B. For sake of simplicity, square
matrices whose dimensions are powers of two (this is w.l.o.g)
Trivial alg.: For each cij, scan in parallel row i of A and column j
of B. Ideally, A stored in row-major and B in column-major order.
Then each element of C requires ≤ O(1 + N/B) memory transfers,
if M ≥ 3B. Cost could only be smaller if M large enough to store
previously visited row or column. If M ≥ N, relevant row of A
remembered for an entire row of C. But for column of B to be
remembered, M ≥ N², in which case entire problem fits in cache.
Theorem. Assume A stored in row-major and B in column-major
order. Then trivial matrix-multiplication uses O(N² + N³/B)
memory transfers if 3B ≤ M < N² and O(1 + N²/B) memory
transfers if M ≥ 3N².
37
Matrix Multiplication
Point of theorem is that, even with ideal storage order of
A and B, trivial algorithm still requires O(N³/B)
memory transfers unless entire problem fits in cache.
Can do better, and achieve running time of
O(N²/B + N³/(B√M)).
In external-memory, this bound first achieved by Hong
and Kung [HK81]
Cache-oblivious solution uses same idea as external-memory solution: block matrices.
38
Matrix Multiplication
Can write C = A · B as a divide-and-conquer recursion
using block-matrix notation:
C11 = A11·B11 + A12·B21    C12 = A11·B12 + A12·B22
C21 = A21·B11 + A22·B21    C22 = A21·B12 + A22·B22
This way, reduce N · N multiplication problem down to
eight (N/2) · (N/2) multiplication subproblems, plus four
(N/2) · (N/2) addition subproblems (which can be solved
by single scan in O(1 + N²/B) memory transfers).
Thus, we get following recurrence:
T(N) = 8 T(N/2) + O(1 + N²/B)
39
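A sketch of this recursion in code (my own; it assumes n is a power of two, keeps the matrices in ordinary row-major order with row stride ld rather than the recursive layout of the next slide, and expects C to be zero-initialized by the caller):

#include <cstddef>

// C += A * B for n x n blocks inside row-major matrices with row stride ld.
void matmul_rec(const double* A, const double* B, double* C, size_t n, size_t ld) {
    if (n <= 32) {                                      // base case: plain triple loop
        for (size_t i = 0; i < n; ++i)
            for (size_t k = 0; k < n; ++k)
                for (size_t j = 0; j < n; ++j)
                    C[i * ld + j] += A[i * ld + k] * B[k * ld + j];
        return;
    }
    size_t h = n / 2;                                   // quadrant size
    const double *A11 = A, *A12 = A + h, *A21 = A + h * ld, *A22 = A + h * ld + h;
    const double *B11 = B, *B12 = B + h, *B21 = B + h * ld, *B22 = B + h * ld + h;
    double       *C11 = C, *C12 = C + h, *C21 = C + h * ld, *C22 = C + h * ld + h;
    // eight (n/2) x (n/2) multiplications, accumulated into the quadrants of C
    matmul_rec(A11, B11, C11, h, ld);  matmul_rec(A12, B21, C11, h, ld);
    matmul_rec(A11, B12, C12, h, ld);  matmul_rec(A12, B22, C12, h, ld);
    matmul_rec(A21, B11, C21, h, ld);  matmul_rec(A22, B21, C21, h, ld);
    matmul_rec(A21, B12, C22, h, ld);  matmul_rec(A22, B22, C22, h, ld);
}

// Full multiplication: matmul_rec(A, B, C, N, N) with C initially all zeros.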
Matrix Layout
To make small matrix blocks fit into blocks or main
memory, matrix not stored in row-major or column-major
order, but rather in recursive layout.
Each matrix A laid out so that each of the blocks A11, A12, A21, A22
occupies a consecutive segment of memory, and these four
segments stored together in arbitrary order.
40
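For illustration, one possible index function for such a layout (my own sketch; it fixes the quadrant order A11, A12, A21, A22, whereas the slide leaves the order arbitrary, and assumes N is a power of two):

#include <cstddef>

// Position of element (i, j) of an N x N matrix in the recursive layout:
// each quadrant occupies a contiguous segment of (N/2)*(N/2) entries, and the
// four segments are stored in the order A11, A12, A21, A22.
size_t rec_index(size_t i, size_t j, size_t N) {
    if (N == 1) return 0;
    size_t h = N / 2;
    size_t quad = (i >= h ? 2 : 0) + (j >= h ? 1 : 0);  // which quadrant (i, j) falls in
    return quad * h * h + rec_index(i % h, j % h, h);
}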
Base Case
Base case becomes trickier, as both B and M
relevant.
Certainly, T(O(√B)) = O(1), because an O(√B) ×
O(√B) submatrix fits in a constant number of
blocks. But this base case turns out to be irrelevant.
More interesting is T(c√M) = O(M/B), where
constant c chosen so that three c√M × c√M
submatrices fit in cache, and hence each block is
read or written at most once.
41
Analysis
Recurrence is T(N) = 8 T(N/2) + O(1 + N²/B)
Stronger base case T(c√M) = O(M/B).
At level i of recurrence tree:
8^i nodes, matrix dimension is N/2^i
total cost 8^i · O(N²/(2^{2i} B)) = 2^i · O(N²/B)
Recursion stops when N/2^i = c√M, i.e., at level L = O(log(N/√M))
Total cost is
Σ_{i=0}^{L} 2^i · O(N²/B) = (2^{L+1} − 1) · O(N²/B) = O(N²/B) + O(N³/(B√M))
(That’s divide-merge cost at root plus total leaf cost).
Divide/merge cost at root of the recursion tree is O(N²/B). These
two costs balance when N = Θ(√M), when depth of tree is O(1).
42
Matrix Multiplication
[Plot: trivial vs. blocked cache-oblivious matrix multiplication.
Trivial: O(N³/B); cache-oblivious: O(N²/B + N³/(B√M))]
43
Static Searching
44
Cache Oblivious Searching
Divide and conquer on the tree layout
(same recursive idea as the van Emde Boas
O(log log U) priority queue)
Split tree at middle level, resulting
in one top tree and ≈ √N bottom
subtrees, each of size ≈ √N
Recursively lay out top subtree
followed by bottom subtrees
45
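A sketch of the recursive layout (my own code; for simplicity it splits the height as ⌊h/2⌋ / ⌈h/2⌉ instead of the power-of-two rounding described on the next slide, and takes the tree in ordinary BFS/heap numbering as input):

#include <vector>
using namespace std;

// Append the keys of the complete subtree rooted at node `root` (height h,
// heap numbering: children of node i are 2i and 2i+1) to `out` in van Emde
// Boas order: the top recursive subtree first, then its bottom subtrees
// from left to right.
void veb_layout(size_t root, int h, const vector<int>& bfs, vector<int>& out) {
    if (h == 1) { out.push_back(bfs[root]); return; }
    int top_h = h / 2;                            // height of the top recursive subtree
    int bot_h = h - top_h;                        // height of each bottom subtree
    veb_layout(root, top_h, bfs, out);            // lay out the top subtree
    size_t first = root << top_h;                 // leftmost root just below the top subtree
    for (size_t k = 0; k < (size_t(1) << top_h); ++k)
        veb_layout(first + k, bot_h, bfs, out);   // then each bottom subtree

}

// Usage: for a complete tree of height h (2^h - 1 nodes) with keys
// bfs[1..2^h - 1] in heap order, call veb_layout(1, h, bfs, out).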
Cache Oblivious Searching
If height not power of 2, each split rounds so that
bottom subtrees have heights power of 2:
46
CO Searching
• Recursively split tree (cut at middle level) until every recursive
subtree has size at most B (or small enough to fit into cache line)
• Each recursive subtree stores an interval of memory of size at
most B, so occupies at most two blocks.
• Each recursive subtree except topmost has same height.
• Since trees are cut at middle level in each step, this height may
be as small as (log B)/2, for subtree of size Θ(√B), but no
smaller.
47
CO Searching
O(log_B N)
cache misses
• Search visits nodes along root-to-leaf path of length log N ,
visiting sequence of recursive subtrees along the way.
• All but first recursive subtree has height at least (log B)/2, so
number of visited recursive subtrees is
≤ 1 + 2(log N)/(log B) = 1 + 2 log_B N.
• Each recursive subtree may incur up to two memory transfers,
for a total of ≤ (2 + 4 log_B N) memory transfers.
• Faster than trivial search by (log₂ N)/(4 log_B N) = (log₂ B)/4
• (log₂ B)/2 more realistic (each recursive subtree in a single block)
• For disk blocks of 1024 elements, expect speedup ≈ 5 (or ≈ 2.5 with two blocks per subtree)
48
Experiments on CO Searching
[Plot: search times with 256-byte tree nodes]
49
Resilient Algorithms and
Data Structures
Memory Errors
Memory error: one or multiple bits read differently from
how they were last written.
Many possible causes:
• electrical or magnetic interference (cosmic rays)
• hardware problems (bit permanently damaged)
• corruption in data path between memories and
processing units
Errors in DRAM devices have been a concern for a long time
[May & Woods 79, Ziegler et al 79, Chen & Hsiao 84, Normand 96,
O’Gorman et al 96, Mukherjee et al 05, … ]
51
Memory Errors
Soft Errors:
Randomly corrupt bits, but do not leave any physical
damage --- cosmic rays
Hard Errors:
Corrupt bits in a repeatable manner because of a
physical defect (e.g., stuck bits) --- hardware problems
52
Error Correcting Codes (ECC)
Error correcting codes (ECC) allow detection and
correction of one or multiple bit errors
Typical ECC is SECDED (i.e., single error correct,
double error detect)
Chip-Kill can correct up to 4 adjacent bits at once
ECC has several overheads in terms of performance
(33%), size (20%) and money (10%).
ECC memory chips are mostly used in memory systems
for server machines rather than for client computers
53
Impact of Memory Errors
Consequence of a memory error is system dependent
1. Correctable errors : fixed by ECC
2. Uncorrectable errors :
2.1. Detected : Explicit failure (e.g., a machine reboot)
2.2. Undetected :
2.2.1. Induced failure (e.g., a kernel panic)
2.2.2. Unnoticed (but application corrupted,
e.g., segmentation fault, file not found,
file not readable, … )
54
How Common are Memory Errors?
[Schroeder et al 2009] experiments 2.5 years (Jan 06 – Jun 08)
on Google fleet (~10^4 machines, ECC memory)
Memory errors are NOT rare events!
57
Memory Errors
Not all machines (clients) have ECC memory chips.
Increased demand for larger capacities at low cost
just makes the problem more serious – large clusters
of inexpensive memories
Need of reliable computation in the presence of
memory faults
58
Memory Errors
Other scenarios in which memory errors have impact (and
seem to be modeled in an adversarial setting):
• Memory errors can cause security vulnerabilities:
Fault-based cryptanalysis [Boneh et al 97, Xu et al 01, Bloemer & Seifert 03]
Attacking Java Virtual Machines [Govindavajhala & Appel 03]
Breaking smart cards [Skorobogatov & Anderson 02, Bar-El et al 06]
• Avionics and space electronic systems:
Amount of cosmic rays increases with altitude (soft errors)
59
Memory Errors in Space
60
Recap on Memory Errors
I’m thinking of getting back into crime, Luigi.
Legitimate business is too corrupt…
1. Memory errors can be harmful: uncorrectable
memory errors cause some catastrophic event (reboot,
kernel panic, data corruption, …)
63
A small example
Classical algorithms may not be correct in the
presence of (even very few) memory errors
An example: merging two ordered lists
A = 1 2 3 4 5 6 7 8 9 10, with the first key corrupted: 1 → 80
B = 11 12 13 14 15 16 17 18 19 20
Out = 11 12 13 … 20 80 2 3 4 … 9 10
A single corrupted key displaces Θ(n) faithful keys, producing Θ(n²)
inversions in the output
64
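The failure above is easy to reproduce with a textbook merge (a small illustration of my own, not code from the paper): the single corrupted key 80 at the front of A blocks the A-pointer until all of B has been output.

#include <cstdio>
#include <vector>
using namespace std;

int main() {
    vector<int> A = {80, 2, 3, 4, 5, 6, 7, 8, 9, 10};            // first key corrupted: 1 -> 80
    vector<int> B = {11, 12, 13, 14, 15, 16, 17, 18, 19, 20};
    vector<int> out;
    size_t i = 0, j = 0;
    while (i < A.size() && j < B.size())                          // textbook two-way merge,
        out.push_back(A[i] <= B[j] ? A[i++] : B[j++]);            // unaware of the corruption
    while (i < A.size()) out.push_back(A[i++]);
    while (j < B.size()) out.push_back(B[j++]);
    for (int x : out) printf("%d ", x);                           // 11 12 ... 20 80 2 3 ... 10
    printf("\n");                          // Theta(n^2) inversions among faithful keys
}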
Recap on Memory Errors
I know my PIN number:
it’s my name I can’t remember…
2. Memory errors are NOT rare: even a small cluster
of computers with few GB per node can experience
one bit error every few minutes.
65
Memory Errors
In the field study, Google researchers observed mean error
rates of 2,000 – 6,000 errors per GB per year (25,000 – 75,000
FIT/Mbit)

Mem. size    Mean Time Between Failures
512 MB       2.92 hours
1 GB         1.46 hours
16 GB        5.48 minutes
64 GB        1.37 minutes
1 TB         5.13 seconds
66
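A quick sanity check of the table (my own arithmetic, using the upper end of the reported rate, about 6,000 errors per GB per year):

MTBF(1 GB) ≈ (8760 hours/year) / (6000 errors/year) ≈ 1.46 hours

and the other rows scale inversely with memory size (e.g., 1 TB = 1024 GB gives 1.46 h / 1024 ≈ 5.1 seconds).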
Recap on Memory Errors
3. ECC may not be available (or may not be enough):
No ECC in inexpensive memories.
ECC does not guarantee complete fault coverage; expensive;
system halt upon detection of uncorrectable errors;
service disruption; etc…
67
Resilient Algorithms and Data Structures
Make sure that the algorithms and data structures we
design are capable of dealing with memory errors
Resilient Algorithms and Data Structures:
Capable of tolerating memory errors on data (even
throughout their execution) without sacrificing
correctness, performance and storage space
69
Faulty-Memory Model [Finocchi, I. 04]
• Memory fault = the correct data stored in a memory
location gets altered (destructive faults)
• Faults can appear at any time, in any memory location,
simultaneously
• Wish to produce correct output on
uncorrupted data (in an adversarial model)
• Assumptions:
– Only O(1) words of reliable memory (safe memory)
– Corrupted values indistinguishable from correct ones
• Even recursion may be problematic in this model.
70
Terminology
d = upper bound known on the number of memory
errors (may be function of n)
a = actual number of memory errors
(happen during specific execution)
Note: typically a ≤ d
All the algorithms / data structure described here need to
know d in advance
71
Other Faulty Models
Design of fault-tolerant alg’s received attention for 50+ years
Liar Model [Ulam 77, Renyi 76,…]
Comparison questions answered by a possibly lying
adversary. Can exploit query replication strategies.
Fault-tolerant sorting networks [Assaf Upfal 91, Yao Yao 85,…]
Comparators can be faulty. Exploit substantial data
replication using fault-free data replicators.
Parallel Computations [Huang et al 84, Chlebus et al 94, …]
Faults on parallel/distributed architectures: PRAM or DMM
simulations (rely on fault-detection mechanisms)
72
Other Faulty Models
Memory Checkers [Blum et al 93, Blum et al 95, …]
Programs not reliable objects: self-testing and self-correction.
Essential error detection and error correction mechanisms.
Robustness in Computational Geometry [Schirra 00, …]
Faults from unreliable computation (geometric precision)
rather than from memory errors
Noisy / Unreliable Computation [Braverman Mossel 08]
Faults (with given probability) from unreliable primitives
(e.g., comparisons) rather than from memory errors
………………………………………
73
Outline of the Talk
1. Motivation and Model
2. Resilient Algorithms:
• Sorting and Searching
3. Resilient Data Structures
• Priority Queues
• Dictionaries
4. (Ongoing) Experimental Results
5. Conclusions and Open Problems
74
Resilient Sorting
We are given a set of n keys that need to be sorted
Value of some keys may get arbitrarily corrupted
We cannot tell which is faithful and which is corrupted
Q1. Can we efficiently sort the correct values in the presence of
memory errors?
Q2. How many memory errors can we tolerate in the worst
case if we wish to maintain optimal time and space?
75
Terminology
• Faithful key = never corrupted
• Faulty key = corrupted
• Faithfully ordered sequence = ordered except for
corrupted keys
1 2 3 4 5 6 80 7 8 9 10        (faithfully ordered)
• Resilient sorting algorithm = produces a faithfully
ordered sequence (i.e., wish to sort correctly all the
uncorrupted keys)
76
Trivially Resilient
Resilient variable: consists of (2d+1) copies
x_1, x_2, …, x_{2d+1} of a standard variable x
Value of resilient variable given by majority of its copies:
• cannot be corrupted by faults
• can be computed in linear time and constant space
[Boyer Moore 91]
Trivially-resilient algorithms and data structures
have Θ(d) multiplicative overheads in terms of time
and space
Note: Trivially-resilient does more than ECC
(SECDED, Chip-Kill, ….)
77
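The majority value of the 2d+1 copies can be recovered in a single scan with O(1) safe memory using the [Boyer Moore 91] voting scheme cited above; a sketch (my own code):

#include <vector>
using namespace std;

// Read a resilient variable stored as 2d+1 copies, at most d of which are
// corrupted, so the value originally written is still a strict majority.
// Boyer-Moore majority vote: one scan, O(1) reliable (safe) memory.
int resilient_read(const vector<int>& copies) {
    int candidate = 0, count = 0;          // these two words live in safe memory
    for (int x : copies) {
        if (count == 0) { candidate = x; count = 1; }
        else if (x == candidate) ++count;
        else --count;
    }
    return candidate;                      // the majority, i.e., the written value
}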
Trivially Resilient Sorting
Can trivially sort in O(d n log n) time during d
memory errors
O(n log n) sorting algorithm able to tolerate only
O(1) memory errors
78
Resilient Sorting
Upper Bound [Finocchi, Grandoni, I. 05]:
Comparison-based sorting algorithm that takes
O(n log n + d2) time to run during d memory errors
O(n log n) sorting algorithm able to tolerate up
to O((n log n)^{1/2}) memory errors
Lower Bound [Finocchi, I. 04]:
Any comparison-based resilient O(n log n) sorting
algorithm can tolerate the corruption of at most
O((n log n)^{1/2}) keys
79
Resilient Sorting (cont.)
Integer Sorting [Finocchi, Grandoni, I. 05]:
Randomized integer sorting algorithm that takes
O(n + d2) time to run during d memory errors
O(n) randomized integer sorting algorithm able
to tolerate up to O(n^{1/2}) memory errors
80
Resilient Binary Search
1 2 80 3 4 5 10 7 8 9 13 20 26
search(5) = false  (binary search misdirected by the corrupted key 80)
Wish to get correct answers at least on correct keys:
search(s) either finds a key equal to s,
or determines that no correct key is equal to s
If only faulty keys are equal to s, answer uninteresting
(cannot hope to get trustworthy answer)
81
Trivially Resilient Binary Search
Can search in O(d log n) time during d memory errors
82
Resilient Searching
Upper Bounds :
Randomized algorithm with O(log n + d) expected time
[Finocchi, Grandoni, I. 05]
Deterministic algorithm with O(log n + d) time
[Brodal et al. 07]
Lower Bounds :
Ω(log n + d) lower bound (deterministic)
[Finocchi, I. 04]
Ω(log n + d) lower bound on expected time
[Finocchi, Grandoni, I. 05]
83
Outline of the Talk
1. Motivation and Model
2. Resilient Algorithms:
• Sorting and Searching
3. Resilient Data Structures
• Priority Queues
• Dictionaries
4. (Ongoing) Experimental Results
5. Conclusions and Open Problems
84
Resilient Data Structures
Data structures more vulnerable to memory errors
than algorithms:
Algorithms affected by errors during execution
Data structures affected by errors in lifetime
85
Resilient Priority Queues
Maintain a set of elements under insert and deletemin
insert adds an element
deletemin deletes and returns either the minimum
uncorrupted value or a corrupted value
Consistent with resilient sorting
86
Resilient Priority Queues
Upper Bound :
Both insert and deletemin can be implemented in
O(log n + d) time
[Jorgensen et al. 07]
(based on cache-oblivious priority queues)
Lower Bound :
A resilient priority queue with n > d elements must use
Ω(log n + d) comparisons to answer an insert followed
by a deletemin
[Jorgensen et al. 07]
87
Resilient Dictionaries
Maintain a set of elements under insert, delete
and search
insert and delete as usual, search as in
resilient searching:
search(s) either finds a key equal to s,
or determines that no correct key is equal to s
Again, consistent with resilient sorting
88
Resilient Dictionaries
Randomized resilient dictionary implements each
operation in O(log n + d) time
[Brodal et al. 07]
More complicated deterministic resilient dictionary
implements each operation in O(log n + d) time
[Brodal et al. 07]
89
Resilient Dictionaries
Pointer-based data structures
Faults on pointers likely to be more problematic
than faults on keys
Randomized resilient dictionaries of Brodal et al.
built on top of traditional (non-resilient) dictionaries
Our implementation built on top of AVL trees
90
Outline of the Talk
1. Motivation and Model
2. Resilient Algorithms:
• Sorting and Searching
3. Resilient Data Structures
• Priority Queues
• Dictionaries
4. (Ongoing) Experimental Results
5. Conclusions and Open Problems
91
Experimental Framework
Algorithm / Data Structure:
Non-Resilient        O(f(n))
Trivially Resilient  O(d · f(n))
Resilient            O(f(n) + g(d))
Resilient sorting from [Ferraro-Petrillo et al. 09]
Resilient dictionaries from [Ferraro-Petrillo et al. 10]
Implemented resilient binary search and heaps
Implementations of resilient sorting and dictionaries
more engineered than resilient binary search and heaps
92
Experimental Platform
• 2 CPUs Intel Quad-Core Xeon E5520 @ 2.26 GHz
• L1 cache 256 KB, L2 cache 1 MB, L3 cache 8 MB
• 48 GB RAM
• Scientific Linux release with Linux kernel 2.6.18-164
• gcc 4.1.2, optimization flag –O3
93
Fault Injection
This talk: Only random faults
Preliminary experiments (not here): error
rates depend on memory usage and time.
Algorithm / data structure and fault injection
implemented as separate threads
(Run on different CPUs)
94
Resiliency: Why should we care?
What’s the impact of memory errors?
Try to analyze impact of errors on mergesort, priority queues
and dictionaries using a common framework (sorting)
Attempt to measure error propagation: try to estimate how
far the output sequence is from being sorted (because of
memory errors)
Heapsort implemented on array. For coherence, in AVLSort
we do not induce faults on pointers
We’ll measure faults on AVL pointers in separate
experiment
95
Error Propagation
• k-unordered sequence = faithfully ordered except for k
(correct) keys
1 80 2 3 4 9 5 7 8 6 10        (2-unordered)
• k-unordered sorting algorithm = produces a k-unordered
sequence, i.e., it faithfully sorts all but k correct keys
• Resilient = 0-unordered, i.e., it faithfully sorts all
correct keys
96
The Importance of Being Resilient
n = 5,000,000;
0.01% (random) errors in input → 0.13% errors in output
0.02% (random) errors in input → 0.22% errors in output
97
The Importance of Being Resilient
n = 5,000,000;
0.01% (random) errors in input → 0.40% errors in output
0.02% (random) errors in input → 0.47% errors in output
98
The Importance of Being Resilient
n = 5,000,000;
0.01% (random) errors in input → 68.20% errors in output
0.02% (random) errors in input → 79.62% errors in output
99
The Importance of Being Resilient
100
Error Amplification
Mergesort
0.002-0.02% (random) errors in input → 24.50-79.51% errors in output!!!
AVLsort
0.002-0.02% (random) errors in input → 0.39-0.47% errors in output
Heapsort
0.002-0.02% (random) errors in input → 0.01-0.22% errors in output
They all show some error amplification.
Large variations likely to depend on data organization
Note: Those are errors on keys. Errors on pointers are more
dramatic for pointer-based data structures
101
The Importance of Being Resilient
AVL with n = 5,000,000; a errors on memory used (keys,
parent pointers, pointers, etc…)
100,000 searches; around a searches fail: on average, able
to complete only about (100,000/a) searches before crashing
102
Isn’t Trivial Resiliency Enough?
Memory errors are a problem
Do we need to tackle it with new algorithms / data
structures?
Aren’t simple-minded approaches enough?
103
Isn’t Trivial Resiliency Enough?
d = 1024
104
Isn’t Trivial Resiliency Enough?
d = 1024
100,000 random searches
105
Isn’t Trivial Resiliency Enough?
d = 512
100,000 random ops
106
Isn’t Trivial Resiliency Enough?
d = 1024
100,000 random ops
no errors on pointers
107
Isn’t Trivial Resiliency Enough?
All experiments for 10^5 ≤ n ≤ 5·10^5, d = 1024, unless specified
otherwise
Mergesort
Trivially resilient about 100-200X slower than non-resilient
Binary Search
Trivially resilient about 200-300X slower than non-resilient
Dictionaries
Trivially resilient AVL about 300X slower than non-resilient
Heaps
Trivially resilient about 1000X slower than non-resilient (d = 512)
[deletemin are not random and slow]
108
Performance of Resilient Algorithms
Memory errors are a problem
Trivial approaches produce slow algorithms /
data structures
Need non-trivial (hopefully fast) approaches
How fast can resilient algorithms / data
structures be?
109
Performance of Resilient Algorithms
a = d = 1024
110
Performance of Resilient Algorithms
a = d = 1024
111
Performance of Resilient Algorithms
a = d = 1024
100,000 random search
112
Performance of Resilient Algorithms
a = d = 1024
100,000 random search
113
Performance of Resilient Algorithms
a = d = 512
100,000 random ops
114
Performance of Resilient Algorithms
a = d = 512
100,000 random ops
115
Performance of Resilient Algorithms
a = d = 1024
100,000 random ops
116
Performance of Resilient Algorithms
a = d = 1024
100,000 random ops
117
Performance of Resiliency
All experiments for 10^5 ≤ n ≤ 5·10^5, a = d = 1024, unless specified otherwise
Mergesort
Resilient mergesort about 1.5-2X slower than non-resilient mergesort
[Trivially resilient mergesort about 100-200X slower]
Binary Search
Resilient binary search about 60-80X slower than non-resilient binary search
[Trivially resilient binary search about 200-300X slower]
Heaps
Resilient heaps about 20X slower than non-resilient heaps (a = d = 512)
[Trivially resilient heaps about 1000X slower]
Dictionaries
Resilient AVL about 10-20X slower than non-resilient AVL
[Trivially resilient AVL about 300X slower]
118
Larger Data Sets
How well does the performance of resilient
algorithms / data structures scale to larger
data sets?
Previous experiments: 10^5 ≤ n ≤ 5·10^5
New experiment with n = 5·10^6
(no trivially resilient)
119
Larger Data Sets
n = 5,000,000
120
Larger Data Sets
n = 5,000,000
121
Larger Data Sets
100,000 random search on n =
5,000,000 elements
log₂ n ≈ 22
122
Larger Data Sets
100,000 random search on n
= 5,000,000 elements
123
Larger Data Sets
100,000 random ops on a
heap with n = 5,000,000
log₂ n ≈ 22
124
Larger Data Sets
100,000 random ops on a
heap with n = 5,000,000
125
Larger Data Sets
100,000 random ops on
AVL with n = 5,000,000
log₂ n ≈ 22
126
Larger Data Sets
100,000 random ops on
AVL with n = 5,000,000
127
Larger Data Sets
All experiments for n = 5·10^6
Mergesort [was 1.5-2X for 10^5 ≤ n ≤ 5·10^5]
Resilient mergesort is 1.6-2.3X slower (requires ≤ 0.04% more
space)
Binary Search [was 60-80X for 10^5 ≤ n ≤ 5·10^5]
Resilient search is 100-1000X slower (requires ≤ 0.08% more
space)
Heaps [was 20X for 10^5 ≤ n ≤ 5·10^5]
Resilient heap is 100-1000X slower (requires 100X more space)
Dictionaries [was 10-20X for 10^5 ≤ n ≤ 5·10^5]
Resilient AVL is 6.9-14.6X slower (requires about 1/3 space)
128
Sensitivity to d
How critical is the choice of d ?
Underestimating d (a > d) compromises
resiliency
Overestimating d (a << d) gives some
performance degradation
129
Performance Degradation
a = 32, but algorithm overestimates d = 1024:
Mergesort
Resilient mergesort improves by 9.7% in time and degrades by
0.04% in space
Binary Search
Resilient search degrades to 9.8X in time and by 0.08% in space
Heaps
Resilient heap degrades to 13.1X in time and by 59.28% in
space
Dictionaries
Resilient AVL degrades by 49.71% in time
130
Robustness
Resilient mergesort and dictionaries appear
more robust than resilient search and heaps
I.e., resilient mergesort and dictionaries scale
better with n and are less sensitive to d (so less
vulnerable to bad estimates of d).
How much of this is due to the fact that their
implementations are more engineered?
131
Outline of the Talk
1. Motivation and Model
2. Resilient Algorithms:
• Sorting and Searching
3. Resilient Data Structures
• Priority Queues
• Dictionaries
4. (Ongoing) Experimental Results
5. Conclusions and Open Problems
132
Concluding Remarks
• Need of reliable computation in the presence of
memory errors
• Investigated basic algorithms and data structures in
the faulty memory model: do not wish to detect
/correct errors, only produce correct output on
correct data
• Tight upper and lower bounds in this model
• After first tests, resilient implementations of
algorithms and data structures look promising
133
Future Work and Open Problems
• Lower bounds for resilient integer sorting?
• Full repertoire for resilient priority queues
(delete, decreasekey, increasekey)?
• Resilient graph algorithms?
• Resilient algorithms oblivious to d?
• Better faulty memory model?
• More (faster) implementations, engineering and
experimental analysis?
134
Thank You!
My memory’s terrible these days…
135
Questions & Answers
136
Euler Tour
Given a tree T, and a distinguished vertex r
of T, an Euler tour of T is a traversal of T
that starts and ends at r and traverses every
edge exactly twice, once in each direction.
[Figure: Euler tour of a tree rooted at r]
137
Euler Tour
Formally, every undirected edge {u,v} in T
replaced by two directed edges (u,v) and (v,u)
The tour starts with an edge (r,w)
For every vertex v in T with
incoming edges e_1, e_2, …, e_k and
outgoing edges e’_1, e’_2, …, e’_k,
numbered so that e_i and e’_i have
the same endpoints, edge e_i is
succeeded by edge e’_{(i mod k)+1} in
the Euler tour.
[Figure: a vertex v with neighbours w_1, …, w_4, showing the successor rule at v]
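A sketch of this construction (my own in-memory code; in the external-memory version the sort would be an external sort and the map a scan of a second sorted edge list):

#include <algorithm>
#include <map>
#include <utility>
#include <vector>
using namespace std;

using Arc = pair<int,int>;                                   // directed edge (u, v)

// Build the Euler-tour successor function of a tree given by its edge list.
map<Arc, Arc> euler_successor(const vector<pair<int,int>>& tree_edges) {
    vector<Arc> out;                                         // both directed copies of each edge
    for (auto [u, v] : tree_edges) { out.push_back({u, v}); out.push_back({v, u}); }
    sort(out.begin(), out.end());                            // groups arcs leaving the same vertex
    map<Arc, Arc> succ;
    size_t i = 0;
    while (i < out.size()) {
        size_t j = i;
        while (j < out.size() && out[j].first == out[i].first) ++j;   // arcs (v, w_1), ..., (v, w_k)
        for (size_t t = i; t < j; ++t) {
            auto [v, w] = out[t];
            size_t nxt = (t + 1 < j) ? t + 1 : i;            // the rule e_i -> e'_{(i mod k)+1}
            succ[{w, v}] = {v, out[nxt].second};             // incoming (w_i, v) -> outgoing (v, w_{i+1})
        }
        i = j;
    }
    return succ;
}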
Euler Tour
If we wish to compute the Euler tour as a
list (say because we want to apply list
ranking), we can do that in O(sort(N)) I/Os
139