Uploaded by Phước Nguyên Hoàng

Chapter 7 - Space and Time Trade-Offs - Student

Chapter 7: Space and Time Trade-Offs
Introduction
The space-and-time trade-off strategy uses extra storage to store or
reorganize the input data. Such preprocessing or prestructuring leads to a
faster algorithm.
There are two types of techniques that exploit the space-and-time trade-off:
Input enhancement approach
The idea is to preprocess the problem’s input, in whole or in part, and store the
additional information obtained to accelerate solving the problem afterward.
Example: Distribution counting sort, efficient algorithms in string matching.
Prestructuring approach
The idea is to use extra space to facilitate faster and/or more flexible access to the
data. Specifically, some processing is done before a problem in question is actually
solved but, unlike the input-enhancement approach, it deals with access structuring.
Example: Hashing, B-tree.
Note:
− The two resources — time and space — do not have to compete with each other in
all design situations.
− Dynamic programming is also considered a branch of this strategy.
Input enhancement approach
Distribution counting sort
The basic idea is to count, for each element of a list to be sorted, the total number
of elements smaller than this element and record the results in a table. These numbers
will indicate the positions of the elements in the sorted list.
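Algorithm (Comparison-counting sort - Θ(𝑛²))
CountingSort(a[1 .. n]) {
    count[1 .. n] = 0;
    for (i = 1; i ≤ n - 1; i++)
        for (j = i + 1; j ≤ n; j++)
            if (a[i] < a[j])
                count[j]++;
            else
                count[i]++;
    // count[i] is now the number of elements placed before a[i] in sorted order,
    // so with 1-based arrays a[i] belongs at position count[i] + 1
    for (i = 1; i ≤ n; i++)
        b[count[i] + 1] = a[i];
    a[1 .. n] = b[1 .. n];
}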
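The comparison-counting idea can be sketched in Python as follows. This is only an illustrative version: the function name `comparison_counting_sort` is an assumption, and it uses 0-based lists, so an element with count c goes directly to index c.

```python
def comparison_counting_sort(a):
    """Return a sorted copy of `a` using comparison counting (quadratic time)."""
    n = len(a)
    count = [0] * n
    # Compare every pair once; the "winner" of each comparison moves one slot right.
    for i in range(n - 1):
        for j in range(i + 1, n):
            if a[i] < a[j]:
                count[j] += 1
            else:
                count[i] += 1
    # count[i] is the final 0-based position of a[i] in the sorted list.
    b = [None] * n
    for i in range(n):
        b[count[i]] = a[i]
    return b


print(comparison_counting_sort([62, 31, 84, 96, 19, 47]))
```

The extra `count` and `b` arrays are the space paid for a very simple placement step.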
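Algorithm (Distribution counting sort - Θ(𝑛 + high), i.e., Θ(𝑛) when the range of values is fixed)
DistributionCounting(a[1 .. n]) {
    // every a[i] is assumed to be an integer in the range 0 .. high
    f[0 .. high] = 0;
    for (i = 1; i ≤ n; i++)
        f[a[i]]++;
    for (i = 1; i ≤ high; i++)
        f[i] += f[i - 1];
    // f[v] is now the number of elements ≤ v, i.e., the last position for value v
    for (i = n; i ≥ 1; i--) {
        b[f[a[i]]] = a[i];
        f[a[i]]--;
    }
    a[1 .. n] = b[1 .. n];
}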
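A minimal Python sketch of distribution counting is given below. It assumes the inputs are integers in the range 0..high; the function name and the `high` parameter are illustrative choices, and indices are 0-based, so the counter is decremented before it is used as a position.

```python
def distribution_counting_sort(a, high):
    """Return a sorted copy of `a`, whose values are integers in 0..high."""
    f = [0] * (high + 1)
    for x in a:                      # frequency of each value
        f[x] += 1
    for v in range(1, high + 1):     # cumulative counts: f[v] = number of elements <= v
        f[v] += f[v - 1]
    b = [None] * len(a)
    for x in reversed(a):            # right-to-left scan keeps equal keys in order
        f[x] -= 1
        b[f[x]] = x
    return b


print(distribution_counting_sort([3, 1, 2, 3, 2, 2], high=3))
```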
Horspool’s Algorithm
The problem is finding an occurrence of a given string of 𝑚 characters called the
pattern 𝑃 in a longer string of 𝑛 characters called the text 𝑇.
Idea: Firstly, the algorithm aligns the pattern 𝑃 against the beginning characters of
the text 𝑇. It compares characters of 𝑃 with their counterparts in 𝑇 from right to left,
starting with the last character in 𝑃. If a mismatch occurs, it shifts the pattern to the right
𝑠 position(s).
Note: The value of 𝑠 should be as large as possible without risking missing a
matching substring in the text.
Horspool’s algorithm determines the size of a shift by looking at the character 𝑐 of
𝑇 that is aligned against the last character of 𝑃. To be specific, such shift sizes are
precomputed and stored in a table. The table will be indexed by all possible characters
that can be encountered in 𝑇.
The table’s entries indicate the shift sizes, computed for every character 𝑐 ∈ Σ by the formula:
$$
D[c] =
\begin{cases}
m, & \text{if } c \notin P[1..m-1] \\
\text{the distance from the rightmost } c \text{ among the first } m-1 \text{ characters of } P \text{ to its last character}, & \text{if } c \in P[1..m-1]
\end{cases}
$$
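As a quick illustration of the shift-table formula, the sketch below builds the table for one pattern. It assumes a Python dictionary in place of an array indexed by the whole alphabet: characters missing from the dictionary implicitly get the default shift m. The function name `shift_table` is hypothetical.

```python
def shift_table(pattern):
    """Map each character among the first m-1 characters of `pattern` to its shift."""
    m = len(pattern)
    table = {}
    # Scanning left to right lets later (more rightward) occurrences overwrite earlier ones.
    for i in range(m - 1):
        table[pattern[i]] = m - 1 - i
    return table


table = shift_table("BARBER")
print(table)                        # shifts for B, A, R, E
print(table.get("X", len("BARBER")))  # any other character shifts by m
```

For the pattern BARBER (m = 6), this yields shifts of 2 for B, 4 for A, 3 for R, 1 for E, and 6 for every other character.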
Algorithm
ComputeArray(P[1..m], D, ) {
for (Mỗi ký tự c  )
D[c] = m;
for (i = 1; i ≤ m - 1; i++)
D[P[i]] = m – i;
}
Horspool(P[1..m], T[1..n], ) {
int D[||];
ComputeArray(P, D, );
i = m;
while (i ≤ n) {
k = 0;
while (k < m) && (P[m - k] == T[i - k])
k++;
if (k == m)
print (i – m + 1);
i += D[T[i]];
}
}
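The pseudocode above translates fairly directly into Python. The following is a sketch under a few assumptions: strings are 0-indexed, the shift table is a dictionary with default shift m instead of an array over Σ, and all match positions are collected in a list rather than printed. The name `horspool_search` is illustrative.

```python
def horspool_search(pattern, text):
    """Return the 0-based start positions of all occurrences of `pattern` in `text`."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Shift table: rightmost occurrence among the first m-1 characters wins.
    shift = {pattern[i]: m - 1 - i for i in range(m - 1)}
    matches = []
    i = m - 1                        # text index aligned with the pattern's last character
    while i < n:
        k = 0                        # number of characters matched, right to left
        while k < m and pattern[m - 1 - k] == text[i - k]:
            k += 1
        if k == m:
            matches.append(i - m + 1)
        i += shift.get(text[i], m)   # shift decided by the text character under the last position
    return matches


print(horspool_search("BARBER", "JIM_SAW_ME_IN_A_BARBERSHOP"))
```

Note that the shift always depends on the text character aligned with the last pattern character, regardless of where the mismatch occurred.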
Prestructuring approach
B-trees
The idea of using extra space to facilitate faster access to a given data set is
particularly important if the data set in question contains a very large number of records
that need to be stored on a disk.
Here is the structure of a node (or page) of a B-tree:
$$
[\; p_0 \mid k_1 \mid p_1 \mid k_2 \mid p_2 \mid \cdots \mid k_m \mid p_m \;]
$$
where $k_1, k_2, \ldots, k_m$ are distinct keys such that $k_i < k_{i+1}$ for all $i \in [1, m-1]$, and
$p_0, p_1, \ldots, p_m$ are pointers to the node’s children.
Let’s consider a B-tree in which every node holds 𝑘 = 1000 keys (and therefore has 1001 children):
Level 0 (root): 1 node (1000 keys)
Level 1: 1001 nodes (1000 × 1001 = 1,001,000 keys)
Level 2: 1001 × 1001 = 1,002,001 nodes (1000 × 1,002,001 = 1,002,001,000 keys)
In this case, a B-tree of height 2 may contain more than one billion keys.
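The arithmetic behind this claim can be checked with a few lines of Python. The sketch assumes every node is full (exactly `keys_per_node` keys and `keys_per_node + 1` children) and that the root is at level 0; the function name is hypothetical.

```python
def max_keys(keys_per_node, height):
    """Total number of keys in a completely full tree with levels 0..height."""
    total = 0
    nodes = 1                              # the root level has one node
    for _ in range(height + 1):
        total += nodes * keys_per_node     # keys stored on this level
        nodes *= keys_per_node + 1         # each node has keys_per_node + 1 children
    return total


print(max_keys(1000, 2))   # 1000 + 1,001,000 + 1,002,001,000
```

Summing the three levels gives 1,003,003,000 keys, reachable with at most three node reads.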
Definition: A B-tree of order 𝑡 ≥ 2 with height ℎ > 0 must satisfy the following
characteristics:
1. Every node contains at most 2𝑡 keys.
2. Every node, except the root, contains at least 𝑡 keys.
3. Every node is either a leaf (i.e., it has no descendants) or has 𝑚 + 1 descendants,
where 𝑚 is its number of keys.
4. All leaves appear at the same level.
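To make the four properties concrete, here is a small Python sketch that checks them on an in-memory tree. The `Node` class and the function `leaf_depth_if_valid` are hypothetical helpers introduced only for illustration. The check covers the key-count limits, the ascending order of keys inside a node, the child count, and the equal depth of the leaves; it does not verify the search-tree ordering between a node and its subtrees.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)   # empty list means the node is a leaf


def leaf_depth_if_valid(node, t, is_root=True):
    """Return the common depth of the leaves below `node`, or None if a property fails."""
    m = len(node.keys)
    if m > 2 * t or (not is_root and m < t):        # properties 1 and 2
        return None
    if any(node.keys[i] >= node.keys[i + 1] for i in range(m - 1)):
        return None                                 # keys must be strictly increasing
    if not node.children:
        return 0                                    # a leaf
    if len(node.children) != m + 1:                 # property 3
        return None
    depths = {leaf_depth_if_valid(c, t, False) for c in node.children}
    if len(depths) != 1 or None in depths:          # property 4
        return None
    return depths.pop() + 1


good = Node([10], [Node([1, 5]), Node([20, 30])])
bad = Node([10], [Node([1]), Node([20, 30])])       # a non-root node with fewer than t keys
print(leaf_depth_if_valid(good, t=2), leaf_depth_if_valid(bad, t=2))
```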
The number of disk accesses is the principal indicator of the efficiency of
B-trees and similar data structures. In the worst case, this number is determined by
the height of the tree, since a search reads one node per level.
For any B-tree of order 𝑡 with 𝑛 keys and height ℎ > 0, we have the following
inequality:
$$
h \le 1 + \log_{t+1}\left(\frac{n+1}{2}\right) \in O(\log n)
$$
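A bound of this form can be derived by counting the fewest keys a tree can hold. The derivation below is a sketch under two assumptions that the text does not state explicitly: $h$ counts the levels of the tree (the root is level 1), and $n$ counts keys. In the sparsest tree the root holds 1 key and has 2 children, and every other node holds $t$ keys and has $t+1$ children, so level $i \ge 2$ contains $2(t+1)^{i-2}$ nodes:

$$
n \;\ge\; 1 + t\sum_{i=2}^{h} 2(t+1)^{i-2}
  \;=\; 1 + 2t\cdot\frac{(t+1)^{h-1}-1}{t}
  \;=\; 2(t+1)^{h-1} - 1
$$

$$
(t+1)^{h-1} \le \frac{n+1}{2}
\quad\Longrightarrow\quad
h \le 1 + \log_{t+1}\left(\frac{n+1}{2}\right)
$$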