Chapter 7: Space and Time Trade-Offs

Introduction

The space-and-time trade-off strategy uses an extra amount of storage to store or reorganize the input data. The idea is that such preprocessing or prestructuring in the design leads to a faster algorithm. There are two types of technique that exploit the space-and-time trade-off:

Input-enhancement approach
The idea is to preprocess the problem's input, in whole or in part, and store the additional information obtained to accelerate solving the problem afterward.
Example: Distribution counting sort, efficient string-matching algorithms.

Prestructuring approach
The idea is to use extra space to facilitate faster and/or more flexible access to the data. Some processing is done before the problem in question is actually solved but, unlike the input-enhancement approach, it deals with access structuring.
Example: Hashing, B-trees.

Note:
− The two resources — time and space — do not have to compete with each other in all design situations.
− Dynamic programming can be considered a branch of this strategy.

Input enhancement approach

Distribution counting sort

The basic idea is to count, for each element of the list to be sorted, the total number of elements smaller than this element and record the results in a table. These numbers indicate the positions of the elements in the sorted list.

Algorithm (Comparison-counting sort – Θ(n²))

    CountingSort(a[1..n]) {
        count[1..n] = 0;
        for (i = 1; i <= n - 1; i++)
            for (j = i + 1; j <= n; j++)
                if (a[i] < a[j]) count[j]++;
                else count[i]++;
        for (i = 1; i <= n; i++)
            b[count[i] + 1] = a[i];    // count[i] elements are smaller than a[i], so it belongs at position count[i] + 1
        a[1..n] = b[1..n];
    }

Algorithm (Distribution counting sort – Θ(n)), assuming the values are integers in the range 0..high

    DistributionCounting(a[1..n]) {
        f[0..high] = 0;
        for (i = 1; i <= n; i++) f[a[i]]++;             // frequencies
        for (i = 1; i <= high; i++) f[i] += f[i - 1];   // cumulative (distribution) values
        for (i = n; i >= 1; i--) {                      // right to left keeps the sort stable
            b[f[a[i]]] = a[i];
            f[a[i]]--;
        }
        a[1..n] = b[1..n];
    }

Horspool's Algorithm

The problem is to find an occurrence of a given string of m characters, called the pattern P, in a longer string of n characters, called the text T.

Idea: First, the algorithm aligns the pattern P against the beginning characters of the text T. It compares characters of P with their counterparts in T from right to left, starting with the last character of P. If a mismatch occurs, it shifts the pattern to the right by s position(s).

Note: The value of s should be as large as possible without risking the possibility of missing a matching substring in the text.

Horspool's algorithm determines the size of a shift by looking at the character c of T that is aligned against the last character of P. Such shift sizes are precomputed and stored in a table indexed by all possible characters that can be encountered in T. The table's entries are given, for every c ∈ Σ, by:

    D[c] = m,                                                            if c ∉ P[1..m − 1];
    D[c] = the distance from the rightmost occurrence of c among
           the first m − 1 characters of P to the last character of P,   otherwise.

Algorithm

    ComputeArray(P[1..m], D, Σ) {
        for (each character c ∈ Σ) D[c] = m;
        for (i = 1; i <= m - 1; i++) D[P[i]] = m - i;
    }

    Horspool(P[1..m], T[1..n], Σ) {
        int D[|Σ|];
        ComputeArray(P, D, Σ);
        i = m;                            // text index aligned with the last character of P
        while (i <= n) {
            k = 0;                        // number of characters matched so far, counted from the right
            while ((k < m) && (P[m - k] == T[i - k])) k++;
            if (k == m) print(i - m + 1); // occurrence found, starting at position i - m + 1
            i += D[T[i]];                 // shift the pattern by the precomputed amount
        }
    }
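The pseudocode above can be turned into a short C program. The sketch below is illustrative only: the function name horspool_search, the 256-character byte alphabet, and the sample strings in main are assumptions made for this example, and indices are 0-based as usual in C.

    #include <stdio.h>
    #include <string.h>

    #define SIGMA 256                        /* alphabet size: all byte values (an assumption for this sketch) */

    /* Returns the 0-based starting index of the first occurrence of P in T,
       or -1 if P does not occur in T. */
    int horspool_search(const char *P, const char *T) {
        int m = (int)strlen(P);
        int n = (int)strlen(T);
        int D[SIGMA];

        if (m == 0 || m > n) return -1;

        /* Shift table: D[c] = m for characters not among the first m - 1 characters of P;
           otherwise the distance from the rightmost such c to the last character of P. */
        for (int c = 0; c < SIGMA; c++) D[c] = m;
        for (int i = 0; i < m - 1; i++) D[(unsigned char)P[i]] = m - 1 - i;

        int i = m - 1;                       /* text index aligned with the last character of P */
        while (i < n) {
            int k = 0;                       /* characters matched so far, counted from the right */
            while (k < m && P[m - 1 - k] == T[i - k]) k++;
            if (k == m) return i - m + 1;    /* full match found */
            i += D[(unsigned char)T[i]];     /* shift the pattern by the precomputed amount */
        }
        return -1;
    }

    int main(void) {
        printf("%d\n", horspool_search("BARBER", "JIM_SAW_ME_IN_A_BARBERSHOP"));  /* prints 16 */
        return 0;
    }

The shift table is built exactly as in ComputeArray above, only with 0-based indexing (m - 1 - i in place of m - i).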
Prestructuring approach

B-trees

The idea of using extra space to facilitate faster access to a given data set is particularly important if the data set in question contains a very large number of records that need to be stored on a disk.

Here is the structure of a node (or page) of a B-tree:

    p0 | k1 | p1 | k2 | p2 | … | km | pm

where k1, k2, …, km are distinct keys with ki < ki+1 for all i ∈ [1, m − 1], and p0, p1, …, pm are pointers to the node's children.

Consider, for example, a B-tree in which every node holds 1000 keys (and therefore has 1001 children):
− level 0: 1 node (1000 keys);
− level 1: 1001 nodes (1000 × 1001 = 1,001,000 keys);
− level 2: 1001 × 1001 = 1,002,001 nodes (1000 × 1,002,001 = 1,002,001,000 keys).

In this case, a B-tree of height 2 may contain more than one billion keys.

Definition: A B-tree of order t ≥ 2 with height h > 0 must satisfy the following characteristics:
1. Every node contains at most 2t keys.
2. Every node, except the root, contains at least t keys.
3. Every node is either a leaf, i.e., has no descendants, or it has m + 1 descendants, where m is its number of keys.
4. All leaves appear at the same level.

The number of disk accesses is the principal indicator of the efficiency of B-trees and similar data structures; in the worst case it equals the height of the tree. For any B-tree of order t containing n keys and having height h > 0, we have the inequality

    h ≤ 1 + log(t+1)((n + 1)/2) ∈ O(log n)
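To make the role of the node structure concrete, here is a minimal C sketch of how a search descends a B-tree whose nodes are laid out as p0, k1, p1, …, km, pm. The type and function names (BTreeNode, btree_search), the integer keys, and the capacity constant are assumptions made for this illustration, not part of the material above.

    #include <stdbool.h>
    #include <stddef.h>

    #define MAX_KEYS 2000                        /* at most 2t keys per node; t = 1000 as in the example above */

    typedef struct BTreeNode {
        int m;                                   /* number of keys currently stored in the node */
        int key[MAX_KEYS];                       /* k1 < k2 < ... < km                           */
        struct BTreeNode *child[MAX_KEYS + 1];   /* p0, p1, ..., pm; all NULL in a leaf          */
    } BTreeNode;

    /* Returns true if x occurs in the subtree rooted at node, false otherwise.
       Each iteration of the outer loop reads one node, i.e., costs one disk access,
       so the total number of accesses is bounded by the height of the tree. */
    bool btree_search(const BTreeNode *node, int x) {
        while (node != NULL) {
            int i = 0;
            while (i < node->m && node->key[i] < x)   /* find the first key >= x */
                i++;
            if (i < node->m && node->key[i] == x) return true;
            node = node->child[i];                    /* follow pi: the subtree holding keys between ki and ki+1 */
        }
        return false;
    }

Inside a node the sketch uses a simple linear scan over the keys; a real implementation would typically use binary search within the node, but that choice does not change the number of disk accesses, which remains bounded by the tree's height.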