Tries Trees of order >= 2 Variable length keys The decision on what path to follow is taken based on potion of the key Static environment, fast retrieval but large space overhead Applications: dictionaries text searching (patricia tries) compression (Ziv-Lembel encoding) E.G.M. Petrakis Tries 1 Example MACCBOY MACAW MACARON MACARONI MACARONIC MACAQUE MACADAMIA MACADAM MACACACO MACABRE . . . E.G.M. Petrakis Tries 2 1. Words in Tree M A C A C A B C D Q R W B R A A U O @ O E @ C O @ M E N O Y I @ I N @ @ A @ @ C @ @ @: word terminating symbol as many words as terminating symbols E.G.M. Petrakis Tries 3 Word Searching At most m+1 comparisons to find the word x@=a1a2….am+1, with root: i=1 and t(ai): child of node ai. i=1; u = ai; while (u != @ && i <= m) { if (t(ai): undefined) X is not in the trie; else { u = t(ai); i = i+1; } if (u == @ && word(t) == X) X is in the trie; else X is not in the trie; } E.G.M. Petrakis Tries 4 2. Words at the Leafs M A C A B . C C D . Q . A MACABRE MACACO . R MACAQUE M N I . . MACADAM MACADAMI I . W MACCBOY . O MACAW O MACAROON MACARONI C . MACARONIC E.G.M. Petrakis Tries 5 Example Implementation 26 symbols/node => large space overhead E.G.M. Petrakis Tries 6 Insertions “bluejay” inserted E.G.M. Petrakis Tries 7 3. Compact Trie compact nodes MAC A C ... MACADAM DAM Q I MACAQUE MACADAMI RO N MACAW MACABOY MACAROO MACARON MACARONI I C words at the leafs MACARONIC E.G.M. Petrakis Tries 8 Patricia Trie Practical Algorithm To Retrieve Information Coded In Alphanumeric especially good for text searching replace sub-words by a reference position in text together with its length e.g., “jay” in “the blue jay” is (10,3) internal nodes keep only the length and the first letter of the phrase external nodes keep the position of the phrase in the text E.G.M. Petrakis Tries 9 Suffix Trie Construct a trie with all suffixes “the blue-jay@” => 12 suffixes Merge branches with common prefix Leafs point to suffices Search: “e-jay” go to position” 3 or 7 increase by query length: 5 check suffices at positions 3+5 and 7+5 the second one matches E.G.M. Petrakis 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. Tries theblue-jay@ heblue-jay@ eblue-jay@ blue-jay@ lue-jay@ ue-jay@ e-jay@ -jay@ jay@ ay@ y@ @ 10 Suffix Trie “the 1 t 2 4 b h e 3 b E.G.M. Petrakis 7 - 5 l 6 u 8 9 - j 10 a blue-jay” 11 y 12 @ Search: “e-jay@” Tries 11 Search Patricia Scan query from left to right go to first matching symbol in trie increase by the increment if not in the trie => search failed if all symbols match the query => found !!! E.G.M. Petrakis Tries 12 Patricia Pros & Cons Good for secondary storage applications the trie is not high The space overhead is still a problem Mainly useful for static environments E.G.M. Petrakis Tries 13 Ziv-Lempel Encoding Mainly compression method for any kind of data based on tries and symbol alphabets Basic idea: compute code for repeating symbols or for sets of symbols Globally and locally optimal Compared with Huffman no a-priori knowledge of symbol probabilities Huffman is globally optimum but, not always Locally optimal (for parts of the input) E.G.M. Petrakis Tries 14 Encoding Construct trie: nodes: code of a symbol or of sequences of symbols arc: alphabet symbol initially a trie with all symbols each symbol is assigned a unique code Scan input from left to right: read next symbol follow appropriate branch in trie if symbol not in trie => insert symbol in trie, assign it a new code and output its code E.G.M. Petrakis Tries 15 initial trie root a b c z ........ 1 2 3 26 codes E.G.M. Petrakis Tries 16 alphabet {a, b, c} initial trie: α c b 1 2 3 Input: a 1 b 2 E.G.M. Petrakis a 3 b 4 c 5 b 6 Tries a 7 b 8 a 9 b 10 a 11 a 12 17 α α c b 1 2 α 3 1 2 c α α 1 3 b α 4 Tries 2 b 4 E.G.M. Petrakis c b 1 2 3 output 2 b b b output 1 α b c 3 5 18 output 4 α c α c b 1 2 α b 3 c b b 1 2 α b 3 4 4 5 5 c output 3 6 c b 2 3 α E.G.M. Petrakis b a b α b 5 7 Tries 5 19 output 5 b a b 2 α b 2 α 5 5 b b 8 output 8 8 α output 1 1 c b b 2 α 5 a 2 α 5 b b 4 1 c 8 b 6 α 8 E.G.M. Petrakis α a 9 Tries 20 Decoding Follows the same sequence of steps in reverse order start from the same initial trie scan the code from left to right read next code symbol go to the node having the same code if the code is not in the trie guess the appropriate position in the trie E.G.M. Petrakis Tries 21 Alphabet: {a, b, c} Code: 1 2 4 3 5 8 1 a 1 E.G.M. Petrakis c b 2 Tries 3 22 read(1): must be coming from “a” output “α” α 1 c b 1 2 c b 2 3 1 output “b” 4 b 4 Petrakis E.G.M. 2 c b 3 α 1 α Tries 2 3 In order to go from “1” to “2”, we must have visited “b” 23 In order to go from “2” to “4” we must go through “ab” (passing from 1) “a” was a child of “2” output “αb” α c 3 b 1 2 E.G.M. Petrakis α b 4 3 5 Tries 24 In order to go from “4” to “3” we must go through “c” and“c” was child of “4” α c output “c” b 1 2 5 α b 4 3 5 c 6 E.G.M. Petrakis Tries 25 In order to go from “3” to “5” we must go through “ba” and “b” was child of “3” The next symbol is “8”, but … there is no “8” in the trie!! α c b 1 2 4 3 α b 5 8 b 7 c 6 E.G.M. Petrakis Tries 26 This may happen only if after “5” we passed through the same path since “8” cannot be under “6” or “7” (there is no “6” or “7” in the code) α c output “bαb” b 1 2 α b 4 3 5 c 6 E.G.M. Petrakis 1 b 7 … b 8 Tries 27 α c b 1 2 α b 4 b 5 c 6 3 output “α” 7 b 8 α 9 E.G.M. Petrakis Tries 28 Generalize ..... d ..... n ..... d ..... ..... ..... x x d n output “dxd” Special case: “x” can be null E.G.M. Petrakis Tries 29