The Trie Data Structure • Basic definition: a recursive tree structure that uses the digital decomposition of strings to represent a set of strings for searching. • Example • One of the advantages of the trie data structure is that its tree depth depends on the amount of data stored in it. Each element of data is stored at the highest level of the tree that still allows a unique retrieval. Another Trie Example • Suppose we have a string that can be identified by the number sequence 8376. Assume also that the trie data structure consists of a trie[] array at the top level. If 8376 is the only data element to be stored, it can be stored as trie[8]. Assume now that another string identified as 8453 is added. To make room for the second element, we allocate another 10-element array and have trie[8] point to it. We push 8376 down and now is indexed as trie[8][3], while the new element, 8453 will be stored under trie[8][4]. Patricia (PAT) Trees • Basic definition: A trie with the additional constraint that single-descendant nodes are eliminated. • Example • PAT arrays provide the same functionality as PAT trees with less space requirement. Suffix Trees • Basic definition: A trie data structure (usually a Patricia tree) built over all suffixes of a text • Example • A suffix tree for a text of n characters can be built in O(n) time. • Suffix arrays provide the same functionality as suffix trees with less space requirement. See page 200 of your text for more on suffix arrays. IR using Suffix Trees • Text is view as one long string and each position in the text corresponds to a semi-infinite string. • No keywords or terms are used. Queries are based on substrings of the text. • The text does not need structure, but if there is one, it can be used. • Searches supported: – Proximity searching – Range searching – Regular expression searching