Uploaded by cse19733137

Tries

Standard Tries: a standard trie is an ordered tree with the following properties
Each node of the tree, except the root, is labelled with a character
The ordering of the children of an internal node is determined by the canonical order of the alphabet
The path from the root to a leaf node yields a string of the set
Strings = {bear, bell, bid, bull, buy, sell, stock, stop}
Standard trie for these strings (children indented under their parent):
root
  b
    e
      a
        r
      l
        l
    i
      d
    u
      l
        l
      y
  s
    e
      l
        l
    t
      o
        c
          k
        p
For s strings in the set, there are s leaf nodes (provided no string is a prefix of another)
The height of the tree is equal to the length of the longest string
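The two properties above can be checked with a minimal sketch of a standard trie. The names TrieNode, build_trie, contains, count_leaves and height are illustrative choices, not part of the slides, and the sketch assumes no string in the set is a prefix of another.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # character -> TrieNode
        self.is_end = False   # True if a stored string ends at this node


def build_trie(strings):
    """Insert every string character by character, starting at the root."""
    root = TrieNode()
    for s in strings:
        node = root
        for ch in s:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True
    return root


def contains(root, word):
    """Follow the path spelled by word; succeed only if a string ends there."""
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.is_end


def count_leaves(node):
    if not node.children:
        return 1
    return sum(count_leaves(c) for c in node.children.values())


def height(node):
    if not node.children:
        return 0
    return 1 + max(height(c) for c in node.children.values())


strings = ["bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"]
root = build_trie(strings)
print(count_leaves(root))                             # 8 leaves for 8 strings
print(height(root))                                   # 5, the length of "stock"
print(contains(root, "bull"), contains(root, "bu"))   # True False
```

A dictionary per node keeps the sketch short; iterating over sorted(node.children) would visit the children in canonical order.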
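Compressed Tries: a compressed trie is similar to a standard trie, but each internal node has at least
two children; chains of single-child nodes are merged into one node labelled with a substring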
Strings = {bear, bell, bid, bull, buy, sell, stock, stop}
Compressed trie for these strings (children indented under their parent):
root
  b
    e
      ar
      ll
    id
    u
      ll
      y
  s
    ell
    to
      ck
      p
For s strings in the set, there are s leaf nodes
Every internal node has at least two children, so the total number of nodes is O(s)
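As a rough illustration, a compressed trie can be derived from a standard trie by merging each chain of single-child nodes into one substring label. The sketch below uses nested dictionaries (label -> subtree); the function names are illustrative, and it assumes no string is a prefix of another, since merging would otherwise lose the point where the shorter string ends.

```python
def build_trie(strings):
    """Standard trie as nested dicts: one character per level."""
    root = {}
    for s in strings:
        node = root
        for ch in s:
            node = node.setdefault(ch, {})
    return root


def compress(node):
    """Return a compressed copy in which labels may be substrings."""
    out = {}
    for label, child in node.items():
        # Absorb the child into the label while it has exactly one child.
        while len(child) == 1:
            (ch, grandchild), = child.items()
            label += ch
            child = grandchild
        out[label] = compress(child)
    return out


def count_nodes(node):
    """Number of nodes below the root."""
    return sum(1 + count_nodes(c) for c in node.values())


strings = ["bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"]
trie = build_trie(strings)
compressed = compress(trie)
print(count_nodes(trie))        # 21 nodes below the root
print(count_nodes(compressed))  # 13 nodes below the root
print(sorted(compressed["s"]))  # ['ell', 'to']
```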
Compressed Tries, compact representation: each node stores a tuple of indices into an array S of the
strings instead of the substring itself
Strings = {bear, bell, bid, bull, buy, sell, stock, stop}
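Array S of strings (columns are character indices):
        0  1  2  3  4
S[0] =  b  e  a  r
S[1] =  b  e  l  l
S[2] =  b  i  d
S[3] =  b  u  l  l
S[4] =  b  u  y
S[5] =  s  e  l  l
S[6] =  s  t  o  c  k
S[7] =  s  t  o  p
Compact trie (each tuple is followed by the substring it denotes):
root
  (0,0,0) b
    (0,1,1) e
      (0,2,3) ar
      (1,2,3) ll
    (2,1,2) id
    (3,1,1) u
      (3,2,3) ll
      (4,2,2) y
  (5,0,0) s
    (5,1,3) ell
    (6,1,2) to
      (6,3,4) ck
      (7,3,3) p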
Each node is a tuple (i, j, k): i is the index of a string in S, j is the start index and k is the
end index (inclusive) of the node's label within S[i], so the label is S[i][j..k]
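The tuple encoding can be exercised with a short sketch that decodes each (i, j, k) back into its substring and searches the compact trie. The nested-dictionary layout and the names label, expand and contains are assumptions made for illustration; the tuples themselves are the ones from the slide.

```python
S = ["bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"]

# Compact trie: tuple (i, j, k) -> children, with label S[i][j..k].
compact_trie = {
    (0, 0, 0): {                                      # b
        (0, 1, 1): {(0, 2, 3): {}, (1, 2, 3): {}},    # e -> ar, ll
        (2, 1, 2): {},                                # id
        (3, 1, 1): {(3, 2, 3): {}, (4, 2, 2): {}},    # u -> ll, y
    },
    (5, 0, 0): {                                      # s
        (5, 1, 3): {},                                # ell
        (6, 1, 2): {(6, 3, 4): {}, (7, 3, 3): {}},    # to -> ck, p
    },
}


def label(node):
    i, j, k = node
    return S[i][j:k + 1]   # k is inclusive, hence k + 1


def expand(trie):
    """Replace every tuple by the substring it denotes."""
    return {label(t): expand(children) for t, children in trie.items()}


def contains(trie, word):
    """True if word is one of the stored strings (search must end at a leaf)."""
    node, pos = trie, 0
    while pos < len(word):
        for t, children in node.items():
            lab = label(t)
            if word.startswith(lab, pos):
                pos += len(lab)
                node = children
                break
        else:
            return False       # no child label matches at this position
    return not node            # an empty dict marks a leaf


print(label((6, 3, 4)))                  # ck
print(contains(compact_trie, "stock"))   # True
print(contains(compact_trie, "sto"))     # False
```

Each node then needs only three integers regardless of how long its label is, which is the point of the compact form.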
Suffix Tries: a suffix trie is a trie whose strings are all the suffixes of a given string
String = "minimize"
Strings = {e, ze, ize, mize, imize, nimize, inimize, minimize}
[Figure: standard suffix trie of "minimize", with one character per node and one root-to-leaf path per suffix]
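A minimal sketch of building the standard suffix trie follows, using nested dictionaries and illustrative function names. It also shows one well-known use that the slides do not spell out: a pattern occurs in the string exactly when it traces a path from the root, because every substring is a prefix of some suffix.

```python
def suffixes(text):
    """All non-empty suffixes, shortest first."""
    return [text[i:] for i in range(len(text) - 1, -1, -1)]


def build_suffix_trie(text):
    """Insert every suffix of text into a standard trie (nested dicts)."""
    root = {}
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.setdefault(ch, {})
    return root


def is_substring(root, pattern):
    """Walk down from the root; the pattern occurs iff the walk never fails."""
    node = root
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


def count_nodes(node):
    """Number of nodes below the root."""
    return sum(1 + count_nodes(c) for c in node.values())


trie = build_suffix_trie("minimize")
print(suffixes("minimize"))
print(count_nodes(trie))            # 32 nodes below the root
print(is_substring(trie, "nimi"))   # True
print(is_substring(trie, "zim"))    # False
```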
[Figures: intermediate steps of compressing the suffix trie of "minimize"; chains of single-child nodes are merged into substring labels such as mize, nimize and ze]
Compressed suffix trie for String = "minimize" (children indented under their parent):
root
  e
  i
    mize
    nimize
    ze
  mi
    nimize
    ze
  nimize
  ze
Compact representation of the compressed suffix trie for String = "minimize":
index:      0  1  2  3  4  5  6  7
character:  m  i  n  i  m  i  z  e
Compact trie (each tuple is followed by the substring it denotes):
root
  (7,7) e
  (1,1) i
    (4,7) mize
    (2,7) nimize
    (6,7) ze
  (0,1) mi
    (2,7) nimize
    (6,7) ze
  (2,7) nimize
  (6,7) ze
Each node is a tuple (j, k), where j is the start index and k is the end index (inclusive) of the
node's label in the string, so the label is String[j..k]
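One simple way to obtain these (j, k) tuples is sketched below. It is a naive recursive construction chosen for readability, not an efficient suffix-tree algorithm; the function names and the list-of-pairs layout are assumptions. It also assumes no suffix is a prefix of another suffix, which holds for "minimize" because its last character occurs only once.

```python
def build_compact_suffix_trie(text, starts=None, depth=0):
    """Return a list of ((j, k), children) pairs, children in alphabetical order.

    starts holds the start positions of the suffixes that share their
    first `depth` characters; text[j..k] is the label of each node.
    """
    if starts is None:
        starts = list(range(len(text)))
    # Group the suffixes by their next character.
    groups = {}
    for s in starts:
        if s + depth < len(text):
            groups.setdefault(text[s + depth], []).append(s)
    nodes = []
    for ch in sorted(groups):
        group = groups[ch]
        length = 1
        # Extend the label while every suffix in the group continues
        # and they all agree on the next character.
        while (all(s + depth + length < len(text) for s in group)
               and len({text[s + depth + length] for s in group}) == 1):
            length += 1
        j = group[0] + depth        # use the first occurrence for the indices
        k = j + length - 1
        children = build_compact_suffix_trie(text, group, depth + length)
        nodes.append(((j, k), children))
    return nodes


def show(text, nodes, indent=0):
    """Print each tuple together with the substring it denotes."""
    for (j, k), children in nodes:
        print(" " * indent + f"({j},{k}) {text[j:k + 1]}")
        show(text, children, indent + 2)


tree = build_compact_suffix_trie("minimize")
show("minimize", tree)
```

For "minimize" this yields the same tuples as the slide, for example (0,1) for mi and (4,7) for mize.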
The World Wide Web contains a huge collection of text documents
Information about these documents is gathered by a web crawler
A search engine allows users to retrieve relevant information by querying with keywords
The information stored by a search engine is a dictionary called an inverted index
The words in the dictionary are called index terms
An array stores the occurrence list of each term (the documents in which the term appears)
A compressed trie is used to store the set of index terms
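The structure described above can be sketched roughly as follows. The documents are made-up sample data, the function names are illustrative, and for brevity the terms are kept in a standard (uncompressed) trie of nested dictionaries rather than the compressed trie named on the slide. The node where a term ends stores, under the key None, the position of that term's occurrence list in the array.

```python
# Hypothetical documents, keyed by document id.
docs = {
    0: "stock prices fell",
    1: "buy stock sell stock",
    2: "bear market",
}


def build_inverted_index(docs):
    occurrence_lists = []   # array: one list of document ids per index term
    trie = {}               # nested dicts over characters
    for doc_id in sorted(docs):
        for term in docs[doc_id].lower().split():
            node = trie
            for ch in term:
                node = node.setdefault(ch, {})
            if None not in node:              # first time this term is seen
                node[None] = len(occurrence_lists)
                occurrence_lists.append([])
            postings = occurrence_lists[node[None]]
            if not postings or postings[-1] != doc_id:
                postings.append(doc_id)       # record each document once
    return trie, occurrence_lists


def lookup(trie, occurrence_lists, term):
    """Return the ids of the documents that contain term."""
    node = trie
    for ch in term:
        node = node.get(ch)
        if node is None:
            return []
    idx = node.get(None)
    return occurrence_lists[idx] if idx is not None else []


trie, occurrence_lists = build_inverted_index(docs)
print(lookup(trie, occurrence_lists, "stock"))   # [0, 1]
print(lookup(trie, occurrence_lists, "bear"))    # [2]
print(lookup(trie, occurrence_lists, "sto"))     # []
```

Keeping the occurrence lists in a separate array means the trie itself stays small: each term costs one integer at the node where it ends.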