A Self-adjusting Data Structure for Multi-dimensional Point Sets
Eunhui Park & David M. Mount
University of Maryland
Sep. 2012

Motivation
• Sleator & Tarjan introduced the splay tree almost 30 years ago.
  - Self-adjusts to the access distribution
  - Supports insertion and deletion in O(log n) amortized time
• Efficient access:
  - Balance property: m accesses in O((m+n) log n) time
  - Scanning property [Elmasry 2004]: access all items in O(n) time
  - Working set property: … on temporal locality
  - Static optimality property: efficient access based on frequency
  - Static & dynamic finger properties [Cole 2000]: … on spatial locality

Is there a multi-dimensional generalization?

Background
• Compressed quadtree
  - Hierarchical partition of space
  - O(n) space
  - O(log n) access time if augmented:
    - Topology tree [Frederickson 1985, Har-Peled 2005]
    - Skip quadtree [Eppstein, Goodrich, Sun 2005]
    - Quadtreap [Mount, Park 2010], based on the treap [Seidel, Aragon 1996]
• Efficient approximate proximity queries
  - Approximate nearest neighbor search
  - Approximate range search

Objective
Quadtree + Splay tree

Splay Quadtree
• Like quadtrees:
  - A versatile geometric partition tree
  - Supports efficient approximate proximity queries
• Like splay trees:
  - Adjusts to the access distribution
  - Supports insertion/deletion in O(log n) amortized time
  - Supports splay tree access properties: balance, static optimality, working set, static finger

Overview
• BD-tree
  - BD-tree
  - Rotation
• Splaying operation
  - Basic splaying
  - Splaying
• Efficiency
  - Insertion/deletion
  - Search and access efficiency

BD-tree
Box Decomposition tree (BD-tree): a geometric data structure based on a hierarchical decomposition of space into d-dimensional axis-aligned rectangles.
• Each node is associated with a region of space called a cell.
• Each cell is defined by an outer box and an optional inner box.
• Partition operations: split and shrink.
• Internal nodes: split nodes and shrink nodes.
• Each leaf has a single point or a single inner box.
[Figure: a BD-tree with labels "box", "cell", and "leaves"]

BD-tree: Partitioning Operations
• Split: partitions a cell by an axis-orthogonal hyperplane that bisects the cell's longest side.
[Figure: split of cell C into left child D and right child E]
• Shrink: partitions a cell by a shrinking box, which lies within the cell.
[Figure: shrink of cell C by box F into inner child F and outer child C\F]

BD-tree: Promotion
• By construction, nodes are generated in shrink-split pairs. We merge each pair into a single ternary node, called a pseudo-node.
[Figure: a shrink node (children: outer, inner) and a split node (children: left, right) merged into a pseudo-node (children: left, right, outer)]
• The tree can be restructured through a local operation, called promotion.
[Figure: promote(x) and promote(y) applied to nodes x and y with subtrees A, B, C, D, E]

Splay Quadtree
• Given an internal node x, splay(x) uses promotions to move x to the root of the tree.
• This makes future accesses to x more efficient.
[Figure: splay(x) on a tree with nodes b, c, d, e, f, g, x]

Basic Splaying
• As in Sleator & Tarjan, splaying is based on primitive operations:
  - Zig-zag
  - Zig-zig
[Figure: zig-zag and zig-zig steps on nodes x, y, z with subtrees A through G]

The Problem of Right Promotion
• Inner-left convention:
  - If an internal node's cell has an inner box, it resides in its left child.
  - If necessary, left and right children are relabeled to satisfy this.
  - This guarantees that each cell has constant complexity.
• Right promotion may violate this convention.
[Figure: promote(x) on nodes x and y with subtrees A through E and inner boxes u and v. Annotations: "If this cell has an inner box, u"; "Now, y's cell has two inner boxes, u and v!"]

Splaying in 3 Phases
• Promotions must be carefully structured to avoid this problem.
• 3-phased approach (3 passes from bottom to top).
[Figure: three bottom-to-top passes over nodes a through g, with children labeled L, R, and O]
• As in Sleator & Tarjan, amortized efficiency is established by a potential-based analysis.
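
For reference, the classical one-dimensional splay steps that the slides cite from Sleator & Tarjan can be sketched as below. This is a minimal sketch of the standard binary splay tree (zig, zig-zig, zig-zag via rotations), not the promotion-based three-phase procedure of the splay quadtree. The names Node, rotate_up, splay, insert, and inorder are illustrative and do not come from the slides.

```python
class Node:
    """Binary search tree node with a parent pointer (illustrative)."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.parent = None


def rotate_up(x):
    """Rotate x above its parent, preserving the in-order key sequence."""
    p = x.parent
    g = p.parent
    if p.left is x:            # right rotation at p
        p.left = x.right
        if x.right is not None:
            x.right.parent = p
        x.right = p
    else:                      # left rotation at p
        p.right = x.left
        if x.left is not None:
            x.left.parent = p
        x.left = p
    p.parent = x
    x.parent = g
    if g is not None:          # reattach x where p used to hang
        if g.left is p:
            g.left = x
        else:
            g.right = x


def splay(x):
    """Move x to the root using zig, zig-zig, and zig-zag steps."""
    while x.parent is not None:
        p = x.parent
        g = p.parent
        if g is None:
            rotate_up(x)                       # zig: parent is the root
        elif (g.left is p) == (p.left is x):
            rotate_up(p)                       # zig-zig: rotate parent first
            rotate_up(x)
        else:
            rotate_up(x)                       # zig-zag: rotate x twice
            rotate_up(x)
    return x


def insert(root, key):
    """Ordinary BST insertion followed by a splay; returns the new root."""
    if root is None:
        return Node(key)
    cur = root
    while True:
        if key == cur.key:
            return splay(cur)                  # key already present
        nxt = cur.left if key < cur.key else cur.right
        if nxt is None:
            node = Node(key)
            node.parent = cur
            if key < cur.key:
                cur.left = node
            else:
                cur.right = node
            return splay(node)
        cur = nxt


def inorder(node):
    """Return the keys of the subtree rooted at node in sorted order."""
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)
```

After each call to insert, the key just inserted (or found) sits at the root, while the in-order sequence of keys is unchanged. The splay quadtree replaces the rotations here with promotions on pseudo-nodes and orders them in three passes to preserve the inner-left convention.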

Insertion and deletion
• Insert(q):
  - Locate the leaf x containing q
  - Add q as a new leaf
  - splay(x)
[Figure: insert(q, x) followed by splay(x)]
• Insertion can be performed in O(log n) amortized time.
• Deletion can be performed in O(log n) amortized time.

Analogous to Splay Trees
• Balance Theorem: Total access for q_1, q_2, …, q_m takes O((m+n) log n) time.
• Working Set Theorem: For each access q_j, let t_j be the number of different queries since the last access of q_j, or since the beginning if this is the first access of q_j. A total of m access queries takes
  $O\left(\sum_{j=1}^{m} \log(t_j + 1) + m + n \log n\right)$.
• Static Optimality Theorem: Given a quadtree subdivision Z, where each cell z ∈ Z has an access probability p_z, the entropy of Z is defined as
  $\mathrm{entropy}(Z) = \sum_{z \in Z} p_z \log \frac{1}{p_z}$.
  A total of m access queries takes $O(m \cdot \mathrm{entropy}(Z))$.

Static Finger Theorem
• 1-dim (Sleator & Tarjan 83): Total access for i_1, i_2, …, i_m takes
  $O\left(m + \sum_{j=1}^{m} \log(|i_j - f| + 1)\right)$.
[Figure: a finger f and an accessed item i_m on a line]
• d-dim
  - For a single point q, let $r = \mathrm{dist}(f, q)$.
  - But most geometric queries involve regions Q, not points: let $r = \max_{x \in Q} \mathrm{dist}(f, x)$.
  - For m queries Q_1, Q_2, …, Q_m: let $r = \max_{1 \le i \le m} \max_{x \in Q_i} \mathrm{dist}(f, x)$.
  - For technical reasons, we need to expand Q_i. Consider an expanded ball: let $r = (1 + c) \max_{1 \le i \le m} \max_{x \in Q_i} \mathrm{dist}(f, x)$.
[Figure: finger f at the center of a ball of radius r; W is the set of points in the expanded ball]
• Define the working set to be the set W of points within distance r of f.
• Total access for approximate range queries Q_1, Q_2, …, Q_m:
  $O\left(m \left(\frac{1}{\varepsilon}\right)^{d-1} \log |W| + n \log n\right)$
• ANN queries
• Box queries

Conclusions
• Splay Quadtree:
  - Self-adjusting geometric data structure
  - Supports insertion/deletion in O(log n) amortized time
  - Supports efficient approximate proximity queries
• Open problems:
  - Other properties of standard splay trees?
    - Dynamic finger theorem
    - Scanning theorem
  - Better notions of distance (or generally locality) in a geometric setting?

Thank you!