The Palm-tree Index: Indexing with the Crowd
Ahmed R. Mahmood*, Eduard Dragut*, Walid G. Aref*, Saleh Basalamah**
*Purdue University **Umm Al-Qura University

Outline
• Motivation
• Taxonomy for Crowd-based Indexing
• Problem Definition
• The Palm-tree Index Structure
• Traversal Algorithms
• Preliminary Experimental Results
• Conclusions and Future Work

Motivation
[Figure: example images with associated values 100, 200, 500, and 1000]

Taxonomy for Crowd-based Indexing
[Figure: taxonomy for crowd-based indexing]

Problem Definition
• Let S be a set of N keys (e.g., images or videos) and let q be a query
• A B+-tree-like index is constructed over S
• Goal: study how human workers can be used to search the index
• Workers perform subjective comparisons between the query image and the tree keys, and make subjective decisions, e.g.:
  – Less than, greater than, almost the same
  – Better, worse, almost the same
  – Cheaper, more expensive, almost the same

Index Structure
• Why a B+-tree?
• What are the tree order and height?
• How is the tree constructed?
• What are the performance metrics?

Index Structure (Cont’d)
• Why a B+-tree?
  – It gives a predictable query cost
  – Storing more keys per node reduces the cost
• How are the tree order and height determined?
  – By the number of keys a worker can process at once
[Figure panels: error vs. index height and error vs. index order, for a fixed dataset size, a fixed height, and a fixed order; for a fixed dataset size, increasing the order decreases the height]

Index Construction: How to grow a palm tree?
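A minimal sketch of growing the tree when every key carries a quantitative value (for example, the repair cost attached to a damaged-car image): sort the keys by that value and bulk-load nodes bottom-up, with the order set to the number of entries a worker is asked to compare in one task. The `Node` class, the `bulk_load` function, and the choice of each child's smallest value as its separator are assumptions made for illustration, not the Palm-tree's published construction procedure.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class Node:
    # Leaves hold (key, value) entries; internal nodes hold children plus
    # separators (the smallest value under each child except the first).
    entries: List[Tuple[Any, float]] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)
    separators: List[float] = field(default_factory=list)


def min_value(node: Node) -> float:
    """Smallest quantitative value stored under `node`."""
    while node.children:
        node = node.children[0]
    return node.entries[0][1]


def chunk(seq: list, size: int) -> list:
    """Split `seq` into consecutive groups of at most `size` items."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def bulk_load(items: List[Tuple[Any, float]], order: int) -> Tuple[Node, int]:
    """Build the index bottom-up and return (root, height in levels).

    `order` is the hypothetical maximum number of entries (leaves) or
    children (internal nodes) a worker is asked to handle in one task.
    """
    if order < 2:
        raise ValueError("order must be at least 2")
    if not items:
        raise ValueError("at least one key is required")
    # Sort keys by their quantitative value, e.g. the repair cost.
    ordered = sorted(items, key=lambda kv: kv[1])
    level = [Node(entries=group) for group in chunk(ordered, order)]
    height = 1
    # Group nodes level by level until a single root remains.
    while len(level) > 1:
        level = [
            Node(children=group, separators=[min_value(c) for c in group[1:]])
            for group in chunk(level, order)
        ]
        height += 1
    return level[0], height
```

Loading the keys 1 to 16 with order 2 yields a tree of height 4 whose root separator is 9, with 5 and 13 one level down; with order 4 the same keys fit in a tree of height 2, which is the "order increase, height decrease" effect for a fixed dataset size. Keys with only a qualitative property have no value to sort by, so this bulk load does not apply to them.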
• Key associated with some “Quantitative Value”
  – Keys have a subjective property and an associated quantitative value
  – The index is constructed based on the quantitative value
  – Example: images of damaged cars with their repair costs
    • Key: car image
    • Subjective property: car damage
    • Quantitative value: repair cost
[Figure: damaged-car images with repair costs 100, 200, 500, and 1000]

Index Construction: How to grow a palm tree? (Cont’d)
• Key associated with some “Qualitative Property”
  – Keys have a subjective property only
  – The index is constructed by successive insertions
  – E.g., images of butterflies to be ordered by beauty

Performance Metrics
• What are the performance metrics?
  – Error: distance between the ground truth and the selected result
  – Cost: total number of tasks needed to complete a job

Traversal Algorithms
• How to descend the tree?
  – Leaf-only aggregation
  – All-levels aggregation
  – All-levels aggregation with backtracking

Leaf-Only Aggregation
Budget: 12 tasks; 4 tasks per worker (workers w1, w2, w3)
[Figure: example tree over leaf keys 1–16 with separator keys 9; 5, 13; 3, 7, 11, 15; 2, 4, 6, 8, 10, 12, 14, 16]
• Even budget distribution
  – Number of workers = Budget / Tree height (here 12 / 4 = 3)

All-Levels Aggregation
Budget: 12 tasks; 3 tasks per level
[Figure: the same tree, with workers w1, w2, w3 assigned at each level]
• Even budget distribution
  – Replication per level = Budget / Tree height (here 12 / 4 = 3)

All-Levels Aggregation with EDE
Budget: 12 tasks
[Figure: the same tree, with an uneven number of tasks per level]
• Uneven budget distribution based on:
  – P_dl: probability of an error of distance d at level l
  – EDE: expected distance error per level

Traversal Algorithms: All-Levels Aggregation with Backtracking
[Figure: the same tree with node A (key 9), node B (key 5), node C (key 13), and node D (key 3)]
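The leaf-only and all-levels strategies can be sketched as a small simulation, assuming a hypothetical noise model in which each worker, with probability `p_error`, picks a child adjacent to the correct one, and assuming majority voting as the aggregation rule (the slides do not name one). The tree is implicit: each node covers a contiguous range of sorted keys, split into up to `fanout` children. All function names are illustrative, and EDE-based budgeting and backtracking are not modeled.

```python
import random
from collections import Counter
from typing import List, Tuple


def tree_height(n: int, fanout: int) -> int:
    """Comparison levels needed to narrow n sorted keys down to one."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    height = 0
    while n > 1:
        n = -(-n // fanout)  # ceiling division: size of the largest child range
        height += 1
    return height


def child_ranges(lo: int, hi: int, fanout: int) -> List[Tuple[int, int]]:
    """Split the key positions [lo, hi) into at most `fanout` child ranges."""
    size = -(-(hi - lo) // fanout)
    return [(s, min(s + size, hi)) for s in range(lo, hi, size)]


def worker_choice(keys, ranges, query, p_error, rng) -> int:
    """One comparison task. Hypothetical noise model: with probability
    p_error the worker picks a child adjacent to the correct one."""
    correct = 0
    for i, (start, _) in enumerate(ranges):
        if keys[start] <= query:
            correct = i
    if len(ranges) > 1 and rng.random() < p_error:
        return min(max(correct + rng.choice([-1, 1]), 0), len(ranges) - 1)
    return correct


def descend(keys, query, fanout, p_error, rng, replication=1):
    """Root-to-leaf descent; `replication` workers vote at every node."""
    lo, hi, tasks = 0, len(keys), 0
    while hi - lo > 1:
        ranges = child_ranges(lo, hi, fanout)
        votes = [worker_choice(keys, ranges, query, p_error, rng)
                 for _ in range(replication)]
        tasks += replication
        # Majority vote (assumed rule); ties go to the child voted for first.
        lo, hi = ranges[Counter(votes).most_common(1)[0][0]]
    return keys[lo], tasks


def leaf_only(keys, query, fanout, budget, p_error, rng):
    """Budget / height workers descend independently; leaves are aggregated."""
    workers = max(1, budget // max(1, tree_height(len(keys), fanout)))
    runs = [descend(keys, query, fanout, p_error, rng) for _ in range(workers)]
    answer = Counter(key for key, _ in runs).most_common(1)[0][0]
    return answer, sum(tasks for _, tasks in runs)


def all_levels(keys, query, fanout, budget, p_error, rng):
    """Budget / height replicas per level; votes are aggregated at every node."""
    replication = max(1, budget // max(1, tree_height(len(keys), fanout)))
    return descend(keys, query, fanout, p_error, rng, replication)
```

With `keys = list(range(1, 17))`, `fanout=2`, `budget=12`, and `p_error=0.0`, both functions return key 11 for query 11 and spend 12 tasks (3 workers over 4 levels, or 3 replicas at each of 4 levels). Averaging `abs(answer - query)` over many seeded runs with a nonzero `p_error` gives the error metric defined earlier; whether this toy noise model reproduces the reported trends is not guaranteed.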
Preliminary Experimental Results: Experimental Setup
• Squares dataset
  – 200 generated images of squares of different sizes
• Cars dataset
  – 1,300 images of used cars, each associated with a desired selling price
  – Collected from the Craigslist website with a custom crawler
• Crowd
  – Students in the DB Group at Purdue (and their spouses)
  – IRB approval

Preliminary Experimental Results: Sample Tasks
[Figures: two sample tasks]

Preliminary Experimental Results: Mean Error
• Mean error as the tree fanout and the number of workers (replications) vary
• Error is higher on the cars dataset
• Error increases as the fanout increases
• Error decreases as the number of replications increases
• All-levels aggregation has lower error than leaf-only aggregation

Preliminary Experimental Results: Mean Cost
• Mean cost as the tree fanout and the number of workers (replications) vary
• The taller the tree, the higher the cost
• Cost is higher on the cars dataset (it has more keys)
• More replications incur a higher cost
[Figure: for a fixed dataset size, increasing the order decreases the height]

Conclusions and Future Work
• Conclusions
  – The Palm-tree lets humans perform index operations on keys that cannot be indexed by a computer
• Future Work
  – More extensive experimental evaluation
  – Mathematical analysis
  – Multi-dimensional indexing

Questions?