Query Processing in Mobile P2P Databases IGERT Seminar Presentation Bo Xu joint work with Ouri Wolfson Talk outline Introduction System Model The MARKET Algorithm Evaluation Extension to CTS Conclusion and Future Work July 12, 2016 IGERT seminar 2 Query Processing Environments Motivation: a general purpose query processing strategy mobile disconnected wireless ad-hoc networks Wireless ad-hoc networks Query longevity nuous Instantaneous Sensor Mobile MANET Static Vehicular Sensor Network (VSN) July 12, 2016 Connectivity Sparse Dense GPS receiver chemical spill detector still/video camera vibration sensor acoustic detector IGERT seminar 3 Store-and-forward to deal with sparseness Q QA A r q Q A qA July 12, 2016 IGERT seminar A 4 Issues with Store-and-forward How to manage limited memory, power, and bandwidth? Which reports to save/transmit? July 12, 2016 IGERT seminar 5 Difficulty of Store-and-forward Case: Each mobile node is interested in every data-item Assume that the trajectories of all nodes is known a priori at a central server. If memory, energy, and bandwidth are bounded at mobile nodes, then the problem of determining whether a set of data-items can be disseminated to all the mobile nodes is NP-complete. Mobile P2P: Trajectories unknown a priori; Heuristics needed July 12, 2016 IGERT seminar 6 Talk outline Introduction System Model The MARKET Algorithm Evaluation Extension to CTS Conclusion and Future Work July 12, 2016 IGERT seminar 7 Mobile P2P Database Pda’s, cell-phones, sensors, hotspots, vehicles, with short-range wireless capabilities report 8 A Local query Local database C query C query A report 1 report 2 report 3 query B report 4 report 5 B • Applications coexist • Variable report sizes • A peer can be a produce, consumer, and broker July 12, 2016 IGERT seminar 8 Queries A query Q maps each report R to a match degree: 0 match( R, Q ) 1 Examples: Top parking slots given my current location match(R,Q)=e-t-d Profile with expertise “children-periodontics” Similarity between two images July 12, 2016 IGERT seminar 9 Query/report Dissemination Two peers within transmission range exchange queries and reports Least relevant reports that do not fit in local broker database are purged Exchange not necessarily synchronous (periodic broadcast) July 12, 2016 IGERT seminar 10 Talk outline Introduction System Model The MARKET Algorithm Evaluation Extension to CTS Conclusion and Future Work July 12, 2016 IGERT seminar 11 Ranking Factors Rank of a report R is determined by Demand: What fraction of peers are querying R Supply: What fraction of peers already have R Probability that a peer is interested in R Probability that a peer has R Size of R July 12, 2016 IGERT seminar 12 Rank of a report expected benefit = demand(R)*(1supply(R)) reports benefit 0.7 0.5 0.4 0.5 0.8 reports database 0.3 Rank(R)= demand(R)*(1supply(R)) size(R) July 12, 2016 IGERT seminar 13 Report Ranking: sample demand r2 r5 q1 q7 O1 r3 O2 r6 q2 q6 r1 r2 r5 r4 q1 q3 q4 q7 O1 O2 O3 O3 reports relation r1 r4 r7 r8 q3 q4 q5 q8 r1 r4 r7 r8 queries relation (a) r4 r3 r6 r1 q2 q4 q3 q6 q3 q4 q5 q8 (b) Queries relation is FIFO maintained July 12, 2016 IGERT seminar 14 Rank of Reports Demand for R n match( R, Q ) i i 1 n Qi’s are the members of the queries relation Size of the queries relation determined based on Hoeffding’s inequality confidence _level 1 2e 2 n ( confidence_interval_width) 2 E.g., if n=108, then with 95% chance the demand estimation error is smaller than 0.08 July 12, 2016 IGERT seminar 15 How does peer O determine supply(R)? number _ of _ peers _ that _ have _ R supply ( R) total _ number _ of _ peers A parametric formula giving the supply is beyond the state of the art O machine-learns supply(R) based on metadata of R: Age of R Number of times O sighted R from other peers etc. July 12, 2016 IGERT seminar 16 Computing Supply by Machine-learning MAchine LEarning based Novelty rAnking (MALENA) Reports database of O report -id R1 R4 R2 R7 Report description … … … … aro fin 1 1 2 4 3 2 4 2 aro: The age rank order within O’s reports database fin: The number of times O has sighted the report from other peers July 12, 2016 IGERT seminar 17 MALENA Reports database of O (a) reportReport aro id description R1 R2 … … Tracking Set of O fin R1 1 3 R2 2 1 R3 R1, R2 Advertise to peer B Examples Examplescreated createdby ReceiveREQ Examples set ES of O insert aro fin label 1 3 negative old 2 1 new positive (b) Tracking Set of O Reports database of O reportReport aro id description R1 … 1 R2 … fin 2 July 12, 2016 report- Report aro id description R2 1 R3 R1 … 1 Tracking Set of O fin B R1 4 Reports database of O (c) ReceiveREQ RequestR2) R2 ((R1,R2), R1 IGERT seminar 4 R2 ReceiveRPT (R4) 18 MALENA Implementation Considerations Minimize overhead No need to actually store examples Model incrementally built Bayesian learning a simple but effective method July 12, 2016 IGERT seminar 19 Talk outline Introduction System Model The MARKET Algorithm Evaluation Extension to CTS Conclusion and Future Work July 12, 2016 IGERT seminar 20 Comparison with RANDI (MDM’07) throughput (matches/peer) peer density (D) = 20, energy allocation fraction (F) = 0.15 20 production peers within transmission range report rate (P) = 0.1 reports/second 140 120 100 80 60 MARKET RANDI Ideal Benchmark 40 20 0 0 30 60 90 120 150 180 210 240 270 300 response-time bound (second) RANDI=MARKET-supply July 12, 2016 throughput (matches/peer) mobility model=random way point, average motion speed=1 mile/hour transmission range=100 meters, mean of reports database size=100Kbytes queries database size=100 queries report size uniformly distributed between 1K and 2K bytes 0.1 report produced per second peer density (D) = 1, energy allocation fraction (F) = 0.15 report1production rate transmission (P) = 0.1 reports/second peer within range 140 MARKET RANDI 120 Ideal Benchmark 100 80 60 40 20 0 0 3600 7200 10800 14400 18000 21600 25200 28800 response-time bound (second) MARKET half as good as ideal benchmark MARKET twice better than RANDI IGERT seminar 21 Comparison with LRU and LFU throughput (matches/peer) mobility model=iMotes traces mean of reports database size=150Kbytes queries database size=10 queries report size uniformly distributed between 2K and 20K bytes 0.1 report produced per second, transmission size=100Kbytes 5.5 5 4.5 MARKET 4 RANDI 3.5 LRU LFU 3 2.5 2 0 10 20 30 40 50 response-time bound (second) (results obtained by Fatemeh Vafaee) July 12, 2016 IGERT seminar 22 Evaluation of MALENA (TAAS’09) turn-over: peers enter/exit system injection: number of peers that have a report initially mobility model=iMotes traces, reports database size=100 reports 2 reports produced per second, transmission size=10 reports throughput (reports/peer) throughput (reports/peer) high-turn-over/high-injection low-turn-over/low-injection low-turn-over/low-injection Bluetooth, Bluetooth, M=100, M=100, f=2, f=2, q=0.1, q=0.1, flooding flooding low-turn-over/low-injection high-turn-over/high-injection 200 14000 180 12000 160 10000 140 1208000 100 806000 604000 40 MALE MALE NANA 2000 aro-ranking aro-ranking 20 fin-ranking fin-ranking 0 0 0 50 10 5 10 15 15 20 20 25 25 30 30 35 3540 4045 4550 5055556060 response-time response-time bound bound (minute) (minute) MALENA always follows the best indicator July 12, 2016 IGERT seminar 23 Application: K-nearest-neighbors query-point sink Query: K-nearest-neighbors of a fixed location (query-point) Reports: current locations of mobile sensors match(Q,R): in reverse proportion to the distance from query point July 12, 2016 IGERT seminar 24 Itinerary based KNN processing Phase I: Query delivered to the sensor closest to query point Phase II: Query traverses an itinerary to collect answers Phase III: Answers returned to sink July 12, 2016 IGERT seminar 25 Simulation Results mobility model=random way point, average motion speed=1 mile/hour transmission range=100 meters report size=24 bytes, query size=16 bytes mean of reports database size=100 reports one location report produced at each sensor per second 100% accuracy 80% 60% MARKET 40% Upper bound of itinerary 20% 0% 1 2 3 4 5 6 peer density MARKET is especially suitable for sparse environments July 12, 2016 IGERT seminar 26 Talk outline Introduction System Model The MARKET Algorithm Evaluation Extension to CTS Conclusion and Future Work July 12, 2016 IGERT seminar 27 TrafficInfo: Disseminating Traffic Information in VANET’s July 12, 2016 IGERT seminar 28 What does relevance mean in TrafficInfo B B A A A report is relevant if it changes the route July 12, 2016 IGERT seminar 29 Which factors indicate relevance of report? Distance to the reported road segment Type of road segment Speed variance … July 12, 2016 IGERT seminar 30 Conceptual Learning Procedure An example is created for a received report The example is labeled positive if the report changes route and negative otherwise Individual vs. group How to deal with aggregation? July 12, 2016 IGERT seminar 31 Conclusion sensor-rich + short-range Query processing environment wireless Mobile P2P Store-and-forward enables in-network processing in mobile disconnected networks Ranking is important for dealing with memory, bandwidth, and energy constraints July 12, 2016 IGERT seminar 32 Future Work Multimedia reports Utilization of metadata Integration of stateless and stateful approaches Starvation/fairness July 12, 2016 IGERT seminar 33 Thanks! Questions? July 12, 2016 IGERT seminar 34 802.11 Basics 3 modes: transmitting, receiving, listening (order of power consumption) When listening: if detecting a message destined to host receive-mode Time divided into slots, 20microsecs each Transmission: Listen for 1 time slot If channel free start broadcast (observe collision possible) Broadcast may last for many time slots July 12, 2016 IGERT seminar 35 Energy Efficiency of a Broadcast successfully receive the broadcast from x X Collisions occur at neighbor Throughput (Th) = (expected number of neighbors that successfully receive broadcast) (broadcast size) Power efficiency (PE) = July 12, 2016 Th En IGERT seminar 36 Computation of Throughput X Y Conditions for successful reception at an arbitrary node Y 1. No green node inside starts to broadcast at the same time slot with X 2. No transmission from any purple node overlaps with that from X July 12, 2016 IGERT seminar 37 Energy Constraints Energy consumed by a 802.11 network interface for transmitting a message of size M bytes En=fM+g For 802.11 broadcast, g=26610-6 Joule, f=5.2710-6Joule/byte July 12, 2016 IGERT seminar 38 Experimental MP2P Projects (Pedestrians) 7DS – Columbia University (web pages) iClouds – Darmstadt Univ. (incentives) MoGATU – UMBC (specialized query processing, e.g., collaborative joins) PeopleNet – NUS, IIS-Bangalore (Mobile commerce, information type location baazar) MoB – Wisconsin, Cambridge (incentives, information resources e.g. bandwidth) Mobi-Dik – Univ. of Illinois, Chicago (brokering, physical resources, bandwidth/memory/power management) July 12, 2016 IGERT seminar 39 Vehicular Projects Inter-vehicle Communication and Intelligent Transportation: CarTALK 2000 is a European project VICS (The Vehicle Information and Control System) is a government-sponsored system in Japan with an 11-year track record FleetNet, an inter-vehicle communications system, is being developed by a consortium of private companies and universities in Germany IVI (Intelligent Vehicle Initiative) and VII (Vehicle Infrastructure Integration), the US DOT MP2P provides data management capabilities on top of these communication systems Grassroots, TrafficView, SOTIS, V3 – P2P dissemination of traffic info to reduce travel times July 12, 2016 IGERT seminar 40 RANk-based DIssemination (RANDI) Ranking of reports Bandwidth/energy aware Exchange enhances Consumer functionality Broker functionality Consumer: Answer local query (pull) Broker: Transmit reports most likely requested by future-encountered peers (push) Transmission trigger: Encounter New reports July 12, 2016 IGERT seminar 41 RANDI When two peers meet they conduct a two-phase exchange: Phase 1 answers satisfied as a consumer (pull) Phase 2 enhanced as a broker (push) Phase 1: Exchange queries and receive answers (pull) Phase 2: Exchange more reports using available energy/bandwidth (push) Combination of: unicast (thin line) and broadcast (thick lines) to enable overhearing. July 12, 2016 IGERT seminar 42 RANDI (Cont’d) To solve problem with static peers: Two interaction modes which combine pull and push new reports • Query-response: triggered by discovery of new neighbors • Relay: triggered by receipt of new reports Disseminate to existing neighbors July 12, 2016 IGERT seminar 43 7DS P2P mode: each node periodically broadcasts its query and receives reports from neighboring peers. No strategy to determine query frequency and transmission size. Cache management based on webpage expiration time. query query reports reports query July 12, 2016 query IGERT seminar 44 PeopleNet Reports are randomly selected for exchanging and saving upon encountering. Peer A Peer B Peer A Peer B random-spread before exchange Peer A Peer B after exchange Peer A Peer B random-swap July 12, 2016 before exchange IGERT seminar after exchange 45 7DS Each peer periodically broadcasts its query and receives reports from neighboring peers. No strategy to determine query frequency and transmission size. Cache management based on web-page expiration time. query query reports reports query July 12, 2016 query IGERT seminar 46 PeopleNet Reports are randomly selected for exchanging and saving upon encountering. Peer A Peer B Peer A before exchange July 12, 2016 Peer B after exchange IGERT seminar 47 Mobile Local Search: Applications transportation social networking (wearable website) Search for victims in a rubble asset management and tracking Sale on an item of interest at mall Music-file exchange emergency response Personal profile of interest at a convention Singles matchmaking Floating BBS mobile electronic commerce Announce sudden stop, malfunctioning brake light, patch of ice Floating car data Dissemination of multi-media traffic information (picture, video, voice) Search close-by taxi customer, parking slot, ride-share Sensors on containers exchange security information => remote checkpoints tourist and location-based-services Closest ATM July 12, 2016 IGERT seminar 48 Applications – Common features Mobile/stationary peers Resources of interest in a limited geographic area Short time duration Can be solved by fixed servers, but Unlikely solution Proposed mp2p paradigm can enhance fixed solution (reliability, performance, coverage) July 12, 2016 IGERT seminar 49 MARKET When two peers meet they conduct a two-phase exchange: Local query Phase 1 answers satisfied as a consumer (pull) Phase 2 enhanced as a broker (push) Phase 1: Exchange subscriptions and receive answers (pull) Phase 2: Exchange more publications using available energy/bandwidth (push) Combination of: unicast (thin line) and broadcast (thick lines) to enable overhearing. July 12, 2016 IGERT seminar 50 MARKET (Cont’d) To solve problem with static peers: Two interaction modes which combine pull and push new publications • Query-response: triggered by discovery of new neighbors • Relay: triggered by receipt of new publications Disseminate to existing neighbors July 12, 2016 IGERT seminar 51 Query in static disconnected network Q r Q A q Q A In-network query processing may not be possible July 12, 2016 IGERT seminar 52 Query in static connected sensor network Q A r Q A A A Q A A qA q Q Data transmission delay is 0. July 12, 2016 QA A Answer can be obtained instantaneously IGERT seminar 53 Query in static disconnected network Q r Q A q Q A In-network query processing may not be possible July 12, 2016 IGERT seminar 54 Query in mobile disconnected network Query processing enabled by mobility and store-and-forward qA r QA A q A One hop case July 12, 2016 IGERT seminar 55 Query in mobile disconnected network Q QA A r q Q A qA A The answer is in disseminated only after anitanswer node receives query Query Multil-hop can be case network processed, but is delayed Query processing doesn’t control motion. First alogrithm stage: query disseminated during encounter July 12, 2016 IGERT seminar 56