NextPlace: A Spatio-Temporal Prediction Framework for Pervasive Systems
Salvatore Scellato¹, Mirco Musolesi, Cecilia Mascolo¹, Vito Latora, and Andrew T. Campbell²
¹ Computer Laboratory, University of Cambridge, UK
² Department of Computer Science, Dartmouth College, USA
Pervasive'11

Motivation
• The ability to predict people's future locations enables
▫ A rich set of novel pervasive applications and systems, e.g., advertisement and leisure event reports and notifications
▫ With pervasive technology, these could be implemented more effectively, avoiding the delivery of information to uninterested users and providing a better user experience

NextPlace
• A new prediction framework based on nonlinear time series analysis
▫ For forecasting user behavior
▫ In different locations
▫ From a spatio-temporal point of view
• Estimates the duration of a visit to a certain location and of the intervals between two subsequent visits
▫ That is, when users visit their most important places
• Does not focus on the transitions between different locations

Predictability & Intuition
• Any prediction of future user behavior is based on the assumption of determinism
▫ Determinism: future events are determined by past events
• Human activities are characterized by a certain degree of regularity and predictability
▫ Because daily and weekly routines are well established in human societies
• Intuition
▫ The sequence of important locations that an individual visits every day is more or less fixed
▫ Example: if a woman periodically goes to the gym on Mondays and Thursdays, she may change her routine for those days, but the changed routine will be more or less the same over different weeks.
• Two steps of NextPlace
▫ How to isolate the user's significant places
▫ How to estimate future times of arrival and residence times in the different significant places
• The detailed nonlinear prediction model is omitted here

Significant Place Extraction: GPS
• Many solutions have been presented in the literature [2,11,18]
• Intuition: the time spent at a place is directly proportional to the importance that the user attributes to it
• Approach: 2-D Gaussian distribution weighted by the residence time at each GPS point
▫ The value of the variance for the Gaussian distributions: 10 meters
• Frequency map
▫ Contains peaks which give information about the position of popular locations
▫ Significant places: regions that are above a certain threshold T
[Figure: (a) Frequency map; (b) Significant places]

Significant Place Extraction: WiFi
• Intuition: the most frequently seen access points are natural candidates to represent significant places
• Methodology
▫ An access point is a significant place if the user has a sequence of at least n visits to it
▫ In this setting: n = 20

Predicting User Behavior
• Algorithm description
▫ The history of visits of a user to each of their significant locations is considered
▫ For each location, try to predict when the next visits will take place and for how long they will last
• Procedure
▫ Create two time series from the sequence of previous visits
  C = (c_1, c_2, …, c_n), time series of the visit start times
  D = (d_1, d_2, …, d_n), time series of the visit durations
▫ Search the time series C for sequences of m consecutive values (c_{i-m+1}, …, c_i) that are closely similar to the last m values (c_{n-m+1}, …, c_n)
▫ Estimate the next value of time series C by averaging all the values c_{i+1} that follow each found sequence
▫ Select the corresponding sequences (d_{i-m+1}, …, d_i)
▫ Estimate the next value of time series D by averaging all the values d_{i+1} that follow these sequences
• Does not consider the type of place visited, the purpose of the visit, correlations between visited places, …

Example
• Last three visits of a certain user to a location
▫ Monday at 6:30pm
▫ Monday at 10:00pm
▫ Tuesday at 8:15am
• Find sequences that are numerically close to (6:30pm, 10:00pm, 8:15am)
▫ e.g., (6:10pm, 9:50pm, 8:35am) and (6:35pm, 10:10pm, 8:00am)
• Assume that the next visits that follow these subsequences
▫ Start at 1:10pm and 12:40pm
▫ Last for 40 and 30 minutes
• Estimate the next visit at 12:55pm for 35 minutes

Validation: Datasets
• Cabspotting: movement traces of taxi cabs in San Francisco, with GPS coordinates of approximately 500 taxis
• CenceMe GPS: collected during the deployment of CenceMe [21] at Dartmouth College, with GPS
• Dartmouth WiFi: extracted from SNMP logs of the WiFi LAN of the Dartmouth College campus
• Ile Sans Fils: a non-profit organization which operates a network of free WiFi hotspots in Montreal, Canada; over 45,000 users and 140 hotspots

Validation: Careful Choice of Suitable Threshold
• GPS-based: threshold T for the frequency map
▫ T: a fraction of the maximum value of the frequency map
▫ T = 0.10 for Cabspotting
▫ T = 0.15 for CenceMe GPS

Validation: Predictability Test
• Mean quadratic prediction error
▫ $\varepsilon = \frac{1}{N} \sum_{n=1}^{N} (s_n - p_n)^2$
▫ s_n = time series values
▫ p_n = predicted values
• Predictability error: ε / σ², where σ² is the variance of the time series
▫ If this ratio is close to 1, the mean quadratic prediction error is large: no determinism is present
▫ If this ratio is close to 0, the mean quadratic prediction error is small: a high degree of determinism

Evaluation
• Methodology
▫ NPm: NextPlace with m = 1, 2, 3
▫ M1, M2: first-order and second-order Markov-based predictors
▫ L: NextPlace with linear predictor
• Definition of correctness
▫ Suppose we predict, at time T, that the user will be at location L at time TP = T + ΔT
▫ The prediction is correct only if the user is at L at any time during the interval [TP - θ, TP + θ]
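
The procedure from the Predicting User Behavior slide can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the slides omit the detailed nonlinear model, so the function names (`predict_next_visit`, `hhmm`), the tolerance parameter `eps`, the use of minutes since midnight as the time unit, and the "largest absolute difference" closeness test are all assumptions made here. The history `C` is built to contain the two matching subsequences from the Example slide; the durations in `D` other than 40 and 30 are made-up filler values.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple


def predict_next_visit(
    starts: Sequence[float],
    durations: Sequence[float],
    m: int = 3,
    eps: float = 30.0,
) -> Optional[Tuple[float, float]]:
    """Predict (start, duration) of the next visit, or None if nothing matches.

    starts/durations play the role of the time series C and D.
    eps is a hypothetical tolerance (same unit as starts), not from the slides.
    """
    n = len(starts)
    if len(durations) != n or m < 1 or n <= m:
        return None
    last = starts[n - m:]  # the last m values (c_{n-m+1}, ..., c_n)
    next_starts, next_durations = [], []
    # i is the 0-based index of the final element of each candidate window;
    # it stops at n - 2 so that a following value always exists.
    for i in range(m - 1, n - 1):
        window = starts[i - m + 1 : i + 1]
        # Assumed closeness test: largest element-wise difference within eps.
        if max(abs(w - l) for w, l in zip(window, last)) <= eps:
            next_starts.append(starts[i + 1])
            next_durations.append(durations[i + 1])
    if not next_starts:
        return None
    # Average the values that followed each matching window.
    return mean(next_starts), mean(next_durations)


def hhmm(minutes: float) -> str:
    """Format minutes since midnight as HH:MM (24-hour clock)."""
    total = int(round(minutes))
    return f"{total // 60:02d}:{total % 60:02d}"


# Visit start times in minutes since midnight. The history contains
# (18:10, 21:50, 08:35) -> 13:10 and (18:35, 22:10, 08:00) -> 12:40,
# and ends with the last three visits (18:30, 22:00, 08:15).
C = [1090, 1310, 515, 790, 1115, 1330, 480, 760, 1110, 1320, 495]
# Visit durations in minutes; only the 40 and 30 come from the example.
D = [60, 45, 20, 40, 50, 55, 25, 30, 65, 50, 15]

prediction = predict_next_visit(C, D, m=3, eps=30)
if prediction is not None:
    start, duration = prediction
    print(f"next visit at {hhmm(start)} for {duration:g} minutes")
    # prints "next visit at 12:55 for 35 minutes"
```

With these inputs the sketch reproduces the 12:55pm, 35-minute estimate from the Example slide. It compares times of day directly, so it does not handle wrap-around at midnight (11:55pm and 12:05am count as far apart); a fuller version would need a circular distance.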