Enhancing Policy Decision Making with Large-Scale Digital Traces
Vanessa Frias-Martinez
University of Maryland
NFAIS, February 2014

5.9 billion 87%
3.2 billion unique users 45%
Mobile devices >> humans

Have you ever heard of DATIFICATION?
1. Yes
2. No

Mobile Digital Footprints… for Social Good?

Research Goal
To extract human behavioral information from mobile digital traces in order to assist decision makers in organizations working for social development

[Diagram: from research to decision makers]
RESEARCH
• MOBILE DIGITAL TRACES
• TOOLS: Data Mining, Machine Learning, Statistical
• BEHAVIORAL INSIGHTS: to enhance or complement information in an affordable manner
DECISION MAKERS
• Energy, Education, Health, Transportation, Safety
• Interviews, surveys: information to assist on policy decisions

Outline
• Cell Phone Data
• Projects with Social Impact
– CenCell
– AlertImpact

Cell Phone Data
• Call Detail Records
• Granularity: 1-4 km²
• Anonymized CDR: Caller | Callee | Date | Duration | Geolocation

Modeling Human Behavior
Over 270 variables
Consumption
• Number of calls, duration, frequency, SMS/MMS/voice
• Expenses
• Handset Type and Features
Social
• Degree of the social network
• Strength of the contacts (Reciprocity & Frequency)
• Geography of the social contacts
Mobility
• Mobility Patterns (Entropy)
• Diameter of mobility
• Radius of gyration (Home/Work)

CenCell
Cost-Effective Census Maps From Cell Phone Data

Motivation: Census Maps
• National Statistical Institutes
• Map legend: A/B, C+, C, D, E

Important Data Comes at a Price
• Expensive
• Low resource regions

Can the variables extracted from Call Detail Records be used as predictors of regional socioeconomic levels (SELs)?
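As an illustration of the mobility variables listed above, here is a minimal sketch of how radius of gyration and mobility entropy could be computed for one user. It assumes each call record has already been mapped to the position of its tower in a local planar projection (kilometres) and to a tower identifier. The function names and input layout are hypothetical and are not taken from the talk.

```python
import math
from collections import Counter


def radius_of_gyration(points):
    """Root-mean-square distance of the visited points from their centroid.

    `points` is a list of (x_km, y_km) tower positions, one entry per call
    record. Planar coordinates are an assumption of this sketch.
    """
    n = len(points)
    if n == 0:
        raise ValueError("need at least one point")
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    squared = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points)
    return math.sqrt(squared / n)


def mobility_entropy(tower_ids):
    """Shannon entropy (in bits) of how a user's records spread over towers."""
    counts = Counter(tower_ids)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A user whose records all come from a single tower gets a radius of gyration of 0 and an entropy of 0. Both values grow as activity spreads over more, and more distant, towers.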
Cost-effective Maps
• NSI carries out surveys
• NSI surveys subset of regions
• Cell Phone Data
• Forecasting Models
• REDUCE COSTS
• Predict the Present

Methodology

Classifying SELs - Training
SEL + Consumption, Social, Mobility (aggregated, 1-4 km²) → CLASSIFIER

Classifying SELs - Testing
Consumption, Social, Mobility (aggregated) → CLASSIFIER → SEL

Experimental Evaluation

Datasets
• Data for a city in Latin America (NSI)
– 1200 regions (GUs)
– SEL values from 0 to 100
• Call Detail Records
– 6 months, 500K customers
– City has 920 coverage areas
– 279 variables per coverage area

Evaluation Results
• Random Forests: 86%, 3 SELs (A, B, C)
• EM Clustering: 68%, 6 SELs (A, B, …, F)

Human Behavior and Census Variables
• Large Scale Quantitative Analysis
• Consumption, Social, Mobility

Insights
• Consumption Variables
• Mobility Variables

AlertImpact
Understanding the Impact of Health Alerts using Cell Phone Data

H1N1 Mexico Timeline
• Preflu
• Medical Alert: 17th April
• Closing Schools: 27th April
• Suspension: 1st May
• Reopen: 6th May

Can we measure the impact that government alerts had on the mobility of the population?
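One way such a measurement could be set up is sketched here, assuming mobility is summarized as the number of different towers (BTSs) a user connects through per day, and that an alert period is compared with the same dates in a baseline year. The record layout `(user_id, day, bts_id)` and all function names are hypothetical.

```python
from collections import defaultdict
from datetime import date


def daily_distinct_bts(records):
    """Map (user_id, day) to the number of different BTSs seen that day.

    `records` is an iterable of (user_id, day, bts_id) tuples, where `day`
    is a datetime.date. Users with no records on a day are simply absent.
    """
    seen = defaultdict(set)
    for user_id, day, bts_id in records:
        seen[(user_id, day)].add(bts_id)
    return {key: len(towers) for key, towers in seen.items()}


def mean_mobility(records, start, end):
    """Average distinct-BTS count over all user-days with start <= day <= end."""
    counts = [
        n
        for (_, day), n in daily_distinct_bts(records).items()
        if start <= day <= end
    ]
    if not counts:
        raise ValueError("no records in the requested period")
    return sum(counts) / len(counts)


def relative_change(baseline, observed):
    """Signed fractional change of `observed` with respect to `baseline`."""
    return (observed - baseline) / baseline
```

Calling `mean_mobility` once for the alert dates and once for the same dates in the baseline year, then passing both values to `relative_change`, gives a negative number when mobility dropped. Whether a drop is statistically significant would need a separate test that this sketch does not cover.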
Evaluation
• Call Records from 1st Jan till 31st May 2009
– Compute mobility as the number of different BTSs visited
• Stages
– Medical Alert - Stage 1 (17th-27th April)
– Closing Schools - Stage 2 (28th April-1st May)
– Suspension of Essential Activities - Stage 3 (1st May-6th May)
• Baselines
– Same periods, different year (2008)

Changes in Mobility
[Figure: mobility during the Alert, Closed, Suspension/Shutdown and Reopen periods compared with the Baseline; markers at April 27th, May 1st and May 6th]
• Mobility reduced between 10% and 30%

Changes in Epidemic Spreading
• Baseline (“preflu” behavior all weeks)
• Intervention (alert, closed, shutdown)
• Epidemic peak postponed 40 hours
• Number of infected agents in peak reduced by 10%

University Campus
• Statistically Significant Decrease during Stages 2 and 3

Airport
• Statistically Significant Increase during Stages 2 and 3

Take Away Message
• Geolocated traces allow us to quantitatively
– Model human behavior
– Measure behavioral changes
– Predict/Classify external sources of information

Future
• Enhance and complement the tools currently used by decision makers in organizations working for social good
– Use of open datasets, social media and other digital traces

Thanks!!
vfrias@umd.edu