Enhancing Policy Decision Making
with Large-Scale Digital Traces
Vanessa Frias-Martinez
University of Maryland
NFAIS, February 2014
5.9 billion (87%)
3.2 billion unique users (45%)
mobile devices >> humans
Have you ever heard of DATIFICATION?
1. Yes
2. No
Mobile Digital Footprints…
…for Social Good?
Research Goal
To extract human behavioral
information from mobile digital
traces in order to assist decision
makers in organizations working
for social development
RESEARCH
• Mobile digital traces
• Tools: Data Mining, Machine Learning, Statistical
• Behavioral insights: to enhance or complement information in an affordable manner
DECISION MAKERS
• Interviews, surveys: information to assist on policy decisions
• Energy, Education, Health, Transportation, Safety
OUTLINE
• Cell Phone Data
• Projects with Social Impact
– CenCell
– AlertImpact
Cell Phone Data
Call Detail Records
Granularity
1-4km²
Anonymized
CDR: Caller | Callee | Date | Duration | Geolocation
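The record layout above can be sketched as a small data structure. This is only an illustration: the pipe-delimited text form, the ISO date, the duration in seconds, and the geolocation given as a tower (BTS) identifier are all assumptions, since the slides do not state how the fields are encoded. The names `CallDetailRecord` and `parse_cdr` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CallDetailRecord:
    caller: str        # anonymized caller identifier (format assumed)
    callee: str        # anonymized callee identifier (format assumed)
    date: datetime     # start of the call
    duration_s: int    # call duration in seconds (unit assumed)
    tower_id: str      # coverage area / BTS handling the call (encoding assumed)


def parse_cdr(line: str) -> CallDetailRecord:
    """Parse one 'Caller | Callee | Date | Duration | Geolocation' line."""
    fields = [f.strip() for f in line.split("|")]
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    caller, callee, date, duration, tower = fields
    return CallDetailRecord(
        caller, callee, datetime.fromisoformat(date), int(duration), tower
    )


# Invented example record, for illustration only.
record = parse_cdr("a1f3 | 9c2e | 2009-04-17T10:30:00 | 125 | BTS_042")
```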
Modeling Human Behavior
Over 270 variables
Consumption
• Number of calls, duration, frequency, SMS/MMS/voice
• Expenses
• Handset Type and Features
Social
• Degree of the social network
• Strength of the contacts (Reciprocity & Frequency)
• Geography of the social contacts
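As a rough sketch of the social variables listed above, the degree, reciprocity and contact frequency of one user can be derived from (caller, callee) pairs. The definitions used here are assumptions, since the slides do not define them: degree is the number of distinct contacts, reciprocity is the share of contacts with calls in both directions, and frequency is the number of calls exchanged with each contact. The function name `social_variables` is hypothetical.

```python
from collections import defaultdict


def social_variables(calls, user):
    """Degree, reciprocity and per-contact frequency for `user`."""
    outgoing = defaultdict(int)  # contact -> calls made by user
    incoming = defaultdict(int)  # contact -> calls received by user
    for caller, callee in calls:
        if caller == user:
            outgoing[callee] += 1
        elif callee == user:
            incoming[caller] += 1
    contacts = set(outgoing) | set(incoming)
    reciprocal = set(outgoing) & set(incoming)
    degree = len(contacts)
    reciprocity = len(reciprocal) / degree if degree else 0.0
    frequency = {c: outgoing[c] + incoming[c] for c in contacts}
    return degree, reciprocity, frequency


# Invented toy calls: "u" talks with "a" in both directions, only calls "b".
calls = [("u", "a"), ("a", "u"), ("u", "b"), ("u", "a")]
degree, reciprocity, frequency = social_variables(calls, "u")
```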
Mobility
• Mobility Patterns (Entropy)
• Diameter of mobility
• Radius of gyration (Home/Work)
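The three mobility variables above can be sketched as follows. The definitions are common ones and are assumed here, not taken from the slides: entropy is the Shannon entropy of visits over towers, the diameter is the largest distance between two visited locations, and the radius of gyration is the root mean square distance from the centroid of the visited locations. Tower positions are assumed to be planar (x, y) coordinates in km, and the Home/Work variant mentioned on the slide is not modeled.

```python
import math
from collections import Counter
from itertools import combinations


def mobility_entropy(visits):
    """Shannon entropy (bits) of the distribution of visits over towers."""
    counts = Counter(visits)
    total = len(visits)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def radius_of_gyration(points):
    """Root mean square distance of visited points from their centroid."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    squared = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points)
    return math.sqrt(squared / len(points))


def diameter(points):
    """Largest distance between any two visited points."""
    return max((math.dist(p, q) for p, q in combinations(points, 2)), default=0.0)


# Invented toy data: two towers visited equally often, 5 km apart.
entropy = mobility_entropy(["A", "A", "B", "B"])
gyration = radius_of_gyration([(0.0, 0.0), (3.0, 4.0)])
span = diameter([(0.0, 0.0), (3.0, 4.0)])
```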
CenCell
Cost-Effective Census Maps
From Cell Phone Data
Motivation: Census Maps
[Map legend: socioeconomic levels A/B, C+, C, D, E]
National Statistical Institutes
Important Data Comes at a Price
Expensive
Low resource regions
Can the variables extracted from Call
Detail Records be used as predictors of
regional socioeconomic levels (SELs)?
Cost-effective Maps
• NSI carries out surveys
• NSI surveys subset of regions
• Cell Phone Data
• Forecasting Models
REDUCE COSTS
Predict the Present
Methodology
Classifying SELs - Training
• Consumption, Social, Mobility variables (aggregated, 1-4km²) + SEL -> CLASSIFIER
Classifying SELs - Testing
• Consumption, Social, Mobility variables (aggregated) -> CLASSIFIER -> SEL
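The training and testing steps can be illustrated with a deliberately simple stand-in. The sketch below uses a nearest-centroid classifier, which is not the method reported in the talk; it only shows the shape of the pipeline, where each region is one vector of aggregated variables and training regions carry a known SEL. The three features and all values are invented toy data.

```python
import math
from collections import defaultdict


def train(features, labels):
    """Return one mean feature vector (centroid) per SEL label."""
    grouped = defaultdict(list)
    for vector, label in zip(features, labels):
        grouped[label].append(vector)
    return {
        label: [sum(col) / len(vectors) for col in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(centroids, vector):
    """Assign the SEL whose centroid is closest to the region's vector."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vector))


# Toy regions: [mean calls per user, mean contacts, mean radius of gyration]
train_x = [[9.0, 30.0, 12.0], [8.0, 28.0, 11.0], [3.0, 10.0, 4.0], [2.0, 12.0, 5.0]]
train_y = ["A", "A", "C", "C"]
model = train(train_x, train_y)

# Regions without survey data get a predicted SEL.
predicted = [predict(model, x) for x in [[8.5, 29.0, 10.0], [2.5, 9.0, 3.0]]]
```

In practice the features would need scaling before a distance-based method is used; that step is left out to keep the sketch short.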
Experimental Evaluation
Datasets
• Data for a city in Latin America (NSI)
– 1200 regions (GUs)
– SEL values from 0..100
• Call Detail Records
– 6 months, 500K customers
– City has 920 coverage areas
– 279 variables per coverage area
Evaluation Results
• Random Forests: 86%, 3 SELs (A,B,C)
• EM Clustering: 68%, 6 SELs (A,B,…,F)
Human Behavior and Census Variables
Large Scale Quantitative Analysis
Consumption
Social
Mobility
Insights
Consumption Variables
Mobility Variables
AlertImpact
Understanding the Impact of Health
Alerts using Cell Phone Data
H1N1 Mexico Timeline
• Preflu
• Medical Alert: 17th April
• Closing Schools: 27th April
• Suspension: 1st May
• Reopen: 6th May
Can we measure the impact that
government alerts had on the
mobility of the population?
Evaluation
• Call Records from 1st Jan till 31st May 2009
– Compute mobility as the number of different BTSs visited
• Stages
– Medical Alert - Stage 1 (17th-27th April)
– Closing Schools - Stage 2 (28th April-1st May)
– Suspension of Essential Activities - Stage 3 (1st May-6th May)
• Baselines
– same periods, different year (2008)
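The mobility measure and the comparison with the baseline year can be sketched as below. This assumes records reduced to (user, day, BTS) tuples and mobility averaged per user per day; the slides do not specify the aggregation, so both are assumptions, and the function names are hypothetical. Days on which a user makes no calls are not counted.

```python
from collections import defaultdict
from datetime import date


def mean_daily_towers(records, start, end):
    """Average number of distinct BTSs visited per user per day in [start, end]."""
    towers = defaultdict(set)  # (user, day) -> set of BTS ids
    for user, day, bts in records:
        if start <= day <= end:
            towers[(user, day)].add(bts)
    if not towers:
        return 0.0
    return sum(len(s) for s in towers.values()) / len(towers)


def percent_change(stage_value, baseline_value):
    """Relative change of the stage value with respect to the baseline."""
    return 100.0 * (stage_value - baseline_value) / baseline_value


# Invented toy records: one user, same calendar day in 2008 and 2009.
records_2008 = [("u1", date(2008, 4, 17), b) for b in ("b1", "b2", "b3", "b4")]
records_2009 = [("u1", date(2009, 4, 17), b) for b in ("b1", "b2", "b3", "b1")]

baseline = mean_daily_towers(records_2008, date(2008, 4, 17), date(2008, 4, 27))
stage_1 = mean_daily_towers(records_2009, date(2009, 4, 17), date(2009, 4, 27))
change = percent_change(stage_1, baseline)
```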
Changes in Mobility
Mobility reduced between 10% and 30%
[Figure: mobility by stage (Alert, Closed, Suspension/Shutdown, Reopen) against Baseline; stage boundaries at April 27th, May 1st, May 6th]
Changes in Epidemic Spreading
• Baseline (“preflu” behavior all weeks)
• Intervention (alert, closed, shutdown)
• Epidemic peak postponed 40 hours
• Reduced number of infected agents in peak by 10%
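To illustrate why reduced mobility can both delay and lower an epidemic peak, here is a minimal compartmental SIR sketch. It is not the agent-based simulation behind the results above: the model, the parameter values, and the idea of representing the intervention as a lower contact rate `beta` are all assumptions chosen for illustration, and the numbers it produces are not comparable to the reported ones.

```python
def sir_peak(beta, gamma=0.1, i0=0.001, steps=2000, dt=0.1):
    """Euler-integrate an SIR model; return (time of peak, peak infected fraction)."""
    s, i = 1.0 - i0, i0
    peak_t, peak_i = 0.0, i
    for step in range(1, steps + 1):
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        if i > peak_i:
            peak_t, peak_i = step * dt, i
    return peak_t, peak_i


# Baseline: unchanged contact rate. Intervention: contact rate reduced by 20%.
baseline_t, baseline_peak = sir_peak(beta=0.5)
intervention_t, intervention_peak = sir_peak(beta=0.4)
```

With these assumed parameters the intervention run peaks later and at a smaller infected fraction than the baseline run, which is the qualitative effect described on the slide.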
University Campus
• Statistically significant decrease during Stages 2 and 3
Airport
• Statistically significant increase during Stages 2 and 3
Take Away Message
• Geolocated traces allow us to quantitatively
– Model human behavior
– Measure behavioral changes
– Predict/Classify external sources of information
Future
• Enhance and complement the tools currently
used by decision makers in organizations
working for social good
– Use of open datasets, social media and other
digital traces
Thanks!!
vfrias@umd.edu