May Offermans, Martijn Tennekes, Alex Priem, Shirley Ortega and Nico Heerschap
1 Data Sources
‐ Event Data Records (EDR)
‐ Customer databases
2 Privacy and processing
3 Results
‐ Applications in statistics
• Daytime population
• Tourism
4 Conclusions
Call Detail Records/ Event Data Detail Records
Call detail records can contain many variables, such as:
– the phone number of the subscriber originating the call (calling party)
– the phone number receiving the call (called party)
– the starting time of the call (date and time)
– the call duration
– the billing phone number that is charged for the call
– the identification of the telephone exchange or equipment writing the record
– a unique sequence number identifying the record
– the disposition or the results of the call, indicating, for example, whether or not the call was connected
– call type (voice, SMS, etc.)
– Each exchange manufacturer decides which information is emitted on the tickets and how it is formatted. Examples:
– Timestamp
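The variables listed above could be held in a small record type. The following is a minimal sketch, not a vendor format: the field names, the dictionary keys and the `parse_cdr` helper are all hypothetical, since each exchange manufacturer decides its own layout.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class CallDetailRecord:
    """One call detail record; field names are illustrative assumptions."""
    calling_party: str     # phone number originating the call
    called_party: str      # phone number receiving the call
    start_time: datetime   # date and time the call started
    duration: timedelta    # call duration
    billing_number: str    # number charged for the call
    exchange_id: str       # exchange or equipment that wrote the record
    sequence_number: int   # unique sequence number of the record
    disposition: str       # result of the call, e.g. "connected"
    call_type: str         # "voice", "sms", ...

    @property
    def end_time(self) -> datetime:
        return self.start_time + self.duration


def parse_cdr(row: dict) -> CallDetailRecord:
    """Build a record from a dict of strings (hypothetical input layout)."""
    return CallDetailRecord(
        calling_party=row["calling_party"],
        called_party=row["called_party"],
        start_time=datetime.fromisoformat(row["start_time"]),
        duration=timedelta(seconds=int(row["duration_seconds"])),
        billing_number=row["billing_number"],
        exchange_id=row["exchange_id"],
        sequence_number=int(row["sequence_number"]),
        disposition=row["disposition"],
        call_type=row["call_type"],
    )
```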
Call Detail Records/ Event Data Detail Records
– Each month, 4 billion event data/detail records from 6-7 million users contain information on:
‐ Antenna location
‐ Time indicator
‐ In- or outgoing
‐ Technology information (data, SMS, call, ...; dual/UMTS)
‐ Roaming (foreign devices)
– Customer database (number of unique foreign callers per month)
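As an illustration of counting unique foreign devices per month, here is a minimal sketch. It assumes each roaming event is available as a tuple of device identifier, timestamp and home country; that layout and the function name are hypothetical, not the provider's actual format.

```python
from collections import defaultdict
from datetime import datetime


def unique_roamers_per_month(events):
    """Count distinct roaming devices per (month, home country).

    `events` is an iterable of (device_id, timestamp, home_country)
    tuples; this layout is an assumption for illustration.
    """
    seen = defaultdict(set)
    for device_id, timestamp, home_country in events:
        month = timestamp.strftime("%Y-%m")
        # A set keeps each device once, however many events it generates.
        seen[(month, home_country)].add(device_id)
    return {key: len(devices) for key, devices in seen.items()}
```

A device that generates many events in one month is counted once for that month, which is why the slides speak of devices rather than tourists.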
‐ Daytime population
‐ Mobility, of which tourism
‐ Safety
‐ Demographics
‐ Border traffic
‐ Economic activity
‐ Disaster management or safety planning
‐ Use of public services
‐ Sociology (calling patterns)
‐ Health
– Problems with big data
‐ Dynamic data source that keeps growing
‐ Daily change of antenna locations (4G)
‐ Software
‐ Transporting data
‐ Security issues
‐ Privacy
‐ Costs
Anonymized aggregated data
‐ Micro data from the mobile network will be transferred to a new server system.
‐ During this process the most sensitive variables are hashed or deleted.
‐ Only Mezuro has access to the process to collect aggregated anonymized data
[Diagram: processing steps, from input to output: traffic data (events = CDRs); replace user IDs (anonymisation, phase 1); automated 'blind' analysis; aggregation and validation (anonymisation, phase 2); validated output for mobility reporting]
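The two anonymisation phases named above could look roughly like the sketch below. It is not Mezuro's implementation: the keyed hash (HMAC-SHA256), the secret key, the event layout and the minimum cell size of 15 devices are all assumptions chosen for illustration.

```python
import hashlib
import hmac
from collections import defaultdict

SECRET_KEY = b"replace-with-a-secret-key"  # hypothetical key, kept by the processor
MIN_DEVICES = 15                           # hypothetical minimum cell size


def pseudonymise(user_id: str, key: bytes = SECRET_KEY) -> str:
    """Phase 1: replace a user ID with a keyed hash."""
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def aggregate(events, min_devices: int = MIN_DEVICES):
    """Phase 2: count distinct devices per (area, hour) cell.

    `events` is an iterable of (user_id, area, hour) tuples (assumed layout).
    Cells with fewer than `min_devices` devices are suppressed.
    """
    devices = defaultdict(set)
    for user_id, area, hour in events:
        devices[(area, hour)].add(pseudonymise(user_id))
    return {
        cell: len(ids)
        for cell, ids in devices.items()
        if len(ids) >= min_devices
    }
```

Only the aggregated counts leave the process in this sketch; the hashed identifiers are used internally to avoid double counting and are then discarded.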
– Advantages
‐ Safe, quick, cheap, limits the risks, and involves no personal data
– Disadvantages
‐ Does not fit current methodological practice
• No personal data, so it cannot be linked to other personal data.
• Persons are not followed directly
• No direct weighting
– ‘New’ statistics -> Daytime population
– Tourism statistics -> Inbound tourism
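One simple way to turn aggregated device counts into a daytime population indicator is to compare daytime and night-time counts for the same area. The sketch below is an assumption-laden illustration, not the method used for the results that follow: the hour windows and the function name are hypothetical, and device counts are not persons.

```python
from statistics import mean

DAY_HOURS = range(10, 16)   # assumed daytime window, 10:00-15:59
NIGHT_HOURS = range(0, 5)   # assumed night-time window, 00:00-04:59


def day_night_ratio(hourly_counts):
    """Ratio of mean daytime to mean night-time device counts.

    `hourly_counts` maps hour of day (0-23) to an aggregated device
    count for one area. A ratio below 1 suggests more devices are
    present at night than during the day.
    """
    day = mean(hourly_counts[h] for h in DAY_HOURS)
    night = mean(hourly_counts[h] for h in NIGHT_HOURS)
    return day / night
```

Because the ratio compares an area with itself, it sidesteps the missing weights to some extent, but it still describes devices of one provider rather than the population.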
Source: Vodafone/Mezuro, compiled by SN
Almere: commuter town?
Municipal Personal Records Database
Inbound tourism
Roaming data
– German tourists (= devices)
Source: Vodafone/Mezuro, compiled by SN
German tourists at the coast
Devices
Rainfall
Source: Vodafone/Mezuro, compiled by SN
Portuguese roaming data during the 2013 UEFA Europa League final, Benfica (Portugal) - Chelsea (England)
Source: Vodafone/Mezuro, compiled by SN
Source: Vodafone/Mezuro, compiled by SN
Different types of communication
Source: Vodafone/Mezuro, compiled by SN
– Potential
‐ Replace existing statistics and create new statistics
‐ Smaller areas and shorter time frames
‐ Events
‐ Also, if the 24-hour limit is dropped:
• Day trips and number of overnight stays
• Flows of tourists
• Tourist-related areas
– Trends rather than volumes (benchmarking)
– Privacy issues, but also access (telecom providers)
– New methodological issues/new framework (representativeness)
– Role of national statistical offices?
– Revolutionary or evolutionary?