Cairo Transport and Big Data

BIG DATA AND URBAN MOBILITY
Dr. Ahmad Ibrahim Mosa
Ministry Advisor for Transportation Planning
Director of Transportation Planning Center of Excellence –
Associate Professor, German University, Cairo
Mobility Figures for GCMA
The Socio-demographic Framework
• Car ownership will grow at 4.2% p.a.: from 1.3 million (2010) to 2.5 million (2022)
• Per capita income growth: 2.9%
• Households without access to a car: from 70% (2001) to 55% (2022)
GRDP and population estimates, with forecasts to 2050

Year | GRDP (L.E. mill.) | Av. annual rate of growth | Population (000 persons) | Av. annual rate of growth | GRDP per capita (L.E.) | Av. annual rate of growth | Actual and national G… (%) (1)
2004 | 133,190 | 4.1% | 15,415 | 2.20% | 8640 | - | 4.1%
2005 | 139,184 | 4.5% | 15,754 | 2.20% | 8835 | 2.3% | 4.5%
2006 | 148,648 | 6.8% | 16,101 | 2.20% | 9232 | 4.5% | 6.8%
2007 | 159,202 | 7.1% | 16,464 | 2.25% | 9670 | 4.7% | 7.1%
2008 | 170,665 | 7.2% | 16,836 | 2.26% | 10137 | 4.8% | 7.2%
2009 | 184,318 | 8.0% | 17,217 | 2.26% | 10706 | 5.6% | 3.6%
2010 | 199,063 | 8.0% | 17,606 | 2.26% | 11307 | 5.6% | 3.0%
2011 | 214,988 | 8.0% | 18,004 | 2.26% | 11941 | 5.6% | 3.8%
2012 | 232,187 | 8.0% | 18,411 | 2.26% | 12611 | 5.6% | 4.5%
2020 | 398,941 | 7.0% | 21,639 | 2.04% | 18436 | 4.9% | 7.0%
2030 | 714,442 | 6.0% | 25,387 | 1.61% | 28142 | 4.3% | 6.0%
2040 | 1,279,457 | 6.0% | 29,783 | 1.61% | 42959 | 4.3% | 6.0%
2050 | 2,291,313 | 6.0% | 34,941 | 1.61% | 65577 | 4.3% | 6.0%

(1) IMF actual. Actual GDP growth as recorded in IMF World Economic Outlook, April 2009.
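As a worked check of the GRDP table, GRDP per capita is GRDP divided by population once the units are aligned (million L.E. and thousand persons). The sketch below recomputes three rows; the function and variable names are illustrative and are not part of the original study.

```python
# GRDP (million L.E.) and population (thousand persons), copied from the table.
ROWS = {
    2004: (133_190, 15_415),
    2012: (232_187, 18_411),
    2050: (2_291_313, 34_941),
}


def grdp_per_capita(grdp_million_le: float, population_thousand: float) -> float:
    """Return GRDP per capita in L.E. after aligning the units."""
    return (grdp_million_le * 1_000_000) / (population_thousand * 1_000)


# Rounded to whole L.E., as in the table's per-capita column.
per_capita = {year: round(grdp_per_capita(g, p)) for year, (g, p) in ROWS.items()}
print(per_capita)
```

The rounded results agree with the per-capita column of the table for these years.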
Travel Demand (all modes), 2012-2027
96.4%
87.5%
Source: Greater Cairo Comprehensive Transport Master Plan, JICA 2002, updated by the Transport Planning Center of Excellence, Ministry of Transport, 2013
Cairo Mode share 2012
Data source: JICA Study Team. Data exclude Maadi and refer to unlinked trips derived from HIS prior to network calibration procedures.

Mode | Share | Mill. Trips/Day
Cooperative bus | 1.0% | 0.126
Suburban train | 0.6% | 0.078
Metro | 16.6% | 2.061
Tram | 1.4% | 0.175
Microbus (shared taxi) | 52.3% | 6.501
Public bus | 24.6% | 3.058
River taxi | 0.1% | 0.011
Minibus (public transport) | 3.4% | 0.426

Source: Greater Cairo Comprehensive Transport Master Plan, JICA 2002, updated by the Transport Planning Center of Excellence, Ministry of Transport, 2013
Expected increase in trips generated from the different traffic zones, up to 2027
Source: Greater Cairo Comprehensive Transport Master Plan study, updated by the Transport Planning Center of Excellence, Ministry of Transport, 2013
The Cost of Congestion in Greater Cairo

Cost Item | Value | Cost Bill/ Year | Cost / Individual / Year
Value of Travel Time | 2.2 Billion Hour/ Year | 14.7 | 1550
Reliability | 1.4 Billion Hour/ Year | 9.2 | 78
Excess Fuel | 1.9 Million lit./ Year | 6.6 | 56
CO2 Emissions | 7.1 Billion kg | 0.4 | 5
Total | – | 30.8 | 1689

Source: Traffic Congestion Cost Study, World Bank, 2011, and Greater Cairo Urban Transport Cost Study, Transport Planning Center of Excellence, Ministry of Transport, 2013
The Development Corridors
Optimized Scenarios
Types of Data Needed
MOSA A. I., 2010.
What are …?
Big Data
Significantly large volumes of data,
particularly involving human
activities and characteristics
The Three V’s
Big data is not only about the volume
of data but also its velocity and
variety
Analytics
High technology applied to data
processing, complex
calculations, and automation
Common Examples
Private/Public Sector
• Consumer behavior analysis
• Customer mailing lists/marketing
• Smartphone apps
• GPS
• Financial market trading
• Astronomical tracking/mapping
• Weather tracking/forecasting
• Genome mapping
• Crowd surveillance
• Monitoring electronic communications
• Data-mining online/wireless data (emails, texting, social media)
• Robots
Public Transportation
• Ridership forecasting
• Train signaling/dispatching
• Route planning/scheduling
• Automatic Vehicle Location (AVL)
• Passenger Information Systems
• Automated Fare Collection (AFC)
• Automated Passenger Counting (APC)
• Driverless Automatic Train Operation (ATO)
The phenomenon of mobile positioning
• Mobile positioning: locating (pinpointing) mobile telephones using radio waves
– Active mobile positioning: tracking the location of mobile phones in real time through a network of antennas
– Passive mobile positioning: uses location and activity information from historical log files stored by mobile service providers (for charging clients)
Activities in home network or when roaming
• voice calls
• SMSes/MMSes
• mobile-net usage
• data transmission operations
• mobile-supported GPS usage, etc.
Data file
• SIM card ID (statistical pseudonym)
• Date and time
• Antenna ID with location data
• Country ID
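The data file fields listed above can be sketched as a small record type. This is a minimal illustration only: the comma-separated log layout, the field names, the salted-hash pseudonym, and the sample values are all assumptions, since the source does not specify the operators' actual file format.

```python
from dataclasses import dataclass
from datetime import datetime
import hashlib


@dataclass(frozen=True)
class PositioningRecord:
    """One passive-positioning log entry (field names are hypothetical)."""
    sim_pseudonym: str   # statistical pseudonym, never the raw SIM ID
    timestamp: datetime  # date and time of the activity
    antenna_id: str      # antenna that handled the activity
    antenna_lat: float   # antenna location data
    antenna_lon: float
    country_id: str


def pseudonymize(sim_id: str, salt: str) -> str:
    """Replace a SIM ID with a salted hash so the same SIM maps to the same pseudonym."""
    return hashlib.sha256((salt + sim_id).encode("utf-8")).hexdigest()[:16]


def parse_log_line(line: str, salt: str) -> PositioningRecord:
    """Parse one line of an assumed CSV layout: sim,time,antenna,lat,lon,country."""
    sim_id, ts, antenna_id, lat, lon, country = line.strip().split(",")
    return PositioningRecord(
        sim_pseudonym=pseudonymize(sim_id, salt),
        timestamp=datetime.fromisoformat(ts),
        antenna_id=antenna_id,
        antenna_lat=float(lat),
        antenna_lon=float(lon),
        country_id=country,
    )


# Made-up sample line, for illustration only.
record = parse_log_line(
    "602011234567890,2013-05-01T08:15:00,ANT-0042,30.0444,31.2357,EG",
    salt="demo-salt",
)
print(record)
```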
Collect, Integrate, Manage and Disseminate
[Figure: Smart Integrated Corridor Management architecture. Labeled components: Data Collection (Link Data Info, Signaling Systems Info, Parking Info, CCTV Info, Weather Info, AVL Info, Transit Schedule Info, Transit Stop Info); SmartFusion (Data Fusion); Data Store; SmartSym (Traffic Modeling System, Predictive System, Traffic Simulation); SmartDSS (Expert System, Evaluation System); SmartEvent (GUI, Web Based Info Tool, Alert System); DSS DD Process; C2C; Data Dissemination (Public Web, RSS Media, VMS, HAR, IVR, Parking Info); ICM Coordinator; TMC Agency Users; System Administrator.]
Integrate data feeds from a wide variety of sources and have the tools to act on the assimilated information, including:
• Traffic
• Transit
• Human Behavior
• O-D
• Construction
• Incidents
• Special Events
• Parking
• Toll
• Weather
• Signal Plans
• Scenarios
Then provide the tools to manage the information effectively, including:
• Decision Support Tools
• Traffic Prediction, Simulation
• Inter Agency Communication
Then provide the dissemination platforms to deliver real-time, predictive, location-based, personalized information “proactively”
Big Data Is Needed For Cairo
O-D – HBW Matrix (HBW) Sample Output
Customized Origin and Destination Demand

Trip Distribution Table: For given Origin-Destination boundary pairs, estimates trip counts by day or daypart.

Origin | Destination | Day of the Week | Time of Day | Trips
90210 | 81743 | Monday | Morning | 431
90211 | 81743 | Monday | Morning | 129
90212 | 81744 | Tuesday | Afternoon | 523
90213 | 81744 | Tuesday | Evening | 904
… | … | … | … | …

Single Trip Frequency Table: For given Origin-Destination boundary pairs, estimates the number of people who made the trip with a certain frequency within a given time period.

Origin | Destination | Number of Trips | Count
90210 | 81724 | 1 | 5,725
90210 | 81724 | 2 | 1,143
90210 | 81724 | 3 | 274
90211 | 81724 | 1 | 92
90211 | 81724 | 2 | 27
… | … | … | …

2D Trip Frequency Table: A two-dimensional Trip Frequency Table correlates trip frequency counts from a single Origin to two different Destinations.

Destination Zip Code 81724 (rows) by Destination Zip Code 90381 (columns)
  | 0 Trips | 1 Trip | 2 Trips | 3 Trips | …
0 Trips | 14,378 | 2,728 | 1,721 | 221 | …
1 Trip | 1,207 | 397 | 113 | 52 | …
2 Trips | 403 | 114 | 84 | 27 | …
3 Trips | 172 | 64 | 31 | 14 | …
… | … | … | … | … | …

Trip Duration Table: Estimates the number of people who made trips of various durations between given Origin-Destination boundary pairs.

Origin | Destination | Trip Duration | Count
LA County, CA | Clark County, NV | 1 Day | 4,357
LA County, CA | Clark County, NV | 2 Days | 1,815
LA County, CA | Clark County, NV | 3 Days | 363
Orange County, CA | Clark County, NV | 1 Day | 254
Orange County, CA | Clark County, NV | 2 Days | 109
… | … | … | …
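A trip distribution table of this kind can be sketched as a simple count over trip records. The record layout, the daypart boundaries, and the function names below are assumptions made for illustration; the source does not describe how the sample output was computed.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def daypart(ts: datetime) -> str:
    """Map a timestamp to a daypart (hypothetical boundaries)."""
    if 5 <= ts.hour < 12:
        return "Morning"
    if 12 <= ts.hour < 17:
        return "Afternoon"
    return "Evening"


def trip_distribution(trips):
    """Count trips per (origin, destination, day of week, daypart)."""
    table = Counter()
    for origin, destination, start in trips:
        key = (origin, destination, DAY_NAMES[start.weekday()], daypart(start))
        table[key] += 1
    return table


# Made-up trips: (origin zone, destination zone, trip start time).
trips = [
    ("90210", "81743", datetime(2013, 5, 6, 8, 10)),   # a Monday morning
    ("90210", "81743", datetime(2013, 5, 6, 9, 45)),   # same O-D pair and daypart
    ("90212", "81744", datetime(2013, 5, 7, 14, 0)),   # a Tuesday afternoon
]
table = trip_distribution(trips)
for key, count in sorted(table.items()):
    print(*key, count)
```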
Trip Generation and Attraction
Demand on Public Transport
Passengers Boarding and Alighting at each station
Hourly Traffic Monitoring
Link By Link data

Time | Seg ID / Seg Num | Count | TT Calc | TT Min | TT Max | TT StdDev | TT %StdDev | Speed Calc | Speed Min | Speed Max | Speed StdDev | TT Dly | Spd Dly | Conf | Len
4/27/2010 10:00 | 119+09789 | 23.00 | 139.33 | 112.09 | 155.61 | 23.07 | 16.56 | 26.14 | 23.41 | 32.49 | 4.33 | 48.27 | 13.86 | 0.42 | 1.01
4/27/2010 10:01 | 119+09789 | 16.00 | 136.49 | 100.37 | 155.61 | 30.24 | 22.16 | 26.69 | 23.41 | 36.29 | 5.91 | 45.43 | 13.31 | 0.42 | 1.01
4/27/2010 10:02 | 119+09789 | 24.00 | 137.50 | 100.37 | 166.34 | 27.36 | 19.90 | 26.49 | 21.90 | 36.29 | 5.27 | 46.45 | 13.51 | 0.38 | 1.01
4/27/2010 10:03 | 119+09789 | 18.00 | 148.85 | 117.07 | 171.20 | 25.47 | 17.11 | 24.47 | 21.27 | 31.11 | 4.19 | 57.80 | 15.53 | 0.39 | 1.01
4/27/2010 10:04 | 119+09789 | 30.00 | 119.23 | 90.88 | 162.70 | 23.27 | 19.52 | 30.55 | 22.39 | 40.08 | 5.96 | 28.17 | 9.45 | 0.36 | 1.01
4/27/2010 10:05 | 119+09789 | 27.00 | 118.71 | 84.62 | 162.70 | 25.02 | 21.08 | 30.68 | 22.39 | 43.04 | 6.47 | 27.66 | 9.32 | 0.36 | 1.01
4/27/2010 10:06 | 119+09789 | 23.00 | 119.77 | 95.46 | 162.70 | 21.49 | 17.95 | 30.41 | 22.39 | 38.16 | 5.46 | 28.71 | 9.59 | 0.37 | 1.01
4/27/2010 10:07 | 119+09789 | 40.00 | 112.29 | 95.46 | 134.81 | 13.24 | 11.79 | 32.44 | 27.02 | 38.16 | 3.82 | 21.23 | 7.56 | 0.34 | 1.01
4/27/2010 10:08 | 119+09789 | 34.00 | 109.50 | 90.72 | 128.74 | 11.98 | 10.94 | 33.26 | 28.29 | 40.15 | 3.64 | 18.45 | 6.74 | 0.38 | 1.01
4/27/2010 10:09 | 119+09789 | 37.00 | 107.29 | 82.44 | 119.56 | 13.79 | 12.85 | 33.95 | 30.46 | 44.18 | 4.36 | 16.24 | 6.05 | 0.38 | 1.01
4/27/2010 10:10 | 119+09789 | 29.00 | 109.90 | 92.78 | 119.56 | 9.63 | 8.76 | 33.14 | 30.46 | 39.26 | 2.90 | 18.85 | 6.86 | 0.39 | 1.01

Route Data (aggregated Segments)

Time | Route ID | Count | Avg TT - Calc | Spd - Calc | TT - Dly | Spd - Dly | Conf
4/27/2010 15:18 | MO A-3 | 17.00 | 795.97 | 56.01 | 50.90 | 5.54 | 0.48
4/27/2010 15:19 | MO A-3 | 28.00 | 791.79 | 56.30 | 46.72 | 5.24 | 0.49
4/27/2010 15:20 | MO A-3 | 23.00 | 809.90 | 55.04 | 64.83 | 6.50 | 0.49
4/27/2010 15:21 | MO A-3 | 23.00 | 799.82 | 55.74 | 54.75 | 5.81 | 0.49
4/27/2010 15:22 | MO A-3 | 22.00 | 802.24 | 55.57 | 57.17 | 5.98 | 0.49
4/27/2010 15:23 | MO A-3 | 17.00 | 805.19 | 55.37 | 60.12 | 6.18 | 0.49
4/27/2010 15:24 | MO A-3 | 11.00 | 815.17 | 54.69 | 70.10 | 6.86 | 0.49
4/27/2010 15:25 | MO A-3 | 13.00 | 815.31 | 54.68 | 70.24 | 6.87 | 0.49
4/27/2010 15:26 | MO A-3 | 16.00 | 847.62 | 52.59 | 102.55 | 8.95 | 0.48
4/27/2010 15:27 | MO A-3 | 14.00 | 835.52 | 53.36 | 90.45 | 8.19 | 0.49
4/27/2010 15:28 | MO A-3 | 12.00 | 851.21 | 52.37 | 106.14 | 9.17 | 0.49
4/27/2010 15:29 | MO A-3 | 16.00 | 851.21 | 52.37 | 106.14 | 9.17 | 0.49
4/27/2010 15:30 | MO A-3 | 12.00 | 839.29 | 53.12 | 94.22 | 8.43 | 0.49
4/27/2010 15:31 | MO A-3 | 15.00 | 826.26 | 53.95 | 81.20 | 7.59 | 0.49
4/27/2010 15:32 | MO A-3 | 19.00 | 775.94 | 57.45 | 30.87 | 4.09 | 0.50
4/27/2010 15:33 | MO A-3 | 18.00 | 767.20 | 58.11 | 22.13 | 3.44 | 0.50
4/27/2010 15:34 | MO A-3 | 8.00 | 854.43 | 52.18 | 109.36 | 9.37 | 0.49
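The link-level columns appear to be related by simple arithmetic, which the following sketch reproduces for the 10:00 row. Several points are inferences rather than statements from the source: travel time is taken to be in seconds, length and speed to share a distance unit per hour, and the reference speed of 40 is deduced from the fact that Speed Calc plus Spd Dly equals 40.00 in every link row. The function name is illustrative.

```python
def segment_metrics(travel_time_s: float, length: float, reference_speed: float) -> dict:
    """Derive speed and delay figures for one segment and one time slot.

    travel_time_s   : measured travel time in seconds (assumed unit)
    length          : segment length, in the same distance unit as the speeds
    reference_speed : free-flow speed in distance units per hour (assumed 40)
    """
    speed = length / travel_time_s * 3600.0               # distance units per hour
    free_flow_time = length / reference_speed * 3600.0    # seconds at reference speed
    return {
        "speed": speed,
        "tt_delay": travel_time_s - free_flow_time,       # extra seconds vs free flow
        "speed_delay": reference_speed - speed,           # speed lost vs free flow
    }


# 10:00 row of the link table: TT Calc = 139.33, Len = 1.01 (rounded in the table).
metrics = segment_metrics(travel_time_s=139.33, length=1.01, reference_speed=40.0)
print({name: round(value, 2) for name, value in metrics.items()})
```

Because the table shows the length rounded to two decimals, the recomputed values differ slightly from the printed Speed Calc (26.14), TT Dly (48.27) and Spd Dly (13.86).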
Speed Monitoring and Congestion analysis
Unique Advantages for Transportation Planning
Mobility data provides significant advantages over other mobile location and monitoring technologies.
• More data and market coverage, by far, with data received continually from every active phone on every participating network.
• Mobility data is derived from actual observations of traffic and consumer movement as they happen, rather than a predictive “guess” based on limited data.
• Measures mobility in an “organic” way, without the behavior biases inherent in surveys, probe vehicles, or similar techniques.
• Data is readily available when and as needed to support either planned or ad hoc project needs.
• Big Data would offer significant cost savings, up to 60% or more, versus traditional mobility data collection.
Extracting Vehicular Data From
Moving Cell phones on Highways
GUC - Ministry of Transportation Team
Ahmed Mosa, Fadwa Fawzy
Proposed Method
• Our method is explained in three steps:
1. Traffic Data Generation.
2. Cell Phones Data Generation.
3. Dynamic Clustering Algorithm.
• As mentioned before, obtaining vehicle/cell phone data in Egypt is extremely difficult due to security restrictions.
• We used "Simulation of Urban MObility" (SUMO) to generate traffic data.
• SUMO is an open source, microscopic, multi-modal traffic simulation. It simulates how a given traffic demand, consisting of single vehicles, moves through a given road network, and it can address a large set of traffic management topics. It is purely microscopic: each vehicle is modeled explicitly, has its own route, and moves individually through the network.
Network Specifications
3 lanes highway segment
10 Km length
Max allowed speed 50 Km/sec
[Figure: 10 km highway segment with "In" and "Out" ends]
Vehicles Specifications
• We simulated the behavior of four types of vehicles:

Type | accel | decel | Length | Min-gap | Max-speed | sigma
0 | 3 | 6 | 5 | 2.5 | 100 | 0.5
1 | 2 | 6 | 7.5 | 2.5 | 60 | 0.5
2 | 1 | 5 | 5 | 2.5 | 65 | 0.5
3 | 4 | 5 | 7.5 | 2.5 | 80 | 0.5

• In this table, each vehicle type is specified by its acceleration (accel), deceleration (decel), length, min-gap, maximum allowed speed (in km/h), and driver behavior (sigma).
• Types 0, 1, 2, and 3 correspond to private car, truck, microbus, and bus, respectively.
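For illustration, the table above could be turned into SUMO vehicle type definitions along the following lines. This is a sketch, not the authors' configuration: the type ids are hypothetical, and the conversion from km/h to m/s assumes the table's speeds are entered in km/h while SUMO's `maxSpeed` attribute is given in m/s.

```python
import xml.etree.ElementTree as ET

# (type id, accel, decel, length, min gap, max speed in km/h, sigma) from the table.
# The type ids are made up for this example.
VEHICLE_TYPES = [
    ("private_car", 3, 6, 5, 2.5, 100, 0.5),
    ("truck", 2, 6, 7.5, 2.5, 60, 0.5),
    ("microbus", 1, 5, 5, 2.5, 65, 0.5),
    ("bus", 4, 5, 7.5, 2.5, 80, 0.5),
]


def build_vtypes(types) -> ET.Element:
    """Build a <routes> element holding one <vType> per vehicle type."""
    root = ET.Element("routes")
    for type_id, accel, decel, length, min_gap, max_kmh, sigma in types:
        ET.SubElement(root, "vType", {
            "id": type_id,
            "accel": str(accel),
            "decel": str(decel),
            "length": str(length),
            "minGap": str(min_gap),
            "maxSpeed": f"{max_kmh / 3.6:.2f}",  # km/h converted to m/s
            "sigma": str(sigma),
        })
    return root


vtypes = build_vtypes(VEHICLE_TYPES)
print(ET.tostring(vtypes, encoding="unicode"))
```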
SUMO Ins & Outs
• The format of the output.XML file generated by SUMO is as follows:

<timestep time="<COLLECTION_TIME>" id="<DETECTOR_ID>" vtype="<TYPE>">
    <vehicle id="<VEHICLE_ID>" lane="<LANE_ID>" pos="<POSITION_ON_LANE>" \
        x="<X-COORDINATE>" y="<Y-COORDINATE>" \
        lat="<LAT-COORDINATE>" lon="<LON-COORDINATE>" \
        speed="<VEHICLE_SPEED>"/>
    ... further vehicles ...
</timestep>
... further time steps ...
XML Parser
• To move on to the next step (cell phone data generation), we need to parse the vehicle data generated by SUMO.
• We used the MATLAB R2013a XML parser to extract the data from the XML file into the following table format:

Timestamp | Vehicle ID | X | Y | Speed | Vehicle Type
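The parsing step can be sketched in a few lines. The authors used MATLAB; the Python version below is a substitute for illustration, and the root element name and sample values are made up. It reads only the attributes shown in the output format (`time` and `vtype` on each timestep; `id`, `x`, `y`, `speed` on each vehicle).

```python
import xml.etree.ElementTree as ET

# Made-up sample in the documented timestep/vehicle layout; the root tag is hypothetical.
SAMPLE = """
<output>
  <timestep time="0.00" id="probe0" vtype="0">
    <vehicle id="veh0" lane="e0_1" pos="5.10" x="5.10" y="-1.65" speed="12.50"/>
  </timestep>
  <timestep time="1.00" id="probe0" vtype="0">
    <vehicle id="veh0" lane="e0_1" pos="17.80" x="17.80" y="-1.65" speed="12.90"/>
    <vehicle id="veh1" lane="e0_0" pos="5.10" x="5.10" y="-4.95" speed="11.00"/>
  </timestep>
</output>
"""


def parse_sumo_output(xml_text: str):
    """Return rows of (timestamp, vehicle id, x, y, speed, vehicle type)."""
    rows = []
    root = ET.fromstring(xml_text)
    for step in root.iter("timestep"):
        for veh in step.iter("vehicle"):
            rows.append((
                float(step.get("time")),
                veh.get("id"),
                float(veh.get("x")),
                float(veh.get("y")),
                float(veh.get("speed")),
                step.get("vtype"),
            ))
    return rows


rows = parse_sumo_output(SAMPLE)
for row in rows:
    print(row)
```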
Samples of The generated Vehicles
Generate Cellphones Locations
• For each vehicle, cell phones are randomly distributed around its location (x, y) within a certain diameter (D).
• At each time stamp, the number of cell phones for each vehicle does not change, but their locations around the vehicle change within D.
[Figure: cell phones scattered around a vehicle at (x, y); generated record fields: X, Y, Speed, Vehicle ID]
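One way to sketch this generation step is shown below. The slides only say the phones are "randomly distributed" within D, so sampling uniformly over a disc is an assumption, as are the vehicle ids, positions, phone counts, and the value of D.

```python
import math
import random


def scatter_phones(x, y, n_phones, diameter, rng):
    """Place n_phones uniformly at random inside a disc of the given diameter around (x, y)."""
    radius = diameter / 2.0
    phones = []
    for _ in range(n_phones):
        r = radius * math.sqrt(rng.random())        # sqrt gives uniform density over the disc
        theta = rng.uniform(0.0, 2.0 * math.pi)
        phones.append((x + r * math.cos(theta), y + r * math.sin(theta)))
    return phones


rng = random.Random(42)
D = 4.0                                             # hypothetical diameter
phones_per_vehicle = {"veh0": 3, "veh1": 1}         # fixed for the whole simulation
vehicle_track = {                                   # time stamp -> vehicle positions
    0: {"veh0": (100.0, 5.0), "veh1": (40.0, 1.7)},
    10: {"veh0": (230.0, 5.0), "veh1": (160.0, 1.7)},
}

# Phone counts stay constant; only the positions are re-drawn at each time stamp.
snapshots = {
    t: {vid: scatter_phones(x, y, phones_per_vehicle[vid], D, rng)
        for vid, (x, y) in vehicles.items()}
    for t, vehicles in vehicle_track.items()
}
print(snapshots[0]["veh0"])
```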
Samples of The generated Cell Phones
Snapshots
Snapshots (cont.)
Continuous Clustering of Moving Objects.
DYNAMIC CLUSTERING ALGORITHM
• In this work, a dynamic clustering algorithm is used to cluster the cell phones generated in the previous step.
• The cluster behavior can then be estimated and used as the vehicle behavior (under the assumption that a cluster of cell phones is a moving vehicle).
How We Cluster
• The clustering algorithm uses the cell phones' location and speed at each time step to predict their positions in the near future.
• This ensures that cell phones will remain part of their clusters in the near future. In addition, cluster split/merge actions can be predicted.
Object Modeling
• Each cell phone (object) is capable of transmitting its current location and velocity to a central server every U units of time (10 sec in our experiment).
• The server can use these data to predict the object location until the next update time.
• Every U units of time, the object sends the server the tuple $(OID, x_u, v_u, t_u)$. The new position at time $t > t_u$ is
\[ x(t) = x_u + v_u \, (t - t_u) \]
Cluster Modeling
• Clustering feature (CF)
It is a compact, incrementally maintainable data structure that summarizes a cluster and that can be used for computing the average radius of a cluster. The clustering feature of a cluster at time t is:
\[ CF = (N, CX, CX^2, CV, CV^2, CXV, t) \]
N: number of objects within the cluster.
\[ CX = \sum_{i=1}^{N} x_i(t), \qquad CX^2 = \sum_{i=1}^{N} x_i^2(t) \]
\[ CV = \sum_{i=1}^{N} v_i(t), \qquad CV^2 = \sum_{i=1}^{N} v_i^2(t) \]
\[ CXV = \sum_{i=1}^{N} x_i(t)\, v_i(t) \]
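CF Claims
• The CF at time $t$ can be updated to a new time $t_{now}$ based on its value at $t$ and $(t_{now} - t)$:
\[ CF' = \big(N,\; CX + CV\,(t_{now} - t),\; CX^2 + 2\,CXV\,(t_{now} - t) + CV^2\,(t_{now} - t)^2,\; CV,\; CV^2,\; CXV + CV^2\,(t_{now} - t),\; t_{now}\big) \]
• If an object given by $(OID, x, v, t)$ is inserted into or deleted from a cluster with clustering feature CF, the CF becomes
\[ CF' = (N \pm 1,\; CX \pm x,\; CX^2 \pm x^2,\; CV \pm v,\; CV^2 \pm v^2,\; CXV \pm xv,\; t) \]
with the upper sign for insertion and the lower sign for deletion.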
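The two CF claims can be sketched as a small class. This is a minimal one-dimensional illustration (position along the highway), not the authors' implementation; the class and method names are hypothetical. Objects passed to `insert` and `delete` are assumed to be expressed at the CF's current time `t`.

```python
from dataclasses import dataclass


@dataclass
class ClusteringFeature:
    """CF = (N, CX, CX^2, CV, CV^2, CXV, t) for scalar positions and velocities."""
    n: int = 0
    cx: float = 0.0    # sum of positions
    cx2: float = 0.0   # sum of squared positions
    cv: float = 0.0    # sum of velocities
    cv2: float = 0.0   # sum of squared velocities
    cxv: float = 0.0   # sum of position * velocity
    t: float = 0.0     # time the sums refer to

    def advance(self, t_now: float) -> None:
        """Move the summary from time t to t_now without touching individual objects."""
        dt = t_now - self.t
        # cx2 must be updated before cxv because it uses the old cxv.
        self.cx2 += 2.0 * self.cxv * dt + self.cv2 * dt * dt
        self.cxv += self.cv2 * dt
        self.cx += self.cv * dt
        self.t = t_now

    def _apply(self, sign: int, x: float, v: float) -> None:
        self.n += sign
        self.cx += sign * x
        self.cx2 += sign * x * x
        self.cv += sign * v
        self.cv2 += sign * v * v
        self.cxv += sign * x * v

    def insert(self, x: float, v: float) -> None:
        """Add an object whose position x and velocity v are given at time self.t."""
        self._apply(+1, x, v)

    def delete(self, x: float, v: float) -> None:
        """Remove an object whose position x and velocity v are given at time self.t."""
        self._apply(-1, x, v)


cf = ClusteringFeature()
cf.insert(0.0, 10.0)   # object at x = 0 moving at 10
cf.insert(4.0, 12.0)   # object at x = 4 moving at 12
cf.advance(2.0)        # two time units later the objects are at 20 and 28
print(cf)
```

Advancing the summary gives the same sums as recomputing them from the predicted positions 20 and 28, which is the point of the first claim.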
CF claims (cont.)
• Each cluster has a virtual moving center object given by $(OID, CX/N, CV/N, t)$.
• Each cluster has an average radius R(t), which represents the cluster compactness (ED denotes Euclidean distance):
\[ R(t) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} ED^2(object_i,\ center\ object)} \]
• $R(t_2)$ can be computed based on $R(t_1)$.
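As a short worked derivation, under the assumption of scalar positions, the average radius can be written directly in terms of the clustering feature, which is why no individual object needs to be revisited. Here $CX^2$ denotes the sum of squared positions, as in the CF definition, and the center position is $CX/N$.

```latex
\begin{align*}
R^2(t) &= \frac{1}{N}\sum_{i=1}^{N}\left(x_i(t) - \frac{CX}{N}\right)^2 \\
       &= \frac{1}{N}\sum_{i=1}^{N} x_i^2(t) \;-\; \frac{2\,CX}{N^2}\sum_{i=1}^{N} x_i(t) \;+\; \frac{CX^2_{\text{center}}}{1}
          \quad\text{with } CX^2_{\text{center}} = \left(\frac{CX}{N}\right)^2 \\
       &= \frac{CX^2}{N} - 2\left(\frac{CX}{N}\right)^2 + \left(\frac{CX}{N}\right)^2
        = \frac{CX^2}{N} - \left(\frac{CX}{N}\right)^2
\end{align*}
```

For example, two objects at positions 20 and 28 give $N = 2$, $CX = 48$, $CX^2 = 1184$, so $R^2 = 592 - 576 = 16$ and $R = 4$, which matches the distance of each object from the center at 24.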
Measuring Object Movement Dissimilarity
• Select $m \ge 1$ time stamps $t_1, \dots, t_m$. Each time stamp is associated with a weight $w_i$.
• The object location is predicted at each time stamp.
\[ \forall i \;\big(t_i < t_{i+1} \;\wedge\; t_{now} \le t_i \le t_{now} + U \;\wedge\; w_i \ge w_{i+1}\big), \qquad i = 1, \dots, m \]
The closer a time stamp is to $t_{now}$, the higher its weight $w_i$.
• Object positions are computed at the chosen time stamps. Given an object O, its positions at times $t_1, \dots, t_m$ are $x^{(1)}, \dots, x^{(m)}$.
• The dissimilarity function, with each squared Euclidean distance $ED^2$ evaluated at time $t_i$, between
Two objects:
\[ M(O_1, O_2) = \sum_{i=1}^{m} w_i \, ED^2(O_1, O_2) \]
Object and Cluster:
\[ M(O, C) = \frac{N}{N+1} \sum_{i=1}^{m} w_i \, ED^2(O,\ cluster\ center) \]
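The two dissimilarity functions can be sketched as follows for scalar positions. The tuple layout `(x_u, v_u, t_u)`, the chosen time stamps, and the weights are assumptions for illustration; positions at each time stamp come from the linear prediction $x(t) = x_u + v_u (t - t_u)$.

```python
def predict(x_u: float, v_u: float, t_u: float, t: float) -> float:
    """Linear prediction of an object's position at time t."""
    return x_u + v_u * (t - t_u)


def dissimilarity(obj1, obj2, times, weights) -> float:
    """M(O1, O2): weighted sum of squared distances at the chosen time stamps."""
    total = 0.0
    for t, w in zip(times, weights):
        d = predict(*obj1, t) - predict(*obj2, t)
        total += w * d * d
    return total


def object_cluster_dissimilarity(obj, center, n: int, times, weights) -> float:
    """M(O, C): same sum against the cluster's center object, scaled by N / (N + 1)."""
    return n / (n + 1) * dissimilarity(obj, center, times, weights)


# Hypothetical setup: t_now = 0, U = 10, three time stamps with non-increasing weights.
times = [0.0, 5.0, 10.0]
weights = [0.5, 0.3, 0.2]
phone_a = (0.0, 10.0, 0.0)   # (x_u, v_u, t_u)
phone_b = (4.0, 12.0, 0.0)

m_ab = dissimilarity(phone_a, phone_b, times, weights)
m_ac = object_cluster_dissimilarity(phone_a, phone_b, 3, times, weights)
print(m_ab, m_ac)
```

In this example the two phones drift apart (gaps of 4, 14 and 24), so later time stamps contribute most of the dissimilarity even though they carry lower weights.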
The Insertion Operation
• To insert an object O given by $(OID, x, v, t)$, find the cluster C with the closest center to O, using the M function.
• Introduce a threshold $\rho_g$, representing the maximum acceptable distance to the closest cluster.
1. Calculate M.
2. If $M > \rho_g$: create a new cluster for O, then stop.
3. Otherwise: add O to C and update the CF.
4. If a split is needed for C after adding O: check whether any of the new clusters can be merged with others.
5. Stop.
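The insertion flow can be sketched as below. This is a simplified illustration under several assumptions: clusters are plain lists of `(x, v)` pairs at a common time, the look-ahead offsets and weights are made up, the split policy (halving by position when capacity is exceeded) is a placeholder because the slides do not say how a split is performed, and the merge check after a split is omitted.

```python
OFFSETS = (0.0, 5.0, 10.0)   # hypothetical look-ahead times within U = 10 s
WEIGHTS = (0.5, 0.3, 0.2)    # non-increasing weights, as the method requires


def center(cluster):
    """Virtual center object (mean position, mean velocity) of a cluster."""
    n = len(cluster)
    return (sum(x for x, _ in cluster) / n, sum(v for _, v in cluster) / n)


def m_object_cluster(obj, cluster) -> float:
    """M(O, C) for scalar positions, using predicted positions at each offset."""
    n = len(cluster)
    cx, cv = center(cluster)
    x, v = obj
    total = sum(w * ((x + v * dt) - (cx + cv * dt)) ** 2
                for dt, w in zip(OFFSETS, WEIGHTS))
    return n / (n + 1) * total


def insert_object(clusters, obj, rho_g: float, max_size: int) -> str:
    """Insert obj and report the action taken: 'new', 'added' or 'split'."""
    if clusters:
        best = min(clusters, key=lambda c: m_object_cluster(obj, c))
        if m_object_cluster(obj, best) <= rho_g:
            best.append(obj)
            if len(best) > max_size:
                # Placeholder split: order by position and cut the cluster in half.
                best.sort()
                half = len(best) // 2
                clusters.append(best[half:])
                del best[half:]
                return "split"
            return "added"
    clusters.append([obj])   # too far from every cluster (or no cluster yet)
    return "new"


clusters = []
actions = [
    insert_object(clusters, (0.0, 10.0), rho_g=5.0, max_size=2),
    insert_object(clusters, (1.0, 10.0), rho_g=5.0, max_size=2),
    insert_object(clusters, (500.0, 10.0), rho_g=5.0, max_size=2),
    insert_object(clusters, (2.0, 10.0), rho_g=5.0, max_size=2),
]
print(actions, clusters)
```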
The Deletion Operation
• To delete an object O, a hash table is used to locate the cluster C that O belongs to. We then remove O and adjust the clustering feature of C.
Split and Merge
• Two situations exist where a cluster must be split:
1. When the number of objects in the cluster exceeds a user-specified threshold (i.e., the maximum cluster capacity).
2. When the average radius of the cluster exceeds a threshold $\rho_s$, which means that the cluster is not compact enough.
Here, the threshold $\rho_s$ can be defined by the users if they want to limit the cluster size, or estimated as the average radius of clusters.
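The two split conditions can be checked with a short helper. This sketch uses scalar positions and recomputes the radius from the member positions for clarity; the function names and the threshold values in the example are illustrative.

```python
import math


def average_radius(positions) -> float:
    """Root of the mean squared distance of the members from the cluster center."""
    n = len(positions)
    c = sum(positions) / n
    return math.sqrt(sum((x - c) ** 2 for x in positions) / n)


def needs_split(positions, max_capacity: int, rho_s: float) -> bool:
    """True if the cluster is over capacity or not compact enough."""
    return len(positions) > max_capacity or average_radius(positions) > rho_s


print(needs_split([20.0, 28.0], max_capacity=5, rho_s=3.0))      # radius 4 exceeds rho_s
print(needs_split([20.0, 28.0], max_capacity=5, rho_s=5.0))      # compact enough
print(needs_split([1.0, 2.0, 3.0], max_capacity=2, rho_s=100.0))  # over capacity
```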
Enabling high resolution traffic analysis
from cellular Big data
EJUST team: Ahmed El-Mahdy, Tetsuji Ogawa, Essam Algizawy
• Typical Cell Phone Service Provider Data:
– <Timestamp, UserID, AntennaID>
– … few billion records!
• The resolution is in km!
• Not suitable for measuring traffic at the road
level
Approach/Methodology
• Generate cellular big data via simulation
– SUMO simulator
– Consider the available city of Osnabrück
– Construct 2M records
• Build a Markovian model from the simulator
• Utilise Viterbi decoding to recover actual routes
[Figure: cell antennas and Viterbi paths, matching the simulation results]
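The decoding step can be illustrated with a generic Viterbi sketch in which hidden states are road segments and observations are antenna IDs. The team's actual model is not shown in the slides; every state name, antenna name, and probability below is invented for the example.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for the observations (log-space Viterbi)."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # Best log-probability of any path ending in each state after the first observation.
    best = {s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}
    back = []  # back[k][s] = best predecessor of state s at step k + 1
    for obs in observations[1:]:
        prev = best
        best, pointers = {}, {}
        for s in states:
            cand = max(states, key=lambda r: prev[r] + log(trans_p[r][s]))
            best[s] = prev[cand] + log(trans_p[cand][s]) + log(emit_p[s][obs])
            pointers[s] = cand
        back.append(pointers)

    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical three-segment road covered by two antennas, A and B.
states = ["r1", "r2", "r3"]
start_p = {"r1": 0.6, "r2": 0.3, "r3": 0.1}
trans_p = {
    "r1": {"r1": 0.5, "r2": 0.5, "r3": 0.0},
    "r2": {"r1": 0.0, "r2": 0.5, "r3": 0.5},
    "r3": {"r1": 0.0, "r2": 0.0, "r3": 1.0},
}
emit_p = {
    "r1": {"A": 0.9, "B": 0.1},
    "r2": {"A": 0.5, "B": 0.5},
    "r3": {"A": 0.1, "B": 0.9},
}

route = viterbi(["A", "A", "B", "B"], states, start_p, trans_p, emit_p)
print(route)
```

Even though each antenna covers more than one segment, the transition structure lets the decoder place the vehicle on a specific segment at each step, which is the motivation for recovering road-level routes from antenna-level data.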
On-Demand High Performance
Clusters on Mobile Phones
Joint collaborative R&D project between EJUST and IBM Center for Advanced Studies in Cairo: Ahmed El-Mahdy (EJUST), Hisham Elshishiny (IBM), Essam Algizawy (EJUST)
• Processing of big data is
distributed and happens
close to when the data is
sensed
• Mobile phones provide low-cost high performance computing
• We utilise the concept of
‘expiring’ threads to ease
migration issues
• The concept of ‘micromoney’ is utilised
Thank you