BIG DATA AND URBAN MOBILITY
Cairo Transport and Big Data
Dr. Ahmad Ibrahim Mosa
Ministry Advisor for Transportation Planning
Director of Transportation Planning Center of Excellence – Associate Professor, German University, Cairo

Mobility Figures for GCMA

The Socio-demographic Framework
• Car ownership will grow at 4.2% p.a., from 1.3 million (2010) to 2.5 million (2022)
• Per capita income growth: 2.9%
• Households without access to a car: from 70% (2001) to 55% (2022)

GRDP and population estimates, 2050 forecasts
Year | GRDP (L.E. mill.) | Av. annual rate of growth | Population (000 persons) | Av. annual rate of growth | GRDP per capita (L.E.) | Av. annual rate of growth | Actual and national GDP growth % (1)
2004 | 133,190 | 4.1% | 15,415 | 2.20% | 8640 | | 4.1%
2005 | 139,184 | 4.5% | 15,754 | 2.20% | 8835 | 2.3% | 4.5%
2006 | 148,648 | 6.8% | 16,101 | 2.20% | 9232 | 4.5% | 6.8%
2007 | 159,202 | 7.1% | 16,464 | 2.25% | 9670 | 4.7% | 7.1%
2008 | 170,665 | 7.2% | 16,836 | 2.26% | 10137 | 4.8% | 7.2%
2009 | 184,318 | 8.0% | 17,217 | 2.26% | 10706 | 5.6% | 3.6%
2010 | 199,063 | 8.0% | 17,606 | 2.26% | 11307 | 5.6% | 3.0%
2011 | 214,988 | 8.0% | 18,004 | 2.26% | 11941 | 5.6% | 3.8%
2012 | 232,187 | 8.0% | 18,411 | 2.26% | 12611 | 5.6% | 4.5%
2020 | 398,941 | 7.0% | 21,639 | 2.04% | 18436 | 4.9% | 7.0%
2030 | 714,442 | 6.0% | 25,387 | 1.61% | 28142 | 4.3% | 6.0%
2040 | 1,279,457 | 6.0% | 29,783 | 1.61% | 42959 | 4.3% | 6.0%
2050 | 2,291,313 | 6.0% | 34,941 | 1.61% | 65577 | 4.3% | 6.0%
(1) IMF actual. Actual GDP growth as recorded in IMF World Economic Outlook, April 2009.

Travel Demand (all modes), 2012-2027
[Figure; values shown: 96.4% and 87.5%]
Source: Greater Cairo Comprehensive Transport Master Plan, JICA 2002, and update by the Transportation Planning Center of Excellence, Ministry of Transport, 2013

Cairo Mode Share 2012
Data source: JICA Study Team. Data exclude Maadi and refer to unlinked trips derived from HIS prior to network calibration procedures.
Mode | Share | Trips (million/day)
Cooperative bus | 1.0% | 0.126
Suburban rail | 0.6% | 0.078
Metro | 16.6% | 2.061
Tram | 1.4% | 0.175
Microbus (shared taxi) | 52.3% | 6.501
Public transport bus | 24.6% | 3.058
River taxi | 0.1% | 0.011
Minibus (public transport) | 3.4% | 0.426
Source: Greater Cairo Comprehensive Transport Master Plan, JICA 2002, and update by the Transportation Planning Center of Excellence, Ministry of Transport, 2013

Expected Increase in Trips Generated from the Different Traffic Zones, 2027
[Figure]
Source: Greater Cairo Comprehensive Transport Master Plan study, update by the Transportation Planning Center of Excellence, Ministry of Transport, 2013

The Cost of Congestion in Greater Cairo
Cost Item | Value | Cost (Bill/Year) | Cost / Individual / Year
Value of Travel Time | 2.2 Billion Hour/Year | 14.7 | 1550
Reliability | 1.4 Billion Hour/Year | 9.2 | 78
Excess Fuel | 1.9 Million lit./Year | 6.6 | 56
CO2 Emissions | 7.1 Billion kg | 0.4 | 5
Total | – | 30.8 | 1689
Source: Traffic Congestion Cost Study, World Bank 2011, and Urban Transport Cost Study for Greater Cairo, Transportation Planning Center of Excellence, Ministry of Transport, 2013

Cost of Traffic Congestion in Greater Cairo
[Figure]
Source: Traffic Congestion Cost Study, World Bank 2011, and Urban Transport Cost Study for Greater Cairo, Transportation Planning Center of Excellence, Ministry of Transport, 2013

The Development Corridors

Optimized Scenarios

Types of Data Needed
MOSA A. I., 2010.

What are …?
Big Data: Significantly large volumes of data, particularly involving human activities and characteristics
The Three V's: Big data is not only about the volume of data but also its velocity and variety
Analytics: High technology applied to data processing, complex calculations, and automation

Common Examples
Private/Public Sector
• Consumer behavior analysis
• Customer mailing lists/marketing
• Smartphone apps
• GPS
• Financial market trading
• Astronomical tracking/mapping
• Weather tracking/forecasting
• Genome mapping
• Crowd surveillance
• Monitoring electronic communications
• Data-mining online/wireless data (emails, texting, social media)
• Robots
Public Transportation
• Ridership forecasting
• Train signaling/dispatching
• Route planning/scheduling
• Automatic Vehicle Location (AVL)
• Passenger Information Systems
• Automated Fare Collection (AFC)
• Automated Passenger Counting (APC)
• Driverless Automatic Train Operation (ATO)

The phenomenon of mobile positioning
• Mobile positioning: locating (pinpointing) mobile telephones using radio waves
– Active mobile positioning: tracking the location of mobile phones in real time through a network of antennas
– Passive mobile positioning: uses location and activity information from historical log files stored by mobile service providers (for charging clients)

Activities in home network or when roaming
• voice calls
• SMSes/MMSes
• mobile-net usage
• data transmission operations
• mobile supported GPS usage, etc.
Data file
• SIM card ID (statistical pseudonym)
• Date and time
• Antenna ID with location data
• Country ID

Collect, Integrate, Manage and Disseminate
[Figure: Smart Integrated Corridor Management architecture. Labels shown include Public Web, RSS, Media, VMS, HAR, IVR, Agency Users, ICM Coordinator, TMC, SmartDSS, Evaluation System, Expert System, SmartFusion, SmartEvent, SmartSym Traffic Modeling System, Data Dissemination, Data Store, Data Collection, Data Fusion, Predictive System, Traffic Simulation, Alert System, C2C, Administrator, and input feeds for Signaling, Parking Info, Link Data, CCTV, Weather Info, AVL Info, Transit Schedule and Transit Stop Info]

Integrate data feeds from a wide variety of sources and have the tools to act on the assimilated information, including: Traffic, Transit, Human Behavior, O-D, Construction, Incidents, Special Events, Parking, Toll, Weather, Signal Plans, Scenarios

Then provide the tools to manage the information effectively, including:
• Decision Support Tools
• Traffic Prediction, Simulation
• Inter Agency Communication

Then provide the dissemination platforms to deliver real-time, predictive, location-based, personalized information "proactively"

Big Data Is Needed For Cairo

O-D Matrix – HBW (home-based work)

Sample Output: Customized Origin and Destination Demand
• Trip Distribution Table: For given Origin-Destination boundary pairs, estimates trip counts by day or daypart.
• Single Trip Frequency Table: For given Origin-Destination boundary pairs, estimates the number of people who made the trip with a certain frequency within a given time period.
• 2D Trip Frequency Table: A two-dimensional Trip Frequency Table correlates trip frequency counts from a single Origin to two different Destinations.
• Trip Duration Table: Estimates the number of people who made trips of various durations between given Origin-Destination boundary pairs.
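As a rough illustration of how a Trip Distribution Table of this kind could be aggregated, the sketch below counts trips per origin, destination, day of the week and daypart. The record layout (origin, destination, start time), the daypart boundaries and the sample records are all assumptions made for this example; none of them come from the slides.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def daypart(ts: datetime) -> str:
    """Map a timestamp to a daypart (hypothetical boundaries)."""
    if 5 <= ts.hour < 12:
        return "Morning"
    if 12 <= ts.hour < 17:
        return "Afternoon"
    if 17 <= ts.hour < 22:
        return "Evening"
    return "Night"


def trip_distribution(trips):
    """Count trips per (origin, destination, weekday, daypart)."""
    table = Counter()
    for origin, destination, start in trips:
        key = (origin, destination, DAY_NAMES[start.weekday()], daypart(start))
        table[key] += 1
    return table


# Made-up sample records: (origin zone, destination zone, trip start time).
sample_trips = [
    ("90210", "81743", datetime(2010, 4, 26, 8, 15)),   # Monday morning
    ("90210", "81743", datetime(2010, 4, 26, 9, 40)),   # Monday morning
    ("90212", "81744", datetime(2010, 4, 27, 14, 5)),   # Tuesday afternoon
]

for key, count in sorted(trip_distribution(sample_trips).items()):
    print(key, count)
```

With real data the records would come from anonymised operator logs rather than a hand-written list, and the zone system would replace the placeholder codes used here.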
Trip Distribution Table (sample)
Origin | Destination | Day of the Week | Time of Day | Trips
90210 | 81743 | Monday | Morning | 431
90211 | 81743 | Monday | Morning | 129
90212 | 81744 | Tuesday | Afternoon | 523
90213 | 81744 | Tuesday | Evening | 904

Single Trip Frequency Table (sample)
Origin | Destination | Number of Trips | Count
90210 | 81724 | 1 | 5,725
90210 | 81724 | 2 | 1,143
90210 | 81724 | 3 | 274
90211 | 81724 | 1 | 92
90211 | 81724 | 2 | 27

2D Trip Frequency Table (sample; rows: Destination Zip Code 81724, columns: Destination Zip Code 90381)
 | 0 Trips | 1 Trips | 2 Trips | 3 Trips | …
0 Trips | 14,378 | 2,728 | 1,721 | 221 | …
1 Trips | 1,207 | 397 | 113 | 52 | …
2 Trips | 403 | 114 | 84 | 27 | …
3 Trips | 172 | 64 | 31 | 14 | …
… | … | … | … | … | …

Trip Duration Table (sample)
Origin | Destination | Trip Duration | Count
LA County, CA | Clark County, NV | 1 Day | 4,357
LA County, CA | Clark County, NV | 2 Days | 1,815
LA County, CA | Clark County, NV | 3 Days | 363
Orange County, CA | Clark County, NV | 1 Day | 254
Orange County, CA | Clark County, NV | 2 Days | 109

Trip Generation and Attraction

Demand on Public Transport: Passengers Boarding and Alighting at Each Station

Hourly Traffic Monitoring

Link-by-Link Data
Date: 4/27/2010. Seg Num/ID: 119+09789 and Len: 1.01 for all rows.

Travel time
Time | Count | Calc | Min | Max | StdDev% | StdDev
10:00 | 23.00 | 139.33 | 112.09 | 155.61 | 16.56 | 23.07
10:01 | 16.00 | 136.49 | 100.37 | 155.61 | 22.16 | 30.24
10:02 | 24.00 | 137.50 | 100.37 | 166.34 | 19.90 | 27.36
10:03 | 18.00 | 148.85 | 117.07 | 171.20 | 17.11 | 25.47
10:04 | 30.00 | 119.23 | 90.88 | 162.70 | 19.52 | 23.27
10:05 | 27.00 | 118.71 | 84.62 | 162.70 | 21.08 | 25.02
10:06 | 23.00 | 119.77 | 95.46 | 162.70 | 17.95 | 21.49
10:07 | 40.00 | 112.29 | 95.46 | 134.81 | 11.79 | 13.24
10:08 | 34.00 | 109.50 | 90.72 | 128.74 | 10.94 | 11.98
10:09 | 37.00 | 107.29 | 82.44 | 119.56 | 12.85 | 13.79
10:10 | 29.00 | 109.90 | 92.78 | 119.56 | 8.76 | 9.63

Speed and delay
Time | Speed Calc | Speed Min | Speed Max | Speed StdDev | TT Dly | Spd Dly | Conf
10:00 | 26.14 | 23.41 | 32.49 | 4.33 | 48.27 | 13.86 | 0.42
10:01 | 26.69 | 23.41 | 36.29 | 5.91 | 45.43 | 13.31 | 0.42
10:02 | 26.49 | 21.90 | 36.29 | 5.27 | 46.45 | 13.51 | 0.38
10:03 | 24.47 | 21.27 | 31.11 | 4.19 | 57.80 | 15.53 | 0.39
10:04 | 30.55 | 22.39 | 40.08 | 5.96 | 28.17 | 9.45 | 0.36
10:05 | 30.68 | 22.39 | 43.04 | 6.47 | 27.66 | 9.32 | 0.36
10:06 | 30.41 | 22.39 | 38.16 | 5.46 | 28.71 | 9.59 | 0.37
10:07 | 32.44 | 27.02 | 38.16 | 3.82 | 21.23 | 7.56 | 0.34
10:08 | 33.26 | 28.29 | 40.15 | 3.64 | 18.45 | 6.74 | 0.38
10:09 | 33.95 | 30.46 | 44.18 | 4.36 | 16.24 | 6.05 | 0.38
10:10 | 33.14 | 30.46 | 39.26 | 2.90 | 18.85 | 6.86 | 0.39

Route Data (aggregated segments)
Date: 4/27/2010. Route ID: MO A-3 for all rows.
Time | Count | Avg TT - Calc | Spd - Calc | TT - Dly | Spd - Dly | Conf
15:18 | 17.00 | 795.97 | 56.01 | 50.90 | 5.54 | 0.48
15:19 | 28.00 | 791.79 | 56.30 | 46.72 | 5.24 | 0.49
15:20 | 23.00 | 809.90 | 55.04 | 64.83 | 6.50 | 0.49
15:21 | 23.00 | 799.82 | 55.74 | 54.75 | 5.81 | 0.49
15:22 | 22.00 | 802.24 | 55.57 | 57.17 | 5.98 | 0.49
15:23 | 17.00 | 805.19 | 55.37 | 60.12 | 6.18 | 0.49
15:24 | 11.00 | 815.17 | 54.69 | 70.10 | 6.86 | 0.49
15:25 | 13.00 | 815.31 | 54.68 | 70.24 | 6.87 | 0.49
15:26 | 16.00 | 847.62 | 52.59 | 102.55 | 8.95 | 0.48
15:27 | 14.00 | 835.52 | 53.36 | 90.45 | 8.19 | 0.49
15:28 | 12.00 | 851.21 | 52.37 | 106.14 | 9.17 | 0.49
15:29 | 16.00 | 851.21 | 52.37 | 106.14 | 9.17 | 0.49
15:30 | 12.00 | 839.29 | 53.12 | 94.22 | 8.43 | 0.49
15:31 | 15.00 | 826.26 | 53.95 | 81.20 | 7.59 | 0.49
15:32 | 19.00 | 775.94 | 57.45 | 30.87 | 4.09 | 0.50
15:33 | 18.00 | 767.20 | 58.11 | 22.13 | 3.44 | 0.50
15:34 | 8.00 | 854.43 | 52.18 | 109.36 | 9.37 | 0.49

Speed Monitoring and Congestion Analysis

Unique Advantages for Transportation Planning
Mobility data provides significant advantages over other mobile location and monitoring technologies.
• More data and market coverage, by far, with data received continually from every active phone on every participating network.
• Mobility data is derived from actual observations of traffic and consumer movement as they happen, rather than from a predictive "guess" based on limited data.
• It measures mobility in an "organic" way, without the behavior biases inherent in surveys, probe vehicles, or similar techniques.
• Data is readily available when and as needed to support either planned or ad hoc project needs.
• Big Data offers significant cost savings, up to 60% or more, compared with traditional mobility data collection.

Extracting Vehicular Data From Moving Cell Phones on Highways
GUC - Ministry of Transportation Team: Ahmed Mosa, Fadwa Fawzy

Proposed Method
• Our method is explained in three steps:
1. Traffic Data Generation.
2. Cell Phones Data Generation.
3. Dynamic Clustering Algorithm.
• As mentioned before, obtaining vehicle/cell phone data in Egypt is extremely difficult for security reasons.
• We used "Simulation of Urban MObility" (SUMO) to generate traffic data.
• SUMO is an open source, microscopic, multi-modal traffic simulation. It simulates how a given traffic demand, consisting of single vehicles, moves through a given road network, and it can address a large set of traffic management topics. It is purely microscopic: each vehicle is modeled explicitly, has its own route, and moves individually through the network.

Network Specifications
• 3-lane highway segment
• 10 Km length
• Max allowed speed 50 Km/sec
[Figure: 10 Km segment with In and Out ends]

Vehicles Specifications
• We simulated the behavior of four types of vehicles:
Type | accel | decel | Length | Min-gap | Max-speed | sigma
0 | 3 | 6 | 5 | 2.5 | 100 | 0.5
1 | 2 | 6 | 7.5 | 2.5 | 60 | 0.5
2 | 1 | 5 | 5 | 2.5 | 65 | 0.5
3 | 4 | 5 | 7.5 | 2.5 | 80 | 0.5
In this table each type of vehicle is specified by its acceleration (accel), deceleration (decel), length, min-gap, max allowed speed (in Km/hr), and the driver behavior (sigma). Types 0, 1, 2, and 3 correspond to private car, truck, microbus, and bus respectively.

SUMO Ins & Outs
• The format of the output XML file generated by SUMO is as follows:
    <timestep time="<COLLECTION_TIME>" id="<DETECTOR_ID>" vtype="<TYPE>">
        <vehicle id="<VEHICLE_ID>" lane="<LANE_ID>" pos="<POSITION_ON_LANE>" \
            x="<X-COORDINATE>" y="<Y-COORDINATE>" \
            lat="<LAT-COORDINATE>" lon="<LON-COORDINATE>" \
            speed="<VEHICLE_SPEED>"/>
        ... further vehicles ...
    </timestep>
    ... further time steps ...

XML Parser
• To move on to the next step (cell phone data generation), we need to parse the vehicle data generated by SUMO.
• We used the MATLAB R2013a XML parser to get the data out of the XML file in the following table format:
Timestamp | Vehicle ID | X | Y | Speed | Vehicle Type

Samples of the Generated Vehicles
[Figure]

Generate Cellphones Locations
• For each vehicle, cell phones are randomly distributed around its location (x, y) within a certain diameter (D). At each time stamp, the number of cell phones for each vehicle does not change, but their locations around the vehicle change within D.
• Each generated cell phone record has the format:
X | Y | Speed | Vehicle ID

Samples of the Generated Cell Phones
[Figure]

Snapshots
[Figure]

Snapshots (cont.)
[Figure]

DYNAMIC CLUSTERING ALGORITHM
Continuous Clustering of Moving Objects
• In this work, a dynamic clustering algorithm is used to cluster the cell phones generated in the previous step.
• The cluster behavior can then be estimated and used as the vehicle behavior (under the assumption that a cluster of cell phones is a moving vehicle).

How We Cluster
• This clustering algorithm uses the cell phones' locations and speeds at each time step to predict their positions in the near future.
• This ensures that cell phones will remain part of their clusters in the near future. In addition, cluster split/merge actions can be predicted.

Object Modeling
• Each cell phone (object) is capable of transmitting its current location and velocity to a central server every U units of time (10 sec in our experiment). Each update has the form $(OID, x_u, v_u, t_u)$.
• The server can use these data to predict the object location until the next update time. The new position for $t > t_u$ is
$x(t) = x_u + v_u \cdot (t - t_u)$

Cluster Modeling
• Clustering feature (CF): a compact, incrementally maintainable data structure that summarizes a cluster and that can be used for computing the average radius of a cluster.
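The object model described under Object Modeling (linear extrapolation from the last reported position and velocity) can be sketched as follows. This is a minimal illustration only: the class name, field names, two-dimensional coordinates and the units (metres, seconds) are assumptions of this sketch, not part of the original work.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ObjectUpdate:
    """One update (OID, x_u, v_u, t_u) sent by a phone to the server."""
    oid: str
    x_u: Tuple[float, float]   # reported position (assumed metres)
    v_u: Tuple[float, float]   # reported velocity (assumed metres/second)
    t_u: float                 # time of the update (assumed seconds)


def predict_position(update: ObjectUpdate, t: float) -> Tuple[float, float]:
    """Linear prediction x(t) = x_u + v_u * (t - t_u), valid for t >= t_u."""
    if t < update.t_u:
        raise ValueError("prediction time must not precede the last update")
    dt = t - update.t_u
    return (update.x_u[0] + update.v_u[0] * dt,
            update.x_u[1] + update.v_u[1] * dt)


# Hypothetical phone moving along the highway at 20 m/s.
phone = ObjectUpdate("phone-1", (100.0, 2.0), (20.0, 0.0), 0.0)
print(predict_position(phone, 10.0))
```

The prediction is only meant to bridge the gap until the next update arrives, after which the server would replace the stored tuple.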
The clustering feature of a cluster at time $t$ is:
$CF = (N, CX, CX^2, CV, CV^2, CXV, t)$
where $N$ is the number of objects within the cluster and
$CX = \sum_{i=1}^{N} x_i(t)$
$CV = \sum_{i=1}^{N} v_i(t)$
$CXV = \sum_{i=1}^{N} x_i(t) \cdot v_i(t)$
$CX^2 = \sum_{i=1}^{N} x_i^2(t)$
$CV^2 = \sum_{i=1}^{N} v_i^2(t)$

CF Claims
• A CF at time $t$ can be updated to a new time $t_{now}$ based only on its value at $t$ and the elapsed time $(t_{now} - t)$:
$CF' = (N,\ CX + CV\,(t_{now} - t),\ CX^2 + 2\,CXV\,(t_{now} - t) + CV^2\,(t_{now} - t)^2,\ CV,\ CV^2,\ CXV + CV^2\,(t_{now} - t),\ t_{now})$
• If an object given by $(OID, x, v, t)$ is inserted into or deleted from a cluster with clustering feature CF, the feature becomes (upper sign for insertion, lower sign for deletion):
$CF = (N \pm 1,\ CX \pm x,\ CX^2 \pm x^2,\ CV \pm v,\ CV^2 \pm v^2,\ CXV \pm xv,\ t)$

CF Claims (cont.)
• Each cluster has a virtual moving center object given by $(OID, \frac{CX}{N}, \frac{CV}{N}, t)$.
• Each cluster has an average radius $R(t)$, which represents the cluster compactness:
$R(t) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} ED^2(\text{object}_i, \text{center object})}$
• $R(t_2)$ can be computed based on $R(t_1)$.

Measuring Object Movement Dissimilarity
• Select $m \ge 1$ time stamps $t_1, \dots, t_m$. Each time stamp is associated with a weight $w_i$, $i = 1, \dots, m$, such that
$\forall i:\ t_i < t_{i+1},\quad t_{now} \le t_i \le t_{now} + U,\quad w_i \ge w_{i+1}$
The closer a time stamp is to $t_{now}$, the higher its weight $w_i$.
[Figure: timeline starting at $t_{now}$ with marks at U and 2U]
• Object positions are computed (predicted) at the chosen time stamps. Given an object $O$, its positions at times $t_1, \dots, t_m$ are $x^{(1)}, \dots, x^{(m)}$.
• The dissimilarity function, with $ED$ evaluated on the positions at time stamp $t_i$, is:
Between two objects:
$M(O_1, O_2) = \sum_{i=1}^{m} w_i \, ED^2(O_1, O_2)$
Between an object and a cluster:
$M(O, C) = \frac{N}{N+1} \sum_{i=1}^{m} w_i \, ED^2(O, \text{cluster center})$

The Insertion Operation
• To insert an object $O$ given by $(OID, x, v, t)$, find the cluster $C$ with the closest center to $O$, using the function $M$.
• A threshold $\rho_g$ represents the maximum acceptable distance between $O$ and the closest cluster.
• Procedure:
1. Calculate $M(O, C)$.
2. If $M > \rho_g$, create a new cluster for $O$ and stop.
3. Otherwise, add $O$ to $C$ and update the CF of $C$.
4. If no split is needed for $C$ after adding $O$, stop.
5. Otherwise, split $C$ and check whether any of the new clusters can be merged with others, then stop.

The Deletion Operation
• To delete an object $O$, a hash table is used to locate the cluster $C$ that $O$ belongs to.
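The clustering-feature bookkeeping (time update, insertion, deletion, center and average radius) can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: it assumes one-dimensional positions (for example, distance along the highway segment), it requires inserted or deleted objects to be given at the CF's current time, and the class and method names are hypothetical. The radius uses the identity $\frac{1}{N}\sum (x_i - c)^2 = \frac{CX^2}{N} - c^2$ with $c = CX/N$.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusteringFeature:
    """CF = (N, CX, CX^2, CV, CV^2, CXV, t) for 1-D positions (assumption)."""
    n: int = 0
    cx: float = 0.0    # sum of positions
    cx2: float = 0.0   # sum of squared positions
    cv: float = 0.0    # sum of velocities
    cv2: float = 0.0   # sum of squared velocities
    cxv: float = 0.0   # sum of position * velocity
    t: float = 0.0     # time the sums refer to

    def advance(self, t_now: float) -> None:
        """Move the summary from time t to t_now without touching objects."""
        dt = t_now - self.t
        # cx2 must be updated before cxv, because it needs the old cxv.
        self.cx2 += 2.0 * self.cxv * dt + self.cv2 * dt * dt
        self.cxv += self.cv2 * dt
        self.cx += self.cv * dt
        self.t = t_now

    def insert(self, x: float, v: float) -> None:
        """Add an object whose position x and velocity v refer to time t."""
        self._apply(x, v, +1)

    def delete(self, x: float, v: float) -> None:
        """Remove an object whose position x and velocity v refer to time t."""
        self._apply(x, v, -1)

    def _apply(self, x: float, v: float, sign: int) -> None:
        self.n += sign
        self.cx += sign * x
        self.cx2 += sign * x * x
        self.cv += sign * v
        self.cv2 += sign * v * v
        self.cxv += sign * x * v

    def center(self):
        """Position and velocity of the virtual center object."""
        return (self.cx / self.n, self.cv / self.n)

    def radius(self) -> float:
        """Average radius computed from the sums alone."""
        c = self.cx / self.n
        return math.sqrt(max(self.cx2 / self.n - c * c, 0.0))


cf = ClusteringFeature()
cf.insert(0.0, 10.0)   # two phones 4 units apart, same speed
cf.insert(4.0, 10.0)
cf.advance(1.0)        # one time unit later the cluster has simply shifted
print(cf.center(), cf.radius())
```

Because every operation touches only the seven sums, the cost of maintaining a cluster summary does not depend on how many phones the cluster contains.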
Once the cluster C is located, object O is removed and the clustering feature of C is adjusted.

Split and Merge
• Two situations exist where a cluster must be split:
1. When the number of objects in the cluster exceeds a user-specified threshold (i.e., the maximum cluster capacity).
2. When the average radius of the cluster exceeds a threshold $\rho_s$, which means that the cluster is not compact enough. The threshold $\rho_s$ can be defined by the users if they want to limit the cluster size, or estimated as the average radius of clusters.

Enabling High-Resolution Traffic Analysis from Cellular Big Data
EJUST team: Ahmed El-Mahdy, Tetsuji Ogawa, Essam Algizawy
• Typical cell phone service provider data:
– <Timestamp, UserID, AntennaID>
– … few billion records!
• The resolution is in km!
• Not suitable for measuring traffic at the road level

Approach/Methodology
• Generate cellular big data via simulation
– SUMO simulator
– Consider the available city of Osnabrück
– Construct 2M records
• Build a Markovian model from the simulator
• Utilise Viterbi decoding to recover actual routes
[Figure: cell antennas]
[Figure: Viterbi paths, matching the simulation results]

On-Demand High Performance Clusters on Mobile Phones
Joint collaborative R&D project between EJUST and IBM Center for Advanced Studies in Cairo: Ahmed El-Mahdy (EJUST), Hisham Elshishiny (IBM), Essam Algizawy (EJUST)
• Processing of big data is distributed and happens close to when the data is sensed
• Mobile phones provide low-cost high performance computing
• We utilise the concept of 'expiring' threads to ease migration issues
• The concept of 'micromoney' is utilised

Thank you