Urban Computing with Taxicabs
Yu Zheng
Microsoft Research Asia

Motivation
  Urban computing for urban planning
    - Developing countries: urbanization and city planning
    - Developed countries: urban reconstruction, city renewal, and suburbanization
  Questions
    - What's wrong with the city configurations?
    - Does a carried-out urban plan really work?

GPS-equipped taxis are mobile sensors

  Rank  City              Country/Region  Taxicabs
  1     Mexico City       Mexico          103,000+
  2     Bangkok           Thailand        80,000+
  3     Seoul             South Korea     73,000+
  4     Beijing           China           67,000
  5     Tokyo             Japan           60,000
  6     Shanghai          China           50,000+
  7     New York City     USA             48,300
  8     Buenos Aires      Argentina       45,000
  9     Moscow            Russia          40,000 (1,000,000)
  10    São Paulo         Brazil          37,000
  11    Tianjin           China           35,000
  12    Taipei            Taiwan          31,000+
  13    New Taipei City   Taiwan          23,500
  14    Singapore         Singapore       23,000
  15    Osaka             Japan           20,000
  16    Hong Kong         China           18,000+
  17    Wuhan             China           18,000
  18    London            England         17,000
  19    Harbin            China           17,000
  20    Guangzhou         China           16,000+
  21    Shenyang          China           15,000+
  22    Paris             France          15,000

What We Do
  Detect flawed urban planning using taxi trajectories
    - Evaluate carried-out city configurations
    - Remind city planners of unrecognized problems
  Challenges
    - City-wide traffic modeling
    - Embodying flaws and revealing their relationships

Methodology
  Partition a city into regions with major roads

Methodology
  Partition the trajectory dataset into several portions (time slots)

  [Figure: speed and average speed (km/h) against time of day (8:00 to 22:00), one panel for workdays and one for rest days]

  Time    Work day          Rest day
  Slot 1  7:00am-10:30am    9:00am-12:30pm
  Slot 2  10:30am-4:00pm    12:30pm-7:30pm
  Slot 3  4:00pm-7:30pm     7:30pm-9:00am
  Slot 4  7:30pm-7:00am     --

Methodology
  Project taxi trajectories onto these regions
    - Build a region graph for each time slot

Methodology
  Extract features from each edge
    - $|S|$: number of taxis
    - $E(V)$: expectation of speed
    - $\theta = E(D) / \mathrm{CenDist}(r_1, r_2)$

  Each edge from region $r_i$ to region $r_j$ is an entry of the region matrix $M = [a_{i,j}]$, $0 \le i, j \le n$, with
    $a_{i,j} = \langle |S|, E(V), \theta \rangle$

  [Figure: trajectories Tr1 and Tr2 passing through regions r1, r2, r3, with CenDist(r1, r2) and CenDist(r1, r3) marked; the region matrix M; a 3-D plot of edges over |S|, E(V) (km/h), and θ]

Methodology
  Select edges with $|S|$ above average
  Detect skyline edges according to $\langle E(V), \theta \rangle$
    - Select edges with big $\theta$ and small $E(V)$
    - Any point on the skyline is not dominated by other points

  [Figure: A) a skyline point; B) seeking a skyline, axes E(V) and θ]

  Point  1    2    3    4    5    6    7    8
  E(V)   24   20   30   22   18   34   30   36
  θ      1.6  2.4  2.8  2.0  1.4  2.4  2.0  3.2

Methodology
  Step (1): Building skyline graphs
    - Formulate skyline graphs
  Step (2): Mining skyline patterns
    - Mining frequent patterns
  Why mine patterns
    - To avoid false alerts
    - Deep understanding

  [Figure: skyline graphs over regions r1 to r9 for Slots 1 to 3 on Days 1 to 3, and the skyline patterns mined from them, shown with support = 1.0 and support = 2/3]

Evaluations
  Datasets

                                        2009.3-5   2010.3-6
  Number of taxis                       29,286     30,121
  Effective days                        89         116
  Number of points   Total              679M       1,730M
                     Per taxi/day       306        528
  Distance (km)      Total              310M       600M
                     Per taxi/day       128        171
  Average sampling interval (s)         100        74
  Ave. dist. between two points (m)     457        349

  [Figure: detected results, panels "2009 Rest Days" and "2010 Workdays"]

Results
  Some flaws occurring in 2009 disappeared
    - Example 1: Two roads launched in late 2009

  [Figure: regions r1 to r4 with the entrance of the 4th Ring Road, Wangjing West Road, and Nanhuqu West Road]

Results
  Some flaws occurring in 2009 still exist in 2010
    - Example 1: Subway lines 14 and 15

  [Figure: A) overview; B) Lines 14 and 15 in Wangjing; C) Line 14 passing the New CBD; regions r1 to r3]

Conclusion
  Video
  The released dataset: T-Drive taxi trajectories
  A demo in the demo session on Sept. 20.

Thanks!
Yu Zheng
http://research.microsoft.com/en-us/people/yuzheng/
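
Appendix (illustrative sketch)
  The skyline-detection step from the Methodology slides can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes the usual Pareto-dominance rule: one edge dominates another when its E(V) is no larger and its θ is no smaller, with at least one of the two strictly better. The names POINTS, dominates, and skyline are hypothetical; the eight (E(V), θ) pairs are the ones in the skyline example table.

```python
from typing import Dict, List, Tuple

# (E(V) in km/h, theta) for the eight example points from the slide's table.
POINTS: Dict[int, Tuple[float, float]] = {
    1: (24, 1.6), 2: (20, 2.4), 3: (30, 2.8), 4: (22, 2.0),
    5: (18, 1.4), 6: (34, 2.4), 7: (30, 2.0), 8: (36, 3.2),
}


def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """Return True if a dominates b (assumed rule: small E(V), big theta).

    a must be no worse than b on both features and strictly better on one.
    """
    ev_a, theta_a = a
    ev_b, theta_b = b
    no_worse = ev_a <= ev_b and theta_a >= theta_b
    strictly_better = ev_a < ev_b or theta_a > theta_b
    return no_worse and strictly_better


def skyline(points: Dict[int, Tuple[float, float]]) -> List[int]:
    """Return the ids of points that no other point dominates."""
    return sorted(
        i for i, p in points.items()
        if not any(dominates(q, p) for j, q in points.items() if j != i)
    )


if __name__ == "__main__":
    print(skyline(POINTS))  # prints [2, 3, 5, 8]
```

  The pairwise check is quadratic in the number of edges, which is fine for a sketch; a sort-based sweep would be the natural choice for a city-wide region graph.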