Paper7 - Transit GIS Clearinghouse

Origin/Destination Applications from Smart Card Data
Jesse Simon
A number of studies have explored the potential use of smart cards in generating
Origin/Destination matrices and several agencies have sponsored demonstrations on
portions of their systems. The LACMTA has developed a methodology that generates O/D
from the entire system’s smart card transactions. The prototype practical applications
discussed in this paper are oriented to transactions on specific transit lines or specific
locations where the O/D data is queried from the system-wide data collected for a specified
date or dates. The Linked Trip application pairs the initial line to the final line of each
linked trip. The Line O/D application maps destinations of trips originating on the transit
line. The Area O/D application maps origins of trips destined to a study area.
The Los Angeles County Metropolitan Transportation Authority (LACMTA) operates a “TAP
Card” fare system with smart cards. One serendipitous benefit of this system is that it allows the
agency to generate origin/destination (O/D) trip information from stored TAP card data. The
agency has been exploring methods to do so and is developing applications to exploit the results.
The main advantage of knowing O/D is that one can plan a delivery system where both the initial stop
and the final stop of a patron’s trip are known, even when more than one bus is used. Like many
other large agencies, the LACMTA already knows where patrons board and alight within a route
from automated passenger counters, but it has had to rely on sparse, expensive, infrequent and
possibly biased on-board survey data to estimate trip linkages and final destinations.
BACKGROUND
Transportation modelers have had great hopes for smart card technology to supplement or
replace origin/destination data from on-board surveys. As far back as 2004 Bagchi and White
summarized its advantages: (1) it would generate much larger volumes of data than surveys, (2)
it would allow linkage of single trips to an individual’s travel on a single card, (3) it constitutes
continuous trip data involving longer time periods than collectable via survey, and (4) insofar as
the smart cards are subdivided by fare type, it would allow researchers to track different market
segments. (1) Others have added that it would have low marginal cost because it is already
collected for other agency applications, (2) the database would be more accurate than personal
recollection by survey respondents, (3) and it would be available in weeks rather than in a year or
more. (3) In addition, the low and declining response rate of surveys (often much lower than
20%) and the potential for self-selection bias among responders is a general concern among
researchers. (4) This is not a particular concern about smart card users: the New York City
Transit Authority (NYCTA) used travel diaries to test whether smart card subway patrons
differed in travel pattern from non-card subway patrons. They did not. (3)
Most of the efforts involving O/D are for transportation modeling applications, particularly
development of O/D matrices. (5) They tend to have the modeler’s preference for statistical
assignment of uncertain data. For example, Rahbee uses two methods of scaling when data is
missing or incomplete, where the model may “reassign the whole passenger or reassign the
passenger fractionally”. (5) In addition, because data is usually aggregated, confidence
objectives are at the “reasonable approximation” level (i.e., 90% accuracy) that was first stated
by the NYCTA in its pioneering 2002 study. They further accept the NYCTA assumption that
riders end their last trip of the day at or near the station stop of the first trip of the day, which
NYCTA confirmed as a reasonable approximation for its target population of subway riders. (2) (3)
This paper describes a more direct application to line specific or corridor/location data. It does
not make the assumption that the first trip of the day is the mate of the last trip of the day; rather,
it tests whether the data from two trips match. As the discussion will show, this is at a
considerable loss of usable matches but at a greater trust in data validity. The goal was not to fill
in matrices with reasonable approximations but to aid decision making via tables and maps based
on valid data that has face validity for non-statistically trained personnel.
On-board surveys will continue to provide information on demographics, trip purpose, and other
information that cannot be collected through automated fare and patronage counting systems.
Nevertheless, automatically collected material can be organized to provide large volumes of data
that is pertinent to route and system planning. Because it is collected daily, it can also support
time-series analysis, which surveys cannot provide.
How LACMTA’s TAP Card System Measures Up
Some providers, such as BART and WMATA, have entry/exit fare transactions on their rail
lines. They have full O/D information within their rail systems as a result.
The rest of the providers have entry-only fare transactions. Boarding stops are all recorded, and
intermediate link alighting stops can be easily inferred from the next link’s boarding stop. Final
alighting stops, the linked-trip destination, must be inferred from algorithms or rules of thumb.
LACMTA has one of these systems.
LACMTA’s smart card data interface is fairly advanced among transit providers, especially
multi-modal transit providers. Most providers have AVL systems that are not strongly linked to
their smart card systems. Links must be forged to pursue O/D estimation. (5) (6) LACMTA is
fortunate to have an integrated AVL/smart card system. All smart card transactions are both
time-stamped and geo-stamped automatically. But as with all complex operations involving
hardware, firmware, software and multi-system interface, system maintenance becomes a key to
success. The project was undertaken, in part, to understand system weaknesses and to develop
diagnostic reporting and procedures. This effort is not included in this paper except to say that
low match rates currently experienced will be improved by the maintenance reports and
procedures that are being developed. LACMTA has succeeded with its APC system where other
agencies have had difficulty because of this emphasis on diagnostics and maintenance; we expect
similar results with our smart card systems.
WHAT AN O/D MAP CAN SHOW
The following map is a graphic example of the potential of O/D mapping. It is a Desire Line
map of the origins of two stations and one stop, each with a different origin recruitment pattern.
Desire Lines are “as-the-crow-flies” representations of travel from origin to destination. Desire
Line maps, while dramatic, are no longer the method of choice for O/D mapping because they
mask detail. Nevertheless, the Desire Line map very clearly shows how the patterns differ.
The red Desire Lines connect origins to Metro Center rail and bus stops. Metro Center has the
highest number of origins of any destination area of Los Angeles’s Metro system (the system
operated by LACMTA; a different location, Union Station, has more patrons when the
Metrolink and Amtrak rail systems are included). Metro Center has an extremely wide origin
recruitment area with heavy recruitment along the Metro Rail and Harbor Freeway corridors.
The blue Desire Lines connect origins to El Monte Station, the busway station with the largest
number of origins. It too has a wide recruitment area, but mostly from the San Gabriel Valley
and Downtown Los Angeles extending throughout the Metro Red Line.
The green Desire Lines connect origins to 3rd and Vermont, the bus stop with the largest number
of origins among bus areas not adjacent to a rail or busway station. Its origin recruitment area,
while larger than many stops, is much more localized than the other two.
GOALS OF THE O/D PROJECT
The primary goal of the O/D project was to generate map and table applications that can be used
for scheduling new service and revising existing service.
LACMTA has been incrementally introducing multi-use smart cards as fare media. About 45%
of all fare transactions currently involve these cards. Generating O/D data from the card
transactions will eventually be seamless. The present trial run was as much an effort to
troubleshoot as it was an effort to develop the O/D applications.
A variety of applications were developed but three seem most promising. Two of these are O/D
map applications, and the third is a linked trip application. The most intuitive map application is
the map of destinations from an originating transit line - it can be discussed in this section. The
other two applications must wait until after the methodology/data definition section.
O/D mapping can show how each transit line interfaces with other transit lines to distribute
patrons. Below is one of a series of maps that were used to study transit usage in the San
Fernando Valley. It is a map of destinations from Orange Line origins. While there were no
surprises about the major distribution patterns, there were some about relative pattern strength,
and the limits of the catchment area.
Inferences from the map:
• The Orange Line itself was a very frequent destination among people originating on the
line. This not only represents destinations in the vicinity of the stations, but also park &
ride interface and, for a few stations, transfer to non-Metro transit providers.
• The Line distributes patrons throughout the Valley via other Metro Bus lines.
• The Line distributes patrons all along the Red and Purple Lines, but not the Blue or Gold
Lines.
• Via the Red Line it distributes patrons through third Metro Bus links to Hollywood and
downtown Los Angeles.
• It distributes a small but concentrated number of patrons to Westwood via Line 761.
• This map does not show trips where Orange Line is an intermediate link on a 3-link trip,
which would necessarily be shown on another map.
METHODOLOGY/DATA DEFINITION TASKS
Data definition can be broken into two main tasks: (1) creating linked trips from TAP records
and (2) inferring the final link’s alighting stop (the linked trip’s destination).
Creating Linked Trips
The smart card dataset is organized by smart card identification number and date. On any given
date on any given card there may be one or many fare transactions. Each transaction represents a
boarding which, in this context, is called a “link”. The question is how to decide which links are
parts of a linked trip.
According to Chu and Chapleau, “the identification of linked trips in previous studies is solely
based on a fixed temporal threshold between transactions”. They cite a variety of thresholds: (1)
a transfer occurs if wait time is less than 60 minutes, (2) less than a 90 minute elapsed time
between successive boardings, and (3) less than a 30 minute elapsed time between successive
boardings. The problem with arbitrary temporal thresholds is that they do not account for
variation in trip length and service levels. As Chu and Chapleau put it “this would destroy the
disaggregate property of the data”. (7)
In their case study their solution was to create a “spatial-temporal path” between successive trips.
This was a several step process: (1) Boarding time was obtained from the fare transaction, (2)
alighting time of the cardholder at the stop for the prior trip was obtained from the boarding data
of other passengers at that stop of that trip, and (3) if no boarding took place, then it was
interpolated from other passenger boardings at other stops. (4) Then the distance between stops
was found if walk distance is involved. (5) A walk speed of 1.2 m/second (2.7 mph) is applied
(6) with an added 5 minutes to account for variations in walk speed.
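The walk allowance quoted above can be worked through numerically. The sketch below reproduces only the arithmetic stated in the text (1.2 m/second plus 5 minutes); the function name and the 300 meter example distance are illustrative assumptions and are not taken from Chu and Chapleau.

```python
WALK_SPEED_M_PER_S = 1.2      # walk speed quoted in the text
WALK_BUFFER_S = 5 * 60        # added 5 minutes for variation in walk speed
METERS_PER_MILE = 1609.344


def walk_allowance_seconds(distance_m: float) -> float:
    """Time allowed to walk between two stops, in seconds (hypothetical helper)."""
    return distance_m / WALK_SPEED_M_PER_S + WALK_BUFFER_S


# 1.2 m/second expressed in mph, as a check on the 2.7 mph figure
walk_speed_mph = WALK_SPEED_M_PER_S * 3600 / METERS_PER_MILE

# Hypothetical 300 m walk: 250 seconds of walking plus the 300 second buffer
allowance = walk_allowance_seconds(300.0)
```

Under these figures a 300 meter walk is allowed a little over nine minutes, more than half of which is the fixed buffer.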
Chu and Chapleau’s advocacy of a spatial-temporal context is an improvement over a fixed
temporal element but their method is more applicable to a small scale study. In LACMTA’s case
where there are millions of TAP transactions every week, referring to boarding times of other
passengers to attach to the cardholder’s transaction is a very large processing speed bump,
especially if further interpolation is sometimes required. Instead, LACMTA uses the speed (mph) implied
between the cardholder’s successive boardings. In this case the spatial-temporal context is that a link becomes part
of the linked trip if the distance between successive boardings divided by the time elapsed is greater than 3 mph, i.e., the
passenger must reach the next boarding faster than walking speed if that boarding is to be part of the
same linked trip. It should be noted that the 3 mph is really more a characteristic of the service
provided than the speed of the passenger. There were very few instances where Metro service,
including headways, was slower than 3 mph between any two connecting lines at any two stops
(almost all in Downtown LA, and these were rare). A different mph would be appropriate in
other cities.
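The speed rule can be sketched as follows. This is a minimal illustration and not LACMTA's production routine: the record layout (`Boarding`), the use of straight-line (haversine) distance between geo-stamps, and the handling of identical time stamps are all assumptions made here. Only the 3 mph threshold comes from the text.

```python
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8
MIN_LINK_MPH = 3.0  # threshold quoted in the text


@dataclass
class Boarding:
    """One fare transaction for one card (hypothetical field names)."""
    time: datetime
    lat: float
    lon: float
    line: str


def miles_between(a: Boarding, b: Boarding) -> float:
    """Great-circle distance in miles; an assumed distance measure."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(h))


def implied_mph(a: Boarding, b: Boarding) -> float:
    """Distance between two boardings divided by the time elapsed."""
    hours = (b.time - a.time).total_seconds() / 3600.0
    if hours <= 0:
        # Identical time stamps: treated as a continuation (arbitrary choice).
        return float("inf")
    return miles_between(a, b) / hours


def link_trips(boardings, min_mph=MIN_LINK_MPH):
    """Group one card's boardings for one date into linked trips."""
    trips = []
    for b in sorted(boardings, key=lambda x: x.time):
        if trips and implied_mph(trips[-1][-1], b) > min_mph:
            trips[-1].append(b)   # faster than walking: same linked trip
        else:
            trips.append([b])     # otherwise a new linked trip starts
    return trips


# Hypothetical card: a two-link trip in the morning and another in the evening.
day = datetime(2010, 9, 7)
taps = [
    Boarding(day.replace(hour=8, minute=0), 34.05, -118.25, "2"),
    Boarding(day.replace(hour=8, minute=30), 34.10, -118.25, "4"),
    Boarding(day.replace(hour=17, minute=0), 34.10, -118.25, "4"),
    Boarding(day.replace(hour=17, minute=30), 34.05, -118.25, "2"),
]
trips = link_trips(taps)
```

In the example the morning boardings are about 3.5 miles and 30 minutes apart (roughly 7 mph), so they link. The evening boarding at the same location 8.5 hours later implies 0 mph and starts a new linked trip.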
Inferring the Final Destination of the Linked Trip
TAP card data clearly indicates where a patron boards; it is where he taps his card. The initial
boarding of a trip is the trip origin. In this exercise one must also find out where he finally
alights from the trip’s last link because this is his destination. TAP cards do not directly say
where this happens because there is no card tapping upon alighting, so it must be inferred.
The basic method of inferring final destinations starts with finding linked trips that can be
matched as the inbound and outbound trips of a “round trip”. Once done, the initial boarding stop
of the first linked trip is identified as the final alighting stop of the later linked trip and the initial
boarding stop of the later linked trip is identified as the final alighting stop of the first linked trip.
The destinations of each of the linked trips are thereby inferred.
There are two separate parts to a round trip, both of which are linked trips. (Multiple site tours
are discussed in the “Some Weaknesses in Matching Round Trips” section.) For example, in a
home to work round trip, the first linked trip is from home to work and the second linked trip is
from work back to home. Note that “linked trip” can be a one-link trip if no other links were
found to be part of it.
As soon as Linked trips are identified, the boarding stop of the intermediate links can be
eliminated and pertinent information from the initial (the origin) link and the final (the
destination) link can be merged into one record. These records present an opportunity to infer
Trip Origin Stop - Trip Destination Stop matches (OD).
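Merging the initial and final links into one record might look like the sketch below. The field names are hypothetical, and the retained `n_links` count is an addition for illustration; the project as described kept only the origin and destination link information.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Link:
    """One boarding within a linked trip (hypothetical field names)."""
    stop_id: str
    line: str
    time: str  # ISO 8601 text, which sorts chronologically


@dataclass
class LinkedTripRecord:
    first_stop: str   # initial boarding stop, i.e. the trip origin
    first_line: str
    last_stop: str    # boarding stop of the final link
    last_line: str
    start_time: str
    n_links: int      # assumption: kept here, not kept in the project


def collapse(links: List[Link]) -> LinkedTripRecord:
    """Keep the first and last link of a linked trip and drop the rest."""
    ordered = sorted(links, key=lambda link: link.time)
    first, last = ordered[0], ordered[-1]
    return LinkedTripRecord(
        first.stop_id, first.line, last.stop_id, last.line, first.time, len(ordered)
    )


# Hypothetical three-link trip
record = collapse([
    Link("S2", "901", "2010-09-07T08:20"),
    Link("S1", "761", "2010-09-07T08:00"),
    Link("S3", "802", "2010-09-07T08:45"),
])
```

A one-link trip collapses to a record whose first and last fields are identical.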
The following graphic shows what is known from TAP data and how OD can be inferred from it.
The red and orange must each match to make an O/D pair.
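[Figure: Line to Area Match. Read in the direction of travel, the Outgoing Trip and the Incoming Trip are compared on two pairs of fields:
(1) Outgoing Trip: First Link’s Boarding Stop Area’s Associated Line Numbers, matched to Incoming Trip: Last Link’s Line.
(2) Outgoing Trip: Last Link’s Line, matched to Incoming Trip: First Link’s Boarding Stop Area’s Associated Line Numbers.]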
The result is that the Outgoing First Link’s Boarding Stop Area will be named the Origin and
Incoming First Link’s Boarding Stop Area will be named the Destination. In this application the
definition of “stop area” is important. Presently, stop area is defined as a 350 meter circle (just
over 1/5 mile) around the stop. This allows for travel to and from bus and rail stops and bus
depot stops in selected areas, which while very few, are very busy. This could have been
extended to ¼ mile, which is the walk distance many transportation models use to represent the
distance people are willing to walk to a bus stop. It was not done for two reasons. The first is
that so many options would be available in some areas, especially downtown LA, that matching
would become an empty exercise. The second is that in travel surveys walk distance preference
questions are about stop distance to and from the true origins and destinations, not distance
between transit stops. We may revisit this restrictive approach in the future, but for the present, a
distance over 350 meters between two stop areas voids the trip match.
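The 350 meter rule can be illustrated with a short distance check. This is a sketch under assumptions: it treats the rule as a straight-line (haversine) distance between two geo-stamped stops, and the coordinates are invented for the example.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000
STOP_AREA_M = 350.0          # limit quoted in the text
METERS_PER_MILE = 1609.344


def meters_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters; an assumed distance measure."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    h = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(h))


def same_stop_area(stop_a, stop_b, limit_m=STOP_AREA_M):
    """True if two (lat, lon) stops are close enough to count as a match."""
    return meters_between(*stop_a, *stop_b) <= limit_m


# 350 m in miles (about 0.217), compared with a quarter mile in meters (about 402)
stop_area_miles = STOP_AREA_M / METERS_PER_MILE
quarter_mile_m = 0.25 * METERS_PER_MILE

near = same_stop_area((34.0500, -118.2500), (34.0520, -118.2500))  # about 222 m apart
far = same_stop_area((34.0500, -118.2500), (34.0540, -118.2500))   # about 445 m apart
```

The conversion shows how close the two candidate limits are: the quarter-mile alternative would add only about 52 meters to the radius.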
On the other hand, the above is a little looser construct than matching First Link’s Line to Last
Link’s Line because Metro’s system has stops where a person returning to the same place could
choose more than one line since each would make his desired connections.
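One reading of the Line to Area criteria is sketched below. The record layout and the example line numbers are hypothetical, and the pairing of fields follows the interpretation that the return trip's last line must serve the origin stop area while the outgoing trip's last line must serve the stop area where the return trip begins.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class TripRecord:
    """A collapsed linked trip (hypothetical field names)."""
    first_area: str                    # stop area of the first link's boarding
    first_area_lines: FrozenSet[str]   # lines associated with that stop area
    last_line: str                     # line recorded on the last link


def line_to_area_match(outgoing: TripRecord,
                       incoming: TripRecord) -> Optional[Tuple[str, str]]:
    """Return (origin, destination) for the outgoing trip, or None if unmatched."""
    returns_to_origin = incoming.last_line in outgoing.first_area_lines
    leaves_from_destination = outgoing.last_line in incoming.first_area_lines
    if returns_to_origin and leaves_from_destination:
        return outgoing.first_area, incoming.first_area
    return None


# Hypothetical round trip: out on Line 761, back on Line 233
out_trip = TripRecord("VanNuys/Nordhoff", frozenset({"761", "233"}), "761")
back_trip = TripRecord("Westwood/Wilshire", frozenset({"761", "720"}), "233")
pair = line_to_area_match(out_trip, back_trip)

# A return trip that ends on a line not serving the origin area does not match
stray_trip = TripRecord("Westwood/Wilshire", frozenset({"761", "720"}), "20")
no_pair = line_to_area_match(out_trip, stray_trip)
```

Because membership is tested against all lines serving a stop area, a patron who returns on a different line that serves the same stop area still produces a match.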
The matching criteria were amended even further because TAP cards currently often record only
the parent line of a bus run; if the bus is on a branch line for part of the day, then the wrong line will be
recorded and no match will be made. The geo-stamp is unaffected by line assignment, so using
geographic assignment would increase the number of matches. Here is a similar graphic to the
above that contains the criteria that match Area to Area rather than Line to Area:
[Figure: Stop Area Match. Read in the direction of travel, the Outgoing Trip and the Incoming Trip are compared on two pairs of fields:
(1) Outgoing Trip: First Link’s Boarding Stop Area’s Associated Line Numbers, matched to Incoming Trip: Last Link’s Boarding Stop Area’s Associated Line Numbers.
(2) Outgoing Trip: Last Link’s Boarding Stop Area’s Associated Line Numbers, matched to Incoming Trip: First Link’s Boarding Stop Area’s Associated Line Numbers.]
Here again, the result is that the Outgoing First Link’s Boarding Stop Area will be named the
Origin and Incoming First Link’s Boarding Stop Area will be named the Destination. A
beneficial side-effect is that Areas are determined by geo-stamps; current problems (that are
hopefully temporary) with proper designation of lines are thereby avoided.
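The Area to Area variant can be sketched in the same way. How two sets of associated line numbers are compared is not spelled out in the text; the sketch assumes that two stop areas match when they share at least one associated line. The record layout and line numbers are hypothetical.

```python
from typing import FrozenSet, NamedTuple, Optional, Tuple


class AreaTrip(NamedTuple):
    """A collapsed linked trip keyed by geo-stamped stop areas (hypothetical)."""
    first_area: str
    first_area_lines: FrozenSet[str]  # lines serving the first boarding's stop area
    last_area_lines: FrozenSet[str]   # lines serving the last boarding's stop area


def area_to_area_match(outgoing: AreaTrip,
                       incoming: AreaTrip) -> Optional[Tuple[str, str]]:
    """Return (origin, destination) for the outgoing trip, or None if unmatched."""
    # Assumption: a match means the two stop areas share at least one line.
    ends_share_a_line = bool(outgoing.first_area_lines & incoming.last_area_lines)
    starts_share_a_line = bool(outgoing.last_area_lines & incoming.first_area_lines)
    if ends_share_a_line and starts_share_a_line:
        return outgoing.first_area, incoming.first_area
    return None


out_trip = AreaTrip("A", frozenset({"2", "4"}), frozenset({"802", "720"}))
back_trip = AreaTrip("D", frozenset({"720", "20"}), frozenset({"4"}))
pair = area_to_area_match(out_trip, back_trip)

unrelated = AreaTrip("E", frozenset({"33"}), frozenset({"66"}))
no_pair = area_to_area_match(out_trip, unrelated)
```

Nothing in this comparison uses the line recorded on the card, which is why a mis-recorded parent line does not block the match.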
TABLE AND MAP APPLICATIONS
There are a number of applications that have been developed from this process. The first
is derived from the designation of linked trips, prior to matching initial and return trips.
This is important because of some weaknesses in round-trip matching: many trips are not
matched and there is no way to determine if the ones selected represent the population of trips.
Some Weaknesses in Matching Round Trips
Only 38.3% of all smart card transactions were given origins and destinations through the
matching method. This low return is not a problem per se, but it would be if matches do not
result in a representative sample of the total population of trips.
At the present time many unmatched trips may be due to a recording problem. Two major
systems must talk to each other: the passenger counter (APC) system, which geo-stamps the
boarding, and the farebox system, which records the TAP. Any failure to make the boarding
geo-stamp part of the TAP record of any multi-link trip, or any part of the round trip, will nullify the
ability to match. (Tracking system integration error will be part of another paper.) Insofar as
system error is randomly distributed, this does not substantially contribute to a concern about
how representative round trips are of the general population of trips. Currently there is a
non-random distribution of system error among fareboxes; they are far less likely to talk with the
APC systems aboard buses on contracted lines than aboard buses on directly-operated
lines. This is an installation problem, not a permanent one. In Metro’s APC experience, non-random
patterns that are found are diagnosed, corrected and eliminated. No general non-random pattern,
such as a relationship to boarding frequency, has been found.
Another possible explanation is that there was no matching trip for a given trip. Some trips are
not part of round trips because the return trip is made on another mode (e.g., a car or bicycle is
used). In other cases, the return trip could have been made on a subsequent day.
There are also two situations in which trip tours are not matched as round trips. First, some
Metro trips involve transfers to or from other operators. Since only Metro trips are being
tracked, the other operator links would not appear and matching would fail. Second, there could
be a trip tour with no way to designate which is the origin and which is the destination. This
could represent a strong bias in some localities. A particular instance was a stop area serving a
commuter college where many trips involve going from home to work to school and then home
again. The computer could not break this tour into two matching components of a round trip
without additional knowledge about primary destination, which is not collected at the fare box.
These non-random examples lead to questions of sample bias that can only be partially addressed
in this paper. The data is a biased sample of the overall population of trips. It primarily
represents the travel behavior of regular users who make round trips directly to and from school
or work. The commuter college example shows that in some locales the results will not be
useful. But the bias potential should be understood in context. LACMTA on-board surveys
indicate that 82% of riders use the service 5 or more days a week, and 82% of this group’s trip
productions are either home-work or home-school. Fare card users primarily come from this
majority group, even if it is an open question as to how their travel differs from others’.
Several modelers have attempted to coordinate the O/D information with on/off passenger
counts. Their efforts focus on transfer estimates in restricted neighborhoods; (8) (9) there is no
methodology extant for application to wide areas with multi-link transfers. The tack taken in this
paper is to retain the original O/D of the very large sample (millions of cases per week) knowing
that it represents behavior of the core group of users of the system.
Line Destiny Report
In contrast to the 38.3% match rate, linked trip attribution was successful for 75.2% of the smart
card transactions. Designating the linked trip does not yet identify the final destination of the
trip but it does allow the identification of the initial link and final link of each trip. LACMTA’s
new “Line Destiny” report is the result. The report rank orders transfers to lines from any given
line. The report is generated with more matches, and involves fewer inferences, than
LACMTA’s O/D applications.
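A report of this kind can be tabulated from (original line, final line) pairs, one pair per linked trip. The sketch below is an assumed reconstruction of the tabulation and not LACMTA's report code; the 2.5% cutoff mirrors the report's subtitle, and the sample counts are invented.

```python
from collections import Counter, defaultdict


def line_destiny(trips, min_percent=2.5):
    """Rank final lines for each original line.

    trips: iterable of (original_line, final_line) pairs, one per linked trip.
    A one-link trip has the same line in both positions.
    """
    counts = defaultdict(Counter)
    for original, final in trips:
        counts[original][final] += 1

    report = {}
    for original, finals in counts.items():
        total = sum(finals.values())
        rows, cumulative = [], 0.0
        for final, freq in finals.most_common():   # descending frequency
            percent = 100.0 * freq / total
            if percent <= min_percent:             # keep only rows over the cutoff
                break
            cumulative += percent
            rows.append((final, freq, round(percent, 1), round(cumulative, 1)))
        report[original] = {"rows": rows, "total": total}
    return report


# Invented sample: 100 linked trips that start on Line 4
sample = [("4", "4")] * 90 + [("4", "802")] * 8 + [("4", "33")] * 2
report = line_destiny(sample)
```

In the sample, the row for Line 33 (2% of trips) falls under the cutoff and is omitted, while the total still counts all 100 trips.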
The example below is only the first page of the Line Destiny report which shows every line in
the system. The report is illuminating.
• Staff is well aware that Los Angeles’ Metro has the highest proportion of multi-link trips
in the country but the report shows that the basic travel pattern is still the one-link trip.
The report shows that 57.3% of all trips involve only one link.
• As to multi-link trips, linkage from any given line is widely distributed among transfer
points and among many lines. The report shows that, from any given line, the median for the
highest percentage of trips destined to end on another specific line is 4.0%. Only six
lines have 10% or more of their patrons destined to a specific line. The highest among these is
Metro Rail’s Gold Line: 21.3% of its patrons are destined to end their trip on
Line 802 (the designation for Metro’s two heavy rail routes that share a large corridor
segment). Obviously, the Gold Line and the heavy rail routes are closely inter-related;
planning and scheduling should be approached with this in mind.
• Another interesting general finding is that no Rapid Line has over 10% of its patrons
transfer to the companion local line that travels the same corridor. When Rapid was first
proposed it was assumed that Rapid Line patrons would begin or complete their trip on
the parallel local line; this is not empirically supported. Patrons may game whether to
board the Local or Rapid Line but once aboard they do not tend to subsequently transfer
to the parallel service.
Line Destiny Report for Weekdays Sept. 7-13, 2010
(Only destinations with over 2.5% of origin boardings)
Original Line   Final Line                           Cumulative
Boarded         Boarded      Frequency    Percent    Percent
2               2               17,532       65.8       65.8
                Total           26,658      100.0
4               4               20,023       67.6       67.6
                802                789        2.7       70.2
                Total           29,632      100.0
10              10              10,506       61.5       61.5
                Total           17,077      100.0
14              14              13,288       57.9       57.9
                204                815        3.6       61.5
                207                710        3.1       64.6
                754                604        2.6       67.2
                Total           22,939      100.0
O/D applications
The O/D applications discussed in this paper were developed with schedule makers and service
planners in mind: the applications focus on specific lines or specific places. The data is
prepared for all smart card transactions that can be matched to generate linked trips with origins
and destinations. Data is then queried and mapped for specific places or lines. The Line O/D
map application has already been discussed. The discussion below focuses upon the Area O/D
application, a map of origins and destinations related to a specified geographic area.
The Area application was applied to Westwood. The map below was part of a series of spatial
analyses that began with questions about where to shorten Line 761 that travels along Van Nuys
Boulevard in the San Fernando Valley and then crosses the Santa Monica Mountains to end in
Westwood. How far up Van Nuys Boulevard was travel demand to and from Westwood? The
initial query set up several Van Nuys corridor maps and Line 761 maps, each map generating
more questions and more maps. Once an O/D dataset is generated, maps can be drawn to answer
location-specific queries as required. The discussion segued to a question about travel to
Westwood in general (inspired by the question, “How typical is the long-distance travel from
Van Nuys to Westwood?”).
The first map shows the area defined as “Westwood”. It is somewhat different than the standard
demarcation of the area. As part of an ongoing process, “Westwood” was defined as areas in or
near Westwood that people on Van Nuys corridor traveled to. Census tracts were used in this
study because they have demographic information attached to them; but any standard set of
polygons could have been used such as TAZs, Census block-groups, or Census blocks. Use of
demographics will not be discussed in this paper, except to say that in real-life applications lots
of resources are often used to answer lots of questions.
The second map shows Census Tracts color coded by origins and destinations, where the color
represents intensity of travel (number of trips). Destinations are represented by colored outlines
of the Westwood tracts; Origins are represented by the colored interiors of tracts both in and
outside of Westwood.
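Once O/D pairs carry tract identifiers, the counts behind such a map reduce to a filtered tally. The sketch below assumes each matched trip is an (origin tract, destination tract) pair; the tract labels are invented placeholders, not real Census tract numbers.

```python
from collections import Counter


def origins_for_area(od_pairs, study_tracts):
    """Count origin tracts of trips whose destination is in the study area."""
    study = set(study_tracts)
    return Counter(origin for origin, dest in od_pairs if dest in study)


def destinations_in_area(od_pairs, study_tracts):
    """Count destination tracts inside the study area."""
    study = set(study_tracts)
    return Counter(dest for _, dest in od_pairs if dest in study)


# Invented placeholder tracts: W1 and W2 form the study area
study_area = ["W1", "W2"]
od_pairs = [
    ("V1", "W1"), ("V1", "W1"), ("V2", "W2"),
    ("W1", "W2"),            # travel within the study area
    ("V1", "X9"),            # destined elsewhere, so ignored
]
origin_counts = origins_for_area(od_pairs, study_area)       # tract interiors
dest_counts = destinations_in_area(od_pairs, study_area)     # tract outlines
```

Origin counts would drive the interior colors and destination counts the outline colors; a tract inside the study area can appear in both tallies.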
The study findings were presented in-house in the following slide.
Findings
• There are two destination tracts in Westwood that dwarf
all the others: UCLA (238 trips) and the tract along and
south of Wilshire by Westwood Boulevard (128 trips)
– An optimal stop on the subway to the sea would be on Wilshire
between these two census tracts.
• Most of the origin tracts lie on three main corridors: Van
Nuys (with a short jog on Ventura), Wilshire/Whittier, and
Sunset
– The heavy origins are as far north as Nordhoff on Van Nuys
– The heavy origins stretch very far to the east on both
Wilshire/Whittier and Sunset
• They trace out a strong path for potential corridors of the subway to
the sea.
– All three corridors represent some long trip-making.
• The UCLA tract has the most origins, which indicates
travel within Westwood.
– These are short trips.
The findings are not important to this paper in themselves; they are exemplary in showing the
utility of the O/D material, and how easily the data can be focused and re-focused to regional,
line or local area considerations as discussions and queries evolve.
NEXT STEPS
The project was not only trying to explore applications, it was also an exploration of data
reliability. Multiple complex systems talk to one another to generate the data, and in such
situations there are breakdowns of equipment, software, and firmware. In the immediate future
the focus will be on farebox data errata, data structure requirements, and diagnostic reporting and
tracking. A major reason LACMTA’s APC system is so reliable is that user and maintenance
departments have treated errata, and errata tracking, as “telltales” that ensure identification of
what needs to be fixed and where it is hiding.
One of the oversights of the current project was to discard intermediate link information once the
origin and destination links were identified. Such information is necessary for investigating
critical paths and calculating links per trip. It would also allow the mapping of O/D where the
intermediate link is the transit line that is being investigated.
The ultimate goal is the development of data structures and routines for regular, routine
processing of O/D datasets for queries and mapping.
The question of sample bias is always on the research agenda. Ameliorating the bias with on/off
data is a very attractive proposal by several researchers. It must be considered in the framework
of applications that can generate outcomes for very large datasets that are geographically
widespread.
Further research must also be undertaken on the extent to which travel in the matched sample
represents total travel. And this research should not restrict itself to whether the matched sample
represents fare card user travel. It is still problematic as to whether fare card user travel
represents all travel. The NYCTA finding that it is representative in that city may be perfectly
true and still say nothing about transit users elsewhere in the United States.
Even if the data from automatic sources is somewhat biased, it has benefits that survey data
lacks. Recognizing, exploiting and combining the strengths and weaknesses of data collected by
diverse methods are going to be major endeavors in the coming decade. With relational
databases we can already force fit diverse data sources; doing so in a valid manner is the
challenge. The real advances will be by researchers who can offer valid ways in which the
massive datasets can routinely calibrate or be calibrated to (and can supplement or be
supplemented by) the intentionally developed and controlled survey data.
REFERENCES
(1) Bagchi, M. & White, P.R. The Potential of Public Transport Smart Card Data, Transport
Policy, vol. 12, 2004, pp. 464-474.
(2) Farzin, J. Constructing an Automated Bus O-D Matrix Using Smart Card and GPS Data in Sao
Paulo, Transportation Research Record #2072, 2008, pp. 30-37.
(3) Barry, J., Newhouser, R., Rahbee, A. and Sayeda, S. Using Automated Fare System Data,
Transportation Research Record #1817, 2002, pp. 183-187.
(4) Stopher, P. The Travel Survey Toolkit: Where to From Here?, Keynote paper, 8th
International Conference on Travel Survey Methods, 2009 (contact: peters@itls.ussyd.edu.au).
(5) Rahbee, A. Smart Card Passenger Flow Model at CTA, Transportation Research Record
#2072, 2008, pp. 3-9.
(6) Wang, W., Attanucci, J., Wilson, N. A Study of Bus Passenger O-D and Travel Behavior
using Automated Data Collection Systems in London, unpublished manuscript, 2010 (contact:
winniewang@worldbank.org).
(7) Chu, K. and Chapleau, R. Enriching Archived Smart Card Transaction Data for Transit
Demand Modeling, Transportation Research Record #2063, 2008, pp. 63-72.
(8) Navick, D. and Furth, P. Estimating Passenger Miles, Origin-Destination Patterns, and Loads
with Location Stamped Farebox Data, Transportation Research Record #1799, 2002, pp. 107-113.
(9) Cui, A. Bus Passenger Origin-Destination Matrix Estimation Using Automated Data
Collection Systems, Master’s Thesis, MIT, 2006.