Analytics and Big Data: Rail Public Transportation is a Leader

Lyndon Henry
Railway Age Magazine Urban Rail Today Consulting
Austin, Texas

INTRODUCTION

Two concepts currently at the leading edge of today's information technology (IT) revolution are Analytics and Big Data. Analytics is high technology applied to data processing, complex calculations, and automation; Big Data is the current term referring to significantly large volumes of data, on virtually every facet of human activities and characteristics, that can be rapidly processed via Analytics, yielding a broad spectrum of highly useful results. Recent technological advances have sparked what amounts to a "revolution" in the application of these cognitive and informational tools.

Apparently without realizing it, the public transportation industry has, for many decades, been at the forefront in utilizing and implementing Analytics and Big Data, from ridership forecasting to transit operations. Rail transit systems have been especially involved with these IT concepts, and tend to be especially amenable to the advantages of Analytics and Big Data because they are generally "closed" systems that involve sophisticated processing of large volumes of data.

In virtually any American city, on any normal weekday, one is likely to see the results of analytics literally in motion: the operation of transit buses and trains that are essential to maintaining the mobility of the metro area.

The more that public transportation professionals and decisionmakers understand the role of Analytics and Big Data in their industry in perspective, the more effectively they will be able to utilize its promise. Furthermore, it is useful for both the public and the industry to realize how significantly public transportation has been a leading pioneer in the rich and extensive historic development of these tools, the roots of which in some cases extend back to 19th-century rail technology.

MAJOR APPLICATIONS OF ANALYTICS
Some of the most salient applications of Big Data and Analytics in today's urban rail transit are summarized in the following sections. [1] These range from urban planning activities with computerized processing of massive amounts of demographic and geographic data, to complex signaling and train dispatching or control systems, to communications, train tracking, and passenger information operations using increasingly common modern technologies like GPS, Wi-Fi, and cellular phone systems. The intensive use of Analytics is particularly underscored by the incorporation of "automatic" and "automated" in so many of the common technical features of modern rail operations: terms such as Automatic Block Signals, Automatic Train Control, Automated Passenger Counting, etc.

B – Partnering for Success

Travel Demand Modeling

Surely one of the biggest deployments of Big Data has been in planning new public transportation services and infrastructure. At least since the 1950s, this relatively gigantic undertaking has involved modeling (projecting) future travel demand in various urban areas. Not only have the public at large generally not realized the magnitude of this task, but transportation planners themselves seem to have been unaware of the extent to which systems modeling (involving projections ranging from travel demand to modal split to ridership) has, for more than half a century, represented one of the most widespread and intensive deployments of Analytics and Big Data.

The modeling process typically involves splintering up all the census tracts in a multicounty metro region into segments (each one often called a travel (or transportation or traffic) analysis zone, or TAZ), tallying the total households in each TAZ, then assigning some demographic characteristics (e.g., income level) to proportional household categories based on available data.
Next the model uses sophisticated algorithms to project future growth in population, economic activity, and perhaps other critical elements. Then (typically using estimates of factors such as travel time and cost) more algorithms project an estimate of all trips (for an average weekday) among all the households and centers of employment (and other activity centers, such as educational facilities, retail centers, etc.) among all these TAZs. [2] From this complex process enormous volumes of trips (Big Data) are projected, which are then assigned (via the model Analytics) into travel corridors. Ultimately, final projections are produced for transportation facilities such as new road routes, additional freeway lanes, and major new transit lines (such as light rail).

This highly complex procedure, involving massive amounts of data and networks of intricately interrelated algorithms, has been repeated in metro areas across the country, for many decades, well before the terms Big Data and Analytics were fashioned. And planning for rail transit systems has certainly been at the leading edge, driving this effort and refining the methodology from city to city, year after year.

Furthermore, the incorporation of global positioning systems (GPS) and geographic information systems (GIS), with associated Big Data Analytics, has facilitated major advancements in essential planning tasks such as pinpointing locations, determining land areas accurately, and overlaying and correlating large volumes of demographic data with geographic areas (e.g., census tracts, transit service areas, etc.).
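
The trip-projection step described above (estimating trips among all TAZs from factors such as travel time) can be illustrated with a small sketch. The paper does not name a specific algorithm; the example below swaps in the gravity model, one common textbook approach to trip distribution, purely for illustration. The function name, the zone figures, and the impedance exponent `beta` are all assumptions made for this sketch, not values from any real regional model.

```python
"""Toy gravity-model trip distribution among travel analysis zones (TAZs).

Illustrative only: the zones, trip totals, and beta are invented.
"""


def gravity_trips(productions, attractions, travel_time, beta=2.0):
    """Spread each zone's produced trips across all destination zones.

    productions[i]     trips produced by zone i (e.g., derived from households)
    attractions[j]     relative attractiveness of zone j (e.g., jobs)
    travel_time[i][j]  travel time in minutes from zone i to zone j (must be > 0)
    beta               assumed impedance exponent (not calibrated)
    """
    n = len(productions)
    trips = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # A destination draws more trips when it is attractive and close.
        weights = [attractions[j] / (travel_time[i][j] ** beta) for j in range(n)]
        total = sum(weights)
        for j in range(n):
            # Scale so that each row sums to the zone's trip productions.
            trips[i][j] = productions[i] * weights[j] / total
    return trips


if __name__ == "__main__":
    productions = [1000.0, 500.0, 200.0]   # hypothetical daily trips per zone
    attractions = [300.0, 800.0, 600.0]    # hypothetical jobs per zone
    travel_time = [
        [5.0, 15.0, 30.0],
        [15.0, 5.0, 20.0],
        [30.0, 20.0, 5.0],
    ]
    for row in gravity_trips(productions, attractions, travel_time):
        print([round(t) for t in row])
```

In a real regional model the same calculation would run over thousands of zones and be followed by mode choice and assignment of trips to corridors, which is where the data volumes described above arise.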
Train Signal and Control Systems

By far one of the most intricate and critical aspects of rail transit operations, the signaling and control (e.g., dispatching) component merits prominent discussion because it represents an ancestor, of sorts, of Analytics in public transit, originating in the late 19th century! At first, this was primarily a system (based on the electrical and mechanical technology of the period) to keep trains from crashing into each other. But in the modern era, it has been upgraded into a high-tech, complex process for tracking the location of trains, estimating travel times, and providing other valuable information. Intricate electrical circuitry and electronics (a quantum leap from 19th-century technology) form the basis of today's systems. [3] [4]

Contemporary urban rail systems use a wide range of signal types to manage train movements and ensure safe, efficient operation. In very simple operations, train signals may be integrated with ordinary street traffic light systems. On the other hand, where trains may run at higher speeds in exclusive alignments, automatic block signaling (ABS) systems are common, with lines segmented into fixed blocks governed by automatically operated signals that detect train occupancy and use red, green, or yellow signal lights on the wayside of the track to inform train operators whether their train should proceed, stop, slow down, etc. An additional improvement is the cab signaling system (CSS), whereby the current track block condition is displayed in the operator's cab (usually with wayside signals as a backup).

At the core of most of today's urban rail systems is a central dispatching operation, often highly automated. A centralized traffic control (CTC) system, involving the rapid and deft processing of train and track data via Analytics, not only controls how signals and switches are set, but usually also monitors the location of all the system's trains, their directions and speeds, whether they're on schedule, etc., typically displayed diagrammatically across several computer screens.

Simple signaling and control has evolved over many decades, and has in fact long incorporated features such as automatic train stop (ATS) to stop a train that proceeds past a red signal or perhaps commits other dangerous violations. Another advance, in use for many decades on numerous systems, is automatic train control (ATC) to control train movement according to signal and speed authorizations. This takes train protection a step further by implementing some form of speed control, usually with CSS, in response to external inputs.

On the very high end of train control, some systems are installing or upgrading to communications-based train control (CBTC), typically eliminating physical fixed-block segmentation of track in favor of moving "virtual" blocks with variable-length spacing between trains. Needless to say, computer-based CBTC is especially heavy on Analytics.

Some systems (such as the Port Authority Transit Corporation's Speedline from Philadelphia into South Jersey and Bay Area Rapid Transit) have deployed operational Analytics to a very high degree indeed, even automatically running trains via automatic train operation, or ATO. (Airlines have had air traffic control and autopilot capability for decades, but, in the Analytics race, rail technology beat them to it!) Currently, the newest installations of ATO appear to rely on CBTC technology.

In addition to these evolutionary technological developments, a 2008 federal law mandates for many transit operations the installation of a further signaling technology, positive train control (PTC). Basically incorporating a form of ATC, together with GPS, PTC involves the installation of special devices in thousands of locomotives and railcars, the construction of an extensive new wireless communications network (some using satellite-based links), and installation of many thousands of wayside devices (typically, transponders) interconnected with signals, switches and other railway hardware. By law, PTC must be functional by the end of 2015. [5]

Route Planning and Scheduling

With the advent and proliferation of computer technology, and advances in analytical processing of complex data, laborious transit scheduling tasks have been made considerably faster, less arduous, and more efficient through the application of Analytics to process large, complex volumes of data. Today's powerful software accomplishes the tasks of routing, developing timetables, blocking these into bus and train schedules, then performing runcutting and other essential component tasks such as rostering. As Christopher MacKechnie explains in an informative online summary, [6]

    Scheduling software allows a transit agency to design bus routes, create bus stops, schedule bus routes, combine individual bus trips into blocks, cut blocks into pieces that individual drivers will operate, on a daily basis assign individual drivers into runs, and provide customer information about the network. The automation allows for schedulers and transit planners to quickly develop many different scheduling scenarios rather than rely on just one, which has significantly [increased] the operational efficiency of today's transit systems.

Automatic Vehicle Location (AVL)

Some of the most useful and popular of today's applications of Analytics in public transit are automatic vehicle location (AVL) and associated passenger information systems (such as NextBus). Using GPS-based data plus Analytics to track both buses and trains, AVL has become an extremely reliable system to inform central dispatching personnel (or an automated control center) as to where trains are and whether they're on schedule, information that can then be communicated to passengers in stations. (While the data on the location of a single train at any point in time is relatively small, in the aggregate, with multiple trains moving constantly, the volume quickly leaps to the category of Big Data.)

Many transit agencies use AVL integrated with a passenger-oriented information system (NextBus, the brand name of the most widely deployed system, has virtually become a generic term for this) to clue waiting passengers as to when their trains (or buses) are due to arrive at their stations or stops. Passenger information display (PID) monitors or digital signs in stations or even available apps on smartphones keep passengers updated on imminent arrivals or departures.

Automated Fare Collection (AFC)

While passengers still drop coins in onboard fareboxes on most urban bus systems, fare collection on almost all rail systems today is largely automated. Automated fare collection (AFC) typically uses ticket vending machine (TVM) devices in stations that can receive cash or process credit card swipes, thus also instantly updating a central database, often with voluminous amounts of really Big Data. Passes and discounted multi-tickets are encouraged, but the hot trend is toward smartcards that provide access to all types of transit services across multiple operating agencies and jurisdictions. When a passenger uses a credit card, the transit agency can correlate passenger travel with other data available from the credit card. Slick new analytics give transit agencies details of how passengers are using their systems, identify trends, and help improve service.

Automated Passenger Counting (APC)

While AFC can tell a transit operator how many people are purchasing tickets or passes, and how much fare revenue is being taken in, transit agencies still need to count how many passengers are actually boarding each bus or train. An automatic passenger counting (APC) system not only can inform the agency as to how many passengers are boarding or deboarding each vehicle, but precisely where this happens, and it can relay this information online, continuously, to a central database (typically generating Big Data in the process). Especially by applying sophisticated Analytics, the transit agency can then use this data to provide better service and project evolving ridership trends.

EXAMPLES: SELECTED SYSTEMS

To illustrate the diverse role of Analytics and Big Data in a broad variety of tasks and scenarios, it's helpful to summarize a selective sampling of useful and essential applications involving Analytics in several U.S. rail transit systems of various sizes.

Bay Area Rapid Transit (BART)

The Bay Area Rapid Transit (BART) system provides a highly automated, relatively high-speed, urban-suburban rail rapid transit (RRT) system for the San Francisco Bay Area, serving the counties of San Francisco, Contra Costa, San Mateo, and Alameda. The system consists of 105 miles of double-track RRT service with 43 passenger stations; it includes a 4-mile-long underwater tube connecting San Francisco with Oakland. Average weekday ridership totals about 370,000.

From its Lake Merritt Operations Control Center, BART maintains supervision over all phases of its system, including train operations, passenger services, power delivery, and wayside facilities. And Analytics, especially with the processing and interpretation of Big Data, is a key element within all of these functions. [7]

The critical role of some key aspects of BART's operational analytics (aimed at ensuring schedule reliability) is the focus of a 2012 article by Beth Schultz, titled "Operational Analytics Keeps Bay Area Trains on Track" and posted on the All Analytics website. [8] On-time performance is cited as "the most important issue" for BART's passengers, and it certainly is for the system's management. Thus, "some rather sophisticated operational analytics" are essential to enable the agency to know "if its trains are running on time and patrons arriving at their destinations as expected...."

Implementing this, a variety of operational analytics performed by BART's reliability engineering team includes delay analysis, passenger flow modeling (PFM), and system performance analysis, as well as various other types of modeling used for forecasting and similar tasks. Notably, "the data required for the analytics is complex and voluminous": definitely Big Data. Using infrastructure based on an IBM Maximo asset management system and an Oracle database/Linux platform, the BART team's operational analytics implements algorithmic code developed by Analytics software vendor SAS (ported from an original mainframe-based environment).

Critical to the top priority of ensuring on-time service is the PFM application, which deploys time-series analysis integrated with econometric data to render ridership forecasting models with the objective of optimizing train schedules to ensure high customer satisfaction while constraining service operating costs. The goal is to avoid running trains that are either under- or overloaded; PFM, in addition to other functions, "captures or estimates train loadings for use in generating the train schedules."

Through this application of Analytics, BART's operational team can monitor and analyze train arrivals and departures precisely. Furthermore, that information can be integrated with data incoming from station fare gates, thus enabling the team to also monitor the flow of passengers through the system, using identification codes from passenger tickets, cards, passes, etc. Altogether, Analytics facilitates more accurate projections of service needs in terms of schedules, train consists, and similar essential features.

Thus, according to Roy Henrichs, reliability engineering team manager, this application of Analytics enables BART to address three critical needs passengers face: "You arrive at the station, and the first question you ask is, 'Where's my train?' Then you ask, 'Where's my seat?' And finally, 'Will I be on time?'" Using operational Analytics, BART has been able to resolve these issues positively for passengers, thus ensuring high user satisfaction.

Salt Lake City TRAX

Salt Lake City's TRAX light rail transit (LRT) system illustrates how Analytics (in this case, tracking and utilizing large volumes of real-time operational data) plays an integral role in even a modest rail system in a midsized American city. With the opening of the new Airport line in mid-April 2013, the TRAX LRT system stretches 41.3 miles over three lines (Red, Blue, and Green) with 47 stations, carrying daily ridership of about 65,000. An extension to the suburb of Draper nearing completion will expand the system to about 45 miles. [9]

With a fleet of 146 LRT cars that run in train consists of up to three cars at maximum speeds ranging between 15 and 65 mph, TRAX's central operations must keep track of up to 23 trains running at headways as close as five minutes in peak periods or 20 minutes in off-peak periods. Particularly complicated is operation in sections where two or three lines share the same tracks. TRAX also shares a section of its line to suburban Sandy with freight trains of a short line operator.
This joint use is authorized via temporal separation by the Federal Railroad Administration (FRA). To coordinate all these trains, including LRT often at close headways, a thoroughly reliable, efficient, safe, and secure train control and traffic management system is essential — and for TRAX, an ABS system with fixed blocks provides this capability. Analytics is an especially critical ingredient in the CTC train tracking and dispatching system, which relies on a GPS network to locate more than two dozen trains at peak times and communicate their positions to the control center, located at TRAX's Jordan River Service Center.. Since TRAX uses a relatively modest, less costly dispatching system heavily incorporating the KISS 4 B – Partnering for Success principle (Keep It Simple, Stupid), the system relies mainly on human dispatchers (rather than automation) to monitor train locations and operations. In turn, dispatchers perform essential functions like permitting trains to embark on each trip from a terminal station, throwing switches when necessary, and ensuring that trains adhere to schedule. But the GPS-based network and associated Analytics comprising the backbone of the dispatching system also facilitate a convenient passenger information system, with information presented via PIDs suspended beneath the roofs of TRAX stations. Similarly, reliable communications, transmitting large volumes of Big Data, and Analytics are involved in the self-service AFC system, including TVMs in all stations. Data" that would enable the agency not merely to track all its vehicles in service but, by applying Analytics to scrutinize service performance, to fine-tune and improve its operations. [11] MetroRail cars are also equipped with APC. While this is obviously valuable in gathering passenger statistics, it's also an efficient tool for planning and operations (e.g., adjusting schedules to accommodate passenger flows and changes in traffic demand by time of day). 
[10] Philadelphia — SEPTA Regional Rail Southeastern Pennsylvania Transportation Authority (SEPTA) is a very large transit agency operating an extensive system of rail rapid transit (subway and elevated), regional rail (commuter), light rail (urban and suburban lines, some with subway operation), motor buses, and electric trolleybuses ("trackless trolleys"). SEPTA deploys Analytics and Big Data throughout its system in a wide variety of functions, including operations control, AVL, train signaling and dispatching, AFC with TVMs, APC, passenger information with PIDs, and other tasks. However, for examples of the role of Analytics and Big Data, only selected applications in just a couple of these modal categories will be discussed here summarily. SEPTA's Regional Rail system consists of 13 electrified lines operating FRA-compliant rolling stock over about 280 miles of track, stretching as far north as Trenton, New Jersey, and south to Wilmington, Delaware. Daily ridership totals nearly 124,000. Regional Rail operations deploy Analytics intensively in the signaling/control/dispatching system, mostly with a combination of CTC and ABS. On lines shared with Amtrak (Paoli-Thorndale, Cynwyd, Chestnut Hill West, Airport, Trenton and Marcus Hook-Newark) Amtrak controls the dispatching and SEPTA trains are equipped with cab signals and compatibility with Amtrak's relatively high-tech Advanced Civil Speed Enforcement System (ACSES) system for PTC. On Amtrak’s Keystone corridor (Philadelphia to Harrisburg) cab signals are utilized, but the line is not 100% ATC. [12] The ACSES system is being extended to all of SEPTA's Regional Rail lines. Working in unison with SEPTA's existing signaling-control operations, these two systems will provide the functionality of PTC in compliance with the federal mandate. 
SEPTA's PTC system will be able to enforce permanent and temporary civil speed restrictions and train stops through a network of transponders, while maintaining the continuous track Austin — Capital Metro's MetroRail Since March 2010, Austin, Texas's Capital Metropolitan Transportation Authority (Capital Metro) has been operating its MetroRail light railway using diesel multiple-unit (DMU) rolling stock over a 32-mile line, with 10 stations, from the city's lower downtown to a northwestern suburban town. Despite the length, it's a relatively small, bare-bones system, with a fleet of just six DMUs, and daily ridership currently averaging about 2,200. In an arrangement similar to Salt Lake City's TRAX system (with its line to Sandy), MetroRail also shares its tracks, under temporal separation mandated by the FRA, with freight trains of a short line operator. Despite MetroRail's relatively small size, Analytics plays a critical role in the line's operations, particularly in its ABS system overseen by CTC. Communication between blocks and to-from the CTC control center is maintained via data radio as the primary system, and a cellular phone system as a secondary backup. [10] While trains (currently run as single cars) are equipped with GPS, the geopositioning system is not currently used for routine train location, but mainly as a component of the passenger information system. (GPS for train location is used as a temporary expedient for emergency situations or unusual freight train movements.) Thus, in what's in effect a limited AVL application, GPS provides train schedule information (e.g., the next arrival or departure) at stations via PIDs. The system, originally installed by Orbital Sciences Corporation, is now branded as ACS, under parent Xerox Corporation. Capital Metro has plans for major expansion of GPS and AVL in both its bus and MetroRail services. 
Possibly to be developed within the next several years, according to Todd Hemingson, Vice-President for Strategic Planning, AVL would generate "a massive pool of Big 5 B – Partnering for Success monitoring advantages of the current ATC System. The installation of the ACSES system will also ensure interoperability with Amtrak and various freight carriers. On Regional Rail lines owned and maintained by SEPTA there is a mixture of ABS and ATC (Rule 562: Cab Signals with no wayside signals). While all trains have cab signals, not all lines have been upgraded from ABS. ATC is operational from Center City to Doylestown, Jenkintown to Woodbourne, Glenside to Warminster, Newtown Junction to Fox Chase and from Wayne Junction to Chestnut Hill East. Projects are currently under way to convert the Manayunk-Norristown Line (16th Street Junction to Elm Street) and Chestnut Hill West (North Philadelphia to Chestnut Hill) to ATC (Rule 562). GPS is used for AVL, processed through the Regional Rail control center. APC is gradually being introduced into the Regional Rail system; SEPTA's new Silverliner V cars are equipped with APC detectors. APC data will be used to adjust scheduled consists and to track trips that receive external funding, such as services receiving federal Job Access and Reverse Commute (JARC) funding. Analytics plays a role in current planning and scheduling. For schedule development, the Regional Rail system uses Multi-Rail Passenger Edition, soon to be upgraded to Enterprise Edition. SEPTA's passenger information system provides PIDs with train arrival/departure updates in some of the system's larger stations. In addition, the system includes an app that provides bus and train status information to passengers' smartphones (Android and I-phone platforms). latitude/longitude, it will be possible to tally and analyze passenger boarding and alighting at each station. 
As with bus, trolleybus, and high-speed services, for the schedule management and planning of the suburban trolley services, SEPTA's Service Planning deploys the automated Trapeze scheduling system. Furthermore, this is integrated with Google Maps so that all route changes mapped in Trapeze are communicated to Google for automatic updating. And the smartphone-based passenger information app described in the Regional Rail section also serves passengers on these suburban trolley routes. Seattle — Sound Transit's Link and Sounder The Seattle-Puget Sound region's Sound Transit (ST) agency provides several important rail transit services reaching from the Central Business District into the surrounding metro area. ST's Central Link is a 15.6-mile (25.1-km) LRT line running between downtown Seattle and Seattle-Tacoma International Airport, with 13 stations. Average weekday ridership is about 25,300. ST's Sounder is a regional passenger rail (commuter rail) service operated under contract by BNSF Railway. From central Seattle, trains run north to Everett and south to Lakewood, plus two daily round-trips to and from Tacoma, over about 82 miles (132 km) of route, with 9 stations (and another 3 under construction). Average weekday ridership is about 25,300. As with other major rail transit operations, Analytics and Big Data are intensively involved in signalingcontrol-dispatching; passenger information with online and smartphone train status information and station PIDs; APC; GPS and AVL capabilities; and AFC with TVMs in stations. By far one of the most interesting deployments of Analytics and Big Data can be seen in a relatively recent expansion of the AFC system with the regional, transagency ORCA payment card. 
A contactless, stored-value "smartcard" containing a microprocessor, the ORCA (One Regional Card for All) card is used for the payment of public transportation fares on most of the region's major bus and rail services, including Washington State Ferries — thus providing a virtually "seamless" fare-payment (in effect, a prepaid pass) among these multiple systems and agencies.. [13] [14] The card medium itself must be purchased by the user (currently the charge is $5.00 or less, depending on the user's eligibility for discounts). Value (for fare payments) must then be added to the card (typically, via a credit card account, often as an online transaction). Discounts are offered for multi-ride packages as well as for passengers that are seniors, disabled, or in other Philadelphia — SEPTA Suburban Trolley Lines Another category of SEPTA's rail operations that provides interesting examples of some aspects of the deployment of Analytics, particularly in signaling-control functions, is the suburban trolley lines, Routes 101 (Media) and 102 (Sharon Hill). Totaling 11.9 miles (19.2 km), with 52 stations, the two lines carry daily ridership averaging over 6,500. While these LRT services use ABS, dispatching is very bare-bones — i.e., manual communication with the control center, where dispatchers authorize train departures by voice. However, conversion of signalingcontrol to CBTC is being planned. GPS is currently used primarily to assess on-time performance (averaging about 92%). [12] Suburban trolleys do not currently have APCs installed, but plans to install up to 10 units are awaiting funding. Since each station is now geocoded with 6 B – Partnering for Success qualified categories. Thus, the equivalent of multi-ride passes can be purchased as well as single fares. The card eliminates the inconvenience to passengers of constantly having to find currency or change to pay fares, especially when transferring from system to system. 
Passengers can use the ORCA card somewhat like a debit card. Entering a rail station or ferry terminal, or boarding a bus, the passenger can just tap it against an electronic reader. But ORCA card information can also be a source of significant Big Data for transit agencies, providing information on individual passenger movements on transit throughout the region, as well as broader data on passenger flows at various locations and times of day. Processed with good Analytics, the card data provide a wealth of information for planning and scheduling, leading to service improvements, as well as for marketing.

Tacoma — Sound Transit's Tacoma Link Streetcar

Tacoma Link, operated in central-city Tacoma by Sound Transit, is a very small 1.6-mile (2.6-km) streetcar-type light rail transit line with 5 stations, carrying daily ridership of roughly 3,800. Currently, the service is provided free (no fare), so there is no integration with ST's AFC system and the ORCA card.

While the system is currently extremely bare-bones in overall design, with relatively simple operation, it does integrate its APC system with onboard GPS using inputs from the cars' door systems. Furthermore, GPS deployment in operations is planned to be further expanded. Currently on-train passenger announcements use wheel pulses from the cars' propulsion sensors to gauge distance traveled. However, the agency is in the process of replacing this older passenger information system on the train with a new digital system that will rely on GPS to identify train position. [15]

CURRENT ISSUES AND TRENDS

Where are Analytics and Big Data headed in public transportation? Here's a brief overview of some of the major current issues in Analytics and Big Data and the implications for rail public transportation.

Data Mining

This application of Analytics and Big Data has acquired a somewhat adverse public reputation, mainly because of privacy issues raised by intrusive manipulation of personal information. But data mining is, more typically, merely a computer-based process of discerning patterns in sizable sets of data: for example, discovering passenger flow and mobility patterns from boarding-deboarding data at stations. The basic aim of data mining is to glean information from a set of data and transform it into an intelligible structure for further useful analysis. Data mining is tending to bring together innovations in statistical analysis, database architecture, and machine learning development.

Particularly with the maturing of technologies such as AFC and APC, opportunities abound for the rail transit industry to utilize data mining of the data flows from these technologies to analyze operations, passenger behavior, and other phenomena. This can then be utilized to improve services and performance, thus better fulfilling the basic missions of transit agencies.

Cloud Computing

Cloud computing commonly refers to the utilization of computing resources (both hardware and software) that are available over a network (typically the Internet). On a small scale, using online software and servers (e.g., an Email system or blogging software) is an example. However, cloud computing has grown as a means of providing the substantial computing resources (in terms of both computational "firepower" and storage) needed for the increasingly gigantic volumes of data (really Big Data) many organizations now encounter.

With cloud computing, an external entity and remote services must be entrusted with the user's data, and many organizations understandably are reluctant to pass access to such sensitive data to external users. However, such security concerns are traded off against the necessity to have access to the necessary off-site computing and storage resources.

One resource commonly used for many Big Data Analytics applications is Apache Hadoop, an open-source software framework supporting distributed processing applications, often needed for data-intensive tasks. Derived from Google's MapReduce and Google File System research, and written in the Java programming language, Hadoop is designed to support running applications on large "clusters" of commodity hardware (i.e., affordable and easily procured).

Whether transit agencies will need to access cloud computing resources to effectively handle future needs in Analytics and Big Data remains to be seen. But these resources merit monitoring in the event such needs do eventually arise.

Sentiment Analysis

Sentiment analysis, also known as opinion mining, is actually a form of data mining applied to textual, verbal information. By parsing human verbal communication through natural language processing, applying computational linguistics, and deploying text analytics, subjective information, such as attitudes, opinions, and even intentions, in source materials can be identified and extracted for more intense processing and scrutiny.

In general, the objective of sentiment analysis is to determine the attitude of individuals (the public, specific business customers, transit passengers, etc.) with respect to certain issues. Typically, specific issues are assessed against the contextual polarity of each of the verbal documents, which might be Emails, text messages, postings to forums or Facebook, Twitter messages, and so on. Evaluating aspects such as attitudes, judgments, or emotional states, is key to the process.

The prominence of social media (especially Facebook and Twitter, as well as blogs and other social networks) has expedited interest in sentiment analysis. The proliferation of verbal data such as consumer reviews, subjective ratings, and personal recommendations, together with other types of verbal expression publicly available online, has tremendously increased attention in regard to this aspect of Analytics.

For public transit agencies, sentiment analysis is a potentially valuable tool that merits consideration: for example, to gauge public attitudes toward the agency's services in general; to assess attitudes in regard to a new service, or perhaps a political issue such as a ballot measure; or simply to sift for transit-related issues important to passengers or the public at large. In some agencies, sentiment analysis is also used to monitor for security threats to transit operations or passengers.

Privacy Concerns

Certainly one of the most hot-button issues with respect to the general public's relationship to Big Data, Analytics, and associated applications such as data mining, is the issue of privacy. For transit agencies, the potential exists to extract great volumes of Big Data from fare transaction data, passenger counts, and even surveillance of passengers in trains and stations. Yet the opportunity for abuse is clear, and the public realize this; it's also a major issue of concern within the professions themselves that are involved with Analytics and Big Data.

An example is the ongoing controversy over Seattle's ORCA card (see previous discussion), underscored by a news reporter's ominous headline "Is Big Brother watching your ORCA card?" [14] This arises from the revelation that ORCA card sponsors and participants are data-mining information from passengers' use of the card; indeed, employers that provide the card to their own employees can receive data as to how and where each individual employee is using the card and traveling on the various transit systems. As the article reports:

"Whenever someone buys an employer-subsidized fare card through one of 2,000 companies or institutions, the employer has the right to see that person's travel records. A boss could check to see, for example, whether someone is abusing a subsidy by reselling ORCA cards or find out if an employee called in sick but rode the bus to the mall or the beach. And if you register any ORCA card, as transit officials suggest to protect against loss or theft, your personal information goes into the transit-agency database. Personal fare-card information is technically available to news media and other groups, as well, though it's unclear how forthcoming ORCA would be in providing it."

In another application of data mining, a form of sentiment analysis is used by some transit agencies not merely to assess public attitudes toward the agency, but to parse personal text and Email messages and evaluate content to reveal possible intentions of threats against the system's operations or passengers. Despite the presumably benign objectives of agency security personnel, the public may well perceive this practice as a serious violation of the right to privacy.

There's no "magic potion panacea" to apply to this issue. However, transit agencies would be well-advised to exercise caution in encroaching on personal privacy, and to keep monitoring this issue as it evolves in public discourse.

Security issues

Protecting critical data from theft, vandalism, intrusion by unauthorized users, and other hostile or destructive acts is obviously a major concern for transit agencies. This concern has only grown greater with the expansion of Big Data. This issue has escalated even further recently with the increasing publicity of "cyber-attacks" on the data and cybernetic functioning of large institutions, from banks to electric power installations to military facilities. Clearly, transit agencies are vulnerable and need to maintain vigilance against such threats.
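One very simple way to approximate the contextual polarity of a message, as described in the Sentiment Analysis discussion, is to count words drawn from small positive and negative word lists. The sketch below is only an illustration under that assumption: the word lists, the `polarity` and `label` functions, and the sample messages are all hypothetical, and practical sentiment analysis relies on far richer natural language processing.

```python
import re

# Hypothetical, deliberately tiny word lists for illustration only.
POSITIVE = {"great", "clean", "fast", "reliable", "love"}
NEGATIVE = {"late", "dirty", "crowded", "slow", "hate"}
NEGATORS = {"not", "never", "no"}


def polarity(message):
    """Return a score > 0 for positive, < 0 for negative, 0 for neutral text."""
    words = re.findall(r"[a-z']+", message.lower())
    score = 0
    for i, word in enumerate(words):
        value = (word in POSITIVE) - (word in NEGATIVE)
        # Flip the sign when the preceding word is a simple negator ("not late").
        if value and i > 0 and words[i - 1] in NEGATORS:
            value = -value
        score += value
    return score


def label(message):
    """Map the numeric score to a coarse sentiment label."""
    score = polarity(message)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    samples = [
        "The new light rail line is fast and clean",
        "Train was late and crowded again",
        "The bus was not late today",
        "Does the streetcar stop downtown?",
    ]
    for text in samples:
        print(f"{label(text):8s} | {text}")
```

A word-list approach like this misses sarcasm, context, and most negation patterns, which is why it should be read as a sketch of the idea of polarity scoring rather than a usable tool.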
Robotics

Robotics technology incorporates some of the most advanced applications and developments of Analytics to Big Data sets and challenges, addressing the design, fabrication, operation, and application of automated machines or devices that can replicate human activity or behavior in situations ranging from dangerous environments, manufacturing processes, or tediously repetitive tasks, to simply ordinary, routine physical functions such as housekeeping chores or operating a vehicle. While some robots may be designed to resemble humans in appearance, in most cases they are designed to assume human behavior and even cognition.

With the development and deployment of automatic train control and, increasingly, totally autonomous, self-controlled and self-monitored transit operations (e.g., driverless metros), rail public transport systems have certainly been in the forefront of robot technology for decades. Transit professionals, and especially IT personnel, should continue to monitor developments in this area of Analytics, seeking opportunities for the further application of this promising technology to solve problems and improve services in transit operations.

Predictive Analytics

Comprising an array of techniques from statistics, modeling, machine learning, artificial intelligence, and data mining, predictive analytics applies Analytics to current and perhaps historical data (increasingly, Big Data) to develop predictions about future (or perhaps otherwise occult) events or possible outcomes. Exploiting patterns detected in historical and transactional data, predictive models can be used to identify risks and opportunities, capturing relationships among a variety of elements to facilitate assessment of the potential risk associated with a particular set of conditions, thus helping to guide decisionmaking.

But certainly the most venerable and productive use of predictive Analytics models in public transportation has been to evaluate the future role of public transit systems, forecast ridership, and suggest the need for new transit systems and facilities (as detailed in the earlier discussion on Travel Demand Modeling).

Predictive models may have further benefits for public transportation, such as illustrated in BART's use of Analytics for passenger flow modeling and related operational projections and simulations. Another possibility, meriting evaluation by transit agencies and IT professionals, is applying predictive modeling to analyze behavioral data to evaluate the propensity of transit passengers to exhibit specific behaviors. This would be useful, for example, as a tool to help improve the effectiveness of new marketing efforts.

NOTES

1. This section has been adapted and expanded from Henry, Lyndon. Public Transportation Moves With Analytics. All Analytics (online), 10 July 2012. http://www.allanalytics.com/author.asp?section_id=2310&doc_id=247066

2. BMC staff. Travel Demand Forecasting Model. Baltimore Metropolitan Council (BMC) website, 2013. http://www.baltometro.org/regional-data-andforecasting/travel-demand-forecasting-model

3. RTWP editors. US Railroad Signalling. Railway Technical Web Pages (RTWP). Site updated 3 April 2013. http://www.railway-technical.com/US-sig.shtml

4. Burgett, Michael J. The Engineering Basics of CTC. Control Train Components website. Accessed 11 April 2013. http://www.ctcparts.com/aboutprint.htm

5. AAR editors. Positive train control. Association of American Railroads (AAR) website. Accessed 13 April 2013. https://www.aar.org/safety/Pages/Positive-TrainControl.aspx#.UWlKRZNthLU

6. MacKechnie, Christopher. Software Used in the Public Transit Industry: Hastus by GIRO. About.com Guide. Accessed 2 April 2013. http://publictransport.about.com/od/Transit_Technology/a/Software-Used-In-The-Public-Transit-Industry-HastusBy-Giro.htm

7. Center for Urban Transportation Research (CUTR) staff. Case Study — Bay Area Transit District (BART) — San Francisco, California; CUTR, University of South Florida (USF); document #FTA-FL-26-71054-03. http://www3.cutr.usf.edu/security/documents/UCITSS/BART.pdf

8. Schultz, Beth (Editor in Chief, All Analytics website). Operational Analytics Keeps Bay Area Trains on Track. All Analytics (online), 15 May 2012. http://www.allanalytics.com/author.asp?section_id=1411&doc_id=244062

9. This section has been adapted and expanded from Henry, Lyndon. Analytics Keep SLC's Light Rail on Track. All Analytics (online), 28 December 2012. http://www.allanalytics.com/messages.asp?piddl_msgthreadid=260931

10. Clendennen, Mark (MetroRail, Capital Metro). Phone conversation, 10 April 2013.

11. Hemingson, Todd (Vice-President for Strategic Planning, Capital Metro). Phone conversation, 5 April 2013.

12. Calnan, John F. (Manager, Suburban Service Planning & Schedules, SEPTA). Phone conversation, 10 April 2013. Email message, "Signals, APC, GPS etc. on SEPTA Suburban LRT", 11 April 2013.

13. ORCAcard.com website editors. About ORCA. Accessed 9 April 2013. http://www.orcacard.com/ERG-Seattle/p3_001.do?m=3

14. Lindblom, Mike. Is Big Brother watching your ORCA card? Seattle Times, 17 December 2009 (updated 18 December 2009). http://seattletimes.com/html/localnews/2010537022_orcacard18m.html

15. Blackburn, Robert (Tacoma Link Light Rail Manager, Sound Transit). Email message, 26 March 2013.

CONTACT INFORMATION:
Lyndon Henry
nawdry@gmail.com
Phone 512.441-3014