Cross-Platform Aviation Analytics Using Big-Data Integration Methods
2013 Integrated Communications Navigation and Surveillance (ICNS) Conference
April 25, 2013

Dr. Tulinda Larsen, Vice President
tulinda@masflight.com
Mobile: +1 (443) 510-3566
4833 Rugby Avenue, Suite 301, Bethesda, Maryland 20814
www.masflight.com

THE ANALYSIS CHALLENGE
The Analysis Challenge
Scale and complexity of aviation data limit research applications

Problems Acquiring Data
Obtaining radar and airport data, schedules, weather maps and forecasts, fleet information
• Real-time transmission of very large data
• Proprietary and inconsistent formats
• No conditioning or validation

Problems Analyzing Information
Using data for strategic planning and recovery, cost improvement and new market opportunities
• Goes beyond desktop capability
• Time-consuming manual slicing of data
• Need weather and competitor information to answer key operational questions

[Diagram labels: Fuel and Oil Conservation; Gate and Terminal Use; Weather Plan & Ops Recovery; Pilot and Crew Staffing; Operational Optimization]

Big-data analytical methods can address these challenges

CLOUD COMPUTING
What is Cloud Computing?
The cloud consists of terrestrial servers across the Internet that collectively store, manage and process data

Figurative “Cloud”
• The term comes from the common use of a cloud-shaped symbol as an abstraction for the Internet, but its application to virtual servers is as recent as 2006
• Cloud computing is the use of resources (hardware and software) that are delivered as a service over the Internet or other network

[Figure: Cloud Computing Architecture. Layers: Application, Platform, Infrastructure. Labels: Network, Identity, Monitoring, Content, Object Storage, Computation, Financial Metrics, Communication, Collaboration, Storage, Databases]

CLOUD COMPUTING
What are Cloud Architectures?
• Cloud computing services can be delivered by an internal IT organization (company-owned private cloud), by an external service provider (managed services private cloud or public cloud provider), or shared between users
• In aviation, cloud resources can be customized and shared among consortiums of customers (community cloud) or shared with customers in other industries (public cloud)

[Figure: cloud provider types. Labels: Public Cloud Providers, Community Cloud Providers, Managed Services Private Providers, Company Private Clouds, Low Industry Expertise, High Industry Focus, Private Infrastructure]

BIG-DATA ANALYTICS
What is Big-Data Analytics?
• The process of examining diverse, large-scale data sets to uncover patterns, unknown correlations and other useful information
• Organizations have different levels of (1) database management expertise and (2) knowledge to process and analyze big data sets
  – “Big data” is a relative term based on the user
  – Data tables in excess of ten terabytes (10TB) are difficult to work with using most relational database management systems, and particularly using desktop statistics and visualization packages, including Microsoft Excel and Access
• Unstructured data sources in the operational world simply do not fit into desktop or small-scale database structures
  – They can be hosted using cloud computing at lower cost, and mined more efficiently, than with on-premises database architectures

BIG-DATA ANALYTICS
What are Big-Data Analytics Tools?
• Big-data analytics employs software tools from advanced analytics disciplines such as data mining and predictive analytics
  – Mining these multi-terabyte data sets for data, trends or analysis requires parallel software running on tens, hundreds, or even thousands of servers to keep pace with user demands and processing expectations
• A new class of big-data methods has emerged to address user demands for horizontal scaling and availability of underlying data
  – Hadoop and MapReduce, among others, offer fast processing speed
  – Great for large-scale static data sets, but not so great for real-time data
  – Most organizations employ a hybrid method combining technologies
• A robust open source framework supports processing in clustered systems
• Platform-as-a-service vendors (Microsoft, Amazon, Google) offer turn-key solutions for analysts to simply upload, link and compute basic data sets
  – Great for simple historical analysis; bad for real-time or diverse data sets

MASFLIGHT
masFlight: A Global Aviation Data Warehouse and Big-Data Analytics Platform

Hybrid Architecture
• Physical architecture for secure data feeds
• Cloud-based instances for linking
• Managed cloud data tables
• Integrates with local BI and warehouses

Redundancy
• Multi-source data acquisition
• Real-time validation and processing
• Replication across cloud infrastructure
• Load balancing and parallel processing

Backup
• Cluster processing to reduce dependencies
• Monitored data integrity and performance
• Multiple geographic zones and clusters
• Imaging of tables for replication

Customization
• Customizable for specific user requirements
• Dashboards and web templates
• Integrated internal data in warehouse
• Connect to local BI systems

DATA AND APPLICATIONS
masFlight’s Data and Applications Platform

OUR CLOUD-BASED DATA WAREHOUSE
Data Input Feeds
• Reference and Static Data: geospatial, airline, airport info
• Current Weather: global hourly conditions
• Forecast Weather: standard and severe forecasts
• Flight Schedules: what’s planned to operate
• Airport & Gate Status: multisource, real-time feeds
• U.S./Canada Radar: authorized direct access
• Other Airspace Data: satellite and transponder info
• Government Economic Data: revenue and audited data

Warehouse components
• In-House Servers: for private gov’t feeds
• Secure External Network
• Cloud Warehouse: linked information, 60TB structured data
• Robots and Java Applications: automated collection

OUR CUSTOMER APPLICATIONS
• Web Application (masflight.com): HTML 5 / Ruby, analyst focused, customizable, fast deployment, SaaS revenue model
• Dashboards & Web Services: REST web services, feed internal systems, custom dashboards, flexible interfaces, secure
• Cloud Managed Database Hosting: virtual tables, updated in real time, bypass constraints, ultimate customization

MASFLIGHT PLATFORM
masFlight Platform
Multisource, integrated airline operations data
• Planned Flight Schedules
• Airport Runway Data
• Airport Gate & Terminal Data
• Airline Ops Data
• Multisource Flight Status
• U.S. Radar Data
• Airline Fleet Information
• Global Weather Data and Maps

[Figure: Key Partners and Suppliers]

Our platform shows where, when and why problems occur
• Examine diversions, cancellations, delays and determine root causes
• Deep-dive into airport gates, taxi times, and runway patterns
• Analyze air space usage and air traffic management

END TO END CAPABILITY
Big-Data Analytics Facilitates End-to-End Analysis
A full picture of each flight is critical for analyzing operations
Query flights from planned schedule through post-operation recovery

Up to 500 data points per flight
• Origin weather
• Origin information
• Operating airline
• Scheduled times
• Departure gate/time
• Taxi-out/takeoff times
• Flight plan filed
• Actual path flown
• Congestion
• Weather diversions
• En-route times and fixes
• Arrival weather
• Destination information
• Landing/taxi times
• Arrival gate/time
• Diversion data
• Aircraft information

[Diagram labels: KIAD, V268, SWANN, 1502Z, 1550Z, 1620Z]

Other sources only offer limited, disaggregated and unformatted regional data

COVERAGE
A Global Solution
masFlight tracks flights, airports and weather around the world

[Maps: North and South America; EMEA and Asia]

• Global daily flight information capture
  – 82,000 flights
  – 350 airlines
  – 1700 airports
• Integrated weather data for 6,000 stations
  – Match weather to delays
  – Validate block forecasts at
    granular level
  – Add weather analytics to IRROPS review and scenario planning

[Map caption: White lines are flights in the masFlight platform from February 8, 2013. Yellow pins are weather stations feeding hourly data to our platform. Maps from Google Earth / masFlight]

TOWER CLOSINGS
Example 1: Proposed FAA Tower Closures
masFlight used big data to link airport operations across three large data sets:
  – Current and historical airline schedules
  – Raw Aircraft Situation Display to Industry (ASDI) radar data from the FAA
  – Enhanced Traffic Management System Counts (ETMS), including airport operations counts by type (commercial, freight, etc.), departure and arrival

Findings: Proposed Tower Closings
[Map: dots indicate closures; red dots have scheduled service]
• From schedules database: 55 airports with scheduled passenger airline service
  – 14 EAS airports
• From ASDI & ETMS: 10,600 weekly flights on a flight plan (excluding VFR and local traffic)
  – 6,500 Part 91/125 weekly flights
  – 4,100 Part 135/121 weekly flights
Based on scheduled service March 1 – 7, 2013; scheduled service includes scheduled charter flights, cargo flights, and passenger flights

TOWER CLOSINGS
Example 1: Big-Data Analytics Applied to ASDI and ETMS To Analyze Operations
Distribution of Airports By Average Number of “Daily” Impacted Flights
Airports Affected by Tower Closures

Average Number of Daily Operations    Count of
with a Flight Plan Filed              Airports
Up to 5                               44
5-10                                  26
10-15                                 24
15-20                                 23
20-25                                 11
25-30                                 10
30-35                                 6
35-40                                 2
40-45                                 1
45+                                   2

Source: ASDI radar data – Part 91/125 flying and Part 135/121 flying – March 1-7, 2013; masFlight analysis
Note: Average “daily” operations based on 5-day week

CAUSAL FACTORS
Example 2: Aviation Safety Causal Factor
Data-mining algorithms can mine the text of safety reports to obtain specific data that can be used to analyze causal factors.
For example, consider the following ASRS report (ACN 1031837):

“Departing IAH in a 737-800 at about 17,000 FT, 11 miles behind a 737-900 on the Junction departure over CUZZZ Intersection. Smooth air with wind on the nose bearing 275 degrees at 18 KTS. We were suddenly in moderate chop which lasted 4 or 5 seconds then stopped and then resumed for another 4 or 5 seconds with a significant amount of right rolling… I selected a max rate climb mode in the FMC in order to climb above the wake and flight path of the leading -900. We asked ATC for the type ahead of us and reported the wake encounter. The -900 was about 3,300 FT higher than we were.”

• Synopsis
  – B737-800 First Officer reported wake encounter from preceding B737-900 with resultant roll and moderate chop.

What causal factors can be identified from this narrative that could be applied to future predictive applications?

CAUSAL FACTORS
Example 2: Identifying Causal Factors

Indicators – Data Element
• Time of day
• Date range (month, day)
• Aircraft type
• Fix or coordinates
• Originating airport
• Destination airport
• Weather notes

Methods – Identifying Context and Causes
• We pinpoint the sequencing of flights on the IAH Junction Seven departure (at CUZZZ) during the specified wind conditions to find cases where a B737-900 at 20,000 feet precedes by 11 miles a B737-800 at 17,000 feet
• Search related data sets including ASDI (flight tracks, local traffic and congestion)
• Weather conditions for alternative causes (winds aloft, shear and convective activity)
• Airline specific information (repeated occurrence of event in aircraft type)

Big data gives us visibility into contextual factors even if specific data points are missing, such as a specific date or route. Big-data analytics gives us insight into unreported factors as well.

COMPARING OTP AND UTILIZATION
Example 3: Correlating Utilization and Delays
Daily Utilization vs.
On-time Departures, January 2013, System Operations
[Figure: two panels, “Narrowbodies By Day of Week” and “Widebodies by Day of Week”. X axis: hours of daily utilization. Y axis: on-time departure performance. Correlation Coefficient -0.53. Includes AA, AC, AS, B6, F9, FL, NK, UA, US, VX and WN. SOURCE: masFlight (masflight.com)]

UTILIZATION BY HUB
Example 4: Daily Utilization of Gates, by Hub
Big-data analysis of different carriers: daily departures per gate used
[Figure: six panels of Average Daily Deps per Gate Used: United Airlines Hubs, Alaska Airlines Hubs, American Hubs, JetBlue Focus, US Airways Hubs, AirTran Hubs. Airports shown include CLE, IAD, IAH, DEN, EWR, ORD, SFO, LAX, SJC, GEG, ANC, PDX, SAN, SEA, JFK, MIA, LGA, DFW, FLL, BOS, LGB, MCO, DCA, PHX, CLT, PHL, MKE, BWI and ATL. June 1 through August 31, 2012. Gates with minimum 1x daily use. SOURCE: masFlight (masflight.com)]

CONCLUSIONS
Conclusions for Big Data in Aviation
• Big data transforms operational and commercial problems that were practically unsolvable using discrete data and on-premises hardware
• Big data offers new insight into existing data by centralizing data acquisition and consolidation in the cloud and mining data sets efficiently
• There is a rich portfolio of information that can feed aviation data analytics
  – Flight position, schedules, airport/gate, weather and government data sets offer incredible insight into the underlying causes of aviation inefficiency
  – The excessive size of each set forces analysts to consider cloud-based architectures to store, link and mine the underlying information
  – When structured, validated and linked, these data sources become significantly more compelling for applied research than they are individually
• Today’s cloud-based technologies offer a solution

CONCLUSIONS
Conclusions: Our Approach
• masFlight’s data warehouse and analysis methods provide a valuable example for others attempting cloud-based analytics of aviation data sets
• masFlight’s hybrid architecture, consolidating secure data feeds in on-premises server installations and feeding structured data into the cloud for distribution, addresses the unique format, security and scale requirements of the industry
• masFlight’s method is well suited for airline performance review, competitive benchmarking, airport operations and schedule design, and has demonstrated value in addressing real-world problems in airline and airport operations as well as government applications
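
Illustrative sketch
The conclusions stress that data sources become more useful once they are structured, validated and linked. The Python sketch below shows one way such a link might look at toy scale: each flight is matched to the hourly weather observation at its origin airport, and departure delays are then averaged by weather condition. The flight IDs, field layout, conditions and delay values are all hypothetical and chosen for illustration only; this is not masFlight data and not a description of masFlight's implementation.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical flight records (illustrative only):
# (flight id, origin airport, scheduled departure UTC, departure delay in minutes)
FLIGHTS = [
    ("XX100", "IAD", datetime(2013, 2, 8, 15, 2), 4),
    ("XX200", "IAD", datetime(2013, 2, 8, 15, 40), 35),
    ("XX300", "IAH", datetime(2013, 2, 8, 16, 20), 0),
    ("XX400", "IAH", datetime(2013, 2, 8, 17, 5), 52),
]

# Hypothetical hourly weather observations (illustrative only):
# (airport, observation hour UTC, condition)
WEATHER = [
    ("IAD", datetime(2013, 2, 8, 15), "snow"),
    ("IAH", datetime(2013, 2, 8, 16), "clear"),
    ("IAH", datetime(2013, 2, 8, 17), "thunderstorm"),
]


def link_flights_to_weather(flights, weather):
    """Attach the origin weather observed in the scheduled departure hour."""
    by_station_hour = {(apt, hour): cond for apt, hour, cond in weather}
    linked = []
    for flight_id, origin, sched, delay in flights:
        # Truncate the scheduled time to the hour to form the join key.
        hour = sched.replace(minute=0, second=0, microsecond=0)
        condition = by_station_hour.get((origin, hour))  # None if no observation
        linked.append((flight_id, origin, condition, delay))
    return linked


def mean_delay_by_condition(linked):
    """Average departure delay in minutes for each weather condition."""
    totals = defaultdict(lambda: [0, 0])  # condition -> [sum of delays, flight count]
    for _, _, condition, delay in linked:
        totals[condition][0] += delay
        totals[condition][1] += 1
    return {cond: total / count for cond, (total, count) in totals.items()}


if __name__ == "__main__":
    linked = link_flights_to_weather(FLIGHTS, WEATHER)
    for condition, avg in mean_delay_by_condition(linked).items():
        print(condition, round(avg, 1))
```

At warehouse scale the same hour-keyed join would presumably run as a distributed job over cloud-hosted tables rather than as an in-memory dictionary lookup; the sketch only shows the shape of the link.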