Dynamic Prediction of Terminal-Area Severe Convective Weather Penetration

by Daniel Schonfeld
B.S., United States Air Force Academy (2013)

Submitted to the Sloan School of Management on May 8, 2015, in partial fulfillment of the requirements for the degree of Master of Science in Operations Research at the Massachusetts Institute of Technology.

© Massachusetts Institute of Technology 2015. All rights reserved.

Certified by: Hamsa Balakrishnan, Associate Professor of Aeronautics and Astronautics, Thesis Supervisor
Accepted by: Patrick Jaillet, Dugald C. Jackson Professor, Department of Electrical Engineering and Computer Science; Co-director, Operations Research Center

Abstract

Despite groundbreaking technology and revised operating procedures designed to improve the safety of air travel, numerous aviation accidents still occur every year. According to a recent report by the FAA's Aviation Weather Research Program, over 23% of these accidents are weather-related, typically taking place during the takeoff and landing phases. When pilots fly through severe convective weather, regardless of whether an accident occurs, they cause damage to the aircraft, increasing maintenance costs for airlines.
These concerns, coupled with the growing demand for air transportation, put an enormous amount of pressure on the existing air traffic control system. Moreover, the degree to which weather impacts airspace capacity, defined as the number of aircraft that can simultaneously fly within the terminal area, is not well understood. Understanding how weather impacts terminal area air traffic flows will be important for quantifying the effect that uncertainty in weather forecasting has on those flows, and for developing an optimal strategy to mitigate this effect. In this thesis, we formulate semi-dynamic models and employ Multinomial Logistic Regression, Classification and Regression Trees (CART), and Random Forests to accurately predict the severity of convective weather penetration by flights in several U.S. airport terminal areas. Our models perform consistently well when re-trained on each individual airport rather than using common models across airports. Random Forests achieve the lowest prediction error, with accuracies as high as 99%, false negative rates as low as 1%, and false positive rates as low as 3%. CART is the least sensitive to differences across airports, exhibiting very steady performance. We also identify weather-based features, particularly those describing the presence of fast-moving, severe convective weather within the projected trajectory of the flight, as the best predictors of future penetration.

Thesis Supervisor: Hamsa Balakrishnan
Title: Associate Professor of Aeronautics and Astronautics

Acknowledgments

I would like to thank my advisor, Professor Hamsa Balakrishnan, for her support and guidance throughout this project. Thanks to ICAT alum Yi-Hsin Lin for getting me up to speed with the data used in this thesis and for answering any and all questions about her research.
Thanks also to my fellow ORC students and friends, particularly Jack, Zeb, Kevin, and Virgile, for their help at various stages of the project and for listening to me ramble about weather penetration and technical support. Finally, I would like to thank my family for encouraging me to pursue a graduate degree, and for cheering me on throughout the process.

Contents

1 Introduction
  1.1 Background
    1.1.1 Convective Weather Avoidance Model (CWAM) and Weather Avoidance Fields (WAFs)
    1.1.2 Defining "Severe Convective Weather"
    1.1.3 Defining the "Terminal Area"
    1.1.4 Terminal Area Operations
  1.2 Thesis Contribution
  1.3 Thesis Organization
2 Overview of Data
  2.1 Weather Data
    2.1.1 Vertically Integrated Liquid (VIL)
    2.1.2 Echo Tops
    2.1.3 Case Days
  2.2 ETMS Database
    2.2.1 Verifying ETMS Trajectory Data for Model Dataset
  2.3 ASPM Database
3 Feature Identification
  3.1 Three Separate Models
  3.2 Dynamic Nature of Models
  3.3 Weather-Based Features
    3.3.1 Measuring Severity of Weather
    3.3.2 Measuring Movement of Weather
    3.3.3 Spatial Positioning of Weather
  3.4 In-Flight Features
    3.4.1 Time Spent Within Terminal Area
    3.4.2 Flight Behavior Within Terminal Area
    3.4.3 Positioning Within Terminal Area
  3.5 Behavior of Other Pilots in the Terminal Area
    3.5.1 Are Flights Ahead Penetrating Severe Convective Weather?
    3.5.2 Behavior of Flights in the Opposite Sequence
    3.5.3 Follow the Leader
    3.5.4 Feature Summary
4 Predictive Modeling of Pilot Behavior
  4.1 Defining the Dependent Variable
  4.2 Defining our Model Dataset
  4.3 Predictive Methods
    4.3.1 Multinomial Logistic Regression
    4.3.2 Classification and Regression Trees (CART)
    4.3.3 Random Forests
  4.4 Model Results
    4.4.1 Model 1 Results
    4.4.2 Model 2 Results
    4.4.3 Model 3 Results
    4.4.4 Summary of ORD Results
  4.5 Testing Our ORD Models on Other Airports
    4.5.1 Selecting Airport Pairings for Common Model Experiment
    4.5.2 Comparison of Results
    4.5.3 Insight from Pairings Experiment
    4.5.4 Summary of Results
  4.6 Sensitivity of Models
5 Case Studies and Pilot Experience
  5.1 Takeaways from Pilot Interviews
    5.1.1 Weather Radar and Forecasting Technology in the Cockpit
    5.1.2 Deviation from the Filed Flight Path
    5.1.3 Impact of Convective Weather on Departures
    5.1.4 Impact of Convective Weather on Arrivals
    5.1.5 Summary of Interview Takeaways
  5.2 Case Studies
    5.2.1 Theme 1: Pilots Try to Avoid Storm Cells
    5.2.2 Theme 2: Arrivals Have a Tougher "Go-of-It"
    5.2.3 Theme 3: Weather Is Unpredictable
    5.2.4 Case Study Wrap-Up
6 Conclusions and Future Work
  6.1 Thesis Summary and Conclusions
  6.2 Ideas for Future Work
    6.2.1 Expand Model Datasets
    6.2.2 Incorporate Weather Forecasts
    6.2.3 Additional Weather Features
    6.2.4 Taking an Alternative Approach: Human Factors

List of Figures

1.1 Example of WAF lookup table [15].
1.2 Map of Chicago O'Hare arrival fixes. O'Hare's TRACON is outlined in blue [14].
2.1 Example of a VIL image from June 13, 2008, at 0000Z [14].
2.2 Example of an ET image from June 13, 2008, at 0000Z [14].
3.1 Terminal area intervals for Model 1.
3.2 Distribution of Penetrations by Distance from ORD.
3.3 Projected flight trajectory looking 10 km out with swath width of 55 degrees. Angles and distances are not exactly to scale.
3.4 Calculation of weather-based feature that measures severity.
3.5 Distribution of penetration entries by severe weather coverage within the trajectory projection.
3.6 Calculation of weather-based features that measure movement.
3.7 Example of flanking metric calculations, with the number of red cells representing "flankcount" and the standard deviation of the degree values representing "FlankingValue".
3.8 Plot of Altitude vs. Distance from Takeoff for departure trajectories on July 2nd that penetrated severe convective weather.
3.9 Plot of Altitude vs. Distance from Takeoff for departure trajectories on July 2nd that took place during a weather impact but did not penetrate.
3.10 Plot of Altitude vs. Distance from Takeoff for departure trajectories on June 10th with no weather impact.
3.11 Plot of arrival flight executing a trombone maneuver at ORD. The red dots represent the trajectory points and the nose of the plane is represented by the red circle.
3.12 Example of Model 3 distance metric calculations, with point-to-point distance measured in kilometers representing "DistfromLanding" and angular distance measured in degrees representing "CircleDistfromLanding".
3.13 Example of "follow-the-leader" behavior by departures in the West sector of the ORD terminal area.
4.1 Distribution of VIL and Echo Top values for arrival and departure penetrations within the ORD terminal area.
4.2 Model 1 ORD CART Output
4.3 Model 2 ORD CART Output
4.4 Model 3 ORD CART Output
4.5 Map of Top 30 Penetration Airports
5.1 Example of avoidance behavior by arrivals in the Southwest sector of the ORD terminal area on July 9, 2008 at 001730Z.
5.2 Example of unexplained penetration behavior by a departure in the Northwest sector of the ORD terminal area on July 8, 2008 at 061730Z.
5.3 Both departures and arrivals affected by weather in the West sector of the ORD terminal area on July 2, 2008 at 222500Z.
5.4 Ground stop is issued due to weather covering the airport on July 2, 2008 at 223230Z.
5.5 Arrivals executing approach and landing maneuvers amidst severe weather in the Northwest sector of the ORD terminal area on August 22, 2008 at 173000Z.
5.6 Concentration of VIL level 6 pixels forms in the middle of the arrival approach path in the Northwest sector of the ORD terminal area on August 22, 2008 at 174000Z.
5.7 Arrivals begin to circumvent the storm cell as it moves west to east in order to maintain the approach path in the Northwest sector of the ORD terminal area on August 22, 2008 at 175730Z.

List of Tables

2.1 VIL Level Cutoffs
2.2 List of severe convective weather penetration periods within the ORD terminal area during summer 2008.
3.1 Summary of Model Features
4.1 VIL Point Assignment
4.2 Echo Top Point Assignment
4.3 Dependent Variable Cutoffs
4.4 Breakdown of frequencies of each severity level for Models 1, 2, and 3.
4.5 Model 1 Performance Results
4.6 Model 1 CART Feature Importance Values
4.7 Model 1 Random Forests Feature Importance Values
4.8 Model 2 Performance Results
4.9 Model 2 CART Feature Importance Values
4.10 Model 2 Random Forests Feature Importance Values
4.11 Model 3 Performance Results
4.12 Model 3 CART Feature Importance Values
4.13 Model 3 Random Forests Feature Importance Values
4.14 Airport Pairings
4.15 Comparison of Model 1 performance results for the re-training vs. airport pairing methods. For predictive methods, "MR" represents Multinomial Logistic Regression, "Tree" represents CART, and "RF" represents Random Forests. Regarding performance metrics, "Acc" represents the prediction accuracy, "FN 1" represents the first false negative rate defined, "FN 2" represents the second false negative rate, and "FP" represents the false positive rate.
4.16 Comparison of Model 2 performance results for the re-training vs. airport pairing methods.
4.17 Comparison of Model 3 performance results for the re-training vs. airport pairing methods.
4.18 Comparison of Model 1 performance results for variable subsets. For predictive methods, "MR" represents Multinomial Logistic Regression, "Tree" represents CART, and "RF" represents Random Forests. Regarding performance metrics, "Acc" is the proportion of interval entries for which the model predicts the correct severity level. "FN 1" examines how often the models predict a severity level lower than the actual severity level that occurred. "FN 2" examines how often the models predict a severity level of 0 when in fact the severity level was greater than 0. "FP" measures the proportion of interval entries for which the model predicts a severity level greater than 0 when in fact the severity level was 0.
4.19 Comparison of Model 2 performance results for variable subsets.
4.20 Comparison of Model 3 performance results for variable subsets.

1 Introduction

The increase in demand for air travel in the United States has resulted in an increase in congestion and delays in the National Airspace System (NAS), making the system more susceptible to weather disruptions.
Convective weather can close airports, degrade acceptance and departure capacity, hinder or stop ground operations, and generally make operations inefficient [13]. These disruptions are particularly impactful during summer months, when travel demand is high and convective weather activity (i.e., thunderstorms) is frequent across the United States [16]. Furthermore, the desire to sustain and meet air travel demand sometimes forces pilots into situations in which they are unable to avoid weather penetration, and controllers into situations in which they are unable to prevent it. When a pilot penetrates convective weather, intentionally or unintentionally, he/she not only puts those onboard in danger but also causes damage to the aircraft, resulting in lost revenue and excessive maintenance costs. Moreover, although it is clear that convective weather reduces airspace capacity and results in inefficient flying, the degree to which capacity is reduced and air traffic flows are affected as a result of weather is not clear. The re-routing of planes within the terminal area, initiated by either the pilot or air traffic controller, reduces airspace capacity and increases controller workload. Existing research into the types and severity of weather that cause re-routing/deviation typically treats all flights as equal [14], failing to differentiate between specific aircraft types, regional vs. international flights, departures vs. arrivals, and other flight categories. This thesis takes a different approach by exploring operational factors that may differentiate pilot behavior as well as weather-based factors that indicate future penetration within the terminal area.

1.1 Background

In this thesis, we rely heavily on research previously conducted by Yi-Hsin Lin at the MIT International Center for Air Transportation (ICAT). Lin's work built on the Convective Weather Avoidance Models (CWAM) developed at MIT Lincoln Laboratories.
The CWAMs produce Weather Avoidance Fields (WAFs), which identify the areas impenetrable to aircraft as a result of weather. The following subsections briefly describe the succession of CWAMs and WAFs, and then explain why we chose to define severe convective weather differently from Lin.

1.1.1 Convective Weather Avoidance Model (CWAM) and Weather Avoidance Fields (WAFs)

Rich DeLaura and his team at MIT Lincoln Laboratories developed the CWAM in response to increasing delays in the NAS caused by thunderstorms. The CWAM provides decision support tools for air traffic controllers to aid them in determining the impact of weather on existing traffic, devising a tactical response to mitigate the impacts of weather, predicting the effects of a particular routing strategy, and predicting updated arrival times for flights subjected to regions of convective weather [7]. The CWAM achieves this by analyzing planned and actual trajectories as well as a variety of weather indicators to predict enroute flight deviations due to convective weather. There are several versions of the CWAM. The first model (CWAM1), which focused on enroute flights, was developed in 2006 based on 800 trajectories over five different days in the Indianapolis (ZID) and Cleveland (ZOB) "super sectors" [7]. The study took into account the following three weather indicators, which will be discussed in more detail in Chapter 2: VIL (a measure of precipitation intensity), echo tops (storm height), and lightning strike counts. The second version (CWAM2), which also focused on enroute flights, was developed in 2008 and expanded the number of flights in the dataset to about 2,000 by adding the Washington, D.C. "super sector" [8]. It also considered additional weather factors, such as vertical storm structure and vertical and horizontal storm growth, to help decrease CWAM1's prediction error rate.
In 2010, the release of CWAM3 refined earlier models to improve detection of non-weather-related deviations, such as shortcuts, and further expanded the dataset to about 5,000 flights [5]. Most recently, MIT Lincoln Laboratories developed a version of the CWAM specific to low-level flights within the terminal area, which typically operate below the tops of convective weather and have slightly different operational constraints [5]. The terminal area CWAM is calibrated based on historical pilot behavior during weather encounters near the destination airport. All versions of the CWAM return the probability of deviation for a pilot encountering a particular set of weather conditions. This output, commonly referred to as the Weather Avoidance Field (WAF), is a probability lookup table: for any given echo top height and local VIL coverage, the model returns a probability of pilot deviation on a pixel-by-pixel basis [14, 15]. An example of this lookup table can be seen in Figure 1.1. The main advantage of using WAF over the raw VIL/ET metrics is that WAF eliminates much of the light rain that has little to no effect on aviation and accounts for the frequency of lightning strikes within each pixel [14].

Figure 1.1: Example of WAF lookup table [15].

The principal difference between the enroute and terminal area CWAMs is the primary determinant of pilot deviation. For the enroute airspace model, the difference in altitude between the flight and the echo top height served as the primary determinant [7]. In contrast, the fractional VIL coverage of Level 3 or above within a specified kernel of the flight trajectory served as the primary determinant of deviation in the terminal area model [5]. This makes sense because pilots can overfly weather during the enroute phase of flight, but due to the low altitudes necessary for descent/ascent, pilots typically cannot overfly weather in the terminal area.
Therefore, the WAF for enroute flights is fundamentally different from the WAF for ascending/descending flights.

1.1.2 Defining "Severe Convective Weather"

A question which naturally arises is how to quantitatively define "severe convective weather". Lin defined it as WAF levels of 80 or above. A WAF of 80 or above can be interpreted as a pilot having a greater than 80% chance of actually penetrating Level 3 VIL or above with flight altitude below the corresponding echo top value [14]. In contrast, we chose not to use WAF to classify severe convective weather, for a variety of reasons. First, WAF reflects the probability that a pilot will deviate rather than explicitly describing weather conditions as VIL and echo top do. Second, since some low-level VIL pixels will correspond to high WAFs simply because of proximity (within a 4 km kernel) to higher VIL levels, it is possible that pilots flying through high WAFs are not actually penetrating severe convective weather at all. At the other end of the spectrum, WAF is low in situations where pilots have no chance of deviating because they are surrounded by severe convective weather. Third, terminal WAF does not account for each individual flight's altitude relative to echo top height [14], so it is unclear whether the flight is above or below the storm. Lastly, WAF is not consistent: a VIL/ET combination that constitutes a WAF of 80 on one case day does not always translate to a WAF of 80 on another case day. For example, on August 4, 2008, the WAF model fails because none of the departures that we classify as severe convective weather penetrations have a WAF of 80 or above during their entire ascent. Thus, we will define severe convective weather as pixels with VIL Level 3 or above and flight altitude below echo top height. In our analysis, penetration occurs when pilots taking off from or landing at Chicago O'Hare International Airport (ORD) fly through severe convective weather.
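To make this definition concrete, it reduces to a simple per-pixel test: a trajectory point counts as a severe convective weather penetration when the underlying pixel holds VIL Level 3 or above and the aircraft is below the storm's echo top. The following sketch is illustrative only (the thesis does not publish code); the function name and units are our own assumptions.

```python
# Illustrative sketch of the severe-convective-weather penetration test
# defined above. A trajectory point penetrates if the underlying weather
# pixel is VIL Level 3 or above AND the aircraft is flying below the
# storm's echo top. Names and units (feet) are assumptions, not the
# thesis's actual implementation.

def is_severe_penetration(vil_level: int, echo_top_ft: float, altitude_ft: float) -> bool:
    """True if this trajectory point counts as a severe-weather penetration."""
    return vil_level >= 3 and altitude_ft < echo_top_ft

# Example: a departure at 8,000 ft under a VIL Level 4 cell topping at 35,000 ft.
assert is_severe_penetration(vil_level=4, echo_top_ft=35_000, altitude_ft=8_000)
# The same cell overflown at 37,000 ft would not count.
assert not is_severe_penetration(vil_level=4, echo_top_ft=35_000, altitude_ft=37_000)
```

Note that both conditions must hold: high VIL alone is not a penetration if the flight is above the echo top, which is precisely the altitude information the terminal WAF discards.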
This is an important distinction because there are several airports within or just outside the ORD terminal area, which we will thoroughly define in the following section. We do not examine penetrations that take place while departing from or landing at one of these nearby airports. Furthermore, we focus on ORD flights in our analysis because O'Hare is considered one of the worst airports in the U.S. for severe convective weather, with an average of 38 thunderstorm days per year [10]. Since the overwhelming majority of these thunderstorm days occur during the summer months, we will focus on flights that occur in June, July, and August. In the flight database used for this research, O'Hare accounts for the highest number of penetration flights as well as overall penetration entries (i.e., one flight can penetrate multiple times) of any airport in the continental U.S.

1.1.3 Defining the "Terminal Area"

This thesis focuses on pilot behavior within a region near the airport that we call the terminal area. The dimensions of this region are not precisely defined, varying from airport to airport. Most major airports have Terminal Radar Approach Control (TRACON) facilities, which serve the airspace immediately surrounding the airport. Using the TRACON boundary is one possible definition. However, TRACONs can vary in size and shape just like terminal areas, and a simpler, more general definition is desirable. To devise the best definition, we must consider what characteristics define the terminal area and why pilot behavior in this region might be different from pilot behavior during the enroute portion of the flight. The primary difference is that aircraft trajectories are far more constrained both vertically and horizontally within the terminal area.
Enroute flights can frequently overfly or deviate around convective weather, whereas a flight in its ascent or descent sequence will most likely be flying below the storm with limited ability to deviate, due to the high level of congestion and standardized trajectories in the terminal area. In this thesis, we define the ORD terminal area for arrivals to be the circle of radius 180 km around the airport, and for departures a circle of radius 150 km around the airport. The specific radii of the terminal areas for arrivals and departures were determined based on when arrivals begin their descent sequence and when departures end their ascent sequence. A 180 km radius was chosen for arrivals because this is the distance at which aircraft begin to continuously decrease their altitude. A 150 km radius was chosen for departures because this is the distance at which aircraft stop continuously increasing their altitude and begin leveling off for the enroute portion of the flight. Within these radii, flights are below all non-negligible storm echo tops and cannot overfly the convective weather. Using a circular region simplifies analysis by allowing the region to be broken up into eight equally spaced sectors, each corresponding to a cardinal or intermediate direction. At ORD, departures typically take off in the four cardinal direction sectors (North, South, East, West), and arrivals typically land in the four intermediate direction sectors (Northeast, Northwest, Southeast, Southwest).

1.1.4 Terminal Area Operations

The airspace contained within the terminal area can be subdivided into sectors controlled by individual air traffic controllers. Control of aircraft as they fly between these sectors is handed off between the air traffic controllers.
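The geometric decomposition described above — an arrival or departure radius around the airport, split into eight 45-degree sectors centered on the compass directions — can be sketched as follows. This is a minimal illustration under our own naming conventions, not code from the thesis; offsets are taken as east/north displacements from the airport in kilometers.

```python
import math

# Sketch of the circular terminal-area definition described above:
# a 180 km radius for arrivals, 150 km for departures, divided into
# eight equally spaced sectors (four cardinal + four intermediate).
# Function and constant names are illustrative assumptions.

ARRIVAL_RADIUS_KM = 180.0    # arrivals begin continuous descent here
DEPARTURE_RADIUS_KM = 150.0  # departures level off for the enroute phase here

# Sector labels in counterclockwise order starting from due east.
SECTORS = ["East", "Northeast", "North", "Northwest",
           "West", "Southwest", "South", "Southeast"]

def sector_of(dx_km: float, dy_km: float) -> str:
    """Map an offset from the airport (east, north in km) to one of the
    eight 45-degree sectors, each centered on a compass direction."""
    angle = math.degrees(math.atan2(dy_km, dx_km)) % 360.0  # 0 deg = due east
    return SECTORS[int(((angle + 22.5) % 360.0) // 45.0)]

def in_terminal_area(dx_km: float, dy_km: float, is_arrival: bool) -> bool:
    """True if the offset falls inside the arrival or departure radius."""
    radius = ARRIVAL_RADIUS_KM if is_arrival else DEPARTURE_RADIUS_KM
    return math.hypot(dx_km, dy_km) <= radius

# A point 120 km east and 120 km north (~170 km out) is inside the
# arrival terminal area but outside the departure terminal area.
assert in_terminal_area(120, 120, is_arrival=True)
assert not in_terminal_area(120, 120, is_arrival=False)
```

Centering each sector on a compass direction (rather than starting sector boundaries at the axes) matches the description of departures operating in the cardinal sectors and arrivals in the intermediate ones.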
The controllers are responsible for maintaining the separation of aircraft, through voice radio communication and aircraft position tracking, and for providing real-time information to aircraft, such as weather conditions near the airport [14]. Hence, pilots must obtain approval from these controllers to deviate from their filed flight plan. The airspace capacity of a sector can vary depending on the complexity of flow patterns within the sector or other conditions such as the presence of convective weather. Each flight must follow a Standard Instrument Departure (SID) when departing an airport and a Standard Terminal Arrival Route (STAR) when arriving [16]. These routes are specified by a sequence of waypoints, or fixes, along with rules governing the speed, heading, and altitude of aircraft at certain waypoints [16]. Each airport has multiple STARs and SIDs; the assignment of an aircraft to specific routes is a function of its origin (or destination), aircraft type, runway restrictions, and load balancing of runways [16]. One of the most common terminal area layouts, as seen at Chicago O'Hare, is the four cornerpost configuration, in which airspace is divided into four arrival sectors alternating with four departure sectors. Figure 1.2 contains a diagram of a four cornerpost configuration.

Figure 1.2: Map of Chicago O'Hare arrival fixes. O'Hare's TRACON is outlined in blue [14].

1.2 Thesis Contribution

As mentioned in the Introduction, this thesis relies heavily on the research completed by Yi-Hsin Lin during her time at MIT ICAT. In order to determine the best predictors of severe convective weather penetration, Lin employed models that predicted the maximum WAF penetrated by pilots of arriving aircraft during the descent phase [14]. Her models built upon the WAF model by incorporating operational factors, such as prior delays and existing congestion in the terminal airspace, in addition to weather-based factors.
Her best model accurately predicted penetration 90% of the time [14]. She found that weather-based and stream-based features were the most predictive of severe convective weather penetration. In particular, pilots were more likely to penetrate severe convective weather when they were part of a stream following other pilots that crossed through weather, and less likely when they were pathfinders leading a new stream [14]. This implies that re-routing around weather is still often based on events reported to air traffic controllers rather than preemptive action based on forecasts [14]. Furthermore, Lin found that pilots were more likely to penetrate severe convective weather closer to the airport because, intuitively, there is less ability to deviate from the flight path upon approach. Our model is fundamentally different from Lin's, first and foremost in its definition of severe convective weather, as we examine the raw weather metrics that make up WAF rather than WAF itself. Applying this revised definition, our model dynamically predicts the severity of pilot penetration at several checkpoints throughout the ascent and descent phases of departures and arrivals, respectively. Thus, the features (or predictors) in our model are more specific to the trajectory of the flight and its location relative to the airport. Our best models accurately predict the severity of penetration over 98% of the time, with a false negative rate of less than 1% and a false positive rate of less than 3%. Furthermore, the models reveal that the presence of severe convective weather within a specified distance of the projected trajectory of the flight is the best predictor of future penetration. Nonetheless, the behavior of other flights nearby is still moderately correlated with penetration behavior in the terminal area.
If nearby flights, whether departures or arrivals, are penetrating, it is highly likely that the flight of interest will also penetrate severe convective weather in future time steps. Additionally, we found that the longer flights spend within the terminal area, the more likely they are to penetrate severe weather. This may seem counterintuitive at first because flights that try to deviate around weather often experience longer flight times. On the other hand, spending more time in the air subjects the flight to more opportunities for severe weather penetration. Lastly, after running our models on several U.S. airports in addition to ORD, we found that our models consistently perform well when re-trained on each individual airport rather than using common models across airports. This held true even among airports in the same region of the continental U.S. 1.3 Thesis Organization Due to the limited number of time periods plagued by severe convective weather in our research data set, we will present a combination of predictive modeling, case studies, and pilot observation to better understand pilot behavior within the terminal area during severe convective weather scenarios. Chapter 2 discusses the data sources for this study. These include weather data from MIT Lincoln Laboratories, trajectory data from the Volpe National Transportation Center, and flight information maintained by the Federal Aviation Administration (FAA). Chapter 3 describes the features included in our predictive models for both arrivals and departures. These features can be classified into three categories: weather-based, in-flight, and features that capture the behavior of other pilots flying in the terminal area simultaneously. Chapter 4 describes the three types of predictive models used in this study and the results obtained from employing them.
Multinomial Logistic Regression was used for its ability to handle categorical dependent variables with more than two classes, as well as independent factor variables. Classification and Regression Trees (CART) were chosen due to their transparency and interpretability, applicability to relatively small sample sizes, and ability to weigh the relative importance of features. Random Forests were explored as an extension of CART with an extremely high level of randomness, facilitating the discovery of patterns not detected by CART. Chapter 5 presents the case studies explored and commonly observed themes of pilot behavior, highlighting scenarios in which pilots penetrated severe convective weather within the terminal area. Along with observations from pilots regarding terminal area procedures during severe convective weather, these case studies will help to verify, or sometimes disprove, model results in order to determine which features truly best predict penetration. Finally, Chapter 6 discusses the implications of this thesis and plans for future work. 2 Overview of Data Three main data sources were used in this thesis: weather data from MIT Lincoln Laboratories, trajectory data from the Enhanced Traffic Management System (ETMS) database provided by the Volpe National Transportation Center, and airport information from the FAA’s Aviation System Performance Metrics (ASPM) database. 2.1 Weather Data Prior to 2006, there existed a range of competing weather forecasts for aviation that led to a great deal of inconsistency and confusion in critical air traffic management situations [14]. In response to this highly inefficient and complicated system, the FAA’s Aviation Weather Research Program (AWRP) established the Consolidated Storm Prediction for Aviation (CoSPA) program to integrate the different forecast systems into one reliable, accurate system [3]. CoSPA features collaboration from a variety of different organizations including MIT, NCAR, NOAA, NWS, NASA, and DoD [3].
These organizations, collectively, aim to improve and integrate existing prototype products such as the Corridor Integrated Weather System (CIWS) and the Integrated Terminal Weather System (ITWS) [3]. MIT Lincoln Laboratory is at the forefront of this effort with its tactical (0-2 hr) storm forecasting and has successfully harnessed high-resolution, real-time weather data that accurately depicts the severity of storm cells [3]. Through the integration of multiple sensor data sources, including radar (NEXRAD, TDWR, Canadian), satellite imagery, and surface observations, Lincoln Labs has produced Vertically Integrated Liquid (VIL) and echo top maps of the entire continental United States with 1 × 1 km pixel resolution updated every 2.5 minutes [14]. These maps, or matrices of pixel-by-pixel values, will serve as the main weather inputs considered in this thesis and will be described in detail in the following sections. MIT Lincoln Laboratories provided us with weather maps for 14 days in summer 2008. They also provided us with the software scripts necessary for converting between latitude/longitude and matrix coordinates in each weather image using the Lambert azimuthal equal area projection. 2.1.1 Vertically Integrated Liquid (VIL) VIL is a measure of the amount of moisture in a vertical column of the atmosphere and is typically used to indicate areas experiencing heavy rain or hail as well as to identify potential supercells and downbursts [7]. VIL helps to avoid false alarms by extending to high altitudes and looking at storm cells as a whole instead of just their effect at low altitudes [14]. The raw pixel-by-pixel data provided by MIT Lincoln Laboratories measures VIL on a 0-255 scale. The VIL maps divide these raw values into six unequally spaced VIL levels to help with visual interpretation of the severity of convective weather cells.
The levels correspond to pilots’ perceived threat levels, with Level 3 representing a “yellow” threat level, Levels 4 and 5 representing “orange” threat levels, and Level 6 representing the most severe “red” threat level [14]. The precise VIL level cutoffs can be seen in Table 2.1: Table 2.1: VIL Level Cutoffs Figure 2.1: Example of a VIL image from June 13, 2008, at 0000Z [14]. Figure 2.1 provides a useful demonstration of the common types of convective weather encountered in different regions of the United States during the summer months. Moving from left to right, first we see scattered light rain across the Northwest, consisting mostly of Level 1 VIL that has little to no effect on aviation. Next, we see a long line of severe storm cells, associated with the strong organized convection of a cold front [19], developing across the Midwest. This line of cells, commonly referred to as a “frontal storm” [19], makes it hard for pilots flying into the wind to deviate and find pockets of airspace without weather. Lastly, we see scattered, isolated storm cells in the Southeast associated with summer convection [19]. These isolated, smaller cells, which comprise an “air mass storm”, have short lifecycles, making them very difficult to forecast [19]. Regardless of the arrangement of storm cells and their classification, they can be greatly disruptive to aviation. It is useful to consider these different types of storms in order to observe differences in the strategies of pilots in each scenario. 2.1.2 Echo Tops Although VIL gives us a good measure of the precipitation in a vertical column of the atmosphere, it tells us nothing about the height at which this precipitation begins or ends. Echo tops provide an estimate of the maximum height (in thousands of feet) of clouds containing convective weather [7] so that we know whether a flight is above or below the convective weather.
However, echo tops do not indicate the minimum height of storm cells, so we assume in this thesis that if a flight is below the echo top height, then it is subjected to the weather in that pixel. Pixels unaffected by weather maintain an echo top value of 0. Figure 2.2 shows an example echo tops image from the same timeframe as the VIL image in Figure 2.1. Comparing this image to Figure 2.1, it is apparent that echo top values are generally correlated with VIL values. For instance, areas with high VIL values will also have relatively high echo tops. This is because stronger convective cells typically extend higher into the atmosphere [7]. However, there exist rare severe storms that occur lower in the atmospheric column (e.g., a storm height of 25,000 feet) that enroute pilots at altitudes above 35,000 feet can easily overfly but that cause problems for aircraft in the ascent/descent phase of flight. Figure 2.2: Example of an ET image from June 13, 2008, at 0000Z [14]. 2.1.3 Case Days The date/time associated with each of the weather files given to us by Lincoln Laboratories is in UTC format, so we will use this convention throughout this thesis. To convert UTC to local time during the summer months, we subtract 5 hours from UTC. The weather files consist of 14 days in June, July, and August 2008 in which weather, at some point, impacted the ORD terminal area. However, in this thesis, we specifically focus on time periods within these days in which severe convective weather penetrations took place in the ORD terminal area. Thus, we examine periods in which severe convective weather was present long enough to affect air traffic flows in a negative manner. These specific periods are outlined in Table 2.2. Although there may be multi-hour gaps between penetrations within these periods, the table only includes one time period per day.
Nonetheless, large time periods that last almost a full day indicate that weather impacted the ORD airspace consistently for a significant amount of time.

Date    Time Period
06/12   15:10-22:25
06/13   00:30-23:50
06/14   00:10-04:00
06/25   11:10-20:10
07/02   18:40-23:59
07/03   00:00-07:00
07/07   10:20-17:55
07/08   00:50-23:59
07/09   00:00-04:40
07/10   17:50-23:59
07/11   00:00-04:00
08/04   11:20-20:35
08/05   00:20-04:00
08/22   10:50-21:55

Table 2.2: List of severe convective weather penetration periods within the ORD terminal area during summer 2008.

2.2 ETMS Database The Enhanced Traffic Management System (ETMS) was created by the FAA in order to monitor and react to traffic congestion in the United States using real-time trajectory data [1]. Air traffic controllers can leverage this data to direct aircraft flow and make decisions regarding Ground Delay Programs (GDP) and Ground Stop Programs (GS) [1]. The ETMS comprises a network of “hubsites” that transmit/receive trajectory data to/from several remote sites throughout the United States using the Aircraft Situation Display to Industry (ASDI) feed [1]. Trajectory data is automatically generated by transponders on aircraft and sent as real-time messages to the ASDI feed. In 2008, many aircraft, especially general aviation aircraft, were not outfitted with transponders [14], so some flights may be missing from the database. We have no way of identifying these missing flights. The Volpe National Transportation Center provided us with all ETMS data from 2008; this data consists of two main tables. The first table provides basic information about each flight such as arrival and departure airport, scheduled departure time, scheduled arrival time, actual arrival time, and aircraft type. ETMS also assigns a unique flight key to each flight so that flight information and trajectory information are linked. Several flights in the database have blank (NULL) fields, but these flights typically do not take place during the case periods.
The second table contains positional data for each flight at numerous points throughout the flight. This data includes transponder message time, latitude, and longitude. This table also includes a “smoothed” altitude that is derived using a moving average, as well as an average speed that is derived from position and time data. Messages are sent from flight transponders approximately once a minute during the enroute portion of the flight and approximately once every 15-20 seconds during the ascent/descent portions of the flight. 2.2.1 Verifying ETMS Trajectory Data for Model Dataset Like any large dataset, ETMS contains several data errors that required “cleansing” prior to analysis. Most of these errors were present in the trajectory data. One common error is the presence of gaps in trajectories because flight transponders may not transmit their position for long periods of time. Luckily, the majority of these gaps take place during the enroute portion of the flight, whereas we are concerned with trajectory gaps within the terminal area. When a flight is in its ascent or descent phase, it typically transmits its position every 15-20 seconds, if not more often. However, there are a handful of flights that have gaps much larger than 15-20 seconds. We chose to exclude flights from our model dataset that had a trajectory gap of greater than two minutes. Since the ascent/descent phases of flight last only 15-20 minutes, a gap larger than two minutes would represent a significant portion of our observation period, during which a weather penetration may be missed. Additionally, our weather files represent 2.5-minute time periods, so gaps longer than this would skip over an entire weather file. Another common error, which occurs within approximately 30% of flight trajectories in the database, is the presence of unreasonable altitude entries.
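The two-minute gap rule described above can be sketched as a simple filter (a minimal illustration; the function name and the use of message times in seconds are our own):

```python
def has_large_gap(message_times_s, max_gap_s=120.0):
    """Return True if any two consecutive transponder messages are more
    than max_gap_s apart (two minutes by default); such flights are
    excluded from the model dataset."""
    return any(t2 - t1 > max_gap_s
               for t1, t2 in zip(message_times_s, message_times_s[1:]))
```

For example, a flight reporting at 0, 15, and 200 seconds would be excluded because of the 185-second gap between its last two messages.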
For instance, consider a departure that is listed at an altitude of 2,500 feet at message two, 36,000 feet at message three, and then 4,200 feet at message four. It is apparent that there is an error in the altitude entry for message three. We corrected faulty altitude entries in different ways depending on their corresponding message number and whether they occurred within a departure vs. arrival trajectory. All departures in the dataset record an initial altitude of zero. However, it is obvious that the initial transponder message did not take place at zero altitude because the initial position is some kilometers away from the runway or the altitude jump from the first to second transponder message is unreasonably large. Thus, we extrapolate the actual initial altitude using equations 2.1-2.3:

t1new = (D1 / δ) · t2    (2.1)
t2new = t1new + t2    (2.2)
a1new = (a2 / t2new) · t1new    (2.3)

where
D1 = distance from takeoff to first message
δ = distance covered between the first and second messages
t1new = time between takeoff and first message
t2 = original time spent in the air from takeoff to second message
t2new = new extrapolated time spent in the air from takeoff to second message
a1new = new extrapolated altitude at first message
a2 = original altitude value at second message

From these equations, we can also extrapolate an estimate for the actual takeoff time of the flight, which will be useful in the predictive models. The takeoff runway for each flight is assigned based on distance from each runway and the heading of the plane at the first message entry. Turning to unreasonable altitude entries within arrival trajectories, if the faulty entry took place at the first message entry within the terminal area, we would set the altitude value to the next altitude entry plus 300 feet. This ensures the trajectory will maintain a relatively smooth descent.
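Under one reading of equations 2.1-2.3, with δ taken as the distance covered between the first and second messages, the takeoff extrapolation can be sketched as follows (function and variable names are ours, and the constant-speed, linear-climb assumptions are implied rather than stated in the text):

```python
def extrapolate_takeoff(d1_km, delta_km, t2_s, a2_ft):
    """Sketch of equations 2.1-2.3, assuming constant ground speed and a
    linear climb from takeoff through the second message.

    d1_km    -- distance from takeoff to the first message (D1)
    delta_km -- distance covered between the first and second messages (delta)
    t2_s     -- original time aloft through the second message (t2)
    a2_ft    -- altitude recorded at the second message (a2)
    """
    t1_new = d1_km / delta_km * t2_s   # eq. 2.1: time to reach message 1
    t2_new = t1_new + t2_s             # eq. 2.2: corrected time aloft
    a1_new = a2_ft * t1_new / t2_new   # eq. 2.3: altitude at message 1
    return t1_new, t2_new, a1_new
```

Subtracting t1new from the first message time also yields the estimated actual takeoff time used by the models.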
On the other hand, if the faulty entry took place at the last message entry of an arrival trajectory within the terminal area, we would set the altitude value to the previous altitude value minus 300 feet. Similarly, for altitude misprints at the last message entry of a departure trajectory within the terminal area, we set the altitude to the previous altitude entry plus 300 feet to ensure a smooth ascent. Finally, if the faulty altitude entry took place anywhere else in the trajectory, we set the new altitude to the average of the altitude entries surrounding it. For example, if the altitude misprint took place at the 4th message, we set its value to the average of the altitudes at the 3rd and 5th message entries. The initial and final positions of flights are also an area of concern within the data. On the departures side, as discussed above, the initial latitude/longitude of flights indicates a non-trivial distance away from ORD. Thus, the initial transponder message does not reflect the flight’s initial position at takeoff but rather some position later within the takeoff phase. Consequently, we limited the model dataset to departures with an initial position less than 10 km from the published latitude/longitude coordinates of ORD. Since the largest distance between takeoff runway and airport terminal at ORD is 3.25 km, any flight with an initial position more than 10 km away has most likely been in flight for a couple of minutes, representing a large gap in trajectory that we want to avoid. Moreover, it is very difficult to assign takeoff runways to flights with initial positions greater than 10 km from ORD. Similar to departures, the final latitude/longitude of arrivals sometimes indicates a relatively large distance away from the published latitude/longitude coordinates of ORD. Consequently, we limited the model dataset to arrival trajectories with a final position less than 10 km from ORD.
Beyond this 10 km threshold, it was very difficult to assign landing runways, and a large portion of the flight would be left out of examination. Furthermore, the arrival trajectories often contain multiple message entries at the same altitude at the end of the trajectory. For our analysis, particularly when assigning landing runways, we assumed that the first of these message entries at the same altitude was the actual point of touchdown. Overall, we removed approximately 5% of the flights, including both departures and arrivals, from the original model datasets due to the data errors discussed above. 2.3 ASPM Database The Aviation System Performance Metrics (ASPM) is an FAA-built database of the National Airspace System providing airport and individual flight data for 77 airports and 22 carriers in the US [2]. This thesis accessed ASPM’s online “Efficiency Reports” to extract important airport data. The reports provide useful information, reported in local time at 15-minute intervals, such as runway configuration, wind speed, wind direction, visibility, and ceiling. Some of these metrics were used in preliminary models but had very weak predictive power. The metrics are accurate only within a 10 km radius of ORD rather than across the entire terminal area we defined. Nonetheless, the runway configuration data enabled verification of the runway assignment algorithms used in the models and aided in the case studies presented in Chapter 5 by outlining airport operations in convective weather scenarios. 3 Feature Identification When we discuss “features” in this thesis, we are referring to the independent variables in our predictive models. Our models utilize three different types of features: weather-based, in-flight behavior, and behavior of other flights in the terminal area. The following sections will discuss these three feature types in detail while presenting the different model features that fall within these groups.
To make sense of these different model features, we must first discuss the three separate models examined in our research and their dynamic nature. 3.1 Three Separate Models We first separated models by arrivals vs. departures because their behavior is inherently different in the terminal area, especially close to the airport. From a top-level perspective, the difference between them is clear: departures ascend while arrivals descend. However, the differences in their behavior can be characterized more granularly. When departures take off, their horizontal and vertical movement is much less constrained than that of descending arrivals, making it easier to deviate around weather cells. Upon takeoff, departures are able to turn in a wide variety of directions right away, whereas arrivals on approach must follow specific landing patterns based on their assigned runway and the current wind direction. Farther out from approach, arrivals are assigned to cornerposts and their corresponding streams by air traffic controllers. In addition, there is pressure to get arrivals on the ground in a severe convective weather scenario. The situation at Chicago O’Hare adds to this pressure because diversion to its alternate airport, Midway (MDW), does not significantly improve weather conditions, as the two airports are less than 25 miles apart. Consequently, flights whose assigned runway is covered by severe convective weather may have no other option but to penetrate, as ATC rarely allows for runway reassignment. To pilots and airlines alike, this is a better option than continuing to fly around the terminal area burning fuel and subjecting the plane to further weather encounters, especially if storm cells are widespread throughout the terminal area. In contrast, on the departures side, ATC can slow or even halt departure operations altogether when severe weather is present because the aircraft are still on the ground.
We then separated the arrivals model into two separate models: one for the portion of the trajectory far from ORD and one for the portion of the trajectory close to ORD. The “close” model examines the trajectory starting 50 km out from O’Hare. We made this second model split because arrivals behave much differently within this 50 km radius from the airport. Within this boundary, arrivals are setting up their approach for landing. Based on a flight’s assigned runway, the direction it is coming from, and current wind conditions, a flight may actually have to fly past the airport in order to obtain the proper approach direction. Thus, within the 50 km radius, an arriving flight’s distance from the airport may fluctuate, making it harder to bin messages based on distance from the airport. We will discuss why this presents a problem for our original model design in the next section. In total, we now have three separate models: one for departure trajectories, one for arrival trajectories outside 50 km from the airport, and one for arrival trajectories within 50 km from the airport. We will refer to these three models as “Model 1”, “Model 2”, and “Model 3” respectively in this thesis. Furthermore, although these three models will use many of the same features, there are some features that are unique to a particular model. We will discuss the assignment of features to models in the following sections. 3.2 Dynamic Nature of Models One main distinction between our models and Lin’s models is their dynamic nature, meaning that predictions are updated based on updated feature values at multiple checkpoints throughout a flight’s trajectory within the terminal area. The checkpoint intervals for Model 1 and Model 2 are defined based on distance from the airport and become larger as you move farther away from the airport. Figure 3.1 shows the Model 1 checkpoint intervals. Figure 3.1: Terminal area intervals for Model 1. The red dot in Figure 3.1 represents the airport.
The concentric half-circles represent the checkpoint boundaries. Moving outwards, penetration predictions are made at each checkpoint for the portion of the flight that takes place between that checkpoint and the next half-circle. For instance, suppose a departing flight is currently 80 km from the airport. The prediction for whether or not the flight will penetrate severe convective weather between 80 km and the next checkpoint boundary (110 km) is made at the 80 km checkpoint. The feature values used for this prediction are calculated based on information obtained up to the current point in the flight. The final prediction in this scenario would take place at the 110 km checkpoint boundary because it is the beginning point of the last interval in the terminal area. The Model 1 and 2 checkpoint intervals were constructed based on the distribution of penetration entries by distance from ORD, shown in Figure 3.2. Figure 3.2: Distribution of Penetrations by Distance from ORD Arrival and departure penetrations follow a similar distribution, with most penetrations taking place within 30 km of ORD. Some penetrations take place between 40 and 100 km from ORD, and then very few take place beyond this point.¹ Our prediction intervals reflect this distribution, with more frequent, shorter intervals closer to the airport and less frequent, larger intervals farther from ORD. The intervals become larger as the flight gets farther away from the airport because behavior becomes more consistent at these farther distances and because the time between transponder messages is longer.² Specifically, for Model 1, predictions are made for the following intervals: between takeoff and 10 km, between 10 km and 20 km, between 20 km and 30 km, between 30 km and 50 km, between 50 km and 80 km, between 80 km and 110 km, and between 110 km and 150 km.
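The Model 1 intervals listed above can be encoded directly; a minimal sketch (the constant and function names are our own):

```python
# Model 1 checkpoint boundaries, in km from the airport; each prediction
# made at a boundary covers the flight up to the next boundary.
MODEL1_BOUNDARIES_KM = [0, 10, 20, 30, 50, 80, 110, 150]

def prediction_interval(distance_km):
    """Return the (start_km, end_km) interval containing a departure at
    distance_km, or None once it is past the 150 km terminal-area boundary."""
    bounds = MODEL1_BOUNDARIES_KM
    for start, end in zip(bounds, bounds[1:]):
        if start <= distance_km < end:
            return (start, end)
    return None
```

For example, the departing flight at 80 km in the scenario above falls in the (80, 110) interval, so the prediction made at the 80 km checkpoint covers that stretch.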
Model 2 predictions work almost identically, except that the predictions are made in the opposite direction moving towards the airport, starting at 180 km out and ending with a prediction between 80 and 50 km. Model 3’s prediction intervals are based on time spent within 50 km of the airport instead of distance from the airport. Model 3 is designed this way because arrivals’ distance from the airport often fluctuates within 50 km due to varying approach paths such as “tromboning”, holding patterns, and other maneuvers that make it impossible to create consistent checkpoint boundaries. For Model 3, predictions are made every 2.5 minutes of flight. Thus, when an arrival first enters the 50 km radius, the model predicts whether it will penetrate severe convective weather within the next 2.5 minutes of flight. This process continues until the flight lands. This time-based prediction approach is arguably much more difficult than the distance-based prediction approach utilized in Models 1 and 2 for two main reasons: 1) a lot can happen in 2.5 minutes of flight, and 2) flight behavior is already more uncertain close to the airport. Flights that spend more than 25 minutes within the 50 km radius before landing were excluded from the model dataset. Such cases were very rare and exhibited flight behavior that often did not make sense, suggesting errors in the recorded trajectory. In reality, the structure of our models may be better described as semi-dynamic than dynamic because we group information from multiple trajectory message entries into a finite number of bins based on checkpoint boundaries. A purely dynamic model would, in contrast, make a prediction at each message time along the trajectory.

¹ The plot shows that there are no departure penetrations past 150 km because that is the boundary for our terminal area with respect to departure flights. This boundary is 180 km for arriving flights.
² Every 1-2 minutes, as opposed to every 15-20 seconds.
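Model 3’s 2.5-minute prediction windows and the 25-minute exclusion rule described above can be sketched as follows (a minimal illustration; names are ours):

```python
WINDOW_S = 150.0    # one 2.5-minute prediction window, in seconds
MAX_WINDOWS = 10    # 25 minutes inside 50 km; longer flights are excluded

def model3_window(seconds_inside_50km):
    """Index of the 2.5-minute window an arrival is currently in, counted
    from its entry into the 50 km radius; None once past the 25-minute cap."""
    idx = int(seconds_inside_50km // WINDOW_S)
    return idx if idx < MAX_WINDOWS else None
```

Binning by elapsed time rather than distance sidesteps the fluctuating distances caused by tromboning and holding patterns.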
We chose to use semi-dynamic models rather than purely dynamic models because semi-dynamic models are better able to capture overall flight behavior and traffic flows during weather impacts. Additionally, purely dynamic models would require predictions every 15-20 seconds during ascent/descent. This would not only require orders of magnitude more computing power but could also result in “information overload” for air traffic controllers using our prediction tool. 3.3 Weather-Based Features This group of features exploits the weather data discussed in Chapter 2 to characterize the current weather scenario that a flight is experiencing. For instance, if convective weather is on top of ORD and its surrounding area out to 10 km, one would assume that many flights will penetrate upon takeoff or landing. However, our model’s weather-based features do not aim to characterize the weather scenario within the ORD terminal area as a whole, but rather focus on weather in relation to a flight’s projected trajectory. This fundamental difference from Lin’s weather-based features makes sense considering the dynamic nature of our models and our interest in specific segments of a flight’s trajectory during ascent/descent. An example of the projection from which we calculate our weather-based features is shown in Figure 3.3. The matrix shown within the projection represents the VIL values corresponding to individual pixels within the projection. Each of these VIL pixel values describes a vertical column extending through the atmosphere. All of the pixels that are within the triangle formed by the blue lines, including those intersected by the triangle’s edges, are considered part of the trajectory projection. The length and swath width of the projection may vary based on the current prediction checkpoint boundary. Values of 10 km and 65 degrees are typical of projections for smaller intervals closer to the airport.
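As an illustration, the following sketch tests whether a pixel center falls inside such a projection, using a flat local frame in km (the geometry and names are our own simplification; the thesis works in the weather grid’s Lambert projection coordinates):

```python
import math

def in_projection(px_km, py_km, plane_x, plane_y, heading_deg,
                  length_km=10.0, swath_deg=65.0):
    """True if a pixel center lies within length_km of the aircraft and
    within half the swath width of its heading (degrees clockwise from
    north). The 10 km and 65 degree defaults follow the text's example
    values; both vary with the prediction interval."""
    dx, dy = px_km - plane_x, py_km - plane_y
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return True
    if dist > length_km:
        return False
    # bearing from the aircraft to the pixel, clockwise from north
    bearing = math.degrees(math.atan2(dx, dy)) % 360.0
    # smallest angular offset between bearing and heading
    offset = abs((bearing - heading_deg + 180.0) % 360.0 - 180.0)
    return offset <= swath_deg / 2.0
```

A pixel 5 km dead ahead of a northbound aircraft passes this test, while one 5 km directly abeam fails, since 90 degrees exceeds half the 65-degree swath.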
The length of projections is usually equal to the interval distance, and the swath width is adjusted to account for potential horizontal movement of the flight within an interval. Figure 3.3: Projected flight trajectory looking 10 km out with swath width of 55 degrees. Angles and distances are not exactly to scale. The weather-based features are calculated from the VIL values associated with the group of pixels within the projection. Echo top values, however, are not considered in our weather-based features. If an echo top value exists for a pixel within the terminal area, which requires a VIL level ≥ 3, most observed flights will be below this echo top because they are below cruising altitudes. Thus, thorough examination and manipulation of these values would be redundant. Nevertheless, these echo top values still serve as limiting criteria for penetration classification. Furthermore, the weather-based features can be divided into three distinct groups: one group measuring the severity of weather within the projection, one group measuring the movement of weather within the projection, and a third group measuring the spatial positioning of weather cells within the current prediction interval. We describe the features within each group in the sections below. Every one of these features is included in Models 1, 2, and 3. 3.3.1 Measuring Severity of Weather This subset of weather-based features aims to capture the strength of the storm cells, if present, within the trajectory projection. Intuitively, if a flight’s projection contains a large number of strong storm cells, there is a good chance that the flight will penetrate in that interval, unless it deviates around the weather or the weather moves. We use two different features to measure the severity of weather within a flight’s projection, with both features measuring the proportion of cells within the projection that have a VIL level greater than or equal to 3 (VIL value ≥ 133).
The features differ in the set of VIL pixels they examine: the first feature (“BadWeatherPercentageBefore”) evaluates the pixels within the projection from the prior weather data time period, whereas the second feature (“BadWeatherPercentageNow”) evaluates the pixels from the current weather data time period. Our weather data files are broken up into 2.5-minute periods, so if the current time is 10:30:00 UTC, then “BadWeatherPercentageBefore” would look at pixel data from the 10:27:30 weather file, and “BadWeatherPercentageNow” would look at pixel data from the 10:30:00 weather file. An example calculation, independent of time, of these features for a 15-pixel rectangular projection can be seen in Figure 3.4, with severe VIL values in red. Figure 3.4: Calculation of weather-based feature that measures severity. The median and average values for these features given a penetration are around 15% and 30% respectively, compared to 0% and 2% for non-penetrations. However, Figure 3.5 reveals that the large majority of penetrations take place when the values for these features are below 10%. Thus, weather coverage does not have to be particularly overwhelming to give pilots trouble. Figure 3.5: Distribution of penetration entries by severe weather coverage within the trajectory projection. 3.3.2 Measuring Movement of Weather This subset of weather-based features aims to capture the movement of weather within the projection by examining how weather conditions change from one time period to the next. The first feature (“PercentBadCellsDiffVIL”) calculates the difference, between time periods, in the proportion of pixels within the projection that have VIL level ≥ 3. The second feature (“SeverityDiffVIL”) calculates the difference in the overall severity score of the projection between time periods.
To calculate this overall severity score, each pixel within the projection is first assigned a VIL level based on its raw VIL value using Figure 2.1. The overall severity score is simply the sum of the VIL levels for all of the pixels within the projection. The third feature (“PercentWorseningVIL”) calculates the proportion of pixels within the projection that increase in VIL value between time periods.

The next two features are very similar in the way in which they are calculated, but their output describes the weather situation within the projection in very different manners. Both features calculate the sum of the differences in VIL value between corresponding projection pixels across time periods. The distinction between them is that one (“CellDiffVIL”) calculates the raw pixel-to-pixel difference, including sign, whereas the other (“CellDiffVILAbs”) calculates the absolute difference before summing up all of the pixel differences. Figure 3.6 displays the calculation processes for these two features using the same initial set of pixel values extracted from the middle of the projection region in Figure 3.3.

Figure 3.6: Calculation of weather-based features that measure movement.

To obtain the final pixel value differences on the far right, we subtract the Time 1 pixel values from the Time 2 pixel values. In this example, the VIL pixel values stay the same between Time 1 and Time 2 but switch positions within the square projection. Thus, the sum of the pixel values is 12 for both time periods. However, the final feature value, or the sum of the differences, is 0 for the first feature and 12 for the second feature. Hence the first feature captures the fact that the values and their sum within the projection have not changed, whereas the second feature recognizes the movement of the values within the projection.
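Under the definitions above, the two pixel-difference features can be sketched as below. The example grid values are hypothetical, chosen (like the Figure 3.6 example) so that the pixel values merely swap positions between time periods; the function names are our own.

```python
# Illustrative sketch: raw vs. absolute pixel-to-pixel VIL differences
# between two time periods for the same projection pixels.
def cell_diff_vil(t1, t2):
    # Signed sum: captures net change in weather strength.
    return sum(b - a for a, b in zip(t1, t2))

def cell_diff_vil_abs(t1, t2):
    # Absolute sum: also captures movement of cells within the projection.
    return sum(abs(b - a) for a, b in zip(t1, t2))

# Hypothetical 2x2 projection, flattened: same values, different positions.
time1 = [3, 1, 5, 3]   # sums to 12
time2 = [5, 3, 3, 1]   # also sums to 12

print(cell_diff_vil(time1, time2))      # 0 -> no net strength change
print(cell_diff_vil_abs(time1, time2))  # 8 -> movement detected
```

As in the thesis's example, the signed feature returns 0 when only positions change, while the absolute feature returns a positive value that flags the movement.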
Consequently, we hypothesize that the absolute difference feature (“CellDiffVILAbs”) captures weather movement better than the raw difference feature (“CellDiffVIL”). However, it is worth noting that the raw difference feature describes how the strength of weather within the projection changes between time periods, augmenting the severity subset described in the previous section. The final value used for each weather movement feature within the predictive models is the average of the feature values over the two time period differences preceding the current time, in order to best capture the situation at hand while only exploiting information that is already known at the time of prediction. For instance, if the current time is 10:30:00 UTC, we would average the feature values corresponding to the 10:22:30 and 10:25:00 weather file pairing as well as the feature values corresponding to the 10:25:00 and 10:27:30 weather file pairing.

3.3.3 Spatial Positioning of Weather

This last subset of weather-based features aims to capture the spatial positioning of weather cells ahead of the flight being observed. These features do not constrain the trajectory projection with a swath width. Instead they consider all pixels in front of the plane within the specified interval distance. The first feature (“flankcount”) counts the number of pixels in this widespread projection that have VIL level ≥ 3. The count is weighted based on the distance of each pixel within the projection from the plane’s current position, with weight decreasing incrementally as distance from the plane increases. For example, a severe weather pixel located 10 km from the plane’s current position is worth more than a pixel located 20 km away. This weighting scheme makes sense because storm cells closer to the plane’s current position pose more immediate danger and are less likely to migrate out of the projection before the plane encounters them.
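A distance-weighted count of this kind might be sketched as follows. The inverse-distance weighting is an assumption on our part; the thesis states only that weight decreases incrementally as distance from the plane grows, without specifying the exact scheme.

```python
# Illustrative sketch of a distance-weighted severe-pixel count
# ("flankcount"-style). The 1/d weighting is an assumed scheme; the thesis
# states only that weight decreases as distance from the aircraft grows.
def weighted_flank_count(severe_pixel_distances_km):
    """Distances (km) from the aircraft to each severe (VIL level >= 3) pixel."""
    return sum(1.0 / d for d in severe_pixel_distances_km if d > 0)

# Under this scheme, a severe pixel 10 km away contributes twice as much
# as one 20 km away, matching the ordering described in the text.
print(round(weighted_flank_count([10.0, 20.0]), 3))  # 0.15
```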
The second feature (“FlankingValue”) is a bit more complicated. It sets the nose of the plane as the common center of several concentric half circles whose radii depend on the distance of a severe weather pixel from the center. We then calculate the degree value (between 0 and 180) of each severe weather pixel based on its position on one of the concentric half circles. The feature value is the standard deviation of the degree values of the storm cells. By using the standard deviation, we capture the span of the storm cells across the plane’s nose. If there are no storm cells within the interval ahead of the flight, then the feature value is 0. A large standard deviation hypothetically reflects that there are storm cells all across the plane’s direction of movement, making it harder to deviate around the storm cells or find pockets without weather. Figure 3.7 displays how we identify the degree values for this second flanking feature.

Figure 3.7: Example of flanking metric calculations with the number of red cells representing “flankcount” and the standard deviation of the degree values representing “FlankingValue”.

The red pixels contain severe VIL values, so we include them in our feature calculations. The radii of the concentric half circles are based on each severe weather pixel’s distance from the current flight position. The radii values matter more for the “flankcount” weighting scheme than for the “FlankingValue” feature. In this example, the “FlankingValue” feature value would be the standard deviation of 160, 100, and 25: 67.64. This value represents a fairly large spread of severe weather cells across the plane’s nose. However, it is apparent that this feature does not capture whether there is a concentration of severe weather cells in one particular region of the relaxed projection. Possible improvements of this feature will be discussed in the “Future Work” section in Chapter 6.
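Once the degree values have been extracted from the half-circle geometry, the “FlankingValue” computation reduces to a sample standard deviation, which reproduces the worked example above. The function name and the guard for fewer than two cells are our own assumptions.

```python
import statistics

# Illustrative sketch: "FlankingValue" as the sample standard deviation of
# the degree positions (0-180) of severe weather pixels across the nose.
def flanking_value(degree_positions):
    if len(degree_positions) < 2:
        return 0.0  # no spread: zero or one severe cell ahead of the flight
    return statistics.stdev(degree_positions)

# The worked example above: severe pixels at 160, 100, and 25 degrees.
print(round(flanking_value([160, 100, 25]), 2))  # 67.64
```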
Since “flankcount” values differ based on the size of the trajectory projection, we will not present their summary statistics. Nonetheless, it is apparent that penetrations consistently encounter trajectory projections with a much larger “flankcount” than non-penetrations. With regard to “FlankingValue”, the median and average values given a penetration are respectively around 18 and 21, compared to values of 0 and 2 for non-penetrations. Thus, the typical trajectory projection of a penetration not only contains significantly more severe weather pixels, but these pixels are more spread out across the nose of the plane. Nonetheless, it is worth noting that the maximum “FlankingValue” for penetrations was approximately 40, so the severe weather pixels are still relatively concentrated within the projection.

3.4 In-Flight Features

The next set of features we will discuss deals with the behavior of the flight of interest within the terminal area and how this behavior may be correlated with severe convective weather penetration. These in-flight features are broken down into three subsections below. In addition to features that deal with the behavior of the flight of interest, our models consider standard characteristics about the flight, for instance, whether it is an international flight (“intl”) and/or a cargo flight (“car”). We also note whether the flight’s ascent/descent sequence takes place during darkness (“night”), between the hours of 21:00:00 and 06:00:00 local time. Although delays hypothetically fall under in-flight features, we did not include them in our arrivals models based on their weak predictive power in Lin’s models, which exclusively examined arrivals. Exclusion of delay features makes sense on the departures side as well because most departures during weather impacts are delayed. The fact that only a small percentage of the total number of departures during these weather-impacted periods are penetrating supports this exclusion.
3.4.1 Time Spent Within Terminal Area

The first subset of in-flight features keeps track of the time that a flight has spent in the terminal area up to the current prediction checkpoint. Models 1 and 2 consider total time spent in the terminal area (“TimeinTerm”), from the time departures take off or arrivals enter the terminal area. Model 3 considers this metric as well as the time spent within 50 km of the airport (“TimeWithin50km”), reflecting the different approach maneuvers performed by arrivals prior to landing. Rare flights that spent excessive amounts of time in the terminal area, specifically 25 minutes within 50 km or 50 minutes overall, were excluded from the model datasets due to their unexplained behavior. One may assume that longer flight times within the terminal area would be less correlated with weather penetration since they might reflect deviation to avoid weather. However, our case studies indicate that pilots are typically unable to deviate around weather completely because blockage is too extensive. This is especially the case when weather is close to the airport and range of motion is limited. Hence longer flight times within the terminal area do not always translate to pilot deviation. Overall, we included these features in our models due to their strong performance in Lin’s models and the notion that longer flying times translate to more exposure to weather.

3.4.2 Flight Behavior Within Terminal Area

The next subset of in-flight features aims to capture certain behavioral tendencies of flights in their ascent/descent phase. The first feature within this subset (“LevelOffORDecreasing”) applies to all three models, observing whether or not flights are leveling off within their trajectory. Although this behavior is systematic within the terminal area based on standardized routes, we found that flights frequently penetrate while leveling off.
This was even the case for departures, which are not commonly restricted to step-like paths like arrivals. Furthermore, flights on bad-weather days tend to level off more often than flights on days without weather. Figures 3.8-3.10 exhibit this behavior for departure flights, with leveling off maneuvers typically occurring at 5000 ft altitude beginning 15 km from the airport. Each individual trajectory curve within the plots represents a different flight.

Figure 3.8: Plot of Altitude vs. Distance from Takeoff for departure trajectories on July 2nd that penetrated severe convective weather.

The July 2nd plots (Figures 3.8 and 3.9) exhibit several cases of leveling off behavior, whereas the occurrences of level offs in the June 10th plot are much fewer.

Figure 3.9: Plot of Altitude vs. Distance from Takeoff for departure trajectories on July 2nd that took place during a weather impact but did not penetrate.

Figure 3.10: Plot of Altitude vs. Distance from Takeoff for departure trajectories on June 10th with no weather impact.

Furthermore, the proportion of penetration trajectories that show leveling off behavior is especially high, supporting our inclusion of this feature in our models. Nonetheless, it should be noted that not all case days exhibited the same high frequency of leveling off behavior. In fact, the summary statistics for this feature over all case days reveal that the proportion of penetration entries that occur while leveling off is the same as for non-penetrations. The percentages of flights that level off for Models 1, 2, and 3 are as follows: 3%, 23%, and 38%. The fact that arrivals exhibit level off behavior more often is not surprising, as they typically follow a step-like descent in contrast to the more constant ascent of departures. To analytically determine if a flight is leveling off at each prediction checkpoint, we calculate the difference between the flight’s current altitude entry and previous altitude entry.
If the absolute value of this difference is less than or equal to 100 feet, we consider the flight to be leveling off. The next feature in this subset (“Tromboning”) applies only to Model 3. “Tromboning” refers to the shape of the approach maneuver that arrivals often perform prior to landing the aircraft. During this maneuver, flights pass by the airport before circling around to line up for approach. Figure 3.11 shows an arriving flight at ORD performing a “trombone” maneuver while avoiding weather penetration.

Figure 3.11: Plot of arrival flight executing a trombone maneuver at ORD. The red dots represent the trajectory points and the nose of the plane is represented by the red circle.

To avoid viewing each arrival trajectory in our dataset, we devised an analytical way to determine whether or not a given arrival is in the midst of a trombone maneuver at the current prediction checkpoint. This analytical method classifies an arrival as tromboning when it is continuously decreasing its distance from the airport and then suddenly begins to continuously increase its distance from the airport. At this point, the flight has passed by the airport in order to set up its approach. Based on this analytical approach, go-arounds would also be recorded as tromboning behavior. The reasoning behind the use of tromboning as a predictor variable in Model 3 is that this maneuver exposes the flight to more opportunities to penetrate severe weather. Although it is possible that this feature may be correlated with the “TimeinTerm” and “TimeWithin50km” variables discussed in the above section, “Tromboning” captures pilot behavior at a more granular level by taking positional data into account. Furthermore, arriving flights perform tromboning maneuvers regardless of the presence of convective weather due to wind restrictions on approach and landing. Hence the feature is not specific to severe weather operating procedures.
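The two behavioral checks just described can be sketched analytically as follows. The 100-foot level-off threshold is from the thesis; the function names, the trajectory representation, and the simplification of "continuously decreasing then continuously increasing" to a local-minimum test are our own assumptions.

```python
# Illustrative sketches of the two behavioral checks described above.

def is_leveling_off(current_alt_ft, previous_alt_ft):
    # Level-off: consecutive altitude entries within 100 feet of each other.
    return abs(current_alt_ft - previous_alt_ft) <= 100

def is_tromboning(distances_from_airport_km):
    # Trombone (simplified): distance to the airport stops decreasing and
    # begins to increase, i.e. the flight has passed the airport to set up
    # its approach. Go-arounds would also trigger this test.
    d = distances_from_airport_km
    for i in range(1, len(d) - 1):
        if d[i] < d[i - 1] and d[i + 1] > d[i]:
            return True
    return False

print(is_leveling_off(5050, 5000))          # True: 50 ft change
print(is_tromboning([40, 30, 20, 25, 30]))  # True: closed in, then moved away
print(is_tromboning([40, 30, 20, 10, 5]))   # False: steady approach
```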
Lastly, if a high proportion of tromboning flights are penetrating severe weather, this may suggest that the runway configuration should be altered to better protect arrivals by altering approach routes. The summary statistics for this feature reveal that the proportion of penetration entries that occur while tromboning is the same as for non-penetrations. Twelve percent of flights in the Model 3 dataset perform the trombone maneuver. The next feature in this subset (“OtherTermArea”) observes whether or not flights are flying within other airport terminal areas during ascent/descent. There are several airports within the ORD terminal area, some regional and some major, like Chicago Midway (MDW). O’Hare flights that pass through other terminal areas may encounter congestion and limited ability to deviate around weather. Since terminal areas often overlap, especially in the Chicago metropolitan area, we consider a flight to be in another terminal area if it is within 30 km of another airport. This occurs relatively often in our model dataset, so we included the feature in our Model 1 and 2 predictive models. Since all flights within the defined ORD terminal area are well below cruising altitudes, we do not include an altitude criterion in addition to distance for this feature. The Model 2 dataset contains the most interesting summary statistics for this feature, with 20% of penetrations occurring while flying within other airport terminal areas compared to 12% of non-penetrations. The final feature in this subset (“PenetrateAlready”) captures past penetration behavior, if any, of the current flight by recording the severity score (which will be defined in Chapter 4) from the previous prediction interval. This feature is useful because it reveals the weather situation within the terminal area with respect to the flight’s current trajectory.
One would assume that a pilot who has already penetrated severe weather would try harder to avoid a second penetration, but this is not always possible. Approximately 50% of penetration entries occur after the flight has already penetrated at least once, meaning approximately 50% of flights that penetrate do so multiple times during the ascent/descent sequence. Thus, the “PenetrateAlready” feature is a strong indicator of future penetration.

3.4.3 Positioning Within Terminal Area

The final subset of in-flight features applies only to the arrivals models: Models 2 and 3. The first feature in this subset (“FlightDistance”) measures the total distance the flight has traveled from takeoff. The reasoning behind this feature is that flights that have traveled a long distance most likely took off without regard to the weather conditions at the arrival airport because so much can change during the long flight. The pilots of these flights are also hypothetically more subject to fatigue due to longer flying times, which could affect their decision-making when faced with severe weather in the terminal area. In contrast, flights with a relatively small “FlightDistance” most likely received approval to take off given current arrival airport weather conditions. If conditions at the arrival airport are severe, air traffic controllers often do not give the corresponding departing flights permission to take off. Thus, through this feature, we aimed to capture differences in descent routes as well as intricacies within pilot behavior based on total distance traveled and currency of knowledge regarding terminal area weather conditions. One may argue that “FlightDistance” will be highly correlated with the “TimeinTerm” variables because “FlightDistance” logically increases as “TimeinTerm” increases. However, the disparity in distances traveled by different flights prior to reaching the ORD terminal area boundary prevents this correlation.
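The “as the crow flies” distances used by several features in this section can be computed with the standard haversine formula, sketched below. This is a conventional great-circle implementation under an assumed spherical-Earth radius; the thesis does not specify which distance computation it actually used.

```python
import math

# Illustrative sketch: great-circle ("as the crow flies") distance between
# two latitude/longitude points, usable for features such as the straight-line
# distance from takeoff or from the touchdown point. Standard haversine
# formula; not necessarily the thesis's implementation.
def haversine_km(lat1, lon1, lat2, lon2):
    r = 6371.0  # mean Earth radius in km (assumed)
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# One degree of latitude spans roughly 111 km (hypothetical points near ORD).
print(round(haversine_km(41.0, -87.9, 42.0, -87.9), 1))  # 111.2
```

The angular-distance features described next would additionally require bearing geometry around the airport, which this sketch does not attempt.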
It is worth noting that the “FlightDistance” feature is calculated as a straight line from takeoff to the flight’s current position rather than a point-to-point cumulative distance. The next two features within this subset (“DistfromLanding” and “CircleDistfromLanding”) only apply to Model 3 and aim to capture an arriving flight’s spatial positioning within the terminal area with respect to its assigned landing runway. The “DistfromLanding” feature measures the raw distance “as the crow flies” from the flight’s current point to its assigned point of touch down. The “CircleDistfromLanding” feature, on the other hand, measures the minimum angular distance, ranging from 0 to 360 degrees, between the flight’s current position and the point of touch down. This second feature provides a better representation of a flight’s path to touch down because it simulates approach in addition to landing.

Figure 3.12: Example of Model 3 distance metric calculations with point-to-point distance measured in kilometers representing “DistfromLanding” and angular distance measured in degrees representing “CircleDistfromLanding”.

The idea is that flights farther from touch down have more airspace to cover, and thus are subjected to more opportunities for weather penetration while having limited deviation capability close to the airport. For example, a flight that is currently located on the opposite side of the airport from its landing runway must circumvent the airport to ensure a proper approach. If weather is surrounding the airport or is concentrated in large, unavoidable pockets, such a flight is almost sure to penetrate severe weather during its circumvention.

3.5 Behavior of Other Pilots in the Terminal Area

The last set of features included in our models describes the behavior of other flights within the terminal area during the ascent/descent of the flight of interest.
We found that this behavior is often a good indicator of the conditions that the flight of interest will experience in the upcoming prediction interval. These behavioral features, which are broken down into three subsections below, describe flights ahead in the arrival stream as well as departures ahead ascending towards enroute altitudes.

3.5.1 Are Flights Ahead Penetrating Severe Convective Weather?

The first subset of features describing the behavior of other flights in the terminal area examines flights ahead of the flight of interest and records whether or not they are penetrating severe convective weather. For example, applied to Model 1, these features examine flights further along in the ascent sequence than the flight of interest. Applied to Models 2 and 3, these features examine flights that are closer to ORD than the current flight. To be specific, we only consider flights ahead that have a trajectory point within the current prediction interval within the past three minutes. We set the time period length at three minutes rather arbitrarily, aiming to encapsulate the flight operations and weather conditions that the current prediction interval has been subjected to recently. A shorter time period may not succeed in “painting this picture” due to a low number of trajectory points.
This subset of features outlines the following characteristics of flights ahead in the ascent/descent sequence: whether or not these flights penetrated severe weather in the current prediction interval (“PenetratingAhead”), the total number of flights that penetrated (“PenetratingAheadNumber”), the total number of times these flights penetrated in the current prediction interval (“PenetratingAheadNumberEntries”), the average severity level of these penetrations (“PenetratingAheadScore”), the average distance “as the crow flies” between the current flight’s position and where the penetrations took place (“PenetratingAheadDist”), and the average angular distance, ranging from 0 to 180 degrees, between the current flight and where the penetrations took place (“PenetratingAheadCircleDist”). The reasoning behind this subset of features is fairly straightforward: flights ahead have already experienced weather conditions, severe or not, that the current flight will soon face. After examining the summary statistics, we found that over 25% of the time that there is a penetration ahead of the current flight, that flight penetrates in the next prediction interval as well. Furthermore, approximately 60% of penetrations occur after another flight had penetrated ahead in the interval.

3.5.2 Behavior of Flights in the Opposite Sequence

The next subset of features is very similar to the subset described above, except that we are now examining flights in the opposite flight sequence. For instance, if the flight of interest is a departure, this subset of features examines arrival behavior. The situation is reversed for Models 2 and 3.
Identical to the feature subset above, we record the following characteristics for flights with a trajectory point in the current prediction interval within the past three minutes, except in the opposite flight sequence: whether or not these flights penetrated severe weather in the current prediction interval (“Arrivals/DeparturesPenetrating”), the total number of flights that penetrated (“Arrivals/DeparturesPenetratingNumber”), the total number of times these flights penetrated in the current prediction interval (“Arrivals/DeparturesPenetratingNumberEntries”), the average severity level of these penetrations (“Arrivals/DeparturesPenetratingScore”), the average distance “as the crow flies” between the current flight’s position and where the penetrations took place (“Arrivals/DeparturesPenetratingDist”), and the average angular distance, ranging from 0 to 180 degrees, between the current flight and where the penetrations took place (“Arrivals/DeparturesPenetratingCircleDist”). After examining the summary statistics for these features, we found that over 20% of the time a flight in the opposite sequence penetrates in the current prediction interval, the flight of interest will penetrate in the current prediction interval as well. Furthermore, approximately 40% of penetrations occur after a flight in the opposite sequence had already penetrated in the current prediction interval. We also look to see whether flights of the opposite flight sequence are forming congestion around the flight of interest with the “DepsCrowding” and “ArrsCrowding” features. These features count the number of flights of the opposite flight sequence with a trajectory point within 25 km of the current flight’s position in the past three minutes.
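Both this subset and the flights-ahead subset aggregate recent penetration entries in the same way; that shared aggregation can be sketched as below. The (flight_id, severity_level) record format, the flight identifiers, and the function name are hypothetical illustrations, not the thesis's data structures.

```python
# Illustrative sketch: aggregating penetration entries logged by other
# flights (ahead or in the opposite sequence) within the past three minutes
# of the current prediction interval. Records are hypothetical
# (flight_id, severity_level) pairs.
def penetration_features(recent_penetrations):
    if not recent_penetrations:
        return {"Penetrating": False, "Number": 0, "NumberEntries": 0, "Score": 0.0}
    flights = {fid for fid, _ in recent_penetrations}
    scores = [sev for _, sev in recent_penetrations]
    return {"Penetrating": True,
            "Number": len(flights),                     # distinct flights that penetrated
            "NumberEntries": len(recent_penetrations),  # total penetration entries
            "Score": sum(scores) / len(scores)}         # average severity level

feats = penetration_features([("AAL12", 1), ("AAL12", 2), ("UAL7", 1)])
print(feats["Number"], feats["NumberEntries"])  # 2 3
```

The distance-based companions ("...Dist" and "...CircleDist") would additionally average the straight-line and angular distances from the current flight to each penetration location.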
Although flights of the opposite phase often operate in different terminal area sectors, are separated vertically by air traffic controllers, and can be moderated by Ground Delay/Ground Stop programs, a situation where this division of operations is not sustained may affect pilot behavior drastically by limiting range of movement. We expect that these features have more of an effect on entries close to the airport, where traffic is more condensed.

3.5.3 Follow the Leader

The final subset of features describing the behavior of other flights in the terminal area is similar to the other two in that we examine flights ahead of the flight of interest. However, these features specifically examine whether or not the current flight is following the trajectory of another flight ahead in the same flight sequence (ascent/descent). This subset applies to all three models, as departure and arrival streams within the terminal area are both prevalent. In interviews with pilots (that will be discussed in more detail in Chapter 5), we learned that pilots are likely to follow the preceding pilot in a stream during a weather impact. The pilots, along with Lin’s results, asserted that the behavior of the preceding pilot thus heavily influences whether a pilot chooses to fly through severe weather. As one pilot said, “If it worked for the other guy, it will work for me.” This mindset may be problematic during periods when the weather is worsening or moving rapidly, potentially obstructing the path taken by the previous pilot. Moreover, although this “follow-the-leader” behavior is more common closer to the airport, where flight streams and trajectories are most regulated, we occasionally find pilots clearly following each other beyond 100 km from the airport. The relative importance of stream-based features in Lin’s models revealed that pilots are more likely to penetrate severe weather when they are “followers” in a stream than when they are “pathfinders” leading a stream.
Our features don’t distinguish between “pathfinders” and “followers”, instead focusing on whether the flight of interest is a “follower”. In our models, a flight is labeled a “follower” if there is a preceding flight with a trajectory point within a mile of the current position within the past 3 minutes. A mile is a very small distance in comparison to the terminal area we have defined, so trajectory points of this proximity are likely no coincidence, even when considering standardized routes. Furthermore, when we examined air traffic flows for each runway at ORD, we found that there was no statistical difference between flows on weather days vs. flows on non-weather days. Thus, we can consider “follower” behavior a conscious decision by the pilot rather than attribute it to weather-specific airport procedures. The trajectory plot of the ORD terminal area in Figure 3.13 reveals this “follow-the-leader” behavior.

Figure 3.13: Example of “follow-the-leader” behavior by departures in the West sector of the ORD terminal area.

In Figure 3.13, the connected red points represent individual arrival trajectories and the black lines represent individual departure trajectories. The circles represent the nose of the plane. The reader should notice three departures following each other in the West sector of the airport, avoiding storm cells of threat level “orange” and “red” north of ORD. In this scenario, the departures penetrated severe weather immediately upon takeoff but quickly altered their trajectories to deviate around the concentration of storm cells. “Follow-the-leader” worked effectively in this scenario, as it did in the majority of “follower” scenarios in our dataset.
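The “follower” test described above can be sketched as follows. The one-mile and three-minute thresholds are from the thesis; the planar x/y coordinate representation and the function name are our own simplifying assumptions for short terminal-area ranges.

```python
import math

# Illustrative sketch of the "Follower" test: is there a preceding flight
# with a trajectory point within one mile (~1.609 km) of the current
# position within the past three minutes? Planar distances are a
# simplifying assumption at these short ranges.
ONE_MILE_KM = 1.609
THREE_MIN_S = 180

def is_follower(current_xy_km, now_s, leader_points):
    """leader_points: (x_km, y_km, timestamp_s) entries from preceding flights."""
    cx, cy = current_xy_km
    for x, y, t in leader_points:
        recent = 0 <= now_s - t <= THREE_MIN_S
        close = math.hypot(x - cx, y - cy) <= ONE_MILE_KM
        if recent and close:
            return True
    return False

print(is_follower((10.0, 5.0), 600, [(10.5, 5.0, 500)]))  # True: 0.5 km, 100 s ago
print(is_follower((10.0, 5.0), 600, [(10.5, 5.0, 100)]))  # False: point too old
```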
This subset of features records the following metrics for the current prediction interval: whether or not the current flight is a follower (“Follower”), the number of flights it is following (“NumLeaders”), whether or not the leaders penetrated severe weather (“FollowerPenetrate”), the average severity level of these penetrations (“FollowerPenetrateScore”), and the average VIL value encountered by the leaders (“LeaderVILFinal”).

3.5.4 Feature Summary

Table 3.1 provides a comprehensive list of the features described in Chapter 3. We will reference these features frequently in the following chapter presenting our results.

BadWeatherPercentageBefore/Now — Proportion of pixels within projection with VIL level ≥ 3
PercentBadCellsDiffVIL — Difference between time periods in proportion of pixels within projection with VIL level ≥ 3
SeverityDiffVIL — Difference between time periods in overall severity score of projection
PercentWorseningVIL — Proportion of projection pixels that increase in VIL value between time periods
CellDiffVIL — Sum of raw difference in projection pixel-to-pixel VIL values between time periods
CellDiffVILAbs — Sum of absolute difference in projection pixel-to-pixel VIL values between time periods
flankcount — Number of pixels in 180-degree projection with VIL level ≥ 3
FlankingValue — Standard deviation of storm cell degree location in 180-degree projection
TimeinTerm — Total time spent in terminal area up to the current prediction checkpoint
TimeWithin50km — Total time spent within 50 km of airport (Model 3 only)
LevelOffORDecreasing — Whether or not the current flight is leveling off in the trajectory
Tromboning — Whether or not the current flight is performing a trombone maneuver
OtherTermArea — Whether or not the current flight is intersecting another airport terminal area
PenetrateAlready — Flight’s severity score (0-3) from previous prediction interval
FlightDistance — Total distance the flight has traveled from takeoff
DistfromLanding — Distance “as the crow flies” from assigned landing runway
CircleDistfromLanding — Minimum angular distance from assigned landing runway
PenetratingAhead — Whether or not flights ahead penetrated severe weather in current prediction interval
PenetratingAheadNumber — Total number of flights ahead that penetrated in current prediction interval
PenetratingAheadNumberEntries — Total number of times flights ahead penetrated in current interval
PenetratingAheadScore — Average severity level of penetrations by flights ahead
PenetratingAheadDist — Average distance “as the crow flies” between current flight’s position and penetrations ahead
PenetratingAheadCircleDist — Average angular distance between current flight’s position and penetrations ahead
Arrs/DepsPenetrating — Whether or not flights in opp sequence penetrated severe weather in current prediction interval
Arrs/DepsPenetratingNumber — Total number of flights in opp sequence that penetrated in current prediction interval
Arrs/DepsPenetratingNumberEntries — Total number of times flights in opp sequence penetrated in current interval
Arrs/DepsPenetratingScore — Average severity level of penetrations by flights in opp sequence
Arrs/DepsPenetratingDist — Avg distance “as crow flies” between current flight’s position and penetrations in opp sequence
Arrs/DepsPenetratingCircleDist — Avg angular distance between current flight’s position and penetrations in opp sequence
Arrs/DepsCrowding — Total number of flights in opposite sequence within 25 km of flight’s current position
Follower — Whether or not the current flight is following the trajectory of another flight ahead
NumLeaders — Number of flights ahead within the common trajectory
LeaderPenetrate — Whether or not the flights ahead in the common trajectory penetrated

Table 3.1: Summary of Model Features

4 Predictive Modeling of Pilot Behavior

Chapter 4 presents the methods we used to predict severe convective weather penetration as well as the results we achieved from these methods.
The following sections will discuss in detail the different predictive methods we implemented in the statistical software “R”, how we constructed the model datasets, the performance of our models when applied to ORD, and the performance of our models when applied to airport terminal areas other than ORD.

4.1 Defining the Dependent Variable

Before describing the different predictive methods we implemented in our research, it is critical to understand what we are trying to predict. As we alluded to in earlier chapters, the dependent variable in our models is not just the binary question of whether or not a flight will penetrate severe convective weather in the current prediction interval. Instead we look to achieve a more granular level of classification by predicting not just whether a penetration will occur but the severity of that penetration as well. For each transponder message entry, we have VIL and echo top data for a flight at its current position. We then calculate the number of “Penetration Points” that the flight receives at each message entry using Equation 4.1 and its supporting tables:

Penetration Points = VIL Points × EchoTop Points,    (4.1)

where the point values are assigned as follows:

VIL Value    VIL Points Assigned
<133         0
≤146         1
≤159         2
≤170         3
≤180         4
≤193         5
≤206         6
≤218         7
≤231         8
≤244         9
>244         10

Table 4.1: VIL Point Assignment

Echo Top Value    EchoTop Points Assigned
<25,000 ft.       1
≤35,000 ft.       2
>35,000 ft.       3

Table 4.2: Echo Top Point Assignment

The equation is multiplicative in nature to give adequate weight to the echo top metric, provide a large range of Penetration Point values, and better capture the severity of a penetration. The VIL Point values were assigned rather arbitrarily, with the difference between each VIL Value bin being about 12 units. No VIL Points are assigned if a flight encounters a VIL value less than 133 because such an encounter is not considered a severe convective weather penetration.
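Equation 4.1 and its point-assignment tables can be implemented directly; the sketch below uses the thresholds from Tables 4.1 and 4.2, while the function names are our own.

```python
# Illustrative implementation of Equation 4.1 using the point assignments
# from Tables 4.1 and 4.2 (thresholds from the thesis; names are our own).
def vil_points(vil):
    if vil < 133:
        return 0  # below the severe threshold: no penetration
    for threshold, pts in [(146, 1), (159, 2), (170, 3), (180, 4), (193, 5),
                           (206, 6), (218, 7), (231, 8), (244, 9)]:
        if vil <= threshold:
            return pts
    return 10  # VIL value > 244

def echo_top_points(echo_top_ft):
    if echo_top_ft < 25_000:
        return 1
    if echo_top_ft <= 35_000:
        return 2
    return 3

def penetration_points(vil, echo_top_ft):
    # Equation 4.1: multiplicative, so VIL below the severe threshold
    # (<133) always yields zero Penetration Points.
    return vil_points(vil) * echo_top_points(echo_top_ft)

print(penetration_points(150, 30_000))  # 2 * 2 = 4
print(penetration_points(120, 40_000))  # 0 (not a severe penetration)
print(penetration_points(250, 40_000))  # 10 * 3 = 30 (the maximum)
```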
One may notice that due to the multiplicative nature of the Penetration Points equation, if the VIL Points assigned are 0, then the Penetration Points for that message entry are consequently 0. This fail-safe makes sense because echo tops do not exist unless severe VIL values are present. EchoTop Points allocation comprises fewer bins than VIL Points allocation because echo top values are not as sensitive to incremental changes as VIL. Within our model datasets, if an echo top exists, then it is typically greater than 25,000 feet. Since we are working with flight trajectories within the terminal area, we can almost always be sure that if an echo top exists at the flight's current position, then the flight is under the echo top. Furthermore, as discussed in Chapter 2, the severity of echo tops is correlated with the severity of VIL. Thus, if a flight's current position maintains a relatively large VIL value, then its corresponding echo top will also be relatively large. A flight's "Overall Penetration Score" is the average number of Penetration Points registered within the current prediction interval. Our models' dependent variable, referred to as the "severity level" of a given prediction interval, is defined by the "Overall Penetration Score" cutoffs in Table 4.3.

Overall Penetration Score   Severity Level
0                           0
≤4                          1
≤8                          2
>8                          3

Table 4.3: Dependent Variable Cutoffs

One may notice that the cutoffs are relatively low, especially considering that the maximum number of Penetration Points a flight can earn in a given interval is 30. The dependent variable cutoffs were set to ensure a sufficient mix of severity levels in our model dataset while still reflecting the true proportion of flights that experience each level of severity. In fact, most flights that penetrate severe convective weather do so at low VIL values and low echo top values, relative to the values that constitute severe weather.
Figure 4.1 supports this claim, showing that the large majority of penetrations, on both the arrivals and departures side, are of level 3 VIL with echo top height between 25,000 and 35,000 ft.

Figure 4.1: Distribution of VIL and Echo Top values for arrival and departure penetrations within the ORD terminal area.

It is evident that there are more arrival penetrations than departure penetrations; this is specific to ORD and may not always be the case at other airports. It is also worth noting that level 6 departure penetrations do indeed exist, but there are so few that they do not show up on the histogram.

The dependent variable breakdowns for each ORD model dataset, prior to any dataset balancing, can be seen in Table 4.4.

Severity Level   Model 1 Frequency   Model 2 Frequency   Model 3 Frequency
0                27,876              14,937              26,649
1                942                 533                 879
2                216                 95                  153
3                76                  29                  96

Table 4.4: Breakdown of frequencies of each severity level for Models 1, 2, and 3.

From the above table, it is clear that severe weather penetrations are a very rare occurrence, even during periods in which the terminal area is impacted by weather. It is also worth noting that of the penetrations (severity level ≥1), the overwhelming majority are of severity level 1, which validates our design of the dependent variable cutoffs in Table 4.3. The next section will address how we deal with the huge difference in magnitude between the number of penetrations and non-penetrations.

4.2 Defining our Model Dataset

Model 1, which is focused on departures, contains predictor input from a different set of flights than that of Models 2 and 3. The Model 1 dataset contains predictor input from all ORD flights that have a trajectory point within the defined terminal area within two hours of a departure penetration. The Model 2 and 3 datasets were built in the same way, except we look within two hours of an arrival penetration.
The Model 3 dataset contains only predictor input corresponding to trajectory points within 50 km of ORD. We preprocessed each dataset and removed flights as necessary based on the ETMS trajectory verification discussed in Section 2.2.1. ("Predictor input" refers to the calculated feature values at each prediction interval for the verified ORD flights. These feature values serve as input to our predictive models.)

After examining the summary statistics of the input, we tackled the dependent variable issue discussed in the section above. With penetrations occurring so rarely, using the raw input would result in the models always predicting a severity level of 0. To prevent this from occurring and achieve meaningful findings with respect to feature importance, we chose to balance the datasets using oversampling. Thus, we matched the number of penetration entries with the number of non-penetration entries, ensuring that the frequency of each severity level above 0 was scaled based on the proportion of each level in the original set of penetration entries. We used bootstrapping to implement the oversampling; this ensured random selection of penetration entries from the original pool.

4.3 Predictive Methods

After balancing the flight dataset for all three models, we selected which predictive methods to apply. Any method we used had to be capable of handling a categorical dependent variable with more than two classes, as well as both continuous and discrete (i.e., binary) predictor variables. In the end, we chose to apply Multinomial Logistic Regression, CART, and Random Forests to our predictor input. Each method has its strengths and weaknesses, which we will discuss in the following subsections. The metrics used to evaluate each method consist of prediction accuracy, two different versions of a false negative rate, and a false positive rate.
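The bootstrap oversampling described in Section 4.2 can be sketched as follows. This is a minimal Python illustration with a hypothetical list-of-dicts representation and severity field name; the thesis implemented the balancing step in R.

```python
import random

def balance_by_oversampling(entries, label_key="severity", seed=0):
    """Oversample penetration entries (severity > 0) with replacement until
    they roughly match the number of non-penetration entries, preserving the
    original proportion of each severity level among the penetration entries.
    `entries` is a list of dicts; `label_key` names the severity field."""
    rng = random.Random(seed)
    negatives = [e for e in entries if e[label_key] == 0]
    positives = [e for e in entries if e[label_key] > 0]
    need = len(negatives)
    sampled = []
    for level in sorted({e[label_key] for e in positives}):
        pool = [e for e in positives if e[label_key] == level]
        # Scale each level by its share of the original penetration entries;
        # rounding means the total may differ from `need` by a few entries.
        n_level = round(need * len(pool) / len(positives))
        sampled.extend(rng.choices(pool, k=n_level))  # bootstrap: with replacement
    return negatives + sampled
```

Sampling with replacement (`rng.choices`) is what makes this a bootstrap: each penetration entry can appear multiple times in the balanced dataset.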
Prediction accuracy is simply the proportion of interval entries for which the model predicts the correct severity level. The first false negative rate examines how often the models predict a severity level lower than the actual severity level that occurred. The second false negative rate examines how often the models predict a severity level of 0 when in fact the severity level was greater than 0. We are more concerned with the second false negative rate because missing a penetration altogether is more dangerous than predicting a penetration but misjudging its severity. The false positive rate measures the proportion of interval entries for which the model predicts a severity level greater than 0 when in fact the severity level was 0. This metric will help us evaluate the ability of our models to differentiate between non-penetration and penetration in a binary context. We also extracted feature importance within the models. The way in which these importance values were calculated for each predictive method will be explained in the subsections below.

4.3.1 Multinomial Logistic Regression

Multinomial Logistic Regression generalizes logistic regression to multi-class problems with more than two possible discrete outcomes [11]. This method handles categorical variables well, especially when outcomes are ordinal [4], as in our case with the severity levels. However, results are sensitive to the arbitrary coding of dependent variable classes, which may lead to misleading conclusions. Additionally, this method outputs a "black box" model in that the user does not know how the model used the predictor input or weighted different features in order to make predictions. Thus, the results output by Multinomial Logistic Regression are not very interpretable, with the magnitude of coefficient values not accurately representing the relative importance of a feature in the model.
Consequently, we focus on this method's performance rather than feature importance interpretations.

4.3.2 Classification and Regression Trees (CART)

CART, otherwise known as decision tree learning, is used to visually represent decision-making. This method recursively partitions the data into two sets, finding a partition at each step that maximally differentiates the two sets [14]. In our case, each step divides the prediction interval entries by severity level (usually just 0 and 1) while minimizing misclassifications. The recursion is complete when the subset of flights at a node all have the same dependent variable value, or when splitting no longer adds value to the predictions [18].

We used CART in our research for several reasons. First, it performs well with large datasets and requires very little data preparation or "cleansing" [14]. With over 60,000 rows of data in each model dataset, this was an important consideration. Next, CART outputs a transparent, "white-box" model: the predicted outcome for each interval entry is easily explained by Boolean logic based on the predictor input. Thus, the results are simple and easy to understand, especially regarding the relative importance of various features. Lastly, CART is very robust, meaning it performs well even if its assumptions are violated by the predictor input [18].

Nonetheless, CART has its limitations. First, it does not guarantee global optimality due to its reliance on greedy algorithms [18]. Additionally, CART is subject to overfitting, characterized by an excessive number of tree splits that make decisions more complex. On the flip side, depending on the parameter that sets the minimum number of interval entries required to create a node, dominant variables may result in trees with very few splits. This makes it hard to determine the relative influence of other variables in the decision-making process. Our models use a minimum node "bucket" size of 25 to prevent this variable over-dominance.
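The recursive split search with a minimum bucket constraint can be illustrated with a single-feature sketch. This is a generic Gini-based split finder written in Python for illustration only, not the R CART package the thesis used:

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels, min_bucket=25):
    """Find the threshold on one feature that most reduces Gini impurity,
    subject to each child node containing at least `min_bucket` entries
    (mirroring the minimum node 'bucket' size of 25 used in the thesis).
    Returns (threshold, impurity_decrease), or (None, 0.0) if no split
    satisfies the bucket constraint."""
    n = len(values)
    parent = gini(labels)
    best = (None, 0.0)
    for t in sorted(set(values)):
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        if len(left) < min_bucket or len(right) < min_bucket:
            continue  # split would create an under-sized node
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        decrease = parent - child
        if decrease > best[1]:
            best = (t, decrease)
    return best
```

A full CART implementation applies this search across all features and recurses on each child node; the bucket constraint is what keeps a single dominant variable from carving the data into tiny, unreliable leaves.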
Finally, information gain in trees is biased towards features with more levels [18]. However, this biased feature selection can be combated with conditional inference [18]. In the end, the "R" CART software package addresses most of these limitations, securing CART's status as a viable predictive method for our research.

Regarding the extraction of feature importance from CART models, we must first acknowledge that although relatively few features may appear explicitly as "splitters" in the visual output, this does not mean that there are no other features important to understanding or predicting the severity level. The simplicity of the output decision tree can be attributed to the goal of CART: to develop a simple tree structure for predicting outcomes based on data [18]. Furthermore, a feature may be very influential even if it does not appear as a primary splitter. CART keeps track of surrogate splits in the tree-growing process, so the contribution a feature makes in the prediction process is not determined only by primary splits [21]. Throughout the tree-growing process, whenever a primary splitter is missing, surrogate splitters are used instead to move an interval entry down the tree to its appropriate terminal node [21]. A feature may appear in a tree many times, either as a primary or a surrogate splitter [21]. To calculate the importance score for each feature, the "R" CART package sums the goodness-of-split measures for each split for which the feature is a primary splitter [20]. It then adds this sum to the term "goodness * (adjusted agreement)" for all cases in which the feature serves as a surrogate splitter [20]. The resultant scores are scaled to sum to 100. The importance score considers surrogate splits to prevent two similar features from obscuring the significance of one another [20].
It is important to note that importance scores are strictly relative to the given tree structure and do not indicate the absolute information value of a feature [21].

4.3.3 Random Forests

The Random Forest method is an extension of CART in that it constructs hundreds of decision trees, using a random subset of the predictor input in each one. This approach, called bootstrap aggregating, or "bagging", helps the model identify trends in the data that CART cannot. After each tree votes on the dependent variable outcome, Random Forests use the mode of these tree predictions as the final prediction.

Random Forests are a very useful prediction tool for several reasons. First, they are extremely robust. Random Forests can deal with many correlated, weak predictors without skewing the prediction results or having one variable become over-dominant [14]. In addition, the diversity of trees helps with the overfitting problem commonly seen in CART. Lastly, Random Forests handle unbalanced datasets well [14]. Thus, if we chose not to balance our model datasets with oversampling, we could still be confident using Random Forests to make predictions.

The most common complaint about Random Forests is that they are "black box" models with results that are not readily interpretable, unlike CART. However, we can determine the most influential features in the model using the Gini Index. The Gini Index measures node impurity, or "how much each feature contributes to the homogeneity of the nodes and leaves in the resulting random forest" [9]. To obtain the importance score for a given feature in a Random Forest model, we randomly permute the values of the feature and measure the decrease in accuracy of the current tree based on the Gini Index [14]. This process is repeated for all trees in the forest containing the feature of interest. The resulting average of these accuracy decreases is the raw variable importance.
A higher value (a higher decrease in Gini) indicates that a particular feature is more influential in the classification process [14].

4.4 Model Results

We will now present the results of the predictive methods discussed above applied to Models 1, 2, and 3. The predictor input for these models includes all of the features discussed in Chapter 3; we will explore partial inclusion of features in subsequent sections. The results include prediction accuracy, both false negative rates, a false positive rate, and a discussion of feature importance within the models. (As a reminder, the first false negative rate examines how often the models predict a severity level lower than the actual severity level that occurred. The second false negative rate examines how often the models predict a severity level of 0 when in fact the severity level was greater than 0. The false positive rate measures the proportion of interval entries for which the model predicts a severity level greater than 0 when in fact the severity level was 0.) The feature importance discussion will be very limited for Multinomial Regression due to its lack of transparency. Both the training and testing sets for the models consist of predictor input from ORD flights. The training set is made up of 60% of the interval entries in the balanced model dataset, and the testing set is made up of the remaining 40%.

4.4.1 Model 1 Results

Table 4.5 outlines the prediction accuracy, false negative rates, and false positive rate achieved by each predictive method applied to Model 1:

Method                   Accuracy   FN Rate 1   FN Rate 2   FP Rate
Multinomial Regression   79%        32%         16%         9%
CART                     78%        27%         9%          16%
Random Forests           97%        3%          3%          4%

Table 4.5: Model 1 Performance Results

From Table 4.5, it is clear that Random Forests' performance is superior to that of the other two predictive methods.
Multinomial Regression and CART perform very similarly, with CART having slightly lower accuracy, lower false negative rates, but a higher false positive rate. As mentioned before, it is hard to comment on feature importance for Multinomial Regression due to its lack of interpretability and coefficient values that do not accurately reflect feature importance. However, CART and Random Forests both provide meaningful output for interpretation. Figure 4.2 displays the outputted CART model with its respective splits. Each branch indicates the criterion for the left-hand daughter node; each node is labeled with the predicted severity level as well as the actual number of interval entries of each severity level (0/1/2/3) in the training set assigned to that node. Table 4.6 lists the relative importance of each feature rescaled to sum to 100, omitting any features whose proportion of the sum is less than 1%. The feature at the root of the tree is “flankcount”, with secondary splits on “PenetratingFartherOutDist” and “PenetrateAlready”. These splits are consistent regardless of the random training-testing split. Severity levels 0, 1, and 2 are included in the tree nodes, but severity level 3 is not because of its very rare occurrence in the data set. 
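The rescaling applied to the importance tables in this chapter (scores rescaled to sum to 100, rounded to the nearest integer, with features whose share falls below 1% omitted) can be sketched as a small helper. This is an illustrative Python sketch; the thesis produced these tables in R.

```python
def rescale_importance(scores, cutoff_pct=1.0):
    """Rescale raw feature-importance scores to percentages summing to 100,
    drop features below `cutoff_pct` percent, and return the survivors
    sorted in descending order with rounded values.
    `scores` maps feature name -> raw importance."""
    total = sum(scores.values())
    pct = {f: 100.0 * v / total for f, v in scores.items()}
    return {f: round(p)
            for f, p in sorted(pct.items(), key=lambda kv: -kv[1])
            if p >= cutoff_pct}
```

Because rounding happens after the cutoff is applied, the reported integers need not sum to exactly 100, which matches the small discrepancies visible in the chapter's tables.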
Figure 4.2: Model 1 ORD CART Output. Reconstructed tree structure (each node shows its predicted severity level and its training-set counts by actual severity 0/1/2/3; the left branch is taken when the split condition holds):

flankcount < 0.875 [predict 0; 16726/12766/2927/1033]
|-- yes: PenetratingFartherOutDist >= 33.66 [predict 0; 14102/1901/197/89]
|   |-- yes: leaf [predict 0; 13967/1240/146/65]
|   |-- no:  leaf [predict 1; 135/661/51/24]
|-- no:  PenetrateAlready < 5.833 [predict 1; 2624/10865/2730/944]
    |-- yes: leaf [predict 1; 2569/10412/1964/436]
    |-- no:  leaf [predict 2; 55/453/766/508]

Feature                      Relative Importance
flankcount                   21
FlankingValue                21
BadWeatherPercentageNow      17
BadWeatherPercentageBefore   16
CellDiffVILAbs               10
PercentWorseningVIL          9
PenetratingFartherOutDist    2
PenetrateAlready             2
PenetratingFartherOutScore   1

Table 4.6: Model 1 CART Feature Importance Values

Since the first split occurs on a weather-based feature, and the top features in the importance chart are also weather-based, one may conclude that the most significant determinant of departure weather penetration is the weather itself and not any operational factors. It is worth noting that it does not require a very large "flankcount" for CART to predict a penetration. Furthermore, the tree is rather simple, with only three total splits. This, along with the results in the relative importance chart, supports the conclusion that a few dominant features influence the prediction decision, with the remaining features' influence being very weak. Decision trees are not well equipped to handle the presence of many weak features, which could result in two somewhat correlated features not both being used despite their similarity. However, the presence of dominant features in our CART model prevents this from being relevant. Random Forests do not provide a useful visual output like CART, but they do provide the relative importance of features using the Gini Index. Table 4.7 presents the normalized importance values, rounded to the nearest integer and rescaled to sum to 100, for each feature, omitting any features whose proportion of the sum is less than 1%. The most influential features are essentially the same as in Table 4.6 for CART, but the spread of relative importance values among features is much smaller.
This results in the longer list of features in Table 4.7.

Feature                           Relative Importance
flankcount                        14
FlankingValue                     12
BadWeatherPercentageNow           8
PenetratingFartherOutDist         7
BadWeatherPercentageBefore        7
CellDiffVILAbs                    6
PenetrateAlready                  5
PenetratingFartherOutScore        5
PercentWorseningVIL               5
TimeinTerm                        5
CellDiffVIL                       4
SeverityDiffVIL                   4
PercentBadCellsDiffVIL            4
ArrivalsPenetratingDist           2
PenetratingFartherOut             2
ArrsCrowding                      2
ArrivalsPenetratingCircleDist     2
ArrivalsPenetratingScore          2
PenetratingFartherOutCircleDist   2

Table 4.7: Model 1 Random Forests Feature Importance Values

Although the top features in the Random Forests importance chart are still weather-based, these features do not have as dominant a presence as with CART. This could be attributed to the robustness of Random Forests, allowing them to deal with the presence of many weak features without skewing results or hurting performance. We now look to the Model 2 and 3 results to discern whether model performance and feature importance values are similar for ORD arrivals.

4.4.2 Model 2 Results

Table 4.8 outlines the prediction accuracy, false negative rates, and false positive rate achieved by each predictive method applied to Model 2. From Table 4.8, it is clear that Random Forests' performance is superior to that of the other two predictive methods. Multinomial Regression and CART perform very similarly, with CART having slightly lower accuracy, lower false negative rates, but a false positive rate almost twice as large. Compared to Model 1 performance, Model 2 accuracy rates are slightly lower, false negative rates are slightly higher, and false positive rates are slightly lower.
Method                   Accuracy   FN Rate 1   FN Rate 2   FP Rate
Multinomial Regression   82%        27%         14%         8%
CART                     80%        24%         7%          15%
Random Forests           98%        0%          0%          3%

Table 4.8: Model 2 Performance Results

This difference may be due to the unforeseeable behavior of arrivals during descent compared to departures, which follow more direct routes from takeoff up to cruising altitudes. Figure 4.3 displays the output CART model with its respective splits, and Table 4.9 lists the relative importance of each feature.

Figure 4.3: Model 2 ORD CART Output. Reconstructed tree structure (each node shows its predicted severity level and its training-set counts by actual severity 0/1/2/3; the left branch is taken when the split condition holds):

FlankingValueMatrix < 2.566 [predict 0; 8962/7270/1296/398]
|-- yes: PenetratingCloserInCircleDist >= 38.35 [predict 0; 7590/557/68/39]
|   |-- yes: leaf [predict 0; 7570/443/68/39]
|   |-- no:  leaf [predict 1; 20/114/0/0]
|-- no:  leaf [predict 1; 1372/6713/1228/359]

The feature at the root of the tree is "FlankingValue", with a single secondary split on "PenetratingCloserInCircleDist". These splits are consistent regardless of the random training-testing split. It is worth noting that it does not require a very large "FlankingValue" for CART to predict a penetration. Interestingly, only severity levels 0 and 1 are predicted by the model. This may be attributed to the particularly high proportion of level 1 penetration entries in the model dataset, with over 80% of penetration entries classified as level 1.

Feature                         Relative Importance
FlankingValue                   20
flankcount                      20
BadWeatherPercentageNow         16
BadWeatherPercentageBefore      16
CellDiffVILAbs                  15
PercentWorseningVIL             12
PenetratingCloserInCircleDist   1
PenetratingCloserInDist         1

Table 4.9: Model 2 CART Feature Importance Values

Just like in the Model 1 CART, the first split occurs on a weather-based feature and the top features in the importance chart are also weather-based, leading us to believe that the most significant determinant of arrival penetration far from the airport is the weather itself and not any operational factors. The dropoff in relative importance between weather-based features and non-weather-based features in Table 4.9 is noticeably large.
The fact that the tree only contains two splits does not mean that there are only two important features in the set. Additionally, the fact that the second split is on "PenetratingCloserInCircleDist" does not mean that it is one of the most influential features in the set. In fact, according to Table 4.9, it has a relatively low importance value. This goes back to the discussion of splitters in Section 4.3.2, which stated that the importance of a feature within the CART model does not depend on its role as a primary splitter. Moving on, Table 4.10 presents the Model 2 Random Forests normalized feature importance values.

Feature                            Relative Importance
FlankingValue                      17
flankcount                         13
BadWeatherPercentageNow            10
CellDiffVILAbs                     8
BadWeatherPercentageBefore         8
PercentWorseningVIL                7
CellDiffVIL                        4
PenetratingCloserInCircleDist      4
FlightDistance                     4
SeverityDiffVIL                    4
PenetratingCloserInDist            4
PercentBadCellsDiffVIL             3
PenetratingCloserInScore           2
TimeinTerm                         2
PenetratingCloserInNumberEntries   2
PenetrateAlready                   1

Table 4.10: Model 2 Random Forests Feature Importance Values

The most influential features are similar to those in Table 4.9 for CART and in the Model 1 results, but the specific importance rankings have shuffled around. "FlankingValue" is now dominant over "flankcount". Furthermore, the gap between the dominant features and all others is more noticeable, resulting in a shorter list of variables in Table 4.10 than in Table 4.7 for Model 1. Table 4.10, unlike Table 4.7, does not contain any features based on the behavior of flights in the opposite flight sequence. This suggests that departure behavior does not greatly influence arrival penetration far from the airport, which makes sense because it is easier to create separation between flights when they are farther from the congestion near the airport. We should see more unique trends in the Model 3 results because all interval entries take place within 50 km of ORD.
4.4.3 Model 3 Results

Table 4.11 outlines the prediction accuracy, false negative rates, and false positive rate achieved by each predictive method applied to Model 3.

Method                   Accuracy   FN Rate 1   FN Rate 2   FP Rate
Multinomial Regression   86%        21%         10%         5%
CART                     85%        24%         6%          7%
Random Forests           99%        0%          0%          3%

Table 4.11: Model 3 Performance Results

From Table 4.11, it is again clear that Random Forests' performance is superior to that of the other two predictive methods for Model 3. Multinomial Regression and CART perform very similarly, with CART having slightly lower accuracy, a slightly higher first false negative rate, a lower second false negative rate, and a slightly higher false positive rate. Compared to Model 2 performance, Model 3 posts noticeably higher accuracy rates, lower false negative rates, and lower false positive rates for Multinomial Regression and CART. This difference is rather unexpected considering the uncertain behavior of arrivals close to the airport. One possible explanation is that the Model 3 dataset contains significantly more interval entries than the Model 2 dataset, allowing the predictive methods to train on more data before being applied to the testing set. This, however, is not necessarily the case. Figure 4.4 displays the output CART model with its respective splits, and Table 4.12 lists the relative importance of each feature.
Figure 4.4: Model 3 ORD CART Output. Reconstructed tree structure (each node shows its predicted severity level and its training-set counts by actual severity 0/1/2/3; the left branch is taken when the split condition holds):

flankcount < 1.5 [predict 0; 15989/12472/2168/512]
|-- yes: PenetratingCloserInDist >= 55.37 [predict 0; 15124/2108/256/47]
|   |-- yes: leaf [predict 0; 14755/1002/76/25]
|   |-- no:  leaf [predict 1; 369/1106/180/22]
|-- no:  PenetratingCloserInScore < 4.452 [predict 1; 865/10364/1912/465]
    |-- yes: leaf [predict 1; 839/9877/1317/125]
    |-- no:  PenetratingCloserInScore < 8.528 [predict 2; 26/487/595/340]
        |-- yes: DeparturesPenetratingCircleDist < 86.41 [predict 2; 18/465/532/68]
        |   |-- yes: leaf [predict 1; 3/196/0/5]
        |   |-- no:  leaf [predict 2; 15/269/532/63]
        |-- no:  leaf [predict 3; 8/22/63/272]

The feature at the root of the tree is "flankcount", with secondary splits on "PenetratingCloserInDist" and "PenetratingCloserInScore", a tertiary split on "PenetratingCloserInScore" again, and finally a quaternary split on "DeparturesPenetratingCircleDist". These splits are not as consistent across random training-testing splits as those in Models 1 and 2. However, this framework of splits was by far the most frequently encountered. The large number of splits compared to the CART output for Models 1 and 2 is supported by the longer list of features in Table 4.12. The dropoff in relative importance between the weather-based features and the non-weather-based features is not as overwhelming as in Models 1 and 2. Furthermore, the fact that there are two splits on "PenetratingCloserInScore" does not mean that it is one of the most influential features in the set. In fact, according to Table 4.12, it has a relatively low importance value. This goes back to the discussion of splitters in Section 4.3.2, which stated that the importance of a feature within the CART model does not depend on its role as a primary splitter.

Feature                            Relative Importance
flankcount                         18
FlankingValue                      18
BadWeatherPercentageNow            15
BadWeatherPercentageBefore         13
PenetratingCloserInDist            12
PenetrateAlready                   10
PenetratingCloserInScore           4
PenetratingCloserInCircleDist      3
PenetratingCloserIn                2
PenetratingCloserInNumber          2
PenetratingCloserInNumberEntries   2

Table 4.12: Model 3 CART Feature Importance Values

Moreover, unlike Models 1 and 2, all severity levels are included in the Model 3 tree nodes.
This may be attributed to the relatively high proportion of level 3 penetrations in the Model 3 dataset in comparison to Models 1 and 2. In Model 3, 8.5% of penetrations are of level 3 severity, compared to 6.2% in the Model 1 dataset and 4.4% in the Model 2 dataset. Moving on, Table 4.13 presents the Model 3 Random Forests normalized feature importance values.

Feature                              Relative Importance
flankcount                           14
FlankingValue                        10
BadWeatherPercentageNow              8
CellDiffVILAbs                       7
BadWeatherPercentageBefore           6
PenetratingCloserInDist              5
PenetratingCloserInScore             5
PenetratingCloserInCircleDist        4
PenetratingCloserInNumberEntries     3
PenetratingCloserInNumber            3
TimeinTerm                           2
CellDiffVIL                          2
CircleDistfromLanding                2
DistfromLanding                      2
SeverityDiffVIL                      2
PenetrateAlready                     2
DeparturesPenetratingScore           2
DeparturesPenetratingDist            2
PercentWorseningVIL                  2
FlightDistance                       2
TimeWithin50km                       2
PercentBadCellsDiffVIL               2
PenetratingCloserIn                  2
DeparturesPenetratingCircleDist      1
DeparturesPenetratingNumberEntries   1
FollowerPenetrateScore               1
FollowerVILFinal                     1
DepsCrowding                         1

Table 4.13: Model 3 Random Forests Feature Importance Values

The most noticeable thing about Table 4.13 is its length. The number of Random Forests features with greater than 1% of the Gini Index sum is much larger than in Models 1 and 2. As a result, the spread of importance values among features is relatively small, with very small incremental decreases as you move down the list. Although the top features in the importance chart are still weather-based, these features do not have as dominant a presence for Model 3. This could be attributed to the robustness of Random Forests, allowing them to deal with the presence of many weak features without skewing results or hurting performance. The list contains the entire set of features that evaluate arrivals closer to ORD as well as departures in the vicinity. This makes sense because interval entries in Model 3 are restricted to within 50 km of the airport.
Thus, congestion near the airport is expectedly more of an issue, and the prevalence of systematic streams and approach paths allows us to better gauge future pilot behavior based on the flights ahead.

4.4.4 Summary of ORD Results

From the results presented above, it is apparent that Random Forests are the best predictive method in terms of the quantitative performance metrics. Moreover, their interpretability with regard to feature importance rivals that of CART, especially since the CART splits do not necessarily represent the most influential variables. These points make a strong case for Random Forests as the recommended predictive method for severe convective weather penetration. Nonetheless, we must keep in mind that the models are meant to serve as decision support tools for air traffic controllers. The value of a "white-box" model and its ability to demonstrate the proposed decision-making process should not be overlooked. Air traffic controllers would appreciate CART's simplicity and visual output, and could engineer their own model splits based on personal experience. Thus, the recommended prediction tool is not so clear cut; an extended conversation, beyond the scope of this thesis, will most likely have to take place to make a final choice.

Regarding feature importance, our study shows that the primary indicators of penetration continue to be weather-based, particularly the presence of fast-moving weather within a flight's trajectory projection. Nevertheless, we found that a number of the operational features in our models weakly correlate with severe convective weather penetration. Despite having lower importance values, these features help shed light on the dynamics of the terminal area. In particular, the importance of features that describe the behavior of other pilots in the terminal area may help us understand how pilots and air traffic controllers deal with weather impacts close to the airport.
The most important conclusion was that pilots are more likely to penetrate severe weather when other pilots ahead of them in the ascent/descent sequence have already penetrated. This makes sense because the flights ahead have already experienced the weather conditions, severe or not, that the current flight will soon face. On the flip side, one may ask why pilots don't learn from the mistakes of the pilots flying ahead of them. These results imply that rerouting around weather is still often done on an ad hoc basis once a pilot reports his/her weather penetration to ATC [14]. Further investigation into the dynamics of the terminal area is necessary to develop effective penetration mitigation strategies and obtain a better understanding of how weather impacts air traffic flows.

The findings described above apply specifically to models run on ORD terminal area operations. How do we know whether other terminal areas throughout the U.S. will produce similar results? Are our models robust to geographic location? We will explore these questions in the next section.

4.5 Testing Our ORD Models on Other Airports

We tested our models on several other U.S. airports to determine whether they were robust to location. In addition, after re-training the models on each individual airport, we explored whether models trained on one airport could achieve success on another airport. That is, are the most important features and flow of decisions within airport terminal areas similar enough that we can develop a common model that will be successful across all U.S. airports? This would greatly decrease model computation time by skipping the re-training process while also standardizing how air traffic controllers approach severe convective weather penetration. Before presenting the results of each experiment for Models 1, 2, and 3, we must first discuss how we picked airport pairings for common models.

4.5.1 Selecting Airport Pairings for Common Model Experiment

Figure 4.5 shows the 30 U.S.
airports with the most severe convective weather penetration flights during the summer of 2008. Each point represents a single airport.

Figure 4.5: Map of Top 30 Penetration Airports

The color of each point reflects the total number of penetration flights that occurred within the respective airport terminal area, with yellow representing a low number and red representing a high number. The size of each point reflects the corresponding airport's 2013 “hub” score, determined by the total number of passenger boardings. For our purposes, a higher “hub” score reflects a busier airport. The two airports with the most penetration flights, Chicago O'Hare and Atlanta, are also the two busiest airports in the top 30. However, this trend is not consistent across the top 30: Chicago Midway, Detroit, Orlando, and St. Louis experience a large number of penetration flights but do not have particularly high hub rankings.

One may notice that all of the airports in the top 30 for penetration flights are in the eastern half of the United States, concentrated especially in the Midwest and the Southeast. This is most likely due to the higher frequency of convective weather patterns in these areas, as discussed in Section 2.1. In fact, 19 of the top 30 airports are located in these regions, with 11 in the Midwest alone. This statistic greatly influenced our process for assigning airport pairings. Table 4.14 lists the pairings we tested.

Pairing                                 Region        Distance (km)
Chicago O'Hare (ORD) / Atlanta (ATL)    Cross-Region  975
Chicago O'Hare (ORD) / Detroit (DTW)    Midwest       377
Chicago O'Hare (ORD) / Cleveland (CLE)  Midwest       508
Detroit (DTW) / Cleveland (CLE)         Midwest       152
St. Louis (STL) / Memphis (MEM)         Midwest       386
Indianapolis (IND) / Cincinnati (CVG)   Midwest       161
Atlanta (ATL) / Orlando (MCO)           Southeast     650
Orlando (MCO) / Tampa (TPA)             Southeast     125

Table 4.14: Airport Pairings

All of the airports in Table 4.14 are in the top 15 for penetration flights.
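The inter-airport distances in Table 4.14 are great-circle distances, which can be computed with the haversine formula. The sketch below checks the ORD/ATL pairing; the airport coordinates are approximate values supplied for illustration, not taken from the thesis.

```python
# Sketch: great-circle distance between two airports via the haversine formula.
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in km."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

ord_ = (41.978, -87.905)   # Chicago O'Hare (approximate coordinates)
atl = (33.641, -84.427)    # Atlanta Hartsfield-Jackson (approximate)
d = haversine_km(*ord_, *atl)
print(round(d), "km")  # close to the 975 km listed for the ORD/ATL pairing
```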
We first tested O'Hare against Atlanta because they were the two airports with the most penetration flights, and they also happened to maintain the heaviest volume of air traffic. However, based on the poor results of this test, we decided to limit pairings to airports within the same region. If a common model across the U.S. could not be developed, maybe we could at least develop regional models. Furthermore, we wanted the airports to be relatively close in proximity so that weather patterns would be similar, but not within each other's terminal area, because then operations might overlap and skew results. The following sections present the aggregated results of the re-trained models alongside the results of the airport pairing models in order to determine whether regional models are feasible, as well as to obtain insight regarding feature importance from the experiments.

4.5.2 Comparison of Results

Tables 4.15, 4.16, and 4.17 compare the results for the re-training and airport pairing methods, displaying the average performance metrics across airports/pairings along with the corresponding standard deviation. The pairings method uses 60% of the flights from one airport as the training set and 100% of the flights in the other airport as the testing set. The pairings average performance metrics aggregate results from using both airports in a pair as the training set.

             Acc                     FN 1                    FN 2                    FP
Method       MR      Tree    RF      MR      Tree    RF      MR      Tree    RF      MR      Tree    RF
Re-training  79 (3)  80 (3)  98 (1)  31 (4)  27 (3)  1 (1)   15 (4)  10 (5)  1 (1)   9 (3)   13 (3)  3 (1)
Pairing      74 (5)  74 (4)  68 (7)  39 (9)  35 (8)  59 (15) 17 (6)  12 (7)  29 (9)  10 (6)  14 (7)  3 (1)

Table 4.15: Comparison of Model 1 performance results for the re-training vs. airport pairing methods. For predictive methods, “MR” represents Multinomial Logistic Regression, “Tree” represents CART, and “RF” represents Random Forests. Regarding performance metrics, “Acc” represents the prediction accuracy.
“FN 1” represents the first false negative rate defined, “FN 2” represents the second false negative rate, and “FP” represents the false positive rate.

It is evident that the re-training method is superior to the pairing method for all performance metrics across all three models. The level of superiority differs based on the metric of interest.

             Acc                     FN 1                      FN 2                        FP
Method       MR      Tree    RF      MR       Tree     RF       MR      Tree    RF         MR      Tree    RF
Re-training  81 (7)  81 (6)  98 (1)  26 (11)  23 (9)   0.6 (0.8) 14 (5)  11 (4)  0.6 (0.8) 10 (4)  12 (5)  3 (2)
Pairing      67 (7)  72 (6)  65 (7)  52 (14)  37 (10)  65 (16)   25 (9)  16 (8)  33 (8)    10 (3)  13 (6)  3 (2)

Table 4.16: Comparison of Model 2 performance results for the re-training vs. airport pairing methods.

             Acc                      FN 1                     FN 2                         FP
Method       MR       Tree    RF      MR       Tree    RF        MR       Tree    RF         MR      Tree    RF
Re-training  83 (4)   84 (5)  98 (1)  24 (7)   23 (9)  0.5 (0.6) 14 (4)   11 (4)  0.5 (0.6)  7 (2)   9 (4)   3 (2)
Pairing      68 (10)  74 (6)  68 (8)  47 (16)  35 (8)  60 (16)   23 (10)  13 (5)  31 (9)     10 (7)  13 (6)  3 (1)

Table 4.17: Comparison of Model 3 performance results for the re-training vs. airport pairing methods.

The most surprising differences reside in the Random Forests metrics. While Random Forests is by far the best predictive method under the re-training approach, it is by far the worst under the pairing approach. It is unclear why such a dropoff took place; CART appears to be the best all-around predictive method for the pairing approach. However, the CART output for the pairings method often contains a large number of splits that vary based on the randomly constructed training-testing set. It is also worth noting that the standard deviations of the re-training performance metrics are lower than those of the pairing performance metrics, suggesting more consistent performance. With regard to feature importance, both the re-training and pairing results mirrored those presented in the section above describing O'Hare.
The most influential features were weather-based, with the flanking features, severity features, and “CellDiffVILAbs” consistently topping the importance rankings.

4.5.3 Insight from the Pairings Experiment

During the airport pairing experiment, we found that Midwestern airport pairings performed significantly better than Southeastern airport pairings, which may be attributed to the larger sample size of Midwestern pairings. We will not comment on the performance of cross-regional pairings vs. regional pairings because we only tested one cross-regional pair. Additionally, we observed that setting the training set to the airport with more total flights in its dataset does not necessarily result in a better, more consistent model. However, setting the training set to the airport with more penetration entries does indeed translate to better, more consistent models, especially if the difference in the number of penetration entries is large. This finding can be explained by the following example: consider two models with the same number of total flight entries. Since we balance our model datasets, an airport with fewer penetration entries would have to cycle through more duplicate entries than the other airport in the pair in order to match the number of non-penetration entries. The airport with fewer penetration entries would hypothetically serve as an inferior trainer because it has experienced less penetration behavior. The increase in consistency obtained by using the airport with more penetration entries as the trainer can be seen explicitly in the CART decision tree, which maintains the same structure of splits regardless of the randomly constructed training-testing set.

4.5.4 Summary of Results

Based on the results presented above, we recommend re-training on each individual airport rather than trying to construct regional models or a common model across all U.S. airports.
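The balancing step described above, in which the minority (penetration) class is cycled through duplicates until it matches the non-penetration class, is plain oversampling with replacement. A minimal sketch, with a hypothetical record structure:

```python
# Sketch of class balancing by oversampling the minority class, as in the
# dataset-balancing discussion above. Record fields are hypothetical.
import random

def balance_by_oversampling(entries, label_key="penetrated", seed=0):
    """Duplicate minority-class entries until both classes are equal in size."""
    rng = random.Random(seed)
    pos = [e for e in entries if e[label_key]]
    neg = [e for e in entries if not e[label_key]]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra

# Toy dataset: 90 non-penetration entries, only 10 penetration entries
data = [{"penetrated": False}] * 90 + [{"penetrated": True}] * 10
balanced = balance_by_oversampling(data)
n_pos = sum(e["penetrated"] for e in balanced)
print(n_pos, len(balanced) - n_pos)  # 90 90
```

An airport with very few penetration entries ends up repeating the same few records many times, which is exactly why it serves as a weaker trainer than an airport with abundant penetration behavior.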
The recommended predictive method is still Random Forests, but its sensitivity and poor performance in the pairing experiment should be noted and explored further. Its high false negative rates were due to its frequent prediction of severity level 0 when in fact the severity level was 1, or its prediction of severity level 1 when in fact the severity was level 2. The latter is less worrisome because the model at least still predicts that a penetration will occur, signaling that some course of action must be taken by the controller/pilot. Furthermore, the consistency with regard to feature importance suggests that the most influential variables are truly weather-based and that we have developed features that are robust to airport location. The fact that our models perform well independent of airport location is promising. Yet we have not looked into the intricacies of the pilot thought process when encountering severe convective weather. The next chapter will more closely examine pilot behavior on a case-by-case basis in order to validate our models.

4.6 Sensitivity of Models

Since weather-based features were consistently the most influential across all three models, regardless of the airport, we explored whether the performance of our models changes significantly when applying only the weather-based feature subset. We also tested a few other subsets that build upon one another and contain some of the other more influential features. The four subsets tested are described below. Tables 4.18, 4.19, and 4.20 outline the results of this sensitivity analysis, displaying the average performance metrics, along with the corresponding standard deviation, across all airports listed in Section 4.5 (see footnote below).
Round 1: Only weather-based features
Round 2: Round 1 + “PenetrateAlready” + “PenetratingAhead” subset
Round 3: Round 2 + “OtherFlightSequencePenetrating” subset
Round 4: Round 3 + all other features listed in Table 3.1

Footnote 7: We re-train on each individual airport and use 60% of interval entries as the training set and 40% of interval entries as the testing set.

         Acc                     FN 1                    FN 2                       FP
Method   MR      Tree    RF      MR      Tree    RF      MR      Tree    RF         MR     Tree    RF
Round 1  75 (3)  78 (3)  98 (1)  39 (5)  30 (5)  2 (2)   18 (5)  12 (5)  1.5 (1.5)  9 (3)  13 (3)  3 (1)
Round 2  78 (3)  79 (2)  98 (1)  35 (4)  27 (3)  2 (1)   16 (5)  11 (4)  1.8 (1)    9 (3)  13 (3)  3 (1)
Round 3  78 (3)  79 (2)  97 (1)  33 (5)  27 (3)  4 (2)   16 (5)  11 (4)  3.5 (2)    9 (3)  13 (3)  3 (1)
Round 4  79 (3)  79 (2)  98 (1)  32 (4)  27 (3)  1.5 (1) 15 (4)  11 (4)  1.5 (1)    9 (3)  13 (3)  3 (1)

Table 4.18: Comparison of Model 1 performance results for variable subsets. For predictive methods, “MR” represents Multinomial Logistic Regression, “Tree” represents CART, and “RF” represents Random Forests. Regarding performance metrics, “Acc” is the proportion of interval entries for which the model predicts the correct severity level. “FN 1” measures how often the models predict a severity level lower than the actual severity level that occurred. “FN 2” measures how often the models predict a severity level of 0 when in fact the severity level was greater than 0. “FP” measures the proportion of interval entries for which the model predicts a severity level greater than 0 when in fact the severity level was 0.
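The four metrics defined in the table caption can be sketched in a few lines. One plausible reading, assumed here and not stated explicitly in the caption, is that the false negative rates are computed over intervals whose actual severity is greater than 0 and the false positive rate over intervals whose actual severity is 0.

```python
# Sketch of the Acc / FN 1 / FN 2 / FP metrics, under the denominator
# assumptions stated above. Severity levels are integers 0, 1, 2.
def performance_metrics(actual, predicted):
    pairs = list(zip(actual, predicted))
    positives = [(a, p) for a, p in pairs if a > 0]   # actual penetration
    negatives = [(a, p) for a, p in pairs if a == 0]  # no actual penetration
    acc = sum(p == a for a, p in pairs) / len(pairs)
    fn1 = sum(p < a for a, p in positives) / len(positives)   # underpredicted severity
    fn2 = sum(p == 0 for a, p in positives) / len(positives)  # missed penetration entirely
    fp = sum(p > 0 for a, p in negatives) / len(negatives)    # false alarm
    return acc, fn1, fn2, fp

actual    = [0, 0, 1, 2, 2, 0, 1, 0]
predicted = [0, 1, 0, 1, 2, 0, 1, 0]
acc, fn1, fn2, fp = performance_metrics(actual, predicted)
print(acc, fn1, fn2, fp)  # 0.625 0.5 0.25 0.25
```

Note that FN 2 counts a subset of the errors counted by FN 1, which is why FN 2 is never larger than FN 1 in the tables.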
         Acc                     FN 1                      FN 2                     FP
Method   MR      Tree    RF      MR       Tree    RF        MR      Tree    RF       MR      Tree    RF
Round 1  76 (5)  81 (5)  97 (2)  36 (9)   24 (9)  1.6 (2)   18 (5)  12 (5)  1.6 (2)  10 (5)  13 (5)  4 (2)
Round 2  79 (6)  82 (5)  97 (2)  30 (10)  22 (7)  2 (2)     16 (5)  10 (5)  2 (2)    10 (4)  12 (5)  4 (2)
Round 3  80 (6)  82 (5)  97 (2)  29 (10)  22 (7)  2 (2)     16 (4)  10 (5)  2 (2)    10 (3)  12 (5)  4 (1.5)
Round 4  82 (6)  82 (6)  98 (1)  25 (10)  21 (9)  0.8 (1)   15 (5)  10 (5)  0.8 (1)  10 (4)  12 (5)  3 (2)

Table 4.19: Comparison of Model 2 performance results for variable subsets.

         Acc                     FN 1                       FN 2                       FP
Method   MR      Tree    RF      MR      Tree     RF         MR      Tree    RF         MR       Tree    RF
Round 1  76 (3)  81 (6)  96 (2)  39 (5)  30 (10)  4 (2)      19 (4)  13 (6)  4 (2)      8 (2)    10 (4)  4 (2)
Round 2  79 (3)  82 (5)  96 (1)  33 (6)  26 (9)   5 (2)      17 (5)  12 (5)  5 (2)      7 (2)    9 (4)   4 (2)
Round 3  81 (4)  82 (5)  95 (2)  30 (7)  27 (9)   7 (3)      16 (5)  12 (5)  7 (5)      6.5 (2)  9 (4)   4 (2)
Round 4  83 (4)  84 (5)  99 (1)  25 (6)  22 (9)   0.3 (0.5)  15 (4)  10 (4)  0.3 (0.5)  7 (2)    9 (4)   3 (2)

Table 4.20: Comparison of Model 3 performance results for variable subsets.

It is clear that Round 4 maintains the best performance, but the incremental improvement between rounds is very small. Round 1 performance is impressive considering it uses only 8 features, compared to 40 in Round 4. The proximity in performance between Rounds 1 and 4 further demonstrates the dominant influence of weather-based features in our models. The improvement between Rounds 1 and 2 is rather small, and the improvement between Rounds 2 and 3 is almost negligible, if not counterproductive, suggesting that the features added in these rounds are not very influential. Each predictive method also exhibits a unique trend across all three models. Multinomial Regression performance improves consistently from Round 1 to Round 4. CART performs very consistently across all four rounds. Lastly, Random Forests performance worsens from Round 1 to Rounds 2 and 3 before reaching its best numbers in Round 4.
5 Case Studies and Pilot Experience

The sections below examine pilot behavior within the ORD terminal area during severe convective weather scenarios. Based on recurring themes in these scenarios and the personal experiences of the commercial and military pilots we interviewed, we will verify or disprove our model results and draw conclusions about our research.

5.1 Takeaways from Pilot Interviews

We interviewed over 20 professional pilots from various backgrounds and experience levels in order to learn their thought process upon encountering severe weather and how they would handle such an encounter. The most common trend was that a pilot's attitude is to avoid weather at all costs. After all, the blame for weather-related accidents and structural damage falls on the pilot for accepting a bad vector or deciding to take off in severe weather conditions. Regardless of ATC coordination and recommendation, the aircraft is the pilot's responsibility. The following subsections address such topics as onboard weather radar and forecasting, flight path deviation, taking off and landing during a weather impact, and general takeaways from the interviews.

5.1.1 Weather Radar and Forecasting Technology in the Cockpit

Pilots asserted that they used onboard radar and forecasting technology for avoidance rather than for selecting the weakest weather-impacted areas to penetrate. They also complained that these tools are often outdated and inaccurate, especially regarding the currency of forecasts. Moreover, pilots have access to VIL in the cockpit but not echo tops, so they are not aware of the exact height of storm cells. As a result, pilots have very little confidence in the weather forecasts they receive in the cockpit. They are forced to rely on ATC and dispatch over radar because these resources provide more real-time reports and weather forecasts, which is why our prediction tool is geared towards controller support rather than direct pilot support.
5.1.2 Deviation from the Filed Flight Path

Given storm conditions in the terminal area, a pilot may wish to deviate around weather cells in order to avoid penetration. In theory, a pilot must first obtain approval from ATC to deviate from the planned flight path. (The capabilities that ATC has in this sphere are airport-dependent.) According to the pilots we interviewed, this process is actually easy and quite common, with 99% of proposed deviations receiving approval as long as the deviation does not endanger other aircraft in the vicinity. Nonetheless, even without approval, the aircraft is still the pilot's responsibility. The FAA mandates that above 20,000 ft altitude, pilots must avoid storms laterally by 20 NM or overfly them by 5,000 ft. Below 20,000 ft, pilots must avoid storms by at least 5 NM. Thus, the pilot may have to alter the heading given by ATC if it is not adequate to safely avoid the storm. The capabilities of the aircraft at hand, such as thrust, size, and maneuverability, play an important part in this decision. Yet the interviewees ranked the following as the top four factors contributing to the difficulty of deviation: fuel limitations, ATC advisories/regulations, the current runway configuration, and lack of visibility due to nighttime operations, in that order.

5.1.3 Impact of Convective Weather on Departures

When convective weather is present near the airport, the decision to keep the flight scheduled is not up to the pilot but rather the airline. However, it is officially the pilot's decision whether or not to take off. But is it really? Not only are airlines pressuring their pilots to take off, they are also pressuring air traffic controllers to get planes airborne in order to minimize delays and reduce fuel wasted while idling. Consequently, ATC usually gives approval to take off unless weather is right on top of the runway.
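The avoidance distances quoted above lend themselves to a simple rule. The sketch below encodes that rule directly; it is a simplification for illustration only, not operational guidance, and the function names are our own.

```python
# Sketch of the FAA storm-avoidance guidance quoted above:
#   above 20,000 ft: avoid laterally by 20 NM or overfly by 5,000 ft
#   at or below 20,000 ft: avoid laterally by at least 5 NM
def min_lateral_avoidance_nm(altitude_ft):
    """Minimum lateral distance (NM) to keep from a storm at this altitude."""
    return 20 if altitude_ft > 20_000 else 5

def can_overfly(storm_top_ft, aircraft_alt_ft):
    """True if the aircraft clears the storm top by at least 5,000 ft."""
    return aircraft_alt_ft - storm_top_ft >= 5_000

print(min_lateral_avoidance_nm(25_000))  # 20
print(min_lateral_avoidance_nm(10_000))  # 5
print(can_overfly(30_000, 36_000))       # True
```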
In the end, despite pilots' honest assessment of the weather situation, it looks bad if other pilots are taking off and they are not. This, along with sensitive duty hour limits, breeds the “need-to-get-it-done” attitude often seen among pilots today. The interviewees pointed out that pilots do not taxi around with the radar on, but rather get a good indication of the weather situation when they pull onto the runway. Feedback from prior departures also helps to paint a picture of the challenges following takeoff. All in all, there is not much planning time once the pilot has decided to take off. Departures still receive weather updates from ATC when they are within 5 km of the airport, but the unknown grows from there. Despite this uncertainty, it is theoretically much easier to deviate around storm cells during ascent than during descent because the pilot can turn in a wide variety of directions immediately following takeoff. This implies that there should be fewer departure penetrations than arrival penetrations, which is supported by our data.

5.1.4 Impact of Convective Weather on Arrivals

Descending towards the airport during a weather impact is very different from departing because the plane must land at some point. The pilot does not have the luxury of staying on the ground if the weather is too severe. Consequently, arrivals are given priority over departures in bad weather and receive constant updates from ATC regarding the current weather situation. ATC is able to use Ground Delay and Ground Stop programs to slow down departures and enable arrivals to perform larger deviations. However, the weather situation is not always better in landing sectors than in takeoff sectors, as one would expect if operations were being catered to arrivals. Arrival approach paths are restricted by wind direction, sometimes forcing aircraft to land on runways covered by convective weather.
In these situations, penetration is imminent, but pilots must maneuver smartly in order to minimize exposure to weather while also landing safely. Of course, pilots have the option to divert to an alternate airport. However, similar to the departures side, it looks bad if other pilots are landing and you choose to divert. The alternate airport for O'Hare arrivals is Midway, which is less than 20 miles away. Thus, it is doubtful that diverting to Midway would prevent penetration, as Midway most likely encounters the same weather patterns as O'Hare due to their proximity. Furthermore, the “need-to-get-it-done” attitude on the departures side transitions to “get-there-itis” on the arrivals side. The interviewees ranked the following as the top four sources of pressure on pilots to land as soon as possible: fuel limitations, coordination with ATC, being behind schedule, and airline operations (AOC), in that order.

5.1.5 Summary of Interview Takeaways

From the interviews with professional pilots, we learned that ascending flights, descending flights, and enroute flights at cruising altitudes all face different challenges when encountering convective weather. The following differ based on the phase of flight: visibility issues, wind issues, type of precipitation and its effects, strength of turbulence, ability to avoid penetration, and, overall, how weather affects the flow of air traffic. Pilots, air traffic controllers, airline operations (AOC), and all personnel involved with air travel must be aware of these differences in a severe convective weather scenario.

5.2 Case Studies

Case studies and their corresponding trajectory plots are very helpful for understanding the evolution and movement of weather within the terminal area and how this affects traffic flows. As we discussed in Chapter 3, the trajectory plots helped to identify several potential features for our models.
In this section, we focus on a few recurring themes that were frequently observed in the case studies. Each plot represents a snapshot of a single weather period and contains all trajectory points close to the airport within that 2.5-minute period. The connected red points represent individual arrival trajectories and the black lines represent individual departure trajectories. The circles represent the nose of the plane. The large black circles around the airport indicate the distance of a trajectory point from the airport, increasing incrementally by 10 km. If a flight's trajectory intersects severe VIL pixels, we assume that the flight indeed penetrated severe convective weather, because echo tops will be higher than the flight's altitude close to the airport.

5.2.1 Theme 1: Pilots Try to Avoid Storm Cells

Although in this thesis we focus on why pilots penetrate severe weather cells, the trajectory plots often show pilots obviously trying to deviate around storm cells and find gaps in weather to avoid penetration. If penetration is imminent, pilots will also attempt to penetrate the lowest VIL areas within the storm cell. Figure 5.1 exhibits this avoidance behavior.

Figure 5.1: Example of avoidance behavior by arrivals in the Southwest sector of the ORD terminal area on July 9, 2008 at 001730Z.

The arrivals in the Southwest sector barely nick severe VIL cells as they fly through a gap in the frontal mass storm moving west to east just south of the airport. It is apparent that the pilots attempted to avoid not only the storm cells with the highest VIL but storm cells in general while gearing up for the approach from the east. The departures turn immediately upon takeoff to avoid the frontal storm south of the airport. Arrivals do not have this flexibility, with approach and landing restricted by wind conditions, runway configuration, and other operating procedures.
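The penetration test described above — flagging a flight when its trajectory intersects severe VIL pixels — can be sketched as a grid lookup. The grid, pixel size, and coordinate convention below are made up for illustration; the thesis's actual VIL data format may differ.

```python
# Sketch: flag a flight as penetrating severe weather if any trajectory
# point falls on a VIL pixel at or above the severe threshold (level 3).
SEVERE_VIL = 3
PIXEL_KM = 1.0  # hypothetical pixel size

def penetrated_severe(trajectory, vil_grid):
    """trajectory: list of (x_km, y_km); vil_grid: 2D list of VIL levels."""
    for x, y in trajectory:
        i, j = int(y // PIXEL_KM), int(x // PIXEL_KM)
        if 0 <= i < len(vil_grid) and 0 <= j < len(vil_grid[0]):
            if vil_grid[i][j] >= SEVERE_VIL:
                return True
    return False

grid = [[0, 0, 0],
        [0, 4, 0],
        [0, 0, 0]]  # one severe pixel in the middle
print(penetrated_severe([(0.5, 0.5), (1.5, 1.5)], grid))  # True
print(penetrated_severe([(0.5, 0.5), (2.5, 0.5)], grid))  # False
```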
However, there are rare cases when flights penetrate severe convective weather for seemingly no reason. Figure 5.2 provides an example of this behavior.

Figure 5.2: Example of unexplained penetration behavior by a departure in the Northwest sector of the ORD terminal area on July 8, 2008 at 061730Z.

Upon takeoff, the departure flies straight into a VIL level 3 storm cell and remains in weather for a long period of time. There are no other flights within the vicinity that would prevent the departure from deviating. From this example, one may suggest that pilots do not consider VIL level 3 to be “severe”. Yet VIL does not provide us with specific details regarding weather conditions. Thus, the flight in Figure 5.2 may be experiencing light rain and limited convectivity, especially if the VIL values are on the low end of the VIL level 3 boundary. In addition, the weather may have worsened quickly, highlighting the uncertainty of weather forecasts.

5.2.2 Theme 2: Arrivals Have a Tougher “Go-of-It”

We have established in this thesis that arrivals hypothetically have a harder time avoiding severe convective weather in the terminal area. This is due to their restrictive approach and landing procedures as well as the fact that they must land at some point. Figures 5.3 and 5.4 show that arrival operations continue even when penetration is imminent.

Figure 5.3: Both departures and arrivals affected by weather in the West sector of the ORD terminal area on July 2, 2008 at 222500Z.

In Figure 5.3 we see departures penetrating very high VIL levels despite turning immediately upon takeoff. The storm cell is so large that penetration is unavoidable. Arrivals execute a trombone maneuver while trying to stay on the outer edges of the storm. Less than ten minutes later, we see in Figure 5.4 that the massive storm cell has moved quickly from west to east, with VIL levels 5 and 6 now covering the airport.
ATC has halted all departure operations, while arrivals continue to execute the same trombone maneuver as in the first image.

Figure 5.4: Ground stop is issued due to weather covering the airport on July 2, 2008 at 223230Z.

Diversion to Midway will not alleviate the situation, but rather will expose the arrivals to more opportunities for penetration. Thus, pilots brace themselves for severe weather conditions and do their best to land the plane safely. This example underscores that arrival penetration behavior is inherently different from departure penetration behavior.

5.2.3 Theme 3: Weather Is Unpredictable

Although the title seems obvious, the extent to which the movement of weather and its changing strength affect terminal airspace was not explored in Lin's thesis. Based on the case studies we examined, we devoted an entire set of model features to this weather behavior. Figures 5.5, 5.6, and 5.7 outline one of these case studies, which displays just how fast the severity of weather cells can change despite the slow movement of a frontal storm.

Figure 5.5: Arrivals executing approach and landing maneuvers amidst severe weather in the Northwest sector of the ORD terminal area on August 22, 2008 at 173000Z.

In Figure 5.5, we see that a group of severe storm cells is on top of the airport, forcing arrivals to penetrate while executing their approach and landing maneuvers in the Northwest sector. Ten minutes later, Figure 5.6 shows that two of the large storm cells have joined and are surrounding the airport, while a large concentration of VIL level 6 pixels has formed right along the arrival approach path. Penetrations of this severity are not sustainable, so the arrivals in Figure 5.7 begin to circumvent and fly behind the large storm cell north of ORD as it moves west to east. Departure operations have resumed; the plot shows them flying through areas of low VIL within the storm cell, since the weather scenario dictates imminent penetration upon takeoff.
This case provides an example of pilots adapting to rapidly changing weather conditions close to the airport and minimizing collateral damage despite imminent penetration.

Figure 5.6: Concentration of VIL level 6 pixels forms in the middle of the arrival approach path in the Northwest sector of the ORD terminal area on August 22, 2008 at 174000Z.

5.2.4 Case Study Wrap-Up

In this research, we have defined VIL levels of 3 or higher to be hazardous, as pilots tend not to fly through them in terminal airspace. In reality, the situation is more complex, with cases of pilots flying straight through level 3 weather and cases of pilots who seem to be avoiding level 2 weather. Case studies allow visualization of these scenarios and help us understand why pilots do what they do in the terminal area. Sequences of trajectory plots tell a story, bringing the numbers in the data to life rather than forcing us to make sense of the numbers themselves.

Figure 5.7: Arrivals begin to circumvent the storm cell as it moves west to east in order to maintain the approach path in the Northwest sector of the ORD terminal area on August 22, 2008 at 175730Z.

6 Conclusions and Future Work

6.1 Thesis Summary and Conclusions

Through our use of predictive modeling, case studies, and pilot experience, we constructed semi-dynamic models that accurately predict severe convective weather penetration in terminal areas across the U.S. up to 99% of the time. We also extracted the relative importance of features within these models in order to identify those features that best correlate with and influence pilot penetration. Our findings in this area reinforced those of Yi-Hsin Lin: the primary indicators of penetration continue to be weather-based, particularly the presence of severe weather within a flight's trajectory projection. Nevertheless, we found that a number of the operational features in our models weakly correlate with severe convective weather penetration.
Despite having lower importance values, these features help shed light on the dynamics of the terminal area. In particular, the importance of features that describe the behavior of other pilots in the terminal area may help us understand how pilots and air traffic controllers deal with weather impacts close to the airport. The most important conclusion was that pilots are more likely to penetrate severe weather when other pilots ahead of them in the ascent/descent sequence have already penetrated, as this is a good indication of what is to come. These results imply that rerouting around weather is still often done on an ad hoc basis once a pilot reports his/her weather penetration to ATC. In conclusion, we hope that the robustness of our models allows for implementation across the U.S. and that they serve as a supplemental tool to the existing terminal area convective weather mitigation strategies used by ATC.

6.2 Ideas for Future Work

There is no doubt that a great deal of work remains to be done in understanding the impact of severe weather on terminal area operations, as well as how pilots respond to severe weather scenarios. The following sections address some of the shortcomings of this thesis and propose ideas for future research in this area.

6.2.1 Expand Model Datasets

Although severe convective weather penetration is considered a rare event in the grand scheme of air operations, there are still a relatively small number of penetrating flights in our model datasets. With only three months of trajectory data from 2008, we would ideally acquire data from more recent years to expand our datasets, verify that the same patterns hold year over year, and re-train our models. Otherwise, the predictive power of our models may be limited by the fact that the penetration behavior to which they have been exposed is also limited.
6.2.2 Incorporate Weather Forecasts

Examination of the weather forecasts distributed to pilots within the terminal area would help to paint a clearer picture of what information the pilot has upon encountering severe convective weather. The challenge is that not all pilots receive the same information, with different aircraft having different forecasting and communication capabilities. Thus, the integration of advisory features based on these forecasts into our predictive models would most likely not achieve consistent results. Nonetheless, we could compare the penetration events to the terminal area forecasts (of varying lead time) to gauge 1) whether the pilot was aware of possible weather conditions and 2) whether penetration could have been avoided via advance deviation. If a large proportion of penetration events resulted from unforecasted weather, failure to avoid the weather can be attributed to the accuracy of the forecasting methods rather than to the pilot, who could not feasibly deviate in time. On the other hand, if the forecasts largely match the actual weather that occurred, this would imply that such weather could have been avoided by the pilot. In this case, penetration may be classified as a calculated decision on the part of the pilot or ATC. Imminent penetration close to the airport, particularly during takeoff and landing, is a third possibility that is less relevant to forecast accuracy but calls into question the decision-making process of Ground Control.

6.2.3 Additional Weather Features

In addition to improving the precision of the current weather-based features in our models, it may be beneficial to include additional weather features such as wind conditions, turbulence levels, and NASA's Weather Impacted Traffic Index (WITI). Convective weather is largely characterized by strong winds and turbulence. The ASPM database contains measures of these weather factors, but these measures are only accurate within 10 km of the airport.
Additionally, these measures are not specific to aircraft location, instead summarizing the terminal area as a whole. Doppler data that measures these factors on a pixel-by-pixel basis would strengthen our models by differentiating between heavy precipitation and convective weather during penetration classification. Moreover, WITI captures hourly traffic flow information that our current models do not address, quantifying the level of congestion in weather-impacted areas. WITI is primarily used by ATC to determine when to implement ground delays for departures and rerouting schemes for arrivals. WITI is robust in that it can be applied to both forecasted and actual weather. Overall, WITI could augment the already strong core of weather-based features in our models by capturing air traffic flows with quantitative values.

6.2.4 Taking an Alternative Approach: Human Factors

It is no secret that pilots differ with regard to personality, training, experience, and background. For example, one pilot may be riskier than another, or a given pilot may have more experience in severe weather scenarios than his peers. Additionally, some pilots have "home base" airports with which they are more familiar. These factors, which deal with individual variability among pilots, were not explored in this thesis because such data do not exist. Future studies could take a completely different approach from this thesis and investigate how this pilot variability affects penetration behavior using flight simulators. Within these simulators, pilots of different backgrounds and experience levels would be subjected to severe weather encounters, and their behavior would be recorded and assessed. Common trends in behavior that correlate with certain personality traits or experience levels could then spur modified training procedures.