
Dynamic Prediction of Terminal-Area Severe
Convective Weather Penetration
by
Daniel Schonfeld
B.S., United States Air Force Academy (2013)
Submitted to the Sloan School of Management
in partial fulfillment of the requirements for the degree of
Master of Science in Operations Research
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
June 2015
© Massachusetts Institute of Technology 2015. All rights reserved.
Author . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Sloan School of Management
May 8, 2015
Certified by . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Hamsa Balakrishnan
Associate Professor of Aeronautics and Astronautics
Thesis Supervisor
Accepted by. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Patrick Jaillet
Dugald C. Jackson Professor
Department of Electrical Engineering and Computer Science
Co-director, Operations Research Center
Dynamic Prediction of Terminal-Area Severe Convective
Weather Penetration
by
Daniel Schonfeld
Submitted to the Sloan School of Management
on May 8, 2015, in partial fulfillment of the
requirements for the degree of
Master of Science in Operations Research
Abstract
Despite groundbreaking technology and revised operating procedures designed to improve
the safety of air travel, numerous aviation accidents still occur every year. According to
a recent report by the FAA’s Aviation Weather Research Program, over 23% of these accidents are weather-related, typically taking place during the takeoff and landing phases.
When pilots fly through severe convective weather, regardless of whether an accident occurs, they cause damage to the aircraft, increasing maintenance costs for airlines.
These concerns, coupled with the growing demand for air transportation, put an enormous
amount of pressure on the existing air traffic control system.
Moreover, the degree to which weather impacts airspace capacity, defined as the number of aircraft that can simultaneously fly within the terminal area, is not well understood.
Understanding how weather impacts terminal area air traffic flows will be important for
quantifying the effect that uncertainty in weather forecasting has on flows, and developing
an optimal strategy to mitigate this effect.
In this thesis, we formulate semi-dynamic models and employ Multinomial Logistic
Regression, Classification and Regression Trees (CART), and Random Forests to accurately predict the severity of convective weather penetration by flights in several U.S.
airport terminal areas. Our models perform consistently well when re-trained on each
individual airport rather than using common models across airports. Random Forests
achieve the lowest prediction error with accuracies as high as 99%, false negative rates as
low as 1%, and false positive rates as low as 3%. CART is the least sensitive to differences
across airports, exhibiting very steady performance. We also identify weather-based features, particularly those describing the presence of fast-moving, severe convective weather
within the projected trajectory of the flight, as the best predictors of future penetration.
Thesis Supervisor: Hamsa Balakrishnan
Title: Associate Professor of Aeronautics and Astronautics
Acknowledgments
I would like to thank my advisor, Professor Hamsa Balakrishnan, for her support and
guidance throughout this project. Thanks to ICAT alum Yi-Hsin Lin for getting me up
to speed with the data used in this thesis and answering any and all questions about
her research. Thanks also to my fellow ORC students and friends, particularly Jack,
Zeb, Kevin, and Virgile for their help at various stages of the project and for listening
to me ramble about weather penetration and technical support. Finally, I would like to
thank my family for encouraging me to pursue a graduate degree, and for cheering me on
throughout the process.
Contents

1 Introduction  18
   1.1 Background  19
       1.1.1 Convective Weather Avoidance Model (CWAM) and Weather Avoidance Fields (WAFs)  19
       1.1.2 Defining “Severe Convective Weather”  21
       1.1.3 Defining the “Terminal Area”  22
       1.1.4 Terminal Area Operations  23
   1.2 Thesis Contribution  25
   1.3 Thesis Organization  26

2 Overview of Data  27
   2.1 Weather Data  27
       2.1.1 Vertically Integrated Liquid (VIL)  28
       2.1.2 Echo Tops  30
       2.1.3 Case Days  31
   2.2 ETMS Database  32
       2.2.1 Verifying ETMS Trajectory Data for Model Dataset  33
   2.3 ASPM Database  37

3 Feature Identification  37
   3.1 Three Separate Models  37
   3.2 Dynamic Nature of Models  39
   3.3 Weather-Based Features  42
       3.3.1 Measuring Severity of Weather  44
       3.3.2 Measuring Movement of Weather  45
       3.3.3 Spatial Positioning of Weather  47
   3.4 In-Flight Features  49
       3.4.1 Time Spent Within Terminal Area  50
       3.4.2 Flight Behavior Within Terminal Area  51
       3.4.3 Positioning Within Terminal Area  55
   3.5 Behavior of Other Pilots in the Terminal Area  57
       3.5.1 Are Flights Ahead Penetrating Severe Convective Weather?  58
       3.5.2 Behavior of Flights in the Opposite Sequence  59
       3.5.3 Follow the Leader  60
       3.5.4 Feature Summary  62

4 Predictive Modeling of Pilot Behavior  64
   4.1 Defining the Dependent Variable  64
   4.2 Defining our Model Dataset  68
   4.3 Predictive Methods  69
       4.3.1 Multinomial Logistic Regression  70
       4.3.2 Classification and Regression Trees (CART)  70
       4.3.3 Random Forests  72
   4.4 Model Results  73
       4.4.1 Model 1 Results  74
       4.4.2 Model 2 Results  76
       4.4.3 Model 3 Results  81
       4.4.4 Summary of ORD Results  85
   4.5 Testing Our ORD Models on Other Airports  86
       4.5.1 Selecting Airport Pairings for Common Model Experiment  87
       4.5.2 Comparison of Results  89
       4.5.3 Insight from Pairings Experiment  91
       4.5.4 Summary of Results  91
   4.6 Sensitivity of Models  92

5 Case Studies and Pilot Experience  94
   5.1 Takeaways from Pilot Interviews  95
       5.1.1 Weather Radar and Forecasting Technology in the Cockpit  95
       5.1.2 Deviation from the Filed Flight Path  95
       5.1.3 Impact of Convective Weather on Departures  96
       5.1.4 Impact of Convective Weather on Arrivals  97
       5.1.5 Summary of Interview Takeaways  98
   5.2 Case Studies  98
       5.2.1 Theme 1: Pilots Try to Avoid Storm Cells  99
       5.2.2 Theme 2: Arrivals Have a Tougher “Go-of-It”  101
       5.2.3 Theme 3: Weather Is Unpredictable  102
       5.2.4 Case Study Wrap-Up  104

6 Conclusions and Future Work  105
   6.1 Thesis Summary and Conclusions  105
   6.2 Ideas for Future Work  106
       6.2.1 Expand Model Datasets  106
       6.2.2 Incorporate Weather Forecasts  107
       6.2.3 Additional Weather Features  107
       6.2.4 Taking an Alternative Approach: Human Factors  108
List of Figures

1.1 Example of WAF lookup table [15].  20
1.2 Map of Chicago O’Hare arrival fixes. O’Hare’s TRACON is outlined in blue [14].  24
2.1 Example of a VIL image from June 13, 2008, at 0000Z [14].  29
2.2 Example of an ET image from June 13, 2008, at 0000Z [14].  31
3.1 Terminal area intervals for Model 1.  39
3.2 Distribution of Penetrations by Distance from ORD.  40
3.3 Projected flight trajectory looking 10 km out with swath width of 55 degrees. Angles and distances are not exactly to scale.  43
3.4 Calculation of weather-based feature that measures severity.  44
3.5 Distribution of penetration entries by severe weather coverage within the trajectory projection.  45
3.6 Calculation of weather-based features that measure movement.  46
3.7 Example of flanking metric calculations with the number of red cells representing “flankcount” and the standard deviation of the degree values representing “FlankingValue”.  48
3.8 Plot of Altitude vs. Distance from Takeoff for departure trajectories on July 2nd that penetrated severe convective weather.  51
3.9 Plot of Altitude vs. Distance from Takeoff for departure trajectories on July 2nd that took place during a weather impact but did not penetrate.  52
3.10 Plot of Altitude vs. Distance from Takeoff for departure trajectories on June 10th with no weather impact.  52
3.11 Plot of arrival flight executing a trombone maneuver at ORD. The red dots represent the trajectory points and the nose of the plane is represented by the red circle.  54
3.12 Example of Model 3 distance metric calculations with point-to-point distance measured in kilometers representing “DistfromLanding” and angular distance measured in degrees representing “CircleDistfromLanding”.  57
3.13 Example of “follow-the-leader” behavior by departures in the West sector of the ORD terminal area.  61
4.1 Distribution of VIL and Echo Top values for arrival and departure penetrations within the ORD terminal area.  67
4.2 Model 1 ORD CART Output.  75
4.3 Model 2 ORD CART Output.  78
4.4 Model 3 ORD CART Output.  82
4.5 Map of Top 30 Penetration Airports.  87
5.1 Example of avoidance behavior by arrivals in the Southwest sector of the ORD terminal area on July 9, 2008 at 001730Z.  99
5.2 Example of unexplained penetration behavior by a departure in the Northwest sector of the ORD terminal area on July 8, 2008 at 061730Z.  100
5.3 Both departures and arrivals affected by weather in the West sector of the ORD terminal area on July 2, 2008 at 222500Z.  101
5.4 Ground stop is issued due to weather covering the airport on July 2, 2008 at 223230Z.  102
5.5 Arrivals executing approach and landing maneuvers amidst severe weather in the Northwest sector of the ORD terminal area on August 22, 2008 at 173000Z.  103
5.6 Concentration of VIL level 6 pixels forms in the middle of the arrival approach path in the Northwest sector of the ORD terminal area on August 22, 2008 at 174000Z.  104
5.7 Arrivals begin to circumvent the storm cell as it moves west to east in order to maintain the approach path in the Northwest sector of the ORD terminal area on August 22, 2008 at 175730Z.  105
List of Tables

2.1 VIL Level Cutoffs  29
2.2 List of severe convective weather penetration periods within the ORD terminal area during summer 2008.  32
3.1 Summary of Model Features  64
4.1 VIL Point Assignment  65
4.2 Echo Top Point Assignment  65
4.3 Dependent Variable Cutoffs  66
4.4 Breakdown of frequencies of each severity level for Models 1, 2, and 3.  68
4.5 Model 1 Performance Results  74
4.6 Model 1 CART Feature Importance Values  75
4.7 Model 1 Random Forests Feature Importance Values  77
4.8 Model 2 Performance Results  78
4.9 Model 2 CART Feature Importance Values  79
4.10 Model 2 Random Forests Feature Importance Values  80
4.11 Model 3 Performance Results  81
4.12 Model 3 CART Feature Importance Values  83
4.13 Model 3 Random Forests Feature Importance Values  84
4.14 Airport Pairings  88
4.15 Comparison of Model 1 performance results for the re-training vs. airport pairing methods. For predictive methods, “MR” represents Multinomial Logistic Regression, “Tree” represents CART, and “RF” represents Random Forests. Regarding performance metrics, “Acc” represents the prediction accuracy, “FN 1” represents the first false negative rate defined, “FN 2” represents the second false negative rate, and “FP” represents the false positive rate.  89
4.16 Comparison of Model 2 performance results for the re-training vs. airport pairing methods.  90
4.17 Comparison of Model 3 performance results for the re-training vs. airport pairing methods.  90
4.18 Comparison of Model 1 performance results for variable subsets. For predictive methods, “MR” represents Multinomial Logistic Regression, “Tree” represents CART, and “RF” represents Random Forests. Regarding performance metrics, “Acc” is the proportion of interval entries for which the model predicts the correct severity level. “FN 1” examines how often the models predict a severity level lower than the actual severity level that occurred. “FN 2” examines how often the models predict a severity level of 0 when in fact the severity level was greater than 0. “FP” measures the proportion of interval entries for which the model predicts a severity level greater than 0 when in fact the severity level was 0.  93
4.19 Comparison of Model 2 performance results for variable subsets.  93
4.20 Comparison of Model 3 performance results for variable subsets.  94
1 Introduction
The increase in demand for air travel in the United States has resulted in an increase in
congestion and delays in the National Airspace System (NAS), making the system more
susceptible to weather disruptions. Convective weather can close airports, degrade capacity
for acceptance/departure, hinder or stop ground operations, and make operations inefficient
in general [13]. These disruptions are particularly impactful during summer months, when
travel demand is high and there is frequent convective weather activity (i.e. thunderstorms)
across the United States [16]. Furthermore, the desire to sustain and meet air travel demand
sometimes forces pilots into situations in which they are unable to avoid weather penetration,
and controllers into situations in which they are unable to prevent it. When a pilot penetrates
convective weather, intentionally or unintentionally, he/she not only puts those onboard
in danger but also causes damage to the aircraft, resulting in lost revenue and excessive
maintenance costs.
Moreover, although it is clear that convective weather reduces airspace capacity and
results in inefficient flying, the degree to which capacity is reduced and air traffic flows are
affected as a result of weather is not clear. The re-routing of planes within the terminal area,
initiated by either the pilot or air traffic controller, reduces airspace capacity and increases
controller workload. Existing research into the types and severity of weather that cause
re-routing/deviation typically treats all flights as equal [14], failing to differentiate between
specific aircraft types, regional vs. international flights, departures vs. arrivals, and other
flight categories. This thesis takes a different approach by exploring operational factors
that may differentiate pilot behavior as well as weather-based factors that indicate future
penetration within the terminal area.
1.1 Background
In this thesis, we rely heavily on research previously conducted by Yi-Hsin Lin at the MIT
International Center for Air Transportation (ICAT). Lin's work built on the Convective Weather Avoidance Models (CWAM) developed at MIT Lincoln Laboratories. The CWAMs produce Weather Avoidance Fields (WAFs), which identify the areas impenetrable to aircraft as a result of weather. The following subsections briefly describe the succession of CWAMs and WAFs, and then explain why we chose to define severe convective weather differently from Lin.
1.1.1 Convective Weather Avoidance Model (CWAM) and Weather Avoidance Fields (WAFs)
Rich DeLaura and his team at MIT Lincoln Laboratories developed the CWAM in response
to increasing delays in the NAS caused by thunderstorms. The CWAM provides decision
support tools for air traffic controllers to aid them in determining the impact of weather on
existing traffic, devising a tactical response to mitigate the impacts of weather, predicting the
effects of a particular routing strategy, and predicting updated arrival times for flights subjected to regions of convective weather [7]. The CWAM achieves this by analyzing planned
and actual trajectories as well as a variety of weather indicators to predict enroute flight
deviations due to convective weather.
There are several versions of the CWAM. The first model (CWAM1), which focused on
enroute flights, was developed in 2006 based on 800 trajectories over five different days in the
Indianapolis (ZID) and Cleveland (ZOB) “super sectors” [7]. The study took into account
the following three weather indicators that will be discussed in more detail in Chapter 2: VIL
(measure of precipitation intensity), echo tops (storm height), and lightning strike counts.
The second version (CWAM2), which also focused on enroute flights, was developed in 2008
and expanded the number of flights in the dataset to about 2,000 by adding the Washington
D.C. “super sector” [8]. It also considered additional weather factors such as vertical storm
structure and vertical and horizontal storm growth to help decrease CWAM1’s prediction
error rate. In 2010, the release of CWAM3 refined earlier models to improve detection of
non-weather related deviations, such as shortcuts, and further expanded the dataset to about
5,000 flights [5]. Most recently, MIT Lincoln Laboratories developed a version of the CWAM
specific to low-level flights within the terminal area, which typically operate below the tops
of convective weather and have slightly different operational constraints [5]. The terminal
area CWAM is calibrated based on historical pilot behavior during weather encounters near
the destination airport.
All versions of the CWAM return the probability of deviation for a pilot encountering
a particular set of weather conditions. This output, commonly referred to as the Weather
Avoidance Field (WAF), is a probability lookup table: for any given echo top height and
local VIL coverage, the model returns a probability of pilot deviation on a pixel-by-pixel basis
[14, 15]. An example of this lookup table can be seen in Figure 1.1. The main advantage of
using WAF over the raw VIL/ET metrics is that WAF eliminates much of the light rain that
has little to no effect on aviation and accounts for the frequency of lightning strikes within
each pixel [14].
Figure 1.1: Example of WAF lookup table [15].
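The lookup described above can be sketched as a small two-dimensional table indexed by echo top height and local VIL coverage. The bin edges and probabilities below are illustrative placeholders, not the published CWAM values:

```python
from bisect import bisect_left

# Hypothetical WAF-style lookup: rows are echo-top-height bins (kft, upper
# edges) and columns are local VIL-coverage bins (fraction of the kernel at
# Level 3+, upper edges). All numbers are illustrative placeholders.
ECHO_TOP_BINS = [15.0, 25.0, 35.0, 45.0]
COVERAGE_BINS = [0.1, 0.3, 0.6, 1.0]
DEVIATION_PROB = [
    [0.05, 0.15, 0.35, 0.55],
    [0.10, 0.30, 0.55, 0.75],
    [0.20, 0.45, 0.70, 0.90],
    [0.30, 0.60, 0.85, 0.95],
]

def waf_probability(echo_top_kft: float, vil_coverage: float) -> float:
    """Return a pixel's deviation probability by table lookup."""
    i = min(bisect_left(ECHO_TOP_BINS, echo_top_kft), len(ECHO_TOP_BINS) - 1)
    j = min(bisect_left(COVERAGE_BINS, vil_coverage), len(COVERAGE_BINS) - 1)
    return DEVIATION_PROB[i][j]
```

Because the table is evaluated per pixel, sweeping it over a VIL/echo-top mosaic yields a deviation-probability field of the same shape as the radar image.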
The principal difference between the enroute and terminal area CWAMs is the primary
determinant of pilot deviation. For the enroute airspace model, the difference in altitude
between the flight and the echo top height served as the primary determinant [7]. In contrast,
the fractional VIL coverage of Level 3 or above within a specified kernel of the flight trajectory
served as the primary determinant of deviation in the terminal area model [5]. This makes
sense because pilots can overfly weather during the enroute phase of flight, but due to the low altitudes necessary for descent/ascent, pilots typically cannot overfly weather in the terminal area. Therefore, the WAF for enroute flights is fundamentally different from the WAF for ascending/descending flights.
1.1.2 Defining “Severe Convective Weather”
A question which naturally arises is how to quantitatively define “severe convective weather”.
Lin defined it as WAF levels of 80 or above. WAFs of 80 or above can be interpreted as when
a pilot has a greater than 80% chance of actually penetrating Level 3 VIL or above with
flight altitude below the corresponding echo top value [14]. In contrast, we chose not to use
WAF to classify severe convective weather for a variety of reasons. First, WAF reflects the
probability that a pilot will deviate rather than explicitly describing weather conditions like
VIL and echo top do. Second, since some low-level VIL pixels will correspond to high WAFs
simply because of proximity (within 4 km kernel) to higher VIL levels, it is possible that
pilots flying through high WAFs are not actually penetrating severe convective weather at
all. At the other end of the spectrum, WAF is low in situations where pilots have no chance
of deviating because they are surrounded by severe convective weather. Third, terminal
WAF does not account for each individual flight's altitude relative to echo top height [14],
so it is unclear whether or not the flight is above or below the storm. Lastly, WAF is not
consistent: a VIL/ET combination that constitutes a WAF of 80 on one case day does not
always translate to a WAF of 80 on another case day. For example, on August 4, 2008, the
WAF model fails when none of the departures that we classify as severe convective weather
penetrations have WAF of 80 or above during their entire ascent. Thus, we will define severe
convective weather as pixels with VIL Level 3 or above and flight altitude below echo top
height.
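This definition reduces to a simple per-pixel test; the sketch below assumes altitude and echo top height are expressed in the same units, and the function name is ours:

```python
def is_severe_penetration(vil_level: int, flight_alt_ft: float,
                          echo_top_ft: float) -> bool:
    """A trajectory point counts as severe convective weather penetration
    when the pixel holds VIL Level 3 or above AND the aircraft is flying
    below the storm's echo top (i.e., inside the storm, not above it)."""
    return vil_level >= 3 and flight_alt_ft < echo_top_ft
```

Note that both conditions are required: a flight above the echo top is overflying the storm, and a pixel below VIL Level 3 carries precipitation too light to qualify as severe.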
In our analysis, penetration occurs when pilots taking off from or landing at Chicago
O’Hare International Airport (ORD) fly through severe convective weather. This is an
important distinction because there are several airports within or just outside the ORD
terminal area, which we will thoroughly define in the following section. We do not examine
penetrations that take place while departing from or landing at one of these nearby airports.
Furthermore, we focus on ORD flights in our analysis because O’Hare is considered one of the
worst airports in the U.S. for severe convective weather, with an average of 38 thunderstorm
days per year [10]. Since the overwhelming majority of these thunderstorm days occur during
the summer months, we will focus on flights that occur in June, July, and August. In the
flight database used for this research, O’Hare accounts for the highest number of penetration
flights as well as overall penetration entries (i.e. one flight can penetrate multiple times) of
any airport in the continental U.S.
1.1.3 Defining the “Terminal Area”
This thesis focuses on pilot behavior within a region near the airport we call the terminal
area. The dimensions of this region are not precisely defined, varying from airport to airport.
Most major airports have Terminal Radar Approach Control (TRACON) facilities, which
serve the airspace immediately surrounding the airport. Using the TRACON boundary is
one possible definition. However, TRACONs can vary in size and shape just like terminal
areas, and a simpler, more general definition is desirable. To devise the best definition,
we must consider what characteristics define the terminal area and why pilot behavior in
this region might be different from pilot behavior during the enroute portion of the flight.
The primary difference is that aircraft trajectories are far more constrained both vertically
and horizontally within the terminal area. Enroute flights can frequently overfly or deviate
around convective weather, whereas a flight in its ascent or descent sequence will most likely
be flying below the storm with limited ability to deviate due to the high level of congestion
and standardized trajectories in the terminal area.
In this thesis, we define the ORD terminal area for arrivals to be the circle of radius
180 km around the airport, and for departures a circle of radius 150 km around the airport.
The specific radii of the terminal areas for arrivals and departures were determined based on
when arrivals begin their descent sequence and when departures end their ascent sequences.
A 180 km radius was chosen for arrivals because this is the distance at which aircraft begin
to continuously decrease their altitude. A 150 km radius was chosen for departures because
this is the distance at which aircraft begin to stop continuously increasing their altitude and
start leveling off for the enroute portion of the flight. Within these radii, flights are below all
non-negligible storm echo tops and cannot overfly the convective weather. Using a circular
region simplifies analysis by allowing the region to be broken up into eight equally spaced
sectors, each corresponding to cardinal or intermediate directions. At ORD, departures typically take off in the four cardinal direction sectors (North, South, East, West), and arrivals
typically land in the four intermediate direction sectors (Northeast, Northwest, Southeast,
Southwest).
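The circular terminal-area definition and its eight 45-degree sectors can be sketched as follows, using a local flat-earth approximation around the airport; the function and variable names are illustrative, not taken from the thesis:

```python
import math

ARRIVAL_RADIUS_KM = 180.0    # arrivals begin continuous descent here
DEPARTURE_RADIUS_KM = 150.0  # departures begin leveling off here
# Eight 45-degree sectors; the sector centered on bearing 0 is "North".
SECTORS = ["North", "Northeast", "East", "Southeast",
           "South", "Southwest", "West", "Northwest"]

def terminal_sector(dx_km: float, dy_km: float, is_arrival: bool):
    """Given a flight's position relative to the airport (dx east, dy north,
    in km on a flat-earth approximation), return its sector name, or None
    if the flight is outside the terminal area."""
    radius = ARRIVAL_RADIUS_KM if is_arrival else DEPARTURE_RADIUS_KM
    if math.hypot(dx_km, dy_km) > radius:
        return None
    bearing = math.degrees(math.atan2(dx_km, dy_km)) % 360.0  # 0 = North
    return SECTORS[int(((bearing + 22.5) % 360.0) // 45.0)]
```

Offsetting the bearing by half a sector width (22.5 degrees) before the division centers each named sector on its cardinal or intermediate direction.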
1.1.4 Terminal Area Operations
The airspace contained within the terminal area can be subdivided into sectors controlled
by individual air traffic controllers. Control of aircraft as they fly between these sectors is
handed off between the air traffic controllers. The controllers are responsible for maintaining
the separation of aircraft, through voice radio communication and aircraft position tracking,
and providing real-time information to aircraft such as weather conditions near the airport
[14]. Hence pilots must obtain approval from these controllers to deviate from their filed
flight plan.
The airspace capacity of a sector can vary depending on the complexity of flow patterns
within the sector or other conditions such as the presence of convective weather. Each
flight must follow a Standard Instrument Departure (SID) when departing an airport and
a Standard Terminal Arrival Route (STAR) when arriving [16]. These routes are specified
by a sequence of waypoints, or fixes, along with rules governing the speed, heading, and
altitude of aircraft at certain waypoints [16]. Each airport has multiple STARs and SIDs; the
assignment of an aircraft to specific routes is a function of its origin (or destination), aircraft
type, runway restrictions, and load balancing of runways [16]. One of the most common
terminal area layouts, as seen at Chicago O’Hare, is the four cornerpost configuration, in
which airspace is divided into four arrival sectors alternating with four departure sectors.
Figure 1.2 contains a diagram of a four cornerpost configuration.
Figure 1.2: Map of Chicago O’Hare arrival fixes. O’Hare’s TRACON is outlined in blue [14].
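A SID or STAR of this kind maps naturally onto a sequence of fixes carrying optional crossing restrictions; the fix names and restriction values below are invented for illustration and do not correspond to any actual ORD procedure:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Fix:
    """One waypoint on a SID or STAR, with optional crossing restrictions."""
    name: str
    max_speed_kts: Optional[float] = None  # cross at or below this speed
    cross_at_ft: Optional[float] = None    # cross at or below this altitude

# Illustrative arrival route (fix names and numbers are invented):
example_star: List[Fix] = [
    Fix("ENTRY"),
    Fix("MIDPT", max_speed_kts=280.0),
    Fix("CORNR", max_speed_kts=250.0, cross_at_ft=11000.0),
    Fix("FINAL", max_speed_kts=210.0, cross_at_ft=4000.0),
]

def violates_restrictions(fix: Fix, speed_kts: float, alt_ft: float) -> bool:
    """Check a crossing against the fix's published restrictions."""
    if fix.max_speed_kts is not None and speed_kts > fix.max_speed_kts:
        return True
    if fix.cross_at_ft is not None and alt_ft > fix.cross_at_ft:
        return True
    return False
```

Representing routes this way makes it straightforward to check recorded trajectory points against the restrictions at each assigned fix.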
1.2 Thesis Contribution
As mentioned in the Introduction, this thesis relies heavily on the research completed by
Yi-Hsin Lin during her time at MIT ICAT. In order to determine the best predictors of
severe convective weather penetration, Lin employed models that predicted the maximum
WAF penetrated by pilots of arriving aircraft during the descent phase [14]. Her models
built upon the WAF model by incorporating operational factors, such as prior delays and
existing congestion in the terminal airspace, in addition to weather-based factors. Her best
model accurately predicted penetration 90% of the time [14]. She found that weather-based
and stream-based features were the most predictive of severe convective weather penetration.
In particular, pilots were more likely to penetrate severe convective weather when they were
part of a stream following other pilots that crossed through weather and less likely when they
were pathfinders leading a new stream [14]. This implies that re-routing around weather is
still often based on reported events to air traffic controllers rather than preemptive action
based on forecasts [14]. Furthermore, Lin found that pilots were more likely to penetrate
severe convective weather closer to the airport because, intuitively, there is less ability to
deviate from the flight path upon approach.
Our model is fundamentally different from Lin's, first and foremost in its definition of severe
convective weather, as we examine the raw weather metrics that make up WAF rather than
WAF itself. Applying this revised definition, our model dynamically predicts the severity
of pilot penetration at several checkpoints throughout the ascent and descent phases for departures and arrivals, respectively. Thus, the features (or predictors) in our model
are more specific to the trajectory of the flight and its location relative to the airport. Our
best models accurately predict the severity of penetration over 98% of the time with a false
negative rate of less than 1% and a false positive rate of less than 3%. Furthermore, the
models reveal that the presence of severe convective weather within a specified distance from
the projected trajectory of the flight is the best predictor of future penetration. Nonetheless,
the behavior of other flights nearby is still moderately correlated with penetration behavior in
the terminal area. If flights close by, whether they be departures or arrivals, are penetrating,
it is highly likely that the flight of interest will also penetrate severe convective weather
in future time steps. Additionally, we found that the longer flights spend within the
terminal area, the more likely they are to penetrate severe weather. This may seem
counterintuitive at first because flights that try to deviate around weather often experience
longer flight times. On the other hand, spending more time in the air subjects the flight
to more opportunities for severe weather penetration. Lastly, after running our models on
several U.S. airports in addition to ORD, we found that our models consistently perform
well when re-trained on each individual airport rather than using common models across
airports. This held true even among airports in the same region of the continental U.S.
1.3 Thesis Organization
Due to the limited number of time periods plagued by severe convective weather in our
research data set, we will present a combination of predictive modeling, case studies, and
pilot observation to better understand pilot behavior within the terminal area during severe
convective weather scenarios. Chapter 2 discusses the data sources for this study. These
include weather data from MIT Lincoln Laboratories, trajectory data from the Volpe National Transportation Center, and flight information maintained by the Federal Aviation
Administration (FAA).
Chapter 3 describes the features included in our predictive models for both arrivals and
departures. These features can be classified into three categories: weather-based, in-flight,
and features that capture the behavior of other pilots flying in the terminal area simultaneously.
Chapter 4 describes the three types of predictive models used in this study and the results
obtained from employing them. Multinomial Regression was used for its ability to handle
categorical dependent variables with more than two classes as well as independent factor variables.
Classification and Regression Trees (CART) were chosen due to their transparency/interpretability, applicability to relatively small sample sizes, and ability to weigh the relative
importance of features. Random Forests were explored as an extension of CART that introduces
a high degree of randomness, facilitating the discovery of patterns not detected by CART.
Chapter 5 presents the case studies explored and commonly observed themes of pilot
behavior, highlighting scenarios in which pilots penetrated severe convective weather within
the terminal area. Along with observations from pilots regarding terminal area procedures
during severe convective weather, these case studies will help to verify, or sometimes disprove,
model results in order to determine which features truly best predict penetration.
Finally, Chapter 6 discusses the implications of this thesis and plans for future work.
2 Overview of Data
Three main data sources were used in this thesis: weather data from MIT Lincoln Laboratories, trajectory data from the Enhanced Traffic Management System (ETMS) database
provided by the Volpe National Transportation Center, and airport information from the
FAA's Aviation System Performance Metrics (ASPM) database.
2.1 Weather Data
Prior to 2006, there existed a range of competing weather forecasts for aviation that led
to considerable inconsistency and confusion in critical air traffic management situations [14]. In response to this inefficient and complicated system, the FAA's Aviation
Weather Research Program (AWRP) established the Consolidated Storm Prediction for Aviation (CoSPA) program to integrate the different forecast systems into one reliable, accurate
system [3]. CoSPA features collaboration from a variety of different organizations including
MIT, NCAR, NOAA, NWS, NASA, and DoD [3]. These organizations, collectively, aim to
improve and integrate existing prototype products such as the Corridor Integrated Weather
System (CIWS) and the Integrated Terminal Weather System (ITWS) [3].
MIT Lincoln Laboratory is at the forefront of this effort with its tactical (0-2 hr) storm
forecasting and has successfully harnessed high-resolution, real-time weather data that accurately depicts the severity of storm cells [3]. Through the integration of multiple sensor
data sources, including radar (NEXRAD, TDWR, Canadian), satellite imagery, and surface
observations, Lincoln Labs has produced Vertically Integrated Liquid (VIL) and echo top
maps of the entire continental United States with 1 × 1 km pixel resolution updated every
2.5 minutes [14]. These maps, or matrices of pixel-by-pixel values, will serve as the main
weather inputs considered in this thesis and will be described in detail in the following sections. MIT Lincoln Laboratories provided us with weather maps for 14 days in summer 2008.
They also provided us with software scripts necessary for converting between latitude/longitude and matrix coordinates in each weather image using the Lambert azimuthal equal area
projection.
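The conversion scripts themselves are not reproduced in this thesis, but the forward step of the conversion can be sketched using the standard spherical Lambert azimuthal equal-area formulas. The Earth radius constant, the projection center, and the subsequent mapping from (x, y) offsets to matrix indices are all assumptions here; the actual parameters used in the Lincoln Laboratory scripts may differ.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; the scripts' actual constant may differ

def laea_forward(lat_deg, lon_deg, lat0_deg, lon0_deg, radius=EARTH_RADIUS_KM):
    """Spherical Lambert azimuthal equal-area forward projection.

    Returns (x, y) in km relative to the projection center (lat0, lon0).
    Matrix (pixel) coordinates would then be obtained by offsetting and
    scaling these values by the 1 km grid spacing; those grid parameters
    are not given in the text and are left out here.
    """
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    lat0, lon0 = math.radians(lat0_deg), math.radians(lon0_deg)
    dlon = lon - lon0
    # Scale factor of the equal-area projection at this point.
    k = math.sqrt(2.0 / (1.0 + math.sin(lat0) * math.sin(lat)
                         + math.cos(lat0) * math.cos(lat) * math.cos(dlon)))
    x = radius * k * math.cos(lat) * math.sin(dlon)
    y = radius * k * (math.cos(lat0) * math.sin(lat)
                      - math.sin(lat0) * math.cos(lat) * math.cos(dlon))
    return x, y
```

The inverse mapping (matrix coordinates back to latitude/longitude) follows analogously from the standard inverse formulas.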
2.1.1 Vertically Integrated Liquid (VIL)
VIL is a measure of the amount of moisture in a vertical column of the atmosphere and is
typically used to indicate areas experiencing heavy rain or hail as well as identify potential
supercells and downbursts [7]. VIL helps to avoid false alarms by extending to high altitudes
and looking at storm cells as a whole instead of just their effect at low altitudes [14]. The raw
pixel-by-pixel data provided by MIT Lincoln Laboratories measures VIL on a 0-255 scale.
The VIL maps divide these raw values into 6 unequally distributed VIL levels to help with
visual interpretation of the severity of convective weather cells. The levels correspond to
pilots' perceived threat levels, with Level 3 representing a “yellow” threat level, Levels 4 and
5 representing “orange” threat levels, and Level 6 representing the most severe “red” threat
level [14]. The precise VIL level cutoffs can be seen in Table 2.1:
Table 2.1: VIL Level Cutoffs
Figure 2.1: Example of a VIL image from June 13, 2008, at 0000Z [14].
Figure 2.1 provides a useful demonstration of the common types of convective weather
encountered in different regions of the United States during the summer months. Moving
from left to right, first we see scattered light rain across the Northwest, consisting mostly of
Level 1 VIL that has little to no effect on aviation. Next, we see a long line of severe storm
cells, associated with the strong organized convection of a cold front [19], developing across
the Midwest. This line of cells, commonly referred to as a “frontal storm” [19], makes it hard
for pilots flying into the wind to deviate and find pockets of airspace without weather. Lastly,
we see synoptic, scattered storm cells in the Southeast associated with summer convection
[19]. These isolated, smaller cells, which comprise an “air mass storm”, have short lifecycles,
making them very difficult to forecast [19]. Regardless of the arrangement of storm cells and
their classification, they can be greatly disruptive to aviation. It is useful to consider these
different types of storms in order to observe differences in the strategies of pilots in each
scenario.
2.1.2 Echo Tops
Although VIL gives us a good measure of the precipitation in a vertical column of the atmosphere, it tells us nothing about the height at which this precipitation begins or ends. Echo tops
provide an estimate of the maximum height (in thousands of feet) of clouds containing convective weather [7] so that we know whether a flight is above/below the convective weather.
However, echo tops do not indicate the minimum height of storm cells, so we assume in this
thesis that if a flight is below the echo top height then it is subjected to the weather in that
pixel. Pixels unaffected by weather maintain an echo top value of 0.
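Under this assumption, the per-pixel test reduces to a pair of comparisons. The sketch below combines the severe-VIL cutoff used later in this thesis (raw value ≥ 133, i.e. VIL level ≥ 3) with the echo top limit; the full classification logic used in the models may involve additional criteria.

```python
SEVERE_VIL_THRESHOLD = 133  # raw VIL value at the level-3 ("yellow") cutoff

def is_penetration(vil_value, echo_top_kft, altitude_ft):
    """Classify a single (pixel, flight) pair as a severe-weather penetration.

    vil_value:    raw 0-255 VIL value for the pixel the flight occupies
    echo_top_kft: echo top height for that pixel, in thousands of feet
                  (0 means the pixel is unaffected by weather)
    altitude_ft:  flight altitude in feet

    Per the assumption in the text: a flight below the echo top of a
    severe pixel is subjected to that pixel's weather.
    """
    return vil_value >= SEVERE_VIL_THRESHOLD and altitude_ft < echo_top_kft * 1000.0
```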
Figure 2.2 shows an example echo tops image from the same timeframe as the VIL image
in Figure 2.1. Comparing this image to Figure 2.1, it is apparent that echo top values are
generally correlated with VIL values. For instance, areas with high VIL values will also have
relatively high echo tops because stronger convective cells typically extend higher into the
atmosphere [7]. However, there exist rare severe storms that occur lower in the atmospheric
column (e.g., a storm height of 25,000 feet) that enroute pilots at altitudes above 35,000 feet
can easily overfly but that cause problems for aircraft in the
ascent/descent phase of flight.
Figure 2.2: Example of an ET image from June 13, 2008, at 0000Z [14].
2.1.3 Case Days
The date/time associated with each of the weather files given to us by Lincoln Laboratories
is in UTC format, so we will use this convention throughout this thesis. To convert UTC
to local Chicago time during the summer months, we subtract five hours (CDT = UTC−5). The
weather files consist of 14 days in June, July, and August 2008 in which weather, at some
point, impacted the ORD terminal area. However, in this thesis, we specifically focus on
time periods within these days in which severe convective weather penetrations took place in
the ORD terminal area. Thus, we examine periods in which severe convective weather was
present long enough to affect air traffic flows in a negative manner. These specific periods
are outlined in Table 2.2.
Although there may be multi-hour gaps between penetrations within these periods, the
table only includes one time period per day. Nonetheless, large time periods that last almost
a full day indicate that weather impacted the ORD airspace consistently for a significant
amount of time.

Date     Time Period
06/12    15:10-22:25
06/13    00:30-23:50
06/14    00:10-04:00
06/25    11:10-20:10
07/02    18:40-23:59
07/03    00:00-07:00
07/07    10:20-17:55
07/08    00:50-23:59
07/09    00:00-04:40
07/10    17:50-23:59
07/11    00:00-04:00
08/04    11:20-20:35
08/05    00:20-04:00
08/22    10:50-21:55

Table 2.2: List of severe convective weather penetration periods within the ORD terminal area during summer 2008.
2.2 ETMS Database
The Enhanced Traffic Management System (ETMS) was created by the FAA in order to
monitor and react to traffic congestion in the United States using real-time trajectory data
[1]. Air traffic controllers can leverage this data to direct aircraft flow and make decisions
regarding Ground Delay Programs (GDP) and Ground Stop Programs (GS) [1].
The ETMS comprises a network of “hubsites” that transmit/receive trajectory data
to/from several remote sites throughout the United States using the Aircraft Situation Display to Industry (ASDI) feed [1]. Trajectory data is automatically generated by transponders
on aircraft and sent as real-time messages to the ASDI feed. In 2008, many aircraft, especially general aviation aircraft, were not outfitted with transponders [14], so some flights
may be missing from the database. We have no way of identifying these missing flights.
The Volpe National Transportation Center provided us with all ETMS data from 2008;
this data consists of two main tables. The first table provides basic information about each
flight such as arrival and departure airport, scheduled departure time, scheduled arrival
time, actual arrival time, and aircraft type. ETMS also assigns a unique flight key to each
flight so that flight information and trajectory information are linked. Several flights in the
database have blank (NULL) fields, but these flights typically do not take place during the
case periods.
The second table contains positional data for each flight at numerous points throughout
the flight. This data includes transponder message time, latitude, and longitude. This
table also includes a “smoothed” altitude that is derived using a moving average as well
as an average speed that is derived from position and time data. Messages are sent from
flight transponders approximately once a minute during the enroute portion of the flight and
approximately once every 15-20 seconds during the ascent/descent portions of the flight.
2.2.1 Verifying ETMS Trajectory Data for Model Dataset
Like any large dataset, ETMS contains several data errors that required “cleansing” prior to
analysis. Most of these errors were present in the trajectory data. One common error is the
presence of gaps in trajectories because flight transponders may not transmit their position
for long periods of time. Luckily, the majority of these gaps take place during the enroute
portion of the flight, whereas we are concerned with trajectory gaps within the terminal
area. When a flight is in its ascent or descent phase, it typically transmits its position every
15-20 seconds, if not more often. However, there are a handful of flights that have gaps much
larger than 15-20 seconds. We chose to exclude flights from our model dataset that had a
trajectory gap of greater than two minutes. Since the ascent/descent phases of flight last
only 15-20 minutes, a gap larger than two minutes would represent a significant portion of
our observation period, during which a weather penetration may be missed. Additionally,
our weather files represent 2.5 minute time periods, so gaps longer than this would skip over
an entire weather file.
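The gap filter itself is straightforward. The sketch below assumes each flight's terminal-area messages are available as a sorted list of timestamps in seconds; the function and variable names are illustrative, not taken from the actual processing code.

```python
def max_message_gap_seconds(message_times):
    """Largest gap, in seconds, between consecutive transponder messages.

    message_times: sorted list of message timestamps in seconds.
    """
    return max((b - a for a, b in zip(message_times, message_times[1:])), default=0.0)

def keep_flight(message_times, max_gap_s=120.0):
    """Apply the thesis filter: drop flights whose largest terminal-area
    trajectory gap exceeds two minutes."""
    return max_message_gap_seconds(message_times) <= max_gap_s
```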
Another common error that occurs within approximately 30% of flight trajectories in the
database is the presence of unreasonable altitude entries. For instance, consider a departure
that is listed at an altitude of 2,500 feet at message two, 36,000 feet at message three, and
then 4,200 feet at message four. It is apparent that there is an error in the altitude entry
for message three. We corrected faulty altitude entries in different ways depending on their
corresponding message number and whether they occurred within a departure vs. arrival
trajectory.
All departures in the dataset record an initial altitude of zero. However, it is obvious
that the initial transponder message did not take place at zero altitude because the initial
position is some kilometers away from the runway or the altitude jump from the first to
second transponder message is unreasonably large. Thus, we extrapolate the actual initial
altitude using equations 2.1-2.3.
t1new = D1 ÷ (δ / t2)          (2.1)
t2new = t1new + t2             (2.2)
a1new = (a2 / t2new) · t1new   (2.3)

where
D1 = distance from takeoff to first message
δ = distance from first to second message
t1new = extrapolated time between takeoff and first message
t2 = original time spent in the air from takeoff to second message
t2new = new extrapolated time spent in the air from takeoff to second message
a1new = new extrapolated altitude at first message
a2 = original altitude value at second message
From these equations, we can also extrapolate an estimate for the actual takeoff time of
the flight, which will be useful in the predictive models. The takeoff runway for each flight
is assigned based on distance from each runway and the heading of the plane at the first
message entry.
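Equations 2.1–2.3 can be sketched directly in code. This reading takes δ as the distance between the first and second messages, so that δ/t2 approximates the flight's ground speed; equation 2.3 then assumes a constant climb rate from the runway. The function name and units are illustrative only.

```python
def extrapolate_initial_state(d1_km, delta_km, t2_s, a2_ft):
    """Extrapolate a departure's first-message time and altitude (eqs. 2.1-2.3).

    d1_km:    distance from the takeoff runway to the first message (D1)
    delta_km: distance between the first and second messages (our reading
              of the ambiguous definition of delta in the text)
    t2_s:     originally recorded time in the air at the second message
    a2_ft:    recorded altitude at the second message

    Returns (t1_new, t2_new, a1_new): extrapolated time of the first message
    after takeoff, corrected airborne time at the second message, and
    extrapolated first-message altitude assuming a constant climb rate.
    """
    speed = delta_km / t2_s                # average ground speed (denominator of eq. 2.1)
    t1_new = d1_km / speed                 # eq. 2.1
    t2_new = t1_new + t2_s                 # eq. 2.2
    a1_new = (a2_ft / t2_new) * t1_new     # eq. 2.3
    return t1_new, t2_new, a1_new
```

For example, a flight first heard 2 km from the runway, covering 4 km in its first recorded 60 s of flight and reporting 3,000 ft at the second message, would be assigned a first-message time of 30 s after takeoff and a first-message altitude of 1,000 ft.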
Turning to unreasonable altitude entries within arrival trajectories, if the faulty entry
took place at the first message entry within the terminal area, we would set the altitude
value to the next altitude entry plus 300 feet. This ensures the trajectory will maintain
a relatively smooth descent. On the other hand, if the faulty entry took place at the last
message entry of an arrival trajectory within the terminal area, we would set the altitude
value to the previous altitude value minus 300 feet. Similarly, for altitude misprints at the
last message entry of a departure trajectory within the terminal area, we set the altitude
to the previous altitude entry plus 300 feet to ensure smooth ascent. Finally, if the faulty
altitude entry took place anywhere else in the trajectory, we set the new altitude to the
average of the altitude entries surrounding it. For example, if the altitude misprint took
place at the 4th message, we set its value to the average of the altitudes at the 3rd and 5th
message entry.
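The correction rules above can be collected into one small function. How a faulty entry is detected in the first place (e.g., an implausible jump between consecutive messages) is a separate step not specified here; this sketch only applies the repairs, and the names are illustrative.

```python
def corrected_altitude(alts, i, phase):
    """Return a corrected value for a faulty altitude entry alts[i].

    alts:  list of altitude entries (feet) for one terminal-area trajectory
    i:     index of the entry judged faulty (detection logic not shown)
    phase: "arrival" or "departure"
    """
    if i == 0:
        if phase == "arrival":
            return alts[1] + 300.0        # keep the descent relatively smooth
        # Faulty initial departure altitudes are handled by the takeoff
        # extrapolation (eqs. 2.1-2.3), not patched here.
        raise ValueError("departure initial altitudes are extrapolated, not patched")
    if i == len(alts) - 1:
        if phase == "arrival":
            return alts[i - 1] - 300.0    # smooth descent at the last message
        return alts[i - 1] + 300.0        # smooth ascent at the last message
    # Anywhere else in the trajectory: average of the surrounding entries.
    return (alts[i - 1] + alts[i + 1]) / 2.0
```

For the worked example in the text, a departure listed at 2,500 ft, 36,000 ft, and 4,200 ft would have its middle entry reset to (2,500 + 4,200)/2 = 3,350 ft.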
The initial and final positions of flights are also an area of concern within the data. On
the departures side, as discussed above, the initial latitude/longitude of flights indicates a
non-trivial distance away from ORD. Thus, the initial transponder message does not reflect
the flight’s initial position at takeoff but rather some position later within the takeoff phase.
Consequently, we limited the model dataset to departures with an initial position less than
10 km from the published latitude/longitude coordinates of ORD. Since the largest distance
between takeoff runway and airport terminal at ORD is 3.25 km, any flight with initial
position larger than 10 km has most likely been in flight for a couple minutes, representing a
large gap in trajectory that we want to avoid. Moreover, it is very difficult to assign takeoff
runways to flights with initial positions greater than 10 km from ORD.
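The 10 km filter amounts to a great-circle distance check against the published airport coordinates. The sketch below uses the haversine formula; the ORD coordinates are approximate, and the exact published values used in the thesis may differ slightly.

```python
import math

ORD_LAT, ORD_LON = 41.9786, -87.9048  # published O'Hare coordinates (approximate)

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two lat/lon points, in km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def within_initial_position_limit(first_lat, first_lon, limit_km=10.0):
    """Thesis filter: keep departures whose first message is within 10 km of ORD."""
    return haversine_km(first_lat, first_lon, ORD_LAT, ORD_LON) <= limit_km
```

The same check with the final message position implements the corresponding 10 km filter for arrivals.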
Similar to departures, the final latitude/longitude of arrivals sometimes indicates a relatively large distance away from the published latitude/longitude coordinates of ORD. Consequently, we limited the model dataset to arrival trajectories with a final position less than
10 km from ORD. Beyond this distance, it was very difficult to assign landing runways,
and a large portion of the flight would be left out of examination. Furthermore, the arrival
trajectories often contain multiple message entries at the same altitude at the end of the
trajectory. For our analysis, particularly when assigning landing runways, we assumed that
the first of these message entries at the same altitude was the actual point of touchdown.
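Locating that assumed touchdown point is a simple scan backward through the trailing run of repeated altitudes; the function name here is illustrative.

```python
def touchdown_index(altitudes):
    """Index of the assumed touchdown point in an arrival trajectory.

    Arrival trajectories often end with multiple messages at the same
    altitude; per the text, the first message of that final
    constant-altitude run is taken as the actual touchdown.
    """
    i = len(altitudes) - 1
    while i > 0 and altitudes[i - 1] == altitudes[i]:
        i -= 1
    return i
```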
Overall, we removed approximately 5% of the flights, including both departures and
arrivals, from the original model datasets due to the data errors discussed above.
2.3 ASPM Database
The Aviation System Performance Metrics (ASPM) is an FAA-built database of the National
Airspace System providing airport and individual flight data for 77 airports and 22 carriers
in the US [2]. This thesis accessed ASPM’s online “Efficiency Reports” to extract important airport data. The reports provide such useful information (in local time in 15-minute
intervals) as runway configuration, wind speed, wind direction, visibility, and ceiling. Some
of these metrics were used in preliminary models but had very weak predictive power. The
metrics are accurate only within a 10 km radius of ORD rather than across the entire terminal area we defined. Nonetheless, the runway configuration data enabled verification of
runway assignment algorithms used in the models and aided in the case studies presented in
Chapter 5 by outlining airport operations in convective weather scenarios.
3 Feature Identification
When we discuss “features” in this thesis, we are referring to the independent variables in
our predictive models. Our models utilize three different types of features: weather-based,
in-flight behavior, and behavior of other flights in the terminal area. The following sections
will discuss these three feature types in detail while presenting the different model features
that fall within these groups. To make sense of these different model features, we must first
discuss the three separate models examined in our research and their dynamic nature.
3.1 Three Separate Models
We first separated models by arrivals vs. departures because their behavior is inherently
different in the terminal area, especially close to the airport. From a top-level perspective,
the difference between them is clear: departures ascend while arrivals descend. However, the
differences in their behavior can be characterized more granularly. When departures take
off, their horizontal and vertical movement is much less constrained than that of arrivals
descending, making it easier to deviate around weather cells. Upon takeoff, departures are
able to turn in a wide variety of directions right away, whereas arrivals on approach must
follow specific landing patterns based on their assigned runway and the current wind direction. Farther out from approach, arrivals are assigned to cornerposts and their corresponding
streams by air traffic controllers.
In addition, there is pressure to get arrivals on the ground in a severe convective weather
scenario. The situation at Chicago O’Hare adds to this pressure because diversion to its
alternate airport, Midway (MDW), does not significantly improve weather conditions, as the
two airports are less than 25 miles apart. Consequently, flights whose assigned runway
is covered by severe convective weather may have no other option but to penetrate, as ATC
rarely allows for runway reassignment. To pilots and airlines alike, this is a better option
than continuing to fly around the terminal area burning fuel and subjecting the plane to
further weather encounters, especially if storm cells are widespread throughout the terminal
area. In contrast, on the departures side, ATC can slow or even halt departure operations
altogether when severe weather is present because the aircraft are still on the ground.
We then separated the arrivals model into two separate models: one for the portion of the
trajectory far from ORD and one for the portion of the trajectory close to ORD. The “close”
model examines the trajectory starting 50 km out from O’Hare. We made this second model
split because arrivals behave much differently within this 50 km radius from the airport.
Within this boundary, arrivals are setting up their approach for landing. Based on a flight’s
assigned runway, the direction it is coming from, and current wind conditions, a flight may
actually have to fly past the airport in order to obtain the proper approach direction. Thus,
within the 50 km radius, an arriving flight’s distance from the airport may fluctuate, making
it harder to bin messages based on distance from the airport. We will discuss why this
presents a problem for our original model design in the next section.
In total, we now have three separate models: one for departure trajectories, one for arrival
trajectories outside 50 km from the airport, and one for arrival trajectories within 50 km
from the airport. We will refer to these three models as “Model 1”, “Model 2”, and “Model
3” respectively in this thesis. Furthermore, although these three models will use many of
the same features, there are some features that are unique to a particular model. We will
discuss the assignment of features to models in the following sections.
3.2 Dynamic Nature of Models
One main distinction between our models and Lin’s models is their dynamic nature, meaning that predictions are updated based on updated feature values at multiple checkpoints
throughout a flight’s trajectory within the terminal area. The checkpoint intervals for Model
1 and Model 2 are defined based on distance from the airport and become larger as you move
farther away from the airport. Figure 3.1 shows the Model 1 checkpoint intervals.
Figure 3.1: Terminal area intervals for Model 1.
The red dot in Figure 3.1 represents the airport. The concentric half-circles represent the
checkpoint boundaries. Moving outwards, penetration predictions are made at the checkpoint
for the portion of the flight that takes place between the checkpoint and the next half-circle.
For instance, suppose a departing flight is currently 80 km from the airport. The prediction
for whether or not the flight will penetrate severe convective weather between 80 km and the
next checkpoint boundary (110 km) will be made at the 80 km checkpoint. The feature values
used for this prediction are calculated based on information obtained up to the current point
in the flight. The final prediction in this scenario would take place at the 110 km checkpoint
boundary because it is the beginning point of the last interval in the terminal area.
The Model 1 and 2 checkpoint intervals were constructed based on the distribution of
penetration entries by distance from ORD, shown in Figure 3.2.
Figure 3.2: Distribution of Penetrations by Distance from ORD
Arrival and departure penetrations follow a similar distribution, with most penetrations
taking place within 30 km of ORD. Some penetrations take place between 40 and 100 km
from ORD, and then very few take place beyond this point.¹ Our prediction intervals reflect
this distribution, with more frequent, shorter intervals closer to the airport and less frequent,
larger intervals farther from ORD. The intervals become larger as the flight gets farther away
from the airport because behavior becomes more consistent at these farther distances and
because time between transponder messages is longer.² Specifically, for Model 1, predictions
are made for the following intervals: between takeoff and 10 km, between 10 km and 20 km,
between 20 km and 30 km, between 30 km and 50 km, between 50 km and 80 km, between
80 km and 110 km, and between 110 km and 150 km. Model 2 predictions work almost
identically except that the predictions are made in the opposite direction moving towards
the airport, starting at 180 km out and ending with a prediction between 80 and 50 km.
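Mapping a flight's current distance to its checkpoint interval is a simple lookup over the Model 1 boundaries listed above. The sketch below is illustrative; the names are not from the actual model code.

```python
import bisect

# Model 1 checkpoint boundaries (km from the airport), per the text.
MODEL1_BOUNDARIES = [0, 10, 20, 30, 50, 80, 110, 150]

def prediction_interval(distance_km, boundaries=MODEL1_BOUNDARIES):
    """Return the (lower, upper) checkpoint interval containing distance_km.

    A prediction made at the `lower` checkpoint covers the flight segment
    out to `upper`. Distances at or beyond the last boundary fall outside
    the terminal area and return None.
    """
    if distance_km < boundaries[0] or distance_km >= boundaries[-1]:
        return None
    i = bisect.bisect_right(boundaries, distance_km) - 1
    return boundaries[i], boundaries[i + 1]
```

Model 2 would use the analogous boundary list ending at 180 km, with predictions issued in the opposite direction as the flight moves toward the airport.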
Model 3’s prediction intervals are based on time spent within 50 km of the airport instead
of distance from the airport. Model 3 is designed this way because arrivals’ distance from
the airport often fluctuates within 50 km due to varying approach paths such as “tromboning”, holding patterns, and other maneuvers that make it impossible to create consistent
checkpoint boundaries. For Model 3, predictions are made every 2.5 minutes of flight. Thus,
when an arrival first enters the 50 km radius, the model predicts whether it will penetrate
severe convective weather within the next 2.5 minutes of flight. This process continues until
the flight lands. This time-based prediction approach is, in principle, much more difficult
than the distance-based prediction approach utilized in Model 1 and 2 for two main reasons:
1) a lot can happen in 2.5 minutes of flight and 2) flight behavior is already more uncertain
close to the airport. Flights that spend more than 25 minutes within the 50 km radius before
landing were excluded from the model dataset. Such cases were very rare and exhibited flight
behavior that often did not make sense, suggesting errors in the recorded trajectory.
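Model 3's time-based checkpoints and the 25-minute exclusion rule can be sketched as follows; the names and the convention that checkpoint k covers the window [k·150 s, (k+1)·150 s) are assumptions for illustration.

```python
CHECKPOINT_PERIOD_S = 150.0        # 2.5-minute prediction horizon
MAX_TIME_IN_RADIUS_S = 25 * 60.0   # flights beyond 25 minutes are excluded

def model3_checkpoint(seconds_in_radius):
    """Index of the Model 3 checkpoint for a flight that has spent
    `seconds_in_radius` seconds inside the 50 km radius.

    Checkpoint k covers the next 2.5 minutes of flight, i.e. the window
    [k * 150 s, (k + 1) * 150 s). Returns None for flights past the
    25-minute exclusion limit.
    """
    if seconds_in_radius > MAX_TIME_IN_RADIUS_S:
        return None
    return int(seconds_in_radius // CHECKPOINT_PERIOD_S)
```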
¹ The plot shows that there are no departure penetrations past 150 km because that is the boundary for our terminal area with respect to departure flights. This boundary is 180 km for arriving flights.
² Every 1-2 minutes, as opposed to every 15-20 seconds.

In reality, the structure of our models may be better described as semi-dynamic than
dynamic because we group information from multiple trajectory message entries into a finite
number of bins based on checkpoint boundaries. A purely dynamic model would, in contrast,
make a prediction at each message time along the trajectory. We chose to use semi-dynamic
models rather than purely dynamic models because semi-dynamic models are better able
to capture overall flight behavior and traffic flows during weather impacts. Additionally,
purely dynamic models would require predictions every 15-20 seconds during ascent/descent.
This would not only require orders of magnitude more computing power but could also result in
“information overload” for air traffic controllers who use our prediction tool.
3.3 Weather-Based Features
This group of features exploits the weather data discussed in Chapter 2 to characterize the
current weather scenario that a flight is experiencing. For instance, if convective weather
is on top of ORD and its surrounding area out to 10 km, one would assume that many
flights will penetrate upon takeoff or landing. However, our model’s weather-based features
do not aim to characterize the weather scenario within the ORD terminal area as a whole,
but rather focus on weather in relation to a flight’s projected trajectory. This fundamental
difference from Lin’s weather-based features makes sense considering the dynamic nature of
our models and our interest in specific segments of a flight’s trajectory during ascent/descent.
An example of the projection from which we calculate our weather-based features is shown
in Figure 3.3.
The matrix shown within the projection represents the VIL values corresponding to
individual pixels within the projection. These VIL pixel values are vertical columns that
extend infinitely into the atmosphere. All of the pixels that are within the triangle formed
by the blue lines, including those intersected by the triangle’s edges, are considered
part of the trajectory projection. The length and swath width of the projection may vary
based on the current prediction checkpoint boundary. Values of 10 km and 65 degrees are
typical of projections for smaller intervals closer to the airport. The length of projections
Figure 3.3: Projected flight trajectory looking 10 km out with swath width of 55
degrees. Angles and distances are not exactly to scale.
is usually equal to the interval distance, and the swath width is adjusted to account for
potential horizontal movement of the flight within an interval.
The weather-based features are calculated from the VIL values associated with the group
of pixels within the projection. Echo top values, however, are not considered in our weather-based features. If an echo top value exists for a pixel within the terminal area, which requires
a VIL level ≥ 3, most observed flights will be below this echo top because they are below
cruising altitudes. Thus, thorough examination and manipulation of these values would be
redundant. Nevertheless, these echo top values still serve as limiting criteria for penetration
classification.
Furthermore, the weather-based features can be divided into three distinct groups: one
group measuring the severity of weather within the projection, one group measuring the
movement of weather within the projection, and a third group measuring the spatial positioning of weather cells within the current prediction interval. We describe the features
within each group in the sections below. Every one of these features is included in Models
1, 2, and 3.
3.3.1 Measuring Severity of Weather
This subset of weather-based features aims to capture the strength of the storm cells, if
present, within the trajectory projection. Intuitively, if a flight’s projection contains a large
amount of strong storm cells, there is a good chance that the flight will penetrate in that
interval, unless it deviates around the weather or the weather moves. We use two different
features to measure the severity of weather within a flight’s projection, with both features
measuring the proportion of cells within the projection that have VIL level greater than
or equal to 3 (VIL value ≥ 133). The features differ in the set of VIL pixels they examine:
the first feature (“BadWeatherPercentageBefore”) evaluates the pixels within the projection
from the prior weather data time period, whereas the second feature (“BadWeatherPercentageNow”) evaluates the pixels from the current weather data time period. Our weather
data files are broken up into 2.5 minute periods, so if the current time is 10:30:00 UTC,
then “BadWeatherPercentageBefore” would look at pixel data from the 10:27:30 weather
file, and “BadWeatherPercentageNow” would look at pixel data from the 10:30:00 weather
file. An example calculation, independent of time, of these features for a 15-pixel rectangular
projection can be seen in Figure 3.4, with severe VIL values in red.
Figure 3.4: Calculation of weather-based feature that measures severity.
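To make the calculation concrete, here is a minimal Python sketch (not the thesis's R code) of the severe-coverage computation; the specific pixel values are illustrative, while the raw-VIL threshold of 133 for level 3 follows the text.

```python
# Sketch of "BadWeatherPercentageBefore/Now": the proportion of projection
# pixels at VIL level >= 3, i.e. raw VIL value >= 133 (threshold from text).
SEVERE_VIL_VALUE = 133

def bad_weather_percentage(projection_pixels):
    """Proportion of projection pixels at VIL level >= 3."""
    if not projection_pixels:
        return 0.0
    severe = sum(1 for v in projection_pixels if v >= SEVERE_VIL_VALUE)
    return severe / len(projection_pixels)

# A 15-pixel projection analogous to Figure 3.4: 3 severe pixels out of 15.
pixels = [0, 10, 140, 0, 0, 200, 0, 5, 0, 0, 150, 0, 0, 0, 20]
print(bad_weather_percentage(pixels))  # 0.2
```

The same function serves both features: "Before" is evaluated on the prior weather file's pixels and "Now" on the current file's pixels.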
The median and average values for these features given a penetration are respectively
around 15% and 30%, compared to 0% and 2% for non-penetrations. However,
Figure 3.5 reveals that the large majority of penetrations takes place when the values for
these features are below 10%. Thus, weather coverage does not have to be particularly
overwhelming to give pilots trouble.
Figure 3.5: Distribution of penetration entries by severe weather coverage within the trajectory projection.
3.3.2 Measuring Movement of Weather
This subset of weather-based features aims to capture the movement of weather within
the projection by examining how weather conditions change from one time period to the
next. The first feature (“PercentBadCellsDiffVIL”) calculates the difference, between time
periods, in the proportion of pixels within the projection that have VIL level ≥ 3. The
second feature (“SeverityDiffVIL”) calculates the difference in overall severity score of the
projection between time periods. To calculate this overall severity score, each pixel within
the projection is first assigned a VIL level based on its raw VIL value using Figure 2.1. The
overall severity score is simply the sum of the VIL levels for all of the pixels within the
projection. The third feature (“PercentWorseningVIL”) calculates the proportion of pixels
within the projection that increase in VIL value between time periods.
The next two features are calculated in nearly the same way, but their outputs describe
the weather situation within the projection in very different manners. The
features calculate the sum of the difference in VIL value between corresponding projection
pixels between time periods. The distinction between the features is that one (“CellDiffVIL”)
calculates the raw pixel-to-pixel difference, including sign, whereas the other (“CellDiffVILAbs”) calculates the absolute difference before summing up all of the pixel differences.
Figure 3.6 displays the calculation processes for these two features using the same initial set
of pixel values extracted from the middle of the projection region in Figure 3.3.
Figure 3.6: Calculation of weather-based features that measure movement.
To obtain the final pixel value differences on the far right, we subtract the Time 1 pixel
values from the Time 2 pixel values. In this example, the VIL pixel values stay the same
between Time 1 and Time 2 but switch positions within the square projection. Thus, the
sum of the pixel values is 12 for both time periods. However, the final feature value, or the
sum of the differences, is 0 for the first feature and 12 for the second feature. Hence the
first feature captures the fact that the values and their sum within the projection have not
changed, whereas the second feature recognizes the movement of the values within the projection. Consequently, we hypothesize that the absolute difference feature (“CellDiffVILAbs”)
captures weather movement better than the raw difference feature (“CellDiffVIL”). However, it is worth noting that the raw difference feature describes how the strength of weather
within the projection changes between time periods, augmenting the subset described in the
previous section.
The final value used for each weather movement feature within the predictive models is
the average of the feature values for two time period differences before the current time in
order to best capture the situation at hand while only exploiting information that is already
known at the time of prediction. For instance, if the current time is 10:30:00 UTC, we would
average the feature values corresponding to the 10:22:30 and 10:25:00 weather file pairing as
well as the feature values corresponding to the 10:25:00 and 10:27:30 weather file pairing.
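The signed-versus-absolute distinction, and the two-pairing average used for the final feature value, can be sketched as follows. This is a Python illustration rather than the thesis's implementation; the pixel values mirror the position swap in Figure 3.6.

```python
def cell_diff_vil(t1, t2):
    """"CellDiffVIL": signed sum of pixel differences (Time 2 minus Time 1)."""
    return sum(b - a for a, b in zip(t1, t2))

def cell_diff_vil_abs(t1, t2):
    """"CellDiffVILAbs": sum of absolute pixel-to-pixel differences."""
    return sum(abs(b - a) for a, b in zip(t1, t2))

# Values swap positions between periods (cf. Figure 3.6): both sums are 12,
# so the signed feature sees no net change while the absolute one sees movement.
t1 = [6, 3, 3, 0]
t2 = [0, 3, 3, 6]
print(cell_diff_vil(t1, t2))      # 0
print(cell_diff_vil_abs(t1, t2))  # 12

# Final feature value: average over the two most recent time-period pairings,
# e.g. the (10:22:30, 10:25:00) and (10:25:00, 10:27:30) pairs at 10:30:00 UTC.
def averaged_feature(feature, frames):
    """frames: the three most recent weather snapshots, oldest first."""
    return (feature(frames[-3], frames[-2]) + feature(frames[-2], frames[-1])) / 2
```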
3.3.3 Spatial Positioning of Weather
This last subset of weather-based features aims to capture the spatial positioning of weather
cells ahead of the flight being observed. These features do not constrain the trajectory
projection with a swath width. Instead they consider all pixels in front of the plane within
the specified interval distance. The first feature (“flankcount”) counts the number of pixels
in this widespread projection that have VIL level ≥ 3. The count is weighted based on
the distance of the pixel within the projection from the plane’s current position, with weight
decreasing incrementally as distance from the plane increases. For example, a severe weather
pixel that is located 10 km from the plane’s current position is worth more than a pixel that
is located 20 km from the plane’s current position. This weighting scheme makes sense
because storm cells closer to the plane’s current position pose a more immediate danger
and are less likely to migrate out of the projection before the plane encounters them.
The second feature (“FlankingValue”) is a bit more complicated. It sets the nose of
the plane as the common center of several concentric half circles whose radii depend on
the distances of the severe weather pixels from the center. We then calculate the degree value
(between 0 and 180) of each severe weather pixel based on its position on one of the concentric
half circles. The feature value is the standard deviation of the degree values of the storm
cells. By using the standard deviation, we capture the span of the storm cells across the
plane’s nose. If there are no storm cells within the interval ahead of the flight, then the
feature value is 0. A large standard deviation hypothetically reflects that there are storm
cells all across the plane’s direction of movement, making it harder to deviate around the
storm cells or find pockets without weather. Figure 3.7 displays how we identify the degree
values for this second flanking feature.
Figure 3.7: Example of flanking metric calculations with the number of red cells representing
“flankcount” and the standard deviation of the degree values representing “FlankingValue”.
The red pixels contain severe VIL values, so we include them in our feature calculations.
The radii of the concentric half circles are based on the severe weather pixel’s distance from the
current flight position. The radii values matter more for the “flankcount” weighting scheme
than for the “FlankingValue” feature. In this example, the “FlankingValue” feature value
would be the standard deviation of 160, 100, and 25: 67.64. This value represents a
fairly large spread of severe weather cells across the plane’s nose. However, it is apparent
that this feature does not capture whether there is a concentration of severe weather cells in
one particular region of the relaxed projection. Possible improvement of this feature will be
discussed in the “Future Work” section in Chapter 6.
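Assuming the sample standard deviation, which reproduces the 67.64 in the worked example, the "FlankingValue" computation can be sketched as follows; the helper name and the guard for fewer than two cells are illustrative.

```python
import statistics

def flanking_value(degree_values):
    """Std. dev. of severe-pixel bearings (0-180 degrees) across the nose."""
    if len(degree_values) < 2:
        return 0.0  # no storm cells ahead (or a single cell): no spread
    return statistics.stdev(degree_values)  # sample standard deviation

print(round(flanking_value([160, 100, 25]), 2))  # 67.64, as in the example
```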
Since “flankcount” values differ based on the size of the trajectory projection, we will not
present their summary statistics. Nonetheless, it is apparent that penetrations consistently
encounter trajectory projections with a much larger “flankcount” than non-penetrations.
With regard to “FlankingValue”, the median and average values for this feature given
a penetration are around 18 and 21 respectively, compared to 0 and 2 for non-penetrations. Thus, the typical trajectory projection of penetrations not only contains significantly more severe weather pixels, but these pixels are also more spread out across the nose
of the plane. Nonetheless, it is worth noting that the maximum “FlankingValue” for penetrations was approximately 40, so the severe weather pixels are still relatively concentrated
within the projection.
3.4 In-Flight Features
The next set of features we will discuss deals with the behavior of the flight of interest within
the terminal area and how this behavior may be correlated with severe convective weather
penetration. These in-flight features are broken down into three subsections below.
In addition to features that deal with the behavior of the flight of interest, our models
consider standard characteristics about the flight, for instance, whether it is an international
flight (“intl”) and/or a cargo (“car”) flight. We also note whether the flight’s ascent/descent
sequence takes place during darkness (“night”) between the hours of 21:00:00 and 06:00:00
local time.
Although delays hypothetically fall under in-flight features, we did not include them in
our arrivals models based on their weak predictive power in Lin’s models, which exclusively
examined arrivals. Exclusion of delay features makes sense on the departures side as well
because most departures during weather impacts are delayed. The fact that only a small
percentage of the total number of departures during these weather-impacted periods are
penetrating supports this exclusion.
3.4.1 Time Spent Within Terminal Area
The first subset of in-flight features keeps track of the time that a flight has spent in the
terminal area up to the current prediction checkpoint. Models 1 and 2 consider total time
spent in the terminal area (“TimeinTerm”), from the time departures take off or arrivals
enter the terminal area. Model 3 considers this metric as well as the time spent within
50 km of the airport (“TimeWithin50km”), reflecting the different approach maneuvers
performed by arrivals prior to landing. Rare flights that spent excessive amounts of time in
the terminal area, specifically more than 25 minutes within 50 km or more than 50 minutes overall, were excluded
from the model datasets due to their unexplained behavior.
One may assume that longer flight times within the terminal area would be less correlated
with weather penetration since they might reflect deviation to avoid weather. However, our
case studies indicate that typically pilots are unable to deviate around weather completely
because blockage is too extensive. This is especially the case when weather is close to the
airport and range of motion is limited. Hence longer flight times within the terminal area
do not always translate to pilot deviation.
Overall, we included these features in our models due to their strong performance in Lin’s
models and the notion that longer flying times translate to more exposure to weather.
3.4.2 Flight Behavior Within Terminal Area
The next subset of in-flight features aims to capture certain behavioral tendencies of flights in
their ascent/descent phase. The first feature within this subset (“LevelOffORDecreasing”)
applies to all three models, observing whether or not flights are leveling off within their
trajectory. Although this behavior is systematic within the terminal area based on standardized routes, we found that flights frequently penetrate while leveling off. This was even
the case for departures, which are not commonly restricted to step-like paths like arrivals.
Furthermore, flights on bad-weather days tend to level off more often than flights on days
without weather. Figures 3.8-3.10 exhibit this behavior for departure flights, with leveling
off maneuvers typically occurring at 5000 ft altitude beginning 15 km from the airport. Each
individual trajectory curve within the plots represents a different flight.
Figure 3.8: Plot of Altitude vs. Distance from Takeoff for departure trajectories on July 2nd
that penetrated severe convective weather.
The July 2nd plots (Figures 3.8 and 3.9) exhibit several cases of leveling off behavior,
whereas the occurrences of level offs in the June 10th plot are much fewer. Furthermore,
Figure 3.9: Plot of Altitude vs. Distance from Takeoff for departure trajectories on July 2nd
that took place during a weather impact but did not penetrate.
Figure 3.10: Plot of Altitude vs. Distance from Takeoff for departure trajectories on June
10th with no weather impact.
the proportion of penetration trajectories that show leveling off behavior is especially high,
supporting our inclusion of this feature in our models.
Nonetheless, it should be noted that not all case days exhibited the same high frequency
of leveling off behavior. In fact, the summary statistics for this feature over all case days
reveal that the proportion of penetration entries that occur while leveling off is the same as
for non-penetrations. The percentages of flights that level off for Models 1, 2, and 3 are
3%, 23%, and 38%, respectively. The fact that arrivals exhibit level-off behavior more often is
not surprising, as they typically follow a step-like descent in contrast to the more constant
ascent of departures.
To analytically determine if a flight is leveling off at each prediction checkpoint, we
calculate the difference between the flight’s current altitude entry and previous altitude
entry. If the absolute value of this difference is less than or equal to 100 feet, we consider
the flight to be leveling off.
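The level-off test reduces to a one-line check. This sketch assumes altitudes in feet, per the 100-foot threshold described above.

```python
def is_leveling_off(prev_altitude_ft, curr_altitude_ft):
    """Level-off test: altitude changed by at most 100 ft between entries."""
    return abs(curr_altitude_ft - prev_altitude_ft) <= 100

print(is_leveling_off(5000, 5060))  # True: within the 100 ft band
print(is_leveling_off(5000, 5850))  # False: climbing through the checkpoint
```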
The next feature in this subset (“Tromboning”) applies only to Model 3. “Tromboning”
refers to the shape of the approach maneuver that arrivals often perform prior to landing the
aircraft. During this maneuver, flights pass by the airport before circling around to line up
for approach. Figure 3.11 shows an arriving flight at ORD performing a “trombone”
maneuver while avoiding weather penetration.
To avoid viewing each arrival trajectory in our dataset, we devised an analytical way
to determine whether or not a given arrival is in the midst of a trombone maneuver at the
current prediction checkpoint. This analytical method classifies an arrival as tromboning
when it is continuously decreasing its distance from the airport and then suddenly begins to
continuously increase its distance from the airport. At this point, the flight has passed by
the airport in order to set up its approach. Based on this analytical approach, go-arounds
would also be recorded as tromboning behavior.
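A sketch of the tromboning classifier under the stated decrease-then-increase rule; the three-point window and the helper name are illustrative assumptions, not the thesis's exact implementation.

```python
def is_tromboning(distances_km, window=3):
    """distances_km: recent distances from the airport, oldest first.
    Classifies tromboning when distance was continuously decreasing and
    then begins continuously increasing (flight has passed the airport)."""
    if len(distances_km) < 2 * window:
        return False
    head = distances_km[:window]   # earliest points: should be decreasing
    tail = distances_km[-window:]  # latest points: should be increasing
    decreasing = all(a > b for a, b in zip(head, head[1:]))
    increasing = all(a < b for a, b in zip(tail, tail[1:]))
    return decreasing and increasing

# Flight passes the airport (closest approach ~8 km), then distance grows.
print(is_tromboning([20, 14, 8, 9, 12, 17]))  # True
print(is_tromboning([20, 17, 14, 12, 9, 8]))  # False: still inbound
```

Note that, as stated above, a go-around would also satisfy this rule.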
The reasoning behind the use of tromboning as a predictor variable in Model 3 is that this
Figure 3.11: Plot of arrival flight executing a trombone maneuver at ORD. The red dots
represent the trajectory points and the nose of the plane is represented by the red circle.
maneuver exposes the flight to more opportunities to penetrate severe weather. Although it is
possible that this feature may be correlated with the “TimeinTerm” and “TimeWithin50km”
variables discussed in the above section, “Tromboning” captures pilot behavior at a more
granular level by taking positional data into account. Furthermore, arriving flights perform
tromboning maneuvers regardless of the presence of convective weather due to wind restrictions on approach and landing. Hence the feature is not specific to severe weather operating
procedures. Lastly, if a high proportion of tromboning flights are penetrating severe weather,
this may suggest that the runway configuration should be altered to better protect arrivals
by altering approach routes. The summary statistics for this feature reveal that the proportion of penetration entries that occur while tromboning is the same as for non-penetrations.
Twelve percent of flights in the Model 3 dataset perform the trombone maneuver.
The next feature in this subset (“OtherTermArea”) observes whether or not flights are
flying within other airport terminal areas during ascent/descent. There are several airports
within the ORD terminal area, some being regional and some being major airports, like
Chicago Midway (MDW). O’Hare flights that pass through other terminal areas may encounter congestion and limited ability to deviate around weather. Since terminal areas often
overlap, especially in the Chicago metropolitan area, we consider a flight to be in another
terminal area if it is within 30 km of another airport. This occurs relatively often in our
model dataset, so we included it in our Model 1 and 2 predictive models. Since all flights
within the defined ORD terminal area are well below cruising altitudes, we do not include
an altitude criterion in addition to distance for this feature. The Model 2 dataset contains
the most interesting summary statistics for this feature, with 20% of penetrations occurring
while flying within other airport terminal areas compared to 12% of non-penetrations.
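A sketch of the "OtherTermArea" test; the 30 km radius comes from the text, while the haversine helper and the MDW coordinates used in the example are illustrative assumptions.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in km."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def in_other_terminal_area(flight_lat, flight_lon, other_airports, radius_km=30.0):
    """True if the flight is within 30 km of any other airport."""
    return any(haversine_km(flight_lat, flight_lon, lat, lon) <= radius_km
               for lat, lon in other_airports)

# Approximate MDW coordinates; a position a few km away trips the feature.
mdw = (41.786, -87.752)
print(in_other_terminal_area(41.80, -87.80, [mdw]))  # True
```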
The final feature in this subset (“PenetrateAlready”) captures past penetration behavior,
if any, of the current flight by recording the severity score (which will be defined in Chapter
4) from the previous prediction interval. This feature is useful because it reveals the weather
situation within the terminal area with respect to the flight’s current trajectory. One would
assume that a pilot who has already penetrated severe weather would try harder to avoid
a second penetration, but this is not always possible. Approximately 50% of penetration
entries occur after the flight has already penetrated at least once, meaning approximately
50% of flights that penetrate do so multiple times during the ascent/descent sequence. Thus,
the “PenetrateAlready” feature is a strong indicator of future penetration.
3.4.3 Positioning Within Terminal Area
The final subset of in-flight features applies only to the arrivals models: Models 2 and 3.
The first feature in this subset (“FlightDistance”) measures the total distance the flight has
traveled from takeoff. The reasoning behind this feature is that flights that have traveled
a long distance most likely took off without regard to the weather conditions at the arrival
airport because so much can change during the long flight. The pilots for these flights also
are hypothetically more subject to fatigue due to longer flying times, which could affect their
decision-making when faced with severe weather in the terminal area. In contrast, flights with
a relatively small “FlightDistance” most likely received approval to take off given current
arrival airport weather conditions. If conditions at the arrival airport are severe, oftentimes
air traffic controllers do not give these corresponding departing flights permission to take
off. Thus, through this feature, we aimed to capture differences in descent routes as well as
intricacies within pilot behavior based on total distance traveled and currency of knowledge
regarding terminal area weather conditions.
One may argue that “FlightDistance” will be highly correlated with the “TimeinTerm”
variables because “FlightDistance” logically increases as “TimeinTerm” increases. However,
the disparity in distances traveled by different flights prior to reaching the ORD terminal
area boundary prevents this correlation. It is worth noting that the “FlightDistance” feature
is calculated as a straight line from takeoff to the flight’s current position rather than a
point-to-point cumulative distance.
The next two features within this subset (“DistfromLanding” and “CircleDistfromLanding”) only apply to Model 3 and aim to capture an arriving flight’s spatial positioning
within the terminal area with respect to its assigned landing runway. The “DistfromLanding” feature measures the raw distance “as the crow flies” from the flight’s current point
to its assigned point of touch down. The “CircleDistfromLanding” feature, on the other
hand, measures the minimum angular distance, ranging from 0 to 360 degrees, between the
flight’s current position and the point of touch down. This second feature provides a better
representation of a flight’s path to touch down because it simulates approach in addition to
landing.
Figure 3.12: Example of Model 3 distance metric calculations with point-to-point distance
measured in kilometers representing “DistfromLanding” and angular distance measured in
degrees representing “CircleDistfromLanding”.
The idea is that flights farther from touch down have more airspace to cover, and thus,
are subjected to more opportunities for weather penetration while having limited deviation
capability close to the airport. For example, a flight that is currently located on the opposite
side of the airport from its landing runway must circumvent the airport to ensure a proper
approach. If weather is surrounding the airport or is concentrated in large, unavoidable
pockets, such a flight is almost sure to penetrate severe weather during its circumvention.
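A sketch of one way to compute the angular-distance metric. Note that a minimum angular distance between two bearings caps at 180 degrees, so this helper illustrates the idea rather than reproducing the thesis's exact 0-to-360 definition.

```python
def circle_dist_deg(bearing_a_deg, bearing_b_deg):
    """Minimum angular distance between two bearings, in [0, 180] degrees."""
    diff = abs(bearing_a_deg - bearing_b_deg) % 360
    return min(diff, 360 - diff)

# Bearings from the airport to the flight and to the touchdown point:
print(circle_dist_deg(350, 10))  # 20: nearly lined up with the runway
print(circle_dist_deg(90, 270))  # 180: opposite side of the airport
```

A flight on the opposite side of the airport from its runway scores the maximum angular distance, matching the circumvention scenario described above.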
3.5 Behavior of Other Pilots in the Terminal Area
The last set of features included in our models describes the behavior of other flights within
the terminal area during the ascent/descent of the flight of interest. We found that this
behavior is often a good indicator of the conditions that the flight of interest will experience
in the upcoming prediction interval. These behavioral features, which are broken down into
three subsections below, describe flights ahead in the arrival stream as well as departures
ahead ascending towards en route altitudes.
3.5.1 Are Flights Ahead Penetrating Severe Convective Weather?
The first subset of features describing the behavior of other flights in the terminal area
examines flights ahead of the flight of interest and records whether or not they are penetrating
severe convective weather. For example, applied to Model 1, these features examine flights
further along in the ascent sequence than the flight of interest. Applied to Models 2 and 3,
these features examine flights that are closer to ORD than the current flight. To be specific,
we only consider flights ahead that have a trajectory point within the current prediction
interval within the past three minutes. We set the time period length at three minutes
rather arbitrarily, aiming to encapsulate the flight operations and weather conditions that
the current prediction interval has been subjected to recently. A shorter time period may
not succeed in “painting this picture” due to a low number of trajectory points.
This subset of features outlines the following characteristics of flights ahead in the ascent/descent sequence: whether or not these flights penetrated severe weather in the current
prediction interval (“PenetratingAhead”), the total number of flights that penetrated (“PenetratingAheadNumber”), the total number of times these flights penetrated in the current
prediction interval (“PenetratingAheadNumberEntries”), the average severity level of these
penetrations (“PenetratingAheadScore”), the average distance “as the crow flies” between
the current flight’s position and where the penetrations took place (“PenetratingAheadDist”), and the average angular distance, ranging from 0 to 180 degrees, between the current
flight and where the penetrations took place (“PenetratingAheadCircleDist”).
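The aggregation over flights ahead might look like the following sketch; the record layout and dictionary keys are illustrative assumptions, with only the three-minute window taken from the text.

```python
def penetrating_ahead_features(events, now_s, window_s=180):
    """events: (time_s, severity_level, dist_km) penetration records for
    flights ahead in the current prediction interval."""
    recent = [e for e in events if 0 <= now_s - e[0] <= window_s]
    if not recent:
        return {"PenetratingAhead": 0, "NumberEntries": 0, "Score": 0.0, "Dist": 0.0}
    return {
        "PenetratingAhead": 1,
        "NumberEntries": len(recent),
        "Score": sum(e[1] for e in recent) / len(recent),  # avg severity
        "Dist": sum(e[2] for e in recent) / len(recent),   # avg distance (km)
    }

# Two penetrations within the last 3 minutes; the first record is too old.
events = [(20, 1, 5.0), (100, 2, 12.0), (150, 3, 8.0)]
print(penetrating_ahead_features(events, now_s=250))
```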
The reasoning behind this subset of features is fairly straightforward: flights ahead have
already experienced weather conditions, severe or not, that the current flight will soon face.
After examining the summary statistics, we found that over 25% of the time that there is a
penetration ahead of the current flight, that flight will penetrate in the next prediction
interval as well. Furthermore, approximately 60% of penetrations occur after another flight
had penetrated ahead in the interval.
3.5.2 Behavior of Flights in the Opposite Sequence
The next subset of features is very similar to the subset described above, except that we are
now examining flights in the opposite flight sequence. For instance, if the flight of interest
is a departure, this subset of features examines arrival behavior. The situation is reversed
for Models 2 and 3. Identical to the feature subset above, we record the following characteristics for flights with a trajectory point in the current prediction interval within the past
three minutes, except in the opposite flight sequence: whether or not these flights penetrated
severe weather in the current prediction interval (“Arrivals/DeparturesPenetrating”), the total number of flights that penetrated (“Arrivals/DeparturesPenetratingNumber”), the total
number of times these flights penetrated in the current prediction interval (“Arrivals/DeparturesPenetratingNumberEntries”), the average severity level of these penetrations (“Arrivals/DeparturesPenetratingScore”), the average distance “as the crow flies” between the
current flight’s position and where the penetrations took place (“Arrivals/DeparturesPenetratingDist”), and the average angular distance, ranging from 0 to 180 degrees, between the
current flight and where the penetrations took place (“Arrivals/DeparturesPenetratingCircleDist”).
After examining the summary statistics for these features, we found that over 20% of
the time a flight in the opposite sequence penetrates in the current prediction interval,
the flight of interest will penetrate in the current prediction interval as well. Furthermore,
approximately 40% of penetrations occur after a flight in the opposite sequence
already penetrated in the current prediction interval.
We also look to see whether flights of the opposite flight sequence are forming congestion
around the flight of interest with the “DepsCrowding” and “ArrsCrowding” features. These
features count the number of flights of the opposite flight sequence with a trajectory point
within 25 km of the current flight’s position in the past three minutes. Although flights of
the opposite phase often operate in different terminal area sectors, are separated vertically
by air traffic controllers, and can be moderated by Ground Delay/Ground Stop programs, a
situation where this division of operations is not sustained may affect pilot behavior drastically by limiting range of movement. We expect that these features have more of an effect
on entries close to the airport where traffic is more condensed.
3.5.3 Follow the Leader
The final subset of features describing the behavior of other flights in the terminal area is
similar to the other two in that we examine flights ahead of the flight of interest. However,
this feature specifically examines whether or not the current flight is following the trajectory
of another flight ahead in the same flight sequence (ascent/descent). This feature applies
to all three models, as departure and arrival streams within the terminal area are both
prevalent.
In interviews with pilots (that will be discussed in more detail in Chapter 5), we learned
that pilots are likely to follow the preceding pilot in a stream during a weather impact.
The pilots, along with Lin’s results, asserted that the behavior of the preceding pilot, thus,
heavily influences whether a pilot chooses to fly through severe weather. As one pilot said,
“If it worked for the other guy, it will work for me.” This mindset may be problematic during
periods when the weather is worsening or moving rapidly, potentially obstructing the path
taken by the previous pilot. Moreover, although this “follow-the-leader” behavior is more
common closer to the airport, where flight streams and trajectories are most regulated, we
occasionally find pilots clearly following each other beyond 100 km from the airport.
The relative importance of stream-based features in Lin’s models revealed that pilots are
more likely to penetrate severe weather when they are “followers” in a stream than when they
are “pathfinders” leading a stream. Our features do not distinguish between “pathfinders”
and “followers”, instead focusing on whether the flight of interest is a “follower”. In our
models, a flight is labeled a “follower” if there is a preceding flight with a trajectory point
within a mile of the current position within the past 3 minutes. A mile is a very small
distance in comparison to the terminal area we have defined, so trajectory points of this
proximity are likely no coincidence, even when considering standardized routes. Furthermore,
when we examined air traffic flows for each runway at ORD, we found that there was no
statistical difference between flows on weather days vs. flows on non-weather days. Thus,
we can consider “follower” behavior a conscious decision by the pilot rather than attribute
it to weather-specific airport procedures. The trajectory plot of the ORD terminal area in
Figure 3.13 reveals this “follow-the-leader” behavior.
Figure 3.13: Example of “follow-the-leader” behavior by departures in the West sector of
the ORD terminal area.
In Figure 3.13, the connected red points represent individual arrival trajectories and the
black lines represent individual departure trajectories. The circles represent the nose of the
plane. The reader should notice three departures following each other in the West sector of
the airport, avoiding storm cells of threat level “orange” and “red” north of ORD. In this
scenario, the departures penetrated severe weather immediately upon takeoff but quickly
altered their trajectories to deviate around the concentration of storm cells. “Following-the-leader” worked effectively in this scenario, as it did in the majority of “follower” scenarios
in our dataset.
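The "Follower" test described above (a preceding flight's trajectory point within a mile of the current position in the past three minutes) can be sketched as follows; the planar track format and helper names are illustrative assumptions.

```python
def is_follower(curr_pos_km, curr_time_s, leader_tracks,
                radius_km=1.609, window_s=180):
    """True if any leader has a trajectory point within ~1 mile (1.609 km)
    of the current position recorded in the last 3 minutes (180 s)."""
    cx, cy = curr_pos_km

    def close(x, y, t):
        in_window = 0 <= curr_time_s - t <= window_s
        return in_window and ((x - cx) ** 2 + (y - cy) ** 2) ** 0.5 <= radius_km

    return any(close(*p) for track in leader_tracks for p in track)

# A leader passed within ~0.5 km of the current position 140 s ago.
leader = [(10.0, 5.0, 100), (12.0, 6.0, 160)]
print(is_follower((12.5, 6.2), 300, [leader]))  # True
```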
This subset of features records the following metrics for the current prediction interval: whether or not the current flight is a follower (“Follower”), the number of flights it is
following (“NumLeaders”), whether or not the leaders penetrated severe weather (“FollowerPenetrate”), the average severity level of these penetrations (“FollowerPenetrateScore”),
and the average VIL value encountered by the leaders (“LeaderVILFinal”).
3.5.4 Feature Summary
Table 3.1 provides a comprehensive list of the features described in Chapter 3. We will
reference these features frequently in the following chapter presenting our results.
Feature: Description
BadWeatherPercentageBefore/Now: Proportion of pixels within projection with VIL level ≥ 3
PercentBadCellsDiffVIL: Difference between time periods in proportion of pixels within projection with VIL level ≥ 3
SeverityDiffVIL: Difference between time periods in overall severity score of projection
PercentWorseningVIL: Proportion of projection pixels that increase in VIL value between time periods
CellDiffVIL: Sum of raw difference in projection pixel-to-pixel VIL values between time periods
CellDiffVILAbs: Sum of absolute difference in projection pixel-to-pixel VIL values between time periods
flankcount: Number of pixels in 180-degree projection with VIL level ≥ 3
FlankingValue: Standard deviation of storm cell degree location in 180-degree projection
TimeinTerm: Total time spent in terminal area up to the current prediction checkpoint
TimeWithin50km: Total time spent within 50 km of airport (Model 3 only)
LevelOffORDecreasing: Whether or not the current flight is leveling off in the trajectory
Tromboning: Whether or not the current flight is performing a trombone maneuver
OtherTermArea: Whether or not the current flight is intersecting another airport terminal area
PenetrateAlready: Flight’s severity score (0-3) from previous prediction interval
FlightDistance: Total distance the flight has traveled from takeoff
DistfromLanding: Distance “as the crow flies” from assigned landing runway
CircleDistfromLanding: Minimum angular distance from assigned landing runway
PenetratingAhead: Whether or not flights ahead penetrated severe weather in current prediction interval
PenetratingAheadNumber: Total number of flights ahead that penetrated in current prediction interval
PenetratingAheadNumberEntries: Total number of times flights ahead penetrated in current interval
PenetratingAheadScore: Average severity level of penetrations by flights ahead
PenetratingAheadDist: Average distance “as the crow flies” between current flight’s position and penetrations ahead
PenetratingAheadCircleDist: Average angular distance between current flight’s position and penetrations ahead
Arrs/DepsPenetrating: Whether or not flights in opposite sequence penetrated severe weather in current prediction interval
Arrs/DepsPenetratingNumber: Total number of flights in opposite sequence that penetrated in current prediction interval
Arrs/DepsPenetratingNumberEntries: Total number of times flights in opposite sequence penetrated in current interval
Arrs/DepsPenetratingScore: Average severity level of penetrations by flights in opposite sequence
Arrs/DepsPenetratingDist: Average distance “as the crow flies” between current flight’s position and penetrations in opposite sequence
Arrs/DepsPenetratingCircleDist: Average angular distance between current flight’s position and penetrations in opposite sequence
Arrs/DepsCrowding: Total number of flights in opposite sequence within 25 km of flight’s current position
Follower: Whether or not the current flight is following the trajectory of another flight ahead
NumLeaders: Number of flights ahead within the common trajectory
LeaderPenetrate: Whether or not the flights ahead in the common trajectory penetrated
Table 3.1: Summary of Model Features
4
Predictive Modeling of Pilot Behavior
This chapter presents the methods we used to predict severe convective weather penetration as well as the results those methods achieved. The following sections discuss in detail the predictive methods we implemented in the statistical software "R", how we constructed the model datasets, the performance of our models when applied to ORD, and their performance when applied to other airport terminal areas besides ORD.
4.1
Defining the Dependent Variable
Before describing the different predictive methods we implemented in our research, it is
critical to understand what we are trying to predict. As we alluded to in earlier chapters,
the dependent variable in our models is not just the binary question of whether or not a
flight will penetrate severe convective weather in the current prediction interval. Instead
we look to achieve a more granular level of classification by predicting not just whether a
penetration will occur but the severity of that penetration as well.
For each transponder message entry, we have VIL and echo top data for a flight at its
current position. We then calculate the number of “Penetration Points” that the flight
receives at each message entry using Equation 4.1 and its supporting tables.
Penetration Points = VIL Points × EchoTop Points,    (4.1)
where
VIL Value | VIL Points Assigned
<133 | 0
≤146 | 1
≤159 | 2
≤170 | 3
≤180 | 4
≤193 | 5
≤206 | 6
≤218 | 7
≤231 | 8
≤244 | 9
>244 | 10
Table 4.1: VIL Point Assignment

Echo Top Value | EchoTop Points Assigned
<25,000 ft. | 1
≤35,000 ft. | 2
>35,000 ft. | 3
Table 4.2: Echo Top Point Assignment
The equation is multiplicative in nature to give adequate weight to the echo top metric,
provide a large range of Penetration Point values, and to better capture the severity of
a penetration. The VIL Point values were assigned rather arbitrarily with the difference
between each VIL Value bin being about 12 units. No VIL Points are assigned if a flight
encounters a VIL Value less than 133 because this encounter is not considered a severe
convective weather penetration. One may notice that due to the multiplicative nature of the
Penetration Points equation, if the VIL Points assigned are 0, then the Penetration Points
for that message entry are consequently 0. This fail-safe makes sense because echo tops do
not exist unless severe VIL values are present.
EchoTop Points allocation comprises fewer bins than VIL Points allocation because the
incremental changes in echo top value are not as sensitive as with VIL. Within our model
datasets, if an echo top exists, then it is typically greater than 25,000 feet. Since we are
working with flight trajectories within the terminal area, we can almost always be sure that
if an echo top exists at the flight’s current position, then the flight is under the echo top.
Furthermore, as discussed in Chapter 2, the severity of echo tops is correlated with the
severity of VIL. Thus, if a flight’s current position maintains a relatively large VIL value,
then its corresponding echo top will also be relatively large.
A flight’s “Overall Penetration Score” is the average number of Penetration Points registered within the current prediction interval. Our models’ dependent variable, referred to
as the “severity level” of a given prediction interval, is defined by the “Overall Penetration
Score” cutoffs in Table 4.3.
Overall Penetration Score | Severity Level
0 | 0
≤4 | 1
≤8 | 2
>8 | 3
Table 4.3: Dependent Variable Cutoffs
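The point assignments and cutoffs above can be sketched in code. This is a minimal Python illustration (the thesis implemented its models in R), with function names of our own choosing:

```python
def vil_points(vil):
    """Table 4.1: map a VIL value to 0-10 points (bins of roughly 12-13 units)."""
    if vil < 133:
        return 0  # below 133 the encounter is not a severe weather penetration
    bins = [146, 159, 170, 180, 193, 206, 218, 231, 244]
    for points, upper in enumerate(bins, start=1):
        if vil <= upper:
            return points
    return 10  # VIL > 244

def echotop_points(echo_top_ft):
    """Table 4.2: map an echo top height (feet) to 1-3 points."""
    if echo_top_ft < 25000:
        return 1
    return 2 if echo_top_ft <= 35000 else 3

def penetration_points(vil, echo_top_ft):
    """Equation 4.1: multiplicative, so 0 VIL points forces 0 Penetration Points."""
    return vil_points(vil) * echotop_points(echo_top_ft)

def severity_level(overall_penetration_score):
    """Table 4.3: cutoffs on the Overall Penetration Score (interval average)."""
    if overall_penetration_score == 0:
        return 0
    if overall_penetration_score <= 4:
        return 1
    return 2 if overall_penetration_score <= 8 else 3
```

For example, a message entry with VIL 150 under a 30,000 ft. echo top earns 2 × 2 = 4 Penetration Points, while the maximum possible per entry is 10 × 3 = 30.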
One may notice that the cutoffs are relatively low, especially considering that the maximum number of Penetration Points a flight can earn in a given interval is 30. The dependent
variable cutoffs were set to ensure a sufficient mix of severity levels in our model dataset while
still reflecting the true proportion of flights that experience each level of severity. In fact,
most flights that penetrate severe convective weather do so at low VIL values and low echo
top values, relative to the values that constitute severe weather. Figure 4.1 supports this
claim, showing that the large majority of penetrations, on both the arrivals and departures
side, are of level 3 VIL with echo top height between 25,000 and 35,000 ft.[3]
Figure 4.1: Distribution of VIL and Echo Top values for arrival and departure penetrations
within the ORD terminal area.
[3] It is evident that there are more arrival penetrations than departure penetrations; this is specific to ORD and may not always be the case at other airports. It is also worth noting that level 6 departure penetrations do indeed exist, but there are so few that they do not show up on the histogram.
The dependent variable breakdowns for each ORD model dataset, prior to any dataset
balancing, can be seen in Table 4.4.
Severity Level | Model 1 Frequency | Model 2 Frequency | Model 3 Frequency
0 | 27,876 | 14,937 | 26,649
1 | 942 | 533 | 879
2 | 216 | 95 | 153
3 | 76 | 29 | 96
Table 4.4: Breakdown of frequencies of each severity level for Models 1, 2, and 3.
From the above table, it is clear that severe weather penetrations are a very rare occurrence, even during periods in which the terminal area is impacted by weather. It is also worth noting that of the penetrations (severity level ≥1), the overwhelming majority are of severity level 1, which validates our design of the dependent variable cutoffs in Table 4.3.
The next section will address how we deal with the huge difference in magnitude between the number of penetrations and the number of non-penetrations.
4.2
Defining our Model Dataset
Model 1, which is focused on departures, contains predictor input[4] from a different set of
flights than that of Models 2 and 3. The Model 1 dataset contains predictor input from
all ORD flights that have a trajectory point within the defined terminal area within two
hours of a departure penetration. The Model 2 and 3 datasets were built in the same way
except we are looking within two hours of an arrival penetration. The Model 3 dataset
contains only predictor input corresponding to trajectory points within 50 km from ORD.
We preprocessed each dataset and removed flights as necessary based on the ETMS trajectory
verification discussed in section 2.2.1.
[4] "Predictor input" refers to the calculated feature values at each prediction interval for the verified ORD flights. These feature values serve as input to our predictive models.
After examining the summary statistics of the input, we tackled the dependent variable
issue discussed in the above section. With penetrations occurring so rarely, using the raw
input would result in the models always predicting a severity level of 0. To prevent this from
occurring and achieve meaningful findings with respect to feature importance, we chose to
balance the datasets using oversampling. Thus, we matched the number of penetration
entries with the number of non-penetration entries, ensuring that the frequency of each
severity level above 0 was scaled based on the proportion of each level in the original set
of penetration entries. We used bootstrapping to implement the oversampling of data; this
ensured random selection of penetration entries from the original pool.
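The balancing step can be sketched as follows; a Python illustration (the thesis used R), where each entry is a hypothetical (features, severity) pair. Sampling uniformly with replacement from the original pool of penetration entries naturally preserves the relative frequency of each severity level:

```python
import random

def balance_by_oversampling(entries, seed=0):
    """Oversample penetration entries (severity > 0) via bootstrapping until
    they match the number of non-penetration entries (severity == 0)."""
    rng = random.Random(seed)
    non_penetrations = [e for e in entries if e[1] == 0]
    penetrations = [e for e in entries if e[1] > 0]
    # Bootstrap: draw with replacement from the original penetration pool, so
    # each severity level appears in proportion to its original share.
    resampled = [rng.choice(penetrations) for _ in range(len(non_penetrations))]
    return non_penetrations + resampled
```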
4.3
Predictive Methods
After balancing the flight dataset for all three models, we selected which predictive methods
to apply to our models. Any method we used had to be capable of handling a categorical
dependent variable with more than two classes, as well as both continuous and discrete (i.e.
binary) predictor variables. In the end, we chose to apply Multinomial Logistic Regression,
CART, and Random Forests to our predictor input. Each method has its strengths and
weaknesses, which we will discuss in the following subsections.
The metrics used to evaluate each method consist of prediction accuracy, two different
versions of a false negative rate, and a false positive rate. Prediction accuracy is simply the
proportion of interval entries for which the model predicts the correct severity level. The
first false negative rate examines how often the models predict a severity level lower than
the actual severity level that occurred. The second false negative rate examines how often
the models predict a severity level of 0 when in fact the severity level was greater than 0.
We are more concerned with the second false negative rate because missing a penetration
altogether is more dangerous than predicting a penetration but misjudging its severity. The
false positive rate measures the proportion of interval entries for which the model predicts a
severity level greater than 0 when in fact the severity level was 0. This metric will help us
evaluate the ability of our models to differentiate between non-penetration and penetration
in a binary context. We also extracted feature importance within the models. The way in
which these importance values were calculated for each predictive method will be explained in the subsections below.
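Under one plausible reading of these definitions (each rate taken over all interval entries), the four metrics can be sketched as:

```python
def evaluate(actual, predicted):
    """Compute accuracy, two false negative rates, and a false positive rate
    over paired lists of actual and predicted severity levels (0-3)."""
    n = len(actual)
    pairs = list(zip(actual, predicted))
    accuracy = sum(a == p for a, p in pairs) / n
    # FN rate 1: predicted severity lower than the actual severity
    fn_rate_1 = sum(p < a for a, p in pairs) / n
    # FN rate 2: predicted 0 when a penetration (severity > 0) occurred
    fn_rate_2 = sum(p == 0 and a > 0 for a, p in pairs) / n
    # FP rate: predicted a penetration when the actual severity was 0
    fp_rate = sum(p > 0 and a == 0 for a, p in pairs) / n
    return accuracy, fn_rate_1, fn_rate_2, fp_rate
```

Note that every entry counted by the second false negative rate is also counted by the first, which is why FN rate 2 is never larger than FN rate 1 in the results tables.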
4.3.1
Multinomial Logistic Regression
Multinomial Logistic Regression generalizes logistic regression to multi-class problems with
more than two possible discrete outcomes [11]. This method handles categorical variables
well, especially when outcomes are ordinal [4], such as in our case with the severity levels.
However, results are sensitive to the arbitrary coding of dependent variable classes, which
may cause misleading conclusions. Additionally, this method outputs a “black box” model
in that the user does not know how the model used the predictor input or weighted different
features in order to make predictions. Thus, the results outputted by Multinomial Logistic
Regression are not very interpretable, with the magnitude of coefficient values not accurately
representing the relative importance of a feature in the model. Consequently, we focus on
this method’s performance rather than feature importance interpretations.
4.3.2
Classification and Regression Trees (CART)
CART, otherwise known as decision tree learning, is used to visually represent decision-making. This method recursively partitions the data into two sets, finding a partition at
each step that maximally differentiates the two sets [14]. In our case, each step divides
the prediction interval entries by severity level (usually just 0 and 1), while minimizing the
misclassifications. The recursion is complete when the subset of flights at a node has all the
same dependent variable value, or when splitting no longer adds value to the predictions
[18].
We used CART in our research for several reasons. First, it performs well with large
datasets and requires very little data preparation or “cleansing” [14]. With over 60,000 rows
of data in each model dataset, this was an important consideration. Next, CART outputs
a transparent, “white-box” model: the predicted outcome for each interval entry is easily
explained by boolean logic based on predictor input. Thus, the results are simple and easy to
understand, especially regarding the relative importance of various features. Lastly, CART
is very robust, meaning it performs well even if its assumptions are violated by the predictor
input [18].
Nonetheless, CART has its limitations. First, it does not guarantee global optimality
due to its reliance on greedy algorithms [18]. Additionally, CART is subject to overfitting,
characterized by an excessive number of tree splits that make decisions more complex. On
the flip side, depending on the parameter that sets the minimum number of interval entries
required to create a node, dominant variables may result in trees with very few splits. This
makes it hard to determine the relative influence of other variables in the decision-making
process. Our models use a minimum node “bucket” size of 25 to prevent this variable
over-dominance. Finally, information gain in trees is biased towards features with more levels
[18]. However, this biased feature selection can be combated with conditional inference [18].
In the end, the “R” CART software package addresses most of these limitations, securing
CART’s status as a viable predictive method for our research.
Regarding the extraction of feature importance from CART models, we must first acknowledge that although only relatively few features may appear explicitly as “splitters”
in the visual output, this doesn’t mean that there aren’t other features important to understanding or predicting the severity level. The simplicity of the outputted decision tree
can be attributed to the goal of CART: to develop a simple tree structure for predicting
outcomes based on data [18]. Furthermore, a feature may be very influential even if it does
not appear as a primary splitter. CART keeps track of surrogate splits in the tree-growing
process, so the contribution a feature makes in the prediction process is not determined only
by primary splits [21]. Throughout the tree-growing process, whenever a primary splitter
is missing, surrogate splitters are used instead to move an interval entry down the tree to
its appropriate terminal node [21]. A feature may appear in a tree many times, either as a
primary or a surrogate splitter [21].
To calculate the importance score for each feature, the “R” CART package sums the
goodness of split measures for each split for which the feature is a primary splitter [20]. It
then adds this sum of goodness to the term “goodness * (adjusted agreement)” for all cases
in which the feature serves as a surrogate splitter [20]. The resultant scores are scaled to sum
to 100. The importance score considers surrogate splits to prevent two similar features from
obscuring the significance of one another [20]. It is important to note that importance scores
are strictly relative to the given tree structure and do not indicate absolute information value
of a feature [21].
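The scoring rule described above can be sketched as follows. This is a simplified illustration of the rpart convention, not its exact internals; the two input dictionaries are hypothetical:

```python
def cart_importance(primary_goodness, surrogate_goodness_x_agreement):
    """Sum each feature's goodness over splits where it is the primary
    splitter, add its goodness * (adjusted agreement) terms from surrogate
    roles, then rescale the scores so they sum to 100."""
    features = set(primary_goodness) | set(surrogate_goodness_x_agreement)
    raw = {
        f: sum(primary_goodness.get(f, []))
           + sum(surrogate_goodness_x_agreement.get(f, []))
        for f in features
    }
    total = sum(raw.values())
    return {f: 100.0 * v / total for f, v in raw.items()}
```

For instance, a feature with a single primary split of goodness 3.0 and another with a surrogate term of 2.0 would score 60 and 40 respectively.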
4.3.3
Random Forests
The Random Forest method is an extension of CART in that it constructs hundreds of
decision trees, using a random subset of the predictor input in each one. This approach,
called bootstrap aggregating, or “bagging”, helps the model identify trends in the data that
CART cannot. Moreover, after each tree votes on the dependent variable outcome, the Random Forest uses the mode of these tree predictions as its final prediction.
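A minimal sketch of the bagging-and-voting scheme (the trees here are stand-ins for fitted CART models):

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw a sample of the same size, with replacement ("bagging")."""
    return [rng.choice(data) for _ in data]

def forest_predict(trees, x):
    """Each tree votes on the severity level; the forest returns the mode."""
    votes = [tree(x) for tree in trees]
    return Counter(votes).most_common(1)[0][0]
```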
Random Forests are a very useful prediction tool for several reasons. First, they are extremely robust: Random Forests can deal with many correlated, weak predictors without
skewing the prediction results or having one variable become over-dominant [14]. In addition,
the diversity of trees helps with the overfitting problem commonly seen in CART. Lastly,
Random Forests handle unbalanced datasets well [14]. Thus, if we chose not to balance our
model datasets with oversampling, we could still be confident using Random Forests to make
predictions.
The most common complaint of Random Forests is that they are “black box” models
with results that are not readily interpretable, unlike CART. However, we can determine
the most influential features in the model using the Gini Index. The Gini Index measures
node impurity, or ”how much each feature contributes to the homogeneity of the nodes and
leaves in the resulting random forest” [9]. To obtain the importance score for a given feature
in a Random Forest model, we randomly permute the values of each feature and measure
the decrease in accuracy of the current tree based on the Gini Index [14]. This process is
repeated for all trees in the forest containing the feature of interest. The resulting average of
these accuracy decreases is the raw variable importance. A higher value (higher decrease in
Gini) indicates that a particular feature is more influential in the classification process [14].
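The permutation step can be sketched as follows, here for a single feature column of a plain list-of-lists dataset; `model` is a hypothetical fitted classifier callable on a row:

```python
import random

def permutation_importance(model, X, y, feature_idx, seed=0):
    """Raw importance of one feature: the drop in accuracy after randomly
    permuting that feature's column, breaking its link to the labels."""
    rng = random.Random(seed)

    def accuracy(rows):
        return sum(model(r) == label for r, label in zip(rows, y)) / len(y)

    base = accuracy(X)
    column = [row[feature_idx] for row in X]
    rng.shuffle(column)  # permute the feature's values across entries
    X_permuted = [row[:feature_idx] + [v] + row[feature_idx + 1:]
                  for row, v in zip(X, column)]
    return base - accuracy(X_permuted)
```

Permuting a feature the model never uses leaves accuracy unchanged, so its importance is zero; the more a permutation hurts accuracy, the more influential the feature.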
4.4
Model Results
We will now present the results of the predictive methods discussed above applied to Models 1, 2, and 3.[5] These results include prediction accuracy, both false negative rates,[6] a false positive rate, and a discussion of feature importance within the models. The feature importance discussion will be very limited for Multinomial Regression due to its lack of transparency. Both
the training and testing sets for the models consist of predictor input from ORD flights.
The training set is made up of 60% of interval entries in the balanced model dataset, and
the testing set is made up of the remaining 40% of interval entries in the balanced model
dataset.
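The 60/40 split can be sketched as a simple random shuffle (the thesis does not specify its exact splitting mechanism, so this is only an illustration):

```python
import random

def train_test_split(entries, train_fraction=0.6, seed=42):
    """Shuffle interval entries, then cut 60% for training, 40% for testing."""
    rng = random.Random(seed)
    shuffled = list(entries)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```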
[5] The predictor input for these models includes all of the features discussed in Chapter 3. We will explore partial inclusion of features in subsequent sections.
[6] As a reminder, the first false negative rate examines how often the models predict a severity level lower than the actual severity level that occurred. The second false negative rate examines how often the models predict a severity level of 0 when in fact the severity level was greater than 0. The false positive rate measures the proportion of interval entries for which the model predicts a severity level greater than 0 when in fact the severity level was 0.
4.4.1
Model 1 Results
Table 4.5 outlines the prediction accuracy, false negative rates, and false positive rate
achieved by each predictive method applied to Model 1:
Method | Accuracy | FN Rate 1 | FN Rate 2 | FP Rate
Multinomial Regression | 79% | 32% | 16% | 9%
CART | 78% | 27% | 9% | 16%
Random Forests | 97% | 3% | 3% | 4%
Table 4.5: Model 1 Performance Results
From Table 4.5, it is clear that Random Forests’ performance is superior to that of the
other two predictive methods. Multinomial Regression and CART perform very similarly,
with CART having slightly lower accuracy, lower false negative rates, but a higher false positive rate. As mentioned before, it is hard to comment on feature importance for Multinomial
Regression due to its lack of interpretability and coefficient values that do not accurately
reflect feature importance. However, CART and Random Forests both provide meaningful
output for interpretation.
Figure 4.2 displays the outputted CART model with its respective splits. Each branch
indicates the criterion for the left-hand daughter node; each node is labeled with the predicted
severity level as well as the actual number of interval entries of each severity level (0/1/2/3)
in the training set assigned to that node. Table 4.6 lists the relative importance of each
feature rescaled to sum to 100, omitting any features whose proportion of the sum is less
than 1%.
The feature at the root of the tree is "flankcount", with secondary splits on "PenetratingFartherOutDist" and "PenetrateAlready". These splits are consistent regardless of the random training-testing split. Severity levels 0, 1, and 2 are included in the tree nodes, but severity level 3 is not because of its very rare occurrence in the data set. Since the first split occurs on a weather-based feature, and the top features in the importance chart are also weather-based, one may conclude that the most significant determinant of departure weather penetration is the weather itself and not any operational factors. It is worth noting that it doesn't require a very large "flankcount" for CART to predict a penetration.

[Figure 4.2: Model 1 ORD CART Output. Root split: flankcount < 0.875; further splits: PenetratingFartherOutDist ≥ 33.66 and PenetrateAlready < 5.833; terminal nodes predict severity levels 0, 1, and 2.]

Feature | Relative Importance
flankcount | 21
FlankingValue | 21
BadWeatherPercentageNow | 17
BadWeatherPercentageBefore | 16
CellDiffVILAbs | 10
PercentWorseningVIL | 9
PenetratingFartherOutDist | 2
PenetrateAlready | 2
PenetratingFartherOutScore | 1
Table 4.6: Model 1 CART Feature Importance Values
Furthermore, the tree is rather simple, with only three total splits. This, along with the results in the relative importance chart, supports the conclusion that a few dominant features influence the prediction decision, with the remaining features' influence being very weak.
Decision trees are not well-equipped to handle the presence of many weak features, which can result in only one of two somewhat correlated features being used despite their similarity. However, the presence of dominant features in our CART model makes this concern less relevant.
Random Forests do not provide a useful visual output like CART, but do provide the
relative importance of features using the Gini Index. Table 4.7 presents the normalized
importance values, rounded to the nearest integer and rescaled to sum to 100, for each
feature, omitting any features whose proportion of the sum is less than 1%.
The most influential features are essentially the same as in Table 4.6 for CART, but the spread of relative importance values among features is much smaller. This results in the longer list of features in Table 4.7. Although the top features in the Random Forests importance chart are still weather-based, these features do not have as dominant a presence as with CART. This could be attributed to the robustness of Random Forests, allowing them to deal with the presence of many weak features without skewing results or hurting performance. We now look to the Model 2 and 3 results to discern whether model performance and feature importance values are similar for ORD arrivals.
Feature | Relative Importance
flankcount | 14
FlankingValue | 12
BadWeatherPercentageNow | 8
PenetratingFartherOutDist | 7
BadWeatherPercentageBefore | 7
CellDiffVILAbs | 6
PenetrateAlready | 5
PenetratingFartherOutScore | 5
PercentWorseningVIL | 5
TimeinTerm | 5
CellDiffVIL | 4
SeverityDiffVIL | 4
PercentBadCellsDiffVIL | 4
ArrivalsPenetratingDist | 2
PenetratingFartherOut | 2
ArrsCrowding | 2
ArrivalsPenetratingCircleDist | 2
ArrivalsPenetratingScore | 2
PenetratingFartherOutCircleDist | 2
Table 4.7: Model 1 Random Forests Feature Importance Values

4.4.2
Model 2 Results
Table 4.8 outlines the prediction accuracy, false negative rates, and false positive rate achieved by each predictive method applied to Model 2:

Method | Accuracy | FN Rate 1 | FN Rate 2 | FP Rate
Multinomial Regression | 82% | 27% | 14% | 8%
CART | 80% | 24% | 7% | 15%
Random Forests | 98% | 0% | 0% | 3%
Table 4.8: Model 2 Performance Results

From Table 4.8, it is clear that Random Forests' performance is superior to that of the other two predictive methods. Multinomial Regression and CART perform very similarly, with CART having slightly lower accuracy, lower false negative rates, but a false positive rate almost two times the size. Compared to Model 1 performance, Model 2 accuracy rates are slightly lower, false negative rates are slightly higher, and false positive rates are slightly lower. This difference may be due to the unforeseeable behavior of arrivals during descent compared to departures, which follow more direct routes from takeoff up to cruising altitudes.
Figure 4.3 displays the outputted CART model with its respective splits and Table 4.9
lists the relative importance of each feature.
[Figure 4.3: Model 2 ORD CART Output. Root split: FlankingValueMatrix < 2.566; further split: PenetratingCloserInCircleDist ≥ 38.35; terminal nodes predict severity levels 0 and 1.]
The feature at the root of the tree is "FlankingValue", with a single secondary split on "PenetratingCloserInCircleDist". These splits are consistent regardless of the random training-testing split. It is worth noting that it doesn't require a very large "FlankingValue" for CART to predict a penetration. Interestingly, only severity levels 0 and 1 are predicted by the model. This may be attributed to the particularly high proportion of level 1 penetration entries in the model dataset, with over 80% of penetration entries classified as level 1. Just like in the Model 1 CART, the first split occurs on a weather-based feature and the top features in the importance chart are also weather-based, leading us to believe that the most significant determinant of arrival penetration far from the airport is the weather itself and not any operational factors. The dropoff in relative importance between weather-based features and non-weather-based features in Table 4.9 is noticeably large.

Feature | Relative Importance
FlankingValue | 20
flankcount | 20
BadWeatherPercentageNow | 16
BadWeatherPercentageBefore | 16
CellDiffVILAbs | 15
PercentWorseningVIL | 12
PenetratingCloserInCircleDist | 1
PenetratingCloserInDist | 1
Table 4.9: Model 2 CART Feature Importance Values
The fact that the tree only contains two splits does not mean that there are only two
important features in the set. Additionally, the fact that the second split is on “PenetratingCloserInCircleDist” does not mean that it is one of the most influential features in the
set. In fact, according to Table 4.9, it has a relatively low importance value. This goes back
to the discussion of splitters in section 4.3.2, which stated that the importance of a feature
within the CART model is not dependent on its role as a primary splitter.
Moving on, Table 4.10 presents the Model 2 Random Forests normalized feature importance values.

Feature | Relative Importance
FlankingValue | 17
flankcount | 13
BadWeatherPercentageNow | 10
CellDiffVILAbs | 8
BadWeatherPercentageBefore | 8
PercentWorseningVIL | 7
CellDiffVIL | 4
PenetratingCloserInCircleDist | 4
FlightDistance | 4
SeverityDiffVIL | 4
PenetratingCloserInDist | 4
PercentBadCellsDiffVIL | 3
PenetratingCloserInScore | 2
TimeinTerm | 2
PenetratingCloserInNumberEntries | 2
PenetrateAlready | 1
Table 4.10: Model 2 Random Forests Feature Importance Values

The most influential features are similar to those in Table 4.9 for CART and in the Model 1 results, but the specific importance rankings have shuffled around. "FlankingValue" is now dominant over "flankcount". Furthermore, the gap between the dominant features
and all others is more noticeable, resulting in a shorter list of variables in Table 4.10 than
in Table 4.7 corresponding to Model 1. Table 4.10, unlike Table 4.7, does not contain any
features based on the behavior of flights in the opposite flight sequence. This suggests that
departure behavior does not greatly influence arrival penetration far from the airport, which
makes sense because it is easier to create separation between flights when they are farther
from the congestion near the airport. We should see more unique trends in the Model 3
results due to the fact that all interval entries take place within 50 km of ORD.
4.4.3
Model 3 Results
Table 4.11 outlines the prediction accuracy, false negative rates, and false positive rate
achieved by each predictive method applied to Model 3.
Method | Accuracy | FN Rate 1 | FN Rate 2 | FP Rate
Multinomial Regression | 86% | 21% | 10% | 5%
CART | 85% | 24% | 6% | 7%
Random Forests | 99% | 0% | 0% | 3%
Table 4.11: Model 3 Performance Results
From Table 4.11, it is again clear that Random Forests' performance is superior to that of the other two predictive methods for Model 3. Multinomial Regression and CART perform very similarly, with CART having slightly lower accuracy, a higher first false negative rate but a lower second false negative rate, and a slightly higher false positive rate. Compared to Model 2 performance, Model 3 posts
noticeably higher accuracy rates, lower false negative rates, and lower false positive rates
for Multinomial Regression and CART. This difference is rather unexpected considering the
uncertain behavior of arrivals close to the airport. One possible explanation for the difference is that the Model 3 dataset contains significantly more interval entries than the Model
2 dataset, allowing it to better train with predictive methods before applying the model to
the testing set. This, however, is not necessarily the case.
Figure 4.4 displays the outputted CART model with its respective splits and Table 4.12
lists the relative importance of each feature.
The feature at the root of the tree is "flankcount", with secondary splits on "PenetratingCloserInCircleDist" and "PenetratingCloserInScore", a tertiary split on "PenetratingCloserInScore" again, and finally a quaternary split on "DeparturesPenetratingCircleDist".

[Figure 4.4: Model 3 ORD CART Output. Root split: flankcount < 1.5; further splits: PenetratingCloserInDist ≥ 55.37, PenetratingCloserInScore < 4.452, PenetratingCloserInScore < 8.528, and DeparturesPenetratingCircleDist < 86.41; terminal nodes predict severity levels 0 through 3.]
These splits are not as consistent across random training-testing splits as those of Models 1 and 2. However, this framework of splits was by far the most frequently encountered. The larger number of splits compared to the CART output for Models 1 and 2 is reflected in the longer list of features in Table 4.12. The dropoff in relative importance between the weather-based features and the non-weather-based features is not as overwhelming as in Models 1 and 2.
Furthermore, the fact that there are two splits on "PenetratingCloserInScore" does not mean that it is one of the most influential features in the set. In fact, according to Table 4.12, it has a relatively low importance value. This goes back to the discussion of splitters in section 4.3.2, which stated that the importance of a feature within the CART model is not dependent on its role as a primary splitter. Moreover, unlike Models 1 and 2, all severity levels are included in the Model 3 tree nodes. This may be attributed to the relatively high proportion of level 3 penetrations in the Model 3 dataset in comparison to Models 1 and 2. In Model 3, 8.5% of penetrations are of level 3 severity, compared to 6.2% in the Model 1 dataset and 4.4% in the Model 2 dataset.

Feature | Relative Importance
flankcount | 18
FlankingValue | 18
BadWeatherPercentageNow | 15
BadWeatherPercentageBefore | 13
PenetratingCloserInDist | 12
PenetrateAlready | 10
PenetratingCloserInScore | 4
PenetratingCloserInCircleDist | 3
PenetratingCloserIn | 2
PenetratingCloserInNumber | 2
PenetratingCloserInNumberEntries | 2
Table 4.12: Model 3 CART Feature Importance Values
Moving on, Table 4.13 presents the Model 3 Random Forests normalized feature importance values.
Feature | Relative Importance
flankcount | 14
FlankingValue | 10
BadWeatherPercentageNow | 8
CellDiffVILAbs | 7
BadWeatherPercentageBefore | 6
PenetratingCloserInDist | 5
PenetratingCloserInScore | 5
PenetratingCloserInCircleDist | 4
PenetratingCloserInNumberEntries | 3
PenetratingCloserInNumber | 3
TimeinTerm | 2
CellDiffVIL | 2
CircleDistfromLanding | 2
DistfromLanding | 2
SeverityDiffVIL | 2
PenetrateAlready | 2
DeparturesPenetratingScore | 2
DeparturesPenetratingDist | 2
PercentWorseningVIL | 2
FlightDistance | 2
TimeWithin50km | 2
PercentBadCellsDiffVIL | 2
PenetratingCloserIn | 2
DeparturesPenetratingCircleDist | 1
DeparturesPenetratingNumberEntries | 1
FollowerPenetrateScore | 1
FollowerVILFinal | 1
DepsCrowding | 1
Table 4.13: Model 3 Random Forests Feature Importance Values
The most noticeable thing about Table 4.13 is its length. The number of Random Forests features accounting for more than 1% of the Gini Index sum is much larger than for Models 1 and 2. As a result, importance values are spread relatively evenly across features, with very small incremental decreases moving down the list. Although the top features in the importance chart are still weather-based, they do not have as dominant a presence for Model 3. This could be attributed to the robustness of Random Forests, which can handle many weak features without skewing results or hurting performance.
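The importance values discussed above are proportions of the total Gini impurity decrease across trees. As a minimal sketch of how values like those in Table 4.13 can be extracted with scikit-learn — the training data below is synthetic, and only the feature names are taken from the thesis:

```python
# Sketch: normalized Random Forests feature importances (mean decrease in
# Gini impurity), scaled to percentages as in Table 4.13. Data is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
feature_names = ["flankcount", "FlankingValue", "BadWeatherPercentageNow",
                 "CellDiffVILAbs", "PenetrateAlready"]
X = rng.random((500, len(feature_names)))
# Synthetic severity labels (0 = no penetration, 1-3 = penetration severity).
y = rng.integers(0, 4, size=500)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# feature_importances_ already sums to 1; scale to percentages.
importances = 100 * model.feature_importances_
for name, imp in sorted(zip(feature_names, importances), key=lambda t: -t[1]):
    print(f"{name:30s} {imp:4.1f}")
```

With real data, the ranking printed here would correspond to the "Relative Importance" column of the table.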
The list contains the entire set of features that evaluate arrivals closer to ORD as well as departures in the vicinity. This makes sense because interval entries in Model 3 are restricted to within 50 km of the airport. Thus, congestion near the airport is, as expected, more of an issue, and the prevalence of systematic streams and approach paths allows us to better gauge future pilot behavior based on the flights ahead.
4.4.4 Summary of ORD Results
From the results presented above, it is apparent that Random Forests is the best predictive method in terms of the quantitative performance metrics. Moreover, its interpretability with regard to feature importance rivals that of CART, especially since the CART splits do not necessarily represent the most influential variables. These points make a strong case for Random Forests as the recommended predictive method for severe convective weather penetration. Nonetheless, we must keep in mind that the models are meant to serve as decision support tools for air traffic controllers. The value of a “white-box model” and its ability to demonstrate the proposed decision-making process should not be overlooked. Air traffic controllers would appreciate CART’s simplicity and visual output, and could engineer their own model splits based on personal experience. Thus, the recommended prediction tool is not so clear cut; an extended conversation, beyond the scope of this thesis, will most likely have to take place to make a final choice.
Regarding feature importance, our study shows that the primary indicators of penetration
continue to be weather-based, particularly the presence of fast-moving weather within a
flight’s trajectory projection. Nevertheless, we found that a number of the operational
features in our models weakly correlate with severe convective weather penetration. Despite
having lower importance values, these features help shed light on the dynamics of the terminal
area. In particular, the importance of features that describe the behavior of other pilots in the
terminal area may help us understand how pilots and air traffic controllers deal with weather
impacts close to the airport. The most important conclusion was that pilots are more likely
to penetrate severe weather when other pilots ahead of them in the ascent/descent sequence
already penetrated. This makes sense because flights ahead have already experienced weather
conditions, severe or not, that the current flight will soon face. On the flip side, one may ask
why pilots don’t learn from the mistakes of pilots flying ahead of them. These results imply
that rerouting around weather is still often done on an ad hoc basis once a pilot reports
his/her weather penetration to ATC [14]. Further investigation into the dynamics of the
terminal area is necessary to develop effective penetration mitigation strategies and obtain
a better understanding of how weather impacts air traffic flows.
The findings described above apply specifically to models run on ORD terminal area
operations. How do we know if other terminal areas throughout the U.S. will produce similar
results? Are our models robust to geographic location? We will explore these questions in
the next section.
4.5 Testing Our ORD Models on Other Airports
We tested our models on several other U.S. airports to determine whether they were robust
to location. In addition, after re-training the models on each individual airport, we explored
whether models trained on one airport could achieve success on another airport. Thus, are the most important features and flow of decisions within airport terminal areas similar enough
that we can develop a common model that will be successful across all U.S. airports? This
would greatly decrease model computation time by skipping the re-training process while
also standardizing how air traffic controllers approach severe convective weather penetration.
Before presenting the results of each experiment for Models 1, 2, and 3, we must first
discuss how we picked airport pairings for common models.
4.5.1 Selecting Airport Pairings for Common Model Experiment
Figure 4.5 shows the 30 U.S. airports with the most severe convective weather penetration flights during the summer of 2008. Each point represents a single airport.
Figure 4.5: Map of Top 30 Penetration Airports
The color of the points differs based on the total number of penetration flights that occurred within the respective airport terminal area, with “yellow” representing a low number and “red” representing a high number. The size of the points differs based on the corresponding airport’s 2013 “hub” score, determined by the total number of passenger boardings. For
our purposes, a higher “hub” score reflects a busier airport. The two airports with the most
penetration flights, Chicago O’Hare and Atlanta, also are the two busiest airports in the
top 30. However, this trend is not consistent across the top 30: Chicago Midway, Detroit,
Orlando, and St. Louis experience a large number of penetration flights but don’t have
particularly high hub rankings.
One may notice that all of the airports in the top 30 for penetration flights are in the eastern half of the United States, especially concentrated in the Midwest and the Southeast.
This is most likely due to the higher frequency of convective weather patterns in these areas
as discussed in Section 2.1. In fact, 19 out of the top 30 airports are located in these regions,
with 11 in the Midwest alone. This statistic greatly influenced our process for assigning
airport pairings. Table 4.14 lists the pairings we tested.
Pairing                                Region        Distance (km)
Chicago O'Hare (ORD)\Atlanta (ATL)     Cross-Region  975
Chicago O'Hare (ORD)\Detroit (DTW)     Midwest       377
Chicago O'Hare (ORD)\Cleveland (CLE)   Midwest       508
Detroit (DTW)\Cleveland (CLE)          Midwest       152
St. Louis (STL)\Memphis (MEM)          Midwest       386
Indianapolis (IND)\Cincinnati (CVG)    Midwest       161
Atlanta (ATL)\Orlando (MCO)            Southeast     650
Orlando (MCO)\Tampa (TPA)              Southeast     125

Table 4.14: Airport Pairings
All of the airports in Table 4.14 are in the top 15 for penetration flights. We first tested O’Hare against Atlanta because they were the two airports with the most penetration flights, and they also happened to maintain the heaviest volume of air traffic. However, based on the poor results of this test, we decided to limit subsequent pairings to airports within the same region.
If a common model across the U.S. couldn’t be developed, maybe we could at least develop
regional models. Furthermore, we wanted the airports to be relatively close in proximity so
that weather patterns would be similar, but not within each other’s terminal area, because
then operations may overlap and skew results.
The following sections present the aggregated results of the re-trained models alongside the results of the airport pairing models in order to determine whether regional models are feasible, as well as to obtain insight regarding feature importance from the experiments.
4.5.2 Comparison of Results
Tables 4.15, 4.16, and 4.17 compare the results for the re-training and airport pairing methods, displaying the average performance metrics across airports/pairings along with the corresponding standard deviation. The pairing method uses 60% of the flights from one airport as the training set and 100% of the flights from the other airport as the testing set. The pairing average performance metrics aggregate results from using both airports in a pair as the training set.
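The two evaluation schemes can be sketched as follows, using hypothetical per-airport datasets (the airport names, feature counts, and data here are placeholders, not the thesis data):

```python
# Sketch: "re-training" splits one airport's own flights 60/40;
# "pairing" trains on 60% of airport A and tests on 100% of airport B.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)

def make_airport(n):                      # synthetic stand-in for one airport
    return rng.random((n, 6)), rng.integers(0, 4, size=n)

X_ord, y_ord = make_airport(400)          # e.g. ORD
X_atl, y_atl = make_airport(400)          # e.g. ATL

# Re-training: 60% train / 40% test within the same airport.
Xtr, Xte, ytr, yte = train_test_split(X_ord, y_ord, train_size=0.6,
                                      random_state=0)
retrain_acc = DecisionTreeClassifier(random_state=0).fit(Xtr, ytr).score(Xte, yte)

# Pairing: train on 60% of ORD, test on all of ATL.
Xtr, _, ytr, _ = train_test_split(X_ord, y_ord, train_size=0.6, random_state=0)
pairing_acc = DecisionTreeClassifier(random_state=0).fit(Xtr, ytr).score(X_atl, y_atl)
```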
Method       MR Acc  Tree Acc  RF Acc  MR FN1  Tree FN1  RF FN1   MR FN2  Tree FN2  RF FN2  MR FP   Tree FP  RF FP
Re-training  79 (3)  80 (3)    98 (1)  31 (4)  27 (3)    1 (1)    15 (4)  10 (5)    1 (1)   9 (3)   13 (3)   3 (1)
Pairing      74 (5)  74 (4)    68 (7)  39 (9)  35 (8)    59 (15)  17 (6)  12 (7)    29 (9)  10 (6)  14 (7)   3 (1)

Table 4.15: Comparison of Model 1 performance results for the re-training vs. airport pairing methods. For predictive methods, “MR” represents Multinomial Logistic Regression, “Tree” represents CART, and “RF” represents Random Forests. Regarding performance metrics, “Acc” represents the prediction accuracy, “FN 1” represents the first false negative rate defined, “FN 2” represents the second false negative rate, and “FP” represents the false positive rate.
It is evident that the re-training method is superior to the pairing method for all performance metrics for all three models. The level of superiority differs based on the metric
Method       MR Acc   Tree Acc  RF Acc  MR FN1   Tree FN1  RF FN1     MR FN2  Tree FN2  RF FN2     MR FP   Tree FP  RF FP
Re-training  81 (7)   81 (6)    98 (1)  26 (11)  23 (9)    0.6 (0.8)  14 (5)  11 (4)    0.6 (0.8)  10 (4)  12 (5)   3 (2)
Pairing      67 (7)   72 (6)    65 (7)  52 (14)  37 (10)   65 (16)    25 (9)  16 (8)    33 (8)     10 (3)  13 (6)   3 (2)

Table 4.16: Comparison of Model 2 performance results for the re-training vs. airport pairing methods.
Method       MR Acc   Tree Acc  RF Acc  MR FN1   Tree FN1  RF FN1     MR FN2   Tree FN2  RF FN2     MR FP   Tree FP  RF FP
Re-training  83 (4)   84 (5)    98 (1)  24 (7)   23 (9)    0.5 (0.6)  14 (4)   11 (4)    0.5 (0.6)  7 (2)   9 (4)    3 (2)
Pairing      68 (10)  74 (6)    68 (8)  47 (16)  35 (8)    60 (16)    23 (10)  13 (5)    31 (9)     10 (7)  13 (6)   3 (1)

Table 4.17: Comparison of Model 3 performance results for the re-training vs. airport pairing methods.
of interest. The most surprising differences reside in the Random Forests metrics. While Random Forests is by far the best predictive method for the re-training approach, it is by far the worst for the pairing approach. It is unclear why such a drop-off took place, with CART appearing to be the best predictive method all around for the pairing approach. However, the CART output for the pairing method often contains a large number of splits that vary based on the randomly constructed training-testing set. Also, it is worth noting that the standard deviations of the re-training performance metrics are lower than those of the pairing performance metrics, suggesting more consistent performance.
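For concreteness, the four performance metrics used throughout these tables can be computed as in the sketch below. This assumes severity levels are integer-coded and that each rate is taken over all interval entries; the thesis does not state the denominators explicitly, so that choice is an assumption, as are the example arrays:

```python
# Sketch: Acc / FN1 / FN2 / FP as defined in the table captions,
# with actual and predicted severity levels in {0, 1, 2, 3}.
import numpy as np

def metrics(actual, predicted):
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return {
        "Acc": np.mean(predicted == actual),             # correct severity level
        "FN1": np.mean(predicted < actual),              # predicted too low
        "FN2": np.mean((predicted == 0) & (actual > 0)), # missed penetration
        "FP":  np.mean((predicted > 0) & (actual == 0)), # false alarm
    }

m = metrics(actual=[0, 1, 2, 3, 0, 2], predicted=[0, 0, 1, 3, 1, 2])
```

Note that FN2 is a subset of FN1: predicting level 0 for an actual penetration also counts as predicting too low.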
With regard to feature importance, both the re-training and pairing results mirrored those presented in the section above describing O’Hare. The most influential features were weather-based, with the flanking features, severity features, and “CellDiffVILAbs” consistently topping the importance rankings.
4.5.3 Insight from Pairings Experiment
During the airport pairing experiment, we found that Midwestern airport pairings performed significantly better than Southeastern airport pairings, which may be attributed to the larger sample size of Midwestern pairings. We will not comment on the performance of cross-regional pairings vs. regional pairings because we only tested one cross-regional pair.
Additionally, we observed that setting the training set to an airport with more total flights
in its dataset compared to the other airport in its pair does not necessarily result in a better,
more consistent model. However, setting the training set to an airport with more penetration
entries does indeed translate to better, more consistent models, especially if the difference
in the number of penetration entries is large. This finding can be explained by the following
example: consider two models with the same number of total flight entries. Since we balance
our model datasets, an airport with fewer penetration entries would have to cycle through
more duplicate entries than the other airport in the pair in order to match the number
of non-penetration entries. The airport with fewer penetration entries would hypothetically
serve as an inferior trainer because it has experienced less penetration behavior. The increase
in consistency obtained by using the airport with more penetration entries as the trainer can
be seen explicitly in the CART decision tree, which maintains the same structure of splits
given the randomly constructed training-testing set.
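The balancing step described in this example can be sketched as follows; the entries here are hypothetical placeholders, and cycling through duplicates is one plain reading of the procedure described above:

```python
# Sketch: penetration entries are duplicated (cycled through) until they
# match the number of non-penetration entries, balancing the dataset.
from itertools import islice, cycle

non_penetration = ["np1", "np2", "np3", "np4", "np5", "np6"]
penetration = ["p1", "p2"]           # fewer penetration entries

# Cycle through duplicates until the classes are the same size.
balanced_penetration = list(islice(cycle(penetration), len(non_penetration)))
# -> ['p1', 'p2', 'p1', 'p2', 'p1', 'p2']
```

An airport with very few distinct penetration entries thus trains on many copies of the same behavior, which is why it serves as an inferior trainer.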
4.5.4 Summary of Results
Based on the results presented above, we recommend re-training on each individual airport
rather than trying to construct regional models or a common model across all U.S. airports.
The recommended predictive method is still Random Forests, but its sensitivity and poor
performance in the pairing experiment should be noted and explored further. Its high false
negative rates were due to its frequent prediction of severity level 0 when in fact the severity
level was 1, or its prediction of severity level 1 when in fact the severity was level 2. The latter is less worrisome because at least the model still predicts that a penetration will occur,
signaling that some course of action must be taken by the controller/pilot.
Furthermore, the consistency with regard to feature importance suggests that the most influential variables are truly weather-based and that we have developed features that are robust to airport location. The fact that our models perform well independent of airport location is promising. Yet we have not looked into the intricacies of the pilot thought process when encountering severe convective weather. The next chapter will more closely examine pilot behavior on a case-by-case basis in order to validate our models.
4.6 Sensitivity of Models
Since weather-based features were consistently the most influential across all three models, regardless of the airport, we explored whether the performance of our models changes significantly when applying only the weather-based feature subset. We also tested a few other subsets that build upon one another and contain some of the other more influential features. The four subsets tested are described below. Tables 4.18, 4.19, and 4.20 outline the results of this sensitivity analysis, displaying the average performance metrics, along with corresponding standard deviations, across all airports listed in Section 4.5.⁷
Round 1: Only weather-based features
Round 2: Round 1 + “PenetrateAlready” + “PenetratingAhead” subset
Round 3: Round 2 + “OtherFlightSequencePenetrating” subset
Round 4: Round 3 + all other features listed in Table 3.1
⁷ We retrain on each individual airport and use 60% of interval entries as the training set and 40% of interval entries as the testing set.
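The cumulative rounds above can be sketched as nested feature lists; the subset contents here are illustrative stand-ins (the full lists are in Table 3.1):

```python
# Sketch: cumulative feature-subset rounds for the sensitivity analysis.
weather = ["flankcount", "FlankingValue", "BadWeatherPercentageNow"]
penetrating_ahead = ["PenetrateAlready", "PenetratingAhead"]
other_sequence = ["OtherFlightSequencePenetrating"]
remaining = ["TimeinTerm", "FlightDistance"]   # stand-in for all other features

rounds = {}
rounds[1] = list(weather)
rounds[2] = rounds[1] + penetrating_ahead
rounds[3] = rounds[2] + other_sequence
rounds[4] = rounds[3] + remaining

# Each round would then re-train the models on its feature columns only,
# e.g. model.fit(train_df[rounds[r]], train_df["severity"]) for a
# hypothetical DataFrame train_df.
```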
Method   MR Acc  Tree Acc  RF Acc  MR FN1  Tree FN1  RF FN1   MR FN2  Tree FN2  RF FN2     MR FP  Tree FP  RF FP
Round 1  75 (3)  78 (3)    98 (1)  39 (5)  30 (5)    2 (2)    18 (5)  12 (5)    1.5 (1.5)  9 (3)  13 (3)   3 (1)
Round 2  78 (3)  79 (2)    98 (1)  35 (4)  27 (3)    2 (1)    16 (5)  11 (4)    1.8 (1)    9 (3)  13 (3)   3 (1)
Round 3  78 (3)  79 (2)    97 (1)  33 (5)  27 (3)    4 (2)    16 (5)  11 (4)    3.5 (2)    9 (3)  13 (3)   3 (1)
Round 4  79 (3)  79 (2)    98 (1)  32 (4)  27 (3)    1.5 (1)  15 (4)  11 (4)    1.5 (1)    9 (3)  13 (3)   3 (1)

Table 4.18: Comparison of Model 1 performance results for variable subsets. For predictive methods, “MR” represents Multinomial Logistic Regression, “Tree” represents CART, and “RF” represents Random Forests. Regarding performance metrics, “Acc” is the proportion of interval entries for which the model predicts the correct severity level. “FN 1” examines how often the models predict a severity level lower than the actual severity level that occurred. “FN 2” examines how often the models predict a severity level of 0 when in fact the severity level was greater than 0. “FP” measures the proportion of interval entries for which the model predicts a severity level greater than 0 when in fact the severity level was 0.
Method   MR Acc  Tree Acc  RF Acc  MR FN1   Tree FN1  RF FN1   MR FN2  Tree FN2  RF FN2   MR FP   Tree FP  RF FP
Round 1  76 (5)  81 (5)    97 (2)  36 (9)   24 (9)    1.6 (2)  18 (5)  12 (5)    1.6 (2)  10 (5)  13 (5)   4 (2)
Round 2  79 (6)  82 (5)    97 (2)  30 (10)  22 (7)    2 (2)    16 (5)  10 (5)    2 (2)    10 (4)  12 (5)   4 (2)
Round 3  80 (6)  82 (5)    97 (2)  29 (10)  22 (7)    2 (2)    16 (4)  10 (5)    2 (2)    10 (3)  12 (5)   4 (1.5)
Round 4  82 (6)  82 (6)    98 (1)  25 (10)  21 (9)    0.8 (1)  15 (5)  10 (5)    0.8 (1)  10 (4)  12 (5)   3 (2)

Table 4.19: Comparison of Model 2 performance results for variable subsets.
Method   MR Acc  Tree Acc  RF Acc  MR FN1  Tree FN1  RF FN1     MR FN2  Tree FN2  RF FN2     MR FP    Tree FP  RF FP
Round 1  76 (3)  81 (6)    96 (2)  39 (5)  30 (10)   4 (2)      19 (4)  13 (6)    4 (2)      8 (2)    10 (4)   4 (2)
Round 2  79 (3)  82 (5)    96 (1)  33 (6)  26 (9)    5 (2)      17 (5)  12 (5)    5 (2)      7 (2)    9 (4)    4 (2)
Round 3  81 (4)  82 (5)    95 (2)  30 (7)  27 (9)    7 (3)      16 (5)  12 (5)    7 (5)      6.5 (2)  9 (4)    4 (2)
Round 4  83 (4)  84 (5)    99 (1)  25 (6)  22 (9)    0.3 (0.5)  15 (4)  10 (4)    0.3 (0.5)  7 (2)    9 (4)    3 (2)

Table 4.20: Comparison of Model 3 performance results for variable subsets.
It is clear that Round 4 maintains the best performance, but the incremental improvement
between rounds is very small. Round 1 performance is impressive considering it only uses 8
features, compared to 40 in Round 4. The proximity in performance between Rounds 1 and
4 further demonstrates the dominant influence of weather-based features in our models. The
improvement between Rounds 1 and 2 is rather small, and the improvement between Rounds
2 and 3 is almost negligible if not counterproductive, suggesting that the added features in
these rounds are not very influential.
Each predictive method also exhibits a unique trend across all three models. Multinomial
Regression performance improves consistently from Round 1 to 4. CART performs extremely
consistently across all four rounds. Lastly, Random Forests performance worsens from Round
1 to Rounds 2 and 3 before reaching its best numbers in Round 4.
5 Case Studies and Pilot Experience
The sections below examine pilot behavior within the ORD terminal area during severe convective weather scenarios. Based on recurring themes in these scenarios and the personal experiences of the commercial and military pilots we interviewed, we will validate or refute our model results and draw conclusions about our research.
5.1 Takeaways from Pilot Interviews
We interviewed over 20 professional pilots from various backgrounds and experience levels in order to learn their thought process upon encountering severe weather and how they would handle such an encounter. The most common trend was that a pilot’s attitude is to avoid weather at all costs. After all, the blame for weather-related accidents and structural damage falls on the pilot for accepting a bad vector or deciding to take off in severe weather conditions. Regardless of ATC coordination and recommendations, the aircraft is the pilot’s responsibility.
The following subsections will address such topics as onboard weather radar and forecasting, flight path deviation, taking off/landing during a weather impact, and general takeaways
from the interviews.
5.1.1 Weather Radar and Forecasting Technology in the Cockpit
Pilots asserted that they use onboard radar and forecasting technology for avoidance rather than for selecting the weakest weather-impacted areas to penetrate. They also complained that these tools are often outdated and inaccurate, especially regarding the currency of forecasts. Moreover, pilots have access to VIL in the cockpit but not echo tops, so they are not aware of the exact height of storm cells. As a result, pilots have very little confidence in the weather forecasts they receive in the cockpit. They are forced to rely on ATC and dispatch rather than onboard radar because these resources provide more real-time reports and weather forecasts, which is why our prediction tool is geared towards controller support rather than direct pilot support.
5.1.2 Deviation from the Filed Flight Path⁸
Given storm conditions in the terminal area, a pilot may wish to deviate around weather cells in order to avoid penetration. In theory, a pilot must first obtain approval from ATC to deviate from the planned flight path. According to the pilots we interviewed, this process is actually easy and quite common, with 99% of proposed deviations receiving approval as long as the deviation does not endanger other aircraft in the vicinity. Nonetheless, even without approval, the aircraft is still the pilot’s responsibility.

⁸ The capabilities that ATC has in this sphere are airport-dependent.
The FAA mandates that above 20,000 ft altitude, pilots must avoid storms by 20 NM or
overfly them by 5,000 ft. Below 20,000 ft, pilots must avoid storms by at least 5 NM. Thus,
the pilot may have to alter the heading given by ATC if it is not adequate to safely avoid
the storm. The capabilities of the aircraft at hand, such as thrust, size, and maneuverability,
play an important part in this decision. Yet the interviewees ranked the following as the top
four factors contributing to the difficulty of deviation: fuel limitations, ATC advisories/regulations, the current runway configuration, and lack of visibility due to nighttime operations,
in that order.
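The avoidance rule quoted above can be encoded as a simple check; the function name and inputs below are illustrative, and the thresholds come directly from the rule as stated in the text:

```python
# Sketch: above 20,000 ft, avoid storms laterally by 20 NM or overfly their
# tops by 5,000 ft; below 20,000 ft, keep at least 5 NM laterally.
def storm_clearance_ok(altitude_ft, lateral_nm, feet_above_storm_top=0):
    if altitude_ft > 20000:
        return lateral_nm >= 20 or feet_above_storm_top >= 5000
    return lateral_nm >= 5
```

A heading assigned by ATC that fails this check is one the pilot may have to alter, per the discussion above.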
5.1.3 Impact of Convective Weather on Departures
When convective weather is present near the airport, the decision to keep the flight scheduled is not up to the pilot but rather the airline. However, it is officially the pilot’s decision whether or not to take off. But is it really? Not only are airlines pressuring their pilots to take off, but they are also pressuring air traffic controllers to get planes airborne in order to minimize delays and reduce fuel wasted while idling. Consequently, ATC usually gives approval to take off unless weather is right on top of the runway. In the end, despite pilots’ honest assessments of the weather situation, it looks bad if other pilots are taking off and they aren’t. This, along with sensitive duty-hour limits, breeds the “need-to-get-it-done” attitude often seen among pilots today.
The interviewees pointed out that pilots don’t taxi around with radar on, but rather
get a good indication of the weather situation when they pull onto the runway. Feedback
from prior departures also helps to paint a picture of the challenges following takeoff. All in all, there is not much planning time once the pilot has decided to take off. Departures
still receive weather updates from ATC when they are within 5 km of the airport, but the
unknown grows from there. Despite this uncertainty, it is theoretically much easier to deviate
around storm cells during ascent than descent because the pilot can turn in a wide variety of
directions immediately following takeoff. This implies that there should be fewer departure
penetrations than arrival penetrations, which is supported by our data.
5.1.4 Impact of Convective Weather on Arrivals
Descending towards the airport during a weather impact is very different from being a
departure because the plane must land at some point. The pilot does not have the luxury of
staying on the ground if the weather is too severe. Consequently, arrivals are given priority
over departures in bad weather and receive constant updates from ATC regarding the current
weather situation.
ATC is able to use Ground Delay and Ground Stop programs to slow down departures
and enable arrivals to perform larger deviations. However, the weather situation is not
always better in landing sectors vs. takeoff sectors. One would think this would be the case
if operations were being catered to arrivals. Yet arrival approach paths are restricted by
wind direction, sometimes forcing them to land on runways covered by convective weather. In these situations, penetration is imminent, but pilots must maneuver smartly in order to minimize exposure to weather while also landing safely.
Of course, pilots have the option to divert to an alternate airport. However, as on the departures side, it looks bad if other pilots are landing and you choose to divert. For O’Hare arrivals, the alternate airport is Midway, which is less than 20 miles away. Thus, it is doubtful that diverting to Midway would prevent penetration, as Midway most likely encounters the same weather patterns as O’Hare due to their proximity. Furthermore, the “need-to-get-it-done” attitude on the departures side transitions to “get-there-itis” on the arrivals side. The interviewees ranked the following as the top four sources of pressure on pilots to land as soon as possible: fuel limitations, coordination with ATC, flight behind schedule, and airline operations (AOC), in that order.
5.1.5 Summary of Interview Takeaways
From the interviews with professional pilots, we learned that ascending flights, descending flights, and en route flights at cruising altitudes all face different challenges when encountering convective weather. The following differ based on the phase of flight: visibility issues, wind issues, the type of precipitation and its effects, the strength of turbulence, the ability to avoid penetration, and, overall, how weather affects the flow of air traffic. Pilots, air traffic controllers, airline operations (AOC), and all personnel involved with air travel must be aware of these differences given a severe convective weather scenario.
5.2 Case Studies
Case studies and their corresponding trajectory plots are very helpful for understanding the
evolution and movement of weather within the terminal area and how this affects traffic
flows. As we discussed in Chapter 3, the trajectory plots helped to identify several potential
features in our models. In this section, we will focus on a few recurring themes that were
frequently observed in the case studies.
Each of the plots represents a snapshot of a single weather period and contains all trajectory points close to the airport within that 2.5-minute period. The connected red points represent individual arrival trajectories and the black lines represent individual departure trajectories. The circles represent the nose of the plane. The large black circles around the airport help to indicate the distance of a trajectory point from the airport, increasing incrementally by 10 km. If a flight’s trajectory intersects severe VIL pixels, we assume that the flight indeed penetrated severe convective weather because echo tops will be higher than the flight’s altitude close to the airport.
5.2.1 Theme 1: Pilots Try to Avoid Storm Cells
Although in this thesis we focus on why pilots are penetrating severe weather cells, the
trajectory plots often show pilots obviously trying to deviate around storm cells and find
gaps in weather to avoid penetration. If penetration is imminent, pilots will also attempt
to penetrate the lowest VIL areas within the storm cell. Figure 5.1 exhibits this avoidance
behavior.
Figure 5.1: Example of avoidance behavior by arrivals in the Southwest sector of the ORD
terminal area on July 9, 2008 at 001730Z.
The arrivals in the Southwest sector barely nick severe VIL cells as they fly through a gap in the frontal mass storm moving west to east just south of the airport. It is apparent that the pilots attempted to avoid not only the storm cells with the highest VIL but storm cells in general while gearing up for approach from the east. The departures turn immediately upon takeoff to avoid the frontal storm south of the airport. Arrivals do not have this flexibility, with approach and landing restricted by wind conditions, runway configuration, and other operating procedures.
However, there are rare cases when flights penetrate severe convective weather for seemingly no reason. Figure 5.2 provides an example of this behavior.
Figure 5.2: Example of unexplained penetration behavior by a departure in the Northwest
sector of the ORD terminal area on July 8, 2008 at 061730Z.
Upon takeoff, the departure flies straight into a VIL level 3 storm cell and remains in weather for a long period of time. There are no other flights in the vicinity that would prevent the departure from deviating. From this example, one may suggest that pilots do not consider VIL level 3 to be “severe”. Yet VIL does not provide us with specific details regarding weather conditions. Thus, the flight in Figure 5.2 may be experiencing light rain and limited convectivity, especially if the VIL values are on the low end of the VIL level 3 boundary. In addition, the weather may have worsened quickly, highlighting the uncertainty of weather forecasts.
5.2.2 Theme 2: Arrivals Have a Tougher “Go-of-It”
We have established in this thesis that arrivals hypothetically have a harder time avoiding severe convective weather in the terminal area. This is due to their restrictive approach and landing procedures as well as the fact that they must land at some point. Figures 5.3 and 5.4 show that arrival operations continue even when penetration is imminent.
Figure 5.3: Both departures and arrivals affected by weather in the West sector of the ORD
terminal area on July 2, 2008 at 222500Z.
In Figure 5.3 we see departures penetrating very high VIL levels despite turning immediately upon takeoff. The storm cell is so large that penetration is unavoidable. Arrivals
execute a trombone maneuver while trying to stay on the outer edges of the storm.
Less than ten minutes later, we see in Figure 5.4 that the massive storm cell has moved quickly from west to east, with VIL levels 5 and 6 now covering the airport. ATC has halted
Figure 5.4: Ground stop is issued due to weather covering the airport on July 2, 2008 at
223230Z.
all departure operations, while arrivals continue to execute the same trombone maneuver
as in the first image. Diversion to Midway will not alleviate the situation, but rather will
expose the arrivals to more opportunities for penetration. Thus, pilots brace themselves for
severe weather conditions and do their best to land the plane safely. This example reinforces the point that arrival penetration behavior is inherently different from departure penetration behavior.
5.2.3 Theme 3: Weather Is Unpredictable
Although the title seems obvious, the extent to which the movement of weather and its
changing level of strength affect terminal airspace was not explored in Lin’s thesis. We
devoted an entire set of model features to this weather behavior based on the case studies
we examined. Figures 5.5, 5.6, and 5.7 outline one of these case studies, which displays just how fast the severity of weather cells can change despite the slow movement of a frontal storm.
Figure 5.5: Arrivals executing approach and landing maneuvers amidst severe weather in
the Northwest sector of the ORD terminal area on August 22, 2008 at 173000Z.
In Figure 5.5, we see that a group of severe storm cells is on top of the airport, forcing
arrivals to penetrate while executing their approach/landing maneuver in the Northwest
sector. Ten minutes later, Figure 5.6 shows that two of the large storm cells have joined
and are surrounding the airport, while a large concentration of VIL level 6 pixels has formed
right along the arrival approach path. Penetrations of this severity are not sustainable, so
arrivals in Figure 5.7 begin to circumvent and fly behind the large storm cell north of ORD
as it moves west to east. Departure operations have resumed; the plot shows them flying
through areas of low VIL within the storm cell since the weather scenario dictates imminent
penetration upon takeoff.
This case provides an example of pilots adapting to the rapidly changing weather conditions close to the airport and minimizing collateral damage despite imminent penetration.
Figure 5.6: Concentration of VIL level 6 pixels forms in the middle of the arrival approach
path in the Northwest sector of the ORD terminal area on August 22, 2008 at 174000Z.
5.2.4 Case Study Wrap-Up
In this research, we have defined VIL levels of 3 or higher to be hazardous, as pilots tend
not to fly through them in terminal airspace. In reality, the situation is more complex, with
existing cases of pilots flying straight through level 3 weather and cases of pilots who seem to
be avoiding level 2 weather. Case studies allow visualization of these scenarios and help to
understand why pilots do what they do in the terminal area. Sequences of trajectory plots
tell a story, bringing the numbers in the data to life rather than trying to make sense of the
numbers themselves.
Figure 5.7: Arrivals begin to circumvent the storm cell as it moves west to east in order to
maintain the approach path in the Northwest sector of the ORD terminal area on August
22, 2008 at 175730Z.
6 Conclusions and Future Work
6.1 Thesis Summary and Conclusions
Through our use of predictive modeling, case studies, and pilot experience, we constructed
semi-dynamic models that correctly predict severe convective weather penetration in terminal
areas across the U.S. up to 99% of the time. We also extracted the relative importance
of features within these models in order to identify those that best correlate with
and influence pilot penetration. Our findings in this area reinforced those of Yi-Hsin Lin [14]:
the primary indicators of penetration remain weather-based, particularly the
presence of severe weather within a flight's trajectory projection.
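The general idea of ranking features by how much a model relies on them can be illustrated with a small permutation-importance sketch in pure Python. This is a hedged toy example: the thesis's actual models and features differ, and the synthetic data and threshold "model" below are hypothetical stand-ins.

```python
# A minimal, pure-Python sketch of permutation importance: measure how much a
# model's accuracy drops when one feature's values are shuffled, breaking its
# link to the label. Data and model here are hypothetical.
import random

random.seed(0)

# Synthetic data: feature 0 (think "severe weather on projected path") drives
# the label; feature 1 is pure noise.
X = [[random.random(), random.random()] for _ in range(200)]
y = [1 if row[0] > 0.5 else 0 for row in X]

def model(row):                        # trivial "trained" rule
    return 1 if row[0] > 0.5 else 0

def accuracy(X, y):
    return sum(model(r) == t for r, t in zip(X, y)) / len(y)

def permutation_importance(X, y, feature):
    base = accuracy(X, y)
    col = [r[feature] for r in X]
    random.shuffle(col)                # break the feature-label link
    X_perm = [r[:feature] + [v] + r[feature + 1:] for r, v in zip(X, col)]
    return base - accuracy(X_perm, y)  # accuracy drop = importance

print(permutation_importance(X, y, 0))  # large drop: the model depends on it
print(permutation_importance(X, y, 1))  # zero drop: noise feature is ignored
```

Tree-based models such as those built with rpart [20] report an analogous impurity-based importance; the permutation view shown here is simply the most model-agnostic way to convey the concept.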
Nevertheless, we found that a number of the operational features in our models correlate
weakly with severe convective weather penetration. Despite their lower importance values,
these features help shed light on the dynamics of the terminal area. In particular, the
importance of features that describe the behavior of other pilots in the terminal area may
help us understand how pilots and air traffic controllers deal with weather impacts close to
the airport. The most important conclusion was that pilots are more likely to penetrate
severe weather when other pilots ahead of them in the ascent/descent sequence have already
penetrated, as that is a good indication of what is to come. These results imply that rerouting
around weather is still often done on an ad hoc basis once a pilot reports his or her weather
penetration to ATC.
In conclusion, we hope that the robustness of our models allows for implementation across
the U.S. and that they serve as a supplemental tool to the existing terminal-area convective
weather mitigation strategies used by ATC.
6.2 Ideas for Future Work
A great deal of work remains to be done in understanding the impact of severe weather on
terminal-area operations and how pilots respond to severe weather scenarios. The following
sections address some of the shortcomings of this thesis and propose ideas for future research
in this area.
6.2.1 Expand Model Datasets
Because severe convective weather penetration is a rare event in the overall scheme of air
operations, our model datasets contain relatively few penetrating flights. With only three
months of trajectory data from 2008, we would ideally acquire data from more recent years
to expand our datasets, verify that the same patterns hold year over year, and retrain our
models. Otherwise, the predictive power of our models may be limited by the narrow range
of penetration behavior to which they have been exposed.
6.2.2 Incorporate Weather Forecasts
Examination of the weather forecasts distributed to pilots within the terminal area would help
paint a clearer picture of what information a pilot has upon encountering severe convective
weather. The challenge is that not all pilots receive the same information, since different
aircraft have different forecasting and communication capabilities. Thus, integrating advisory
features based on these forecasts into our predictive models would most likely not achieve
consistent results.
Nonetheless, we could compare penetration events to the terminal-area forecasts (of
varying lead time) to gauge 1) whether the pilot was aware of possible weather conditions and
2) whether penetration could have been avoided via advance deviation. If a large proportion
of penetration events resulted from unforecast weather, failure to avoid the weather can
be attributed to the inaccuracy of the forecasting methods rather than to the pilot, who could
not feasibly deviate in time. On the other hand, if the forecasts largely match the weather
that actually occurred, this would imply that such weather could have been avoided by the pilot.
In this case, penetration may be classified as a calculated decision on the part of the pilot
or ATC. Imminent penetration close to the airport, particularly during takeoff and landing,
is a third possibility that is less a question of forecast accuracy than of the
decision-making process of Ground Control.
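The three-way categorization proposed above could be sketched as a simple classification rule. The field names and the 10-minute deviation window below are hypothetical, chosen only to make the logic concrete.

```python
# Hypothetical sketch of the proposed comparison: classify each recorded
# penetration event against the forecast available at a given lead time.
# The deviation window and inputs are illustrative, not from the thesis.
def classify_event(forecast_severe, minutes_to_airport, deviation_window=10):
    """forecast_severe: did the forecast predict severe weather on the path?
    minutes_to_airport: time from the encounter to touchdown/takeoff."""
    if minutes_to_airport <= deviation_window:
        return "imminent"      # too close to the airport to deviate in time
    if not forecast_severe:
        return "unforecast"    # attributable to forecast error, not the pilot
    return "avoidable"         # forecast matched: a calculated decision

print(classify_event(True, 25))   # avoidable
print(classify_event(False, 25))  # unforecast
print(classify_event(True, 5))    # imminent
```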
6.2.3 Additional Weather Features
In addition to improving the precision of the current weather-based features in our models,
it may be beneficial to include additional weather features such as wind conditions,
turbulence levels, and NASA's Weather Impacted Traffic Index (WITI).
Convective weather is typically accompanied by strong winds and turbulence. The ASPM
database contains measures of these weather factors, but these measures are only accurate
within 10 km of the airport. Additionally, they are not specific to aircraft location, instead
summarizing the area as a whole. Doppler data that measures these factors on a pixel-by-pixel
basis would strengthen our models by differentiating between heavy precipitation and
convective weather during penetration classification.
Moreover, WITI captures hourly traffic flow information that our current models do not
address, quantifying the level of congestion in weather-impacted areas. WITI is primarily
used by ATC to determine when to implement ground delays for departures and rerouting
schemes for arrivals. WITI is robust, as it can be applied to both forecast weather and
actual weather. Overall, WITI could augment the already strong core of weather-based
features in our models by capturing air traffic flows with quantitative values.
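The "traffic times weather" idea behind WITI can be conveyed with a short sketch. This is a simplified, hypothetical formulation for illustration only; actual WITI definitions differ in detail, and the grids and threshold here are invented.

```python
# A simplified, hypothetical WITI-style index: for each grid cell, multiply
# the aircraft count by an indicator of hazardous weather, then sum over the
# grid. Real WITI formulations differ in detail.
def witi(traffic, weather, hazard_level=3):
    """traffic, weather: same-shaped 2-D lists (aircraft counts, VIL levels)."""
    return sum(
        t * (1 if w >= hazard_level else 0)
        for t_row, w_row in zip(traffic, weather)
        for t, w in zip(t_row, w_row)
    )

traffic = [[4, 0], [2, 5]]     # aircraft per cell
weather = [[3, 1], [0, 4]]     # VIL level per cell
print(witi(traffic, weather))  # 4*1 + 0*0 + 2*0 + 5*1 = 9
```

Computed hourly, such a value would give the models a single quantitative measure of how much traffic is exposed to hazardous weather.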
6.2.4 Taking an Alternative Approach: Human Factors
It is no secret that pilots differ with regard to personality, training, experience, and
background. For example, one pilot may be more risk-tolerant than another, or a given pilot
may have more experience in severe weather scenarios than his or her peers. Additionally,
some pilots have “home base” airports with which they are more familiar. These factors,
which capture individual variability among pilots, were not explored in this thesis because
such data do not exist.
Future studies could take a completely different approach from this thesis and investigate
how this pilot variability affects penetration behavior using flight simulators. In these
simulators, pilots of different backgrounds and experience levels would be subjected to
severe weather encounters, and their behavior would be recorded and assessed. Common
behavioral trends that correlate with certain personality traits or experience levels could
then spur modified training procedures.
References
[1] Air traffic control system command center: enhanced traffic management system (ETMS). Available at http://www.fly.faa.gov/Products/Information/ETMS/etms.html (2006).
[2] ASPM system overview. Available at http://aspmhelp.faa.gov/index.php/ASPM System Overview (2014).
[3] Aviation weather research: consolidated storm prediction for aviation (CoSPA). Available at https://www.ll.mit.edu/mission/aviation/aviationwxresearch/cospa.html (2013).
[4] A. Bayaga. Multinomial logistic regression: usage and application in risk analysis. Journal of Applied Quantitative Methods, 5(2):288-297, 2010.
[5] S. Campbell and R. DeLaura. Convective weather avoidance modeling in low-altitude
airspace. In AIAA Modeling and Simulation Technologies Conference, Portland OR,
2011.
[6] S. Campbell, M. Matthews, and R. DeLaura. Evaluation of the convective weather
avoidance model for arrival traffic. In 12th AIAA Aviation Technology, Integration, and
Operations (ATIO) Conference, Indianapolis IN, 2012.
[7] R. DeLaura and J. Evans. Exploratory study of modeling enroute pilot convective storm
flight deviation behavior. In 12th American Meteorological Society Conference on Aviation, Range and Aerospace Meteorology, Atlanta GA, 2006.
[8] R. DeLaura, M. Robinson, M. Pawlak, and J. Evans. Modeling convective weather
avoidance in enroute airspace. In 13th American Meteorological Society Conference on
Aviation, Range, and Aerospace Meteorology, New Orleans LA, 2008.
[9] E.A. Dinsdale. Multivariate analysis of functional metagenomes. Technical report, San
Diego CA, 2013.
[10] J. Erdman. 10 worst weather U.S. airports. Available at http://www.weather.com/news/news/10-worst-weather-airports-20131125 (2013/11/25).
[11] W.H. Greene. Econometric Analysis. Prentice Hall, New York NY, 5th edition, 2003.
[12] H. Griffioen. Air Crash Investigations: The Crash of Air France Flight 358. Lulu Enterprises, London UK, 2009.
[13] G. Kulesa. Weather and aviation: how does weather affect the safety and operations of
airports and aviation, and how does FAA work to manage weather-related effects? In
The Potential Impacts of Climate Change on Transportation, pages 199-208. U.S. Department of Transportation Center for Climate Change and Environmental Forecasting,
Washington DC, 2003.
[14] Y. Lin. Prediction of Terminal-Area Weather Penetration Based on Operational Factors. Master's thesis, Massachusetts Institute of Technology, Department of Civil and Environmental Engineering, Cambridge MA, 2013.
[15] M. Matthews and R. DeLaura. Modeling convective weather avoidance of arrivals in the
terminal airspace. In American Meteorological Society Special Symposium on Weather
and Air Traffic Management Integration, Seattle WA, 2011.
[16] D. M. Pfeil. Optimization of Airport Terminal-Area Air Traffic Operations under Uncertain Weather Conditions. PhD thesis, Massachusetts Institute of Technology, Sloan
School of Management, Operations Research Center, Cambridge MA, 2011.
[17] D.A. Rhoda and M.L. Pawlak. An assessment of thunderstorm penetrations and deviations by commercial aircraft in the terminal area. Technical report prepared for NASA
Ames Research Center, Lexington MA, 1999.
[18] L. Rokach and O. Maimon. Data Mining with Decision Trees, volume 3204 of Machine
Perception and Artificial Intelligence. World Scientific, London UK, 2008.
[19] C.H. Snyder. Aviation weather for pilots and flight operations personnel. Technical
manual prepared by the FAA Academy, Washington D.C., 1975.
[20] T.M. Therneau and E.J. Atkinson. An introduction to recursive partitioning using the
rpart routines. Technical report, Rochester MN, 2014.
[21] What is the variable importance measure? Available at http://www.salford-systems.com/blog/company/item/40-what-is-the-variable-importance-measure? (2014).