Academic Science, International Journal of

Weather Optimized Routing Algorithm for Aircrafts
Hari Iyer
Pursuing B.E., Department of Computer Engineering,
Dwarkadas J. Sanghvi College of Engineering, Mumbai, India
hariiyer1@gmail.com
Harsh Desai
Pursuing B.E., Department of Computer Engineering,
Dwarkadas J. Sanghvi College of Engineering, Mumbai, India
harsh301994@gmail.com
Darshan Bhansali
Pursuing B.E., Department of Computer Engineering,
Dwarkadas J. Sanghvi College of Engineering, Mumbai, India
darshan941018@gmail.com
Abhijit Patil
Assistant Professor, Department of Computer Engineering,
Dwarkadas J. Sanghvi College of Engineering, Mumbai, India
abhijit.patil@djsce.ac.in
ABSTRACT
Aviation data analysis is a prominent and vital source of test and statistical data that can be used for heuristics and performance-evaluation applications in commercial aviation. This paper focuses on mining data from online weather resources, which can be queried in real time, to determine the optimal path for flying a particular leg. Data clustering, geocoding, Earth geometry, and the Google Maps API together provide a well-suited set of tools to structure, model, evaluate, and represent the data. Word Sense Disambiguation (WSD) and Natural Language Processing (NLP) techniques are used to interpret weather data, which is originally fetched as natural-language text. Mathematical analysis of this information yields usable, use-case-relevant datasets. Using all these services, the aim is to select the best possible flight path from the currently operational routes.
General Terms
WSD, NLP, API.
Keywords
Aviation Data Analysis, Geocoding, Earth Geometry, Google Maps, Word Sense Disambiguation, Natural Language Processing.
1. INTRODUCTION
The past decade saw many air accidents, which were investigated for flaws in the systems in place. Unpredicted weather fluctuations were found to be one of the major causes of a large share of these incidents. Modern airlines make a point of analyzing the weather the crew will experience during the flight, which allows precautions and checklists to be added to routine flight operations. However, there is considerable room for improvement in this process. Weather forecasts are available to developers through various free APIs, airports hold historical flight data that can be used for statistical delay analysis, and Earth geometry can be used to calculate distance-related fuel consumption. The goal is a system that provides a complete package of safest-route determination, flight-time optimization, and fuel efficiency. A survey of existing data mining techniques for extracting weather data is presented here. However, the retrieved data is usually free text, which a machine cannot directly interpret in terms of weather polarity, i.e., whether the reported conditions are favorable or unfavorable for flying. To solve this issue, a new approach to natural language processing is adopted. The algorithm follows a stratified methodology that delivers all the features to the customer use-cases, which form the topmost layer of the bottom-up model. Adding a third dimension, altitude, to the existing (latitude, longitude) coordinates increases the range and scope of alternative routes.
Because this model is generic to all airlines, it enables resource pooling and eases the burden on individual systems. A user-friendly representation protocol makes the results easy and quick to read. Flight-path suggestions are the maximum output of this system; the authority that decides which flight path is selected is the airline or, in some cases, the airport authority. Flight data is available as a free resource from many firms, such as FlightAware. Although rarely shared, an airport's own flight history data is the most reliable source of facts and figures. In this paper, the above-mentioned features are integrated into a single tool.
2. LITERATURE REVIEW
2.1 Data extraction and mining
The volume of electronic information and business data is growing rapidly. Data mining is applied mainly in business analytics, web data analysis, text analysis, social-science problems, and many other domains in which hidden information has to be retrieved. Data mining comprises many techniques that are used to discover patterns in large databases and to support decision making. Analyzing large amounts of data is an emerging trend that helps create an efficient working environment. Data mining methods such as clustering, classification, outlier detection, association, and pattern matching are used to analyze data and, ultimately, to support efficient decision making. A typical system implements a set of data mining strategies: classification, estimation, prediction, association, and clustering. Classification assigns a class to a given data instance; in our system, for example, received weather data can be classified as safe or unsafe based on past weather reports. Estimation and prediction models determine values that are not known with certainty, e.g., predicting near-future weather activity from past weather data. Association rules are hidden relationships between attributes that improve the quality of the output: when finding the optimal route for an aircraft, associating attributes such as the distance between source and destination and the safety level of the route helps find the best result from all the available data. Finally, clustering groups data items that meet similar criteria. Here, weather data is grouped into two clusters: 1) data instances where the weather is declared safe, and 2) data instances where it is declared unsafe. These are a few of the strategies used to implement data mining. Along similar lines, data extraction retrieves data from sources that are usually not properly structured, so that it can later be stored and processed for more efficient analysis of a particular system.
2.2 Flight routing approaches
Many important parameters must be considered before the flight path is finalized. Normally, the pilot or the dispatch agency requests a route from the controlling agency when filing a flight plan before departure. They may request any route that is legal and feasible; on receiving the flight plan, the controlling agency considers several factors and determines a suitable flight plan in coordination with other traffic. Shortly before the scheduled departure time, the pilot requests clearance, and the controlling agency issues the route to be flown, which may differ from the route originally filed. Between any two points, i.e., the source and the destination, the shortest route over the Earth's surface is the "great circle" route. Several factors cause the flight route to deviate from the great circle. The first is air traffic: if many aircraft are using the same corridor and it becomes congested, the flight has to choose another corridor. The second, and very important, factor is wind and weather: for instance, if the headwinds on a particular path are too strong, or the weather along it is violent, the aircraft has to choose another path that avoids them. Third, the flight path will deviate from the great circle if there are no suitable diversion airports along the selected route. Alternate airports are required so that, in an emergency, the aircraft can land at the nearest airport and avoid a mishap. Other things being equal, the direct great-circle route is the most preferable flight route. Past accidents in the aviation industry suggest that weather is the most crucial factor to consider. Consider Indonesia AirAsia Flight QZ8501, which crashed into the Java Sea; thunderstorms along its route were initially suspected as the cause of the crash. If weather was responsible, a flight plan that avoided such thunderstorms could have prevented it.
3. STUDY OF DATA MINING TECHNIQUES
3.1 k-means
The k-means algorithm is a data mining technique used to
partition objects into clusters.
3.1.1 Introduction
In signal processing, the k-means algorithm is primarily used for vector quantization. It is also used for clustering data in data mining, and it has proved effective in clustering large data sets[5]. It can successfully cluster both numeric data and real-world unstructured data. In clustering, objects are partitioned so that objects with greater similarity fall in the same cluster, and many such clusters are formed from the partitioned data. Given n objects, each a d-dimensional vector, k-means partitions them into k sets so as to minimize the within-cluster sum of squares (WCSS), i.e., the sum over all clusters of the squared distances between each object and its cluster mean. Each object is assigned to the cluster with the nearest mean, and the new mean of each newly formed cluster is then calculated as its centroid. Repeating these two steps never increases the WCSS, so the process converges, although possibly to a local rather than a global minimum.
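The assign-then-update loop described above is Lloyd's algorithm, the standard way of computing k-means. A minimal, dependency-free Python sketch follows; the helper names, the initialization from k randomly chosen input points, and the iteration cap are our own choices for illustration, not details taken from this paper.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two d-dimensional points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(members):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(members) for coords in zip(*members))

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm. Returns (centroids, labels, wcss)."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]  # k distinct input points
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster whose mean is nearest.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Update step: each mean becomes the centroid of its newly formed cluster.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centroids.append(centroid(members) if members else centroids[j])
        if new_centroids == centroids:
            break  # converged: another pass would change nothing
        centroids = new_centroids
    wcss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, wcss

data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels, wcss = kmeans(data, k=2)
print(labels, round(wcss, 3))
```

With two well-separated groups, the three points near the origin end up in one cluster and the three near (10, 10) in the other; the reported WCSS is the objective the loop has driven down.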
3.1.2 Drawbacks
K-means is a widely used method in cluster analysis. Given a d-dimensional data set, the algorithm optimizes the WCSS, so k-means is essentially an optimization problem. However, it suffers from a few drawbacks. It implicitly assumes that the distribution of objects in each cluster is spherical, that the variance of the different variables is the same, and that all clusters are of approximately the same size. When any one of these assumptions fails, k-means can produce poor clusters.
3.2 Decision Trees
A decision tree consists of a root node, internal nodes, branches, and leaf nodes. Each internal node denotes a test on an attribute, and each branch denotes an outcome of that test.
3.2.1 Introduction
Decision trees build classification or regression models in the form of a tree structure. The aim is to design a model capable of predicting a target value from a large number of input variables. The tree is learned by splitting the data set into subsets based on attribute tests; in other words, a decision tree is developed incrementally by breaking the data set down into smaller and smaller subsets. Each decision node has two or more branches, each corresponding to an outcome of its attribute test. Decision trees can handle both categorical and numerical real-world data.
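The greedy, node-by-node splitting described above can be illustrated with information gain (the ID3 criterion), which is one common split criterion among several; the paper does not name a specific one. The attributes and safe/unsafe labels below are hypothetical toy data chosen to mirror the safe/unsafe classification of Section 2.1.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Reduction in entropy obtained by splitting the rows on one categorical attribute."""
    total = len(rows)
    remainder = 0.0
    for value in set(row[attribute] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attribute] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def best_split(rows, labels, attributes):
    """Greedy, locally optimal choice made at a single decision node."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))

# Hypothetical weather observations and their safe/unsafe labels.
rows = [
    {"precip": "none", "wind": "light"},
    {"precip": "none", "wind": "strong"},
    {"precip": "rain", "wind": "light"},
    {"precip": "storm", "wind": "light"},
    {"precip": "storm", "wind": "strong"},
    {"precip": "rain", "wind": "strong"},
]
labels = ["safe", "safe", "safe", "unsafe", "unsafe", "safe"]
print(best_split(rows, labels, ["precip", "wind"]))
```

Splitting on "precip" separates the classes perfectly, whereas "wind" leaves the class mix unchanged, so the greedy step picks "precip". A full learner would recurse on each subset, which is exactly where the lack of a global-optimality guarantee discussed below comes from.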
3.2.2 Drawbacks
Learning an optimal decision tree is NP-complete under several definitions of optimality, which makes it difficult. As a result, decision tree algorithms use greedy heuristics that make locally optimal decisions at each node and cannot guarantee a globally optimal tree[8]. Pruning techniques have to be employed to avoid creating overly complex trees. In addition, decision trees face issues with missing values and when super attributes come into play. Converting numerical data into equivalent categorical data also introduces the problem of binning.
3.3 Artificial neural network
An artificial neural network is a system that tries to replicate a biological neural network such as the human brain.
3.3.1 Introduction
Artificial neural networks (ANNs) are used extensively in machine learning and data mining. An ANN attempts to build a system that functions like the neurons in the human brain. Although the task is daunting, ANNs have been applied to data mining with a certain degree of success. In an ANN[1], a large number of artificial nodes are built that function as neurons. Nodes are connected to one another (in a typical feed-forward network, every node in one layer connects to every node in the next), and each connection is assigned a weight indicating whether it is strong or weak. Numerical data is fed to the input nodes, and each node is assigned an activation value. Activation is passed between two nodes in proportion to the strength of the connection they share. The activations then flow through the hidden layers of the network until they reach the output nodes, where the result is presented to the end user in a meaningful form.
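The flow of activations described above corresponds to a forward pass through a feed-forward network. The sketch below assumes fully connected layers, one bias per node, and a sigmoid activation; the paper specifies none of these, and the weights in the example are arbitrary values chosen only to show the mechanics.

```python
import math

def sigmoid(z):
    """Squash a node's weighted input into an activation between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(inputs, layers):
    """Propagate activations through fully connected layers.

    Each layer is (weights, biases), where weights[j][i] is the strength of the
    connection from node i of the previous layer to node j of this layer.
    """
    activations = list(inputs)
    for weights, biases in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations

# Arbitrary example: 2 input nodes -> 2 hidden nodes -> 1 output node.
hidden = ([[0.5, -1.0], [1.5, 0.25]], [0.0, -0.5])
output = ([[1.0, -2.0]], [0.1])
print(forward([0.8, 0.2], [hidden, output]))
```

Training (adjusting the weights from data) is a separate step not shown here; it is where the under- and over-training problems discussed next arise.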
3.3.2 Drawbacks
Because the process is complex, an effective artificial neural network requires a large amount of resources. For data mining on smaller data sets, the ANN approach may prove infeasible compared with other approaches. ANNs also often suffer from under-training or over-training[2], so it is important to train them on the right data set and in a balanced manner.
4. ALGORITHM
A stratified, bottom-up approach is adopted to enable cost- and space-optimization-based testing between state transitions. This flow of operations can reduce the individual workload on airlines around the world. Real-time forecasts, route suggestions, an interactive map interface, and many other features are pooled into the system. The algorithm can be stated layer by layer as follows.
4.1 Preliminary operations
In this stage, all the inputs required by the system are taken from the user or retrieved from a third-party system; they provide the data on which the algorithm operates. The steps are as follows:
1. Input the flight source, flight destination, and scheduled departure time.
2. Calculate the distance:
The distance is calculated from the geographical coordinates of the supplied source and destination. The latitude and longitude are fetched from the Google Maps Geocoding API, and the haversine formula is then used to determine the great-circle (aerial) distance between the two points. The algorithm is as follows:
function dist(a, b, x, y)
{
    // (a, b) is the source and (x, y) the destination, as (latitude, longitude) in degrees.
    var R = 6371000;                                   // mean Earth radius in metres
    var toRadians = function (deg) { return deg * Math.PI / 180; };
    var lat1 = toRadians(a);
    var lat2 = toRadians(x);
    var diff_lat = toRadians(x - a);
    var diff_lon = toRadians(y - b);
    var h = Math.pow(Math.sin(diff_lat / 2), 2) +
            Math.cos(lat1) * Math.cos(lat2) * Math.pow(Math.sin(diff_lon / 2), 2);
    var c = 2 * Math.atan2(Math.sqrt(h), Math.sqrt(1 - h));
    var d = R * c;                                     // great-circle distance in metres
    return d;
}
3. Calculate the minimum-distance path hierarchy:
// d = dist(src, dest) from step 2, in metres.
// dist(P, Q) is shorthand for dist(P_lat, P_lon, Q_lat, Q_lon).
if (d is approximately 20,000 km)
{
    // Flights of 20+ hours.
    // Source - London - Destination:
    A = dist(src, LHR) + dist(LHR, dest);
    // Source - Hong Kong - Destination:
    B = dist(src, HKG) + dist(HKG, dest);
    // Source - North Pole - Destination:
    C = dist(src, NorthPole) + dist(NorthPole, dest);
    path[] = ascending_order(A, B, C);   // shortest candidate route first
}
else
{
    // Direct flights:
    path[] = [dist(src, dest)];
}
Illustration 1
Consider the example of Mumbai (BOM) to Los Angeles (LAX).
Case 1: BOM – LHR – LAX.
The coordinates of the source, stop, and destination are stored row by row in the matrix Coordinates[][]:
i    Coordinates[i][0]    Coordinates[i][1]
0    BOM_lat              BOM_lon
1    LHR_lat              LHR_lon
2    LAX_lat              LAX_lon
Case 2: BOM – HKG – LAX.
Same as Case 1, except that the LHR coordinates are replaced by those of HKG. A similar Coordinates matrix is maintained for this case.
Case 3: BOM – NORTH POLE – LAX.
In this case, the North Pole is treated as an airport (a waypoint), since the route the aircraft follows once it crosses to the other side of the globe depends on the destination. The coordinates are again maintained in a 2-dimensional array.
At the end of this step, the routes to be analyzed for weather have been decided and ordered by distance. Coordinates[][] and path[] are passed as inputs to the next step.
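Steps 2 and 3 of the preliminary operations can be sketched in Python as follows, assuming the coordinates are already available in degrees (the Google Maps Geocoding call is not reproduced). The function names haversine_m, route_length_m and select_paths, the parameter long_haul_m, and the reading of "approximately 20,000 km" as a simple threshold are our own assumptions; the airport coordinates are rounded approximations used only for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres, as in dist()

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    h = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    h = min(1.0, h)  # guard against rounding pushing h just above 1
    return 2 * EARTH_RADIUS_M * math.atan2(math.sqrt(h), math.sqrt(1 - h))

def route_length_m(points):
    """Total length of a route given as a list of (lat, lon) points."""
    return sum(haversine_m(*p, *q) for p, q in zip(points, points[1:]))

def select_paths(src, dest, vias, long_haul_m=20_000_000):
    """Return candidate routes as (name, length_m), shortest first.

    Hypothetical reading of step 3: if the direct distance reaches long_haul_m,
    compare one-stop routes through each via-point; otherwise fly direct.
    """
    direct = haversine_m(*src, *dest)
    if direct < long_haul_m:
        return [("direct", direct)]
    candidates = [(name, route_length_m([src, via, dest])) for name, via in vias.items()]
    return sorted(candidates, key=lambda c: c[1])

# Approximate coordinates in degrees; the North Pole is treated as a waypoint.
BOM, LAX = (19.09, 72.87), (33.94, -118.41)
VIAS = {"LHR": (51.47, -0.45), "HKG": (22.31, 113.91), "North Pole": (90.0, 0.0)}

# long_haul_m=0 forces the via-point comparison for this example.
for name, length in select_paths(BOM, LAX, VIAS, long_haul_m=0):
    print(f"BOM-{name}-LAX: {length / 1000:.0f} km")
```

The sorted output plays the role of path[]: index 0 is the shortest candidate. By the triangle inequality on the sphere, no one-stop route can be shorter than the direct great-circle distance.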
4.2 Coordinate-wise weather analysis
This process analyzes each route and plots it on the map. It is a step-by-step procedure, stated as follows:
The 0th, 1st, and 2nd routes are selected in the order given by path[]. For each of the three routes, the Coordinates matrix contains the source, stop, and destination points, which are analyzed individually for point progression and related calculations[10]. The working is as follows:
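for (i = 0; i < 2; i++)
{
    // Leg i runs from point i to point i+1 (source -> stop, then stop -> destination).
    slope[i] = equation_builder(coordinates[i][0], coordinates[i][1],
                                coordinates[i+1][0], coordinates[i+1][1]);
}
/* This passes the current and next (lat, lon) pair to the equation builder. */
function equation_builder(a, b, c, d)
{
    // (a, b) and (c, d) are the (lat, lon) end points of a leg;
    // latitude plays the role of x and longitude the role of y.
    slope = (d - b) / (c - a);   // dy/dx; undefined when c == a (a due east-west leg)
    intercept = b - slope * a;   // so that y = slope * x + intercept
    return slope;
}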
Thus, the equations for both legs of the flight are calculated. The main purpose of this operation is to keep track of the latitude and longitude increment factor and the directional heading. The shortest path for a flight is approximated here by a straight line drawn from source to destination in (latitude, longitude) coordinates; hence, the focus is on linear propagation.
Working of linear propagation:
Consider a line with end points A(2, 3) and B(5, 7):
diff_x = 5 - 2 = 3;
diff_y = 7 - 3 = 4;
slope = diff_y / diff_x = 4/3;
for (x = a; x <= c; x += delta)
{
    y = b + slope * (x - a);   // (a, b) is the source point
}
Each (x, y) is the next point on the line; for example, with delta = 0.25, the first step from A gives (2.25, 3 + (4/3)(0.25)) ≈ (2.25, 3.33). This concept is used to find the latitude and longitude of successive points on a flight path. equation_builder returns the slope, which is used with this incremental process to evaluate the weather at every point. The step value delta is set to 0.25 to ensure close coverage.
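The incremental stepping above can be sketched in Python. Unlike the pseudocode, this hypothetical propagate function also steps backwards when the end latitude is smaller than the start latitude, and it always includes the end point; the default step of 0.25 matches the paper's delta.

```python
def propagate(a, b, c, d, delta=0.25):
    """Sample points on the straight line from (a, b) to (c, d).

    x (latitude) advances by delta towards c; y (longitude) follows
    y = b + slope * (x - a). The end point (c, d) is always included.
    """
    if c == a:
        raise ValueError("constant-latitude leg: slope dy/dx is undefined")
    slope = (d - b) / (c - a)
    step = delta if c > a else -delta
    n_steps = int(abs(c - a) / delta)
    points = [(a + k * step, b + slope * k * step) for k in range(n_steps + 1)]
    if abs(points[-1][0] - c) < 1e-9:
        points[-1] = (c, d)    # snap the last sample exactly onto the end point
    else:
        points.append((c, d))  # delta does not divide the leg evenly
    return points

# Worked example from the text: A(2, 3) to B(5, 7), slope 4/3.
print(propagate(2, 3, 5, 7)[:3])
```

Each sampled (x, y) pair is a point at which the weather would be queried; the constant-latitude case would need longitude as the stepping variable instead, which this sketch does not attempt.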
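// Now, the weather en route is calculated using this routine:
Calculate_weather(a, b, c, slope);
function Calculate_weather(a, b, c, slope)
{
    // (a, b) is the start of the leg and c the latitude of its end point;
    // as written, the loop assumes c >= a.
    for (current_x = a; current_x <= c; current_x += 0.25)
    {
        current_y = b + slope * (current_x - a);
        // Store the polarity of the forecast at this point in weather_data[].
        weather_data.push(get_Polarity(current_x, current_y));
    }
}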
/* The following function computes the polarity of the weather verdict at (x, y): */
function get_Polarity(x, y)
{
    // Initialized only once, on the first call:
    static time = flight_departure_time;
    time += 5 minutes;   // estimated time at which the aircraft reaches (x, y)
    data = file_get_contents("forecast.io?lat=x&lon=y");
    // The polarity of data is computed as described below and returned to the caller.
    return polarity;
}
The output of "data" would be as follows:
Date        Time      Location  Weather
07/07/2015  07:05:00  (x, y)    Light rains
07/07/2015  07:10:00  (x, y)    Thunderstorm
07/07/2015  07:15:00  (x, y)    Rough air
Illustration 2
This data has to be mined for the most accurate result. Because the time at which the aircraft reaches each point has been updated, the entry used from the retrieved data has to be matched closely with the flight's current position in time; the procedure above is used for that. The forecast entries are 5 minutes apart. Let x be the time of the entry immediately before Flight_time, the time at which the aircraft reaches the point (here x denotes a time, not a latitude):
Let diff1 = Flight_time - x;
Let diff2 = (x + 5) - Flight_time;
Before the weather is analyzed further, the polarity of each individual point's weather conditions has to be calculated. For that, the following algorithm is used.
Consider weather = "Light winds and thunderstorms";
Each word contained in the verdict string is assigned a neutral, positive, or negative polarity. So,
Light – Positive.
Winds – Neutral.
And – Neutral.
Thunderstorms – Negative.
A negative component has a far greater ill effect on the flight than the benefit of a positive forecast. The scale is therefore set up as:
0 – Neutral.
1 – Positive.
-3 – Negative.
Thus, negative polarity dominates whenever it is present.
After the polarity of the weather verdict is obtained, it has to be fitted into the time frame. Using the section formula with diff1 and diff2 as defined above, the current weather polarity is:
polarity = (diff2*polarity(x) + diff1*polarity(x+5)) / 5;
Since diff1 + diff2 = 5, this is a linear interpolation between the two forecast entries that bracket the flight time. After these steps, the polarity is returned to the calling function, and once the routines above have been executed for every point, weather_data[] is populated with the polarity of the weather conditions along the path.
4.3 Clustering
To make this data quick to interpret and understand, clustering is applied to the container storing it. Weather can change abruptly and unpredictably: conditions may deteriorate or improve without prior symptoms or warning. So, a label should be associated with every area, stating whether it is suitable or dangerous to fly in that zone. For this, weather_data[] is divided into chunks that are aggregated to find distinct regional statistics[9]. This process is known as clustering. The algorithm for data clustering is as follows:
i = 0, head = 0;   // i: cluster number; head: index at which the current cluster starts
function cluster(weather_data[])
{
    while (head < length(weather_data))
    {
        // Extend the cluster while consecutive points keep the polarity sign of weather_data[head].
        j = head;
        temp = [];
        while (j < length(weather_data) && sign(weather_data[j]) == sign(weather_data[head]))
        {
            temp.push(weather_data[j]);
            j++;
        }
        avg = average(temp[]);
        cluster_assign_polarity(avg, weather_data[], i);
        /* This assigns a cluster_polarity to the groups formed in weather_data[]. */
        i++;
        head = j;   // the next cluster starts where this one ended
    }
}
Illustration 3
The above routine groups weather_data[] into zones of safe or unsafe flying conditions, which makes the data quick to read and supports efficient route decisions. If the last cluster of weather_data[] shows negative polarity, nearby airports will be suggested, ordered by radial distance, for alternate landing. The cluster-based system is used to find the optimal path: path[0], path[1], and path[2] each have individual weather data for their respective routes. The next section explains how these corridors are selected.
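The word-level polarity scale, the section-formula interpolation, and the sign-based clustering can be sketched together in Python. The lexicon beyond the paper's four example words, the choice to sum word scores into a verdict score, and the (start, end, average) output of cluster are assumptions made for illustration.

```python
# Scale from Section 4.2: negative verdicts outweigh positive ones.
POLARITY = {"positive": 1, "neutral": 0, "negative": -3}

# Hypothetical lexicon: the example words from the text plus a singular form.
LEXICON = {"light": "positive", "winds": "neutral", "and": "neutral",
           "thunderstorms": "negative", "thunderstorm": "negative"}

def verdict_polarity(verdict):
    """Score a verdict string by summing word polarities (unknown words are neutral)."""
    return sum(POLARITY[LEXICON.get(word, "neutral")] for word in verdict.lower().split())

def interpolate(p_x, p_x5, diff1):
    """Section formula between the forecasts at x and x + 5 minutes, where diff1 = t - x."""
    diff2 = 5 - diff1
    return (diff2 * p_x + diff1 * p_x5) / 5

def sign(value):
    """-1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)

def cluster(weather_data):
    """Split consecutive points into runs of equal polarity sign.

    Returns a list of (start_index, end_index, average_polarity) tuples.
    """
    clusters, head = [], 0
    while head < len(weather_data):
        j = head
        while j < len(weather_data) and sign(weather_data[j]) == sign(weather_data[head]):
            j += 1
        run = weather_data[head:j]
        clusters.append((head, j - 1, sum(run) / len(run)))
        head = j  # the next cluster starts where this one ended
    return clusters

print(verdict_polarity("Light winds and thunderstorms"))
print(cluster([1, 1, -3, -2, 0, 1]))
```

Under the summing assumption, "Light winds and thunderstorms" scores 1 + 0 + 0 - 3 = -2, so the single negative word outweighs the positive one, as the -3 weighting intends. Neutral points form their own clusters here because their sign differs from both positive and negative runs.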
4.4 Optimized Routing
The weather data finally obtained has to be mapped onto real-time flight use-cases. The source, destination, and departure time of the flight are given as input by the user or airline. The protocol for determining the safest route is as follows:
path[0] > path[1] > path[2] … (preference order).
cluster_polarity(weather_data[]) should be maximized.
The overall cluster_polarity of each path is determined; the most favorable route is given priority, and the remaining routes follow in descending order of weather harmlessness. In the rare case of identical weather conditions on more than one route, altitude is taken into consideration: if changing the altitude improves a candidate route's chance of becoming the optimal path, the priority is revised while distance optimality is maintained. Routes are presented through a user-friendly interface built with the Google Maps API and modern web technologies. All three routes are displayed clearly, with selection options and a color code indicating the optimal solution to the user's query.
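The ranking protocol above can be sketched as a sort with a tie-break. The paper does not say how a route's overall cluster_polarity is aggregated, so summing the per-cluster averages is an assumption here, and the altitude re-check for tied routes is not modeled; ties simply fall back to the distance order of path[]. The Route type and the cluster polarities below are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Route:
    name: str
    preference: int                  # position in path[] (0 = shortest)
    cluster_polarities: List[float]  # average polarity of each weather cluster on the route

def overall_polarity(route):
    """Assumption: a route's overall cluster_polarity is the sum of its cluster averages."""
    return sum(route.cluster_polarities)

def rank_routes(routes):
    """Highest overall polarity first; ties fall back to the distance preference order."""
    return sorted(routes, key=lambda r: (-overall_polarity(r), r.preference))

# Hypothetical cluster polarities for the three BOM-LAX candidates.
candidates = [
    Route("BOM-LHR-LAX", 0, [1.0, -3.0]),
    Route("BOM-HKG-LAX", 1, [1.0, 0.5]),
    Route("BOM-North Pole-LAX", 2, [0.5, 1.0]),
]
print([r.name for r in rank_routes(candidates)])
```

Here the shortest candidate drops to last because of its negative cluster, while the two routes with equal weather scores keep their distance order.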
5. CONCLUSION AND FUTURE WORK
This paper has presented an integrated approach to efficient flight routing. If deployed, the model can improve air travel both commercially and operationally. The many passengers airborne at any given time could know better what to expect as their flight cruises at high flight levels. Being hybrid, the approach combines advantages from various domains and fields to build a new, practically implementable system. Airlines can partner to put this model in place, which would ease the burden of weather analysis before every departure. The algorithm will be implemented and hosted online as a weather-data retrieval resource for clients and private jet operators. The designed algorithm offers scope for optimization across a wide range of applications. It will be tested on and applied to various open-source knowledge graphs, and a schema will be provided for the approved data. FlightRouter, an online platform under development, will apply this algorithm for better results and improved performance.
REFERENCES
[1] Barahate Sachin R. and Shelake Vijay M., "A Survey and Future Vision of Data Mining in Educational Field," 2012 Second International Conference on Advanced Computing & Communication Technologies.
[2] Chintan Shah and Anjali Jivani, "Comparison of Data Mining Clustering Algorithms," 2013 Nirma University International Conference on Engineering (NuiCONE), IEEE.
[3] Marshall E. Koch and Alex Buchholz, The MITRE Corporation, McLean, Virginia, "Quantitative Analysis of Aircraft Height," IEEE, 2011.
[4] Jesper Bronsvoort, Greg McDonald, Mike Paglione, Carlos Garcia-Avello, and Ibrahim Bayraktutar, "Impact of Missing Longitudinal Aircraft Intent on Descent Trajectory Prediction."
[5] Zhexue Huang, "Extensions to the k-means algorithm for clustering large data sets with categorical values," Data Mining and Knowledge Discovery, vol. 2, no. 3, pp. 283-304, 1998.
[6] Chidanand Apté and Sholom Weiss, "Data mining with decision trees and decision rules," Future Generation Computer Systems, vol. 13, no. 2, pp. 197-210, 1997.
[7] Mark Hall et al., "The WEKA data mining software: an update," ACM SIGKDD Explorations Newsletter, vol. 11, no. 1, pp. 10-18, 2009.
[8] Mark W. Craven and Jude W. Shavlik, "Using neural networks for data mining," Future Generation Computer Systems, vol. 13, no. 2, pp. 211-229, 1997.
[9] Jaideep Srivastava et al., "Web usage mining: Discovery and applications of usage patterns from web data," ACM SIGKDD Explorations Newsletter, vol. 1, no. 2, pp. 12-23, 2000.
[10] Pavel Berkhin, "A survey of clustering data mining techniques," in Grouping Multidimensional Data, Springer Berlin Heidelberg, 2006, pp. 25-71.