Weather Optimized Routing Algorithm for Aircrafts

Hari Iyer
Pursuing B.E., Department of Computer Engineering, Dwarkadas J. Sanghvi College of Engineering, Mumbai, India
hariiyer1@gmail.com

Harsh Desai
Pursuing B.E., Department of Computer Engineering, Dwarkadas J. Sanghvi College of Engineering, Mumbai, India
harsh301994@gmail.com

Darshan Bhansali
Pursuing B.E., Department of Computer Engineering, Dwarkadas J. Sanghvi College of Engineering, Mumbai, India
darshan941018@gmail.com

Abhijit Patil
Assistant Professor, Department of Computer Engineering, Dwarkadas J. Sanghvi College of Engineering, Mumbai, India
abhijit.patil@djsce.ac.in

ABSTRACT
Aviation data analysis is a prominent and vital source of test and statistical data for heuristic and performance-evaluation applications in commercial aviation. This paper focuses on mining data from online weather resources. These resources can be queried in real time to determine the optimal path for flying a particular leg. Together, data clustering, geocoding, Earth geometry, and the Google Maps API provide a set of tools for structuring, modeling, evaluating, and representing the data. Word Sense Disambiguation (WSD) and Natural Language Processing (NLP) techniques are used to interpret weather data, which is originally fetched as natural-language text. Mathematical analysis of this information yields usable, use-case-relevant datasets. Using these services, the aim is to select the best possible flight path from the routes currently in operation.

General Terms
WSD, NLP, API.

Keywords
Aviation Data Analysis, Geocoding, Earth Geometry, Google Maps, Word Sense Disambiguation, Natural Language Processing.

1. INTRODUCTION
The past decade saw many air mishaps, and each was investigated for flaws in the systems in place. Unpredicted weather fluctuations were found to be one of the major causes of a large share of these incidents. Modern airlines routinely analyze the weather that the crew will experience during the flight. This allows precautions and checklists to be added to routine flight operations. However, there is considerable room for improvement in this process.

Weather forecasts are available to developers from various free APIs. Airports keep historical flight data that can be used for statistical delay analysis. Earth geometry can be used to calculate distance-related fuel consumption. The goal is a system that provides a complete package of safest-route determination, flight-time optimization, and fuel efficiency. A survey of existing data mining techniques for extracting weather data is presented here. However, the retrieved data is usually plain text, which a machine cannot directly interpret in terms of weather polarity, that is, whether conditions are favorable or hazardous. To solve this, a new natural language processing approach is adopted. The algorithm follows a stratified (layered) methodology in which the customer use-cases form the topmost layer of a bottom-up model. Adding altitude as a third dimension to the existing (latitude, longitude) coordinates increases the range and scope of alternative routes.

Because the model is generic across airlines, it encourages resource pooling and eases the burden on individual systems. A user-friendly representation makes the results quick and easy to read. The system's output is limited to flight-path suggestions. The final decision on the path rests with the airline or, in some cases, the airport authority. Flight data is freely available from several providers, such as FlightAware. Although rarely shared, an airport's own flight-history data is the most reliable source of facts and figures. In this paper, these features are integrated into a single tool.

2. LITERATURE REVIEW
2.1 Data extraction and mining
The volume of electronic information and business data is growing rapidly. Data mining is applied mainly in business analytics, web data analysis, text analysis, social-science problems, and many other domains where hidden information must be retrieved. Data mining comprises many techniques for discovering patterns in large databases, and these patterns support decision making. Analyzing large amounts of data is an emerging trend that helps create an efficient working environment. Methods such as clustering, classification, outlier detection, association, and pattern matching are used to analyze data for efficient decision making.

A typical system implements a set of data mining strategies: classification, estimation, prediction, association, and clustering. Classification assigns a given data instance to a class. In our system, for example, received weather data can be classified as safe or unsafe based on past weather reports. Estimation and prediction models determine values that are not known with certainty. One example is predicting near-future weather activity from past weather data. Association rules are hidden relationships among attributes. When finding the optimal route for an aircraft, associating attributes such as the source-destination distance and the safety level of the route helps identify the best result among all available data. Finally, clustering groups data items by similarity. Here, weather data is grouped into two sets: (1) instances where the weather is declared safe, and (2) instances where it is declared unsafe.
These are a few of the strategies used to implement data mining. Along similar lines, data extraction retrieves data from sources that are usually poorly structured. The extracted data can later be stored and processed for more efficient analysis of a particular system.

2.2 Flight routing approaches
Flight routing involves many parameters that must be considered before a flight path is finalized. Normally, the pilot or the dispatch office requests a route from the controlling (air traffic control) agency when filing the flight plan before departure. They may request any route that is legal and feasible. On receiving the flight plan, the controlling agency considers various factors and determines a suitable route that fits with other traffic. Shortly before the scheduled departure, the pilot requests clearance. The controlling agency then issues the route to be flown, which may differ from the route originally filed.

Between any two points, such as the source and the destination, the shortest route over the Earth's surface lies along the "great circle" passing through them. Several factors can cause a flight route to deviate from the great circle. The first is air traffic: if many aircraft in the same corridor would cause congestion, the flight must use another corridor. The second, and a very important one, is wind and weather. If headwinds on a path are too strong, or the path passes through violent weather, the aircraft must choose another path. Third, the flight cannot follow the great circle if there are no suitable diversion airports along it.
Alternate airports are required so that, in an emergency, the aircraft can land at the nearest suitable airport and avoid a mishap. Other things being equal, the direct great-circle route is the preferred flight route. Past accidents in the aviation industry suggest that weather is the most crucial factor to consider. Consider, for example, AirAsia QZ8501, which crashed into the Java Sea. Thunderstorms along its route were initially suspected as the cause of the crash. The final investigation report later attributed it primarily to a rudder control system fault and the crew's response. Avoiding such thunderstorms is exactly the kind of decision that a weather-aware flight plan should support.

3. STUDY OF DATA MINING TECHNIQUES
3.1 k-means
The k-means algorithm is a data mining technique used to partition objects into clusters.

3.1.1 Introduction
In signal processing, the k-means algorithm is used as a vector quantization method. It is also widely used for clustering in data mining. Extensions of the algorithm have proved effective for clustering large data sets, including data with categorical values [5]. With such extensions, it can cluster both numeric and categorical real-world data. Objects are partitioned so that objects with greater similarity fall into the same cluster.

Given n objects, each a d-dimensional vector, the algorithm partitions them into k sets so as to minimize the within-cluster sum of squares (WCSS). It alternates between two steps. First, each object is assigned to the cluster with the nearest mean. Second, the new mean (centroid) of each newly formed cluster is calculated. Neither step increases the WCSS, and the process stops when the assignments no longer change.

3.1.2 Drawbacks
K-means is a widely used method in cluster analysis. Because it minimizes the WCSS of a d-dimensional data set, k-means is essentially an optimization problem. However, it has a few drawbacks. The algorithm implicitly assumes that each cluster is spherical, that is, that its variance is the same in every direction.
It also assumes that all variables have the same variance and that all clusters are approximately the same size. If any of these assumptions fails, k-means can produce poor clusters.

3.2 Decision Trees
A decision tree consists of a root node, internal nodes, branches, and leaf nodes. Each internal node denotes a test on an attribute, and each branch denotes an outcome of that test.

3.2.1 Introduction
Decision trees build classification or regression models in the form of a tree structure. The aim is to design a model that predicts a target value from a large number of input variables. The tree is learned by splitting the data set into subsets based on attribute-value tests. In other words, a decision tree is developed incrementally by breaking the data set down into smaller and smaller subsets. Each decision node has two or more branches, each corresponding to an outcome of its attribute test. Decision trees can handle both categorical and numerical real-world data.

3.2.2 Drawbacks
Learning an optimal decision tree is NP-complete under several definitions of optimality, which makes it difficult. As a result, decision-tree algorithms use greedy heuristics that make locally optimal decisions. These heuristics cannot guarantee a globally optimal tree [8]. Pruning techniques must be employed to avoid creating overly complex trees. In addition, decision trees face issues when working with missing values or when super attributes come into play. Converting numerical data into equivalent categorical data also introduces the problem of binning.

3.3 Artificial neural network
An artificial neural network is a system that tries to replicate a biological neural network such as the human brain.
3.3.1 Introduction
Artificial neural networks are used extensively in machine learning and data mining. An artificial neural network (ANN) attempts to build a system that functions like the neurons in the human brain. Although the task is daunting, ANNs have been applied to data mining with a certain degree of success. In an ANN [1], a large number of artificial nodes are built to function as neurons. Each node is connected to other nodes; in a typical layered network, it is connected to every node in the adjacent layer. The strength of each connection is expressed as a value (a weight) indicating whether the connection is strong or weak. Numerical data is fed to the input nodes. Each node is then assigned a number indicating its activation value, and activation is passed between nodes according to the strength of their connections. The activation flows through the hidden layers of the network until it reaches the output nodes. There, it is presented in a meaningful way to the end user.

3.3.2 Drawbacks
Implementing an effective ANN requires a large amount of resources because of the complexity of the process. For data mining on smaller data sets, the ANN approach may prove infeasible compared with other approaches. ANNs often suffer from under-training or over-training [2]. It is therefore important to train them on the right data set and in a proportionate manner.

4. ALGORITHM
A stratified bottom-up approach is adopted to enable cost- and space-optimized testing between state transitions. This flow of operations can reduce the individual workload on airlines around the world. Real-time forecasts, route suggestions, an interactive map interface, and many other features are pooled into the system. The algorithm can be stated layer by layer as follows.

[Illustration 1]

4.1 Preliminary operations
In this stage, all the inputs required by the system are taken from the user or retrieved from a third-party system. These inputs provide the data for operating the algorithm. The steps are as follows:

1. Input: flight source, flight destination, and scheduled departure time.
2. Calculate distance: the distance is calculated from the geographical coordinates of the supplied source and destination. The latitude and longitude are fetched from the Google Maps Geocoding API. The Haversine formula is then used to determine the aerial (great-circle) distance between the two points. The algorithm is as follows:

    function toRadians(deg) { return deg * Math.PI / 180; }

    function dist(a, b, x, y) {
        // (a, b) is the source and (x, y) the destination, as (latitude, longitude) in degrees.
        var R = 6371;                                    // mean Earth radius in km, so d is in km
        var phi1 = toRadians(a), phi2 = toRadians(x);    // convert the latitudes once
        var diff_lat = toRadians(x - a);
        var diff_lon = toRadians(y - b);
        var h = Math.pow(Math.sin(diff_lat / 2), 2) +
                Math.cos(phi1) * Math.cos(phi2) * Math.pow(Math.sin(diff_lon / 2), 2);
        var c = 2 * Math.atan2(Math.sqrt(h), Math.sqrt(1 - h));  // central angle in radians
        var d = R * c;
        return d;
    }

    // Calculate the minimum-distance path hierarchy.
    // dist(p, q) below is shorthand for dist(p_lat, p_lon, q_lat, q_lon).
    if (d approximately equals 20,000) {         // d in km: 20+ hour flights
        // Source – London – Destination:
        A = dist(src, LHR) + dist(LHR, dest);
        // Source – Hong Kong – Destination:
        B = dist(src, HKG) + dist(HKG, dest);
        // Source – North Pole – Destination:
        C = dist(src, North Pole) + dist(North Pole, dest);
        path[] = ascending_order(A, B, C);
    } else {
        // Direct flights:
        path = dist(src, dest);
    }

Consider the example of Mumbai (BOM) to Los Angeles (LAX).

Case 1: BOM – LHR – LAX.

    Coordinates[][]:
             0          1
        0    BOM_lat    BOM_lon
        1    LHR_lat    LHR_lon
        2    LAX_lat    LAX_lon

Case 2: BOM – HKG – LAX. This is the same as case 1, except that the LHR coordinates are replaced by those of HKG. A similar Coordinates matrix is maintained for this case.

Case 3: BOM – North Pole – LAX. In this case, the North Pole is treated as an airport. Once the aircraft crosses to the other side of the globe, the route it follows changes with the destination. The coordinates are maintained in a two-dimensional array.
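The distance step and the one-stop route ranking above can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation, and it rests on several assumptions:

- Distances are in kilometres, consistent with the 20,000 threshold.
- Coordinates are passed directly as (lat, lon) tuples rather than fetched from the Geocoding API.
- The helper name rank_via_routes is hypothetical.
- The airport coordinates in the demo are approximate and only illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; all distances below are in kilometres


def dist(a: float, b: float, x: float, y: float) -> float:
    """Haversine great-circle distance (km) from (a, b) to (x, y), all (lat, lon) in degrees."""
    phi1, phi2 = math.radians(a), math.radians(x)
    d_phi = math.radians(x - a)
    d_lambda = math.radians(y - b)
    h = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    h = min(1.0, h)  # guard against rounding pushing h slightly above 1 for near-antipodal points
    return 2 * EARTH_RADIUS_KM * math.atan2(math.sqrt(h), math.sqrt(1 - h))


def rank_via_routes(src, dest, waypoints):
    """Total distance of each one-stop route src -> waypoint -> dest, shortest first.

    src and dest are (lat, lon) tuples; waypoints maps a name to a (lat, lon) tuple.
    Mirrors path[] = ascending_order(A, B, C) from the pseudocode (hypothetical helper).
    """
    totals = [(name, dist(*src, *wp) + dist(*wp, *dest)) for name, wp in waypoints.items()]
    return sorted(totals, key=lambda item: item[1])


if __name__ == "__main__":
    # Approximate airport coordinates, for illustration only.
    bom, lax = (19.09, 72.87), (33.94, -118.41)
    waypoints = {"LHR": (51.47, -0.45), "HKG": (22.31, 113.91), "North Pole": (90.0, 0.0)}
    print(f"Direct BOM-LAX: {dist(*bom, *lax):.0f} km")
    for name, km in rank_via_routes(bom, lax, waypoints):
        print(f"BOM-{name}-LAX: {km:.0f} km")
```

Running the module prints the direct distance followed by the three one-stop totals. Great-circle distance obeys the triangle inequality, so every one-stop total is at least the direct distance.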
In this step, the routes to be analyzed for weather have been decided and ordered by distance. Coordinates[][] and path[] are given as inputs to the next step.

4.2 Coordinate-wise weather analysis
This process analyzes the routes and plots them on the map. It is a step-by-step procedure that can be stated as follows. According to path[], the 0th, 1st, and 2nd routes are selected. For each of the three routes, the Coordinates matrix contains the source, stop, and destination points. Each route is analyzed individually for mathematical point progression and calculations [10]. The working is as follows:

    for (i = 0; i < 2; i++) {
        /* Pass the current and next (lat, lon) pair to the equation builder;
           keep one slope per leg. */
        slope[i] = equation_builder(coordinates[i][0], coordinates[i][1],
                                    coordinates[i+1][0], coordinates[i+1][1]);
    }

    function equation_builder(a, b, c, d) {
        slope = (d - b) / (c - a);      // dy/dx; undefined when c == a
        intercept = b - slope * a;      // y-intercept (not used by the later steps)
        return slope;
    }

Thus, the equations for both legs of the flight are calculated. The main reason for this operation is to keep track of the latitude and longitude increment factor and the directional heading. Here, the path of a flight is approximated as a straight line drawn from source to destination on the (latitude, longitude) plane. The true shortest path over the Earth's surface is the great-circle arc, so this is an approximation. The analysis therefore focuses on linear propagation.

Working of linear propagation: consider a line with end points A(2, 3) and B(5, 7). Then diff_x = 3, diff_y = 4, and slope = 4/3. For x running from a to c:

    y = (source y-coordinate) + slope * (current_x - initial_x);

Each resulting (x, y) is the next point on the line. For example, at x = 3.5, y = 3 + (4/3)(1.5) = 5. The same concept is used to find latitude and longitude coordinates along a flight path. Equation_builder returns the slope, which is used to evaluate the weather at every point through the incrementation process above. The step value (delta) is kept at 0.25 degrees to ensure close coverage.
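The equation builder and the 0.25-step linear propagation can be sketched as below. This is a sketch under stated assumptions, not the paper's code:

- x is the stepped axis (latitude, as in the paper's loop).
- The helper name propagate is hypothetical.
- Unlike the pseudocode, the sketch also handles legs along which x decreases.

It interpolates in plain (lat, lon) space, so it follows the straight-line approximation rather than the great-circle arc. It does not handle legs that cross the 180° meridian.

```python
def equation_builder(a: float, b: float, c: float, d: float) -> tuple[float, float]:
    """Slope and intercept of the line through (a, b) and (c, d), with x stepped from a to c."""
    if c == a:
        raise ValueError("slope undefined: both points share the same x value")
    slope = (d - b) / (c - a)
    intercept = b - slope * a
    return slope, intercept


def propagate(a: float, b: float, c: float, d: float, step: float = 0.25) -> list[tuple[float, float]]:
    """Points from (a, b) to (c, d) every `step` along x, using y = b + slope * (x - a).

    Hypothetical helper: assumes x is latitude and y is longitude in degrees.
    """
    slope, _ = equation_builder(a, b, c, d)
    direction = 1.0 if c > a else -1.0  # allow legs where x decreases
    n_steps = int(abs(c - a) / step)
    points = []
    for k in range(n_steps + 1):
        x = a + direction * k * step
        points.append((x, b + slope * (x - a)))
    if points[-1][0] != c:  # add the end point when the step does not land on it exactly
        points.append((c, d))
    return points
```

For the line A(2, 3), B(5, 7), propagate(2, 3, 5, 7) returns 13 points from (2, 3) to (5, 7). These include a point at x = 3.5 with y = 5, up to floating-point rounding.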
The weather en route is then calculated using this routine:

    Calculate_weather(slope);          // called once per leg with that leg's slope

    function Calculate_weather(slope) {
        // (a, b) and (c, d) are the (lat, lon) end points of the current leg; assumes a <= c.
        current_x = a;
        current_y = b;
        for (i = a; i <= c; i += 0.25) {
            current_x = i;
            current_y = b + slope * (current_x - a);
            weather_data[] = get_Polarity(current_x, current_y);   // append the polarity at this point
        }
    }

The following function computes the polarity of the weather verdict:

    function get_Polarity(x, y) {
        static time = flight_departure_time;   // initialized only on the first call
        time += 5 minutes;                     // estimated time at which the aircraft reaches (x, y)
        data = file_get_contents("forecast.io?lat=x&lon=y");
        /* Match the verdict in data to time and compute its polarity, as described below. */
        return polarity;
    }

The output of "data" would be as follows:

    Date         Time       Location   Weather
    07/07/2015   07:05:00   (x, y)     Light rains
    07/07/2015   07:10:00   (x, y)     Thunderstorm
    07/07/2015   07:15:00   (x, y)     Rough air

[Illustration 2]

This data has to be mined for the most accurate result. The aircraft's current time in flight has already been updated. The entry in the retrieved data must therefore be matched as closely as possible with the time at which the aircraft reaches this position. Let x be the timestamp of the forecast entry just before the current flight time and x + 5 the next entry, and let

    diff1 = Flight_time - x;
    diff2 = (x + 5) - Flight_time;

Before the weather analysis proceeds, the polarity of each point's weather conditions has to be calculated. The following algorithm is used. Consider weather = "Light winds and thunderstorms". Neutral, positive, or negative polarities are assigned to the words of the verdict string:

    Light – positive.
    Winds – neutral.
    And – neutral.
    Thunderstorms – negative.

A negative component has a far greater ill effect on the flight than the good effect of a positive forecast. The scale is therefore set as 0 for neutral, +1 for positive, and -3 for negative. As a result, negative polarity dominates whenever it is present.

After the polarity of the weather verdict is obtained, it has to be fitted into the time frame. Using the section formula with diff1 and diff2 as defined above, the current weather polarity is

    polarity = (diff2 * polarity(x) + diff1 * polarity(x + 5)) / 5;

This polarity is then returned to the calling function. After the above routines execute, weather_data[] is populated with the polarity of the weather conditions along the path.

4.3 Clustering
To make this data quick to interpret, clustering is applied to the container that stores it. Weather can change abruptly, and conditions may deteriorate or improve without prior symptoms or warnings. A label should therefore be associated with every area, stating whether it is suitable or dangerous to fly in that zone. For this, weather_data[] is divided into chunks, and each chunk is aggregated to obtain distinct regional statistics [9]. This process is known as clustering. The algorithm for data clustering is as follows:

    function cluster(weather_data[]) {
        head = 0;                                  // first index of the current cluster
        while (head < length(weather_data)) {
            i = head;
            // extend the cluster while the polarity sign stays the same
            while (i < length(weather_data) && sign(weather_data[i]) == sign(weather_data[head]))
                i++;
            avg = average(weather_data[head .. i-1]);
            cluster_assign_polarity(avg, weather_data[], head, i);
            /* This assigns a cluster_polarity to the group weather_data[head .. i-1]. */
            head = i;                              // the next cluster starts where this one ended
        }
    }

[Illustration 3]

The above routine groups weather_data into zones of safe or unsafe flying conditions. This makes the data quick to read and supports efficient route decisions. If the last cluster of weather_data[] shows negative polarity, nearby airports are suggested for an alternate landing, ordered by radial distance. The cluster-based system is used to find the optimal path. path[0], path[1], and path[2] each have their own weather data for their respective routes. The next section explains how one of these corridors is selected.

4.4 Optimized Routing
The weather data finally obtained has to be mapped onto real-time flight use-cases.
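To make this mapping concrete, the scoring steps of Sections 4.2 and 4.3 can be condensed into a small Python sketch. Several parts are assumptions that the text does not fix:

- The word lexicon contains only the example words, scored on the 0 / +1 / -3 scale.
- Word scores are summed to give a verdict polarity.
- A cluster is a run of consecutive points with the same polarity sign.
- Ties between routes are resolved by the distance preference order rather than by altitude.

All function names are hypothetical.

```python
from itertools import groupby

# Hypothetical lexicon on the paper's scale: 0 neutral, +1 positive, -3 negative.
# Only words from the paper's examples are listed; any other word is treated as neutral.
LEXICON = {"light": 1, "winds": 0, "and": 0, "thunderstorm": -3, "thunderstorms": -3}


def verdict_polarity(verdict: str) -> int:
    """Polarity of a verdict string as the sum of its word scores (assumed aggregation)."""
    return sum(LEXICON.get(word, 0) for word in verdict.lower().split())


def interpolate_polarity(flight_time: float, x: float, p_x: float, p_next: float,
                         gap: float = 5.0) -> float:
    """Section formula between forecast entries at minute x and x + gap (x <= flight_time <= x + gap)."""
    diff1 = flight_time - x
    diff2 = (x + gap) - flight_time
    return (diff2 * p_x + diff1 * p_next) / gap


def _sign(v: float) -> int:
    return (v > 0) - (v < 0)


def cluster(weather_data: list[float]) -> list[tuple[int, int, float]]:
    """Group consecutive points with the same polarity sign into (start, end, average) clusters."""
    clusters, head = [], 0
    for _, run in groupby(weather_data, key=_sign):
        values = list(run)
        clusters.append((head, head + len(values), sum(values) / len(values)))
        head += len(values)
    return clusters


def select_route(route_scores: list[float]) -> int:
    """Index of the route with the highest overall polarity.

    route_scores follows the distance preference order path[0], path[1], ...;
    ties go to the earlier (shorter) path, an assumption in place of the altitude check.
    """
    return max(range(len(route_scores)), key=lambda i: (route_scores[i], -i))
```

For example, "Light winds and thunderstorms" scores 1 + 0 + 0 - 3 = -2. The single negative word outweighs the positive one, as the scale intends.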
The source, destination, and departure time of the flight are given as input by the user or the airline. The protocol for determining the safest route is as follows:

    path[0] > path[1] > path[2] …     // preference order (by distance)
    cluster_polarity(weather_data[]) should be maximum.

The overall cluster_polarity is determined for every path. The most favorable route is given priority, and the others follow in descending order of weather harmlessness. In the rare case of identical weather conditions on more than one route, altitude is taken into consideration. If changing the altitude improves a candidate's chance of becoming the optimal path, the priorities are reassigned while distance optimality is maintained.

Routes are presented through a user-friendly interface built with the Google Maps API and modern web technologies. All three routes are displayed clearly, with selection options and a color code indicating the optimal solution to the user's query.

5. CONCLUSION AND FUTURE WORK
This paper has presented an amalgamated approach to efficient flight routing. If deployed, the model can enhance air travel both commercially and functionally. The millions of people in the air at any given time can better know what to expect as the flight cruises at high flight levels. Because the approach is hybrid, it pools advantages from various domains and fields to build a new, practically implementable system. Airlines can partner to put this model in place, which would ease the burden of weather analysis before every departure. This algorithm will be implemented and hosted online as a resource for clients and private jet operators to retrieve weather data. The designed algorithm provides leverage for optimization across a wide range of applications. It will be tested and applied to various open-source knowledge graphs, and a schema will be provided for the approved data chunk.
FlightRouter, an online platform currently under development, will apply this algorithm for better results and improved performance.

REFERENCES
[1] Barahate Sachin R. and Shelake Vijay M., "A Survey and Future Vision of Data Mining in Educational Field," 2012 Second International Conference on Advanced Computing & Communication Technologies, 2012.
[2] Chintan Shah and Anjali Jivani, "Comparison of Data Mining Clustering Algorithms," 2013 Nirma University International Conference on Engineering (NuiCONE), IEEE, 2013.
[3] Marshall E. Koch and Alex Buchholz, The MITRE Corporation, McLean, Virginia, "Quantitative Analysis of Aircraft Height," IEEE, 2011.
[4] Jesper Bronsvoort, Greg McDonald, Mike Paglione, Carlos Garcia-Avello, and Ibrahim Bayraktutar, "Impact of Missing Longitudinal Aircraft Intent on Descent Trajectory Prediction."
[5] Zhexue Huang, "Extensions to the k-means algorithm for clustering large data sets with categorical values," Data Mining and Knowledge Discovery, 2(3), pp. 283-304, 1998.
[6] Chidanand Apté and Sholom Weiss, "Data mining with decision trees and decision rules," Future Generation Computer Systems, 13(2), pp. 197-210, 1997.
[7] Mark Hall et al., "The WEKA data mining software: an update," ACM SIGKDD Explorations Newsletter, 11(1), pp. 10-18, 2009.
[8] Mark W. Craven and Jude W. Shavlik, "Using neural networks for data mining," Future Generation Computer Systems, 13(2), pp. 211-229, 1997.
[9] Jaideep Srivastava et al., "Web usage mining: Discovery and applications of usage patterns from web data," ACM SIGKDD Explorations Newsletter, 1(2), pp. 12-23, 2000.
[10] Pavel Berkhin, "A survey of clustering data mining techniques," in Grouping Multidimensional Data, Springer Berlin Heidelberg, 2006, pp. 25-71.