ENS 491-492 – Graduation Project (Implementation)
Final Report

Shortest Path Optimization for Public Transport According to User’s Preferences

Group Members: Ayça Şevval Kahraman, Batu Tan, Ceylin Tetik, Murat Dalyancı, Özge Gül Erbay
Supervisor: Gizem Özbaygın
Date: 12.06.2022

1. EXECUTIVE SUMMARY

Mobility-as-a-Service (MaaS) platforms support users in the operations they encounter while planning a trip. The main purpose of a MaaS platform is to offer a single place where end-to-end trip planning, payment, and ticketing are available to all users, which greatly improves the user experience. The need for MaaS platforms emerged from the shortcomings of simple navigation systems: although they are widely used, they do not offer end-to-end, detailed trip planning, and they lack payment and ticketing features. The purpose of our project is not to develop a MaaS platform directly, but to develop an efficient trip planning algorithm that can be used in MaaS services and results in an enhanced user experience.

To develop a route planning algorithm, we needed to decide on several factors, including the method of data collection and generation and the choice of optimization algorithm. Algorithms such as A*, K-Shortest Path (KSP), and Dijkstra’s algorithm have been widely used for route optimization. A*, which is widely used for pathfinding problems, suits our problem of finding the shortest route from a starting node to a target node. It allows us to take the user’s preferences into account when searching for the best route, so we decided to use A*.

The public transportation data used in this project is generated synthetically, as it was not possible to obtain the necessary data directly from any source.
Several measures were taken to make the routes as realistic as possible, such as merging nearby stations to create a route and assigning vehicle arrival times according to the distance between stations. The data was then put into a format accepted by the A* algorithm we used. Even though it is not a requirement of the project, we aim to visualize the results by showing the stations and the route on a real-life map.

All these steps allow us to create an optimization model that works according to the user’s wishes and shows them the best route with ease. Given real-life data in the same format, the model should generate a real-life route for the user.

2. PROBLEM STATEMENT

The main problem in our project is how to get from one point to another using one or more public transport vehicles. The scope of the problem changes with the user, because the user can specify the objective. The objective can be:
(1) Reaching the target destination as soon as possible
(2) Walking as little as possible
(3) Using the smallest number of public transportation vehicles

This problem is a sub-problem of creating a MaaS platform, which covers all the stages of trip planning. As the solution to this sub-problem would benefit the resulting MaaS application, we were motivated to work on it. To achieve this goal, we conducted extensive research to find the necessary data and the most efficient route optimization algorithm. The constraints and problems we faced only increased our interest, and we solved them one by one. The fact that this algorithm and the resulting MaaS application could be used by people in real life made the project more enjoyable to work on: we worked on a real scenario, not a hypothetical one. Learning a new optimization algorithm and handling data also affected our motivation positively and left us better equipped.
While learning made it easier for us to commit to the project, there were certain tasks and objectives we needed to deal with, some of which proved difficult.

2.1. Objectives/Tasks

a. Finding data

The most important resource the problem requires is data, as it is what the optimization algorithm works on. This data must include:
(1) Station names
(2) Station codes
(3) Station locations
(4) Station types in terms of vehicle type
(5) Routes
(6) Arrival times of certain vehicles at certain stations

For this project, we searched for a database that included this information. The closest dataset was the one provided by the Istanbul Municipality, but it only included the names, codes, and locations of some stations. We generated more detailed data from the data we collected. Many simplifications were made to make it usable by the A* algorithm, and this work is not finished: we still need to format the data and make it easier to use for optimization.

b. Route Optimization Algorithm

One of the biggest tasks was choosing the optimization algorithm. Several algorithms are used for finding the shortest path, including Dijkstra’s, K-Shortest Path (KSP), and A*. We needed an algorithm that could work dynamically, with costs between stations that change depending on the time at which the algorithm is executed. For this reason, we eliminated Dijkstra’s and KSP and used A*. If a similar problem needs to be solved without a dynamic network, the other two algorithms could be considered.

c. Entering input data

The algorithm must take user-specific data in order to work properly. This data includes the location of the user and the user’s preferences. This was one of the shorter tasks, as it can be solved with a basic input mechanism in the code.

d. Mapping

Even though mapping is out of the scope of this project, we think it is the best way to show the results of the optimization algorithm, so this task had an important place. One can use a subscription-based API that supports mapping, but there are also free options. We used a Python library called Plotly, which lets us show the locations of the stations and the route given by the algorithm on a real-life map. The original problem does not need a user interface to be solved, but a map makes the results much easier to use, and it will serve as the user interface of the project. Because the map is what the user actually sees, the output should be presented there in a clear and detailed way.

2.2. Realistic Constraints

Working on this problem comes with its difficulties. There were constraints that we faced as developers, and there will be constraints when the algorithm is used by users.

a. Constraints Faced by the Developers

The biggest constraint we had concerned the data. At the beginning of the project, we worked on data separation and data access. It was quite challenging to find public transportation data that included the stops. In addition, there was no way to access usable data that included the routes and arrival times of the buses. Decomposing the obtained data and integrating it into Excel or code was also difficult.

Additionally, there was confusion regarding mapping and drawing the route on the map. Most of the applications we could draw a map with were subscription-based and required payment before use. Eventually, we found a Python library that enabled us to show the stations and routes on a map.

b. Constraints Faced by the Users

Users will need to satisfy several constraints to be able to use the algorithm.
Firstly, the algorithm finds the location of the user to match it with the closest public transportation stations. The user has to either enter the location manually or enable the location services of their device. Depending on the device and the IP address of the network it is connected to, the obtained location might be less accurate.

Furthermore, the algorithm requires more computing power as the public transportation network gets bigger and the optimization becomes harder. If the algorithm is used in real life, the best option would be to run it on a server so that performance is independent of the user’s device.

As it was out of the scope of the project, we did not provide a user interface for the algorithm. For someone who is not familiar with programming, the model could be hard to use. The only ease of use lies in the mapping, as the resulting route is intended to be shown on a map.

3. METHODOLOGY

1. A* Optimization Algorithm

a. Description

A* is a best-first search algorithm that is widely used for pathfinding problems and relies on an admissible heuristic. The algorithm finds the shortest route from the starting node to the target node [1]. Even though it is commonly used to optimize price and route, our project can have different objectives: we may try to minimize total time, the total number of vehicles used, or total time spent walking. A* does a good job of finding the most efficient route while respecting user-defined constraints and the user’s definition of efficiency. Users may choose to exclude certain nodes, such as stops used by specific vehicle types. Depending on the time at which the algorithm is executed, the number of minutes until the vehicles arrive at a stop may change.
After taking user data such as location and local time, the method uses the available public transportation information to build a dynamic network of moving vehicles and optimize the user’s travel.

b. The Algorithm

Our A* algorithm has four inputs: a map structure containing the list of nodes, edges, and node IDs; the ID of the start node; the ID of the goal node; and the current time. The output is a list of integers representing the IDs of the nodes on the shortest route.

The start and goal nodes are entered by the user of the program. The algorithm asks the user to enter the names or codes of stations for the locations. If the names or codes are valid, a corresponding coordinate pair can be returned for both the start and the goal. In this step, one can use the Python geocoder library to find the coordinates of the entered places. This step is necessary to determine the trip details, but it is not sufficient to identify the specific stations. The coordinates can therefore be used later to find the stations closest to the entered locations. Based on these coordinates, a cluster search can be started to check whether our dataset contains a station within a short distance. If none is found, the size of the cluster can be increased. Using this library, one can find the station closest to the user together with its ID, name, and coordinates. In this project, we did not implement the option to find stations close to the user’s location; instead, the user needs to enter the code of the starting station.

[1] Dere, E., & Durdu, A. (2018, November 13). Usage of the A* Algorithm to Find the Shortest Path in Transportation Systems.
The algorithm starts by initializing its parameters: a dictionary of the coordinates of the intersections given in the input map structure, an adjacency list of neighbors that represents the roads in this map, an empty set for storing explored nodes, and a frontier dictionary that maps each node on the frontier to a tuple (route, path length). The route is the sequence of traversed locations from the start to the frontier node and becomes the output of the algorithm.

The search runs in a while loop that continues as long as the frontier dictionary is not empty. An empty frontier means that no unexplored node reachable from the starting node remains, so the search stops.

The nearest frontier node is found with a helper function. This function selects and returns the frontier node with the smallest total estimated path cost to the goal. The total cost is the sum of the current path cost up to the frontier node and the travel time between the frontier node and the goal node. This travel time is calculated from the current time and the arrival times of the transportation vehicle at the frontier node and at the goal node. The wait time, found by subtracting the current time from the arrival time of the vehicle, is added to the travel time.

The inputs of this function are the routes, the goal, and the current time. Arrival times are taken from an imported data frame. The routes are stored in a dictionary that maps each node on the frontier to a tuple of route and path length; the tuple contains the sequence of traversed locations from the start to the frontier node and the associated path cost.
The goal is an integer giving the station code of the goal node. The function returns the nearest frontier node after calculating the costs to the neighboring stations.

While there is a node on the frontier, the one with the smallest cost is selected and the current node is updated. This is a greedy approach. The route found up to this node is then stored in a list named current routes, together with its path cost. If the selected frontier node is the goal node, the shortest path has been found and the while loop is exited, which means the final route consists of these nodes only. If not, the node is removed from the frontier and added to the explored set.

The next step is to visit all the neighbors of this popped frontier node. A for loop iterates over them, and each neighbor is appended to a copy of the current path list for calculation purposes. The step cost of moving from the frontier node to the neighbor is calculated, and the total path cost is updated with this step cost. If the neighbor is already in the explored set, no action is taken and the algorithm moves on to the next neighbor. If the neighbor is not on the frontier, it is added with its route and path cost. If the neighbor is already on the frontier, the path cost of the route already stored is compared with that of the new route, and the frontier entry is replaced only if the new route is shorter. After the iteration is done, the current route is returned. The respective code can be found in Appendix A.
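To illustrate the search loop described above, the following is a minimal sketch of an A* search with time-dependent step costs. It is not the code from Appendix A. It assumes a priority queue (heapq) instead of the frontier dictionary and helper function, and it assumes a straight-line-distance heuristic divided by a top speed. The toy network, the TIMETABLE structure, and the names a_star and timetable_cost are hypothetical and exist only for this example.

```python
import bisect
import heapq
import math

def a_star(neighbors, coords, start, goal, now, step_cost, max_speed):
    """Return (route, arrival_time) for the earliest arrival at goal."""

    def heuristic(node):
        # Straight-line distance at top speed never overestimates travel time.
        (x1, y1), (x2, y2) = coords[node], coords[goal]
        return math.hypot(x2 - x1, y2 - y1) / max_speed

    # Heap entries: (estimated arrival at goal, time at node, node, route).
    frontier = [(now + heuristic(start), now, start, [start])]
    best_time = {start: now}
    explored = set()
    while frontier:
        _, time_here, node, route = heapq.heappop(frontier)
        if node == goal:
            return route, time_here
        if node in explored:
            continue  # stale entry superseded by a faster route
        explored.add(node)
        for nxt in neighbors.get(node, []):
            if nxt in explored:
                continue
            arrival = time_here + step_cost(node, nxt, time_here)
            if arrival < best_time.get(nxt, math.inf):
                best_time[nxt] = arrival
                estimate = arrival + heuristic(nxt)
                heapq.heappush(frontier, (estimate, arrival, nxt, route + [nxt]))
    return None, math.inf

# Hypothetical toy network: coordinates in km, times in minutes.
COORDS = {1: (0, 0), 2: (3, 0), 3: (3, 4), 4: (6, 4)}
NEIGHBORS = {1: [2, 3], 2: [4], 3: [4], 4: []}
# (from, to) -> (departure times at "from", ride duration)
TIMETABLE = {
    (1, 2): ([0, 10, 20], 4),
    (1, 3): ([5, 15], 6),
    (2, 4): ([12, 16], 6),
    (3, 4): ([12, 25], 3),
}

def timetable_cost(origin, target, time_here):
    """Waiting time until the next departure plus the ride duration."""
    departures, ride = TIMETABLE[(origin, target)]
    i = bisect.bisect_left(departures, time_here)
    if i == len(departures):
        return math.inf  # no later vehicle in this toy timetable
    return (departures[i] - time_here) + ride

print(a_star(NEIGHBORS, COORDS, 1, 4, 0, timetable_cost, max_speed=1.0))
print(a_star(NEIGHBORS, COORDS, 1, 4, 6, timetable_cost, max_speed=1.0))
```

In this toy network, starting at minute 0 gives the route 1, 3, 4, while starting at minute 6 gives 1, 2, 4, because the waiting times at the stations change with the start time. This is the dynamic behavior that motivated the choice of algorithm.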
One important detail is that the algorithm does not take all of the stations as input. It only needs the intersection stations, at which users can change vehicles. After the optimization is done between the intersection stations, the travel time from the start location to the first intersection station and from the last intersection station to the target location can be added.

We also have implementations of Dijkstra’s algorithm and Uniform Cost Search, which can be used for comparison if necessary. However, these implementations are purely distance-based shortest path algorithms, and it may be hard to update them to use travel times instead of distances. Our main focus is the A* algorithm, but since Istanbul’s public transportation routes can produce many different scenarios, these two algorithms can be used to check whether they yield a route with a smaller cost, because their complexity is low and they do not take long to run.

2. Data Collection and Generation

This project aims to create an optimization model that minimizes metrics such as walking distance, total time, or the total number of vehicles used. To solve this minimization problem, we needed data identifying the constraints and distances used in the objective. We faced several problems acquiring the actual public transportation data of Istanbul, but we were able to use the open data portal of the Municipality of Istanbul to download data containing the locations, names, and codes of 13907 bus stops in the city. Unfortunately, this data was not enough: we needed the same data for all vehicle types, as well as the connections between stops and the estimated arrival times of vehicles. In addition, the directions of the public transportation vehicles and the stops they pass through require time-dependent variables.
After trying to obtain this data from the same portal and other available sources, we decided to generate synthetic data representing the actual data so that we could use it in the optimization model. The generation took place in several steps.

a. Removing Duplicates

The bus station data we collected included multiple stations with the same name, mostly because of stops that are in the same place but serve different directions. We want to model the routes as two-way, meaning vehicles can travel up and down the route. Therefore, we removed the duplicates from the data using Microsoft Excel, which left us with 6551 stations.

b. Distributing Stops to Different Vehicle Types

To provide different transportation options to the user and adapt the optimization to the user’s preferences, we need station data for all available vehicle types. For this reason, we analyzed how many stations the different vehicle types have in Istanbul. We then used Python to divide our 6551 stations among the vehicle types in the same proportions. This assignment represents the vehicles that can use each station. Additionally, we assigned some stations to multiple vehicle types to represent transfer stations. After this process, we had 3406 bus stations, 756 metro stations, 309 metrobus stations, 1921 minibus stations, and 327 tram stations.

c. Generating Distance Matrix for Each Vehicle Type

To generate routes logically and use the data later in the optimization model, we needed distance matrices showing the distances between every pair of stations of the same type, that is, bus, metro, metrobus, minibus, or tram. Using Python, we developed an algorithm that takes the coordinates of the stations of the same type, finds the distances between them, and writes the result to an Excel file as a distance matrix. Rows and columns are labeled with the unique station codes, identifying the pair of stations each entry refers to.
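The distance matrix step can be sketched as follows. This is a minimal illustration and not the project code: it assumes great-circle (haversine) distances between latitude and longitude pairs, which the report does not specify, and it keeps the matrix in a nested dictionary instead of writing an Excel file. The function names and the sample station codes are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def distance_matrix(stations):
    """stations: dict code -> (lat, lon). Returns matrix[code_a][code_b] in km."""
    codes = sorted(stations)
    matrix = {code: {} for code in codes}
    for i, a in enumerate(codes):
        matrix[a][a] = 0.0
        # Distances are symmetric, so each pair is computed only once.
        for b in codes[i + 1:]:
            d = haversine_km(stations[a], stations[b])
            matrix[a][b] = d
            matrix[b][a] = d
    return matrix

# Hypothetical stations of one vehicle type.
tram = {"T1": (41.0, 29.0), "T2": (41.0, 29.1), "T3": (42.0, 29.0)}
matrix = distance_matrix(tram)
print(round(matrix["T1"]["T3"], 2))
```

Computing each pair once and mirroring the value roughly halves the number of distance evaluations compared with filling every cell independently.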
Generating the distance matrices required a lot of computing power and took 4 hours, because calculating the distances between every pair of stations of the same type means over 16 million iterations. The algorithm produced 5 Excel files, each containing the distances for the stations of one vehicle type.

d. Generating Routes

We tried two approaches to generating routes that connect stations. In both approaches, we first grouped the stations that are on the same route and then sorted them.

I) Clustering Approach

To match stations that are close to each other, we wanted to use a clustering algorithm. The most suitable algorithm was Agglomerative Clustering, as it can use a precalculated distance matrix and supports different objectives such as minimizing the average distance, the maximum distance, or the total distance between stations in one cluster. We can also choose the number of clusters to be created. For each vehicle type, we created as many clusters as one-tenth of the number of stations, aiming for an average of 10 stations per route, and successfully created the clusters. One problem was that it was impossible to get from one cluster of stations to another using the same vehicle type, as the clusters were not connected. To overcome this, we randomly copied some stations into clusters that did not initially include them, creating bridges between the clusters.

II) Random Approach

Another idea was to assign stations to clusters completely at random. With an average of 10 stations per route, we created random groups of stations. As this was a random generation, the connections between stations did not make much sense. However, the result is easier to use in the optimization model, because two nearby stations can be on different routes, which makes it possible to walk from one route to another.

With either approach, we then sorted the stations in each cluster by location to decide on their ordering.
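One possible way to order the stations of a cluster is sketched below. The report does not state which sorting rule was used, so this greedy nearest-neighbor chain is an assumption made for illustration only: it starts from a station at one end of the cluster and repeatedly moves to the closest station not yet visited. The function name order_route and the sample data are hypothetical, and dist stands for a distance matrix indexed as dist[a][b].

```python
def order_route(cluster, dist):
    """Order station codes so that consecutive stops are close to each other."""
    # Start at an end of the cluster: a station whose farthest partner is farthest.
    start = max(cluster, key=lambda s: max(dist[s][t] for t in cluster))
    route = [start]
    remaining = set(cluster) - {start}
    while remaining:
        # Ties are broken by station code to keep the result deterministic.
        nxt = min(remaining, key=lambda s: (dist[route[-1]][s], s))
        route.append(nxt)
        remaining.remove(nxt)
    return route

# Hypothetical cluster: four stations along a straight line, listed out of order.
position = {10: 0.0, 20: 1.0, 30: 2.0, 40: 3.0}
dist = {a: {b: abs(position[a] - position[b]) for b in position} for a in position}
print(order_route([30, 10, 40, 20], dist))
```

A greedy chain like this avoids most back-and-forth movement on elongated clusters, but it is only a heuristic and does not guarantee the shortest possible route through all stations.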
We made sure that the stations are ordered logically so that vehicles do not keep going back and forth along the same route.

e. Generating Estimated Arrival Times

Using the generated routes, we assigned estimated arrival times of vehicles to stations. To reflect that vehicles travel from one station to the next in the order given by the route, we assign these times incrementally, using the distance matrix and the average speed of the vehicle type. This way, the travel time between two stations increases with the distance between them.

Any of the data generation steps may be executed again with different parameters at any time if we decide that the new parameters represent the public transportation system in Istanbul more accurately.

3. Integration of Data to A* Algorithm

Our synthetic data was not in a format that the A* algorithm could use directly, so we needed to reformat it.

a. Intersection Stations and Neighbors

The A* algorithm we use accepts the data of intersection stations in pickle format, a serialization format specific to the Python programming language. We first tried to adjust our data to this format, but realized that we could not reproduce it exactly, so we updated the algorithm instead. In the required pickle file, keys represented intersection station codes and each value was a list of the codes of the intersection stations that can be reached directly from the key. An intersection station is a station at which the user can change vehicles. We found that we could achieve the same purpose with a Python dictionary. Using the data we already had, we identified the intersection stations and saved them as the keys of a dictionary. We then found the neighboring intersection stations of each one and added them as the values of the related key.
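The dictionary of intersection stations described above could be built along the following lines. This is a sketch under two assumptions that the report does not spell out: a station counts as an intersection when it is served by at least two routes, and two intersections are neighbors when they follow each other on a shared route with no other intersection between them. The function name and the sample routes are hypothetical.

```python
def build_intersection_graph(routes):
    """routes: dict route_id -> ordered list of station codes."""
    # Collect the routes serving each station.
    served_by = {}
    for route_id, stops in routes.items():
        for stop in stops:
            served_by.setdefault(stop, set()).add(route_id)
    # Assumption: a station on two or more routes is an intersection.
    intersections = {s for s, ids in served_by.items() if len(ids) > 1}

    graph = {s: [] for s in sorted(intersections)}
    for stops in routes.values():
        on_route = [s for s in stops if s in intersections]
        # Routes are two-way, so consecutive intersections are mutual neighbors.
        for a, b in zip(on_route, on_route[1:]):
            if b not in graph[a]:
                graph[a].append(b)
            if a not in graph[b]:
                graph[b].append(a)
    return graph

# Hypothetical routes: station codes are invented for the example.
routes = {
    "bus_1": [101, 102, 103, 104],
    "metro_1": [102, 201, 104],
    "tram_1": [103, 301],
}
print(build_intersection_graph(routes))
```

Stations 101, 201, and 301 are served by a single route and therefore do not appear as keys; only 102, 103, and 104 remain in the dictionary that the search runs on.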
We were able to use this data in the A* algorithm and run the optimization on it.

b. Arrival Times

The original A* algorithm has no concept of arrival times, as it uses the distance between nodes directly as the path cost. In our project, the path cost should consist of the travel times between stations and the waiting times at stations. Therefore, we needed to import the arrival times of vehicles at these stations. We did this by importing the code that calculates and assigns the arrival times of vehicles to stations directly into the A* algorithm. This increases the run time of the algorithm, but we had no other choice, as exporting this information to an Excel document and importing it again caused data type problems that made the arrival times inaccessible.

After importing the data, the only filtering required concerned the stations. We found the route that both intersection stations are on and the related arrival times at each station. Comparing these arrival times with each other gave us the travel time, and comparing them with the current time gave us the waiting time.

4. Map Generation

To see the performance and recommendations of our algorithm, we wanted to show the locations of the stations and the recommended route on a map. If possible, we also intend to draw the route to make it more visible. The library we used for this is Plotly, which allowed us to plot dots and draw lines between them on a map of the world.

The initial goal was to find an interactive map for which road data was accessible. This way, we could draw our output route on the map in a way that reflects the real-life scenario. However, there were some obstacles with mapping. First, our data for this project is synthetic and may not reflect a logical scenario in some cases. The generated routes and the connections between them are partly random.
Therefore, if we were to use a dynamic map, the results would be complex and unrealistic. Second, dynamic map options such as the Google Maps API and Yandex required a subscription. These economic and technical barriers prevented us from using a dynamic map, so we selected the Python library Plotly, which provided a free map even though it is only a static one. With Plotly, the result of the optimization can only be shown as line segments between the nodes rather than along real-life roads. Still, Plotly enabled us to show our results in a more user-friendly manner. Implementing a more sophisticated mapping option such as Google Maps or Yandex would give a clearer picture of the result and can be considered in the future.

4. RESULTS & DISCUSSION

The initial objective of the project was to develop an optimization algorithm that helps users find the best route from the start to the target location. What counts as the best route was meant to depend on the preferences of the user. We wanted to use real-life data for stations, routes, and locations so that the resulting route could be tried in reality. Even though accessing the real data was not possible, our team succeeded in developing the algorithm, generating data that mimics real life, and combining the two to find the optimal route. Given a dataset in the same format as the one used in this project, the optimization algorithm should run successfully and return the best route. The algorithm also prints the total travel time and the necessary transfers between vehicles.

The A* search itself took around 1 second to run and return the resulting route. However, our program also calculates the data needed for the optimization, which takes an additional 28 seconds. This is a very long time, but if the algorithm is published as an application, the data can be supplied statically, so the run time would be much shorter.
Of course, this success does not mean that the project cannot be improved. One important shortcoming is that it uses synthetically generated data rather than real data. Even though the generated data is based on part of the real data and on realistic assumptions, it may produce routes that are not applicable in real life. To use this algorithm in reality, one should supply real data in the same format and run it again.

Another part that can be improved is the user interface. At the moment, the project has no user interface apart from our plans to show the stations and the route on a map. The user has to open the code and enter the user-specific information, which may not be easy.

Even though it is an important part of travel optimization applications, we were not able to implement different objective options that depend on the user’s preferences. Users should be able to select objectives such as minimizing the walking distance or the number of vehicles used. Time constraints prevented us from implementing these options.

The optimization algorithm can be developed further, but it is a good basis for real-world applications of the A* algorithm and works correctly for the data entered as input.

5. IMPACT

The project aims to make an impact in two ways. The first directly benefits the users of the application, and the second helps the MaaS project in which our project can be used.

a. Social Impact

The algorithm makes it possible for users to find the most efficient way of reaching their destinations. The meaning of efficiency can change from user to user. If a user wants to find the cheapest way, the algorithm benefits the user economically. It can also provide planning benefits if a user wants to reach the target destination as fast as possible.
Additionally, the algorithm can affect the user’s daily routine in terms of sports, sightseeing, or resting. In a rapidly changing world, people need to be fast in order to keep up. For this reason, the application speeds up the social life of individuals and makes their lives easier.

b. Impact on MaaS Projects

This problem is a sub-problem of developing a MaaS platform. Using a MaaS platform, users should be able to plan their route to target destinations, look for accommodation options, and pay for their travel and their stay. Our project solves this sub-problem and provides optimal route planning to its users. The optimization algorithm of this project can be used in different Mobility-as-a-Service products with different data if needed.

6. ETHICAL ISSUES

There are no ethical issues.

7. PROJECT MANAGEMENT

In basic terms, our project is a route optimization problem. To find the algorithm that best meets our needs, we had to use different features and approaches that helped us with the optimization process and data generation. While working on the project, we encountered several obstacles that interfered with our initial plans and changed how our original prototype functions.

The first obstacle arose during our search for an API connection. Our initial plan was to use Google’s API. However, after the necessary research, we found that the Google API has limitations and could not meet the technical needs of our project. Ultimately, we had to choose a different path and use Google apps, which provide free services to every user. Although this was somewhat of a setback in terms of testing, it fortunately did not significantly affect the project’s scope and operation.

The biggest obstacle we encountered, which abruptly changed our initial plans, concerned data collection.
In order to give users an optimization service that also considers their preferences, we needed stored data from the beginning. Initially, we tried to collect detailed real-life data such as locations, routes, public transportation stops, and the estimated times at which vehicles reach their destinations. Our focus was on Istanbul, a heavily populated city with many public transportation routes and schedules, so the amount of data we needed was huge. After the necessary research, we found that the open data portal of the Municipality of Istanbul offered a database with part of the information we needed. The database gave us the locations of bus stops with their names and codes, but we still could not obtain routes and schedules, including those of other public transportation types. The data collection and research part of our project was a major setback in terms of both time and efficiency. Even after extensive research, we were not able to find a dataset that matched our needs. Eventually, because of these setbacks, we changed our initial plan from collecting existing data to generating synthetic data.

8. CONCLUSION AND FUTURE WORK

Our project mainly focuses on developing an efficient route planning algorithm for MaaS services to give users the best experience. It is defined and developed with respect to critical points such as the scale of the network, the way static data is retrieved, and the means of transportation. It is able to generate effective dynamic solutions to real-time problems. It offers the user a journey with the details of the stops according to the vehicle used and produces the optimal route. The implemented model includes route optimization and the integration of different data for real-life applications.

There are several other steps that can be taken to develop our work.
Because of time-related, economic, and technical constraints, we were not able to implement these in this version.

a. Running Tests to Evaluate Performance

This project needs real-time data in order to evaluate its performance. First of all, to make it usable by many users at the same time, there should be a server that hosts the algorithm and uses real-time data and sources. The server should provide a seamless connection while refreshing the data it obtains from traffic resources, which would provide information such as the level of traffic congestion and the average speed on a highway, so that it can reliably report the traffic status. If this were the case, our project's evaluation would depend on the performance of the server. However, in this project neither the real-life resources nor the data are accessible. These constraints are explained thoroughly in the report.

Since this is the case, other metrics remain to be considered for performance purposes. The most important one is the speed and complexity of the shortest-route-finding algorithm, in our case the A* algorithm. In the worst case, where the search space grows exponentially, A* has a complexity of $O(b^d)$, where $b$ is the number of branches and $d$ is the depth of the graph. For traffic data, the search space does not grow exponentially, as there is a limited number of roads and resources. Therefore, the A* algorithm is not expected to degrade to its worst-case complexity and provides fast results even when the data is dense.

As future work, the first task is to obtain or develop a server that provides the features our data lacks. Real-time traffic data and resources that can provide periodic updates are needed.
After this server is obtained, its performance can be tested by measuring the congestion level when a large group of users use it at the same time, or by measuring how the algorithm reacts in extraordinary cases such as an accident on a crowded highway or a closed road that affects the route dramatically. Without such a server, these evaluations cannot be carried out; until then, measuring only the performance of the algorithm is adequate.

b. Implementing real-life data

Our main goal in this project was to provide users with an optimization service that helps them select between various public transportation options. For this optimization model to have a real-life application, we needed real-life data, but we were unable to obtain sufficiently detailed data for the reasons explained in this report. Eventually, we used partial real-life data and synthetically generated the rest from it. This data collection setback affects how realistic the results are: the results of the optimization model may not be as accurate as they would be with real-life data. Hence, one of the most important future works for this project is obtaining real-life data, putting it in the format expected by the algorithm, and running the optimization on it. Two formatting operations need to be done after obtaining the data:

1. Intersection Stations
The data of intersection stations and their neighboring intersection stations must be kept in a Python dictionary. Keys should be the station codes, and values should be lists containing the station codes of each station's neighbors.

2. Arrival Times
The arrival times of vehicles at all stations should be kept as a pandas data frame. This data frame must include the station codes, the station names, the routes that these stations are on, and the arrival times of the vehicles at each station as a list.
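As a minimal sketch of the two formats described above, the example below builds a small dictionary and data frame and checks that they are consistent with each other. The column names "Station Code", "Route ID", and "Arrival Times" follow the listing in Appendix A; the "Station Name" column label, the station codes and names, the interpretation of arrival times as minutes after midnight, and the helper validate_inputs are assumptions made only for illustration.

```python
import pandas as pd

# Hypothetical station codes and names, for illustration only.
intersection_relationships = {
    "S1": ["S2"],
    "S2": ["S1", "S3"],
    "S3": ["S2"],
}

# One row per (station, route) pair. Arrival times are assumed to be
# minutes after midnight, sorted in ascending order.
all_routes = pd.DataFrame(
    {
        "Station Code": ["S1", "S2", "S2", "S3"],
        "Station Name": ["Alpha", "Beta", "Beta", "Gamma"],
        "Route ID": [10, 10, 20, 20],
        "Arrival Times": [[480, 510], [495, 525], [500, 530], [515, 545]],
    }
)


def validate_inputs(neighbours, routes):
    """Return a list of problems; an empty list means the inputs look consistent."""
    required = ["Station Code", "Station Name", "Route ID", "Arrival Times"]
    missing = [column for column in required if column not in routes.columns]
    if missing:
        return [f"missing columns: {missing}"]

    problems = []
    known_codes = set(routes["Station Code"])
    # Every station mentioned in the dictionary needs at least one timetable row.
    for station, adjacent in neighbours.items():
        for code in [station, *adjacent]:
            if code not in known_codes:
                problems.append(f"station {code} has no arrival times")
    # The cost calculation scans each list for the first suitable time,
    # so the lists are expected to be sorted.
    for code, times in zip(routes["Station Code"], routes["Arrival Times"]):
        if list(times) != sorted(times):
            problems.append(f"arrival times of station {code} are not sorted")
    return problems


print(validate_inputs(intersection_relationships, all_routes))
```

A check of this kind could be run once after real-life data is converted, before the optimization is started.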
This way, the algorithm may generate logical results that are in line with the current public transportation network in Istanbul or any other city.

c. Mapping

Mapping could not be done as intended because it does not make sense to do it with synthetically generated data; real-world data would be more meaningful. We also had technical and economic difficulties, as the paid mapping options were unavailable to us and the free version did not work as intended. The only mapping feature available to us for free was the Python library Plotly, which enables drawing a line segment between a pair of coordinates. It provides a basic visualization but does not meet the expectations of a navigation system.

For the future work of this project, if real-time data and traffic resources were accessible, a dynamic map would be preferred over a static one. Obtaining such a server would already require overcoming the economic barrier, and it would also resolve the technical barrier of insufficient data and resources. Hence, with some financial support, obtaining a dynamic map would no longer be a barrier for the project. Such a navigation system would integrate many kinds of live data, such as addresses, transportation network systems, road databases, the up-to-date status of transportation vehicles, and traffic congestion. With the help of such a mapping system, the project would provide a better and more realistic visualization as well as a friendly user interface.

d. Implementing user preferences

We use the A* algorithm to achieve the minimum travel time using the vehicles' arrival times at stations, as this is the most common definition of an objective in route planning applications. Alternatively, other objectives can be used, such as walking as little as possible, using the smallest number of public transportation vehicles, and minimizing the cost of travel.
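To illustrate how such alternative objectives could be swapped in, the sketch below runs the same search with two different step-cost functions. It is not the project's implementation: the network EDGES, the edge attributes "minutes" and "walk_m", and the functions travel_time, walking_distance, and best_route are hypothetical. The search is A* with an optional heuristic; with the default zero heuristic it behaves like Dijkstra's algorithm.

```python
import heapq

# Hypothetical network: each edge stores travel minutes and walking metres.
EDGES = {
    "A": [("B", {"minutes": 5, "walk_m": 400}), ("C", {"minutes": 12, "walk_m": 0})],
    "B": [("D", {"minutes": 5, "walk_m": 300})],
    "C": [("D", {"minutes": 10, "walk_m": 50})],
    "D": [],
}


def travel_time(edge):
    """Objective 1: reach the destination as soon as possible."""
    return edge["minutes"]


def walking_distance(edge):
    """Objective 2: walk as little as possible."""
    return edge["walk_m"]


def best_route(edges, start, goal, step_cost, heuristic=lambda node: 0):
    """Return (route, total cost) for the cheapest route under step_cost."""
    # Heap entries: (estimated total cost, cost so far, node, route so far).
    frontier = [(heuristic(start), 0, start, [start])]
    best_cost = {start: 0}
    while frontier:
        _, cost, node, route = heapq.heappop(frontier)
        if node == goal:
            return route, cost
        if cost > best_cost.get(node, float("inf")):
            continue  # a cheaper way to this node was already expanded
        for neighbour, edge in edges.get(node, []):
            new_cost = cost + step_cost(edge)
            if new_cost < best_cost.get(neighbour, float("inf")):
                best_cost[neighbour] = new_cost
                estimate = new_cost + heuristic(neighbour)
                heapq.heappush(frontier, (estimate, new_cost, neighbour, route + [neighbour]))
    return None, float("inf")


print(best_route(EDGES, "A", "D", travel_time))       # fastest route
print(best_route(EDGES, "A", "D", walking_distance))  # least walking
```

In this toy network the fastest route and the route with the least walking differ, which is the situation in which the user's preference matters. Counting the number of vehicles used would additionally require tracking the current route in the search state, which this sketch leaves out.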
The A* algorithm allows the definition of the objective function to be changed. This was the initial reason we wanted to use A* instead of Dijkstra's algorithm and K-Shortest Path. In each step, the cost is calculated with the help of an additional function. We have updated this function to calculate travel times rather than distances. However, for this application to be complete, helper functions that calculate the other definitions of cost should be integrated.

e. Front-end of the application

The algorithm may be hard to understand for people with little knowledge of this area. Therefore, a simple front-end should be created where users can enter their preferences, current location, and target location as input. Since the front-end will be the first thing that the user sees, it should be simple and easy to understand. Input data can be received by specifying the current place and the destination. The selections of the users could be supported by additional visual materials such as mapping. To develop the front-end, tasks such as entering the data to be used in the application, placing content, and handling selections in line with user requests should be considered. Current applications that provide route optimization are good examples of an easy-to-use front-end, combining a large map with clear input options for the user.

f. Updating the algorithm for accepting any location

In the current version of the algorithm, users can choose their start and target destinations only among the intersection stations in the network. Intersection stations are the stations at which the user can change vehicles. The A* optimization runs between these nodes; therefore, this was the most important step. To make the application more user-friendly and provide a service where users can plan to go from anywhere to anywhere, a location discovery system is needed.
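One possible building block for such a location discovery system is a nearest-station lookup based on great-circle distance. The sketch below is an assumption-laden illustration rather than part of the project: the station codes and coordinates in STATIONS are made up, and the functions haversine_km, closest_station, and walking_minutes, as well as the walking speed of 5 km/h, are hypothetical choices.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0

# Hypothetical station codes with made-up (latitude, longitude) pairs.
STATIONS = {
    "S1": (41.0370, 28.9850),
    "S2": (40.9909, 29.0280),
    "S3": (41.0422, 29.0067),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two coordinates in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = radians(lat2 - lat1)
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def closest_station(lat, lon, stations):
    """Return (station code, distance in km) of the station nearest to the user."""
    code = min(stations, key=lambda c: haversine_km(lat, lon, *stations[c]))
    return code, haversine_km(lat, lon, *stations[code])


def walking_minutes(distance_km, speed_kmh=5.0):
    """Rough walking time; the default speed is an assumed average."""
    return distance_km / speed_kmh * 60


code, distance = closest_station(41.0380, 28.9860, STATIONS)
print(code, round(distance, 3), round(walking_minutes(distance), 1))
```

The walking time returned here is the kind of extra cost that would have to be added to the route cost when the user does not start at an intersection station.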
If the user wants to start the journey from the current position, the coordinates should be taken and the closest station should be determined. It is important to integrate this feature, as the proportion of intersection stations to the total number of stations is very low, and users should be able to see and use other stations too. An update to the A* algorithm is also needed after finding the station closest to the user. The optimization will still run between intersection stations; however, the costs of going from the closest station to the first intersection station and from the last intersection station to the target location should be included. The optimal route between intersection stations will probably not change, but the resulting optimal travel time will.

g. Pricing Optimization

Public transportation is generally preferred because it is cheaper than other transportation options. The public transportation vehicles that the user takes along the resulting optimal route may be priced differently. To help the user make an economic decision, an integration that can determine the price of each vehicle on each generated route could be added. However, the fact that our data is synthetic, that is, that we do not have the real data, limits our progress. Therefore, the inclusion of a ticketing system is also not feasible at this stage, as our model cannot provide the user with a transportation cost for the destination they want to reach. If real-life data can be found along with its pricing, an implementation that performs price optimization and thus minimizes travel costs may be achieved.

Implementing all of these future works would improve our work and make it more accessible to the end-user. Our algorithm and synthetic data provide a very good basis for future work, but may not be enough to be marketed for commercial use.

9.
APPENDIX

Appendix A: A* Algorithm

from copy import deepcopy

# Assumption: each module exposes an object of the same name, namely the
# dictionary of neighbouring intersection stations and the pandas data frame
# of arrival times described in Section 8.b.
from IntersectionRelationships import IntersectionRelationships
from allRoutes import allRoutes

NOT_FOUND = 0.1  # sentinel: no suitable departure or arrival time found yet


def calculate_cost(nodeA, nodeB, currentTime):
    """Return the waiting time plus the travel time from nodeA to nodeB."""
    finalDepartureTime = NOT_FOUND
    finalArrivalTime = NOT_FOUND
    waitTime = 0
    travelTime = 0
    selectedRoute = -1

    # Pick the first route through nodeA that also serves nodeB.
    infoOfNodeA = allRoutes.loc[allRoutes["Station Code"] == nodeA]
    routesOfNodeA = infoOfNodeA["Route ID"].tolist()
    for routeId in routesOfNodeA:
        if nodeB in allRoutes.loc[allRoutes["Route ID"] == routeId]["Station Code"].tolist():
            selectedRoute = routeId
            break

    # Arrival times at nodeB on the selected route; a very large value is
    # used when no route connects the two stations directly.
    routeTimes = allRoutes.loc[allRoutes["Route ID"] == selectedRoute][["Station Code", "Arrival Times"]]
    timesAtNodeB = routeTimes.loc[routeTimes["Station Code"] == nodeB]["Arrival Times"]
    if len(timesAtNodeB) > 0:
        arrivalTimes = timesAtNodeB.item()
    else:
        arrivalTimes = [999 * 999]

    # First time that is not earlier than the current time.
    for departureTime in arrivalTimes:
        if currentTime <= departureTime:
            finalDepartureTime = departureTime
            waitTime = finalDepartureTime - currentTime
            break
    # Nothing left today: wait for the first time of the next day
    # (24 * 60 minutes per day).
    if finalDepartureTime == NOT_FOUND:
        finalDepartureTime = arrivalTimes[0]
        waitTime = (24 * 60 - currentTime) + finalDepartureTime

    # First arrival that is not earlier than the selected departure.
    for arrivalTime in arrivalTimes:
        if finalDepartureTime <= arrivalTime:
            finalArrivalTime = arrivalTime
            travelTime = finalArrivalTime - finalDepartureTime
            break
    if finalArrivalTime == NOT_FOUND:
        travelTime = 48 * 9999  # penalty when no arrival is found

    return waitTime + travelTime


def find_nearest_frontier_node_AStar(routes, goal, currentTime):
    # Path cost so far plus the estimated cost from the node to the goal.
    path_costs = {node: routes[node][1] + calculate_cost(node, goal, currentTime) for node in routes}
    return min(path_costs, key=path_costs.get)


def goaltest(location, goal):
    return location == goal


def shortest_path(start, goal, currentTime):
    print("shortest path called")
    neighbours = IntersectionRelationships
    explored = set()
    frontier = {start: ([start], 0)}  # node -> (route so far, cost so far)
    count = 1
    while len(frontier) > 0:
        current_node = find_nearest_frontier_node_AStar(frontier, goal, currentTime)
        print(count, ":", current_node)
        count += 1
        current_route, path_cost = frontier[current_node]
        if goaltest(current_node, goal):
            print('Minimum time to reach goal using A* algorithm: {:.2f}'.format(path_cost))
            return current_route
        frontier.pop(current_node)
        explored.add(current_node)
        for neighbour in neighbours[current_node]:
            if neighbour in explored:
                continue
            new_route = deepcopy(current_route)
            new_route.append(neighbour)
            step_cost = calculate_cost(current_node, neighbour, currentTime)
            new_path_cost = path_cost + step_cost
            # Keep only the cheapest known route to each frontier node.
            if neighbour not in frontier or new_path_cost < frontier[neighbour][1]:
                frontier[neighbour] = (new_route, new_path_cost)
    return None  # the goal cannot be reached from the start

10. REFERENCES

Dere, E., & Durdu, A. (2018, November 13). Usage of the A* Algorithm to Find the Shortest Path in Transportation Systems.