ENS 491-492 – Graduation Project (Implementation)
Final Report
Shortest Path Optimization for Public Transport According to
User’s Preferences
Group Members:
Ayça Şevval Kahraman
Batu Tan
Ceylin Tetik
Murat Dalyancı
Özge Gül Erbay
Supervisor: Gizem Özbaygın
Date: 12.06.2022
1. EXECUTIVE SUMMARY
Mobility-as-a-Service (MaaS) platforms support users through the operations they encounter
while planning a trip. The main purpose of a MaaS platform is to offer a single place where
end-to-end trip planning, payment, and ticketing are available to all users, which improves the
user experience considerably. The need for such platforms emerged from the shortcomings of
simple navigation systems: although widely used, they do not provide end-to-end, detailed trip
planning, and they lack payment and ticketing features.
Our project’s purpose is not to develop a MaaS platform directly, but to develop an
efficient trip planning algorithm used in MaaS services, which results in an enhanced user
experience.
To develop a route planning algorithm, we needed to decide on several factors. The
method of data collection and generation, and the decision of the optimization algorithm used
were two of them.
Optimization methods such as A*, K-Shortest Path (KSP), and Dijkstra’s algorithm have
been widely used for route optimization purposes. A* is a best-first search algorithm for finding
the shortest route from a starting node to a target node, which matches our problem. It also
allows us to take the user’s preferences into account when computing the route, so we
decided to use A*.
Public transportation data used during this project is generated synthetically, as it was not
possible to obtain the necessary data directly from any source. Several considerations have been
made to make routes as realistic as possible, such as merging close stations to create a route and
assigning vehicle arrival times according to the distance between stations. This data has been
formatted in a way that is acceptable to the A* algorithm used.
Even though this is not a necessity for the project, we aim to visualize the results by
showing the stations and the route on a real-life map.
All these processes allow us to create an optimization model that works according to
the user’s wishes and shows them the optimal route with ease. Given real-life data in the
same format, the model should generate the real-life route for the user.
2. PROBLEM STATEMENT
The main problem in our project is how to get from one point to another by one or more
public transport vehicles. The scope of this problem changes according to the user because the
user can specify the objectives. The objectives can be:
(1) Reaching the target destination as soon as possible
(2) Walking as little as possible
(3) Using as few public transportation vehicles as possible
This problem is a sub-problem of creating a MaaS platform, which includes all the stages
of trip planning. As the solution to this sub-problem would benefit the resulting MaaS
application, we were motivated to work on it.
To achieve this goal, we conducted extensive research to find the necessary data and the
most efficient route optimization algorithm. Constraints and problems we faced only increased
our interest and we found solutions to each one of them one by one.
The fact that this algorithm and the resulting MaaS application could be used by people
in real life made it more fun to work on this project. We worked on a real scenario, not a
hypothetical one. Of course, learning about a new optimization algorithm and handling data
affected our motivation positively, and we became better equipped with knowledge.
While learning made it easier for us to commit to the project, there were certain tasks and
objectives we needed to deal with, some of which gave us a hard time.
2.1. Objectives/Tasks
a. Finding data
The greatest resource the problem requires is data, as it is the core input that the
optimization algorithm works on.
This data must include:
(1) Station names
(2) Station codes
(3) Station locations
(4) Station types in terms of vehicle type
(5) Routes
(6) Arrival times of certain vehicles at certain stations
For the scope of this project, we searched for a database that included this information.
The closest dataset was the one provided by the Istanbul Municipality, but it only included the
names, codes, and locations of some stations.
We generated more detailed data using the data we collected. Many simplifications were
made to make it usable by the A* algorithm, but the work does not end there: the data still
needs to be formatted so that it is easier to use for optimization.
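The six required fields can be pictured as one record per station. The following is a minimal sketch only: the class name, the field names, and the sample values are illustrative assumptions and are not taken from the project's dataset.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Station:
    """One public transport stop; every field name here is an assumption."""
    code: int                                   # unique station code
    name: str                                   # station name
    lat: float                                  # latitude in degrees
    lon: float                                  # longitude in degrees
    vehicle_types: Set[str] = field(default_factory=set)   # e.g. {"bus", "tram"}
    routes: List[str] = field(default_factory=list)        # IDs of routes serving the stop
    # Arrival times in minutes after midnight, keyed by route ID.
    arrivals: Dict[str, List[int]] = field(default_factory=dict)

    @property
    def is_transfer(self) -> bool:
        """A stop served by several vehicle types lets the user change vehicles."""
        return len(self.vehicle_types) > 1


# Made-up sample record.
example = Station(code=1001, name="Example Stop", lat=41.0, lon=29.0,
                  vehicle_types={"bus", "tram"}, routes=["R1"],
                  arrivals={"R1": [480, 495, 510]})
```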
b. Route Optimization Algorithm
One of the biggest tasks was determining the optimization algorithm we wanted to use.
Several algorithms are used for finding the shortest path, including Dijkstra’s, K-Shortest Path
(KSP), and A*.
We needed an algorithm that could work dynamically, since the costs between stations
change depending on the time at which the algorithm is executed. For this reason, we eliminated
Dijkstra’s algorithm and KSP and used the A* algorithm.
If a similar problem needs to be solved without a dynamic network, the other two
algorithms could be considered.
c. Entering input data
The algorithm should be able to include user-specific data so that it can work properly.
This data includes the location of the user and the user’s preferences. This task was one of the
shorter ones, as it can be solved with a basic input mechanism in the code.
d. Mapping
Even though the mapping is out of the scope of this project, we think it is the best way to
show the results of the optimization algorithm. Therefore, this task had an important place.
One can use a subscription-based API that supports mapping, but there are also free
options. We used a Python library called Plotly, which enables us to show the locations of the
stations and the route given by the algorithm on a real-life map.
The original problem does not need a user interface to be solved; however, one makes the
algorithm much easier to use. The map serves as the user interface of the project. Since it is
the part that the user actually sees, it should present the output in a clear and detailed
way.
2.2. Realistic Constraints
Working on this problem comes with its difficulties. There were constraints that we faced
as developers, and there will be constraints when this algorithm is used by users.
a. Constraints Faced by the Developers
The biggest constraint we had was regarding the data. At the beginning of the project, we
worked on data separation and data access. It was quite challenging to find public
transportation data that included the stops. In addition, there was no way
to access usable data that included the routes and arrival times of the buses. Decomposing
the obtained data and integrating it into Excel or code was also difficult.
Additionally, there was confusion regarding mapping and specifying the route on the
map. Most of the applications we could draw a map with were subscription-based, and we had to
pay before using them. Eventually, we managed to find a Python library that enabled us to show
the stations and routes on a map.
b. Constraints Faced by the Users
Users will need to satisfy several constraints to be able to use the algorithm.
Firstly, the algorithm needs the location of the user to match it with the closest public
transportation stations. The user has to either enter the location manually or enable the location
services of their device. Depending on the device and the IP address of the network it is
connected to, the obtained location might be less accurate.
Furthermore, the algorithm requires more computing power as the public transportation
network gets bigger and the optimization becomes harder. If this algorithm is used in
real life, the best approach would be to run it on a server so that performance is
independent of the user’s device.
As it was out of the scope of the project, we did not provide any user interface for the
algorithm. For someone that does not understand programming, it could be hard to use the
model. The only ease of use lies in the mapping, as the resulting route is aimed to be shown on a
map.
3. METHODOLOGY
1. A* Optimization Algorithm
a. Description
A* is a best-first search algorithm that is generally used for shortest-path
problems. It relies on a heuristic estimate of the remaining cost and returns an optimal route
when that heuristic is admissible, that is, when it never overestimates the true remaining cost.
The algorithm finds the shortest route from the starting node to the target node [1].
Even though the algorithm is commonly used for the purpose of optimizing pricing and
route, in our project, we can have different objectives. We may try to minimize total time, the
total number of vehicles used, or total time spent walking.
A* algorithm does a good job finding the most efficient route respecting user-defined
constraints and definition of efficiency. Users may choose to exclude several nodes, such as stops
used by specific vehicle types.
The number of minutes until the next vehicle arrives at a stop depends on the time at which
the algorithm is executed. After taking user data such as location and local time, the method
combines it with the available public transportation information to build a dynamic network of
moving vehicles and optimize the user’s travel.
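One way to picture user-defined objectives and exclusions is a weighted step cost combined with a filter on neighboring stops. This is a sketch under stated assumptions only: the function names, the weight values, and the penalty scheme are hypothetical and do not describe the project's code.

```python
def edge_cost(ride_min, wait_min, walk_min, is_transfer, weights):
    """Combine the components of one step into a single scalar cost.

    weights is a hypothetical dict with keys "time", "walk" and "transfer".
    """
    return (weights["time"] * (ride_min + wait_min + walk_min)
            + weights["walk"] * walk_min                  # extra penalty per walked minute
            + weights["transfer"] * (1 if is_transfer else 0))


# Illustrative weight sets for the three objectives (made-up values).
FASTEST = {"time": 1.0, "walk": 0.0, "transfer": 0.0}
LEAST_WALKING = {"time": 1.0, "walk": 5.0, "transfer": 0.0}
FEWEST_VEHICLES = {"time": 1.0, "walk": 0.0, "transfer": 30.0}


def allowed_neighbors(neighbors, vehicle_type_of, excluded_types):
    """Drop neighboring stops whose vehicle type the user has excluded."""
    return [n for n in neighbors if vehicle_type_of[n] not in excluded_types]
```

With such a scheme the search itself stays unchanged; only the cost passed to it differs per user.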
b. The Algorithm
Our A* algorithm has four inputs: a map structure containing the list of nodes, edges, and
node IDs; the ID of the start node, the ID of the goal node, and the current time. The output is a
list of integers representing the IDs of the nodes in the shortest route.
The start and the goal nodes are entered by the user of the program. The algorithm asks
the user to enter names or codes of stations for the location. If the names or codes of the stations
are valid, then a corresponding coordinate pair can be returned for both start and goal locations.
In this step, one can use the Python geocoder library to find the coordinates of the entered places.
This step is necessary to find out the trip details, but it is not sufficient to clearly identify the
specific stations. Therefore, these coordinates can later be used to find the closest stations to
the entered locations. Based on these coordinates, a cluster search can be initiated to check whether
there is a station in our dataset within a short distance. If none is found, the size of the cluster can
be increased. In this way, one can find the station closest to the user together with its ID,
name, and coordinates. During this project, we did not
implement the option to find stations close to the user's location; instead, the user needs to enter
the code of the starting station.
[1] Dere, E., & Durdu, A. (2018, November 13). Usage of the A* Algorithm to Find the Shortest Path in
Transportation Systems.
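The widening search for a nearby station, which was not implemented in the project, could look roughly like the sketch below. The function names, the starting radius, and the radius limit are assumptions; distances use the haversine great-circle formula.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def closest_station(user_lat, user_lon, stations, radius_km=0.25, max_radius_km=4.0):
    """Return the code of the nearest station, doubling the search radius until one is found.

    stations: dict mapping station code -> (lat, lon). Returns None if no
    station lies within max_radius_km (both radii are hypothetical defaults).
    """
    radius = radius_km
    while radius <= max_radius_km:
        in_range = []
        for code, (lat, lon) in stations.items():
            d = haversine_km(user_lat, user_lon, lat, lon)
            if d <= radius:
                in_range.append((d, code))
        if in_range:
            return min(in_range)[1]      # smallest distance wins
        radius *= 2                      # nothing nearby: widen the search
    return None
```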
The algorithm starts by initializing its parameters: a dictionary of the coordinates
of the intersections given in the input map structure, an adjacency list of neighbors that represents
the roads in this map, an empty set for later storage of explored nodes, and a frontier dictionary
that maps each node on the frontier to a tuple (route, path length). The route is the
sequence of traversed locations from the start to the frontier node and becomes the output of our
algorithm.
The algorithm starts its search with a while loop that iterates as long as the frontier
dictionary is not empty. The frontier initially contains only the starting node. If the frontier
becomes empty before the goal is reached, every reachable node has been explored and no route to
the goal exists.
The process of finding the nearest frontier node is done with a helper function. This
function selects and returns the node on the frontier with the lowest total
estimated path cost to the goal. The total cost is the sum of the current path cost up to the frontier
node and the travel time between the frontier node and the goal node. This travel time is
calculated from the current time and the arrival times of the transportation vehicle at the
frontier node and the goal node. Wait time, which is found by subtracting the current time from
the arrival time of the vehicle, is added to the travel time. The inputs of this function are the
routes, the goal, and the current time. Arrival times are taken from an imported data frame.
Routes are stored in a dictionary that maps each node on the frontier to a tuple of
route and path length: the sequence of traversed locations from the start to the
frontier node, and the associated path cost. The goal is an integer giving the station code of
the goal node. The function returns the nearest frontier node after calculating the costs to the
neighboring stations.
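The selection rule described above can be sketched as follows. This is an interpretation, not the code in Appendix A: the function name is hypothetical, and the data frame of arrival times is replaced by a plain dictionary with one arrival time per station for brevity.

```python
def nearest_frontier_node(frontier, goal, current_time, arrivals):
    """Pick the frontier node with the lowest estimated total cost to the goal.

    frontier : dict node -> (route, path_cost), as described in the text
    arrivals : dict node -> minute at which the next vehicle reaches that
               node (a simplified stand-in for the imported data frame)
    """
    def estimated_total(node):
        _route, path_cost = frontier[node]
        wait = max(arrivals[node] - current_time, 0)       # minutes until the vehicle arrives
        travel = max(arrivals[goal] - arrivals[node], 0)   # ride time from node to goal
        return path_cost + wait + travel

    return min(frontier, key=estimated_total)


# Made-up example: times are minutes after midnight, station 9 is the goal.
demo_frontier = {1: ([0, 1], 5), 2: ([0, 2], 3)}
demo_arrivals = {1: 610, 2: 620, 9: 630}
demo_choice = nearest_frontier_node(demo_frontier, 9, 600, demo_arrivals)
```

In this example node 1 is estimated at 5 + 10 + 20 = 35 minutes and node 2 at 3 + 20 + 10 = 33 minutes, so node 2 is selected.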
While there is a node on the frontier, the one with the lowest estimated total cost is selected
and becomes the current node. This is a best-first selection. Then, the route identified up
to this node and its path cost are read from the frontier. If the selected frontier node
is the goal node, then the shortest path has been found and the while loop is
exited, which means our final route consists of these nodes only. If not, the node is
removed from the frontier and added to the explored set.
The next step is to visit all the neighbors of this popped frontier node. A for loop is
started for this iteration and the respective neighbor in the for loop is added to a copy of the
current path list for calculation purposes. The step cost to transition to the neighbor from the
frontier node is calculated and the total path cost is updated with this new step cost. If the
neighbor is already in the explored set, no action is taken and the algorithm moves to the next
neighbor. If the next neighbor is not on the frontier node list, it is added with its route and path
cost. If the neighbor is already on the frontier, then the list is updated if the new route is shorter.
This step is done by collecting the path cost of the route already on the frontier and comparing
the path cost with the new route. If the existing path cost is larger than the new one, then the
frontier is updated with the new route. This for loop iterates over every neighbor, and the frontier is
only updated if the newly found route is shorter. After the search is done, the current route is
returned. The respective code can be found in Appendix A.
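For orientation, the loop described above can be condensed into the sketch below. It is not the code from Appendix A: the signature, the `step_cost` callable, and the toy network are assumptions, and the sketch returns the path cost together with the route.

```python
def a_star(neighbors, start, goal, start_time, step_cost, heuristic=lambda node: 0):
    """Return (route, total_cost) from start to goal, or (None, inf) if unreachable.

    neighbors : dict mapping a station code to the codes reachable from it
    step_cost : hypothetical callable (a, b, time_at_a) -> minutes of waiting plus riding
    heuristic : estimate of the remaining minutes to the goal; the default of 0
                reduces A* to uniform-cost search
    """
    frontier = {start: ([start], 0)}   # node -> (route so far, path cost so far)
    explored = set()
    while frontier:                    # an empty frontier means no route exists
        # Best-first choice: lowest path cost plus heuristic estimate.
        node = min(frontier, key=lambda n: frontier[n][1] + heuristic(n))
        route, cost = frontier.pop(node)
        if node == goal:
            return route, cost
        explored.add(node)
        for nxt in neighbors.get(node, []):
            if nxt in explored:
                continue
            new_cost = cost + step_cost(node, nxt, start_time + cost)
            # Add the neighbor, or replace its entry if the new route is cheaper.
            if nxt not in frontier or new_cost < frontier[nxt][1]:
                frontier[nxt] = (route + [nxt], new_cost)
    return None, float("inf")


# Toy network with fixed step costs in minutes (made-up values).
toy_neighbors = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
toy_costs = {("A", "B"): 5, ("A", "C"): 2, ("B", "D"): 1, ("C", "D"): 10}
toy_route, toy_cost = a_star(toy_neighbors, "A", "D", 0,
                             lambda a, b, t: toy_costs[(a, b)])
```

Passing the time at the current node to `step_cost` is what would let waiting times depend on when the user reaches a station.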
One important detail in the algorithm is that it does not accept all of the stations to run the
optimization. The algorithm only needs intersection stations, in which users can change vehicles.
After the optimization is done between the intersection stations, travel time from the start
location to the beginning intersection station and from the end station to the target location can
be added.
We also have implementations of Dijkstra’s algorithm and Uniform Cost Search, which can be used for
comparison purposes if necessary. However, these implementations use distance as the path cost, and
it may be hard to update them to use travel times instead. Our main focus is the A*
algorithm, but since there can be many different scenarios in Istanbul’s public transportation
routes, these two algorithms can be used if needed to check whether they produce a route with a smaller
cost, because their complexity is low and they do not take long to run.
2. Data Collection and Generation
This project aims to create an optimization model that minimizes metrics such as walking
distance, total time, or the total number of vehicles used. To solve this minimization
problem, we needed data that defines the constraints and distances used in the
objective.
We faced several problems acquiring the actual public transportation data of Istanbul, but
we managed to use the open data portal of the municipality of Istanbul and download the data
containing locations, names, and codes of 13907 bus stops in the city. Unfortunately, this data
was not enough: we also needed the same data for all vehicle types, the connections between the stops,
and the estimated arrival times of vehicles. In addition, modeling the directions of the public
transportation vehicles and the stops they pass through requires time-dependent variables.
After trying to reach this data using the same portal and other sources available, we
decided to generate synthesized data representing the actual one, so that we can use it in the
optimization model. This creation took place in several steps.
a. Removing Duplicates
The bus station data we collected included multiple stations with the same name, mostly
because of the stops that are in the same place but in different directions. We want to model the
routes as two-way options, meaning vehicles can travel up and down the route. Therefore, we
removed the duplicates from the data using Microsoft Excel, which left us with 6551 stations.
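The same de-duplication can be expressed in a few lines of Python. This sketch keeps the first stop seen for each name, which mirrors a keep-first duplicate removal; the tuple layout, the name normalization, and the sample rows are assumptions.

```python
def drop_duplicate_names(stops):
    """Keep the first stop for every station name.

    stops: iterable of (code, name, lat, lon) tuples (hypothetical layout).
    Names are compared after trimming whitespace and ignoring case.
    """
    seen = set()
    unique = []
    for code, name, lat, lon in stops:
        key = name.strip().casefold()
        if key not in seen:
            seen.add(key)
            unique.append((code, name, lat, lon))
    return unique


# Made-up rows: two directions of the same stop share a name.
sample_stops = [
    (101, "Example Square", 41.001, 29.001),
    (102, "Example Square", 41.002, 29.001),
    (103, "Harbor", 41.010, 29.020),
]
unique_stops = drop_duplicate_names(sample_stops)
```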
b. Distributing Stops to Different Vehicle Types
In order to provide different transportation options to the user and modify the
optimization according to the user's preferences, we need the station data of all available vehicle
types. For this reason, we analyzed the number of stations different vehicles have in Istanbul.
Using Python, we divided the 6551 stations into vehicle types in proportion to these numbers.
Each match indicates which vehicles can use a given station.
Additionally, we matched some stations with multiple vehicle types, representing the
transfer stations. After this process, we were left with 3406 bus stations, 756 metro stations, 309
metrobus stations, 1921 minibus stations, and 327 tram stations.
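A proportional assignment of this kind might be sketched as below. The weights reuse the station counts reported above; the function name, the random seed, and the 3% transfer fraction are assumptions, so the sketch will not reproduce the exact counts.

```python
import random

# Station counts per vehicle type as reported above, used here only as weights.
STATION_COUNTS = {"bus": 3406, "metro": 756, "metrobus": 309, "minibus": 1921, "tram": 327}


def assign_vehicle_types(codes, shares, transfer_fraction=0.03, seed=0):
    """Give every station one vehicle type at random, weighted by `shares`.

    A small fraction of stations (hypothetical default) also receives a second
    type, which turns them into transfer stations.
    """
    rng = random.Random(seed)                # fixed seed keeps runs reproducible
    types = list(shares)
    weights = [shares[t] for t in types]
    assignment = {}
    for code in codes:
        primary = rng.choices(types, weights=weights, k=1)[0]
        assignment[code] = {primary}
        if rng.random() < transfer_fraction:
            others = [t for t in types if t != primary]
            assignment[code].add(rng.choice(others))
    return assignment
```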
c. Generating Distance Matrix for Each Vehicle Type
To be able to generate routes logically, and use the data later in the optimization model,
we needed distance matrices showing distances between every station of the same type, that is
either bus, metro, metrobus, minibus, or tram.
Using Python, we developed an algorithm that takes the coordinates of the stations of the
same type, finds the distance between them, and writes that data into an Excel file as a distance
matrix. Rows and columns are labeled with the unique station codes, indicating which pair of
stations each distance belongs to.
This process requires a lot of computation power and took 4 hours to run,
because calculating the distances between every pair of stations of the same type means over 16 million
iterations. When it finished, the algorithm had produced 5 Excel files, each showing distances for
stations of a different vehicle type.
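The iteration count follows from the station counts per vehicle type given earlier: filling every cell of the five matrices takes the sum of n squared over the types. The sketch below checks that arithmetic and shows one way to build a symmetric matrix by visiting each pair once, which roughly halves the work; the function names are assumptions, and the original implementation may differ.

```python
import math
from itertools import combinations


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) tuples."""
    phi1, phi2 = math.radians(p[0]), math.radians(q[0])
    dlmb = math.radians(q[1] - p[1])
    a = math.sin((phi2 - phi1) / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def distance_matrix(stations):
    """stations: dict code -> (lat, lon). Returns a nested dict matrix[a][b] in km."""
    matrix = {code: {code: 0.0} for code in stations}
    for (c1, p1), (c2, p2) in combinations(stations.items(), 2):
        d = haversine_km(p1, p2)
        matrix[c1][c2] = d          # the distance is symmetric,
        matrix[c2][c1] = d          # so each pair is computed only once
    return matrix


counts = {"bus": 3406, "metro": 756, "metrobus": 309, "minibus": 1921, "tram": 327}
full_cells = sum(n * n for n in counts.values())               # 16,065,023 matrix cells
unique_pairs = sum(n * (n - 1) // 2 for n in counts.values())  # 8,029,152 distinct pairs
```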
d. Generating Routes
We tried two approaches while generating routes that connect stations. For both of the
approaches, we first grouped the stations that are on the same route together and then sorted
them.
I) Clustering Approach
To match the stations that are close to each other, we wanted to use a clustering
algorithm. The most suitable algorithm was Agglomerative Clustering, as it can use a
precalculated distance matrix and supports different objectives such as minimizing the average distance,
maximum distance, or total distance between stations in one cluster. We can also choose the
number of clusters to be created.
For each vehicle type, we created as many clusters as one-tenth of the number of stations,
aiming to average 10 stations per route, and successfully created the clusters. One problem was
that it was impossible to get from one cluster of stations to another using the same vehicle
type, as the clusters were not connected. To overcome this issue, we randomly copied some
stations into clusters that did not initially include them, creating bridges between clusters.
II) Random Approach
One idea was to assign stations to clusters completely randomly. With an average of 10
stations per route, we managed to create random groups of stations. As this was a random
generation, connections between stations did not make much sense. However, it is easier to use
in the optimization model, as it is possible to walk from one route to the other one because two
close stations can be on different routes.
Whether we used the clustering approach or the random approach, we then sorted the
stations in clusters based on their locations to decide on their ordering. We made sure that
stations are ordered logically so that vehicles do not go forward and back on the same route all
the time.
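The report does not state the exact sorting rule, so the following is only one plausible way to order the stations of a cluster: start at an end of the cluster and repeatedly move to the nearest unvisited station. The function name and the toy data are assumptions.

```python
def order_route(codes, dist):
    """Greedy nearest-neighbor ordering of the stations in one cluster.

    dist[a][b] is the distance between stations a and b, including dist[a][a] = 0.
    The walk starts at a station that is farthest from some other station, so
    the route begins at an end of the cluster rather than in its middle.
    """
    codes = list(codes)
    if len(codes) < 3:
        return codes
    start = max(codes, key=lambda c: max(dist[c][o] for o in codes))
    route, remaining = [start], set(codes) - {start}
    while remaining:
        nxt = min(remaining, key=lambda c: dist[route[-1]][c])
        route.append(nxt)
        remaining.remove(nxt)
    return route


# Toy cluster: four stations on a straight line at made-up positions (km).
positions = {"a": 0.0, "b": 1.0, "c": 2.5, "d": 4.0}
line_dist = {p: {q: abs(positions[p] - positions[q]) for q in positions} for p in positions}
ordered = order_route(["c", "a", "d", "b"], line_dist)
```

On the toy line this yields a, b, c, d, so a vehicle never doubles back.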
e. Generating Estimated Arrival Times
Using the routes generated, we assigned estimated arrival times of vehicles to stations. To
represent vehicles going from one station to the next in the order the routes show, we
assigned these times in increasing order, using the distance matrix and the average speed of the
vehicle type. This way, the travel time from one station to another increases with the distance
between them.
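As an illustration of this step, the sketch below accumulates travel time along a route from segment distance and average speed, then repeats the schedule at a fixed headway. The function name, the headway, the speed, and the distances are made-up assumptions.

```python
def arrival_times(route, dist_km, speed_kmh, first_departure_min, headway_min, trips):
    """Return {station: [arrival minute of each trip]} for one route.

    route   : ordered list of station codes
    dist_km : nested dict with the distance between consecutive stations
    Times are minutes after midnight; headway_min is the (assumed) gap between trips.
    """
    offsets = [0.0]                      # minutes from the first station
    for a, b in zip(route, route[1:]):
        offsets.append(offsets[-1] + dist_km[a][b] * 60 / speed_kmh)
    return {
        station: [first_departure_min + k * headway_min + off for k in range(trips)]
        for station, off in zip(route, offsets)
    }


# Made-up example: 5 km then 10 km at 30 km/h, first trip at 08:00, every 15 minutes.
demo_dist = {"a": {"b": 5.0}, "b": {"c": 10.0}}
demo_times = arrival_times(["a", "b", "c"], demo_dist, 30.0, 480, 15, 2)
```

Here the vehicle reaches station b 10 minutes and station c 30 minutes after leaving a, so the longer segment contributes the longer travel time.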
All of the data generation processes may be executed again using different parameters at
any time in this project if we decide that the new parameters represent the public transportation
system in Istanbul more accurately.
3. Integration of Data to A* Algorithm
Our synthetic data was not in a format that is directly usable by the A* algorithm. To
overcome this issue, we needed to reformat it.
a. Intersection Stations and Neighbors
The A* algorithm we use accepts the data of intersection stations in pickle format, a
serialization format specific to the Python programming language. To make our data
acceptable, we had to make some adjustments. However, we realized that we could not put our data
into the same format, so we updated the algorithm instead.
In the necessary pickle file format, there were keys representing intersection station codes
and values as a list, representing the codes of the intersection stations that can directly be reached
from the key. An intersection station is a station at which the user can change vehicles. We
found that we could achieve the same purpose with a Python dictionary. Using the data we
already had, we spotted the intersection stations and saved them as keys to a dictionary. After
that, we found the neighboring intersection stations to these and added them as the values of the
related key. We were able to use this data in the A* algorithm and run the optimization on this
information.
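A dictionary of this shape could be derived from the routes roughly as follows. The sketch assumes that a station counts as an intersection station when it lies on more than one route, and that two intersection stations are neighbors when they share a route; both rules, like the function name, are assumptions.

```python
from collections import defaultdict


def intersection_graph(routes):
    """Build {intersection station: [intersection stations reachable without a transfer]}.

    routes: dict mapping a route ID to its ordered list of station codes.
    """
    served_by = defaultdict(set)
    for route_id, stops in routes.items():
        for stop in stops:
            served_by[stop].add(route_id)
    hubs = {stop for stop, ids in served_by.items() if len(ids) > 1}

    graph = {hub: set() for hub in hubs}
    for stops in routes.values():
        on_route = [s for s in stops if s in hubs]
        for hub in on_route:
            graph[hub].update(o for o in on_route if o != hub)
    return {hub: sorted(nbrs) for hub, nbrs in graph.items()}


# Made-up routes: stations 1, 3 and 6 each lie on two routes.
demo_routes = {"R1": [1, 2, 3, 4], "R2": [3, 5, 6], "R3": [6, 7, 1]}
demo_graph = intersection_graph(demo_routes)
```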
b. Arrival Times
The original A* algorithm does not have the concept of arrival times, as it directly uses
the distance between the nodes as path cost. In our project, the path cost should be the travel
times between stations and the waiting times at stations. Therefore, we needed to import the
arrival times of vehicles to these stations.
We did this by importing the script that calculates and assigns the arrival
times of vehicles at stations directly into the A* algorithm. This increases the run time of the
algorithm, but we had no other choice, as exporting this information to an Excel
document and importing it again resulted in data type problems. It was not possible to access
arrival times that way.
After importing the data, the only filtering needed concerned the stations. For each pair of
intersection stations, we found the route that both are on and the related arrival times at each station.
Comparing these arrival times with each other gave us the travel time, and comparing them with the
current time gave us the waiting time.
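The comparison described above can be sketched as follows, assuming that the arrival lists of the two stations are sorted and aligned so that index k refers to the same vehicle trip. The function name and this alignment are assumptions.

```python
from bisect import bisect_left


def wait_and_travel(arrivals_from, arrivals_to, now):
    """Return (waiting minutes, travel minutes) for the next vehicle, or None.

    arrivals_from / arrivals_to: sorted arrival minutes of the same trips at the
    boarding station and at the destination station (index k = trip k).
    """
    k = bisect_left(arrivals_from, now)   # first trip that has not yet left
    if k == len(arrivals_from):
        return None                       # no more vehicles today
    wait = arrivals_from[k] - now
    travel = arrivals_to[k] - arrivals_from[k]
    return wait, travel


# Made-up schedule: the ride between the two stations takes 10 minutes.
board = [480, 495, 510]
alight = [490, 505, 520]
```

For example, a user arriving at minute 485 waits 10 minutes for the 495 vehicle and then rides for 10 minutes.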
4. Map Generation
To see the performance and recommendations of our algorithm, we wanted to show the
locations of the stations and the recommended route on a map. If it is possible, we also intend to
draw the route to make it more visible.
The library we used for this aim is Plotly, which allowed us to plot dots and draw lines
between them on the map of the world. The main goal at first was to find an interactive map
where the data for the roads were accessible. This way, we could draw our output route on the
map in a way that would reflect the real-life scenario. However, there were some obstacles with
mapping.
First of all, our data for this project is synthetic and it may not reflect a logical scenario in
some cases. The generated routes and the connections between them are subject to randomized
probabilities. Therefore, if we were to use a dynamic map the results would be complex and
unrealistic.
Second of all, the dynamic map options such as the Google Maps API, Yandex, and
others required a subscription. These economic and technical barriers prevented dynamic map
usage. We therefore selected the Python library Plotly, which is free, even though
it only provided a static map. With Plotly, one can show the result of the optimization only as
line segments between the nodes rather than as a real-life road view. Still, Plotly enabled us to
show our results to the user in a more user-friendly manner.
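A plot of this kind can be described with two Plotly `scattergeo` traces, one for all stations and one for the route. The sketch below builds the traces as plain dictionaries and only imports Plotly when the figure is actually shown; the function names and the data layout are assumptions, not the project's plotting code.

```python
def route_traces(stations, route):
    """Build two Plotly trace dicts: all stations as dots, the route as a line.

    stations: dict code -> (name, lat, lon); route: ordered list of codes.
    """
    all_stops = {
        "type": "scattergeo", "mode": "markers", "name": "stations",
        "lat": [lat for _, lat, _ in stations.values()],
        "lon": [lon for _, _, lon in stations.values()],
        "text": [name for name, _, _ in stations.values()],
    }
    path = {
        "type": "scattergeo", "mode": "lines+markers", "name": "route",
        "lat": [stations[c][1] for c in route],
        "lon": [stations[c][2] for c in route],
        "text": [stations[c][0] for c in route],
    }
    return [all_stops, path]


def show_route(stations, route):
    """Render the traces; needs the third-party plotly package."""
    import plotly.graph_objects as go   # imported lazily so the helper above works without it
    fig = go.Figure(data=route_traces(stations, route))
    fig.update_geos(fitbounds="locations")   # zoom the map to the plotted points
    fig.show()
```

The route appears as straight segments between consecutive stations, which matches the limitation noted above.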
Implementing a more sophisticated mapping option like Google Maps or Yandex would
provide a clearer image of the result, and can be considered in the future.
4. RESULTS & DISCUSSION
The initial objective of the project was to develop an optimization algorithm that helps
users find the optimal route from the start to the target location. The meaning of an
optimal route was meant to depend on the preferences of the user. We wanted to use
real-life data for stations, routes, and locations so that the resulting route could be applied and
tried in reality.
Even though accessing the real data was not possible, our team succeeded in developing
the algorithm, mimicking the real-life data and combining them to find the optimal route.
Using a dataset in the same format we used in this project, the optimization algorithm
should be successful in running and resulting in the best route. The algorithm also prints the total
travel time and necessary transitions between vehicles.
The actual A* algorithm took around 1 second to run and give the resulting route.
However, our algorithm calculates the necessary data to be used in optimization, which takes an
additional 28 seconds to complete. This is a very long time, but if this algorithm is to be
published as an application, the data can be given statically, so the time would be a lot shorter.
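Supplying the data statically could be as simple as caching the generated network on disk, so that the generation step runs only once. The following is a minimal sketch using Python's pickle module; the function names, the file name, and the stand-in builder are assumptions.

```python
import os
import pickle
import tempfile


def load_or_build(path, build):
    """Return the object cached at `path`, building and saving it on first use."""
    if os.path.exists(path):
        with open(path, "rb") as fh:
            return pickle.load(fh)
    data = build()
    with open(path, "wb") as fh:
        pickle.dump(data, fh)
    return data


# Demo with a stand-in builder; a real one would generate routes and arrival times.
build_calls = []


def build_network():
    build_calls.append(1)
    return {"arrivals": {1001: [480, 495, 510]}}


with tempfile.TemporaryDirectory() as tmp:
    cache_file = os.path.join(tmp, "network.pkl")
    first = load_or_build(cache_file, build_network)    # builds and saves
    second = load_or_build(cache_file, build_network)   # loads from disk
```

Pickle files should only be loaded from trusted sources, since loading one can execute arbitrary code.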
Of course, this success does not mean that the project cannot be developed further. One
important shortcoming of our project is that it uses synthetically generated data, not real data.
Even though the generated data is based on a part of the real data and realistic assumptions, it
may result in routes that are not applicable in real life. To consider using this algorithm in reality,
one should use the real data in the same file format, and run it again.
Another part that can be developed is the user interface of the application. At the
moment, the project has no user interface, if we ignore our plans of showing the stations and the
route on a map. The user has to open the code and enter the user-specified information, which
may not be easy.
Even though it is an important part of the travel optimization applications, we were not
able to implement different objective options that would depend on the user's preferences. Users
should be able to select objective options such as minimizing the walking distance or the number
of vehicles used. We had time constraints that prevented us from implementing these solutions.
This optimization algorithm can be developed further; however, it is a good basis for
real-world applications of the A* algorithm and works correctly on the data entered as
input.
5. IMPACT
The project aims to make an impact in two ways. The first one directly benefits the users
of this application and the second one helps the MaaS project, where our project can be used.
a. Social Impact
The algorithm makes it possible for users to find the most efficient way of reaching their
destinations. The meaning of efficiency can change from user to user. If a user wants to find the
cheapest way, that would benefit the user economically. However, the algorithm can also provide
planning benefits, if a user wants to reach the target destination as fast as possible. Additionally,
the algorithm can impact the user’s daily routine in terms of sports, doing tours, or resting. In a
rapidly changing global world, people need to be fast in order to catch up with time. For this
reason, the application accelerates the social life of individuals and makes their lives easier.
b. Impact on MaaS Projects
This problem is a sub-problem for developing a MaaS platform. Using a MaaS platform,
users should be able to plan their route to target destinations, look for accommodation options,
and pay for their travels and their stay.
Our project solved this sub-problem and provides the optimal route planning services to
its users. The optimization algorithm of this project can be used in different mobility-as-a-service
products with different data if needed.
6. ETHICAL ISSUES
There are no ethical issues.
7. PROJECT MANAGEMENT
In basic terms, our project is a route optimization problem. To find the best and optimal
algorithm that meets our needs, we had to use different features and approaches that helped us
with the optimization process and data generation. Throughout the time we spent working on our
project, several obstacles came along the way that interfered with the initial plans, which
changed the functioning of our designed prototype we had in the first place.
The first obstacle we faced was during our search for an API connection. Our
initial plan was to use Google’s API. However, after the necessary research, we found out
that the Google API has limitations and could not meet the technical
needs of our project. Ultimately, we had to choose a different path and use Google apps, which
provide every user with free services. Although this was somewhat of a setback in terms of testing,
fortunately, it did not significantly affect the project’s scope and operation.
The biggest obstacle we had to encounter during the scope of our project which abruptly
changed our initial plans was in terms of data collection. In order to give users an optimization
service while also considering their preferences, we needed stored data at the beginning to give
an optimal solution for our problem. Initially, we tried to collect real-life data in a detailed
manner such as locations, routes, public transportation stops, the estimated time that a vehicle
will reach its desired destination, etc. Our focus was on Istanbul, which is a heavily populous
city that has various routes and schedules in terms of public transportation, hence the data we
needed was huge.
After the necessary research, an open data portal from the Municipality of Istanbul
provided us with a database that had some of the information we needed. The database gave us
the locations of bus stops with their names and codes, but we were still not able
to get the routes and schedules of other public transportation types. The data collection and
research part of our project was a major setback in terms of both time and efficiency. Even
after extensive research, we were still not able to find a dataset that matched our needs.
Eventually, because of these setbacks, we decided to change the initial plan from collecting
pre-calculated data to generating synthetic data.
8. CONCLUSION AND FUTURE WORK
Our project mainly focuses on developing an efficient route planning algorithm for MaaS
services to give users the best experience. It is defined and developed in accordance with critical
points such as the scale of the network, the way of retrieving static data, and means of
transportation. It is able to generate effective dynamic solutions to real-time problems. It offers
the user a journey with the details of stops according to the vehicle that is used and produces the
optimal route. The model that is implemented includes route optimization and integration of
different data for real-life applications.
There are several further steps that can be taken to develop our work. Because of
time-related, economic, and technical constraints, we were not able to implement them in this
version.
a. Running Tests to Evaluate Performance
Evaluating the performance of this project requires real-time data. First of all, to
make it usable by many users at the same time, there should be a server that runs this algorithm
using real-time data and sources. The server should provide a seamless connection while
refreshing the data obtained from traffic resources, which would provide information such as the
level of traffic congestion and the average speed on a highway, so that it can reliably report
the traffic status. In that case, our project’s evaluation would depend on the performance
of the server. However, in this project neither the real-life resources nor the data were
accessible; these constraints are explained thoroughly in the report. For this reason, other
metrics have to be considered for performance purposes. The most important one is the speed and
complexity of the shortest-route algorithm, in our case the A* algorithm. In the worst case,
where the number of nodes to explore grows exponentially with depth, A* has a complexity of
O(b^d), where “b” is the branching factor (the number of branches per node) and “d” is the depth
of the solution in the graph. Traffic data is unlikely to grow exponentially in this way, as
there is a limited number of roads and resources. Therefore, we expect the A* algorithm to stay
well away from its worst-case complexity and to return results quickly even when the data is
dense. As future work, the first task is to obtain or develop a server that provides the
features that our data is missing. There needs to be real-time traffic data and resources that
can provide periodic updates. Once this server is available, its performance can be tested by
measuring the congestion level when a large group of users uses the server at the same time, or
by measuring how the algorithm reacts in extraordinary cases such as an accident on a crowded
highway or a closed path that affects the route dramatically. Without such a server, these
evaluations cannot be carried out; until then, measuring only the performance of the algorithm
is adequate.
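As an illustration of measuring only the algorithm, the following minimal sketch times a
search function over several runs and reports the median. The names time_search and
dummy_search are hypothetical and not part of our implementation; in practice the stand-in
would be replaced by the shortest_path function from Appendix A.

```python
import statistics
import time


def time_search(search, start, goal, current_time, repeats=5):
    """Call search(start, goal, current_time) `repeats` times; return the median seconds."""
    durations = []
    for _ in range(repeats):
        began = time.perf_counter()
        search(start, goal, current_time)
        durations.append(time.perf_counter() - began)
    return statistics.median(durations)


def dummy_search(start, goal, current_time):
    # Hypothetical stand-in for the real route search; it only burns a little CPU time.
    return sum(range(10_000))


median_seconds = time_search(dummy_search, "S1", "S9", 480)
print(f"median runtime: {median_seconds:.6f} s")
```

Using the median rather than a single run reduces the effect of one-off delays on the
measurement.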
b. Implementing real-life data
Our main goal in this project was to provide users with an optimization service to help
them select between various public transportation options. For this optimization model to have a
real-life application, we needed real-life data, but we were unable to obtain sufficiently
detailed data for the reasons explained in this report. Eventually, we used partial real-life
data and synthetically generated the rest from it.
This data collection setback affects the reliability of the results. The results of the
optimization model may not be as accurate as they would be with real-life data.
Hence, one of the most important pieces of future work for this project is obtaining
real-life data, putting it in the format expected by the algorithm, and running the optimization
on it. Two formatting operations need to be done after obtaining the data:
1. Intersection Stations
The data of intersection stations and their neighboring intersection stations must be kept
in a Python dictionary. Keys should be station codes, and each value should be a list containing
the station codes of that station’s neighbors.
2. Arrival Times
The arrival times of vehicles at all stations should be kept in a pandas data frame. This
data frame must include the station codes, the station names, the routes that the stations are
on, and the arrival times of the vehicles at each station as a list.
This way, the algorithm can generate logical results that are consistent with the current
public transportation network in Istanbul or any other city.
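The two formats above can be sketched as follows. This is only an illustration: the station
codes, names, route IDs, and times are made up, the "Station Name" column name and both checking
functions are assumptions, and the rows are kept as plain dictionaries so that the sketch runs
without pandas (passing them to pandas.DataFrame would give the data frame form).

```python
# Hypothetical intersection stations and their neighbouring intersection stations.
intersection_relationships = {
    "A01": ["B07", "C12"],
    "B07": ["A01", "C12"],
    "C12": ["A01", "B07"],
}

# One row per (station, route); "Arrival Times" is a list, assumed to be minutes of the day.
arrival_rows = [
    {"Station Code": "A01", "Station Name": "Station A", "Route ID": 1,
     "Arrival Times": [420, 450, 480]},
    {"Station Code": "B07", "Station Name": "Station B", "Route ID": 1,
     "Arrival Times": [432, 462, 492]},
    {"Station Code": "C12", "Station Name": "Station C", "Route ID": 2,
     "Arrival Times": [425, 485]},
]


def unknown_neighbours(relationships):
    """Return neighbour codes that do not appear as keys of the dictionary."""
    known = set(relationships)
    return sorted({n for ns in relationships.values() for n in ns if n not in known})


def unsorted_rows(rows):
    """Return (station code, route ID) pairs whose arrival times are not ascending."""
    return [(r["Station Code"], r["Route ID"]) for r in rows
            if r["Arrival Times"] != sorted(r["Arrival Times"])]


print(unknown_neighbours(intersection_relationships))
print(unsorted_rows(arrival_rows))
```

Running such checks on real data before the optimization would catch stations that are listed
as neighbors but never defined, as well as timetables that are out of order.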
c. Mapping
Mapping could not be done as intended, because a map of synthetically generated data has
little meaning; real-world data would be more suitable. We also had technical and economic
difficulties: the paid mapping options were unavailable to us, and the free version did not work
as intended. The only mapping feature that was available to us for free was the Python library
Plotly, which enables drawing a line segment between a pair of coordinates. It provides a
visualization but does not meet the expectations of a navigation system. For future work, if
real-time data and traffic resources become accessible, a dynamic map would be preferable to a
static one. Obtaining such a server would already require overcoming the economic barrier, and
it would also resolve the technical barrier of insufficient data and resources. Hence, with some
financial support, obtaining a dynamic map would no longer be a barrier for the project. Such a
navigation system would allow the integration of live data such as addresses, transportation
network systems, road databases, the up-to-date status of transportation vehicles, and traffic
congestion. With the help of such a mapping system, the project would provide a better and more
realistic visualization as well as a user-friendly interface.
d. Implementing user preferences
We use the A* algorithm to minimize travel time based on vehicles’ arrival times at
stations, as this is the most common objective in route planning applications. Alternatively,
other objectives can be used, such as walking as little as possible, using the smallest number
of public transportation vehicles, or minimizing the cost of travel.
The A* algorithm allows the definition of the objective function to be changed. This was
the initial reason we wanted to use A* instead of Dijkstra’s algorithm and K-Shortest Path. In
each step, the cost is calculated with the help of an additional function. We have updated this
function to calculate travel times rather than distances. However, for this application to be
complete, helper functions that calculate the other definitions of cost should be integrated.
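One possible way to integrate such helper functions is a single weighted step cost, sketched
below. This is not our implementation: the Leg and Preferences classes, their fields, and the
weights are hypothetical and only illustrate how the objectives listed above could be combined.

```python
from dataclasses import dataclass


@dataclass
class Leg:
    """One step between two stations; all fields are illustrative assumptions."""
    minutes: float          # waiting time plus in-vehicle time
    walking_metres: float   # distance walked to board the vehicle
    fare: float             # price paid for this leg
    is_transfer: bool       # True if the user changes vehicles on this leg


@dataclass
class Preferences:
    """Relative weights of the objectives; a weight of 0 ignores that objective."""
    time: float = 1.0
    walking: float = 0.0
    fare: float = 0.0
    transfers: float = 0.0


def step_cost(leg, prefs):
    """Weighted sum of the objectives for a single leg."""
    return (prefs.time * leg.minutes
            + prefs.walking * leg.walking_metres
            + prefs.fare * leg.fare
            + prefs.transfers * (1 if leg.is_transfer else 0))


leg = Leg(minutes=12, walking_metres=200, fare=7.5, is_transfer=True)
fastest = step_cost(leg, Preferences())                              # travel time only
fewest_changes = step_cost(leg, Preferences(time=1.0, transfers=10.0))
print(fastest, fewest_changes)
```

If the step cost is changed in this way, the heuristic used by A* would have to be expressed
in the same units for the search to keep returning optimal routes.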
e. Front-end of the application
The algorithm may be hard to understand for people with little knowledge of this area.
Therefore, a simple front-end should be created where users can enter their preferences, current
location, and target location as input. Since the front-end will be the first thing that the
user sees, it should be simple and easy to understand.
Input data can be received by specifying the current place and the destination. The
users’ selections could be supported by additional visual materials such as maps. When
developing the front-end, tasks such as entering the data to be used in the application, placing
content, and making and applying selections in line with user requests should be considered.
Current route optimization applications are good examples of an easy-to-use front-end,
combining a large map with clear input options for the user.
f. Updating the algorithm for accepting any location
In the current version of the algorithm, users can choose their start and target
stations only among the intersection stations in the network. Intersection stations are the
stations at which the user can change vehicles. The A* optimization runs between these nodes;
therefore, supporting them was the most important step.
To make the application more user-friendly and provide a service where users can plan to
go from anywhere to anywhere, a location discovery system is needed. If the user wants to start
the journey from their current position, its coordinates should be taken and the closest station
should be determined. It is important to integrate this feature, as intersection stations make
up a very small proportion of all stations, and users should be able to see and use the other
stations too.
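Determining the closest station could be sketched with the haversine great-circle distance, as
below. The function names, station codes, and coordinates are made up for illustration, and
straight-line distance is only an approximation of the actual walking distance.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (latitude, longitude) points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def closest_station(user_lat, user_lon, stations):
    """Return the code of the station nearest to the user's coordinates."""
    return min(stations,
               key=lambda code: haversine_km(user_lat, user_lon, *stations[code]))


# Hypothetical station codes mapped to (latitude, longitude).
stations = {
    "S1": (41.0082, 28.9784),
    "S2": (41.0370, 28.9850),
    "S3": (40.9909, 29.0303),
}
nearest = closest_station(41.0100, 28.9800, stations)
print(nearest)
```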
The A* algorithm also needs an update once the closest station to the user has been found.
The optimization will still run between intersection stations; however, the costs of going from
the closest station to the first intersection station and from the last intersection station to
the target location should be included. The optimal route between intersection stations will
probably not change, but the resulting optimal travel time will.
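The addition of these two costs can be sketched as follows. All names and numbers are
hypothetical; in particular, core_time stands in for the travel time that the A* search would
return between two intersection stations.

```python
def best_entry_exit(entries, exits, core_time):
    """Pick the entry and exit intersection stations with the smallest total time.

    entries:   intersection station -> minutes from the user's closest station
    exits:     intersection station -> minutes to the target location
    core_time: function (entry, exit) -> minutes between the two intersection stations
    """
    best = None
    for entry, access in entries.items():
        for exit_station, egress in exits.items():
            total = access + core_time(entry, exit_station) + egress
            if best is None or total < best[0]:
                best = (total, entry, exit_station)
    return best


# Made-up travel times between intersection stations, in minutes.
core_table = {("I1", "I3"): 30, ("I1", "I4"): 25, ("I2", "I3"): 20, ("I2", "I4"): 28}
entries = {"I1": 4, "I2": 9}
exits = {"I3": 6, "I4": 3}

result = best_entry_exit(entries, exits, lambda a, b: core_table[(a, b)])
print(result)
```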
g. Pricing Optimization
Public transportation is generally preferred because it is more affordable than other
transportation options. The public transportation vehicles used along the resulting optimal
route may be priced differently. To help the user make an economical decision, an integration
that can determine the price of each vehicle on each generated route could be added.
However, the fact that our data is synthetic rather than real limits our progress. For the
same reason, including a ticketing system is not feasible at this stage, as our model cannot
provide the user with a transportation cost for the destination they want to reach. If real-life
data can be found along with its pricing, an implementation in which prices are optimized and
travel costs are thus minimized may be achieved.
Implementing all of this future work would improve our project and make it more
accessible to the end user. Our algorithm and synthetic data provide a very good basis for
future work, but may not be enough to be marketed for commercial use.
9. APPENDIX
Appendix A: A* Algorithm
from copy import deepcopy

# The names below are used as a dictionary of station code -> neighbouring station
# codes (IntersectionRelationships) and as a pandas data frame (allRoutes).
import IntersectionRelationships
import allRoutes


def calculate_cost(nodeA, nodeB, currentTime):
    # 0.1 marks "no suitable time found yet"
    finalDepartureTime = 0.1
    finalArrivalTime = 0.1
    waitTime = 0
    travelTime = 0
    selectedRoute = -1

    # Select the first route through nodeA that also contains nodeB
    infoOfNodeA = allRoutes.loc[allRoutes["Station Code"] == nodeA]
    routesOfNodeA = infoOfNodeA["Route ID"].tolist()
    for routeId in routesOfNodeA:
        if nodeB in allRoutes.loc[allRoutes["Route ID"] == routeId]["Station Code"].tolist():
            selectedRoute = routeId
            break

    # Arrival times at nodeB on the selected route, or a very large value if there are none
    OldarrivalTimes = allRoutes.loc[allRoutes["Route ID"] == selectedRoute][["Station Code", "Arrival Times"]]
    if len(OldarrivalTimes.loc[OldarrivalTimes["Station Code"] == nodeB]["Arrival Times"]) > 0:
        arrivalTimes = OldarrivalTimes.loc[OldarrivalTimes["Station Code"] == nodeB]["Arrival Times"].item()
    else:
        arrivalTimes = [999*999]

    # First listed time at or after currentTime
    for i in range(len(arrivalTimes)):
        departureTime = arrivalTimes[i]
        if currentTime <= departureTime:
            finalDepartureTime = departureTime
            waitTime = finalDepartureTime - currentTime
            break
    # Nothing left today: wait for the first time of the next day
    # (24*60 treats times as minutes of the day)
    if finalDepartureTime == 0.1:
        finalDepartureTime = departureTime = arrivalTimes[0]
        waitTime = (24*60 - currentTime) + finalDepartureTime

    # First listed time at or after the selected departure
    for j in range(len(arrivalTimes)):
        arrivalTime = arrivalTimes[j]
        if finalDepartureTime <= arrivalTime:
            finalArrivalTime = arrivalTime
            travelTime = finalArrivalTime - finalDepartureTime
            break
    # No arrival found: penalize with a very large travel time
    if finalArrivalTime == 0.1:
        travelTime = 48*9999
    return waitTime + travelTime


def find_nearest_frontier_node_AStar(routes, goal, currentTime):
    # Path cost so far plus calculate_cost from the node to the goal, used as the heuristic
    path_costs = {node: routes[node][1] + calculate_cost(node, goal, currentTime) for node in routes}
    return [node for node, path_cost in sorted(path_costs.items(), key=lambda x: x[1])][0]


def goaltest(location, goal):
    return location == goal


def shortest_path(start, goal, currentTime):
    print("shortest path called")
    neighbours = IntersectionRelationships
    explored = set()
    # frontier maps each open node to (route taken so far, cost of that route)
    frontier = {start: ([start], 0)}
    count = 1
    while len(frontier) > 0:
        current_node = find_nearest_frontier_node_AStar(frontier, goal, currentTime)
        print(count, ":", current_node)
        count += 1
        current_route, path_cost = frontier[current_node]
        if goaltest(current_node, goal):
            break
        frontier.pop(current_node)
        explored.add(current_node)
        for neighbour in neighbours[current_node]:
            new_route = deepcopy(current_route)
            new_route.append(neighbour)
            step_cost = calculate_cost(current_node, neighbour, currentTime)
            new_path_cost = path_cost + step_cost
            if neighbour not in explored:
                # Add a new node, or keep the cheaper of two routes to a known one
                if neighbour not in frontier:
                    frontier[neighbour] = (new_route, new_path_cost)
                else:
                    existing_cost = frontier[neighbour][1]
                    if new_path_cost < existing_cost:
                        frontier[neighbour] = (new_route, new_path_cost)
    print('Minimum time to reach goal using A* algorithm: {:.2f}'.format(frontier[goal][1]))
    return current_route
10. REFERENCES
Dere, E., & Durdu, A. (2018, November 13). Usage of the A* Algorithm to Find the Shortest
Path in Transportation Systems.