Uploaded by Harish Harlani

Programming Project README

Vikas Harlani
This README serves as a user guide for running the project and describes the design
decisions made.
Executing Project
There is only one .cpp file named main.cpp that encompasses all the code required.
Compilation
For compilation, GCC is used.
The C++ standard must be set to C++17. This is required because the code uses the
<filesystem> library, which is part of the standard library from C++17 onward but not of
older versions.
Compilation can be done using the following command: g++ -std=c++17 main.cpp -o main
Figure 1 below shows an example of the compilation.
Figure 1: Successful compilation of code
Execution
Once the code has been successfully compiled, the executable can be run by invoking it
from a terminal/command prompt. Figure 2 below shows an example.
Execution Flow & Commands
This section discusses the execution flow of the application and the various commands.
Initial launch
When the program starts, it prompts you for the path to the directory that contains the
persistent orderbook snapshot data. If this is your first time running the code, you will not
have any orderbook snapshot data yet. In this case, specify the path where you would like
the data to be saved. If the specified path is not currently a directory (folder), a directory
will be created.
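The directory handling described above can be sketched with <filesystem> as follows. This is a minimal illustration, not the code in main.cpp: the helper name ensure_snapshot_dir is hypothetical, and throwing when the path names an existing regular file is an assumption about how that case might be handled.

```cpp
#include <filesystem>
#include <stdexcept>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: returns the snapshot directory, creating it if needed.
// Relative paths are resolved against the current working directory.
fs::path ensure_snapshot_dir(const std::string& user_input) {
    fs::path dir(user_input);
    // Assumption: a path that exists but is not a directory is treated as an error.
    if (fs::exists(dir) && !fs::is_directory(dir)) {
        throw std::runtime_error("path exists but is not a directory: " + dir.string());
    }
    // Creates the directory and any missing parents; does nothing if it already exists.
    fs::create_directories(dir);
    return dir;
}
```

Calling ensure_snapshot_dir("Snapshot_data") a second time is harmless, since create_directories leaves an existing directory untouched.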
You can specify partial paths, relative to the current working directory.
Figure 3 below shows an example of specifying the path.
Figure 3: Specifying relative path to Snapshot_data directory
Main workflow
After specifying the path to the snapshot data, the workflow of the program is to allow the
user to input one of three commands: read, query or exit. After executing a command, it will
return to this prompt, until exit is called.
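The prompt loop described above might look roughly like the sketch below. The names Command, parse_command and run_prompt are hypothetical, the prompt text is invented for illustration, and the bracketed output lines merely stand in for the real read and query workflows.

```cpp
#include <iostream>
#include <string>

enum class Command { Read, Query, Exit, Unknown };

// Maps the text typed at the main prompt to a command.
Command parse_command(const std::string& input) {
    if (input == "read") return Command::Read;
    if (input == "query") return Command::Query;
    if (input == "exit") return Command::Exit;
    return Command::Unknown;
}

// Keeps prompting until "exit" is entered or the input stream ends.
void run_prompt(std::istream& in, std::ostream& out) {
    std::string line;
    while (out << "Enter command (read, query, exit): " && std::getline(in, line)) {
        switch (parse_command(line)) {
            case Command::Read:    out << "[read workflow would run here]\n"; break;
            case Command::Query:   out << "[query workflow would run here]\n"; break;
            case Command::Exit:    return;
            case Command::Unknown: out << "Unknown command\n"; break;
        }
    }
}
```

Passing std::cin and std::cout to run_prompt gives the interactive behaviour; taking the streams as parameters also makes the loop easy to drive from a test.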
Read
The read command is for reading in new order based market data. The assumption made
here is that you will be reading in data for a particular symbol and a .log file will be supplied
with the data.
After specifying the read command, the program will prompt you for the symbol you are
reading for. Figure 4 below shows an example.
Figure 4: Read Command
Enter the symbol you are reading data for. As an example, I will use the
symbol SCS and the SCS.log file. After you supply the symbol, the program will
prompt you for the filepath to read data from. Once again, relative paths are accepted.
Figure 5 below shows an example of the execution.
Figure 5: Read Command Execution
For each symbol, a separate snapshot file is created. If a snapshot file already exists
for a symbol that we are supplying new data for, the read command reads that file and
updates the snapshot data using the new order based data.
Query
The query command queries the stored snapshots according to several parameters.
After specifying the query command, the program will prompt you for the symbols you wish
to query for. Figure 6 below shows an example.
Figure 6: Query Command Symbol Prompt
You can specify as many symbols as you wish to query for, each separated by
whitespace.
After specifying the symbols, the program prompts you to select the fields you wish to
output. You can either choose to specify ‘all’ or to select the fields, each separated by a
whitespace.
The possible fields are: symbol, epoch, ask1p, ask1q, ask2p, ask2q, ask3p, ask3q, ask4p,
ask4q, ask5p, ask5q, bid1p, bid1q, bid2p, bid2q, bid3p, bid3q, bid4p, bid4q, bid5p, bid5q,
lastTradePrice, lastTradeQuantity
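The symbol and field prompts both take whitespace-separated input, which could be parsed along the lines of the sketch below. The field names come from the list above; the function names all_fields, split_ws and parse_fields are hypothetical, and rejecting unknown field names with an exception is an assumption rather than the program's documented behaviour.

```cpp
#include <algorithm>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// The field names listed in this README, in the same order.
const std::vector<std::string>& all_fields() {
    static const std::vector<std::string> fields = {
        "symbol", "epoch",
        "ask1p", "ask1q", "ask2p", "ask2q", "ask3p", "ask3q", "ask4p", "ask4q", "ask5p", "ask5q",
        "bid1p", "bid1q", "bid2p", "bid2q", "bid3p", "bid3q", "bid4p", "bid4q", "bid5p", "bid5q",
        "lastTradePrice", "lastTradeQuantity"};
    return fields;
}

// Splits a line on runs of whitespace; also usable for the symbol prompt.
std::vector<std::string> split_ws(const std::string& line) {
    std::istringstream iss(line);
    std::vector<std::string> tokens;
    std::string token;
    while (iss >> token) tokens.push_back(token);
    return tokens;
}

// Returns the selected fields: every field for "all", otherwise the named ones.
// Assumption: an unknown field name is reported by throwing std::invalid_argument.
std::vector<std::string> parse_fields(const std::string& line) {
    const std::vector<std::string> tokens = split_ws(line);
    const std::vector<std::string>& known = all_fields();
    if (tokens.size() == 1 && tokens[0] == "all") return known;
    for (const std::string& t : tokens) {
        if (std::find(known.begin(), known.end(), t) == known.end()) {
            throw std::invalid_argument("unknown field: " + t);
        }
    }
    return tokens;
}
```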
The final prompt before executing the query asks for the time range of your query. Specify
two epoch values separated by whitespace. The first is the starting epoch and the second is
the ending epoch. The query returns every snapshot whose epoch falls within this range,
inclusive. The starting and ending epoch values can be the same. For either value, you may
also specify ‘earliest’ or ‘latest’ instead. If you select earliest, the epoch value is the
minimum value present in the snapshot data for that symbol. If you select latest, the epoch
value is the maximum value present in the snapshot data for that symbol.
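The time range rules above can be sketched as follows. This is an illustration only: resolve_epoch and epochs_in_range are hypothetical names, epochs are assumed to fit in a 64-bit integer, and the snapshot data is reduced to a plain list of epoch values for one symbol.

```cpp
#include <algorithm>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Resolves one bound of the range: "earliest" is the minimum stored epoch,
// "latest" is the maximum, and anything else is parsed as a number.
std::int64_t resolve_epoch(const std::string& token, const std::vector<std::int64_t>& epochs) {
    if (token == "earliest" || token == "latest") {
        if (epochs.empty()) throw std::runtime_error("no snapshot data for this symbol");
        auto [lo, hi] = std::minmax_element(epochs.begin(), epochs.end());
        return token == "earliest" ? *lo : *hi;
    }
    return std::stoll(token);  // throws std::invalid_argument if not numeric
}

// Returns the stored epochs that lie in [start, end], both ends inclusive.
std::vector<std::int64_t> epochs_in_range(const std::vector<std::int64_t>& epochs,
                                          const std::string& start, const std::string& end) {
    const std::int64_t lo = resolve_epoch(start, epochs);
    const std::int64_t hi = resolve_epoch(end, epochs);
    std::vector<std::int64_t> selected;
    for (std::int64_t e : epochs) {
        if (e >= lo && e <= hi) selected.push_back(e);
    }
    return selected;
}
```

With both bounds set to earliest, as in the Figure 7 example, only the snapshot with the smallest epoch is selected.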
Figure 7 below shows an example execution for Symbol SCH, all fields and starting and
ending epoch being set to earliest.
Figure 7: Query Command Execution
Exit
The final command is exit. This allows you to exit from the program, and can be used while
specifying the parameters for the other commands too.
Testing
For testing, I have added two commented-out function calls in the main() function. Figure 8
below shows a screenshot.
Figure 8: Testing functions
These functions allow you to test the read and query commands without having to go
through the normal workflow. If I had more time, I would have liked to add more automated
testing and to test the functionality more thoroughly.
Design Decisions
There were quite a few design decisions made during this project.
Storing snapshot data
The first decision was to calculate the snapshot at each time epoch when reading in new
order based data and to store these snapshots in persistent storage. This was done to allow
for much more efficient querying, as all the snapshots would be ready to be queried.
Following on this decision, the snapshot data for each symbol is stored as a separate .CSV
file. Each symbol is stored separately to allow the program to easily identify which file
contains the snapshots required. CSV files were chosen as they have great readability.
Calculating snapshot data
When calculating the snapshots, the order based data is first imported as a vector of
LogEntry, where the LogEntry struct is as follows:
Each LogEntry is basically a row of the .log file provided as input data. The vector of LogEntry
is then sorted according to the following:
1. Sort by ascending order of epoch
2. If the epochs are the same, then order_side == “BUY” comes before order_side == “SELL”
3. If order_side are both “BUY”, then sort by ascending order of price
4. If order_side are both “SELL”, then sort by descending order of price
5. If price is the same, then order_category == “NEW” comes before others
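The five sorting rules above can be expressed as a single comparator, sketched below. The LogEntry layout shown here is hypothetical and contains only the fields the rules mention; the real struct in main.cpp may differ. The sketch also assumes order_side is always either "BUY" or "SELL", and uses std::stable_sort so that entries the rules do not distinguish keep their order from the .log file.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical layout: only the fields used by the sorting rules.
struct LogEntry {
    long long epoch;
    std::string order_side;      // assumed to be "BUY" or "SELL"
    std::string order_category;  // "NEW" or any other category
    double price;
};

// Returns true if a should come before b under rules 1 to 5.
bool log_entry_less(const LogEntry& a, const LogEntry& b) {
    if (a.epoch != b.epoch) return a.epoch < b.epoch;         // rule 1
    const bool a_buy = a.order_side == "BUY";
    const bool b_buy = b.order_side == "BUY";
    if (a_buy != b_buy) return a_buy;                         // rule 2
    if (a.price != b.price) {
        return a_buy ? a.price < b.price : a.price > b.price; // rules 3 and 4
    }
    const bool a_new = a.order_category == "NEW";
    const bool b_new = b.order_category == "NEW";
    return a_new && !b_new;                                   // rule 5
}

void sort_log_entries(std::vector<LogEntry>& entries) {
    std::stable_sort(entries.begin(), entries.end(), log_entry_less);
}
```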
This sorting is done because my calculations have a flaw: I do not keep track of the
6th, 7th, 8th… highest bid and ask prices. So when cancelled or trade orders remove one of
the top 5 bid or ask prices, the 5th bid or ask price is set back to null. By sorting this
way, I aim to minimize such situations, as the highest BUY orders and lowest SELL orders
come last. This means that earlier 5th bid or ask prices that were removed are less likely
to be present in the final standings of the epoch.
For my logic of calculations, I followed the order book building.pdf guide provided. For the
bid price, the highest price is 1st, while for ask price, the lowest price is 1st.
Earliest and Latest
I added the earliest and latest options when selecting the time range as I felt the epoch
values were quite difficult to type out. If I just wanted the latest snapshot, this gave me
an easy way to obtain that information.
Back function
When inside the ‘read’ or ‘query’ commands, there are times when you might make a
mistake. In such cases, rather than quitting the program, you can type back to return to the
main command prompt.
Difficulties and Improvements
One of the biggest difficulties came from the reading in of stored snapshot data. Initially I
had decided to store the snapshot data as a binary file instead of CSV file. This was because I
read that binary files would store the data more efficiently. However, after a few hours spent
trying to debug and read binary files, I gave up and switched my approach to using CSV files.
Even using the CSV file, I encountered errors such as the epochs not being read correctly
and all the bid and ask prices being read as NA. In the end, I managed to overcome these
errors using the getline function with a comma delimiter. However, the resulting code is not
very elegant. This is something I would fix if I had more time.
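For reference, splitting one CSV row with getline and a comma delimiter can be kept fairly compact. The sketch below is not the code in main.cpp: split_csv_line and parse_price are hypothetical names, and treating an empty cell or the text NA as a missing price is an assumption about how null values are written to the snapshot file.

```cpp
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Splits one CSV row into cells using getline with ',' as the delimiter.
std::vector<std::string> split_csv_line(std::string line) {
    // Drop a trailing carriage return left over from Windows line endings.
    if (!line.empty() && line.back() == '\r') line.pop_back();
    std::vector<std::string> cells;
    std::istringstream iss(line);
    std::string cell;
    while (std::getline(iss, cell, ',')) cells.push_back(cell);
    // getline does not report an empty cell after a trailing comma, so restore it.
    if (!line.empty() && line.back() == ',') cells.emplace_back();
    return cells;
}

// Assumption: a missing price is stored as an empty cell or as "NA".
std::optional<double> parse_price(const std::string& cell) {
    if (cell.empty() || cell == "NA") return std::nullopt;
    return std::stod(cell);
}
```

Keeping the cell splitting separate from the numeric conversion makes it easier to see which of the two steps is responsible when a value is read incorrectly.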
Another improvement I would make would be to fix the logic of my calculations to deal with
the flaw of not keeping track of the 6th, 7th, 8th… highest bid and ask prices.
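One possible way to address this flaw, sketched below under several assumptions, is to keep every price level in an ordered map and derive the top 5 only when a snapshot is taken. Nothing here is taken from main.cpp: Book, apply_delta and top_levels are hypothetical names, prices are used directly as double keys, and each order is assumed to have been reduced to a signed quantity change at a price.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <utility>
#include <vector>

using Level = std::pair<double, long long>;  // price, total quantity

struct Book {
    std::map<double, long long, std::greater<double>> bids;  // highest price first
    std::map<double, long long> asks;                        // lowest price first
};

// Adds (or, with a negative delta, removes) quantity at a price level.
// A level whose quantity drops to zero or below is erased.
template <class Side>
void apply_delta(Side& side, double price, long long qty_delta) {
    long long& qty = side[price];
    qty += qty_delta;
    if (qty <= 0) side.erase(price);
}

// Returns up to n best levels; deeper levels move up automatically
// when a better level has been removed.
template <class Side>
std::vector<Level> top_levels(const Side& side, std::size_t n = 5) {
    std::vector<Level> out;
    for (auto it = side.begin(); it != side.end() && out.size() < n; ++it) {
        out.emplace_back(it->first, it->second);
    }
    return out;
}
```

Because the 6th and deeper levels stay in the map, removing one of the top 5 no longer leaves the 5th slot null as long as a deeper level exists.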
I would also like to add more automated testing and test the functionality of my program
better.
Finally, I would improve the structure of the code overall, as it is currently just one
main.cpp file with no real structure. I would also add more detailed comments that explain
the functionality better.