Programming Project README
Vikas Harlani

This README is a user guide for running the project and also documents the design decisions made.

Executing Project

All the code is contained in a single .cpp file, main.cpp.

Compilation

The code is compiled with GCC. The C++ standard must be set to C++17, because the code uses the <filesystem> library, which is part of the standard library from C++17 onward and is not available in earlier standards. Compile with the following command:

    g++ -std=c++17 main.cpp -o main

Figure 1 below shows an example of the compilation.

Figure 1: Successful compilation of code

Execution

Once the code has compiled successfully, run the executable from a terminal or command prompt. Figure 2 below shows an example.

Execution Flow & Commands

This section describes the execution flow of the application and its commands.

Initial launch

When the program starts, it prompts you for the path to the directory that holds the persistent orderbook snapshot data. If this is your first time running the code, you will not have any snapshot data yet. In that case, specify the path where you would like the data to be saved. If the specified path is not an existing directory (folder), the directory will be created. Paths may be given relative to the current working directory. Figure 3 below shows an example of specifying the path.

Figure 3: Specifying relative path to Snapshot_data directory

Main workflow

After you specify the path to the snapshot data, the program prompts for one of three commands: read, query or exit. After a command finishes, the program returns to this prompt, until exit is entered.

Read

The read command reads in new order-based market data.
The assumption is that you will be reading data for a single symbol and that the data is supplied as a .log file. After you enter the read command, the program prompts you for the symbol you are reading data for. Figure 4 below shows an example.

Figure 4: Read Command

Enter the symbol you are reading data for. As an example, I will use the symbol SCS and the SCS.log file. After you supply the symbol, the program prompts you for the path of the file to read from. Once again, relative paths are accepted. Figure 5 below shows an example of the execution.

Figure 5: Read Command Execution

A separate snapshot file is created for each symbol. If a snapshot file already exists for the symbol you are supplying new data for, the read command reads that file and updates the snapshot data using the new order-based data.

Query

The query command queries the stored snapshots using several parameters. After you enter the query command, the program prompts you for the symbols you wish to query. Figure 6 below shows an example.

Figure 6: Query Command Symbol Prompt

You can specify as many symbols as you wish, separated by whitespace. After the symbols, the program prompts you to select the fields to output. You can either enter ‘all’ or list the fields you want, separated by whitespace. The possible fields are:

symbol, epoch, ask1p, ask1q, ask2p, ask2q, ask3p, ask3q, ask4p, ask4q, ask5p, ask5q, bid1p, bid1q, bid2p, bid2q, bid3p, bid3q, bid4p, bid4q, bid5p, bid5q, lastTradePrice, lastTradeQuantity

The final prompt before the query runs asks for the time range. Enter two epoch values separated by whitespace: the first is the starting epoch and the second is the ending epoch. The query returns all snapshots whose epoch falls within this range, inclusive.
The starting and ending epoch values can be the same. For either value, you may instead enter ‘earliest’ or ‘latest’. If you enter earliest, the epoch value is the minimum epoch present in the snapshot data for that symbol. If you enter latest, it is the maximum epoch present in the snapshot data for that symbol. Figure 7 below shows an example execution for symbol SCH, with all fields selected and both the starting and ending epoch set to earliest.

Figure 7: Query Command Execution

Exit

The final command is exit. It exits the program, and it can also be entered while specifying the parameters of the other commands.

Testing

For testing, I have added two commented-out functions in the main() function. Figure 8 below shows a screenshot.

Figure 8: Testing functions

These functions let you test the read and query commands without going through the normal workflow. If I had more time, I would have liked to add more automated testing and to test the functionality more thoroughly.

Design Decisions

Several design decisions were made during this project.

Storing snapshot data

The first decision was to calculate the snapshot at each epoch when reading in new order-based data, and to store these snapshots in persistent storage. This makes querying much more efficient, because every snapshot is already computed when a query runs. Following on from this decision, the snapshot data for each symbol is stored in a separate .CSV file. Storing each symbol separately lets the program easily identify which file contains the required snapshots. CSV was chosen because the files are human-readable.

Calculating snapshot data

When calculating the snapshots, the order-based data is first imported as a vector of LogEntry, where the LogEntry struct is as follows:

Each LogEntry corresponds to one row of the .log file provided as input data. The vector of LogEntry is then sorted according to the following rules:

1. Sort by ascending order of epoch
2. If the epochs are the same, then order_side == “BUY” comes before order_side == “SELL”
3. If order_side is “BUY” for both, then sort by ascending order of price
4. If order_side is “SELL” for both, then sort by descending order of price
5. If the price is the same, then order_category == “NEW” comes before others

This sorting is done because I realized there is a flaw in my calculations: I do not keep track of the 6th, 7th, 8th… highest bid and ask prices. So when a cancelled or trade order removes one of the top 5 bid or ask prices, the 5th bid or ask price is set back to null. With this sorting I aim to minimize such situations, as the highest BUY orders and lowest SELL orders come last within each epoch. This means that previously removed 5th bid or ask prices are less likely to be present in the final standings of the epoch.

For the calculation logic, I followed the provided guide, order book building.pdf. For the bid price, the highest price is 1st, while for the ask price, the lowest price is 1st.

Earliest and Latest

I added the earliest and latest options for the time range because I found the epoch values quite difficult to type out. If I just want the latest snapshot, this gives me an easy way to obtain it.

Back function

Inside the ‘read’ or ‘query’ commands, there are times when you might make a mistake. In such cases, rather than quitting the program, you can type back to return to the main command prompt.

Difficulties and Improvements

One of the biggest difficulties was reading the stored snapshot data back in. Initially I had decided to store the snapshot data as a binary file rather than a CSV file, because I had read that binary files store data more efficiently. However, after a few hours spent trying to debug reading the binary files, I gave up and switched to CSV files.
Even with CSV files, I encountered errors such as the epochs not being read correctly and all the bid and ask prices being read as NA. In the end, I overcame these errors by using the getline function with a comma delimiter. However, the resulting code is not very elegant, and this is something I would fix if I had more time.

Another improvement would be to fix the logic of my calculations to deal with the flaw of not keeping track of the 6th, 7th, 8th… highest bid and ask prices. I would also like to add more automated testing and to test the functionality of my program more thoroughly. Finally, I would improve the overall structure of the code, as it is currently a single main.cpp file with no real structure. I would also add more detailed comments that explain the functionality better.
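
As an illustration of the getline approach with a comma delimiter mentioned in this section, a minimal sketch might look like the following. It is not the code in main.cpp: the helper names splitCsvLine and parsePrice are hypothetical, and treating an empty field or the text NA as a missing price is an assumption about how the snapshot files mark absent levels.

```cpp
#include <cstddef>
#include <exception>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split one CSV line on commas, keeping empty fields.
std::vector<std::string> splitCsvLine(const std::string& line) {
    std::vector<std::string> fields;
    std::istringstream stream(line);
    std::string field;
    while (std::getline(stream, field, ',')) {
        fields.push_back(field);
    }
    // std::getline does not yield a trailing empty field, so add it back.
    if (!line.empty() && line.back() == ',') {
        fields.emplace_back();
    }
    return fields;
}

// Hypothetical helper: parse a price field.
// Assumption: an empty field or "NA" means the level is missing.
std::optional<double> parsePrice(const std::string& field) {
    if (field.empty() || field == "NA") {
        return std::nullopt;
    }
    try {
        std::size_t used = 0;
        const double value = std::stod(field, &used);
        // Reject fields with trailing garbage such as "12x".
        if (used != field.size()) {
            return std::nullopt;
        }
        return value;
    } catch (const std::exception&) {
        return std::nullopt;  // not a number at all
    }
}
```

Returning std::optional makes a missing price explicit to the caller, which may help avoid the situation where every price silently ends up as NA.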
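
One possible way to address the flaw of not tracking the 6th, 7th, 8th… price levels is to keep every price level in an ordered map and read the top 5 from it on demand. The sketch below is an assumption-laden illustration, not the existing implementation: the names BidLevels, AskLevels, addQuantity, removeQuantity and topLevels are hypothetical, and how each order category in the .log file maps onto adding or removing quantity is left to the guide the project follows.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical types: total quantity at each price for one side of the book.
// Bids iterate from the highest price down, asks from the lowest price up.
// Prices are doubles here for brevity; integer ticks would compare more safely.
using BidLevels = std::map<double, long, std::greater<double>>;
using AskLevels = std::map<double, long>;

// Add quantity at a price level (for example, when a new order arrives).
template <typename Levels>
void addQuantity(Levels& levels, double price, long quantity) {
    levels[price] += quantity;
}

// Remove quantity at a price level (for example, on a cancel or a trade),
// erasing the level once nothing is left at that price.
template <typename Levels>
void removeQuantity(Levels& levels, double price, long quantity) {
    auto it = levels.find(price);
    if (it == levels.end()) {
        return;  // unknown level: nothing to remove
    }
    it->second -= quantity;
    if (it->second <= 0) {
        levels.erase(it);
    }
}

// Return up to `depth` best levels as (price, quantity) pairs.
template <typename Levels>
std::vector<std::pair<double, long>> topLevels(const Levels& levels,
                                               std::size_t depth = 5) {
    std::vector<std::pair<double, long>> result;
    for (auto it = levels.begin();
         it != levels.end() && result.size() < depth; ++it) {
        result.emplace_back(it->first, it->second);
    }
    return result;
}
```

Because all levels are retained, removing one of the top 5 prices simply lets the former 6th level move up, so the 5th slot would only be empty when the book really has fewer than five levels.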