
CSCI-20 Hashing and Direct File I/O Programming Lab #6, due 5/12/16

Please work in groups of two. Do not work alone. Create a direct access file containing a hash table for rapid access to records in a retail point-of-sale database, then model several transactions in the system.

The data will be two kinds of files: a text file with multiple lines to extract and load into the direct access file, and a series of sales-transaction text files, each selling a number of items from the data file. Your output will be an audit of the sales transactions. Also, send me the direct access file. You will write two programs: one to build the direct access file and one to process transactions against the direct access file.

Do not build the direct access file and process transactions in the same program. Build a hashing library for use by both clients.

Use a table of p = 97 buckets, with four slots per bucket, for a maximum of 388 records. Your hash function for key k is h(k) = k % p. The rehash step is q(k) = (k / p) % p, using integer division; if q(k) = 0, set q(k) to 1. Your collision strategy is to try (h(k) + q(k)) % p, then (h(k) - 2q(k)) % p, then (h(k) + 3q(k)) % p, and so on, alternating between adding and subtracting the quotient times the number of collisions so far. This will scatter the rehashes as much as possible. Be careful with negative values mod p: they can give a negative result, so add p if you get a result < 0. Using four slots per bucket means that collisions only really matter once a bucket already holds four records and a fifth record hashes to that location.
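The probe sequence described above can be sketched in C as follows. This is only one possible reading of the rules; the function names hash_h, hash_q, and probe are made up for illustration and are not part of the assignment.

```c
#include <assert.h>

enum { P = 97 };  /* number of buckets, from the assignment */

/* Home bucket: h(k) = k % p. */
int hash_h(long k) { return (int)(k % P); }

/* Step size: q(k) = (k / p) % p, with 0 replaced by 1. */
int hash_q(long k) {
    int q = (int)((k / P) % P);
    return q == 0 ? 1 : q;
}

/* Bucket to try after i collisions (i = 0 is the home bucket).
   Odd i adds i*q, even i subtracts i*q. */
int probe(long k, int i) {
    assert(k >= 0 && i >= 0);
    int step = i * hash_q(k);
    int b = (i % 2 == 1) ? hash_h(k) + step : hash_h(k) - step;
    b %= P;              /* in C the remainder may be negative */
    if (b < 0) b += P;   /* fold back into 0..P-1 */
    return b;
}
```

For key 10015290 this gives h = 40 and q = 42, so the buckets tried are 40, then 82, then 53 (from 40 - 84 = -44, plus 97), then 69.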

Your item data structure is as follows:

struct DATA
{
    char key[8];    // '1' followed by 7 digits for key (not a C-string)
    char desc[25];  // description (not a C-string)
    char tax;       // 'Y'/'N' taxable
    char price[8];  // 6.2 format with implied fixed point
};
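Because key and price are not C-strings, library calls such as atoi() cannot be used on them directly. A minimal sketch of a conversion helper follows. It assumes the price field is eight digits with the last two being cents (so 00005555 means 55.55); the names digits_to_long, record_key, and price_cents are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct DATA {
    char key[8];
    char desc[25];
    char tax;
    char price[8];
};

/* Convert n ASCII digits (no terminating '\0') to a number.
   Returns -1 if any character is not a digit. */
long digits_to_long(const char *field, size_t n) {
    assert(field != NULL);
    long value = 0;
    for (size_t i = 0; i < n; i++) {
        if (field[i] < '0' || field[i] > '9') return -1;
        value = value * 10 + (field[i] - '0');
    }
    return value;
}

/* Numeric key, suitable as input to the hash function. */
long record_key(const struct DATA *r) {
    return digits_to_long(r->key, sizeof r->key);
}

/* Price in whole cents (assumed reading of the 6.2 format). */
long price_cents(const struct DATA *r) {
    return digits_to_long(r->price, sizeof r->price);
}
```

Keeping prices in integer cents avoids floating-point rounding surprises when subtotals are accumulated.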

The data file comprises 291 data records in this format. You will read them and hash them into your direct access file, which will then be approximately 75% full (291 of 388 slots). You will not delete records from the file, so you don't need to worry about keeping a status (e.g., never-used, in-use, deleted) in the file.
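One way to lay out the direct access file is to store the buckets back to back, four records each, and compute byte offsets from the bucket and slot numbers. The sketch below makes two assumptions that are not in the assignment: the file starts out filled with zero bytes, and a slot counts as free when its key does not begin with '1' (every real key does). All function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { P = 97, SLOTS = 4 };

struct DATA { char key[8]; char desc[25]; char tax; char price[8]; };

/* Byte offset of a slot when buckets are stored back to back. */
long slot_offset(int bucket, int slot) {
    assert(bucket >= 0 && bucket < P && slot >= 0 && slot < SLOTS);
    return ((long)bucket * SLOTS + slot) * (long)sizeof(struct DATA);
}

/* Fill the file with P * SLOTS zeroed records (the assumed "free" marker). */
int create_table(FILE *f) {
    struct DATA empty;
    memset(&empty, 0, sizeof empty);
    rewind(f);
    for (int i = 0; i < P * SLOTS; i++)
        if (fwrite(&empty, sizeof empty, 1, f) != 1) return -1;
    return fflush(f);
}

/* Assumption: a slot is free if its key does not start with '1'. */
int slot_is_free(const struct DATA *r) { return r->key[0] != '1'; }

/* Write rec into the first free slot of the bucket.
   Returns the slot used, or -1 if the bucket is full or I/O fails. */
int store_in_bucket(FILE *f, int bucket, const struct DATA *rec) {
    struct DATA cur;
    for (int s = 0; s < SLOTS; s++) {
        if (fseek(f, slot_offset(bucket, s), SEEK_SET) != 0) return -1;
        if (fread(&cur, sizeof cur, 1, f) != 1) return -1;
        if (slot_is_free(&cur)) {
            /* reposition before switching from reading to writing */
            if (fseek(f, slot_offset(bucket, s), SEEK_SET) != 0) return -1;
            if (fwrite(rec, sizeof *rec, 1, f) != 1) return -1;
            return s;
        }
    }
    return -1;
}
```

When store_in_bucket() reports a full bucket, the build program would move on to the next bucket in the collision sequence and try again. The file should be opened in binary update mode (for example "w+b" when creating it).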

Each transaction file is a series of transactions (all cash sales), each consisting of one or more items (optionally with a quantity) on lines by themselves, followed by a total line. You will look up the items and print the item number, description, taxability, quantity (if not one), and extended sale price (sale price times quantity). Accumulate taxable and non-taxable subtotals, and when you encounter the total line, calculate tax at 9.25%, rounded to the nearest cent; then print a subtotal line (the sum of taxable and non-taxable subtotals), a tax line, and a total line, followed by a row of dashes to indicate cutting the receipt. Align the amounts so the decimal points are in the same column.
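The tax and alignment rules can be sketched with integer cents, as shown below. It assumes amounts are non-negative and that "nearest cent" means rounding half a cent upward; the function names and the column widths are illustrative choices only.

```c
#include <assert.h>
#include <stdio.h>

/* Tax at 9.25% on an amount in cents, rounded half up to the nearest cent. */
long tax_cents(long taxable_cents) {
    assert(taxable_cents >= 0);
    return (long)(((long long)taxable_cents * 925 + 5000) / 10000);
}

/* Format cents as dollars in a fixed width so the decimal points line up. */
void format_amount(long cents, char *buf, size_t n) {
    snprintf(buf, n, "%8ld.%02ld", cents / 100, cents % 100);
}

/* Print the subtotal, tax, and total lines plus the cut line. */
void print_totals(FILE *out, long taxable, long nontaxable) {
    char buf[32];
    long tax = tax_cents(taxable);
    format_amount(taxable + nontaxable, buf, sizeof buf);
    fprintf(out, "%-10s%s\n", "Subtotal", buf);
    format_amount(tax, buf, sizeof buf);
    fprintf(out, "%-10s%s\n", "Tax", buf);
    format_amount(taxable + nontaxable + tax, buf, sizeof buf);
    fprintf(out, "%-10s%s\n", "Total", buf);
    fputs("---------------------\n", out);
}
```

For example, a taxable subtotal of 55.55 (5555 cents) gives 5555 * 0.0925 = 513.8375 cents, which rounds to 514 cents, or 5.14 in tax.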

Items in the transaction file may not exist. In that case, print a message saying "item number xxxxxx not found" or something similar, and continue processing the transaction. In this case, do not change any subtotals.
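Since records are never deleted, a lookup can stop as soon as it meets a free slot: the build program would have stored the item there rather than probing further. The sketch below relies on that reasoning and on several assumptions not stated in the assignment: buckets are stored back to back, slots fill in order, and a free slot is one whose key does not start with '1'. The name find_item and the cap of 2 * P probes are illustrative choices.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { P = 97, SLOTS = 4 };

struct DATA { char key[8]; char desc[25]; char tax; char price[8]; };

/* Bucket to try after i collisions, following the assignment's formula. */
static int probe(long k, int i) {
    int q = (int)((k / P) % P);
    if (q == 0) q = 1;
    int b = (int)(k % P) + ((i % 2 == 1) ? i * q : -i * q);
    b %= P;
    return b < 0 ? b + P : b;
}

/* Look up an 8-digit key directly in the file.
   Returns 1 and fills *out if found, 0 if not found, -1 on I/O error. */
int find_item(FILE *f, const char key[8], struct DATA *out) {
    assert(f != NULL && key != NULL && out != NULL);
    long k = 0;
    for (int i = 0; i < 8; i++) {
        if (key[i] < '0' || key[i] > '9') return 0;  /* not a valid key */
        k = k * 10 + (key[i] - '0');
    }
    for (int i = 0; i < 2 * P; i++) {   /* 2*P probes visit every bucket */
        struct DATA slots[SLOTS];
        long offset = (long)probe(k, i) * SLOTS * (long)sizeof(struct DATA);
        if (fseek(f, offset, SEEK_SET) != 0) return -1;
        if (fread(slots, sizeof slots[0], SLOTS, f) != SLOTS) return -1;
        for (int s = 0; s < SLOTS; s++) {
            if (slots[s].key[0] != '1') return 0;   /* free slot: never stored */
            if (memcmp(slots[s].key, key, 8) == 0) {
                *out = slots[s];
                return 1;
            }
        }
    }
    return 0;
}
```

A return value of 0 is the cue to print the "not found" message and leave the subtotals unchanged.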

At end of file on the input, print the accumulated totals: the amounts of taxable and non-taxable sales, the amount of tax collected, and the grand total (the amount of cash collected).

Write this in two pieces, as two separate programs. The first builds the data file. This program can also output statistics about the hashing so you know which items had long collision chains and which didn't.

The second program then processes the transaction file(s). Test with my data and transaction file, and with at least 50 transactions of several items each of your own. You may use my items for all of your tests. Your transaction test data must include items with relatively long collision chains and items not on file. You must access the data file directly for every lookup; do NOT just load the entire file into a table and access the table. Send me your direct access (hash access) file, all source/header files, and the test files (your input and output).

Hints:

First, design and write the hashing algorithm using just the keys in a 2-D array (97x4) of 32-bit ints, initially all some illegal value like -1. After testing that, then (and only then) adapt the algorithm to the direct access disk file.
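The in-memory prototype suggested in the first hint might look like the sketch below. The names table_init and table_insert, the EMPTY constant, and the cap of 2 * P probes are illustrative choices, not requirements.

```c
#include <assert.h>
#include <stdint.h>

enum { P = 97, SLOTS = 4, EMPTY = -1 };

static int32_t table[P][SLOTS];

/* Mark every slot as unused. */
void table_init(void) {
    for (int b = 0; b < P; b++)
        for (int s = 0; s < SLOTS; s++)
            table[b][s] = EMPTY;
}

/* Bucket to try after i collisions, following the assignment's formula. */
static int probe(int32_t k, int i) {
    int q = (k / P) % P;
    if (q == 0) q = 1;
    int b = k % P + ((i % 2 == 1) ? i * q : -i * q);
    b %= P;
    return b < 0 ? b + P : b;
}

/* Insert a key. Returns the number of full buckets skipped
   (the collision count), or -1 if no free slot was found. */
int table_insert(int32_t k) {
    assert(k >= 0);
    for (int i = 0; i < 2 * P; i++) {
        int b = probe(k, i);
        for (int s = 0; s < SLOTS; s++) {
            if (table[b][s] == EMPTY) {
                table[b][s] = k;
                return i;
            }
        }
    }
    return -1;
}
```

The value returned by table_insert() doubles as the collision-chain length, which is handy for the hashing statistics. Five keys that share home bucket 40 (for example 10015290 plus multiples of 97) exercise the overflow path: the fifth should land in its first rehash bucket.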

You may want to dump debug information about the hashing values and collision chains to check your work. Remove all of the debug code before submitting the program.

For the items file, you can read the characters from the file into a buffer and then memcpy() them into the structs per my direct access file example program.
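A minimal sketch of the memcpy() approach follows. It assumes each line of the items file is fixed width (8 + 25 + 1 + 8 = 42 characters before the newline, with the description padded by spaces) and that the struct has no padding bytes, which is expected for a struct made only of char members. The name line_to_record is hypothetical and is not taken from the example program mentioned above.

```c
#include <assert.h>
#include <string.h>

struct DATA { char key[8]; char desc[25]; char tax; char price[8]; };

/* Copy one fixed-width text line into a record.
   Returns 0 on success, -1 if the line is too short. */
int line_to_record(const char *line, struct DATA *rec) {
    assert(sizeof(struct DATA) == 42);       /* no padding expected */
    size_t len = strcspn(line, "\r\n");      /* length without line ending */
    if (len < sizeof *rec) return -1;
    memcpy(rec, line, sizeof *rec);          /* fields land in declaration order */
    return 0;
}
```

The line itself can be read with fgets() into a buffer a little larger than 42 bytes, so there is room for the newline and the terminating '\0'.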

For programs in this class, they are quite small...

I have test data on my web site. The DFAItems.txt file looks something like this:

10015290Sales item description Y00005555

10028753Hashing is Fun! Y00003414

10010634This is the good stuff Y00009041

10000541Pipe wrench set Y00002262

10004227Bird cages Y00000914

10006469This is the good stuff Y00003678

10010731This is a test Y00001536

10006283Got data? N00009080

... with many more records

The transaction.txt file looks something like this:

10015290 item with no quantity (= 1)

10028753 2 item with qty of 2

T Total – end of transaction

10010634 start of next transaction

10000541

10006469 3

T

... with, again, many more transactions.
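Each transaction line can be classified with sscanf(), as in the sketch below. It assumes a total line starts with the letter T, that a missing quantity means 1, and that any text after the quantity can be ignored; the enum and the name parse_line are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

enum line_kind { LINE_ITEM, LINE_TOTAL, LINE_BAD };

/* Classify one transaction line. For item lines, store the key and
   the quantity (defaulting to 1 when none is given). */
enum line_kind parse_line(const char *line, long *key, int *qty) {
    assert(line != NULL && key != NULL && qty != NULL);
    char first;
    if (sscanf(line, " %c", &first) == 1 && first == 'T')
        return LINE_TOTAL;
    int fields = sscanf(line, "%ld %d", key, qty);
    if (fields < 1) return LINE_BAD;     /* blank or non-numeric line */
    if (fields == 1) *qty = 1;           /* no quantity given */
    if (*qty < 1) return LINE_BAD;
    return LINE_ITEM;
}
```

The processing loop would call this once per line read with fgets(), looking up the item on LINE_ITEM and printing the subtotal, tax, and total on LINE_TOTAL.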
