Design Document Lab 4 – Part 1

Key Observations

1) The number of command line arguments is fixed and known at run-time. Therefore the number of words that the program will be searching for in each line is fixed and known at run-time. Hence the program can use an array to store each of the search words.

2) For each search word, the program can use a linked list to keep track of the line numbers on which the search word occurs. The problem specification does not limit the size of the file or the number of lines on which a search word appears so it is inappropriate to use an array to hold the line numbers.

3) We are searching the lines in ascending order so the line numbers will be stored in ascending order in the linked lists. The problem states that we should not duplicate line numbers so when we locate a search word, we do not want to add a line number to the search word’s list if it is already there. Since the line numbers appear in ascending order on the search word’s linked list, the program only needs to check the last entry in the list to determine whether or not the line has already been added to the list. It is not necessary to search the entire list of line numbers. The problem description states that doubly-linked lists with sentinel nodes must be used, so it will be possible to access the last element in a list by following the previous pointer of the sentinel node.

4) There are two ways to examine each line for the search words and each way is equally good:

   a) for each of the words in the line, determine whether or not that word matches one of the search words. If so, append the line number to the end of the appropriate search word’s list, unless the line number is already in the list. If the problem had stated that the search words were unique, the program could immediately break out of the loop for this word in the line since none of the remaining search words would match it.
      However, since the problem allows for search words to be duplicated, we must continue the search.

   b) for each search word, scan through the line to determine if the search word is on the line. If the search word is found, append the line number to the end of the word’s list. We can immediately break out of the loop for this search word since there is no reason to search for duplicates.

Data Structure Diagrams

In a diagram it is OK to use either abstract data or concrete data, whichever you prefer. I would like to see a version of the array/linked list diagram that I drew in class yesterday.

Structs

The program needs two structs, one to hold the words that go in the array and one to represent the nodes in the doubly-linked list:

    /* One node per line number on which a search word occurs. */
    typedef struct _Node {
        int linenum;
        struct _Node *next;
        struct _Node *prev;
    } Node;

    /* One array entry per search word, with its list of line numbers. */
    typedef struct {
        char *word;
        Node *sentinel_node;
    } Word;

The word field in the Word struct is declared as a char * instead of a character array because the size of each search word is unknown.

Test Cases

There are three types of test cases: 1) normal data, 2) error checks, 3) boundary checks or unusual cases.

1) Normal Data: This is the easiest case and often requires only one or two files to check your program:
   a) An input file with a few lines of words and several lines with duplicated words

2) Error Checks: Errors can occur in either the command line arguments given to the program or in the input data:
   a) Improper number of command-line arguments
   b) Input file does not exist
   c) There is no obvious source of errors in the input data because the input is just ordinary words with no restriction on length, case, or composition of characters (e.g., the words are not restricted to letters only or numbers only).

3) Boundary checks/Unusual cases
   a) Empty input file
   b) Extremely large input file
   c) Duplicate search words on the command line
   d) Duplicate search words with different case (e.g., Nice, nice, NICE)
   e) Duplicate words with different case on the same line in the input data (e.g., That nice dog is very nice, very NICE).
   f) A search word does not appear in the input file
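
Sketch of the append rule from Key Observation 3

The following is one possible sketch, not a required solution, of how the Node and Word structs from the Structs section could support the "check only the last entry" rule. It assumes a circular list in which a single sentinel node links to the first and last entries, so that the sentinel's prev pointer is the last line number stored. The helper names list_create, list_append_unique, list_length, and list_destroy are hypothetical and do not come from the problem description.

```c
#include <assert.h>
#include <stdlib.h>

/* Structs as given in the Structs section. */
typedef struct _Node {
    int linenum;
    struct _Node *next;
    struct _Node *prev;
} Node;

typedef struct {
    char *word;
    Node *sentinel_node;
} Word;

/* Hypothetical helper: build an empty list. The sentinel points to
 * itself in both directions. Returns NULL if malloc fails. */
Node *list_create(void)
{
    Node *sentinel = malloc(sizeof *sentinel);
    if (sentinel == NULL)
        return NULL;
    sentinel->linenum = -1;      /* placeholder; the sentinel holds no data */
    sentinel->next = sentinel;
    sentinel->prev = sentinel;
    return sentinel;
}

/* Hypothetical helper: append linenum unless it is already the last
 * entry. Because lines are processed in ascending order, checking the
 * last entry is enough to detect a duplicate.
 * Returns 1 if a node was added, 0 if linenum was a duplicate,
 * and -1 if malloc failed. */
int list_append_unique(Node *sentinel, int linenum)
{
    assert(sentinel != NULL);
    Node *last = sentinel->prev;          /* last entry, or the sentinel if empty */
    if (last != sentinel && last->linenum == linenum)
        return 0;

    Node *node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    node->linenum = linenum;
    node->prev = last;
    node->next = sentinel;
    last->next = node;
    sentinel->prev = node;
    return 1;
}

/* Hypothetical helper: count the entries, excluding the sentinel. */
size_t list_length(const Node *sentinel)
{
    size_t count = 0;
    for (const Node *cur = sentinel->next; cur != sentinel; cur = cur->next)
        count++;
    return count;
}

/* Hypothetical helper: free every entry and then the sentinel. */
void list_destroy(Node *sentinel)
{
    if (sentinel == NULL)
        return;
    Node *cur = sentinel->next;
    while (cur != sentinel) {
        Node *next = cur->next;
        free(cur);
        cur = next;
    }
    free(sentinel);
}
```

With these helpers, a search word that occurs twice on line 3 and once on line 7 would trigger three calls to list_append_unique, but only two nodes are stored: the second call for line 3 finds 3 already at the end of the list and returns 0. The same check works for either approach in Key Observation 4, since both process the lines in ascending order.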