Beginners Program Web Page Builders and Verifiers

Martha J. Kosa
Tennessee Technological University
Box 5101
Cookeville, TN 38505
+1 (931) 372-3579
www.csc.tntech.edu/~mjkosa/iticse99

1. ABSTRACT

Many students "surf the Web" in their spare (or not so spare) time. They may see Web sites with deep hierarchical structures that were generated automatically. Some word processors can save their work in HTML, the language understood by browsers. The students may have their own HTML validated at a validation site. Although browsers, generators, and validators can be complex programs, beginning students can implement some of their features to reinforce their understanding of basic CS concepts. In this paper, we describe a set of programming assignments, with HTML as the unifying theme, for a data structures class. This set was used during the Spring 1998 semester. We also give additional ideas for HTML-related assignments in introductory classes.

1.1 Keywords

introductory classes, assignments, WWW

2. INTRODUCTION

Tim Berners-Lee started the Web phenomenon in 1991, and now the Web is ubiquitous. It has dramatically changed the way the world thinks about computers. It has precipitated a new, evolving paradigm to help faculty teach [2]. Students enjoy using Web browsers to investigate topics of academic and/or personal interest, hopefully with academics having top priority when necessary. They can access the Web at any time and from any place, provided that they have a viable connection to the Internet or have Web pages stored on their local machines.

Many students want to build their own Web pages, which are normally written in HTML (HyperText Markup Language). They can write their own HTML code, or they can use a program, such as Microsoft FrontPage, to generate the code. They may also use scanners or drawing programs to create pictures, which can be incorporated in their Web pages.
To ensure that their Web pages look reasonably consistent when viewed in different Web browsers, they may send their pages to a validation site, which checks their pages for standard HTML usage. These validators work like parsers, which the students also encounter when compiling their class programming assignments.

Writing pure HTML code is a form of programming; however, it is different from the programming that students do in their classes, whether the language they use is procedural, object-oriented, or functional. Mercuri, Herrmann, and Popyack [7] discuss a first programming course (actually, a "preprogramming" course) in which the students use HTML and JavaScript to build Web pages, one at a time. JavaScript is used as the vehicle for expressing such algorithmic concepts as decision, repetition, and function abstraction. Lim [5] describes an upper-division course that presents various Web development technologies (HTML, JavaScript, Java, CGI, and Web databases) in a breadth-first manner.

The main focus of our work is on the data structures course, using a traditional high-level programming language. We have the students write programs that verify existing Web pages or generate sets of new Web pages as output files. The output files then become input files to another program, namely, a Web browser. The output files are linked to each other via hyperlinks; thus, the students can navigate through them. This helps illustrate further that "running programs on other programs is an every-day occurrence" [1].

Verifying and generating Web pages are realistic ways to use the standard data structures that the students have learned in class. The Web can then serve as a unifying theme for the class. Fell and Proulx [4] discussed the analysis of Martian planetary images as a theme for the first programming course.
A well-chosen theme can serve to maintain student interest, even in a challenging course, because the students can see practical uses for what they are learning and can infer interconnections among seemingly unrelated areas.

Our paper is organized as follows. First, we describe a set of programming assignments, having the Web as a unifying theme, which were used in an offering of our data structures course during the Spring 1998 semester. Then we give some suggestions for additional assignments for a (possibly more advanced) data structures course. Finally, we close with some concluding remarks.

3. DATA STRUCTURES ASSIGNMENTS

In our data structures class, we cover the traditional stacks, queues, lists, binary search trees, and multiway B-trees in great detail. We discuss how they are implemented, including comparisons of alternative implementations. For programming assignments, the students do not have to rebuild what was discussed in class; we give them the necessary modules. Sometimes the students may be asked to implement a new data structure; this allows them to build on their understanding of the data structures presented in class. We now describe assignments that show applications of standard data structures to the Web. We believe that our approach to the data structures class is in general agreement with the views advocated in SIGCSE 1998's popular panel on the future of CS2 [6].

3.1 Stacks

This first assignment shows an application of stacks to the Web. It consists of two parts. The first part is as follows. The program will display how a given HTML file would look in some form of browser. The HTML file will consist of zero or more ordered and/or unordered lists. When this option is selected, the program will ask the user for two file names: an input file containing the HTML commands, and an output file where the formatted text will be stored.
The <OL> and </OL> tags are used to open and close ordered (numbered) lists, and the <UL> and </UL> tags are used to open and close unordered lists. If the input file is not valid HTML (i.e., a list is not terminated correctly, there are too many </OL> tags, or some tags do not match up properly), the program will give the user an error message indicating which line of the file caused the problem.

We now describe the second part. Given an indented input file, the program will produce the HTML file corresponding to the file. When this option is selected, the program will ask the user for two file names: an input file, and an output file where the HTML output will be stored. The first line of the input file contains a number indicating the number of spaces in an indentation unit.

The first part of the assignment gives the student an appreciation of the kind of work performed by browsers and compilers. The second part of the assignment shows the student what is involved in producing outlines and tables of contents. Both give realistic applications for stacks.

3.2 Queues

After discussing various uses and implementations of queues, we turn our attention to the priority queue. We discuss the operations that a priority queue should provide to its users and what the operations need in order to do their jobs. InitializePQ needs an integer indicating the maximum number of active priorities, and initializes the priority queue to be empty. EmptyPQ (respectively, FullPQ) returns a Boolean indicating whether the priority queue is empty (respectively, full). InsertPQ needs access to the item to be inserted in the priority queue, along with an integer indicating the priority of the item. If the priority queue is not empty, RemovePQ removes the highest-priority item and returns it. SizePQ returns the number of items in the priority queue. PrintPQ displays all items in the priority queue, from highest priority to lowest priority.
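The operations just listed can be sketched in Python as follows. This is only one possible design, not the module used in the course: it assumes one FIFO queue per priority level, treats a larger number as a higher priority, and adds a hypothetical capacity parameter so that the FullPQ check has something to test. The method names are illustrative stand-ins for the operation names in the text.

```python
from collections import deque


class PriorityQueue:
    """Priority queue built from one FIFO queue per priority level."""

    def __init__(self, max_priorities, capacity=100):
        # InitializePQ: priorities run from 1 to max_priorities.
        # capacity is an assumed overall item limit, used only by full().
        self.queues = [deque() for _ in range(max_priorities + 1)]
        self.capacity = capacity
        self.count = 0

    def empty(self):  # EmptyPQ
        return self.count == 0

    def full(self):  # FullPQ
        return self.count >= self.capacity

    def insert(self, item, priority):  # InsertPQ
        if self.full():
            raise OverflowError("priority queue is full")
        if not 1 <= priority < len(self.queues):
            raise ValueError("priority out of range")
        self.queues[priority].append(item)
        self.count += 1

    def remove(self):  # RemovePQ
        # Scan from the highest priority down; FIFO order within a level.
        for queue in reversed(self.queues):
            if queue:
                self.count -= 1
                return queue.popleft()
        raise IndexError("priority queue is empty")

    def size(self):  # SizePQ
        return self.count

    def items(self):  # PrintPQ (returns the items instead of printing them)
        return [item for queue in reversed(self.queues) for item in queue]
```

Keeping a separate queue for each priority means that items with equal priority leave in the order in which they arrived, at the cost of scanning the priority levels on each removal.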
We then discuss how standard queues can be used to implement a priority queue. The students are given a module implementing a standard queue, and they then use this module to implement the operations for a priority queue. Next, they need to test their priority queue. Here is the description of the test program that they were asked to write. Of course, it involves the Web in some way.

The demonstration program will use priority queues (the abstract data type just implemented) to analyze another kind of HTML file. The files that the program will analyze will have lines of the following form:

<H#> text </H#>

where the # denotes a positive integer between 1 and some known upper bound. The program will need to make sure that the numbers in the <H#> and </H#> tags are the same on a given line. The tags are not case-sensitive; i.e., <h2> and <H2> are equivalent. The number will indicate the priority of the item. In a Web browser, the height of the text would depend on the number; a bigger number would indicate taller text.

The program will ask the user for an input file name and an output file name. Then it will compute the average heading size for the file, the largest heading size, the smallest heading size, the number of headings of each size (from largest to smallest), and display the text, grouped by heading sizes from largest to smallest. The headings with the same heading size will be displayed in the order in which they were found in the file. All output will be written to the output file. The program is to use the priority queue operations to complete each of these tasks. This assignment shows an application of priority queues in grouping.

3.3 Lists

Stacks and queues are specialized lists, where additions and removals occur at only one end for the stack and at opposite ends for the queue. It is not always convenient to use stacks and queues for solving problems, so we discuss generalized lists, where additions and removals can occur at arbitrary places.
In class, we compare two alternative implementations, the contiguous and the linked, and use a module of the basic list operations to implement more operations. The students now have the tools they need to complete the next Web-related assignment, which is a variation of the assignment using stacks. The list items corresponding to each ordered list or unordered list must be placed in the output file in alphabetical order. The students can assume an upper limit on the number of different indentation levels, and there will be at most one list at each indentation level. This assignment could be extended to handle an arbitrary amount of nesting. Another variation would be to generate a new file whenever a new list is started, and to add a link to it in the higher-level output file.

This assignment gives the students a chance to implement a basic sorting algorithm, without having to know how the underlying list is implemented. They often see the necessity of ordering, such as in maintaining student records.

3.4 Binary Search Trees

The binary tree is typically the first example of a nonlinear data structure that students see. They study the traditional preorder, inorder, and postorder traversal techniques, yielding another application of recursion. They learn how to build binary search trees, which have the potential for faster searching times than the exhaustive search required in an unordered structure. After an introduction to binary search trees and traversals, the students can complete the following assignment.

The input to the program will be the name of a text file, with the following record format: a line consisting of the name of a word or place, followed by a line containing a description. The program will then construct a set of Web pages corresponding to the file. Each student will need to build a binary search tree from the entries to help in constructing the set of Web pages.
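As a rough sketch of the tree-building step, the following Python fragment reads alternating name and description lines into a binary search tree ordered by name. The Node class, the parent pointer, and the function names are illustrative assumptions rather than part of the original assignment, and Python is used here only for brevity.

```python
class Node:
    def __init__(self, name, description):
        self.name = name
        self.description = description
        self.left = None
        self.right = None
        self.parent = None  # kept so a generated page could link to its parent


def insert(root, name, description):
    """Insert a record into the tree ordered by name; return the root."""
    new = Node(name, description)
    if root is None:
        return new
    current = root
    while True:
        if name < current.name:
            if current.left is None:
                current.left = new
                break
            current = current.left
        else:
            if current.right is None:
                current.right = new
                break
            current = current.right
    new.parent = current
    return root


def build_tree(lines):
    """Read alternating name/description lines into a binary search tree."""
    root = None
    for i in range(0, len(lines) - 1, 2):
        root = insert(root, lines[i].strip(), lines[i + 1].strip())
    return root


def inorder(node):
    """Yield names in sorted order using a recursive inorder traversal."""
    if node is not None:
        yield from inorder(node.left)
        yield node.name
        yield from inorder(node.right)
```

A page generator could walk the same tree recursively and use each node's parent, left, and right fields to decide which links to write.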
The Web pages will have the following format:

- name of word or place, centered and in bold text
- the description in a paragraph
- a picture corresponding to the word
- a link to the parent file, if present
- links to the children files, if present

Each student will also need to create a root Web page, which has a link to the Web page corresponding to the item stored in the root of the tree.

Although we generated test cases for the students during the Spring 1998 semester, students could participate in generating test cases for the program because they can use scanners or drawing programs to produce picture files. Perhaps they could produce a class directory. This assignment further illustrates the power of recursion because a small recursive function can generate many Web pages. It also introduces the students to several new HTML tags.

3.5 Multiway Trees

After the students learn about binary trees, they study trees with higher branching factors if time permits. The B-tree, in which all leaves are at the same level, is a typical example. When the discussion of multiway trees is complete, the students can complete the following assignment.

The assignment is a variation of the assignment described in Section 3.3. The input to the program will be the name of an indented input file. The program is to produce a set of HTML files corresponding to the file. The number of files generated is equal to the number of lines in the file. The first line of the file contains a number indicating the number of spaces in an indentation unit. The program will also tell the user how many of the HTML files are "under construction". Each student will need to build a multiway tree from the entries in the file to help in constructing the set of Web pages. The name of the root Web page will be name.html, if name.txt is the name of the input file.
The name of each non-root Web page file will be topic.html, where topic corresponds to the textual information, with all spaces removed, on the relevant line of the input file. The Web pages will have the following format:

- name of original topic, centered and in bold text (not included for the root Web page)
- unordered list corresponding to any subtopics, where the list items are in alphabetical order and include links to the Web pages corresponding to the subtopics
- an indication of "under construction" if there are no subtopics
- the parent topic and a link to its file, if present

For each HTML file generated, there is a known upper bound on the number of links to HTML files in a list.

This assignment gives the students the chance to implement and use a new data structure. They get practice in building hierarchies, such as course catalogs and encyclopedic structures. They can again work with sorting and appreciate the power of recursion.

4. MORE DATA STRUCTURES IDEAS

In the previous section, we described several assignments from our Spring 1998 offering of the data structures course, which integrate the Web with standard topics from the course. In this section, we describe more ideas for assignments related to sorting algorithms, trees, graphs, and hashing.

4.1 Sorting Algorithms

Sorting algorithms are often compared and contrasted in the data structures course. Sorting algorithms which work based on comparing items to each other are doomed to take Ω(n log n) operations, where n is the number of items to be sorted. Algorithms typically presented include bubble sort, selection sort, insertion sort, merge sort, quick sort, tree sort, and heap sort. Linear-time sorting algorithms do not rely on comparisons between items; thus, there must be some other restrictions on the items (e.g., their values must fall within a particular range). Radix sort is often presented as an example of a linear-time sorting algorithm.
The popular and exhaustive Cormen, Leiserson, and Rivest algorithms text [3] describes another linear-time sorting algorithm, counting sort, which works using a tally to count the number of occurrences of a particular item.

We now describe a Web-related assignment dealing with sorting algorithms. When students use a browser to visit Web pages, they are causing the pages to get "hits". Systems have log files to maintain statistics on the number of hits received. Sorting can be used to organize this information, such as in ranking the pages at a site according to the number of hits that they have received. The students could write a program to read a log file and rank the pages, from the highest number of hits to the lowest. Their program could then produce a Web page containing the information in graphical form. They could generate a primitive bar chart using simple picture files. They could also compare different sorting algorithms, doing a timing analysis, with a graphical comparison.

4.2 AVL Trees

Standard binary search trees are sensitive to the order in which items are added to them. If the items are already in increasing or decreasing order when they are placed in the tree, the tree degenerates into a linear structure, causing inefficient searches. In AVL trees, the tree is restructured when it starts to get too unbalanced, thus maintaining efficient search times. Each item in the tree includes a balance factor, which indicates the difference between the heights of the item's left and right subtrees. For each item, the difference between the heights of the left and right subtrees can be no more than 1. Two kinds of rotations (single or double) serve to restructure the tree. Many data structures textbooks include a discussion of AVL trees. We could modify the assignment from Section 3.4 (binary search trees) to include the balance factors in the Web pages that are generated. Perhaps pictures of a balance scale could be used.
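To make the balance factor concrete, here is a minimal Python sketch that annotates every node of a binary tree with its balance factor and checks the AVL condition. It makes two assumptions that the text does not fix: the balance factor is computed as left height minus right height, and an empty subtree has height 0. The Node class and function names are hypothetical, and the rotations are not shown.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        self.balance = 0  # filled in by annotate()


def annotate(node):
    """Store a balance factor in every node; return the subtree height.

    An empty subtree has height 0 here (an assumed convention).
    """
    if node is None:
        return 0
    left_height = annotate(node.left)
    right_height = annotate(node.right)
    node.balance = left_height - right_height
    return 1 + max(left_height, right_height)


def is_avl(node):
    """True if every balance factor in an annotated tree is -1, 0, or 1."""
    if node is None:
        return True
    return abs(node.balance) <= 1 and is_avl(node.left) and is_avl(node.right)
```

A page generator could then print each node's balance value on its page, or use it to select one of three balance-scale pictures.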
4.3 Trees

In Section 3.5 (multiway trees), we described an assignment that produced a set of Web pages from an indented text file. We present here another Web-related assignment dealing with trees, in which the students analyze multiple Web pages to produce an index for the set of Web pages, collecting all the links in a single repository. This index serves the purpose of a table of contents. The input to the program would be a starting Web page. If a given Web page has links in it, the files corresponding to those links would be read and analyzed. The links would be used to build a tree. After all links have been exhausted, the tree is complete, and the process of building the index can begin. This assignment is similar to the directory printing example from [1].

4.4 Graphs

The World Wide Web can be considered to be a graph, where the directed edges are the hyperlinks between Web pages. A standard adjacency matrix or adjacency list can be transformed into a set of Web pages, which the students can navigate. The standard graph algorithms can also be applied to the Web. The students could build a minimum spanning tree for a set of Web pages, causing a new set of pages to be generated with the minimum number of hyperlinks (and associated costs) such that all pages in the set are still reachable from each other. The students could determine the shortest path from a given Web page to another given Web page. They could perform depth-first and breadth-first traversals for a set of Web pages to ensure that all pages are visited in a systematic fashion.

4.5 Hashing

Hashing is a way to improve the efficiency of searching, provided that the number of collisions when adding items to the hash table is not too high. A hash table could be used to construct a rudimentary search engine program, giving the students yet another application of data structures concepts to the Web.

5. CONCLUSION

We have seen the proliferation of the World Wide Web in the past few years.
Every day, we see references to the Web in newspaper and magazine advertisements and in television commercials. The World Wide Web relies on powerful hardware to send its information to home and school computers. However, much of the information that is sent is in the relatively simple format of HTML files, which some other programs may have automatically generated. Students today need to obtain a good understanding of the Web and traditional concepts from data structures.

In this paper, we presented some ideas for programming assignments, which use the Web to reinforce data structures concepts. In these assignments, students generate Web pages and/or verify the correctness of the pages. These assignments are suitable for either open or closed laboratory settings. The Web browser used by the students to view their generated pages can help in the debugging process.

6. REFERENCES

[1] Astrachan, O. Self-Reference is an Illustrative Essential. Proceedings of the Twenty-Fifth SIGCSE Technical Symposium on Computer Science Education (Phoenix AZ, March 1994), ACM Press, 238-242.

[2] Boroni, C.M., Goosey, F.W., Grinder, M.T., and Ross, R.J. A Paradigm Shift! The Internet, the Web, Browsers, Java, and the Future of Computer Science Education. Proceedings of the Twenty-Ninth SIGCSE Technical Symposium on Computer Science Education (Atlanta GA, February 1998), ACM Press, 145-152.

[3] Cormen, T.H., Leiserson, C.E., and Rivest, R.L. Introduction to Algorithms. MIT Press, 1990.

[4] Fell, H.J. and Proulx, V.K. Exploring Martian Planetary Images: C++ Exercises for CS1. Proceedings of the Twenty-Eighth SIGCSE Technical Symposium on Computer Science Education (San Jose CA, February 1997), ACM Press, 30-34.

[5] Lim, B.B.L. Teaching Web Development Technologies in CS/IS Curricula. Proceedings of the Twenty-Ninth SIGCSE Technical Symposium on Computer Science Education (Atlanta GA, February 1998), ACM Press, 107-111.

[6] McCracken, D.D., Dale, N., Wolz, U., Berman, M., and Astrachan, O. Possible Futures for CS2. Proceedings of the Twenty-Ninth SIGCSE Technical Symposium on Computer Science Education (Atlanta GA, February 1998), ACM Press, 357-358.

[7] Mercuri, R., Herrmann, N., and Popyack, J. Using HTML and JavaScript in Introductory Programming Courses. Proceedings of the Twenty-Ninth SIGCSE Technical Symposium on Computer Science Education (Atlanta GA, February 1998), ACM Press, 176-180.