Software Engineering for the World-Wide Web?

Ray Welland
Department of Computing Science
University of Glasgow
ray@dcs.gla.ac.uk

Institutt for informatikk - R. Welland 1998

WWW Issues
• Quality of information and how to find useful information
• Policing the contents of the Web (pornography??)
• Presentation of information, page design issues, HCI fundamentals
• Accessing the information (download times, network traffic)
• Navigation within Web sites and between sites
• Structural issues for large Web sites

Software Engineering
• Software Engineering is concerned with the construction and maintenance of large, complex software systems.
• SE recognises that there is a huge step from writing individual programs for personal use to writing large software systems to support corporate activities.
• We have the same problem with the Web! The anarchic element is fine, but increasingly companies are investing in Web sites which contain large amounts of important data.
• A major lesson from SE is that maintenance is a much bigger problem than development; the same is becoming true of Web sites.

Software Engineering
• One of Lehman’s Laws (observations of the software development process):
  Increasing Complexity - as an evolving program changes, its structure tends to become more complex. Extra resources must be devoted to preserving and simplifying the structure.
• The same seems to be true of the structure of Web sites. But we also have the problem of keeping the information content up to date.

Web Site Maintenance
Maintaining your site
There is a lot of out-of-date information and dead links on the Web. Don’t add to it! Keep your information current.
Check hypertext links regularly to make sure they still work. There are tools available for doing such housekeeping tasks automatically. Usage statistics are an important aspect of maintenance. They will tell you, for instance, if anyone is accessing the information you are providing or if they encountered errors in finding any of the files you linked to.
from: TERENA Guide to Network Resource Tools

Our Experimental Web Site
Web site for the Hunterian Museum of the University of Glasgow (http://www.gla.ac.uk/Museum/)
• Built up from a series of student projects, investigating:
  • navigation (guided tours of the Museum)
  • use of audio sequences (Latin inscriptions on Roman milestones)
  • inclusion of video clips (Roman armour)
  • use of image maps for navigation (Captain Cook’s voyages)
  • use of QTVR panoramas (Mackintosh House)
  • use of QTVR objects (Asante Gold weights)
• Characteristics:
  • real customers (museum curators)
  • live site, particularly for educational use
  • about 900 pages of complex data, approximately 50 contributors

Cleaning up our Web site!
• Within pages:
  • content errors
  • proper titling of pages for indexing and searching
  • addition of tags for search engines
  • consistency of button usage (within sections and return to higher levels)
• Dead link checking
• Garbage Collection
• Structuring the pages
• Visualisation of the linkage structure

What tools are available?
My Web search turned up a number of possible tools:
• Website Garage
• NetMechanic
• Dr Watson
• Site Doctor
• ...
All seemed to have similar characteristics, so I will look at the first two in a little more detail, then try to identify some generally desirable features of Website tools.

Website Garage
• Very snappy presentation of your Web page “tune-up” which checked:
  • Browser compatibility - whether tags and attributes are supported by different browsers
  • Register It readiness - search tool visibility, use of META tags
  • Load time - estimates of load time for different bandwidths
  • Dead links - identify dead links (or time-outs), 25 links checked
  • Link popularity - incoming links to page (from Infoseek)
  • Spelling - rather suspect!
  • HTML design - checks whether HTML tags conform to standard (not design!)
• Suggestions:
  • Keep home page to less than 40K, other pages less than 30K
  • “Stay aware of interconnections in your Web site by keeping an up-to-date site map”

NetMechanic
• Carried out a site check for links; results were returned via Web pages.
• Poor presentation of results:
  - exhaustive listing of links, mostly marked “OK”; difficult to extract the problems;
  - many pages with no links, presented at three results per NetMechanic web page.
• Failed to deal with mailto links; some problems with image maps; identified some password protected links but gave false password protected results as well.
• A number of links were marked “Not Checked”, but it was not clear why.
• Identified some “No response from host” problems which were not reproducible.
• It did find some genuinely broken links!
• Lessons
  • Needs better HTML tag parser to recognise mailto, deal with image maps;
  • Retain data in better format for filtering, further manipulation.

Requirements for new tool(s)
• Robust HTML parser which is able to handle special cases, such as mailto links.
• Explicit identification of good Website design metrics (ranges of values where appropriate) so that we can measure them. Tools should accept metric values as parameters.
• Storage of data for filtering (e.g. highlight all broken links or all servers not responding) and further analysis (e.g. calculation of additional metrics). Leads us into Website Meta-database ...
• Generation of “site maps”; need for visualisation(s) ...

Website Meta-database
• Meta-database
  • Domain path; Timestamp of last database update
• Possible contents for a component:
  • URL, local pathname and Title of component
  • Type of component (HTML, image, video, audio, ...)
  • Ownership, authorship
  • Date of last change to component
  • Size (+ other characteristics?)
  • Intra component links?
  • Links out: within domain; mailto (and other special cases); external
  • Links in: within domain; (external)
  • Hits (number, where from)
• For each link, we need its current status:
  • OK; Not found; Not responding (time out); password protected; inaccessible

Website Meta-database (2)
• Definition of a Website?
  • Transitive closure of all pages which are accessed from a given home page on the current server (within the current WWW domain);
  • All pages contained within a given domain and all its subordinate directories;
  • Should we allow a collection of bookmarks to be treated as a “website”?
• Transitive Closure from home page ensures the referential integrity of the site (if no links are broken).
• Using the domain directory structure it is possible to identify pages which are within the current “website” but are not connected to it. (Garbage collection)
• Both of these are useful

Uses for Meta-database
• Dead link finding
• Disconnected pages (depending upon definition of “website”)
• Impact Analysis - if this page is changed which other pages are affected?
• Analysis of Structure - is there a Web equivalent of the fan-in and fan-out measures, used in Structured Design? Clustering of pages for access? (Other complexity metrics?)
• Identifying recent changes, rate of change, “hot spots”
• Visualisation of structure - visual representation of complexity of structure ...

Visualisation
• Initially build a tool to present the structure of a Website
  • Differentiate types of links (colour) and be able to select which are shown
  • Differentiate types of pages (icons) and select which are included (e.g. filter out images, etc.)
  • Change viewpoints (perhaps fisheye with given page at centre of focus)
  • Special treatment of return links to home page(s) to reduce detail?
  • Show current browser position in Website map (as an aid to navigation)
• Active visualisation might follow
  • Use visualisation for navigation; move through the site by moving around the visualisation (switch between browser view and structural view)
  • Link editing to navigation; select page to be edited from visualisation

Other Possible Developments
• Structured Editor for pages
  • specify templates for standard pages (specific to Website or section), page header, colours, title tag, return buttons
• Web site design tool
  • allow the user to model the structure of a website (or part of it)
  • generate the outline structure of the site
  • need for design notation?
• Version Control
  • too much work is done “on the fly” with existing websites (would not be acceptable for large commercial systems)
  • no record of changes or control of changes
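
Sketch: querying a Website meta-database
The uses listed for the meta-database (dead link finding, disconnected pages, impact analysis, fan-in and fan-out) can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration, not part of any existing tool: the class name SiteMetaDatabase, its method names, the in-memory dictionary, and the simplified rule that a link counts as dead when its target is not a stored component. A real tool would check link status over the network and would also record the other component fields (type, ownership, date of last change, size, hits).

```python
from collections import deque


class SiteMetaDatabase:
    """Toy in-memory stand-in for the meta-database (hypothetical names)."""

    def __init__(self, home):
        self.home = home
        self.links_out = {}  # component URL -> list of URLs it links to

    def add_component(self, url, links_out=()):
        self.links_out[url] = list(links_out)

    def reachable(self):
        """Transitive closure of stored pages reachable from the home page."""
        seen = {self.home}
        queue = deque([self.home])
        while queue:
            page = queue.popleft()
            for target in self.links_out.get(page, []):
                if target in self.links_out and target not in seen:
                    seen.add(target)
                    queue.append(target)
        return seen

    def disconnected(self):
        """Stored pages not connected to the home page (garbage collection)."""
        return set(self.links_out) - self.reachable()

    def dead_links(self):
        """(source, target) pairs whose target is not a stored component."""
        return sorted(
            (source, target)
            for source, targets in self.links_out.items()
            for target in targets
            if target not in self.links_out
        )

    def links_in(self, url):
        """Impact analysis: pages that link directly to the given page."""
        return {s for s, targets in self.links_out.items() if url in targets}

    def fan_in(self, url):
        return len(self.links_in(url))

    def fan_out(self, url):
        return len(set(self.links_out.get(url, [])))


# Hypothetical four-page site, with one broken link and one orphan page.
site = SiteMetaDatabase("index.html")
site.add_component("index.html", ["tour.html", "cook.html"])
site.add_component("tour.html", ["cook.html", "missing.html"])
site.add_component("cook.html", ["index.html"])
site.add_component("old.html", ["tour.html"])

print("dead links:  ", site.dead_links())
print("disconnected:", sorted(site.disconnected()))
print("affected if tour.html changes:", sorted(site.links_in("tour.html")))
print("fan-in / fan-out of cook.html:",
      site.fan_in("cook.html"), site.fan_out("cook.html"))
```

The two definitions of a “website” appear here as two sets: reachable() is the transitive closure from the home page, while the keys of links_out play the role of the domain directory listing. Their difference gives the disconnected pages.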