A 3D Web Visualization System

A project proposal for CS8803 Advanced Internet Applications Development Course

Pradeep Yadav
College of Computing
Georgia Institute of Technology
Atlanta, GA
pyadav3@mail.gatech.edu

Abstract

The World Wide Web is currently a huge collection of web pages that are linked to one another in arbitrary ways. One of the major challenges is to visualize this structure in an effective and manageable way. The World Wide Web can be regarded as a large graph with web pages acting as nodes and hyperlinks as edges. In the literature, many methods have been applied to view this graph in a way that provides meaningful insights into the structure of the World Wide Web (www). This paper aims to represent this large graph in three dimensions in a slightly different way from that employed in the current literature. Potential uses of such a representation of www are visualizing the evolution of www and effective web browsing. One interesting use is blogging at a particular node. If this three-dimensional viewing service is provided on a server, such blogging gives users a way to convey their opinions about sites, hold discussions, and view the blogs of particular websites.

Motivation

The World Wide Web has grown into a massive network consisting of millions of web pages. Conventional web browsers are not an effective way of navigating through these millions of pages. They do not allow the user to get an overall picture of the structure of the web, which could help in providing useful web browsing capabilities. An effective visualization technique is required that can represent such a large graph in a manageable way and also provide intuitive insight into the structure of www. Much work has already been done in this respect, and various schemes have been proposed.
However, these schemes either display an overwhelming amount of information, which undermines the visualization, or lack an effective means of navigation. In this paper I will try to represent www as a graph, but in a slightly different way. The visualization of this graph will be in three dimensions and will provide easy, intuitive, hierarchical navigation. The aim is effective, manageable visualization together with intuitive navigation.

Related Work

Many people have represented the World Wide Web as a graph with web pages as nodes and hyperlinks as edges. Many techniques have been employed to visualize such a graph, such as trees and 2D graph layouts, as in [1]. Some have visualized local web search as 2D graphs [2]. However, such techniques do not represent www in a way that provides an overall view of its structure. To this end, people have represented www in three dimensions. One particularly interesting work is that of Tamara Munzner and Paul Burchard [3], which visualizes the web in 3D hyperbolic space. A comprehensive study of some three-dimensional visualization techniques is provided in [4].

Proposed Work

In this project I propose to implement a three-dimensional visualization system for the World Wide Web that is both manageable and easy to navigate. The World Wide Web can be viewed as a graph with web pages acting as nodes and hyperlinks as edges. This approach is idealistic and makes visualization of the graph unmanageable, as the graph consists of millions of nodes and edges. My approach is to represent domains as nodes, rather than each web page as a single node. For example, the highest-level domains, such as .com, .edu, .org, and .net, can be represented as nodes with directed weighted edges connecting them. The weight of an edge corresponds to the number of links pointing from one domain to another. Such a representation brings out the hierarchical nature of www.
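The collapsing of page-level links into weighted domain-level edges described above can be sketched in Java. This is only a minimal sketch under stated assumptions, not the final design: the class DomainEdgeSketch, the methods domainOf and aggregate, and the "from -> to" key format are hypothetical names chosen for illustration. It also assumes that a domain at a given level is simply the last N labels of the host name, and that links staying inside one domain are skipped, which the proposal does not specify. The sample URLs in main are made up.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: collapses page-to-page links into weighted domain-to-domain edges.
class DomainEdgeSketch {

    // Returns the last `levels` labels of the URL's host,
    // e.g. levels = 1 gives "edu" and levels = 2 gives "gatech.edu" for www.gatech.edu.
    static String domainOf(String url, int levels) {
        String host = URI.create(url).getHost();
        if (host == null) {
            return null; // URL without a host part
        }
        String[] labels = host.toLowerCase().split("\\.");
        int start = Math.max(0, labels.length - levels);
        return String.join(".", Arrays.copyOfRange(labels, start, labels.length));
    }

    // Each link is a pair {sourceUrl, targetUrl}. The result maps "from -> to" to the
    // number of page links that point from one domain to the other.
    static Map<String, Integer> aggregate(List<String[]> links, int levels) {
        Map<String, Integer> weights = new TreeMap<>();
        for (String[] link : links) {
            String from = domainOf(link[0], levels);
            String to = domainOf(link[1], levels);
            // Assumption: links that stay inside one domain are not drawn as edges.
            if (from == null || to == null || from.equals(to)) {
                continue;
            }
            weights.merge(from + " -> " + to, 1, Integer::sum);
        }
        return weights;
    }

    public static void main(String[] args) {
        // Made-up sample links, for illustration only.
        List<String[]> links = Arrays.asList(
            new String[] {"http://www.gatech.edu/", "http://www.example.com/"},
            new String[] {"http://www.cc.gatech.edu/", "http://www.example.com/about"},
            new String[] {"http://www.example.org/", "http://www.gatech.edu/"});
        System.out.println(aggregate(links, 1)); // edges between top-level domains
        System.out.println(aggregate(links, 2)); // edges one level further down
    }
}
```

Calling aggregate with a larger levels value gives the edge weights for the next level of the hierarchy, so the same crawl data could serve every zoom level of the display.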
This representation also provides encapsulation, in that all the sub domains are hidden within a domain. As an example, take the .edu domain. This node consists of gatech.edu, stanford.edu, princeton.edu, etc. gatech.edu in turn consists of the sub domains cc.gatech.edu, ece.gatech.edu, and library.gatech.edu, which in turn consist of their own sub domains, and so on. But even the .edu domain is overwhelming, as it consists of thousands of university websites. One solution is to divide the domains by region, i.e. to cluster them geographically; for example, the .edu domain would consist of sub domains for the US, China, India, etc.

There are two major advantages associated with the above representation. First, it provides a manageable, intuitive way of visualizing the World Wide Web. Second, it provides easy navigation through the domains. Note that edges at the higher levels do not represent individual page-to-page links but an aggregate count of the links pointing from one domain to another, and as such provide an approximation to the true graph. This scheme is also not sensitive to dynamic web page content: if the number of links to and from a domain changes on individual pages, the overall look of the graph at the highest level will not be affected, as the total number of links pointing to and from the domain will not change much.

The project will have the following major components:

1. A basic web crawler, which crawls starting from seed pages.
2. A database to record the URLs of the pages visited and the URLs they point to.
3. A 3D visualization interface with very basic navigation features, such as selecting a particular domain to view its sub domains, navigating back, rotating, etc.

The three-dimensional visualization will consist of domains represented as spheres, whose size will indicate the number of sub domains a domain has and whose colour will indicate the domain type.
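The sub domain counts that drive the sphere sizes could be derived from crawled host names along the following lines. This is a hedged sketch rather than the project's actual algorithm: the class DomainTreeSketch and the methods childrenOf and sphereRadius are hypothetical, and the logarithmic radius rule is purely an assumption, since the proposal only states that size indicates the number of sub domains.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: builds the domain hierarchy from host names.
class DomainTreeSketch {

    // Maps each domain to the set of its direct sub domains seen in the host list,
    // e.g. "gatech.edu" -> {cc.gatech.edu, ece.gatech.edu}.
    static Map<String, Set<String>> childrenOf(List<String> hosts) {
        Map<String, Set<String>> children = new TreeMap<>();
        for (String host : hosts) {
            String child = host.toLowerCase();
            int dot;
            // Walk up the hierarchy: strip the leftmost label to get the parent domain.
            while ((dot = child.indexOf('.')) >= 0) {
                String parent = child.substring(dot + 1);
                children.computeIfAbsent(parent, k -> new TreeSet<>()).add(child);
                child = parent;
            }
        }
        return children;
    }

    // Assumed sizing rule (not from the proposal): the radius grows with the logarithm
    // of the sub domain count so that very large domains do not dominate the scene.
    static double sphereRadius(int subDomainCount) {
        return 1.0 + Math.log1p(subDomainCount);
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList(
            "cc.gatech.edu", "ece.gatech.edu", "library.gatech.edu",
            "stanford.edu", "princeton.edu");
        Map<String, Set<String>> children = childrenOf(hosts);
        for (Map.Entry<String, Set<String>> e : children.entrySet()) {
            int count = e.getValue().size();
            System.out.println(e.getKey() + ": " + count
                + " sub domains, radius " + sphereRadius(count));
        }
    }
}
```

Selecting a sphere in the interface would then amount to looking up that domain's entry in the map and displaying its children as the next level.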
The edges from one sphere to another will be shown as cylinders. The radius of a cylinder will indicate the number of links pointing to a domain, and its colour will indicate the domain the links point from, i.e. it will be the same as the colour of the sphere the edge originates from. A threshold will be used to decide whether to display an association between one domain and another. If the association is weak, i.e. the number of links pointing to the domain is less than the threshold, it will not be displayed. The threshold will be a parameter of the program.

The following diagram shows the basic high-level architectural design:

[Figure: high-level architectural design]

Plan of Action

I plan to implement the project in Java. The web crawler will crawl www.gatech.edu to build a database. The database will be either MS Access or any open-source database that can be accessed using JDBC. The 3D display will be implemented using JOGL, a Java binding for the OpenGL API. The GUI will be built using Java Swing components.

The following is the plan of action for this project:

1. Implement a basic web crawler in Java, which dumps its results into a flat data file. (Feb 20)
2. Design an algorithm to calculate the spatial layout of all domains, so that the display accommodates all of them. (Feb 25)
3. Modify the web crawler to dump data into a database. (Mar 5)
4. Experiment with the www.gatech.edu domain and decide how to store the data in the database most efficiently, so that the hierarchy can be obtained easily. (Mar 10)
5. Build a basic GUI module to display basic 3D shapes, with basic features such as rotation. (Mar 15)
6. Display domains in 3D as spheres and hyperlinks from one domain to another as cylinders. (Mar 25)
7. Start working on the final report. (Mar 20)
8. Integrate all components and test with the www.gatech.edu domain and two other domains. (Mar 25 – April 10)
9. Complete the project report and work on enhancements. (April 10 – April 20)

Evaluation and Testing Method

1. The primary deliverable of this project is a package that can be run as a GUI application, can crawl many domains, and displays a 3D visualization of those domains.
2. The system evaluation will be based primarily on the www.gatech.edu domain. The evaluation will be mostly qualitative, describing the effectiveness of the visualization system as compared to others.
3. Another evaluation measure can be the effectiveness of the visualization system in providing insights into the structure of a domain.
4. Yet another evaluation measure can be the potential uses of such a system.

References

[1] An On-Line Web Visualization System with Filtering and Clustering Graph Layout, by Wei Lai, Xiaodi Huang, Ronald Wibowo, and Jiro Tanaka. IEEE Intelligent Informatics Bulletin 5(1): 11-17 (2005).
[2] A Visualization System for Web Local Search, by M. Angelaccio and B. Buttarazzi. Proceedings of the IEEE International Conference on Information Visualization, 19-21 July 2000, pp. 474-478.
[3] Visualizing the Structure of the World Wide Web in 3D Hyperbolic Space, by Tamara Munzner and Paul Burchard. Proceedings of VRML '95 (San Diego, California, December 14-15, 1995), special issue of Computer Graphics, ACM SIGGRAPH, New York, 1995, pp. 33-38.
[4] Three Dimensional Visualization of the World Wide Web, by Steve Benford et al. ACM Computing Surveys, Vol. 31, No. 4es, December 1999.