A 3D Web Visualization System

A project proposal for the CS8803 Advanced Internet Applications Development course
Pradeep Yadav
College of Computing
Georgia Institute of Technology
Atlanta, GA
pyadav3@mail.gatech.edu
Abstract
The World Wide Web is currently a huge collection of web pages linked to one another in arbitrary ways. One of the major challenges is to visualize this structure in an effective and manageable way. The World Wide Web can be regarded as a large graph, with web pages acting as nodes and hyperlinks as edges. Many methods in the literature have been applied to view this graph in ways that provide meaningful insights into the structure of the World Wide Web (WWW). This paper aims to represent this large graph in three dimensions, in a slightly different way from that employed in the current literature. Potential uses of such a representation of the WWW include visualizing the evolution of the WWW and effective web browsing. One interesting use is blogging at a particular node: if this three-dimensional viewing service is provided on a server, such blogging can give users a way to convey their opinions about sites, hold discussions, and view blogs of particular websites in a very effective way.
Motivation
The World Wide Web has grown into a massive network of millions of web pages. Browsing with conventional web browsers is not an effective way to navigate through these millions of pages. Such tools do not give users an overall picture of the structure of the web, which could help in providing useful browsing capabilities. What is required is a visualization technique that can represent such a large graph in a manageable way while also providing an intuitive insight into the structure of the WWW. Much work has already been done in this respect, and various schemes have been proposed. However, these schemes either display an overwhelming amount of information, which undermines the visualization, or lack an effective means of navigation.
In this paper I will try to represent the WWW as a graph, but in a slightly different way. The visualization of this graph will be in three dimensions and will provide easy, intuitive, hierarchical navigation. The aim is an effective, manageable visualization together with intuitive navigation.
Related Work
Many researchers have represented the World Wide Web as a graph with web pages as nodes and hyperlinks as edges. Various techniques have been employed to visualize such a graph, such as trees and 2D graph layouts [1]. Others have visualized local web search as 2D graphs [2]. However, such techniques are not well suited to representing the WWW in a way that gives an overall view of its structure. To this end, researchers have represented the WWW in three dimensions. One particularly interesting example is the work of Tamara Munzner and Paul Burchard [3], which visualizes the WWW in 3D hyperbolic space. A comprehensive survey of several three-dimensional visualization techniques is provided in [4].
Proposed Work
In this project I propose to implement a three-dimensional visualization system for the World Wide Web that is both manageable and supports effective navigation.
The World Wide Web can be viewed as a graph with web pages acting as nodes and hyperlinks as edges. This approach is idealized and makes visualization of the graph unmanageable, as the graph consists of millions of nodes and edges. My approach is to represent domains, rather than individual web pages, as nodes. For example, the highest-level domains such as .com, .edu, .org and .net can be represented as nodes connected by directed, weighted edges, where the weight of an edge is the number of links pointing from one domain to the other. Such a representation brings out the hierarchical nature of the WWW. It also provides encapsulation by hiding all the sub-domains within a domain. Take the .edu domain as an example. This node contains gatech.edu, stanford.edu, princeton.edu, etc. The gatech.edu domain in turn contains the sub-domains cc.gatech.edu, ece.gatech.edu and library.gatech.edu, which in turn contain their own sub-domains, and so on. Even the .edu domain is overwhelming, however, as it contains thousands of university websites. One solution is to divide domains by region, i.e., to cluster them geographically; for example, the .edu domain would contain regional sub-groups for the US, China, India, etc.
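To make the domain-level graph concrete, the following is a minimal sketch that collapses page-to-page links into weighted domain-to-domain edges at a chosen depth of the hierarchy. It rests on stated assumptions: URLs are absolute http(s) URLs, the "domain at level k" is simply the last k labels of the host name (so multi-label suffixes such as co.uk and the proposed regional grouping are not handled), and links that stay inside a single node are hidden, matching the encapsulation described above. The class and method names are hypothetical.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: aggregate page-level hyperlinks into weighted
 * domain-level edges. Level 1 = top-level domain ("edu"),
 * level 2 = "gatech.edu", level 3 = "cc.gatech.edu", and so on.
 */
public class DomainAggregator {

    /** Last `level` labels of the URL's host, e.g. level 2 of http://www.cc.gatech.edu/x is "gatech.edu". */
    public static String domainAtLevel(String url, int level) {
        String host = URI.create(url).getHost().toLowerCase(); // assumes an absolute URL with a host
        String[] labels = host.split("\\.");
        int start = Math.max(0, labels.length - level);
        return String.join(".", Arrays.copyOfRange(labels, start, labels.length));
    }

    /** Edge weight = number of page links from one domain to another; key is "from -> to". */
    public static Map<String, Integer> aggregate(String[][] links, int level) {
        Map<String, Integer> weights = new HashMap<>();
        for (String[] link : links) {
            String from = domainAtLevel(link[0], level);
            String to = domainAtLevel(link[1], level);
            if (!from.equals(to)) { // links inside one node stay hidden within it
                weights.merge(from + " -> " + to, 1, Integer::sum);
            }
        }
        return weights;
    }

    public static void main(String[] args) {
        String[][] links = {
            {"http://www.cc.gatech.edu/a", "http://www.stanford.edu/b"},
            {"http://www.ece.gatech.edu/c", "http://cs.stanford.edu/d"},
            {"http://www.cc.gatech.edu/e", "http://www.example.com/f"},
            {"http://www.cc.gatech.edu/g", "http://www.ece.gatech.edu/h"},
        };
        System.out.println(aggregate(links, 1)); // top-level view
        System.out.println(aggregate(links, 2)); // one level deeper
    }
}
```

Increasing the level exposes the next layer of sub-domains; restricting the input to links whose endpoints both lie under one domain (for example edu) would give the view inside that node.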
There are two major advantages to this representation. First, it provides a manageable, intuitive way of visualizing the World Wide Web. Second, it allows easy navigation through the domains. Note that the edges at higher levels do not represent individual page-to-page links; instead, each gives an aggregate count of the links pointing from one domain to another, so the graph is an approximation of the true graph. The scheme is also insensitive to dynamic content on web pages: if the links to and from a domain change on individual pages, the overall look of the graph at the highest level will not be affected much, since the total number of links pointing to and from the domain will not change significantly.
The project will have the following major components:
1. A basic web crawler, which crawls outward from a set of seed pages.
2. A database to record the URLs of visited pages and the URLs they point to.
3. A 3D visualization interface with basic navigation features such as selecting a domain to view its sub-domains, navigating back, rotating the view, etc.
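Components 1 and 2 can be prototyped together. Below is a minimal breadth-first crawler sketch, assuming a pluggable fetch function (URL in, HTML out, or null on failure) so that it can be exercised without network access; the in-memory list of Link records stands in for the database table. Link extraction uses a naive regular expression and records and follows only absolute http(s) links, a simplification that a real crawler would replace with an HTML parser and relative-URL resolution. All names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BasicCrawler {
    // Naive href extraction: captures the attribute value up to any '#' fragment.
    private static final Pattern HREF = Pattern.compile(
            "href\\s*=\\s*[\"']([^\"'#]+)[^\"']*[\"']", Pattern.CASE_INSENSITIVE);

    /** One (from, to) pair; stands in for a row of the proposed links table. */
    public record Link(String from, String to) {}

    /** Breadth-first crawl from `seed`, visiting at most `maxPages` pages. */
    public static List<Link> crawl(String seed, Function<String, String> fetch, int maxPages) {
        List<Link> links = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(seed);
        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String page = frontier.poll();
            if (!visited.add(page)) continue;   // already crawled via another path
            String html = fetch.apply(page);
            if (html == null) continue;         // fetch failed or page missing
            Matcher m = HREF.matcher(html);
            while (m.find()) {
                String target = m.group(1).trim();
                // Simplification: only absolute http(s) links are recorded and followed.
                if (!target.startsWith("http://") && !target.startsWith("https://")) continue;
                links.add(new Link(page, target));
                if (!visited.contains(target)) frontier.add(target);
            }
        }
        return links;
    }
}
```

In a real run the fetch function could wrap java.net.http.HttpClient, with maxPages bounding the crawl of www.gatech.edu.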
The three-dimensional visualization will represent domains as spheres whose size indicates the number of sub-domains and whose colour indicates the domain type. An edge from one sphere to another will be shown as a cylinder whose radius indicates the number of links pointing to the target domain and whose colour identifies the source domain, being the same as the colour of the sphere it points from. A threshold will determine whether an association between two domains is displayed: if the association is weak, i.e., the number of links pointing to the domain is below the threshold, it will not be displayed. This threshold will be a parameter of the program.
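The mapping from the domain graph to scene objects can be sketched independently of the rendering library. In this hypothetical sketch, sphere and cylinder radii grow with the logarithm of the sub-domain count and link count respectively; that scaling, its constants, and the use of the domain type string as a colour key are assumptions, since the proposal fixes only which quantity each visual attribute encodes. The threshold filter follows the rule stated above.

```java
import java.util.ArrayList;
import java.util.List;

public class VisualMapping {
    public record DomainNode(String name, String type, int subDomainCount) {}
    public record DomainEdge(DomainNode from, DomainNode to, int linkCount) {}
    public record Sphere(String name, double radius, String colourKey) {}
    public record Cylinder(String from, String to, double radius, String colourKey) {}

    // Assumed scaling: logarithmic, so very large domains do not swamp the scene.
    static double scale(int count, double base, double factor) {
        return base + factor * Math.log1p(count);
    }

    /** Sphere size encodes the sub-domain count; colour key encodes the domain type. */
    public static Sphere toSphere(DomainNode n) {
        return new Sphere(n.name(), scale(n.subDomainCount(), 1.0, 0.5), n.type());
    }

    /** Drops weak associations (linkCount below threshold) and maps the rest to cylinders. */
    public static List<Cylinder> toCylinders(List<DomainEdge> edges, int threshold) {
        List<Cylinder> out = new ArrayList<>();
        for (DomainEdge e : edges) {
            if (e.linkCount() < threshold) continue; // weak association: not displayed
            // The cylinder takes the colour of the sphere it points from.
            out.add(new Cylinder(e.from().name(), e.to().name(),
                    scale(e.linkCount(), 0.02, 0.05), e.from().type()));
        }
        return out;
    }
}
```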
The following diagram shows the basic high-level architectural design:
Plan of Action
I plan to implement the project in Java. The web crawler will crawl www.gatech.edu to build a database. The database will be either an MS Access database or an open-source database that can be accessed through JDBC. The 3D display will be implemented using JOGL, a Java binding for OpenGL, and the GUI will be built with Java Swing components.
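Step 4 of the plan below asks for a storage layout from which the hierarchy can be obtained easily. One common technique, offered here as a suggestion rather than as the proposal's design, is to store each host name with its labels reversed (www.cc.gatech.edu becomes edu.gatech.cc.www), so that every host under a domain shares a key prefix and can be fetched with a single ordered range scan. The sketch uses an in-memory TreeSet in place of an indexed database column; in SQL the analogous query would be a prefix condition such as LIKE 'edu.gatech.%', which many databases can serve from an index. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

public class ReversedHostIndex {
    // Sorted keys play the role of an indexed "reversed_host" column.
    private final TreeSet<String> hosts = new TreeSet<>();

    /** "www.cc.gatech.edu" -> "edu.gatech.cc.www" */
    public static String reverseHost(String host) {
        List<String> labels = new ArrayList<>(Arrays.asList(host.toLowerCase().split("\\.")));
        Collections.reverse(labels);
        return String.join(".", labels);
    }

    public void addHost(String host) {
        hosts.add(reverseHost(host));
    }

    /** Reversed keys of `domain` itself (if stored) and of every host beneath it. */
    public List<String> hostsUnder(String domain) {
        String prefix = reverseHost(domain);
        List<String> result = new ArrayList<>();
        if (hosts.contains(prefix)) result.add(prefix);
        // Keys starting with prefix + "." form one contiguous range ending just
        // before prefix + "/", because '/' is the character right after '.'.
        result.addAll(hosts.subSet(prefix + ".", true, prefix + "/", false));
        return result;
    }
}
```

Bounding the range at prefix + "/" rather than scanning from prefix avoids picking up look-alike siblings such as gatech-alumni.edu, whose reversed key sorts between edu.gatech and edu.gatech.cc.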
The plan of action for this project is as follows:
1. Implementing a basic web crawler in Java that dumps its results to a flat data file. (Feb 20)
2. Designing an algorithm to calculate the spatial arrangement of all domains so that the display accommodates all of them. (Feb 25)
3. Modifying the web crawler to dump data into a database. (Mar 5)
4. Experimenting with the www.gatech.edu domain and deciding how to store data in the database most efficiently, so that the hierarchy can be obtained easily. (Mar 10)
5. Building a basic GUI module to display basic 3D shapes, with basic features such as rotation. (Mar 15)
6. Displaying domains in 3D as spheres and hyperlinks from one domain to another as cylinders. (Mar 25)
7. Starting work on the final report. (Mar 20)
8. Integrating all components and testing with the www.gatech.edu domain and two other domains. (Mar 25 – April 10)
9. Completing the project report and working on enhancements. (April 10 – April 20)
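For step 2, one simple option (a swapped-in technique, not something the proposal specifies) is to place the child domains of the currently selected node on a sphere using the golden-angle, or Fibonacci, spiral, which spreads any number of points roughly evenly over the surface. The sketch returns sphere centres only; choosing a layout radius large enough that neighbouring spheres do not overlap is left to the caller, and the names are hypothetical.

```java
public class SphereLayout {
    /** Returns n points {x, y, z} spread roughly evenly over a sphere of the given radius. */
    public static double[][] fibonacciSphere(int n, double radius) {
        double[][] pts = new double[n][3];
        double goldenAngle = Math.PI * (3.0 - Math.sqrt(5.0)); // about 2.39996 radians
        for (int i = 0; i < n; i++) {
            double y = 1.0 - 2.0 * (i + 0.5) / n;   // evenly spaced heights strictly inside (-1, 1)
            double r = Math.sqrt(1.0 - y * y);      // radius of the horizontal ring at height y
            double theta = goldenAngle * i;         // successive points rotate by the golden angle
            pts[i][0] = radius * r * Math.cos(theta);
            pts[i][1] = radius * y;
            pts[i][2] = radius * r * Math.sin(theta);
        }
        return pts;
    }
}
```

Because each point depends only on its index, adding a newly crawled domain shifts existing positions only slightly, which may help keep the view stable as the database grows.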
Evaluation and Testing Method
1. The primary deliverable of this project is a package that runs as a GUI application and provides features to crawl multiple domains and display a 3D visualization of them.
2. System evaluation will be based primarily on the www.gatech.edu domain. The evaluation will be mainly qualitative, describing the effectiveness of the visualization system compared with other systems.
3. Another evaluation measure is the effectiveness of the visualization system in providing insights into the structure of a domain.
4. A further evaluation measure is the range of potential uses of such a system.
References
[1] Wei Lai, Xiaodi Huang, Ronald Wibowo, and Jiro Tanaka, "An On-Line Web Visualization System with Filtering and Clustering Graph Layout," IEEE Intelligent Informatics Bulletin, 5(1): 11-17, 2005.
[2] M. Angelaccio, B. Buttarazzi, "A Visualization System for Web Local Search," Proceedings of the IEEE International Conference on Information Visualization, 19-21 July 2000, pp. 474-478.
[3] Tamara Munzner and Paul Burchard, "Visualizing the Structure of the World Wide Web in 3D Hyperbolic Space," Proceedings of VRML '95 (San Diego, California, December 14-15, 1995), special issue of Computer Graphics, ACM SIGGRAPH, New York, 1995, pp. 33-38.
[4] Steve Benford et al., "Three Dimensional Visualization of the World Wide Web," ACM Computing Surveys, Vol. 31, No. 4es, December 1999.