Lab Report File - California State University, Los Angeles

John Paul Adigwu
The Semantic Network and Scraping Tool
February 2010
Introductory Overview
The purpose of the semantic network is to facilitate the collaboration, organization, and integration of data
into hierarchical structures accessible to users of the network. The network allows users to add different
types of data from various sources (computers, phones, and other portable mobile devices). The tools
developed to implement the semantic network include the Scraping Tool, the Object Tree Structure display
interface, and Ubiquitous Video Conferencing.
The Semantic Network
The semantic network implements an object relational model. Information within the network is
represented as parent and child nodes. Visually, the relationships between specific sets of objects are
displayed in a tree-like structure with expandable nodes of sub-objects. Relationships between objects are
user-generated, so users can modify, expand, and reorganize objects.
Figure 1. Model of hierarchical tree structure of the Semantic Network (example nodes: NASA Missions, with
child nodes Unmanned Mission, containing Mars Reconnaissance Orbiter, and Manned Mission; year nodes 2006,
2007, 2008, 2009).
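The parent/child model above can be sketched as a small node class. This is a minimal illustration, not the
network's actual implementation: the class name SemanticNode and its methods are hypothetical. It shows the
two properties the text relies on: every object has one parent, and a user can reorganize the tree by moving
an object under a different parent.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity equality: two nodes with the same title stay distinct
class SemanticNode:
    """Hypothetical object node; children are its sub-objects."""
    title: str
    parent: Optional[SemanticNode] = field(default=None, repr=False)
    children: list[SemanticNode] = field(default_factory=list)

    def add_child(self, child: SemanticNode) -> SemanticNode:
        """Attach child under this node, detaching it from any previous parent."""
        # Refuse moves that would place a node beneath itself (a cycle).
        ancestor: Optional[SemanticNode] = self
        while ancestor is not None:
            if ancestor is child:
                raise ValueError("cannot move a node beneath itself")
            ancestor = ancestor.parent
        if child.parent is not None:
            child.parent.children.remove(child)
        child.parent = self
        self.children.append(child)
        return child

    def path(self) -> str:
        """Slash-separated titles from the root down to this node."""
        parts: list[str] = []
        node: Optional[SemanticNode] = self
        while node is not None:
            parts.append(node.title)
            node = node.parent
        return "/".join(reversed(parts))


# Rebuild part of Figure 1, then let a user reorganize it.
root = SemanticNode("NASA Missions")
unmanned = root.add_child(SemanticNode("Unmanned Mission"))
manned = root.add_child(SemanticNode("Manned Mission"))
mro = manned.add_child(SemanticNode("Mars Reconnaissance Orbiter"))  # filed under the wrong parent
unmanned.add_child(mro)  # user moves it to the correct parent
print(mro.path())  # NASA Missions/Unmanned Mission/Mars Reconnaissance Orbiter
```

Keeping a back-reference to the parent makes both moves and path lookups cheap, at the cost of updating two
links on every move.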
An object within the semantic network is made up of several core attributes, which together provide the
relational object framework. Core attributes include object data categories such as title, owners, creation
date, and scraped content. The type of content within each object can vary, because the network uses the
Extensible Markup Language (XML) to link, transmit, and display data. Moreover, because the semantic
network is built on a distributed server model, the network is expandable and can adapt to the needs of
researchers, individuals, and organizations.
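As a rough sketch of how the core attributes might be carried in XML, the function below serializes one
object with Python's standard xml.etree.ElementTree module. The element names (object, title, owners,
created, content) and the owner id are assumptions for illustration; the report does not give the network's
actual schema.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from datetime import date


def object_to_xml(title: str, owners: list[str], created: date, content: str) -> str:
    """Serialize an object's core attributes to XML (hypothetical element names)."""
    obj = ET.Element("object")
    ET.SubElement(obj, "title").text = title
    owners_el = ET.SubElement(obj, "owners")
    for owner in owners:
        ET.SubElement(owners_el, "owner").text = owner
    ET.SubElement(obj, "created").text = created.isoformat()
    # Scraped content can be of any type; a type attribute records which one.
    ET.SubElement(obj, "content", type="text").text = content
    return ET.tostring(obj, encoding="unicode")


xml_text = object_to_xml("SpaceCraftX", ["engineer01"], date(2010, 2, 12), "Design notes")
print(xml_text)
```

Because every object shares the same outer structure while the content element can hold anything, new
content types can be added without changing how objects are linked or transmitted.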
XML Tree Structure
Consider an example in which a NASA engineering team is developing a vehicle, SpaceCraftX. The
SpaceCraftX object in the semantic network would contain information such as design schematics for
specific subsystems of the craft, along with the research and development papers relevant to the design of
each subcomponent. The XML Tree Structure Graphical User Interface (Figure 3) enables team members to share
information, combine perspectives, and understand one another's engineering roles. Figure 2 shows a specific
example of an XML file displayed in the interface: a layered XML structure for “NASA Missions”.
Figure 2: Layered XML structure for “Mars Craters Exposed Ice, Water” video
Figure 3: Tree Structure Displayed in GUI.
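To show how a layered XML file maps onto the expandable tree in the GUI, the sketch below walks a nested
document depth-first and prints one indented line per node. The sample XML is hypothetical, loosely modeled
on Figures 1 and 2 (the actual file in Figure 2 is not reproduced here), and the node element and title
attribute are assumed names.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical layered XML; element and attribute names are assumptions.
LAYERED_XML = """
<node title="NASA Missions">
  <node title="Unmanned Mission">
    <node title="Mars Reconnaissance Orbiter">
      <node title="Mars Craters Exposed Ice, Water" type="video"/>
    </node>
  </node>
  <node title="Manned Mission"/>
</node>
"""


def render_tree(element: ET.Element, depth: int = 0) -> list[str]:
    """Return one indented line per node, depth-first, as a tree view would list it."""
    lines = ["  " * depth + element.get("title", element.tag)]
    for child in element:  # iterates child elements only, not text
        lines.extend(render_tree(child, depth + 1))
    return lines


root = ET.fromstring(LAYERED_XML.strip())
print("\n".join(render_tree(root)))
```

A GUI would replace the indentation with expandable widgets, but the traversal order is the same.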
As soon as one of the NASA engineers, a user of the Semantic Network, objectizes a piece of information by
tagging it with the appropriate terms, the object is created in the network. For example, the core terms
associated with a record might consist of the following metadata: ‘image recognition software for
SpaceCraftX’ (object name) and ‘02/12/10’ (date created). This metadata, linked with the content (together
with program-generated metadata such as a user-id), represents the object in the network.
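The objectizing step can be sketched as merging the user's tags with metadata the program generates itself.
This is a hypothetical schema: the dictionary keys, the user id value, and the use of a random UUID as an
object id are assumptions, and ‘02/12/10’ is read here as MM/DD/YY (February 12, 2010).

```python
import uuid
from datetime import datetime


def objectize(content: str, object_name: str, date_created: str, user_id: str) -> dict:
    """Combine user-supplied tags with program-generated metadata (hypothetical schema)."""
    return {
        # User-supplied tags.
        "object_name": object_name,
        # Dates entered as MM/DD/YY (an assumption) are stored as ISO 8601.
        "date_created": datetime.strptime(date_created, "%m/%d/%y").date().isoformat(),
        # Program-generated metadata: the creating user and a unique object id.
        "user_id": user_id,
        "object_id": str(uuid.uuid4()),
        "content": content,
    }


record = objectize("scraped text goes here",
                   "image recognition software for SpaceCraftX", "02/12/10", "user-42")
print(record["date_created"])  # 2010-02-12
```

Storing the date in ISO 8601 form avoids the ambiguity between month-first and day-first entries once objects
are shared across users.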
Creating an object allows other users of the network (users connected to SpaceCraftX’s network) to retrieve
the material and, in some cases, add information relevant to the object. The Semantic Network is especially
useful in a multiuser environment, because it addresses the need for every user to have sufficient access
control (or ownership) over objects, even in a collaborative scenario.
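One way to express “retrieve, and in some cases add” is a role check on each object. The sketch below is a
hypothetical access model, not the network's documented one: owners may grant write access, granted
contributors may append information, and everyone else may only read. The class, method, and user names are
illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SharedObject:
    """A network object with per-user roles (hypothetical access model)."""
    title: str
    owners: set[str]
    contributors: set[str] = field(default_factory=set)
    additions: list[tuple[str, str]] = field(default_factory=list)

    def can_add(self, user: str) -> bool:
        # Owners and explicitly granted contributors may append information.
        return user in self.owners or user in self.contributors

    def add_information(self, user: str, text: str) -> None:
        if not self.can_add(user):
            raise PermissionError(f"{user} may read {self.title!r} but not modify it")
        self.additions.append((user, text))

    def grant(self, granting_user: str, new_contributor: str) -> None:
        # Only an owner can extend write access to another user.
        if granting_user not in self.owners:
            raise PermissionError(f"{granting_user} does not own {self.title!r}")
        self.contributors.add(new_contributor)


spacecraft = SharedObject("SpaceCraftX", owners={"alice"})
spacecraft.grant("alice", "bob")
spacecraft.add_information("bob", "Updated thermal analysis")
```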
The Scraping Tool
The Scraping Tool allows users to create new objects in the network. It functions primarily as an
event-driven component of the Semantic Network Application. The events defined in our current
implementation of the tool are Data Submission and Data Highlighting. An example of Data Submission is the
user clicking the ‘submit’ button in the Scraping Tool to confirm the start of object creation. Data
Highlighting consists of retrieving and parsing the information selected by the user. In our current
version, the selected information is primarily retrieved through the Windows Interprocess Communications
(IPC) clipboard mechanism. In addition, an HTML parser is being designed as an alternative method for
retrieving content.
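The two events can be sketched as a small dispatcher. In this hypothetical version, the event names
"highlight" and "submit" and the ScrapeTool class are illustrative, and the clipboard read is replaced by an
injected read_selection function so the example runs on any platform; on Windows that function could wrap a
clipboard read.

```python
from __future__ import annotations

from typing import Callable


class ScrapeTool:
    """Minimal event dispatcher for the Highlighting and Submission events (hypothetical)."""

    def __init__(self, read_selection: Callable[[], str]):
        # Stand-in for the clipboard (IPC) retrieval step.
        self._read_selection = read_selection
        self._handlers: dict[str, list[Callable[..., None]]] = {}
        self.highlighted = ""
        self.created_objects: list[dict] = []
        self.on("highlight", self._capture_selection)
        self.on("submit", self._create_object)

    def on(self, event: str, handler: Callable[..., None]) -> None:
        self._handlers.setdefault(event, []).append(handler)

    def emit(self, event: str, **kwargs) -> None:
        # Run every handler registered for the event, in registration order.
        for handler in self._handlers.get(event, []):
            handler(**kwargs)

    def _capture_selection(self) -> None:
        self.highlighted = self._read_selection()

    def _create_object(self, title: str) -> None:
        # Submission confirms object creation from whatever was last highlighted.
        self.created_objects.append({"title": title, "content": self.highlighted})


tool = ScrapeTool(read_selection=lambda: "Thermal tile specifications")
tool.emit("highlight")
tool.emit("submit", title="SpaceCraftX heat shield")
```

Injecting the retrieval function also makes it easy to swap the clipboard mechanism for the planned HTML
parser without touching the event logic.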
Current Objectives
Current efforts focus on integrating a hashing function into object creation. That is, an object’s path
relative to the network should be hashed in order to (1) communicate efficiently with all distributed
servers and (2) standardize the saved-filename convention to optimize system performance in searching and
storage. An example of the proposed hashing of semantic objects is shown in Figure 4.
Figure 4. Proposed object filename convention and possible content items.
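A minimal sketch of path hashing, assuming SHA-1 from Python's hashlib, case-insensitive paths, an .xml
extension, and a simple modulo rule for choosing a server. All of these are illustrative assumptions; the
convention in Figure 4 is not reproduced here.

```python
import hashlib


def _digest(path: str) -> str:
    # Normalize slashes, surrounding whitespace, and case before hashing
    # (whether the real system ignores case is an assumption).
    parts = [part.strip().lower() for part in path.strip("/").split("/")]
    return hashlib.sha1("/".join(parts).encode("utf-8")).hexdigest()


def object_filename(path: str, extension: str = ".xml") -> str:
    """Fixed-length filename derived from the object's network path."""
    return _digest(path) + extension


def home_server(path: str, server_count: int) -> int:
    """Index of the server responsible for the object: hash value modulo server count."""
    return int(_digest(path), 16) % server_count


mro_path = "NASA Missions/Unmanned Mission/Mars Reconnaissance Orbiter"
name = object_filename(mro_path)
server = home_server(mro_path, 4)
```

Every server can compute the same filename and home server from the path alone, which is what makes lookup
and storage uniform. The modulo rule is the simplest placement; it relocates most objects when servers are
added, a cost that consistent hashing reduces.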
The second objective is coding and testing data synchronization requests on the distributed server
platform. That is, when a user changes information, the change should be reflected and acknowledged across
all servers. This is particularly important because two of the main goals of our network are scalability and
system stability. The proposed mechanism (named Active Directory, not to be confused with Microsoft’s product
of the same name) has been developed but has yet to be properly configured and tested.
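The requirement that a change be “reflected and acknowledged across all servers” can be illustrated with a
naive broadcast that collects acknowledgments. This is not the Active Directory mechanism, whose internals
the report does not describe; the Replica class and synchronize function are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Replica:
    """One distributed server's copy of the object store (hypothetical)."""
    name: str
    online: bool = True
    store: dict[str, str] = field(default_factory=dict)

    def apply(self, key: str, value: str) -> bool:
        # Returning True acts as the acknowledgment; an offline server cannot acknowledge.
        if not self.online:
            return False
        self.store[key] = value
        return True


def synchronize(replicas: list[Replica], key: str, value: str) -> list[str]:
    """Push one change to every server and return the names that did not acknowledge."""
    return [r.name for r in replicas if not r.apply(key, value)]


servers = [Replica("A"), Replica("B"), Replica("C", online=False)]
missing = synchronize(servers, "spacecraftx", "rev2")
print(missing)  # ['C']
```

A non-empty result means the servers now disagree. A production design would have to retry the missing
servers or roll back the others (for example with a two-phase commit), which is the kind of behavior the
planned testing would need to cover.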
Lastly, we aim to add more functionality to the Semantic Network’s Scraping Tool, specifically drag-and-drop
support for all file types (local files, images, video, etc.). Additionally, the Scraping Tool will be able
to capture a highlighted portion of a webpage, parse it, and store all of its information types (text,
images, and video) in a user-defined object.
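Parsing a highlighted webpage fragment into text, images, and video can be sketched with Python's standard
html.parser module. The SelectionParser class and the three category names are assumptions. It treats any
src on a video or source tag as video, which would misfile a source tag inside an audio element.

```python
from __future__ import annotations

from html.parser import HTMLParser


class SelectionParser(HTMLParser):
    """Sort a highlighted HTML fragment into text, image, and video items (hypothetical)."""

    def __init__(self) -> None:
        super().__init__()
        self.items: dict[str, list[str]] = {"text": [], "images": [], "video": []}

    def handle_starttag(self, tag, attrs):
        src = dict(attrs).get("src")
        if tag == "img" and src:
            self.items["images"].append(src)
        elif tag in ("video", "source") and src:
            self.items["video"].append(src)

    def handle_data(self, data):
        # Keep non-blank text runs; whitespace between tags is dropped.
        text = data.strip()
        if text:
            self.items["text"].append(text)


fragment = '<p>Ice found in a fresh crater.</p><img src="crater.jpg"><video src="ice.mp4"></video>'
parser = SelectionParser()
parser.feed(fragment)
parser.close()
user_object = {"title": "Mars Craters Exposed Ice, Water", **parser.items}
```

Unlike the clipboard route, parsing the HTML keeps image and video references instead of reducing the
selection to plain text.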