John Paul Adigwu
The Semantic Network and Scraping Tool
February 2010

Introductory Overview

The purpose of the semantic network is to facilitate collaboration on data and the organization and integration of that data into hierarchical structures accessible to users of the network. The network allows users to add different types of data from various sources (computers, phones, portable mobile devices). The tools developed to implement the semantic network include the Scraping Tool, the Object Tree Structure display interface, and Ubiquitous Video Conferencing.

The Semantic Network

The semantic network implements an object-relational model. Information within the network is represented as parent and child nodes. Visually, the relationships between specific sets of objects are displayed in a tree-like structure with expandable nodes of sub-objects. Relationships between objects are user generated, so users can modify, expand and reorganize objects.

Figure 1. Model of the hierarchical tree structure of the Semantic Network (node labels: NASA Missions, Unmanned Mission, Manned Mission, Mars Reconnaissance Orbiter, 2006, 2007, 2008, 2009).

An object within the semantic network is made up of several core attributes, which together provide the relational object framework. Core attributes include object data categories such as title, owners, creation date and scraped content. The type of content within each object can vary, as the network uses the Extensible Markup Language (XML) to facilitate the linking, transmitting and displaying of data. Moreover, because the semantic network is built on a distributed server model, the network is expansible and can adapt to fit the needs of researchers, individuals and organizations.

XML Tree Structure

Consider an example wherein a NASA engineering team is developing a vehicle, SpaceCraftX.
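To make the SpaceCraftX example concrete, the following is a minimal sketch of how such an object might be serialized as XML with the core attributes named earlier (title, owners, creation date, scraped content) and nested child objects. The element names, the owner identifiers and the helper functions are illustrative assumptions, not the network's actual schema.

```python
import xml.etree.ElementTree as ET


def make_object(title, owners, created, content="", children=()):
    """Build one object as an XML element (hypothetical schema)."""
    obj = ET.Element("object")
    ET.SubElement(obj, "title").text = title
    owners_el = ET.SubElement(obj, "owners")
    for owner in owners:
        ET.SubElement(owners_el, "owner").text = owner
    ET.SubElement(obj, "created").text = created
    ET.SubElement(obj, "content").text = content
    # Child objects are nested so the XML mirrors the parent/child tree.
    children_el = ET.SubElement(obj, "children")
    for child in children:
        children_el.append(child)
    return obj


def titles_depth_first(obj, depth=0):
    """Yield (depth, title) pairs, visiting each parent before its children."""
    yield depth, obj.findtext("title")
    for child in obj.find("children"):
        yield from titles_depth_first(child, depth + 1)


# Hypothetical owners; the title and date reuse the paper's own example values.
paper = make_object(
    "image recognition software for SpaceCraftX",
    ["engineer_a"],
    "02/12/10",
    content="scraped text would go here",
)
root = make_object(
    "SpaceCraftX", ["engineer_a", "engineer_b"], "02/12/10", children=[paper]
)
xml_text = ET.tostring(root, encoding="unicode")
```

Walking the tree with `titles_depth_first(root)` gives the same parent-before-child order that an expandable tree display would need.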
The SpaceCraftX object represented in the semantic network would consist of information pertaining to spacecraft design schematics localized to specific regions of the system. Such objects would include the research and development papers relevant to the design of subcomponents of the craft. The sharing of information, the collaboration of perspectives and the understanding of engineering roles are enabled by the XML Tree Structure graphical user interface (Figure 3). A specific example of an XML file displayed on the interface can be seen in Figure 2, which shows a layered XML structure for “NASA Missions”.

Figure 2: Layered XML structure for “Mars Craters Exposed Ice, Water” video

Figure 3: Tree Structure Displayed in GUI.

As soon as one of the NASA engineers, a user of the Semantic Network, objectizes a piece of information by tagging it with the appropriate terms, the object is created in the network. An example of core terms associated with a record might consist of the following metadata: ‘image recognition software for SpaceCraftX’ (object name), ‘02/12/10’ (date created). Such metadata, linked with the content (as well as program-generated metadata, like a user ID), would represent the object in the network. The creation of an object allows other users of the network (users tapped into SpaceCraftX’s network) to retrieve the material and, in some cases, add information relevant to the object. The Semantic Network is especially useful in a multiuser environment, addressing the need for all users to have sufficient access to controls over (or ownership of) objects, even within a collaborative scenario.

The Scraping Tool

The Scraping Tool allows users to create new objects in the network. It functions primarily as an event-driven component of the Semantic Network application. The events defined in our current implementation of the tool include Data Submission and Data Highlighting.
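The event-driven design just described can be sketched roughly as follows. This is an illustration under stated assumptions rather than the tool's actual code: the class and method names are hypothetical, and the highlighted text is passed in as a plain string instead of being read from a clipboard.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SemanticObject:
    """Core attributes of an object (field names are assumptions)."""
    title: str
    owners: List[str]
    created: str
    content: str


class ScrapingTool:
    """Dispatches 'highlight' and 'submit' events to registered handlers."""

    def __init__(self):
        self._handlers: Dict[str, List[Callable]] = {"highlight": [], "submit": []}
        self._selection = ""
        self.network: List[SemanticObject] = []  # stand-in for the real network
        self.on("highlight", self._store_selection)
        self.on("submit", self._create_object)

    def on(self, event, handler):
        """Register a handler for an event name."""
        self._handlers[event].append(handler)

    def fire(self, event, **payload):
        """Call every handler registered for the event."""
        for handler in self._handlers[event]:
            handler(**payload)

    def _store_selection(self, text):
        # Data Highlighting: remember what the user selected.
        self._selection = text.strip()

    def _create_object(self, title, user_id, created):
        # Data Submission: create an object only if something was highlighted.
        if not self._selection:
            return
        self.network.append(
            SemanticObject(title, [user_id], created, self._selection)
        )
        self._selection = ""


tool = ScrapingTool()
tool.fire("highlight", text="  Mars craters expose ice  ")
tool.fire(
    "submit",
    title="image recognition software for SpaceCraftX",
    user_id="user-001",  # hypothetical program-generated user ID
    created="02/12/10",
)
```

Keeping the selection and the submission as separate events means a second retrieval path, such as a parser, could feed the same `highlight` event without changing object creation.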
An example of Data Submission would be the user clicking the ‘submit’ button in the Scraping Tool to confirm the start of object creation. An example of Data Highlighting is the retrieval and parsing of the information selected by the user. In our current version, the selected information is primarily retrieved through the Windows clipboard, an interprocess communication (IPC) mechanism. In addition, an HTML parser is in development as an alternative method for the retrieval of content.

Current Objectives

Current efforts focus first on integrating a hashing function into the creation of objects. That is, an object’s path relative to the network should be hashed in order to (1) communicate efficiently with all distributed servers and (2) standardize the saved filename convention to optimize system performance in searching and storage. An example of the proposed hashing of semantic objects is shown in Figure 4.

Figure 4. Proposed object filename convention and possible content items.

Second, we are coding and testing data synchronization requests on the distributed server platform. That is, when a user makes a change to information, the change should be reflected and acknowledged across all servers. This is particularly important, since scalability and system stability are among the main goals of our network. The proposed mechanism has been developed (under the name Active Directory, not to be confused with Microsoft’s) but has yet to be properly configured and tested.

Lastly, current objectives focus on adding more functionality to the Semantic Network’s Scraping Tool, specifically drag-and-drop capabilities for all file types (local files, images, video, etc.). Additionally, the Scraping Tool will be able to parse a highlighted portion of a webpage and store all of its information types (text, images and video) to a user-defined object.
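The path hashing proposed under Current Objectives can be sketched as follows. The convention shown in Figure 4 is not reproduced here; the choice of SHA-1, the path normalization, the `.xml` extension and the modulo-based server selection are all assumptions made for illustration.

```python
import hashlib


def normalize_path(path):
    """Lower-case each path segment and drop empty ones (assumed convention)."""
    parts = [p.strip().lower() for p in path.split("/") if p.strip()]
    return "/".join(parts)


def object_digest(path):
    """Hash an object's path relative to the network (SHA-1 is an assumption)."""
    return hashlib.sha1(normalize_path(path).encode("utf-8")).hexdigest()


def object_filename(path):
    """Fixed-length filename derived from the path hash."""
    return object_digest(path) + ".xml"


def server_for(path, servers):
    """Pick a server deterministically from the hash (assumed placement rule)."""
    index = int(object_digest(path), 16) % len(servers)
    return servers[index]


# Hypothetical server names for illustration only.
servers = ["server-a", "server-b", "server-c"]
name = object_filename("NASA Missions/Unmanned Mission")
home = server_for("NASA Missions/Unmanned Mission", servers)
```

Because every server computes the same digest from the same path, any of them can locate an object's file without consulting a central index, and fixed-length names keep storage and lookup uniform.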