CSCI 8350, Spring 2002, UGA
Bernhard Schueler
A tool to analyse metadata
• Intention
• Implementation
• Demo
• Where are the semantics?
• Unfinished features: synonyms, homonyms, similarity between RDF graphs
• What is unique about RDFBrowser
Bernhard Schueler RDFBrowser 2
Provide a tool to analyze RDF-based metadata.
This includes everything that is or will be built on top of RDF.
This tool should allow for:
• Convenient browsing,
• Comparisons between files, focusing on helping the user find semantic similarities/differences,
• Synonyms, homonyms, graph similarity.
Parsing of RDF files:
ARP (Another RDF Parser) by Jeremy Carroll, HP.
ARP uses the Xerces XML parser.
RDFBrowser uses ARP to extract the triples of the RDF data model.
RDFBrowser uses Amzi! Prolog to store and query the RDF triples.
Amzi! Prolog provides interfaces to Java, C, C++, Delphi and more. It runs under Windows and UNIX.
Prolog can easily be (ab)used as a database. First-argument indexing makes lookups on the first argument of a fact reasonably efficient.
I considered it convenient, especially for advanced inferences such as finding synonyms (a feature that is unfortunately unfinished).
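The effect of first-argument indexing on a triple store can be illustrated with a small sketch. This is not the Amzi! Prolog implementation; it is a Python stand-in that assumes triples are stored as facts whose first argument is the subject. The class name `TripleStore` and the example.org URIs are made up for illustration.

```python
from collections import defaultdict


class TripleStore:
    """Toy triple store that indexes facts by their first argument (the subject)."""

    def __init__(self):
        # subject -> list of (predicate, object) pairs
        self._by_subject = defaultdict(list)

    def add(self, subject, predicate, obj):
        self._by_subject[subject].append((predicate, obj))

    def query(self, subject=None, predicate=None, obj=None):
        """Yield matching triples; None plays the role of an unbound variable."""
        if subject is not None:
            # Bound first argument: a single hash lookup.
            facts = self._by_subject.get(subject, [])
            candidates = [(subject, p, o) for p, o in facts]
        else:
            # Unbound first argument: every stored fact has to be scanned.
            candidates = [
                (s, p, o)
                for s, facts in self._by_subject.items()
                for p, o in facts
            ]
        for s, p, o in candidates:
            if predicate in (None, p) and obj in (None, o):
                yield (s, p, o)


# Hypothetical example data (the subjects are invented URIs).
store = TripleStore()
store.add("http://example.org/doc1", "http://purl.org/dc/elements/1.1/creator", "Alice")
store.add("http://example.org/doc1", "http://purl.org/dc/elements/1.1/title", "Report")
store.add("http://example.org/doc2", "http://purl.org/dc/elements/1.1/creator", "Alice")

by_subject = list(store.query(subject="http://example.org/doc1"))  # indexed lookup
by_object = list(store.query(obj="Alice"))  # full scan
```

Queries with a known subject touch only that subject's facts, while queries by object alone fall back to a scan. This is why the order of arguments matters when triples are stored as Prolog facts.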
The graphical user interface is implemented in Java (JDK 1.3.1), mainly with the “Swing” library.
All parts of the system run at least under Windows and UNIX.
Not in here … but on the screen (hopefully).
RDFBrowser tries to overcome syntactic barriers to help the user retrieve the semantics.
The ability to simultaneously browse files should highlight semantic relationships.
A feature to find synonyms, homonyms, and similar structures in the underlying RDF graph would support semantic analysis of:
• Different descriptions of the same domain,
• Descriptions of different domains.
Subgraph isomorphism is NP-complete; graph isomorphism is in NP, and no polynomial-time algorithm is known for either.
The size of the search space is larger than n!.
Precisely: n(1 + (n-1)(1 + (n-2)(1 + (n-3)( … 2(1 + 1) … )))).
And that’s only exact matches!
9 nodes: more than 362880 (= 9!) possible matchings.
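As a check on the arithmetic, the nested expression can be evaluated from the inside out. The sketch below assumes the innermost term stands for a single remaining node, so that the expression follows the recurrence f(1) = 1, f(k) = k(1 + f(k-1)). Under that reading it equals the number of ordered selections of 1 to n distinct nodes. The function names are invented for illustration.

```python
from math import factorial


def search_space_size(n):
    """Evaluate n(1 + (n-1)(1 + ... 2(1 + 1))) from the inside out."""
    size = 1  # innermost term: one remaining node
    for k in range(2, n + 1):
        size = k * (1 + size)
    return size


def search_space_size_as_sum(n):
    """Same quantity written as a sum: n + n(n-1) + ... + n!."""
    return sum(factorial(n) // factorial(n - k) for k in range(1, n + 1))


n = 9
total = search_space_size(n)
print(total, factorial(n))
```

For n = 9 the expression evaluates to 986409, compared with 9! = 362880.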
In a browser, only queries for subgraphs of a small, user-defined size make sense.
Inexact matches are more likely than an exact match of 42 nodes.
Heuristics for pruning the search tree:
• Same labels of nodes and edges (URIs),
• Accuracy (percentage of unmatched nodes),
• Consider only reachable nodes.
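The three heuristics can be combined in a small backtracking matcher. The following is a sketch under stated assumptions, not RDFBrowser's Prolog code: query terms starting with `?` stand for unlabeled nodes, labeled nodes and edges match only identical labels, candidates are taken from triples adjacent to an already matched node, and `max_unmatched` counts skipped query triples as a simple stand-in for the accuracy measure. All names and the `ex:` data are hypothetical.

```python
from collections import defaultdict


def is_var(term):
    """Terms starting with '?' stand for unlabeled nodes in the query."""
    return term.startswith("?")


def match(patterns, triples, max_unmatched=0):
    """Return (binding, skipped) pairs mapping query patterns onto data triples."""
    # Adjacency index: node -> triples in which it is subject or object.
    index = defaultdict(set)
    for t in triples:
        index[t[0]].add(t)
        index[t[2]].add(t)
    results = []

    def unify(term, value, binding):
        if is_var(term):
            if term in binding:
                return binding if binding[term] == value else None
            if value in binding.values():  # keep the node mapping one-to-one
                return None
            return {**binding, term: value}
        # Label heuristic: a labeled node matches only the identical label.
        return binding if term == value else None

    def search(remaining, binding, skipped):
        if not remaining:
            results.append((binding, skipped))
            return
        (s, p, o), rest = remaining[0], remaining[1:]
        # Reachability heuristic: if one end of the pattern is already known,
        # only look at triples adjacent to that node.
        anchors = [a for a in (binding.get(s, s), binding.get(o, o)) if not is_var(a)]
        candidates = index.get(anchors[0], set()) if anchors else triples
        for ts, tp, to in candidates:
            if tp != p:  # edge labels must agree
                continue
            b = unify(s, ts, binding)
            if b is not None:
                b = unify(o, to, b)
            if b is not None:
                search(rest, b, skipped)
        # Accuracy heuristic: tolerate a limited number of unmatched patterns.
        if skipped < max_unmatched:
            search(rest, binding, skipped + 1)

    search(list(patterns), {}, 0)
    return results


# Hypothetical data and query.
data = {
    ("ex:a", "ex:knows", "ex:b"),
    ("ex:b", "ex:knows", "ex:c"),
    ("ex:a", "ex:name", "Ann"),
}
query = [("?x", "ex:knows", "?y"), ("?y", "ex:knows", "?z")]

exact = match(query, data)
inexact = match(query, data, max_unmatched=1)
```

Here `exact` contains the single binding that maps the two-edge chain onto `ex:a`, `ex:b`, `ex:c`, while `inexact` additionally contains partial matches with one query triple left out. Raising `max_unmatched` widens the search quickly, which is the trade-off the accuracy threshold is meant to control.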
There are advanced techniques for indexing graphs to speed up the average case, e.g. based on the number of adjacent nodes.
But imagine the tree of your direct ancestors (excluding uncles and aunts). Everyone has 2 parents…
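One way to read the ancestor example: in such a tree almost every node has the same number of neighbours, so an index keyed on the number of adjacent nodes cannot tell them apart, and the number of nodes doubles with every generation. The sketch below builds such a tree and groups its nodes by neighbour count. The node naming scheme and function names are invented for illustration.

```python
from collections import defaultdict


def ancestor_tree(generations):
    """Return (child, parent) edges for a full tree of direct ancestors."""
    edges = []
    level = ["me"]
    for _ in range(generations):
        next_level = []
        for person in level:
            for role in ("mother", "father"):
                parent = f"{person}/{role}"
                edges.append((person, parent))
                next_level.append(parent)
        level = next_level
    return edges


def index_by_adjacent_nodes(edges):
    """Group nodes by their number of adjacent nodes (edge direction ignored)."""
    degree = defaultdict(int)
    for child, parent in edges:
        degree[child] += 1
        degree[parent] += 1
    buckets = defaultdict(list)
    for node, count in degree.items():
        buckets[count].append(node)
    return buckets


edges = ancestor_tree(10)
buckets = index_by_adjacent_nodes(edges)
for count in sorted(buckets):
    print(count, "adjacent nodes:", len(buckets[count]), "nodes")
```

With 10 generations the index has only three buckets: the 1024 oldest ancestors with one neighbour, the starting person with two, and the remaining 1022 nodes with three. A query node with three neighbours therefore still has over a thousand candidates.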
Regarding the semantic information contained in RDF files, this browser’s weakness is its strength:
It does not consider anything built on top of RDF, e.g. RDFS, DAML, OIL.
Thus, it can work with any of them.
Dennis Shasha, Jason T. L. Wang, Rosalba Giugno. Algorithmics and Applications of Tree and Graph Searching.
Jeremy Carroll. Matching RDF Graphs. Draft, July 2001.
Koebler, Schoening, Toran. The Graph Isomorphism Problem: Its Structural Complexity. Birkhaeuser, Boston, 1993.