RDF: Building Block for the Semantic Web
Jim Ellenberger
UCCS CS5260 Spring 2011
RDF - Jim Ellenberger - May, 2011

Semantic web
• What is it?
• A phrase coined by Tim Berners-Lee, inventor of the WWW, in a 2001 Scientific American article
• Berners-Lee and others have described it as a major component of “Web 3.0”
• Wikipedia defines it well:
  • A “web of data” that enables machines to understand the semantics, or meaning, of information on the WWW
  • Extends the network of hyperlinked human-readable web pages by inserting machine-readable metadata
  • Enables automated agents to access the Web more intelligently and perform tasks on behalf of users

Why do we need it?
• Traditional web technologies like HTML are focused on organizing, presenting and linking documents
• With them alone, we can’t directly access the meaning of information on the Web
• Nor can we aggregate and query information on the Web in a consistent way
• Semantic web technologies provide these missing components
  • Information can be stored, aggregated and queried based on its meaning
  • All of this can be automated, because the information is available in machine-readable formats

How is the semantic web implemented?
• Resource Description Framework (RDF)
• Data interchange formats (RDF/XML, N3, Turtle, N-Triples)
• Schema and ontology languages (RDFS, OWL)
• Query languages (SPARQL)
• My focus: RDF
  • Essentially, the building block for all semantic web technologies
  • Originally specified by the W3C as a metadata language; later extended to accommodate semantic web concepts
  • See http://www.w3.org/RDF
• There is a need to encode and manipulate knowledge on the web, but how can it be done?
• Technologies that describe and manipulate information based on meanings and relationships

RDF: general structure
• RDF is graph-based
• Not hierarchical like XML and other data description formats
• Single pieces of information are graph nodes, and the relationships between them are graph edges
• Advantages of a graph-based model
  • Virtually any kind and number of relationships can be represented; there is no need to adhere to a hierarchy
  • Diverse graphs can be combined as simply as defining a relationship between two nodes; there is no need for the graphs to have compatible hierarchies

RDF statements
• The basic unit of information in RDF is a statement, or triple, with three components:
  • Subject – the thing the statement is about
  • Predicate (or property) – a property or characteristic of the subject
  • Object – the value of the property or characteristic
• Example, a statement about a camera:
  • The D300 – subject of the statement
  • is manufactured by – predicate
  • Nikon – object of the predicate
• This triple encodes a single piece of information: The D300 is manufactured by Nikon

RDF URIs
• Subjects and objects that make up RDF statements are called resources (an object may also be a literal value, such as a text string)
• In order to be useful web wide, resources and the predicates that link them need identifiers that are:
  • Unique – to avoid confusion
  • Universally accessible – to make them usable web wide
• These identifiers are called URIs (Uniform Resource Identifiers)
• The camera example in URIs:
  • http://dbpedia.org/page/Nikon_D300 – subject
  • http://mywebpage.org/camera#manufactured_by – predicate
  • http://www.dbpedia.org/resource/Nikon – object

More about URIs
• URIs are not URLs (but URLs are URIs)
• Where do URIs come from?
  • Use an existing URI if an appropriate one exists: http://dbpedia.org/page/Nikon_D300
  • If one doesn’t exist, make your own: http://mywebpage.org/camera#manufactured_by
  • If you create your own, it should be universally accessible and return data to RDF clients (a Linked Data convention; RDF itself does not enforce it)
• URLs represent things retrievable from the web
• URIs represent things identified on the web, which may or may not be retrievable

Camera example in graph form
Figure: node http://dbpedia.org/page/Nikon_D300, linked by the edge http://mywebpage.org/camera#manufactured_by to node http://www.dbpedia.org/resource/Nikon

Camera example linked to other graphs
Figure: the camera triple, with further nodes labeled “[URL of Stock Price]” and “[URL: Review]” attached through edges labeled “[URL: stock_price_of]” and “[URL: review_of]”

What Does RDF Look Like in the Wild?
• RDF statements need to be serialized to be used on the WWW and processed by machines
• There are many formats used for this:
  • RDF/XML
  • Turtle
  • N3
  • RDFa
• RDF/XML is probably the most common

RDF/XML Example
• RDF is not XML, but it can be encoded in XML
• The camera example, in RDF/XML:

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:mypage="http://mywebpage.org/camera#">
  <rdf:Description rdf:about="http://dbpedia.org/page/Nikon_D300">
    <mypage:manufactured_by rdf:resource="http://www.dbpedia.org/resource/Nikon"/>
  </rdf:Description>
</rdf:RDF>

• XML tags and attributes
  • rdf:RDF – begins the RDF document
  • rdf:Description – begins the description of a subject
  • rdf:about – URI for the subject
  • mypage:manufactured_by – the predicate (the mypage prefix expands to http://mywebpage.org/camera#)
  • rdf:resource – URI for the object

A real-world example: OpenCalais

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:c="http://s.opencalais.com/1/pred/">
  <rdf:Description
      rdf:about="http://d.opencalais.com/er/product/electronics...">
    ...
    <c:name>Nikon D300 Digital Camera</c:name>
  </rdf:Description>
</rdf:RDF>

• OpenCalais is a web service that automatically generates semantic metadata in RDF/XML from text submitted to it
• The listing above is a portion of OpenCalais’ output when “D300” is submitted
• Essentially, this excerpt contains the triple:
  • electronics product (subject)
  • name (predicate)
  • Nikon D300 Digital Camera (object, a literal string rather than a URI)

What else is happening?
• DBpedia project
  • Publishes Wikipedia information in semantic web formats
  • http://dbpedia.org
• FOAF (Friend of a Friend) project
  • Uses RDF to describe relationships among people
  • http://www.foaf-project.org/
• OpenPSI project
  • Publishes UK government data in semantic web formats
  • http://www.openpsi.org/
• GoodRelations vocabulary
  • A means to publish product information in semantic web formats
  • http://www.heppnetz.de/projects/goodrelations/

Important Issues
• The amount of information that could be encoded is staggering
• Encoding meaning isn’t always straightforward: e.g., what does “young” mean?
• Not everyone wants their information freely available
  • Information can be a commodity
  • Information can be a trade secret
• Accuracy: how do we deal with information that is inaccurate or deceptive?
• Performance: how will semantic web data stores perform compared to more traditional data stores?

Conclusion
• There is quite a bit more to RDF
  • RDF has more capabilities than described here
  • RDF has been expanded with other technologies to create still more capabilities
• There are also many related areas to explore
  • How can RDF data be created?
  • How can it be stored?
  • How can it be served and retrieved?
  • Once we retrieve RDF data, what should we do with it?
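As a small illustration of the triple model and the RDF/XML serialization, the Python sketch below rebuilds the camera statement as a (subject, predicate, object) tuple of URIs, recovers it from the RDF/XML camera example, and writes it out as an N-Triples line. It uses only the standard library and assumes a deliberately restricted subset of RDF/XML: rdf:Description elements with rdf:about, whose namespaced property elements carry either rdf:resource (a URI object) or text (a literal object). The helper names parse_rdf_xml, to_ntriples and Literal are hypothetical, chosen for this example; they are not part of any RDF library, and a real application would use a full RDF toolkit such as rdflib. The URIs are the slides’ own, with the mypage prefix bound (an assumption) to http://mywebpage.org/camera# so that mypage:manufactured_by expands to the predicate URI listed on the RDF URIs slide.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The camera statement from the slides as a (subject, predicate, object) triple.
CAMERA_TRIPLE = (
    "http://dbpedia.org/page/Nikon_D300",           # subject
    "http://mywebpage.org/camera#manufactured_by",  # predicate
    "http://www.dbpedia.org/resource/Nikon",        # object
)

# The RDF/XML camera example; the mypage prefix is bound (an assumption)
# so that mypage:manufactured_by expands to the predicate URI above.
CAMERA_RDF_XML = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:mypage="http://mywebpage.org/camera#">
  <rdf:Description rdf:about="http://dbpedia.org/page/Nikon_D300">
    <mypage:manufactured_by rdf:resource="http://www.dbpedia.org/resource/Nikon"/>
  </rdf:Description>
</rdf:RDF>"""


class Literal(str):
    """Hypothetical marker type: an object value that is a literal, not a URI."""


def parse_rdf_xml(text):
    """Extract triples from a restricted subset of RDF/XML (hypothetical helper).

    Only rdf:Description elements with rdf:about are handled; each child
    element is a predicate whose object is its rdf:resource URI or its text.
    """
    root = ET.fromstring(text)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree expands "prefix:name" to "{namespace}name";
            # joining namespace and local name yields the predicate URI.
            namespace, _, local_name = prop.tag[1:].partition("}")
            predicate = namespace + local_name
            obj = prop.get(f"{{{RDF_NS}}}resource")
            if obj is None:
                # No rdf:resource attribute: treat the element text as a literal.
                obj = Literal((prop.text or "").strip())
            triples.append((subject, predicate, obj))
    return triples


def to_ntriples(triple):
    """Serialize one triple as an N-Triples line (hypothetical helper)."""
    subject, predicate, obj = triple
    if isinstance(obj, Literal):
        # Escape the characters N-Triples does not allow raw inside a literal.
        escaped = (obj.replace("\\", "\\\\").replace('"', '\\"')
                      .replace("\n", "\\n").replace("\r", "\\r"))
        obj_term = f'"{escaped}"'
    else:
        obj_term = f"<{obj}>"
    return f"<{subject}> <{predicate}> {obj_term} ."


if __name__ == "__main__":
    triples = parse_rdf_xml(CAMERA_RDF_XML)
    assert triples == [CAMERA_TRIPLE]
    for triple in triples:
        print(to_ntriples(triple))
```

Running the script prints the camera statement as a single N-Triples line: `<http://dbpedia.org/page/Nikon_D300> <http://mywebpage.org/camera#manufactured_by> <http://www.dbpedia.org/resource/Nikon> .` N-Triples is used for output here because it writes exactly one triple per line, which keeps the subject, predicate and object structure visible; a literal object, such as the name in the OpenCalais excerpt, would be written in double quotes instead of angle brackets.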