Linked Data
Scott E. Barasch
barasch1 [at] umbc [dot] edu
scottbarasch.com

Linked Data exemplifies the original vision of the Semantic Web: a web of interconnected links between pieces of information, such as those stored in FOAF, RDF, OWL or other files.
All data must be named with a URI.
This URI must be a valid URL.
There must be a page at this URL which contains the data that is represented by the URI name.
This URI / URL should NEVER change.
Data should be interlinked between documents / files on the web.

In the past, Semantic Web data was not published to the web.
It was stored in a zip file, and often kept on an external disk or tape media.
An example of this is an ontology which contains data about all of the Semantic Web researchers.

Recently this has changed, as the need for an interwoven mesh of linked data has become apparent.
Many different ontologies contain similar information for various data members, e.g. Name, SSN, Birthday, Zip Code, Telephone Number.
These data members can be connected to join the data from multiple ontologies into a giant collection of data, which can be queried in a common way.
The ultimate result would be an entire mesh web of all the ontologies in the world, where each ontology would be a node in a giant graph. That graph would be the Semantic Web.

DBpedia - a dataset containing extracted data from Wikipedia; it contains about 2.18 million concepts described by 218 million triples, including abstracts in 11 different languages.
DBLP Bibliography - provides bibliographic information about scientific papers; it contains about 800,000 articles, 400,000 authors, and approx. 15 million triples.
GeoNames - provides RDF descriptions of more than 6,500,000 geographical features worldwide.
Revyu - a review service that consumes and publishes Linked Data, primarily from DBpedia.
riese - serves statistical data about 500 million Europeans (the first linked dataset deployed with XHTML+RDFa).
UMBEL - a lightweight reference structure of 20,000 subject concept classes and their relationships derived from OpenCyc, which can act as binding classes to external data; also has links to 1.5 million named entities from DBpedia and YAGO.
Sensorpedia - a scientific initiative at Oak Ridge National Laboratory using a RESTful web architecture to link to sensor data and related sensing systems.

Creating a single ontology out of all of the linked ontologies in the world would be a nightmare.
Data access and reasoning time would be astronomical.
The sheer load of a single user could possibly cripple the network.
No computer on earth could realistically process and compute such a large amount of data.

Imagine a just-in-time access model for this single ontology:
The multiple ontologies would be "linked" by common data members (Name, Address, Zip Code, etc.).
Users or agents know ahead of time which ontologies they wish to query.
These queries go only to the individual ontologies.
The data is returned to the user agent, which then parses the data and connects the similar data members.
These data members are "linked", and a local subset of the global single ontology is created for the extracted data.

Linking data by itself is not enough.
We need to be able to follow those links and combine ontologies, so that the information stored in one ontology can be combined with the data stored in many other ontologies.
This merging of data gives us richer information, and sometimes provides new information that is greater than the sum of the information in all of the ontologies we are querying.

The concept of a data Mashup is how this is accomplished today.
A Mashup engine is the client-side user agent.
Web Services query Semantic Web data repositories and retrieve the requested data.
The data is connected, and a greater meaning is discovered from small sets of previously disjoint data which are now connected.

A Mashup is a way of combining related data into a pictorial form, using Socially Rich computing technology to make the data easy to read and understand:
Charts, Graphs, Websites, Maps, Tables, Movies, AJAX Rich Applications

Web 2.0 is known as the Social / Collaborative Web.
Web 3.0 is another term used for the Semantic Web.
The linked data itself is considered Web 3.0.
The practice of pulling the data into the Mashup engine is a mix between Web 3.0 and Web 2.0.
The practice of displaying the data in a Mashup is referred to as Web 2.0.

http://www.jackbe.com/enterprisemashup/
http://www-01.ibm.com/software/lotus/products/mashups/

Data can be pulled from existing Enterprise Datacenter Services, and also from feeds on the internet or the Semantic Web.
Example input data can include: XML, RDF, LDAP, SQL, CSV, Office Documents, RSS Feeds, Directory Servers, among others.
Data mapping patterns, merging, looping, and logical operations are all supported.
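
The just-in-time access model and the linking of common data members described earlier can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the method of any particular mashup product: the two in-memory triple lists stand in for remote ontologies, and every URI, name, value, and helper function below is hypothetical. Records are treated as "linked" when they agree on all of the chosen common data members (here Name and Zip Code).

```python
from collections import defaultdict

# Hypothetical sample data: two tiny "ontologies" stored as
# (subject URI, data member, value) triples. In practice these would
# be fetched from the individual ontologies the user chose to query.
RESEARCHERS = [
    ("http://example.org/researchers/alice", "name", "Alice Example"),
    ("http://example.org/researchers/alice", "zip_code", "21250"),
    ("http://example.org/researchers/bob", "name", "Bob Sample"),
    ("http://example.org/researchers/bob", "zip_code", "10001"),
]
PUBLICATIONS = [
    ("http://example.org/authors/a1", "name", "Alice Example"),
    ("http://example.org/authors/a1", "zip_code", "21250"),
    ("http://example.org/authors/a1", "wrote", "Linked Data Notes"),
    ("http://example.org/authors/c3", "name", "Carol Other"),
    ("http://example.org/authors/c3", "wrote", "Unrelated Paper"),
]

# Assumption: these common data members are enough to identify a match.
LINK_KEYS = ("name", "zip_code")


def describe(triples):
    """Group triples by subject: {subject: {data member: set of values}}."""
    grouped = defaultdict(lambda: defaultdict(set))
    for subject, member, value in triples:
        grouped[subject][member].add(value)
    return grouped


def link_key(description):
    """Return the values of all LINK_KEYS, or None if any is missing."""
    values = []
    for key in LINK_KEYS:
        found = description.get(key)
        if not found:
            return None
        values.append(tuple(sorted(found)))
    return tuple(values)


def link_and_merge(*sources):
    """Build a local merged subset from the queried sources."""
    merged = defaultdict(lambda: {"uris": set(), "facts": defaultdict(set)})
    for triples in sources:
        for subject, description in describe(triples).items():
            key = link_key(description)
            if key is None:
                continue  # no common data members, so it cannot be linked
            entry = merged[key]
            entry["uris"].add(subject)
            for member, values in description.items():
                entry["facts"][member] |= values
    # Keep only entries that are described under more than one URI.
    return {k: v for k, v in merged.items() if len(v["uris"]) > 1}


if __name__ == "__main__":
    for entry in link_and_merge(RESEARCHERS, PUBLICATIONS).values():
        print(sorted(entry["uris"]))
        for member, values in sorted(entry["facts"].items()):
            print("  ", member, sorted(values))
```

Matching on exact values of a few data members is the simplest possible linking rule and will produce false matches on real data (two people can share a name and zip code); it is used here only to make the "connect similar data members, then merge" step concrete.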