>> Evelyne Viegas: Hello, everyone, and welcome to the session. It's a real honor to have Professor Jim Hendler here with us today. I'm going to read his bio; I want to take the time. He's the Tetherless World Chair of Computer and Cognitive Science at Rensselaer Polytechnic Institute. He is also the associate director of the Web Science Research Initiative, which is headquartered at MIT. He was the recipient of a 1995 Fulbright Foundation Fellowship, a former member of the US Air Force Science Advisory Board, and a fellow of the very prestigious American Association for Artificial Intelligence and also of the British Computer Society. He was also formerly Chief Scientist at DARPA, the US Defense Advanced Research Projects Agency. Today he is also the Editor in Chief of IEEE Intelligent Systems and the first computer scientist to serve on the Board of Reviewing Editors for Science. Last, but not least, he's also one of the inventors of the Semantic Web. And today he's going to talk about the Semantic Web, Web 3.0, and Linked Data. Please join me in welcoming Professor Jim Hendler. Thank you. (Applause).

>> Jim Hendler: Thanks a lot. It's a pleasure to be here. It's taken a number of years to be invited to give this talk in this forum, so it's nice to see that some of this stuff is starting to really hit, and that's a little bit of what my talk is about. I was actually going to subtitle this talk "the buzzwords that rule my life," because I really spend a lot of my time trying to explain the relations between these different things: what's new, what's different, is this just the old AI stuff? This talk has been evolving for a while, from a talk about specific technologies, particularly ontologies for the Web, to more generally what's going on in this space: to help people plot what they're doing, what other people are doing, what some of the excitement is about, and to help you understand, when you read a newspaper story saying some semantic search company was bought by some very large software company for 100 million dollars, what that implies, that kind of thing. So I'm going to start from ontologies, for various reasons. This is a term that's been used a lot, and it's probably the most controversial or confusing part of this whole Semantic Web story. For one thing, the term has a long history; the term ontology actually comes from Greek philosophy and has been around for a couple of thousand years, and the original definition, as brought over to computers, is to provide a definitive and exhaustive classification of entities in all spheres of being. That's a Barry Smith quote; Barry is at one of the New York universities -- Buffalo. It's a particular view of ontology that's kind of led to this model of very formal things, very expressive. And again, if you want to do that, there are a couple of interesting things there. One is this notion of getting it right, that there's some kind of ground truth. So if we say there are physical objects and non-physical objects, that seems to be true of the outside world. But, you know, those things can get very contentious very quickly; it's very hard to get content agreement. Sometimes the lower-level terms are harder to get agreement on than the high-level ones, sometimes it's the other way around.
A different view, usually attributed to Gruber, is that an ontology is an abstract, simplified view of the world that we wish to represent for some purpose. What's interesting is that this is not his most used quote; there's another definition that has more buzzwords and sounds more mystical. And what people have to a large degree forgotten is that part at the end, "for some purpose," and that's really, in a certain sense, what this talk is about today. I want to look at some emerging models and ongoing work, and some of the challenges that bringing this kind of technology to the Web creates, but I want to come at it from a very pragmatic viewpoint. I have a wonderful quote I found in a Graeme Hirst article, where Graeme said the solution to any AI problem may be found in the writings of Wittgenstein, though the details of the implementation are sometimes rather sketchy. And of course, for those of us in computer science and related fields, the thing we care about is making something work. So this talk is really an implementation-details talk, although I'm not going to drive down into the details of the languages, the details of the models, things like that; I want to stay at the 50,000-foot level, as they say in the DOD, the level of abstraction where you can see what's going on that's new and different without really getting down into the weeds. But I'm happy to be interrupted with questions, and I'll leave plenty of time afterward, so if you have those kinds of questions I'm happy to take them. I normally teach this stuff in a term-long course and I've shrunk it into an hour talk, so a lot of stuff is on the cutting room floor. To put it into historical context, a lot of the current talk about the Semantic Web dates from an article in Scientific American, March 2001, called "The Semantic Web," which was written by Tim Berners-Lee, myself and Ora Lassila, and has now become one of the standard citations in the field. And this is the actual table of contents we sent to Scientific American when they asked us what the article would be about. The letters indicate who was going to write what. We were standing in front of a whiteboard in Tim's office, and we sort of knew what we were building now and what worked, and we kind of knew what we thought would be really cool if this stuff was really out there. And we had this blank in the middle, and we were talking about all sorts of transition models, and Tim just walks up to the board and writes "then a miracle occurs," which is a reference to that famous cartoon where you see the two scientists and one is saying to the other, "I think you should be more explicit here in step two." But what Tim said is, you know, if we get the design right -- if we pay attention to the Web stuff -- this thing will scale, this thing will grow; the important thing is to build on the model that worked for the Web, not on the models that didn't work for many, many parts of AI in the past. And so that's again a tension I want to visit in this talk. The other thing I want to show is that, in a certain sense, the miracle is happening. The last time I gave this talk I spent so long on this slide that I didn't have time for much else, so I'm not going to talk to it very much right now except to mention a couple of really quick things.
So one is that a lot of companies are growing in this space, including companies now getting second-round financing. You've had a couple of big acquisitions, like the Powerset acquisition in the semantic search space, and that's an area where dozens and dozens of companies are playing -- a lot of startups, some in stealth. And then this thing called Web 3.0 is a growing new buzzword, which I'll visit again, with companies like Garlik in the U.K., and Metaweb and Radar Networks in the U.S. Anyone who has a Twine account knows about Radar Networks. Joost, Talis: Joost is an online television company; Talis is a company that's been doing database support for many years and is now doing RDF data hosting for some of these other companies and for certain other large companies. The other thing that's happened over the past couple of years is you've had almost an accelerating avalanche of larger players getting involved. So in about 2004, 2005, we could point to Adobe putting this stuff in PDF documents and Oracle starting to think about it. Now it's supported in the Oracle suite. IBM has both open-source tools and has announced that they will be announcing support in DB2, although they haven't said what it is yet. Drug companies like Pfizer and Eli Lilly have given large presentations at Bio-IT. So a lot has been going on. Your own company actually has some stuff happening in this space; I'll mention some of that later. And so tools are starting to appear. Several verticals have really driven this, but it's getting beyond those verticals now, which is part of what's exciting. And there's a lot in open source, so one of the things that's nice now is that if you want to play with this stuff, it's not "take a lot of courses and build a lot of stuff from scratch"; it's download some stuff, fool around with it, hack up something and start playing. And that's really been an exciting difference. Nowadays when I teach an undergraduate course on this stuff, the three-week projects students do at the end of the term far exceed anything that was commercially available in '03 or '04. One of the things that's been happening in this space, and it's not the one I'm going to talk the most about, is the whole notion of lexical semantics getting exercised. So one of the things that was going to power the Web 2.0 revolution was tagging, tag spaces -- which is still happening, and which is very important -- but there was this claim that somehow that was going to put the vocabulary development and ontology people out of business, because communities would just generate their own tags and their own hierarchies, and the answer is that really turned out not to be the case. There's not a scalable technology there from the tags alone. There's still a lot you can get out of them, but you need social context also, which I'm going to talk about later. But what started to happen is things like WordNet, things that have been learned, ontologies, a lot of things like the stuff that's in Metaweb. So Powerset, for example, takes advantage of everything they can get their hands on, both natural language technology and semantic categorizations, and does things with it. And some people have matchmaking engines doing the same kind of thing.
So some companies that are trying to figure out whether this profile matches that profile, using categorization hierarchies and things like that, have realized that it's now easy to find online, readable, large-scale versions of those instead of building your own or learning them from (inaudible). So some of that's happening. And of course that works very well for certain things and not so well for others, so there's a whole litany of traditional natural language problems that we fight all the time when we do these lexical approaches. But the other thing that became evident around the year 2000, and has really been taking off since then, is that non-linguistic resources are a growing percentage of the Web. And here I simply mean things where either there aren't words or the words aren't the primary thing. So YouTube videos have some words attached to them, but by and large you can't do a very good search on YouTube, even if you know what you're looking for. If you're looking for a particular performer's version of a particular song at a particular concert, you'll still get eight pages of hits on YouTube, a lot of which are some kid with a guitar imitating that guy, or someone who says "I was at the concert and this thing happened over here." So again, a lot of the imprecision is partly because there are a lot fewer words. With Flickr tagging, the idea was that you would be able to do clustering of the Flickr tags, and that would give you nice hierarchical structures. It turned out Flickr put that up and then took it off a few months later because it was doing so poorly. So for a short time you could type Turkey and you would get birds and you would get the country and you would get a couple of other things, and then they started getting confusion and more confusion. Well, you know why: because they had a million documents, each with four words on it. Ask an IR person how easy it is to get good precision on that. So more stuff was needed. And then the other big thing, of course, is any type of structured or semi-structured data that's not primarily linguistic. You know, the number 14 in a database can be an age, or it can be a lot of other things. So you can't just say, do me a search in the database for the number 14 as a meaningful thing; you have to know what the columns, rows, meanings, et cetera are. And that matters even more if you're trying to do that across databases, if you're trying to get data interoperability. So at Web scale, XML for data transfer has been very successful; XML for data integration, search, et cetera, has been much less successful on its own. So again, a lot of this stuff has been going on. The traditional AI approach to this was: well, if you can't do it from the words, you need knowledge about the stuff. So this is the field of knowledge representation, which has been a mainstay of AI since very early on. You can state formal things like "a student is a person." In this view, the ontology is just this set of rules and all of the entailments from them: if somebody says X is a student, and I say a student is a person and a person is a mammal, then we're entitled to conclude that X is a mammal. Okay. It's still a mainstay of AI -- there's still a big KR conference and so on -- but it was rightly criticized for many years: complexity and decidability issues, definitional adequacy, knowledge engineering, grounding. There's a whole literature.
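To make that entailment idea concrete, here is a minimal sketch in Python using the rdflib library. The namespace and the names in it are invented purely for illustration, and instead of handing the graph to a full reasoner, the subclass chain is simply walked by hand to show where the "X is a mammal" conclusion comes from.

```python
from rdflib import Graph, Namespace, RDF, RDFS

EX = Namespace("http://example.org/school#")   # hypothetical namespace for the example

g = Graph()
g.parse(data="""
    @prefix ex:   <http://example.org/school#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

    ex:Student rdfs:subClassOf ex:Person .
    ex:Person  rdfs:subClassOf ex:Mammal .
    ex:x       a               ex:Student .
""", format="turtle")

# A real RDFS/OWL reasoner would materialize these entailments for us;
# here we just follow rdfs:subClassOf transitively from the asserted type.
asserted_type = g.value(EX.x, RDF.type)
for cls in g.transitive_objects(asserted_type, RDFS.subClassOf):
    print("ex:x is entailed to be an instance of", cls)
# walks through the Student, Person and Mammal classes in turn
```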
So, just like some of these other things, there's still a lot of research in that space alone. But what's been happening is that that stuff has been finding a new life on the Web. Some of this was motivated by the Web Ontology Language, OWL, but what I want to show you is that there's a different way it's been gaining life, and that there are really these different things happening which, even though they use the same substrate, use it in very different ways -- and some of it is very different from what we've seen in the past, because of the Web's involvement. So I think the main thing I want to point out is that there was this language that can be viewed in a couple of different ways. When it was created, there was a tension between the people from the AI community, who wanted a knowledge representation language, and the people from the Web community, who wanted a language that would be useful on the Web for data integration and other such things. So it was created by a committee, and like most things created by a committee, it has good and bad features, and depending on who you are, you have different opinions about which are which. But what's interesting is that, by many orders of magnitude, it's the most used knowledge representation language in history. Depending on how you ask, Google can find you thousands to tens of thousands of documents that are just definitions of domains in OWL, and there are many more of them behind firewalls and such. And this is just an example: if you Google for "student," looking only in files that end with .owl, you get about 400 or 500 of them. The ranking doesn't work very well, because of course these are just being ranked as if they're text documents, and they don't really have that same kind of space, things like that, so there are a lot of interesting issues. But what you can see is that there's a lot of stuff out there; even down to the 20th or 30th page, we're still seeing documents where people are defining something called a student. Okay. So this stuff has really started to be out there, but the question is how it's used. And what we've seen is very different kinds of use cases emerging. One is what I would call much more like the traditional AI use case. The US National Cancer Institute has what they call the NCI Metathesaurus, or what everyone around the world calls the cancer ontology. It is maintained full time, has 50,000-plus classes, and is updated monthly. If you get funding from the NIH to do cancer research, basically you have to use this as your keyword taxonomy, but it's actually more complex than a keyword taxonomy and has many subsections, so if you're doing a clinical trial on the use of some chemical to prevent a certain kind of cancer in a certain kind of organism, you have to categorize it against those things. It rigorously follows what's called the OWL DL model, which is the decidable subset of OWL, and it's provably consistent, which is one of the reasons it takes so many people: when one person makes a change, everyone else has to make sure that change is done consistently. And I'm going to contrast that with another ontology called Friend of a Friend. Friend of a Friend is sort of a social, open networking thing. It was built by a couple of people in England, Dan Brickley and Libby Miller. With the first one you can't even really talk about who built it; here we can actually say Dan and Libby did it. It's maintained by consensus in a small community of developers.
It violates all the rules of that decidable stuff. It's used very inconsistently: different organizations describing their people have extended it in different ways; some of them store email addresses, some of them store encodings of email addresses, things like that. The interesting thing is that the NCI ontology has high use in a very specialized community, and it has a very high cost for what it does; there's very little data on the Web that's indexed against it, though that's growing. FOAF has about 60 million entities that are known to it, so there are these files out there that say, this is a FOAF thing and it has this name and this property and so on. Not necessarily distinct individuals -- in fact, necessarily not distinct individuals. A lot of these are created by different organizations, and what they're publishing the FOAF for is to allow people to link things across them. It's used by a large number of providers. Sometimes I ask people whether they have a FOAF file, and very few people in the audience raise their hands, and then I say: do you use LiveJournal, have you ever tried anything on Flickr, tried Joost? A lot of these things actually create some kind of FOAF file for you. And many of the social networking sites, because this is catching on, will now make it exportable, so if you have a Facebook profile and want to play in the FOAF world, there's a little program you can run and it will export you and your friends and all that stuff in a FOAF-compatible way. So it's certainly becoming a de facto standard for open social networking. So the interesting thing is that we have the first ontology, which by every standard of the traditional knowledge representation and knowledge engineering approach is the Cadillac of this stuff -- beautiful, well maintained, heavily cared for -- versus this other thing, which is small, scruffy, and being used more than any other ontology in history, again by orders of magnitude. So it's what's happening there that I want to drill down on a little bit. One of the things that's been happening on the Web is a view of OWL as a formal KR standard. In fact, OWL grew out of a DARPA program called the DARPA Agent Markup Language, joined with a European effort called OIL, which actually has three or four different expansions depending on whose paper you read -- the Ontology Inference Layer, the Ontology Interchange Language, and various other things. DARPA stopped funding it, or pretty much let the original program run out and didn't create any follow-up funding, so almost all the funding moved to Europe. And for the first couple of years of European funding, under Framework Five and Framework Six, a lot of the work was formal work in ontology development and ontologies, so it was really the old AI stuff in a new guise. So a lot of effort went in, and the story that went out was that what OWL is, is a KR language; that's where a lot of the government funding was, and since most of the work going on was within universities, that was where the focus of attention was. And that actually had some neat advantages.
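As a rough illustration of what those FOAF files look like and how a shared mailbox lets different sites' descriptions be linked, here is a small sketch with rdflib. The two profile URIs and the email address are made up, and the "smushing" by mailbox is done by hand; in FOAF proper, foaf:mbox is declared inverse-functional, so an OWL reasoner would draw the same-person conclusion itself.

```python
from collections import defaultdict

from rdflib import Graph
from rdflib.namespace import FOAF

# Two imaginary FOAF fragments, the way two different sites might publish them.
data = """
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .

    <http://site-a.example/people#jh>
        a foaf:Person ;
        foaf:name "Jim Hendler" ;
        foaf:mbox <mailto:hendler@example.org> .

    <http://site-b.example/profile/42>
        a foaf:Person ;
        foaf:mbox <mailto:hendler@example.org> ;
        foaf:knows <http://site-b.example/profile/99> .
"""

g = Graph()
g.parse(data=data, format="turtle")

# Group ("smush") the published resources by mailbox; resources sharing a
# mailbox describe the same person, so their properties can be merged.
by_mbox = defaultdict(set)
for person, mbox in g.subject_objects(FOAF.mbox):
    by_mbox[mbox].add(person)

for mbox, people in by_mbox.items():
    if len(people) > 1:
        print(mbox, "links:", ", ".join(sorted(str(p) for p in people)))
```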
So, for example, in about 2000, when the DAML program was starting, the description logic community claimed to have about the most scalable knowledge representation stuff anyone had done that actually had a formal logic base, et cetera, et cetera. And if you actually went out and found the papers, the best performance result was that 10,000 axioms took about a day to get classified, okay. If you look now, because there's a commercial interest in this stuff, you're seeing something like 50 million axioms with a million facts taking 10 or 15 minutes. So it's a very different kind of performance scale, and not just from computers getting faster; a lot of research effort went into it, again much of it from the European Union. A lot of moving to parallelism as a scaling mechanism is being looked at now too, more Web-server back-end kind of parallelism: I've got a hundred-machine server farm, how can I distribute some of this stuff? It turns out not to distribute very well, so a lot of people are thinking about how you could do that differently. And there are some efforts going on now to look at much larger data sets against these ontologies; IBM has something called SHER, Oracle has something called OWLPrime. So again, this view is OWL as a KR language. I give a talk called "The Two Towers," and this is the tower I use to portray this work. It's extremely powerful stuff. It's sort of like Sauron's tower in Lord of the Rings, where you've got to keep a lot of orcs around the outside, because all it took was one hobbit sneaking in to make the whole thing come down. And of course here this thing has a decidable logic base. So the biggest advantage of this kind of reasoning is that it's sound and complete: you get all and only the correct answers, and if you use this particular set of restrictions it's a decidable problem. It may still take a very long time, but at least provably there is a correctness to this thing, which of course makes researchers very happy. I'm not going to talk much about this except to say that this was a thesis written a couple of years ago by one of my students. Given a large one of these things, there had been systems that could prove it inconsistent but couldn't actually tell you why, so his doctoral thesis was building a reasonably scalable algorithm, for something with a couple of thousand classes, for explaining what the set of inferences was that caused the problem. All right. So that was, again, a thesis-level problem in this stuff. So this is the kind of stuff where inconsistency makes the thing almost unusable. I'm lying in one sense, which is that many of these models have some way of having a modelled inconsistency. So the Cyc language, which is more powerful than this, has a notion they call microtheories, and in Cyc you can know that in the Sherlock Holmes universe, Sherlock Holmes is a real person, and in our universe he's a fictional person, right. But it's still a very controlled model; you can't just throw an arbitrary fact into Cyc, and if it doesn't get into the right place it can cause all sorts of (inaudible).
So if I say Sherlock Holmes did something and forget to say, by the way, I'm talking in the Sherlock Holmes universe, then I have implied that a fictional character can do certain things a fictional character can't do, and I get inconsistency and problems and all that sort of stuff. But there's a fairly compelling argument for these kinds of systems. This is sort of the old AI knowledge-expert kind of argument, on the Web. So here's, you know, looking for proteins using various kinds of image searches and things. Here's using a Semantic Web portal that Fujitsu was playing with, and there's now one of these being supported within several of the large drug companies. So it's pulling together, integrating, a lot of information about things coming from different sources and putting them into kind of a Web framework, so you can see the particular protein, the various different ways of looking at it, sequencing information, things like that. And again, if you're doing professional work with these kinds of things, there's a fairly compelling reason: I want my doctor to know about this kind of information, not that kind of information. Okay? And where the stuff has really seen its return on investment -- what's new from the traditional AI point of view, where traditionally we pretty much looked at these ontologies without paying too much attention to the data -- is that it's now scaled to the point where, within a controlled enterprise framework, where you own the vocabulary and you own the data and you can do the registration between those things carefully and cleanly, and you can clean up the data and so on, you can do some very exciting things. And so drug discovery was one of the early published successes. You build an ontology that is as expressive as possible -- a thing that has this property and this property and this property can be in this class but not in that one -- and then what you can do is take results from something like combinatorial chemistry where, somebody once told me, they can now generate in an hour what it takes a human a single year to process, okay. And they've got the thing running 24/7, right. So each day they're getting 24 years behind, if you only had a single person, and they need machine help with this. And here what they can do is grab a bunch of that stuff and throw it against the model, looking for a particular category or class that's needed for some drug purpose. And in this kind of ROI, the thing you're building is very, very expensive, but bad or missed answers are money down the drain, right? So if you didn't catch some part of your assay that could have been the thing that was the aerobic, anaerobic, hydrophilic something-or-other, you've missed your chance to get the next big drug out the door. On the other hand, if you find too many wrong ones -- lots of false positives -- then each one you have to test can cost a lot of money. So the modelling is very expensive and the return on the investment must be very high. Okay? The problem is, that's exactly why the old expert-system revolution in AI never really came to fruition the way it was expected: there aren't that many problems in the real world where there is that much money and you get that kind of return, right? There were lots of them.
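Here is a toy sketch of that "throw the data against the model" step. The property names and compounds are invented, and a real pipeline would define the class of interest in OWL (as restrictions on solubility, binding and so on) and hand the classification to a DL reasoner; a simple query just shows the shape of the membership test.

```python
from rdflib import Graph

g = Graph()
g.parse(data="""
    @prefix lab: <http://example.org/assay#> .

    lab:compound1 lab:solubility "hydrophilic" ; lab:bindsTarget lab:proteinX .
    lab:compound2 lab:solubility "hydrophobic" ; lab:bindsTarget lab:proteinX .
    lab:compound3 lab:solubility "hydrophilic" .
""", format="turtle")

# Stand-in for classifying assay results against one "interesting" class:
# keep only the compounds whose properties match the class definition.
q = """
    PREFIX lab: <http://example.org/assay#>
    SELECT ?c WHERE {
        ?c lab:solubility "hydrophilic" ;
           lab:bindsTarget lab:proteinX .
    }
"""
for row in g.query(q):
    print("candidate for follow-up screening:", row.c)
# only lab:compound1 satisfies both conditions
```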
But it's not nearly as much as the long tail of problems that generate, you know, a little bit, or that have a very different flavor to them, a very different feel to them. And a lot of data integration in the wild, or large-scale data integration, goes beyond the needs of controlling a single vocabulary. So there's an alternate view that's been growing over the past couple of years, and this is really the one that powers this Web 3.0 idea, which is that OWL was based on RDF, which was the model that brought it to the Semantic Web. And in fact, there's a new standards group building the new version of OWL who are trying desperately to pull it out of RDF and get it into XML; the good news is that too many of the users have said no. It was built with various Web architecture things in mind. But more importantly, it was built with the Web culture in mind, which is very different from the mindset of that other kind of model: this is an open and extensible model. You don't like my ontology? Fine, build your own, or extend it, or change it, or do something. Or find someone else's to use. Right? That wasn't the traditional knowledge engineering approach; that's the traditional software hacking approach, right? Get some stuff, make it work, play with it, share it, publish it. It also has a nice feature from the Web application development point of view, because it scales in a sort of databasey way, which I'll talk about in a minute. So things can link to each other, things can create link spaces, things can be pulled out and stored in various kinds of repositories and so on. So there are some ways of linking to the formal model, but what we're really seeing is heavy use of a very small part of OWL: a lot of data and a little bit of semantics. And OWL has this model theory -- the exact legitimatized inferences -- such that if you want to build a compliant OWL reasoner, it must produce all and only the (inaudible), right? A lot of the guys building these triple stores and inference engines over them say: OWL has this thing called sameAs, I need a thing called sameAs, great, we'll just use sameAs, right? And they couldn't read the model theory if they wanted to. I can't read the model theory, and I've got a Ph.D. in this stuff; I chaired the working group that produced it. So there are a lot of interesting things going on, and there are some really interesting debates in the community about these things. But what's nice is that the Web application community doesn't listen to these debates; they find something useful, they build with it. So what really started to happen was a data Web approach, which said people want to share data on the Web, data shares better in these graph models, and RDF matches these graph models nicely to the Web. And the real motivation that's pushing people now -- you hear people talking about, well, is this going to put Google out of business? That's not really what people are trying to do; what people want to be is the next Google-sized thing, right? For the original Web there were some big winners, right? For Web 2.0 there were some big winners. There's now this other thing happening on the Web, and people want to be the big winner. So if you go to a conference out on the West Coast, down in the San Francisco area, you will see that community starting to get very excited by this stuff.
A couple of the companies I mentioned before, the Web 3.0 companies Radar and Metaweb, have gotten their second-round funding, and of course some excitement was generated by the Powerset buyout, things like that. So there's a lot of excitement in this space right now. And given what the economy looks like right now, for something to be hot in the venture area means something must be going on there. And here's something going on at a more technical level that I'm not going to go into in too much detail. Anyone who knows how Web apps really work knows that you start with a browser or a Web application; it makes a request back to some kind of dynamic content engine; the dynamic content engine works with a database and pulls up something, which is then sent back as HTML, which is then displayed in your browser, or as XML or whatever, okay. So essentially, a lot of the text documents we talk about on the Web aren't really text documents; they're textual representations of something in a database. And when you do mashups, a lot of what you try to do is mine out of that HTML what the data looked like, so you can put it together with some other data. Okay. Well, with this notion of fast, quick mashups, mashup tools, ways to put things together, some people playing with RDF started to realize that the triple store looks just like a database. The SPARQL language lets you do a SQL-like query against that kind of database, and what you get back can be turned both into a traditional presentation and into something that can be stuck into another triple store. Again, you have to go read the specs if you want the details. But that's exciting, because now this triple store can have its own Web app. So my Web app can simultaneously be the dynamic content engine for my application and a service provider for another application, and you kind of get this fractal effect. So people started to recognize that, and it's now been used on a lot of different sites. This is actually -- as I was finishing this talk today, right after I had done the save, I got email with the announcement of the first Web 3.0 Conference and Expo, and interestingly enough I had already titled my slide, before I saw this thing, "Linked Data, Semantic Web, Web 3.0." So you can see that this is a meme that is starting to get out there. Will this be as big as the Web 2.0 conference? Not this year. Will it in five years? It will be bigger. That's what people are betting on. Sorry, that's not a statement of fact; people are betting that this one is on the rising curve and the other one is leveling out. Why? Again, because the big one is still out there. If you don't believe me, believe the world's largest software company, which said that there's a real reason why you should use RDF databases, which is that basically it gives you a data store and a very flexible schema that can later be added to. So one of the problems you have in a database is that you have to get your data representation right the first time, because once you've deployed the application, if you want to add two new columns, that's a non-trivial thing. Whereas in some of these other things you can just say, well, I'll just add some properties and attach them to those things that have them, and we'll keep going. So there are tradeoffs: it's sort of easier to get stuff in, and in some ways harder to get stuff out.
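Here is a small sketch of that pattern using rdflib and a made-up shop vocabulary: a SQL-like SPARQL query plays the dynamic-content role, a brand-new property is attached with no schema migration, and a CONSTRUCT query hands a graph to another store the way one Web app can act as a data provider for another. None of the names or URIs come from any real system.

```python
from rdflib import Graph, Namespace, Literal

SHOP = Namespace("http://example.org/shop#")   # hypothetical vocabulary

store = Graph()
store.parse(data="""
    @prefix shop: <http://example.org/shop#> .
    shop:item1 shop:name "Espresso machine" ; shop:price 199 .
    shop:item2 shop:name "Grinder"          ; shop:price  89 .
""", format="turtle")

# The "dynamic content engine" step: rows that would normally be templated
# into HTML for the browser.
for row in store.query("""
        PREFIX shop: <http://example.org/shop#>
        SELECT ?name ?price WHERE { ?i shop:name ?name ; shop:price ?price }
        ORDER BY ?price"""):
    print(row.name, row.price)

# Schema flexibility: attach a new property to one item only.
# No table migration, no NULL columns for everything else.
store.add((SHOP.item2, SHOP.burrType, Literal("ceramic")))

# A CONSTRUCT query returns a graph, so the same endpoint can also act as
# a data provider: the result can go straight into another triple store.
exported = store.query("""
        PREFIX shop: <http://example.org/shop#>
        CONSTRUCT { ?i shop:price ?p } WHERE { ?i shop:price ?p }""").graph
other_store = Graph()
other_store += exported
print(len(other_store), "triples handed to the downstream app")
```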
The second one is that it helps you create Web-like relationships between data, which is not easily done in a typical relational database. This was actually published in one of the .NET documents back in December '06. If you go to it today, you won't find this quote; you'll find a pointer to a discussion of SPARQL, which has come out since then and is now the standard for querying RDF data. So there's SPARQL support deep in there for doing certain things which are just very hard to do with relational data. So again, the Web community, the developer community, didn't really care about AI's religion or anything like that; they said hey, we've got some stuff that matches very neatly to this kind of problem. And that really became the driver. So, just as an example, here's a company that's been doing a lot with this. Dave Beckett, who works at Yahoo, gave a talk at Semantic Technologies '08; the slides are there. If you don't believe me about this stuff, his slides tell the story better, and at a technical level, from a Web developer's viewpoint. These are some of the Yahoo sites that now use this stuff. It's not completely RDF; it's RDF being used to enhance the traditional website: food, travel, finance, lifestyle. They're using some of this semantic stuff to link to other databases. Like in the food one: recipes are very hard to store in a traditional database. How many steps, how many things, how do you do it, right? So typically what happens is they're stored in traditional databases as a few fields of metadata and then the whole recipe as text, right? And RDF there gave them a model where you say it has steps and the steps are these things. It's not that you can't morph one into the other; it's just that this was a more natural fit, and they were pulling these recipes from many different sites. So again, it was just that mashup story I was telling you before: they needed to get from data to data. They were on their own sites, so they built this SPARQL way of doing it and then merged the stuff, which gives them better control over what they get. So again, it's just a different use of an applied technology. But is this stuff real? Well, these are real websites with very high traffic that live in the real world. The question I'm asked most by the formal KR community is: how can you live without this soundness and completeness, how do these guys get along with such a scruffy thing? The answer is, for a lot of applications that's not your call, okay? This is Twine. Twine is Radar Networks' attempt to be the big one in Web 3.0, Nova Spivack's company. It's basically social networking, but with the ability to share entities, and with a lot of metadata extraction happening automatically to make the entity sharing, storage, search and presentation work better, and a lot of ideas for the future about how vocabularies play into this. I'm on the advisory board, so I can't talk too much about the details. But let's look at this example, right? They're recommending to me some people I might want to be friends with in this thing, okay? And here's the list of people. Well, I actually know a couple of these people. This one is someone named Tricia who describes herself as an arbiter of style from San Francisco, California. Now, you look at me, and that doesn't strike you as necessarily the best recommendation, right? So guess which one I clicked on, right?
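As a sketch of that "it has steps and the steps are these things" shape, here is a recipe modelled as a small RDF graph and queried with SPARQL. The recipe vocabulary is invented for illustration, not whatever Yahoo actually uses.

```python
from rdflib import Graph

g = Graph()
g.parse(data="""
    @prefix r: <http://example.org/recipe#> .

    r:pancakes a r:Recipe ;
        r:ingredient "flour", "milk", "eggs" ;
        r:step [ r:order 1 ; r:text "Whisk the ingredients together." ] ,
               [ r:order 2 ; r:text "Fry small ladles of batter until golden." ] .
""", format="turtle")

# However many ingredients or steps a recipe has, it's just more triples;
# nothing like adding columns to a fixed relational schema.
q = """
    PREFIX r: <http://example.org/recipe#>
    SELECT ?n ?text WHERE {
        r:pancakes r:step ?s .
        ?s r:order ?n ; r:text ?text .
    } ORDER BY ?n
"""
for row in g.query(q):
    print(row.n, row.text)
```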
Well, I clicked on Tricia, and discovered a lot of common friends and a lot of common interests, and said hey, this thing recommended it, we should connect. We now actually share a fair number of things back and forth. So she's made me aware of some stuff I didn't know existed, like style, and I help her with some of the technology stuff of the Semantic Web. We've never actually met in person. I have no idea if it's a he, a she, a what -- or if she has style. More than I do, I can tell you that, no matter what. The key point is: think Google, right? Google's job is not to get you all the right documents; it's to get some good one onto that first page, right? It doesn't even have to get all the good ones onto the first page. And there are a lot of good things happening. Live Search has a lot of this, too. Can it find a set of different ones? If you Google for Hendler and it only shows you the same guy for the first 500 hits, that's a bad idea. So how do you start recognizing different people? So again, some of the semantics is coming in, either latent or real, just in terms of categorizing those first hits. So no one is ever actually going to look at the entire set of recommendations to see if they're sound and complete, or care. In fact, some of the errors could be interesting. But it's got to be fast and it's got to work, right? If you had to wait 30 seconds for a Google page when Google first came out, they would not be the search company we all talk about trying to compete with. Right? But there's a problem. So this is my second tower. I portrayed that first view as Sauron's tower; the second view is more like the Tower of Babel, right? We're going to build this new Web, it's going to be the Web of data, everything is going to link together, it's going to be this wonderful, wonderful Web 3.0 web, and who cares about language issues, who cares about common terminology or anything -- we'll use a little bit of linguistics and it will make the whole thing work. And you know, when God wanted to make a tower fall down, if you believe the Bible, the way to do it was to make the people talk different languages; then they can't build together, right? In a certain sense, you can't get interoperability without agreement. You can get agreement in something like OWL or RDF on the syntactic form, but you still have all the content issues. So again, in the small, or where there's a small amount of ontology over a very large set of data, you can solve that problem essentially by keeping things small and manageable by hand. But as things grow, there are a lot of issues. One of the important things that has emerged in this Semantic Web frame -- and I don't want to go into this in too much detail -- is that RDF is built on top of URIs. URIs are very different from strings, right? Because if I say "student" I could mean a lot of different things, some of them explicitly different. For example, if you go to Oxford, to Christ Church College, "Student" means faculty; a Student is actually a don, one of the college dons. I think it's the capital-S Students and the small-s students: the small-s students are the people who take the classes from the large-S Students. It could also be just subtle differences, right? In some places somebody taking a training course could be called a student, and in another place only someone enrolled in a university might be called that, things like that.
But if I say, you know, http://cs.rpi.edu/hendler/twgroup.owl#student, and you see two websites using that thing, it seems very unlikely that they're doing it by accident. So seeing that they mean the same thing becomes easy. Of course, if I'm using one of these to describe something and you're using a different one, we still have a mapping problem. But now we can put an assertion in there and say this one is the same as that one, so we can build these things up as vocabularies across different vocabularies -- something never allowed in the AI world, where you had to bring it all into one place and get all the agreement, right? And that turns out to be a big difference. And there are a lot of interesting things about recognizing, asserting and chaining equality. So it turns out that the little bit of OWL that has proved to be used over and over again is essentially a class hierarchy and some property things -- for those of you who are database people, it's almost like an ER kind of model: one-to-one, one-to-many, many-to-one, subclass, class, inverse, transitive, symmetric. So a few vocabulary terms for properties, but the ability to say those things about different ones to get some kind of linking and merging. It also turns out there are a lot of other things about URIs that are interesting. Someone once said, boy, the Web would be really easy to label if we only had an infinitely extensible labeling space that was dereferenceable. Well, we actually have one of those on the Web: it's http:// and then all the rest of that stuff, right? You can always coin a new one of those. And there are some social conventions about who owns it, where it lives, what it does, and it happens to play nice with your browser, right? So you don't need a new technology infrastructure to go look at someone else's document; you can just grab it by typing that URI and looking at the thing that comes back. So again, there were a lot of advantages to doing it this way. And so what's been happening now -- this is one of several different projects in this kind of space, and this is where the Linked Data term comes from -- is a project called the Linking Open Data cloud; there are some other things as well. But basically, you have a data set you'd like to share with the rest of the world, you publish it as RDF, and you provide a mapping to some other thing in the cloud. And mapping is defined as expressing same-as relationships, or some property that lets the computer infer the same-as: everybody who has the same email address will be considered the same, okay? These are all things that are actually in the cloud now. This is about a month old, so there are probably another half dozen bullets. It's got tens of billions of assertions, so pulling it all together into a single data store is hard, mining it is hard; in fact, what to do with it is a real question. A lot of people are looking at browsers for this thing, so you start in something like the music one and that will get you into DBpedia, which is Wikipedia's kind of relationship model, and that will get you into some other places, so you can kind of say, hey, I just noticed that this guy I'm looking at was born in this place, and that's interesting for whatever reason. One of my students did a project to mine just the music data, for instance, and DBpedia, to do things like say which music genres started in Seattle. Or, I'm sorry, you need MusicBrainz, DBpedia and (inaudible), right?
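Here is a minimal sketch of that linking step, with invented URIs standing in for two different sites' identifiers for the same thing: one published owl:sameAs assertion plus a SPARQL property-path query is enough to pull the two descriptions together.

```python
from rdflib import Graph

g = Graph()
g.parse(data="""
    @prefix owl:   <http://www.w3.org/2002/07/owl#> .
    @prefix music: <http://music.example/artist/> .
    @prefix wiki:  <http://wiki.example/resource/> .
    @prefix p:     <http://example.org/prop#> .

    music:nirvana  p:genre     "grunge" .
    wiki:Nirvana   p:hometown  "Aberdeen, Washington" .

    # The published link that stitches the two data sets together.
    music:nirvana  owl:sameAs  wiki:Nirvana .
""", format="turtle")

# Walking owl:sameAs in both directions merges what each site knows.
q = """
    PREFIX owl:   <http://www.w3.org/2002/07/owl#>
    PREFIX music: <http://music.example/artist/>
    SELECT ?p ?o WHERE {
        music:nirvana (owl:sameAs|^owl:sameAs)* ?same .
        ?same ?p ?o .
    }
"""
for row in g.query(q):
    print(row.p, row.o)   # properties from both descriptions come back
```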
And, you know, it did as well as any other data mining thing done quickly: 60, 70 percent. Interesting answers, a lot of bad answers, no real deep analysis. But it just shows that this thing is growing, with lots and lots of stuff. So if you want to start playing with this, getting a data set is actually not the hard thing. You can go grab any part of this; it's all public, it's all out there, it's all easily available, the standards are around, and there are open-source and proprietary tools for viewing, playing, and extending. So it's actually happening. So, in a sense, this idea of Web 3.0 is this linked data cloud, the RDF and OWL story to provide these links -- the databases, the links that are created by OWL's sameAs, inverse functional, functional, that kind of thing -- and the sort of vocabulary space over it that helps you start linking these things up to other kinds of concepts. So that's sort of the Web 3.0 model. In the interest of time, I'm not going to go through the next set of slides very much, but another thing that really turns out to be showing up is Web 2.0 and Web 3.0 being used together. This comes as a deja vu experience for me: back in the old days we kept realizing that AI never did very much by itself, it had to be integrated with something else, and here, too, social context is a very important part of a lot of what happens on the Web now, and you get the network effect from social context. One of the interesting things happening now, though, is that some Semantic Web standards are starting to be used by the library community to release some large thesauri. So the Getty catalog, the (inaudible), the National Agricultural Library -- a lot of these are starting to release these ontologies. What's interesting about that is that it gives you persistent URIs, so I can link stuff I'm doing to those, and other people can link things to those, to build that Web space I was talking about. So I don't actually have to say my cow is the same as her cow, or use the same term, if we both said ours is the same as the one the National Agricultural Library calls a cow, right? So various things happen. And by the way, that turns out to be true for anything with a persistent URI. So a lot of people are starting to link these things up to Wikipedia entries. In the ontology world there's a whole conference discussion devoted to how we should do commenting on classes in ontologies, what the right level is. On the Semantic Web, most of the people doing this started realizing: just stick in a link to the Wikipedia page or to the Flickr persistent URI that you mean. So if "cat" shows you 500 pictures of things with four legs that go meow, there's really no confusion about what kind of thing you're talking about, whether it's a tractor or an animal -- for a human. So again, this is more human referencing. But of course that starts to create a link space that says this term and this term have been linked to this thing in Wikipedia, so you also get this other kind of emergent thing happening. I have a paper about this for anyone who is interested, so I won't really talk about it; I'm just going to go through it quickly. But a lot of us are starting to believe that what's going on in Web 2.0 is really being powered less by the vocabulary space and more by the social context. That's not really too surprising.
But what I mean by that is: if you go to Flickr and type James, you will find over a million pictures that have the tag James on them. That's not very useful as a tag. I suspect very few people other than me ever go to Flickr and type James as their keyword search; I do it to see how many I get. What makes sense about James being used so often as a tag is that Flickr lets me go to your Flickr stuff, or Bob's Flickr stuff, and Bob has a brother named James, so James suddenly becomes a useful term. Okay. In fact, if you go to a lot of conferences nowadays, you'll see somewhere on the conference page, or somebody will announce, let's use the following Flickr tag when you upload pictures of this event. So again, if you were in the social context where you could learn that term, or if you can guess it through some other context, then you can find those pictures. If not, it doesn't really have any semantic meaning, okay? Web 3.0 is much more shared data and linked ontologies. So there you're really seeing the network effect through the social network at scale, and here you're seeing the relationships through the vocabulary network and the data network at scale, and it's putting those two together that's starting to get a lot of people excited. I won't explain this very much, except to give one example that people are starting to play with a lot. It used to be called seeded tagging; now people are referring to it as semantic grounding. The idea is that I give you the ability to do tags, but I try hard to find a way to get you to register your tag against known terms. So you type P-O-L for something, and I show you that Poland is a possible completion, and you click on that, right? Then I don't get POLNAD very often, right? And I know what you chose, and I could have offered countries: Poland, or something like that. So I kind of know the context you were in when you did that grounding, and I try to get the tags into a known vocabulary. Now, they can still be expanded by lots of social stuff, but the minute I can get some good tags onto some things I know, then I can start doing some very powerful things. A few of these different sites doing country-related things have now realized they all had different country vocabularies. They did the same-as trick, okay? So instead of building one single vocabulary and reengineering, they built a set of mappings, and what happens now is that when you're on one of these pages you see ads for the others, because it knows that you're looking for information about Poland on the travel site, and it says: want to see pictures, want to see blog entries, you know -- from other sites which have used that same kind of thing. So by grounding things to the right place, you can get these relationships; instead of trying to guess from the text, you've made it so people tag the right places. Another example: there's been something floating around the Semantic Web space, actually from before the Semantic Web, called the wine agent and the wine ontology. It was a thing which would say: you want to have this food, here's a good recommendation of a wine. The first version of it had all the wines in the knowledge base and all of the recommendations in the database, so a big community kind of got together and said fish is served with white wine, this should be a light red, this should be a sweeter light wine, this region, that kind of thing.
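A toy sketch of that seeded-tagging idea, with a made-up country vocabulary: offer completions against known terms and store the chosen term's URI rather than the raw keystrokes, so other sites can map to the same thing.

```python
# A toy controlled vocabulary: label -> URI.  The URIs are invented stand-ins
# for whatever country vocabulary a site actually grounds its tags against.
COUNTRIES = {
    "Poland":   "http://vocab.example/country/PL",
    "Portugal": "http://vocab.example/country/PT",
    "Peru":     "http://vocab.example/country/PE",
}

def suggest(prefix: str):
    """Offer completions for a partially typed tag."""
    p = prefix.strip().lower()
    return [label for label in COUNTRIES if label.lower().startswith(p)]

def ground(chosen_label: str) -> str:
    """Store the URI of the chosen term, not the raw keystrokes."""
    return COUNTRIES[chosen_label]

print(suggest("pol"))      # ['Poland']  -- no more 'POLNAD' typos
print(ground("Poland"))    # the shared URI that other sites can map to
```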
It needed a lot of community involvement, a lot of community agreement. And essentially this old agent would give you the right wine recommendation. If you disagreed, tough, right? You're wrong. That was the old mindset. Well, you know, some of us like different wines. So in a more recent wine ontology -- this is one developed in my group, so I'm actually showing you some of my own work for once -- we added a recommendation kind of thing on top of it, again using some of this grounding idea. It still needs a lot of interface work, and part of the reason it looks bad there is that it's actually made to run on iPhones and similar devices, so when you see it in a regular browser it looks a little funny. But basically what would happen is, if I clicked on one of these kinds of fish, what I'd see is -- well, I skipped a step -- I would see something that says here are the recommendations relating to that, and then I could say yes, yes, no, no to the ones I'm shown, roughly corresponding to: I'm going to dinner with Deb and Fred and Sam, and I want to see Deb's and Fred's and Sam's preferences taken into account and not everybody else's. And then what it does is go through the old wine ontology; instead of having the wines built in, it takes those wine properties and does SPARQL queries against some wine databases. Right now that's being done by scraping wine sites; we're actually talking to some of the large wine stores in our area about building them a little tool so their database will export to SPARQL for us, so it will actually be up to date with their prices and things. Anyway, this says, for example, that this particular chardonnay was a good recommendation, and you can actually see that of the people you picked, nine of them would kind of agree with this recommendation and six of them would have some problems with it, with respect to their preferences. Again, a lot of interface work is needed. If I pick a different wine -- this is picking a red one -- it's actually telling me that really only this guy liked it and there were a lot of people who didn't, that kind of thing. So you can imagine some day being able to -- sorry, my killer app for this is: you're in a restaurant, their menu is online, their wine list is online, you've got your social network, so you say I'm with this person, this person and this person; this person is having this dish, this person is having this dish, and it will come back and say, okay, here are the wines I recommend. And I say that one is too expensive, how about this one? That's good, although there's an even better one in that class if you're willing to make Fred less happy. That kind of thing. So again, that's a modern, webby way of thinking about this. It's all still powered by ontologies, reasoning, data linking. But we're not thinking about it anymore as the world of making everyone agree; we're thinking about it as the Web world. Now, by the way, I should mention for those of you who are former knowledge rep junkies that red wine and white wine are disjoint classes, so if somebody says they like red wine with fish and somebody else says they like white wine with fish, and you just stick it in the ontology that those are both the correct thing to have with fish, your whole thing becomes inconsistent and the logic goes to hell.
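For the former knowledge-representation folks, here is a small sketch of that clash, with an invented wine namespace. A DL reasoner would report the ontology inconsistent (the naive "good with fish" class is unsatisfiable); the query here just surfaces the offending individual directly.

```python
from rdflib import Graph

g = Graph()
g.parse(data="""
    @prefix owl:  <http://www.w3.org/2002/07/owl#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix w:    <http://example.org/wine#> .

    w:RedWine owl:disjointWith w:WhiteWine .

    # Two people's preferences folded naively into one model:
    w:GoodWithFish rdfs:subClassOf w:RedWine , w:WhiteWine .
    w:myBottle a w:GoodWithFish .
""", format="turtle")

# Find individuals that end up in two classes declared disjoint.
q = """
    PREFIX owl:  <http://www.w3.org/2002/07/owl#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?x ?a ?b WHERE {
        ?a owl:disjointWith ?b .
        ?x a/rdfs:subClassOf* ?a .
        ?x a/rdfs:subClassOf* ?b .
    }
"""
for row in g.query(q):
    print(row.x, "falls in disjoint classes", row.a, "and", row.b)
```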
So either you could build a complex context mechanism for knowing that it's Jim who likes red and so-and-so who likes white, or you do what we did: we just built this on top, and because the inferencing is pretty quick, we run through your friends doing the inferences for each one, get that set of things, and then do the thing. So again, we're mixing traditional reasoning with quicker, ad hoc, procedural rule stuff. Web 3.0: I've been talking about it primarily in the Web app domain. Let me quickly mention that it's also been used in a lot of other places, and interestingly there's been this funny circle. A lot of the domains that first got interested in the Semantic Web, particularly life science and health, did so because those domains already had a lot of stuff in the old AI formats and they liked getting it onto the Web, so there was a lot of big stuff there. So, for example, the National Cancer Institute's ontology wasn't built for OWL; they said hey, great, OWL is the way we can share the ontology, okay? But there was no reasoner other than the one they had as their proprietary thing -- they had a whole company doing it, and they kept them in business so that they could get some reasoning done, mostly in their workflow for keeping the thing consistent. Once it went to OWL, a bunch of people started trying out their tools on it, right? So there are now about a half dozen reasoners which are typically tested against that ontology, which of course meant that a lot of other players in the medical domain said hey, my ontology can do that too, and I don't have to build a company to keep it going; I've now got all these other players. Well, that put out a bunch of big vocabularies, which motivated some other people, particularly Oracle in the early days, to start building tool sets supporting other things, and by the time the Web 3.0 guys started to come around in 2006, those tools were ready at scale, so they kind of started playing with it. Well, now some of the original guys are saying, hey, we can do that Web thing too, not just the AI thing. So in health science, eScience is an area where you're seeing a lot of it. If you went to the summit, there was talk about SpaceBook, myExperiment, and VSO, the Virtual Solar Observatory; WWT, the WorldWide Telescope, actually doesn't use this stuff, but the group at Harvard that helped in that work is starting to use this to link in a bunch of paper stuff, and we're starting to work with them. A lot of people have been looking at provenance and annotation for data, because there are some Semantic Web ways of doing that. And group curation of domain ontologies: one of the things that's actually proved to be very interesting is that, as new guys come in and say, well, we want an ontology but we don't have the million dollars that the cancer institute has, what's become sort of the methodology is to take the data stuff you have, reverse-engineer one of these weaker ontologies from it, run that past some people in your community who say that's stupid, this is good, that's stupid, this is good, fix it up in an iterative process, and then put that one out there for the community to curate. And then have, among those curators, an AI person as a gardener, in the wiki sense of a gardener. So you have one knowledge engineer, a bunch of scientists, and a science ontology that's now being maintained in a consistent way.
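A rough sketch of that bootstrapping step, with invented record and field names: generate a draft class and draft properties from whatever records you already have, then hand the skeleton to the community (and the one knowledge engineer acting as gardener) to fix up and curate.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import OWL, RDF, RDFS

# Imagine these field names came from an existing spreadsheet or database export.
sample_records = {
    "Observation": ["instrument", "wavelength", "timestamp", "target"],
}

EX = Namespace("http://example.org/draft#")    # placeholder namespace

draft = Graph()
draft.bind("ex", EX)
for record_type, fields in sample_records.items():
    cls = EX[record_type]
    draft.add((cls, RDF.type, OWL.Class))
    draft.add((cls, RDFS.label, Literal(record_type)))
    for field in fields:
        prop = EX[field]
        draft.add((prop, RDF.type, OWL.DatatypeProperty))
        draft.add((prop, RDFS.domain, cls))
        draft.add((prop, RDFS.comment,
                   Literal("Auto-drafted from existing data; please review.")))

# Hand this skeleton to the domain scientists to iterate on.
print(draft.serialize(format="turtle"))
```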
That's proving to be used in several of these merged virtual things. So there are now several different virtual observatories -- for the sun, for, you know, standard space, things like that -- starting to link up. A lot of smaller sciences are starting to do some of this stuff. A lot of stuff has fed back into the finance and business world. Two in particular that are interesting: RSS-type feeds were originally done in RDF. Some of them moved out of it and have now moved back into it, using more of the OWL stuff in there, so against ontologies, meaning that the people getting the feed can do a faster and better -- can do good sorting. Of course there are some performance issues there. There's been some new work on very rapid streaming versions of this kind of reasoning, so better feeds with fast domain reasoning. And then personnel finders using some of this matching. So it turns out a lot of the semantic stuff has been used in a lot of the dating sites for a while, very quietly, because if somebody says I play tennis and somebody else says I play squash, they want to give that some value as being a match -- whereas, you know, squash to, I don't know, scuba diving will be less so, right, but at least they're both sports, that kind of thing. So some people are now taking some of those kinds of match engines and trying to bring them back into a professional context. There's a company called Paradigm 5 that just recently went into an open beta that's doing some of that stuff. Research challenges remain. This is one of my favorite ways of saying this: there's a big project in the EU called the Large Knowledge Collider which has this slogan -- current reasoning systems do not scale to the requirements of their hottest applications. So in a sense, that's what I've been talking about. There's also the integration of these things. There's a project going on at the Cleveland Clinic which was kind of neat. There was funding going into a Cyc project doing kind of deep semantic querying, and there was funding going into a group that was just trying to get all the questionnaires being filled out in all the different databases to come into a unified data format in RDF, and these guys were told, you know, we'd like you to work together, and they said no way, go away, leave us alone. And I said no, maybe you didn't hear me, neither of you gets funded unless you work together, and they said right, right, we'd love to do this. It was a marriage of convenience for the first six months or a year. Now it's actually showing some really neat results. So it's still a little bit of a marriage of convenience -- the RDF guys gave me this slide, and if you look up in the corner here it mentions Cyc, and the Cyc guys gave me this slide, and if you notice down in the corner here it mentions SPARQL. But they do link up. But more importantly -- I mean, you know, there are still big differences in the communities. But what's interesting is they're using this OWL representation -- which one group sees as too powerful for what they normally do, and the other sees as too weak for what they normally do -- as the place to have the joint curation happening, because it's actually kind of a decent compromise for that. So the joint terminology is now only maintained in one place, and then this group worries about how to get it into the database and this group worries about how to get it into the deeper representation.
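(A minimal, purely hypothetical sketch of the hybrid shape being described -- a fast, incomplete pass over the big, simple data layer feeding a much smaller candidate set to a slower, more complete reasoner. Both functions here are placeholders, not any particular system:)

```python
# Hypothetical hybrid setup: a cheap, incomplete pass narrows the world,
# and an expensive, more complete pass only runs on what survives.

def cheap_filter(facts, keyword):
    """Fast and incomplete: crude match over a big fact store."""
    return [f for f in facts if keyword in f["type"]]

def complete_reasoner(candidates, question):
    """Slow and complete: stands in for a DL reasoner or Cyc-style engine."""
    return [c for c in candidates if question(c)]

FACTS = [
    {"id": i, "type": "cave" if i % 100 == 0 else "building"}
    for i in range(100_000)
]

# "Find me all the caves" is the cheap part; the complex profile check only
# runs over the surviving candidates, not the whole fact store.
caves = cheap_filter(FACTS, "cave")
answers = complete_reasoner(caves, lambda c: c["id"] % 300 == 0)
print(len(caves), len(answers))
```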
You know, is that particular project the paradigm to follow? I don't know. But more and more people are starting to think about this idea of a scalable bottom end plus a complex reasoner in a hybrid or heterogeneous kind of setup. And of course, take that to the next step and you start getting into sort of an ecosystem of these things, different kinds of reasoners, different kinds of data. In the intelligence context I've used the example that there's a big difference between saying find me all the caves in Pakistan and tell me where Osama bin Laden is hiding. But if you give the tell-me-where-Osama-bin-Laden-is-hiding reasoner all of the caves -- all of the domain model -- it's actually pretty complicated; it has to spend a lot of its time going this isn't a cave, this isn't a cave. So again, you can imagine these systems, in a heterogeneous way, being able to do very different things if we can figure out how to harness it. I know at least four or five different doctoral students at different places starting to think very heavily about, you know, what are the models where this incomplete fast stuff and this very complete slow stuff can be linked up. And it's actually my last slide, modulo one more I'll tell you about in a minute. So this is just to summarize: lightweight ontologies near data is a growing part of the Semantic Web. Grounding in the URIs is very critical. We're seeing RDF and OWL really being used. In a sense, the Semantic Web today is about the same scale as the '92, '93 Web was, which, to put it another way, means it's almost big enough to be visible on the Web of today. You know, a few years ago there was a presentation on this stuff and one of the people was from a large search engine. He said, you know, there's only a couple million pages with that stuff on them -- we can't even, you know, we don't even see it, right? Well, now there are tens of millions of pages with this stuff on them, and they do see it. They don't see a lot of it -- they're not tailoring their stuff for it -- but some companies which have looked in that space, like Powerset, have gotten other people excited, things like that. So a lot going on. And finally, I'm just going to leave this slide up while I take questions. Beyond the Semantic Web, the other thing that I sort of spend my time doing is thinking a lot about the fact that this Web thing has become very important to the world. In computer science we have for many years kind of treated the Web as this application that sits on top of the important seven-layer stack of the Internet, and it's just, you know, your browser and some other browsers and no big deal, right? As it's becoming more and more of a society-changing thing, more and more people are starting to say, hey, you know, small changes on this thing can cause real financial damage to people, can cause security issues -- you know, Pakistan tried to shut down some YouTube stuff and accidentally shut off the entire YouTube site, right? Who's suing whom over that, right? Well, no one, because you can't sue a country for shutting down your site. But YouTube lost a lot of money in that deal.
Turns out the reason -- again, whether it was on purpose or an accident is still debated -- was because the DNS system was based on the assumption that everybody was trustworthy and happy and sharing in a friendly way, and that if somebody turned something off they'd only do it for their own stuff, and no one actually really thought about what would happen if they wrote it into the records that would get propagated. Okay. So again, we built this thing in the small, it's gone to huge, and that huge has a lot of social impact. So there's an article about that in the July CACM if anyone wants to read it. We're really interested in trying to think about, you know, how do we really study this Web thing in a scientific way -- the modelling, the engineering and the social impacts. Thank you. >> Evelyne Viegas: Thank you. (Applause). >> Evelyne Viegas: We have time for questions. And please take advantage of that. We have Jim here with us. >>: So in the Semantic Web, (inaudible) which builds on top of the (inaudible) and I know at least about the (inaudible) language so how do you see this, is this actually necessary to have a (inaudible) language of ontology. >> Jim Hendler: So actually I took out the slide that was the new layer cake diagram. So the question was, you know, there's some rule stuff happening, there's the SPARQL data query stuff happening -- does all this need to happen? You didn't even ask about RDFa, which is a way of embedding RDF into standard Web text, or GRDDL, which is a different way. So I mean, it's like Web services, right -- there was SOAP, and now there's an entire stack. There was RDF, and now there's an entire Semantic Web stack, and the question is, given an ontology thing, do we also need this rule thing? There's no easy answer to that. The ontology world has focused on a particular set of expressivity for reasons of decidability and stuff. The rule world is looking at other things. The intersection doesn't turn out to be very powerful. So there are sort of two different subsets of first order logic with very different performance properties, and again, these damn webbies don't understand the stuff, they just want to use all of it. So they've sort of said, well, make it work together. Same thing with SPARQL. SPARQL right now is just an RDF query language, but -- so here's a bunch of FOAF data. That FOAF data includes a lot of information that a reasoner could use to tell whether people are the same or not, so two people with the same email address should be treated as if they're the same person, okay? Now when I do a query against that thing, it would be nice if I could ask: by the way, have you applied the FOAF reasoning so that I don't have to do it later? I'll ask you a very different query if I know the answer to that. We don't have any protocols for that or anything right now. So there's a lot of interest right now in how we put all this stuff together. What's starting to look like a sweet spot for the next generation of things -- again, this is, you know, a researcher making a prediction, so, you know, cover your ears and go screaming in the other direction -- but it's looking like one big group is trying to make OWL powerful enough so that the rules guys won't be needed, right?
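(To make the FOAF point above concrete, here's a minimal rdflib sketch with made-up data: people sharing a foaf:mbox get an owl:sameAs link. A real pipeline would normally let an OWL reasoner draw this from mbox being an inverse functional property rather than hand-rolling the pass:)

```python
# Sketch: "smush" FOAF data by mailbox -- people sharing a foaf:mbox are
# asserted owl:sameAs. Hypothetical data; this stands in for what an OWL
# reasoner would conclude from the FOAF vocabulary itself.
from collections import defaultdict
from rdflib import Graph
from rdflib.namespace import FOAF, OWL

DATA = """
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <http://example.org/people#> .

ex:jim1 a foaf:Person ; foaf:mbox <mailto:jim@example.org> .
ex:jim2 a foaf:Person ; foaf:mbox <mailto:jim@example.org> .
ex:deb  a foaf:Person ; foaf:mbox <mailto:deb@example.org> .
"""

g = Graph()
g.parse(data=DATA, format="turtle")

# Group people by mailbox, then link everyone in a group to one canonical node.
by_mbox = defaultdict(set)
for person, mbox in g.subject_objects(FOAF.mbox):
    by_mbox[mbox].add(person)

for people in by_mbox.values():
    people = sorted(people)
    for other in people[1:]:
        g.add((people[0], OWL.sameAs, other))   # same mailbox => same person

print(g.serialize(format="turtle"))
```

The protocol question raised above is whether a query endpoint could tell you it has already done this pass, so the client doesn't repeat it.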
There's another group who are saying, well, maybe the right thing to do is just take a small piece of OWL that we can express as rules or reasoners, and then build sort of a rule language and use the rule language on top of that. There were a lot of problems with rule languages for doing data stuff just because, again, there's all the mapping and things. And now we use that RDF layer sort of as an intermediary layer -- the vocabulary stuff to get the persistent URIs, and the rules to hook them together, right. And again, the problem is, if you build big rule systems with thousands of rules and things like that, they're very hard to maintain, very hard to model -- you know, it's why Prolog isn't the primary programming language today that many people thought it would be some day; it's just, you know, very hard to maintain some of these things. But small rule sets, small ontologies, big data -- again, that's another thing people are starting to look at. So, you know, the answer is: do you need all of it? Depends what you're doing. Are there ways to morph one into the other? Yes, just like there are always ways to morph anything in computing into almost anything else in computing, but each of them has certain natural advantages and disadvantages, and so if you can find the right place in the trade-off space to put this stuff, this stuff, and this stuff together right, sometimes you can get a much larger scale for a much smaller cost. And so again, I think it's more that people are exploring all of these things. But all of these things that come into the standards group have to be at a certain level of maturity where people are saying we actually want interoperability, and so I think that's the new thing. And the rules language is where the ontology space was ten years ago -- people realizing the importance of that interoperability across applications. So that's kind of my spin on it. Yes? >>: (Inaudible) Semantic Web services (inaudible). >> Jim Hendler: You know, almost every time I give this talk now somebody asks this question. I keep saying, oh, I've got to put a slide in, and then I keep forgetting. So the question was could I comment a little bit on Semantic Web services -- the growth, what's happening in that space. I will, but start from the fact that if you said that about just Web services, right, different communities have different meanings. Do you mean actually what's happening in WSDL and, you know, BPEL and stuff like that, or do you mean things on the Web that can do stuff? If we focus more on the technical end of services, the -- >>: (Inaudible). >> Jim Hendler: Yeah, yeah, right. So again, at the technical end of services -- service languages, service descriptors, things like that. So one of the things that the DARPA program actually started funding was the notion that if you look at a lot of this stuff, one of the problems you have is that matching inputs to outputs only really works right if people did semantic stuff in the naming, right? So if I say the output of something is a number and it's a FOAF thing, and this one has an input that's a number and it's a (inaudible), you have no idea whether those two services can be mapped together or not. On the other hand, if I said FOAF and FOAF and it works, well, you're back to the same kind of vocabulary story we were just talking about with data. So a lot of people started saying, you know, can we take these service descriptors and tie them into these ontologies.
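(And a rough sketch of what tying service descriptors to ontology terms buys you -- the terms, the tiny subclass table, and the two services here are all hypothetical: matching an output to an input becomes a subsumption check on shared URIs instead of a guess about parameter names:)

```python
# Hypothetical sketch: services annotate inputs/outputs with ontology terms,
# so composition can check subsumption instead of comparing parameter names.

SUBCLASS_OF = {  # tiny stand-in class hierarchy
    "ex:ZipCode":     "ex:PostalCode",
    "ex:PostalCode":  "ex:LocationCode",
    "ex:PhoneNumber": "ex:ContactPoint",
}

def is_a(term, target):
    """True if term equals target or is (transitively) a subclass of it."""
    while term is not None:
        if term == target:
            return True
        term = SUBCLASS_OF.get(term)
    return False

geocoder   = {"input": "ex:PostalCode", "output": "ex:Coordinates"}
zip_lookup = {"output": "ex:ZipCode"}

# Can zip_lookup's output feed geocoder's input?  Matching on names
# ("zip" vs. "postal code") would fail; the shared terms make it a yes.
print(is_a(zip_lookup["output"], geocoder["input"]))   # -> True
```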
There were a lot of research efforts, some of them going all the way down to why don't we just do all of it -- you know, why do we do anything procedurally, right? We all know logic is better, let's do the service definitions all the way down in logic -- and like a lot of things, that has people who say it's the right thing, but it hasn't really gained a lot of visibility. The thing that has started happening -- and in fact there's even some standardization in the space -- is being able to take WSDL and say here's a hook into some other naming space, right, which will tell you more about my inputs and outputs and my IOPEs -- input, output, precondition, effects. So I'm able to say, you know, if you use this particular attachment in your WSDL, then I can say about this thing here: if you want to know more about it, dereference this URI and you'll find an OWL page, or you'll see something else, and there you can see some terminology stuff that may be useful. That's starting to be explored a lot. It's one of those funny cases where there was a lot of money in it in the research world, which was keeping industry from getting very interested. Now that money is starting to be cut back a little, so people are saying, hey, you know, show us real use. So it's still very early days for that stuff. The other thing is the services story -- you know, sort of by 2008 everything was going to be in SOAP and WSDL and every organization would have all of its services advertised and open, and that hasn't happened, for a lot of reasons. It has happened much more within the enterprise, and within the enterprise of course you can do vocabulary control and things that don't necessarily need this stuff. So again, it hasn't really happened at the open, Web-scale level yet. But people are still really looking at that. There are still a lot of applications that want to go in that direction. So there are still a lot of people betting that that's going to be a very important technology. I certainly keep a research foot in that area. We did a lot of work on, you know, using AI planning systems to compose services, and you desperately needed some kind of -- you know, every time we tried to go just down to the raw services and just use their input/output descriptors it failed miserably. So yes, I'd say it's promising -- all of this is sort of young but maturing; that one is a few years behind but gaining a lot of interest. DARPA looks like it may actually be considering starting a program in it, which will have mixed benefit. But if you're a researcher in the US, it will make you very happy if you do this stuff. Yes? >>: How do you see the academic programs ramping up or where is the momentum there? >> Jim Hendler: It's a good question. So the question was, you know, how do I see the academic programs going in this space? I think -- again, because more of the funding was there -- if you go to almost any EU university that had an AI group, they now have Semantic Web courses, some of which are formal OWL courses, some of which are actually more RDF than OWL and building-Web-apps courses, so there's a mix there. In the States it's starting to catch on. You're starting to see more of it. You're actually seeing a large demand for training courses now, because again there's this Web 3.0 stuff, companies want to hire people, and there's a very small number of people who know how to do this stuff. A lot of self-trained people are doing very well in it.
So, you know, small consulting companies are springing up every day because NASA needs someone to help them do this, and this library needs someone to do this, and then, having done that, they now have six people on salary, so they've got to find another contract. So sort of the thing that created the contractor community in the database world is happening again here to some degree. And the universities are starting to do it. There have been a couple of books out, one of which is sort of textbooky, but again really about the formal language. I have a book with someone else called Semantic Web for the Working Ontologist, which was sort of an effort not to go to the classroom but to get it in front of the people who are actually trying to do this stuff. There's not really a good answer to that except to say things are happening. What is interesting is this Web science stuff -- under that name and other names -- is starting to get some attention as a thing to have curriculum around, because again, a company that wants to build Web stuff needs people who understand this. So, you know, most computer science students, even if they've seen parallelism, have never learned in any course they took through their Ph.D. how you do the scheduling in a million-server server farm at a search engine company. You know, you don't just go in with some kind of round robin. Most people don't realize that google.com isn't some single big computer back there that's just got a bunch of static Web pages it delivers to you, right; it's a million computers doing a complex calculation using an incredibly complicated scheduling algorithm to get the queries to the right places quickly, that kind of thing. So again, there's a lot of scaling stuff in the Web that people are getting interested in, and the problem is it's sort of hard to teach that stuff without first teaching HTTP and HTML -- not so much HTML, but HTTP -- some of these languages, some of these techniques. So I think we're seeing a growth in the Web as a thing to be studied in computer science, and then the next step beyond that is this part of the Web thing coming in. So I'm actually very encouraged by that. You know, one of the things that caused me to move from Maryland to RPI is that Maryland was sort of reluctantly letting me teach the most popular classes in the department, and RPI wants me to create a whole curriculum with a bunch of people around this sort of stuff, so that our students will be -- you know, they realize that people are going to these companies and then being trained about the Web. If they came out with training about the Web, they would be ahead -- you know, they would be able to get some of the jobs that right now are going only to a few top schools. So I think there's some excitement there, and we're seeing that at other places, too. So I think it's, you know, an open question, but like anything else in the technology space, as it becomes popular the courses kind of follow, not the other way around. >> Evelyne Viegas: (Inaudible) so the Semantic Web (inaudible) education programs, do you see that translating into business (inaudible). >> Jim Hendler: Yes. >>: Apart from the (inaudible). >> Jim Hendler: No, it's a great question. Her question is: a lot of money went into Europe, a lot of research money -- and in fact Europe actually tends to fund people jointly, companies and businesses in these networks -- so are we seeing a lot of business development there?
The answer is, actually, strangely enough we're seeing a lot of the business development in Silicon Valley, because that's where the venture capital is if you want to build a Web 3.0 company. So in fact DERI -- D-E-R-I -- is an institute in Ireland that has 125 people working for it on Semantic Web things, with a very close relationship with Oracle there. They have now started a Silicon Valley group to manage two or three spinoffs they're creating, because, you know, that's the place to do it. So one of the things is -- in fact the funding world is very weird, it turns out, having lived in this for a while. The US tends to fund research and get out as soon as companies express interest. The EU tends to try to fund across the gap, so they come in a little later and they leave a little later, right. In this Semantic Web stuff, part of what happened is the US view was that the companies are just going to take it over and run with it now, not really understanding the difference between the small parts of it that were ready for prime time and the parts that still need research -- because this is, you know, like database technology, right? We haven't said that now that Oracle exists, all the universities don't have database research anymore; but that was the thinking in the Semantic Web, and of course it didn't work very well. In Europe it was just the other problem: they feel like they've funded it too long and the companies are coming over here to start up. There are some notable exceptions. The U.K. has several large startups in this space, Garlik being the biggest. And what in the US would be either a center or an institute in a university will often become a small company in the EU, because then they become a network partner for the university to go after some of these larger grants. A lot of those are now trying to stand on their own feet and become startups in this space. So the answer is there's a lot happening. At the moment, I'd say, interestingly enough, the Web 3.0 stuff is still primarily U.S. and the formal Semantic Web stuff is still primarily Europe, but there's a large switch happening at the moment, with a bunch of the Europeans realizing that and starting to look at scaling issues, things like that. The other interesting thing is DARPA is starting to fund what are called seedlings -- some pre-programs -- in, quote, Web 3.0. In fact, it was interesting, I was asked to come to DARPA and talk to them about Web 3.0, as I mentioned, and, you know, I sort of had to say, well, let me show you the slides I presented when I was a DARPA program manager. You funded this stuff, you just sort of forgot it for a couple of years. This is the path. So I think it's an open and interesting issue. But I think right now, if you're a researcher at a university and want to have Semantic Web funding, you're better off moving to Europe still. The other thing that was interesting is that in the most recent framework they took Semantic Web off the list of topics that you could be directly funded for -- they were just tired of people saying, you know, we will build a better way to do ontologies, we will build an ontology assessment tool -- but they put Semantic Web in as one of the desirable features of the applications they would fund. So there's just been a new funding round, with big projects in things like cultural websites using the Semantic Web. And one of the really interesting ones is a big joint project between the BBC and a group in the Netherlands.
If you've ever gone to one of these sports websites, you can get all the information about every pitch in a baseball game in some kind of database, but if you want to, you know, merge that with other things, you're back to the old mashup thing. Some people have been using RDF as the way to do that. The BBC is very excited about that. So for the 2012 Olympics they're going to be teamed with this Netherlands group and try to build website stuff where not only can you go to their website, but you can have SPARQL access to virtually every minute of every event over the course of the games. That links back to the video -- a lot of it with embedded video. I don't know enough about the actual project; I just heard a 10-minute talk on it recently. Yes? >>: What do you mean -- what does it mean to have SPARQL access? You know, not anybody can sit down and write a SPARQL query. >> Jim Hendler: Well, not anybody can. >>: (Inaudible). >> Jim Hendler: Well, that's right. That's right. Well, hardly anybody will sit down and write a SQL query either, but it's at least, you know, the same in big-O notation. But the key is that the Web app developers are now starting to learn how to write SPARQL access. So the idea is not to make it so that you as an individual can go to the website and pull the luge minute-by-minute thing, but that a third-party developer can use that information to create a luge mashup for the luge community or something like that. The idea is that they now become a service provider, and, you know, there's some interest at the BBC because you can obviously see how eventually that could be funded through advertising, through joint deals, et cetera, et cetera, although for this one, since it's mostly government funded, the plan is to release all that stuff free as a proof of concept. >> Evelyne Viegas: So we'll have to cut now because (inaudible). Thank you very, very much. (Applause)