>> Jim: Roy and Danielle worked in graphics for a long time. In fact, we had contracted some work with them -- was it about ten years ago now? -- on doing a standard graphics library model. And they had a very interesting take on that. My understanding, at least what Danielle tells me, is that that has blossomed in a direction that we could have never imagined. So without further ado... >> Danielle Forsyth: All right. I guess I'll get us started and we'll add Roy in a little bit. Thanks, Jim, and thanks, Lili, for setting this up. We're going to talk a little bit about semantic modeling as it relates to the discovery process. Roy and I are with a company down in Portland, Oregon. The company is Thetus, and we are not researchers; we're actually in the enterprise doing this stuff. And our customers are extremely, extremely demanding in that they really want to see their own information from their own perspective. They want to see other people's information from their own perspective. They want to understand increasingly complex systems, and they want to use semantics as a way of being able to connect internal and external information. So I'm going to talk a little bit about what we actually do for a living, and then we're going to carry on, and I'm going to conclude with a demonstration that involves the Social Computing Symposium, so you can see how this stuff actually plays together. So with that, here's our world. We live in this world of interconnected systems of systems, and it doesn't matter if you're talking about procurement risk, or transportation systems, or storm water environmental engineering: we're all working with information that was collected by somebody else for a different purpose. And what you're trying to do is see systems on a scale that's right-sized. If I want to ask hard questions -- is global warming happening? -- well, do you mean in your backyard in the next two weeks, or do you mean in your community in the next ten years? Scale, scope, perspective matter. Do you mean global warming as it relates to temperature, or as it relates to carbon, or lifestyle, or any number of other things that might be considerations in your problem space? So when we started the company about six years ago, we started by working with a couple of research organizations. And we did this because we picked a nice and complex problem space. We initially worked with a proteomics group out of Oregon Health & Science University and an atmospheric and environmental group out of Oregon State University, the College of Oceanic and Atmospheric Sciences. And we found that pretty much everyone had the same challenges. They were trying to glue together a picture in a changing information environment where they weren't able to control all of the information -- unlike the bench-science environment that a lot of them grew up in. We had great success in doing that technically, and decided financially that it would be a lot more lucrative to move out of the research realm, so we started working with people in energy infrastructure, and started working with fusion centers, where you've got law enforcement trying to cooperate with community health, with gang enforcement. All speak different languages, and all want to put information in, get it out, and have it contextualized. So people who are in law enforcement want to know aliases, need to know aliases; people in community health don't want to see that stuff. Relationships really, really matter in this context.
We work heavily with the U.S. federal government, a number of the major intelligence and defense agencies. We spent last week in Nashville, Tennessee, with Microsoft, promoting our work with Virtual Earth and SharePoint and looking at some of the things we might be able to do to take advantage of Photosynth. We're also now working in the financial community, largely with an eye toward bringing longer-term financial metrics to bear. So instead of looking at things like quarter-to-quarter earnings, how do you look at things like triple-bottom-line accounting, where you're bringing together social and environmental factors and giving people some sort of idea of their long-term viability? So we're going to talk today a little bit about the past -- I'll skip through this very quickly; Jim did a great job. We're going to get into how we got from 3D graphics into semantics; Roy is going to do that really quickly for us. And then we're going to talk about knowledge modeling and knowledge services, and conclude with a demonstration. So just very quickly, on the history part: several of us spent way too long in 3D graphics. Ironically, the first project Roy and I worked on at Microsoft was called Chrome, and we then moved on to a thing called PC Model -- that's parameterized and constrained modeling and methods -- and that's the basis for our workflow network. Microsoft was very generous in letting us publish that work, and it's turned into something that is the whole backbone of automated, contextual processing of information in our system. I attended the Social Computing Symposium, and so I decided that it would be appropriate to do a demonstration using that information. And we've recently started working with Microsoft on a new effort around carbon modeling. We're doing this through HDR, one of our customers, a consulting engineering firm who's looking at systems. So we've got a number of ties, we use a number of technologies, and we're looking forward to sharing how we do that with you. So Roy's going to talk generally about modeling, and then I'll come back and talk products. >> Roy Hall: Okay. I actually was a modeling geek long before I was a graphics geek; I was a finite-element jock doing structural analysis. Modeling to me is how I see the world. And we talk a lot about simple models, well-understood models, like gravity -- causal models: you drop a ball, you can guess what's going to happen. But really those aren't the interesting ones. What's really interesting is when the models get to be more chaotic, more interesting -- when our systems have systems and are often unpredictable. So we talk a lot about the observed reality -- what we can see, what we can measure, what we can document and instrument -- and we talk about the models we put together and what we're going to predict from them. And what's really important is that we can capture the whole workflow piece in a reproducible way, so that people can look at models, look at past data, look at predictions, look at what's happening in the future, to decide whether their models are good. What's really interesting to us, though, is the part that starts to happen around policy and the opportunity to influence the world that we live in.
And so we're really talking about policy models -- models that are sometimes really difficult to quantify, behavioral models, things like that -- and talking about how we start to push: given what we observe and what we predict, how can we push the system to move it toward what we'd really like to have as an outcome? That's led us to a lot of work with people who are doing policy modeling for water or energy or things like that. How does that relate to semantics? Well, we actually stumbled into semantics kind of ass-backwards. We didn't know we were doing it. But we were really worried about how you support the discovery of all the various models that are around, and how you do the impedance matching so that you can start to plug these models together and in fact get models from different disciplines that apply to your problem. We needed to support how you present the results and map the results from models that come from different disciplines into some terminology that's meaningful for you, in whatever problem you're solving. We needed to promote the discovery of novel relationships. And so we discovered a lot of things in the OWL and RDF community that really gave us a formalization that we could use for inference and discovery and presentation. And so that's really where the semantic part came in: we needed some way that we could talk about the world in a very abstract, conceptual way. And when we talk about uber ontologies -- we don't believe there's an uber ontology, but we believe there are some pretty common abstract ways to organize things that you can talk about and model around, and as long as you use the semantics well, people can extend those models into whatever discipline space they need. And you're next, right? >> Danielle Forsyth: All righty. So as computer technical folks, we tend to think the world is all about the data. We tend to think that all of the things we need to know in terms of modeling are in the information itself. And really, we as a company tend to look at the problem very differently. We think it's all about the model, the view, the perspective. And so as the information assets are ever changing, what you really want to do is capture the model that you're using and render a view, so that you can actually see things from very different and changing perspectives -- both in terms of the understanding of a changing information climate and in terms of the understanding of an ever-changing model. And so when we're working on these kinds of problems, we have to design systems where the models can change, and we track the changes as one of the most important metrics of the community's understanding. So we're working over here. And when people say, what do you mean by that, we tend to point to systems like SharePoint, or any other kind of information store, that keep the "what you have," and we tend to focus on the "what you know" and "what you learn." So two things: the models, and what people call the tacit knowledge, which is what you're discovering throughout this process. So if I get too deep, too high, just sort of give me some visual feedback. This is pretty hard to see, unfortunately. A couple weeks ago I went to the Social Computing Symposium. And in going to the Social Computing Symposium, I sat and watched a whole group of people interact in such an incredible way, independent of any of the organizations that they belong to. You had academics, you had people from Google and Yahoo!
and Microsoft, and people like me who didn't know what the heck they were talking about. And we all were existing in an environment where we had a set of relationships that were really pretty much bound around the use of social computing technologies. That was sort of our common, shared understanding: social objects, behavior and perspective, social computing technologies. And so the relationships that mattered in that kind of context were quite different. So when you looked at the world through a social computing lens, you saw the world one way. As you started to look at the world through a connection lens -- so if you take the view and say, outside of this forum, how are people affiliated -- you start to get a very different view of the information. A lot of the people who were working there were collaborators, had relationships with Microsoft that were well outside the scope of the symposium. And as you watched over the couple of days that we were there, you started to see that some people clearly knew other people really well, and some people got to know some people while they were there fairly well. It was a lot of fun. But if you step back and apply the corporate lens to this, well, Microsoft and Google look at each other very differently as corporations than they do as participants in a conference or as collaborators. And so the terms and relationships we use when applying those different understandings are very, very different. And I'm going to show you some of the things that we've got to deal with, like strength of relationship. So it's not just "Danielle knows Roy"; it's "Danielle knows Roy professionally, very well." And so what I have to do is be able to articulate relationships that are well beyond an easy triple. And I'll try and stay out of onto-geek speak, because I know some of the people here are semantic geeks and some probably aren't. But there's this idea that as we move forward and you've got billions and billions of people in your information set -- I want to see people that Lili knows well. I mean, people Lili knows well professionally. People who are like Lili physiologically, academically, professionally. So what we've got to start to do is capture context so we can overlay different models and be able to see things from very different perspectives. And that's our little world. So, the classic vendor piece: we'll tell you actually what we do for a living. The part that we do is we make a semantic knowledge base. One time Oracle thought they were being particularly mean to me and said, you guys are just the comments field. And somebody yelled from the audience and said, that's where all the good stuff is. And we pride ourselves on bringing meaning to all of that stuff, bringing context and perspective. So at the core of our underlying knowledge base are three key pieces. The first is the modeling component, so that you can keep different ontologies or models in different namespaces and put policy around them, so you can say who can see them, who can read them, who can write them. And you can track lineage on all of those things. The second is the metadata indexing and search piece, so that you can extract context-appropriate metadata and connect it into a knowledge model. So a lot of people talk about triple assertions. We want to be able to say, when somebody says Danielle knows Roy, were they talking in the context of a professional relationship, or some other kind of relationship -- what was it?
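To make that concrete, here is a minimal sketch of one standard way to express such a qualified relationship: RDF reification, built with rdflib. The ex: namespace and the context and strength properties are invented for illustration; this shows the general technique, not the Thetus vocabulary.

```python
# A contextualized relationship: not just "Danielle knows Roy", but
# "Danielle knows Roy professionally, very well."
from rdflib import Graph, Namespace, Literal, BNode
from rdflib.namespace import RDF, XSD

EX = Namespace("http://example.org/")  # hypothetical namespace
g = Graph()

# The bare triple.
g.add((EX.Danielle, EX.knows, EX.Roy))

# A reified statement about that triple, so it can be qualified.
stmt = BNode()
g.add((stmt, RDF.type, RDF.Statement))
g.add((stmt, RDF.subject, EX.Danielle))
g.add((stmt, RDF.predicate, EX.knows))
g.add((stmt, RDF.object, EX.Roy))

# Context and strength hang off the statement, not off the people.
g.add((stmt, EX.context, EX.ProfessionalRelationship))
g.add((stmt, EX.strength, Literal(0.9, datatype=XSD.decimal)))
```

A query can then filter relationships by context or by a strength threshold, which is what lets different lenses show different subsets of the same graph.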
The third big piece that we've got is the piece that Microsoft started us off with, which is the procedural network. Originally, when we did PC Model, it was intended to allow us to build 3D objects by putting together parameterized nodes in the network, so you could have input parameters and output parameters. We then took that and applied it more generally to a workflow process, so that you could process, route, notify, and filter information and extract lineage -- so that I can do automated processing of information and show you an information picture that's relative to the problem space you're most interested in. So this has really been a key thing. Overlaying policy and lineage on all of these components allows us to see how the information environment is changing. There is an associated SDK. I'm going to talk in a minute about horizontal knowledge services. This has been an exciting new area for us, because what we realized in doing this is that to deliver end-user experiences, you've got a set of stuff that you need to be able to do for any kind of application. And so over the last year we've put a lot of emphasis on that. So I'm going to describe an application that is kind of intended to get you out of the data thinking, which we all fall into -- even our own staff, a lot. So the Army Corps of Engineers came to us and said, we want to know where people are going to be. We said, cool. For what purpose? They said, well, maybe for humanitarian rescue missions, maybe because we're planning some sort of military operation, maybe because we just want to know so that we can put people out on the streets so that they can be helpful. We said, okay. What have we got to work with? And they said, pretty much nothing. And we would like it to apply worldwide, and we'd like it to be culturally sensitive, and we'd like to know exactly where all these people are. So we set about with them and their geographic expertise and built cultural practice models and then applied spatial models, so that we could build generic cultural practice models -- schools have kids in classes and curricula -- and then apply them regionally and fall back on the next-best behavior. So, for instance, if I know that a school is in Boston, I know certain things about Boston practices that are pretty consistent with Massachusetts and the United States. But if I know that it's in a region of lower economic status, I might apply different kinds of rules on the attendance at school and the regularity of participation and all. We did this and then overlaid a set of rules, and actually anchored the cultural practices in the one piece of data that we did have, which was building identification data. In the second phase of this project we actually built ontologies that allow us to take the properties of a visual element and derive what the likely use of it was. And so we deal in this world where often we're almost dataless, and we're using the model to create a simulated environment that's kind of the next-best thing. So this is a good -- yeah, go ahead. >> Roy Hall: I think the key part of this project was that we were really operating with a very sparse dataset. Most often we had very little; occasionally, when you were talking about a specific area, there would be people on the ground -- there would be feet on the street, so to speak -- and you'd start to get some very specific information. And so we were always saying, how do we manufacture the information we're missing?
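One way to read the "next-best behavior" fallback Danielle described is as a lookup that walks up a region hierarchy until it finds a model. A rough sketch, with entirely hypothetical region names and numbers:

```python
# Fall back from a specific region to broader ones until a
# cultural-practice model is found. All data here is invented.
REGION_PARENT = {"Boston": "Massachusetts", "Massachusetts": "United States"}

SCHOOL_PRACTICES = {
    # Only some regions have a specific model.
    "United States": {"school_days": "Mon-Fri", "attendance_rate": 0.90},
    "Massachusetts": {"school_days": "Mon-Fri", "attendance_rate": 0.95},
}

def practice_model(region: str) -> dict:
    """Return the most specific model available, walking up the hierarchy."""
    while region not in SCHOOL_PRACTICES:
        region = REGION_PARENT[region]  # next-best behavior: the parent region
    return SCHOOL_PRACTICES[region]

print(practice_model("Boston"))  # no Boston model, so Massachusetts applies
```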
So it was metrics around: if I have information for a similar facility close by, would that be closer than if I had cultural information for how -- say it was a mosque or a church -- you know, in this culture, if I kind of know how things work, is that a better model than the church just down the street? And so this whole idea of how do we meld these different sources, how do we manufacture information, how do we bring in sources that we discover other places, was really a key aspect of this project. >> Danielle Forsyth: And it led us to our first experience where we said, the kind of annotations people are thinking about are woefully insufficient to capture context. So if I want to make a notation about a dry cleaner, I may want to make the notation and say, I was out walking in the neighborhood and I observed they had an art exhibit at the dry cleaner. We actually live in a part of town where that's true. But it isn't necessarily normal dry cleaner behavior; it happens to be First Thursday in Portland, so everybody and their brother finds some way of making [inaudible] around First Thursday in Portland. But sometimes you want to make an annotation and you want to make it a rich pattern that applies regionally or applies to a certain cultural archetype. So we started to realize that when people were talking about annotations, they were talking about putting context-free sticky notes on things. And this was the first experience where we said, we not only want to capture your tacit knowledge, but we want to be able to contextualize it right there when we do that. Wow. Got my first victim. Second example -- and these are just intended as example applications -- this is an application that we do with one of the law enforcement fusion centers, where they're merging information from 23 different disciplines and providing people with contextually relevant information back. So if I'm a cop, I get to put my stuff in in cop speak; it gets fused with all that community health and gang enforcement and all of that other stuff; and I get back a cop view, and I see the concepts that matter to me spatially, in a display organized around concepts that matter. In fact, what we do is much richer relationship graphs in space. But it turns out that you can only use mocked-up data when you're doing public presentations. This is one of the most interesting applications that we've got, and we do this in conjunction with Consulting Engineering Partners. The City of Portland came to us and said, we're going to spend $1.4 billion in seven years, and we just spent $1.4 billion -- and you guys have this problem too -- we're putting way too much storm water down our storm drains, and we're covering more and more of our surface area with impervious surface, so more water goes down the storm drain. If you are one of the 770 communities in the nation that has combined sewer overflow, it backs up and puts sewage in your streets. And Puget Sound has a huge problem here. Absolutely huge. So what we really did was use semantic models to right-size information presentation to people who can make a difference. So in this case, this is a view that's intended for residents who want to be able to say, this is where I live, let me see my lot. And in this case the only data we have on them is the lot line and house layout that we're getting out of a GIS system. Based on that, we calculate how much water they're likely putting down their storm drain if their lot's covered in concrete.
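The resident-facing calculation just mentioned is, at its simplest, impervious area times rainfall. A toy version, with made-up numbers and none of the soil or elevation factors the real programs consider:

```python
# Estimate storm-drain runoff for a lot: 1 mm of rain on 1 m^2 is 1 liter,
# and pervious surface is assumed to absorb its share. Illustrative only.
def runoff_liters(lot_area_m2: float, impervious_fraction: float,
                  rainfall_mm: float) -> float:
    return lot_area_m2 * impervious_fraction * rainfall_mm

# A 400 m^2 lot that is 60% concrete, in a 25 mm storm:
print(runoff_liters(400, 0.60, 25))  # 6000 liters down the storm drain
```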
We give them a little palette for configuring their lot, and the palette is generated around which kinds of incentives -- or which kinds of programs -- apply to your lot characteristics, based on clay, based on elevation, based on all kinds of things. So Roy talked about model impedance; we're really looking at what fits. And finally we're saying, you just moved up to rank No. 3, you get a coupon. In the background of all of that application, the City of Portland is asking who's responding to what kind of programs under what kinds of conditions, and what difference is it making -- because if there's any way that I can get community participation and stop the need to build additional infrastructure, while improving the local habitat, it's a huge win for everybody. So in this case we're working with people who've traditionally done transportation and environmental infrastructure to eliminate the need for infrastructure. >> Roy Hall: And so this really goes back to the first slides about modeling. In this case, all the water models are very well-understood engineering models: we know how water runs off, we know things about how it rains, we know how big we need to make pipes, we know how much all that stuff costs. And then there's the policy: where do you want to go, what do we want to change? And the whole part of how do you start to influence people to move that way, what can you make available -- and then modeling what they might do, predicting what they might do, getting continuous feedback so you can better craft programs and policy in the future. So it's really around long-term projects that involve an evolution of what you know and what your models are over a very long time span. >> Danielle Forsyth: So a lot of what we were doing is working on policy and behavior, as Roy said, and anything that involves people doesn't have that engineering formula that we all know and love. People do different things under different conditions, and they respond -- it turns out they respond 27 percent better if asked to participate in a campaign by somebody who is not a politician. And that's pretty difficult for the politicians to understand -- it turns out that elected officials don't have a lot of credibility when they tell you what to change on your house. So those little things are things that you actually want to build in so that your campaigns can be more effective. Whether they do or not, that's a different story. This is a project that we're doing across the state of Washington. And -- did you have a question? >>: One question is: How stable is that number, 27 percent? >> Danielle Forsyth: Yeah. >>: Over the course of time or in different geographical -- >> Danielle Forsyth: One campaign. You know, and that's what you want to do: you want to build that in as a property and then constantly evaluate it -- under what situations do politicians have higher credibility? And what we do is we build baseline models and we use them as evaluators. So if you think you're going to get a 7 percent response to a direct mail campaign and you don't, what were the other factors? Turns out the City of Portland is phenomenal at sending out mailers during key holiday seasons, you know, and they just don't put that into their thinking. They do now, because they get a much lower response rate to [inaudible] disconnect when you're having a turkey. But -- >>: What is it, people are very different [inaudible] interaction, they respond to their environment, right?
If I find that I'm getting all these mailings at Christmas, I'm busy with Christmas, I don't want to hear about it; then I might start getting them in July; there will come a time, after a certain amount of training, where I'll say, you know what, it's all in July, I don't want to hear it anymore. >> Danielle Forsyth: Right. And I think that's what you're trying to do: build community-centered models, so that when Portland gets a set of knowledge assets and Seattle says, I want to start, Portland's knowledge assets are better to start with than Indianapolis' models, because we've got some demographic and cultural similarities. And that's what you're really trying to do: get everyone to not start at ground zero, and understand some of these interrelationships and their connections. And there are a lot of other considerations in those models. You know, we use rain barrels in Portland because it doesn't freeze. It turns out in Chicago, every degree increase in the heat island costs $150 million in incremental air conditioning costs, according to Mer Daily -- which of course then further increases the heat island. So during this process, they're doing green roofs and water retention for temperature mitigation, not for storm water collection. So, you know, where does the model end? And I think that's the whole thing: can you build them, evaluate them, and start to use them in a way that you start at a better position than just sort of guessing, which is what we often do now? >> Roy Hall: And I think, again, you have a sparse dataset with those kinds of models. You have places where they actually have monitored, and -- particularly with the social models -- it's difficult to figure out what is causal there. You have coincidental models: we've statistically seen that these things are correlated, but we have no idea which one to push to get the other one to happen. So -- >> Danielle Forsyth: But I think you've just jump-started it and said, a lot of people try to build super models: you've heard about regionalization, you've heard about human activity, you've heard about demographic and environmental sensitivity. So this is an application that we're doing -- I picked one for the State of Washington -- with our partner HDR, and it's an offset credit purchasing system, but the offset credit purchases are all environmental. So if you're going to put a Wal-Mart in wherever, you've got to buy offsetting water credits. And those water credits have to be matched up against one of the fish boards' environmental objectives. So what's happening here is you've got a buyer, the Wal-Mart owner; a seller, people who own land that can be used as offsets; the EPA, who's trying to make sure that we actually adhere to some of the environmental regulations; and all of the fish boards that are trying to make sure that they regionalize a lot of those rules. And so what you've got is different interaction models across that system. But what you're allowing them to do is trade fluffy stuff and evaluate what the impact of doing that is. And so these kinds of systems are incredibly rewarding in that, if you can start to show community impact, you can start to get people to put their own properties in and look at: what options do I have for my property? Could I put the back 40 up for mitigation? Would it cause improvement? Would it get us a new park?
So they allow us to sort of bring some critical thinking skills to people who wouldn't necessarily consider participation, by kind of right-sizing the information. So I've been talking about what we call composite Web applications -- they're mashups in the enterprise. And we're not always the keeper of all of the visual piece. We're often participants on a site where you've got a visual component that's interacting with a lot of textual components. But what we've found in all three of these applications is that there are three key capabilities that all of them need. And it doesn't matter if it's a SharePoint site with a couple of visual components -- and we've done that many times -- or if it's a dedicated interactive visual Silverlight site. It makes no difference. What is underlying all of those things is, first, the need to know information governance and trust: I need to know lineage, provenance, pedigree, whatever you want to call it, so I can determine whether or not that information fits for my own analysis. Second, I need the ability to do rich annotation, so that I can actually capture what I'm observing -- so that as something new is observed, I can quickly say, under these conditions these people are angry. Turns out, you know, organic grocery stores are really, really great places to offer environmental campaigns, but that only hits about 7 percent of the market right now, and you've got to be able to find out where else people might be open to participation. And the third area is what we call abstraction, which is allowing people to see information assets that other people have from their own user-defined perspectives. So a lot of people are talking about SOA environments. We're talking about semantic SOA environments, where you allow people to not only use data services, but evaluate the integrity of that service for your particular use, add new knowledge to it so that you can talk about other people's things that you don't have, and see new information assets from perspectives that are user defined. Yeah. >>: [inaudible] the knowledge on what they share [inaudible] adding comments to [inaudible]. >> Danielle Forsyth: I'm going to show them to you. We really work hard to bring as much connection as we can -- so it's talking about a URI or URL; to bring context, so that it hangs off of some sort of conceptual layer; and to bring connection, which allows you to say, under these circumstances these things happen. So, no -- this idea of more and more sticky notes, more and more tags: those are all very brittle for us, because they don't give us enough context to determine fit. >> Roy Hall: I think one of the things that goes with that is there's always a terminology in which you express things. And, again, we try to start with high-level abstractions that people can mostly agree on, but if you can't find the word or the expression or the kind of property or the kind of relationship, you extend the dictionary of terminology and you place your term: it's a specialization of this, it's related to that, and then it falls into the -- we do all our search through dynamic inferencing, which then lets you say, in my world, things look like this; when I say this term, it means this, it maps this way. And so you're really both adding information and extending the definition, the terminology that's used. >> Danielle Forsyth: Which is why you want to put different understandings in different namespaces or dictionaries.
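A minimal sketch of what "different understandings in different namespaces" can look like: a shared community graph plus a personal dictionary graph that anchors a new term into it, using rdflib named graphs. All URIs and terms here are invented, and the subclass walk is a crude stand-in for the dynamic inferencing Roy describes.

```python
from rdflib import Dataset, Namespace, URIRef
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/")
ds = Dataset()

# The community's shared understanding lives in its own named graph.
community = ds.graph(URIRef("http://example.org/graphs/community"))
community.add((EX.Dancing, RDF.type, RDFS.Class))
community.add((EX.Club9, RDF.type, EX.Dancing))

# A personal dictionary extends it without touching the shared model.
personal = ds.graph(URIRef("http://example.org/graphs/danielle"))
personal.add((EX.SalsaDancing, RDFS.subClassOf, EX.Dancing))
personal.add((EX.LaRumba, RDF.type, EX.SalsaDancing))

# Search the union of both graphs, walking the subclass closure
# down from Dancing (the closure includes Dancing itself).
merged = community + personal
classes = set(merged.transitive_subjects(RDFS.subClassOf, EX.Dancing))
venues = {s for c in classes for s in merged.subjects(RDF.type, c)}
print(venues)  # Club9 and LaRumba, seen through the personal dictionary
```

Policy -- who can see, read, or write which understanding -- would then hang off the graph identifiers.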
Our end-users talk about having personal dictionaries where they keep terms and relationships and anchor them into some community understanding. So that, you know -- when we were at the Social Computing Symposium, there was a young lady who was talking about salsa dancing, and you can imagine that salsa dancing has a whole different set of nomenclature than what you're talking about with just dancing, or with just social events, or with just any kind of teenage activity. And what you really want to do is allow people to talk richly about what they're talking about, but not necessarily create these uber models. >>: So does this mean that you have essentially a meta model that includes such concepts as you just alluded to -- is an instance of, contains, abstracts -- >> Roy Hall: We use an OWL syntactic semantic model, which does have those concepts. We've done a number of our own. We've found a lot of what comes out of the semantic community is really rooted in AI, and it's about machines being able to understand and identify. People work a little bit differently. We're really good at patterns. In one sense, I can discover a thing is a kind of coffee cup or tree or whatever by looking at its properties; people work in a much different way, where I say it's a tree, and I use that label really to assign a number of properties. So we kind of use this mix-and-match approach to semantics, which gives us that basic structure. And then we'll take things like a map. Well, regardless of how you want to put things on a map, it comes down to having a location and perhaps an area. In some databases it's an address, and that needs to be mapped through some service to a location. In some databases, it's something else. So we really talk about what are the abstractions we need to derive all the visuals and the interactions, and then we put semantic mappings around that to mix and match whatever the sources are, or to derive data in places where it's data poor. >> Danielle Forsyth: So two key points. One is, when you share a language with a community, you can use categories or classes or whatever it is that you want to call them -- concepts. And I'll talk a little bit about OWL and RDF. The second place is where you don't share a language. If I put a class on something and say, those are really hip nightclubs, well, my 21-year-old doesn't think they're hip nightclubs. I think they're hip nightclubs. So if I put a lot of properties on it, he's then able to derive whether or not that actually is a hip nightclub -- and I guarantee you it isn't when he looks at that set of information. Sometimes a very descriptive "dog"-type label is going to give you a lot, and sometimes you want to be able to allow people to derive what they think of as a very hip nightclub. So, in the semantics world, for those of you who don't speak RDF and OWL: RDF is the syntax. We talk about articulating triples, and it's a W3C XML format that allows us to describe triples -- subject, predicate, object -- very straightforward. The problem is it's generally insufficient for describing context and perspective. So if Danielle knows Roy professionally really well, how do I say that? So within our system we have to deal with the triple, the namespace, and what are called reification properties -- and if you don't understand that, it doesn't matter -- so that we can deal with confidence and relevance and ambiguity and security at the triple level. On the model side of the world, OWL is the common machine-readable form.
It's another W3C standard, and OWL 2, which is just about to come out, is much richer. And what it describes -- and this is going to be a gross simplification for those of you who actually know about this world -- is classes, data properties, and object properties, which to our community we call categories, properties, and relationships, because we've yet to meet any normal person who understands what an object property is. And so really, at the bare bones, you've got categories, like humans; you've got properties, like height equals; and you've got relationships, like is married to. And that's kind of the fundamental structure that we work in. If you want to talk more about that, we can, but if I got into transitive and symmetric properties, probably everybody would leave. So in our world, where we're talking about annotating or adding value to information, we recommend that you actually have a SOA environment where you can talk about things that have a URI or URL; some way of extracting metadata in a rich enough form that we can infer what you meant by what you said; some verifiable service -- we're not wedded to any particular approach; we've seen an awful lot of chatty SOAs in our day -- and a user interaction that allows for a Web-based connection to the back-end system. And we put these kinds of things up here because, when people say, I want to talk about other people's stuff, you want to be able to keep a pointer so that you can keep a rich description, know what you meant by what you said about that particular stuff, and share it with others. So -- just because we want to leave lots of time for questions -- we ended up packaging not only the knowledge base but a set of knowledge services into horizontal services that we envisioned you could deploy in an enterprise. So imagine if, across Microsoft, everybody had annotation services that let you talk about other stuff and easily contextualize it. And because we're onto-geeks, we used ontologies to describe the structure of the service. And we put tools in place so that you can deploy these services horizontally. So we have three services: the first is lineage, the second is annotation, and the third is abstraction. And they're intended to allow you to talk about other people's stuff. And when I do this demonstration, there's a different dialogue that goes on, for instance, with the Social Computing Symposium, where people are talking about their stuff as it relates to the symposium, versus people who are organizing the symposium talking about the participants and what they're bringing and how well prepared they are and all of that, versus Microsoft looking at the symposium and saying, we're spending our money on what, and getting what out of it, and how does it contribute to our long-term agenda -- in research, or in community building, or in standards setting, or whatever it is our particular objectives are. And so this idea that you're seeing things from different perspectives, but allowing people to add and filter comments -- so that Lili can see stuff the community participants don't see -- is really kind of key. And so I'll just walk through this really quickly. The whole idea is that we've got these knowledge services that can talk to internal and external information sources, and be able to connect up SharePoint content with Web sites of interest, with other things, where you say, I looked at this and it conflicts with our internal understanding of these kinds of things.
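To ground the OWL terminology from a moment ago -- categories (owl:Class), properties (owl:DatatypeProperty), and relationships (owl:ObjectProperty) -- here is a minimal rdflib sketch using the same examples; the ex: names are invented:

```python
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, RDFS, OWL, XSD

EX = Namespace("http://example.org/")
g = Graph()

g.add((EX.Human, RDF.type, OWL.Class))                 # category: humans
g.add((EX.height, RDF.type, OWL.DatatypeProperty))     # property: height equals
g.add((EX.height, RDFS.domain, EX.Human))
g.add((EX.height, RDFS.range, XSD.decimal))
g.add((EX.isMarriedTo, RDF.type, OWL.ObjectProperty))  # relationship: is married to
g.add((EX.isMarriedTo, RDFS.domain, EX.Human))
g.add((EX.isMarriedTo, RDFS.range, EX.Human))

# An instance that uses all three:
g.add((EX.Alice, RDF.type, EX.Human))
g.add((EX.Alice, EX.height, Literal(1.68, datatype=XSD.decimal)))
g.add((EX.Alice, EX.isMarriedTo, EX.Bob))
```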
So lineage has a lineage ontology. That's what they look like, and we document them really well so people know what we mean by what we say. And if you want to geek out, we can do that. Generally speaking, it gets presented in something very temporal. I didn't put anything very good in there. We have an annotation service, and it's got a very simple ontology. It tends to be presented in a much richer way, so that people can make interesting annotations. And then we've got an annotation summary dashboard that we're just doing, that says, how is our understanding of these things changing? As we start looking at the social interactions, you can't read every blog, you can't keep up with every wiki. You really want to know: how is that community's understanding shifting, who's becoming more of the expert in this field, how was the expertise gained, where did it come from? And so we're pursuing a number of ways of presenting this picture of change, based on the capture and contextualization of knowledge in different kinds of communities. And finally, abstraction, our third piece. What we've got here is the abstraction ontology. We're geeks, so we've got an onto-ontology that documents the ontology. This is using our lineage ontology to show you the changes in the ontology. And this is actually one of the things we do whenever we develop a semantic model: we start out with the people we're working with and say, what questions do you need to be able to answer? Because the only way to evaluate the effectiveness of any kind of semantic model is to know whether or not you're answering the questions people ask. And this is the part that's always missing when you pick up somebody's ontology. You sort of say, oh, you've got a time ontology or a threat ontology -- what threats were you worried about? Under what conditions? And it's so difficult to reuse these models, because you don't have an understanding of really what people meant and why they did it, or where you can anchor extensions in the model. Finally, this kind of information gets presented in ways where datasets can be defined by users and axes can be defined by users. So: I want to see your information as it relates to our expertise in a new field. And what I mean by expertise is how many patents are we getting, how many newly minted experts did we hire, but also what's the quality of their publications, as rated by both their peer group and the industry users of that particular information. You start to get into the softer stuff with: what do you mean by quality? And that's where you start to say, well, quality to me means this, but it may mean something very different to you. So I'm going to let Roy describe what that stuff looks like under the hood, and then we'll demonstrate and hopefully... >> Roy Hall: Okay. This is really short. We have a publisher server -- >> Danielle Forsyth: Thetus Publisher. >> Roy Hall: The Thetus Publisher server, as opposed to Microsoft Publisher. And the server is both an ontology store and a dynamic inferencing engine, and it includes components for distributed workflow serving. When people build applications, we have low-level communication with the publisher -- with the Thetus Publisher -- a client SDK, and we overlay this with knowledge services. The big thing about knowledge services, to us, is that they're based on some relatively high-level abstraction; they can be extended any way you want to extend them, but all the tools that are built to work on that service understand the extensions.
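A rough sketch of the parameterized-node idea behind the workflow piece -- typed inputs and outputs, with an "impedance" check before two nodes are wired together, as Roy elaborates below. This is just the shape of the concept, not the Thetus SDK; all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Node:
    name: str
    inputs: dict[str, type]       # parameter name -> expected type
    outputs: dict[str, type]
    run: Callable[[dict], dict]   # consumes inputs, produces outputs

def can_wire(upstream: Node, downstream: Node) -> bool:
    """Impedance match: does upstream produce what downstream consumes?"""
    return all(upstream.outputs.get(name) == t
               for name, t in downstream.inputs.items())

extract = Node("extract_keywords", inputs={"doc": str},
               outputs={"keywords": list},
               run=lambda p: {"keywords": p["doc"].split()})
route = Node("route_by_keyword", inputs={"keywords": list},
             outputs={"queue": str},
             run=lambda p: {"queue": "water" if "water" in p["keywords"]
                            else "default"})

assert can_wire(extract, route)   # the types line up, so wire them together
print(route.run(extract.run({"doc": "storm water runoff"})))  # {'queue': 'water'}
```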
And we provide those knowledge services as a way for anybody else to build any other kind of service they want to build, or any other kind of interface. You attach yourself to whatever sources you have on the outside by whatever mechanisms are available for those sources. And the application server kind of wires that together and serves the Web experience. This is our client-side ontology data store, if you will; knowledge services are built on top of that, which again bring the semantic part out to the client; and then you build the visual components on top of that. And that's about it. Do I have another slide here? >> Danielle Forsyth: That's just the sort of five pieces to make it... >> Roy Hall: Oh. Right. And so if you want -- yeah, Danielle prompts me sometimes. If you want to build one of these, what do you need to do, what do you need to talk about? Well, the first part is application and system design: you need to know what people need an application for. We have a number of patterns that we provide, we have a number of components. But, you know, systems get designed in a lot of different ways. You're going to be talking about where information is coming from, but more importantly, what the abstract models are and how you're going to map information into them. You're going to talk about various models that are going to do workflow bits to digest new information, track changes, things like that. And whether -- whether and how and where -- you're going to use the knowledge services we provide, and where you're going to extend them and provide your own knowledge-based services. Then there's developing the semantic model again. That's: what are the high-level abstractions about the things that you're doing? We believe that for the most part people can't tell you exactly what they need to do, and nobody agrees at the very low level about what they really want to do until they start to use it. But at a high level, people can give you a pretty good idea of what needs to be done. So we really focus on the high-level model, and because of the semantic underpinnings, it can be extended as you go. And a key part is to have those semantic underpinnings. Finally: user experience design, integration with the data sources, workflow design -- we have a large number of tasks that we provide, and a very standard way to build little black boxes and say, this is what goes in, this is what comes out, and do the impedance matching between those boxes so you can figure out whether you can wire them together, whether you need converters, and how to extend the types of information you pass around so that it's specific to your application. >> Danielle Forsyth: So I'm going to show an application that we never released to anybody ever, but it was just sort of relevant to what we were doing with Microsoft, so I thought I would start out in IE with Virtual Earth. And what you're looking at here is -- the application framework was developed for something totally different, and I stuffed our social computing information into it just for fun. So what you're looking at up here is: I have logged in as Danielle, and I have, in this case, six different personalities. And these are conceptual layers that I defined, of things that might be important to cabbies, to realtors, to public infrastructure folks. So right now I'm in the cabbie layer, and I'm going to say I'm interested in fancy nightlife. And you can see I pull up a number of places.
This one's particularly interesting to me, because fancy nightlife in the case of a cabbie is Pizza Express. And what we've got happening here is we can see that Pizza Express is one of the places of interest in that conceptual layer, but we can also say, what have people said about Pizza Express? And what I'm doing here is I'm actually bringing in annotations that were grounded in that conceptual layer. And if I did this well, I would actually bring up concepts that better describe stuff. So let's go ahead and add a new annotation to this guy. When I add a new annotation -- excuse me, I can barely see the screen -- whoa. Sorry. We'll go back to Pizza Express and see him on the other side of the world. Again, note to self: wear glasses. So I go back to Pizza Express and I say I'm interested in adding an annotation. Have to get off them. And I'm going to edit the annotations and create a new annotation. And it gives me back an annotation that says Herb likes it, and I'm going to say, okay, I actually want to turn that into a relationship graph, and I'm just going to put it over here. And I'm not recommending this UI for this particular purpose, but I want to be able to say -- and this is not a semantically appropriate triple, it's just describing relationships -- but: Herb likes it, and it turns out that Herb is a person. And so what I've done is, I've actually, right there in place -- and if I was doing this for real, I would have extremely appropriate stuff for each of the concept layers, but I just grabbed a bunch of stuff kind of loosely. You can see down here that the ontology is actually driving a lot of the subclass stuff. So depending on how well you categorize stuff, you can start to get concepts that are much higher fidelity, in place, right there. And in one of the applications that we're just doing, you can just describe things like the hours and turn them into a pattern. So I now know that this is fancy nightlife for cabbies. I can go ahead, describe the properties of that particular place, and turn it into a pattern, so that that particular pattern applies to anything else that's like it. And so this idea that I'm now starting to derive tacit knowledge that's contextualized allows us to see what we're observing and when we're observing these things. And so I can look at it in a timeline, I can look at it in a connected view, I can go ahead and say I want to see this as it relates to model-based views, I can center this guy up and actually see how this thing relates. If I had defined relationships around this, I could look fairly closely. So if I take a look -- let's go back to the map-based view -- and note to self, larger font -- and switch to a public infrastructure view, I'm going to see concepts that apply to public works. And these don't necessarily have anything to do with any particular data layer. These are conceptual layers. And we have people who define conceptual layers by all of the properties they have, the sources they come from, whether or not somebody's actually reviewed the material. So I might look at peer-reviewed research papers, for instance. And to me, peer-reviewed research papers are papers that have been reviewed by a peer I trust, versus something that just happened to get published in whatever publication. But I'm going to switch over and take a look at the social computing layer. And it turns out that when I turn this particular layer on, we only had one participant from London in the Social Computing Symposium.
And when I take a look at that particular individual, it's Alexandra. And I don't think I have -- I don't have any annotations about her, so let's go ahead and take a look at Alexandra in the model view, and let's center the world around her, and we'll move further to the center of the world. And what we're looking at here is -- you can see that there aren't a lot of relationships that relate to her. So let's pick something -- well, let's see actually what relationships do relate to her. She -- let's see. She directly links to Molly -- and you can tell me if any of this is right, Lili; I don't know. She holds a Twitter account. And she's got a home page, she's from London, and she's attended the Social Computing Symposium, clearly, because that's how she got in here. And when I expand this node, I can actually see that all of these things relate to everybody, for obvious reasons. So when I go in here -- I didn't attach a mouse to this, sorry. So I'm going to have -- Roy, could you get me a mouse? Because otherwise I'm going to get [inaudible] oh, this mouse. Actually did this on my own laptop and had to bring my own mouse. So when I look at Yuri over here and I want to expand Yuri by one node, I can see that Yuri actually works for Google and is based in Mountain View. And when I look at Yuri, I can see that I've got some annotations on Yuri. And this annotation relates to -- let's see. He moved from Finland, apparently. I think that's probably true. And as I start to expand on Google, you can see that the things that I learn about Google are that they have a group called -- has a group, YouTube -- and you're looking at a very, very short development process for this particular not-very-good application -- a group called OpenSocial. And that's all I get to know about them. So I'm going to switch views for a second. I'm going to go into a corporate view, and I'm going to use a slightly different model. And when I turn the corporate view on, I still see all of the same players; they're marked a little bit differently, and of course I have the ability to take a look at all this. But I'm going to look at Yuri and I'm going to say I'm interested in expanding in one degree. Let's get them -- let's center them. That's better. And when I expand in one degree and I start to look at what's around him, I can see that he works for Google. But in this case I have a different set of relationships that apply. He has relationships with Microsoft and Yahoo! that involve competition -- though I didn't set a reification property on it, which would be a strength of relationship. And this idea that there are different concepts and relationships that exist in different namespaces, with different policies around them, allows us to see different views of information depending on what personality you're in, what rights and privileges you've got, and what model you're using. And it allows us to see filtered views, where we say: too much, I just want to see the top of whatever my ranking scheme is. So what we've done here is taken a scraped spreadsheet view -- so I didn't even have the Social Computing Symposium data, so we're working against a subset -- and, where it says -- you can't say this took a day, because actually the data scraping took a little bit longer. But in a couple of hours the guys took the ontology, overlaid it, and said, here's the views.
So all of these tools are just in play, and they allow you to see information -- if I actually put some detailed time into the ontology -- from very different perspectives. And our belief is this relates to any kind of content, any kind of ongoing understanding of how tacit knowledge shifts and changes, and really just sort of ongoing capture of what we're learning and why. And so that's what we're doing. We're doing this on a much larger scale than the Social Computing Symposium, and our big thing is: what are the connections? When I see something that conflicts with something else, what else does it affect or change? How is my understanding shifting over time? And can I allow communities to see stuff that's relevant to them from a perspective that matters? So that's what we do. Any questions? Yeah. >>: Thank you for a very interesting presentation. My major concern is how one derives these -- as you call them, lenses; in other contexts they're called filters. How does one derive them in such a way that one is not subject to error? There's an issue of impedance. Like your example with your son and the club -- that's a perfect example. >> Danielle Forsyth: I think the reality is that we start every system we ever do with the idea that it's wrong. And the thing that's beautiful about semantically based systems is you can do two things: you can improve it as you go, and you can understand how you're improving it and for what reason. And I think that's the beauty here: these systems are designed to change. There isn't this sense that you're building a model and the model is right. The idea is that you want to build a model, and put policy around it so that it doesn't change when you know stuff's pretty consistent. You know, igneous rocks haven't changed a lot, so it's okay to describe them with a pretty consistent description and have people anchor into it. But when you talk about involving communities that deal with water -- the snow guys, the polar icecap guys -- all of those guys would never agree on a model. And they all are evolving their own understandings independently. And so you just kind of have to start with -- I always start with the idea that I get 50 percent right, and it gives me a baseline for assessment. And as somebody tells me it's wrong, I ask why, so that I can contextualize the change. And I think that's the big thing. You know, "it's not a hip nightclub, Mom, because live jazz sucks" is the answer I would get. >> Roy Hall: I think there's a little bit larger answer to that, which is -- say what? >> Danielle Forsyth: I'm a small thinker, big thinker. >> Roy Hall: Small thinker. There's an aspect to the answer, which is: you have information out there that's been labeled in many different ways by many different people, right? And a part of your understanding of that information is an understanding of how that labeling is biased relative to your point of view -- being able to say, when this person labels something this way, what they really mean is this. And so you can really do semantic mapping that says: this label, this category, by this person, correlates with this set of properties. Which then means I can say, I want to define a category of stuff I'm interested in, and this category has these properties associated with it -- which is really our kid wanting to say his hip nightclub has these properties associated with it. Somebody else has defined this as a hip nightclub, but in their biased perspective, that's what hip is.
When I search, I want to search against properties, not against whether somebody else labeled it as a hip kind of place. >> Danielle Forsyth: Well, and if you look at a lot of the social computing forums, you've got a lot of context-free tagging. And so what you end up with is keywords, which ends up not giving you a picture of -- you know, you get a picture that there's a lot of that keyword, but you don't get a picture of what somebody means by that keyword. And it was kind of fun to watch. The New York Times put up a tag cloud yesterday, and they gave you the choice of choosing one of the tags or actually pulling your own tag in. And all of the big tags were chosen, which I thought was sort of interesting, because people are more comfortable choosing off a list than actually coming up with a word that described what they were feeling about the election. And yet at the same time, the only context we had was: are you an Obama supporter or are you a McCain supporter? And it turned out, if you looked at the red and blue words, they were quite different. But that's about all you got from it. You didn't get some, you know, "I'm exhilarated about the win" or "I'm exhilarated about the future." >>: But I'm concerned with -- it sounds as though your approach to building these semantic models or lenses is you go in and you talk to people, and you're hands-on, and you have a feedback loop and so forth. But it's very labor intensive. >> Danielle Forsyth: No. We actually teach people how to do it. And we try to teach groups that involve at least some nonliteral thinkers. One of the things that we've learned in building a lot of semantic models is there are some of us who don't need instance information, and there are some people who cannot describe a thing unless they're talking about a specific thing or a specific piece of information. Of the modeling staff, we've got very few engineers and computer scientists who are good conceptual modelers. So you actually find that you get great conceptual modeling done when you bring a combination of somebody who knows the logic of semantic models with subject matter expertise. And, you know, quite honestly, a little ontology goes a very long way. There's, like, I think 12 nodes in the ontology that I've built for displaying the social computing stuff. There's nothing really there to speak of. And I've built a lot more, but I just used a part of it, because it was sufficient for me to express a couple of key concepts. So the other thing that's nice about most of the semantically rich systems is, when you discover a property that's interesting -- "has young single females," for my son's hip nightclub -- you can add that into your belief system as a property. And you can give it things like reification properties, like rank and other kinds of weight. So you don't have to get it right, I guess, is my point. But clearly the semantics people and the schema people are not the same people. >> Roy Hall: Yeah. And I think, you know, there's a part where you say, how do you define the lens? Well, we work on starting points. We don't define the lens. You define the lens. When I come to this and I want to look at things, I create my own lens; I start to talk about what's important to me. And depending on what I'm doing, I may have eight or ten different personal lenses that I use. And they evolve over time. And, yes, sometimes I miss things, sometimes I get stuff I don't want, and I continue to refine my lenses. I do that.
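The property-defined category that keeps coming up -- the "hip nightclub" example -- reduces to testing candidates against the properties a given lens requires, rather than trusting someone else's label. A minimal sketch, with invented properties:

```python
def in_category(venue: dict, criteria: dict) -> bool:
    """A venue qualifies if it satisfies every property this lens requires."""
    return all(venue.get(prop) == wanted for prop, wanted in criteria.items())

# One labeled venue, two lenses over the same data:
venue = {"label": "hip nightclub", "live_jazz": True, "closes_late": True}
mom_criteria = {"live_jazz": True, "closes_late": True}
son_criteria = {"live_jazz": False, "closes_late": True}

print(in_category(venue, mom_criteria))  # True: her kind of hip
print(in_category(venue, son_criteria))  # False: the live jazz disqualifies it
```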
One of the things about semantics is it really kind of gives you a tool to talk about closeness, so you can say, well, you know, I thought this would be a good lens, I'm not seeing what I want, expand it a little bit, show me what's around it, and let me refocus it. So it really gives you the opportunity to do that. And I think that's part of it -- you know, it's up to you to evaluate whether your lens is a good lens. If it doesn't give you the information you need, it's not a good lens. >> Danielle Forsyth: This is one of the big things about the standards: they give us a vehicle for publishing lenses that are information dependent. And one of the things that we've found is that unless you publish them with a rich description of what they were intended for and what the points of expansion are, you actually provide someone with something that is near meaningless. If you look at some of the public repositories where a lot of ontologies are parked, you kind of look at them and you go, I wonder what questions these people wanted to ask, you know, why was this thing done. And so I think that, you know, the community at large is looking at better publishing metaphors so that they can make these ontologies available and you can start with something that's close enough. And there are some really good ones around cultural, political, and social modeling. The base ontologies for time and space and all those kinds of things are in good shape. We tend to find that, you know, by putting a bunch of them together you can get a partial solution and start to just work on the parts that you think are your unique problem. >>: I'm just concerned about scalability, right? I have 320 million users I'm concerned about. I cannot go, even with this whole room full of people, one by one asking them questions. I need to somehow, in an automated fashion, create these ontologies that are appropriate to the person, and, as you say, each person has at least six or eight or twelve ontologies of their own, and those may even differ from day to day. >>: [inaudible] >> Danielle Forsyth: Are you talking about people who you want to characterize themselves, but you need to kind of get them to tell you about what they care about? >>: Yes. >> Danielle Forsyth: So I actually thought about that coming here: I was thinking about SharePoint, and I was thinking about research papers, and I was thinking about Live Search. And there are a number of ways that you could actually build a characterization ontology. You know, we do this around market mitigation and any number of other things. We ask people questions, and you give them not yes-no, you give them sliders: how environmentally sensitive do you think you are? And you start to get people to sort of -- we do this -- I mean, your HR department does this all the time, to sort of put you in quadrants and decide on all kinds of stuff. But, you know, this idea of -- I don't want to say profiling, but characterizing people around interest spaces -- which sites they go to, how much time they spend online during the day, how much are they willing to tell you -- so that you can suggest models of interest, and they then give you some feedback and say, that chess club thing was just way out there.
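A minimal sketch of that slider idea, under stated assumptions: answers become a self-characterization vector, and a simple similarity score suggests nearby models of interest. The dimensions, the candidate communities, and the cosine scoring are illustrative only, not how any production system is built:

```python
# Minimal sketch: slider answers as a self-characterization vector,
# used to suggest nearby "models of interest". Hypothetical throughout.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Slider answers on a 0-to-1 scale, e.g. "how environmentally
# sensitive do you think you are?" -- not yes/no questions.
user = [0.9, 0.2, 0.6]  # (environment, nightlife, hours online)

# Communities/models characterized along the same axes.
candidates = {
    "carbon footprint group": [1.0, 0.1, 0.5],
    "chess club":             [0.1, 0.0, 0.9],
    "live jazz venue list":   [0.2, 0.9, 0.3],
}

# Suggest the closest matches; "that chess club thing was way out
# there" is feedback that adjusts the profile or the characterizations.
for name in sorted(candidates, key=lambda k: cosine(user, candidates[k]), reverse=True):
    print(f"{cosine(user, candidates[name]):.2f}  {name}")
```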
And I think what you're trying to do is better characterize over time, so you can start to -- and I don't -- you know, I'm very careful about profiling and pushing, but this idea of suggesting things like that that might be meaningful. And I actually think you can do a lot of really interesting things in allowing people to self-characterize for better ad participation, or just better social participation. >>: So the other question I have is, when you mentioned nearness and you mentioned [inaudible], what does that mean? How do you define it? >> Roy Hall: It's just -- it's defined a lot of different ways. And really, when we talk about building an ontology and you say, you know, I have a class of thing I'll call people, and then I'll group people in different ways, you really need to assign what you mean and whether those groups are close or not to you. There are a number of different distance metrics we've tried, and certainly you can do degrees of separation. But degrees of separation tends to be pretty inadequate, because in places where the space is highly populated, you'll end up building a lot of nodes and categories, and in fact those things are all pretty close to each other. Whereas in sparse places, if you go by degrees of separation, you'll identify something as close which is actually conceptually pretty distant, but you never really had to describe a lot of stuff around it. So, you know, that single node was good enough.
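One hedged way to picture why raw degrees of separation mislead is to weight each hop by how densely modeled its endpoints are, so a step inside a richly described region costs less than a step across a sparse gap. The toy graph and the edge_cost formula below are assumptions for illustration, not the metric Thetus actually uses:

```python
# Minimal sketch: hop count vs. a density-aware distance. Hypothetical.
import heapq

GRAPH = {
    # Densely modeled region: many fine-grained, conceptually close nodes.
    "snow":    {"precip", "ice"},
    "precip":  {"snow", "rain", "hail", "sleet", "drizzle"},
    "rain":    {"precip", "drizzle"},
    "hail":    {"precip"},
    "sleet":   {"precip"},
    "drizzle": {"precip", "rain"},
    # Sparsely modeled region: one hop spans a big conceptual gap.
    "ice":     {"snow", "geology"},
    "geology": {"ice"},
}

def edge_cost(a, b):
    # Steps inside densely modeled regions are cheap: lots of detail
    # nearby suggests the neighbors really are conceptually close.
    return 2.0 / (len(GRAPH[a]) + len(GRAPH[b]))

def distance(src, dst):
    """Dijkstra over density-weighted edges."""
    heap, seen = [(0.0, src)], set()
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            return d
        if node in seen:
            continue
        seen.add(node)
        for nbr in GRAPH[node]:
            heapq.heappush(heap, (d + edge_cost(node, nbr), nbr))
    return float("inf")

# Both targets are 2 hops from "snow", but they are not equally close.
print(round(distance("snow", "drizzle"), 2))  # ~0.57: dense neighborhood
print(round(distance("snow", "geology"), 2))  # ~1.17: sparse neighborhood
```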
>> Danielle Forsyth: But getting it wrong actually has a lot of learning opportunities too. And I think the thing is -- for instance, we work with a whole family of General Dynamics procurement risk applications, and, along the lines of your question about bias, if you think about process modeling, we have descriptive models, and in a sense what a descriptive model is, from their perspective, is what they expect to happen in a process. So say they're rolling out thin client workstations: they expect this to happen in this order, they've made a bunch of assumptions, and they want to know when things are happening differently, and whether or not the decisions they're making are contributing to the goal or conflicting with the goal. So they actually develop proscriptive models for the things they don't want to have happen. And so this idea of models that are always right is only part of what you're trying to do. Because if you find a model that is only right when you've got one of these, it's appropriate for one of these. It's just not appropriate for some of these. So modeling things that you don't want to have happen, modeling things that don't work -- it's a much more fun modeling workshop. You don't have to get it right. >>: So in your approach, do you take a probabilistic approach, or is it completely right or wrong, binary? >> Roy Hall: Well, it -- again, it's right or wrong for what? There's -- [multiple people speaking at once] >>: You talked about papers from authorities that you trust. And I don't trust people completely; maybe I trust them three out of four times. >> Danielle Forsyth: Or maybe I trust them in graphics and I don't trust them in psychology. >>: But given psychology, given that context, there's still -- it may not be -- >> Roy Hall: Okay. So there are two things that are going on here. One is how do I describe my desires or wants or my point of view, which is around an ontology that's already there. And when I say an ontology that's already there, I mean we've identified that there are some abstract things that exist: there are papers, there are authors, there are reviewers, there are places that they went to school and associations that they have, and things like that. And it's really when you want to say -- to me, I want to rank this reviewer, I want to rank this particular author, I want to rank this author's work in this area. And the ontology is separate from that ranking. That ranking is a bunch of instance data that I created. It's my particular view onto that ontology. And then you process everything through a model to get, really, strength of relationship -- if you graphed it, to be able to do the heavy lines, the light lines, the things you might just ghost in the background of, yeah, they were connected, but you said you weren't interested. I think it's more of a presentation problem at that point than a correctness problem. In the sense that if I let you discover where what you find sits within a global context, highlighting what you find, and then let you say, yep, I nailed it, or say, no, look at all this stuff around it that really is interesting to me that I didn't find, that means you need to change your weightings, means -- >> Danielle Forsyth: It's one of the reasons that reification is really important: your confidence in something is probably ever changing, and watching how your confidence in something shifts over time tells you a lot. You know, Roy and Jim, if you put them side by side, you'd probably think that they're both 3D graphics experts, but Roy's 3D graphics expertise is ten years out of date, and I don't even know what Jim's doing these days. I haven't looked at his papers for a while. I'm not trying to -- hopefully you're doing something really cool. But I know he is. So it could be that some metric is really good until you're some number of years out of date, so the metric itself is self-declining as a result of time away from the community. So there's just this whole thing of change. It's like, yeah, you've got to design it so it changes. >> Roy Hall: I think there's the definition of your stuff, or the expression of your stuff, and then there's the expression of your beliefs: what's the model, what gets me close to a goal, how do I weight things, what do those weighting functions look like. And, again, that comes out of a different ontology that talks about models. And if you say, well, I have a different kind of weighting function I want to use, you add it; it becomes one of the things in the network you build to rank and weight things. >> Danielle Forsyth: You had a question. >>: You say [inaudible] relationship that might be erroneous based on a subset of the data [inaudible] something is outside of [inaudible]. >> Danielle Forsyth: Yeah. I think, you know, when you start with an assumption -- and I showed annotation as a thing that's sort of found in the social computing context -- but when I find out that something is wrong and it was part of my summary base: how is it that I connected that in, and what else was that affiliated with, and who else referenced it, and what model was it tied to. So I think it doesn't matter if it was a data problem or if it's just an ongoing learning problem. As you discover things that no longer apply, you want to capture both that they no longer apply or fit, and why.
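A minimal sketch of that separation: my rankings live as instance data layered over the shared ontology, and my confidence in them self-declines with time away from the field. The Ranking class, the half-life, and the numbers are hypothetical assumptions, not Thetus's model:

```python
# Minimal sketch: per-user rankings as instance data, with trust
# that decays over time. All names and values are illustrative.
import time

YEAR = 365.25 * 24 * 3600

class Ranking:
    """A statement about a statement: my trust in an author, in one area."""
    def __init__(self, author, area, trust, asserted_at, half_life_years=5.0):
        self.author, self.area = author, area
        self.base_trust = trust            # e.g. "three out of four times"
        self.asserted_at = asserted_at
        self.half_life = half_life_years * YEAR

    def current_trust(self, now=None):
        # Exponential decay: expertise ten years out of date counts less.
        age = max(0.0, (now or time.time()) - self.asserted_at)
        return self.base_trust * 0.5 ** (age / self.half_life)

# Trust in graphics does not transfer to psychology: separate statements.
graphics = Ranking("roy", "3D graphics", 0.9, asserted_at=time.time() - 10 * YEAR)
psych    = Ranking("roy", "psychology",  0.3, asserted_at=time.time())

print(round(graphics.current_trust(), 2))  # ~0.22 -- ten years out of date
print(round(psych.current_trust(), 2))     # 0.3 -- fresh, but low to start
```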
>>: Can you also see -- can you look at something in the system, can the system tell you, I bet -- watch out, this might be wrong because [inaudible]? >> Roy Hall: So one of the things that we do have difficulty with when we talk to people is it sounds like we're building reasoning engines and matching engines and all sorts of analytics to assign probabilities and things like that. We do the approved inferences, based on whatever the ontologies are and whatever point of view you've expressed. The rest of it is, we do the infrastructure: how people express models. We don't do Bayesian analytics or things like that. But you can [inaudible] in a box and wire them up really easily. >> Danielle Forsyth: In fact, the workflow network was really defined in a way that you can run multiple analytics and compare the results and do things like that, in working with so many different communities where, you know, they've got different views of the science and what's fit for their application and any number of things. And so in the text world we work with text analytics, we work with image analysis, we work with a number of things. But most of the people who are doing analytical studies have Bayesian algorithms or some sort of [inaudible] that they want to be able to throw into the loop. And you want to know that you ran it, and what the results were, and why that was important to you. But we don't claim to be experts in the analysis field for every discipline that we serve. >> Jim: Okay. Let's thank the speakers. [applause]