>> Ed Cuttrell: Okay. So I think we're going to go ahead and get started.
Welcome. And it's my pleasure today to introduce Andreas Dengel. He's at the
German Research Center for Artificial Intelligence in Kaiserslautern, DFKI.
And Professor Dengel was going to be here, I guess, about a month and a half
ago or something like that.
>> Andreas Dengel: About three months ago.
>> Ed Cuttrell: That long ago? Okay. And unfortunately he wasn't able to
make it then. I think you were ill. Something happened. He's here now, so
this is fabulous. And in particular, on a personal note, you may know
something else about him, is that he's the advisor of the star intern of my
summer, George Bushard. So he's a student of Andreas.
So anyway, thank you. And I'll turn it over to you. This is tools and
strategy for building the semantic desktop.
>> Andreas Dengel: Thanks a lot, Ed. Thanks for coming today. I'm glad to be
here. So I like to move, not staying here.
So it's better for explanations and gestures and so on. So today I'm talking
about the semantic desktop. It's a bit derived from the semantic web. I'll
talk about that a little later on.
I'd like to ask you, maybe Gil knows something, but who knows about DFKI? Some
of you. Well, let me give a very short introduction to DFKI, because it's a
very special institution, not because of our research but because of its
organization. So DFKI was founded about 20 years ago. So as a laser pointer,
you can see it was in July of 1988. And it was based on a competition of the
federal government as a reaction to the fifth generation program, fifth
generation programming systems in Japan.
And also on MCC. And the federal government intended to found a national
center to focus all research on AI on the one hand and, on the other, to build
something that is going to transfer to the industry.
So they thought about not founding another [inaudible] institute or another
[inaudible] institute. You should know these both directions, but to do
something different.
And they asked for proposals and so many cities competed at the time from
Hamburg to [inaudible] to Munich.
Finally, a decision was made that the institute would be founded in Kaiserslautern
[inaudible], but it was founded as a not-for-profit private research company,
which means the main share is given to the industry and they should take care
that after some years the institute runs on its own.
So starting, the government gave some money and only said at the time that five
years should be enough to make the experiment. And after that the institute
should run on its own.
So today we have a third site in Bremen, which was founded in 2005, and since
last year we also have another site in Berlin, because our federal government
had the idea that we should be present in our capital.
So, however, as I told you, as a private company, there were shareholders. And
when DFKI was founded, the condition was that companies had to prove that they
do research on AI, and at the time, late '80s, you can imagine there were the
traditional suspects, so to say, Siemens and IBM and Philips and AEG, which
had a tremendously good research organization at the time.
And there was [inaudible], may be different, but there was also a German
company called Nixdorf, maybe some of you remember, a very
successful company. It was the first SAP, so to say. But as you see today
there's a new generation of shareholders, including Microsoft Deutschland,
Germany, of course. But these are very good partners now.
They are giving lots of research contracts to DFKI, and therefore own some
share while all of them have the same share. DFKI grew a lot. So today we
have more than 310 full-time researchers, which are employed, and another 300
part-time researchers. Most of them are students. Many interns from other
countries.
And it's still growing. This year again it's growing and, well, there are some
customers, just a short collection, ranging from the United Nations near us
from Google from Apple, Microsoft as well, SAP, and then to Japan, many
Japanese companies like Sony, [inaudible], Hitachi are customers of DFKI.
Some results, because I told you DFKI is a [inaudible] institute or company,
whatever you like, because we're just in between, so I'm also a professor at
the university. And all the directors of the DFKI are professors as well. So
that's a bit strange, however.
Transfer means transfer to the academic sector. But also to industry. So we
should consider both. And every five years we are strictly evaluated by the
federal government. And this is kind of an immediate result. Right now there
are 45 professors up to today coming out from our staff.
And we belong to eight networks of excellence in Europe, et cetera, but
also from the economic side. Because we are not working directly at the
market, one of our missions is to provide transfer in terms of spin-off
companies.
And right now there are 42 companies. There are another two which will be
founded very soon, and for that reason we received the German spin-off award in
2004.
And that should be it about DFKI. Maybe now you have an idea about who we are
and what's our mission. And today I'm talking a bit more about our scientific
mission, the semantic desktop, because this is a very important and growing
segment of how to provide a way to semantically interact with information or
how to implement the semantic web, whatever you like to call it, and give an
idea about what we think the semantic desktop should be, how we could build it,
because it's not so easy to do.
So we need to implement several strategies, and also how to integrate paper
documents because I worked for Xerox in the early '90s, and you know about the
paperless office; they're going toward the paper-poor office, but we still
interact with paper a lot, and that is not in our community, but if you go to
industry, they have a lot of paper and it's a matter of fact that we need to
integrate paper into this consideration.
And, finally, I'll sum up and show you about what we are intending to do in the
future.
So what is the semantic desktop? I first start with a kind of definition we
gave, that it's a device in which an individual stores all her digital
information like documents, multimedia or messages, and these are interpreted
as semantic web resources.
So each is identified by a unique identifier, a URI, and all data is accessible
and queryable as an RDF graph. Resources from the web can be stored and content
can be shared with others. And ontologies allow the user to express personal
mental models as their own way how to consider information.
And that forms the semantic glue interconnecting information and systems.
Applications respect this and store, read and communicate via ontologies and
semantic web protocols. The semantic desktop is, so to say, an enlarged supplement
to the user's memory.
So what do I like to say with that? This is just a result of our research we
do on social network analysis. This shows the blogs, the hundred most
important blogs in Germany, how they are interconnected.
This is just a small fragment of the whole Internet. And if you consider the
Internet, the web is our primary means to express, to give information to
others, to society, to our fox-on am I [phonetic], and everything that we find
on the web is input by the human being. However, it's very hard to interpret
the contents of all these information items.
And, therefore, many researchers in the world, including Tim Berners-Lee, tend
to represent semantics of all the information to make the information better
understandable.
But there's a big problem. And the problem is how could we find a respective
vocabulary to all the people, how could we speak about the same? Because
people have different roles, interests, backgrounds, educations, projects,
whatever, and I like to give you an idea how we could solve such a problem if
we start from an individual consideration of information.
So finding the right vocabulary is difficult. I give you this example here
showing you the four [inaudible] and I want to ask you to categorize them
into two categories. Maybe you can help me.
>>: [Inaudible].
>> Andreas Dengel: Other.
>>: Sweet [inaudible].
>> Andreas Dengel: Good.
>>: Round and not round.
>> Andreas Dengel: One with stick, without stick. The one I like, the one I
don't like. The one I bought last week, the one I didn't. There's many ways
to express the view to information. This is a very simple example. And if you
see the daily practice working with documents, it strongly depends on your
specific situation about what you're currently doing.
If you have time, if you have no time to work with information. If you just
consider a document and you would ask a lawyer or a serviceman or a technology
guy what is the contents of the document. It's different. So people think
differently. And if you go to the situation in the office, people use their
own vocabulary to express what information means.
And in major companies, categorization of information is highly supported.
And if you take the [inaudible] the Delphi, the Delphi Report from 2004 and
this showed who classified the information, you can see that most of the
information is classified by an individual.
Because individuals tend to use their own organization, because the
organization represents their own world of information, which is easy to use
for them.
So I want to state some theses: that, first of all, the bondage of formal
organization of information inhibits creativity and limits the option of
self-organization. And, second, the document is not just a piece of paper like
in the past. Today we have a very dynamic object of multi-media contents,
which is changing over time.
So we need a new definition of an archive, because the individual trains of
thought, they always drive the interpretation of the contents. So we have
very perspective views to the contents depending on our demand in the specific
moment or because somebody calls us and so on.
So if we consider even single terms expressing what we mean, it's very hard for
us to transfer the meaning to other people. I just want to take a statement of
Immanuel Kant, a famous philosopher in Germany, who says that imaginations
without terms are blind, or terms without imagination are empty.
So if I would tell you something you have never heard before, you could not
understand. And if I have something in mind, I can tell you, you can see it.
Because we have imaginations, associations in our brain and we cannot send them
through the air to other people. We have to do abstractions via the language.
If I would tell you about my daughter, of course you know what a daughter is.
But you don't know whether it's a small kid or a young lady or how she looks
like because we need all this contextual information when we talk about this.
So this is very important for the interpretation. And this is always reflected
within our mental models. And mental models is kind of a representation of the
past where we have hypothesis and test all the situations we have in mind and
try to reflect the situations to our knowledge.
And that's what we do at our work spaces. So in the office environment we
classify documents according to our specific preferences. We establish folders
and give names to the folders and these folders represent our very perspective
view to the world.
And so if we have these taxonomies available, these taxonomies, however, have
no unique meaning. Assume I would take a folder and I have a very common
example, like with a name vacations. So I store all the documents inside about
my preferences to vacations, so I have something in mind.
If I would just tell the term vacations to somebody else who has plans, who
just came from vacations, she or he must think about something different. And
that's a situation in the office today.
If I want to have a look into the taxonomies or directories of my colleague
because he or she is absent, it is very hard for me to understand what's
inside.
So this is because we think differently. On the other hand, these taxonomies are
also allowing perspective considerations. Assuming you would ask me -- I know
I gave it to Henry, you asked me for my slides and I would send you an e-mail
with the attachment of my slides, you have to ask where should I file them?
Because on an abstract layer, information has a who, where, what, when, et cetera,
dimension, and you could put it into talks, into a new one, maybe my name and
to MSR and Semantic Desktop and whatever and you have to question where to file
it best because you're to recover, remind someone.
But depending on the situation, you sometimes have to ask for who or sometimes
you ask for when or for where. And then you have to remind yourself in which
folder you put it. Or you have a good desktop search, whatever.
But beside this, there's a third issue which is of importance that there's no
intuitive view to this information based on the taxonomical organization. But
we have the e-mail folders and bookmarks, and all these worlds are separated.
There's no way to really associate the information.
So many people, believe it or not, we did some studies, they copy the
information, put them in different folders and different roles to be sure that
they refind it. So based upon these considerations we first of all develop a
specific interface called the personal memory. And this personal memory allows
you to store information with respect to different views at the same time.
So this is not a physical file of information, but a virtual filing. So we are
able to just mirror the explorer in this personal memory and at the same time we
could define appropriate or additional views to information like here
unfortunately is in German.
We have the document view up here. There's the partner and customer view.
There's the organization view. There's a topic view and the project view.
So you could put single files into different folders, but you could apply
information retrieval technology to really analyze how users consider
information.
How do they cluster information into single folders? And so everything that
the user is doing is recorded by the system and all the folders receive a kind
of profile representing the mental model.
Think about vacations, if I were to ask you what do you think if I were to ask
you for vacations? And maybe you think about palms and beach and sea and
whatever. And this is very similar here from the interpretation, of course,
there are mainly statistical methods but what we can store is a kind of
electronic mental model for each of the folders explaining the contents,
because all the documents are considered to belong together.
So the advantage here is that based on that we could not only store documents
with respect to different views, but if new documents arrive, the system could
guess about the relationships, associate the contents of the documents to the
different views.
So this is shown by these question marks here so that a document proposes and
asks you would you like to store it or to file it within this virtual folder.
You just say yes and change the question mark to a hook. As soon as you do
that, the profile of this folder is changed. So the new document is added and
the system learns over time more and more about your perception of the world.
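The folder profiles described here can be illustrated with a small sketch. The talk only says that the methods are mainly statistical, so the bag-of-words term counts, the cosine similarity, the threshold value, and all names below (`VirtualFolder`, `suggest`) are assumptions made for illustration, not the actual DFKI implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class VirtualFolder:
    """A virtual folder whose profile is the summed term counts of its documents."""

    def __init__(self, name):
        self.name = name
        self.profile = Counter()

    def file(self, text):
        # Accepting a proposal (question mark changed to a hook) updates the profile.
        self.profile.update(tokenize(text))


def suggest(folders, text, threshold=0.2):
    """Return (folder name, score) pairs at or above the threshold, best first."""
    doc = Counter(tokenize(text))
    scored = [(f.name, cosine(f.profile, doc)) for f in folders]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical folders and documents.
vacations = VirtualFolder("vacations")
vacations.file("beach hotel palms sea flight")
projects = VirtualFolder("projects")
projects.file("proposal budget deadline review")

new_document = "cheap flight and hotel near the beach"
suggestions = suggest([vacations, projects], new_document)
```

Filing the new document into a suggested folder with `vacations.file(new_document)` changes that folder's profile, which is the sense in which the system learns over time.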
Now, there are many different retrieval techniques included like user feedback
with a plus or minus of the system and many others. So it's explained in the
paper if you would like to have a deeper look at this technology.
So up to now I want to summarize there are some advantages of this basic
approach. The contents of the information objects, whether it's a single
document, whether it's a terminal or non-terminal folder, are expressed in the same
way. So we could compare documents with folders and with higher order folders
at the same time.
And the communication between the user and his model is driven by
conceptualizations, allowing to communicate with your system on the same layer
of your mental models, because the system helps you to associate terms in the
context of the meaning.
So it's very easy to understand your system in a better way. And if you
combine it with a perspective directories, the user also gets an excellent
orientation and access point to the information.
However, there's not yet a semantics. And there are only implicit
relationships among the informations and the terms. And our idea is to have
more expressiveness in this system and we are considering this system as a
vehicle to go towards semantics and show you an example of how this can be
seen.
So assume you are an insurance company and a document would arrive and you have
the choice to store this document in different information dimensions. For
example. You have a dimension of different document types or offers or
invoices, whatever. You have different contacts about persons. You have
events and you have cases, and the system would just guess because of the
experience it has of your earlier filing.
That document may belong or might belong to different folders. So enlarge some
of them. We have certificates, notification of claims, we have different
contacts. We have one event here, an accident. And we have cases like car
damage.
And so as soon as the user would change the question mark to a hook, the system
could use implicit pre-given generic relationships which are generated on the
fly.
For example, that this document is a notification of claim. It further is
generated by a specific person. Or it describes an accident. Or, furthermore,
it addresses the car damage, but you could do even more. So assuming that the
document is belonging to the dimensions at the same time, you could further
initiate a relationship like the person also declares the accident or the
accident implies the car damage.
So along the workflow, other documents could come in and the documents are then
[inaudible], addressing all the same subject generated by a different person
that addresses the same car damage and comments on the accident.
They're just generated on the fly while the user makes its talks. So as a
result we obtain a representation, which is compatible to the semantic web.
Everything is considered to be semantic web resource using the URIs.
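As a rough sketch of how confirming a document in several dimensions could yield relationships on the fly, the following assumes one generic predicate per dimension plus a few cross-dimension relations. The predicate names, the dimension keys, and the document URI are hypothetical and only mirror the insurance example from the talk.

```python
# Hypothetical generic predicates, one per information dimension.
DIMENSION_PREDICATES = {
    "document_type": "isA",
    "contact": "generatedBy",
    "event": "describes",
    "case": "addresses",
}

# Hypothetical relations implied when one document is confirmed
# in both dimensions at the same time.
CROSS_RELATIONS = [
    ("contact", "declares", "event"),
    ("event", "implies", "case"),
]


def triples_for_filing(document_uri, confirmed):
    """Build (subject, predicate, object) triples from confirmed folders.

    `confirmed` maps a dimension name to the folder the user accepted.
    """
    triples = []
    for dimension, folder in confirmed.items():
        triples.append((document_uri, DIMENSION_PREDICATES[dimension], folder))
    for left, predicate, right in CROSS_RELATIONS:
        if left in confirmed and right in confirmed:
            triples.append((confirmed[left], predicate, confirmed[right]))
    return triples


filing = {
    "document_type": "NotificationOfClaim",
    "contact": "PersonA",
    "event": "Accident",
    "case": "CarDamage",
}
generated = triples_for_filing("file:///claims/letter-001.pdf", filing)
```

Each confirmed folder contributes one triple about the document, and each pair of confirmed dimensions listed in `CROSS_RELATIONS` contributes one triple between the folders themselves.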
And so we have different resources like e-mail here with the IMAP address, with
a file here, file document or website. And all the resources may be
categorized as an event, a topic or an organization at the same time so you
have some categories.
So RDFS. So [inaudible] which are pre-given. And in the first setting we
are not looking as a kind of general framework while you're still with the
user.
So because we have this personal view to the information. So we consider a
personal information model in the first step. Not yet an ontology. So as an
example here, consider my talk again, which is filed in some of my folders.
So we have the URI of this talk, and I could categorize this as being a talk.
And it's held in the specific place, which has a website. So have another URI.
And I could say that this is an organization. And this talk is hosted by
Microsoft. And this is also an organization.
And there's a chairperson and there's another one who is giving the talk and
both of these are persons. And, well, he's working for Microsoft. I work for
DFKI and DFKI is another organization. And maybe I have an address of it in my
Outlook system so I have another URI which is connected, and I also could make
a connection to my calendar system.
So everything could be connected via this approach, and we are generating
triples on the fly, and therefore connect our native resources with all the
applications on our PC.
And generating step by step this PIMO. So we have a hierarchy of PIMO classes,
categories, RDFS, and we have a mixture of the classes and the instances
on our PC.
And so step by step the user is able to generate more and more semantics.
>>: What's a triple?
>> Andreas Dengel: It's and RDF representation where you can think of subject,
verb, object, subject, predicate object. So you use a URI, any predicate given
like in the example before. So this is a triple. This is a subject and
object, and this is the predicate in between and you could interconnect many of
those. Whether it's an entire resource or it's a class.
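A minimal sketch of the subject, predicate, object idea, using plain Python tuples rather than a real RDF library. The URIs and predicate names (`pimo:Talk`, `hostedBy`, and so on) are hypothetical stand-ins for the talk example, not the vocabulary actually used in the PIMO.

```python
class TripleStore:
    """A tiny in-memory set of (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(
            (s, p, o)
            for (s, p, o) in self.triples
            if (subject is None or s == subject)
            and (predicate is None or p == predicate)
            and (obj is None or o == obj)
        )


store = TripleStore()
talk = "file:///talks/semantic-desktop.ppt"  # hypothetical URI of the talk
store.add(talk, "rdf:type", "pimo:Talk")
store.add(talk, "hostedBy", "urn:org:Microsoft")
store.add(talk, "heldBy", "urn:person:AndreasDengel")
store.add("urn:person:AndreasDengel", "worksFor", "urn:org:DFKI")
store.add("urn:org:DFKI", "rdf:type", "pimo:Organization")
store.add("urn:org:Microsoft", "rdf:type", "pimo:Organization")

# Every resource categorized as an organization.
organizations = [
    s for (s, _, _) in store.match(predicate="rdf:type", obj="pimo:Organization")
]
```

Because the object of one triple can be the subject of another, the triples interconnect into a graph, which is what makes the pattern query over classes and individual resources possible.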
So the good thing is that step by step you're enhancing the semantics within
your work space. And you could also share the resources with others. For
example, you'd just make them public. You'd make them accessible. If you're
an expert on specific fields you can just give parts of your PIMO to other
people.
And there are also techniques to avoid the use of different syntax, talking
about the same things. It's called smushing. There are several approaches to
avoid this. You have just the representative, which is normalized from
different syntaxes, and then guarantees kind of a semantic identity of this.
There are some sources you could read or references you can read about all
this.
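One way to picture such a normalization is to merge identifiers that are known to denote the same thing and rewrite every triple to a single representative. The sketch below uses a union-find structure and picks the lexicographically smaller identifier as the representative; both choices, and the example identifiers, are assumptions for illustration rather than the approach used in the system described.

```python
def smush(triples, same_as):
    """Rewrite all identifiers to one canonical representative per group.

    `same_as` is a list of (a, b) pairs stating that a and b denote the same thing.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in same_as:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Assumption: keep the lexicographically smaller identifier.
            keep, drop = sorted((root_a, root_b))
            parent[drop] = keep

    return {(find(s), p, find(o)) for (s, p, o) in triples}


# Hypothetical identifiers for the same person in two applications.
raw = {
    ("mailto:ad@example.org", "worksFor", "urn:org:DFKI"),
    ("urn:person:AD", "givesTalk", "urn:talk:1"),
}
merged = smush(raw, [("mailto:ad@example.org", "urn:person:AD")])
```

After merging, both statements hang off the same identifier, so a query about the person finds the employer and the talk together.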
>>: What does it mean to share a PIMO? Does it mean you're sharing the
objects as well as their relations or just the RDF?
>> Andreas Dengel: It depends on what you want to do. So you could first
share your PIMO on the layer of URIs, but as soon as the URI is visible for the
other person, it's accessible for him or her. So you could share the resources
as well. So the model but also the resources.
>>: And the relation types that you have in the previous slide like relates to
or describes, are those part of the pre-existing [inaudible] ontologies?
>> Andreas Dengel: Yes.
>>: The verbs.
>> Andreas Dengel: Exactly. Yes. You should try to normalize the way how to use
relations, because if you have, say, a closed environment where people share
information, you have to predefine these somehow. So our intention is to
really find out from the PIMO what kind of relations. This is another work I
do not present today what are the most common relationships people use in
specific environments to build extracts out of this.
And do something like a preconfiguration of relationships which suit to a
specific application. So to learn from people when doing so, I'll come to this
later based upon web 2.0 issues but then to learn from this and learn from the
basic framework for the people who can use it.
So, yes, so first step is to build a semantic desktop where you have a personal
assistant at your work space, which step by step learns what you're doing and
what your preferences are, and which works in the background and gives advice,
connecting documents with applications and so on.
The most important thing here is that we combine this with active user
observation, and this is some relation to [inaudible] work, I'll come to that
later, because our intention is also to see in what context people work with
what information items.
How do they use those items in a specific way? And there are some benefits out
of that. For example, the system tries to classify incoming e-mails.
system. An e-mail coming in here is about the topic ontology or is about
reviewing specific papers to a conference called WM-1 or that this e-mail
refers to the DFKI department knowledge management.
So the user could just accept the proposals or just do another one.
We also implemented a so-called drop box, which is applied or employed as
soon as you do save-as. The system makes proposals, based on what you
already have, about what categories the information should be put in. And this
is another very nice example about context-aware services. It's about
observing the user in the active window.
Which means assuming that the user would just browse here on this web page, we
developed a sidebar which uses, first of all, common search but more than
that it offers all tasks the user is working on, all data sources and concepts,
and, more specifically, it shows all current tasks which are related to the
contents of this web page.
Or it shows all context relevant information items or all categories which
might fit to the contents or the persons and projects. Moreover, you could
also relate the contents of the web page by a link which says that the subject
of this document is the project [inaudible]. You can see another triple here,
which you could define interactively with the system.
So you combine traditional IR with the semantics of the Internet. And there's
another example we developed which is called iDocument. And this is a
combination of Web 2.0 issues like the tag cloud, but it also guesses about the
contents here, what persons could be related to the contents, yes, or what
organization items, what projects, what topics, so you can see the different
sizes of the terms here because of the importance.
And moreover, because of the information items, the entities found, the system
also proposes triples. For example, it finds a project called [inaudible].
It's shown here at the beginning, and it also found things like [inaudible] and
the system guesses that IPOS has project member [inaudible]. So it's just a
proposal for the user. Instead of typing everything in, it just makes a hook.
He or she could just make a hook and then accept the proposal of the system.
So on the fly you could, again, collect semantic relationships. So we're using
also some of the web services here within the system.
And this is about semantic search. So assuming you have this PIMO model at the
end, you could also combine this with semantic search where you could type in a
term like [inaudible], it's just a string, and the system guesses that there are
some persons related to that string, some projects, concepts and events maybe.
There are more than that, but I show you four. And with the search results you
could have direct access to the resources again.
So you can see this is an Outlook entry, but more important the system also
finds another person watching clients which has nothing to do with this
[inaudible] but because we use [inaudible] rules in the background, which, for
example, here says if you found the project related to that, show also the
different other project members.
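The rule mentioned here, showing the other project members once a related project is found, can be sketched over plain triples. The predicate names `label` and `hasProjectMember` and the sample resources are hypothetical; the rule language actually used in the system is not stated in the talk.

```python
def search(triples, term):
    """Return resources whose label contains the search string."""
    term = term.lower()
    return {s for (s, p, o) in triples if p == "label" and term in o.lower()}


def expand_with_project_members(triples, hits):
    """Rule: for every project related to a hit, also show its other members."""
    projects = {
        s for (s, p, o) in triples
        if p == "hasProjectMember" and (s in hits or o in hits)
    }
    members = {
        o for (s, p, o) in triples
        if p == "hasProjectMember" and s in projects
    }
    return set(hits) | projects | members


# Hypothetical personal information model.
pimo = [
    ("person:Alice", "label", "Alice Example"),
    ("person:Bob", "label", "Bob Sample"),
    ("person:Carol", "label", "Carol Other"),
    ("project:Demo", "label", "Demo Project"),
    ("project:Demo", "hasProjectMember", "person:Alice"),
    ("project:Demo", "hasProjectMember", "person:Bob"),
]

direct_hits = search(pimo, "alice")
results = expand_with_project_members(pimo, direct_hits)
```

The string search alone returns only the resource whose label matches, while the rule adds the related project and its other member even though their labels do not contain the search string.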
So you could assist the search by using semantic derivations. This is also an
addition we just developed. This is context sensitive dashboard where an
individual person could retrieve specific situations. This is based on the
assumption that usually people work on different tasks at the same time, and
you have lots of interrupts, because a phone call is coming in and the phone
call implies to do something different. And so you change. You start with
something going on. And this helps you retrieve all the active windows
related to the situation you just skipped, to come back to see all the relevant
information at the same time.
So the semantic desktop is built on the Aperture crawler, a system developed by DFKI,
an open source platform where you can combine all native sources. And we have a
Sesame 2 repository with PIMO, resource stores and others, and we have the Gnowsis
server. This is part of the semantic desktop.
And on top of that we have different semantic applications, we have web
interfaces and, coming to that later, this is all open source. So you
could just look at our web page, open DFKI, to download some of these applications
for your own.
And we also use some annotations within applications like Outlook or like
Mozilla or others where you could just easily browse and link resources
together.
And coming to a very important aspect is if we are still dealing with paper,
how could we integrate paper into this philosophy? And we have different
approaches to that. This is just using a desktop camera, which has about 300
DPI. So about three mega pixels, and this camera projects, I'm not sure
whether you can see. It projects a laser frame on the desktop, and you could
just use, put some documents in there, click and you have the image on your PC
or laptop, and then you combine it with OCR.
>>: Is that available?
>> Andreas Dengel: Yes. The company is called Sky. Can have a look.
And then you've got the OCR results, and this is then transferred to a semantic
wiki we developed. And in the wiki we show not only the recognized text but
furthermore we also show the recognized entities.
So these are part of our PIMO model if you click you go directly to the URI.
And the nice thing here is that we, again, thought about how to collect
information from the resources, how to expand the PIMO. And this is done by
the single entry which allows you to have a predefined kind of categorization
scheme. For example, persons, organizations, events, et cetera.
And as soon as you move the mouse into this window, you see the small flag
here, and depending where your mouse is, for example, if you look for places
you just go to the words, click on that, and you can collect new instances of
places, for example.
And, furthermore, you have the kind of semantic indexing of the resources at
the same time.
Another approach we follow is using the iPen, based on the Anoto pattern,
because many people tend to still read on paper because it's a good interface
to your brain, as in lots of interactions. So we are using the Anoto pattern
to sometimes print paper, to print documents out on the Anoto paper and then
using the iPen, which is equipped with a camera and a pressure sensor, and
then we do interactions on this paper.
For example, not only the annotations but also some gestures, some written
gestures, while the gestures indicate semantics. So we can again combine the
PIMO with the text on the document. We also now are developing an Anoto
tabletop where we use Anoto paper with a single beamer beneath it, and then we
do interactions, write on it, and so on.
There's some applications we have in mind, and now coming to user observation,
I already talked about it's very important to see what documents are used in
what situations, and because the devices today only have limited ways to
observe what users are doing, we thought about how to develop an interface to
the brain.
And this is not very easy. Next year we will start with the EEG measurement.
But more straightforward approach was just to use the eye, because the eye is
an excellent interface to the brain.
And so we combined eye tracking. There's a company from Scandinavia called
Tobii, maybe you've heard, I've heard you have an interface. We now bought a
screenless eye tracker. You also bought, maybe. I'm not sure. But the idea
is not only to see those heat maps, but also to see how
intensively people read information on the screen. And this is very important
when solving specific tasks because we can store the information of this eye
reading, combined with the intensity of the different passages in order to
collect best practices.
If you, for example, would write a project proposal for a specific
federal institution, and you now know how to do it. Maybe there is another
person who does not know, and so the system could give some hints about what
information is used to solve which tasks and the whole process.
And this is a very important fact to do so. And coming to the chances of this
approach, finally, I'd first like to sum up because you see the shift of the
web from the traditional web towards a web of people and we have the other
shift of the web of meaning.
And the focus today is on communities, on folksonomies, to have collective wisdom
to solve tasks. And we think the semantic desktop is not only a driving
paradigm, but also a very nice tool, an instrument to implement the semantic web in
your microworld of the desktop in the first step.
And because we have these front networks where we have trust, project teams,
departments in a company or interest groups, whatever, you could share these
semantic information with the others at the same time.
So we think about these two dimensions, having community relationship as one
dimension and the semantic foundations as the other. And you see the shift
towards Web 2.0 and towards the semantic web. We think the semantic desktop
could be a nice instrument to further reach higher semantics or more semantics
of the information. There's also a nice paper just finished about a month ago.
We also succeeded in getting the largest IP, so integrated project, of the Sixth
Framework Programme in the EU. It's called the Social Semantic Desktop. And this
is an open source framework which could be used by everybody. And this is now
going to provide a mechanism for how we could share the PIMOs with others and how
to socialize them in order to build real ontologies. It's going into how
ontologies can be mapped and merged based on individual PIMOs, how we can build
extracts based on expert knowledge. For example, let's say in insurance
companies where you have different experts working on the same topics and they
have their own PIMOs, how could we build a kind of sustainable piece of
knowledge which could be given to a newcomer, to freshmen joining the company,
as a kind of starting point, which could be used by her or him.
So there are many partners in this project, and this project will finish at the
end of this year. And we are now going to develop very nice interfaces and we'd
like to spin off some new companies based on that.
And we'll see how that works. And so thanks to my team and thanks to you for
your attention, and if there are questions, I'm ready to answer them. Thank
you.
[applause]
>>: Can you answer about the evolution of the ontology over time? So I guess
very much like the notion of having lots of attributes and being able to
browse. But if I now decide to organize, say, a file hierarchy that has single
class membership, and I rearrange projects, I move the files to the right
projects and it's over.
If you have back pointers to those objects it's hard to know how they should
get reorganized.
>> Andreas Dengel: That's right.
>>: So the question is, in general, is there any way for dealing with the
evolution of these ontologies over time for things other than simple container --
>> Andreas Dengel: This is a very difficult task. We have two approaches to
that. Like every memory, such a memory should also forget. The question is how
should the memory forget. So we are going to implement a kind of time stamp.
If a document is used -- is not used for a certain time or has a certain age,
it will pop up and ask do you need me again. Or do you need me once more. So
you could just change the time stamp to another, maybe a time in the future.
And the second issue is that we just started a new project about validity,
whether a relation still holds or not.
This is a very important fact. And we just started. So we have different
ideas about that, because we'd also like to use some kind of trust technology
from the social network analysis domain to evaluate those, but a complete
solution is not yet available.
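The time-stamp idea for forgetting could look roughly like the sketch below. It
is not the DFKI implementation; the class MemoryItem, the field review_after and
the two thresholds are assumptions made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class MemoryItem:
    """A document in the personal memory (hypothetical structure)."""
    name: str
    created: datetime
    last_used: datetime
    review_after: datetime  # the "time stamp" the user may move into the future


def needs_review(item: MemoryItem, now: datetime,
                 max_idle: timedelta, max_age: timedelta) -> bool:
    """True if the item should pop up and ask "do you need me again?"."""
    if now < item.review_after:
        return False  # the user postponed the question
    idle_too_long = now - item.last_used > max_idle
    too_old = now - item.created > max_age
    return idle_too_long or too_old


def postpone(item: MemoryItem, until: datetime) -> None:
    """The user still needs the item: change the time stamp to a later date."""
    item.review_after = until
```

An item that fails the check is only a candidate for forgetting; the decision
stays with the user, who can either let it go or call postpone.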
>>: My question is about for semantic desktop [phonetic].
>> Andreas Dengel: Basically statistical methods. We use different methods at
the same time. We have support vector machines. We have [inaudible] methods.
Maybe you can, a good idea would be to just see our open DFKI website. There's
a tool called Dinaque [phonetic]. So it's a very rich, very powerful retrieval
engine, which is used also in the semantic desktop. It combines all
state-of-the-art technologies, basically built on Lucene, but it's enriched by
many things, including dynamic search. We have sliders, so you can have in
between weighting of the terms and many nice features. So please look at it
and download it.
>>: Wondering, can you talk a little bit about how you encourage people to
apply the semantics in the first place? When you showed your systems of all of
the different things up there, looks quite overwhelming, and it seems one of
the key things you have to do is to get somebody to apply the semantics.
>> Andreas Dengel: Yes. That's not an easy thing to do. First of all, we are
addressing knowledge workers. So we also did some first case studies in
insurance and in retail companies. But if you see traditional users of
Microsoft systems, for example, they are overwhelmed with relationships.
They like to have a very lightweight, easy way to categorize, and they accept
some weaknesses of the system in retrieval, because they want to avoid
complexity. So they rather invest time instead of having higher-order
relationships between different information items.
So we basically address knowledge workers and we see the tendency based on the
Internet, the deployment, how they use the Internet, that knowledge work is
very much improving in the future so that people will join virtual teams which
will have very high qualification into specific things and they need tools to
really orient themselves to navigate through information.
So there will be a difference between people who could use the systems very
soon and those who should wait for more lightweight interfaces. And usability
is a thing which is of high importance for the future, how to build those
things.
>>: I have a question. I think this is different. I've done some stuff with
building graphs, over knowledge resources and one of the things that is really
essential there is that links have kind of a time -- or some links do. Like Ed
doesn't put up this link that says Microsoft, doesn't work with Microsoft, and
started working at Microsoft on this day. He happens to be working at
Microsoft now but he might not be working at Microsoft in the future. So
that's one aspect of the link, the time boxing.
Another is the entity that made the assertion about Ed working at Microsoft,
and when that assertion was made and what evidence supports that assertion is
important for traceability and trustability of that knowledge.
And if it's just a link in a database it doesn't have that trustability. How
do I link it?
>> Andreas Dengel: From my point of view it's a situation similar to
Wikipedia, in that many people share this resource and many people contribute
to that resource. And not everything people would put into Wikipedia is the
right stuff. If you just see all the contributions made for George W. Bush, you
see different factions of people who want to tell this one and want to tell
that one.
So I think at the end you need a combination between detecting whether things
still hold or not. So a system could propose, for example, relationships and
the user would observe, well, I'm not sure whether this still holds, but you
need a moderator. So a qualified person who finally would decide what to do.
So Ed may have left Microsoft sometime and you would see this relationship.
You could just send it to a moderator and say please check.
>>: The underlying representation, is there room for that accountability
information?
>> Andreas Dengel: Not yet. But in Germany we started a very large project
called Theseus, this is the name of an ancient king, and this Theseus project
is focusing on semantic web services. And one of the topics there is to
qualify information.
So we like to integrate measurements about trust, about how to compare services
on the web, respecting specific problems, how could we apply them, and in this
consideration we will have some options to represent this. So we are enhancing
just semantic representation by qualification of information.
>>: If I may add this, this is a popular topic for discussion in the semantic
web community. And that given this area around this, his solution is the
concept of [inaudible] representing the relationship between this and the
resource itself. And if you present the resource that attributes like this
statement the person works for Microsoft may at that particular time in his
duration, so you can represent a graph around the relationship. Another
approach that we support, that we particularly like, is that rather than having
triples you have doubles, where every relationship now can have its
own properties. So you can qualify the relationships as you apply them between
objects, and therefore you can now [inaudible]. All of the approaches have
the negatives and the positives.
>> Andreas Dengel: Many people think about both approaches at the same time
having triples having a [inaudible] representation about specific properties.
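A relationship that carries its own properties, as discussed in this exchange,
could be sketched like this. It is only an illustration: the class name
QualifiedStatement and all field names are hypothetical, the dates in any use of
it would be invented, and it is not the representation used in the semantic
desktop.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class QualifiedStatement:
    """A subject-predicate-object statement plus qualifying properties."""
    subject: str
    predicate: str
    obj: str
    valid_from: Optional[date] = None   # start of the "time box"
    valid_to: Optional[date] = None     # None means open-ended
    asserted_by: Optional[str] = None   # who made the assertion
    asserted_on: Optional[date] = None  # when the assertion was made
    evidence: Optional[str] = None      # what supports the assertion

    def holds_at(self, day: date) -> bool:
        """True if `day` falls inside the statement's validity interval."""
        if self.valid_from is not None and day < self.valid_from:
            return False
        if self.valid_to is not None and day > self.valid_to:
            return False
        return True
```

With such a record, a statement like "Ed works for Microsoft" can be checked
against a date, and a moderator can see who asserted it and on what evidence.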
>>: I love this idea of removing the disorganization of information that I
struggle with on my desktop, jumbled up and search context, because the
situation persists where I find myself constantly distracted by that. I love
the idea. Abstract users memory. And it powerfully reorganizes their
experience. I like that idea. It's the complexity of the systems and, second,
also felt like my, firstly, for my categorization tends to be, time is very
relevant to [inaudible] made the categorization, then [inaudible] e-mail and
two weeks the reason I did that. Why has it changed. So I struggle with this.
Not the goal, but I fear that the explicit nature of categorization is bound to
create a factual taxonomy that adds complexity, becomes irrelevant and is
obtained.
>> Andreas Dengel: Right. This is something I have to talk a bit more about.
Oops, the next one. Here we are. So to tell a bit more about the
functionality of the system. You are right that users change their thinking
because there are new topics arising, new projects people coming into the team
and this system refers to view still avoids semantics. It's built on implicit
relationships for the first thing.
So this is just a platform to build relationships on the next layer, because we
are integrating now kind of generic relationships between the folders. But you
are right that things are changing and therefore we have some cluster
algorithms in between. So, for example, if the topics within a folder are
going to be too heterogeneous, the system pops up and tells you I would split
this folder in two sub categories and also proposes some placeholder. So the
most important term of these two clusters.
So the user, again, can think about, hmm, maybe I should change the
organization. So the system is a kind of assistant, which helps the users
to really keep track of the changes as well.
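The folder-splitting proposal could be sketched as below. The talk only says
"cluster algorithms", so the concrete technique here is an assumption: raw word
counts, cosine similarity, the two least similar documents as seeds, and a
hypothetical threshold of 0.3 for "too heterogeneous".

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def propose_split(docs: dict[str, str], threshold: float = 0.3):
    """Return None if the folder is coherent, else two (label, names) clusters."""
    vecs = {name: Counter(text.lower().split()) for name, text in docs.items()}
    pairs = list(combinations(vecs, 2))
    if not pairs:
        return None
    sims = {p: cosine(vecs[p[0]], vecs[p[1]]) for p in pairs}
    if sum(sims.values()) / len(sims) >= threshold:
        return None  # average similarity is high: folder is homogeneous enough
    # Seed the two clusters with the least similar pair of documents.
    seed_a, seed_b = min(sims, key=sims.get)
    clusters = {seed_a: [], seed_b: []}
    for name, vec in vecs.items():
        nearer_a = cosine(vec, vecs[seed_a]) >= cosine(vec, vecs[seed_b])
        clusters[seed_a if nearer_a else seed_b].append(name)
    totals = {s: sum((vecs[n] for n in members), Counter())
              for s, members in clusters.items()}
    result = []
    for seed, other in ((seed_a, seed_b), (seed_b, seed_a)):
        # Counter subtraction keeps only terms more frequent in this cluster.
        distinctive = totals[seed] - totals[other]
        label = distinctive.most_common(1)[0][0] if distinctive else seed
        result.append((label, clusters[seed]))
    return result
```

The label plays the role of the proposed placeholder name; the user remains free
to ignore the suggestion.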
And what I didn't mention also here in the system is that the system also
allows you to publish folders without giving the documents to some people.
Because we also -- we detected that people avoid giving the documents out to
others, because they don't know exactly what they would do with the documents.
So this is a kind of psychological hurdle to really overcome. And so we have,
with the right mouse click you can go in the folder and you can publish a
profile of the folder without a document. So if people would search for
specific topics, say for RDF, for example, and I would have published the
semantic web folder, I could be shown as being an expert for RDF, for example.
At the same time documents are shown to the user. And then I can use a chat
room so that people could contact me and ask me: well, could you give me some
information about this? And then it's kind of explicit. If I would publish
everything, I do not really record who is taking the documents away.
But that way I'm available as an expert in the network of the company, and
people could come and explicitly ask me for the documents. So there are many
features. So if you would like to see, have a look in the paper. More
questions?
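Publishing a folder profile without the documents could be sketched as follows.
The names folder_profile and ExpertDirectory are hypothetical, and summarising a
folder by its most frequent words is an assumption; the talk does not say how
the profile is computed.

```python
from collections import Counter


def folder_profile(documents: list[str], top_n: int = 5) -> list[str]:
    """Summarise a folder by its most frequent terms (lower-cased words)."""
    counts = Counter(word for text in documents for word in text.lower().split())
    return [term for term, _ in counts.most_common(top_n)]


class ExpertDirectory:
    """Shared index that stores only published profiles, never the documents."""

    def __init__(self) -> None:
        self._profiles: list[tuple[str, str, list[str]]] = []

    def publish(self, owner: str, folder: str, documents: list[str]) -> None:
        # Only the derived term list is kept; the document texts are discarded.
        self._profiles.append((owner, folder, folder_profile(documents)))

    def find_experts(self, topic: str) -> list[tuple[str, str]]:
        """Return (owner, folder) pairs whose profile contains the topic."""
        topic = topic.lower()
        return [(owner, folder)
                for owner, folder, terms in self._profiles if topic in terms]
```

A search for "RDF" would then return the owner of a published semantic web
folder, who can be contacted and asked explicitly for the documents.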
>>: [Inaudible] do you think that the digital management [inaudible].
>> Andreas Dengel: Interesting. Because I think there is a trend. If you see
the people who are working more and more with computers, they are wanting to
use paper. So it's really going toward a paper-poor office. But I think both
media are very valuable, and, therefore, I already mentioned to [inaudible]
that I'm waiting for the electronic paper. And there are two ways to consider
electronic paper. The one is just to use Anoto patterns to, for example, have
the common paper as an interface, and the other one would be to have a very
lightweight interface where you could interact not only display information
like today but we have both in mind in the focus to develop new technologies to
interact with both media at the same time. So I think the pen and the paper is
a really direct, say, multi-sensory interface to the brain.
It's very important to have a pen, to work with a pen with information. There
are very interesting studies about this. So you can be more creative than just
typing. And, therefore, I think in the future electronic paper will be a very
important interface.
Okay.
>> Ed Cuttrell: Thank you.
[applause]