From Documents to Knowledge Models
Max Völkel
voelkel@fzi.de
Forschungszentrum Informatik
an der Universität Karlsruhe (TH)
Personal Knowledge Management
Definition: knowledge cues [Haller]
- Any kind of symbol, pattern or artefact which evokes some knowledge in a person's mind when viewed or used.
- Knowledge cues can be stored and retrieved on a computer, while knowledge may or may not.
  - Ok, in fact you store bits (signals).

© 2007 Max Völkel, FZI
29.03.07, ProKW @ WM2007, Potsdam, Germany
http://xam.de/2007/doc2km
What is a Document?
A team of 50 French researchers discussed …
Definition: Document
A team of 50 French researchers could agree on:
- Document as form
  - A container which assembles and structures the content to make it easier for the reader to understand.
- Document as sign
  - Emphasizes the argumentative structure of the content.
  - A document can be referenced → it acts as a sign for its meaning.
- Document as medium
  - "Reading contract": the author's intention or assumption about what will happen with the document.
Document (my definition) I/II
- A document consists of information atoms.
  - An information atom is the smallest unit of content which can be interpreted without the document's context (though it of course requires background knowledge). For text, these atoms are single words.
- Packaging: establishes a context.
- Reference-ability: a reference to a published document can act as a placeholder for the content expressed within it.
- Process metadata, such as authors, audience and goal, should be sent along with the document.
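Read as a data structure, the definition above bundles information atoms, an address and process metadata. The sketch below is a minimal illustration under stated assumptions, not the author's model: the class `Document`, its fields, the helper `from_text` and the identifier `urn:example:doc2km` are hypothetical, and whitespace splitting stands in for proper word segmentation.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical encoding of the document definition (not an existing API)."""

    uri: str  # reference-ability: a stable address that can stand in for the content
    atoms: list[str]  # information atoms; for text, single words
    # Process metadata that should travel along with the document.
    authors: list[str] = field(default_factory=list)
    audience: str = ""
    goal: str = ""

    @classmethod
    def from_text(cls, uri: str, text: str, **metadata) -> "Document":
        # Packaging: the atoms are bundled into one object, which gives them a context.
        # Whitespace splitting is a simplification of real word segmentation.
        return cls(uri=uri, atoms=text.split(), **metadata)


doc = Document.from_text(
    "urn:example:doc2km",  # hypothetical identifier
    "A document consists of information atoms",
    authors=["Alice"],
    audience="workshop participants",
    goal="explain the definition",
)
print(len(doc.atoms), doc.atoms[:2])
```

Keeping the metadata as explicit fields, rather than burying it in the text, is one way to make sure it travels with the document.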
Document (my definition) II/II
- A document is a knowledge artefact consisting of several layers:
  - Content semantics: the content means something. Building upon the logical and argumentative structure, the author encodes statements about a domain within the content.
  - Argumentative structure: used to convey the content to the reader. Argumentative structures appear on all scales. A typical structure is the "Introduction - Related work - Contribution - Conclusion" pattern of scientific articles. On smaller scales, patterns like "claim-proof" and "question-answer" are used.
  - Logical structure: can reference smaller parts within a document, e.g. paragraphs, headlines, footnotes, citations, and title.
  - Visual structure: guides the reader informally. Type-setting (e.g. bold, italics, different font styles and sizes), placement of figures, and pages carry additional information.
  - Linearity: a defined order for navigating through all information items.
"I propose a different document agenda: I believe we need new electronic documents which are transparent, public, principled, and freed from the traditions of hierarchy and paper."
Ted Nelson
What do people want?
Why?
What is a Wiki? What's new compared to a CMS?
- Easy contribution → shorter time-to-publication
  - Wiki pages can be created and edited by any user quickly and easily
- Easy writing
  - Simple text formatting without the need to learn HTML → wiki syntax
- Easy linking
  - Automatic linking converts written names of pages, images and websites to links
  - People want more links
- Recent changes
  - See what has happened → awareness
  - Diff function shows the latest changes
  - Easily check whether changes are ok
- Full-text search for page titles and text
- Backlink function shows which pages link to the current page
  - Find the context of this page
- Directly link deep into a wiki using readable names
Wikis were the first deployed, collaborative hypertext authoring environments.
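Automatic linking and backlinks can be sketched in a few lines. This is a minimal sketch assuming CamelCase "WikiWord" page names, a convention of early wikis; many engines use bracket syntax instead. The functions `render_links` and `backlinks` and the `/wiki/` URL prefix are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: page names are written as CamelCase "WikiWords" (e.g. KnowledgeModel).
WIKI_WORD = re.compile(r"\b(?:[A-Z][a-z0-9]+){2,}\b")


def render_links(text: str) -> str:
    """Automatic linking: replace each WikiWord with a link to the page of that name."""
    return WIKI_WORD.sub(lambda m: f'<a href="/wiki/{m.group(0)}">{m.group(0)}</a>', text)


def backlinks(pages: dict[str, str]) -> dict[str, set[str]]:
    """Backlinks: map each page name to the set of pages whose text mentions it."""
    result: dict[str, set[str]] = defaultdict(set)
    for name, text in pages.items():
        for target in WIKI_WORD.findall(text):
            if target != name:  # ignore self-links
                result[target].add(name)
    return dict(result)


pages = {
    "HomePage": "Start with KnowledgeModel and WikiSyntax.",
    "KnowledgeModel": "Back to HomePage.",
    "WikiSyntax": "Used on HomePage and in KnowledgeModel.",
}
print(render_links("See KnowledgeModel."))
print(backlinks(pages)["KnowledgeModel"])
```

Computing backlinks by scanning every page is fine for a sketch; a real wiki would more likely maintain a link index that is updated whenever a page is saved.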
What is a Model?
My definition, based on the OMG metamodel MOF:
- Typed entities and typed relations
[Figure: Types A2, B2, C2 ("(Meta-)Modelling"); Types A1, B1, C1 ("Modelling"); Entities X, Y and Artifacts X, Y; "Real world from the viewpoint of the individual"]
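The idea that types are themselves modelled entities can be sketched as follows. This is a minimal illustration in the spirit of MOF's layered modelling, not the MOF API: `Entity`, `Relation` and `type_chain` are hypothetical names, and the wiring of the figure's labels (Type A1/A2, C1/C2, Artifact X/Y) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Entity:
    """A typed entity; its type is itself an Entity, which allows meta-modelling."""

    name: str
    type: Optional["Entity"] = None


@dataclass(eq=False)
class Relation:
    """A typed relation between two entities; the relation type is an Entity too."""

    source: Entity
    type: Entity
    target: Entity


def type_chain(entity: Entity) -> list[str]:
    """Follow 'type' links upwards (artifact -> type -> meta-type), stopping on cycles."""
    chain = [entity.name]
    seen = {id(entity)}
    while entity.type is not None and id(entity.type) not in seen:
        entity = entity.type
        seen.add(id(entity))
        chain.append(entity.name)
    return chain


# Labels borrowed from the figure; how they are wired together here is an assumption.
type_a2 = Entity("Type A2")                # (meta-)modelling level
type_a1 = Entity("Type A1", type=type_a2)  # modelling level
x = Entity("Artifact X", type=type_a1)
y = Entity("Artifact Y", type=type_a1)
link = Relation(x, Entity("Type C1", type=Entity("Type C2")), y)
print(type_chain(x))
```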
What is a Knowledge Model?
| | Document | Ontology | Knowledge Model |
|---|---|---|---|
| Information atoms | Text (paragraphs, images, multimedia resources) | Concepts | Items (text, images, other binary resources) |
| - Text | Short (headlines) and longer (paragraphs) | Short labels | Anything from short labels to structured documents |
| Order | Strict linear order | – | Yes, may be partial and have cycles |
| Hierarchy | Yes (chapters, sections, paragraphs, sentences) | Yes | Yes, may be partial and have cycles |
| Annotations | Yes (footnotes) | Yes | Yes |
| - Tagging (annotation with keywords) | – | – | Yes |
| - Typing (incl. inferencing) | – | Yes | Yes |
| Hyperlinks | Yes (internal references and external citations) | – | Yes, don't have to occur inside text |
| Visual layout | Yes | – | – |
From Documents to Knowledge Models
- From analogue to digital documents → knowledge models:
  - smaller content granularity: very small information atoms, such as single words
  - more interconnected content: richly connected items
  - more explicit structures: explicit semantics for the links
Definition
- A knowledge model is a superset of documents and formal ontologies.
- Annotated documents, stored together with their annotations, can be seen as a knowledge model.
What is a CDS?
Conceptual Data Structures
[Figure: an Item with its relations context/detail, before/after, source/target and annotation/member]
M. Völkel and H. Haller: Conceptual Data Structures (CDS) - Towards an Ontology for Semi-Formal Articulation of Personal Knowledge. In Proc. of the 14th International Conference on Conceptual Structures 2006, Aalborg University, Denmark, July 2006.
What is a CDS-based Knowledge Model?
- A set of addressable items (text, images, maybe even multimedia elements)
- Relations between items, classified in four types:
  - Source/target: the generic, directed hyperlink
  - Before/after: ordering relations, linear navigation
  - Context/detail: hierarchical relations, document and concept hierarchies
  - Annotation/annotationMember: annotations, giving the ability to type items and relations; items are used as types → meta-modeling
- Knowledge models must be able to capture work-in-progress
  - CDS is not strict: you can have cycles, untyped items, paradoxical ordering, …
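A CDS-based knowledge model can be sketched as a small data structure, assuming items are identified by strings and each relation is stored together with its inverse. The class `KnowledgeModel` and its methods are hypothetical and not taken from the CDS paper; which name of each pair is treated as the forward direction is also an assumption.

```python
from collections import defaultdict

# The four CDS relation pairs named above. Which name counts as the "forward"
# direction is an assumption made only for this sketch.
PAIRS = [
    ("source", "target"),
    ("before", "after"),
    ("context", "detail"),
    ("annotation", "annotationMember"),
]
INVERSE = {a: b for a, b in PAIRS} | {b: a for a, b in PAIRS}


class KnowledgeModel:
    """Addressable items plus typed relations, with no strictness checks."""

    def __init__(self) -> None:
        self.items: dict[str, str] = {}  # item id -> content (text, image path, ...)
        self.edges: dict[tuple[str, str], set[str]] = defaultdict(set)

    def add_item(self, item_id: str, content: str) -> None:
        self.items[item_id] = content

    def relate(self, a: str, relation: str, b: str) -> None:
        # Record the statement and its inverse so both ends can be navigated.
        # Cycles and untyped items are deliberately allowed (work-in-progress).
        self.edges[(a, relation)].add(b)
        self.edges[(b, INVERSE[relation])].add(a)

    def related(self, item_id: str, relation: str) -> set[str]:
        return set(self.edges.get((item_id, relation), set()))


km = KnowledgeModel()
for item_id, content in [("paper", "Paper draft"), ("intro", "Introduction"), ("todo", "TODO")]:
    km.add_item(item_id, content)
km.relate("paper", "detail", "intro")     # intro is a detail of paper
km.relate("intro", "annotation", "todo")  # intro is annotated with the item "todo"
km.relate("intro", "detail", "paper")     # a cycle: allowed in CDS
print(km.related("intro", "context"), km.related("todo", "annotationMember"))
```

Storing both directions doubles the number of edges but lets a tool navigate from either end, e.g. from a section back to its context.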
CDS: A Hierarchy of Relations
Legend: Relation Type = relation/inverse
[Figure: a hierarchy of CDS relation types, ranging from informal to formal]
- Undirected relation: related/related
- Directed linking: source/target
- Annotation: annotation/annotationMember
- Order: before/after
- Tagging: tag/tagMember
- Instantiation: type/instance
- Hierarchy: detail/context
- Equivalency: equivalent
- Labelled links: …/…-inverse
- Subclassing: is-a/superclass-of
- Further labels in the figure: Task priority, Document order
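A relation hierarchy pays off through inference: a statement made with a specific relation also holds for every more general relation above it, the same idea as `rdfs:subPropertyOf` entailment in RDF Schema. The sketch below assumes a hypothetical parent map (`PARENT`); its wiring is invented for illustration and is not read off the figure above.

```python
# Illustrative parent map: each relation points to the more general relation it
# specialises. This wiring is a guess for demonstration, NOT read off the CDS figure.
PARENT = {
    "target": "related",
    "after": "target",
    "detail": "target",
    "annotation": "target",
    "tag": "annotation",
    "type": "tag",
}


def generalizations(relation: str) -> list[str]:
    """Return the relation followed by its ancestors, most specific first."""
    chain = [relation]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def infer(statements: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """(s, r, o) also holds for every super-relation of r."""
    return {(s, sup, o) for s, r, o in statements for sup in generalizations(r)}


facts = {("Ada", "type", "Person"), ("Chapter1", "detail", "Section1.1")}
for statement in sorted(infer(facts)):
    print(statement)
```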
Motivation
Examples for Knowledge Models
- Engineering
- Fiction Writing
- Thinking
- Req. Engineering
- Simulation
How does Writing/Reading work?
Writing / Sending
- Write down ideas
- Group them (mind maps)
- Structure them
- Add argumentation structures (???)
- Add references to literature (reference manager)
- Link pieces in a first draft, add introduction and conclusion, repeat until coherent flow (text processing)
- Publish document
Reading / Receiving
- Visualise the structure graphically (mind maps)
- Connect new structures with existing own structures (???)
"Von der Idee zum Text" ("From the idea to the text") [Esselborn 2004]
The tool chains break
- Create a new slide show out of three old presentations plus one from your colleague
  - Why not have the content in smaller, more logical chunks?
- Re-use the motivation part of an old paper for a new one
  - If you find a misspelling, why should you have to fix it twice?
- Search a stack of paper notes with good ideas
  - Why are those not in your computer?
- Search email archives to find out what the high-level architecture for the new authentication system is
  - Why not browse your PKM and see the relations?
Technological Developments
- Analog → Digital
  - accelerated distribution by many orders of magnitude
  - lower costs
[Figure: communication speed and cost over time]
Cost of Communication
Data transmission is cheap now
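- Total cost of communication to send content to n people:

$$C_{\text{total}} = C_{\text{choose}} + C_{\text{encode}} + C_{\text{order}} + n \cdot \left( C_{\text{transmit}} + C_{\text{read}} + C_{\text{decode}} + C_{\text{network}} + C_{\text{integrate}} \right)$$

where
- C_choose: choosing relevant parts of the personal model
- C_encode: encoding of model parts in document parts
- C_order: ordering document parts strictly linearly/hierarchically
- C_transmit: data transmission
- C_read: linear reading of the document
- C_decode: decoding of model parts from document parts
- C_network: creating a networked model out of model parts
- C_integrate: integrating the new model into the existing model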
Cost of Communication
Where can we save, if n is small?
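One way to explore this question is to plug numbers into the cost formula. The sketch assumes that choosing, encoding and ordering are paid once by the sender, while transmission, reading, decoding, networking and integrating are paid once per recipient; all unit costs and the names `total_cost` and `sender_share` are hypothetical.

```python
# Hypothetical unit costs, chosen only to show the shape of the formula;
# "transmit" is 0 because data transmission is now cheap.
SENDER = {"choose": 2, "encode": 5, "order": 3}  # paid once
PER_RECIPIENT = {"transmit": 0, "read": 2, "decode": 2, "network": 2, "integrate": 2}  # paid n times


def total_cost(n: int, sender: dict[str, int] = SENDER,
               per_recipient: dict[str, int] = PER_RECIPIENT) -> int:
    """C_total = sum of sender terms + n * sum of per-recipient terms."""
    return sum(sender.values()) + n * sum(per_recipient.values())


def sender_share(n: int) -> float:
    """Fraction of the total cost that is paid once, on the sender side."""
    return sum(SENDER.values()) / total_cost(n)


for n in (1, 3, 100):
    print(n, total_cost(n), round(sender_share(n), 2))
```

With these made-up numbers the sender-side share drops from over half of the total at n = 1 to about 1% at n = 100: for small n, the one-off encoding and ordering effort is where savings matter most.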
Current process – culture is document-centric
[Figure: sender, recipient(s) and cost]
Ideal process: what if knowledge models, rather than documents, were exchanged between people?
[Figure: sender, recipient(s) and cost]
Realistic (improved) process – use both
[Figure: sender, recipient(s) and cost]
Information Management Problems → Solution: Knowledge Models
- Under-utilisation of the interlinked nature of information [Oren]
  - The fine-granular nature of knowledge models allows for precise and effective linking, and browsing
- People have problems using strict hierarchies [Oren]
  - Classification methods like tagging and non-strict taxonomies
- Keep the context [Oren]
  - The networked nature of a knowledge model is better suited to representing contextual links than a set of documents
- Granularity
  - Represent more than the content of just one document
When to use Knowledge Models?
- Fixed domain:
  - Use domain-specific tools & languages
  - Standardised representation formalisms
  - Established data exchange processes
- Open domain or multiple domains; audience: myself, my team, my community:
  - Use personal knowledge models
  - Unstructured, semi-structured, semi-formal and formal parts
  - Ad-hoc formalisation
  - Cheaper to create, easier to integrate
- Broad audience:
  - Use documents
  - Costly to create
  - Cheap to read → sometimes the best solution
  - Hard to integrate
Related Work in Semantic Authoring
- Initial ideas, although the term was not yet used, can already be found in V. Bush and D. Engelbart
- ABCDE format by Anita de Waard
- Semantically annotated LaTeX (SALT) by Tudor Groza
- Systems allowing end-users to construct ontologies out of their linked information objects
- L. Ludwig sees redundancy within and among documents as a hurdle to efficient information usage. The traditional notion of a document is replaced by virtual documents, which render parts of the knowledge base as an interactive tree.
- Bernstein describes Tinderbox, a "personal content management assistant", which offers sophisticated HTML generation via templates.
- The Gnowsis system by Sauermann allows linking desktop objects and integrates with a wiki
- iMapping: semantic concept maps by Haller
- Work in the same direction in the fields of the semantic desktop and semantic wikis
- Semantic Web Content Repository (swecr)
Conclusion
- Documents
  - Authoring is the bottleneck
  - The document-centered culture is a costly legacy artefact and a bottleneck for our society
- Personal knowledge models
  - We should bring the power of modeling to the end-user
  - Don't break the tool chain
  - Focus on work-in-progress
  - Superset of documents and ontologies
  - Integrate with the semantic desktop
  - Make knowledge workers happier and more productive

Thank you very much for your attention!
Contact: Max Völkel, voelkel@fzi.de