From Documents to Knowledge Models
Max Völkel, voelkel@fzi.de
Forschungszentrum Informatik an der Universität Karlsruhe (TH)

Personal Knowledge Management
Definition: knowledge cues [Haller]
Any kind of symbol, pattern or artefact which evokes some knowledge in a person's mind when viewed or used.
Knowledge cues can be stored and retrieved on a computer, while knowledge may or may not.
(Ok, in fact you store bits (signals).)

© 2007 Max Völkel, FZI 29.03.07, ProKW @ WM2007, Potsdam, Germany http://xam.de/2007/doc2km

What is a Document?
A team of 50 French researchers discussed …

Definition: Document
A team of 50 French researchers could agree on:
- Document as form: the document is a container which assembles and structures the content to make it easier for the reader to understand.
- Document as sign: emphasizes the argumentative structure of the content. A document can be referenced and acts as a sign for its meaning.
- Document as medium: the "reading contract", i.e. the intention or assumption of the author about what will happen with the document.

Document (my definition) I/II
- A document consists of information atoms. An information atom is the smallest unit of content which can be interpreted without a document's context (but of course requiring background knowledge). For text, these atoms are single words.
- Packaging: establishes a context.
- Reference-ability: a reference to a published document can act as a placeholder for the content expressed within.
- Process metadata: should be sent along, such as authors, audience, goal.

Document (my definition) II/II
A document is a knowledge artefact consisting of several layers:
- Content Semantics: content means something. Building upon logical and argumentative structure, the author encodes statements about a domain within the content.
- Argumentative Structure: to convey its content to the reader. Argumentative structures appear on all scales. A typical structure is the "Introduction, Related Work, Contribution, Conclusion" pattern of scientific articles. On smaller scales, patterns like "claim-proof" and "question-answer" are used.
- Logical Structure: can reference smaller parts within a document, e.g. paragraphs, headlines, footnotes, citations, and title.
- Visual Structure: guides the reader informally. Type-setting (e.g. bold, italics, different font styles and sizes), placement of figures, pages.
- Linearity: carries additional information. A defined order for navigating through all information items.

"I propose a different document agenda: I believe we need new electronic documents which are transparent, public, principled, and freed from the traditions of hierarchy and paper."
Ted Nelson

What do people want? Why?

What is a Wiki?
What's new compared to CMS?
Wikis were the first deployed, collaborative hypertext authoring environments.
- Easy Contribution: wiki pages can be created and edited by any user quickly and easily. Result: shorter time-to-publication.
- Easy Writing: simple text formatting without the need to learn HTML (wiki syntax).
- Easy Linking: automatic linking converts written names of pages, images and websites to links. People want more links. Directly link deep into a wiki using readable names.
- Recent Changes: see what has happened (awareness).
- Diff function: shows the latest changes. Easily check whether changes are ok.
- Fulltext search: for page titles and text.
- Backlink function: shows which pages link to the current page. Find the context of this page.

What is a Model?
My definition, based on the OMG metamodel MOF: typed entities and typed relations.
[Figure: three layers. Real world from the viewpoint of the individual (Entity X, Entity Y, Artifact X, Artifact Y); Modelling (Type A1, Type B1, Type C1); (Meta-)Modelling (Type A2, Type B2, Type C2).]

What is a Knowledge Model?

Aspect | Document | Ontology | Knowledge Model
--- | --- | --- | ---
Information atoms | Text (paragraphs, images, multimedia resources) | Concepts | Items (text, images, other binary resources)
- Text | Short (headlines) and longer (paragraphs) | Short labels | Anything from short labels to structured documents
Order | Strict linear order | - | Yes, may be partial and have cycles
Hierarchy | Yes (chapters, sections, paragraphs, sentences) | Yes | Yes, may be partial and have cycles
Annotations | Yes (footnotes) | Yes | Yes
- Tagging (annotation with keywords) | - | - | Yes
- Typing (incl. inferencing) | - | Yes | Yes
Hyperlinks | Yes (internal references and external citations) | - | Yes, don't have to occur inside text
Visual layout | Yes | - | -

From Documents to Knowledge Models
What the move from analogue to digital documents brought, and what knowledge models add:
- Smaller content granularity; in knowledge models: very small information atoms, such as single words.
- More interconnected content; in knowledge models: richly connected items.
- More explicit structures; in knowledge models: explicit semantics for the links.
Definition: A knowledge model is a superset of documents and formal ontologies. Annotated documents, stored together with their annotations, can be seen as a knowledge model.

What is a CDS?
Conceptual Data Structures
[Figure: an Item at the centre, surrounded by the relation ends context, detail, source, target, before, after, annotation and annotation member.]
M. Völkel and H. Haller: Conceptual Data Structures (CDS) - Towards an Ontology for Semi-Formal Articulation of Personal Knowledge. In Proc. of the 14th International Conference on Conceptual Structures 2006, Aalborg University, Denmark, July 2006.

What is a CDS-based Knowledge Model?
- A set of addressable items (text, images, maybe even multimedia elements)
- Relations between items, classified in four types:
  - Source/target: the generic, directed hyperlink
  - Before/after: ordering relations, linear navigation
  - Context/detail: hierarchical relations, document and concept hierarchies
  - Annotation/annotationMember: annotations, which give the ability to type items and relations; items are used as types (meta-modeling)
- Knowledge models must be able to capture work-in-progress: CDS is not strict, you can have cycles, untyped items, paradoxical ordering, …

CDS: A Hierarchy of Relations
[Figure: hierarchy of relation types along an axis from informal to formal. Legend: Relation Type: relation/inverse.]
Relation types shown in the figure:
- Undirected Relation: related/related
- Directed Linking: source/target
- Annotation: annotation/annotationMember
- Order: before/after
- Tagging: tag/tagMember
- Instantiation: type/instance
- Hierarchy: detail/context
- Equivalency: equivalent
- Labelled Links: …/…-inverse
- Subclassing: is-a/superclass-of
Further labels in the figure: Task priority, Document order.

Motivation

Examples for Knowledge Models
Engineering, Fiction Writing, Thinking, Req. Engineering, Simulation

How do Writing and Reading work?
Writing / Sending (after "Von der Idee zum Text" [Esselborn 2004]), with today's tool support:
- Write down ideas, group them, structure them: mind maps
- Add argumentation structures: ???
- Add references to literature: reference manager
- Link pieces in a first draft, add introduction and conclusion, repeat until coherent flow: text processing
- Publish document
Reading / Receiving:
- Visualise the structure graphically: mind maps
- Connect new structures with existing own structures: ???

The tool chains break
- Create a new slide show out of three old presentations plus one from your colleague. Why not have the content in smaller, more logical chunks?
- Re-use the motivation part of an old paper for a new one. If you find a misspelling, why do you have to fix it twice?
- Search a stack of paper notes with good ideas. Why are those not in your computer?
- Search email archives to find out what the high-level architecture for the new authentication system is. Why not browse your PKM and see the relations?

Technological Developments
- Accelerated distribution by many orders of magnitude
- Lower costs
[Figure: communication speed and cost over time, from analog to digital.]

Cost of Communication
Data transmission is cheap now.
Total cost of communication to send content to n people:
(sender-side costs) + n · (costs per recipient)
Cost components, in the order shown on the slide:
- choosing relevant parts of the personal model
- encoding of model parts in document parts
- ordering document parts strictly linear/hierarchical
- data transmission
- linear reading of the document
- decoding of model parts from document parts
- creating a networked model out of model parts
- integrating the new model into the existing model

Cost of Communication
Where can we save, if n is small?
Total cost of communication to send content to n people:
(sender-side costs) + n · (costs per recipient)
Cost components, in the order shown on the slide:
- choosing relevant parts of the personal model
- encoding of model parts in document parts
- ordering document parts strictly linear/hierarchical
- data transmission
- linear reading of the document
- decoding of model parts from document parts
- creating a networked model out of model parts
- integrating the new model into the existing model

Current process: culture is document-centric
[Figure: cost for sender and recipient(s).]

Ideal process: What if not documents, but knowledge models were exchanged between people?
[Figure: cost for sender and recipient(s).]

Realistic (improved) process: use both
[Figure: cost for sender and recipient(s).]

Information Management Problems, Solution: Knowledge Models
- Problem: under-utilisation of the interlinked nature of information [Oren]. Solution: the fine-granular nature of knowledge models allows for precise and effective linking and browsing.
- Problem: people have problems in using strict hierarchies [Oren]. Solution: classification methods like tagging and non-strict taxonomies.
- Problem: keep the context [Oren]. Solution: the networked nature of a knowledge model is more suited to represent contextual links than a set of documents.
- Problem: granularity. Solution: represent more than the content of just one document.

When to use Knowledge Models?
- Fixed domain: use domain-specific tools and languages. Standardised representation formalisms, established data exchange processes.
- Open domain or multiple domains (myself, my team, my community): use personal knowledge models. Unstructured, semi-structured, semi-formal and formal parts; ad-hoc formalisation; cheaper to create, easier to integrate.
- Broad audience: use documents. Costly to create, cheap to read (sometimes the best solution), hard to integrate.

Related Work in Semantic Authoring
- Initial ideas, although that term was not used, can already be found in V. Bush and D. Engelbart.
- ABCDE Format from Anita de Waard
- Semantically Annotated LaTeX (SALT) by Tudor Groza
- Systems allowing end-users to construct ontologies out of their linked information objects.
- L. Ludwig sees redundancy within and among documents as a hurdle to efficient information usage. The traditional notion of a document is replaced by virtual documents, which render parts of the knowledge base as an interactive tree.
- Bernstein describes TinderBox, a "personal content management assistant", which offers sophisticated HTML generation via templates.
- The Gnowsis system by Sauermann allows linking desktop objects and integrates with a wiki.
- iMapping: semantic concept maps by Haller
- Same direction in the fields of semantic desktop and semantic wiki
- Semantic Web Content Repository (swecr)

Conclusion
Documents
- Authoring is the bottleneck
- Document-centered culture is a costly legacy artefact and bottleneck for our society
Personal knowledge models
- We should bring the power of modeling to the end-user
- Don't break the tool chain
- Focus on work-in-progress
- Superset of documents and ontologies
- Integrate with the semantic desktop
- Make knowledge workers happier and more productive

Thank you very much for your attention.
Contact: Max Völkel, voelkel@fzi.de
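
Supplementary sketch: CDS-based knowledge model
The slides describe a CDS-based knowledge model as a set of addressable items plus relations of four types (source/target, before/after, context/detail, annotation/annotationMember), and stress that it is not strict. The following Python sketch illustrates that idea under stated assumptions; it is not the CDS implementation. The names KnowledgeModel, Relation, LINK, ORDER, HIERARCHY, ANNOTATION, relate, details and types_of are hypothetical, and for brevity only items are typed here, not relations.

```python
from dataclasses import dataclass, field
from enum import Enum


class Relation(Enum):
    """The four CDS relation types; each value names (first role, second role)."""
    LINK = ("source", "target")
    ORDER = ("before", "after")
    HIERARCHY = ("context", "detail")
    ANNOTATION = ("annotation", "annotationMember")


@dataclass
class KnowledgeModel:
    """Hypothetical container: addressable items plus typed relations."""
    items: dict = field(default_factory=dict)    # item id -> content
    relations: set = field(default_factory=set)  # (first id, Relation, second id)

    def add_item(self, item_id, content):
        self.items[item_id] = content

    def relate(self, first, relation, second):
        # Only the existence of both items is checked. Cycles and
        # contradictory orderings are accepted on purpose, because the
        # model should be able to hold work-in-progress.
        for item_id in (first, second):
            if item_id not in self.items:
                raise KeyError(f"unknown item: {item_id}")
        self.relations.add((first, relation, second))

    def details(self, context_id):
        """Ids of items that are a detail of the given context item."""
        return sorted(s for f, r, s in self.relations
                      if r is Relation.HIERARCHY and f == context_id)

    def types_of(self, item_id):
        """Ids of items annotating item_id (ordinary items act as types)."""
        return sorted(f for f, r, s in self.relations
                      if r is Relation.ANNOTATION and s == item_id)


km = KnowledgeModel()
km.add_item("doc", "From Documents to Knowledge Models")
km.add_item("intro", "Motivation")
km.add_item("cds", "Conceptual Data Structures")
km.add_item("section", "Section")  # an ordinary item, used as a type below

km.relate("doc", Relation.HIERARCHY, "intro")
km.relate("doc", Relation.HIERARCHY, "cds")
km.relate("intro", Relation.ORDER, "cds")
km.relate("cds", Relation.ORDER, "intro")  # paradoxical ordering is tolerated
km.relate("section", Relation.ANNOTATION, "intro")

print(km.details("doc"))     # ['cds', 'intro']
print(km.types_of("intro"))  # ['section']
print(km.types_of("cds"))    # [] (untyped items are allowed)
```

Storing relations as plain triples keeps the sketch non-strict by construction: nothing prevents cycles or untyped items, which matches the work-in-progress requirement on the slide. A stricter model would add validation on top rather than change the storage.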