20010611-ADBIS_jwshwsms

advertisement
Lecture Notes in Computer Science
1
Subject-Oriented Work: Lessons Learned from an
Interdisciplinary Content Management Project
Joachim W. Schmidt, Hans-Werner Sehring, Michael Skusa, and Axel Wienberg
Technical University TUHH Hamburg, Software Systems Institute,
Harburger Schloßstraße 20, D-21073 Hamburg, Germany
{j.w.schmidt, hw.sehring, skusa, ax.wienberg}@tu-harburg.de
Abstract. …
1 Introduction
Message
- Since advanced and extensible database technology is available as off-the-shelves
products, a substantial part of database research and development work generalized
into work on models and systems for multimedia content management.
- R&D work in content management includes a range of models and systems concentrating on services for the following three lines of work:
 content production and publication work by multimedia documents;
 classification and retrieval work based on document content;
 management and control of such work for communities of users differentiated by their roles, profiles, rights etc.
- Currently, many contributions to such R&D work are based on XML as a syntactic
framework which provides a structural basis as well as some form of implementation platform.
- The main reasons for XML’s strong position are its strong structural commitment
and its semantic neutrality. Unfortunately, much of the XML-based R&D contributes to the above three lines of work only individually. Examples include [ ], [ ],
…[ ].
- Successful content management requires, however, that the above services are not
provided in isolation along the individual lines of work but in an coherent and cooperative working environment covering the entire space spanned by all three dimensions of work;
- The main reason why XML-based work on content management falls short can be
phrased as a corollary of XML’s strength (see above): its weak semantic and exclusively structural commitment.
- Computer science alone
Lecture Notes in Computer Science
2
- The work reported here is based on interdisciplinary project work
 a partner project from the humanities with strong semantic and weak
formal commitment: icon- and text-based content form the subject area
of “Political Iconography (PI)” organized as a hierarchical subject index
(PI-“Bildindex”, BPI) and used for art history research and education;
 iconography has a long tradition, e.g., back into 19 th century “Christian
Iconograpyh, and is based on integrated experience from three sources:
- from art: handling of multimedia content;
- from the library sciences: content classification and retrieval;
- from art history research and education: process knowledge;
 the interdisciplinary project “Warburg Electronic Library (WEL)” which
models and computerizes the services for the BPI and allows interdisciplinary experiments with the resulting prototype; deep application insights gained into multimedia content management;
 the generalization of our experience and the establishment of a work plan
for content management R&D aiming at a generic Subject-Oriented
Working environment (SOWing environment).
Structure: The paper is structured as follows
 section two gives a short overview over the two projects involved, the
“Bildindex Political Iconography (BPI)” and the “Warburg Electronic
Library (WEL)”;
 in section three the WEL model is generalized towards a generic “Work
Explication Language”;
 a systems view on the WEL prototype is given in section four and the
first contours of a generic Subject-Oriented Working environment
(SOWing platform) are outlined;
 related work, in particular work in the XML context, is discussed in section six;
 the paper concludes with a short summary and a presentation of future
work.
2
Warburg Electronic Library: An Interdisciplinary Content
Management Project
The development of the currently predominant models for data management was heavily influenced by application requirements from the business and banking world and
their bookkeeping experience: the concepts of record, tabular view, transaction etc.
are obvious examples. Data model development had to go through several generations
- record-based file, hierarchy and network models – until the relational model reached
a widely accepted level of abstraction for database structuring and content-based data
operation.
Lecture Notes in Computer Science
3
For traditional relational data management we basically assume that content is values
form business domains operated by transactions and laid-out as tables with rows and
columns. The question now is:
- how can we generalize from this data management basis to the area of content management where content domains are not ”just dates and dollars”,
content operation goes beyond “debit-credit transactions” and content layout
means multimedia documents?
- what are key application areas beyond bookkeeping which help us understand the core set of requirements for multimedia content management in
terms of domain modelling, content work as well as content (re-) presentation?
Figure 1: Working with the Index for Political Iconography
in the Warburg house, Hamburg (old Abb. 2.1)
The work presented here is based on an interdisciplinary r&d-project between computer science and art history, the “Warburg Electronic Library” project. The application area was chosen because of art history’s long-term management experience with
content of various media. The project itself is grounded on extensive material and user
experience from the area of “Political Iconography”.
2.1 An Image Index for the Political Iconography (BPI)
Political iconography basically intends to capture the semantics of key concepts of the
political realm under the assumption that political goals, roles, values, means etc.
requires mass communication which is implemented by the iconographic use of images. Our partner project in art history, the “Bildindex zur Politischen Iconographie
(BPI)”, was initiated in 1982 by the art historian Martin Warnke [ ] and consists of a
bout 1.500 named political concepts (subject terms, “Schlagworte”) and more than
300.000 records on iconographic works relevant to the BPI. In 1990 Warnke’s work
was awarded by the Leibniz-Preis, one of the most prestigious research grants in Germany.
Iconography has a long tradition, at least back into the 19th century (“Christian Iconography”), and is based on contributions and experience from three sources:
- from art: handling and presentation of multimedia content;
- from the library sciences: content classification and retrieval;
- from art history research and education: process knowledge and methodological experience.
Figure 2: BPI “Bildkarte” St. Moritz (media card) describing art work by attribute
aggregation
<old Abb. 3.1, some English>
Lecture Notes in Computer Science
4
Starting from this experience, BPI-work essentially relies on an art historian’s
knowledge on political acts in which images played an active role and on documents
which record such acts. The art historians interpret “acts” as encompassing aspects of
- “projects” (who initiated and contributed to an act? the when and where of an
act? etc.);
- “products” (what piece of art did the project produce? on what medium?
place of current residence etc.); and finally, the
- “concepts” behind the act (what political goals, roles, institutions etc. to be
addressed? what iconographic means to be used by the artist? etc.).
On this basis, the BPI isolates political concepts – e.g., rulers, princes, popes, rulers
on horseback – and names them individually by subject terms.
Subject term semantics is captured and represented in the BBI by
- designing a conceptual, prototypical model for each subject term (mostly
mentally), e.g., a prototype which encompasses all facets considered relevant
to a prototypical equestrian statute;
- giving value to facets by reference to the art historian’s body of knowledge
and documents on political acts and arts; each such facet is represented by a
BPI-element (“Bildkarte”, “Textkarte”, Videokarte” – “media cards” etc.) intended to hold the description of a “good case” for that facet, see for example, St. Moritz, see fig. 2.
- collecting all BPI-elements describing some prototype into an extent ( “Bildkartenstapel”, … - “extent of media cards”, see fig. 3) which defines the subject term’s semantics; additional fine structure may be imposed on subject
term extents (order, “neighborhood”, named subextents, general association/navigation etc.);
- maintaining a process which aims at
o “representative” subject terms covering and structuring the subject
area at hand (e.g., political iconography);
o a “good” prototype for each subject term;
o a “complete” set of facets for prototype description;
o a “qualifying” case from the subject area for the substantiation of
each facet.
The BPI is by no means an index for accessing an image repository! The BPI uses
images only in their rather specific role as icons and for the specific purpose of contributing to the semantics of subject terms. In this sense, images represent the iconographic vocabulary of BPI-documents just as keywords contribute to the linguistic
vocabulary of text documents.
Figure 3: BPI subject term semantics (e.g. equestrian statue) by media cards classification (image, text, video etc.)
<old Abb. 3.1, some English>
Lecture Notes in Computer Science
5
The BPI has essentially two groups of users: a few highly experienced BPI-editors for
content maintenance and various wide user communities which access BPI-content for
research and education purposes.
Being implemented on paper technology, the traditional BPI also shows, however,
severe shortcomings.
- conceptually: the above attributes “representative”, “good”, “complete”, etc.
are highly subjective and hard to meet even within a single “Warnke-owned
BPI”;
- technically: crucial representational limitations - ranging from problems with
multiple subsumption of BPI elements to online and networked BPI access are obvious.
In the subsequent section we give an outline of the “Warburg Electronic Library”
project which approaches these conceptual and technical shortcomings by an advanced digital library project which, as a first application area, is now hosting the
“Index for Political Iconography”.
2.2 The Warburg Electronic Library (WEL) Project
Viewed from the perspective of our Computer Science interests which shifted in recent years clearly from research in “databases and persistent programming” towards
r&d in “software systems for content management (online, multi media, …)”, the
WEL project addresses a range of highly relevant and interrelated content application
issues:
- content representation by multiple media: images, texts, data, …;
- content structuring, navigation and querying, content presentation;
- content work based on subjects and ontologies: classification, indexing, …;
- utilization of different referencing mechanisms: icon, index, symbol;
- cooperative processes and projects on multi media content in research and
education.
The WEL is an interdisciplinary project between the art history department of Hamburg University (research group on Political Iconography, Warburg House) and the
software systems institute of the Technical University, TUHH, Hamburg. It started in
1996 as a 5-years project and will be extended into an interdisciplinary r&dframework between several Hamburg-based institutions.
For a short WEL overview we will concentrate on two project contributions:
- application of the classical semantic database modelling principles for WELdesign;
- development of personalized and cooperating digital WEL libraries based on
project-specific prototypes and their use in art history education.
Lecture Notes in Computer Science
6
2.2.1 WEL Semantic Modelling Principles
The WEL design is based – and so is already the BPI design – on the classical principles for semantic data modelling [ ], [ ]: aggregation, classification, generalization /
specialization and association/navigation (see figs. 2 and 3).
Figure 4: Media card in the BPI-WEL associated with information on (multiple)
subject term subsumption and on classification work identification
<old Abb. 3.5 and 3.8, some English>
Fig. 4 shows the media card of St.Moritz associated with all the (multiple) subject
terms to which St. Moritz contributes. The example also links St. Moritz to details of
the classification process by which this card entered the WEL. This information is
essential for the realization of project-oriented views on subject terms which may be
customized along the following two dimensions:
- thematic views: projects usually concentrate on sub areas of the all encompassing “Index for the Political Iconography”;
- personalized views: they cope with the conceptual problems mentioned at the
end of section 2.1: what is a “representative”, “good”, “complete”, etc. subject definition?
Initial experiences with both of these aspects are outlined in the subsequent section.
2.2.2 On Project-oriented Educational Work with the WEL-BPI
Figure 5: Art history seminar work with focused and personalized WEL-BPIs
<old Abb. 5.2, some English>
3
A Generalized WEL-Model: Towards a "Work Explication
Language"
In this section: a look at the building blocks of the SOWing platform.
Conceptually, our SOWing platform is based on a notion of “work”, and the services provided are intended to make work more explicit. In doing so, our SOWing
platform improves the basis for content-oriented work organization, management and
execution.
Lecture Notes in Computer Science
7
3.1 An Overview on SOWing Entities
Content management as supported by the services of our SOWing platform is based
on the notion of “work”. In our context, the notion of work is essentially captured by
three dimensions of work properties:
- the circumstances under which a work is performed (work as “project”),
- a work’s result (work as “product”), and
- the conceptual idea behind a work (work as “concept”). -> work triangle.
Conceptually, the service domains provided by a SOWing platform are based on
three kinds of work-related entities (“SOWing entities”):
- “works” characterized by the three work dimensions as sketched above;
- “work documents” which are assumed to report on some work (work documents
may exist physically or mentally);
- “work document descriptions” which record the essential content of a work document, i.e., content which refers to our three work dimensions. Such document descriptions provide the basis for “subject term” definition.
With respect to their degree of formalization, SOWing entities may range from rather informal entities (to be mediated to humans) to fully formalized structures (to be
read and processed by machines). Furthermore, SOWing entities will be consumed
and produced by external tools thus requiring a general interfacing technology to the
world outside the SOWing platform.
Subsequently, we discuss the three kinds of SOWing entities in more detail.
3.1.1 On the Notion of “Work”
Work: idea behind a document, circumstances of its creation etc. [Svenonius]
Warnkes interest in three aspects:
- the circumstances of its production (goals, time, resources, …): project
- the outcome: product
- the abstract idea of it: concept
-> work triangle
Works inside the SOWing system:
- document description
- subject completion
- publication
3.1.1.1 Work as Task
Documents can be viewed as the visible outcome of some task. Then Work can be
viewed as that task.
Work thus represent tasks carried out in the past, but might also be used to monitor
tasks currently carried out in the system or to prescribe tasks to be carried out in the
future.
This task can be modeled from the observations, e.g., the document descriptions
and subjects created during the creation of some document (see ad-hoc workflows).
Then information on the work can be archived to learn from it / to use it when performing a comparable work later.
Lecture Notes in Computer Science
8
The task can also be defined in advance, to guide (or constrain) users. Typically (?)
along the organizational structures of the community:
- the group defines the work to be carried out by its members.
- a user files a request for preparatory work to a group
- …
3.1.1.2 Reflection
Everything done inside the SOWing system can be considered work. This means that
everything said about capturing works in the remainder of the paper also applies to the
works going on in the system.
Benefits:
- Identification of goal-oriented actions
 “intellectual transaction”
 actions: record steps carried out by user or prescribe steps to be carried
out be workflow engine
 work: “flow of works”, relationships between works / work creations
- re-use of work
 when starting new work, consideration of past own works and others’
works carried out by community members
- recording of actions
 to provide better meta-data on works (for community) and documents
(for the public)
31.1.3 [Note: Dockets]
Comparison with dockets
- content provider = document repository
- reviewer = work processor
- evaluator = work modeler
- property record = description
Dockets for works including contributions from outside the community?
3.1.2 On SOWing Documents
Documents: narration of work.
They will contain information of the three aspects of work: abstraction, project, and
product.
Furthermore, documents are created in the SOWing platform (s.b.).
3.1.2.1 Range of Document Use
Today, both “binary” files meant to be viewed by human beings as well as semistructured content is called document. We intend to allow this whole range, dealing
with documents which are just byte sequences, have a known format, …, elements
which have a known meaning, …, are input for a software component.
Apart from this technical range, the actual representation depends on
- the source from which the document comes
Lecture Notes in Computer Science
9
- the receiver for which the document is created
3.1.3 Work Document Descriptions and their Use: Subjects
Subjects cover
- the topics which are found in documents, and
- thus reflect the domain knowledge
Subjects:
- subject terms
- relationships (specialization -> index, instantiation -> extension)
- extension consisting of (document and work) descriptions (and collection of them,
eventually with special properties)
Completion semantics. Facets. Prototypes.
3.1.3.1 Completion semantics. Facets. Prototypes. Descriptions
Documents are considered being entities outside the system:
- autonomy of document repositories
- own documents are transmitted, not hosted (-> no access control, …)
- …
Our first aim is to provide a registry (“… a facility that stores relevant descriptive
information about registered objects, and a registered object is something important
that an author or producer wants to have visible to the world so that it can be discovered and used by a client or customer.” [1]) for a community.
Documents are found by searching the virtual, distributed document repository (real world, WWW, …). They are abstracted by creating a description; analyzed/interpreted with respect to the task at hand. Identification of the work which is
materialized by this document. This is the classical abstracting done in every library
when creating cards.
Descriptions are stored in the community’s registry.
Descriptions furthermore decouple documents from subjects (s.b.) by providing
“smart handles” which abstract from technical constraints (access, change management, versions, …).
Description = work description + document description + reference to document +
set of subject terms
Two goals:
- classify documents
- capture knowledge buried in documents
Growing knowledge reflected in subject indices (“organizational memory”):
- values in descriptions: record creation
- references in descriptions: relationships to past works
- …
3.1.3.2 Basic Subject Structure and Semantics
Specialization relationship on subjects, forming hierarchy (subject index).
Instantiation relationship between subject and document descriptions. Multiple subsumption of descriptions (see Error! Reference source not found.):
Lecture Notes in Computer Science
10
- in different indices of same type
- in indices of different types
Semantics of instantiation made up of different aspects:
- object structure: technically, descriptions have a type (in the sense of data type); all
aspects of the type theory of PLs / DBs apply here
- feature sets: dual to the object structure, features determine a semantic type; the
more special a subject becomes the more features are the descriptions from its extent expected to have
- completion semantics: the extent of a subject is supposed to be complete (at the
current time, for the user who created it); thus, the extent is a prototype for the subject; adding / removing descriptions from the extent adjusts the subject’s semantics
On Formal Subject Semantics:
Formal properties of subject classes originate form different semantic sources:
- object semantics: primarily, subject class entities are also object class entities in the
sense of object-oriented language models. However, the extent of a subject class
may be heterogeneous because it may be made up by entities describing different
media documents – text, image, video documents etc. Therefore, subject class entities viewed as objects may be members of different object classes – text, image,
video classes etc.
- content semantics: furthermore, all entities of the same subject class share the fact
that the content of the work documents they describe is to be subsumed under the
same subject term. Formally this can be expressed by a subset relationship between
content characteristics and a subject-related set of key features (key icons, key
words etc.). In our WEL application, for example, all paintings of rulers share key
motives such as crowns, scepters, swords etc. Such sets of key features also contribute to the semantics of subject classes. Note that specialization or generalization
of subject classes is based on the extension and union or reduction and intersection
of subject-related key features sets.
- completion semantics: although completion semantics may be considered more of
an issue of class pragmatics than class semantics it implies formal constraints on
subject class extents. Since users of subject definitions act under the assumption
that the subject owner established a subject extent which represents all relevant aspects of the owners subject prototype, any change of that extent is primarily monotonic, i.e., may improve a subject definition by adding or replacing subject class elements. Therefore, references to existing subject class entities should not become
invalid.
3.1.3.3 Specific Semantic Issues
Subject indices are not only used to capture domain knowledge on the community’s
(research) topic, but also “secondary knowledge”, especially
- organizational structure of the community, including knowledge on users, rights
granted to them …
- information on documents not related to their content, e.g., (kind of ) origin, document types, quality, …
- …
Lecture Notes in Computer Science
11
Each subject index is assigned a semantics (how to interpret specialization and instantiation).
index types:
- content-related
- community-related
- action-related
3.1.3.4 Range of Subject Use
Dimension, like for documents: from informal (ontological, only understandable by
humans) to typed (machine-readable, e.g. class hierarchy).
3.2 Predominant SOWing Relationships
3.2.1 Subject Index <-> Subject Index
3.3.1.1 Inter-Subject Relationships
Two (or more) subjects can be related
- directly
- by sharing descriptions
- via a relationships on the description layer
The relation of subjects can have a special semantics, e.g., if a description which
has been placed under a subject describing a user group and a subject for a class of
access rights might be interpreted as: the group may use functionality which requires
to have that right on the description.
Q: co- / contra-variant use of subjects, e.g., in the above example: right co-variant
(you have right r1 if you have right r2 with r2 <: r1), user group contra-variant (have
right r if one of the groups you belong to has right r)
3.3.1.2 Personalization
Coexistence / collaboration within in the community: personalization of subject indices. Take existing index as starting point, refine, contribute.
Within sub-communities / along organizational structures: direct exchange of subjects and descriptions.
- personalization: “copy” subjects and descriptions into private view -> ability to
change data and structure
- contribution: offer personalized items to next level in the organization; delivery of
results; decision about inclusion of the changes (small organizational unit: vote;
large organizational unit: librarian); if changes are rejected eventually birth of a
new community
Communication across communities / to other systems: documents (see next subsection).
Lecture Notes in Computer Science
12
3.2.2 Document <-> Subject
Relationship established via description.
Distinction of closed world of a community and the open public.
- documents for communication between community and “public” (other communities or unknown / unsupported tools)
- subjects to support community work
(work? only within community, or also between communities?)
3.3.2.1 Subject Definition
Subject is defined by documents.
- Completion semantics; ever changing semantics of subject.
- Extensional semantics.
- Find prototypes for subjects.
3.3.2.2 Document Classification
Document is classified by subjects.
- To share knowledge, to find documents, to cluster documents, …
- Subject term first.
- Each index reflects one dimension.
3.3.2.3 Document Creation From Subjects
Documents are “narrative”: they add enough context information to subjective
knowledge structure so that they can be communicated. To communicate with other
communities, subject indices are finally published as documents by
- adding “narration”
- adding layout
The amount of additional information to be inserted into the document depends on
the receiver. The greater the “organizational distance” between sender and receiver is,
the less can be assumed about the shared knowledge, and the more has to be put into
the document.
Usual (?) way: stepwise transformation of subject index into document structure
adding properties to extents (linear order, spatial ordering, …). Final step: layout;
usually outside the scope of subject-oriented work (except cases where the layout is
part of the content).
For publication, the work support enables the SOWing platform to provide additional meta-data from
- community, users (individuals with known interests and expertise)
- work support (knowledge about steps carried out for structure from which document is created)
In the sense of
- “semantic web” (machine-readable descriptions for software agents)
- work-document-relationships (see Svenonius; definition which documents refer to
the same work)
Lecture Notes in Computer Science
13
3.2.3 Work <-> Document
Interpretation and publication of documents is work.
Work is communicated using documents? Again, these documents can be directed
at human beings (“Handlungsanweisungen”) or software (e.g., it can be a workflow
plan to be enacted by a workflow engine). Like for any document, it is necessary to
know the organizational context of the receiver to decide how much information to put
into the document.
3.2.4 Work <-> Subject
Work on the subject level recorded/described by work description.
Work is classified/organized by subjects (by viewing it as a document and providing a description for it?). Relationships of domain, organization, and work reflected by
relationships between subjects / descriptions of the respective indices.
Document creation (“programming”) like above?
3.3 SOWing Entities and External Tool Support
- on the external use of SOWing entities
- XML as gluing language and interface technology: see below
4 The SOWing System
4.1 Examples
4.1.1 Example for Personalization
Given a data model for the Warburg Electronic Library, imagine the the file bpi.xml:
<library>
<name>Bildindex zur Politischen Ikonographie</name>
<description>...</description>
<classifier>
<name>Adel</name>
<number>10</number>
<classifier>
<name>Turniere</name>
<number>30</number>
<extent>
<extent-ref ref="4711"/>
<extent-ref ref="4712"/>
</extent>
</classifier>
<extent>
Lecture Notes in Computer Science
14
<extent-ref ref="0815"/>
</extent>
</classifier>
<picture-card id="0815">
<title>Bonaparte überquert die Alpen ...</title>
<artist>...</artist>
</picture-card>
<movie-card id="4711">
<title>Napoleon</title>
<director>Abel Gance</director>
</movie-card>
</library>
According to his conceptual model, the user might work with descriptions like this:
/bpi.xml?user=sehring:
<library>
<name>Bildindex zur Politischen Ikonographie</name>
<description>...</description>
<classifier>/cls_10.xml</classifier>
</library>
/cls_10.xml?user=sehring:
<classifier>
<name>Adel</name>
<number>10</number>
<classifier>/cls_10_30.xml</classifier>
<extent>/pic_crd_0815.xml</extent>
</classifier>
/cls_10_30.xml?user=sehring:
<classifier>
<name>Turniere</name>
<number>30</number>
<extent>/mov_crd_4711.xml</extent>
<extent>...</extent>
</classifier>
/pic_crd_0815.xml?user=sehring:
<picture-card>
<title>Bonaparte überquert die Alpen ...</title>
<artist>...</artist>
</picture-card>
/mov_crd_4711.xml?user=sehring:
<movie-card>
<title>Napoleon</title>
<director>Abel Gance</director>
</movie-card>
Lecture Notes in Computer Science
15
While performing description work, users often modify the Conceptual Model. In our
example, a user gets pic_crd_0815.xml from the community's library, personalizes it,
and it sends back to the community as pic_crd_0815.xml:
<picture-card>
<title>Napoleon überquert die Alpen ...</title>
<artist>/artistdb/artist4368.xml</artist>
<medium>Gemälde</medium>
</picture-card>
When analyzing the user’s contribution, the SOWing system detects
- a value change: “Bonaparte” -> “Napoleon”
- a type change: “artist” now is a reference, no longer a value
- an additional feature has been added: “medium”
When the community decides to integrate the user’s contribution into the community’s registry, it has to perform the data (and eventually schema, otherwise slide 59)
update.
An issue: how does the system recognize the type-change for artist?
- user provides that information by submitting his personalized conceptual model (->
protocol between server and client)
- server gets the description work description (his personalized system as well as the
community’s registry are SOWing system; they can exchange the descriptions)
- <artist> => <artist-ref>
- <artist type="value"> => <artist type="reference">
=> 2nd solution??
The list features introduced by users grows without control; the system just stores
the additional features. Eventually, the community decides to change its conceptual
model to reflect the commonly used additional features (user-induced schema evolution). Community process to decide
- which features to include
- which feature name to use
- which feature values to allow
Descriptions created according to the old conceptual model need to be considered
(see slide 60).
4.1.2 Example for an Updated Document Description
E.g., user submits the XML document report_on_napoleon.xml:
<report>
In dem <medium>Gemälde</medium>
<title>Napoleon überquert die Alpen ...</title>
stellt <artist>...</artist> Napoleon reitend dar...
</report>
Then the SOWing system will start analyzing the document and generate a document
description:
<picture-card>
<!-- according to the original conceptual model -->
Lecture Notes in Computer Science
16
<title>Napoleon überquert die Alpen ...</title>
<artist>...</artist>
<!-- additional markup recognized -->
<medium>Gemälde</medium>
</picture-card>
This description serves as the basis for the description workers task.
Addition of features s.a.
4.2 Requirements
4.2.1 Descriptions
Structured part and free-form part.
Each entry: value or reference.
XML with “partial schema”?
4.2.2 Subject Terms and Feature Names
Vocabulary (dictionary), thesaurus, … -> terminology, spelling, synonyms, …
E.g., user can select feature names from the list of names invented in the past; mapping from user-defined names to community’s terminology.
Likewise, support for feature values (if from extendable enumeration type).
4.2.3 Personalization
System requirements for ~
Degrees of ~:
- New content or descriptions
 basic: annotations
 also new documents + descriptions
 new subjects (at leaf position)
- structural amendments
 additional instantiation relationships.
 else?
- modification of subjects or descriptions
 modify copy (copy-on-demand).
 notification about changes of the source object. merging?
- schema personalization
 descriptions (and documents?) have a structured and an unstructured part
 structured part defined in schema (conceptual model)
 unstructured part addible by user (additional features)
Lecture Notes in Computer Science
17
4.3 System Design Issues
4.3.1 Organization, Ownership, Relationship of Objects and Community
see “new content or descriptions”, section 4.2.3
Registry must know ownership to formulate queries etc. w.r.t. owner.
4.3.2 Relationships as First-Class Objects
For “structural amendments”, section 4.2.3
Registry stores objects and relationships separately; esp. not object-oriented (references belonging to one of the objects) or optimizations (1:n relationships by foreign
key).
4.3.3 Variants
for personalization (modification of subjects or descriptions, see 4.2.3)
Subjects, descriptions, and relationships stored in registry which supports variants:
at least one per user.
Personalized variant hides public one?
Notification about changes of public variant.
4.3.4 Flexible Data Model
Needed for schema personalization (see 4.2.3).
Divide descriptions in structured and unstructured part.
Unstructured part can be inserted into schema during contribution -> limited form
of schema evolution
for:
- semi-structured descriptions with changing schema
- changing value / reference semantics (interpretation of properties)
- schema personalization
4.3.5 Revisions
Especially for structural amendments and modifications (see 4.2.3) it is important to
support revisions (as opposed to update-in-place). Reason: the additions/modifications
need not be right for the updated objects (ex.: annotation to correct mistake in description, then description is corrected). Therefore, revisions of subjects, descriptions, and
relationships held in registry.
Notifications if new revisions are created (to decide whether to change structure to
reference newest revision).
4.3.6 Work Support
ad-hoc work (someone starts doing things) or planned work (defined goal, predefined
by modeler)
support: work identification, tracking, transmission of documents describing work
Lecture Notes in Computer Science
18
4.3.6.1 Work Identification
initial description by worker or modeler (explicit?)
resume (see persistent session?)
identification of contributing (previous) works
issue: how does user understand which work he is working on (e.g., if concurrently
engaged in multiple works)
4.3.6.2 Tracking
observe user to answer questions later -> every functionality decorated [G. Kaiser:
Tools in Heterogeneous Environments?]
(idea: explanation component for “expert system where the theorem prover is a human
being”?)
system creates description of work taking place
e.g., needed for interpretation of schema changes (see 4.1)
4.3.6.3 [Range of Work Descriptions -> Kap. 5?]
Documents to communicate task descriptions(?):
- Program source code
- Function calls (-> XML RPC, SOAP, XP)
- Workflow plans (-> XPDL, WfXML, ebxml)
- Task descriptions for humans beings
4.3.6 SOWing Environment
for the above requirements
4.3.6.1 Personalized Conceptual Models
Data model for machine
Conceptual model for user
Idea: user gets data and functionality in terms of his conceptual model; translated into
data model
XML-based Communication
see figures…
4.3.6.2 Generative Approach
When implementing the envisioned system the following issues arise: (1) Personalized
content, (2) a heterogeneous architecture, and (3) personalized conceptual models.
Implementing a highly generalized system to meet these requirements leads to serious
conceptual problems (schema integration, change management, …) as well as performance problems (comp. XML databases etc.).
Our solution to the problem is to generate the DL system from a user’s conceptual
model and a description of the target platform. This way we can fulfill the requirements of a heterogeneous architecture (by just generating the DL for different sites)
and personalized conceptual models (different DLs reflecting different conceptual
models can co-exist using the same repository). When dealing with personalized con-
Lecture Notes in Computer Science
19
tent in an efficient manner, the generative approach helps by providing a generationtime determined mapping, thus avoiding expensive queries and table lookups a generic system has to carry out.
-> zero overhead
The idea is the following: For such advanced applications you can either implement
a highly generic system based on standard technology like a DB, or you can use a
customizable system like a CMS + WF engine + … + glue code. The situation is comparable to the history of programming languages: They started from machine oriented
ones (assembly languages) which are hard to use but lead to efficient implementations.
These where followed by human-oriented languages (interpreted languages) which
allowed the programmer to keep the code closer to the conceptual model, but were
less efficient to execute. The dilemma was solved by compilation which combines the
advantages and pushes them even further: compiled languages are at least as highlevel as interpreted ones and lead to more efficient implementations than hand-written
machine code (since the compiler can apply optimizations a human being can’t overlook). The advent of virtual machines pushed the concept even further, allowing for
reflection etc. and to access the compilation environment at runtime. Translated to
digital libraries, we have hand-coded ones using databases (a technology which is
closer to the machine than to the programmer in that it does not reflect the conceptual
model) and content management / document storage systems which are close to the
user’s conceptual model, but have a significant runtime overhead (because they need
to mediate between the conceptual model and the data model at runtime). The answer
should be “compilation”, that is generation of a database-based application from a
user-defined conceptual model. In analogy to virtual machines, we need to access
“generation time” information, to translate operations formulated in terms of the conceptual model into operations on the generated data model. The generative approach
will allow (hopefully!) to reduce runtime overhead to a minimum.
4.4 Prototype Experience / Basis / Contribution
WEL system, history:
- Tycoon 2
- Java/CORBA/SQL
- Corem + “BOs”
- FELIDA
Two seminars. Prototypical adoption (e.g., AdVision)
4.4.1 Personalization
see chap. 3, four layers (1-3 implemented)
performance problem with contemporary content management systems (simulating
variants)
notifications
visualization problem:
- owner?
- changed?
Lecture Notes in Computer Science
20
- implicit vs. explicit copy-on-demand?
4.4.2 Specialized Subject Semantics, Reflection
uses indices for user organization
access rights are modeled as descriptions
5 The role of XML and in a Subject-Oriented Working
Environment
In the previous chapters we have tried to develop a scenario for a software system
which supports people in understanding concepts which are considered to be relevant
inside a certain community and in completing the knowledge about these concepts
while working with the system. If the knowledge maintained by such a system has to
be communicated to other people or should even be used together with different systems, the question arises, how to represent the knowledge in a system independent
manner. We expect that the Extensible Markup Language and standards based on this
language could help improving the process of knowledge representation and
knowledge interchange.
Parallel to our work many different groups are working on standards based on
XML for various purposes, e.g. for document description or knowledge representation. But currently these standardization efforts lack a common application scenario,
which could prove that not only one of these isolated standards is useful, but that especially the combination of two or more of them is worth being exploited. Our SOWing system can serve as such a common application scenario which could motivate
these standardization groups to join their forces and to enable a new kind of applications.
5.1 XML as a common means of representing and communicating knowledge
The Warburg Electronic Library enables art historians to
- Collect iconographic evidence for a certain political concept,
- Classify existing artifacts according to the subject terms of the index,
- Define or detect new concepts not covered by the current version of the index.
This results in an incrementally improving understanding of the subject terms, i.e.
the ontology, which makes up the vocabulary of this special research area, here art
history.
Within such a well-defined community vocabularies will not be identical but at
least similar across different research groups. So communication between those
groups should be rather easy. But as soon as cooperations become interdisciplinary,
the problem arises, that people from different communities have to work together.
Those communities may have completely different ways of working, thinking and
speaking about things. Even if different research groups have basically the same background knowledge, their “working sets” of subject terms may differ.
Lecture Notes in Computer Science
21
When cooperation becomes important, standards are needed not only to represent
the knowledge of one user group, but to communicate knowledge to other user groups.
If the two (or more) groups do not share a common vocabulary, additional methods
are needed to either
- Map one group’s subject index onto another one or
- Find a unified common vocabulary for knowledge exchange or
- Transport both the works of one group and those parts of the group’s knowledge
that are necessary to understand the works.
We will take a look at current research activities that could enable members of a
special community to “leave” the scope of their closed community, using the Extensible Markup Language as a “transport layer” for communication. As a side effect such
a standardized representation can serve not only as a tool for data exchange between
different working environments, but as an intermediate representation inside the same
working environment, which may already consist of different applications, too.
The following sections show, in which way different standardization efforts from
the XML community could support a SOWing system and how these standardization
efforts could profit from the SOWing concepts.
5.2 XML and Documents
One application of XML in the scenarios mentioned in the previous sections is its
usage as a platform independent language for the exchange of data. XML can be read
and understood by humans and it can be processed by machines. Processing in this
context means, that machines are able to store the XML data and to perform certain
checks, e.g. whether the XML data is well-formed or valid according to a certain document type definition. With pure XML, as stated in [1], only syntactical interoperability can be achieved. A target machine can decide whether the data is well-formed and
whether it corresponds to a certain document type. If the target machine should reason
about the content of the XML data, domain knowledge has to be hardwired into the
machine which processes the data and this knowledge has to be synchronized with the
knowledge assumed by the author of the data.
Pure XML then can be used to facilitate communication between partners who
share a common knowledge about the nature of the data being exchanged.
5.3 XML and Semantics
Up to a certain degree XML can be used to represent ontologies, but as [2] shows
important ontological relationships (e.g. “subclass-of” or “instance-of”) can not be
modeled directly in a XML document type definition. Another problem is, that XML
offers different ways to model the same relationship. For instance a property of a
certain concept can be modeled as XML element of its own or as an attribute of another element. Supposed a document type description exists, which represents an
ontology, the receiver of a document based on this type description has to “know” that
Lecture Notes in Computer Science
22
this document type represents an ontology and how this ontology can be derived from
the document type definition. So instead of trying to agree on a formalism for a common ontology [1] and [2] suggest to agree on a common formalism for representing
ontologies themselves.
The Resource Description Framework [3] provides primitives which facilitate the
representation of ontologies in a much more natural way compared to (pure) XML.
RDF can be implemented on top of XML and is used as a basic framework for the
definition of common ontology interchange languages like OIL [4]. We expect that
works can be published from inside a SOWing environment using the above standards
in order to simplify the reuse of such works. The understanding for humans outside
the community for which a work was originally published can be improved and even
machine reasoning about the works becomes feasible, as soon as both the SOWing
system and the processing machine stick to the same ontology representation standard.
5.4 XML and Activities
As mentioned in previous sections “work” is not only a product that materializes concepts or ideas in a specific manner, but the process which led to the creation of the
product is part of the work, too. From the observation how a certain product was generated we expect additional insights about facts like
- In which sequence were the production steps carried out ?
- How long did each production step take ?
- What are or what may have been the influences leading to a concrete object production sequence ?
For existing works this information usually is not available or has to be derived
from other sources. But for description works like those carried out by the art historians in our WEL example, data about the production process of such a description
work can be collected almost automatically by the SOW environment. If we remember
then, that an object description work can become a value of its own, which itself can
become a subject to later research, we need not revert to “guessing” what the circumstances of the production may have been, but we are able to know what these circumstances were. So if we consider the data about the production of a work as being as
valuable as the original work itself, its quite natural to apply the same representations
used for communicating knowledge (XML) about works to communicate knowledge
about the production history of works. This uniform handling would assist users in
both understanding works already finished and completing work that is still in progress.
The SOW environment can in this sense document the process which led to a certain document and enhance the understanding of the “work” behind the document. On
the other hand this process description can be used to derive recommendations for
future production processes in the sense of instructions that have to be carried out for
tasks similar to a successfully completed process. So a well-documented production
process can serve as a template for future production processes and contributes in this
way to a standardization of the methods used in the application domain. One possibility to represent such processes can be XML based languages like XRL (eXchangeable
Routing Language) described in [5].
Lecture Notes in Computer Science
23
5.5 SOWing as a process integrating previously disjoint standardization
activities
As we have seen in section 5.2 to 5.4 different standardization efforts exist, whose
results can be used to address some of the representation and communication problems occurring in a SOWing scenario. While SOW can be taken as a certain kind of
work which has to be represented somehow, the XML community delivers recommendations for such representations, which in turn have to be embedded into a concrete application scenario.
A SOWing system can contribute to the vision of these disjoint standardization efforts
to turn the web into a “semantic web”, in which data is not only transported by machines, but also processed in the sense, that machines could reason about the contents
of documents in a more sophisticated manner as this is possible today. Web authors
normally do not necessarily provide data about their documents. So even if systems
are built which can reason about the contents of a document based on such metadata
the system will not be able to reason about documents that do not contain any metadata. Automatic generation of metadata on the other hand requires a certain “initial
knowledge” about the facts that could be useful for other users or machines. But such
a common knowledge which could be used to describe everything in the web can be
hardly defined.
The SOWing environment can be helpful in this scenario, because many metadata
about documents can be collected automatically by the SOWing system. The kind of
data that can be collected is usually restricted to the concrete application domain of
the SOWing environment, but even this restricted kind of metadata is already much
more detailed than the metadata offered by normal web documents. If a web application has no rules for the handling of such specialized metadata it can not make much
use of it. But if a web application has a “basic knowledge” about the domain which
the metadata was generated for, then this application can help humans effectively in
accessing data they are really interested in, which brings us back to the communication and interchange questions of the section 5.1. The SOW environment can deliver
lots of the metadata while the semantic web activities deliver the formalisms to
transport such data.
6 Summary and Outlook
Text …
References
1. Decker, S., van Harmelen, F., Broekstra, J., Erdmann, M., Fensel, D., Horrocks, I., Klein,
M., and Melnik, S.: The semantic web: The roles of xml and rdf. IEEE Expert, 15(3), October 2000
Lecture Notes in Computer Science
24
2. Fensel, D.: Relating Ontology Languages and Web Standards. Modelle und Modellierungssprachen. In Informatik und Wirtschaftsinformatik. Modellierung 2000. Foelbach Verlag.
2000.
3. Lassila, O. and Swick, R. R.: Resource Description Framework (RDF) Model and Syntax
Specification. Recommendation. W3C. February 2000. http://www.w3.org/TR/REC-rdfsyntax/
4. Cover, R. : Ontology Interchange Language. In: The XML Cover Pages. OASIS. March
2001. http://xml.coverpages.org/oil.html
5. van der Aalst, W. M. P., and Kumar, A.: XML Based Schema Definition for Support of .
Inter-organizational Workflow. http://tmitwww.tm.tue.nl/staff/wvdaalst/Workflow/xrl/isr015.pdf
Download