Lecture Notes in Computer Science:

Supporting metadata creation with an ontology built
from an extensible dictionary
Trent Apted, Judy Kay, Andrew Lum
School of Information Technologies,
University of Sydney, NSW 2006
Australia
{tapted, judy, alum}@it.usyd.edu.au
Abstract. This paper describes Metasaur, which supports creation of metadata
about the content of learning objects. The core of Metasaur is a visualisation for
an ontology of the domain. We describe how we build lightweight ontologies
for Metasaur automatically from existing dictionaries and how a user can enhance the ontology with additional terms. We report our use of Metasaur to
mark up a set of audio lecture learning objects for use in a course.
1 Introduction
Metadata tagging is a problem, especially in systems with many existing documents
and a large metadata term vocabulary [1]. The task of annotating existing documents
with metadata is challenging and non-trivial because it is hard to be thorough and
consistent, and the task is both demanding and boring. The task becomes even harder
when the documents might be multimedia objects such as an audio clip.
A reflection of the importance and difficulty of metadata markup is the growing
number of tools which are exploring ways to support the task. For example, one such
tool, Annotea [2], builds on Resource Description Framework (RDF) technologies, providing a framework to allow users to add and retrieve a set of annotations for a web
object from an “annotation server”.
Since it is such a tedious task to add the metadata by hand, there is considerable
appeal in finding ways to automate part of the process. Even in this case, there is likely to be a need for human checking and enhancing of the metadata. We need interfaces
that can support both the checking of metadata which was created automatically as
well as hand-crafting of metadata. We call this the metadata-interface problem.
We believe that ontologies will provide an important tool in allowing people to
create metadata by providing a common vocabulary of terms and relationships for a
domain. Ontologies have an important role in the vision of the Semantic Web [3]. It
makes sense to exploit the technologies and standards developed as part of the Semantic Web initiative. The Web Ontology Language (OWL) [4] aims to provide a standard representation for ontologies in the Semantic Web. Ontologies will play an important role in the task of metadata tagging as they provide a common vocabulary to
describe a particular domain.
However, there are also problems in exploiting ontologies. One of these is that ontologies are often time-consuming to construct [5]. It is therefore appealing to find
ways to create ontologies automatically. The OntoExtract tool described in [6] is an
example of one such system. One problem with such approaches to automated ontology construction is that they may not include all the concepts needed for metadata
markup. We call this the restricted-ontology problem.
Another challenge in exploiting ontologies relates to issues of interfaces. If we are
to exploit ontologies as an aid in the metadata markup, we need to provide intuitive
and effective interfaces to the ontology. These are critical in supporting users in navigating the ontology to find the terms they want and to easily see the closely related
ones that may also deserve consideration as potential metadata candidate terms. The
importance of the ontology-interface problem is reflected in the range of novel ontology visualisation tools such as Ontorama [7], Bubbleworld [8] and the graph drawing
system by Golbeck and Mutton described in [9].
This paper describes a novel interface, Metasaur, which tackles the
metadata-interface problem. It builds on the SIV interface, which we have created as
an exploration of solutions to the ontology-interface problem. In this paper, we describe new approaches to address the restricted-ontology problem by supporting users
in adding new dictionary entries to an existing dictionary which is then automatically
analysed and incorporated into the ontology.
Section 2 provides an overview of Metasaur, and then Section 3 describes the ontology visualisation part of its interface. Section 4 explains the process we use to automatically build the ontology and support additions to it. It is then followed by a
description of the ontology structure and a way to augment the ontology with additional dictionary definitions. We conclude with a discussion of our evaluations and
plans for future work.
2 Metasaur
There are existing systems that allow instructors to add metadata to learning objects
[10] as well as standards for metadata about Learning Objects [11]. These systems
employ an extensive description of the domain that is usually defined by the course
instructors. In contrast, Metasaur uses a lightweight ontology [12] that is automatically
constructed from an existing data source. It also provides a novel visualisation of the
ontology that supports an exploratory approach to discovering appropriate metadata
terms in the domain.
Fig. 1. The Metasaur interface showing the SIV interface on the left, and the slide with associated metadata on the right. The SIV interface currently has the term select-then-operate paradigm in focus, with related terms such as noun-verb paradigm and objects-and-actions design
methodology shown as a secondary focus. Note: this image and subsequent ones have been
rendered in black and white for publication clarity.
The driving application domain for Metasaur is the need to mark up metadata on
learning objects in an online course. The User Interface Design and Programming
course is taught in the February semester at this university. The course is taught
through a combination of online material and face-to-face tutorials. It consists of 20
online lectures that students are expected to attend at times they can choose, but partly
dictated by the assignment deadlines. There are recommended deadlines for attending
each lecture. Each lecture has around 20 slides, and each slide has associated audio by
the author. Generally this audio provides the bulk of the information, with the usual
slide providing a framework and some elements of the lecture. This parallels the way
many lecturers use overhead slides for live lectures. A user profile keeps track of
student marks and lecture progress.
2.1 Interface overview
Figure 1 gives an example of the Metasaur interface. Users can select text in the learning object, such as the word observation, and click on the Search Selected button to
do a word matching search of the terms in the ontology. Results are shown on the
visualisation. Users can then scan through the words and select terms that are appropriate. For example, the term observational study is one term that would be returned
when the search described is executed. This word can be selected in the visualisation,
and a click on the Add Metadata Element button will associate the concept with the
learning object.
There are several core components to Metasaur as shown in Figure 2. The blocks in
the diagram represent objects and interfaces that exist in the system. Of note are the
Existing Dictionary as input to Mecureo, and the Ontology output in OWL format.
Mecureo is discussed in more detail in Section 4.
There are two main parts to the Metasaur interface. The left contains a visualisation
called the Scrutable Inference Viewer (SIV) that allows users to easily navigate
through the ontology structure. The SIV Interface is described in further detail in
section 3. The learning object contents and visualisation control widgets are on the
right. The content of each slide currently consists of the slide itself, an audio object
associated with the slide, and questions related to the concepts.
Users can interact with the interface to create metadata for the learning object. A
mechanism has been designed to allow users to define their own terms to add to the
ontology. These local definitions are merged with the existing dictionary and processed by Mecureo into the ontology graph, which is output in OWL format. A demonstration version of Metasaur is
available online (see footnote 1).
2.2 Ontology visualisation
Scrutable Inference Viewer (SIV) is an evolution of VlUM (for Visualisation of Larger User Models), a tool that can effectively display large user models in web-based
systems. The VlUM interface has been extensively tested with user models consisting
of up to 700 concepts. Users have been able to navigate around the user model and
gain an overview of the concepts inside it [13]. The interface has been modified to
allow us to visualise ontologies.
The concepts in the ontology are displayed in a vertical listing. It utilises perspective distortion to enable users to navigate the ontology. At any point in time, the concept with the largest font is the one currently selected. A subgraph is created encompassing this term and those that are deemed related. Concepts connected directly to
the selected concept are put into a secondary focus, appearing in a larger font size,
spacing and brightness than those further away in the ontology. Similarly, concepts at
lower levels in the tree are shown in progressively smaller fonts, less spacing and
lower brightness. Concepts that are not relevant are bunched together in a small
dimmed font.
Footnote 1: http://www.it.usyd.edu.au/~alum/demos/metasaur_hci/
[Figure 2 components: Existing Dictionary, New Local Definitions, Mecureo, Ontology (OWL), SIV/Jena, SIV, Learning Object, Teacher, Metadata]
Fig. 2. Overview of the Metasaur architecture.
Users can navigate through the ontology by clicking on a concept to select it. The
display changes so that the newly selected concept becomes the focus (see Figure 5
for an example). A slider allows users to limit the spanning tree algorithm to the selected depth. This effectively changes the number of visible terms. In Figure 1, for
example, the main focus is select-then-operate paradigm, some secondary terms are
noun-verb paradigm and objects-and-actions design methodology, and the depth is set
at 2.
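As an illustration only, the depth-limited display described above can be sketched as a breadth-first traversal from the focus concept. This is not the SIV implementation: the function names, the font sizes, and every edge in the toy graph are assumptions made for the example (only the first three term names come from Figure 1).

```python
from collections import deque

# Hypothetical toy ontology as an adjacency list. The first three terms appear
# in Figure 1; every other term and every edge here is invented for illustration.
ONTOLOGY = {
    "select-then-operate paradigm": ["noun-verb paradigm",
                                     "objects-and-actions design methodology"],
    "noun-verb paradigm": ["select-then-operate paradigm", "command language"],
    "objects-and-actions design methodology": ["select-then-operate paradigm"],
    "command language": ["noun-verb paradigm", "syntax"],
    "syntax": ["command language"],
    "focus group": [],
}


def focus_levels(graph, focus, max_depth):
    """Return {concept: distance from focus} for concepts within max_depth links."""
    levels = {focus: 0}
    queue = deque([focus])
    while queue:
        concept = queue.popleft()
        if levels[concept] == max_depth:
            continue  # the slider depth has been reached; do not expand further
        for neighbour in graph.get(concept, ()):
            if neighbour not in levels:
                levels[neighbour] = levels[concept] + 1
                queue.append(neighbour)
    return levels


def font_size(level, largest=24, step=6, smallest=8):
    """Map a distance from the focus to a font size; None means 'not relevant'."""
    if level is None:
        return smallest
    return max(largest - step * level, smallest)


levels = focus_levels(ONTOLOGY, "select-then-operate paradigm", max_depth=2)
for concept in ONTOLOGY:
    print(font_size(levels.get(concept)), concept)
```

Raising or lowering `max_depth` plays the role of the slider: concepts beyond that distance fall back to the smallest font, which mirrors the bunched, dimmed terms in the visualisation.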
We envisaged that the SIV interface would guide the navigation for users adding
metadata. The converse is also true; contents of the slide can be used to guide the
navigation of the ontology. This is achieved through the use of Javascript to allow
users to select text in the slide contents, and clicking Search Selected, allowing rapid
searching of terms in the contents. For example, in Figure 1, a user could select the
text observation and click Search Selected to quickly see all the terms in the ontology
that contain the text string “observation”. This forms a useful starting point for users
to then navigate to other related terms.
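The Search Selected behaviour amounts to a case-insensitive substring match over the ontology terms. A minimal sketch, assuming the terms are available as a plain list of strings (the function name and the sample terms other than observational study are hypothetical):

```python
def search_terms(terms, selected_text):
    """Return the ontology terms that contain the selected text, ignoring case."""
    needle = selected_text.strip().lower()
    if not needle:
        return []  # nothing selected, so nothing to highlight
    return sorted(term for term in terms if needle in term.lower())


# Hypothetical term list; only "observational study" is mentioned in the text.
TERMS = ["observational study", "naturalistic observation", "focus group"]
print(search_terms(TERMS, "observation"))
```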
3 Augmenting the ontology
The process taken by Mecureo to generate a directed graph of the terms in the dictionary involves making each term a node. It then scans through each of the definitions for
terms that are also nodes and generates a link between them. The graph is gradually
built up as more definitions are parsed until there are no more to do. In the Usability
Glossary there exist 1127 defined terms (this includes category definitions).
This means that there will be many words that appear in the definitions that will
not be in the final graph because they are not a term in the dictionary. As an example,
the word novice appears many times in the Usability Glossary (such as in the definition for hunt and peck) but is not a node in the graph because it is not defined as a term in the glossary.
Fig. 3. The user has added the term Novice. Mecureo has automatically created relationships to
other concepts in the ontology.
If a word like novice would be a suitable metadata term, we would like to be able to
enhance the core Mecureo ontology to include it. So we have enhanced Mecureo to
allow a user to create their own pseudo-terms. These are merged with the dictionary
and parsed by Mecureo to create the graph. A pseudo-term need be no more
than a declaration of the word as a term, and does not require a definition of its
own since Mecureo will form links to and from the pseudo-term to existing terms
through their definitions.
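The graph construction and the effect of a pseudo-term can be sketched as follows. This is a simplified illustration, not Mecureo itself: it uses whole-phrase matching only and ignores link types and weights, and the glossary definitions below are invented placeholders rather than text from the Usability Glossary. The names `build_graph` and `linked_from` are hypothetical.

```python
import re


def build_graph(dictionary, pseudo_terms=()):
    """Build {term: set of terms mentioned in its definition} as a directed graph.

    Pseudo-terms are declared with an empty definition, so they gain links only
    through the definitions of existing terms.
    """
    entries = dict(dictionary)
    for term in pseudo_terms:
        entries.setdefault(term, "")
    patterns = {
        term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in entries
    }
    links = {term: set() for term in entries}
    for term, definition in entries.items():
        for other, pattern in patterns.items():
            if other != term and pattern.search(definition):
                links[term].add(other)  # directed link: term -> other
    return links


def linked_from(links, term):
    """Return the terms whose definitions mention the given term."""
    return sorted(source for source, targets in links.items() if term in targets)


# Invented definitions, for illustration only.
GLOSSARY = {
    "hunt and peck": "A slow typing style often seen in novice typists.",
    "shortcuts": "Faster command paths that flatten the learning curve "
                 "for experts but are rarely found by a novice.",
    "learning curve": "The time a user needs to become proficient with a system.",
}

plain = build_graph(GLOSSARY)
augmented = build_graph(GLOSSARY, pseudo_terms=["novice"])
print(linked_from(augmented, "novice"))
```

Without the pseudo-term, the word novice is simply ignored during the scan; once it is declared, the same definitions produce links to it with no further input from the user.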
Figure 3 shows the term novice in the SIV visualisation, with relationships to other
terms such as selection bias and shortcuts generated by the Mecureo parser.
4 Marking up learning objects
Through our own experiences and evaluations we have discovered that the unaugmented Usability Glossary has only a very small overlap with the terms used in the
learning objects of the User Interface Design and Programming course (the course
used fewer than 10% of the terms defined in the dictionary). This poor term coverage is
attributed to two factors.
Firstly, there are cases where we use slightly different terminology. For example,
the term cognitive modeling in the glossary is used in a similar sense to the term predictive usability which is used in the course.
The second problem is that there are some concepts that are considered important
in the course and which are part of several definitions in the Usability First dictionary
but are not included as actual dictionary entries. This is the case for terms such as
novice. We wanted this term as metadata on learning objects which describe usability
techniques such as cognitive walkthrough. It is this problem that the current extensions
to Mecureo particularly address.
We have run a series of evaluations of our approach. One that was intended to assess the usability of the Metasaur interface [14] indicated that users could use it effectively to search the SIV interface for terms that appeared in the text of an online lecture slide. This also gave some insight into the ways that SIV might address the first
problem, that of slightly different terminology.
The participants were asked to select terms that seemed most appropriate to describe a particular slide of the online lecture. The participants were a mix of domain
experts and non-experts. Participants were frustrated when words on the slides did not
appear in the ontology. Domain experts navigated the ontology to find close terms to
those words. Non-experts chose to continue onto the next slide.
The current version of Metasaur addresses this problem by allowing users to define
their own terms. It would clearly be far preferable to tackle this in a more systematic
and disciplined way so that when a new term is added to the metadata set for a learning object, it is also integrated well into the set of terms available for future learning
objects. Our most recent work has explored simple ways to enhance the ontology with
terms used in the course and relate them to the already existing terms in the Usability
Glossary. This way, metadata added by a user who chooses only to add terms that
appear on the slides of the online lecture will still be extremely useful as similar terms
can be inferred from the ontology.
Our implementation involves adding an additional screen to Metasaur that allows
users to define a new term and, if they wish, explicitly state its category, definition and related
words.
Exploration
[#exploration]
Simulate the way users explore and learn about
an {interactive system}.
Related: {cognitive modeling} {learning curve}
Categories: <Usability Methods>
Fig. 4. An entry for the term exploration (declared on line 1). The second line is the URL identifier for the term, followed by the definition and the related (existing) terms in the dictionary
and which categories this term belongs to, respectively.
These are appended to a separate file that gets merged with the Usability Glossary
and parsed by Mecureo. Essentially, this means we are creating pseudo-terms in the
dictionary as described in Section 3.
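To make the entry layout in Figure 4 concrete, the following sketch parses one such entry into its parts. The layout rules are inferred from the figure alone (name, then a bracketed identifier, then definition lines, then optional Related and Categories lines); the actual Mecureo input format may differ, and the `Entry` class and `parse_entry` function are hypothetical names.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Entry:
    name: str
    identifier: str
    definition: str = ""
    related: list = field(default_factory=list)
    categories: list = field(default_factory=list)


def parse_entry(text):
    """Parse a single user-defined entry laid out as in Figure 4."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    entry = Entry(name=lines[0], identifier=lines[1].strip("[]").lstrip("#"))
    definition_lines = []
    for line in lines[2:]:
        if line.startswith("Related:"):
            entry.related = re.findall(r"\{([^}]*)\}", line)   # {existing term}
        elif line.startswith("Categories:"):
            entry.categories = re.findall(r"<([^>]*)>", line)  # <Category Name>
        else:
            definition_lines.append(line)
    entry.definition = " ".join(definition_lines)
    return entry


SAMPLE = """
Exploration
[#exploration]
Simulate the way users explore and learn about
an {interactive system}.
Related: {cognitive modeling} {learning curve}
Categories: <Usability Methods>
"""

entry = parse_entry(SAMPLE)
print(entry.name, entry.related, entry.categories)
```

An entry that omits the definition and the Related line still parses, which corresponds to the bare pseudo-term declaration described in Section 3.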
Ideally, we would like to make the ontology enhancement process as lightweight as
possible. The simplest approach would be to nominate a term,
such as novice, to become treated as a new, additional term in the dictionary so that
Mecureo will then link this term within existing dictionary definitions to other parts of
the ontology. It would be very good if this alone were sufficient. The first column in
Table 1 shows the results of this very lightweight approach for the terms that we wanted to include as metadata for the collection of learning object slides in the lecture on
cognitive walkthrough.
Table 1. Added Term Linkage
Term                  Term name only   Term and Definition   Term, Definition and Related
novice users          2                3                     5
discretionary users   0                1                     3
casual users          0                1                     3
exploration           9                10                    12
usability technique   0                1                     1
testing process       1                2                     4
Each column represents a separate graph that resulted after Mecureo processed a dictionary with the user defined terms added. The graphs were created with the same parameters. Minimum peerage was set to 0 and the link weight distance was set to 100. This
means that all nodes in the graph are included in the OWL output file. More information on Mecureo parameters can be found in [15].
Term name only shows the linkage to the new terms when we only had the term
name and category (in terms of Figure 4, only lines 1, 2 and 7 had the appropriate
values) in the user defined list of terms. With no bootstrapping of definitions or related terms, terms such as novice users and exploration result in a number of relationships in the ontology simply by appearing in the definitions of existing terms. Other
terms such as discretionary users do not occur in any of the definitions, resulting in a
node not connected to any other node in the graph. This is due to differences in terminology between the authors of this dictionary and the authors of the materials used in
our course.
Term and Definition shows the linkage to the new terms when we used the contents
of the online lecture slide that taught this concept as the definition. For the example
term in Figure 4, the term will have had everything present except for the ‘related’
field. This meant that links to other existing terms could be inferred from the words
appearing in the definition.
The Term, Definition and Related column shows the linkage to the new terms when
we use two keywords, in addition to the definition as just described. For example, the
term exploration would appear in the user defined term list as it appears in Figure 4.
Essentially this allowed us to ‘force’ a relationship between our new term and one or
more existing terms in the dictionary. These can be seen in the OWL representation
of exploration as shown in Figure 6. We can see the defined relationships to cognitive
modeling and learning curve as ‘siblings’. The other relationships have come from
parsing the definition we provided for the term, and the definitions in the terms that
Mecureo has chosen to relate to this term.
Bootstrapping the parsing process by giving the pseudo-term some existing related
terms and a short definition minimizes the problem of unconnected nodes and gives more favorable results. For
the longer user defined terms, the lower number of links occurs because of the text
matching level in the parser (which controls things such as case-sensitivity and substring matching). There is an obvious bias towards shorter words. Processing the dictionary with the Term name only user defined dictionary and same parameters but
replacing novice users with the word novice results in novice having 8 directly connected terms.
5 Related Work
There has been considerable interest in tools for ontology construction and metadata
editing. This section briefly discusses some existing tools and contrasts them with our
own research.
In [16], Jannink describes the conversion of the Webster’s dictionary into a graph.
Relationships have a strength value based on their appearance in the definition, similar
to Mecureo. The major difference with our work is that because the dictionary is so
comprehensive the resultant graph contains lexical constructs such as conjunctions
and prepositions as nodes in the graph. There are three types of relationships between
the words in the graph determined by a heuristic that utilizes the strength value. In
contrast, Mecureo determines the relationship type through some simple NLP and
pattern recognition. This means that it tackles a quite different style of ontology with
more generic concepts modelled where we have purposely chosen to focus on specialised dictionaries since they are better suited to the markup of learning objects in a
particular domain. It is not clear whether approaches that are suited to basic generic
concepts should be particularly suited to our more specialised concept sets.
AeroDAML [17] is a tool that automatically marks up documents in the
DAML+OIL ontology language. The amount of automation can be varied to suit the
level of user interaction. Technical users are more likely to use a semi-automated
approach to annotating the metadata, whereas non-technical users might prefer an automatic approach. AeroDAML uses the WordNet upper-level noun hierarchy as the ontology, in contrast to Metasaur’s ontology built from any online dictionary or glossary
source.
The SemTag [18] application does semantic annotation of documents, designed for
large corpora (for example, existing documents on the Web). SemTag stores the semantic annotations on a server separate from the original document as it does not have
permission to add annotations to those files. In contrast, Metasaur has been designed
to be used in an environment where the metadata authors do have access to write to
the existing content. Importantly, the nature of the evaluation of their system is inherently different from our own. They have asked arbitrary users to check and approve
large numbers of semantic links constructed as a means of evaluation. We have taken
a more qualitative approach with the metadata checking being performed by the
teacher who wants to be in complete control of the metadata associated with the learning objects they use in their own course. Perforce, this means that we have done a
much less extensive evaluation but one that sets much more difficult standards.
Another very important element of the current work is that the text available in the
learning objects is just a small part of the learning object. The bulk of the content of
most slides is in the audio ‘lecture’ attached to the text of each slide. If we were to aim
for automated extraction of metadata, that would require analysis of this audio stream,
a task that is extremely challenging with current speech understanding technology. But even beyond this, the accurate markup of the metadata is challenging even
for humans as we found in our earlier evaluations [14] where less expert users made
poor choices of metadata compared with relatively more expert users, who had recently completed this course satisfactorily. This latter group defined metadata that was a
much better match to that created by the lecturer. Indeed, this is one reason that we
believe an interesting avenue to pursue is to enhance the interaction with the learning
objects by asking students to create their own metadata after listening to each of the
learning objects. Checking this against the lecturer's defined metadata should help
identify whether the student appreciated the main issues in that learning object.
The novel aspect of our work is the emphasis on the support for the user to scrutinise parts of the ontology. Users can always refer to the original dictionary source that
the ontology was constructed from since all the relationships are inferred from the
text. Because the dictionary source is online, it is easily available to the users, and
changes to the ontology can be made either through the addition of new terms, or
regenerating the ontology with a different set of parameters for Mecureo.
6 Discussion and conclusions
We have performed a number of evaluations on SIV and Metasaur. Results show that
users were able to navigate the graph and use it as an aid to discover new concepts
when adding metadata. The evaluation of Metasaur described in [14] was on an earlier
version of the system that did not allow users to define their own terms. A larger evaluation is currently being planned that will incorporate the ability to add new concepts
to the ontology.
The Metasaur enhancements that enable users to add additional concepts, with their
own definitions, are important for the teacher or author creating metadata for the learning objects they create and use. An interesting situation arises when users have different definitions for terms: they will be able to merge their definitions with the core
glossary definitions at runtime, resulting in a different ontology for each user. Users
could potentially share their own dictionary or use parts from other users’ dictionaries to create their own ontologies.
There are still some issues with the current design. The differences between UK
and US spellings are not picked up by the parser. There is also the real possibility of
users adding words that do not appear in any of the definitions, which leaves those terms unconnected in the graph.
We believe that Metasaur is a valuable tool for helping users mark up data. For
teaching, it will not only be useful to instructors wishing to add metadata to learning
objects, but also to students who will be able to annotate their own versions of the
slides, providing potential to better model their knowledge for adaptation. The user
defined dictionaries enrich the base ontology resulting in better inferences about the
concepts. In our teaching context this means the metadata will be a higher quality
representation of the learning objects allowing for better user models and adaptation
of the material for users.
We are creating user models from web usage logs and assessment marks of students
doing the User Interface Design and Programming course. The current version of
Metasaur provides an easy way to navigate the domain ontology and to create
metadata. The same terms that are used as metadata will be used as the basic components in the user model. In addition, the user model will optionally include the ontologically related terms, so that a user's knowledge of the
basic terms might be used to infer their knowledge of closely related terms that are not
used in the metadata. The enhancements made to allow users to add terms to the ontology result in a higher quality representation of the concepts taught by the course.
7 Acknowledgements
We thank Hewlett-Packard for supporting the work on Metasaur and SIV.
References
1. Thornely, J. The How of Metadata: Metadata Creation and Standards. In: 13th National
Cataloguing Conference, (1999)
2. Kahan, J., et al. Annotea: An Open RDF Infrastructure for Shared Web Annotations. In:
WWW10 International Conference, (2001)
3. Berners-Lee, T., Hendler, J., and Lassila, O., The Semantic Web, (2001)
4. OWL Web Ontology Language Overview, Available at http://www.w3.org/TR/owl-features/.
(2003)
5. Fensel, D., Ontologies: A Silver Bullet for Knowledge Management and Electronic Commerce: Springer (2001)
6. Reimer, U., et al., Ontology-based Knowledge Management at Work: The Swiss Life Case
Studies, in J. Davis, D. Fensel, and F.v. Harmelen, Editors, Towards the Semantic
Web: Ontology-driven Knowledge Management. 2003, John Wiley & Sons: West
Sussex, England. p. 197-218.
7. Ecklund, P. Visual Displays for Browsing RDF Documents. In: J. Thom and J. Kay, Editors.
Australian Document Computing Symposium, (2002) 101-104
8. Berendonck, C.V. and Jacobs, T. Bubbleworld: A New Visual Information Retrieval Technique. In: T. Pattison and B. Thomas, Editors. Australian Symposium on Information
Visualisation. Australian Computer Society, (2003) 47-56
9. Mutton, P. and Golbeck, J. Visualisation of Semantic Metadata and Ontologies. In: E. Banissi, et al., Editors. Seventh International Conference on Information Visualisation.
IEEE Computer Society, (2003) 306-311
10. Murray, T., Authoring Knowledge Based Tutors: Tools for Content, Instructional Strategy,
Student Model and Interface Design. In Journal of Learning Sciences. Vol. 7(1)
(1998) 5-64
11. Merceron, A. and Yacef, K. A Web-based Tutoring Tool with Mining Facilities to Improve
Learning and Teaching. In: U. Hoppe, F. Verdejo, and J. Kay, Editors. Artificial Intelligence in Education. IOS Press, (2003) 201-208
12. Mizoguchi, R., Ontology-based systemization of functional knowledge. In., (2001)
13. Uther, J., On the Visualisation of Large User Model in Web Based Systems, PhD Thesis,
University of Sydney (2001)
14. Kay, J. and Lum, A., An ontologically enhanced metadata editor, TR 541, University of
Sydney, Sydney (2003)
15. Apted, T. and Kay, J. Generating and Comparing Models within an Ontology. In: J. Thom
and J. Kay, Editors. Australian Document Computing Symposium. School of Information Technologies, University of Sydney, (2002) 63-68
16. Jannink, J. and Wiederhold, G. Thesaurus Entry Extraction from an On-line Dictionary. In:
Fusion '99, (1999)
17. Kogut, P. and Holms, W. AeroDAML: Applying Information Extraction to Generate DAML
Annotations from Web Pages. In: First International Conference on Knowledge Capture (K-CAP 2001) Workshop on Knowledge Markup and Semantic Annotation,
(2001)
18. Dill, S., et al. SemTag and Seeker: Bootstrapping the Semantic Web via Automated Semantic Annotation. In: Twelfth International World Wide Web Conference, (2003)