Recognition of Non-Subject Tags: A Model for Improving Tag Quality by Adding Personal Value

Ame Wongsa
INF 385T
December 2, 2008
Ame Wongsa
INF 385T
Dec 12, 2008
Table of Contents
1. BACKGROUND
2. IMPLICIT DIVISION OF TAGS IN TAGGING SYSTEMS
3. DECREASING TAG EFFICIENCY: A NEED FOR TAG SEPARATION
4. WHY PEOPLE USE NON SUBJECT TAGS
5. THE MODEL
5.1 BENEFIT TO TAGGERS
5.2 BENEFIT TO COMMUNITY
6. SUGGESTED TAGGING INTERFACE
7. SYSTEM AND VIEW CHANGES
7.1 USER VIEW
7.2 TAG VIEW
7.3 URL VIEW
8. MANUAL TAG REFACTORING
9. FUTURE WORK AND POTENTIAL USE
REFERENCES:
1. Background
Collaborative tagging systems are often defined by complementary personal and
public aspects. “Individuals employ personal vocabulary to describe personal objects.
Organically through the efforts of many diverse users, a global language is developed that
is used to describe the global set of objects” (Chi 2008). Through the use of a personal
vocabulary in the form of “tags”, individuals organize and categorize web pages for later
retrieval. On the community level, the rich descriptions provided by individual tags
form a “folk taxonomy” or “folksonomy”, a term coined by Thomas Vander Wal, that is
used to classify resources on the web.
The popularity of tagging systems on the web such as Delicious for web pages,
Flickr for photos, and Connotea and CiteULike for academic papers can be attributed to a
low barrier of entry. Sinha describes how tagging “taps into an existing cognitive process
without adding much cognitive cost” (2005). Tagging systems allow anyone to
freely attach keywords to content without knowledge of a classification system or a
controlled vocabulary, yet through aggregate use they allow precise information retrieval.
Tagging is most useful “when there is simply too much content for a single authority to
classify…”, which is true of the web (Golder & Huberman 2006).
While collaborative tagging may be better suited for classifying web content
compared to other traditional taxonomies, it does face certain limitations. Vocabulary
control is an issue for any classification system but becomes more of a problem for
tagging systems that depend on users’ varying vocabularies. Other areas of tagging such
as tag formatting, tag clustering, query expansion and disambiguation stand to improve
(Peters & Weller 2008). As tagging systems mature, information theory analysis has
shown waning tag efficiency and the reduced usefulness of tags for information retrieval
(Chi 2008).
Much of the current research in this area deals with developing
algorithms to restructure folksonomies automatically. Some of these methods include
automatic tag recommendation, natural language processing for query expansion, and
semantic navigation of tags. Peters and Weller suggest “tag gardening”, a “manual activity,
performed by the users to manage folksonomies and gain better retrieval results, which
can be supported by certain automatic processes” (2008). While they present many
reasons for “tag gardening” and list many areas of tagging systems that need tending by
users, they do not present multiple models for accomplishing this task.
This paper presents a model for “tag gardening” and explores its use in the social
bookmarking system Delicious. We explore the idea of a tagging system that recognizes
subject tags and non subject tags separately. This model increases the value of tagging for personal
organization, which in turn provides the foundation for improving the quality of tags by
algorithmic methods.
2. Implicit Division of Tags in Tagging Systems
In studies of tagging systems, many researchers employ informal divisions of tags
into public and private domains. Golder and Huberman describe seven functions tags
perform for bookmarks and separate them into information extrinsic to the tagger and
information relative to or only relevant to the tagger (2006). Kipp found that 16% of tags
on Delicious are non subject tags (2006) and, in a later study, grouped these non-subject
tags into affective tags, which describe an emotional state, and time and task related
tags. Neither group describes the aboutness of a resource, and both have been excluded from
traditional classification systems (2006b). Peters and Weller acknowledge the need for
distinguishing different tag qualities and purposes and suggest “distinguishing different
tag qualities using forms of facets, categories or fields” (2008). Guy and Tonkin posit that
“the real problem with folksonomies is not in their chaotic tags but that they are trying to
serve two masters at once; the personal collection and the collective collection”. Their
proposed solution to this problem mirrors the current development of an appropriate set of
algorithms to “revisit data with another aim” and reveal the usefulness of certain “sloppy” tags
(2006).
Interestingly, even Delicious’s tagging bookmarklet makes the distinction
between “most popular tags”, tags that describe the aboutness of the document and “your
tags”, previously used tags that help add the document to the tagger’s information space.
While this distinction helps taggers brainstorm indexing terms, the selections made from
these two groups are added to the flat namespace through the single tag box, which
arguably defeats the purpose of separating the two contexts in the first place. If Delicious
were to keep the distinction between social and personal tags in its tagging system, could
it add new dimensions to the system that both individuals and the community can use
to their advantage?
Examining the pages of Delicious power users may help answer that
question. Delicious users develop different methods to deal with tag entropy. Some
Delicious power users employ hierarchical tags such as “web/programming/flash”.
Others use conventions such as @Name and to+Verb to add context to their bookmarks
(Wagner 2005). Francoeur uses multiple Delicious accounts to organize his bookmarks in
different contexts: one for sharing links with his wife, one for indexing his blog and one
for professional purposes (2008). People already try to hack the flat namespace to add
structure to their tags or separate their tags by context. Considering that users already find
such features useful, allowing easier separation of tags would add personal value to
existing tagging systems. Instead of using conventions that machine algorithms do not
understand, taggers could use an explicit organization feature. Creating a dual tagging
system presents a new avenue for studying personal organization systems and the resulting
changes in tagging behavior.
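The conventions described above are regular enough that a system could attempt to detect them mechanically. The following Python sketch is a hypothetical heuristic and not a feature of Delicious: it treats tags that begin with “@”, or that consist of “to” followed by a verb from a small assumed word list, as non subject tags, and it splits hierarchical tags on “/”. The function names and the verb list are illustrative assumptions.

```python
# Hypothetical heuristic for spotting tagger conventions in a flat namespace.
# The verb list and all function names are illustrative assumptions.
TASK_VERBS = {"read", "do", "buy", "watch", "blog", "print"}


def is_non_subject(tag):
    """Guess whether a tag follows an @Name or to+Verb convention."""
    tag = tag.lower()
    if tag.startswith("@"):
        return True
    return tag.startswith("to") and tag[2:] in TASK_VERBS


def split_hierarchy(tag):
    """Split a hierarchical tag such as 'web/programming/flash' into levels."""
    return [level for level in tag.split("/") if level]


def separate(tags):
    """Return (subject_tags, non_subject_tags) for one bookmark."""
    subject = [t for t in tags if not is_non_subject(t)]
    non_subject = [t for t in tags if is_non_subject(t)]
    return subject, non_subject
```

A heuristic of this kind can only guess at intent (a tag such as “tools” is correctly left alone here, but other words will be misjudged), which is one reason an explicit separation made by the tagger is preferable.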
3. Decreasing Tag Efficiency: A Need for Tag Separation
Perhaps the most important reason for separating personal and indexing tags
rests in the findings of recent tagging analyses: the most popular tags are too general
to render precise and useful retrieval results (Paolillo & Penumarthy 2007; Kipp 2006;
Peters & Weller 2008). Through information theory analysis, Chi discovered that “the
specificity of any given tag is decreasing” and concludes that we are moving closer to the
proverbial “needle in a haystack” situation in which “any single tag references too many
documents to be considered useful” (2008). If algorithms for improving social tags
cannot depend on popular tags to provide descriptive data, the ability to distinguish
tags with personal meaning from tags with specialized meaning will become more crucial. “Tags that are generally
meaningful [are] likely to be used by many taggers, while tags with personal or
specialized meaning will likely be used by fewer users” (Golder & Huberman 2006).
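One way to make the notion of tag specificity concrete is the conditional entropy of documents given tags, H(D|T): the lower the value, the more a tag narrows down which document is meant. The sketch below is an illustration under that assumption and is not claimed to reproduce the exact computation in Chi (2008). The function name and the input format, a list of (document, tag) assignments, are hypothetical.

```python
import math
from collections import Counter


def conditional_entropy(pairs):
    """H(D|T) in bits for a list of (document, tag) assignments."""
    joint = Counter(pairs)                       # count of each (doc, tag) pair
    tag_totals = Counter(tag for _, tag in pairs)
    n = len(pairs)
    h = 0.0
    for (doc, tag), count in joint.items():
        p_joint = count / n                      # p(d, t)
        p_doc_given_tag = count / tag_totals[tag]  # p(d | t)
        h -= p_joint * math.log2(p_doc_given_tag)
    return h
```

When every tag points to exactly one document the result is 0 bits; when one tag is spread evenly over two documents the result is 1 bit, reflecting the extra uncertainty a general tag leaves behind.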
Peters and Weller suggest “using an inverse tag cloud (showing rarely used tags
in bigger font sizes)” to display rarely used tags and provide an additional access point to
counter the decreasing usefulness of popular tags (2008). In this case, personal tags would create
noise if they were not separated from specialized tags. Chi proposes that tagging sites ask
taggers to describe the document by terms not on the popular list (2008). Again, this
approach would work better if taggers were able to contribute to the vocabulary
independently from adding personal tags to the document.
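The inverse tag cloud idea can be sketched in a few lines of Python. This is a minimal illustration, assuming a linear mapping from tag frequency to font size and a known set of personal tags to leave out; the function name, parameters, and size range are hypothetical choices rather than anything specified by Peters and Weller.

```python
def inverse_cloud_sizes(tag_counts, personal_tags=(), min_size=10, max_size=32):
    """Map each non-personal tag to a font size; rarer tags get larger sizes."""
    visible = {t: c for t, c in tag_counts.items() if t not in personal_tags}
    if not visible:
        return {}
    low, high = min(visible.values()), max(visible.values())
    if low == high:
        # All tags are equally frequent, so give them the same size.
        return {t: max_size for t in visible}
    span = high - low
    return {
        t: round(max_size - (c - low) / span * (max_size - min_size))
        for t, c in visible.items()
    }
```

Without the `personal_tags` filter, a rarely used personal tag such as “toread” would be displayed as prominently as a rare but descriptive subject tag, which is the noise problem described above.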
4. Why People Use Non Subject Tags
A good question to ask is how non subject tags, such as the affective tags and time
and task related tags mentioned earlier, relate to information classification and retrieval.
In other words, why are people using these types of tags on Delicious? A study by Kipp
and Campbell found that over 16% of a sample of Delicious tags were non
subject related (2006). Furthermore, research in personal information management has
found that people classify documents with contextual information to enhance findability
(Kwasnik 1991). People also find it easier to locate things by physical location than by
classification (Malone 1983). In addition, people find it easier to find things by
recognition than by searching or relying on memory. Overall, users may see classification as
“a holistic process closely tied to themselves and their work”, a point of interest to “all
who design classification systems to aid users in location of information” (Kipp 2006). As
a result, a system that allows users to take advantage of the way they naturally relate to
information will be able to enhance its metadata for both personal and social use. Given
the chance and a better interface, it is likely that users will use personal tags to increase
the findability of their information and separate those from subject related tags that only help
others. People already try to separate contextual tags from subject tags. Scanning the tags
of Delicious power users (and the Delicious development team) sorted alphabetically will
show that most of them already have employed conventions to keep these tags at the top
of their list.
5. The Model
To comply with a tagging system’s flat namespace, the separation of tags can be
accomplished by tagging of tags. Tags are internally tagged as subject or non subject.
Instead of having two tagging systems, the service should allow humans and machines to
analyze both types of tags together and separately. The separation of these two types
may be most useful at the interface level where users interact with the information. Users
should be able to adopt the new dimension if they find it useful. People with few bookmarks
may not need a more complex organization system. All tags start off as subject tags and
then users can specify certain tags as non subject tags as they see fit. Overall the model for
dual tagging is flexible and should not require a complete overhaul of a tagging system
that adopts this model.
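As a rough illustration of “tagging of tags”, the following Python sketch keeps a single flat namespace per user and records only which tags the user has marked as non subject. It is a minimal sketch under stated assumptions: the class name, method names, and in-memory storage are hypothetical and do not describe how Delicious stores its data.

```python
from dataclasses import dataclass, field


@dataclass
class TagStore:
    """One user's flat tag namespace with a subject / non subject marker."""
    bookmarks: dict = field(default_factory=dict)  # url -> set of tags
    non_subject: set = field(default_factory=set)  # tags the user has marked

    def tag(self, url, *tags):
        # No extra step when tagging: any unmarked tag counts as a subject tag.
        self.bookmarks.setdefault(url, set()).update(tags)

    def mark_non_subject(self, tag):
        self.non_subject.add(tag)

    def tags_for(self, url, kind="all"):
        tags = self.bookmarks.get(url, set())
        if kind == "subject":
            return tags - self.non_subject
        if kind == "non_subject":
            return tags & self.non_subject
        return set(tags)
```

Because the marker lives beside the tags rather than inside them, users who never mark anything see the same behavior as before, and the two types can still be analyzed together or separately.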
5.1 Benefit to Taggers
The system will allow users to tidy up their tags in a way that adds meaning to
their bookmark organization and allows searching for tags by terms in their own vocabulary
and memory. Users will be more likely to consolidate on one bookmarking system and to reduce their use
of contextual conventions and of multiple accounts for organizing social and
personal bookmarks.
5.2 Benefit to Community
The added personal value becomes an incentive for users to organize their
bookmarks better, cleaning up tags for themselves and, as a result, for the community,
much in the same way that social bookmarking allows personal information retrieval and
social information discovery. Users can choose to view or ignore the non subject tags of
other taggers, reducing the visual noise in tag clouds or tag related points of entry. Lower
frequency tags become more meaningful and useful in algorithmic enhancement to tag
based information retrieval.
6. Suggested Tagging Interface
Since the success of tagging has been attributed to its low barrier of entry and a
“personal value [that] precedes network value” (Porter 2006), the user interface can be
crucial to the success of this modified tagging model. Care should be given to the
interface to prevent confusion and needless additions to the tagger’s cognitive cost. While
user testing would be needed to determine the best interface for this new tagging model,
Figure 1 presents a suggested tagging interface.
Figure 1 Modified tagging Interface to allow addition of non subject tags that can be organized into folders
Figure 1 presents a tagging interface with personal tags as optional to reduce any
additional cognitive cost. Personal tags are automatically derived from the tagging
syntax. Because a single tag box that serves two purposes may not be intuitive to users, a
selector box is used to add personal tags. This option is labeled as a “list” to help users
understand that the following sets of labels are personal (physical) and belong to the
user. Once the user checks “Add to List”, a selection box appears allowing the user to
select a list. The default setting hides the list selector to indicate that the function is
optional. Once the user selects a list, the notation for adding a non subject tag appears in the
tag box and helps teach users how to quickly add these types of tags in the future.
Although a better user interface can be determined upon user testing, the point is to allow
the separation of indexing labels and personal labels.
7. System and View Changes
Delicious is based on three major axes: users, tags, and URLs, which are reflected in the
URL design used by the site (e.g. http://del.icio.us/mattb, http://del.icio.us/tag/xml, and
http://del.icio.us/url/8b7fe…) (Biddulph 2004). I explain changes to the user interface
according to these three major access points in the following sections:
Figure 2 Modified User View of Delicious. A non subject section is added to the sidebar. Subject and non subject
tags appear in different colors in the bookmark listing.
7.1 User View
The main change to the Delicious user page is the addition of the non subject tags
in their own view in the sidebar. Users can choose whether to view either set of tags as a
cloud or as a sorted list. Users are able to employ a hierarchical organization system
alongside archiving resources for later retrieval. The subject tag cloud may better represent
the interests of the user. The non subject tags may be shared or kept private by the user.
Either way they present an interesting view of the user based more on current projects or
interests. For quick identification, the two types of tags can be displayed in different colors
on the bookmark listing (see Figure 2).
7.2 Tag View
In the tag view the dual tagging system could clean up related tags sections that
may contain non subject tags. Personal and subject tags shown in different colors on the
entries allow people to scan for relevant information since the non subject tags are
usually only relevant to the tagger.
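The cleanup of a related tags section can be illustrated with a simple co-occurrence count that skips non subject tags. This is a sketch under stated assumptions: related tags are taken to be the tags that most often appear on the same bookmarks as the target tag, and the function name and parameters are hypothetical.

```python
from collections import Counter


def related_tags(bookmarks, target, non_subject=frozenset(), limit=5):
    """Rank tags that co-occur with `target`, skipping non subject tags.

    `bookmarks` is an iterable of tag sets, one per saved bookmark.
    """
    counts = Counter()
    for tags in bookmarks:
        if target in tags:
            counts.update(
                t for t in tags if t != target and t not in non_subject
            )
    return [tag for tag, _ in counts.most_common(limit)]
```

For the tag “xml”, a frequently co-occurring personal tag such as “toread” would otherwise compete with genuinely related subject tags for a place in the list.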
7.3 URL View
In the URL View the separation of subject and non subject tags allows people to
quickly tell what the URL is about and how people use the site.
8. Manual Tag Refactoring
Given that the dual-tagging model adds focus to personal tags, better tag
refactoring will be crucial to the model. Non subject tags have been excluded
from traditional classification systems due to their potentially temporary or task specific
nature. Promoting them in a tagging system would require more robust tag management
tools for editing, renaming, and deleting tags as context changes. Currently Delicious only
allows management of single tags and the creation of tag bundles. Peters and Weller
determined that improvement to folksonomies is required at multiple levels: at the
document collection vs. single document level, the personal vs. collaborative level and the
intra- and cross-platform level (2008). In addition to these areas, the dual tagging model
would need editing features standard to desktop organization systems. Users should be
able to easily select multiple documents to tag and untag. Conversely, users should be
able to select multiple tags to label one or more documents.
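The refactoring operations named in this section (renaming, deleting, and tagging or untagging several documents at once) can be sketched as plain functions over a mapping from URL to tag set. This is a minimal illustration; the function names and the in-memory dictionary are assumptions and not part of any existing Delicious tool.

```python
def rename_tag(bookmarks, old, new):
    """Replace `old` with `new` on every bookmark that carries it."""
    for tags in bookmarks.values():
        if old in tags:
            tags.discard(old)
            tags.add(new)


def delete_tag(bookmarks, tag):
    """Remove `tag` from every bookmark."""
    for tags in bookmarks.values():
        tags.discard(tag)


def bulk_tag(bookmarks, urls, tags):
    """Apply several tags to several documents in one step."""
    for url in urls:
        bookmarks.setdefault(url, set()).update(tags)


def bulk_untag(bookmarks, urls, tags):
    """Remove several tags from several documents in one step."""
    for url in urls:
        bookmarks.get(url, set()).difference_update(tags)
```

Operations like these matter most for non subject tags, since a task tag that was useful for one project may need to be renamed or retired once the context changes.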
9. Future Work and Potential Use
This paper proposes adding a dimension to the collaborative tagging model.
While the potential benefits to taggers and the community have been outlined, a prototype
needs to be created and implemented to determine the usefulness of the system. An
exploratory case study investigating this dual tagging system should answer the following
research questions:
1. Feasibility: Would users assign non subject tags and would they be meaningful
and maintained?
2. Accuracy: Does the removal of non subject tags allow for better information
retrieval?
3. Utility: Can the modified tagging system improve algorithm-based tag
refinement?
4. Meaning: Are the non subject tag graphs and visualizations meaningful?
If the addition of non subject tagging to the tagging paradigm proves to be useful,
it can be expanded and used in other paradigms such as Vander Wal’s Model of
Attraction. Vander Wal describes the Model of Attraction as a framework for describing
how users interact with information, with a focus on outlining users’ needs and desires in
the information life cycle. As the model’s name suggests, attraction is the key to users’
relationship with information: users draw information closer and are attracted to terms in
the presentation layer creating a two-way attraction (Vander Wal 2004). Similarly, a
well-developed, hierarchical template of our interests, such as our bookmarking folder
system, can serve as a filter to draw in information. We can also use the hierarchical
structure to browse the information that fits the characteristics of our personal
information cloud. Imagine if we were able to browse someone else’s bookmarks in the
way we organize our own resources. If we browse the resources of someone who uses a
similar vocabulary, we could traverse a familiar taxonomy multiple levels deep and make
discoveries at each turn. Furthermore, what if we could get updates for our whole system of
knowledge without creating a new RSS feed for each new interest?
A personal information template also fits into Vander Wal’s discussion of
“Attraction Receptors”. The following points show how the two models are compatible:
1. Intellectual: “Classification systems are based…on the cognitive attraction
terms based on the user definition of those terms.”
2. Perceptual: “[U]ser has preconceived ideas of the… visual and auditory
presentation form [and] style.”
3. Mechanical: Aggregators draw information that matches certain criteria.
4. Physical: “Users are continually trying to attract and keep desired information
closer to themselves.” “Users set parameters of attraction for the information.”
“Users prefer to have information in formats that work easily with their
receptors” (Vander Wal 2004).
10. Conclusion
The problem with traditional organization systems is that information is organized
statically. We overcame this problem with search and tagging systems, but we lost the
semantics provided by complex taxonomic systems as well as our ability to grok
information important to us, most of which now lives on our computers or on the web. We
may begin by separating our resources according to subject and context to better
understand our own information space. Doing so could allow us to start building a
dynamic template for better information understanding and discovery.
References:
Biddulph, M. (2004). Introducing del.icio.us. O’Reilly XML.com. Retrieved December 1,
2008 from http://www.xml.com/pub/a/2004/11/10/delicious.html
Chi, E. H., & Mytkowicz, T. (2008). Understanding the Efficiency of Social Tagging
Systems using Information Theory. Proceedings of the Nineteenth ACM Conference on
Hypertext and Hypermedia, 81-88.
Francoeur, S. (2008). Improving tagging in del.icio.us. Digital Reference. Retrieved
December 1, 2008 from http://www.teachinglibrarian.org/weblog/2008/07/improvingtagging-in-delicious.html#links
Golder, S., & Huberman, B. A. (2006). The Structure of Collaborative Tagging
Systems. Information Dynamics Labs, HP Labs, Palo Alto, USA.
Guy, M., & Tonkin, E. (2006). Tidying up Tags? D-Lib Magazine, 12(1).
Kipp, M.E.I. (2006). @toread and cool: Tagging for time, task and emotion. 17th
ASIS&T SIG/CR Classification Research Workshop. Abstracts of Posters, pp. 16-17.
Kipp, M. E. I. & Campbell, D. G. (2006). Patterns and Inconsistencies in Collaborative
Tagging Practices: An Examination of Tagging Practices. Proceedings of the Annual
General Meeting of the American Society for Information Science and Technology.
Austin, TX, November 3-8, 2006.
Kwasnik, B. H. (1991). The Importance of Factors That Are Not Document Attributes in
the Organisation of Personal Documents. Journal of Documentation 47(4), 389-398.
Malone, T. W. (1983). How Do People Organize Their Desks? Implications for the
Design of Office Information Systems. ACM Transactions on Office Information Systems
1(1), 99-112.
Paolillo, J., & Penumarthy, S. (2007). The social structure of tagging Internet video on
Del.icio.us. Proceedings of the 40th Hawaii International Conference on System Science.
Peters, I., & Weller, K. (2008). Tag Gardening for Folksonomy Enrichment and Maintenance.
Webology, 5(3), Article 58.
Porter, J. (2006). The Del.icio.us Lesson. Bokardo.com, retrieved December 1, 2008 from
http://bokardo.com/archives/the-delicious-lesson/
Shirky, C. (2006). Ontology is overrated: Categories, links and tags. Clay Shirky’s
Writings about the internet. Retrieved December 1, 2008, from
http://www.shirky.com/writings/ontology_overrated.html
Sinha, R. (2005). A cognitive analysis of tagging, Rashmi’s blog, retrieved December 1,
2008, from http://rashmisinha.com/2005/09/27/a-cognitive-analysis-of-tagging/
Vander Wal, T. (2004). Understanding the Personal Info Cloud: Using the Model of
Attraction. Vanderwal.net. Presentation, retrieved December 1, 2008 from
http://www.vanderwal.net/essays/moa/040608/index.php
Wagner, O. (2005). Themenmonat Tagging: 5. Best Practice. Oliver Wagners
agenturblog, retrieved December 1, 2008 from http://www.agenturblog.de/200511/themenmonat-tagging-5-best-practice/