Recognition of Non Subject Tags: A Model for Improving Tag Quality by Adding Personal Value

Ame Wongsa
INF 385T
Fall 08
December 2, 2008

Ame Wongsa INF 385T Dec 12, 2008

Table of Contents
1. BACKGROUND
2. IMPLICIT DIVISION OF TAGS IN TAGGING SYSTEMS
3. DECREASING TAG EFFICIENCY: A NEED FOR TAG SEPARATION
4. WHY PEOPLE USE NON SUBJECT TAGS
5. THE MODEL
5.1 BENEFIT TO TAGGERS
5.2 BENEFIT TO COMMUNITY
6. SUGGESTED TAGGING INTERFACE
7. SYSTEM AND VIEW CHANGES
7.1 USER VIEW
7.2 TAG VIEW
7.3 URL VIEW
8. MANUAL TAG REFACTORING
9. FUTURE WORK AND POTENTIAL USE
REFERENCES

1. Background

Collaborative tagging systems are often defined by complementary personal and public aspects. “Individuals employ personal vocabulary to describe personal objects. Organically through the efforts of many diverse users, a global language is developed that is used to describe the global set of objects” (Chi 2008). Through the use of a personal vocabulary in the form of “tags”, individuals organize and categorize web pages for later retrieval. On the community level, the rich descriptions provided by individual tags form a “folk taxonomy” or “folksonomy” (a term defined by Thomas Vander Wal) that classifies resources on the web.

The popularity of tagging systems on the web, such as Delicious for web pages, Flickr for photos, and Connotea and CiteULike for academic papers, can be attributed to a low barrier of entry. Sinha describes how tagging “taps into an existing cognitive process without adding much cognitive cost” (2005). Tagging systems allow anyone to freely attach keywords to content without knowledge of a classification system or a controlled vocabulary, yet through aggregate use they allow precise information retrieval. Tagging is most useful “when there is simply too much content for a single authority to classify…”, which is true of the web (Golder & Huberman 2006).
While collaborative tagging may be better suited for classifying web content than traditional taxonomies, it does face certain limitations. Vocabulary control is an issue for any classification system but becomes more of a problem for tagging systems that depend on users' varying vocabularies. Other areas of tagging such as tag formatting, tag clustering, query expansion and disambiguation stand to improve (Peters & Weller 2008). As tagging systems mature, information theory analysis has shown waning tag efficiency and reduced usefulness of tags for information retrieval (Chi 2008).

Much research in this area currently deals with developing algorithms to restructure folksonomies automatically. Some of these methods include automatic tag recommendation, natural language processing for query expansion, and semantic navigation of tags. Peters and Weller suggest “tag gardening”, a “manual activity, performed by the users to manage folksonomies and gain better retrieval results, which can be supported by certain automatic processes” (2008). While they present many reasons for “tag gardening” and list many areas of tagging systems that need tending by users, they do not present multiple models for accomplishing this task.

This paper presents a model for “tag gardening” and explores its use in the social bookmarking system Delicious. We explore the idea of a tagging system that recognizes subject tags and non subject tags separately. This model increases the value for personal organization, which in turn provides the foundation for improving the quality of tags by algorithmic methods.

2. Implicit Division of Tags in Tagging Systems

In studies of tagging systems, many researchers employ informal divisions of tags into public and private domains.
Golder and Huberman describe seven functions tags perform for bookmarks and separate them into information extrinsic to the tagger and information relative to or only relevant to the tagger (2006). Kipp found that 16% of tags on Delicious are non subject tags (2006) and, in a later study, grouped these non subject tags into affective tags (tags that describe an emotional state) and time and task related tags, neither of which describes the aboutness of a resource and both of which have been excluded from traditional classification systems (2006b). Peters and Weller acknowledge the need for distinguishing different tag qualities and purposes and suggest “distinguishing different tag qualities using forms of facets, categories or fields” (2008). Guy and Tonkin posit “the real problem with folksonomies is not in their chaotic tags but that they are trying to serve two masters at once; the personal collection and the collective collection”. Their proposed solution for this problem mirrors the current development of an appropriate set of algorithms to “revisit data with another aim” to reveal usefulness in certain “sloppy” tags (2006).

Interestingly, even Delicious’s tagging bookmarklet makes the distinction between “most popular tags”, tags that describe the aboutness of the document, and “your tags”, previously used tags that help add the document to the tagger’s information space. While this distinction helps taggers brainstorm indexing terms, the selections made from these two groups are added to the flat namespace through the single tag box, which arguably defeats the purpose of separating the two contexts in the first place. If Delicious were to keep the distinction between social and personal tags in its tagging system, could it add new dimensions that both individuals and the community can use to their advantage? Examining the Delicious pages of power users may help answer that question.
Delicious users develop different methods to deal with tag entropy. Some Delicious power users employ hierarchical tags such as “web/programming/flash”. Others use conventions such as @Name and to+Verb to add context to their bookmarks (Wagner 2005). Francoeur uses multiple Delicious accounts to organize his bookmarks in different contexts: one for sharing links with his wife, one for indexing his blog and one for professional purposes (2008). People already try to hack the flat namespace to add structure to their tags or to separate their tags by context. Considering that users already find such features useful, allowing easier separation of tags would add personal value to existing tagging systems. Instead of using conventions that machine algorithms do not understand, taggers could use an explicit organization feature. Creating a dual tagging system also presents a new avenue for studying personal organization systems and the resulting changes in tagging behavior.

3. Decreasing Tag Efficiency: A Need for Tag Separation

Perhaps the most important reason for separating personal and indexing tags rests in the findings of recent tagging analyses: the most popular tags are too general to render precise and useful retrieval results (Paolillo & Penumarthy 2007; Kipp 2006; Peters & Weller 2008). Through information theory analysis, Chi discovered that “the specificity of any given tag is decreasing” and concludes that we are moving closer to the proverbial “needle in a haystack” in which “any single tag references too many documents to be considered useful” (2008). If algorithms for improving social tags cannot depend on popular tags to provide descriptive data, the ability to distinguish personal tags from tags with specialized meaning will become more crucial. “Tags that are generally meaningful [are] likely to be used by many taggers, while tags with personal or specialized meaning will likely be used by fewer users” (Golder & Huberman 2006).
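The Golder and Huberman observation quoted above suggests a simple signal that an algorithm could compute: the number of distinct taggers who use each tag. The following is only a minimal sketch, assuming that tag assignments are available as (user, tag) pairs. The function names, the sample data and the `max_taggers` threshold are hypothetical illustrations and are not part of Delicious or of any cited study.

```python
from collections import defaultdict

def tagger_counts(assignments):
    """Map each tag to the number of distinct users who applied it."""
    users_by_tag = defaultdict(set)
    for user, tag in assignments:
        # Lowercasing is an assumed normalization step.
        users_by_tag[tag.lower()].add(user)
    return {tag: len(users) for tag, users in users_by_tag.items()}

def rare_tags(assignments, max_taggers=1):
    """Return tags used by at most max_taggers distinct users (hypothetical cutoff)."""
    counts = tagger_counts(assignments)
    return sorted(tag for tag, n in counts.items() if n <= max_taggers)

# Hypothetical sample data: (user, tag) pairs.
sample = [
    ("ann", "design"), ("bob", "design"), ("cat", "design"),
    ("ann", "toread"), ("bob", "@wife"),
    ("cat", "flash"), ("cat", "flash"),  # repeated use by one user counts once
]

print(rare_tags(sample))  # ['@wife', 'flash', 'toread']
```

In this sample the rarely used tags mix personal tags (“toread”, “@wife”) with a specialized subject tag (“flash”). A tagger count alone cannot tell the two apart, which is the gap that an explicit separation of tag types is meant to close.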
Peters and Weller suggest “using an inverse tag cloud (showing rarely used tags in bigger font sizes)” to display rarely used tags and provide an additional access point to counter the decreasing usefulness of popular tags. In this case, personal tags would create noise if they are not separated from specialized tags. Chi proposes that tagging sites ask taggers to describe the document by terms not on the popular list (2008). Again, this approach would work better if taggers were able to contribute to the vocabulary independently from adding personal tags to the document.

4. Why People Use Non Subject Tags

A good question to ask is how non subject tags, such as the affective tags and time and task related tags mentioned earlier, relate to information classification and retrieval. In other words, why are people using these types of tags on Delicious? A study by Kipp and Campbell found that over 16% of a sample of Delicious tags were non subject related (2006). Furthermore, research in personal information management has found that people classify documents with contextual information to enhance findability (Kwasnik 1991). People also find it easier to locate things by physical location than by classification (Malone 1983). In addition, people find it easier to find things by recognition than by searching or relying on memory. Overall, users may see classification as “a holistic process closely tied to themselves and their work”, which is of interest to “all who design classification systems to aid users in location of information” (Kipp 2006). As a result, a system that allows users to take advantage of the way they naturally relate to information will be able to enhance its metadata for both personal and social use. Given the chance and a better interface, it is likely that users will use personal tags to increase the findability of their information and separate those from subject related tags that help others.
People already try to separate contextual tags from subject tags. Scanning the alphabetically sorted tags of Delicious power users (and the Delicious development team) shows that most of them already employ conventions to keep these tags at the top of their list.

5. The Model

To comply with a tagging system’s flat namespace, the separation of tags can be accomplished by the tagging of tags. Tags are internally tagged as subject or non subject. Instead of having two tagging systems, the service should allow humans and machines to analyze both types of tags together and separately. The separation of these two types may be most useful at the interface level, where users interact with the information. Users should be able to adopt the new dimension if they find it useful. People with few bookmarks may not need a more complex organization system. All tags start off as subject tags, and users can then specify certain tags as non subject tags as they see fit. Overall, the model for dual tagging is flexible and should not require a complete overhaul of a tagging system that adopts it.

5.1 Benefit to Taggers

The system will allow users to tidy up their tags in a way that adds meaning to their bookmark organization and will allow searching for tags by terms in their own vocabulary and memory. Users will be more likely to switch to one bookmarking system and to reduce their use of contextual conventions or of multiple accounts to organize social and personal bookmarks.

5.2 Benefit to Community

The added personal value becomes an incentive for users to organize their bookmarks better, cleaning up tags for themselves and, as a result, for the community, much in the same way that social bookmarking allows personal information retrieval and social information discovery. Users can choose to view or ignore the non subject tags of taggers, reducing the visual noise in tag clouds or tag related points of entry.
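The model in Section 5 can be illustrated with a small data structure. The following is only a sketch under stated assumptions, not an implementation of Delicious: it covers a single user's bookmarks, keeps all tags in one flat namespace, and records the subject or non subject label as a "tag of a tag". The class name `TagStore`, its method names and the example URL are hypothetical.

```python
from dataclasses import dataclass, field

SUBJECT = "subject"
NON_SUBJECT = "non_subject"

@dataclass
class TagStore:
    """Hypothetical single-user store illustrating the dual tagging model."""
    # tag -> kind; tags missing from this map default to subject tags
    kinds: dict = field(default_factory=dict)
    # url -> set of tags, all kept in one flat namespace
    bookmarks: dict = field(default_factory=dict)

    def add_bookmark(self, url, tags):
        self.bookmarks.setdefault(url, set()).update(tags)

    def mark_non_subject(self, tag):
        # The user opts in: only explicitly marked tags change kind.
        self.kinds[tag] = NON_SUBJECT

    def kind_of(self, tag):
        return self.kinds.get(tag, SUBJECT)

    def tags_for(self, url, include_non_subject=True):
        """Return the tags on a URL, optionally hiding non subject tags."""
        tags = self.bookmarks.get(url, set())
        if include_non_subject:
            return sorted(tags)
        return sorted(t for t in tags if self.kind_of(t) == SUBJECT)

# Usage with made-up data.
store = TagStore()
url = "http://example.com/flash-tutorial"
store.add_bookmark(url, ["flash", "programming", "toread"])
store.mark_non_subject("toread")

print(store.tags_for(url))                             # ['flash', 'programming', 'toread']
print(store.tags_for(url, include_non_subject=False))  # ['flash', 'programming']
```

Because the label is stored beside the tags rather than in a second tagging system, both types can still be analyzed together, and a view can filter out non subject tags only when a user asks for it.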
Lower frequency tags become more meaningful and useful in algorithmic enhancement to tag based information retrieval.

6. Suggested Tagging Interface

Since the success of tagging has been attributed to its low barrier of entry and a “personal value [that] precedes network value” (Porter 2006), the user interface can be crucial to the success of this modified tagging model. Care should be given to the interface to prevent confusion and needless additions to the tagger’s cognitive cost. While user testing would be needed to determine the best interface for this new tagging model, Figure 1 presents a suggested tagging interface.

Figure 1 Modified tagging interface to allow addition of non subject tags that can be organized into folders

Figure 1 presents a tagging interface in which personal tags are optional, to reduce any additional cognitive cost. Personal tags are automatically derived from the tagging syntax. Because a single tag box that serves two purposes may not be intuitive to users, a selector box is used to add personal tags. This option is labeled as a “list” to help users understand that the following sets of labels are personal (physical) and belong to the user. Once the user checks “Add to List”, a selection box appears, allowing the user to select a list. The default setting hides the list selector to indicate that the function is optional. Once the user selects a list, the notation for adding a non subject tag appears in the tag box and helps teach users how to quickly add these types of tags in the future. Although a better user interface can be determined through user testing, the point is to allow the separation of indexing labels and personal labels.

7. System and View Changes

Delicious is based on three major axes: users, tags, and URLs, which are reflected in the URL design used by the site (e.g.,
http://del.icio.us/mattb, http://del.icio.us/tag/xml, and http://del.icio.us/url/8b7fe…) (Biddulph 2004). I explain changes to the user interface according to these three major access points in the following sections.

Figure 2 Modified User View of Delicious. A non subject section is added to the sidebar. Subject and non subject tags appear in different colors in the bookmark listing.

7.1 User View

The main change to the Delicious user page is the addition of the non subject tags in their own view in the sidebar. Users can choose whether to view either set of tags as a cloud or as a sorted list. Users are able to employ a hierarchical organization system alongside archiving resources for later retrieval. The subject tag cloud may better represent the interests of the user. The non subject tags may be shared or kept private by the user. Either way, they present an interesting view of the user based more on current projects or interests. For quick identification, the two types of tags can be displayed in different colors on the bookmark listing (see Figure 2).

7.2 Tag View

In the tag view, the dual tagging system could clean up related tags sections that may contain non subject tags. Personal and subject tags shown in different colors on the entries allow people to scan for relevant information, since the non subject tags are usually only relevant to the tagger.

7.3 URL View

In the URL view, the separation of subject and non subject tags allows people to quickly tell what the URL is about and how people use the site.

8. Manual Tag Refactoring

Given that the dual tagging model adds focus to personal tags, better tag refactoring will be crucial to the model. Non subject tags have been excluded from traditional classification systems due to their potentially temporary or task specific nature.
Promoting them in a tagging system would require more robust tag management tools for editing, renaming, and deleting tags as context changes. Currently Delicious only allows management of single tags and the creation of tag bundles. Peters and Weller determined that improvement to folksonomies is required at multiple levels: at the document collection vs. single document level, the personal vs. collaborative level and the intra- and cross-platform level (2008). In addition to these areas, the dual tagging model would need editing features standard to desktop organization systems. Users should be able to easily select multiple documents to tag and untag. Conversely, users should be able to select multiple tags to label a single document or multiple documents.

9. Future Work and Potential Use

This paper proposes adding a dimension to the collaborative tagging model. While the potential benefits to taggers and the community have been outlined, a prototype needs to be created and implemented to determine the usefulness of the system. An exploratory case study investigating this dual tagging system should answer the following research questions:

1. Feasibility: Would users assign non subject tags, and would they be meaningful and maintained?
2. Accuracy: Does the removal of non subject tags allow for better information retrieval?
3. Utility: Can the modified tagging system improve algorithmic tag refinement?
4. Meaning: Are the non subject tag graphs and visualizations meaningful?

If the addition of non subject tagging to the tagging paradigm proves to be useful, it can be expanded and used in other paradigms such as Vander Wal’s Model of Attraction. Vander Wal describes the Model of Attraction as a framework for describing how users interact with information, with a focus on outlining users’ needs and desires in the information life cycle.
As the model’s name suggests, attraction is the key to the user’s relationship with information: users draw information closer and are attracted to terms in the presentation layer, creating a two-way attraction (Vander Wal 2004). Similarly, a well-developed, hierarchical template of our interests, such as our bookmarking folder system, can serve as a filter to draw in information. We can also use the hierarchical structure to browse the information that fits the characteristics of our personal information cloud. Imagine if we were able to browse someone else’s bookmarks in the way we organize our own resources. If we browse the resources of someone who uses a similar vocabulary, we could traverse a familiar taxonomy multiple levels deep and make discoveries at each turn. Furthermore, what if we could get updates for our whole system of knowledge without creating a new RSS feed for each new interest?

A personal information template also fits into Vander Wal’s discussion of “Attraction Receptors”. The following points show how the two models are compatible:

1. Intellectual: “Classification systems are based…on the cognitive attraction terms based on the user definition of those terms.”
2. Perceptual: “[U]ser has preconceived ideas of the… visual and auditory presentation form [and] style.”
3. Mechanical: Aggregators draw information that matches certain criteria.
4. Physical: “Users are continually trying to attract and keep desired information closer to themselves.” “Users set parameters of attraction for the information.” “User prefer to have information in formats that work easily with their receptors” (Vander Wal 2004).

10. Conclusion

The problem with traditional organization systems is that information is organized statically.
We overcame these problems with search and tagging systems, but we lost the semantics provided by complex taxonomic systems as well as our ability to grok information important to us, most of which now lives on our computers or on the web. We may begin by separating our resources according to subject and context to better understand our own information space. Doing so could allow us to start building a dynamic template for better information understanding and discovery.

References:

Biddulph, M. (2004). Introducing del.icio.us. O’Reilly xml.com. Retrieved December 1, 2008 from http://www.xml.com/pub/a/2004/11/10/delicious.html

Chi, E. H., & Mytkowicz, T. (2008). Understanding the Efficiency of Social Tagging Systems using Information Theory. Proceedings of the nineteenth ACM conference on Hypertext and hypermedia, pp. 81-88.

Francoeur, S. (2008). Improving tagging in del.icio.us. Digital Reference. Retrieved December 1, 2008 from http://www.teachinglibrarian.org/weblog/2008/07/improvingtagging-in-delicious.html#links

Golder, S., & Huberman, B. A. (2006). The Structure of Collaborative Tagging Systems. Information Dynamics Labs, HP Labs, Palo Alto, USA.

Guy, M., & Tonkin, E. (2006). Tidying up Tags? D-Lib Magazine, 12(1).

Kipp, M. E. I. (2006). @toread and cool: Tagging for time, task and emotion. 17th ASIS&T SIG/CR Classification Research Workshop. Abstracts of Posters, pp. 16-17.

Kipp, M. E. I., & Campbell, D. G. (2006). Patterns and Inconsistencies in Collaborative Tagging Practices: An Examination of Tagging Practices. Proceedings of the Annual General Meeting of the American Society for Information Science and Technology. Austin, TX, November 3-8, 2006.

Kwasnik, B. H. (1991). The Importance of Factors That Are Not Document Attributes in the Organisation of Personal Documents. Journal of Documentation 47(4), 389-398.

Malone, T. W. (1983). How Do People Organize Their Desks?
Implications for the Design of Office Information Systems. ACM Transactions on Office Information Systems 1(1), 99-112.

Paolillo, J., & Penumarthy, S. (2007). The social structure of tagging Internet video on Del.icio.us. Proceedings of the 40th Hawaii International Conference on System Science.

Peters, I., & Weller, K. (2008). Tag Gardening for Folksonomy Enrichment and Maintenance. Webology, 5(3), Article 58.

Porter, J. (2006). The Del.icio.us Lesson. Bokardo.com. Retrieved December 1, 2008 from http://bokardo.com/archives/the-delicious-lesson/

Shirky, C. (2006). Ontology is overrated: Categories, links and tags. Clay Shirky’s Writings about the internet. Retrieved December 1, 2008 from http://www.shirky.com/writings/ontology_overrated.html

Sinha, R. (2005). A cognitive analysis of tagging. Rashmi’s blog. Retrieved December 1, 2008 from http://rashmisinha.com/2005/09/27/a-cognitive-analysis-of-tagging/

Vander Wal, T. (2004). Understanding the Personal Info Cloud: Using the Model of Attraction. Vanderwal.net. Presentation. Retrieved December 1, 2008 from http://www.vanderwal.net/essays/moa/040608/index.php

Wagner, O. (2005). Themenmonat Tagging: 5. Best Practice. Oliver Wagners agenturblog. Retrieved December 1, 2008 from http://www.agenturblog.de/200511/themenmonat-tagging-5-best-practice/