Folksonomy — the significance of the least effort

Unpublished Working Paper

Timme Bisgaard Munk, PhD student, New Media, Copenhagen University
Kristian Moerk, Editor, DR IT, National Danish Broadcasting

Purpose: The purpose is to discuss theoretically, and to draw conclusions on, the value of folksonomies on the basis of a literature review and an empirical study of a large number of tags in the del.icio.us computer program, focusing on possible theoretical explanations of the imitation patterns found in the analysis and of the stability of the users' tags.

Design/methodology: Literature review, theory discussion and empirical statistical analysis.

Empirical findings: The existence of a statistical power law and a number of common stable patterns in the user-created metadata in del.icio.us.

Practical implications: The need to teach the users how to tag in order to realize the potential of folksonomies.

Originality/value: Cognitive economizing and information cascades as a central dynamic behind the power law and the information dynamic in the user-created metadata, which makes it impossible to realize the full benefit of the system.

Keywords: Folksonomy, Social tagging, Classification, Collaborative tagging, Knowledge organization

Paper type: Empirical analysis and theoretical discussion

Abstract: Today it is possible to find programs on the Internet that let users freely mark up information by creating their own personal metadata. This is called folksonomy. This article discusses the categorizing concept of folksonomies, taking as its empirical point of departure an analysis of the social bookmarking system del.icio.us. The essence of folksonomies is user-created descriptive metadata as opposed to the traditional sender-determined descriptive metadata in taxonomies and faceted classification.
Its supporters perceive user-created metadata as a better, cheaper and more realistic means of creating descriptive metadata than that which can be created in taxonomies and faceted classification. Descriptive metadata is the determining prerequisite for better sharing of knowledge because it is the key to creating better semantic relations and search possibilities in the exploding mass of data on the Internet. From a reception-critical deconstruction of the premises of the debate on folksonomy and a series of empirical analyses of the production of descriptive metadata in the social bookmarking system del.icio.us, a pattern in the user-created descriptions is found. The pattern is the classic power law, which in many complex systems unfolds as an imitation dynamic that creates an asymmetry, where a few descriptive metadata are reproduced often and the majority are reproduced seldom. In del.icio.us, it is the very broad and basic subject headings, which in cognitive psychology are called basic categories, that are often reproduced and achieve power in the system, while the narrower, more specific subject headings are seldom reproduced. The imitation dynamic underlying the power law in del.icio.us is explained from the perspective of two theoretical paradigms, i.e. market and cognition. The conclusion is that the power law and the asymmetry chiefly come into being as a consequence of cognitive economizing through a simplification principle in the users' construction of the descriptive metadata. The users predominantly choose broad basic categories, because that requires the least cognitive effort. The consequence is that folksonomy is not necessarily a better, more realistic and cheaper method of creating metadata than that which can be generated through taxonomies, faceted classification or search algorithms, e.g. Google.
The conclusion is that folksonomy, as a self-organizing system, cannot by virtue of its immanent dynamics alone create better and cheaper descriptive metadata. This can only be achieved by the users learning to tag better and investing more time and cognitive resources in practice. The least effort gives the least benefit.

What is folksonomy?

The so-called folksonomies are concentrations of user-generated categorization principles that are both private and public. The term was coined in 2003 by information architect Thomas Vander Wal (Smith 2004). It is a neologism consisting of a combination of the words folk and taxonomy. Taxonomy is from the Greek taxis and nomos. Taxis means classification and nomos means management. Literally, it may be translated as "people's classification management". Folksonomy may be said to be metadata from and for the masses (Merholz 2004). In the etymological and connotative meaning of the term, the intention of folksonomy is to create a better, more popular and thus more democratic alternative to the elitist and undemocratic taxonomy. Folksonomies are thus created by the people for the people, on the premise that the categorizing people can create a categorization that better reflects the people's conceptual model, contextualizations and actual use of the data. With folksonomy, it would in this way be possible to create a more representative, natural, comprehensive, diversified, up-to-date and dynamic categorization than through the classic taxonomy. Folksonomies are an expression of a paradigm shift away from classic cataloging, for better or for worse (Gorman 2004), with people everywhere beginning to tag information on the Internet with their own words. Folksonomies are spreading exponentially on the Internet and are now created by millions of users in a system which is no longer restricted by language or geography.
As mentioned above, the categorizing of information used to be a specialist job for a producer or a librarian. It is no longer a professional job for the few; it has become a job for everyone and no one. The separation between user and producer implodes with user-generated metadata, when the active, co-creating user contextualizes and categorizes the information in a new dual role as both producer and consumer, where meaning is created through self-reflective use and co-creating construction. Data explosion, digitalization and democratization of information have thus led to privatization, socialization and individualization of the cataloging of data through metadata. This has resulted in new systems for categorizing information spreading as a complement to, in parallel with, or as a substitute for the classic semantic and hierarchical classification principles. Folksonomy may be seen both as an individual act and as an expression of many people's collective but independent recording of metadata. As an act, it is the individual person who categorizes and thus tags information with his or her own metadata by adding personal keywords. In this way, a personomy of individual tags is created. To rephrase, folksonomy as an expression is a function of the total sum of personomies, where the individual users collect and tag in order to explore, remember and retrieve their own knowledge, thus creating a shared opportunity to explore and retrieve. This sharing and copying of knowledge is of no great cost to the tagger; however, it is of great benefit to all. The individual taggers will invest their time and effort in helping themselves, and without further cost they will implicitly also help other taggers. In this sense, folksonomy is based on a more or less explicit social contract, where the individual tagger invests his or her personomy in order to relate his or her own personal categorization to that of others.
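The relation between personomy and folksonomy described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the users, URLs and tags are invented, and the function names `personomies`, `folksonomy` and `urls_for_tag` are hypothetical and do not describe how del.icio.us itself stores or aggregates bookmarks.

```python
from collections import Counter, defaultdict

# Hypothetical bookmarks as (user, url, tags); all values are invented.
bookmarks = [
    ("anna", "http://example.org/css-guide", {"css", "webdesign"}),
    ("ben", "http://example.org/css-guide", {"css", "howto"}),
    ("anna", "http://example.org/ruby-intro", {"ruby", "programming"}),
    ("carl", "http://example.org/css-guide", {"design", "css"}),
]

def personomies(bookmarks):
    """Map each user to a Counter of the tags that user has applied."""
    result = defaultdict(Counter)
    for user, _url, tags in bookmarks:
        result[user].update(tags)
    return result

def folksonomy(bookmarks):
    """Sum all personomies into one shared tag count."""
    total = Counter()
    for counts in personomies(bookmarks).values():
        total.update(counts)
    return total

def urls_for_tag(bookmarks, tag):
    """Every URL that any user filed under `tag` (the shared benefit)."""
    return {url for _user, url, tags in bookmarks if tag in tags}

shared = folksonomy(bookmarks)
print(shared.most_common(1))
print(urls_for_tag(bookmarks, "ruby"))
```

Each user only pays the cost of building his or her own personomy; the shared count and the cross-user lookup come for free from the sum, which is the "social contract" in miniature.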
Everyone can see and use the folksonomy, but if you choose to be excluded from the contract and not contribute with your own metadata, you will be forced to follow other people's keywords at random. However, if you choose to register as a user and start tagging pages as an active and contributing participant in the folksonomy universe, you will be rewarded, as the tags and links that you add are contextualized dynamically in a user-generated system and become transparent. Metaphorically it may be said that the benefit of participating in the social system of the network is that you can always see an updated "exchange rate" of your individualized world picture relative to other people's world picture. Here, the "exchange rate" is an infinite, relational social semiosis where everyone associates each other's associations further. Since Thomas Vander Wal coined the term folksonomy, there has been a consensus that folksonomy is the collective term for a type of social classification on the Internet. However, this is an imprecise and incorrect definition. The distinctive feature of folksonomies is that it is not classification in a strict sense, but loose, horizontal social categorizations (Jacob 2004). Folksonomy consists of disconnected and loosely related keywords, which ideal-typically exist in a coordinated horizontal universe, only connected by associative relations. Here, there is no hierarchy between superior and subordinated concepts. No keywords are children, parents, twins or synonyms as in classic taxonomy. In theory, the user may thus choose freely without considering the hierarchy. In practice, however, there are forms of categorization in the folksonomies where the relations between the individual tags are hierarchical, because the user chooses tags that are not coordinate but subsets of each other. All words in the folksonomy are thus in theory unrelated. 
Only the associative relations between keywords are generated on the basis of the collaborative recording of tags. Contrary to formal taxonomy and classification, folksonomy thus lacks the explicit relations with predefined, consistent, descriptive and shared terms expressed as a controlled vocabulary (Mathes 2004). A distinctive feature of folksonomies is thus the possibility of adding one's own keywords unsupervised and viewing the keywords added by other users unsupervised. By using other popular keywords, your own tags are rendered visible and you have the opportunity of following the tags of others with the same popular keywords. In this sense, folksonomy may be regarded as a popular shift away from the hierarchical, controlled and authoritarian ways of categorizing information, where the user chooses not to learn a hierarchy but instead releases his or her own personal association chain in a common social forum (Quintarelli 2005). This is based on the notion that this forum makes it easier to find the relevant information, provides more transparency in respect of other people's knowledge, contains more representative, rational knowledge due to the number of participants, and that the knowledge in this system is more up-to-date, because it is more dynamic and social and is created through widespread collaboration over the Internet (Surowiecki 2005; Sunstein 2006). As mentioned, the principle in folksonomies is that the user adds information on his or her own and other people's information sources. This type of data about data is called metadata, and the process of adding metadata is called tagging. The user thus tags information with metadata and can subsequently use the generated metadata to organize the data. Metadata are the pillars of all taxonomies and categorizations. Metadata are the guiding principle of many content management systems in which the pages are compiled and placed according to the dimensions of the metadata.
In connection with searches, metadata often also play an important part as a categorization tool in combination with free-text search. There are three types of metadata: inherent, administrative and descriptive metadata. The inherent or structural metadata are information used to describe the nature of information. This may include a file type or size etc. The administrative metadata are information describing the handling of the file. This may be a publication date, date of latest change, status etc. Finally, there are the descriptive metadata, which is information on content: subjects, context, related information, recipient etc. On the Internet, these three types of metadata are often used in combination for organizing content, where the descriptive data organize the general structure (typically a classification of subjects and sub-subjects), while the administrative and, to some extent, the inherent metadata are used to organize list material and an overview (typically the most recently added content within a subject, other content by the same author etc.). Descriptive metadata are very different from the other two types, as they relate to the content of the information, i.e. to the meaning that may be deduced, and for this reason, they are very hard to deduce automatically. This is a problem, since the descriptive metadata for most people are the most natural entry point to large volumes of information, where information about subject, context and content is vital. At the same time, the organization of the descriptive metadata forms the basis of many of the taxonomies and classification principles that we use to arrange information. Consider the subject index in a library or the sections in a newspaper as classic examples of descriptive metadata categorizations. Basically, there are three strategies for creating descriptive metadata and thus organizing content: hierarchical, polyhierarchical or horizontal (Quintarelli 2005). 
In its pure form, taxonomy is vertically constructed in a hierarchical structure, in which descriptive data are assigned on the basis of predefined rules. Different types of information fit into different places in the often very comprehensive hierarchy of classes such as supercategories, subcategories or synonyms. Everything has its place, and if you know the system, it is fairly easy to retrieve the information. Because integrity and consistency are its strength, this strategy requires a comprehensive overview, consistency and methodical knowledge, and therefore professional metadata administrators (normally librarians), who will assign the information to its rightful place in the system. With a polyhierarchical strategy, it is possible to go across the hierarchical structures in a kind of faceted classification, where the same information unit may be assigned different facets which may then be used for searching. A facet is a category (Ranganathan 1962; Ranganathan 1964; Taylor 1992; Alvesson and Karreman 2001). The notion behind faceted classification is to create a higher degree of multidimensionality in the metadata (Wynar 2002). For each search, the different facets are filtered and selected until the user reaches a manageable and limited set of facets meeting the entered search criteria. Instead of navigating through a predefined hierarchy, the organization is determined on a current basis by the user's searches and thus by the dynamic polyhierarchies. Thus, the search is dynamic, but limited, since the facets have been defined centrally and as a finite repertoire updated through guided navigation (Vickery 1966), in contrast to folksonomy, where the repertoire is infinite and decentralized. The difference between faceted classification and folksonomy is that in the former there are many categories of links, while in folksonomy every link is a category.
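The contrast between a centrally defined facet repertoire and an open tag repertoire can be made concrete with a small Python sketch. Everything in it is an assumption made for illustration: the catalogue entries, the two facet names and the function names `facet_search` and `tag_search` are invented and do not come from any of the systems discussed.

```python
# Hypothetical catalogue; titles, facet values and tags are invented.
CATALOGUE = [
    {"title": "CSS layout guide",
     "facets": {"subject": "web design", "format": "tutorial"},
     "tags": {"css", "howto", "toread"}},
    {"title": "Ruby in an afternoon",
     "facets": {"subject": "programming", "format": "tutorial"},
     "tags": {"ruby", "cool"}},
    {"title": "Browser market report",
     "facets": {"subject": "web design", "format": "news"},
     "tags": {"firefox", "news"}},
]

# Faceted classification: the repertoire of facets is fixed centrally.
FACETS = {"subject", "format"}

def facet_search(items, **criteria):
    """Narrow the result set facet by facet; unknown facets are rejected."""
    unknown = set(criteria) - FACETS
    if unknown:
        raise ValueError(f"undefined facets: {sorted(unknown)}")
    return [item["title"] for item in items
            if all(item["facets"].get(name) == value
                   for name, value in criteria.items())]

def tag_search(items, tag):
    """Folksonomy: any string a user has typed is a valid category."""
    return [item["title"] for item in items if tag in item["tags"]]

print(facet_search(CATALOGUE, subject="web design", format="tutorial"))
print(tag_search(CATALOGUE, "toread"))
```

A search on a facet that nobody defined fails, whereas a tag such as `toread` needs no prior approval. That is the sense in which the facet repertoire is finite and the tag repertoire is not.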
Folksonomy can clearly be distinguished from the two other principles in all dimensions as a radically different and innovative way of creating metadata, because it is the user who creates the structure. Taxonomy is centralized, hierarchical and structuralist, and faceted navigation is polyhierarchical structuring with a predefined set of facets. In both cases, the user's role is more limited and structured than in folksonomy.

Literature

Since this is a new phenomenon, the scientific literature on folksonomies on the Internet is limited to fewer than 10-15 articles, which have been produced within the past two to three years. Please refer to Macgregor and McCulloch for an overview (Macgregor and McCulloch 2006). Only recently, with the creation and dissemination of social bookmarking software, has it become possible to share bookmarks. Literature on this subject falls into three general groups or schools. Firstly, there is the rather narrow and apolitical information science approach which is based on library science. Here, the phenomenon is analyzed on the basis of the theories, methods and standards that apply within the library field. The central problem is the categorization of information and the relation between the new folksonomy and the classic taxonomy. The ideal-typical representatives of this perspective are Mathes and Quintarelli, who both conceptualize folksonomy as opposed to taxonomy. Secondly, there is the much wider media studies approach. Here, folksonomy is placed in a larger social policy framework and connected with the Internet's paradigmatic changing of the communication and power structure in society. The ideal-typical representative of this perspective is Clay Shirky (Shirky 2004; Shirky 2005) from New York University. Thirdly, we have the scientifically and mathematically inspired computer science approach. Here, the focus has been on how to analyze folksonomies on the basis of a number of statistical and mathematical models.
The ideal-typical representatives of this perspective are Golder and Huberman (Golder and Huberman 2005). The theoretical perspective is a mathematical modeling of folksonomy in order to derive a number of patterns and mathematical rules by analysis. Folksonomy is essentially an illustrative example of how many network phenomena may be described and modeled. The most significant contributions are the demonstration of a mathematical power law in del.icio.us (Shen and Wu 2005), as in other complex systems, and the demonstration that the tags used for a given website are stable over time. In parallel with the scientific literature, a mass of references to and descriptions of the phenomenon have emerged on the Internet in the spirit of folksonomy and the genealogy of the word. An ideal-typical representative of this type of text is, of course, the man who coined the term, Thomas Vander Wal (Vander Wal 2005). The people who have written about the phenomenon have been dedicated lead users who are technically gifted and interested in emerging computer programs and technologies. The broad popular writings about folksonomy may be divided into two groups. Firstly, there is a great deal of superficial literature which generally just briefly introduces the technology or social bookmarking programs such as Flickr or del.icio.us. These are mainly texts focusing on how the system may be used in practice. Secondly, there is a group of people who, in brief sections in their blogs, profile and discuss some of the problems with folksonomy raised in the scientific articles within information science, media studies and computer science. These are often highly educated practitioners who work with the Internet as either IT consultants or web managers in large corporations.
Reception criticism of the literature

Folksonomy as a concept and a method is essentially hyped and colored by the zeitgeist, since it endeavors to provide a solution to the data explosion on the Internet, which is a growing problem. Metadata help create order in the information chaos, and any strategy and any program promising more metadata quickly, cheaply and better is thus often given a positive reception as an important and useful tool for solving a growing problem, possibly with uncritical and unrealistic expectations. There is no doubt that folksonomies have been wrapped up in the hopes and ideas of the zeitgeist, for better or worse. When technology becomes a part of the fickle zeitgeist, the debate is polarized, and the participants become either ardent supporters or pessimistic critics. This aspect in particular has characterized the debate on this technology. At one end of the scale, there is the idealistic belief that these technologies represent a paradigm shift within knowledge management and knowledge sharing, where tagging and folksonomies are perceived as a substitute for taxonomies, which are held to be inferior to folksonomy for practical and ideological reasons (Shirky 2004; Shirky 2005). At the other end of the scale, there is a pessimistic correction to the supporters' idealization of the technology. Here, the premise is that the perception of the user's self-generated metadata as the foundation of knowledge sharing goes far beyond the practical applicability of such metadata. Metadata become metacrap, and folksonomy is a metautopia (Doctorow 2005), because people lie and are lazy and stupid. The result is what might be expected. On the basis of the classic library criteria of classification, critics believe that taxonomies are far more precise and consistent and that they make for a better information retrieval tool than the popular amateurish alternative.
The idealistic supporters of folksonomies emphasize that taxonomy is a generalized method that implicitly was created as a classification system to arrange physical books in a library, where the books only have one correct position. The method is suitable for this; however, it is not suitable for categorizing digital knowledge resources, where you can choose to update different important facets at the same time, and where information can easily appear in several places and in several different contexts at the same time (Shirky 2005). Here, folksonomy is the only future-oriented substituting alternative for the supporters. The premise of this contrasting here is that folksonomies as a method represent an epochal shift from the library's old world of dusty books to the Internet's new world of ideas flying around in a digital democracy(Shirky 2005). Critics argue that folksonomy in its essence is imprecise since the starting point is an imprecise vocabulary and a horizontal tag with unrelated keywords. The individual keywords stand alone. Identical keywords refer to different links. Identical links have different keywords. In most folksonomies there is no synonym control, no distinction between the plural and the singular or translations of abbreviations (Mathes 2004). Although the most recent versions of the newest folksonomies offer the opportunity of seeing related tags, and thus finding out how others have tagged identical links, the ambivalence and the lack of precision persist (Mathes 2004). With folksonomies, the users are involved actively in a user-friendly system. From the critics' point of view, the price is a simplified and more imprecise system relative to the classic taxonomies – a system that does not meet the requirement for consistency as a library virtue. 
The result is a people's classification system which is poorer and less efficient than the alternative methods on the basis of the current requirements for information search and information categorization. Both the critics and the supporters may be categorized in different groups on the basis of thematics, abstraction level and the question of whether folksonomy should be perceived as an alternative, a substitution or a supplement to the traditional hierarchical taxonomies. There is also disagreement on the purpose of folksonomy, which is reflected in the fact that what some perceive as a weakness, others consider a strength.

All writings about folksonomies share six problematic premises, which have so far distorted the debate: 1. Firstly, the debate has been marred by unreflective and unreasoned media essentialism, where certain media forms are considered to have certain qualities and values without anyone having considered the non-media essential qualities. The supporters consider books to be inferior to hypertext documents that have a number of ideological values per se. Critics have a superficial perception of the Internet as a cluttered library in need of good old-fashioned order. In both cases, this formalism poses a problem, since it disregards the fact that quality and value must also be perceived in relation to the specific content, independent of the medium. In other words, there are many examples of well-functioning taxonomies on the Internet today, and folksonomies as a people's classification and cultural practice existed long before the Internet. For this reason, it is not necessarily a matter of making a clear choice between taxonomies, faceted navigation and folksonomies. The three different methods have their strengths and weaknesses in respect of the different types of information, different tasks and different contextual interrelationships across media. 2.
Secondly, method essentialism has now come into play, where certain methods for categorizing information are uncritically given different essential properties and ideological values. Naturally, taxonomies, facet classification and folksonomies as systems have certain essential properties, but they are in both cases also practice and action. They have both essential properties and non-essential properties, depending on use, information type and context. Consequently, one must consider the individual articulations and applications of the methods in order to get a true and fair view. There are both good and bad taxonomies, faceted classifications and folksonomies, depending on who has created them, how and with which tags. The quality and efficiency of the three systems is thus also highly dependent on practice. 3. Thirdly, both supporters and opponents all distort the views and methods of the opposition as being either/or. An argumentation and a zero-sum game, where one of the methods is right and the other is wrong. Here, it is wrongfully described as though the opponents would like a substitution, or more sophisticated as a dichotomy, where the opponents' solution is wrong, which means that one's absolutely postulated alternative is right (Shirky 2004; Shirky 2005). This contrasting is wrong, since the three methods can in no way substitute each other. They are complementary or supplementary in the general universe of possibilities of creating metadata. All three methods have some essential qualities and properties. Taxonomy is an expedient tool for finding what you are looking for, while folksonomy is suitable for finding out what to look for. Facet classification is a good tool for finding what you are looking for relative to certain dimensions that you know that you are searching for. 
In the former case, you have a clearly defined object for the search, and in the two latter cases, you investigate, in a more or less exploratory way, an interpretative plurality through your own and other people's associations. Criticizing taxonomies for being an undemocratic hierarchy that does not support learning and criticizing folksonomies for being undemocratic and too influenced by horizontal associations, which are unsuitable for searches, is therefore somewhat of a misunderstanding. Similarly, to criticize faceted classification for being too centralized in the selection of dimensions is also wrong. In all events, it is these qualities that determine the essential qualities of the methods and create complementarity. In practice and in theory, it is thus rather a question of how the different methods should be combined in a practice of metadata ecology relative to the specific objects and types of information that comprise both the collective and the individual, so that the methods are utilized in the best way possible in the right instrumental mix (Wright 2001; Quintarelli 2005). 4. A fourth problem is that different knowledge domains and terminology are mixed in the discussion. Library-related problems and quality criteria are mixed and contrasted with ideological and political value judgments in the same arguments. Words such as democracy, freedom and openness become premises for contrasting the different methods and drawing library-scientific conclusions on them, or conversely, political conclusions are drawn from library-related premises. 5. Fifthly, the debate is characterized by information essentialism, where information as a concept is regarded as an unambiguous, transcendent and homogeneous phenomenon, which may be related to the different knowledge sharing tools in the same way. This is an erroneous simplification because information is multi-faceted. Some types of information are best represented in a folksonomy.
This applies to dynamic, unstable, heterogeneous information with many different dimensions, where meaning is being negotiated dynamically. On the other hand, homogeneous static types of information, where negotiations have been concluded, are best represented in taxonomy or faceted classification (Kwasnik 1999; Sturtz 2004; Quintarelli 2005). 6. Sixthly, different analysis levels in the debate are mixed and described as absolute contrasts where they are not. At a very abstract macro level, folksonomies may be regarded as a digital democracy, a market, a language or a network. At a micro level, folksonomy may be seen as a cognitive process and a categorizing system. The point is that both the macro perspectives and the micro perspectives may be true at the same time and that they are not necessarily absolute contrasts; rather, they are different and immaterial facets of the concept existing side by side.

These six premises for the debate cannot form the basis of a correct understanding of folksonomy as a method. In the next section, we will endeavor to achieve a non-essentialist understanding of the phenomenon which to a greater extent sees it as a pragmatic practice and empirical reality.

Case: del.icio.us

The Internet now features a vast number of programs supporting folksonomies as a social practice, e.g. computer programs such as Flickr, Technorati, Shadows, Yahoo! My Web 2.0 and del.icio.us. In order to create a better framework for the discussion of folksonomies, del.icio.us has been chosen for this case, as the program was one of the first and is one of the leading folksonomies. In December 2005, the inventor of del.icio.us was named the Innovator of the Year by MIT's Technology Review Magazine for his work with the del.icio.us program. The name del.icio.us contains an invitation for the users to compile everything that is delicious.
Del.icio.us is a broad folksonomy, because many people tag many different websites and also the same websites, as opposed to the photo database Flickr, which is a narrow folksonomy, where few people tag their own pictures with their own words (Vander Wal 2005). Del.icio.us is essentially a system that can be used to keep track of your own and other people's bookmarks on the Internet by saving links for selected pages and adding your own keywords in connection with the link. These keywords are the information architecture and navigation principle of the website. The users can basically do four things in the system: They can follow a certain tag (e.g. business), follow a certain user, follow a group of users, or create their own lists of tags. When the users click a keyword or create their own keywords, they can see what links other users have added to the same or related keywords. This requires registering as a user and opening an account, in which you then gather links and related keywords of your own choice in a very simple interface. You may then choose to publish your keywords and links and thus share them with others, who may then follow the tags and see what other users in turn tag with the same keyword. This unique knowledge-sharing technology provides new scope for sharing bookmarks and keywords between large groups of people who used to be separate in time and space, but who are now joined in the words they use to describe the world. The technology thus creates the opportunity of having a joint association field on the basis of the total sum of tags. del.icio.us users call this a tag cloud: a joint cloud of tags in the form of keywords.

Which websites are tagged in del.icio.us?

Information on the tendency in the database may be drawn from looking at the 500 most frequently tagged websites and the 50 websites that have been tagged most times since the launch of del.icio.us.
Even though it must be noted that the more than 100,000 users have different purposes for sharing their bookmarks with others, there is still a tendency for del.icio.us to be essentially a community for people who are interested in or work with information technology. Bookmarks to IT-related pages and the use of IT-related tags are thus over-represented. Considering that folksonomy is a relatively new concept on the Internet, this characteristic is not surprising, since the professional users of the medium will typically be the frontrunners in connection with new technology, software and opportunities. The content may be divided into the following categories on the basis of a qualitative analysis of the 50 most popular websites since the launch and the 500 most frequently tagged websites (sources: Populicio [http://populicio.us/fulltotal.html] and CollaborativeRank [http://collabrank.web.cse.unsw.edu.au/Del.icio.us/?cmd=top_urls]). The different websites fall into four groups, where the first group is the largest one by number and popularity.

1. How-to websites/tutorials on a number of popular computer programs and technologies, i.e. all websites that can help the user solve concrete IT-related jobs.
2. Websites that can help the users make better use of the del.icio.us system.
3. Websites which in one way or another live up to one or more of the classic news criteria in respect of the IT field as a profession. This is information which may broadly be perceived as being news within the IT field. Here, there is an over-representation of websites that may be considered to be quaint news, ideas or concepts.
4. Websites which constitute a digital public, such as digital debate media, and may be perceived as joint forums for users with an interest in IT.

The most important function of the system is therefore that it supports knowledge sharing and helps the users share tools and knowledge about how to solve different IT jobs.
This also includes sharing new knowledge (news) in respect of specific IT-related jobs. If you look at the 87 most popular keywords and thus the strongest tags in the system at any given time, the picture of a database for people interested in IT created by people interested in IT is reinforced. The users here tend to have a far more varied and broad repertoire of keywords on IT than on other categories in the system. In addition, the creation of hierarchies through subcategorizations and synonyms appears more often with IT-related bookmarks, while other categories outside the IT area appear as detached and without sub-categories. The following list, which was retrieved from del.icio.us in mid-October 2005, shows the most frequently used tags sorted by frequency (http://Del.icio.us/tag/?sort=freq): "Blog programming software web design reference music news ajax tools linux javascript css howto web2.0 firefox art blogs politics games webdesign shopping technology humor business google science photography tutorial mac search tips fun java tech windows opensource video development free internet daily toread flash diy security mp3 osx ruby php hardware books funny cool media travel article productivity language webdev del.icio.us research comics microsoft photo education downloads maps computer work tv email blogging apple culture electronics rss podcast audio hacks food wiki movies religion freeware geek rails community graphics entertainment writing architecture game download inspiration shop extensions html tutorials history library photos philosophy radio python code .net tool hosting lifehacks" Generally, the most frequently used keywords may be divided into two groups. Firstly, very broad and very general keywords on cultural, societal and generally human categories (music, politics, science). Secondly, a number of keywords which may all be regarded as sub-categories under terms such as computers or the Internet.
From this list, it may be seen that tags which are practically related to the solution of concrete IT tasks are predominant (tools, tips, howto, tutorial, tutorials), while terms and programs related thereto are also very predominant, including a number of IT-related abbreviations (html, osx). There are also a number of job-based tags which are very active (such as toread, 2read, read later, to do) and a number of value judgments (funny, cool). Another interesting observation is that the most popular keywords are very general and may be said to belong to a number of cognitive basic categories which many people working professionally in the IT field will consider if asked to mention some IT terms off the top of their heads.

How do users in theory tag in del.icio.us?

According to a qualitative analysis, there are seven different tagging strategies for the users' recording of metadata about various websites in del.icio.us (Golder and Huberman 2005). The different tags are either based on content, medium, ownership, hierarchization and specification, value judgments, meta-reference or jobs:

1. Tagging is content categorization. Categorization through content in different degrees of specification relative to different dimensions. This is by far the most popular categorization method. Here, tagging marks how the content of the bookmark may be described as belonging to a number of content categories such as: business, marketing and web design. The subject categories are often very broad.
2. Tagging is media categorization. This takes the form of a reference to the medium which the bookmark concerns or appears in. Here, it is not the content that is described but the form of provision of information. This may, for example, be a reference to media such as book, TV or newspaper.
3. Tagging is categorization subject to copyright. This takes the form of a reference to ownership or producer. The producer's or the institutional sender's name is mentioned.
4. Tagging is categorization creating a subcategory or a supercategory. Here, the horizontal structure of scattered keywords is replaced by hierarchizations and relations between the individual tags through tagging.
5. Tagging is categorization through a value judgment. Here, a value judgment on the bookmark is given. There are, for example, often positive value judgments: cute, funny, nice or cool. Here, the bookmark is related to personal references and values.
6. Tagging is categorization referring to the categorization. This meta-reflexive categorization takes place through a meta-reference to tagging as an action or structure or to the social bookmarking system as such. This could, for example, take the form of a reference to tagging as an individual action: Mytags, worktags, tagging or del.icio.us.
7. Tagging is categorization on the basis of jobs and process. There is often a reference to the bookmark in relation to a future job. Examples include: Toread later, checkout later.

These different tagging strategies must be perceived as ideal-typical descriptions, where it may be hard in practice to set clear boundaries between the different strategies. They may, for example, appear in the same words as hybrid tagging strategies, where a content categorization is also a job description, and in some cases, the different strategies may be subsets of each other. Furthermore, it must be noted that the individual tags may often cover different motives pointing further than their immediate meaning. In the strict sense, tagging with the word RSS is a content categorization of a bookmark on the RSS technology; however, the meaning may be to collect everything new and relevant on RSS in respect of a certain job. Thus, the meaning is really: "Exciting news (-> value judgment) on RSS which must be read (-> job categorization)".
The meaning of the word may thus be far more complex in the tagger's consciousness than what may immediately be seen from the presentation form as a tagging strategy. We could also add an additional strategy which was not included in the seven concepts above, but which does apply in practice: tagging where the categorization places the bookmark's content in time relative to the tagger's own time, as news, old news, history, future etc. In addition, it is natural to add a subcategory under the subject categorization, where the users in some cases tag a website on the basis of different life spheres, social roles and contexts for the bookmark, such as work-related, hobby, personal, private etc.

The patterns of tagging

It is clear in the system that the user-generated tagging does not result in anarchy or coincidence, but tends towards a consensus on specific words related to a specific website over time. This will be obvious to any user of the system. You would think that the opportunity of choosing tags freely would create a chaotic and idiosyncratic mass of tags without a clear pattern. However, figures show that this is not the case. On the contrary, a number of patterns relating to frequency, interpretation communities and relations between the different tags can be seen in the del.icio.us database and among the almost 70,000 users. There is regularity in the way the tags are used, their frequency and the relative distribution of frequency between the different tags. Through a qualitative empirical examination of the system it may also be concluded that the seven different tagging strategies can all be found, and that the broad and general content-categorized strategy is dominant. Empirically, tags from 500 random users in the system were collected in the period from 30 November to 1 December 2005 for this article. The total number of tags is 178,215. On this basis, it may be demonstrated that tagging follows the so-called power law (Shen and Wu 2005).
As mentioned, the power law results in some tags being very dominant and frequently used. The majority of tags are peripheral and used very rarely. This is evident from the fact that the 246 most frequently used tags were used 66,398 times, corresponding to 37.3 per cent of the total number of keywords. This percentage is substantial. The statistical evidence that the randomized 178,215 tags follow the mathematical power-law distribution is provided in the following graph:

[Figure: The power-law distribution on a logarithmic scale, demonstrated for a set of 178,215 tags from 500 random users of del.icio.us. x-axis: ln(rank); y-axis: ln(frequency).]

The graph shows a power law with an exponent (i.e. slope) of 1. This is interesting, because it shows that the frequency of the chosen tags does not fall as quickly as in natural language (an exponent of 1.5) (Zipf 1949). In other words, tags are more evenly distributed than the words a person would use if he/she were to write a text. However, this is not strange, since the tags have a certain encyclopedic character as expressions and language action reflecting the many different interests of the users. An encyclopedic list or a dictionary has an exponent of 0. Consequently, this graph shows that the del.icio.us users are relatively heterogeneous and more heterogeneous than the typical language users in the written language. On the other hand, if you take a closer look at some of the 178,215 tags, it is clear to see that they are "strange", i.e. that they refer to certain acronym traditions, IT-related catchphrases and IT slang. They are certainly not everyday words. This means that there are many small groups of people who have specific tag languages and communities.
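The rank-frequency fit described above can be illustrated with a short sketch. It is not the procedure used for the article's graph: the data are synthetic (the tag at rank r is simply made to occur 10000 // r times), the function names are our own, and an ordinary least-squares line through the log-log points is only one of several ways to estimate the exponent. The last lines recompute, from the two figures quoted in the text, the share of all tag uses taken by the 246 most frequent tags.

```python
import math
from collections import Counter


def rank_frequencies(tags):
    """Return tag counts sorted from the most to the least used tag."""
    return sorted(Counter(tags).values(), reverse=True)


def loglog_slope(frequencies):
    """Least-squares slope of ln(frequency) regressed on ln(rank)."""
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(freq) for freq in frequencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic stand-in for the tag sample (an assumption, not the real data):
# the tag at rank r occurs 10000 // r times, i.e. frequency ~ 1 / rank.
synthetic_tags = [f"tag{r}" for r in range(1, 101) for _ in range(10000 // r)]
slope = loglog_slope(rank_frequencies(synthetic_tags))  # close to -1 by construction

# Share of all tag uses taken by the 246 most frequent tags (figures from the text).
top_share = 66398 / 178215
print(f"slope = {slope:.2f}, top-246 share = {top_share:.3f}")
```

A slope near -1 on such a plot corresponds to the exponent of 1 reported for the del.icio.us sample; a steeper slope would mean that usage is concentrated on even fewer tags.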
What is fascinating about the demonstrated power law in the system is that, as in other complex systems, it is holographically organized: all the tags taken together follow the power law, and the tagging of each individual website also follows the power law. This means that the shared power law consists of a million different power laws. This is the fractal organization principle seen in complex systems and recurring in folksonomies. The theoretical starting point is the emerging scientific network theory, where it has been demonstrated mathematically that all complex networks follow a number of general regularities. The connection between the different vertices in the network is thus not a coincidence; rather it forms a pattern. The pattern of all complex systems is the power law. In del.icio.us and in many other networks, a few vertices in the network have many connections, while the majority of vertices have few connections (Barabasi and Albert 1999; Barabasi 2000; Barabasi and Albert 2002; Shen and Wu 2005). These central vertices are very important for the functioning of the network, while the majority of vertices play a more peripheral role. The power law thus appears not only in del.icio.us, but may also be found in a number of biological and non-biological systems. It may be perceived as a law of nature for complex networks. As described by one of the leading network theorists, Barabasi, on the power law and the common topology for all complex systems: "Systems as diverse as genetic networks or the World Wide Web are best described as networks with complex topology. A common property of many large networks is that the vertex connectivities follow a scale-free power-law distribution. This feature was found to be a consequence of two generic mechanisms: (i) networks expand continuously by the addition of new vertices, and (ii) new vertices attach preferentially to sites that are already well connected.
A model based on these two ingredients reproduces the observed stationary scale-free distributions, which indicates that the development of large networks is governed by robust self-organizing phenomena that go beyond the particulars of the individual systems." (Barabasi and Albert 1999) (59) The power law has thus been found in networks as diverse as the relations between Hollywood stars, the distribution of wealth in a population (Pareto's Law) (Frank and Cook 1995), the pattern of scientific references, the relative size of big cities, active words in the English language (Zipf's Law) and in many biological systems (Barabasi and Albert 1999; Barabasi and Albert 2002). The power law has its own cumulative dynamic over time, which is popularly called the "rich get richer" effect or the "winner takes it all" logic, because winners become more and more victorious over time. This dynamic is also called the Matthew Effect, because the system follows the line in the Gospel of Matthew saying that "For unto every one that hath shall be given, and he shall have abundance". As described in the Barabasi quote, the power law arises, firstly, because the network expands with the addition of new vertices which must be connected to other vertices in the network. In the case of del.icio.us, this takes place with the addition of new tags. Secondly, the new vertices have a general tendency to be attracted by the vertices that have many connections, have existed for a long time and already have a high frequency. The stronger and more well-connected a vertex is to begin with, the more probable it is that new vertices/tags are connected with it. In del.icio.us this means that the users have a tendency to use the keywords that already exist and which many users have already chosen. The power law is based on the principle of preferential attachment (Barabasi and Albert 1999) (511). In short, this means that popularity is attractive.
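The two ingredients named above, growth and preferential attachment, can be illustrated with a small simulation. It is a simplified sketch in the spirit of the model and not the Barabasi-Albert network model itself: tags stand in for vertices, each tagging event either coins a new tag (with an assumed probability p_new) or reuses an existing tag with a probability proportional to how often that tag has already been used. The function name, the parameter values and the fixed random seed are our own choices for illustration.

```python
import random
import statistics
from collections import Counter


def simulate_tagging(n_events, p_new, seed=0):
    """Simulate tag choices where popular tags are more likely to be reused."""
    rng = random.Random(seed)
    history = []  # one entry per tag use, so frequent tags occupy more slots
    next_id = 0
    for _ in range(n_events):
        if not history or rng.random() < p_new:
            history.append(f"tag{next_id}")  # growth: a user coins a new tag
            next_id += 1
        else:
            # Preferential attachment: picking a random past use selects a tag
            # with probability proportional to its current frequency.
            history.append(rng.choice(history))
    return Counter(history)


counts = simulate_tagging(n_events=20000, p_new=0.1)
top = counts.most_common(1)[0][1]
median = statistics.median(counts.values())
print(f"{len(counts)} distinct tags; top tag used {top} times; median tag used {median} times")
```

Even though every simulated user follows the same simple rule, a handful of early tags end up with most of the uses while the typical tag is used only once or twice, which is the asymmetry the power law describes.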
The longer a tag has existed, and the more popular it is, the more attractive it will be. As described by Barabasi: "These examples indicate that the probability with which a new vertex connects to the existing vertices is not uniform; there is a higher probability that it will be linked to a vertex that already has a large number of connections….Because of the preferential attachment, a vertex that acquires more connections than another one will increase its connectivity at a higher rate; thus, an initial difference in the connectivity between two vertices will increase further as the network grows." (Barabasi and Albert 1999) (511) The strong tags in del.icio.us are naturally the broad tags, since they can be connected to most websites because they are so broad. They are popular in their semantic essence, and thus they become even more popular. However, the constancy in the tags may be explained by the fact that the popular tags are attractive over time. The existing tags are very visible in the system, which makes it very probable that they will be used again. This creates constancy. Here, the fact that the system makes existing keywords and information about them visible (the "Most popular" function) plays an important part in the imitation. The closest that the scientific network theory has come to explaining preferential attachment is by referring to it taking place "due to the local decision… based on information that is biased toward the more visible". As expressed somewhat vaguely by Barabasi & Albert: "Similar mechanisms could explain the origin of the social and economic disparities governing competitive systems, because the scale-free inhomogeneities are the inevitable consequence of self-organization due to the local decisions made by the individual vertices, based on information that is biased toward the more visible (richer) vertices, irrespective of the nature and origin of this visibility."
(Barabasi and Albert 1999) (512) Unfortunately, this explanation generates more questions than answers. This quote still lacks an explanation of "The local decision". It is only mentioned that the phenomenon may be explained by the degree of visibility, competition and as an individual decision. The scientific network theory can thus explain how imitation takes place in all types of complex systems; however, not why people decide to use the same tags. In the classic scientific network theory, we thus see a naturalization of the power law as a law of nature in all types of complex systems, which, unfortunately, limits the scope for explaining this regularity. The scientific approach seems convincing in its empirical and mathematical interpretation of complex systems and its description of the power law. However, this perspective seems more like a descriptive understanding and a model used to show the dynamic than as an actual tool for explaining it. Consequently, the mathematical modeling of the power law says nothing about its origin or background. As poignantly described by the famous biologist Evelyn Fox Keller in her critical essay "Revisiting 'scale-free' networks": "First, power-law distributions are neither new nor rare; second, fitting available data to such distributions is suspiciously easy; third, even when the fit is robust, it adds little if anything to our knowledge either of the actual architecture of the network, or of the processes giving rise to a given architecture (many different architectures can give rise to the same power laws, and many different processes can give rise to the same architecture). Finally, even though power laws do show up in the physics of phase transitions, the hope that the resemblance would lead to a "new and unsuspected order" in complex systems of the kind that physicists had found in their analysis of critical phenomena appears, upon closer examination, to lack basis." 
(Keller 2005) (1066) Natural science is thus not very helpful when it comes to explaining preferential attachment. It is explained as a unique, system-dependent feature, which cannot be explained by the general theory for complex networks. Albert & Barabasi explain it as follows: "It is now established that highly connected vertices have better chances of acquiring new edges than their less connected counterparts. The Barabasi-Albert model reflects this fact by incorporating it explicitly through preferential attachment. But where does preferential attachment come from? We do not yet have a universal answer to this question, and there is a growing suspicion that the mechanisms responsible for preferential attachment are system-dependent." (Barabasi and Albert 2002) (82) Consequently, despite the fact that the power law is evident everywhere in nature, there is no unambiguous explanation of where it comes from. This must be explained differently for each network. This also means that the mathematical network theory can easily co-exist with the other theoretical explanations provided in this article. The scientific network theory is thus not able to provide an unambiguous explanation of the background to imitation in del.icio.us; however, it does provide a number of tools for better conceptualizing the degree and quality of knowledge sharing in the system. This may also be visualized in an abstract sense in the following way, where we see a marked over-representation and use of certain keywords for the entire database:

[Figure: The slope of the power law. x-axis: number of keywords (0-n), representing all the tags used in the folksonomy; y-axis: number of tagged websites, i.e. the number of websites tagged with the tags on the x-axis.]

The conclusion is that a few keywords are used to tag a large number of websites.
Many of the authors who have written about folksonomies and del.icio.us agree to acknowledge the existence of these patterns and the power law (Shirky 2004; Shirky 2005). The majority of authors note the existence of some or all of these regularities; however, they do not provide an actual explanation of the background: What is the impact of the power law with the broad cognitive categories on the benefits of del.icio.us? What is the impact on the benefit of using the system? One of the attempts to explain the regularities has been made by Golder & Huberman, whose descriptions relate specifically to del.icio.us (Golder and Huberman 2005). They explain the patterns in del.icio.us as a result of three equal factors: imitation, knowledge sharing and the robustness of the cognitive concepts. Imitation and thus the constancy take place through the users' copying of other people's tags, where the dynamic is that the users perceive it as more legitimate and "safer" to use other people's tags than finding new tags themselves. The two authors consider this imitation to be an expression of a social imitation, where other people's use is social "proof" of what is right and thus safe to copy. Consequently, the explanation is sociological, and the premise is that popularity is attractive and becomes even more attractive in step with the popularity of the keyword. At the same time, they believe that stability is a consequence of the fact that the users have a number of work-related joint interests and thus the same world picture, for which reason they conceptualize/tag the world in the same way using the same keywords. In this world picture, certain keywords are dominant, because del.icio.us is predominantly a tool with a specific dominant purpose. In this case, the purpose is to solve certain IT-related work assignments. 
According to the two authors, while ideas come and go, the terms used to describe the world and thus the changing ideas of the time are far more constant and less dynamic. They do not change much or quickly. This is a tendency in the database which is reinforced by the fact that it primarily concerns the operationalization of basic and very broad categories. The constancy is thus a consequence of the robustness of our system of concepts and of the abstraction level and thus the representativity of the basic categories. Golder & Huberman's explanations of the stability and imitation dynamics are not sufficiently thorough. Explaining the stability as imitation, knowledge sharing and socio-linguistic robustness is essentially a conceptual displacement and a process description, where it is not clear what creates the conditions and the dynamic driving imitation; imitation is merely explained as imitation. Imitation is thus both a premise and a conclusion in respect of the process description. Imitation in the system is not random, with everything being imitated to the same extent; rather, imitation is selective. Explaining this as an expression of social behavior does not explain why people perceive it as being "safer" to copy others. What creates this perception of safety, and what drives the users to do it? Is it because you buy other people's meaning in this way? Is it because you agree? Or because you do not have the time or the resources to make your own "unsafe" tags? This remains unclear, because it is uncertain what the social proof, although it is proof, is proof of. In the same way, the concept of knowledge sharing is an explanation providing a new concept for the description of the same process; however, it far from explains the imitation or puts it into perspective.
The cognitive robustness may partly explain the accessible universe of conceptualizations and thus possible tags; however, it is a long way from explaining the constancy in certain categories and tags. We have the same linguistic terms to describe the world; however, the universe is so enormous and the combinations are so many among the almost 70,000 users that the robustness and the common basis can no longer explain the imitation. In the following, different explanations of what creates these patterns for knowledge sharing, tagging and imitation in the database will be described in order to provide a better explanation than that of Golder & Huberman. This is based on a number of theoretical perspectives on folksonomy and imitation logic as such. In each perspective, some dimensions of the phenomenon are chosen over others, and the phenomenon is thus illustrated from different levels, from the macro-economic sociological and economic level to a micro-analytical cognitive level. The perspectives have been chosen on the basis of an assessment of whether they increase the recognition and conceptualization of the empirical factors emerging from the database.

How can an empirical analysis explain the power law and the imitation logic?

The 178,215 tags from the 500 users have been processed through a statistical correspondence analysis of the tags. This method has been chosen because it is especially suited for uncovering possible patterns in data material with many variables and with many variables relative to many users. The correspondence analysis was developed by the French mathematician Jean-Paul Benzécri as a tool to analyze systematic relations between data. The analysis does not require any assumptions on distribution or scale properties, as all variables are considered to be nominal or categorical variables.
The correspondence principle is, in short, that it generates a profile for each row and column in a cross table, and the analysis results in a map where similar profiles end up close to each other and different profiles far away from each other. The method constructs a profile for each of the subjects of the analysis (in this case the 500 tagging users and the 246 most frequent tags). The 246 have been chosen because the most frequent are also the most interesting in light of the hypothesis and strategy of the analysis of uncovering a number of interpretation communities. On the basis of a table of the profiles, the analysis constructs a map, where users/tags with similar profiles are situated close together, while users with more diversified profiles are situated far away from each other. The closer to the corners they are situated, the more significant they are statistically. The map may in this sense be regarded as a visualization of statistical over-representation for both users and tags (for a detailed account of the underlying mathematics, please refer to Hill 1974; Greenacre 1993). Two mathematical maps have been created. Firstly, a map of the 245 most frequent tags among the 500 users, and then the 129 most frequent tags for all users of the system distributed on the 500 users. The conclusion is that the predominant tagging strategy is a content categorization. The patterns of both statistical visualizations show that there is a left-right axis from general use to IT-specific use. On the left side, very general common basic cultural categories which all people share, such as travel, politics and culture, are used. On the right side, there are a number of very specific IT-related keywords connected to certain IT-related fields and work functions such as Apache (database technology/language), Perl (database language) and SQL (database system).
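The profile idea behind the correspondence analysis can be sketched on a toy cross table. The sketch covers only the first step of the method, building row profiles and comparing them with the chi-square distance; a full correspondence analysis additionally projects these distances onto a low-dimensional map, which is omitted here. The three users and their tag counts are invented for illustration and are not taken from the del.icio.us sample.

```python
import math


def row_profiles(table):
    """Turn each row of counts into relative frequencies that sum to 1."""
    return [[cell / sum(row) for cell in row] for row in table]


def column_masses(table):
    """Share of the grand total that falls in each column."""
    total = sum(sum(row) for row in table)
    return [sum(row[j] for row in table) / total for j in range(len(table[0]))]


def chi2_distance(p, q, masses):
    """Chi-square distance between two profiles; rare columns weigh more."""
    return math.sqrt(sum((a - b) ** 2 / m for a, b, m in zip(p, q, masses)))


# Hypothetical cross table: rows are users, columns are counts per tag.
tags = ["travel", "politics", "sql", "perl"]
table = [
    [8, 6, 0, 1],    # user A: mostly general tags
    [16, 12, 0, 2],  # user B: twice as active as A, same proportions
    [0, 1, 9, 7],    # user C: mostly IT-specific tags
]
profiles = row_profiles(table)
masses = column_masses(table)
d_ab = chi2_distance(profiles[0], profiles[1], masses)
d_ac = chi2_distance(profiles[0], profiles[2], masses)
print(f"d(A, B) = {d_ab:.3f}, d(A, C) = {d_ac:.3f}")
```

Users A and B end up at the same point because only the proportions of their tags matter, not how many tags they have made, while user C lies far from both. This is the sense in which similar profiles are situated close together on the map.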
At the top of the map is a cluster of keywords describing a number of graphical and design-related words and concepts, all of them connected with the work of a graphical designer and the graphical layout of websites. The statistical analysis thus indicates that there are three different significant tagging strategies, user groups and interpretation communities in the system. They are described below on the basis of the statistical clusters of keywords:

1. The well-informed citizen - "The curious citizen" The purpose is to obtain general information within a number of wide social subjects. The categories activated are the general human categories existing across social boundaries and interests. It is the dedicated citizen who tags about his or her hobbies and to fulfill a general need to "keep up with developments/the Internet". The motivation is curiosity regarding what is going on in society, and the contextual framework is a combination of spare time and work.
2. The professional IT worker - "The professional IT nerd expert" The purpose here is primarily work-related. The objective is to keep up-to-date with programs and technologies within narrowly defined IT fields, in this case database technology. This is a professional IT expert working with very heavy and complicated database solutions. In all probability a man who is not interested in the "softer" part of the IT world relating to graphics and user experience. Here, the focus is on the basic structure of the system and the possibilities provided by technology. He is not interested in the system as presentation, but rather as representation and structure, focusing on the underlying structures, the so-called back-end. His basic categories are at a completely different level and much more specifically related to a professional community. The motivation is to have an updated toolbox containing knowledge on specific technologies, and the framework is specifically work-related.
3.
The professional IT designer - "The soft and general IT expert" Here, the purpose is to stay up-to-date with developments within graphical design and the design of websites. Here, pictures, icons and fonts for the graphical work in programs such as Photoshop are collected. This is probably a woman who is interested in the meeting between technology, people and graphical design. The tendency is to use broader and more diffuse categories which are not only related to specific narrowly defined subjects (freeware, Webdev article). Here, the focus is on the IT system as presentation, the so-called front-end. Her basic categories are related to the work with graphical design and the most commonly used keywords in that world. Her motivation is to have updated and shared knowledge of graphical design and web design.

These three interpretation communities can be seen from the 245 most frequently used keywords; however, they also appear if you look at the 129 most frequent keywords for the 500 users in the system. This confirms and supports the indication that these three groups exist as distinct tagging strategies throughout the system. The percentage distribution of the keywords among the 500 users shows that the strong top 245 tags, and thus also the 129 most frequently used, represent a very high percentage of the total group of 178,215 tags. This confirms the existence of the power law in the system.

One website as an illustrative example

If we look at one article in del.icio.us, the tags and relations between the different strategies will become clearer. We have chosen a famous article by the American economist Milton Friedman from the New York Times, which has been tagged by 75 users, including K-Forum (www.del.icio.us/kforum). The article criticizes the idea of corporate social responsibility, and the message is that companies have been founded to make money for their shareholders and do not have to pursue politics by taking on a social responsibility.
Of the total number of 198 tags from the 75 users, we can see that subject-wide categorizations are very predominant (business, economics, politics). We may also see a few examples of a task-oriented categorization (toread) and a formalistic categorization through the media (article, articles). Furthermore, there is a categorization based on the producer (Friedman) and one related to time (future). The figure below shows the frequency of the individual tags:

[Figure: Tag frequency for the URL "The Social Responsibility of Business is to Increase its Profits" by Milton Friedman. Axes: tags; number of uses. business 42; economics 38; politics 23; social 18; money 9; toread 9; philosophy 9; government 7; articles 5; management 4; legal 4; law 4; enterprise 3; society 3; future 3; corporate 3; friedman 2; article 2; strategy 2; ethics 2; nyt 2; essays 2; responsibility 2.]

Concretely, over time the users of the system start to agree on which words best describe the content. Not surprisingly, it is the broad subject categories that take the prize as the ones describing the website in the best possible way. The holographic organization of the power law shows that, just as the system as a whole follows a power law, this one article also follows a power law, where a few tags are over-represented. The point is that the more different tags people use, the more diversification and heterogeneity there is in the system and the more varied the categorization of the content will be. This is due to the fact that the benefit of the database and knowledge sharing, as mentioned before, depends on the match between one's own and other people's categorizations of the world. The more varied people's tagging is, the more other people can learn from the tags. For example, if you tag Friedman's article with the keywords business and economics, you may learn that this article is about business life and economics. This is a relatively small benefit.
If you had used these broad tags and a number of more specific tags instead, other users would learn a lot more about the subject and the article. For example, a combination of the following tags: business, economics, CSR, neoliberal stakeholder theory. Here, the people who categorized the article with the very general tags could learn something from the existence of the more specific tags. The problem with the power law and the very broad subject categories in a network-theoretical perspective is that they mathematically give an excessive density between the few strong tags, which reduces heterogeneity and thus the opportunity of being surprised by new tags and thus new conceptualizations. Today, the users of del.icio.us are often surprised because the keywords used are so broad, which enables them to find many different "superficial" surprises under a term such as business. This form of surprise is, however, worth less than if the users were surprised more often through a more fine-meshed and varied categorization. It would thus provide higher-quality knowledge sharing if the surprise was based on new knowledge from new sources through very precise and varied subject headings, where the relation between the keyword and the source was deep – what could be termed "deep" surprises. The homogenizing bias of the system, and thus the dynamic of the power law, is thus in many cases counterproductive relative to optimal knowledge sharing through the system, because the broad categories get the power. The point is that optimal knowledge sharing is not brought about by a few broad keywords with many connections and a central position in the network. It is achieved by the existence of many varied connections between many different tags. The more homogenized, the higher the general density and the fewer "true" surprises, since people know the same, refer to the same and use the same few words.
However, the higher the heterogeneity, the lower the general density and the more surprises, because there are more people who know different things and use different keywords. In this way, Granovetter's network-theoretical point about the strength of the weakest link also applies to a network like del.icio.us (Granovetter 1982; Granovetter 1995). With Granovetter, it may be said that the problem is that the weakest links are too invisible in del.icio.us and that there is no dynamic in the system challenging and contrasting the homogenization and thus the strong, broad tags. In the system, lists of the most popular tags take up a lot of space on one's own lists and on the system's lists of all tags used. Here, the supporters of folksonomies would of course raise objections and emphasize that the transparency of the system ensures that all keywords and thus also the weak ones are potentially visible. That is of course correct; however, the problem is that they are relationally too invisible. The mere existence of the power law illustrates this point and shows that the law regarding the strength of the weakest link applies here. The message is thus that the optimization of knowledge sharing depends on different people collecting and sharing different websites with different tags, instead of the same people sharing the same knowledge, the same few strong tags and the same websites. In the latter case, you risk a high degree of redundancy, where you will benefit less from the system. The possibility of sharing new knowledge thus depends on the existence of many people who do not know the same things. The best knowledge sharing thus happens when there are a number of structural "gaps" in the network of unconnected people, tags and websites, where there is a low density and thus a good possibility of surprises and acquisition of knowledge (Burt 1982; Burt 1992; Burt 2005).
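The concentration argument can be made concrete with the tag counts reported above for Friedman's article (198 tags from 75 users). The following is a minimal sketch, not the authors' method: the function names are hypothetical, the log-log slope is a plain least-squares fit that only gives a rough indication on 23 data points, and the normalized entropy is an assumed stand-in for "heterogeneity" that the paper itself does not use.

```python
import math

# Tag counts for Friedman's article as reported in the text (sum = 198).
TAG_COUNTS = {
    "business": 42, "economics": 38, "politics": 23, "social": 18,
    "money": 9, "toread": 9, "philosophy": 9, "government": 7,
    "articles": 5, "management": 4, "legal": 4, "law": 4,
    "enterprise": 3, "society": 3, "future": 3, "corporate": 3,
    "friedman": 2, "article": 2, "strategy": 2, "ethics": 2,
    "nyt": 2, "essays": 2, "responsibility": 2,
}


def top_share(counts, k):
    """Fraction of all tag uses accounted for by the k most frequent tags."""
    ranked = sorted(counts.values(), reverse=True)
    return sum(ranked[:k]) / sum(ranked)


def loglog_slope(counts):
    """Least-squares slope of log(frequency) against log(rank).

    A clearly negative slope is consistent with a power-law-like
    rank-frequency curve; it is not a statistical test of one.
    """
    ranked = sorted(counts.values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(freq) for freq in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def normalized_entropy(counts):
    """Shannon entropy of the tag distribution, scaled to the range 0..1.

    1.0 means every tag is used equally often (maximal heterogeneity);
    values near 0 mean a few tags dominate. Assumed measure, see lead-in.
    """
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(len(counts))


if __name__ == "__main__":
    print("share of the 4 most used tags:", round(top_share(TAG_COUNTS, 4), 3))
    print("log-log slope:", round(loglog_slope(TAG_COUNTS), 3))
    print("normalized entropy:", round(normalized_entropy(TAG_COUNTS), 3))
```

On these counts the four broad tags (business, economics, politics, social) account for 121 of the 198 tag uses, which is the asymmetry the text describes.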
The figure below shows the constancy over time in respect of the keywords connected to Friedman's article. The figure is a time-related overview of the tags that have been used in connection with Friedman's article. All the tagged websites go through a certain lifecycle, where they are added to the system and given a number of different keywords of different frequencies, after which constancy over time is achieved. After the first 100 tags, a pattern begins to emerge, where a certain repertoire of tags becomes constant with a fixed ratio relative to each other. This has been illustrated with Friedman's article; however, it is a general feature of all bookmarks in the system (Golder and Huberman 2005). Here, you can see the constancy over time of the tags used:

In order to uncover possible patterns in the users' tags of Friedman's article, the most frequently used tags from 72 users have been collected and processed through a statistical correspondence analysis. The users were chosen because they had all added Friedman's previously mentioned article to their list of bookmarks. In this sense, the selection of users is random. Once more, the three user groups found earlier can be seen. We see the well-informed citizen perceiving the world in broad societal categories (religion, culture, economics). It is quite interesting that this group categorizes Friedman's article as a question of economics, i.e. the broad societal question of administration and roles in an economic system. In respect of the website containing Friedman's article, we also see a new group, who, as opposed to the "citizen", categorizes Friedman's article as a question of money (money). This is a far more mercantile perspective taking its starting point in the company's need to earn money as opposed to society's general need for administrating resources (economics). This new group of "capitalists" is also more practically oriented and interested in how the problems are solved.
For example, the tag howto is very strongly represented with this group. Curiously enough, this group confirms the prejudice of the busy businessman who never has time to get fully acquainted with things, cf. the fact that the toread tag is pronounced for this group. The fact that we suddenly see a new interpretation community in respect of this article could indicate that the number of communities in the database is not finite, but varies from website to website depending on the context. The statistical analysis thus demonstrates the existence of different interpretation communities and a shared structural design thereof around a content-categorizing tagging strategy. However, the analysis does not give any indication of their number or of what they may potentially consist of.

How can a cognitive perspective explain the power law and the imitation logic?

Generally, folksonomies may be viewed as systems expanding the cognitive process from an individual to a social process through increased transparency, without cognitive costs being increased, because the medium is digital and knowledge may thus be copied with no extra cost. The process in the system is that the cognitive resources are collected, which generates a form of distributed cognition: The shared thoughts/tags are the sum and the distribution of the individual thoughts/tags. The shared thought occurs as a function of the brain's work with the categorization of concepts in relation to the individual websites. The motivation for using the system is thus an objective assessment that the cognitive costs and benefits that would be connected to the learning of the classic taxonomy's hierarchy are less profitable than to invest in the non-committal community of folksonomy around user-generated descriptive metadata. The problem is, however, that the benefit is a total function of the participants' cognitive investment in the system.
Whether or not it will add up thus not only depends on the individual taggers, but also on the total investment of time and knowledge from all users. The problem may be expressed as follows: "On the surface tagging seems to offer a new paradigm of organising information, one that reduces the cost of entry and so enables a long tail of participation to emerge. I've come to realise that the cost isn't removed, instead it's displaced and possibly increased. Tagging bulldozes the cost of classification and piles it onto the price of discovery. ..Tagging makes things harder to find because it vastly increases the places you have to look. In my view the total cost of an information retrieval system is the cost of classification plus the cost of discovery…. In the formal classification world you have a very small number of people incurring a high cost in order to reduce the costs incurred by a very large number of people. In contrast the tagging world has the unit costs reversed: it's cheap to classify, expensive to find. But the numbers of people involved are large in both cases so you end up with a lot of people paying a tiny cost to classify added to a lot of people paying a high price to discover. I think it's pretty likely that the total cost is going to end up much higher than in the classification scenario." (Davis 2005) Cognitively, the brain goes through several different steps, from the discovery of a website to the final tagging in the system. Firstly, the brain uncovers similarities and differences between the website and potential keywords. In this process, different associative relations between the content of the website and the meaning of different keywords are processed. As opposed to the traditional taxonomies, the brain does not have to make drastic rejections of certain categories, because the user may choose several keywords at the same time in folksonomy. 
Consequently, the user may cognitively maintain ambivalent and double memberships outside of the hierarchy, because taxonomy's requirement for unambiguity and fixed positions in the linguistic/cognitive hierarchy lapses. This means the user must spend far fewer cognitive resources, because the website does not have to have a specific keyword relative to a hierarchy where it is necessary to weigh for or against different places in the hierarchy. Of course, this makes the process of adding keywords easier and faster, but it does not necessarily make it unstructured, which is due to the previously mentioned experience structures creating an amorphous framework of understanding. As mentioned several times above, del.icio.us is in principle a flat, non-hierarchical, horizontal universe of keywords. At first glance, you might think that this means that a concept such as subcategories would disappear. However, this is not the case. The system merely places subcategories and supercategories next to each other horizontally; however, for the user, they still exist. Consider, for example, the tags cat, tomcat, Siamese. This is a way of creating subcategories, where the taxonomy is still in the users' consciousness; however, they appear as coordinate categories. In folksonomy, taxonomy and hierarchization in many cases just move into the user's head, because the user in some cases uses a tagging strategy which is a hierarchical specification. The challenge of social tagging is that the cognitive delineations and category constructions come into play for better or worse. In cognitive psychology, a great number of empirical studies have been carried out of how the brain categorizes when the categories, as in folksonomy, are not predefined (Rosch 1976). These studies unambiguously show that people spontaneously use basic names as keywords in unsupervised categorization (Rosch 1976; Jolicoeur and Kosslyn 1984).
Generally, people thus categorize through a number of cognitive basic categories which everyone is able to activate directly. The basic level of categories plays a central role in cognition and in any form of categorization. This may be perceived as an archetypical abstraction level in an implicit shared taxonomy (Rosch 1976). The basic categories may be defined as follows: "Categorizations which humans make of the concrete world are not arbitrary but highly determined. In taxonomies of concrete objects, there is one level of abstraction at which the most basic category cuts are made. Basic categories are those which carry the most information, possess the highest category cue validity, and are thus the most differentiated from one another. Basic categories are shown to be the most inclusive categories for which a concrete image of the category as a whole can be formed, to be the first categories sorted and earliest named by children and to be the categories most codable, most coded, and most necessary in language." (Rosch 1976) (382). The basic-level categories are the large, broad denotative categories that the brain processes quickly, such as cat, dog, car, house etc. These may be perceived as prototypical categories for a group of objects. These basic categories have the highest cue validity, which may be defined as: "The validity of a cue is defined in terms of its total frequency within a category and its proportional frequency in that category relative to contrasting categories. Mathematically, cue validity has been defined as a conditional probability – specifically, the frequency of a cue being associated with the category in question divided by the total frequency of that cue over all relevant categories." (Rosch and Mervis 1975) (575). These categories are very basic and thus require few cognitive resources to activate (Rosch 1976).
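The definition of cue validity quoted above is a simple ratio and can be sketched in a few lines. This is only an illustration: the function name and the counts for the cue "barks" are hypothetical and are not taken from Rosch's data.

```python
def cue_validity(cue_counts, category):
    """Cue validity as a conditional probability, P(category | cue).

    cue_counts maps each relevant category to the number of times the cue
    was observed together with that category. The result is the frequency
    of the cue in `category` divided by its total frequency over all
    relevant categories, as in the quoted definition.
    """
    total = sum(cue_counts.values())
    if total == 0:
        raise ValueError("the cue was never observed in any category")
    return cue_counts.get(category, 0) / total


# Hypothetical observation counts for the cue "barks".
BARKS = {"dog": 48, "cat": 1, "car": 1}

if __name__ == "__main__":
    print(cue_validity(BARKS, "dog"))  # 48 / 50
    print(cue_validity(BARKS, "cat"))  # 1 / 50
```

A cue that occurs almost only within one category has a validity close to 1 for that category, which is what makes the category easy to tell apart from its neighbours.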
By activating these basic categories, you can thus save cognitive resources, because you do not have to consider category conflicts or the deeper meaning of the content. The basic categories are created as a function of a cognitive economizing and task assessment on the basis of the requirements and expectations of the surroundings (Rosch 1976) (383-384). In a cognitive sense, social tagging as in folksonomy thus activates the problem of what to define as a basic category for whom, since different tags of varying degrees of specification may be added to different subjects on the basis of who knows what. The basic categories may thus vary due to differences in knowledge and culture (Tanaka and Taylor 1991) (457-482). Friedman's article can be tagged with everything from economics to neoliberal shareholder theory. For people who know a little about Friedman, the first very general tag is the best way they know how to categorize the article; however, for a professor in corporate social responsibility at London Business School, the last very specific category will be the immediate categorization. The point is that different people have different ways of categorizing the world on the basis of how much they know. And the more you know, the more thoroughly you can categorize. They have different basic levels and immediate basic levels (Tanaka and Taylor 1991) (457-482). In social bookmarking systems, the different categorizations and knowledge levels meet between people and in respect of the dynamic development of different people's ability to categorize. The thing is that a person's own basic categories are not constant over time either; they develop dynamically in step with the acquisition of new knowledge. The professor from London Business School would thus not have categorized Friedman's article as neoliberal shareholder theory while he was still a bachelor's student. Then, he would in all probability have used the tags business or readlater.
But now, after many years of studies, he is able to tag the article more precisely relative to the content. Over time, his basic level has changed. The point is that the benefit of knowledge sharing in connection with the del.icio.us system depends on whether the user's categorization of his or her surrounding world corresponds to the way in which other users in the system categorize the world. The challenge is to find twin thinkers who categorize the world with the same terms and connect the terms with the world in the same way. The ability to find these twin thinkers is one of the system's greatest strengths. For example, as mentioned before, few of the users work with marketing, branding or PR. This means that people who are interested in these subjects and know many terms in this area have relatively less to gain by participating. They simply lack terms for searching. Here, there are not only far too few tagged bookmarks; there are also far too few categories within these subjects. It is simply not interesting to use the system for these subjects, as the categorization is not detailed enough and the user group is too small. On the other hand, the benefit is greater within the areas where the users have far more varied categorizations, and where there are more people with knowledge about the subject. In the del.icio.us database this applies to the IT field. Here, there are far better and more varied conceptualizations and thus more to gain from knowledge sharing. This all points to the fact that the system's possibilities and limitations in respect of knowledge sharing are mainly a consequence of practice, including which people choose to participate in the system with which resources and levels of basic categories. The quality of knowledge sharing thus does not depend on the quantity but on the quality of participants.
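One way to make the idea of "twin thinkers" operational is to compare how much two users' categorizations overlap. The sketch below is an assumption for illustration only: it treats each bookmark as a (website, tag) pair and ranks other users by Jaccard similarity. The user names, the data and the choice of Jaccard similarity are all hypothetical; nothing here describes how del.icio.us itself works.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def twin_thinkers(bookmarks, user, top_n=3):
    """Rank the other users by overlap of their (website, tag) pairs.

    bookmarks maps a user name to a set of (website, tag) pairs. Two users
    score highly only if they attach the same terms to the same websites,
    i.e. if they "connect the terms with the world in the same way".
    """
    mine = bookmarks[user]
    scored = [
        (other, jaccard(mine, pairs))
        for other, pairs in bookmarks.items()
        if other != user
    ]
    # Highest similarity first; ties are broken alphabetically by name.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]


# Hypothetical users and bookmarks.
BOOKMARKS = {
    "anna": {("friedman-nyt", "business"), ("friedman-nyt", "economics"),
             ("friedman-nyt", "csr")},
    "ben": {("friedman-nyt", "business"), ("friedman-nyt", "economics")},
    "carl": {("friedman-nyt", "toread")},
}

if __name__ == "__main__":
    for name, score in twin_thinkers(BOOKMARKS, "anna"):
        print(name, round(score, 2))
```

In this toy data, ben shares two of anna's three pairs, while carl, who only used toread, shares none, although all three bookmarked the same article.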
As explained above, the system is characterized by a power law created by an imitation logic, where the majority of users often use a few very broad, widely known keywords, which yields a large amount of broad and imprecise information rather than a small amount of precise information. This preference for quantity over quality can probably be explained by different cognitive factors. Firstly, the broad basic categories are intuitive and part of the immediate repertoire in the cognitive consciousness. Furthermore, the users are without doubt driven by a cognitive simplification principle in their categorization (Chater 1999; Chater and Pothos 2002). By choosing very broad categories, users save cognitive resources and can just add the first impulsive basic categories directly without having to consider it much. In this way, users do not have to reflect on the depth, hierarchy or the keyword's family or associative relations. The cognitive explanation of imitation logic thus forms a contrast to the other explanations. In all other explanations, imitation logic is a consequence of dedication, investment and attitude, either in the spirit of democracy or according to the market rationale. However, in the cognitive explanation, the power law occurs in the unsupervised categorization as a consequence of the need for simplification and complexity reduction (Chater 1999; Chater and Pothos 2002; Chater and Hurley 2005). This point has also been made by the linguist Zipf in his comprehensive statistical analysis of the use of words in the English language (Zipf 1949). In this analysis, published shortly after the Second World War, he found that the distribution of words in the English language follows a power law. He explained the existence of this power law with the concept of least effort. What he meant by this was that the speaker endeavored to save cognitive resources by exerting the least amount of effort, while the listener wanted to get the most out of it and required more from the speaker.
The relationship between the speaker and the listener is thus a struggle between the attempt to communicate with the least effort and to understand with the greatest benefit. Zipf referred to this trade-off as the least effort, and it is this least effort that is significant in folksonomy. Zipf's power law is thus also driven by a cognitive economizing and thus supports Chater's central premise, as described by Vogt: "Humans undoubtedly try to minimise the cognitive effort to categorise sensorimotor events. It is therefore plausible that the tendency to use generalised categories is a bias that – in addition to other biases, such as reducing articulatory effort (Mandelbrot, 1953; Zipf, 1949) and other frequency-related approaches (Gunther et al., 1996; Tullo and Hurford, 2003) – yields the emergence of the Zipf-Mandelbrot law." (Vogt 2000) For both Zipf and Chater, simplification is thus a basic dynamic in the cognitive consciousness when objects are to be categorized and classified (Chater 1999). The idea of the economizing of thought may be traced back to Ockham's razor in scholasticism and the contemplations of the empiric scientist Mach on the explanative power of scientific theories; however, it has later been brought up again in the cognitive psychology. An exponent of this is the cognitivist Nick Chater, who views the hunt for simplicity as a fundamental cognitive principle: "The cognitive system imposes patterns on the world according to a simplicity principle: Choose the pattern that provides the briefest representation of the available information. The simplicity principle is normatively justified – patterns that support simple representations provide good explanations and predictions on the basis of which the agent can make decisions and actions. Moreover, the simplicity principle appears to be consistent with empirical data from many psychological domains, including perception, similarity, learning, memory, and reasoning. 
Thus, the simplicity principle promises to serve as the starting point for the rational analysis of a wide range of cognitive processes." (Chater 1999) (273). The general principle of simplification and reduction of cognitive complexity has subsequently been demonstrated empirically by Chater through a number of tests with unsupervised categorization, an experimental design which is essentially identical to tagging as a cognitive process in folksonomies. In these experiments with unsupervised categorization, the test persons automatically choose the simplest categories. As expressed by Pothos & Chater in the article "A simplicity principle in unsupervised human categorization": "There are situations where people would spontaneously recognize that a set of objects can be organized in different groups, with no information either about the number of groups sought or the distributional properties of the objects (in other words, there are no prior expectations for the objects to be categorized). We have argued that this process of unconstrained classification is a process like perceptual organization. Utilizing the simplicity principle from perceptual organization …. We proposed and empirically tested a simplicity model for unsupervised categorization." (Chater and Pothos 2002) (333). The Internet is full of millions of websites and the users have a limited amount of time to search, for which reason they try to save cognitive resources, because they are driven by the need for cognitive economizing. Therefore they choose the shortest and fastest entry of descriptive metadata. In the words of Friedrich Nietzsche: "To predict the behavior of ordinary people in advance, you only have to assume that they will always try to escape a disagreeable situation with the smallest possible expenditure of intelligence." (Nietzsche 1980) (330).
It will be the basic-level categories that are the preferred descriptive metadata, because they are the fastest and most profitable solution to a disagreeable situation with the smallest possible expenditure of intelligence. This is due to the fact that the basic categories are the most coherent categories in our shared world picture which is determined by culture, as described by Pothos & Chater in their article on the relation between the basic-level categories and unsupervised categorization. "A possible relation can be established between basic-level categorization and unsupervised classification. Rosch and Mervis (1975) noted that out of the hierarchy of more or less general categories into which we may place an item (as being a Scottish highland terrier, a terrier, a dog, an animal, a living thing, and so on), there appears to be a privileged 'basic' level. This seems to be the default level for identifying new objects with linguistic labels (seeing Fido, the default is that we say or think "A dog!" rather than "An animal!" or "A terrier!"). The notion of basic-level categories has been supported from a range of converging sources of evidence. For instance, basic-level categories lead to rapid picture naming in comparison with subordinate or superordinate categories, and there is less between-participant variation concerning what attributes objects have, if they belong to basic categories (Rosch, Mervis, Gray, Johnson, & Boyles-Braem, 1976). Similarly, Mervis and Crisafi (1982) showed that basic categories are privileged in naming and other category-related behavior of children (see also Horton & Markman, 1980). Basic-level categorization can relate to unsupervised classification if basic-level categories are viewed as especially coherent." (Chater and Pothos 2002) (309). 
The cognitive basic categories are activated because they reduce the cognitive complexity consisting of the following: "According to the simplicity principle, the best explanation of a set of data (e.g., patterns of sensory input, or a set of objects) corresponds to the shortest description that encodes that data. Similarity- and theory-based categorizations can then be viewed as complementary ways of building short descriptions of available data. For example, the category of birds may be justified on the basis of similarity, to the extent that birds are highly similar, so that it makes sense to encode their common properties in a single category, rather than separately for each specific bird. But equally, the category of birds may be justified by the commonsense 'theoretical' generalizations defined over birds, concerning their biology, behavior and so on. One classification is preferred to another, if it provides a shorter overall description of the relevant data." (Chater and Pothos 2002) (335). According to these theorists, the least effort is thus probably the dynamic behind imitation and reproduction in the system. The most popular tag is the shortest possible tag. These are typically short, often monosyllabic words which can be written in a few seconds, such as blog, web, read, ajax etc. The most important quality of all tags is that they are easy to write and are "the shortest overall description of the relevant data". However, the cognitive perspective is not unproblematic and should be qualified somewhat. The empirical survey of the del.icio.us users shows that simplification cannot stand alone as the only dynamic in the users' categorization. The users choose not only the shortest possible tag; they also categorize in relation to certain interpretation communities and tagging strategies. Complexity reduction must thus be regarded as a dynamic which is contextually activated according to interpretation communities and pictures of the world.
The point is probably that the users are "lazy". However, they are lazy in different ways in respect of different subjects according to who they are and what they know. The ways in which the users' "laziness" differs are in all probability a function of the three interpretation communities and tagging strategies which was the result of the statistical analysis herein. Folksonomy is thus used as a tool for becoming inspired and surprised within the framework of some very broad categories. This is due to the fact that the central search imperative and the most optimal use of the system are generally the hypothesis-generating intention and not a narrow and targeted search. The users thus want as much as possible in order to be as inspired as possible in the most possible ways and as cheaply as possible. Thus, the power law supports and favors a certain explorative use of the system. The explorative benefit of using the system is partly a consequence of the use of the very broad categories. If you search on the tag business, you may also find everything under the sun. This may be expedient as long as you are exploring; however, it is less expedient if you are searching for subjects that may be categorized more precisely than business. The complexity reduction and cognitive economizing are also due to the fact that many of the users do not read on the Internet, they only skim texts. You may presume that many people in practice add keywords or copy other people's keywords, without necessarily having read the texts or links they are tagging. They simplify because they cannot make it more complex. In this way, the power law is reinforced by the widespread cognitive simplification and eternal hunt for the shortest possible entry of metadata. Few users invest the amount of time in the system that is an important prerequisite for having something valuable to share. 
This suspicion is partly backed up by the frequency with which keywords such as readlater, toread, checkoutlater occur. The system also contains many lists to which the users do not add keywords at all. Here, folksonomy is thus simply used as a collection of links. We thus probably see a decoupling of parts of the system, where only a few parts are used, while others are ignored. Many links and bookmarks thus circulate unread in the system, having merely been skimmed and given, at best, imprecise and general and, at worst, incorrect keywords. This situation is probably the result of many users' implicit maximizing strategy for knowledge collection, where they rarely reject information out of fear of not being able to retrieve it. On the basis of the notion of "better safe than sorry", large amounts of information are piled up – information that has not been read but only skimmed and given a number of superficial, very simplified keywords taken from the brain's immediate store of denotative basic categories. As it is very simple to save links, and the cognitive costs of having to retrieve something in the chaos of the Internet are high, the majority of people choose to fill the system with texts that they have quickly skimmed and given general and thus superficial keywords, which do not contribute anything towards actual organization. The fear of not being able to retrieve a website again is greater than the willingness to spend time and effort on a thorough read-through and tagging. Consequently, the value of the system decreases, because the users choose to simplify complexity considerably and spend too little time on tagging. The quality and the cognitive benefit are thus reduced as a result of the cognitive investments in the system being too modest. The power law, the imitation logic, the broad categories and the constancy of certain tags over time may thus be explained as a more or less conscious wish to save cognitive resources.
When everyone saves cognitive resources, it is obvious that the end result will be impaired. This may be regarded as the weakness of the system, while its supporters on the other hand indirectly consider it a strength. In the cognitive sense, the system still has some strengths. Compared to related methods, folksonomy provides a better reflection of the users' cognitive conceptual models, practice and language, for better or worse. Despite the fact that folksonomy in many cases probably reflects the users' cognitive simplification, complexity reduction is to be preferred, as it is closer to the users' reality and the living languages. In folksonomy, users create the keywords themselves, for which reason there is no distance between the users' conceptualization and the system's concepts, since they are identical. Compared to the classic taxonomies, folksonomy thus better reflects the users' own language and their world. They are iterative systems which dynamically reflect the dynamic development of language through quick updates. Folksonomies may not be as precise, but the metadata have been created on the basis of the users' own horizon and are thus much more valuable than machine-generated metadata. However, the idealistic supporters of the system cannot deny the lack of precision in the system. As mentioned before, the individual websites are often described with an insufficient number of tags and with very general keywords, because amateurs do not necessarily understand or have experienced the need for a stringent and precise hierarchization of the information. In this sense, folksonomies often do not yield better search results than using the same keywords in search engines such as Google, because the majority of the keywords used are very general and at a very high abstraction level such as marketing, PR or business.
While the supporters see a digital expression of the wisdom of crowds, the critics see wisdom dependent on who is tagging and how the majority tags. The fewer tags, and the more botched and broad keywords, the more imprecise the tagging will be as a joint expression relative to the ideal. Information quality is thus not directly proportional to the number of participants, and the system is in its essence difficult to scale. The risk is thus that there is a direct proportionality between the number of participants and the volume of infotrash (Doctorow 2005), because everyone simplifies, skims and thus adds general and superficial keywords, which means that everything will end up a chaos of meaning due to indolence. The consequence of the fact that the shared tags have poor relevance and value is that most users choose alternative methods of assessing and selecting which information they want to reach, and here the populist functions in the system are preferred as a method of assessing the content of the database, because the metadata are so devoid of meaning and of such a poor quality. The search result will thus be what the majority prefers, and thus a result which is not very different from or better than that created by the classic search algorithms. In the face of such criticism, the supporters argue that answering concrete questions is not the purpose and object of folksonomy. Rather the unique quality of folksonomies is that they can be used to explore a knowledge area and lead the users to unforeseen and unacknowledged knowledge resources. Through the system, you surf other people's networks of associations and contextualize your own associations. The system is thus excellent at assisting in the formulation of questions; however, it is not good for finding answers to unambiguous and clearly formulated questions. It is an excellent tool for explorative journeys and exploration of problems. 
As Mathes puts it: "There is a fundamental difference in the activities of browsing to find interesting content, as opposed to direct searching to find relevant documents in a query. It is similar to the difference between exploring a problem space to formulate questions, as opposed to actually looking for answers to specifically formulated questions." (Mathes 2004). Ultimately, the purpose of folksonomies is not to find specific answers, but to be inspired by others. If the user is not able to find a certain thing, he or she will find something else. The lack of precision challenges the way in which the users would clarify the point and formulate their problems themselves. Folksonomy thus gives the user a unique chance to find what they are not looking for or what they did not know that they were looking for. Thus, the system provides a possibility of dynamic learning which is not offered by other organization principles. The user-generated metadata create room for learning and learning about learning by making the relationship between categories and objects visible and reflecting it. Knowledge sharing through the system is, in this perspective, described as the result of cognitive categorization and associations. If you are better able to understand the cognitive processes, you will understand the conditions and thus the possibilities of better knowledge sharing, both as an individual and as a social process. In this perspective, the problem regarding the quality and precision of metadata is shifted through differentiation of knowledge concepts and knowledge collection processes. Now, it is not a question of sharing knowledge, but of sharing a certain hypothesis-generating knowledge, for which reason the lack of precision and quality is not a problem, but a necessary prerequisite for the surprise. The system must thus be viewed as a tool for exploring and wondering, not for finding and understanding.
Folksonomy as a market-economic system

At a macro level, folksonomy is regarded as a market for information, where a number of economic regularities for the ideal-typical market apply. There is a dynamic and transparent relationship between supply and demand on a market. The market logic is a function and total sum of decentralized decisions made by a few players. Folksonomy is a total procedural expression of the collected practice of spontaneous collaborative categorization in a large population through the independently selected keywords. The collective and the individual are connected as in the classic neoliberal theory, where the interests of the individual (i.e. the creation of own metadata) are an indirect benefit for the shared pool of metadata, which is a direct function of the total sum of tags. The users' use of other people's tags and their own tags may be perceived as an expression of them "buying" a certain meaning. Just as the individual is free to create his or her own tags, the market logic determines which keywords are central on the basis of supply and demand. In a market, there are often some products that are more popular than others, just as some information is more popular than other information in folksonomies such as del.icio.us. A market-economic system thus has a number of similarities with del.icio.us, even though there are also differences. One of the differences is that in a digital database, there are no limitations or increasing costs incidental to the copying of information, while there are often increasing costs incidental to the sale or purchase of more products in a traditional market for physical products. In a market, there is often a cumulative tendency that a few products are very popular and account for a large share of sales. The market thus often hands the power to a few strong products that are in great demand.
In del.icio.us, this is reflected in the fact that the keywords that are used often are, everything else being equal, strengthened over time, giving them more power because they are copied often and thus are more visible on the individual lists and under the most popular keywords. Consequently, there is a tendency of homogenization in the system, making the few, strong keywords have a great influence. The point is that the users act on the basis of the equation: great knowledge sharing benefit = a large and surprising volume of information from many users if you use very popular keywords. The broader the keywords, the more contextualization and transparency, even though the result becomes more imprecise. This may be illustrated as follows: If we, for example, look at the very popular tag branding, we will potentially get the opportunity to see far more references, more people will refer to the term, more people will see the tags, and we will get a lot more information than if we used a narrower, more precise, but less popular tag such as corporate_branding. If we used this tag, we would have more precise but fewer links to this keyword. The majority of users clearly reject the few, but more specific keywords in favor of having more and more imprecise keywords. This is underscored by the information architecture in del.icio.us, where all tags are quantified and given a number. As in any other market, we thus see that the most popular is made visible, and what has already been purchased is offered directly to the new shoppers. In this way, the power law is made visible, which reasserts its power. In a market perspective, del.icio.us may be perceived as a self-regulating market, which, due to the players' self-interest, generates the largest number of hypothesis-generating surprises for the total sum of players. The knowledge-sharing perspective here is that the market is the best dynamic for sharing knowledge.
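The self-reinforcing dynamic described here, where tags that are already popular are more visible and therefore copied more often, can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the statistical analysis used in this article: it assumes a Simon-style copying process in which each tagging event either introduces a new tag with a fixed probability or copies a tag in proportion to how often it has already been used. The function name simulate_tag_imitation, the parameters n_events and p_new, and the tag labels are hypothetical.

```python
import random
from collections import Counter

def simulate_tag_imitation(n_events=20000, p_new=0.1, seed=0):
    """Simon-style copying process, a hypothetical stand-in for tag imitation."""
    rng = random.Random(seed)
    history = ["tag_0"]   # one entry per tag use, in chronological order
    n_tags = 1
    for _ in range(n_events - 1):
        if rng.random() < p_new:
            # Occasionally a user invents a new, so far unused tag.
            history.append(f"tag_{n_tags}")
            n_tags += 1
        else:
            # Copying a uniformly chosen past use selects each tag in
            # proportion to its current popularity.
            history.append(rng.choice(history))
    return Counter(history)

counts = simulate_tag_imitation()
ranked = counts.most_common()   # tags sorted from most to least used
top_10_share = sum(c for _, c in ranked[:10]) / sum(counts.values())
singletons = sum(1 for c in counts.values() if c == 1)
print(f"{len(counts)} distinct tags, top 10 cover {top_10_share:.0%} of all uses, "
      f"{singletons} tags were used only once")
```

Under these assumptions, a handful of early tags ends up carrying most of the uses while a long tail of tags is used once or twice. The sketch says nothing about why users copy; it only shows that popularity-proportional copying alone is enough to produce the asymmetry.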
The market is thus a way to achieve the best possible knowledge sharing. The premise is that homogenized knowledge with a few frequently used and broad tags generates better knowledge and better knowledge sharing than if knowledge were not shared on market terms and were less homogenized. Just as the individual is free to create his or her own tags, the market logic determines which keywords are central on the basis of supply and demand. The imitation and the constancy of the tags over time may thus be explained by the fact that people buy a certain meaning, which leads others to buy it too. This perspective has played an important part in the folksonomy debate, either as an unacknowledged ideological point, a descriptive establishment of a fact, an active ideological imperative or as a mixture of the descriptive and the imperative. Unacknowledged as an implicit and unreflective consequence, descriptive as an establishment of fact or actively ideological as an imperative and opposed to taxonomy. Folksonomy is presented as being better and more democratic than the centralized and undemocratic taxonomy. Here, democratic should be interpreted as liberalism and the power law of the market. In a few words, this position may be characterized as the market of meaning is the meaning. Some supporters view this liberalism as the underlying positive impetus in folksonomy. The market is democratic and dynamic, as expressed by Shirky: "The other essential value of market logic is that individual differences don't have to be homogenized. Look for the word 'queer' in almost any top-level categorization. You will not find it, even though, as an organizing principle for a large group of people, that word matters enormously. 
Users don't get to participate in those kinds of discussions around traditional categorization schemes, but with tagging, anyone is free to use the words he or she thinks are appropriate, without having to agree with anyone else about how something "should" be tagged. Market logic allows many distinct points of view to co-exist, because it allows individuals to preserve their point of view, even in the face of general disagreement." (Shirky 2005) The argument is that the solution to the problems of the old world's dusty libraries and their inadequacy of course lies in bringing the market into play. The argument is as always that this is essentially the intentional will of the people because taxonomies do not work for the people, and hence its postulated contrast works better. With the market, greater spread and dynamic are created than with the static taxonomies. Here, the introduction of the market is exactly what provides the required structure. Technology has spoken and determined a future, which is the future of the market. The market must be realized since it is already real as a social act. The people have voted with their computers, and they have voted for folksonomy. Folksonomies are the only alternative, since there are no other alternatives, and when the people have already started to reject the classic taxonomies in practice, the theoretical models and the professional practice must follow reality and the productive powers. In other words, the ideological objective is to create a market for meaning, because the market is the meaning. The means is to release the creation of meaning. Where some perceive the market as the solution, others see it as the problem in itself. Those people have criticized folksonomy for not being a release of meaning; rather they perceive it as an implicit oppression of meaning and use of symbolic power in its marketization of meaning. In this perspective, oppression of meaning is the drawback of the system.
The idealistic supporters of the technology see a liberating market for meaning and forget the social gravitation and the fact that creation of meaning always happens against someone's interests in the interest of others (Bowker and Leigh Star 2000; Boyd 2005). Thus, in del.icio.us, you always see the most recent entries first, regardless of which keyword you are following. In this way, the system is in its essence centered on news value and in this sense always reflects the classic news criteria and how the media always choose the news and thus also always reject that which is no longer news. In this way, the system's organizing impetus becomes news and not lasting relevance. It also means that these systems tend towards favoring contagious information. Contagious information becomes clearly visible in the system, and often you will see that the same links appear several times on the same list, because new participants are infected with these links and spread the information virus in the social system. In this sense, the system may also be criticized for being meme-centric (the people who are very contagious become stronger) and having a viral bias (what is reproduced often gets more power). The problem is that the potential value of the distributed cognition in folksonomy is reduced, because the system is affected by and favors information cascades (Bikchandani, Hirshleifer et al. 1992; Bikhchandani and Welch 1998). They are follower processes where the users copy and imitate other people's tags without considering how to tag in relation to a given website. This is also a consequence of the cognitive economizing and is reinforced by the fact that the system automatically lets the users see other people's tags and their own previous tags. In both cases, the user is lured into not using his or her own current and private information and instead only copying from others or from a previous cognitive investment from another context.
This means that the cognitive load is reduced far too much: "Each of these services boasts features geared towards simplifying the process of organising a variety of media, in addition to mechanisms facilitating the future retrieval of such items. Of the aforementioned services, del.icio.us is arguably the most developed and possibly the most collaborative. For example, it combines information gathered from unique identifiers (i.e. the URL) with information gathered about the most popular tags used for that URL. This allows del.icio.us to suggest possible tags when users are bookmarking new resources or to provide users with a list of 'common tags' (i.e. popular tags that are assigned to the same resource by multiple users). These common tags can then be used in a subsequent user search strategy." (Macgregor and McCulloch 2006) (4). The result is an accelerating sequence of uninformed choices. James Surowiecki formulates the problem of information cascades, which clearly characterize social systems such as folksonomies: "The fundamental problem with an information cascade is that after a certain point it becomes rational for people to stop paying attention to their own knowledge – their private information – and start looking instead at the actions of others and imitate them... But once each individual stops relying on his own knowledge, the cascade stops becoming informative. Everyone thinks that people are making decisions based on what they know, when in fact people are making decisions based on what they think the people who came before them knew. Instead of aggregating all the information individuals have, the way a market or a voting system does, the cascade becomes a sequence of uninformed choices, so collectively the group ends up making a bad decision." (Surowiecki 2005) (54).
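The cascade mechanism in the quotation can be made concrete with a toy simulation. This is a sketch under simplifying assumptions and not the formal model of Bikhchandani et al.: users tag one page in sequence, each user receives a private hint that points to the more fitting tag with probability signal_accuracy, and a user ignores that hint as soon as one tag leads by two or more among the earlier users. Which of the two tags fits better is assumed purely for illustration, and run_cascade and its parameters are hypothetical names.

```python
import random

BETTER_TAG = "corporate_branding"   # assumed here to be the more fitting tag
BROAD_TAG = "branding"

def run_cascade(n_users=50, signal_accuracy=0.6, seed=0):
    """Return the tags chosen by n_users who tag the same page one after another."""
    rng = random.Random(seed)
    choices = []
    for _ in range(n_users):
        # Private information: a noisy hint about which tag fits the page.
        signal = BETTER_TAG if rng.random() < signal_accuracy else BROAD_TAG
        # Public information: how far one tag leads among earlier users.
        lead = choices.count(BETTER_TAG) - choices.count(BROAD_TAG)
        if lead >= 2:
            choices.append(BETTER_TAG)   # imitate and ignore the private hint
        elif lead <= -2:
            choices.append(BROAD_TAG)    # imitate and ignore the private hint
        else:
            choices.append(signal)       # no clear lead yet: use own judgment
    return choices

runs = [run_cascade(seed=s) for s in range(2000)]
wrong_share = sum(r[-1] == BROAD_TAG for r in runs) / len(runs)
print(f"share of runs locked into the less fitting tag: {wrong_share:.2f}")
```

With this counting rule the lead behaves like a random walk that stops at plus or minus two, so the chance of locking into the less fitting tag is (1 - q)^2 / (q^2 + (1 - q)^2), where q is the signal accuracy, which is roughly 0.31 for q = 0.6. The design choice of a two-vote threshold is an assumption; the point is only that a few early choices can fix the outcome even though most private hints point the other way.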
Conclusion:

At the start of this article, we raised the question of whether folksonomies are a better, cheaper and more valid method of creating descriptive metadata than taxonomies and faceted navigation. The answer is no. For different reasons, folksonomy is not a better, cheaper or more realistic method of creating metadata. In that sense, folksonomy does not represent an epochal paradigmatic and substituting shift that will replace other categorization principles. Such idealism is more of an ideological position, false essence thinking about folksonomy as a method and a media-essential idealization of the Internet at the expense of other media than a scientific reflection. Folksonomy should not substitute taxonomy and faceted navigation. Folksonomy is a complementary knowledge sharing tool which potentially has the qualities that search algorithms, the classic taxonomies and polyhierarchical principles lack. The traditional methods are efficient tools for finding what you are looking for, while folksonomies are efficient tools for looking for what to look for. Folksonomy is thus an excellent tool for reframing, exploring, hypothesis-generating and contextualizing information on the basis of the user-created metadata's pluralism; however, it is no good if you want to find clearly defined information quickly and reliably. Only by combining the three different methods is the complementarity exploited, making the search explorative, pluralistic and teleological. The reason why folksonomy has this complementary quality may be explained by two factors. Firstly, tagging in practice: The users often use very broad keywords to cover many different websites, for which reason there is great scope for surprises. Secondly, the dynamic of the system: Pluralism occurs as a consequence of the fact that the system makes it possible for a number of market mechanisms and network structure dynamics to emerge.
The metadata created in folksonomy are not better, since the pluralism is often the wrong pluralism. The surprise occurs because the users spend too little time and too few resources adding the tags. Pluralism is thus a function of the fact that very broad keywords are used for too many websites too imprecisely. The essential ability of folksonomy to surprise and create the right pluralism is thus a potential and a possibility condition that may be utilized to a lesser or greater degree. Firstly, the possibilities for this pluralism are conditional upon tagging as practice. When attempting to understand folksonomy and its potential, you must necessarily stop the mistaken essence thinking, where you have looked at tagging in an ideal-typical way and not as a practice that could be better or worse. Because the better and more precisely you tag, the greater the potential. Here, better must be understood as a more precise and qualified relation between keyword and website. In practice, the use of keywords in the system is too basic and the categories are much too wide and often do not go beyond the name or the heading of a website. The poor and unqualified relation is a consequence of the fact that many users tag too superficially and invest too few cognitive resources in choosing the tag and checking the content behind the bookmark. The result will be that the users of the system use the broad basic categories with the resulting simplification, because they do not take the time to read or choose the most suitable keywords. The consequence is that there is a large pool of very superficial keywords and websites, which have, at best, been skimmed or, at worst, have not even been read, only copied. The proof of this is first of all that tagging as a practice has many common features with experiments with free categorization, and secondly the results of the collected figures from the del.icio.us database. 
The underlying dynamic in the system creating the imitations is thus without doubt essentially the cognitive simplification. The descriptive metadata created in folksonomy are thus potentially poorer, because the imitation does not happen as a result of a major investment of resources, but as a sad consequence of the lack of time and will in the process. The power law, where a few strong tags become stronger, cannot in itself be criticized. This is a law of nature for complex systems. However, what can be criticized is the context of cognitive simplification, which means that the very general tags win. A power law driven by simplification is untenable. Polemically it could be said that the problem is that folksonomy is an expression of the meaningful cloud of tags with simplification. This is why folksonomy can and will be a fruitful foundation for the creation of more descriptive metadata in its present form. Folksonomy in its present form is thus not the magic entry to a better Internet based on the users' reception and not the senders' intention, the so-called web 2.0. The realization of the true pluralistic potential is, however, conditional upon a number of properties in the people and their mutual relations in the system. In folksonomy, the associative semiosis among large groups of people is set free, which creates a distributed cognition, where the total sum of cognitive resources is only large because of the amount of people participating. However, not all tags are interesting, and some people have more in common than others. The result is thus not only dependent on how people tag; it is also dependent on who participates in the system in practice with which interests and qualifications. In this article, three distinct interpretation communities were found through the statistical analysis of the tags from 500 users.
The users' laziness is thus also a function of these interpretation communities and is only structured on the basis of these three cognition interests. The benefit to be gained from the system therefore very much depends on whether your motivation and tagging strategy are identical to these three interpretation communities. If not, as in the case with the public relations field, the benefit to be gained from the system and from sharing bookmarks is quite limited. Either you perceive the world as a curious citizen, a graphic IT designer or a database expert, or you have very little to win by using the system. For this reason, it is important to explode the false myth about the wisdom of crowds and the magic belief that the mere volume and the system's self-regulating and emerging mechanism can create better metadata. The thing is that the constancy over time of the tags does not necessarily reflect a consensus of reflective meaning, but is a consensus to invest the fewest possible cognitive resources. The rest is a romantic and illusory idea about the emerging intelligence, which in no way can be confirmed by this study. As described above, the problem is that the potential value of the distributed cognition in folksonomy is reduced, because the system is affected by and favors information cascades (Bikchandani, Hirshleifer et al. 1992; Bikhchandani and Welch 1998; Surowiecki 2005). To this should be added that folksonomy is still quite a new phenomenon, and many users have never before had the fascinating experience of transparency and flow, where you can easily "surf" other people's tags and see what they are thinking. Amidst all the enthusiasm for the new experience, the users are unable to assess the quality of this experience, because it is new and thus incommensurable to what they know. The point is that the quality and the experience could be better if the users in practice became better taggers and invested more time and effort. 
The true pluralism and quality are, however, not only a function of who is participating in folksonomy. The quality of the tags also depends on which network relations exist between the participants in the system. The more heterogeneity and the lower the density in the network, the greater is the possibility of being surprised in the true, "deep" way. The point is that the system works best if the other users know something that you do not, but tag it under the same keywords that you use. This requires that there are knowledge gaps in networks which may be filled by the distributed cognition of folksonomy. Again, the value of the system very much depends on external factors such as who is participating in the system and whether someone in the system possesses knowledge that you do not, but expresses it through the same keywords. Many researchers have perceived folksonomy as a cheaper and more realistic way to create descriptive metadata. In short, it is not a very good route, because the users, as mentioned above, take a shortcut. As in other systems, the "benefit" to be gained from a folksonomy is a function of the "investment". In this case, the total pool and quality of distributed cognition are reduced, because the investment made by the individual participants is insufficient. If folksonomies are to create "better" descriptive metadata, it requires that the users are taught how to become better taggers, just as people all over the world are taught how to write in order to become part of the pluralistic and democratic writing culture. In conclusion, we need a higher degree of "tag literacy", if the system is to realize its fullest potential. Otherwise, the future potential of the much too meaningful meaning of the least effort will be wasted.

Bibliography:

Barabasi, A.-L. (2000). "Linked: How Everything Is Connected to Everything Else and What It Means."
Barabasi, A.-L. and R. Albert (1999). "Emergence of Scaling in Random Networks."
Science 286 (October 15).
Barabasi, A.-L. and R. Albert (2002). "Statistical mechanics of complex networks." Reviews of Modern Physics 74 (January): pp. 47-74.
Bikchandani, S., D. Hirshleifer, et al. (1992). "A Theory of Fads, Fashion and Cultural Change as Informational Cascades." Journal of Political Economy 6
Bikhchandani, S., D. Hirshleifer and I. Welch (1998). "Learning from the Behavior of Others: Conformity, Fads and Informational Cascades." Journal of Economic Perspectives 12(3): pp. 151-170.
Bowker, G. C. and S. Leigh Star (2000). "Sorting Things Out: Classification and Its Consequences."
Boyd, D. (2005). "Questions of classification."
Burt, R. S. (1982). Toward a structural theory of action: network models of social structure, perception, and action. New York, Academic Press.
Burt, R. S. (1992). Structural holes: the social structure of competition. Cambridge, Mass. London, Harvard University Press.
Burt, R. S. (2005). Brokerage and closure: an introduction to social capital. New York, Oxford University Press.
Chater, N. (1999). "The Search for Simplicity: A Fundamental Cognitive Principle?" The Quarterly Journal of Experimental Psychology 52A(2): pp. 273-302.
Chater, N. and S. Hurley (2005). "Perspectives on Imitation - From Neuroscience to Social Science."
Chater, N. and E. M. Pothos (2002). "A simplicity principle in unsupervised human categorization." Cognitive Science 26: pp. 303-343.
Davis, I. (2005). "Why Tagging is Expensive."
Doctorow, C. (2005). "Metacrap: Putting the torch to seven straw-men of the meta-utopia."
Frank, R. H. and P. J. Cook (1995). The winner-take-all society: how more and more Americans compete for ever fewer and bigger prizes, encouraging economic waste, income inequality, and an impoverished cultural life. New York, The Free Press.
Golder, S. A. and B. A. Huberman (2005). "The Structure of Collaborative Tagging Systems."
Gorman, M. (2004). "The Corruption of Cataloging."
Granovetter, M. (1982). "The strength of weak ties: A network theory revisited." In P. V. Marsden & N. Lin (Eds.), Social structure and network analysis: pp. 105-130.
Granovetter, M. (1995). Getting a job: a study of contacts and careers. Chicago, University of Chicago Press.
Greenacre, M. (1993). "Correspondence Analysis in Practice."
Hill, M. O. (1974). "Correspondence Analysis: A Neglected Multivariate Method." Applied Statistics 23(3).
Jacob, E. K. (2004). "Classification and categorization: a difference that makes a difference." Library Trends.
Jolicoeur, P., M. A. Gluck and S. M. Kosslyn (1984). "Pictures and Names: Making the Connection." Cognitive Psychology 16: pp. 243-275.
Keller, E. F. (2005). "Revisiting "scale-free" networks." BioEssays 27: pp. 1060–1068.
Kwasnik, B. H. (1999). "The Role of Classification in Knowledge Representation and Discovery." Library Trends 48(1): pp. 22-47.
Macgregor, G. and E. McCulloch (2006). Collaborative Tagging as a Knowledge Organisation and Resource Discovery Tool, Library Review. 55.
Mathes, A. (2004). Folksonomies - Cooperative Classification and Communication Through Shared Metadata.
Merholz, P. (2004). "Metadata for the Masses."
Nietzsche, F. (1980). "Sämtliche Werke: Kritische Studienausgabe, vol. 2, Human, All-Too-Human, “Man Alone With Himself,” aphorism 551, “The Prophet’s Trick,” (1878)." 330.
Quintarelli, E. (2005). "Folksonomies: power to the people."
Ranganathan, S. R. (1962). "Elements of Library Classification."
Ranganathan, S. R. (1964). "Colon classification." Ranganathan series in library science.
Rosch, E. and C. B. Mervis (1975). "Family Resemblances: Studies in the Internal Structure of Categories." Cognitive Psychology 7: pp. 573-605.
Rosch, E., et al. (1976). "Basic Objects in Natural Categories." Cognitive Psychology 8: pp. 382-439.
Shen, K. and L. Wu (2005). "Folksonomy as a Complex Network."
Shirky, C. (2004). "Folksonomy."
Shirky, C. (2005). "Ontology is Overrated: Categories, Links, and Tags."
Smith, G. (2004). "Atomiq: Folksonomy: social classification."
Sturtz, D. N. (2004). "Communal Categorization: The Folksonomy." INFO622: Content Representation.
Sunstein, C. R. (2006). Infotopia: how many minds produce knowledge. Oxford, Oxford University Press.
Surowiecki, J. (2005). The wisdom of crowds: why the many are smarter than the few. London, Abacus.
Tanaka, J. and M. Taylor (1991). "Object Categories and Expertise: Is the Basic Level in the Eye of the Beholder?" Cognitive Psychology 23(3): pp. 457-482.
Taylor, A. G. (1992). "Introduction to Cataloging and Classification."
Vander Wal, T. (2005). "Explaining and showing broad and narrow folksonomies."
Vickery, B. C. (1966). "Faceted Classification Schemes." Rutgers Series on Systems for the Intellectual Organization of Information, ed. Susan Artandi, v. 5.
Vogt, P. (2000). "Minimum cost and the emergence of the Zipf-Mandelbrot law. Induction of Linguistic Knowledge/Computational Linguistics."
Wright, T. (2001). "Toward a holistic view of staff development of regional tutorial staff at the Open University." Systemic Practice and Action Research 14(6): 735-762.
Wynar, B. S. (2002). "Introduction to cataloging and classification."
Zipf, G. K. (1949). "Human Behaviour and the Principle of Least Effort: An Introduction to Human Ecology."
