Recreating Popular User-Generated Tags Effectively and Efficiently by Utilizing
Crowdsourcing
A Thesis
Submitted to the Faculty
of
Drexel University
by
Deima Elnatour
in partial fulfillment of the
requirements for the degree
of
Doctor of Philosophy
June 2011
© Copyright 2011
Deima Elnatour. All Rights Reserved
Dedications
To my parents who have given me the security to question and the liberty to seek my own
answers.
Acknowledgements
It took a village; without the help of so many people I would not be writing these
words today. I want to thank the faculty and staff members of the College of Information
Science and Technology at Drexel University for all their support and guidance
throughout this arduous, yet joyful, process. Without your help and guidance this work
would not have been possible. I am grateful for this opportunity and am fortunate to have
worked with the best in our field.
Xiaohua (Tony) Hu, Ph.D., my advisor, thank you so much for your guidance and
support. I am especially thankful to you for introducing me to my life passion, the field of
data mining and user-generated content. Your patience and support helped me overcome
the impossible. My committee members Jiexun Jason Li, Ph.D., Yuan An, Ph.D.,
Christopher C. Yang, Ph.D., and Zeka Berhane, Ph.D., thank you so much for being there
for me every step of the way. Your feedback and advice have been invaluable to this line
of work.
To my mom, Itaf Elnatour, I am grateful for your kindness, endless encouragement,
and for teaching me the value of education and continuous learning. To my dad, Tahsein
Elnatour, I wish you were with us today. To my sister, Jumana, and her beautiful family, thanks for keeping me humble. To my brothers Naill, Hazim, and Abdulla, thanks for pushing me forward and expecting more of me.
Last but not least, I want to thank my lifelong friends and colleagues for being there
for me during this journey.
Thanks everyone.
Table of Contents
CHAPTER 1: INTRODUCTION TO THE STUDY
1.1 Introduction
1.2 Background of the Problem
1.3 Statement of the Problem
1.4 Purpose of the Study
1.5 Significance of the Study
1.6 Nature of the Study
1.7 Research Questions and Hypothesis
1.8 Assumptions
1.9 Limitations
1.10 Delimitations
1.11 Summary
CHAPTER 2: LITERATURE REVIEW
2.1 Folksonomies
2.2 Use of Tags in Web Search
2.3 Motivations of Using Tags
2.4 Types of Tags
2.5 Metadata
2.6 Crowdsourcing through Mechanical Turk
2.7 Information Retrieval Systems and Evaluation Models
CHAPTER 3: METHODOLOGY
3.1 Introduction
3.2 Research Design
3.3 Appropriateness of Design
3.4 Research Questions
3.5 Population
3.6 Sampling
3.7 Instrumentation and Data Collection
3.8 Operationalization of Variables
3.9 Data Analysis
3.9.1 Descriptive Statistics
3.9.2 ANOVA
3.9.3 Multiple Linear Regression
3.10 Summary
CHAPTER 4: RESULTS
4.1 Introduction
4.2 Collected Data and Overview of Sample Population
4.2.1 Mechanical Turk Population and Survey Descriptive Statistics
4.2.2 Popacular and Delicious Data for Most Tagged Sites
4.3 Hypothesis Data Analysis
4.4 Summary
CHAPTER 5: CONCLUSIONS AND RECOMMENDATIONS
5.1 Scope, Limitations, Delimitations
5.2 Findings and Implications
5.3 Recommendations
5.4 Scope and Limitations of the Study
5.5 Significance of the Study
5.6 Summary and Conclusions
References
Appendix A: The Survey Tool
Appendix B: Popacular Top 100 Most Tagged Sites on Delicious – All-Time
Appendix C: Popacular List of Most Tagged Sites – One Month
Appendix D: Popacular List of Most Tagged Sites – One Week
Appendix E: Popacular List of Most Tagged Sites – One Day
Appendix F: Popacular List of Most Tagged Sites – 8 Hours
Vita
List of Tables
1. Cost-Benefit Assessment for All Participants
2. Descriptive Statistics of Study Sample
3. Data for the All-Time Top 5 Most Tagged Sites
4. Mapping Between Tag Classification Schemes
5. Site 1 - YouTube Tagging Data from Delicious
6. Site 2 - Flickr Tagging Data from Delicious
7. Site 3 - Pandora Tagging Data from Delicious
8. Site 4 - Facebook Tagging Data from Delicious
9. Site 5 - Digg Tagging Data from Delicious
10. Pairwise Comparisons of Tag Creation Effectiveness among Sites
11. Comparison of Tag Creation Effectiveness by Tag Types at Sites 1, 2, 3 and 5
12. Regression Results for Site 1
13. Regression Results for Site 2
14. Regression Results for Site 3
15. Regression Results for Site 4
16. Regression Results for Site 5
List of Figures
1. Taxonomy of Tagging Motivations
2. IR Evaluation Model
3. Real-time Processing Pipeline
4. Iterative Experimental Design Approach Used in This Study
5. Age Distribution of Mechanical Turk Workers
6. Education Distribution of Mechanical Turk Workers
7. Primary Reasons for Participation
ABSTRACT
Recreating Popular User-Generated Tags Effectively and Efficiently by Utilizing
Crowdsourcing
Deima Elnatour
Tony Hu, Ph.D.
It is well-known today that not all user-generated tags provide additional
information that can be used to further improve web search beyond traditional methods.
A number of studies established that popular tags are most useful for web search. Popular
tags are common tags provided independently by a large number of users to describe web
sources of interest. However, the same studies concluded that incorporating these tags
will not make a measurable impact on search engine performance given the size of the
web and the scarcity and distribution of popular user-generated tags across the extended
web.
This dissertation is focused on finding a way to create social bookmarking tags
efficiently and effectively by utilizing crowdsourcing systems. A crowdsourcing platform is one in which tasks can be posted and then completed at a stated price or reward; such platforms attract users to complete relatively simple tasks that are easy for humans but hard for machines. Crowdsourcing has been widely used by companies and
researchers to source micro-tasks requiring human intelligence such as identifying objects
in images, finding or verifying relevant information, or natural language processing.
The purpose of the study is to determine whether popular internet bookmarking
tags can be recreated through crowdsourcing. Amazon Mechanical Turk, the work
marketplace, was used as a means to conduct an experiment regarding the reproduction of
popular tags for a variety of websites using Delicious, a service for storing and sharing
bookmarked pages on the internet. The key research questions for the study were
examined as a number of factors regarding tag creation including the effectiveness of
crowdsourcing in reproducing popular tags, categorizing which tags can be recreated
most effectively, and the relationship of worker characteristics and demographics to the
effectiveness of producing popular tags.
The results of the study suggest that popular internet bookmarking tags can be
recreated effectively through crowdsourcing. Moreover, tag creation effectiveness was
significantly higher for tag type “Factual and Subjective” (F & S) than for tag type
“Factual” (F). Additionally, other variables were tested to assess their relationship with
tag creation effectiveness. Interest in site, familiarity with site, tag creation experience
and tag usage experience were significantly related to tag creation effectiveness for some
of the sites, although the direction and significance of these relationships was not
consistent across all sites included in this study.
This study provides a promising new direction for cheap, fast and effective
creation of user-generated tags that would be useful in indexing more of the extended
web and consequently help improve web search. Furthermore, it informs future
experimental and micro-task design for creating high quality tags reliably using
crowdsourcing platforms.
CHAPTER 1: INTRODUCTION TO THE STUDY
1.1 Introduction
Web-based tagging systems, which include social bookmarking systems such as
Delicious, allow participants to annotate or tag a particular resource. Historically,
annotations have been used in several ways. Students annotate or tag their books to
emphasize interesting sections, to summarize ideas and to comment on what they have
read (Wolfe, 2000). Davis and Huttenlocher (1995) suggested that shared annotations in
the educational context can serve as a communication tool among students and between
students and instructors. There can be threaded discussions around class materials that
directly and specifically link to the class material. Farzan and Brusilovsky (2005, 2006)
made use of annotation as an indicator of the page relevance for a group of learners in an
online learning system.
Web systems that allow for social annotation can provide useful information for
various purposes. Dmitriev et al. (2006) explored the use of social annotation to improve
the quality of enterprise search. Freyne et al. (2007) made use of social annotation to re-rank research paper search results. Hotho et al. (2006) proposed a formal model and a
new search algorithm for folksonomies. Bao et al. (2007) explored the use of social
annotation to improve web search. Social annotations have the potential to improve
searching for resources (Marlow et al., 2006). However, published research on using
social annotations to improve web search is sparse.
Rowley (1988, p. 43) explained that “the indexing process creates a description of
a document or information, usually in some recognized and accepted style or format”
where the term “document” is used to reflect a container of information or knowledge.
Therefore, a document could take the form of any combination of form and medium.
Document indices could also be viewed as a structured form of content representation of
a document. This leads to the notion that indexing is actually the activity of creating
surrogates of documents that summarize their contents (Fidel, 1994).
With the new era of the internet and the growing popularity of online social
software, ordinary people are becoming human indexers with no policies, rules or
training. This calls for the need to focus on social indexing and understand its
characteristics as it stands in online social networks today. It is also critical to understand
the nature of user-generated tags (folksonomies) and see if such tags can be leveraged to
improve the search function and address some of the long-standing issues in the IR field.
This dissertation is focused on developing a better understanding of user-generated tags and finding a way to generate tags that can help improve web search. A
number of empirical studies concluded that social bookmarking tags can provide
additional data to search engines that were not provided by other sources and can
consequently improve web search. These studies however concluded that there was a
lack of availability and distribution of the tags that can improve search. This study was
focused on finding a way to create social bookmarking tags efficiently and effectively
using crowdsourcing.
1.2 Background of the Problem
Tags, also known as user-contributed metadata, are viewed as a tool for digital
information management. Most of these tags are created during the use of social
computing software (e.g. Delicious and Flickr) and web 2.0 systems. Social computing
systems allow users to be both producers and consumers of information. They also allow
users to find and save interesting documents and web pages that were produced by others.
In either case, social computing systems have one important and common feature: they
provide the user with the ability to tag documents. A tag is a one-word descriptor that best
describes the content of the tagged document. Users have the flexibility to assign no tags
at all or assign multiple tags to best describe the content of the document. This
phenomenon has spread contagiously throughout the Internet where most organizations
had no choice but to incorporate some form of social network features into their home
pages and online services. It is believed that social features such as tags enhance the user
experience and are usually associated with higher user satisfaction. Researchers turned
their attention to this phenomenon and tried to figure out why tagging is a popular feature
and increasingly in high demand. There are two main incentives to tagging: i) private,
mainly to address Personal Information Management (PIM) needs, or ii) public, where
the tags are driven by the need to collaborate and share information with others. Pirollli
(2005) suggests that tags provide the “information scent” that connects users with
information.
Social network tags are also known as user-generated tags. Researchers find that
tags have both private and public benefits. Private benefits are mainly focused on
enhancing Personal Information Management (PIM) where users’ main incentive is
saving this information for personal use in the future. Public benefits, on the other hand,
are usually driven by the desire to collaborate and discover new information that is of
interest through navigating tags. Tagging imposes a lighter cognitive burden than categorizing, as Sinha's cognitive analysis of tagging suggests (Sinha, 2005). Social tagging
offers a flexible solution when compared to traditional hierarchical tagging as it allows a
user to express the various dimensions of a document by applying multiple tags to one
document. Some studies show that users choose tags based on the words that they are
likely to use when searching for these documents in the future (Wash, 2006). Therefore,
social network tags open new doors and provide new hope for solving some of the long-standing problems in the field of IR, especially those related to the mismatch between search query words and the words in documents. The hope is that tags offer quality indices for documents that cannot be achieved by machine indexing alone. This is believed to be true since social network tags harness the social and collaborative wisdom of the crowds, making them likely to be more effective indices for enhancing the IR function.
1.3 Statement of the Problem
A number of studies concluded that there was a lack of availability and
distribution of the tags that can improve web search. This study focused on finding a
way to create social bookmarking tags efficiently and effectively using crowdsourcing.
Furthermore, this study examines tag creation effectiveness across systems and tag types
while exploring the relationships between tag creation effectiveness and a number of
user-related factors such as interest in the website, familiarity with the website, tagging
experience (both usage and creation), experience with search engines, and time spent on
the internet.
1.4 Purpose of the Study
The purpose of the study was to determine whether popular internet bookmarking
tags can be recreated through crowdsourcing. Amazon Mechanical Turk, the work
marketplace, was used as a means to conduct an experiment regarding the reproduction of
popular tags for a variety of websites using Delicious, a service for storing and sharing
bookmarked pages on the internet. The key research questions for the study were
examined as a number of factors regarding tag creation including the effectiveness of
crowdsourcing in reproducing popular tags, categorizing which tags can be recreated
most effectively, and the relationship of worker characteristics and demographics to the
effectiveness of producing popular tags.
1.5 Significance of the Study
The significance of this study is two-fold. First, the findings of this study could
help develop a better understanding of how social bookmarking tags are created and what
can be done to effectively improve their availability and distribution towards an efficient
web search.
Second, the findings of this study provide new insights on indexing, which is the
basis of information retrieval. Numerous variations on indexing have been tried over the
years. Modern search engines use several methods to gather additional metadata that improves resource indexing and, in turn, the performance of similarity ranking. Craswell et al. (2001) and Westerveld et al. (2002) explored the use
of links and anchors for web resource retrieval. They pointed out that anchor text helps
improve the quality of search results significantly. The anchor text can be viewed as web
page creator annotation. This suggests that annotation can be used to support document
indexing. Social tagging systems, e.g. Delicious, allow participants to add keywords that
are tags to a web resource. These tags can be viewed as user annotations of a web
resource. Dmitriev et al. (2006) explored the use of user annotation as intranet document
indexes. Yanbe et al. (2007) converted a tag and its frequency to be a vector that
represents a page’s content.
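To make this representation concrete, the following is a minimal sketch in Python of the idea described above: each page is represented as a tag-frequency vector, and two pages are compared by cosine similarity. The pages and tags are hypothetical, and this illustrates the general technique rather than the exact method of Yanbe et al. (2007).

    from collections import Counter
    from math import sqrt

    def tag_vector(tags):
        # Represent a page by how often each tag was applied to it.
        return Counter(tags)

    def cosine_similarity(v1, v2):
        # Cosine similarity between two sparse tag-frequency vectors.
        dot = sum(v1[t] * v2[t] for t in set(v1) & set(v2))
        norm1 = sqrt(sum(c * c for c in v1.values()))
        norm2 = sqrt(sum(c * c for c in v2.values()))
        return dot / (norm1 * norm2) if norm1 and norm2 else 0.0

    # Hypothetical Delicious-style tag assignments for two pages.
    page_a = tag_vector(["video", "streaming", "video", "entertainment"])
    page_b = tag_vector(["video", "music", "streaming"])
    print(cosine_similarity(page_a, page_b))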
The findings reviewed above suggest that there is potential value in investigating and finding a reliable method for producing popular tags through crowdsourcing effectively and
efficiently. This would allow for on-demand indexing of web resources which would then
lead to enhanced web search.
1.6 Nature of the Study
The research method selected for this study was a quantitative quasi-experimental correlational research design. The dependent variable for the study is tag
creation effectiveness which is a continuous variable. The independent variables for the
study are tag type, system type, interest in the website topic, experience with website or
similar website, tagging experience, search engine experience, and average daily time
spent on the internet. Tag type and system type are categorical variables while the
average daily time spent on the internet is a continuous variable and the rest are ordinal
variables. An analysis of variance (ANOVA) was conducted to determine the relationship
among the independent and the dependent variables. ANOVA was used for the second
and third research questions. Furthermore, a multiple regression was conducted to support
the ANOVA.
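As an illustration, the following sketch shows how such an analysis might be run in Python with the statsmodels package; the data values and column names are hypothetical stand-ins for the study's variables, not the actual study data.

    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    # Hypothetical data: one row per worker observation.
    df = pd.DataFrame({
        "effectiveness": [0.62, 0.71, 0.55, 0.80, 0.45, 0.68],
        "tag_type": ["F", "F&S", "F", "F&S", "F", "F&S"],
        "interest": [3, 5, 2, 4, 1, 5],
        "tag_creation_exp": [2, 4, 1, 5, 1, 3],
    })

    # One-way ANOVA: does tag creation effectiveness differ by tag type?
    anova_model = ols("effectiveness ~ C(tag_type)", data=df).fit()
    print(sm.stats.anova_lm(anova_model, typ=2))

    # Multiple linear regression on worker-related predictors.
    regression = ols("effectiveness ~ interest + tag_creation_exp", data=df).fit()
    print(regression.summary())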
The use of a quasi-experimental research design allowed the determination of
whether there were statistically significant differences between groups (Cozby, 2007), which in this study are the different sites and tag types. The quasi-experimental design
was appropriate to assess these differences because it allowed the researcher to compare
the levels or categories of the independent variables with regard to the dependent variable
in order to determine whether there was a difference between the groups (Broota, 1989).
Moreover, this quasi-experimental correlational quantitative study specifically investigated the relationships between tag creation effectiveness and the participants' tagging experience, search engine experience, and average daily time spent on the Internet. Given this objective, a correlational design was appropriate. In the context of social and educational research,
correlational research is used to determine the degree to which one factor may be related
to one or more factors under study (Leedy & Ormrod, 2005).
1.7 Research Questions and Hypothesis
The research questions and hypotheses that guided this study were:
RQ1: Are there statistically significant differences in tag creation effectiveness for
popular tags among the sites included in this study?
H10: There are no statistically significant differences in tag creation effectiveness
for popular tags among the sites included in this study.
RQ2: Are there statistically significant differences in tag creation effectiveness
across tag types?
H20: There is no statistically significant difference in tag creation effectiveness
across tag types.
RQ3: What is the relationship among tag creation effectiveness and Interest in the
Website Topic, Experience with Website or Similar Website, Tag Creation
Experience, Tag Usage Experience, Experience with Search Engine, and Time spent on
the Internet?
H30: None of the independent variables of Interest in the Website Topic,
Experience with Website or Similar Website, Tag Creation Experience, Tag Usage Experience, Experience with
Search Engine, and Time on the Internet have a statistically significant effect on tag
creation effectiveness.
1.8 Assumptions
The conclusions and interpretations developed as a result of this study will be
based on a number of assumptions guiding the study. First, it is assumed that the
participants in this study understand the posted tagging tasks that are to be evaluated on Mechanical Turk and are good representatives of its worker population. Second, since this study will be based on the answers to the survey instruments used to collect data, it will be assumed that the instrument is valid and reliable with respect to measuring tag creation effectiveness.
1.9 Limitations
Limitations are factors that limit the study, such as weaknesses, problems, and
reservations, which impact the research. Participation in the study is voluntary, and it is assumed that participants will fill out all survey questions honestly and completely. Therefore, respondents will be limited to the number of consenting participants who choose to
participate in the research study and complete the survey. Individuals may decide not to
participate for various reasons (Leedy & Ormrod, 2003; Creswell, 2009). The study will
also be limited by the potential for malingering by the participants and the time limits
placed on the conduct of the study.
The validity of this quantitative quasi-experimental correlational study will be
limited to the reliability of the instruments used to gather and interpret research data. The
methodology of the study might present a limitation because it does not allow for variable
manipulation. The lack of variable manipulation prevents the determination of causality in the relationships under study.
1.10 Delimitations
Delimitations are factors the researcher can control but chooses not to include
in the study. These factors limit the ability to generalize the results of the study to the
actual population. This study will be confined to surveying participants selected for the
posted tagging tasks on Mechanical Turk. This study will focus only on determining
whether popular internet bookmarking tags can be recreated through crowdsourcing. The
findings of this study are limited to the use of Mechanical Turk, the work marketplace.
Mechanical Turk was used as a means to conduct an experiment regarding the
reproduction of popular tags for a variety of websites using Delicious, a service for
storing and sharing bookmarked pages on the internet.
1.11 Summary
This chapter provided an introduction to the study regarding the determination of
whether popular internet bookmarking tags can be recreated through crowdsourcing. The
background was discussed along with the problem statement, purpose, and nature of the study.
Research questions and hypotheses were presented to guide the quantitative correlational
study. Assumptions, limitations, and delimitations were discussed. Chapter 2 will
contain a more detailed discussion of the literature review and Chapter 3 will include the
methodological specification for the proposed study.
CHAPTER 2: LITERATURE REVIEW
Web 2.0 applications focused on users’ interactions and encouraged them to
generate content. As a result, online communities flourished and users started sharing
their ideas and actively generating new content and collaborating with others to satisfy
their curiosity and needs. With all this content, the need for organizing, finding, and sharing
such content became an important focus for most users. Web 2.0 applications kept up
with the rising needs around managing and accessing user generated content by
introducing a number of features to help classify and explore content. Tagging was one of
these features. Tagging allowed the user to assign self-defined words that would
effectively describe and organize their content. This presented users with a platform to
explore available content and connect by finding similarities in their collections.
As online communities continue to grow in importance and user base, prominent
search engines decided to invest in these systems, with Yahoo acquiring Flickr and
Del.icio.us and Google acquiring YouTube (Bischoff, Firan, Nejdl, & Paiu, 2008). When users annotate their respective content with tags, connections among users can be established.
Flickr and other photo-sharing websites provide users with an option to provide tags that
best describe their photographs. Tags in this case could be a description of the photo, the
place or setting, subject of the picture or any other distinguishing characteristic such as
color or action. Meanwhile, for music sharing websites like Last.fm, songs are tagged
based on their artist, album, genre, mood, or any other classification specified by the
user.
The emergence and widespread adoption of tagging applications made it a main topic of research in the field of information science (Figueiredo et al., 2009). Problems arise from the lack of an accepted standard methodology for evaluation and from the distinct textual features available on different web 2.0 systems. Access to large-scale samples is also restricted, in effect hindering comparisons of performance between social networks or of content features on these networks. Thus, different techniques have been formulated to perform these tasks. Website crawls or non-public snapshots are used in analysis, but because of their innate characteristics and distinct methodologies, comparisons between them have become challenging, if not inaccurate (Crecelius & Schenkel, 2009).
2.1 Folksonomies
Existing taxonomies and predefined dictionaries have been found to lack flexibility and to be expensive to create and maintain. Tagging has become an alternative to
these top-down categorization techniques, allowing users to choose their own labels
based on their real needs, tastes, language or anything that would reduce their required
cognitive efforts. Community services that offer tagging of resources are called
folksonomies, where thousands or millions of users share their interests, preferences and
contributions. Folksonomies come in two forms, depending on the tagging rights that are
given. Narrow folksonomies restrict the bookmarking and tagging of resources only to a
number of users – such as the owner or other users he/she would specify. Broad
folksonomies, on the other hand, are those open to the entire community, enabling each
individual to relate to the activity of other users (Wetzker, Bauckhage, Zimmermann, &
Albayrak, 2010).
Tags could be used to complement the existing indices that search engines use for search queries and thus produce enhanced results. Tags and annotations
can provide additional information about the sources they are describing. These tags and
annotations include keywords or phrases that would be linked with other related and
relevant sources. In doing so, semantic web relationships are developed, leading to
improved retrieval and review ratings that attract other users to the resource.
The objective of the semantic web is to make online resources more understandable
to humans and machines. This has ushered in the emergence of web applications such as
web blogs, social annotations and social networks. Research in this field has been
centered on discovering the latent communities, detecting topics from temporal text
streams and the retrieval of highly dynamic information (Zhou, Bian, Zheng, Zha, &
Giles, 2008).
2.2 Use of Tags in Web Search
Tags have also been utilized as a way of bookmarking and giving out brief,
concise summaries about web pages for search engines. As mentioned earlier, this could
be used in a developed algorithm that would measure the popularity of a page or its
contents. It is used as an alternative for determining customer preference, such as the case
for Last.fm. Aside from associating the track lists of similar users, Last.fm uses descriptive tags to recommend new songs to existing users. These tag-based search algorithms have been shown to produce better results than track-based collaborative
filtering methods (Bischoff, Firan, Nejdl, & Paiu, 2008).
The text in the hyperlink that is most visible and accessible to the user is referred
to as the Anchor Text (AT) or link label. These are the ones that are used heavily by web
search engines. As they are able to describe the content of a linked object, they are used
as a measure for similarity among objects in various webpages and aid in query
refinement (Bischoff, Firan, Nejdl, & Paiu, 2008).
As most personalization algorithms still work on text, the documents in the
dataset should be primarily textual social web content. The documents should be
equipped with full text information, but more important is the basic bibliographic
information such as author, title, abstracts and keywords. The dataset should explicitly
contain users and their search tasks for evaluating personalization. Because the
algorithms rely on histories of user behavior and previously adapted results, there should be a sufficiently large sample. The person who proposed the search task is also encouraged to
provide relevance annotations. This should include as many extra features as possible,
such as hyperlinks, tags, categories/topic labels and virtual communities defined. This
optional user profile enables personalized results by identifying the users’ interests and
other document similarities, helping online communities make it easier to identify user
expertise and interest (Yue et al., 2009).
Social bookmarking systems could also help in the detection or identification of
trends in tagging, popularity and content. Del.icio.us is fast growing because of its ability
to centrally collect and share bookmarks among users. It follows a format that shares
information through two channels of the website. The first channel is through bookmarks
or tags. This is where users subscribe to others’ content and are updated whenever their
interests are added to. The second channel is through the main webpage, where the
front page is the primary means of sharing information. As it is the first point of contact,
it attracts the attention of all visitors of the site (Wetzker, Zimmermann, & Bauckhage,
2008).
2.3 Motivations of Using Tags
There are a number of studies that focused on understanding the motivation
behind tagging systems. Studies have shown that among the primary goals of tags would
be to serve the needs of individual users – such as in browsing, categorizing and finding
items (Bischoff, Firan, Nejdl, & Paiu, 2008). This could also be used for information
discovery, sharing, community ranking, search, navigation and information extraction.
There are two aspects of motivation in using tags – organizational and personal
(Suchanek, Vojnovic, & Gunawardena, 2008). The organizational aspect is for the
community – to provide context to others or describe characteristics of a certain object.
The personal aspect is done by the user for his/her own use, for better organization and
classification of information.
An example of the use of these tags in social networking sites is that of Flickr.
The site has helped in the annotation of photos and enabled users to share them with others.
A photo is made searchable and tags are generated to further increase its exposure to the
community. Another service, ZoneTag, is a mobile phone application that encourages
annotation immediately after taking the picture. Aside from personal organization,
another motivation for tagging was to convey information and opinions about the photo
itself. Yet another motivation is to share contextual information with other people
(presumably relatives and friends) whenever they tag their photos. A taxonomy of annotation motivations was developed (Ames & Naaman, 2007) and is summarized in
Figure 1. It states that there are two dimensions with different incentives whenever
photographs are tagged. The first dimension, sociality, concerns whether the tag was intended for use by the person who took the shot, for friends/family, or for the general public. The
second dimension, function, refers to the intended uses of the tag.
Figure 1. Taxonomy of Tagging Motivations (Ames & Naaman, 2007)
From the observations and analysis of user motivations, several implications were
made for tagging systems in general (Ames & Naaman, 2007). First, the annotation
should be pervasive and multi-functional, incorporating all the categories in the
taxonomy. Second, information captured should be easy to annotate right away. Easy
annotation at the point of capture would ensure that the tagging activity is done, and in a more precise manner. Third, users should not be forced to annotate. Even if this might be
a more efficient way of tagging, it is still up to the discretion of the user when they would
annotate. Fourth, annotation should be allowed in both desktop and mobile settings. For
mobile annotation, it would help define the in-the-moment aspect of annotation, while the
desktop/web-based component would allow more descriptive or bulk annotation. Lastly,
relevant tag suggestions can encourage tagging and give users ideas about possible tags.
These suggestions should be clearly defined in order to prevent confusion or ambiguity. It
should be ensured that tags are accurate and not just entered when made available.
2.4 Types of Tags
Tags can be classified along eight different dimensions (Bischoff, Firan, Nejdl, &
Paiu, 2008): topic, time, location, type, author/owner, opinions/qualities, usage context
and self-reference. The topic provides a description about the item under consideration –
the subject of a picture, the title and lyrics of a song, and so on. While the theme of a
written piece can be extracted from its content, it is not as easily done in pictures and
songs. The time category specifies the month, year, season or any other periodical
indicator. The location describes the setting – a country or city, its sights and
attractions, landmarks or hometowns of artists/writers. The type of file would define
which kind of media is used, such as the type of web page presented. For music, it would
define the accompaniment of instruments or genre. For pictures, the camera settings and
styles used would be identified. The classification by author/owner defines the user and
the rights to the said object. Tags could also be made on subjective descriptions, such as
the quality of the object and the common opinion shared by different users. Usage
context states the purpose of the object, along with how it is collected or classified.
Lastly, self-reference are tags that are highly personalized for personal use (Bischoff,
Firan, Nejdl, & Paiu, 2008).
In an analysis of tag types, it was found that 50% of the tags in Del.icio.us, Flickr
as well as Anchor Text (AT) are Topic-related keywords (Bischoff, Firan, Nejdl, & Paiu,
2008). This was mainly because of the convention that pictures and web pages are
classified according to topic. For music in Last.fm, the type was the classification that
was most prominent. This is because this category comprises the song format,
instrumentation and genre for music-related media. It was followed by Opinion/Quality
and Author/Owner, showing how users refer to their content when it comes to music.
Another finding in this research was that more than half of the existing tags provide
additional information to the resources they annotate.
2.5 Metadata
Metadata is essential for the organization and search of information resources.
There are professional metadata creators who base their work on standards or controlled
vocabularies. However, the high quality of this data also entails high cost, which limits its production and scalability. With the increasing volume of digital
resources on the internet, alternative methods to metadata creation are desired. Even if
automatic or semi-automatic generation of metadata is explored, the capabilities are
limited in comparison to those that are created through human intelligence. Through the
transition from Web 1.0 to Web 2.0, web users who annotate web resources through
social tagging systems have become another class of metadata creators.
The social tags created by users provide a special type of metadata that can be
used for classifying and searching for resources. All web users who are able to access the
content can also be taggers. The main difference lies in the enforcement of strict indexing standards: in Web 2.0 there are fewer rules and more freedom provided
to the users to choose whatever they see as a good description. Folksonomies introduced
a big improvement over controlled vocabularies and large-scale ontologies, which require an appropriate set of resources or information, must be continually upgraded, and entail high maintenance costs (Lu, Park, Hu, & Song, 2010).
While innovations in technology have brought forth greater attention to
photography and other related fields, semantic metadata about photo content is not
readily available. Thus, photo collections would need to have some form of annotation to
improve usefulness, as well as to help recall and support search. However, the burden of
this semantic interpretation and annotation still falls to the owner of the collection.
Therefore, tools for annotation have been a constant topic of research (Ames & Naaman,
2007).
Two approaches for metadata creation in the web environment are studied: user-created and author-created metadata (Lu, Park, Hu, & Song, 2010). The user-created
metadata are those applied by users to annotate web pages, such as social tags. The
author-provided metadata are the keywords and descriptions placed in the head part of
the document. The overlap of metadata with the page title and text/body is examined to
gauge how much these tags contain additional information beyond page content. It was
found that both types of metadata add to the existing page content, but more than 50% of the tags and
keywords are not present in the title and content of the pages. Authors are also more
likely to use terms from the page content to annotate the pages. Data analysis also
showed that users and authors only agree on a small portion of terms that can be used in
describing the web pages (Lu, Park, Hu, & Song, 2010). Clustering methods were then
used to evaluate whether social tags or author-provided keywords and descriptions are
effective in discovering web resources. The results showed that both tags and author-provided data could be used to improve the performance significantly, with tags being the
more effective independent information source. Lastly, it was found that tags can be more
effectively utilized as links connecting pages with related topics and with regard to
the social/user property (Lu, Park, Hu, & Song, 2010).
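A minimal sketch of this kind of overlap measurement is shown below; the tokenization and example data are simplifying assumptions, intended only to illustrate how the share of tags adding information beyond the page content could be computed.

    def overlap_fraction(tags, page_text):
        # Fraction of tags that already appear in the page's own text.
        words = set(page_text.lower().split())
        tags = [t.lower() for t in tags]
        return sum(1 for t in tags if t in words) / len(tags) if tags else 0.0

    # Hypothetical example: four user tags against a page's title and body.
    tags = ["video", "tutorial", "free", "awesome"]
    text = "Free video hosting and sharing for everyone"
    print(1 - overlap_fraction(tags, text))  # share of tags adding new information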
2.6 Crowdsourcing through Mechanical Turk
Knowledge intended for machine use must be in a form that is suitable for reasoning. Text corpora are mined to create useful and high-quality collections of this knowledge under a methodology referred to as Open Knowledge Extraction (OKE) (Gordon, Van Durme, & Schubert, 2010). The whole process of encoding the knowledge is quite cumbersome and entails additional cost in expert labor. OKE creates logical formulas by using forms
of human knowledge – books, newspapers and websites. It extracts insights and
information from important sources, and is different from information extraction as it
focuses on the everyday, common sense knowledge rather than specific facts. It also
offers a logical interpretability of outputs. The knowledge base of these OKE systems is
aimed at addressing the gap in the quality of automatically acquired knowledge. This calls for an easy method of evaluating the quality of results. This is where the use
of the Mechanical Turk comes into play. From their research, it was found that an
inexpensive and fast evaluation of its output could be a way to measure incremental
improvements in output quality coming from the same source.
Based on these issues, new ways have been developed to collect input from users
online, such as surveys, online experiments and remote usability testing. With these tools,
potential users can easily be reached, as anyone with internet access can participate.
been done on the micro-task market (Kittur, Chi, & Suh, 2008), where small tasks are
entered into a common system that users would select and complete for some reward
(monetary or non-monetary). The micro-task market offers convenience as it could be
completed within a few seconds or minutes. It offers instant gratification: requesters have quick access to a large user pool for collecting data, and workers are compensated immediately.
Amazon’s Mechanical Turk (MTurk) is one of the platforms in which tasks can
be posted that would then be completed at a stated price. This aims to attract human users
to complete relatively simple tasks that are easier for humans and harder for machines.
The Mechanical Turk has been widely applied by companies to source micro-tasks
requiring human intelligence such as identifying objects in images, finding relevant
information or natural language processing. The Mechanical Turk works by converting
each annotation task into a Human Intelligence Task (HIT). The core tasks for a
researcher are: (1) define an annotation protocol and (2) determine what data needs to be
annotated (Sorokin & Forsyth, 2008). These tasks only require minimal time and effort
and today this system employs over 100,000 users from about 100 countries.
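As a concrete illustration, the sketch below posts a simple tagging HIT through the boto3 MTurk client, an API that postdates the work described here; the title, reward, and question file are illustrative assumptions rather than the parameters used in this study.

    import boto3

    # Connect to the MTurk sandbox endpoint for testing (assumed setup;
    # valid AWS requester credentials are required).
    mturk = boto3.client(
        "mturk",
        endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
    )

    # question.xml is assumed to hold an MTurk QuestionForm document
    # asking the worker to suggest tags for a given web page.
    with open("question.xml") as f:
        question_xml = f.read()

    hit = mturk.create_hit(
        Title="Suggest tags for a web page",
        Description="Visit the page and enter the tags you would use to bookmark it.",
        Keywords="tagging, bookmarking, categorization",
        Reward="0.05",                    # stated price per assignment, in USD
        MaxAssignments=20,                # independent workers per page
        AssignmentDurationInSeconds=600,
        LifetimeInSeconds=86400,
        Question=question_xml,
    )
    print(hit["HIT"]["HITId"])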
However, it is inevitable that the system would face some serious challenges.
First, it would go against the conventional way of participation assignment and would
rely solely on people accepting and completing the tasks. Second, it requires bona fide answers that cannot be quickly verified when other users do not cooperate properly. Lastly, the user base is diverse and drawn from many different areas, and results may be generalized without taking into consideration workers' demographic information, expertise and other important
user-specific data. It should be noted that during the initial operation of the Mechanical
Turk, only workers with US bank accounts were accepted. However, they have recently
allowed workers from India and other countries to receive payment as well (Ipeirotis,
2010).
Several designs have been recommended to address these issues after experiments
were made (Kittur, Chi, & Suh, 2008). First, it is important to have explicitly verifiable
questions as part of the task. This makes workers aware that their answers are monitored and checked, promoting better participation. Second, the task should be designed so that answering it properly requires the least effort, discouraging wrongful completion. Third, there
should be multiple ways to detect suspect responses. This could be done through task
durations or repeated answers.
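A sketch of the third recommendation is given below: responses are flagged when they arrive implausibly fast or repeat a worker's previous answer verbatim. The record format and the ten-second threshold are illustrative assumptions.

    def flag_suspect_responses(responses, min_seconds=10):
        # responses: list of dicts with keys worker_id, answer,
        # duration_seconds (a hypothetical record format).
        last_answer = {}
        flagged = []
        for r in responses:
            too_fast = r["duration_seconds"] < min_seconds
            repeated = last_answer.get(r["worker_id"]) == r["answer"]
            if too_fast or repeated:
                flagged.append(r)
            last_answer[r["worker_id"]] = r["answer"]
        return flagged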
These recommendations are similar to the three distinct aspects of quality assurance (Sorokin &
Forsyth, 2008): (a) ensuring that the workers understand the requested task and try to
perform it well; (b) cleaning up occasional errors; and (c) detecting and preventing
cheating in the system. The basic strategy done is collecting multiple annotations per
image. This would identify the natural variability of human performance and how
occasional errors influence the results. While it allows malicious users to be caught, it
entails additional cost. Another strategy is to perform a separate grading task. This is
done through scanning annotated images and scoring each one. This results in cheaper
quality assessments. The third strategy is to build a gold standard with the use of images
with trusted annotations. This allows poor performance to be detected immediately, as feedback
would also be provided to the worker. It is also a cheaper alternative as only a few images
would be used to demonstrate the gold standard. It has been found that it is important to
turn the annotation process into a utility. This would make it easy to determine which
data to annotate and what type should be applied.
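The first and third strategies can be sketched as follows; the majority-vote aggregation rule and the gold-standard scoring are simplified assumptions meant to illustrate the idea, not the cited authors' exact procedures.

    from collections import Counter

    def majority_label(annotations):
        # Aggregate multiple annotations of one item by majority vote.
        return Counter(annotations).most_common(1)[0][0]

    def accuracy_on_gold(worker_answers, gold):
        # Score a worker against items with trusted ("gold") annotations.
        scored = [item for item in worker_answers if item in gold]
        if not scored:
            return None  # the worker saw no gold items
        correct = sum(1 for item in scored if worker_answers[item] == gold[item])
        return correct / len(scored)

    print(majority_label(["cat", "cat", "dog"]))                              # -> cat
    print(accuracy_on_gold({"img1": "cat", "img2": "dog"}, {"img1": "cat"}))  # -> 1.0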
Another study (Snow, O'Connor, Jurafsky, & Ng, 2008) explored whether non-expert labelers can provide reliable natural language annotations.
Similar to the gold-label strategy, they chose five natural language understanding tasks
that were easy to learn and understand even for non-experts. These tasks are: affect
recognition, word similarity, recognizing textual entailment, event temporal ordering and
word sense disambiguation. Each task was processed through the MTurk to annotate data
and to measure the quality of annotations in comparison with the gold-standard labels.
From the experiments conducted on the different tasks, the conclusion was that only a
small number of non-expert annotations per item are necessary to equal the performance
of an expert annotator.
One task that the previous study did not include was machine
translation (Callison-Burch, 2009). MTurk provides the requesters of tasks three ways to
ensure quality. First, multiple workers could complete each HIT. This allows them to
select higher quality labels among respondents. Second, requesters could specify a
particular set of qualifications for the workers. Third, they have the option to reject the
work, which does not require them to pay. This keeps the level of participation high, even
if the incentive system is relatively small. This low cost is used to generate multiple
redundant annotations, which is in turn used for ensuring translation quality. The
judgments extracted from non-experts achieved quality equivalent to that of experts. The study also showed other capabilities of the MTurk, such as creating,
administering and grading a reading comprehension test with minimal intervention.
The Amazon Mechanical Turk has also been investigated to determine how built-in qualifications could deter spammers. From an investigation of worker performance, it
was found that a low constraint for a group would attract more spammers. It was also
found that there was no improvement in annotator reliability over time. Thus, consistent
annotations cannot be easily expected. However, it was observed that the workers could
be reliable in subjectivity word sense annotation. This provides great benefit as it enables
annotations to be collected for low costs and over short time periods. Thus, the large
scale general subjectivity word sense disambiguation component could possibly be
implemented, helping with various subjectivity and sentiment analysis tasks (Akkaya,
Conrad, Wiebe, & Mihalcea, 2010).
2.7 Information Retrieval Systems and Evaluation Models
The content and organization of information presented in screen displays is
critical to the successful performance of online catalogues. It is imperative that the
presentation of information is clear and effective in order to be helpful to the users. Thus,
classification methods have been used to summarize the contents of retrieved records into
one or two screens instead of long lists being displayed. The information retrieval system
should be carefully evaluated with respect to the basis on which it is designed and the structure it follows. This evaluation could take the perspective of the system designer or a content-centered design based on user behavior (Carlyle, 1999). Evaluation has moved from a static process, a sequence of unrelated events, to one that includes users, tasks and contexts in a dynamic setting (Belkin, Cole, & Liu, 2009).
evaluating information retrieval are not appropriate for many circumstances considered in
research. An evaluation model was proposed (as depicted in Figure 2) to address these
needs, with particular focus on usefulness.
Figure 2. IR Evaluation Model (Belkin, Cole, & Liu, 2009)
Information seeking occurs whenever there is an information need. The
performance of the system would be measured on how it supports users in meeting their
goal or completing the task that led them to information seeking. The proposed
evaluation model conducts IR evaluation at three levels. The first level is to evaluate the information seeking with regard to what the user wants to accomplish. The second level
should assess each interaction and its contribution to the overall accomplishment of the
user. The third level would then assess each interaction with the information seeking
strategy (ISS) being used. The usefulness of each level is measured by how it contributes
to the outcome per interaction and how it would accomplish the whole task.
Other evaluations of search engines measure their accuracy and completeness in
returning relevant information, quantified through variables like recall and precision.
However, these measures are not sufficient to evaluate the whole system: accuracy and completeness capture retrieval quality but say little about the system's overall impact on the user.
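For reference, the standard definitions of these two measures, where R is the set of relevant documents and A the set of retrieved documents, are:

\text{precision} = \frac{|R \cap A|}{|A|}, \qquad \text{recall} = \frac{|R \cap A|}{|R|}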
It was found that the vocabulary used on del.icio.us is highly standardized,
attributed to the tag recommendation mechanisms they provide to the users. It was also
observed that the attention of users to new URLs lasts only a short period, thus making
them disappear after just a short while. This could be caused by spam posted by
automated mechanisms. This presence of spam highly distorts any analysis, and it was
seen that 19 among the top 20 super users are automated. Thus, characteristics (very high
activity, few domains, very high or very low tagging rate, bulk posts, or any combination
of the aforementioned) are identified to improve detection and filtering (Wetzker, Zimmermann, & Bauckhage, 2008).
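A sketch of a filter built from the characteristics listed above appears below; every threshold is an illustrative assumption rather than a value from the cited study.

    def looks_automated(user):
        # user: dict with posts_per_day, distinct_domains, tags_per_post,
        # and bulk_post_ratio (a hypothetical per-user summary record).
        signals = [
            user["posts_per_day"] > 100,                                # very high activity
            user["distinct_domains"] < 3,                               # few domains
            user["tags_per_post"] > 20 or user["tags_per_post"] < 0.1,  # odd tagging rate
            user["bulk_post_ratio"] > 0.9,                              # mostly bulk posts
        ]
        return sum(signals) >= 2  # flag when several signals co-occur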
To understand how an information retrieval system works, its participants
and their roles must be clearly defined. First, there is the information seeker, usually the
end user or consumer of the services offered by the system. Second is the information
provider, the entity responsible for the content to be searched, explored and delivered.
Third, there are the information intermediaries, which can be categorized as either
resource builders or exploration partners. Lastly, there is the system provider, responsible
for the development and maintenance of the technology (Paris, Colineau, Thomas, &
Wilkinson, 2009). While these roles aim to differentiate between all entities, they may
not be appropriate for all situations. A summary of the cost-benefit assessment for these
participants is shown in the table below.
Table 1
Cost-Benefit Assessment for all participants (Paris, Colineau, Thomas, & Wilkinson,
2009)

Benefits
  Information Seeker: task effectiveness; knowledge gained; satisfaction
  Information Provider: audience reach; audience accuracy; message accuracy
  Information Intermediaries: resource builders (ease of knowledge creation & context
    modeling); exploration partners (task effectiveness)
  System Provider: system usage; reliability; response time; correctness; accuracy of
    exploration

Costs
  Information Seeker: time to complete task; cognitive load; learning time
  Information Provider: metadata provision; structured information; currency of data
  Information Intermediaries: resource builders (time to create and integrate the
    resource); exploration partners (time to capture contextual factors)
  System Provider: implementation hardware & software cost; system maintenance;
    system integration
The Cranfield or “batch mode” style of evaluation has been a cornerstone of IR
progress for over 40 years and serves as a complement to manual user studies (Smucker,
2009). In this evaluation style, a retrieval system ranks a list of documents in response
to queries, and the ranked list is then evaluated against a pre-existing set of relevance
judgments. The caveat in the process is that it does not take into consideration the wide
range of user behavior present in interactive IR systems. To address this, three ideas are
proposed: (a) evaluation should be predictive of user performance, (b) evaluation should
concern itself with both the user interface and the underlying retrieval engine, and (c)
evaluation should measure the time required for users to satisfy their information needs.
There has been an evident shift of interest from the retrieval of query-relevant
documents to the retrieval of information that is relevant to user needs. One approach
for identifying user needs is to analyze user activity on query results (Stamou &
Efthimiadis, 2009). Under this approach, evaluation depends on analyzing user
interaction with the retrieved results to judge their usefulness in satisfying the user's
search intentions. Another aspect to be explored systematically is the user's perception
of the usefulness of the results, along with the impact of retrieved but unused results on
user satisfaction with retrieval effectiveness. Almost half of the searches conducted do
not result in a single click on the results, and these fall under two categories: intentional
and unintentional. Unintentional no-click searches occur when the user does not get
what is expected, whereas intentional ones are issued for instant information or updates
(Stamou & Efthimiadis, 2009).
This is where the importance of tags is highlighted, as they can be evaluated
using three measures developed by Stamou and Efthimiadis: (a) query refinement
probability, (b) query-results usefulness, and (c) update search probability. All three
take a probabilistic approach. Query refinement probability estimates the effectiveness
of consecutive searches, and how they are refined, by identifying overlapping terms. A
threshold is set for these refinements, and when it is not met, query-results usefulness is
examined; it considers the amount of time spent on the results as well as the activity
performed on them. This in turn leads to update search probability, the probability that
the user's intention is only to obtain new information about a previous search. These
measures feed back into the system and are used to determine user satisfaction from
searches.
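As an illustration of the first measure, the sketch below detects a candidate query
refinement through term overlap between consecutive queries. The Jaccard overlap and
the 0.5 threshold are assumptions made for the example, not the exact formulation of
Stamou and Efthimiadis:

    # A minimal sketch of detecting query refinement through term overlap,
    # in the spirit of the measures described above. The Jaccard overlap and
    # the 0.5 threshold are illustrative assumptions.

    def term_overlap(q1, q2):
        t1, t2 = set(q1.lower().split()), set(q2.lower().split())
        return len(t1 & t2) / len(t1 | t2) if (t1 | t2) else 0.0

    def is_refinement(q1, q2, threshold=0.5):
        return term_overlap(q1, q2) >= threshold

    print(is_refinement("social bookmarking", "social bookmarking tags"))  # True
    print(is_refinement("social bookmarking", "weather tomorrow"))         # False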
User-Independent Ground Truth. Using the DMOZ (Open Directory Project)
catalogue, queries and ground truth are extracted. Each category is given labels to be
used as a keyword query. The set of relevant results for this query is formed by the
URLs in that category that are also present in the crawl made in del.icio.us. This
requires a large test collection while completely disregarding the user who submits the
query (Crecelius & Schenkel, 2009).
Context-based Ground Truth. A set of relevant answers is developed and
assumed to be more relevant to the querying user. The set of relevant answers for a
keyword query is computed from the sets of items, from friends of the user, that match
the query. This is not entirely reliable, as bias can arise whenever those within close
proximity to the user are prioritized (Crecelius & Schenkel, 2009).
Temporal Ground Truth. A snapshot or group of snapshots of the social network
is used to gauge the change the network experiences over time, and from this, relevant
answers for a query are derived. This too may lack relevance, as a user may list an item
out of lack of knowledge rather than actual interest (Crecelius & Schenkel, 2009).
User Study. A set of topics is defined for each user, and results for each topic
from different methods are gathered. Each group (pool) is assessed by the user who
defined the topic. Although the queries are made public, a snapshot of the network is not
available, making it hard to reuse the data and evaluate other approaches (Crecelius &
Schenkel, 2009).
Community-driven evaluation venues have successfully distributed the load of
defining queries and assessing evaluation results among the participating organizations
(Crecelius & Schenkel, 2009); it is therefore attractive to apply the same model to social
tagging networks. Each organization would be required to define several topics, along
with a description of the information need, a corresponding keyword query and example
results. Each topic must be paired with a user from the organization who has been, or
has experience being, a member of the social network. As the topics are established, a
snapshot of the network, including the users, is taken. This becomes the data set that is
submitted, compiled per topic, and assessed by its original author. Such an approach
would enable an evaluation that incorporates all the peculiarities of social networks. The
success of such an initiative would depend on the cooperation of companies and
institutions that own social network data, along with others who want to participate in
the project.
The perspective on test collections has truly shifted, from the use of a single
judge (the topic author) to letting samples of the user population make explicit
judgments, or analyzing their behavior to infer relevance (Kazai & Milic-Frayling,
2009). Google applies crowdsourcing in its Image Labeler game, and Yahoo has its
Answers portal, an instance of what is also referred to as Community Question
Answering. While neither offers monetary incentives, Yahoo rewards members with
points that raise their status in the community. The incentive system has been found to
be a critical factor in motivating workers to provide relevant answers. Trust is further
strengthened when multiple assessors agree on a common judgment; this indicates a
better-defined topic, subsequently leading to similar interpretations among judges. Care
must be taken, as agreement may also reflect collusion by workers seeking to increase
their scores. Meanwhile, disagreement can indicate that a topic is ambiguous or that
there are differences in the workers' knowledge and criteria. The trust weight therefore
depends on the ability to differentiate between the two. In the experiments conducted,
the observed levels of agreement were relatively high, which suggests two possibilities:
collusion between the workers or bias in their work. Since relevance labels were already
shown on their tasks, this could have affected their opinions. It was also found that
background knowledge or topic familiarity does contribute to differences of opinion.
Annotations are also given importance, as they may be more trustworthy given that
workers spend extra time and effort adding them. Three out of every four (76%)
comments were explanations of relevance decisions or short summaries, while around
15% were qualitative statements about the relevance of the content (Kazai & Milic-
Frayling, 2009). These comments may have been added as suggestions to the reviewers,
may signal ambiguous content, or may serve as a measure of relevance. Lastly, they
provide clues about the user's background and task (Kazai & Milic-Frayling, 2009).
A common problem encountered by search engines is vocabulary mismatch.
Existing work in information retrieval addressing it falls into two classes: query
expansion and document expansion (Chen & Zhang, 2009). Query expansion is
executed at query time, when terms related to the original query are added. Document
expansion, by contrast, modifies the documents themselves: the system adds words
related to each document at indexing time. Document expansion is seen as the more
desirable form, as it does not lengthen query response time with a long list of expanded
query terms (Chen & Zhang, 2009).
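The contrast between the two strategies can be sketched as follows; the related-terms
table is an illustrative stand-in for a real expansion source such as a thesaurus,
co-occurrence statistics, or user tags:

    # A minimal sketch contrasting the two expansion strategies described
    # above. The synonym table is an illustrative assumption, not a real
    # expansion resource.

    RELATED = {"car": ["automobile", "vehicle"], "photo": ["picture", "image"]}

    def expand_query(query):
        """Query expansion: add related terms at query time."""
        terms = query.split()
        return terms + [r for t in terms for r in RELATED.get(t, [])]

    def expand_document(doc_terms):
        """Document expansion: add related terms once, at indexing time."""
        return doc_terms + [r for t in doc_terms for r in RELATED.get(t, [])]

    print(expand_query("car photo"))
    # ['car', 'photo', 'automobile', 'vehicle', 'picture', 'image']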
The effectiveness of a search engine can be measured by inferring classical
precision and recall from click-through rates mined from other websites of the main
link, inferring the relevance of different information facets from click-through rates
mined from weblogs, and conducting user studies to determine user satisfaction with the
information retrieved on the web via navigation. Issues such as redundancy and
navigation effort should therefore be evaluated. It has been found that the Enhanced
Web Retrieval Task can be applied to numerous active areas in web IR, including
semantic relationships, opinions, sponsored content, geo-spatially localized results,
personalization of search, and multilingual support in search results (Ali & Consens,
2009).
Textual features comprise the self-contained textual blocks associated with an
object, usually with a well-defined topic or functionality. This type of analysis uses four
features: title, description, tags and comments. For scientific publications, the
description corresponds to the abstract and the comments to reviews. Textual features
may also be categorized according to the level of user collaboration allowed by the
application: a feature is either collaborative or restrictive. Collaborative features may be
altered or appended by any user, while restrictive features allow only the user who
created the object to apply changes. This is also referred to as tagging/annotation rights.
Usually, the title is restrictive while the comments are collaborative. These textual
features are characterized in four aspects: feature usage, amount of content, descriptive
and discriminative power, and content diversity. Feature usage indicates how
consistently a feature is provided; by this measure, the title offers the best quality of all
features in all applications, providing an anchor for understanding the other features and
whether they may be a reliable source of information. The amount of content determines
whether a feature is sufficient to be effective for IR. Heuristics are then used to assess
the descriptive and discriminative power of each feature, that is, whether it offers a
reasonably accurate description of the object's content and/or discriminates objects into
different pre-defined categories, graded by levels of relevance. Lastly, content diversity
across different features is measured in order to devise feature combination strategies. It
was found that restrictive features tend to be explored more often than collaborative
ones, although collaborative features carry a larger amount of content. Title and tags
both exhibit higher descriptive and discriminative power. Finally, there is significant
content diversity among features associated with the same object, indicating that each
feature carries different kinds of information about it (Figueiredo, et al., 2009).
Traditionally, there were only three kinds of data available to search engines for
describing pages: page content, link structure, and query or click-through log data
(Figueiredo, et al., 2009). An emerging fourth type is user-generated content that
describes pages directly through tags or bookmarks. There are two different strategies
for gathering datasets from such a website. The first is monitoring the recent feed, a
real-time form of tracking that does not include older posts. The second is the crawl
method, where tags are used to identify similar URLs that are then added to the queue;
it provides a relatively unfiltered view of the data but can be biased towards popular
tags, users and URLs. The two methods complement each other and are represented in
Figure 3, which shows (1) where the post metadata is acquired, (2 and 4) where the page
text and forward-link page text are acquired, and (3) where the backlink page text is
acquired.
A significant finding from the research (Heymann, Koutrika, & Garcia-Molina,
2008) was that social bookmarking, as a data source for search, covers URLs that are
often actively updated and prominent in search results. Tags were found to be
overwhelmingly relevant and objective, although they are often functionally determined
by context. Almost one in six tags appears in the title of the annotated page, and more
than half appear in the page text. Tags can also be predicted from other URLs, but the
sample sizes are still too small for this to be more effective than full-text search. The
quality of tags could be improved through user interface features.
Figure 3. Real-time Processing Pipeline (Heymann, Koutrika, & Garcia-Molina, 2008)
User-generated tags can carry substantially more semantic noise than terms
drawn from page content and search queries. Tags become more meaningful when they
are created by more users: the popular tags for a document provide better coverage of
query terms than the frequent content terms do, and the number of useful terms (titles,
categories, search keywords and descriptions) among the tags grows proportionally with
their popularity. While tag suggestions may bias users towards the more popular tags, it
was found that users did not prefer this kind of suggestion method; they were more
interested in the larger set of data than in its popularity. This, in turn, encourages users
to suggest a few tags of their own in the hope that these gain popularity as well
(Suchanek, Vojnovic, & Gunawardena, 2008).
A study examined tagging within folksonomies from a user-centric perspective
(Wetzker, Bauckhage, Zimmermann, & Albayrak, 2010). It found that users who tag for
content categorization develop distinct tag vocabularies over time. While this promotes
heterogeneity, the heterogeneity is reduced when the tags of many users are aggregated;
this is how characteristic tag distributions arise. A novel approach to tag translation was
introduced that maps user tags to the global folksonomy vocabulary using the labeled
resources as intermediates. These mappings serve as a basis for inferring the meaning of
user tags and predicting which tags a user will assign to new content. Tag translation
under this approach improves prediction accuracy for both tag recommendation and
tag-based social search. Extending the approach to narrow folksonomies would help in
understanding how user interests shift and change over time; by creating accurate user
models, quality of service could be improved.
Incorporating social annotations into document content is a natural idea,
especially for IR applications. A framework was proposed that combines information
retrieval modeling with the social annotations associated with documents (Zhou, Bian,
Zheng, Zha, & Giles, 2008). Therein, user domains are inferred from their social
annotations: language models built from tags are combined with those of the
documents, and user expertise is evaluated based on activity intensity. The study
suggests that the effect of parameter settings should be observed, especially their impact
on user experience.
CHAPTER 3: METHODOLOGY
3.1 Introduction
The purpose of the study was to determine whether popular internet bookmarking
tags can be recreated through crowdsourcing. Amazon Mechanical Turk, the online
marketplace for tasks that require human intelligence, was used as the means to conduct
the study. The study comprised multiple iterative experiments designed to achieve the
highest possible quality in popular tag reproduction. Delicious, an online service for
tagging, saving, and sharing bookmarks from a centralized location, supplied the golden
set: the most tagged websites and their tags were the tags to be ultimately reproduced in
this study. The key research questions of the study were examined along with a number
of factors regarding tag creation, including the effectiveness of crowdsourcing in
reproducing popular tags, which types of tags can be recreated most effectively, and the
relationship of worker characteristics and demographics to the effectiveness of
producing popular tags.
Based on these criteria, a quantitative quasi-experimental research design was
deemed to be appropriate. This chapter presents a discussion of the following
specifications: (a) the research design, (b) sample size, (c) research questions/hypotheses,
(d) variables, and finally (e) the data analysis that would be conducted in order to
comprehensively address the research objectives. A summary will conclude the chapter.
3.2 Research Design
This quantitative approach with a quasi-experimental correlational research
design primarily examined whether or not popular bookmarking tags can be recreated
through crowdsourcing. The main purpose of the research design is to provide a method
that allows for effective and efficient reproduction of popular tags using crowdsourcing.
To this end, a number of experiments were conducted. Each experiment provided useful
data that suggested modifications to the experimental design of the study, which helped
improve the tag recreation activity.
Figure 4. Iterative experimental design approach used in this study
The effectiveness of crowdsourcing in reproducing popular tags was examined
using a) quantitative data derived from online surveys and b) popular tags for the most
tagged websites on Delicious. Participants were gathered by posting tagging tasks on
Mechanical Turk. Each participant was required to go through a qualification survey
before he or she was trusted to take part in the research study. This quality assurance
step was necessary to protect against automated scripts and workers who were trying to
game the system. The quality assurance step had three main objectives: a) verifying that
the participants understood the task and what was requested of them, b) identifying
incomplete or nonsense responses, and c) identifying cheaters and preventing them from
participating in the study. Five websites were considered for the tagging tasks: YouTube,
Flickr, Pandora, Facebook, and Digg. Those sites were chosen because they are the
all-time most tagged sites on Delicious according to popacular.com, an online service
that tracks the most tagged web pages on Delicious at the following intervals: hourly,
8 hours, day, week, month, and all time.
The top 10 most popular tags for each of these sites were used in this study along
with data collected from the study participants' survey responses. The top 10 popular
tags served as a golden set against which to measure participants' ability to reproduce
the same tags, and tag creation effectiveness was explored against a number of
user-related factors. The analysis of these variables with respect to the objectives of the
study was completed by employing analysis of variance (ANOVA) and multiple linear
regression.
3.3 Appropriateness of Design
The use of a quasi-experimental research design allowed the determination of
whether there were statistically significant differences between groups (Cozby, 2001),
which in this study are the different tags and websites. The quasi-experimental design
was appropriate for assessing these differences because it allowed the researcher to
compare the levels or categories of the independent variables with regard to the
dependent variable in order to determine whether there was a difference between the
groups (Broota, 1989).
Moreover, this quasi-experimental correlational quantitative study specifically
investigated the relationships among tagging experience (both usage and creation),
search engine experience, interest in the website, and the participants' average daily time
spent on the Internet. For such an objective, a correlational design was appropriate. In
the context of social and educational research, correlational research is used to determine
the degree to which one factor may be related to one or more other factors under study
(Leedy & Ormrod, 2005).
The research design is quantitative because a comparison was made between
independent and dependent variables (Creswell, 2009). This means that the researcher
was able to assign numerical values to the independent and dependent variables so that a
comparison was possible.
The quantitative research approach was more appropriate for this research study
than a qualitative design because, with a qualitative design, the researcher would not be
able to assess a direct relationship between two variables as a result of the open-ended
questions (Creswell, 2009). A qualitative design is more appropriate for observational or
exploratory research that requires open-ended questions and possibly ethnographic
procedures. This study, however, follows a traditional deductive approach by building on
existing theories and operationalizing variables derived from previous empirical studies.
Quantitative research methods are most appropriate here since the researcher was able to
measure the variables needed for this study and define specific research questions
derived from existing research. Therefore, the quasi-experimental design was used, as it
allowed the researcher to determine whether there was a difference among the different
tags and websites based on the dependent variables.
In order to determine whether there was a difference in tag creation effectiveness
across the various sites in terms of tagging experience (both creation and usage), search
engine experience, and average daily time spent on the Internet, an analysis of variance
(ANOVA) was implemented. The ANOVA was appropriate because the purpose was to
determine whether there was a statistically significant difference between independent
populations (Moore & McCabe, 2006). In addition, a multiple linear regression analysis
was used to determine the relationship between the independent and dependent
variables. The dependent variable was tag creation effectiveness. The independent
variables were interest in the website, familiarity with the website, previous tag usage
experience, previous tag creation experience, experience with search engines, time spent
on the internet, and tag type. A multiple linear regression is appropriate because there are
multiple independent variables and only one dependent variable (Moore & McCabe,
2006).
3.4 Research Questions
A number of empirical studies concluded that social bookmarking tags can
provide search engines with additional data not provided by other sources and can
consequently improve web search. The same studies, however, concluded that there was
a lack of availability and distribution of the tags that can improve search. This study
focused on finding a way to create social bookmarking tags efficiently and effectively
using crowdsourcing.
The research questions and hypotheses that guided this study were:
RQ1: Are there statistically significant differences in tag creation effectiveness for
popular tags among the sites included in this study?
H10: There are no statistically significant differences in tag creation effectiveness
for popular tags among the sites included in this study.
RQ2: Are there statistically significant differences in tag creation effectiveness
across tag types?
H20: There is no statistically significant difference in tag creation effectiveness
across tag types.
RQ3: What is the relationship between tag creation effectiveness and Interest in
the Website Topic, Experience with Website or Similar Website, Tag Creation
Experience, Tag Usage Experience, Experience with Search Engine, and Time Spent on
the Internet?
H30: None of the independent variables of Interest in the Website Topic,
Experience with Website or Similar Website, Tagging Experience, Experience with
Search Engine, and Time on the Internet have a statistically significant effect on tag
creation effectiveness.
3.5 Population
The participants for this study were selected by posting tagging tasks on
Mechanical Turk. All participants were subjected to an initial qualification survey before
being allowed to participate in the study. Information related to the tagging tasks was
collected from the participants and subjected to analysis.
3.6 Sampling
When calculating the sample size for a study, several factors have to be taken
into consideration: the power, the effect size, and the level of significance of the study.
Statistical power is the probability of rejecting a false null hypothesis. As a general rule
of thumb, the minimum power necessary for a study to reject a false null hypothesis is
80% (Keuhl, 2000).
The next important factor is the effect size. The effect size is a measurement of
the strength of the relationship between the independent and dependent variables in the
analysis (Cohen, 1988). In most instances, the effect size of the study can be divided into
three different categories: small, medium, and large.
Finally, the last two important considerations for the correct calculation of the
sample size are the level of significance and the statistical procedure. The level of
significance is usually set at alpha equal to 5%, the typical standard for statistical
significance. The statistical procedure must also be taken into account: simple t-tests
require a smaller sample than multiple regressions, so the most complicated method
determines the sample size. In this case, multiple linear regression was used. Based on
this information, the minimum sample size required for this study was 74 (specified as a
medium effect size, a power of 95%, and a level of significance equal to 5%). In this
study, however, the overall number of participants gathered and used for the analysis
was 107.
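For readers who wish to check such a calculation without a dedicated power tool, a
Monte Carlo approximation along the following lines can be used. The sketch simulates
data at a given Cohen's f2 and counts how often the overall regression F test rejects; the
effect size, predictor count, and simulation settings are illustrative assumptions, not the
exact procedure used for this study:

    # Illustrative Monte Carlo power check for a multiple linear regression.
    # Settings (f2 = 0.15, 7 predictors, alpha = .05) are assumptions for
    # the sketch, not the study's exact power computation.
    import numpy as np
    from scipy import stats

    def simulated_power(n, n_predictors=7, f2=0.15, alpha=0.05,
                        runs=2000, seed=0):
        rng = np.random.default_rng(seed)
        r2 = f2 / (1 + f2)              # convert Cohen's f^2 to implied R^2
        rejections = 0
        for _ in range(runs):
            X = rng.standard_normal((n, n_predictors))
            beta = np.full(n_predictors, np.sqrt(r2 / n_predictors))
            y = X @ beta + rng.standard_normal(n) * np.sqrt(1 - r2)
            Xd = np.column_stack([np.ones(n), X])
            coef, _, _, _ = np.linalg.lstsq(Xd, y, rcond=None)
            resid = y - Xd @ coef
            ss_res = resid @ resid
            ss_tot = ((y - y.mean()) ** 2).sum()
            df_num, df_den = n_predictors, n - n_predictors - 1
            F = ((ss_tot - ss_res) / df_num) / (ss_res / df_den)
            if stats.f.sf(F, df_num, df_den) < alpha:  # overall F test
                rejections += 1
        return rejections / runs

    print(simulated_power(74))  # estimated power at n = 74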
3.7 Instrumentation and Data Collection
The information used for this study came from two sources:
1. Popacular.com, which was used to obtain the top 5 most tagged web pages on
Delicious. The researcher used the all-time data for most tagged sites; other
options include hourly, 8 hours, daily, weekly, and monthly.
2. A survey presented on Mechanical Turk (see Appendix A). The survey
gathered key demographic information about the participants along with
information pertaining to the tagging tasks, including age, gender, education
level, the participant's interest in the site, familiarity with the site, experience
with search engines, time typically spent on the Internet, and tag creation and
usage experience, if any. The collection of data was administered through the
Mechanical Turk system.
The researcher used an iterative survey research design, updating the survey and
qualification requirements until the desired quality was achieved. There were three
iterations of this survey in total, and each iteration produced tags that overlapped more
with the golden set of popular tags gathered from popacular.com. The researcher found
that a fourth iteration did not provide any tag quality benefit and decided to lock in the
design and instructions of the third survey. Mechanical Turk allowed workers to
comment on tasks and provide feedback to requesters; the researcher found this feature
very useful, as it helped quickly identify ambiguous questions and task instructions and
improve them in a relatively short period of time. The initial survey design yielded low
quality responses because turkers try to game the system by attempting to complete a
high number of human intelligence tasks (HITs) in the shortest possible time. The
original task given to the selected participants was priced at $0.02 (2 cents).
The initial survey did not have a qualification requirement, so in the second
iteration the researcher added one for the available HITs. The qualification requirements
were mainly geared to ensure that the workers were invested in the task and intended to
perform it well. Some of these requirements included questions about the "about us"
sections of the sites included in the study. The questions were brief but ranged from
asking the worker how many images were present on a certain web page to finding a
sentence on the page and filling in its missing words in the survey. In this second
iteration a number of workers provided feedback regarding some questions or tasks. In
the third iteration, the researcher introduced a survey with improved instructions and
clearly stated questions. This was the last iteration, and it provided the highest quality
results (later iterations did not add any significant improvement). The researcher then
finalized the survey design and launched the actual study.
The final survey contained the enhanced version of the instructions, the
qualification requirement, and the questions related to the 5 websites. The final survey
HIT was priced at $0.02 (2 cents). The average time for a participant to complete the
survey was 15 minutes; the shortest time was 12 minutes and the longest was 19. The
responses were very reliable, and the final survey responses were completed within 5
hours.
The raw data from Mechanical Turk was then downloaded for statistical analysis.
A unique identification number was assigned to each of the participants so that no
personal information was revealed or exposed (Cozby, 2001). This identification number
was used to identify each participant in the study.
3.8 Operationalization of Variables
The following variables and their specifications were used in the analysis.
Tag Creation Effectiveness (TCE): Dependent continuous variable. TCE was
calculated as the proportion of the participant-created tags that appear on the popular tag
list generated by the social network users. The top 10 popular tags were used for each
site, and each tag was given a weight representing the frequency with which Delicious
users applied it. For example, if 100,000 users used tag1 and 50,000 users used tag2,
then tag1 is assigned a higher score than tag2, reflecting its frequency of use. More
popular tags, i.e. those employed by more users, therefore contribute greater variance to
this variable and thus yield a more robust analysis.
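One plausible reading of this definition, consistent with the tag weights reported in
Tables 5 through 9 (weight = number of taggers / total taggers), is sketched below; the
exact aggregation used in the study may differ, and the example values are taken from
the Digg data in Table 9:

    # A minimal sketch of the TCE score as defined above: popular tags are
    # weighted by usage frequency, and a participant's score is the total
    # weight of the popular tags he or she reproduced. This is one plausible
    # reading of the definition, not necessarily the study's exact formula.

    def tag_weights(tagger_counts):
        total = sum(tagger_counts.values())
        return {tag: n / total for tag, n in tagger_counts.items()}

    def tce(participant_tags, weights):
        matched = {t.lower() for t in participant_tags} & set(weights)
        return sum(weights[t] for t in matched)

    # Digg golden tags and tagger counts from Table 9.
    digg = {"news": 25629, "technology": 12263, "blog": 9405, "web2.0": 9041,
            "social": 7090, "tech": 6947, "daily": 5445, "community": 4920,
            "links": 2732, "web": 2628}
    w = tag_weights(digg)
    print(round(tce(["news", "blog", "funny"], w), 3))  # 0.407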
Tag Type (TT): Independent categorical variable. Tag type was designed to
categorize the type of tags created. The researcher used the tag classification schema
provided by Bischoff et al. (2008), which includes: Topic, Time, Location, Type,
Author/Owner, Opinion/Qualities, Usage Context, and Self.
Interest in the Website Topic (Interest): Independent ordinal variable. Interest was
assessed through a 2 point Likert-scale question with 1 being most interested and 0 being
least interested.
Experience with Website or Similar Website (Experience): Independent ordinal
variable. Experience was assessed through a 5 point Likert-scale question with 4 being
most experienced and 0 being least experienced.
Previous Tag Creation Experience (TCX): Independent ordinal variable. This
variable was assessed through a 5 point Likert-scale question with 4 being most
experienced and 0 being least experienced.
Previous Tag Usage Experience (TUX): Independent ordinal variable. This
variable was assessed through a 5 point Likert-scale question with 4 being most
experienced and 0 being least experienced.
Experience with Search Engine: Independent ordinal variable. This variable was
assessed through a 5 point Likert-scale question with 4 being most experienced and 0
being least experienced.
Average Daily Time Spent on the Internet: Independent ordinal variable. This
variable was assessed through a 4 point Likert-scale question with 3 being most time and
0 being least time.
3.9 Data Analysis
The data analysis used in this study comprised descriptive statistics, analysis of
variance (ANOVA), and multiple linear regression. Each of these analyses was
conducted in SPSS Version 16.0®.
3.9.1 Descriptive Statistics
The descriptive statistics comprised frequency distributions as well as measures
of central tendency. For the frequency distributions, the number and percentage of each
occurrence are presented for the categorical variables in the study. The measures of
central tendency include the mean, standard deviation, and minimum and maximum
values for the continuous variables in the study, such as the age of the participants.
3.9.2 ANOVA
As a subsequent analysis, an ANOVA was conducted for the first and second
hypotheses. The ANOVA is a statistical method used to determine whether one or more
independent variables have a significant impact on a single dependent variable. An
advantage of the ANOVA is that it allows the researcher to include more than one
independent variable in the model at the same time, in order to determine the effect of
each variable or to control for specific variables (Tabachnick & Fidell, 2001). In other
words, the researcher is not limited to including only one variable in the analysis. This is
important since it allows the researcher to control for a number of variables that may be
related to the dependent variable.
When the variables have been included in the ANOVA model, the results
indicate whether one or several independent variables contribute to explaining the
variation in the dependent variable (Tabachnick & Fidell, 2001). That is, if a variable is
found to be significant, it can be concluded that this variable significantly contributes to
the explanation of the variation in the dependent variable (Keuhl, 2000). The
significance of the test is based on an F-statistic from the F-distribution (Keuhl, 2000).
Therefore, if the F-statistic exceeds the critical value, one can conclude that there is a
relationship between the independent and dependent variables.
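A minimal illustration of such a one-way ANOVA, using made-up group scores rather
than data from this study, is the following:

    # One-way ANOVA comparing mean scores across three groups using scipy.
    # The group values are illustrative, not data from the study.
    from scipy import stats

    site_a = [0.66, 0.71, 0.58, 0.64, 0.69]
    site_b = [0.52, 0.55, 0.61, 0.49, 0.57]
    site_c = [0.75, 0.78, 0.72, 0.80, 0.74]

    F, p = stats.f_oneway(site_a, site_b, site_c)
    print(F, p)  # reject the null of equal means if p < 0.05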
3.9.3 Multiple Linear Regression
A multiple linear regression model was used specifically for the third research
question. The dependent variable was tag creation effectiveness. The independent
variables were Interest in the Website Topic, Experience with Website or Similar
Website, Tag Creation Experience, Tag Usage Experience, Search Engine Experience,
and Time Spent on the Internet. A multiple linear regression is appropriate because there
are multiple independent variables and only one dependent variable. This is the most
complex of the analyses, because more assumptions have to be made in order to draw
valid inferences about the target population. One limitation of this multivariate analysis
is that the regression residuals must be normally distributed. Statistically significant
parameter estimates for the multiple linear regression at the 0.05 significance level
would be sufficient evidence to reject the null hypothesis.
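A sketch of this regression setup, using hypothetical column names for the survey
variables (the study itself used SPSS), could look as follows:

    # A minimal sketch of the multiple linear regression described above,
    # using statsmodels OLS. The data frame and its column names are
    # hypothetical stand-ins for the survey responses.
    import statsmodels.api as sm

    def fit_tce_regression(df):
        # df is expected to be a pandas DataFrame of survey responses.
        predictors = ["interest", "familiarity", "tag_creation_exp",
                      "tag_usage_exp", "search_engine_exp", "time_on_internet"]
        X = sm.add_constant(df[predictors])   # add the intercept term
        model = sm.OLS(df["tce"], X).fit()
        return model                          # model.summary() reports B, t, p

    # Usage (with a hypothetical data frame named survey_df):
    # results = fit_tce_regression(survey_df)
    # print(results.summary())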
3.10 Summary
This chapter presented the type of research design that was used: a
quasi-experimental correlational design. This design was chosen because the objective of
the study is to determine whether there are significant relationships between tag creation
effectiveness and a number of independent variables. Mechanical Turk workers were
surveyed and used as participants for this study. In terms of the statistical analysis, three
separate statistical procedures were used: descriptive analysis, ANOVA, and multiple
linear regression were deemed the most appropriate methodologies for testing the
hypotheses of the study. This chapter also discussed the source of the data, the research
questions and procedures, the hypotheses and the data collection. The data analysis and
results will be discussed in Chapters 4 and 5.
CHAPTER 4: RESULTS
In this chapter, the results of the statistical analyses that were conducted to address the
objectives of the study are presented. The chapter is organized in the following manner:
4.1 Introduction
4.2 Collected Data and Overview of Sample Population
4.3 Hypothesis Data Analysis
4.4 Summary
4.1 Introduction
At a high level, this study examined whether popular tags can be reproduced using
crowdsourcing systems. To that end, there were three research questions – two primary
ones and one secondary. The first research question (RQ1) examined the tag creation
effectiveness of popular tags across the sites included in our study. The second research
question (RQ2) examined those relationships by tag type to find out if certain types of
tags are easier to reproduce by employing crowdsourcing workers. The third research
question (RQ3) was mainly concerned with exploring the relationship between tag
creation effectiveness and the following user specific factors: time spent on the Internet,
experience with search engines, interest in the site, familiarity with the site or similar
sites, previous tag creation experience, and previous tag usage experience.
4.2 Collected Data and Overview of Sample Population
This study included data sets from three main sources: a) survey responses from
Mechanical Turk study participants, b) popacular.com for the most tagged sites on
Delicious, and c) Delicious for the golden set of the top 10 popular tags used in this
study.
4.2.1 Mechanical Turk Population and Survey Descriptive Statistics
Amazon Mechanical Turk is an online marketplace launched in 2005 to facilitate
the completion of tasks that require human intelligence. The service provides requesters
with a diverse, on-demand, scalable workforce while giving workers the flexibility to
work from anyplace at anytime and the freedom to choose from thousands of tasks.
Mechanical Turk is based on the simple idea that there are many tasks that human beings
can do much better and more effectively than computers. Tasks in this marketplace
range from identifying objects in a photo or video and performing data inspection and
clean-up to translation, transcription of audio recordings, and researching and validating
data. At the time of this study the Amazon Mechanical Turk marketplace had about
85,000 tasks available to workers. At that time Mechanical Turk was sometimes viewed
as a sweatshop that takes advantage of people by making them do tedious tasks in
exchange for pennies, and many people wondered about the worker population and its
demographics. In 2008 Panos Ipeirotis, a researcher at the Stern School of Business of
New York University, conducted an extensive survey that revealed data regarding the
demographics of Mechanical Turk workers and showed that the early ideas about who
these people were were far from accurate. According to the 2008 survey, about 76% of
the workers were from the US, 8% were from India, 3.5% were from the United
Kingdom, 2.5% were from Canada, and the remaining 9% were distributed across a
large number of countries. The survey also revealed that about 59% of workers were
female. The age distribution favored the group between 21 and 40 years of age. Figure 5
shows the details of the age distribution.
Figure 5: Age Distribution of Mechanical Turk Workers
As for education level, about 52% of the workers had a bachelor's degree. Figure 6
shows the details of the education distribution of the Mechanical Turk population:
Figure 6: Education Distribution of Mechanical Turk Workers
The survey also provided information about why workers participate in the Mechanical
Turk marketplace – i.e., what motivates them to complete these tasks? Figure 7 shows
that "for money only," "for money and fun," and "for money, fun, and killing time" were
the three primary reasons for participation.
Figure 7: Primary Reasons for Participation
In summary, it was concluded that the Mechanical Turk population is a good
representation of online users.
The study sample included responses from 107 total Mechanical Turk participants
and all the responses were used for the statistical analysis. Sixty-four of these participants
(59.8%) were female. The age of the participants ranged from 18 to 66 years (M = 42.21,
SD = 10.92). Table 2 presents descriptive statistics on the participants’ education,
experience with search engines, time spent on the internet, tagging usage experience, and
tag creation effectiveness (agreement scores between participant’s provided tags and
popular tags) for each of the five sites included in the study. As can be gleaned from this
table, the average tag creation effectiveness ranged from .5665 (Digg) to .7561
(Facebook).
Table 2
Descriptive Statistics of Study Sample

Variable                     Minimum   Maximum   Mean      Std. Deviation
Age                          18.00     66.00     42.2150   10.92170
Education                    1.00      7.00      3.3271    1.62397
ESE                          .00       4.00      3.0187    .85761
Time Spent on the Internet   .00       3.00      2.0374    .86793
Previous Tag Usage Exp       .00       4.00      2.1215    1.37848
Previous Tag Creation Exp    .00       4.00      1.5981    1.18050
Agreement score – Youtube    .35       .86       .6577     .18398
Agreement score – Flickr     .36       .83       .5713     .18366
Agreement score – Pandora    .27       .85       .6084     .20645
Agreement score – Facebook   .47       .85       .7561     .14264
Agreement score – Digg       .36       .71       .5665     .13884
4.2.2 Popacular and Delicious Data for Most Tagged Sites
Popacular is a site that offered data about popular Delicious bookmarks and
related user tagging activities, including a list of the 100 most tagged sites on Delicious,
how many users tagged each of these sites, and lists for several time windows of tagging
activity: hourly, daily, weekly, monthly, and an all-time category of most tagged sites.
The all-time list reflected the least fluctuation in activity and changes to the sites over
time, while the hourly lists reflected the most changes in the list of sites and frequency of
tagging. The all-time list of 100 sites had been tagged by a total of 3,328,778 users; the
top-ranked site was tagged by 91,345 users and the last site on the list by 21,370 users
(M = 33,287.78, SD = 12,720.73). Appendix B provides the complete list of the 100
most tagged sites.
The top 5 most tagged sites were chosen for this study. Table 3 provides detailed
information about the number of users that tagged each of these sites.
Table 3
Data for the All-Time Top 5 Most Tagged Sites

Site       Brief Description   No. of Unique Taggers
Youtube    Video sharing       91,347
Flickr     Photo sharing       79,982
Pandora    Music sharing       62,186
Facebook   Relationships       62,007
Digg       News sharing        58,237
Delicious.com provided the list of tags for the five sites included in this study.
Tables 5 through 9 below provide more data regarding the tags used and their frequency
of use for each site.
Tag type was determined using the classification schemes developed by Sen et al.
(2006). Table 4 shows the available tag type classification schemas and the mapping
between them.
Table 4
Mapping between tag classification schemes

Bischoff et al.     Golder et al.                   Xu et al.        Sen et al.
Topic               What or who it is about         Content-based    Factual
Time                Refining categories             Context-based    Factual
Location            Refining categories             Context-based    Factual
Type                What it is                      Attribute        Factual
Author/Owner        Who owns it                     Attribute        Factual
Opinion/Qualities   Qualities and characteristics   Subjective       Subjective
Usage Context       Task organization               Organizational   Personal
Self-Reference      Self-Reference                  Organizational   Personal
Table 5
Site 1 - YouTube Tagging Data from Delicious

Tag             # of Taggers   Tag Weight    Sen et al. Tag Type (Category)
Video           26,000         0.258654994   F
youtube         18,280         0.181854357   F
Videos          16,906         0.168185436   F
entertainment   9,221          0.091732988   F
Media           7,559          0.075198965   F
web2.0          6,747          0.067120971   F
Social          4,649          0.046249503   S
Fun             4,626          0.046020692   S
Music           3,391          0.03373458    F
Community       3,141          0.031247513   F
Total           100,520        1
Table 6
Site 2 - Flickr Tagging Data from Delicious

Tag           # of Taggers   Tag Weight    Tag Type (Category)
Photos        22,755         0.197635839   F
Flickr        19,077         0.165691009   F
photography   15,990         0.138879238   F
Photo         15,256         0.132504169   F
Sharing       10,670         0.092673013   F
Images        9,650          0.083813924   F
web2.0        9,542          0.082875903   F
Community     4,586          0.039831156   F
social        3,805          0.033047874   S
Pictures      3,805          0.033047874   F
Total         115,136        1

Table 7
Site 3 - Pandora Tagging Data from Delicious

Tag               # of Taggers   Tag Weight    Tag Type (Category)
Music             41,731         0.369291081   F
Radio             24,403         0.215950019   F
Pandora           8,181          0.072396308   F
streaming         8,149          0.07211313    F
Audio             7,560          0.066900879   F
Free              6,210          0.054954293   F
web2.0            6,010          0.053184429   F
mp3               4,908          0.043432475   F
recommendations   3,000          0.026547968   F
Social            2,851          0.025229419   S
Total             113,003        1
Table 8
Site 4 - Facebook Tagging Data from Delicious

Tag                # of Taggers   Tag Weight    Tag Type (Category)
Facebook           16,466         0.231335525   F
Social             15,174         0.213183849   F
Networking         9,711          0.136432606   F
Friends            8,272          0.116215685   F
Community          6,590          0.092584787   F
socialnetworking   4,732          0.066481216   F
web2.0             4,448          0.062491219   F
network            3,104          0.04360898    F
Blog               1,443          0.020273118   S
Personal           1,238          0.017393015   S
Total              71,178         1
Table 9
Site 5 - Digg Tagging Data from Delicious

Tag          # of Taggers   Tag Weight    Tag Type (Category)
News         25,629         0.297665505   F
Technology   12,263         0.14242741    F
Blog         9,405          0.109233449   F
web2.0       9,041          0.105005807   F
Social       7,090          0.082346109   F
tech         6,947          0.08068525    F
Daily        5,445          0.063240418   S
community    4,920          0.057142857   F
Links        2,732          0.031730546   F
web          2,628          0.030522648   F
Total        86,100         1
4.3 Hypothesis Data Analysis
Hypothesis 1
Null hypothesis 1 stated: "There are no statistically significant differences in tag
creation effectiveness for popular tags among the sites included in this study." To assess
whether there was a statistically significant difference in tag creation effectiveness
among the sites, a repeated-measures ANOVA was conducted. The dependent variables
in this analysis were the tag creation effectiveness scores for all five sites.
Results from the repeated-measures ANOVA showed that there were indeed
significant differences in tag creation effectiveness across the sites (F(1, 106) = 70.597,
p < 0.001). In order to assess which sites were significantly different from the others,
multiple pairwise comparisons were conducted using a Bonferroni correction. The
results are presented in Table 10.
Table 10
Pairwise Comparisons of Tag Creation Effectiveness among Sites

(I) Site   (J) Site   Mean Difference (I-J)   Std. Error   Sig.
1          2          .086*                   .011         .000
1          3          .049*                   .012         .001
1          4          -.098*                  .014         .000
1          5          .091*                   .011         .000
2          1          -.086*                  .011         .000
2          3          -.037                   .014         .109
2          4          -.185*                  .015         .000
2          5          .005                    .012         1.000
3          1          -.049*                  .012         .001
3          2          .037                    .014         .109
3          4          -.148*                  .015         .000
3          5          .042                    .016         .082
4          1          .098*                   .014         .000
4          2          .185*                   .015         .000
4          3          .148*                   .015         .000
4          5          .190*                   .011         .000
5          1          -.091*                  .011         .000
5          2          -.005                   .012         1.000
5          3          -.042                   .016         .082
5          4          -.190*                  .011         .000
As can be gleaned from this table, most of the pairwise comparisons were
statistically significant. Site 4 had the highest average tag creation effectiveness (M =
0.756), and it was significantly higher than all other sites. Site 1 had the second highest
average tag creation effectiveness (M = 0.657), and it was also significantly different
from all other sites. The lowest average tag creation effectiveness was observed for Site
5 (M = 0.566), although its average was not significantly different from that of sites 2 or
3. Based on these results, Null Hypothesis 1 was rejected.
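A hedged sketch of such Bonferroni-corrected pairwise comparisons, using paired
t-tests over hypothetical per-participant score arrays (the study itself used SPSS's
repeated-measures procedure), is shown below:

    # Bonferroni-corrected pairwise comparisons via paired t-tests.
    # The scores dict is an assumed input: per-participant TCE values per site.
    from itertools import combinations
    from scipy import stats

    def pairwise_bonferroni(scores, alpha=0.05):
        """scores: dict mapping site name -> list of per-participant values."""
        pairs = list(combinations(scores, 2))
        corrected_alpha = alpha / len(pairs)   # Bonferroni correction
        for a, b in pairs:
            t, p = stats.ttest_rel(scores[a], scores[b])
            print(f"{a} vs {b}: t = {t:.3f}, p = {p:.4f}, "
                  f"significant: {p < corrected_alpha}")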
Hypothesis 2
Null hypothesis 2 stated “There is no statistically significant difference in tag
creation effectiveness across tag types.” To assess whether there was a statistically
significant difference in tag creation effectiveness across tag types, a series of ANOVAs
were conducted. Specifically, one ANOVA for each site was used. The dependent
variable in this analysis was the participants’ tag creation effectiveness for the site,
whereas the grouping variable was the tag type for each participant for that site. There
were two categories of tag types: “F” and “F and S.” It is important to note that, for Site
4, all participants had the same tag type (F). Therefore, no comparison was possible for
this site. The analyses were thus limited to Sites 1, 2, 3 and 5. The results are presented in
Table 11.
As can be gleaned from this table, tag creation effectiveness was significantly
higher (p < 0.001 in all cases) for the “F and S” tag types (with average tag creation
effectiveness ranging from .715 to .838) than for “F” tag types (with average tag creation
effectiveness ranging from .455 to .558). Therefore, Null Hypothesis 2 was rejected.
Table 11
Comparison of Tag Creation Effectiveness by Tag Types at Sites 1, 2, 3 and 5

            F               F and S
            M       SD      M       SD      F(1, 105)   p
Site 1      .558    .156    .838    .017    120.780     <.001
Site 2      .535    .166    .833    .000    41.693      <.001
Site 3      .525    .175    .833    .073    85.533      <.001
Site 5      .455    .067    .715    .000    681.526     <.001

Note: The F statistic corresponds to the test statistic of the ANOVA comparing tag
creation effectiveness between the "F" and "F and S" groups. The p value is the one
associated with that test.
Hypothesis 3
Null Hypothesis 3 stated: “None of the independent variables of Interest in the
Website Topic, Experience with Website or Similar Website, Tagging Experience,
Search Engine Experience, and Time on the Internet have a statistically significant effect
on tag creation effectiveness.” To determine the relationship between tag creation
effectiveness and a set of predictor variables, a series of multiple linear regression
analysis procedures were used. Specifically, five regressions were estimated; one for each
site. The dependent variable in these analyses was tag creation effectiveness for the site.
The predictor variables were: interest in the site, familiarity with site, previous tag usage
experience, previous tag creation experience, experience with search engines (ESE), time
spent on the internet, and tag types.
It is important to note that some of the predictor variables were constant for some
of the sites and thus had to be removed from the analysis. For example, as explained
previously, it was not possible to use tag type as a predictor variable for Site 4.
Additionally, for Site 4, the variable "familiarity with the site" had to be dropped for the
same reason. The regression results are presented in Tables 12 through 16.
Table 12
Regression Results for Site 1

Variable                           B       Std. Error   Beta    t        Sig.
(Constant)                         .480    .025                 18.870   .000
Interest in Site                   -.108   .018         -.234   -6.021   .000
Familiarity with Site              .098    .011         .476    8.816    .000
Previous Tag Usage Exp             .067    .007         .506    10.062   .000
Previous Tag Creation Exp          .098    .010         .629    10.290   .000
ESE                                .008    .007         .037    1.188    .238
Time Spent on the Internet         .000    .012         -.001   -.016    .987
Participant Tag Types (=F and S)   -.189   .023         -.493   -8.374   .000

R2 = .951; F(7, 99) = 272.131, p < 0.001
Table 13
Regression Results for Site 2

Variable                           B       Std. Error   Beta    t        Sig.
(Constant)                         .318    .017                 18.530   .000
Interest in Site                   .228    .016         .619    13.967   .000
Familiarity with Site              .034    .009         .167    3.658    .000
Previous Tag Usage Exp             .008    .005         .057    1.605    .112
Previous Tag Creation Exp          .031    .007         .199    4.497    .000
ESE                                .002    .005         .009    .400     .690
Time Spent on the Internet         -.012   .008         -.057   -1.457   .148
Participant Tag Types (=F and S)   .052    .014         .093    3.714    .000

R2 = .972; F(7, 99) = 491.780, p < 0.001
Table 14
Regression Results for Site 3

Variable                           B       Std. Error   Beta    t        Sig.
(Constant)                         .427    .021                 20.780   .000
Interest in Site                   -.036   .010         -.073   -3.633   .000
Familiarity with Site              .186    .007         1.072   26.393   .000
Previous Tag Usage Exp             .007    .006         .045    1.106    .271
Previous Tag Creation Exp          -.007   .008         -.042   -.903    .369
ESE                                -.003   .006         -.011   -.456    .650
Time Spent on the Internet         .007    .010         .029    .682     .497
Participant Tag Types (=F and S)   -.054   .013         -.117   -4.191   .000

R2 = .968; F(7, 99) = 423.770, p < 0.001
Table 15
Regression Results for Site 4

Variable                     B       Std. Error   Beta    t        Sig.
(Constant)                   .379    .032                 11.842   .000
Interest in Site             .183    .014         .965    13.071   .000
Previous Tag Usage Exp       .014    .010         .132    1.427    .157
Previous Tag Creation Exp    -.008   .013         -.066   -.634    .527
ESE                          .002    .009         .012    .213     .832
Time Spent on the Internet   -.028   .016         -.169   -1.749   .083

R2 = .825; F(5, 101) = 95.043, p < 0.001
Table 16
Regression Results for Site 5

Variable                           B       Std. Error   Beta    t        Sig.
(Constant)                         .131    .013                 9.791    .000
Interest in Site                   .134    .008         .379    16.035   .000
Familiarity with Site              -.015   .006         -.125   -2.664   .009
Previous Tag Usage Exp             .007    .003         .068    2.135    .035
Previous Tag Creation Exp          .000    .004         .002    .052     .959
ESE                                .003    .003         .016    .868     .387
Time Spent on the Internet         -.001   .005         -.006   -.175    .862
Participant Tag Types (=F and S)   .232    .011         .832    20.499   .000

R2 = .981; F(7, 99) = 716.785, p < 0.001
As can be gleaned from these tables, the predictive power of the models was high
in all five cases, with R2 statistics ranging from .825 (for Site 4) through .981 (for Site 5).
This suggests that the chosen set of predictor variables was enough to explain a very
large proportion of the variability in tag creation effectiveness.
The following conclusions can be derived from the regression results. First, it is
apparent that experience with search engines and time spent on the internet did not have a
significant effect on tag creation effectiveness for any of the sites.
For Site 1, tag creation effectiveness was significantly and negatively related with
interest in site and positively related with familiarity with site, previous tag usage
experience, and previous tag creation experience.
For Site 2, tag creation effectiveness was significantly and positively related with
interest in site, familiarity with site, and previous tag creation experience. However, tag
usage experience was not significantly related to tag creation effectiveness.
For Site 3, tag creation effectiveness was significantly and negatively related with
interest in site and positively related with familiarity with site. Moreover, for Site 4, tag
creation effectiveness was significantly and positively related only with interest in site.
Finally, for Site 5, tag creation effectiveness was significantly and positively
related with interest in site and previous tag usage experience. Additionally, it was
significantly and negatively related with familiarity with site.
4.4 Summary
The purpose of the study was to determine whether popular internet bookmarking
tags can be recreated through crowdsourcing. Based on the results from the statistical
analysis, it was found that Sites 4 and 1 had the highest average tag creation
effectiveness, while the lowest one was associated with Site 5. Moreover, it appears that
tag creation effectiveness was significantly higher for tag type “F and S” than for tag type
“F.” Additionally, other variables were tested to assess their relationship with tag creation
effectiveness. Interest in site, familiarity with site, tag creation experience and tag usage
experience were significantly related to tag creation effectiveness for some of the sites,
although the direction and significance of these relationships was not consistent across
sites.
CHAPTER 5: CONCLUSIONS AND RECOMMENDATIONS
The purpose of this experimental study was to determine whether popular internet
bookmarking tags can be recreated through crowdsourcing. Using Amazon Mechanical
Turk as the means to conduct the experiment, the reproduction of popular Delicious tags
for a variety of websites was successfully achieved. Additional objectives of the study
were to examine a number of factors regarding tag creation, including the effectiveness
of crowdsourcing in reproducing popular tags, which tags can be recreated most
effectively, and the relationship of worker characteristics and demographics to the
effectiveness of producing popular tags.
The dependent variable for the study was tag creation effectiveness, while the
independent variables were tag type, interest in the website topic, experience with the
website or a similar website, tag creation and usage experience, search engine
experience, and average daily time spent on the internet. An analysis of variance
(ANOVA) and multiple linear regression were conducted to determine the relationships
among the independent and dependent variables.
Chapter 5 provides interpretations of the findings reported in Chapter 4 as they
relate to the research questions and the literature reviewed. Chapter 5 also provides
recommendations in terms of the significance of the study. Recommendations for future
research and a brief summary conclude the chapter.
5.1 Scope, Limitations, Delimitations
The scope of the present study was limited to the participants recruited by posting
tagging tasks on Mechanical Turk. Mechanical Turk workers are believed to be an
acceptable representation of online users according to surveys conducted in 2008 and
2010 by Panos Ipeirotis, an associate professor in the IOMS Department at the Stern
School of Business of New York University. However, online users are quite dynamic:
their demographics and interests are always shifting and changing. The same applies to
the Amazon Mechanical Turk worker community. Since one community is a subset of
the other (Mechanical Turk workers are also online users), it is reasonable to assume that
the two communities share characteristics to some degree. However, the study may not
be generalizable beyond the scope of these participants, as they may not represent the
total user population of the sites in consideration or the population that uses search
engines to find information online. Limitations include the non-randomization of the
selected participants and the truthfulness of the answers given by the participants, which
could limit the analysis and interpretation of the study's results.
5.2 Findings and Implications
A total of 107 participants were used to gather data, and the results were subjected
to statistical analysis to answer the study's research questions. This section outlines the
findings and their implications for the field of bookmarking tags and crowdsourcing,
organized by research question.
Research Question 1: Are there statistically significant differences in tag creation
effectiveness for popular tags among the sites included in this study?
H10: There are no statistically significant differences in tag creation effectiveness
for popular tags among the sites included in this study.
Results from the repeated-measures ANOVA showed that there were indeed
significant differences in tag creation effectiveness across the sites. In order to assess
which sites were significantly different from one another, multiple pairwise comparisons
were conducted using a Bonferroni correction. This analysis revealed that Site 4 had the
highest average tag creation effectiveness, Site 2 ranked second, and Site 5 had the
lowest average among the five sites included in the study. One possible reason for this is
the popularity and wide membership of Sites 4 and 1, especially when compared to Site
5. At the time of this study, Sites 4, 1, and 5 had 500+ million, 70+ million, and 2.7+
million active members, respectively.
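The analysis for this question was run in a standard statistical package; the sketch below is only a minimal illustration (not the original workflow) of a repeated-measures ANOVA followed by Bonferroni-corrected pairwise comparisons. The file name and the columns participant, site, and effectiveness are assumptions, not names used in the study.

```python
# Minimal sketch of the RQ1 analysis: repeated-measures ANOVA across the five
# sites, then all pairwise paired t-tests at a Bonferroni-corrected alpha.
# File and column names are illustrative.
import itertools

import pandas as pd
from scipy import stats
from statsmodels.stats.anova import AnovaRM

df = pd.read_csv("tag_effectiveness.csv")  # long format: participant, site, effectiveness

# Repeated-measures ANOVA: does mean effectiveness differ across sites?
res = AnovaRM(df, depvar="effectiveness", subject="participant", within=["site"]).fit()
print(res.anova_table)

# Bonferroni-corrected pairwise comparisons (10 pairs for 5 sites).
wide = df.pivot(index="participant", columns="site", values="effectiveness")
pairs = list(itertools.combinations(wide.columns, 2))
alpha = 0.05 / len(pairs)
for a, b in pairs:
    t, p = stats.ttest_rel(wide[a], wide[b])
    print(f"{a} vs {b}: t = {t:.2f}, p = {p:.4f}, significant: {p < alpha}")
```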
This finding is consistent with what has been reported in the literature. Bischoff,
Firan, Nejdl, and Paiu (2008) observed that popular tags have been utilized as a way of
bookmarking and of giving brief, concise summaries of web pages for search engines.
Thus, popular tags could be used in an algorithm that measures the popularity of a page
or its contents. Further, in terms of social bookmarking systems, the finding strengthens
the idea that tags could also help in the detection or identification of trends in tagging,
popularity, and content. For example, del.icio.us grew quickly because of its ability to
centrally collect and share bookmarks among users. It shares information through two
channels of the website. The first channel is bookmarks, or tags, through which users
subscribe to others' content and are updated whenever items matching their interests are
added. The second channel is the main webpage, where the front page is the primary
means of sharing information. As it is the first point of contact, it attracts the attention of
all visitors to the site (Wetzker, Zimmermann, & Bauckhage, 2008). Many believed that
although popular tags could be used to improve web search, integrating them would not
yield a noticeable difference because of their limited availability and distribution across
the web. This study provides a way to generate popular tags in an efficient and scalable
way, which opens the door to incorporating existing popular tags and to creating them for
sites where they do not yet exist.
Research Question 2: Are there statistically significant differences in tag creation
effectiveness across tag types?
H20: There is no statistically significant difference in tag creation effectiveness
across tag types.
A series of ANOVA analyses was conducted to assess the above hypothesis. The
dependent variable was tag creation effectiveness, and the independent variable was each
participant's tag type for the site. There were two categories of tag types: "F" and "F and
S." It is important to note that, for Site 4, all participants had the same tag type (F);
therefore, no comparison was possible for this site. The analyses were thus limited to
Sites 1, 2, 3, and 5. Based on the series of ANOVA results, it was found that tag creation
effectiveness was significantly higher (p < 0.001) for the "F and S" tag type than for the
"F" tag type in all cases. This means that there is sufficient statistical evidence of a
significant difference in tag creation effectiveness across tag types.
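As a rough illustration of this procedure, the sketch below runs one ANOVA per site comparing the two tag-type groups. It is not the original analysis script; the file name, column names, and site labels are assumptions.

```python
# Minimal sketch of the per-site RQ2 ANOVAs: compare tag creation
# effectiveness between the "F" and "F and S" groups for each site.
# Site 4 is omitted because all of its participants shared one tag type.
import pandas as pd
from scipy import stats

df = pd.read_csv("tag_effectiveness.csv")  # columns: site, tag_type, effectiveness
for site in ["Site 1", "Site 2", "Site 3", "Site 5"]:
    sub = df[df["site"] == site]
    f_only = sub.loc[sub["tag_type"] == "F", "effectiveness"]
    f_and_s = sub.loc[sub["tag_type"] == "F and S", "effectiveness"]
    f_stat, p = stats.f_oneway(f_only, f_and_s)  # with 2 groups, equivalent to a t-test
    print(f"{site}: F = {f_stat:.2f}, p = {p:.4f}")
```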
This finding suggests that tag creation effectiveness will differ as the tag type
changes. Bischoff, Firan, Nejdl, and Paiu (2008) identified eight different dimensions of
tag types: topic, time, location, type, author/owner, opinions/qualities, usage context, and
self-reference. Given these dimensions, it is expected that the level of tag creation
effectiveness will differ. Moreover, the type dimension refers to the kind of media that is
used, such as the type of web page presented. As such, a site that uses different media
than another site is also expected to have a different tag creation effectiveness.
Research Question 3: Do any of the independent variables of Interest in the
Website Topic, Experience with Website or Similar Website, Tagging Experience, Search
Engine Experience, and Time on the Internet have a statistically significant effect on tag
creation effectiveness?
H30: None of the independent variables of Interest in the Website Topic,
Experience with Website or Similar Website, Tagging Experience, Search Engine
Experience, and Time on the Internet have a statistically significant effect on tag creation
effectiveness.
A series of multiple linear regression analyses was used to test the above
hypothesis. The dependent variable was tag creation effectiveness, and the independent
variables were interest in the site, familiarity with the site, previous tag usage experience,
previous tag creation experience, experience with search engines (ESE), time spent on
the internet, and tag type. Five regressions were estimated, one for each site. The
predictive power of the models was high in all five cases, with R2 statistics ranging from
.825 (for Site 4) to .981 (for Site 5). This suggests that the chosen set of predictor
variables was sufficient to explain a very large proportion of the variability in tag
creation effectiveness.
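For illustration only, the sketch below shows how one of the five per-site regressions could be estimated with ordinary least squares; the file name and variable names are assumptions, and the original analysis was not necessarily run this way.

```python
# Minimal sketch of one per-site multiple linear regression for RQ3:
# tag creation effectiveness regressed on the seven predictors.
# Column names are illustrative.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("site5_participants.csv")
model = smf.ols(
    "effectiveness ~ interest + familiarity + tag_usage_exp + tag_creation_exp"
    " + search_engine_exp + time_on_internet + C(tag_type)",
    data=df,
).fit()
print(model.rsquared)   # the study reports R2 = .981 for Site 5
print(model.summary())  # per-predictor coefficients, t-values, and p-values
```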
5.3 Recommendations
The findings from the study revealed several significant themes with regard to
creating social bookmarking tags efficiently and effectively using crowdsourcing. This
objective was measured through tag creation effectiveness. First, tag creation
effectiveness was measured against popular tags for the sites considered in the study. The
findings suggest that popular tags can indeed be recreated using crowdsourcing and thus
can be made available through this method to improve web search. Creating popular
tags, or tags useful for search engines, through crowdsourcing is a reliable, effective, and
efficient way to do so. This addresses the scarcity and limited distribution of popular
tags, which have proven most useful for improving web search. Second, tag creation
effectiveness was measured against tag types. The results suggest that the effectiveness
of tags differs from one tag type to another; thus, tag type should be a consideration for
sites that support different tag types. Lastly, tag creation effectiveness was measured
across user-specific characteristics. It was found that experience with search engines and
time spent on the internet did not have a significant effect on tag creation effectiveness
for any of the five sites considered in the study. Specifically, tag creation effectiveness
for Site 1 was significantly and negatively related with interest in site and positively
related with familiarity with site, previous tag usage experience, and previous tag
creation experience. Meanwhile, tag creation effectiveness for Site 2 was significantly
and positively related with interest in site, familiarity with site, and previous tag creation
experience. Tag creation effectiveness for Site 3 was significantly and negatively related
with interest in site and positively related with familiarity with site. Moreover, for Site 4,
tag creation effectiveness was significantly and positively related only with interest in
site. Lastly, tag creation effectiveness for Site 5 was significantly and positively related
with interest in site and previous tag usage experience.
5.4 Scope and Limitations of the Study
The scope of this study was limited to user-generated content only, more
specifically, content on social networking sites. It does not cover or address other types of
content created through traditional channels with proper content management and control
processes and procedures. Therefore, the findings of this study should apply only to
systems of user-generated content. For example, the findings of this study may not be
extended to business or institutional sites without further research and examination. The
rationale for this focus derives from the basic observation that user-generated content
presents a bigger problem when it comes to information organization and indexing.
Given the speed at which user-generated content is created, it is important to find new
and creative ways to quickly index this content so it can be accessible to users through
web search. Furthermore, the study focused on measuring Tag Creation Effectiveness for
the top 5 most tagged sites on Delicious. Tag Creation Effectiveness is, in essence, the
level of agreement between the tags produced by the study participants and the popular
tags found on Delicious for the 5 sites included in this research. A question was raised
regarding the use of popular Delicious tags as a golden set for measuring Tag Creation
Effectiveness. This was done because the popular tags are believed to be a good set of
tags that can help search: since a large number of unique users chose these tags, they can
serve as a golden set for the study. Since the process used to produce tags using Amazon
Mechanical Turk showed that Tag Creation Effectiveness was strong across all sites, it is
believed that this same process can now be employed to generate tags for less popular
user-content sites. New sites are usually not very accessible through web search because
it takes time for users to adopt the new features and use them to generate good
descriptive data about the new content. The process presented in this research study can
be employed at any time to address this gap in the search and accessibility of this
information, what some call the cold start problem.
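The exact scoring rubric for Tag Creation Effectiveness is defined earlier in the thesis; purely as an illustration of agreement against a golden set, one simple overlap measure could look like the sketch below. The function, tag lists, and golden set are all hypothetical.

```python
# Illustrative (not the thesis's exact formula) agreement score between a
# participant's tags and a site's popular-tag golden set: the fraction of
# the participant's tags that appear among the popular Delicious tags.
def agreement(participant_tags, golden_tags):
    produced = {t.strip().lower() for t in participant_tags}
    golden = {t.strip().lower() for t in golden_tags}
    return len(produced & golden) / len(produced) if produced else 0.0

golden_set = ["video", "youtube", "music", "media", "entertainment"]  # hypothetical
print(agreement(["Video", "music", "funny"], golden_set))  # 2 of 3 tags match -> 0.666...
```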
Lastly, the efficiency aspect of this research study refers to the process of
producing tags through the use of crowdsourcing platforms, in this case, Amazon
Mechanical Turk. This process is a key feature of this research and should be viewed as a
process that can be repeated to generate a good set of tags that describe web pages, and
more specifically, pages that contain user-generated content. The process addresses the
efficiency aspect of creating tags because it is fast, cheap, and reliable.
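The original 2011 task-posting workflow is not reproduced in code here; as an illustration of how a comparable tagging task could be posted programmatically today, the sketch below uses the current boto3 MTurk client. The title, payment, durations, and question XML file are all assumed values, not those of the study.

```python
# Illustrative sketch (assumed values throughout) of posting a tagging HIT
# with the present-day boto3 MTurk API; not the tooling used in the study.
import boto3

mturk = boto3.client("mturk", region_name="us-east-1")
with open("tagging_question.xml") as f:   # an ExternalQuestion/HTMLQuestion form
    question_xml = f.read()

hit = mturk.create_hit(
    Title="Describe a website with tags",
    Description="Visit a website and list the tags you would use to describe it.",
    Reward="0.25",                     # small per-task payment keeps the method cheap
    MaxAssignments=100,                # number of distinct workers wanted
    LifetimeInSeconds=7 * 24 * 3600,   # HIT stays available for one week
    AssignmentDurationInSeconds=1800,  # 30 minutes per worker
    Question=question_xml,
)
print(hit["HIT"]["HITId"])
```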
5.5 Significance of the Study
This experimental study is significant in that it adds to the body of knowledge by
providing a reliable, inexpensive, and fast method of recreating user-generated tags that
can be useful for search engines via crowdsourcing. The results identified the potential
benefits of crowdsourcing in tag creation and provided a specific process and design for
tag creation tasks. This method is most useful for new web sources that are not yet
popular or have not yet been tagged by a large number of users. The study revealed that
tag creation effectiveness differs across sites, tag types, and user-related characteristics.
Web sites could use this information when creating their social bookmarking tags to
effectively and efficiently improve their web search and make themselves more
accessible to their potential users. The results also provide insight into the role of
crowdsourcing in generating social tags and, ultimately, improving web search.
5.6 Summary and Conclusions
The present study employed a quantitative experimental research design in order
to explore whether popular internet bookmarking tags can be recreated through
crowdsourcing. The participants for this study were recruited by posting tagging tasks on
Mechanical Turk. All participants completed an initial qualification step before they were
allowed to participate in the actual study. Information related to the tagging tasks was
collected from the participants and subjected to analysis. The results revealed that
popular bookmarking tags can be recreated effectively and efficiently through
crowdsourcing. Moreover, the analysis revealed that, in general, tag creation
effectiveness differs across sites and tag types.
References
1. Broota, K. D. (1989). Experimental design in behavioral research. Daryaganj, New
Delhi: New Age International.
2. Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ:
Erlbaum.
3. Cozby, P. C. P. (2007). Methods in behavioral research (12th ed.). New York, NY:
McGraw Hill.
4. Creswell, J. W. (2009). Research design: Qualitative, quantitative, and mixed methods
approaches. Thousand Oaks, CA: Sage Publications.
5. Fidel, R. (1994). Human-Centered Indexing. Journal of the American Society for
Information Science, 45(8), 572-578.
6. Leedy, P., & Ormrod, J. (2001). Practical research: Planning and design (7th ed.).
Upper Saddle River, NJ: Merrill Prentice Hall.
7. Pirolli, P. (2005). Rational Analysis of Information Foraging on the Web. Cognitive
Science, 29(3), 343-373.
8. Rowley, J. E. (1988). Abstracting and Indexing (2nd ed.). London: Clive Bingley.
9. Sinha, R. (2005). A Cognitive Analysis of Tagging. Retrieved from
http://www.rashmisinha.com
10. Cozby, P. C. (2001). Methods in behavioral research. New York, NY: McGraw Hill.
11. Keuhl, R. O. (2000). Design of experiments: Statistical principles of research design
and analysis. Pacific Grove, CA: Duxbury Press.
12. Moore, D. S., & McCabe, G. P. (2006). Introduction to the practice of statistics. New
York, NY: W.H. Freeman.
13. Tabachnick, B. G., & Fidell, L. S. (2001). Using multivariate statistics. Needham
Heights, MA: Allyn and Bacon.
14. Bischoff, K., Firan, C. S., Nejdl, W., & Paiu, R. (2008). Can all tags be used for
search? Conference on Information and Knowledge Management (pp. 203-212).
California, USA: Association for Computing Machinery.
15. Gordon, J., Van Durme, B., & Schubert, L. K. (2010). Evaluation of Commonsense
Knowledge with Mechanical Turk. NAACL HLT 2010 Workshop on Creating Speech and
Language Data with Amazon's Mechanical Turk (pp. 159-162). Los Angeles, CA:
Association for Computational Linguistics.
16. Kittur, A., Chi, E. H., & Suh, B. (2008). Crowdsourcing User Studies with
Mechanical Turk. 26th Annual CHI Conference on Human Factors in Computing
Systems. Florence, Italy: Association for Computing Machinery.
17. Sorokin, A., & Forsyth, D. (2008). Utility data annotation with Amazon Mechanical
Turk. IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Workshops. Anchorage, AK: IEEE.
18. Ipeirotis, P. (2010). Demographics of Mechanical Turk. Retrieved from New York
University: http://archive.nyu.edu/bitstream/2451/29585/2/CeDER-10-01.pdf
19. Snow, R., O'Connor, B., Jurafsky, D., & Ng, A. Y. (2008). Cheap and fast - but is it
good? Evaluating non-expert annotations for natural language tasks. Retrieved from
Stanford University: http://www.stanford.edu/~jurafsky/amt.pdf
20. Callison-Burch, C. (2009). Fast, Cheap and Creative: Evaluating Translation Quality
Using Amazon's Mechanical Turk. Conference on Empirical Methods in Natural
Language Processing (pp. 286-295). Singapore: ACL.
21. Belkin, N. J., Cole, M., & Liu, J. (2009). A model for evaluation of interactive
information retrieval. SIGIR 2009 Workshop on the Future of IR Evaluation (pp. 7-8).
Boston, MA: IR Publications.
22. Paris, C. L., Colineau, N. F., Thomas, P., & Wilkinson, R. G. (2009). Stakeholders
and their respective costs-benefits in IR evaluation. SIGIR 2009 Workshop on the Future
of IR Evaluation (pp. 9-10). Boston, MA: IR Publications.
23. Smucker, M. D. (2009). A plan for making information retrieval evaluation
synonymous with human performance prediction. SIGIR 2009 Workshop on the Future of
IR Evaluation (pp. 11-12). Boston, MA: IR Publications.
24. Stamou, S., & Efthimiadis, E. N. (2009). Queries. SIGIR 2009 Workshop on the
Future of IR Evaluation (pp. 13-14). Boston, MA: IR Publications.
25. Crecelius, T., & Schenkel, R. (2009). Evaluating Network-Aware Retrieval in Social
Networks. SIGIR 2009 Workshop on the Future of IR Evaluation (pp. 17-18). Boston,
MA: IR Publications.
26. Kazai, G., & Milic-Frayling, N. (2009). On the evaluation of the quality of relevance
assessments collected through crowdsourcing. SIGIR 2009 Workshop on the Future of IR
Evaluation (pp. 21-22). Boston, MA: IR Publications.
27. Yue, Z., Harpale, A., He, D., Grady, J., Lin, Y., Walker, J., et al. (2009). CiteEval for
evaluating personalized social web search. SIGIR 2009 Workshop on the Future of IR
Evaluation (pp. 23-24). Boston, MA: IR Publications.
28. Ali, M. S., & Consens, M. P. (2009). Enhanced web retrieval task. SIGIR 2009
Workshop on the Future of IR Evaluation (pp. 35-36). Boston, MA: IR Publications.
29. Figueiredo, F., Almeida, J., Belém, F., Gonçalves, M., Pinto, H., Fernandes, D., et al.
(2009). Evidence of quality of textual features on the web 2.0. 18th ACM Conference on
Information and Knowledge Management (pp. 909-918). New York: Association for
Computing Machinery.
30. Wetzker, R., Zimmermann, C., & Bauckhage, C. (2008). Analyzing social
bookmarking systems: A del.icio.us cookbook. ECAI Mining Social Data Workshop (pp.
26-30). Patras, Greece: ECAI.
31. Marge, M., Banerjee, S., & Rudnicky, A. I. (2010). Using the Amazon Mechanical
Turk for transcription of spoken language. IEEE International Conference on Acoustics,
Speech and Signal Processing (pp. 5270-5273). Dallas, TX: IEEE.
32. Heymann, P., Koutrika, G., & Garcia-Molina, H. (2008). Can social bookmarking
improve web search? WSDM International Conference on Web Search and Web Data
Mining (pp. 195-205). New York: Association for Computing Machinery.
33. Suchanek, F. M., Vojnovic, M., & Gunawardena, D. (2008). Social tags: Meaning and
suggestions. 17th ACM Conference on Information and Knowledge Management. New
York: Association for Computing Machinery.
34. Ames, M., & Naaman, M. (2007). Why we tag: Motivations for annotation in mobile
and online media. Proceedings of the SIGCHI Conference on Human Factors in
Computing Systems. New York: Association for Computing Machinery.
35. Wetzker, R., Bauckhage, C., Zimmermann, C., & Albayrak, S. (2010). I tag, you tag:
Translating tags for advanced user models. Third ACM International Conference on Web
Search and Data Mining (pp. 71-80). New York: Association for Computing Machinery.
36. Zhou, D., Bian, J., Zheng, S., Zha, H., & Giles, C. (2008). Exploring social
annotations for information retrieval. 17th International Conference on World Wide Web.
New York: Association for Computing Machinery.
37. Akkaya, C., Conrad, A., Wiebe, J., & Mihalcea, R. (2010). Amazon Mechanical Turk
for subjectivity word sense disambiguation. NAACL HLT 2010 Workshop on Creating
Speech and Language Data with Amazon's Mechanical Turk (pp. 195-203). Los Angeles,
CA: Association for Computational Linguistics.
38. Chen, S.-Y., & Zhang, Y. (2009). Improve web search ranking with social tagging.
1st International Workshop on Mining Social Media. Sevilla, Spain: CAEPIA-TTIA.
39. Lu, C., Park, J.-r., Hu, X., & Song, I.-Y. (2010). Metadata effectiveness: A
comparison between user-created social tags and author-provided metadata. 43rd Hawaii
International Conference on System Sciences (pp. 1-10). Hawaii: IEEE Computer
Society.
40. Carlyle, A. (1999). User categorisation of works toward improved organisation of
online catalogues. Journal of Documentation, 55(2), 184-208.
Appendix A: The Survey Tool
Survey Starts Here
1. How old are you?
------ years
2. What is your gender? Please check one selection from the choices below:
-- Male
-- Female
3. What is your education level? Please check one selection from the choices
below:
-- Less than high school
-- High school
-- Associates degree
-- Some college, no degree
-- 4 year college degree (Bachelor’s degree)
-- Some grad school, no degree
-- Masters degree
-- Ph.D., MD, JD, or other advanced degree
4. What is your experience with using search engine services such as Google,
Yahoo or Bing?
-- Not at all experienced
(I use it rarely and only when instructed by someone else).
-- Novice
(I use it regularly but I am not always successful at finding the
information I need).
-- Average
(I rely on it regularly to find what I need online and it works in
most cases).
-- Above average
(I use it very often and can find what I need with very little
trouble if any at all - I rely on it very heavily).
-- Expert
(I use it all the time and can find anything I need with no
trouble at all - I cannot live without it).
5. How much time do you spend on the Internet on average?
-- I rarely spend time on the Internet
-- I use it at least once a week
-- I use it at least once a day
-- I use it more than once a day
6. Tag Usage Experience: Do you use tags for any purpose (finding, sharing, or
storing information)?
-- Never: I know nothing about tags
-- I don’t use tags but I know about them
-- Sometimes I use them
-- I use them frequently
-- I use tags all the time
7. Tag Creation Experience: Do you create tags for any purpose (finding, sharing or
storing information)?
-- Never: I know nothing about tags
-- I don’t create tags but I know about them
-- Sometimes I create them
-- I create them frequently
-- I create tags all the time
8. Follow the instructions provided below and answer questions about 5 different
websites.
Website 1: http://www.youtube.com/
Click on the hyperlink provided for website 1 and answer the following questions:
a) What do you think this website is about – you can use the “About Us” section
to provide this information?
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
b) Are you familiar with this site or a similar site?
-- Never saw it before – I am not familiar with it.
-- Seen it before or heard about it but I did not use it
-- I use this site sometimes so I am somewhat familiar with it
-- I use this site all the time so I am familiar with it
c) Are you interested in this site or in what it is about
(the topic it covers)? Do you like this site?
-- Yes
-- No
d) What words would you use as tags to describe this site?
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
Website 2: http://www.flickr.com/
Click on the hyperlink provided for website 2 and answer the following questions:
a) What do you think this website is about – you can use the “About Us” section
to provide this information?
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
b) Are you familiar with this site or a similar site?
-- Never saw it before – I am not familiar with it.
-- Seen it before or heard about it but I did not use it
-- I use this site sometimes so I am somewhat familiar with it
-- I use this site all the time so I am familiar with it
c) Are you interested in this site or in what it is about
(the topic it covers)? Do you like this site?
-- Yes
-- No
d) What words would you use as tags to describe this site?
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
Website 3: http://www.pandora.com/
Click on the hyperlink provided for website 3 and answer the following questions:
a) What do you think this website is about – you can use the “About Us” section
to provide this information?
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
b) Are you familiar with this site or a similar site?
-- Never saw it before – I am not familiar with it.
-- Seen it before or heard about it but I did not use it
-- I use this site sometimes so I am somewhat familiar with it
-- I use this site all the time so I am familiar with it
c) Are you interested in this site or in what it is about
(the topic it covers)? Do you like this site?
-- Yes
-- No
d) What words would you use as tags to describe this site?
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
Website 4: http://www.facebook.com/
Click on the hyperlink provided for website 4 and answer the following questions:
a) What do you think this website is about – you can use the “About Us” section
to provide this information?
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
b) Are you familiar with this site or a similar site?
-- Never saw it before – I am not familiar with it.
-- Seen it before or heard about it but I did not use it
-- I use this site sometimes so I am somewhat familiar with it
-- I use this site all the time so I am familiar with it
c) Are you interested in this site or in what it is about
(the topic it covers)? Do you like this site?
-- Yes
-- No
d) What words would you use as tags to describe this site?
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
Website 5: http://digg.com/
Click on the hyperlink provided for website 5 and answer the following questions:
a) What do you think this website is about – you can use the “About Us” section
to provide this information?
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
b) Are you familiar with this site or a similar site?
-- Never saw it before – I am not familiar with it.
-- Seen it before or heard about it but I did not use it
-- I use this site sometimes so I am somewhat familiar with it
-- I use this site all the time so I am familiar with it
c) Are you interested in this site or in what it is about
(the topic it covers)? Do you like this site?
-- Yes
-- No
d) What words would you use as tags to describe this site?
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
Survey Ends Here
Appendix B: Popacular Top 100 Most Tagged Sites on Delicious – All-Time
Rank. Site (Number of Unique Taggers)
1. YouTube - Broadcast Yourself (91,347)
2. Flickr (79,982)
3. Pandora Radio - Listen to Free Internet Radio, Find New Music (62,186)
4. Welcome to Facebook! | Facebook (62,007)
5. Digg.com (58,237)
6. Wordle - Beautiful Word Clouds (58,847)
7. All News, Videos, & Images (58,019)
8. Google (55,835)
9. stock.xchng - the leading free stock photography site (55,220)
10. TED: Ideas worth spreading (55,041)
11. Lifehacker, the Productivity and Software Guide (53,157)
12. Zamzar - Free online file conversion (45,945)
13. dafont.com (45,075)
14. COLOURlovers :: Color Trends + Palettes (44,311)
15. Web 2.0 Tools and Applications - Go2web20 (40,766)
16. The Internet Movie Database (IMDb) (40,295)
17. Scribd (39,716)
18. Upload & Share PowerPoint presentations and documents (38,976)
19. Smashing Magazine (38,652)
20. Slashdot - News for nerds, stuff that matters (38,261)
21. Wikipedia, the free encyclopedia (37,831)
22. Install Bookmarklets on Delicious (37,477)
23. Instructables - Make, How To, and DIY (37,228)
24. Tw (36,751)
25. deviantART: where ART meets application! (36,696)
26. W3Schools Online Web Tutorials (35,573)
27. Technorati: Front Page (35,063)
28. Etsy :: Your place to buy and sell all things handmade (35,003)
29. Browsershots (34,807)
30. kuler (34,740)
31. Internet Archive - Suchmaschine die u.a. alte Versionen von Websiten findet (34,144)
32. The New York Times (33,515)
33. Yahoo! (33,332)
34. Last.fm - Listen to internet radio and the largest music catalogue online (32,903)
35. Prezi - The zooming presentation editor (32,864)
36. MySpace (31,991)
37. Index/Left on MySpace Music - Free Streaming MP3s, Pictures & Music Downloads (31,991)
38. script.aculo.us - web 2.0 javascript (31,501)
39. Netvibes (31,357)
40. FFFFOUND! (31,283)
41. Ajaxload - Ajax loading gif generator (31,186)
42. Web Developer's Handbook | CSS, Web Development, Color Tools, SEO, Usability etc... (30,946)
43. Mininova : The ultimate BitTorrent source! (30,460)
44. Color Scheme Generator (30,449)
45. Engadget (30,312)
46. Speedtest.net - The Global Broadband Speed Test (29,945)
47. Boing Boing (29,879)
48. CNN (29,730)
49. A List Apart: A List Apart (29,555)
50. KeepVid: Download and save any video from Youtube ... (29,448)
51. Hulu - Watch your favorites. Anytime. For free. (29,130)
52. TechCrunch (28,991)
53. HowStuffWorks (28,937)
54. Wolfram|Alpha (28,870)
55. Mashable (28,668)
56. Animoto - the end of slideshows (28,146)
57. jQuery: The Write Less, Do More, JavaScript Library (27,983)
58. TeacherTube - Teach the World | Teacher Videos | Lesson Plan Videos ... (27,883)
59. teachertube (27,883)
60. Stock Photography: Search Royalty Free Images & Photos (27,084)
61. Royalty-Free Stock Photography at iStockphoto.com (27,084)
62. The FWA: Favourite Website Awards - Web awards at the cutting edge (26,873)
63. Picnik: edita fotos fácilmente y en línea en tu explorador (26,847)
64. css Zen Garden: The Beauty in CSS Design (26,719)
65. ZOHO Email Hosting, CRM, Project Management, Office Suite, Document Management, ... (26,155)
66. Email Hosting, CRM, Project Management, Office Suite, Document Management, Remot... (26,155)
67. Zoho (26,155)
68. Email Hosting, CRM, Project Management, Database Software, Office Suite, Documen... (26,155)
69. Online Diagram Software - Gliffy (25,680)
70. Academic Earth - Video lectures from the world's top scholars (25,481)
71. Urban Dictionary (25,231)
72. Homepage | Dictionary.com (24,851)
73. LibraryThing | Catalog your books online (24,737)
74. Threadless graphic t-shirt designs (24,345)
75. PortableApps.com - Portable software for USB drives (24,307)
76. Open Source Web Design - Download free web design templates. (24,253)
77. Project Gutenberg (24,016)
78. Main Page - Gutenberg (24,015)
79. popurls® | the genuine aggregator for the latest web buzz (24,012)
80. xkcd - A Webcomic - Blockbuster Mining (23,930)
81. Gizmodo, the Gadget Guide (23,882)
82. Download music, movies, games, software! The Pirate Bay - The world's largest Bi... (23,747)
83. Ning lets you create and join new social networks for your interests and passion... (23,642)
84. 960 Grid System (23,388)
85. LogoPond - Identity Inspiration (23,110)
86. Wikipedia (22,981)
87. Vimeo, Video Sharing For You (22,821)
88. MiniAjax.com / Highlighting Rich Experiences on the Web (22,769)
89. Iconfinder | Icon search made easy (22,725)
90. 53 CSS-Techniques You Couldn't Live Without | CSS | Smashing Magazine (22,639)
91. 53 CSS-Techniques You Couldn't Live Without « Smashing Magazine (22,639)
92. Remember The Milk: Online to do list and task management (22,464)
93. Jing | Add visuals to your online conversations (22,420)
94. Wired News (21,944)
95. Digital Camera Reviews and News: Digital Photography Review: Forums, Glossary, F... (21,779)
96. Learn to Read at Starfall - teaching comprehension and phonics (21,696)
97. Bugmenot.com - login with these free web passwords to bypass compulsory registra... (21,596)
98. Amazon.com: Online Shopping for Electronics, Apparel, Computers, Books, DVDs & m... (21,431)
99. Khan Academy (21,405)
100. Facebook | Home (21,280)
Appendix C: Popacular List of Most Tagged Sites – One Month
Rank. Site (Number of Unique Taggers)
1. Javascript PC Emulator (1778)
2. Subtle Patterns | High quality patterns for your next web project (1734)
3. SpyBubble Review (1459)
4. Innovative Techniques To Simplify Sign-Ups and Log-Ins - Smashing Magazine (1244)
5. Cool, but obscure unix tools :: KKovacs (1201)
6. Microjs: Fantastic Micro-Frameworks and Micro-Libraries for Fun and Profit! (1052)
7. delicious/register/bookmarklets (1038)
8. Angry Birds (966)
9. The Architecture of Open Source Applications (958)
10. National Jukebox LOC.gov (957)
11. Front End Development Guidelines (900)
12. On TermKit | Steven Wittens - Acko.net (872)
13. SLR Camera Simulator | Simulates a digital SLR camera (811)
14. Affordable Link Building Services (764)
15. Layer Styles (733)
16. Clean Up Your Mess - A Guide to Visual Design for Everybody (678)
17. Home Based Business (677)
18. Dictionary of Algorithms and Data Structures (621)
19. CSS3 Generator - By Eric Hoffman & Peter Funk (618)
20. Stolen Camera Finder - find your photos, find your camera (599)
21. Layer Styles (584)
22. YouTube - Broadcast Yourself. (565)
23. lovely ui (536)
24. Data Mining Map (517)
25. LogicalDOC Document Management - Document Management Software, Open Source DMS (508)
Appendix D: Popacular List of Most Tagged Sites – One Week
Rank. Site (Number of Unique Taggers)
1. SpyBubble Review (1302)
2. The Architecture of Open Source Applications (958)
3. Cool, but obscure unix tools :: KKovacs (628)
4. Data Mining Map (517)
5. Link Building Service at Diamond Links | (404)
6. Leyden Energy Develops Durable Laptop batteries (389)
7. Hewlett-Packard Updated Their Mini Note, Notebook Lines (380)
8. AT&T Lifts Android Application Confinements (379)
9. Google Correlate (371)
10. Hype (361)
11. Hivelogic - Top 10 Programming Fonts (330)
12. Samsung Galaxy Tab 10.1 With Android 3.1 Coming in a Few Days (294)
13. Android Security Fix Will Enter Market In Coming Few Days, Says Google (294)
14. Boy or Girl? Gender Reveal Parties Let the Cat Out of the Box (285)
15. The History Of Car Accidents (253)
16. Kung Fu Panda 2 Preview: The Awesomeness is back (250)
Appendix E: Popacular List of Most Tagged Sites – One Day
Rank. Site (Number of Unique Taggers)
1. Google Correlate (314)
2. Samsung Galaxy Tab 10.1 With Android 3.1 Coming in a Few Days (294)
3. Styling ordered list numbers (239)
4. The History Of Car Accidents (233)
5. The Best & Worst James Bond Themes Of All Time (227)
6. SpyBubble Review (219)
7. Advanced Google Analytics for Startups | Think Vitamin (171)
8. The Architecture of Open Source Applications (169)
9. Good News For Sri Lankan Auto Lovers: Tata Nano To be Sold in Sri Lanka Soon !!!... (166)
10. Hivelogic - Top 10 Programming Fonts (150)
11. Better Image Management With WordPress - Smashing Magazine (139)
12. 10 Types of Videos That YouTube Should Simply Ban (139)
13. Better Light Effect For Design Inspiration (129)
14. Kung Fu Panda 2 Review: WOWsomeness!!! (124)
15. 17 Futuristic Eco-Homes (118)
16. 5 Best Free File Compression Software (117)
17. Sheetalbhabhi.com Preview (101)
18. What is Internet Marketing? Is it for your business?? (91)
19. 7 Unique Jquery Navigation Menus for Everyones Needs (91)
20. The Success Story of Mycroburst (89)
21. Introduction to DNS: Explaining The Dreaded DNS Delay - Smashing Magazine (87)
22. Press Brakes-Mechanical press Brake-Hydraulic Press Brake (87)
23. The Only Way to Get Important Things Done - Tony Schwartz - Harvard Business Rev... (87)
24. Sustaining Continuous Innovation through Problem Solving ... (87)
25. 5 Cities in the U.S. with Excellent Public WiFi (86)
Appendix F: Popacular List of Most Tagged Sites – 8 Hours
Rank. Site (Number of Unique Taggers)
1. Better Image Management With WordPress - Smashing Magazine (139)
2. The Architecture of Open Source Applications (135)
3. Samsung Galaxy Tab 10.1 With Android 3.1 Coming in a Few Days (130)
4. Kung Fu Panda 2 Review: WOWsomeness!!! (113)
5. Styling ordered list numbers (92)
6. The Success Story of Mycroburst (79)
7. Google Correlate (77)
8. What is Internet Marketing? Is it for your business?? (72)
9. Good News For Sri Lankan Auto Lovers: Tata Nano To be Sold in Sri Lanka Soon !!!... (60)
10. 17 Futuristic Eco-Homes (59)
11. Car Rental deals (55)
12. PhotoSwipe - The web image gallery for your mobile device (55)
13. Best Cloud Based Invoicing Software & Applications (54)
14. Kickoff - Coming soon (52)
15. Romnatic Dating Tips!! (52)
16. Want To Get Paid Faster? Top 5 Cloud-Based Financial Tools To Speed Up Your Rece... (50)
17. SpyBubble Review (48)
18. 70 Free PSD Web UI Elements For Designers | Free and Useful Online Resources for... (47)
19. Mercedes SLK With New Look (45)
20. loads.in - test how fast a webpage loads in a real browser from over 50 location... (44)
21. Top 5 Fastest Bikes of 2010 (44)
22. Disaster in Little Community (44)
23. Better Light Effect For Design Inspiration (43)
24. Press Brakes-Mechanical press Brake-Hydraulic Press Brake (42)
25. Sustaining Continuous Innovation through Problem Solving ... (42)
Vita
Highlights
Over 13 years of experience managing global teams and developing
state-of-the-art ITSM and data management solutions.
Strong history of driving innovation at scale and making a significant
difference.
Strong leadership skills and a proven ability to recruit the best, build solid
teams, and set a clear vision.
Data management and integration expert. Strong in data analytics,
measurement definitions, and representation.
Outstanding problem-solving and critical-thinking skills.
Architect and creator of original ITSM solutions for various
processes, including Incident, Change, Asset, Release, Procurement,
Knowledge, and Business Continuity Management.
Process redesign and improvement expert, including maturity road map
planning and implementation.
ITIL and ITAM (IT Asset Management) certified.
Experience
Amazon.com
Seattle, WA
October 2009 – Present
Catalog and Data Quality Ops and Program Mgt
Responsible for managing Amazon's global catalog quality team and building
processes that enable fast-paced innovation and delivery of solutions on all
Amazon sites worldwide.
Design and implement processes that improve the customer experience on the
site, especially when customers interact with the data in the catalog.
Partner with senior management to define the vision for catalog quality
efforts at Amazon and sponsor key projects to make this vision a reality.
Optimize the current team and use strategic sourcing to respond to high and
unpredictable market and customer demands.
Implement a set of lightweight project management practices for the
catalog quality team to enable them to drive continuous improvement
projects successfully. This work is empowering the team to break
organizational barriers and innovate beyond their day-to-day responsibilities.
Serve as a principal for operations research and process management for
the internal Amazon community of businesses and companies.
Key contributor to Amazon's hiring practices and raising-the-bar program
with every new hire.
Active mentor and contributor to Amazon's leadership principles, with a focus
on identifying candidates and employees of high potential and mentoring them
into leadership positions.
Pepperweed Consulting LLC
Sewickley, PA
March 2008 – October 2009
Sr. Management Consultant
(In September 2009, Pepperweed Consulting became Cognizant
Technologies through acquisition.)
Responsible for delivering management consulting services to Fortune 500
companies in the areas of ITSM, PPM, and ITAM. Key clients included
Boeing, Western Union, T-Mobile, Catholic Health West
Health System, Cook Children Hospital Health Systems, and Great
American Financial Resources Insurance.
Responsible for managing complex engagements to address key problems
by designing and implementing best practice processes.
Help clients transform IT and make it more transparent to the
business.
Contributed to the development of the new practice area of IT Governance
and Project Portfolio Management processes.
Work closely with the leadership teams in large organizations to design and
implement marketing programs geared to promote process improvement
initiatives.
Expert in designing process performance metrics programs to increase
organizational awareness and present opportunities for improvements.
Design and implement new organizational structures required to support
PPM and ITSM programs needed for continual success.
Dow Jones & Co. Inc.
Princeton, NJ
Employment History (10 years)
(In December 2007, Dow Jones became part of News Corporation through
acquisition.)
August 2005 – February 2008
Process & Control Manager
Responsible for engineering and launching critical enterprise processes
such as Incident Management, Problem Management, Change
Management, Configuration Management (CMDB), Knowledge
Management, IT Asset Management, Procurement, Vendor Relationship
Management, and Request Management.
Process maturity planning, including the creation of Capability Maturity
Model (CMM) road maps for the various enterprise processes while
accounting for dependencies across processes.
Oversee the design, implementation, and compliance of IT outsourcing
processes. Involved in designing SLAs with outsourcing vendors.
Provide consulting services for process implementations to internal
departments including international and domestic divisions.
Strategic planning of how to utilize social computing tools to support
organizational objectives.
Responsible for promoting process standards and increasing compliance
across all departments and acquired companies.
June 2003 – August 2005
Production Control Manager
Manage the change coordinator team that oversees all enterprise
infrastructure changes and related day-to-day activities.
Lead the design and implementation of enterprise processes needed to
logically and physically secure the production environment and data
centers of mission critical applications and services.
Serve as an internal consultant to assist other groups in solving chronic
problems related to workflow, process or procedural communications.
October 2001 – May 2003
Line of Business Supervisor (LOB) for
Electronic Publishing and .com products (including Factiva.com)
Serve as the operational services liaison for all the electronic publishing
systems and .com products at Dow Jones & Co.
Provide post mortem disruption reports for all critical production issues.
Track and approve new business projects and provide necessary training to
operations staff.
Address operational and technical exposures in the LOB technologies and
provide solutions to mitigate risks.
Analyze performance against set SLA levels and follow up with senior
management on corrective actions when needed.
May 2000 – September 2001
Systems Administration Consultant
Achieved significant savings and reduced headcount by leading monitoring
tool integration efforts and the automation of manual tasks.
Analyzed systems performance and provided recommendations to enhance
efficiency.
May 1998 – May 2000
Senior Operations Analyst
Led large automation efforts for all manual checks of online commerce
systems.
Tested and evaluated new enterprise management tool software and
provided recommendations on possible applications and usage.
Education
2004 – Present
Drexel University
Philadelphia, PA
Ph.D. in Information Science & Technology. Current GPA: 3.9/4.0
Research interests include information retrieval (search) on the Web, how
organizations use information to solve problems, Web mining, online social
network analysis and informal structures, and search algorithms and
methods for user-generated content and the use of tags in search.
1999 – 2001
University of Maryland UC
College Park, MD
Master of Science in Information Technology Management and
Telecommunications. GPA: 3.5/4.0
1997 - 1999
Rider University
Lawrenceville, NJ
Bachelor of Science in Business Administration with a concentration in
Computer Information Systems. GPA: 3.92/4.0
Affiliations and Industry Recognition
Member of the following organizations: International Association of IT
Asset Management (IAITAM), Information Systems Audit and Control
Association (ISACA), and International Institute of Business Analysis
(IIBA).
Honorary President and Founder of the Computer Information Systems
Society at Rider University
Member, Phi Beta Delta, International Honor Society for International
Scholars
Member, Beta Gamma Sigma, National Honor Society for the Best Business
Scholars
Leader of the World Health Organization Committee of the Model United
Nations team in 1999 for Rider University
Invited Faculty for the HDI Service Management Annual Conference and
Expo: I am invited annually to present new and interesting discoveries
related to ITSM processes and practices. Workshops are a half day in
duration and attract top industry leaders seeking new and exciting
ideas to take their ITSM practices and processes to the next level.
Workshops delivered through this venue so far:
October 2010: Power to the People: Harnessing the power of crowds
to succeed and thrive.
November 2009: ITSM Organizational Structures and Leadership
Requirements for Success.
November 2009: Process Maturity Models and Self Assessment
Tools.
November 2009: Virtualization Demystified: All you need to know
to ensure you survive and thrive.
Columnist in the IT Asset Knowledge (ITAK) monthly magazine – Column
Title: Process Demystified
In this column I take my readers on a journey to demystify IT processes.
The focus is the ITAM domain and I try to discuss various aspects of the
ITAM processes each month and provide useful tips to enable a successful
ITAM program implementation.
January 2008 International Association of IT Asset Managers
Editorial Board Member of the Best Practice Library for IAITAM
I am one of six industry leaders chosen to edit and comment on the
IAITAM best practice library, which is an industry standard for ITAM
program implementation.
May 2000
Assurance 2000 Conference
Las Vegas, NV
Presenter of Dow Jones Best Practices for Operations Management and
24X7 availability
Assurance 2000 is an annual technical executive conference hosted by
BMC Software. At this event, about 3,500 industry leaders gather from
around the world to share knowledge and strategize for the future.