Mark Chetcuti
University of London
Email: mark@chetcuti.org
Dr. Alexiei Dingli
Department of Artificial Intelligence
University of Malta
Email: alexiei.dingli@um.edu.mt
Abstract
Social networks are increasingly used by all types of people, creating large virtual communities. Many users engage in various online games with the sole aim of competing with their peers. Through these games, these communities indirectly generate a myriad of information (such as image tags and folksonomies), most of which is not being exploited. Our system explores the use of such games for image indexing.
We have created a game called PicChanster within the Facebook social network which asks users to describe images, pertaining to particular domains, within a time limit. Scores are awarded to the user's labels based upon a matching process with the respective image's defined labels which are collected from the source of the image. The information gathered would then be used by an indexing mechanism within an image search engine.
We compared the labels gathered through the game for two sets of images, one sourced from a traditional indexing mechanism (the Uncertain set) and the other from human-based tagging (the Certain set). The analysis of the collected data confirms that human-based tagging is more accurate. We have shown that indexing images with the help of human computation has three major advantages. Firstly, it produces better results than automated systems. Secondly, it filters away errors from the result set. Finally, the combination of social networking with incidental knowledge acquisition (KA) makes the system feasible for large-scale indexing.
Keywords
Social Networks, Knowledge Acquisition, Image Search Indexing, Human Computation.
1. Introduction
The internet has not only become the biggest source of information but is also capable of providing this information for instant retrieval. The biggest obstacle to making this information instantly available is retrieving appropriate data which matches the user’s search criteria. The indexing of web documents has clearly improved. The indexing of multimedia content, however, is still one step behind and often does not produce precise results.
Multimedia content on the web is ever increasing and becoming more popular than traditional web documents. The concept of images, audio or video clips is by far more appealing to users than reading articles.
The popularity of websites such as Flickr and YouTube is evident from the number of hits each site receives. This shows that the indexing of multimedia content must be improved even further to improve content retrieval.
This paper aims to find ways of improving the information to which image indexing mechanisms are applied, since precise indexing of multimedia content is of utmost importance for effective retrieval. The two most popular sources for indexing images are the textual information surrounding the image and the visual content of the image itself. We shall use the power of social networks to allow users to index images automatically through an application embedded in a social network.
2. Related Work and Literature
2.1 Knowledge Acquisition
Bridging the gap between the knowledge available electronically and usable knowledge, also known as valuable knowledge, is a complex task. Valuable knowledge is obtained through knowledge acquisition (KA). Knowledge acquisition, as explained by Tennison [9], can be divided into two kinds: Direct Knowledge Acquisition and Incidental Knowledge Acquisition.
Direct Knowledge Acquisition requires knowledge engineers who are focused on and aware of the creation of knowledge itself, meaning that it has to be done on an intentional and planned basis. In fact, Shadbolt and Burton [7] state that Direct KA is also known as a contrived elicitation method.
The three techniques which fall under Direct KA are laddered grids, card sorts and repertory grids. All three follow a structured way of eliciting knowledge about a particular domain, representing it as scores, grids or ratings.
Incidental Knowledge Acquisition takes place when experts generate knowledge artefacts which are then used for future reference. This makes Incidental KA a non-contrived method, as explained by Shadbolt and Burton [7]. Other methods which can relieve part of the KA bottleneck are ontologies and multiple experts. Ontologies allow domain knowledge to be shared, so no further knowledge elicitation is needed to broaden the knowledge about a particular domain. Similarly, when multiple experts contribute to a particular domain, the need for further knowledge elicitation is reduced.
Interviews and Protocol Analysis (video or audio recordings of experts) are two methods for Incidental KA that tackle the problem of “how do we get experts to tell us, or else show us, what they do?” Interviews can be further separated into structured, semi-structured and unstructured interviews. We conclude that Incidental KA is the more natural way of collecting the data needed.
2.2 Social Networks
Social networking fosters the idea of people collaborating and communicating, either because they already have a connection with each other or because they want to get to know each other by forming a new connection.
Once a user populates his profile with the relevant demographic details, the social graph [2] is built as direct and indirect connections are linked to the user. As explained by Boyd [1], some Social Networking Sites (SNS) require bi-directional confirmation while others require only one-directional confirmation. Boyd states that bi-directional SNSs are safer and lend seriousness to such a networking site, since one cannot become a friend of another person without a confirmation from the latter.
SNSs produce user-generated content, where users can tag, comment on or review items on the social network. Folksonomies are currently one of the most popular trends on the Internet. They are used to improve and categorise the content of the World Wide Web through social networks. Image tagging in particular has become popular, with people tagging themselves and their friends in images uploaded to the web.
2.3 Image Search Indexing
Indexing images on the web is an enormous challenge, and search engines have a difficult job storing, managing, organising and retrieving specific content efficiently. Multimedia searching over the web is still a problem: indexing is done either on the text associated with the content, such as the name of the content, or else through the visual features of the content, which is not very accurate for indexing.
Textual information refers to any text that is directly or indirectly related to the image perceived in a web page or document.
Jayaratne et al. [3] identify the following components from which textual semantics can be extracted, since they relate to the images embedded in a web page: Image Title, Image Alternate Text, Image Caption, Page Title, Main Text and Meta Data.
Recent systems facilitate the indexing of images by their visual contents. Visual contents may include various features of the images such as colour, texture, shape, location, spatial arrangements and topological relations. WebSeer [8] utilises an image analysis algorithm which classifies images as photographs or artificial drawings.
Another system, developed by Rathi and Majumdar [6], indexes images through the spatial and topological relationships between the objects in them. It first considers which objects are to the left, to the right, above or below other objects, and then takes the topological relationships into consideration in order to extract the exact relation in the 2-D planar region. All of these systems, however, still rely on human-based relevance feedback, which is collected for use in future searches.
Relevance feedback is used in most of the systems mentioned above. Another system, Cortina [5], uses relevance feedback from users to record whether an image returned for a particular search or query is accepted or rejected, in order to enhance future searches.
3. Methodology
We use human computation to improve image searching mechanisms. This is done through PicChanster, a game embedded in the Facebook social network, which is an innovative idea. Von Ahn’s initial work [10] on human computation was deployed on a dedicated web site and did not explore integrating such ideas into a social networking environment, as we propose. Figure 1 below depicts the system architecture of PicChanster.
Figure 1: High Level Diagram of the system
Source: M. Chetcuti (2008)
3.1 Images and Keywords
The first step is the collection of images and keywords. Our crawlers captured two distinct sets of 100 images each. The first set, called the Certain set, is composed of images that were indexed by humans through manual image tagging and were extracted from the Flickr web site. Since Flickr allows users to manually tag the images they upload, the four keywords captured for each image were taken directly from the tags entered by the users themselves.
The second set, called the Uncertain set, was acquired from the Google image search engine, whose indexing mechanism is based on the textual information surrounding the images. In fact, the indexing of these pictures is based on the HTML captions, the text adjacent to every image and the filenames of the images. For each image retrieved through Google, four labels were captured from the text surrounding the image, ensuring that stop words were not captured as labels. The choice of the four labels was based on frequency, i.e. the four most frequently used words in the HTML page of the image source URL.
The four defined labels of each of the 200 images were passed through a stemming algorithm [4] so that the root of each label is used.
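The frequency-based choice of labels for the Uncertain set can be sketched as follows. This is a minimal illustration rather than the crawler actually used: the function name `top_labels`, the tokenisation rule and the tiny stop-word list are assumptions, and the stemming step [4] that follows in the pipeline is deliberately left out.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real list would be much longer.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "at", "for", "on"}


def top_labels(page_text, k=4):
    """Return the k most frequent non-stop words found in page_text."""
    # Assumed tokenisation: lower-case runs of ASCII letters.
    words = re.findall(r"[a-z]+", page_text.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    # most_common sorts by descending frequency.
    return [word for word, _ in counts.most_common(k)]


sample = ("The college of education: college, college and College! "
          "Education education. Smith smith department")
print(top_labels(sample))  # ['college', 'education', 'smith', 'department']
```

Each returned word would then be reduced to its root by the stemmer before being stored as a defined label.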
3.2 PicChanster
Once the images and keywords were captured, a database was created to hold all the data. The aim of the game is to capture as many keywords as possible. Each game of PicChanster lasts two minutes and each player can enter up to four keywords per image. If one or more keywords entered by the player match any of the keywords stored for that image in our database, the respective points are awarded. The main aim for the player is to top the global rankings.
Scores are only awarded to matched labels for the Certain set, since we deem keywords to be accurate only if they are human tagged. Keywords provided for the Uncertain set are analysed in order to convert images from the Uncertain set to the Certain set. During each game, images are shown in random order but alternating between the two sets, so that both sets receive equal exposure.
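The scoring rule described above can be sketched as follows. The function name `score_round`, the set identifiers and the value of one point per match are assumptions made for illustration; the labels are assumed to have been stemmed already.

```python
def score_round(image_set, player_labels, defined_labels, points_per_match=1):
    """Return (points, matched_labels) for one image shown to one player.

    image_set is "certain" or "uncertain" (hypothetical identifiers).
    """
    # A player may enter at most four keywords per image.
    entered = {label.strip().lower() for label in player_labels[:4]}
    matched = entered & {label.lower() for label in defined_labels}
    # Points are awarded for the Certain set only; Uncertain-set labels
    # are still recorded so that they can be analysed later.
    points = points_per_match * len(matched) if image_set == "certain" else 0
    return points, matched


defined = ["girl", "read", "book", "librari"]
print(score_round("certain", ["Girl", "book", "tree"], defined)[0])    # 2
print(score_round("uncertain", ["Girl", "book", "tree"], defined)[0])  # 0
```

Returning the matched labels alongside the points keeps the Uncertain-set matches available for the later conversion analysis even though they earn no score.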
Our main aim is to use the reach of social networks to attract users to PicChanster and to acquire as many labels as possible for the enhancement of image indexing. Users are reached through Facebook invitations, the applications directory, mini feeds and news feeds.
4. Evaluation
These test results were extracted four weeks after the application was launched. Over the four weeks, the players entered a total of 11,061 labels, an average of 142 labels per user and 55 labels per image in our database. Slightly more of these labels were assigned to the Certain set: the Uncertain set obtained a total of 5,148 labels, whereas the Certain set obtained 5,913.
All collected labels were passed through a stemming tool in order to keep the labels consistent.
More of the labels entered by the users match the defined labels of the Certain set than those of the Uncertain set, as seen below.
Set         Total no. of labels entered   Matching the 4 defined labels   Percentage of the labels matched
Uncertain   5,148                         317                             6.16%
Certain     5,913                         1,298                           21.95%
Table 1: Label Matching Statistics
Source: M. Chetcuti (2008)
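The figures reported in Table 1 and in the surrounding text can be re-derived with a few lines of arithmetic. The short check below uses only numbers stated in this paper (78 players, 200 images and the label counts of Table 1); the variable names are illustrative.

```python
labels_entered = {"Uncertain": 5148, "Certain": 5913}
labels_matched = {"Uncertain": 317, "Certain": 1298}
players, images = 78, 200

total = sum(labels_entered.values())
# Percentage of entered labels that match one of the four defined labels.
match_rate = {
    name: round(100 * labels_matched[name] / labels_entered[name], 2)
    for name in labels_entered
}

print(total)                  # 11061
print(round(total / players)) # 142 labels per user
print(round(total / images))  # 55 labels per image
print(match_rate)             # {'Uncertain': 6.16, 'Certain': 21.95}
```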
4.1 Matched Labels
Given that we have two sets of 100 images each, we have a total of 200 images. Each image has 4 labels associated with it, so each set has a total of 400 labels, giving a grand total of 800 labels for the whole image set in our database. We now analyse how many distinct labels entered by our players matched the 400 defined labels of each set.
Table 2 below shows a 51.50% match for the Certain set, indicating that more than half of the labels captured from a human-based source were matched by the players. For the Uncertain set, on the other hand, only 23.00% of the defined labels were matched.
Set         Defined labels for each set   No. of matched labels   Percentage of the labels matched
Uncertain   400                           92                      23.00%
Certain     400                           206                     51.50%
Table 2: Number of Matched Labels
Source: M. Chetcuti (2008)
4.2 Accuracy Rate Percentage
From the accuracy rate of each of the 78 users who played PicChanster we can calculate the average accuracy percentage per label. The accuracy rate of each user was calculated on the labels entered for the Certain set only, and ranges from 58% down to 0%.
To calculate the average accuracy rate per label, we first find the Sum of Players’ Accuracy by adding up the accuracy rates of all players who contributed a specific label for a particular image. We then divide this sum by the number of players who contributed the same label. The result is the accuracy percentage per label, which can also be described as the average accuracy of each label. The formula is shown below.
\[
\text{Average Accuracy \% per Label} = \frac{\text{Sum of Players' Accuracy who contributed towards a particular label} \times 100}{\text{Number of Players contributing towards Label}}
\]
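As an illustration of the formula, the sketch below computes the average accuracy percentage of a single label. It assumes that each player's accuracy is stored as a fraction between 0 and 1, so that the factor of 100 yields a percentage; if the accuracies were already stored as percentages, that factor would be dropped. The function name and the player identifiers are hypothetical.

```python
def average_accuracy_per_label(player_accuracy, contributors):
    """Average accuracy (in %) of the players who contributed one label.

    player_accuracy: dict mapping player id -> accuracy as a fraction (assumed).
    contributors: ids of the players who entered this label for the image.
    """
    if not contributors:
        raise ValueError("the label has no contributing players")
    total = sum(player_accuracy[player] for player in contributors)
    return total * 100 / len(contributors)


# Made-up accuracies, for illustration only.
accuracy = {"ann": 0.5, "bob": 0.25, "cat": 0.0}
print(average_accuracy_per_label(accuracy, ["ann", "bob"]))  # 37.5
```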
4.3 Converting Images from Uncertain to Certain Set
Two important factors were taken into consideration when converting images from the Uncertain set to the Certain set: popularity and accuracy. An image cannot be converted on the basis of a high accuracy rating alone, nor on the basis of popularity alone. We have devised a mechanism which determines the averages from all labels pertaining to the Uncertain set that have a popularity greater than or equal to 3 and an accuracy greater than or equal to the average of the same set of labels. As shown in the chart below, the averages are 5 for popularity and 22.41% for accuracy. The average percentage accuracy and the number of labels for each respective label, together with the average amounts, are displayed in Figure 2 below.
Figure 2: Analysis on Label Accuracy & Popularity of the Uncertain Set
Source: M. Chetcuti (2008)
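One possible reading of this selection rule is sketched below: a player-entered label of an Uncertain image is kept as a candidate when it is both popular enough and accurate enough. The thresholds are the values quoted above (popularity of at least 3, accuracy of at least 22.41%); the type `LabelStats`, the function name, the interpretation of popularity as the number of players who entered the label, and the sample figures are all assumptions.

```python
from typing import NamedTuple


class LabelStats(NamedTuple):
    label: str
    popularity: int   # assumed: number of players who entered the label
    accuracy: float   # average accuracy % per label


def candidate_labels(stats, min_popularity=3, min_accuracy=22.41):
    """Keep the labels that satisfy BOTH the popularity and accuracy thresholds."""
    return [
        s.label
        for s in stats
        if s.popularity >= min_popularity and s.accuracy >= min_accuracy
    ]


# Made-up statistics, for illustration only.
stats = [
    LabelStats("girl", 7, 30.0),
    LabelStats("book", 5, 25.0),
    LabelStats("smith", 6, 10.0),  # popular but not accurate enough
    LabelStats("tree", 1, 50.0),   # accurate but not popular enough
]
print(candidate_labels(stats))  # ['girl', 'book']
```

Requiring both conditions mirrors the statement that neither a high accuracy rating nor popularity alone is sufficient for conversion.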
Stemmed labels of the Uncertain Set   New stemmed labels of the Certain Set
Educ                                  Girl
Smith                                 Read
College                               Book
Depart                                Library
Table 3: Stemmed labels before and after
Source: M. Chetcuti (2008)
Table 3 shows the labels of the image in Figure 3, which was originally in the Uncertain set, together with its new labels after it reached the conversion threshold and was placed in the Certain set.
Figure 3: Image converted to the Certain set
Source: M. Chetcuti (2008)
We can deduce that the labels collected through the social network are more pertinent to the image than the labels collected from the Google website.
5. Future Work
Even though we have improved the way images are indexed, we believe that further contributions can enhance this project. Since we restrict players to four labels per image, the accuracy rate per image is limited. To enhance the game further, this number should be increased, or the number of labels should be left to the user’s choice. This would allow the average accuracy rating of each label to be more accurate.
Since we apply stemming to the labels gathered from the users, a consistent set of keywords is collected. However, the restriction of one word per label has shown that accuracy could be pushed further by allowing users to enter a phrase or a whole sentence as a description. Applying deep semantic analysis to such phrases or sentences would allow an accurate description of the whole situation in an image.
This will in turn lead to the generation of natural language sentences that can be utilised for future information retrieval. Natural language sentences formalise a structured set of rules for selecting and classifying sentences and their transformations into structured strings.
6. Bibliography
[1] Ellison, N., Boyd, D., Social Network Sites: Definition, History, and Scholarship, Journal of Computer-Mediated Communication (2007), URL: http://jcmc.indiana.edu/vol13/issue1/boyd.ellison.html [cited: February 2008]
[2] Hinchcliffe, D., The Social Graph: Issues and Strategies in 2008, Dion Hinchcliffe’s Web 2.0 Blog, URL: http://web2.socialcomputingmagazine.com/the_social_graph_issues_and_strategies_in_2008.htm [cited: February 2008]
[3] Jayaratne, L., et al., A Unified Approach to Indexing Multimedia on the Web, AT&T Poster presentation (May 2003), URL: http://citeseer.ist.psu.edu/cache/papers/cs/27542/http:zSzzSzwww.research.att.comzSz~rjanazSzjayaratne.pdf/a-unified-approachto.pdf [cited: February 2008]
[4] Porter, M., The Porter Stemming Algorithm, Tartarus People, Projects and Penitence (January 2006), URL: http://tartarus.org/~martin/PorterStemmer/ [cited: April 2008]
[5] Quack, T., et al., Cortina: A System for Large-Scale, Content-Based Web Image Retrieval and the Semantics within, ACM Multimedia 2004 (October 2004), URL: http://vision.ece.ucsb.edu/publications/04ACMMQuack.pdf [cited: March 2008]
[6] Rathi, V., Majumdar, A.K., Content Based Image Search over the World Wide Web, Online ICVGIP – 2002 Proceedings (2002), URL: http://www.ee.iitb.ac.in/~icvgip/PAPERS/142.pdf [cited: February 2008]
[7] Shadbolt, N., & Burton, M., Knowledge elicitation: a systematic approach. In J.R. Wilson & E.N. Corlett (eds.) Evaluation of Human Work: A practical ergonomics methodology. Second Edition. Taylor & Francis: UK (1995).
[8] Swain, J., et al., WebSeer: An Image Search Engine for the World Wide Web, IEEE Computer Vision and Pattern Recognition Conference (Submitted) (June 1997), URL: http://citeseer.ist.psu.edu/cache/papers/cs/394/http:zSzzSzwww.cs.uchicago.eduzSz~swainzSzpubszSzCVPR97sub.pdf/swain97webseer.pdf [cited: February 2008]
[9] Tennison, J., Living Ontologies: Collaborative Knowledge Structuring on the Internet, D.Phil, University of Nottingham (May 1999), URL: http://jenitennison.com/jeni-tennisonthesis.doc [cited: November 2007]
[10] Von Ahn, L., Games With a Purpose, Carnegie Mellon University (2004), URL: http://www.cs.cmu.edu/~biglou/ieeegwap.pdf [cited: October 2007]