Utilizing Social Annotations for Topical Search in Twitter

Saptarshi Ghosh
BESU Shibpur
Complex Network Research Group
CSE, IIT Kharagpur
General overview

Social networks in online world




Twitter, folksonomies such as Delicious
Modeling the network evolution
Improving search services
Socio-technological networks in offline world


Indian Railway Network
Traffic analysis
Topical attributes of Twitter users

Twitter has emerged as an important source of
information & real-time news



Increasing access through topical search
[Teevan WSDM 2011]
Motivation: to discover topical attributes / expertise
of users
Potential applications


Know credentials of a user
Identify topical experts
How to discover topical attributes?

Prior attempts rely on contents of tweets or user profiles [Ramage ICWSM 2010, Pochampally SIGIR Workshop 2011]



Many profiles do not give topical information
Tweets often contain day-to-day conversation, making it difficult
to infer topics [Java SNA-KDD 2007, Wagner SocialCom 2012]
Proposed methodology


Use social annotations – how a user is described by others
Social annotations gathered through Twitter Lists
Mining Lists to infer topics

Collect Lists containing a given user U

Identify U’s topics from List meta-data


Basic IR techniques such as case-folding and
removal of domain-specific stopwords
Extract nouns and adjectives
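
The List-mining steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the stopword set, the function name `infer_topics`, and the sample List data are all assumptions, and the noun/adjective extraction step is omitted because it would need a part-of-speech tagger.

```python
import re
from collections import Counter

# Hypothetical domain-specific stopwords; the slides do not give the real set.
STOPWORDS = {"twitter", "list", "lists", "follow", "people", "my",
             "the", "and", "of", "to", "a", "in", "on", "for"}

def infer_topics(list_metadata, top_k=5):
    """Return the top_k most frequent terms over List names and descriptions."""
    counts = Counter()
    for name, description in list_metadata:
        text = f"{name} {description}".lower()   # case-folding
        tokens = re.findall(r"[a-z]+", text)     # crude tokenization
        counts.update(t for t in tokens
                      if t not in STOPWORDS and len(t) > 1)
    return [term for term, _ in counts.most_common(top_k)]

# Made-up meta-data of Lists that contain a given user U.
lists_containing_u = [
    ("Politics", "US senators and congress members"),
    ("GOP", "Conservative politics"),
    ("Iowa-politics", "Politics in Iowa"),
]
topics = infer_topics(lists_containing_u, top_k=3)
print(topics)
```

Terms that many independent List creators use for the same user rise to the top, which is the intuition behind relying on social annotations rather than the user's own text.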
Topics inferred from Lists
politics, senator, congress, government,
republicans, Iowa, gop, conservative
politics, senate, government, congress,
democrats, Missouri, progressive, women
linux, tech, open, software, libre, gnu,
computer, developer, ubuntu, unix
Lists vs. other features
Profile bio
Most common words from tweets: love, daily, people, time, GUI, movie, video, life, happy, game, cool
Most common words from Lists: celeb, actor, famous, movie, stars, comedy, music, Hollywood, pop culture
Who-is-who service

Developed a Who-is-Who
service for Twitter


Shows word-cloud for major
topics for a given user
http://twitter-app.mpi-sws.org/who-is-who/
N. Sharma, S. Ghosh, F. Benevenuto, N. Ganguly, K. Gummadi. Inferring
who-is-who in the Twitter social network. WOSN 2012.
Search system for topic experts
Cognos, a search system for topic experts
http://twitter-app.mpi-sws.org/whom-to-follow/


Given a query (topic)



Identify users related to the topic using Lists
Rank identified users
Uses ranking scheme based on Lists


Relevance of user to query
Popularity of user
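
A List-based ranking that combines relevance and popularity might look like the sketch below. The slides do not give the Cognos scoring formula, so the product of a relevance fraction and a log-scaled List count used here is purely an assumption, as are the function name `rank_experts` and the sample data.

```python
import math

def rank_experts(user_topics, list_counts, query, top_k=10):
    """Rank users for a query.

    user_topics: {user: {term: number of Lists describing the user with that term}}
    list_counts: {user: total number of Lists containing the user}
    """
    q_terms = query.lower().split()
    scored = []
    for user, term_counts in user_topics.items():
        n_lists = list_counts.get(user, 0)
        matched = sum(term_counts.get(t, 0) for t in q_terms)
        if matched == 0 or n_lists == 0:
            continue                          # user is not related to the topic
        relevance = matched / n_lists         # assumed relevance measure
        popularity = math.log(1 + n_lists)    # assumed popularity measure
        scored.append((relevance * popularity, user))
    scored.sort(reverse=True)
    return [user for _, user in scored[:top_k]]

# Made-up example data.
user_topics = {
    "alice": {"politics": 40, "senate": 10},
    "bob": {"politics": 5, "music": 45},
    "carol": {"linux": 30},
}
list_counts = {"alice": 50, "bob": 50, "carol": 30}
ranking = rank_experts(user_topics, list_counts, "politics")
print(ranking)
```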
Cognos results for “politics”
Cognos results for “stem cell”
Evaluation of Cognos

Evaluations through user-surveys


Cognos gives accurate results for wide variety of queries
Cognos vs. Twitter Who-To-Follow service


Judgment by majority voting
Out of 27 queries, Cognos was judged better for 12, Twitter
WTF for 11, and 4 were ties
S. Ghosh, N. Sharma, F. Benevenuto, N. Ganguly, K. Gummadi, Cognos:
Crowdsourcing Search for Topic Experts in Microblogs, SIGIR 2012.
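
The per-query majority voting used in the comparison can be illustrated with a small sketch. The vote labels, the tie rule (top two options receiving equal votes), and the sample votes are assumptions; the slides do not describe the exact survey protocol.

```python
from collections import Counter

def majority_judgment(votes):
    """Return the option with the most votes, or "tie" if the top two are equal."""
    if not votes:
        return "tie"
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "tie"
    return ranked[0][0]

# Made-up evaluator votes for three hypothetical queries.
votes_per_query = {
    "politics": ["cognos", "cognos", "twitter_wtf"],
    "music": ["twitter_wtf", "twitter_wtf", "cognos"],
    "stem cell": ["cognos", "twitter_wtf"],
}
summary = Counter(majority_judgment(v) for v in votes_per_query.values())
print(summary)
```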
Twitter as a source of information


Characterizing the experts in Twitter →
characterizing the Twitter platform as a whole
What are the topics on which information is
available on Twitter?
Topics in Twitter – major topics to niche ones
Study on the Indian Railway Network
Motivation: rail accidents during 2010
• Details of accidents: in Wiki page on IR accidents
• Considered only accidents due to
  • Collision between trains
  • Derailment
IRN data collection

Crawled schedules of express trains from
www.indianrail.gov.in in October 2010



2195 express train-routes, 3041 stations
Scheduled time of each train reaching each station
Express train schedules for several years since 1991


From Trains At A Glance time-tables
Obtained from National Rail Museum, New Delhi
Observations


Many trunk-routes in the Indo-Gangetic Plain (IGP)
have high daily traffic with low headway
Bad scheduling of IR traffic



Routes in north India have especially low headway during
early morning hours when dense fog is likely
Skewed distribution of daily traffic
Unbalanced growth of traffic in IGP


Traffic in some segments in IGP increased by 250%
between 1991 and 2009
Very low construction of new tracks
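
Headway, taken here to mean the time gap between consecutive trains passing the same point, could be computed from scheduled times roughly as follows. This is a sketch under assumptions: the helper names and the sample schedule are invented, and the actual analysis may have worked on track segments rather than single stations.

```python
def to_minutes(hhmm):
    """Convert an "HH:MM" string to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)

def headways(times_in_minutes):
    """Gaps in minutes between consecutive trains over a daily (24 h) cycle."""
    times = sorted(t % 1440 for t in times_in_minutes)
    if len(times) < 2:
        return []
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    gaps.append(1440 - times[-1] + times[0])   # gap across midnight
    return gaps

# Made-up scheduled times of trains at one station.
schedule = ["05:10", "05:25", "05:40", "13:00", "22:30"]
gaps = headways([to_minutes(t) for t in schedule])
min_headway = min(gaps)
print(gaps, min_headway)
```

Restricting the input to trains scheduled in the early morning hours would give the fog-period headway that the observations single out.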
Publication and press coverage
S. Ghosh, A. Banerjee, N. Ganguly. Some insights on the recent spate
of accidents in Indian Railways. Physica A, Elsevier, 2012.
Thank You
Questions / Suggestions?
Backup slides
Cognos vs. Twitter Who-To-Follow