From Research of Social Media to
Socially Mediated Research
2010 HCIL Symposium Workshop - UMD
Government Applications of Social Media
Networks and Communities
May 28, 2010
Natasa Milic-Frayling
Microsoft Research Cambridge
Outline
 Microsoft Research. Integrated Systems
team, research areas and approach
 ‘Social’ as a research topic:
Modelling Human to Human Interaction in
Technology Mediated Communities
 ‘Social’ as facilitator of research
Leveraging Communities of Practice.
Microsoft Research (MSR)
[Chart: number of PhD researchers, 1991 to 2008]
[Map: Redmond, Silicon Valley, MSR New England, MSR Cambridge, MSR India, MSR Asia]
• MSR Sites
– Redmond, Washington (September 1991)
– San Francisco, California (June 1995)
– Cambridge, United Kingdom (July 1997)
– Beijing, China (November 1998)
– Silicon Valley, California (July 2001)
– Bangalore, India (January 2005)
– Cambridge, Massachusetts (July 2008)
Research Areas
• WEB AND ON-LINE COMMUNITIES
• CONTENT ANALYSIS AND RICH UI
• MOBILE AND CROSS PLATFORM MEDIA
Academic Disciplines
• Information retrieval & NLP
• Machine Learning and Statistics
• HCI and Design
• Mathematical Modelling
• Graph Theory and Analysis
Team
Gabriella, Janez, Annika, Rachel, Gavin, Gerard, Jamie, Natasa, Eduarda
Vinay, Derek Hansen, Elizabeth Bonsignore, Dana Rotman, Ben Shneiderman, Marc Smith, Cody Dunn, Aleks Ignjatovic, Tom Lee
Projects
– InSite Live: Web site structure analysis and decomposition into subsites.
– Research Desktop: Research in information management and tagging practices in the Desktop environment.
– weConnect: Investigating narrow-cast of personalized content in close relationships and potential for mobile advertising.
– Social Footprints: Analysis of social interaction in online communities.
– Social IR: Extension of IR models with social network and models of approval, trust and reputation.
– NodeXL: Interactive graph analysis and visualization.
Connect the quantitative analyses with the qualitative analyses.
– VideoSnaps: Investigating concepts and services for cross platform media editing and streaming.
Research Platforms
Principles, mechanisms, and tools
for knowledge management.
Methodology – how to develop
mobile and social applications.
Trust and reputation.
Integration with the ecosystem
– pre-requisites for adoption
Shared summaries and overviews.
social as a research topic
INTERACTIONS IN TECHNOLOGY MEDIATED
COMMUNITIES
[Figure: online communities studied between 2002 and 2008: Web boards, distribution lists, newsgroups, forums, blogs, question answering, community question-answering]
Community Question-Answering
Question
Answers
Content Organization, Browsing and Search
Tags
Topic categories
100 Most Frequent Tags on Live QnA
Politics
Fun, Life, People, Philosophy
Community Analysis and Health Index
Towards a sustainable community
• Support novice users in becoming active community participants
• Support frequent users in increasing the volume and quality of their content contributions
• Promote high quality contributions (for external exploitation – through search).
85% of new users start with a question
72% never ask a question again
5% will engage in answering
61% of questions from new users don’t get more than 1 answer (23% get 0 answers)
Example: Investigate QnA Voting Practice
Approach:
• Statistical analysis of the user logs
• Manual inspection of the content
– Taxonomy of the users’ intent; to be evolved by the community of practice
• Define the basic features of the individuals and governing assumptions
• Derive a mathematical model of the voters metric.
• Observe the properties with regards to the irregular voting behaviour: random voting or collusion.
[Diagram: question (Q), answers (A), comment (C) and vote (V), linked by "answer to", "comment to" and "vote on" edges]
Social network activities:
• Answer to a question
• Comment on an answer
• Vote on the best answer
Which Answer to Vote On?
 Different ‘best answer’ connotations
The notion of the ‘best answer’ thus depends on the context and nature of
the answers - from correctness and usefulness to entertainment value
 Social bias
Assignment of votes may be influenced by social and personal ties,
voter’s perception, familiarity, and preferential treatment of familiar
community members
“Microsoft or Apple? Feel free to argue and point out their good and bad
points. Also feel free to rebut or debate on other people's standpoint. Best
argument/answer will get my friends’ and my "best answer" reward.”
 Self-promotion
Individuals’ aspirations to excel in their social status can adversely affect
the quality of their contribution to the community.
Reliability as Conformity?
 Reliability of a voter
Relative reliability of two voters is determined by the
proportion of all the voters who made the same choice of
the best answer:
The reliability scores represent a fixed-point for the
function F – apply Brouwer Fixed Point Theorem.
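The slide does not reproduce the function F, so the following is only a minimal sketch of the general idea, under an assumed update rule: a voter's reliability is the reliability-weighted share of voters who picked the same answer, averaged over the questions they voted on and normalised to sum to 1. The name `fixed_point_reliability` and the iteration scheme are illustrative assumptions. Brouwer's theorem guarantees that a fixed point exists for a continuous map on a compact convex set; it does not by itself guarantee that this simple iteration converges.

```python
from collections import defaultdict

def fixed_point_reliability(votes, iterations=100, tol=1e-9):
    """Iterate an ASSUMED reliability map until it stops changing.

    votes: list of (voter, question, answer) triples.
    Returns a dict voter -> reliability, normalised to sum to 1.
    """
    voters = sorted({voter for voter, _, _ in votes})
    reliability = {voter: 1.0 / len(voters) for voter in voters}
    for _ in range(iterations):
        support = defaultdict(float)  # (question, answer) -> weighted votes
        total = defaultdict(float)    # question -> total weight of its voters
        for voter, question, answer in votes:
            support[(question, answer)] += reliability[voter]
            total[question] += reliability[voter]
        raw = defaultdict(float)
        for voter, question, answer in votes:
            # Weighted proportion of voters who made the same choice.
            raw[voter] += support[(question, answer)] / total[question]
        norm = sum(raw.values())
        updated = {voter: raw[voter] / norm for voter in voters}
        if max(abs(updated[v] - reliability[v]) for v in voters) < tol:
            return updated
        reliability = updated
    return reliability

# Hypothetical data: three voters agree on both questions, one dissents.
votes = [
    ("ann", "q1", "a"), ("bob", "q1", "a"), ("cat", "q1", "a"), ("dan", "q1", "b"),
    ("ann", "q2", "x"), ("bob", "q2", "x"), ("cat", "q2", "x"), ("dan", "q2", "y"),
]
scores = fixed_point_reliability(votes)
```

In this toy data the dissenting voter ends up with a much lower score than the three who agree, which is the "reliability as conformity" effect the slide title questions.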
Real Data Analysis
[Figure: Vote Count vs. FP Method for ‘FUN’ and ‘PHILOSOPHY’]
Random Voting
Simulate random voting by drawing votes from a uniform distribution in place of Zipf’s law.
We vary the percentage of affected questions (from 1% to 10%) and the percentage of voters who voted randomly (from 1% to 10%).
The number of best answers changed is lower for the fixed point score (right) than for plurality voting (left).
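The plurality side of this experiment can be sketched roughly as follows. This is not the authors' simulation: the Zipf-like weights (1/rank), the function names, and treating the two percentages as independent probabilities are all assumptions made for illustration. It counts how many questions change their plurality winner once some votes are replaced by uniform random ones.

```python
import random
from collections import Counter

def zipf_weights(n_answers):
    # Assumed Zipf-like popularity: answer at rank r gets weight 1/r.
    return [1.0 / rank for rank in range(1, n_answers + 1)]

def plurality_winner(votes):
    counts = Counter(votes)
    # Ties are broken by the lowest answer index to keep results deterministic.
    return min(counts, key=lambda answer: (-counts[answer], answer))

def count_changed_winners(n_questions, n_voters, n_answers,
                          frac_questions, frac_voters, seed=0):
    """Count questions whose plurality winner changes under random voting."""
    rng = random.Random(seed)
    answers = list(range(n_answers))
    weights = zipf_weights(n_answers)
    changed = 0
    for _ in range(n_questions):
        honest = rng.choices(answers, weights=weights, k=n_voters)
        noisy = list(honest)
        if rng.random() < frac_questions:        # question is affected
            for i in range(n_voters):
                if rng.random() < frac_voters:   # this voter votes at random
                    noisy[i] = rng.choice(answers)
        if plurality_winner(noisy) != plurality_winner(honest):
            changed += 1
    return changed

baseline = count_changed_winners(200, 15, 5, frac_questions=0.0, frac_voters=0.1)
perturbed = count_changed_winners(200, 15, 5, frac_questions=0.1, frac_voters=0.1)
```

Comparing against a fixed point score would require the scoring function itself, which the slides do not give.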
Ballot Stuffing
• Simulate the collusion: fix the number of involved voters (‘stuffers’, here 4 and 10) and the percentage of questions affected (here 50%)
• Both majority voting and fixed point scoring are susceptible to ballot stuffing
• Fixed point scoring flags the outliers and helps identify collusion
Detecting Sybil Attack - Leveraging Social Networks
• Social networks are fast mixing
– Random walks quickly converge to stationary distribution
• Sybil attacks induce a bottleneck cut
– Fast mixing is disrupted
• A priori knowledge of an honest node
– Breaks symmetry
[Figure: honest nodes and Sybil nodes joined by attack edges]
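The bottleneck effect can be illustrated with a toy graph. This is a sketch under assumed parameters (two six-node cliques joined by a single attack edge, a four-step walk), not the detection method itself. It propagates the exact probability distribution of a random walk that starts at a known honest node and measures how much mass reaches the Sybil region.

```python
def walk_distribution(adjacency, start, steps):
    """Exact distribution of a simple random walk after `steps` steps."""
    dist = {node: 0.0 for node in adjacency}
    dist[start] = 1.0
    for _ in range(steps):
        nxt = {node: 0.0 for node in adjacency}
        for node, mass in dist.items():
            share = mass / len(adjacency[node])
            for neighbour in adjacency[node]:
                nxt[neighbour] += share
        dist = nxt
    return dist

def clique(nodes):
    # Every node is connected to every other node in the group.
    return {u: {v for v in nodes if v != u} for u in nodes}

honest = list(range(6))      # hypothetical honest region
sybil = list(range(6, 12))   # hypothetical Sybil region
graph = {**clique(honest), **clique(sybil)}
graph[5].add(6)              # the single attack edge
graph[6].add(5)

dist = walk_distribution(graph, start=0, steps=4)
sybil_mass = sum(dist[node] for node in sybil)
```

Both regions have the same total degree here, so the stationary distribution would put half of the mass on the Sybil side. After a short walk from the honest seed, far less than that has crossed the attack edge, which is the disruption of fast mixing that the slide refers to.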
social as facilitator of research
LEVERAGING COMMUNITIES OF PRACTICE
Issue: the Scale and the Limitations of Humans
• We require user input in order to inform the systems’ design and verify our hypotheses
• In search we build test collections:
– A set of topics, a corpus of documents, and relevance judgements for documents in the corpus
• Question: how do we build test collections for books?
– Search over Web pages involves low cost of inspection of individual Web pages
– Search over Book collections increases the cost due to the size and the coherence of topics across pages.
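As a rough illustration of the three parts of a test collection named above, here is a minimal sketch. The class name, field names, and the precision-at-k helper are assumptions for the example, not part of any system described in the talk.

```python
from dataclasses import dataclass, field

@dataclass
class TestCollection:
    topics: dict   # topic id -> topic statement
    corpus: dict   # document id -> document text
    judgements: dict = field(default_factory=dict)  # (topic id, doc id) -> bool

    def judge(self, topic_id, doc_id, relevant):
        """Record one relevance judgement for a known topic and document."""
        if topic_id not in self.topics or doc_id not in self.corpus:
            raise KeyError("unknown topic or document")
        self.judgements[(topic_id, doc_id)] = relevant

    def precision_at_k(self, topic_id, ranking, k):
        """Share of the top-k ranked documents judged relevant.

        Unjudged documents count as not relevant.
        """
        top = ranking[:k]
        hits = sum(1 for doc_id in top
                   if self.judgements.get((topic_id, doc_id), False))
        return hits / k

# Hypothetical toy data.
collection = TestCollection(
    topics={"t1": "history of printing"},
    corpus={"d1": "movable type", "d2": "cooking", "d3": "woodblock prints"},
)
collection.judge("t1", "d1", True)
collection.judge("t1", "d2", False)
p_at_2 = collection.precision_at_k("t1", ["d1", "d2", "d3"], k=2)
```

Every judgement is one human inspection. For books, each inspection covers many pages, which is where the cost discussed on this slide comes from.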
[Figure: Web scenario vs. Book scenario]
Read’n Play
• Architecture comprises four functional layers
– SOCIAL GAME SUPPORT
– USER ANNOTATIONS
– SEARCH AND NAVIGATION SUPPORT
– DATA STORE AND SEARCHABLE INDEX
• Data components: OCR Text Database; Text and Metadata Index; Image Database - Scanned Document Page
• Implemented using Web services - no client based interaction with the content
• Can be repurposed for other research projects
Social game
• Explorers: reward for finding relevant content
• Reviewers: reward for finding mistakes in explorers’ work
• Conflicts: reward for re-assessment (agreement is not necessary)
[Diagram labels: Explore, Penalty]
Pilot Study
Incentives for participation
• Tangible, e.g., monetary
– Winners: Microsoft Hardware and software
– All: Access to collected data
• Intangible reward, e.g., fun, social gain
– Leader board: Social status
Participants
• Open to everyone
• 48 registered + 81 INEX participants
• 17 contributed assessments (16 INEX participants)
Collected data
• Relevance assessments
– 3,478 judged books with
– 23,098 judged pages from
– 29 topics
• Log data
– 32,112 navigational events
– 45,126 judgement events
– 2,970 ‘search inside a book’ events
Feasibility
Averages across the 17 assessors
• 7.2 days with activity, out of 42
• 11.4 hours judging time
• 220 judged books
Average effort
• 7.3 minutes per relevant book, 2.7 minutes per irrelevant book (comparable to INEX 2003 ad hoc track)
• 37 seconds per relevant page, 22 seconds per irrelevant page
Extrapolated statistics
• 1000 books takes 52.7 hours, 1 : 9 ratio of relevant : irrelevant
• 33.3 days to judge one topic, with 95 minutes a day
• 70 topics, 200 books per topic with 20 judges takes 36.9 days
• 737 judges to complete task in one hour
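The extrapolated figures can be reproduced from the average effort per book. The sketch below assumes that one topic corresponds to 1000 books and that the 1 : 9 ratio means 10% of books are relevant; the function and variable names are illustrative.

```python
MIN_PER_RELEVANT_BOOK = 7.3
MIN_PER_IRRELEVANT_BOOK = 2.7

def judging_hours(n_books, relevant_share=0.1):
    """Total judging time in hours for n_books at a 1 : 9 relevant : irrelevant ratio."""
    relevant = n_books * relevant_share
    irrelevant = n_books - relevant
    minutes = (relevant * MIN_PER_RELEVANT_BOOK
               + irrelevant * MIN_PER_IRRELEVANT_BOOK)
    return minutes / 60

hours_per_topic = judging_hours(1000)        # about 52.7 hours
days_per_topic = hours_per_topic * 60 / 95   # about 33.3 days at 95 minutes a day
total_hours = judging_hours(70 * 200)        # about 737 judge-hours in total
hours_per_judge = total_hours / 20           # about 36.9 hours for each of 20 judges
```

Under these assumptions the 36.9 figure comes out as hours per judge. It corresponds to 36.9 days only if each judge works one hour a day; at 95 minutes a day the same workload would take roughly 23 days.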
Productivity Games
Summary
• Understanding social media requires a cross-disciplinary approach and new methods to study them
• Defining the characteristics and metrics of ‘healthy communities’ is a challenging task.
• ‘Social’ is increasing its role as an enabler for large scale experiments
Generally, we need to reflect on the methods and approaches we take when studying online communities.
Thank you
Microsoft Research
Cambridge
https://research.microsoft.com/is