
How A Million People Could Save the Planet:
The Next Research Agenda for
Collaborative Computing
2012 Brazilian Symposium on Collaborative Systems
David W. McDonald
dwmc@uw.edu
University of Washington
The Information School
October 18, 2012
These are not <the thing>, but are pointers to <the thing>.
The Shifting Paradigm in Collaborative Computing
• There is a set of interesting problems at the intersection of computing and how people use computing
• The issues at the intersection can change as participation scales up to larger numbers
• Insight gained from studying the intersection can fundamentally change computing as a purposely designed and built artifact
• Theory and methods originating from any single perspective (computational, behavioral, or social) are insufficient to fully interpret the happenings in the intersection
Talk Outline
• Introduction
• The Shifting Paradigm in Collaborative Computing
• Insights from Prior Research
  – Expertise Locating
  – Proactive Displays
  – Lifestyle Behavior Change
• Patterns of Behavioral Observations in Wikipedia
  – Collective Behavioral Observation
  – Machine Learning Experiments
  – Candidate Patterns
  – Validation and Limitations
• Social Computational Systems Research Agenda
  – High-Level Research Challenges
  – Research Openings
Expertise Locating
• Research Questions
– How do people find necessary expertise?
– How can we build systems to support natural
expertise locating behavior?
• Methods
  – Qualitative, 9-month ethnographic field study
  – Grounded Theory analysis
  – System building/design (wrote code)
  – Quantitative evaluation of locating and matching heuristics
Expertise Locating
• Findings
– Locating process – Identification, Selection,
Escalation
– Identification – Work products and byproducts
can be used to generate recommendations of
individuals with ‘localized’ expertise
– Selection – Social Networks for contextualizing
social recommendation are only partially
effective
  – Pluggable software architecture (ERArch) to allow extension and addition of Identification and Selection techniques
Proactive Displays
(Systems: Auto Speaker ID, Ticket 2 Talk, Neighborhood Window)
• Design Goals/Questions
  – Enhance the feeling of community among conference attendees
  – Mesh with common social practices at the conference
  – Manage the privacy concerns of all participants
• Methods
  – System building/design
  – Field trial (deployment) at an academic conference
  – Observation, ad-hoc interviews, post-conference survey
Proactive Displays
(Systems: Auto Speaker ID, Ticket 2 Talk, Neighborhood Window)
• Findings
  – Proactive Display as an Open Region – an area where people of different status are socially allowed to interact (Goffman, Behavior in Public Places, 1963)
  – Shared Interactions – you don't really "interact" with a "proactive" system
  – Design Implications for Public Displays – Context(s), Content, Control
Lifestyle Behavior Change: UbiFit
• Research Question
  – How can technology help people move from the behaviors that define the lifestyle they have to a new lifestyle they want?
• Methods
  – System building/design
  – Field trial – 3 weeks
  – Field experiment – 3 months
  – Interviews, surveys, activity data
  – Analysis
    • Presentation of Self (Goffman)
    • Cognitive Dissonance Theory (Festinger)
    • Transtheoretical Model of Behavior Change (Prochaska et al.)
Lifestyle Behavior Change: UbiFit
• Findings
– Traditional models of validation for inference
systems are problematic when deployed in the
real world
– Theories being used for UbiComp fitness/health
applications are somewhat problematic (TTM)
– Awareness of behavior through personal ambient
display can overcome avoidance
– Fitness behavior patterns are not very regular
(exceptions are the rule)
Talk Outline
• Introduction
• The Shifting Paradigm in Collaborative Computing
• Insights from Prior Research
  – Expertise Locating
  – Proactive Displays
  – Lifestyle Behavior Change
• Patterns of Behavioral Observations in Wikipedia
  – Collective Behavioral Observation
  – Machine Learning Experiments
  – Candidate Patterns
  – Validation and Limitations
• Social Computational Systems Research Agenda
  – High-Level Research Challenges
  – Research Openings
Collective Behavioral Observations
• People make behavioral observations
  – Everyday social/behavioral science
  – Motivating example – driving
• Online Communities
  – Observations are attenuated
  – Leverage the power of the crowd, many people
• Wikipedia Behavioral Observations
  – Barnstars
Barnstars
[Slides: example barnstar awards and a gallery of barnstar images]
Observational Patterns
• Can we identify patterns of user activity through non-specialist observations?
• Possible problems …
– Pro-social recognition (piling on)
– Singular activity – popular
– Singular activity – extraordinary efforts
Generate Train & Test Sets
• Previous work (became Train Set)
  – Mined Nov. 2006 Wikipedia data dump
  – Over 14K unique barnstars, ~4,900 recipients
  – Created coding scheme, 7 top-level categories
  – 3 coders, ~2,126 barnstars
• Additional coding (new Test Set)
  – Random selection, cleaning
  – 2 coders, ~478 barnstars
Train & Test Set Distributions

Dimension of Observed Activity        | Train Codes | Train % | Test Codes | Test %
Editing Work                          |         852 |    27.8 |        180 |   29.1
Social and Community Support Action   |         763 |    24.9 |        150 |   24.2
Border Patrol                         |         342 |    11.2 |         81 |   13.1
Administrative                        |         284 |     9.3 |         54 |    8.7
Collaborative Action and Disposition  |         244 |     8.0 |         41 |    6.6
Meta-Content Work                     |         128 |     4.2 |         23 |    3.7
Undifferentiated Work                 |         447 |    14.6 |         90 |   14.5
Classification Experiments
• General Multi-label Classification Approaches
– Problem Transformation (PT)
– Algorithm Adaptation
• Features
– n-gram, barnstar name, barnstar image name, policy named, policy
linked, link to a page, link to a specific edit, …
• What worked reasonably well (PT1 is sketched below)
  – PT1 – Independent binary classification
  – PT4 – Classifier for every set of applied labels
  – AA – MLkNN, multi-label version of k Nearest Neighbors
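To make PT1 concrete, here is a minimal sketch, assuming scikit-learn; the toy corpus and the names `texts` and `labels` are hypothetical placeholders, not data from the study:

    # PT1 sketch: one independent binary classifier per activity dimension,
    # over n-gram features of the barnstar award text (one of the feature
    # families listed above). Toy data; hypothetical names throughout.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.preprocessing import MultiLabelBinarizer

    texts = ["Tireless copyediting across WWII articles",
             "Thanks for reverting vandalism so quickly"]
    labels = [{"Editing"}, {"Border Patrol"}]

    vec = TfidfVectorizer(ngram_range=(1, 2))        # unigram + bigram features
    X = vec.fit_transform(texts)
    Y = MultiLabelBinarizer().fit_transform(labels)  # one 0/1 column per label

    clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
    clf.fit(X, Y)                                    # PT1: independent binary problems

    # Per-label AUC (as on the next slide) would come from held-out data:
    # from sklearn.metrics import roc_auc_score
    # roc_auc_score(Y_test[:, i], clf.predict_proba(X_test)[:, i])

PT4, by contrast, treats each distinct combination of applied labels as a single class, and MLkNN adapts kNN to predict label sets directly.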
PT1 – Results (AUC)

Dimension of Activity  | Logistic Regression | Naïve Bayes | Random Forest (1k trees) | kNN (k=10)
Administrative         |               0.833 |       0.949 |                    0.942 |      0.903
Border Patrol          |               0.922 |       0.941 |                    0.952 |      0.956
Collaborative Action   |               0.750 |       0.722 |                    0.743 |      0.725
Editing                |               0.878 |       0.875 |                    0.879 |      0.884
Meta-Content           |               0.835 |       0.842 |                    0.883 |      0.800
Social and Community   |               0.802 |       0.796 |                    0.797 |      0.805
Undifferentiated Work  |               0.847 |       0.848 |                    0.844 |      0.854
Avg. AUC               |               0.838 |       0.853 |                    0.862 |      0.847
Identifying Candidates
• Select Barnstar Recipients
  – Recipients with 9 or more barnstars
  – 259 candidates, 4,327 barnstars
• Applied the Random Forest
  – Label the received barnstars
• Candidate Recipients
  – Predominant observed activity if the same label applied to more than half (sketched below)
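A minimal sketch of that majority rule, assuming each barnstar has already been reduced to a single classifier label; the function name and toy users are hypothetical:

    # Candidate rule: a recipient with 9+ barnstars exhibits a predominant
    # observed activity if one label covers more than half of their stars.
    from collections import Counter

    def predominant_activity(star_labels, min_stars=9):
        if len(star_labels) < min_stars:
            return None                      # not enough barnstars to qualify
        label, n = Counter(star_labels).most_common(1)[0]
        return label if n > len(star_labels) / 2 else None

    print(predominant_activity(["E"] * 7 + ["S", "B", "E"]))  # 8 of 10 -> 'E'
    print(predominant_activity(["S", "A", "E", "B"] * 3))     # 3 of 12 each -> None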
Candidates

Dimension of Activity  | Label | Avg % | Candidates
Editing                |   E   |  67.9 |         25
Border Patrol          |   B   |  73.2 |         13
Social and Community   |   S   |  61.5 |         54
Administrative         |   A   |  66.4 |         75
Collaborative Actions  |   C   |  52.0 |          1
Meta-Content           |   M   |  76.8 |          4
Undifferentiated Work  |   U   |  60.0 |         10
Total                  |       |       |        182
Review Candidates & Labels
• Random selection of pattern candidates
  – 39 of the 182 (21.4%), yielding 544 of the 4,327 barnstars (12.6%)
• Validation
  – Possible duplicates, possible non-barnstars
  – Misapplied labels
Reviewing Patterns
• Independence of the Observations
  – Seem relatively independent
  – No evidence of barnstars awarded to the same recipient for the exact same event
• Limitations
  – Skew in what the community "values" and in the numbers (a challenge for ML validation – unbalanced data)
  – Linking candidates and patterns to the actual edits is future work
Working at the Intersection
• Contribution to Computing
  – Naturalistic datasets open interesting problems for ML algorithms
    • Massive datasets probably require application of ML techniques
  – Approaches for handling short text, incremental contributions
  – Unbalanced data (see the sketch below)
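One standard (if partial) remedy for that imbalance, sketched here assuming scikit-learn and the PT1 setup above; the commented variable names are illustrative:

    # Re-weight classes inversely to their frequency so rare dimensions
    # (e.g. Meta-Content, ~4% of codes) are not swamped by the majority.
    from sklearn.linear_model import LogisticRegression

    clf = LogisticRegression(class_weight="balanced", max_iter=1000)
    # clf.fit(X_train, y_rare_label)   # one binary problem per PT1 label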
McDonald, D. W., S. Javanmardi and M. Zachry (2011) Finding Patterns in Behavioral Observations by
Automatically Labeling Forms of Wikiwork in Barnstars. Proceedings of the 7th International Symposium on
Wikis and Open Collaboration (WikiSym'11).
Sajnani, H., S. Javanmardi, D. W. McDonald and C. Lopes. (2011) Multi-Label Classification of Short Text: A
Study on Wikipedia Barnstars. Presented at the “Analyzing Microtext” Workshop at the Twenty-Fifth
Conference on Artificial Intelligence (AAAI-11).
Talk Outline
• Introduction
• The Shifting Paradigm in Collaborative Computing
• Insights from Prior Research
  – Expertise Locating
  – Proactive Displays
  – Lifestyle Behavior Change
• Patterns of Behavioral Observations in Wikipedia
  – Collective Behavioral Observation
  – Machine Learning Experiments
  – Candidate Patterns
  – Validation and Limitations
• Social Computational Systems Research Agenda
  – High-Level Research Challenges
  – Research Openings
The Shifting Paradigm in Collaborative Computing
• There is a set of interesting problems at the intersection of computing and how people use computing
• The issues at the intersection can change as participation scales up to larger numbers
• Insight gained from studying the intersection can fundamentally change computing as a purposely designed artifact
• Theory and methods originating from any single perspective (computational, behavioral, or social) are insufficient to fully interpret the happenings in the intersection
Social Computational Systems
Research Agenda
• Defining SoCS
– A Social Computational System (SoCS) interleaves machine
activity and human activity to solve problems that neither machine
nor human can solve alone.
• Properties of SoCS
– Allow people to do what people do best
– Allow machines to do what machines do best
– Solve unique problems that interleave both
  – Could be a 1-with-1 system (one person with one machine; see the sketch below)
– Perhaps scaling of SoCS could solve more difficult problems
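As a purely hypothetical illustration of that interleaving (not a system from the talk), a 1-with-1 SoCS might look like the following loop, where `model` and `ask_human` are assumed stand-ins:

    # Toy SoCS loop: the machine labels what it can handle confidently, the
    # person handles the rest, and each human answer becomes training signal.
    def socs_label_loop(items, model, ask_human, threshold=0.9):
        decided = []
        for item in items:
            label, confidence = model.predict(item)  # machine does what it does best
            if confidence < threshold:
                label = ask_human(item)              # person does what people do best
                model.update(item, label)            # the system learns from the answer
            decided.append((item, label))
        return decided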
SoCS: Research Openings
[Diagram: Human and Computer connected through an Interface, layered over a Collaborative Substrate/Infrastructure]
SoCS: Research Openings,
Collaborative Substrate
Collaborative Substrate/Infrastructure
• Software Engineering
– Architectures for effective interleaving
– Toolkits to support new system development
• Languages
– Support massive parallelization between people/machine
– Expressive asynchrony
– Support task decomposition & recomposition among people/machine
• Data Management
  – Data Provenance (a toy record is sketched below)
    • Who generated the data? (Person or machine?)
    • How does data (quality) change over time?
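A toy provenance record illustrating those two questions; all names here are hypothetical, not a proposed standard:

    # Each revision records who made it (person or machine) and a quality
    # estimate at that time, so quality can be tracked as data evolves.
    from dataclasses import dataclass, field
    from datetime import datetime, timezone

    @dataclass
    class Revision:
        author: str          # user id or model/version string
        is_machine: bool     # person or machine?
        timestamp: datetime
        quality: float       # estimated quality at this point in time

    @dataclass
    class DataItem:
        content: str
        history: list = field(default_factory=list)

        def record(self, author, is_machine, quality):
            self.history.append(Revision(
                author, is_machine, datetime.now(timezone.utc), quality))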
SoCS: Research Openings,
Human/Social
• Psychological
– Motivations, incentives to make contributions
– Promote high quality contributions
– Skill development and individual growth
• Interaction, Social
  – Support for prosocial or congenial interaction
  – Leveraging or minimizing conflict
  – Effective support for meta-conversations about the system/tasks
  – Provide meaningful feedback on the work, tasks, contributions
SoCS: Research Openings,
Computational
• Intelligent Systems
  – Understanding error rates of machines and people
– Patterns across very large numbers of contributions
– Patterns in very small contributions
• Data Mining
  – Effective use of user contributions
– Working to minimize multiple collections
SoCS: Research Openings, Interface
• Visualizations
  – Understand, interpret the who, what, where of contributions
  – Where are groups of people, clusters of work
  – Where are there gaps
• Usability
  – Simplify making a contribution
  – Identifying tasks or places where contributions are needed
  – Administration tasks
Social Computational Systems
Research Agenda
• Three High-Level Challenges for SoCS
– Methodological Challenge
Effectively use existing methods to study the intersection and,
where those methods fail, develop new methods to address the
intersection.
– Human Trait or Technical Quality (Trait/Quality) Challenge
Understand the shifting influences of human traits and technical qualities across scales to accommodate shifting levels of participation in SoCS, whether increasing or decreasing.
– Design Challenge
Communicate SoCS design principles so that the broader
community of system builders and industry can readily utilize
them.
Promising Domains
• Leverage human skills, insight, intuitions
• Leverage the ability of machines to model, calculate,
aggregate, visualize
• Social Computational Systems as Applications
– Cognitive support – memory, understanding, comprehension
– Social support – facilitate interactions with others, cross-cultural
– Educational – interleave people and machines for teaching as well
as learning
– Government – grow participation in decision making
– Work/Labor – enable new forms of work, potentially new economies
Promising Domains
• Leverage human skills, insight, intuitions
• Leverage the ability of machines to model, calculate, aggregate, visualize
• Social Computational Systems for Grand Challenges
  – Global warming
  – Preserve cultural knowledge from extinction
  – Sustainable economic development
  – Health and wellness
Obrigado! (Thank you!)
• Questions & Discussion
• Acknowledgements
– Patterns of Behavioral Observations Study
• Sara Javanmardi, Hitesh Sajnani, Greg Tsoumakas, Mark Zachry, Crista Lopes
• NSF IIS-0811210
– Many other students, collaborators on the prior work