How A Million People Could Save the Planet:
The Next Research Agenda for Collaborative Computing

2012 Brazilian Symposium on Collaborative Systems
David W. McDonald (dwmc@uw.edu)
The Information School, University of Washington
October 18, 2012

"These are not <the thing>, but are pointers to <the thing>."

The Shifting Paradigm in Collaborative Computing
• There is a set of interesting problems at the intersection of computing and how people use computing
• The issues at the intersection can change as participation scales up to larger numbers
• Insight gained from studying the intersection can fundamentally change computing as a purposely designed and built artifact
• Theory and methods originating from any single perspective (computational, behavioral, or social) are insufficient to fully interpret what happens in the intersection

Talk Outline
• Introduction
• The Shifting Paradigm in Collaborative Computing
• Insights from Prior Research
  – Expertise Locating
  – Proactive Displays
  – Lifestyle Behavior Change
• Patterns of Behavioral Observations in Wikipedia
  – Collective Behavioral Observation
  – Machine Learning Experiments
  – Candidate Patterns
  – Validation and Limitations
• Social Computational Systems Research Agenda
  – High-Level Research Challenges
  – Research Openings

Expertise Locating
• Research Questions
  – How do people find necessary expertise?
  – How can we build systems to support natural expertise locating behavior?
• Methods
  – Qualitative, 9-month ethnographic field study
  – Grounded Theory analysis
  – System building/design (wrote code)
  – Quantitative evaluation of locating and matching heuristics

Expertise Locating
• Findings
  – Locating process: Identification, Selection, Escalation
  – Identification: work products and byproducts can be used to generate recommendations of individuals with 'localized' expertise (a minimal sketch follows this slide)
  – Selection: social networks for contextualizing social recommendation are only partially effective
  – Pluggable software architecture (ERArch) allows extension and addition of Identification and Selection techniques
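To make the Identification finding concrete, the following is a minimal, hypothetical sketch of a content-based identification heuristic of the general kind described: build a textual profile of each person from their work products, then rank people against a query describing the needed expertise. The names and data are invented for illustration; this is not the original ERArch code.

```python
# Hypothetical sketch of an Identification heuristic: rank people by how
# well their work products match a description of the needed expertise.
# (Illustrative only -- not the original ERArch implementation.)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented stand-in data: each person's authored documents, concatenated.
profiles = {
    "alice": "java swing ui event dispatch thread layout rendering",
    "bob": "sql database schema migration indexing query planner",
    "carol": "python data pipeline etl scheduling batch jobs",
}

people = list(profiles)
vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform([profiles[p] for p in people])

def rank_experts(query):
    """Return (person, score) pairs sorted by similarity to the query."""
    q = vectorizer.transform([query])
    scores = cosine_similarity(q, doc_matrix).ravel()
    return sorted(zip(people, scores), key=lambda pair: -pair[1])

print(rank_experts("slow sql query performance"))  # bob should rank first
```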
Proactive Displays
[Screenshots: Auto Speaker ID, Ticket 2 Talk, Neighborhood Window]
• Design Goals/Questions
  – Enhance the feeling of community among conference attendees
  – Mesh with common social practices at the conference
  – Manage the privacy concerns of all participants
• Methods
  – System building/design
  – Field trial (deployment) at an academic conference
  – Observation, ad-hoc interviews, post-conference survey

Proactive Displays
• Findings
  – Proactive display as an Open Region: an area where people of different status are socially allowed to interact (Goffman, Behavior in Public Places, 1963)
  – Shared interactions: you don't really "interact" with a "proactive" system
  – Design implications for public displays: Context(s), Content, Control

Lifestyle Behavior Change (UbiFit)
• Research Question
  – How can technology help people move from the behaviors that define the lifestyle they have to a new lifestyle they want?
• Methods
  – System building/design
  – Field trial: 3 weeks
  – Field experiment: 3 months
  – Interviews, surveys, activity data
  – Analysis
    • Presentation of Self (Goffman)
    • Cognitive Dissonance Theory (Festinger)
    • Transtheoretical Model of Behavior Change (Prochaska et al.)

Lifestyle Behavior Change (UbiFit)
• Findings
  – Traditional models of validation for inference systems are problematic when deployed in the real world
  – Theories being used for UbiComp fitness/health applications are somewhat problematic (TTM)
  – Awareness of behavior through a personal ambient display can overcome avoidance
  – Fitness behavior patterns are not very regular (exceptions are the rule)

Talk Outline (section transition: Patterns of Behavioral Observations in Wikipedia)

Collective Behavioral Observations
• People make behavioral observations
  – Everyday social/behavioral science
  – Motivating example: driving
• Online Communities
  – Observations are attenuated
  – Leverage the power of the crowd, many people
• Wikipedia behavioral observations: Barnstars

Barnstars / Barnstar Gallery
[Image slides: examples of barnstars awarded on Wikipedia user talk pages]

Observational Patterns
• Can we identify patterns of user activity through non-specialist observations?
• Possible problems …
  – Pro-social recognition (piling on)
  – Singular activity – popular
  – Singular activity – extraordinary efforts

Generate Train & Test Sets
• Previous work (became Train Set)
  – Mined Nov. 2006 Wikipedia data dump
  – Over 14K unique barnstars, ~4,900 recipients
  – Created coding scheme, 7 top-level categories
  – 3 coders, ~2,126 barnstars
• Additional coding (new Test Set)
  – Random selection, cleaning
  – 2 coders, ~478 barnstars

Train & Test Set Distributions

  Dimension of Observed Activity          Train Set         Test Set
                                          Codes    %        Codes    %
  Editing Work                             852    27.8       180   29.1
  Social and Community Support Action      763    24.9       150   24.2
  Border Patrol                            342    11.2        81   13.1
  Administrative                           284     9.3        54    8.7
  Collaborative Action and Disposition     244     8.0        41    6.6
  Meta-Content Work                        128     4.2        23    3.7
  Undifferentiated Work                    447    14.6        90   14.5

Classification Experiments
• General multi-label classification approaches
  – Problem Transformation (PT)
  – Algorithm Adaptation (AA)
• Features
  – n-grams, barnstar name, barnstar image name, policy named, policy linked, link to a page, link to a specific edit, …
• What worked reasonably well
  – PT1 – independent binary classification, one binary classifier per label (see the sketch after the results table)
  – PT4 – a classifier for every set of applied labels
  – AA – MLkNN, a multi-label version of k-Nearest Neighbors

PT1 – Results (AUC)

  Dimension of Activity    Logistic     Naïve    Random Forest    KNN
                           Regression   Bayes    (1k Trees)       (k=10)
  Administrative             0.833      0.949       0.942         0.903
  Border Patrol              0.922      0.941       0.952         0.956
  Collaborative Action       0.750      0.722       0.743         0.725
  Editing                    0.878      0.875       0.879         0.884
  Meta-Content               0.835      0.842       0.883         0.800
  Social and Community       0.802      0.796       0.797         0.805
  Undifferentiated Work      0.847      0.848       0.844         0.854
  Avg. AUC                   0.838      0.853       0.862         0.847
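To ground PT1, here is a minimal, runnable sketch of independent binary classification with per-label AUC, in the spirit of the experiments above. The features and labels are random synthetic stand-ins (the study used n-gram and link features over coded barnstar texts) and the forest is kept small, so this illustrates the evaluation setup rather than reproducing the published pipeline or numbers.

```python
# Minimal sketch of PT1 (binary relevance): one independent binary
# classifier per activity label, scored with per-label AUC.
# Synthetic stand-in data -- the study used n-gram/link features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

LABELS = ["Administrative", "Border Patrol", "Collaborative Action",
          "Editing", "Meta-Content", "Social and Community",
          "Undifferentiated Work"]

rng = np.random.default_rng(0)
X_train = rng.random((200, 50))                    # stand-in feature vectors
X_test = rng.random((80, 50))
Y_train = rng.integers(0, 2, (200, len(LABELS)))   # multi-label indicators
Y_test = rng.integers(0, 2, (80, len(LABELS)))

aucs = []
for j, label in enumerate(LABELS):
    # PT1: each label gets its own binary classifier, trained independently.
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_train, Y_train[:, j])
    scores = clf.predict_proba(X_test)[:, 1]       # P(label applies)
    auc = roc_auc_score(Y_test[:, j], scores)
    aucs.append(auc)
    print(f"{label}: AUC = {auc:.3f}")
print(f"Avg. AUC = {np.mean(aucs):.3f}")
```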
Identifying Candidates
• Select barnstar recipients
  – Recipients with 9 or more barnstars
  – 259 candidates, 4,327 barnstars
• Applied the Random Forest
  – Label the received barnstars
• Candidate recipients
  – Predominant observed activity if the same label applied to more than half of a recipient's barnstars (see the sketch after the table below)

Candidates

  Dimension of Activity    Label   Avg. %   Candidates
  Editing                    E      67.9        25
  Border Patrol              B      73.2        13
  Social and Community       S      61.5        54
  Administrative             A      66.4        75
  Collaborative Actions      C      52.0         1
  Meta-Content               M      76.8         4
  Undifferentiated Work      U      60.0        10
                                   Total:      182
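As a worked version of the candidate rule above: a recipient whose machine-labeled barnstars carry the same label more than half the time is treated as having a predominant observed activity. A minimal sketch follows, with invented data; real barnstars can carry multiple predicted labels, so each barnstar is represented as a set of labels.

```python
# Minimal sketch of the candidate rule: a recipient shows a predominant
# observed activity when more than half of their machine-labeled
# barnstars carry the same label. (Invented data, illustrative only.)
from collections import Counter

def predominant_activity(barnstar_labels, threshold=0.5):
    """barnstar_labels: one set of predicted labels per barnstar.
    Returns (label, share) if some label appears on more than
    `threshold` of the barnstars, else None."""
    counts = Counter(lbl for labels in barnstar_labels for lbl in set(labels))
    label, count = counts.most_common(1)[0]
    share = count / len(barnstar_labels)
    return (label, share) if share > threshold else None

# Hypothetical recipient with 9 barnstars, mostly labeled Editing ("E").
stars = [{"E"}, {"E"}, {"E", "S"}, {"E"}, {"B"}, {"E"}, {"E"}, {"S"}, {"E"}]
print(predominant_activity(stars))  # -> ('E', 0.777...)
```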
Review Candidates & Labels
• Random selection of pattern candidates
  – 39 of the 182 (21.4%), yielding 544 of 4,327 barnstars (12.6%)
• Validation
  – Possible duplicates, possible non-barnstars
  – Mislabel application

Reviewing Patterns
• Independence of the observations
  – Seem relatively independent
  – No evidence of barnstars awarded to the same recipient for the exact same event
• Limitations
  – Skew in what the community "values" and in the numbers (a challenge for ML validation – unbalanced data)
  – Linking candidates and patterns to the actual edits (future work)

Working at the Intersection
• Contribution to Computing
  – Naturalistic datasets open interesting problems for ML algorithms
    • Massive datasets probably require application of ML techniques
    • Approaches for handling short text, incremental contributions
    • Unbalanced data

References:
McDonald, D. W., Javanmardi, S., and Zachry, M. (2011). Finding Patterns in Behavioral Observations by Automatically Labeling Forms of Wikiwork in Barnstars. In Proceedings of the 7th International Symposium on Wikis and Open Collaboration (WikiSym '11).
Sajnani, H., Javanmardi, S., McDonald, D. W., and Lopes, C. (2011). Multi-Label Classification of Short Text: A Study on Wikipedia Barnstars. Presented at the "Analyzing Microtext" workshop at the Twenty-Fifth AAAI Conference on Artificial Intelligence (AAAI-11).

Talk Outline (section transition: Social Computational Systems Research Agenda)

The Shifting Paradigm in Collaborative Computing (reprise of the opening slide's four points)

Social Computational Systems Research Agenda
• Defining SoCS
  – A Social Computational System (SoCS) interleaves machine activity and human activity to solve problems that neither machine nor human can solve alone.
• Properties of SoCS
  – Allow people to do what people do best
  – Allow machines to do what machines do best
  – Solve unique problems that interleave both
  – Could be a 1-with-1 system (one person with one machine)
  – Perhaps scaling of SoCS could solve more difficult problems

[Diagram: Human – Interface – Computer]

SoCS: Research Openings, Collaborative Substrate/Infrastructure
• Software Engineering
  – Architectures for effective interleaving
  – Toolkits to support new system development
• Languages
  – Support massive parallelization between people/machines
  – Expressive asynchrony
  – Support task decomposition & recomposition among people/machines
• Data Management
  – Data provenance: who generated the data? (Person or machine?)
  – How does data (quality) change over time?
  (A minimal provenance-record sketch follows this slide.)
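As one concrete reading of the data-provenance opening, here is a minimal, hypothetical sketch of a provenance record such a collaborative substrate might attach to each datum, so later stages can ask who produced it (person or machine) and reason about quality over time. The type and field names are invented for illustration, not part of any existing system.

```python
# Hypothetical sketch of a provenance record a collaborative substrate
# might attach to each datum, so later stages can ask who produced it
# (person or machine) and track quality over time. Field names invented.
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Tuple

@dataclass(frozen=True)
class ProvenanceRecord:
    value: str                     # the contributed datum
    producer: str                  # person or machine identifier
    producer_kind: str             # "human" or "machine"
    produced_at: datetime          # when it was contributed
    derived_from: Tuple["ProvenanceRecord", ...] = ()  # upstream records

# A human contribution, and a machine-derived datum that points back to it.
raw = ProvenanceRecord("barnstar text", "User:Example", "human",
                       datetime.now(timezone.utc))
labeled = ProvenanceRecord("label=Editing", "rf-classifier-v1", "machine",
                           datetime.now(timezone.utc), derived_from=(raw,))
```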
SoCS: Research Openings, Human/Social
• Psychological
  – Motivations, incentives to make contributions
  – Promote high-quality contributions
  – Skill development and individual growth
• Interaction, Social
  – Support for prosocial or congenial interaction
  – Leveraging or minimizing conflict
  – Effective support for meta-conversations about the system/tasks
  – Provide meaningful feedback on the work, tasks, contributions

SoCS: Research Openings, Computational
• Intelligent Systems
  – Understanding error rates of machines and people
  – Patterns across very large numbers of contributions
  – Patterns in very small contributions
• Data Mining
  – Effective use of user contributions
  – Working to minimize multiple collections

SoCS: Research Openings, Interface
[Diagram: Human – Interface – Computer]
• Visualizations
  – Understand and interpret the who, what, and where of contributions
  – Where are groups of people, clusters of work?
  – Where are there gaps?
• Usability
  – Simplify making a contribution
  – Identify tasks or places where contributions are needed
  – Administration tasks

Social Computational Systems Research Agenda
• Three high-level challenges for SoCS
  – Methodological Challenge: Effectively use existing methods to study the intersection and, where those methods fail, develop new methods to address the intersection.
  – Human Trait or Technical Quality (Trait/Quality) Challenge: Understand the shifting influences of human traits and technical qualities across scales to accommodate shifting levels of participation in SoCS – potentially increasing or decreasing.
  – Design Challenge: Communicate SoCS design principles so that the broader community of system builders and industry can readily utilize them.

Promising Domains
• Leverage human skills, insight, intuitions
• Leverage the ability of machines to model, calculate, aggregate, visualize
• Social Computational Systems as Applications
  – Cognitive support: memory, understanding, comprehension
  – Social support: facilitate interactions with others, cross-cultural
  – Educational: interleave people and machines for teaching as well as learning
  – Government: grow participation in decision making
  – Work/Labor: enable new forms of work, potentially new economies
• Social Computational Systems for Grand Challenges
  – Global warming
  – Preserve cultural knowledge from extinction
  – Sustainable economic development
  – Health and wellness

Obrigado! (Thank you!)
• Questions & Discussion
• Acknowledgements
  – Patterns of Behavioral Observations study: Sara Javanmardi, Hitesh Sajnani, Greg Tsoumakas, Mark Zachry, Crista Lopes; NSF IIS-0811210
  – Many other students and collaborators on the prior work