Loren Terveen
Computer Science & Engineering
The University of Minnesota
August 2011

• Theory
• Simulation
• Lab studies
• Surveys
• Qualitative studies
• Build and learn (e.g., Google, Facebook, Wikipedia)
• Build To Learn

GroupLens Research
• Create new interaction / social computing techniques
• Do empirical, quantitative research
• Learn from what we and others build

Data
Experimental Control

1. Learning from others’ data
2. Learning from our own data
3. Exercising experimental control

Q&A systems
Wikipedia

WP:Clubhouse? An Exploration of Wikipedia’s Gender Imbalance. Lam, S.K., Uduwage, A., Dong, Z., Sen, S., Musicant, D.R., Terveen, L., Riedl, J. WikiSym 2011.
NICE: Social translucence through UI intervention. Halfaker, A., Song, B., Stuart, D.A., Kittur, A., Riedl, J. WikiSym 2011.
Don't bite the Newbies: How Reverts Affect the Quantity and Quality of Wikipedia Work. Halfaker, A., Kittur, A., Riedl, J. WikiSym 2011.
Mentoring in Wikipedia: A Clash of Cultures. Musicant, D., Ren, Y., Johnson, J., Riedl, J. WikiSym 2011.
The Effects of Group Composition on Decision Quality in a Social Production Community. Lam, S.K., Karim, J., Riedl, J. Group 2010.
The Effects of Diversity on Group Productivity and Member Withdrawal in Online Volunteer Groups. Chen, J., Ren, Y., Riedl, J. CHI 2010.
rv you're dumb: Identifying Discarded Work in Wiki Article History. Ekstrand, M.D., Riedl, J.T. WikiSym 2009.
A Jury of Your Peers: Quality, Experience and Ownership in Wikipedia. Halfaker, A., Kittur, N., Kraut, R., Riedl, J. WikiSym 2009.
Is Wikipedia Growing a Longer Tail? Lam, S.K., Riedl, J. Group 2009.
Wikipedians are born, not made: a study of power editors on Wikipedia. Panciera, K., Halfaker, A., Terveen, L. Group 2009.
SuggestBot: Using Intelligent Task Routing to Help People Find Work in Wikipedia. Cosley, D., Frankowski, D., Terveen, L., Riedl, J. IUI 2007.
Creating, Destroying, and Restoring Value in Wikipedia. Priedhorsky, R., Chen, J., Lam, S.K., Panciera, K., Terveen, L., Riedl, J. Group 2007.

WP:Clubhouse? An Exploration of Wikipedia’s Gender Imbalance. Lam, S.K., Uduwage, A., Dong, Z., Sen, S., Musicant, D.R., Terveen, L., Riedl, J.
www.grouplens.org/node/466

http://www.nytimes.com/2011/01/31/business/media/31link.html?_r=1&src=busln
A topic generally restricted to teenage girls, like friendship bracelets, can seem short at four paragraphs when compared with lengthy articles on something boys might favor, like, toy soldiers or baseball cards, whose voluminous entry includes a detailed chronological history of the subject.
(BTW, it’s not about the friendship bracelets)

• Only 16% of new editors joining Wikipedia during 2009 identified themselves as women
• Women made only 9% of the edits by this cohort
• New women editors are more likely to stop editing and leave Wikipedia when their edits are reverted
• Topics of particular interest to women appear to get less (and poorer) coverage in Wikipedia
(Hmm… maybe Wikipedia has a low collective IQ!)
Come to WikiSym to get the details!

MovieLens
Cyclopath
[Screenshot labels: 200 Union St SE, Lagoon Theatre]

How do contributors to open content systems become contributors?

Inspired by…
• Wikipedians fill different niches than non-Wikipedians
• Wikipedians branch out to new areas and topics as they mature
• Wikipedians take on more “community work” as they mature
• Qualitative study with nine participants self-reporting

Evidence for “becoming”?
• Quantity of work
• Quality of work
• Nature of work

Are Wikipedians Born or Made?
Wikipedian: a registered editor with 250+ edits over his/her lifetime
• If editors reach 250 edits within our data set, they are labeled Wikipedian from the beginning

English Wikipedia dump (January 13, 2008)
• Edits from bots and other non-human means removed
We counted:
• Only registered editors
• Wikipedians (users with 250+ edits): 38K
• Non-Wikipedians: random sample of 38K

[Figure: edits per day per editor (“user days”), starting from “Day 1”]
Wikipedians are Born / Made
Is a user’s fate sealed?

Measure: Persistent Word Revisions (PWRs)
• Proportion of words added that persist five revisions
Wikipedians are Born / Made
Other quality metrics?

Conjecture: Wikipedians take on community maintenance work over time
Several ways to formalize:
• Editing in “talk” (and other) namespaces (Nope: still “born”)
• Referring to “community norms” (Wikipedia policies) to explain edits
Wikipedians are Born / Made
• Learning norms vs. learning to appeal to the norms?
• Training: effective editing

• Common pattern: initial burst of activity, decline, steady state
• Wikipedians look different from day one
• Little evidence for “Becoming Wikipedian”: Wikipedians are born, not made
• Can we reconcile?

This is depressing! Possible responses:
• Early interventions
• Change the culture
• Systemic initiatives, e.g., APS Wikipedia Initiative: http://www.psychologicalscience.org/index.php/members/apswikipedia-initiative
• Accept the reality of the long tail

We can’t ask Wikipedia users about our interpretations
What if the learning happened before users registered?

As of September 2009, we identified:
• 1172 “unambiguous” users
    • 268 of these users made some edits
• 440 “ambiguous” users
For unambiguous users, Day 1 = first time a user came to the site (not the day they registered)

Same pattern as for Wikipedia
[Figures: # of users, Do Not Edit vs. Do Edit (y-axes 0-300 and 0-800); x-axis bins begin “0”, “A minute or two”, “<= 5 min.”]
[Figure x-axis bins, continued: “1-50”, “51-100”, “101-250”, “251-500”, “501-1000”, “1001+”; “<= 15”, “<= 30”, “<= 60”]

“Born, Not Made” still seems true
• Cyclopath user surveys (WikiSym 2011 paper)
• Why these patterns? What ‘triggers’ initial contribution? And how might we nurture ongoing participation?
• Cyclopath contextual interviews planned

Motivating participation: How can we get more work done in open content systems?
Idea: match users with tasks they’re likely to be interested in and capable of doing
Requirements:
• Introduce task-matching algorithms/interfaces
• Assign users to different conditions
• Gather data necessary for evaluation
• Survey users

Intelligent Task Routing
Goals
• Get work done
• Nurture new users
• Serve community
Tools
• Recommender algorithms
• Interaction design
Theory
• Collective Effort Model
• Social Influence

MovieLens Task: Edit movie content
Four strategies (theory-based rationale in parentheses):
• High Pred: pick movies the system thinks the user will really like (individual value of outcomes)
• Rare Rated: pick movies the user has rated that few others have (lower effort for a given performance)
• Needs Work: pick movies that are missing the most information (contribution matters to group)
• Random: pick random movies (baseline)

• Assign ML users to four groups, one per algorithm
• About 2,000 subjects, 200 contributors
• Count # editors, contributions, fields

Editing behavior by strategy
[Figure: Count (0-250) by Metric (Number of editors, Number of edits, Fields filled in) for HighPred, RareRated, NeedsWork, Random]
• Rare rated: dominant
• Needs work: bang for buck
• Random: not bad here
• High prediction: lousy

Task matching worked
• Familiarity of user with task was most helpful
    • Reduces effort
    • Increases value
Note: we’ve tried this approach in Wikipedia and Cyclopath, too
• Different issues
• Generality

MovieLens
• 14 years of continuous development
• Several complete software architecture / UI redos (and another needed!)
• 1 full-time software engineer
• Much graduate student time over the years

• ~140K lines of code, in multiple languages
• 1 full-time software engineer
• Grad students: expectation they will spend 25-30% of their time on ‘development’ tasks
• Looming tasks:
    • UI redesign / reimplementation
    • Expanding geographic coverage

• Significant resources devoted to development
    • But: typically enables new experiments and/or builds the user community
    • And: funding for these resources often came only due to the success of the system/community
• Fewer papers
    • But: papers of a type that would be impossible otherwise
• We can investigate questions in different settings, applying different methods: cumulative science

Cycloplan (in collab. with Metropolitan Council)
• Planners can develop ideas informed by usage data (“What if I add a trail here?”)
• Planners can share plans with public
• Public can explore plans, give feedback (“How much would my route be improved with this trail?”)
• Public can share concerns directly to relevant officials

Participatory Crowdsourcing (in collab. with IBM)
• Citizens as sensors
• Continua of participation; incentives

Models for participation in open content systems
• Roles, privileges, processes: Nupedia vs. Wikipedia
Models for volunteer participation
• Initial vs. ongoing

http://www.grouplens.org/biblio

The GroupLens Research Group, particularly:
• John Riedl
• Joe Konstan
• Reid Priedhorsky
• Dan Cosley
• Katie Panciera
And: Tom Erickson, IBM

Me: terveen@gmail.com
Twitter: @lorenterveen
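
Backup: the four MovieLens task-selection strategies described earlier (High Pred, Rare Rated, Needs Work, Random) can be sketched roughly as below. This is only an illustration of the selection rules as worded on the slide, not the code used in the experiment. The `Movie` record, its field names, and the function names are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Movie:
    # All fields are assumptions made for this sketch.
    movie_id: int
    predicted_rating: float  # system's predicted rating for this user
    rated_by_user: bool      # has this user rated the movie?
    num_ratings: int         # how many users overall have rated it
    missing_fields: int      # number of empty information fields


def high_pred(movies: List[Movie], k: int) -> List[Movie]:
    """High Pred: movies the system thinks the user will really like."""
    return sorted(movies, key=lambda m: m.predicted_rating, reverse=True)[:k]


def rare_rated(movies: List[Movie], k: int) -> List[Movie]:
    """Rare Rated: movies the user has rated that few others have."""
    rated = [m for m in movies if m.rated_by_user]
    return sorted(rated, key=lambda m: m.num_ratings)[:k]


def needs_work(movies: List[Movie], k: int) -> List[Movie]:
    """Needs Work: movies that are missing the most information."""
    return sorted(movies, key=lambda m: m.missing_fields, reverse=True)[:k]


def random_pick(movies: List[Movie], k: int, rng: random.Random) -> List[Movie]:
    """Random: baseline, k movies drawn without replacement."""
    return rng.sample(movies, min(k, len(movies)))
```

In an experiment like the one on the slides, each user would be assigned to one of these four functions and shown only the tasks that function returns. Passing an explicit `random.Random` instance keeps the baseline reproducible.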