Melding human and machine capabilities to document the world’s living organisms
University of Maryland TMSP series
March 7, 2011

Project Team
Arijit Biswas (CS, Doctoral student)
Anne Bowser (iSchool, Masters student)
Jen Hammock (EOL)
Derek Hansen (iSchool)
David Jacobs (CS, UMIACS)
Darcy Lewis (iSchool, Doctoral student)
Cyndy Parr (EOL)
Jenny Preece (iSchool)
Dana Rotman (iSchool, Doctoral student)
Erin Stewart (iSchool, Masters student)
Eric (CS, Undergrad student)

What we will talk about…
• Research aims
• Encyclopedia of Life (EOL)
• Scientists, citizen scientists, enthusiasts
• Identifying leaves:
  – Machine vision approach
  – Odd Leaf Out
  – Field Mission Games
• Questions and Discussion

BioTracker system architecture
[Diagram; labels recovered from the slide, grouping approximate]
• Mobile Devices with BioTracker app: Camera; Internet connection; Match recommendations; Q&A component; Biotracks map; Biocaching
• Community Portal: Profiles, groups, and species pages; Photos and commentary; Images, Identifications, Maps; Threaded discussion
• Computational Tools: Image database; Shape descriptors; Image segmentation algorithm; Image recognition algorithm; Inference system
• Flow labels: upload image; user input; accuracy estimate; Possible new species; answers; clarification questions; information collection, identification and upload
• Participants: Enthusiasts; Scientists

First research question
• What are the most effective strategies for motivating enthusiasts and experts to voluntarily contribute and collaborate?

The biodiversity crisis
Global collapse of commercial fisheries by 2053

A crisis in science

Citizen science
Photo credit: Mary Keim
NA Butterfly Association Fourth of July Count
Photo credit: Cornell Univ.
Audubon Christmas Bird Count

Powerful citizen science data
http://ebird.org

More species, less training
• Bioblitzes
• Geocaching

The Encyclopedia of Life
Imagine an electronic page for each species of organism on Earth.

EOL is a content curation community
• Content providers: Databases; Journals; LifeDesks
• Public contributions: Curating; Commenting; Tagging
http://www.eol.org

EOL statistics
• 100+ partner databases
  – 700 curators / 1000s contributors / 46,000 members
• 2.8 million pages
  – 500 thousand pages with Creative Commons content
• Over 2 million data objects and >1 million pages with links to research literature
• Traffic in past year: 1.7 million unique users, 6.2 million page views

Scientists and volunteers
“Scientists often have an aversion to what nonscientists say about science” (Salk, 1986)
Collaboration is based on several factors:
• Shared vocabulary, practices, and meanings
• Mutual recognition of knowledge, competency, and prestige
• Motivation to collaborate

Motivations for participation
Participation in social activities stems from personal and collective reasons
• Collectivism
• Principlism
• Egoism
• Altruism
Batson, Ahmad, Tsang, 2002

Pilot study – scientists’ motivational factors
[Bar chart: scale 0 to 5 for Egoism, Collectivism, Altruism, Principlism; grouped by faculty/research position: Senior, Junior, Other]

Pilot study – volunteers’ motivational factors
[Bar chart: scale 0 to 5 for Egoism, Collectivism, Altruism, Principlism; grouped by years of experience: 1-3, 4-5]

Second research question
• How can a socially intelligent system be used to direct human effort and expertise to the most valuable collection and classification tasks?

Mobile devices for plant species ID
• Build new digital collections
• Image-based search to assist in identification
• Make this available on mobile devices
• Use this platform to build user communities
• Collaboration with dozens of people at Columbia University, the Smithsonian NMNH, and UMD.

New images
For EOL, people using mobile devices, highest quality images of live specimens.
For Botanists: digitize 90,000+ Type Specimens at Smithsonian
And for machines, images that capture leaf diversity

Computer Vision for species ID
Use a photo to search a data set of known species. Goal is to assist the user, not make identification fully automatic.
1. Take a photo of a leaf on a plain background.
2. Automatic segmentation and stem removal
   Segmentation relies on value and saturation of pixels, EM algorithm, domain knowledge.
   Must handle diversity of shapes (examples: Humulus japonicus, Ipomoea lacunosa)
3. Build shape descriptors
   • Inner Distance Shape Context
   • Multiscale histograms of curvature
4. Search data set

System accuracy

Incorporating games into the Biotracker platform
Using games to direct human effort and computational resources towards species identification and classification
• Data Validation Games
• Field Data Collection Games

Odd Leaf Out
Using computer games for data validation and algorithm refinement

Odd Leaf Out Research Questions
• What will make this game more fun?
• What motivates users to play when the data is imperfect?
• How can the game assist with algorithm improvement?

Odd Leaf Out Next Steps
• Continue User Testing
• Analyze Game Play Logs and Surveys
  – Preferred version
  – What aspects give most accurate data
  – Does this provide useful feedback into LeafSnap algorithm
• Place game on Mechanical Turk for additional data

Biotracker field missions
Developing mobile-social games that motivate citizens to collect and validate useful scientific data

Smart Phone as Data Collection Tool

Inspirations
• Geocaching
• Letterboxing
• BioBlitz
• SFZero
• Project Noah

Biotracker Missions

Biotracker field missions
Next steps - prototyping and user testing
• Low fidelity prototypes
• Field testing at UMD

Questions and Discussion
www.biotrackers.net
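
Backup: segmentation sketch
As a rough illustration of step 2 under Computer Vision for species ID (segmentation from pixel saturation and value with the EM algorithm), the minimal Python sketch below fits a two-component Gaussian mixture to (saturation, value) features and labels each pixel as leaf or background. It is not the project's implementation: the function names, the diagonal-covariance mixture, the initial means, and the variance floor are all assumptions chosen for illustration, and stem removal and the domain knowledge mentioned on the slide are not modeled.

```python
import colorsys
import math


def sat_val(rgb):
    """Return (saturation, value), each in [0, 1], for an 8-bit RGB pixel."""
    r, g, b = (c / 255.0 for c in rgb)
    _, s, v = colorsys.rgb_to_hsv(r, g, b)
    return (s, v)


def log_gauss(x, mean, var):
    """Log-density of an axis-aligned (diagonal covariance) Gaussian at x."""
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        for xi, m, v in zip(x, mean, var)
    )


def em_segment(pixels, iters=30):
    """Label each pixel True (leaf) or False (background) via 2-component EM.

    `pixels` is a flat list of (r, g, b) tuples. Hypothetical helper, not
    taken from the slides.
    """
    if not pixels:
        return []
    feats = [sat_val(p) for p in pixels]
    # Assumed starting point: component 0 = pale, bright background;
    # component 1 = saturated, darker leaf.
    means = [(0.1, 0.9), (0.7, 0.4)]
    variances = [(0.05, 0.05), (0.05, 0.05)]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each pixel,
        # computed in the log domain to avoid underflow.
        resp = []
        for x in feats:
            logs = [
                math.log(weights[k]) + log_gauss(x, means[k], variances[k])
                for k in range(2)
            ]
            top = max(logs)
            p = [math.exp(l - top) for l in logs]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means, and variances from responsibilities.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = nk / len(feats)
            means[k] = tuple(
                sum(r[k] * x[d] for r, x in zip(resp, feats)) / nk
                for d in range(2)
            )
            variances[k] = tuple(
                # Variance floor (assumed value) keeps components from collapsing.
                max(sum(r[k] * (x[d] - means[k][d]) ** 2
                        for r, x in zip(resp, feats)) / nk, 1e-4)
                for d in range(2)
            )
    return [r[1] > r[0] for r in resp]
```

The function takes a flat list of 8-bit RGB tuples; reshaping the returned labels to the image width gives a binary leaf mask. The fixed initial means encode the plain-background assumption from step 1, so a dark or strongly colored background would need a different initialization.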