Melding human and machine capabilities to document the world’s living organisms

University of Maryland TMSP series
March 7, 2011
Project Team
Arijit Biswas (CS, doctoral student); Anne Bowser (iSchool, master's student); Jen Hammock (EOL); Derek Hansen (iSchool); David Jacobs (CS, UMIACS); Darcy Lewis (iSchool, doctoral student); Cyndy Parr (EOL); Jenny Preece (iSchool); Dana Rotman (iSchool, doctoral student); Erin Stewart (iSchool, master's student); Eric (CS, undergraduate student)
What we will talk about…
• Research aims
• Encyclopedia of Life (EOL)
• Scientists, citizen scientists, enthusiasts
• Identifying leaves:
  – Machine vision approach
  – Odd Leaf Out
  – Field Mission Games
• Questions and Discussion
BioTracker system architecture
[Figure: architecture diagram connecting mobile devices, a community portal, and computational tools, used by enthusiasts and scientists]
• Mobile Devices with BioTracker app: camera, Internet connection, match recommendations, Q&A component, Biotracks map
• Community Portal: profiles, groups, and species pages; identifications, maps, accuracy estimates; threaded discussion
• Computational Tools: image database, shape descriptors, image segmentation algorithm, image recognition algorithm, inference system
• Data flows: upload of photos, biocaching, and commentary; image upload; user input; images; answers; information collection and clarification questions; identification and upload; possible new species
First research question
• What are the most effective strategies for motivating
enthusiasts and experts to voluntarily contribute and
collaborate?
The biodiversity crisis
Global collapse of commercial fisheries by 2053
A crisis in science
Citizen science
Photo credit: Mary Keim
North American Butterfly Association Fourth of July Count
Photo credit: Cornell Univ.
Audubon Christmas Bird Count
Powerful citizen science data
http://ebird.org
More species, less training
Bioblitzes
Geocaching
The Encyclopedia of Life
Imagine an electronic page for each species of organism on Earth.
EOL is a content curation community
• Content providers: databases, journals, LifeDesks
• Public contributions: curating, commenting, tagging
http://www.eol.org
EOL statistics
• 100+ partner databases
• 700 curators, 1000s of contributors, 46,000 members
• 2.8 million pages
• 500 thousand pages with Creative Commons content
• Over 2 million data objects and more than 1 million pages with links to research literature
• Traffic in past year: 1.7 million unique users, 6.2 million page views
Scientists and volunteers
"Scientists often have an aversion to what nonscientists say about science" (Salk, 1986)
Collaboration is based on several factors:
• Shared vocabulary, practices, and meanings
• Mutual recognition of knowledge, competency, and prestige
• Motivation to collaborate
Motivations for participation
Participation in social activities stems from personal and collective reasons:
• Collectivism
• Principalism
• Egoism
• Altruism
(Batson, Ahmad, & Tsang, 2002)
Pilot study – scientists’ motivational factors
[Chart: Egoism, Collectivism, Altruism, and Principalism on a 0 to 5 scale, by faculty/research position (Senior, Junior, Other)]
Pilot study – volunteers’ motivational factors
[Chart: Egoism, Collectivism, Altruism, and Principalism on a 0 to 5 scale, by years of experience (1-3, 4-5)]
Second research question
• How can a socially intelligent system be used
to direct human effort and expertise to the
most valuable collection and classification
tasks?
Mobile devices for plant species ID
• Build new digital collections
• Image-based search to assist in identification
• Make this available on mobile devices
• Use this platform to build user communities
• Collaboration with dozens of people at Columbia University, the Smithsonian NMNH, and UMD.
New images
• For EOL: the highest-quality images of live specimens, taken by people using mobile devices
• For botanists: digitize 90,000+ type specimens at the Smithsonian
• For machines: images that capture leaf diversity
Computer Vision for species ID
Use a photo to search a data set of known species.
The goal is to assist the user, not to make identification fully automatic.
1. Take a photo of a leaf on a plain background.
2. Automatic segmentation and stem removal
Segmentation relies on the value and saturation of pixels, the EM algorithm, and domain knowledge.
Must handle a diversity of shapes (examples: Humulus japonicus, Ipomoea lacunosa)
3. Build shape descriptors
• Inner Distance Shape Context
• Multiscale histograms of curvature
4. Search data set
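Step 2 above names the EM algorithm applied to pixel saturation and value. The sketch below illustrates that idea under stated assumptions and is not the project's implementation: it assumes each pixel has already been converted to a (saturation, value) pair in [0, 1], fits a two-component Gaussian mixture with diagonal covariances by EM, and encodes the "plain background" domain knowledge by treating the less saturated component as background. The function names `log_gauss` and `em_segment`, the initialization, the iteration count, and the variance floor are all hypothetical choices; stem removal is not covered.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at point x."""
    return sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
               for xi, m, v in zip(x, mean, var))


def em_segment(pixels, iters=30, var_floor=1e-4):
    """Label (saturation, value) pixels as leaf (True) or background (False).

    Hypothetical sketch: fits a two-component Gaussian mixture with EM and
    assumes the plain background is the less saturated component.
    """
    n, dims = len(pixels), len(pixels[0])
    # Initialise the component means at the least and most saturated pixels.
    means = [list(min(pixels, key=lambda p: p[0])),
             list(max(pixels, key=lambda p: p[0]))]
    variances = [[0.05] * dims for _ in range(2)]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(iters):
        # E-step: posterior probability of each component for every pixel.
        resp = []
        for x in pixels:
            logs = [math.log(weights[k]) + log_gauss(x, means[k], variances[k])
                    for k in range(2)]
            top = max(logs)  # subtract the max for numerical stability
            probs = [math.exp(l - top) for l in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: re-estimate mixing weights, means and (floored) variances.
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-12
            weights[k] = nk / n
            means[k] = [sum(r[k] * x[d] for r, x in zip(resp, pixels)) / nk
                        for d in range(dims)]
            variances[k] = [max(var_floor,
                                sum(r[k] * (x[d] - means[k][d]) ** 2
                                    for r, x in zip(resp, pixels)) / nk)
                            for d in range(dims)]
    # Domain knowledge: the leaf is the more saturated component.
    leaf = 0 if means[0][0] > means[1][0] else 1
    return [r[leaf] > 0.5 for r in resp]
```

In practice the (saturation, value) pairs could come from Python's standard `colorsys.rgb_to_hsv`, which takes RGB components in [0, 1] and returns (h, s, v); a real pipeline would also need the stem-removal step that this sketch leaves out.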
System accuracy
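Steps 3 and 4, together with an accuracy measure suited to ranked suggestions, can likewise be illustrated in a few dozen lines. This is a simplified stand-in, not the project's descriptor or matcher: it computes one flavor of multiscale curvature histogram (for each boundary pixel of a binary leaf mask, the fraction of a disk of radius r that lies inside the leaf, histogrammed separately for each radius), omits the Inner Distance Shape Context entirely, ranks species by histogram intersection with their best-matching exemplar, and reports top-k accuracy, i.e. how often the true species appears among the first k suggestions. That measure fits the stated goal of assisting the user rather than fully automating identification. All function names, the fixed pixel radii, the bin count, and the choice of histogram intersection are assumptions; a practical system would at least scale the radii to leaf size.

```python
def boundary_pixels(mask):
    """Foreground pixels that touch the background (or the image edge) 4-wise."""
    h, w = len(mask), len(mask[0])
    pts = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or not mask[ny][nx]:
                    pts.append((y, x))
                    break
    return pts


def area_fraction(mask, y, x, r):
    """Fraction of the disk of radius r centred at (y, x) lying inside the shape."""
    h, w = len(mask), len(mask[0])
    inside = total = 0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy * dy + dx * dx > r * r:
                continue
            total += 1
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and mask[ny][nx]:
                inside += 1
    return inside / total


def curvature_histograms(mask, radii=(2, 4, 8), bins=8):
    """Concatenate one normalised histogram of area fractions per radius (scale)."""
    pts = boundary_pixels(mask)
    if not pts:
        raise ValueError("mask contains no leaf pixels")
    descriptor = []
    for r in radii:
        hist = [0] * bins
        for y, x in pts:
            f = area_fraction(mask, y, x, r)
            hist[min(int(f * bins), bins - 1)] += 1
        descriptor.extend(count / len(pts) for count in hist)
    return descriptor


def histogram_intersection(a, b):
    """Similarity of two histograms: sum of bin-wise minima (higher is closer)."""
    return sum(min(u, v) for u, v in zip(a, b))


def rank_species(query, library):
    """Order species by the best intersection score of any of their exemplars.

    library: list of (species_name, descriptor) pairs.
    """
    best = {}
    for species, desc in library:
        score = histogram_intersection(query, desc)
        best[species] = max(best.get(species, 0.0), score)
    return sorted(best, key=best.get, reverse=True)


def top_k_accuracy(ranked_lists, true_species, k):
    """Fraction of queries whose true species is among the first k suggestions."""
    hits = sum(truth in ranked[:k]
               for ranked, truth in zip(ranked_lists, true_species))
    return hits / len(true_species)
```

Returning a ranked list rather than a single label is what lets the app show several match recommendations and leave the final choice to the user.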
Incorporating games into the Biotracker platform
Using games to direct human effort and computational resources toward species identification and classification
• Data Validation Games
• Field Data Collection Games
Odd Leaf Out
Using computer games for data validation and algorithm refinement
Odd Leaf Out
Research Questions
• What will make this game more fun?
• What motivates users to play when the data is imperfect?
• How can the game assist with algorithm improvement?
Odd Leaf Out
Next Steps
• Continue user testing
• Analyze game play logs and surveys
  – Preferred version
  – Which aspects give the most accurate data
  – Does this provide useful feedback to the LeafSnap algorithm?
• Place game on Mechanical Turk for additional data
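The next steps above include analyzing game play logs to see which aspects give the most accurate data. As a purely hypothetical illustration (the slides do not describe the game's rules or log format), assume each logged round records the ids of the images shown, all carrying the same species label, plus the one the player picked as the odd leaf out. Images picked unusually often relative to how often they were shown are then candidates for expert review as possibly mislabeled. The function name `flag_suspect_images`, the log format, and both thresholds are assumptions.

```python
from collections import Counter


def flag_suspect_images(rounds, min_views=3, threshold=0.5):
    """Return image ids that players often pick as the odd one out.

    rounds: iterable of (shown_image_ids, picked_image_id) pairs, a
    hypothetical log format. Only images shown at least min_views times
    are considered, so that one-off picks do not trigger a flag.
    """
    shown = Counter()
    picked = Counter()
    for images, pick in rounds:
        shown.update(images)  # count every appearance of each image
        picked[pick] += 1     # count how often each image was chosen as odd
    return sorted(image for image, views in shown.items()
                  if views >= min_views and picked[image] / views >= threshold)
```

Flagged images could be routed to experts, or have their labels down-weighted before retraining; this is one possible way such a game might feed back into the recognition algorithm.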
Biotracker field missions
Developing mobile-social games that motivate citizens to collect and validate useful scientific data
Smartphone as Data Collection Tool
Inspirations
• Geocaching
• Letterboxing
• BioBlitz
• SFZero
• Project Noah
Biotracker Missions
Biotracker field missions
Next steps: prototyping and user testing
• Low-fidelity prototypes
• Field testing at UMD
Questions and Discussion
www.biotrackers.net