Practice and Theory in Digital Libraries: The Case of Open Video

advertisement
Practice and Theory in Digital
Libraries: The Case of Open
Video
Libraries in the Digital Age (LIDA05)
Dubrovnik, Croatia
Gary Marchionini, PhD
University of North Carolina at Chapel Hill
www.ils.unc.edu/~march
march@ils.unc.edu
May 30, 2005
Outline
• Digital Libraries as phenomena
• Multimedia and video challenge our
text biases
• Open Video concepts and system
Moebius
– User studies
• Conclusion
Gary Marchionini, UNC-CH LIDA 2005
Pragmatics
• Useful theory and practice are a Moebius strip
• DL practice in informed by multiple theories
related to:
–
–
–
–
Information structure
Human behavior
System design
Social-political-economic constraints and organizational
behavior
– History and epistemology
• “We want principles, not only developed—the work of the
closet—but applied, which is the work of life.” Horace
Mann, Thoughts, 1867
Gary Marchionini, UNC-CH LIDA 2005
Theories of What and Why
• Digital extensions of physical
libraries
• Augmentations of intellect
• Collaborative spaces: sharium
• Cultural institutions
• World Brain
• Economic models
• Complex information systems
Gary Marchionini, UNC-CH LIDA 2005
Theories of How
• Reuse and open source information
• Levels of abstraction
• Information retrieval
• Information interaction
• Iterative design and evaluation
• Resource management
Gary Marchionini, UNC-CH LIDA 2005
Digital Library Design Space
1999: What Has Changed in
2005?
Community
Technology
Services
Content
Adapted from Marchionini & Fox, IP&M, 1999
Gary Marchionini, UNC-CH LIDA 2005
Provocation: Text no longer rules:
• The Net generation depends much less of reading (they are
entering universities as students and soon, as professors;
Oblinger & Oblinger, 2005 Educause book). In the US:
– Children age 6 or younger: average of 2 hrs/day using screen
media, 1.6 hrs/day playing outside, 39 min. reading
– 13-17 yr olds: average 3.1 hrs/day watching TV and 3.5
hrs/day with digital media. They multitask
– >2M million US children (ages 6–17) have their own Web site.
Girls are more likely to have a Web site than boys (12.2
percent versus 8.6 percent).
– Ability to use nontext expression—audio, video, graphics—
appears stronger in each successive cohort.
• Multimedia and Multitasking the trend of 21st century
• Information specialists MUST get over our text bias
Gary Marchionini, UNC-CH LIDA 2005
Open Video DL Case
• Open
– Public good
– Reusable
• Files not streams
• Chunking
• Agile views user interface
– Alternative representations (views)
– Agile control mechanisms
Gary Marchionini, UNC-CH LIDA 2005
Open Video Vision/Contributions
• An open repository of video files that can be re-used in a
variety of ways by the education and research communities
– Encourages contributions
– A testbed for interactive interfaces
• An easy to use DL based upon the agile views interface
design framework
– Multiple, cascading, easy to control views (pre, over, re,
shared, peripheral)
– Views based upon empirically validated surrogates
– An environment for building theory of human information
interaction
• A set of methods and metrics that reveal how people
understand digital video through surrogates
Gary Marchionini, UNC-CH LIDA 2005
Background & Status
• Begun 1995 with colleagues at UMD & BCPS
• Funding: NSF, NASA, NSF/LoC
• Collaborators/Contributors: I2-DSI, ibiblio, CMU,
UMD, NIST, Prelinger and Internet Archives,
NASA, ACM
• ~2600 video segments
• ~2000 different titles
• ~15000 unique visitors per month
• MPEG-1, MPEG-2, MPEG-4, QT
• OAI provider
• Ongoing user studies
• New Preservation initiative
Gary Marchionini, UNC-CH LIDA 2005
Agile Views Interface Research
• Provide a variety of access
representations (e.g., indexes) and
control mechanisms
• Usual search and browse capabilities
• Leverage both visual and linguistic
cues
• Create and test surrogates for
overview preview, shared and history
views
Gary Marchionini, UNC-CH LIDA 2005
User Study Framework
GOALS
learning, work, entertainment
TASKS
EFFORT
TIME
time spent searching
and viewing results
MENTAL LOAD
perceptual load
cognitive load
PHYSICAL LOAD
amount of muscle
movement
VIDEO
CHARACTERISTICS
select video for viewing
select scene for viewing
copy and use scenes
copy and use frames
other tasks?
genre: documentary,
narrative
topic: literal, figurative
style: visual, audio,
textual, place
SURROGATES,
AGILE VIEWS
display controls
keywords
storyboard w/ text, audio
slide show w/ text, audio
fast forward w/ audio
poster frames
Gary Marchionini, UNC-CH LIDA 2005
OUTCOMES
PERFORMANCE
INDIVIDUAL
CHARACTERISTICS
domain experience
video experience
cultural experience
computer experience
info seeking experience
metacognitive abilities
demographics
retrieval (precision, recall)
recognition (objects, action)
gist comprehension
(linguistic, visual)
SATISFACTION
perceived usefulness
perceived ease of use
flow
user satisfaction
The Surrogates
• Storyboard with text keywords (20-36 per
board@ 500 ms)
• Storyboard with audio keywords
• Slide show with text keywords (250ms repeated
once)
• Slide show with audio keywords
• Fast forward (~ 4X)
• Fast forwards 32X, 64X, 128X, 256X
• Poster frames
• Real time clips
• Text titles
Gary Marchionini, UNC-CH LIDA 2005
Surrogate Examples
Type of surrogate
Examples
Text surrogate
Title, keyword, description, etc.
Still image surrogate
Poster frame, storyboard/filmstrip, slide show, video stream, key-frame-based table of
contents, etc.
Moving image surrogate
Skim, fast forward, etc.
Audio surrogate
Spoken keywords, environmental sounds, music, etc.
Mutlimodal surrogate
Text surrogate + still image surrogate, still image surrogate + audio surrogate, etc.
Gary Marchionini, UNC-CH LIDA 2005
Metrics
Text gist
Still image
Action
Recognition
Object recognition (text)
Object recognition (graphical)
Action recognition
Inference
Gist determination
(free text)
Gist determination (multiplechoice)
Visual gist (vist) determination
Gary Marchionini, UNC-CH LIDA 2005
User Studies
• Study 1: Qualitative Comparison of Surrogates
(ECDL 02)
• Study 2: Fast Forwards (JCDL 03)
• Study 3: Narrativity (CHI 02; ASIST 03 paper)
• Study 4: Shared views and History Views (Geisler
dissertation)
• Study 4: Poster frames and text (eye tracking,
CIVR 03)
• Study 5: TREC evaluations (03 and 04)
• Study 6: cognitive load and ISEE (Mu diss.)
• Study 7: relevance judgments for video (Yang
diss.)
• Study 8: Surrogate integration study (in analysis)
• Others: several specific master’s papers (Hughes,
Gruss
Gary Marchionini, UNC-CH LIDA 2005
Study 1: Compare Surrogates
• What are the strengths and
weaknesses of different surrogates
from the users’ perspective?
• Are any of the surrogates better than
the others in supporting user
performance?
Gary Marchionini, UNC-CH LIDA 2005
The Surrogates
• Storyboard with text keywords (2036 per board@ 500 ms)
• Storyboard with audio keywords
• Slide show with text keywords
(250ms repeated once)
• Slide show with audio keywords
• Fast forward (~ 4X)
Gary Marchionini, UNC-CH LIDA 2005
Method
• 7 video segments (2-10 min), 5 surrogates
created for each
• 10 subjects with high video and computer
experience
• Three phases (all multi-camera videotaped)
– View full video then use 3 surrogates, repeat
• Participant observation and debriefing
– Do NOT view full video, use 3 surrogates, repeat
• Participant observation and debriefing
– Complete 3 assigned tasks with surrogates of choice
• Think aloud and debriefing
•
http://www.open-video.org/experiments/chi-2002/methods/study1.mov
Gary Marchionini, UNC-CH LIDA 2005
Tasks
• Gist determination—free text
• Gist determination—multiple choice
• Object recognition—textual
• Object recognition—graphical
• Action recognition (2-3 second clips)
• Visual gist (predict which frames
belong)
– http://www.open-video.org/experiments/chi2002/surrogates/index.html
Gary Marchionini, UNC-CH LIDA 2005
Preferences
• In debriefing after each phase, subjects
asked about preferences.
• Some preferences changed over the
phases
• 2 subjects preferred ff
• 4 subjects said ff if audio keywords added
• 1 storyboard with audio keywords
• 2 slide show with audio keywords
•  drop ss with text keywords, develop ff
Gary Marchionini, UNC-CH LIDA 2005
Performance
• No SRD on gist (both free text and multiple
choice)
• SRD on action recognition favoring ff
• ‘Near’ SRD on text object recognition favoring
SB/w audio keywords
• 8:1 to 29:1 compaction rates suitable for tasks
• Psychometric and face validity support for the
tasks (means and variances; relevant to real
tasks)
• SRD in gist and visual gist for one video
– Homogeneity of frames diminishes surrogate value
– Keywords help when visual variability decreases
Gary Marchionini, UNC-CH LIDA 2005
Qualitative Results
• Subjects suggested different surrogates
for different tasks (e.g., ff for judging kid
safe, sb for identifying images, ff for video
styles)
• Three senses of gist
– Topic (T)
– Narrativity (N)
– T+N+visual style
• Individual preferences and experiences
influence surrogate effectiveness
Gary Marchionini, UNC-CH LIDA 2005
Study 2: Fast Forward
• How fast can we make fast forwards?
–
–
–
–
4 ff conditions (32X, 64X, 128X, 256X)
Four video segments for each condition
45 subjects (1/2 UG, 1/2 grad, 2/3 female)
6 tasks (full text gist, multiple choice gist,
word object recognition, graphical object
recognition, action recognition, visual gist)
– Counterbalance speed and videos
– Web-driven experimental condition, 3-camera
video tapes, single subject at a time in
usability laboratory
Gary Marchionini, UNC-CH LIDA 2005
Example Image Recognition Stimulus
Gary Marchionini, UNC-CH LIDA 2005
Results
• SRD on 4 of 6 tasks as speed increases, however,
reasonable performance at even the highest rate
• Video content/genre interacts with performance
• Preference does not parallel performance (people
can perform well under extreme conditions but
do not like/enjoy)
• No user characteristic differences (age, sex)
• Give users control but select appropriate
defaults
• Caveat: controlled, independent focus on FF,
likely a lower bound on performance
Gary Marchionini, UNC-CH LIDA 2005
Speed Effects on Performance
Visual gist at 32
is better than at
other speeds
12
11
Object recognition (g) at 32
and 64 is better than at 256
10
9
Mean score
8
7
6
Gist comp (ft)
Action recognition at 32 is
better than at 128 or 256.
Visual gist
Object rec (g)
5
Action rec
4
3
2
Gist comprehension (ft)
at 32 and 64 is better
than at 128 and 256
1
0
32
64
128
Surrogate speed
Gary Marchionini, UNC-CH LIDA 2005
256
Narrativity Study
• CHI walk up kiosk, 20 people used
• 20 one-minute clips ( half b&w, no
audio) selected on 2 criteria: contain
characters, have cause/effect
relations between scenes (5 in each
category)
• SRD on chars, cause, and interaction
Gary Marchionini, UNC-CH LIDA 2005
Shared Views and History Views
Studies
• Evaluate AV Design Framework by
instantiating and evaluating a design
• Shared (based on recommendations) and
History Views (based on logs)
• Phase 1: compare OV to Views interface
(28 participants). OV>accuracy; NSRD on
time, but learning effect;
AV>navigation/efficiency; AV>satisfaction
• Phase 2: qualitative analysis of shared and
history views
Gary Marchionini, UNC-CH LIDA 2005
Poster Frame Study
• Research Questions:
– Given both textual and visual metadata;
which surrogate will be utilized, which
surrogate will be preferred?
– Does the placement of the surrogates
affect how they are used?
– Does the assigned task affect how
surrogates are used?
– Does personal preference play a role
in how surrogates are used?
Gary Marchionini, UNC-CH LIDA 2005
Study Methods / Procedures
• 12 undergraduate students (paid volunteers)
• Pre-Study questionnaire
– Demographics
– Visual vs. Verbal learning style (VVQ)
• 10 search problems
– Counter-balanced
• Design 1 and 2
– 1 : text on left / visuals on right
– 2 : visuals on left / text on right
• Eyetracking
• Post-study questionnaire
– Follow up questions
Gary Marchionini, UNC-CH LIDA 2005
Results
• All participants over all tasks:
– Mean time looking at text = 29.7 sec.
– Mean time looking at pics = 6.8 sec.
– 75% of fixations over text
– 18% of fixations over pics
– First fixations over text = 65
– First fixations over pics = 54
• Text requires and gets more user attention
Gary Marchionini, UNC-CH LIDA 2005
Results cont’d
• Design 1 vs. Design 2
– When text was placed on the left, mean time
per fixation was slightly higher
• VVQ
– Balanced group spent more time looking at
text
• Tasks
– Varied by task:
• Time spent looking at text
• Time spent per fixation over text
• Frequency of fixations over text
Gary Marchionini, UNC-CH LIDA 2005
Screen Shots
Gary Marchionini, UNC-CH LIDA 2005
Screen Shots
Gary Marchionini, UNC-CH LIDA 2005
Screen Shots
Gary Marchionini, UNC-CH LIDA 2005
Tasks
• Please find a video that discusses the destruction
earthquakes can do to buildings. These search
results are from a search on the word
“Earthquake”.
• Please find a video that discusses nurses and
their contributions to the United States Army.
These search results are from a search on the
word “Work”.
• Please choose a video from the following list that
you think would be entertaining for you and your
friends to watch.
Gary Marchionini, UNC-CH LIDA 2005
Discussion
• In this restricted situation (i.e. preformulated results page) participants
used text as the main anchor point
– ? Because text is a better surrogate?
– ? Because text contains more
information?
– ? Because text is more familiar to
people
– ? Because tasks directed users to text?
Gary Marchionini, UNC-CH LIDA 2005
Discussion cont’d
• Layout seemed to have little effect on how
surrogates were used.
– Difference of .03 of a second
– Participants didn’t report a significant
preference for layout
• Some liked design 1 and some liked design 2
• VVQ
– Hypothesis that visual learners would use
visual surrogates and verbal learners would
use verbal surrogates was not supported
Gary Marchionini, UNC-CH LIDA 2005
Discussion cont’d
• Tasks
– Some tasks took more time to complete
• Regardless of:
– Counterbalancing order
– Participant
– Layout design
Gary Marchionini, UNC-CH LIDA 2005
Text or Pictures?
• Text was reported as:
+ Being the search anchor
+ Containing significant topical information
– Taking longer to read than pictures
• Visuals were reported as:
Being globally liked
Being used to quickly narrow down choices
Taking less time to decode than text
All participants said the results page would be
weaker without them
– Often lacking in reference points
+
+
+
+
Gary Marchionini, UNC-CH LIDA 2005
Conclusion
• Visual metadata was used to make
(confirm???) relevance judgments
• Combination of visual & verbal
stronger than one or the other
• Generalize with caution:
– Small number of study participants
– Specific set of search results pages
– Ten specific search tasks.
Gary Marchionini, UNC-CH LIDA 2005
The Integration Study
• Compare old OV to redesign?
Compare to Internet archive?
• How do multiple surrogates and agile
control mechanisms affect
understanding of video?
• Accuracy? Time? Satisfaction?
Cognitive load? Navigational
overhead?
• Data analysis underway
Gary Marchionini, UNC-CH LIDA 2005
Relevance Study (Yang)
• 3 task groups (illustration [10 profs],
collection building [8 video librarians],
video production [8 producers/editors])
• In-depth interviews
• Text, audiovisual, implicit categories of 39
different criteria
– Topicality most often mentioned, but far less
than text studies
– Production groups less varied, more
audiovisual criteria
Gary Marchionini, UNC-CH LIDA 2005
Theory-Practice Lessons from OV
• User-centered design and user
testing pays off, i.e. research informs
practice
• Production system operation raises
new kinds of research questions
– Sustainability models
– Curatorial models
– Preservation challenges
– Upgrade paths for universal access
Gary Marchionini, UNC-CH LIDA 2005
DL Research Directions
• Incorporating people into DLs
(patrons, librarians)
• Leveraging contributions and
implications for curatorship
• Preservation strategies; how much
context?
• Hybrid physical-digital library
operations
Gary Marchionini, UNC-CH LIDA 2005
Observations
• A moebius strip is infinite: the
interplay between theory and
practice goes on
• Need for collaboration between
working libraries and researchers
Gary Marchionini, UNC-CH LIDA 2005
Selected Open Video Readings
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
Yang, M. & Marchionini, G. (2005). “Deciphering visual gist and its implications for video retrieval and interface
design.” Conference on Human Factors in Computing Systems (CHI). Portland, OR. Apr. 2-7, 2005.
Yang, M. & Marchionini, G. (2004). “Exploring Users' Video Relevance Criteria -- A Pilot Study.” Proceedings of the
Annual Meeting of the American Society of Information Science and Technology, pp. 229-238. Nov. 12-17, 2004.
Providence, RI.
Yang, M., Wildemuth, B., & Marchionini, G. (2004). “The relative effectiveness of concept-based versus content-based
video retrieval.” Proceedings of the ACM Multimedia conference, pp. 368-371.
Mu, X., & Marchionini, G. (2003). “ Enriched video semantic metadata: authorization, integration, and presentation.”
Proceedings of the Annual Meeting of the American Society for Information Science and Technology, 40, 316-322.
Wilkens, T., Hughes, A., Wildemuth, B. M., & Marchionini, G. (2003). “ The role of narrative in understanding digital
video: an exploratory analysis.” Proceedings of the Annual Meeting of the American Society for Information Science,
40, 323-329.
Hughes, A., Wilkens, T., Wildemuth, B., Marchionini, G. (2003). “Text or Pictures? An Eyetracking Study of How People
View Digital Video Surrogates.” Proceedings of CIVR 2003, pp. 271-280.
Wildemuth, B. M., Marchionini, G., Yang, M., Geisler, G., Wilkens, T., Hughes, A., and Gruss, R. (2003). “How Fast Is
Too Fast? Evaluating Fast Forward Surrogates for Digital Video.” Proceedings of the 3rd ACM/IEEE-CS Joint Conference
on Digital Libraries (JCDL 2003), pp. 221-230. (Vannevar Bush Award Winner for Best Paper at JCDL 2003)
Mu, X., Marchionini, G., & Pattee, A. (2003). “ The Interactive Shared Educational Environment: User interface, system
architecture and field study.” Proceedings of the Annual Meeting of the American Society for Information Science and
Technology, 40, 291-300.
Mu, X., Marchionini, G. (2003) “Statistical Visual Features Indexes in Video Retrieval.” Proceedings of SIGIR 2003, pp.
395-396.
Marchionini, Gary (2003). “Video and Learning Redux: New Capabilities for Practical Use.” Educational Technology.
Marchionini, Gary and Geisler, Gary. (2002). “The Open Video Digital Library.” D-Lib Magazine, Vol. 8, Number 12,
December.
Barbara M. Wildemuth, Gary Marchionini, Todd Wilkens, Meng Yang, Gary Geisler, Beth Fowler, Anthony Hughes, and
Xiangming Mu (2002). “Alternative Surrogates for Video Objects in a Digital Library: Users� Perspectives on Their
Relative Usability.” Proceedings of the 6th European Conference on Digital Libraries, September 16 - 18, 2002, Rome,
Italy.
Geisler, G., Marchionini, G., Wildemuth, B. M., Hughes, A., Yang, M., Wilkens, T., and Spinks, R. (2002). “Video
Browsing Interfaces for the Open Video Project.” Proceedings of CHI 2002, Extended Abstracts.
Nelson, Michael L., Marchionini, Gary, Geisler, Gary, and Yang, Meng (2001). "A Bucket Architecture for the Open Video
Project [short paper]." JCDL ’01, ACM - IEEE Joint Conference on Digital Libraries (June 24-28, 2001, Roanoke,
Virginia).
Geisler, Gary, and Gary Marchionini (2000). "The Open Video Project: A Research-Oriented Digital Video Repository
[short paper]." In Digital Libraries '00: The Fifth ACM Conference on Digital Libraries (June 2-7 2000, San Antonio, TX).
New York: Association for Computing Machinery, 258-259.
Slaughter, L., Marchionini, G. and Geisler, G. (2000). "Open Video: A Framework for a Test Collection." Journal of
Network and Computer Applications, Vol. 23(3). San Diego: Academic Press.
Download