VKB - Center for the Study of Digital Libraries

Designing Systems to Support Document Triage
Frank Shipman
Center for the Study of Digital Libraries
Texas A&M University
Outline
• What is Document Triage?
• Spatial Hypertext and VKB
• VKB and Document Triage
• Effects of Display Configuration
• Recognizing User Interest / Document Value
• Current Directions
Document Triage
• The practice of quickly determining the merit and disposition of relevant documents in a collection of documents
• Common aspects
– Selection from an information repository via querying and browsing interfaces
– Extensive, hyper-extensive, and intensive reading and viewing in reading interfaces
– Collection and interpretation of resources in an organization interface
– Mode switching / attention shifting
Information Work
• Variety of information tasks
– Short-term: facts and references
• What is the escape velocity?
– Long-term: analysis and synthesis
• How to design a spacecraft?
• For longer-term information activities, the work really begins after potentially relevant materials are located.
Information Life-Cycle
[Diagram: a cycle of three activities, with annotations on the transitions between them.]
• Location: searching and browsing
– Located resources must be understood to be evaluated
• Comprehension: skimming and reading
– Understanding one document may require other documents or result in further information requests
• Modification (annotation & authoring): reading results in annotation, note taking, and writing
– Added content influences further access
Modified from work on software libraries [Fischer, Henninger, Redmiles 1991]
Document Triage
• We want to look at situations where people are reading more than one document at once
• Document triage places different demands on attention than single-document reading activities
• Continuum of types of reading: working in overview (metadata), reading at various levels of depth (skimming), reading intensively
Library Table as Success Model
• How people make use of library resources can suggest design goals.
• Characteristics of the library table:
– Integration and easy differentiation of source materials and personal interpretation
– Implicit and explicit expression via spatial layout and attached annotation
– Patrons can collaborate, using the materials on the table as a prop for their conversation
• Limitations: the library table and its resources are shared, limited resources, so they must be cleaned up after each work session.
Spatial Hypertext and Document Triage
• Spatial hypertext: inter-document relationships are expressed via visual and spatial cues rather than links.
• An earlier study compared the use of two variations of the VIKI spatial hypertext system with paper [Marshall, Shipman 1997]
• Results showed that
– people use the affordances of the medium they are given
– those working with paper read more
– those working with VIKI organized more
Visual Knowledge Builder (VKB)
• VKB is a second-generation spatial hypertext system
– greater support for collaborative and long-term tasks
– navigable history
– explicit (as well as implicit) links
• VKB provides:
– A hierarchy of two-dimensional workspaces, called collections, for placing information
– Easy manipulation of the visual properties of information
– Information objects pointing to external content
– Attribute/value pairs for attaching metadata
– Integrated search for Google and NSDL
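The list above describes VKB's information model in prose. The Python sketch below shows one way such a model could be represented: nested collections holding positioned objects that carry visual properties, an optional pointer to external content, and attribute/value metadata. All class and field names (`Collection`, `InfoObject`, `external_ref`, and so on) are hypothetical; this is an illustration, not VKB's actual implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InfoObject:
    """Hypothetical information object placed in a collection."""
    title: str
    x: float  # position in the two-dimensional workspace
    y: float
    visual: dict[str, str] = field(default_factory=dict)      # e.g. {"color": "blue"}
    external_ref: Optional[str] = None                        # e.g. a URL or file path
    attributes: dict[str, str] = field(default_factory=dict)  # metadata as attribute/value pairs


@dataclass
class Collection:
    """Hypothetical two-dimensional workspace that may contain nested collections."""
    name: str
    objects: list[InfoObject] = field(default_factory=list)
    subcollections: list[Collection] = field(default_factory=list)

    def depth(self) -> int:
        """Levels of nesting, counting this collection as level 1."""
        return 1 + max((c.depth() for c in self.subcollections), default=0)

    def count_objects(self) -> int:
        """Objects in this collection and in all nested collections."""
        return len(self.objects) + sum(c.count_objects() for c in self.subcollections)


# Usage: a workspace with one nested category holding one search result
games = Collection("Games", objects=[
    InfoObject("Mancala", 10, 20, visual={"color": "blue"},
               external_ref="https://example.org/mancala",
               attributes={"source": "NSDL"}),
])
root = Collection("Ethnomathematics", subcollections=[games])
print(root.depth(), root.count_objects())  # 2 1
```

How a "level" is counted here is a choice made for the sketch; the slides do not say exactly how levels of collections were counted in the study.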
Personal Collection Creation and Use
• Getting content into VKB
– Embedded search for NSDL and Google
– Drag-and-drop of file system folders
– Metadata peeling for files, JPEG images, MP3s, and search results
• Comprehension and modification of content
– Metadata visualization of NSDL search results
– Metadata extraction and applicators
– Mouse-based browsing of content (including MP3 collections)
Metadata Extraction and Application
Goal: to allow easy and consistent metadata authoring.
• Select objects as the source for extracting metadata attributes and values
• The menubar of applicators is updated to allow attaching the same metadata to other objects.
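A minimal Python sketch of the extract-then-apply workflow described above, assuming each object stores its metadata as a dictionary and that an applicator is simply one attribute/value pair. `Obj`, `Applicator`, and `extract_applicators` are hypothetical names, not VKB's API.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    """Hypothetical object carrying attribute/value metadata."""
    name: str
    attributes: dict[str, str] = field(default_factory=dict)


@dataclass(frozen=True)
class Applicator:
    """Hypothetical applicator: one attribute/value pair that can be attached to objects."""
    attribute: str
    value: str

    def apply(self, target: Obj) -> None:
        """Attach this pair to the target, overwriting any existing value."""
        target.attributes[self.attribute] = self.value


def extract_applicators(selected: list[Obj]) -> list[Applicator]:
    """Collect the distinct attribute/value pairs found on the selected source objects."""
    seen: dict[tuple[str, str], Applicator] = {}
    for obj in selected:
        for attr, val in obj.attributes.items():
            seen.setdefault((attr, val), Applicator(attr, val))
    return list(seen.values())  # the "menubar", in first-seen order


# Usage: extract from an NSDL result, then apply one pair to a Google result
source = Obj("NSDL result", {"source": "NSDL", "grade": "6-8"})
target = Obj("Google result")
menubar = extract_applicators([source])
grade = next(a for a in menubar if a.attribute == "grade")
grade.apply(target)
print(target.attributes)  # {'grade': '6-8'}
```

Extracting from the source once and reusing the same applicator is what keeps the authored metadata consistent: every target receives an identical attribute name and value.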
Metadata Profiles
• Metadata applicators can be saved in profiles
– Profiles are stored in the VKB datafile and in the user’s VKB settings for reuse.
– Profiles are easily swapped out.
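One way to persist and swap applicator profiles, sketched in Python under the assumption that a profile is a named list of attribute/value pairs serialized as JSON. The slide does not describe VKB's actual storage format for its datafile or settings; the function names and file names here are hypothetical.

```python
import json
import tempfile
from pathlib import Path

Profile = tuple[str, list[tuple[str, str]]]  # (profile name, [(attribute, value), ...])


def save_profile(path: Path, profile: Profile) -> None:
    """Write a profile to disk as JSON."""
    name, pairs = profile
    data = {"name": name, "applicators": [{"attribute": a, "value": v} for a, v in pairs]}
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")


def load_profile(path: Path) -> Profile:
    """Read a profile back; loading a different file swaps the active applicators."""
    data = json.loads(path.read_text(encoding="utf-8"))
    return data["name"], [(d["attribute"], d["value"]) for d in data["applicators"]]


# Usage: save two profiles, then swap between them
with tempfile.TemporaryDirectory() as tmp:
    k12 = Path(tmp) / "k12-math.json"
    media = Path(tmp) / "media.json"
    save_profile(k12, ("K-12 math", [("grade", "6-8"), ("subject", "mathematics")]))
    save_profile(media, ("Media", [("format", "mp3")]))
    active = load_profile(k12)
    first = active
    active = load_profile(media)  # swapping profiles replaces the active set
    print(active)  # ('Media', [('format', 'mp3')])
```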
Study of VKB Use for Selecting and Organizing Materials
• The study was designed to understand how spatial hypertext would change work practices when accessing a digital library.
• We decided to look at document triage
– deciding what to keep
– expressing an initial view of relationships
Study Setup
• Task: 16 subjects were placed in the role of a reference librarian, selecting and organizing information on ethnomathematics for a teacher
• Setting: top 20 search results from NSDL and top 20 search results from Google
• The 16 subjects were divided into two groups of 8:

| Activity | VKB group (VKB/IE) | Control group (IE/Editor) |
|---|---|---|
| Search* | VKB | IE |
| Reading | IE | IE |
| Organization | VKB | Editor |

* Initial search done by us
• Subjects were given as much time as they deemed necessary (after training, for VKB users)
Results
• Data collected
– Demographic information
– Questionnaire about experience
– Videos of screen activity
– VKB files (with history) for VKB users
• Analysis of activity
– All subjects organized links into labeled categories.
Perception of Activity and Results
VKB group:
– felt more able to organize the content as desired
– felt that their organizations would be more understandable to others
(Neither difference is significant at p < 0.05.)

| Statement | VKB/IE | IE/Editor | p |
|---|---|---|---|
| I was able to organize everything as I wanted | 3.63 | 2.63 | 0.064 |
| Easy for someone to understand my organization | 4.13 | 3.25 | 0.132 |

Five-point Likert scale where 1 is “strongly disagree” and 5 is “strongly agree”
Time, Selection, Organization
• Little difference in time spent on task
• VKB participants
– kept more links
– created deeper organizations of categories

| Measure | VKB/IE | IE/Editor | p |
|---|---|---|---|
| Time spent on the task (minutes) | 52.88 | 43.00 | 0.315 |
| Number of links kept | 34.63 | 18.38 | 0.003 |
| Number of links kept from NSDL | 17.13 | 8.13 | 0.002 |
| Number of links kept from Google | 17.50 | 10.25 | 0.015 |
| Number of collections | 9.63 | 5.00 | 0.062 |
| Number of top-level collections | 4.75 | 4.00 | 0.506 |
| Number of levels of collections | 2.00 | 1.38 | 0.032 |
IE/Editor Authoring Activities
IE/Editor participants were more likely to:
– add comments about a resource
– select parts of a resource for the teacher
– visit links in order

| Percentage of subjects in group that... | VKB/IE | IE/Editor | p |
|---|---|---|---|
| added personal comments | 0.00 | 37.50 | 0.080 |
| copied and pasted text from the Web | 12.50 | 50.00 | 0.124 |
| processed links in the order presented | 12.50 | 62.50 | 0.043 |
| changed links or added new ones | 25.00 | 50.00 | 0.335 |
Discussion & Caveats
• Initial metadata visualization seems to lead users to avoid changing visual semantics
– Compared with the 1997 study of VIKI use, VKB users did not express interpretation via color
• Some effects may be due to experience
– IE/Editor participants were using their normal tools, whereas VKB users were novices.
– Training did not show how to drag and drop portions of Web pages into the VKB space.
• The study suggests value in spatial hypertext for collecting and organizing information resources
Attention Switching in Document Triage
• Document triage recap:
– different demands on attention than single-document reading activities
– people are reading more than one document at once
– people switch between reading and organizing
– transitions generate potential for breakdown
• Question: Can a dedicated reading surface make a difference in how people engage with content during triage?
A second look at the earlier data…
Overview in VKB, content displayed in IE, and transitions between the two

| Subject ID | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Total time | 1:04:08 | 0:54:14 | 0:21:59 | 0:22:48 | 1:33:28 | 1:20:09 | 1:03:48 | 1:01:43 |
| Number of transitions | 134 | 28 | 78 | 81 | 98 | 106 | 87 | 90 |

Summary

| | VKB | IE |
|---|---|---|
| Total time (seconds) | 18,874 | 7,596 |
| % of total time (in app) | 71% | 29% |
| Average time looking at window (seconds) | 47 | 20 |

• Average time on the task (minutes): 58
• Average number of transitions between applications: 88
• More than 2/3 of the time is spent organizing references
• Time spent reading unfamiliar material is very brief
• Window management is extremely time-consuming!
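The summary figures can be cross-checked against the per-subject rows. The Python sketch below recomputes the average time on task, the average number of transitions, and VKB's share of in-application time from the values in the tables; the sums do not depend on which subject each value belongs to.

```python
# Values transcribed from the per-subject and summary tables (8 subjects)
total_times = ["1:04:08", "0:54:14", "0:21:59", "0:22:48",
               "1:33:28", "1:20:09", "1:03:48", "1:01:43"]
transitions = [134, 28, 78, 81, 98, 106, 87, 90]
vkb_seconds, ie_seconds = 18_874, 7_596


def to_seconds(hms: str) -> int:
    """Convert an H:MM:SS string to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


avg_task_minutes = sum(to_seconds(t) for t in total_times) / len(total_times) / 60
avg_transitions = sum(transitions) / len(transitions)
vkb_share = vkb_seconds / (vkb_seconds + ie_seconds)

print(round(avg_task_minutes), round(avg_transitions), round(100 * vkb_share))  # 58 88 71
```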
Document Triage: Starting Point
• Given reduced representations of multiple relevant documents (e.g., a list of search results or email headers), people don’t spend much time reading (or even skimming)
– Lots of time is spent managing screen space and windows (opening, closing, reshaping, etc.); might people be trying to minimize that?
• When people are overwhelmed in this way, they tend to work from metadata instead of content, manipulating and organizing it
– Think about how we handle our email (especially spam, but other messages as well)
– Think about how we decide to follow a link from a list of search results (perhaps using poor or deceptive metadata)
• We’d like to give people a chance to read more, focus their attention, and spend less time managing windows: what happens if we give readers a dedicated reading surface such as a tablet computer?
Initial information triage study setup to answer our research question
• Same task as the earlier study: subjects act as reference librarians, sifting through ethnomathematics material from the NSDL (National Science Digital Library) and Google
• Envisioned technology scenario: a tetherless tablet with an extended display for overview and organizing
• Associated technology prototyping: to develop infrastructure for the study and to investigate heuristic techniques for assessing interest through action
Infrastructure Development
• Our envisioned technology scenario (tetherless tablet with extended display for overview and organizing) didn’t work
– Controlling windows on a second screen using a pen is not easy.
– We pushed infrastructure development to create a case close enough to the envisioned scenario
– An extended desktop was sufficient for the two cases that didn’t use a tablet computer
• Infrastructure: screen control of windows displayed on different computers
– Push and pull selected windows between different computers
• Logging and instrumentation: capturing events of interest
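The slides do not say how events were logged. As a hedged illustration, the Python sketch below records events as JSON lines and counts shifts of focus between applications, the kind of measure reported as transitions in these studies. The `Event` fields and the `"focus"` event kind are assumptions, and the study's own definition of a transition may differ.

```python
import io
import json
from dataclasses import asdict, dataclass
from typing import TextIO


@dataclass
class Event:
    """Hypothetical logged interaction event."""
    app: str          # application that produced the event, e.g. "IE" or "VKB"
    kind: str         # event type, e.g. "focus", "scroll", "click"
    document: str     # identifier of the document involved, e.g. a URL
    timestamp: float  # seconds since the start of the session


def log_event(stream: TextIO, event: Event) -> None:
    """Append one event to the log as a single JSON line."""
    stream.write(json.dumps(asdict(event)) + "\n")


def count_transitions(events: list[Event]) -> int:
    """Count focus shifts from one application to a different one."""
    focused = [e.app for e in events if e.kind == "focus"]
    return sum(1 for prev, cur in zip(focused, focused[1:]) if prev != cur)


# Usage: overview in VKB, a document read in IE, then back to VKB
events = [
    Event("VKB", "focus", "overview", 0.0),
    Event("IE", "focus", "https://example.org/doc1", 5.0),
    Event("IE", "scroll", "https://example.org/doc1", 7.5),
    Event("VKB", "focus", "overview", 30.0),
]
log = io.StringIO()
for e in events:
    log_event(log, e)
print(count_transitions(events))  # 2
```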
Study Configurations

| Display configuration | Input devices | Assignment of activity |
|---|---|---|
| Laptop and tabletop LCD display | Extended desktop controlled via keyboard and mouse | User controls which windows are on which display |
| Laptop and projected display | Extended desktop controlled via keyboard and mouse | User controls which windows are on which display |
| Tablet computer and projected display | Projected display controlled via keyboard and mouse; tablet computer controlled via pen | Software assigns document overview to projected display and IE to tablet |
Data Sources
Wide variety of data captured:
• video capture of the environment (subject doing the task)
• continuous screen capture of both displays
• demographic profile of participants
• interviews and questionnaires about task, technology, and resources, including identifying the 5 most useful and 5 least useful documents
• activity logs (for IE and for VKB)
A First Look at the New Data

| Study | Configuration | # of displays | Avg. total time (s) | Avg. time in VKB (s) | Avg. time in IE (s) | Avg. # of transitions (shifts of focus) |
|---|---|---|---|---|---|---|
| Prior study | Desktop PC | 1 | 3,309 | 2,359 | 950 | 97 |
| Current study | Laptop & LCD screen | 2 | 3,554 | 2,453 | 1,102 | 193 |
| Current study | Laptop & projected display | 2 | 3,642 | 2,627 | 1,015 | 168 |
| Current study | Tablet PC & projected display | 2 | 4,234 | 3,005 | 1,229 | 205 |
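A quick Python check of the table: VKB and IE times should add up to the total (up to rounding), and dividing each configuration's transitions by the one-display figure shows how much more often people shifted focus with two displays.

```python
# Averages transcribed from the table (times in seconds):
# (total, VKB, IE, transitions)
rows = {
    "Desktop PC":                    (3309, 2359, 950, 97),
    "Laptop & LCD screen":           (3554, 2453, 1102, 193),
    "Laptop & projected display":    (3642, 2627, 1015, 168),
    "Tablet PC & projected display": (4234, 3005, 1229, 205),
}
baseline = rows["Desktop PC"][3]  # transitions with one display (prior study)

for name, (total, vkb, ie, transitions) in rows.items():
    residue = total - (vkb + ie)      # rounding residue between the columns
    ratio = transitions / baseline    # transitions relative to one display
    print(f"{name}: total - (VKB + IE) = {residue:+d} s, transitions x{ratio:.2f}")
```

With these values the ratios are about 1.99, 1.73, and 2.11 for the three two-display configurations.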
Time Spent in IE (glancing, skimming, reading)
[Figure: histogram of the occurrence (percent) of time spent in IE (seconds, 0 to 60), for the one-display and two-display conditions.]
Tendency: in the two-display condition, there is a greater number of brief encounters; might these represent more glances, more checking, more revisits?
Questionnaires & Interviews
• Subjects with a laptop and extra screen felt the most comfortable of the multiple-display configurations
• The tablet computer was rated lowest in all questions concerning ease of use or enjoyment
• Preference for multiple displays at the same focal length
• Subjects found the size of the projected display and the pen interactions annoying
Did Reading Actions Correlate with Document Preferences?
• The top five and bottom five documents were identified
• Log files recorded user actions in IE
• A number of user actions were significantly correlated with document preference
User Actions vs. User Interests

| User action in IE | Pearson coefficient | p |
|---|---|---|
| (1) Time spent | 0.532 | 0.001 |
| (2) # of embedded links followed | -0.334* | 0.040 |
| (3) # of visits | 0.480 | 0.003 |
| (4) Text selections | 0.331 | 0.049 |
| (5) Clicks | 0.354 | 0.034 |
| (6) Scrolls | 0.632 | < 0.0001 |
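The coefficients above are Pearson correlations between a logged action and document preference. The slide does not say how preference was coded; the Python sketch below assumes top-five documents are coded 1 and bottom-five documents 0 (which makes r a point-biserial correlation). The scroll counts are invented for illustration only.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples.

    Raises ZeroDivisionError if either sample is constant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 each")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Invented data: scrolls logged per document; 1 = rated in top five, 0 = bottom five
scrolls = [42, 35, 50, 28, 31, 5, 12, 3, 9, 20]
preferred = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
r = pearson(scrolls, preferred)
print(f"r = {r:.3f}")
```

On Python 3.10 and later, `statistics.correlation` computes the same coefficient; the explicit version is shown to make the formula visible.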
Discussion
• Caveat: this data comes from a single domain and document set
– Need to explore other document sets
– Need to investigate the effect of domain and subject-matter expertise on task performance
• These results reflect activity in the reading interface only, not the organizing interface.
Display Configuration Summary
• The number of transitions between applications almost doubled in the multiple-display configurations
• No significant difference was found between display configurations, although subjects did express a preference for two side-by-side displays
• Scrolling, time spent on a document, and number of visits to a document were the actions most strongly correlated with document preference
User Interest and Document Value from Reading and Organizing Activities
• Recognizing user interest &
document value
• Representing user interest
• Recognizing documents of potential
interest
• Visualizing interest information
User Interest
• Explicit interest indicators
– Precise, easy to implement
– Cause distraction and cognitive load; yield fewer results
• Implicit interest indicators
– Reading activity
– Annotation activity
Motivation
• People spend much time on documents that they ultimately judge not useful
• Understanding user interest in documents could be the basis for actively supporting document triage
• User activity in reading & organizing implicitly reflects user interest in documents
Motivation (cont.)
• We observed significant differences between individual styles in reading & organizing
• We observed no dominant factor determining user interest
• Combining partially identified user interests from multiple applications could recognize user interests more accurately
Interest Profile Manager (1)
• Infrastructure for sharing information about user activity between multiple applications
• Estimation of user interests from user activity
[Diagram: several Reading Applications, an Organizing Application, and a Location/Overview Application connected to the Interest Profile Manager, with its User Interest Estimation Engine and Interest Profile.]
Interest Profile Manager (2)
• Flexible information structure to handle varied user activity from multiple applications
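A hedged Python sketch of the "flexible information structure" idea: activity is accumulated per document and keyed by (application, event kind), so a new application or event type needs no schema change. `InterestProfileManager`, `ActivityEvent`, and the event kinds are hypothetical; the slides do not describe the real interface.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ActivityEvent:
    """Hypothetical activity report sent by a reading or organizing application."""
    app: str            # e.g. "reading" or "organizing"
    document: str       # shared document identifier, e.g. a URL
    kind: str           # e.g. "scroll", "reading_time", "move_symbol"
    value: float = 1.0  # count or duration contributed by this event


class InterestProfileManager:
    """Hypothetical hub that merges activity from several applications per document."""

    def __init__(self) -> None:
        # document -> (app, kind) -> accumulated value
        self._profiles = defaultdict(lambda: defaultdict(float))

    def report(self, event: ActivityEvent) -> None:
        """Add one event's value to the document's profile."""
        self._profiles[event.document][(event.app, event.kind)] += event.value

    def profile(self, document: str) -> dict[tuple[str, str], float]:
        """Snapshot of everything recorded for a document (empty if unseen)."""
        return dict(self._profiles.get(document, {}))


# Usage: two applications report on the same document
ipm = InterestProfileManager()
ipm.report(ActivityEvent("reading", "doc1", "scroll"))
ipm.report(ActivityEvent("reading", "doc1", "scroll"))
ipm.report(ActivityEvent("reading", "doc1", "reading_time", 12.5))
ipm.report(ActivityEvent("organizing", "doc1", "move_symbol"))
print(ipm.profile("doc1"))
```

An estimation engine could then read these per-document profiles as model inputs; how the actual engine consumes them is not specified on the slides.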
System Architecture
• Interest Profile Manager
• VKB
• Instrumented Internet Explorer
Data Sources
• Document (Web page) attributes (3): number of characters, number of links, …
• User events from reading activity (10): reading time, scrolls, clicks, …
• User events from organizing activity (14): moving symbol, resizing symbol, …
User Model (1)
• User activity vs. user interests
• Statistical & qualitative approach
Example of Simple Model
• Predict a general pattern of user interests on documents
[Figure: user interest (0 to 0.6) plotted against time (seconds, 0 to 16) for reading time and scrolling time.]
User Model (2)
• Statistical model 1 (reading activity)
0.877 + 0.133 * factor1 + 0.120 * factor2
• Statistical model 2 (organizing activity)
0.877 + 0.185 * factor1 - 0.092 * factor2
• Statistical model (combined)
0.877 + 0.125 * factor1 + 0.152 * factor2 + 0.0662 * factor3 + 0.0653 * factor4
* Each model uses its own set of factors
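The three statistical models are linear in their factors, so applying one is a weighted sum. The Python sketch below transcribes the coefficients from the slide; the factor scores passed in are hypothetical, since each model has its own factors and their definitions are not given here.

```python
# Coefficients transcribed from the slide: (intercept, [weight per factor])
MODELS = {
    "reading":    (0.877, [0.133, 0.120]),
    "organizing": (0.877, [0.185, -0.092]),
    "combined":   (0.877, [0.125, 0.152, 0.0662, 0.0653]),
}


def predict(model: str, factors: list[float]) -> float:
    """Estimated user interest: intercept plus the weighted sum of factor scores."""
    intercept, weights = MODELS[model]
    if len(factors) != len(weights):
        raise ValueError(f"the {model} model expects {len(weights)} factor scores")
    return intercept + sum(w * f for w, f in zip(weights, factors))


# Hypothetical factor scores for one document
score = predict("combined", [1.0, 0.5, 0.0, -1.0])
print(round(score, 4))  # 1.0127
```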
User Model (3)
• Factors for the combined activity model
User Model (4)

| Model | R | R Square | Adjusted R Square |
|---|---|---|---|
| Reading | 0.690 | 0.477 | 0.444 |
| Organizing | 0.797 | 0.636 | 0.613 |
| Combined | 0.841 | 0.708 | 0.669 |

• Results when the three models are applied to the same data set
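The adjusted values follow from R Square through the standard formula adj = 1 - (1 - R^2)(n - 1)/(n - p - 1), where p is the number of predictors (2, 2, and 4 factors in the three models). The slide does not state the sample size n; the value n = 35 below is an assumption, chosen because it reproduces all three adjusted values in the table.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for a linear model with n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# (R, R^2, adjusted R^2, number of factors) transcribed from the table
table = {
    "reading":    (0.690, 0.477, 0.444, 2),
    "organizing": (0.797, 0.636, 0.613, 2),
    "combined":   (0.841, 0.708, 0.669, 4),
}
N = 35  # ASSUMPTION: sample size is not given on the slide

for name, (r, r2, adj, p) in table.items():
    print(f"{name}: R*R = {r * r:.3f} (table {r2}), "
          f"adjusted = {adjusted_r2(r2, N, p):.3f} (table {adj})")
```

R*R differs from the tabulated R Square by less than 0.001 in each row, consistent with R having been rounded to three places.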
User Model (5)
• Qualitative model (14)
Comparison of Models

| Model | Average error | Standard deviation |
|---|---|---|
| Reading | 0.258 | 0.192 |
| Organizing | 0.216 | 0.146 |
| Combined | 0.176 | 0.138 |
| Qualitative | 0.197 | 0.134 |

• Results when the four models are applied to a different data set
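The slide does not define "average error". Assuming it is the mean absolute difference between predicted and reported interest, with the standard deviation taken over those absolute errors, the comparison could be computed as in this Python sketch; the predictions and ratings below are made up, and whether the study used a population or sample standard deviation is also an assumption.

```python
from statistics import mean, pstdev


def error_summary(predicted: list[float], actual: list[float]) -> tuple[float, float]:
    """Mean and population standard deviation of absolute prediction errors."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need non-empty lists of equal length")
    errors = [abs(p - a) for p, a in zip(predicted, actual)]
    return mean(errors), pstdev(errors)


# Hypothetical model predictions vs. a user's reported interest (0 to 1)
predicted = [0.9, 0.4, 0.7, 0.2]
reported = [1.0, 0.0, 0.5, 0.0]
avg_error, sd = error_summary(predicted, reported)
print(f"average error {avg_error:.3f}, SD {sd:.3f}")
```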
Results
• User activity in reading & organizing often corresponds to user interests and can be the basis for supporting document triage
• The combined activity model had the lowest average error of all the models
• Combining partially identified user interests from multiple applications can be the basis for more accurate estimation of user interests