Designing Systems to Support Document Triage
Frank Shipman
Center for the Study of Digital Libraries
Texas A&M University

Outline
• What is Document Triage?
• Spatial Hypertext and VKB
• VKB and Document Triage
• Effects of Display Configuration
• Recognizing User Interest / Document Value
• Current Directions

Document Triage
• The practice of quickly determining the merit and disposition of relevant documents in a collection of documents
• Common aspects
– Selection from information repository via querying and browsing interfaces
– Extensive, hyper-extensive and intensive reading and viewing in reading interfaces
– Collection and interpretation of resources in organization interface
– Mode switching / attention shifting

Information Work
Variety of information tasks
– Short-term: Facts and references
• What is the escape velocity?
– Long-term: Analysis and synthesis
• How to design a spacecraft?
For longer-term information activities, the work really begins after potentially relevant materials are located.

Information Life-Cycle
• Location: Searching and Browsing
– Located resources must be understood to be evaluated
• Comprehension: Skimming and Reading
– Reading results in annotation, note taking, and writing
– Understanding one document may require other documents or result in further information requests
• Modification: Annotation & Authoring
– Added content influences further access
Modified from work on software libraries: [Fischer, Henninger, Redmiles 1991]

Document Triage
• We want to look at situations where people are reading more than one document at once
• Document triage places different demands on attention than single-document reading activities
• Continuum of types of reading: working in overview (metadata), reading at various levels of depth (skimming), reading intensively

Library Table as Success Model
• How people make use of library resources can give design goals.
• Characteristics of the library table:
– Integration and easy differentiation of source materials and personal interpretation
– Implicit and explicit expression via spatial layout and attached annotation
– Patrons can collaborate using the materials on the table as a prop for their conversation
• Limitations: The library table and resources are shared/limited resources, so must be cleaned up after each work session.

Outline
• What is Document Triage?
• Spatial Hypertext and VKB
• VKB and Document Triage
• Effects of Display Configuration
• Recognizing User Interest / Document Value
• Current Directions

Spatial Hypertext and Document Triage
• Spatial hypertext – where inter-document relationships are expressed via visual and spatial cues rather than links.
• Earlier study compared use of two variations of the VIKI spatial hypertext system with paper [Marshall, Shipman 1997]
• Results showed that
– people use the affordances of the medium provided
– those working with paper read more
– those working with VIKI organized more

Visual Knowledge Builder (VKB)
• VKB is a second generation spatial hypertext
– greater support for collaborative and long-term tasks
– navigable history
– explicit (as well as implicit) links
• VKB provides:
– A hierarchy of two-dimensional workspaces called collections for placing information
– Easy manipulation of visual properties of information
– Information objects pointing to external content
– Attribute/value pairs for attaching metadata
– Integrated search for Google and NSDL

Personal Collection Creation and Use
Getting content in VKB
– Embedded Search for NSDL and Google
– Drag-and-drop file system folders
– Metadata peeling for files, jpg, mp3, search results
Comprehension and modification of content
– Metadata visualization of NSDL search results
– Metadata extraction and applicators
– Mouse-based browsing of content (including mp3 collections)

Metadata Extraction and Application
Goal: to allow easy and consistent metadata authoring.
Select objects as the source for extracting metadata attributes and values.
The menubar of applicators is updated to allow attaching the same metadata to other objects.

Metadata Profiles
Metadata applicators can be saved in profiles
– Profiles are stored in the VKB datafile and in the user’s VKB settings for reuse.
– Profiles are easily swapped out.

Outline
• What is Document Triage?
• Spatial Hypertext and VKB
• VKB and Document Triage
• Effects of Display Configuration
• Recognizing User Interest / Document Value
• Current Directions

Study of VKB Use for Selecting and Organizing Materials
• Study designed to understand how spatial hypertext would change work practices when accessing a digital library.
• Decided to look at document triage
– deciding what to keep
– expressing an initial view of relationships

Study Setup
• Task: 16 subjects placed in the role of a reference librarian, selecting and organizing information on ethnomathematics for a teacher
• Setting: top 20 search results from NSDL & top 20 search results from Google
• 16 subjects were divided into two groups of 8:

Activity       VKB (VKB/IE)   Control (IE/Editor)
Search *       VKB            IE
Reading        IE             IE
Organization   VKB            Editor
* Initial search done by us

• Subjects given as much time as they deemed necessary (after training for VKB users)

Results
Data collected
– Demographic information
– Questionnaire about experience
– Videos of screen activity
– VKB files (with history) for VKB users
Analysis of activity
– All subjects organized links into labeled categories.
Perception of Activity and Results
VKB group:
– felt more able to organize the content as desired
– felt that their organizations would be more understandable to others

Statement                                        VKB/IE   IE/Editor   (p)
I was able to organize everything as I wanted    3.63     2.63        0.064
Easy for someone to understand my organization   4.13     3.25        0.132
Five point Likert scale where 1 is “strongly disagree” and 5 is “strongly agree”

Time, Selection, Organization
Little difference in time spent on task
VKB participants
– kept more links
– created deeper organizations of categories

Measure                             VKB/IE   IE/Editor   (p)
Time spent on the task in minutes   52.88    43.00       0.315
Number of links kept                34.63    18.38       0.003
Number of links kept from NSDL      17.13    8.13        0.002
Number of links kept from Google    17.50    10.25       0.015
Number of collections               9.63     5.00        0.062
Number of top level collections     4.75     4.00        0.506
Number of levels of collections     2.00     1.38        0.032

IE/Editor Authoring Activities
IE/editor participants more likely to:
– Add comments about resource
– Select parts of resource for teacher
– Visit links in order

Percentage of subjects in group that     VKB/IE   IE/Editor   (p)
added personal comments                  0.00     37.50       0.080
copied and pasted text from web          12.50    50.00       0.124
processed links in the order presented   12.50    62.50       0.043
changed links or added new ones          25.00    50.00       0.335

Discussion & Caveats
• Initial metadata visualization seems to cause users to avoid changing visual semantics
– Compared to 1997 study of VIKI use, VKB users did not express interpretation via color
• Some effects may be due to experience
– IE/editor participants were using their normal tools compared with novice VKB users.
– Training did not show how to drag-and-drop portions of Web pages into VKB space.
• Study suggests value in spatial hypertext for collecting and organizing information resources

Outline
• What is Document Triage?
• Spatial Hypertext and VKB
• VKB and Document Triage
• Effects of Display Configuration
• Recognizing User Interest / Document Value
• Current Directions

Attention Switching in Document Triage
• Document Triage Recap:
– different demands on attention than single-document reading activities
– people are reading more than one document at once
– people switch between reading and organizing
– transitions generate potential for breakdown
• Question: Can a dedicated reading surface make a difference in how people engage with content during triage?

A second look at the earlier data…
Overview in VKB / content displayed in IE, and transitions between the two

Subject ID   Total time   Number of transitions
1            1:04:08      134
2            0:54:14      28
3            0:21:59      78
4            0:22:48      81
5            1:33:28      98
6            1:20:09      106
7            1:03:48      87
8            1:01:43      90

Summary                                    VKB      IE
Total time (seconds)                       18,874   7,596
% total time (in app)                      71%      29%
Average time looking at window (seconds)   47       20

Average time on the task (minutes): 58
Average number of transitions between applications: 88
• more than 2/3 of the time is spent organizing references
• time spent reading unfamiliar material is very brief
• window management is extremely time-consuming!

Document Triage: Starting Point
• Given reduced representations of multiple relevant documents (e.g. a list of search results or email headers), people don’t spend much time reading (or even skimming)
– Lots of time is spent managing screen space and windows (opening, closing, reshaping, etc.)
– might people be trying to minimize that?
• When people are overwhelmed in this way, there’s a tendency to work from metadata instead of content, manipulating and organizing it
– Think about how we handle our email (especially spam, but others as well)
– Think about how we decide to follow a link from a list of search results (perhaps using poor or deceptive metadata)
• We’d like to give people a chance to read more, focus their attention, and spend less time managing windows: what happens if we give readers a dedicated reading surface like a tablet computer?

Initial information triage study setup to answer our research question
• Duplicate task – subjects act as reference librarians, sifting through ethnomathematics material from the NSDL (National Science Digital Library) and Google
• Envisioned technology scenario:
• Associated technology prototyping – to develop infrastructure for the study and to investigate heuristic techniques for assessing interest through action

Infrastructure Development
• Our envisioned technology scenario (tetherless tablet with extended display for overview and organizing) didn’t work
– Controlling Windows on a second screen using a pen is not easy.
– Pushed infrastructure development to create a case close enough to the envisioned scenario
– Extended desktop was sufficient for two cases that didn’t use a tablet computer
• Infrastructure: screen control of windows displayed on different computers
– Push and pull selected windows between different computers
• Logging and instrumentation – capturing events of interest

Study Configurations
• Display configuration: Laptop and tabletop LCD display
– Input devices: Extended desktop controlled via keyboard and mouse
– Assignment of activity: User controls which windows are on which display
• Display configuration: Laptop and projected display
– Input devices: Extended desktop controlled via keyboard and mouse
– Assignment of activity: User controls which windows are on which display
• Display configuration: Tablet computer and projected display
– Input devices: Projected display controlled via keyboard and mouse, tablet computer controlled via pen
– Assignment of activity: Software assigns document overview to projected display and IE to tablet

Data Sources
Wide variety of data captured:
• video capture of environment (subject doing the task)
• continuous screen capture of both displays
• demographic profile of participants
• interviews and questionnaires about task, technology, and resources, including identifying the 5 most useful and 5 least useful documents
• activity logs (for IE and for VKB)

A First Look at the New Data

                                          Prior Study   Current Study
Configuration                             Desktop PC    Laptop & LCD Screen   Laptop & Projected Display   Tablet PC & Projected Display
# of Displays                             1             2                     2                            2
Avg. Total Time (seconds)                 3,309         3,554                 3,642                        4,234
Avg. Time (VKB) (seconds)                 2,359         2,453                 2,627                        3,005
Avg. Time (IE) (seconds)                  950           1,102                 1,015                        1,229
Avg. # of Transitions (shifts of focus)   97            193                   168                          205

Time Spent in IE (glancing, skimming, reading)
[Figure: Occurrence of Time Spent in IE. x-axis: Time Spent (seconds), 0 to 60; y-axis: Percent (%), 0 to 16; series: One Display, Two Display]
• tendency: in the 2 display condition, there’s a greater number of brief encounters; might represent more glances, more checking, more revisits?

Questionnaires & Interviews
• Subjects with laptop and extra screen felt most comfortable of the multiple display configurations
• Tablet computer was rated lowest in all questions concerning ease of use or enjoyment
• Preference for multiple displays at the same focal length
• Subjects found size of projected display and pen interactions annoying

Did Reading Actions Correlate with Document Preferences?
• Top five and bottom five documents were identified
• Log files recorded user actions in IE
• A number of user actions were significantly correlated with document preference

User Actions vs. User Interests

User action                       Pearson coefficient   p
(1) time spent                    0.532                 0.001
(2) # of embedded links followed  -0.334*               0.040
(3) # of visits                   0.480                 0.003
(4) text selections               0.331                 0.049
(5) clicks                        0.354                 0.034
(6) scrolls                       0.632                 < 0.0001

Discussion
Caveat: this data comes from a single domain and document set
– Need to explore other document sets
– Need to investigate effect of domain and subject matter expertise on task performance
This is purely activity in the reading interface, not the organizing interface.

Display Configuration Summary
• Number of transitions between applications almost doubled for multiple display configurations
• No significant difference between display configurations, although subjects did express a preference for two side-by-side displays
• Scrolling, time spent on document, and number of visits to document were correlated with document preference

Outline
• What is Document Triage?
• Spatial Hypertext and VKB
• VKB and Document Triage
• Effects of Display Configuration
• Recognizing User Interest / Document Value
• Current Directions

User Interest and Document Value from Reading and Organizing Activities
• Recognizing user interest & document value
• Representing user interest
• Recognizing documents of potential interest
• Visualizing interest information

User Interest
• Explicit interest indicators
– Precise, easy to implement
– Distraction, cognitive load, fewer results
• Implicit interest indicators
– Reading activity
– Annotation activity

Motivation
• People spend much time on documents that they ultimately evaluate as not useful
• Understanding user interest in documents could be the basis for actively supporting document triage
• User activity in reading & organizing implicitly represents user interest in documents

Motivation (cont.)
• Observed significant differences between individual styles in reading & organizing
• Observed no dominant factor to determine user interests
• Combining partially identified user interests from multiple applications could more accurately recognize user interests

Interest Profile Manager (1)
• Infrastructure for sharing information about user activity between multiple applications
• Estimation of user interests from user activity
[Diagram: the Interest Profile Manager (User Interest Estimation Engine, Interest Profile) connected to Reading Applications, an Organizing Application, and a Location/Overview Application]

Interest Profile Manager (2)
• Flexible information structure to handle various user activity from multiple applications

System Architecture
• Interest Profile Manager
• VKB
• Instrumented Internet Explorer

Data Sources
• Document (Web page) attributes (3)
Number of characters, number of links, …
• User events from reading activity (10)
Reading time, scrolls, clicks…
• User events from organizing activity (14)
Moving symbol, resizing symbol, …

User Model (1)
• User activity vs. user interests
• Statistical & qualitative approach
• Predict a general pattern of user interests on documents
[Figure: Example of Simple Model. x-axis: Time (seconds), 0 to 16; y-axis: User interest, 0 to 0.6; series: Reading time, Scrolling time]

User Model (2)
• Statistical model 1 (reading activity)
0.877 + 0.133 * factor1 + 0.120 * factor2
• Statistical model 2 (organizing activity)
0.877 + 0.185 * factor1 – 0.092 * factor2
• Statistical model (combined)
0.877 + 0.125 * factor1 + 0.152 * factor2 + 0.0662 * factor3 + 0.0653 * factor4
* Factors of different models are different from each other

User Model (3)
• Factors for the combined activity model

User Model (4)

Model        R       R Square   Adjusted R Square
Reading      0.690   0.477      0.444
Organizing   0.797   0.636      0.613
Combined     0.841   0.708      0.669
• Result when three models are used with the same data set

User Model (5)
• Qualitative model (14)

Comparison of Models

Model         Average Error   Standard Deviation
Reading       0.258           0.192
Organizing    0.216           0.146
Combined      0.176           0.138
Qualitative   0.197           0.134
• Result when four models are used with a different data set

Results
• User activity in reading & organizing often corresponds to user interests and can be the basis for supporting document triage
• Combined activity model performed better than all the other models (lowest average error, 0.176)
• Combining partially identified user interests from multiple applications can be the basis for more accurate estimation of user interests

Outline
• What is Document Triage?
• Spatial Hypertext and VKB
• VKB and Document Triage
• Effects of Display Configuration
• Recognizing User Interest / Document Value
• Current Directions
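
Sketch: evaluating the statistical models
As a minimal sketch, the three statistical models listed under User Model (2) can be evaluated as plain linear combinations of factor scores. The intercepts and weights below are copied from that slide. Everything else is an assumption for illustration: the class name LinearInterestModel, the predict method, and the factor scores passed in are hypothetical, because the slides do not say how the factors are derived from the logged reading and organizing events.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LinearInterestModel:
    """Hypothetical container for one linear user-interest model."""
    name: str
    intercept: float
    weights: Sequence[float]

    def predict(self, factors: Sequence[float]) -> float:
        """Return intercept + sum(weight_i * factor_i)."""
        if len(factors) != len(self.weights):
            raise ValueError(
                f"{self.name} model expects {len(self.weights)} factors, "
                f"got {len(factors)}"
            )
        return self.intercept + sum(
            w * f for w, f in zip(self.weights, factors)
        )


# Coefficients as listed on the User Model (2) slide.
# Note: factor1, factor2, ... mean different things in each model.
READING = LinearInterestModel("reading", 0.877, (0.133, 0.120))
ORGANIZING = LinearInterestModel("organizing", 0.877, (0.185, -0.092))
COMBINED = LinearInterestModel(
    "combined", 0.877, (0.125, 0.152, 0.0662, 0.0653)
)

if __name__ == "__main__":
    # Made-up factor scores, used only to show the call shape.
    example_factors = (1.0, 1.0)
    print(READING.predict(example_factors))
    print(ORGANIZING.predict(example_factors))
    print(COMBINED.predict((1.0, 1.0, 1.0, 1.0)))
```

With all factor scores at zero, each model returns its intercept, 0.877. Keeping the three models as separate objects mirrors the slide's note that the factors differ between models, so factor scores computed for one model should not be fed to another.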