Understanding Searchers’ Perception of Task Difficulty:
Relationships with Task Type
Jingjing Liu1, Chang Liu2, Xiaojun Yuan3, Nicholas J. Belkin2
1 Department of Information and Library Science, Southern Connecticut State University
2 School of Communication and Information, Rutgers University
3 College of Computing and Information, University at Albany, State University of New York
jliujingjing@gmail.com, changl@eden.rutgers.edu, xyuan@albany.edu, belkin@rutgers.edu
ABSTRACT
We report findings that help us better understand the difficulty of tasks which involve information seeking, retrieving, gathering, and use. We examined data gathered from two interactive information retrieval user studies, looking at how users' perception of task difficulty changes before and after searching for information to solve tasks, and how task difficulty relates to users' background, previous experience with the tasks, knowledge of the task topics, etc. The two studies employed carefully designed tasks of different types along several dimensions: task structure (sub-tasks being dependent upon or parallel with each other), goal quality (specific or amorphous), and naming (named or unnamed). It was found that in some types of tasks, users' perceptions of task difficulty did not change before and after working on the tasks, while in others they did, either increasing or decreasing. Specifically, in the dependent-structured task, perceived difficulty did not change; in the parallel-structured and the specific/named tasks, it decreased; in the amorphous/unnamed task, it increased. We also found that users' background factors do not normally correlate with their perceived task difficulty or with perceived difficulty change. In addition to helping understand the dynamic and complex nature of task difficulty, our findings have implications for the design of systems that can provide assistance to users with their search and task-solving strategies.
Keywords
Task difficulty, expected difficulty, reflected difficulty, task
type.
INTRODUCTION
Current search engines do a good job of returning answers for easy and simple tasks, e.g., "when is the poster submission deadline for the ASIS&T 2011 conference?" One
can simply type in Google (http://www.google.com) the
keywords “ASIST 2011 poster deadline” and find the
answer from the snippet of the top ranked search result.
However, search systems do not do as well with somewhat
difficult tasks, e.g., “collect information that is helpful to
make a 2-week tour plan in October this year to some good
tour sites in China." The task being amorphous rather than
specific in goal, the complex nature of the task, and the user's lack of knowledge about China could all be factors that make the task difficult. Better search systems
are needed that can help people more easily locate useful
information and more effectively solve such difficult tasks.
It is fundamental to first understand task difficulty; based on such an understanding, systems can be designed or improved to provide assistance that helps users solve tasks, or to reduce the degree of difficulty that users perceive.
Li & Belkin (2008) define task difficulty as a subjective
perception assessed by task doers. This perception can be formed both pre-task and post-task (Kim, 2006). Task difficulty has
been found to be a significant factor influencing users’
search behaviors and search performance. In difficult tasks,
users are more likely to visit more web pages (Gwizdka &
Spence, 2006; Kim, 2006; Liu et al., 2010), issue more
queries (Aula, Khan, & Guan, 2010; Kim, 2006; Liu et al.,
2010), and spend longer total time (Aula, Khan, & Guan,
2010; Liu et al., 2010) and first dwell time (Liu et al., 2010)
on search result pages. These findings on the relationships
between users’ search behaviors and task difficulty suggest
that it is possible to predict the difficulty level of a task
from the users’ search behaviors. Further, researchers have
found that the relationships between search behaviors and
task difficulty vary in different types of tasks such as in
factual, interpretive, and exploratory (Kim, 2006), or in
fact-finding and information gathering (Liu et al., 2010).
Even though a system can predict, from observing the users’
behaviors, that they are having difficulty with their search,
the system cannot help users overcome the difficulty unless
it has a further understanding of the nature of task difficulty.
The following research questions need to be answered: 1)
Does a user’s perception of the level of task difficulty
change along the search and task solving process? 2) Does
this change of difficulty perception vary in different task
types? 3) What could be the possible factors that lead to this
change? 4) What can systems do to reduce task difficulty?
LITERATURE REVIEW
As an important factor determining task performance, task
difficulty, as well as a closely related concept, task
complexity, has attracted significant attention in interactive
information retrieval (IIR). We discuss here work related to
ours in both of these areas.
Much effort has been devoted to exploring the effect of task complexity on information task performance and/or users' behaviors. Byström and her colleagues (e.g., Byström &
Järvelin, 1995; Byström, 2002) conducted a series of
studies analyzing the effect of task complexity on human
information behaviors. In their research, task complexity
was defined from the worker’s point of view based on “a
priori determinability of, or uncertainty about, task
outcomes, process, and information requirements”
(Byström & Järvelin, 1995, p. 194). They found that as
people’s task complexity increased, they needed more types
and more sources of information, needed more domain
information and more problem solving information, were
less likely to predict the types of information they needed, and
were more dependent upon experts to provide useful
information. Vakkari (1999) outlined a model showing the
relationships of factors determining task performance,
including task complexity (following Byström and
colleagues’ definition), problem structure, prior knowledge,
and information actions. His main argument was that
information activities are systematically connected to task
complexity and the structure of the problem at hand.
In their study examining the impact of task complexity on
the utility of implicit relevance feedback (IRF), as opposed
to explicit relevance feedback (ERF), White, Ruthven and
Jose (2005) used an objective way to define complexity by
operationalizing it into measurable variables including the
number of potential information sources and type of
information required to complete a task. Results indicate
that for more complex tasks, participants preferred IRF, but
for less complex tasks, they preferred ERF.
Li and Belkin (2008) argued that task complexity can be
both objective and subjective, with subjective task
complexity assessed by task performers, and objective task
complexity defined by the number of activities involved in
a “work task” (Ingwersen & Järvelin, 2005) or the number
of information sources involved in a “search task.”
Following their definition, studies have found that in tasks
with higher objective task complexity, users searched more
systems, issued more queries, viewed more pages, and
spent longer time completing the task (Li & Belkin, 2010;
Liu et al., 2010).
Task complexity is sometimes used interchangeably with task difficulty, as in Bell & Ruthven (2004).
They explored the nature of task complexity (difficulty),
basically following Byström and colleagues’ definition, in
terms of whether users could recognize task complexity and
how it affected search success and satisfaction. They
pointed out that task complexity is a dynamic entity, and
that system evaluation should use tasks with appropriate
complexity levels.
Some other researchers looked at task difficulty. In their
comprehensive task classification scheme, Li and Belkin
(2008) defined task difficulty as a subjective perception
assessed by task doers. Cole et al. (2010) operationalized it
as anticipated task difficulty. Similarly, Kim (2006) defined
task difficulty as task performers’ perception of the
complexity of a task. Many studies measured task difficulty
using users’ self-reported perception of how difficult a task
is through questionnaires. With this measurement, Gwizdka
and Spence (2006) performed a web study to investigate the
relationship between task difficulty and search behavior
operationalized as the number of the unique web pages
visited, the time spent on each page, the degree of deviation
from the optimal path and the degree of the navigation
path’s linearity. Their results demonstrated that these four
measures were good predictors of subjective task difficulty.
It was found that objective task complexity has an impact
on the relative importance of those predictors and on the
subjective perception of task difficulty. However, the only
task type was information-finding. Kim (2006) reported a
close relationship between task difficulty and searching
interaction. For different task types, the relationship varies.
In particular, the correlation results of task difficulty and
some behavioral variables indicate that post-task difficulty
was significantly associated with task completion time, and
the numbers of queries and documents viewed in factual
tasks; user behaviors were significantly correlated with pre-task difficulty in exploratory tasks; but most correlations were not significant in interpretive tasks. Therefore, post-task difficulty is a good indicator for factual tasks only. Liu,
Gwizdka, Liu, and Belkin (2010) also reported how
behavioral predictors of task difficulty vary across different
task types, including single-fact finding, multiple-fact
finding and multiple-piece information gathering tasks. In
single-fact finding tasks, participants’ total dwell time and
first dwell time on unique content pages were significantly
longer in difficult tasks than in easy tasks. This relationship
was not supported in other tasks. Also in the same task type,
participants’ total dwell time and first dwell time on unique
SERPs did not differ between difficult and easy tasks.
However, in other task types, participants spent longer
dwell time on SERPs in difficult tasks than in easier tasks.
Aula, Khan, and Guan (2010) conducted a large-scale study
(179 participants with each completing an average of 22.3
tasks of varying task difficulty) to observe user web search
behavior. They used closed informational tasks to which
there is a single, unambiguous answer. They defined a task
as difficult if a user failed it. Their results indicate that participants tended to issue more diverse queries, spend longer total time on the search result page, and use advanced operators more when completing difficult tasks than easier tasks. As in Gwizdka and Spence (2006), the single task type limits the generalization of the results of this study.
Some other researchers looked at system assistance, which
is closely related to task difficulty. Jansen (2006) evaluated
the effectiveness of automated assistance using complex
searching tasks (the tasks used by TREC, http://trec.nist.gov/). A system with
pattern-based searching assistance was compared to a
system with assistance provided at every point in the whole
search process. Results indicate that in 70% of the cases,
participants on the system with the pattern-based searching
assistance performed better than those on the other system.
It further indicates that appropriate assistance can improve
user performance in dealing with complex tasks. With the
similar intention of exploring ways of helping users find
needed information, Xie and Cool (2009) did a study to
identify different types of help-seeking situations and
different types of factors that may be related to such
situations. Three tasks were used: an exploring task,
a task requiring searching for specific information, and a
task requiring searching for information with a variety of
characteristics. It was found that task type and task complexity had an impact on whether participants needed to
overcome help-seeking situations. Also, the help-seeking
situation “inability to select appropriate terms” was highly
related to the task type and task complexity, particularly
when users have to look for specific information. These
studies took important steps toward investigating how to provide sufficient and appropriate help for users with
complex tasks. Nevertheless, previous studies have rarely
examined users’ perceptions of the tasks’ difficulty after
they have finished the task in comparison with before
working on it.
A THEORETICAL MODEL
The above literature review shows that task complexity
and/or difficulty has significant relations with a number of
other factors: task performance, search behaviors,
knowledge background, relevance judgment, etc. We are
specifically interested in users’ perception of task difficulty,
both before and after working with a task, as well as how
these perceptions relate with other factors, such as users’
background, task type, user behaviors, etc. We propose a
model showing the above-mentioned relations (Figure 1).
[Figure 1. A theoretical model of user perception of task difficulty and affecting factors]
There are five sets of components in the model, which are
about 1) task completion, 2) task difficulty, 3) user
background, 4) user interaction with the system and all the
behaviors in the process, and 5) task features.
For the task that drives one to seek information, the task doer will eventually finish working on it, no matter how successful the outcome. This is the first component in the model (marked as number "1" in Figure 1), which provides an ending point to the task.
The second set of components (marked as “2” in Figure 1)
is about the task doer’s perception of task difficulty. Before
working on a task, although it is not always explicitly
expressed, the task doer usually has an estimate of the
task’s difficulty, which is the expected task difficulty. After
working on the task, he/she has another perception of the task's difficulty, which is the reflected task difficulty. The importance of
the two types of task difficulty is that it is likely that the
task doer’s perception of task difficulty changes after
working on the task, and this will inevitably affect their
satisfaction with the IR system in which they seek
information.
The third set of components (marked as “3” in Figure 1) is
the task-doer’s background, including both pre-task and
post-task. This is similar to the “prior knowledge” in
Vakkari’s (1999) model that shows the relations between
task complexity, problem structure, knowledge, and
information actions; however, our model considers more variables than just "prior knowledge". In our model, the
pre-task background, such as pre-task topic knowledge,
previous experience with the type of task, etc., could affect
their expected task difficulty. The post-task background,
including one’s post-task topic knowledge, will likely affect,
together with pre-task background factors, the task doer’s
post-task perception of task difficulty.
The fourth set of components (marked as "4" in Figure 1) is the task doer's, i.e., the information system user's, behaviors, which are demonstrated through interaction with the search system. This is similar to the "information actions" in
Vakkari’s (1999) model. These can include users’ queries
used to search in the system, documents (web pages, etc.)
opened, viewed, and/or retained, time spent on reading
documents, etc. Task difficulty will affect such behaviors,
and such behaviors could be indicators of task difficulty.
Previous studies (e.g., Gwizdka & Spence, 2006; Liu et al.,
2010) have looked at predicting task difficulty based on
behavioral evidence. While it is an important research area,
it is not the focus of the current paper.
The fifth set of components (marked as “5” in Figure 1) is
about task features. This is similar to the “problem structure”
in Vakkari’s (1999) model. Task features, including
different types of tasks, have been found to affect users’
behaviors and difficulty perception (e.g., Kim, 2006; Liu et
al., 2010). Therefore, it is an important aspect to include in
our model and current examination.
METHOD
Study 1
Data came from a 3-session lab experiment designed to
examine information system users’ behavioral and
performance changes along the way of searching for
information to solve a task.
Experimental design
In this experiment, the 3 sessions were treated as 3 stages.
The design was 2*2 factorial with two between-subjects
factors (Table 1). One was task type, with two levels:
parallel or dependent. The other was search system, with
two levels: query suggestion (QS) or non-query suggestion
(NQS). The two tasks and the system conditions are
described in more detail below.
System condition
One aspect of the study as a whole was aimed at exploring
whether query terms extracted from useful pages in
previous sessions were helpful for the users in their current
search, and to this end, two versions of the search system
were designed. One version (NQS) is the regular IE
window, and the other (QS) offered query term suggestions
based on previous sessions in the left frame of the screen, with the right frame being the regular IE window. Since this is a within-subject factor, it is not likely that it affects the task differences, which are between-subjects, and so this factor is not considered further in this paper.
Table 1. Study 1 experimental design
System condition   Task        Session 1   Session 2   Session 3
                               (stage 1)   (stage 2)   (stage 3)
1                  Dependent   NQS         NQS         NQS
2                  Parallel    NQS         NQS         NQS
3                  Dependent   NQS         QS          QS
4                  Parallel    NQS         QS          QS
Tasks
Tasks were designed to mimic journalists' assignments
since they could be relatively easily set as realistic tasks in
different domains. Among the many dimensions of task
types, this study focused on task structure, i.e., the inter-subtask relation, varying it while keeping other facets in
the comprehensive task classification scheme proposed by
Li & Belkin (2008) as constant as possible. This makes it
reasonable to attribute the task difference to this single
factor of task structure. Two task types were used in the
study: one parallel and one dependent. They both had three
sub-tasks, each of which was worked on by the participant
during one separate session, for three sessions in total.
The tasks asked the participants to write a three-section feature story on hybrid cars for a newspaper, and to finish and submit each article section at the end of each experiment session. At the end of the 3rd session, they were asked to integrate the 3 sections into one article. In the dependent task (DT), the three sub-tasks were: 1) collect information on what manufacturers have hybrid cars; 2) select three models that you will mainly focus on in this feature story; and 3) compare the pros and cons of three models of hybrid cars. In the parallel task (PT), the three sub-tasks were finding information and writing a report on three models of cars from auto manufacturers renowned for good warranties and fair maintenance costs: 1) Honda Civic hybrid; 2) Nissan Altima hybrid, and 3) Toyota Camry hybrid. It was hypothesized that the sub-tasks in the parallel task were independent of one another, but that in the dependent task there would be perceived to be at least some notional order. To maintain consistency, sub-task orders in the task descriptions in both tasks were rotated and users were allowed to choose whatever order of sub-task performance they preferred.
In each session, participants were allowed to work up to 40
minutes to write and submit their reports. They were
allowed to search freely on the Web for resources in report
writing. For logging purpose, users were allowed to keep
only one Internet Explorer (IE) window open and use back
and forward buttons to move between web pages.
Participants
The study recruited 24 undergraduate Journalism/Media
Studies students (21 female, 3 male) via email to the
student mailing list at the Journalism/Media Studies
undergraduate program in the authors’ school. Their mean
age was 20.4 years. They self-reported an average
of 8.4 years of online searching experience, and rated their
levels of expertise with searching as slightly above average
(M=5.38) (1=novice, 7=expert). Each of them came 3 times
within a 2-week period based on their schedule. Each was
assigned randomly to a task/system condition. Each
obtained $30 payment upon finishing all 3 sessions, with an incentive (announced before the experiment) of an additional $20 for the 6 participants who submitted the most detailed reports, to encourage them to take the study seriously.
Procedures
Participants came individually to a usability lab to take part
in the experiment. Upon arrival in the first session, they
completed a consent form and a background questionnaire
eliciting their demographic information and search
experience. They were then given the general work task to
be finished in the whole experiment. A pre-session task
questionnaire followed to collect their familiarity with the
general task topic, previous experience with the type of task,
and the expected task difficulty. Then they were asked to
pick one sub-task to work with in the current session. A
pre-session sub-task questionnaire followed to collect their
familiarity with the sub-task topic. Then they worked with
the subtask: searching for useful sources and writing reports.
After report submission, participants went through an
evaluation process in which they were asked to rate, on a 7-point scale, each document that they had viewed, in the
order of viewing them in the actual search process, with
respect to its usefulness to the overall task. A post-session
sub-task questionnaire and a post-session general task
questionnaire were then administered to elicit user
perceptions on the difficulty of the task and sub-task, as
well as their satisfaction with their reports. This ended the
first session.
In the 2nd and the 3rd sessions, participants went through
the same processes except for the consent form and
background questionnaire, as well as an instruction step on
using query suggestion features for those assigned with the
QS version system. In the 3rd session, after the post-session
general task questionnaire, an exit interview asked them to
reflect on their overall knowledge gain (rating on a 7-point
scale) and to comment on the whole experiment.
Data collection
The experiment was conducted using a two-monitor
workstation: the main monitor was an eye-tracker, on which
the users searched and worked on writing their reports; the
2nd monitor was a regular monitor sitting beside the search
monitor, which displayed the online questionnaires and the
task and sub-task descriptions. Users’ eye movements were
captured but are not reported here. Logging software Morae
(http://www.techsmith.com/morae.asp) was used to record
all the user-system interactions (such as mouse and
keyboard activities, window display) in the main monitor.
Study 2
Tasks
Tasks in this study also follow the faceted task type
classification method proposed by Li and Belkin (2008) to
vary and control the values of the task facets. Table 2 is an
overview of the facets of Li and Belkin’s classification
scheme which we manipulated. We added two facets to the
classification scheme: “Naming”, according to whether the
expected fact is named in the search task or not; and "Level of document judgment", the level at which users judge the usefulness of documents for the tasks.
Table 2. Facets of task which were varied in this study (after Li & Belkin, 2008, modified)
Facet            Value            Operational definition/rule
Product          Intellectual     A task which produces new ideas or findings
                 Factual          A task locating facts, data, or other similar items in information systems
Naming           Named            A task locating factual information to confirm or disconfirm a named fact
                 Unnamed          A task locating factual information about an unnamed fact
Goal (Quality)   Specific goal    A task with a goal that is explicit and measurable
                 Amorphous goal   A task with a goal that cannot be measured
                 Combined goal    A task with both concrete and amorphous goals
Level            Document         A task for which a document as a whole is judged
                 Segment          A task for which a part or parts of a document are judged
Copy Editing (CPE)
Your assignment: You are a copy editor at a newspaper and you have only 20
minutes to check the accuracy of the three underlined statements in the excerpt of a news story below.
New South Korean President Lee Myung-bak takes office
Lee Myung-bak is the 10th man to serve as South Korea's president and the
first to come from a business background. He won a landslide victory in last
December’s election. He pledged to make economy his top priority during the
campaign. Lee promised to achieve 7% annual economic growth, double the
country’s per capita income to US$4,000 over a decade and lift the country to one
of the top seven economies in the world. Lee, 66, also called for a stronger
alliance with top ally Washington and implored North Korea to forgo its nuclear
ambitions and open up to the outside world, promising a better future for the
impoverished nation. Lee said he would launch massive investment and aid
projects in the North to increase its per capita income to US$3,000 within a decade
“once North Korea abandons its nuclear program and chooses the path to
openness.”
Your task: Please find and save an authoritative page that either confirms or
disconfirms each statement.
Advance Obituary (OBI)
Your assignment: Many newspapers commonly write obituaries of important
people years in advance, before they die, and in this assignment, you are asked to
write an advance obituary for a famous person.
Your task: Please collect and save all the information you will need to write an
advance obituary of the artist Trevor Malcolm Weeks.
Figure 2. Two tasks in Study 2
Table 3. Variable facet values for the search tasks
Task   Product   Goal (quality)   Naming    Level
CPE    Factual   Specific         Named     Segment
OBI    Factual   Amorphous        Unnamed   Document
Four tasks were designed and used in this study, and each
of the tasks was a combination of several facets. For the
analysis in this paper, we selected two (Figure 2 and Table
3) from the four tasks to evaluate users’ difficulty ratings.
The two tasks were selected for the following reasons. First,
the two tasks have some common characteristics, which
made them comparable. For example, both of these two
tasks were “Factual” tasks since they required users to
identify factual information and did not require “Intellectual”
information. In addition, both of these two tasks required
users to collect important information about a particular
person: CPE is about a Korean president and OBI is about
an artist. Secondly, these two tasks were different in three
other facets: Goal (quality); Naming; Level of document
judgment. The Level of document judgment was found to
affect users’ dwell time spent on documents (Liu et al.,
2010), but is less likely to affect the task’s difficulty in
general. Nevertheless, it is reasonable to assume that the
other two facets could influence the task difficulty. For
example, a task with “Specific” goals and “Named” fact
may provide a lot more specific information about the
expected information than another task with “Amorphous”
goals and “Unnamed” fact, making it easier to accomplish.
Participants
Thirty-two participants (26 female, 6 male) were recruited
from undergraduate Journalism/Media Studies students
(same school as in Study 1, but only a couple of them
participated in both studies). They were between 18 and 27
years old. They rated their computing and search skills high
and reported an average search experience of 8.85 years
using a range of different browsers (IE, Firefox, Safari, etc.).
They were generally positive about their average success
during online search. As in Study 1, participants were
informed in advance that their payment for participation in
the experiment would be $20.00, and that the 8 who saved
the best set of pages for all four tasks, as judged by an
external expert, would receive an additional $20.00.
Procedures
Participants came individually to a usability lab to take part
in the experiment. Upon their arrival, they first completed a
background questionnaire about their demographic
information and search experience. Each participant was
given a tutorial as a warm-up task and was then asked to work on 4 tasks, in an assigned order balanced across all
participants. Participants were asked to search using IE 6.0
on the computer in our lab and they were free to go
anywhere on the Web to search for information and were
asked to continue the search until they had gathered enough
information to accomplish the task. In each task, they were
asked to save content pages that were useful for them to
accomplish the assignments, or delete saved pages that
were found to be not useful later. When participants
decided they had found and saved enough information objects
for purposes of the task, they were then asked to evaluate
the usefulness of the information objects they saved, or
saved and then deleted, through replaying the search using
the screen capture program.
Before working on each task, participants were asked to
complete a pre-task questionnaire about their familiarity
with the general task topic, previous experience with the
type of assignment, and the expected task difficulty. After
the task, participants were asked to complete a post-task
questionnaire about their perceptions of the difficulty of the
task and their success in gathering information for the task.
Data collection
This was the same as that in Study 1.
RESULTS
From Study 1
Difficulty comparison between the two tasks
In order to get a sense of the difficulty level of each of the two tasks, we first looked individually at the pre-task and post-task difficulty ratings of the two tasks and compared each of them between the two tasks. Because the distribution of task difficulty ratings was not normal, the non-parametric Mann-Whitney U test was used to compare the difficulty of the two general tasks.
Results (Table 4) show no significant difference in users’
ratings for the two tasks either before or after they worked
on them. This was reasonable given that both tasks were in
the same domain, with the same general requirements.
Table 4. Pre- and post-task difficulty ratings in two tasks
Difficulty type   DT, mean (SD)   PT, mean (SD)   Mann-Whitney U (p)
Pre-task          2.83 (1.34)     2.58 (0.90)     61.5 (0.551)
Post-task         2.92 (1.31)     2.08 (0.90)     43.5 (0.101)
Expected vs. reflected task difficulty
We then compared users’ expected and reflected task
difficulty for each task. The non-parametric Wilcoxon test
was used due to the non-normal distribution of task
difficulty. Results (Figure 3) show that in the dependent
task (DT), there was no significant difference (W(11)=21.0,
p=0.852) between users’ ratings on these two types of
difficulty. However, in the parallel task (PT), after working
on the whole task, users felt that the general task was not as
difficult as they expected in the beginning of the
experiment (W(11)=4.5, p=.034).
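As a minimal illustration of the two kinds of comparison used here, the following Python sketch shows how such tests can be run with SciPy; the rating arrays are hypothetical stand-ins, not the study data.
```python
# A minimal sketch (hypothetical 7-point ratings, not the study data) of the
# two non-parametric tests used in this section.
from scipy.stats import mannwhitneyu, wilcoxon

dt_pre  = [3, 2, 4, 2, 3, 1, 4, 3, 2, 3, 4, 3]  # dependent task, expected difficulty
pt_pre  = [3, 2, 3, 2, 4, 2, 3, 2, 3, 2, 3, 2]  # parallel task, expected difficulty
pt_post = [2, 2, 2, 1, 3, 2, 3, 2, 2, 1, 3, 2]  # parallel task, reflected difficulty

# Between-task comparison (different participants did DT and PT): Mann-Whitney U.
u_stat, u_p = mannwhitneyu(dt_pre, pt_pre, alternative="two-sided")
print(f"Pre-task difficulty, DT vs PT: U={u_stat:.1f}, p={u_p:.3f}")

# Within-task comparison (same participants rated before and after): Wilcoxon signed-rank.
w_stat, w_p = wilcoxon(pt_pre, pt_post)
print(f"PT expected vs reflected difficulty: W={w_stat:.1f}, p={w_p:.3f}")
```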
[Figure 3. Comparison of pre- and post-task difficulty ratings in each task]
Relationship between the expected and the reflected difficulty and users' background factors
We then examined the relation between users’ background
factors and their perceived task difficulty. Our results
(Table 5) showed that the background factors we investigated correlated significantly with users' perception of task difficulty on only a few occasions, and in these cases the patterns differed between the two tasks. Pre-task difficulty was found to be correlated only with the perceived successfulness in gathering information, and since the latter was elicited post-task, it was not likely to affect users' expected difficulty before the task. Post-task difficulty had a negative correlation with pre-task topic familiarity only in the parallel task, and a positive correlation with previous experience with the type of task only in the dependent task.
Table 5. Pearson correlation between task difficulty and other factors, r (p)
                                      Pre-task difficulty         Post-task difficulty
                                      DT            PT            DT            PT
Pre-task topic familiarity            -.21 (.51)    -.51 (.087)   -.11 (.74)    -.68 (.015)
Previous experience                   -.23 (.48)    -.52 (.083)   .62 (.031)    -.27 (.405)
Post-task topic familiarity           -.12 (.72)    -.16 (.61)    .03 (.93)     -.31 (.335)
Successful in gathering information   -.61 (.034)   -.42 (.17)    .58 (.051)    -.20 (.534)
Task accomplishment                   -.66 (.020)   -.29 (.36)    -.53 (.08)    -.10 (.77)
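For readers who wish to reproduce this style of correlation analysis on their own data, a minimal SciPy sketch follows; the variable names and values are hypothetical and are not the study data.
```python
# Hypothetical per-participant 7-point ratings for one task condition
# (not the study data), illustrating the "r (p)" entries of Table 5.
from scipy.stats import pearsonr

pre_task_difficulty  = [3, 2, 4, 2, 3, 1, 4, 3, 2, 3, 4, 3]
success_in_gathering = [5, 6, 3, 6, 5, 7, 3, 4, 6, 5, 3, 4]

# Pearson r between expected difficulty and perceived success in gathering information.
r, p = pearsonr(pre_task_difficulty, success_in_gathering)
print(f"r = {r:.2f} (p = {p:.3f})")
```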
Relationship between the perceived task difficulty change
and users’ background factors
We then compared the change in users' perceived task difficulty, i.e., the difference between the post-task and the pre-task difficulty ratings. In each task, users were categorized into three groups according to whether their post-task difficulty rating was less than, equal to, or greater than their pre-task rating. The three groups of users were then compared on their pre-task topic familiarity, previous experience, and the other background factors shown in Tables 6 and 7.
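A small sketch of this grouping-and-comparison step, again with hypothetical data rather than the study data, might look like the following; the grouping simply follows the sign of the post-minus-pre difference.
```python
# A minimal sketch (hypothetical data) of grouping users by the direction of
# their perceived difficulty change and comparing the groups on a background
# factor with the Kruskal-Wallis test.
from collections import defaultdict
from scipy.stats import kruskal

# (pre-task difficulty, post-task difficulty, pre-task topic familiarity), 7-point scales.
participants = [
    (3, 2, 4), (2, 2, 5), (4, 5, 2), (2, 1, 5), (3, 3, 3), (1, 2, 6),
    (4, 3, 2), (3, 3, 3), (2, 2, 5), (3, 2, 4), (4, 3, 2), (3, 4, 3),
]

groups = defaultdict(list)
for pre, post, familiarity in participants:
    if post < pre:
        key = "post < pre"
    elif post > pre:
        key = "post > pre"
    else:
        key = "post = pre"
    groups[key].append(familiarity)

# Do the three groups differ in pre-task topic familiarity?
h_stat, p_value = kruskal(groups["post < pre"], groups["post = pre"], groups["post > pre"])
print(f"Kruskal-Wallis H = {h_stat:.3f}, p = {p_value:.3f}")
```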
Table 6. Dependent task: Comparison of background factors among three groups
                                                 Post < pre    Post = pre    Post > pre    Kruskal-Wallis H (p)
N                                                5             3             4             12
Pre-task topic familiarity, mean (SD)            2.20 (1.30)   3.00 (1.00)   2.00 (0.82)   1.593 (.451)
Previous experience, mean (SD)                   3.20 (1.48)   2.33 (1.16)   4.75 (0.96)   5.582 (.061)
Post-task topic familiarity, mean (SD)           4.00 (0)      4.33 (0.58)   4.25 (0.50)   1.650 (.438)
Successful in gathering information, mean (SD)   4.80 (0.45)   6.33 (0.58)   5.00 (1.41)   5.705 (.058)
Task accomplishment, mean (SD)                   4.60 (0.89)   6.67 (0.58)   5.00 (1.41)   5.509 (.064)
Knowledge gain, mean (SD)                        5.40 (0.89)   6.33 (0.58)   5.75 (0.96)   2.716 (.257)
Table 7. Parallel task: Comparison of background factors among three groups
                                                 Post < pre    Post = pre    Post > pre    Kruskal-Wallis H (p)
N                                                7             4             1             12
Pre-task topic familiarity, mean (SD)            3.43 (1.62)   3.00 (2.45)   2.00 (na)*    .594 (.743)
Previous experience, mean (SD)                   2.71 (1.60)   3.75 (1.71)   4.00 (na)*    1.226 (.542)
Post-task topic familiarity, mean (SD)           4.57 (1.62)   4.00 (1.41)   4.00 (na)*    .565 (.754)
Successful in gathering information, mean (SD)   5.00 (1.41)   5.50 (0.58)   6.00 (na)*    1.342 (.511)
Task accomplishment, mean (SD)                   4.71 (1.25)   5.50 (0.58)   5.00 (na)*    2.266 (.322)
Knowledge gain, mean (SD)                        5.29 (1.11)   5.25 (0.50)   7.00 (na)*    2.309 (.315)
* N=1 in this group, so its standard deviation was not computed.
In summary of the effect of user background on task
difficulty, our investigations of the relations between user
background factors and perceived task difficulty (both the
expected and the reflected) detected significant correlations
in only a few cases (Table 5), and our investigation of
relations between user background factors and perceived
difficulty change (Tables 6 & 7) found that the examined
background variables did not show any significant
differences among user groups whose perceived task
difficulty was reduced, unchanged, and increased. Together,
these results indicate that the examined user background
variables were not important factors influencing users’
perceived task difficulty and task difficulty change. Since
task structure is the only main difference between the two
tasks, it seems fairly safe to say that this task feature
affected users’ perceived difficulty change.
From Study 2
Difficulty comparison between CPE and OBI
Similar to Study 1, we first looked individually at the pre-task and post-task difficulties of CPE and OBI, and
compared each of them between the two tasks. Since the
distributions of these two measures in the dataset were not
normally distributed, an independent 2-group Mann-Whitney U test was used to compare the difficulty of the two tasks.
Table 8. Comparison of pre- and post-task difficulty ratings between the two tasks
Difficulty type   CPE, mean (SD)   OBI, mean (SD)   Mann-Whitney U (p)
Pre-task          3.13 (1.43)      3.25 (1.27)      485.5 (0.72)
Post-task         2.31 (1.28)      5.25 (1.37)      78 (<.001)
Results (Table 8) show no significant difference in user
ratings for the two tasks before they worked on the tasks,
while OBI was rated to be significantly more difficult than
CPE after users worked on it (U(31)=78, p<.001). In other words, the two tasks were rated at a similar difficulty level before the task, but OBI was rated to be more difficult than CPE after task completion.
Expected vs. reflected task difficulty in each task
It is necessary to assess why the two tasks with similar pre-task difficulty showed a significant difference after users worked on them. In this part, we compared the difference between the expected (pre-task) and the reflected (post-task) difficulty in each task.
Table 9. Pearson correlation between pre- and post-task difficulty and other factors
                                      Pre-task difficulty           Post-task difficulty
                                      CPE            OBI            CPE             OBI
Pre-task topic familiarity            -0.59 (0.56)   -1.97 (0.06)   0.41 (0.68)     0.91 (0.37)
Previous experience                   2.97 (0.005)   3.46 (0.001)   0.10 (0.93)     0.23 (0.82)
Post-task difficulty                  0.85 (0.40)    0.41 (0.69)    0.85 (0.40)     0.41 (0.69)
Successful in information gathering   0.08 (0.93)    0.34 (0.73)    4.41 (<0.001)   7.01 (<.001)
[Figure 4. Comparison of pre- and post-task difficulty ratings in each task]
A dependent 2-group Wilcoxon Signed Rank Test was used
to compare the pre-task difficulty with post-task difficulty
in the two tasks. Results (Figure 4) show that in CPE, the
post-task difficulty was significantly lower than the pre-task
difficulty (W(31)=228, p=.02); while in OBI, the post-task
difficulty was significantly higher than the pre-task
difficulty (W(31)=26, p<0.001).
The relationship between task difficulty and users' background factors
The above results showed that users rated CPE and OBI at a similar difficulty level before searching, but after working on them, users found CPE to be much easier than they had expected and OBI to be much more difficult than they had expected. Again we examined the relationship between task difficulty and users' other background factors.
The correlation analysis (Table 9) showed that users' pre-task difficulty ratings in both CPE and OBI were significantly negatively related to their previous experience, but not significantly correlated with any other factors. In particular, the less previous experience the user had before searching, the more difficult they expected the search task to be. On the other hand, in both tasks, users' post-task difficulty rating was significantly negatively related only to their post-task success rating, and none of the other factors were significantly related. In particular, the higher the post-task difficulty, the less successful they felt
about task accomplishment. These results are reasonable
but they are not very helpful in explaining the difference in
the difficulty rating change from pre-task to post-task in the
two tasks.
Relationship between the perceived task difficulty change and users' background factors
Then we compared the change in users' perceived task
difficulty, i.e., the differences between the post-task and the
pre-task difficulty ratings. As was done in Study 1, in both CPE and OBI in Study 2 we formed three groups of users according to whether their post-task difficulty rating was higher than, equal to, or lower than their pre-task difficulty rating, and
we then compared users’ pre-task familiarity and previous
experience among these three groups.
Table 10. CPE: Comparison of background factors among three groups
                                                 Post < pre    Post = pre    Post > pre    Kruskal-Wallis H (p)
N                                                19            8             5             32
Pre-task topic familiarity, mean (SD)            3.16 (2.11)   2.50 (1.51)   2.60 (2.61)   0.71 (0.70)
Previous experience, mean (SD)                   3.26 (1.59)   3.25 (1.58)   4.20 (1.30)   1.54 (0.46)
Successful in gathering information, mean (SD)   6.37 (1.16)   6.38 (0.52)   5.20 (1.30)   4.82 (0.09)
Table 11. OBI: Comparison of background factors among three groups
                                                 Post < pre    Post = pre    Post > pre    Kruskal-Wallis H (p)
N                                                3             4             25            32
Pre-task topic familiarity, mean (SD)            2.33 (2.31)   3.00 (1.63)   2.44 (1.73)   0.73 (0.69)
Previous experience, mean (SD)                   2.0 (1.00)    3.0 (1.83)    2.8 (1.73)    0.57 (0.75)
Successful in gathering information, mean (SD)   6.33 (0.58)   4.75 (0.50)   3.60 (1.58)   8.66 (0.01)
The Kruskal-Wallis test results (Tables 10 and 11) show that there were no significant differences among these three groups in either CPE or OBI on their pre-task familiarity and previous experience. In the OBI task, however, the three groups differed significantly on their rating of success in gathering information after searching.
In particular, users whose reflected difficulty was higher
than their expected difficulty had the lowest rating in their
success in gathering information, while users whose
reflected difficulty was lower than their expected difficulty
had the highest rating in their success in gathering
information. Such results are reasonable but they do not
help explain the difference of difficulty change between the
two tasks.
Therefore, neither pre-task familiarity nor previous experience had a significant effect on users' difficulty ratings: neither on pre-task difficulty, post-task difficulty, nor on the change from pre- to post-task difficulty ratings. Meanwhile, the
characteristics of the tasks seem to be important factors that
affected users’ perceived difficulty and the change in users’
difficulty rating before and after search. Even though the
two tasks had similar pre-task difficulty ratings, the task
with Specific goal and Named fact, CPE, was much easier
than was expected, while the task with Amorphous goal and
Unnamed fact, OBI, was much more difficult than was
expected. In addition, from the experimenters’ observations
of users’ search process during these two tasks, users had
been taking much of their time in disambiguating the artist
that the OBI task required them to search for, since there
were multiple people with that name on the Web, and this
artist was not as famous as the Korean President in the CPE
task. Therefore, the results demonstrate that task features,
including task facets and task topic ambiguity, were the
most influential factors on post-task difficulty and on the change from pre-task to post-task difficulty.
DISCUSSION
Our results in the two different studies demonstrate similar
patterns in some aspects, which are discussed below.
Expected difficulty vs. reflected difficulty
Our results show that users' perception of a task's difficulty does not always stay the same before and after working on it. Instead, the reflected difficulty can differ dramatically from the expected difficulty, either increasing (e.g., in OBI of Study 2) or decreasing (e.g., in the parallel task of Study 1 and CPE of Study 2). These
results demonstrate that the estimation of a task’s difficulty
level before working with it is likely to be inaccurate, and
the expected difficulty should not be used to assess the
task’s real difficulty. This finding is reasonable considering
that the pre-task estimation of task difficulty is based just
upon one’s interpretation of how much cognitive effort will
be needed to accomplish the task, given one’s background
knowledge and understanding of the task and of task
accomplishment at that moment. If the user does not have
full comprehension of the task and its topic, without
interacting with the search system and learning through
searching for the information, the estimation could certainly
be limited in accuracy.
The effect of user background on perceived task difficulty
As Vakkari (1999) outlines, users’ prior knowledge relates
to task complexity. Also as we speculate in our model
(Figure 1), one may think that the perceived (pre- and/or
post-) task difficulty is affected by one’s background,
including the knowledge before and/or after working with
the task, previous experience with the type of task,
satisfaction with information gathering, perception of task
accomplishment, and so on. Surprisingly, our two studies
showed that the perceived task difficulty is rarely correlated
with the above-mentioned factors about the user’s
background, whether the task’s actual difficulty level was
high (e.g., OBI) or low (e.g., CPE, or the tasks in Study 1). This suggests that in the examined task types, users' perceived task difficulty is not affected by the examined user background factors. However, whether and how users' background affects task difficulty perception will remain undecided until other task types and additional background factors are investigated. Further studies could
look at factors such as users’ knowledge about the search
task domain (instead of the topic examined in this study),
computer expertise, number of years in web searching, and
confidence in web searching, etc.
The effect of user background on perceived task difficulty
change
Our results did not show evidence that the user background variables examined in our studies influenced the change in perceived difficulty before and after working on a task. More surprisingly, the change in
perceived task difficulty was not correlated with perceived
knowledge change before and after the task, either. Further
effort is needed to examine the relationship between
perceived difficulty change and user’s background factors.
The effect of task features
Although we studied only two tasks in each of our two
studies, they were carefully designed along certain task type
dimensions, and the two tasks in each study are comparable
in the only task facet(s) that they differ. Our studies found
that the three task facets (i.e., task Structure, the Quality of
task Goal, and Naming) affected users’ perceived difficulty
change. In particular, the parallel task structure may
decrease users’ reflected task difficulty, as does the specific
and named task (CPE), while on the other hand, the
amorphous and unnamed task (OBI) may increase user’s
reflected task difficulty. The finding that task type plays
a role in perceived task difficulty is in accord with Vakkari's
(1999) model, in that there is a connection between problem
structure and task complexity. We believe using the faceted
task classification method to characterize search task types
can extend the generality of the result of our study. Future
studies could examine other task facets for their effects on
perceived task difficulty change.
Implications for system design
Our findings have several implications for IR system design.
First, the user’s perception of a task’s difficulty level could
change along the way of searching for information to solve
the task. In order for systems to be able to provide help to
users when they have difficulty, systems should be able to
monitor this change along users’ search process through
observing users’ behaviors such as time spent on pages,
number of queries issued (e.g., Aula, Khan, & Guan, 2010;
Liu et al., 2010).
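To make this concrete, here is a minimal sketch of how a system might flag a session as potentially difficult from simple behavioral signals of the kind reported in the cited studies; the signal names and thresholds are illustrative assumptions, not values from this paper.
```python
# A minimal, hypothetical sketch of flagging likely-difficult sessions from
# behavioral signals; thresholds are illustrative assumptions only.
from dataclasses import dataclass

@dataclass
class SessionSignals:
    """Simple per-session behavioral signals a search system can log."""
    queries_issued: int
    unique_pages_visited: int
    total_serp_dwell_seconds: float

def looks_difficult(signals: SessionSignals,
                    query_threshold: int = 5,
                    page_threshold: int = 10,
                    serp_dwell_threshold: float = 120.0) -> bool:
    """Flag a session as difficult when several behavioral signals associated
    with difficult tasks (more queries, more pages, longer time on result
    pages) exceed the illustrative thresholds."""
    indicators = [
        signals.queries_issued >= query_threshold,
        signals.unique_pages_visited >= page_threshold,
        signals.total_serp_dwell_seconds >= serp_dwell_threshold,
    ]
    return sum(indicators) >= 2  # require at least two indicators to reduce false alarms

# Example: a session with many queries and long SERP dwell time is flagged.
print(looks_difficult(SessionSignals(queries_issued=7,
                                     unique_pages_visited=6,
                                     total_serp_dwell_seconds=300.0)))
```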
Second, systems could be designed to provide assistance for
users in solving difficult tasks, not only by returning better
results for queries and suggesting queries based on semantic
meaning, but also by offering suggestions in their task
solving strategies. For example, decomposing a task into
several parallel sub-tasks rather than dependent sub-tasks
could lead users to find the task less difficult; systems could
make query suggestions that help the users to solve a
general task through parallel sub-tasks. Also, trying to make
users’ tasks more specific and unambiguous could possibly
reduce users’ perception of task difficulty. In addition to
suggesting queries or query terms that help reduce user
queries’ ambiguity to get better (more relevant or useful)
search results, systems could also improve their search
result page display to make the results less ambiguous to
the users, by ways such as grouping results of the same
category together, etc. This could help the users more easily
gain a better understanding of the search results, especially
when they are spread across various aspects.
Third, the above-mentioned change of the user’s perception
of a task’s difficulty can be different in different task types.
Therefore, it would help systems provide the right ways of
assistance if they are able to detect users’ task type through
monitoring users’ behaviors (e.g., Liu et al., 2010).
Limitations and future studies
Our research has limitations in terms of the number of task
types and tasks, and the non-naturalistic lab experiment
itself. Despite these limitations, we believe our research makes a crucial step toward better understanding users and their search behaviors, and provides meaningful advice for IIR system design.
It has been mentioned above in our discussion that future
studies could look at more task type facets and more user
background factors in order to gain a more comprehensive
understanding about the effects of task type and user
background on users’ perception of task difficulty, as well
as the perception changes before and after tasks. As Xie and
Cool (2009) proposed, different types of tasks and different
task requirements need different types of assistance. More
studies are needed to help researchers understand more
about the relationship between these factors in order to
provide appropriate help for users.
CONCLUSION
Task difficulty requires in-depth understanding in order to
build systems that can help users when they have difficult
tasks to solve. We conducted analyses of users’ ratings on
the difficulty levels of several types of tasks in two studies.
The results of the two studies showed that users' perceived task difficulty could change in different ways, or not change at all, before and after they worked on their tasks. Therefore, the
estimate of a task’s difficulty level before working on it is
likely to be inaccurate. Also, surprisingly, both studies
showed that the expected and the reflected difficulty were
rarely correlated with the users' background, including pre-task topic familiarity, previous experience, post-task topic familiarity, satisfaction with information gathering, and perception of task accomplishment; neither was the perceived difficulty change. In addition, it seems that the
perceived difficulty change before and after users worked
on the task is affected by task types. Our findings are
helpful for IR system design in providing assistance to
users and reducing their perceived task difficulty.
ACKNOWLEDGMENTS
This research was sponsored by IMLS grant LG#06-070105-07. We thank the reviewers for their insightful
comments.
REFERENCES
Aula, A., Khan, R. & Guan, Z. (2010). How does search behavior
change as search becomes more difficult? Proceedings of CHI
‘10, 35-44.
Bell, D. & Ruthven, I. (2004). Searcher's assessments of task complexity for Web searching. Advances in Information Retrieval: Proceedings of ECIR '04, 57-71.
Byström, K. (2002). Information and information sources in tasks
of varying complexity. Journal of the American Society for
Information Science and Technology, 53(7), 581-591.
Byström, K., & Järvelin, K. (1995). Task complexity affects
information seeking and use. Information Processing and
Management, 31(2), 191-213.
Cole, M. J., Zhang, X., Liu, J., Liu, C., Belkin, N. J., Bierig, R.,
and Gwizdka, J. (2010). Are self-assessments reliable indicators
of topic knowledge? Proceedings of ASIS&T ‘10.
Gwizdka, J. & Spence, I. (2006). What can searching behavior tell
us about the difficulty of information tasks? A study of Web
navigation. Proceedings of ASIS&T ‘06.
Ingwersen, P. & Järvelin, K. (2005). The turn: Integration of
information seeking and retrieval in context. Springer-Verlag
New York, Inc. Secaucus, NJ, USA.
Jansen, B. J. (2006). Using temporal patterns of interactions to
design effective automated searching assistance systems.
Communications of the ACM, 49(4), 72-74.
Kim, J. (2006). Task difficulty as a predictor and indicator of web
searching interaction. Proceedings of CHI ‘06, 959-964.
Li, Y. & Belkin, N.J. (2008). A faceted approach to
conceptualizing tasks in information seeking. Information
Processing & Management, 44, 1822-1837.
Li, Y. & Belkin, N. J. (2010). An exploration of the relationships
between work task and interactive information search behavior.
Journal of the American Society for Information Science and
Technology, 61(9), 1771-1789.
Liu, J., Gwizdka, J., Liu C., & Belkin, N.J. (2010). Predicting task
difficulty for different task types. Proceedings of ASIS&T ‘10.
Liu, J., Cole, M.J., Liu, C., Bierig, R., Gwizdka, J., Belkin, N. J.,
Zhang, J., & Zhang, X. (2010). Search behaviors in different
task types. Proceedings of JCDL ‘10, 69-78.
Vakkari, P. (1999). Task complexity, problem structure and information actions. Information Processing & Management, 35(6), 819-837.
Xie, I. & Cool, C. (2009). Understanding Help-Seeking within the
context of searching digital libraries. Journal of the American
Society for Information Science and Technology, 60(3), 477-494.