Tree testing: a quick way to evaluate your IA
Dave O’Brien, Optimal Usability (Wellington, New Zealand)
Abstract
A big part of information architecture is organisation – creating the structure of a
site. For most sites – particularly large ones – this means creating a hierarchical “tree”
of topics.
But to date, the IA community hasn’t found an effective, simple technique (or tool) to
test site structures. The most common method used – closed card sorting – is neither
widespread nor particularly suited to this task.
Some years ago, Donna Spencer pioneered a simple paper-based technique to test
trees of topics. Recent refinements to that method, some made possible by online
experimentation, have now made “tree testing” more effective and agile.
This article describes the current state of tree testing, a web tool for automating it,
and what we’ve learned from running tree tests to improve IAs for several large sites.
“Getting to the right page within a website or intranet is the inevitable prerequisite to getting anything done.” – Jakob Nielsen
Introduction
Some time ago, we were working on an information-architecture project for a large
government client here in Wellington. It was a classic IA situation – their current
site’s structure (the hierarchical “tree” of topics) was a mess, they knew they had
outgrown it, and they wanted to start fresh.
We jumped in and did some research, including card-sorting exercises with various
user groups. We’ve always found card sorts (in person or online) to be a great way to
generate ideas for a new IA.
Brainstorming sessions followed, and we worked with the client to come up with
several possible new site trees. But were they better than the old one? And which new
one was best? After a certain amount of debate, it became clear that debate wasn’t the
way to decide. We needed some real data – data from users. And, like all projects,
we needed it quickly.
What kind of data? At this early stage, we weren’t concerned with visual design or
navigation methods; we just wanted to test organisation – specifically, findability and
labeling. We wanted to know:
o Could users successfully find particular items in the tree?
o Could they find those items directly, without having to backtrack?
o Could they choose between topics quickly, without having to think too much (the Krug Test [1])?
o Overall, which parts of the tree worked well, and which fell down?
Not only did we want to test each proposed tree, we wanted to test them against each
other, so we could pick the best ideas from each.
And finally, we needed to test the proposed trees against the existing tree. After all,
we hadn’t just contracted to deliver a different IA – we had promised a better IA, and
we needed a quantifiable way to prove it.
The problem
This, then, was our IA challenge:
o getting objective data on the relative effectiveness of several tree
structures, and
o getting it done quickly, without having to build the actual site first.
As mentioned earlier, we had already used open card sorting to generate ideas for the
new site structure. We had done in-person sorts (to get some of the “why” behind our
users’ mental models) as well as online sorts (to get a larger sample from a wider
range of users).
But while open card sorting is a good “detective” technique, it doesn’t yield the final
site structure - it just provides clues and ideas. And it certainly doesn’t help in
evaluating structures.
For that, information architects have traditionally turned to closed card sorting,
where the user is provided with predefined category “buckets” and asked to sort a pile of
content cards into those buckets. The thinking goes that if there is general agreement
about which cards go in which buckets, then the buckets (the categories) should
perform well in the delivered IA.
The problem here is that, while closed card sorting mimics how users may file a
particular item of content (e.g. where they might store a new document in a
document-management system), it doesn’t necessarily model how users find
information in a site. They don’t start with a document – they start with a task, just
as they do in a usability test.
What we wanted was a technique that more closely simulates how users browse
sites when looking for something specific. Yes, closed card sorting was better than
nothing, but it just didn’t feel like the right approach.
Other information architects have grappled with this same problem. We know some
who wait until they are far enough along in the wireframing process that they can
include some IA testing in the first rounds of usability testing. That piggybacking
saves effort, but it also means that we don’t get to evaluate the IA until later in the
design process, which means more risk.
We know others who have thrown together quick-and-dirty HTML with a proposed site structure and placeholder content. This lets them run early usability tests that focus on how easily participants can find various sublevels of the site. While that gets results sooner, it also means creating a throw-away set of pages and running an extra round of user testing.

[1] Don’t Make Me Think, Steve Krug
With these needs in mind, we looked for a new technique – one that could:
o Test topic trees for effective organisation
o Provide a way to compare alternative trees
o Be set up and run with minimal time and effort
o Give clear results that could be acted on quickly
The technique – tree testing
Luckily, the technique we were looking for already existed. Even luckier was that we
got to hear about it firsthand from its inventor, Donna Spencer, the well-regarded
information architect out of Australia [2], and author of the recently released book Card
Sorting.
During an IA course that Donna was teaching in Wellington, she was asked how she
tested the site structures she created for clients. She mentioned closed card sorting, but
like us, she wasn’t satisfied with it.
She then went on to describe a technique she called card-based classification, which
she had used on some of her IA projects. Basically, it involved modeling the site
structure on index cards, then giving participants a “find-it” task and asking them to
navigate through the index cards until they found what they were looking for.
To test a shopping site, for example, she might give them a task like “Your 9-year-old
son asks for a new belt with a cowboy buckle”. She would then show them an index
card with the top-level categories of the site:
The participant would choose a topic from that card, leading to another index card with the subtopics under that topic.

[2] An island located just northwest of New Zealand.
The participant would continue choosing topics, moving down the tree, until they
found their answer. If they didn’t find a topic that satisfied them, they could backtrack
(go back up one or more levels). If they still couldn’t find what they were looking for,
they could give up and move on to the next task.
During the task, the moderator would record:
o the path taken through the tree (using the reference numbers on the cards)
o whether the participant found the correct topic
o where the participant hesitated or backtracked
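To keep the bookkeeping straight, it helps to think of each task attempt as a small record. Here is a minimal sketch in Python of the kind of data being captured – the field names and examples are our own illustration, not Donna’s notation or any tool’s internal format:

    # A sketch of the data recorded for each task attempt in a tree test.
    # The field names are illustrative; a paper test captures the same things by hand.
    from dataclasses import dataclass

    @dataclass
    class TaskResult:
        participant: str   # e.g. "P01"
        task: str          # the "find it" task given to the participant
        path: list         # topic labels clicked, in order
        answer: str        # topic chosen as the answer ("" if the task was skipped)
        success: bool      # did they end on a correct topic?
        backtracks: int    # how many times they moved back up the tree
        seconds: float     # time spent on the task

    results = [
        TaskResult("P01", "cowboy belt", ["Clothes", "Boys", "Accessories"],
                   "Accessories", True, 0, 22.5),
        TaskResult("P02", "cowboy belt", ["Toys", "Clothes", "Boys", "Accessories"],
                   "Accessories", True, 1, 41.0),
    ]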
By choosing a small number of representative tasks to try on participants, Donna
found that she could quickly determine which parts of the tree performed well and
which were letting the side down. And she could do this without building the site
itself – all that was needed was a textual structure, some tasks, and a bunch of index
cards.
Donna was careful to point out that this technique only tests the top-down
organisation of a site and the labeling of its topics. It does not try to include other
factors that affect findability, such as:
o the visual design and layout of the site
o other navigation routes (e.g. cross links)
o search
While it’s true that this technique does not measure everything that determines a site’s
ease of browsing, that can also be a strength. By isolating the site structure - by
removing other variables at this early stage of design - we can more clearly see how
the tree itself performs, and revise until we have a solid structure. We can then move
on in the design process with confidence. It’s like unit-testing a site’s organisation and
labeling. Or as my colleague Sam Ng says, “Think of it as analytics for a website you
haven’t built yet.”
Treejack - a tree-testing tool
As we started experimenting with “card-based classification” on paper, it became
clear that, while the technique was simple, it was tedious to create the cards on paper,
recruit participants, record the results manually, and enter the data into a spreadsheet
for analysis. The steps were easy enough, but they were time eaters.
It didn’t take too much to imagine all this turned into a web app – both for the
information architect running the study and the participant browsing the tree. Card
sorting had gone online with good results, so why not card-based classification?
Ah yes, that was the other thing that needed work – the name. During the paper
exercises, it got called “tree testing”, and because that seemed to stick with
participants and clients, it stuck with us. And it sure was a lot easier to type.
To create a good web app, we knew we had to be absolutely clear about what it was
supposed to do. For online tree testing, we aimed for something that was:
o Quick for an information architect to learn and get going on
o Simple for participants to do the test
o Able to handle a large sample of users
o Able to present clear results
We created a rudimentary application as a proof of concept, running a few client
pilots to see how well tree testing worked online. After working with the results in
Excel, it became very clear which parts of the trees were failing users, and how they
were failing. The technique was working.
However, it also became obvious that a wall of spreadsheet data did not qualify as
“clear results”. So when we sat down to design the next version of the tool – the
version that information architects could use to run their own tree tests – reworking
the results was our number-one priority.
Optimal Workshop has now released that beastie into the wild – it’s called Treejack,
and it’s available for anyone to try tree testing on their own site structures.
Participating in a tree test
So, what does online tree testing look like?
As an example, let’s look at what a participant sees when they do a test in Treejack.
Suppose we’ve emailed an invitation to a list of possible participants. (We
recommend at least 30 to get reasonable results – more is good, especially if you have
different types of users.) Clicking a link in that email takes them to the Treejack site,
where they’re welcomed and instructed in what to do.
Once they start the test, they’ll see a task to perform. The tree is presented as a simple
list of top-level topics:
They click down the tree one topic at a time. Each click shows them the next level of
the tree:
Once they click to the end of a branch, they have 3 choices:
o Choose the current topic as their answer (“I’d find it here”).
o Go back up the tree and try a different path (by clicking a higher-level topic).
o Give up on this task and move to the next one (“Skip this task”).
Once they’ve finished all the tasks, they’re done – that’s it. For a typical test of 10
tasks on a medium-sized tree, most participants take 5-10 minutes. As a bonus, we’ve
found that participants usually find tree tests less taxing than card sorts, so we get
lower drop-out rates.
Setting up a tree test
Let’s peek behind the scenes to see how the tree test was set up.
Entering the tree
The heart of a tree test is…um…the tree. You might already have this modeled as a
Visio diagram, an Excel spreadsheet, a Word outline, etc. We couldn’t find a standard
for representing tree structures, so we stuck with simple. You can get your tree into
Treejack by either:
o Typing it in and indenting the levels, or
o Pasting it in from Excel, Word, a text editor (or what have you) and having Treejack convert the tabs or columns to the corresponding levels (see the sketch below).
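As a rough illustration of what that conversion involves (our own sketch – not Treejack’s actual import code), tab-indented text maps onto tree levels like this:

    # Sketch: turn tab-indented lines (one topic per line) into a nested tree.
    # The number of leading tabs gives a topic's depth, as in an outline pasted from Excel.
    def parse_tree(text):
        root = {"label": "root", "children": []}
        stack = [root]                 # stack[d] holds the most recent node at depth d
        for line in text.splitlines():
            if not line.strip():
                continue
            depth = len(line) - len(line.lstrip("\t"))
            node = {"label": line.strip(), "children": []}
            stack[depth]["children"].append(node)   # attach under the current parent
            del stack[depth + 1:]                   # drop branches we have moved past
            stack.append(node)
        return root

    tree = parse_tree("Home\n\tClothes\n\t\tBoys\n\t\tGirls\n\tToys")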
The tree can be any size, depending on how deep we want to test. For very large trees,
we often cut the tree down to size by:
o Only using the upper levels (say 3-4 levels deep), or
o Testing sections of a tree separately (which assumes that users can navigate
the top level easily – a big assumption)
One lesson that we learned early was to build the tree based on the content of the site,
not simply its page structure. For example, suppose a North American company had a
Contact Us page that listed general contact information (for North America), with
subpages for other regions like South America and Europe. The page structure would
be this:
o Contact Us
   o South America
   o Europe
…but the content structure is really this:
o Contact Us
   o North America
   o South America
   o Europe
Because a tree test shows no content, implicit topics (like “North America” above)
have to be made explicit, so they can be selected by tree-test participants.
Also, because we want to measure the effectiveness of the site’s topic structure, we
typically omit “helper” topics such as Search, Site Map, Help, and yes, Contact Us. If
we leave them in, it makes it too easy for users to choose them as alternatives to
browsing the tree, and we don’t learn as much.
Entering tasks
We test the tree by getting participants to look for specific things – to perform “find
it” tasks.
Just as in a usability test, a good task is clear, specific, and representative of the tasks
that actual users will do on the real site.
How many tasks? You might think that more is better (short of exhausting the
participant), but we’ve found a sizable learning effect in tree tests. After a participant
has browsed through the tree several times looking for various items, they start to
remember where things are, and that can skew later tasks.
For that reason, we recommend a maximum of 10 tasks per test – fewer if the tree is small, perhaps a few more if it is very large.
Another way we reduce the learning effect is by providing an option to randomise the
order of tasks. If task 10 is always the last task, participants would presumably do
better at it because they’ve already browsed the tree 9 times before. Unless you need
to present tasks in a certain order (which is rare in tree testing), always randomise.
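If you are scripting a paper-based test yourself, per-participant randomisation is only a few lines. A sketch, with made-up tasks:

    import random

    # Made-up example tasks for illustration.
    tasks = ["Find a belt with a cowboy buckle for your son",
             "Find the store's opening hours"]

    def task_order(tasks, randomise=True):
        order = list(tasks)
        if randomise:
            # Each participant gets a different order, so no task always benefits
            # from the tree having been browsed several times already.
            random.shuffle(order)
        return order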
Entering correct answers
In a tree test, the most obvious thing to measure is the success rate – for a given task,
how many participants actually find the right topic in the tree?
In most medium-to-large site structures, there is often more than one “correct” destination – more than one page that answers the user’s question.
Consider also that, in well-designed sites, the IA has anticipated some of the alternate
paths that users will follow. The IA knows, for example, that instead of browsing
“properly” to page A, some users will end up at a certain page B. Assuming that page
B clearly redirects those users to page A, we could consider both A and B to be
“correct” destinations – they both lead to user success.
Accordingly, for each task, Treejack lets us mark several pages as correct.
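Conceptually, each task ends up with a small set of correct destinations. A sketch of what that looks like as data (our own representation, not Treejack’s data model):

    # Sketch: a task with more than one correct destination in the tree.
    task = {
        "text": "Your 9-year-old son asks for a new belt with a cowboy buckle",
        "correct": [
            ["Clothes", "Boys", "Accessories"],   # the "proper" destination
            ["Gifts", "For kids"],                # hypothetical alternate page that redirects users on
        ],
    }

    def is_success(chosen_path, task):
        # A participant succeeds if the topic they chose is any of the marked destinations.
        return chosen_path in task["correct"]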
Entering the tree, the tasks, and the answers – that’s it. We send out our test URL (as
an email invitation or a link on a web page), and the users go put our tree through its
paces.
Skimming the high-level results
Just as we wanted the test to be easy to set up and to participate in, we also wanted it
to give us some clear high-level results. The details can wait for later; up front we
want to find out “How’d we do?”
As mentioned earlier, we wanted to know:
o Could users successfully find particular items in the tree?
o Could they choose between topics quickly?
o Could they find those items with a minimum of backtracking?
We designed the high-level results to show just that:
o Success - % of participants who found the correct answer. This is the single
most important metric, and is weighted highest in the overall score.
o Speed – how fast participants clicked through the tree. In general, confident
choices are made quickly (i.e. a high Speed score), while hesitation (taking
extra time) suggests that the topics are either not clear enough or not
distinguishable enough.
o Directness – how directly participants found an answer. Ideally, they reach
their destination without wandering or backtracking.
Each of these is measured per task, normalised to a 0-100 scale, so we can see how
each task fared. There’s also an overall score averaged across all tasks.
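The exact formulas behind these scores aren’t spelled out here, but the gist can be sketched from per-attempt records like the ones shown earlier. The normalisation and weights below are assumptions for illustration, not Treejack’s actual algorithm:

    # Rough sketch of per-task scoring; the formulas and weights are assumed.
    def task_scores(attempts, expected_seconds=30.0):
        n = len(attempts)
        success = 100.0 * sum(a.success for a in attempts) / n
        # Directness: share of attempts completed without any backtracking.
        directness = 100.0 * sum(a.backtracks == 0 for a in attempts) / n
        # Speed: full marks at or under an expected time, scaling down for slower attempts.
        speed = 100.0 * sum(min(1.0, expected_seconds / max(a.seconds, 1.0))
                            for a in attempts) / n
        # Overall: success weighted highest, as noted above.
        overall = 0.5 * success + 0.25 * speed + 0.25 * directness
        return {"success": success, "speed": speed,
                "directness": directness, "overall": overall}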
The overall score, of course, is what people look at first. If we see an 8/10 success rate
for the entire test, we’ve earned ourselves a beer. Often, though, we’ll find ourselves
looking at a 5 or 6, and realise that there’s more work to be done.
The good news is that our miserable overall score of 5/10 is often some 8’s brought
down by a few 2’s and 3’s. This is where tree testing really shines – separating the
good parts of the tree from the bad, so we can spend our time and effort fixing the
latter.
Diving into the details
Speaking of the latter, let’s find out what’s behind those 2’s and 3’s.
From the summary gauges, we can generally see which measure (success, speed, or
directness) is dragging the score down. To delve deeper, we can download the
detailed results as an annotated spreadsheet.
Destinations
The “Destination” page shows, for each task, how many participants chose a given
topic as the answer:
For tasks with low success rates, we look for wrong answers following two patterns:
o High totals - many participants choosing the same wrong answer. This
suggests a problem with that particular topic (perhaps in relation to its
siblings).
o Clusters of totals – many participants choosing wrong answers in the same
subsection of the site. This suggests a problem with the parent level.
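The same breakdown is easy to reproduce from the raw records if you prefer working outside the spreadsheet – a sketch, reusing the per-attempt records from earlier:

    from collections import Counter

    def destination_counts(attempts):
        # How many participants chose each topic as their final answer for this task.
        return Counter(a.answer for a in attempts if a.answer).most_common()

    def popular_wrong_answers(attempts, correct_topics, threshold=0.2):
        # Wrong answers chosen by a large share of participants point to a misleading
        # topic; several clustered under one parent point to a misleading parent.
        n = len(attempts)
        return [(topic, count) for topic, count in destination_counts(attempts)
                if topic not in correct_topics and count / n >= threshold]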
First clicks
The “First click” page shows where participants went on their first click for a task –
which top-level section they chose:
The first click is the one most highly correlated with eventual success – if we can get
users to the right section of a site, they’re much more likely to find the right topic.
This page also shows which top-level sections were visited sometime during the task.
Perhaps the participants completely ignored the correct section, or maybe they visited
it, but backed out and went somewhere else instead.
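That relationship is easy to check in your own data – a sketch comparing success rates for attempts that started in a correct top-level section against those that started elsewhere (again using the per-attempt records from earlier):

    def first_click_breakdown(attempts, correct_sections):
        # Split attempts by whether the first click landed in a correct top-level
        # section, then compare success rates for the two groups.
        def rate(group):
            return 100.0 * sum(a.success for a in group) / len(group) if group else 0.0
        right = [a for a in attempts if a.path and a.path[0] in correct_sections]
        wrong = [a for a in attempts if a.path and a.path[0] not in correct_sections]
        return {"right first click": rate(right), "wrong first click": rate(wrong)}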
Paths
The “Paths” page shows the click-by-click paths that participants took through the
tree as they tried each task.
Browsing these paths is useful when:
o We’re trying to figure out how the heck a participant got to some far-flung
corner of the tree.
o The task shows a lot of backtracking in general (i.e. a low Directness score), and we’re looking for any patterns in where participants backed up. That can indicate that the parent topic (the topic just clicked) was misleading – see the sketch below for one way to count backtracks.
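Counting backtracks from a recorded path needs the tree itself, so you can tell whether each click went deeper or back up. A rough sketch, assuming unique topic labels and reusing the nested-tree structure from the import sketch earlier:

    def depth_of(tree, label, depth=0):
        # Depth of a topic label in the nested tree (assumes labels are unique).
        if tree["label"] == label:
            return depth
        for child in tree["children"]:
            found = depth_of(child, label, depth + 1)
            if found is not None:
                return found
        return None

    def count_backtracks(tree, path):
        # A backtrack is any click that lands at the same depth or shallower than
        # the previous one - the participant went back up instead of deeper.
        depths = [depth_of(tree, label) for label in path]
        return sum(1 for prev, cur in zip(depths, depths[1:])
                   if prev is not None and cur is not None and cur <= prev)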
We also look for other signs of trouble, such as:
o High skip rates – Participants can elect to skip a task at any time. Rates above
10% suggest that they found the task particularly difficult.
o Evil attractors – These topics lure clicks even though they have little to do
with the task at hand. Often this is caused by a too-generic label. We saw this
happen to a consumer-review site that had a “Personal” category; they meant
personal-care products like electric shavers, but participants also went there
for “personal” items like cell phones, watches, etc.
In general, we’ve found that tree-testing results are much easier to analyse than card-sorting results. The high-level results pinpoint where the problems are, and the
detailed results usually make the reason plain. In cases where a result has us
scratching our heads, we do a few in-person tree tests, prompting the participant to
think aloud and asking them about the reasons behind their choices.
Lessons learned
We’ve run several tree tests now for large clients, and we’re very pleased with the
technique. Along the way, we’ve learned a few things too:
o Test a few different alternatives. Because tree tests are quick to do, we can
take several proposed structures and test them against each other. This is a
quick way of resolving opinion-based debates over which is better. For the
government web project we discussed earlier, one proposed structure had
much lower success rates than the others, so we were able to discard it without
regrets or doubts.
o Test new against old. Remember how we promised that government agency
that we would deliver a better IA, not just a different one? Tree testing proved
to be a great way to demonstrate this. In our baseline test, the original structure
notched a 31% success rate. Using the same tasks, the new structure scored
67% - a solid quantitative improvement.
o Do iterations. Everyone talks about developing designs iteratively, but
schedules and budgets often quash that ideal. Tree testing, on the other hand,
has proved quick enough that we’ve been able to do two or three revision
cycles for a given tree, using each set of results to progressively tweak and
improve it.
o Identify critical areas to test, and tailor your tasks to exercise them.
Normally we try to cover all parts of the tree with our tasks. If, however, there
are certain sections that are especially critical, it’s a good idea to run more
tasks that involve those sections. That can reveal subtleties that you may have
missed with a “vanilla” test. For example, in another study we did, the client
was considering renaming an important top-level section, but was worried that
the new term (while more accurate) was less clear. Tree testing showed both
terms to be equally effective, so the client was free to choose based on other
criteria.
o Crack the toughest nuts with “live” testing. Online tree tests suffer from the
same basic limitation as most other online studies – they give us loads of
useful data, but not always the “why” behind it. Moderated testing (either in
person or by remote session) can fill in this gap when it occurs.
Summary
To sum up, tree testing has given us the IA tool we were after – a quick, clear,
quantitative method to test site structures. Like user testing, it shows us (and our
clients) where we need to focus our efforts, and injects some user-based data into the
IA design process. The simplicity of the technique means that we can do variations
and iterations until we get a really good end result.
Tree testing also makes our clients happy. They quickly “get” the concept, the high-level results are easy for them to understand, and they love having data to show their
management and to measure their progress against.
On the “but” side, tree testing is still an embryonic technique. Our metrics and scoring
algorithms are first-generation, and we have a long list of improvements we’re keen to
make to Treejack, our online tool.
Also, simplicity is a two-sided coin – what we gain by having a clear single-purpose
tool, we give up in comprehensiveness. Tree testing is a narrow technique – it only
tells us how a structure performs in vacuo. A good tree still needs to be married up to
an effective navigation system, content design, and visual treatment before we can say
we’ve built a good information architecture.
Tree testing has proved effective for us. We think it can be useful for anyone
designing a large content structure. If you’re keen, try it as described here and in
Donna’s original article, or put your own spin on it for your own needs. Either way,
our hope is that tree testing, as a quick way to evaluate site structures, earns a place in
the information architect’s toolkit.