A-Z - UCC CS | Intro

1) Introduction - [Background and History]
Early computers were usually batch programmed:
 Tasks (programs and data) prepared off-line.
 Batch of tasks loaded and run in sequence.
 Operator intervention limited to loading batches.
 No intervention whilst batch is running.
In recent years, new challenges and possibilities have emerged:
 Computers have become smaller and cheaper
 New interaction technologies have been developed.
 Legislation has raised standards, particularly with regards to accessibility.
Accessibility means giving users with special needs the same level of access as other users.
For example, people who are blind/visually impaired can operate GUIs with the aid of a screenreader.
However, compared to other users, screen-reader users usually tend to:
 work more slowly
 make more errors
 report higher levels of fatigue
Thus, while visually impaired people can use GUIs, it cannot be said that they have the same level of access as other users.
As computers become cheaper and more powerful, we are seeing a move from reactive systems to
adaptive/proactive systems:
 Reactive Systems
 User always initiates actions
 Large screens, focus on user attention
 Little need for adaptivity
 Proactive Systems
 System or user can initiate actions
 No-screen/hands free, user attention elsewhere
 Adaptivity essential for effective operation
2) Introduction - [Issues and Topics]
"A user interface should be so simple that a beginner in an emergency can understand it within ten
seconds"
"..Any application designed for people should be:
 Easy to learn (and remember)
 Useful, that is, contain functions that people really need in their work, and
 Be easy and pleasant to use."
Four Key Concepts
 Learnability - the time and effort required to reach a specified level of user performance.
 Throughput - tasks accomplished, speed of execution, errors made, etc.
 Flexibility - the extent to which the systems can accommodate changes to the tasks and
environments beyond those first specified.
 Attitude - the attitude engendered in users by the application.
For example:
 A travel agent (primary user) may use a system to search for hotels, flights, trains, etc., on
behalf of...
 A customer (secondary user)
3) Human Memory & Perception - [Memory]
Human memory has three distinct stages. Information is:
1. Received through one or more of the sensory memories, e.g.:
 Iconic (visual) memory
 Echoic (auditory) memory
 Haptic memory
2. Selectively held in short term memory while it is analyzed, after which it may be either discarded or...
3. Stored permanently in long term memory.
The short term memory can hold around seven items of information
However, it is not easy to define an "item of information". An item might be:
 A single digit or character or word, or...
 A long number or entire phrase, if that number or phrase is already known by the person
There are two types of long term memory:
 Episodic Memory represents our memory of events and experiences - It stores items in
serial form, allowing us to reconstruct sequences of events and experiences from earlier
points in our lives.
 Semantic Memory is structured so that it represents relationships between the information it stores. It stores information without regard to the order in which it was acquired or the sense through which it was acquired.
There are three main processes associated with long term memory (LTM):
 Storage/remembering
 Forgetting
 Information Retrieval
Information passes into long term memory via the short term memory.
However, not everything that is held in short term memory is eventually stored in long term
memory.
The main factors that determine what is stored are:
 Rehearsal
 Meaning
Rehearsal, i.e. repeated exposure to data, or consideration of it, increases the likelihood that it will be stored in LTM.
Meaningful information is more likely to be stored in LTM than meaningless data.
There are two main theories to explain the loss of information from long term memory: decay or
interference.
 Decay: Ebbinghaus concluded that information is lost through natural decay.
 Interference: new information may replace or corrupt older information.
 For example, changing your telephone number may cause you to forget your old
number. This is known as retroactive interference.
 However, there may also be times when older information 'resurfaces' and becomes
confused with newer information. For example, you may suddenly recall an old
telephone number and confuse it with your new one. This is known as proactive
inhibition.
There are two types of information retrieval from LTM:
 Recall: the recovery of information as a result of a conscious search.
 Recognition: the automatic recovery of information as a result of an external stimulus.
Recognition is around twice as fast and three times as accurate as recall.
4) Human Memory & Perception - [Visual Perception]
The human visual system can be divided into two stages:
 Physical reception of light
 Processing and interpretation
The human visual system has both strengths and weaknesses:
 Certain things cannot be seen when present
 Processing allows images to be constructed from incomplete information
Light passes through the cornea and is focused by the lens, producing an inverted image on the retina.
The iris regulates the amount of light entering the eye.
The retina is covered with photoreceptors. These are of two types:
 Rods: highly sensitive to light, provide black-and-white vision, low resolution
 Cones: less sensitive to light, provide color vision (red, green and blue), high resolution
The eye contains:
 around 120 million rods, most of which are located around the periphery of the retina
 around 6 million cones, most of which are located in the fovea
The lens is flexible and can focus the image on different parts of the retina.
This makes it possible to adapt between light and dark conditions:
 In bright conditions, light is focused on the fovea, giving high resolution and color vision
 In dark conditions, focus is shifted onto the periphery, giving greater sensitivity but
reducing resolution and color perception.
The retina contains ganglion cells which perform some local processing of images.
There are two types of ganglion cells:
 X-Cells
 perform basic pattern recognition
 mainly concentrated in the fovea
 Y-Cells
 perform movement detection
 more widely distributed than X-Cells, and predominate in the periphery
The photo-receptors and ganglion cells are all connected to the optic nerve, which carries visual
information to the brain.
There are no photo-receptors in the area of the retina around the optic nerve.
Thus there is a blind spot at this point.
We are not usually aware of the blind spot because our brains 'fill in' the missing part of the image.
The luminance of an object depends on:
 the amount of light falling on its surface
 the reflective properties of the surface(s)
Contrast is related to luminance. It is the difference in luminance between the brightest and darkest
areas of an image.
The human visual system compensates for bright or dark conditions by varying the relative
percentage of rods and cones it uses.
The human eye can distinguish about 150 hues within the visible light spectrum.
However, the total number of colors we can distinguish is much higher.
This is because:
 Each of the pure hues can be mixed with white in various quantities to produce other
colors.
 We refer to the spectrum of hues as fully-saturated colors.
 When mixed with white, we refer to them as partially-saturated or de-saturated
colors.
 The brightness of each color can also be varied.
In practice, we use a limited number of primary colors, e.g.:
 Red, Green and Blue (RGB) when mixing light
 This is known as additive mixing.
 Cyan, Magenta, Yellow and Black (CMYK) when mixing pigments
 This is known as subtractive mixing.
Factors affecting our judgement of size include:
 Stereo vision - the difference in the image seen by each eye can be analysed to gauge
distances
 Head Movement - small changes in viewing position produce changes in view that allow
distance to be gauged
 Monocular Cues:
 Relative size
 Relative height
 Relative motion
When children learn to read, they initially read linearly, i.e.
 start at the beginning of the sentence
 read each word in turn
 identify the meaning of each word
 identify the meaning of the sentence
This is a very slow and inefficient method of reading.
As they become more proficient at reading they learn to scan text by spotting key-words.
This process involves the following stages:
 Identify a word or character
 Guess the meaning of the phrase or sentence
 Confirm/disprove the guess
 Revise the guess if necessary
A number of methods are used to measure the readability of text:
 Average reading time
 A group of people are asked to read the text, and the average time taken is
noted.
 Fog Index
 Takes into account word-length, sentence-complexity, etc. (see the sketch after this list).
 Cloze Technique
 Subjects are asked to read a piece of text in which every fifth word is blanked
out.
 The index is based on the percentage of blanked words that are guessed
correctly.
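As a rough illustration of the Fog Index, the sketch below computes a Gunning-style score. The standard formula, 0.4 x (average sentence length + percentage of complex words), is assumed here (it is not given in these notes), and 'complex' is approximated as three or more syllables using a crude vowel-group count.

```python
import re

def count_syllables(word):
    # Crude heuristic: count groups of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fog_index(text):
    # Gunning Fog: 0.4 * (avg words per sentence + % of 'complex' words),
    # where 'complex' words have three or more syllables.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences)
                  + 100 * len(complex_words) / len(words))

print(round(fog_index("The cat sat on the mat. It was happy."), 1))  # -> 1.8
```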
Factors that affect the readability of text include:
 Font-style and capitalization
 Font size
 Character spacing
 Line Lengths
5) Human Memory & Perception - [Auditory Perception]
Like the visual system, the human auditory system can be divided into two stages:
 Physical reception of sounds
 Processing and interpretation
Like the visual system, the human auditory system has both strengths and weaknesses:
 Certain things cannot be heard even when present
 Processing allows sounds to be constructed from incomplete information
The principal characteristics of sound - as perceived by the listener - are:
 Pitch
 Loudness
 Timbre
The perceived intensity of a sound depends upon:
 The sound pressure.
 The distance between the source and the listener.
 The duration of the sound.
 The frequency of the sound.
Our hearing system allows us to determine the location of sound sources with reasonable accuracy,
subject to certain limitations.
 Stereo hearing allows us to locate the source of a sound by comparing the sound arriving
at each ear and noting differences in:
 Amplitude
 Time of arrival
 Head movement allows us to improve the localization accuracy of stereo hearing
 Analysis of reflected vs direct sound allows us to localize sounds in both the horizontal and vertical planes - to a limited extent
 Familiarity affects localization accuracy
Judgment of distance is based partly on intensity - the quieter the sound, the farther away the source.
Sound localization (in both horizontal and vertical planes) can be improved by tailoring the sound
distribution.
This is done using Head-Related Transfer-Functions (HRTFs).
Ideally, HRTFs should be tailored to suit the individual. However, this is complex and costly.
Researchers are currently trying to develop non-individualized HRTFs which will give a useful
improvement in localization accuracy for a substantial percentage of the population.
Research suggests that the human auditory system includes a short-term store - a kind of mental
'tape loop' that always stores the last few seconds of sound.
This is known as the Pre-categorical Acoustic Store or PAS.
Researchers disagree as to the length of the store. Estimates range from as little as 10 seconds to as much as 60 seconds.
However, there is significant evidence for the existence of such a store.
The existence of this auditory store explains some of the following effects.
 Recall of Un-attended Material
 The Recency Effect
 If someone listens to a voice reciting a list of digits (or characters etc.) and is then
asked to repeat the digits, he or she will recall the last few digits more reliably
than the earlier ones.
 Typically the last 3 - 5 digits are recalled.
 The Auditory Suffix Effect
 The recency effect (see above) is most noticeable when the speech or sound is
followed by a period of silence.
 If a further sound occurs after (e.g.) a list has been spoken, recall is impaired.
 Conversely, if speech or sound is followed by complete silence, the period for
which the last few seconds of it can be recalled extends significantly.
6) Human Memory & Perception - [Haptic Perception]
Haptic Perception is the general term covering the various forms of perception based on touch.
There are three types of sensory receptor in the skin:
 thermoreceptors respond to heat and cold
 mechanoreceptors respond to pressure
 nociceptors respond to intense heat, pressure or pain
In computing applications, we are mostly concerned with mechanoreceptors.
Mechanoreceptors are of two types:
 rapidly-adapting mechanoreceptors react to rapid changes in pressure, but do not
respond to continuous pressure
 slowly-adapting mechanoreceptors respond to continuous pressure
Sensory acuity is often measured using the two-point test.
This simply involves pressing two small points (e.g. sharpened pencil tips) against the body.
The two points are initially placed very close together, and then moved further apart until it
becomes possible to feel two distinct pressure points rather than one.
The smaller the distance at which both points can be detected, the greater the sensory acuity.
The fingers and thumbs have the greatest acuity.
Sensory acuity varies considerably among individuals.
It can be improved with training, within certain limits.
For example, blind people who read Braille generally have better sensory acuity than non-Braille
readers.
However, certain medical conditions can lead to reduced sensory acuity.
Kinaesthetic Feedback
Another aspect of haptic perception is known as kinaesthetic feedback.
Kinaesthetic receptors in our joints and muscles tell us where our limbs, fingers, etc., are relative to
the rest of our body.
Kinaesthetic receptors are of three types:
 Rapidly-adapting kinaesthetic receptors respond only to changes in the position of limbs,
etc...
 Slowly-adapting kinaesthetic receptors respond both to changes in position and static
position of limbs, etc..
 Static receptors respond only to static position of limbs, etc..
Kinaesthetic feedback is important in many rapid actions, e.g. typing or playing a musical instrument.
Haptic Memory
As with auditory perception, we have a short-term sensory memory for haptic experience.
This is known as haptic memory.
It functions in a very similar way to the auditory store, i.e.:
 Haptic events are stored as they are experienced
 New experiences replace older ones in the memory, but...
 If no new haptic events are experienced, previous events remain in the store.
7) Human Memory & Perception - [Speech Perception]
How do humans extract meaning from speech?
Early models assume a 'bottom-up' approach, i.e.:
 Separate the stream of speech into words.
 Identify each word and determine its meaning.
 Determine the meaning of the whole utterance.
More recent models assume a 'top-down' approach, i.e.:
 Analyze prosody and other cues to locate the key words.
 Identify the key words and guess the meaning of utterance.
 If unsuccessful, analyze more words until the meaning has been extracted.
Even when the individual words are correctly recognized, speech is more difficult to analyze than
written language.
There are a number of reasons for this:
 Speech relies heavily on non-grammatical sentence forms (minor sentences)
 There is no punctuation
 Repetition and re-phrasing are common.
 Efficient speech communication relies heavily on other communication channels - gesture,
facial expression, etc...
8) User-Centered Design - [Intro]
The first stage in the design of an interface is to identify the requirements.
This involves consideration of a number of questions, such as:
1. What area of expertise will the application be designed for?
2. Who are the users?
3. What do the users want to do with the application?
4. Where and how will the application be used?
1. What is the area of expertise?
The task of identifying domain knowledge is known as domain analysis.
A common problem with domain analysis is that:
 Experts are so familiar with their field that they regard some domain knowledge as
general knowledge.
 Thus they are unable to accurately identify domain knowledge.
Therefore, domain analysis should involve talking to both:
 experts in the relevant field(s)
 end-users (or potential end-users)
2. Who are the users?
It is important to know who the system is being designed for.
Therefore the designer should start by identifying the target users.
One approach is to draw up a 'profile' which includes factors such as:
 Age
 Sex
 Culture
 Physical abilities and disabilities
 Computing/IT knowledge and experience
However, it’s very difficult to design for a large, loosely-defined group.
A better approach is to segment the users into a number of smaller, tightly-defined groups.
Each group can be represented by a profile of an imaginary user. These profiles are called personas.
A persona:
 Should cover all the factors listed above, but should also include other details, such as likes
and dislikes, habits, etc…
 Can be a composite, combining characteristics from a number of real people, but should be
consistent and realistic.
 Should read as the description of a real person.
In segmenting the users, it may also be necessary to distinguish between primary and secondary
users.
For example: in the case of a flight information system:
 the primary users might be travel agents
 the secondary users might be customers who book flights through travel agents
3. What do the users want to do?
In identifying needs we must distinguish between:
 Needs identified by professional designers/developers.
 These are often referred to as normative needs
 The needs of the end-user. These can be difficult to determine. It often helps to think in
terms of:
 Expressed needs - what end-users SAY they want
 Felt needs - what end-users ACTUALLY want (or would like) from the system
The principal methods used to identify user needs are:
 direct observation (where possible)
 questionnaires
 interviews
Ideally, you should observe people who are using the system for their own ends, unprompted by
you.
For example, if the task is to develop a better ATM interface, you could (with the bank's permission) use video to monitor people using existing ATMs.
You could then note any problems they encounter.
An artefact is an object or aid used in the performance of a task. Examples of artefacts include:
 Notes stuck to the computer detailing (e.g.) keyboard short-cuts.
 Reference manuals pinned in a prominent position.
 Manuals created by the users themselves.
The questionnaire might cover:
 How much experience they have with relevant systems?
 What kinds of tasks, queries, etc., they have carried out using this type of system?
 Did they encounter particular problems?
 If they have tried several computing systems, did they find one easier to use than another,
and if so, in what way?
Interviews vs questionnaires:
 Interviews are usually less structured than questionnaires
 Questionnaires provide a more formal, structured setting than interviews, ensuring
consistency between respondents.
4. How will the application be used?
For example, users of an ATM may only be able to devote part of their attention to the task because:
 They are surrounded by other people and feel pressured or concerned about their privacy
 They are simultaneously trying to control small children
9) User-Centered Design - [Conceptual Design]
…Nothing of note…
10) UCD - Guidelines - [Shneiderman's Golden Rules]
Shneiderman’s Eight Golden Rules are widely-used general-purpose guidelines.
Shneiderman has revised the rules a number of times since he first proposed them. The current set
of rules is as follows:
1. Strive for consistency
 Identical terminology should be used in menus, prompts etc.
 Consistent colour and layout should be used
 If exceptions have to be made, they should be comprehensible and limited in
number
2. Cater for universal usability
 Recognise the needs of diverse users (range of ages, levels of expertise, special needs, etc.), e.g.
 Explanations for novices
 Shortcuts for experts
3. Offer informative feedback
 For every user action there should be system feedback, tailored to the action:
 Modest feedback for frequent and/or modest actions
 More substantial feedback for infrequent and/or major actions
4. Design dialogs to yield closure
 Sequence of actions should be organized into groups with a beginning, middle, and
end.
5. Prevent errors
 As far as possible, design systems so that users cannot make errors, e.g.:
 Grey-out inappropriate menu-items
 Do not allow typing of alphabetic characters into numeric fields
6. Permit easy reversal of actions
 This relieves anxiety since the user knows that errors can be undone
7. Support internal locus of control
 Operators want to feel they are in charge of a system
8. Reduce short-term memory load
11) UCD - Guidelines - [Web Content Accessibility Guidelines]
The Web Content Accessibility Guidelines (WCAG) comprise 12 guidelines which relate to four general principles:
 Perceivable
 Operable
 Understandable
 Robust
1. Perceivable
 Provide text alternatives for any non-text content
 Provide alternatives for time-based media.
 Create content that can be presented in different ways without losing information or
structure.
 Make it easier for users to see and hear content
2. Operable
 Make all functionality available from a keyboard.
 Provide users enough time to read and use content.
 Do not design content in a way that is known to cause seizures.
 Provide ways to help users navigate and find content.
3. Understandable
 Make text content readable and understandable.
 Make Web pages appear and operate in predictable ways.
 Help users avoid and correct mistakes.
4. Robust
 Maximize compatibility with current and future user agents, including assistive
technologies.
12) UCD - [Heuristics and Metrics]
Once a prototype system (or even a partial prototype) has been created, it can be analysed to see
how usable it is.
The two main approaches to testing are Heuristic Evaluation and Usability Metrics.
Heuristic Evaluation
In Heuristic Evaluation, a number of evaluators examine an interface and assess its compliance with
a set of recognised usability principles (the heuristics).
Heuristics are general rules which describe common properties of usable interfaces.
The process is as follows:
 Each evaluator is asked to assess the interface in the light of the heuristics - not their own
likes/dislikes, etc..
 Evaluators work alone, so that they cannot influence one-another.
 Each evaluator should work through the interface several times.
 Evaluators should voice their comments so that they can, in turn, be recorded by an observer.
 If an evaluator encounters problems with the interface the experimenter should offer
assistance, but not until the evaluator has assessed and commented upon the problem.
 Only when all the evaluators have assessed the system individually should the results be
aggregated and the evaluators allowed to communicate with one another.
Usability Metrics
The term Usability Metrics refers to a range of techniques that are typically more expensive and
time-consuming than Heuristic Evaluation but yield more reliable results.
Techniques based on usability metrics involve asking a group of users to perform a specified task (or
set of tasks).
The data gathered may include:
 success rate (task completion/non-completion, % of task completed)
 time
 errors (number of errors, time wasted by errors)
 user satisfaction
Examples
Web Accessibility Testers
These work in a similar way to HTML validators, but analyse the target page for accessibility as well
as for HTML code validity.
They automatically check many of the accessibility issues listed in the Web Content Accessibility
Guidelines, e.g.:
 Inclusion of alt text, summaries, table header information, etc.
 Contrast between foreground and background colours
 etc.
Where a page is found to violate the guidelines, most testers identify the type of error and the line
of HTML code on which it occurs.
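As a rough sketch of how one such check (missing alt text) might be implemented, the code below uses only Python's standard html.parser module. It is an illustration of the idea, not the implementation of any particular tester.

```python
from html.parser import HTMLParser

class AltTextChecker(HTMLParser):
    """Flag <img> tags that lack an alt attribute - one common WCAG check."""
    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag == "img" and "alt" not in dict(attrs):
            line, _ = self.getpos()  # line of HTML on which the error occurs
            self.problems.append(f"line {line}: <img> missing alt text")

checker = AltTextChecker()
checker.feed('<p>Hello</p>\n<img src="logo.png">\n<img src="x.png" alt="X">')
print("\n".join(checker.problems))  # -> line 2: <img> missing alt text
```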
13) UCD - Interaction Modelling - [Introduction]
Interaction models can be divided into two broad categories:
 Task analysis
 models only what happens - or is observable - during interaction
 Cognitive models
 Designed to incorporate some representation of the user's abilities,
understanding, knowledge, etc...
Cognitive models can be broadly categorised as follows:
 Hierarchical representations of the user's task and goal structure
 These models deal directly with the issues of formulating tasks and goals.
 Linguistic and Grammatical models
 These models deal with articulation and translation between the system and
the user.
 Physical and Device-Level models
 These models deal with articulation at the human motor level rather than at
higher levels.
14) UCD - Interaction Modelling - [Goal & Task Hierarchies]
Probably the best-known and most influential model based on goal/task hierarchies is GOMS.
GOMS stands for Goals, Operators, Methods and Selection.
Goals - these describe what the user wishes to achieve.
Operators - these represent the lowest level of analysis: the basic actions that the user must perform in order to use the system.
Methods - it may be possible to achieve a goal using any of several alternative sub-goals or sequences of sub-goals; these are known as methods.
Selection - where a goal may be achieved using several alternative methods, the choice of method is determined by a selection rule.
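For illustration, a hypothetical GOMS fragment for deleting a word in a text editor might look as follows (the editor, its commands and the exact notation are assumed for the example, not taken from these notes):

```text
GOAL: DELETE-WORD
  [select: METHOD: DELETE-WITH-MOUSE
             OPERATOR: MOVE-HAND-TO-MOUSE
             OPERATOR: DOUBLE-CLICK-ON-WORD
             OPERATOR: PRESS-DELETE-KEY
           METHOD: DELETE-WITH-KEYBOARD
             OPERATOR: MOVE-CURSOR-TO-WORD
             OPERATOR: PRESS-CTRL-DELETE]

Selection rule: if the hand is already on the mouse, use
DELETE-WITH-MOUSE; otherwise use DELETE-WITH-KEYBOARD.
```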
Note that GOMS, like many models based on goal/task hierarchies, does not take account of
error.
Cognitive Complexity Theory
CCT has two descriptions which operate in parallel:
 A description of the user's goals based on a GOMS-like hierarchy but expressed
through production rules.
 A description of the system state, expressed as generalised transition networks, a
form of state transition network.
15) UCD - Interaction Modelling - [Linguistic & Grammatical Models]
These use formalisms such as BNF (Backus-Naur Form) to describe interactions.
The intention is to represent the cognitive difficulty of the interface so that it can be
analysed.
Backus-Naur Form
BNF can be used to define the syntax of a language.
BNF defines a language in terms of Terminal Symbols, Syntactic Constructs and
Productions.
 Terminal Symbols
 Elementary symbols of a language, such as words and punctuation marks.
 In computing languages, these may be variable-names, operators, reserved
words, etc...
 Syntactic Constructs (or non-terminal symbols)
 Phrases, sentences, etc.
 In computing languages, these may be conditions, statements, programs,
etc...
 Productions are sets of rules which determine how Syntactic Constructs are built.
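For example, a small BNF grammar for whole-number addition might read as follows; the quoted digits and "+" are terminal symbols, <expr>, <number> and <digit> are syntactic constructs, and each line is a production (the grammar is invented for illustration):

```text
<expr>   ::= <number> | <expr> "+" <number>
<number> ::= <digit> | <number> <digit>
<digit>  ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
```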
16) UCD - IM - [Physical & Device Models - Fitts' Law]
Fitts' Law states that, for a given system, the time taken to move a pointer onto a target
varies as a function of:
 The distance the pointer has to be moved
 The size of the target.
Fitts' Law is normally stated as follows:
tm = a + b log2(d/s + 1)
where:
 tm = movement time
 a = start/stop time
 b = device tracking speed
 d = distance moved
 s = target size (relative to the direction of movement)
a and b must be empirically determined for different operations, pointing devices, etc..
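As a numerical sketch, the formula is simple to evaluate once a and b are known. The constants below are invented for illustration; real values must be measured as noted above.

```python
import math

def fitts_time(d, s, a=0.1, b=0.1):
    """Fitts' Law: tm = a + b * log2(d/s + 1).
    a and b are placeholder constants, not empirically measured values."""
    return a + b * math.log2(d / s + 1)

# Halving the distance or doubling the target size reduces tm equally:
print(round(fitts_time(d=512, s=16), 3))  # -> 0.604
print(round(fitts_time(d=256, s=16), 3))  # -> 0.509
print(round(fitts_time(d=512, s=32), 3))  # -> 0.509
```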
Some implications of Fitts' Law:
 Interaction times can be reduced by making targets large and distances small
wherever possible, e.g.:
 Pop-up menus are generally faster to use than fixed menus.
 The efficiency of fixed, linear menus can be improved by:
 Placing frequently-used options near the start-point
 Placing the menu at (or near) the screen edge so that it becomes
infinitely large in the direction of movement.
 Point-and-click operations are usually faster than dragging operations.
 The distance/size ratio determines acquisition time.
17) UCD - IM - [Physical & Device Models - KLM]
The Keystroke-Level Model (KLM) is designed to model unit-tasks within an interaction.
These would typically be short command sequences, such as changing the font of a
character.
The KLM would rarely be used to model sequences lasting more than twenty seconds.
The Keystroke-Level Model divides tasks into two phases:
 Acquisition - the user builds a mental model of the task.
 Execution - the task is executed using the system's facilities.
The KLM does not attempt to model what happens during the acquisition phase.
This must be done using other models or methods.
However, the KLM models what happens during the execution phase in great detail.
The execution phase is broken down into physical motor operations, system responses, and
mental operations.
The KLM defines five types of motor operation:
K - keystroking, i.e., striking a key, including modifier keys such as SHIFT
B - pressing a mouse button
P - pointing, using the mouse or other pointing device, at a target
H - homing, i.e., switching the hand between mouse and keyboard
D - drawing lines using the mouse
The KLM also provides mental response and system response operators:
M - mentally preparing for a physical action
R - response from the system; may be ignored in some cases, e.g., copy-typing
Suppose we wish to model the interaction involved in correcting a single-character error
using a mouse-driven text-editor.
This involves pointing at the error, deleting the character, re-typing it, then returning to the
original point in the text.
This might be modelled as follows:
1. move hand to mouse - H[mouse]
2. position cursor after bad character - PB[LEFT]
3. return hand to keyboard - H[keyboard]
4. delete character - MK[DELETE]
5. type correction - K[char]
6. reposition insertion point - H[mouse]MPB[LEFT]
Once an operation has been decomposed in this way, the time required to perform it can be
calculated.
This is done by counting the number of each type of operation, multiplying by the time
required for each type of operation, then summing the times, e.g.:
Texecute = TK + TB + TP + TH + TD + TM + TR
For example, the time required for the operation described earlier could be calculated as
follows:
Texecute = 2tB + 3tH + 2tK + 2tM + 2tP
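This calculation is easy to mechanise. In the sketch below the unit times are commonly cited estimates from the KLM literature (e.g. roughly 0.2 s per keystroke for an average skilled typist); they are assumptions for illustration, not values given in these notes.

```python
# Commonly cited KLM unit times in seconds (assumed for illustration):
UNIT_TIMES = {"K": 0.2, "B": 0.1, "P": 1.1, "H": 0.4, "M": 1.35}

def klm_execute_time(counts):
    # Sum (number of operations) x (unit time) over each operator type.
    return sum(n * UNIT_TIMES[op] for op, n in counts.items())

# The error-correction example above: 2B + 3H + 2K + 2M + 2P
print(round(klm_execute_time({"B": 2, "H": 3, "K": 2, "M": 2, "P": 2}), 2))
# -> 6.7 (seconds)
```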
18) UCD - Usability Testing - [Experimental Design]
Testing can be carried out at various stages during design and development, e.g.:
 At a preliminary stage, to determine requirements, expectations, etc.
 During design, as a means of testing general concepts or individual elements of a
proposed system.
 At the prototype stage, to find out if the design meets expectations
 etc.
The best approach is iterative testing, i.e., testing at each stage of the design and
development cycle.
Usability testing may take different forms, depending upon the stage at which it is carried
out and the type of data required:
 Surveys - subjects fill in a questionnaire or are interviewed. Surveys can be either:
 Qualitative
 The questionnaire contains 'open' questions that may elicit a wide
range of responses.
 For example, 'What did you like most about the web-site?'
 Quantitative
 The questionnaire contains questions or statements that require a
'yes/no' or numerical response. The results can be analysed
statistically if required.
 For example, 'The performance was too slow', to which the user
should indicate agreement or disagreement on a numerical scale.
 Observation - users are observed (or videoed) using a system and data is gathered
on (e.g.) time taken to perform tasks, number of errors made, etc.
 Controlled Studies
 Usually involve comparing a new system with a reference system.
 The comparison is based on measurable/observable factors such as time to
complete task, number of errors made, etc..
 The results would normally be analysed statistically.
Designing Controlled Studies
In order to carry out a controlled study we need:
 Two (or more) conditions to compare, e.g. performance of a task on:
 a new/experimental system
 an existing system which serves as a reference
 A task which can be performed on both systems
 A prediction that can be tested
 A set of variables, including:
 an independent variable
 one or more dependent variables
 A number of subjects who may need to be divided into groups
 An experimental procedure
Conditions
If we conduct a controlled study in which we compare a new system against a reference
system, we use the following terminology:
 The condition in which the new system is used is known as the experimental
condition.
 The condition in which reference system is used is known as the control condition.
Variables
The independent variable is the one we wish to control.
The dependent variable is the one we will measure in order to determine if changing the
independent variable has produced an effect.
The dependent variable will be some measure of performance on the two systems, e.g.:
 task-completion time
 level of knowledge/skills acquired
 user satisfaction
Subjects and Groups
The subjects should be chosen to suit the system under test, e.g.:
 potential customers, if testing an eCommerce system
 students, if testing an eLearning application
 people with a relevant special need, if testing an accessible system
Having chosen the subjects, we also have to decide how to assign them to the conditions.
The options are:
 Independent measures: divide the subjects randomly into groups, and test each
group under a different condition.
 Matched subjects: as above, but match the groups according to relevant criteria
(e.g., the average IQ score is the same for each group).
 Repeated measures: all subjects are tested under all conditions.
Quality of Data
When designing a test or questionnaire, careful thought should be given to the kind of data
it will generate.
If our aim (for example) is merely to gather ideas on how to improve a system, then a
qualitative questionnaire will be suitable.
However, if we hope to demonstrate that our system is better than existing systems in some
way(s), we may want to use a statistical test to prove this.
In this latter case, we will need to design our test or questionnaire carefully to ensure it
yields testable data.
Statisticians classify data under the following headings:
 Nominal-scaled data
 There is no numerical relationship between scores
 e.g., a score of 2 is not necessarily higher than a score of 1.
 Ordinal-scaled data
 A score of 2 is higher than a score of 1, but not necessarily twice as high.
 Data obtained from questionnaires is usually ordinal-scaled.
 Interval-scaled data
 The intervals between scores are equal, e.g., the difference between scores of 1 and 2 is the same as that between 2 and 3.
 Timing data is usually interval-scaled.
 Parametric data
 The data must be interval-scaled (see above) and in addition:
 The scores must be drawn from a normal population
 If we were to measure our subjects on factors which are
important in the study (e.g., intelligence), the results would lie
on a normal distribution (sometimes known as a bell-curve).
 The scores must be drawn from a population that has normal variance
 If we were to measure our subjects as described above, the spread of scores would be the same as that found in the general population.
19) UCD - Usability Testing - [Data Analysis]
The Frequency Distribution
As a first attempt at visualising the result, we might create a frequency distribution.
This is a graph showing the frequency with which each score occurs under each condition.
The frequency distribution shows us that the scores for the experimental group appear to
be higher than the scores for the control group.
This is a commonly-used descriptive method.
It presents the data, without loss, in a form that allows the characteristics of the data to be
understood more easily than is possible using just the raw data.
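A frequency distribution is straightforward to build from raw scores; the sketch below uses only Python's standard library, and the score lists are invented for illustration.

```python
from collections import Counter

scores = {
    "control":      [3, 4, 4, 5, 5, 5, 6, 7],   # hypothetical scores
    "experimental": [5, 6, 6, 7, 7, 7, 8, 9],
}

for name, data in scores.items():
    print(name)
    freq = Counter(data)                          # score -> frequency
    for score in sorted(freq):
        print(f"  {score}: {'*' * freq[score]}")  # crude text histogram
```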
The Average
Descriptive measures are useful but have limitations. Often we need to summarise the data
in some way.
One of the simplest ways to summarise data is by calculating the averages.
However, this tells us very little about the data.
For small groups of subjects, a single very low or very high score (an outlier) can significantly
affect the average.
This would be obvious in a frequency distribution but not in an average value.
Therefore the average, while useful, does not capture all the features of the data.
The Variance
A more useful way of summarising data is to state the variance.
The variance indicates the amount of dispersion in the scores.
By quoting just two values - the variance and the average - we can summarise a set of scores
in considerable detail.
Standard Deviation
Another widely-used measure of dispersion is the standard deviation.
The standard deviation is simply the square-root of the variance.
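Both summary values can be computed directly; a minimal sketch using Python's standard statistics module, on the invented control-group scores from the previous sketch:

```python
import statistics

scores = [3, 4, 4, 5, 5, 5, 6, 7]             # hypothetical scores

mean = statistics.mean(scores)                # the average
var = statistics.variance(scores)             # sample variance (dispersion)
sd = statistics.stdev(scores)                 # standard deviation = sqrt(var)

print(mean, round(var, 3), round(sd, 3))      # -> 4.875 1.554 1.246
```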
Standard Deviation and the Normal Distribution
The frequency distribution graph obtained earlier shows marked differences between the
two sets of scores, not only in their average values but also in their distribution.
If we were to take samples from a very large number of subjects and then chart the frequency distribution, we would probably find that the results show a normal distribution.
The normal distribution has the following features:
 It is symmetrical, with most of the scores falling in the central region.
 Because it is symmetrical, all measures of central tendency (mean, mode, median)
have the same value.
 It can be defined using only the mean and the standard deviation.
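For reference, the last point can be written out explicitly: the density of the normal distribution is completely determined by the mean μ and the standard deviation σ (standard formula, quoted as background):

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}}
       \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)
```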
20) UCD - Usability Testing - [Statistical Inference]
All the techniques described so far are intended either to describe or summarise data.
For many purposes this is sufficient, but sometimes we need to go further and attempt to
prove that:
 there is a significant difference between two sets of experimental data
 there is a significant difference in a particular direction, e.g., that data-set a is better
in some way than data-set b
This is known as drawing statistical inference.
Significance
When designing experiments we try to keep all possible factors stable with the exception of
one, the independent variable, which we deliberately manipulate in some way.
We then measure another variable, the dependent variable, to see how it has been affected
by the change(s) in the independent variable.
However, we cannot assume that all changes in the dependent variable are due to our
manipulation of the independent variable.
Some changes will almost certainly occur by chance.
The purpose of statistical testing is to determine the likelihood that the results occurred by
chance.
We can never prove beyond doubt that any differences observed are the result of changes
in the independent variable rather than mere chance occurrences.
However, we can determine just how likely it is that a given result could have occurred by
chance, and then use this figure to indicate the reliability of our findings.
Before testing, we formulate two hypotheses:
 Any differences arise purely as a result of chance variations
 This is known as the null hypothesis
 Any differences arise - at least in part - as a result of the change(s) in the independent
variable
 This is known as the alternate or experimental hypothesis
Statistical tests allow us to determine the likelihood of our results having occurred purely by
chance.
Thus they allow us to decide whether we should accept the null hypothesis or the alternate
hypothesis.
We usually express probability on a scale from 0 to 1. For example:
p ≤ 0.05
When accompanying a statistical finding, this indicates that the likelihood of the observed
difference having occurred as a result of chance factors is less than one in 20.
This is known as the significance level.
What is an appropriate level of significance to test for?
 If we choose a relatively high value of significance, we are more likely to obtain
significant results, but the results will be wrong more often.
 This is known as a Type 1 error.
 If we choose a very low value for significance, we can place more confidence in our results. However, we may fail to find a correlation when it does in fact exist.
 This is known as a Type 2 error.
One-Tailed and Two-Tailed Predictions
In formulating our prediction, we must also decide whether to predict the direction of any
observed difference or not.
 If we predict only that there will be a difference, we are using a two-tailed test.
 If we predict the direction of the difference, we are using a one-tailed test.
Choice of Test
When choosing a test, the following factors should be taken into account:
 Two-sample or k-sample
 Most tests compare two groups of samples, e.g., the results obtained from
comparative tests on two different systems.
 Some tests can be used to compare more than two groups of samples, e.g.,
the results obtained from comparative tests on three or four different
systems.
 Related measures or independent measures
 Different tests are used depending upon whether the two (or more) groups
from which the data is drawn are related or not.
 Nominal, ordinal, interval or parametric data.

These three factors - number of groups, relationship between groups, and quality of data - are the principal factors to be taken into account when designing a study and choosing a statistical test.
There are tests available to suit each combination of these factors.
Related Samples Tests:
 2-sample: t-test (related-samples) - parametric data
 2-sample: Wilcoxon - interval-scaled data
 2-sample: Sign test - ordinal-scaled data
 k-sample: Page's L test - ordinal-scaled data
Independent Samples Tests:
 2-sample: t-test (independent-samples) - parametric data
 2-sample: Mann-Whitney - ordinal-scaled data
 2-sample: Chi-square (X2) test - nominal-scaled data
 k-sample: Jonckheere trend test - ordinal-scaled data
Various software packages are available to carry out these and similar tests.
Therefore, the main task facing the designer of a usability test is to choose the right test, in
accordance with the data being gathered.
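As a brief sketch of what running such tests looks like in practice, the code below uses the third-party scipy package (an assumption; these notes do not name a package). The score lists are invented, and each call corresponds to one row of the table above.

```python
from scipy import stats

# Hypothetical scores from a control and an experimental condition:
control      = [3, 4, 4, 5, 5, 5, 6, 7]
experimental = [5, 6, 6, 7, 8, 7, 8, 9]

# Independent groups, parametric data -> independent-samples t-test:
print(stats.ttest_ind(control, experimental).pvalue)

# Independent groups, ordinal data -> Mann-Whitney:
print(stats.mannwhitneyu(control, experimental).pvalue)

# Same subjects under both conditions (related samples) -> Wilcoxon:
print(stats.wilcoxon(control, experimental).pvalue)

# A p-value below the chosen significance level (e.g. 0.05) would lead us
# to reject the null hypothesis.
```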