Presentation about accuracy and precision

Accuracy and precision:
Is there a difference, and if there is, why is it important?
Dr Richard R. Plant
Department of Psychology, University of York, UK
Technical Director, The Black Box ToolKit Ltd
Computer use in Experimental/Field Settings
• Widespread use of computers for test delivery
• Ever more complex paradigms which look for smaller
effect sizes measured in milliseconds
• Often interoperability with extremely complex third-party
hardware and software, e.g. custom response pads,
simulators, fMRI scanners etc.
• Assumption that “anything goes” with today’s hardware –
faster must mean better and more accurate
• Becoming an accepted misconception!
http://www.blackboxtoolkit.com
Should we be Concerned About Millisecond Timing?
• Should we be concerned about presentation accuracy, response
timing and synchronicity between multimodal stimuli and other
devices?
• Overuse – do researchers become hooked on computer-based
methods?
• Do they know about the potential pitfalls as well as the benefits?
• Because the “hoops are fewer”, is attention to detail laxer today?
• Is research/field work suffering?
• Do today’s computer systems produce timing errors?
• Should we do something about it?
Accuracy, Precision & Validity
In the fields of science, engineering, industry and statistics,
accuracy is the degree of conformity of a measured or
calculated quantity to its actual (true) value. Accuracy is
closely related to precision, also called reproducibility or
repeatability, the degree to which further measurements or
calculations show the same or similar results. The results of
calculations or a measurement can be accurate but not
precise; precise but not accurate; neither; or both.
A result is called valid if it is both accurate and precise.
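The four combinations above can be illustrated with a minimal sketch. It assumes accuracy is summarised as the bias of the mean against the true value and precision as the sample standard deviation. The function names, the example timings and the 1 ms thresholds are all hypothetical choices made for illustration, not values from this talk.

```python
from statistics import mean, stdev

def summarise(measurements, true_value):
    """Return (bias, spread) for repeated measurements of a known quantity."""
    bias = mean(measurements) - true_value  # accuracy: offset of the mean from the true value
    spread = stdev(measurements)            # precision: sample standard deviation
    return bias, spread

def classify(measurements, true_value, max_bias, max_spread):
    """Return (accurate, precise) given assumed tolerance thresholds."""
    bias, spread = summarise(measurements, true_value)
    return abs(bias) <= max_bias, spread <= max_spread

# Hypothetical timings (ms) of a stimulus that should last exactly 100 ms.
TRUE_MS = 100.0
examples = {
    "accurate and precise": [99.8, 100.1, 100.0, 100.2, 99.9],
    "precise, not accurate": [116.9, 117.0, 117.1, 117.0, 117.0],
    "accurate, not precise": [85.0, 115.0, 92.0, 108.0, 100.0],
    "neither": [110.0, 140.0, 125.0, 150.0, 118.0],
}

# Thresholds of 1 ms are an assumption chosen only for this example.
results = {
    label: classify(values, TRUE_MS, max_bias=1.0, max_spread=1.0)
    for label, values in examples.items()
}

for label, (accurate, precise) in results.items():
    print(f"{label}: accurate={accurate}, precise={precise}")
```

Under the definition on this slide, only the first data set would count as valid.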
Accuracy vs Precision - The Target Analogy
• Accuracy is the degree of veracity while precision is the degree of
reproducibility. The analogy used here to explain the difference between
accuracy and precision is the target comparison.
• Arrows that strike closer to the bullseye are considered more accurate.
The closer a system's measurements to the accepted value, the more
accurate the system is considered to be.
• To continue the analogy, if a large number of arrows are fired, precision
would be the size of the arrow cluster. When all arrows are grouped tightly
together, the cluster is considered precise since they all struck close to the
same spot, if not necessarily near the bullseye. The measurements are
precise, though not necessarily accurate.
• Another example is a measuring rule that is supposed to be 1 m long but
is actually only 97 cm. Its measurements can be precise but inaccurate: the
rule will give consistently similar results, but the results will be
consistently wrong.
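The measuring rule example can be worked through numerically. This is a sketch under one stated assumption: the rule's markings are uniformly compressed, so every reading is scaled by the same factor. The function names and the 1.94 m test object are hypothetical.

```python
NOMINAL_M = 1.00  # length the rule is supposed to be
ACTUAL_M = 0.97   # length the rule really is (from the example above)

def reading_from_short_rule(true_length_m):
    """Length the short rule reports for an object of known true length.

    Assumes the scale is uniformly compressed along the whole rule.
    """
    return true_length_m * NOMINAL_M / ACTUAL_M

def corrected(reading_m):
    """Undo the systematic error once the rule's actual length is known."""
    return reading_m * ACTUAL_M / NOMINAL_M

# Measure the same 1.94 m object five times with the short rule.
readings = [reading_from_short_rule(1.94) for _ in range(5)]

for r in readings:
    print(f"reading {r:.3f} m, corrected {corrected(r):.3f} m")
```

Every reading is identical (precise) and every reading overstates the true length by the same proportion (inaccurate). A systematic error like this can be corrected, but only after the rule has been checked against a known standard.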
Quantifying Accuracy & Precision
Ideally a measurement device is both accurate and precise, with
measurements all close to and tightly clustered around the known value.
The accuracy and precision of a measurement process are usually
established by repeatedly measuring some traceable reference standard.
• When a result is both accurate and precise it is said to be valid.
E-Prime® is the revolutionary suite of applications which
comprehensively fulfills your research needs. From experiment
generation and millisecond precision data collection through
data handling and processing, E-Prime is the most powerful and
flexible experiment generator available.
http://www.pstnet.com/products/e-prime/
Effects of Different Hardware on Millisecond Timing
• Remember: often software is marketed and sold as
being capable of presenting stimuli and taking
measurements reliably down to the millisecond level
• However, software can logically know nothing of the
equipment it runs on
• You can use any PC and additional hardware you like!
• At the moment people generally don’t check their timing
accuracy
Research – Timing Characteristics of Mice
• What kind of contribution can a response device make to
timing?
– Examined various brands of mice
– Looked at various interfaces: PS/2, USB, serial
– Examined the timing characteristics using a signal generator and
Digital Phosphor Oscilloscope (external to a PC)
– Examined the performance of each mouse under a simple
paradigm in E-Prime: flash a block mid-screen, then simulate a
response at a known offset and collate response times in terms of
known versus actual
• Can you predict response device performance?
• What’s the typical contribution?
• What effect does the operating system have?
• What does the experiment generator contribute?
• Does it matter? (Ulrich & Giray 1989)
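The "known versus actual" comparison described above can be sketched as follows. This is not the analysis used in the study: the device names and all timings are invented for illustration, and the helper name is hypothetical.

```python
from statistics import mean, stdev

def timing_errors(known_ms, recorded_ms):
    """Per-trial error: recorded response time minus the known (simulated) one."""
    if len(known_ms) != len(recorded_ms):
        raise ValueError("need one recorded time per known time")
    return [r - k for k, r in zip(known_ms, recorded_ms)]

# Hypothetical data: a response simulated 300 ms after stimulus onset on
# every trial, and the response times two imaginary devices reported.
known = [300.0] * 6
recorded = {
    "device A": [318.0, 319.0, 318.5, 318.0, 319.0, 318.5],
    "device B": [305.0, 342.0, 311.0, 360.0, 327.0, 333.0],
}

for name, times in recorded.items():
    errors = timing_errors(known, times)
    # Mean error is the systematic offset; SD is the trial-to-trial variability.
    print(f"{name}: mean error {mean(errors):.1f} ms, SD {stdev(errors):.1f} ms")
```

Separating the mean error from its spread matters because a constant offset affects every condition equally, whereas trial-to-trial variability adds noise to every comparison.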
[Figure: Response Target Error (ms), 0 to 80, plotted against Trial Number,
1 to 77, for four devices: AMI wheel mouse, PS/2 (ref 6b); ALPS keyboard,
PS/2 (ref KB); OEM mouse, PS/2 (ref 8); PST response box, serial (ref BB1)]
Display devices – All Created Equal?
Conditional Biases - Cross Modal Priming in the Field
So What About benchmarking?
Can’t you Just tell us Which Software is Best?
What About the Real-World?
• There is little doubt that the majority of today’s high-speed, high-spec
hardware and operating systems are capable of real-time data collection
(MacInnes & Taylor 2001, Finney 2001).
• Such research, whilst providing a solid baseline, leaves researchers in the
field with the fundamental question:
“How does my own paradigm on my own
equipment perform in the real-world?”
• Until now this has been a question that has been extremely difficult to
answer.
• Complete real-world paradigms can often be extremely complex, making use
of both visual and auditory stimuli and requiring complex patterns of
responses from subjects.
The Only Solution is Self-Validation/Certification
• Taking a leaf out of other research cultures where
equipment is calibrated yearly
• Independently check presentation, synchronisation and
response timing
• State error limits in reports/academic papers
• Raise awareness of the issues and their increasing
importance
• Make it a requirement from government/journals?
• How easy is it to do? Until now it was very hard!
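Stating error limits in a report could be as simple as the sketch below. The choice of statistics (mean, SD, range, n), the wording of the output line and the sample values are assumptions for illustration, not a format prescribed by any journal or by this talk.

```python
from statistics import mean, stdev

def error_limits(errors_ms):
    """Summarise measured timing errors (ms) for inclusion in a report."""
    return {
        "n": len(errors_ms),
        "mean": mean(errors_ms),
        "sd": stdev(errors_ms),
        "min": min(errors_ms),
        "max": max(errors_ms),
    }

def report_line(label, errors_ms):
    """Format the summary as one sentence suitable for a methods section."""
    s = error_limits(errors_ms)
    return (
        f"{label}: mean error {s['mean']:.1f} ms "
        f"(SD {s['sd']:.1f}, range {s['min']:.1f} to {s['max']:.1f}, n = {s['n']})"
    )

# Hypothetical errors from an independent timing check of one paradigm.
print(report_line("Response timing", [10.0, 12.0, 14.0]))
```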
Our Timing Toolkit (the Black Box ToolKit)
A virtual human that independently checks any paradigm in situ
Pressing Need for Researchers to Easily and Cheaply Self-Validate Their Own
Paradigms In Situ on Their Own Hardware
Clear Results
The Beneficiaries
• Researchers in the behavioural sciences (psychology, ergonomics
and human-computer interaction) carrying out work involving
time-critical measurement, not forgetting traffic psychology
• Practitioners involved in the design and evaluation of equipment for
time-critical human performance control
• Hardware & Software developers of tools for measuring performance
timing
• Lecturers (and their students) who teach experimental design and
methodology using software tools
• Above all, improving the quality and consistency of research findings
within the field. At the moment some studies look “suspect” based
on our experience to date. Validity is key.