Accuracy and precision: Is there a difference, and if there is, why is it important?

Dr Richard R. Plant
Department of Psychology, University of York, UK
Technical Director, The Black Box ToolKit Ltd

Computer use in Experimental/Field Settings
• Widespread use of computers for test delivery
• Ever more complex paradigms which look for smaller effect sizes measured in milliseconds
• Often interoperability with extremely complex third party hardware and software, e.g. custom response pads, simulators, fMRI scanners etc.
• Assumption that “anything goes” with today’s hardware – faster must = better and more accurate
• Becoming an accepted misnomer!

http://www.blackboxtoolkit.com

Should we be Concerned About Millisecond Timing?
• Should we be concerned about presentation accuracy, response timing and synchronicity between multimodal stimuli and other devices?
• Overuse – do researchers become hooked on computer-based methods?
• Do they know about the potential pitfalls as well as the benefits?
• Because the “hoops are fewer” is attention to detail laxer today?
• Is research/field work suffering?
• Do today’s computer systems produce timing errors?
• Should we do something about it?

Accuracy, Precision & Validity
In the fields of science, engineering, industry and statistics, accuracy is the degree of conformity of a measured or calculated quantity to its actual (true) value. Accuracy is closely related to precision, also called reproducibility or repeatability, the degree to which further measurements or calculations show the same or similar results. The results of calculations or a measurement can be accurate but not precise; precise but not accurate; neither; or both. A result is called valid if it is both accurate and precise.

Accuracy vs Precision - The Target Analogy
• Accuracy is the degree of veracity while precision is the degree of reproducibility.
The analogy used here to explain the difference between accuracy and precision is the target comparison.
• Arrows that strike closer to the bullseye are considered more accurate. The closer a system's measurements are to the accepted value, the more accurate the system is considered to be.
• To continue the analogy, if a large number of arrows are fired, precision would be the size of the arrow cluster. When all arrows are grouped tightly together, the cluster is considered precise since they all struck close to the same spot, if not necessarily near the bullseye. The measurements are precise, though not necessarily accurate.
• Another example: if a measuring rule is supposed to be 1 m long but is actually only 97 cm, measurements can be precise but inaccurate. The measuring rule will give consistently similar results but the results will be consistently wrong.

Quantifying Accuracy & Precision
Ideally a measurement device is both accurate and precise, with measurements all close to and tightly clustered around the known value. The accuracy and precision of a measurement process are usually established by repeatedly measuring some traceable reference standard.
• When a result is both accurate and precise it is said to be valid.

E-Prime® is the revolutionary suite of applications which comprehensively fulfills your research needs. From experiment generation and millisecond precision data collection through data handling and processing, E-Prime is the most powerful and flexible experiment generator available.
http://www.pstnet.com/products/e-prime/

Effects of Different Hardware on Millisecond Timing
• Remember: software is often marketed and sold as being capable of presenting stimuli and taking measurements reliably down to the millisecond level
• However, software can logically know nothing of the equipment it runs on
• You can use any PC and additional hardware you like!
• At the moment people generally don’t check their timing accuracy

Research – Timing Characteristics of Mice
• What kind of contribution can a response device make to timing?
  – Examined various brands of mice
  – Looked at various interfaces, PS/2, USB, Serial
  – Examined the timing characteristics using a signal generator and Digital Phosphor Oscilloscope (external to a PC)
  – Examined the performance of each mouse under a simple paradigm in E-Prime. Flash a block mid screen then simulate a response at a known offset (collate response times in terms of known versus actual)
• Can you predict response device performance?
• What’s the typical contribution?
• What effect does the operating system have?
• What does the experiment generator contribute?
• Does it matter? (Ulrich & Giray 1989)

[Figure: Response Target Error (ms), axis 0 to 80, plotted against Trial Number, 1 to 77. Series: AMI wheel mouse, PS/2 (ref 6b); ALPS keyboard, PS/2 (ref KB); OEM mouse, PS/2 (ref 8); PST response box, serial (ref BB1)]

Display Devices – All Created Equal?

Conditional Biases - Cross Modal Priming in the Field

So What About Benchmarking? Can’t You Just Tell Us Which Software is Best?

What About the Real-World?
• There is little doubt that the majority of today’s high speed, high spec hardware and operating systems are capable of real-time data collection (MacInnes & Taylor 2001, Finney 2001).
• Such research, whilst providing a solid baseline, leaves researchers in the field with the fundamental question: “How does my own paradigm on my own equipment perform in the real-world?”
• Until now this has been a question that has been extremely difficult to answer.
• Complete real-world paradigms can often be extremely complex, making use of both visual and auditory stimuli and requiring complex patterns of responses from subjects.

The Only Solution is Self-Validation/Certification
• Taking a leaf out of other research cultures where equipment is calibrated yearly
• Independently check presentation, synchronisation and response timing
• State error limits in reports/academic papers
• Raise awareness of the issues and their increasing importance
• Make it a requirement from government/journals?
• How easy is it to do? Until now it was very hard!

Our Timing Toolkit (the Black Box ToolKit)
A virtual human that independently checks any paradigm in situ

Pressing Need for Researchers to Easily and Cheaply Self-Validate Their Own Paradigms in situ on Their Own Hardware

Clear Results

The Beneficiaries
• Researchers in the behavioural sciences (psychology, ergonomics and human-computer interaction) carrying out work involving time-critical measurement: Not forgetting Traffic Psychology
• Practitioners involved in the design and evaluation of equipment for time-critical human performance control
• Hardware & Software developers of tools for measuring performance timing
• Lecturers (and their students) who teach experimental design and methodology using software tools
• Above all improving the quality and consistency of research findings within the field… At the moment some studies look “suspect” based on our experience to date.

Validity is key
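
Illustration: Stating Error Limits from a Validation Run
The accuracy/precision distinction and the call to state error limits can be made concrete with a few lines of code. The Python sketch below is illustrative only and is not part of the Black Box ToolKit. It assumes you have the known response offsets for a set of trials (for example from an external signal generator) and the response times your experiment software recorded for the same trials. The function name timing_summary and all data values are hypothetical. The mean error stands in for accuracy and the standard deviation of the error for precision.

```python
from statistics import mean, stdev


def timing_summary(known_ms, measured_ms):
    """Summarise timing error for paired known/measured values in ms.

    bias_ms : mean error (accuracy; systematic offset from the known value)
    sd_ms   : sample standard deviation of the error (precision; spread)
    min_ms, max_ms : observed error limits
    """
    if len(known_ms) != len(measured_ms):
        raise ValueError("known and measured lists must be the same length")
    if len(known_ms) < 2:
        raise ValueError("need at least two paired measurements")
    errors = [m - k for k, m in zip(known_ms, measured_ms)]
    return {
        "bias_ms": mean(errors),
        "sd_ms": stdev(errors),
        "min_ms": min(errors),
        "max_ms": max(errors),
    }


# Hypothetical data: a simulated response at a known 300 ms offset on every trial.
known = [300.0] * 5

# Hypothetical device A: consistently about 10 ms late (precise, not accurate).
device_a = [310.0, 310.5, 309.5, 310.0, 310.0]

# Hypothetical device B: right on average but scattered (accurate, not precise).
device_b = [295.0, 305.0, 300.0, 290.0, 310.0]

summary_a = timing_summary(known, device_a)
summary_b = timing_summary(known, device_b)

for name, s in (("A", summary_a), ("B", summary_b)):
    print(
        f"Device {name}: bias {s['bias_ms']:.1f} ms, SD {s['sd_ms']:.1f} ms, "
        f"range {s['min_ms']:.1f} to {s['max_ms']:.1f} ms"
    )
```

A constant bias like device A's could in principle be subtracted out once it is known, whereas the scatter of device B cannot be corrected after the fact. Reporting both figures alongside the equipment used is one possible way of stating error limits.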