Safety in Interactive Systems
Christopher Power
Human Computer Interaction Research Group
Department of Computer Science
University of York

Safety?
• Usually in HCI we have talked about properties of the interactive system as facets of usability
• However, there are other properties that we often need to consider – safety is one of them
• Informally, safety in interactive systems broadly means preventing incidents that can lead to catastrophic loss, whether from the human, the machine or the organisation

Example case: High Speed Train Derailment
• A high-speed train crashed in Spain on 25 July 2013
• The train derailed at 192 km/h on a curve that was only rated for 80 km/h
• 79 people died, and hundreds more were injured
• The following were ruled out:
  – Mechanical failure in the train
  – Technical failure in the track
  – Sabotage
  – Terrorism
• What is left?
  – The media referred to it as “human error”

Human error
• We often talk about “human error” – but this term is close to meaningless
• Why is it meaningless?
  – Because life is messy … there are lots of routes to any given error
  – Observed phenomena may be caused by many underlying factors, some related to the human, some to the device and some to the environment/context

Reason’s “Swiss Cheese” Model of Human Error
• James Reason proposed a model in 1990 for understanding where errors occur by categorising them according to the process in which they arise
• He proposed that an error occurring in one process can be propagated forward through the system
• The model forms the basis for the Human Factors Analysis and Classification System (HFACS)

Swiss Cheese Model
• Each process is a layer of cheese, with holes where errors can slip through

HFACS: Swiss Cheese Model
• In most cases safeguards in other processes catch errors and correct them

HFACS: Swiss Cheese Model
• However, sometimes the holes in the system line up, and an error makes it all the way to the end, with the effects of the error being realised
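To make the “holes lining up” idea concrete, here is a minimal, purely illustrative Python sketch (not part of HFACS or Reason’s work): each layer of defence is given an assumed probability of catching a propagating error, and an incident only occurs when an error slips past every layer. The layer names follow HFACS; all of the numbers are invented for the example.

```python
import random

# Toy illustration of Reason's Swiss cheese model using the four HFACS
# layers. The catch probabilities are made-up numbers for illustration.
LAYERS = {
    "organisational influences": 0.90,
    "unsafe supervision": 0.80,
    "preconditions for unsafe acts": 0.70,
    "unsafe acts": 0.60,
}

def error_becomes_incident(rng: random.Random) -> bool:
    """True if a single error finds the 'hole' in every layer of defence."""
    return all(rng.random() > p_catch for p_catch in LAYERS.values())

def incident_rate(trials: int = 100_000, seed: int = 1) -> float:
    """Estimate how often an error slips through all of the layers."""
    rng = random.Random(seed)
    return sum(error_becomes_incident(rng) for _ in range(trials)) / trials

if __name__ == "__main__":
    # With independent layers the expected rate is the product of the
    # hole probabilities: 0.10 * 0.20 * 0.30 * 0.40 = 0.0024
    print(f"Simulated incident rate: {incident_rate():.4f}")
```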
Unsafe Acts
• Unsafe acts are those that are tied to the action cycle of humans interacting with a system
• They divide into errors and violations

Returning to the train accident …
• What went wrong?
• The driver was distracted at 192 km/h for 2 minutes by a call on his work phone
  – It was to inform him that there was a change in the platform he would be coming in on
• At 192 km/h that’s a looooong time to be on the phone
  – By the time he came off the phone, it was too late for him to stop

Train Accident Sources of Error
• Where were the potential sources of error?
  – There was no policy on the length of call one should take while in control of the train – nothing to prevent this from happening again
  – The call was unexpected, but within the regulations
  – The call caused a loss of attention
  – The loss of attention led to a delay in altering speed

An easier case …
• You are in a restaurant, and you have enjoyed a nice meal. You offer to pay and the server brings you the machine to pay by card. You attempt to pay three times, and you find a message ‘your card has been locked’
• What went wrong?

Unsafe Acts
• Similar to the interaction cycle discussed in other modules:
  – Errors deal with perception, evaluation, integration and the execution of actions
  – Violations deal with goals, intentions and plans

HEA and HRA
• How is this classification useful to us?
• With respect to error, there are two different processes that we can undertake:
  – Human Error Analysis – trying to capture where errors can happen in the system, either proactively (evaluation prior to an incident) or retrospectively
  – Human Reliability Analysis – trying to capture the probability that a human will introduce a fault into the system at some point

Human Error Analysis Techniques
• There are around 40 different techniques that I could point to in the literature
• Many of these have little empirical basis and have never been validated
• Some have had some work done, but it is debatable how well they work
• We’re going to look at different types of error analysis methods over the next couple of hours

Error Modes
• Most modern techniques use the idea of an “error mode”
• Error modes are categories of phenomena that we see when an incident occurs in the world
• These phenomena could have many causes – we can track back along the causal chain
• Alternatively, we can take the phenomena we see (or suspect will happen) and compare them to interface components to see what might happen in the future

SHERPA Background
• SHERPA stands for “The Systematic Human Error Reduction and Prediction Approach”
• Developed by Embrey in the mid-1980s for the nuclear reprocessing industry (but you cannot find the original reference!)
• Has more recently been applied with notable success to a number of other domains (Baber and Stanton 1996; Stanton 1998; Salmon et al. 2002; Harris et al. 2005; …)
• Has its roots in Rasmussen’s “SRK” model (1982) …

Skills-Rules-Knowledge
• Skill-based actions
  – Those that require very little conscious control, e.g. driving a car on a known route
• Rule-based actions
  – Those which deviate from the “normal” but can be dealt with using rules stored in memory or otherwise available, e.g. setting the timer on an oven
• Knowledge-based actions
  – The highest level of behaviour, applicable when the user has either run out of rules to apply or did not have any applicable rules in the first place. At that point the user is required to use in-depth problem-solving skills and knowledge of the mechanics of the system to proceed, e.g. the pilot response during the QF32 incident

SHERPA Taxonomy
• SHERPA, like many HEI techniques, has its own cut-down taxonomy, here drawn up taking cues from SRK
• Its prompts are firmly based on operator behaviours, as opposed to the lists of every conceivable error used in purely taxonomic approaches
• The taxonomy was “domain specific” (nuclear), but SHERPA has still been shown to work well across other domains (see references)
• Rather than having the evaluator consider what psychological level the error has occurred at, the taxonomy simplifies this into the most likely manifestations (modes) of errors

SHERPA Taxonomy
• The headings for SHERPA’s modes are (expanded next):
  – Action (doing something like pressing a button)
  – Retrieval (getting information from a screen or instruction list)
  – Checking (verifying an action)
  – Selection (choosing one of a number of options)
  – Information Communication (conversation/radio call etc.)

Taxonomy - Action
• Action modes:
  – A1: Operation too long/short
  – A2: Operation mistimed
  – A3: Operation in wrong direction
  – A4: Operation too little/much
  – A5: Misalign
  – A6: Right operation on wrong object
  – A7: Wrong operation on right object
  – A8: Operation omitted
  – A9: Operation incomplete
  – A10: Wrong operation on wrong object

Taxonomy - Retrieval & Checking
• Retrieval modes are:
  – R1: Information not obtained
  – R2: Wrong information obtained
  – R3: Information retrieval incomplete
• Checking modes are:
  – C1: Check omitted
  – C2: Check incomplete
  – C3: Right check on wrong object
  – C4: Wrong check on right object
  – C5: Check mistimed
  – C6: Wrong check on wrong object

Taxonomy - Selection & Comms.
• Selection modes are:
  – S1: Selection omitted
  – S2: Wrong selection made
• Information Communication modes are:
  – I1: Information not communicated
  – I2: Wrong information communicated
  – I3: Information communication incomplete
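If you want to work with the taxonomy programmatically – for example, to generate prompt checklists or to record analyses in a simple tool – the modes above map naturally onto a small lookup table. The sketch below simply transcribes the taxonomy into Python; the data structure and the helper function `modes_for` are illustrative choices, not part of SHERPA itself.

```python
# The SHERPA error-mode taxonomy, transcribed from the lists above so it
# can drive checklists or simple analysis tooling. The dict layout is just
# one convenient representation.
SHERPA_TAXONOMY: dict[str, dict[str, str]] = {
    "Action": {
        "A1": "Operation too long/short",
        "A2": "Operation mistimed",
        "A3": "Operation in wrong direction",
        "A4": "Operation too little/much",
        "A5": "Misalign",
        "A6": "Right operation on wrong object",
        "A7": "Wrong operation on right object",
        "A8": "Operation omitted",
        "A9": "Operation incomplete",
        "A10": "Wrong operation on wrong object",
    },
    "Retrieval": {
        "R1": "Information not obtained",
        "R2": "Wrong information obtained",
        "R3": "Information retrieval incomplete",
    },
    "Checking": {
        "C1": "Check omitted",
        "C2": "Check incomplete",
        "C3": "Right check on wrong object",
        "C4": "Wrong check on right object",
        "C5": "Check mistimed",
        "C6": "Wrong check on wrong object",
    },
    "Selection": {
        "S1": "Selection omitted",
        "S2": "Wrong selection made",
    },
    "Information Communication": {
        "I1": "Information not communicated",
        "I2": "Wrong information communicated",
        "I3": "Information communication incomplete",
    },
}

def modes_for(behaviour: str) -> dict[str, str]:
    """Return the candidate error modes an analyst should consider for a
    task classified under the given behaviour heading."""
    return SHERPA_TAXONOMY[behaviour]

if __name__ == "__main__":
    # e.g. the modes to consider for a task classified as "Selection"
    print(modes_for("Selection"))
```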
SHERPA Methodology
• SHERPA begins, like many HEI methods, with a Hierarchical Task Analysis (HTA)
• The ‘credible’ error modes are then applied to each of the bottom-level tasks in the HTA
• The analyst classifies each task as a behaviour, and then determines whether any of the error modes provided are credible
• Each credible error is then considered in terms of consequence, error recovery, probability and criticality

Step 1 - HTA
• Using the example of entering a destination on a sat-nav (the HTA breaks the goal down into its bottom-level tasks)

Step 2 - Task Classification
• Each task at the bottom level of the HTA is classified into a category from the taxonomy
• (In the sat-nav example, the bottom-level tasks are classified as Action, Action, Selection, Retrieval and Action)

Step 3 – Error Identification
• For the category selected for a given task, the credible error modes are selected and a description of each error is provided
  – Selection, “Wrong selection made” (S2): the user makes the wrong selection, clicking “point of interest” or something similar
  – Retrieval, “Wrong information obtained” (R2): the user reads the wrong postcode and inputs it

Step 4 – Consequence Analysis
• For each error, the analyst considers the consequences
  – The user makes the wrong selection, clicking “point of interest” or something similar … this would lead to the wrong menu being displayed, which may confuse the user
  – The user reads the wrong postcode and inputs it … depending on the validity of the entry made, the user may plot a course to the wrong destination

Step 5 – Recovery Analysis
• For each error, the analyst considers the potential for recovery
  – The user makes the wrong selection … there is good recovery potential from this error, as the desired option will not be available and back buttons are provided. It may take a few menus before the correct one is selected, though
  – The user reads the wrong postcode and inputs it … the recovery potential from this is fair, in that the sat-nav shows the duration and an overview of the route, so depending on how far wrong the postcode is, it may be noticed at that point

Steps 6, 7 – Probability & Criticality
• Step 6 is an ordinal probability analysis, where L/M/H is assigned to the error based on previous occurrence
  – This requires experience and/or subject-matter expertise
• Step 7 is a criticality analysis, which is done in a binary fashion (the error is either critical or it is not)

Step 8 – Remedy
• Step 8 is a remedy analysis, where error reduction strategies are proposed under the headings:
  – Equipment, Training, Procedures, Organisational
• For the sat-nav example:
  – Equipment: the use of the term ‘address’ may confuse some people when intending to input a postcode … as postcode is a common entry, perhaps it should not be beneath ‘address’ in the menu system
  – Procedures: the user should check the destination/postcode entered for validity; the device design could display the destination more clearly than it does, to offer confirmation to the user
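Pulling the eight steps together, one way to record a single row of a SHERPA analysis is as a structured entry, shown in the sketch below for the “wrong postcode” error from the sat-nav example. The `SherpaEntry` class, its field names and the probability/criticality values are illustrative assumptions, not a prescribed SHERPA format.

```python
from dataclasses import dataclass, field

@dataclass
class SherpaEntry:
    """One row of a SHERPA analysis: a bottom-level task from the HTA plus
    everything recorded about a single credible error mode."""
    task: str                    # bottom-level task from the HTA (step 1)
    behaviour: str               # taxonomy classification (step 2)
    error_mode: str              # credible error mode code (step 3)
    description: str             # what the error looks like (step 3)
    consequence: str             # step 4
    recovery: str                # step 5
    probability: str             # step 6: ordinal L / M / H
    critical: bool               # step 7: binary criticality
    remedies: dict[str, str] = field(default_factory=dict)  # step 8

# The 'wrong postcode' error from the sat-nav example; the probability and
# criticality values here are placeholders, not judgements from the slides.
wrong_postcode = SherpaEntry(
    task="Enter destination postcode",
    behaviour="Retrieval",
    error_mode="R2",
    description="The user reads the wrong postcode and inputs it",
    consequence="A route may be plotted to the wrong destination",
    recovery="Fair - the route duration and overview may reveal the mistake",
    probability="M",
    critical=False,
    remedies={
        "Equipment": "Do not bury postcode entry beneath 'address'",
        "Procedures": "User checks the displayed destination for validity",
    },
)

if __name__ == "__main__":
    print(f"{wrong_postcode.error_mode}: {wrong_postcode.description} "
          f"(probability {wrong_postcode.probability}, "
          f"critical={wrong_postcode.critical})")
```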
Output
• The output of a full SHERPA analysis is a table recording, for each bottom-level task, the credible error modes with their descriptions, consequences, recovery potential, probability, criticality and proposed remedies (Stanton et al. 2005)

Summary
• “Human error” is a misnomer – there are a number of different things that contribute to failures in interactive systems
• SHERPA is an alternative to other inspection methods
• Claims in the literature point to it being easier to learn and easier for novices to apply – which is attractive
• It is founded on some of the roots of human factors work done in the 1970s but …
• … simplifies that work into something that can be applied

References
• Rasmussen, J. (1982). Human errors, a taxonomy for describing human malfunction in industrial installations. The Journal of Occupational Accidents, 4, 22.
• Baber, C. & Stanton, N. A. (1996). Human error identification techniques applied to public technology: predictions compared with observed use. Applied Ergonomics, 27, 119–131.
• Stanton, N. (1998). Human Factors in Consumer Products. CRC Press.
• Salmon, P., Stanton, N., Young, M., Harris, D., Demagalski, J., Marshall, A., Waldman, T. & Dekker, S. (2002). Using existing HEI techniques to predict pilot error: a comparison of SHERPA, HAZOP and HEIST. HCI-02 Proceedings.
• Harris, D., Stanton, N. A., Marshall, A., Young, M. S., Demagalski, J. & Salmon, P. (2005). Using SHERPA to predict design-induced error on the flight deck. Aerospace Science and Technology, 9, 525–532.
• Stanton, N., Salmon, P., Walker, G., Baber, C. & Jenkins, D. (2005). Human Factors Methods. Ashgate.