GENERATION OF HUMAN RELIABILITY DATA FOR THE AIR TRAFFIC INDUSTRY

Barry Kirwan, EUROCONTROL
Eric Perrin, EUROCONTROL
Brian Hickling, EUROCONTROL
Huw Gibson, Birmingham University
Ed Smith, DNV Consulting

SUMMARY

Air Traffic Management (ATM) deals with the safe and efficient passage of aircraft across national and international airspace. In Europe, ATM, as with other industries, must now comply with formalized risk assessment procedures, for example those embodied in EUROCONTROL Safety Regulatory Requirements (ESARR) 4 [1]. In order to demonstrate that systems are acceptably safe, a Safety Assessment Methodology (SAM) has been proposed by EUROCONTROL [2] and is being applied by many countries in Europe. Safety cases for existing or new systems, or for significant system changes, can utilize fault and event tree (or equivalent) approaches to model and quantify risk, as is done in other industries such as nuclear power, chemical, process, and petrochemical. However, there has been little emphasis to date on Human Reliability Assessment (HRA) in the world of ATM, although it is recognized that the high degree of safety evident in this industry is mainly due to the human element (in particular the air traffic controller). This paper therefore concerns the feasibility of HRA in air traffic risk assessments.

As a first step towards HRA in ATM, this paper focuses on the degree to which quantitative human error data can be generated to substantiate or calibrate an ATM HRA approach. Two separate exercises are reported. The first concerns the collection of human error data from a real-time simulation involving air traffic controllers and pilots; this study focused on communication errors between controllers and pilots. The second relates to a formal expert judgment study using direct numerical estimation (also called Absolute Probability Judgment) and Paired Comparisons protocols to elicit and structure the controller and pilot expertise.

The results showed that stable HEPs can be provided from real-time simulations, at least with respect to communications activities, and to a lesser extent from expert judgment approaches. These results suggest that the HRA approach can be adapted to ATM safety case methodologies and frameworks. An example of a recent, developing air traffic safety case which has utilized the HRA approach is briefly discussed. The conclusion is that HRA is feasible, but that more data need to be collected, since ATM dynamics and safety scenario timings, as well as its operational culture and performance shaping factors, are different to other industries where HRA application is 'the norm'.

INTRODUCTION

Human Reliability Assessment (HRA) has been around for some time, notably since the Three Mile Island nuclear power plant accident in 1979, after which the Technique for Human Error Rate Prediction (THERP) [3] became the predominant technique in use. Since that time, the usage of HRA has spread to other process-control-related industries (petrochemical, chemical, and process). More recently it has also spread to the transportation sector, notably the rail industry, while the medical and air traffic management domains are increasingly focusing on the management and assessment of human error [4].
HRA has several main attributes [3, 5]: determining what can go wrong (human error identification), determining the risk significance of errors or of correct performance (the kernel of this function being human error quantification), and identifying how to mitigate human-related risks or assure safe human performance (error reduction). HRA can be seen as an engineering function (belonging principally to the sub-discipline of Reliability Engineering) or as a Human Factors function. In truth it belongs to both domains, and the people carrying out HRA are normally 'hybrid' practitioners with a mixture of reliability engineering and Human Factors/psychology expertise.

HRA is, however, most clearly associated with its second main function, namely the quantification of human error likelihood (i.e. how frequently will a particular error actually occur?). This likelihood is usually expressed as a probability, called a Human Error Probability (HEP), which is effectively the probability of an error per demand. In theory, and in practice, HRA rests upon the fundamental premise that HEPs can be quantified, i.e. that they exist as stable quantitative values. The quantitative expression for an HEP is straightforward and is shown below, based on the idea that an HEP can be measured by observation (see Footnote 1):

HEP = number of errors observed / number of opportunities for error

[Footnote 1] Assuming that errors occur in a stochastic fashion, for example following a Poisson distribution. The confidence limits around such HEPs of course depend on the respective sizes of the numerator and denominator, with more observations leading to higher confidence and narrower uncertainty bounds around the resultant HEP.
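To make the definition above and the confidence-limit caveat in Footnote 1 concrete, the following minimal sketch (illustrative only, not part of the original studies) computes an HEP and an approximate 95% binomial interval using the Wilson score method, taking as input the controller slip counts that are reported later in Table 5 of Study 1.

```python
import math

def hep_with_wilson_ci(errors: int, opportunities: int, z: float = 1.96):
    """Point estimate of an HEP plus an approximate 95% Wilson score interval."""
    p = errors / opportunities
    denom = 1.0 + z * z / opportunities
    centre = (p + z * z / (2 * opportunities)) / denom
    half_width = (z / denom) * math.sqrt(
        p * (1 - p) / opportunities + z * z / (4 * opportunities ** 2)
    )
    return p, max(centre - half_width, 0.0), centre + half_width

# Controller slips from Table 5 of Study 1: 30 slips in 1705 communication elements.
hep, lower, upper = hep_with_wilson_ci(30, 1705)
print(f"HEP = {hep:.4f}, approx. 95% CI = ({lower:.4f}, {upper:.4f})")
```

As the footnote indicates, the interval narrows as the number of observed opportunities grows, which is one reason large observation sets such as real-time simulations are attractive sources of HEP data.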
Whenever HRA is being applied to a new industry, it must map onto the risk or safety management framework, and indeed the organizational 'culture', of that industry. Where such an industry already has a quantitative approach to risk (e.g. nuclear power), HRA may 'find its niche' quickly. Conversely, where risk is managed more qualitatively, it may encounter resistance and incredulity from those who need to be convinced in order for it to be utilized effectively. Thus, HRA in the medical domain is at the moment mainly being used qualitatively (for error identification and reduction purposes). In ATM, however, the whole industry in Europe has been undergoing a paradigm shift in safety assessment, moving from a more 'implicit' and qualitative approach to a formalized methodological framework with quantitatively stated target levels of safety that safety cases need to reach. This leads to formal use of techniques such as fault and event trees, and in some cases dynamic risk assessment approaches. The former approaches in particular will need HEPs if human error plays a significant role in ATM safety. In fact, human performance dominates safe system performance in ATM (controllers literally control where aircraft go, in real time, using radar and radio-telephony), and so it should be a major component in any safety case. Nevertheless there is some skepticism in the industry as to whether HRA is meaningful and, in particular, whether HEPs are a stable entity.

Since ATM is significantly different to industries where HRA is accepted (ATM is very dynamic and its human elements operate in a much faster timeframe than, say, nuclear power), the argument that 'HRA works and is accepted elsewhere' is not sufficient. As a preliminary step to developing an HRA methodology for ATM, it was therefore decided to explore whether HRA's basic premise, namely that HEPs can be quantified and remain stable, also holds in ATM. However, ATM's culture is more judgment-based than scientifically based (again, unlike nuclear power, for example). Many of those in key management positions today were once controllers themselves and would be more convinced by controllers saying that HRA's estimates were reasonable than by reading a scientific or academic report.

Therefore a two-pronged approach was taken to explore the applicability of HRA to ATM. The first element was to determine whether HEPs could be collected in realistic (high-fidelity) real-time simulations of ATM operations (see Footnote 2). The second was to carry out expert judgment procedures to quantify HEPs for an ongoing ATM safety case, using controllers and pilots as subjects. The hypotheses were simple, and capable of being rejected (i.e. scientifically they are 'falsifiable'):

1. Can robust HEPs be elicited for ATM safety-related tasks?
2. Are HEPs, as they may be used in safety cases, credible to controllers and pilots?

If the answer to either is 'no', then this bodes badly for HRA in this work domain, and perhaps other approaches to managing human-related risk must be investigated. If the answer to both is 'yes', then HRA in ATM is a feasible proposition. That does not of course mean the road to full acceptance and implementation will be easy, or even assured, but such questions are beyond the scope of this paper, which seeks only to establish feasibility at this early stage of HRA in ATM.

[Footnote 2] The pilot side of the simulation is of lower fidelity: 'pseudo-pilots' are used, i.e. trained pilots sitting at a special terminal with scripts and direct radio communications with the air traffic controllers.

The remainder of this paper therefore presents the abridged results of the two studies, showing that the premise for HRA in ATM does indeed appear to be supported. References to the full studies are given. A final concluding section outlines a way forward for developing HRA as an effective approach in ATM.

STUDY 1: HEP DATA COLLECTION IN REAL-TIME SIMULATION (CO-SPACE) [6]

The primary principle in air traffic management is to keep aircraft separated by certain minimum distances, both vertically and horizontally. This is currently, in general, the controllers' task when dealing with civil/commercial traffic. The task can lead to high workload in certain high-density areas, as different 'streams' of aircraft approach busy airports. One option is therefore to allow the crew of one aircraft some degree of autonomy in separating their aircraft from the one in front, for example via the use of specialized cockpit equipment. In the EUROCONTROL Co-Space project, a new allocation of spacing tasks between controller and flight crew is accordingly envisaged as one possible option to improve air traffic management. It relies on a set of new 'spacing' instructions, whereby the flight crew can be tasked by the controller to maintain a given spacing to a target aircraft. The motivation is neither to 'transfer problems' nor to 'give more freedom' to flight crew, but to identify a more effective task distribution beneficial to all parties, without modifying responsibility for separation provision. In Co-Space the airborne spacing assumes the availability of Automatic Dependent Surveillance-Broadcast (ADS-B) along with cockpit automation (the Airborne Separation Assistance System, ASAS).
ASAS is a set of new ATC instructions that allow, under the right conditions, the delegation of separation from ATC to pilots. No significant change to ground systems is initially required. These procedures and systems are under development in the Co-Space project, and in parallel a number of extensive real-time simulations (RTS) are being conducted to evaluate the adequacy of the resulting system performance. These RTS are carried out to assess the usability and usefulness of time-based spacing instructions in the TMA under very high traffic conditions, with and without the use of spacing instructions. In the pursuit of HRA feasibility, and also because one day the airborne separation assurance concept may be the subject of its own quantified safety case, it was agreed by the project team that HEP data collection could be attempted during the simulation.

The Real Time Simulation involved Approach controllers from Gatwick, Orly and Roma. They employed two generic approach sectors derived from an existing environment (the Paris Terminal Maneuvering Area, or TMA). Each air traffic control 'sector' (an area of airspace) fed a single landing runway airport and was controlled by a single Approach position manned by an executive and a planning controller. The role of each executive was to integrate two flows onto the final approach and to transfer them to Tower controllers. Seven air traffic controller (ATCO) positions, including the TMA one, were used for each one-hour simulated session. The traffic was pre-sequenced when entering each approach sector via two initial approach 'fixes', and followed standard trajectories. No departure traffic and no 'stacks' (vertical holding points, as used at some airports) were simulated. The RTS utilized paper (rather than the new electronic) strips and a separate arrival manager tool. Controllers talked to pilots using standard radio-telephony headsets.

Results

During the simulation, a total of 613 communication 'transactions' between controllers and pilots were analyzed, containing 3,411 communication elements and a number of errors. Table 1 shows the types of error made and by whom (controller or pilot), and Table 2 shows how the errors were recovered. This typology is in the context of air traffic operations. Table 3 shows the types of error that occurred in more general terms; such information is useful when trying to determine how to improve human performance. Table 3 shows in particular that simple numerical errors are common. This is seen in practice when two aircraft with similar call signs occur in the same controller's airspace in real life, leading to what is called a 'call sign confusion' error. Such an error can lead to a loss of safe standard separation distance between aircraft if the controller gives the right message to the wrong aircraft. A number of airlines today work hard with the ATM community to try to prevent similar call signs occurring in the same sector of airspace, to reduce the frequency of this type of error and so avoid its potential consequences.

Table 4 shows errors with and without ASAS. Chi-square analysis was undertaken to identify whether there were any significant differences between sessions where ASAS was used and those where ASAS was not used. No significant differences were identified. An equivalent analysis was also undertaken to identify whether transactions which contained an ASAS instruction were more susceptible to communication errors than those which did not. Again, the analysis did not identify any significant differences between the two cases. This suggests that ASAS usage does not impact on the likelihood of communication errors.
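The exact contingency layout used in the study is not reported here, so the following is only an illustrative sketch: it assumes a simple 2x2 table of erroneous versus error-free communication elements for the ASAS and non-ASAS sessions, using the counts reported in Table 4, and it reproduces the 'no significant difference' finding.

```python
from scipy.stats import chi2_contingency

# Counts taken from Table 4: errors and total communication elements per session type.
asas_errors, asas_elements = 71, 1921
no_asas_errors, no_asas_elements = 47, 1490

# Assumed 2x2 layout: rows = session type, columns = (erroneous, error-free) elements.
observed = [
    [asas_errors, asas_elements - asas_errors],
    [no_asas_errors, no_asas_elements - no_asas_errors],
]

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, p = {p_value:.2f}, dof = {dof}")
# A p-value well above 0.05 here is consistent with the reported finding of no
# significant difference between sessions with and without ASAS.
```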
Table 5 shows errors by controllers and by pilots (in this case 'pseudo-pilots', who are nevertheless actual qualified pilots, but working at a computer simulator workstation rather than in a cockpit or cockpit simulator). The error rates are strikingly similar even though the task environment and training are very different.

Table 1. Errors during the ASAS Real Time Simulation
Error type | Controller | Pseudo-pilot | Total | Percentage
Slip | 30 | 31 | 61 | 52%
No read back | | | 4 | 3%
No response | | | 2 | 2%
Contradict previous instruction | | | 2 | 2%
Query | 10 | 1 | 11 | 9%
Context required | 8 | 4 | 12 | 9%
Use of non-English language | | | 2 | 2%
Change of plan | | 17 | 17 | 14%
Break | | | 2 | 2%
Station calling | | | 3 | 3%
Expedite | | | 2 | 2%
Total | 55 | 63 | 118 | 100%

Table 2. How errors were recovered
Error type | None | Self | Other | Later | Not identified | Total
Slip | 8 | 44 | | | | 61
No read back | | | | | | 4
No response | | | | | | 2
Contradict previous instruction | | | | | | 2
Query | | | | | | 11
Context required | | | | | | 12
Use of non-English language | | | | | | 2
Change of plan | | | | | | 17
Break | | | | | | 2
Station calling | | | | | | 3
Expedite | | | | | | 2
Total | | 81 | | | | 118

Table 3. Details of error nature
Slip type | Frequency | Percentage
Incorrect numeric element within a numeric (e.g. 516 for 515) | 41 | 67%
Whole numeric substituted for another (e.g. say 567 for 123) | 1 | 2%
Numeric omission (e.g. say 1233 for 12335) | 1 | 2%
Phonetic alphabet | 3 | 5%
Company identifier (e.g. Britannia for Ryanair) | 5 | 8%
Pilot read back of controller use of 'please' | 1 | 2%
Repetition of phrases or call signs | 4 | 7%
Errors in words/sequences in a standard phrase (e.g. 'its er two nine er flight level two nine zero') | 5 | 8%
Total | 61 | 100%

Table 4. Communication errors and use of ASAS
| Session used ASAS | Session did not use ASAS
All errors | 71 | 47
Slips | 39 | 22
Number of elements | 1921 | 1490
Likelihood of error | 0.0370 | 0.0315
Likelihood of slips | 0.0203 | 0.0148

Table 5. Controller versus pseudo-pilot slip rates
| Controllers | Pseudo-pilots
Slips | 30 | 31
Number of elements | 1705 | 1706
Likelihood of slips | 0.018 | 0.018

How do the data compare with data collected in the field? The data from this study were also compared against data from actual studies of different UK and US airspace types (i.e. studies which have measured human error rates in the field). The comparison shows very similar human error probabilities, suggesting that the communication performance in the trial is similar to that experienced during live Air Traffic Control.

This study as a whole found that HEPs can be collected, and that they appeared to exhibit the property of stability. As a key finding, for example, the likelihood of communication slips (e.g. saying '5 4 6' when '5 6 4' was intended) was shown to be constant across a range of conditions. Differences were not identified between error rates with and without ASAS, between ASAS and other communications, between different instruction types, between different controllers, or between controllers and pseudo-pilots. These results therefore support the required premise for HRA, at least for the task of communication, which is itself a safety-critical one in the ATM industry. A separate study [7] presents unrecovered readback error rates for ATC communication of 0.006 from a number of field studies; the Co-Space simulator data provide an unrecovered readback error rate of 0.003.
While low cell counts prohibit statistical comparisons, these data are certainly in the same 'ball park', and a tentative conclusion is that the performance in the simulation is comparable with data collected in the field.

STUDY 2: HEP DATA GENERATION USING EXPERT JUDGMENT: THE GBAS CAT-I SAFETY CASE

(GBAS: Ground-Based Augmentation System)

CAT-I/II/III operations at European airports are presently supported by Instrument Landing Systems (ILS). The continued use of ILS-based operations, for as long as it is operationally acceptable and economically beneficial, is promoted by the European Strategy for the planning of All Weather Operations (AWO). However, in the ECAC (European Civil Aviation Conference) region, the forecast traffic increase will create major operational constraints at all airports, in particular in Low Visibility Conditions (LVC), when runway capacity is decreased. Consequently, the technical limitations of ILS, such as Very High Frequency (VHF) interference, multipath effects due to, for example, new building works at and around airports, and ILS channel limitations, will be a major constraint on its continued use. Within this context GBAS is expected to maintain existing all-weather operations capability at CAT-I/II/III airports. GBAS CAT-I (ILS look-alike operations) is seen as a necessary step in order to extend its use to the more stringent operations of CAT-II/III precision approach and landing. Initial implementation of GBAS could be achieved in ECAC as early as 2008. A safety case is therefore being prepared for GBAS to see if it can be implemented.

Within this developing safety case a number of potentially critical human errors were identified in the associated fault and event trees. No real-time simulation for GBAS has yet occurred, and its use is different from that of ILS. Furthermore, since ILS was implemented a long time ago, prior to the current safety paradigm, there was no safety case for ILS with which to compare the identified human errors. Consequently, since there were no prior identified HEPs, none available from the real world (GBAS is not yet implemented) and few relevant ones from ILS operation, it was decided to attempt to use expert judgment approaches to quantify some of the key HEPs for the GBAS safety assessment. In HRA a number of approaches are recommended for using expert judgment [4, 8]. This study chose two methods which have previously been used successfully in a new HRA application area (offshore petrochemical) [9]: Absolute Probability Judgment and Paired Comparisons, as outlined below. In addition, the Human Error Assessment and Reduction Technique (HEART) [10] was applied, both to address those HEPs that were not tackled by the APJ session and to overlap with the APJ figures so that effective cross-checking between the techniques could be conducted.

Absolute Probability Judgment (APJ)

Absolute Probability Judgment [4, 8] is the most direct approach to the quantification of human error probabilities (HEPs). It relies on the use of experts to estimate HEPs based on their knowledge and experience. The method used in this project was the Consensus Group Method, following these steps:

1. Task statements were prepared in the form of a booklet, with room for individuals to enter their estimate for each task plus any key assumptions they made.
2. During the session, each scenario and each task was explained to the experts.
3. The experts were then given time to enter their individual estimate for each task.
4. A discussion was held in which each expert gave a view, usually starting with those who had given the most extreme (high and low) estimates.
5. The group was then facilitated in order to try to obtain a consensus value. If that was not possible, each individual was asked to revisit their original estimate in light of the discussion and revise it if necessary.
6. Following the session, all the booklets were collected and a list was made of the consensus values and of the aggregated individual estimates (where consensus failed).

A dry-run session was held with three ex-controllers at the EUROCONTROL Experimental Centre (EEC) on 18th August 2004. This session was intended to test out some of the task questions and the general method. It provided a valuable indication of the time necessary for such sessions in an ATM context, and led to a revised question set making the language more appropriate for controllers and pilots. Prior to the full session, briefing material was sent to all participants explaining the objectives and the format of the exercise. The full expert session was held on 30th November and 1st December 2004 at the EUROCONTROL Experimental Centre in Bretigny, south of Paris (France). The first day was dedicated to pilot estimates of error probabilities; the second day focussed on estimates by controllers and a ground system maintenance specialist. In the session itself, prior to beginning the process described above, an introduction and a warm-up exercise were carried out. The introduction gave a brief history of the method and explained how the session would be run. The warm-up exercise involved using the templates to estimate errors involving the use of a car. This helped the specialists understand how the process worked.

Paired Comparison (PC)

This technique does not require experts to make any quantitative assessments. Rather, the experts are asked to compare a set of pairs of tasks, and for each pair the expert must decide which has the higher likelihood of error. In the context of this study the technique was used to test 'within-judge consistency'. Experts may exhibit internal inconsistencies that show up as 'circular triads' (this is where, for example, an expert successively says that A is greater than B, B is greater than C, and C is greater than A, effectively saying that A is greater than itself). It is therefore necessary to determine the number of circular triads and decide whether this is so high that the results should be rejected; there are mathematical approaches to determining how many triads are 'allowable' before the judgment is considered unsound [8]. In the dry run, seven questions were used, giving a total of 21 pair comparisons to be made (see Footnote 3). The comparisons were randomized for each subject. The participants were asked to answer the questions without delay and without reference back to previous answers. Only three questions were put on each page of the answer booklet to reduce the chance of referring back. The three participants found the exercise reasonably straightforward and no significant changes were identified for the full session itself.

[Footnote 3] The combinations involve comparing every question with every other one and removing the double counting of order. The number is calculated as n(n-1)/2, giving 21 in the case of 7 questions and 15 in the case of 6 questions.

Results

Paired Comparisons

Applying the Paired Comparisons calculation method, one person from the pilot group and one person from the controller group were 'screened out' (too many circular triads). The criterion chosen was that if there was more than a 10% chance that the preferences had been allocated purely at random, then the results were set to one side. Thus, for example, in the case of the pilot group, if the number of circular triads (c) was 3 or more then there was a greater than 10% chance that the ordering was random.
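For illustration only (this is not code from the study): circular triads can be counted with the standard Kendall approach, in which each task's number of 'wins' across the pairwise judgments is tallied and the count of circular triads is the total number of triads minus the transitive ones, i.e. c = C(n,3) - sum over tasks of C(wins, 2). The sketch below assumes the judgments are supplied as a dictionary keyed by unordered task pairs.

```python
from itertools import combinations
from math import comb

def circular_triads(tasks, preferred):
    """Count circular triads (A > B, B > C, C > A) in one judge's paired comparisons.

    tasks:     list of task labels, e.g. ["T1", ..., "T7"] for the 21-pair dry run
    preferred: dict mapping frozenset({a, b}) to the task judged more error-prone
    """
    wins = {task: 0 for task in tasks}
    for a, b in combinations(tasks, 2):
        wins[preferred[frozenset((a, b))]] += 1
    n = len(tasks)
    # Kendall's count: all possible triads minus the transitive (consistent) ones.
    return comb(n, 3) - sum(comb(w, 2) for w in wins.values())

# Toy example with three tasks: a fully circular set of judgments gives c = 1,
# whereas any transitive ordering of the three tasks gives c = 0.
tasks = ["T1", "T2", "T3"]
judgments = {
    frozenset(("T1", "T2")): "T1",
    frozenset(("T2", "T3")): "T2",
    frozenset(("T1", "T3")): "T3",
}
print(circular_triads(tasks, judgments))  # prints 1
```

A judge whose count exceeded the threshold derived from the random-allocation test (c of 3 or more for the seven-question pilot set, as noted above) would then be screened out.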
Absolute Probability Judgment

It proved difficult for the participants to arrive at a consensus when directly estimating HEPs. Following discussion, a number of participants were willing to change their HEP values (e.g. if they were presented with new information, or if they had misunderstood an aspect of the context of the error). However, this very rarely led to a consensus across the whole group. Therefore aggregation of the individual estimates was required, and geometric means were produced. The geometric mean is obtained by multiplying the 'n' probabilities together and then taking the nth root. This approach is used because an arithmetic mean of probabilities spanning several orders of magnitude is biased towards the higher probabilities (consider, for example, the arithmetic and geometric means of the values 0.1, 0.01 and 0.001). A median can also be used, but in this case the geometric mean approach was chosen.
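A quick check of the example values mentioned above (purely illustrative, not data from the study):

```python
from statistics import geometric_mean

estimates = [0.1, 0.01, 0.001]

arithmetic = sum(estimates) / len(estimates)  # 0.037: dominated by the largest estimate
geometric = geometric_mean(estimates)         # 0.010: the middle order of magnitude

print(f"arithmetic mean = {arithmetic:.3f}, geometric mean = {geometric:.3f}")
```

The geometric mean weights each order of magnitude equally, which is the motivation given above for preferring it to the arithmetic mean when expert estimates span several decades.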
Some excerpts from the results are shown in Table 6. In several cases the ranges between the maximum and minimum estimates were too large, so that little confidence could be attached to the aggregated values. In other cases, cross-checks against the little available historical data (input and output) indicated under-prediction (optimism) from APJ. The values refer to events in the risk model that need to be quantified. They relate to the following three types of errors:

C1: Capturing false information about the final approach path
D1: Failure to maintain the aircraft on the final approach path
F1: Selecting the wrong runway

It can be seen that there was a considerable range in the estimates (estimates within a factor of ten are desirable). There were several factors contributing to this. First, it was a diverse group, and they were of varied experience. Second, their familiarity with GBAS was also variable. Third, expert judgment work was new for all these experts, and there was only one day to get used to the process and make the assessments. Perhaps the most significant factor, however, for both the pilot and the controller sub-groups, was the detailed nature of the assessments required. The 'granularity' of the assessments, namely specific errors in very specific contexts, was more precise than they were used to. In particular, the experts needed significant contextual information. Furthermore, there was a tendency for the experts in both groups to think in terms of failure scenarios rather than concentrating on a single failure within a cutset, where such a failure on its own would be unlikely to lead to complete failure. This appeared to make the experts optimistic, since they believed the numbers they were quantifying would mean system failure, rather than a contribution (i.e. necessary but not sufficient) towards system failure.

Table 6. Pilot APJ session, extract of results (excluding the expertise discarded via the PC test)
Potential error (code in risk model) | Maximum | Minimum | Range | Geometric mean
C1a | 1.1E-03 | 2.0E-05 | 55 | 2.1E-04
C1b | 2.5E-04 | 1.0E-05 | 25 | 3.5E-05
D1 | 1.0E-03 | 1.0E-04 | 10 | 4.3E-04
F1a | 4.0E-04 | 1.0E-05 | 40 | 6.9E-05
F1b | 1.0E-03 | 1.0E-04 | 10 | 4.0E-04
F1c | 1.0E-03 | 1.0E-04 | 10 | 4.6E-04

The lessons learned from this study were several. First, more time is needed, both to get used to the process and to fully understand what is being quantified. Experts need to understand when HEPs are effectively conditional and only part of a cutset, not leading to total failure. This requires more time spent understanding the fault or event tree, and suggests that a basic risk assessment appreciation is needed, which could be provided during an extended warm-up or training period. A further lesson is the need for true experts, e.g. current pilots and controllers (not retired ones or very new ones), in sufficient numbers to gain evidence of statistical clustering of values around a mean HEP value. A final lesson learned is that having some existing data, and using it during the training session, acts as a good means of 'calibration' of the experts.

Despite the difficulties, the overall process for both controllers and pilots showed that some good estimates could be provided and used in actual safety cases (in several cases experts were asked to quantify HEPs or other reliability estimates for which the true observed values were known, and there was a good degree of agreement by some of the experts). The experts themselves also 'warmed' to the process, seeing the benefits of being involved and also the necessity of carrying out such quantitative analysis. They also felt it important to have their expertise informing such quantitative HRA, rather than leaving it to safety analysts who may not understand the real operational context and the factors that influence pilot and controller behavior. In short, they saw their role in HRA.

The developing GBAS safety case (currently at the Preliminary System Safety Assessment [PSSA] stage [11]) has undergone a favorable review in the industry. The GBAS PSSA has used a judicious mixture of actual data, expert judgment, and a non-industry-specific HRA technique (HEART [10]) to address the system's key human involvements. The benefits for the safety case, in terms of a better and more realistic representation of human performance and the resulting specific safety requirements on the future GBAS system, have vindicated the approach. In particular, the HRA approach as a whole led to qualitative safety requirements that were judged as necessary by the Air Navigation Service Provider stakeholders. The mechanics, benefits and limitations of practical HRA in ATM have therefore been demonstrated, and as a whole the approach has led to added safety value.

CONCLUSIONS & FUTURE DIRECTIONS

This paper has concerned itself with the feasibility of HRA in ATM. Two studies have lent support to a fundamental premise of HRA, namely that the HEP exists and can be a stable entity, and have shown that the main 'players' in ATM (pilots and controllers) can see the benefits of HRA and their role with respect to it. These results suggest that HRA can indeed function usefully in ATM. In order to realize an integration of HRA into ATM safety assessment practice, several things must happen. First, more data must be elicited, for data calibration purposes and to extend the premise of HRA beyond communications.
Second, there must be a better understanding of the contextual factors and their role in influencing human performance. These factors, known generically in HRA as Performance Shaping Factors, may differ in ATM compared to other industries. Work is now ongoing in EUROCONTROL to achieve a better understanding of such factors based on the analysis of incidents. This work will lead to a database of some HEPs and associated contextual factors, which will help assessors know the approximate 'ballpark' value for a specific controller task type, and what factors they should consider when carrying out assessments for new or existing systems. Ultimately, a specific HRA technique should probably be developed for ATM, since such databases will never be large enough to answer all questions, and may be too 'rooted' in current operations to enable sufficient extrapolation to next-generation ATM concepts.

In conclusion, therefore, HRA is feasible in ATM and can add valuable insights in ATM safety cases. There remains much work to be done to migrate from a state of 'feasibility' to one of full integration of HRA into the safety and risk management of current and future ATM systems. But the first steps have been taken.

ACKNOWLEDGMENTS & DISCLAIMER

The authors would like to express their thanks to the Co-Space team and the GBAS participants for their invaluable contributions to this study, and to Oliver Straeter, who is now working with the team on the further development of a data-informed HRA approach for EUROCONTROL. The opinions expressed in this paper are, however, those of the authors and do not necessarily reflect those of the organizations mentioned above or of affiliated organizations.

REFERENCES

[1] EUROCONTROL, 2001, ESARR 4: Risk Assessment and Mitigation in ATM, Edition 1.0, 5th April 2001. http://www.eurocontrol.int/src/gallery/content/public/documents/deliverables/esarr4v1.pdf
[2] EUROCONTROL, Safety Assessment Methodology guidance material: http://www.eurocontrol.int/safety/GuidanceMaterials_SafetyAssessmentMethodology.htm
[3] Swain, A.D. and Guttmann, H.E., 1983, Human Reliability Analysis with Emphasis on Nuclear Power Plant Applications, NUREG/CR-1278, US Nuclear Regulatory Commission, Washington DC 20555.
[4] Kirwan, B., Rodgers, M., and Schaefer, D. (Eds.), 2005, Human Factors Impacts in Air Traffic Management, Ashgate Publishing, Aldershot.
[5] Kirwan, B., 1994, A Guide to Practical Human Reliability Assessment, Taylor & Francis, London.
[6] Gibson, W.H. and Hickling, B., 2006 (in press), Feasibility Study into the Collection of Human Error Probability Data, EEC Report, EUROCONTROL Experimental Centre, Bretigny-sur-Orge, BP 15, F-91222 CEDEX, France.
[7] Gibson, W.H., Megaw, E.D., Young, M., and Lowe, E., 2006, "A Taxonomy of Human Communication Errors and Application to Railway Track Maintenance", Cognition, Technology and Work.
[8] Seaver, D.A. and Stillwell, W.G., 1983, Procedures for Using Expert Judgment to Estimate Human Error Probabilities in Nuclear Power Plant Operations, NUREG/CR-2743, Washington DC 20555.
[9] Basra, G. and Kirwan, B., 1998, "Collection of offshore human error probability data", Reliability Engineering and System Safety, 61, 77-93.
[10] Williams, J.C., 1986, "HEART: A Proposed Method for Assessing and Reducing Human Error", Proceedings of the 9th Advances in Reliability Technology Symposium, University of Bradford.
[11] EUROCONTROL, 2005, Category-I (CAT-I) Ground-Based Augmentation System (GBAS) PSSA Report, v1.0, 18th November. Contact: eric.perrin@eurocontrol.int