Confidence Experiment
Münster, EWICS Workshop, 28/04/04
Eugenio Alberdi, Meine van der Meulen, Robin Bloomfield, Bev Littlewood, Peter Ayton
DIRC Easter Meeting – 16 March 2005

Motivation
‘Dependability Cases’ TA:
• Uncertainty in quantitative probabilistic claims for dependability is always present
• Need for a rigorous and formal understanding of the role of confidence in dependability arguments
Goals of the exercise:
• Investigate how experts express confidence in dependability judgements that support a claim at a particular quantitative level AND at a particular level of confidence
• Elicit the distributions underlying experts’ own beliefs
• Look at the distribution across a population of experts
• See the impact of empirical data on modelling work
Psychology: studies of human confidence in probabilistic judgements
• e.g. overconfidence and incoherent judgements
• influence of how information is presented
• not normally studied with experts

Procedure
• Presentation from Meine on the application
• Participants asked to:
  • choose the ‘pfd interval’ (probability of failure on demand) in which the described functionality fits best
  • state their confidence in the different ‘pfd intervals’ for that application
• 4 phases:
  I: after hearing Meine’s presentation
  II: after getting an answer to the one question each participant is allowed to ask
  III: after hearing Meine’s answers to all the questions asked by the other participants
  IV: after a “Delphi”-like interaction amongst participants

Application Domain
[Figure: schematic of the application, labelled: hoist, nuclear material, transfer port]
• Safety function: if the safety pushbutton is pressed, the carousel stops moving
• Goal: assess the pfd of the (new) software that controls the safety function
• Information available: test reports, safety analyses, formal proof

Participants
• 12 delegates at the EWICS meeting; 3 left early
• Remaining 9 participants, by self-reported experience in the assessment of safety software:
  • 3 “not very experienced”: 2 researchers; 1 security & business development
  • 3 “fairly experienced” (5-10 years): 2 researchers; 1 researcher and assessor
  • 3 “very experienced” (10-30 years): 1 researcher; 1 assessor; 1 researcher and assessor
• Countries: Austria, Germany, Norway, Poland, UK

Mean Confidence Values (9 participants)
[Figure: mean confidence (%) in each pfd interval, >10^-1 (highest pfd), 10^-1 to 10^-2, 10^-2 to 10^-3, 10^-3 to 10^-4, 10^-4 to 10^-5, <10^-5 (lowest pfd), for Phases I to IV]

Chosen Interval - Confidence
[Figures (two panels): each participant’s chosen pfd interval in Phases I to IV, annotated with the confidence (%) stated for that interval]

More experienced (3 participants)
[Figure: mean confidence (%) in each pfd interval, from >10^-1 (highest pfd) to <10^-5 (lowest pfd)]

Less experienced (3 participants)
[Figure: mean confidence (%) in each pfd interval, from >10^-1 (highest pfd) to <10^-5 (lowest pfd)]

Some conclusions
• Feasible task: a mixture of cooperation and resistance
• Considerable variability among participants: expertise levels, backgrounds, assessments, confidence levels, proneness to change their minds…
• Less experienced participants were less likely to change their minds
• More experienced participants tended to perceive the system as less reliable (higher pfd)
• Scepticism about the system’s reliability increased as the study evolved and more information became available
• Participants were most likely to change their judgement in the 4th (“Delphi”) phase of the experiment

Future Work
• Further analysis of the collected data:
  • correspondence between “qualitative” & “quantitative” confidence
  • implications for modelling
• Further data collection:
  • a better screened & larger set of experts
  • control: different groups getting
    • different types of information? (e.g. better safety arguments, a less “ideal” system)
    • or information presented in different ways?
  • refine some of the procedures:
    • collection of think-aloud protocols?
    • more open-ended?
    • how questions are asked…
• Psychological literature on people’s confidence in uncertain judgements
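The mean-confidence charts summarise, for each phase, how much confidence participants placed in each of the six pfd intervals. The Python sketch below shows one way such a summary could be computed. It assumes each bar is the arithmetic mean of the participants’ stated percentages, that an interval includes its upper bound (so a pfd of exactly 10^-1 falls in “10^-1 to 10^-2”), and that a coherent response assigns non-negative confidences summing to 100%, which is one simple way to flag the incoherent judgements mentioned under Motivation. The names `PFD_INTERVALS`, `interval_index`, `is_coherent` and `mean_confidence` are hypothetical, and the sample responses are invented for illustration; they are not data from the experiment.

```python
from statistics import fmean

# The six pfd intervals, ordered from highest to lowest pfd.
# Each entry is (label, lower_bound); an interval covers (lower_bound, upper_bound],
# where upper_bound is the lower bound of the previous entry (1.0 for the first).
# Hypothetical representation, not taken from the study materials.
PFD_INTERVALS = [
    (">10^-1", 1e-1),
    ("10^-1 to 10^-2", 1e-2),
    ("10^-2 to 10^-3", 1e-3),
    ("10^-3 to 10^-4", 1e-4),
    ("10^-4 to 10^-5", 1e-5),
    ("<10^-5", 0.0),
]


def interval_index(pfd: float) -> int:
    """Return the index of the pfd interval containing `pfd`.

    Assumption: upper bounds are inclusive, e.g. pfd == 1e-1 maps to index 1.
    """
    if not 0.0 <= pfd <= 1.0:
        raise ValueError("pfd must be a probability in [0, 1]")
    for i, (_, lower) in enumerate(PFD_INTERVALS[:-1]):
        if pfd > lower:
            return i
    return len(PFD_INTERVALS) - 1  # "<10^-5", including pfd == 0


def is_coherent(confidences, tol=1e-6):
    """True if one participant's confidences (in %) form a distribution over the intervals."""
    return (
        len(confidences) == len(PFD_INTERVALS)
        and all(c >= 0 for c in confidences)
        and abs(sum(confidences) - 100.0) <= tol
    )


def mean_confidence(responses):
    """Mean confidence (%) per interval across participants, for one phase."""
    if any(len(r) != len(PFD_INTERVALS) for r in responses):
        raise ValueError("each response needs one value per pfd interval")
    # zip(*responses) groups the values by interval (one column per interval).
    return [fmean(column) for column in zip(*responses)]


if __name__ == "__main__":
    # Hypothetical responses for one phase (NOT data from the experiment).
    phase = [
        [0, 10, 60, 30, 0, 0],
        [0, 40, 45, 15, 0, 0],
        [0, 10, 30, 30, 15, 0],  # sums to 85%: an incoherent judgement
    ]
    print(mean_confidence(phase))             # [0.0, 20.0, 45.0, 25.0, 5.0, 0.0]
    print([is_coherent(r) for r in phase])    # [True, True, False]
```

A plain arithmetic mean weights every participant equally, whatever their experience; running `mean_confidence` separately on the responses of each self-reported experience group would give per-group summaries like the “More experienced” and “Less experienced” charts.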