Confidence Experiment
Münster, EWICS Workshop, 28/04/04

Eugenio Alberdi, Meine van der Meulen, Robin Bloomfield, Bev Littlewood, Peter Ayton
DIRC Easter Meeting – 16 March 2005
Motivation

‘Dependability Cases’ TA:
• Uncertainty in quantitative probabilistic claims for dependability is always present
• Need for a rigorous and formal understanding of the role of confidence in dependability arguments



Goals of the exercise:
• Investigate how experts express confidence in dependability judgements:
  - to support a claim at a particular quantitative level AND at a particular level of confidence
  - elicit the distributions underlying experts’ own beliefs
  - look at the distribution of beliefs across a population of experts
  - see the impact of empirical data on modelling work
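Supporting a claim "at a particular quantitative level AND at a particular level of confidence" can be made concrete with a small calculation. The following is a minimal Python sketch, not the method used in the experiment. It assumes an expert's confidence is spread over the six pfd intervals used in the exercise and sums to 100% (written here as 1), and that each interval is half-open, [lower, upper). The function names `confidence_pfd_below` and `claim_supported` and the example numbers are hypothetical.

```python
# Hypothetical sketch (not the method used in the experiment).
# The six pfd intervals from the exercise, ordered from highest to lowest pfd.
INTERVALS = [">1e-1", "1e-1 to 1e-2", "1e-2 to 1e-3",
             "1e-3 to 1e-4", "1e-4 to 1e-5", "<1e-5"]

# Upper pfd edge of each interval; the top interval has no upper edge.
# Assumption: each interval is treated as half-open, [lower, upper).
UPPER_EDGES = [None, 1e-1, 1e-2, 1e-3, 1e-4, 1e-5]


def confidence_pfd_below(confidences, bound):
    """Return the confidence (0 to 1) that pfd < bound.

    `confidences` holds one value per interval, in INTERVALS order, summing
    to 1. Only intervals lying wholly below `bound` are counted, so a bound
    that falls inside an interval gets no credit for that interval.
    """
    if len(confidences) != len(INTERVALS):
        raise ValueError("need one confidence value per interval")
    if abs(sum(confidences) - 1.0) > 1e-9:
        raise ValueError("confidences must sum to 1")
    return sum(c for upper, c in zip(UPPER_EDGES, confidences)
               if upper is not None and upper <= bound)


def claim_supported(confidences, bound, required_confidence):
    """True if the beliefs support the claim 'pfd < bound' at the required level."""
    return confidence_pfd_below(confidences, bound) >= required_confidence


# Invented example: one expert's confidence over the six intervals.
expert = [0.0, 0.1, 0.3, 0.4, 0.15, 0.05]
print(confidence_pfd_below(expert, 1e-3))    # about 0.6
print(claim_supported(expert, 1e-3, 0.5))    # True
print(claim_supported(expert, 1e-4, 0.5))    # False
```

Summing only whole intervals is a deliberately conservative choice. Intervals straddling the bound contribute nothing, so the computed confidence never overstates what the expert's stated beliefs support.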
Psychology:
• Studies of human confidence in probabilistic judgements
  - e.g. overconfidence and incoherent judgements
  - influence of how information is presented
  - not normally studied with experts
Procedure
• Presentation from Meine on the application
• Participants asked to:
  - choose the ‘pfd interval’ (pfd: probability of failure on demand) in which the described functionality fits best
  - state their confidence in the different ‘pfd intervals’ for that application
• 4 phases:
  - I: after hearing Meine’s presentation
  - II: after each participant gets an answer to the one question they are allowed to ask
  - III: after they hear Meine’s answers to all questions asked by the other participants
  - IV: after a “Delphi”-like interaction amongst participants
Application Domain
[Diagram labels: Hoist; Nuclear material; Transfer port]
• Safety function: if the safety pushbutton is pressed, the carousel will stop moving
• Goal: assess the pfd of the (new) software that controls the safety function
• Information provided about: test reports, safety analyses, formal proof
Participants
• 12 delegates at the EWICS meeting; 3 left early
• Remaining 9 participants:
  - Self-reported experience in the assessment of safety software:
      3 “not very experienced”: 2 researchers; 1 security & business development
      3 “fairly experienced” (5-10 years): 2 researchers; 1 researcher and assessor
      3 “very experienced” (10-30 years): 1 researcher; 1 assessor; 1 researcher and assessor
  - Countries: Austria, Germany, Norway, Poland, UK
[Figure: Mean Confidence Values (9 participants). Mean confidence (%) in each pfd interval, from >10^-1 (highest) to <10^-5 (lowest), for Phases I, II, III and IV.]
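The Mean Confidence Values chart averages the participants' confidence in each pfd interval, phase by phase. Here is a minimal Python sketch of that averaging. It assumes each participant's confidences are recorded as percentages summing to 100 and that every participant is weighted equally. That weighting is an assumption, since the slides only say "mean". The function name `mean_confidence_by_interval`, the data layout and the example responses are hypothetical, not data from the experiment.

```python
from statistics import mean

# Hypothetical layout: one phase's responses, mapping a participant label to
# six confidence values (percent), one per pfd interval, >1e-1 down to <1e-5.
def mean_confidence_by_interval(phase_responses):
    """Average each interval's confidence over all participants in one phase.

    Every participant gets equal weight (a simple linear pool), which is an
    assumption about how the chart's means were formed.
    """
    rows = list(phase_responses.values())
    if not rows:
        raise ValueError("no responses for this phase")
    n = len(rows[0])
    if any(len(r) != n for r in rows):
        raise ValueError("all participants must rate the same intervals")
    return [mean(r[i] for r in rows) for i in range(n)]


# Invented responses for three hypothetical participants in one phase.
phase_one = {
    "A": [0, 10, 40, 40, 10, 0],
    "B": [0, 20, 50, 30, 0, 0],
    "C": [10, 30, 40, 20, 0, 0],
}
labels = [">1e-1", "1e-1 to 1e-2", "1e-2 to 1e-3",
          "1e-3 to 1e-4", "1e-4 to 1e-5", "<1e-5"]
for label, value in zip(labels, mean_confidence_by_interval(phase_one)):
    print(f"{label:>13}: {value:5.1f}%")
```

Because each participant's values sum to 100, the equal-weight means also sum to 100. This keeps the pooled result readable as a single distribution over the intervals.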
[Figure: Chosen Interval - Confidence. Each participant's chosen pfd interval (>10^-1 to <10^-5) and stated confidence in it, for Phases I, II, III and IV.]
[Figure: Chosen Interval - Confidence (second version of the chart).]
[Figure: More experienced (3 participants). Mean confidence (%) in each pfd interval, from >10^-1 (highest pfd) to <10^-5 (lowest pfd).]
[Figure: Less experienced (3 participants). Mean confidence (%) in each pfd interval, from >10^-1 (highest pfd) to <10^-5 (lowest pfd).]
Some conclusions
• Feasible task: a mixture of cooperation and resistance
• Considerable variability among participants: expertise levels, backgrounds, assessments, confidence levels, proneness to change their mind…
• Less experienced participants: less likely to change their minds
• More experienced participants: tend to perceive the system as less reliable (higher pfd)
• Increasing scepticism about the system’s reliability as the study evolved and more information became available
• Participants were most likely to change their judgement in the 4th (“Delphi”) phase of the experiment
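Conclusions about proneness to change one's mind, the phase with most changes, and drift towards higher pfd can be checked with simple counts over each participant's chosen interval. The following is a minimal Python sketch, assuming the chosen interval is recorded per phase as an index (0 for the highest pfd, 5 for the lowest). The functions `changes_per_phase` and `moved_to_higher_pfd` and the data are hypothetical. This is not the analysis performed in the study.

```python
# Hypothetical data: for each participant, the index of the pfd interval
# chosen in Phases I, II, III and IV (0 = ">1e-1", highest pfd, up to
# 5 = "<1e-5", lowest pfd).
PHASES = ["I", "II", "III", "IV"]


def changes_per_phase(chosen):
    """For each phase after the first, count participants whose chosen
    interval differs from their choice in the previous phase."""
    return {
        PHASES[k]: sum(1 for c in chosen.values() if c[k] != c[k - 1])
        for k in range(1, len(PHASES))
    }


def moved_to_higher_pfd(chosen):
    """Participants whose final choice is a higher-pfd (less reliable)
    interval than their Phase I choice, i.e. a smaller index."""
    return [p for p, c in chosen.items() if c[-1] < c[0]]


# Invented choices for three hypothetical participants.
chosen = {"A": [3, 3, 3, 2], "B": [2, 2, 2, 2], "C": [4, 3, 3, 2]}
print(changes_per_phase(chosen))    # {'II': 1, 'III': 0, 'IV': 2}
print(moved_to_higher_pfd(chosen))  # ['A', 'C']
```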
Future Work
• Further analysis of collected data:
  - correspondence between “qualitative” and “quantitative” confidence
  - implications for modelling
• Further data collection:
  - better screened and larger set of experts
  - control: different groups getting
      different types of information? (e.g. better safety arguments, a less “ideal” system)
      or information presented in different ways?
  - refine some of the procedures:
      collection of think-aloud protocols? more open-ended?
      how questions are asked…
• Psychological literature on people’s confidence in uncertain judgements