Task 4.2: Empirical Studies Validation Summary Report October 2015 (Month 18 deliverable report)

Fenton, N., Neil, M., et al. (2015). Hypothesis testing revisited: why the classical approach will almost certainly give the ‘wrong’ results. Working paper, September 2015.
In this paper we argue that the problems with classical statistical hypothesis testing are even more fundamental than previously recognised. Although much of what is presented here is already known, we articulate precisely what hypothesis testing actually means, and how the different types of hypothesis testing relate to each other, using a Bayesian network (BN) model that unifies the different approaches and their assumptions.

Fenton, N. E. (2014). Assessing evidence and testing appropriate hypotheses. Science & Justice, 54(6), 502-504. http://dx.doi.org/10.1016/j.scijus.2014.10.007
It is crucial to identify the most appropriate hypotheses if one is to apply probabilistic reasoning to evaluate, and properly understand, the impact of evidence. Subtle changes in the choice of prosecution hypothesis can result in drastically different posterior probabilities for a defence hypothesis from the same evidence. To illustrate the problem, we consider a real case in which probabilistic arguments assumed that the prosecution hypothesis “both babies were murdered” was the appropriate alternative to the defence hypothesis “both babies died of Sudden Infant Death Syndrome (SIDS)”. Since it would have been sufficient for the prosecution to establish just one murder, a more appropriate alternative hypothesis was “at least one baby was murdered”. Based on the same assumptions used by one of the probability experts who examined the case, the prior odds in favour of the defence hypothesis over the double murder hypothesis are 30 to 1. However, the prior odds in favour of the defence hypothesis over the alternative ‘at least one murder’ hypothesis are only 5 to 2.
If the medical and other evidence is assumed to have a likelihood ratio of 5 in favour of the prosecution hypothesis, the two choices of alternative lead to very different conclusions about the posterior probability of the defence hypothesis.

Fenton, N., Neil, M., & Constantinou, A. C. (2015a). Simpson's Paradox and the implications for medical trials. Under review, 2015.
This paper describes Simpson’s paradox and explains its serious implications for randomised controlled trials. In particular, we show that, for any number of variables, we can simulate the result of a controlled trial that uniformly points to one conclusion (such as ‘drug is effective’) for every possible combination of the variable states, but in which, once a previously unobserved confounding variable is included, every possible combination of the variable states points to the opposite conclusion (‘drug is not effective’). In other words, no matter how many variables are considered, and no matter how ‘conclusive’ the result, one cannot conclude that the result is truly ‘valid’, since in theory there could be an unobserved confounding variable that completely reverses it.

Yet, B., Constantinou, A. C., Fenton, N., & Neil, M. (2015b). Partial Expected Value of Perfect Information of Continuous Variables using Dynamic Discretisation. Under review, 2015.
In decision-theoretic models, partial Expected Value of Perfect Information (EVPI) is an important analysis technique used to identify the value of acquiring information on individual variables. Partial EVPI can be used to prioritise the parts of a model that should be improved, or to identify the parts where acquiring additional data or expert knowledge is most beneficial. Calculating partial EVPI for continuous variables is challenging, and several sampling and approximation techniques have been proposed. This paper proposes a novel approach for calculating partial EVPI in hybrid Bayesian network (BN) models.
The proposed approach uses dynamic discretisation (DD) and the junction tree algorithm to calculate the partial EVPI. It improves on previously proposed simulation-based partial EVPI methods, since users do not need to determine the sample size, and the DD algorithm has been implemented in a BN software tool. We compare our approach with the previously proposed techniques using two case studies. Our approach calculates the partial EVPI values accurately and, unlike the previous approaches, does not require the user to assess sampling convergence.

Dewitt, S. H., Hsu, A., Fenton, N. E., & Lagnado, D. (2015). Nested Sets or Causal Framing? Optimal Presentation Format for Bayesian Reasoning. Submitted to Journal of Experimental Psychology: General.
Previous work has found that nested sets and causal framings significantly help people solve simple Bayesian problems. We compare the individual and combined effects of these two approaches in a single experiment. We find a beneficial effect of nested sets, but no effect of causal framing. We also use qualitative methods to capture participants’ thought processes during problem solving. Based on these data, we propose a four-stage process model of how individuals successfully solve simple Bayesian problems. This process is the modal response in all four experimental conditions and is therefore proposed as a universal process that successful individuals follow on simple Bayesian problems, regardless of framing. The model predicts the results of the current experiment: the process is more frequent in the nested sets condition than in the control, but no more frequent in the causal condition. It also fits well with results reported in previous work on causal framing.
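The odds arithmetic in the Fenton (2014) abstract can be checked with the odds form of Bayes' theorem: posterior odds = prior odds × likelihood ratio. The sketch below is a minimal illustration under stated assumptions, not the paper's own calculation. It assumes the same likelihood ratio of 5 in favour of each prosecution hypothesis (so 1/5 for the defence), and converts odds to probabilities as if each pair of hypotheses were exhaustive. That is reasonable for ‘both SIDS’ versus ‘at least one murder’ if every death is either SIDS or murder, but ‘both SIDS’ versus ‘both murdered’ leaves out the mixed cases, so that figure is only a probability relative to the pair.

```python
from fractions import Fraction


def update_odds(prior_odds, likelihood_ratio):
    """Odds form of Bayes' theorem: posterior odds = prior odds * LR.

    Both the odds and the LR must be stated for the same hypothesis
    against the same alternative (here: defence over prosecution).
    """
    return prior_odds * likelihood_ratio


def odds_to_probability(odds):
    """Convert odds H:alternative into P(H), treating the pair as exhaustive."""
    return odds / (1 + odds)


# Assumption: LR of 5 in favour of *each* prosecution hypothesis,
# i.e. LR of 1/5 for the defence hypothesis in both comparisons.
LR_DEFENCE = Fraction(1, 5)

prior_vs_double = Fraction(30, 1)       # both SIDS : both murdered
prior_vs_at_least_one = Fraction(5, 2)  # both SIDS : at least one murder

post_vs_double = update_odds(prior_vs_double, LR_DEFENCE)
post_vs_at_least_one = update_odds(prior_vs_at_least_one, LR_DEFENCE)

p1 = float(odds_to_probability(post_vs_double))
p2 = float(odds_to_probability(post_vs_at_least_one))
print(f"vs double murder: odds {post_vs_double}, P = {p1:.3f}")           # odds 6, P = 0.857
print(f"vs at least one murder: odds {post_vs_at_least_one}, P = {p2:.3f}")  # odds 1/2, P = 0.333
```

Under these assumptions the same evidence leaves the defence hypothesis more likely than not against the double-murder alternative, but gives it only about a one-in-three probability against the ‘at least one murder’ alternative. Exact fractions are used so the 6:1 and 1:2 posterior odds come out without rounding.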
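The reversal described in the Fenton, Neil and Constantinou (2015a) abstract can be seen in its smallest form with a single binary confounder. The sketch below uses hypothetical counts (not data from the paper) for a drug arm and a placebo arm split by a variable Z: within each level of Z the drug has the higher recovery rate, yet pooled over Z the placebo does. It does not reproduce the paper's construction for an arbitrary number of variables; the names `TRIAL`, `subgroup_rates` and `pooled_rate` are illustrative only.

```python
# Hypothetical counts: (recovered, total) in each arm, split by confounder Z.
TRIAL = {
    "drug":    {"Z=0": (9, 10),   "Z=1": (30, 100)},
    "placebo": {"Z=0": (80, 100), "Z=1": (2, 10)},
}


def rate(recovered, total):
    return recovered / total


def subgroup_rates(arm):
    """Recovery rate within each level of Z."""
    return {z: rate(*counts) for z, counts in TRIAL[arm].items()}


def pooled_rate(arm):
    """Recovery rate ignoring Z (summing counts over its levels)."""
    recovered = sum(r for r, _ in TRIAL[arm].values())
    total = sum(n for _, n in TRIAL[arm].values())
    return rate(recovered, total)


drug_rates, placebo_rates = subgroup_rates("drug"), subgroup_rates("placebo")
drug_better_in_every_subgroup = all(drug_rates[z] > placebo_rates[z] for z in drug_rates)
drug_better_pooled = pooled_rate("drug") > pooled_rate("placebo")

print(drug_better_in_every_subgroup, drug_better_pooled)  # True False
```

The reversal arises because the drug arm is dominated by the hard-to-treat Z=1 group and the placebo arm by the easy Z=0 group. Which comparison is appropriate depends on the causal role of Z, which the counts alone cannot settle.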
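Partial EVPI, as used in the Yet et al. (2015b) abstract, is the expected gain from learning one variable exactly before deciding: the expectation over θ1 of the best conditional expected utility, max over d of E[U(d) | θ1], minus the best expected utility available now, max over d of E[U(d)]. The sketch below computes this for a hypothetical two-option model with two independent N(0, 1) variables. It deliberately swaps in a simpler technique than the paper's: a static, hand-chosen grid (the `normal_grid` helper and its `n` and `width` parameters are assumptions of this sketch) instead of dynamic discretisation and the junction tree algorithm. Like the paper's approach it needs no sample size, but the grid resolution still has to be chosen by hand. For this toy model the exact answer is E[max(0, θ1)] = 1/√(2π), so the result can be checked.

```python
import math


def normal_grid(mean, sd, n, width=8.0):
    """Static discretisation of N(mean, sd^2): n equally spaced points on
    mean +/- width*sd, weighted by normalised density values."""
    step = 2 * width * sd / (n - 1)
    xs = [mean - width * sd + i * step for i in range(n)]
    ws = [math.exp(-0.5 * ((x - mean) / sd) ** 2) for x in xs]
    total = sum(ws)
    return xs, [w / total for w in ws]


def utility(decision, theta1, theta2):
    # Hypothetical toy model: option 1 pays theta1 + theta2, option 0 pays 0.
    return theta1 + theta2 if decision == 1 else 0.0


def partial_evpi_theta1(decisions=(0, 1), n1=801, n2=201):
    """Partial EVPI of theta1 = E_theta1[max_d E(U | theta1)] - max_d E(U)."""
    t1, w1 = normal_grid(0.0, 1.0, n1)  # assumed prior: theta1 ~ N(0, 1)
    t2, w2 = normal_grid(0.0, 1.0, n2)  # assumed prior: theta2 ~ N(0, 1), independent
    # Conditional expected utility of each decision at each theta1 grid point,
    # integrating theta2 out over its own grid.
    cond = [
        {d: sum(w * utility(d, a, b) for b, w in zip(t2, w2)) for d in decisions}
        for a in t1
    ]
    # Value of deciding after learning theta1 exactly.
    value_with_info = sum(w * max(c.values()) for c, w in zip(cond, w1))
    # Value of deciding now, using only the prior.
    value_now = max(sum(w * c[d] for c, w in zip(cond, w1)) for d in decisions)
    return value_with_info - value_now


print(round(partial_evpi_theta1(), 4))  # about 0.3989, close to 1/sqrt(2*pi)
```

The fixed grid is the main weakness that dynamic discretisation addresses: here the grid range and spacing were picked knowing the priors are standard normal, whereas a dynamic scheme refines the discretisation where the probability mass actually lies.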
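The Dewitt et al. (2015) abstract concerns how simple Bayesian problems are presented. The sketch below solves one such problem twice, with hypothetical numbers (1% base rate, 80% hit rate, 10% false-alarm rate) that are not the experiment's materials: once with Bayes' theorem on single-event probabilities, and once in a frequency-based nested-sets form that counts the positives inside the group with the condition and inside the group without it. This is one common way to make the nested-set structure explicit; causal framing is not modelled, and the function names are illustrative.

```python
from fractions import Fraction


def posterior_probability_format(base_rate, hit_rate, false_alarm_rate):
    """Bayes' theorem stated with single-event probabilities."""
    p_positive = base_rate * hit_rate + (1 - base_rate) * false_alarm_rate
    return base_rate * hit_rate / p_positive


def posterior_nested_sets(population, base_rate, hit_rate, false_alarm_rate):
    """Same problem as nested frequencies: positives are a subset of each group."""
    with_condition = population * base_rate                        # 10 of 1000
    true_positives = with_condition * hit_rate                     # 8 of those 10
    false_positives = (population - with_condition) * false_alarm_rate  # 99 of 990
    return true_positives / (true_positives + false_positives)


# Hypothetical problem parameters, as exact fractions.
args = (Fraction(1, 100), Fraction(4, 5), Fraction(1, 10))
p = posterior_probability_format(*args)
f = posterior_nested_sets(1000, *args)
print(p, f)  # 8/107 8/107
```

Both formats give the same answer, 8 of the 107 positives (about 7.5%); they differ only in how visibly the subset relations are laid out, which is the presentational factor the experiment manipulates.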