Supplementary material

Artificial Intelligence (AI) is a branch of computer science concerned with developing systems that perceive their environment and use this information to take actions that maximise the chance of success. AI has uses in many fields of everyday life, but pertinent to this study are its uses in data mining, that is, the discovery of knowledge in large databases. Such systems are commonly used by online shopping sites to predict future purchases (“people who liked that also liked this…”) and to target adverts based on previous purchases. They have had limited use in medicine to date, in part because of the difficulty of gaining access to the large databases required to develop accurate predictors. Every such system relies on an underlying algorithm; for the purposes of this study, three modelling algorithms were investigated: decision trees, Bayesian networks and artificial neural networks (ANN).

A decision tree is a classifier in the form of a tree, consisting of a root and further nodes. Each node is either a leaf node, which indicates the value of the outcome, or a decision node, which specifies a test on a single attribute, with one branch or sub-tree for each possible result of the test. Classification starts at the root node and moves through decision nodes and branches until a leaf node (the outcome) is reached [1].

A Bayesian network for a set of variables $X = \{X_1, \ldots, X_n\}$ consists of a network structure S, which encodes a set of conditional independence assertions about the variables in X, and a set P of local probability distributions associated with each variable [2].

Neural networks (NN) are abstract computational models inspired by the human brain. They mimic neurons, the fundamental functional units of intelligence [26], using artificial neurons.
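To make the decision-tree traversal described above concrete, here is a minimal Python sketch. It assumes categorical attributes and a tree that has already been built (tree induction from data is not shown). The `Node` class, the `classify` function and the toy attributes `x1` and `x2` are hypothetical illustrations, not the models used in this study.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Node:
    """A decision-tree node: a leaf if `outcome` is set, otherwise a decision node."""
    attribute: Optional[str] = None                          # attribute tested at a decision node
    children: Dict[Any, Node] = field(default_factory=dict)  # one branch per test result
    outcome: Optional[str] = None                            # outcome value stored at a leaf


def classify(root: Node, record: Dict[str, Any]) -> str:
    """Start at the root and follow branches until a leaf (the outcome) is reached."""
    node = root
    while node.outcome is None:
        # Test the single attribute at this decision node and take the matching branch.
        node = node.children[record[node.attribute]]
    return node.outcome


# Hypothetical toy tree: attributes and outcomes are illustrative only.
tree = Node(attribute="x1", children={
    "low": Node(outcome="class_0"),
    "high": Node(attribute="x2", children={
        "low": Node(outcome="class_0"),
        "high": Node(outcome="class_1"),
    }),
})

print(classify(tree, {"x1": "high", "x2": "high"}))  # class_1
```

A record whose value has no matching branch raises a `KeyError` here; a real implementation would need a policy for unseen or missing values.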
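The structure S and the probabilities P of a Bayesian network together determine the joint distribution over X. Writing $\mathrm{Pa}_i$ for the parents of $X_i$ in S (notation introduced here for illustration), the standard factorisation is shown below, followed by a worked example for a hypothetical chain $X_1 \to X_2 \to X_3$ with illustrative probabilities $P(X_1{=}1) = 0.3$, $P(X_2{=}1 \mid X_1{=}1) = 0.8$ and $P(X_3{=}1 \mid X_2{=}1) = 0.5$:

```latex
P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P\left(X_i \mid \mathrm{Pa}_i\right)

% Hypothetical chain X_1 -> X_2 -> X_3 (values are illustrative only):
P(X_1{=}1, X_2{=}1, X_3{=}1)
  = P(X_1{=}1)\, P(X_2{=}1 \mid X_1{=}1)\, P(X_3{=}1 \mid X_2{=}1)
  = 0.3 \times 0.8 \times 0.5 = 0.12
```

Each variable only needs a distribution conditioned on its parents, which is what makes the set P compact compared with a full joint table.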
Although the term neural network is widely used in the literature, some authors use the term artificial neural network (ANN) instead, to distinguish it from the biological neural networks of the brain [3]. The algorithm comprises at least three layers of nodes: the input nodes, which are defined by the dataset; the hidden nodes, which the system defines; and the output nodes, which constitute the result of the analysis. The relative weightings of the nodes are determined during the training phase of the system and can be recalculated as new data become available.

In tandem with the modelling algorithm, the data need to be classified using one of the following approaches: support vector machine (SVM), naive Bayes (NB) or k-nearest neighbour (KNN). An SVM performs classification by constructing a hyperplane in N-dimensional space that optimally separates the data into two categories [4]. The NB algorithm computes the probability that a data point belongs to each class and assigns it to the class with the highest posterior probability. KNN classifies an object by a majority vote of its neighbours, with the object being assigned to the class most common amongst its k nearest neighbours [5].

Table 1. Clinical characteristics of the dataset
total age stage grade 1 2 3 4 170 39 354 64 41 1 2 3 105 94 305 164 300 260 46 62 0 histological type high grade serous endometrioid clear cell mucinous preoperative CA125 outcome of surgery missing median data 67 6 420 complete optimal suboptimal 261 164 136 113 107

Additional references
1. Alpaydin, E., Introduction to Machine Learning (Adaptive Computation and Machine Learning). 2004, Cambridge: MIT Press.
2. Neapolitan, R., Learning Bayesian Networks. 2004, New Jersey: Prentice-Hall.
3. Fausett, L., Fundamentals of Neural Networks. 1994, New Jersey: Prentice-Hall.
4. Cristianini, N. and J. Shawe-Taylor, An Introduction to Support Vector Machines. 2000, Cambridge: Cambridge University Press.
5. Enas, G. and S. Choi, Choice of the smoothing parameter and efficiency of the k-nearest neighbour classification. Comput. Math. Applic., 1986. 12: p. 235-244.
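A forward pass through the three layers of nodes described above (input, hidden, output) can be sketched in plain Python. This assumes a fully connected network with a logistic (sigmoid) activation; the weights are hypothetical fixed values standing in for those that would be learned during the training phase, which is not shown.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(inputs: Sequence[float],
                weights: Sequence[Sequence[float]],
                biases: Sequence[float]) -> List[float]:
    """Each row of `weights` holds the incoming weights of one node in this layer."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(x, w_hidden, b_hidden, w_out, b_out) -> List[float]:
    hidden = dense_layer(x, w_hidden, b_hidden)  # hidden nodes
    return dense_layer(hidden, w_out, b_out)     # output nodes


# Hypothetical weights for 2 input nodes, 2 hidden nodes and 1 output node.
W_HIDDEN = [[0.5, -0.5], [-0.3, 0.8]]
B_HIDDEN = [0.0, 0.0]
W_OUT = [[1.0, 1.0]]
B_OUT = [-1.0]

print(forward([1.0, 0.0], W_HIDDEN, B_HIDDEN, W_OUT, B_OUT))  # roughly [0.512]
```

Retraining on new data would change only `W_HIDDEN`, `B_HIDDEN`, `W_OUT` and `B_OUT`; the forward computation itself stays the same.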
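The SVM idea described above can be illustrated with a linear SVM fitted by full-batch sub-gradient descent on the regularised hinge loss, which approximately finds the maximum-margin separating hyperplane. This is a simplified stand-in, assuming two-class data labelled +1/-1 and illustrative hyperparameters (`lam`, `lr`, `epochs`); the classical formulation is instead solved as a quadratic programming problem and can use kernels for non-linear boundaries.

```python
from typing import List, Sequence, Tuple


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, lr: float = 0.1,
                     epochs: int = 200) -> Tuple[List[float], float]:
    """Minimise lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b))), y_i in {+1, -1}."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge term active: point inside the margin or misclassified
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_svm(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Classify by the side of the hyperplane w . x + b = 0 on which x falls."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical, linearly separable toy data.
X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print([predict_svm(w, b, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```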
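For categorical attributes, the naive Bayes rule described above can be sketched as follows: each class is scored by its prior multiplied by the per-attribute likelihoods, which are assumed conditionally independent given the class, and the highest-scoring class is returned. Laplace smoothing (`alpha`), the helper names `train_nb`/`predict_nb` and the toy data are assumptions added for illustration; scores are compared in log space to avoid numerical underflow.

```python
import math
from collections import Counter, defaultdict


def train_nb(X, y):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(y)
    value_counts = defaultdict(Counter)  # (class, attribute index) -> Counter of values
    seen_values = defaultdict(set)       # attribute index -> distinct values observed
    for row, label in zip(X, y):
        for j, v in enumerate(row):
            value_counts[(label, j)][v] += 1
            seen_values[j].add(v)
    return class_counts, value_counts, seen_values


def predict_nb(model, row, alpha=1.0):
    """Return the class with the highest (unnormalised) posterior probability."""
    class_counts, value_counts, seen_values = model
    n = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_c in class_counts.items():
        score = math.log(n_c / n)  # log prior P(class)
        for j, v in enumerate(row):
            k = len(seen_values[j])
            # Laplace-smoothed log P(attribute j = v | class)
            score += math.log((value_counts[(label, j)][v] + alpha) / (n_c + alpha * k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data with two categorical attributes.
X_toy = [("a", "x"), ("a", "y"), ("b", "y"), ("b", "y")]
y_toy = ["pos", "pos", "neg", "neg"]
model = train_nb(X_toy, y_toy)
print(predict_nb(model, ("a", "x")))  # pos
print(predict_nb(model, ("b", "y")))  # neg
```

Dropping the shared denominator P(data) does not change which class wins, so the unnormalised scores are sufficient for classification.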
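The k-nearest neighbour rule described above can be sketched with a distance ranking followed by a majority vote. Euclidean distance, the default `k=3`, the function name `knn_predict` and the toy points are assumptions for illustration; the choice of k itself is the subject of [5].

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Assign `query` to the most common class among its k nearest training points."""
    # Rank training points by Euclidean distance to the query (an assumed metric).
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    # most_common(1) returns the label with the highest vote count.
    return votes.most_common(1)[0][0]


# Hypothetical two-class toy data.
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(train_X, train_y, (0.5, 0.5)))  # A
print(knn_predict(train_X, train_y, (5.5, 5.5)))  # B
```

With two classes, an odd k avoids tied votes; `math.dist` requires Python 3.8 or later.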