Supplementary Material

Materials and Methods

1. Methods of Logical Inference

A comparison of deduction and abduction as methods of logical inference.

Deduction
Rule: If a cell grows on minimal medium, then it can synthesise tryptophan.
Fact: Cell cannot synthesise tryptophan.
∴ Cell cannot grow on minimal medium.
Given the rule P → Q and the fact ¬Q, infer the fact ¬P (deduction by modus tollens).

Abduction
Rule: If a cell grows on minimal medium, then it can synthesise tryptophan.
Fact: Cell cannot grow on minimal medium.
∴ Cell cannot synthesise tryptophan.
Given the rule P → Q and the fact ¬P, infer the fact ¬Q (abduction).

Deduction is sound in the logic we use (first-order predicate logic). Informally, this means that if the rule and the fact used in an inference of the above form are true, then the inferred fact must also be true. Abduction, however, is generally not sound: in the abduction example, for instance, there could be many other reasons why the cell cannot grow. Despite this, abduction is required to infer new scientific knowledge.

2. Auxotrophic Growth Experiments and the Aromatic Amino Acid Pathway

The mutants (of strain BY4741 [ATCC201388] MATa his3Δ1 leu2Δ0 met15Δ0 ura3Δ0; Brachmann, C.B. et al. Designer deletion strains derived from Saccharomyces cerevisiae S288C: a useful set of strains and plasmids for PCR-mediated gene disruption and other applications. Yeast 14, 115-132, 1998) had the complete reading frame of each protein-encoding gene deleted by replacement with a selectable marker gene that has no phenotype in the absence of the selective agent (Winzeler, E.A. et al. Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis. Science 285, 901-906, 1999; Giaever, G. et al. Functional profiling of the Saccharomyces cerevisiae genome. Nature 418, 387-391, 2002). They are thus nonrevertible null mutants.
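As an illustration of the two inference patterns in Section 1 (not part of the Robot Scientist's code), the rule and both inferences can be written as Prolog clauses. The names implication/2, deduce/2 and abduce/2 are hypothetical, and neg/1 is an ordinary term standing for negation. We assume one implication per aromatic amino acid end product, following the model's premise that every end product is needed for growth; this makes the unsoundness of abduction visible, because a single observation of no growth then supports several competing hypotheses.

```prolog
% Hypothetical encoding of the inference patterns in Section 1.
% implication(P, Q) stands for the rule P -> Q; neg(X) stands for "not X".
implication(grows_on_minimal, synthesises(tryptophan)).
implication(grows_on_minimal, synthesises(tyrosine)).
implication(grows_on_minimal, synthesises(phenylalanine)).

% Deduction by modus tollens: from P -> Q and neg(Q), conclude neg(P).
% Sound: if the rule and the fact are true, the conclusion is true.
deduce(neg(Q), neg(P)) :-
    implication(P, Q).

% Abduction: from P -> Q and neg(P), hypothesise neg(Q).
% Unsound: each rule with antecedent P offers a different explanation,
% and the observation alone cannot decide which (if any) is correct.
abduce(neg(P), neg(Q)) :-
    implication(P, Q).
```

The query deduce(neg(synthesises(tryptophan)), C) has the single answer C = neg(grows_on_minimal), whereas findall(H, abduce(neg(grows_on_minimal), H), Hs) collects three candidate hypotheses, one per end product.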
Limitations in the availability of mutant yeast strains restricted the number of genes used in the in vivo investigations to fifteen. The experimental requirement of an auxotrophic difference in growth phenotype, in turn, reduced this number to eight (the other mutants always either grew or failed to grow): ybr166c, ydr007w, ydr035w, ydr354w, yer090w, ygl026c, ykl211c, ynl316c. Availability and cost limited the number of possible metabolites to nine: anthranilate (10), indole (190), p-hydroxyphenylpyruvic acid (193), L-phenylalanine (53), phenylpyruvate (30), phosphoenolpyruvate (9385), shikimic acid (633), L-tyrosine (53), L-tryptophan (53). The numbers in brackets are the normalised true experimental costs of using each metabolite in a growth medium; note that they span roughly three orders of magnitude.

Note that, in this pathway, some open reading frames (ORFs) encode enzymes that catalyse more than one biochemical reaction (e.g. YDR127w: 3-dehydroquinate synthase, 3-dehydroquinate dehydratase, shikimate 5-dehydrogenase, shikimate kinase, 3-phosphoshikimate 1-carboxyvinyltransferase), while for other reactions there are isoenzymes encoded by different ORFs (e.g. YBR249c and YDR035w both encode phospho-2-dehydro-3-deoxyheptonate aldolase), thus providing redundancy.

Investigating the full behavior of the available genes and metabolites would require at least 7,665 growth experiments (without repetition). We therefore decided to restrict the investigation to experiments in which either a single metabolite or a pair of metabolites is added. The number of experiments for each gene is thus restricted to 45 (9 + (9 × 8)/2), giving 360 (8 × 45) possible experiments. A single experiment uses a single well of the 96-well plate. Each [mutant + medium] combination is placed, together with minimal medium agar gel, in all 8 wells of a single column.
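The experiment counts above follow from simple combinatorics, sketched below in Prolog with hypothetical predicate names. The figures of 45 and 360 come directly from the text. Reading 7,665 as all fifteen available mutants crossed with every non-empty subset of the nine metabolites, 15 × (2^9 − 1), is our assumption: the text does not say how that figure was obtained, although the arithmetic matches.

```prolog
% Hypothetical helpers reproducing the experiment counts in Section 2.

% Number of unordered pairs that can be drawn from N metabolites.
pairs(N, Pairs) :-
    Pairs is N * (N - 1) // 2.

% Experiments per gene when only single metabolites or pairs are added.
experiments_per_gene(NMets, PerGene) :-
    pairs(NMets, Pairs),
    PerGene is NMets + Pairs.

% Restricted design: NGenes mutants, singles and pairs only.
restricted_design(NGenes, NMets, Total) :-
    experiments_per_gene(NMets, PerGene),
    Total is NGenes * PerGene.

% Full design (assumed interpretation): every non-empty subset of
% NMets metabolites for each of NGenes mutants.
full_design(NGenes, NMets, Total) :-
    Total is NGenes * (2 ^ NMets - 1).
```

With nine metabolites, experiments_per_gene(9, E) gives E = 45 and restricted_design(8, 9, T) gives T = 360, while full_design(15, 9, T) gives T = 7665.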
The mutants were pre-grown overnight in 5 ml rich medium in a shaker at 37 °C (200 rpm) and then diluted 1:100 in ¼-strength Ringer’s solution, whilst the added metabolites were made up to a 0.2% concentration. As a control, each medium combination found on a single plate was also used with the wild-type yeast (BY4741) that is the parent of the mutant strains. To reduce (and to monitor) contamination, the wells in the outer columns of the plate were filled with agar only. After the experiments were set up, the plates were incubated for 24 hours at 30 °C and growth was then measured using a Wallac 1420 Multilabel counter. The mean and median growth measurements for each experimental combination (mutant + medium) are derived from all wells on the plate that contain the combination (usually 8 wells, but 16 if two columns have the same combination of substances). The following simple decision tree was used to determine growth (see Section 3 for how it was formed):

If the mean of the mutant’s growth ≤ 0.44225, then the growth class is “no-growth”;
else, if the difference between the median values for the mutant’s growth and the wild type’s growth ≤ 0.2565, then the growth class is “no-growth”;
else, the growth class is “growth”.

The use of this tree is a form of inductive inference.

3. Computational Model of the Aromatic Amino Acid Pathway

The Prolog model of the pathway was refined in three stages. The original model was first translated from KEGG and carefully checked against the literature: we checked that all the KEGG reactions were documented in S. cerevisiae (consistency), and that no other related reactions were described in the literature (completeness). The predictions of the model were then compared with the results of the single-metabolite experiments (see above). At this stage, whether growth or no-growth was observed was decided visually.
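Later, automatic growth calls replaced these visual judgements, using the decision tree given in Section 2. A minimal Prolog sketch of that tree follows, assuming each combination's plate-reader values are available as a list of numbers. The predicate names are hypothetical, and taking the difference as mutant median minus wild-type median is our assumption, since the text does not state the sign convention.

```prolog
% Hypothetical sketch of the growth-calling decision tree in Section 2.
% Well readings are plain numbers from the plate reader.

mean(Xs, Mean) :-
    length(Xs, N), N > 0,
    sum_list(Xs, Sum),
    Mean is Sum / N.

median(Xs, Median) :-
    msort(Xs, Sorted),              % msort/2 sorts without removing duplicates
    length(Sorted, N), N > 0,
    Mid is N // 2,
    (   N mod 2 =:= 1
    ->  nth0(Mid, Sorted, Median)
    ;   Lo is Mid - 1,              % even count (e.g. 8 wells): average the middle two
        nth0(Lo, Sorted, A),
        nth0(Mid, Sorted, B),
        Median is (A + B) / 2
    ).

% growth_class(+MutantMean, +MutantMedian, +WildTypeMedian, -Class)
% Thresholds are those quoted in Section 2. Class is bound only after the
% tests, so the predicate also answers correctly when Class is given.
growth_class(Mean, MutantMedian, WildTypeMedian, Class) :-
    (   Mean =< 0.44225
    ->  Class = no_growth
    ;   MutantMedian - WildTypeMedian =< 0.2565   % assumed sign convention
    ->  Class = no_growth
    ;   Class = growth
    ).

% call_growth(+MutantWells, +WildTypeWells, -Class)
call_growth(MutantWells, WildTypeWells, Class) :-
    mean(MutantWells, Mean),
    median(MutantWells, MutantMedian),
    median(WildTypeWells, WildTypeMedian),
    growth_class(Mean, MutantMedian, WildTypeMedian, Class).
```

Writing the tree as a single if-then-else, rather than as clauses guarded by cuts, keeps the thresholds in the same order as the prose tree and avoids wrong answers when the class argument is already instantiated.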
Since certain metabolites did not seem to affect growth in the way predicted by the literature, we refined the model so that these metabolites are imported into the cells only slowly. This inference was, of course, an abduction. It was also necessary to add inhibition effects. For example, the results for adding tyrosine to ydr035w deletion mutants were anomalous: without tyrosine the mutants grew; with tyrosine they did not. This was unexpected, as one would predict that the effect of adding an amino acid such as tyrosine should be monotonic with respect to growth. Our implemented explanation is that YBR249C and YDR035W encode isoenzymes that catalyse the reaction phosphoenolpyruvate + erythrose 4-phosphate → 7-phospho-2-dehydro-3-deoxy-D-arabino-heptonate. When YDR035W is deleted, YBR249C remains and allows pathways in the graph to tyrosine, tryptophan, and phenylalanine. In the presence of tyrosine in the medium, the enzymic product of YBR249C is inhibited, blocking the pathway to tryptophan and phenylalanine and stopping growth.

The results of the double-metabolite experiments were then tested against the model, and the automatic growth-calling software was optimised by learning a decision tree that fitted the experimental results to the model (see Section 2). The model developed on the single metabolites was consistent with all but fewer than 1.5% of the double-metabolite experiments. The model was not changed further to accommodate these discrepancies. The final model is therefore logically “incorrect”, in that it incorrectly predicts the results of some experimental observations. We consider this a “feature” of the model, as it reflects the typical situation in biological research. There are two types of “noise” in the physical experiments:

1. Experiment and measurement noise (we estimate that ~25% of experiments are noisy, i.e. they give an observation of growth or no-growth different from that expected).
2. Noise due to errors in the background knowledge (the model disagrees with fewer than 1.5% of experimental results).

Because of the possibility of noise, we also implemented a simple system to allow the Robot Scientist to backtrack when all possible hypotheses were contradicted by experimental results. A training set is known to contain misclassified examples when either of two situations occurs: the hypotheses generated at a given iteration are not a subset of the hypotheses generated at the previous iteration, or no hypotheses can be generated at a given iteration. In an effort to correct the misclassified training examples, new training sets are generated, in each of which the classification of a single example is changed. Hypotheses are generated from each of these new training sets, and the hypothesis set containing the fewest hypotheses is chosen. This noise abatement system has a number of drawbacks: it assumes that there is only one misclassified example in a training set, and altering the classification of training examples does not guarantee that the new hypothesis set is correct.

4. Performance Measures

The average performance of the hypotheses is an appropriate performance measure because it rewards learners that discriminate between competing hypotheses. This approach is a compromise between selecting the highest-probability hypothesis and weighting all predictions by the probability of the hypotheses that generated them. In active learning, the performance curves generally used plot predictive accuracy against the number of training examples. Often, two curves are plotted on the same graph, one for active learning and one for random sampling. The accuracy of a single hypothesis is the proportion (expressed as a percentage) of correct predictions that this hypothesis makes about all possible single- and double-metabolite experiments, using the model as the oracle.
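The accuracy measure just defined, and its unweighted average over the current hypothesis set, can be sketched in Prolog as follows. In the Robot Scientist the predictions come from running each hypothesis through the model; here they are supplied directly as lists of Experiment-Outcome pairs, which is an assumption of this sketch, and the predicate names are hypothetical.

```prolog
% Hypothetical sketch of the performance measure in Section 4.
% Predictions and the oracle are lists of Experiment-Outcome pairs,
% with Outcome one of growth or no_growth.

% accuracy(+Predictions, +Oracle, -Percent): percentage of oracle
% outcomes that the hypothesis predicts correctly.
accuracy(Predictions, Oracle, Percent) :-
    length(Oracle, Total),
    Total > 0,
    aggregate_all(count,
                  ( member(Exp-Outcome, Oracle),
                    memberchk(Exp-Outcome, Predictions) ),
                  Correct),
    Percent is 100 * Correct / Total.

% average_accuracy(+HypothesisPredictions, +Oracle, -Average):
% unweighted mean accuracy over all hypotheses still under consideration.
average_accuracy(HypothesisPredictions, Oracle, Average) :-
    findall(A,
            ( member(Predictions, HypothesisPredictions),
              accuracy(Predictions, Oracle, A) ),
            As),
    length(As, N),
    N > 0,
    sum_list(As, Sum),
    Average is Sum / N.
```

Weighting each accuracy by the probability of its hypothesis, instead of averaging uniformly, would give the other end of the compromise described above.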
An alternative approach would have been to use the hand-generated expected results of the experiments, but this has the disadvantage of not giving the correct hypothesis 100% accuracy, as well as compromising the status of the Robot Scientist as an automated system. Such performance plots allow comparison of the number of experiments (examples/time) required to reach a particular level of performance. However, one drawback of such plots is that they ignore any variation in the price of obtaining individual examples. When such variation does exist, and the aim is to compare the price of attaining particular levels of performance, these plots are potentially misleading. To overcome this drawback, we also plot the cumulative price of the experiments against performance (Bryant et al., 2001). For this, we use the normalised prices of the metabolites. At the start of the experiments, when there are 8 possible hypotheses, the average accuracy is 57%.

5. Structure of the Robot Scientist

The robot automates the task of liquid handling and can conduct assays by pipetting and mixing liquids on microtitre plates. The robot is controlled using TCL, and we have written a compiler that translates Prolog commands into TCL robot operations. Given a Prolog definition of one or more experiments, our code designs a layout for the robot that allows these experiments, with controls, to be carried out efficiently. In addition, the robot has to be programmed automatically to plate out the yeast and media into the correctly defined wells. The microtitre plates were measured using the adjacent plate reader, and the results were returned to the LIMS (laboratory information management system). To reduce cost, the transfer of the plates from the robot to the incubator, and from the incubator to the plate reader, was done manually, although this would have been trivial to automate. The key point is that there was no human intellectual input into experiment design, interpretation, etc.

6. Prolog Model

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% This model is designed to mimic auxotrophic mutant experiments in
% the aromatic amino acid pathway of yeast. It pertains to
% Phenylalanine, tyrosine and tryptophan biosynthesis. See KEGG map
% 400 at
% http://www.genome.ad.jp/dbget-bin/get_pathway?org_name=sce&mapno=00400
% Also see Stryer Chp 28 page 724.
% Note that the pathway had to be carefully checked by hand as there
% were errors and missing data in KEGG. The model is a
% representation of all of the known steps in this pathway.
% The model does not include genes as such. Instead it includes Open
% Reading Frames (ORFs) which are putative genes. Some ORFs may not
% code for anything.
% The code for processing sets assumes that any lists are ordered.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

start('C00631').
start('C00279').
start('C00005').
start('C00000').   % h not in KEGG
start('C00002').
start('C00014').
start('C00064').
start('C00119').
start('C00065').
start('C00003').
start('C00006').
start('C00001').   % h2o
start('C00011').   % co2
start('C00025').

end('C00078').
end('C00079').
end('C00082').

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Import metabolites
% To account for slow import of some metabolites
% NB use of "I" to label metabolites outside cell
enzyme(i1,[import],[x],1,1,[['I00074']],[['C00074']]).
enzyme(i2,[import],[x],1,1,[['I00078']],[['C00078']]).
enzyme(i3,[import],[x],1,1,[['I00079']],[['C00079']]).
enzyme(i4,[import],[x],1,1,[['I00082']],[['C00082']]).
enzyme(i5,[import],[x],1,1,[['I00108']],[['C00108']]).
enzyme(i6,[import],[x],1,2,[['I00166']],[['C00166']]).   % slow
enzyme(i7,[import],[x],1,1,[['I00463']],[['C00463']]).
enzyme(i8,[import],[x],1,1,[['I00493']],[['C00493']]).
enzyme(i9,[import],[x],1,2,[['I01179']],[['C01179']]).   % slow

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Enzymes in aromatic pathway
% enzyme(+enz_id,[+orf|ORFS],[+enz_class|Classes],+direction,+day,
%        [+lhs|Lefts],[+rhs|Rights]).
% enz_id - unique ID referencing enzyme classes
% orf - open reading frame
% enz_class - a list of enzyme classes. Each class is the Enzyme
%             Commission classification number for an enzyme.
% lhs - [+left|LHS] - a list of metabolites to be found in the lhs of
%       a reaction
%
% rhs - [+right|RHS] a list of metabolites to be found in the rhs of
%       a reaction
enzyme(e1,['YGR254W'],['4.2.1.11'],1,1,[['C00631']],[['C00001','C00074']]).
enzyme(e2,['YHR174W'],['4.2.1.11'],1,1,[['C00631']],[['C00001','C00074']]).
enzyme(e3,['YMR323W'],['4.2.1.11'],1,1,[['C00631']],[['C00001','C00074']]).
enzyme(e4,['YBR249C'],['4.1.2.15'],1,1,[['C00001','C00074','C00279']],
       [['C00009','C04691']]).
enzyme(e5,['YDR035W'],['4.1.2.15'],1,1,[['C00001','C00074','C00279']],
       [['C00009','C04691']]).
enzyme(e6,['YDR127W'],['4.6.1.3','4.2.1.10','X','1.1.1.25','2.7.1.71','2.5.1.19'],
       1,1,
       [['C04691'],['C00944'],['C02637'],['C00000','C00005','C02652'],
        ['C00002','C00493'],['C00074','C03175']],
       [['C00009','C00944'],['C00001','C02637'],['C02652'],['C00006','C00493'],
        ['C00008','C03175'],['C00009','C01269']]).
enzyme(e7,['YGL148W'],['4.6.1.4'],1,1,[['C01269']],[['C00009','C00251']]).
enzyme(e8,['YER090W'],['4.1.3.27'],1,1,[['C00014','C00251']],
       [['C00001','C00022','C00108']]).
enzyme(e9,['YER090W','YKL211C'],['4.1.3.27'],1,1,[['C00064','C00251']],
       [['C00022','C00025','C00108']]).
enzyme(e10,['YDR354W'],['2.4.2.18'],1,1,[['C00108','C00119']],
       [['C00013','C04302']]).
enzyme(e11,['YDR007W'],['5.3.1.24'],1,1,[['C04302']],[['C01302']]).
enzyme(e12,['YKL211C'],['4.1.1.48'],1,1,[['C01302']],
       [['C00001','C00011','C03506']]).
enzyme(e13,['YGL026C'],['4.2.1.20'],1,1,
       [['C00065','C03506'],['C03506'],['C00065','C00463']],
       [['C00001','C00078','C00661'],['C00463','C00661'],['C00001','C00078']]).
enzyme(e14,['YPR060C'],['5.4.99.5'],1,1,[['C00251']],[['C00254']]).
enzyme(e15,['YNL316C'],['4.2.1.51'],1,1,[['C00003','C00254']],
       [['C00004','C00011','C00166']]).
enzyme(e16,['YBR166C'],['1.3.1.13'],1,1,[['C00006','C00254']],
       [['C00000','C00005','C00011','C01179']]).
enzyme(e17,['YGL202W'],['2.6.1.7'],1,1,[['C00025','C00166'],['C00025','C01179']],
       [['C00026','C00079'],['C00026','C00082']]).   %not in KEGG
enzyme(e18,['YHR137W'],['2.6.1.7'],1,1,[['C00025','C00166'],['C00025','C01179']],
       [['C00026','C00079'],['C00026','C00082']]).   %not in KEGG

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% inhibitor(+orf,[+enz_id|Enzymes],+enz_class,[+met|Metabolites])
% inhibitor/4 states that each of the given list of metabolites
% inhibits the function of the enzyme(s) coded for by the given ORF
% orf - open reading frame
% enz_id - unique enzyme id number
% enz_class - Enzyme Commission enzyme class
% met - metabolite number
inhibitor('YBR249C',[e4],'4.1.2.15',['C00082']).
inhibitor('YDR035W',[],'4.1.2.15',[]).
inhibitor('YDR127W',[],'1.1.1.25',[]).
inhibitor('YGL148W',[],'4.6.1.4',[]).
inhibitor('YER090W',[],'4.1.3.27',[]).
inhibitor('YKL211C',[],'4.1.3.27',[]).
inhibitor('YGR254W',[],'4.2.1.11',[]).
inhibitor('YDR007W',[],'5.3.1.24',[]).
inhibitor('YGL026C',[],'4.2.1.20',[]).
inhibitor('YPR060C',[],'5.4.99.5',[]).
inhibitor('YNL316C',[],'4.2.1.51',[]).
inhibitor('YGL202W',[],'2.6.1.7',[]).
inhibitor('YHR137W',[],'2.6.1.7',[]).
inhibitor('YBR166C',[],'1.3.1.13',[]).

% shared_orfs(+orf,[+enz_id|Enzymes])
% The given ORF is involved in the production of the given list of
% enzymes (must be more than one - the enzymes ``share'' the ORF)
shared_orfs('YER090W',[e8,e9]).
shared_orfs('YKL211C',[e9,e12]).

% new_enzyme/4 - test the reaction coded for by the enzyme
% the enzyme should not be used as an import mechanism
new_enzyme(EnzId,EnzClass,Reactants,Products) :-
    enzyme(EnzId,Orfs,EnzClass,_Direction,_Day,Reactants,Products),
    \+ Orfs == [import].
% change_all_metabolites changes all import metabolite identifying
% labels from C**** to I***** e.g C00074 becomes I00074
% used to convert metabolites appearing in 2nd argument of phenotypic_effect/2
change_all_metabolites([],RDone,Changed) :-
    reverse(RDone,Changed).
change_all_metabolites([M|Metabolites],Done,Changed) :-
    change_metabolite(M,CM),!,
    change_all_metabolites(Metabolites,[CM|Done],Changed).

change_metabolite(Metabolite,NewMetabolite) :-
    name(Metabolite,[_C|Ascii]),
    conc([73],Ascii,NewAscii),       % 73 is the character code of 'I'
    name(NewMetabolite,NewAscii).

% inhibited/2 - test whether an enzyme is inhibited by the
% current cell contents
inhibited(EnzId1,Starts) :-
    inhibitor(_Orf,EnzIds,_,List),
    member(EnzId1,EnzIds),
    member(Inhib,Starts),
    member(Inhib,List).

same_enzyme(EnzId1,EnzId2) :-
    EnzId1 == EnzId2.

% same_enzyme_function/7 is true if those metabolites on the RHS
% that belong to the main pathway are the same for each enzyme,
% A metabolite belongs to the main pathway if it appears in the
% starts, is an end product, or appears in the LHS lists of the enzyme
% definitions.
% assumes only forwards direction - could be made reversible if
% direction was taken into account and the RHS was used when the
% system ran backwards
same_enzyme_function(EnzClass1,EnzClass2,
                     Lefts,_LHS1,RHS1,_LHS2,RHS2) :-
    EnzClass1 == EnzClass2,
    main_pathway_rights(RHS1,Lefts,[],Mains1),
    main_pathway_rights(RHS2,Lefts,[],Mains2),
    quicksort_mets(Mains1,SortedMains1),
    quicksort_mets(Mains2,SortedMains2),
    SortedMains1 == SortedMains2.
    %write(Enzyme:Enzyme2),nl,
    %write(Reactants:R1),nl,
    %write(Products:R2),nl,%get0(10).
    %write('evaluate_enzyme succeeded'),nl.

% gene_sharing/2 tests to see if an enzyme (enz_id) can be made by two
% or more genes (shared)
gene_sharing(EnzId1,EnzId2) :-
    shared_orfs(_Gene,EnzIds),
    member(EnzId1,EnzIds),
    member(EnzId2,EnzIds).

% An enzyme can't be used if it is the same enzyme as those produced
% by the knocked out orf (including those enzymes ``shared'' - i.e.
% where two or more orfs are required to produce the enzyme)
cant_use_enzyme(EnzId1,EnzId2,EnzClass1,EnzClass2,
                Lefts,_,RHS1,_,RHS2) :-
    same_enzyme_function(EnzClass1,EnzClass2,
                         Lefts,_,RHS1,_,RHS2),
    same_enzyme(EnzId1,EnzId2).
cant_use_enzyme(EnzId1,EnzId2,EnzClass1,EnzClass2,
                Lefts,_,RHS1,_,RHS2) :-
    same_enzyme_function(EnzClass1,EnzClass2,
                         Lefts,_,RHS1,_,RHS2),
    gene_sharing(EnzId1,EnzId2).

% uses Transitive Closure
% All Metabolites added to the minimal media are converted to import
% metabolites and added to the starts. Pathway reactions are
% tested by evaluating each enzyme at a time - deciding whether
% the enzyme is available (i.e not produced by a knocked out ORF)
% and adding any metabolites formed by the reaction.
% backtracking ensures all (allowable) reactions are tested. If
% the end products are not found in the results from this process
% the procedure succeeds i.e. phenotypic_effect/2 succeeds if
% there is NO GROWTH for the given ORF and nutrients (metabolites
% added to the minimal media)
% When the model is used in hypotheses generation, the encodes/5
% fact is constructed by the learning step and appears in the
% hypotheses generated (the codes/7 facts are omitted from the model)
phenotypic_effect(ORF,Nutrients) :-
    change_all_metabolites(Nutrients,[],Changed),
    bagof(S,start(S),Minimal),
    union(Minimal,Changed,Starts),
    %write('Starts ':Starts),nl,
    bagof(End,end(End),Ends),
    find_knocked_out_enzymes(ORF,KnockedOut),
    all_lefts(Lefts),
    %write(KnockedOut),get0(10),nl,
    new_enzyme(EnzId,EnzClass,LHS,RHS),
    %write(EnzId:EnzClass:LHS:RHS),nl,
    encodes(ORF,EnzId,EnzClass,LHS,RHS),
    %write(ORF:EnzClass:LHS:RHS),nl,!,
    connected_without_this_step(Starts,MutantProducts,
                                EnzId,EnzClass,Lefts,LHS,RHS,1),
    %write('Mutant Products ':MutantProducts),nl,
    \+ new_subset(Ends,MutantProducts).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% connected_without_this_step/8 succeeds if all of the end points can
% be reached from the start points without involving the reaction
% specified in the 3rd,4th and 5th terms.
connected_without_this_step(Starts1,Ends,EnzId,EnzClass,
                            Lefts,Reactants,Products,Day) :-
    expand_without_this_step(Starts1,Starts2,EnzId,EnzClass,
                             Lefts,Reactants,Products,Day),!,
    connected_without_this_step(Starts2,Ends,EnzId,EnzClass,
                                Lefts,Reactants,Products,Day).
connected_without_this_step(Starts,Starts,_EnzId,_EnzClass,
                            _Lefts,_Reactants,_Products,_Day).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% expand_without_this_step/8 attempts to add new metabolites from a
% reaction, checking that the enzyme catalysing the reaction is ...
%
% 1) Not inhibited by any of the current cell products
% 2) Available for use (not coded for by the knocked out gene)
expand_without_this_step(Starts1,Starts2,EnzId1,EnzClass1,Lefts,
                         Reactants,Products,Day) :-
    enzyme1(EnzId2,_Orfs,EnzClass2,R1,R2,Starts1,Day),
    %write(EnzId2:EnzClass2:R1:R2),nl,
    %write(KnockedOut),nl,
    \+ inhibited(EnzId2,Starts1),
    \+ cant_use_enzyme(EnzId1,EnzId2,EnzClass1,EnzClass2,
                       Lefts,Reactants,Products,R1,R2),
    valid_reactants(R1,R2,Starts1,1,RHS),
    union(RHS,Starts1,Starts2).
    %write('Adding ':RHS),get0(10),nl.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% subset(Set, Subset).
subset([], []) :- !.
subset([H | T], [H | T2]) :- !,
    subset(T, T2).
subset([_ | T], T2) :-
    subset(T, T2).

new_subset([],_S).
new_subset([I|Items],Set) :-
    member(I,Set),
    new_subset(Items,Set).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% union/3 - set union for ordered lists.
%
union([],S2,S2) :- !.
union(S1,[],S1) :- !.
union([H1|T1],[H2|T2],[H1|T3]) :-
    H1 == H2,
    union(T1,T2,T3), !.              % Equal
union([H1|T1],[H2|T2],[H2|T3]) :-
    H1 @> H2, !,
    union([H1|T1],T2,T3), !.         % Greater than
union([H1|T1],S2,[H1|T3]) :- !,
    union(T1,S2,T3), !.              % Less than

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
ends(['C00078', 'C00079', 'C00082']).
% On minimal medium need to have a path from starts to all ends
% because every end point is needed for growth.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
generated_by_other_pathways(['C00000', 'C00001', 'C00002', 'C00003',
                             'C00005', 'C00006', 'C00014', 'C00025',
                             'C00064', 'C00065', 'C00119', 'C00279',
                             'C00631']).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% mandatory_nutrients/1
mandatory_nutrients([]).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% optional_nutrients/1
optional_nutrients(['C00074', 'C00078', 'C00079', 'C00082', 'C00108',
                    'C00166', 'C00463', 'C00493', 'C01179']).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Ignoring Direction and Day:
% Direction is always forward only
% Only considering Day = 1
encodes(ORF,EnzId,Type,Reactants1,Reactants2) :-
    codes(ORF,EnzId,Type,_Direction,_Day,Reactants1,Reactants2).

%enzyme(EnzId,Orfs,Type, Reactants1, Reactants2):-
%    enzyme(EnzId,Orfs,Type, Direction, Day, Reactants1, Reactants2).

enzyme1(EnzId,Orfs,Enzyme,R1,R2,_Starts,Day) :-
    enz(EnzId,Orfs,Enzyme,R1,R2,Day).

enz(EnzId,Orfs,Ec,R1,R2,Day) :-
    enzyme(EnzId,Orfs,Ec,1,D,R1,R2),
    D =< Day.        % tests time % non reverse
enz(EnzId,Orfs,Ec,R1,R2,Day) :-
    enzyme(EnzId,Orfs,Ec,2,D,R1,R2),
    D =< Day.        % reverse
enz(EnzId,Orfs,Ec,R2,R1,Day) :-
    enzyme(EnzId,Orfs,Ec,2,D,R1,R2),
    D =< Day.        % reverse

% valid_reactants/5 ensures that a new reaction is evaluated when an
% enzyme is evaluated - used when more than one reaction is coded for by
% one ORF e.g. YDR127W
% i.e. all new metabolites are eventually added
valid_reactants([LHS|_Lefts],Rights,Starts,I,RHS) :-
    new_subset(LHS,Starts),
    find_rhs(1,I,Rights,RHS),
    \+ new_subset(RHS,Starts).
valid_reactants([_LHS|Lefts],Rights,Starts,I,RHS) :-
    I2 is I + 1,
    valid_reactants(Lefts,Rights,Starts,I2,RHS).

find_rhs(I,N,[RHS|_Rights],RHS) :-
    I == N.
find_rhs(I,N,[_RHS|Rights],Found) :-
    I2 is I + 1,
    find_rhs(I2,N,Rights,Found).

in_mets_list(Metabolite,[MList|_LHS]) :-
    member(Metabolite,MList).
in_mets_list(Metabolite,[_MList|LHS]) :-
    in_mets_list(Metabolite,LHS).

in_lhs(Metabolite,[LHS|_Lefts]) :-
    in_mets_list(Metabolite,LHS).
in_lhs(Metabolite,[_LHS|Lefts]) :-
    in_lhs(Metabolite,Lefts).

main_pathway(Metabolite,_Lefts) :-
    start(Metabolite).
main_pathway(Metabolite,Lefts) :-
    in_lhs(Metabolite,Lefts).
main_pathway(Metabolite,_Lefts) :-
    end(Metabolite).

all_lefts(Lefts) :-
    bagof(LHS, EId^Genes^EC^Dir^Day^RHS^enzyme(EId,Genes,EC,Dir,Day,LHS,RHS), Lefts).

main_pathway_rhs([],_Lefts,Mains,Mains).
main_pathway_rhs([Met|Metabolites],Lefts,Done,Mains) :-
    main_pathway(Met,Lefts),
    main_pathway_rhs(Metabolites,Lefts,[Met|Done],Mains).
main_pathway_rhs([_Met|Metabolites],Lefts,Done,Mains) :-
    main_pathway_rhs(Metabolites,Lefts,Done,Mains).

main_pathway_rights([],_Lefts,Mains,Mains).
main_pathway_rights([RHS|Rights],Lefts,Done,Mains) :-
    main_pathway_rhs(RHS,Lefts,[],Found),
    conc(Done,Found,NewDone),
    main_pathway_rights(Rights,Lefts,NewDone,Mains).

member(I,[I|_Items]).
member(I,[_OI|Items]) :-
    member(I,Items).

conc([],L,L).
conc([X|L1],L2,[X|L3]) :-
    conc(L1,L2,L3).

reverse([],[]).
reverse([I],[I]).
reverse([I1,I2],[I2,I1]).
reverse([I|Items],ReversedRest) :-
    reverse(Items,Reversed),
    conc(Reversed,[I],ReversedRest).

quicksort_mets([],[]).
quicksort_mets([X|Tail],Sorted) :-
    split_mets(X,Tail,Small,Big),
    quicksort_mets(Small,SortedSmall),
    quicksort_mets(Big,SortedBig),
    append(SortedSmall,[X|SortedBig],Sorted).

split_mets(_,[],[],[]).
split_mets(X,[Y|Tail],[Y|Small],Big) :-
    X @> Y,!,
    split_mets(X,Tail,Small,Big).
split_mets(X,[Y|Tail],Small,[Y|Big]) :-
    split_mets(X,Tail,Small,Big).

find_all_enzymes(Enzymes) :-
    bagof(enzyme(EnzId,Orfs,EnzClass,Direction,Day,LHS,RHS),
          enzyme(EnzId,Orfs,EnzClass,Direction,Day,LHS,RHS),Enzymes).

knocked_out_enzymes(_Orf,[],KnockedOut,KnockedOut).
knocked_out_enzymes(Orf,[Enz|Enzymes],Done,KnockedOut) :-
    Enz = enzyme(EnzId,Orfs,_EnzClass,_Direction,_Day,_LHS,_RHS),
    member(Orf,Orfs),
    knocked_out_enzymes(Orf,Enzymes,[EnzId|Done],KnockedOut).
knocked_out_enzymes(Orf,[_Enz|Enzymes],Done,KnockedOut) :-
    knocked_out_enzymes(Orf,Enzymes,Done,KnockedOut).

find_knocked_out_enzymes(Orf,KnockedOut) :-
    find_all_enzymes(Enzymes),
    knocked_out_enzymes(Orf,Enzymes,[],KnockedOut).

knocked_out_enzyme(EnzId,KnockedOut) :-
    member(EnzId,KnockedOut).

7. Experimental Data

           Iteration
Technique  0      1      2      3      4      5
ase        57.36  67.17  76.14  79.54  80.47  80.11
random     57.36  67.39  72.59  71.58  73.03  72.16
naive      57.36  67.27  68.55  73.83  73.89  73.94
Table 1: Classification Accuracies (%) for ase, random and naïve.

           Iteration
Technique  0     1     2     3     4     5
ase        0.00  1.00  1.76  2.26  2.41  2.50
random     0.00  2.91  3.56  3.75  3.87  3.96
naive      0.00  1.00  1.59  1.89  2.11  2.26
Table 2: Relative Cost (Log10 £) for ase, random and naïve.

Gene     Run  Technique  Iteration  Cost   Accuracy
YBR166C  1    ase        1          10     69.3827
YDR007W  1    ase        1          10     74.3210
YDR035W  1    ase        1          10     28.8889
YDR354W  1    ase        1          10     74.3210
YER090W  1    ase        1          10     59.7531
YGL026C  1    ase        1          10     79.2593
YKL211C  1    ase        1          10     63.7427
YNL316C  1    ase        1          10     69.3827
YBR166C  2    ase        1          10     69.3827
YDR007W  2    ase        1          10     63.7427
YDR035W  2    ase        1          10     53.4503
YDR354W  2    ase        1          10     74.3210
YER090W  2    ase        1          10     59.7531
YGL026C  2    ase        1          10     79.2593
YKL211C  2    ase        1          10     74.3210
YNL316C  2    ase        1          10     69.3827
YBR166C  3    ase        1          10     69.3827
YDR007W  3    ase        1          10     74.3210
YDR035W  3    ase        1          10     53.4503
YDR354W  3    ase        1          10     74.3210
YER090W  3    ase        1          10     59.7531
YGL026C  3    ase        1          10     79.2593
YKL211C  3    ase        1          10     74.3210
YNL316C  3    ase        1          10     69.3827
YBR166C  4    ase        1          10     69.3827
YDR007W  4    ase        1          10     74.3210
YDR035W  4    ase        1          10     53.4503
YDR354W  4    ase        1          10     74.3210
YER090W  4    ase        1          10     59.7531
YGL026C  4    ase        1          10     79.2593
YKL211C  4    ase        1          10     74.3210
YNL316C  4    ase        1          10     47.8363
Table 3: Ase Results for the Robot Scientist: Iteration 1.
Gene     Run  Technique  Iteration  Cost   Accuracy
YBR166C  1    ase        2          62     69.3827
YDR007W  1    ase        2          62     95.5556
YDR035W  1    ase        2          62     43.3333
YDR354W  1    ase        2          62     95.5556
YER090W  1    ase        2          62     80.0000
YGL026C  1    ase        2          62     86.6667
YKL211C  1    ase        2          39     88.3333
YNL316C  1    ase        2          62     69.3827
YBR166C  2    ase        2          62     69.3827
YDR007W  2    ase        2          39     88.3333
YDR035W  2    ase        2          39     53.4503
YDR354W  2    ase        2          62     95.5556
YER090W  2    ase        2          62     80.0000
YGL026C  2    ase        2          62     86.6667
YKL211C  2    ase        2          62     95.5556
YNL316C  2    ase        2          62     69.3827
YBR166C  3    ase        2          62     69.3827
YDR007W  3    ase        2          62     95.5556
YDR035W  3    ase        2          39     55.0000
YDR354W  3    ase        2          62     95.5556
YER090W  3    ase        2          62     59.7531
YGL026C  3    ase        2          62     79.2593
YKL211C  3    ase        2          62     74.3210
YNL316C  3    ase        2          62     69.3827
YBR166C  4    ase        2          62     69.3827
YDR007W  4    ase        2          62     95.5556
YDR035W  4    ase        2          39     55.0000
YDR354W  4    ase        2          62     95.5556
YER090W  4    ase        2          62     59.7531
YGL026C  4    ase        2          62     79.2593
YKL211C  4    ase        2          62     74.3210
YNL316C  4    ase        2          39     42.7778
Table 4: Ase Results for the Robot Scientist: Iteration 2.

Gene     Run  Technique  Iteration  Cost   Accuracy
YBR166C  1    ase        3          166    69.3827
YDR007W  1    ase        3          251    100.0000
YDR035W  1    ase        3          251    46.6667
YDR354W  1    ase        3          251    100.0000
YER090W  1    ase        3          251    84.4444
YGL026C  1    ase        3          251    86.6667
YKL211C  1    ase        3          78     88.3333
YNL316C  1    ase        3          166    82.2222
YBR166C  2    ase        3          166    82.2222
YDR007W  2    ase        3          78     88.3333
YDR035W  2    ase        3          78     55.0000
YDR354W  2    ase        3          251    100.0000
YER090W  2    ase        3          251    80.0000
YGL026C  2    ase        3          251    86.6667
YKL211C  2    ase        3          251    100.0000
YNL316C  2    ase        3          166    82.2222
YBR166C  3    ase        3          166    82.2222
YDR007W  3    ase        3          251    100.0000
YDR035W  3    ase        3          78     55.0000
YDR354W  3    ase        3          251    100.0000
YER090W  3    ase        3          166    59.7531
YGL026C  3    ase        3          166    79.2593
YKL211C  3    ase        3          166    74.3210
YNL316C  3    ase        3          166    69.3827
YBR166C  4    ase        3          166    82.2222
YDR007W  4    ase        3          251    100.0000
YDR035W  4    ase        3          78     55.0000
YDR354W  4    ase        3          251    100.0000
YER090W  4    ase        3          166    59.7531
YGL026C  4    ase        3          166    79.2593
YKL211C  4    ase        3          166    74.3210
YNL316C  4    ase        3          78     42.7778
Table 5: Ase Results for the Robot Scientist: Iteration 3.

Gene     Run  Technique  Iteration  Cost   Accuracy
YBR166C  1    ase        4          270    63.5556
YDR007W  1    ase        4          280    100.0000
YDR035W  1    ase        4          280    46.6667
YDR354W  1    ase        4          280    100.0000
YER090W  1    ase        4          280    84.4444
YGL026C  1    ase        4          450    86.6667
YKL211C  1    ase        4          130    88.3333
YNL316C  1    ase        4          218    82.2222
YBR166C  2    ase        4          218    100.0000
YDR007W  2    ase        4          130    88.3333
YDR035W  2    ase        4          130    55.0000
YDR354W  2    ase        4          280    100.0000
YER090W  2    ase        4          450    80.0000
YGL026C  2    ase        4          450    86.6667
YKL211C  2    ase        4          280    100.0000
YNL316C  2    ase        4          218    82.2222
YBR166C  3    ase        4          218    82.2222
YDR007W  3    ase        4          280    100.0000
YDR035W  3    ase        4          130    55.0000
YDR354W  3    ase        4          280    100.0000
YER090W  3    ase        4          270    59.7531
YGL026C  3    ase        4          270    79.2593
YKL211C  3    ase        4          270    74.3210
YNL316C  3    ase        4          270    69.3827
YBR166C  4    ase        4          218    100.0000
YDR007W  4    ase        4          280    100.0000
YDR035W  4    ase        4          130    55.0000
YDR354W  4    ase        4          280    100.0000
YER090W  4    ase        4          270    59.7531
YGL026C  4    ase        4          270    79.2593
YKL211C  4    ase        4          270    74.3210
YNL316C  4    ase        4          130    42.7778
Table 6: Ase Results for the Robot Scientist: Iteration 4.
Gene     Run  Technique  Iteration  Cost   Accuracy
YBR166C  1    ase        5          459    63.5556
YDR007W  1    ase        5          319    100.0000
YDR035W  1    ase        5          319    46.6667
YDR354W  1    ase        5          319    100.0000
YER090W  1    ase        5          319    84.4444
YGL026C  1    ase        5          479    86.6667
YKL211C  1    ase        5          182    88.3333
YNL316C  1    ase        5          270    82.2222
YBR166C  2    ase        5          218    100.0000
YDR007W  2    ase        5          182    88.3333
YDR035W  2    ase        5          182    55.0000
YDR354W  2    ase        5          319    88.3333
YER090W  2    ase        5          479    80.0000
YGL026C  2    ase        5          479    86.6667
YKL211C  2    ase        5          319    100.0000
YNL316C  2    ase        5          270    82.2222
YBR166C  3    ase        5          270    82.2222
YDR007W  3    ase        5          319    100.0000
YDR035W  3    ase        5          182    55.0000
YDR354W  3    ase        5          319    100.0000
YER090W  3    ase        5          374    59.7531
YGL026C  3    ase        5          374    79.2593
YKL211C  3    ase        5          374    74.3210
YNL316C  3    ase        5          374    69.3827
YBR166C  4    ase        5          218    100.0000
YDR007W  4    ase        5          319    100.0000
YDR035W  4    ase        5          182    55.0000
YDR354W  4    ase        5          319    100.0000
YER090W  4    ase        5          374    59.7531
YGL026C  4    ase        5          374    79.2593
YKL211C  4    ase        5          374    74.3210
YNL316C  4    ase        5          182    42.7778
Table 7: Ase Results for the Robot Scientist: Iteration 5.
Experimental Data: Random

Gene     Run  Technique  Iteration   Cost  Accuracy
YBR166C  1    random     1            247   81.3333
YDR007W  1    random     1             81   79.4444
YDR035W  1    random     1             62   53.4503
YDR354W  1    random     1            247   76.4646
YER090W  1    random     1            104   63.1579
YGL026C  1    random     1             62   81.1111
YKL211C  1    random     1            241   63.7427
YNL316C  1    random     1            205   69.3827
YBR166C  2    random     1            189   78.5185
YDR007W  2    random     1            822   63.7427
YDR035W  2    random     1             62   53.4503
YDR354W  2    random     1           9427   76.4646
YER090W  2    random     1           9427   67.9798
YGL026C  2    random     1            241   80.4444
YKL211C  2    random     1            104   63.7427
YNL316C  2    random     1            224   64.5455
YBR166C  3    random     1             81   64.5556
YDR007W  3    random     1             29   76.8687
YDR035W  3    random     1             62   53.4503
YDR354W  3    random     1             62   77.2222
YER090W  3    random     1            104   47.1111
YGL026C  3    random     1            685   56.9591
YKL211C  3    random     1             62   77.2222
YNL316C  3    random     1            384   78.5185
YBR166C  4    random     1            224   64.5455
YDR007W  4    random     1            104   81.1111
YDR035W  4    random     1            199   20.0000
YDR354W  4    random     1            643   74.3210
YER090W  4    random     1            241   63.1579
YGL026C  4    random     1            247   74.8485
YKL211C  4    random     1            218   63.7427
YNL316C  4    random     1            685   64.5556

Table 8: Random Results for the Robot Scientist: Iteration 1.
Experimental Data: Random

Gene     Run  Technique  Iteration   Cost  Accuracy
YBR166C  1    random     2            488   81.3333
YDR007W  1    random     2            133   93.1481
YDR035W  1    random     2            257   53.4503
YDR354W  1    random     2            446   76.4646
YER090W  1    random     2            309   72.0000
YGL026C  1    random     2           9626   81.1111
YKL211C  1    random     2           9626   95.5556
YNL316C  1    random     2            404   69.3827
YBR166C  2    random     2            388   78.5185
YDR007W  2    random     2          10249   63.7427
YDR035W  2    random     2            143   53.4503
YDR354W  2    random     2          18831   76.4646
YER090W  2    random     2           9531   85.3704
YGL026C  2    random     2            293   80.4444
YKL211C  2    random     2           9531   82.0833
YNL316C  2    random     2            276   64.5455
YBR166C  3    random     2            276   64.5556
YDR007W  3    random     2            276   76.8687
YDR035W  3    random     2            143   17.3333
YDR354W  3    random     2            724   77.2222
YER090W  3    random     2           9489   47.1111
YGL026C  3    random     2            880   81.2963
YKL211C  3    random     2            309   77.2222
YNL316C  3    random     2           9788   78.5185
YBR166C  4    random     2            305  100.0000
YDR007W  4    random     2            299   81.1111
YDR035W  4    random     2           9603   20.0000
YDR354W  4    random     2            861   74.3210
YER090W  4    random     2           9668   85.3704
YGL026C  4    random     2            328   74.8485
YKL211C  4    random     2           1046   95.3333
YNL316C  4    random     2            926   64.5556

Table 9: Random Results for the Robot Scientist: Iteration 2.
Experimental Data: Random

Gene     Run  Technique  Iteration   Cost  Accuracy
YBR166C  1    random     3            687   81.3333
YDR007W  1    random     3            374   93.1481
YDR035W  1    random     3            481   53.4503
YDR354W  1    random     3            670   46.6667
YER090W  1    random     3            498   84.4444
YGL026C  1    random     3           9831   81.1111
YKL211C  1    random     3          10311   95.5556
YNL316C  1    random     3           9831   69.3827
YBR166C  2    random     3           1050   78.5185
YDR007W  2    random     3          10353   87.2222
YDR035W  2    random     3            828   47.2222
YDR354W  2    random     3          19474   74.4444
YER090W  2    random     3           9593   85.3704
YGL026C  2    random     3           9678   80.4444
YKL211C  2    random     3          18916   74.4444
YNL316C  2    random     3            328   64.5455
YBR166C  3    random     3            523   64.5556
YDR007W  3    random     3            328   51.1111
YDR035W  3    random     3           9570   17.3333
YDR354W  3    random     3            786   75.9259
YER090W  3    random     3          10132   47.1111
YGL026C  3    random     3            942   81.2963
YKL211C  3    random     3            319   77.2222
YNL316C  3    random     3          19215   78.5185
YBR166C  4    random     3            344  100.0000
YDR007W  4    random     3            984   46.6667
YDR035W  4    random     3          10288   33.3333
YDR354W  4    random     3           1079  100.0000
YER090W  4    random     3          10353   85.3704
YGL026C  4    random     3            546   74.8485
YKL211C  4    random     3          10610   95.3333
YNL316C  4    random     3           1611   64.5556

Table 10: Random Results for the Robot Scientist: Iteration 3.
Experimental Data: Random

Gene     Run  Technique  Iteration   Cost  Accuracy
YBR166C  1    random     4          10251   81.3333
YDR007W  1    random     4           1059   93.1481
YDR035W  1    random     4            728  100.0000
YDR354W  1    random     4            670   46.6667
YER090W  1    random     4           1160   84.4444
YGL026C  1    random     4           9860   81.1111
YKL211C  1    random     4          10363   95.5556
YNL316C  1    random     4          10474   69.3827
YBR166C  2    random     4           1735   78.5185
YDR007W  2    random     4          11038   87.2222
YDR035W  2    random     4          10255   47.2222
YDR354W  2    random     4          19715   74.4444
YER090W  2    random     4           9622   85.3704
YGL026C  2    random     4          10062   80.4444
YKL211C  2    random     4          19601   74.4444
YNL316C  2    random     4            990   64.5455
YBR166C  3    random     4           9950   64.5556
YDR007W  3    random     4            328   51.1111
YDR035W  3    random     4           9769   17.3333
YDR354W  3    random     4            838   75.9259
YER090W  3    random     4          10373   47.1111
YGL026C  3    random     4           1627   81.2963
YKL211C  3    random     4            371   77.2222
YNL316C  3    random     4          19456   78.5185
YBR166C  4    random     4            344  100.0000
YDR007W  4    random     4            984   46.6667
YDR035W  4    random     4          10288   33.3333
YDR354W  4    random     4           1463  100.0000
YER090W  4    random     4          19917   85.3704
YGL026C  4    random     4           1179   74.8485
YKL211C  4    random     4          10714   95.3333
YNL316C  4    random     4          11038   64.5556

Table 11: Random Results for the Robot Scientist: Iteration 4.
Experimental Data: Random

Gene     Run  Technique  Iteration   Cost  Accuracy
YBR166C  1    random     5          10303   81.3333
YDR007W  1    random     5          10623   93.1481
YDR035W  1    random     5            728  100.0000
YDR354W  1    random     5            670   46.6667
YER090W  1    random     5          10545   84.4444
YGL026C  1    random     5          10049   81.1111
YKL211C  1    random     5          10425   95.5556
YNL316C  1    random     5          11136   69.3827
YBR166C  2    random     5           2378   78.5185
YDR007W  2    random     5          11723   87.2222
YDR035W  2    random     5          10359   47.2222
YDR354W  2    random     5          19796   46.6667
YER090W  2    random     5           9869   85.3704
YGL026C  2    random     5          10114   80.4444
YKL211C  2    random     5          29005   74.4444
YNL316C  2    random     5           1675   64.5455
YBR166C  3    random     5          10054   64.5556
YDR007W  3    random     5            328   51.1111
YDR035W  3    random     5           9831   17.3333
YDR354W  3    random     5           1037   75.9259
YER090W  3    random     5          10425   47.1111
YGL026C  3    random     5          11012   81.2963
YKL211C  3    random     5            560   77.2222
YNL316C  3    random     5          19485   78.5185
YBR166C  4    random     5            344  100.0000
YDR007W  4    random     5            984   46.6667
YDR035W  4    random     5          10288   33.3333
YDR354W  4    random     5           1710  100.0000
YER090W  4    random     5          20579   85.3704
YGL026C  4    random     5          10606   74.8485
YKL211C  4    random     5          10776   95.3333
YNL316C  4    random     5          11285   64.5556

Table 12: Random Results for the Robot Scientist: Iteration 5.
Experimental Data: Naive

Gene     Run  Technique  Iteration   Cost  Accuracy
YBR166C  1    naive      1             10   69.3827
YDR007W  1    naive      1             10   74.3210
YDR035W  1    naive      1             10   28.8889
YDR354W  1    naive      1             10   74.3210
YER090W  1    naive      1             10   59.7531
YGL026C  1    naive      1             10   79.2593
YKL211C  1    naive      1             10   63.7427
YNL316C  1    naive      1             10   69.3827
YBR166C  2    naive      1             10   69.3827
YDR007W  2    naive      1             10   63.7427
YDR035W  2    naive      1             10   53.4503
YDR354W  2    naive      1             10   74.3210
YER090W  2    naive      1             10   59.7531
YGL026C  2    naive      1             10   79.2593
YKL211C  2    naive      1             10   74.3210
YNL316C  2    naive      1             10   69.3827
YBR166C  3    naive      1             10   69.3827
YDR007W  3    naive      1             10   74.3210
YDR035W  3    naive      1             10   53.4503
YDR354W  3    naive      1             10   74.3210
YER090W  3    naive      1             10   59.7531
YGL026C  3    naive      1             10   79.2593
YKL211C  3    naive      1             10   74.3210
YNL316C  3    naive      1             10   69.3827
YBR166C  4    naive      1             10   69.3827
YDR007W  4    naive      1             10   74.3210
YDR035W  4    naive      1             10   56.4503
YDR354W  4    naive      1             10   74.3210
YER090W  4    naive      1             10   59.7531
YGL026C  4    naive      1             10   79.2593
YKL211C  4    naive      1             10   74.3210
YNL316C  4    naive      1             10   47.8363

Table 13: Naive Results for the Robot Scientist: Iteration 1.
Experimental Data: Naive

Gene     Run  Technique  Iteration   Cost  Accuracy
YBR166C  1    naive      2             39   69.3827
YDR007W  1    naive      2             39   74.3210
YDR035W  1    naive      2             39   28.8889
YDR354W  1    naive      2             39   74.3210
YER090W  1    naive      2             39   59.7531
YGL026C  1    naive      2             39   79.2593
YKL211C  1    naive      2             39   88.3333
YNL316C  1    naive      2             39   69.3827
YBR166C  2    naive      2             39   69.3827
YDR007W  2    naive      2             39   88.3333
YDR035W  2    naive      2             39   53.4503
YDR354W  2    naive      2             39   74.3210
YER090W  2    naive      2             39   59.7531
YGL026C  2    naive      2             39   79.2593
YKL211C  2    naive      2             39   74.3210
YNL316C  2    naive      2             39   69.3827
YBR166C  3    naive      2             39   69.3827
YDR007W  3    naive      2             39   74.3210
YDR035W  3    naive      2             39   53.4503
YDR354W  3    naive      2             39   74.3210
YER090W  3    naive      2             39   59.7531
YGL026C  3    naive      2             39   79.2593
YKL211C  3    naive      2             39   74.3210
YNL316C  3    naive      2             39   69.3827
YBR166C  4    naive      2             39   69.3827
YDR007W  4    naive      2             39   74.3210
YDR035W  4    naive      2             39   53.4503
YDR354W  4    naive      2             39   74.3210
YER090W  4    naive      2             39   59.7531
YGL026C  4    naive      2             39   79.2593
YKL211C  4    naive      2             39   74.3210
YNL316C  4    naive      2             39   42.7778

Table 14: Naive Results for the Robot Scientist: Iteration 2.
Experimental Data: Naive

Gene     Run  Technique  Iteration   Cost  Accuracy
YBR166C  1    naive      3             78   69.3827
YDR007W  1    naive      3             78   88.3333
YDR035W  1    naive      3             78   55.0000
YDR354W  1    naive      3             78   74.3210
YER090W  1    naive      3             78   96.1111
YGL026C  1    naive      3             78   79.2593
YKL211C  1    naive      3             78   88.3333
YNL316C  1    naive      3             78   69.3827
YBR166C  2    naive      3             78   69.3827
YDR007W  2    naive      3             78   88.3333
YDR035W  2    naive      3             78   53.4503
YDR354W  2    naive      3             78   88.3333
YER090W  2    naive      3             78   96.1111
YGL026C  2    naive      3             78   79.2593
YKL211C  2    naive      3             78   88.3333
YNL316C  2    naive      3             78   69.3827
YBR166C  3    naive      3             78   69.3827
YDR007W  3    naive      3             78   88.3333
YDR035W  3    naive      3             78   53.4503
YDR354W  3    naive      3             78   74.3210
YER090W  3    naive      3             78   59.7531
YGL026C  3    naive      3             78   79.2593
YKL211C  3    naive      3             78   74.3210
YNL316C  3    naive      3             78   69.3827
YBR166C  4    naive      3             78   69.3827
YDR007W  4    naive      3             78   74.3210
YDR035W  4    naive      3             78   53.4503
YDR354W  4    naive      3             78   88.3333
YER090W  4    naive      3             78   59.7531
YGL026C  4    naive      3             78   79.2593
YKL211C  4    naive      3             78   74.3210
YNL316C  4    naive      3             78   42.7778

Table 15: Naive Results for the Robot Scientist: Iteration 3.
Experimental Data: Naive

Gene     Run  Technique  Iteration   Cost  Accuracy
YBR166C  1    naive      4            130   69.3827
YDR007W  1    naive      4            130   88.3333
YDR035W  1    naive      4            130   55.0000
YDR354W  1    naive      4            130   95.5556
YER090W  1    naive      4            130   96.1111
YGL026C  1    naive      4            130   86.6667
YKL211C  1    naive      4            130   88.3333
YNL316C  1    naive      4            130   69.3827
YBR166C  2    naive      4            130   69.3827
YDR007W  2    naive      4            130   88.3333
YDR035W  2    naive      4            130   53.4503
YDR354W  2    naive      4            130   88.3333
YER090W  2    naive      4            130   96.1111
YGL026C  2    naive      4            130   79.2593
YKL211C  2    naive      4            130   88.3333
YNL316C  2    naive      4            130   54.4444
YBR166C  3    naive      4            130   69.3827
YDR007W  3    naive      4            130   88.3333
YDR035W  3    naive      4            130   55.0000
YDR354W  3    naive      4            130   74.3210
YER090W  3    naive      4            130   59.7531
YGL026C  3    naive      4            130   79.2593
YKL211C  3    naive      4            130   74.3210
YNL316C  3    naive      4            130   69.3827
YBR166C  4    naive      4            130   54.4444
YDR007W  4    naive      4            130   74.3210
YDR035W  4    naive      4            130   55.0000
YDR354W  4    naive      4            130   88.3333
YER090W  4    naive      4            130   59.7531
YGL026C  4    naive      4            130   79.2593
YKL211C  4    naive      4            130   74.3210
YNL316C  4    naive      4            130   42.7778

Table 16: Naive Results for the Robot Scientist: Iteration 4.

Experimental Data: Naive

Gene     Run  Technique  Iteration   Cost  Accuracy
YBR166C  1    naive      5            182   69.3827
YDR007W  1    naive      5            182   88.3333
YDR035W  1    naive      5            182   55.0000
YDR354W  1    naive      5            182   95.5556
YER090W  1    naive      5            182   96.1111
YGL026C  1    naive      5            182   86.6667
YKL211C  1    naive      5            182   88.3333
YNL316C  1    naive      5            182   69.3827
YBR166C  2    naive      5            182   69.3827
YDR007W  2    naive      5            182   88.3333
YDR035W  2    naive      5            182   55.0000
YDR354W  2    naive      5            182   88.3333
YER090W  2    naive      5            182   96.1111
YGL026C  2    naive      5            182   79.2593
YKL211C  2    naive      5            182   88.3333
YNL316C  2    naive      5            182   54.4444
YBR166C  3    naive      5            182   69.3827
YDR007W  3    naive      5            182   88.3333
YDR035W  3    naive      5            182   55.0000
YDR354W  3    naive      5            182   74.3210
YER090W  3    naive      5            182   59.7531
YGL026C  3    naive      5            182   79.2593
YKL211C  3    naive      5            182   74.3210
YNL316C  3    naive      5            182   69.3827
YBR166C  4    naive      5            182   54.4444
YDR007W  4    naive      5            182   74.3210
YDR035W  4    naive      5            182   55.0000
YDR354W  4    naive      5            182   88.3333
YER090W  4    naive      5            182   59.7531
YGL026C  4    naive      5            182   79.2593
YKL211C  4    naive      5            182   74.3210
YNL316C  4    naive      5            182   42.7778

Table 17: Naive Results for the Robot Scientist: Iteration 5.

7. Simulation Data

                      Accuracy                                             Cost
Iteration  Technique  median    2.5%   97.5%  median-2.5%  97.5%-median    median    2.5%   97.5%  median-2.5%  97.5%-median
0          ase         57.36   57.36   57.36         0.00          0.00      0.00    0.00    0.00         0.00          0.00
1          ase         69.70   69.70   69.70         0.00          0.00      1.00    1.00    1.00         0.00          0.00
2          ase         78.59   78.59   78.59         0.00          0.00      1.77    1.77    1.77         0.00          0.00
3          ase         89.28   89.28   89.28         0.00          0.00      2.20    2.20    2.20         0.00          0.00
4          ase         95.63   95.63   95.63         0.00          0.00      2.26    2.26    2.26         0.00          0.00
5          ase         97.85   97.85   97.85         0.00          0.00      2.27    2.27    2.27         0.00          0.00
6          ase         97.85   97.85   97.85         0.00          0.00      2.27    2.27    2.27         0.00          0.00
7          ase         97.85   97.85   97.85         0.00          0.00      2.27    2.27    2.27         0.00          0.00
8          ase         97.85   97.85   97.85         0.00          0.00      2.27    2.27    2.27         0.00          0.00
9          ase         97.85   97.85   97.85         0.00          0.00      2.27    2.27    2.27         0.00          0.00
10         ase         97.85   97.85   97.85         0.00          0.00      2.27    2.27    2.27         0.00          0.00
0          random      57.36   57.36   57.36         0.00          0.00      0.00    0.00    0.00         0.00          0.00
1          random      67.29   62.04   72.67         5.24          5.38      2.67    2.14    3.39         0.53          0.72
2          random      75.38   66.65   87.08         8.73         11.70      3.16    2.64    3.69         0.52          0.53
3          random      80.81   70.24   89.41        10.56          8.61      3.42    2.98    3.94         0.43          0.52
4          random      84.90   72.23   92.89        12.67          7.99      3.57    3.12    4.05         0.45          0.48
5          random      86.77   73.03   95.58        13.73          8.81      3.69    3.17    4.14         0.52          0.44
6          random      87.82   75.31   96.01        12.51          8.19      3.79    3.19    4.18         0.60          0.38
7          random      89.53   75.31   96.01        14.21          6.48      3.87    3.42    4.22         0.45          0.34
8          random      90.97   79.58   96.59        11.38          5.63      3.92    3.46    4.27         0.46          0.35
9          random      91.57   79.58   96.99        11.99          5.42      3.96    3.46    4.29         0.50          0.33
10         random      92.89   79.58   96.99        13.31          4.10      4.00    3.49    4.32         0.51          0.32
0          naive       57.36   57.36   57.36         0.00          0.00      0.00    0.00    0.00         0.00          0.00
1          naive       69.70   69.70   69.70         0.00          0.00      1.00    1.00    1.00         0.00          0.00
2          naive       73.82   73.82   73.82         0.00          0.00      1.60    1.60    1.60         0.00          0.00
3          naive       73.82   73.82   73.82         0.00          0.00      1.87    1.87    1.87         0.00          0.00
4          naive       82.71   82.71   82.71         0.00          0.00      2.06    2.06    2.06         0.00          0.00
5          naive       86.53   86.53   86.53         0.00          0.00      2.17    2.17    2.17         0.00          0.00
6          naive       96.18   96.18   96.18         0.00          0.00      2.24    2.24    2.24         0.00          0.00
7          naive       96.18   96.18   96.18         0.00          0.00      2.27    2.27    2.27         0.00          0.00
8          naive       96.18   96.18   96.18         0.00          0.00      2.30    2.30    2.30         0.00          0.00
9          naive       96.18   96.18   96.18         0.00          0.00      2.33    2.33    2.33         0.00          0.00
10         naive       96.18   96.18   96.18         0.00          0.00      2.36    2.36    2.36         0.00          0.00

Table 18. Median, 2.5th and 97.5th percentiles for Classification Accuracy and Relative Experimental Cost (Log10 £): 100 runs. Robot Scientist simulation with 0% noise.

                      Accuracy                                             Cost
Iteration  Technique  median    2.5%   97.5%  median-2.5%  97.5%-median    median    2.5%   97.5%  median-2.5%  97.5%-median
0          ase         57.36   57.36   57.36         0.00          0.00      0.00    0.00    0.00         0.00          0.00
1          ase         65.68   58.08   69.70         7.61          4.02      1.00    1.00    1.00         0.00          0.00
2          ase         72.35   63.79   78.04         8.56          5.69      1.76    1.72    1.79         0.04          0.03
3          ase         77.49   61.81   87.92        15.69         10.42      2.17    2.07    2.29         0.10          0.12
4          ase         80.22   66.57   91.74        13.65         11.52      2.34    2.24    2.45         0.10          0.11
5          ase         82.17   66.29   92.71        15.88         10.54      2.40    2.28    2.53         0.12          0.13
6          ase         83.28   67.99   94.02        15.29         10.74      2.43    2.30    2.58         0.13          0.14
7          ase         84.72   66.60   96.39        18.13         11.67      2.46    2.32    2.63         0.14          0.16
8          ase         84.72   66.29   96.39        18.43         11.67      2.49    2.34    2.67         0.15          0.18
9          ase         85.07   70.88   97.85        14.19         12.78      2.51    2.35    2.70         0.16          0.19
10         ase         84.19   70.63   97.85        13.56         13.66      2.51    2.36    2.71         0.16          0.20
0          random      57.36   57.36   57.36         0.00          0.00      0.00    0.00    0.00         0.00          0.00
1          random      64.15   55.40   72.52         8.75          8.37      2.56    1.91    3.00         0.65          0.43
2          random      68.45   56.62   80.96        11.83         12.51      3.11    2.65    3.57         0.46          0.46
3          random      70.10   55.99   84.93        14.11         14.83      3.43    2.94    3.89         0.48          0.46
4          random      72.18   55.99   88.61        16.19         16.43      3.63    3.09    4.00         0.54          0.37
5          random      71.80   53.76   89.63        18.04         17.84      3.77    3.21    4.13         0.57          0.36
6          random      71.09   51.39   88.65        19.68         17.59      3.85    3.31    4.17         0.54          0.33
7          random      69.35   50.57   89.90        18.78         20.55      3.91    3.35    4.24         0.57          0.32
8          random      69.54   50.57   89.90        18.97         20.37      3.98    3.39    4.28         0.59          0.30
9          random      69.51   50.57   92.10        18.94         22.58      4.03    3.40    4.31         0.63          0.28
10         random      69.24   50.57   92.10        18.67         22.86      4.08    3.41    4.34         0.67          0.27
0          naive       57.36   57.36   57.36         0.00          0.00      0.00    0.00    0.00         0.00          0.00
1          naive       65.48   58.25   69.70         7.23          4.22      1.00    1.00    1.00         0.00          0.00
2          naive       69.46   59.98   75.76         9.48          6.31      1.60    1.60    1.60         0.00          0.00
3          naive       71.69   63.05   77.99         8.64          6.30      1.87    1.87    1.90         0.00          0.04
4          naive       75.73   67.83   81.80         7.89          6.08      2.06    2.06    2.12         0.00          0.07
5          naive       77.58   63.60   85.21        13.97          7.63      2.19    2.17    2.27         0.02          0.08
6          naive       79.58   60.76   90.55        18.83         10.96      2.27    2.24    2.36         0.04          0.09
7          naive       79.65   60.76   91.51        18.90         11.86      2.34    2.29    2.44         0.06          0.10
8          naive       79.65   60.76   91.51        18.90         11.86      2.40    2.32    2.51         0.08          0.11
9          naive       79.58   59.81   91.51        19.77         11.93      2.44    2.34    2.57         0.10          0.13
10         naive       79.58   59.81   94.38        19.77         14.79      2.49    2.37    2.63         0.11          0.14

Table 19. Median, 2.5th and 97.5th percentiles for Classification Accuracy and Relative Experimental Cost (Log10 £): 100 runs. Robot Scientist simulation with 25% noise; noise abatement strategy used.
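Tables 18 and 19 summarise, for each technique and iteration, 100 simulation runs by their median, their 2.5th and 97.5th percentiles, and the distances (median - 2.5%) and (97.5% - median). The Python sketch below shows one way to compute such a summary from a list of per-run values. The function name `summarise_runs` is hypothetical, and the percentile method (linear interpolation via `statistics.quantiles` with `method="inclusive"`) is an assumption: the supplement does not state how its percentiles were computed, so results on the original runs could differ slightly.

```python
import statistics


def summarise_runs(values):
    """Summarise one (technique, iteration) cell over repeated simulation runs.

    Returns the median, the 2.5th and 97.5th percentiles, and the two
    spreads (median - 2.5%) and (97.5% - median), laid out as in Tables 18
    and 19. Percentiles are linearly interpolated between order statistics
    (an assumed method, not necessarily the one used for the tables).
    """
    # n=40 yields 39 cut points at 2.5%, 5.0%, ..., 97.5%;
    # the first and last are the 2.5th and 97.5th percentiles.
    cuts = statistics.quantiles(values, n=40, method="inclusive")
    p_low, p_high = cuts[0], cuts[-1]
    median = statistics.median(values)
    return {
        "median": median,
        "p2.5": p_low,
        "p97.5": p_high,
        "median-2.5%": median - p_low,
        "97.5%-median": p_high - median,
    }


if __name__ == "__main__":
    # Illustrative values only (not data from this study): 41 evenly spaced scores.
    print(summarise_runs([float(x) for x in range(41)]))
    # {'median': 20.0, 'p2.5': 1.0, 'p97.5': 39.0, 'median-2.5%': 19.0, '97.5%-median': 19.0}
```

When every run returns the same value, both spreads come out as zero, which is the pattern shown by the ase and naive rows of Table 18.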