From: AAAI Technical Report SS-94-06. Compilation copyright © 1994, AAAI (www.aaai.org). All rights reserved.

A Statistical Approach to Adaptive Problem-Solving for Large-Scale Scheduling and Resource Allocation Problems

Steve Chien*, Jonathan Gratch+, and Michael Burl*

*Jet Propulsion Laboratory, California Institute of Technology, 4800 Oak Grove Drive, Pasadena, CA 91109-8099, {chien,burl}@aig.jpl.nasa.gov

+Beckman Institute, University of Illinois, 405 N. Mathews Ave., Urbana, IL 61801, gratch@cs.uiuc.edu

Abstract

Our work focuses upon development of techniques for choosing among a set of alternatives in the presence of incomplete information and varying costs of acquiring information. In our approach, we model the cost and utility of the various alternatives using parameterized statistical models. By applying techniques from an area of statistics called parameter estimation, statistical models can be inferred from data regarding the utility and information cost of each of the various alternatives. These statistical models can then be used to estimate the utility and cost of acquiring additional information, and the utility of selecting specific alternatives from the possible choices at hand. We apply these techniques to adaptive problem-solving, a technique in which a system automatically tunes various control parameters on a performance element to improve performance in a given domain. We present empirical results comparing the effectiveness of these techniques on speedup learning from a real-world NASA scheduling domain and on schedule quality data from the same domain.

1 INTRODUCTION

In machine learning and basic decision making in AI, a system must often reason about alternative courses of action in the absence of perfect information. In many cases, acquiring additional information may be possible, but at some cost. A central problem of interest to AI researchers and statisticians is that of designing strategies to balance the cost of acquiring additional information against the expected utility of the information to be acquired. For example, when choosing among a set of actions, one could gather further information which might help make the decision (at some cost), or one might simply make a decision. If one decides to gather further information, which information would be the most useful? When one wishes some sort of statistical guarantee on the (local) optimality of the choice and/or a guarantee of rationality, a statistical decision-theoretic framework is useful. To summarize, this problem of decision making with incomplete information and information costs can be analyzed in two parts:

A1: How much information is enough? At what point do we have adequate information to select one of the alternatives?

A2: If one wishes to acquire more information, which information will allow us to make the best possible decision at hand while minimizing information costs?

Possible solutions to this decision-making quandary depend on the context in which the decision is being made. This paper focuses upon the decision-making problems involved in adaptive problem-solving. Adaptive problem-solving occurs when a system has a number of control points which affect its performance over a distribution of problems. When the system solves a problem with a given set of settings for the control points, it produces a result which has a corresponding utility. The goal of adaptive problem-solving is: given a problem distribution, find the setting of control points which maximizes the expected utility of the result of applying the system to problems in the distribution.
More rigorously, the adaptive problem-solving problem can be described as follows. Given a flexible performance element PE with control points CP1, ..., CPn, where each control point CPi corresponds to a particular control decision and has an associated set of alternative decision methods Mi,1, ..., Mi,m,¹ a control strategy is a selection of a specific method for every control point (e.g., STRAT = <M1,a, M2,b, M3,1, ...>). A control strategy determines the overall behavior of the scheduler. It may affect properties like computational efficiency or the quality of its solutions. Let PE(STRAT) be the problem solver operating under a particular control strategy. The function U(PE(STRAT), d) is a real-valued utility function that measures the goodness of the behavior of the scheduler on problem d. The goal of learning can be expressed as: given a problem distribution D, find STRAT so as to maximize the expected utility of PE. Expected utility is defined formally as:

    Σ_{d ∈ D} U(PE(STRAT), d) × probability(d)

For example, in a planning system such as PRODIGY [Minton88], when planning to achieve a goal, control points would be: how to select an operator to use to achieve the goal, how to select variable bindings to instantiate the operator, etc. A method for the operator-choice control point might be a set of control rules that determine which operators to use to achieve various goals, plus a default operator-choice method. A strategy would be a set of control rules and default methods for every control point (e.g., one for operator choice, one for binding choice, etc.). Utility might be defined as a function of the time to construct a plan, the cost to execute the plan, or some overall measure of the quality of the plan produced.

Given the context of adaptive problem-solving, question A1 from above, "how much information is enough," can be further elaborated. In adaptive problem-solving, the expected utility can be interpreted as the average utility of the performance element on a per-problem basis. This means that "how much information is enough" generally depends upon the amortization period for learned knowledge, the relative cost of acquiring additional information, and interactions between successive choices. These issues are the focus of other work [Gratch and DeJong 93] and are not directly addressed by the work described in this abstract.

1. Note that a method may consist of smaller elements, so that a method may be a set of control rules or a combination of heuristics. Note also that a method may involve real-valued parameters. Hence, the number of methods for a control point may be infinite, and there may be an infinite number of strategies.
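To make the learning goal concrete, the following minimal sketch (our illustration, not the authors' implementation; all names and numbers are hypothetical) estimates the expected utility of each candidate strategy by Monte Carlo sampling. Here each strategy's utility distribution is stubbed out as a Gaussian; in a real system, a sample would come from running PE(STRAT) on a problem drawn from D.

```python
import random
import statistics

# Toy stand-in for PE(STRAT) applied to a sampled problem: each strategy is
# modelled as a (mean, stddev) pair over utilities.  Illustrative only.
STRATEGIES = {
    "STRAT_A": (50.0, 8.0),
    "STRAT_B": (52.0, 8.0),
}

def sample_utility(strategy):
    """One draw of U(PE(STRAT), d) for a problem d sampled from D."""
    mean, std = STRATEGIES[strategy]
    return random.gauss(mean, std)

def estimate_expected_utility(strategy, n_samples=30):
    """Monte Carlo estimate of the sum over d in D of
    U(PE(STRAT), d) * probability(d), via an average of sampled problems."""
    xs = [sample_utility(strategy) for _ in range(n_samples)]
    return statistics.mean(xs), statistics.variance(xs)

best = max(STRATEGIES, key=lambda s: estimate_expected_utility(s)[0])
print("estimated best strategy:", best)
```

With only a finite sample, the estimated means are uncertain; quantifying and reducing that uncertainty at minimal cost is exactly the subject of the rest of the paper.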
Question A2 from above is: find which information will allow us to make the best possible decision at hand while minimizing information costs. This question can be cast in two ways: (1) for a given amount of resources, which information should I acquire to maximize the expected utility of the outcome of my decision; and (2) given that I want a certain confidence level in the goodness of my decision (e.g., local optimality), how can I achieve this while expending the fewest resources possible. This abstract focuses on question A2. Specifically, our approach draws upon techniques from the area of statistics concerned with parametric statistical models to model the uncertainty in utility and information-cost estimates. In parametric statistical models, one presumes that the data is distributed according to some form of model (e.g., the normal distribution, the Poisson distribution, etc.). This distribution can be described in terms of a fixed set of parameters (such as mean, variance, etc.). If one can infer the relevant parameters of the distribution underlying the data (so-called parameter estimation), then, because the uncertainty in the utility estimates is explicitly modelled in the statistics, three types of questions regarding utility can be answered:

B1: which alternative has the highest expected utility;

B2: how certain are we of this ranking of the alternatives; and

B3: how much is this uncertainty likely to change if we acquire additional information.

Correspondingly, the same approach can be used in modelling the cost distribution; hence the expected cost of acquiring additional information can also be estimated. Of course, the accuracy of all of these estimates depends upon the goodness of fit of the parametric model used to model the uncertainty and cost estimates. Generally, we use the normal (Gaussian) distribution as our parametric model. Fortunately, the normal distribution has the strong property that it is a good approximation to many forms of distributions for reasonably large sample sizes (due to the Central Limit Theorem). Thus, using techniques from parameter estimation and analysis of the data, we can answer questions B1, B2, and B3 from above regarding utility, and estimate the expected cost of information. The remaining task is to apply decision-theoretic methods to determine the answers to questions A1 and A2 from this information. Working towards this goal, we have developed two general approaches to this problem: dominance-indifference and expected loss.

The first set of methods, called interval-based methods, involves quantifying the uncertainty in competing hypotheses by computing the statistical confidence that one hypothesis is better than another. Once this confidence has been computed, the system attempts to allocate examples to efficiently show that one hypothesis dominates all the other hypotheses with some specified confidence. These methods also rely upon an indifference parameter, which allows the system to show, with some confidence, that the preferred hypothesis is quite similar in expected performance to another hypothesis, thus making the preferred hypothesis acceptable.

The second set of methods uses the decision-theoretic concept of expected loss [Russell & Wefald 89, Russell & Wefald 91], which measures the probability that a less preferable decision is made, weighted by the utility lost with respect to the alternative choice. More specifically, the expected loss of utility from adopting option Hi over option Hj, with corresponding utility distributions Ui and Uj, is defined as the integral over all regions where uj > ui of the joint density weighted by the loss:

    EL(Hi over Hj) = ∫∫_{uj > ui} p_{Ui,Uj}(ui, uj) (uj − ui) dui duj

In the expected loss approach, the system acquires information until the expected loss is reduced below some specified threshold (a sketch of this computation for normally distributed utilities appears below). This approach has the added benefit of not attempting to distinguish between two hypotheses with similar means and low variances (i.e., it recognizes indifference without a separate indifference parameter).
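As an illustrative sketch (our notation, not the paper's implementation): under the Gaussian model, the pairwise expected loss integral above has a closed form. If the difference D = Uj − Ui in estimated utilities is modelled as normal with mean m and standard deviation s, the integral reduces to E[max(D, 0)] = m·Φ(m/s) + s·φ(m/s), where φ and Φ are the standard normal density and distribution functions. Here we take m and s from the sample means and variances of the two hypotheses, an assumption consistent with the normal model described above.

```python
import math

def phi(x):   # standard normal probability density
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):   # standard normal cumulative distribution
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def expected_loss(mean_i, var_i, n_i, mean_j, var_j, n_j):
    """Expected utility lost by adopting H_i over H_j, assuming the
    difference D = U_j - U_i is normal with mean m and std. error s:
    E[max(D, 0)] = m * Phi(m/s) + s * phi(m/s)."""
    m = mean_j - mean_i                        # estimated mean difference
    s = math.sqrt(var_i / n_i + var_j / n_j)   # std. error of the difference
    if s == 0.0:
        return max(m, 0.0)
    return m * Phi(m / s) + s * phi(m / s)

# Example: H_i looks better (mean 52 vs. 50), so the expected loss of
# adopting it is small but nonzero given the sampling uncertainty.
print(expected_loss(52.0, 64.0, 20, 50.0, 64.0, 20))
```

Note that the loss shrinks both as the mean gap favoring the adopted hypothesis grows and as additional samples shrink the standard error, which is what drives the stopping rule.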
For both the interval-based and expected loss approaches, when selecting a best hypothesis, one must base this selection upon comparisons of the utility of the "best" hypothesis to the other possible hypotheses. Because there are multiple comparisons, the estimate of the overall error (or confidence) in the conclusion that a best hypothesis has been selected is built from multiple smaller conclusions. For example, if we wish to show that H3 is the best choice among H1, H2, H3, H4, and H5, in the interval-based approach we might show that H1 and H3 are indifferent, that H3 dominates H2, that H3 dominates H4, and that H3 dominates H5. Thus, if we wish a confidence level of 95%, with a straight-sum error model and equal allocation of error, each of the individual comparisons would need a 98.75% confidence level (since 4 × 1.25% = 5%). The exact form of the relationship depends upon the particular error model used. However, sampling additional examples to compare the relative utilities of various alternatives can have varying effect upon the overall error estimate, depending upon the parameters of the particular distributions. Also, selecting an example from different distributions can have widely varying cost. Thus, in many cases it may be desirable to allocate the error estimates unequally. While we have implemented default strategies with equal allocation of error, we have also developed algorithms which allocate the error unequally. In this approach, the system estimates the marginal benefit and marginal cost of sampling another data point to compare two competing hypotheses. The marginal benefit is estimated by computing the decrease in the error estimate (or increase in the confidence estimate) due to acquiring another sample, presuming that the statistical parameters remain constant (e.g., in the case of the normal distribution, that the sample mean and variance do not change). The marginal cost is estimated using the estimated parameters of the cost distributions for the relevant hypotheses. The system then allocates additional examples preferring the highest ratio of marginal benefit to marginal cost (a sketch of this allocation rule appears below).

For the expected loss approach, a comparable combination of expected losses is used. In this approach, the expected loss of choosing Hi is the sum of the pairwise expected losses from choosing Hi over each of the other Hj's. Again, depending upon the exact parameters of the utility distributions, allocating examples to each of the pairwise expected loss computations will have varying effects on the sum of expected losses. Additionally, exactly as in the interval-based case, the cost of sampling also varies. Thus, allocating examples unequally to the pairwise computations can be advantageous. Consequently, we have also implemented an expected loss algorithm which uses the estimated marginal cost and estimated marginal benefit of sampling to allocate sampling resources. Thus, in all, there are four new algorithms: interval-based equal error allocation (STOP1), interval-based unequal error allocation (STOP2), expected loss equal allocation (EL1), and expected loss unequal allocation (EL2).²

2. For details of these algorithms see [Chien et al. 94].
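The following sketch shows the allocation idea for the interval-based case. It is our hypothetical rendering, not the STOP2/EL2 procedures themselves (those are specified in [Chien et al. 94]): the marginal benefit of one more example on each side of a pairwise comparison is the predicted drop in that comparison's error with the sample statistics held fixed, and the next examples go to the comparison with the best benefit-to-cost ratio.

```python
import math

def Phi(x):   # standard normal cumulative distribution
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def pairwise_error(mean_diff, var_i, n_i, var_j, n_j):
    """Normal-approximation estimate of the error in concluding that H_i
    dominates H_j, given an observed mean difference (i minus j)."""
    std_err = math.sqrt(var_i / n_i + var_j / n_j)
    return 1.0 - Phi(mean_diff / std_err)

def benefit_cost_ratio(cmp):
    """Marginal benefit (predicted error decrease from one more example on
    each side, statistics held fixed) over marginal cost (estimated mean
    sampling cost of the two hypotheses).  `cmp` holds one comparison's
    sample statistics."""
    err_now = pairwise_error(cmp["mean_diff"], cmp["var_i"], cmp["n_i"],
                             cmp["var_j"], cmp["n_j"])
    err_next = pairwise_error(cmp["mean_diff"], cmp["var_i"], cmp["n_i"] + 1,
                              cmp["var_j"], cmp["n_j"] + 1)
    return (err_now - err_next) / (cmp["cost_i"] + cmp["cost_j"])

def next_comparison(comparisons):
    """Allocate the next examples to the pairwise comparison with the
    highest ratio of marginal benefit to marginal cost."""
    return max(comparisons, key=benefit_cost_ratio)
```

The expected loss variant is analogous, with the predicted decrease in pairwise expected loss replacing the predicted decrease in error.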
2 EMPIRICAL PERFORMANCE EVALUATION

We now turn to an empirical evaluation of the hypothesis selection techniques. This evaluation lends support to the techniques by: (1) using synthetic data to demonstrate that the techniques perform as predicted when their assumptions are met, and (2) using a real-world hypothesis selection problem to demonstrate the robustness of the approaches. As the interval-based and expected loss approaches use different, incomparable parameters, we first test the interval-based and expected loss approaches separately and then compare them head-to-head in a comprehensive real-world test.

2.1 SYNTHETIC DATA

Synthetic data is used to show that the techniques perform as expected when the underlying assumptions are valid. For the interval-based approaches, we show that the technique will choose the best hypothesis, or one ε-close to the best, with the requested probability. For the expected loss approach, we show that the technique will exhibit no more than the requested level of expected loss.

The statistical ranking and selection literature uses a standard class of synthetic problems for evaluating the statistical error of hypothesis evaluation techniques, called the least favorable configuration of the population means [Bechhofer54]. This is the parameter configuration that is most likely to cause a technique to choose a wrong hypothesis (one that is not ε-close) and thus provides the most severe test of a technique's abilities. Under this configuration, k−1 of the hypotheses have identical expected utilities, μ, and the remaining hypothesis has expected utility μ+ε. This last hypothesis has the highest expected utility and should be chosen by the technique. The costs and variances of all hypotheses are equal.

We test each technique on the least favorable configuration under a variety of control parameter settings. The least favorable configuration becomes more difficult (requires more examples) as the confidence γ*, the number of hypotheses k, or the common utility variance σ² increases. It becomes easier as the indifference interval ε increases. In the standard methodology, a technique is evaluated over varying values of k, γ*, and σ/ε; the last term combines the variance and the indifference interval size into a single quantity which, as it increases, makes the problem more difficult. For our experiments, n0 is set to seven, μ is fifty, σ² is sixty-four, and all other parameters are varied as indicated in the results. All experimental trials are repeated 5000 times to obtain statistically reliable results.³

The results from these experiments, in Table 1 below, show that as the requested confidence increases, the STOP1 algorithm correctly takes more training examples to ensure that the requested accuracy level is achieved. The expected loss approach EL1 was also evaluated in the least favorable configuration, to test the algorithm's ability to bound expected loss. EL1 was tested on various loss thresholds, H*, over this problem. For this evaluation, μ is fifty, all hypotheses share a common utility variance of sixty-four, ε is two, k = 3, 5, 10, and H* = 1.0, 0.75, 0.5, 0.25. The sample size results and observed loss values are summarized in Table 1. The results illustrate that as the loss threshold is lowered, EL1 takes more training examples to ensure the expected loss remains below the threshold. Again, all trials are repeated 5000 times.

  γ*     STOP1 n (achieved PCS)  |  H*     EL1 n (observed loss)
  0.75   1253 (0.85)             |  1.0    307 (0.67)
  0.90   1976 (0.93)             |  0.75   332 (0.67)
  0.95   2596 (0.95)             |  0.5    399 (0.17)
                                 |  0.25   497 (0.07)

Table 1: Least Favorable Configuration Results

Both systems satisfied their statistical requirements. In each configuration, STOP1 selected the correct hypothesis with at least as much confidence as requested, and the expected loss exhibited by EL1 was no greater than the loss threshold. As expected, the number of examples required increased as the acceptable error level or loss threshold decreased. The number of examples required by the two techniques should not be compared directly, as each algorithm is solving a different task; the relationship between the efficiency of the approaches is discussed later.

3. Another important question for the interval-based approach is its efficiency when all hypotheses are indifferent to each other. Evaluations in this configuration give efficiency comparable to the least favorable configuration, but space precludes presentation of those results.
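For reference, the least favorable configuration is straightforward to reproduce. The sketch below (our code, using the parameter values quoted above) draws one utility sample from each hypothesis; a technique is then scored by repeating whole trials and checking how often it picks the last hypothesis (probability of correct selection) or how much utility its choices lose.

```python
import random

def lfc_sample(k, mu=50.0, epsilon=2.0, sigma=8.0):
    """One utility draw per hypothesis under the least favorable
    configuration: k-1 hypotheses with mean mu and one with mean
    mu + epsilon, all sharing variance sigma^2 (sigma^2 = 64, i.e.
    sigma = 8, in the experiments above)."""
    draws = [random.gauss(mu, sigma) for _ in range(k - 1)]
    draws.append(random.gauss(mu + epsilon, sigma))
    return draws  # index k-1 is the hypothesis that should be selected

# e.g., one draw from each of k = 5 hypotheses:
print(lfc_sample(k=5))
```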
2.2 NASA SCHEDULING DATA

The test of real-world applicability is based on data drawn from an actual NASA scheduling application. This data provides a strong test of the applicability of the techniques: all of the statistical techniques make some form of normality assumption, yet the data in this application is highly non-normal; in fact, most of the distributions are bi-modal. This characteristic provides a rather severe test of the robustness of the approaches.

Our particular application has been adaptive learning of heuristics for a Lagrangian relaxation [Fisher81] scheduler developed by Colin Bell, called LR-26 [Bell93], which schedules communications between NASA/JPL antennas and low-earth-orbiting satellites. LR-26 formulates the scheduling problem as an integer linear programming problem; a typical problem in this domain has approximately 650 variables and 1300 constraints. The goal of these evaluations was to choose a heuristic that solved scheduling problems quickly on average. This is easily seen as a hypothesis evaluation problem. Each of the heuristics corresponds to a hypothesis. The cost of evaluating a hypothesis over a training example is the cost of solving the scheduling problem with the given heuristic. The utility of the training example is simply the negation of its cost. In that way, choosing a hypothesis with maximal expected utility corresponds to choosing a scheduling heuristic with minimal average cost.

Application of the COMPOSER method of adaptive problem-solving [Gratch93] to this domain has produced very promising results. For one version of the problem, the goal is to learn heuristics that allow faster construction of schedules. For this application [Gratch et al. 93a, Gratch & Chien 93], the learned heuristics produced a 50% reduction in CPU time to construct a schedule for solvable problems and a 15% increase in the number of problems solvable within resource bounds. For a second version of the problem, the goal is to learn heuristics for choosing which goal constraints to relax when the problem posed to the scheduler appears unsolvable. In this application, the utility is the negation of the number of constraints relaxed in the final solution. Preliminary tests showed a range from −681 to −35 in average utility across the control space, with an average utility of −137. Over 6 trials, COMPOSER picked the best strategy in every trial. We are currently in the process of enriching the heuristic space for this application.

Currently, we are using data from this scheduling application to further test our adaptive problem-solving techniques. The scheduling application involved several hypothesis evaluation problems, four of which we use in this evaluation. Each problem corresponds to a comparison of some set of scheduling heuristics, and contains data on the heuristics' performance over about one thousand scheduling problems. An experimental trial consists of executing a technique over one of these data sets. Each time a training example is to be processed, a problem is drawn randomly from the data set with replacement, and the actual utility and cost values recorded for that scheduling problem are then used. As in the synthetic data experiments, each experimental trial is repeated 5000 times, and all reported results are the average of these trials.
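The trial procedure can be sketched as a simple bootstrap driver (hypothetical names; the recorded data sets and selection routines are as described above): each request for a training example for a heuristic draws one of its logged (utility, cost) pairs uniformly with replacement.

```python
import random

def make_sampler(dataset, rng=random):
    """`dataset[h]` holds the logged (utility, cost) pairs of heuristic h
    over roughly one thousand scheduling problems.  Each call draws one
    recorded problem uniformly with replacement, as in the experiments."""
    def sample(h):
        return rng.choice(dataset[h])
    return sample

def run_trials(dataset, select, n_trials=5000):
    """Repeat a hypothesis-selection technique (e.g., STOP1 or EL1, passed
    in as `select`) over the data set and collect the chosen heuristics;
    reported results are averages over these trials."""
    sampler = make_sampler(dataset)
    return [select(sampler) for _ in range(n_trials)]
```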
We ran the interval-based hypothesis selection method (STOP1), the cost-sensitive interval-based method (STOP2), the COMPOSER approach, and an existing statistical technique by Turnbull and Weiss [Turnbull & Weiss 87] over the four scheduling data sets. In each case the confidence level was set at 95% and n0 was set to fifteen. The results, shown in Table 2, show that the interval-based techniques significantly outperformed the Turnbull and COMPOSER techniques, and that STOP2 slightly outperformed STOP1. We also ran the expected loss techniques EL1 and EL2 (cost sensitive) over the four scheduling data sets. In each case the loss threshold was set at three and n0 was fifteen. Table 3 summarizes the results, along with the number of hypotheses and the relative difficulty (σ/ε) of each data set. The results in Table 3 show that the expected loss approaches bounded the expected loss as predicted.

       Parameters        STOP1          STOP2          TURNBULL         COMPOSER
       k    γ*    σ/ε
  D1   3    0.95  34     908 (1.00)     648 (1.00)     26,691 (1.00)    78 (1.00)
  D2   2    0.95  34     74 (1.00)      76 (1.00)      13,066 (1.00)    346 (1.00)
  D3   7    0.95  14     2,371 (0.94)   2,153 (0.93)   94,308 (1.00)    2,456 (0.97)
  D4   7    0.95  11     7,972 (0.96)   7,621 (0.94)   87,357 (1.00)    21,312 (0.89)

Table 2: Estimated expected total number of observations for the scheduling data. Achieved probability of a correct selection is shown in parentheses.

       Parameters        EL1                       EL2
       k    H*           Observations   Loss       Observations   Loss
  D1   3    3.0          78             0.1        49             1.0
  D2   2    3.0          30             1.8        30             1.8
  D3   7    3.0          335            3.0        177            3.9
  D4   7    3.0          735            1.7        283            2.2

Table 3: Estimated expected total number of observations and expected loss of an incorrect selection for the scheduling data.

2.3 COMPARING STOP1 AND EL1

It is difficult to directly compare the performance of STOP1 and EL1, as the algorithms are accomplishing different tasks. STOP1 is attempting to identify a nearly optimal hypothesis with high confidence, while EL1 is attempting to minimize the cost of a mistaken selection. If the goal of the task is to identify the best hypothesis, then clearly STOP1 should be used. If the goal is simply to improve expected utility as much as possible, either could be used and it is unclear which is to be preferred. This section attempts to assess this question on the NASA scheduling domain. We run each algorithm under a variety of parameter settings and compare the best performance of each algorithm.

To directly compare STOP1 and EL1, we ran a comprehensive adaptive scheduling test. In this test, STOP1 and EL1 were each given the task of optimizing several control parameters of an adaptive scheduler, with the goal of speeding up the schedule generation process. This task corresponds to the sequential solution of the problems D1 through D4 posed in the scheduling test (for more details on this application domain see [Gratch et al. 93a]). In this test, the STOP1 algorithm was run with confidence levels γ* = 0.75, 0.90, 0.95 and an indifference level of 1.0, and EL1 was run with loss bounds L = 5, 1, 0.5. The best settings found, with results averaged over 1000 runs, are shown below in Table 4.

                      Cost (CPU Days)   Examples   Utility
  STOP1 (0.75, 1.0)   1.94              4199       17.1
  EL1 (0.5)           1.84              2630       17.3

Table 4: Direct Comparison of STOP1 and EL1

These results show that the two algorithms produce roughly comparable utilities; the difference in utilities is smaller than the indifference interval specified to STOP1. However, the STOP1 algorithm requires nearly twice the examples on average to achieve this utility.
While it is difficult to generalize this result to other applications, this and other empirical results suggest to us that the expected loss approach is more efficient at the task of improving expected utility.

3 CONCLUSION

This paper has described techniques for choosing among a set of alternatives in the presence of incomplete information and varying costs of acquiring information. In our approach, we model the cost and utility of the various alternatives using parameterized statistical models. By applying techniques from an area of statistics called parameter estimation, statistical models can be inferred from data regarding the utility and information cost of each of the various alternatives. These statistical models can then be used to estimate the utility and cost of acquiring additional information, and the utility of selecting specific alternatives from the possible choices at hand. We apply these techniques to adaptive problem-solving, a technique in which a system automatically tunes various control parameters on a performance element to improve performance in a given domain. We present empirical results comparing the effectiveness of these techniques on two applications in a real-world domain: speedup learning from a real-world NASA scheduling domain and schedule quality data from the same domain.

Acknowledgements

Portions of this work were performed by the Jet Propulsion Laboratory, California Institute of Technology, under contract with the National Aeronautics and Space Administration, and portions at the Beckman Institute, University of Illinois, under National Science Foundation Grant NSF-IRI-92-09394.

References
[Bechhofer54] R. E. Bechhofer, "A Single-Sample Multiple Decision Procedure for Ranking Means of Normal Populations with Known Variances," Annals of Mathematical Statistics 25, 1 (1954), pp. 16-39.

[Bell93] C. E. Bell, "Scheduling Deep Space Network Data Transmissions: A Lagrangian Relaxation Approach," Proceedings of the SPIE Conference on Applications of Artificial Intelligence 1993: Knowledge-Based Systems in Aerospace and Industry, Orlando, FL, April 1993, pp. 330-340.

[Chien et al. 94] S. Chien, J. Gratch, and M. Burl, "On the Efficient Allocation of Resources for Hypothesis Evaluation in Machine Learning: A Statistical Approach," Forthcoming JPL Technical Report, Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA, 1994.

[Fisher81] M. Fisher, "The Lagrangian Relaxation Method for Solving Integer Programming Problems," Management Science 27, 1 (1981), pp. 1-18.

[Gratch93] J. Gratch, "COMPOSER: A Decision-theoretic Approach to Adaptive Problem Solving," Technical Report UIUCDCS-R-93-1806, Department of Computer Science, University of Illinois, Urbana, IL, May 1993.

[Gratch & Chien 93] J. Gratch and S. Chien, "Learning Search Control Knowledge for the Deep Space Network Scheduling Problem: Extended Report and Guide to Software," Technical Report UIUCDCS-R-93-1801, Department of Computer Science, University of Illinois, Urbana, IL, January 1993.

[Gratch & DeJong 93] J. Gratch and G. DeJong, "Rational Learning: A Principled Approach to Balancing Learning and Action," Technical Report UIUCDCS-R-93-1801, Department of Computer Science, University of Illinois, Urbana, IL, April 1993.

[Gratch et al. 93a] J. Gratch, S. Chien, and G. DeJong, "Learning Search Control Knowledge for Deep Space Network Scheduling," Proceedings of the Tenth International Conference on Machine Learning, Amherst, MA, June 1993, pp. 135-142.

[Gratch et al. 93b] J. Gratch, S. Chien, and G. DeJong, "Learning Search Control Knowledge to Improve Schedule Quality," Proceedings of the IJCAI-93 Workshop on Production Planning, Scheduling, and Control, Chambéry, France, August 1993.

[Minton88] S. Minton, Learning Search Control Knowledge: An Explanation-Based Approach, Kluwer Academic Publishers, Boston, MA, 1988.

[Russell & Wefald 89] S. Russell and E. Wefald, "On Optimal Game Tree Search Using Rational Meta-Reasoning," Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, Detroit, MI, August 1989, pp. 334-340.

[Russell & Wefald 91] S. Russell and E. Wefald, Do the Right Thing: Studies in Limited Rationality, MIT Press, Cambridge, MA, 1991.

[Turnbull & Weiss 87] B. Turnbull and G. Weiss, "A Class of Sequential Procedures for k-Sample Problems Concerning Normal Means with Unknown Unequal Variances," in Statistical Ranking and Selection: Three Decades of Development, R. Haseeb (ed.), American Sciences Press, Columbus, OH, 1987, pp. 225-240.