From: AAAI Technical Report SS-02-05. Compilation copyright © 2002, AAAI (www.aaai.org). All rights reserved.

Program Synthesis for Combinatorial Optimisation Problems
Position Statement*

Pierre Flener
Department of Information Technology
Uppsala University, Box 337, S-751 05 Uppsala, Sweden
pierref@csd.uu.se
ASTRA group: http://www.csd.uu.se/~pierref/astra/

* These ideas are the result of many fruitful discussions with Brahim Hnich and Zeynep Kızıltan. This research is partly funded under grant 221-99-369 of VR, the Swedish Research Council, and under grant IG2001-67 of STINT, the Swedish Foundation for International Cooperation in Research and Higher Education.

Abstract

A high-level abstract-datatype-based constraint modelling language opens the door to an automatable empirical determination -- by a synthesiser -- of how to 'best' represent the decision variables of a combinatorial optimisation problem, based on (real-life) training instances of the problem. In the extreme case where no such training instances are provided, such a synthesiser would simply be non-deterministic. A first-order relational calculus is a good candidate for such a language, as it gives rise to very natural and easy-to-maintain models of combinatorial optimisation problems.

Introduction

Combinatorial optimisation problems are increasingly ubiquitous and crucial in industry. Indeed, staying competitive in the global New Economy requires the efficient modelling and solving of such problems, whose instances are getting larger and harder. Examples are production planning subject to customer demand and resource availability so that sales are maximised, and air traffic control subject to safety protocols so that flight times are minimised. Appropriate values for the decision variables have to be found within their domains, subject to some constraints, such that some optional objective function on these variables takes an optimal value.

In recent years, modelling languages based on some logic with sets and relations have gained popularity in formal methods, witness the B [1] and Z [10] specification languages, the ALLOY [6] object modelling language, and the Object Constraint Language (OCL) of UML. In database modelling, this had been long advocated, most notably via entity-relation-attribute (ERA) diagrams. We examine whether constraint modelling can benefit from these ideas.

Sets and set expressions recently started appearing as modelling devices in some constraint programming languages, with set variables often implemented by the set interval representation [5]. In the absence of such an explicit set concept, modellers usually represent a set variable as an array of 0/1 integer variables, indexed by the domain of the set. In terms of propagation, the set interval representation is equivalent to the 0/1 representation, which consumes more memory but is able to support more set expressions and constraints. Both representations are restricted to finite sets.

Relations have not received much attention yet in constraint programming languages, except the particular case of a total function, via arrays. Indeed, a total function f can be represented as a 1-d array of variables over the range of f, indexed by its domain, or as a 2-d array of 0/1 variables, indexed by the domain and range of f, or even with some redundancy, as long as channelling constraints relate the parts of the redundant representation. Other than retrieving the (unique) image under a total function of a domain element, there has been no support for relational expressions.

We claim that a high-level constraint modelling language with abstract datatypes (for sets, relations, and sequences) opens the door to an automatable empirical determination -- by a synthesiser -- of how to 'best' represent the decision variables of a combinatorial optimisation problem, based on (real-life) training instances thereof.
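As a concrete illustration of the representation choices mentioned above for a total function, the following Python sketch contrasts the 1-d array representation with the 2-d 0/1 representation and states the channelling condition that relates them. It is only a checker over fixed values, not a constraint model: the function names are hypothetical, and domain and range elements are assumed to be numbered from 0.

```python
from typing import List


def function_to_matrix(f: List[int], range_size: int) -> List[List[int]]:
    """Turn the 1-d representation (f[d] is the image of d) into the
    2-d 0/1 representation (matrix[d][r] == 1 iff f maps d to r)."""
    return [[1 if f[d] == r else 0 for r in range(range_size)]
            for d in range(len(f))]


def is_total_function(matrix: List[List[int]]) -> bool:
    """A 0/1 matrix encodes a total function iff each row has exactly one 1."""
    return all(sum(row) == 1 for row in matrix)


def channelling_holds(f: List[int], matrix: List[List[int]]) -> bool:
    """Channelling condition between the two redundant representations:
    matrix[d][r] is 1 exactly when f[d] equals r."""
    return all((matrix[d][r] == 1) == (f[d] == r)
               for d in range(len(f))
               for r in range(len(matrix[d])))


if __name__ == "__main__":
    f = [2, 0, 2]                      # three domain elements, range {0, 1, 2}
    m = function_to_matrix(f, 3)
    print(m, is_total_function(m), channelling_holds(f, m))
```

In a real constraint program the entries would be decision variables and the two predicates would be posted as constraints; which of the representations (or which redundant combination) propagates best is exactly the kind of choice the text proposes to leave to a synthesiser.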
In the extreme case where no such training instances are provided, such a synthesiser would simply be non-deterministic. A suitable first-order relational calculus is a good candidate for such a language, as it gives rise to very natural and easy-to-maintain models of combinatorial optimisation problems.

We here ignore the issue of how to parameterise a solver, say by providing a suitable labelling heuristic, towards the solving of the modelled problem. For non-expert or lazy modellers, this task can also be left to synthesisers [7,8]. We thus here only aim at techniques that find the 'best' model for a given solver, under its default settings.

Relational Modelling with ESRA

Design Decisions. In constraint satisfaction, much more effort has been directed at efficiently solving the constraints than at facilitating their modelling. Constraint programming languages reflect this, as their control structures and variable representation options are usually quite low-level.

The key design decisions for our constraint modelling language -- called ESRA -- are as follows. We want to capture common modelling idioms in abstract datatypes, especially for relations, so as to design a truly high-level language. Computational completeness is not aimed at, as long as the notation is useful for elegantly modelling a large number of combinatorial optimisation problems. We (currently) do not support procedures, and hence no procedure calls and no recursion. Similarly, we focus on finite domains, and support only bounded quantification. In order to maximally sugar the first-order-logic nature of the language, we adopt a 'lower-128 ASCII' syntax, unlike the LaTeX-requiring syntax of Z, as well as a JAVA-style declaration of the universally quantified variables.

For reasons of space, we here only introduce the concepts of ESRA that are actually illustrated in this paper. Also, we can "only" give an informal semantics. The reader may monitor www.csd.uu.se/~pierref/astra for a complete description of the full language.

Modelling the Data.
A primitive type is either a finite enumeration of new constant identifiers, or a finite range of integers, indicated by its lower and upper bounds. The only predefined primitive types are the ranges nat and int, which are 0:maxint and -maxint:maxint, respectively, with maxint being the maximum representable integer.

Relations are declared using the # relation type constructor. Consider the relation type A m:n # p:q B. Then A and B must be primitive types, designating the two participants of any relation of this type, with A being called the domain and B the range of such a relation. The second and third arguments of # are multiplicities, with the following semantics: for every element of A, there are between m and n elements of B, and for every element of B, there are between p and q elements of A in such a relation. We thus (currently) restrict the focus to binary relations, between primitive types only. For partial and total functions, m:n is 0:1 and 1:1, respectively. For injections, surjections, and bijections, p:q is 0:1, 1:maxint, and 1:1, respectively. Rather than elevating functions and their particular cases to first-class concepts with a specific syntax, we prefer keeping the notation lean and leave their specialised handling to the synthesiser. This has the further advantage that only the multiplicities need to be changed during model maintenance, say when a function becomes a relation.

(Arrays of) instance-data variables are declared in a JAVA-style strongly typed syntax. All instance data are read in at run-time from a data file. Decision variable declarations follow the same syntax, but are preceded by the var keyword. The usage of arrays of decision variables, though possible, is sometimes discouraged, as they may amount to a premature commitment to a low-level representation of what essentially are relations. Due to the (current) restrictions on relations, arrays are not a redundant feature.
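The multiplicity semantics of the # type constructor described above can be made concrete with a small Python sketch that checks whether a given finite set of pairs is a legal value of type A m:n # p:q B. This is an illustration only and not part of ESRA: the function name is hypothetical, and MAXINT is an assumed stand-in for maxint.

```python
import sys
from typing import Hashable, Iterable, Set, Tuple

MAXINT = sys.maxsize  # assumed stand-in for ESRA's maxint


def satisfies_multiplicities(relation: Set[Tuple[Hashable, Hashable]],
                             dom: Iterable[Hashable],
                             ran: Iterable[Hashable],
                             m: int, n: int, p: int, q: int) -> bool:
    """Check that `relation` is a value of type  dom m:n # p:q ran."""
    dom, ran = list(dom), list(ran)
    # Every pair must relate a domain element to a range element.
    if any(a not in dom or b not in ran for a, b in relation):
        return False
    # Each element of the domain is related to between m and n range elements.
    images_ok = all(m <= sum(1 for (x, _) in relation if x == a) <= n
                    for a in dom)
    # Each element of the range is related to between p and q domain elements.
    preimages_ok = all(p <= sum(1 for (_, y) in relation if y == b) <= q
                       for b in ran)
    return images_ok and preimages_ok


if __name__ == "__main__":
    stores, warehouses = ["s1", "s2", "s3"], ["w1", "w2"]
    supply = {("s1", "w1"), ("s2", "w1"), ("s3", "w2")}
    # Total function: 1:1 on the domain side, unrestricted on the range side.
    print(satisfies_multiplicities(supply, stores, warehouses, 1, 1, 0, MAXINT))
    # Not a bijection: w1 has two pre-images.
    print(satisfies_multiplicities(supply, stores, warehouses, 1, 1, 1, 1))
```

Changing a function into a relation then amounts to changing only the four bounds, which mirrors the model-maintenance argument made in the text.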
All declarations denote universally quantified variables, with the instance data ones expected to be ground at solving-time and the decision ones expected to still be variables then.

Modelling the Cost Function and the Constraints. Expressions are constructed in the usual way. The usual arithmetic operators are available, such as card for the cardinality of a set expression, ord for the position of an identifier in an enumeration, and sum for the sum of a bounded (and possibly filtered) number of numeric expressions. Let R be a relation of type A m:n # p:q B. For any element (or subset) a of A, the navigation expression a.R designates the relational image of a, that is the possibly empty set of all elements in B that are related by R to (any element in) a. If m:n is 1:1, then a.R simply designates the (unique) element of B that is related to element a of A. The relation expression ~R designates the transpose relation of R, which is thus of type B p:q # m:n A. The elements of a relation are represented as a#b pairs.

First-order logic formulas are also constructed in the usual way. Atoms are built from expressions with the usual predicates, such as the infix in for set or relation membership and the infix '<=' for the '≤' inequality between numeric expressions. Formulas are built from atoms with the usual connectives and quantifiers, such as not for negation, the infix '&' and '=>' for conjunction and implication, and forall and exists for bounded (and possibly filtered) universal and existential quantification. The usual typing, association, and precedence rules apply.

The cost function is a numeric expression that has to be either minimised or maximised. The constraints on the decision variables are a conjunction of formulas.

The Warehouse Location Problem

A company considers opening warehouses on some candidate locations to supply its existing stores.
Each candidate warehouse has the same maintenance cost, and the supply cost to a store depends on the warehouse. Each store must be supplied by exactly one warehouse (C1). Each candidate warehouse has a capacity designating the maximum number of stores it can supply (C2). The objective is to determine which warehouses to open, and which of these warehouses should supply the various stores, such that the sum of the maintenance and supply costs is minimised. In more mathematical terms, the sought supply relationship is a total function from the set of stores into the set of warehouses, and the set of warehouses to be opened is the range of that function.

This problem was first modelled as a constraint program in the reference manual of ILOG SOLVER 4.0 (in 1997), and then modelled in OPL [11]. There, the sought total function is modelled by a 1-d array of variables representing the (unique) warehouse that supplies each store, thereby capturing the constraint C1. The set of warehouses to be opened is modelled in a redundant way (because it would suffice to retrieve the range of that function), namely as a 1-d array OW of 0/1 variables, such that OW[w] is 1 iff warehouse w is opened. A channelling constraint is then necessary, expressing that a warehouse that is actually supplying some store must be opened. The cost function and constraint C2 can only be expressed in a low-level way, namely by reinterpreting the Booleans of OW and the truth values of local constraints as numeric weights. Things become even more awkward if we non-redundantly model the supply function, namely just by the 1-d array of variables representing the warehouse that supplies each store. On the instance data we tried, this model is actually an order of magnitude more efficient (by all measures) than the published one, but it is much less readable.
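To make the two low-level representations discussed above tangible, here is a minimal Python sketch that evaluates the cost function and the constraints for one fixed candidate solution under each representation. It is not the ILOG SOLVER or OPL model: all names (supplier, open_flags, and the function names) are hypothetical, stores and warehouses are assumed to be numbered from 0, and supply_cost[s][w] is assumed to follow the SupplyCost[Stores,Warehouses] indexing.

```python
from typing import List


def channelling_holds(supplier: List[int], open_flags: List[int]) -> bool:
    """Redundant model: a warehouse that supplies some store must be open."""
    return all(open_flags[w] == 1 for w in supplier)


def capacity_ok(supplier: List[int], capacity: List[int]) -> bool:
    """C2, expressed by summing the truth values (supplier[s] == w) as 0/1."""
    return all(sum(supplier[s] == w for s in range(len(supplier))) <= capacity[w]
               for w in range(len(capacity)))


def cost_redundant(supplier: List[int], open_flags: List[int],
                   supply_cost: List[List[int]], maint_cost: int) -> int:
    """Cost with the redundant 0/1 array: open flags act as numeric weights."""
    supply = sum(supply_cost[s][supplier[s]] for s in range(len(supplier)))
    return supply + maint_cost * sum(open_flags)


def cost_non_redundant(supplier: List[int],
                       supply_cost: List[List[int]], maint_cost: int) -> int:
    """Cost from the supplier array alone: the opened warehouses are exactly
    the distinct values taken by the array."""
    supply = sum(supply_cost[s][supplier[s]] for s in range(len(supplier)))
    return supply + maint_cost * len(set(supplier))


if __name__ == "__main__":
    supplier = [0, 0, 2]                       # store s is supplied by supplier[s]
    open_flags = [1, 0, 1]                     # warehouses 0 and 2 are open
    supply_cost = [[1, 5, 9], [2, 5, 9], [9, 5, 3]]
    assert channelling_holds(supplier, open_flags)
    assert capacity_ok(supplier, [2, 1, 1])
    print(cost_redundant(supplier, open_flags, supply_cost, 10),
          cost_non_redundant(supplier, supply_cost, 10))
```

Note that the channelling condition is only an implication: an open warehouse that supplies no store is feasible in the redundant model but merely adds maintenance cost, so a minimising solver would close it.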
This experiment shows that redundancy elimination may pay off in performance, but it may just as well be redundancy introduction. But this is hard to guess, as human intuition may be weak here.

Figure 1 shows an ESRA model of the problem. The sought supply relationship is modelled as a relation and constrained to be a total function from the stores into the warehouses, thereby capturing constraint C1. The elegance of the cost function reflects the freedom from representation choices, with the navigation expression Stores.Supply retrieving the set of warehouses that are to be opened. The only constraint gracefully captures C2, using the navigation expression w.~Supply to retrieve the set of stores that warehouse w supplies. From this model, lower-level models can be synthesised, including the ones discussed above.

    nat MaintCost
    enum Warehouses, Stores
    nat Capacity[Warehouses],
        SupplyCost[Stores,Warehouses]
    var Stores 1:1 # nat Warehouses Supply    // C1
    minimise
      sum(s#w in Supply) SupplyCost[s,w]
      + card(Stores.Supply) * MaintCost
    subject to {                              // C2
      forall(w in Warehouses)
        card(w.~Supply) <= Capacity[w]
    }

Figure 1: The Warehouse Location problem

The Stable Marriage Problem

Original Version. Consider a dating agency where an equal number n of women and men have signed up and are willing to marry any opposite-sex person of the group. They have ranked all possible spouses by decreasing preference. Figure 2 has sample instance data, where a lower rank means a higher preference. For instance, Hal is Nat's first choice, but it is Eve who is Hal's first choice. The objective is to match up the women and men such that all marriages are stable. A marriage is stable if, whenever spouse s prefers some other partner, this partner prefers her/his spouse to s. So s may be unhappy, but s/he is bound to stay with her/his spouse. In more mathematical terms, the sought marriages form a bijection between the sets of women and men.

           Hal  Jim  Bob             Hal  Jim  Bob
    Nat     1    2    3       Nat     3    3    3
    Eve     2    3    1       Eve     1    1    2
    Pat     3    2    1       Pat     2    2    1

Figure 2: Rankings of the women for the men (left), rankings of the men for the women (right)

This problem was first modelled as a constraint program in the reference manual of ILOG SOLVER 4.0 (in 1997), and then modelled in OPL in a significantly simpler way [11]. The marriages are modelled in a redundant way, via two 1-d arrays of variables representing the (unique) husband of each woman and the (unique) wife of each man, respectively. A channelling constraint is necessary to ensure that both total functions are the inverse of each other, that is to achieve a bijection. To achieve better propagation, this channelling constraint is expressed for both functions, requiring every person to be identical to the spouse of their spouse.

A second model would non-redundantly model the marriages, namely by a single total function, that is a 1-d array of variables representing the wife of each man, say. To enforce the bijectiveness of this function, all variables are constrained to be different. This model is probably less efficient, and this has been the case with the instance data we tried.

A third model would model the marriages in a 2-d array Marriage of 0/1 integer variables, indexed by the women and men, so that Marriage[w,m] is 1 iff woman w is married to man m. Two bijectiveness constraints are necessary to enforce that every person has exactly one spouse, so that there is exactly one 1 in each row and in each column. This model is probably less efficient than the second one, and this has been the case with the instance data we tried.

Figure 3 shows an ESRA model of the problem. The marriages are modelled as a relation over the sets of women and men, such that it is a bijection. From this model, lower-level models can be synthesised, including the ones above, using the various ways of representing relations, and exploiting insights gained from thorough studies of bijections [9,12].

    enum Women, Men
    nat RankW[Women,Men], RankM[Men,Women]
    var Women 1:1 # 1:1 Men Marriage          // bij.
    solve {
      forall(w#m, p#o in Marriage)
        RankW[w,o] < RankW[w,m]               // stability 1
          => RankM[o,p] < RankM[o,w]
        & RankM[m,p] < RankM[m,w]             // stability 2
          => RankW[p,o] < RankW[p,m]
    }

Figure 3: The original Stable Marriage problem

Model Maintenance. Relations and their particular cases (partial functions, total functions, injections, surjections, bijections, and so on) are a single, powerful concept for elegantly modelling many aspects of combinatorial optimisation problems. Also, there are not too many different, and even standard, ways of representing relations and relational expressions. Therefore, we advocate that the synthesiser can actually make a (systematic) empirical evaluation of candidate representations, using (real-life) training instances of the problem. In the absence of such training instances, such a synthesiser would simply be non-deterministic. Also, theoretical studies such as [12] should be made for particular cases of relations in order to obtain rules stating when a representation is advisable and when not, thereby reducing the volume of such empirical studies by synthesisers.

Model maintenance at the high ESRA level reduces to adapting to the new problem and re-synthesising, as all representation (and thus solving) issues are left to the synthesiser. At lower levels, model maintenance is quite tedious, as the early if not uninformed representation choices have to be taken into account and as the lower-level notation is more awkward. Worse, a representation change, a redundancy elimination, or a redundancy introduction (such as a model integration or the addition of implied constraints) may "have to" be operated, because it is unlikely that, for the considered training instances or in general, the 'best' representation is the same for bijections as for full relations, say. Such re-synthesis is also necessary when the distribution of instances on which the model is deployed becomes different from the training distribution used when the model was formulated. But the modeller may be unwilling or unable to do this experimentation for finding the 'best' model, or s/he may be unaware of insights gained from a general empirical study, such as on how to 'best' model bijections [9].

Polyandry Version. Imagine a country where the law allows women to marry up to 3 men, but men may marry only 1 woman. Also consider that all women who signed up at the agency need to marry. The sought marriages now form a full relation between the sets of women and men. A polyandric marriage is stable if, whenever spouse s prefers some other partner, this partner is married, and s/he prefers all her/his spouses to s. If at least as many men as women have signed up at the agency, the problem remains a decision problem.

Figure 4 shows an ESRA model of this new problem. The multiplicities were changed, and the stability constraints were rephrased to reflect the new definition. (The same stability constraints could actually also have been used in the model of Figure 3, because the new definition of stability implies the original one in its context.) The model maintenance was indeed unburdened by representation issues.

    enum Women, Men
    nat RankW[Women,Men], RankM[Men,Women]
    var Women 1:3 # 0:1 Men Marriage
    solve {
      forall(w#m in Marriage)
        forall(o in Men)                      // stability
          RankW[w,o] < RankW[w,m]
          => exists(p in Women: p#o in Marriage)
               RankM[o,p] < RankM[o,w]
        &
        forall(p in Women)                    // stability
          RankM[m,p] < RankM[m,w]
          => forall(o in Men: p#o in Marriage)
               RankW[p,o] < RankW[p,m]
    }

Figure 4: The Polyandric Stable Marriage problem

Conclusion

Related Work. This research owes a lot to previous work on relational modelling in formal methods and on ERA-style semantic data modelling, especially to the ALLOY object modelling language [6], which itself gained much from the Z specification notation [10] (and learned from UML/OCL how not to do it). Contrary to ERA modelling, we do not distinguish between attributes and relations.

In constraint programming, OPL [11] stands out as a medium-level constraint modelling language, and ALMA [2] is also becoming a very powerful notation, on top of MODULA-2. Our ESRA language shares with them the quest for a practical declarative modelling language based on a strongly-typed (full) first-order logic with arrays (and with the look of an imperative language), while dispensing with such hard-to-properly-implement and rarely-necessary (for constraint modelling) 'luxuries' as recursion and unbounded quantification. As shown, ESRA even goes beyond them, by advocating an abstract view of relations.

Current and Future Work. Our ESRA language is an extension of a streamlined (significant) subset of OPL. A prototype ESRA-to-OPL synthesiser [4] has been implemented by Simon Wrang. The semantics of ESRA will be given in an implementation-independent way, in two layers. Indeed, some features of ESRA are just syntactic sugar for combinations of (a few) kernel features, hence we will provide operational semantics (by rewrite rules) for the non-kernel features, and a set-oriented denotational semantics for the kernel features. We can then tackle the joint consideration of the modelling and the solver parameterisation. The synthesiser will also benefit from our work on symmetry reducing/breaking constraints [3]. A graphical language can be developed for the variable modelling, including the multiplicity constraints on relations, so that only the cost function and the other constraints need to be textually expressed.

References

1. J.-R. Abrial. The B-Book: Assigning Programs to Meanings. Cambridge University Press, 1996.
2. K.R. Apt, J. Brunekreef, V. Partington, and A. Schaerf. An imperative language that supports declarative programming. ACM TOPLAS 20(5):1014--1066, 1998.
3. P. Flener, A. Frisch, B. Hnich, Z. Kızıltan, I. Miguel, J. Pearson, and T. Walsh. Symmetry in matrix models. In CP'01 Workshop on Symmetry in Constraints.
4. P. Flener, B. Hnich, and Z. Kızıltan. Compiling high-level type constructors in constraint programming. In Proc. of PADL'01. LNCS 1990. Springer-Verlag, 2001.
5. C. Gervet. Interval propagation to reason about sets: Definition and implementation of a practical language. Constraints 1(3):191-244, 1997.
6. D. Jackson. ALLOY: A lightweight object modelling notation. ACM TOPLAS, forthcoming.
7. Z. Kızıltan, P. Flener, and B. Hnich. Towards inferring labelling heuristics for CSP application domains. In Proc. of KI'01. LNAI 2174. Springer-Verlag, 2001.
8. S. Minton. Automatically configuring constraint satisfaction programs. Constraints 1(1-2):7-43, 1996.
9. B.M. Smith. Modelling a permutation problem. RR 18, Univ. of Leeds (UK), School of Computer Studies, 2000.
10. J.M. Spivey. The Z Notation: A Reference Manual (second edition). Prentice-Hall, 1992.
11. P. Van Hentenryck. The OPL Optimization Programming Language. The MIT Press, 1999.
12. T. Walsh. Permutation problems and channelling constraints. TR 26 at www.dcs.st-and.ac.uk/~apes, 2001.