Labs TDDD10 AI Programming

TDDD10 AI Programming
Multiagent Decision Making
Cyrille Berger

Labs
- New map: Kobe2013-stations
- A ChangeSet contains all the properties, not just the new ones
- In the AbstractAgent class:

    protected void processSense(KASense sense) {
        // Merge the newly sensed changes into the world model
        model.merge(sense.getChangeSet());
        Collection<Command> heard = sense.getHearing();
        think(sense.getTime(), sense.getChangeSet(), heard);
    }

- You can override it:

    @Override
    protected void processSense(KASense sense) {
        // Send an update to the other agents here,
        // using the world model as it was before the merge
        super.processSense(sense);
    }

Lectures
1. AI Programming: Introduction
2. Introduction to RoboRescue
3. Agents and Agents Architecture
4. Multi-Agent and Communication
5. Multi-Agent Decision Making
6. Cooperation And Coordination 1
7. Cooperation And Coordination 2
8. Machine Learning
9. Automated Planning
10. Putting It All Together

Lecture goals
- Multi-agent decision making in a competitive environment
- Learn about the concept of utility, rational agents, voting and auctioning

Lecture content
- Self-Interested Agents
- Social Choice
- Auctions
  - Single Dimension Auctions
  - Combinatorial Auctions

Self-Interested Agents

Utilities and Preferences
- Assume we have just two agents: Ag = {i, j}
- Agents are assumed to be self-interested: they have preferences over how the environment is
- Assume Ω = {ω₁, ω₂, …} is the set of "outcomes" that agents have preferences over
- We capture preferences by utility functions:
  uᵢ : Ω → ℝ
  uⱼ : Ω → ℝ
- Utility functions lead to preference orderings over outcomes:
  ω ⪰ᵢ ω′ means uᵢ(ω) ≥ uᵢ(ω′)
  ω ≻ᵢ ω′ means uᵢ(ω) > uᵢ(ω′)

What is utility?
- Utility is not money, but it is similar

Self-Interested Agents
- If agents represent individuals or organizations, then we cannot make the benevolence assumption.
- Agents will be assumed to act to further their own interests, possibly at the expense of others.
- Potential for conflict.
- May complicate the design task enormously.

Multiagent Encounters (1/2)
- We need a model of the environment in which these agents will act:
  - agents simultaneously choose an action to perform, and as a result of the actions they select, an outcome in Ω will result
  - the actual outcome depends on the combination of actions
  - assume each agent has just two possible actions that it can perform, C ("cooperate") and D ("defect")
- Environment behavior is given by the state transformer function:
  τ : Acᵢ × Acⱼ → Ω

Multiagent Encounters (2/2)
Examples of a state transformer function (the first argument is i's action, the second is j's):
- This environment is sensitive to the actions of both agents:
  τ(D,D) = ω₁  τ(D,C) = ω₂  τ(C,D) = ω₃  τ(C,C) = ω₄
- Neither agent has any influence in this environment:
  τ(D,D) = ω₁  τ(D,C) = ω₁  τ(C,D) = ω₁  τ(C,C) = ω₁
- This environment is controlled by j:
  τ(D,D) = ω₁  τ(D,C) = ω₂  τ(C,D) = ω₁  τ(C,C) = ω₂

Coordination game
- Suppose we have the case where both agents can influence the outcome, and they have utility functions as follows:
  uᵢ(ω₁) = 2  uᵢ(ω₂) = 1  uᵢ(ω₃) = 3  uᵢ(ω₄) = 4
  uⱼ(ω₁) = 2  uⱼ(ω₂) = 3  uⱼ(ω₃) = 1  uⱼ(ω₄) = 4
- This environment is sensitive to the actions of both agents:
  τ(D,D) = ω₁  τ(D,C) = ω₂  τ(C,D) = ω₃  τ(C,C) = ω₄
- With a bit of abuse of notation:
  uᵢ(D,D) = 2  uᵢ(D,C) = 1  uᵢ(C,D) = 3  uᵢ(C,C) = 4
  uⱼ(D,D) = 2  uⱼ(D,C) = 3  uⱼ(C,D) = 1  uⱼ(C,C) = 4
- Then agent i's preferences are:
  C,C ≻ᵢ C,D ≻ᵢ D,D ≻ᵢ D,C
- "C" is the rational choice for i: whatever j does, i is better off playing C.

Payoff Matrices
- We can characterize the previous scenario in a payoff matrix: [payoff matrix figure]
- Agent i is the column player
- Agent j is the row player

Disgrace of Gijón (World Cup 1982)
- One game left: Germany vs Austria
  uᵢ(≥3-0) = 2, uⱼ(≥3-0) = -1
  uᵢ(2-0) = uᵢ(1-0) = 2, uⱼ(2-0) = uⱼ(1-0) = 1
  uᵢ(a-a) = -1, uⱼ(a-a) = 2
  uᵢ(0-a) = -1, uⱼ(0-a) = 2 (a > 1)
- Final score: Germany 1-0 Austria

The Prisoner's Dilemma
- Two men are collectively charged with a crime and held in separate cells, with no way of meeting or communicating. They are told that:
  - if one confesses and the other does not, the confessor will be freed, and the other will be jailed for three years
  - if both confess, then each will be jailed for two years
- Both prisoners know that if neither confesses, then they will each be jailed for one year

The Prisoner's Dilemma
- Payoff matrix for the prisoner's dilemma (confessing is "defect", staying silent is "cooperate"):

                   i defects     i cooperates
    j defects      i=2, j=2      i=1, j=4
    j cooperates   i=4, j=1      i=3, j=3

- Top left: if both defect, then both get the punishment for mutual defection
- Top right: if i cooperates and j defects, i gets the sucker's payoff of 1, while j gets 4
- Bottom left: if j cooperates and i defects, j gets the sucker's payoff of 1, while i gets 4
- Bottom right: reward for mutual cooperation

Solution Concepts
- How will a rational agent behave in any given scenario?
- Answered in solution concepts:
  - dominant strategy;
  - Nash equilibrium strategy;
  - Pareto optimal strategies;
  - strategies that maximize social welfare.

Dominant Strategies (1/2)
- Given any particular strategy (either C or D) of agent i, there will be a number of possible outcomes
- We say s₁ dominates s₂ if every outcome possible by i playing s₁ is preferred over every outcome possible by i playing s₂
- A rational agent will never play a dominated strategy
- So in deciding what to do, we can delete dominated strategies
- Unfortunately, there is not always a unique undominated strategy

Dominant Strategies (2/2)
- Coordination game: [payoff matrix figure]
- Prisoner's Dilemma: [payoff matrix figure]

(Pure Strategy) Nash Equilibrium (1/2)
- In general, we will say that two strategies s₁ and s₂ are in Nash equilibrium if:
  - under the assumption that agent i plays s₁, agent j can do no better than play s₂; and
  - under the assumption that agent j plays s₂, agent i can do no better than play s₁.
- Neither agent has any incentive to deviate from a Nash equilibrium
- Unfortunately:
  - Not every interaction scenario has a pure strategy Nash equilibrium
  - Some interaction scenarios have more than one Nash equilibrium

(Pure Strategy) Nash Equilibrium (2/2)
- Coordination game: [payoff matrix figure]
- Prisoner's Dilemma: [payoff matrix figure]

Pareto Optimality (1/2)
- An outcome is said to be Pareto optimal (or Pareto efficient) if there is no other outcome that makes one agent better off without making another agent worse off.
- If an outcome is Pareto optimal, then at least one agent will be reluctant to move away from it (because this agent will be worse off).
- If an outcome ω is not Pareto optimal, then there is another outcome ω′ that makes everyone as happy, if not happier, than ω.
- "Reasonable" agents would agree to move to ω′ in this case. (Even if I don't directly benefit from ω′, you can benefit without me suffering.)

Pareto Optimality (2/2)
- Coordination game: [payoff matrix figure]
- Prisoner's Dilemma: [payoff matrix figure]

Social Welfare (1/2)
- The social welfare of an outcome ω is the sum of the utilities that each agent gets from ω:
  Σ_{i ∈ Ag} uᵢ(ω)
- Think of it as the "total amount of utility in the system".
- As a solution concept, it may be appropriate when the whole system (all agents) has a single owner (then the overall benefit of the system is important, not that of individuals).

Social Welfare (2/2)
- Coordination game: [payoff matrix figure]
- Prisoner's Dilemma: [payoff matrix figure]

The Prisoner's Dilemma
- Solution concepts:
  - D is a dominant strategy.
  - (D,D) is the only Nash equilibrium.
  - All outcomes except (D,D) are Pareto optimal.
  - (C,C) maximizes social welfare.
- The individually rational action is to defect
  - This guarantees a payoff of no worse than 2, whereas cooperating guarantees a payoff of only 1. So defection is the best response to all possible strategies: both agents defect, and get payoff = 2
- But intuition says this is not the best outcome:
  - Surely they should both cooperate and each get a payoff of 3!

The Prisoner's Dilemma
- This apparent paradox is the fundamental problem of multi-agent interactions.
- It appears to imply that cooperation will not occur in societies of self-interested agents.
- Real world examples:
  - nuclear arms reduction ("why don't I keep mine...")
  - free rider systems: public transport, television licenses.
- Can we recover cooperation?

The Iterated Prisoner's Dilemma
- One answer: play the game more than once
- If you know you will be meeting your opponent again, then the incentive to defect appears to evaporate
- Cooperation is the rational choice in the infinitely repeated prisoner's dilemma

Backwards Induction
- But suppose you both know that you will play the game exactly n times
  - On round n-1 (the last round), you have an incentive to defect, to gain that extra bit of payoff
  - But this makes round n-2 the last "real" round, and so you have an incentive to defect there, too.
  - This is the backwards induction problem.
- When playing the prisoner's dilemma with a fixed, finite, pre-determined, commonly known number of rounds, defection is the best strategy

Axelrod's Tournament
- Suppose you play the iterated prisoner's dilemma against a range of opponents
- What strategy should you choose, so as to maximize your overall payoff?
- Axelrod (1984) investigated this problem, with a computer tournament for programs playing the prisoner's dilemma

Strategies in Axelrod's Tournament
- RANDOM
- ALLD: "Always defect", the hawk strategy
- TIT-FOR-TAT:
  - On round u = 0, cooperate
  - On round u > 0, do what your opponent did on round u-1
- TESTER:
  - On the 1st round, defect. If the opponent retaliated, then play TIT-FOR-TAT. Otherwise intersperse cooperation and defection.
- JOSS:
  - As TIT-FOR-TAT, except periodically defect

Axelrod's Tournament results
- TIT-FOR-TAT won the first tournament
- A second tournament was called
- TIT-FOR-TAT won the second tournament as well

Recipes for Success in Axelrod's Tournament
Axelrod suggests the following rules for succeeding in his tournament:
- Don't be envious: don't play as if it were zero sum!
- Be nice: start by cooperating, and reciprocate cooperation
- Retaliate appropriately: always punish defection immediately, but use "measured" force, don't overdo it
- Don't hold grudges: always reciprocate cooperation immediately

Competitive and Zero-Sum Interactions
- Where preferences of agents are diametrically opposed we have strictly competitive scenarios
- Zero-sum encounters are those where utilities sum to zero:
  uᵢ(ω) + uⱼ(ω) = 0 for all ω ∈ Ω
- Zero sum implies strictly competitive
- Zero sum encounters in real life are very rare, but people tend to act in many scenarios as if they were zero sum

Matching Pennies
- Players i and j simultaneously choose the face of a coin, either "heads" or "tails".
- If they show the same face, then i wins, while if they show different faces, then j wins.

Mixed Strategies for Matching Pennies
- No pair of strategies forms a pure strategy Nash equilibrium: whatever pair of strategies is chosen, somebody will wish they had done something else.
- The solution is to allow mixed strategies:
  - play "heads" with probability 0.5
  - play "tails" with probability 0.5.
- This is a Nash equilibrium strategy.

Mixed Strategies
- A mixed strategy has the form
  - play α₁ with probability p₁
  - play α₂ with probability p₂
  - ...
  - play αₖ with probability pₖ
  such that p₁ + p₂ + … + pₖ = 1.
- Nash proved that every finite game has a Nash equilibrium in mixed strategies.

Social Choice

Social Choice
- Social choice theory is concerned with group decision making.
- Classic example of social choice theory: voting.
- Formally, the issue is combining preferences to derive a social outcome.

Components of a Social Choice Model
- Assume a set Ag = {1, …, n} of voters. These are the entities who express preferences.
- Voters make group decisions with respect to a set Ω = {ω₁, ω₂, …} of outcomes. Think of these as the candidates.
- If |Ω| = 2, we have a pairwise election.
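
Before turning to voting, the solution concepts from the Self-Interested Agents part (pure strategy Nash equilibrium, Pareto optimality, social welfare) can be checked by brute force on a two-action game. The following is only a minimal sketch: the class `GameAnalysis`, its method names, and the encoding of outcomes as "(action of i,action of j)" are assumptions made for illustration and are not part of the lecture or of any library. The payoffs are those of the Prisoner's Dilemma as described in the lecture (2 for mutual defection, 3 for mutual cooperation, 4 and 1 when exactly one agent defects).

```java
import java.util.ArrayList;
import java.util.List;

/** Brute-force solution concepts for a two-player game with actions D and C (illustrative sketch). */
final class GameAnalysis {
    static final String[] ACTIONS = {"D", "C"};

    // Prisoner's Dilemma payoffs, indexed [action of i][action of j], with 0 = D and 1 = C.
    static final int[][] PD_U_I = {{2, 4}, {1, 3}};
    static final int[][] PD_U_J = {{2, 1}, {4, 3}};

    static String name(int a, int b) {
        return "(" + ACTIONS[a] + "," + ACTIONS[b] + ")";
    }

    /** An outcome is a pure Nash equilibrium if neither agent gains by deviating alone. */
    static List<String> pureNash(int[][] ui, int[][] uj) {
        List<String> result = new ArrayList<>();
        for (int a = 0; a < 2; a++) {
            for (int b = 0; b < 2; b++) {
                boolean iStays = ui[a][b] >= ui[1 - a][b];
                boolean jStays = uj[a][b] >= uj[a][1 - b];
                if (iStays && jStays) result.add(name(a, b));
            }
        }
        return result;
    }

    /** An outcome is Pareto optimal if no other outcome is as good for both and better for one. */
    static List<String> paretoOptimal(int[][] ui, int[][] uj) {
        List<String> result = new ArrayList<>();
        for (int a = 0; a < 2; a++) {
            for (int b = 0; b < 2; b++) {
                boolean dominated = false;
                for (int c = 0; c < 2; c++) {
                    for (int d = 0; d < 2; d++) {
                        boolean noWorse = ui[c][d] >= ui[a][b] && uj[c][d] >= uj[a][b];
                        boolean better = ui[c][d] > ui[a][b] || uj[c][d] > uj[a][b];
                        if (noWorse && better) dominated = true;
                    }
                }
                if (!dominated) result.add(name(a, b));
            }
        }
        return result;
    }

    /** Outcomes with the largest sum of utilities. */
    static List<String> maxSocialWelfare(int[][] ui, int[][] uj) {
        int best = Integer.MIN_VALUE;
        List<String> result = new ArrayList<>();
        for (int a = 0; a < 2; a++) {
            for (int b = 0; b < 2; b++) {
                int welfare = ui[a][b] + uj[a][b];
                if (welfare > best) {
                    best = welfare;
                    result.clear();
                }
                if (welfare == best) result.add(name(a, b));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("Pure Nash equilibria: " + pureNash(PD_U_I, PD_U_J));
        System.out.println("Pareto optimal:       " + paretoOptimal(PD_U_I, PD_U_J));
        System.out.println("Max social welfare:   " + maxSocialWelfare(PD_U_I, PD_U_J));
    }
}
```

For the Prisoner's Dilemma payoffs this reports (D,D) as the only pure Nash equilibrium, every outcome except (D,D) as Pareto optimal, and (C,C) as the welfare maximizer. Passing Matching Pennies payoffs instead yields an empty list of pure equilibria, which is why mixed strategies are needed there.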
Preferences
- Each voter has preferences over Ω: an ordering over the set of possible outcomes Ω.
- Example: suppose
  Ω = {gin, rum, brandy, whisky}
  then we might have agent i with preference order
  ϖᵢ = (brandy, rum, gin, whisky)
  meaning
  brandy ≻ᵢ rum ≻ᵢ gin ≻ᵢ whisky

Preference Aggregation
- The fundamental problem of social choice theory:
  Given a collection of preference orders, one for each voter, how do we combine these to derive a group decision that reflects as closely as possible the preferences of voters?
- Variants of preference aggregation:
  - social welfare functions;
  - social choice functions.

Social Welfare Functions
- Let Π(Ω) be the set of preference orderings over Ω.
- A social welfare function takes the voter preferences and produces a social preference order:
  f : Π(Ω) × … × Π(Ω) → Π(Ω)  (one argument per voter)
- We define ≻* as the outcome of a social welfare function
  - whisky ≻* gin ≻* brandy ≻* rum
  - S ≻* M ≻* SD ≻* MP ≻* C ≻* V ≻* FP ≻* KD ≻* FI ≻* PP

Social Choice Functions
- Sometimes, we just want to select one of the possible candidates, rather than a social order.
- This gives social choice functions:
  f : Π(Ω) × … × Π(Ω) → Ω  (one argument per voter)
- Example: presidential election.

Voting Procedures: Plurality
- Social choice function: selects a single outcome.
- Each voter submits preferences.
- Each candidate gets one point for every preference order that ranks them first.
- The winner is the one with the largest number of points.
  - Example: political elections in the UK, France, USA...
- If we have only two candidates, then plurality is a simple majority election.

Anomalies with Plurality
- Suppose |Ag| = 100 and Ω = {ω₁, ω₂, ω₃} with:
  - 40% of voters voting for ω₁
  - 30% of voters voting for ω₂
  - 30% of voters voting for ω₃
- With plurality, ω₁ gets selected even though a clear majority (60%) prefer another candidate!

Strategic Manipulation by Tactical Voting
- Suppose your preferences are
  ω₁ ≻ ω₂ ≻ ω₃
  while you believe 49% of voters have preferences
  ω₂ ≻ ω₁ ≻ ω₃
  and you believe 49% have preferences
  ω₃ ≻ ω₂ ≻ ω₁
- You may do better voting for ω₂, even though this is not your true preference profile.
- This is tactical voting: an example of strategic manipulation of the vote.
- Especially a problem in two-round elections

Condorcet's Paradox
- Suppose Ag = {1, 2, 3} and Ω = {ω₁, ω₂, ω₃} with:
  ω₁ ≻₁ ω₂ ≻₁ ω₃
  ω₂ ≻₂ ω₃ ≻₂ ω₁
  ω₃ ≻₃ ω₁ ≻₃ ω₂
- For every possible candidate, there is another candidate that is preferred by a majority of voters!
- This is Condorcet's paradox: there are situations in which, no matter which outcome we choose, a majority of voters will be unhappy with the outcome chosen.

Applications of social choice theory
- The main application is human choice and decision making
- Results aggregation
  - aggregate the output of several search engines

Auctions

Application of auctions
- With the rise of the Internet, auctions have become popular in many e-commerce applications (e.g. eBay)
- Auctions are an efficient tool for reaching agreements in a society of self-interested agents
  - For example, bandwidth allocation on a network, sponsored links
- Auctions can be used for efficient resource allocation within decentralized computational systems
- Frequently utilized for solving multi-agent and multi-robot coordination problems
  - For example, team-based exploration of unknown terrain

What is an Auction?
- An auction takes place between an agent known as the auctioneer and a collection of agents known as the bidders
- The goal of the auction is for the auctioneer to allocate all goods to the bidders
- The auctioneer desires to maximize the price and the bidders desire to minimize the price

Limit Price
- Each trader has a value or limit price that they place on the good.
  - A buyer who exchanges more than their limit price for a good makes a loss.
  - A seller who exchanges a good for less than their limit price makes a loss.
- Limit prices clearly have an effect on the behavior of traders.
- There are several models, embodying different assumptions about the nature of the good.

Limit Price
- Private value
  - The good has a value to me that is independent of what it is worth to you.
  - The textbook gives the example of John Lennon's last dollar bill.
- Common value
  - The good has the same value to all of us, but we have differing estimates of what it is.
  - Winner's curse
- Correlated value
  - Our values are related.
  - The more you are prepared to pay, the more I should be prepared to pay.
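
Looking back at the Social Choice part, the plurality anomaly and Condorcet's paradox can both be reproduced with a few lines of code. This is a hedged sketch, not a reference implementation: the class `VotingDemo` and its methods are hypothetical names, candidates are written `w1`, `w2`, `w3`, and the lower-ranked preferences in the 40/30/30 example are an assumption added here (the lecture only gives the first choices). Ties in the plurality count are broken in favor of the candidate seen first, which is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plurality and pairwise-majority checks on ranked ballots (illustrative sketch). */
final class VotingDemo {

    /** Plurality: one point per ballot for its top-ranked candidate; most points wins. */
    static String plurality(List<List<String>> ballots) {
        Map<String, Integer> points = new LinkedHashMap<>();
        for (List<String> ballot : ballots) {
            points.merge(ballot.get(0), 1, Integer::sum);
        }
        String winner = null;
        int best = -1;
        for (Map.Entry<String, Integer> entry : points.entrySet()) {
            if (entry.getValue() > best) {
                best = entry.getValue();
                winner = entry.getKey();
            }
        }
        return winner;
    }

    /** True if a strict majority of ballots rank x above y. */
    static boolean majorityPrefers(List<List<String>> ballots, String x, String y) {
        int count = 0;
        for (List<String> ballot : ballots) {
            if (ballot.indexOf(x) < ballot.indexOf(y)) count++;
        }
        return 2 * count > ballots.size();
    }

    /** The candidate that beats every other candidate pairwise, or null if there is none. */
    static String condorcetWinner(List<List<String>> ballots) {
        List<String> candidates = ballots.get(0);
        for (String x : candidates) {
            boolean beatsAll = true;
            for (String y : candidates) {
                if (!x.equals(y) && !majorityPrefers(ballots, x, y)) beatsAll = false;
            }
            if (beatsAll) return x;
        }
        return null;
    }

    /** 40/30/30 example; the second and third ranks are assumed for illustration. */
    static List<List<String>> pluralityExample() {
        List<List<String>> ballots = new ArrayList<>();
        ballots.addAll(Collections.nCopies(40, List.of("w1", "w2", "w3")));
        ballots.addAll(Collections.nCopies(30, List.of("w2", "w3", "w1")));
        ballots.addAll(Collections.nCopies(30, List.of("w3", "w2", "w1")));
        return ballots;
    }

    /** The three-voter profile from the Condorcet's paradox slide. */
    static List<List<String>> condorcetExample() {
        return List.of(
                List.of("w1", "w2", "w3"),
                List.of("w2", "w3", "w1"),
                List.of("w3", "w1", "w2"));
    }

    public static void main(String[] args) {
        System.out.println("Plurality winner: " + plurality(pluralityExample()));
        System.out.println("Pairwise winner:  " + condorcetWinner(pluralityExample()));
        System.out.println("Condorcet profile winner: " + condorcetWinner(condorcetExample()));
    }
}
```

Under the assumed rankings, plurality elects `w1` although `w2` beats every other candidate in a pairwise majority vote. On the three-voter profile the pairwise majorities form a cycle, so `condorcetWinner` returns `null`.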
Winner's curse
- The term dates from the 1950s:
  - Oil companies bid for drilling rights in the Gulf of Mexico
  - The problem was the bidding process, given the uncertainties in estimating the potential value of an offshore oil field
  - "Competitive bidding in high risk situations", by Capen, Clapp and Campbell, Journal of Petroleum Technology, 1971
- For example:
  - An oil field had an actual intrinsic value of $10 million
  - Oil companies might guess its value to be anywhere from $5 million to $20 million
  - The company that wrongly estimated $20 million and placed a bid at that level would win the auction, and later find that the field was not worth that much
- In many cases the winner is the bidder who has overestimated the most ⇒ "The Winner's curse"
- Bid shading: bid a certain amount below the valuation

Auction Characteristics
- Auction procedure
  - One shot: only one round of bidding
  - Ascending: the auctioneer begins at a minimum price, bidders increase their bids
  - Descending: the auctioneer begins at a price above the value of the good and lowers the price at each round
  - Continuous: Internet
- Auctions may be
  - Standard auction: one seller and multiple buyers
  - Reverse auction: one buyer and multiple sellers
  - Double auction: multiple sellers and multiple buyers
- Combinatorial auctions
  - Buyers and sellers may have combinatorial valuations for bundles of goods

Single versus Multi-dimensional
- Single dimensional auctions
  - The only content of an offer is the price and quantity of some specific type of good.
  - "I'll bid $200 for those 2 chairs"
- Multi dimensional auctions
  - Offers can relate to many different aspects of many different goods.
  - "I'm prepared to pay $200 for those two red chairs, but $300 if you can deliver them tomorrow."
  - Frequency ranges for cell phones

Single Dimension Auctions

English Auction
- An example of first-price open-cry ascending auctions
- Protocol:
  - The auctioneer starts by offering the good at a low price
  - The auctioneer offers higher prices until no agent is willing to pay the proposed level
  - The good is allocated to the agent that made the highest offer
- Properties
  - Generates competition between bidders (generates revenue for the seller when bidders are uncertain of their valuation)
  - Dominant strategy: bid slightly more than the current bid, withdraw if the bid reaches your personal valuation of the good
  - Winner's curse (for common value goods)

Dutch Auction
- Dutch auctions are examples of first-price open-cry descending auctions
- Protocol:
  - The auctioneer starts by offering the good at an artificially high value
  - The auctioneer lowers the offer price until some agent makes a bid equal to the current offer price
  - The good is then allocated to the agent that made the offer
- Properties
  - Items are sold rapidly (can sell many lots within a single day)
  - Intuitive strategy: wait for a little bit after your true valuation has been called and hope no one else gets in there before you (no general dominant strategy)
  - Winner's curse also possible

First-Price Sealed-Bid Auctions
- First-price sealed-bid auctions are one-shot auctions
- Protocol:
  - Within a single round, bidders submit a sealed bid for the good
  - The good is allocated to the agent that made the highest bid
  - The winner pays the price of the highest bid
- Often used in commercial auctions, e.g., public building contracts
- Problem: the difference between the highest and second highest bid is "wasted money" (the winner could have offered less)
- Intuitive strategy: bid a little bit less than your true valuation (no general dominant strategy)
  - The more bidders there are, the smaller the deviation should be!

Vickrey Auctions
- Proposed by William Vickrey in 1961 (Nobel Prize in Economic Sciences in 1996)
- Vickrey auctions are examples of second-price sealed-bid one-shot auctions
- Protocol:
  - within a single round, bidders submit a sealed bid for the good
  - the good is allocated to the agent that made the highest bid
  - the winner pays the price of the second highest bid
- Dominant strategy: bid your true valuation
  - if you bid more, you risk paying too much
  - if you bid less, you lower your chances of winning while still having to pay the same price in case you win
- Antisocial behavior: bid more than your true valuation to make opponents suffer (not "rational")
- For private value auctions, strategically equivalent to the English auction mechanism

Generalized first price auctions
- Used by Yahoo for "sponsored links" auctions
- Introduced in 1997 for selling Internet advertising by Yahoo/Overture (before, there were only "banner ads")
  - Advertisers submit a bid reporting their willingness to pay on a per-click basis for a particular keyword
  - Cost-Per-Click (CPC) bid
  - Advertisers were billed for each "click" on sponsored links leading to their page
  - The links were arranged in descending order of bids, making the highest bids the most prominent
  - Auctions take place during each search
- However, the auction mechanism turned out to be unstable!
  - Bidders revised their bids as often as possible

Generalized second price auctions
- Introduced by Google for pricing sponsored links (AdWords Select)
- Observation: bidders generally do not want to pay much more than the rank below them
- Therefore: 2nd price auction
- Further modifications:
  - Advertisers bid for keywords and keyword combinations
  - Rank: CPC bid × quality score
  - Price: with respect to lower ranks
  - http://www.chipkin.com/googleadwords-actual-cpc-calculation/
- After seeing Google's success, Yahoo also switched to second price auctions in 2002

Combinatorial Auctions

Combinatorial Auctions
- In a combinatorial auction, the auctioneer puts several goods on sale and the other agents submit bids for entire bundles of goods
- Given a set of bids, the winner determination problem is the problem of deciding which of the bids to accept
  - The solution must be feasible (no good may be allocated to more than one agent)
  - Ideally, it should also be optimal (in the sense of maximizing revenue for the auctioneer)
- A challenging algorithmic problem

Complements and Substitutes
- The value an agent assigns to a bundle of goods may depend on the combination
- Complements: the value assigned to a set is greater than the sum of the values assigned to its elements
  - Example: "a pair of shoes" (a left shoe and a right shoe)
- Substitutes: the value assigned to a set is lower than the sum of the values assigned to its elements
  - Example: a ticket to the theatre and another one to a football match for the same night
- In such cases an auction mechanism allocating one item at a time is problematic, since the best bidding strategy in one auction may depend on the outcome of other auctions

Protocol
- One auctioneer, several bidders, and many items to be sold
- Each bidder submits a number of package bids specifying the valuation (price) the bidder is prepared to pay for a particular bundle
- The auctioneer announces a number of winning bids
- The winning bids determine which bidder obtains which item, and how much each bidder has to pay
  - No item may be allocated to more than one bidder
- Examples of package bids:
  - Agent 1: ({a,b}, 5), ({b,c}, 7), ({c,d}, 6)
  - Agent 2: ({a,d}, 7), ({a,c,d}, 8)
  - Agent 3: ({b}, 5), ({a,b,c,d}, 12)
- Generally, there are 2ⁿ − 1 non-empty bundles for n items. How can the optimal solution be computed?
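
For the seven example package bids above, the optimal allocation can still be found by plain enumeration. The sketch below tries every subset of bids, discards subsets in which two bids share an item, and keeps the subset with the highest total price. It is only an illustration under stated assumptions: the class `WinnerDetermination` and its helpers are hypothetical names, bundles are encoded as strings of item letters, and bids are treated as independent offers (an agent may win more than one of its bids). The enumeration is exponential in the number of bids, so it only works for tiny instances.

```java
import java.util.ArrayList;
import java.util.List;

/** Exhaustive winner determination for a small combinatorial auction (illustrative sketch). */
final class WinnerDetermination {

    /** A package bid: the bidding agent, the bundle (item letters a..z) and the offered price. */
    static final class Bid {
        final int agent;
        final String items;
        final int price;
        final int mask; // bundle as a bit set: bit 0 = item a, bit 1 = item b, ...

        Bid(int agent, String items, int price) {
            this.agent = agent;
            this.items = items;
            this.price = price;
            int m = 0;
            for (char c : items.toCharArray()) m |= 1 << (c - 'a');
            this.mask = m;
        }

        @Override
        public String toString() {
            return "agent " + agent + ": {" + items + "} for " + price;
        }
    }

    /** The example package bids from the Protocol slide. */
    static List<Bid> lectureBids() {
        return List.of(
                new Bid(1, "ab", 5), new Bid(1, "bc", 7), new Bid(1, "cd", 6),
                new Bid(2, "ad", 7), new Bid(2, "acd", 8),
                new Bid(3, "b", 5), new Bid(3, "abcd", 12));
    }

    /** Tries all 2^k subsets of the k bids and returns a feasible one with maximal revenue. */
    static List<Bid> solve(List<Bid> bids) {
        List<Bid> best = new ArrayList<>();
        int bestRevenue = 0;
        for (int subset = 0; subset < (1 << bids.size()); subset++) {
            int used = 0;
            int revenue = 0;
            boolean feasible = true;
            for (int k = 0; k < bids.size() && feasible; k++) {
                if ((subset & (1 << k)) == 0) continue;
                Bid bid = bids.get(k);
                if ((used & bid.mask) != 0) feasible = false; // some item is already allocated
                used |= bid.mask;
                revenue += bid.price;
            }
            if (feasible && revenue > bestRevenue) {
                bestRevenue = revenue;
                best = new ArrayList<>();
                for (int k = 0; k < bids.size(); k++) {
                    if ((subset & (1 << k)) != 0) best.add(bids.get(k));
                }
            }
        }
        return best;
    }

    static int revenue(List<Bid> accepted) {
        int sum = 0;
        for (Bid bid : accepted) sum += bid.price;
        return sum;
    }

    public static void main(String[] args) {
        List<Bid> winners = solve(lectureBids());
        winners.forEach(System.out::println);
        System.out.println("Revenue: " + revenue(winners));
    }
}
```

On the example bids, the best feasible combination is ({b,c}, 7) from agent 1 together with ({a,d}, 7) from agent 2, for a revenue of 14, which beats the single bid of 12 on the full bundle.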
Optimal Winner Determination Algorithm
- An auctioneer has a set of items M = {1, 2, …, m} to sell
- There are N = {1, 2, …, n} buyers placing bids
- Buyers submit a set of package bids B = {B₁, B₂, …, Bₙ}
- A package bid is a tuple B = [S, vᵢ(S)], where S ⊆ M is a set of items (bundle) and vᵢ(S) > 0 is buyer i's true valuation
- x_{S,i} ∈ {0, 1} is a decision variable for assigning bundle S to buyer i
- The winner determination problem (WDP) is to label the bids as winning or losing (by deciding each x_{S,i} so as to maximize the sum of the accepted bid prices)
- This is NP-hard (the decision version is NP-complete)! It can be solved with an integer program solver, or by heuristic search

Solving WDPs by Heuristic Search
- Two ways of representing the state
- Branch-on-items:
  - A state is a set of items for which an allocation decision has already been made
  - Branching is carried out by adding a further item
- Branch-on-bids:
  - A state is a set of bids for which an acceptance decision has already been made
  - Branching is carried out by adding a further bid

Branch-on-Items
- Branching is based on the question: "What bid should this item be assigned to?"
- Each path in the search tree consists of a sequence of disjoint bids
  - Bids that do not share items with each other
  - A path ends when no bid can be added to it
- Costs at each node are the sum of the prices of the bids accepted on the path

Problem with branch-on-items
- What if the auctioneer's revenue can increase by keeping items?
- Example:
  - There is no bid for 1,
  - a $5 bid for 2,
  - a $3 bid for {1,2}
  - Thus, it is better to keep 1 and sell 2 than to sell both
- The auctioneer's possibility of keeping items can be implemented by placing dummy bids of price zero on those items that received no 1-item bids (Sandholm 2002)

Example of branch-on-items
- Bids: {1,2}, {2,3}, {3}, {1,3}
- We add dummy bids: {1}, {2}
- [search tree figure]

Branch-on-bids
- Branching is based on the question: "Should this bid be accepted or rejected?"
  - Binary tree
- When branching on a bid, the children in the search tree are the world where that bid is accepted (IN), and the world where that bid is rejected (OUT)
- No dummy bids are needed
- First a bid graph is constructed that represents all constraints between the bids
- Then, bids are accepted/rejected until all bids have been handled
  - On accept: remove all constrained bids from the graph
  - On reject: remove the bid itself from the graph

Branch-on-bids - Example
- Bids: {1,2}, {2,3}, {3}, {1,3}
- [search tree figure]

Heuristic Function
- For any node N in the search tree, let g(N) be the revenue generated by the bids that were accepted up to N
- The heuristic function h(N) estimates for every node N how much additional revenue can be expected on going from N
- An upper bound on h(N) is given by the sum over the maximum contributions of the unallocated items A
- Tighter bounds can be obtained by solving the linear program relaxation of the remaining items (Sandholm 2006)

Auctions for Multi-Robot Exploration
- Consider a team of mobile robots that has to visit a number of given targets (locations) in initially partially unknown terrain
- Examples of such tasks are cleaning missions, space exploration, surveillance, and search and rescue
- Continuous re-allocation of targets to robots is necessary
  - For example, robots might discover that they are separated by a blockage from their target
- To allocate and re-allocate the targets among themselves, the robots can use auctions where they sell and buy targets
- The team objective can be to minimize the sum of all path costs; hence, bidding prices are estimated travel costs
  - The path cost of a robot is the sum of the edge costs along its path, from its current location to the last target that it visits

Multi-Robot Exploration
- [Figure] Three robots exploring Mars. The robots' task is to gather data around the four craters, e.g. to visit the highlighted target sites. Source: N. Kalra

General Exploration
- Robots always follow a minimum cost path that visits all allocated targets
- Whenever a robot gains more information about the terrain, it shares this information with the other robots
- If the remaining path of at least one robot is blocked, then all robots put their unvisited targets up for auction
- The auction(s) close after a predetermined amount of time
  - Constraints: each robot wins at most one bundle and each target is contained in exactly one bundle
- After each auction, robots have gained new targets or exchanged targets with other robots
- Then, the cycle repeats

Single-Round Combinatorial Auction
- Protocol:
  - Every robot bids on all possible bundles of targets
  - The valuation is the estimated smallest path cost needed to visit all targets in the bundle (TSP)
  - A central auctioneer determines and informs the winning robots within one round
- Optimal team performance:
  - Combinatorial auctions take all positive and negative synergies between targets into account
  - Minimization of the total path costs
- Drawbacks:
  - Robots cannot bid on all possible bundles of targets because the number of possible bundles is exponential in the number of targets
  - Calculating the cost of each bundle requires calculating the smallest path cost for visiting a set of targets (Traveling Salesman Problem)
  - Winner determination is NP-hard

Sequential Single-Item Auctions
- Protocol:
  - Targets are auctioned one after the other, in the sequence T1, T2, T3, T4, …
  - A robot's valuation is the increase in its smallest path cost that results from winning the auctioned target
  - The robot with the overall smallest bid is allocated the corresponding target
  - Finally, each robot calculates the minimum-cost path for visiting all of its targets and moves along this path
- Advantages:
  - Hill climbing search: some synergies between targets are taken into account (but not all of them)
  - Simple to implement, and efficient in computation and communication
  - Since robots can determine the winners by listening to the bids (and identifying the smallest bid), the method can be executed in a decentralized way
- Disadvantages:
  - The order of the targets changes the result

Parallel Single-Item Auctions
- Protocol:
  - Every robot bids on each target in parallel until all targets are assigned
  - The valuation is the smallest path cost from the robot's current position to the target
- Similar to target clustering
- Advantage:
  - Simple to implement, and efficient in computation and communication
- Disadvantage:
  - The team performance can be highly suboptimal since it does not take any synergies between the targets into account

Summary
- Utilities and competitive …
- Voting mechanisms
- We discussed English, Dutch, First-Price Sealed-Bid, and Vickrey auctions
- Generalized second price auctions have shown good properties in practice; however, "truth telling" is not a dominant strategy
- Combinatorial auctions are a mechanism to allocate a number of goods to a number of agents
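
As a concrete illustration of the sequential single-item auction from the Multi-Robot Exploration part, the sketch below auctions targets one at a time. Each robot bids the increase in its smallest path cost, and the smallest bid wins. Several simplifications are assumptions made here and are not taken from the lecture: the terrain is an open plane with Euclidean distances and no blockages, the smallest path cost is found by trying every visiting order (only feasible for a handful of targets), and the class `SequentialAuction` and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sequential single-item auction for target allocation (illustrative sketch). */
final class SequentialAuction {

    static double dist(double[] p, double[] q) {
        return Math.hypot(p[0] - q[0], p[1] - q[1]);
    }

    /** Cost of the shortest open path from start through all targets (brute force). */
    static double minPathCost(double[] start, List<double[]> targets) {
        return search(start, targets, new boolean[targets.size()], 0);
    }

    private static double search(double[] from, List<double[]> targets,
                                 boolean[] visited, int count) {
        if (count == targets.size()) return 0.0;
        double best = Double.POSITIVE_INFINITY;
        for (int t = 0; t < targets.size(); t++) {
            if (visited[t]) continue;
            visited[t] = true;
            double cost = dist(from, targets.get(t))
                    + search(targets.get(t), targets, visited, count + 1);
            visited[t] = false;
            best = Math.min(best, cost);
        }
        return best;
    }

    /** Auctions the targets in the given order; result[t] is the index of the winning robot. */
    static int[] allocate(double[][] robots, double[][] targets) {
        List<List<double[]>> owned = new ArrayList<>();
        for (int r = 0; r < robots.length; r++) owned.add(new ArrayList<>());
        int[] winner = new int[targets.length];
        for (int t = 0; t < targets.length; t++) {
            int bestRobot = -1;
            double bestBid = Double.POSITIVE_INFINITY;
            for (int r = 0; r < robots.length; r++) {
                // Bid = increase in the robot's smallest path cost if it also wins this target.
                double before = minPathCost(robots[r], owned.get(r));
                List<double[]> extended = new ArrayList<>(owned.get(r));
                extended.add(targets[t]);
                double bid = minPathCost(robots[r], extended) - before;
                if (bid < bestBid) {
                    bestBid = bid;
                    bestRobot = r;
                }
            }
            owned.get(bestRobot).add(targets[t]);
            winner[t] = bestRobot;
        }
        return winner;
    }

    public static void main(String[] args) {
        double[][] robots = {{0, 0}, {10, 0}};
        double[][] targets = {{1, 0}, {2, 0}, {9, 0}};
        System.out.println(Arrays.toString(allocate(robots, targets)));
    }
}
```

In this small example the first robot wins the two targets next to it, because the second of them adds only one unit to a path it already has to travel, and the second robot wins the remaining target. This is the kind of synergy that a parallel single-item auction ignores.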
