Labs TDDD10 AI Programming

TDDD10 AI Programming
Multiagent Decision Making
Cyrille Berger

Labs
New map: Kobe2013-stations
ChangeSet contains all the properties, not just new ones
In the AbstractAgent class:
protected void processSense(KASense sense) {
    // Merge the sensed changes into the world model, then think
    model.merge(sense.getChangeSet());
    Collection<Command> heard = sense.getHearing();
    think(sense.getTime(), sense.getChangeSet(), heard);
}
You can override it:
@Override
protected void processSense(KASense sense) {
    // send update to other agents
    // using the world model before the merge
    super.processSense(sense);
}

Lectures
1 AI Programming: Introduction
2 Introduction to RoboRescue
3 Agents and Agents Architecture
4 Multi-Agent and Communication
5 Multi-Agent Decision Making
6 Cooperation And Coordination 1
7 Cooperation And Coordination 2
8 Machine Learning
9 Automated Planning
10 Putting It All Together

Lecture goals
Multi-agent decision making in a competitive environment
Learn about the concepts of utility, rational agents, voting and auctioning

Lecture content
Self-Interested Agents
Social Choice
Auctions
  Single Dimension Auctions
  Combinatorial Auctions

Self-Interested Agents

Utilities and Preferences
Assume we have just two agents: Ag = {i, j}
Agents are assumed to be self-interested: they have preferences over how the environment is
Assume Ω = {ω₁, ω₂, …} is the set of “outcomes” that agents have preferences over
We capture preferences by utility functions:
uᵢ : Ω → ℝ
uⱼ : Ω → ℝ
Utility functions lead to preference orderings over outcomes:
ω ⪰ᵢ ω′ means uᵢ(ω) ≥ uᵢ(ω′)
ω ≻ᵢ ω′ means uᵢ(ω) > uᵢ(ω′)

What is utility?
Utility is not money, but it is similar

Self-Interested Agents
If agents represent individuals or organizations, then we cannot make the benevolence assumption.
Agents will be assumed to act to further their own interests, possibly at the expense of others.
Potential for conflict.
May complicate the design task enormously.

Multiagent Encounters (1/2)
We need a model of the environment in which these agents will act…
  agents simultaneously choose an action to perform, and as a result of the actions they select, an outcome in Ω will result
  the actual outcome depends on the combination of actions
  assume each agent has just two possible actions that it can perform, C (“cooperate”) and D (“defect”)
Environment behavior is given by the state transformer function:
τ : Acⁱ × Acʲ → Ω

Multiagent Encounters (2/2)
Examples of a state transformer function
This environment is sensitive to the actions of both agents:
τ(D,D) = ω₁   τ(D,C) = ω₂   τ(C,D) = ω₃   τ(C,C) = ω₄
Neither agent has any influence in this environment:
τ(D,D) = ω₁   τ(D,C) = ω₁   τ(C,D) = ω₁   τ(C,C) = ω₁
This environment is controlled by j:
τ(D,D) = ω₁   τ(D,C) = ω₂   τ(C,D) = ω₁   τ(C,C) = ω₂

Coordination game
Suppose we have the case where both agents can influence the outcome, and they have utility functions as follows:
uᵢ(ω₁) = 2   uᵢ(ω₂) = 1   uᵢ(ω₃) = 3   uᵢ(ω₄) = 4
uⱼ(ω₁) = 2   uⱼ(ω₂) = 3   uⱼ(ω₃) = 1   uⱼ(ω₄) = 4
This environment is sensitive to the actions of both agents:
τ(D,D) = ω₁   τ(D,C) = ω₂   τ(C,D) = ω₃   τ(C,C) = ω₄
With a bit of abuse of notation:
uᵢ(D,D) = 2   uᵢ(D,C) = 1   uᵢ(C,D) = 3   uᵢ(C,C) = 4
uⱼ(D,D) = 2   uⱼ(D,C) = 3   uⱼ(C,D) = 1   uⱼ(C,C) = 4
Then agent i’s preferences are:
C,C ≻ᵢ C,D ≻ᵢ D,D ≻ᵢ D,C
“C” is the rational choice for i.
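
The coordination game above can be checked mechanically. The following is a minimal sketch, assuming the first action in each pair is agent i's; the class and method names are illustrative and not part of the lab code. It encodes τ and uᵢ as given on the slide, sorts the action profiles by uᵢ, and tests dominance using the "every outcome is preferred over every outcome" wording.

```java
import java.util.ArrayList;
import java.util.List;

class CoordinationGame {
    static final char[] ACTIONS = {'D', 'C'};
    // u_i for outcomes ω1..ω4 (array index 0..3), as given on the slide.
    static final int[] U_I = {2, 1, 3, 4};

    // State transformer τ: (D,D)->ω1, (D,C)->ω2, (C,D)->ω3, (C,C)->ω4 (0-based index).
    static int tau(char ai, char aj) {
        return (ai == 'C' ? 2 : 0) + (aj == 'C' ? 1 : 0);
    }

    static int utilityI(char ai, char aj) {
        return U_I[tau(ai, aj)];
    }

    // Action profiles "ai,aj" sorted from most to least preferred by agent i.
    static List<String> preferenceOrderOfI() {
        List<String> profiles = new ArrayList<>();
        for (char ai : ACTIONS)
            for (char aj : ACTIONS)
                profiles.add("" + ai + "," + aj);
        profiles.sort((p, q) -> utilityI(q.charAt(0), q.charAt(2))
                              - utilityI(p.charAt(0), p.charAt(2)));
        return profiles;
    }

    // a dominates b for i if the worst outcome of a still beats the best outcome of b.
    static boolean dominatesForI(char a, char b) {
        int worstA = Integer.MAX_VALUE, bestB = Integer.MIN_VALUE;
        for (char aj : ACTIONS) {
            worstA = Math.min(worstA, utilityI(a, aj));
            bestB = Math.max(bestB, utilityI(b, aj));
        }
        return worstA > bestB;
    }

    public static void main(String[] args) {
        System.out.println("i's preference order: " + preferenceOrderOfI());
        System.out.println("C dominates D for i: " + dominatesForI('C', 'D'));
    }
}
```

With the slide's numbers, cooperating yields 3 or 4 for i while defecting yields 1 or 2, which is why C is the rational choice.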

Payoff Matrices
We can characterize the previous scenario in a payoff matrix (each cell lists uᵢ, uⱼ):
                 i defects    i cooperates
j defects        2, 2         3, 1
j cooperates     1, 3         4, 4
Agent i is the column player
Agent j is the row player

Disgrace of Gijón (World Cup 1982)
One game left: Germany-Austria
uᵢ(≥3-0) = 2   uⱼ(≥3-0) = -1
uᵢ(2-0) = uᵢ(1-0) = 2   uⱼ(2-0) = uⱼ(1-0) = 1
uᵢ(a-a) = -1   uⱼ(a-a) = 2
uᵢ(0-a) = -1   uⱼ(0-a) = 2   (a > 1)
Final score: Germany 1-0 Austria

The Prisoner’s Dilemma
Two men are collectively charged with a crime and held in separate cells, with no way of meeting or communicating. They are told that:
  if one confesses and the other does not, the confessor will be freed, and the other will be jailed for three years
  if both confess, then each will be jailed for two years
Both prisoners know that if neither confesses, then they will each be jailed for one year

The Prisoner’s Dilemma
Payoff matrix for the prisoner’s dilemma (each cell lists uᵢ, uⱼ):
                 i defects    i cooperates
j defects        2, 2         1, 4
j cooperates     4, 1         3, 3
Top left: If both defect, then both get the punishment for mutual defection
Top right: If i cooperates and j defects, i gets the sucker’s payoff of 1, while j gets 4
Bottom left: If j cooperates and i defects, j gets the sucker’s payoff of 1, while i gets 4
Bottom right: Reward for mutual cooperation

Solution Concepts
How will a rational agent behave in any given scenario?
Answered in solution concepts:
  dominant strategy;
  Nash equilibrium strategy;
  Pareto optimal strategies;
  strategies that maximize social welfare.

Dominant Strategies (1/2)
Given any particular strategy (either C or D) of agent i, there will be a number of possible outcomes
We say s₁ dominates s₂ if every outcome possible by i playing s₁ is preferred over every outcome possible by i playing s₂
A rational agent will never play a dominated strategy
So in deciding what to do, we can delete dominated strategies
Unfortunately, there is not always a unique undominated strategy

Dominant Strategies (2/2)
Coordination game: [payoff matrix figure]
Prisoner's Dilemma: [payoff matrix figure]

(Pure Strategy) Nash Equilibrium (1/2)
In general, we will say that two strategies s₁ and s₂ are in Nash equilibrium if:
  under the assumption that agent i plays s₁, agent j can do no better than play s₂; and
  under the assumption that agent j plays s₂, agent i can do no better than play s₁.
Neither agent has any incentive to deviate from a Nash equilibrium
Unfortunately:
  Not every interaction scenario has a pure strategy Nash equilibrium
  Some interaction scenarios have more than one Nash equilibrium

(Pure Strategy) Nash Equilibrium (2/2)
Coordination game: [payoff matrix figure]
Prisoner's Dilemma: [payoff matrix figure]

Pareto Optimality (1/2)
An outcome is said to be Pareto optimal (or Pareto efficient) if there is no other outcome that makes one agent better off without making another agent worse off.
If an outcome is Pareto optimal, then at least one agent will be reluctant to move away from it (because this agent will be worse off).
If an outcome ω is not Pareto optimal, then there is another outcome ω′ that makes everyone as happy, if not happier, than ω.
“Reasonable” agents would agree to move to ω′ in this case. (Even if I don’t directly benefit from ω′, you can benefit without me suffering.)

Pareto Optimality (2/2)
Coordination game: [payoff matrix figure]
Prisoner's Dilemma: [payoff matrix figure]

Social Welfare (1/2)
The social welfare of an outcome ω is the sum of the utilities that each agent gets from ω:
sw(ω) = Σ_{i ∈ Ag} uᵢ(ω)
Think of it as the “total amount of utility in the system”.
As a solution concept, it may be appropriate when the whole system (all agents) has a single owner (then the overall benefit of the system is important, not that of individuals).

Social Welfare (2/2)
Coordination game: [payoff matrix figure]
Prisoner's Dilemma: [payoff matrix figure]

The Prisoner’s Dilemma
Solution concepts
  D is a dominant strategy.
  (D,D) is the only Nash equilibrium.
  All outcomes except (D,D) are Pareto optimal.
  (C,C) maximizes social welfare.
The individually rational action is to defect
This guarantees a payoff of no worse than 2, whereas cooperating only guarantees a payoff of no worse than 1. So defection is the best response to all possible strategies: both agents defect, and get payoff = 2
But intuition says this is not the best outcome:
Surely they should both cooperate and each get a payoff of 3!
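
The four solution concepts can be evaluated by brute force on the prisoner's dilemma payoffs. This is a minimal sketch with illustrative names, not lab code. One assumption to note: dominance is tested here in the usual game-theoretic sense (a strategy does better against each fixed action of the opponent), which is weaker than the "every outcome over every outcome" wording used earlier; under the stricter wording D would not dominate C, since defecting can yield 2 while cooperating can yield 3.

```java
class PrisonersDilemma {
    static final char[] ACTIONS = {'C', 'D'};

    // Utility of the agent playing `mine` against `other`:
    // mutual defection 2, mutual cooperation 3, sucker's payoff 1, temptation 4.
    static int payoff(char mine, char other) {
        if (mine == 'D') return other == 'D' ? 2 : 4;
        return other == 'D' ? 1 : 3;
    }

    // a dominates b if a does strictly better against every action of the opponent.
    static boolean dominates(char a, char b) {
        for (char other : ACTIONS)
            if (payoff(a, other) <= payoff(b, other)) return false;
        return true;
    }

    // (ai, aj) is a pure Nash equilibrium if no agent gains by deviating alone.
    static boolean isNash(char ai, char aj) {
        for (char alt : ACTIONS) {
            if (payoff(alt, aj) > payoff(ai, aj)) return false; // i deviates
            if (payoff(alt, ai) > payoff(aj, ai)) return false; // j deviates
        }
        return true;
    }

    // Pareto optimal: no other profile helps one agent without hurting the other.
    static boolean isParetoOptimal(char ai, char aj) {
        int ui = payoff(ai, aj), uj = payoff(aj, ai);
        for (char bi : ACTIONS)
            for (char bj : ACTIONS) {
                int vi = payoff(bi, bj), vj = payoff(bj, bi);
                if (vi >= ui && vj >= uj && (vi > ui || vj > uj)) return false;
            }
        return true;
    }

    // Social welfare: sum of both agents' utilities.
    static int socialWelfare(char ai, char aj) {
        return payoff(ai, aj) + payoff(aj, ai);
    }

    public static void main(String[] args) {
        System.out.println("D dominates C: " + dominates('D', 'C'));
        for (char ai : ACTIONS)
            for (char aj : ACTIONS)
                System.out.println("(" + ai + "," + aj + ") Nash=" + isNash(ai, aj)
                        + " Pareto=" + isParetoOptimal(ai, aj)
                        + " welfare=" + socialWelfare(ai, aj));
    }
}
```

The output shows the tension: the only Nash equilibrium, (D,D), is also the only outcome that is not Pareto optimal and has the lowest social welfare.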

The Prisoner’s Dilemma
This apparent paradox is the fundamental problem of multi-agent interactions.
It appears to imply that cooperation will not occur in societies of self-interested agents.
Real world examples:
  nuclear arms reduction (“why don’t I keep mine...”)
  free rider systems: public transport; television licenses.
Can we recover cooperation?

The Iterated Prisoner’s Dilemma
One answer: play the game more than once
If you know you will be meeting your opponent again, then the incentive to defect appears to evaporate
Cooperation is the rational choice in the infinitely repeated prisoner’s dilemma

Backwards Induction
But…, suppose you both know that you will play the game exactly n times
  On round n-1, you have an incentive to defect, to gain that extra bit of payoff…
  But this makes round n-2 the last “real” round, and so you have an incentive to defect there, too.
  This is the backwards induction problem.
When playing the prisoner’s dilemma with a fixed, finite, pre-determined, commonly known number of rounds, defection is the best strategy

Axelrod’s Tournament
Suppose you play the iterated prisoner’s dilemma against a range of opponents…
What strategy should you choose, so as to maximize your overall payoff?
Axelrod (1984) investigated this problem, with a computer tournament for programs playing the prisoner’s dilemma

Strategies in Axelrod’s Tournament
RANDOM
ALLD: “Always defect”, the hawk strategy;
TIT-FOR-TAT:
  On round u = 0, cooperate
  On round u > 0, do what your opponent did on round u - 1
TESTER:
  On 1st round, defect. If the opponent retaliated, then play TIT-FOR-TAT. Otherwise intersperse cooperation and defection.
JOSS:
  As TIT-FOR-TAT, except periodically defect

Axelrod’s Tournament results
TIT-FOR-TAT won the first tournament
A second tournament was called
TIT-FOR-TAT won the second tournament as well
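
To make the strategy definitions concrete, here is a small sketch of an iterated prisoner's dilemma match between TIT-FOR-TAT and ALLD. It reuses the payoffs from the earlier slides (2 for mutual defection, 3 for mutual cooperation, 1 and 4 otherwise); the `Strategy` interface and the match loop are illustrative assumptions, not Axelrod's tournament code.

```java
class IteratedPD {
    // A strategy maps the round number and the opponent's previous move to a move.
    interface Strategy {
        char move(int round, char opponentLast);
    }

    // TIT-FOR-TAT: cooperate on round 0, then copy the opponent's last move.
    static final Strategy TIT_FOR_TAT = (round, last) -> round == 0 ? 'C' : last;
    // ALLD: always defect.
    static final Strategy ALL_D = (round, last) -> 'D';

    static int payoff(char mine, char other) {
        if (mine == 'D') return other == 'D' ? 2 : 4;
        return other == 'D' ? 1 : 3;
    }

    // Plays `rounds` rounds and returns the total payoffs {a, b}.
    static int[] play(Strategy a, Strategy b, int rounds) {
        int[] total = new int[2];
        char lastA = 'C', lastB = 'C'; // placeholders, ignored on round 0
        for (int r = 0; r < rounds; r++) {
            char moveA = a.move(r, lastB);
            char moveB = b.move(r, lastA);
            total[0] += payoff(moveA, moveB);
            total[1] += payoff(moveB, moveA);
            lastA = moveA;
            lastB = moveB;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] tftVsTft = play(TIT_FOR_TAT, TIT_FOR_TAT, 10);
        int[] tftVsAllD = play(TIT_FOR_TAT, ALL_D, 10);
        System.out.println("TFT vs TFT:  " + tftVsTft[0] + " / " + tftVsTft[1]);
        System.out.println("TFT vs ALLD: " + tftVsAllD[0] + " / " + tftVsAllD[1]);
    }
}
```

In a single match ALLD scores slightly more than TIT-FOR-TAT (it exploits the first round), but two TIT-FOR-TAT players each earn far more than two defectors would, which is how the strategy can win a tournament on total payoff.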

Recipes for Success in Axelrod’s Tournament
Axelrod suggests the following rules for succeeding in his tournament:
Don’t be envious:
  Don’t play as if it were zero sum!
Be nice:
  Start by cooperating, and reciprocate cooperation
Retaliate appropriately:
  Always punish defection immediately, but use “measured” force; don’t overdo it
Don’t hold grudges:
  Always reciprocate cooperation immediately

Competitive and Zero-Sum Interactions
Where preferences of agents are diametrically opposed we have strictly competitive scenarios
Zero-sum encounters are those where utilities sum to zero:
uᵢ(ω) + uⱼ(ω) = 0 for all ω ∈ Ω
Zero sum implies strictly competitive
Zero sum encounters in real life are very rare, but people tend to act in many scenarios as if they were zero sum

Matching Pennies
Players i and j simultaneously choose the face of a coin, either “heads” or “tails”.
If they show the same face, then i wins, while if they show different faces, then j wins.

Mixed Strategies for Matching Pennies
No pair of strategies forms a pure strategy Nash equilibrium: whatever pair of strategies is chosen, somebody will wish they had done something else.
The solution is to allow mixed strategies:
  play “heads” with probability 0.5
  play “tails” with probability 0.5.
This is a Nash equilibrium strategy.

Mixed Strategies
A mixed strategy has the form
  play α₁ with probability p₁
  play α₂ with probability p₂
  ...
  play αₖ with probability pₖ
such that p₁ + p₂ + … + pₖ = 1.
Nash proved that every finite game has a Nash equilibrium in mixed strategies.
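
The matching pennies equilibrium can be verified numerically. The sketch below assumes a payoff of +1 for the winner and -1 for the loser (the slides only say who wins), with p and q the probabilities that i and j play heads. Because expected utility is linear in a player's own probability, it is enough to check the two pure deviations.

```java
class MatchingPennies {
    // Expected utility of i: +1 when the faces match, -1 otherwise (assumed payoffs).
    static double expectedUtilityI(double p, double q) {
        double same = p * q + (1 - p) * (1 - q);
        return same - (1 - same);
    }

    // The game is zero sum, so j's expected utility is the negation of i's.
    static double expectedUtilityJ(double p, double q) {
        return -expectedUtilityI(p, q);
    }

    // (p, q) is an equilibrium if neither player gains by switching to a pure strategy.
    static boolean isEquilibrium(double p, double q) {
        double eps = 1e-9;
        double bestForI = Math.max(expectedUtilityI(1, q), expectedUtilityI(0, q));
        double bestForJ = Math.max(expectedUtilityJ(p, 1), expectedUtilityJ(p, 0));
        return bestForI <= expectedUtilityI(p, q) + eps
            && bestForJ <= expectedUtilityJ(p, q) + eps;
    }

    public static void main(String[] args) {
        System.out.println("(0.5, 0.5) equilibrium: " + isEquilibrium(0.5, 0.5));
        System.out.println("(1, 1) equilibrium:     " + isEquilibrium(1, 1));
        System.out.println("(1, 0) equilibrium:     " + isEquilibrium(1, 0));
    }
}
```

Against an opponent who mixes 50/50, every strategy has expected utility 0, so nobody can gain by deviating; any pure profile leaves the loser wanting to switch.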

Social Choice

Social Choice
Social choice theory is concerned with group decision making.
Classic example of social choice theory: voting.
Formally, the issue is combining preferences to derive a social outcome.

Components of a Social Choice Model
Assume a set Ag = {1, …, n} of voters.
These are the entities who express preferences.
Voters make group decisions with respect to a set Ω = {ω₁, ω₂, …} of outcomes.
Think of these as the candidates.
If |Ω| = 2, we have a pairwise election.

Preferences
Each voter has preferences over Ω: an ordering over the set of possible outcomes Ω.
Example. Suppose:
Ω = {gin, rum, brandy, whisky}
then we might have agent i with preference order:
ωᵢ = (brandy, rum, gin, whisky)
meaning:
brandy ≻ᵢ rum ≻ᵢ gin ≻ᵢ whisky

Preference Aggregation
The fundamental problem of social choice theory:
Given a collection of preference orders, one for each voter, how do we combine these to derive a group decision that reflects as closely as possible the preferences of voters?
Variants of preference aggregation:
  social welfare functions;
  social choice functions.

Social Welfare Functions
Let Π(Ω) be the set of preference orderings over Ω.
A social welfare function takes the voter preferences and produces a social preference order:
f : Π(Ω) × … × Π(Ω) → Π(Ω)
We define ≻* as the outcome of a social welfare function
whisky ≻* gin ≻* brandy ≻* rum
S ≻* M ≻* SD ≻* MP ≻* C ≻* V ≻* FP ≻* KD ≻* FI ≻* PP

Social Choice Functions
Sometimes, we want just to select one of the possible candidates, rather than a social order.
This gives social choice functions:
f : Π(Ω) × … × Π(Ω) → Ω
Example: presidential election.

Voting Procedures: Plurality
Social choice function: selects a single outcome.
Each voter submits preferences.
Each candidate gets one point for every preference order that ranks them first.
The winner is the one with the largest number of points.
Example: political elections in the UK, France, USA...
If we have only two candidates, then plurality is a simple majority election.

Anomalies with Plurality
Suppose |Ag| = 100 and Ω = {ω₁, ω₂, ω₃} with:
  40% of voters voting for ω₁
  30% of voters voting for ω₂
  30% of voters voting for ω₃
With plurality, ω₁ gets selected even though a clear majority (60%) prefer another candidate!
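
Plurality counting is easy to sketch in code. In the example below the candidates are written `w1`, `w2`, `w3`, and the lower-ranked positions on each ballot are assumptions made only for illustration (the slide gives first choices only): the 60 voters who do not vote for `w1` are assumed to rank it last.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PluralityVote {
    // One point per ballot for the candidate ranked first on it.
    static Map<String, Integer> points(List<List<String>> ballots) {
        Map<String, Integer> points = new TreeMap<>();
        for (List<String> ballot : ballots) points.merge(ballot.get(0), 1, Integer::sum);
        return points;
    }

    // Candidate with the most points (ties go to the alphabetically first name).
    static String winner(List<List<String>> ballots) {
        String best = null;
        int bestPoints = -1;
        for (Map.Entry<String, Integer> e : points(ballots).entrySet())
            if (e.getValue() > bestPoints) { best = e.getKey(); bestPoints = e.getValue(); }
        return best;
    }

    // Number of voters who rank candidate a above candidate b.
    static int prefer(List<List<String>> ballots, String a, String b) {
        int count = 0;
        for (List<String> ballot : ballots)
            if (ballot.indexOf(a) < ballot.indexOf(b)) count++;
        return count;
    }

    static List<List<String>> copies(int n, String... order) {
        List<List<String>> out = new ArrayList<>();
        for (int k = 0; k < n; k++) out.add(Arrays.asList(order));
        return out;
    }

    // 100 voters; positions after the first choice are assumed for illustration.
    static List<List<String>> slideExample() {
        List<List<String>> ballots = new ArrayList<>();
        ballots.addAll(copies(40, "w1", "w2", "w3"));
        ballots.addAll(copies(30, "w2", "w3", "w1"));
        ballots.addAll(copies(30, "w3", "w2", "w1"));
        return ballots;
    }

    public static void main(String[] args) {
        List<List<String>> ballots = slideExample();
        System.out.println("Points: " + points(ballots));
        System.out.println("Plurality winner: " + winner(ballots));
        System.out.println("Voters preferring w2 over w1: " + prefer(ballots, "w2", "w1"));
    }
}
```

Under these assumed rankings `w1` wins with 40 points although 60 of the 100 voters would rather have `w2`.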

Strategic Manipulation by Tactical Voting
Suppose your preferences are
ω₁ ≻ ω₂ ≻ ω₃
while you believe 49% of voters have preferences
ω₂ ≻ ω₁ ≻ ω₃
and you believe 49% have preferences
ω₃ ≻ ω₂ ≻ ω₁
You may do better voting for ω₂, even though this is not your true preference profile.
This is tactical voting: an example of strategic manipulation of the vote.
Especially a problem in two-round elections

Condorcet’s Paradox
Suppose Ag = {1, 2, 3} and Ω = {ω₁, ω₂, ω₃} with:
ω₁ ≻₁ ω₂ ≻₁ ω₃
ω₂ ≻₂ ω₃ ≻₂ ω₁
ω₃ ≻₃ ω₁ ≻₃ ω₂
For every possible candidate, there is another candidate that is preferred by a majority of voters!
This is Condorcet’s paradox: there are situations in which, no matter which outcome we choose, a majority of voters will be unhappy with the outcome chosen.
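
The cycle in Condorcet's paradox shows up directly when the pairwise majorities are counted. A minimal sketch, with the three preference orders above written as lists (most preferred first) and illustrative method names:

```java
import java.util.List;

class CondorcetParadox {
    // The three voters' preference orders from the example, most preferred first.
    static final List<List<String>> BALLOTS = List.of(
            List.of("w1", "w2", "w3"),
            List.of("w2", "w3", "w1"),
            List.of("w3", "w1", "w2"));

    // Number of voters who rank a above b.
    static int prefer(List<List<String>> ballots, String a, String b) {
        int count = 0;
        for (List<String> ballot : ballots)
            if (ballot.indexOf(a) < ballot.indexOf(b)) count++;
        return count;
    }

    // A candidate that beats every other one by strict majority, or null if none exists.
    static String condorcetWinner(List<List<String>> ballots) {
        List<String> candidates = ballots.get(0);
        for (String a : candidates) {
            boolean beatsAll = true;
            for (String b : candidates)
                if (!a.equals(b) && 2 * prefer(ballots, a, b) <= ballots.size())
                    beatsAll = false;
            if (beatsAll) return a;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println("w1 over w2: " + prefer(BALLOTS, "w1", "w2") + " of 3");
        System.out.println("w2 over w3: " + prefer(BALLOTS, "w2", "w3") + " of 3");
        System.out.println("w3 over w1: " + prefer(BALLOTS, "w3", "w1") + " of 3");
        System.out.println("Condorcet winner: " + condorcetWinner(BALLOTS));
    }
}
```

Each pairwise contest is won 2 to 1, but the winners form a cycle, so no candidate beats all others.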

Applications of social choice theory
Main application is for human choice and decision making
Results aggregation
  aggregate the output of several search engines

Auctions

Application of auctions
With the rise of the Internet, auctions have become popular in many e-commerce applications (e.g. eBay)
Auctions are an efficient tool for reaching agreements in a society of self-interested agents
  For example, bandwidth allocation on a network, sponsored links
Auctions can be used for efficient resource allocation within decentralized computational systems
Frequently utilized for solving multi-agent and multi-robot coordination problems
  For example, team-based exploration of unknown terrain

What is an Auction?
An auction takes place between an agent known as the auctioneer and a collection of agents known as the bidders
The goal of the auction is for the auctioneer to allocate all goods to the bidders
The auctioneer desires to maximize the price and bidders desire to minimize the price

Limit Price
Each trader has a value or limit price that they place on the good.
  A buyer who exchanges more than their limit price for a good makes a loss.
  A seller who exchanges a good for less than their limit price makes a loss.
Limit prices clearly have an effect on the behavior of traders.
There are several models, embodying different assumptions about the nature of the good.

Limit Price
Private value
  The good has a value to me that is independent of what it is worth to you.
  The textbook gives the example of John Lennon’s last dollar bill.
Common value
  The good has the same value to all of us, but we have differing estimates of what it is.
  Winner’s curse
Correlated value
  Our values are related.
  The more you are prepared to pay, the more I should be prepared to pay.

Winner's curse
Term coined in the 1950s:
  Oil companies bid for drilling rights in the Gulf of …
  The problem was the bidding process given the uncertainties in estimating the potential value of an offshore oil field
  “Competitive bidding in high risk situations”, by Capen, Clapp and Campbell, Journal of Petroleum Technology, 1971
For example
  An oil field had an actual intrinsic value of $10 million
  Oil companies might guess its value to be anywhere from $5 million to $20 million
  The company who wrongly estimated at $20 million and placed a bid at that level would win the auction, and later find that it was not worth that much
In many cases the winner is the person who has overestimated the most ⇒ “The Winner’s curse”
Bid shading: Offer a bid a certain amount below the valuation

Auction Characteristics
Auction procedure
  One shot: Only one bidding round
  Ascending: Auctioneer begins at minimum price, bidders increase bids
  Descending: Auctioneer begins at a price over the value of the good and lowers the price at each round
  Continuous: Internet
Auctions may be
  Standard Auction: One seller and multiple buyers
  Reverse Auction: One buyer and multiple sellers
  Double Auction: Multiple sellers and multiple buyers
Combinatorial Auctions
  Buyers and sellers may have combinatorial valuations for bundles of goods

Single versus Multi-dimensional
Single dimensional auctions
  The only content of an offer is the price and quantity of some specific type of good.
  “I’ll bid $200 for those 2 chairs”
Multi dimensional auctions
  Offers can relate to many different aspects of many different goods.
  “I’m prepared to pay $200 for those two red chairs, but $300 if you can deliver them tomorrow.”
  Frequency ranges for cell phones

Single Dimension Auctions

English Auction
An example of first-price open-cry ascending auctions
Protocol:
  Auctioneer starts by offering the good at a low price
  Auctioneer offers higher prices until no agent is willing to pay the proposed level
  The good is allocated to the agent that made the highest offer
Properties
  Generates competition between bidders (generates revenue for the seller when bidders are uncertain of their valuation)
  Dominant strategy: Bid slightly more than the current bid, withdraw if the bid reaches your personal valuation of the good
  Winner’s curse (for common value goods)

Dutch Auction
Dutch auctions are examples of first-price open-cry descending auctions
Protocol:
  Auctioneer starts by offering the good at an artificially high value
  Auctioneer lowers the offer price until some agent makes a bid equal to the current offer price
  The good is then allocated to the agent that made the offer
Properties
  Items are sold rapidly (can sell many lots within a single day)
  Intuitive strategy: wait for a little bit after your true valuation has been called and hope no one else gets in there before you (no general dominant strategy)
  Winner’s curse also possible

First-Price Sealed-Bid Auctions
First-price sealed-bid auctions are one-shot auctions:
Protocol:
  Within a single round bidders submit a sealed bid for the good
  The good is allocated to the agent that made the highest bid
  Winner pays the price of the highest bid
  Often used in commercial auctions, e.g., public building contracts etc.
Problem: the difference between the highest and second highest bid is “wasted money” (the winner could have offered less)
Intuitive strategy: bid a little bit less than your true valuation (no general dominant strategy)
  The more bidders there are, the smaller the deviation should be!

Vickrey Auctions
Proposed by William Vickrey in 1961 (Nobel Prize in Economic Sciences in 1996)
Vickrey auctions are examples of second-price sealed-bid one-shot auctions
Protocol:
  within a single round bidders submit a sealed bid for the good
  the good is allocated to the agent that made the highest bid
  the winner pays the price of the second highest bid
Dominant strategy: bid your true valuation
  if you bid more, you risk paying too much
  if you bid less, you lower your chances of winning while still having to pay the same price in case you win
Antisocial behavior: bid more than your true valuation to make opponents suffer (not “rational”)
For private value auctions, strategically equivalent to the English auction mechanism
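
The difference between the two sealed-bid protocols, and why truthful bidding is safe in a Vickrey auction, can be seen in a few lines. This is a sketch under simple assumptions: integer bids, at least two bidders, ties going to the bidder with the lowest index, and illustrative names throughout.

```java
class SealedBidAuction {
    // Returns {winner index, price paid}. With secondPrice = true this is a Vickrey auction.
    static int[] run(int[] bids, boolean secondPrice) {
        int winner = 0;
        for (int k = 1; k < bids.length; k++)
            if (bids[k] > bids[winner]) winner = k;
        int second = Integer.MIN_VALUE;
        for (int k = 0; k < bids.length; k++)
            if (k != winner) second = Math.max(second, bids[k]);
        return new int[] {winner, secondPrice ? second : bids[winner]};
    }

    // Utility of a bidder with the given true valuation who bids myBid in a Vickrey
    // auction against fixed other bids: valuation minus price if it wins, else 0.
    static int vickreyUtility(int valuation, int myBid, int[] otherBids) {
        int[] bids = new int[otherBids.length + 1];
        bids[0] = myBid;
        System.arraycopy(otherBids, 0, bids, 1, otherBids.length);
        int[] result = run(bids, true);
        return result[0] == 0 ? valuation - result[1] : 0;
    }

    public static void main(String[] args) {
        int[] bids = {10, 7, 4};
        System.out.println("First price pays:  " + run(bids, false)[1]);
        System.out.println("Second price pays: " + run(bids, true)[1]);
        // True valuation 10: shading or inflating the bid never beats bidding 10.
        System.out.println("Truthful: " + vickreyUtility(10, 10, new int[] {7, 4}));
        System.out.println("Underbid: " + vickreyUtility(10, 6, new int[] {7, 4}));
        System.out.println("Overbid vs 12: " + vickreyUtility(10, 13, new int[] {12, 4}));
    }
}
```

Underbidding only risks losing an auction that would have been profitable, and overbidding only risks winning at a price above the true valuation, which matches the two bullet points under the dominant strategy.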

Generalized first price auctions
Used by Yahoo for “sponsored links” auctions
Introduced in 1997 for selling Internet advertising by Yahoo/Overture (before there were only “banner ads”)
  Advertisers submit a bid reporting the willingness to pay on a per-click basis for a particular keyword
  Cost-Per-Click (CPC) bid
  Advertisers were billed for each “click” on sponsored links leading to their page
  The links were arranged in descending order of bids, making the highest bids the most prominent
  Auctions take place during each search
However, the auction mechanism turned out to be unstable!
  Bidders revised their bids as often as possible

Generalized second price auctions
Introduced by Google for pricing sponsored links (AdWords Select)
Observation: Bidders generally do not want to pay much more than the rank below them
Therefore: 2nd price auction
Further modifications:
  Advertisers bid for keywords and keyword combinations
  Rank: CPC_BID × quality score
  Price: with respect to lower ranks
  http://www.chipkin.com/googleadwords-actual-cpc-calculation/
After seeing Google’s success, Yahoo also switched to second price auctions in 2002

Combinatorial Auctions

Combinatorial Auctions
In a combinatorial auction, the auctioneer puts several goods on sale and the other agents submit bids for entire bundles of goods
Given a set of bids, the winner determination problem is the problem of deciding which of the bids to accept
  The solution must be feasible (no good may be allocated to more than one agent)
  Ideally, it should also be optimal (in the sense of maximizing revenue for the auctioneer)
A challenging algorithmic problem

Complements and Substitutes
The value an agent assigns to a bundle of goods may depend on the combination
Complements: The value assigned to a set is greater than the sum of the values assigned to its elements
  Example: “a pair of shoes” (a left shoe and a right shoe)
Substitutes: The value assigned to a set is lower than the sum of the values assigned to its elements
  Example: a ticket to the theatre and another one to a football match for the same night
In such cases an auction mechanism allocating one item at a time is problematic, since the best bidding strategy in one auction may depend on the outcome of other auctions

Protocol
One auctioneer, several bidders, and many items to be sold
Each bidder submits a number of package bids specifying the valuation (price) the bidder is prepared to pay for a particular bundle
The auctioneer announces a number of winning bids
The winning bids determine which bidder obtains which item, and how much each bidder has to pay
No item may be allocated to more than one bidder
Examples of package bids:
  Agent 1: ({a,b}, 5), ({b,c}, 7), ({c,d}, 6)
  Agent 2: ({a,d}, 7), ({a,c,d}, 8)
  Agent 3: ({b}, 5), ({a,b,c,d}, 12)
Generally, there are 2ⁿ − 1 non-empty bundles for n items; how to compute the optimal solution?

Optimal Winner Determination Algorithm
An auctioneer has a set of items M = {1, 2, …, m} to sell
There are N = {1, 2, …, n} buyers placing bids
Buyers submit a set of package bids B = {B₁, B₂, …, Bₙ}
A package bid is a tuple B = [S, vᵢ(S)], where S ⊆ M is a set of items (bundle) and vᵢ(S) > 0 is buyer i’s true valuation
x_{S,i} ∈ {0, 1} is a decision variable for assigning bundle S to buyer i
The winner determination problem (WDP) is to label the bids as winning or losing (by deciding each x_{S,i} so as to maximize the sum of the total accepted bid price)
This is NP-complete! It can be solved with an integer program solver, or heuristic search

Solving WDPs by Heuristic Search
Two ways of representing the state
Branch-on-items:
  A state is a set of items for which an allocation decision has already been made
  Branching is carried out by adding a further item
Branch-on-bids:
  A state is a set of bids for which an acceptance decision has already been made
  Branching is carried out by adding a further bid
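
For a handful of bids the winner determination problem can be solved by exhaustive branch-on-bids search. The sketch below runs it on the package bids of Agents 1 to 3 from the Protocol slide. It is illustrative only: bundles are encoded as bitmasks over the items a to d, bids are treated as independent (no limit on how many bids one agent may win, which is an assumption), and no heuristic pruning is applied, so the running time is exponential in the number of bids.

```java
class WinnerDetermination {
    // Encodes a bundle such as "acd" as a bitmask: a -> bit 0, b -> bit 1, ...
    static int bundle(String items) {
        int mask = 0;
        for (char c : items.toCharArray()) mask |= 1 << (c - 'a');
        return mask;
    }

    // Package bids from the example: Agent 1, Agent 2, Agent 3 (in that order).
    static final int[] BUNDLES = {
        bundle("ab"), bundle("bc"), bundle("cd"),
        bundle("ad"), bundle("acd"),
        bundle("b"), bundle("abcd")
    };
    static final int[] PRICES = {5, 7, 6, 7, 8, 5, 12};

    // Branch on bid k: reject it (OUT), or accept it (IN) if it shares no item
    // with the bids accepted so far. Returns the best revenue from bids k onwards.
    static int best(int k, int soldItems) {
        if (k == BUNDLES.length) return 0;
        int out = best(k + 1, soldItems);
        if ((BUNDLES[k] & soldItems) != 0) return out; // infeasible to accept
        int in = PRICES[k] + best(k + 1, soldItems | BUNDLES[k]);
        return Math.max(in, out);
    }

    static int maxRevenue() {
        return best(0, 0);
    }

    public static void main(String[] args) {
        System.out.println("Maximum revenue: " + maxRevenue());
    }
}
```

For these bids the best feasible combination is ({b,c}, 7) together with ({a,d}, 7), which beats the single bid of 12 on the whole set.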

Branch-on-Items
Branching is based on the question: “What bid should this item be assigned to?”
Each path in the search tree consists of a sequence of disjoint bids
  Bids that do not share items with each other
A path ends when no bid can be added to it
Costs at each node are the sum of the prices of the bids accepted on the path

Problem with branch-on-items
What if the auctioneer's revenue can increase by keeping items?
Example:
  There is no bid for 1,
  $5 bid for 2,
  $3 bid for {1, 2}
Thus, it is better to keep 1 and sell 2 than to sell both
The auctioneer's possibility of keeping items can be implemented by placing dummy bids of price zero on those items that received no 1-item bids (Sandholm 2002)

Example of branch-on-items
Bids: {1,2}, {2,3}, {3}, {1,3}
We add dummy bids: {1}, {2}

Branch-on-bids
Branching is based on the question: “Should this bid be accepted or rejected?”
Binary tree
When branching on a bid, the children in the search tree are the world where that bid is accepted (IN), and the world where that bid is rejected (OUT)
No dummy bids are needed
First a bid graph is constructed that represents all constraints between the bids
Then, bids are accepted/rejected until all bids have been handled
  On accept: remove all constrained bids from the graph
  On reject: remove the bid itself from the graph

Branch-on-bids - Example
Bids: {1,2}, {2,3}, {3}, {1,3}

Heuristic Function
For any node N in the search tree, let g(N) be the revenue generated by the bids that were accepted up to N
The heuristic function h(N) estimates for every node N how much additional revenue can be expected on going from N
An upper bound on h(N) is given by the sum over the maximum contribution of the set of unallocated items A:
Σ_{i ∈ A} c(i), where c(i) = max over bids S containing i of v(S) / |S|
Tighter bounds can be obtained by solving the linear program relaxation of the remaining items (Sandholm 2006)

Auctions for Multi-Robot Exploration

Multi-Robot Exploration
Consider a team of mobile robots that has to visit a number of given targets (locations) in initially partially unknown terrain
Examples of such tasks are cleaning missions, space exploration, surveillance, and search and rescue
Continuous re-allocation of targets to robots is necessary
  For example, robots might discover that they are separated by a blockage from their target
To allocate and re-allocate the targets among themselves, the robots can use auctions where they sell and buy targets
The team objective can be to minimize the sum of all path costs; hence, bidding prices are estimated travel costs
The path cost of a robot is the sum of the edge costs along its path, from its current location to the last target that it visits
Figure: Three robots exploring Mars. The robots’ task is to gather data around the four craters, e.g. to visit the highlighted target sites. Source: N. Kalra

General Exploration
Robots always follow a minimum cost path that visits all allocated targets
Whenever a robot gains more information about the terrain, it shares this information with the other robots
If the remaining path of at least one robot is blocked, then all robots put their unvisited targets up for auction
The auction(s) close after a predetermined amount of time
  Constraints: each robot wins at most one bundle and each target is contained in exactly one bundle
After each auction, robots have gained new targets or exchanged targets with other robots
Then, the cycle repeats

Single-Round Combinatorial Auction
Protocol:
  Every robot bids on all possible bundles of targets
  The valuation is the estimated smallest path cost needed to visit all targets in the bundle (TSP)
  A central auctioneer determines and informs the winning robots within one round
Optimal team performance:
  Combinatorial auctions take all positive and negative synergies between targets into account
  Minimization of the total path costs
Drawbacks:
  Robots cannot bid on all possible bundles of targets because the number of possible bundles is exponential in the number of targets
  Calculating the cost of each bundle requires calculating the smallest path cost for visiting a set of targets (Traveling Salesman Problem)
  Winner determination is NP-hard

Parallel Single-Item Auctions
Protocol:
  Every robot bids on each target in parallel until all targets are assigned
  The valuation is the smallest path cost from the robot’s current position to the target
  Similar to Target Clustering
Advantage:
  Simple to implement, and computation and communication efficient
Disadvantage:
  The team performance can be highly suboptimal since it does not take any synergies between the targets into account

Sequential Single-Item Auctions
Protocol:
  Targets are auctioned one after another in the sequence T1, T2, T3, T4, …
  The valuation is the increase in the robot’s smallest path cost that results from winning the auctioned target
  The robot with the overall smallest bid is allocated the corresponding target
  Finally, each robot calculates the minimum-cost path for visiting all of its targets and moves along this path
Advantages:
  Hill climbing search: some synergies between targets are taken into account (but not all of them)
  Simple to implement, and computation and communication efficient
  Since robots can determine the winners by listening to the bids (and identifying the smallest bid), the method can be executed in a decentralized way
Disadvantages:
  The order of the targets changes the result
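
The difference between parallel and sequential single-item auctions can be illustrated with a toy world. The sketch below assumes, purely for illustration, that robots and targets lie on a straight line, so the smallest path cost for a set of targets has a closed form and no TSP solver is needed. Robot positions, target positions and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SingleItemAuctions {
    // Smallest path cost for a robot at `start` to visit all targets on a line:
    // walk to the nearer end of the target interval, then sweep to the other end.
    static double pathCost(double start, List<Double> targets) {
        if (targets.isEmpty()) return 0.0;
        double lo = Collections.min(targets);
        double hi = Collections.max(targets);
        return (hi - lo) + Math.min(Math.abs(start - lo), Math.abs(start - hi));
    }

    static List<List<Double>> emptyAllocation(int robots) {
        List<List<Double>> won = new ArrayList<>();
        for (int r = 0; r < robots; r++) won.add(new ArrayList<>());
        return won;
    }

    // Sequential auction: each bid is the increase in the robot's smallest path cost.
    static List<List<Double>> sequential(double[] robots, double[] targets) {
        List<List<Double>> won = emptyAllocation(robots.length);
        for (double t : targets) {
            int winner = 0;
            double bestBid = Double.POSITIVE_INFINITY;
            for (int r = 0; r < robots.length; r++) {
                List<Double> extended = new ArrayList<>(won.get(r));
                extended.add(t);
                double bid = pathCost(robots[r], extended) - pathCost(robots[r], won.get(r));
                if (bid < bestBid) { bestBid = bid; winner = r; }
            }
            won.get(winner).add(t);
        }
        return won;
    }

    // Parallel auction: each bid is the path cost from the robot to that target alone.
    static List<List<Double>> parallel(double[] robots, double[] targets) {
        List<List<Double>> won = emptyAllocation(robots.length);
        for (double t : targets) {
            int winner = 0;
            for (int r = 1; r < robots.length; r++)
                if (Math.abs(robots[r] - t) < Math.abs(robots[winner] - t)) winner = r;
            won.get(winner).add(t);
        }
        return won;
    }

    // Team objective: sum of all robots' smallest path costs.
    static double teamCost(double[] robots, List<List<Double>> won) {
        double sum = 0.0;
        for (int r = 0; r < robots.length; r++) sum += pathCost(robots[r], won.get(r));
        return sum;
    }

    public static void main(String[] args) {
        double[] robots = {0, 10};
        double[] targets = {4, 6, 5};
        System.out.println("Sequential: " + sequential(robots, targets)
                + " cost " + teamCost(robots, sequential(robots, targets)));
        System.out.println("Parallel:   " + parallel(robots, targets)
                + " cost " + teamCost(robots, parallel(robots, targets)));
    }
}
```

In this example the sequential auction gives all three targets to the robot at 0, because each extra target adds little to a path it already has to travel, while the parallel auction splits the cluster between the robots and ends up with a higher team cost.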

Summary
Utilities and competitive …
Voting mechanism
We discussed English, Dutch, First-Price Sealed-Bid, and Vickrey auctions
Generalized second price auctions have shown good properties in practice; however, “truth telling” is not a dominant strategy
Combinatorial auctions are a mechanism to allocate a number of goods to a number of agents