TDDD10 AI Programming
Lecture: Automated Planning
Cyrille Berger

Planning slides borrowed from Dana Nau: http://www.cs.umd.edu/~nau/planning/slides/

Course lectures:
1. AI Programming: Introduction
2. Introduction to …
3. Agents and Agents …
4. Multi-Agent and …
5. Multi-Agent Decision …
6. Cooperation And Coordination
7. Cooperation And Coordination
8. Machine Learning
9. Automated Planning
10. Putting It All …

Lecture content
- Automated Planning
  - What is planning?
  - Types of planners
    - Domain-dependent planners
    - Configurable planners
- Reinforcement Learning
- Individual and Group Assignment

Automated Planning

What is planning?

Dictionary definitions of "plan"
1. A scheme, program, or method worked out beforehand for the accomplishment of an objective: a plan of attack.
2. A proposed or tentative project or course of action: had no plans for the evening.
3. A systematic arrangement of elements or important parts; a configuration or outline: a seating plan; the plan of a story.
4. A drawing or diagram made to scale showing the structure or arrangement of something.
5. A program or policy stipulating a service or benefit: a pension plan.

AI definition of plan
- The real world is absurdly complex; we need to approximate.
- "[A representation] of future behavior (...) usually a set of actions, with temporal and other constraints on them, for execution by some agent or agents."
  – Austin Tate, MIT Encyclopedia of the Cognitive Sciences, 1999
- Only represent what the planner needs to reason about.

State transition system
- A state transition system is Σ = (S, A, E, γ)
- S = {abstract states}
  - e.g., states might include a robot's location, but not its exact position and orientation
- A = {abstract actions}
  - e.g., "move robot from loc2 to loc1" may need a complex lower-level implementation
- E = {abstract exogenous events}
  - not under the agent's control
- γ = state transition function
  - gives the next state, or the possible next states, after an action or event
  - γ: S × (A ∪ E) → S, or γ: S × (A ∪ E) → {s₁, …, sₙ} when an action or event can have several possible outcomes

From plan to execution (figure omitted)

Example
- States S = {s₀, …, s₅}
- A = {move1, move2, put, take, load, unload}
- E = ∅
- γ: S × A → S, defined by the transition diagram on the slide (figure omitted)

Planning problem
- Description of Σ = (S, A, E, γ)
- Initial state or set of states
- Objective: a goal state, a set of goal states, a set of tasks, a "trajectory" of states, an objective function, …
- Example: initial state = s₀, goal state = s₅

Plan
- Classical plan: a sequence of actions, e.g. 〈take, move1, load, move2〉
- Policy: a partial function from S into A, e.g.
  {(s₀, take), (s₁, move1), (s₃, load), (s₄, move2)} or
  {(s₀, move1), (s₂, take), (s₃, load), (s₄, move2)}
- Both, if executed starting at s₀, produce s₅
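To make the Σ = (S, A, E, γ) notation concrete, here is a minimal Python sketch of the container-loading example above. The γ table is an assumption: only the transitions needed by the two example plans are filled in, since the full transition diagram appears only as a figure in the original slides.

# Minimal sketch of a state-transition system Sigma = (S, A, gamma) and of
# classical plan execution. The transition table is an assumption: it is only
# constrained to be consistent with the example plans shown above.

S = {"s0", "s1", "s2", "s3", "s4", "s5"}
A = {"move1", "move2", "put", "take", "load", "unload"}

# gamma: S x A -> S (partial); only the transitions used by the example plans.
gamma = {
    ("s0", "take"): "s1",
    ("s1", "move1"): "s3",
    ("s0", "move1"): "s2",
    ("s2", "take"): "s3",
    ("s3", "load"): "s4",
    ("s4", "move2"): "s5",
}

def execute(state, plan):
    """Apply a classical plan (a sequence of actions); return the final state,
    or None if some action is not applicable in the current state."""
    for action in plan:
        if (state, action) not in gamma:
            return None
        state = gamma[(state, action)]
    return state

print(execute("s0", ["take", "move1", "load", "move2"]))   # -> s5
print(execute("s0", ["move1", "take", "load", "move2"]))   # -> s5

Both example plans indeed end in s₅, matching the two policies listed on the slide.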
Planning and scheduling
- Scheduling
  - Decide when and how to perform a given set of actions
  - Time and resource constraints and priorities
  - e.g. the scheduler of your kernel, or of a printer
  - NP-complete
- Planning
  - Decide what actions to use to achieve some set of objectives
  - Can be much worse than NP-complete; the worst case is undecidable

Applications
- Robotics: sequences of actions, path and motion planning
- Industrial applications
  - Production machines: sheet-metal bending
- Forest firefighting
- Package delivery
- Games: playing bridge, chess, ...

Types of planners
- Domain-specific: made or tuned for a specific planning domain; won't work well (if at all) in other planning domains
- Domain-independent: in principle, works in any planning domain; in practice, needs restrictions on what kind of planning domain
- Configurable: a domain-independent planning engine whose input includes information about how to solve problems in some domain

Domain-specific planners
- Most successful real-world planning systems work this way
  - Mars exploration, sheet-metal bending, playing bridge, …
- Often use problem-specific techniques that are difficult to generalize to other planning domains

Domain-specific planner: example
- Bridge: after the cards are dealt, players make bids and prepare a plan that they will follow during the game
- Remove states depending on the cards you are holding
  - for instance, North will not choose "hearts" as the trump colour

Restrictive assumptions
- A0: Finite system: finitely many states, actions, events
- A1: Fully observable: the controller always knows Σ's current state
- A2: Deterministic: each action has only one outcome
- A3: Static (no exogenous events): no changes but the controller's actions
- A4: Attainment goals: a set of goal states Sg
- A5: Sequential plans: a plan is a linearly ordered sequence of actions (a1, a2, ..., an)
- A6: Implicit time: no time durations; a linear sequence of instantaneous states
- A7: Off-line planning: the planner doesn't know the execution status

Domain-independent planners
- In principle, work in any planning domain
- No domain-specific knowledge except the description of the system Σ
- In practice, it is not feasible to make domain-independent planners work well in all possible planning domains
- Make simplifying assumptions to restrict the set of domains
- Classical planning: the historical focus of most research on automated planning

Classical planning
- Classical planning requires all eight restrictive assumptions
- Offline generation of action sequences for a deterministic, static, finite system, with complete knowledge, attainment goals, and implicit time
- Reduces to the following problem:
  - Given a planning problem P = (Σ, s0, Sg)
  - Find a sequence of actions (a1, a2, ..., an) that produces a sequence of state transitions (s1, s2, ..., sn) such that sn is in Sg
- This is just path-searching in a graph
  - Nodes = states
  - Edges = actions
- Is this trivial?
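Since classical planning under assumptions A0–A7 reduces to path search in the state-transition graph, a breadth-first forward search is already enough for tiny instances. The sketch below reuses the hypothetical γ table from the earlier example; it is only an illustration of "planning as graph search", not of the algorithms discussed next.

# Classical planning as path search: a minimal breadth-first forward search
# over the state-transition graph. The gamma table is the same assumed toy
# table as in the earlier sketch (the full diagram is a figure in the slides).

from collections import deque

gamma = {
    ("s0", "take"): "s1",  ("s1", "move1"): "s3",
    ("s0", "move1"): "s2", ("s2", "take"): "s3",
    ("s3", "load"): "s4",  ("s4", "move2"): "s5",
}

def forward_search(s0, goal_states):
    """Return a sequence of actions (a1, ..., an) whose execution from s0
    ends in a goal state, or None if no such plan exists."""
    frontier = deque([(s0, [])])
    visited = {s0}
    while frontier:
        state, plan = frontier.popleft()
        if state in goal_states:
            return plan
        for (s, action), s_next in gamma.items():
            if s == state and s_next not in visited:
                visited.add(s_next)
                frontier.append((s_next, plan + [action]))
    return None

print(forward_search("s0", {"s5"}))   # e.g. ['take', 'move1', 'load', 'move2']

The reason this is not trivial in general is exactly the state-count argument on the next slide: the graph is far too large to enumerate explicitly.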
Domain-dependent planners

Classical planning: how hard is it?
- Generalize the earlier example:
  - 5 locations, 3 robot vehicles, 100 containers, 3 pallets to stack containers on
  - This is probably just a single boat...
- Then there are 10²⁷⁷ states
  - The number of particles in the universe is only about 10⁸⁷
  - The example is more than 10¹⁹⁰ times as large
- Automated-planning research has been heavily dominated by classical planning
  - Dozens (hundreds?) of different algorithms

Plan-space planning
- Decompose the set of goals into the individual goals and plan for them
- Keep bookkeeping information to detect and resolve interactions
- Produce a partially ordered plan that retains as much flexibility as possible
- The Mars rovers used a temporal-planning extension of this

Planning graphs
- Rough idea:
  - First, solve a relaxed version of the problem: each "level" contains all effects of all applicable actions, even though the effects may contradict each other
  - Next, do a state-space search within the planning graph
- Graphplan, IPP, CGP, DGP, LGP, PGP, SGP, TGP, ...

Heuristic search
- Heuristic functions like those in A*, created using techniques similar to planning graphs
- Problem: A* quickly runs out of memory, so do a greedy search instead
- Greedy search can get trapped in local minima
  - so use greedy search plus local search at local minima
- HSP [Bonet & Geffner], Fast Forward [Hoffmann]

Configurable planners

- In any fixed planning domain, a domain-independent planner usually will not work as well as a domain-specific planner made specifically for that domain
  - A domain-specific planner may be able to go directly toward a solution in situations where a domain-independent planner would explore many alternative paths
- But we don't want to write a whole new planner for every domain
- Configurable planners:
  - A domain-independent planning engine
  - Input includes information about how to solve problems in the domain
- Generally this means one can write a planning engine with fewer restrictions than domain-independent planners
- Hierarchical Task Network (HTN) planning
- Planning with control formulas

HTN planning (1/2)
- Problem reduction: tasks (activities) rather than goals
- Methods to decompose tasks into subtasks
- Enforce constraints, backtrack if necessary
  - e.g., a taxi is not a good choice for long distances
- Real-world applications: Noah, Nonlin, O-Plan, SIPE, SIPE-2, SHOP, SHOP2

Planning with control formulas
- At each state s, we have a control formula written in temporal logic
  - e.g., "never pick up x unless x needs to go on top of something else"
- For each successor of s, derive a control formula using logical progression
- Prune any successor state in which the progressed formula is false
- TLPlan, TALplanner, ...

Forward and backward search
- In state-space planning, one must choose whether to search forward or backward

HTN planning (2/2)
- In HTN planning, there are two choices to make about direction: forward or backward, and up or down

Limitation of ordered-task planning
- Problem of total order: a totally ordered plan can be more constrained than necessary (figure omitted)
- Solved with partial-order methods

Planning in an uncertain world
- Until now, we have assumed that each action has only one possible outcome
- But often that's unrealistic: in many situations, actions may have more than one possible outcome
  - Action failures, e.g., the gripper drops its load
  - Exogenous events, e.g., a road is closed
- We would like to be able to plan in such situations
- One approach: Markov Decision Processes

Automated planning: summary
- Domain-specific: write an entire computer program, which is a lot of work, but allows lots of domain-specific performance improvements
- Domain-independent: just give it the basic actions, which is not much work, but not very efficient

Reinforcement Learning

Reinforcement learning definition
- Agents are given sensory inputs: states s ∊ S and a reward r ∊ ℝ
- At each step, the agent selects an output: an action a ∊ A

Naïve approach
- Use supervised learning to learn a function f(s, a)
- For any input state, pick the best action: a = argmaxₐ∊A f(s, a)
- Will that work?

Markov Decision Process (1/2)
- The agent needs to think ahead: it needs a good sequence of actions
- Formalized in the Markov Decision Process framework!

Markov Decision Process (2/2)
- A finite set of states S, a finite set of actions A
- At each discrete time step, the agent observes the state sₜ ∊ S, chooses an action aₜ ∊ A, and receives an immediate reward rₜ
- The state changes to sₜ₊₁ ∊ S
- Markov assumption: sₜ₊₁ = δ(sₜ, aₜ) and rₜ = r(sₜ, aₜ)
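As a concrete instance of the MDP framework just defined, the sketch below sets up a toy deterministic "corridor" world with a transition function δ(s, a) and reward function r(s, a), and runs one episode of the observe–act–reward loop. The corridor domain and the placeholder random policy are assumptions for illustration, not part of the lecture material.

# Minimal sketch of the MDP framework: finite S and A, a deterministic
# transition function delta(s, a) and a reward function r(s, a).
# The corridor domain below is an invented toy example.

import random

S = [0, 1, 2, 3, 4]          # states: positions in a corridor, 4 is the goal
A = ["left", "right"]

def delta(s, a):
    """Deterministic state transition: s_{t+1} = delta(s_t, a_t)."""
    return min(s + 1, 4) if a == "right" else max(s - 1, 0)

def r(s, a):
    """Immediate reward r_t = r(s_t, a_t): 1 when the step reaches the goal."""
    return 1.0 if delta(s, a) == 4 else 0.0

# One episode of the agent/environment loop: observe s_t, choose a_t,
# receive r_t, move to s_{t+1}.
s = 0
for t in range(10):
    a = random.choice(A)      # placeholder policy; pi(s) is defined on the next slides
    print(t, s, a, r(s, a))
    s = delta(s, a)

The random action choice is only a stand-in: the following slides introduce the policy function π(s) that should replace it.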
Policy function
- The policy function decides which action to take in each state: aₜ = π(sₜ)
- The policy function is what we want to learn

Rewards
- To think ahead, an agent looks at future rewards: r(sₜ₊₁, aₜ₊₁), r(sₜ₊₂, aₜ₊₂), ...
- Formalized as the discounted sum of rewards (also called utility or value): V = ∑ γᵗ r(sₜ, aₜ)
- γ is the discount factor, making rewards far off into the future less valuable
- If we follow a specific policy π, the value of state sₜ is: Vπ(sₜ) = r(sₜ, π(sₜ)) + γVπ(sₜ₊₁)

Value function
- Value function for random movement (figure omitted)
- If we follow a specific policy π: Vπ(sₜ) = r(sₜ, π(sₜ)) + γVπ(sₜ₊₁)

Optimal policy
- If we know Vπ(s), then the policy π is given by: π(s) = argmaxₐ(r(s, a) + γVπ(δ(s, a)))
- The optimal value function for the optimal policy satisfies: V*(s) = maxₐ(r(s, a) + γV*(δ(s, a)))
- Finding the optimal policy is about finding π(s), or Vπ(sₜ), or both

Value iteration
- Initialize the function V(s) with random values V₀(s)
- For each state sₜ and each iteration k, compute: Vₖ₊₁(sₜ) = maxₐ(r(sₜ, a) + γVₖ(sₜ₊₁)), where sₜ₊₁ = δ(sₜ, a)
- Optimal policy: π*(s) = argmaxₐ(r(s, a) + γV*(δ(s, a)))

Q-function: learning in an unknown environment
- What if we do not know δ(s, a) or r(s, a)?
- Q-function: Q(s, a) = r(s, a) + γV*(δ(s, a))
- π*(s) = argmaxₐ Q(s, a)

Updating the Q-function
- Q and V* are closely related: V*(s) = maxₐ Q(s, a)
- Q can therefore be written: Q(sₜ, aₜ) = r(sₜ, aₜ) + γV*(δ(sₜ, aₜ)) = r(sₜ, aₜ) + γ maxₐ' Q(sₜ₊₁, a')
- If Q^ denotes the current approximation of Q, then it can be updated by: Q^(s, a) := r + γ maxₐ' Q^(s', a')

Q-learning for deterministic worlds
- For each s, a, initialize the table entry Q^(s, a) ⟵ 0
- Observe the current state s
- Do forever:
  - Select an action a and execute it
  - Receive the immediate reward r
  - Observe the new state s'
  - Update the table entry: Q^(s, a) := r + γ maxₐ' Q^(s', a')
  - s ⟵ s'

Q-learning example (figure omitted)

Q-learning for nondeterministic worlds
- What if the world is nondeterministic?
- V and Q are then expected values:
  - V = E[∑ γᵗ r(sₜ, aₜ)]
  - Q(s, a) = E[r(s, a) + γV*(δ(s, a))]
- Learning Q becomes: Q^ₙ(s, a) := (1 - αₙ) Q^ₙ₋₁(s, a) + αₙ (r + γ maxₐ' Q^ₙ₋₁(s', a'))

Reinforcement learning: summary
- Advantages
  - Allows an agent to adapt to maximise rewards in a potentially unknown environment
- Disadvantages
  - Requires computation polynomial in the number of states!
  - The number of states grows exponentially with the input dimensions!
  - Reinforcement learning assumes discrete state and action spaces

Individual and Group Assignment

Project
- A group of 4 to 6 students
- Implement a RoboRescue agent team
- Work individually on a subpart of the problem

Tasks
- Foundation tasks: navigation, communication, agents (police, ambulance, fire brigade)
- Exploration, prediction, task allocation

Reports
- Individual: find around 4 related articles and write a one-page description
  - Deadline: October 28th
- Individual: implement and evaluate the technique; write a report describing the technique, the results and a discussion
  - Deadline: draft December 14th, final January 6th
- Group: one per team! A description of the algorithms and strategies used

What is a good report?
- The grade is based on the report quality, such as readability, language, pictures, structure and length, and the level of technical detail weighted by the difficulty of the chosen approaches
- The reports should be 5-6 pages, but it is more important to make it possible for the reader to understand your work than to get exactly the right number of pages
- You can get feedback: submit your draft before December 14th, and we have a seminar on December 16th for discussing the drafts

Summary
- Automated planning
  - Classical planning problem
  - HTN
- Reinforcement learning
  - Markov decision process
  - Q-Learning
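To tie the reinforcement-learning part together, here is a hedged sketch of the tabular Q-learning algorithm for deterministic worlds described above, run on the same invented corridor MDP. The ε-greedy exploration strategy and the episode restarts are assumptions; the slides only say "select an action a and execute it".

# Sketch of tabular Q-learning for deterministic worlds on the toy corridor
# MDP (delta, r and the exploration strategy are assumptions, not lecture material).

import random

S = [0, 1, 2, 3, 4]
A = ["left", "right"]
GOAL, GAMMA, EPSILON = 4, 0.9, 0.1

def delta(s, a):
    return min(s + 1, GOAL) if a == "right" else max(s - 1, 0)

def r(s, a):
    return 1.0 if delta(s, a) == GOAL else 0.0

# Initialize the table entry Q^(s, a) <- 0 for every s, a.
Q = {(s, a): 0.0 for s in S for a in A}

s = 0
for step in range(2000):
    # epsilon-greedy action selection (an assumed exploration strategy)
    if random.random() < EPSILON:
        a = random.choice(A)
    else:
        a = max(A, key=lambda act: Q[(s, act)])
    reward, s_next = r(s, a), delta(s, a)
    # Update rule from the slides: Q^(s, a) := r + gamma * max_a' Q^(s', a')
    Q[(s, a)] = reward + GAMMA * max(Q[(s_next, a2)] for a2 in A)
    s = 0 if s_next == GOAL else s_next   # restart the episode at the goal

print({s: max(A, key=lambda a: Q[(s, a)]) for s in S})   # greedy policy per state

After enough steps the greedy policy chooses "right" in every non-goal state, i.e. the learned Q^ induces π*(s) = argmaxₐ Q(s, a) without ever using δ or r explicitly in the agent, which is the point of Q-learning in an unknown environment.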