Artificial Intelligence (AI)
Planning
Sungwook Yoon
What do we (AI researchers) mean by "plan"?
plan n.
1. A scheme, program, or method worked out beforehand for the accomplishment of an objective: a plan of attack.
2. A proposed or tentative project or course of action: had no plans for the evening.
3. A systematic arrangement of elements or important parts; a configuration or outline: a seating plan; the plan of a story.
4. A drawing or diagram made to scale showing the structure or arrangement of something.
5. In perspective rendering, one of several imaginary planes perpendicular to the line of vision between the viewer and the object being depicted.
6. A program or policy stipulating a service or benefit: a pension plan.
Synonyms: blueprint, design, project, scheme, strategy
Automated Planning concerns …
• Mainly synthesizing a course of action to achieve the given goal
• Finding the actions that need to be taken in each situation
  – When you are going to Chicago:
  – In Tempe, "take a cab"
  – At Sky Harbor, "take the plane"
• In summary, planning tries to find a plan (a course of action) given the initial state (you are in Tempe) and the goal (you want to be in Chicago)
What is a Planning Problem?
• Any problem that requires sequential decisions
  – For a single decision, look to Machine Learning
    • Classification: given a picture, "is this a cat or a dog?"
• Any examples?
  – FreeCell
  – Sokoban
  – Micro-mouse
  – Bridge
  – Football
What is a Planner?
Planner
[Figure: a planner takes a problem description as input and outputs a step-by-step plan, e.g., "1. Move …; 2. Move …; 3. …".]
Planning Involves Deciding a Course of Action to achieve a desired state of affairs

[Figure: dimensions along which planning problems vary: the environment may be static vs. dynamic, deterministic vs. stochastic, observable vs. partially observable, and perfect vs. imperfect; actions may be instantaneous vs. durative; goals may require full vs. partial satisfaction. In every case the planner's question is: what action next?]
Any real-world applications for planning, please?
Space Exploration
• Autonomous planning, scheduling, control
– NASA: JPL and Ames
• Remote Agent Experiment (RAX) – Deep Space 1
• Mars Exploration Rover (MER)
Manufacturing
• Sheet-metal bending machines - Amada
Corporation
– Software to plan the sequence of bends
[Gupta and Bourne, J. Manufacturing Sci. and
Engr., 1999]
Games
• Bridge Baron - Great Game Products
– 1997 world champion of computer bridge
[Smith, Nau, and Throop, AI Magazine, 1998]
  – 2004: 2nd place
[Figure: HTN decomposition of a bridge finesse. Us: East declarer, West dummy; Opponents: defenders, South & North; Contract: East – 3NT; On lead: West at trick 3; East: KJ74, West: A2, Out: QT98653. The tree expands Finesse(P1; S) into LeadLow and PlayCard steps and into EasyFinesse / StandardFinesse / BustedFinesse alternatives, depending on the cards played (West plays 2, North plays 3 or Q, East plays J, South plays 5 or Q).]
Planning Involves Deciding a Course of Action to achieve a desired state of affairs

[Figure: the same dimensions as above, with one extreme highlighted. The combination static, deterministic, fully observable, instantaneous, and propositional is called "Classical Planning"; the opposite extreme is dynamic, stochastic, partially observable, durative, and continuous.]
Classical Planning Assumptions

[Figure: the agent's actions are the sole source of change in the world; percepts are perfect and the world is fully observable; actions are deterministic and instantaneous.]
Representing States
World states are represented as sets of facts. We will also refer to facts as propositions.

State 1 (block A held in the hand; B on C on the table):
  { holding(A), clear(B), on(B,C), onTable(C) }

State 2 (A on B on C on the table):
  { handEmpty, clear(A), on(A,B), on(B,C), onTable(C) }

Closed World Assumption (CWA): facts not listed in a state are assumed to be false. Under the CWA we are assuming the agent has full observability.
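For concreteness, here is one minimal way to encode such states in Python (a sketch: the set-of-facts representation is from the slide, while the encoding and variable names are illustrative):

    # States as sets of facts; a fact is just a tuple (predicate, arg1, arg2, ...).
    state_holding_A = frozenset({("holding", "A"), ("clear", "B"),
                                 ("on", "B", "C"), ("onTable", "C")})
    state_A_on_B = frozenset({("handEmpty",), ("clear", "A"), ("on", "A", "B"),
                              ("on", "B", "C"), ("onTable", "C")})

    # Closed World Assumption: any fact not listed in the state is false.
    print(("on", "B", "C") in state_holding_A)    # True
    print(("handEmpty",) in state_holding_A)      # False under the CWA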
Representing Goals
Goals are also represented as sets of facts. For example, { on(A,B) } is a goal in the blocks world. A goal state is any state that contains all of the goal facts.

State 1: { handEmpty, clear(A), on(A,B), on(B,C), onTable(C) }
State 2: { holding(A), clear(B), on(B,C), onTable(C) }

State 1 is a goal state for the goal { on(A,B) }. State 2 is not a goal state.
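With the illustrative encoding sketched earlier, checking whether a state is a goal state is just a subset test:

    goal = frozenset({("on", "A", "B")})

    def is_goal_state(state, goal):
        # A goal state is any state that contains all of the goal facts.
        return goal.issubset(state)

    print(is_goal_state(state_A_on_B, goal))      # True: contains on(A,B)
    print(is_goal_state(state_holding_A, goal))   # False: A is still held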
Representing Action in STRIPS
Example: applying Stack(A,B) in the state { holding(A), clear(B), on(B,C), onTable(C) } produces the state { handEmpty, clear(A), on(A,B), on(B,C), onTable(C) }.

A STRIPS action definition specifies:
1) a set PRE of precondition facts
2) a set ADD of add-effect facts
3) a set DEL of delete-effect facts

Stack(x,y):
  PRE: { holding(x), clear(y) }
  ADD: { on(x,y), handEmpty }
  DEL: { holding(x), clear(y) }

With the substitution x ← A, y ← B:

Stack(A,B):
  PRE: { holding(A), clear(B) }
  ADD: { on(A,B), handEmpty }
  DEL: { holding(A), clear(B) }
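A grounded STRIPS action can be held in a small record like the following (a sketch; the field is named delete because del is a Python keyword):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Action:
        name: str
        pre: frozenset      # PRE: precondition facts
        add: frozenset      # ADD: add-effect facts
        delete: frozenset   # DEL: delete-effect facts

    stack_A_B = Action(
        name="Stack(A,B)",
        pre=frozenset({("holding", "A"), ("clear", "B")}),
        add=frozenset({("on", "A", "B"), ("handEmpty",)}),
        delete=frozenset({("holding", "A"), ("clear", "B")}),
    )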
Semantics of STRIPS Actions
Example: Stack(A,B) takes the state { holding(A), clear(B), on(B,C), onTable(C) } to the state { handEmpty, clear(A), on(A,B), on(B,C), onTable(C) }.

• A STRIPS action is applicable (or allowed) in a state when its preconditions are contained in the state.
• Taking an action in a state S results in the new state S ∪ ADD − DEL (i.e., add the add effects and remove the delete effects).

Stack(A,B):
  PRE: { holding(A), clear(B) }
  ADD: { on(A,B), handEmpty }
  DEL: { holding(A), clear(B) }
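The two rules above translate directly into code (continuing the illustrative sketch):

    def applicable(state, action):
        # Applicable when the preconditions are contained in the state.
        return action.pre.issubset(state)

    def apply_action(state, action):
        # Successor state: S ∪ ADD − DEL.
        assert applicable(state, action)
        return frozenset((state | action.add) - action.delete)

    next_state = apply_action(state_holding_A, stack_A_B)
    # next_state now contains handEmpty, on(A,B), on(B,C), onTable(C)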
STRIPS Planning Problems
A STRIPS planning problem specifies:
1) an initial state S
2) a goal G
3) a set of STRIPS actions

Objective: find a "short" action sequence reaching a goal state, or report that the goal is unachievable.

Example problem:

Initial state: { holding(A), clear(B), onTable(B) }
Goal: { on(A,B) }

STRIPS actions:
  Stack(A,B):
    PRE: { holding(A), clear(B) }
    ADD: { on(A,B), handEmpty }
    DEL: { holding(A), clear(B) }
  Stack(B,A):
    PRE: { holding(B), clear(A) }
    ADD: { on(B,A), handEmpty }
    DEL: { holding(B), clear(A) }

Solution: (Stack(A,B))
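A candidate plan for such a problem can be checked by simulating it forward from the initial state (a sketch, reusing the helpers above):

    def plan_is_solution(initial_state, goal, plan):
        # Execute the action sequence and check that the final state
        # contains all of the goal facts.
        state = initial_state
        for action in plan:
            if not applicable(state, action):
                return False
            state = apply_action(state, action)
        return goal.issubset(state)

    initial = frozenset({("holding", "A"), ("clear", "B"), ("onTable", "B")})
    print(plan_is_solution(initial, frozenset({("on", "A", "B")}), [stack_A_B]))  # True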
Properties of Planners
 A planner is sound if any action sequence it returns is a true solution.
 A planner is complete if it outputs an action sequence or "no solution" for any input problem.
 A planner is optimal if it always returns the shortest possible solution.

Is optimality an important requirement? Is it a reasonable requirement?
Complexity of STRIPS Planning
PlanSAT
  Given: a STRIPS planning problem
  Output: "yes" if the problem is solvable, otherwise "no"

 PlanSAT is decidable. Why?
 In general, PlanSAT is PSPACE-complete! Just finding a plan is hard in the worst case, even when actions are limited to just 2 preconditions and 2 effects.

Does this mean that we should give up on AI planning?

NOTE: PSPACE is the set of all problems that are decidable in polynomial space. PSPACE-complete problems are believed to be harder than NP-complete problems.
Satisficing vs. Optimality
 While just finding a plan is hard in the worst case, finding a plan is easy for many planning domains.
 However, finding optimal solutions can still be hard in those domains. For example, optimal planning in the blocks world is NP-complete.
 In practice it is often sufficient to find "good" solutions "quickly", even though they may not be optimal. For example, finding sub-optimal blocks world solutions can be done in linear time. How? (See the sketch below.)
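One classic answer, sketched below rather than prescribed: unstack everything onto the table, then build each goal tower bottom-up. Each block moves at most twice, so the plan has at most 2n moves and is produced in linear time, though it is usually not the shortest plan. The encoding (a dict mapping each block to what it sits on) and the assumption that the goal specifies every block's final position are illustrative simplifications.

    def naive_blocks_plan(initial_on, goal_on):
        # initial_on / goal_on: dict block -> block-it-sits-on or 'table'.
        on = dict(initial_on)
        plan = []                                 # list of (block, destination) moves

        def clear_blocks():
            supported = set(on.values())          # things that have a block on them
            return [b for b in on if b not in supported]

        # Phase 1: unstack everything onto the table (each block moves at most once).
        worklist = clear_blocks()
        while worklist:
            b = worklist.pop()
            support = on[b]
            if support != 'table':
                plan.append((b, 'table'))
                on[b] = 'table'
                worklist.append(support)          # its former support is now clear

        # Phase 2: build each goal tower bottom-up.
        placed = set()
        def place(b):
            if b in placed:
                return
            placed.add(b)
            dest = goal_on[b]
            if dest != 'table':
                place(dest)                       # put the support in position first
                plan.append((b, dest))
                on[b] = dest

        for b in goal_on:
            place(b)
        return plan

    print(naive_blocks_plan({'A': 'B', 'B': 'C', 'C': 'table'},
                            {'A': 'table', 'B': 'A', 'C': 'B'}))
    # [('A', 'table'), ('B', 'table'), ('B', 'A'), ('C', 'B')]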
Search Space: Blocks World
Search space is finite.
Forward-Chaining Search
[Figure: a search tree fanning out from the initial state toward a goal.]

 Breadth-first and best-first search are sound and complete.
 A very large branching factor can cause the search to waste time and space trying many irrelevant actions.
 O(b^d) worst case, where b = branching factor and d = depth limit.
 A good heuristic function and/or pruning procedure is needed.
 Early AI researchers gave up on forward search, but there has been a recent resurgence. More on this later in the course. (A minimal breadth-first sketch follows.)
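For very small problems, forward search over the STRIPS encoding sketched earlier can be written in a few lines; the point of the sketch is to make the branching-factor problem concrete, not to be a practical planner:

    from collections import deque

    def forward_bfs(initial_state, goal, actions):
        # Breadth-first progression search: sound, complete, and shortest-plan
        # optimal, but it may expand O(b^d) states.
        frontier = deque([(initial_state, [])])
        visited = {initial_state}
        while frontier:
            state, plan = frontier.popleft()
            if goal.issubset(state):
                return plan
            for a in actions:
                if applicable(state, a):
                    nxt = apply_action(state, a)
                    if nxt not in visited:
                        visited.add(nxt)
                        frontier.append((nxt, plan + [a]))
        return None   # goal unachievable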
Backward-Chaining Search
[Figure: regression search working backward from the goal toward the initial state.]

 Backward search can focus on more "goal-relevant" actions, but the branching factor is still typically huge.
 Again, a good heuristic function and/or pruning procedure is needed.
 Early AI researchers gave up on forward and backward search, but recent progress in developing general planning heuristics has led to a resurgence. More on this later in the course. (A goal-regression sketch follows.)
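Backward search manipulates goals rather than states. The standard STRIPS regression step, sketched with the same illustrative Action record as before: an action is considered only if it adds some goal fact and deletes none of them, and regressing replaces the facts it achieves with its preconditions.

    def relevant(action, goal):
        # The action achieves part of the goal and clobbers none of it.
        return bool(action.add & goal) and not (action.delete & goal)

    def regress(goal, action):
        # Facts that must hold *before* the action so the goal holds after it.
        return frozenset((goal - action.add) | action.pre)

A backward planner starts from the goal and repeatedly regresses through relevant actions until it reaches a subgoal that is satisfied by the initial state.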
Total-Order vs. Partial-Order Planning (POP)
[Figure: initial configuration with A on B and C on D; goal configuration with B on A and D on C.]

There are many possible plans:
1) move(A, B, TABLE) ; move(B, TABLE, A) ; move(C, D, TABLE) ; move(D, TABLE, C)
2) move(A, B, TABLE) ; move(C, D, TABLE) ; move(D, TABLE, C) ; move(B, TABLE, A)
3) move(C, D, TABLE) ; move(D, TABLE, C) ; move(A, B, TABLE) ; move(B, TABLE, A)
etc.

 State-space planning techniques produce totally-ordered plans, i.e. plans consisting of a strict sequence of actions.
 Often, however, many orderings of the actions have equivalent effects.
• These plans share some common structure. They are all different interleavings of two separate sub-plans:
  1) move(A, B, TABLE) ; move(B, TABLE, A)
  2) move(C, D, TABLE) ; move(D, TABLE, C)
• A partial-order plan is one which specifies only the necessary ordering information. One partial-order plan may have many total orderings. (A small sketch follows.)
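A partial-order plan can be represented simply as a set of steps plus the required ordering constraints; its total orderings are the consistent permutations. A brute-force sketch for the example above (fine only for tiny plans):

    from itertools import permutations

    steps = ["move(A,B,TABLE)", "move(B,TABLE,A)", "move(C,D,TABLE)", "move(D,TABLE,C)"]
    order = {("move(A,B,TABLE)", "move(B,TABLE,A)"),   # A must come off B before B goes on A
             ("move(C,D,TABLE)", "move(D,TABLE,C)")}   # C must come off D before D goes on C

    def linearizations(steps, order):
        # Enumerate every total ordering consistent with the partial order.
        for perm in permutations(steps):
            pos = {s: i for i, s in enumerate(perm)}
            if all(pos[a] < pos[b] for a, b in order):
                yield list(perm)

    print(sum(1 for _ in linearizations(steps, order)))   # 6 interleavings of the two sub-plans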
Planning Techniques in Summary
• Forward State Space Search
• Backward State Space Search
• Partial Order Planning (plan space search)
• What is the state of the art technique?
Exercise
What is a Planning Problem?
• Any problem that requires sequential decisions
  – For a single decision, look to Machine Learning
• Any examples?
  – FreeCell
  – Sokoban
  – Micromouse
  – Bridge
  – Football
Markov Decision Process (MDP)
• Sequential decision problems under uncertainty
– Not just the immediate utility, but the longer-term
utility as well
– Uncertainty in outcomes
• Roots in operations research
• Also used in economics, communications engineering,
ecology, performance modeling and of course, AI!
– Also referred to as stochastic dynamic programs
Markov Decision Process (MDP)
• Defined as a tuple: <S, A, P, R>
– S: State
– A: Action
– P: Transition function
• Table P(s’| s, a), prob of s’ given action “a” in state “s”
– R: Reward
• R(s, a) = cost or reward of taking action a in state s
• Choose a sequence of actions (not just one decision or one action)
– Utility based on a sequence of decisions
Example: What SEQUENCE of actions should our agent take?
• Each action costs –1/25
• The agent can take action N, E, S, W
• It faces uncertainty in every state

[Figure: a 3×4 grid world with a Start cell, a blocked cell, a terminal cell with reward +1, and a terminal cell with reward -1. Each action (illustrated for N) moves in the intended direction with probability 0.8 and slips to either side with probability 0.1 each.]
MDP Tuple: <S, A, P, R>
• S: State of the agent on the grid (4,3)
– Note that a cell is denoted by (x,y)
• A: Actions of the agent, i.e., N, E, S, W
• P: Transition function
– Table P(s’| s, a), prob of s’ given action “a” in state “s”
– E.g., P( (4,3) | (3,3), N) = 0.1
– E.g., P((3, 2) | (3,3), N) = 0.8
– (Robot movement, uncertainty of another agent’s actions,…)
• R: Reward (more comments on the reward function later)
– R( (3, 3), N) = -1/25
– R (4,1) = +1
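In code, the transition and reward tables for this example might be filled in as plain dictionaries (an illustrative sketch; the 0.8 and 0.1 entries come from the slide, while sending the remaining 0.1 to cell (2,3) is an assumption):

    P = {}   # P[(s, a)][s2] = probability of reaching s2 by taking a in s
    R = {}   # R[(s, a)]     = reward of taking a in s

    P[((3, 3), 'N')] = {(3, 2): 0.8,      # intended outcome (from the slide)
                        (4, 3): 0.1,      # slip (from the slide)
                        (2, 3): 0.1}      # assumed remaining slip direction
    R[((3, 3), 'N')] = -1 / 25            # every action costs -1/25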
Terminology
• Before describing policies, let's go through some terminology that will be useful throughout this set of lectures
• Policy: a complete mapping from states to actions
MDP Basics and Terminology
An agent must make a decision or control a probabilistic
system
• Goal is to choose a sequence of actions for optimality
• Defined as <S, A, P, R>
• MDP models:
– Finite horizon: Maximize the expected reward for the
next n steps
– Infinite horizon: Maximize the expected discounted
reward.
– Transition model: Maximize average expected reward
per transition.
– Goal state: maximize expected reward (minimize expected
cost) to some target state G.
Reward Function
• According to chapter 2, the reward is directly associated with the state
  – Denoted R(I)
  – Simplifies computations seen later in the algorithms presented
• Sometimes, the reward is assumed to be associated with a state-action pair
  – R(S, A)
  – We could also assume a mix of R(S,A) and R(S)
• Sometimes, the reward is associated with state, action, and destination state
  – R(S, A, J)
  – R(S, A) = Σ_J R(S, A, J) · P(J | S, A)
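The last line is just an expectation over destination states; as code (a sketch with the tables held in dictionaries):

    def expected_reward(s, a, R_saj, P):
        # R(s, a) = sum over destinations j of R(s, a, j) * P(j | s, a)
        return sum(R_saj[(s, a, j)] * p for j, p in P[(s, a)].items())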
Markov Assumption
• Markov Assumption: transition probabilities (and rewards) from any given state depend only on that state, not on the previous history
• Where you end up after an action depends only on the current state
  – Named after the Russian mathematician A. A. Markov (1856-1922)
  – (He did not come up with Markov decision processes, however)
  – Transitions from state (1,2) do not depend on prior states such as (1,1) or (1,2)
MDP vs POMDPs
• Accessibility: the agent's percepts in any given state identify the state it is in, e.g., state (4,3) vs (3,3)
  – Given the observations, the state is uniquely determined
  – Hence, we will not explicitly consider observations, only states
• Inaccessibility: the agent's percepts in any given state DO NOT identify the state it is in, e.g., it may be (4,3) or (3,3)
  – The observations do not uniquely determine the state
  – POMDP: Partially Observable MDP, for inaccessible environments
• We will focus on MDPs in this presentation.
MDP vs POMDP
[Figure: in an MDP, the agent observes the world's state directly and chooses actions. In a POMDP, the agent receives observations, a state estimator (SE) maintains a belief state b, and the policy P maps beliefs to actions.]
Policy
• Policy is like a plan
  – Like a plan, it is certainly generated ahead of time
• Unlike traditional plans, it is not a sequence of actions that the agent must execute
  – If there are failures in execution, the agent can continue to execute the policy
• Prescribes an action for every state
• Maximizes expected reward, rather than just reaching a goal state
MDP problem
• The MDP problem consists of:
  – Finding the optimal control policy for all possible states;
  – Finding the sequence of optimal control functions for a specific initial state;
  – Finding the best control action (decision) for a specific state.
Non-Optimal Vs Optimal Policy
[Figure: the same grid world (Start cell, +1 and -1 terminal cells) with candidate policies drawn as red, yellow, and blue arrows.]

• Choose the red policy or the yellow policy?
• Choose the red policy or the blue policy?
  Which is optimal (if any)?
• Value iteration: one popular algorithm for determining the optimal policy
Value Iteration: Key Idea
• Iterate: update the utility of state I using the old utilities of neighbor states J, given actions A:

    U_t+1(I) = max_A [ R(I,A) + Σ_J P(J|I,A) · U_t(J) ]

  – P(J|I,A): probability of J if A is taken in state I
  – max_A F(A) returns the highest value of F(A) over actions A
  – Both the immediate reward and the longer-term reward are taken into account
Value Iteration: Algorithm
• Initialize: U_0(I) = 0
• Iterate:

    U_t+1(I) = max_A [ R(I,A) + Σ_J P(J|I,A) · U_t(J) ]

  – until close-enough(U_t+1, U_t)
• At the end of the iteration, calculate the optimal policy:

    Policy(I) = argmax_A [ R(I,A) + Σ_J P(J|I,A) · U_t+1(J) ]
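A compact implementation of this loop, using the dictionary encoding of P and R sketched earlier (illustrative; it assumes P and R are defined for every state-action pair, and a discount factor gamma is included for the infinite-horizon discounted case; set gamma = 1 to match the update exactly as written above):

    def value_iteration(states, actions, P, R, gamma=1.0, eps=1e-6):
        U = {s: 0.0 for s in states}                          # U_0(I) = 0
        while True:
            U_new = {s: max(R[(s, a)] + gamma * sum(p * U[s2]
                            for s2, p in P[(s, a)].items())
                            for a in actions)
                     for s in states}
            converged = max(abs(U_new[s] - U[s]) for s in states) < eps
            U = U_new
            if converged:                                     # close-enough(U_t+1, U_t)
                break
        # Extract the greedy policy with respect to the converged utilities.
        policy = {s: max(actions,
                         key=lambda a: R[(s, a)] + gamma * sum(p * U[s2]
                                       for s2, p in P[(s, a)].items()))
                  for s in states}
        return U, policy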
MDP Solution Techniques
• Value Iteration
• Policy Iteration
• Matrix Inversion
• Linear Programming
• LAO*
Planning vs. MDP
• Common
– Try to act better
• Difference
– Relational vs. Propositional
– Symbolic vs. Value
– Less Toyish vs. More Toyish
– Solution Techniques
– Classic vs. More General
Planning vs. MDP, recent trends
• Recent trend in Planning
  – Adding diverse aspects
    • Probabilistic, temporal, oversubscribed, etc.
  – Getting closer to MDPs, but with a relational representation
    • More real-world-like
• Recent trend in MDPs
  – More structure
    • Relational, options, hierarchy, finding harmonic functions, …
  – Getting closer to Planning!
Planning better than MDP?
• They deal with different objectives
  – MDPs focus more on optimality in a general planning setting
    • But the sizes of the domains handled are too small
  – Planning focuses on the classical setting (unrealistic)
    • Still, many interesting problems can be coded in the classical setting
    • Sokoban, FreeCell
• Planning's biggest advances come from fast preprocessing of relational problems
  – Actually, planners turn the problems into propositional ones.
Can we solve real world problems?
• Suppose all the Planning and MDP techniques are
well developed.
– Temporal, Partial Observability, Continuous Variable,
etc.
• Well, who will code such problems into an AI agent?
• We should consider the cost of developing such problem definitions and of developing a "very" general planner
  – It might be better to use a domain-specific planner
    • A Sokoban solver, a FreeCell solver, etc.
What is AI?
• An AI is 99 percent Engineering and 1 percent
Intelligent Algorithms
– Sungwook Yoon