Algorithms for POMDP

Presented by Alp Sardağ
Monahan Enumeration Phase
Generate all vectors:
Number of generated vectors = $|A|\,M^{|\Theta|}$,
where $M$ is the number of vectors from the previous step and $\Theta$ is the set of observations.
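The enumeration step can be sketched as follows. This is a minimal illustration, not the presenter's code: the model layout (`T[a][s][s2]`, `O[a][s2][z]`, `R[a][s]`), the discount `gamma`, and the tiny example numbers are all assumptions made for the sketch. One previous vector is chosen for every observation, for every action, which is where the vector count comes from.

```python
from itertools import product

def enumerate_vectors(prev_vectors, T, O, R, gamma):
    """Generate every candidate vector for the next step.

    T[a][s][s2]: transition probability, O[a][s2][z]: observation
    probability, R[a][s]: immediate reward (all assumed layouts).
    """
    n_actions = len(R)
    n_states = len(R[0])
    n_obs = len(O[0][0])
    new_vectors = []
    for a in range(n_actions):
        # Choose one previous vector for each observation.
        for choice in product(range(len(prev_vectors)), repeat=n_obs):
            vec = []
            for s in range(n_states):
                future = sum(
                    T[a][s][s2] * O[a][s2][z] * prev_vectors[choice[z]][s2]
                    for z in range(n_obs)
                    for s2 in range(n_states)
                )
                vec.append(R[a][s] + gamma * future)
            new_vectors.append((a, tuple(vec)))
    return new_vectors

# Made-up model: 2 states, 2 actions, 2 observations, 3 previous vectors.
T = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.5, 0.5]]]
O = [[[0.8, 0.2], [0.3, 0.7]], [[0.5, 0.5], [0.5, 0.5]]]
R = [[1.0, 0.0], [0.0, 1.0]]
prev = [(0.0, 0.0), (1.0, 2.0), (2.0, 1.0)]
candidates = enumerate_vectors(prev, T, O, R, gamma=0.9)
print(len(candidates))  # 2 actions * 3^2 choices = 18
```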
Monahan Reduction Phase
All vectors can be kept:
Each time, maximize over all vectors.
This carries a lot of excess baggage.
The number of vectors in the next step will be
even larger.
An LP is used to trim away useless vectors.
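A small sketch of why the extra vectors are only baggage: the value at a belief point is the maximum dot product over all vectors, so a vector that is never the maximizer can be dropped without changing the value function. The function name `value` and the example vectors are assumptions made for illustration.

```python
def value(belief, vectors):
    """Value of a belief point: the best dot product over all vectors."""
    return max(sum(p * v for p, v in zip(belief, vec)) for vec in vectors)

# The third vector is never the maximizer anywhere on the simplex.
all_vectors = [(1.0, 0.0), (0.0, 1.0), (0.2, 0.2)]
trimmed = all_vectors[:2]

grid = [(i / 10, 1 - i / 10) for i in range(11)]
same = all(value(b, all_vectors) == value(b, trimmed) for b in grid)
print(same)
```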
Monahan Reduction Phase
For a vector to be useful, there must be
at least one belief point at which it gives a larger
value than all the others:
Monahan Algorithm
Monahan’s LP Complication
Formulate an LP for each vector and check whether a belief point exists where that vector gives the largest value.
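One common way to write such a test is shown below. It is a sketch in assumed notation, not the formula from the original slide: $\Gamma$ is the current set of vectors, $\alpha$ the vector being tested, $\pi$ a belief state, and $\delta$ the margin by which $\alpha$ beats every other vector.

```latex
\begin{aligned}
\max_{\pi,\,\delta}\quad & \delta \\
\text{s.t.}\quad & \sum_{s} \pi(s)\,\bigl[\alpha(s) - \alpha'(s)\bigr] \ge \delta
  \quad \forall\, \alpha' \in \Gamma \setminus \{\alpha\}, \\
& \sum_{s} \pi(s) = 1, \qquad \pi(s) \ge 0 \quad \forall\, s .
\end{aligned}
```

Under this formulation, $\alpha$ is kept when the optimal $\delta$ is strictly positive and trimmed otherwise.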
Eagle’s Variant of Monahan
The optimization occurs in the enumeration phase.
If, during enumeration, a vector’s
components are all dominated by
another vector’s components, discard it.
Generate $\alpha_{ji}(t)$; if some other vector is at least as large in every component,
discard $\alpha_{ji}(t)$.
The same test can be applied in reverse, to check whether the new vector dominates
any previously enumerated vector.
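The component-wise check can be sketched as below. The helper names `dominated` and `add_with_dominance_check` are made up for this illustration. Note that this test only catches vectors dominated by a single other vector; a vector that is useless only because of a combination of others still needs the LP.

```python
def dominated(vec, other):
    """True if `other` is at least as large as `vec` in every component."""
    return all(o >= v for v, o in zip(vec, other))

def add_with_dominance_check(kept, new_vec):
    """Add new_vec unless it is dominated; drop kept vectors it dominates."""
    if any(dominated(new_vec, k) for k in kept):
        return kept
    return [k for k in kept if not dominated(k, new_vec)] + [new_vec]

kept = []
for vec in [(1.0, 0.0), (0.5, -0.5), (0.0, 1.0), (2.0, 0.5)]:
    kept = add_with_dominance_check(kept, vec)
print(kept)  # [(0.0, 1.0), (2.0, 0.5)]
```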
Sondik’s One-Pass Algorithm
Find the proper set of belief states to plug into
the formula below to get all necessary
vectors:
The algorithm is guaranteed to visit a finite
number of regions.
The union of these regions is the entire belief
space.
Sondik’s One-Pass Algorithm
Simplified version of Sondik’s algorithm:
Sondik’s One-Pass Algorithm
How can we define a region around this
belief state where that vector is
guaranteed to be the true linear portion of
the value function?
Construct a series of constraints; when they are
satisfied, the region is found.
Then go to step (5).
Sondik’s One-Pass Algorithm
The condition that $\alpha^*(t)$, generated at $\pi$, remains
larger than all other $\alpha^a(t)$ as $\pi$ varies:
Variations in $\pi$ can cause changes in
$\alpha^a(t)$.
A new constraint is needed to ensure that the
components of $\alpha^a(t)$ stay the same.
Sondik’s One-Pass Algorithm
What affects $\alpha^*(t)$ and $\alpha^a(t)$?
To ensure that no part of the function
changes, these constraints exist for every
combination of action $a$ and observation $\theta$.
Sondik’s One-Pass Algorithm
Constraints restrict belief states to lie on
the belief state space simplex:
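In standard notation these simplex constraints read as follows, where $S$ (the set of states) is a symbol assumed here and $\pi(s)$ is the probability the belief state assigns to state $s$:

```latex
\pi(s) \ge 0 \quad \forall\, s \in S, \qquad \sum_{s \in S} \pi(s) = 1 .
```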
Sondik’s One-Pass Algorithm
Each constraint defines a region containing all
the points on one side of a line:
Sondik’s One-Pass Algorithm
The LP constraints at step (4):
Sondik’s One-Pass Algorithm
In step (5), find belief states guaranteed
not to be in the region defined in step (4).
With the new point, proceed exactly as in
step (4).
The algorithm continues until a complete
partition of the belief space is found.
Sondik’s One-Pass Algorithm
To find points in the neighboring regions, points lying on
the edge of the region defined by the constraints are
used:
Sondik’s One-Pass Algorithm
Which constraints are binding?
For each constraint, change its inequality
into an equality
and solve the resulting LP.
If the LP has a solution, the constraint is
binding; a non-binding constraint cannot
pass through the region defined by
all the other constraints.
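The binding test can be illustrated in the two-state case, where a belief is a single number `p` (the probability of the first state). Under that assumption, turning an inequality into an equality fixes `p`, so the LP reduces to checking whether that point satisfies the remaining constraints; with more states an actual LP solver would be needed. The `(a, b)` constraint encoding and the example numbers are made up for this sketch.

```python
def is_binding(index, constraints, tol=1e-9):
    """Check whether constraint `index` touches the region of the others.

    Each constraint is a pair (a, b) meaning a * p <= b, where p is the
    probability of the first state in a two-state belief.
    """
    a, b = constraints[index]
    if a == 0:
        return False  # the constraint has no boundary point in p
    p = b / a  # turning the inequality into an equality fixes p
    return all(
        c * p <= d + tol
        for i, (c, d) in enumerate(constraints)
        if i != index
    )

# p >= 0, p <= 1, p <= 0.6, p >= 0.2, p <= 0.9  ->  region [0.2, 0.6]
constraints = [(-1.0, 0.0), (1.0, 1.0), (1.0, 0.6), (-1.0, -0.2), (1.0, 0.9)]
binding = [i for i in range(len(constraints)) if is_binding(i, constraints)]
print(binding)  # [2, 3]
```

Only the two constraints that form the edges of the region are reported as binding; the boundary points they yield are the natural places to step into neighboring regions.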
Cheng’s Relaxed Region
Same as Sondik’s One-Pass algorithm,
except that each region is specified with fewer
constraints.
It defines regions that are typically
larger than the actual vectors’ regions.
Cheng’s Relaxed Region
Set of constraints for the relaxed
regions of Cheng:
Cheng’s Relaxed Region
Corners are found with an interior algorithm:
Cheng’s Linear Support
The algorithm defines an approximate
value function over the entire belief
space.
Refine this approximation until it
reaches the optimal value function.
Cheng’s Linear Support
Difference between the two algorithms:
Cheng’s Linear Support
Initialize a search list with the extreme points of
the belief simplex (e.g. [1,0,0,...], [0,1,0,...])
and an empty set of vectors.
For each of these points, the true $\alpha(t)$ vector is
calculated and added to the set of vectors.
Cheng’s Linear Support
Since both the true value function and the approximation are
PWLC (piecewise linear and convex), the largest difference must occur at a
corner point.
Cheng then finds all the corner points of the
regions induced by the approximation.
Disregard the corner points seen before and add
those not seen before to the search list.
Pick a point from the search list and generate its
vector. If it differs from every vector already in the approximation,
add it to the approximation set.
Repeat the whole procedure with the new
approximation.
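The loop above can be sketched for a two-state problem, where a belief is one number `p` and corner points are simply the crossings of lines on the upper surface. This is an illustration under stated assumptions: `true_vector` stands in for the real one-step computation of the true vector at a belief point and here just looks the answer up in a made-up set `TRUE`; all names are hypothetical.

```python
def value(p, vec):
    """Value of vec at the two-state belief (p, 1 - p)."""
    return p * vec[0] + (1.0 - p) * vec[1]

def corner_points(vectors):
    """Beliefs in (0, 1) where two vectors cross on the upper surface."""
    corners = set()
    for i, u in enumerate(vectors):
        for w in vectors[i + 1:]:
            slope_diff = (u[0] - u[1]) - (w[0] - w[1])
            if slope_diff == 0:
                continue  # parallel lines never cross
            p = (w[1] - u[1]) / slope_diff
            if 0.0 < p < 1.0:
                best = max(value(p, v) for v in vectors)
                if abs(value(p, u) - best) < 1e-9:
                    corners.add(round(p, 9))
    return corners

def linear_support(true_vector):
    approx, seen = [], set()
    search = [0.0, 1.0]  # extreme points of the two-state simplex
    while search:
        p = search.pop()
        if p in seen:
            continue
        seen.add(p)
        vec = true_vector(p)
        if vec not in approx:
            approx.append(vec)
            # New approximation: queue its unseen corner points.
            search.extend(c for c in corner_points(approx) if c not in seen)
    return approx

# Stand-in for the true value function; (0.2, 0.2) is never optimal.
TRUE = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6), (0.2, 0.2)]

def true_vector(p):
    return max(TRUE, key=lambda v: value(p, v))

result = linear_support(true_vector)
print(sorted(result))  # [(0.0, 1.0), (0.6, 0.6), (1.0, 0.0)]
```

In this run the first corner point, `p = 0.5`, exposes the missing vector `(0.6, 0.6)`; the corners of the refined approximation then produce no new vectors, so the search list empties and the procedure stops.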
Cheng’s Linear Support