Talk - Muthukumaran Chandrasekaran

Solving Multi-agent Sequential Decision Making Problems
Using I-DIDs
Muthukumaran Chandrasekaran
Computer Science Department
The University of Georgia
mkran@uga.edu
Introduction
Interactive dynamic influence diagrams (I-DIDs) are
graphical models for sequential decision making in
uncertain settings shared with other agents. The main
challenge is that the space of candidate models ascribed
to the other agents grows exponentially over time.
Redundant or ‘equivalent’ models are therefore
pruned to reduce the model space. We present a new
approximation technique that reduces the candidate
model space by removing models that are
ε-subjectively equivalent (ε-SE) to representative
ones, where ε is the approximation factor.
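The pruning idea described above can be illustrated with a minimal Python sketch. It is not the poster's implementation: the distance measure (total variation), the greedy choice of representatives, the transfer of belief mass to the representative, and all names (`prune_eps_se`, `m1`, ...) are assumptions made for illustration. Each candidate model is summarized by the distribution it induces over the subject agent's action-observation paths.

```python
from typing import Dict, List, Tuple

Path = Tuple[str, ...]      # e.g. ("listen", "growl-left")
Dist = Dict[Path, float]    # probability of each action-observation path


def total_variation(p: Dist, q: Dist) -> float:
    """Half the L1 distance between two path distributions (assumed measure)."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(h, 0.0) - q.get(h, 0.0)) for h in support)


def prune_eps_se(models: List[Tuple[str, float, Dist]],
                 eps: float) -> List[Tuple[str, float]]:
    """Greedily keep a representative and absorb every model within eps of it."""
    remaining = list(models)
    kept = []
    while remaining:
        rep_name, rep_weight, rep_dist = remaining.pop(0)
        survivors = []
        for name, weight, dist in remaining:
            if total_variation(rep_dist, dist) <= eps:
                # Assumption: the representative inherits the pruned model's weight.
                rep_weight += weight
            else:
                survivors.append((name, weight, dist))
        remaining = survivors
        kept.append((rep_name, rep_weight))
    return kept


# Hypothetical candidate models: (name, belief weight, path distribution).
models = [
    ("m1", 0.5, {("listen", "growl-left"): 0.6, ("listen", "growl-right"): 0.4}),
    ("m2", 0.25, {("listen", "growl-left"): 0.55, ("listen", "growl-right"): 0.45}),
    ("m3", 0.25, {("open", "growl-left"): 0.5, ("open", "growl-right"): 0.5}),
]
kept = prune_eps_se(models, eps=0.1)
print(kept)
```

With ε = 0.1, m2 is absorbed into m1 while m3 is kept; with ε = 0 no model in this example is pruned, since none induce exactly the same distribution.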
Background
I-DIDs consist of nodes (decision: rectangle; chance:
oval; utility: diamond; model: hexagon), arcs
(functional, conditional, informational), and links
(policy: dashed; model update: dotted), as shown in Fig 1.
I-DIDs are graphical counterparts of I-POMDPs [1].
Discussion
The set of SE models includes those that are
BE. It further includes models that induce
identical distributions over the subject agent’s
action-observation paths but could be
behaviorally distinct over those paths
that have zero probability. These models are
not BE and are called (strictly) Observationally
Equivalent.
Fig 6 shows a recursive way to compute the
distribution over the subject agent’s
action-observation history.
Test domains: the multi-agent tiger (Fig 2) and
multi-agent machine maintenance problems.
The quality of the solution generated using ε-SE
improves as we reduce ε and approaches that of
the exact solution, which indicates flexibility (Fig 3).
Compared to DMU, ε-SE obtains higher rewards
for an identical number of initial models, which
indicates a more informed clustering and pruning,
although it is less efficient (Fig 4).
Problems: scalability issues due to the curse of
history (the distribution computations are time and
space consuming).
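The recursive computation of the distribution over the subject agent's action-observation paths can be sketched as follows. Fig 6 is not reproduced here, so this is only a simplified illustration under stated assumptions: the other agent's model is reduced to a fixed action distribution `pi_j`, the subject agent picks actions uniformly at random, and the transition and observation functions (`trans`, `obs`) describe a hypothetical tiger-like toy problem rather than the poster's test domains.

```python
from typing import Callable, Dict, List, Tuple

Path = Tuple[str, ...]


def path_distribution(
    belief: Dict[str, float],
    horizon: int,
    actions_i: List[str],
    observations: List[str],
    pi_j: Dict[str, float],
    trans: Callable[[str, str, str, str], float],
    obs: Callable[[str, str, str, str], float],
) -> Dict[Path, float]:
    """Distribution over agent i's action-observation paths of a given length."""
    if horizon == 0:
        return {(): 1.0}
    result: Dict[Path, float] = {}
    for a_i in actions_i:
        for o in observations:
            # Unnormalized next belief, marginalizing over j's predicted action.
            unnorm = {
                s2: sum(
                    belief[s] * p_j * trans(s, a_i, a_j, s2) * obs(s2, a_i, a_j, o)
                    for s in belief
                    for a_j, p_j in pi_j.items()
                )
                for s2 in belief
            }
            p_obs = sum(unnorm.values())  # Pr(o | belief, a_i)
            if p_obs == 0.0:
                continue
            nxt = {s2: v / p_obs for s2, v in unnorm.items()}
            tails = path_distribution(
                nxt, horizon - 1, actions_i, observations, pi_j, trans, obs
            )
            for tail, q in tails.items():
                # Assumption: agent i chooses each action uniformly at random.
                result[(a_i, o) + tail] = p_obs * q / len(actions_i)
    return result


def trans(s: str, a_i: str, a_j: str, s2: str) -> float:
    """Toy dynamics: opening a door (by either agent) resets the tiger."""
    if a_i == "open" or a_j == "open":
        return 0.5
    return 1.0 if s == s2 else 0.0


def obs(s2: str, a_i: str, a_j: str, o: str) -> float:
    """Toy sensor: listening hears the correct growl with probability 0.85."""
    if a_i != "listen":
        return 0.5
    matches = (s2 == "tiger-left") == (o == "growl-left")
    return 0.85 if matches else 0.15


uniform = {"tiger-left": 0.5, "tiger-right": 0.5}
args = (["listen", "open"], ["growl-left", "growl-right"])
dist = path_distribution(uniform, 2, *args, {"listen": 0.9, "open": 0.1}, trans, obs)
other = path_distribution(uniform, 2, *args, {"listen": 0.0, "open": 1.0}, trans, obs)
print(len(dist), sum(dist.values()))
```

Even in this toy setting the number of paths is (|A_i| · |Ω_i|)^T, here 4^2 = 16 for horizon 2, which illustrates why the curse of history makes these computations expensive. Different assumptions about the other agent (`dist` versus `other`) yield different path distributions, and this difference is what an ε-SE style comparison would measure.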
Approach
The goal is to reduce the model space. First, we prune
behaviorally equivalent (BE) models [2], whose behavioral
predictions for the other agent are identical. We
further reduce the space by pruning subjectively
equivalent (SE) models that induce an approximately
identical distribution over the subject agent’s future
action-observation history (Fig 5) and replacing them
with a representative.
We compared the technique empirically with the
state-of-the-art approach DMU [3] and bounded the
approximation error.
Implementation: Exact and ε-SE, using the HUGIN
Java API for DIDs.
References
1. P. Doshi, Y. Zeng, and Q. Chen, Graphical models for
interactive POMDPs: Representations and solutions,
JAAMAS, 2009.
2. B. Rathnas., P. Doshi, and P. J. Gmytrasiewicz, Exact
solutions to interactive POMDPs using behavioral
equivalence, AAMAS, 2006.
3. P. Doshi and Y. Zeng, Improved approximation of
interactive dynamic influence diagrams using
discriminative model updates, AAMAS, 2009.
Acknowledgments
I thank Dr. Prashant Doshi and Dr. Yifeng Zeng
for their valuable contributions to this paper. This
research is partially supported by NSF CAREER
grant IIS-0845036 and AFOSR grant
#FA9550-08-1-0429 to Prof. Prashant Doshi (PI).