Solving Multi-agent Sequential Decision Making Problems Using I-DIDs

Muthukumaran Chandrasekaran
Computer Science Department, The University of Georgia
mkran@uga.edu

Introduction

Interactive dynamic influence diagrams (I-DIDs) are graphical models for sequential decision making in uncertain settings shared by other agents. The challenge is that the space of candidate models ascribed to the other agents grows exponentially over time. Redundant and/or 'equivalent' models are therefore pruned to reduce the model space. We present a new approximation technique that reduces the candidate model space by removing models that are ε-subjectively equivalent (ε-SE) with representative ones, where ε is the approximation factor.

Background

I-DIDs have nodes (decision (rectangle), chance (oval), utility (diamond), model (hexagon)), arcs (functional, conditional, informational), and links (policy (dashed), model update (dotted)), as shown in Fig 1. I-DIDs are graphical counterparts of I-POMDPs [1].

Approach

The goal is to reduce the model space. First, we prune behaviorally equivalent (BE) models [2], whose behavioral predictions for the agent are identical. We further reduce the space by pruning subjectively equivalent (SE) models, which induce an approximately identical distribution over the subject agent's future action-observation history (Fig 5), and replacing them with a representative. We compared the approach empirically with the state-of-the-art approach, DMU [3], and bounded the approximation error.

Implementation

Exact and ε-SE solutions were implemented using the HUGIN Java API for DIDs.

Discussion

The set of SE models includes those that are BE. It further includes models that induce identical distributions over the subject agent's action-observation paths but could be behaviorally distinct over paths that have zero probability. These models are not BE and are called (strictly) observationally equivalent. Fig 6 shows a recursive way to compute the distribution over the subject agent's action-observation history.

Test domains: the multi-agent tiger problem (Fig 2) and the multi-agent machine maintenance problem.

The quality of the solution generated using ε-SE improves as we reduce ε and approaches that of the exact solution, which indicates flexibility (Fig 3).

Compared to DMU, ε-SE obtains higher rewards for an identical number of initial models, which indicates a more informed clustering and pruning, although it is less efficient (Fig 4).

Problems: scalability issues due to the curse of history (the distribution computations are time and space consuming).

References

1. P. Doshi, Y. Zeng, and Q. Chen, Graphical models for interactive POMDPs: Representations and solutions, JAAMAS, 2009.
2. B. Rathnas., P. Doshi, and P. J. Gmytrasiewicz, Exact solutions to interactive POMDPs using behavioral equivalence, AAMAS, 2006.
3. P. Doshi and Y. Zeng, Improved approximation of interactive dynamic influence diagrams using discriminative model updates, AAMAS, 2009.

Acknowledgments

I thank Dr. Prashant Doshi and Dr. Yifeng Zeng for their valuable contributions to this paper. This research is partially supported by NSF CAREER grant IIS-0845036 and AFOSR grant #FA9550-08-1-0429 to Prof. Prashant Doshi (PI).
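
Illustrative sketch

The ε-SE pruning step described under Approach can be illustrated with a small Python sketch. It is not the HUGIN-based implementation. It assumes that each candidate model has already been reduced to a distribution over the subject agent's action-observation paths, and it uses an L1 distance between those distributions as a stand-in divergence measure, which is an assumption and not necessarily the measure used in this work. The model names, paths, and probabilities are hypothetical.

```python
from typing import Dict, List, Tuple

Path = Tuple[str, ...]      # one action-observation path of the subject agent
Dist = Dict[Path, float]    # probability of each path under one candidate model


def l1_distance(p: Dist, q: Dist) -> float:
    """L1 distance between two path distributions (assumed divergence measure)."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def prune_epsilon_se(models: Dict[str, Dist], eps: float) -> Dict[str, List[str]]:
    """Greedily group models whose path distributions lie within eps of a
    representative. Returns a mapping: representative -> models it stands for."""
    clusters: Dict[str, List[str]] = {}
    for name, dist in models.items():
        for rep in clusters:
            if l1_distance(models[rep], dist) <= eps:
                clusters[rep].append(name)   # name is eps-SE with rep: prune it
                break
        else:
            clusters[name] = [name]          # no close representative: keep it
    return clusters


# Hypothetical candidate models with made-up path probabilities.
models = {
    "m1": {("listen", "growl-left"): 0.50, ("listen", "growl-right"): 0.50},
    "m2": {("listen", "growl-left"): 0.52, ("listen", "growl-right"): 0.48},
    "m3": {("listen", "growl-left"): 0.90, ("listen", "growl-right"): 0.10},
}

clusters = prune_epsilon_se(models, eps=0.1)
print(clusters)
```

With eps=0.1 the sketch keeps m1 as the representative of m2 and retains m3 separately. With eps=0 only models with identical distributions are merged, which mirrors the observation that solution quality approaches the exact solution as ε is reduced.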