Planning in Factored Hybrid-MDPs Milos Hauskrecht Department of Computer Science, University of Pittsburgh Branislav Kveton Intelligent Systems Program, University of Pittsburgh Carlos Guestrin Department of Computer Science, CMU Factored MDPs • Limitations Standard MDP formulation permits only flat discrete state and action spaces Real-world problems are complex, factored, and involve continuous quantities • Factored MDPs with discrete components State and action spaces grow exponentially in the number of variables Traditional methods for solving MDPs are polynomial in the size of state and action spaces • Factored MDPs with continuous components State and action spaces are infinitely large optimal value function or policy may not have finite support Naive discretization of continuous variables often leads to an exponential complexity of solution methods Hybrid Factored MDPs • A hybrid factored MDP (HMDP) is a 4-tuple M = (X, A, P, R): X = (XD, XC) is a vector of state variables (discrete or continuous) A = (AD, AC) is a vector of action variables (discrete or continuous) P and R are factored transition and reward models represented by a dynamic Bayesian network (DBN) A1 X1 X’1 R1 A2 X2 R2 t X’2 A3 X3 t+1 X’3 Optimal Value Function and Policy • Optimal policy π∗ maximizes the infinite horizon discounted reward: ∞ t E π ∑ γ R(x t , π(x t )) t =0 • Optimal value function V∗ is a fixed point of the Bellman equation: ∗ ∗ V (x ) = sup R(x, a )+ γ ∑ ∫ P(x′ | x, a )V (x′)dx′C a x ′D x ′C • Linear value function approximation A compact representation of an MDP does not guarantee the same structure in the optimal value function or policy Approximation by a linear combination of |w| basis functions: V w ( x ) = ∑ w i fi (x i ) i Hybrid Approximate Linear Programming • Hybrid approximate linear programming (HALP) formulation: Basis function relevance weight: α i = ∑ ∫ ψ (x )fi (x )dx C minimize w subject to : xD x C ∑w α ∑ w F (x, a )− R(x, a )≥ 0 i i i i i i ∀x ∈ X, a ∈ A Difference between the basis function fi(x) and its discounted backprojection: Fi (x, a ) = fi (x )− γ ∑ ∫ P(x′ | x, a )fi (x′)dx′C x ′D x ′C HALP solutions HALP formulation contains infinite number of constraints, ∑ w iFi (x , a)− R (x , a) ≥ 0, ∀x ∈ X , a ∈ A i Methods: • Monte Carlo (Hauskrecht, Kveton, 2004): • ε−HALP (Guestrin, Hauskrecht, Kveton, 2004): • Sample constraint space Discretize constraint space on the regular grid Take advantage of the structure Cutting plane methods on discrete state space Exponential in the treewidth of the problem Direct cutting plane methods (Kveton, Hauskrecht, 2005) Stochastic optimizations for finding a maximally violated constraint Scale-up potential • Problems with over 25 state and 20 action variables A8 X7 X10 Regulation device represented by a discrete action node A5 X11 Irrigation channel represented by a continuous variable A10 X17 A15 X2 A2 Inflow regulation device X5 X8 X16 A14 Outflow regulation device X14 A1 X4 A9 X13 A13 A4 X1 A6 X3 A3 X6 X15 A11 X9 A7 X12 A12 Large irrigation network Traffic Management • Recent surveys show Americans spend billions of hours in city traffic jams Congestions have grown in urban areas of every size Congestions have increasing economic impact The problem is too complex and growing rapidly No technology has provided a definite solution yet • Our approach Automatic traffic management to ease congestions Traffic management exhibits local structure with long-term global consequences Automatic Traffic Management XF-3 – # cars on Forbes Ave between Schenley Dr and S Bellefield Ave XF-2 – # cars on Forbes Ave between Bigelow Blvd and Schenley Dr XB-1 – # cars on Bigelow Blvd between 5th Ave and Forbes Ave XF-1 – # cars on Forbes Ave between S Bouquet St and Bigelow Blvd AF-2 – traffic lights at Forbes Ave and Schenley Dr AF-1 – traffic lights at Forbes Ave and Bigelow Blvd