STANFORD

Learning Vehicular Dynamics, with Application to Modeling Helicopters
Pieter Abbeel, Varun Ganapathi, Andrew Y. Ng

Overview
Model-based reinforcement learning has been very successful. In the state of the art, reinforcement learning returns policies that fly well in simulation, and the remaining helicopter failures are typically caused by an inaccurate simulator. The key technical challenge is therefore building an accurate simulator. Contributions: acceleration prediction, and a longer time-scale learning criterion. Experiments use two RC helicopters, the Bergen Industrial Twin and the XCell Tempest.

Our approach: encode all constraints known from physics (gravity, inertia, etc.); learn only the parts of the model not determined by physics; explicitly learn a simulator that is predictive at long time-scales.

Result: a significantly improved helicopter model, and the first autonomous funnel (an aerobatic maneuver) flown using our model.

First autonomous funnel
An aerobatic maneuver, flown using model-based reinforcement learning. Video available. Acknowledgments: the control work is joint with Adam Coates and Ben Tse (paper forthcoming).

Models in prior work
Predict velocities and angular rates: a function f, learned from data, maps the current state and inputs to the velocities and angular rates at the next time step. Position and orientation are then obtained by numerical integration.

Shortcomings
From physics, the velocity at time t+1 is determined by the rotation between the body coordinate frames at times t and t+1 and by the accelerations. Because the body coordinate frame is different at every time step, inertia becomes highly nonlinear in the state and very difficult to capture or learn from data. For most physical systems, forces and torques have a fairly simple relation to the inputs and the current state; this simplicity is lost by the change of coordinate frame.

Our acceleration prediction model
Predict accelerations: f, learned from data, maps the current state and inputs to linear and angular accelerations. Velocity, angular rates, position and orientation are then obtained by numerical integration.
Advantages: there is no need to learn inertia from data, and constraints from physics are incorporated explicitly.
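To make the acceleration prediction model concrete, here is a minimal Python/NumPy sketch of one integration step. It is an illustration under stated assumptions, not the authors' implementation: the learned part f is taken to be a linear map with hypothetical matrices C and D, integration is explicit Euler, gravity is 9.81 m/s² along the world's negative z axis, and the angular accelerations come entirely from the learned part. Physics supplies gravity (rotated into the body frame) and the rotating-frame term $-\omega \times v$ through which inertia enters, so the data only has to explain the remainder.

```python
import numpy as np

# Assumption: world z axis points up and g = 9.81 m/s^2.
GRAVITY_WORLD = np.array([0.0, 0.0, -9.81])


def skew(w):
    """Matrix [w]_x such that skew(w) @ x == np.cross(w, x)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])


def rotation_increment(w, dt):
    """Rodrigues' formula for exp([w]_x dt): body rotation over one time step."""
    rate = np.linalg.norm(w)
    if rate * dt < 1e-12:
        return np.eye(3)
    k = skew(w / rate)
    theta = rate * dt
    return np.eye(3) + np.sin(theta) * k + (1.0 - np.cos(theta)) * (k @ k)


def step(pos, R, v, w, u, C, D, dt):
    """One explicit Euler step of an acceleration-based model.

    pos: world position; R: body-to-world rotation matrix;
    v, w: body-frame velocity and angular rate; u: the 4 control inputs.
    C (6x6), D (6x4): hypothetical learned matrices. Only this part of the
    acceleration is learned; gravity and the frame rotation come from physics.
    """
    learned = C @ np.concatenate([v, w]) + D @ u         # [linear(3), angular(3)]
    gravity_body = R.T @ GRAVITY_WORLD                   # gravity in body coordinates
    v_dot = learned[:3] + gravity_body - np.cross(w, v)  # -w x v: rotating frame
    w_dot = learned[3:]
    pos_next = pos + dt * (R @ v)                        # integrate position (world)
    R_next = R @ rotation_increment(w, dt)               # integrate orientation
    return pos_next, R_next, v + dt * v_dot, w + dt * w_dot
```

For example, with C and D set to zero and the helicopter at rest and level, one step of 0.1 s gives a body velocity of (0, 0, -0.981) m/s: free fall, with gravity handled exactly rather than learned from data.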
A third advantage is that the relation between state, inputs and accelerations is not cluttered by the change of coordinate frame, and is thus easier to learn from data.

Helicopter state and inputs
The full 12-D state consists of position, orientation, velocity and angular rates. Encoding symmetries with body (robot-centric) coordinates, i.e., a coordinate frame attached to the helicopter, reduces this to an 8-D state. The inputs are:
u1, u2: the longitudinal (front-back) and lateral (left-right) cyclic pitch controls cause the helicopter to pitch forward/backward or roll sideways.
u3: the tail rotor collective pitch control affects the tail rotor thrust and can be used to yaw (turn) the helicopter.
u4: the main rotor collective pitch control affects the main rotor's thrust.

Standard learning criteria
Frequency-domain fitting requires a linear model; it is used in CIFER, the industry standard. Alternatively, one minimizes the one-step prediction error, $\min_f \sum_t \| s_{t+1} - f(s_t, u_t) \|^2$. For f linear in the state s and inputs u, f can be found by linear regression.

Longer time-scale criterion
Accuracy of the simulation over longer time-scales is important for control. [Abbeel & Ng, 2004] suggested a longer time-scale criterion that scores the model on its predictions up to H steps ahead (H: the time-scale of interest). Maximizing that criterion with the EM algorithm is expensive in our continuous state-action setting. We present a simple and fast algorithm for (approximately) minimizing the average squared prediction error over a duration H, i.e., $\sum_t \sum_{h=1}^{H} \| s_{t+h} - \hat{s}_{t+h|t} \|^2$, where $\hat{s}_{t+h|t}$ is the model's prediction of $s_{t+h}$ from $s_t$ and the inputs $u_t, \dots, u_{t+h-1}$.

Sketch of the algorithmic idea (see the paper for the full algorithm):
Model: $s_{t+1} = A s_t + B u_t$.
One-step prediction at time t: $\hat{s}_{t+1|t} = A s_t + B u_t$.
One-step prediction at time t+1: $\hat{s}_{t+2|t+1} = A s_{t+1} + B u_{t+1}$.
Two-step prediction at time t: $\hat{s}_{t+2|t} = A \hat{s}_{t+1|t} + B u_{t+1} = A (A s_t + B u_t) + B u_{t+1}$, i.e., the one-step prediction at time t+1 evaluated at the predicted state $\hat{s}_{t+1|t}$ instead of the observed $s_{t+1}$.
Therefore, multiple-step dynamics can be approximated by a linear combination of one-step dynamics. Our algorithm iterates the following two steps:
1. For the current model A, B, compute an estimate of $s_{t+1}$ given $s_t$, $u_t$, $u_{t+1}$.
2. Estimate A, B.

Simulator accuracy
Models compared on the XCell Tempest and the Bergen Industrial Twin: linear model, one-step prediction error; linear model, frequency-domain fit with CIFER; linear model, longer time-scale prediction error; acceleration model, one-step prediction error; acceleration model, longer time-scale prediction error.
The acceleration prediction model is significantly better.
Reasons: it captures gravity exactly, and it captures inertia, and thus the side-slip effects present in the data. The longer time-scale criterion outperforms CIFER, which in turn outperforms the one-step criterion. The differences are more significant for the Tempest than for the Bergen, since the Bergen data is mostly around hover.

[Figure: XCell Tempest, position and orientation (roll, pitch, yaw); legend: observations, longer time-scale criterion.]

Conclusion
The key technical challenge for model-based reinforcement learning applied to helicopters is building an accurate simulator.
Our approach: by using the acceleration-based model, we encode all constraints known from physics (gravity, inertia, etc.), learn only the parts of the model not determined by physics, and explicitly learn a simulator that is predictive at long time-scales.
Result: a significantly improved helicopter model, and the first autonomous funnel (an aerobatic maneuver) flown using our model.
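The difference between the one-step criterion and the longer time-scale criterion can be illustrated with a minimal Python/NumPy sketch for the linear model $s_{t+1} \approx A s_t + B u_t$. The function names `fit_one_step` and `h_step_error` are hypothetical. The first fits A, B by linear regression (the one-step criterion); the second evaluates a model on the longer time-scale criterion by rolling it forward up to H steps from each observed state, driven by the recorded inputs. This only evaluates the criterion; it is not the paper's algorithm for minimizing it.

```python
import numpy as np


def fit_one_step(S, U):
    """Fit s_{t+1} ~ A s_t + B u_t by linear regression (one-step criterion).

    S: (T+1, n) array of states s_0..s_T; U: (T, m) array of inputs u_0..u_{T-1}.
    """
    X = np.hstack([S[:-1], U])                  # regressors [s_t, u_t], shape (T, n+m)
    Y = S[1:]                                   # targets s_{t+1}, shape (T, n)
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)   # least squares: X @ W ~ Y
    n = S.shape[1]
    return W[:n].T, W[n:].T                     # A (n, n), B (n, m)


def h_step_error(A, B, S, U, H):
    """Longer time-scale criterion: mean squared error of 1..H step predictions.

    From every observed s_t, the model is rolled forward with the recorded
    inputs, and each prediction s_hat_{t+h|t} is compared with s_{t+h}.
    """
    T = U.shape[0]
    total, count = 0.0, 0
    for t in range(T - H + 1):
        s_hat = S[t]
        for h in range(1, H + 1):
            s_hat = A @ s_hat + B @ U[t + h - 1]   # one more simulated step
            total += float(np.sum((S[t + h] - s_hat) ** 2))
            count += 1
    return total / count
```

On noise-free data generated by a known stable A, B, the regression recovers the true matrices and the H-step error is then essentially zero. On real flight data, a one-step fit need not minimize the H-step error, which is why the choice of criterion matters.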