First Autonomous Funnel

Learning Vehicular Dynamics, with Application to Modeling Helicopters
Pieter Abbeel, Varun Ganapathi, Andrew Y. Ng
Stanford

Overview

Model-based reinforcement learning has been very successful.
Key technical challenge: building an accurate simulator.

State of the art:
- Reinforcement learning returns policies that fly well in simulation.
- Remaining helicopter failures are typically caused by inaccurate simulation.

Our approach:
- Encode all constraints known from physics (gravity, inertia, etc.).
- Learn only the parts of the model not determined by physics.
- Explicitly learn a simulation that is predictive at long time-scales.

Result:
- Significantly improved helicopter model.
- First autonomous funnel (aerobatic maneuver) using our model.

RC Helicopters

- Bergen Industrial Twin
- XCell Tempest

Helicopter State and Inputs

12-D state:
- Position
- Orientation: roll, pitch, yaw
- Velocity
- Angular rates

Encoding symmetries using body (robot-centric) coordinates gives an 8-D state. The body coordinate frame is attached to the helicopter.

Inputs:
- u1, u2: The longitudinal (front-back) and latitudinal (left-right) cyclic pitch controls cause the helicopter to pitch forward/backward or roll sideways.
- u3: The tail rotor collective pitch control affects tail rotor thrust, and can be used to yaw (turn) the helicopter.
- u4: The main rotor collective pitch control affects the main rotor's thrust.

Models in Prior Work

- Predict velocities and angular rates with a function f learned from data.
- Obtain position and orientation from numerical integration.

From physics we have a relation in terms of two quantities: the rotation between the body coordinate frames at times t and t+1, and the accelerations.

Shortcomings:
- The body coordinate frame is different at every time step. This makes inertia highly nonlinear in the state and very difficult to capture or learn from data.
- For most physical systems, forces and torques have a fairly simple relation to inputs and current state. This simplicity is lost by the change of coordinate frame.

Our acceleration prediction model

- Predict accelerations with a function f learned from data.
- Obtain velocity, angular rates, position and orientation from numerical integration.

Advantages:
- No need to learn inertia from data.
- Constraints from physics are incorporated explicitly.
- The relation between state, inputs and accelerations is not cluttered by the change of coordinate frame, and is thus easier to learn from data.

Standard learning criteria

- Frequency domain fitting: requires a linear model; used in CIFER (industry standard).
- Minimize one-step prediction error. For f linear in state s and inputs u, f can be found by linear regression.

Longer time-scale criterion

Accuracy of simulation over longer time-scales is important for control. A longer time-scale criterion, with H the time-scale of interest, was suggested in [Abbeel & Ng, 2004].

The EM algorithm for maximization is expensive in our continuous state-action space setting. We present a simple and fast algorithm for (approximately) minimizing the average squared error over a certain duration.

Sketch of algorithmic idea (see paper for full algorithm):
The sketch considers the model, its one-step predictions at time t and at time t+1, and the resulting two-step prediction at time t. Therefore, multiple-step dynamics can be approximated by a linear combination of one-step dynamics.

Our algorithm iterates the following two steps:
- Compute an estimate of s_{t+1} given s_t, u_t, u_{t+1} for the current model A, B.
- Estimate [...]

Simulator Accuracy

[Plots: Bergen Industrial Twin; XCell Tempest]

Legend:
- Linear model, one-step prediction error.
- Linear model, frequency domain fit with CIFER.
- Linear model, longer time-scale prediction error.
- Acceleration model, one-step prediction error.
- Acceleration model, longer time-scale prediction error.

Observations:
- The acceleration prediction model is significantly better. Reasons: it captures gravity exactly, and it captures inertia, thus side-slip effects in the data.
- The longer time-scale criterion outperforms CIFER, which in turn outperforms the one-step criterion.
- Differences are more significant for the Tempest than for the Bergen, since the Bergen data is mostly around hover.

First Autonomous Funnel

- Aerobatic maneuver.
- Method: model-based reinforcement learning.
- Simulator: acceleration prediction, longer time-scale criterion.

Acknowledgments: control is joint work with Adam Coates, Ben Tse. (Paper forthcoming.) Video available.

Conclusion

Key technical challenge for model-based reinforcement learning applied to helicopters: building an accurate simulator.

Our approach:
- By using an acceleration-based approach, we can encode all constraints known from physics (gravity, inertia, etc.).
- Learn only the parts of the model not determined by physics.
- Explicitly learn a simulation that is predictive at long time-scales.

Result:
- Significantly improved helicopter model.
- First autonomous funnel (aerobatic maneuver) using our model.
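The point that the body coordinate frame changes at every time step can be illustrated with a small sketch. It assumes a planar (2-D, yaw-only) vehicle and a simple Euler step, which is a deliberate simplification and not the helicopter model from the poster; the function names `step_body_velocity` and `body_to_world` are hypothetical. Acceleration is integrated in the body frame at time t, and the result is then re-expressed in the body frame at time t+1.

```python
import math


def step_body_velocity(vx, vy, yaw_rate, ax, ay, dt):
    """One Euler step of planar velocity expressed in body coordinates.

    (ax, ay) is the acceleration in the body frame at time t. In a learned
    model it would come from the function f; here it is simply passed in.
    """
    # Integrate the acceleration in the body frame at time t.
    vx_new = vx + ax * dt
    vy_new = vy + ay * dt
    # Re-express the result in the body frame at time t+1, which has
    # rotated by yaw_rate * dt relative to the frame at time t.
    dpsi = yaw_rate * dt
    c, s = math.cos(dpsi), math.sin(dpsi)
    return c * vx_new + s * vy_new, -s * vx_new + c * vy_new


def body_to_world(vx, vy, yaw):
    """Rotate a body-frame velocity into the fixed world frame."""
    c, s = math.cos(yaw), math.sin(yaw)
    return c * vx - s * vy, s * vx + c * vy


# Coast with zero acceleration while turning at a constant yaw rate.
dt, yaw_rate = 0.05, 1.0
vx, vy, yaw = 1.0, 0.0, 0.0
for _ in range(100):
    vx, vy = step_body_velocity(vx, vy, yaw_rate, 0.0, 0.0, dt)
    yaw += yaw_rate * dt

wx, wy = body_to_world(vx, vy, yaw)
print("body-frame velocity:", (vx, vy))
print("world-frame velocity:", (wx, wy))
```

With zero acceleration the world-frame velocity stays constant, while the body-frame velocity changes at every step purely because the frame rotates. In this sketch that effect is handled by the explicit rotation rather than by anything learned, which is the sense in which an acceleration-based model does not need to learn inertia from data.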
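The difference between the one-step criterion and a longer time-scale criterion can be sketched for a scalar linear model. Everything here is an assumption made for illustration: the model form s[t+1] = a*s[t] + b*u[t], the synthetic data, and the exact form of the H-step error (an average of squared open-loop simulation errors over all start times and all steps up to H). It evaluates the criterion only; it is not the iterative algorithm from the poster, whose full form is in the paper.

```python
import random


def simulate(a, b, s0, inputs):
    """Roll the scalar model s[t+1] = a*s[t] + b*u[t] forward from s0."""
    states = [s0]
    for u in inputs:
        states.append(a * states[-1] + b * u)
    return states


def fit_one_step(states, inputs):
    """Linear regression for (a, b): minimizes the summed squared one-step error."""
    prev, nxt = states[:-1], states[1:]
    # Entries of the 2x2 normal equations.
    ss = sum(s * s for s in prev)
    su = sum(s * u for s, u in zip(prev, inputs))
    uu = sum(u * u for u in inputs)
    sy = sum(s * y for s, y in zip(prev, nxt))
    uy = sum(u * y for u, y in zip(inputs, nxt))
    det = ss * uu - su * su
    return (sy * uu - uy * su) / det, (ss * uy - su * sy) / det


def horizon_error(a, b, states, inputs, H):
    """Average squared error of open-loop simulation over horizons 1..H.

    From every start time t, the model is simulated H steps ahead using only
    the recorded inputs, and each step is compared to the recorded state.
    """
    total, count = 0.0, 0
    for t in range(len(inputs) - H + 1):
        pred = states[t]
        for h in range(H):
            pred = a * pred + b * inputs[t + h]
            total += (pred - states[t + h + 1]) ** 2
            count += 1
    return total / count


random.seed(0)
T = 200
inputs = [random.uniform(-1.0, 1.0) for _ in range(T)]
clean = simulate(0.9, 0.5, 0.0, inputs)  # noise-free data from a known model
noisy = [s + random.gauss(0.0, 0.05) for s in clean]  # add measurement noise

a_hat, b_hat = fit_one_step(noisy, inputs)
print("fitted (a, b):", (a_hat, b_hat))
print("one-step error:", horizon_error(a_hat, b_hat, noisy, inputs, H=1))
print("20-step error:", horizon_error(a_hat, b_hat, noisy, inputs, H=20))
```

With H=1 the function reduces to the mean one-step squared error that the regression minimizes. Larger H measures how errors compound when the model is run open-loop, which is the quantity that matters when the simulator is used for control.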
