First Autonomous Funnel

Learning Vehicular Dynamics, with Application to Modeling Helicopters
Pieter Abbeel, Varun Ganapathi, Andrew Y. Ng
Overview
Model-based reinforcement learning has been very successful.
 State-of-the-art:
 Reinforcement learning returns policies that fly well in simulation.
 Remaining helicopter failures typically caused by inaccurate simulation.
 Key technical challenge: Building an accurate simulator.
 Our approach:
 Encode all constraints known from physics (gravity, inertia, etc.). Learn only the parts of the model not determined by physics.
 Explicitly learn a simulation that is predictive at long time-scales.
 Result:
 Significantly improved helicopter model.
 First autonomous funnel (aerobatic maneuver) using our model.

First Autonomous Funnel
Aerobatic maneuver.
Method: model-based reinforcement learning.
Simulator:
 Acceleration prediction.
 Longer time-scale criterion.
Acknowledgments: control is joint work with Adam Coates and Ben Tse. (Paper forthcoming.)
Video available.

Models in Prior Work
Predict velocities and angular rates:
 f: learned from data.
 Obtain position and orientation from numerical integration.
From physics we have a relation involving:
 Rotation between body coordinate frames at times t and t+1.
 Accelerations.
Shortcomings
 The body coordinate frame is different at every time step. This makes inertia highly nonlinear in the state and very difficult to capture/learn from data.
 For most physical systems, forces and torques have a fairly simple relation to inputs and current state. This simplicity is lost by the change of coordinate frame.
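The shortcoming of body-frame velocity models noted above can be made concrete with a minimal planar sketch. This is an illustration only, not the poster's model: the function names and the 2-D simplification are assumptions. A vehicle coasts at constant world-frame velocity while yawing, so no force acts on it, yet its body-frame velocity changes at every step.

```python
import math

def world_to_body(v_world, yaw):
    """Express a planar world-frame vector in the body frame of a vehicle with heading `yaw` (radians)."""
    c, s = math.cos(yaw), math.sin(yaw)
    vx, vy = v_world
    # Apply the inverse (transpose) of the body-to-world rotation.
    return (c * vx + s * vy, -s * vx + c * vy)

def body_velocity_trace(v_world, yaw_rate, dt, steps):
    """Body-frame velocity at each time step for a constant world-frame velocity (hypothetical helper)."""
    return [world_to_body(v_world, yaw_rate * dt * k) for k in range(steps + 1)]

# Constant world velocity of 1 m/s along x, while yawing 90 degrees over two steps.
trace = body_velocity_trace((1.0, 0.0), yaw_rate=math.pi / 2, dt=0.5, steps=2)
for k, (vx_body, vy_body) in enumerate(trace):
    print(k, round(vx_body, 3), round(vy_body, 3))
```

The speed stays at 1 m/s throughout, but its body-frame components rotate. A model that predicts body-frame velocities directly has to learn this purely geometric effect from data.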

Simulator Accuracy
RC Helicopters
Our acceleration prediction model
Bergen Industrial Twin
Predict accelerations:
 f: learned from data.
 Obtain velocity, angular rates, position and orientation from numerical integration.

Advantages
 No need to learn inertia from data. Constraints from physics are incorporated explicitly.
 The relation between state, inputs and accelerations is not cluttered by the change of coordinate frame, and is thus easier to learn from data.
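A minimal sketch of the acceleration-prediction idea, under simplifying assumptions: motion is restricted to a vertical plane, integration is a simple Euler step, and `accel_model` is a hypothetical stand-in for the learned function f. The model returns a body-frame acceleration (excluding gravity) and an angular acceleration. Gravity is added explicitly, and velocity, angular rate, position and orientation follow from numerical integration.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2

def rotate(vec, angle):
    """Rotate a 2-D vector counter-clockwise by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * vec[0] - s * vec[1], s * vec[0] + c * vec[1])

def step(state, controls, accel_model, dt):
    """One Euler step. state = (x, z, pitch, vx, vz, pitch_rate), velocities in the world frame."""
    x, z, theta, vx, vz, q = state
    v_body = rotate((vx, vz), -theta)            # velocity as seen from the body frame
    a_body, alpha = accel_model(v_body, q, controls)
    ax, az = rotate(a_body, theta)               # predicted acceleration, back in the world frame
    az -= G                                      # gravity is encoded exactly, not learned
    vx, vz, q = vx + ax * dt, vz + az * dt, q + alpha * dt
    return (x + vx * dt, z + vz * dt, theta + q * dt, vx, vz, q)

def rollout(state, controls_seq, accel_model, dt):
    """Apply `step` once per control input and return the final state."""
    for controls in controls_seq:
        state = step(state, controls, accel_model, dt)
    return state

# Hypothetical placeholder models standing in for a learned f.
def no_thrust(v_body, q, controls):
    return (0.0, 0.0), 0.0

def hover_thrust(v_body, q, controls):
    return (0.0, G), 0.0

falling = rollout((0.0, 0.0, 0.3, 0.0, 0.0, 0.0), [None] * 100, no_thrust, 0.01)
hovering = rollout((0.0, 0.0, 0.0, 0.0, 0.0, 0.0), [None] * 100, hover_thrust, 0.01)
```

With no thrust the vehicle accelerates downward at g regardless of its pitch, and a level vehicle whose thrust equals g stays at rest. In this sketch the learned part never has to represent gravity or the change of coordinate frame.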

XCell Tempest
Standard learning criteria
Bergen Industrial Twin
Frequency domain fitting: requires a linear model, used in CIFER (industry standard).
 Minimize one-step prediction error:
 For f linear in state s and inputs u: f can be found by linear regression.
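The one-step criterion for a linear model can be sketched as follows. The formula itself did not survive in the text above, so this assumes the usual form: minimize the sum over t of the squared difference between s_{t+1} and f(s_t, u_t). For brevity the sketch uses a scalar state and a scalar input, s_{t+1} = a*s_t + b*u_t, and solves the 2x2 normal equations in closed form. All names are illustrative.

```python
def simulate(a, b, s0, inputs):
    """Roll out s_{t+1} = a*s_t + b*u_t and return the list of states."""
    states = [s0]
    for u in inputs:
        states.append(a * states[-1] + b * u)
    return states

def fit_one_step(states, inputs):
    """Least-squares estimate of (a, b) minimizing the one-step prediction error."""
    sxx = sxu = suu = sxy = suy = 0.0
    for s, u, s_next in zip(states[:-1], inputs, states[1:]):
        sxx += s * s
        sxu += s * u
        suu += u * u
        sxy += s * s_next
        suy += u * s_next
    det = sxx * suu - sxu * sxu
    if abs(det) < 1e-12:
        raise ValueError("states and inputs are collinear; (a, b) is not identifiable")
    # Cramer's rule on the normal equations.
    return (sxy * suu - sxu * suy) / det, (sxx * suy - sxu * sxy) / det

inputs = [1.0, 0.0, -1.0, 0.5, 0.0, 2.0, -0.5, 1.0]
states = simulate(0.9, 0.5, 0.0, inputs)
a_hat, b_hat = fit_one_step(states, inputs)
```

On this noise-free data the fit recovers the generating parameters. With real flight data the one-step fit can still drift over longer simulations.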

Plot legend:
 Linear model, one-step prediction error.
 Linear model, frequency domain fit with CIFER.
 Linear model, longer time scale prediction error.
 Acceleration model, one-step prediction error.
 Acceleration model, longer time scale prediction error.
Accuracy of simulation over longer time-scales is important for control. The following longer time-scale criterion was suggested in [Abbeel & Ng, 2004] (H: time-scale of interest):
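A longer time-scale error of the kind described above can be sketched as follows. The exact formula did not survive in the text, so this assumes a common form: simulate the model open-loop from each recorded state for up to H steps and average the squared differences from the recorded states. The scalar model and all names are illustrative.

```python
def lagged_error(a, b, states, inputs, horizon):
    """Mean squared error of open-loop predictions 1..horizon steps ahead for s_{t+1} = a*s_t + b*u_t."""
    if horizon < 1 or horizon > len(inputs):
        raise ValueError("horizon must be between 1 and len(inputs)")
    total, count = 0.0, 0
    for t in range(len(inputs) - horizon + 1):
        pred = states[t]                          # start from the recorded state
        for h in range(1, horizon + 1):
            pred = a * pred + b * inputs[t + h - 1]   # feed predictions back in
            total += (pred - states[t + h]) ** 2
            count += 1
    return total / count

# Recorded data from a system that holds its state (true a = 1), no inputs.
states = [1.0] * 6
inputs = [0.0] * 5

# A model with a = 0.9 has a small one-step error that compounds over time.
one_step = lagged_error(0.9, 0.0, states, inputs, horizon=1)
two_step = lagged_error(0.9, 0.0, states, inputs, horizon=2)
```

Here the one-step error is about 0.01 while the two-step average is about 0.023. A modest one-step error can hide a model that drifts quickly in simulation.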
Helicopter State and Inputs

 12-D state, with labeled components including velocity and angular rates.
 Encoding symmetries using body (= robot-centric) coordinates gives an 8-D state.
 Body coordinate frame: attached to helicopter.
Inputs:
 u1, u2: The longitudinal (front-back) and latitudinal (left-right) cyclic pitch controls cause the helicopter to pitch forward/backward or roll sideways.
 u3: The tail rotor collective pitch control affects tail rotor thrust, and can be used to yaw (turn) the helicopter.
 u4: The main rotor collective pitch control affects the main rotor's thrust.

EM-algorithm for maximization is expensive in our continuous state-action space setting.
We present a simple and fast algorithm for (approximately) minimizing the average squared error over a certain duration.
 Sketch of algorithmic idea (see paper for full algorithm)
 Model:
 One step prediction at time t:
 One step prediction at time t+1:
 Two step prediction at time t:
 Therefore, can approximate multiple-step dynamics by linear combination of one-step dynamics.
 Our algorithm iterates the following two steps:
 Compute estimate of s_{t+1} given s_t, u_t, u_{t+1} for current model A, B.
 Estimate
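The sketch above can be illustrated under explicit assumptions. The equations did not survive in the text, so this assumes the linear model s_{t+1} = A s_t + B u_t implied by "current model A, B". The two-step prediction is then A(A s_t + B u_t) + B u_{t+1}. If the inner one-step prediction is held fixed at its current estimate, the two-step prediction is again linear in A and B and can be fit by ordinary least squares. The code below is one possible reading for a scalar state and input, with illustrative names. It is not the authors' full algorithm, which is given in the paper and may differ.

```python
def least_squares(rows):
    """Fit y ~ a*x + b*u from (x, u, y) rows via the 2x2 normal equations."""
    sxx = sxu = suu = sxy = suy = 0.0
    for x, u, y in rows:
        sxx += x * x
        sxu += x * u
        suu += u * u
        sxy += x * y
        suy += u * y
    det = sxx * suu - sxu * sxu
    if abs(det) < 1e-12:
        raise ValueError("regressors are collinear")
    return (sxy * suu - sxu * suy) / det, (sxx * suy - sxu * sxy) / det

def fit_two_step(states, inputs, iterations=20):
    """Alternate between predicting s_{t+1} with the current model and re-fitting (a, b)."""
    n = len(inputs)
    one_step_rows = [(states[t], inputs[t], states[t + 1]) for t in range(n)]
    a, b = least_squares(one_step_rows)           # initialize from the one-step fit
    for _ in range(iterations):
        # Step 1: estimate s_{t+1} from s_t, u_t under the current model.
        s_hat = [a * states[t] + b * inputs[t] for t in range(n - 1)]
        # Step 2: re-fit, treating those estimates as fixed regressors for s_{t+2}.
        two_step_rows = [(s_hat[t], inputs[t + 1], states[t + 2]) for t in range(n - 1)]
        a, b = least_squares(one_step_rows + two_step_rows)
    return a, b

# Noise-free data generated with a = 0.9, b = 0.5.
inputs = [1.0, 0.0, -1.0, 0.5, 0.0, 2.0, -0.5, 1.0]
states = [0.0]
for u in inputs:
    states.append(0.9 * states[-1] + 0.5 * u)

a_fit, b_fit = fit_two_step(states, inputs)
```

On noise-free data the true parameters are a fixed point of this iteration, so the fit returns them. Convergence on noisy data is not guaranteed by this sketch.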
Acceleration prediction model significantly better.
Reasons:
 Captures gravity exactly.
 Captures inertia, thus side-slip effects in the data.
 Longer time scale criterion outperforms CIFER, which in turn outperforms the one-step criterion.
 Differences more significant for Tempest than for Bergen, since Bergen data is mostly around hover.


Position
Observations
Legend
Longer time-scale criterion
Orientation: roll, pitch, yaw
XCell Tempest
Conclusion
Key technical challenge for model-based reinforcement learning applied to helicopters: building an accurate simulator.
 Our approach
 By using an acceleration-based approach, we can encode all constraints known from physics (gravity, inertia, etc.). Learn only the parts of the model not determined by physics.
 Explicitly learn a simulation that is predictive at long time-scales.
 Result
 Significantly improved helicopter model.
 First autonomous funnel (aerobatic maneuver) using our model.
