From: AAAI-94 Proceedings. Copyright © 1994, AAAI (www.aaai.org). All rights reserved.

Automatically Tuning Control Systems for Simulated Legged Robots

Robert Ringrose
MIT Leg Lab
MIT Artificial Intelligence Laboratory
545 Technology Square
Cambridge, MA 02139
ringrose@ai.mit.edu

Abstract

Rather than create a control system from scratch each time we build a new robot creature, we would like to generate control systems automatically. I have implemented an algorithm which, given a control system that works well for one creature, automatically tunes it to work for a new, similar creature. Using this approach, the control system for a horse might be adjusted for use with elephants, giraffes, and dogs. The adjustment is accomplished by gradually altering the original creature to make it like the new one and repeatedly tuning the control system as these changes are made. Because the creature’s alteration is gradual, the control system can be tuned using a local search such as gradient descent. In simulation tests, the tuning algorithm has successfully tuned the control system of a planar quadruped simulation to accommodate a reduction in leg length by a factor of two, an increase in body mass by a factor of three, and changes in the commanded speed while trotting.

Introduction

Within the domain of actively balanced legged locomotion, it is necessary to tune control systems to reflect physical alterations of the robot. I have designed and implemented a tuning algorithm which will tune an existing control system to control a different robot, or to exhibit different behavior. This algorithm has successfully tuned the control system of a planar quadruped simulation to accommodate a reduction in leg length by a factor of two, an increase in body mass by a factor of three, and changes in the commanded speed while trotting.¹

Any control system has some set of control parameters, numbers which determine how it performs.
For example, a parameter might control how rapidly it tries to accelerate to a desired speed or how much energy it injects at each step. For a complicated system, the appropriate values for these control parameters are not obvious.

¹ This material is based upon work supported under a National Science Foundation Graduate Research Fellowship. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author and do not necessarily reflect the views of the National Science Foundation.

There are several advantages to having a computer search for a set of parameters which minimizes an evaluation function rather than having a human tune the control system directly. Automatically tuning the control system does not require that a human with experience tuning invest a large amount of time. One can also specify the desired behavior without considering the interactions between any specific input parameters to the control system. Additionally, it is easier for a computer to optimize for something which is not obvious to a human, such as minimal energy consumption. Properly specifying the desired behavior is not a trivial task, but it seems easier than manually tuning the control system.

Other work on self-tuning controllers (Helferty, Collins, & Kam 1988), which frequently used searching techniques such as spacetime constraints (Witkin & Kass 1988) or genetic algorithms (Pearse, Arkin, & Ram 1992), has addressed similar problems. Tuning controllers for dynamically balanced legged systems is particularly challenging because there is typically only a small “sweet spot” near the global minimum where one can effectively evaluate the robot’s behavior. A parameter set outside this sweet spot will generally make the robot fall over or not take any steps, while a parameter set inside the sweet spot will make the robot run well enough that its performance can be evaluated objectively.
The vast majority of the possible parameter sets for dynamically balanced legged locomotion lie outside of the sweet spot. As a consequence, general search methods like genetic algorithms and simulated annealing will take a long time to find any working solution, and searches which follow a local slope will only find a useful minimum if they start in the sweet spot.

A frequently used method for getting around this challenging search space is to simplify the problem so as to drastically increase the sweet spot’s size (for example, one can add constraints to the model that prevent the robot from falling over). Instead of modifying the search space to suit my algorithm, however, I have attempted to adjust my algorithm to fit the search space.

Figure 1: Illustration of the simulated quadruped and the associated model. The actuators at the hips are implemented as torque sources. The leg actuator is implemented as a spring with controllable rest length and different constants in compression and extension. The simulation is a planar rigid-body model.

In the event that there is a parameter set which is in the sweet spot, a simple gradient descent search can find a local minimum. Additionally, most of the time a small change in the robot’s configuration will result in only a small change in the sweet spot. The tuning algorithm presented here uses this characteristic to break the search for a new set of parameters into a series of smaller searches for which minima are easier to find.

For example, assume the control system for a quadruped running simulation has been tuned to run with legs of a particular length. To find a set of control parameters for a quadruped running with legs half as long, the tuning algorithm gradually reduces the leg length and optimizes at several leg lengths between full and half length. As the leg length changes, the location of the sweet spot will change.
For small changes the sweet spot’s motion will usually be slight enough that the control parameters for the unchanged leg length will still be within the new sweet spot. The tuning process may fail if gradient descent cannot find a local minimum (the sweet spot might not be continuous), if the sweet spot changes dramatically with a small alteration in the robot, or if there is no way for the given control system to control the robot.

To find appropriate values for the control parameters, the tuning algorithm described in this paper starts with an existing control system, simulation, and parameter set. It finds out how far it can modify the simulation and still get acceptable behavior, makes that modification, and then uses a gradient descent search to improve the performance of the modified simulation. This process of modifying the simulation and re-tuning the control system is repeated until you have the desired final simulation.

Simulation

I have used a planar quadruped simulation to test the tuning algorithm. It retains enough complexity to illustrate most of the problems that come up, but is simple to visualize and easy to explain. The simulation is based on a physical robot described by Raibert (Raibert, Chepponis, & Brown 1986)(Raibert 1990). The simulation is a rigid body simulation, the dynamics of which are generated using a commercial dynamic modeling program (Rosenthal & Sherman 1986)(Ringrose 1992a). Simulation creation is automated so that it is possible to change and re-create any simulation as part of the tuning process. The planar quadruped which I used is illustrated in figure 1.

The simulated robot is controlled by a planar variation of the finite state controller for the Raibert trotting quadruped (Raibert et al. 1992). The control system uses measurements which could be sensed or calculated on a physical robot, such as position, velocity, actuator lengths, and ground contact. The control system’s behavior can be modified through 20 parameters, including maximum acceleration, desired speed, spring constants, and desired leg length during different running phases. Some previous investigations into robotic running are described in references (Hodgins & Raibert 1989)(Hodgins & Raibert 1991)(Playter & Raibert 1992)(Raibert et al. 1992). Further details of the controller and model are available in (Ringrose 1992b).

Searching for Solutions

In order to search for an appropriate set of control parameters, you need a way to compare the behavior generated by different sets of control parameters. I use the results from an evaluation function which simulates the creature and returns an objective measure of the creature’s performance. Most evaluation functions for dynamically stable running motions result in a search space whose structure makes global searching algorithms ineffective. However, once you have a reasonably good solution, a gradient descent search (Press et al. 1988) modified to take into account local minima (Ringrose 1992b) will frequently be able to find a better solution.

Figure 2: Graph of evaluation results over changes in leg spring and damping constants for the planar quadruped, using the evaluation function described in the text. The noisy, high-valued regions are outside the sweet spot. The right graph is the same as the left, with a lower maximum value imposed to emphasize the structure of the sweet spot.

Most evaluation functions for legged locomotion have a nearly pathological search space because if the parameters are out of a small sweet spot around the good solutions the simulated creature fails catastrophically, usually by falling over or not taking any steps. When such a failure occurs, a meaningful evaluation of performance is difficult since the causes of the simulation’s failure to trot are difficult to determine.
The creature could fall over because it stubs its toe, because the leg springs are not strong enough, because the swing legs do not come forward fast enough to catch the robot, or for some other reason. It is not difficult to make an evaluation function which recognizes when it is out of the sweet spot, but it is difficult to ensure that when outside the sweet spot the gradient of the evaluation function leads towards the sweet spot. Because of the inherent difficulty of evaluating a catastrophic failure, most evaluation functions only have a useful section near the global minimum and the rest is noise.

Evaluation functions usually have more dimensions than can be readily visualized, but cross sections can give an idea of the search space’s general structure. A two dimensional cross-section of the evaluation function is somewhat like a smooth canyon (the sweet spot) in noise, with the noise being uniformly higher-valued than the sweet spot. There are many parameter sets that do not generate information except to indicate that the creature failed (in the noisy section) and there is a smaller number of parameter sets where the creature may actually run well. Figure 2 illustrates the general shape of evaluation functions, using data from the simulated quadruped.

Evaluating Performance

In order to evaluate the performance of a set of control parameters, I created an evaluation function which reflects the fact that a control algorithm for a trotting quadruped needs to do more than propel the quadruped forward. Raibert’s experimentation in quadruped control suggests that it should (Raibert 1990):

- control the forward velocity.
- regulate the body attitude.
- put reasonable constraints on the forces and torques applied.
- limit the vertical motion of the body.
- keep the running cycle stable.

The evaluation function I used is the integration of the departures from these goals over the course of seven simulated seconds (Ringrose 1992b).
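As a rough illustration of how an integrated error measure of this kind might be computed, the Python sketch below sums weighted squared departures from four of the goals over a sampled run. The squared-error form, the weights, and every name in it (`Sample`, `evaluate_run`, `nominal_height`, `torque_limit`) are assumptions made for this example; it is not the evaluation function of (Ringrose 1992b), and the cycle-stability goal is not modeled.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Sample:
    """State logged at one simulation time step (illustrative fields only)."""
    forward_speed: float  # m/s
    pitch: float          # rad, body attitude relative to level
    height: float         # m, body height above the ground
    hip_torque: float     # N m, largest hip torque magnitude applied


def evaluate_run(samples: Iterable[Sample], dt: float, desired_speed: float,
                 nominal_height: float, torque_limit: float,
                 weights: Tuple[float, float, float, float] = (1.0, 1.0, 1.0, 1.0)
                 ) -> float:
    """Integrate weighted squared departures from the goals (lower is better)."""
    w_speed, w_pitch, w_height, w_torque = weights
    total = 0.0
    for s in samples:
        speed_err = s.forward_speed - desired_speed
        height_err = s.height - nominal_height
        # Torque is penalized only when it exceeds the assumed limit.
        torque_excess = max(0.0, abs(s.hip_torque) - torque_limit)
        total += dt * (w_speed * speed_err ** 2
                       + w_pitch * s.pitch ** 2
                       + w_height * height_err ** 2
                       + w_torque * torque_excess ** 2)
    return total
```

With a 0.01 s time step, seven simulated seconds correspond to 700 samples. A run that meets every goal at every step scores zero, and any sustained departure raises the score in proportion to how long it lasts.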
Seven seconds, the length of time over which the behavioral error is integrated, is several times the length of the step cycle, allowing transients to die out. This evaluation function does not guarantee that the running cycle is stable beyond the seven seconds of running actually simulated. In practice, however, if the simulation successfully trots for that length of time it is stable enough that it is unlikely to fail later.

Getting a Close Solution

A goal of this work is to be able to automatically tune the control system when there are large changes in the physical characteristics of the creature. When the simulation changes by a small amount, a good parameter set may no longer be locally optimal, but it may still be within the sweet spot. Figure 3 shows a cross section of a sample evaluation function and how it changes as the simulation is altered. If the simulation’s change is small enough that the new location of the sweet spot still overlaps the old set of control parameters, those original control parameters can be used with a local search such as gradient descent to find a new set of acceptable control parameters.

Generally, if there is a large physical change, the control parameters which were originally acceptable give poor results because the sweet spot moves too far. However, by splitting the large change into a series of smaller changes one can follow the motion of the sweet spot as the simulation changes.

Figure 3: Leg spring constant in compression as weight on the quadruped increases. Note how by staying in the minimum as weight is added to the quadruped’s trunk it is possible to find a spring constant for 50 kg which is within the sweet spot. Parameters other than the leg spring constant in compression are optimized for the appropriate weight.

In order to have the tuning process work efficiently, it is desirable to take large steps when possible.
I use a divide-and-conquer algorithm which splits large changes in half if necessary and recursively solves each half.

Set up the simulation with a fraction f which goes from 0 to 1, where 0 is the original configuration and 1 is the final configuration. Let F_a(P) represent running the simulation with the fraction f = a and the parameter set P, applying the evaluation function to the run, and returning the result. Let a and b be numbers between 0 and 1. Let P_a be a parameter set such that F_a(P_a) is “acceptable” (less than a user-defined constant). The algorithm used to find some P_b, a parameter set such that F_b(P_b) is acceptable, is:

1. If F_b(P_a) is “good enough” (less than a constant supplied by the user), P_b = P_a.
2. Otherwise, if F_b(P_a) is “acceptable”, P_b is the set of parameters arrived at by a gradient descent search with P_a as a starting point and using the model with fraction b until the result F_b(P_b) is “optimized” (less than another user-defined constant).
3. Otherwise:
   (a) Let c = (a + b)/2.
   (b) Recursively use this algorithm to find P_c, a parameter set such that F_c(P_c) is acceptable, from P_a, a, and c.
   (c) Recursively use this algorithm to find P_b from P_c, c, and b.

Setting the constants “optimized”, “good enough” and “acceptable” requires some care. If the level at which the simulation is considered “optimized” is too low, the gradient descent search will take a long time finding a good parameter set (recall that low evaluation results correspond to desired behavior). On the other hand, if “optimized” is too high, the gradient descent search could return a parameter set which is close to the edge of the sweet spot, reducing efficiency. The constant “good enough” corresponds to the point at which the control system performs well and does not need further optimization.
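The recursive procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used for the experiments: the function names, the plain finite-difference gradient descent (the paper's search is additionally modified to handle local minima), the `min_gap` recursion guard, and the one-parameter `toy_evaluate` function standing in for a simulation are all hypothetical.

```python
from typing import Callable, List

Params = List[float]
# (fraction f, parameter set) -> evaluation result; lower is better.
Evaluate = Callable[[float, Params], float]


def gradient_descent(evaluate: Evaluate, f: float, start: Params,
                     optimized: float, rate: float = 0.25,
                     h: float = 1e-4, max_iters: int = 1000) -> Params:
    """Finite-difference gradient descent at fraction f, run until the
    result drops below the 'optimized' threshold."""
    p = list(start)
    for _ in range(max_iters):
        if evaluate(f, p) < optimized:
            return p
        grad = []
        for i in range(len(p)):
            up = p[:i] + [p[i] + h] + p[i + 1:]
            down = p[:i] + [p[i] - h] + p[i + 1:]
            grad.append((evaluate(f, up) - evaluate(f, down)) / (2 * h))
        p = [x - rate * g for x, g in zip(p, grad)]
    raise RuntimeError("gradient descent did not reach the 'optimized' level")


def tune(evaluate: Evaluate, p_a: Params, a: float, b: float,
         optimized: float, good_enough: float, acceptable: float,
         min_gap: float = 1e-3) -> Params:
    """Given p_a acceptable at fraction a, return parameters acceptable at b."""
    score = evaluate(b, p_a)
    if score < good_enough:            # step 1: no tuning needed
        return p_a
    if score < acceptable:             # step 2: inside the sweet spot
        return gradient_descent(evaluate, b, p_a, optimized)
    if b - a < min_gap:                # guard added for this sketch
        raise RuntimeError("sweet spot moves too quickly to follow")
    c = (a + b) / 2                    # step 3: split the change in half
    p_c = tune(evaluate, p_a, a, c, optimized, good_enough, acceptable, min_gap)
    return tune(evaluate, p_c, c, b, optimized, good_enough, acceptable, min_gap)


def toy_evaluate(f: float, p: Params) -> float:
    """Toy stand-in for a simulation: a quadratic sweet spot of half-width 1
    whose center moves from 1 to 5 as f goes from 0 to 1, surrounded by a
    flat, uninformative plateau."""
    err = abs(p[0] - (1.0 + 4.0 * f))
    return err * err if err < 1.0 else 100.0


if __name__ == "__main__":
    final = tune(toy_evaluate, [1.0], 0.0, 1.0,
                 optimized=0.01, good_enough=0.05, acceptable=0.64)
    print(final, toy_evaluate(1.0, final))
```

In the toy problem the starting parameter lies on the plateau when evaluated at f = 1, so a direct local search from it has no slope to follow. The recursive split instead optimizes at intermediate fractions where the starting point is still inside the sweet spot, and carries each result forward.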
Because “good enough” is set higher than “optimized”, the conditions for beginning optimization are less rigorous than the conditions for ending optimization, so that each time the gradient descent search is used it is required to perform a non-trivial amount of work. Finally, “acceptable” corresponds to the edge of the sweet spot. If the result of the evaluation is too high, it is considered to be in the area where the evaluation function is essentially useless.

Note that the fractions at which the gradient descent search is used increase monotonically from 0, and the only time the parameter set P is modified is when the gradient descent search is used.

There are restrictions to this procedure. It must be possible to gradually change the parameter set and configuration from the initial parameter set and configuration to the final one, without leaving the sweet spot. Also, the control system must be able to control the new configuration. For example, the person designing the control system might neglect one of the moments of inertia, and in a new configuration that moment could be extremely important for stability. Since this tuning method will not alter the control system, it will not be able to address this type of problem. Additionally, the behavior for the initial configuration must be similar to the behavior required in the final configuration. If there is a drastic change in strategy involved, such as a change in gait, the tuning algorithm may not be able to find suitable parameters. Finally, if the global minimum is on the edge of the sweet spot, this type of tuning will be inefficient, although still functional.

Results

In order to evaluate the algorithm described in the previous sections, I applied it to adjusting the control of the simulated quadruped running machine shown in figure 1. I used a planar quadruped simulation because it allowed the solution of interesting problems with a reasonable amount of processing time.
I tested the algorithm for variations in leg length, body weight, and desired speed; it performed well on all of these. Due to space considerations, only the data on variations in leg length is included here.

The tuning algorithm was used to reduce the leg length to half its initial value, while maintaining the trotting gait. Interestingly, the problem of getting the quadruped to trot with half-length legs was more difficult than expected because of the very short travel allowed for the leg actuators.

Figure 4: Height above ground and forward velocity over time, included to illustrate that the trotting achieved is stable before and after tuning. Solid lines indicate original leg length and original parameter set and dashed lines indicate half leg length and the corresponding tuned parameter set. Note that the initial conditions remain the same, so the quadruped with shorter legs actually falls to the ground, stops, and begins trotting.

Table 1: Parameters modified while decreasing the quadruped leg length.

Parameter                                  Initial     Final
Maximum acceleration                       0.31284     0.36085
Leg spring constant, compression           7803.57     8731.33
Leg damping coefficient, compression       471.971     630.748
Leg spring constant, extension             21183.7     19637.5
Leg damping coefficient, extension         462.931     1071.51
Stance leg length                          0.62217     0.35498
Stance leg length increase                 0.08635     0.07645
Swing leg length                           0.45252     0.28027
Increase in swing hip torque with speed    -0.0042     -0.0058
Acceleration rate                          0.65024     0.05992
Hip servo                                  282.093     182.354
Hip damping                                21.3930     17.0266
Desired forward speed                      1.50000     0.34164

Units: m, N/m, Ns/m, N/m, Ns/m, m, m, NT, Nit, Nms, m/s

The initial configuration was the quadruped simulation and control system mentioned earlier. The final configuration was the same quadruped simulation and control system, with legs half as long and leg moments of inertia and masses scaled as cylinders.
The tuning experiment took three and a half days on an IBM RS/6000 model 550 to find the result listed in table 1. Figure 4 illustrates the stable running elicited by the original and final parameter sets when used with their respective simulations. Figure 5 shows the leg lengths at which the optimization occurred. Note that the original control parameters for full length legs will not work on the final quadruped.

Figure 5: Leg lengths where the tuner tried to optimize (dotted line) and leg lengths where it succeeded (solid line).

Conclusions

The tuning algorithm presented here has successfully solved several optimization problems relating to dynamically stable legged locomotion. All of these problems involve a planar trotting quadruped simulation and vary the amount of weight on the body, the leg length, or how closely it tracks a desired speed. I have also used this tuning algorithm to increase the amount of weight on the quadruped’s feet and to increase the running speed of a kangaroo-like robot.

Many searching methods fail when dealing with dynamically balanced legged locomotion because easily created evaluation functions tend to result in a search space which is only tractable near a solution. The tuning algorithm presented here succeeds because it makes the simplifying assumption that the tractable area moves slowly as the simulation is altered. This simplification allows the use of a fairly simple search within the tractable area.

Because of the assumptions behind it, there are limitations to the usefulness of this tuning algorithm. It must be possible to gradually change the parameter set and configuration from the initial parameter set and configuration to the final one, without leaving the sweet spot. If there is a drastic change in strategy involved, such as a change in gait, it may not be possible to gradually change the control parameters.
Also, the control system must be capable of controlling the new configuration, as the tuning algorithm will not alter the structure of the control system. Finally, if the global minimum is on the edge of the sweet spot, this tuning algorithm will be inefficient. Some of these limitations can be overcome by carefully constructing the evaluation function.

Even with its limitations, this tuning method will prove useful for modifying simulations and eliciting desired behaviors. I believe that tuning methods such as the one presented here will turn the art of tuning a simulation into the art of constructing an evaluation function: still an art, but one which is a little easier.

Acknowledgments

The author would like to thank Marc Raibert for his guidance during the course of this research.

References

Helferty, J. J.; Collins, J. B.; and Kam, M. 1988. A learning strategy for the control of a mobile robot that hops and runs. In Proceedings of the 1988 IASTED. International Association of Science and Technology for Development.

Hodgins, J. K., and Raibert, M. H. 1989. Biped gymnastics. International Journal of Robotics Research.

Hodgins, J. K., and Raibert, M. H. 1991. Adjusting step length for rough terrain locomotion. IEEE Transactions on Robotics and Automation 7(3).

Pearse, M.; Arkin, R.; and Ram, A. 1992. The learning of reactive control parameters through genetic algorithms. Proc. IEEE/RSJ International Conference on Intelligent Robots and Systems 1:130-137.

Playter, R. R., and Raibert, M. H. 1992. Control of a biped somersault in 3d. In IFToMM-jc International Symposium on Theory of Machines and Mechanisms.

Press, W. H.; Flannery, B. P.; Teukolsky, S. A.; and Vetterling, W. T. 1988. Numerical Recipes in C. Cambridge University Press. chapter 10, 290-352.

Raibert, M. H.; Hodgins, J. K.; Playter, R. R.; and Ringrose, R. P. 1992. Animation of maneuvers: Jumps, somersaults, and gait transitions. In Imagina.

Raibert, M. H.; Chepponis, M.; and Brown, Jr., B. 1986. Running on four legs as though they were one. IEEE Journal of Robotics and Automation RA-2(2).

Raibert, M. H. 1990. Trotting, pacing and bounding by a quadruped robot. Journal of Biomechanics 23.

Ringrose, R. 1992a. The creature library. Unpublished reference guide to a C library used to create physically realistic simulations.

Ringrose, R. 1992b. Simulated creatures: Adapting control for variations in model or desired behavior. Master’s thesis, Massachusetts Institute of Technology.

Rosenthal, D. E., and Sherman, M. A. 1986. High performance multibody simulations via symbolic equation manipulation and Kane’s method. Journal of Astronautical Sciences 34(3):223-239.

Winston, P. H., and Shellard, S. A., eds. 1990. Artificial Intelligence at MIT: Expanding Frontiers, volume 2. Cambridge, MA: MIT Press. 149-179.

Witkin, A., and Kass, M. 1988. Spacetime constraints. In Computer Graphics, 159-168.