CSC2535: Advanced Machine Learning
Lecture 11b
Adaptation at multiple time-scales
Geoffrey Hinton

An overview of how biology solves search problems
• Searching for good combinations can be very slow if it’s done in a naive way.
• Evolution has found many ways to speed up searches.
  – Evolution works too well to be blind. It is being guided.
  – It has discovered much better methods than the dumb trial-and-error method that many biologists seem to believe in.

Some search problems in Biology
• Searching for good genes and good policies for when to express them.
  – To understand how evolution is so efficient, we need to understand forms of search that work much better than random trial and error.
• Searching for good policies about when to contract muscles.
  – Motor control works much too well for a system with a 30 millisecond feedback loop.
• Searching for the right synapse strengths to represent how the world works.
  – Learning works much too well to be blind trial and error. It must be doing something smarter than just randomly perturbing synapse strengths.

A way to make searches work better
• In high-dimensional spaces, it is a very bad idea to try making multiple random changes.
  – It’s impossible to learn a billion synapse strengths by randomly changing synapses.
  – Once the system is significantly better than random, almost all combinations of random changes will make it worse.
• It is much more effective to compute a gradient and change things in the direction that makes things better.
  – That’s what brains are for. They are devices for computing gradients. What of?

A different way to make searches work better
• It is much easier to search a fitness landscape that has smooth hills rather than sharp spikes.
  – Fast adaptive processes can change the fitness landscape to make search much easier for slow adaptive processes.

An example of a fast adaptive process changing the fitness landscape for a slower one
• Consider the task of drawing on a blackboard.
  – It is very hard to do with a dumb robot arm:
    • If the robot positions the tip of the chalk just beyond the board, the chalk breaks.
    • If the robot positions the chalk just in front of the board, the chalk doesn’t leave any marks.
• We need a very fast feedback loop that uses the force exerted by the board on the chalk to stop the chalk.
  – Neural feedback is much too slow for this.

A biological solution
• Set the relative stiffnesses of opposing muscles so that the equilibrium point has the tip of the chalk just beyond the board.
• Set the absolute stiffnesses so that small perturbations from equilibrium only cause small forces (this is called “compliance”).
• The feedback loop is now in the physical system so it works at the speed of shockwaves in the arm.
  – The feedback in the physics makes a much nicer fitness landscape for learning how to set the muscle stiffnesses.

The energy landscape created by two opposing muscles
[Figure: physical energy in the opposing springs plotted against the location of the endpoint, with the start position and the location of the board marked.]
• The difference of the two muscle stiffnesses determines where the minimum is.
• The sum of the stiffnesses determines how sharp the minimum is.

Two fitness landscapes
• System that directly specifies joint angles
• System that specifies spring stiffnesses
[Figure: two plots, one per system, each showing fitness against neural signals.]

Objective functions versus programs
• By setting the muscle stiffnesses, the brain creates an energy function.
  – Minimizing this energy function is left to the physics.
  – This allows the brain to explore the space of objective functions (i.e. energy landscapes) without worrying about how to minimize the objective function.
• Slow adaptive processes should interact with fast ones by creating objective functions for them to optimize.
  – Think how a general interacts with soldiers. He specifies their goals.
  – This avoids micro-management.
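
The opposing-muscle idea above can be sketched numerically. This is a minimal illustration, not the model used in the lecture: it assumes two ideal linear springs anchored at -1 and +1 on a one-dimensional axis, and the names `spring_energy`, `equilibrium`, `stiffnesses_for` and `contact_force` are hypothetical. Under these assumptions the minimum sits at the stiffness difference divided by the stiffness sum, and the sum sets the curvature (how sharp the minimum is).

```python
# Assumed toy model: two ideal linear springs pulling the endpoint toward
# anchors at a_left = -1 and a_right = +1. All names here are hypothetical.

def spring_energy(x, k_left, k_right, a_left=-1.0, a_right=1.0):
    """Total elastic energy stored in the two springs at endpoint position x."""
    return 0.5 * k_left * (x - a_left) ** 2 + 0.5 * k_right * (x - a_right) ** 2


def equilibrium(k_left, k_right, a_left=-1.0, a_right=1.0):
    """Position of the energy minimum (stiffness-weighted mean of the anchors)."""
    return (k_left * a_left + k_right * a_right) / (k_left + k_right)


def stiffnesses_for(x_eq, total_stiffness):
    """Pick (k_left, k_right) with the given sum so the minimum lies at x_eq.

    Valid for the default anchors at -1 and +1 and for -1 < x_eq < 1.
    """
    k_left = total_stiffness * (1.0 - x_eq) / 2.0
    k_right = total_stiffness * (1.0 + x_eq) / 2.0
    return k_left, k_right


def contact_force(board_x, k_left, k_right):
    """Force pressing the chalk on a board that stops the endpoint at board_x.

    The restoring force toward equilibrium is (k_left + k_right) * (x_eq - x).
    """
    return (k_left + k_right) * (equilibrium(k_left, k_right) - board_x)


if __name__ == "__main__":
    board_x = 0.50   # assumed board location
    target = 0.52    # equilibrium placed just beyond the board
    for total in (1.0, 10.0, 100.0):
        k_l, k_r = stiffnesses_for(target, total)
        print(total, equilibrium(k_l, k_r), contact_force(board_x, k_l, k_r))
```

With the equilibrium held just beyond the board, lowering the total stiffness lowers the force on the chalk without moving the equilibrium, which is the compliance property described above.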

Generating the parts of an object
[Figure labels: “square” + pose parameters; sloppy top-down activation of parts; parts with top-down support; clean-up using lateral interactions specified by the layer above.]
• It’s like soldiers on a parade ground.

Another example of the same principle
• The principle: Use fast adaptive processes to make the search easier for slow ones.
• An application: Make evolution go a lot faster by using a learning algorithm to create a much nicer fitness landscape (the Baldwin effect).
• Almost all of the search is done by the learning algorithm, but the results get hard-wired into the DNA.
  – It’s strictly Darwinian even though it achieves most of what Lamarck wanted.

A toy example to explain the idea
• Consider an organism that has a mating circuit containing 20 binary switches. If exactly the right subset of the switches is closed, it mates very successfully. Otherwise not.
  – Suppose each switch is governed by a separate gene that has two alleles.
  – The search landscape for unguided evolution is a one-in-a-million spike.
• Blind evolution has to build about a million organisms to get one good one.
  – Even if it finds a good one, that combination of genes will almost certainly be destroyed in the next generation by crossover.

Guiding evolution with a fast adaptive process (godless intelligent design :-)
• Suppose that each gene has three alleles: ON, OFF, and “leave it to learning”.
  – ON and OFF are decisions hard-wired into the DNA.
  – “leave it to learning” means that on each learning trial, the switch is set randomly.
• Now consider organisms that have 10 switches hard-wired and 10 left to learning.
  – One in a thousand will have the correct hard-wired decisions, and with only about a thousand learning trials, all 20 switches will be correct.

The search tree
• Evolution can ask learning: “Am I correct so far?”
• Evolution: 1000 nodes
• Learning: 999,000 nodes
• 99.9% of the work required to find a good combination is done by learning.
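
The counts in the toy example can be checked with a short calculation. This is a sketch under stated assumptions rather than the simulation from the lecture: switches are treated as independent fair coin flips on every learning trial, and the function names are hypothetical. The slide figures of “a million” and “a thousand” are rounded versions of 2^20 and 2^10.

```python
import random


def blind_search_space(n_switches):
    """Number of distinct settings of n binary switches."""
    return 2 ** n_switches


def learning_share(n_switches, n_learned):
    """Fraction of the full search done by learning rather than evolution."""
    total = blind_search_space(n_switches)
    evolution_nodes = blind_search_space(n_switches - n_learned)
    return (total - evolution_nodes) / total


def p_success_within(n_learned, n_trials):
    """Chance that random learning trials hit the correct setting at least once.

    Assumes each trial sets every learned switch independently at random.
    """
    p_trial = 0.5 ** n_learned
    return 1.0 - (1.0 - p_trial) ** n_trials


def trials_until_correct(n_learned, max_trials, rng):
    """Simulate one organism whose hard-wired switches are already correct.

    Returns the trial on which all learned switches were right, or None.
    """
    target = rng.getrandbits(n_learned)  # arbitrary correct pattern
    for trial in range(1, max_trials + 1):
        if rng.getrandbits(n_learned) == target:
            return trial
    return None


if __name__ == "__main__":
    print(blind_search_space(20))        # size of the blind search
    print(blind_search_space(10))        # evolution's share with 10 hard-wired
    print(learning_share(20, 10))        # fraction left to learning
    print(p_success_within(10, 1000))    # success chance in 1000 trials
    print(trials_until_correct(10, 1000, random.Random(0)))
```

Under these assumptions a single organism with the right 10 hard-wired switches finds the full combination within 1000 trials with probability roughly 0.62, so “about a thousand learning trials” should be read as the expected order of magnitude rather than a guarantee.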
A learning trial is MUCH cheaper than building a new organism.

The results of a simulation (Hinton and Nowlan 1987)
• After building about 30,000 organisms, each of which runs 1000 learning trials, the population has nearly all of the correct decisions hard-wired into the DNA.
  – The pressure towards hard-wiring comes from the fact that with more of the correct decisions hard-wired, an organism learns the remaining correct decisions faster.
• This suggests that learning performed almost all of the search required to create brain structures that are currently hard-wired.

Using the dynamics of neural activity to speed up learning
• A Boltzmann machine has an inner-loop iterative search to find a locally optimal interpretation of the current visible vector.
  – Then it updates the weights to lower the energy of the locally optimal interpretation.
• An autoencoder can be made to use the same trick: It can do an inner-loop search for a code vector that is better at reconstructing the input than the code vector produced by its feedforward encoder.
  – This speeds the learning if we measure the learning time in number of input vectors presented to the autoencoder (Ranzato, PhD thesis, 2009).

Major Stages of Biological Adaptation
• Evolution keeps inventing faster inner loops to make the search easier for slower outer loops:
  – Pure evolution: each iteration takes a lifetime.
  – Development: each iteration of gene expression takes about 20 minutes. The developmental process may be optimizing objective functions specified by evolution (see next slide).
  – Learning: each iteration takes about a second.
  – Inference: In one second, a neural network can perform many iterations to find a good explanation of the sensory input.

The three-eyed frog
• The two retinas of a frog connect to its tectum in a way that tries to satisfy two conflicting goals:
  – 1. Each point on the tectum should receive inputs from corresponding points on the two retinas.
  – 2. Nearby points on one retina should go to nearby points on the tectum.
• A good compromise is to have interleaved stripes on the tectum.
  – Within each stripe all cells receive inputs from the same retina.
  – Neighboring stripes come from corresponding places on the two retinas.

What happens if you give a frog embryo three eyes?
• The tectum develops interleaved stripes of the form: LMRLMRLMR…
  – This suggests that in the normal frog, the interleaved stripes are not hard-wired.
  – They are the result of running an optimization process during development (or learning).
• The advantage of this is that it generalizes much better to unforeseen circumstances.
  – It may also be easier for the genes to specify goals than the details of how to achieve them.

The next great leap?
• Suppose that we let each biological learning trial consist of specifying a new objective function.
• Then we use computer simulation to evaluate the objective function in about one second.
  – This creates a new inner loop that is millions of times faster than a biological learning trial.
• Maybe we are on the brink of a major new stage in the evolution of biological adaptation methods. We are in the process of adding a new inner loop:
  – Evolution, development, learning, simulation

THE END