Evolvable Hardware
John Mixter

Overview
• Motivation
• Artificial Neural Networks
• Genetic Algorithms
• Evolvable Hardware
• Neurograph Networks

Motivation
• The looming end of Moore’s Law
  • Previous barriers have been technology based. The barrier we are approaching now is physics based.
• The unforgiving nature of Amdahl’s Law
  • Only a portion of an application can be made parallel.
  • We are not very good at thinking (programming) in parallel.
• We have been following the von Neumann model since the late 1940s
  • For the most part our progress has been evolutionary (pipelines, caches, etc.).
  • We need to explore new revolutionary ideas.

Artificial Neurons
• An artificial neuron mimics the basic function of a biological neuron.
• The perceptron was one of the first models of a neuron. Frank Rosenblatt came up with the idea in 1957.
• A perceptron generates an output signal when the sum of its (input × weight) products reaches a threshold value:

  $$y = \sum_{i=0}^{N} x_i w_i, \qquad \text{output} = \begin{cases} 1 & \text{if } y \ge \text{threshold} \\ 0 & \text{otherwise} \end{cases}$$

• A perceptron is trained by adjusting the input weights.

Artificial Neural Networks
• Perceptrons are connected together in parallel to form artificial neural networks (ANNs).
• ANNs are arranged in layers: input, hidden and output.
• ANNs are trained by providing an input and testing the output. If the output is incorrect, new weight values are calculated from an error function and the training repeats.
• The training is indirect; ANNs cannot be programmed by hand.

ANNs are Awesome
• Artificial neural networks are massively parallel.
• One perceptron is useless; connect a bunch together and they become powerful.
• They are asynchronous; no clock is needed.
• They are very good at prediction and pattern recognition.

Example: Branch Prediction
• The branch address is used to select from a table of perceptrons.
• The history shift register is presented as inputs to the perceptron, up to 128 inputs.
• The prediction is calculated.
• After the branch direction is determined, it is compared against the predicted direction.
• If the actual direction taken does not agree with the prediction, the perceptron is trained:
  Error = Desired Output − Actual Output
  Correction = Learning Rate × Error
  w0 += (x0 × Correction)
Based on work done by Daniel A. Jiménez and Calvin Lin, University of Texas at Austin

Did it work?
[Figure: two charts, "Integer Benchmark Averages" and "Floating Point Benchmark Averages", plotting direction accuracy against hardware costs for the GAg, GAp, GShare, PAg, PAp and Neural predictors.]
SimpleScalar running SPEC2000 benchmarks

ANNs and FPGAs
• ANNs do not perform well in software. Why?
• They need to run and play in parallel.
• FPGAs seem like a good platform: lots of gates, and reconfigurable.
• Researchers have tried and failed to implement ANNs on FPGAs; routing is a problem.
• You need many perceptrons to make a good predictive network.
• Larger FPGAs offer hope.

Genetic Algorithms
Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation, by Scott Hauck and André DeHon, Morgan Kaufmann Publishers, © 2008, ISBN 9780123705228

An Example
• Generate 100 viable chromosomes and add them to the gene pool.
• Decode the chromosomes (build equations), using one 4-bit code per gene:

  Code  0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101
  Gene  0    1    2    3    4    5    6    7    8    9    +    -    *    /

  For example, 0110 1010 0101 1100 0100 1101 0010 1010 0001 decodes to 6 + 5 * 4 / 2 + 1.
• Evaluate the chromosomes:

  $$\text{Fitness} = \frac{1}{\text{Correct Answer} - \text{Calculated Answer}}$$

• Select the fittest chromosomes.
• Randomly mutate chromosomes.
• Randomly split/select genes for mating.
• Add the new generation to the gene pool.
Source: http://www.ai-junkie.com/ga/intro/gat1.html

Does it work?
• It finds a solution quickly.
• But the solution may not work for other answer sets, so it needs to keep evolving.
• Notice the chromosome: 010010110100100101110101110…
• It looks a lot like an FPGA bitstream.
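The decode and evaluate steps of the genetic algorithm example can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the code behind the slides: the function names are hypothetical, the expression is evaluated strictly left to right with no operator precedence, the fitness uses the absolute difference so that larger is always better, and the target value of 42 is made up for illustration.

```python
import operator

# Gene table from the slide: 4-bit codes for the digits 0-9 and four operators.
GENES = {format(i, "04b"): str(i) for i in range(10)}
GENES.update({"1010": "+", "1011": "-", "1100": "*", "1101": "/"})
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def decode(chromosome):
    """Split a bit string into 4-bit genes; unassigned codes (1110, 1111) are skipped."""
    genes = [chromosome[i:i + 4] for i in range(0, len(chromosome) - 3, 4)]
    return [GENES[g] for g in genes if g in GENES]


def evaluate(symbols):
    """Evaluate a number/operator/number... sequence strictly left to right (assumption).

    Returns None for malformed sequences or division by zero.
    """
    if len(symbols) % 2 == 0 or not symbols[0].isdigit():
        return None
    total = float(symbols[0])
    for op, num in zip(symbols[1::2], symbols[2::2]):
        if op not in OPS or not num.isdigit() or (op == "/" and num == "0"):
            return None
        total = OPS[op](total, float(num))
    return total


def fitness(target, value):
    """Slide formula 1 / (correct - calculated), using the absolute difference (assumption)."""
    if value is None:
        return 0.0  # chromosome did not decode to a usable equation
    if value == target:
        return float("inf")  # exact solution found
    return 1.0 / abs(target - value)


# The example chromosome from the slide.
chromosome = "".join("0110 1010 0101 1100 0100 1101 0010 1010 0001".split())
symbols = decode(chromosome)
value = evaluate(symbols)
print(" ".join(symbols), "=", value)
print("fitness for a hypothetical target of 42:", fitness(42, value))
```

Under the left-to-right assumption, the slide's chromosome evaluates to 23 (with normal operator precedence it would be 17), so the choice of evaluation rule changes which chromosomes count as fit.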
Evolvable Hardware
• Evolvable hardware physically changes to solve a given problem. It does this by dynamically reconfiguring its connections and functions.
• FPGAs are an excellent platform for evolvable hardware research.
• Bitstreams for partial reconfiguration are the chromosomes in the gene pool.
• An FPGA is initially configured as a 2D array of processing elements that have a fixed number of predefined functions.
• The generated chromosomes are fed directly into the FPGA as a reconfiguration frame that determines the function and connections of each processing element (PE).
• After the reconfiguration has taken place, the array is evaluated and the fittest chromosomes are selected, mutated and put into the gene pool.

FPGA-Based Evolvable Hardware
[Figure]

A Case Study
• In the paper “Towards Evolvable Systems Based on the Xilinx Zynq Platform”, a case study demonstrated an evolving image filter on a Zynq-7000.
• The filter consists of:
  • 9 processing elements (PEs)
  • 9 inputs, one for the pixel being filtered and 8 for its neighbors
  • 1 output, the filtered pixel
• A PE input can be connected to a filter input or to a PE output in the direction of the filter inputs.

Chromosome Format
The PE gene is encoded as follows:

  Input A   Input B   Operation
  0000      0000      0000

The chromosome length is (3 numbers × 3 × 3) + 1 = 28 numbers. Each number is 4 bits, so the total is 4 × 28 = 112 bits.

  Code  Operation     Description
  0000  255           constant
  0001  x             identity
  0010  255 − x       inversion
  0011  x ∨ y         bitwise OR
  0100  ¬x ∨ y        bitwise (NOT x) OR y
  0101  x ∧ y         bitwise AND
  0110  ¬(x ∧ y)      bitwise NAND
  0111  x ⊕ y         bitwise XOR
  1000  x >> 1        right shift by 1
  1001  x >> 2        right shift by 2
  1010  swap (x, y)   swap nibbles
  1011  x + y         addition
  1100  x +s y        addition with saturation
  1101  (x + y) >> 1  average
  1110  max (x, y)    maximum
  1111  min (x, y)    minimum

Filter Input
• To evolve the filter, the inputs are the target pixel (141) and its eight neighbors.

Filter Evolution
• The chromosome pool is filled.
• The chromosomes are injected to configure the PEs.
• The filter is given an input.
• The output is calculated by the filter.
• The results are evaluated by

  $$\text{fitness} = \sum_{i=0}^{c-1} \sum_{j=0}^{r-1} \left| p(i,j) - p_{\text{original}}(i,j) \right|$$

• The fittest chromosomes are selected to create the next generation. In this example, the fittest chromosomes are the ones with the smallest fitness values, zero being perfect.
• The process continues until a suitable filter is created.

Paper Results
The experiments focused on the time needed to evaluate a given number of generations and did not address how well the filter worked.

  Platform  Mutations  Individual build/eval  Generation build/eval  Generations built/evaluated  Acceleration
                       time (µs)              time (µs)              per second (s⁻¹)
  PS        7          225,285.3              901,141.1              1.1                          1
  i5        7          42,372.9               169,491.5              5.9                          5
  VRC       7          469.3                  1877.2                 532.7                        484
  DPR       1          206.2                  824.8                  1212.4                       1102
  DPR       2          247.2                  988.8                  1011.3                       919
  DPR       3          288.2                  1152.8                 867.5                        789
  DPR       4          329.2                  1316.8                 759.4                        690
  DPR       5          370.2                  1480.8                 675.3                        614
  DPR       6          411.2                  1644.8                 608                          553
  DPR       7          452.2                  1808.8                 552.9                        503
  DPR       8          493.2                  1972.8                 506.9                        461

  PS: pure software running on the Zynq-7000 on-chip processor (~3 W)
  i5: pure software running on an i5 @ 3.33 GHz (~80 W)
  VRC: Virtual Reconfigurable Circuits, a workaround for the absence of partial reconfiguration in early Virtex chips
  DPR: Dynamic Partial Reconfiguration

Neurograph Networks
• A neurograph network is a hybrid of an ANN and evolvable hardware.
• Neurograph networks are structured as a high-level ANN and are trained in a similar fashion to a perceptron.
• The network structure and connections are evolved using the hardware evolution techniques described earlier.
• Small specialized networks are evolved and stored, to be used by themselves or combined into larger, more powerful networks.
• The goal is to create super-massive parallel networks to predict, recognize and solve problems.
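Looking back at the image filter case study, the chromosome arithmetic and the fitness evaluation can be checked with a short Python sketch. It makes several assumptions that the slides do not state: the 27 PE numbers are taken to come first in the chromosome, in (input A, input B, operation) order, followed by the one extra number, whose meaning is left open here; the fitness uses absolute pixel differences; and all names are hypothetical.

```python
PE_COUNT = 9          # 3 x 3 array of processing elements
FIELDS_PER_PE = 3     # input A, input B, operation
BITS_PER_NUMBER = 4
CHROMOSOME_NUMBERS = PE_COUNT * FIELDS_PER_PE + 1      # 28 numbers
CHROMOSOME_BITS = CHROMOSOME_NUMBERS * BITS_PER_NUMBER  # 112 bits


def split_chromosome(bits):
    """Split a 112-bit string into nine (input_a, input_b, operation) genes.

    Returns the genes plus the one remaining 4-bit number. The field order
    is an assumption made for illustration.
    """
    if len(bits) != CHROMOSOME_BITS:
        raise ValueError(f"expected {CHROMOSOME_BITS} bits, got {len(bits)}")
    numbers = [int(bits[i:i + BITS_PER_NUMBER], 2)
               for i in range(0, len(bits), BITS_PER_NUMBER)]
    pe_numbers = numbers[:PE_COUNT * FIELDS_PER_PE]
    genes = [tuple(pe_numbers[i:i + FIELDS_PER_PE])
             for i in range(0, len(pe_numbers), FIELDS_PER_PE)]
    return genes, numbers[-1]


def fitness(filtered, original):
    """Sum of absolute pixel differences; 0 means the images match exactly."""
    return sum(abs(p - q)
               for row_f, row_o in zip(filtered, original)
               for p, q in zip(row_f, row_o))


# A made-up chromosome and two tiny made-up images, for illustration only.
genes, extra = split_chromosome("0001" * CHROMOSOME_NUMBERS)
score = fitness([[141, 140], [10, 0]], [[141, 138], [12, 0]])
print(len(genes), "PE genes, extra number", extra, ", fitness", score)
```

Because a lower score is better here, selection would keep the chromosomes with the smallest values, which is the opposite sense of the reciprocal fitness used in the earlier genetic algorithm example.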
Creating a Neurograph
[Figure]

5 x 5 Matrix Determinant
For the matrix

  a b c d e
  f g h i j
  k l m n o
  p q r s t
  u v w x y

Determinant =
  a(g(m(sy - tx) - n(ry - tw) + o(rx - sw)) - h(l(sy - tx) - n(qy - tv) + o(qx - sv)) + i(l(ry - tw) - m(qy - tv) + o(qw - rv)) - j(l(rx - sw) - m(qx - sv) + n(qw - rv)))
  - b(f(m(sy - tx) - n(ry - tw) + o(rx - sw)) - h(k(sy - tx) - n(py - tu) + o(px - su)) + i(k(ry - tw) - m(py - tu) + o(pw - ru)) - j(k(rx - sw) - m(px - su) + n(pw - ru)))
  + c(f(l(sy - tx) - n(qy - tv) + o(qx - sv)) - g(k(sy - tx) - n(py - tu) + o(px - su)) + i(k(qy - tv) - l(py - tu) + o(pv - qu)) - j(k(qx - sv) - l(px - su) + n(pv - qu)))
  - d(f(l(ry - tw) - m(qy - tv) + o(qw - rv)) - g(k(ry - tw) - m(py - tu) + o(pw - ru)) + h(k(qy - tv) - l(py - tu) + o(pv - qu)) - j(k(qw - rv) - l(pw - ru) + m(pv - qu)))
  + e(f(l(rx - sw) - m(qx - sv) + n(qw - rv)) - g(k(rx - sw) - m(px - su) + n(pw - ru)) + h(k(qx - sv) - l(px - su) + n(pv - qu)) - i(k(qw - rv) - l(pw - ru) + m(pv - qu)))

[Figure: the determinant drawn as a neurograph network. The matrix elements a through y feed nodes ne0 to ne127, each performing a ×, − or + operation, ending in a single Output.]

Project Goal
• To implement key neurograph functions in software.
• To determine the feasibility of implementing a neurograph network on an FPGA.
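As one possible software reference for the 5 x 5 determinant network shown earlier, the expanded formula is a cofactor (Laplace) expansion along the first row, which can be written recursively. The sketch below is an assumption about how such a check might look, not the project's implementation; the function name and the test matrix are made up.

```python
def determinant(matrix):
    """Determinant by cofactor expansion along the first row.

    Mirrors the structure of the expanded 5 x 5 formula: each first-row
    element multiplies the determinant of its minor, with alternating signs.
    """
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("matrix must be square")
    if n == 1:
        return matrix[0][0]
    if n == 2:
        return matrix[0][0] * matrix[1][1] - matrix[0][1] * matrix[1][0]
    total = 0
    for col in range(n):
        # Minor: drop the first row and the current column.
        minor = [row[:col] + row[col + 1:] for row in matrix[1:]]
        sign = -1 if col % 2 else 1
        total += sign * matrix[0][col] * determinant(minor)
    return total


# Made-up upper-triangular test matrix: its determinant is the product
# of the diagonal entries, 1 * 2 * 3 * 4 * 5.
example = [
    [1, 7, 3, 2, 9],
    [0, 2, 5, 1, 4],
    [0, 0, 3, 8, 6],
    [0, 0, 0, 4, 2],
    [0, 0, 0, 0, 5],
]
result = determinant(example)
print("determinant:", result)
```

The recursion bottoms out in 2 x 2 products and differences, the same ×, − and + operations that appear as node types in the network diagram, so the output of an evolved network could be compared against this function on sample matrices.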