NEATwise
Justin Chen & Gabriele Giaroli
April 21st, 2016
EVOLVING HETEROGENEOUS ARTIFICIAL NEURAL NETWORKS THROUGH COMBINATORIALLY GENERATED PIECEWISE ACTIVATION FUNCTIONS
Introduction
Our original idea:
- Optimize the NEAT algorithm by introducing piecewise activation functions: "NEATwise"
Results:
- NEATwise performs comparably to original NEAT on its hardest benchmark
- Combinatorially generated piecewise activation functions are competitive with conventional activation functions!
Motivation
- Yann LeCun, Yoshua Bengio & Geoffrey Hinton, "Deep Learning" (2015), Nature
- ReLU activation functions are regarded as the most effective for CNNs
- Combinatorially generated piecewise activation functions are unexplored
- Activation functions model neurons' resting and activated states
[Figure: the ReLU activation function]
What is NEAT
NEAT stands for “NeuroEvolution of Augmenting Topologies”, developed by Professors
Kenneth O. Stanley and Risto Miikkulainen et al. in 2001
Genetic Algorithm
Stochastic reinforcement learning control tasks
Protean n-polytopic space
NEAT’s 3 main characteristics:
Growing from Minimal Structure
Principled Crossover
Speciation
What is NEATwise
- NEAT + piecewise activation functions
- Sample pool of canonical activation functions: ReLU, ELU, Bent Identity, ArcTan, TanH, Sine
- Combinatorially generated piecewise activation functions (sketched below)
- Heterogeneous networks
- Capable of differentiable and non-differentiable functions
[Figure: ArcTan and Sine piecewise activation function]
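To make "combinatorially generated piecewise" concrete, here is a minimal sketch in Python. The split point at x = 0 and the piece ordering are our assumptions (the slide's figure suggests them but does not state them):

```python
import math

def make_piecewise(f, g, split=0.0):
    """Stitch two canonical activations into one piecewise activation.

    Applies f below the split point and g at or above it; the result is in
    general non-differentiable (and possibly discontinuous) at the split.
    """
    return lambda x: f(x) if x < split else g(x)

# The ArcTan-and-Sine combination shown in the figure (split at 0 assumed)
arctan_sine = make_piecewise(math.atan, math.sin)
print(arctan_sine(-1.0), arctan_sine(1.0))  # atan(-1) = -0.785..., sin(1) = 0.841...
```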
Approach
Initially:
- Selected canonical activation functions that could be combined into piecewise differentiable functions
- Analyzed the NEAT code and made modifications to create NEATwise
- Realized a uniform selection function makes NEATwise fail on the Double Pole Balancing No Velocity (DPNV) benchmark
Then, changed approach to a restricted version of NEATwise:
- Tested original NEAT with each canonical function to find the best-performing canonical functions
Finally:
- Thoroughly tested Sigmoid/Arctan-NEATwise (SA-NEATwise) on DPNV
Canonical Activation Functions
1. Sigmoid
2. Tanh (Hyperbolic Tangent)
3. ReLU (Rectified Linear Unit)
4. Arctan
5. ELU (Exponential Linear Unit)
6. BentId
7. Sine
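For reference, the seven functions above in code, using their standard textbook definitions (the ELU's alpha = 1.0 is an assumed default; the slides do not specify it):

```python
import math

# Standard definitions of the seven canonical activation functions
def sigmoid(x): return 1.0 / (1.0 + math.exp(-x))
def tanh(x):    return math.tanh(x)
def relu(x):    return max(0.0, x)
def arctan(x):  return math.atan(x)
def elu(x, alpha=1.0):  # alpha = 1.0 is an assumed default
    return x if x > 0.0 else alpha * (math.exp(x) - 1.0)
def bent_id(x): return (math.sqrt(x * x + 1.0) - 1.0) / 2.0 + x
def sine(x):    return math.sin(x)
```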
Uniform Random Selection
- NEATwise fails when selecting activation functions uniformly at random
- Uniform random noise destroys the genetic algorithm's search bias
- Activation function selection requires a bias (see the sketch below)
- Redesigned our approach to use homogeneous networks as a baseline
- The activation functions of the two best-performing homogeneous networks, Arctan and Sigmoid, were selected for further study
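A hypothetical sketch of biased selection: the pool is restricted to the two chosen functions and each new node draws from a weighted distribution rather than uniformly. The 87.5%/12.5% split matches the one used later in SA-NEATwise; the helper itself is illustrative, not the authors' code:

```python
import random

# Restricted two-function pool with the 87.5% / 12.5% split used in SA-NEATwise.
# A uniform split over the full pool destroyed the GA's search bias.
POOL    = ["arctan", "sigmoid"]
WEIGHTS = [0.875, 0.125]

def pick_activation(rng=random):
    """Draw the activation function for a newly added node: biased, not uniform."""
    return rng.choices(POOL, weights=WEIGHTS, k=1)[0]
```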
Homogeneous Network Baselines
- 7 canonical functions
- Version: NEAT.1.2.1
- Test: Double Pole Balancing No Velocity
- Number of tests: 20 per function
- Parameter file: p2nv.ne
[Figure: a homogeneous ELU network]
Homogeneous Network Baselines
Average accuracy for each function over 20 runs of original NEAT; the out-of-box configuration (NEAT's default) was run 100 times for a fair comparison later on:

Function      Accuracy
Out-of-box    96.70%
Arctan        94.20%
ELU           87.90%
Sine          87.40%
Tanh          84.10%
BentId        82.05%
ReLU          80.00%
Sigmoid       54.20%
Benchmark
Double Pole Balancing No Velocity:
- Main benchmark used by original NEAT
- Hardest of the pole balancing tests
Easier variants:
- Single Pole with Velocity
- Single Pole without Velocity
- Double Pole experiment
- Double Pole with Velocity
- XOR approximation
Performance Metrics
Metrics collected (100 experiments/instance, 100 instances):
- Evaluations
- Generalization
- Network size
- Activation function distribution

Example test results (failures: 0 out of 100 runs):
- Evaluation info: total # evals: 2,968,877; total # samples: 100; average # evals: 29,688.8
- Generalization info: total generalization score: 26,741; total generalization champs: 100; average generalization: 267
- Network size info: total nodes: 17,923; total samples: 3,026; average network size: 5.923
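As a quick sanity check, the reported averages follow directly from the totals above (a minimal sketch; the numbers are the slide's own):

```python
# Totals reported in the example test results above
total_evals = 2_968_877
eval_samples = 100
total_generalization = 26_741
generalization_champs = 100
total_nodes = 17_923
node_samples = 3_026

print(total_evals / eval_samples)                    # 29688.77 -> "average # evals: 29,688.8"
print(total_generalization / generalization_champs)  # 267.41   -> "average generalization: 267"
print(total_nodes / node_samples)                    # 5.9230   -> "average network size: 5.923"
```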
Sigmoid-Arctan NEATwise III
Activation functions: Arctan(x) & Sigmoid(x)
- 50% bias towards Arctan, 50% bias towards Sigmoid
- Slope: unaltered
- Discontinuous (see the sketch below)
- Piecewise functions: Sigmoid-Arctan, Arctan-Sigmoid
Results: 0 successful experiments, 100 failed experiments
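The discontinuity is easy to see: with unaltered slopes the two pieces disagree at the joint, since Sigmoid(0) = 0.5 while Arctan(0) = 0. A sketch, assuming the pieces meet at x = 0 and this piece ordering (both are our assumptions):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_arctan(x):
    # Unaltered slopes: the value jumps from ~0.5 to 0 as x crosses 0
    return sigmoid(x) if x < 0.0 else math.atan(x)

print(sigmoid_arctan(-1e-9), sigmoid_arctan(0.0))  # ~0.5 vs 0.0: a jump at the joint
```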
[Figure: population statistics, 50-50 SA-NEATwise run 1, damaged Genome 468]
Genome 468:
- Nodes 1 & 262: Arctan
- Nodes 2, 4 & 5: Sigmoid
- Nodes 3 & 10: Arctan-Sigmoid
- Node 155: Sigmoid-Arctan
Sigmoid-Arctan NEATwise II
Activation functions: Arctan(x) & Sigmoid(x)
- 87.5% bias towards Arctan, 12.5% bias towards Sigmoid
- Slope: unaltered
- Discontinuous
- Piecewise functions: Sigmoid-Arctan, Arctan-Sigmoid
Results: 87 successful experiments, 13 failed experiments

[Figure: population statistics, Sigmoid-Arctan NEATwise run 1, damaged Genome 350]
Genome 350:
- Nodes 1, 2, 4, 5 & 87: Arctan
- Node 3: Sigmoid-Arctan
- Node 45: Arctan-Sigmoid

[Figure: population statistics, Sigmoid-Arctan NEATwise run 88, Genome 175]
Run 88: accuracy 99%, avg network size 6.04, avg evals 35,984.10, avg generalization 263.00
Genome 175:
- Nodes 1, 3, 4, 6, 158 & 388: Arctan
- Node 2: Sigmoid-Arctan
Sigmoid-Arctan NEATwise I
Activation functions: Arctan(x) & Sigmoid(4.92*x) - 0.5
- 87.5% bias towards Arctan, 12.5% bias towards Sigmoid
- Continuous (see the sketch below)
- Piecewise functions: Sigmoid-Arctan, Arctan-Sigmoid
Results: 88 successful experiments, 12 failed experiments
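Rescaling and shifting the sigmoid is what makes this variant continuous: Sigmoid(4.92 * 0) - 0.5 = 0 = Arctan(0), so the two pieces agree at the joint. A sketch (split point and piece ordering are our assumptions):

```python
import math

def shifted_sigmoid(x):
    # Sigmoid(4.92 * x) - 0.5 passes through the origin
    return 1.0 / (1.0 + math.exp(-4.92 * x)) - 0.5

def arctan_sigmoid(x):
    return math.atan(x) if x < 0.0 else shifted_sigmoid(x)

print(shifted_sigmoid(0.0), math.atan(0.0))  # both 0.0: the pieces agree at the joint
```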
[Figure: population statistics, Sigmoid-Arctan NEATwise run 17, damaged Genome 177]
Genome 177:
- Nodes 1 & 2: Arctan
- Node 3: Sigmoid-Arctan
- Nodes 4, 5 & 46: Sigmoid

[Figure: population statistics, Sigmoid-Arctan NEATwise run 27, Genome 14]
Run 27: accuracy 100%, avg network size 6.53, avg evals 33,701.00, avg generalization 271.00
Genome 14:
- Nodes 1 & 2: Arctan-Sigmoid
NEATwise MARI/O
- Generations: 274
- Max fitness: 4115
- Training: 5 days
- 87.5% bias for Arctan, 12.5% bias for Sigmoid
- 4.924273 slope for Sigmoid
- Discontinuous
NEAT vs. NEATwise

Method        Accuracy  Evaluations  Generalization  No. Nodes
NEAT          96.70%    28,925       269             5.72
NEATwise I    97.27%    36,452       279             6.56
NEATwise II   98.39%    36,792       266             6.28
NEATwise III  0.00%     0            0               0
Future work
- Gaussian NEATwise to select N activation functions
- Expanding the choices for canonical function selection
- Analyzing the optimal bias for choosing each activation function of the piecewise
- Multi-piece piecewise activation functions
- Combinatorially generated piecewise activation functions in different architectures
- Analyzing network structures to explain hidden failures
- As the problem domain changes, how do you determine the optimal non-topological hyperparameters such as drop-off age, mutation rate, initial population size, etc.?
Conclusions
1. Combinatorially generated piecewise activation functions outperform conventional choices
2. Automatic model selection
3. Enhanced expressive power
4. NEATwise does not require parameter-tuned slopes
5. Slightly decreased average generalization
6. Requires negligibly more evaluations
7. Adaptable to various problem domains
References
1. Stanley, Kenneth O., and Risto Miikkulainen. "Evolving neural networks through augmenting topologies." Evolutionary Computation 10.2 (2002): 99-127.
2. LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. "Deep learning." Nature 521.7553 (2015): 436-444.
3. Stanley, Kenneth O. "Compositional pattern producing networks: A novel abstraction of development." Genetic Programming and Evolvable Machines 8.2 (2007): 131-162.
4. Nair, Vinod, and Geoffrey E. Hinton. "Rectified linear units improve restricted Boltzmann machines." Proceedings of the 27th International Conference on Machine Learning (ICML-10). 2010.
5. Stanley, Kenneth O., Bobby D. Bryant, and Risto Miikkulainen. "Evolving adaptive neural networks with and without adaptive synapses." The 2003 Congress on Evolutionary Computation (CEC'03). Vol. 4. IEEE, 2003.
6. Stanley, Kenneth O., David B. D'Ambrosio, and Jason Gauci. "A hypercube-based encoding for evolving large-scale neural networks." Artificial Life 15.2 (2009): 185-212.
7. Agostinelli, Forest, et al. "Learning activation functions to improve deep neural networks." arXiv preprint arXiv:1412.6830 (2014).
8. Yao, Xin. "Evolving artificial neural networks." Proceedings of the IEEE 87.9 (1999): 1423-1447.
9. Turner, Andrew James, and Julian Francis Miller. "NeuroEvolution: Evolving heterogeneous artificial neural networks." Evolutionary Intelligence 7.3 (2014): 135-154.
10. White, D., and P. Ligomenides. "GANNet: A genetic algorithm for optimizing topology and weights in neural network design." Proc. Int. Workshop on Artificial Neural Networks (IWANN'93), Lecture Notes in Computer Science, vol. 686. Springer-Verlag, 1993. 322-327.
11. Liu, Y., and X. Yao. "Evolutionary design of artificial neural networks with different nodes." Proc. 1996 IEEE Int. Conf. on Evolutionary Computation (ICEC'96). Nagoya, Japan, 1996. 670-675.
12. Hwang, M. W., J. Y. Choi, and J. Park. "Evolutionary projection neural networks." Proc. 1997 IEEE Int. Conf. on Evolutionary Computation (ICEC'97). 1997. 667-671.
13. Whitley, Darrell. "A genetic algorithm tutorial." Statistics and Computing 4.2 (1994): 65-85.
14. Duch, Włodzisław, and Norbert Jankowski. "Survey of neural transfer functions." Neural Computing Surveys 2.1 (1999): 163-212.
15. Duch, Włodzisław, and Norbert Jankowski. "Transfer functions: hidden possibilities for better neural networks." ESANN. 2001.
16. Weingaertner, Daniel, et al. "Hierarchical evolution of heterogeneous neural networks." Proceedings of the 2002 Congress on Evolutionary Computation (CEC'02). Vol. 2. IEEE, 2002.
17. Turner, Andrew James, and Julian Francis Miller. "Cartesian genetic programming encoded artificial neural networks: a comparison using three benchmarks." Proceedings of the 15th Annual Conference on Genetic and Evolutionary Computation. ACM, 2013.
18. Sebald, Anthony V., and Kumar Chellapilla. "On making problems evolutionarily friendly part 1: evolving the most convenient representations." Evolutionary Programming VII. Springer Berlin Heidelberg, 1998.
19. Annunziato, M., et al. "Evolving weights and transfer functions in feed forward neural networks." Proc. EUNITE 2003, Oulu, Finland (2003).
20. Duch, Włodzisław, and Norbert Jankowski. "New neural transfer functions." Applied Mathematics and Computer Science 7 (1997): 639-658.
21. Yao, Xin, and Yong Liu. "Ensemble structure of evolutionary artificial neural networks." Proceedings of the IEEE International Conference on Evolutionary Computation. IEEE, 1996.
22. Glorot, Xavier, Antoine Bordes, and Yoshua Bengio. "Deep sparse rectifier neural networks." International Conference on Artificial Intelligence and Statistics. 2011.