PARALLELIZATION OF ARTIFICIAL NEURAL NETWORKS
Joe Bradish
CS5802, Fall 2015

BASICS OF ARTIFICIAL NEURAL NETWORKS
• What is an artificial neural network (ANN)?
• What makes up a neuron?
• How is "learning" modelled in ANNs?
• A neural network is a collection of interconnected neurons that compute and generate impulses.
• Its building blocks include neurons, synapses, and activation functions.
• An artificial neural network is a mathematical model based on the natural neural networks found in animals' brains.

STRUCTURE OF A NEURAL NETWORK

BASIC STRUCTURE OF A NEURON
• A neuron receives an input vector {x1, x2, …, xn} with an associated vector of weights {w1, w2, …, wn}.
• The weighted sum of the inputs is calculated and passed into an activation function.
• The activation function maps the sum to a value, generally in [-1, 1], as in the step activation function shown. That value is the output of the neuron.

TRAINING A NEURAL NETWORK
• To train a neural network, the weights must be "tuned" so that the network models the goal function as closely as possible.
• The "goal" function is the function that maps input data to output data in the training set.
• Training is by far the most costly step in the majority of scenarios.
  • Google has reported training times of <2 days for certain problems and network sizes.
  • Once trained, though, new items can be classified very quickly.
• Some popular training options:
  • Backpropagation (used in the majority of cases)
  • Genetic algorithms with simulated annealing
  • Hebbian learning
  • A combination of different methods in a "Committee of Machines"

BACKPROPAGATION
• The most popular training method.
• Works by reducing the error on the training set, using gradient descent on the error (mean squared error).
• Requires many training examples to drive the error low.
• Partial derivatives are used to determine which neuron/weight to blame for parts of the error.

PROBLEMS WITH SCALING BACKPROPAGATION
• The backward pass is done through backpropagation, which uses the chain rule to calculate the partial derivatives.
• The underlying operations are embarrassingly parallel, but many problems remain: backpropagation itself, communication, and computational issues all must be considered when scaling neural networks.
  • Backpropagation requires the neurons of one layer to be fully connected to the neurons of the next layer.
  • Gradient descent is prone to getting stuck in local optima.
  • A lot of communication is required.
  • Many iterations are needed to reduce the error to an acceptable rate.
  • Training data sets are very large.
• Rule of thumb for error: the training set size should be roughly the number of weights divided by the permitted classification error rate.
  • 10% error rate = 10x the number of weights, 1% = 100x, etc.
• (A minimal numeric sketch of one backpropagation pass follows below.)
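The training and backpropagation slides above describe the forward pass (weighted sums fed through an activation function) and the backward pass (chain rule, gradient descent on the mean squared error) only in words. The sketch below is a minimal illustration of those steps, not code from the presentation: the XOR task, network size, learning rate, and iteration count are all arbitrary assumptions chosen to keep the example small.

```python
# Minimal backpropagation sketch (illustrative only, not from the slides).
# A small 2-4-1 network with sigmoid activations is trained on XOR by
# gradient descent on the mean squared error, using the chain rule.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # training inputs
T = np.array([[0], [1], [1], [0]], dtype=float)              # target outputs (XOR)

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input -> hidden weights
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output weights
lr = 0.5                                        # learning rate (arbitrary)

for epoch in range(20000):
    # Forward pass: weighted sum of inputs, then the activation function.
    H = sigmoid(X @ W1 + b1)
    Y = sigmoid(H @ W2 + b2)

    # Backward pass: the chain rule gives the partial derivative of the
    # mean squared error for every weight, i.e. how much each
    # neuron/weight is "blamed" for the error.
    dY = (Y - T) * Y * (1 - Y)        # error signal at the output layer
    dH = (dY @ W2.T) * H * (1 - H)    # error signal pushed back to hidden neurons

    # Gradient-descent weight updates.
    W2 -= lr * H.T @ dY;  b2 -= lr * dY.sum(axis=0)
    W1 -= lr * X.T @ dH;  b1 -= lr * dH.sum(axis=0)

print(np.round(Y.ravel(), 2))   # should approach [0, 1, 1, 0] after training
```

Each iteration is one full pass over the four training examples; the per-weight "blame" computed in the backward pass is exactly the partial derivative the slides refer to.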
COMPUTATIONAL ISSUES IN SCALING ANNS
• The main operation is matrix multiplication: an N-node layer requires N^2 scalar multiplications and N sums of N numbers.
• Requires a good multiply or multiply-and-add function.
• Activation function: often the sigmoid f(x) = 1 / (1 + e^(-x)) is used, and it has to be approximated efficiently.

COMMUNICATION ISSUES IN SCALING ANNS
• High degree of connectivity and large data flows.
• Network structure and bandwidth are very important.
• Broadcasts and ring topologies are often used because of the communication requirements.
• More processors does not mean faster computation in many cases.

TWO KEY METHODOLOGIES
Model dimension:
• One model, but multiple workers train individual parts.
• High amount of communication: need to synchronize at the edges.
• Efficient when the computation is heavy per neuron, e.g. datasets where each data point contains many attributes.
Data dimension:
• Different workers train on completely different sets of data.
• Also a high amount of communication: need to synchronize parameters/weights to ensure a consistent model.
• Efficient when each weight needs a high amount of computation, e.g. large datasets where each data point contains only a few attributes.
(Figure on the slide: example of splitting on the data dimension.)

SPANN (SCALABLE PARALLEL ARTIFICIAL NEURAL NETWORK)
• Inspired by the human brain's ability to communicate between groups of neurons without fully connected paths.
• Focused on parallelizing the model dimension; uses the MPI library.
• Reduces the need for communication between every neuron in consecutive layers of a neural network: only boundary values are communicated between "ghost" neurons.

BIOLOGICAL INSPIRATION
• The neocortex is the part of the brain most commonly associated with intelligence.
• Columnar structure with an estimated 6 layers.

SPANN CONT.
Recall the serial backpropagation example; comparison of a 3-layer network:
• Serial ANN: 200 input, 48 output, 125 hidden neurons; (200 + 48) * 125 = 31,000 weights need to be trained.
• Using SPANN in a parallel ANN: 200 input, 48 output, 120 hidden neurons; 6 layers, 8 processors; 30,280 weights need to be trained, but only 3,785 per processor.

PERFORMANCE COMPARISON
Parallel backpropagation:
• L is the number of layers, including the input/output layers; Nproc is the number of processors being used.
• As shown by the first box, every input is sent to every processor.
• Each processor only has Nhidden / Nproc hidden neurons per layer and Nout / Nproc output neurons.
• Divide by the number of processors to get the weights per processor.

RESULTS CONT.
• 37,890 weights on a serial ANN took 1,313 seconds to complete training, compared to 30,240 weights taking 842 seconds with SPANN.
• There is significant slowdown in the serial version: at resolution 8 it computes ~36 weights/sec, but at resolution 9 it falls to only ~28.5 weights/sec.
• The time taken per weight grows more slowly in SPANN, so once the training data reaches a significant size, SPANN becomes much quicker per weight.
• The speedup factor is related to the training data size: the larger the size, the larger the speedup.
• (Illustrative sketches of the sigmoid approximation and of the data- and model-dimension splits follow below.)
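The "Computational issues in scaling ANNs" slide notes that the sigmoid f(x) = 1 / (1 + e^(-x)) has to be approximated efficiently. One common cheap substitute is a piecewise-linear "hard" sigmoid; this is an illustrative choice, not a method named in the slides.

```python
# Comparing the exact sigmoid with a piecewise-linear approximation that
# needs only multiply/add/compare operations (no exponential).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hard_sigmoid(z):
    # Clip a cheap linear ramp into [0, 1].
    return np.clip(0.2 * z + 0.5, 0.0, 1.0)

z = np.linspace(-6, 6, 7)
print(np.round(sigmoid(z), 3))
print(np.round(hard_sigmoid(z), 3))
```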
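For the data-dimension column of the "Two key methodologies" comparison, the hedged sketch below shows one common realization: every worker keeps a full copy of the weights, computes gradients on its own shard of the training data, and the gradients are averaged so the parameters stay synchronized across workers. It assumes mpi4py and NumPy are available and uses a single linear layer purely for brevity; none of these details come from the presentation.

```python
# Hypothetical data-dimension split: identical model replicas, different data
# shards, gradients synchronized every step. Run e.g. with:
#   mpiexec -n 8 python data_parallel_sketch.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, nproc = comm.Get_rank(), comm.Get_size()

n_in, n_out, shard_size, lr = 200, 48, 64, 0.01
rng = np.random.default_rng(12345)            # same seed -> identical initial weights
W = rng.normal(scale=0.1, size=(n_in, n_out))

# Each worker draws its own shard of (synthetic) training data.
data_rng = np.random.default_rng(rank)
X = data_rng.normal(size=(shard_size, n_in))
T = data_rng.normal(size=(shard_size, n_out))

for step in range(100):
    # Local gradient of the mean squared error on this worker's shard.
    Y = X @ W
    grad_local = X.T @ (Y - T) / shard_size

    # Synchronize: average gradients across all workers so every copy of the
    # weights receives the same update and the model stays consistent.
    grad = np.empty_like(grad_local)
    comm.Allreduce(grad_local, grad, op=MPI.SUM)
    W -= lr * (grad / nproc)

if rank == 0:
    print("final loss on rank 0's shard:", float(np.mean((X @ W - T) ** 2)))
```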
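For the model dimension, and in the spirit of SPANN, the sketch below splits the hidden layer of the 200-input / 120-hidden / 48-output example across processors: every input is broadcast to every processor, each processor owns roughly Nhidden / Nproc hidden neurons, and only small partial output sums cross processor boundaries. This is a loose illustration with mpi4py, not the actual SPANN implementation; the single hidden layer and weight initialization are assumptions.

```python
# Hypothetical model-dimension split (SPANN-inspired, not SPANN itself).
# Assumes Nproc divides the hidden layer evenly (e.g. 8 processors, as in
# the slides). Run e.g. with:
#   mpiexec -n 8 python model_parallel_sketch.py
import numpy as np
from mpi4py import MPI

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

comm = MPI.COMM_WORLD
rank, nproc = comm.Get_rank(), comm.Get_size()

n_in, n_hidden, n_out = 200, 120, 48          # sizes taken from the example slide
local_hidden = n_hidden // nproc              # each rank owns a slice of the hidden layer

rng = np.random.default_rng(rank)             # per-rank weights for the owned neurons
W1_local = rng.normal(scale=0.1, size=(n_in, local_hidden))
W2_local = rng.normal(scale=0.1, size=(local_hidden, n_out))

# Every input vector is sent to every processor (root builds it, then broadcasts).
x = rng.normal(size=n_in) if rank == 0 else None
x = comm.bcast(x, root=0)

# Each rank computes only its own hidden activations and its partial
# contribution to the output layer's weighted sums.
h_local = sigmoid(x @ W1_local)
partial_out = h_local @ W2_local

# Only these small boundary values cross rank boundaries: the partial sums
# are reduced so every rank ends up with the full output activations.
out_sums = np.zeros(n_out)
comm.Allreduce(partial_out, out_sums, op=MPI.SUM)
y = sigmoid(out_sums)

if rank == 0:
    print("output activations:", np.round(y[:5], 3), "...")
```

The point of the design, mirroring the slides, is that the data exchanged per step (48 partial output sums) is far smaller than shipping every hidden activation between every pair of neurons in consecutive layers.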
SPANN CONCLUSIONS
• Developed an architecture that can scale to billions of weights/synapses.
• Succeeded by reducing the communication requirements between layers to a few "gatekeeper" nodes.
• Uses a human biological model as inspiration.

SCALING ANNS CONCLUSIONS
• Neural networks are a tool that has provided significant developments in the artificial intelligence and machine learning fields.
• Scaling issues are big, even though the calculations are embarrassingly parallel:
  • Communication
  • Computational
• SPANN showed promising results.
• Research continues today, with a heavy focus on communication, as training set sizes are growing faster than the computational requirements in many cases.

QUESTIONS?