Artificial Neural Networks (ANN) for Data Mining

The Biology Analogy
- Brain cells vs. other cells? Neurons are the brain's cells:
  - Nucleus (at the center)
  - Dendrites provide inputs
  - Axons send outputs
  - Synapses increase or decrease connection strength and cause excitation or inhibition of subsequent neurons

Artificial Neural Networks (ANN): Three Interconnected Artificial Neurons
- Biological vs. artificial:
  - Soma -> Node
  - Dendrites -> Input
  - Axon -> Output
  - Synapse -> Weight
  - Slow speed -> Fast speed
  - Many neurons (50-150 billion) -> Few neurons (dozens)

ANN Fundamentals: Components and Structure
- "A network is composed of a number of processing elements organized in different ways to form the network structure"
- Processing elements (PEs): neurons
- Network: a collection of neurons (PEs) grouped in layers
- Structure of the network: topologies / architectures, i.e., different ways to interconnect PEs

ANN Fundamentals: Calculations at the PE Level

ANN Fundamentals: Processing Information by the Network
- Inputs
- Connection weights
- Summation function
- Transformation (transfer) function
- Outputs

ANN Fundamentals: Transformation (Transfer) Function
- Computes the activation level of the neuron: Y = f(sum of w_i * X_i)
- Function types: linear, sigmoid (logistic activation), or hyperbolic tangent
- Example output: Y = 0.77

ANN Architectures / Structures

Learning in ANN
1. Compute outputs
2. Compare outputs with desired targets
3. Adjust the weights and repeat the process

Neural Network Application Development
- Preliminary steps: requirement determination, feasibility study, a top-management champion
- ANN application development process:
  1. Collect data
  2. Separate data into training and test sets
  3. Define a network structure
  4. Select a learning algorithm
  5. Set parameter values, initialize weights
  6. Transform data to network inputs
  7. Start training (revise weights)
  8. Stop and test
  9. Implementation/deployment: use the network with new cases

Data Collection and Preparation
- Collect data and separate it into:
  - Training set (60%)
  - Cross-validation set (20%)
  - Testing set (20%)
- Make sure that all three sets represent the population: true random sampling (stratification)
- [Figure: error (MSE) vs. number of iterations (epochs) for the training and cross-validation sets; the best-generalization point is where the cross-validation error is lowest]
- Use training and cross-validation cases to adjust the weights
- Use test cases to validate the trained network

Neural Network Architecture
- Feed-forward neural network: multilayer perceptron, with two, three, sometimes four or five layers
- Example: predicting a movie's box-office (BO) class
  - Inputs (26 PEs in total):
    - MPAA Rating (5) (G, PG, PG-13, R, NR)
    - Competition (3) (High, Medium, Low)
    - Star Value (3) (High, Medium, Low)
    - Genre (10) (Sci-Fi, Action, ...)
    - Technical Effects (3) (High, Medium, Low)
    - Sequel (1) (Yes, No)
    - Number of Screens (1) (positive integer)
  - Outputs (9 classes):
    - Class 1 - FLOP (BO < 1M)
    - Class 2 (1M < BO < 10M)
    - Class 3 (10M < BO < 20M)
    - Class 4 (20M < BO < 40M)
    - Class 5 (40M < BO < 65M)
    - Class 6 (65M < BO < 100M)
    - Class 7 (100M < BO < 150M)
    - Class 8 (150M < BO < 200M)
    - Class 9 - BLOCKBUSTER (BO > 200M)
  - Layers: input layer (26 PEs) -> hidden layer I (18 PEs) -> hidden layer II (16 PEs) -> output layer (9 PEs)
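To make the summation and transfer-function calculations concrete, here is a minimal sketch (not from the slides) of a forward pass through a feed-forward network with the 26-18-16-9 shape of the box-office example above. The sigmoid transfer function, random weight initialization, and NumPy usage are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    # Logistic (sigmoid) transfer function: squashes the summation into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

# Layer sizes mirror the box-office example: 26 -> 18 -> 16 -> 9
layer_sizes = [26, 18, 16, 9]
# Connection weights and biases, randomly initialized (an assumption;
# the slides only say "initialize weights")
weights = [rng.normal(0.0, 0.1, (m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def forward(x):
    # Each PE computes a weighted sum of its inputs (the summation
    # function), then applies the transfer function: Y = f(sum(w_i * x_i))
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(a @ W + b)
    return a  # nine activations, one per box-office class

# Stand-in for one encoded movie: in practice the 26 inputs come from
# coding the categorical attributes (e.g., 5 binary PEs for MPAA rating)
x = rng.random(26)
y = forward(x)
print("predicted box-office class:", int(np.argmax(y)) + 1)
```

The `sigmoid(a @ W + b)` line is exactly the slide's two-step PE calculation: a weighted summation followed by the transfer function, repeated layer by layer.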
Neural Network Preparation
- Choose the network's structure (nodes and layers)
- Determine several parameters:
  - Learning rate (high or low) / momentum
  - Initial weight values
  - Other parameters
- Select initial conditions (randomize the weights)
- Transform training and testing data into the required format
- Non-numerical input data (text, pictures): preparation may involve simplification or decomposition

Training the Network
- Present the training data set to the network
- Adjust weights to produce the desired output for each of the inputs
- Several iterations over the complete training set are needed to obtain a consistent set of weights that works for all the training data
- Each iteration is called an epoch
- Batch vs. online learning

Supervised Learning: Backpropagation
- Back-propagation (back-error propagation) is the most widely used learning algorithm
- Relatively easy to implement
- Requires training data for conditioning the network before using it as a predictor
- The network includes one or more hidden layers
- The network is considered feed-forward
* Also look at the other learning methods in your book

Backpropagation Algorithm
- How does the backpropagation algorithm minimize the error? By taking the partial derivative of the network's error with respect to each weight, the ANN learns the direction of the error and, where possible, moves the weights to reduce it.

Backpropagation Steps (see the code sketch at the end of this section)
1. Initialize the weights
2. Read the input vector
3. Generate the output
4. Compute the error: Error = Output - Desired
5. Change the weights
- Drawbacks:
  - A large network can take a very long time to train
  - It may not converge

Testing
- Test the network after training
- Examine network performance: measure the network's prediction/classification ability
- Do the inputs produce the appropriate outputs? Not necessarily 100% accurate, but it may be better than most other algorithms
- The test plan should include:
  - Routine cases
  - Potentially problematic situations
- You may have to retrain based on the test results

Implementation
- Frequently requires:
  - Interfaces with other computer-based information systems (CBIS)
  - Embedding into parent software applications
  - User training
- Gain the confidence of users and management early

A Sample Neural Network Project
- Bankruptcy prediction (Sharda et al.)

ANN Development Tools
- NeuroSolutions
- Statistica Neural Network Toolkit
- Braincel (Excel add-in)
- NeuralWorks
- BrainMaker
- PathFinder
- Trajan Neural Network Simulator
- NeuroShell Easy
- SPSS Neural Connection
- MATLAB Neural Network Toolbox

Benefits of ANN
- Pattern recognition, learning, classification, generalization and abstraction, and interpretation of incomplete and noisy inputs
- Character, speech, and visual recognition
- Can tackle highly complex / nonlinear problems
- Robust
- Fast
- Flexible and easy to maintain
- Powerful hybrid systems

Limitations of ANN
- Lack of explanation capability (the "black box" syndrome)
- Training time can be excessive and tedious for large, complex data sets
- Usually requires large amounts of training and test data
- Requires knowledge to set the proper parameters in order to generate a good model
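As a concrete wrap-up of the development process above (split the data 60/20/20, train with backpropagation over many epochs, track the cross-validation error for the best-generalization point, then test on held-out cases), here is a minimal single-hidden-layer sketch. The synthetic data, network size, learning rate, and stopping rule are illustrative assumptions, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Step 1: collect data and separate it 60% / 20% / 20%.
# (Synthetic toy data; a real project would load its own cases.)
X = rng.random((500, 4))
y = (X.sum(axis=1) > 2.0).astype(float).reshape(-1, 1)
X_tr, y_tr = X[:300], y[:300]          # training set (60%)
X_va, y_va = X[300:400], y[300:400]    # cross-validation set (20%)
X_te, y_te = X[400:], y[400:]          # testing set (20%)

# Step 2: define a structure and initialize the weights.
W1 = rng.normal(0.0, 0.5, (4, 6))
b1 = np.zeros(6)
W2 = rng.normal(0.0, 0.5, (6, 1))
b2 = np.zeros(1)
lr = 0.5  # learning rate (a tunable parameter)

def forward(X):
    h = sigmoid(X @ W1 + b1)      # hidden-layer activations
    out = sigmoid(h @ W2 + b2)    # network output
    return h, out

def mse(out, y):
    return float(np.mean((out - y) ** 2))

# Step 3: train (revise weights); each pass over the training set is an epoch.
best_val, best_weights = np.inf, None
for epoch in range(2000):
    h, out = forward(X_tr)
    # Backpropagate the error: the partial derivative of the MSE with
    # respect to each weight, moved against the gradient.
    d_out = (out - y_tr) * out * (1.0 - out)
    d_h = (d_out @ W2.T) * h * (1.0 - h)
    W2 -= lr * h.T @ d_out / len(X_tr)
    b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * X_tr.T @ d_h / len(X_tr)
    b1 -= lr * d_h.mean(axis=0)
    # Track the cross-validation error: its lowest point is the
    # best-generalization point from the figure in the data slide.
    val_err = mse(forward(X_va)[1], y_va)
    if val_err < best_val:
        best_val = val_err
        best_weights = (W1.copy(), b1.copy(), W2.copy(), b2.copy())

# Step 4: stop and test with the held-out cases.
W1, b1, W2, b2 = best_weights
_, out = forward(X_te)
accuracy = np.mean((out > 0.5) == (y_te > 0.5))
print(f"test accuracy: {accuracy:.2f}  (best validation MSE: {best_val:.4f})")
```

Stopping at the lowest cross-validation error, rather than the lowest training error, is what guards against the over-training behavior shown in the error-vs-epochs figure above.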