CSC 562 Business Intelligence
Lecture 9: Chapter 6 – Artificial Neural Networks for Data Mining
1/31/2011

Learning Objectives
• Understand the concept and definitions of artificial neural networks (ANN)
• Know the similarities and differences between biological and artificial neural networks
• Learn the different types of neural network architectures
• Learn the advantages and limitations of ANN
• Understand how backpropagation learning works in feedforward neural networks
• Understand the step-by-step process of how to use neural networks
• Appreciate the wide variety of applications of neural networks, which solve problem types such as:
  - Classification
  - Regression
  - Clustering
  - Association
  - Optimization

Opening Vignette (Page 242): "Predicting Gambling Referenda with Neural Networks"
• Using NeuroSolutions, this study developed and tested models to predict community support for commercial gaming.
• The study used neural networks to examine the factors that contribute to the legalization and/or prohibition of gambling activities.
• It attempted to use neural network technology to predict the voting outcomes of various counties on this subject.
• On average, the models accurately predicted the voting results for 4 out of every 5 counties (approximately 82% accuracy) on a sample data set of 1,287 records.
• Interestingly, and contrary to popular belief, the counties' financial characteristics and age distribution were not found to be significant factors in determining ballot outcome. The dominant factors are identified on Page 244.
• The study demonstrates that demographic data can be used to accurately predict voting outcomes on controversial issues.

Opening Vignette: Predicting Gambling Referenda…
[Figure: inputs (socio-demographic, religious, financial, ..., other) enter the INPUT LAYER, pass through the HIDDEN LAYER, and reach the OUTPUT LAYER, which predicts whether a county voted "yes" or "no" to legalizing gaming. The predicted vote is compared with the actual vote.]

Opening Vignette
• NeuroSolutions is offered by NeuroDimension and offers algorithms for use in the field of artificial intelligence.
• NeuroDimension offers NeuroSolutions, NeuroSolutions for Excel, and a Custom Solution Wizard, each of which can be downloaded for a free evaluation.
• A very good video offered by the company explains neural network algorithms and the field in general.
• Pricing is relatively reasonable for the product.
  - NeuroSolutions for Excel costs $295

Neural Network Concepts (Page 245)
• Neural networks (NN): a brain metaphor for information processing that uses artificial neurons (programming constructs that mimic the properties of biological neurons)
• Neural computing: a pattern recognition methodology for machine learning
• Artificial neural network (ANN): the model that results from neural computing
• ANN has many uses:
  - pattern recognition, forecasting, prediction, and classification
  - finance, marketing, manufacturing, operations, information systems, and so on

ANN Video
• NeuroSolutions offers an excellent video that provides a good overview of ANN

Biological Neural Networks (Page 246)
[Figure: two interconnected brain cells (neurons), each labeled with dendrites, soma, axon, and synapse]
• An axon is a long, slender projection of a nerve cell, or neuron, that conducts electrical impulses away from the neuron's cell body, or soma.
• Dendrites are branched filaments in nerve cells (neurons). The word "dendrite" derives from the Greek word for tree, which describes their branching, tree-like structure.
Biological Neural Networks (Page 246)
[Figure: the same two interconnected neurons, each labeled with dendrites, soma, axon, and synapse]
• Synapse: able to increase or decrease the strength of the connection between neurons and cause excitation or inhibition of a subsequent neuron.
• The word "soma" comes from the Greek word for "body"; the soma of a neuron is often called the cell body.

Processing Information in ANN (Page 247, Figure 6.3)
[Figure: a single neuron (processing element, PE) with inputs and outputs. Inputs $x_1, x_2, \ldots, x_n$ with weights $w_1, w_2, \ldots, w_n$ enter the PE. The summation $S = \sum_{i=1}^{n} X_i W_i$ is passed through the transfer function $Y = f(S)$, which feeds the outputs $Y_1, Y_2, \ldots, Y_n$.]

Biology Analogy (Page 247)

Elements of ANN (Pages 248-250)
• Processing element (PE): PEs are organized in different ways to form the network's structure
• Network architecture
• Hidden layers: take input from the previous layer and convert it into outputs for further processing (used in complex problems)
• Parallel processing: resembles the way the brain works, unlike the serial processing of conventional computing
[Image caption: "Not this ANN"]

Elements of ANN (Pages 248-250)
• Network information processing
  - Inputs: each input is a single attribute, such as age or income level
  - Outputs: the solution to the problem, e.g., "yes" or "no" on a loan application
  - Connection weights: the relative strength (importance) of the input data
  - Summation function: the weighted sum of all input elements entering a PE
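The summation function described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not code from the textbook or from NeuroSolutions: the function name `weighted_sum` and the loan-applicant inputs and weights are made-up values.

```python
def weighted_sum(inputs, weights):
    """Summation function of one PE: S = sum of X_i * W_i."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one connection weight")
    return sum(x * w for x, w in zip(inputs, weights))


# Hypothetical loan applicant: age and income level, both rescaled to 0..1.
applicant = [0.5, 0.25]
# Hypothetical connection weights (relative importance of each input).
weights = [0.5, 2.0]

s = weighted_sum(applicant, weights)
print(s)  # 0.5*0.5 + 0.25*2.0 = 0.75
```

A larger weight makes the same input contribute more to the sum, which is what "relative strength of input data" means in practice.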
Elements of ANN (Figure 6.4, Page 249)
[Figure: Neural Network with One Hidden Layer. Inputs $x_1, x_2, x_3$ enter the PEs of the input layer, pass through the PEs of the hidden layer, and reach the PE of the output layer, which produces $Y_1$. Each PE computes a weighted sum (S) followed by a transfer function (f).]

Elements of ANN
Summation Function for a Single Neuron (a) and Several Neurons (b). PE: processing element (or neuron)
(a) Single neuron:
    $Y = X_1 W_1 + X_2 W_2$
(b) Multiple neurons:
    $Y_1 = X_1 W_{11} + X_2 W_{21}$
    $Y_2 = X_1 W_{12} + X_2 W_{22}$
    $Y_3 = X_2 W_{23}$

Elements of ANN (Page 251)
• Transformation (transfer) function: computes the activation level of a neuron; based on this level, the neuron may or may not produce an output
• Computed via the sigmoid (logistic activation) function: $Y_T = 1/(1 + e^{-Y})$
• $Y$ is computed via weighted summation
• A threshold value can be applied: any value below the threshold is not passed to the output (0); any value above it is (1)
Worked example for one processing element (PE):
    Inputs and weights: $X_1 = 3$, $W_1 = 0.2$; $X_2 = 1$, $W_2 = 0.4$; $X_3 = 2$, $W_3 = 0.1$
    Summation function: $Y = 3(0.2) + 1(0.4) + 2(0.1) = 1.2$
    Transfer function: $Y_T = 1/(1 + e^{-1.2}) = 0.77$

Neural Network Architectures (Pages 251-252)
• Several ANN architectures exist
  - Feedforward (Figure 6.4, Page 249; see the earlier slide)
  - Recurrent (Figure 6.7, Page 252; see the next slide)
  - Associative memory
  - Self-organizing feature maps
  - Hopfield networks, etc.

Neural Network Architectures: Recurrent Neural Networks (Page 252, Figure 6.7)

Neural Network Architectures (Page 252)
• The architecture of a neural network is driven by the task it is intended to address
• Most popular architecture: feedforward, multi-layered perceptron with the backpropagation learning algorithm
  - That is, the feedforward perceptron is the architecture and backpropagation is the learning algorithm.
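The worked example (summation, then sigmoid transfer) and a feedforward pass through one hidden layer can be sketched as follows. This is a minimal sketch under stated assumptions: the names `sigmoid`, `pe_output`, and `layer_output`, and all hidden-layer and output-layer weights, are invented for illustration. Only the single-PE numbers (inputs 3, 1, 2 and weights 0.2, 0.4, 0.1) come from the slide.

```python
import math


def sigmoid(y):
    """Sigmoid transfer function: Y_T = 1 / (1 + e^(-Y))."""
    return 1.0 / (1.0 + math.exp(-y))


def pe_output(inputs, weights):
    """One PE: weighted summation followed by the sigmoid transfer function."""
    y = sum(x * w for x, w in zip(inputs, weights))
    return sigmoid(y)


def layer_output(inputs, weight_rows):
    """A layer is a list of PEs; each row holds one PE's incoming weights."""
    return [pe_output(inputs, row) for row in weight_rows]


# Worked example from the slide: Y = 1.2, and Y_T rounds to 0.77.
x = [3, 1, 2]
y_t = pe_output(x, [0.2, 0.4, 0.1])
print(round(y_t, 2))

# Feedforward pass: inputs -> hidden layer (2 PEs) -> output layer (1 PE).
# These weights are hypothetical, not taken from the textbook.
hidden_weights = [[0.2, 0.4, 0.1], [-0.3, 0.5, 0.2]]
output_weights = [[1.0, -1.0]]
hidden = layer_output(x, hidden_weights)
output = layer_output(hidden, output_weights)
print(output)
```

Information flows in one direction only, from inputs to hidden PEs to the output PE, which is what makes the architecture feedforward. The weights here are fixed by hand; choosing them from data is the job of a learning algorithm such as backpropagation.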
Neural Network Architectures: Frank Rosenblatt (1957)
• The perceptron is a type of artificial neural network invented in 1957 at the Cornell Aeronautical Laboratory by Frank Rosenblatt.
• Frank Rosenblatt (1928-1971) was a computer scientist born in New Rochelle, New York. He helped to create the Perceptron computer, a.k.a. the Mark 1, in 1960 at Cornell University. This was the first computer that could learn skills by trial and error, in an attempt to mimic human thought processes through the use of a neural network.
• Backpropagation is a common, supervised method for teaching artificial neural networks how to perform a given task. It was first described by Arthur E. Bryson and Yu-Chi Ho in 1969.

Neural Network Architectures: Original Mark 1
• The Automatic Sequence Controlled Calculator (ASCC), i.e., the Harvard Mark I, is a separate machine from Rosenblatt's Mark I Perceptron.
• The building elements of the ASCC were switches, relays, rotating shafts, and clutches.

Learning in ANN (Page 252)
• A process by which a neural network learns the underlying relationship between inputs and outputs, or just among the inputs
• Supervised learning
  - For prediction-type problems
  - E.g., backpropagation
• Unsupervised learning
  - For clustering-type problems
  - Self-organizing
  - E.g., adaptive resonance theory

A Taxonomy of ANN Learning Algorithms (Page 253, Figure 6.8)
Learning Algorithms
• Discrete/binary input
  - Supervised: Simple Hopfield; Outerproduct AM; Hamming Net
  - Unsupervised: ART-1; Carpenter/Grossberg
• Continuous input
  - Supervised: Delta rule; Gradient descent; Competitive learning; Neocognitron; Perceptron
  - Unsupervised: ART-3; SOFM (or SOM); Other clustering algorithms
Architectures
• Supervised
  - Recurrent: Hopfield
  - Feedforward: Nonlinear vs. linear; Backpropagation; ML perceptron; Boltzmann
• Unsupervised
  - Estimator: SOFM (or SOM)
  - Extractor: ART-1; ART-2
(The figure also carries a "Most popular" callout.)

Read Application Case (Page 254)
• Microsoft used BrainMaker neural network software from California Scientific to maximize the return on direct mail
• Some of the variables considered (25 in total):
  - Recency (how long since the last registration or product purchase)
  - First date to file (loyal over time?)
  - Number of products bought and filed
  - Value of products bought and registered
  - Number of days from product release to purchase
• Improved the response rate from 4.9% to 8.2%, with 35% cost savings on 40 million pieces of direct mail

A Supervised Learning Process (Pages 255-256, Figure 6.9)
Three-step process:
  1. Compute temporary outputs
  2. Compare outputs with desired targets
  3. Adjust the weights and repeat the process
[Figure: flowchart. The ANN model computes an output. Is the desired output achieved? If no, adjust the weights and compute the output again; if yes, stop learning.]

How a Network Learns (Page 256)
• Example: a single neuron that learns the inclusive OR operation
• Learning parameters:
  - Learning rate
  - Momentum
* See page 257 for the step-by-step progression of the learning process

Backpropagation Learning (Page 258)
• Errors are used to correct the weights; this is called back-error propagation
• The (supervised) learning algorithm procedure:
  1. Initialize weights with random values and set other network parameters
  2. Read in the inputs and the desired outputs
  3. Compute the actual output (by working forward through the layers)
  4. Compute the error (the difference between the actual and desired output)
  5. Change the weights by working backward through the hidden layers
  6. Repeat steps 2-5 until the weights stabilize

Backpropagation Learning (Figure 6.10, Page 258)
[Figure: Backpropagation of Error for a Single Neuron. Inputs $x_1, x_2, \ldots, x_n$ with weights $w_1, w_2, \ldots, w_n$ enter the neuron (PE). The summation $S = \sum_{i=1}^{n} X_i W_i$ is passed through the transfer function $Y_i = f(S)$, and the error term $\alpha(Z_i - Y_i)$ is fed back to the weights.]

Development Process of an ANN (Page 259)
• Similar to structured design for traditional IS, with some new elements
• See page 253

An MLP ANN Structure for the Box-Office Prediction Problem (Page 262, Figure 6.12)
This is the vignette at the start of Chapter 5 (page 191).
INPUT LAYER (27 PEs)
  1. MPAA Rating (5): G, PG, PG13, R, NR
  2. Competition (3): High, Medium, Low
  3. Star Value (3): High, Medium, Low
  4. Genre (10): Sci-Fi, Action, ...
  5. Technical Effects (3): High, Medium, Low
  6. Sequel (2): Yes, No
  7. Number of Screens (positive integer)
HIDDEN LAYER I (18 PEs)
HIDDEN LAYER II (16 PEs)
OUTPUT LAYER (9 PEs)
  Class 1 - FLOP (BO < 1M)
  Class 2 (1M < BO < 10M)
  Class 3 (10M < BO < 20M)
  Class 4 (20M < BO < 40M)
  Class 5 (40M < BO < 65M)
  Class 6 (65M < BO < 100M)
  Class 7 (100M < BO < 150M)
  Class 8 (150M < BO < 200M)
  Class 9 - BLOCKBUSTER (BO > 200M)

Data Collection and Testing (Page 261)
• Data is split into three parts
  - Training (~60%)
  - Validation (~20%)
  - Testing (~20%)

Sensitivity Analysis on ANN Models (Pages 264-265)
• A common criticism of ANN: the black-box syndrome!
• Answer: sensitivity analysis
  - Conducted on a trained ANN
  - The inputs are changed while the relative change in the output is measured and recorded
  - The results illustrate the relative importance of the input variables

Sensitivity Analysis on ANN Models (Page 265, Figure 6.13)
[Figure: systematically perturbed inputs enter the trained ANN ("the black box"), and the change in outputs is observed]
• See and read the example in Application Case 6.5 (Page 266)
• Sensitivity analysis reveals the most important injury severity factors in traffic accidents

Sensitivity Analysis on ANN Models (Page 266)
• Application Case 6.5
• 41,000 people die in 6 million US traffic accidents
• Goal: analyze the factors that elevate the risk of severe injury
• Factors include behavior, environment, technical, etc.
• A series of ANN models was used to estimate the significance of the crash factors for the level of injury severity sustained by the driver.
• A two-step process was used: (1) prediction models, (2) sensitivity analysis on the trained neural networks
• The results show significant differences among the models built for different injury severity levels. (The most influential factors depend heavily on the level of injury.)
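The perturb-and-measure idea behind sensitivity analysis can be sketched as follows. This is a sketch under loud assumptions, not the method used in Application Case 6.5: `trained_ann` is a stand-in for a trained network (one sigmoid PE with made-up weights), the three-input records are invented, and scoring by average absolute output change is one simple choice among several.

```python
import math


def trained_ann(inputs):
    """Stand-in for a trained ANN: one sigmoid PE with fixed, made-up weights."""
    weights = [2.0, 0.5, 0.0]  # hypothetical; the third input has no effect
    s = sum(x * w for x, w in zip(inputs, weights))
    return 1.0 / (1.0 + math.exp(-s))


def sensitivity(model, records, delta=0.1):
    """Average absolute output change when each input is shifted by delta."""
    n_inputs = len(records[0])
    scores = []
    for i in range(n_inputs):
        total = 0.0
        for record in records:
            perturbed = list(record)
            perturbed[i] += delta  # perturb one input, hold the others fixed
            total += abs(model(perturbed) - model(record))
        scores.append(total / len(records))
    return scores


# Hypothetical records with three inputs each.
records = [[0.1, 0.9, 0.3], [0.4, 0.2, 0.8], [0.7, 0.5, 0.1]]
scores = sensitivity(trained_ann, records)
# Input indices ordered from most to least influential.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(scores)
print(ranking)
```

The model is treated purely as a black box: only its inputs and outputs are used, so the same loop would work on any trained network that can be called as a function.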
A Sample Neural Network Project: Bankruptcy Prediction (Pages 267-270)
• A comparative analysis of ANN versus logistic regression (LR), a statistical method
• Inputs
  - X1: Working capital/total assets
  - X2: Retained earnings/total assets
  - X3: Earnings before interest and taxes/total assets
  - X4: Market value of equity/total debt
  - X5: Sales/total assets

A Sample Neural Network Project: Bankruptcy Prediction
• Data was obtained from Moody's Industrial Manuals
  - Time period: 1975 to 1982
  - 129 firms (65 of which went bankrupt during the period and 64 of which did not)
• Different training and testing proportions are used and compared
  - 90/10 versus 80/20 versus 50/50
  - Resampling is used to create 60 data sets

A Sample Neural Network Project: Bankruptcy Prediction
[Figure: inputs $x_1$ through $x_5$ feed a network with two outputs, BR = 1 (bankrupt) and NBR = 1 (nonbankrupt)]
Network specifics:
• Feedforward MLP
• Backpropagation
• Varying learning and momentum values
• 5 input neurons (1 for each financial ratio), 10 hidden neurons, 2 output neurons (1 indicating a bankrupt firm and the other indicating a nonbankrupt firm)

A Sample Neural Network Project: Bankruptcy Prediction, Results (Page 269, figure 6.2)

Bottom Line: Advantages of ANN (Pages 274-276)
• Able to deal with (identify/model) highly nonlinear relationships
• Can handle a variety of problem types (loan applications, forecasting profitability and finances, sports team success, fraud prevention, time-series forecasting, and health care and medicine, e.g., diagnosing breast cancer; see Case 6.4 on page 276)
• Usually provides better results (prediction and/or clustering) than its statistical counterparts

Disadvantages of ANN
• They are deemed to be black-box solutions, lacking explainability
• It is hard to find optimal values for the large number of network parameters
• Optimal design is still an art: it requires expertise and extensive experimentation
• It is hard to handle a large number of variables (especially rich nominal attributes)
• Training may take a long time for large datasets, which may require case sampling

ANN Software (Page 263)
• Standalone ANN software tools
  - NeuroSolutions
  - BrainMaker
  - NeuralWare
  - NeuroShell, ... (for more, see pcai.com)
• Part of a data mining software suite
  - PASW (formerly SPSS Clementine)
  - SAS Enterprise Miner
  - Statistica Data Miner, ... and many more

Next lecture: Chapter 7 - Text and Web Mining