Neural Networks Demystified
Louise Francis
Francis Analytics and Actuarial Data Mining, Inc.
louise_francis@msn.com

Objectives of Paper
• Introduce actuaries to neural networks
• Show that neural networks are a lot like some conventional statistical techniques
• Indicate where the use of neural networks might be helpful
• Show how to interpret neural network models

Data Mining
• Neural networks are one of a number of data mining techniques
• These methods were developed primarily in the artificial intelligence and statistical disciplines to find patterns in data
• They are typically applied to large databases with complex relationships

Some Other Data Mining Methods
• Decision trees
• Clustering
• Regression splines
• Association rules

Some Data Mining Advantages
Data mining methods can handle:
• Nonlinear relationships
• Interactions
• Multicollinearity

Data Mining: Neural Networks
• One of the more established approaches
• Somewhat glamorous AI description: they function like neurons in the brain

Neural Networks: Disadvantages
• They are a black box
• The user gets a prediction from them, but the form of the fitted function is not revealed
• It is not apparent which variables are the most important in the prediction

Kinds of Neural Networks
• Supervised learning
  • Multilayer perceptron (MLP), also known as the backpropagation neural network
  • The paper explains this kind of neural network
• Unsupervised learning
  • Kohonen neural networks

The MLP Neural Network
[Figure: Three-layer neural network — Input Layer (input data), Hidden Layer (processes data), Output Layer (predicted value)]

The Activation Function
The sigmoid (logistic) function:
  f(Y) = 1 / (1 + e^(-Y)),  where  Y = w0 + w1·X1 + w2·X2 + … + wn·Xn

The Logistic Function
[Figure: Logistic function f(x) = 1/(1 + exp(-5x)) plotted against x]
[Figure: Logistic function for various values of w1 (w1 = -10, -5, -1, 1, 5, 10)]
[Figure: Logistic curve with varying constants (w0 = -1, 0, 1)]

Other
• Data is usually normalized
• Usually both the independent and dependent variables are transformed to lie in the range between 0 and 1
[Figure: Logistic function for various values of w1, with the constant set to 2 and to -2]

Fitting the Curve
• Typically uses a procedure that is similar to gradient descent
• The weights are chosen to minimize the squared error:
  min Σ (Y - Ŷ)²

Fitting a Nonlinear Function
• f(X) = ln(X) + sin(X/675)
• Y = f(X) + e
• X ~ U(500, 5000), e ~ N(0, 0.2)

Graph of Nonlinear Function
[Figure: Scatterplot of Y = sin(X/675) + ln(X) + e against X, for X between 0 and 5,000]

Fitted Weights
Table 4: Fitted weights for the hidden layer nodes
            W0        W1
Node 1    -4.107     7.986
Node 2     6.549    -7.989

Hidden Layer
[Figure: Plot of the values of the two hidden layer nodes (Node 1 and Node 2) against X]
Table 5: Fitted weights for the output layer
    W0        W1        W2
   6.154    -3.0501   -6.427

Selected Fitted Values for the Function
Table 6: Computation of predicted values for selected values of X
(1) X
(2) Normalized X = ((1) - 508) / 4,994
(3) Output of Node 1
(4) Output of Node 2
(5) Weighted hidden node output = 6.15 - 3.05·(3) - 6.43·(4)
(6) Output node logistic function = 1 / (1 + exp(-(5)))
(7) Predicted Y = 6.52 + 3.56·(6)

     (1) X    (2)     (3)     (4)      (5)      (6)      (7)
    508.48   0.00    0.016   0.999   -0.323    0.420    7.889
  1,503.00   0.22    0.088   0.992   -0.498    0.378    7.752
  3,013.40   0.56    0.596   0.890   -1.392    0.199    7.169
  4,994.80   1.00    0.980   0.190    1.937    0.874    9.369

Hidden and Output Layer
[Figure: Panels showing the hidden node outputs, the weighted output from the hidden nodes, and the logistic function of the hidden node output, each plotted against X]
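The computation in Table 6 can be reproduced step by step. The sketch below, in Python, runs the two-node network forward using the published weights from Tables 4 and 5: normalize X, pass it through the two logistic hidden nodes, combine the node outputs with the output-layer weights, apply the logistic function again, and rescale to the original Y scale using the constants in column (7). The helper names are my own, and because the published weights and rescaling constants are rounded, the final column will not reproduce Table 6 exactly.

```python
import math

def logistic(y):
    """Sigmoid activation: f(y) = 1 / (1 + e^(-y))."""
    return 1.0 / (1.0 + math.exp(-y))

def predict(x):
    """Forward pass of the fitted 2-node MLP (weights from Tables 4 and 5).

    Results are approximate because the published weights are rounded.
    """
    # Normalize X to [0, 1], as in Table 6: (X - 508) / 4,994
    x_norm = (x - 508.0) / 4994.0

    # Hidden layer: two logistic nodes (Table 4)
    node1 = logistic(-4.107 + 7.986 * x_norm)
    node2 = logistic(6.549 - 7.989 * x_norm)

    # Output layer: weighted combination of hidden nodes (Table 5), then logistic
    weighted = 6.154 - 3.0501 * node1 - 6.427 * node2
    out = logistic(weighted)

    # Rescale the [0, 1] output back to the original Y scale (constants from Table 6)
    return 6.52 + 3.56 * out

for x in (508.48, 1503.0, 3013.4, 4994.8):
    print(f"X = {x:8.2f}  predicted Y = {predict(x):.3f}")
```

Writing the forward pass out this way makes the "black box" a little less opaque: the fitted function is simply a logistic curve applied to a weighted sum of two other logistic curves.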
Fit of Curve with 2 Nodes
[Figure: Fitted 2-node neural network and true Y values plotted against X]

Fit of Curve with 3 Nodes
[Figure: Fitted 3-node neural network and true Y values plotted against X]

Universal Function Approximator
• The multilayer perceptron neural network with one hidden layer is a universal function approximator
• Theoretically, with a sufficient number of nodes in the hidden layer, any nonlinear function can be approximated

Correlated Variables
• Variables used in model building are often correlated
• It is difficult to isolate the effect of the individual variables because of the correlation between them

Example of Correlated Variables
[Figure: Scatterplot of car parts inflation rates vs. car body inflation rates]

A Solution: Principal Components and Factor Analysis

One Factor Model
[Figure: Diagram of a one-factor model — the common factor F1 drives each of the observed variables X1, X2, and X3, and each variable also has its own unique component U1, U2, U3]

Factor Analysis: An Example
[Figure: Factor analysis diagram — a social inflation factor drives litigation rates (U1), the size of jury awards (U2), and an index of the state litigation environment (U3)]

Factor Analysis
[Figure: Factor analysis result used for prediction — the input variables X1 and X2 load on the factor F1, which in turn predicts the dependent variable Y]
The estimated index is a weighted sum of the input variables:
  Î = w1·X1 + w2·X2 + … + wn·Xn

Factor Analysis
[Figure: Three-layer neural network with one hidden node — input layer, hidden layer, output layer]

Correlated Variables: An Example
• Workers compensation line
• Produce an economic inflation index from:
  • Wage inflation
  • Medical inflation
  • Benefit level index
• In this simplified example, no other variable drives severity results

Factor Analysis Example
Table 8: Factor loadings and index weights
Variable                          Loading    Weight
Wage Inflation Index               0.985      0.395
Medical Inflation Index            0.988      0.498
Benefit Level Inflation Index      0.947      0.113

• One-factor model: X1 = b1·Factor1, X2 = b2·Factor1, X3 = b3·Factor1
• Index = 0.395·(Wage Inflation) + 0.498·(Medical Inflation) + 0.113·(Benefit Level Inflation)

Factor Analysis Example
[Figure: Neural network predicted severity vs. the unobserved inflation factor]

Interpreting the Neural Network
• Look at the weights to the hidden layer
• Compute sensitivities: a measure of how much the predicted value's error increases when the variables are excluded from the model one at a time

Interpretation of the Neural Network
Table 9: Factor example parameters
    W0       W1       W2       W3
   2.549   -2.802   -3.010    0.662

Table 10: Sensitivities of variables in the factor example
Benefit Level        23.6%
Medical Inflation    33.1%
Wage Inflation        6.0%

Interactions: Another Modeling Problem
• The combined impact of two variables is more or less than the sum of their individual impacts
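The slides describe sensitivities only in words, so the sketch below shows one common way such a measure could be computed; the specific convention of "excluding" a variable by fixing it at its mean, and the use of root mean squared error, are my assumptions rather than the paper's exact procedure. For each input in turn, the variable is neutralized, the fitted model is re-scored, and the sensitivity is reported as the fractional increase in prediction error.

```python
import numpy as np

def sensitivities(model_predict, X, y):
    """Rank inputs by how much the prediction error grows when each one is
    'excluded' (here: replaced by its column mean, an assumed convention).

    model_predict : function mapping an (n, k) array of inputs to predictions
    X             : (n, k) array of input variables
    y             : (n,) array of target values
    """
    def rmse(pred):
        return np.sqrt(np.mean((y - pred) ** 2))

    base_error = rmse(model_predict(X))
    results = {}
    for j in range(X.shape[1]):
        X_excl = X.copy()
        X_excl[:, j] = X[:, j].mean()      # neutralize variable j
        results[j] = rmse(model_predict(X_excl)) / base_error - 1.0
    return results                          # fractional error increase per variable
```

Applied to the factor example, passing the fitted network's prediction function and an array whose columns are the wage, medical, and benefit-level indices would yield one figure per variable, analogous in spirit to Table 10.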
Plot of Frequency vs. Age by Territory
[Figure: "True" frequency vs. age for Territories 1 through 4]

Interactions: Simulated Data
[Figure: Plot of the simulated frequencies vs. age for Territories 1 through 4]

Interactions: Neural Network
[Figure: Neural network predicted frequency by age and territory]

Interactions: Regression
[Figure: Regression predicted frequency by age and territory]

Example with Messy Data
Table 15: Variables in the messy data example
Variable                                Variable Type   Number of Categories   Missing Data
Age of Driver                           Continuous                              No
Territory                               Categorical          45                 No
Age of Car                              Continuous                              Yes
Car Type                                Categorical           4                 No
Credit Rating                           Continuous                              Yes
Auto BI Inflation Factor                Continuous                              No
Auto PD and Phys Dam Inflation Factor   Continuous                              No
Law Change                              Categorical           2                 No
Bogus                                   Continuous                              No

Example with Messy Data
[Figure: Histogram of the distribution of Mu, ranging from about 6.50 to 9.00]

Visualizing Neural Network Result
[Figure: Scatterplot of predicted log(severity) vs. age]

Visualizing Neural Network Result
[Figure: Visualization plot of predicted log(severity) vs. age]

Visualization of Law Change Effect
[Figure: Visualization plot of predicted log(severity) for Law = 0.00 and Law = 1.00]

Visualization of Inflation
[Figure: Visualization plot of predicted log(severity) vs. time, in quarters of a year]

How Good Was the Fit?
[Figure: Plot of true expected severity vs. the neural network's predicted log(severity) (lnpred)]

How Good Was the Fit?
[Figure: Scatterplot of the neural network's predicted log(severity) (lnpred) vs. Mu]
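To make the interaction comparison concrete, the sketch below simulates frequency data in which the effect of driver age differs by territory, then fits both a one-hidden-layer neural network and an ordinary main-effects linear regression. The simulated functional form, the scikit-learn estimators, and all parameter settings are my own illustrative assumptions, not the paper's actual specification; the point is only that the network can pick up an age-by-territory interaction that a main-effects regression cannot.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Simulate claim frequencies where the age effect differs by territory
# (an assumed functional form, for illustration only).
n = 5000
age = rng.uniform(17, 80, n)
territory = rng.integers(1, 5, n)                    # territories 1-4
base = 0.05 + 0.02 * territory                       # territory level
young = np.exp(-(age - 17) / 15.0)                   # youthful-driver effect
age_effect = np.where(territory >= 3, 0.15 * young,  # interaction: the age effect
                      0.05 * young)                  # is stronger in territories 3-4
freq = base + age_effect + rng.normal(0, 0.02, n)

# Design matrix: scaled age plus indicator (dummy) variables for territory
def design(ages, terrs):
    dummies = (np.asarray(terrs).reshape(-1, 1) == np.arange(1, 5)).astype(float)
    scaled_age = (np.asarray(ages, dtype=float).reshape(-1, 1) - 17.0) / 63.0
    return np.hstack([scaled_age, dummies])

X = design(age, territory)

# One-hidden-layer MLP vs. a main-effects linear regression (no interaction terms)
nnet = MLPRegressor(hidden_layer_sizes=(3,), max_iter=5000, random_state=0).fit(X, freq)
ols = LinearRegression().fit(X, freq)

# The network's age slope can differ by territory; the regression's cannot
for t in (1, 4):
    for a in (20, 70):
        x = design([a], [t])
        print(f"territory {t}, age {a}: "
              f"nnet {nnet.predict(x)[0]:.3f}, ols {ols.predict(x)[0]:.3f}")
```

Because the linear regression has a single age coefficient, its predicted drop from age 20 to age 70 is identical in every territory, whereas the network's predicted drop is larger in territories 3 and 4, mirroring the pattern shown in the interaction plots above.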