Neural Networks Demystified
by Louise Francis
Francis Analytics and Actuarial Data Mining, Inc.
louise_francis@msn.com
Objectives of Paper
- Introduce actuaries to neural networks
- Show that neural networks are a lot like some conventional statistics
- Indicate where use of neural networks might be helpful
- Show how to interpret neural network models
Data Mining
- Neural networks are one of a number of data mining techniques
- Methods primarily developed in the artificial intelligence and statistical disciplines to find patterns in data
- Typically applied to large databases with complex relationships
Some Other Data Mining Methods
- Decision trees
- Clustering
- Regression splines
- Association rules
Some Data Mining Advantages
- Model nonlinear relationships
- Capture interactions
- Handle multicollinearity
Data Mining: Neural Networks
- One of the more established approaches
- Somewhat glamorous
- AI description: they function like neurons in the brain

Neural Networks: Disadvantages
- They are a black box
- The user gets a prediction from them, but the form of the fitted function is not revealed
- Don't know which variables are the most important in the prediction
Kinds of Neural Networks
- Supervised learning
  - Multilayer perceptron
    • Also known as the backpropagation neural network
    • The paper explains this kind of NN
- Unsupervised learning
  - Kohonen neural networks
The MLP Neural Network
Three-Layer Neural Network:
- Input Layer (Input Data)
- Hidden Layer (Processes Data)
- Output Layer (Predicted Value)
The Activation Function
- The sigmoid logistic function:

  f(Y) = 1 / (1 + e^(-Y))

  Y = w0 + w1*X1 + w2*X2 + ... + wn*Xn
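The activation above is easy to state in code. A minimal sketch of a single neuron in Python (the weights shown are illustrative, not fitted values from the paper):

```python
import math

def logistic(y):
    """Sigmoid activation: maps any real-valued input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-y))

def node_output(weights, inputs):
    """One neuron: Y = w0 + w1*X1 + ... + wn*Xn, then f(Y)."""
    y = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return logistic(y)

# Illustrative weights w0=0, w1=1: the node output is just logistic(x)
print(node_output([0.0, 1.0], [0.0]))  # 0.5
```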
The Logistic Function
[Figure: the logistic curve f(x) = 1/(1+exp(-5*x)), plotted for x from -0.8 to 1.0; y rises from near 0 to near 1]
The Logistic Function
[Figure: logistic curves for various values of w1 (w1 = -10, -5, -1, 1, 5, 10), plotted for x from -1.2 to 0.8; larger |w1| gives a steeper curve]
The Logistic Function
[Figure: logistic curves with varying constants (w0 = -1, 0, 1), plotted for x from -0.8 to 1.0]
Other
- Data is usually normalized
- Usually both the independent and dependent variables are transformed to lie in the range between 0 and 1
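The usual transformation is simple min-max scaling; a sketch:

```python
def normalize(values):
    """Min-max scale a list of values into the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(normalize([500, 2750, 5000]))  # [0.0, 0.5, 1.0]
```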
Logistic Function
[Figure: two panels of logistic curves for w1 = -10, -5, -1, 1, 5, 10, one with constant = 2 and one with constant = -2, plotted for x from -0.1 to 1.1]
Fitting the curve
- Typically use a procedure which is like gradient descent:

  Min Σ (Y - Ŷ)^2
Fitting a nonlinear function

  f(X) = ln(X) + sin(X/675)
  X ~ U(500, 5000)
  e ~ N(0, .2)
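The simulated data can be generated directly from these definitions; a sketch (the sample size and seed are chosen arbitrarily):

```python
import math
import random

random.seed(1)
# X uniform on (500, 5000); noise e normal with standard deviation 0.2
xs = [random.uniform(500, 5000) for _ in range(500)]
ys = [math.log(x) + math.sin(x / 675) + random.gauss(0, 0.2) for x in xs]
```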
Graph of nonlinear function
[Figure: scatterplot of Y = sin(X/675) + ln(X) + e, with Y roughly between 6 and 10 for X from 0 to 5000]
Fitted Weights

Table 4
            W0        W1
  Node 1   -4.107    7.986
  Node 2    6.549   -7.989
Hidden Layer
[Figure: plot of values for the hidden layer nodes (Example 2); Node 1 and Node 2 outputs between 0 and 1 for X from 0 to 5000]
Table 5
    W0        W1        W2
  6.154   -3.0501    -6.427
Selected Fitted Values for function

Table 6: Computation of Predicted Values for Selected Values of X

  (1)         (2)              (3)          (4)          (5)                      (6)                (7)
  X           Normalized X     Output of    Output of    Weighted Hidden          Output Node        Predicted Y
              ((1)-508)/4994   Node 1       Node 2       Node Output              Logistic Function  6.52+3.26*(6)
                                                         6.15-3.05*(3)-6.43*(4)   1/(1+exp(-(5)))
  508.48      0.00             0.016        0.999        -0.323                   0.420              7.889
  1,503.00    0.22             0.088        0.992        -0.498                   0.378              7.752
  3,013.40    0.56             0.596        0.890        -1.392                   0.199              7.169
  4,994.80    1.00             0.980        0.190         1.937                   0.874              9.369
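The column-by-column computation can be traced in code. This sketch assumes the weights of Tables 4 and 5, X normalized by its observed minimum 508.48 and maximum 4,994.80 (the column (2) header shows rounded constants), and the output rescaled into the Y range via 6.52 + 3.26*(6); rounding in the printed tables means the reproduced values agree with Table 6 to roughly two decimals:

```python
import math

def logistic(y):
    return 1.0 / (1.0 + math.exp(-y))

def predict(x):
    """Walk one X value through the fitted 2-node network (Tables 4-6)."""
    x_norm = (x - 508.48) / (4994.80 - 508.48)        # column (2)
    node1 = logistic(-4.107 + 7.986 * x_norm)         # column (3), Table 4
    node2 = logistic(6.549 - 7.989 * x_norm)          # column (4), Table 4
    hidden = 6.154 - 3.0501 * node1 - 6.427 * node2   # column (5), Table 5
    out = logistic(hidden)                            # column (6)
    return 6.52 + 3.26 * out                          # column (7): rescale to Y

for x in (508.48, 1503.00, 3013.40, 4994.80):
    print(x, round(predict(x), 3))
```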
Hidden and Output Layer
[Figure: three panels showing the hidden node outputs vs X, the weighted output from the hidden nodes vs X, and the logistic function of the hidden node output vs X]
Fit of Curve with 2 Nodes
[Figure: fitted 2-node neural network and true Y values; predicted and true Y between roughly 7.0 and 9.5 for X from 0 to 5000]
Fit of Curve with 3 Nodes
[Figure: fitted 3-node neural network and true Y values; Y between roughly 7.0 and 9.5 for X from 0 to 5000]
Universal Function Approximator
- The multilayer perceptron neural network with one hidden layer is a universal function approximator
- Theoretically, with a sufficient number of nodes in the hidden layer, any nonlinear function can be approximated
Correlated Variables
- Variables used in model building are often correlated.
- It is difficult to isolate the effect of the individual variables because of the correlation between the variables.
Example of correlated variables
[Figure: scatterplot of Car Parts inflation rate vs Car Body inflation rate; Car Parts Rate from -0.015 to 0.02, Car Body Rates from -0.04 to 0.08]
A Solution: Principal Components & Factor Analysis

One Factor Model
[Diagram: a single factor F1 drives the observed variables X1, X2, and X3, each with its own unique factor U1, U2, U3]
Factor Analysis: An Example
[Diagram: a Social Inflation Factor drives Litigation Rates (U1), Size of Jury Awards (U2), and an Index of the State Litigation Environment (U3)]
Factor Analysis
[Diagram: the factor analysis result used for prediction — the input variables X1, X2, X3 load on the factor F1, which predicts the dependent variable Y]

  Î = w1*X1 + w2*X2 + ... + wn*Xn
Factor Analysis
[Diagram: a three-layer neural network with one hidden node — input layer, hidden layer, output layer]
Correlated Variables: An Example
- Workers Compensation line
- Produce an economic inflation index from:
  - Wage Inflation
  - Medical Inflation
  - Benefit Level Index
- In this simplified example, no other variable drives severity results
Factor Analysis Example

Table 8
  Variable                        Loading   Weights
  Wage Inflation Index            0.985     0.395
  Medical Inflation Index         0.988     0.498
  Benefit Level Inflation Index   0.947     0.113

  X1 = b1 * Factor1
  X2 = b2 * Factor1
  X3 = b3 * Factor1

  Index = .395*(Wage Inflation) + .498*(Medical Inflation) + .113*(Benefit Level Inflation)
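Applying the weights is then a single weighted sum; a sketch in Python (the index values plugged in are hypothetical, for illustration only):

```python
# Weights column of Table 8
weights = {"wage": 0.395, "medical": 0.498, "benefit_level": 0.113}

def inflation_index(indices):
    """Index = .395*Wage + .498*Medical + .113*Benefit Level."""
    return sum(weights[k] * indices[k] for k in weights)

# Hypothetical index values, not from the paper
example = {"wage": 1.03, "medical": 1.08, "benefit_level": 1.02}
print(inflation_index(example))
```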
Factor Analysis Example
[Figure: neural network predicted values (roughly 5200 to 7200) vs the unobserved inflation factor (1.0 to 1.4)]
Interpreting Neural Network
- Look at the weights to the hidden layer
- Compute sensitivities: a measure of how much the predicted value's error increases when the variables are excluded from the model one at a time
Interpretation of Neural Network

Table 9: Factor Example Parameters
    W0       W1       W2       W3
  2.549   -2.802   -3.010    0.662

Table 10: Sensitivities of Variables in Factor Example
  Benefit Level       23.6%
  Medical Inflation   33.1%
  Wage Inflation       6.0%
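One common way to implement the sensitivity idea is to fix each variable at its mean (removing its variation) and measure how much the squared error grows. The one-node model below uses the Table 9 weights, but the input rows and targets are hypothetical stand-ins, so the resulting numbers are illustrative only:

```python
import math

def logistic(y):
    return 1.0 / (1.0 + math.exp(-y))

def predict(row, w):
    """One hidden node: logistic(W0 + W1*x1 + W2*x2 + W3*x3)."""
    return logistic(w[0] + sum(wi * xi for wi, xi in zip(w[1:], row)))

def sensitivities(rows, ys, w):
    """Increase in squared error when each variable is fixed at its mean."""
    def sse(data):
        return sum((y - predict(r, w)) ** 2 for r, y in zip(data, ys))
    base = sse(rows)
    means = [sum(col) / len(col) for col in zip(*rows)]
    return [sse([r[:j] + [means[j]] + r[j + 1:] for r in rows]) - base
            for j in range(len(rows[0]))]

w = [2.549, -2.802, -3.010, 0.662]            # Table 9: W0..W3
rows = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6],
        [0.7, 0.8, 0.9], [0.2, 0.9, 0.4]]     # hypothetical inputs
ys = [predict(r, w) for r in rows]            # targets the model fits exactly
print(sensitivities(rows, ys, w))             # all >= 0: error can only grow
```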
Interactions: Another Modeling Problem
- The impact of two variables together is more or less than the sum of their independent impacts.
[Figure: plot of "true" frequency vs age by territory, four panels (Territory 1-4); frequency roughly 0.1 to 0.3 over ages 17 to 77.5]
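A minimal way to see why additive models miss this pattern: simulate frequencies in which the age effect differs by territory, then compare an additive least-squares fit with one that includes an age-by-territory interaction term. Everything below is a constructed illustration, not the paper's simulation:

```python
import numpy as np

rng = np.random.default_rng(0)
age = rng.uniform(17, 80, 400)
terr = rng.integers(0, 2, 400).astype(float)  # two territories, for simplicity
# The age slope differs by territory -- an interaction by construction
freq = (0.30 - 0.002 * age + terr * (0.10 - 0.002 * age)
        + rng.normal(0, 0.01, 400))

def r_squared(X, y):
    """R-squared of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()

ones = np.ones_like(age)
additive = np.column_stack([ones, age, terr])              # main effects only
with_int = np.column_stack([ones, age, terr, age * terr])  # + interaction
print(r_squared(additive, freq) < r_squared(with_int, freq))  # True
```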
Interactions: Simulated Data
[Figure: plot of simulated frequencies vs age by territory, four panels (Territory 1-4); frequency roughly 0.2 to 0.5 over ages 17 to 77.5]
Interactions: Neural Network
[Figure: neural network predicted frequency vs age by territory, four panels (Territory 1-4); predicted values roughly 0.10 to 0.20 over ages 17 to 77.5]
Interactions: Regression
[Figure: regression predicted frequency vs age by territory, four panels (Territory 1-4); predicted values roughly 0.05 to 0.20 over ages 17 to 82.5]
Example With Messy Data

Table 15
  Variable                                Variable Type   Number of Categories   Missing Data
  Age of Driver                           Continuous                             No
  Territory                               Categorical     45                     No
  Age of Car                              Continuous                             Yes
  Car Type                                Categorical     4                      No
  Credit Rating                           Continuous                             Yes
  Auto BI Inflation Factor                Continuous                             No
  Auto PD and Phys Dam Inflation Factor   Continuous                             No
  Law Change                              Categorical     2                      No
  Bogus                                   Continuous                             No
Example With Messy Data
[Figure: distribution of Mu, a histogram over the range 6.50 to 9.00]
Visualizing Neural Network Result
[Figure: scatterplot of predicted log(Severity) (roughly 7.0 to 8.5) vs Age (20 to 80)]
Visualizing Neural Network Result
[Figure: visualization plot of predicted log(Severity) (roughly 8.05 to 8.25) vs Age (20 to 80)]
Visualization of Law Change Effect
[Figure: visualization plot of predicted log(Severity) vs the law change variable; two panels (Law = 1.00 and Law = 0.00) showing counts for predicted values from 7.70 to 8.50]
Visualization of Inflation
[Figure: inflation visualization plot of predicted log(Severity) (roughly 8.1 to 8.3) vs time in quarters (5 to 20)]
How Good Was the Fit?
[Figure: neural network predicted log(Severity) (7.7 to 8.3) vs true expected severity (5000 to 8000)]
How Good Was the Fit?
[Figure: scatterplot of neural network predicted log(Severity) (7.5 to 8.5) vs Mu (6.5 to 9.0)]