- ArtificialNeuralNetwork

CSC 562
Business Intelligence
Lecture 9
Chapter 6 – Artificial Neural Networks for Data Mining
1/31/2011
Learning Objectives
- Understand the concept and definitions of artificial neural networks (ANN)
- Know the similarities and differences between biological and artificial neural networks
- Learn the different types of neural network architectures
- Learn the advantages and limitations of ANN
- Understand how backpropagation learning works in feedforward neural networks
- Understand the step-by-step process of how to use neural networks
- Appreciate the wide variety of applications of neural networks in solving problem types such as
  - Classification
  - Regression
  - Clustering
  - Association
  - Optimization
Opening Vignette: “Predicting Gambling Referenda with Neural Networks”
(Page 242)
- Using NeuroSolutions, this study developed and tested models to predict community support for commercial gaming.
- The study used neural networks to examine the role of factors that contribute to the legalization and/or prohibition of gambling activities.
- It attempted to use neural network technology to predict the voting outcomes of various counties on this subject.
- On average, the models correctly predicted the voting results for about 4 out of every 5 counties (approximately 82% accuracy) on a sample data set of 1,287 records.
- Interestingly, and contrary to popular belief, the counties' financial characteristics and age distribution were not found to be significant factors in determining the ballot outcome. The dominant factors are identified on page 244.
- The study demonstrates that demographic data can be used to accurately predict voting outcomes on controversial issues.
[Figure: Opening Vignette: Predicting Gambling Referenda… Socio-demographic, religious, financial, and other inputs feed an input layer, a hidden layer, and an output layer. The output is whether a county voted “yes” or “no” to legalizing gaming, and the predicted vote is compared with the actual vote.]
- NeuroSolutions is a product of NeuroDimension that offers algorithms in the field of artificial intelligence.
- NeuroDimension offers NeuroSolutions, NeuroSolutions for Excel, and a Custom Solution Wizard, each of which can be downloaded for a free evaluation.
- The company offers a very good video that explains neural network algorithms and the field in general.
- Pricing is relatively reasonable for the product: NeuroSolutions for Excel costs $295.
Neural Network Concepts
(Page 245)
- Neural networks (NN): a brain metaphor for information processing that uses artificial neurons (programming constructs that mimic the properties of biological neurons)
- Neural computing: a pattern recognition methodology for machine learning
- Artificial neural network (ANN): the resulting model from neural computing
- ANN has many uses:
  - Pattern recognition, forecasting, prediction, and classification
  - Finance, marketing, manufacturing, operations, information systems, and so on
ANN Video
- An excellent video offered by NeuroSolutions provides a good overview of ANN
Biological Neural Networks
(Page 246)
[Figure: two interconnected brain cells (neurons), each labeled with dendrites, soma, axon, and synapse.]
- An axon is a long, slender projection of a nerve cell, or neuron, that conducts electrical impulses away from the neuron's cell body, or soma.
- Dendrites are branched filaments of nerve cells (neurons). The word "dendrite" derives from the Greek word for tree, which describes their branching, tree-like structure.
- Synapse: able to increase or decrease the strength of the connection between neurons and cause excitation or inhibition of a subsequent neuron.
- The word "soma" comes from the Greek word for “body”; the soma of a neuron is often called the cell body.
Processing Information in ANN
(Page 247, Figure 6.3)
[Figure 6.3: a single neuron (processing element, PE) with inputs and outputs. Inputs x1, x2, ..., xn arrive over weights w1, w2, ..., wn. The PE computes the summation S, applies the transfer function f(S), and produces Y, which fans out to outputs Y1, Y2, ..., Yn.]
Summation: $S = \sum_{i=1}^{n} X_i W_i$
Transfer function: $Y = f(S)$
Biology Analogy
(Page 247)
Elements of ANN
(Page 248-250)
- Processing elements (PEs): organized in different ways to form the network's structure
- Network architecture
- Hidden layers: take input from the previous layer and convert it into outputs for further processing (used in complex problems)
- Parallel processing: resembles the way the brain works, and differs from the serial processing of conventional computing
- Network information processing
  - Inputs: each input corresponds to a single attribute, such as age or income level
  - Outputs: the solution to the problem, e.g., “yes” or “no” on a loan application
  - Connection weights: the relative strength (importance) of the input data
  - Summation function: the weighted sum of all input elements entering a PE
[Figure 6.4 (Page 249): neural network with one hidden layer. Inputs x1, x2, x3 enter PEs in the input layer, pass through PEs in the hidden layer, and reach the output layer, which produces Y1. Each PE computes a weighted sum (S) and applies a transfer function (f).]
[Figure: summation function for (a) a single neuron and (b) several neurons. PE: processing element (or neuron).]
(a) Single neuron:
$Y = X_1 W_1 + X_2 W_2$
(b) Multiple neurons:
$Y_1 = X_1 W_{11} + X_2 W_{21}$
$Y_2 = X_1 W_{12} + X_2 W_{22}$
$Y_3 = X_2 W_{23}$
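The summation for several neurons can be sketched in a few lines of Python. This is only an illustration: the input and weight values are made up, and the function name `weighted_sums` is not from the textbook. A weight of 0 stands in for the missing connection from X1 to the third neuron.

```python
def weighted_sums(inputs, weights):
    """Return one weighted sum per neuron.

    weights[i][j] is the weight on the connection from input i to neuron j.
    """
    n_neurons = len(weights[0])
    return [
        sum(x * row[j] for x, row in zip(inputs, weights))
        for j in range(n_neurons)
    ]


# Hypothetical values, chosen only to illustrate the three equations.
X = [1.0, 2.0]
W = [
    [0.5, 0.1, 0.0],  # W11, W12, and no connection from X1 to neuron 3
    [0.3, 0.2, 0.4],  # W21, W22, W23
]

Y1, Y2, Y3 = weighted_sums(X, W)
print(Y1, Y2, Y3)
```

Each entry of the result corresponds to one of the equations Y1, Y2, Y3 for the multiple-neuron case.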
Elements of ANN
(Page 251)
- Transformation (transfer) function: determines the activation level of a neuron; based on this level, the neuron may or may not produce an output.
- Computed via the sigmoid (logistic activation) function: $Y_T = 1/(1 + e^{-Y})$
- Y is computed via weighted summation
- With a threshold value, any value below the threshold is not passed to the output (0); anything above it is (1)
Example: a processing element (PE) with inputs X1 = 3, X2 = 1, X3 = 2 and weights W1 = 0.2, W2 = 0.4, W3 = 0.1
Summation function: Y = 3(0.2) + 1(0.4) + 2(0.1) = 1.2
Transfer function: $Y_T = 1/(1 + e^{-1.2}) = 0.77$
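The worked example on this slide can be checked with a short Python sketch. The inputs and weights are the ones on the slide; the function name `neuron_output` and the threshold of 0.5 are assumptions added for illustration, since the slide does not give a threshold value.

```python
import math


def neuron_output(inputs, weights):
    """Return (Y, Y_T): the weighted sum and its sigmoid transform."""
    y = sum(x * w for x, w in zip(inputs, weights))
    y_t = 1.0 / (1.0 + math.exp(-y))
    return y, y_t


y, y_t = neuron_output([3, 1, 2], [0.2, 0.4, 0.1])
print(round(y, 2), round(y_t, 2))  # 1.2 0.77

# Assumed threshold (not given on the slide): pass 1 if above it, else 0.
THRESHOLD = 0.5
fires = 1 if y_t > THRESHOLD else 0
```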
Neural Network Architectures
(Page 251-252)
- Several ANN architectures exist
  - Feedforward (Figure 6.4, page 249)
  - Recurrent (Figure 6.7, page 252)
  - Associative memory
  - Self-organizing feature maps
  - Hopfield networks, etc.
Recurrent Neural Networks
(Page 252, Figure 6.7)
Neural Network Architectures
(Page 252)
- The architecture of a neural network is driven by the task it is intended to address
- Most popular architecture: feedforward, multilayer perceptron with the backpropagation learning algorithm
- That is, the feedforward perceptron is the architecture and backpropagation is the learning algorithm.
Frank Rosenblatt (1957)
- The perceptron is a type of artificial neural network invented in 1957 at the Cornell Aeronautical Laboratory by Frank Rosenblatt. Rosenblatt, a computer scientist born in 1928 in New York City, helped to create the Mark I Perceptron computer in 1960 at Cornell University. It was the first computer that could learn skills by trial and error, in an attempt to mimic human thought processes through the use of a neural network. Rosenblatt died in 1971.
- Backpropagation is a common supervised method for teaching artificial neural networks how to perform a given task. It was first described by Arthur E. Bryson and Yu-Chi Ho in 1969.
- Note: the original Mark I (the Automatic Sequence Controlled Calculator, ASCC) is a different, earlier machine from Rosenblatt's Mark I Perceptron. The building elements of the ASCC were switches, relays, rotating shafts, and clutches.
Learning in ANN
(Page 252)
- Learning: a process by which a neural network learns the underlying relationship between inputs and outputs, or just among the inputs
- Supervised learning
  - For prediction-type problems
  - E.g., backpropagation
- Unsupervised learning
  - For clustering-type problems
  - Self-organizing
  - E.g., adaptive resonance theory
A Taxonomy of ANN Learning Algorithms
(Page 253, Figure 6.8)
Learning Algorithms
- Discrete/binary input
  - Supervised: Simple Hopfield, Outerproduct AM, Hamming Net
  - Unsupervised: ART-1, Carpenter / Grossberg
- Continuous input
  - Supervised: Delta rule, Gradient descent, Competitive learning, Neocognitron, Perceptron
  - Unsupervised: ART-3, SOFM (or SOM), Other clustering algorithms
Architectures
- Supervised
  - Recurrent: Hopfield
  - Feedforward (most popular): Nonlinear vs. linear, Backpropagation, ML perceptron, Boltzmann
- Unsupervised
  - Estimator: SOFM (or SOM)
  - Extractor: ART-1, ART-2
Read Application Case
(Page 254)
- Microsoft used BrainMaker neural network software from California Scientific to maximize the return on direct mail
- Some of the variables considered (25 in total)
  - Recency (how long since the last registration / product purchase)
  - First date to file (loyal over time?)
  - Number of products bought and filed
  - Value of products bought and registered
  - Number of days from product release to purchase
- Improved the response rate from 4.9% to 8.2%, a 35% cost savings on 40 million pieces of direct mail
A Supervised Learning Process
(Pages 255-256, Figure 6.9)
Three-step process:
1. Compute temporary outputs
2. Compare outputs with desired targets
3. Adjust the weights and repeat the process
[Figure 6.9: flowchart. The ANN model computes the output. If the desired output is not achieved, adjust the weights and compute the output again; if it is achieved, stop learning.]
How a Network Learns
(Page 256)
- Example: a single neuron that learns the inclusive OR operation
- Learning parameters:
  - Learning rate
  - Momentum
* See page 257 for the step-by-step progression of the learning process
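A minimal sketch of a single neuron learning inclusive OR, following the compute / compare / adjust loop. The initial weights, learning rate, and threshold below are assumed values chosen for illustration, the names `step` and `train_or_neuron` are hypothetical, and momentum is left out to keep the sketch short.

```python
# Truth table for inclusive OR: (inputs, desired output).
OR_CASES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]


def step(total, threshold=0.5):
    """Threshold transfer function: 1 if the sum exceeds the threshold."""
    return 1 if total > threshold else 0


def train_or_neuron(weights=(0.1, 0.3), rate=0.2, threshold=0.5, max_epochs=100):
    """Adjust weights until every case is classified correctly."""
    w = list(weights)
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for x, desired in OR_CASES:
            # 1. Compute the temporary output.
            actual = step(sum(xi * wi for xi, wi in zip(x, w)), threshold)
            # 2. Compare it with the desired target.
            delta = desired - actual
            # 3. Adjust the weights in proportion to the error.
            if delta != 0:
                errors += 1
                w = [wi + rate * delta * xi for wi, xi in zip(w, x)]
        if errors == 0:
            return w, epoch
    return w, max_epochs


learned_weights, epochs_used = train_or_neuron()
print(learned_weights, epochs_used)
```

Learning stops at the first pass through all four cases that produces no errors, which mirrors the "Is desired output achieved?" test in the supervised learning process.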
Backpropagation Learning
(Page 258)
- Errors are used to correct the weights, hence the name back-error propagation
- The (supervised) learning algorithm procedure:
1. Initialize weights with random values and set other network parameters
2. Read in the inputs and the desired outputs
3. Compute the actual output (by working forward through the layers)
4. Compute the error (difference between the actual and desired output)
5. Change the weights by working backward through the hidden layers
6. Repeat steps 2-5 until the weights stabilize
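The six steps can be sketched for a small network with one hidden layer and sigmoid transfer functions. This is an illustrative sketch, not the textbook's implementation: the network size, learning rate, epoch count, seed, and OR training data are all assumptions, and a fixed number of epochs stands in for "until the weights stabilize".

```python
import math
import random

OR_SAMPLES = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
              ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def forward(x, w_hidden, w_out):
    """Work forward through the layers; the last weight in each row is a bias."""
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0])))
              for row in w_hidden]
    output = sigmoid(sum(w * v for w, v in zip(w_out, hidden + [1.0])))
    return hidden, output


def train(samples, n_hidden=3, rate=0.5, epochs=5000, seed=0):
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # Step 1: initialize weights with random values.
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    history = []  # summed squared error per epoch
    for _ in range(epochs):                       # Step 6: repeat steps 2-5
        total = 0.0
        for x, desired in samples:                # Step 2: read inputs, targets
            hidden, y = forward(x, w_hidden, w_out)   # Step 3: actual output
            error = desired - y                   # Step 4: compute the error
            total += error ** 2
            # Step 5: propagate the error backward and change the weights.
            delta_out = error * y * (1.0 - y)
            delta_hidden = [delta_out * w_out[j] * h * (1.0 - h)
                            for j, h in enumerate(hidden)]
            for j, v in enumerate(hidden + [1.0]):
                w_out[j] += rate * delta_out * v
            for j in range(n_hidden):
                for i, v in enumerate(x + [1.0]):
                    w_hidden[j][i] += rate * delta_hidden[j] * v
        history.append(total)
    return w_hidden, w_out, history


w_hidden, w_out, history = train(OR_SAMPLES)
print(history[0], history[-1])
```

The hidden-layer deltas are computed before the output weights are updated, so each backward pass uses the same weights that produced the forward pass.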
[Figure 6.10 (Page 258): backpropagation of error for a single neuron. Inputs x1, x2, ..., xn with weights w1, w2, ..., wn feed the neuron (PE), which computes the summation and the transfer function; the error term a(Zi - Yi) is fed back to adjust the weights.]
Summation: $S = \sum_{i=1}^{n} X_i W_i$
Transfer function: $Y = f(S)$
Error: $a(Z_i - Y_i)$
Development Process of an ANN
(Page 259)
- Similar to the structured design process for traditional information systems, with some new elements
- See page 253
An MLP ANN Structure for the Box-Office Prediction Problem
(Page 262, Figure 6.12)
This is the vignette at the start of Chapter 5 (page 191)
Input layer (27 PEs):
1. MPAA Rating (5): G, PG, PG13, R, NR
2. Competition (3): High, Medium, Low
3. Star Value (3): High, Medium, Low
4. Genre (10): Sci-Fi, Action, ...
5. Technical Effects (3): High, Medium, Low
6. Sequel (2): Yes, No
7. Number of Screens: positive integer
Hidden layer I: 18 PEs
Hidden layer II: 16 PEs
Output layer (9 PEs), where BO is the box-office receipts:
- Class 1 - FLOP (BO < 1M)
- Class 2 (1M < BO < 10M)
- Class 3 (10M < BO < 20M)
- Class 4 (20M < BO < 40M)
- Class 5 (40M < BO < 65M)
- Class 6 (65M < BO < 100M)
- Class 7 (100M < BO < 150M)
- Class 8 (150M < BO < 200M)
- Class 9 - BLOCKBUSTER (BO > 200M)
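One way to read the 27 input PEs is as a one-of-N encoding of the six categorical variables (5 + 3 + 3 + 10 + 3 + 2 = 26) plus one PE for the number of screens. The sketch below assumes that reading. Because the slide lists only two of the ten genres, genres are represented by placeholder indices 0-9, and the handling of values that fall exactly on a class boundary is also an assumption.

```python
from bisect import bisect_right

# Levels per categorical input. Genre names beyond those on the slide are
# unknown here, so placeholder indices 0..9 are used instead of names.
CATEGORICAL = {
    "mpaa": ["G", "PG", "PG13", "R", "NR"],
    "competition": ["High", "Medium", "Low"],
    "star_value": ["High", "Medium", "Low"],
    "genre": list(range(10)),
    "technical_effects": ["High", "Medium", "Low"],
    "sequel": ["Yes", "No"],
}

# Upper bounds of classes 1-8, in millions.
CLASS_BOUNDS_M = [1, 10, 20, 40, 65, 100, 150, 200]


def encode_inputs(movie):
    """Turn one movie record into a 27-element input vector."""
    vector = []
    for name, levels in CATEGORICAL.items():
        vector.extend(1.0 if movie[name] == level else 0.0 for level in levels)
    # Raw count; in practice this value would usually be scaled first.
    vector.append(float(movie["screens"]))
    return vector


def box_office_class(bo_millions):
    """Map box-office receipts (in millions) to a class from 1 to 9.

    Assumption: a value exactly on a boundary goes to the higher class.
    """
    return bisect_right(CLASS_BOUNDS_M, bo_millions) + 1


# Hypothetical record, for illustration only.
example = {"mpaa": "PG13", "competition": "Low", "star_value": "High",
           "genre": 1, "technical_effects": "Medium", "sequel": "No",
           "screens": 3000}
inputs = encode_inputs(example)
print(len(inputs), box_office_class(15))
```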
Data Collection and Testing
(Page 261)
- Data is split into three parts
  - Training (~60%)
  - Validation (~20%)
  - Testing (~20%)
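A minimal sketch of the three-way split, assuming a random shuffle before splitting. The function name `split_dataset`, the seed, and the 100-record example are hypothetical; the 60/20/20 proportions are the approximate ones from the slide.

```python
import random


def split_dataset(records, train=0.6, validation=0.2, seed=42):
    """Shuffle the records and split them into training, validation, and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    training = shuffled[:n_train]
    validating = shuffled[n_train:n_train + n_val]
    testing = shuffled[n_train + n_val:]  # whatever remains (~20%)
    return training, validating, testing


training, validating, testing = split_dataset(range(100))
print(len(training), len(validating), len(testing))  # 60 20 20
```

The training set is used to adjust the weights, the validation set to decide when to stop or which configuration to keep, and the test set only for the final accuracy estimate.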
Sensitivity Analysis on ANN Models
(Page 264-265)
- A common criticism of ANN: the black-box syndrome!
- Answer: sensitivity analysis
  - Conducted on a trained ANN
  - The inputs are changed while the relative change in the output is measured/recorded
  - The results illustrate the relative importance of the input variables
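The idea can be sketched by perturbing one input at a time and recording how much the output moves. The "trained model" here is a stand-in: a single sigmoid neuron with weights 0.2, 0.4, and 0.1, used only for illustration. The function names and the perturbation size are assumptions.

```python
import math


def toy_model(inputs, weights=(0.2, 0.4, 0.1)):
    """Stand-in for a trained ANN: one sigmoid neuron with fixed weights."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1.0 / (1.0 + math.exp(-total))


def sensitivity(model, baseline, delta=0.1):
    """Return the absolute output change caused by nudging each input by delta."""
    base_output = model(baseline)
    changes = []
    for i in range(len(baseline)):
        perturbed = list(baseline)
        perturbed[i] += delta  # change one input, hold the others fixed
        changes.append(abs(model(perturbed) - base_output))
    return changes


changes = sensitivity(toy_model, [3.0, 1.0, 2.0])
ranking = sorted(range(len(changes)), key=lambda i: changes[i], reverse=True)
print(changes, ranking)
```

The ranking orders the inputs from most to least influential around the chosen baseline. For a real network the ranking can depend on the baseline, so perturbations are usually repeated over many records.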
Sensitivity Analysis on ANN Models
(Page 265, Figure 6.13)
[Figure 6.13: systematically perturbed inputs are fed to the trained ANN (“the black-box”), and the change in the outputs is observed.]
- See and read Application Case 6.5 (Page 266)
- Sensitivity analysis reveals the most important injury severity factors in traffic accidents
Sensitivity Analysis on ANN Models
(Page 266)
- Application Case 6.5
  - 41,000 people die in 6 million US traffic accidents
  - Goal: analyze the factors that elevate the risk of severe injury
  - Factors include behavior, environment, technical, etc.
  - Used a series of ANN models to estimate the significance of the crash factors on the level of severity sustained by the driver
  - Two-step process: (1) build prediction models, (2) run sensitivity analysis on the trained neural networks
  - Results show significant differences among the models built for different injury severity levels (the most influential factors depend heavily on the level of injury)
A Sample Neural Network Project: Bankruptcy Prediction
(Pages 267-270)
- A comparative analysis of ANN versus logistic regression (LR), a statistical method
- Inputs
  - X1: Working capital/total assets
  - X2: Retained earnings/total assets
  - X3: Earnings before interest and taxes/total assets
  - X4: Market value of equity/total debt
  - X5: Sales/total assets
- Data was obtained from Moody's Industrial Manuals
  - Time period: 1975 to 1982
  - 129 firms (65 of which went bankrupt during the period and 64 that did not)
- Different training and testing proportions are used and compared
  - 90/10 versus 80/20 versus 50/50
  - Resampling is used to create 60 data sets
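A rough sketch of how resampling could yield 60 data sets from the 129 firms. The slide gives only the three proportions and the total of 60; the 20 resamples per proportion used below is an assumption that happens to produce that total, and the function name and seed are hypothetical.

```python
import random


def resample_splits(n_records, train_fractions=(0.9, 0.8, 0.5),
                    repeats=20, seed=0):
    """Build repeated random train/test splits for each training fraction.

    Assumption: 20 repeats per fraction, giving 3 x 20 = 60 data sets.
    """
    rng = random.Random(seed)
    splits = []
    for fraction in train_fractions:
        n_train = round(n_records * fraction)
        for _ in range(repeats):
            indices = list(range(n_records))
            rng.shuffle(indices)
            splits.append((fraction, indices[:n_train], indices[n_train:]))
    return splits


splits = resample_splits(129)
print(len(splits))  # 60
```

Each entry holds the training fraction plus the record indices assigned to training and testing, so both ANN and LR can be evaluated on identical splits.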
Network specifics:
- Feedforward MLP
- Backpropagation
- Varying learning rate and momentum values
- 5 input neurons (1 for each financial ratio)
- 10 hidden neurons
- 2 output neurons (1 indicating a bankrupt firm and the other indicating a nonbankrupt firm)
[Figure: inputs x1 to x5 feed the network, which has two outputs: BR = 1 (bankrupt) and NBR = 1 (nonbankrupt).]
Bankruptcy Prediction: Results
(Page 269, figure 6.2)
Bottom Line: Advantages of ANN
(Pages 274-276)
- Able to deal with (identify/model) highly nonlinear relationships
- Can handle a variety of problem types: loan applications, forecasting profitability and finances, sports (team success), fraud prevention, time-series forecasting, and health care and medicine (diagnosing breast cancer; see Case 6.4 on page 276)
- Usually provides better results (prediction and/or clustering) than its statistical counterparts
Disadvantages of ANN
- They are deemed to be black-box solutions, lacking explainability
- It is hard to find optimal values for the large number of network parameters
- Optimal design is still an art: it requires expertise and extensive experimentation
- It is hard to handle a large number of variables (especially rich nominal attributes)
- Training may take a long time for large datasets, which may require case sampling
ANN Software
(Page 263)
- Standalone ANN software tools
  - NeuroSolutions
  - BrainMaker
  - NeuralWare
  - NeuroShell, … for more (see pcai.com) …
- Part of a data mining software suite
  - PASW (formerly SPSS Clementine)
  - SAS Enterprise Miner
  - Statistica Data Miner, … many more …
Next lecture
Chapter 7 - Text and Web Mining