Neuro-Fuzzy Approaches to Decision Making:
An Application to Check Authorization from Incomplete Information

V.K. Ramani¹ (ramani@crd.ge.com), J.R. Echauz² (echauz@amadeus.upr.clu.edu),
G.J. Vachtsevanos³ (gjv@ee.gatech.edu), and S.S. Kim³ (skim@ee.gatech.edu)

¹ Information Technology Laboratory, Corporate Research & Development, General Electric, Niskayuna, NY 12309
² Dept. of Electrical & Computer Engineering, University of Puerto Rico, Mayagüez Campus, Mayagüez, PR 00681
³ School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332
Abstract - This paper describes the application of several neuro-fuzzy paradigms, such as a multilayer perceptron, a polynomial neural network, and a fuzzy decision model, to the problem of check approval from incomplete data. A simple benchmark case is established as a performance metric to compare the various nonlinear strategies. An overall improvement of at least 10% was obtained in each of these cases.

Keywords: Check authorization, neuro-fuzzy techniques.
I. INTRODUCTION
By the turn of the century, almost half of all consumer payments are expected to be made by checks [1]. This trend is forcing more and more merchants to rely on check-guarantee and check-authorization services provided by specialized companies that assist them in managing the associated increased risk. These companies maintain large databases with information such as customers' names, fraudulent driver's licenses, and Social Security numbers. Stores with connections to an on-line system can have checks approved or rejected in a matter of seconds.

Despite the steady increase in the availability of these services, the number of check-writing individuals represented in current databases is still a very small fraction of the whole population. In many cases, a new customer shows up for whom there is no information available. Is there any decision scheme, other than pure random guessing, that can reduce the odds for the merchant of accepting a bad check? This problem calls for systems that can generalize from incomplete information, and thus predict the creditworthiness of the new customer. Nonlinear learning networks are prime candidates for this task. We describe the application of a multilayer perceptron, a polynomial neural network, and a fuzzy decision model to address the problem of check approval from incomplete data.
II. PROBLEM DESCRIPTION
Consider two sets of data, for training and testing purposes respectively, typical of those available from a check authorization company. The data set consists of four input variables and an output variable:

x1 : Day of the week (1 = Monday, ..., 7 = Sunday)
x2 : Age of the person
x3 : Check number
x4 : Amount of the check
y  : 1 = Accept check, 0 = Reject check

Each set consists of 1950 data points. Out of the 3900 data points, only 74 have the complete vector of information. Any missing information is coded as zeros. The output is denoted as B for a bounced check and G for a good check, while the predicted output is represented by B̂ and Ĝ, respectively.
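To make this encoding concrete, the sketch below shows one plausible in-memory representation. The array layout and helper function are our own illustration (they are not part of the original data set); zeros mark missing fields exactly as described above.

```python
import numpy as np

# One hypothetical record, ordered as in the text:
# x1 = day of week (1-7), x2 = age, x3 = check number, x4 = check amount.
# A zero in any input field codes a missing value.
record = np.array([3.0, 0.0, 1041.0, 85.50])   # age unknown, coded as 0
label = 1                                      # 1 = accept, 0 = reject

def missing_mask(X):
    """Boolean mask of missing (zero-coded) entries in an n-by-4 input array."""
    return X == 0
```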
III. METHODS AND RESULTS
A. Minimum Distance Classifier
With two equiprobable classes, the classification accuracy of the nonlinear learning networks outlined in this paper can be neither worse than 50%* nor better than 100%. But how can we tell if the added complexity of these networks is justified? How "good" is their performance relative to simpler, well-established classification methods?

* Less than 50% is actually "better": one could do no worse by contradicting the output decision.
Fig. 1. Thresholded linear discriminant.

Table 1. (a) Performance of MDC; (b) performance of MDC in dollars.

(a)
        B             G
B̂      532 (53.8%)   245 (25.5%)
Ĝ      456 (46.2%)   717 (74.5%)
Overall correct = 64.1%

(b)
        B                  G
B̂      -                  $31,410 (14.0%)
Ĝ      $147,195 (70.5%)   $192,849 (86.0%)
Overall correct = 58.7%
We will establish a benchmark measure of performance defined by a minimum distance classifier (MDC). The decision rule adopted by any such system is simply: "assign x to the class whose mean feature vector is closest (in the sense of a Euclidean distance) to x." Thus a dichotomous decision is given by

    ||x − μ_1|| < ||x − μ_2||  ⟹  x ∈ C_1,  else x ∈ C_2.   (1)

The rule reduces to a bank of linear discriminants followed by a maximum selector [2]. In fact, for a two-class problem in particular, the decision boundary needs only one hyperplanar separatrix. Thus, the classifier further reduces to a single thresholded linear discriminant as shown in Fig. 1, whose weight vector [w_0 ... w_4]ᵀ follows from expanding (1):

    [w_1, ..., w_4]ᵀ = 2(μ_1 − μ_2),   w_0 = ||μ_2||² − ||μ_1||².   (2)

This decision model trains virtually instantaneously, as its parameters are determined from estimates of the mean feature vectors. It is a reasonable starting point in most cases, and under independent equal-variance Gaussianity of the features, it is the optimal (maximum-likelihood) model.

The MDC was trained on the raw training set, and the overall classification accuracy was 56.4% on the test set. This performance is only slightly better than the "constant-output classifier" or the "coin-flip classifier," whose expected accuracy is 50% assuming equal a priori probabilities for each class. An obvious factor that cannot be directly handled by this classifier is the use of zeros in places of missing data. The linear decision boundary implemented by the dichotomous MDC cannot "interpret" zeros as special cases, but rather takes them as customers of zero age writing checks numbered 000, clearly an artifact.

A simple preprocessing of the raw database can help alleviate the anomaly caused by missing data. One approach is to replace each zero with the mode or the mean of the corresponding input variable. The latter can be easily estimated from presumably representative nonzero data in an unbiased and consistent manner using the arithmetic average. In this case, the unknown values are replaced with guesses that produce near-minimal mean squared errors. Once these guesses are computed from the training set, they become part of the classifier and are applied, without any change, to the test set.
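A minimal sketch of this benchmark classifier, including the zero-to-mean imputation just described, is given below. It assumes the data sits in numpy arrays X (n-by-d inputs, zeros marking missing values) and y (0/1 class labels); the function names are ours, not the authors'.

```python
import numpy as np

def fit_mdc(X, y):
    """Mean-impute zero-coded entries, then estimate one mean vector per class."""
    X = X.astype(float).copy()
    # Each feature's mean is estimated from its nonzero (observed) values only.
    col_means = np.array([X[X[:, j] != 0, j].mean() for j in range(X.shape[1])])
    X[X == 0] = np.take(col_means, np.where(X == 0)[1])
    mu = np.array([X[y == c].mean(axis=0) for c in (0, 1)])   # class means
    return col_means, mu

def predict_mdc(X, col_means, mu):
    """Assign each point to the class with the nearest mean (Euclidean)."""
    X = X.astype(float).copy()
    X[X == 0] = np.take(col_means, np.where(X == 0)[1])       # reuse training guesses
    d = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)
    return d.argmin(axis=1)
```

Note that the column means learned from the training set are reused verbatim at test time, mirroring the procedure above.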
The above scheme was employed on the MDC and the results are summarized in the confusion matrix in Table 1(a). The columns represent the true output classes, whereas the rows represent the model assessments based on input information. Each entry contains the number of cases and the percentage relative to its column. The overall classifier accuracy (trace of the matrix over the sum of its entries) increased by almost 8%, and we take this new figure as a worst-case bound for the more involved methods described next. It is interesting to note that the exact same performance was obtained upon deletion of the first input variable, indicating that the day of the week is irrelevant, at least in an MDC sense.

In the context of check authorization systems, it is also significant to monitor the confusion matrix expressed in dollar terms, as shown in Table 1(b). This performance measure gives more weight to larger check amounts, recognizing the proportionately larger impact of fraud in such transactions.
B. Multilayer Perceptron

The results obtained using the MDC served to guide the design of a (3,100,1) two-layer perceptron trained with backpropagation. The choice of two layers, with at least the hidden layer being nonlinear, is adequate in many problems and gives the network the capability of generating arbitrary decision boundaries [3]. The three inputs to the network are the ones previously found most essential. The number of neurons follows the rough rule of thumb of being one tenth the number of training exemplars, which is roughly 1000 because half of the "training" set was actually used for cross-validation. The single output is a "soft" decision in the interval [0,1] during training, but is hard-limited in its final implementation, returning only values in {0,1}. The network architecture is shown in Fig. 2 in vector/matrix notation. The activation functions are logistic sigmoids and the hard-limiter has a threshold of 0.5.

Each input variable has a different meaning and an incommensurate range of values. A standardization of the inputs in terms of z-scores does not change the final performance of a trained network, but does tend to make the learning procedure faster. The means and standard deviations are estimated from the nonzero values in the training set.

The network was trained until the mean squared error on the validation set started its upward trend, the point at which generalization performance starts to degrade. The optimal stopping point occurred after 67 passes of the training data, as shown in Fig. 3. The performance of this neural network on the test set is summarized in Tables 2(a) and (b) in terms of number of cases and dollar amount.

Fig. 2. Two-layer perceptron.

Fig. 3. Training with cross-validation.

Table 2. (a) Performance of MLP; (b) performance of MLP in dollars.

(a)
        B             G
B̂      704 (71.3%)   274 (28.5%)
Ĝ      284 (28.7%)   688 (71.5%)
Overall correct = 71.4%

(b)
        B                 G
B̂      -                 -
Ĝ      $72,367 (34.7%)   $179,080 (79.9%)
Overall correct = 72.8%
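The sketch below mirrors the procedure described in this section: z-scores estimated from nonzero training values, a single nonlinear hidden layer, and early stopping on a held-out validation set. It is our minimal reconstruction under those assumptions (plain batch gradient descent on squared error), not the authors' original implementation.

```python
import numpy as np

def zscore_stats(X):
    """Per-column mean and std, estimated from nonzero (observed) values only."""
    return np.array([(X[X[:, j] != 0, j].mean(), X[X[:, j] != 0, j].std())
                     for j in range(X.shape[1])])

def standardize(X, stats):
    return (X - stats[:, 0]) / stats[:, 1]

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_mlp(Xt, yt, Xv, yv, hidden=100, lr=0.1, max_epochs=200, seed=0):
    """Batch backprop on squared error with early stopping: keep the weights
    that minimize mean squared error on the held-out validation set."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.1, (Xt.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, hidden); b2 = 0.0
    best_mse, best = np.inf, None
    for _ in range(max_epochs):
        h = sigmoid(Xt @ W1 + b1)                  # hidden activations
        p = sigmoid(h @ W2 + b2)                   # "soft" decision in [0, 1]
        g = (p - yt) * p * (1 - p)                 # output-layer delta
        gh = np.outer(g, W2) * h * (1 - h)         # hidden-layer deltas
        W2 -= lr * h.T @ g / len(yt);   b2 -= lr * g.mean()
        W1 -= lr * Xt.T @ gh / len(yt); b1 -= lr * gh.mean(axis=0)
        val_mse = np.mean((sigmoid(sigmoid(Xv @ W1 + b1) @ W2 + b2) - yv) ** 2)
        if val_mse < best_mse:                     # validation error still falling
            best_mse, best = val_mse, (W1.copy(), b1.copy(), W2.copy(), b2)
    return best

def predict(X, params, threshold=0.5):
    """Hard-limit the soft output at 0.5, as in the final implementation."""
    W1, b1, W2, b2 = params
    return (sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) >= threshold).astype(int)
```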
C. PNN Algorithm
The inherent property of the Polynomial Neural Network (PNN), or the Group Method of Data Handling (GMDH), is to model complex systems using simple building blocks [4]. This led us to implement the PNN strategy on the check validation problem.

As there was a lot of uncertainty associated with the data available, both for training and validation purposes, two different strategies were implemented. In the first case, to establish a benchmark measure of performance, the PNN methodology was implemented on the raw (unprocessed) data sets [5]. In the second case, the data was preprocessed. Preprocessing of the data was considered a necessary step because of the uncertainty (lack of information) associated with the data. Preprocessing was further necessitated by the fact that more than 98% of the training and validation data had at least one input variable missing. It must be noted here that the computational complexity of the simulation is the same in both cases, as the PNN methodology was implemented on the same number of data points.
Table 3. (a) Performance of PNN; (b) performance of PNN in dollars.

(a)
Overall correct = 59.7%

(b)
        B                  G
B̂      -                  -
Ĝ      $111,770 (53.6%)   $160,343 (71.5%)
Overall correct = 59.4%

Fig. 4. PNN model.
In a PNN technique, a simple function is combined at each node of a neural network to obtain a more complex function. This function represents the model for the given set of input-output data. A simple function of the following form is used to combine two inputs at each node of the neural network:

    y = A + B x_i + C x_j + D x_i² + E x_j² + F x_i x_j,   (3)

where y is the output and x_i and x_j are the two inputs.
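Each node of (3) can be fitted by ordinary least squares on the six polynomial terms; the following sketch (our own, assuming numpy arrays) returns the coefficients A through F for one candidate input pair.

```python
import numpy as np

def fit_pnn_node(xi, xj, y):
    """Least-squares fit of y = A + B*xi + C*xj + D*xi^2 + E*xj^2 + F*xi*xj."""
    Phi = np.column_stack([np.ones_like(xi), xi, xj, xi**2, xj**2, xi * xj])
    coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return coef  # [A, B, C, D, E, F]

def eval_pnn_node(coef, xi, xj):
    """Evaluate a fitted node on new inputs."""
    Phi = np.column_stack([np.ones_like(xi), xi, xj, xi**2, xj**2, xi * xj])
    return Phi @ coef
```

Candidate nodes are then screened by their error, as the next paragraph describes.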
As shown in Fig. 4, the outputs obtained from each of these nodes are then combined to obtain a higher-degree polynomial, so that the best model representing the input-output data may be achieved. The degree of the polynomial increases by two at each layer of the neural network. The neural network was restricted to two layers, as it had to be trained for a dichotomous decision only. The best models at each layer were obtained by using a threshold error value. The outputs of these best models were then combined at the next layer to obtain a higher-degree polynomial. The best output model, in terms of the minimum predicted squared error, was used as the model for the system.

• Case I: PNN implementation

A straightforward implementation of the PNN on the training data was carried out to obtain a model for the data. Zeros were used in place of missing information. This was done to obtain a worst-case measure for the neural net model. Some other methods, like averaging, could have been adopted to fill in the missing information, but that would have skewed the results and would not have allowed us to use it as a benchmark.

A model is constructed based on the training data. This model is then tested using the validation data. The results obtained by the implementation of the PNN strategy are summarized in the confusion matrix in Table 3(a).

The overall predictability measure was 59.7%, and this is taken to be the worst-case bound for the PNN methodology. In dollar terms, the confusion matrix obtained was as shown in Table 3(b). Thus 59.4% signifies the dollar amount that was correctly predicted in the first case.
• Case II: PNN implementation after preprocessing

In this case, the data was preprocessed. The entire training and validation set was divided into clusters based on the information available.
Table 4. Model clusters.

Model    Input missing     No. of data points
MI       None              37
MII      Age               736
MIII     Check #           918
MIV      Age & Check #     278

Table 5. Performance of PNN with preprocessing (by cluster).

Model    Overall correct, data (%)    Overall correct, amount (%)
MI       75.68                        82.48
MII      72.87                        75.79
MIII     73.37                        73.29
MIV      70.50                        65.10

Table 6. (a) Performance of PNN with preprocessing; (b) performance of PNN with preprocessing in dollars.

(a)
        B             G
B̂      663 (67.1%)   205 (21.3%)
Ĝ      325 (32.9%)   757 (78.7%)
Overall correct = 72.8%

(b)
        B                  G
B̂      $129,533 (62.1%)   $34,617 (15.4%)
Ĝ      $79,130 (37.9%)    $189,648 (84.6%)
Overall correct = 73.7%
If all the input variable values were available, then all such data was put in one cluster; if variable x2 was not available, then all such data was put in a separate cluster, and so on. In the check validation example, four models were obtained. The breakup for the training data was as shown in Table 4.
The PNN methodology was then implemented on each cluster separately. The results were then tested on the preprocessed test data. If the test data contains elements in a cluster which was not present in the training set, then no prediction can be made. The implication is that there is a lack of training data for such a set to predict the output.
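The clustering step amounts to grouping records by their pattern of zero-coded fields. A small illustrative sketch (our own naming, assuming numpy arrays) follows; a test record whose pattern never occurred in training simply has no model to call, matching the caveat above.

```python
import numpy as np
from collections import defaultdict

def split_by_missing_pattern(X, y):
    """Group rows by which inputs are zero-coded, e.g. (False, True, False,
    False) for 'age missing'. Each group gets its own PNN model."""
    groups = defaultdict(list)
    for k in range(len(X)):
        groups[tuple(X[k] == 0)].append(k)
    return {p: (X[np.array(r)], y[np.array(r)]) for p, r in groups.items()}
```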
The confusion matrices, both for the data points and for the dollar amounts, were obtained for all four clusters. The overall correct percentages for the data and dollar amount for each model cluster are given in Table 5.
The overall correct percentage for the entire data set was also obtained by combining the values calculated from each model cluster; it is shown in Table 6(a), and the dollar values are shown in Table 6(b).
A marked improvement in the performance of the Polynomial Neural Network strategy was observed when it was combined with a preprocessing stage. The overall correct percentage for the prediction of the output increased by 13.1%, while the overall correct percentage in dollar terms increased by 14.3%.
Thus, the preprocessing improves the performance significantly without increasing the computational effort. The only drawback of the second case is that it cannot predict an output if the corresponding model is not available in the training data set.
D. Fuzzy Decision Model
An important concern for decision-making problems is how to devise a method that would aid judgments from a strategic point of view by means of summarized information. In this section, a fuzzy model is used to assist the human decision process.

Fuzzy model identification based on fuzzy implications and fuzzy reasoning [6] is one of the most important aspects of fuzzy system theory [7]. In this paper the membership function of a fuzzy set A is represented by μ_A(x), x ∈ X, and the fuzzy sets are associated with triangular-shaped membership functions.
The structure of a fuzzy decision model based upon input-output information is defined as a finite set of linguistic relations or rules, {R_i; i = 1,...,m}, which together form an algorithm:

    R_i: If x_1(k) is A_1^i and ... and x_n(k) is A_n^i, then y(k) is B_i,   (4)

where x_1,...,x_n are inputs, A_1^i,...,A_n^i are the fuzzy sets in X_1,...,X_n, and B_i is the fuzzy set in Y, with appropriate membership functions. X_j (j = 1,...,n) and Y are the universes of discourse of x_j and y, respectively. After constructing the fuzzy model using linguistic rules, the compositional rule of inference [8] is called upon to infer the output fuzzy variable from given input information. The output fuzzy set is calculated from the expression:

    μ_B'(y) = max_i min[ μ_{A_1^i}(x_1^0), ..., μ_{A_n^i}(x_n^0), μ_{B_i}(y) ],   y ∈ Y,   (5)

where x_j^0, j = 1,...,n, is a given input singleton. The centroid defuzzification method is then used to arrive at a crisp output value y^0 [9]:

    y^0 = ∫_Y y μ_B'(y) dy / ∫_Y μ_B'(y) dy.   (6)
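A direct transcription of (5) and (6) on a sampled output universe might look as follows. This is our sketch, assuming each membership function is available as a vectorized Python callable; the names are ours.

```python
import numpy as np

def infer_crisp(rules, x0, y_grid):
    """Max-min compositional inference (5) plus centroid defuzzification (6).
    `rules` is a list of (antecedent_mfs, consequent_mf) pairs; each mf is a
    callable returning membership grades, and y_grid samples the universe Y.
    Assumes at least one rule fires for the given input x0."""
    mu_B = np.zeros_like(y_grid)
    for antecedent_mfs, consequent_mf in rules:
        w = min(mf(xj) for mf, xj in zip(antecedent_mfs, x0))   # firing strength
        mu_B = np.maximum(mu_B, np.minimum(w, consequent_mf(y_grid)))
    return float((y_grid * mu_B).sum() / mu_B.sum())            # crisp y0
```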
Fig. 5. (a) Convergence of the centers of the 10 clusters in the check amount variable. (b) The final result of the FCM (c = 10, m = 2).
We construct membership functions from the collected data set using a fuzzy c-means (FCM) clustering method [10]. A fuzzy clustering of X (a crisp data set) into c clusters is a process of assigning a grade of membership for each element to every cluster. The fuzzy clustering problem is formulated as:

    min J_m(U, V) = Σ_{k=1}^n Σ_{i=1}^c (u_ik)^m (d_ik)²   (7)

subject to

    Σ_{i=1}^c u_ik = 1,   u_ik ≥ 0,   1 ≤ i ≤ c,   1 ≤ k ≤ n,   (8)

where n is the number of data points to be clustered, c is the number of clusters, m is a scalar (m > 1), and d_ik = ||x_k − v_i|| is the Euclidean distance between each data point x_k in X, X = {x_1,...,x_n}, and the center of the cluster, v_i; u_ik is the membership grade of the kth data point in the ith cluster. If m = 1, it becomes a hard clustering problem, whereas if m > 1, the clusters become fuzzy. More fuzziness can be obtained by increasing the value of m. The center v_i of the fuzzy cluster is calculated by

    v_i = Σ_{k=1}^n (u_ik)^m x_k / Σ_{k=1}^n (u_ik)^m,   1 ≤ i ≤ c.   (9)
Fig. 6. Membership functions of the decision (two fuzzy sets) and check amount (ten fuzzy sets).
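For reference, a compact alternating-update sketch of FCM implementing (7)-(9) is shown below. It is our own minimal version, assuming m > 1 and an (n, d) data array (use X.reshape(-1, 1) for a single variable such as check amount).

```python
import numpy as np

def fcm(X, c=10, m=2.0, iters=100, seed=0):
    """Fuzzy c-means: alternate the membership update and the center
    update (9) until the centers settle."""
    rng = np.random.default_rng(seed)
    V = X[rng.choice(len(X), c, replace=False)]              # initial centers
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - V[None, :], axis=-1) + 1e-12
        U = 1.0 / (d ** (2.0 / (m - 1.0)))                   # standard FCM update
        U /= U.sum(axis=1, keepdims=True)                    # rows sum to 1  (8)
        Um = U ** m
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]             # centers        (9)
    return U, V
```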
Four inputs (x1,...,x4: day of week, age, check number, and check amount) and one output (y: decision) variable are defined in the data set. The day of week (x1) is a crisp value that cannot be fuzzified. An FCM clustering algorithm is applied to the training data, except x1, resulting in ten clusters. Fig. 5(a) shows the convergence of the centers v_i, i = 1,...,10, in the check amount (x4) variable, and Fig. 5(b) shows the membership grades from the training data, x4. Thus, ten linguistic terms for each input variable (x2, x3, and x4) are derived, as shown in Fig. 6. Possible values for the output are either 1 (the check is approved) or 0 (the check is turned down), and we assign two linguistic terms for the decision (Fig. 6).
Table 7. Rule base between check amount and check number with fixed day and age (N_i: check number, A_i: check amount); each of the 10 × 10 cells holds a binary decision (1 = accept, 0 = reject).
Fig. 7. Correct and incorrect decision results from the fuzzy decision model (FDM).

Table 8. (a) Performance of FDM; (b) performance of FDM in dollars.

(a)
        B             G
B̂      741 (77.0%)   273 (27.6%)
Ĝ      221 (23.0%)   715 (72.4%)
Overall correct = 74.7%

(b)
        B                 G
B̂      -                 -
Ĝ      $49,689 (22.2%)   $150,731 (72.2%)
Overall correct = 75.1%
If the output of the fuzzy decision is greater than 0.5, the final decision is "approved," and if less than 0.5, the action is "turned down"; the threshold is unbiased. The rule base for the fuzzy decision model contains rules of the form given in (4). Table 7 shows 100 rules for the check amount and check number in the case of fixed day (crisp variable) and age (fuzzy variable).
The training data set includes a lot of missing information, represented as 0, but this can be easily handled after assigning membership functions to it. In the case of an incomplete rule base, the empty rule cells are filled by observing the underlying pattern [11]. The complete rule set consists of 7,000 rules (7 × 10³) and includes the day of the week.
Fig. 7 shows the final result on the testing data set (1950 points). The shaded region represents incorrect decisions. The final analysis based upon the testing data set is shown in Table 8. The overall correct decision rate in terms of number of cases is 74.7%, and in dollar amount, 75.1%.
IV. DISCUSSION AND CONCLUSIONS
Based on the overall classification accuracy, we may conclude that linear minimum distance classifiers are not capable of directly handling missing data, thus yielding only slightly better performance than pure random guessing. However, preprocessing of the database by replacing the zeros with their estimated mean values improves the overall performance by almost 8%.
The two-layer perceptron is able to bring the overall classification accuracy to a little more than 70% and has much better balance (the diagonal elements of the confusion tables are closer to each other) than the benchmark method.
The polynomial neural network with preprocessing of the data was able to improve the model accuracy by nearly 14% over the benchmark. This was obtained without any additional computational effort. It is a reasonably fast process, as it only requires the regression of a quadratic equation at each node.
In the fuzzy decision model case, the result, in terms of both the number of cases and the dollar amount, is slightly better than the other methods presented in this paper. We can conclude from these results that a fuzzy system, as a universal approximator, is efficient in handling imprecise information.
Considering the importance of achieving maximum recognition rates in the check approval application, the marked improvement in performance from all three nonlinear decision makers in this study justifies the significant additional effort involved in defining and training such systems, and warrants further investigation.
REFERENCES
[1] L. Sloane, "Checking out checks: The methods stores use," The New York Times, col. 1, vol. 140, p. 44(L), Nov. 24, 1990.
[2] N. Ahmed and K. R. Rao, Orthogonal Transforms for Digital Signal Processing. New York: Springer-Verlag, 1975.
[3] K. Hornik, M. Stinchcombe, and H. White, "Multilayer feedforward networks are universal approximators," Neural Networks, vol. 2, pp. 359-366, 1989.
[4] S. J. Farlow, Ed., Self-Organizing Methods in Modeling: GMDH Type Algorithms. New York: Marcel Dekker, 1984.
[5] R. L. Barron et al., "Applications of polynomial neural networks to FDIE and reconfigurable flight control," Proc. of the IEEE National Aerospace and Electronics Conf., vol. 2, pp. 507-519, 1990.
[6] L. A. Zadeh, "Outline of a new approach to the analysis of complex systems and decision processes," IEEE Trans. Syst., Man, and Cybern., vol. SMC-3, no. 1, pp. 28-44, Jan. 1973.
[7] G. J. Klir and T. A. Folger, Fuzzy Sets, Uncertainty, and Information. New Jersey: Prentice Hall, 1988.
[8] M. Mizumoto, "Fuzzy controls under various fuzzy reasoning methods," Information Sciences, vol. 45, pp. 129-151, 1988.
[9] C. C. Lee, "Fuzzy logic control systems: Fuzzy logic controller, Parts I and II," IEEE Trans. Syst., Man, Cybern., vol. 20, no. 2, pp. 404-435, 1990.
[10] J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms. New York: Plenum Press, 1981.
[11] R. M. Tong, "Synthesis of fuzzy models for industrial processes: Some recent results," Int. J. General Systems, vol. 4, pp. 143-162, 1978.