Detecting Cellular Fraud Using Adaptive Prototypes.

From: AAAI Technical Report WS-97-07. Compilation copyright © 1997, AAAI (www.aaai.org). All rights reserved.
Peter Burge and John Shawe-Taylor
Department of Computer Science, Royal Holloway, University of London, Egham, Surrey, England, TW20 0EX
{Peteb,jst}@dcs.rhbnc.ac.uk
Abstract
This paper discusses the current status of research on fraud detection undertaken as part of the European Commission funded ACTS ASPECT (Advanced Security for Personal Communications Technologies) project, by Royal Holloway University of London. Using a recurrent neural network technique, we uniformly distribute prototypes over Toll Tickets sampled from the U.K. network operator, Vodafone. The prototypes, which continue to adapt to cater for seasonal or long term trends, are used to classify incoming Toll Tickets to form statistical behaviour profiles covering both the short and long-term past. These behaviour profiles, maintained as probability distributions, comprise the input to a differential analysis utilising a measure known as the Hellinger distance [5] between them as an alarm criterion. Fine tuning the system to minimise the number of false alarms poses a significant task due to the low fraudulent/non-fraudulent activity ratio. We benefit from using unsupervised learning in that no fraudulent examples are required for training. This is very relevant considering the currently secure nature of GSM, where fraud scenarios, other than Subscription Fraud, have yet to manifest themselves. It is the aim of ASPECT to be prepared for the would-be fraudster for both GSM and UMTS.
Introduction
When a mobile originated phone call is made, or various inter-call criteria are met, the cells or switches that a mobile phone is communicating with produce information pertaining to the call attempt. These data records, for billing purposes, are referred to as Toll Tickets. Toll Tickets contain a wealth of information about the call so that charges can be made to the subscriber. By considering well studied fraud indicators these records can also be used to detect fraudulent activity. By this we mean interrogating a series of recent Toll Tickets and comparing a function of the various fields with fixed criteria, known as triggers. A trigger, if activated, raises an alert status which cumulatively would lead to investigation by the network operator. Some example fraud indicators are a new subscriber making long back-to-back international calls, which is indicative of direct call selling, or short back-to-back calls to a single land number, which indicates an attack on a PABX system. Sometimes geographical information deduced from the cell sites visited in a call can indicate cloning. This can be detected through setting a velocity trap.
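As an illustration of a fixed trigger, a velocity trap might be sketched as below. This is only a sketch under our own assumptions: the `CallEvent` record, the use of cell-site latitude and longitude, and the 500 km/h limit are hypothetical and are not part of the Toll Ticket format described here.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0


@dataclass
class CallEvent:
    """Hypothetical reduction of a Toll Ticket: call time and cell-site position."""
    time_s: float    # seconds since an arbitrary epoch
    lat_deg: float   # latitude of the serving cell site
    lon_deg: float   # longitude of the serving cell site


def great_circle_km(a: CallEvent, b: CallEvent) -> float:
    """Haversine distance between the cell sites of two events."""
    phi1, phi2 = math.radians(a.lat_deg), math.radians(b.lat_deg)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon_deg - a.lon_deg)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def velocity_trap(a: CallEvent, b: CallEvent, max_kmh: float = 500.0) -> bool:
    """True if moving between the two cell sites would need more than max_kmh."""
    hours = abs(b.time_s - a.time_s) / 3600.0
    distance = great_circle_km(a, b)
    if hours == 0.0:
        # Simultaneous use at two different sites is always suspicious.
        return distance > 0.0
    return distance / hours > max_kmh


# Illustrative coordinates roughly 260 km apart.
first = CallEvent(0.0, 51.5, -0.1)
ten_minutes_later = CallEvent(600.0, 53.5, -2.2)
five_hours_later = CallEvent(5 * 3600.0, 53.5, -2.2)
print(velocity_trap(first, ten_minutes_later))   # implied speed far above the limit
print(velocity_trap(first, five_hours_later))    # implied speed well below the limit
```

Like every fixed trigger, the trap depends on an absolute limit chosen in advance, which is the limitation the differential analysis discussed next is meant to address.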
Fixed trigger criteria can be set to catch such extremes of activity, but these absolute usage criteria cannot trap all types of fraud. An alternative approach to the problem is to perform a differential analysis. Here we develop behaviour profiles relating to the mobile phone's activity and compare its most recent activities with a longer history of its usage. Techniques can then be derived to determine when the mobile phone's behaviour changes significantly. One of the most common indicators of fraud is a significant change in behaviour.
The performance expectations of such a system must be of prime concern when developing any fraud detection strategy. To implement a real time fraud detection tool on the Vodafone network in the U.K., it was estimated that, on average, the system would need to be able to process around 38 Toll Tickets per second. This figure varied with peak and off-peak usage and also had seasonal trends. The distribution of the times that calls are made and the duration of each call is highly skewed. Considering all calls that are made in the U.K., including the use of supplementary services, we found the average call duration to be less than eight seconds, hardly time to order a pizza.
In this paper we present one of the methods developed under ASPECT that tackles the problem of skewed distributions and seasonal trends using a recurrent neural network technique that is based around unsupervised learning. We envisage this technique would form part of a larger fraud detection suite that also comprises a rule based fraud detection tool and a neural network fraud detection tool that uses supervised learning on a multi-layer perceptron. Each of the systems has its strengths and weaknesses but we anticipate that the hybrid system will combine their strengths.
The following section discusses in more detail the concept of behaviour profiling for the purposes of performing a differential analysis. This is followed, in section 3, by the neural network prototyping technique and the way these prototypes are used to generate behaviour profiles. In section 4 we describe the workings of the fraud engine and follow up with some preliminary results. Lastly we discuss how we cater for changing distributions.
Behaviour profiling
For a differential analysis we need information about the mobile phone's history of behaviour plus a more recent sample of the mobile phone's activities. An initial attempt might be to extract heuristic information from the Toll Tickets and store it in record format. For this simple scenario we would need to consider two windows or time spans over the sequence of transactions for each user. The shorter sequence could be called the Current Behaviour Profile (CBP) and the longer sequence the Behaviour Profile History (BPH). Both profiles could be treated as finite length queues. When a new Toll Ticket arrives, relating to a given user, the oldest entry from the BPH would be discarded and the oldest entry from the CBP would move to the back of the BPH queue. The new record encoded from the incoming Toll Ticket would then join the back of the CBP queue.
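The two-queue scheme just described can be sketched as follows. This is a minimal illustration, not the implementation used in ASPECT: the class name `QueueProfile`, the queue lengths and the integer stand-ins for encoded records are all assumptions made for the example.

```python
from collections import deque


class QueueProfile:
    """Hypothetical two-window profile: a short CBP queue feeding a longer BPH queue."""

    def __init__(self, cbp_len: int, bph_len: int) -> None:
        self.cbp_len = cbp_len
        self.bph_len = bph_len
        self.cbp = deque()   # most recent records
        self.bph = deque()   # older records

    def add(self, record) -> None:
        """Append a new encoded Toll Ticket record and shift the windows."""
        self.cbp.append(record)
        if len(self.cbp) > self.cbp_len:
            # The oldest CBP entry moves to the back of the BPH.
            self.bph.append(self.cbp.popleft())
            if len(self.bph) > self.bph_len:
                # The oldest BPH entry is discarded.
                self.bph.popleft()


profile = QueueProfile(cbp_len=2, bph_len=3)
for record in range(7):   # integers stand in for encoded Toll Tickets
    profile.add(record)
print(list(profile.cbp), list(profile.bph))
```

Every subscriber needs both queues stored and retrieved in full, which is the cost that motivates the single decayed profile record described next.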
Clearly, in practice, it is not optimal to search and retrieve a history of transaction records from a database prior to each calculation on receipt of a new Toll Ticket. Instead we compute a single behaviour profile record which we store in a database using the International Mobile Subscriber Identity (IMSI) as the primary key. As a new Toll Ticket arrives, for a particular subscriber, the profile record is simply updated with information reduced from the Toll Ticket. In order to preserve the concept of two different time spans over the Toll Tickets, we will need to decay the influence of previous Toll Tickets, on the profiles, before adding in information from the new one. By applying two different decay factors we can maintain the concept of a CBP and a BPH. Of course we have to be careful not to dilute information by applying a decay factor and thus introducing false information to the behaviour profile. The following section describes a prototyping technique, based on the Second Maximal Entropy Principle by Grabec [1], which enables us to construct statistical behaviour profiles that are simple to decay without introducing false behaviour patterns.
Prototyping
Prototyping is a method of forming an optimal discrete representation of a naturally continuous random variable. The processing of continuous random variables by discrete systems generally reduces empirical information. Neural networks are capable of forming optimal discrete representations of continuous random variables through their ability to converge, by lateral interaction, to stable uniformly distributed states, von der Malsburg [4].
Grabec [1] introduces a technique to dynamically generate prototypical values to span a continuous random variable as samples are taken from it. He also suggests an extension to generate prototypes for multi-dimensional random variables. In its simplest form his method resembles the better known self-organisation technique developed by Kohonen 1989 [3]. However, Grabec's method does not restrict the prototypes to lie on a two dimensional manifold, but allows them to form their own topology. Only one pass through the training set is required, giving rise to the potential for online adaptation.
Grabec introduced the second maximal entropy principle, stating that:
The mapping of a continuous random variable X into a set of K discrete prototypes Q reduces the empirical information by the least amount if a uniform distribution $\{P(q_i) = 1/K,\ i = 1 \ldots K\}$, corresponding to the absolute maximum ($S_Q = \log K$) of information entropy, is assigned to Q.
When considering the set of all possible Toll Tickets, we clearly have a dimension to represent every parameter, from a Toll Ticket, that we wish to include in the analysis. Each parameter of a Toll Ticket can assume a range of values and is thus itself a random variable. Grabec's technique enables us to create a number of prototypes that dynamically and uniformly span the set of samples from a download of Toll Tickets taken from a live network. Owing to the fact that there are so few fraudulent Toll Tickets in comparison to non-fraudulent ones, in a live network download, the prototypes will organize themselves as if the data were totally fraud free. The resulting set of prototypes will enable us to classify future incoming Toll Tickets with minimal loss of empirical information.
To distribute the prototypes over the input stream of Toll Tickets, we set up an iterative procedure that computes the change in the current value of the K prototypes Q
$$\Delta q_{lm}^{(j+1)} = B_{lm} - \sum_{k=1}^{K} \sum_{\substack{i=1 \\ (k,i) \neq (l,m)}}^{M} C_{lmki}\, \Delta q_{ki}^{(j)}, \qquad l = 1 \ldots K;\ m = 1 \ldots M$$
which starts with $\Delta q_{lm}^{(0)} = B_{lm}$. The coefficients are determined by the expressions:
$$C_{lmki} = \left[\delta_{mi} - \frac{(q_{lm} - q_{km})(q_{li} - q_{ki})}{2\sigma^2}\right] \exp\left[-\frac{(q_l - q_k)^2}{4\sigma^2}\right]$$
$$B_{lm} = \frac{K}{N+1}\left\{(X_{N+1,m} - q_{lm}) \exp\left[-\frac{(X_{N+1} - q_l)^2}{4\sigma^2}\right] - \frac{1}{K}\sum_{k=1}^{K} (q_{km} - q_{lm}) \exp\left[-\frac{(q_k - q_l)^2}{4\sigma^2}\right]\right\}$$
where $\sigma = S / K^{1/M}$ approximates the standard deviation and S is the expected range of X.
Figure 1 below shows the distribution of 50,000 international calls randomly sampled over a two month period from the Greek network operator Panafon. In this figure we only consider the two parameters TT-Charging-Start-Time and TT-Call-Duration.
Figure 1 - 50,000 international calls from the Panafon network. [Scatter plot; axes: Call Start Time, Duration.]
Figure 2 demonstrates the result of applying the neural network prototyper to the above data, specifying that it is to develop 50 prototypes.
Figure 2 - 50 prototypes of the 50,000 international calls from the Panafon network. [Scatter plot; axes: Call Start Time, Duration.]
Constructing profiles
Using the K Toll Ticket prototypes Q, we can now encode future Toll Tickets as feature vectors v, where
$$v_s = \frac{\exp(-\|X_{N+1} - Q_s\|)}{\sum_{j=1}^{K} \exp(-\|X_{N+1} - Q_j\|)}$$
Note that $\sum_{j=1}^{K} v_j = 1$ and so the feature vector can be viewed as a probability distribution.
Feature vectors are generated for National Calls, International Calls and for the use of supplementary services. The CBP and the BPH are now also formed as probability distributions using two different decay factors $\alpha$ and $\beta$ to maintain the concept of two time spans over the Toll Tickets. After the encoding of each Toll Ticket into the feature vector v, each element of the CBP is updated in the following way
$$C_i = \alpha C_i + (1 - \alpha) v_i$$
where $C_i$ is the ith entry of the CBP. Note that $\sum_i C_i = 1$. Following this, both the CBP and BPH are presented to the fraud engine as will be discussed in the following section.
The BPH is then updated using the CBP as follows:
$$H_i = \beta H_i + (1 - \beta) C_i$$
where $H_i$ represents the ith entry in the BPH. This also makes $\sum_i H_i = 1$.
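The encoding and the two decayed updates can be sketched as below. This is a minimal illustration under stated assumptions: the three two-dimensional prototypes, the uniform starting profiles and the function names are hypothetical, the Euclidean norm is assumed for the distance to each prototype, and the decay factors are the values quoted in the experimental results.

```python
import math

ALPHA = 0.9    # CBP decay factor, as quoted in the experimental results
BETA = 0.98    # BPH decay factor, as quoted in the experimental results


def encode(ticket, prototypes):
    """Feature vector v: exp(-distance) to each prototype, normalised to sum to 1."""
    weights = [math.exp(-math.dist(ticket, q)) for q in prototypes]
    total = sum(weights)
    return [w / total for w in weights]


def blend(old, new, decay):
    """Decay the old distribution and mix in the new one; the sum stays 1."""
    return [decay * o + (1.0 - decay) * n for o, n in zip(old, new)]


def update_profiles(cbp, bph, ticket, prototypes, alpha=ALPHA, beta=BETA):
    """One Toll Ticket: encode it, update the CBP, then update the BPH from the CBP."""
    v = encode(ticket, prototypes)
    cbp = blend(cbp, v, alpha)
    # The fraud engine would compare cbp with bph at this point,
    # before the BPH absorbs the new CBP.
    bph = blend(bph, cbp, beta)
    return cbp, bph


# Hypothetical prototypes in a normalised (start time, duration) space.
prototypes = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
cbp = bph = [1.0 / 3.0] * 3
for _ in range(5):
    cbp, bph = update_profiles(cbp, bph, (0.9, 0.1), prototypes)
print(cbp, bph)
```

Because each update is a convex combination of two probability distributions, both profiles remain probability distributions, and the CBP moves towards the new behaviour faster than the BPH.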
The fraud engine
The task of the fraud engine is to take the user profile record, consisting of the CBP and the BPH, and calculate a measure known as the Hellinger distance
$$d = \sum_{i=0}^{K} \left(\sqrt{C_i} - \sqrt{H_i}\right)^2$$
where C and H are the CBP and BPH respectively and K is now the number of entries in the profile record. This is a natural measure to take when comparing two probability distributions. The Hellinger distance will always be a value between zero and two, where zero is for equal distributions and two represents orthogonality. The Hellinger distance in this scenario can be seen as a measure of how erratic the behaviour is. The fraud engine then compares d with a threshold value, which is determined by experiment through performance tuning, and raises an alarm if d exceeds it.
The value of d can be further used to indicate how severe the change in behaviour was. This means that alarms can be prioritised for investigation by the network operator. Owing to this fact, we are not so concerned about setting low alarm thresholds because, if sorted into order of severity, only the upper part of the ordered list need be investigated.
Figure 4 shows the CBP and BPH of a subscriber who raised an alarm. It is interesting to note that the spike is for the BPH, which indicates that the alarm was raised through a sudden drop in activity. One fraud scenario that could account for this is handset theft, where the thief is unlikely to call the subscriber's voice mail service. On its own, however, one would probably not consider a drop in activity as a primary fraud indicator, although it could be used to support other indications.
Figure 4 - A CBP and a BPH of a subscriber who raised an alarm. [Panels: National Calls Profile, International Calls Profile, SS; axes: X = prototype, P(X).]
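The comparison made by the fraud engine, and the ordering of alarms by severity, can be sketched as follows. This is an illustration only: the subscriber keys, the example profiles and the threshold of 0.01 are hypothetical, and we assume an alarm is raised when the distance exceeds the threshold.

```python
import math


def hellinger(c, h):
    """d = sum_i (sqrt(C_i) - sqrt(H_i))^2; between 0 and 2 for two distributions."""
    return sum((math.sqrt(ci) - math.sqrt(hi)) ** 2 for ci, hi in zip(c, h))


def ranked_alarms(profiles, threshold):
    """Return (subscriber, d) pairs with d above the threshold, most severe first."""
    scored = ((key, hellinger(cbp, bph)) for key, (cbp, bph) in profiles.items())
    return sorted((s for s in scored if s[1] > threshold),
                  key=lambda s: s[1], reverse=True)


# Hypothetical profile records keyed by a stand-in for the IMSI: (CBP, BPH).
profiles = {
    "imsi-a": ([0.5, 0.5, 0.0], [0.5, 0.5, 0.0]),   # unchanged behaviour
    "imsi-b": ([1.0, 0.0, 0.0], [0.0, 0.0, 1.0]),   # orthogonal profiles
    "imsi-c": ([0.6, 0.4, 0.0], [0.4, 0.6, 0.0]),   # mild drift
}
alarms = ranked_alarms(profiles, threshold=0.01)
for key, d in alarms:
    print(key, round(d, 4))
```

With a low threshold the mild drift is also reported, but it falls below the orthogonal case in the ordered list, so an operator working from the top sees the most severe changes first.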
Experimental results
Figure 3 shows a BPH and a CBP for a subscriber who was displaying what might be considered as acceptable behaviour, i.e. the fraud detection tool had not raised an alarm. The subscriber did not make international calls but did use supplementary services such as voice mail and call forwarding.
To develop the prototypes we used a two month download of Toll Tickets relating to all calls made by new subscribers. 50 prototypes were generated relating to national calls, 50 for international calls and 10 for calls relating to supplementary services. The decay factors $\alpha$ and $\beta$ used to maintain the CBP and the BPH were 0.9 and 0.98 respectively. Behaviour profiles were developed over a two week period before detection started.
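To get a feel for the time spans these two decay factors imply, one can follow how the weight of a single Toll Ticket decays in each profile. The sketch below is our own illustration, assuming the CBP is updated from each ticket and the BPH is then updated from the CBP; the function names and the 5,000-step horizon are arbitrary choices.

```python
def impulse_weights(alpha, beta, steps):
    """Weight that one ticket, seen at step 0, carries in the CBP and BPH over time."""
    cbp = bph = 0.0
    cbp_weights, bph_weights = [], []
    for n in range(steps):
        v = 1.0 if n == 0 else 0.0          # the single ticket of interest
        cbp = alpha * cbp + (1.0 - alpha) * v
        bph = beta * bph + (1.0 - beta) * cbp
        cbp_weights.append(cbp)
        bph_weights.append(bph)
    return cbp_weights, bph_weights


def mean_age(weights):
    """Average age, in tickets, of the information held in a profile."""
    return sum(n * w for n, w in enumerate(weights)) / sum(weights)


cbp_weights, bph_weights = impulse_weights(alpha=0.9, beta=0.98, steps=5000)
print(round(mean_age(cbp_weights), 3), round(mean_age(bph_weights), 3))
```

Under these assumptions the information in the CBP is on average about 9 tickets old and that in the BPH about 58 tickets old, so the two profiles cover clearly different stretches of a subscriber's recent past.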
The fraud detection tool is still in the very early stages of development. Initial trials have been performed using parameters that have yet to be performance tuned, with very pleasing results. For these initial trials, we only developed prototypes of the 2D space of call start time versus call duration.
Two data sets were used for the trial itself. The first consisted of the Toll Tickets for a subset of non-fraudulent subscribers. The second comprised the Toll Tickets for subscribers whose service had been barred due to the detection of overlapping calls, indicative of cloning. These Toll Tickets were converted from the analogue TACS network into GSM format.
Figure 3 - A CBP and a BPH of a subscriber exhibiting acceptable behaviour. [Panels: National Calls Profile, International Calls Profile; axes: X = prototype, P(X).]
Even though these fraudsters were all detected for overlapping calls, our fraud detection tool would base its analysis on behaviour change. For this reason we considered this trial a valid first attempt. The result was that our fraud detection tool detected 75% of the fraudsters whilst only misclassifying 4% of the valid subscribers as fraudsters. However, as the list was ordered according to the severity of the fraud, the 4% of misclassified subscribers appeared at the bottom of the fraudster list.
Online adaptation of prototypes
By considering a window over the sequence of Toll Tickets, within which prototypes are uniformly distributed, we can ensure that the prototypes adapt to natural trends in behaviour such as seasonal drift. This ensures that the feature vectors being constructed always comprise maximum information content, enabling clear distinctions between Current and History profiles.
Grabec's technique encodes the input Toll Tickets X as prototypes Q according to the probability distribution
$$P_r(X) = \frac{1}{K(\sqrt{2\pi}\,\sigma)^{M}} \sum_{k=1}^{K} \exp\left[-\frac{(X - Q_k)^2}{2\sigma^2}\right]$$
where $\sigma = S / K^{1/M}$.
We propose that, with hindsight, we could focus the generation of prototypes on Toll Tickets that prove to be of most relevance to fraud scenarios, for example by concentrating less on medium duration national calls. We propose that this be implemented by an adaptive critic, Haykin [2], whereby information concerning the success or failure of a scheme is used to direct the future learning experience, determining regions of the Toll Ticket space critical to any analysis. A negative effect of an evolving set of prototypes is the impact on very infrequent users, whose profiles will become unstable, showing untrue behavioural anomalies. We are currently investigating this issue, but question how useful a differential analysis of this category of user is in the first place.
There are a number of parameters, in addition, relating to the operation of the fraud detection tool that need to be optimised. The first is the number of prototypes that will be used to encode the Toll Ticket space. Next are the two decay factors $\alpha$ and $\beta$, which determine the relative lengths of the short and long term past. These decay factors could well be critical to the success or failure of the system. Further parameters, such as the number of Toll Tickets, or the number of days, that the behaviour profiles should be developed over before detection starts, and the alarm threshold, need to be determined by experiment.
The above ideas are currently being developed and put into practice for the next generation of our fraud detection tool.
References
[1] Grabec I, Modelling of Chaos by a Self-Organizing Neural Network. Artificial Neural Networks. Proceedings of ICANN, Espoo, Vol 1, pp 151-156, Elsevier publications.
[2] Haykin S, 1984, Neural Networks A Comprehensive Foundation. Macmillan College Publishing Company.
[3] Kohonen T, 1988, Self-organization and Associative Memory, Springer Verlag, Berlin, pp 119.
[4] Malsburg, Chr. von der, 1973, Self-Organization of orientation sensitive cells in the striate cortex. Kybernetik 14, pp 85-100.
[5] Pitman E, 1979, Some basic theory for statistical inference. Chapman and Hall, London.