From: AAAI Technical Report WS-97-07. Compilation copyright © 1997, AAAI (www.aaai.org). All rights reserved.

Detecting Cellular Fraud Using Adaptive Prototypes

Peter Burge and John Shawe-Taylor
Department of Computer Science, Royal Holloway University of London, Egham, Surrey, England, TW20 0EX
{Peteb, jst}@dcs.rhbnc.ac.uk

Abstract

This paper discusses the current status of research on fraud detection undertaken as part of the European Commission funded ACTS ASPECT (Advanced Security for Personal Communications Technologies) project by Royal Holloway University of London. Using a recurrent neural network technique, we uniformly distribute prototypes over Toll Tickets sampled from the U.K. network operator Vodafone. The prototypes, which continue to adapt to cater for seasonal or long-term trends, are used to classify incoming Toll Tickets to form statistical behaviour profiles covering both the short and long-term past. These behaviour profiles, maintained as probability distributions, comprise the input to a differential analysis utilising a measure known as the Hellinger distance [5] between them as an alarm criterion. Fine tuning the system to minimise the number of false alarms poses a significant task due to the low ratio of fraudulent to non-fraudulent activity. We benefit from using unsupervised learning in that no fraudulent examples are required for training. This is very relevant considering the currently secure nature of GSM, where fraud scenarios other than Subscription Fraud have yet to manifest themselves. It is the aim of ASPECT to be prepared for the would-be fraudster for both GSM and UMTS.

Introduction

When a mobile originated phone call is made, or various inter-call criteria are met, the cells or switches that a mobile phone is communicating with produce information pertaining to the call attempt. These data records, produced for billing purposes, are referred to as Toll Tickets. Toll Tickets contain a wealth of information about the call so that charges can be made to the subscriber.
By considering well-studied fraud indicators, these records can also be used to detect fraudulent activity. By this we mean interrogating a series of recent Toll Tickets and comparing a function of the various fields with fixed criteria, known as triggers. A trigger, if activated, raises an alert status which cumulatively would lead to investigation by the network operator. Some example fraud indicators are a new subscriber making long back-to-back international calls, which is indicative of direct call selling, or short back-to-back calls to a single land number, which indicates an attack on a PABX system. Sometimes geographical information deduced from the cell sites visited in a call can indicate cloning. This can be detected by setting a velocity trap. Fixed trigger criteria can be set to catch such extremes of activity, but these absolute usage criteria cannot trap all types of fraud.

An alternative approach to the problem is to perform a differential analysis. Here we develop behaviour profiles relating to the mobile phone's activity and compare its most recent activities with a longer history of its usage. Techniques can then be derived to determine when the mobile phone's behaviour changes significantly. One of the most common indicators of fraud is a significant change in behaviour.

The performance expectations of such a system must be of prime concern when developing any fraud detection strategy. To implement a real-time fraud detection tool on the Vodafone network in the U.K., it was estimated that, on average, the system would need to be able to process around 38 Toll Tickets per second. This figure varied with peak and off-peak usage and also had seasonal trends. The distribution of the times that calls are made and the duration of each call is highly skewed. Considering all calls that are made in the U.K., including the use of supplementary services, we found the average call duration to be less than eight seconds, hardly time to order a pizza.
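To make the idea of a fixed trigger concrete, the velocity trap mentioned earlier in this section could be sketched in Python as follows. This is only an illustration under stated assumptions: the `Sighting` record, the use of cell-site latitude and longitude, the haversine distance and the 500 km/h limit are all hypothetical choices, since the text does not say how the trap is implemented.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0


@dataclass
class Sighting:
    """Hypothetical record of where and when a phone was seen.

    This is not a real Toll Ticket layout; the fields are assumptions.
    """
    lat: float     # cell site latitude, degrees
    lon: float     # cell site longitude, degrees
    time_s: float  # time of the call, seconds since an arbitrary epoch


def distance_km(a: Sighting, b: Sighting) -> float:
    """Great-circle (haversine) distance between two cell sites."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlam = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def velocity_trap(prev: Sighting, curr: Sighting, max_kmh: float = 500.0) -> bool:
    """Return True if the speed implied by two sightings exceeds max_kmh."""
    hours = (curr.time_s - prev.time_s) / 3600.0
    if hours <= 0:
        # Simultaneous (or out-of-order) use at two different sites is suspicious.
        return distance_km(prev, curr) > 0.0
    return distance_km(prev, curr) / hours > max_kmh
```

A phone seen at a London cell site and ten minutes later at an Athens cell site would trip this trap, whereas two sightings a kilometre apart would not. As the section notes, a trigger of this kind catches only extremes of activity.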
In this paper we present one of the methods developed under ASPECT that tackles the problem of skewed distributions and seasonal trends using a recurrent neural network technique based around unsupervised learning. We envisage that this technique would form part of a larger fraud detection suite that also comprises a rule-based fraud detection tool and a neural network fraud detection tool that uses supervised learning on a multi-layer perceptron. Each of the systems has its strengths and weaknesses, but we anticipate that the hybrid system will combine their strengths.

The following section discusses in more detail the concept of behaviour profiling for the purposes of performing a differential analysis. This is followed, in section 3, by the neural network prototyping technique and the way these prototypes are used to generate behaviour profiles. In section 4 we describe the workings of the fraud engine and follow up with some preliminary results. Lastly we discuss how we cater for changing distributions.

Behaviour profiling

For a differential analysis we need information about the mobile phone's history of behaviour plus a more recent sample of the mobile phone's activities. An initial attempt might be to extract heuristic information from the Toll Tickets and store it in record format. For this simple scenario we would need to consider two windows, or time spans, over the sequence of transactions for each user. The shorter sequence could be called the Current Behaviour Profile (CBP) and the longer sequence the Behaviour Profile History (BPH). Both profiles could be treated as finite length queues. When a new Toll Ticket arrives relating to a given user, the oldest entry from the BPH would be discarded and the oldest entry from the CBP would move to the back of the BPH queue. The new record encoded from the incoming Toll Ticket would then join the back of the CBP queue.
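The two-queue scheme just described can be sketched as follows. This is a minimal illustration only: the class name `QueueProfiles`, the queue lengths and the use of plain Python values as records are assumptions made for the example.

```python
from collections import deque


class QueueProfiles:
    """Hypothetical per-subscriber pair of finite length queues (CBP and BPH)."""

    def __init__(self, cbp_len: int, bph_len: int):
        self.cbp_len = cbp_len
        self.bph_len = bph_len
        self.cbp = deque()  # short window: most recent records
        self.bph = deque()  # long window: older records

    def add(self, record) -> None:
        """Add the record encoded from a new Toll Ticket."""
        # The new record joins the back of the CBP queue.
        self.cbp.append(record)
        if len(self.cbp) > self.cbp_len:
            # The oldest CBP entry moves to the back of the BPH queue.
            self.bph.append(self.cbp.popleft())
            if len(self.bph) > self.bph_len:
                # The oldest BPH entry is discarded.
                self.bph.popleft()
```

Every record has to be stored and shifted between the queues, so the storage per subscriber grows with the lengths of the two windows.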
Clearly, in practice, it is not optimal to search and retrieve a history of transaction records from a database prior to each calculation on receipt of a new Toll Ticket. Instead we compute a single behaviour profile record which we store in a database using the International Mobile Subscription Identity (IMSI) as the primary key. As a new Toll Ticket arrives for a particular subscriber, the profile record is simply updated with information reduced from the Toll Ticket. In order to preserve the concept of two different time spans over the Toll Tickets, we need to decay the influence of previous Toll Tickets on the profiles before adding in information from the new one. By applying two different decay factors we can maintain the concept of a CBP and a BPH. Of course we have to be careful not to dilute information by applying a decay factor and thus introduce false information to the behaviour profile. The following section describes a prototyping technique, based on the Second Maximal Entropy Principle by Grabec [1], which enables us to construct statistical behaviour profiles that are simple to decay without introducing false behaviour patterns.

Prototyping

Prototyping is a method of forming an optimal discrete representation of a naturally continuous random variable. The processing of continuous random variables by discrete systems generally reduces empirical information. Neural networks are capable of forming optimal discrete representations of continuous random variables through their ability to converge, by lateral interaction, to stable uniformly distributed states, von der Malsburg [4]. Grabec [1] introduces a technique to dynamically generate prototypical values to span a continuous random variable as samples are taken from it. He also suggests an extension to generate prototypes for multi-dimensional random variables. In its simplest form his method resembles the more well known self-organisation technique developed by Kohonen 1989 [3]. However, Grabec's method does not restrict the prototypes to lie on a two-dimensional manifold, but allows them to form their own topology. Only one pass through the training set is required, giving rise to the potential for online adaption.

Grabec introduced the second maximal entropy principle, stating that:

The mapping of a continuous random variable $X$ into a set of $K$ discrete prototypes $Q$ reduces the empirical information by the least amount if a uniform distribution $\{P(q_i) = 1/K,\ i = 1 \ldots K\}$, corresponding to the absolute maximum ($S_Q = \log K$) of information entropy, is assigned to $Q$.

When considering the set of all possible Toll Tickets, we clearly have a dimension to represent every parameter from a Toll Ticket that we wish to include in the analysis. Each parameter of a Toll Ticket can assume a range of values and is thus itself a random variable. Grabec's technique enables us to create a number of prototypes that dynamically and uniformly span the set of samples from a download of Toll Tickets taken from a live network. Owing to the fact that there are so few fraudulent Toll Tickets in comparison to non-fraudulent ones in a live network download, the prototypes will organize themselves as if the data were totally fraud free. The resulting set of prototypes will enable us to classify future incoming Toll Tickets with minimal loss of empirical information.

To distribute the prototypes over the input stream of Toll Tickets, we set up an iterative procedure that computes the change in the current value of the $K$ prototypes $Q$

[update equation illegible in source; it contains a double sum over $k = 1 \ldots K$ and $i \neq m$, and holds for $l = 1 \ldots K$; $m = 1 \ldots M$]

which starts with $\Delta q_l^{(0)} = B_l$. The coefficients are determined by the expressions:

[expressions illegible in source; the legible fragments include the factors $(q_{lm} - q_{km})$ and $\exp\left[-\frac{(q_l - q_k)^2}{4\sigma^2}\right]$]

where $\sigma$ [definition illegible in source] approximates the standard deviation and $S$ is the expected range of $X$.

Figure 1 below shows the distribution of 50,000 international calls randomly sampled over a two-month period from the Greek network operator Panafon. In this figure we only consider the two parameters TT-Charging-Start-Time and TT-Call-Duration.

[Figure 1 - 50,000 international calls from the Panafon network. Axes: call start time, call duration.]

Figure 2 demonstrates the result of applying the neural network prototyper to the above data, specifying that it is to develop 50 prototypes.

[Figure 2 - 50 prototypes of the 50,000 international calls from the Panafon network. Axes: call start time, call duration.]

Constructing profiles

Using the $K$ Toll Ticket prototypes $Q$, we can now encode future Toll Tickets as feature vectors $v$ where

$$v_j = \frac{\exp(-\|X_{N+1} - Q_j\|)}{\sum_{i=1}^{K} \exp(-\|X_{N+1} - Q_i\|)}$$

Note that $\sum_{j=1}^{K} v_j = 1$ and so the feature vector can be viewed as a probability distribution.

Feature vectors are generated for National Calls, International Calls and for the use of supplementary services. The CBP and the BPH are now also formed as probability distributions, using two different decay factors $\alpha$ and $\beta$ to maintain the concept of two time spans over the Toll Tickets. After the encoding of each Toll Ticket into the feature vector $v$, each element of the CBP is updated in the following way:

$$C_i = \alpha C_i + (1 - \alpha) v_i$$

where $C_i$ is the $i$th entry of the CBP. Note that $\sum_i C_i = 1$. Following this, both the CBP and BPH are presented to the fraud engine, as will be discussed in the following section. The BPH is then updated using the CBP as follows:

$$H_i = \beta H_i + (1 - \beta) C_i$$

where $H_i$ represents the $i$th entry in the BPH. This also makes $\sum_i H_i = 1$.

The fraud engine

The task of the fraud engine is to take the user profile record, consisting of the CBP and the BPH, and calculate a measure known as the Hellinger distance

$$d = \sum_{i=1}^{K} \left(\sqrt{C_i} - \sqrt{H_i}\right)^2$$

where $C$ and $H$ are the CBP and BPH respectively and $K$ is now the number of entries in the profile record. This is a natural measure to take when comparing two probability distributions. The Hellinger distance will always be a value between zero and two, where zero is for equal distributions and two represents orthogonality. The Hellinger distance in this scenario can be seen as a measure of how erratic the behaviour is. The fraud engine then compares $d$ with a threshold value, which is determined by experiment through performance tuning, and raises an alarm if $d$ exceeds it. The value of $d$ can be further used to indicate how severe the change in behaviour was. This means that alarms can be prioritised for investigation by the network operator. Owing to this fact, we are not so concerned about setting low alarm thresholds because, if the alarms are sorted into order of severity, only the upper part of the ordered list need be investigated.

Experimental results

Figure 3 shows a BPH and a CBP for a subscriber who was displaying what might be considered acceptable behaviour, i.e. the fraud detection tool had not raised an alarm. The subscriber did not make international calls but did use supplementary services such as voice mail and call forwarding.

Figure 4 shows the CBP and BPH of a subscriber who raised an alarm. It is interesting to note that the spike is for the BPH, which indicates that the alarm was raised through a sudden drop in activity. One fraud scenario that could account for this is handset theft, where the thief is unlikely to call the subscriber's voicemail service. On its own, however, one would probably not consider a drop in activity as a primary fraud indicator, although it could be used to support other indications.

[Figure 4 - A CBP and a BPH of a subscriber who raised an alarm. Panels: National Calls Profile, International Calls Profile, SS. Axes: P(X) against X = prototype.]
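The profile encoding, the two decay updates and the Hellinger distance test described in the preceding sections can be sketched together in Python. This is a minimal sketch rather than the ASPECT implementation: the function names, the representation of Toll Tickets and prototypes as tuples of already scaled numbers, and the alarm threshold of 0.5 are assumptions made for illustration. The default decay factors are the values 0.9 and 0.98 quoted in the experimental results.

```python
import math


def encode(ticket, prototypes):
    """Feature vector v_j = exp(-||x - q_j||) / sum_i exp(-||x - q_i||)."""
    dists = [math.dist(ticket, q) for q in prototypes]
    nearest = min(dists)
    # Subtracting the smallest distance leaves the normalised result unchanged
    # but avoids every weight underflowing to zero for distant tickets.
    weights = [math.exp(-(d - nearest)) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]


def decay_update(profile, new, factor):
    """profile_i <- factor * profile_i + (1 - factor) * new_i."""
    return [factor * p + (1.0 - factor) * n for p, n in zip(profile, new)]


def hellinger(c, h):
    """d = sum_i (sqrt(c_i) - sqrt(h_i))^2, a value between 0 and 2."""
    return sum((math.sqrt(ci) - math.sqrt(hi)) ** 2 for ci, hi in zip(c, h))


def process_ticket(ticket, prototypes, cbp, bph,
                   alpha=0.9, beta=0.98, threshold=0.5):
    """Update one subscriber's profiles; threshold is an assumed value."""
    v = encode(ticket, prototypes)
    cbp = decay_update(cbp, v, alpha)    # short-term profile
    d = hellinger(cbp, bph)              # compared before the BPH is updated
    bph = decay_update(bph, cbp, beta)   # long-term profile
    return cbp, bph, d, d > threshold
```

Because each update is a convex combination of two probability distributions, both profiles continue to sum to one, which is what allows the Hellinger distance to be applied to them directly.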
[Figure 3 - A CBP and a BPH of a subscriber exhibiting acceptable behaviour. Panels: National Calls Profile, International Calls Profile. Axes: P(X) against X = prototype.]

To develop the prototypes we used a two-month download of Toll Tickets relating to all calls made by new subscribers. 50 prototypes were generated relating to national calls, 50 for international calls and 10 for calls relating to supplementary services. The decay factors $\alpha$ and $\beta$ used to maintain the CBP and the BPH were 0.9 and 0.98 respectively. Behaviour profiles were developed over a two-week period before detection started.

The fraud detection tool is still in the very early stages of development. Initial trials have been performed using parameters that have yet to be performance tuned, with very pleasing results. For these initial trials, we only developed prototypes of the 2D space of call start time versus call duration.

Two data sets were used for the trial itself. The first consisted of the Toll Tickets for a subset of non-fraudulent subscribers. The second comprised the Toll Tickets for subscribers whose service had been barred due to the detection of overlapping calls, indicative of cloning. These Toll Tickets were converted from the analogue TACS network into GSM format.

Even though these fraudsters were all detected for overlapping calls, our fraud detection tool would base its analysis on behaviour change. For this reason we considered this trial a valid first attempt. The result was that our fraud detection tool detected 75% of the fraudsters whilst only misclassifying 4% of the valid subscribers as fraudsters. However, as the list was ordered according to the severity of the fraud, the 4% of misclassified subscribers appeared at the bottom of the fraudster list.

Online adaptation of prototypes

By considering a window over the sequence of Toll Tickets, within which prototypes are uniformly distributed, we can ensure that the prototypes adapt to natural trends in behaviour such as seasonal drift. This ensures that the feature vectors being constructed always comprise maximum information content, enabling clear distinctions between Current and History profiles.

Grabec's technique encodes the input Toll Tickets $X$ as prototypes $Q$ according to the probability distribution

$$P_r(X) = \frac{1}{K(2\pi\sigma \ldots)} \sum_{k=1}^{K} \exp\left[\ldots\right]$$

[remaining terms, and the accompanying definition of $\sigma$ in terms of $K$, illegible in source]

We propose that, with hindsight, we could focus the generation of prototypes on Toll Tickets that prove to be of most relevance to fraud scenarios, for example by concentrating less on medium duration national calls. We propose that this be implemented by an adaptive critic, Haykin [2], whereby information concerning the success or failure of a scheme is used to direct the future learning experience, determining regions of the Toll Ticket space critical to any analysis.

A negative effect of an evolving set of prototypes is the impact on very infrequent users, whose profiles will become unstable, showing untrue behavioural anomalies. We are currently investigating this issue; however, we question how useful a differential analysis of this category of user is in the first place.

There are, in addition, a number of parameters relating to the operation of the fraud detection tool that need to be optimised. The first is the number of prototypes that will be used to encode the Toll Ticket space. Next are the two decay factors $\alpha$ and $\beta$, which determine the relative lengths of the short and long-term past. These decay factors could well be critical to the success or failure of the system. Further parameters, such as the number of Toll Tickets, or the number of days, that the behaviour profiles should be developed over before detection starts, and the alarm threshold, need to be determined by experiment.

The above ideas are currently being developed and put into practice for the next generation of our fraud detection tool.

References

[1] Grabec I, Modelling of Chaos by a Self-Organizing Neural Network. Artificial Neural Networks. Proceedings of ICANN, Espoo, Vol 1, pp 151-156, Elsevier publications.

[2] Haykin S, 1984, Neural Networks: A Comprehensive Foundation. Macmillan College Publishing Company.

[3] Kohonen T, 1988, Self-organization and Associative Memory, Springer Verlag, Berlin, pp 119.

[4] Malsburg, Chr. von der, 1973, Self-Organization of orientation sensitive cells in the striate cortex. Kybernetik 14, pp 85-100.

[5] Pitman E, 1979, Some basic theory for statistical inference. Chapman and Hall, London.