+
Adaptive Fraud Detection
by Tom Fawcett and Foster Provost
Presented by: Lara Nargozian
Updated from the last 3 years' presentations
by Adam Boyer, Yunfei Zhao, and Ahmed Abdeen Hamed
+
2
Why?

 Solve real-world problems that are very important to each and every one of us

 Provide a framework that can be adapted to solve similar problems

 Use Data Mining algorithms and techniques learned this semester
  Rule Learning
  Classification

 Fun to learn about
+
3
Outline

 Problem Description
  Cellular cloning fraud problem
  Why it is important
  Current strategies

 Construction of Fraud Detector
  Framework
  Rule learning, Monitor construction, Evidence combination

 Experiments and Evaluation
  Data used in this study
  Data preprocessing
  Comparative results

 Conclusion

 Exam Questions
+
4
The Problem

 How to detect suspicious changes in user behavior to identify and prevent cellular fraud
  Non-legitimate users, aka bandits, gain illicit access to a legitimate user's (the victim's) account

 The solution is useful in other contexts
  Identifying and preventing credit card fraud, toll fraud, and computer intrusion
+
5
Cellular Fraud - Cloning

 Cloning Fraud
  A kind of superimposition ("parasite") fraud
  Fraudulent usage is superimposed upon (added to) the legitimate usage of an account
  Causes inconvenience to customers and great expense to cellular service providers
+
6
Cellular Communications and Cloning Fraud

 Mobile Identification Number (MIN) and Electronic Serial Number (ESN)
  Identify a specific account
  Periodically transmitted unencrypted whenever the phone is on

 Cloning occurs when a customer's MIN and ESN are programmed into a cellular phone not belonging to the customer
  The bandit can make virtually unlimited, untraceable calls at someone else's expense
+
7
Interest in Reducing Cloning Fraud

 Fraud is detrimental in several ways:
  Fraudulent usage congests cell sites
  Fraud incurs land-line usage charges
  Cellular carriers must pay costs to other carriers for usage outside the home territory
  The crediting process is costly to the carrier and inconvenient to the customer
+
8
Strategies for Dealing with Cloning Fraud

 Pre-call Methods
  Identify and block fraudulent calls as they are made
  Validate the phone or its user when a call is placed

 Post-call Methods
  Identify fraud that has already occurred on an account so that further fraudulent usage can be blocked
  Periodically analyze call data on each account to determine whether fraud has occurred
+
9
Pre-call Methods

 Personal Identification Number (PIN)
  PIN cracking is possible with more sophisticated equipment

 RF Fingerprinting
  Method of identifying phones by their unique transmission characteristics

 Authentication
  Reliable and secure private key encryption method
  Requires special hardware capability
  An estimated 30 million non-authenticatable phones were in use in the US alone (in 1997)
+
10
Post-call Methods

 Collision Detection
  Analyze call data for temporally overlapping calls

 Velocity Checking
  Analyze the locations and times of consecutive calls

 Disadvantage of the above methods
  Usefulness depends upon a moderate level of legitimate activity
+
11
Another Post-call Method
(Main focus of this paper)

 User Profiling
  Analyze calling behavior to detect usage anomalies suggestive of fraud
  Works well with low-usage customers
  Good complement to collision and velocity checking because it covers cases the others might miss
+
12
Sample Frauded Account

Date     Time      Day  Duration    Origin         Destination    Fraud
1/01/95  10:05:01  Mon  13 minutes  Brooklyn, NY   Stamford, CT
1/05/95  14:53:27  Fri  5 minutes   Brooklyn, NY   Greenwich, CT
1/08/95  09:42:01  Mon  3 minutes   Bronx, NY      Manhattan, NY
1/08/95  15:01:24  Mon  9 minutes   Brooklyn, NY   Brooklyn, NY
1/09/95  15:06:09  Tue  5 minutes   Manhattan, NY  Stamford, CT
1/09/95  16:28:50  Tue  53 seconds  Brooklyn, NY   Brooklyn, NY
1/10/95  01:45:36  Wed  35 seconds  Boston, MA     Chelsea, MA    Bandit
1/10/95  01:46:29  Wed  34 seconds  Boston, MA     Yonkers, NY    Bandit
1/10/95  01:50:54  Wed  39 seconds  Boston, MA     Chelsea, MA    Bandit
1/10/95  11:23:28  Wed  24 seconds  Brooklyn, NY   Congers, NY
1/11/95  22:00:28  Thu  37 seconds  Boston, MA     Boston, MA     Bandit
1/11/95  22:04:01  Thu  37 seconds  Boston, MA     Boston, MA     Bandit
+
13
The Need to be Adaptive

 Patterns of fraud are dynamic: bandits constantly change their strategies in response to new detection techniques

 Levels of fraud can change dramatically from month to month

 The costs of missing fraud and of dealing with false alarms change with inter-carrier contracts
+
14
Automatic Construction of Profiling Fraud Detectors
+
15
One Approach

 Build a fraud detection system by classifying calls as fraudulent or legitimate

 However, there are two problems that make simple classification techniques infeasible
+
16
Problems with Simple Classification

 Context
  A call that would be unusual for one customer may be typical for another customer (for example, a call placed from Brooklyn is not unusual for a subscriber who lives there, but might be very strange for a Boston subscriber)

 Granularity (overfitting?)
  At the level of the individual call, the variation in calling behavior is large, even for a particular user
+
17
In Summary:
Learning the Problem

1. Which phone call features are important?
2. How should profiles be created?
3. When should alarms be raised?
+
18
Proposed Detector Constructor Framework (DC-1)
+
19
DC-1 Processing Account-Day Example
+
20
DC-1 Fraud Detection Stages

 Stage 1: Rule Learning
 Stage 2: Profile Monitoring
 Stage 3: Combining Evidence
+
21
Rule Learning - the 1st Stage

 Rule Generation
  Rules are generated locally, based on differences between fraudulent and normal behavior for each account

 Rule Selection
  The local rules are then combined in a rule selection step
+
22
Rule Generation

 DC-1 uses the RL program to generate rules with certainty factors above a user-defined threshold

 For each account, RL generates a "local" set of rules describing the fraud on that account

 Example:
  (Time-of-Day = Night) AND (Location = Bronx) ==> FRAUD
  Certainty Factor = 0.89
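
To make the rule representation concrete, here is a minimal Python sketch of how a certainty factor could be estimated for the example rule. The Call record, its field names, and the precision-style estimate are illustrative assumptions, not RL's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Call:
    time_of_day: str   # e.g. "NIGHT" (hypothetical field names)
    location: str      # e.g. "Bronx"
    fraud: bool        # class label from the labeled call data

def rule_matches(call: Call) -> bool:
    """The slide's example rule: (Time-of-Day = Night) AND (Location = Bronx)."""
    return call.time_of_day == "NIGHT" and call.location == "Bronx"

def certainty_factor(calls: list[Call]) -> float:
    """Fraction of rule-matching calls on this account that are fraudulent.

    A simple precision-style estimate; RL's actual certainty factor
    computation may differ.
    """
    matched = [c for c in calls if rule_matches(c)]
    return sum(c.fraud for c in matched) / len(matched) if matched else 0.0

# DC-1 keeps a rule for an account only if its certainty factor exceeds
# the user-defined threshold (0.89 in the slide's example).
```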
+
23
Rule Selection

 The rule generation step typically yields tens of thousands of rules

 If a rule is found in (or covers) many accounts, then it is probably worth using

 The selection algorithm identifies a small set of general rules that cover the accounts (see the sketch below)

 The resulting set of rules is used to construct specific monitors
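
The paper's covering algorithm appears on the next slide; as a rough approximation, a greedy set-cover pass looks like the sketch below, where the rule-to-accounts mapping is an assumed data structure.

```python
def select_rules(coverage: dict[str, set[str]], accounts: set[str]) -> list[str]:
    """Greedy covering: repeatedly take the rule that covers the most
    still-uncovered accounts, until every account is covered (or no
    rule helps). `coverage` maps a rule id to the set of accounts
    whose local rule set generated it.
    """
    selected: list[str] = []
    uncovered = set(accounts)
    while uncovered and coverage:
        best = max(coverage, key=lambda r: len(coverage[r] & uncovered))
        gained = coverage[best] & uncovered
        if not gained:
            break  # remaining accounts are covered by no candidate rule
        selected.append(best)
        uncovered -= gained
    return selected
```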
+
24
Rule Selection and Covering Algorithm
+
26
Profiling Monitors - the 2nd Stage

Monitors have 2 distinct steps:

 Profiling step:
  The monitor is applied to an account's normal usage to measure the account's normal activity
  Statistics are saved with the account

 Use step:
  The monitor processes a single account-day
  References the normalcy measure from profiling
  Generates a numeric value describing how abnormal the current account-day is
+
27
Most Common Monitor Templates

 Threshold
 Standard Deviation
+
28
Threshold Monitors
+
29
Standard Deviation Monitors
+
30
Comparing the same standard
deviation monitor on two accounts
+
31
Example for Standard Deviation

 Rule:
  (TIMEOFDAY = NIGHT) AND (LOCATION = BRONX) ==> FRAUD

 Profiling step:
  The subscriber called from the Bronx an average of 5 minutes per night, with a standard deviation of 2 minutes. At the end of the profiling step, the monitor would store the values (5, 2) with that account.

 Use step:
  If the monitor processed a day containing 3 minutes of airtime from the Bronx at night, the monitor would emit a zero; if the monitor saw 15 minutes, it would emit (15 - 5)/2 = 5. This value denotes that the account is five standard deviations above its average (profiled) usage level.
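
A minimal Python sketch of this monitor, reproducing the numbers above; the class layout is an assumption, and clamping below-average days to zero follows the example's behavior.

```python
import statistics

class StdDevMonitor:
    """Standard deviation monitor for one rule, e.g. night-time
    minutes called from the Bronx."""

    def profile(self, daily_minutes: list[float]) -> None:
        # Profiling step: summarize the account's normal usage.
        self.mean = statistics.mean(daily_minutes)
        self.std = statistics.stdev(daily_minutes)

    def use(self, minutes_today: float) -> float:
        # Use step: emit how many standard deviations above normal
        # this account-day is; at-or-below-average days emit zero.
        return max(0.0, (minutes_today - self.mean) / self.std)

monitor = StdDevMonitor()
monitor.mean, monitor.std = 5.0, 2.0  # the profiled values (5, 2)
print(monitor.use(3.0))   # 0.0 -> nothing abnormal
print(monitor.use(15.0))  # 5.0 -> five standard deviations above average
```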
+
32
Combining Evidence from the Monitors - the 3rd Stage

 Weights the monitor outputs and learns a threshold on the sum to produce high-confidence alarms

 DC-1 uses a Linear Threshold Unit (LTU)
  Simple and fast
  Enables a good first-order judgment

 A feature selection process is used to choose a small set of useful monitors for the final detector
  Some rules do not perform well when used in monitors, and some overlap
  A forward selection process chooses the set of useful monitors
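
The LTU itself is just a weighted sum compared against a learned threshold; a minimal sketch follows, with made-up weights and threshold standing in for the values DC-1 learns from training account-days.

```python
def ltu_alarm(monitor_outputs: list[float],
              weights: list[float],
              threshold: float) -> bool:
    """Raise an alarm iff the weighted evidence exceeds the threshold."""
    score = sum(w * x for w, x in zip(weights, monitor_outputs))
    return score > threshold

# Hypothetical outputs from three selected monitors for one account-day:
outputs = [5.0, 0.0, 1.2]
weights = [0.6, 0.3, 0.4]   # placeholders for the learned weights
print(ltu_alarm(outputs, weights, threshold=2.5))  # True -> alarm
```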
+
33
Final Output of DC-1

 A detector that profiles each user's behavior based on several indicators

 An alarm is raised when there is sufficient evidence of fraudulent activity
+
34
Data Used in the Study
+
35
Data Information

 Four months of phone call records from the New York City area

 Each call is described by 31 original attributes

 Some derived attributes are added
  Time-of-Day (MORNING, AFTERNOON, TWILIGHT, EVENING, NIGHT)
  To-Payphone

 Each call is given a class label of fraudulent or legitimate
+
36
Data Cleaning

 Eliminated credited calls made to destinations/numbers that are not in the credited block
  The destination number must be called only by the legitimate user

 Days with 1-4 minutes of fraudulent usage were discarded
  These may have been credited for other reasons, such as a wrong number

 Call times were normalized to Greenwich Mean Time for chronological sorting
+
37
Data Description

 Once the monitors are created and accounts profiled, the system transforms raw call data into a series of account-days, using the monitor outputs as features

 Selected for profiling, training, and testing:
  3,600 accounts that have at least 30 fraud-free days of usage before any fraudulent usage
  The initial 30 days of each account were used for profiling
  The remaining days were used to generate 96,000 account-days
  Distinct training and testing accounts: 10,000 account-days for training; 5,000 for testing
  20% fraud days and 80% non-fraud days
+
38
Experiments and Evaluation
+
39
Output of DC-1 Components

 Rule learning: 3,630 rules
  Each covering at least two accounts

 Rule selection: 99 rules

 2 monitor templates yielding 198 monitors

 Final feature selection: 11 monitors
+
40
The Importance of Error Cost

 Classification accuracy is not sufficient to evaluate performance

 Misclassification costs should be taken into account

 Estimated error costs:
  False positive (false alarm): $5
  False negative (letting a fraudulent account-day go undetected): $0.40 per minute of fraudulent airtime

 Factoring in error costs requires a second training pass by the LTU
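
These estimated costs translate directly into the evaluation measure; a minimal sketch, assuming false alarms and undetected fraudulent minutes have already been tallied for a test run.

```python
def total_cost(false_alarms: int,
               missed_fraud_minutes: float,
               fp_cost: float = 5.00,
               fn_cost_per_minute: float = 0.40) -> float:
    """$5 per false alarm plus $0.40 per undetected fraudulent minute."""
    return false_alarms * fp_cost + missed_fraud_minutes * fn_cost_per_minute

print(total_cost(false_alarms=100, missed_fraud_minutes=2000))  # 1300.0
```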
+
41
Alternative Detection Methods

 Collisions + Velocities
  Errors almost entirely due to false negatives

 High Usage
  Detects a sudden large jump in account usage

 Best Individual DC-1 Monitor
  (Time-of-Day = Evening) ==> FRAUD

 SOTA - State Of The Art
  Incorporates 13 hand-crafted profiling methods
  Best detectors identified in a previous study
+
42
DC-1 vs. Alternatives

Detector                  Accuracy (%)  Cost ($)       Accuracy at Cost
Alarm on all              20            20000          20
Alarm on none             80            18111 +/- 961  80
Collisions + Velocities   82 +/- 0.3    17578 +/- 749  82 +/- 0.4
High Usage                88 +/- 0.7    6938 +/- 470   85 +/- 1.7
Best DC-1 monitor         89 +/- 0.5    7940 +/- 313   85 +/- 0.8
State of the art (SOTA)   90 +/- 0.4    6557 +/- 541   88 +/- 0.9
DC-1 detector             92 +/- 0.5    5403 +/- 507   91 +/- 0.8
SOTA plus DC-1            92 +/- 0.4    5078 +/- 319   91 +/- 0.8
+
43
Shifting Fraud Distributions

 A fraud detection system should adapt to shifting fraud distributions

 To illustrate this point:
  One non-adaptive DC-1 detector was trained on a fixed distribution (80% non-fraud) and tested against a range of 75-99% non-fraud
  Another DC-1 detector was allowed to adapt (re-train its LTU threshold) for each fraud distribution
  The second detector was more cost effective than the first
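
What "re-training the LTU threshold" amounts to can be sketched as a search for the cost-minimizing threshold on account-days drawn from the new distribution; the tuple layout and the cost constants below restate the slides' assumptions.

```python
def retune_threshold(scored_days: list[tuple[float, bool, float]],
                     candidates: list[float]) -> float:
    """Each day is (ltu_score, is_fraud, fraud_minutes). Returns the
    candidate threshold with the lowest total error cost."""
    def cost(threshold: float) -> float:
        total = 0.0
        for score, is_fraud, minutes in scored_days:
            alarmed = score > threshold
            if alarmed and not is_fraud:
                total += 5.00            # false alarm
            elif not alarmed and is_fraud:
                total += 0.40 * minutes  # missed fraudulent airtime
        return total
    return min(candidates, key=cost)
```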
+
44
Effects of Changing Fraud Distribution

[Chart: cost vs. percentage of non-fraud (75-100%) for the adaptive detector and the fixed 80/20 detector]
+
47
Conclusion

 DC-1 uses a rule-learning program to uncover indicators of fraudulent behavior from a large database of customer transactions

 Then the indicators are used to create a set of monitors, which profile legitimate customer behavior and indicate anomalies

 Finally, the outputs of the monitors are used as features in a system that learns to combine evidence to generate high-confidence alarms
+
48
Conclusion

 Adaptability to dynamic patterns of fraud can be achieved by generating fraud detection systems automatically from data, using data mining techniques

 DC-1 can adapt to the changing conditions typical of fraud detection environments

 Experiments indicate that DC-1 performs better than other methods for detecting fraud
+
49
Exam Questions
+
50
Question 1
• What are the two major fraud detection categories, how do they differ, and which does DC-1 fall under?
• Pre-call Methods
  • Involve validating the phone or its user when a call is placed
• Post-call Methods - DC-1 falls here
  • Analyze call data on each account to determine whether cloning fraud has occurred
+
51
Question 2
• Why do fraud detection methods need to be adaptive?
• Bandits change their behavior - patterns of fraud are dynamic
• Levels of fraud vary from month to month
• The cost of missing fraud or handling false alarms changes between inter-carrier contracts
+
52
Question 3
• What are the two steps of profiling monitors, and what are the two main monitor templates?
• Profiling step: measure an account's normal activity and save statistics
• Use step: process usage for an account-day to produce a numerical output describing how abnormal activity was on that account-day
• Threshold and Standard Deviation monitors
+
53
The End.
Questions?