User Interfaces and Algorithms for Fighting Phishing
Jason I. Hong
Carnegie Mellon University
Everyday Privacy and Security Problem
This entire process is known as phishing.
Phishing is a Plague on the Internet
• Estimated 3.5 million people have fallen for phishing
• Estimated $350m-$2b direct losses a year
• 9255 unique phishing sites reported in June 2006
• Easier (and safer) to phish than rob a bank
Project: Supporting Trust Decisions
• Goal: help people make better online trust decisions
  – Currently focusing on anti-phishing
• Large multi-disciplinary team project at CMU
  – Six faculty, five PhD students, undergrads, staff
  – Computer science, human-computer interaction, public policy, social and decision sciences, CERT
Our Multi-Pronged Approach
• Human side
  – Interviews to understand decision-making
  – PhishGuru embedded training
  – Anti-Phishing Phil game
  – Understanding effectiveness of browser warnings
• Computer side
  – PILFER email anti-phishing filter
  – CANTINA web anti-phishing algorithm
Automate where possible, support where necessary
Our Multi-Pronged Approach
• Human side
  – Interviews to understand decision-making
  – PhishGuru embedded training
  – Anti-Phishing Phil game
  – Understanding effectiveness of browser warnings
• Computer side
  – PILFER email anti-phishing filter
  – CANTINA web anti-phishing algorithm
What do users know about phishing?
Interview Study
• Interviewed 40 Internet users (35 non-experts)
• “Mental models” interviews included email role play and open-ended questions
• Brief overview of results (see paper for details)

J. Downs, M. Holbrook, and L. Cranor. Decision Strategies and Susceptibility to Phishing. In Proceedings of the 2006 Symposium On Usable Privacy and Security, 12-14 July 2006, Pittsburgh, PA.
Little Knowledge of Phishing
• Only about half knew the meaning of the term “phishing”
  “Something to do with the band Phish, I take it.”
Little Attention Paid to URLs
• Only 55% of participants said they had ever noticed an unexpected or strange-looking URL
• Most did not consider them to be suspicious
Some Knowledge of Scams
• 55% of participants reported being cautious when an email asks for sensitive financial info
  – But very few reported being suspicious of email asking for passwords
• Knowledge of financial phish reduced the likelihood of falling for these scams
  – But did not transfer to other scams, such as an amazon.com password phish
Naive Evaluation Strategies
• The most frequent strategies don’t help much in identifying phish
  – This email appears to be for me
  – It’s normal to hear from companies you do business with
  – Reputable companies will send emails
“I will probably give them the information that they asked for. And I would assume that I had already given them that information at some point, so I will feel comfortable giving it to them again.”
Summary of Findings
• People are generally not good at identifying scams they haven’t specifically seen before
• People don’t use good strategies to protect themselves
• Currently running a large-scale survey across multiple cities in the US to gather more data
• Amazon is also active in looking for fake domain names
Our Multi-Pronged Approach
• Human side
  – Interviews to understand decision-making
  – PhishGuru embedded training
  – Anti-Phishing Phil game
  – Understanding effectiveness of browser warnings
• Computer side
  – PILFER email anti-phishing filter
  – CANTINA web anti-phishing algorithm
Can we train people not to fall for phish?
Web Site Training Study
• Laboratory study of 28 non-expert computer users
• Asked participants to evaluate 20 web sites
  – Control group evaluated 10 web sites, took a 15 min break to read email or play solitaire, evaluated 10 more web sites
  – Experimental group same as above, but spent the 15 min break reading web-based training materials
• Experimental group performed significantly better identifying phish after training
  – Less reliance on “professional-looking” designs
  – Looking at and understanding URLs
  – Web site asks for too much information
People can learn from web-based training materials, if only we could get them to read them!
How Do We Get People Trained?
• Most people don’t proactively look for training materials on the web
• Companies send “security notice” emails to employees and/or customers
• We hypothesized these tend to be ignored
  – Too much to read
  – People don’t consider them relevant
  – People think they already know how to protect themselves
• Led us to the idea of embedded training
Embedded Training
• Can we “train” people during their normal use of email to avoid phishing attacks?
  – Periodically, people get sent a training email
  – Training email looks like a phishing attack
  – If a person falls for it, an intervention warns and highlights what cues to look for, in a succinct and engaging format

P. Kumaraguru, Y. Rhee, A. Acquisti, L. Cranor, J. Hong, and E. Nunge. Protecting People from Phishing: The Design and Evaluation of an Embedded Training Email System. In CHI 2007.
Embedded training example
Subject: Revision to Your Amazon.com Information
Please login and enter your information
http://www.amazon.com/exec/obidos/sign-in.html
Intervention #1 – Diagram
• Explains why they are seeing this message
• Explains what a phishing scam is
• Explains how to identify a phishing scam
• Explains simple things you can do to protect yourself
Intervention #2 – Comic Strip
Embedded Training Evaluation #1
• Lab study comparing our prototypes to standard security notices
  – eBay, PayPal notices
  – Intervention #1 – Diagram that explains phishing
  – Intervention #2 – Comic strip that tells a story
• 10 participants in each condition (30 total)
  – Screened so we only have novices
• Go through 19 emails, with 4 phishing attacks scattered throughout, plus 2 training emails
  – Role play as Bobby Smith at Cognix Inc
Embedded Training Results
• Existing practice of security notices is ineffective
• Diagram intervention somewhat better
• Comic strip intervention worked best
  – Statistically significant
  – Combination of less text, graphics, story?
Evaluation #2
• New questions:
  – Do people have to fall for the phishing email for training to be effective?
  – How well do people retain knowledge?
• Roughly the same experiment as before
  – Role play as Bobby Smith at Cognix Inc, go through 16 emails
  – Embedded condition means participants have to fall for our email
  – Non-embedded means we just send the comic strip
  – Had people come back after 1 week
  – Improved design of comic strip intervention

To appear in APWG eCrime Researchers’ Summit (Oct 4-5 at CMU)
Results of Evaluation #2
• Do people have to fall for the phishing email for training to be effective?
• How well do people retain knowledge after a week?
[Chart: mean correctness on the training set before training, immediately after, and after a one-week delay, for the non-embedded and embedded conditions]
Anti-Phishing Phil
• A game to teach people not to fall for phish
  – Embedded training focuses on email
  – Our game focuses on the web browser and URLs
• Goals
  – How to parse URLs
  – Where to look for URLs
  – Use search engines for help
• Try the game!
  – http://cups.cs.cmu.edu/antiphishing_phil
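One of the game's lessons is that only the registrable domain in a URL identifies the site you are actually talking to. A minimal sketch of that parsing rule (the function name is illustrative, not part of Phil, and a real implementation would consult the Public Suffix List for multi-label suffixes like .co.uk):

```python
from urllib.parse import urlparse

def registrable_domain(url: str) -> str:
    # Keep only the last two labels of the hostname -- the part that
    # identifies who you are talking to. Simplified: real code would
    # use the Public Suffix List to handle suffixes like .co.uk.
    host = urlparse(url).hostname or ""
    labels = host.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host

# "paypal" appears only as a subdomain; the real destination is example.com
print(registrable_domain("http://www.paypal.update.example.com/update.cgi"))  # example.com
print(registrable_domain("https://www.paypal.com/signin"))                    # paypal.com
```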
Evaluation of Anti-Phishing Phil
• Test participants’ ability to identify phishing web sites before and after up to 15 min of training
  – 10 web sites before training, 10 after, in randomized order
• Three conditions:
  – Web-based phishing education
  – Printed tutorial of our materials
  – Anti-Phishing Phil
• 14 participants in each condition
  – Screened out security experts
  – Younger, college students
Results
• No statistically significant difference in false negatives among the three groups
  – Actually a phish, but the participant thinks it’s not
  – Unsure why; considering a larger online study
• Though the game group had the fewest false positives
Our Multi-Pronged Approach
• Human side
  – Interviews to understand decision-making
  – PhishGuru embedded training
  – Anti-Phishing Phil game
  – Understanding effectiveness of browser warnings
• Computer side
  – PILFER email anti-phishing filter
  – CANTINA web anti-phishing algorithm
Do people see, understand,
and believe web browser warnings?
Screenshots
• Internet Explorer – Passive Warning
• Internet Explorer – Active Block
• Mozilla Firefox – Active Block
How Effective are these Warnings?
• We tested four conditions
  – Firefox Active Block
  – IE Active Block
  – IE Passive Warning
  – Control (no warnings or blocks)
• “Shopping Study”
  – Set up some fake phishing pages and added them to blacklists
  – Users were phished after purchases
  – Real email accounts and personal information
  – Spoofing eBay and Amazon (2 phish/user)
  – We observed them interact with the warnings
Improving Phishing Indicators
• Passive warning failed for many reasons
  – Didn’t interrupt the main task
  – Wasn’t clear what the right action was
  – Looked too much like other ignorable warnings
• Now looking at the science of warnings
  – How to create effective security warnings
Our Multi-Pronged Approach
• Human side
  – Interviews to understand decision-making
  – PhishGuru embedded training
  – Anti-Phishing Phil game
  – Understanding effectiveness of browser warnings
• Computer side
  – PILFER email anti-phishing filter
  – CANTINA web anti-phishing algorithm
Can we automatically detect phish emails?
PILFER Email Anti-Phishing Filter
• Philosophy: automate where possible, support where necessary
• Goal: create an email filter that detects phishing emails
  – Spam filters are well-explored, but how good are they for phishing?
  – Can we create a custom filter for phishing?

I. Fette, N. Sadeh, and A. Tomasic. Learning to Detect Phishing Emails. In WWW 2007.
PILFER Email Anti-Phishing Filter
• Heuristics combined in an SVM
  – IP addresses in links (http://128.23.34.45/blah)
  – Age of linked-to domains (younger domains likely phishing)
  – Non-matching URLs (e.g., most links point to PayPal)
  – “Click here to restore your account”
  – HTML email
  – Number of links
  – Number of domain names in links
  – Number of dots in URLs (http://www.paypal.update.example.com/update.cgi)
  – JavaScript
  – SpamAssassin rating
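Several of these heuristics are simple enough to sketch. The version below extracts a few URL-based features from an email's links; the feature names and definitions are illustrative rather than PILFER's exact ones, and in the real system such features feed into the SVM alongside the rest:

```python
import re
from urllib.parse import urlparse

def url_features(urls):
    # Illustrative PILFER-style features computed from an email's links.
    hosts = [urlparse(u).hostname or "" for u in urls]
    return {
        # Any link whose host is a bare IPv4 address (http://128.23.34.45/...)
        "has_ip_link": any(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", h) for h in hosts),
        "num_links": len(urls),
        # Distinct domain names across all links
        "num_domains": len({h for h in hosts if h}),
        # Deeply nested hostnames (www.paypal.update.example.com) have many dots
        "max_dots": max((h.count(".") for h in hosts), default=0),
    }

urls = ["http://128.23.34.45/blah",
        "http://www.paypal.update.example.com/update.cgi"]
print(url_features(urls))
# {'has_ip_link': True, 'num_links': 2, 'num_domains': 2, 'max_dots': 4}
```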
PILFER Evaluation
• Ham corpora from SpamAssassin (2002 and 2003)
  – 6950 good emails
• Phishingcorpus
  – 860 phishing emails
PILFER Evaluation
• PILFER now implemented as a SpamAssassin filter
• Alas, Ian has left for Google
Our Multi-Pronged Approach
• Human side
  – Interviews to understand decision-making
  – PhishGuru embedded training
  – Anti-Phishing Phil game
  – Understanding effectiveness of browser warnings
• Computer side
  – PILFER email anti-phishing filter
  – CANTINA web anti-phishing algorithm
Can we do better in automatically
detecting phish web sites?
Lots of Phish Detection Algorithms
• Dozens of anti-phishing toolbars offered
  – Built into security software suites
  – Offered by ISPs
  – Free downloads
  – Built into the latest versions of popular web browsers
  – 132 on download.com
• But how well do they detect phish?
  – Short answer: still room for improvement
Testing the Toolbars
• November 2006: automated evaluation of 10 toolbars
  – Used phishtank.com and APWG as sources of phishing URLs
  – Evaluated 100 phish and 510 legitimate sites

Y. Zhang, S. Egelman, L. Cranor, and J. Hong. Phinding Phish: An Evaluation of Anti-Phishing Toolbars. In NDSS 2007.
Testbed System Architecture
Results (PhishTank)
[Chart: percentage of phishing sites correctly identified over time (0, 1, 2, 12, and 24 hours) for SpoofGuard, EarthLink, Netcraft, Google, IE7, Cloudmark, TrustWatch, eBay, Netscape, and McAfee; callouts mark 38% false positives on the top curve and 1% false positives on another]
Results (APWG)
[Chart: percentage of phishing sites correctly identified over time (0, 1, 2, 12, and 24 hours) for SpoofGuard, EarthLink, Netcraft, Firefox w/Google, IE7, Cloudmark, TrustWatch, eBay, Netscape, CallingID, and Firefox]
Results
• Only one toolbar had >90% accuracy (but high false positives)
• Several catch 70-85% of phish with few false positives
• Can we do better?
  – Can we use search engines to help find phish?

Y. Zhang, J. Hong, and L. Cranor. CANTINA: A Content-Based Approach to Detecting Phishing Web Sites. In WWW 2007.
Robust Hyperlinks
• Developed by Phelps and Wilensky to solve the “404 not found” problem
• Key idea was to add a lexical signature to URLs that could be fed to a search engine if the URL failed
  – Ex. http://abc.com/page.html?sig=“word1+word2+...+word5”
• How to generate the signature?
  – Found that TF-IDF was fairly effective
• Informal evaluation found five words was sufficient for most web pages
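The signature step can be sketched directly. Below is a minimal TF-IDF ranking, assuming a hand-rolled tokenizer and smoothed IDF; Phelps and Wilensky's exact weighting may differ:

```python
import math
import re
from collections import Counter

def lexical_signature(doc: str, corpus: list[str], k: int = 5) -> list[str]:
    # Rank the document's terms by TF-IDF and keep the top k.
    tokenize = lambda text: re.findall(r"[a-z]+", text.lower())
    tf = Counter(tokenize(doc))                    # term frequency in this page
    doc_sets = [set(tokenize(d)) for d in corpus]  # for document frequency
    def idf(term):
        df = sum(term in s for s in doc_sets)
        return math.log((1 + len(corpus)) / (1 + df))  # smoothed IDF
    weights = {t: count * idf(t) for t, count in tf.items()}
    return sorted(weights, key=weights.get, reverse=True)[:k]

# The signature is then appended to the URL as a query parameter:
# http://abc.com/page.html?sig=word1+word2+...+word5
```

Rare, page-specific terms get high weight, while terms common across the corpus score near zero, so the chosen words tend to uniquely identify the page to a search engine.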
Adapting TF-IDF for Anti-Phishing
• Can the same basic approach be used for anti-phishing?
  – Scammers often directly copy web pages
  – With the Google search engine, the fake should have low page rank
[Screenshots: fake vs. real page]
How CANTINA Works
• Given a web page, calculate the TF-IDF score for each word in that page
• Take the five words with the highest TF-IDF weights
• Feed these five words into a search engine (Google)
• If the domain name of the current web page is in the top N search results, we consider it legitimate
  – N=30 worked well
  – No improvement by increasing N
• Later, added some heuristics to reduce false positives
[Screenshots: both the fake and real pages yield the same signature — eBay, user, sign, help, forgot]
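The steps above amount to a few lines of code. In this sketch, `search(query, n)` is a stand-in for a search-engine API (the real system queried Google); its name and signature are assumptions for illustration, and the exact-hostname match is simpler than CANTINA's actual domain comparison:

```python
from urllib.parse import urlparse

def cantina_check(page_url: str, top_terms: list[str], search, n: int = 30) -> bool:
    # Legitimate if the page's own domain ranks in the top-n search
    # results for its own lexical signature; otherwise suspect a phish.
    page_domain = urlparse(page_url).hostname
    result_domains = {urlparse(u).hostname for u in search(" ".join(top_terms), n)}
    return page_domain in result_domains

# Usage with a stubbed search engine:
results = ["https://www.ebay.com/signin", "https://pages.ebay.com/help"]
stub = lambda query, n: results[:n]
print(cantina_check("https://www.ebay.com/signin",
                    ["eBay", "user", "sign", "help", "forgot"], stub))        # True
print(cantina_check("http://www.ebay.example.com/signin",
                    ["eBay", "user", "sign", "help", "forgot"], stub))        # False
```

A copied phishing page produces the same five words as the original, but its own (young, unranked) domain never appears in the results, which is exactly why the signature betrays it.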
Evaluating CANTINA
[Chart: evaluation results on PhishTank URLs]
Summary
• Whirlwind tour of our work on anti-phishing
  – Human side: how people make decisions, training, UIs
  – Computer side: better algorithms for detecting phish
• More info about our work at cups.cs.cmu.edu
Acknowledgments
• Alessandro Acquisti
• Lorrie Cranor
• Sven Dietrich
• Julie Downs
• Mandy Holbrook
• Norman Sadeh
• Anthony Tomasic
• Serge Egelman
• Ian Fette
• Ponnurangam Kumaraguru
• Bryant Magnien
• Elizabeth Nunge
• Yong Rhee
• Steve Sheng
• Yue Zhang

Supported by NSF, ARO, CyLab, Portugal Telecom
http://cups.cs.cmu.edu/
CMU Usable Privacy and Security Laboratory
Embedded Training Results
[Chart: percentage of users in Groups A, B, and C who clicked on a link, for each email containing a link (spam, real, phish, and training emails at positions 3 through 17)]
              Is it legitimate?
Our label     Yes               No
Yes           True positive     False positive
No            False negative    True negative
Minimal Knowledge of Lock Icon
• 85% of participants were aware of the lock icon
• Only 40% of those knew that it was supposed to be in the browser chrome
• Only 35% had noticed https, and many of those did not know what it meant
“I think that it means secured, it symbolizes some kind of security, somehow.”