Google Safe Browsing: Privacy and Security

Amrit Kumar
Univ. de Grenoble Alpes & Privatics team, INRIA
June 4, 2015
Kumar (Univ. de Grenoble Alpes)
Google Safe Browsing
June 4, 2015
1
Outline
1. Google Safe Browsing
2. Privacy
3. Security
4. Conclusion
Google Safe Browsing
Demo time!
d99q.cn
Google Safe Browsing
Started in 2008 by Google and used by:
• Google Chrome
• Mozilla Firefox
• Apple Safari
• Opera
Impact: billions of users according to Google
Goals: prevent users from visiting
• phishing sites
• malware sites
Methodology: blacklist
API compatibility with C#, Python and PHP
Cloned by Yandex.
Safe Browsing Lookup API
Google crawls the web looking for phishing and malware URLs to
feed a blacklist on its servers.
How to use it?
Ask Google’s server using a simple HTTP GET request.
https://sb-ssl.google.com/safebrowsing/api/lookup?
Issues:
• poor scaling
• privacy issue
The first blacklist
72 demons in the catalog
Agares
Aim
Alloces
Amdusias
Amon
Amy
Andras
Andrealphus
Andromalius
...
Identifying a demon
Problem: the catalog is a handbook. How can it be turned into a pocket book?
Solution: lossy compression. Keep only the first two letters of each name:
Ag
Ai
Al
Am
An
...
From 72 names to 50 prefixes (30% compression).
From 518 characters to 100 (80% compression).
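The pocket-book idea can be sketched in a few lines of Python. This is only an illustration: it uses the nine names quoted on the catalog slide (not the full list of 72), and the helper name `compress` is hypothetical.

```python
# Lossy compression of a name catalog: keep only the first letters of each name.
# NAMES holds just the nine names quoted on the slide, not all 72.
NAMES = ["Agares", "Aim", "Alloces", "Amdusias", "Amon",
         "Amy", "Andras", "Andrealphus", "Andromalius"]


def compress(names, length=2):
    """Return the sorted set of `length`-character prefixes of `names`."""
    return sorted({name[:length] for name in names})


prefixes = compress(NAMES)
print(prefixes)  # the pocket book
print(len(NAMES), "names ->", len(prefixes), "prefixes")
print(sum(map(len, NAMES)), "characters ->", sum(map(len, prefixes)))
```

Several names collapse onto the same prefix, which is where the compression comes from and also why the compression is lossy.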
False positives
Hollande → Ho is not in the pocket book. Hollande isn’t a demon.
Valls → Va is in the pocket book.
But Valls isn’t in the complete catalog.
⇒ false positive!
If a prefix is in the compressed list:
• Inconclusive: requires verification against the handbook.
• For Va, we would have: Valefar, Vapula and Vassago.
• Check among the full words.
The solution is worthwhile only if false positives are few.
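The two-stage check described here (cheap prefix test first, full catalog only on a prefix hit) could look as follows. This is a sketch under assumptions: `CATALOG` contains only the names quoted on the slides, and `is_demon` is a hypothetical helper name.

```python
# Two-stage membership test: a cheap prefix check, then a check against
# the full catalog only when the prefix matches.
# CATALOG is an excerpt (the names quoted on the slides), not all 72 names.
CATALOG = {"Agares", "Aim", "Alloces", "Amdusias", "Amon", "Amy",
           "Andras", "Andrealphus", "Andromalius",
           "Valefar", "Vapula", "Vassago"}
POCKET_BOOK = {name[:2] for name in CATALOG}


def is_demon(name):
    """Return (verdict, needed_full_lookup)."""
    if name[:2] not in POCKET_BOOK:
        return False, False  # conclusive: no full lookup needed
    # Inconclusive prefix hit: verify among the full names.
    return name in CATALOG, True


print(is_demon("Hollande"))  # prefix "Ho" is absent
print(is_demon("Valls"))     # prefix "Va" is present, but it is a false positive
print(is_demon("Vassago"))   # prefix "Va" is present and the name matches
```

The second element of the result marks the cases that cost a handbook lookup; the design only pays off when such lookups are rare.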
Google Safe Browsing (GSB) API v3
The local lookups are done over these files:

List name                 Description   #prefixes
goog-malware-shavar       malware         317,807
googpub-phish-shavar      phishing        312,621
goog-regtest-shavar       test file        29,667
goog-whitedomain-shavar   unused                1

About 650,000 entries overall.
We do not work on the URLs themselves but on their digests. Only
the first 4 bytes of the SHA-256 digest are used.
Prefix32(SHA256(www.example.com/)) = 0xd59cc9d3
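Computing such a prefix takes only the standard library. The sketch below assumes the input string is already in canonical form; the function name `prefix32` is chosen here for illustration, and the exact canonical string behind the slide's example value is not specified in the slides.

```python
import hashlib


def prefix32(expression: str) -> str:
    """Return the first 4 bytes of SHA-256(expression) as a hex string."""
    digest = hashlib.sha256(expression.encode("utf-8")).digest()
    return "0x" + digest[:4].hex()


print(prefix32("www.example.com/"))
```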
GSB API v3
[Flowchart: client-side lookup. Start client → Update needed? If yes, update local database. Then, for a URL: canonicalize and compute digests → Found prefixes? If no, non-malicious URL. If yes, get full digests → Found digest? If yes, malicious URL; if no, non-malicious URL.]
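The lookup path of this flowchart can be sketched as a small function. It is a simplification under stated assumptions: the update step is left out, canonicalization is assumed to have been done by the caller, and `check`, `get_full_digests` and the toy blacklist entry `evil.example/` are all hypothetical names standing in for the real client and server.

```python
import hashlib


def sha256(expression):
    return hashlib.sha256(expression.encode("utf-8")).digest()


def check(expressions, local_prefixes, get_full_digests):
    """Follow the lookup path for one URL.

    expressions      -- decompositions of the URL (already canonicalized)
    local_prefixes   -- set of 4-byte prefixes held by the client
    get_full_digests -- callable standing in for the server round trip
    """
    digests = [sha256(e) for e in expressions]
    hits = [d[:4] for d in digests if d[:4] in local_prefixes]
    if not hits:
        return "non-malicious"  # nothing leaves the client
    full = set()
    for prefix in hits:  # these prefixes are what the server gets to see
        full.update(get_full_digests(prefix))
    return "malicious" if any(d in full for d in digests) else "non-malicious"


# Toy server knowing a single blacklisted expression (hypothetical).
BLACKLIST = [sha256("evil.example/")]
local = {d[:4] for d in BLACKLIST}
server = lambda prefix: [d for d in BLACKLIST if d[:4] == prefix]

print(check(["evil.example/"], local, server))
print(check(["www.example.com/"], local, server))
```

The server is contacted only on a local prefix hit, and what it receives in that case is the list of matching prefixes.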
Yandex Safe Browsing (YSB)
Google’s Evil Twin
Yandex Safe Browsing API

List name                         Description          #prefixes
goog-malware-shavar               malware                283,211
goog-mobile-only-malware-shavar   mobile malware           2,107
goog-phish-shavar                 phishing                31,593
ydx-adult-shavar                  adult website              434
ydx-adult-testing-shavar          test file                  535
ydx-imgs-shavar                   malicious image              0
ydx-malware-shavar                malware                283,211
ydx-mitb-masks-shavar             man-in-the-browser          87
ydx-mobile-only-malware-shavar    malware                  2,107
ydx-phish-shavar                  phishing                31,593
ydx-porno-hosts-top-shavar        pornography             99,990
ydx-sms-fraud-shavar              sms fraud               10,609
ydx-test-shavar                   test file                    0
ydx-yellow-shavar                 shocking content           209
ydx-yellow-testing-shavar         test file                  370
ydx-badcrxids-digestvar           .crx file ids                *
ydx-badbin-digestvar              malicious binary             *
ydx-mitb-uids                     man-in-the-browser           *
ydx-badcrxids-testing-digestvar   test file                    *
Why 32-bit prefixes?
Optimization

SHA-256 prefix (bits)   Raw data (MB)   Data structure size (MB)   Compr.
32                       2.5             1.3                       1.9
64                       5.1             3.9                       1.3
80                       6.4             5.1                       1.2
128                     10.2             8.9                       1.1
256                     20.3            19.1                       1
Why 32-bit prefixes?
Privacy

Year   # unique URLs (Google)   # of domains
2008   1 Billion                177 Million
2012   30 Billion               252 Million
2013   60 Billion               271 Million

           M for URLs             M for domains
ℓ (bits)   2008   2012   2013     2008   2012   2013
16         228    228    229      253    363    388
32         443    7541   14757    2      3      3
64         2      2      2        1      1      1
96         1      1      1        1      1      1
Highlights
Google Chrome Privacy Notice on Safe Browsing:
“Google cannot determine the real URL from this information.”
("this information" refers to the prefixes)
This statement is reiterated for GSB usage in Mozilla Firefox.
Conclusion: GSB must provide the same level of privacy as a
private information retrieval algorithm.
Really?
Re-identification

URL                                         32-bit prefix
https://persyval-lab.org/content/edition/   0x2929f0b1
https://persyval-lab.org/content/           0xc99584e3
https://persyval-lab.org/                   0x192af851

The problem with false positives:
1 match: 0x2929f0b1 → no privacy issue.
2 matches: 0xc99584e3 and 0x192af851 → problem.
Several prefixes are indeed sent in practice.
Temporal correlation makes it worse:

URL                                              32-bit prefix
https://persyval-lab.org/phd/appel-2015/depot/   0x6e2abf0a
https://persyval-lab.org/phd/appel-2015/         0x79f13238
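The decompositions listed in these tables can be generated mechanically. The following is a simplified sketch, not the exact Safe Browsing procedure: it ignores query strings, IP-address hosts and any cap on the number of combinations, it assumes the URL carries a scheme, and `decompositions` is a name chosen here. Whether hashing these strings reproduces the prefixes on the slide depends on canonicalization details the slides do not give.

```python
import hashlib
from urllib.parse import urlsplit


def decompositions(url):
    """Simplified host-suffix/path-prefix decompositions of `url`."""
    parts = urlsplit(url)
    labels = parts.hostname.split(".")
    # Host suffixes, always keeping at least the last two labels.
    hosts = [".".join(labels[i:]) for i in range(len(labels) - 1)]
    # Path prefixes: the full path, then each parent directory up to "/".
    segments = [s for s in parts.path.split("/") if s]
    paths = []
    if parts.path and not parts.path.endswith("/"):
        paths.append(parts.path)
        segments = segments[:-1]
    paths += ["/" + "/".join(segments[:i]) + ("/" if i else "")
              for i in range(len(segments), -1, -1)]
    return [h + p for h in hosts for p in paths]


for expr in decompositions("https://persyval-lab.org/content/edition/"):
    prefix = hashlib.sha256(expr.encode("utf-8")).digest()[:4].hex()
    print(expr, "0x" + prefix)
```

Each additional prefix that matches for the same page narrows down which URL was visited, which is the re-identification risk discussed on this slide.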
Interesting URLs

URL                                             matching decomposition           prefix
http://fr.xhamster.com/user/video               fr.xhamster.com/                 0xe4fdd86c
                                                xhamster.com/                    0x3074e021
http://nl.xhamster.com/user/video               nl.xhamster.com/                 0xa95055ff
                                                xhamster.com/                    0x3074e021
http://m.mofos.com/user/login                   m.mofos.com/                     0x6e961650
                                                mofos.com/                       0x00354501
http://m.mofos.com/user/logout                  m.mofos.com/                     0x6e961650
                                                mofos.com/                       0x00354501
http://mobile.teenslovehugecocks.com/user/join  mobile.teenslovehugecocks.com/   0x585667a5
                                                teenslovehugecocks.com/          0x92824b5c
http://fr.xhamster.com/user/kmille              fr.xhamster.com/                 0xe4fdd86c
                                                xhamster.com/                    0x3074e021
http://de.xhamster.com/user/video               de.xhamster.com/                 0x0215bac9
                                                xhamster.com/                    0x3074e021
http://nl.xhamster.com/user/ppbbg               nl.xhamster.com/                 0xa95055ff
                                                xhamster.com/                    0x3074e021
http://nl.xhamster.com/user/photo               nl.xhamster.com/                 0xa95055ff
                                                xhamster.com/                    0x3074e021
Am I paranoid?
[Figure: services and browsers relying on GSB or YSB: Twitter, Bitly, Facebook, Firefox, Chrome, Mail.ru, Chromium, Opera, WOT, Orbitum, Maxthon, TRUSTe, DuckDuckGo, Yandex.Browser, Safari]
65% of the browsers in use.
Major social networks.
Activated by default in some releases of Tor Browser.
Orphans

                                          #full hash per prefix
         list name                        0       1      2       Total
Google   goog-malware-shavar              0.9%    99%    0.1%    317,807
         googpub-phish-shavar             0.9%    99%    0.1%    312,621
Yandex   ydx-malware-shavar               1.5%    98%    0.5%    283,211
         ydx-adult-shavar                 43%     57%    0           434
         ydx-mobile-only-malware-shavar   6%      94%    0         2,107
         ydx-phish-shavar                 99%     1%     0        31,593
         ydx-mitb-masks-shavar            100%    0      0            87
         ydx-porno-hosts-top-shavar       1%      99%    0        99,990
         ydx-sms-fraud-shavar             95%     5%     0        10,609
         ydx-yellow-shavar                100%    0      0           209
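A distribution like the one in this table can be computed by counting, for every prefix of a list, how many full hashes the server returns for it. The sketch below reads an orphan as a prefix with no full hash at all (column 0); the function name and the toy data are hypothetical, and no real server is queried.

```python
from collections import Counter


def full_hash_distribution(prefixes, full_digests_for):
    """Percentage of prefixes having 0, 1, 2, ... full digests."""
    counts = Counter(len(full_digests_for(p)) for p in prefixes)
    total = len(prefixes)
    return {k: 100.0 * v / total for k, v in sorted(counts.items())}


# Toy data (hypothetical): four prefixes, one of which is an orphan.
answers = {"0x00000001": ["d1"], "0x00000002": ["d2", "d3"],
           "0x00000003": [], "0x00000004": ["d4"]}
dist = full_hash_distribution(list(answers), answers.get)
print(dist)
```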
Popular orphans

                                          #Coll. with TopAlexa
         list name                        0     1         2     Total
Google   goog-malware-shavar              0     572       0       572
         googpub-phish-shavar             0     88        0        88
Yandex   ydx-malware-shavar               73    2,614     0     2,687
         ydx-adult-shavar                 38    43        0        81
         ydx-mobile-only-malware-shavar   2     22        0        24
         ydx-phish-shavar                 22    0         0        22
         ydx-mitb-masks-shavar            2     0         0         2
         ydx-porno-hosts-top-shavar       43    17,541    0    17,584
         ydx-sms-fraud-shavar             76    3         0        79
         ydx-yellow-shavar                15    0         0        15
Conclusion
Google and Yandex can track users.
Mysterious files: presence of a large number of orphans.
Accountability?
Private Information Retrieval is the definitive answer, but ...
Unsafe Browsing
SB architecture is meaningful if the false positive probability is low.
Can an attacker increase the false positive probability?
• Increase requests towards the server.
• Increase responses towards the client.
• Or both.
Attack impact:
• Challenges the design rationale of the verification algorithm.
• Safe Browsing can potentially be brought to its knees.
• Consumes bandwidth on the client’s side.
Goal is to mount a DoS attack.
Attack routine
Step 1: Generate false positives. Example: Hollande is frequently
searched. We look for names with the same prefix:
Hochart
Houssin
Hoareau
Hocquet
Horn
...
Step 2: Turn these names into demons and include them in
the Key of Solomon.
Step 3: Observe the impact.
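Establishing the flow
[Diagram: actors are the Adversary, the Web, the Crawler, the GSB Server with its Database, and the Client with its Local dump. Labeled steps: (1) pre-images (Adversary), (2) find malicious URLs (Crawler, Web), (3) transmit URLs (Crawler to GSB Server), (4) update (Database), (5) connect (Client to GSB Server), (6) send (GSB Server to Client), (7) update (Local dump).]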
Step 1: Second pre-images
Given a URL m, find m' ≠ m such that:
Prefix32(SHA256(m)) = Prefix32(SHA256(m'))
2^32 brute-force computations to find such an m'.
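A brute-force search for such a second pre-image can be sketched as below. This is an illustration, not the tooling used for the talk: the candidate template `attacker.example/{}/` and the function names are hypothetical, and the demo truncates to 16 bits so that it finishes quickly (about 2^16 tries expected, against about 2^32 for the real 32-bit prefix).

```python
import hashlib
from itertools import count


def prefix(expression, bits=32):
    """Top `bits` bits of SHA-256(expression), as an integer."""
    digest = hashlib.sha256(expression.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


def second_preimage(target, bits=32, template="attacker.example/{}/"):
    """Brute-force an expression m' != target with the same `bits`-bit prefix."""
    wanted = prefix(target, bits)
    for i in count():
        candidate = template.format(i)
        if candidate != target and prefix(candidate, bits) == wanted:
            return candidate, i + 1  # expression found, number of tries


# 16 bits keeps the demo fast; the attack itself targets 32 bits.
m2, tries = second_preimage("www.example.com/", bits=16)
print(m2, tries)
```

Every candidate is tested independently of the others, so the search splits naturally across cores.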
Generating second pre-images
Top 10^6 of Alexa
1 week of computation on 32 cores:
• Python
• fake-factory 0.4.2 ⇒ human-readable URLs
Results on TopAlexa
Around 111 million second pre-images were generated.
[Figure: histogram. x-axis: # second pre-images (0 to 160); y-axis: # URLs of TopAlexa (×10^3, 0 to 40)]
Multiple second pre-images

Prefix       #     Alexa site
0xd8b4483f   165   http://getontheweb.com/
0xbbb9a6be   163   http://exqifm.be/
0x0f0eb30e   162   http://rustysoffroad.com/
0x13041709   161   http://meetingsfocus.com/
0xff42c50e   160   http://js118114.com/
0xd932f4c1   160   http://cavenergie.nl/

Sample URLs:
• http://62574314ginalittle.org/
• http://chloekub.biz/id9352871
URL of Death

malicious URL                       popular domain   prefix
deadly-domain.com/tag1/             google.com       0xd4c9d902
deadly-domain.com/tag1/tag2/        facebook.com     0x31193328
deadly-domain.com/tag1/tag2/tag3/   youtube.com      0x4dc3a769

Generate a tree of URLs on the same domain.
The attacker needs to purchase only one domain.
Second pre-image search is relatively less parallelizable.
Step 2: Inclusion
Reporting to Google:
• google.com/safebrowsing/report_badware/
• google.com/safebrowsing/report_phish/
Reporting to Google’s sources:
• phishtank.com
• stopbadware.org
Google Webmaster tools.
Inclusion is the most difficult part:
• Ethical reasons.
• Blackbox implementation on the Google side.
Step 3: Consequences
DoS: increase in traffic towards the SB server and its clients.
Discount: 4 bytes sent, 5280 received.

Amplification
Worst case     8
Average case   800
Best case      1320

Bonus: browser’s cache pollution!
A prefix can only be queried every 45 min.
• The browser must keep the list of all corresponding hashes in the cache
for 45 min.
• Consumes memory!
No botnets required.
Clever crafting of malicious URLs.
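The amplification figures are consistent with a simple ratio of bytes received to bytes sent. The sketch below is a reading of the slide rather than a measurement: it assumes the response consists only of 32-byte full digests for one 4-byte prefix, and the digest counts 1, 100 and 165 are inferred from the factors 8, 800 and 1320 (165 also being the largest number of second pre-images found for one prefix).

```python
PREFIX_BYTES = 4   # one 32-bit prefix sent by the client
DIGEST_BYTES = 32  # one full SHA-256 digest returned by the server


def amplification(n_digests):
    """Bytes received per byte sent when `n_digests` share the queried prefix."""
    return n_digests * DIGEST_BYTES / PREFIX_BYTES


for n in (1, 100, 165):
    print(n, "digests ->", n * DIGEST_BYTES, "bytes, factor", amplification(n))
```

With 165 digests behind one prefix, the 4 bytes sent trigger 165 × 32 = 5280 bytes in return, which is the best case for the attacker.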
Conclusion
Privacy:
• Safe Browsing is a useful service.
• But the privacy policy is incorrect.
• Has the potential to track users.
• But no strong evidence.
Security:
• Attacks challenge the fundamental design rationale.
• Challenge: Google servers are a blackbox.
• White-listing?
Thank you!
Questions?