Google Safe Browsing: Privacy and Security Amrit Kumar Univ. de Grenoble Alpes & Privatics team, INRIA June 4, 2015 Kumar (Univ. de Grenoble Alpes) Google Safe Browsing June 4, 2015 1 Outline 1 Google Safe Browsing 2 Privacy 3 Security 4 Conclusion Kumar (Univ. de Grenoble Alpes) Google Safe Browsing June 4, 2015 2 Outline 1 Google Safe Browsing 2 Privacy 3 Security 4 Conclusion Kumar (Univ. de Grenoble Alpes) Google Safe Browsing June 4, 2015 3 Google Safe Browsing Demo time ! d99q.cn Kumar (Univ. de Grenoble Alpes) Google Safe Browsing June 4, 2015 4 Google Safe Browsing Started in 2008 by Google and used by : I I I I Google Chrome Mozilla Firefox Apple Safari Opera Impact : billions of users according to Google Goals : prevent users from visiting I I phishing sites malwares sites Methodology : blacklist API compatibility with C#, Python and PHP Cloned by Yandex. Kumar (Univ. de Grenoble Alpes) Google Safe Browsing June 4, 2015 5 Safe Browsing Lookup API Google crawls the web to seek phishing and malwares URLs to feed a blacklist on their servers. How to use ? Ask Google’s server using a simple HTTP GET request. https://sb-ssl.google.com/safebrowsing/api/lookup? Issues : • bad scaling • privacy issue Kumar (Univ. de Grenoble Alpes) Google Safe Browsing June 4, 2015 6 The first blacklist Kumar (Univ. de Grenoble Alpes) Google Safe Browsing June 4, 2015 7 72 demons in the catalog Agares Aim Alloces Amdusias Amon Amy Andras Andrealphus Andromalius .. . Kumar (Univ. de Grenoble Alpes) Google Safe Browsing June 4, 2015 8 Identifying demon Problem : Is a hand book. How to make it a pocket book ? Solution : Lossy compression. Ag Ai Al Am An .. . From 72 names to 50 prefixes (30% compression). From 518 characters to 100 (80% compression). Kumar (Univ. de Grenoble Alpes) Google Safe Browsing June 4, 2015 9 False positives Hollande → Ho is not in the pocket book. Hollande isn’t a demon. Valls → Va is in the pocket book. But Valls isn’t in the complete catalog. ⇒ false positive ! If a prefix is in the compressed list : I I I Inconclusive : requires a verification from the handbook For Va, we would have : Valefar, Vapula et Vassago. Check among the full words. Solution is interesting if false positives are small in number. Kumar (Univ. de Grenoble Alpes) Google Safe Browsing June 4, 2015 10 Google Safe Browsing (GSB) API v3 The local lookups are done over these files : List name goog-malware-shavar googpub-phish-shavar goog-regtest-shavar goog-whitedomain-shavar Description malware phishing test file unused #prefixes 317,807 312,621 29,667 1 Nearly ≈ 650000 entries overall. We are not working on URLs themselves but on their digests. We only use the first 4 bytes of SHA-256 digest. Prefix32(SHA256(www.example.com/))=0xd59cc9d3 Kumar (Univ. de Grenoble Alpes) Google Safe Browsing June 4, 2015 11 GSB API v3 Found prefixes ? Start client no Nonmalicious URL no yes Update needed ? no URL Canonicalize and compute digests yes Found digest ? yes Update local database Kumar (Univ. de Grenoble Alpes) Get full digests Malicious URL Google Safe Browsing June 4, 2015 12 Yandex Safe Browsing (YSB) Google’s Evil Twin Kumar (Univ. de Grenoble Alpes) Google Safe Browsing June 4, 2015 13 Yandex Safe Browsing API List name goog-malware-shavar goog-mobile-only-malware-shavar goog-phish-shavar ydx-adult-shavar ydx-adult-testing-shavar ydx-imgs-shavar ydx-malware-shavar ydx-mitb-masks-shavar ydx-mobile-only-malware-shavar ydx-phish-shavar ydx-porno-hosts-top-shavar ydx-sms-fraud-shavar ydx-test-shavar ydx-yellow-shavar ydx-yellow-testing-shavar ydx-badcrxids-digestvar ydx-badbin-digestvar ydx-mitb-uids ydx-badcrxids-testing-digestvar Kumar (Univ. de Grenoble Alpes) Description malware mobile malware phishing adult website test file malicious image malware man-in-the-browser malware phishing pornography sms fraud test file shocking content test file .crx file ids malicious binary man-in-the-browser test file Google Safe Browsing #prefixes 283,211 2,107 31,593 434 535 0 283,211 87 2,107 31,593 99,990 10,609 0 209 370 * * * * June 4, 2015 14 Why 32-bit prefixes ? Optmization SHA-256 prefix (bits) 32 64 80 128 256 Kumar (Univ. de Grenoble Alpes) Raw data (MB) 2. 5 5.1 6.4 10.2 20.3 Data structure (MB) size Compr. 1.3 3.9 5.1 8.9 19.1 Google Safe Browsing 1.9 1.3 1.2 1.1 1 June 4, 2015 15 Why 32-bit prefixes ? Privacy Year 2008 2012 2013 ` (bits) 16 32 64 96 # unique URLs (Google) 1 Billion 30 Billion 60 Billion M for URLs 2008 2012 2013 228 228 229 443 7541 14757 2 2 2 1 1 1 Kumar (Univ. de Grenoble Alpes) Google Safe Browsing # of domains 177 Million 252 Million 271 Million M for domain 2008 2012 2013 253 363 388 2 3 3 1 1 1 1 1 1 June 4, 2015 16 Outline 1 Google Safe Browsing 2 Privacy 3 Security 4 Conclusion Kumar (Univ. de Grenoble Alpes) Google Safe Browsing June 4, 2015 17 Highlights Google Chrome Privacy Notice on Safe Browsing. “Google cannot determine the real URL from this information.” (to be read prefixes) This statement is re-iterated in GSB usage in Mozilla Firefox. Conclusion : GSB must provide the same level of privacy than a private information retrieval algorithm. Really ? Kumar (Univ. de Grenoble Alpes) Google Safe Browsing June 4, 2015 18 Re-identification URL https://persyval-lab.org/content/edition/ https://persyval-lab.org/content/ https://persyval-lab.org/ 32-bit prefix 0x2929f0b1 0xc99584e3 0x192af851 Problem with the false-positives : 1 match : 0x2929f0b1 → no privacy issue. 2 matches : 0xc99584e3 and 0x192af851 → Problem. Sending several prefixes is indeed the case. More problem with temporal correlation : URL https://persyval-lab.org/phd/appel-2015/depot/ https://persyval-lab.org/phd/appel-2015/ Kumar (Univ. de Grenoble Alpes) Google Safe Browsing 32-bit prefix 0x6e2abf0a 0x79f13238 June 4, 2015 19 Interesting URLs URL http ://fr.xhamster.com/user/video http ://nl.xhamster.com/user/video http ://m.mofos.com/user/login http ://m.mofos.com/user/logout http ://mobile.teenslovehugecocks.com/user/join http ://fr.xhamster.com/user/kmille http ://de.xhamster.com/user/video http ://nl.xhamster.com/user/ppbbg http ://nl.xhamster.com/user/photo Kumar (Univ. de Grenoble Alpes) matching decomposition fr.xhamster.com/ xhamster.com/ nl.xhamster.com/ xhamster.com/ m.mofos.com/ mofos.com/ m.mofos.com/ mofos.com/ mobile.teenslovehugecocks.com/ teenslovehugecocks.com/ fr.xhamster.com/ xhamster.com/ de.xhamster.com/ xhamster.com/ nl.xhamster.com/ xhamster.com/ nl.xhamster.com/ xhamster.com/ Google Safe Browsing prefix 0xe4fdd86c 0x3074e021 0xa95055ff 0x3074e021 0x6e961650 0x00354501 0x6e961650 0x00354501 0x585667a5 0x92824b5c 0xe4fdd86c 0x3074e021 0x0215bac9 0x3074e021 0xa95055ff 0x3074e021 0xa95055ff 0x3074e021 June 4, 2015 20 Am I paranoid ? Twitter Bitly Facebook Firefox Chrome Mail.ru GSB Chromium Opera WOT Orbitum YSB Maxthon TRUSTe DuckDuckGo Yandex.Browser Safari 65% of the browsers in use. Major social networks. Activated by default in some releases of Tor Browsers. Kumar (Univ. de Grenoble Alpes) Google Safe Browsing June 4, 2015 21 Orphans Google Yandex list name goog-malware-shavar googpub-phish-shavar ydx-malware-shavar ydx-adult-shavar ydx-mobile-only-malware-shavar ydx-phish-shavar ydx-mitb-masks-shavar ydx-porno-hosts-top-shavar ydx-sms-fraud-shavar ydx-yellow-shavar Kumar (Univ. de Grenoble Alpes) #full hash per 0 1 0.9% 99% 0.9% 99% 1.5% 98% 43% 57% 6% 94% 99% 1% 100% 0 1% 99% 95% 5% 100% 0 Google Safe Browsing prefix 2 0.1% 0.1% 0.5% 0 0 0 0 0 0 0 Total 317,807 312,621 283,211 434 2,107 31,593 87 99,990 10,609 209 June 4, 2015 22 Popular orphans Google Yandex list name goog-malware-shavar googpub-phish-shavar ydx-malware-shavar ydx-adult-shavar ydx-mobile-only-malware-shavar ydx-phish-shavar ydx-mitb-masks-shavar ydx-porno-hosts-top-shavar ydx-sms-fraud-shavar ydx-yellow-shavar Kumar (Univ. de Grenoble Alpes) #Coll. with TopAlexa 0 1 2 0 572 0 0 88 0 73 2,614 0 38 43 0 2 22 0 22 0 0 2 0 0 43 17,541 0 76 3 0 15 0 0 Google Safe Browsing Total 572 88 2,687 81 24 22 2 17,584 79 15 June 4, 2015 23 Conclusion Google and Yandex can track users. Mysterious files : Presence of large number of orphans. Accountability ? Private Information Retrieval is the definitive answer, but ... Kumar (Univ. de Grenoble Alpes) Google Safe Browsing June 4, 2015 24 Outline 1 Google Safe Browsing 2 Privacy 3 Security 4 Conclusion Kumar (Univ. de Grenoble Alpes) Google Safe Browsing June 4, 2015 25 Unsafe Browsing SB architecture is meaningful if false positive probability is low. Can an attacker increase the false positive probability ? I I I Increase requests towards server. Increase responses towards client. Or both. Attack impact : I I I Challenges the design rationale of the verification algorithm. Safe browsing can be potentially brought to its knees. Consumes bandwidth on client’s side. Goal is to mount a DoS attack. Kumar (Univ. de Grenoble Alpes) Google Safe Browsing June 4, 2015 26 Attack routine Step 1 : Generate false positives. Example : Hollande is frequently searched. We search for names with the same prefix : Hochart Houssin Hoareau Hocquet Horn .. . Step 2 : Transform these names into demons and include them into the Key of Solomon. Step 3 : Observe the impact. Kumar (Univ. de Grenoble Alpes) Google Safe Browsing June 4, 2015 27 Establishing the flow GSB Server Client (5) connect Database (4) update (6) send (7) update (3) transmit URLs (2) find malicious Web URLs Local dump Crawler (1) pre-images Adversary Kumar (Univ. de Grenoble Alpes) Google Safe Browsing June 4, 2015 28 Step 1 : Second pre-images Given a URL m, find m0 6= m tel que : Prefix32(SHA256(m)) = Prefix32(SHA256(m0 )) 232 brute-force computations to find such an m0 . Kumar (Univ. de Grenoble Alpes) Google Safe Browsing June 4, 2015 29 Generating Second pre-images Top 106 of Alexa 1 week of computation on 32 cores : I I Python fake-factory 0.4.2 ⇒ Human readable URLs Kumar (Univ. de Grenoble Alpes) Google Safe Browsing June 4, 2015 30 Results on TopAlexa Around 111 Million second pre-images were generated. #urls of topalexa (×103 ) 40 36 32 28 24 20 16 12 8 4 0 0 20 40 60 80 100 120 140 160 # second pre-images Kumar (Univ. de Grenoble Alpes) Google Safe Browsing June 4, 2015 31 Multiple second pre-images Prefix 0xd8b4483f 0xbbb9a6be 0x0f0eb30e 0x13041709 0xff42c50e 0xd932f4c1 # 165 163 162 161 160 160 Alexa Site http://getontheweb.com/ http://exqifm.be/ http://rustysoffroad.com/ http://meetingsfocus.com/ http://js118114.com/ http://cavenergie.nl/ Sample URLs : • http://62574314ginalittle.org/ • http://chloekub.biz/id9352871 Kumar (Univ. de Grenoble Alpes) Google Safe Browsing June 4, 2015 32 URL of Death malicious URL deadly-domain.com/tag1/ deadly-domain.com/tag1/tag2/ deadly-domain.com/tag1/tag2/tag3/ popular domain google.com facebook.com youtube.com prefix 0xd4c9d902 0x31193328 0x4dc3a769 Generate a tree of URL on the same domain. Attacker needs to purchase only one domain. Second pre-image search is relatively less parellelizable. Kumar (Univ. de Grenoble Alpes) Google Safe Browsing June 4, 2015 33 Step 2 : Inclusion Reporting to Google : I I google.com/safebrowsing/report_badware/ google.com/safebrowsing/report_phish/ Reporting to Google’s sources : I I phishtank.com stopbadware.org Google Webmaster tools. Inclusion is the most difficult part : I I Ethical reasons. Blackbox implementation on the Google side. Kumar (Univ. de Grenoble Alpes) Google Safe Browsing June 4, 2015 34 Step 3 : Consequences DoS : Increase in traffic towards SB server and its clients. Discount : 4 bytes sent, 5280 received. Amplification Worst case 8 Average case 800 Best case 1320 Bonus : browser’s cache pollution ! A prefix can only be queried every 45 min. I I Browser must conserve the list of all corresponding hashes in the cache for 45 min. Consumes memory ! No botnets required. Clever crafting of malicious URLs. Kumar (Univ. de Grenoble Alpes) Google Safe Browsing June 4, 2015 35 Outline 1 Google Safe Browsing 2 Privacy 3 Security 4 Conclusion Kumar (Univ. de Grenoble Alpes) Google Safe Browsing June 4, 2015 36 Conclusion Privacy : I I I I Safe Browsing is a useful service. But, privacy policy is incorrect. Has potential to track users. But, no strong evidence. Security : I I I Attacks challenge the fundamental design rationale. Challenge : Google servers are blackbox. White-listing ? Kumar (Univ. de Grenoble Alpes) Google Safe Browsing June 4, 2015 37 Thank you ! Questions ? Kumar (Univ. de Grenoble Alpes) Google Safe Browsing June 4, 2015 38