Introduction to the Goto Laboratory
Waseda University, Faculty of Science and Engineering, School of Fundamental Science and Engineering, Department of Computer Science and Engineering
Shigeki Goto

1. Drive-by-Download Attack
Beyond simple URL and referrer tests
[Figure: a vulnerable browser (with plugins) visits a landing site and is redirected through hopping sites … to an exploit site and a malware site.]
© Daiki Chiba, Waseda Univ.

Analysis Method
• Analyze communication data with three tests
  – URL test: check URLs and the "Location" field in the HTTP response
  – Referrer test: check the "Referer" field in the HTTP request
  – Host test: check the behavior of the host (IP address and domain name) within a session
© Y. Takata

Proposed Method, Step 1: URL Test
• 1. URLs in the HTTP data of an HTTP response
  – A GET request for that URL occurs
• 2. The URL in the "Location" field
  – The HTTP status code is "3xx"

GET request header:
    GET /foo/index.php HTTP/1.1    (Request Version: HTTP/1.1)
    Accept: */*
    Referer: http://hoge.com/index.html
    Accept-Language: en-us
    Connection: Keep-Alive
    Host: hoge2.com
Contents URL: http://hoge2.com/foo/index.php

HTTP response header:
    HTTP/1.1 200 OK    (HTTP status code)
    Date: Mon, 08 Mar 2010 10:43:21 GMT
    Server: Apache/2.2.3 (CentOS)
    Content-Length: 27919
    Connection: Keep-Alive
    Content-Type: text/html
Extracted URLs (found by searching the response contents):
    ['http://hoge2.com/test.index', 'http://hoge3.com/attack.php']

HTTP status code
    code   status
    1xx    Informational
    2xx    Successful
    3xx    Redirection
    4xx    Client error
    5xx    Server error
RFC 2616, Hypertext Transfer Protocol -- HTTP/1.1, June 1999.

Proposed Method, Step 2: Referrer Test
• The URL set in the "Referer" field
  – A GET request for that URL occurs
* Referrer: the address of the webpage that links to the requested resource. By checking the referrer, the new webpage can see where the request originated.
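As a rough illustration of the URL test and the Referrer test described above, the sketch below links GET requests into redirection edges. It is a minimal sketch under stated assumptions, not the authors' implementation: the `Exchange` record, the `link_redirections` function, the regular expression, and the extra `hoge4.com` request are all hypothetical, and real analysis would work on captured traffic rather than hand-built records.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed, simplified pattern for absolute http(s) URLs inside response contents.
URL_PATTERN = re.compile(r"""https?://[^\s'"<>]+""")


@dataclass
class Exchange:
    """One GET request and its HTTP response (hypothetical record layout)."""
    host: str                       # "Host" request header
    path: str                       # path from the request line
    referer: Optional[str] = None   # "Referer" request header, if present
    status: int = 200               # HTTP status code of the response
    location: Optional[str] = None  # "Location" response header, if present
    body: str = ""                  # HTTP data (contents) of the response

    @property
    def url(self) -> str:
        # File URL built from the Host header and the request path.
        return f"http://{self.host}{self.path}"


def link_redirections(exchanges: List[Exchange]) -> List[Tuple[str, str, str]]:
    """Return (parent URL, child URL, test name) edges in capture order."""
    edges = []
    for i, parent in enumerate(exchanges):
        # URL test, part 1: URLs found in the response contents.
        candidates = set(URL_PATTERN.findall(parent.body))
        # URL test, part 2: the Location field of a 3xx response.
        if 300 <= parent.status < 400 and parent.location:
            candidates.add(parent.location)
        for child in exchanges[i + 1:]:
            if child.url in candidates:
                edges.append((parent.url, child.url, "url"))
            elif child.referer == parent.url:
                # Referrer test: the later request names the parent as its referrer.
                edges.append((parent.url, child.url, "referrer"))
    return edges


exchanges = [
    Exchange("hoge.com", "/index.html",
             body="<a href='http://hoge2.com/foo/index.php'>"),
    Exchange("hoge2.com", "/foo/index.php",
             referer="http://hoge.com/index.html",
             status=302, location="http://hoge3.com/attack.php"),
    Exchange("hoge3.com", "/attack.php"),
    Exchange("hoge4.com", "/x.js", referer="http://hoge3.com/attack.php"),
]
edges = link_redirections(exchanges)
for edge in edges:
    print(edge)
```

In this toy capture the first two edges come from the URL test (a URL in the contents, then a Location field), and the last one is found only through the Referer field.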
GET request header:
    GET /foo/index.php HTTP/1.1    (Request Version: HTTP/1.1)
    Accept: */*
    Referer: http://hoge.com/index.html
    Accept-Language: en-us
    Connection: Keep-Alive
    Host: hoge2.com

HTTP response header:
    HTTP/1.1 200 OK
    Date: Mon, 08 Mar 2010 10:43:21 GMT
    Server: Apache/2.2.3 (CentOS)
    Content-Length: 27919
    Connection: Keep-Alive
    Content-Type: text/html

Create the file URL from the GET request header for this HTTP response, and search for a match with the Referer.
File URL: http://hoge2.com/foo/index.php
APAN Network Research Workshop

Proposed Method, Step 3: Host Test
• Associate redirections based on the hosts that appear
• "Host" means a server: its IP address and domain name
[Figure: packet timelines for Server 1 (IP: x.x.x.x, test.com) and Server 2 (IP: y.y.y.y, hoge.co.jp), from session start to session end. Legend: D = DNS, S = 3-way handshake (SYN), G = GET, H = HTTP, F = FIN/RST.]
  – Session from an entrance Web site (identified by the Referrer test and URL test)
  – Sessions not from an entrance Web site (cannot be identified by the Referrer test and URL test)
  – Session identified by the Host test (only sessions whose SYN starts nearby)
  – Not the same redirection

Host test is effective
• URL test: GET requests for URLs in the HTTP data and the Location field
• Referrer test: a GET request for the URL set in the Referer field
• Host test: associate redirections based on a host (IP address and domain name)
• Attackers hide URLs by obfuscation and redirect users without a Referer field.
  – e.g. the result of obfuscating the program alert("Hello, World!!");

Table. Tests to acquire malicious distribution URLs
    Date         Total of malicious URLs   URL Test     Referrer Test   Host Test
    March 8th    202                       12 (5.9%)    13 (6.4%)       177 (87.6%)
    March 9th    205                       10 (4.9%)    13 (6.3%)       182 (88.8%)
    March 11th   158                       10 (6.3%)    6 (3.8%)        142 (89.9%)
© Y. Takata

2.
Priority crawling (1): IP address
• Skewed IP addresses
  – Hilbert curve (■ blue: benign, ■ red: malicious)
  – ExOctet method by D. Chiba
• The IP address is an important parameter for telling whether a URL is malicious or benign.
Hilbert curve image: http://en.wikipedia.org/wiki/File:Hilbert_curve.svg
© D. Chiba

Priority crawling (2): new domain names
• "Whois" information
• When was the domain name registered?
• Malicious domain names tend to be new.
• Parameter W = dn – d (dn: the current date, d: the date of registration)
[Figure: CDF (Cumulative Distribution Function) of the number of years elapsed since registration. Red: malicious, Blue: benign]
© Daiki Chiba, Waseda Univ.

Priority crawling (3): FQDN length
• Length of the FQDN character string
• Malicious FQDNs tend to be long.
• Parameter L = (length of the FQDN string)
• Example:
  – www.waseda.jp (L=13)
  – 9d2c76904ddcb022c5e1bb604b9b037g.example.com (L=44)
[Figure: CCDF (Complementary Cumulative Distribution Function) of the length of the FQDN character string. Red: malicious, Blue: benign]

Priority crawling (4): FQDN entropy
• Entropy of the FQDN character string
• Malicious domain names are composed randomly, so the entropy of a malicious FQDN is large.
• Parameter: FQDN $X = \{x_1, x_2, \ldots, x_n\}$, $p(x_i)$ = probability of $x_i$
  $E = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)$
• Example:
  – www.waseda.jp (E=2.78)
  – 9d2c76904ddcb022c5e1bb604b9b037g.example.com (E=4.18)
[Figure: CCDF (Complementary Cumulative Distribution Function) of the entropy of FQDN character strings. Red: malicious, Blue: benign]

Information entropy
• Uncertainty: $u = \log_2 8 = 3$, that is, $u = -\log_2 \frac{1}{8}$
• Average uncertainty: $\bar{u}$
Image: http://item.rakuten.co.jp/headwear/780897

Probability P(xi)
• Character string = "abcabcaabaacaabc"
• p(a) = 8/16, p(b) = 4/16, p(c) = 4/16
  $E = -\{(8/16)\log_2(8/16) + 2 \cdot (4/16)\log_2(4/16)\} = 1.5$
• The natural frequency of occurrence of alphabet characters has been investigated.
[Table: occurrences of alphabet characters (columns: occurrences, percentage, total)]
http://www.nii.ac.jp/userdata/shimin/documents/H23/111102_5thlec.pdf
http://www7.plala.or.jp/dvorakjp/hinshutu.htm
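The length parameter L and the entropy parameter E from Priority crawling (3) and (4) can be computed with a few lines of Python. This is a minimal sketch: the function names `fqdn_length` and `fqdn_entropy` are hypothetical, and the base-2 logarithm is an assumption taken from the worked example above, where the string "abcabcaabaacaabc" gives E = 1.5.

```python
import math
from collections import Counter


def fqdn_length(fqdn: str) -> int:
    """Parameter L: number of characters in the FQDN string."""
    return len(fqdn)


def fqdn_entropy(fqdn: str) -> float:
    """Parameter E: -sum of p(x) * log2 p(x) over the distinct characters x."""
    counts = Counter(fqdn)
    total = len(fqdn)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


for name in ("www.waseda.jp", "abcabcaabaacaabc"):
    print(name, fqdn_length(name), round(fqdn_entropy(name), 2))
# prints:
# www.waseda.jp 13 2.78
# abcabcaabaacaabc 16 1.5
```

Character probabilities are estimated from the FQDN itself, so a short name with many repeated characters scores low, while a long random-looking label scores high.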
Priority crawling (5): FQDN n-gram
• n-gram (n=2) of FQDN strings
• Compared for malicious and benign FQDNs
• Parameter {g-0, … , gk, … , gz9} (gk: frequency of occurrence of the n-gram string k)
• Only 2-character strings that contain a numeral or a symbol are used.
• Example: a1-a2.example.com → a1, 1-, -a, a2, 2., .e, e., .c
[Figure: top 30 n-grams (n=2) by frequency of occurrence. Red: malicious, Blue: benign]

3. NICTER or darknet
• All the hosts on the Internet have IP addresses.
• It is difficult to allocate all the IP addresses to real hosts. There is a certain number of unused or unallocated IP addresses.
• Question: if we observe an unused IP address block, can we observe any incoming traffic to those IP addresses?
  Answer: Yes.
• It is meaningful to collect a large number of IP packets that have unallocated IP addresses as their destination.
http://www.nict.go.jp/glossary/nicter.html (Japanese text)
http://www.nicter.jp/nw_public/scripts/index.php#nicter (nicterweb)
http://www.youtube.com/watch?v=jLYs52OBh_A (movie)

4. DarkPots
[Figure: DarkPots system. Components: vacancy checker, forwarder, analyzers. Labels: emulated response, list of unused IPs, mirroring, the Internet, gateway router (ACLs deployed), enterprise / campus network.]
© Akihiro Shimoda

Attack counts per source in time series on the honeypot analyzer (top 7 heavy-hitters)

The second experiment: Comparison of botnet attacks
[Figure: attack count, continuous vs. random]

The last two slides are not properly shown in the PDF. Please refer to a separate PDF file.

5.
Android malware
Android application (.apk)
  – Manifest file (AndroidManifest.xml)
  – Application programs (classes.dex)
  – Application resources

Manifest file
• A manifest file must be present in every Android application.
• It takes the form of "AndroidManifest.xml".
• This file holds essential information about the Android application; 24 kinds of information are described, for example:
  – The version number of the application
  – Intent filter
  – Required permission
  – Required API level
  – ...

Proposed method: Keyword lists (1)
• With this new method, several keyword lists are compiled for an application.
• Benign or malicious strings† in a manifest file are recorded in the keyword lists.
• Four types of keyword lists:
  (1) Permission
  (2) Intent filter (action)
  (3) Intent filter (category)
  (4) Process name
† (5) Intent filter (priority) and (6) Number of redefined permissions are represented by an integer, not a text string.

Experiment: Result
                      Correct detection (%)   Incorrect detection (%)
    Benign samples    91.4                    8.6
    Malware samples   87.5                    12.5
    Total             90.0                    10.0
• Correct detection: ratio of correctly classified samples to the total number of samples
• Incorrect detection: ratio of incorrectly classified samples to the total number of samples
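To make the four text keyword lists more concrete, the sketch below collects candidate strings from a manifest. It is only an illustration under stated assumptions, not the proposed detector: it assumes the manifest has already been decoded to plain-text XML (inside an .apk it is stored in a binary form), the sample manifest and the function name `extract_keywords` are hypothetical, and it does not classify anything as benign or malicious. The intent filter priority is collected as an integer, as the slide notes; the number of redefined permissions is left out.

```python
import xml.etree.ElementTree as ET

# XML namespace used for the android: attribute prefix in manifests.
ANDROID_NS = "http://schemas.android.com/apk/res/android"

# Hypothetical manifest, assumed to be already decoded to text XML.
SAMPLE_MANIFEST = """
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.app">
  <uses-permission android:name="android.permission.INTERNET"/>
  <uses-permission android:name="android.permission.SEND_SMS"/>
  <application android:process=":remote">
    <activity android:name=".Main">
      <intent-filter android:priority="1000">
        <action android:name="android.intent.action.MAIN"/>
        <category android:name="android.intent.category.LAUNCHER"/>
      </intent-filter>
    </activity>
  </application>
</manifest>
"""


def android_attr(elem, name):
    """Read an android:-prefixed attribute, or None if it is absent."""
    return elem.get(f"{{{ANDROID_NS}}}{name}")


def extract_keywords(manifest_xml: str) -> dict:
    """Collect strings for the four text keyword lists plus filter priorities."""
    root = ET.fromstring(manifest_xml.strip())
    keywords = {"permission": set(), "action": set(),
                "category": set(), "process": set(), "priority": []}
    for elem in root.iter("uses-permission"):
        keywords["permission"].add(android_attr(elem, "name"))
    for intent_filter in root.iter("intent-filter"):
        for action in intent_filter.findall("action"):
            keywords["action"].add(android_attr(action, "name"))
        for category in intent_filter.findall("category"):
            keywords["category"].add(android_attr(category, "name"))
        priority = android_attr(intent_filter, "priority")
        if priority is not None:
            keywords["priority"].append(int(priority))
    for elem in root.iter():
        process = android_attr(elem, "process")
        if process:
            keywords["process"].add(process)
    return keywords


keywords = extract_keywords(SAMPLE_MANIFEST)
for list_name, values in keywords.items():
    print(list_name, values)
```

Each extracted string would then be looked up in the benign and malicious keyword lists; how those lists are scored is outside what the slides specify.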