CAMP: Content-Agnostic Malware Protection

20th Annual Network & Distributed System Security Symposium
(NDSS 2013)
CAMP: CONTENT-AGNOSTIC MALWARE
PROTECTION
Niels Provos, Moheeb Abu Rajab, Lucas Ballard,
Noe Lutz and Panayiotis Mavrommatis
Google Inc.
左昌國
2013/04/01 Seminar @ ADLab, NCU-CSIE
• X-agnostic
• Without knowledge of X
• Content-agnostic malware protection
• The protection operates without knowledge of the malware's content
Outline
• Introduction
• Related Work
• System Architecture
• Reputation System
• Evaluation
• Conclusion
Introduction
• Malware distribution through web browsers
• Drive-by Downloads
• Not the focus of this paper
• Social Engineering
• Fake Anti-Virus
• The defense?
• Blacklists / Whitelists
• Signature-based solution
• CAMP
• Reputation system
• Low false positive rate
Related Work
• Content-based Detection
• Anti-virus software
• CloudAV
• Blacklist-based Protection
• Google Safe Browsing API
• McAfee Site Advisor
• Symantec Safe Web
• Whitelist-based Schemes
• Bit9
• CoreTrace
• Reputation-based Detection
• SNARE
• Notos and EXPOSURE
• Microsoft SmartScreen
System Architecture
Client
Server
System Architecture – Binary Analysis
• Producing labels (benign or malicious) for training purposes
• Classifying binaries based on static and dynamic analysis
• The labels are also used to decide thresholds
• Goal: a low false positive rate
System Architecture – Client
• Performing local checks before asking the server for a decision
1. In blacklists?
Google Safe Browsing API
2. Potentially harmful?
e.g. DMG files in Mac OS X
3. In whitelists?
Trusted domains and trusted signing certificates
• If the local checks give no result
• Extracting features from the downloaded binary
• Final download URL / IP address
• Referrer URL / (corresponding) IP address
• Size / hash
• Signature
• Sending the features to the server
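The order of the client-side checks can be sketched roughly as follows. This is only an illustration, not Chrome's implementation: the function name `local_verdict`, the set-based lists, the example entries, and the rule that a file type outside the "potentially harmful" set is treated as benign are all assumptions made for the sketch.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical stand-ins for the real lists; every entry is made up.
BLACKLISTED_URLS = {"http://bad.example/setup.exe"}
POTENTIALLY_HARMFUL_EXTENSIONS = {".exe", ".msi", ".dmg"}
TRUSTED_DOMAINS = {"trusted.example"}


def local_verdict(url: str) -> Optional[str]:
    """Return a local decision, or None if the server has to be asked."""
    # 1. Blacklist check.
    if url in BLACKLISTED_URLS:
        return "malicious"
    parsed = urlparse(url)
    path = parsed.path.lower()
    # 2. File types that are not potentially harmful are not checked further
    #    (assumed behaviour for this sketch).
    if not any(path.endswith(ext) for ext in POTENTIALLY_HARMFUL_EXTENSIONS):
        return "benign"
    # 3. Whitelist check (trusted signing certificates are omitted here).
    if parsed.hostname in TRUSTED_DOMAINS:
        return "benign"
    # No local result: the client extracts features and queries the server.
    return None
```

Only downloads for which `local_verdict` returns `None` would lead to feature extraction and a request to the server.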
• The returned decision
• ~70% of all downloads are considered benign due to
policy or matching client-side whitelists
• (on server side) Regularly analyzing binaries hosted on
the trusted domains or signed by trusted signers
System Architecture – Server
• The server receives the client request and renders a
reputation verdict
• The server uses the information to update its reputation
data
• BigTable and MapReduce
System Architecture – Frontend and Data
Storage
• Frontend
• RPC to reputation system
• URL as index?
• Popular URLs
→ timestamp (of the request to the URL): reverse-ordered hexadecimal string
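One possible reading of the "reverse-ordered hexadecimal string" is sketched below. It assumes that "reverse-ordered" means reversing the digit order of the hexadecimal timestamp, so that the fast-changing low-order digits lead the key and requests for a popular URL spread across the key space. The slide does not spell this out, and the function name, the microsecond unit, and the padding width are hypothetical.

```python
def reversed_hex_timestamp(timestamp_us: int, width: int = 16) -> str:
    """Zero-padded hexadecimal timestamp with its digit order reversed."""
    return format(timestamp_us, f"0{width}x")[::-1]


# Two consecutive timestamps differ in the first character of the key,
# not in the last one.
print(reversed_hex_timestamp(0x1234, width=8))
print(reversed_hex_timestamp(0x1235, width=8))
```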
System Architecture – Spam Filtering
• Velocity controls on the user IP address
• The spam filter is also used to fetch binaries from the web that have not yet been analyzed by the binary classifier
• Filter: only binaries that exhibit sufficient diversity of context are fetched
• The analysis may complete long after a reputation decision was made
System Architecture – Aggregator
• Aggregate
• Forming the reputation data
• 3-dimensional index
• Source (where the observation comes from)
• Features
• Categories: reputation / urls / hash
• Examples:
• client | site:foo.com | reputation
• analysis | ip:1.2.3.4/24 | urls
• Value
• (a, b)
• a: the number of interesting observations
• b: the total number of observations
• Example values: (6, 10), (0, 3)
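The aggregate layout can be illustrated with a small in-memory sketch. The real system is described as using BigTable and MapReduce; the class `Aggregator` and its methods are hypothetical and only mirror the three-part index and the (a, b) value from the slide.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Key = Tuple[str, str, str]  # (source, feature, category)


class Aggregator:
    """Keeps (a, b) counters per 3-dimensional index entry."""

    def __init__(self) -> None:
        # Each value is [a, b]: interesting observations, total observations.
        self._counts: Dict[Key, List[int]] = defaultdict(lambda: [0, 0])

    def observe(self, source: str, feature: str, category: str,
                interesting: bool) -> None:
        entry = self._counts[(source, feature, category)]
        if interesting:
            entry[0] += 1
        entry[1] += 1

    def get(self, source: str, feature: str, category: str) -> Tuple[int, int]:
        # .get() does not create an entry for unknown keys.
        a, b = self._counts.get((source, feature, category), (0, 0))
        return a, b


agg = Aggregator()
for i in range(10):
    # Assume 6 of 10 client requests for foo.com were "interesting".
    agg.observe("client", "site:foo.com", "reputation", interesting=i < 6)
print(agg.get("client", "site:foo.com", "reputation"))
```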
Reputation System
• Feature Extraction
• IP address: single or netblock
• URL: direct download or host/domain/site
• Signature / hash
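A rough sketch of how one request could be expanded into several feature strings, using only the Python standard library. The `ip:`, `url:` and `host:` prefixes and the /24 netblock size are assumptions modelled on the index examples in this deck; deriving the domain or site from a host needs a public-suffix list and is left out.

```python
import ipaddress
from typing import List
from urllib.parse import urlparse


def ip_features(ip: str, prefix_len: int = 24) -> List[str]:
    """The single address plus the netblock that contains it."""
    net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    return [f"ip:{ip}", f"ip:{net}"]


def url_features(url: str) -> List[str]:
    """The direct download URL plus the host it is served from."""
    host = urlparse(url).hostname or ""
    return [f"url:{url}", f"host:{host}"]


print(ip_features("1.2.3.4"))
print(url_features("http://dl.foo.com/a.exe"))
```

Note that `ipaddress` normalizes the netblock to its network address, so the block containing 1.2.3.4 is written as 1.2.3.0/24 here.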
Reputation System – Decision
• $f(p, n, t) := \frac{p}{n} \ge t$
• $f(p, n, t) := n \ge t$
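The threshold functions turn an aggregate into a boolean input. The sketch below assumes that p is the number of interesting observations and n the total number of observations of an aggregate, and that the boolean inputs feed AND gates whose outputs are OR-ed together. The deck only mentions AND gates, so the OR stage, the function names, and the handling of n = 0 are assumptions.

```python
from typing import Iterable, List


def ratio_at_least(p: int, n: int, t: float) -> bool:
    """f(p, n, t) := p/n >= t; an empty aggregate (n == 0) yields False."""
    return n > 0 and p / n >= t


def total_at_least(p: int, n: int, t: float) -> bool:
    """f(p, n, t) := n >= t."""
    return n >= t


def decide(and_gates: Iterable[List[bool]]) -> bool:
    """True if every input of at least one (non-empty) AND gate is True."""
    return any(bool(gate) and all(gate) for gate in and_gates)


# Example: one gate requires a high malicious ratio AND enough observations.
aggregate = (6, 10)  # (p, n), hypothetical values
gate = [ratio_at_least(*aggregate, t=0.5), total_at_least(*aggregate, t=10)]
print(decide([gate]))
```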
• Threshold
• Thresholds are chosen according to the precision and recall for
each AND gate
• Precision and recall are determined from a labeled training set
• Training set: matching (hash from requests) with (hash from binary
analysis)
• Binary analysis provides the label (benign or malicious)
• Request provides the features
• 4000 benign requests / 1000 malicious requests
• Precision and recall
• http://en.wikipedia.org/wiki/Precision_and_recall
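As an illustration of how a threshold could be scored against a labeled training set, the sketch below computes precision and recall for a single ratio threshold. The sample format `(p, n, is_malicious)` and the function name are hypothetical, and the five samples are made up for the example; the actual selection procedure is not given in the slides.

```python
from typing import Iterable, Tuple

Sample = Tuple[int, int, bool]  # (p, n, is_malicious)


def precision_recall_at(samples: Iterable[Sample], t: float) -> Tuple[float, float]:
    """Predict 'malicious' when p/n >= t and score against the labels."""
    tp = fp = fn = 0
    for p, n, is_malicious in samples:
        predicted = n > 0 and p / n >= t
        if predicted and is_malicious:
            tp += 1
        elif predicted and not is_malicious:
            fp += 1
        elif not predicted and is_malicious:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


samples = [(9, 10, True), (8, 10, True), (2, 10, True),
           (7, 10, False), (1, 10, False)]
for t in (0.5, 0.75):
    print(t, precision_recall_at(samples, t))
```

Raising the threshold in this toy data removes the one false positive without losing any true positive, which is the kind of trade-off a threshold search would look for.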
Evaluation
• Google Chrome
• Targeting Windows executables
• Accuracy of Binary Analysis
• Compared against VirusTotal
• 2200 samples selected
• 1100 were labeled clean by binary analysis component
• 1100 were labeled malicious
• Submitting to VirusTotal and waiting for 10 days
• 99% of the binaries labeled malicious were flagged by 20%+ of the AV engines on VirusTotal
• 12% of the binaries labeled clean were flagged by 20%+ of the AV engines on VirusTotal
Evaluation – Accuracy of CAMP
• Feb. 2012 ~ July 2012
• Total 200 million users
• Each day, 8~10 million requests
• 200~300 thousand labeled as malicious
• Total 3.2 billion aggregates
• $tpr := \frac{tp}{tp+fn}$, $fpr := \frac{fp}{fp+tn}$, $tnr := \frac{tn}{tn+fp}$, $fnr := \frac{fn}{tp+fn}$
• Overall accuracy: $\frac{tp+tn}{tp+tn+fp+fn}$
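The rate definitions can be checked with a few lines of code. The counts used in the example are invented for illustration and are not results from the paper; the sketch also does not guard against zero denominators.

```python
from typing import Dict


def rates(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Rates computed from true/false positive and negative counts."""
    return {
        "tpr": tp / (tp + fn),
        "fpr": fp / (fp + tn),
        "tnr": tn / (tn + fp),
        "fnr": fn / (tp + fn),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


# Made-up counts: tpr + fnr and fpr + tnr each sum to 1.
print(rates(tp=8, fp=1, tn=9, fn=2))
```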
Evaluation – Comparison to other
systems
• A random sample of 10,000 binaries labeled as benign
• 8,400 binaries labeled as malicious
Evaluation – Case Study
Conclusion
• This paper presents CAMP, a content-agnostic malware protection system
• The paper reports a large-scale evaluation and shows that the detection approach is both accurate and fast (requests are processed in less than 130 ms)