BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection

Guofei Gu, Roberto Perdisci, Junjie Zhang, and Wenke Lee
College of Computing, Georgia Institute of Technology
Presented by Joshua Cox
Group of compromised computers
Controlled by remote commands
Malicious activities
– DDoS attacks
– spam
– identity theft
Centralized Botnets
Botmaster sends command to designated C&C
Bots request commands from C&C
P2P Botnets
No C&C server
Botmaster sends command to any bot
Bots share commands with each other
Rishi
Detects IRC-based botnets
Monitors traffic
Suspicious nicknames
Suspicious servers
Uncommon server ports
BotSniffer
Network-based anomaly detection
All bots within a botnet will share similar traffic patterns
Works with IRC and HTTP botnets
Does not detect P2P botnets
BotHunter
Works with P2P botnets
Relies on “Infection Lifecycle Model”
What if we change the lifecycle?
BotMiner Objective
Detect groups of compromised machines that are part of a botnet
Independent of C&C communication structure and content
Minimal false positives
Resource efficient detection
BotMiner Architecture
C-plane Monitor
Who is talking to whom?
TCP and UDP traffic flows
– time, duration
– source, destination
– packet count, bytes transferred
Manageable log size
– less than 1 GB per day for 300 Mbps
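The fields listed above suggest a compact per-flow record. A minimal sketch of such a record, assuming a comma-separated log line; the field order, the port fields, and the names `FlowRecord` and `parse_flow` are illustrative assumptions, not the monitor's actual format:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    """One C-plane log entry (hypothetical layout)."""
    start_time: float  # seconds since epoch
    duration: float    # seconds
    src_ip: str
    src_port: int      # ports are an assumption; the slide only says source/destination
    dst_ip: str
    dst_port: int
    protocol: str      # "tcp" or "udp"
    packets: int
    byte_count: int


def parse_flow(line: str) -> FlowRecord:
    """Parse one comma-separated log line in the assumed field order."""
    t, d, sip, sport, dip, dport, proto, pkts, nbytes = line.strip().split(",")
    return FlowRecord(float(t), float(d), sip, int(sport), dip, int(dport),
                      proto.lower(), int(pkts), int(nbytes))
```

Storing only these summary fields, rather than packet payloads, is what keeps the log small relative to the traffic volume.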
A-plane Monitor
Who is doing what?
Detects malicious activities
– binary downloading
– exploit attempts
Snort with custom plugins
C-plane Clustering
Which machines have similar communication patterns?
C-plane monitor logs → cluster reports
Basic Filtering
Remove internal flows
Remove one-way flows
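The two basic filters can be sketched as follows. This assumes each flow carries packet counts for both directions and that the monitored network is a single prefix; `INTERNAL_NET`, `Flow`, and `basic_filter` are hypothetical names:

```python
import ipaddress
from typing import NamedTuple

# Assumed internal address range; a real deployment would list its own prefixes.
INTERNAL_NET = ipaddress.ip_network("10.0.0.0/8")


class Flow(NamedTuple):
    src: str
    dst: str
    pkts_out: int  # packets from src to dst
    pkts_in: int   # packets from dst back to src


def is_internal(ip: str) -> bool:
    return ipaddress.ip_address(ip) in INTERNAL_NET


def basic_filter(flows):
    """Drop flows between two internal hosts and flows with traffic in only one direction."""
    kept = []
    for f in flows:
        if is_internal(f.src) and is_internal(f.dst):
            continue  # internal flow
        if f.pkts_out == 0 or f.pkts_in == 0:
            continue  # one-way flow
        kept.append(f)
    return kept
```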
White Listing
Remove flows to popular destinations
– Google, Yahoo, etc.
C-flow: all traffic flows over a period of time that share the same source, destination, and protocol
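Building C-flows from this definition amounts to grouping flows by a key. A minimal sketch, assuming flows are dictionaries with `src`, `dst`, and `proto` keys (hypothetical names) and that all flows passed in fall within the chosen time period:

```python
from collections import defaultdict


def build_cflows(flows):
    """Group flows that share the same (source, destination, protocol)."""
    cflows = defaultdict(list)
    for f in flows:
        cflows[(f["src"], f["dst"], f["proto"])].append(f)
    return dict(cflows)
```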
Feature Extraction
– flows per hour
– bytes per packet
– packets per flow
– bytes per second
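The four features can be computed per C-flow roughly as below. This sketch reduces each feature to a single average, which is a simplification chosen for illustration; the dictionary keys, the `window_hours` parameter, and the function names are assumptions:

```python
def _mean(values):
    """Average of an iterable, or 0.0 if it is empty."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def cflow_features(flows, window_hours=24.0):
    """Summarize one C-flow (a list of flow dicts) by four averaged features."""
    return {
        "flows_per_hour": len(flows) / window_hours,
        "packets_per_flow": _mean(f["packets"] for f in flows),
        "bytes_per_packet": _mean(f["bytes"] / f["packets"]
                                  for f in flows if f["packets"] > 0),
        "bytes_per_second": _mean(f["bytes"] / f["duration"]
                                  for f in flows if f["duration"] > 0),
    }
```

Flows with zero packets or zero duration are skipped in the ratio features to avoid division by zero.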
Two-step Clustering
Coarse-grain and Refined clustering
X-means clustering algorithm
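The two-step idea, a cheap coarse pass followed by refinement inside each coarse cluster, can be sketched as below. Note the substitution: this uses a plain k-means with a fixed number of clusters as a stand-in, whereas X-means estimates the number of clusters itself. The choice of "first `coarse_dims` features" as the reduced feature set and all function names are assumptions:

```python
import math


def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point initialization; returns labels."""
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from all centroids chosen so far.
        centroids.append(tuple(max(
            points, key=lambda p: min(math.dist(p, c) for c in centroids))))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def two_step_cluster(vectors, coarse_dims=2, k_coarse=2, k_fine=2):
    """Coarse clustering on a reduced feature set, then refinement on all features."""
    coarse = kmeans([v[:coarse_dims] for v in vectors], k_coarse)
    result = [None] * len(vectors)
    for c in set(coarse):
        idx = [i for i, lab in enumerate(coarse) if lab == c]
        fine = kmeans([vectors[i] for i in idx], min(k_fine, len(idx)))
        for i, f in zip(idx, fine):
            result[i] = (c, f)  # (coarse label, refined label)
    return result
```

The coarse pass keeps the expensive full-feature clustering confined to small groups, which is the resource argument behind a two-step design.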
A-plane Clustering
Which machines have similar activity patterns?
A-plane monitor logs → cluster reports
Activity Type Clustering
– spam
– binary download
– exploit
Activity Feature Clustering
– target subnet
– similar binary
– spam content
– exploit type
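The two levels of A-plane clustering, first by activity type and then by an activity-specific feature, can be sketched as nested grouping. This version treats two feature values as similar only when they are equal, which is a deliberate simplification; the event tuple layout and `cluster_activities` are hypothetical:

```python
from collections import defaultdict


def cluster_activities(events):
    """Group hosts by activity type, then by feature value.

    events: iterable of (host, activity_type, feature) tuples.
    Returns {activity_type: {feature: set_of_hosts}}.
    """
    by_type = defaultdict(lambda: defaultdict(set))
    for host, activity, feature in events:
        by_type[activity][feature].add(host)
    return {activity: dict(groups) for activity, groups in by_type.items()}
```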
Cross-plane Correlation
Which machines are in a botnet?
Botnet score
– Number of clusters
– Score of other hosts in cluster
– Activity weighting
Which bots are in the same botnet?
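One way to picture a botnet score is as weighted overlap between the clusters a host belongs to: a host scores higher when it appears in several A-plane clusters of different activity types and when those clusters overlap with its C-plane clusters. The sketch below uses Jaccard overlap for this; the exact form, the `weights` table, and the function names are assumptions for illustration, not necessarily the scoring used by BotMiner:

```python
def jaccard(a, b):
    """Overlap of two host sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def botnet_score(host, a_clusters, c_clusters, weights):
    """Score one host from its cluster memberships.

    a_clusters: list of (activity_type, set_of_hosts)
    c_clusters: list of sets of hosts
    weights:    {activity_type: weight}, hypothetical activity weighting
    """
    mine_a = [(t, s) for t, s in a_clusters if host in s]
    mine_c = [s for s in c_clusters if host in s]
    score = 0.0
    for i, (ti, ai) in enumerate(mine_a):
        # Pairs of A-clusters with different activity types.
        for tj, aj in mine_a[i + 1:]:
            if ti != tj:
                score += weights[ti] * weights[tj] * jaccard(ai, aj)
        # Each A-cluster against each C-cluster containing the host.
        for ck in mine_c:
            score += weights[ti] * jaccard(ai, ck)
    return score
```

A host that appears in no A-plane cluster scores zero under this form, which matches the idea that communication similarity alone is not treated as evidence of infection.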
Test Case
Georgia Tech campus network
Ran monitors for 10 days
up to 300 Mbps
wide variety of protocols
Obtained traces for 8 botnets
IRC, HTTP, and P2P
Botnets Used
Overlaid malicious traffic on normal traffic
Mapped IPs from random hosts to bots
Filtering Results
Internal/External filter reduces data by 90%
10 billion packets reduced to 50k C-flows
Detection Results
All botnets detected
99.6% bot detection
0.3% false positive rate
C-plane cluster evasion
– Traffic randomization and mimicry
A-plane cluster evasion
– Individual or group commands
Cross-plane analysis evasion
– Delay bot tasks
Guofei Gu, Roberto Perdisci, Junjie Zhang, and Wenke Lee. BotMiner: Clustering Analysis of
Network Traffic for Protocol- and Structure-Independent Botnet Detection. 17th USENIX Security
Symposium (Security'08), San Jose, CA, 2008.
D. Pelleg and A. W. Moore. X-means: Extending k-means with efficient estimation of the number of
clusters. In Proceedings of the Seventeenth International Conference on Machine Learning
(ICML’00), pages 727–734, San Francisco, CA, USA, 2000. Morgan Kaufmann Publishers Inc.
J. Goebel and T. Holz. Rishi: Identify bot contaminated hosts by IRC nickname evaluation. In
Proceedings of USENIX HotBots’07, 2007.
G. Gu, P. Porras, V. Yegneswaran, M. Fong, and W. Lee. BotHunter: Detecting malware infection
through ids-driven dialog correlation. In Proceedings of the 16th USENIX Security Symposium
(Security’07), 2007.
G. Gu, J. Zhang, and W. Lee. BotSniffer: Detecting botnet command and control channels in
network traffic. In Proceedings of the 15th Annual Network and Distributed System Security
Symposium (NDSS’08), 2008.