Internet Measurement Conference 2006
A MULTIFACETED APPROACH TO
UNDERSTANDING THE BOTNET
PHENOMENON
Moheeb Abu Rajab, Jay Zarfoss, Fabian Monrose,
Andreas Terzis
Computer Science Department
Johns Hopkins University
OUTLINE

Introduction
How Botnets Work
Measuring Botnets
Results and Analysis
Comments
BOTNET

Very little is known about the behavior of these distributed computing platforms.
- The paper models the botnet life cycle.
The term botnet refers to networks of infected end-hosts, called bots, that are under the control of a human operator commonly known as the botmaster.
While botnets recruit vulnerable machines using methods also used by other classes of malware, their defining characteristic is the use of command and control (C&C) channels.
BOTNET (CONT’D)

Channels
- IRC (Internet Relay Chat)
  - originally designed to form large social chat rooms
- HTTP
- P2P
While other classes of malware were mostly used to demonstrate technical prominence among hackers, botnets are used for illegal activities.
A multifaceted measurement approach to capture the behavior and impact of botnets:
- distributed malware collection (binaries)
- IRC tracking (live botnets)
- DNS cache probing
BOTNET LIFE CYCLE
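Figure: botnet life cycle. Annotations recovered from the diagram:
- Infection: remotely exploiting software vulnerabilities, or social engineering
- Shellcode fetches the actual bot binary
- The bot resolves the DNS name of the IRC server (instead of using a hard-coded IP)
- The bot joins the C&C channel (authenticate); use of this channel is the defining characteristic
- The botmaster issues commands (authenticate)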
MEASUREMENT METHODOLOGY

Three Distinct Phases
- Malware collection
  - Collect as many bot binaries as possible
- Binary analysis via gray-box testing
  - Extract the features of suspicious binaries
- Longitudinal tracking of IRC botnets
  - Through IRC and DNS trackers
  - Track how bots spread and how far they reach
INFRASTRUCTURE DEPLOYMENT
1 Large Local darknet.
14 distributed nodes
(PlanetLab testbed).
1 Honeynet
1 Download Station
1 Gateway
1 local IRC server
IRC trackers (drone)
DNS probers
Use of 10 different class A (/8) darknet IP spaces.
Darknet: an allocated but unused portion of the IP address space.
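To make the darknet definition concrete, here is a minimal sketch of how incoming traffic could be tested against a set of monitored prefixes. The prefix below is a placeholder chosen for illustration (an assumption, not one of the address blocks used in the study), and `is_darknet` is a hypothetical helper name.

```python
import ipaddress

# Hypothetical monitored prefix; the study's real /8 blocks are not listed in the slides.
DARKNET_PREFIXES = [ipaddress.ip_network("10.0.0.0/8")]


def is_darknet(addr: str) -> bool:
    """Return True if addr falls inside one of the monitored darknet prefixes."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in DARKNET_PREFIXES)


# A single /8 spans 2**24 addresses, which is why it is called "large".
addresses_per_slash8 = DARKNET_PREFIXES[0].num_addresses
```

Because no legitimate host lives in a darknet, any packet for which `is_darknet` returns True is unsolicited and worth recording.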
MALWARE COLLECTION

Nepenthes (on PlanetLab) mimics the replies generated by vulnerable services in order to collect the first-stage exploit.
Nepenthes is a low-interaction honeypot
- a framework for large-scale collection of information on self-replicating malware in the wild, emulating only the vulnerable parts of a service
Modules in nepenthes
- emulate vulnerabilities
- download files (done by the Download Station)
- submit the downloaded files
- shellcode handler
MALWARE COLLECTION (CONT’D)

Honeynets are also used along with nepenthes
- to catch exploits missed by nepenthes
- these failures are most likely due to the responder’s inability to mimic unknown exploit sequences or to parse certain shellcodes
Honeypots run unpatched instances of Windows XP in a virtualized environment (VMware) with static private-space IPs.
- Only one infection is allowed per honeypot, along with connections to unique IRC servers
Binaries (from nepenthes or the honeynet) are sent to the analysis engine for gray-box testing.
MALWARE COLLECTION (CONT’D)

Gateway
- Forwards traffic for 8 /24 prefixes, rotated daily to cover the whole darknet (NAT)
- Firewall (Snort)
  - Prevents outbound attacks and self-infection by honeypots
  - Allows only 1 infection per honeypot
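The daily rotation over /24 prefixes can be sketched as follows. This is an illustration only: the function name `daily_prefixes`, the example `/16` darknet, and the wrap-around behavior are assumptions, since the slides give only the figure of 8 /24 prefixes per day.

```python
import ipaddress
from typing import List


def daily_prefixes(darknet: str, day: int, per_day: int = 8) -> List[ipaddress.IPv4Network]:
    """Return the /24 prefixes forwarded to the honeynet on a given day (0-based)."""
    subnets = list(ipaddress.ip_network(darknet).subnets(new_prefix=24))
    start = (day * per_day) % len(subnets)
    # Wrap around so the schedule repeats once the whole darknet has been covered.
    return [subnets[(start + i) % len(subnets)] for i in range(per_day)]


# Hypothetical /16 darknet: 256 /24 prefixes, so a full pass takes 256 / 8 = 32 days.
today = daily_prefixes("10.0.0.0/16", day=1)
```

Rotating the forwarded prefixes lets a small honeynet sample a much larger address space over time.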
BINARY ANALYSIS (GRAY-BOX TESTING)

The authors use gray-box analysis to extract the features of suspicious binaries (regardless of the mechanism by which they were collected).
Phase 1: Creation of a network fingerprint
- fnet = <DNS, IPs, Ports, scan>
- DNS requests, destination IPs, contact ports, contact protocols, and default scanning behavior (e.g. n = 20 destinations per port per monitored period)
Phase 2: Extraction of IRC-related features
- firc = <PASS, NICK, USER, MODE, JOIN>
- initial password, nickname and username, the particular modes set, and which IRC channels are joined (with associated channel passwords)
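A rough sketch of how the two fingerprints might be derived from captured traffic is shown below. The class and function names (`IrcFingerprint`, `extract_firc`, `scans_by_default`) are hypothetical, the input is assumed to be the client-side IRC lines already reassembled from a capture, and reading "n = 20" as "at least 20 distinct destinations on one port" is an interpretation rather than something the slides state.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class IrcFingerprint:
    password: Optional[str] = None
    nick: Optional[str] = None
    user: Optional[str] = None
    modes: List[str] = field(default_factory=list)
    channels: Dict[str, Optional[str]] = field(default_factory=dict)  # channel -> key


def extract_firc(lines: Iterable[str]) -> IrcFingerprint:
    """Collect the PASS, NICK, USER, MODE and JOIN values sent by the bot."""
    fp = IrcFingerprint()
    for line in lines:
        parts = line.strip().split()
        if not parts:
            continue
        cmd, args = parts[0].upper(), parts[1:]
        if cmd == "PASS" and args:
            fp.password = args[0]
        elif cmd == "NICK" and args:
            fp.nick = args[0]
        elif cmd == "USER" and args:
            fp.user = args[0]
        elif cmd == "MODE" and len(args) >= 2:
            fp.modes.append(args[1])
        elif cmd == "JOIN" and args:
            # JOIN takes comma-separated channels and, optionally, matching keys.
            chans = args[0].split(",")
            keys = args[1].split(",") if len(args) > 1 else []
            for i, chan in enumerate(chans):
                fp.channels[chan] = keys[i] if i < len(keys) else None
    return fp


def scans_by_default(flows: Iterable[Tuple[str, int]], n: int = 20) -> bool:
    """flows: (dst_ip, dst_port) pairs seen during the monitored period."""
    per_port = defaultdict(set)
    for dst_ip, dst_port in flows:
        per_port[dst_port].add(dst_ip)
    return any(len(ips) >= n for ips in per_port.values())


sample = ["PASS s3cret", "NICK bot-1234", "USER botuser 0 * :x",
          "MODE bot-1234 +i", "JOIN #cmd,#spread key1"]
firc = extract_firc(sample)
```

The sample lines are made up; in practice they would come from the traffic the binary generates inside the sandbox.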
LEARN A BOTNET DIALECT

Taken together, fnet and firc provide enough information to join a botnet in the wild, but not enough to mimic a real bot convincingly.
- Force the bot to join a local IRC server (acting as a fake botmaster)
- Use a query engine to learn the botnet “dialect”, extracting command-response templates
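The idea of learning command-response templates can be illustrated with a toy sketch. Everything here is hypothetical: the command names, the `captive_bot` stand-in for a real bot attached to the local IRC server, and the choice to key templates on the command verb only.

```python
from typing import Callable, Dict, Iterable, Optional


def learn_dialect(commands: Iterable[str], ask_bot: Callable[[str], str]) -> Dict[str, str]:
    """Send each candidate command to the captive bot and record its reply."""
    return {cmd.split()[0]: ask_bot(cmd) for cmd in commands}


def respond(dialect: Dict[str, str], command: str) -> Optional[str]:
    """Answer a botmaster command from the learned templates, if the verb is known."""
    parts = command.split()
    return dialect.get(parts[0]) if parts else None


def captive_bot(command: str) -> str:
    """Stand-in for the sandboxed bot; a real setup would relay over the local IRC server."""
    replies = {".version": "hypobot v0.1", ".uptime": "uptime: 3d 4h"}
    return replies.get(command.split()[0], "")


dialect = learn_dialect([".version", ".uptime"], captive_bot)
```

A tracker would later call `respond(dialect, ...)` when a live botmaster issues a command, after removing anything in the recorded reply that identifies the analysis host.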

LONGITUDINAL TRACKING

IRC tracker (drone)
- Connects to a real IRC channel using fnet and firc.
- Pretends to dutifully follow any commands from the botmaster, and provides realistic responses to her commands.
- Needs to be intelligent enough to filter inappropriate information included in the template.
DNS tracking
- Bots issue DNS queries to resolve the IP addresses of their IRC servers.
- Each DNS name of a newly detected IRC server is added to the list of servers to be probed.
- The authors probe the caches of all DNS servers on their list (~800,000 name servers) and record any cache hits.
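The slides do not spell out the probe format. A common way to ask a resolver only about its cache is a non-recursive query (RD bit cleared), and the sketch below assumes that approach. The helper names `build_probe` and `is_cache_hit` are hypothetical, and treating a non-authoritative answer as a hit is likewise an assumption.

```python
import struct


def build_probe(name: str, query_id: int = 0x1234) -> bytes:
    """Build a DNS A query with the recursion-desired (RD) bit cleared."""
    flags = 0x0000  # QR=0 (query), OPCODE=0, RD=0: resolver should answer from cache only
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def is_cache_hit(response: bytes) -> bool:
    """Treat a successful, non-authoritative response carrying answers as a cache hit."""
    if len(response) < 12:
        return False
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    is_response = bool(flags & 0x8000)
    authoritative = bool(flags & 0x0400)
    rcode = flags & 0x000F
    return is_response and not authoritative and rcode == 0 and ancount > 0


# Placeholder name; a real probe would use the DNS name of a tracked IRC server.
probe = build_probe("irc.example.com")
```

Sending the packet over UDP port 53 and handling timeouts is left out; a hit suggests that at least one client of that name server recently resolved the IRC server's name.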

RESULTS AND ANALYSIS

- Collection period starts 1 Feb 2006
- Darknet traffic traces: > 3 months
- IRC logs (honeynet, drones): > 3 months
- More than 100 botnet IRC channels
- DNS cache hits from tracking 65 IRC servers for more than 45 days
- Captured 318 malicious binaries
BOTNET TRAFFIC SHARE
- ~27% of incoming SYNs are contributed by known botnet spreaders
- 76% go to the target ports (135, 139, 445, 3127)
- >70% succeed in sending shellcode
Botnet spreader: any source that successfully completed an exploitation transaction and delivered a bot executable.
DNS TRACKER RESULTS
- A total of 65 IRC servers identified.
- 11% of the name servers were involved in at least one botnet activity.
- 29% of the .com (top-level domain) servers had at least 1 cache hit.
Figure: geographic location of the DNS cache hits for one of the tracked botnets. The star indicates the location of the IRC server.
BOT SCAN METHOD

Type I (34 of 192 IRC bots, about 18%): worm-like scanning
- continuously scan certain ports following a specific target selection algorithm
Type II (158 of 192 IRC bots, about 82%): variable scanning behaviors
- only scan after receiving a command over the C&C channel
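The distinction between the two types comes down to whether scanning starts before or after a C&C command. A minimal sketch, assuming each bot's observed behavior has been reduced to a chronological list of "scan" and "command" events (a hypothetical representation, as is the function name):

```python
from typing import Iterable


def classify_scan_type(events: Iterable[str]) -> str:
    """Classify a bot by whether it scans before any command arrives over C&C."""
    commanded = False
    for event in events:
        if event == "command":
            commanded = True
        elif event == "scan":
            # Scanning without a prior command is the worm-like Type I behavior.
            return "Type II" if commanded else "Type I"
    return "no scanning observed"


label = classify_scan_type(["command", "scan"])
```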

BOTNET GROWTH – DNS AND IRC
Different botnets show different growth patterns, visible in both the DNS and IRC views.
BOTNET STRUCTURE

- Of the 318 malicious binaries, 60% were IRC bots.
- 70% of the botnets have a single IRC server.
- Bridged: 30% (25% public servers)
  - Two servers: 50%
- Unrelated botnets had similar naming conventions, channel names, and user IDs.
  - In many cases, these botnets seem to belong to the same botmaster(s).
- In several instances, a selected group of bots was commanded to download an updated binary, which subsequently moved the bots to a different IRC server.
SIZE AND LIFETIME
- Bots generally do not stay long on the IRC channel.
- IRC servers broadcast join/leave information for members on the channel.
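Join/leave broadcasts are enough to estimate both how many bots a channel has seen and how long each one stays. The sketch below assumes the tracker's log has been parsed into `(timestamp, kind, nick)` tuples and that a nick identifies one bot, which is a simplification; the function name is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def track_population(events: Iterable[Tuple[float, str, str]]) -> Tuple[int, int, List[float]]:
    """Return (distinct nicks seen, peak simultaneous members, completed stay durations)."""
    joined_at: Dict[str, float] = {}
    seen = set()
    peak = 0
    stays: List[float] = []
    for ts, kind, nick in events:
        if kind == "JOIN":
            joined_at[nick] = ts
            seen.add(nick)
        elif kind in ("PART", "QUIT") and nick in joined_at:
            stays.append(ts - joined_at.pop(nick))
        peak = max(peak, len(joined_at))
    return len(seen), peak, stays


log = [(0, "JOIN", "a"), (5, "JOIN", "b"), (20, "QUIT", "a"),
       (25, "JOIN", "c"), (30, "PART", "b")]
total_seen, peak_live, stay_lengths = track_population(log)
```

The gap between the total number of nicks seen and the peak number present at once reflects how short the individual stays are.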
BOTNET SOFTWARE TAXONOMY
AV: Anti-Virus
FW: Firewall
COMMENTS

A measurement methodology
- How to capture a botnet’s binary?
- How to find the characteristics of a binary?
Builds a system on top of honeypots.
Focuses only on RPC and DNS analysis.
The authors did a lot of analysis after capturing the bots, but how about evaluating the methodology itself?