Moheeb Abu Rajab, Jay Zarfoss, Fabian Monrose, Andreas Terzis

Computer Science Department
Johns Hopkins University
A MULTIFACETED APPROACH TO UNDERSTANDING THE BOTNET PHENOMENON (2006)
Jonathan Brant
CAP 6135 – Spring 2010
Overview

Introduction
Background
Measurement Methodology
  • Malware Collection
  • Graybox Testing
  • Longitudinal Tracking of Botnets
Results and Analysis
  • Botnet Prevalence
  • Spreading Methods
  • Growth Patterns
  • Botnet Structures
  • Effective Botnet Size
  • Lifetime
  • “Insider’s view”
Conclusion
Introduction

Botnets – “networks of infected end-hosts that are under the control of a human operator”
  • Bots – end-hosts
  • Botmaster – human operator

Command and Control (C2) channels carry botmaster commands to the bots in the botnet
  • Channels can use different communication mechanisms (e.g. P2P)
  • Most modern botnets use Internet Relay Chat (IRC)
    • Originally used to form large chat rooms
Introduction

Botnets are almost always used for illegal activities
  • Extortion
  • E-mail spamming
  • Identity theft
  • Software piracy
Introduction

The paper attempts to answer questions such as:
  • Number of botnet “species”
  • Behavioral categorization of different species
  • Evolution of a botnet
Background

Step 1 – Botnets commandeer victims by remotely exploiting a vulnerability in software running on the victim
  • Infection strategies include:
    • Self-replicating worms
    • E-mail viruses
    • Social engineering (convincing victims to run malicious code on their machine)
Background

Step 2 – Victim executes shellcode, and an image of the bot binary is fetched from a location within the botnet
  • When the fetch is complete, the binary installs itself on the target machine and starts automatically on each reboot
Background

Step 3 – Bot attempts to contact the IRC server (address stored in the executable)
  • Using a DNS name instead of an IP address allows the botmaster to retain control if the IP is blacklisted by an ISP
Background

Step 4 – Bot attempts to establish an IRC session and join the C2 channel

Three authentication steps:
  • Bot authenticates itself using PASS message
    • This is the IRC session password
  • Bot issues C2 channel password
    • This password and the session password are in the bot binary
  • Botmaster authenticates to bot population
    • This prevents other botmasters from seizing control of the botnet
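
The first two authentication steps map onto ordinary IRC client messages. The sketch below only formats those lines using standard IRC message syntax; the function name and all argument values are hypothetical placeholders, not taken from the paper. The third step (botmaster authenticating to the bots) is bot-specific and is not modeled here.

```python
def irc_join_sequence(session_password, nick, user, channel, channel_key):
    """Return the IRC lines a client sends to register and join a keyed channel.

    All names and values are illustrative placeholders.
    """
    return [
        f"PASS {session_password}",       # step 1: IRC session password
        f"NICK {nick}",                   # nickname registration
        f"USER {user} 0 * :{user}",       # username registration
        f"JOIN {channel} {channel_key}",  # step 2: C2 channel password
    ]


for line in irc_join_sequence("sess-pass", "bot-0001", "bot", "#channel", "chan-pass"):
    print(line)
```

A tracker that mimics a bot would need to reproduce this sequence with the values recovered from the binary.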
Background

Step 5 – Channel topic is parsed and executed
  • Contains default command that every bot executes
  • Future commands coming from botmaster can vary widely

Wide variety of available commands/responses increases difficulty of classifying botnet behaviors
Measurement Methodology

Data collection includes three phases:
  • Malware collection
  • Binary analysis via gray-box testing
  • Tracking of IRC botnets through IRC and DNS trackers
Measurement | Malware Collection

Goal is to collect as many bot binaries as possible
  • Must support a wide array of data collection endpoints and be highly scalable

Distributed darknet and locally deployed darknet
  • Darknet: allocated but unused portion of IP address space
  • 14 distributed nodes using PlanetLab testbed
Measurement | Malware Collection

Modified nepenthes platform
  • Mimics replies generated by vulnerable services
  • Collects first-stage exploit (shell-code)

Raw packets from PlanetLab nodes translated
  • Using translation module written in Click
  • Packets were injected into local tunneling interface
Measurement | Malware Collection

On-line download modules in nepenthes disabled to prevent excessive downloads

Binaries retrieved by generating list of URL targets and sending to download station
  • Download station filtered entries in list and extracted unique sources/URLs
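
The download station's filtering step can be pictured as order-preserving de-duplication of the URL list. This is a minimal sketch that assumes duplicates are exact string matches after trimming whitespace; the paper's actual filtering rules are not given in these slides, and the function name and sample URLs are hypothetical.

```python
def unique_download_targets(urls):
    """Return URLs in first-seen order with blanks and exact duplicates removed."""
    seen = set()
    unique = []
    for url in urls:
        key = url.strip()
        if key and key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


targets = unique_download_targets([
    "tftp://192.0.2.10/bot.exe",   # hypothetical entries
    "tftp://192.0.2.10/bot.exe",
    "ftp://192.0.2.77/x.exe",
])
```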

Measurement | Malware Collection

Honeynet catches exploits missed by nepenthes

Composed of honeypots running unpatched, virtual instances of Windows XP
  • Each honeypot assigned private static IP on separate VLAN
  • Infected honeypots sustain IRC connections until the VM is reimaged
  • Suspect binaries retrieved by comparing VM contents to clean Windows image
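
One way to picture the comparison against a clean image is a hash manifest diff: any file that is new or whose digest changed is a suspect. This is only a sketch under that assumption; the slides do not say how the authors performed the comparison, and the function name and paths are hypothetical.

```python
import hashlib


def suspect_files(clean_manifest, infected_files):
    """Return paths that are new or modified relative to the clean image.

    clean_manifest: {path: sha256 hex digest} taken from the clean image.
    infected_files: {path: file contents as bytes} read from the infected VM.
    """
    suspects = []
    for path, data in infected_files.items():
        digest = hashlib.sha256(data).hexdigest()
        if clean_manifest.get(path) != digest:
            suspects.append(path)
    return sorted(suspects)
```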
Measurement | Malware Collection

Gateway routes darknet traffic to various parts of the internal network
  • Half of darknet prefixes directed to local responder and other half to honeynet
  • NAT used to map each honeypot to 128 darknet IP addresses
Measurement | Malware Collection

Gateway serves as firewall, preventing honeypots from conducting outbound attacks or infecting each other

Cross-infection prevented by:
  • Placing each honeypot on separate VLAN
  • Terminating cross-VLAN traffic

Outbound traffic blocked on popular vulnerable ports
  • 135, 139, 445, etc.
Measurement | Malware Collection

Runs IRC detection module
  • Application-level traffic searched for common IRC protocol strings
    • NICK, JOIN, USER
  • Once IRC connection witnessed, detection module establishes record for IRC session
  • When honeypot attempts to reconnect, connection allowed to proceed to IRC server
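
The string search described above can be sketched as a check for lines that begin with one of the listed IRC commands. This is a simplified illustration, not the authors' module: the keyword tuple comes from the slide, while the function name and the line-prefix matching rule are assumptions. A real detector would also need to handle payloads split across packets.

```python
# Commands named on the slide; trailing space avoids matching longer words.
IRC_KEYWORDS = (b"NICK ", b"JOIN ", b"USER ")


def looks_like_irc(payload: bytes) -> bool:
    """Return True if any line of the payload starts with an IRC command."""
    for line in payload.splitlines():
        if line.lstrip().upper().startswith(IRC_KEYWORDS):
            return True
    return False
```

Matching only at the start of a line keeps ordinary traffic that merely contains the word "user" from triggering the detector.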

Measurement | Malware Collection
  • Detection module only allows one honeypot to connect to an IRC server at a given point in time
  • Gateway detects when honeypot is infected
    • Rules inserted to block inbound attacks to that honeypot
Measurement | Malware Collection
  • Gateway also performs miscellaneous tasks
    • Triggering honeypot re-imaging
    • Loading clean Windows images
    • Pre-filtering for download station
    • Running local DNS server to resolve DNS queries from honeypots
Measurement | Graybox Testing


Graybox testing used to extract features of suspicious binaries

Analysis spans two distinct phases (performed on isolated network segment)
  • First phase derives network fingerprint of binary
  • Second phase extracts the binary's IRC-specific features
Measurement | Graybox Testing

Phase 1: Creation of a network fingerprint

Server acts as network sink
  • All network activity initiated by malware will be detected
  • Traffic logs automatically processed to extract network fingerprint

$f_{net} = \langle DNS, IPs, Ports, scan \rangle$
  • DNS – target of DNS requests
  • IPs – destination IP addresses
  • Ports – contacted ports and protocols
  • scan – whether or not default scanning behavior was detected
    • Default scanning behavior – any attempt to contact more than 20 distinct destinations on the same port during the monitored period
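
The fingerprint fields and the scanning rule can be sketched as follows. The threshold of 20 distinct destinations per port is from the slide; the input record layout, the dictionary representation, and the function name are assumptions made for illustration.

```python
from collections import defaultdict

SCAN_THRESHOLD = 20  # slide: more than 20 distinct destinations on the same port


def network_fingerprint(dns_queries, connections):
    """Build a <DNS, IPs, Ports, scan> fingerprint from logged traffic.

    dns_queries: list of queried names.
    connections: list of (dst_ip, dst_port, protocol) tuples (assumed layout).
    """
    dests_per_port = defaultdict(set)
    for ip, port, proto in connections:
        dests_per_port[(port, proto)].add(ip)
    scan = any(len(ips) > SCAN_THRESHOLD for ips in dests_per_port.values())
    return {
        "DNS": sorted(set(dns_queries)),
        "IPs": sorted({ip for ip, _, _ in connections}),
        "Ports": sorted(dests_per_port),
        "scan": scan,
    }
```

With this rule, 20 distinct destinations on port 445 does not count as scanning, while 21 does.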
Measurement | Graybox Testing

Phase 2: Extraction of IRC-related features

Modified version of UnrealIRC daemon instantiated on network sink
  • IRC daemon listens on all ports observed in the network fingerprint
  • Upon detecting an IRC connection, IRC fingerprint is created

$f_{irc} = \langle PASS, NICK, USER, MODE, JOIN \rangle$
  • PASS – initial password to establish IRC session
  • NICK – nickname
  • USER – username
  • MODE – modes set
  • JOIN – IRC channels to be automatically joined (and their associated passwords)
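
Given the lines a bot sends to the sink's IRC daemon, the five fingerprint fields could be collected roughly as below. This is a sketch assuming one IRC message per line and standard argument order; the function name and the dictionary layout are hypothetical, not the authors' format.

```python
def irc_fingerprint(lines):
    """Collect PASS, NICK, USER, MODE and JOIN values from client-sent IRC lines."""
    fp = {"PASS": None, "NICK": None, "USER": None, "MODE": [], "JOIN": []}
    for line in lines:
        parts = line.strip().split()
        if not parts:
            continue
        cmd, args = parts[0].upper(), parts[1:]
        if cmd in ("PASS", "NICK", "USER") and args:
            fp[cmd] = args[0]
        elif cmd == "MODE" and args:
            fp["MODE"].append(" ".join(args))
        elif cmd == "JOIN" and args:
            # JOIN <channel> [<key>]: keep the channel password when present
            key = args[1] if len(args) > 1 else None
            fp["JOIN"].append((args[0], key))
    return fp
```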

Measurement | Graybox Testing

(Phase 2 continued…)
  • To learn botnet “dialect”, bot connects to local IRC server and enters default channel
  • IRC query engine plays role of botmaster
  • Bot behavior is learned by subjecting it to a series of commands

Command set includes:
  • IRC commands observed in honeynet traces
  • Commands extracted from publicly available bot source code
Measurement | Longitudinal Tracking

Botnet tracking is performed by two means:
  • The use of a custom, lightweight IRC tracker
  • Probing DNS caches across the globe
Measurement | Longitudinal Tracking

IRC Tracker
  • “A modified IRC client that can join a specified IRC channel and automatically answer directed queries based on the template created by the graybox testing technique”
  • IRC tracker instantiates new IRC session to IRC server using fingerprint and template
  • IRC trackers need to appear responsive
Measurement | Longitudinal Tracking
  • In order to appear “real”, the following must be performed:
    • Traffic filtered so inappropriate information is not included in template
      • Filtering performed automatically while bot is executing
    • Computer specifications (e.g. memory, disk space) are changed to resemble specifications of a real machine
    • IRC query engine issues a set of commands that require stateful responses
      • Emulates a bot’s stateful software
Measurement | Longitudinal Tracking

DNS Tracking
  • Most bots issue DNS queries to resolve IP addresses of IRC servers
  • Caches of DNS servers are probed to determine number of DNS servers giving cache hits
  • “Cache hit” implies at least one client queried DNS server during lifetime of its DNS entry
Measurement | Longitudinal Tracking

Original list contained 1.6 million DNS servers

First filter removed servers under certain top-level domains
  • .gov, .mil, etc.

Second filter checked consistency of replies
  • Two consecutive DNS queries
    • First query was recursive and forced DNS server to completely resolve query
    • Second query was not recursive and obtained local answers from server cache
  • TTL field in second response should be smaller than first

After filtering, master list consisted of 800,000 name servers

For a given IRC server, the caches of all DNS servers were probed and any associated cache hits recorded
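
The two-query consistency check relies on the recursion-desired (RD) bit of the DNS header: set for the first query, cleared for the second. The sketch below builds such queries with the standard DNS wire format and applies the TTL rule exactly as the slide words it. Function names and the sample hostname are hypothetical, and sending the packets and parsing responses are left out.

```python
import struct


def build_query(name, query_id, recursive):
    """Build a DNS A-record query; `recursive` controls the RD header bit."""
    flags = 0x0100 if recursive else 0x0000  # RD bit set or cleared
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    labels = name.strip(".").split(".")
    qname = b"".join(bytes([len(p)]) + p.encode("ascii") for p in labels) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def is_consistent(first_ttl, second_ttl):
    """Slide's rule: the cached (second) answer must carry a smaller TTL.

    second_ttl is None when the non-recursive query returned no answer.
    """
    return second_ttl is not None and second_ttl < first_ttl


probe = build_query("irc.example.com", 0x1234, recursive=False)
```

A cache probe for a botnet's IRC server name would use the non-recursive form, so that the probe itself does not populate the cache.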
Results and Analysis

Results include:
  • Traffic traces captured on local darknet
    • 3 month period
  • IRC logs gathered
    • 3 month period
  • DNS cache hit results from tracking 65 IRC servers
    • 45 day period
Results | Botnet Prevalence

Botnet traffic share
  • Two-week snapshot of total incoming SYN packets to local darknet vs. packets originating from botnet spreaders
    • A botnet spreader is any source that delivered a bot executable
  • 27% of incoming SYNs attributed to botnet spreaders
  • 76% come from botnet spreaders when only the ports they target are considered
Results | Botnet Prevalence

More than 90% of all traffic during peaks targeted ports used by botnet spreaders
More than 70% of sources during peak periods sent shell exploits
This suggests the total amount of botnet-related traffic is far greater than 27%
Results | Botnet Prevalence

11% (85,000) of probed servers were involved in at least one botnet activity

55% of servers in dataset are for .com domains
  • 82% of DNS cache hits from name servers in that domain
  • 29% of .com servers had at least 1 cache hit

.cn servers only 0.2% of total servers
  • 95% of them exhibited botnet activity
Results | Spreading Methods

Botnets use a variety of means to spread and recruit new victims
  • Email
  • Web
  • Active scanning (most prevalent)

Botnets can be grouped into two types:
  • Type I: worm-like
    • Continuously scan ports following target selection algorithm
  • Type II: variable scanning behavior
    • Uses a number of scanning algorithms
      • Uniform, non-uniform, localized
Results | Spreading Methods

192 botnets captured
  • 34 botnets were Type-I
    • Upon infection, bot starts scanning IP space for new victims
    • Initiates connection to IRC servers (identified by hard-coded list of DNS names)

All IRC servers/channels bots tried to join were unreachable
  • Channel was banned by public IRC server
  • DNS name did not resolve to valid IP address

Still, botnet grew over time due to persistence of scanning
Results | Spreading Methods

Type-II botnets were the most prevalent class
  • Scanning triggered by a command
  • More difficult to track due to continuously changing behavior
  • Localized and targeted scanning were the most prevalent techniques
    • Localized scanning focused on Class B address space
    • Targeted scanning focused on Class A address space
Results | Growth Patterns

In order to examine botnet growth patterns, two approaches were taken:
  • Cumulative number of unique DNS cache hits for distinct botnets over time was plotted
  • Growth pattern was compared to behavior learned from IRC tracker
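
The cumulative curve in the first approach counts each DNS server only the first time it reports a cache hit. A minimal sketch, assuming the probe log is a list of (day, server) records; the record layout and function name are hypothetical.

```python
def cumulative_unique_hits(observations):
    """Return [(day, cumulative count of distinct servers with a cache hit)].

    observations: iterable of (day, dns_server) cache-hit records (assumed layout).
    """
    by_day = {}
    for day, server in observations:
        by_day.setdefault(day, set()).add(server)
    seen = set()
    curve = []
    for day in sorted(by_day):
        seen |= by_day[day]
        curve.append((day, len(seen)))
    return curve
```

Plotting the second element against the first gives the growth curve whose shape (semi-exponential, staircase, or linear) is compared against the tracker's view.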
Results | Growth Patterns

Botnets with semi-exponential growth patterns exhibit persistent random scanning activity (unchanging over time)
  • Example: for one botnet, topic of the corresponding channel was set to randomly scan port 445 indefinitely, over a one-month period
  • Related to worm infections
Results | Growth Patterns

Also representative of botnets with intermittent activity profiles
  • Example: Botnet III corresponds to botnet that infected honeypots on 3/13/2006
    • IRC server went down between 4/12/2006 – 4/30/2006
    • When IRC server became available, growth slope increased and honeypots were re-infected by the same botnet
Results | Growth Patterns

Predominantly used time-scoped scanning commands
  • As opposed to continuous scanning like the previous two
Results | Growth Patterns

Botnet evolution estimated by counting unique sources for messages broadcast to the channel
  • Only botnets of comparable size plotted on a given plot

Trends confirm heterogeneity in botnets
Results | Botnet Structures


60% of 318 collected malicious binaries were IRC bots

Four predominant IRC structures were revealed
  • All bots connected to a single IRC server
    • Prevalent among smaller classes of botnets (few hundred users)
    • 70% of observed botnets fell into this category
  • IRC servers can be connected to form an IRC network supporting large numbers of users
    • 30% of botnets bridged on multiple servers
    • 50% bridged between two servers only
  • Seemingly unrelated botnets appear more similar when comparing their naming conventions, channel names, and operators’ user IDs
    • These botnets may seem to belong to the wrong botmaster
  • Selected group of bots commanded to download an updated binary
    • Results in bots being moved to a different IRC server
Results | Effective Botnet Size

Botnet footprint can become fairly large (> 15,000 bots)
  • Predominant structures were botnets managed by a single or few servers

Distinction drawn between
  • Botnet’s footprint
  • Effective size
    • Number of bots connected to IRC channel at a given time
Results | Effective Botnet Size

Some “chatty” IRC servers broadcast join/leave information for members on channel
  • Number of online bots versus time for these IRC servers is plotted in Figure 9

Maximum size of online population is significantly smaller than botnet’s footprint
  • Footprint greater than 10,000
  • No more than 3,000 bots online at the same time

Effective size has little impact on long term activity; however, it affects number of bots available to execute commands in a timely manner
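
The gap between the two numbers can be reproduced from a join/leave log of the kind chatty servers broadcast. This sketch assumes time-ordered events and takes "footprint" to mean every distinct bot seen on the channel during the log, which is an approximation chosen for illustration; the function name and event layout are hypothetical.

```python
def footprint_and_effective_size(events):
    """Return (distinct bots ever seen, peak number online at once).

    events: time-ordered list of (timestamp, bot_id, action),
            where action is "join" or "leave" (assumed layout).
    """
    online, ever_seen, peak = set(), set(), 0
    for _, bot, action in events:
        if action == "join":
            online.add(bot)
            ever_seen.add(bot)
        elif action == "leave":
            online.discard(bot)
        peak = max(peak, len(online))
    return len(ever_seen), peak
```

With heavy churn the first value keeps growing while the second stays bounded, which matches the slide's contrast between a footprint above 10,000 and at most 3,000 bots online.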
Results | Lifetime

Discrepancy between footprint and effective size likely due to the long lifetime of a typical botnet
  • Bot death rates and high churn rates can affect botnet’s effective size
Results | Lifetime

High churn rates
  • Bots do not stay long on IRC channel
    • Average stay time: 25 minutes
    • 90% stay less than 50 minutes
  • Likely causes include
    • Client instability (as a result of infection)
    • Machine hibernation
    • Botmasters commanding bots to leave the channel
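
The two churn statistics above are a mean and a fraction below a cutoff. A minimal sketch of how they could be computed from per-session stay durations; the 50-minute cutoff is from the slide, while the function name and the sample durations are made up for illustration.

```python
def stay_time_stats(stays_minutes, limit=50):
    """Return (mean stay in minutes, fraction of stays shorter than `limit`)."""
    if not stays_minutes:
        raise ValueError("no stay durations given")
    mean = sum(stays_minutes) / len(stays_minutes)
    short = sum(1 for s in stays_minutes if s < limit) / len(stays_minutes)
    return mean, short


# Hypothetical sample, not data from the paper.
mean_stay, fraction_short = stay_time_stats([10, 20, 30, 40])
```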
Results | Botnet Software Taxonomy

183 of 192 confirmed IRC-based bot executables responded to probes of IRC query engine
  • 49% of bots run AV/FW killer – a utility that disables anti-virus and firewall processes
  • 43% run identd server which performs user identification
    • Ensures only intended bots join a given IRC channel
  • 40% run system security monitor which tightens bot security
    • E.g. disables DCOM service and file sharing
  • 38% run a registry monitor which alerts the bot of any attempts to disable it
Results | Botnet Software Taxonomy

Number of exploits within bot binaries varied from 3 to 29
  • Average of 15 exploits per binary
  • Most popular exploits (appeared in over 75% of binaries)
    • DCOM135
    • LSASS445
    • NTPASS
Results | Botnet Software Taxonomy

Authors evaluated effectiveness of ClamAV and Norton anti-virus on 192 malicious binaries
  • ClamAV classified 137 binaries as malicious
  • Norton anti-virus classified 179 binaries as malicious

Windows XP service pack 2 still not immune
Results | “Insider’s view”

Traces show that:
  • Botmasters share information concerning what prefixes should not be scanned
  • Bots are tweaked to minimize chatter on C2 channel
  • Bots are probed to detect and isolate “misbehavers”
  • Botmasters also look for “super-bots” with high bandwidth network links and large storage capacities
Results | “Insider’s view”

Bots migrate from one IRC channel to another, instructed by:
  • Command from botmaster
  • Download of replacement software that points to a different C2 server
Results | “Insider’s view”



Control commands include channel joins and leaves
Mining category includes commands that collect machine specifications
Attack category includes commands from botmasters to attack other network computers
Results | “Insider’s view”

Small botnets receive larger portion of control and mining commands
  • Hands-on botmasters that devote large amounts of time to manually control their botnet

Medium and large botnets have a larger percentage of cloning and download commands
  • Cloning could include the use of one botnet to attack another botnet by overloading its IRC server with join requests
Conclusion

Botnets are a major contributor to overall unwanted internet traffic
  • Most botnet traffic can be attributed to scans used to recruit new bots

IRC is still the dominant protocol used for C2 communications

Effective sizes of botnets can range from a few hundred to a few thousand
  • Botnet footprints are usually much larger than effective size
    • This is due to high churn rate within a botnet
    • Bot’s average channel occupancy is less than half an hour

Graybox testing revealed sophistication of modern bot software
  • E.g. self-protection measures
Contributions

Established empirical measurements for botnet prevalence
  • Particularly in considering DNS cache hits by IRC botnets that were tracked

Classified typical characteristics of bot binaries
  • Registry monitoring tactics
  • Locking down host vulnerabilities

Classified most prevalent botnet activities as a function of botnet size

Delineated between botnet footprint and “effective size”

Large experiment samples further solidified results
Critique

Focused mainly on Windows-based systems
  • It would be interesting to see the effectiveness of noted infection strategies on Unix systems

Only evaluated two anti-virus applications
  • Perhaps include other popular anti-virus applications
    • McAfee, Symantec Corporate, AVG, etc.

Authors noted 60% of binaries collected were IRC bots
  • Did the other 40% use a different communication mechanism?
    • If so, it would be interesting to know how they were structured and if the authors evaluated them in any way
References
[1] Rajab, M.A., Zarfoss, J., Monrose, F., & Terzis, A. (2006). A multifaceted approach to understanding the botnet phenomenon. Proceedings of the 6th ACM SIGCOMM conference on Internet measurement, Rio de Janeiro, Brazil.