Automated Worm Fingerprinting
Sumeet Singh, Cristian Estan, George Varghese, and Stefan Savage
Introduction
Problem: how to react quickly to worms?
CodeRed (2001)
Infected ~360,000 hosts within 11 hours
Sapphire/Slammer (376 bytes, 2003)
Infected ~75,000 hosts within 10 minutes
Existing Approaches
Detection
Ad hoc intrusion detection
Characterization
Manual signature extraction
Isolates and decompiles a new worm
Look for and test unique signatures
Can take hours or days
Existing Approaches
Containment
Updates to anti-virus and network filtering products
EarlyBird
Automatically detect and contain new worms
Two observations
Some portion of the content in existing worms is invariant
Rare to see the same string recurring from many sources to many destinations
EarlyBird
Automatically extracted signatures for every known worm seen in the monitored traffic
Including Blaster, MyDoom, and Kibvu.B, hours or days before any public signatures were distributed
Few false positives
Background and Related Work
Slammer scanned almost the entire address space in under 10 minutes
Limited only by bandwidth constraints
- Infections doubled every 8.5 seconds
- Spread 100X faster than Code Red
- At peak, scanned 55 million hosts per second.
Network Effects Of The SQL Slammer Worm
At the height of infections
Several ISPs noted significant bandwidth consumption at peering points
Average packet loss approached 20%
South Korea lost almost all Internet service for a period of time
Financial ATMs were affected
Some airline ticketing systems overwhelmed
Signature-Based Methods
Pretty effective if signatures can be generated quickly
For CodeRed, 60 minutes
For Slammer, 1 – 5 minutes
Worm Detection
Three classes of methods
Scan detection
Honeypots
Behavioral techniques
Scan Detection
Look for unusual frequency and distribution of address scanning
Limitations
Not suited to worms that spread in a nonrandom fashion (e.g. via email, IM, or P2P apps)
Based on a target list
Spread topologically
Scan Detection
More limitations
Detects infected sites
Does not produce a signature
Honeypots
Monitored idle hosts with unpatched vulnerabilities
Used to isolate worms
Limitations
Manual extraction of signatures
Depend on the honeypot being infected quickly
Behavioral Detection
Looks for unusual system call patterns
Sending a packet from the same buffer containing a received packet
Can detect slow moving worms
Limitations
Needs application-specific knowledge
Cannot infer a large-scale outbreak
Characterization
Process of analyzing and identifying a new worm
Current approaches
Use a priori vulnerability signatures
Automated signature extraction
Vulnerability Signatures
Example
Slammer Worm
UDP traffic on port 1434 that is longer than 100 bytes (buffer overflow)
Can be deployed before the outbreak
Can only be applied to well-known vulnerabilities
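A vulnerability signature like the Slammer example above amounts to a simple packet predicate. A minimal sketch (the function name and argument shapes are illustrative, not from any real firewall API):

```python
def matches_slammer_filter(proto: str, dst_port: int, payload: bytes) -> bool:
    """A priori vulnerability signature for Slammer: UDP traffic to
    port 1434 with more than 100 bytes of payload (enough to trigger
    the SQL Server buffer overflow)."""
    return proto == "udp" and dst_port == 1434 and len(payload) > 100
```

A Slammer-sized probe matches the filter, while a short benign query to the same port, or the same payload over TCP, does not.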
Some Automated Signature Extraction Techniques
Allows viruses to infect decoy programs
Extracts the modified regions of the decoy
Uses heuristics to identify invariant code strings across infected instances
Some Automated Signature Extraction Techniques
Limitation
Assumes the presence of a virus in a controlled environment
Some Automated Signature Extraction Techniques
Honeycomb
Find longest common subsequences among sets of strings found in messages
Autograph
Uses network-level data to infer worm signatures
Limitations
Scaling to high speeds and fully distributed deployment remain open problems
Containment
Mechanism used to deter the spread of an active worm
Host quarantine
Via IP ACLs on routers or firewalls
String-matching
Connection throttling
On all outgoing connections
Host Quarantine
Preventing an infected host from talking to other hosts
Via IP ACLs on routers or firewalls
Defining Worm Behavior
Content invariance
Portions of a worm are invariant (e.g. the decryption routine)
Content prevalence
Appears frequently on the network
Address dispersion
To spread fast, a worm's destination addresses are distributed roughly uniformly across the address space
Finding Worm Signatures
Traffic pattern is sufficient for detecting worms
Relatively straightforward
Extract all possible substrings
Raise an alarm when
FrequencyCounter[substring] > threshold1
SourceCounter[substring] > threshold2
DestCounter[substring] > threshold3
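The alarm rule above can be sketched directly with three per-substring tables. A minimal sketch (threshold values and the fixed substring length are illustrative):

```python
from collections import defaultdict

PREVALENCE_T, SRC_T, DST_T = 3, 2, 2   # illustrative thresholds

frequency = defaultdict(int)   # FrequencyCounter[substring]
sources = defaultdict(set)     # distinct sources seen per substring
dests = defaultdict(set)       # distinct destinations seen per substring

def sift(payload: bytes, src: str, dst: str, length: int = 8):
    """Extract every fixed-length substring of the payload, update the
    three counters, and return the substrings crossing all thresholds."""
    alarms = []
    for i in range(len(payload) - length + 1):
        s = payload[i:i + length]
        frequency[s] += 1
        sources[s].add(src)
        dests[s].add(dst)
        if (frequency[s] > PREVALENCE_T
                and len(sources[s]) > SRC_T
                and len(dests[s]) > DST_T):
            alarms.append(s)
    return alarms
```

This exact version is impractical at line rate, since memory grows with every distinct substring seen, which is what motivates the approximate data structures of practical content sifting.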
Practical Content Sifting
Characteristics
Small processing requirements
Small memory requirements
Allows arbitrary deployment strategies
Estimating Content Prevalence
Finding the packet payloads that appear at least x times among the N packets sent
During a given interval
Estimating Content Prevalence
Table[payload]
1 GB table filled in 10 seconds
Table[hash[payload]]
1 GB table filled in 4 minutes
Tracking millions of ants to track a few elephants
Collisions cause false positives
Multistage Filters
Array of counters per stage; each packet's content hashes to one counter in every stage [Singh et al. 2002]
Collisions are OK
A string is inserted into packet memory only when its counters in ALL stages reach the threshold
No false negatives!
(guaranteed detection)
Conservative Updates
Increment only a string's smallest counters; redundant increments to larger counters are skipped
Further reduces false positives from collisions
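The multistage filter with conservative update can be written compactly. A minimal sketch (stage count, width, threshold, and the use of SHA-256 as the per-stage hash are illustrative choices):

```python
import hashlib

class MultistageFilter:
    """Multistage filter: d stages of counters, one hash per stage.
    A string counts as prevalent only if ALL its counters reach the
    threshold, so a truly prevalent string is never missed (no false
    negatives); collisions can only over-count (false positives)."""

    def __init__(self, stages=4, width=1024, threshold=3):
        self.counters = [[0] * width for _ in range(stages)]
        self.width = width
        self.threshold = threshold

    def _indices(self, item: bytes):
        for stage in range(len(self.counters)):
            h = hashlib.sha256(bytes([stage]) + item).digest()
            yield stage, int.from_bytes(h[:4], "big") % self.width

    def add(self, item: bytes) -> bool:
        """Conservative update: only the counters equal to the current
        minimum are incremented, tightening the over-count caused by
        collisions. Returns True once the minimum reaches threshold."""
        idx = list(self._indices(item))
        smallest = min(self.counters[s][i] for s, i in idx)
        for s, i in idx:
            if self.counters[s][i] == smallest:
                self.counters[s][i] += 1
        return min(self.counters[s][i] for s, i in idx) >= self.threshold
```

With threshold 3, the third occurrence of a string trips the filter, while a string seen once stays below it.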
Detecting Common Strings
Cannot afford to detect all substrings
Maybe can afford to detect all strings with a small fixed length
A horse is a horse, of course, of course
Rabin fingerprint of the first window (p a fixed base, M a modulus):
F_1 = (c_1 p^4 + c_2 p^3 + c_3 p^2 + c_4 p + c_5) mod M
Sliding the window one character is an incremental update:
F_2 = (c_2 p^4 + c_3 p^3 + c_4 p^2 + c_5 p + c_6) mod M
    = (c_1 p^5 + c_2 p^4 + c_3 p^3 + c_4 p^2 + c_5 p + c_6 - c_1 p^5) mod M
    = (p F_1 + c_6 - c_1 p^5) mod M
Still too expensive…
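The incremental update F_2 = (p F_1 + c_6 - c_1 p^5) mod M generalizes to any window length beta, giving one fingerprint per substring at constant cost per byte. A minimal sketch (the base p, modulus M, and window length are illustrative):

```python
P, M = 101, (1 << 31) - 1   # illustrative base and modulus
BETA = 8                    # window (substring) length, shortened for illustration

def fingerprints(data: bytes, beta: int = BETA):
    """Rabin fingerprints of every length-beta substring, each derived
    from the previous one in O(1): F_next = p*F + c_in - c_out*p^beta."""
    if len(data) < beta:
        return []
    p_beta = pow(P, beta, M)
    f = 0
    for c in data[:beta]:               # direct computation of F_1
        f = (f * P + c) % M
    out = [f]
    for i in range(beta, len(data)):    # slide the window one byte
        f = (f * P + data[i] - data[i - beta] * p_beta) % M
        out.append(f)
    return out
```

Identical substrings always yield identical fingerprints, so a repeated string such as "of cours" in "of course, of course" appears as a repeated fingerprint value.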
Estimating Address Dispersion
Not sufficient to count the number of source and destination pairs
e.g. a mail sent to a mailing list
Two sources: the mail server and the sender
Many destinations
Need to count the unique source and destination traffic flows
For each substring
Bitmap counting – direct bitmap
Set bits in the bitmap using a hash of the flow ID of incoming packets [Estan et al. 2003]
Different flows have different hash values; packets from the same flow always hash to the same bit
Collisions OK, estimates compensate for them
As the bitmap fills up, estimates get inaccurate
Solution: use more bits
Problem: memory scales with the number of flows
Bitmap counting – virtual bitmap
Solution: a) store only a portion of the bitmap b) multiply the estimate by a scaling factor
Problem: estimate inaccurate when few flows are active
Bitmap counting – multiple bitmaps
Solution: use many bitmaps, each accurate for a different range of flow counts
Use the bitmap matching the current range to estimate the number of flows
Bitmap counting – multiresolution bitmap
Problem: must update up to three bitmaps per packet
Solution: combine the bitmaps into one multiresolution bitmap
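The direct bitmap above estimates the number of distinct flows from the fraction of bits still zero (linear counting), which is how collisions are compensated for. A minimal sketch (bitmap size and the SHA-256 hash are illustrative):

```python
import hashlib, math

class DirectBitmap:
    """Estimate distinct flow IDs: hash each flow ID to one of b bits.
    The same flow always sets the same bit, so duplicates cost nothing;
    the estimate n ~ b * ln(b / z) compensates for collisions, where
    z is the number of bits still zero."""

    def __init__(self, bits=4096):
        self.bits = bits
        self.bitmap = [False] * bits

    def add(self, flow_id: str):
        h = int.from_bytes(hashlib.sha256(flow_id.encode()).digest()[:8], "big")
        self.bitmap[h % self.bits] = True

    def estimate(self) -> float:
        zeros = self.bitmap.count(False)
        if zeros == 0:          # bitmap full: the estimate saturates
            return float("inf")
        return self.bits * math.log(self.bits / zeros)
```

With 4096 bits and about a thousand flows, the estimate lands within a few percent of the true count even though many flows share bits; as the slide notes, accuracy degrades once most bits are set.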
Multiresolution Bitmaps
Still too expensive at scale
Scaled bitmap
Recycles the hash space when too many bits are set
Adjusts the scaling factor accordingly
E.g., 1 bit comes to represent 2 flows as opposed to a single flow
Too CPU-Intensive
A packet with 1,000 bytes of payload
Needs 960 fingerprints for a string length of 40
Prone to denial-of-service attacks
CPU Scaling
Obvious approach: sampling
- Random sampling may miss many substrings
Solution: value sampling
Track only certain substrings
e.g. last 6 bits of fingerprint are 0
P(not tracking a worm)
= P(not tracking any of its substrings)
CPU Scaling
Example
Track only substrings whose last 6 fingerprint bits are 0
String length = 40
A 1,000-char string has 960 substrings, hence 960 fingerprints
'11100…101010' … '10110…000000' …
Use only fingerprints of the form 'xxxxx…000000' as signatures
In expectation, 960 / 2^6 = 15 tracked signatures
CPU Scaling
P(finding a 100-byte signature) = 55%
P(finding a 200-byte signature) = 92%
P(finding a 400-byte signature) = 99.64%
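These probabilities come from value sampling: a worm with an n-byte invariant and substring length beta is missed only if none of its n - beta + 1 fingerprints ends in the six zero bits. A sketch of the calculation (the exact off-by-one convention and the independence assumption are simplifications):

```python
def p_detect(sig_len: int, f: float = 1 / 64, beta: int = 40) -> float:
    """P(at least one of the signature's substrings is tracked),
    assuming fingerprint low bits are uniform and independent:
    1 - (1 - f)^(number of substrings)."""
    substrings = sig_len - beta + 1
    return 1.0 - (1.0 - f) ** substrings
```

Longer invariant content means more substrings and hence sharply better detection odds, which is why the probability climbs from roughly half at 100 bytes to near certainty at 400 bytes.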
Putting It Together
For each packet, compute substring fingerprints over the header and payload
Update the Content Prevalence Table (multistage filter)
cnt > prevalence threshold? create an Address Dispersion entry (key, src cnt, dest cnt)
AD entry exists? update its source and destination counters
counters > dispersion threshold? report the key as a suspicious worm
Putting It Together
Sample frequency: 1/64
String length: 40
Use 4 hash functions to update prevalence table
Multistage filter reset every 60 seconds
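With these parameters, the per-packet pipeline can be sketched end to end. A minimal sketch: dictionaries stand in for the multistage filter and the dispersion bitmaps, thresholds are taken from the slides, and the `sample_mask` parameter is an illustrative knob (the slides fix it at 1/64):

```python
import hashlib
from collections import defaultdict

SAMPLE_MASK = 0x3F   # keep fingerprints whose last 6 bits are 0 (rate 1/64)
BETA = 40            # substring length
PREVALENCE_T = 3     # prevalence threshold
DISPERSION_T = 30    # 30 sources and 30 destinations

prevalence = defaultdict(int)   # stands in for the multistage filter
dispersion = {}                 # fingerprint -> (source set, destination set)

def process_packet(payload: bytes, src: str, dst: str,
                   sample_mask: int = SAMPLE_MASK):
    """Content sifting for one packet; returns fingerprints reported
    as suspicious worm signatures."""
    reported = []
    for i in range(len(payload) - BETA + 1):
        fp = int.from_bytes(
            hashlib.sha256(payload[i:i + BETA]).digest()[:8], "big")
        if fp & sample_mask:                 # value sampling: skip most
            continue
        prevalence[fp] += 1
        if prevalence[fp] > PREVALENCE_T and fp not in dispersion:
            dispersion[fp] = (set(), set())  # create AD entry
        if fp in dispersion:
            srcs, dsts = dispersion[fp]
            srcs.add(src); dsts.add(dst)
            if len(srcs) > DISPERSION_T and len(dsts) > DISPERSION_T:
                reported.append(fp)
    return reported
```

A prevalent payload seen between only one source and destination never crosses the dispersion threshold; the same payload spreading among more than 30 sources and 30 destinations gets reported.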
System Design
Two major components
Sensors
Sift through traffic for a given address space
Report signatures
An aggregator
Coordinates real-time updates
Distributes signatures
Implementation and Environment
Written in C and MySQL (~5,000 lines)
rrd-tools library for graphical reporting
PHP scripting for administrative control
Prototype executes on a 1.6 GHz AMD Opteron 242 1U server
Linux 2.6 kernel
EarlyBird
Processes 1TB of traffic per day
Can keep up with 200Mbps of continuous traffic
Parameter Tuning
Prevalence threshold: 3
Very few signatures repeat
Address dispersion threshold
30 sources and 30 destinations
Reset every few hours
Reduces the number of reported signatures down to ~25,000
Parameter Tuning
Tradeoff between speed and accuracy
Can detect Slammer in 1 second as opposed to 5 seconds
With 100x more reported signatures
Performance
200Mbps
Can be pipelined and parallelized to achieve 40 Gbps
Memory Consumption
Prevalence table
4 stages
Each with ~500,000 bins (8 bits/bin)
2MB total
Address dispersion table
25K entries (28 bytes each)
< 1 MB
Total: < 4MB
Trace-Based Verification
Common protocol headers (e.g. HTTP, SMTP): ~2,000 whitelisted
Two main remaining sources of false positives
SPAM e-mails
BitTorrent (many-to-many downloads)
False Negatives
So far none
Detected every worm outbreak
Inter-Packet Signatures
An attacker might evade detection by splitting an invariant string across packets
With ~7 MB of extra memory, EarlyBird can keep per-flow state and fingerprint across packet boundaries
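Fingerprinting across packet boundaries only requires remembering the last beta - 1 payload bytes of each flow, since every new window must include at least one fresh byte. A minimal sketch (window length shortened for illustration):

```python
BETA = 8          # substring length, shortened for illustration
flow_tail = {}    # flow ID -> last BETA-1 bytes seen on that flow

def substrings_across_packets(flow_id, payload: bytes):
    """Yield every length-BETA substring of the flow's byte stream,
    including those straddling a packet boundary, while keeping only
    BETA-1 bytes of per-flow state. Because the retained tail is
    shorter than BETA, no window is ever yielded twice."""
    data = flow_tail.get(flow_id, b"") + payload
    for i in range(len(data) - BETA + 1):
        yield data[i:i + BETA]
    flow_tail[flow_id] = data[-(BETA - 1):]
```

An invariant string split across two packets of the same flow is still recovered, closing the evasion route described above.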
Live Experience with EarlyBird
Detected precise signatures
CodeRed variants
MyDoom mail worm
Sasser
Kibvu.B
Variant Content
Polymorphic viruses
Semantically equivalent but textually distinct code
Invariant decoding routine
Extensions
Self configuration
Slow worms
Containment
How to handle false positives?
If too aggressive, EarlyBird becomes a target for DoS attacks
An attacker can fool the system to block a target message
Coordination
Trust of deployed servers
Validation
Policy
Conclusions
EarlyBird is a promising approach
To detect unknown worms in real time
To extract signatures automatically
To detect spam with minor changes
Wire-speed signature learning is viable