Deduplication - DEEPNESS Lab

Deep Packet Inspection (DPI)
Engineering for Enhanced Performance of
Network Elements and Security Systems
PIs: Dr. Anat Bremler-Barr (IDC)
Dr. David Hay (HUJI)
www.deepness-lab.org
• Deepness Lab was founded in November 2010
• Our mission: Deep Packet Inspection (DPI) for Next
Generation Network devices
• Funding:
• 5 years ERC Starting Grant (1M Euro)
• 3 years Kabarnit, a Magnet program ($70K/year)
• A gift from Cisco ($75K)
• Main Industry Collaborations: Commtouch, Radware,
Verint
People
Faculty: Anat Bremler-Barr (IDC Herzliya), David Hay (The Hebrew University of Jerusalem)
Postdocs: Shimrit Tzur-David, Yaron Koral
Ph.D. Students:
Liron Schiff (Tel Aviv University), Yotam Harchol (The Hebrew University of Jerusalem)
Collaborators:
Yehuda Afek (Tel Aviv University), Isaac Keslassy (Technion), Shir Landau-Feibish (Tel Aviv University)
Past Students:
Victor Zigdon, M.Sc. (IDC Herzliya), Adam Mor, M.Sc. (IDC Herzliya)
People
Dr. Anat Bremler-Barr - Ph.D. with distinction, Tel Aviv University, Israel (2001). Founder and chief
scientist of Riverhead Networks (focused on
distributed denial-of-service solutions, and
acquired by Cisco). Senior lecturer (assistant
professor) with tenure at IDC.
Dr. David Hay - Ph.D. from the Technion (2007).
Post-doc at Columbia University, NY, USA and
Politecnico di Torino. Previously, also at IBM
Research and Cisco San Jose. Senior lecturer
(assistant professor) at the Hebrew U.
Deep Packet Inspection (DPI)
• DPI - Identifying signatures (patterns or regular
expressions) in the packets’ payload
• DPI is the main action taken to inspect traffic and
therefore it is a critical component in next generation
networks:
security, content filtering, traffic monitoring, load
balancing, lawful interception, targeted advertising, data
leakage prevention, application-aware routing, etc.
• High-speed DPI is challenging and quickly becomes the
bottleneck of the entire packet inspection process.
Impact
• 66% of network equipment vendors
define DPI as “a must have” technology today
[Heavy Reading Survey, 2011]
• DPI market in 2011 estimated at $550 million,
with growth of 20%/year [Qosmos report, Heavy
Reading, Dec. 2012]
Major Challenges
• Scalability:
– Rate - greater than 10 or even 100 Gbps
– Memory - handling thousands of signatures
– Power - reducing the high power consumption
• Compressed traffic
• Security of the NIDS itself:
– Current solutions are vulnerable to Denial of Service attacks
• DPI in Software Defined Networks
• Signatures Extraction
Compressed Traffic
Compressed HTTP
• 84.1% of the top 1,000 sites compress their traffic
(19% increase in 8 months!).
• Data compression is done by adding references to
repeated data.
• There are two types of compression:
– Intra-response compression – the references point to
bytes within the response (Gzip/Deflate)
– Inter-responses/connections compression – the
references point to bytes in a separate file, called
dictionary (Google’s SDCH).
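As a small illustration of intra-response compression, the sketch below uses Python's standard zlib module (a Deflate implementation) on a made-up, highly repetitive HTML fragment. The sample page is an assumption for illustration only; the point is that repeated byte sequences are replaced by short references, and that a scanner must inflate the stream before it can see the original bytes.

```python
import zlib

# Hypothetical response body: the same HTML snippet repeated many times.
page = b"<div class='item'><a href='/p'>link</a></div>\n" * 200

# Deflate replaces repeated sequences with back-references into a
# sliding window of up to 32KB of previously seen data.
compressed = zlib.compress(page)
print(len(page), "->", len(compressed), "bytes")

# A DPI engine cannot match patterns on the compressed bytes directly;
# it first has to inflate the stream.
inflater = zlib.decompressobj()
restored = inflater.decompress(compressed) + inflater.flush()
```

The 32KB window is why a decompressing middlebox needs per-session state: any later back-reference may point anywhere into the last 32KB of the session's uncompressed data.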
Challenges
Current security tools do not deal with
compressed traffic due to the great challenges in
time and space
Compressed Traffic : Space Challenge
• Thousands of concurrent sessions
[Figure: memory needed per session for compressed traffic (Mem: 32KB/session) vs. uncompressed traffic]
Contribution: improve space by 80% and time by 40%
Compressed Traffic : Time Challenge
• General belief:
Decompression + pattern matching
>> pattern matching
• Our algorithms show how to accelerate the
pattern matching using the compression
information
Decompression + pattern matching
< pattern matching
High-Level Idea
• Compression is done by compressing repeated
sequences of bytes
• Store information about the pattern matching results
→ No need to fully perform pattern matching again on
repeated sequences which were already scanned
→ x2-3 time reduction
• The buffers needed for decompression are not used
most of the time, and therefore can be kept in
compressed form most of the time → x5 space reduction
The Other Side of the Coin: Acceleration by
Identifying repetitions in uncompressed Traffic
There are repetitions in uncompressed HTTP traffic
– Entire files (e.g., images)
– Parts of the files (e.g., HTML tags, JavaScript)
→ We keep scanning the same thing again and again (and get the
same scanning results).
1. Identify frequently repeated data
– Stored in a dictionary
2. Perform DPI on the data once and remember the results
– DPI by pattern matching with the Aho-Corasick algorithm. The result is the state.
3. When encountering a repetition, recover the state without rescanning
– Delicate points need to be taken care of, so we won’t miss any pattern
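The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the lab's implementation: the class names, the whole-chunk dictionary and the cache layout are hypothetical. It handles one of the delicate points explicitly: a pattern may start before a repeated chunk and end inside it, so the scanner keeps scanning the chunk byte by byte until the automaton state's depth is no larger than the number of chunk bytes consumed. From that point on the state depends only on the chunk itself, and the cached results can be reused safely.

```python
from collections import deque

class AhoCorasick:
    """Plain Aho-Corasick automaton (goto / fail / output tables)."""
    def __init__(self, patterns):
        self.goto, self.fail, self.out, self.depth = [{}], [0], [[]], [0]
        for p in patterns:
            s = 0
            for ch in p:
                if ch not in self.goto[s]:
                    self.goto.append({}); self.fail.append(0)
                    self.out.append([]); self.depth.append(self.depth[s] + 1)
                    self.goto[s][ch] = len(self.goto) - 1
                s = self.goto[s][ch]
            self.out[s].append(p)
        queue = deque(self.goto[0].values())       # BFS to set failure links
        while queue:
            s = queue.popleft()
            for ch, t in self.goto[s].items():
                f = self.fail[s]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[t] = self.goto[f].get(ch, 0)
                self.out[t] = self.out[t] + self.out[self.fail[t]]
                queue.append(t)

    def step(self, s, ch):
        while s and ch not in self.goto[s]:
            s = self.fail[s]
        return self.goto[s].get(ch, 0)

def scan(ac, data, state=0):
    """Ordinary scan; returns (final_state, [(end_offset, pattern), ...])."""
    matches = []
    for i, ch in enumerate(data):
        state = ac.step(state, ch)
        matches.extend((i + 1, p) for p in ac.out[state])
    return state, matches

class DedupScanner:
    """Hypothetical dictionary of chunks already scanned from the root state."""
    def __init__(self, ac):
        self.ac, self.cache = ac, {}

    def scan_chunk(self, chunk, state=0):
        if chunk not in self.cache:                # steps 1 and 2: scan once, remember
            self.cache[chunk] = scan(self.ac, chunk, 0)
        final, cached = self.cache[chunk]
        matches, j = [], 0
        # Delicate point: scan until the state no longer depends on earlier bytes.
        while j < len(chunk) and self.ac.depth[state] > j:
            state = self.ac.step(state, chunk[j])
            j += 1
            matches.extend((j, p) for p in self.ac.out[state])
        if self.ac.depth[state] > j:               # chunk ended before synchronizing
            return state, matches, j
        # Step 3: recover the remaining matches and the final state from the cache.
        matches.extend(m for m in cached if m[0] > j)
        return final, matches, j

ac = AhoCorasick(["abc", "bcd", "cab"])
dedup = DedupScanner(ac)
prev_state, _ = scan(ac, "xxab")                   # traffic seen before the chunk
state, found, scanned = dedup.scan_chunk("cdabcab", prev_state)
print(found, "bytes actually scanned:", scanned)
```

In this example the chunk follows the bytes "xxab", so "abc" and "bcd" straddle the chunk boundary; they are found by the short byte-by-byte prefix scan, while the matches fully inside the chunk come from the cache.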
Securing the NIDS Itself
Complexity DoS Attack Over NIDS
• Easy to craft – very hard to process packets
• 2 Steps attack:
1. Kill IPS/FW
2. Sneak into the network
[Figure: an attacker reaching the network over the Internet]
Attack on Security Elements
Combined Attack: DDoS on Security Element exposed the network – theft of customers’ information
Attack on Snort
The most widely deployed IDS/IPS worldwide.
[Figure axis: heavy packets rate]
OUR GOAL: A multi-core system architecture that is robust against complexity DDoS attacks
System Throughput Over Time
[Figure: system throughput over time; annotation: reaction time can be smaller]
System Architecture
Routine Mode: Load balance between cores
[Figure: a NIC feeds a processor chip in which each core (Core #1, #2, #8, #9 and #10 shown) has its own queue (Q); label: detects heavy packets]
System Architecture
Alert Mode: Dedicated cores for heavy packets. The other cores detect heavy packets and move them to the dedicated cores.
[Figure: the same NIC and processor chip; Core #9 and Core #10 are marked Dedicated; labels: detects heavy packets, B]
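The two modes can be sketched as a toy dispatcher in Python. Everything here is an assumption made for illustration: the class and method names, the round-robin balancing policy, and the choice of the last two cores as dedicated ones. How a packet is recognized as heavy is left to the caller, since the slides do not specify it.

```python
from collections import deque

class Dispatcher:
    """Toy model of the routine/alert queueing policy (hypothetical names)."""

    def __init__(self, n_cores=10, n_dedicated=2):
        self.queues = [deque() for _ in range(n_cores)]  # one queue (Q) per core
        self.n_dedicated = n_dedicated
        self.alert = False
        self._next = 0         # round-robin pointer over regular cores
        self._next_heavy = 0   # round-robin pointer over dedicated cores

    def regular_cores(self):
        n = len(self.queues)
        # In alert mode the last n_dedicated cores stop taking regular traffic.
        return range(n - self.n_dedicated) if self.alert else range(n)

    def dedicated_cores(self):
        n = len(self.queues)
        return range(n - self.n_dedicated, n)

    def dispatch(self, packet):
        """Load-balance an incoming packet over the regular cores."""
        cores = self.regular_cores()
        core = cores[self._next % len(cores)]
        self._next += 1
        self.queues[core].append(packet)
        return core

    def move_heavy(self, packet):
        """Alert mode: hand a packet detected as heavy to a dedicated core."""
        cores = self.dedicated_cores()
        core = cores[self._next_heavy % len(cores)]
        self._next_heavy += 1
        self.queues[core].append(packet)
        return core

d = Dispatcher()
routine = [d.dispatch(f"p{i}") for i in range(10)]    # all ten cores share the load
d.alert = True
alert = {d.dispatch(f"q{i}") for i in range(16)}      # only the regular cores are used
heavy = [d.move_heavy("h0"), d.move_heavy("h1")]      # heavy packets go to dedicated cores
```

Isolating heavy packets on their own cores means an attack can saturate those cores at most, while the remaining cores keep serving normal traffic.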
Cloud solution
• The different cores are different (virtual)
machines.
• Load balancing sends heavy packets to
machines that run a special more efficient
processing method.
• In SDN, this can be done even faster and easier.
DPI using TCAMs
TCAM – Ternary Content-Addressable Memory
• De-facto solution of packet classification
• Core component of SDN switch
[Figure: a search key (0011101010101001110001110001110) is compared in parallel against ten ternary TCAM entries, numbered 0 to 9, such as 1110101010100101001********1111, 0011101010*********************, *************************001110 and a final all-wildcard entry. The match lines feed an encoder, whose output selects the action (deny, accept or log) stored in SRAM.]
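The lookup shown on this slide can be modeled in a few lines of Python. This is a software sketch only: a real TCAM compares the key against all entries in parallel in hardware. The entry patterns and the search key are taken from the slide, but their order and the action attached to each entry are assumptions, since the original figure layout is not recoverable.

```python
def ternary_match(entry, key):
    """True if every entry bit is either '*' (don't care) or equal to the key bit."""
    return len(entry) == len(key) and all(e in ("*", k) for e, k in zip(entry, key))

def tcam_lookup(entries, actions, key):
    # In hardware all match lines are computed in parallel.
    match_lines = [ternary_match(e, key) for e in entries]
    # The priority encoder reports the lowest-index matching entry,
    # and that index addresses the action stored in SRAM.
    for index, hit in enumerate(match_lines):
        if hit:
            return index, actions[index]
    return None

# Entry order and actions are assumed for illustration.
entries = [
    "1110101010100101001" + "*" * 8 + "1111",
    "0011101010101" + "*" * 18,
    "*" * 25 + "001110",
    "*" * 31,                      # catch-all entry
]
actions = ["deny", "accept", "log", "deny"]

key = "0011101010101001110001110001110"
print(tcam_lookup(entries, actions, key))
```

Here three of the four entries match the key, and the encoder picks the first of them, so the order of entries encodes rule priority.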
Some Challenges In Using TCAM
• Reducing the number of entries → power
consumption reduction
• Dealing with ranges
(how to encode the range [1-6]?)
• How to correct errors?
– More about it in the next slide
• How to use it for non-traditional tasks
– Traditionally, TCAM is used for IP lookup and header
classification (e.g., using 5-tuples)
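For the range question above, one well-known baseline is prefix expansion: split the range into maximal power-of-two-aligned blocks, each of which is a single ternary entry. The Python sketch below shows this baseline only; it is not claimed to be the encoding developed in the lab, and the function name is hypothetical.

```python
def range_to_prefixes(lo, hi, width):
    """Cover the integer range [lo, hi] with ternary prefixes of `width` bits."""
    prefixes = []
    while lo <= hi:
        # Largest power-of-two block that is aligned at lo ...
        size = lo & -lo if lo else 1 << width
        # ... and still fits inside the remaining range.
        while size > hi - lo + 1:
            size >>= 1
        wild = size.bit_length() - 1           # number of trailing don't-care bits
        fixed = format(lo >> wild, f"0{width - wild}b") if wild < width else ""
        prefixes.append(fixed + "*" * wild)
        lo += size
    return prefixes

print(range_to_prefixes(1, 6, 3))   # ['001', '01*', '10*', '110']
```

The range [1-6] on 3 bits therefore needs four entries, which shows why ranges inflate the number of TCAM entries and why more compact encodings are of interest.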
Example: Error Correction in TCAM
• In SRAM (or any regular memory)
– Input: address (entry number)
– Output: content of that address
– One can apply an error detection/correcting code
on that content
• In TCAM
– Even if the content seems OK, we can still have false
miss or indirect false miss errors, so TCAM EDC/ECC
is harder
PEDS: Parallel Error Detection
Scheme for TCAM Devices
• Detecting all errors using the built-in parallel
lookup of the TCAM
• The number of lookups is a function of the
width of the TCAM word, and not the number
of entries in the database.
– Which is 3 orders of magnitude larger
• Developed and patented in the DEEPNESS Lab
CompactDFA for DPI
• Using TCAM to represent a huge DFA in a
compact manner.
• Reducing the problem of pattern matching to
IP lookup (much easier problem)
• Each byte scan → one TCAM lookup
– Can be reduced using variable stride traversal
– Further performance boost with parallelism and
pipelining
TCAM (Current, Sym) and SRAM (Next State), matched by Longest Prefix Match

  #    Current       Sym   Next State
  1    0000 (s0)     A     0000 (s0)
  2    0000 (s0)     B     0110 (s6)
  3    0000 (s0)     C     1100 (s12)
  4    0000 (s0)     D     0000 (s0)
  5    0000 (s0)     E     0001 (s1)
  6    0000 (s0)     F     0000 (s0)
  7    0001 (s1)     A     0000 (s0)
  8    0001 (s1)     B     0010 (s2)
  9    0001 (s1)     C     0000 (s0)
  10   0001 (s1)     D     0000 (s0)
  11   0001 (s1)     E     0000 (s0)
  12   0001 (s1)     F     0000 (s0)
  13   0010 (s2)     A     0000 (s0)
  14   0010 (s2)     B     0100 (s4)
  15   0010 (s2)     C     0011 (s3)
  16   0010 (s2)     D     0000 (s0)
  ...
  84   1101 (s13)    F     0000 (s0)
DFA → CompactDFA
Snort: 73MB → 0.6MB
ClamAV: 1.5GB → 26MB
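To illustrate how one lookup per input byte can drive a DFA, and how wildcard entries shrink the table, here is a small Python sketch. It encodes only the transitions visible in the table excerpt (rows 1 to 16 and 84) and replaces every transition to s0 with a single wildcard default rule. That default rule, the rule layout and the function names are assumptions for illustration; this is not CompactDFA's actual state encoding, and transitions in the rows not shown are not modeled.

```python
# (current-state pattern, symbol, next state); '*' is a don't-care.
RULES = [
    ("0000", "B", "0110"),   # s0 --B--> s6
    ("0000", "C", "1100"),   # s0 --C--> s12
    ("0000", "E", "0001"),   # s0 --E--> s1
    ("0001", "B", "0010"),   # s1 --B--> s2
    ("0010", "B", "0100"),   # s2 --B--> s4
    ("0010", "C", "0011"),   # s2 --C--> s3
    ("****", "*", "0000"),   # assumed default: everything else goes to s0
]

def lookup(state, sym):
    """Return the next state of the most specific matching rule."""
    best, best_specificity = None, -1
    for pattern, rule_sym, nxt in RULES:
        if rule_sym not in ("*", sym):
            continue
        if all(p in ("*", b) for p, b in zip(pattern, state)):
            specificity = sum(c != "*" for c in pattern) + (rule_sym != "*")
            if specificity > best_specificity:
                best, best_specificity = nxt, specificity
    return best

def scan(data, state="0000"):
    for sym in data:          # each byte scan is one lookup
        state = lookup(state, sym)
    return state

print(scan("EBC"))            # s0 -> s1 -> s2 -> s3
```

With the default rule, the 17 rows visible in the excerpt are covered by 7 entries, which gives a feel for the kind of saving a longest-prefix-match table makes possible.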
Signature Extraction
Current DDoS Attack
• Armies of zombies → Many sources
• Hard to identify behaviorally
• No known signatures
[Figure: zombies on innocent computers launching infrastructure-level, bandwidth-level and server-level DDoS attacks]
Automated Extraction of Signatures
for Zero-day Internet Attacks
• Input:
– sample of attack traffic (high volume attack)
– sample of normal traffic
• Output: automatically find signatures that appear frequently only
during attack
• Where:
– Input collection:
• In mitigation apparatus (DDoS Guard/firewall/anti-DDoS etc.)
• In the cloud – collect data from several collectors.
– DDoS – power computation saving
– Signatures used by anti-DDoS devices and firewalls to stop attack
• Mitigation in minutes, good enough for these types of attacks
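The input/output contract above can be illustrated with a deliberately simple Python sketch: count fixed-length substrings (k-grams) per packet in both samples and keep those that are frequent in the attack sample and rare in the normal one. The fixed length k, the two thresholds and the function names are assumptions chosen for illustration; this is not the lab's extraction algorithm, which has to cope with variable-length signatures and very large traffic volumes.

```python
from collections import Counter

def kgram_packet_freq(packets, k):
    """For each k-gram, count the number of packets that contain it."""
    counts = Counter()
    for payload in packets:
        counts.update({payload[i:i + k] for i in range(len(payload) - k + 1)})
    return counts

def extract_signatures(attack, normal, k=4, min_attack=0.8, max_normal=0.05):
    """k-grams seen in >= min_attack of attack packets and <= max_normal of normal ones.

    Both samples are assumed to be non-empty lists of bytes payloads.
    """
    in_attack = kgram_packet_freq(attack, k)
    in_normal = kgram_packet_freq(normal, k)
    return sorted(
        gram for gram, count in in_attack.items()
        if count / len(attack) >= min_attack
        and in_normal[gram] / len(normal) <= max_normal
    )

# Hypothetical samples.
attack = [b"GET /x?evil=1", b"POST /y evil!", b"evil"]
normal = [b"GET /index.html", b"POST /login"]
print(extract_signatures(attack, normal))   # [b'evil']
```

The resulting strings are the kind of signatures that an anti-DDoS device or firewall could then match with DPI to drop attack packets.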