Hyperion: High Volume Stream Archival for Retrospective Querying

Peter Desnoyers and Prashant Shenoy
University of Massachusetts, Amherst
Packet monitoring with history

- Packet monitor: capture and search packet headers
  - E.g. Snort, tcpdump, Gigascope
- ... with history: capture, index, and store packet headers
  - Interactive queries on stored data
- Provides new capabilities:
  - Network forensics: when was a system compromised? From where? How?
  - Management: after-the-fact debugging

[Figure: packet monitor feeding a storage system]
Challenges

- Speed: 1 Gbit/s × 80% utilization ÷ 400 B/pkt = 250,000 pkts/s, for each link monitored (worked through in the sketch below)
- Storage rate and capacity: must store data without loss and retain it long enough
- Queries: must search millions of packet records
- Indexing: in real time, for online queries
- Commodity hardware
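As a back-of-the-envelope check, the speed figure works out as follows. This is a minimal sketch: the 80% utilization and 400-byte average packet size are the slide's figures, while the 64-byte header-record size is an illustrative assumption.

```python
# Back-of-the-envelope packet-rate and storage-rate calculation.
LINK_RATE_BPS = 1e9        # 1 Gbit/s link
UTILIZATION = 0.80         # 80% utilized (from the slide)
AVG_PKT_BYTES = 400        # average packet size (from the slide)
HDR_RECORD_BYTES = 64      # per-packet header record -- illustrative assumption

pkts_per_sec = LINK_RATE_BPS * UTILIZATION / 8 / AVG_PKT_BYTES
print(f"{pkts_per_sec:,.0f} pkts/s")   # 250,000 pkts/s, per monitored link
print(f"{pkts_per_sec * HDR_RECORD_BYTES / 1e6:.0f} MB/s of header records")
```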
Existing approaches

System                                          | Event rates | Archive | Index, query | Commodity
------------------------------------------------|-------------|---------|--------------|----------
Streaming query systems (GigaScope, Bro, Snort) | Yes         | No      | No           | Yes
Peer-to-peer systems (MIND, PIER)               | No          | Yes     | Yes          | Yes
Conventional DBMS                               | No          | Yes     | Yes          | Yes
CoMo                                            | Yes         | Yes     | No           | Yes
Proprietary systems*                            | ?           | Yes     | Yes          | No

*Niksun NetDetector, Sandstorm NetInterceptor

Packet monitoring with history requires a new system.
Outline of talk
Introduction and Motivation
Design
Implementation
Results
Conclusions
Hyperion Design

- Multiple monitor systems
- High-speed storage system
- Local index
- Distributed index for query routing

[Figure: a Hyperion node — monitor/capture, storage, local index, distributed index]
Storage Requirements

- Real-time: writes must keep up or data is lost
- Prioritized: reads shouldn't interfere with writes
- Aging: old data is replaced by new

Behavior: typical application vs. Hyperion stream storage

Behavior         | Typical app   | Hyperion
-----------------|---------------|------------
Likely deletes   | Newest files  | Oldest data
File size        | Random, small | Streaming
Sequential reads | Yes           | No

Packet monitoring behaves differently from typical applications.
Log structured stream storage

- Goal: minimize seeks despite interleaved writes on multiple streams
- A log-structured file system minimizes seeks:
  - Writes are interleaved at an advancing write frontier
  - Free space is collected by the segment cleaner
- But: a general-purpose segment cleaner performs poorly on streams

[Figure: write frontier advancing across disk positions, with segments from streams A, B, and C interleaved in the log]
Hyperion StreamFS

How to improve on a general-purpose file system?
- Rely on application use patterns
- Eliminate unneeded features

StreamFS: log structure with no segment cleaner
- No deletes (just overwrite)
- No fragmentation
- No segment cleaning overhead

Operation (sketched below):
- Write a fixed-size segment
- Advance the write frontier to the next segment ready for deletion
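A minimal sketch of that no-cleaner write path, assuming a circular frontier over fixed-size segments; the segment size, stream ids, and class/helper names are illustrative, not the StreamFS code:

```python
# Sketch: log-structured writes with a circular frontier and no segment cleaner.
# The oldest segment is simply overwritten, so no cleaning pass is ever needed.
SEGMENT_SIZE = 1 << 20          # 1 MiB fixed-size segments (illustrative)

class FrontierLog:
    def __init__(self, num_segments: int):
        self.segments = [None] * num_segments   # (stream_id, bytes) per slot
        self.frontier = 0                       # next segment to write

    def write_segment(self, stream_id: str, data: bytes):
        assert len(data) <= SEGMENT_SIZE
        # Overwrite whatever lives at the frontier -- the oldest data on disk.
        self.segments[self.frontier] = (stream_id, data)
        # Advance the frontier; wrapping around replaces old data with new.
        self.frontier = (self.frontier + 1) % len(self.segments)

log = FrontierLog(num_segments=4)
log.write_segment("stream_A", b"packet headers...")
```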
StreamFS Design

- Record: a single write, packed into a segment
- Segment: fixed-size, holds a single stream; segments are interleaved into a region
- Region: contains a region map
- Region map: identifies the segments in a region; used when the write frontier wraps
- Directory: locates streams on disk

[Figure: records packed into segments, segments interleaved into regions, with a region map per region and a directory entry for Stream_A]
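One way to picture that on-disk hierarchy is the sketch below. This is a sketch under assumptions: the field names and types are illustrative, not the actual StreamFS metadata format.

```python
# Sketch of the StreamFS hierarchy: records -> segments -> regions,
# with a region map per region and a directory locating each stream.
from dataclasses import dataclass, field

@dataclass
class Record:          # one application write
    data: bytes

@dataclass
class Segment:         # fixed-size, single stream
    stream_id: str
    records: list[Record] = field(default_factory=list)

@dataclass
class Region:          # interleaves segments from many streams
    region_map: list[str] = field(default_factory=list)  # stream id per slot
    segments: list[Segment] = field(default_factory=list)

directory = {"stream_A": [0, 2]}   # stream name -> regions holding its data
```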
StreamFS optimizations

- Data retention:
  - Per-stream reservations control how much history is saved
  - Lets the filesystem make delete decisions: old data is deleted as new data arrives
- Speed balancing (see the sketch below):
  - Worst-case speed is set by the slowest tracks
  - Solution: interleave fast and slow disk sections
  - Worst-case speed is now set by the average track
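A minimal sketch of the interleaving idea; the two-zone layout and function name are illustrative assumptions:

```python
# Sketch: map logical segments alternately to fast (outer) and slow (inner)
# disk zones, so sustained throughput tracks the average zone speed rather
# than the slowest one.
def interleaved_order(num_segments: int) -> list[int]:
    fast = list(range(num_segments // 2))                # outer-zone segments
    slow = list(range(num_segments // 2, num_segments))  # inner-zone segments
    order = []
    for f, s in zip(fast, slow):
        order += [f, s]     # alternate fast and slow physical segments
    return order

print(interleaved_order(8))   # [0, 4, 1, 5, 2, 6, 3, 7]
```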
Local Index

Requirements:
- High insertion speed
- Interactive query response

Index and search mechanisms:

Mechanism         | Search speed | Insert speed
------------------|--------------|-------------
Exhaustive search | No           | Yes
B-tree            | Yes          | No
Hash index        | Yes          | No
Signature index   | Yes          | Yes
Signature Index

- Compress data into a signature, stored separately
- Search the signature, not the data
- Retrieve the data itself only on a match
- Signature algorithm: Bloom filter (sketched below)
  - No false negatives: never misses a result
  - False positives: extra read overhead

[Figure: keys hashed into a Bloom-filter signature stored alongside the data records]
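A minimal Bloom-filter sketch of the idea; the filter size, hash count, and use of SHA-256 are illustrative assumptions, not Hyperion's parameters:

```python
# Sketch: a Bloom filter as a packet-record signature. Membership tests can
# return false positives (extra reads) but never false negatives.
import hashlib

class BloomFilter:
    def __init__(self, num_bits: int = 8192, num_hashes: int = 4):
        self.num_bits, self.num_hashes = num_bits, num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key: bytes):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            h = hashlib.sha256(i.to_bytes(4, "big") + key).digest()
            yield int.from_bytes(h[:8], "big") % self.num_bits

    def add(self, key: bytes):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: bytes) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(key))

sig = BloomFilter()
sig.add(b"10.0.0.1:80")                   # index a flow key
print(sig.might_contain(b"10.0.0.1:80"))  # True (no false negatives)
```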
Signature index efficiency

Overhead = bytes searched = index bytes scanned + data bytes scanned on false positives (a toy cost model follows)

- Concise index: low index scan cost, but many false-positive data scans
- Verbose index: high index scan cost, but few false-positive data scans

[Figure: bytes searched vs. index size, split into index-scan and false-positive (data scan) components]
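A toy cost model for that tradeoff. This is a sketch: the false-positive term is the standard Bloom-filter approximation, and all sizes and names are illustrative.

```python
# Sketch: total bytes searched = index bytes + false-positive data-scan bytes.
# A more verbose index costs more to scan but triggers fewer false-positive
# reads of the underlying data.
import math

def bytes_searched(index_bytes: float, num_keys: int, num_queries: int,
                   scan_bytes_per_fp: float, num_hashes: int = 4) -> float:
    m = index_bytes * 8                              # Bloom filter bits
    fp_rate = (1 - math.exp(-num_hashes * num_keys / m)) ** num_hashes
    return index_bytes + num_queries * fp_rate * scan_bytes_per_fp

for idx_mb in (1, 4, 16):   # concise -> verbose
    total = bytes_searched(idx_mb * 1e6, num_keys=10_000_000,
                           num_queries=100, scan_bytes_per_fp=1e6)
    print(f"{idx_mb:>2} MB index -> {total / 1e6:.1f} MB searched")
```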
Multi-level signature index

- Concise index: low scan overhead
- Verbose index: low false-positive overhead
- Use both (sketched below):
  - Scan the concise index
  - Check its positives against the verbose index
  - Fetch data records only for matches that survive both levels
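A sketch of the two-level lookup, building on the BloomFilter sketch above; the function and parameter names are illustrative:

```python
# Sketch: two-level signature lookup. The small concise filter screens out
# most queries cheaply; the larger verbose filter screens the survivors
# before any data records are read.
def lookup(key: bytes, concise: "BloomFilter", verbose: "BloomFilter",
           fetch_records):
    if not concise.might_contain(key):   # cheap scan eliminates most misses
        return []
    if not verbose.might_contain(key):   # catches most concise false positives
        return []
    return fetch_records(key)            # only now touch the data itself
```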
Distributed Index

Query routing (sketched below):
- Send queries only to nodes holding matches
- Use the signature index

Index distribution:
- Aggregate indexes at a cluster head
- Route queries through the cluster head
- Rotate the cluster head for load sharing
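A sketch of signature-based routing at the cluster head, reusing the BloomFilter sketch above; the node layout and helper names are illustrative assumptions:

```python
# Sketch: the cluster head holds each member's signature and forwards a
# query only to nodes whose signature might contain the key.
def route_query(key: bytes, node_signatures: dict, query_node):
    """node_signatures: node name -> BloomFilter summarizing that node's data;
    query_node(node, key) performs the remote query (hypothetical RPC)."""
    results = []
    for node, signature in node_signatures.items():
        if signature.might_contain(key):   # skip nodes that cannot match
            results.extend(query_node(node, key))
    return results
```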
Implementation

Components:
- StreamFS
- Index
- Capture (Linux kernel)
- RPC, query & index distribution (Python framework)
- Query API

Platform: Linux OS, Python framework

[Figure: Hyperion components — query API and RPC/query/index distribution in the Python framework, above the index, kernel packet capture, and StreamFS]
Experimental Setup

Hardware:
- Linux cluster
- Dual 2.4 GHz Xeon CPUs
- 1 GB memory
- 4 x 10K RPM SCSI disks
- Syskonnect SK98xx + U. Cambridge driver

Test data:
- Packet traces from the UMass Internet gateway (http://traces.cs.umass.edu)
- 400 Mbit/s, 100K pkts/s
StreamFS – write performance

Tested configurations:
- NetBSD / LFS
- Linux / XFS (SGI)
- StreamFS

Workload:
- Multiple streams and rates
- Logfile rotation used for LFS and XFS

Results:
- 50% boost in worst-case throughput
- Fast enough to store 1,000,000 packet headers/s
StreamFS – read/write

Workload: continuous writes plus random reads

Results:
- StreamFS: sustained write throughput
- XFS: throughput collapse

StreamFS can handle combined stream read and write traffic without data loss; XFS cannot.
Index Performance

Signature calculation benchmark:
- Sustains 250,000 pkts/s insertion

Query benchmark:
- 380M packet headers, 26 GB of data
- Highly selective query (1 packet returned)
- Result: 13 MB of data fetched to query 26 GB of data (1:2000)

[Figure: data fetched (MB) vs. index size]
System Performance

Workload:
- Trace replay at 100–200K pkts/s
- Simultaneous queries
- Packet loss measured as #transmitted − #received

Results:

Packets/s | Loss rate
----------|----------
110,000   | 0
130,000   | 0
150,000   | 2·10⁻⁶
160,000   | 4·10⁻⁶
175,000   | 10·10⁻⁶
200,000   | 1·10⁻³

Up to 175K pkts/s with negligible packet loss.
Conclusions

Hyperion: packet monitoring with retrospective queries.

Key components:
- Storage: 50% improvement in worst-case throughput over general-purpose file systems
- Index: inserts at 250K pkts/s; interactive queries over hundreds of millions of packets
- System: captures, indexes, and queries at 175K pkts/s
Questions?