Master’s Thesis (30 credits)

By: Morten Lindeberg
Supervisors: Vera Goebel and Jarle Søberg
Design, Implementation, and Evaluation of Network Monitoring Tasks for the Borealis Stream Processing Engine
Outline
• Problem description
• Application domains
• Data stream management system (DSMS)
• Borealis
• Design
• Experiment Setup
• Network monitoring tasks
• Implementation
• Evaluation
• Conclusion
• Future Work
Problem Description
• Design, Implementation, and Evaluation of
Network Monitoring Tasks for the Borealis
Stream Processing Engine
• Network Monitoring Tasks:
– Task-1: Verify Borealis load shedding mechanisms.
– Task-2: Measure the average load of packets and network
load per second over a one minute interval.
– Task-3: How many packets have been sent to certain ports
during the last five minutes?
– Task-4: How many bytes have been exchanged on each
connection during the last ten seconds?
– Task-5: Identify possible SYN flood attacks
Application Domains
• Network monitoring (push-based)
(Controlling and measuring the Internet or parts of it)
– Challenges
• Traffic volumes
• Get relevant data
• Privacy
– On-line network measurements
• Passive: Our network tasks
• Active: E.g. Traceroute and Ping
– Off-line network measurements
• Passive: E.g. InTraBase (Siekkinen, 2006)
• Active: Pandora FMS (Pandora, 2007)
[Figure labels: N.M (looks at all passing packets), DB]
Cont. Application Domains
• Sensor networks
– TinyDB (pull-based)
• Financial tickers
– Traderbot (push-based)
DSMS
• Stream Data Model
– Definition:
A data stream is a real-time, continuous, ordered
sequence of items (Golab, 2003)
Cont. DSMS
• Requirements
– Continuous query language
– Data reduction techniques
• Sampling
• Load shedding
• Aggregations with window techniques
– Adaptive
– Integration with a traditional database
– Low latency and high throughput
• Notes
– Streaming tuples should only be kept in main memory and never written to disk (too slow).
– Windows are either time-based or tuple-based.
– Window techniques: hopping windows, tumbling windows, overlapping windows.
– Without sliding windows, aggregation would be a blocking operator, since one never sees the whole stream at once.
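The difference between tumbling and overlapping windows can be illustrated with a small C++ sketch. It is not Borealis code: the Tuple layout and the function windowed_sums are hypothetical, and the sketch rescans a finite, already collected prefix of the stream instead of processing tuples incrementally. The parameters size and advance are in seconds; advance equal to size gives tumbling windows, advance smaller than size gives overlapping windows.

```cpp
#include <vector>

// Hypothetical tuple layout: arrival time in seconds and packet size in bytes.
struct Tuple {
    long timestamp;
    long bytes;
};

// Sum the bytes in every window [start, start + size) that is complete
// before end_time. Windows start at 0 and move forward by `advance`.
std::vector<long> windowed_sums(const std::vector<Tuple>& tuples,
                                long size, long advance, long end_time) {
    std::vector<long> sums;
    if (size <= 0 || advance <= 0) {
        return sums;  // Reject parameters for which the loop would not terminate.
    }
    for (long start = 0; start + size <= end_time; start += advance) {
        long sum = 0;
        for (const Tuple& t : tuples) {
            if (t.timestamp >= start && t.timestamp < start + size) {
                sum += t.bytes;
            }
        }
        sums.push_back(sum);
    }
    return sums;
}
```

Each window produces one result as soon as it closes, which is why a windowed aggregate does not block even though the stream never ends.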
Cont. DSMS
• Existing systems:
Name                          Language
TelegraphCQ (Berkeley Uni.)   SQL-like
STREAM (Stanford Uni.)        SQL-like
Aurora (Brown, M.I.T++)       Boxes and arrows
Medusa (Brown, M.I.T++)       Boxes and arrows
Borealis (Brown, M.I.T++)     Boxes and arrows
Gigascope ($ AT&T)            SQL-like
Borealis
• Stream processing engine (SPE)
– Academic research / Public domain
– Distributed queries
– General purpose
• Multi-player first person shooter game
• Network monitoring
• Continuous query language
– Operator boxes and stream arrows
– XML + GUI
– E.g., operators: Map, Aggregate, Join, Filter, Random Drop and operators for integration with statically stored tables
[Figure: distributed query over nodes n1 to n6. Labels: Data stream, Distributed query, High Availability, Result tuples]
Design
Task 1 - Version 1
– Mapping
Task 2 - Version 1
– Average load and packet count
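As a rough illustration of what Task 2 computes, the following C++ sketch derives the average packet rate and the average network load for one interval. It is not the Borealis query: the names LoadAverage and average_load are hypothetical, and it assumes that the sizes of all packets seen in the interval (in bytes) are already available.

```cpp
#include <vector>

// Hypothetical result type for one measurement interval.
struct LoadAverage {
    double packets_per_second;
    double bits_per_second;
};

// Average packet rate and network load over an interval of
// interval_seconds, given the size in bytes of every packet seen in it.
LoadAverage average_load(const std::vector<long>& packet_sizes,
                         long interval_seconds) {
    LoadAverage result = {0.0, 0.0};
    if (interval_seconds <= 0) {
        return result;  // Avoid dividing by zero for an empty interval.
    }
    long total_bytes = 0;
    for (long size : packet_sizes) {
        total_bytes += size;
    }
    result.packets_per_second =
        static_cast<double>(packet_sizes.size()) / interval_seconds;
    result.bits_per_second = total_bytes * 8.0 / interval_seconds;
    return result;
}
```

For the one-minute interval named in the task, interval_seconds would be 60.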
Cont. Design
Task 3 - Version 2
– Port destination count
Task 4 - Version 2
– Exchanged bytes
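The logic behind Task 4 (bytes exchanged per connection during the last ten seconds) can be sketched in C++ as follows. This is an illustration, not the Borealis query: the Packet layout and the function bytes_per_connection are hypothetical, endpoints are written as "ip:port" strings, and both directions of a connection are assumed to count toward the same total.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical packet header tuple.
struct Packet {
    long timestamp;   // arrival time in seconds
    std::string src;  // source endpoint, "ip:port"
    std::string dst;  // destination endpoint, "ip:port"
    long bytes;       // packet size in bytes
};

// A connection is an unordered pair of endpoints, stored in sorted order.
using Connection = std::pair<std::string, std::string>;

// Total bytes per connection for packets with
// now - window_seconds < timestamp <= now.
std::map<Connection, long> bytes_per_connection(
        const std::vector<Packet>& packets, long now, long window_seconds) {
    std::map<Connection, long> totals;
    for (const Packet& p : packets) {
        if (p.timestamp > now - window_seconds && p.timestamp <= now) {
            // Sort the endpoints so both directions share one key.
            Connection key = (p.src < p.dst) ? Connection(p.src, p.dst)
                                             : Connection(p.dst, p.src);
            totals[key] += p.bytes;
        }
    }
    return totals;
}
```

Task 3 follows the same pattern, with the destination port as the key, a count instead of a byte sum, and a window of five minutes.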
Cont. Design
Task 5 - Version 1
– SYN Flood attack (Several hosts initiate half-open connections to a
server so that it has to deny service to others)
– Identifies the relation between the count of SYN packets and
normal packets (Non-SYN). Joins aggregated tuples if SYN count is
twice or more the normal packet count.
Cont. Design
<box name="synfilter" type="filter" >
    <in stream="Packet" />
    <out stream="Syn" />
    <out stream="Normal" />
    <parameter name="expression.0" value="syn == 1" />
    <parameter name="pass-on-false-port" value="1" />
</box>

<box name="Normalcount" type="aggregate" >
    <in stream="Normal" />
    <out stream="Aggregatenormal" />
    <parameter name="aggregate-function.0" value="count()" />
    <parameter name="aggregate-function-output-name.0" value="count" />
    <parameter name="window-size-by" value="VALUES" />
    <parameter name="window-size" value="1" />
    <parameter name="advance" value="1" />
    <parameter name="order-by" value="FIELD" />
    <parameter name="order-on-field" value="timestamp" />
</box>

<box name="Syncount" type="aggregate" >
    <in stream="Syn" />
    <out stream="Aggregatesyn" />
    <parameter name="aggregate-function.0" value="count()" />
    <parameter name="aggregate-function-output-name.0" value="count" />
    <parameter name="window-size-by" value="VALUES" />
    <parameter name="window-size" value="1" />
    <parameter name="advance" value="1" />
    <parameter name="order-by" value="FIELD" />
    <parameter name="order-on-field" value="timestamp" />
</box>

<box name="SynfloodJoin" type="join" >
    <in stream="AggregateNormal" />
    <in stream="AggregateSyn" />
    <out stream="Result" />
    <parameter name="predicate" value="left.count * 2 < right.count and left.count > 0" />
    <parameter name="left-buffer-size" value="1" />
    <parameter name="left-order-by" value="VALUES" />
    <parameter name="left-order-on-field" value="timestamp" />
    <parameter name="right-buffer-size" value="1" />
    <parameter name="right-order-by" value="VALUES" />
    <parameter name="right-order-on-field" value="timestamp" />
    <parameter name="out-field-name.0" value="timestamp" />
    <parameter name="out-field.0" value="left.timestamp" />
    <parameter name="out-field-name.1" value="ratio" />
    <parameter name="out-field.1" value="right.count / left.count" />
    <parameter name="out-field-name.2" value="syn" />
    <parameter name="out-field.2" value="right.count" />
    <parameter name="out-field-name.3" value="normal" />
    <parameter name="out-field.3" value="left.count" />
</box>
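The join predicate and output fields of the SynfloodJoin box can be restated as a small C++ sketch. It only mirrors the expressions in the XML above and is not how Borealis evaluates them: the types WindowCount and SynFloodAlert and the function check_syn_flood are hypothetical, the pairing of one normal-packet aggregate (left) with one SYN aggregate (right) is assumed to have happened already, and the ratio is computed as a floating-point value, which may differ from the arithmetic Borealis applies. Note that the predicate uses a strict comparison, so it fires only when the SYN count is more than twice the normal count.

```cpp
// Hypothetical aggregate tuple: window timestamp and packet count.
struct WindowCount {
    long timestamp;
    long count;
};

// Hypothetical result tuple with the four output fields of the join.
struct SynFloodAlert {
    long timestamp;
    double ratio;
    long syn;
    long normal;
};

// Apply "left.count * 2 < right.count and left.count > 0", where left is
// the normal-packet aggregate and right is the SYN aggregate.
// Returns true and fills `alert` when the predicate holds.
bool check_syn_flood(const WindowCount& normal, const WindowCount& syn,
                     SynFloodAlert& alert) {
    if (normal.count * 2 < syn.count && normal.count > 0) {
        alert.timestamp = normal.timestamp;
        alert.ratio = static_cast<double>(syn.count) / normal.count;
        alert.syn = syn.count;
        alert.normal = normal.count;
        return true;
    }
    return false;
}
```

Because of the left.count > 0 condition, a window that contains only SYN packets produces no result tuple.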
Experiment Setup
• Scripts execute the different stages of each experiment. System resource consumption is logged by the execution scripts.
• TG: Generates traffic and controls the amount of generated traffic per second.
• fyaf: Filters packet headers from the NIC, counts the number of packets retrieved by the C.A, and calculates the number of lost packets.
• C.A (client application): Transforms the packet headers into tuples. I/O to the Q.P.
• Q.P (query processor): Performs the query on the tuples retrieved from the C.A.
Implementation
[Figure labels: fyaf, Client application, Data stream, Query processor (Borealis), <xml-query>, Results]
• Client application main-method:
int main( int argc, const char *argv[] )
{
    ...
    sock = get_connection();
    NOTICE << "Socket opened: " << sock;
    status = marshal.open();
    if ( status )
    {
        WARN << "Could not deploy the network.";
    }
    else
    {
        // Start the timer.
        timer = Time::now();
        // Send the first batch of tuples. Queue up the next round with a delay.
        marshal.sentPacket();
        // Run the client event loop. Return only on an exception.
        marshal.runClient();
    }
    ...
}
Evaluation
Results for Task 1 (the map task)
CPU Maximums
Drop box can lead to increased CPU utilization
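The general idea of a random drop box, discarding each tuple with a fixed probability before it reaches the rest of the query, can be sketched in C++ as below. This is an assumption about the technique in general, not the implementation of the Borealis Random Drop operator: the function random_drop, its drop_rate parameter, and the explicit seed are all hypothetical.

```cpp
#include <random>
#include <vector>

// Keep each tuple with probability (1 - drop_rate).
// drop_rate must lie in [0, 1]; the seed makes a run repeatable.
template <typename T>
std::vector<T> random_drop(const std::vector<T>& tuples, double drop_rate,
                           unsigned seed) {
    std::mt19937 rng(seed);
    std::bernoulli_distribution drop(drop_rate);
    std::vector<T> kept;
    for (const T& t : tuples) {
        if (!drop(rng)) {
            kept.push_back(t);
        }
    }
    return kept;
}
```

The sketch also shows why a drop box is not free: a random decision is made for every incoming tuple, including the ones that are kept.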
Cont. Evaluation
Results for Task 2 - (the simple task)
(Lost packets at different network loads)
40 Mbit/s
Cont. Evaluation
Results for Task 2 - (the simple task)
(Task result - Measured Load)
Ac 98%
Ac 96%
Ac 93%
Cont. Evaluation
Results for Task 3 - Memory Consumption
Static tables cause a slight increase in memory consumption.
Low memory consumption (31 Mbyte), with no change when the load increases.
Cont. Evaluation
Task     Network Load     Memory Consumption
Task 1   30, 40 Mbit/s    31 Mbyte
Task 2   40 Mbit/s        31 Mbyte
Task 3   10, 30 Mbit/s    31, 33 Mbyte
Task 4   20 Mbit/s        31 Mbyte
Task 5   20 Mbit/s        30, 50+ Mbyte
Conclusion
• Borealis supports complex network monitoring queries
• Borealis can handle network loads:
– 40 Mbit/s for simple tasks
– 20 - 30 Mbit/s for complex tasks
– 10 Mbit/s when comparing input packets with several
thousands of statically stored tuples.
• Load Shedding
– Not fully working, does not identify overload situations
– random_drop box does not significantly increase supported
network load
• Low memory consumption
– System code parameters might affect performance
Future Work
• Distribution of queries
• Expand client application (fyaf and load
shedding)
• Optimization of source code system
parameters
• New version of Borealis (Winter 2007)
• Comparison with results from TelegraphCQ
(Søberg, 2006) and STREAM (Hernes, 2006)
Bibliography
• (Søberg, 2006) - Design, implementation, and evaluation
of network monitoring tasks with the TelegraphCQ data
stream management system, Master’s Thesis 2006.
• (Hernes, 2006) - Design, implementation, and evaluation
of network monitoring tasks with the STREAM data stream
management system, Master’s Thesis 2006.
• (Siekkinen, 2006) - Root Cause Analysis of TCP
Throughput: Methodology, Techniques, and Applications,
Dr. Scient. Thesis 2006.
• (Golab, 2003) - Issues in Data Stream Management,
Lukasz Golab and M. Tamer Özsu, 2003.
• (Pandora, 2007) - http://pandora.sourceforge.net