WaveScope: A Signal-Oriented Stream Management System Samuel Madden

WaveScope: A Signal-Oriented
Stream Management System
Samuel Madden
madden@csail.mit.edu
w/ Hari Balakrishnan, Lewis Girod, Yuan Mei, Ryan Newton,
Stanislav Rost, Arvind Thiagarajan
Early Sensor Network Research
• Primarily focused on low-rate, fixed
deployments of primitive devices
• Our focus is on higher rate, mobile apps
[Figure: application space plotted by Mobility against Data Rate. Early Apps (TinyDB) and CarTel are shown alongside WaveScope (100 - 10M Hz), the subject of this talk.]
WaveScope: Pipeline Monitoring
• Pressure sensors measure the frequency response of the pipe to pressure transients.
• Questions:
  • Are there leaks in the pipe?
  • If so, approximately where are they?
  • What is the estimated leak size?
• Data rates: 10 kSamples/channel x 10s - 100s of channels
• Borealis: 15 kSamples/sec
• WaveScope: 7 MSamples/sec
(Embedded hardware ~50 times slower)
• Commercial stream processors?
WaveScope: WiFi Packet Error Correction
• Align packets in time, remove errors, output
error corrected packet stream
[Figure: dataflow diagram for WiFi packet error correction. Inputs: packet streams (~ 5 kPackets/sec) from Receiver 0, Receiver 1, Receiver 2, ..., Receiver N. Operators shown: Union, Window, Group By <PacketGroup>, Estimate Time Offsets, PacketStreamAlign, FilterPair (0,1), (0,2), (0,3), Window Time Join (Time +/- Error), Match Packets, Valid Packet?, CRC Error?, Error Correction. Output: Output Packets.]
WaveScope: Marmots
• Marmot Detection (Marmota flaviventris)
[Figure: 4 channel audio bitstreams (~ 44 kHz) → Fast 1-ch Detection → Temporal Selection → Enhance & Classify → Output]
Questions:
1. Is there current activity (energy) in the frequency band
corresponding to the marmot alarm call?
2. If so, which direction is the call coming from?
3. Is the call that of a male or female? Is it a juvenile?
4. Where is each individual marmot located over time?
Other Applications
• Condition-based maintenance
• Medical monitoring
• CarTel
WaveScope Goal
• Build a stream query processor for these apps
• Common requirements:
• User defined functions (e.g., signal proc., math, stats)
• Relational operations
• Good performance at high rates (100+ kHz) / on
embedded hardware
• Programmability
• Distributability (not today)
• Support for asynchronous data
Why not use…
• [your favorite streaming db] ?
• UDFs for signal processing, etc.
• Problem: “impedance mismatch”
• Conversion to and from external tools
• (At least for Borealis), performance
• Per tuple overheads (timestamps, scheduling, etc.)
• No particular allegiance to SQL
• MATLAB
• Poor support for streaming data
• Poor integration with storage tools
• Poor performance, unoptimizable
WaveScope: Features & Usage
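[Figure: system architecture. The User Console sends a WaveScript Program to Compile & Optimize and receives Errors and the Result. Compile & Optimize stages: Type Inference, Partial Evaluate, Optimize, Generate C; it installs the Query Plan into the WaveScope Engine. WaveScope Engine components: Query Plan, Scheduler, Threads, Memory Manager, with data flowing in and out. Callouts: signal-oriented data model; single language for queries & UDFs; domain specific optimizations; efficient runtime.]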
Data Model
Stream = < Int, String, array[ ], SigSeg >, < >, …, < >   (a sequence of tuples)
• Streams are asynchronous, carry tuples
• SigSeg is the primary windowing construct
  • (Logically) consists of an array of values
  • Isochronously (regularly in time) sampled
  • Windows passed between operators
  • Several possible implementations
• When done right, isochrony enables
  • Efficient merging of streams
  • Quick lookup of historical stream fragments
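A minimal Python sketch of the SigSeg idea above, not WaveScope's implementation. Only the name `SigSeg` comes from the slides; the fields `start`, `rate`, `values` and the methods `time_of` and `index_of` are assumptions made for illustration. Because the samples are isochronous, the segment stores one start time and one sample rate, and each sample's time is computed rather than carried per tuple.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SigSeg:
    """Hypothetical isochronous window of samples (illustration only)."""
    start: float          # time of the first sample, in seconds (assumed unit)
    rate: float           # samples per second
    values: List[float]   # the logical array of sample values

    def time_of(self, i: int) -> float:
        # Timestamps are implicit: derived from the start time and the rate.
        return self.start + i / self.rate

    def index_of(self, t: float) -> int:
        # Inverse mapping: a time maps straight to an array index, no search.
        return round((t - self.start) * self.rate)


# A stream tuple shaped like the slide's < Int, String, array[ ], SigSeg >.
seg = SigSeg(start=10.0, rate=4.0, values=[0, 1, 2, 3, 4, 5, 6, 7])
stream_tuple = (42, "channel-0", [1, 2, 3], seg)
```

The direct time-to-index mapping in `index_of` is what makes lookup of a historical fragment cheap in this sketch.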
WaveScript Language
• Integrated query and custom operator language
• Avoids mediating between external langs
• Provides type safety, optimization
• For queries & UDFs!
• Compiles down to data flow
• 3 main constructs:
• Config - stream definition
• Query - main program
• Subquery - subroutine
Marmot Schematic
[Figure: marmot query schematic in three stages.
Fast 1-ch Detection: Audio0 <audio> → ProfileDetector → <t1,t2,emit>.
Temporal selection: sync over Audio0, Audio1, Audio2, Audio3 <audio> → <w1,w2,w3,w4>.
Enhance and classify full dataset (expensive): Beamformer → <DOA, enhanced>; Classifier → <type, id>; zip → Output <DOA, combined, type, id>.]
WaveScript Example 1
fun profileDetect(S : Stream<SigSeg<int16>>,
                  scorefun, <winsize,step>) {     // implicit types
  wins = rewindow(S, winsize, step);              // window once

  scores : Stream< float >
  scores = iterate(w in wins) {                   // custom ops via "iterate" loops
    emit scorefun(fft(w));
  };

  withscores : Stream<float, SigSeg<int16>>
  withscores = zip2(scores, wins);                // recombination ops
  return threshFilter(withscores);
}
[Figure: dataflow for profileDetect: S → rewindow → iterate → zip2 → threshFilt, with rewindow also feeding zip2]
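The same pipeline can be sketched in Python to make the dataflow concrete. This is an analogy under stated assumptions, not WaveScript semantics: `step` is taken to mean the distance between consecutive window starts, `threshold` is a hypothetical parameter standing in for whatever `threshFilter` uses, and `dft_mag` is a naive DFT used in place of `fft`.

```python
import cmath
from typing import Callable, Iterable, Iterator, List, Tuple


def rewindow(samples: Iterable[float], winsize: int, step: int) -> Iterator[List[float]]:
    """Yield windows of `winsize` samples whose starts are `step` samples apart."""
    buf: List[float] = []
    skip = 0
    for s in samples:
        if skip > 0:                 # drop samples between windows when step > winsize
            skip -= 1
            continue
        buf.append(s)
        if len(buf) == winsize:
            yield list(buf)
            if step >= winsize:
                skip = step - winsize
                buf = []
            else:
                buf = buf[step:]     # keep the overlap for the next window


def dft_mag(window: List[float]) -> List[float]:
    """Naive O(n^2) DFT magnitudes; stands in for fft(w)."""
    n = len(window)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, x in enumerate(window)))
            for k in range(n)]


def profile_detect(samples: Iterable[float],
                   scorefun: Callable[[List[float]], float],
                   winsize: int, step: int,
                   threshold: float) -> Iterator[Tuple[float, List[float]]]:
    for w in rewindow(samples, winsize, step):   # window once
        score = scorefun(dft_mag(w))             # custom per-window operator
        if score > threshold:                    # stands in for threshFilter
            yield score, w                       # score paired with its window, like zip2
```

For example, with a score function that reads the highest-frequency bin, `profile_detect([0, 0, 0, 0, 1, -1, 1, -1], lambda m: m[len(m) // 2], 4, 4, 1.0)` yields only the second window.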
WaveScript Example 2
config {
  Ch0 = AudioSource(0, 48000, 1024);
  Ch1 = AudioSource(1, 48000, 1024);
  Ch2 = AudioSource(2, 48000, 1024);
  Ch3 = AudioSource(3, 48000, 1024);
}
Query {
  // Fast 1-ch detection
  control = profileDetect(Ch0, marmotScore, <64,192>);
  // Temporal selection
  datawindows = sync4(control, Ch0, Ch1, Ch2, Ch3);
  // Enhance and classify full dataset (expensive)
  beam = beamform(datawindows, arrayGeometry);
  marmots = classify(beam.enhanced, marmotClassifier);
  return zip2(beam, marmots);
}
User Defined Functions
• Standard SQL aggregates
• 3 functions: init, merge, output
• Example: average
• Init(A, val) { A.sum = val; A.count = 1; }
• Merge(A1, A2) { A1.sum += A2.sum;
A1.count += A2.count; }
• Output (A) {return A.sum / A.count; }
sub build_aggr(S, init, aggr, out) {
  S2 = iterate (x in S) {
    state { acc = init(); }
    acc := aggr(acc, x);
    emit out(acc);
  }
  return S2;
}
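The init/merge/output pattern can be sketched in Python as follows. This is an illustrative analogue rather than WaveScript: the accumulator is a `(sum, count)` tuple, and the names `avg_init`, `avg_merge`, and `avg_output` are assumptions introduced here. The generator emits the aggregate after every input, mirroring the `iterate` loop with per-operator state.

```python
from typing import Callable, Iterable, Iterator, Tuple

Acc = Tuple[float, int]   # (sum, count), an assumed accumulator layout


def avg_init(val: float) -> Acc:
    return (val, 1)


def avg_merge(a1: Acc, a2: Acc) -> Acc:
    return (a1[0] + a2[0], a1[1] + a2[1])


def avg_output(a: Acc) -> float:
    return a[0] / a[1]


def build_aggr(stream: Iterable[float],
               init: Callable, merge: Callable, output: Callable) -> Iterator[float]:
    """Emit the running aggregate after each input value."""
    acc = None                     # operator state, kept across inputs
    for x in stream:
        acc = init(x) if acc is None else merge(acc, init(x))
        yield output(acc)          # output is applied to the accumulator


running_avg = list(build_aggr([2, 4, 9], avg_init, avg_merge, avg_output))
```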
Query Compilation
• Queries compile to a dataflow graph of union (zip-like) and iterate boxes
• Fancy generic operators all go away
• Each union/iterate contains compiled code
• Intermediate steps type check, expand, and
optimize this graph
• Runtime system executes this query plan
(Planned) Query Optimizations
• Performed during compilation
• Query plan transforms
• Iterate merging
• Common expression elimination
• Operator reordering
• Domain specific rewrites
• Rule-driven
(pattern) -> (replacement)
Rewrite Optimizations
Rewrite rules:
  autocorr(X) ≡ convolve(X,X)
  convolve(X,Y) ≡ IFFT(mult(FFT(X),FFT(Y)))
  FFT(IFFT(X)) ≡ X
Original query:
  S2=autocorr(S1);
  S3=FFT(S2);
After expanding autocorr and convolve:
  S2=IFFT(Mult(FFT(S1),FFT(S1)));
  S3=FFT(S2);
After common sub-expression elimination:
  T1=FFT(S1);
  S2=IFFT(Mult(T1,T1));
  S3=FFT(S2);
After applying FFT(IFFT(X)) ≡ X:
  T1=FFT(S1);
  S3=Mult(T1,T1);
A. Hussain, J. Heidemann, and C. Papadopoulos. A Framework for
Classifying Denial of Service Attacks. SIGCOMM, 2003.
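A rule-driven rewriter of the (pattern) -> (replacement) kind can be sketched in a few lines of Python. This is a toy under stated assumptions, not the WaveScope optimizer: expressions are nested tuples such as `("FFT", "S1")`, the three rules from the slide are hard-coded, and common sub-expression elimination is left out.

```python
def rewrite_once(expr):
    """Apply the three slide rules bottom-up, one pass over the expression."""
    if isinstance(expr, str):              # a stream name such as "S1"
        return expr
    op, *args = expr
    args = [rewrite_once(a) for a in args]
    if op == "autocorr":                   # autocorr(X) -> convolve(X, X)
        (x,) = args
        return ("convolve", x, x)
    if op == "convolve":                   # convolve(X, Y) -> IFFT(mult(FFT(X), FFT(Y)))
        x, y = args
        return ("IFFT", ("mult", ("FFT", x), ("FFT", y)))
    if op == "FFT" and not isinstance(args[0], str) and args[0][0] == "IFFT":
        return args[0][1]                  # FFT(IFFT(X)) -> X
    return (op, *args)


def rewrite(expr):
    """Repeat passes until no rule changes the expression (a fixpoint)."""
    while True:
        new = rewrite_once(expr)
        if new == expr:
            return expr
        expr = new


# S3 = FFT(autocorr(S1)) from the slide, written as a nested tuple.
optimized = rewrite(("FFT", ("autocorr", "S1")))
```

Here `optimized` is the tuple form of mult(FFT(S1), FFT(S1)); sharing the two identical FFT(S1) sub-trees as T1 would be the job of a separate elimination pass.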
Runtime Issues
• Memory management
• Tuples may be copied between operators
• Simplifies branching plans, synchronization
• Can sometimes avoid (more later)
• Can we avoid copying costs for signal data?
• Scheduling / Threading model
• Want to minimize scheduling overhead, memory
utilization
• Benefit of SigSeg based data model
Memory Manager
• Objective: minimize overheads of managing
signal data (SigSegs)
• SigSeg Ops
• Allocate
• Pass between operators
• Append, subseg
• Materialize (into array)
Alternatives
• CopyAlways
  • SigSegs are just arrays, and are copied between ops and on subseg/append
  • Materialization is free
• RefCount-Lazy
  • SigSegs are reference counted pointers to seglists
  • Seglists contain a start time, end time, and list of segments
  • Segments are produced by data source operators
  • Subseg/append/pass are cheap
  • Materialization is costly
[Figure: a SigSeg is a reference counted pointer to a SegList; each list pointer refers to a logical sample range (sub-range) of an underlying data buffer.]
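The trade-off between the two alternatives can be sketched in Python. This is an illustration only: the class names `CopySigSeg` and `LazySigSeg` and the `(buffer, start, end)` segment layout are assumptions, indices are sample offsets rather than times, and Python's own reference counting stands in for the explicit reference counts.

```python
class CopySigSeg:
    """CopyAlways: samples live in a private list; subseg/append copy them."""

    def __init__(self, data):
        self.data = list(data)                      # copy on creation

    def subseg(self, start, length):
        return CopySigSeg(self.data[start:start + length])

    def append(self, other):
        return CopySigSeg(self.data + other.data)

    def materialize(self):
        return self.data                            # already an array: free


class LazySigSeg:
    """RefCount-Lazy: a list of (buffer, start, end) ranges over shared buffers."""

    def __init__(self, segs):
        self.segs = list(segs)                      # copies pointers, not samples

    @classmethod
    def from_buffer(cls, buf):
        return cls([(buf, 0, len(buf))])

    def subseg(self, start, length):
        out, pos, end = [], 0, start + length
        for buf, s, e in self.segs:
            n = e - s
            lo, hi = max(start, pos), min(end, pos + n)
            if lo < hi:                             # this range overlaps the request
                out.append((buf, s + lo - pos, s + hi - pos))
            pos += n
        return LazySigSeg(out)

    def append(self, other):
        return LazySigSeg(self.segs + other.segs)

    def materialize(self):
        out = []
        for buf, s, e in self.segs:                 # the only place samples are copied
            out.extend(buf[s:e])
        return out


buf_a, buf_b = [0, 1, 2, 3], [4, 5, 6, 7]
lazy = LazySigSeg.from_buffer(buf_a).append(LazySigSeg.from_buffer(buf_b)).subseg(2, 4)
eager = CopySigSeg(buf_a).append(CopySigSeg(buf_b)).subseg(2, 4)
```

Both variants materialize to the same samples; the lazy one still points at `buf_a` and `buf_b` until `materialize` is called.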
Memory Manager Evaluation
• Evaluation is on traces of audio data from marmots on 3 GHz P4
• 4 channels x 48 kHz / channel
[Figure: Comparison of Memory Managers on Marmot App. x-axis: Data Batch Size (Per Channel, kB), 0 to 400; y-axis: Throughput, kSamp/sec, 0 to 5000; series: CopyAlways, RefCount-Lazy.]
Scheduler
• Goals:
• Maximize throughput
• Instruction and data cache locality
• Scheduler overheads
• Minimize memory consumption
• Queue lengths
Alternatives
• FIFO-timeslice (TS)
  • Every t seconds, run least-recently run operator
  • Emits copy into output queues
• Run-to-completion (RTC)
  • Choose an op with FIFO, consume all inputs
  • Choose one downstream operator, schedule using RTC
  • Emits copy
  • Repeat
• Depth First
  • Choose an op with FIFO
  • After every emit, invoke 1 downstream op
  • Direct procedure call, no copying
  • Enqueue on other downstream operators
  • Maintain stack for cyclic plans
  • Repeat
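The depth-first policy can be sketched in Python as below. This is a simplified model, not the WaveScope scheduler: the `Op` class and `run_depth_first` are hypothetical names, the plan is assumed to be acyclic (so the stack for cyclic plans is omitted), and "no copying" is modeled by passing the emitted object straight into the downstream call.

```python
from collections import deque


class Op:
    def __init__(self, name, fn):
        self.name = name
        self.fn = fn              # fn(item) -> list of emitted items
        self.downstream = []      # operators fed by this one
        self.queue = deque()      # pending inputs, used only when enqueued


def run_depth_first(source, items):
    """Run an acyclic plan depth first; return the order operators ran in."""
    trace = []
    ready = deque()               # FIFO of operators with queued input

    def invoke(op, item):
        trace.append(op.name)
        for out in op.fn(item):
            if not op.downstream:
                continue
            first, *others = op.downstream
            for other in others:  # enqueue on the other downstream operators
                other.queue.append(out)
                if other not in ready:
                    ready.append(other)
            invoke(first, out)    # direct procedure call: no queue, no copy

    for item in items:
        invoke(source, item)
        while ready:              # choose the next operator in FIFO order
            op = ready.popleft()
            while op.queue:
                invoke(op, op.queue.popleft())
    return trace


# A small branching plan: src feeds a and b, and both feed sink.
results = []
src = Op("src", lambda x: [x + 1])
a = Op("a", lambda x: [x * 2])
b = Op("b", lambda x: [x * 10])
sink = Op("sink", lambda x: results.append(x) or [])
src.downstream = [a, b]
a.downstream = [sink]
b.downstream = [sink]
trace = run_depth_first(src, [1])
```

In this run the item travels src, a, sink by direct calls before the queued branch through b is serviced.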
Discussion
• FIFO-timeslice (TS)
• Run-to-completion (RTC)
• Depth First
• Expect FIFO-TS, RTC to have good instruction cache
locality
• Depth first has good data locality, avoids copies
• Admission control (limiting input rate) matters
Scheduler Evaluation
• Same setup as before, with RefCount-Lazy
• Copying kills non-DepthFirst
• Benefit highly plan dependent
• Looking at multiprocessor/multicore scheduling
SigSeg Evaluation
• Windowing
• No windows
• Windows, but per tuple timestamps
• SigSegs
• Pipelined hash join vs. syncs
• Sync-n extracts time ranges from n input signals
based on times in control stream
• Outputs new stream with n SigSegs
• Sync can exploit isochrony to directly offset to
time range of interest
• Join must compare timestamps
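The difference between the two extraction strategies can be sketched in Python. This is a toy comparison under stated assumptions: `join_extract`, `sync_extract`, and `sync_n` are hypothetical names, times are in seconds, and each isochronous channel is described by a `(start, rate, values)` triple.

```python
import math


def join_extract(samples, t0, t1):
    """Timestamped samples: every (t, value) pair must be compared."""
    return [v for (t, v) in samples if t0 <= t < t1]


def sync_extract(start, rate, values, t0, t1):
    """Isochronous samples: the index range is computed directly from the times."""
    i0 = max(0, math.ceil((t0 - start) * rate))
    i1 = min(len(values), math.ceil((t1 - start) * rate))
    return values[i0:i1]


def sync_n(channels, t0, t1):
    """Extract the same time range from n channels given as (start, rate, values)."""
    return [sync_extract(start, rate, values, t0, t1)
            for (start, rate, values) in channels]


values = list(range(100))                               # 10 s of data at 10 Hz
stamped = [(i / 10, v) for i, v in enumerate(values)]   # per-tuple timestamps
by_join = join_extract(stamped, 2.0, 2.5)
by_sync = sync_extract(0.0, 10.0, values, 2.0, 2.5)
```

Both return the same five samples, but `join_extract` touches every tuple while `sync_extract` does two multiplications and a slice.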
SigSeg Performance
[Figure: bar charts of Throughput (x1000 Samples/Second), axis 0 to 8000. Marmot (no sync): Segs, Windows, Samples. Marmot (sync): Segs+Sync, Windows+Join, Samples+Join.]
• Factor of 2 from eliminating timestamps
• Factor of 4 from windowing
• Exploiting ordered, regularly spaced time domain in sync is a huge win
Status
• Built single node prototype system
• Planned deployments:
• Pipeline monitoring in London
• Antbird tracking in Mexico
• Working on distributed & multicore
extensions
Conclusions
• WaveScope: high rate stream/signal processor
• Features
• SigSeg based data model exploits isochrony
• WaveScript integrated query / UDF language
• Semantic optimizations
• Efficient runtime
• Coming soon
• Distribution
• Multi-core optimizations
Pipeline App
input = rewindow(sensorsource('pressure'), 8192, 500);
first <peak,rest> = trimpeak(haarwavelet(input, 4));
second<peak,rest> = trimpeak(first.rest);
third <peak,rest> = trimpeak(second.rest);
fourth<peak,rest> = trimpeak(third.rest);
BASE ← leakdetect(zip2(second,fourth));