WaveScope: A Signal-Oriented Stream Management System
Samuel Madden (madden@csail.mit.edu)
w/ Hari Balakrishnan, Lewis Girod, Yuan Mei, Ryan Newton, Stanislav Rost, Arvind Thiagarajan

Early Sensor Network Research
• Primarily focused on low-rate, fixed deployments of primitive devices
• Our focus is on higher-rate, mobile apps
[Figure: applications plotted by mobility vs. data rate: early apps (TinyDB), CarTel, and WaveScope (100 Hz to 10 MHz), the subject of this talk]

WaveScope: Pipeline Monitoring
• Pressure sensors measure the frequency response of the pipe to pressure transients
• Questions:
  • Are there leaks in the pipe?
  • If so, approximately where are they?
  • What is the estimated leak size?
• Data rates: 10 kSamples/channel x 10s - 100s of channels
• Can commercial stream processors keep up? Borealis: 15 kSamples/sec; WaveScope: 7 MSamples/sec (embedded hardware ~50 times slower)

WaveScope: WiFi Packet Error Correction
• Align packets in time, remove errors, output an error-corrected packet stream
[Diagram: packet streams from receivers 0..N (~5 kPackets/sec) feed PacketStreamAlign (estimate time offsets, group by window, union into <PacketGroup>); packets are split into valid vs. CRC-error packets, joined on time +/- a window, matched across receiver pairs (FilterPair (0,1), (0,2), (0,3)), error corrected, and output]

WaveScope: Marmots
• Marmot detection (Marmota flaviventris)
• 4-channel audio bitstreams (~44 kHz): fast 1-channel detection, temporal selection, enhance & classify, output
• Questions:
  1. Is there current activity (energy) in the frequency band corresponding to the marmot alarm call?
  2. If so, which direction is the call coming from?
  3. Is the call that of a male or female? Is it a juvenile?
  4. Where is each individual marmot located over time?

Other Applications
• Condition-based maintenance
• Medical monitoring
• CarTel

WaveScope Goal
• Build a stream query processor for these apps
• Common requirements:
  • User-defined functions (e.g., signal processing, math, stats)
  • Relational operations
  • Good performance at high rates (100+ kHz) / on embedded hardware
  • Programmability
  • Distributability (not today)
  • Support for asynchronous data

Why not use…
• [your favorite streaming db]?
  • UDFs for signal processing, etc.
  • Problem: "impedance mismatch"
    • Conversion to and from external tools
  • (At least for Borealis) performance
    • Per-tuple overheads (timestamps, scheduling, etc.); a rough illustration follows this slide
  • No particular allegiance to SQL
• MATLAB?
  • Poor support for streaming data
  • Poor integration with storage tools
  • Poor performance, unoptimizable
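A rough illustration of the per-tuple overhead point above (not from the talk; the types, names, and numbers below are invented): carrying a timestamp per sample and doing work tuple-at-a-time, versus handing an operator one window of raw samples whose timestamps are implied by a start time and a sample rate.

// Hypothetical C++ sketch: one second of 48 kHz audio, processed
// tuple-at-a-time (a timestamped tuple per sample) vs. as one window of
// raw samples. All names here are invented for illustration.
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

struct Tuple {                 // tuple-at-a-time engines carry metadata per sample
    int64_t timestamp_us;
    float   value;
};

// Per-tuple path: one unit of work (plus 8 bytes of timestamp) per sample;
// in a real engine, queueing and scheduling costs would also sit here.
float sum_per_tuple(const std::vector<Tuple>& tuples) {
    float sum = 0;
    for (const Tuple& t : tuples) sum += t.value;
    return sum;
}

// Windowed path: one call per window of plain samples; timestamps are implicit
// from the window's start time and sample rate, which is what a SigSeg-style
// window enables.
float sum_window(const float* samples, std::size_t n) {
    float sum = 0;
    for (std::size_t i = 0; i < n; ++i) sum += samples[i];
    return sum;
}

int main() {
    const std::size_t rate = 48000;                 // one second of audio
    std::vector<Tuple> tuples(rate);
    std::vector<float> window(rate);
    for (std::size_t i = 0; i < rate; ++i) {
        tuples[i] = { int64_t(i) * 1000000 / int64_t(rate), 1.0f };
        window[i] = 1.0f;
    }
    std::printf("per-tuple sum = %.1f, windowed sum = %.1f\n",
                sum_per_tuple(tuples), sum_window(window.data(), window.size()));
    return 0;
}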
WaveScope: Features & Usage
• Signal-oriented data model
• Single language (WaveScript) for queries & UDFs
• Domain-specific optimizations
• Efficient runtime
[Diagram: the user console submits a WaveScript program; Compile & Optimize (type inference, partial evaluation, optimization, C code generation) installs a query plan into the WaveScope Engine (scheduler, threads, memory manager), which consumes data and returns results and errors to the console]

Data Model
• Stream = <Int, String, array[], SigSeg>, <...>, … : a sequence of tuples
• Streams are asynchronous, carry tuples
• SigSeg is the primary windowing construct
  • (Logically) consists of an array of values
  • Isochronously (regularly in time) sampled
  • Windows passed between operators
  • Several possible implementations
• When done right, isochrony enables
  • Efficient merging of streams
  • Quick lookup of historical stream fragments

WaveScript Language
• Integrated query and custom operator language
  • Avoids mediating between external languages
  • Provides type safety, optimization
  • For queries & UDFs!
• Compiles down to a dataflow graph
• 3 main constructs:
  • Config: stream definition
  • Query: main program
  • Subquery: subroutine

Marmot Schematic
[Diagram: Audio0..Audio3 <audio> feed a sync operator producing <w1,w2,w3,w4>; a ProfileDetector <t1,t2,emit> performs fast 1-channel detection on Audio0 and drives the temporal selection; the synced windows then go through the expensive enhance-and-classify stage: Beamformer <DOA, enhanced>, Classifier <type, id>, and a zip producing the output <DOA, combined, type, id>]

WaveScript Example 1
fun profileDetect(S : Stream<SigSeg<int16>>, scorefun, <winsize,step>) {
    wins = rewindow(S, winsize, step);
    scores : Stream<float>
    scores = iterate(w in wins) {
        emit scorefun(fft(w));
    };
    withscores : Stream<float, SigSeg<int16>>
    withscores = zip2(scores, wins);
    return threshFilter(withscores);
}
• Implicit types
• Window once (rewindow)
• Custom ops via "iterate" loops
• Recombination ops (zip2)
• Dataflow: S -> rewindow -> iterate -> zip2 -> threshFilter

WaveScript Example 2 (fast 1-channel detection + temporal selection)
config {
    Ch0 = AudioSource(0, 48000, 1024);
    Ch1 = AudioSource(1, 48000, 1024);
    Ch2 = AudioSource(2, 48000, 1024);
    Ch3 = AudioSource(3, 48000, 1024);
}
Query {
    control     = profileDetect(Ch0, marmotScore, <64,192>);
    datawindows = sync4(control, Ch0, Ch1, Ch2, Ch3);
    beam        = beamform(datawindows, arrayGeometry);
    marmots     = classify(beam.enhanced, marmotClassifier);
    return zip2(beam, marmots);
}

User Defined Functions
• Standard SQL aggregates
  • 3 functions: init, merge, output
  • Example: average
    Init(A, val)  { A.sum = val; A.count = 1; }
    Merge(A1, A2) { A1.sum += A2.sum; A1.count += A2.count; }
    Output(A)     { return A.sum / A.count; }
• A streaming aggregate as a WaveScript subquery:
sub build_aggr(S, init, aggr, out) {
    S2 = iterate (x in S) {
        state { acc = init(); }
        acc := aggr(acc, x);
        emit out(acc);
    }
    return S2;
}
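To make the iterate/emit pattern concrete outside of WaveScript, here is a minimal C++ sketch (my own names and types, not part of WaveScope) of an operator that keeps per-operator state and emits one output per input, in the same init/aggr/out style as build_aggr above.

// Hypothetical C++ analogue of a WaveScript "iterate" box: private state that
// persists across inputs, an aggr step, and an emit of the projected result.
#include <cstdio>
#include <functional>

template <typename In, typename Out, typename State>
struct IterateBox {
    State state;                               // plays the role of "state { acc = init(); }"
    std::function<State(State, In)> aggr;      // folds one input into the state
    std::function<Out(const State&)> out;      // projects the state to an output

    template <typename Emit>
    void push(In x, Emit emit) {               // called once per input tuple
        state = aggr(state, x);
        emit(out(state));                      // emit the running aggregate downstream
    }
};

struct Avg { double sum = 0; long count = 0; };  // init() corresponds to Avg{}

int main() {
    IterateBox<double, double, Avg> avg{
        Avg{},
        [](Avg a, double v) { a.sum += v; a.count += 1; return a; },
        [](const Avg& a)    { return a.sum / a.count; }
    };
    double inputs[] = {1.0, 2.0, 3.0, 4.0};
    for (double v : inputs)
        avg.push(v, [](double r) { std::printf("running avg = %.2f\n", r); });
    return 0;
}

Each call prints the running average (1.00, 1.50, 2.00, 2.50); the merge step of the SQL-aggregate formulation would only be needed when combining partial aggregates.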
Query Compilation
• Queries compile to a dataflow graph of union (zip-like) and iterate boxes
• Fancy generic operators all go away
• Each union/iterate contains compiled code
• Intermediate steps type check, expand, and optimize this graph
• The runtime system executes this query plan

(Planned) Query Optimizations
• Performed during compilation
• Query plan transforms
  • Iterate merging
  • Common expression elimination
  • Operator reordering
• Domain-specific rewrites
  • Rule-driven: (pattern) -> (replacement)

Rewrite Optimizations
S2 = autocorr(S1); S3 = FFT(S2);
  [apply autocorr(X) ≡ convolve(X,X) and convolve(X,Y) ≡ IFFT(mult(FFT(X),FFT(Y)))]
S2 = IFFT(Mult(FFT(S1),FFT(S1))); S3 = FFT(S2);
  [common sub-expression elimination]
T1 = FFT(S1); S2 = IFFT(Mult(T1,T1)); S3 = FFT(S2);
  [apply FFT(IFFT(X)) ≡ X]
T1 = FFT(S1); S3 = Mult(T1,T1);
(Example from A. Hussain, J. Heidemann, and C. Papadopoulos. A Framework for Classifying Denial of Service Attacks. SIGCOMM, 2003.)

Runtime Issues
• Memory management
  • Tuples may be copied between operators
    • Simplifies branching plans, synchronization
    • Can sometimes be avoided (more later)
  • Can we avoid copying costs for signal data?
• Scheduling / threading model
  • Want to minimize scheduling overhead and memory utilization
  • Benefit of the SigSeg-based data model

Memory Manager
• Objective: minimize the overheads of managing signal data (SigSegs)
• SigSeg ops:
  • Allocate
  • Pass between operators
  • Append, subseg
  • Materialize (into an array)

Alternatives
• CopyAlways
  • SigSegs are just arrays, and are copied between ops and on subseg/append
  • Materialization is free
• RefCount-Lazy
  • SigSegs are reference-counted pointers to seglists
  • A seglist contains a start time, an end time, and a logical list of pointers to sub-ranges (sample ranges) of underlying data buffers
  • Segments are produced by data source operators
  • Subseg/append/pass are cheap
  • Materialization is costly

Memory Manager Evaluation
• Evaluation is on traces of audio data from marmots, on a 3 GHz P4
• 4 channels x 48 kHz/channel
[Plot: comparison of memory managers on the marmot app; throughput (kSamples/sec) vs. data batch size per channel (kB), CopyAlways vs. RefCount-Lazy]

Scheduler
• Goals:
  • Maximize throughput
    • Instruction and data cache locality
    • Scheduler overheads
  • Minimize memory consumption
    • Queue lengths

Alternatives
• FIFO-timeslice (TS)
  • Every t seconds, run the least-recently run operator
  • Emits copy into output queues
• Run-to-completion (RTC)
  • Choose an op with FIFO, consume all its inputs
  • Choose one downstream operator, schedule it using RTC
  • Emits copy
  • Repeat
• Depth First (sketched below, after the discussion)
  • Choose an op with FIFO
  • After every emit, invoke one downstream op
    • Direct procedure call, no copying
    • Enqueue on the other downstream operators
  • Maintain a stack for cyclic plans
  • Repeat

Discussion
• Expect FIFO-TS and RTC to have good instruction cache locality
• Depth First has good data locality and avoids copies
• Admission control (limiting the input rate) matters
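As promised above, a small sketch of the Depth First idea. This is an assumed structure for illustration only (Op, Item, and the queueing details are invented, not the actual WaveScope scheduler): after an emit, the first downstream operator is invoked by direct procedure call with no copy, while any other consumers get the item enqueued for later.

// Hypothetical sketch of Depth First dispatch (not the WaveScope scheduler):
// each emit directly calls one downstream operator; other consumers of the
// same output see it later via their input queues.
#include <cstddef>
#include <cstdio>
#include <deque>
#include <vector>

struct Item { int window_id; };

struct Op {
    const char* name;
    std::vector<Op*> downstream;
    std::deque<Item> queue;                    // filled only for the non-direct consumers

    void process(const Item& in) {
        std::printf("%s handling window %d\n", name, in.window_id);
        emit(in);                              // this sketch just passes items through
    }

    void emit(const Item& out) {
        for (std::size_t i = 0; i < downstream.size(); ++i) {
            if (i == 0)
                downstream[i]->process(out);         // direct procedure call, no copy into a queue
            else
                downstream[i]->queue.push_back(out); // enqueue for later scheduling
        }
    }
};

int main() {
    Op classify{"classify"}, log_op{"log"};
    Op detect{"detect", {&classify, &log_op}};
    detect.process(Item{42});                  // runs detect -> classify depth-first
    std::printf("log has %zu queued item(s)\n", log_op.queue.size());
    return 0;
}

A real scheduler would also drain the leftover queues and, as the slide notes, keep a stack to handle cyclic plans; the point here is only the no-copy direct call on the hot path.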
Scheduler Evaluation
• Same setup as before, with RefCount-Lazy
• Copying kills the non-Depth-First schedulers
• Benefit is highly plan dependent
• Looking at multiprocessor/multicore scheduling

SigSeg Evaluation
• Windowing
  • No windows
  • Windows, but per-tuple timestamps
  • SigSegs
• Pipelined hash join vs. sync
  • Sync-n extracts time ranges from n input signals based on times in a control stream
  • Outputs a new stream with n SigSegs
  • Sync can exploit isochrony to directly offset to the time range of interest (see the sketch at the end of this deck)
  • Join must compare timestamps

SigSeg Performance
[Plot: throughput (x1000 samples/sec) for the marmot app without sync (Segs, Windows, Samples) and with sync (Segs+Sync, Windows+Join, Samples+Join)]
• Factor of 2 from eliminating timestamps
• Factor of 4 from windowing
• Exploiting the ordered, regularly spaced time domain in sync is a huge win

Status
• Built a single-node prototype system
• Planned deployments:
  • Pipeline monitoring in London
  • Antbird tracking in Mexico
• Working on distributed & multicore extensions

Conclusions
• WaveScope: a high-rate stream/signal processor
• Features
  • SigSeg-based data model exploits isochrony
  • WaveScript integrated query / UDF language
  • Semantic optimizations
  • Efficient runtime
• Coming soon
  • Distribution
  • Multi-core optimizations

Pipeline App
input = rewindow(sensorsource(‘pressure’), 8192, 500);
first <peak,rest>  = trimpeak(haarwavelet(input, 4));
second<peak,rest>  = trimpeak(first.rest);
third <peak,rest>  = trimpeak(second.rest);
fourth<peak,rest>  = trimpeak(third.rest);
BASE ← leakdetect(zip2(second,fourth));
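The sketch referenced from the SigSeg Evaluation slide: an illustration (types and names are assumptions, not the WaveScope API) of why isochrony lets sync offset directly to a time range, while a timestamp-per-sample representation has to compare timestamps one by one, as a time join does.

// Hypothetical sketch (not the WaveScope API): extracting samples for the
// time range [t_lo, t_hi) from an isochronous segment by arithmetic, versus
// scanning per-sample timestamps.
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <utility>
#include <vector>

struct SigSegSketch {              // isochronous: start time + rate imply every timestamp
    double start_time;             // seconds
    double rate;                   // samples per second
    std::vector<float> samples;
};

// Direct offset: O(1) index computation, no stored or compared timestamps.
std::pair<std::size_t, std::size_t>
range_by_offset(const SigSegSketch& s, double t_lo, double t_hi) {
    double lo_f = (t_lo - s.start_time) * s.rate;
    double hi_f = (t_hi - s.start_time) * s.rate;
    std::size_t lo = lo_f <= 0 ? 0 : static_cast<std::size_t>(lo_f);
    std::size_t hi = hi_f <= 0 ? 0 : static_cast<std::size_t>(hi_f);
    return { std::min(lo, s.samples.size()), std::min(hi, s.samples.size()) };
}

// Timestamp-per-sample alternative (shown for contrast, not called below):
// every sample carries a timestamp and the extraction compares them one by one.
struct Sample { double t; float v; };
std::vector<float> range_by_scan(const std::vector<Sample>& in, double t_lo, double t_hi) {
    std::vector<float> out;
    for (const Sample& s : in)
        if (s.t >= t_lo && s.t < t_hi) out.push_back(s.v);
    return out;
}

int main() {
    SigSegSketch seg{10.0, 48000.0, std::vector<float>(48000, 0.0f)};
    std::pair<std::size_t, std::size_t> r = range_by_offset(seg, 10.25, 10.50);
    std::printf("offset range: [%zu, %zu)\n", r.first, r.second);   // [12000, 24000)
    return 0;
}

The factors of 2 and 4 reported on the SigSeg Performance slide come from exactly this difference: dropping per-sample timestamps and operating window-at-a-time instead of sample-at-a-time.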