West Coast Spectrometer Team
Mark Wagner, Berkeley project manager, FPGA designer
Terry Filiba, data transport: FPGA --> CPU --> GPU
Suraj Gowda, boosting FFT/FPGA clock speed
Glenn Jones, digital downconverter design (Caltech)
Guifre Molera, 10 Gbit Ethernet protocol, GUPPI mods
Gregory Desvignes, GUPPI code modifications
Simon Scott, systems integration, arriving March 26
Hong Chen, FFT optimizations (bit growth, unscrambler)
Billy Mallard, DSP library optimizations (DSP48, etc.)
Andrew Siemion, galactic center pulsar application
Dan Werthimer, taking credit for above work

Observing Modes: 250 MHz GPU

Overall Block Diagram

Roach Motel (Roach Nest) (KAT)

Roach I vs Roach II
• Roach I works well. Deployed at many observatories.
• Roach II doesn't exist yet. Prototypes in spring. Production in winter?
• Roach I resources are tight: harder to get working at high speed, hard to add features; 500 MHz with 8K channels won't fit.
• Roach II can use SFP+ connectors, a more reliable 10GbE connector.
• Plan: develop and test using Roach I. Decide later.

3 GS/s ADC Board

Polyphase Filter Bank FPGA Spectrometer – Mark Wagner

FPGA DDC/Packetizer (Mark Wagner)
(extract sub-band(s) and send to GPU)

Automated Placement (Suraj Gowda)
64-channel spectrometer, 2 GHz bandwidth, with Xilinx place/route
1024-channel spectrometer, 3 GHz BW, with Suraj Gowda's scripts for autoplacement
"Automated Placement for Parallelized FPGA FFTs", Suraj Gowda et al., 2011

                        No placement constraint   Placement constrained using our algorithm
Processable bandwidth   < 2.4 GHz                 > 3 GHz
Compile time            80:19 minutes             38:22 minutes

Existing CASPER DDC/Decimation Filter
Quarter-band filter for 8 real inputs
Input: 8*Fclk real samples per second (BW = 4*Fclk), as parallel samples x0 x1 x2 x3 x4 x5 x6 x7
Each input is multiplied by a complex sinusoid
Polyphase filter components: E0(z) E1(z) E2(z) E3(z) E4(z) E5(z) E6(z) E7(z)
The filter outputs are summed (+)
Output: Fclk complex samples per second (BW = Fclk): … y3 y2 y1 y0

Half Band DDC/Filter - Glenn Jones
Input: 8*Fclk real samples per second (BW = 4*Fclk)
Inputs are paired, multiplied, and filtered by four polyphase components:
x0, x4 --> E0(z)
x1, x5 --> E1(z)
x2, x6 --> E2(z)
x3, x7 --> E3(z)
Filter outputs: v0 v4, v1 v5, v2 v6, v3 v7
Two adders (+) produce the output streams … y2 y0 and … y3 y1
Output: 2*Fclk complex samples per second (BW = 2*Fclk)

SPEAD packet - FPGA 10GbE (Guifre Molera)

Preliminary Design Work
• Concentrating on the hard parts
  – 3 GS/s sampling and PFB/FFT calculations
  – Heterogeneous Computing Approach
    • Divide processing into front/back ends
    • Use FPGAs to fully process bandwidths greater than 250 MHz
    • Use FPGA front-ends to pre-process, split and packetize data, then GPUs to provide fine channelization on narrower chunks
  – Software Design
    • Adapting code from the Green Bank Ultimate Pulsar Processing Instrument (GUPPI)

Pulsars at the Galactic Center ??
100's of pulsars predicted in the central pc, none discovered - Macquart, Frail, Ransom, Bower..
Map gravitational field (timing), ISM at GC, black hole spin?
Extreme scattering smears out the pulse
High-frequency observation to minimize scattering
High bandwidth needed at high frequency (low flux)
800 MHz 8 GHz

Worries
• Speed of FPGA: difficult, time-consuming layouts, perhaps impossible for the full Roach I chip
• Lots of modes: time consuming (design/test/software/document)
• Will everything fit in Roach I? Use Roach II?
• Might lose Mark Wagner in September
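
Sketch: quarter-band DDC as a polyphase decimator
The quarter-band DDC slide describes mixing 8 parallel real inputs with a complex sinusoid, filtering them through polyphase components E0(z)..E7(z), and summing to get one complex sample per FPGA clock. The Python below is a minimal software sketch of that idea, not the CASPER block itself. It assumes a decimation factor of 8, a Hamming-windowed sinc low-pass, and a particular mapping of samples to branches. All function names (lowpass_taps, mix_down, fir_then_decimate, polyphase_decimate) are hypothetical and chosen for illustration. It shows that the 8-branch polyphase form gives the same output as filtering at the full rate and then keeping every 8th sample.

```python
import cmath
import math


def lowpass_taps(n_taps, cutoff):
    """Hamming-windowed sinc low-pass; cutoff in cycles per input sample.
    (Assumed filter design; the slides do not specify the taps.)"""
    mid = (n_taps - 1) / 2
    taps = []
    for k in range(n_taps):
        t = k - mid
        if t == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * k / (n_taps - 1))
        taps.append(ideal * window)
    return taps


def mix_down(x, f_center):
    """Multiply real samples by a complex sinusoid so that f_center
    (cycles per input sample) is shifted to DC."""
    return [s * cmath.exp(-2j * math.pi * f_center * n) for n, s in enumerate(x)]


def fir_then_decimate(x, h, m):
    """Reference form: FIR at the input rate, keeping every m-th output."""
    y = []
    for n in range(0, len(x), m):
        acc = 0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * x[n - k]
        y.append(acc)
    return y


def polyphase_decimate(x, h, m):
    """Polyphase form: m branch filters E_p(z), each running at the
    output rate, with the branch outputs summed."""
    branches = [h[p::m] for p in range(m)]  # E_p taps: h[p], h[p+m], ...
    n_out = (len(x) + m - 1) // m
    y = []
    for r in range(n_out):
        acc = 0
        for p, e_p in enumerate(branches):
            for q, tap in enumerate(e_p):
                idx = (r - q) * m - p  # input sample feeding branch p
                if idx >= 0:
                    acc += tap * x[idx]
        y.append(acc)
    return y


# Usage: a real tone slightly above the band centre, decimated by 8.
M = 8                                   # assumed: 8 parallel inputs per clock
taps = lowpass_taps(64, 1 / 16)         # passband of about +/- 1/16 cycle per sample
tone = [math.cos(2 * math.pi * 0.26 * n) for n in range(256)]
mixed = mix_down(tone, 0.25)
reference = fir_then_decimate(mixed, taps, M)
polyphase = polyphase_decimate(mixed, taps, M)
worst = max(abs(a - b) for a, b in zip(reference, polyphase))
print(len(polyphase), "output samples, max difference", worst)
```

In hardware the appeal of the polyphase form is that each branch only has to run at Fclk, which is why the slide draws one filter component per parallel input. The half-band variant would keep twice the output rate with half as many branches, under the same assumptions.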