ISTC-EC @ Cornell Accelerating Belief Propagation in Hardware Skand Hurkat and José Martínez Computer Systems Laboratory Cornell University http://www.csl.cornell.edu/ The Cornell Team ISTC-EC @ Cornell • Prof. José Martínez (PI), Prof. Rajit Manohar @ Computer Systems Lab • Prof. Tsuhan Chen @ Advanced Multimedia Processing Lab • MS/Ph.D. students – Yuan Tian, MS ’13 – Skand Hurkat – Xiaodong Wang The Cornell Graph ISTC-EC @ Cornell Rob Rutenbar UIUC Intel Center Eriko Nurvithadi Ithaca Limor Fix Ravi Iyer Oregon Omesh Tickoo Boris Ginsburg Haifa Cornell's ISTC-EC Graph Carlos Guestrin Washington Ronny Ronen Yoav Talgam CMU James Hoe The Cornell Project ISTC-EC @ Cornell • Provide hardware accelerators for belief propagation algorithms on embedded SoCs (retail/car/home/mobile) – High speed Graph – Very low power – Self-optimizing – Highly programmable Inference Algorithm BP Accelerator within SoC Result What is belief propagation? ISTC-EC @ Cornell Belief propagation is a message passing algorithm for performing inference on graphical models, such as Bayesian networks or Markov Random Fields What is belief propagation? ISTC-EC @ Cornell • • • • Labelling problem Energy as a measure of convergence Minimize energy (MAP label estimation) Exact results for trees – Converges in exactly two iterations • Approximate results for graphs with loops – Yields “good” results in practice • Minimum over large neighbourhoods • Close to optimal solution Not all “that” alien to embedded ISTC-EC @ Cornell Remember the Viterbi algorithm? • Used extensively in digital communications 𝑠0 𝑠11 𝑠21 𝑠31 𝑠41 𝑠𝑠12 1 𝑠𝑠22 2 𝑠𝑠32 3 𝑠𝑠42 4 𝑠13 𝑠23 𝑠33 𝑠43 𝑠5 What does this mean? ISTC-EC @ Cornell • Every mobile device uses Viterbi decoders – Error correction codes (eg: turbo codes) – Mitigating inter-symbol interference (ISI) • Increasing number of mobile applications involve belief propagation – More general belief propagation accelerators can greatly improve user experience with mobile devices Target markets ISTC-EC @ Cornell Retail/Car/Home/Mobile • Image processing – – – – De-noising Segmentation Object detection Gesture recognition • Handwriting recognition – Improved recognition through context identification • Speech recognition – Hidden Markov models are key to speech recognition Servers • Data mining tasks – Part-of-speech tagging – Information retrieval – “Knowledge graph” like applications • Machine learning based tasks – Constructive machine learning – Recommendation systems • Scientific computing – Protein structure inference Hardware accelerator for BP ISTC-EC @ Cornell Inference Algorithm Graph BP Accelerator within SoC Result Work done so far ISTC-EC @ Cornell Software • General purpose MRF inference library – Support for arbitrary graphs – Floating point math – Parallel techniques for faster inference • Library optimized for grid graphs – Optimized data structures – Template can use any data type – Multiple inference techniques optimized for early vision – Stereo matching in ∼200 ms Hardware • High level synthesis of message update unit – Vivado HLS (C-to-gates) tool used to synthesize message update unit on ZedBoard – ∼2x improvement in inference speed on CPU+FPGA compared to CPU-only inference – Fixed point math • GraphGen collaboration – On-going work – Stereo matching task mapped to multiple platforms – ∼10x speedup on GPU w.r.t. CPU only implementation Work done so far ISTC-EC @ Cornell Software • General purpose MRF inference library – Support for arbitrary graphs – Floating point math – Parallel techniques for faster inference • Library optimized for grid graphs – Optimized data structures – Template can use any data type – Multiple inference techniques optimized for early vision – Stereo matching in ∼200 ms Hardware • High level synthesis of message update unit – Vivado HLS (C-to-gates) tool used to synthesize message update unit on ZedBoard – ∼2x improvement in inference speed on CPU+FPGA compared to CPU-only inference – Fixed point math • GraphGen collaboration – On-going work – Stereo matching task mapped to multiple platforms – ∼10x speedup on GPU w.r.t. CPU only implementation Work done so far ISTC-EC @ Cornell Software • General purpose MRF inference library – Support for arbitrary graphs – Floating point math – Parallel techniques for faster inference • Library optimized for grid graphs – Optimized data structures – Template can use any data type – Multiple inference techniques optimized for early vision – Stereo matching in ∼200 ms Hardware • High level synthesis of message update unit – Vivado HLS (C-to-gates) tool used to synthesize message update unit on ZedBoard – ∼2x improvement in inference speed on CPU+FPGA compared to CPU-only inference – Fixed point math • GraphGen collaboration – On-going work – Stereo matching task mapped to multiple platforms – ∼10x speedup on GPU w.r.t. CPU only implementation Hierarchical belief propagation ISTC-EC @ Cornell Results – Stereo Matching ISTC-EC @ Cornell Comparing inference algorithms on “Tsukuba” benchmark E n e r g y 480000 475000 470000 465000 460000 455000 450000 445000 440000 14000000 12000000 10000000 8000000 6000000 4000000 2000000 0 U p d a t e s Updates Energy Work done so far ISTC-EC @ Cornell Software • General purpose MRF inference library – Support for arbitrary graphs – Floating point math – Parallel techniques for faster inference • Library optimized for grid graphs – Optimized data structures – Template can use any data type – Multiple inference techniques optimized for early vision – Stereo matching in ∼200 ms Hardware • High level synthesis of message update unit – Vivado HLS (C-to-gates) tool used to synthesize message update unit on ZedBoard – ∼2x improvement in inference speed on CPU+FPGA compared to CPU-only inference – Fixed point math • GraphGen collaboration – On-going work – Stereo matching task mapped to multiple platforms – ∼10x speedup on GPU w.r.t. CPU only implementation Work done so far ISTC-EC @ Cornell Software • General purpose MRF inference library – Support for arbitrary graphs – Floating point math – Parallel techniques for faster inference • Library optimized for grid graphs – Optimized data structures – Template can use any data type – Multiple inference techniques optimized for early vision – Stereo matching in ∼200 ms Hardware • High level synthesis of message update unit – Vivado HLS (C-to-gates) tool used to synthesize message update unit on ZedBoard – ∼2x improvement in inference speed on CPU+FPGA compared to CPU-only inference – Fixed point math • GraphGen collaboration – On-going work – Stereo matching task mapped to multiple platforms – ∼10x speedup on GPU w.r.t. CPU only implementation GraphGen synthesis of BP-M ISTC-EC @ Cornell • BP-M update (logspace messages) implemented using GraphGen (Intel/CMU/UW) • GPU implementation ∼10x faster than CPU based implementation • On-going work on FPGA based implementation and on implementing hierarchical update Cornell Publications (2013 only) ISTC-EC @ Cornell • • • • 3x Comp. Vision & Pattern Recognition (CVPR) 3x Asynchronous VLSI (ASYNC) 2x Intl. Symp. Computer Architecture (ISCA) 1x Intl. Conf. Image Processing (ICIP) • 1x ASPLOS (w/ GraphGen folks, under review) Year 3 Plans ISTC-EC @ Cornell • GraphGen extensions for BP applications – Multiple inference techniques • Extraction of “BP ISA” – Ops on arbitrary graphs – Efficient representation • Amplification work on UAV ensembles – Self-optimizing, collaborative SoCs • One-day “graph” workshop with GraphGen+UIUC ISTC-EC @ Cornell Accelerating Belief Propagation in Hardware Skand Hurkat and José Martínez Computer Systems Laboratory Cornell University http://www.csl.cornell.edu/