FPGA in the Medical Field

Houffaneh Osman
halio029@uottawa.ca

Single Instruction, Multiple Data (SIMD)

• Part of Flynn's taxonomy of computer architectures
• Multiple processors
  ◦ Different data streams
  ◦ Same instruction executed
• Able to operate on multiple data items at the same time
• Computation in the least possible time on
  ◦ Vectors
  ◦ Matrices
• Better speedup than sequential execution
• Two types of processors
  ◦ True SIMD
  ◦ Pipelined SIMD

Pipelined SIMD
• Divide an instruction into smaller functions
• Execute the smaller functions in parallel on different data

True SIMD: Distributed Memory
• Single control unit
• M processing elements act as arithmetic units
• N data elements (possibly more than M)
• Processing elements receive instructions from the control unit
• If a processing element needs information from another processing element
  ◦ It sends a request to the control unit, which manages the memory exchange

True SIMD: Shared Memory
• Single control unit
• M processing elements act as arithmetic units
• N data elements (possibly more than M)
• Processing elements receive instructions from the control unit
• Processing elements can share their memory without going through the control unit

Example architecture: IBM Cell BE
• The Cell Broadband Engine (CBE)
  ◦ Single-chip multiprocessor with 9 processors (1 PPE and 8 SPEs)
  ◦ All processors share the same main storage
• The processors fall into 2 functional types
  ◦ PowerPC Processor Element (PPE)
  ◦ Synergistic Processor Element (SPE)
• VMX: Vector Multimedia eXtension to the PowerPC architecture
  ◦ Exploits data parallelism for faster performance

SIMD in VMX and SPE (Reference: IBM Cell Programming)
  ◦ 128-bit-wide datapath
  ◦ 128-bit-wide registers
  ◦ 4-wide fullwords, 8-wide halfwords, 16-wide bytes
  ◦ SPE also supports 2-wide doublewords

Vector Programming
• Each of the 4 elements of VA is added to the corresponding element of VB, and the sums are placed in VC:
  VC = vec_add(VA,VB)
• SIMD-unprocessable patterns
  ◦ Cases where the instruction differs for each processing element
• SIMD-processable patterns
  ◦ Cases where the instruction is the same for each processing element
• Register view of the add instruction from the previous slide:
  VC = vec_add(VA,VB)
• Permute (shuffle) operation
  ◦ Selects elements from two vectors
  ◦ A third vector acts as the control vector
  VT = vec_perm(VA,VB,VC)

• SSE: Streaming SIMD Extensions
  ◦ Instruction set extension to the x86 architecture
  ◦ 128-bit extension
• Introduced in 1999 with the Pentium III
  ◦ Latest version: SSE5, before its revision
• Future extension from Intel
  ◦ AVX: Advanced Vector Extensions
  ◦ 256-bit instructions

• Image processing
• Digital signal processing
• Encoding
• Streaming load

• Streaming load instruction
  ◦ Enables faster reads
  ◦ Improves the performance of applications that use both the GPU and the CPU

• SIMD improves encoding speed
  ◦ The required arithmetic is performed on each pixel
• The pixels of a video call for a high level of parallelism

[Figure: Matrix multiplication, no data parallelism]
[Figure: Matrix multiplication, with data parallelism]

Native vs. Traditional Programming
• Auto-vectorization
  ◦ Detects low-level operations that can be done in parallel
  ◦ Converts sequential code to process from 2 up to 16 elements in one operation
• Auto-parallelization
  ◦ Turns sequential code into multithreaded code
• Intel C++ Compiler
  ◦ Serial sections of the input program -> multithreaded code
  ◦ The compiler also tries to keep the overhead of creating threads from outweighing the gain
• Intel® Architecture Code Analyzer
• PGI CDK Cluster Development Kit
  ◦ AMD Opteron
  ◦ Intel Core 2
• GNU compiler for C and C++
  ◦ Nested loops and conditions
  ◦ Multidimensional arrays
• PGI CDK Cluster Development Kit
  ◦ SSE vectorization

• ArBB was developed to utilize
  ◦ Multi-core processors
  ◦ Graphics processing units
• Takes advantage of the SIMD and core processing elements
• Portions of C/C++ code that have parallelism can be used in conjunction with ArBB
• Isolates data objects from the rest of the code
  ◦ Intel notes that this imposes restrictions
  ◦ The restrictions eliminate locks and data races
• Threading by itself
  ◦ Does not provide access to per-core vector parallelism
• The ArBB API provides programming models at the software level for developers

References
Intel Press, "Multi-Core Programming: Increasing Performance through Software Multithreading," pp. 2-6, 11-13, Apr. 2006.
Intel Corp., "Intel C++ Compiler 8.1 for Linux," 2004, pp. 1-9. Internet: ftp://download.intel.com/support/performancetools/c/linux/sb/clin81_relnotes.pdf [Oct. 24, 2010].
Linux Kernel Organization, "Cell Programming Primer: Basics of SIMD Programming," Documents of PS3 Linux Distributor's Starter Kit, 2006, 2007, 2008. Internet: http://www.kernel.org/pub/linux/kernel/people/geoff/cell/ps3-linux-docs/ps3-linuxdocs-08.06.09/CellProgrammingTutorial/BasicsOfSIMDProgramming.html [Oct. 24, 2010].
C. Chen, R. Raghavan, J. Dale, E. Iwata, "Cell Broadband Engine Architecture and its first implementation," Oct. 2005. Internet: http://www.ibm.com/developerworks/power/library/pacellperf/ [Oct. 24, 2010].
H. Chang, C. Cho, S. Wonyong, "Performance Evaluation of an SIMD Architecture with a Multi-bank Vector Memory Unit," in IEEE Workshop on Signal Processing Systems Design and Implementation (SIPS '06), Oct. 2006, pp. 1520-6130.
GCC GNU Project, "Auto-vectorization in GCC," Aug. 2010. Internet: http://gcc.gnu.org/projects/tree-ssa/vectorization.html [Oct. 24, 2010].
Intel Software Network, "Performance Tools for Software Developers - Auto parallelization and /Qpar-threshold," Jul. 2009. Internet: http://software.intel.com/enus/articles/performance-tools-for-software-developers-auto-parallelization-andqpar-threshold/ [Oct. 24, 2010].
National Instruments, "Programming Strategies for Multicore Processing: Data Parallelism," Nov. 2008. Internet: http://zone.ni.com/devzone/cda/tut/p/id/6421 [Oct. 24, 2010].
A. Lanterman, "Multicore and GPU Programming for Video Games: Developing Code for Cell - SIMD," Fall 2010. Internet: http://users.ece.gatech.edu/~lanterma/mpg09/ [Oct. 24, 2010].
R. Michael Hord, Parallel Supercomputing in SIMD Architectures. Boca Raton, FL: CRC Press, c1990.
IBM Corp. and Sony Computer Entertainment, "Software Development Kit for Multicore Acceleration Version 3.0: Data Parallelism," Nov. 2008. Internet: http://users.ece.gatech.edu/~lanterma/mpg09/CBE_Programming_Tutorial_v3.0.pdf [Oct. 24, 2010].
IBM Corp. and Sony Computer Entertainment, "Software Development Kit for Multicore Acceleration (Version 3)," 2006, 2007. Internet: http://users.ece.gatech.edu/~lanterma/mpg09/CBE_Programming_Tutorial_v3.0.pdf [Oct. 24, 2010].
J. Demmel, "A closer look at parallel architectures: Lecture 9," Feb. 1996. Internet: http://www.eecs.berkeley.edu/~demmel/cs267/lecture09/lecture09.html [Oct. 24, 2010].
S. Morse, Practical Parallel Computing. Boston: AP Professional, c1994.
C. Leopold, Parallel and Distributed Computing: A Survey of Models, Paradigms and Approaches. New York: Wiley, 2001.
L. Dong-hwan, S. Wonyong, "Importance of SIMD computation reconsidered," in Proceedings of the International Parallel and Distributed Processing Symposium, Apr. 2003, p. 8.
W.C. Meilander, J.W. Baker, M. Jin, "Performance Evaluation of an SIMD Architecture with a Multi-bank Vector Memory Unit," in IEEE Workshop on Signal Processing Systems Design and Implementation (SIPS '06), Oct. 2006, pp. 1520-6130.
http://www.gamasutra.com/view/feature/4248/designing_fast_crossplatform_simd_.php
http://domino.watson.ibm.com/comm/research.nsf/pages/r.arch.simd.html
Intel Array Building Blocks: http://software.intel.com/en-us/articles/intel-arraybuilding-blocks/
http://www.wolfire.com/
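The pipelined SIMD slide splits an instruction into smaller functions that run in parallel on different data. A minimal C sketch of the resulting cycle count, assuming every stage takes exactly one cycle and the pipeline never stalls; the function names are hypothetical and do not come from any library.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical timing model: an instruction split into `stages` sub-functions,
 * each taking one cycle, applied to `items` data elements. */

/* Without pipelining, each item passes through every stage before the next starts. */
unsigned long sequential_cycles(unsigned long stages, unsigned long items) {
    return stages * items;
}

/* With pipelining, a new item enters stage 0 every cycle, so after a fill time
 * of (stages - 1) cycles one result completes per cycle. */
unsigned long pipelined_cycles(unsigned long stages, unsigned long items) {
    assert(stages > 0);
    if (items == 0)
        return 0;
    return stages - 1 + items;
}

/* Print which data item occupies each stage on every cycle (-1 = empty stage). */
void print_pipeline(int stages, int items) {
    assert(stages > 0 && items >= 0);
    int total = (int)pipelined_cycles((unsigned long)stages, (unsigned long)items);
    for (int cycle = 0; cycle < total; cycle++) {
        printf("cycle %2d:", cycle);
        for (int s = 0; s < stages; s++) {
            int item = cycle - s;  /* item i reaches stage s at cycle i + s */
            printf(" %3d", (item >= 0 && item < items) ? item : -1);
        }
        printf("\n");
    }
}
```

For 4 stages and 100 data items this model gives 400 cycles without pipelining and 103 with it; calling print_pipeline(4, 6) shows the different stages working on different items in the same cycle.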
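The distributed-memory variant of true SIMD can be mimicked with a toy model: one control unit broadcasts an instruction to M processing elements, each working on its own private memory, and any transfer between elements is carried out by the control unit. Everything here (NUM_PE, LOCAL_WORDS, ControlUnit, cu_broadcast_add, cu_exchange_from_right) is a hypothetical illustration rather than a real machine interface, and the "simultaneous" loops are of course executed sequentially.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_PE 4       /* M processing elements (hypothetical size) */
#define LOCAL_WORDS 8  /* words of private memory per PE (hypothetical) */

/* Each processing element owns private memory it alone can access. */
typedef struct {
    int mem[LOCAL_WORDS];
} ProcessingElement;

/* The single control unit drives all PEs and counts the exchanges it performs. */
typedef struct {
    ProcessingElement pe[NUM_PE];
    unsigned long transfers;
} ControlUnit;

/* Broadcast "mem[dst] = mem[a] + mem[b]": every PE executes the same
 * instruction on its own data (conceptually at the same time). */
void cu_broadcast_add(ControlUnit *cu, size_t dst, size_t a, size_t b) {
    assert(dst < LOCAL_WORDS && a < LOCAL_WORDS && b < LOCAL_WORDS);
    for (size_t p = 0; p < NUM_PE; p++) {
        ProcessingElement *e = &cu->pe[p];
        e->mem[dst] = e->mem[a] + e->mem[b];
    }
}

/* Every PE requests mem[src] of its right neighbour, PE (p + 1) mod NUM_PE.
 * The PEs cannot read each other's memory, so the control unit does the copy. */
void cu_exchange_from_right(ControlUnit *cu, size_t dst, size_t src) {
    assert(dst < LOCAL_WORDS && src < LOCAL_WORDS);
    int snapshot[NUM_PE];
    for (size_t p = 0; p < NUM_PE; p++)   /* read every source before writing */
        snapshot[p] = cu->pe[p].mem[src];
    for (size_t p = 0; p < NUM_PE; p++) {
        cu->pe[p].mem[dst] = snapshot[(p + 1) % NUM_PE];
        cu->transfers++;
    }
}
```

In the shared-memory variant, cu_exchange_from_right would not be needed: each element could read its neighbour's memory directly, without a request to the control unit.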
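The vec_add behaviour described under Vector Programming (four 32-bit fullwords added in one operation) can be reproduced portably with the GCC/Clang vector extension. This is a model, not the VMX API itself: v4si, vec_add_model and add_arrays are hypothetical names, and on PowerPC the real intrinsic is vec_add from <altivec.h>.

```c
#include <assert.h>

/* GCC/Clang vector extension: 16 bytes = four 32-bit "fullwords",
 * the same width as one 128-bit VMX/SPE register. */
typedef int v4si __attribute__((vector_size(16)));

/* Model of VC = vec_add(VA, VB): one operation adds all four element pairs. */
static inline v4si vec_add_model(v4si va, v4si vb) {
    return va + vb;
}

/* Apply the 4-wide add across whole arrays; n must be a multiple of 4. */
void add_arrays(const int *a, const int *b, int *c, int n) {
    assert(n % 4 == 0);
    for (int i = 0; i < n; i += 4) {
        v4si va = {a[i], a[i + 1], a[i + 2], a[i + 3]};
        v4si vb = {b[i], b[i + 1], b[i + 2], b[i + 3]};
        v4si vc = vec_add_model(va, vb);
        for (int k = 0; k < 4; k++)   /* copy the four sums back out */
            c[i + k] = vc[k];
    }
}
```

An 8-element array therefore needs two vector additions instead of eight scalar ones.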
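For VT = vec_perm(VA,VB,VC), a scalar reference model helps show what the control vector does: VA and VB are treated as one 32-byte table, and the low 5 bits of each control byte select the table byte for that position of the result. The sketch assumes byte i of each array is element i of the vector and uses the hypothetical name vec_perm_model; it is meant for checking results, not for speed.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of VT = vec_perm(VA, VB, VC) on 16-byte vectors. */
void vec_perm_model(const uint8_t va[16], const uint8_t vb[16],
                    const uint8_t vc[16], uint8_t vt[16]) {
    uint8_t table[32];
    for (int i = 0; i < 16; i++) {
        table[i] = va[i];       /* indices 0..15 select bytes of VA */
        table[16 + i] = vb[i];  /* indices 16..31 select bytes of VB */
    }
    for (int i = 0; i < 16; i++)
        vt[i] = table[vc[i] & 0x1F];  /* only the low 5 bits of the control byte count */
}
```

With a control vector of 0, 16, 1, 17, ..., 7, 23, for example, the result interleaves the first eight bytes of VA and VB.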
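On x86, the SSE counterpart of vec_add is _mm_add_ps from <xmmintrin.h>, which adds four single-precision floats held in one 128-bit register. The sketch uses it when the compiler defines __SSE__ and otherwise falls back to plain C, so it also builds on non-x86 machines; add_floats is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */
#endif

/* c[i] = a[i] + b[i]; four floats per SSE instruction when SSE is available. */
void add_floats(const float *a, const float *b, float *c, size_t n) {
    assert(n == 0 || (a != NULL && b != NULL && c != NULL));
    size_t i = 0;
#if defined(__SSE__)
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);  /* unaligned 128-bit loads */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(c + i, _mm_add_ps(va, vb));
    }
#endif
    for (; i < n; i++)  /* leftover elements, or all of them without SSE */
        c[i] = a[i] + b[i];
}
```

The scalar tail loop handles lengths that are not a multiple of 4, a detail every hand-written SIMD loop has to deal with.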
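The encoding bullets note that the same arithmetic is applied to every pixel. As a hypothetical example of such a per-pixel step, this sketch converts RGB to luma using integer weights 77, 150 and 29 (they sum to 256 and approximate 0.299, 0.587 and 0.114). It processes 8 pixels per operation in 16-bit lanes, the "8-wide halfwords" layout listed for VMX and SPE. It relies on the GCC/Clang vector extension; rgb_to_luma and v8hu are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* Eight unsigned 16-bit lanes = one 128-bit register. */
typedef uint16_t v8hu __attribute__((vector_size(16)));

/* y = (77*r + 150*g + 29*b) >> 8 for 8 pixels at a time; n must be a multiple of 8. */
void rgb_to_luma(const uint8_t *r, const uint8_t *g, const uint8_t *b,
                 uint8_t *y, int n) {
    assert(n % 8 == 0);
    const v8hu wr = {77, 77, 77, 77, 77, 77, 77, 77};
    const v8hu wg = {150, 150, 150, 150, 150, 150, 150, 150};
    const v8hu wb = {29, 29, 29, 29, 29, 29, 29, 29};
    const v8hu shift = {8, 8, 8, 8, 8, 8, 8, 8};
    for (int i = 0; i < n; i += 8) {
        v8hu vr = {0}, vg = {0}, vb = {0};
        for (int k = 0; k < 8; k++) {  /* widen 8-bit pixels into 16-bit lanes */
            vr[k] = r[i + k];
            vg[k] = g[i + k];
            vb[k] = b[i + k];
        }
        /* largest sum is 255 * 256 = 65280, so no 16-bit lane overflows */
        v8hu vy = (vr * wr + vg * wg + vb * wb) >> shift;
        for (int k = 0; k < 8; k++)
            y[i + k] = (uint8_t)vy[k];
    }
}
```

Because the weights sum to 256, a gray pixel (r = g = b) keeps its value after the shift.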
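The two matrix multiplication figures contrast a version without data parallelism and one that uses it. A sketch of both, assuming n x n row-major float matrices with n a multiple of 4: the data-parallel version broadcasts one element of A and multiplies it by four adjacent elements of a row of B, so each vector multiply-add updates four entries of C. The GCC/Clang vector extension stands in for SIMD registers; the function names are hypothetical, and this is one common scheme, not necessarily the one drawn in the original figures.

```c
#include <assert.h>

typedef float v4sf __attribute__((vector_size(16)));  /* four floats per vector */

/* No data parallelism: C = A * B computed one element at a time. */
void matmul_scalar(const float *A, const float *B, float *C, int n) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            float sum = 0.0f;
            for (int k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

/* Data parallelism: C[i][j..j+3] are accumulated together in one vector. */
void matmul_simd(const float *A, const float *B, float *C, int n) {
    assert(n % 4 == 0);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j += 4) {
            v4sf acc = {0.0f, 0.0f, 0.0f, 0.0f};
            for (int k = 0; k < n; k++) {
                float a = A[i * n + k];
                v4sf va = {a, a, a, a};  /* broadcast A[i][k] */
                const float *row = &B[k * n + j];
                v4sf vb = {row[0], row[1], row[2], row[3]};
                acc += va * vb;  /* four multiply-adds in one vector operation */
            }
            for (int t = 0; t < 4; t++)
                C[i * n + j + t] = acc[t];
        }
}
```

Both functions add the products in the same order of k, so for small integer-valued inputs they give identical results.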
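Auto-vectorization works best on simple loops. The first function is the kind of loop GCC's vectorizer targets: unit stride, no function calls and no dependence between iterations, with restrict telling the compiler that x and y do not overlap. The second has a loop-carried dependence and cannot simply be run several iterations at a time. Compiling with, for example, gcc -O3 -fopt-info-vec-optimized asks GCC to report the loops it vectorized; whether a given loop is vectorized still depends on the compiler version and target. The function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* saxpy: y[i] = alpha * x[i] + y[i]; independent iterations, a typical
 * auto-vectorization candidate. */
void saxpy(size_t n, float alpha, const float *restrict x, float *restrict y) {
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; i++)
        y[i] = alpha * x[i] + y[i];
}

/* Running sum: y[i] needs the freshly updated y[i - 1], so iterations
 * cannot simply be executed 4 or 8 at a time. */
void prefix_sum(size_t n, float *y) {
    assert(n == 0 || y != NULL);
    for (size_t i = 1; i < n; i++)
        y[i] += y[i - 1];
}
```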
