Introduction to CBEA SDK
Veselin Dikov
Moscow-Bavarian Joint Advanced Student School, 19-29 March 2006, Moscow, Russia

Outline
● Getting started
● SPU Language Extensions
● SPE Library
● SDK Libraries
● Remote Procedure Calls

Getting started
● Two executable formats: ppu vs. spu
    Two different compilers: ppu-gcc and spu-gcc
    Check the format of an executable:
        # file <executable>
● Makefile headers
    Reside in the SDK root directory
    make.header, make.footer, make.env

Getting started
● Makefile - ppu

    # Subdirectories ###############
    DIRS := spu
    # Target #######################
    PROGRAM_ppu := simple
    # Local Defines ################
    IMPORTS := spu/lib_simple_spu.a \
               -lspe
    # imports the embedded simple_spu library;
    # this allows consolidation of the spu program into the ppe binary
    # make.footer ##################
    include ../make.footer
    # make.footer is in the top of the SDK

● Makefile - spu

    # Target #######################
    PROGRAMS_spu := simple_spu
    # created embedded library
    LIBRARY_embed := lib_simple_spu.a
    # Local Defines ################################
    IMPORTS = $(SDKLIB_spu)/libc.a
    # make.footer ################################
    include ../../make.footer
    # make.footer is in the top of the SDK

SPU Language Extensions
● Provides SPU functionality
● Extended RISC instruction set
● Extended data types: vector
● C/C++ intrinsic routines
● Memory Flow Control (MFC)

SIMD Vectorization
● vector - reserved as a C/C++ keyword
● 128 bits in size: 16 bytes, or four 32-bit words, etc.
● A single instruction acts on the entire vector

    vector unsigned int vec    = (vector unsigned int)(1,2,3,4);
    vector unsigned int v_ones = (vector unsigned int)(1);
    vector unsigned int vdest  = spu_add(vec, v_ones);

SIMD Vectorization
● Overloaded C/C++ operators
    sizeof() (= 16 for any vector)
    operator=
    operator&
● Various intrinsic functions for vectors
    Arithmetic - spu_add, spu_mul, etc.
    Logical - spu_and, spu_nor, etc.
    Comparison - spu_cmpeq
    Scalar - spu_extract, spu_insert, spu_promote
    Etc.

MFC routines
● Local Store (LS) vs. Effective Address (EA)
● Data transport via DMA
    Up to 16,384 bytes per DMA transfer
    Asynchronous
    Part of the instruction set, exposed as C/C++ intrinsics

    /* 1st argument: LS address, 2nd argument: EA address */
    mfc_get(&data, addr + 16384*i, 16384, 20, 0, 0);

● What is __attribute__ ((aligned (128)))?

MFC example
● ppu main file (manual alignment)

    int main()
    {
        …
        /* the array needs to be aligned on a 128-byte cache line */
        data = (int *) malloc(127 + DATASIZE*sizeof(int));
        /* advance until the low 7 address bits are zero */
        while (((unsigned long) data) & 0x7f) ++data;
        …
        spe_create_thread(gid, &sample_spu, data, NULL, -1, 0);
        …
    }

MFC example
● ppu main file (using malloc_align)

    int main()
    {
        …
        /* the array needs to be aligned on a 128-byte cache line */
        /* second argument is log2 of the alignment: 2^7 = 128 */
        data = (int *) malloc_align(DATASIZE*sizeof(int), 7);
        …
        spe_create_thread(gid, &sample_spu, data, NULL, -1, 0);
        …
    }

MFC example
● spu main file

    int databuf[DATASIZE] __attribute__ ((aligned (128)));

    int main(void * arg)
    {
        int *data = &databuf[0];
        …
        /* start the DMA transfer from EA arg into LS, using tag 20 */
        mfc_get(data, arg, DATASIZE*sizeof(int), 20, 0, 0);
        /* select tag 20 and block until the transfer completes */
        mfc_write_tag_mask(1<<20);
        mfc_read_tag_status_all();
        …
    }
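
The two constraints from the MFC slides, 128-byte alignment of the buffer and at most 16,384 bytes per DMA transfer, can be sketched in portable C that runs on any host. This is only an illustration under stated assumptions: the names alloc_dma_aligned, dma_chunk_count, dma_chunk_size, DMA_ALIGN and DMA_MAX_BYTES are hypothetical and not part of the SDK, and no MFC intrinsic is called. Unlike the ++data loop in the ppu example, the helper keeps the pointer returned by malloc() so the buffer can be freed later.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DMA_ALIGN     128u    /* cache-line alignment used on the slides */
#define DMA_MAX_BYTES 16384u  /* largest single DMA transfer on the slides */

/* Hypothetical helper (not an SDK function): returns a pointer aligned to
 * DMA_ALIGN with room for `size` bytes. The pointer obtained from malloc()
 * is stored in *raw; pass that one to free(), not the aligned pointer. */
void *alloc_dma_aligned(size_t size, void **raw)
{
    *raw = malloc(size + DMA_ALIGN - 1);
    if (*raw == NULL)
        return NULL;
    /* round the address up to the next multiple of DMA_ALIGN */
    uintptr_t addr = ((uintptr_t)*raw + DMA_ALIGN - 1)
                     & ~(uintptr_t)(DMA_ALIGN - 1);
    return (void *)addr;
}

/* Number of transfers of at most DMA_MAX_BYTES needed to move `size` bytes. */
size_t dma_chunk_count(size_t size)
{
    return (size + DMA_MAX_BYTES - 1) / DMA_MAX_BYTES;
}

/* Length in bytes of chunk i (0-based); 0 if i is past the last chunk. */
size_t dma_chunk_size(size_t size, size_t i)
{
    size_t offset = i * DMA_MAX_BYTES;
    if (offset >= size)
        return 0;
    size_t rest = size - offset;
    return rest < DMA_MAX_BYTES ? rest : DMA_MAX_BYTES;
}
```

For a 40,000-byte buffer, dma_chunk_count gives 3 and the last chunk is 7,232 bytes. On the SPU side, each chunk would presumably be requested with one mfc_get call at offset i * 16384, as in the addr + 16384*i expression on the MFC routines slide.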
SPE Library
● Provides PPU functionality
● Two sets of functions
    Thread management - POSIX-like
    MFC access functions - access to mailboxes
● Header: <libspe.h>

Thread management
● Threads are organized in thread groups
    Thread scheduling: SCHED_RR, SCHED_FIFO, SCHED_OTHER
    Priority: RR and FIFO - 1 to 99; OTHER - 0 to 99
    [Figure: Thread Group 0 and Thread Group 1, each containing threads Th0, Th1, Th2, ..., ThN]

Thread management
● Threads are organized in thread groups

    spe_gid_t gid;        // group handle
    speid_t speids[1];    // thread handle
    int status[1];        // exit status of the thread

    // Create an SPE group
    gid = spe_create_group(SCHED_OTHER, 0, 1);
    if (gid == NULL) exit(-1);    // failed

    // allocate the SPE task
    speids[0] = spe_create_thread(gid, &sample_spu, 0, 0, -1, 0);
    if (speids[0] == NULL) exit(-1);

    // wait for the single SPE to complete
    spe_wait(speids[0], &status[0], 0);

Thread management
● spe_get_affinity, spe_set_affinity
    8-bit mask specifying the SPE units on which the thread may be started
● spe_get_ls
    Gets access to the SPU's local store
    Dirty!!
    Not supported by all operating systems
    Used for RPC communication
● spe_get_context, spe_set_context, spe_get_ps

Thread management
● Two ways of communicating with threads
    via signals
    via mailboxes
● POSIX-like signals
    spe_get_event - check for signals from a thread
    spe_kill - send a signal to a thread
    spe_write_signal - write a signal through the MFC
● MFC mailboxes
    spe_read_out_mbox, spe_write_in_mbox
    spe_stat_in_mbox, spe_stat_out_mbox, spe_stat_out_intr_mbox

SDK Libraries
● The SDK comes with various applied libraries

    Library name | Short description | PPE | SPE
    C Library | Standard C99 functionality; POSIX.1 functions | x | x
    Audio Resample Library | Audio resampling functionality for PPE and SPE | x | x
    Curves and Surfaces Library | Quadratic and cubic Bezier curves; biquadric and bicubic Bezier surfaces; curved point-normal triangles | x | x
    FFT Library | 1-D FFT and kernel functions for 2-D FFT | x | x
    Game Math Library | Math routines applicable to game performance needs | x | x
    Image Library | Routines for processing images: convolutions and histograms | x | x
    Large Matrix Library | Basic linear algebra routines on large vectors and matrices |   | x
    Math Library | General-purpose math routines tuned to exploit SIMD | x | x
    Matrix Library | Routines for operations on 4x4 matrices and quaternions | x | x
    Misc Library | General-purpose routines that do not logically fit within any other library | x | x
    Multi-Precision Math Library | Operations on unsigned integers with a large number of bits |   | x
    Noise Library | 1-, 2-, 3-, 4-D noise; lattice and non-lattice noise; turbulence | x | x
    Oscillator Libraries | Definition of sound sources | x | x
    Simulation Library | Functionality related to the Full-Simulator | - | -
    Sync Library | Synchronization primitives, such as atomic operations and mutexes | x | x
    Vector Library | General-purpose routines that operate on vectors | x | x

Example
● Solving Ax = y

    typedef union {
        vector float fv;
        vector unsigned int uiv;
        unsigned int ui[4];
        float f[4];
    } v128;
    typedef vector float floatvec;
    …
    v128 A[4] = {(floatvec)(0.9501,0.8913,0.8214,0.9218),
                 (floatvec)(0.2311,0.7621,0.4447,0.7382),
                 (floatvec)(0.6068,0.4565,0.6154,0.1763),
                 (floatvec)(0.4860,0.0185,0.7919,0.4057)};
    v128 invA[4];
    v128 x, y;
    x.fv = (floatvec)(1, 5, 6, 7);
    y.fv = (floatvec)(0);

    /* from the Matrix library */
    inverse_matrix4x4(&invA[0].fv, &A[0].fv);
    /* from the LargeMatrix library; computes y = A*x + y, here with invA as the matrix */
    madd_vector_matrix(4, 4, &invA[0].fv, 4, (float *)&x, (float *)&y);

Remote Procedure Calls
● Function-Offload Model
    SPE threads act as services
    The PPE communicates with them through RPC
● Remote Procedure Call stubs

Remote Procedure Calls
● Preparing stubs
    Interface Description Language (IDL)
    idl files

    interface add
    {
        import "../stub.h";
        const int ARRAY_SIZE = 1000;
        [sync] idl_id_t do_inv ([in] int array_size,
            [in, size_is(array_size)] int array_a[],
            [out, size_is(array_size)] int array_res[]);
        …
    }

    idl compiler

    # idl -p ppe_sample.c -s spe_sample.c sample.idl

Example
● .idl file

    interface add
    {
        import "../stub.h";
        const int ARRAY_SIZE = 1000;
        [sync] idl_id_t do_add ([in] int array_size,
            [in, size_is(array_size)] int array_a[],
            [in, size_is(array_size)] int array_b[],
            [out, size_is(array_size)] int array_c[]);
        [sync] idl_id_t do_sub ([in] int array_size,
            [in, size_is(array_size)] int array_a[],
            [in, size_is(array_size)] int array_b[],
            [out, size_is(array_size)] int array_c[]);
    }

Example
● spe do_add.c file (user file)

    #include "../stub.h"

    idl_id_t do_add (int array_size, int array_a[], int array_b[], int array_c[])
    {
        int i;
        /* element-wise sum: c = a + b */
        for (i = 0; i < array_size; i++) {
            array_c[i] = array_a[i] + array_b[i];
        }
        return 0;
    }

    idl_id_t do_sub (int array_size, int array_a[], int array_b[], int array_c[])
    {
        ...
        return 0;
    }

Example
● ppe main.c file (user file)

    #include "../stub.h"

    int array_a[ARRAY_SIZE]   __attribute__ ((aligned(128)));
    int array_b[ARRAY_SIZE]   __attribute__ ((aligned(128)));
    int array_add[ARRAY_SIZE] __attribute__ ((aligned(128)));

    int main()
    {
        ...
        /* looks like a local call; the generated stub forwards it to the SPE */
        do_add (ARRAY_SIZE, array_a, array_b, array_add);
        ...
    }

Conclusion
● Powerful SDK
    vector data type
    Extensive matrix routines
    Extensive signal processing routines
● Game-industry driven
    But applicable to scientific work as well

References
● CBEA-Tutorial.pdf, SDK documentation
● idl.pdf, SDK documentation
● libraries_SDK.pdf, SDK documentation
● libspe_v1.0.pdf, SDK documentation
● SPU_language_extensions_v21.pdf, Sony online resources, http://cell.scei.co.jp/pdf/SPU_language_extensions_v21.pdf, 15.03.2006

Thank you for your attention!