Embedded Software for Radar Signal Processing Applications
Michael Nolin 1 of 24 January 7, 2011

Table of Contents
1 System Design Approach
1.1 Critical Timing Requirements
1.2 Architecture Hardware Features and Application Notes
1.3 Estimate of Performance Requirements
1.4 Measuring Performance
2 TI DSP + ARM Software Development Resources
2.1 Starting Software Development: “Quick Start Guide Installation Guide”, “Floating-Point Starter Kit”
3 System Software Architecture
3.1 Unix System V Architecture
3.2 Inter-Process Communication (IPC)
3.2.1 System V IPC
3.2.2 System V STREAMS
3.2.3 POSIX, pthreads
3.2.4 Sockets, Pipes
3.2.5 uClinux and IPC
3.2.6 OMAP-L1x MSGQ (IPC) and DSPLIB/DSPLink
3.2.6.1 DSPLINK
3.2.7 DSPLINK Summary
3.2.8 Host Port Interface
3.2.9 Radar Data and the Host Port Interface
3.3 Signal Processing for Long Range Radar
3.4 Texas Instruments DSP BIOS → SYS BIOS
3.4.1 DSPBIOS 6.x
3.5 Texas Instruments FastRTS Library
3.6 Applying New Technology
3.7 Software Libraries
3.8 Linear Math Libraries cblas_zgemm and zgemm
3.9 OMAP-L137
3.9.1 JTAG
4 References
4.1 Texas Instruments
4.2 SAAB (Microwave)
5 Appendix
6 Linux and the Texas Instruments c6x family
7 Software as Texas Instruments redefines it
8 OMAP Software Resources
9 VisionMid TMS320DM814x Software Resources
10 Pseudo Code

Illustration Index
Illustration 1: Initial ARC5-B Digital Signal Processing System
Illustration 2: Texas Instruments ARM - DSP Layered Architecture
Illustration 3: Texas Instruments Inter-OS Communications
Illustration 4: Texas Instruments DSP/BIOS Link Architecture
Illustration 5: GPP-DSP connectivity through DSP/BIOS LINK
Illustration 6: Architecture RingIO Transfer, Shared Memory
Illustration 7: CCStudio MSGQ configuration
Illustration 8: DSP/LINK Message Queue MSGQ
Illustration 9: DSP hardware accelerator or algorithm co-processing engine
Illustration 10: DSP algorithm co-processing engine and external peripherals
Illustration 11: OMAP-L137, running “Example DSPLIB/DSPLink Application on OMAP-L1x”
Illustration 12: BIOS PSP Users Guide (OMAP-L137) block driver
Illustration 13: BIOS PSP driver with streaming interface (OMAP-L137)

Illustration 1: Initial ARC5-B Digital Signal Processing System

Scope: The initial motivation for this document was a request to explain “IPC” as it would be required to demonstrate the capture of basic radar data on the ARC5-B digital signal processing board. Inter-Processor Communication is difficult to explain outside its operating system context and the related computer architecture building blocks. As a result, this document has expanded into general software and architecture concepts, for which more suitable and better organized explanations may be found in the numerous references.

Embedded real-time operating systems have evolved over several decades, from simple microcontrollers to the multi-core systems used today for image and signal processing of all types. Reference manuals, user's guides, and texts are invaluable for understanding embedded architecture building blocks. In many cases, freely available software, operating systems, and documentation have been leading the way in 64-bit, multi-core, RF, and hand-held user applications. The hardware designs for radar signal processing to date have been implementations of Texas Instruments floating-point DSPs; as a result, much of the hardware and software integration is also described on www.ti.com and http://processors.wiki.ti.com.

1 System Design Approach
1. “Identify critical timing or CPU performance requirements.”¹
2. Check vendor application notes, or examples of similar applications. If none exist, identify architecture hardware features that can be leveraged; this can indicate how far an application can be optimized.
3. Roughly estimate performance requirements (MIPS, FLOPS). Precise calculations may not be necessary where candidates differ widely in performance.
4. For small performance gaps (10-20%) on major components of the application, specific implementation test models should be measured using the vendor-provided SDK (Software Development Kit).

1.1 Critical Timing Requirements
• 12-bit A/D converters with 80 ns conversion time; in current designs this conversion time is determined by the TMS320F28335. 14- and 16-bit A/D converters are also under consideration; 24-bit converters are sometimes implemented in high-end audio equipment.
• 5 ns software real-time requirement for sequencer control; this might be addressed with a PWM (open question).
• 1 Mbps serial CAN bus interface for communicating 'target' information (50 ms).
• 7040 256-point IFFTs per sweep for an 8-antenna array, with sweeps of 40-75 ms (to be confirmed). Arrays of 4, 6, 8, 9, and 12 (12x64) antennas are being considered.
• 1 ms DSP algorithm goal for processing between radar sweeps.
• At 40 ms per sweep, an estimated 75% (30 ms) is available for ESPRIT angle estimation; with 10 targets per sweep, that leaves 3 ms to process each target.²
• 16-bit (½ speed) memory bus, which differs significantly from the reference hardware implementation and its measurements.
• 150 MHz F28335 (6.66 ns cycle time) and 300 MHz 6747 (3.33 ns cycle time).
• 7 MHz SPI slave limitation (a TI-specific design constraint).

Performance goals and critical timing requirements continue to emerge. Vendors have offered an array of options, from 100 MHz devices through multi-core 1 GHz devices with hardware acceleration. Cost goals continue to drive component selection.

¹ Embedded Systems Design, October 2010, “Why MIPS is just a number”, Gaurang Kavaiya.
² Email of August 4, 2010, “Timing for TI functions”.

1.2 Architecture Hardware Features and Application Notes
The clearly titled example “Example DSPLIB/DSPLink Application on OMAP-L1x” addresses both processor-to-processor communication and key benchmarks specific to the radar algorithm being developed. The example provided a valuable working program that the developer can build and modify to further specialize the application and to measure performance for a variety of DSP functions on the OMAP-L137 EVM hardware. The ARM-to-6747 shared memory and the 335-to-6747 HPI provide the hardware transports for processor to DSP co-processor communication.

The effectiveness of processor-to-processor communication is a significant system consideration. The existing product implements an SPI serial communication scheme, which significantly limits effectiveness (serial versus parallel transfer).

1.3 Estimate of Performance Requirements
Benchmark spreadsheets.

1.4 Measuring Performance
On the OMAP-L137 EVM with its platform software, the DSPLIB function DSPF_sp_mat_mul_cplx (32x32) was run and its 0.8 ms execution time confirmed. The software package was rebuilt along with the Linux kernel, and similar results were obtained. A series of discrete algorithm functions was measured in single- and double-precision floating point on the OMAP-L137 EVM using CCStudio resources. Measured end to end, 3 ms was consumed by half of the algorithm (through CSVD).

2 TI DSP + ARM Software Development Resources
Illustration 2: Texas Instruments ARM - DSP Layered Architecture
With the development of new SoC designs from TI, updates and recent releases (October-November 2010) are continually reviewed for architecture resources that can be leveraged toward the LRR design goals. TI continues to design and expand its ARM+DSP SoC architectures, and the software development tools and resources continue to evolve around the framework introduced with the OMAP-L137. A familiar list of cross-host compatible tools:
• TI C6000 Code Generation Tools v6.0.9 or higher
• TI DSP/BIOS v5.41.x
• TI Codec Engine 2.25 or higher
• TI XDC Tools (eXpanDed C)
• TI Framework Components
• TI DVSDK [platform dependent]: C6Accel, C6Flo, C6Run

Note: In the earliest available “engineering” releases, VisionMid product development is not consistent with the tools listed above.

2.1 Starting Software Development: “Quick Start Guide Installation Guide”, “Floating-Point Starter Kit”
These guides are always a good place to start; they are typically included with the initial developer's kit, shipped along with other necessities (power supply, console cable). As recommended in the system design approach, basic developer resources often contain invaluable material directly related to the development effort. In the case of radar signal processing, effective benchmarks of linear processing functions were provided, illustrating the performance concerns of DSP co-processing tasks.

3 System Software Architecture
Designs for radar sensors under consideration include multiple CPU cores. Current designs use two cores: one for control tasks related to integration with the overall automotive system design, and a second for signal processing tasks. With continued technological innovation, new low-power parts for signal processing and SoC designs have introduced a third (ARM) core into the system architectures. Subdividing system operations and synchronization into a functional signal processing timeline requires Inter-Process Communication, signaling, and interrupts between CPU cores. Serial, parallel, and internal/external shared memory interfaces are available to support system communication needs. Without a common device driver interface architecture and openly defined communication interfaces, each sensor effort will develop software implementations specific to its unique design.
Growing and scaling successful product designs may become increasingly difficult, while support of existing designs overwhelms new product and production efforts.

Illustration 3: Texas Instruments Inter-OS Communications

3.1 Unix System V Architecture
Essential to understanding common computer architecture building blocks such as IPC and STREAMS. See also: 'man ipc'.

3.2 Inter-Process Communication (IPC)
Introduced by early Unix architectures, System V IPC has taken on a more generalized meaning and function with the steady advance of technology. Linux kernel development continues the architectural concepts introduced by Unix, and these concepts are also available to microcontrollers (MCUs) through uClinux.org and commercially available packages. Openly available documentation for these packages provides easy reference to these constructs. The new SPRUG06B “SYS/BIOS Inter-Processor Communication (IPC) and I/O User's Guide” (May 2010) is now available on ti.com (release date Q3-Q4 2010).

3.2.1 System V IPC
Support for Inter-Process Communication includes shared memory, message queues, and semaphores.

3.2.2 System V STREAMS

3.2.3 POSIX, pthreads
POSIX-compliant systems include mutexes, semaphores, and condition variables, as well as shared memory access routines. This is perhaps one of the only relevant IEEE contributions to recent computer technology.

3.2.4 Sockets, Pipes
For embedded radar applications, inter-process communication depends on the underlying full-duplex Serial Peripheral Interface bus (SPI), Host Port Interface (HPI), or shared memory. A common device driver interface is required to implement inter-process communication. IPC can be used with a variety of physical interfaces (shared memory, serial, parallel, and Ethernet); the physical interface is abstracted by the common device driver I/O model.
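The common device driver I/O model can be sketched in C as a table of function pointers that each physical link implements, so the IPC layer above it never touches the hardware directly. This is a minimal sketch under stated assumptions: `io_transport`, `shm_window`, `ipc_send`, `ipc_recv`, and the loopback "shared memory" backend are hypothetical names invented for illustration, not a TI, DSP/BIOS, or Linux API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical common driver interface: each physical link (SPI, HPI,
 * shared memory, Ethernet) supplies the same operations, so the IPC
 * layer above it does not depend on the hardware. */
typedef struct io_transport io_transport;
struct io_transport {
    const char *name;
    int (*write)(io_transport *t, const uint8_t *buf, size_t len);
    int (*read)(io_transport *t, uint8_t *buf, size_t cap);
    void *priv;                 /* driver-private state */
};

/* Loopback "shared memory window" standing in for a real driver (assumption). */
#define SHM_WINDOW 256
typedef struct {
    uint8_t mem[SHM_WINDOW];
    size_t  len;                /* bytes of the pending message, 0 = empty */
} shm_window;

static int shm_write(io_transport *t, const uint8_t *buf, size_t len)
{
    shm_window *w = (shm_window *)t->priv;
    if (len > SHM_WINDOW || w->len != 0)
        return -1;              /* too large, or previous message not yet read */
    memcpy(w->mem, buf, len);
    w->len = len;
    return (int)len;
}

static int shm_read(io_transport *t, uint8_t *buf, size_t cap)
{
    shm_window *w = (shm_window *)t->priv;
    if (w->len == 0 || w->len > cap)
        return -1;              /* nothing pending, or caller's buffer too small */
    size_t n = w->len;
    memcpy(buf, w->mem, n);
    w->len = 0;                 /* message consumed */
    return (int)n;
}

/* Bind a transport descriptor to the loopback backend. */
static void shm_transport_init(io_transport *t, shm_window *w)
{
    assert(t != NULL && w != NULL);
    w->len = 0;
    t->name = "shm-loopback";
    t->write = shm_write;
    t->read = shm_read;
    t->priv = w;
}

/* IPC layer: identical code regardless of the transport it is given. */
static int ipc_send(io_transport *t, const void *msg, size_t len)
{
    return t->write(t, (const uint8_t *)msg, len);
}

static int ipc_recv(io_transport *t, void *buf, size_t cap)
{
    return t->read(t, (uint8_t *)buf, cap);
}
```

Supporting SPI or HPI would then mean supplying another write/read pair behind the same structure; a serial backend is also the natural home for a device-driver checksum, which a parallel HPI backend could omit.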
Historically, IPC has been implemented over a variety of physical interfaces, dating back to mainframe development, when CPUs, memory, and disks existed in different rack-mounted chassis.

TI DSP/BIOS offers message queues (section 6.5, “Message Queues”) for “homogeneous or heterogeneous multi-processor messaging” (MSGQ). QUE and MBX offer smaller implementation footprints at the cost of advanced features. The TMS320C6000 DSP/BIOS 5.31 Application Programming Interface (API) Reference Guide, section 2.19 “MSGQ Module”, provides a detailed description of the MSGQ construct for system integration.

For serial interface communication, common practice is to implement a checksum in the device driver layer; parallel HPI interfaces would not require checksum support.

The 6747 SPI0, including the optional slave chip select (SPI0_SCS), is connected; SPI1 is unused for serial communication. TMS320F28335 GPIO53-GPIO56 drive SPI_DSP_ENA, SPI_SIMO_DSP, SPI_SOMI_DSP, and SPI_CLK_DSP respectively; the optional DSP_CS slave chip select (SPI_SCS) is driven by GPIO17. The SPI master/slave handshake signals (master SPIx_ENA and slave SPIx_ENA) “increase SPI bus throughput since the master does not need to delay each transfer long enough to allow for the worst case latency...”³

IPC messaging uses short messages matched to queue lengths for efficiency, as many protocols do. The SPI module has a 16-bit shift register and a 16-bit buffer. For IPC data, large transfers may be passed outside the short message queues through BIOS streaming, a “continuous sequence of real-time data. Messaging is generally performed with zero copying.” The directory bios_5_41_07_24\packages\ti\bios\examples\advanced\streaming provides valuable streaming pipe I/O example code.

3.2.5 uClinux and IPC
Inter-Process Communication inherited from UNIX architectures is consistent across *NIX architectures, including MMU-less variants such as uClinux (uClinux.org) for ARM9 cores.
TI has provided uClinux compatibility for DSPLINK (dsplinkk.ko).

3.2.6 OMAP-L1x MSGQ (IPC) and DSPLIB/DSPLink
A valuable application of Inter-Process Communication is provided at http://processors.wiki.ti.com/index.php/Example_DSPLIB/DSPLink_Application_on_OMAP-L1x. It is a valuable example to consider now that manufactured boards are arriving with OMAP/6747 hardware installed. Following the “Getting Started Guide” for the EVM, an ARM Linux shell application loads and runs a DSPLINK application connected through the MSGQ API. Using command-line arguments supplied by the user, the ARM application then reports total processing time (including DSP execution and MSGQ communication). It is an effective system model of the ARC5-B hardware as built.

Performance of TI DSPLIB resources is referenced for both OMAP-L137 and 674x implementations. A performance spreadsheet is provided in the docs folder of the C674x DSPLIB installation (C:\CCStudio_v3.3\c674x\dsplib_v12\docs and http://10.106.10.119/pub/c674x_dsplib_dev_notes.xls). For the OMAP-L1x sample application, performance figures with and without IPC are provided on the web page for DSPF_sp_mat_mul (1.24 ms), DSPF_sp_mat_mul_cplx (0.812 ms), and DSPF_sp_mat_trans (0.721 ms), which are of particular interest to radar application processing using the ESPRIT algorithm.

³ 6747 Fixed/Floating-Point Digital Signal Processor.

http://processors.wiki.ti.com/index.php/C674x_DSPLIB#Performance
http://processors.wiki.ti.com/index.php/Getting_Started_Guide_for_OMAP-L137

DSPLIB c674x/dsplib_v11 and dsplib_v12 have effective release dates of 6/25/2009 and 1/5/2010 respectively.⁴ The example source includes input data and benchmark results for several matrix math operations.
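The benchmarked DSPF_sp_mat_mul_cplx and the cblas_zgemm calls used by ESPRIT both reduce to complex matrix multiplication. The sketch below is a plain C99 reference for the no-transpose case of C := alpha*A*B + beta*C on row-major matrices of double complex values, intended only as a correctness baseline when checking optimized results. It is not TI's DSPLIB code (which operates on single-precision data) nor a BLAS implementation, and the name `ref_zgemm_nn` is hypothetical. With alpha = 1 and beta = 0 it reduces to a plain complex matrix product.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Reference (unoptimized) complex GEMM, no transposes, row-major:
 *   C[m x n] := alpha * A[m x k] * B[k x n] + beta * C[m x n]
 * Hypothetical helper for checking optimized library results. */
static void ref_zgemm_nn(size_t m, size_t n, size_t k,
                         double complex alpha,
                         const double complex *A,
                         const double complex *B,
                         double complex beta,
                         double complex *C)
{
    assert(A != NULL && B != NULL && C != NULL);
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double complex acc = 0.0;
            for (size_t p = 0; p < k; ++p)
                acc += A[i * k + p] * B[p * n + j];   /* row-major indexing */
            /* When beta == 0, C is not read on input (as in reference BLAS). */
            C[i * n + j] = alpha * acc
                         + (beta == 0.0 ? 0.0 : beta * C[i * n + j]);
        }
    }
}
```

Running the same small inputs through this reference on the host and through the DSP library on the EVM is one way to confirm numerical agreement before comparing execution times; the triple loop is deliberately naive and says nothing about optimized performance.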
⁴ It is difficult to understand why hand-coded matrix multiply routines were still being reviewed as late as August 2010, using C++ types and double-indexed arrays, long understood to block parallelization.

3.2.6.1 DSPLINK
DSPLINK provides an IPC-like software support package for a DSP co-processor running TI DSP/BIOS; in the OMAP-L1x example it is used to invoke TI DSPLIB functions. DSPLINK supports an interface to and from more traditional Linux-based, IPC-supporting operating systems, as demonstrated in the OMAP-L137 (ARM926EJ-S <=> 6747) and DaVinci example code. RingIO buffering is provided, compatible with operating systems supporting IPC.

TI representatives have expressed concerns that DSPLINK is a large module. Prebuilt file sizes (bytes):
102674   2009-04-16 21:37  dsplink.lib            (release build)
122946   2009-04-16 21:37  dsplinkk.ko            (release build)
1261756  2009-04-16 21:37  ../DEBUG/dsplinkk.ko   (debug build)
360267   2009-04-16 21:37  ../DEBUG/dsplink.lib   (debug build)

Building a suitable library and kernel module for an embedded system appears to depend only on effective tool usage. Because DSPLINK can be “scaled at compile time” to add or remove functionality, it is not clear why size should be a design consideration.

DSPLINK and all dependent components have been built and integrated onto the OMAP-L137 EVM hardware. Working example code (DSPLIB/DSPLINK Application on OMAP-L1x, DSPF_sp_mat_mul_cplx) was used to validate the completeness of the newly built source code; measured performance was comparable to published benchmarks.

Features across the GPP-DSP boundary:
• Basic processor control
• Shared/synchronized memory pool across multiple processors
• Notification of user events
• Mutually exclusive access to shared data structures
• Linked-list based data streaming
• Data transfer over logical channels
• Messaging (based on the MSGQ module of DSP/BIOS)
• Ring buffer based data streaming
• Zero-copy messaging
• Support for different physical links; additional link drivers can be accommodated (see LNK_012_DES.pdf, “DSP/BIOS LINK, LNK 012 DES, Link Driver”)
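The ring-buffer streaming listed above (DSPLINK RingIO) and the ADC buffering proposed for the 335 in section 3.2.9 rest on the same idea: one producer advances a write index, one consumer advances a read index, and neither blocks the other. The sketch below is a minimal single-producer/single-consumer ring of 16-bit ADC samples, assuming a power-of-two capacity so indices wrap with a mask. It is not the RingIO API: `adc_ring`, `ring_put`, and `ring_get` are hypothetical names, and a real cross-processor version in shared memory would also need cache maintenance and memory barriers (or atomic index accesses), which this single-threaded sketch omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_CAP 8u   /* must be a power of two */

/* Single-producer/single-consumer ring of ADC samples.
 * head: total samples written (producer only); tail: total samples read
 * (consumer only). Indices run freely and are masked on access, so
 * head - tail is always the number of stored samples. */
typedef struct {
    uint16_t data[RING_CAP];
    uint32_t head;
    uint32_t tail;
} adc_ring;

static void ring_init(adc_ring *r)
{
    assert((RING_CAP & (RING_CAP - 1u)) == 0u);  /* power-of-two check */
    r->head = 0;
    r->tail = 0;
}

static size_t ring_count(const adc_ring *r)
{
    return (size_t)(r->head - r->tail);  /* unsigned wrap-around is well defined */
}

/* Producer side, e.g. the ADC interrupt handler ("buffer put"). */
static bool ring_put(adc_ring *r, uint16_t sample)
{
    if (ring_count(r) == RING_CAP)
        return false;                    /* full: caller drops or counts an overrun */
    r->data[r->head & (RING_CAP - 1u)] = sample;
    /* A shared-memory version needs a write barrier here before publishing. */
    r->head++;
    return true;
}

/* Consumer side, e.g. the DMA/transfer task ("buffer get"). */
static bool ring_get(adc_ring *r, uint16_t *sample)
{
    if (r->head == r->tail)
        return false;                    /* empty */
    *sample = r->data[r->tail & (RING_CAP - 1u)];
    r->tail++;
    return true;
}
```

Keeping each index owned by exactly one side is what lets the interrupt handler and the transfer task run without a lock; a full ring is reported to the producer rather than silently overwriting unread radar samples.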
Illustration 4: Texas Instruments DSP/BIOS Link Architecture
Illustration 5: GPP-DSP connectivity through DSP/BIOS LINK
Illustration 6: Architecture RingIO Transfer, Shared Memory

DSPLINK User's Guide (viewed with kpdf): work/OMAP-L137/OMAPL137_arm_1_00_00_11/dsplink-1_61_03-prebuilt/packages/dsplink/doc/UserGuide.pdf

The General Purpose Processor (GPP) end of DSPLINK supports Linux (MV_pro5), Nucleus, and PrOS (eSOL). Native build tools are necessary, depending on the GPP's target OS.

Ring IO (LNK_129_DES.pdf): This component allows creation of a ring buffer within shared memory. The reader and writer of the ring buffer can be on different processors.
LDRV: Link Driver.

3.2.7 DSPLINK Summary
DSPLINK provides complete IPC support, including semaphores, hardware and software interrupts, and data and control messaging. Debug and informational statistics support is also provided (Procstats, MSGQstats, and Chnlstats) for integrated kernel logging. Support for the OMAP-L137 (6747) is provided with working examples. Integration into the no-OS microcontroller constructs used on the TMS320F28335 could prove challenging, although there seems to be no hard multithreading requirement placed on the GPP.

3.2.8 Host Port Interface
The Host Port Interface (HPI) is provided as a “parallel port interface through which an external host processor can directly access the processor's resources (configuration and program/data memories).”⁵ The HPI is a user-configurable 16-bit interface with dedicated address (HPIA) and data (HPID) registers; an HPIC control register is also provided (sprufm7d.pdf).

The TMS320F28335 provides host processing support for radar designs.
The 28335 does not support UHPI as defined; instead, discrete GPIO pins (XZCS7-GPIO37, XRD, XZCS0-GPIO36, HRDY-GPIO28, HINT-GPIO63) are configured through software, together with the 335's “External Interface (XINTF)” (section 4.14 of the TMS320F28335 Data Manual), to provide HPI functionality to the C6747.

The HPI is supported as a transport (section 6.5.4, “Transports”) of the DSP/BIOS message queue I/O support: DSP/BIOS supports its IPC-like MSGQ interface over the HPI as a supported infrastructure layer. Some example code exists for an MSGQ implementation: bios/packages/ti/bios/examples/advanced/msgq_swi2swi/msgq_swi2swi.c.

On the TI E2E forum (e2e.ti.com), Brad Griffis provided feedback on the suitability of IPC mechanisms between C2000 and C6000 devices. The DSP-DSP interface model, with the TMS320F28335 connected through XINTF/HPI to the TMS320C6747, is supported by the BIOS-to-BIOS MSGQ interface with a driver abstraction, the MQT (message queue transport). Alternatively, DSPLINK was developed for the GPP-DSP interface model (ARM-C6747) found in the OMAP family.

Because legacy radar device implementations do not use the TI DSP/BIOS framework on either DSP device, an original coding, development, and integration effort will be necessary. Lyrtech has been contracted for both the HPI software and Ethernet, further postponing the effort toward a common device driver interface and OS/BIOS framework.

To configure MSGQ in CCStudio: File -> New -> DSP/BIOS Configuration opens the Configuration1 panel; select “Input/Output”, then 'MSGQ'.
Illustration 7: CCStudio MSGQ configuration

⁵ 13. sprufk9b.pdf, 674x/OMAP-L1x Processor Peripherals Overview.

The MSGQ configuration in CCStudio corresponds to subsection 6.5, “Message Queues”, of section 6, “Input/Output Methods”, in spru423.pdf (TMS320 DSP/BIOS v5.41 User's Guide).
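A host-side HPI transfer from the 335 can be pictured as: load the target DSP address into HPIA once, then stream 16-bit halfwords through HPID, relying on HPI address auto-increment for burst writes. The sketch below models that sequence in C against a simulated DSP memory so it runs anywhere. Everything here is an assumption for illustration: `hpi_reg_write` is a hypothetical stand-in for the XINTF/GPIO bus cycles described above, the low-halfword-first order is assumed, and the real register-select (HCNTL) encodings, halfword selection (HHWIL), HPIC setup, and HRDY handling must come from the C6747 UHPI user's guide (sprufm7d.pdf) and the F28335 data manual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* --- Simulated C6747 side (assumption: stands in for real UHPI hardware) --- */
#define SIM_MEM_WORDS 64u
typedef struct {
    uint32_t mem[SIM_MEM_WORDS]; /* DSP memory as 32-bit words, base address 0 */
    uint32_t hpia;               /* current byte address */
    int      half;               /* 0 = expecting first halfword, 1 = second */
    uint16_t first;              /* latched first halfword */
} sim_hpi;

typedef enum { HPI_REG_HPIA, HPI_REG_HPID_AUTOINC } hpi_reg;

/* Hypothetical bus primitive: one 16-bit host write cycle to an HPI register.
 * The model assumes both halfwords of a 32-bit access target the same
 * register, and that the low halfword is sent first. */
static void hpi_reg_write(sim_hpi *h, hpi_reg reg, uint16_t value)
{
    if (h->half == 0) {              /* first halfword: latch and wait */
        h->first = value;
        h->half = 1;
        return;
    }
    uint32_t word = ((uint32_t)value << 16) | h->first;
    h->half = 0;
    if (reg == HPI_REG_HPIA) {
        h->hpia = word;              /* new target address */
    } else {
        uint32_t idx = h->hpia / 4u;
        assert(idx < SIM_MEM_WORDS);
        h->mem[idx] = word;
        h->hpia += 4u;               /* auto-increment to the next word */
    }
}

/* Host-side block write: set the address once, then stream data words. */
static void hpi_write_block(sim_hpi *h, uint32_t dsp_addr,
                            const uint32_t *words, size_t n)
{
    assert((dsp_addr & 3u) == 0u);   /* word-aligned target */
    hpi_reg_write(h, HPI_REG_HPIA, (uint16_t)(dsp_addr & 0xFFFFu));
    hpi_reg_write(h, HPI_REG_HPIA, (uint16_t)(dsp_addr >> 16));
    for (size_t i = 0; i < n; ++i) {
        hpi_reg_write(h, HPI_REG_HPID_AUTOINC, (uint16_t)(words[i] & 0xFFFFu));
        hpi_reg_write(h, HPI_REG_HPID_AUTOINC, (uint16_t)(words[i] >> 16));
    }
}
```

The point of the auto-increment pattern is that HPIA is written once per block, so a large radar buffer costs two 16-bit cycles per 32-bit word rather than a fresh address setup for every word; this is one reason the buffering and DMA approach in section 3.2.9 favors long contiguous writes.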
Illustration 8: DSP/LINK Message Queue MSGQ

3.2.9 Radar Data and the Host Port Interface
The TMS320F28335 provides ADC support for the incoming radar data; the TMS320C6747 provides signal processing as well as an external Ethernet interface for downloading antenna data. Due to data rates and the limitations of the HPI and of shared memory access to the 6747 SRAM, a basic buffering and DMA solution will be required. Buffering ADC (converter) data in a ring buffer construct on the 335 will address real-time data rates, minimize data usage, and allow other functions to use the HPI to the 6747 co-processor. A DMA interface function managing current and end pointers will allow contiguous shared memory writes of radar data and will tolerate HPI interruptions during large accumulations of entire radar data sets.

Option 1: TI DSP/BIOS available on both the 335 and the 6747.
(1) ADC radar data with a Stream I/O interrupt handler (buffer put).
(2) DMA application (buffer get): DMA to the 6747 shared memory buffer area.
(3) Control/IPC message: MSGQ and SWI/HWI signal the 6747 to start the algorithm on buffers 1-n.
(4) Result data/raw data: SIO to the Ethernet TX ring.
Data flow diagram here.

Option 2: DSPLINK, porting the working example code (October 12, 2010) to the HPI using the LDRV link driver (LNK_012_DES).
(1) ADC radar data with a Stream I/O interrupt handler (buffer put).
(2) DSPLINK with RingIO (DMA application, buffer get).
(3) Algorithm integrated with the DSPLIB API: start the algorithm on buffers 1-n.
(4) Result data/raw data: Stream I/O (SIO) to the Ethernet TX ring.

Option 3: Linux-style IPC to an RTOS DSP co-processor.

Further topics: data, buffers, buffer management, ring buffers. Tracing tools: perf, Ftrace, LTTng (Linux Trace Toolkit).

3.3 Signal Processing for Long Range Radar
A repository for digital signal processing development notes for radar applications. Measurement of floating-point DSP performance presents unusual challenges.
A theoretical count of CPU instructions that ignores the actual pipeline latency and execution cycles (single precision 4, double precision 10)⁶ can lead to significant errors in estimated execution times. The problem is further complicated by the variable optimization achieved through parallel instruction execution, which can produce up to a 20x performance improvement⁷ over Linear Predictive Coding. Unfortunately, not all functions, operations, or algorithms can achieve fully parallelized execution for maximum optimized performance. Fortunately, DSP silicon vendors as well as DSP design solution vendors have significant market incentives to measure and benchmark⁸ their performance against existing technologies. With system design experience, it is possible to navigate the volumes of performance data to produce accurate embedded performance estimates that can then be measured on embedded hardware.

Memory architectures must also be considered when determining expected embedded system performance, since embedded designs typically have much slower clock rates and narrower bus widths. Few simulation tools for embedded devices provide a model of the memory interfaces, so only relative CPU cycle counts are useful.

The compiled ESPRIT executable binary with libraries is 8.023805 Mbytes (8.027934 Mbytes with the smoothing conditional code), compared with 4 Mbytes on the ARC5-B. The ESPRIT program compiled against the TI DSP libraries and BIOS is currently 1 Mbyte.

⁶ Reference 1, page 3-32, “Total Result Latency”, MPYSP and MPYDP.
⁷ Reference 2, E2E Forum, C/C++ compiler group.
⁸ Reference 4, dsplib developer's notes (.xls): DSPF_sp_mat_mul_cplx, 553 cycles (absolute).

3.4 Texas Instruments DSP BIOS → SYS BIOS
The TI DSP/BIOS has been “designed to minimize memory and CPU requirements”.
It has been optimized to work effectively with the TI tools, taking advantage of parallelization, with key performance improvements implemented in assembly language. The TI DSP/BIOS is explained at length in three references:
1. “Using DSP/BIOS” lessons in the online Code Composer Studio Tutorial.
2. “TMS320C6000 DSP/BIOS 5.31 Application Programming Interface (API) Reference Guide”, SPRU403M.
3. “TMS320 DSP/BIOS User's Guide”, SPRU423F.

Versions: BIOS 5_31_02, BIOS 5_33_01, and 5_41_07_24 (the latest available for CCSv3, also available for Linux). BIOS 5_33_05: /home/mnolin/work/OMAP-L137. Quick Start instructions: “If you want to quickly try DSP/BIOS with command-line/makefile builds (and not use Code Composer Studio)...”

The TI DSP/BIOS is supported across the family of TI parts, including the TMS320C6000 and TMS320C2000 (TMS320F28335); a license agreement may be necessary (2008) to obtain full sources. The newly published IPC User's Guide notes: “Previous versions of SYS/BIOS were called DSP/BIOS. The new name reflects that this operating system can also be used on processors other than DSPs”⁹

3.4.1 DSPBIOS 6.x
TI DSP/BIOS updates for version 6.x: “IPC support may be used independently of core DSP/BIOS 6.x kernel functionality.” (http://focus.ti.com/docs/toolsw/folder/print/dspbios6.html). Upgrading to CCSv4 is required for DSP/BIOS 6.x; DSP/BIOS 5.41 works with either CCSv3 or CCSv4.

3.5 Texas Instruments FastRTS Library
The TMS320C67x Fast Run-Time-Support Library provides 26 optimized floating-point math functions for the TMS320C67x (SPRU100A). Note: we have already root-caused significant performance problems resulting from mistakenly linking against older RTS library routines.

3.6 Applying New Technology
The Texas Instruments C6xxx floating-point processor family is a mature device family dating back to 1999-2000. While there have been improvements in clock rates (up to 1 GHz) and multi-core devices, these advances do not exist for mid-range power devices (4-5 watts).
The C6747 device represents a new integration of fixed-point and floating-point DSP features that previously appeared in separate devices.

⁹ SYS/BIOS Inter-Processor Communication (IPC) and I/O User's Guide, SPRUG06B, May 2010.

Recently released documentation (April 2010) related to the C6747 Host Port Interface suggests a new DSP co-processor system architecture.

3.7 Software Libraries
Software libraries are common to many software projects and to all levels of infrastructure, from web interfaces to low-level scientific libraries and simple string processing libraries. Discussion of the suitability of libraries for any system design is best left to experienced professionals.

3.8 Linear Math Libraries cblas_zgemm and zgemm
ESPRIT calls cblas_zgemm 6 times, and the getS1S2 function calls cblas_zgemm another 2 times, for a total of 8 calls to cblas_zgemm. These well-defined linear algebra functions and APIs made a logical starting point for the embedded integration effort. The core mathematical processing of the ESPRIT algorithm is handled by these 8 calls to zgemm, the complex general matrix multiplication function:¹⁰

C := alpha*op(A)*op(B) + beta*C

where op(X) is X, its transpose, or its conjugate transpose, and alpha and beta are complex scalars. The repetitive nature of linear algebra with matrices has been optimized over several decades around this essential equation.

3.9 OMAP-L137
System overview of OMAP: http://processors.wiki.ti.com/index.php/OMAPL1x/C674x/AM1x_SoC_Architectural_Overview
Multiple CPU masters (ARM or DSP) are combined with multiple slaves (peripherals, memories), all managed through the Switched Central Resource (SCR) modules: a powerful yet complex combination of asymmetric CPU cores and peripheral resources. The OMAP-L137 EVM targets high-end multi-channel audio processing. The OMAP data sheets describe the PRU Subsystem (PRUSS, Programmable Real-Time Unit Subsystem); two units are included, PRU0 and PRU1.
Complete with an interrupt controller and associated memories, the PRUs can be programmed to perform a variety of embedded tasks with tight real-time constraints. For radar designs, this flexibility can relieve SPI bus bottlenecks or implement a CAN bus interface (see sections 1.4 and 6.20 of the OMAP-L137 ADVANCE INFORMATION data sheet).
10 Reference 9, B. General ESPRIT Algorithm 1-7
3.9.1 JTAG
The ARC5_B schematics show a common JTAG header, “JTAG CON”, connected to both the TMS320F28335 and the TMS320C6747xxx. This is not consistent with the OMAP-L137 EVM, which provides a separate JTAG header for each JTAG target, “ARM JTAG” and “TI JTAG”. The EVM uses external logic to multiplex the ARM and TI JTAG connectors onto the OMAP-L137ZKB JTAG pins (J1, J2, J3, J4, and H3 TCLK). The DSP_EMU0 and DSP_EMU1 signals, connected to J5 (GPIO7_15), are provided by JTAG_EMU1 and JTAG_EMU0 from J4, the “TI JTAG” connector, which effectively provides a workaround for the standard JTAG scan chain. The “ARM JTAG” connector, which provides no JTAG_EMUx signals, gives default ARM JTAG access to the TMS320C6747ZKB. See pages 10 and 22 of the EVM schematic in EVMOMAPL137_TechRef_revg.pdf. According to section 6.31 (JTAG Port Description) of the OMAP-L137 Advance Information data sheet, JTAG scan-chain TAPs are used to select the C674x or ARM926 debug interface. The ARC5_B schematics should therefore support three JTAG TAP IDs: F28335, ARM926, and C6747.
4 References:
1. www.nr.com, Numerical Recipes, Third Edition, William H. Press, Saul A. Teukolsky, William T. Vetterling, Brian P. Flannery, 2007.
2. “Numerical Recipes: The Art of Scientific Computing”, Third Edition, 2007.
3. netlib.org, cblas_zgemm.
4. “Numerical Methods for DSP Systems in C: Practical Application of Numerical Methods in Signal Processing, Graphics, Video Programming, Scientific Applications”, Don Morgan.
5. “Digital Signal Processing and Applications with the C6713 and C6416 DSK”, Rulph Chassaing, John Wiley & Sons, Inc., 2005.
6. “ESPRIT Beam Forming for the Autoliv Long Range Radar”, Bruce Labitt.
7. “Singular Value Decomposition – A Primer”, Sonia Leach, Department of Computer Science, Brown University, Providence, RI 02912. Draft version (PostScript, Ghostview), 1994.
4.1 Texas Instruments
8. Texas Instruments TMS320C6000 Optimization Workshop Student Guide.
9. TI E2E Community, TI C/C++ Compiler Forum, “Use of C++ <complex> types and measured performance”, ClearQuest SDOWP ID#SDSCM00037600, Georgem.
10. http://processors.wiki.com/index.php/C6000 Compiler Tuning Software Pipelined Loops
11. http://processors.wiki.ti.com/index.php/C674x_DSPLIB, c647x_dsplib_dev_notes.xls, TI DSPLIB “Legacy ASM Implementation from C67x”, DSPF_sp_mat_mul_cplx.asm.
12. www.wiki.ti.DPS64x
13. SPRU423F, “TMS320 DSP/BIOS User's Guide”, November 2004.
14. “TMS320C6000 DSP/BIOS 5.31 Application Programming Interface (API) Reference Guide” (spru403m.pdf), July 2006.
15. “TMS320C674x/OMAP-L1x Processor Peripherals Overview Reference Guide”, SPRUFK9B, June 2009; User's Guide sprufm7d.pdf, April 2010.
16. “OMAP-L137 Low-Power Applications Processor”, ADVANCE INFORMATION, September 2008, revised August 2010.
17. OMAP-L137/TMS320C6747 Floating-Point Starter Kit, '01 May 09', Early Adopter (EA) and (GA).
18. “TMS320C67x FastRTS Library Programmer's Reference”, spru100a.
19. “TMS320F28335, F28334, F28332, F28235, F28234, F28232 Digital Signal Controllers Data Manual”, SPRS439H, June 2007, revised March 2010. http://focus.ti.com/lit/ds/symlink/tms320f28335.pdf
20. “TMS320x28xx, 28xxx DSP Peripheral Reference Guide”, SPRU566D, June 2003 - October 2006.
21. “DSP/BIOS LINK User's Guide”, LNK 058 USR, Version 1.61.03, March 31, 2009. OMAPL137/OMAPL137_arm_1_00_00_11/dsplink-1_61_03-prebuilt/packages/dsplink/doc
22. “xDAIS-DM User's Guide”, OMAPL137_arm_1_00_00_11/framework_components_2_23_01/fctools/packages/ti/xdais/dm/docs/XDM_UsersGuide.pdf
4.2 SAAB (Microwave)
SAAB documents concerning the ARC4 and ARC5 devices can be found on the shared drive S:\AEACommon1\active_safety\24GHz\Saab Transfer Documents several documentations revi
23. “ARC5 SW design considerations”, 2009-05-25, A24R-00104 DDJX, Alexei Zernov.
5 Appendix
netlib.org CLAPACK-3.2.1, netlib.org LAPACK-3.2.2, netlib.org ATLAS, NumPy, sourceforge.net.
6 Linux and the Texas Instruments c6x family
Http://www.linux-c6x.org/wiki/index.php/Main_Page
The spread of multiple devices and multiple cores into many embedded products has led to questions of interoperability and communication between devices. Multiprocessor, multicore designs with built-in architectural support from Unix, Linux, and uClinux are challenging the traditional RTOS microcontroller software support packages. RTOS vendors must support a common IPC-like communications interface so that system integrators can include microcontrollers in current multiprocessor architecture designs. For many reasons, Linux-to-RTOS IPC is expected in today's systems.
EABI (Embedded Application Binary Interface) is a requirement of DSPLINK and of the TI PSP (Platform Support Packages).
7 Software as Texas Instruments redefines it
As a hardware company, Texas Instruments uses its own terms for several basic C programming concepts, some of which date back to the origins of Unix and Unix-like (*nix) systems (see GNU.org).
IPS: Interprocessor Signaling (semaphores).
PSP: Platform Software Package; device drivers for the C6747 in DSP/BIOS environments.
XDC: eXpanDed C; a command shell for GNU make support (see Richard Stallman, GNU.org).
XDM: xDAIS-DM (Digital Media); see the xDAIS-DM User's Guide. The xDM standard defines a uniform set of APIs across various multimedia codecs to ease integration and ensure interoperability.
xDM is built on TI's well-proven eXpressDSP Algorithm Interoperability Standard (xDAIS) specification. It acts as a form of binary archive container for third-party intellectual property, without which businesses would not undertake research and development of new algorithms.
8 OMAP Software Resources
Following the OMAP-L1x Getting Started guides through to the working example code provided for “DSPLIB/DSPLink Applications” illustrated the value of TI's solution for integrating DSP coprocessing devices into multicore designs. After running the prebuilt executables for the “Example DSPLIB/DSPLink Application on OMAP-L1x”, building from source further validated the build instructions and the portability of DSPLINK to new designs.
work/OMAP-L137
mnolin@linux-lap:~/work/mv_pro_5.0>
-rw-r--r-- 1 users 1804720 2010-10-13 14:59 montavista/pro/devkit/lsp/ti-davinci/linux2.6.18_pro500/arch/arm/boot/uImage
LSP: Linux Support Package (U-Boot, kernel, user bootloader, and flash drivers).
Linux Utils 2.23.01 (2009): CMEM contiguous memory manager; SDMA (not applicable to OMAP-L137); EDMA (C64x); VICP (C64x Video Image Co-Processing, includes vicp and irq). The Linux Utils package gives user-mode applications access to the CMEM, EDMA, SDMA, and VICP utility libraries: http://processors.wiki.ti.com/index.php/Building_The_OMAP-L137_SDK#CMEM
CMEM kernel module source: OMAPL137_arm_1_00_00_11/linuxutils_2_23_01/packages/ti/sdo/linuxutils/cmem/src/module
QA-C: proprietary static analysis tools that check for MISRA compliance.
Illustration 9: DSP hardware accelerator or algorithm co-processing engine
DSPLink Application Block Diagram, http://processors.wiki.ti.com/OMAP-L137_Audio_Drivers_in_the_DSP_%2B_Linux: an illustration from Texas Instruments of the OMAP-L137 integrated as a DSP hardware accelerator or algorithm co-processing engine.
Illustration 10: DSP algorithm co-processing engine and external peripherals
DSPLink Application Block Diagram, http://processors.wiki.ti.com/OMAP-L137_Audio_Drivers_in_the_DSP_%2B_Linux: an illustration from Texas Instruments of the OMAP-L137 integrated as a DSP hardware accelerator or algorithm co-processing engine, with the additional flexibility of external peripherals. The current Long Range Radar design connects an external Ethernet interface to the C6747 DSP for radar data retrieval.
Last login: Fri Jan 14 07:37:28 2000 on console
Linux 169.254.1.2 2.6.18_pro500-da830_omapl137_evm-arm_v5t_le #1 PREEMPT Wed Oct 13 14:58:53 EDT 2010 armv5tejl GNU/Linux
Welcome to MontaVista(R) Linux(R) Professional Edition 5.0.0 (0801921).
root@169.254.1.2:/home# lsmod
Module Size Used by
root@169.254.1.2:/home# ./loadmodules.sh
CMEMK module: built on Oct 14 2010 at 15:20:13
Reference Linux version 2.6.18
File /home/mnolin/work/OMAPL137/OMAPL137_arm_1_00_00_11/linuxutils_2_23_01/packages/ti/sdo/linuxutils/cmem/src/module/cmemk.c
ioremap_nocache(0xc2000000, 12582912)=0xc3000000
allocated heap buffer 0xc3000000 of size 0x8ac000
cmem initialized 3 pools between 0xc2000000 and 0xc2c00000
dsplinkk: no version for "struct_module" found: kernel tainted.
DSPLINK Module (1.61.03) created on Date: Oct 13 2010 Time: 17:08:03
root@169.254.1.2:/home# modinfo dsplinkk.ko
filename: dsplinkk.ko
license: GPL v2
depends:
vermagic: 2.6.18_pro500-da830_omapl137_evm-arm_v5t_le preempt mod_unload ARMv5 gcc-4.2
root@169.254.1.2:/home# modinfo cmemk.ko
filename: cmemk.ko
license: GPL
depends:
vermagic: 2.6.18_pro500-da830_omapl137_evm-arm_v5t_le preempt mod_unload ARMv5 gcc-4.2
parm: phys_start: Start Address for CMEM Pool Memory (charp)
parm: phys_end: End Address for CMEM Pool Memory (charp)
parm: pools: List of Pool Sizes and Number of Entries, comma separated, decimal sizes (array of charp)
parm: phys_start_1: Start Address for Extended CMEM Pool Memory (charp)
parm: phys_end_1: End Address for Extended CMEM Pool Memory (charp)
parm: pools_1: List of Pool Sizes and Number of Entries, comma separated, decimal sizes, for Extended CMEM Pool (array of charp)
parm: allowOverlap: Set to 1 if cmem range is allowed to overlap memory range allocated to kernel physical mem (via mem=xxx) (int)
root@169.254.1.2:/home#
root@169.254.1.2:/home# ./call_dsplib DSPF_sp_mat_mul_cplx mat_mul_cplx_input.tx
Initializing DSPLINK...
Calling DSPLIB function...
Received response from DSP after 0.000785 seconds.
DSP completed processing in 89271 cycles.
Closing DSPLINK...
call_dsplib completed successfully!
root@169.254.1.2:/home#
root@169.254.1.2:/home# cat /proc/version
Linux version 2.6.18_pro500-da830_omapl137_evm-arm_v5t_le (mnolin@linux-lap) (gcc version 4.2.0 (MontaVista 4.2.0-16.0.32.0801914 2008-08-30)0
Illustration 11: OMAP-L137 running the “Example DSPLIB/DSPLink Application on OMAP-L1x”.
The kernel modules shown were built from source; the MontaVista Pro 5 kernel was also built on October 13th. 'Terminal' (the minicom console port), telnet, and ssh terminals are supported.
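The call_dsplib output above reports both a host-side response time (0.000785 seconds) and a DSP cycle count (89271 cycles). Relating the two requires the DSP clock rate, which the log does not show. The sketch below takes the clock as a parameter and is exercised with 300 MHz, an assumed OMAP-L137 C674x rate that should be replaced with the EVM's actual PLL setting. The helper names are hypothetical, and the sketch also assumes the host-measured response time includes the DSP computation.

```c
#include <assert.h>

/* Hypothetical helpers for interpreting the call_dsplib output.
 * The DSP clock rate is an input; 300 MHz is only an assumed value. */

/* DSP compute time in seconds for a given cycle count and core clock. */
double cycles_to_seconds(unsigned long cycles, double clock_hz)
{
    assert(clock_hz > 0.0);
    return (double)cycles / clock_hz;
}

/* Portion of the host-measured round trip not spent computing on the DSP
 * (DSPLINK messaging, ARM-side scheduling, and so on), assuming the round
 * trip includes the DSP computation itself. */
double non_compute_seconds(double round_trip_s, unsigned long cycles,
                           double clock_hz)
{
    return round_trip_s - cycles_to_seconds(cycles, clock_hz);
}
```

With the logged values and the assumed 300 MHz clock, cycles_to_seconds(89271, 300e6) is about 0.298 ms, roughly 38% of the 0.785 ms round trip, leaving about 0.487 ms outside the DSP computation. If that split holds at the real clock setting, IPC and host overhead, rather than the matrix multiply itself, dominate this small benchmark.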
9 VisionMid TMS320DM814x Software Resources
Product status: Product Preview (PP). An “Engineering Release” of the VisionMid software kit was released on December 2, 2010. VisionMid support is currently available only in the latest CCStudio (4.2.1.00004), and only on Windows hosts. The required BIOS and PSP package, “DM8148 BIOS PSP”, is also available only with limited functionality. Attempts to build the sample test applications were unsuccessful. A 30-day trial of CCStudio, due to expire January 21st, is also a dependency.
TMS320DM814x DaVinci Digital Media Processors, TMS320DM8148 (ALP), November 5, 2010, REL-B.
Clearly, VisionMid support is available only as an internal engineering release; several more releases may be necessary before the performance benchmarks provided with the OMAP PSP can be executed.
Program Files\Texas Instruments\pspdrivers_02_20_00_02\docs\DM8148 Release Notes: PSP release 02.20.00.02 is a Beta release for the EVM DM8148. The release notes list DSP/BIOS version 6.31.00.06, CCStudio version 4.2.0.09007, CG tools 4.6.3, EDMA3, XDC tools, IPC, and the supported drivers; tables of performance and memory usage are provided for each driver in the data sheets. The notes also state: “Pleas note that at this point of time the drivers does not have any abstraction for the OS APIs and they use the OS (BIOS 6.31.00.06) inside the drivers.”
McSPI driver documentation (pspdrivers_02_20_00_02\packages\ti\psp\mcspi\docs): 4-channel chip select (SPIEN); a TI DSP/BIOS driver with a streams interface.
CSLR: chip support register configuration (macros).
PCRM: helps to turn the clock on and off for the modules.
XDC (eXpanDed C), EA1 release. Eclipse Community Forums: DSDP Real-Time Software Components (RTSC); DSDP appears to be the Eclipse device software development project.
Installation guide: DM8148_BIOSPSP_Userguide.pdf.
All new IPC interface, May 20, 2010: IPC_Users_Guide.pdf
10 Pseudo Code
http://en.wikipedia.org/wiki/Pseudocode
Matlab scripts,
Illustration 12: BIOS PSP Users Guide (OMAP-L137) block driver
Illustration 13: BIOS PSP driver with streaming interface (OMAP-L137)
Alphabetical Index
BIOS
  DSP BIOS 15
  JTAG 17
  SYS BIOS 15
DSPLIB 5
Host Port Interface 7
HPI
  Host Port Interface 7
  serial peripheral interface 7
IPC
  Inter-Process Communication 6
  IPC 3
  man ipc 7
JTAG 17
Link
  DSPLINK 9, 14
LSP 19
serial peripheral interface 7
SPI
  serial peripheral interface 7
VICP 19
VisionMid 22
XDC 6