Embedded Software for Radar Signal Processing Applications
Table of Contents
1 System Design Approach
1.1 Critical Timing Requirements
1.2 Architecture Hardware Features and Application Notes
1.3 Estimate of Performance Requirements
1.4 Measuring Performance
2 TI DSP + ARM Software Development Resources
2.1 Starting Software Development: “Quick Start Guide Installation Guide”, “Floating-Point Starter Kit”
3 System Software Architecture
3.1 Unix System V Architecture
3.2 Inter-Process Communication (IPC)
3.2.1 System V IPC
3.2.2 System V STREAMS
3.2.3 POSIX, pthreads
3.2.4 Sockets, Pipes
3.2.5 uClinux and IPC
3.2.6 OMAP-L1x MSGQ (IPC) and DSPLIB/DSPLink
DSPLINK
3.2.7 DSPLINK Summary
3.2.8 Host Port Interface
3.2.9 Radar Data and the Host Port Interface
3.3 Signal Processing for Long Range Radar
3.4 Texas Instruments DSP BIOS → SYS BIOS
3.4.1 DSPBIOS 6.x
3.5 Texas Instruments FastRTS Library
3.6 Applying New Technology:
3.7 Software Libraries
3.8 Linear Math Libraries cblas_zgemm and zgemm:
3.9 OMAP-L137
3.9.1 JTAG
4 References:
4.1 Texas Instruments
4.2 SAAB (Microwave)
5 Appendix
6 Linux and the Texas Instruments c6x family
7 Software as Texas Instruments redefines it.
8 OMAP Software Resources.
9 VisionMid TMS320DM814x Software Resources
10 Pseudo Code
Illustration Index
Illustration 1: Initial ARC5-B Digital Signal Processing System
Illustration 2: Texas Instruments ARM - DSP Layered Architecture
Illustration 3: Texas Instruments Inter-OS Communications
Michael Nolin
1 of 24
January 7, 2011
Illustration 4: Texas Instruments DSP/BIOS Link Architecture
Illustration 5: GPP-DSP connectivity through DSP/BIOS LINK
Illustration 6: Architecture RingIO Transfer, Shared Memory
Illustration 7: CCStudio MSGQ configuration
Illustration 8: DSP/LINK Message Queue MSGQ
Illustration 9: DSP hardware accelerator or algorithm co-processing engine
Illustration 10: DSP algorithm co-processing engine and external peripherals
Illustration 11: OMAP-L137, running “Example DSPLIB/DSPLink Application on OMAP-L1x”.
Illustration 12: BIOS PSP Users Guide (OMAP-L137) block driver
Illustration 13: BIOS PSP driver with streaming interface (OMAP-L137)
Illustration 1: Initial ARC5-B Digital Signal Processing System
The initial motivation of this document was a request to explain “IPC” as it would be required to demonstrate the capture of basic radar data on the ARC5-B digital signal processing board. Inter-Processor Communication is difficult to explain outside of its operating systems context and related computer architecture building blocks. As a result this document continues to expand into general software and architecture concepts, for which more suitable and organized explanations may be found in the numerous references.
Embedded real-time operating systems have continued to evolve over several decades, from simple microcontrollers to the multi-core systems being used today for image and signal processing systems of all types. The reference manuals, user's guides and texts are invaluable to the understanding of embedded architecture building blocks. In many cases freely available software, operating systems and documentation have been leading the way in 64 bit, multi-core, RF and hand-held user applications.
To date, the hardware designs for radar signal processing have been implementations of Texas Instruments floating-point DSPs; as a result, much of the hardware and software integration is also suitably described
on and
1 System Design Approach
• “Identify critical timing or CPU performance requirements.”1
• Check vendor application notes or examples of similar applications. If none exist, identify architecture hardware features that can be leveraged. This can indicate how far an application can be optimized.
• Roughly estimate performance requirements (MIPS, FLOPS); precise calculations may not be necessary for large variations in performance.
• For small performance gaps (10-20%) on major components of the application, specific implementation test models should be measured using the provided Software Development Kit (SDK).
1.1 Critical Timing Requirements
• 12 bit A/D converters with 80ns conversion time; this conversion time is determined by the TMS320F28335 in current designs. 14 and 16 bit A/D converters are also under consideration; 24 bit converters are sometimes implemented in high end audio equipment.
• 5ns software real time requirement for Sequencer control; this may be addressed with a PWM?
• 1 Mbps serial CAN bus interface for communicating 'target' information (50ms).
• 7040 256pt IFFTs per sweep for an 8 antenna array, 40-75ms sweeps? Antenna arrays of 4, 6, 8, 9 and 12 (12x64) are being considered.
• 1ms DSP algorithm goal for processing between radar sweeps. At 40ms per sweep, an estimated 75%, or 30ms, is available to process ESPRIT angle estimation. With 10 targets per sweep, this leaves 3ms to process each target.2
• 16 bit (½ speed) memory bus, which differs significantly from the reference hardware implementation.
• 150MHz F28335 (6.67ns cycle time), 300MHz 6747 (3.33ns cycle time).
• 7MHz SPI slave limitations; TI specific design constraints.
Performance goals and critical timing requirements continue to emerge. Vendors have offered an array of
options from 100MHz devices through multi core 1GHz devices with hardware acceleration. Cost goals
continue to drive component selection.
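The cycle times and the per-target processing budget quoted in this section follow from simple arithmetic. The sketch below reproduces them in Python; the function names are illustrative only, and the input values (150MHz, 300MHz, 40ms sweep, 75%, 10 targets) are the ones listed above.

```python
def cycle_time_ns(clock_hz):
    """Duration of one CPU clock cycle in nanoseconds."""
    return 1e9 / clock_hz


def per_target_budget_ms(sweep_ms, algorithm_fraction, targets):
    """Time per target when a fraction of the sweep is given to the algorithm."""
    return sweep_ms * algorithm_fraction / targets


def budget_in_cycles(budget_ms, clock_hz):
    """Number of CPU clock cycles that fit into a time budget."""
    return round(budget_ms * clock_hz / 1e3)


if __name__ == "__main__":
    print(f"F28335 at 150MHz: {cycle_time_ns(150e6):.2f} ns per cycle")
    print(f"6747 at 300MHz:   {cycle_time_ns(300e6):.2f} ns per cycle")
    budget = per_target_budget_ms(sweep_ms=40, algorithm_fraction=0.75, targets=10)
    print(f"Per-target budget: {budget} ms")
    print(f"Cycles per target on the 6747: {budget_in_cycles(budget, 300e6)}")
```

Expressing the budget in cycles rather than milliseconds makes it directly comparable with the cycle counts published in vendor benchmark spreadsheets.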
1.2 Architecture Hardware Features and Application Notes.
The clearly titled example “Example DSPLIB/DSPLink Application on OMAP-L1x” addresses both processor-to-processor communication and key benchmarks specific to the radar algorithm being developed. The example provides a valuable working program that the developer can build and modify to further specialize the application and measure performance for a variety of DSP functions on the OMAP-L137 EVM.
ARM to 6747 shared memory as well as the 335 to 6747 HPI provide the hardware transports for processor to DSP co-processor communications. The effectiveness of processor-to-processor communication is a significant system consideration. The existing product implements a SPI serial communication scheme, which significantly limits effectiveness (serial versus parallel transfer).
1 Embedded Systems Design October 2010 “Why MIPS is just a number” Gaurang Kavaiya
2 August 4, 2010 email “Timing for TI functions”.
1.3 Estimate of Performance Requirements.
Benchmark spreadsheets.
1.4 Measuring Performance
Using the OMAP-L137 EVM and platform software, the DSPLIB function DSPF_sp_mat_mul_cplx (32x32) was run and confirmed at .8ms. The software package was rebuilt along with the Linux kernel and similar results were obtained.
A series of discrete algorithm functions, for single and double precision floating point tests, were measured on the OMAP-L137 EVM using CCStudio resources. Measured end to end, 3ms was consumed through ½ of the algorithm (CSVD).
2 TI DSP + ARM Software Development Resources
Illustration 2: Texas Instruments ARM - DSP Layered Architecture
With the development of new SoC designs from TI, updates and recent releases (October-November 2010) are continually reviewed for architecture resources that can be leveraged in the LRR design goals. TI continues to design and expand its SoC ARM+DSP architectures. The software development tools and resources continue to evolve around the framework introduced with OMAP-L137.
A familiar list of cross host compatible tools:
• TI C6000 Code Generation Tools v6.0.9 or higher
• TI DSP/BIOS v5.41.x
• TI Codec Engine 2.25 or higher
• TI XDC Tools (eXpanDed C)
• TI Framework Components
• TI DVSDK [Platform dependent]
Note: In its earliest available “engineering” releases, the VisionMid product development is not consistent with the above mentioned tools.
2.1 Starting Software Development: “Quick Start Guide Installation Guide”, “Floating-Point Starter Kit”
These guides are always a good place to start and are typically included with the initial developer's kit, shipped along with other necessities (power supply, console cable).
As recommended in the System Design Approach, basic developer resources can often contain invaluable material that is directly related to the development effort. In the case of radar signal processing, effective benchmarks on linear processing functions were provided, illustrating the performance concerns of DSP co-processing tasks.
3 System Software Architecture
Designs for radar sensors under consideration include multiple CPU cores. In current designs two cores are used, one for control tasks related to integration with the overall automotive system design and a second for signal processing tasks. With continued technological innovation, new low power parts for signal processing and SoC designs have introduced a third ARM core into the system architectures. The subdivision of system operations and synchronization into a functional signal processing time line requires the use of Inter-Process Communication, signaling, and interrupts between CPU cores. Serial, parallel, and internal/external shared memory interfaces are available to support system communication needs. Without a common device driver interface architecture and openly defined communication interfaces, each sensor effort will develop software implementations specific to each unique design. Growing and scaling successful product designs may become increasingly difficult while support of existing designs overwhelms new product and production efforts.
Illustration 3: Texas Instruments Inter-OS Communications
3.1 Unix System V Architecture
The Unix System V architecture is essential to the understanding of common computer architecture building blocks such as IPC and STREAMS. See also: 'man ipc'
3.2 Inter-Process Communication (IPC)
Introduced by early Unix architectures, System V IPC has taken on a more generalized meaning and function with the steady advance of technologies. Linux kernel development continues with architectural concepts introduced by Unix, and these concepts are also available to micro-controllers (MCU) through and commercially available packages. Openly available documentation for these packages provides easy reference to this construct.
New: SPRUG06B “SYS/BIOS Inter-Processor Communication (IPC) and I/O User's Guide”, May 2010, is now available (release date Q3-Q4 2010).
3.2.1 System V IPC
Support for Inter-Process communication also includes shared memory, message queues, and semaphores.
3.2.3 POSIX, pthreads
POSIX compliant systems include mutexes, semaphores, and condition variables, as well as shared memory access routines. Perhaps one of the only relevant IEEE contributions to recent computer technology.
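As an illustration of how a mutex and condition variables combine into a blocking message queue, the following sketch uses Python's threading module, whose Lock and Condition objects mirror the pthread mutex and condition variable. The class and function names are hypothetical, and the example runs between threads of one process rather than between processors.

```python
import threading
from collections import deque


class BoundedQueue:
    """Blocking FIFO built from one mutex and two condition variables."""

    def __init__(self, capacity):
        self._items = deque()
        self._capacity = capacity
        self._lock = threading.Lock()
        # Both conditions share the same mutex, as with pthread_cond_wait.
        self._not_empty = threading.Condition(self._lock)
        self._not_full = threading.Condition(self._lock)

    def put(self, item):
        with self._not_full:
            while len(self._items) >= self._capacity:
                self._not_full.wait()      # releases the mutex while waiting
            self._items.append(item)
            self._not_empty.notify()

    def get(self):
        with self._not_empty:
            while not self._items:
                self._not_empty.wait()
            item = self._items.popleft()
            self._not_full.notify()
            return item


def run_demo(count=8):
    """Pass 'count' sweep numbers from a producer to a consumer thread."""
    queue = BoundedQueue(capacity=2)
    received = []

    def consumer():
        for _ in range(count):
            received.append(queue.get())

    worker = threading.Thread(target=consumer)
    worker.start()
    for sweep in range(count):
        queue.put(sweep)
    worker.join()
    return received


if __name__ == "__main__":
    print(run_demo())
```

The small capacity forces the producer to block when the consumer falls behind, which is the same back-pressure behaviour a fixed-length message queue gives between a control core and a signal processing core.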
3.2.4 Sockets, Pipes
For embedded radar applications, inter-process communication depends on the underlying full duplex serial peripheral interface bus (SPI), Host Port Interface (HPI), or shared memory. A common device driver interface is required to implement inter-process communications. IPC can be used with a variety of physical interfaces (shared memory, serial, parallel, and Ethernet); the physical interface is abstracted by the common device driver IO model. Historically, IPC has been implemented over a variety of physical interfaces dating back to mainframe development, when CPUs, memory and disks existed in different rack mounted chassis.
TI DSP BIOS offers “6.5 Message Queues” (MSGQ) for “homogeneous or heterogeneous multi-processor messaging”. QUE and MBX offer smaller implementation footprints while sacrificing advanced features. The TMS320C6000 DSP/BIOS 5.31 Application Programming Interface (API) Reference Guide, section 2.19 “MSGQ Module”, provides a detailed description of the MSGQ construct for system integration.
For serial interface communications, common practice is to implement a checksum in the device driver layer, whereas parallel HPI interfaces would not require checksum support.
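A minimal sketch of such a driver-layer checksum is shown below. It appends a 16-bit CRC (matching the 16-bit SPI word size) to each message and verifies it on receipt. The framing layout, the function names and the choice of the CRC-CCITT polynomial (via Python's binascii.crc_hqx) are assumptions made for illustration, not the scheme used in the existing product.

```python
import binascii
import struct


def frame_message(payload: bytes) -> bytes:
    """Append a big-endian 16-bit CRC to the payload before transmission."""
    crc = binascii.crc_hqx(payload, 0xFFFF)
    return payload + struct.pack(">H", crc)


def unframe_message(frame: bytes) -> bytes:
    """Verify the trailing CRC and return the payload, or raise ValueError."""
    if len(frame) < 2:
        raise ValueError("frame too short to hold a checksum")
    payload, trailer = frame[:-2], frame[-2:]
    (expected,) = struct.unpack(">H", trailer)
    if binascii.crc_hqx(payload, 0xFFFF) != expected:
        raise ValueError("checksum mismatch")
    return payload


if __name__ == "__main__":
    frame = frame_message(b"target:07 range:120")
    print(unframe_message(frame))
```

Keeping the check inside the driver layer means the messaging code above it sees only verified payloads, regardless of which physical link carried them.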
The 6747 SPI0, including the optional slave chip select (SPI0_SCS), is connected. SPI1 is unused for serial SPI_CLK_DSP respectively optional DSP_CS SPI_SCS; the slave chip select is driven by GPIO17. The SPI master/slave handshake signals, master SPIx_ENA and slave SPIx_ENA, “increase SPI bus throughput since the master does not need to delay each transfer long enough to allow for the worst case latency...”3
IPC messaging: short messages are matched to queue lengths for efficiency, as is done in many protocols. The SPI module has a 16 bit shift register and a 16 bit buffer.
IPC data: large data transfers may be passed outside the short message queues through BIOS streaming, a “continuous sequence of real-time data. Messaging is generally performed with zero copying.” bios_5_41_07_24\packages\tibios\example\advanced\streaming provides valuable streaming Pipe IO example code.
3.2.5 uClinux and IPC
Inter-Process Communication inherited from UNIX architectures is consistent with xNIX architectures, including MMU-less variants for ARM9 cores. TI has provided uClinux (ucLinux) compatibility for the DSPLINK/dsplinkk.ko.
3.2.6 OMAP-L1x MSGQ (IPC) and DSPLIB/DSPLink
A valuable application of Inter-Process Communication is provided by:
This is a valuable example to consider as manufactured boards are arriving with OMAP/6747 hardware installed. An ARM Linux shell application loads and runs a DSPLINK application connected through the MSGQ API. Following the “getting started guide for the EVM” and using command line arguments supplied by the user, the ARM application can then report total processing time (including DSP execution and MSGQ communication). This is an effective system model of the ARC5_B hardware as built.
Performance of TI DSPLIB resources is referenced for both OMAP-L137 and 674x implementations. A performance spreadsheet is provided in the docs folder of the C674x DSPLIB installation (C:\CCStudio_v3.3\c674x\dsplib_v12\docs). For the OMAP-L1x sample application, performance cycle counts with and without IPC are provided on the web page for DSPF_sp_mat_mul (1.24ms), DSPF_sp_mat_mul_cplx (.812ms) and DSPF_sp_mat_trans (.721ms), which are of particular interest to radar application processing using the ESPRIT algorithm.
3 6747 Fixed/Floating-point Digital Signal Processor.
DSPLIB c674x/dsplib_v11 and dsplib_v12 have effective release dates of 6/25/2009 and 1/5/2010 respectively4. Example source includes input data and benchmark results for several matrix math operations.
DSPLINK
DSPLINK provides an IPC-like software support package to a DSP co-processor running TI DSP/BIOS, providing an API to TI DSPLIB functions. DSPLINK supports an interface to and from more traditional Linux based operating systems supporting IPC, as demonstrated in the OMAP-L137 (ARM926EJ-S <=> 6747) and DaVinci example code. Ring IO buffering is compatible with operating systems supporting IPC.
TI representatives have expressed concerns that the DSPLINK is a large module.
RELEASE BUILD
 102674 2009-04-16 21:37 dsplink.lib
 122946 2009-04-16 21:37 dsplinkk.ko
DEBUG BUILD
1261756 2009-04-16 21:37 ../DEBUG/dsplinkk.ko
 360267 2009-04-16 21:37 ../DEBUG/dsplink.lib
Building a suitable library with kernel module for an embedded system appears to depend only on effective tools usage. As the DSPLINK can be 'scaled at compile time' to add or remove functionality, it is not clear how size could be a design consideration.
DSPLINK and all dependent components have been built and integrated onto the OMAP-L137 EVM hardware. Working example code (DSPLIB/DSPLINK Application on OMAP-L1x, DSPF_sp_mat_mul_cplx) was used to validate the completeness of the newly built source code; the performance measured was comparable to published benchmarks.
DSPLINK services across the GPP-DSP boundary:
• Basic processor control
• Shared/synchronized memory pool across multiple processors
• Notification of user events
• Mutually exclusive access to shared data structures
• Linked list based data streaming
• Data transfer over logical channels
• Messaging (based on MSGQ module of DSP/BIOS)
• Ring buffer based data streaming
• Zero Copy Messaging
Support for different physical links can be accommodated through the LINK DRIVER (LNK_012_DES.pdf, DSP/BIOS LINK, LNK 012 DES, Link Driver).
4 Difficult to understand why hand coded matrix multiply routines were being reviewed as late as August of
2010 with C++ types and double indexed arrays, long understood to block 'parallelization'.
Illustration 4: Texas Instruments DSP/BIOS Link Architecture
Illustration 5: GPP-DSP connectivity through DSP/BIOS LINK
Illustration 6: Architecture RingIO Transfer, Shared Memory
work/OMAP-L137/OMAPL137_arm_1_00_00_11/dsplink-1_61_03-prebuilt/packages/dsplink/doc> kpdf
The General Purpose Processor (GPP) end of the DSPLINK supports Linux (MV_pro5), Nucleus, and PrOS (eSOL). Native build tools are necessary, dependent on the GPP's target OS.
Ring IO (LNK_129_DES.pdf): This component allows creation of a ring buffer within the shared memory. The reader and writer of the ring buffer can be on different processors.
LDRV Link Driver: Complete IPC support including semaphores, HW and SW interrupts, as well as data messaging and control messaging. Debug and informational statistics support is also provided (Procstats, MSGQstats, and Chnlstats) for integrated kernel logging.
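The ring buffer idea behind Ring IO can be modelled in a few lines. The sketch below is a single-process Python model with one write index and one read index over a fixed byte array; it is not the DSPLINK RingIO API, and the class and method names are hypothetical. In a real shared memory implementation the two indices would live in shared memory and be updated by different processors.

```python
class RingBuffer:
    """Fixed-size byte ring with one writer index and one reader index."""

    def __init__(self, size):
        self._buf = bytearray(size)
        self._size = size
        self._write = 0   # next position the writer fills
        self._read = 0    # next position the reader drains
        self._count = 0   # bytes currently stored

    def free_space(self):
        return self._size - self._count

    def available(self):
        return self._count

    def write(self, data):
        """Copy in as many bytes as fit; return the number written."""
        n = min(len(data), self.free_space())
        for i in range(n):
            self._buf[(self._write + i) % self._size] = data[i]
        self._write = (self._write + n) % self._size
        self._count += n
        return n

    def read(self, max_bytes):
        """Remove and return up to max_bytes in arrival order."""
        n = min(max_bytes, self._count)
        out = bytes(self._buf[(self._read + i) % self._size] for i in range(n))
        self._read = (self._read + n) % self._size
        self._count -= n
        return out


if __name__ == "__main__":
    ring = RingBuffer(8)
    ring.write(b"abcdef")
    print(ring.read(4))
    ring.write(b"ghijkl")      # wraps around the end of the buffer
    print(ring.read(8))
```

Because the writer only advances the write index and the reader only advances the read index, the two sides need very little synchronization, which is what makes the construct attractive across a GPP-DSP boundary.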
Working examples are provided for the OMAP-L137 – 6747. Integration into the no-OS micro-controller constructs used on the TMS320F28335 could prove challenging, though there seems to be no hard multi-threading requirement placed on the GPP.
3.2.8 Host Port Interface
The Host Port Interface (HPI) is provided as a “parallel port interface through which an external host processor can directly access the processor's resources (configuration and program/data memories).”5 The HPI is a user configurable 16 bit interface. Dedicated address (HPIA), data (HPID), and control (HPIC) registers are also provided. See Sprufm7d.pdf.
The TMS320F28335 provides host processing support for radar designs. The 28335 does not support UHPI as it is defined; discrete GPIO pins (XZCS7-GPIO37, XRD, XZCS0-GPIO36, HRDY-GPIO28, HINT-GPIO63) are defined through software and the 335's “External Interface XINTF” (4.14, TMS320F28335 Data Manual) to provide HPI functionality to the C6747.
The HPI is supported as a Transport (6.5.4 Transports) of the BIOS Message Queues Input/Output support. The TI DSP BIOS supports its IPC-like Message Queue (MSGQ) interface over the HPI as a supported infrastructure layer. Some example code exists for a MSGQ implementation. TI DSP bios/packages/ti/bios/
Working with Brad Griffis provided some feedback on the suitability of IPC mechanisms between C2000 and C6000 devices. The DSP-DSP interface model provided by the TMS320F28335 connected through XINTF/HPI to the TMS320C6747 is supported by the BIOS-BIOS MSGQ interface, with a driver abstraction MQT (message queue transport). Alternatively, DSPLINK was developed for the GPP-DSP interface model, or ARM-C6747 as exists in the OMAP family.
Since legacy implementations of radar devices do not implement the TI DSP/BIOS framework on either DSP device, an original coding, development and integration effort will be necessary. Lyrtech has been contracted for both HPI software and Ethernet, further postponing the effort for a common device driver interface and OS-BIOS framework.
File->New->DSP/BIOS Configuration. Opens Configuration1 Panel “Input/Output” selection 'MSGQ'
Illustration 7: CCStudio MSGQ configuration
5 13. sprufk9b.pdf 674x/OMAP-L1x Processor Peripherals Overview.
The MSGQ configuration in CCStudio is analogous to spru423.pdf, TMS320 DSP/BIOS v5.41 User's Guide, Section 6 Input/Output Methods, subsection 6.5 Message Queues.
Illustration 8: DSP/LINK Message Queue MSGQ
3.2.9 Radar Data and the Host Port Interface
The TMS320F28335 provides the ADC support for the incoming radar data; the TMS320C6747 provides signal processing as well as an external Ethernet interface for downloading antenna data. Due to data rates and the limitations of the HPI and shared memory access to the 6747 SRAM, a basic buffering and DMA solution will be required. Buffering ADC (converter) data in a ring buffer construct on the 335 will address real time data rates, minimize data usage and allow other functions to use the HPI to the 6747 co-processor. A DMA interface function managing current and end pointers will allow for contiguous shared memory writes of radar data and support HPI interface interruptions during large accumulations of entire radar data sets.
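The pointer management described above can be sketched as a function that splits the pending ring buffer contents into contiguous transfers. The sketch assumes a word-indexed ring buffer and a hypothetical max_burst limit that bounds each transfer, so that other HPI users can be serviced between bursts; neither the names nor the burst policy come from the existing design.

```python
def contiguous_chunks(read_index, pending, buffer_len, max_burst):
    """Return (offset, length) pairs covering 'pending' words from read_index.

    Each pair describes one contiguous transfer: it never crosses the end of
    the ring buffer and never exceeds max_burst words.
    """
    if buffer_len <= 0 or max_burst <= 0:
        raise ValueError("buffer_len and max_burst must be positive")
    if not 0 <= read_index < buffer_len:
        raise ValueError("read_index outside the buffer")
    if not 0 <= pending <= buffer_len:
        raise ValueError("pending count exceeds the buffer")

    chunks = []
    offset = read_index
    remaining = pending
    while remaining > 0:
        to_end = buffer_len - offset              # words left before wrap
        length = min(remaining, to_end, max_burst)
        chunks.append((offset, length))
        offset = (offset + length) % buffer_len   # wrap to the start
        remaining -= length
    return chunks


if __name__ == "__main__":
    # 5 words pending, starting 2 words before the end of an 8-word ring.
    print(contiguous_chunks(read_index=6, pending=5, buffer_len=8, max_burst=4))
```

Each returned pair maps onto one DMA descriptor, and an interrupted accumulation can resume from the first pair that has not yet been transferred.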
Option 1: using TI DSP/BIOS, available to the 335 and 6747.
• ADC Radar Data with Stream IO interrupt handler (buffer put)
• DMA application, buffer get, DMA to 6747 shared memory buffer area.
• Control/IPC Message MSGQ and SWI/HWI 6747 signal: start algorithm on 1-n buffers.
• Result data / Raw data SIO to Ethernet TX ring.
Data flow Diagram Here.
Option 2: DSPLINK, working example code (October 12, 2010), port to HPI using LDRV Link driver.
• ADC Radar Data with Stream IO interrupt handler (buffer put)
• DSPLINK with 'Ring IO' (DMA application, buffer get)
• Algorithm integrated with DSPLIB API, start algorithm on 1-n buffers.
• Result data / Raw data Stream IO SIO to Ethernet TX ring.
Option 3: Linux style IPC to RTOS DSP co-processor.
Data, Buffer, Buffer Management, Ring Buffers
LTTng Linux Trace Tool
3.3 Signal Processing for Long Range Radar
A repository for Digital Signal Processing development notes for Radar applications.
Measurement of floating point DSP performance presents unusual challenges. A theoretical calculation of CPU instructions without consideration of the actual number of pipeline instructions and execution cycles (single precision 4, double precision 10; see footnote 6) can lead to significant errors in estimated execution times. The problem is further complicated by variable optimization performance achieved through parallel instruction execution, which can produce up to a 20x (see footnote 7) performance improvement over Linear Predictive Coding. Unfortunately, not all functions, operations, or algorithms can achieve fully parallelized execution for maximum optimized performance.
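The size of the estimation error can be illustrated with a simple cycle model. The sketch below compares a naive estimate, in which every multiply waits for the previous result, with a pipelined estimate in which a new multiply is issued every few cycles. The latencies (4 and 10 cycles) are the figures quoted above; the initiation interval values are assumptions chosen for illustration, and real code is further limited by memory access and loop overhead.

```python
def unpipelined_cycles(n_ops, latency):
    """Cycles if each operation starts only after the previous one finishes."""
    return n_ops * latency


def pipelined_cycles(n_ops, latency, initiation_interval=1):
    """Cycles if a new operation is issued every initiation_interval cycles."""
    if n_ops == 0:
        return 0
    return (n_ops - 1) * initiation_interval + latency


if __name__ == "__main__":
    n = 1024  # multiplies in a hypothetical inner loop
    for name, latency, interval in (("single precision", 4, 1),
                                    ("double precision", 10, 4)):
        naive = unpipelined_cycles(n, latency)
        piped = pipelined_cycles(n, latency, interval)
        print(f"{name}: naive {naive} cycles, pipelined {piped} cycles, "
              f"ratio {naive / piped:.2f}x")
```

The gap between the two estimates is the reason measured benchmarks on the target hardware are preferred over instruction counting.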
Fortunately, DSP silicon vendors as well as DSP design solutions vendors have significant market incentives to measure and benchmark (see footnote 8) their performance against existing technologies. With system design experience it is possible to navigate the volumes of performance data to produce accurate embedded performance estimates, which can then be measured on embedded hardware.
Memory architectures must also be considered when determining expected embedded system performance; embedded designs typically have much slower clock rates and narrower bus widths. Few simulation tools for embedded devices provide a matrix for memory interfaces, hence only relative CPU cycle counts would be useful.
The compiled Esprit program executable binary with libraries is 8.023805 Mbytes, and 8.027934 Mbytes with smoothing conditional code (ARC_5B: 4 Mbytes). The Esprit program compiled against TI DSP libraries and BIOS is currently 1 Mbyte.
6 Reference 1. page 3-32 “Total Result Latency” MPYSP and MPYDP
7 Reference 2. E2E Forum C/C++ compiler group.
8 Reference 4 dsplib developers notes.xls DSPF_sp_mat_mul_cplx 553 Cycles (Absolute)
3.4 Texas Instruments DSP BIOS → SYS BIOS
The TI DSP BIOS has been “designed to minimize memory and CPU requirements”. It has been optimized
to effectively work with the TI tools taking advantage of parallelization with key performance improvements
implemented in assembly language.
The TI DSP/BIOS is explained at length through three references:
1. “Using DSP/BIOS” lessons in the online Code Composer Studio Tutorial.
2. “TMS320C6000 DSP/BIOS 5.31 Application Programming Interface (API) Reference Guide SPRU403M”
3. “TMS320 DSP/BIOS User's Guide SPRU423F”
BIOS versions 5_31_02 and 5_33_01 are available, as is the latest available for CCSv3, 5_41_07_24; BIOS is also available for Linux (Bios_5_33_05, /home/mnolin/work/OMAP-L137). Quick Start instructions: “If you want to quickly try DSP/BIOS with command-line/makefile builds (and not use Code Composer Studio)...”
The TI DSP BIOS is supported across the family of TI parts, including the TMS320C6000 and TMS320C2000/TMS320F28335; a license agreement may be necessary (2008) to obtain full sources.
The newly published IPC User's Guide notes: “Previous versions of SYS/BIOS were called DSP/BIOS. The new name reflects that this operating system can also be use on processors other than DSPs.”9
TI DSP BIOS updates for version 6.x: “IPC support may be used independently of core DSP/BIOS 6.x kernel functionality.”
Upgrading to CCSv4 is required for DSP BIOS 6.x; DSP BIOS 5.41 works with CCSv3 or CCSv4.
3.5 Texas Instruments FastRTS Library
The TMS320C67x Fast Run-Time-Support Library provides 26 optimized floating-point math functions for the TMS320C67x (Spru100A).
Note: We have already root-caused significant performance problems that resulted from mistakenly linking against older RTS library routines.
3.6 Applying New Technology:
The Texas Instruments C6xxx floating point processor family is a mature device family dating back to 1999-2000. While there have been improvements in clock rates (up to 1GHz) and multi-core devices, these advances do not exist for the mid range power devices (4-5 watts). The C6747 device represents a new integration of fixed point and floating point DSP features that in the past appeared in separate devices.
Recently released documentation (April 2010) related to the C6747 Host Port Interface suggests a new DSP co-processor system architecture.
9 SYS/BIOS Inter-Processor Communication (IPC) and I/O User's Guide SPRUG06B May 2010
3.7 Software Libraries
Software libraries are common to many software projects and to all levels of infrastructure, from web interfaces to low level scientific libraries and simple string processing libraries. Discussion of the suitability of libraries to any system design is best left to experienced professionals.
3.8 Linear Math Libraries cblas_zgemm and zgemm:
Esprit calls cblas_zgemm 6 times, and the getS1S2 function calls cblas_zgemm another 2 times, for a total of 8 calls to cblas_zgemm. These well defined linear algebra functions and APIs made a logical starting point for the embedded integration effort. The core mathematical processing of the Esprit algorithm is handled by the 8 function calls to zgemm, the (complex) general matrix multiplication function.10
C := alpha*op( A )*op( B ) + beta*C
The repetitive nature of linear algebra mathematics with matrices has been optimized over several decades, centered on the essential equation above.
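To make the equation concrete, the sketch below is a plain reference model of the zgemm operation in Python, using lists of complex numbers. It is meant only to pin down what op(), alpha and beta do; the function names are hypothetical, the argument order loosely follows the BLAS convention without the dimension and leading-dimension arguments, and an optimized library or DSPLIB routine would be used on the target.

```python
def _op(matrix, trans):
    """Apply the BLAS op(): 'N' as-is, 'T' transpose, 'C' conjugate transpose."""
    if trans == "N":
        return [list(row) for row in matrix]
    if trans == "T":
        return [list(col) for col in zip(*matrix)]
    if trans == "C":
        return [[value.conjugate() for value in col] for col in zip(*matrix)]
    raise ValueError("trans must be 'N', 'T' or 'C'")


def zgemm_reference(trans_a, trans_b, alpha, a, b, beta, c):
    """Return alpha*op(A)*op(B) + beta*C as a new list of rows."""
    op_a = _op(a, trans_a)
    op_b = _op(b, trans_b)
    m, k, n = len(op_a), len(op_b), len(op_b[0])
    if any(len(row) != k for row in op_a):
        raise ValueError("inner dimensions of op(A) and op(B) differ")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("C must be m x n")
    result = []
    for i in range(m):
        row = []
        for j in range(n):
            acc = sum((op_a[i][p] * op_b[p][j] for p in range(k)), 0j)
            row.append(alpha * acc + beta * c[i][j])
        result.append(row)
    return result


if __name__ == "__main__":
    a = [[1 + 1j, 2], [0, 1j]]
    identity = [[1, 0], [0, 1]]
    ones = [[1, 1], [1, 1]]
    print(zgemm_reference("N", "N", 2, a, identity, 1, ones))
```

A model like this is also useful as a golden reference when checking the output of an optimized routine on the embedded target against known input data.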
3.9 OMAP-L137
System overview of OMAP:
Multiple CPU masters (ARM or DSP) are combined with multiple slaves (peripherals, memories), all managed through Switched Central Resource (SCR) modules. This is a powerful yet complex combination of asymmetric CPU cores and peripheral resources.
The OMAP-L137 EVM is intended for high end multi channel audio processing.
The OMAP data sheets describe the Programmable Real-Time Unit Subsystem (PRUSS); two units are included, PRU0 and PRU1. Complete with an interrupt controller and associated memories, the PRUs provide programmable instruction memory that can be used to perform a variety of embedded tasks with tight real-time constraints. For radar designs this flexibility can alleviate SPI bus bottlenecks or implement a CAN bus interface. See 1.4 and 6.20 of OMAP-L137 ADVANCE INFORMATION.
10 Reference 9 B. General ESPRIT Algorithm 1-7
ARC5_B schematics indicate a common JTAG header, “JTAG CON”, connected to both the TMS320F28335
and the TMS320C6747xxx. This is not consistent with the OMAP-L137 EVM, which provides separate JTAG
connector headers for each JTAG target: “ARM JTAG” and “TI JTAG”. The OMAP-L137 EVM provides
external logic to multiplex the ARM and TI JTAG connectors to the OMAP-L137ZKB (pins J1, 2, 3,4 H3
TCLK). Signals DSP_EMU0 and DSP_EMU1, connected to J5 GPIO7_15, are provided by JTAG_EMU1 and
JTAG_EMU0 from J4, the “TI JTAG” connector, effectively providing a workaround for the standard JTAG
scan chain. The ARM JTAG connector, which provides no JTAG_EMUx signals, provides default ARM JTAG
access to the TMS320C6747ZKB. See EVM schematic pages 10 and 22 of EVMOMAPL137_TechRef_revg.pdf.
According to Section 6.31, JTAG Port Description, of the OMAP-L137 Advance Information data sheet, JTAG
scan chain taps are used to select the C674x or ARM926 debug interface. The ARC5_B schematics should
support three JTAG tap IDs, for the F28335, ARM926 and C6747.
4 References:
1. Numerical Recipes Third Edition William H. Press, Saul A. Teukolsky, William T. Vetterling,
Brian P. Flannery. 2007.
2. “Numerical Recipes, The Art of Scientific Computing” Third Edition 2007
3. cblas_zgemm
4. “Numerical Methods for DSP Systems in C Practical application of numerical methods in : Signal
Processing, Graphics, Video Programming, Scientific Applications” Don Morgan.
5. “Digital Signal Processing and Applications with the C6713 and C6416 DSK” Rulph Chassaing 2005 by
John Wiley & Sons Inc.
6. ESPRIT Beam Forming for the Autoliv Long Range Radar, Bruce Labitt.
7. “Singular Value Decomposition – A Primer” Sonia Leach Department of Computer Science Brown
University Providence RI 02912. DRAFT VERSION. (Postscript) Ghost view (1994)
4.1 Texas Instruments
8. Texas Instruments TMS320C6000 Optimization Workshop Student Guide
9. TI E2E Community “Use of C++ <complex> types and measured performance” TI C/C++ Compiler
Forum Clear Quest SDOWP ID#SDSCM00037600 Georgem.
10. Compiler Tuning Software Pipelined Loops
TI DSPLIB “Legacy ASM Implementation from C67x” DSPF_sp_mat_mul_cplx.asm
13. SPRU423F “TMS320 DSP/BIOS User's Guide” November 2004
14. TMS320C6000 DSP/BIOS 5.31 Application Programming Interface (API) Reference Guide
(spru403m.pdf) July 2006
15. TMS320C674x/OMAP-L1x Processor Peripherals Overview Reference Guide SPRUFK9B June 2009
Users Guide sprufm7d.pdf April 2010.
16. OMAP-L137 Low-Power Applications Processor ADVANCE INFORMATION September 2008 revised
August 2010.
17. OMAP-L137/TMS320C6747 Floating-Point Starter Kit '01 May 09' Early Adopter (EA) and (GA)
18. TMS320C67x FastRTS Library Programmer's Reference spru100a
19. TMS320F28335, ..334, 332,235,234,232 Digital Signal Controllers Data Manual SPRS439H June 2007
– Revised March 2010
20. TMS320x28xx, 28xxx DSP Peripheral Reference Guide SPRU566D June 2003 - October 2006
21. DSP/BIOS LINK LNK 058 USR User's Guide Version 1.61.03 March 31, 2009. OMAPL137/OMAPL137_arm_1_00_00_11/dsplink-1_61_03-prebuilt/packages/dsplink/doc
22. xDAIS-DM User's Guide
4.2 SAAB (Microwave)
SAAB documents concerning ARC4 and ARC5 devices can be found on the shared drive S:\AEACommon1\
active_safety\24GHz\Saab Transfer Documents several documentations revi
23. ARC5 SW design considerations 2009-05-25 A24R-00104 DDJX, Alexei Zernov
5 Appendix CLAPACK-3.2.1, LAPACK-3.2.2 ATLAS
6 Linux and the Texas Instruments c6x family
The spread of multiple devices and multiple cores into many embedded products has led to questions of
interoperability and communication between devices. Multi-processor, multi-core designs with built-in
architecture support from Unix, Linux and uClinux are challenging the traditional RTOS micro-controller
software support packages. RTOS vendors must support a common IPC-like communications interface to
enable system integrators to include micro-controllers in current multiprocessor architecture designs. For
many reasons, Linux-to-RTOS IPC is expected in today's technology systems.
EABI Embedded Application Binary Interface, a requirement of DSPLINK and TI PSP Platform Support
7 Software as Texas Instruments redefines it.
Texas Instruments, as a hardware company, has several definitions of its own to describe basic C programming
concepts, some dating back to the origins of Unix and Xnix-like concepts:
IPS: Interprocessor Signaling (Semaphores)
PSP: Platform Software Package, device drivers for the 6747 for DSP/BIOS environments.
XDC eXpanDed C : A command shell for GNU make support, (see Richard Stallman,
XDM: xDAIS-DM (Digital Media) User's Guide. The xDM standard defines a uniform set of APIs across
various multimedia codecs to ease integration and ensure interoperability. xDM is built over TI’s well-proven
eXpress DSP Algorithm Interoperability Standard (also known as xDAIS) specification.
It is a form of binary archive container for 3rd-party intellectual property, without which businesses would
not undertake research and development of new algorithms.
8 OMAP Software Resources.
Following the OMAP-L1x Getting Started guides through to the working example code provided for
“DSPLIB/DSPLink Applications” illustrated the value of TI's solution for integrating DSP co-processing
devices into multiple-core design solutions.
After running the pre-built executables for the “Example DSPLIB/DSPLink Application on OMAP-L1x”,
building from sources further validated the build instructions and the portability of DSPLINK to new designs.
-rw-r--r-- 1 users 1804720 2010-10-13 14:59 montavista/pro/devkit/lsp/ti-davinci/linux2.6.18_pro500/arch/arm/boot/uImage
LSP: Linux Support Package; U-Boot, kernel, User-Bootloader and flash drivers.
Linux Utils 2.23.01 (2009):
CMEM: contiguous memory manager.
SDMA: not applicable (NA) for OMAP-L137.
VICP: C64x Video Image Co-Processing; includes vicp and irq.
The Linux Utils utility package gives user-mode applications access to the CMEM, EDMA, SDMA, and VICP
utility libraries.
QA-C: proprietary “static” tools that check for MISRA compliance.
Illustration 9: DSP hardware accelerator or algorithm co-processing engine
DSPLink Application Block Diagram: an illustration from Texas Instruments of the OMAP-L137 integrated as a
DSP hardware accelerator or algorithm co-processing engine.
Illustration 10: DSP algorithm co-processing engine and external peripherals
DSPLink Application Block Diagram: an illustration from Texas Instruments of the OMAP-L137 integrated as a
DSP hardware accelerator or algorithm co-processing engine, with the additional flexibility of external
peripherals.
Currently Long Range Radar implements an external Ethernet connected to the C6747 DSP for radar data
Last login: Fri Jan 14 07:37:28 2000 on console
Linux 2.6.18_pro500-da830_omapl137_evm-arm_v5t_le #1 PREEMPT Wed Oct 13 14:58:53 EDT
2010 armv5tejl GNU/Linux
Welcome to MontaVista(R) Linux(R) Professional Edition 5.0.0 (0801921).
root@ lsmod
Size Used by
root@ ./
CMEMK module: built on Oct 14 2010 at 15:20:13
Reference Linux version 2.6.18
File /home/mnolin/work/OMAPL137/OMAPL137_arm_1_00_00_11/linuxutils_2_23_01/packages/ti/sdo/linuxutils/cmem/src/module/cmemk.c
ioremap_nocache(0xc2000000, 12582912)=0xc3000000
allocated heap buffer 0xc3000000 of size 0x8ac000
cmem initialized 3 pools between 0xc2000000 and 0xc2c00000
dsplinkk: no version for "struct_module" found: kernel tainted.
DSPLINK Module (1.61.03) created on Date: Oct 13 2010 Time: 17:08:03
root@ modinfo dsplinkk.ko
GPL v2
2.6.18_pro500-da830_omapl137_evm-arm_v5t_le preempt mod_unload ARMv5 gcc-4.2
root@ modinfo cmemk.ko
2.6.18_pro500-da830_omapl137_evm-arm_v5t_le preempt mod_unload ARMv5 gcc-4.2
Start Address for CMEM Pool Memory (charp)
End Address for CMEM Pool Memory (charp)
List of Pool Sizes and Number of Entries, comma separated,
decimal sizes (array of charp)
Start Address for Extended CMEM Pool Memory (charp)
End Address for Extended CMEM Pool Memory (charp)
List of Pool Sizes and Number of Entries, comma separated,
decimal sizes, for Extended CMEM Pool (array of charp)
Set to 1 if cmem range is allowed to overlap memory range
allocated to kernel physical mem (via mem=xxx) (int)
root@ ./call_dsplib DSPF_sp_mat_mul_cplx mat_mul_cplx_input.tx
Initializing DSPLINK...
Calling DSPLIB function...
Received response from DSP after 0.000785 seconds.
DSP completed processing in 89271 cycles.
Closing DSPLINK...
call_dsplib completed successfully!
root@ cat /proc/version
Linux version 2.6.18_pro500-da830_omapl137_evm-arm_v5t_le (mnolin@linux-lap) (gcc version 4.2.0
(MontaVista 4.2.0- 2008-08-30)0
Illustration 11: OMAP-L137, running “Example DSPLIB/DSPLink Application on OMAP-L1x”.
The kernel modules shown were built from source; the MontaVista pro5 kernel was also built on October 13th.
'Terminal', the minicom console port, telnet and ssh terminals are supported.
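The figures in the console output above can be cross-checked with a little arithmetic. The sketch below converts the reported DSP cycle count to time and compares it with the reported round-trip time, then checks the CMEM address range against the ioremap length. The 300 MHz DSP clock is an assumption made for illustration only; the log does not state the clock rate, so the derived times scale with whatever the board actually runs at. All function names are hypothetical.

```python
# Values copied from the console output.
DSP_CYCLES = 89271            # "DSP completed processing in 89271 cycles."
ROUND_TRIP_S = 0.000785       # "Received response from DSP after 0.000785 seconds."
CMEM_START = 0xC2000000       # "cmem initialized 3 pools between ..."
CMEM_END = 0xC2C00000
IOREMAP_LENGTH = 12582912     # "ioremap_nocache(0xc2000000, 12582912)"
HEAP_SIZE = 0x8AC000          # "allocated heap buffer ... of size 0x8ac000"

# ASSUMPTION: DSP core clock, not given anywhere in the log.
ASSUMED_DSP_CLOCK_HZ = 300e6


def cycles_to_seconds(cycles, clock_hz):
    """Convert a cycle count to seconds at the given clock rate."""
    return cycles / clock_hz


def link_overhead_seconds(round_trip_s, cycles, clock_hz):
    """Round-trip time seen on the ARM side minus the DSP compute time."""
    return round_trip_s - cycles_to_seconds(cycles, clock_hz)


def cmem_range_bytes(start, end):
    """Size in bytes of the physical range handed to CMEM."""
    return end - start


if __name__ == "__main__":
    compute_s = cycles_to_seconds(DSP_CYCLES, ASSUMED_DSP_CLOCK_HZ)
    overhead_s = link_overhead_seconds(ROUND_TRIP_S, DSP_CYCLES, ASSUMED_DSP_CLOCK_HZ)
    print("DSP compute time  : %.2f us (at assumed clock)" % (compute_s * 1e6))
    print("Remaining overhead: %.2f us" % (overhead_s * 1e6))

    total = cmem_range_bytes(CMEM_START, CMEM_END)
    print("CMEM range        : %d bytes (%d MiB)" % (total, total // (1024 * 1024)))
    print("Matches ioremap   : %s" % (total == IOREMAP_LENGTH))
    print("Outside heap      : %d bytes" % (total - HEAP_SIZE))
```

Under the assumed clock, a sizeable share of the round trip is spent outside the DSP computation itself (messaging, cache maintenance and ARM-side scheduling are the likely candidates), which is worth measuring directly before drawing conclusions for small matrices.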
9 VisionMid TMS320DM814x Software Resources
Product Status: Product Preview (PP)
The most recent “Engineering Release” of the VisionMid Software kit dates from December 2, 2010.
VisionMid support is currently available only in the latest CCStudio, on Windows hosts. The required BIOS
and PSP package, “DM8148 BIOS PSP”, is likewise available only with limited functionality. Attempts to
build the sample test applications were unsuccessful. A 30-day trial of CCStudio, due to expire
January 21st, is also a dependency.
TMS320DM814x DaVinci Digital Media Processors TMS320DM8148(ALP) November 5,2010 REL-B
It is clear that VisionMid support is available only as an internal engineering release; several releases may
be necessary before the performance benchmarks provided with the OMAP PSP can be executed.
Program Files\Texas Instruments\pspdrivers_02_20_00_02\docs\DM8148
Release Notes: this PSP release is a Beta release for the EVM DM8148.
DSP/BIOS Version
CCStudio Version
CG tools 4.6.3
EDMA3, XDC Tool IPC, and supported drivers
Tables of performance and memory usage are provided in the data sheets for each driver.
“Please note that at this point of time the drivers does not have any abstraction for the OS APIs and they
use the OS (BIOS inside the drivers.”
pspdrivers_02_20_00_02\packages\ti\psp\mcspi\docs 4 channel chip select (SPIEN). TI DSP/BIOS
driver with streams interface.
CSLR: Chip support register configuration (macros)
PCRM: helps to turn the clock on/off for the modules
XDC (eXpanDed C) EA1 Release, Eclipse Community Forums: DSDP Real Time software Components
Eclipse DSDP device software development project?
DM8148_BIOSPSP_Userguide.pdf Installation Guide.
All new IPC interface May 20,2010 IPC_Users_Guide.pdf
Pseudo Code
Matlab scripts,
Illustration 12: BIOS PSP Users Guide (OMAP-L137) block driver
Illustration 13: BIOS PSP driver with streaming interface (OMAP-L137)