Embedded Processing Portfolio
Software, Tools, Kits & Boards

Microcontroller (MCU) Portfolio at-a-glance
• 16-bit Ultra-Low Power & Value Line MCUs: MSP™ MCU
  – MSP430 MCU
  – Applications: measurement, sensing, general purpose, consumer, medical
• 32-bit Real-Time MCUs: C2000™ MCU
  – Delfino, Piccolo single-core MCU
  – Concerto C28x + ARM Cortex™-M
  – Applications: motor control, digital power, lighting, renewable energy, smart grid

ARM® Portfolio at-a-glance
• 32-bit MCUs: Stellaris® ARM MCU & Hercules™ Safety ARM MCU
  – ARM Cortex™-M
  – ARM Cortex™-R
  – Applications: industrial automation, motion control, human machine interface, smart grid, safety, transportation, industrial & medical
• 32-bit Microprocessors: Sitara™ ARM MPU
  – ARM Cortex-A8
  – ARM9™
  – Applications: connected point-of-service, human machine interface, industrial automation, portable navigation

Digital Signal Processor (DSP) Portfolio at-a-glance (DSP & ARM® MPU)
• 16/32-bit Single-core DSPs: C6000™ & C5000™ single-core DSP
  – C6000 high performance fixed/floating-point DSP
  – C5000 ultra-low power fixed-point DSP
  – Applications: audio/voice, video, fingerprint biometrics, portable medical, sensors
• 32-bit Multicore DSPs: C6000™-based multicore DSP, fixed/floating-point
  – DSP + ARM
  – C66x multicore DSP
  – DaVinci video processors
  – Applications: high performance real-time computing, video security and analytics, video communications, multimedia infrastructure

C67x Architecture and Features

C6x VLIW CPU Core
[Block diagram: ’C62x Fixed-Point CPU Core with Program Fetch, Instruction Dispatch, Instruction Decode; Data Path 1 (A Register File; units L1, S1, M1, D1); Data Path 2 (B Register File; units D2, M2, S2, L2); Control Registers; Control Logic; Test; Emulation; Interrupts]
• DSP architecture challenge:
  – DSP algorithms have a high degree of parallelism
  – Cost-effective control of parallelism is difficult
• VLIW architecture solution:
  – Provides simple, cost-effective control of parallelism
    • fetches 8 instructions/cycle
    • executes 1-8 instructions/cycle, reducing code size, program fetches, and power consumption
  – Can support high-performance compilers
    • 3x improvement in efficiency based on DSP benchmark suite
  – Can scale to support architectural enhancements

C67x Floating point core
• Performance (Comm/Ind)
  – IEEE Floating Point Format
    • Double Precision
    • Single Precision
  – 668 Multiplies & Accumulates, Single-Precision
    • 2 Multipliers (334 MFLOPS)
    • 2 ALUs (334 MFLOPS)
  – 420 MFLOPS, Double Precision
  – 250 Multiplies & Accumulates, Double-Precision
    • 1 Result/4 Cycles (83.5 MFLOPS)
    • 1 Result/2 Cycles (167 MFLOPS)

VelociTI™: Speed with efficiency
• Execute: CPU executes 1 to 8 instructions/cycle
• As a result, fetch packets can contain multiple execute packets
• Parallelism is determined at compile/assembly time and can be:
  – Fully Parallel
  – Fully Serial
  – Serial/Parallel

Floating Point DSP Comparison
Feature      | C6701B (167 MHz)                      | C6713B (200 MHz)                        | C6727 (250 MHz)
MIPS         | 167 x 8 = 1336                        | 1600                                    | 2000
MFLOPS       | 1000                                  | 1200                                    | 1500
Architecture | C67x                                  | C67x                                    | C67x+
Memory       | 64KB Data Memory, 64KB Program Memory | 4KB L1-P, 4KB L1-D, 256KB L2 Cache/SRAM | 32KB L1-P, 256KB L2 SRAM, 384KB ROM
HPI          | HPI-16                                | 1 (32/16-bit)                           | 1 UHPI (32/16-bit)
EMIF         | 100 MHz 32-bit (SDRAM)                | 100 MHz 32-bit (SDRAM)                  | 100 MHz 32-bit (SDRAM)
DMA          | 4-ch DMA                              | 16-ch EDMA                              | 16-ch dMAX
McBSP        | 2                                     | 2                                       | 0
McASP        | 0                                     | 2                                       | 3
I2C          | 0                                     | 2                                       | 3
SPI          | 0                                     | 0                                       | 2 (10 MHz)
Package      | 429-pin Ceramic BGA (27 mm, 1.27 mm); 352-pin Plastic BGA (35.2 mm, 1.27 mm) | 272-pin PBGA (27x27 mm, 1.27 mm) | 256-pin PBGA (16x16 mm, 1.0 mm); Ceramic Package TBD
The three devices are software compatible.

TMS320C672x Device Overview
[Block diagram: TMS320C672x Floating-Point DSP with C67x+™ DSP Core, Instruction Cache (32K Bytes), Memory Controller, 256K Bytes SRAM, 768K Bytes ROM, Config Switch, dMAX (Max, Max Control), McASP 0, McASP 1, McASP 2, SPI 0, SPI 1, IIC 0, IIC 1, EMIF, HPI, RTI Timer]
• 300 MHz DSP core
  – 300 MHz 67x+™ core
  – 64 Reg + Additional FP instructions
  – Code compatible with 6713 devices
• Large on-chip memory
  – 384KB on-chip ROM
  – 256KB on-chip RAM
  – 32KB Inst. cache (Int Mem + EMIF)
  – EMIF for expansion
• Enhanced Audio IO
  – 16 serial data pins
  – Up to 6 different clock rates
• dMAX
  – Support for DMA, circular and multi-tap memory delay (for reverb)
• HPI supports mux A/D and nonmux A/D

Memory Architecture
• New memory architecture
  – Improved instruction cache
    • Size increased from 4KB to 32KB
    • Cache miss penalty to internal memory reduced 40%
    • Supports internal RAM/ROM and EMIF
  – Direct single-level flat memory for data, single-cycle access (ROM and RAM)
  – All RAM and ROM is accessible as program or data (like C6713)

Enhancements – DP, Code Density
• Changes in 67x+
  – All changes are backwards compatible to 67x CPU (C6713)
  – General Purpose Registers increased from 32 to 64
  – New MPYSPDP instruction: SP x DP into DP
  – New MPYSP2DP instruction: SP x SP into DP
  – Additional ADDSP, ADDDP, SUBSP, SUBDP in S unit
    • Now have 4 floating point adds or subtracts in parallel
  – Execution packets can span fetch packets (64x feature)
    • Code size reduced 5 to 10%, since no padding with NOPs is needed

Benchmark Performance

Performance: The BDTImark™
• Real block FIR filter
• Complex block FIR filter
• Single-sample LMS-adaptive FIR filter
• Single-sample real FIR filter
• Single-sample IIR filter
• Vector dot product
• Vector add
• Vector maximum
• IS-54 convolutional encoder
• Finite state machine
• 256-point FFT
™ Berkeley Design Technology, Inc - Berkeley, CA

’C67x: Floating point performance*
BDTImark™: A DSP Speed Metric
Source: www.BDTI.com. ©1999 BDTI
[Bar chart of BDTImark scores for: Intel Pentium 200 MHz; ADI ADSP-2106x (60 MIPS); TI TMS320C3X (30 MIPS, 80 MFLOPS); TI TMS320C4X (25 MIPS, 60 MFLOPS); TI TMS320C67x (1 GFLOPS). Bar labels as extracted: 23, 17, 9, 7, 65]
*Commercial Temp

’C67x: Benchmark performance*
Floating-Point Performance, execution time (in µs)
Benchmark              | TI TMS320C6701 (1 GFLOPS) | Typical Floating-Point DSP (60 MFLOPS)
Complex Radix 4 FFT    | 108.33                    | 1,672
Block FIR              | 13.296                    | 149
Convolution            | 0.828                     | 16.6
Matrix Vector Multiply | 0.420                     | 1.25
*Commercial Temp

C28x Digital Signal Controller

TMS320F2812
[Block diagram: 150 MIPS C28x™ 32-bit DSP (32x32 bit Multiplier, 32-bit Timers (3), Real-Time JTAG, RMW Atomic ALU, 32-Bit Register File); Memory Bus with 18Kw RAM, 128Kw Flash + 1Kw OTP, 4Kw Boot ROM, XINTF; Interrupt Management; Peripheral Bus with Event Mgr A, Event Mgr B, 12-Bit ADC, Watchdog, GPIO, McBSP, CAN 2.0B, SCI/UART-A, SCI/UART-B, SPI]

TMS320F2812 Features and Benefits
Feature: 150-MHz C28x 32-bit DSP core
Benefit: C28x 32-bit DSP core enables high-speed execution of control algorithms. Faster control code execution gives headroom for advanced control techniques, enabling great efficiency and cutting-edge features.

Feature: Unique control peripherals
Benefit: 12-bit high-speed dual-sample-hold ADC allows simultaneous sampling of power system currents and voltages; Event Manager modules provide a hardware interface for sensored or sensorless three-phase inverter control.

Feature: On-chip communication peripherals
Benefit: CAN, I2C, SPI, UART, and external memory interface allow for a full system implementation.
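
The "headroom" benefit of a 150-MHz core can be put in rough numbers as a cycle budget per control period. The sketch below is illustrative only: the 150 MHz clock comes from the slide, while the 20 kHz control-loop rate and the function name `cycles_per_control_period` are assumptions made for the example. It also assumes one CPU clock equals one usable cycle, ignoring wait states and interrupt overhead.

```python
# Hypothetical helper: how many CPU clock cycles fit into one control period.
# Only the 150 MHz figure comes from the slide; the loop rate is an assumption.

F2812_CLOCK_HZ = 150_000_000   # 150-MHz C28x core (from the slide)
ASSUMED_LOOP_HZ = 20_000       # assumed control-loop rate, not from the slide


def cycles_per_control_period(cpu_hz: int, loop_hz: int) -> int:
    """Return the whole number of CPU clock cycles in one control period."""
    if cpu_hz <= 0 or loop_hz <= 0:
        raise ValueError("frequencies must be positive")
    return cpu_hz // loop_hz


if __name__ == "__main__":
    budget = cycles_per_control_period(F2812_CLOCK_HZ, ASSUMED_LOOP_HZ)
    print(f"Cycle budget per control period: {budget}")
```

Under these assumptions the budget is 7500 cycles per period; a faster loop rate shrinks it proportionally, which is where the headroom argument comes from.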
C28x CPU
• 32-bit fixed-point DSP
• RISC instruction set
• 8-stage protected pipeline
• 32x32 bit fixed-point MAC for single-cycle 32-bit multiply
• Dual 16x16 bit fixed-point MACs
• Single-cycle instruction execution
Modified Harvard Bus Architecture
• Separate data and instruction buses
• Two data buses: one for read, one for write
• Enables fetch, read, and write in a single cycle
• Essential to maximizing single-cycle MAC
Emulation Logic
• Real-time emulation allows interrupt servicing even when main program is halted
• Debug host has direct access to registers and memory
• Multiple hardware debug events and breakpoints

C28x Core: Bus Structure
[Block diagram: Program Address Bus (22), Program (4 M * 16), Program Data Bus (32), Data Address Bus (32), Data (4 G * 16), Data Data Bus (32), Register Bus, Data Write Bus (32), Program Write Bus (32); Registers: ARAU, SP, DP @X, XAR0 to XAR7; Execution: MPY32x32, ALU, XT, P, ACC, R-M-W Atomic ALU; Debug: Real-Time JTAG Emulation & Test Engine; Memory; Standard Peripherals; External Interfaces]
The C28x multiple bus architecture makes better use of the processor cycles: instruction fetch, decode and execute can happen on the same clock cycle.

C28x Core: Protected Pipeline
[Pipeline diagram: instructions A through G advancing through the 8-stage pipeline F1 F2 D1 D2 R1 R2 X W. Annotations: writes are "free"; E & G access same address]
Pipeline stages:
• F1: Instruction address
• F2: Instruction content
• D1: Decode instruction
• D2: Resolve operand address
• R1: Operand address
• R2: Get operand
• X: CPU doing "real" work
• W: Store content to memory
Protected Pipeline
• Order of results is as written in source code
• Programmer need not worry about the pipeline
Many MCUs
• Shared bus for program and data address and content
• Typically results in only one instruction in 4 cycles

C28x Core: Instruction set for Control
Read/Modify/Write and Atomic Operation
Offers sufficient hardware resources to efficiently handle control algorithms
[Diagram: Memory READ, LOAD into Registers, CPU ALU / MPY, STORE, Memory WRITE; the atomic path performs the whole sequence as one operation]
Atomic Instructions Benefits:
• Simpler programming
• Smaller, faster code
• Non-interruptible operations

RISC Read/Modify/Write (6 words / 5 cycles)
    SETC  INTM
    MOV   AL,*XAR2
    AND   AL,#1234h
    MOV   *XAR2,AL
    CLRC  INTM

DSP Read/Modify/Write (5 words / 4 cycles)
    SETC  INTM
    AND   AL,*XAR2,#1234h
    MOV   *XAR2,AL
    CLRC  INTM

C28x Atomic Operation (2 words / 1 cycle)
    AND   *XAR2,#1234h

C28x Core: Fast Interrupt Response
PIE: Peripheral Interrupt Expansion
• Internal sources: TINT2, TINT1, TINT0; EV and non-EV peripherals (EV, ADC, SPI, SCI, McBSP, CAN)
• External sources: XINT1, XINT2, PDPINTx, RS, XNMI_XINT13
• Inputs to the C28x core from the PIE: RS, NMI, INT1, INT2, INT3, ..., INT12, INT13, INT14

C28x Core: Fast Interrupt Response
Latency is Minimized
[Timing diagram: external signal (INTx) and internal signal paths through Sync interrupt signal, Set PIEIFR (PIE HW), Set IFR, interrupt jammed into pipeline, vector fetch, auto context save, decode 1st ISR instruction. Cycle counts as extracted: 2, 1, 8, 1]
Latency: the time from when an interrupt occurs to decoding (D2) the first ISR instruction
• Minimum latency:
  – Internal peripherals: 10-14 cycles (100-140 ns @ 100 MHz)
  – External signals: 11 cycles (110 ns @ 100 MHz)
• Maximum latency: depends on wait states, ready, INTM, etc.
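
The latency figures above are cycle counts converted to time at a given clock. As a minimal sketch of that conversion (the function name `cycles_to_ns` is hypothetical, and the 150 MHz row is an added assumption that simply reuses the same cycle counts at the F2812's rated clock):

```python
# Convert interrupt latency from CPU cycles to nanoseconds.
# Cycle counts and the 100 MHz clock are taken from the slide; the 150 MHz
# case is an assumption that the same cycle counts apply at that clock.

NS_PER_SECOND = 1_000_000_000


def cycles_to_ns(cycles: int, clock_hz: int) -> float:
    """Return the duration of `cycles` clock periods in nanoseconds."""
    if clock_hz <= 0:
        raise ValueError("clock_hz must be positive")
    return cycles * NS_PER_SECOND / clock_hz


if __name__ == "__main__":
    cases = {
        "internal peripheral (min)": 10,
        "external signal (min)": 11,
        "internal peripheral (upper end of range)": 14,
    }
    for clock_hz in (100_000_000, 150_000_000):
        print(f"Clock: {clock_hz / 1e6:.0f} MHz")
        for label, cycles in cases.items():
            print(f"  {label}: {cycles} cycles = "
                  f"{cycles_to_ns(cycles, clock_hz):.1f} ns")
```

At 100 MHz each cycle is 10 ns, so 10 and 11 cycles give the 100 ns and 110 ns quoted on the slide. The same cycle counts at a higher clock scale down in proportion.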
C2000™ real-time controllers software
Software Highlights
• ControlSuite™ Software
  – Software infrastructure and tools for every stage of development and evaluation
  – Allows customers to focus on differentiation, not basics
• Key Functional Areas:
  – Device Support (Bit fields, API Drivers, Examples)
  – Library Repository (Math Library, DSP Library, Application Library, Utilities)
  – Development Kits (Hardware Package, Software Examples, Complete System Framework, Graphical User Interfaces)
  – Debug and Software Tools (IDE, RTOS, Emulation)
• Integrated Development Environment (IDE)
  – Eclipse-based Code Composer Studio™ IDE supports all ...
• Application Specific Software:
  – Motor Control Software Library: supports multiple motor types and control techniques (ex: FOC (sensored and sensorless) for ACI, PMSM)
  – Digital Power Software Library: library for both C28x Core and CLA
• Getting Started: ControlSuite, Application Notes, User's Guide
• Tools/Reference Designs: ControlSticks, ControlCards, Evaluation Kits, Development Tools

Tools
• Code Composer is an Integrated Development Environment (IDE) similar to MS Visual C++ and built specifically for DSP
• DSP/BIOS is a library of scheduling, instrumentation, and communications functions that provides real-time analysis and RTDX™ (Real-Time Data Exchange)
• Hardware emulation and evaluation tools allow code debug on actual silicon and low-cost analysis of performance in early stages of the development cycle
• Code Composer Studio provides an extensible tool plug-in and seamless integration between the host and target DSP tools

CCSv4/v5
• Perspectives contain separate window arrangements depending on what you are doing
• Customize toolbars & menus
• Tab data displays together to save space
• Tabbed editor windows
• Fast view windows don’t display until you click on them
10/19/11

Code Composer Studio v5
CCSv5 is split into two phases
• 5.0
  – Not a replacement for CCSv4
  – Targeted at users who are using devices running Linux & multi-core C6000
  – Addresses a need (Linux debug) that is not supported by CCSv4
  – Available today
• 5.1
  – Replacement for CCSv4, targeted at all users
  – Available fall 2011
• Supports both Windows & Linux
  – Note that not all emulators will be supported on Linux
    • SD DSK/EVM onboard emulators and XDS560 PCI are not supported
    • Most USB/LAN emulators will be supported: XDS100, SD 510USB/USB+, 560v2, BH 560m/bp/lan
http://processors.wiki.ti.com/index.php/Linux_Host_Support

Code Composer Studio v4
• Easy to use, Eclipse-based IDE: compiler, linker, more
• Supports all MSP430 MCUs
• Enhancements since CCE v3:
  – Speed
  – Code size improvements
  – Auto-updating
• $495 for CCS v4 MCU Edition
• Free for apps <16KB
• Same look and feel as Code Composer Essentials
http://wiki.msp430.com/wiki/index.php?title=Category:Code_Composer_Studio_v4

Analyze: Visualize data
Graphical Signal Analysis:
– View signals in native format
– Change variables on the fly and see their effects
– Numerous application-specific graphical plots
  • FFT waterfall
  • Eye diagram
  • Constellation plot
  • Image displays & more
– Requires no additional code

BACKUP
C6701 DSP Block Diagram
C672x DSP Block Diagram

THANK YOU