EECS 452 – Lecture 23

Today: TI MSP430 and Piccolo.
Handouts: printed copy of today’s lecture slides
Read: about DSP!
References:

Last one out should close the lab door!!!!
Please keep the lab clean and organized.

Where a calculator on the ENIAC is equipped with 18,000 vacuum tubes and weighs 30 tons, computers in the future may have only 1,000 vacuum tubes and perhaps weigh 1.5 tons.
– Popular Mechanics, March 1949

EECS 452 – Winter 2010 Lecture 23 – Page 1/62 Friday – March 12, 2010

Actually . . .

Actually there were 18800 vacuum tubes and of those 6550 were 6SN7s. The 6SN7 was/is a dual triode and was used to implement the 20 digit signed decimal accumulators.
By never turning off the power to ENIAC, the average failure rate was held to about 1 tube every two days. The longest up period was 116 hours.
A portion of ENIAC is located in the lobby of the CSE building. The tubes that you see are very likely 6SN7s.
ENIAC’s active lifetime was 9 years, 1947–1955.

Overview of today’s lecture

Unfortunately, likely to be fragmented and rambling.
◮ Comments on single supply operation.
◮ The MSP430
◮ Multiplying without a multiplier.
◮ An IIR filter for the MSP430
◮ The MSP430 SPI interface.
◮ Linking MSP430 SPI to C5505 I2S.
◮ The TI Piccolo

Thinking about single supply operation

[Figure: circuit diagrams contrasting split-supply operation (+V/2, ground, −V/2) with single-supply operation (+V, V/2, ground); the V/2 reference is formed with two resistors R.]
Bypass capacitors not shown.
An alternative name for ground is common. Maybe a better choice.

Focusing now on the MSP430™

EECS 452 has a couple of eZ430-F2013 Development tools and several Z-Accel wireless kits (uses F2274).
The development tool F2012/13 boards execute programs out of flash. The boards can operate stand-alone, and projects have used them in this manner.
The F2012/F2013 boards have been used to interface to XBee wireless modules via UART and to the C5505 via SPI.
The three most important documents are:
◮ The data manual for the F20xx microcontrollers.
◮ The MSP430x2xx Family User’s Guide, SLAU144E.
◮ The eZ430-F2012 Development Tool User’s Guide, SLAU176B.

Where used?

http://www.ti.com/ww/en/mcu/valueline/index.shtml?DCMP=Value_Line&HQS=Other+BA+430value-promo.
All these applications likely involve the use of Digital Signal Processing!
I don’t understand how the new value line differs from the existing low end units other than in part number and price.

What is low power?

◮ There are six low power modes of operation.
◮ Standby (asleep) at 3 V with self wake up and RAM retention: < 0.6 µA, about 1.8 microwatts.
◮ 250 µA per MIPS when active. (MSP430X2xx family.) This is 3/4 milliwatt per MIPS at 3 Volts.
◮ Wake up time < 1 µs.

Comments

http://focus.ti.com/graphics/mcu/ulp/battery-life.gif.

eZ430-Development Tool

The debugging interface shown is the old version. I believe that we only have the 6 pin version. For the F2012/13 boards simply use the center four pins.
Note that the 14 pin pattern mirror images the physical pin positions on the F2012/13 packages. BEWARE!
SLAU176B documents the tool and the F2013 board. (Figure from there.)

MSP430 generic block diagram

[Figure: MSP430 generic block diagram: clock system (ACLK, SMCLK, MCLK), 16-bit RISC CPU with JTAG/debug, flash/ROM, RAM, watchdog and peripherals on the 16-bit MAB and 16-bit MDB, with a bus converter to an 8-bit MDB for further peripherals.]
From the MSP430X2XX Family User’s Guide.

MSP430 CPU block diagram

◮ RISC architecture.
◮ 27 core instructions.
◮ Plus 24 emulated instructions.
◮ 7 addressing modes.
◮ Every instruction usable with every addressing mode.
◮ Single-cycle register operations.
◮ Constant generator for six most commonly used values.
◮ Direct memory-to-memory transfers.
◮ Instruction times depend on the addressing mode used.
◮ An instruction can take from 1 to 6 cycles.

[Figure: MSP430 CPU block diagram: sixteen 16-bit registers (R0/PC program counter, R1/SP stack pointer, R2/SR/CG1 status, R3/CG2 constant generator, R4 to R15 general purpose) feeding the src and dst inputs of a 16-bit ALU clocked by MCLK, with status flags Zero (Z), Carry (C), Overflow (V) and Negative (N), connected to the memory address bus (MAB) and memory data bus (MDB).]
From the MSP430X2XX Family User’s Guide.

How to do DSP without a multiplier?

Here is the problem that I want to address:
◮ Manufacturers, such as TI, sell low cost, low power microcomputers, essentially by the millions.
◮ Many of these do not possess a multiplier, let alone a MAC unit.
◮ In spite of this, there are likely many applications that would benefit (result in a more desirable product) from the use of some DSP.
◮ Just as floating point arithmetic is emulated in the C5505 by software, one can emulate multiplier hardware in software.
◮ Implementation of multiplication in a multiplierless processor can be divided into two basic categories: general purpose multiplication and hard coded multiplication.
◮ The general multiplier is more flexible but is also the most costly in terms of execution time.
◮ Hard coding the computation steps assumes multiplication by fixed values (such as filter coefficients). It is fastest but requires significant code space.
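
The two categories can be sketched in C as follows. This is only an illustration, not code from the lecture or from TI: the function names `mul_general` and `mul_by_10` and the fixed value 10 are assumptions chosen for the example.

```c
#include <stdint.h>

/* General-purpose shift-and-add multiply: works for any multiplier b,
   but loops over all 16 bits of b at run time. */
uint32_t mul_general(uint16_t a, uint16_t b)
{
    uint32_t product = 0;
    uint32_t addend = a;        /* a shifted left by the current bit position */
    int bit;

    for (bit = 0; bit < 16; bit++) {
        if (b & 1u)
            product += addend;  /* add a * 2^bit when this bit of b is set */
        addend <<= 1;
        b >>= 1;
    }
    return product;
}

/* Hard-coded multiply by the fixed value 10 = 8 + 2 (binary 1010):
   only the two non-zero bits cost an add, with no loop and no bit test. */
uint32_t mul_by_10(uint16_t a)
{
    uint32_t x = a;
    return (x << 3) + (x << 1);
}
```

Each fixed coefficient needs its own hard-coded sequence like `mul_by_10`, which is where the code space cost comes from.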

So what would I like to cover?

Disclaimer: this is a work in progress. Some has been done, some not. I accidentally lost my MSP430 test codes when upgrading to CCS4. Some of the outline below is fantasy, at this point, but should provide hints to anyone interested in delving into this topic on their own.
◮ Pencil and paper unsigned binary multiplication.
◮ Pencil and paper two’s complement binary multiplication.
◮ Multiplier block diagrams.
◮ Coding a general multiplier in the MSP430. TI likely supplies code for such.
◮ Booth’s algorithm.
◮ Signed Digit (SD) and Canonical Signed Digit (CSD) representation.
◮ Testing.
◮ An IIR filter code generator.

Will knowing how to do this be useful?

◮ The lowest cost MSP430 having a multiplier appears to be the MSP430F2330 at $1.75 at 1ku. It has a slope A/D and lives in a 40 pin flat pack.
◮ If one could use a $0.60 part (e.g., the F2011), the savings would be $1150 at the 1ku level and $11,500 at the 10ku level, etc.
◮ There likely will be many situations where knowing how to do this will be useful and make economic sense.
◮ Someone will benefit from knowing how to do this. Just who and when? It might be you.

Relevant TI application notes

Efficient Multiplication and Division Using MSP430, Kripasagar Venkat, Application Report slaa329, 9/2006.
Efficient MSP430 Code Synthesis for an FIR Filter, Kripasagar Venkat, Application Report slaa357, 3/2007.
Combines Horner’s method of polynomial evaluation with the Canonical Signed Digit (CSD) number representation to “efficiently” (as well as one can) implement DSP. The focus is on the multiplierless MSP430 devices but the method will work on any computer or FPGA. The source files are also available.
This pair of notes is what started me on this effort.
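
To make the Horner plus CSD idea concrete, here is a small sketch for one fixed Q15 coefficient. It is not taken from the application reports: the coefficient 0.4375 and the function name `mul_q15_0p4375` are assumptions picked for illustration. In plain binary 0.4375 is 0.0111 (three adds); in CSD it is 2^-1 − 2^-4 (one add and one subtract).

```c
#include <stdint.h>

/* Multiply a Q15 sample by the fixed coefficient 0.4375 without a multiplier.
   0.4375 = 2^-1 - 2^-4, so Horner's form is 2^-1 * (x + 2^-3 * (-x)).
   Assumes >> on a negative int32_t is an arithmetic shift (true for common
   compilers, but implementation-defined in C). */
int16_t mul_q15_0p4375(int16_t x)
{
    int32_t acc;

    acc = -(int32_t)x;  /* digit -1 at weight 2^-4 (least significant digit) */
    acc >>= 3;          /* move from weight 2^-4 up to weight 2^-1 */
    acc += x;           /* digit +1 at weight 2^-1 */
    acc >>= 1;          /* final shift past the binary point */
    return (int16_t)acc;
}
```

Working from the least significant non-zero digit toward the binary point means every shift is a right shift, so the result stays a Q15 value in a single word and no double-width product is formed.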

Comments on the application notes

◮ Author assumes use of Q15.
◮ Develops a right to left algorithm.
◮ Relates process to use of Horner’s method of polynomial evaluation.
◮ Hard codes the shift and add steps for constant multiplier values.
◮ Uses signed digit representation for multipliers.
◮ Essentially equivalent to the basic shift and add multiplier.
◮ Recall that Q15 is a state of mind, not a function of a hardware binary point.

Doing pencil and paper multiplication

                        a4 a3 a2 a1 a0
×                       b4 b3 b2 b1 b0
--------------------------------------
  b0 ×   a4 a4 a4 a4 a4 a4 a3 a2 a1 a0
+ b1 ×   a4 a4 a4 a4 a4 a3 a2 a1 a0
+ b2 ×   a4 a4 a4 a4 a3 a2 a1 a0
+ b3 ×   a4 a4 a4 a3 a2 a1 a0
− b4 ×   a4 a4 a3 a2 a1 a0
--------------------------------------
         p9 p8 p7 p6 p5 p4 p3 p2 p1 p0

The multiplicand sign bit is extended for each row.

Summing rows signed multiplier logic

[Figure: block diagram of row-summing signed multiplier logic: an a register, a b shift register whose lsb gates a through an AND into an add/subtract unit (with a subtract control), and a p register plus shift register holding the high and low bits of a×b.]

C simulation: unsigned shift and add multiplication

// FPGA and MSP430 simulated unsigned shift and add multiply
uint32_t u_sanda(uint16_t a, uint16_t b)
{
  uint16_t ctr;
  uint32_t sum;

  sum = 0;
  for (ctr=0; ctr<16; ctr++) {
    if (b & 0x0001) {
      sum = sum&0xFFFF;               // ensure carry is 0
      sum += a;
    }
    b = ((sum&0x0001)<<15) + (b>>1);  // lsb of sum moves into top of b
    sum = sum>>1;                     // shift right including carry
  }
  return ((uint32_t)sum<<16)+(uint32_t)b;
}

C simulation: signed shift and add multiplication

// signed shift and add multiply
int32_t fs_sanda(int16_t a, int16_t b)
{
  uint16_t ctr, pr, low, carry, sign_a, sign_b;

  sign_a = a&0x8000;
  sign_b = b&0x8000;
  pr = 0;
  low = 0;
  carry = 0;
  for (ctr=0; ctr<16; ctr++) {
    if ((b&0x0001) != 0) {
      carry = sign_a;
      if (ctr == 15) pr -= a;         // last (sign) row is subtracted
      else pr += a;
    }
    b = b>>1;
    if ((pr&0x0001) != 0) low = 0x8000+(low>>1);
    else low = (low>>1);
    pr = (pr>>1)|carry;
  }
  if (a != 0) pr = pr^(sign_b);
  return (int32_t)(((uint32_t)pr<<16)+low);
}

Comments

These simulations were written in conjunction with MSP430 code, which they mimic.
Each multiplies two 16-bit values to give a 32-bit result.
Exhaustively tested using all possible multiplier and multiplicand values.

Working with Q15 values.

◮ Basically do integer multiplication.
◮ Product is 32 bits (two words).
◮ Left shift result by 1 and retain only the top 16 bits. Round first?
◮ Only need to do the multiplication keeping the top 16 bits. The low bits can be discarded as generated. Might complicate rounding.
◮ For the shown algorithm what if we don’t do the last right shift?
◮ Code and TEST. My norm is to exhaustively test wherever possible.
◮ When not possible, test end/special cases then use random values, lots of random values.

Signed digit number representation

◮ Instead of representing values with 0 and 1 digit values, use digit values of -1, 0, 1.
◮ Awkward on a binary processor. However, if one is hard coding the steps in a multiplication operation, it is easily done.
◮ Not a unique representation. Lots of ways of writing a given value using signed digits.

Canonical SD representation

Uses the minimum number of non-zero digits.
◮ Reduces the instructions needed to hard code multiplication.
◮ Where to find an algorithm for generating CSD? Try Computer Arithmetic Algorithms, by Israel Koren.
◮ How much efficiency is obtained?

Converting an integer to CSD form

/* File name: Int2CSD.c
   Two’s complement integer to canonical signed digit.
   Algorithm from Koren ...
   16Feb2009 .. initial version .. K.Metzger
*/
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

void Int2CSD(int32_t value,   // integer value to convert
             int nbits,       // number of bits in value to convert
             int *bits,       // bits array...nbits+1 elements
             int *digits)     // digits array...nbits elements
{
  int idx, cin=0, which;

  for (idx=0; idx<nbits; idx++) {
    bits[idx] = value & 0x1;
    value >>= 1;
  }
  bits[idx] = bits[idx-1];    // sign extend one extra bit

  for (idx=0; idx<nbits; idx++) {
    which = (bits[idx+1]*2+bits[idx])*2+cin;
    switch(which) {
      case 0: digits[idx] =  0; cin = 0; break;
      case 1: digits[idx] =  1; cin = 0; break;
      case 2: digits[idx] =  1; cin = 0; break;
      case 3: digits[idx] =  0; cin = 1; break;
      case 4: digits[idx] =  0; cin = 0; break;
      case 5: digits[idx] = -1; cin = 1; break;
      case 6: digits[idx] = -1; cin = 1; break;
      case 7: digits[idx] =  0; cin = 1; break;
      default: printf("Int2CSD: oops!\n"); exit(1);
    } // end of switch
  } // end of for
} // end of function

Implementing an IIR filter

Assume 16-bit values. Assume a uniform distribution of ones and zeros.
◮ On the average there will be 8 ones and 8 zeros in the multiplier.
◮ Each one will be coded as a shift and an add. Eight shifts and eight adds.
◮ Each zero will be coded as a shift. Eight shifts.
◮ On the average (assuming that we are not doing Voodoo statistics here) a multiplication will need 16 shifts and 8 adds. Twenty four machine cycles.
◮ On a MSP430 running at 16 MHz a hard coded multiplication will take on the order of 1.5 µs.
◮ To be conservative let’s use a value of 3 µs.
◮ To implement an 8th order biquad filter we need five multiplications per biquad and four biquads.
◮ The nominal, very hand wavy, time required to filter a sample is on the order of 60 µs.
◮ It might be possible to sample at 16 kHz and filter.

Is this reasonable and can we do better?
◮ A 16-bit FPGA multiplier implementation should only need about 16 clock tics. The multiplier foot print should be small enough to allow all 20 multipliers to be implemented. In this case a nominal 16 clock tics would be needed per input sample for each filter output. (This is an aside, sorry.)
◮ There exists a non-unique number representation called signed digit. When placed into canonical form this representation contains the minimum possible number of non-zero digits. These non-zero digits are either +1 or −1.
◮ There is the possibility of speeding up hard coded multiplications.
◮ A reasonable question is “by how much”.

Implementing multiplication in an MSP430

When updating to CCS V4 I deleted my old Code Composer Essentials. Oops. I had meant to back this work up.

Canonical heresy

What are the maximum values associated with w1 and w2?
What are the maximum values associated with w3 and w4? (Assuming our usual scaling scheme.)
Where does overflow occur? Is this important? (Combine the two top adders into one.)

[Figure: biquad signal flow graph with input x, output y, feedforward coefficients b0, b1, b2, feedback coefficients −a1, −a2, and four z^−1 delay states w1, w2, w3, w4.]

Is this truly real?

Is the result worth the effort?

◮ I wrote a C simulation for the lab 8th order IIR filter.
◮ The straight shift and add multiplication algorithm takes 164 adds per sample.
◮ The CSD multiplication algorithm takes 112.
◮ The nominal CSD version does 0.68 times the number of add/subtracts of the normal algorithm.
◮ In a final form filter there will also be additional overheads that will mute the speedup amount. Maybe by a factor on the order of two. This still gives a speed up on the order of 16%.
◮ Of course, I’m assuming that I’ve done everything correctly. The only really good way to answer this question is to build both versions and run them.
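
The simulation that produced the 164 and 112 counts is not shown. As a rough sketch of how the same kind of comparison could be made for a single non-negative coefficient, the two functions below count the adds needed by the plain binary form and by the CSD form. The names `count_binary_ones` and `count_csd_nonzero` are hypothetical, and the CSD digits are found with the standard non-adjacent-form recoding rather than the Koren table used in Int2CSD.

```c
#include <stdint.h>

/* Adds needed by a plain shift-and-add multiply:
   one per 1 bit in the coefficient. */
int count_binary_ones(uint16_t coeff)
{
    int count = 0;
    while (coeff != 0) {
        count += coeff & 1u;
        coeff >>= 1;
    }
    return count;
}

/* Add/subtracts needed when the coefficient is recoded to canonical
   signed digit form: one per non-zero digit. */
int count_csd_nonzero(uint16_t coeff)
{
    int32_t n = coeff;
    int count = 0;
    while (n != 0) {
        if (n & 1) {                       /* odd: emit a +1 or -1 digit */
            int digit = 2 - (int)(n & 3);  /* n mod 4 == 1 gives +1, == 3 gives -1 */
            n -= digit;
            count++;
        }
        n /= 2;                            /* n is even here, so this is exact */
    }
    return count;
}
```

For example, 0x7FFF has fifteen 1 bits but only two non-zero CSD digits (2^15 − 1), while 0x5555 needs eight adds either way. Summing both counts over a filter’s actual coefficients would give figures comparable to those quoted above.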

Moving onto the MSP430 SPI

◮ Two versions have existed. The current one can optionally do 8 or 16 bit transfers.
◮ A versatile device.
◮ Can be programmed to act as a UART transmitter.
◮ Have programmed it to communicate with the C5505 via I2S.
◮ Used I2S mono mode. “Hand generated” frame sync.
◮ Last week TI issued an application note showing how to use a couple of chips external to the MSP430 to do the I2S link. Their solution is more general than what I did.

F2012/13 USI SPI block diagram

[Figure: F2012/13 USI block diagram in SPI mode (USII2C = 0): 8/16-bit shift register USISR (USI16B, USILSB) between SDI (USIPE7) and SDO (USIPE6, USIOE, USIGE); bit counter USICNTx (USIIFGCC) setting USIIFG; USISWRST; shift clock on SCLK (USIPE5, USICKPH, USICKPL, USIMST) from a clock divider USIDIVx (/1, /2, /4, /8 ... /128) fed by the USISSELx source select (000 SCLK, 001 ACLK, 010 SMCLK, 011 SMCLK, 100 USISWCLK, 101 TA0, 110 TA1, 111 TA2), with HOLD from USIIFG.]
From slau144e.pdf.

F2012/13 USI SPI timing diagram

[Figure: F2012/13 USI SPI timing: SCLK for the four USICKPH/USICKPL combinations, SDO/SDI shifting MSB to LSB as USICNTx counts from 8 down to 0, with the Load USICNTx and USIIFG signals.]
From slau144e.pdf.

Can use SPI as a UART transmitter

◮ UART uses a 10 bit frame.
◮ SPI has 16 bits in a frame.
◮ Have to slow the UART down some because 16 bits are sent per item versus 10.
◮ Have to bit reverse the order in the SPI frame because UART is lsb to msb.

Application Examples

1. Moving 16-bit values from a F2012 using the MSP430 SPI interface to the C5505 using the C5505 I2S interface. The one available eZdsp SPI “channel” is used to interface FPGA display support to the C5505. Three I2S channels are available. Our intent is to use one of these. This is a slightly contrived example.
The C5505 itself has four A/D input channels that could be used for this application.
2. Moving 8-bit values from a F2012 using the MSP430 SPI interface to the C5505 using the C5505 UART interface. Useful when sending values from a MSP430 to a XBee wireless device.

MSP430 Master SPI to C5505 slave I2S1

An example application would be to measure the positions of four variable resistors (either rotary or slider) to be used as control inputs to an audio special effects processor running on a C5505.

F2012 pin use

Pin  Function
1    VCC
2    P1.0/TACLK/ACLK/A0
3    P1.1/TA0/A1
4    P1.2/TA1/A2
5    P1.3/ADC10CLK/A3/VREF−/VeREF−
6    P1.4/SMCLK/A4/VREF+/VeREF+/TCK
7    P1.5/TA0/A5/SCLK/TMS
8    P1.6/TA1/A6/SDO/SCL/TDI/TCLK
9    P1.7/A7/SDI/SDA/TDO/TDI
10   RST/NMI/SBWTDIO
11   TEST/SBWTCK
12   XOUT/P2.7
13   XIN/P2.6/TA1
14   VSS

◮ The F2012 package has 14 pins. Pins 1 and 14 are used for power and ground. Pins 10 and 11 are used by JTAG, Spy by Wire. This leaves 10 for signals.
◮ Need to use three signals to interface to I2S: frame sync, clock (pin 7), data (pin 8). The MSP430 SPI hardware does not generate frame sync. Have to use an output port pin and generate it ourselves.
◮ Available A/D channels are on pins 2, 3, 4, 5 and 9. Pin 2 is connected to an LED. Pins 3, 4, 5 and 9 are available as A/D inputs. We will have to use either pin 12 or 13 as frame sync. This locks out possible use of a 32768 Hz crystal. Will use pin 12 (port 2 pin 7).
From the TI MSP430F2012 data sheet.

C5505 and other considerations

◮ C5505 has X SPI ports of which only one is brought out and is generally used to drive the S3SB graphics.
◮ There are four I2S ports. Port I2S0 is used with the CODEC. Ports I2S1 and I2S2 are brought to the eZdsp connector. Port I2S3 is shared with the UART.
◮ When I2S is a slave the transfer timing is controlled by the master and can be “bursty”.
◮ Will use I2S1 to support the slave input.
◮ Will use DSP mono-mode.
◮ The F2012/3 SPI output does not include a frame sync waveform. One can be generated using a port pin.
◮ Need at least one additional clock pulse to allow the C5505 to sample the frame sync transition.

F2012 main

#include <msp430x20x3.h>

volatile unsigned int i, value;

void main(void)
{
  WDTCTL = WDTPW + WDTHOLD;        // Stop watchdog timer

  // 12 MHz
  BCSCTL1 = CALBC1_12MHZ;          // Set range
  DCOCTL = CALDCO_12MHZ;           // Set DCO step + modulation

  P1DIR = 0x01;                    // P1.0 output, else input
  P1DIR |= 0x20;                   // also P1.5 output
  USICTL0 |= USIPE7 + USIPE6 + USIPE5 + USIMST + USIOE; // Port, SPI master
  USICTL1 |= USIIE;                // Counter interrupt, flag remains set
  USICKCTL = USIDIV_4 + USISSEL_2; // SMCLK/16
  USICTL0 &= ~USISWRST;            // USI released for operation
  USISRL = 0;                      // initial load data value

  P2SEL = 0x00;                    // set up IO use on port 2
  P2DIR = 0x80;                    // use port 2 pin 7 as frame sync output
  P2OUT &= ~0x80;                  // set sync low

  value = 0;                       // initialize output value
  USICNT = 16 | USI16B;            // init-load counter--starts SPI running
  _BIS_SR(LPM0_bits + GIE);        // Enter LPM0 w/ interrupt
}

F2012 SPI interrupt support

// USI interrupt service routine
#pragma vector=USI_VECTOR
__interrupt void universal_serial_interface(void)
{
  for (i = 0xF; i > 0; i--);       // delay between values
  USISRL = value;                  // load low 8 bits
  USISRH = value >> 8;             // load high 8 bits
  value++;                         // increment value

  USICTL0 &= ~USIPE5;              // generate two clock pulses manually
  P1OUT |= 0x20;                   // clock rising edge
  P2OUT |= 0x80;                   // sync rising edge
  P1OUT &= ~0x20;                  // clock falling edge
  P1OUT |= 0x20;                   // clock rising edge
  P2OUT &= ~0x80;                  // sync falling edge
  P1OUT &= ~0x20;                  // clock falling edge
  USICTL0 |= USIPE5;               // return pin to the SPI

  USICNT = 16 | USI16B;            // load counter which starts transfer
}

This is strange looking code

The main appears to start, run and then exit.
The main sets up the F2012/3, loads a value into the USI counter and enters low power mode with interrupts (whatever that means).
A “normal” program would then exit back to the system. The F2012/3 doesn’t have a system to exit back to.
The USI/SPI hardware continues to run in low power mode. When the counter decrements to 0, the CPU is powered back on and the interrupt support routine is entered.
The shown interrupt routine delays a while to space the values for viewing on an oscilloscope. It loads a new 16-bit value into the shift registers, loads the counter with a count of 16 and puts the processor back to sleep.
In our nominal resistor application the A/D clock would control events and the given interrupt routine would be recast as a function.

C5505 test main

#include <stdlib.h>
#include <stdio.h>
#include "..\c5505_support\data_types.h"

#define FOREVER 1

unsigned int I2S1_receive();
void I2S1_transmit(unsigned int);
void InitI2S1();
void InitSystem();
void ConfigPort();

void main(void)
{
  unsigned int value, next_value, value_ctr, loop_ctr, bad_ctr;

  // CPU initialization
  InitSystem();
  ConfigPort();
  InitI2S1();

  loop_ctr = 0;
  bad_ctr = 0;
  while(FOREVER) {
    value = I2S1_receive();          // discard first value
    next_value = I2S1_receive()+1;   // get initial test value
    value_ctr = 0;
    while(value_ctr++ != 0xFFFF) {
      value = I2S1_receive();
      if (next_value != value) {
        printf("expected: %04X received: %04X\n", next_value, value);
        bad_ctr++;
        break;
      }
      next_value++;
    }
    printf("loop %6u completed, bad = %3u\n", loop_ctr++, bad_ctr);
  }
}

C5505 initialization and support

// File name: I2S1_support
//
// 14Jan2010 .. initial version .. KMetzger
//
#include <stdlib.h>
#include "..\c5505_support\data_types.h"
#include "..\c5505_support\c5505.h"

void InitI2S1(void)
{
  PCGCR1 &= ~I2S1CG;          // enable the I2S1 peripheral clock (0 enables)
  I2S1SCTRL = 0;              // reset I2S1
  I2S1SCTRL = I2SENABLE | I2SMONO | I2SDATADLY | I2SWDLENGTH16 | I2SFRMT;
  I2S1INTMASK = I2SRCVMONFL;  // enable the done flag--WARNING enables interrupt too!
}

unsigned int I2S1_receive(void)
{
  while((I2S1INTFL & I2SRCVMONFL) == 0);  // wait for received value
  return I2S1RXLT1;                       // then return it
}

F2013 and C5505 waveforms

C5505 I2S timing in DSP mode:
[Figure: I2S_FS, I2S_CLK and DATA waveforms; following the frame sync the left channel word LD(n) is shifted out from bit N−1 down to bit 0, followed by the right channel word RD(n), then LD(n+1).]
LD(n) = n'th sample of left channel data
RD(n) = n'th sample of right channel data
From sprufp4.pdf.

MSP430F2012/3 SPI timing:
[Figure: USI SPI timing: SCLK for the four USICKPH/USICKPL combinations, SDO/SDI shifting MSB to LSB as USICNTx counts from 8 down to 0, with the Load USICNTx and USIIFG signals.]
From TMS320F20xx data sheet.

C5505 I2S1 registers

CPU Word Address   Acronym      Description
2900h              I2SSCTRL     I2S Serializer Control Register
2904h              I2SSRATE     I2S Sample Rate Generator Register
2908h              I2STXLT0     I2S Transmit Left Data 0 Register
2909h              I2STXLT1     I2S Transmit Left Data 1 Register
290Ch              I2STXRT0     I2S Transmit Right Data 0 Register
290Dh              I2STXRT1     I2S Transmit Right Data 1 Register
2910h              I2SINTFL     I2S Interrupt Flag Register
2914h              I2SINTMASK   I2S Interrupt Mask Register
2928h              I2SRXLT0     I2S Receive Left Data 0 Register
2929h              I2SRXLT1     I2S Receive Left Data 1 Register
292Ch              I2SRXRT0     I2S Receive Right Data 0 Register
292Dh              I2SRXRT1     I2S Receive Right Data 1 Register

From sprufp4.pdf.

Configuration and flag register bits

I2SnSCTRL register:

Bits   Field      Access
15     ENABLE     R/W-0
14-13  Reserved   R-0
12     MONO       R/W-0
11     LOOPBACK   R/W-0
10     FSPOL      R/W-0
9      CLKPOL     R/W-0
8      DATADLY    R/W-0
7      PACK       R/W-0
6      SIGN_EXT   R/W-0
5-2    WDLNGTH    R/W-0
1      MODE       R/W-0
0      FRMT       R/W-0

I2SnSINTFL register:

Bits   Field      Access
15-8   Reserved   R-0
7-6    Reserved   R-0
5      XMITSTFL   R-0
4      XMITMONFL  R-0
3      RCVSTFL    R-0
2      RCVMONFL   R-0
1      FERRFL     R-0
0      OUERR      R-0

LEGEND: R/W = Read/Write; R = Read only; -n = value after reset
From sprufp4.pdf.

MSP430-C5505 SPI signals

[Figure: oscilloscope capture showing the frame sync, bit clock and data bit traces.]

Time axis expanded

[Figure: oscilloscope capture of the frame sync, bit clock and data bit traces on an expanded time axis. Different scan.]

Comments about the waveforms

◮ Only those edges that are needed are generated.
◮ The clock dwell times are not relevant.
◮ Clock edge positions relative to the data dwells are relevant.
◮ How were the important edges decided upon? Careful reading of the C5505 I2S documentation. Asking the question, "How would I implement this in a FPGA?". Cut and try.
◮ Note that the last bit sent stays in the shift register and thus on the data line. For the two waveforms shown, the last bit sent was a logic one.

Focusing now on the Piccolo™

This is of interest because:
◮ Very fast (≈ 5 MSPS) A/D.
◮ Dual track and holds.
◮ Ultra high resolution pulse width modulators that make it easy to implement D/A converters.
◮ Low cost development tools.

TI TMS320C2000 microcontrollers

TMS320C2000™ Microcontrollers combine control peripheral integration with the processing power of a 32-bit architecture. All C28x™ microcontrollers are 100% software compatible and offer high-speed 12-bit Analog to Digital converters and advanced PWM generators.
From TI C2000 web pages.

Piccolo controlSTICK

The big chip to the left is the USB interface and the big chip to the right is the F28027, $39.
From a TI document.

TI controlSTICK overview

The new Piccolo controlSTICK USB tool allows quick and easy evaluation of all the advanced capabilities of TI’s Piccolo 32-bit MCU for just $39. Slightly larger than a memory stick, the Piccolo controlSTICK features onboard JTAG emulation and access to all control peripherals. Example projects walk through the advanced functionality of Piccolo, from simply blinking an LED to configuring the high resolution ePWM peripherals.
Included in the kit is the Piccolo controlSTICK, USB extension cable, jumpers and patch cords necessary for example projects, full version of Code Composer Studio with 32kB code size limit, example projects showcasing Piccolo MCU features and full hardware documentation, including bill of materials, schematics and Gerber files.
From a TI web site.

What is a Piccolo

◮ Member of TI’s C2000 32-bit family of microcontrollers.
◮ Uses TI’s fixed point C28x core.
◮ 40-60 MIPS operation.
◮ Single 3.3 Volt supply.
◮ Family members vary in
  ◮ the amount of on-chip RAM and flash EPROM.
  ◮ the peripheral mix and characteristics.
◮ Low cost. The F28027 is priced at ≈ $3.60 qty 100.
◮ Currently there are three family members. More on the way.

Piccolo block diagram

From the TI Piccolo web site.

F28027 block diagram in detail

[Figure: F28027 detailed block diagram: C28x 32-bit CPU with 3 external interrupts, PIE and CPU Timers 0, 1 and 2; memory bus to M0 SARAM and M1 SARAM (1K x 16, 0-wait each), secure SARAM (1K/3K/4K x 16, 0-wait), secure OTP (1K x 16) and secure flash (16K/32K x 16) behind the OTP/flash wrapper, boot ROM (8K x 16, 0-wait) and the code security module (PSWD); JTAG pins TRST, TCK, TDI, TMS, TDO; clocking block (OSC1, OSC2, Ext, PLL, LPM, WD) with XCLKIN, X1, X2, LPM wakeup and XRS; POR/BOR and VREG; ADC (A7:0, B7:0) and comparators (COMP1A, COMP1B, COMP2A, COMP2B, COMP1OUT, COMP2OUT) through the AIO mux; eCAP (ECAPx), ePWM with HRPWM (EPWMxA, EPWMxB, TZx, ESYNCI, ESYNCO), I2C (SCLx, SDAx), SPI (SPISTEx, SPICLKx, SPISOMIx, SPISIMOx) and SCI (SCITXDx, SCIRXDx), the serial ports each with a 4-level FIFO, on the 32-bit and 16-bit peripheral buses through the GPIO mux.]
A. Not all peripheral pins are available at the same time due to multiplexing.

Yet again

[Figure: TMS320F2802x/3x block diagram: C28x CPU (32x32 bit multiplier, 32-bit auxiliary registers, R-M-W atomic ALU, real-time JTAG emulation) on the program, data and register buses with sectored flash, RAM, boot ROM, CLA and CLA bus, PIE interrupt manager, three 32-bit timers, watchdog, 12-bit ADC, ePWM, eCAP, eQEP, I2C, SCI, SPI, LIN, CAN 2.0B and GPIO.]
Available only on TMS320F2803x devices: CLA, QEP, CAN, LIN

The F28027 has what?

◮ 16 × 16, 32 × 32 and dual 16 × 16 MAC.
◮ Harvard architecture but with unified memory map.
◮ 2 internal, 1% accurate oscillators.
◮ On-chip temperature sensor.
◮ Clock phase-lock-loop multiplier.
◮ Watchdog timer module.
◮ Missing clock detection circuitry.
◮ Up to 22 individually programmable GPIO pins.
◮ Three 32-bit timers.
◮ One enhanced pulse width modulator (ePWM).
  Eight outputs.
◮ Independent 16-bit timer per ePWM module.
◮ Four high resolution PWM (HRPWM).
◮ 1/2 analog comparator.
◮ 7/13 channel, 4.6 MHz, 12-bit A/D converter
◮ 128 bit security lock.
◮ Serial peripherals: one SCI, one SPI, one I2C.
◮ Three external interrupts.

C28x processor block diagram

[Figure: C28x processor block diagram: program address bus PAB(0:21) and program-read data bus PRDB(0:31); data-read address bus DRAB(0:31) and data-read data bus DRDB(0:31); data-write address bus DWAB(0:31) and data-/program-write data bus DWDB(0:31); program-address generation logic and program control logic; ARAU with registers XAR0 to XAR7, DP and SP; registers ST0, ST1, AH:AL, PH:PL, T:TL, IER, DBGIER, IFR, PC and RPC; data-read and data-write buffer registers; operand bus feeding the multiplier, barrel shifter and ALU, whose output drives the result bus.]

F28027 on-chip memory

◮ On-chip flash – 32 K 16-bit words.
◮ On-chip SARAM – 6 K 16-bit words.
◮ Boot ROM – 8 K 16-bit words.
Included (free) CCS has a limit of 32 kB code size.
Why is this considered a 32-bit MCU?
No provision for adding external memory, easily.

F28027 memory map

Start address   Contents
0x00 0000       M0 Vector RAM (Enabled if VMAP = 0)
0x00 0040       M0 SARAM (1K x 16, 0-Wait)
0x00 0400       M1 SARAM (1K x 16, 0-Wait)
0x00 0800       Peripheral Frame 0
0x00 0D00       PIE Vector - RAM (256 x 16) (Enabled if VMAP = 1, ENPIE = 1)
0x00 0E00       Peripheral Frame 0
0x00 2000       Reserved
0x00 6000       Peripheral Frame 1 (4K x 16, Protected)
0x00 7000       Peripheral Frame 2 (4K x 16, Protected)
0x00 8000       L0 SARAM (4K x 16) (0-Wait, Secure Zone + ECSL, Dual Mapped)
0x00 9000       Reserved
0x3D 7800       User OTP (1K x 16, Secure Zone + ECSL)
0x3D 7C00       Reserved
0x3D 7C80       Calibration Data
0x3D 7CC0, 0x3D 8000   Reserved
0x3F 0000       FLASH (32K x 16, 4 Sectors, Secure Zone + ECSL)
0x3F 7FF8       128-Bit Password
0x3F 8000       L0 SARAM (4K x 16) (0-Wait, Secure Zone + ECSL, Dual Mapped)
0x3F 9000       Reserved
0x3F E000       Boot ROM (8K x 16, 0-Wait)
0x3F FFC0       Vector (32 Vectors, Enabled if VMAP = 1)

The figure has separate Prog Space and Data Space columns; entries in the peripheral frame and PIE vector ranges are marked Reserved in one of the two columns.
Low 64K: 24x/240x Equivalent Data Space. High 64K: 24x/240x Equivalent Program Space.
Figure 3-5. 28023/28027 Memory Map

Flash memory addresses

Table 3-1. Addresses of Flash Sectors in F28021/28023/28027

ADDRESS RANGE            PROGRAM AND DATA SPACE
0x3F 0000 - 0x3F 1FFF    Sector D (8K x 16)
0x3F 2000 - 0x3F 3FFF    Sector C (8K x 16)
0x3F 4000 - 0x3F 5FFF    Sector B (8K x 16)
0x3F 6000 - 0x3F 7F7F    Sector A (8K x 16)
0x3F 7F80 - 0x3F 7FF5    Program to 0x0000 when using the Code Security Module
0x3F 7FF6 - 0x3F 7FF7    Boot-to-Flash Entry Point (program branch instruction here)
0x3F 7FF8 - 0x3F 7FFF    Security Password (128-Bit) (Do not program to all zeros)

Please DO NOT change any of the security codes or passwords. Don’t even think about doing so.