EECS 452 W10 Lecture 23 slides

EECS 452 – Lecture 23
Today:
TI MSP430 and Piccolo.
Handouts:
printed copy of today’s lecture slides
Read:
about DSP!
References:
Last one out should close the lab door!!!!
Please keep the lab clean and organized.
Where a calculator on the ENIAC is equipped with 18,000 vacuum tubes and
weighs 30 tons, computers in the future may have only 1,000 vacuum tubes and
perhaps weigh 1.5 tons. – Popular Mechanics, March 1949
EECS 452 – Winter 2010
Lecture 23 – Page 1/62
Friday – March 12, 2010
Actually . . .
Actually there were 18800 vacuum tubes and of those 6550
were 6SN7s.
The 6SN7 was/is a dual triode and was used to implement the
20 digit signed decimal accumulators. By not turning off the
power to ENIAC the average failure rate was 1 tube about every
two days. The longest up period was 116 hours.
A portion of ENIAC is located in the lobby of the CSE building.
The tubes that you see are very likely 6SN7s.
ENIAC’s active lifetime was 9 years, 1947–1955.
Overview of today’s lecture
Unfortunately, likely to be fragmented and rambling.
◮ Comments on single supply operation.
◮ The MSP430.
◮ Multiplying without a multiplier.
◮ An IIR filter for the MSP430.
◮ The MSP430 SPI interface.
◮ Linking MSP430 SPI to C5505 I2S.
◮ The TI Piccolo.
Thinking about single supply operation
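[Figure: supply-labeling diagrams. Node labels are +V/2, ground and −V/2 for split-supply operation, and +V, V/2 and ground for single-supply operation, with V/2 taken from a two-resistor (R, R) divider between +V and ground. Bypass capacitors not shown.]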
An alternative name for ground is common. Maybe a better choice.
Focusing now on the MSP430™
EECS 452 has a couple of eZ430-F2013 development tools and several
Z-Accel wireless kits (which use the F2274).
The development tool F2012/13 boards execute programs out of flash.
The boards can operate stand-alone, and projects have used them in
this manner.
The F2012/F2013 boards have been used to interface to XBee wireless
modules via UART and to the C5505 via SPI.
The three most important documents are:
◮ The data manual for the F20xx microcontrollers.
◮ The MSP430x2xx Family User’s Guide, SLAU144E.
◮ The eZ430-F2012 Development Tool User’s Guide, SLAU176B.
Where used?
http://www.ti.com/ww/en/mcu/valueline/index.shtml?DCMP=Value_Line&HQS=Other+BA+430value-promo.
All these applications likely involve the use of Digital Signal Processing!
I don’t understand how the new value line differs from the existing low end units other
than in part number and price.
What is low power?
◮ There are six low power modes of operation.
◮ Standby (asleep) at 3 V with self wake-up and RAM retention: < 0.6 µA, about 1.8 microwatts.
◮ 250 µA per MIPS when active (MSP430x2xx family). This is 3/4 milliwatt per MIPS at 3 volts.
◮ Wake-up time < 1 µs.
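The power figures quoted above are just current times voltage. A minimal sketch of that arithmetic follows; the function names are made up for illustration, and the battery capacity passed to battery_life_hours is a hypothetical input, not a number from these slides.

```c
/* Power drawn from a supply: P = I * V.
   With current in microamps and voltage in volts the result is in microwatts. */
double power_uW(double current_uA, double volts)
{
    return current_uA * volts;
}

/* Rough battery life in hours for a given average current draw.
   capacity_mAh is a hypothetical battery rating supplied by the caller. */
double battery_life_hours(double capacity_mAh, double avg_current_uA)
{
    return capacity_mAh * 1000.0 / avg_current_uA;
}
```

For example, power_uW(0.6, 3.0) gives about 1.8 µW for standby and power_uW(250.0, 3.0) gives 750 µW (3/4 mW) per MIPS when active.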
Comments
http://focus.ti.com/graphics/mcu/ulp/battery-life.gif.
eZ430-Development Tool
The debugging interface shown is the old version. I believe that we only
have the 6 pin version. For the F2012/13 boards simply use the center
four pins.
Note that the 14 pin pattern mirror images the physical pin positions on
the F2012/13 packages. BEWARE!
SLAU176B documents the tool and the F2013 board. (Figure from there.)
MSP430 generic block diagram
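[Figure: MSP430 generic block diagram. A 16-bit RISC CPU with JTAG/debug, clocked by MCLK, connects over the 16-bit memory address bus (MAB) and memory data bus (MDB) to flash/ROM, RAM and peripherals. A bus converter bridges to an 8-bit MDB serving further peripherals and the watchdog. The clock system also supplies ACLK and SMCLK to the peripherals.]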
From the MSP430X2XX Family User’s Guide.
MSP430 CPU block diagram
[Figure: MSP430 CPU block diagram. Sixteen 16-bit registers sit between the memory data bus (MDB) and the memory address bus (MAB) and feed the src and dst inputs of a 16-bit ALU clocked by MCLK. The ALU sets the Zero (Z), Carry (C), Overflow (V) and Negative (N) flags.]
Registers (bits 15 to 0):
R0/PC        Program Counter
R1/SP        Stack Pointer
R2/SR/CG1    Status
R3/CG2       Constant Generator
R4 to R15    General Purpose
◮ RISC architecture.
◮ 27 core instructions.
◮ Plus 24 emulated instructions.
◮ 7 addressing modes.
◮ Every instruction usable with every addressing mode.
◮ Single-cycle register operations.
◮ Constant generator for six most commonly used values.
◮ Direct memory-to-memory transfers.
◮ Instruction times depend on the addressing mode used.
◮ Instructions can take from 1 to 6 cycles.
From the MSP430X2XX Family User’s Guide.
How to do DSP without a multiplier?
Here is the problem that I want to address:
◮ Manufacturers, such as TI, sell low cost, low power microcomputers,
essentially by the millions.
◮ Many of these do not possess a multiplier, let alone a MAC unit.
◮ In spite of this, there are likely many applications that would benefit
(result in a more desirable product) from the use of some DSP.
◮ Just as floating point arithmetic is emulated in the C5505 by software, one
can emulate the operation of multiplier hardware in software.
◮ Implementation of multiplication in a multiplierless processor can be divided into
two basic categories: general purpose multiplication and hard coded
multiplication.
◮ The general multiplier is the more flexible but is also the most costly in
terms of execution time.
◮ The hard coding of the computation steps assumes multiplication by fixed
values (such as filter coefficients). It is fastest but requires significant code
space.
So what would I like to cover?
Disclaimer: this is a work in progress. Some has been done, some not. I
accidentally lost my MSP430 test codes when upgrading to CCS4. Some
of the outline below is fantasy, at this point, but should provide hints to
anyone interested in delving into this topic on their own.
◮ Pencil and paper unsigned binary multiplication.
◮ Pencil and paper two’s complement binary multiplication.
◮ Multiplier block diagrams.
◮ Coding a general multiplier in the MSP430. TI likely supplies code for such.
◮ Booth’s algorithm.
◮ Signed Digit (SD) and Canonical Signed Digit (CSD) representation.
◮ Testing.
◮ An IIR filter code generator.
Will knowing how to do this be useful?
◮ The lowest cost MSP430 having a multiplier appears to be the MSP430F2330 at $1.75 at 1ku. It has a slope A/D and lives in a 40 pin flat pack.
◮ If one could use a $0.60 part (e.g., the F2011) at the 1ku level the savings would be $1150 and at the 10ku level $11,500, etc.
◮ There likely will be many situations where knowing how to do this will be useful and make economic sense.
◮ Someone will benefit from knowing how to do this. Just who and when? It might be you.
Relevant TI application notes
Efficient Multiplication and Division Using MSP430, Kripasagar Venkat,
Application Report slaa329, 9/2006.
Efficient MSP430 Code Synthesis for an FIR Filter, Kripasagar Venkat,
Application Report slaa357, 3/2007.
Combines Horner’s method of polynomial evaluation with the Canonical
Signed Digit (CSD) number representation to “efficiently” (as well as one
can) implement DSP.
The focus is on the multiplierless MSP430 devices but the method will
work on any computer or FPGA. The source files are also available.
This pair of notes are what started me on this effort.
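To make the Horner plus CSD idea concrete, here is a minimal sketch of one hard-coded multiply. It is not taken from the application notes: the coefficient 0.4375 is a hypothetical example, the function name is invented, and the input is assumed to be a non-negative Q15 value so that the right shifts are plain divisions by powers of two.

```c
#include <stdint.h>

/* Multiply a non-negative Q15 value by the constant 0.4375 using two
   shifts and one subtract, with no multiplier.
   0.4375 = 2^-1 - 2^-4, i.e. signed digits 0.100(-1), which nests
   Horner-style as 2^-1 * (1 - 2^-3). */
int16_t mul_q15_by_0p4375(int16_t x)
{
    int16_t acc = x;                    /* innermost term: x              */
    acc = (int16_t)(x - (acc >> 3));    /* x * (1 - 2^-3), truncated      */
    acc = (int16_t)(acc >> 1);          /* scale by the leading 2^-1      */
    return acc;
}
```

With x = 16384 (0.5 in Q15) the result is 7168, which is 0.21875 in Q15. Each right shift truncates, so results can differ from the exact product by an lsb or so.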
Comments on the application notes
◮ Author assumes use of Q15.
◮ Develops a right to left algorithm.
◮ Relates process to use of Horner’s method of polynomial evaluation.
◮ Hard codes the shift and add steps for constant multiplier values.
◮ Uses signed digit representation for multipliers.
◮ Essentially equivalent to a basic shift and add multiplier.
◮ Recall that Q15 is a state of mind, not a function of a hardware binary point.
Doing pencil and paper multiplication
                        a4 a3 a2 a1 a0
×                       b4 b3 b2 b1 b0
---------------------------------------
  b0 ×   a4 a4 a4 a4 a4 a4 a3 a2 a1 a0
+ b1 ×   a4 a4 a4 a4 a4 a3 a2 a1 a0
+ b2 ×   a4 a4 a4 a4 a3 a2 a1 a0
+ b3 ×   a4 a4 a4 a3 a2 a1 a0
− b4 ×   a4 a4 a3 a2 a1 a0
---------------------------------------
         p9 p8 p7 p6 p5 p4 p3 p2 p1 p0
The multiplicand sign bit is extended for each row.
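The row-summing scheme can be checked in a few lines of C. This is an illustrative sketch for 5-bit operands only; the function names are invented here, and the left shift of each row is written as a multiply by a power of two so that negative multiplicands stay well defined in C.

```c
/* Sign-extend the low 5 bits of v to an int in the range -16..15. */
static int sext5(int v)
{
    v &= 0x1F;
    return (v & 0x10) ? v - 32 : v;
}

/* Multiply two 5-bit two's complement values the pencil-and-paper way:
   add one sign-extended, shifted copy of a for each set multiplier bit
   b0..b3, and subtract the row for b4, whose weight is -2^4. */
int mul5_rows(int a, int b)
{
    int product = 0;
    int row;

    a = sext5(a);
    b &= 0x1F;
    for (row = 0; row < 5; row++) {
        if (b & (1 << row)) {
            int partial = a * (1 << row);   /* a shifted left by row places */
            if (row == 4)
                product -= partial;         /* sign-bit row is subtracted */
            else
                product += partial;
        }
    }
    return product;
}

/* Compare against the built-in multiply for every 5-bit operand pair.
   Returns the number of mismatches found. */
int mul5_mismatches(void)
{
    int a, b, bad = 0;

    for (a = -16; a <= 15; a++)
        for (b = -16; b <= 15; b++)
            if (mul5_rows(a, b) != a * b)
                bad++;
    return bad;
}
```

Because there are only 1024 operand pairs, the exhaustive check in mul5_mismatches is cheap, in the spirit of the testing advice later in the lecture.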
Summing rows signed multiplier logic
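[Figure: block diagram of the row-summing signed multiplier. The multiplicand a is held in a register and the multiplier b in a shift register whose lsb gates (AND) a into an add/subtract unit; the control input S selects subtract. The result feeds the p-register, a shift register whose high bits and low bits together hold a×b.]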
C simulation: unsigned shift and add multiplication
// FPGA and MSP430 simulated unsigned shift and add multiply
#include <stdint.h>

uint32_t u_sanda(uint16_t a, uint16_t b)
{
    uint16_t ctr;
    uint32_t sum;                        // bit 16 holds the carry out of the add

    sum = 0;
    for (ctr=0; ctr<16; ctr++) {
        if (b & 0x0001) {                // multiplier lsb set: add the multiplicand
            sum = sum&0xFFFF;            // ensure carry is 0
            sum += a;
        }
        b = ((sum&0x0001)<<15) + (b>>1); // product lsb moves into the top of b
        sum = sum>>1;                    // shift right including carry
    }
    return ((uint32_t)sum<<16)+b;        // high word in sum, low word in b
}
C simulation: signed shift and add multiplication
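// signed shift and add multiply
#include <stdint.h>

int32_t fs_sanda(int16_t a, int16_t b)
{
    uint16_t ctr, pr, low, carry, sign_a, sign_b;

    sign_a = a&0x8000; sign_b = b&0x8000;
    pr = 0; low = 0; carry = 0;
    for (ctr=0; ctr<16; ctr++) {
        if ((b&0x0001) != 0) {
            carry = sign_a;                       // partial sums take the sign of a
            if (ctr == 15) pr -= a; else pr += a; // the sign-bit row is subtracted
        }
        b = b>>1;
        if ((pr&0x0001) != 0) low = 0x8000+(low>>1); else low = (low>>1);
        pr = (pr>>1)|carry;                       // shift right, extending the sign
    }
    if (a != 0) pr = pr^(sign_b);                 // fix the sign for a negative multiplier
    return (int32_t)(((uint32_t)pr<<16)+low);     // assemble without shifting into the sign bit
}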
Comments
These simulations mimic, and were written in conjunction with, MSP430 code.
Each multiplies two 16-bit values to give a 32-bit result.
Exhaustively tested using all possible multiplier and multiplicand values.
Working with Q15 values.
◮ Basically do integer multiplication.
◮ Product is 32 bits (two words).
◮ Left shift result by 1 and retain only the top 16 bits. Round first?
◮ Only need to do the multiplication keeping the top 16 bits. The low bits can be discarded as generated. Might complicate rounding.
◮ For the shown algorithm what if we don’t do the last right shift?
◮ Code and TEST. My norm is to exhaustively test wherever possible.
◮ When not possible, test end/special cases then use random values, lots of random values.
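The "shift left by 1 and keep the top 16 bits" step can be sketched as below. This version uses the processor's own multiply for clarity, so it shows only the Q15 scaling, not the shift and add emulation. The rounding and saturation choices are assumptions made for the example, and the right shift of a negative value is assumed to be arithmetic, as it is on common compilers.

```c
#include <stdint.h>

/* Q15 multiply: form the 32-bit integer product (Q30), round, then drop
   the extra sign bit so the result is again Q15.  Shifting right by 15 is
   the same as shifting left by 1 and keeping the top 16 bits. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;  /* 32-bit product, two sign bits */

    p += 0x4000;                          /* add half an output lsb to round */
    p >>= 15;                             /* assumed arithmetic shift */
    if (p > 32767)                        /* only -1 * -1 lands here */
        p = 32767;                        /* saturate to the largest Q15 value */
    return (int16_t)p;
}
```

For instance, q15_mul(16384, 16384) returns 8192, that is 0.5 × 0.5 = 0.25.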
Signed digit number representation
◮ Instead of representing values with 0 and 1 digit values, use digit values of -1, 0, 1.
◮ Awkward on a binary processor. However, if one is hard coding the steps in a multiplication, the operation is easily done.
◮ Not a unique representation. Lots of ways of writing a given value using signed digits.
Canonical SD representation
Uses the minimum number of non-zero digits.
◮ Reduces the instructions needed to hard code multiplication.
◮ Where to find an algorithm for generating CSD? Try Computer Arithmetic Algorithms, by Israel Koren.
◮ How much efficiency is obtained?
Converting an integer to CSD form
/*
File name: Int2CSD.c
Two’s complement integer to canonical signed digit.
Algorithm from Koren ...
16Feb2009 .. initial version .. K.Metzger
*/
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

void Int2CSD(int32_t value,   // integer value to convert
             int nbits,       // number of bits in value to convert
             int *bits,       // bits array...nbits+1 elements
             int *digits)     // digits array...nbits elements
{
    int idx, cin=0, which;

    for(idx=0; idx<nbits; idx++) {
        bits[idx] = value & 0x1;
        value >>= 1;
    }
    bits[idx]= bits[idx-1]; // sign extend one extra bit

    for (idx=0; idx<nbits; idx++) {
        which = (bits[idx+1]*2+bits[idx])*2+cin;
        switch(which) {
            case 0: digits[idx] = 0; cin = 0; break;
            case 1: digits[idx] = 1; cin = 0; break;
            case 2: digits[idx] = 1; cin = 0; break;
            case 3: digits[idx] = 0; cin = 1; break;
            case 4: digits[idx] = 0; cin = 0; break;
            case 5: digits[idx] = -1; cin = 1; break;
            case 6: digits[idx] = -1; cin = 1; break;
            case 7: digits[idx] = 0; cin = 1; break;
            default: printf("Int2CSD: oops!\n"); exit(1);
        } // end of switch
    } // end of for
} // end of function
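As a cross-check on a table-driven converter like the one above, the same digits can be produced by a different, well-known recoding rule, the non-adjacent form (NAF), in which no two neighboring digits are both non-zero. The sketch below is an alternative technique, not the routine from the slide; all function names are invented for the example.

```c
#include <stdint.h>

/* Recode value into signed digits -1, 0, +1, least significant digit
   first, using the non-adjacent-form rule.  Returns the number of digits
   written (at most 33 for a 32-bit input). */
int to_naf(int32_t value, int *digits)
{
    int64_t v = value;
    int n = 0;

    while (v != 0) {
        int d = 0;
        if (v & 1) {
            d = 2 - (int)(v & 3);   /* v mod 4 == 1 gives +1, == 3 gives -1 */
            v -= d;                 /* v is now a multiple of 4 */
        }
        digits[n++] = d;
        v /= 2;                     /* exact, because v is even here */
    }
    return n;
}

/* Rebuild the integer from its signed digits. */
int64_t from_digits(const int *digits, int n)
{
    int64_t v = 0;
    int i;

    for (i = n - 1; i >= 0; i--)
        v = 2 * v + digits[i];
    return v;
}

/* Return digit number index of the recoding of value (0 past the end). */
int naf_digit(int32_t value, int index)
{
    int digits[40];
    int n = to_naf(value, digits);

    return (index >= 0 && index < n) ? digits[index] : 0;
}

/* Return 1 if the recoding has no adjacent non-zero digits and converts
   back to the original value, else 0. */
int naf_roundtrip_ok(int32_t value)
{
    int digits[40];
    int i, n = to_naf(value, digits);

    for (i = 0; i + 1 < n; i++)
        if (digits[i] != 0 && digits[i + 1] != 0)
            return 0;
    return from_digits(digits, n) == (int64_t)value;
}
```

For the value 7 (binary 0111, three non-zero digits) the recoding is +1 at weight 8 and -1 at weight 1, so only two add/subtract steps are needed.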
Implementing an IIR filter
Assume 16-bit values and a uniform distribution of ones and zeros.
◮ On the average there will be 8 ones and 8 zeros in the multiplier.
◮ Each one will be coded as a shift and an add. Eight shifts and eight adds.
◮ Each zero will be coded as a shift. Eight shifts.
◮ On the average (assuming that we are not doing Voodoo statistics here) a
multiplication will need 16 shifts and 8 adds. Twenty four machine cycles.
◮ On an MSP430 running at 16 MHz a hard coded multiplication will take on
the order of 1.5 µs.
◮ To be conservative let’s use a value of 3 µs.
◮ To implement an 8th order filter as a cascade of biquads we need five
multiplications per biquad and four biquads.
◮ The nominal, very hand wavy, time required to filter a sample is on the
order of 60 µs.
◮ It might be possible to sample and filter at a sample rate of 16 kHz.
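The estimate above is simple enough to put in code, which makes it easy to try other clock rates or filter orders. The function names are invented for this sketch, and it keeps the slide's simplification of one machine cycle per shift or add.

```c
/* Time for one hard-coded multiply in microseconds, assuming one machine
   cycle per shift and per add. */
double mult_time_us(int shifts, int adds, double clock_MHz)
{
    return (shifts + adds) / clock_MHz;
}

/* Time to filter one sample through a cascade of biquads, in microseconds. */
double filter_time_us(int biquads, int mults_per_biquad, double us_per_mult)
{
    return biquads * mults_per_biquad * us_per_mult;
}

/* Highest sample rate in kHz that leaves time_us for each sample. */
double max_sample_rate_kHz(double time_us)
{
    return 1000.0 / time_us;
}
```

With the slide's numbers, mult_time_us(16, 8, 16.0) is 1.5 µs, filter_time_us(4, 5, 3.0) is 60 µs, and max_sample_rate_kHz(60.0) is about 16.7 kHz.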
Is this reasonable and can we do better?
◮ A 16-bit FPGA multiplier implementation should only need about 16 clock ticks. The multiplier footprint should be small enough to allow all 20 multipliers to be implemented. In this case a nominal 16 clock ticks would be needed per input sample for each filter output. (This is an aside, sorry.)
◮ There exists a non-unique number representation called signed digit. When placed into canonical form this representation contains the minimum possible number of non-zero values. These non-zero values are either +1 or −1.
◮ There is the possibility of speeding up hard coded multiplications.
◮ A reasonable question is “by how much”.
Implementing multiplication in an MSP430
When updating to CCS V4 I deleted my old Code Composer Essentials.
Oops.
I had meant to back this work up.
Canonical heresy
What are the maximum values associated
with the w1 and the w2 ?
What are the maximum values associated
with the w3 and w4 ? (Assuming our usual
scaling scheme.)
Where does overflow occur? Is this important? (Combine the two top adders into
one.)
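[Figure: second-order IIR filter section. Input x, output y; feedforward coefficients b0, b1, b2 and feedback coefficients −a1, −a2; four z⁻¹ delay elements whose outputs are labeled w1, w2, w3 and w4; four adders.]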
Is this truly real?
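For experimenting with the questions on this slide, here is a small floating point sketch of a second-order section. It assumes one reading of the figure: w1 and w2 hold the two previous inputs and w3 and w4 the two previous outputs (a direct form I arrangement). That reading, the type name and the function names are assumptions of this example.

```c
/* One second-order section.  w1, w2 are the previous two inputs and
   w3, w4 the previous two outputs (assumed reading of the figure). */
typedef struct {
    double b0, b1, b2, a1, a2;
    double w1, w2, w3, w4;
} biquad_t;

/* Process one sample: y = b0*x + b1*w1 + b2*w2 - a1*w3 - a2*w4. */
double biquad_step(biquad_t *s, double x)
{
    double y = s->b0 * x + s->b1 * s->w1 + s->b2 * s->w2
             - s->a1 * s->w3 - s->a2 * s->w4;

    s->w2 = s->w1;  s->w1 = x;   /* shift the input delay line */
    s->w4 = s->w3;  s->w3 = y;   /* shift the output delay line */
    return y;
}

/* Return sample n (n >= 0) of the impulse response for the given
   coefficients, starting from zero state. */
double impulse_response_at(double b0, double b1, double b2,
                           double a1, double a2, int n)
{
    biquad_t s = { 0 };
    double y = 0.0;
    int i;

    s.b0 = b0; s.b1 = b1; s.b2 = b2; s.a1 = a1; s.a2 = a2;
    for (i = 0; i <= n; i++)
        y = biquad_step(&s, i == 0 ? 1.0 : 0.0);
    return y;
}
```

Under this reading w1 and w2 can never exceed the largest input magnitude and w3 and w4 the largest output magnitude, which is one way to start on the overflow questions.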
Is the result worth the effort?
◮ I wrote a C simulation for the lab 8th order IIR filter.
◮ The straight shift and add multiplication algorithm takes 164 adds per sample.
◮ The CSD multiplication algorithm takes 112.
◮ The nominal CSD version does 0.68 times the number of add/subtracts as the normal algorithm.
◮ In a final form filter there will also be additional overheads that will mute the speedup amount. Maybe by a factor on the order of two. This still gives a speedup on the order of 16%.
◮ Of course, I’m assuming that I’ve done everything correctly.
The only really good way to answer this question is to build both
versions and run them.
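A quick way to estimate the saving for a particular coefficient, before building anything, is to count the add/subtract steps each method needs. The sketch below does only the counting; the function names are invented, and the CSD count uses the non-adjacent-form rule rather than any code from the lecture.

```c
#include <stdint.h>

/* Adds needed by plain shift and add: one per 1 bit in the 16-bit
   two's complement pattern of the coefficient. */
int binary_adds(int16_t coeff)
{
    uint16_t v = (uint16_t)coeff;
    int n = 0;

    while (v) {
        n += v & 1u;
        v >>= 1;
    }
    return n;
}

/* Add/subtracts needed after CSD recoding: one per non-zero signed digit. */
int csd_adds(int16_t coeff)
{
    long v = coeff;
    int n = 0;

    while (v != 0) {
        if (v & 1) {
            v -= 2 - (v & 3);   /* strip a +1 digit (v mod 4 == 1) or a -1 digit */
            n++;
        }
        v /= 2;                 /* exact, because v is even here */
    }
    return n;
}
```

The coefficient 0x7FFF needs 15 adds in binary but only 2 add/subtracts in CSD (2^15 − 1), while 0x5555 needs 8 either way, so the gain depends strongly on the coefficient values.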
Moving onto the MSP430 SPI
◮ Two versions have existed. The current one can optionally do 8 or 16 bit transfers.
◮ A versatile device.
◮ Can be used to implement a UART transmitter.
◮ Have programmed it to communicate with the C5505 via I2S.
◮ Used I2S mono mode. “Hand generated” frame sync.
◮ Last week TI issued an application note showing how to use a couple of chips external to the MSP430 to do the I2S link. Their solution is more general than what I did.
F2012/13 USI SPI block diagram
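[Figure: USI block diagram in SPI mode (USII2C = 0). An 8/16-bit shift register (USISR; width selected by USI16B, bit order by USILSB) takes data in from SDI (USIPE7) and drives SDO (USIPE6) through an output latch controlled by USIGE and USIOE. A bit counter (USICNTx, USIIFGCC) sets USIIFG. The shift clock, with phase and polarity set by USICKPH and USICKPL, appears on SCLK (USIPE5). With USIMST = 1 the clock source is chosen by USISSELx (000 SCLK, 001 ACLK, 010 SMCLK, 011 SMCLK, 100 USISWCLK, 101 TA0, 110 TA1, 111 TA2) and divided by /1, /2, /4, /8 ... /128 (USIDIVx) to give USICLK. USISWRST and USIIFG feed the clock HOLD.]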
From slau144e.pdf.
F2012/13 USI SPI timing diagram
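[Figure: USI SPI timing diagram. USICNTx counts 8, 7, 6, ... 1, 0 while bits move on SDO/SDI from MSB to LSB. SCLK is drawn for the four USICKPH/USICKPL settings (0 0, 0 1, 1 0, 1 1). Markers show the point where USICNTx is loaded and where USIIFG is set.]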
From slau144e.pdf.
Can use SPI as a UART transmitter
◮ UART uses a 10 bit frame.
◮ SPI has 16 bits in a frame.
◮ Have to slow the UART down some because we are sending 16 bits per item versus 10.
◮ Have to bit reverse the order in the SPI frame because UART is lsb to msb.
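One way to lay a UART character into a 16-bit SPI word is sketched below. It assumes the SPI shifts the most significant bit out first, that the line idles high, and that the six bits left over after the 10 bit frame are sent as extra idle (stop) time. The function name is invented and this is not the lecture's own code.

```c
#include <stdint.h>

/* Pack one UART character into a 16-bit word for an MSB-first SPI shifter.
   Bit 15, sent first, is the start bit (0).  Data bits d0..d7 follow in
   bits 14..7, least significant first, which reverses their order within
   the word.  Bits 6..0 are all 1: the stop bit plus idle-high padding. */
uint16_t uart_frame_for_spi(uint8_t ch)
{
    uint16_t frame = 0x007F;    /* stop bit and idle padding */
    int i;

    for (i = 0; i < 8; i++) {
        if (ch & (1u << i))
            frame |= (uint16_t)(1u << (14 - i));   /* d0 lands in bit 14 */
    }
    return frame;               /* bit 15 is left 0 as the start bit */
}
```

The SPI bit clock then sets the baud rate, and each character occupies 16 bit times instead of 10, which is the slowdown mentioned above.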
Application Examples
1. Moving 16-bit values from a F2012 using the MSP430 SPI interface
to the C5505 using the C5505 I2S interface. The one available eZdsp
SPI “channel” is used to interface FPGA display support to the
C5505. Three I2S channels are available. Our intent is to use one of
these.
This is a slightly contrived example. The C5505 itself has four A/D
input channels that could be used for this application.
2. Moving 8-bit values from a F2012 using the MSP430 SPI interface to
the C5505 using the C5505 UART interface. Useful when sending
values from a MSP430 to a XBee wireless device.
MSP430 Master SPI to C5505 slave I2S1
An example application would be to measure the positions of four variable
resistors (either rotary or slider) to be used as control inputs to an audio
special effects processor running on a C5505.
F2012 pin use
Pin  Function                             Pin  Function
1    VCC                                  14   VSS
2    P1.0/TACLK/ACLK/A0                   13   XIN/P2.6/TA1
3    P1.1/TA0/A1                          12   XOUT/P2.7
4    P1.2/TA1/A2                          11   TEST/SBWTCK
5    P1.3/ADC10CLK/A3/VREF−/VeREF−        10   RST/NMI/SBWTDIO
6    P1.4/SMCLK/A4/VREF+/VeREF+/TCK       9    P1.7/A7/SDI/SDA/TDO/TDI
7    P1.5/TA0/A5/SCLK/TMS                 8    P1.6/TA1/A6/SDO/SCL/TDI/TCLK
◮ The F2012 package has 14 pins. Pins 1 and 14 are used for power and ground. Pins 10 and 11 are used by JTAG, Spy-Bi-Wire. This leaves 10 for signals.
◮ Need to use three signals to interface to I2S: frame sync, clock (pin 7), data (pin 8). The MSP430 SPI hardware does not generate frame sync. Have to use an output port pin and generate it ourselves.
◮ Available A/D channels are on pins 2, 3, 4, 5 and 9. Pin 2 is connected to an LED. Pins 3, 4, 5 and 9 are available as A/D inputs.
◮ We will have to use either pin 12 or 13 as frame sync. This locks out possible use of a 32768 Hz crystal. Will use pin 12 (port 2 pin 7).
From the TI MSP430F2012 data sheet.
C5505 and other considerations
◮ C5505 has X SPI ports of which only one is brought out and is generally used to drive the S3SB graphics.
◮ There are four I2S ports. Port I2S0 is used with the CODEC. Ports I2S1 and I2S2 are brought to the eZdsp connector. Port I2S3 is shared with the UART.
◮ When I2S is a slave the transfer timing is controlled by the master and can be “bursty”.
◮ Will use I2S1 to support the slave input.
◮ Will use DSP mono-mode.
◮ The F2012/3 SPI output does not include a frame sync waveform. One can be generated using a port pin.
◮ Need at least one additional clock pulse to allow the C5505 to sample the frame sync transition.
F2012 main
#include <msp430x20x3.h>

volatile unsigned int i, value;

void main(void)
{
    WDTCTL = WDTPW + WDTHOLD;          // Stop watchdog timer
    // 12 MHz
    BCSCTL1 = CALBC1_12MHZ;            // Set range
    DCOCTL = CALDCO_12MHZ;             // Set DCO step + modulation
    P1DIR = 0x01;                      // P1.0 output, else input
    P1DIR |= 0x20;                     // also P1.5 output
    USICTL0 |= USIPE7 + USIPE6 + USIPE5 + USIMST + USIOE; // Port, SPI master
    USICTL1 |= USIIE;                  // Counter interrupt, flag remains set
    USICKCTL = USIDIV_4 + USISSEL_2;   // SMCLK/16
    USICTL0 &= ~USISWRST;              // USI released for operation
    USISRL = 0;                        // initial load data value
    P2SEL = 0x00;                      // set up IO use on port 2
    P2DIR = 0x80;                      // use port 2 pin 7 as frame sync output
    P2OUT &= ~0x80;                    // set sync low
    value = 0;                         // initialize output value
    USICNT = 16 | USI16B;              // init-load counter--starts SPI running
    _BIS_SR(LPM0_bits + GIE);          // Enter LPM0 w/ interrupt
}
F2012 SPI interrupt support
// USI interrupt service routine
#pragma vector=USI_VECTOR
__interrupt void universal_serial_interface(void)
{
    for (i = 0xF; i > 0; i--);   // delay between values
    USISRL = value;              // load low 8 bits
    USISRH = value >> 8;         // load high 8 bits
    value++;                     // increment value
    USICTL0 &= ~USIPE5;          // generate two clock pulses manually
    P1OUT |= 0x20;               // clock rising edge
    P2OUT |= 0x80;               // sync rising edge
    P1OUT &= ~0x20;              // clock falling edge
    P1OUT |= 0x20;               // clock rising edge
    P2OUT &= ~0x80;              // sync falling edge
    P1OUT &= ~0x20;              // clock falling edge
    USICTL0 |= USIPE5;           // return pin to the SPI
    USICNT = 16 | USI16B;        // load counter which starts transfer
}
This is strange looking code
The main appears to start, run and then exit.
The main sets up the F2012/3, loads a value into the USI counter and
enters low power mode with interrupts (whatever that means).
A “normal” program would then exit back to the system. The F2012/3
doesn’t have a system to exit back to.
The USI/SPI hardware continues to run in low power mode. When the
counter decrements to 0, the CPU is powered back on and the interrupt
support routine is entered.
The shown interrupt routine delays a while to space values for looking at
on an oscilloscope. Loads a new 16-bit value into the shift registers,
loads the counter with a count of 16 and puts the processor back to
sleep.
In our nominal resistor application the A/D clock would control events
and the given interrupt routine would be recast as a function.
C5505 test main
#include <stdlib.h>
#include <stdio.h>
#include "..\c5505_support\data_types.h"

#define FOREVER 1

unsigned int I2S1_receive();
void I2S1_transmit(unsigned int);
void InitI2S1();
void InitSystem();
void ConfigPort();

void main(void)
{
    unsigned int value, next_value, value_ctr, loop_ctr, bad_ctr;

    // CPU initialization
    InitSystem();
    ConfigPort();
    InitI2S1();

    loop_ctr = 0;
    bad_ctr = 0;
    while(FOREVER) {
        value = I2S1_receive();           // discard first value
        next_value = I2S1_receive()+1;    // get initial test value
        value_ctr = 0;
        while(value_ctr++ != 0xFFFF) {
            value = I2S1_receive();
            if (next_value != value) {
                printf("expected: %04X   received: %04X\n", next_value, value);
                bad_ctr++;
                break;
            }
            next_value++;
        }
        printf("loop %6u completed, bad = %3u\n", loop_ctr++, bad_ctr);
    }
}
C5505 initialization and support
// File name: I2S1_support
//
// 14Jan2010 .. initial version .. KMetzger
//
#include <stdlib.h>
#include "..\c5505_support\data_types.h"
#include "..\c5505_support\c5505.h"

void InitI2S1(void)
{
    PCGCR1 &= ~I2S1CG;           // enable the I2S1 peripheral clock (0 enables)
    I2S1SCTRL = 0;               // reset I2S1
    I2S1SCTRL = I2SENABLE | I2SMONO | I2SDATADLY | I2SWDLENGTH16 | I2SFRMT ;
    I2S1INTMASK = I2SRCVMONFL;   // enable the done flag--WARNING enables interrupt too!
}

unsigned int I2S1_receive(void)
{
    while((I2S1INTFL & I2SRCVMONFL) == 0);   // wait for received value
    return I2S1RXLT1;                        // then return it
}
F2013 and C5505 waveforms
C5505 I2S timing in DSP mode:
[Figure: I2S_FS, I2S_CLK and DATA waveforms. The left channel word LD(n) is followed by the right channel word RD(n) and then LD(n+1); bits within each word are numbered N−1, N−2, N−3, ... 3, 2, 1, 0.]
LD(n) = n'th sample of left channel data
RD(n) = n'th sample of right channel data
From sprufp4.pdf.
MSP430F2012/3 SPI timing:
[Figure: USI SPI timing diagram, the same one shown earlier. USICNTx counts 8 down to 0 while SDO/SDI carries MSB to LSB; SCLK is drawn for the four USICKPH/USICKPL settings; markers show where USICNTx is loaded and where USIIFG is set.]
From TMS320F20xx data sheet.
C5505 I2S1 registers
CPU Word Address   Acronym      Description
2900h              I2SSCTRL     I2S Serializer Control Register
2904h              I2SSRATE     I2S Sample Rate Generator Register
2908h              I2STXLT0     I2S Transmit Left Data 0 Register
2909h              I2STXLT1     I2S Transmit Left Data 1 Register
290Ch              I2STXRT0     I2S Transmit Right Data 0 Register
290Dh              I2STXRT1     I2S Transmit Right Data 1 Register
2910h              I2SINTFL     I2S Interrupt Flag Register
2914h              I2SINTMASK   I2S Interrupt Mask Register
2928h              I2SRXLT0     I2S Receive Left Data 0 Register
2929h              I2SRXLT1     I2S Receive Left Data 1 Register
292Ch              I2SRXRT0     I2S Receive Right Data 0 Register
292Dh              I2SRXRT1     I2S Receive Right Data 1 Register
From sprufp4.pdf.
Configuration and flag register bits
I2SnSCTRL register:
Bit(s)   Field      Access
15       ENABLE     R/W-0
14-13    Reserved   R-0
12       MONO       R/W-0
11       LOOPBACK   R/W-0
10       FSPOL      R/W-0
9        CLKPOL     R/W-0
8        DATADLY    R/W-0
7        PACK       R/W-0
6        SIGN_EXT   R/W-0
5-2      WDLNGTH    R/W-0
1        MODE       R/W-0
0        FRMT       R/W-0
LEGEND: R/W = Read/Write; R = Read only; -n = value after reset
I2SnSINTFL register:
Bit(s)   Field      Access
15-8     Reserved   R-0
7-6      Reserved   R-0
5        XMITSTFL   R-0
4        XMITMONFL  R-0
3        RCVSTFL    R-0
2        RCVMONFL   R-0
1        FERRFL     R-0
0        OUERR      R-0
LEGEND: R/W = Read/Write; R = Read only; -n = value after reset
From sprufp4.pdf.
MSP430-C5505 SPI signals
[Oscilloscope capture with three traces: Frame Sync, Bit Clock, Data Bits.]
Time axis expanded
[Oscilloscope capture from a different scan, with three traces: Frame Sync, Bit Clock, Data Bits.]
Comments about the waveforms
◮ Only those edges that are needed are generated.
◮ The clock dwell times are not relevant.
◮ Clock edge positions relative to the data dwells are relevant.
◮ How were the important edges decided upon? Careful reading of the C5505 I2S documentation. Asking the question, "How would I implement this in an FPGA?". Cut and try.
◮ Note that the last bit sent stays in the shift register and thus on the data line. For the two waveforms shown, the last bit sent was a logic one.
Focusing now on the Piccolo™
This is of interest because:
◮ Very fast (≈ 5 MSPS) A/D.
◮ Dual track and holds.
◮ Ultra high resolution pulse width modulators that make it easy to implement D/A converters.
◮ Low cost development tools.
TI TMS320C2000 microcontrollers
TMS320C2000™ microcontrollers combine control peripheral
integration with the processing power of a 32-bit architecture. All
C28x™ microcontrollers are 100% software compatible and offer
high-speed 12-bit analog to digital converters and advanced PWM
generators.
From TI C2000 web pages.
Piccolo controlSTICK
The big chip to the left is the USB interface and the big chip to the right is the F28027, $39. From a TI document.
TI controlSTICK overview
The new Piccolo controlSTICK USB tool allows quick and easy
evaluation of all the advanced capabilities of TI’s Piccolo 32-bit MCU
for just $39. Slightly larger than a memory stick, the Piccolo
controlSTICK features onboard JTAG emulation and access to all
control peripherals. Example projects walk through the advanced
functionality of Piccolo, from simply blinking an LED to configuring
the high resolution ePWM peripherals. Included in the kit is the
Piccolo controlSTICK, USB extension cable, jumpers and patch cords
necessary for example projects, full version of Code Composer Studio
with 32kB code size limit, example projects showcasing Piccolo MCU
features and full hardware documentation, including bill of materials,
schematics and Gerber files.
From a TI web site.
What is a Piccolo?
◮ Member of TI’s C2000 32-bit family of microcontrollers.
◮ Uses TI’s fixed point C28x core.
◮ 40-60 MIPS operation.
◮ Single 3.3 Volt supply.
◮ Family members vary in
  ◮ the amount of on-chip RAM and flash EPROM.
  ◮ the peripheral mix and characteristics.
◮ Low cost. The F28027 is priced at ≈ $3.60 qty 100.
◮ Currently there are three family members. More on the way.
Piccolo block diagram
From the TI Piccolo web site.
F28027 block diagram in detail
[Figure: F28027 functional block diagram. The C28x 32-bit CPU, with PIE (3 external interrupts) and CPU Timers 0, 1 and 2, sits on the memory bus with M0 SARAM (1K x 16, 0-wait), M1 SARAM (1K x 16, 0-wait), secure SARAM (1K/3K/4K x 16, 0-wait), secure FLASH (16K/32K x 16) behind the OTP/flash wrapper, OTP (1K x 16, secure), boot ROM (8K x 16, 0-wait) and the code security module (PSWD). Clocking and system blocks: OSC1, OSC2, Ext, PLL, LPM, WD, POR/BOR and VREG, with pins XCLKIN, X1, X2 and XRS and an LPM wakeup path. On the 32-bit and 16-bit peripheral buses: ADC (A7:0, B7:0), comparators (COMP1A, COMP1B, COMP2A, COMP2B, COMP1OUT, COMP2OUT), ePWM with HRPWM (EPWMxA, EPWMxB, TZx, ESYNCI, ESYNCO), eCAP (ECAPx), I2C (SCLx, SDAx), SPI (SPISTEx, SPICLKx, SPISOMIx, SPISIMOx) and SCI (SCITXDx, SCIRXDx), each serial port with a 4-level FIFO. All reach the pins through the GPIO and AIO multiplexers. JTAG pins: TRST, TCK, TDI, TMS, TDO.]
A. Not all peripheral pins are available at the same time due to multiplexing.
Yet again
TMS320F2802x/3x Block Diagram
[Figure: block diagram. Memory: sectored flash, RAM, boot ROM. C28x 32-bit CPU: 32x32 bit multiplier, R-M-W atomic ALU, 32-bit auxiliary registers, real-time JTAG emulation, three 32-bit timers, PIE interrupt manager, and the CLA with its own CLA bus. Peripherals: ePWM, eCAP, eQEP, 12-bit ADC, watchdog, I2C, SCI, SPI, LIN, CAN 2.0B, GPIO. Buses: program bus, data bus, register bus.]
Available only on TMS320F2803x devices: CLA, QEP, CAN, LIN
The F28027 has what?
◮ 16 × 16, 32 × 32 and dual 16 × 16 MAC.
◮ Harvard architecture but with unified memory map.
◮ 2 internal, 1% accurate oscillators.
◮ On-chip temperature sensor.
◮ Clock phase-lock-loop multiplier.
◮ Watchdog timer module.
◮ Missing clock detection circuitry.
◮ Up to 22 individually programmable GPIO pins.
◮ Three 32-bit timers.
◮ One enhanced pulse width modulator (ePWM). Eight outputs.
◮ Independent 16-bit timer per ePWM module.
◮ Four high resolution PWM (HRPWM).
◮ 1/2 analog comparator.
◮ 7/13 channel, 4.6 MHz, 12-bit A/D converter.
◮ 128 bit security lock.
◮ Serial peripherals: one SCI, one SPI, one I2C.
◮ Three external interrupts.
C28x processor block diagram
[Figure: C28x CPU block diagram. Buses: program address bus PAB(0:21), program-read data bus PRDB(0:31), data-read address bus DRAB(0:31), data-read data bus DRDB(0:31), data-write address bus DWAB(0:31), data-/program-write data bus DWDB(0:31). Blocks: program-address generation logic and program control logic; address multiplexers taking the address from the stack, an immediate address or XAR7; the ARAU with registers XAR0 to XAR7, DP and SP; the operand bus, fed by registers and immediate data, driving the multiplier, barrel shifter and ALU with registers AH:AL, PH:PL and T:TL; the result bus; registers ST0, ST1, IER, IFR, DBGIER, PC and RPC; data-read and data-write buffer registers.]
F28027 on-chip memory
◮ On-chip flash – 32 K 16-bit words.
◮ On-chip SARAM – 6 K 16-bit words.
◮ Boot ROM – 8 K 16-bit words.
The included (free) CCS has a 32 kB code size limit.
Why is this considered a 32-bit MCU?
No easy provision for adding external memory.
F28027 memory map
Start address   Data space                                               Prog space
0x00 0000       M0 Vector RAM (Enabled if VMAP = 0)                      (same)
0x00 0040       M0 SARAM (1K x 16, 0-Wait)                               (same)
0x00 0400       M1 SARAM (1K x 16, 0-Wait)                               (same)
0x00 0800       Peripheral Frame 0                                       Reserved
0x00 0D00       PIE Vector - RAM (256 x 16)                              Reserved
                (Enabled if VMAP = 1, ENPIE = 1)
0x00 0E00       Peripheral Frame 0                                       Reserved
0x00 2000       Reserved                                                 Reserved
0x00 6000       Peripheral Frame 1 (4K x 16, Protected)                  Reserved
0x00 7000       Peripheral Frame 2 (4K x 16, Protected)                  Reserved
0x00 8000       L0 SARAM (4K x 16)                                       (same)
                (0-Wait, Secure Zone + ECSL, Dual Mapped)
0x00 9000       Reserved                                                 (same)
0x3D 7800       User OTP (1K x 16, Secure Zone + ECSL)                   (same)
0x3D 7C00       Reserved                                                 (same)
0x3D 7C80       Calibration Data                                         (same)
0x3D 7CC0       Reserved                                                 (same)
0x3D 8000       Reserved                                                 (same)
0x3F 0000       FLASH (32K x 16, 4 Sectors, Secure Zone + ECSL)          (same)
0x3F 7FF8       128-Bit Password                                         (same)
0x3F 8000       L0 SARAM (4K x 16)                                       (same)
                (0-Wait, Secure Zone + ECSL, Dual Mapped)
0x3F 9000       Reserved                                                 (same)
0x3F E000       Boot ROM (8K x 16, 0-Wait)                               (same)
0x3F FFC0       Vector (32 Vectors, Enabled if VMAP = 1)                 (same)
Low 64K: 24x/240x Equivalent Data Space. High 64K: 24x/240x Equivalent Program Space.
Figure 3-5. 28023/28027 Memory Map
Flash memory addresses
Table 3-1. Addresses of Flash Sectors in F28021/28023/28027
ADDRESS RANGE            PROGRAM AND DATA SPACE
0x3F 0000 - 0x3F 1FFF    Sector D (8K x 16)
0x3F 2000 - 0x3F 3FFF    Sector C (8K x 16)
0x3F 4000 - 0x3F 5FFF    Sector B (8K x 16)
0x3F 6000 - 0x3F 7F7F    Sector A (8K x 16)
0x3F 7F80 - 0x3F 7FF5    Program to 0x0000 when using the Code Security Module
0x3F 7FF6 - 0x3F 7FF7    Boot-to-Flash Entry Point (program branch instruction here)
0x3F 7FF8 - 0x3F 7FFF    Security Password (128-Bit) (Do not program to all zeros)
Please DO NOT change any of the security codes or passwords.
Don’t even think about doing so.
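When checking a linker map it can help to turn Table 3-1 into a lookup. The sketch below simply encodes the four sector ranges from the table; the function name is invented for the example.

```c
#include <stdint.h>

/* Return the flash sector letter ('A'..'D') that holds a word address,
   or 0 if the address is outside the four sector ranges of Table 3-1. */
char flash_sector(uint32_t addr)
{
    if (addr >= 0x3F0000u && addr <= 0x3F1FFFu) return 'D';
    if (addr >= 0x3F2000u && addr <= 0x3F3FFFu) return 'C';
    if (addr >= 0x3F4000u && addr <= 0x3F5FFFu) return 'B';
    if (addr >= 0x3F6000u && addr <= 0x3F7F7Fu) return 'A';
    return 0;   /* includes the CSM, boot entry point and password words */
}
```

Addresses from 0x3F 7F80 upward return 0 on purpose, since those words hold the code security and boot entry locations rather than ordinary program storage.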