INSTRUCTOR'S GUIDE
INTRODUCTION TO DSP
LECTURE 1
INTRODUCTION TO DSP
OBJECTIVES
This introduction is designed to answer several basic questions for the beginning student and to familiarize the
student experienced in digital signal processing with the design and architecture of the Texas Instruments
TMS320C54x device. The topics presented in this section include:
• The definition of digital signal processing (DSP)
• The benefits of digital signal processors
• Practical applications or uses of digital signal processing
• General DSP design and architecture
• Specific DSP architecture (TMS320C54x)
Slide: What Is DSP? Block diagram relating the analog computer and the digital computer: an ADC feeds the DSP, and a DAC produces the output (sample binary values 1010 and 1001).
Definition of a Digital Signal Processor
A digital signal processor (DSP) is an integrated circuit designed for high-speed data manipulations, and is
used in audio, communications, image manipulation, and other data-acquisition and data-control
applications.
How Digital Signal Processing Works
To understand how digital signal processing works, you must first understand the difference between analog and
digital signals. Analog signals, which include sound intensity, pressure, light intensity, etc., are
continuously variable. Each of our senses is sensitive to different kinds of analog signals. Our ears are
sensitive to sound, our eyes are sensitive to light, and so on. Once we receive a signal, our sensory organs
convert it to an electrical signal and send it to our analog computer (the brain). Our brains are very powerful
parallel computers whose performance currently is unmatched by any digital computer. Our brains not only
analyze the information received, but also make decisions using this data.
Digital signals are those that are transmitted within or between computers, in which information is
represented by discrete states – for example, high voltages and low voltages – rather than by continuously
variable levels in a continuous stream, as in an analog signal.
How Analog and Digital Signals Work Together
Digital technologies such as personal computers (PCs) assist us in many ways: writing documents, spell
checking, and drawing. Unfortunately, the world is analog, and electronic analog computers are not as
versatile as digital computers. Therefore, in order to make use of the tremendous processing power that
digital technology offers us, we must do the following:
• Convert the analog signals into electrical signals, using a transducer (such as a microphone, as
shown in the diagram).
• Digitize these signals (i.e., convert them from analog to digital using an analog-to-digital converter
(ADC)), as shown in the diagram.
Once the signal is in digital form, our computer can easily process it through a digital signal processor. The
DSP specializes in processing these signals, which makes it slightly different from microcomputers,
microcontrollers, or general-purpose microprocessors.
After the DSP has processed the signal, the output signal must be converted back to analog form so that we
can sense it. This is the digital-to-analog converter (DAC) stage in the diagram. A loudspeaker, for
example, would convert the analog signal coming from the DAC into sound.
So, we can see that to process the signal digitally, we need to convert it at least twice. Is it worth it? As you
will see, it really is, at least until someone designs an analog computer as versatile as a digital one.
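The two conversions described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 8-bit resolution, the 1.0 V reference, the 8 kHz sample rate, and the function names adc and dac are all chosen for the example and do not describe any particular converter.

```python
import math

def adc(voltage, bits=8, v_ref=1.0):
    """Quantize a voltage in [-v_ref, +v_ref] to a signed integer code."""
    levels = 2 ** (bits - 1)                    # codes span -levels .. levels - 1
    code = round(voltage / v_ref * levels)
    return max(-levels, min(levels - 1, code))  # clip out-of-range inputs

def dac(code, bits=8, v_ref=1.0):
    """Convert a signed integer code back to a voltage."""
    return code / 2 ** (bits - 1) * v_ref

# Sample a 1 kHz sine wave at 8 kHz (both rates are assumptions).
sample_rate = 8000
samples = [0.9 * math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(8)]

codes = [adc(v) for v in samples]      # analog -> digital
quieter = [c // 2 for c in codes]      # "processing": halve the volume (floor division)
restored = [dac(c) for c in quieter]   # digital -> analog
```

Once the signal is a list of integers, the processing step is ordinary arithmetic; here it simply halves the volume.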
Slide: Multiply and Add. Add: 1 + 2 = 3 (binary 0001 + 0010 = 0011). Multiply: 5 * 3 = 15, formed by shifting 0011 and adding it multiple times.
Slide: MAC Operation (multiply, add, and accumulate), the most common operation in DSP. A = B*C + D, E = F*G + A, ... Typically 70 clock cycles with ordinary processors; typically 1 clock cycle with the MAC instruction of a digital signal processor.
Why Do We Need Digital Signal Processors?
Why do we need a digital signal processor? Can we not use a general-purpose microprocessor to process
signals as well? Let us try to answer this question by giving an example of some arithmetic operations
performed by DSPs.
Add and Subtract
Add and subtract operations are performed quite simply by general-purpose microprocessors in a single or
very few clock cycles. Binary addition is similar to decimal addition. Our example shows adding 1 plus 2
(binary 0001 plus 0010). The result is binary 0011, or decimal 3.
Multiply and Divide
The multiply and divide operations are more complex. A digital multiply operation consists of a series of
shift and add operations. Our example shows a multiplication of 3 by 5. Division, which is more complex,
will not be discussed here. It is discussed in TMS320C54x DSP Reference Set, Vol. 2 Mnemonic Instruction
Set, Chapter 2, reference number SPRU172B. The description of the subtract conditionally (SUBC)
instruction covers this process.
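The shift-and-add idea can be written out as a short Python sketch. It shows the sequential method a processor without a hardware multiplier might follow, one multiplier bit per step; the function name is hypothetical and the code handles non-negative integers only.

```python
def shift_add_multiply(a, b):
    """Multiply two non-negative integers using only shifts and adds."""
    product = 0
    while b:
        if b & 1:          # lowest bit of the multiplier is set
            product += a   # add the (shifted) multiplicand
        a <<= 1            # shift the multiplicand left by one bit
        b >>= 1            # move on to the next multiplier bit
    return product

print(shift_add_multiply(3, 5))  # 15
```

Each pass through the loop costs at least one step, which is why a sequential multiply takes many cycles.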
General-purpose microprocessors are quite slow in performing multiply and divide operations. They will
typically sequentially execute a series of shift, add, and subtract operations from their microcode to perform
a single multiply operation, and may consume many cycles to complete.
The DSP performs multiplication in a single cycle by implementing all shift and add operations in parallel.
The circuitry is relatively complex and consumes a considerable number of transistors. The benefit is very
fast multiplication, which is required for processing most digital signals.
When general-purpose DSPs are not fast enough, the signal is either processed using analog circuits (which
may have some drawbacks), or in specialized DSP hardware designed only for that task. This eliminates
many of the benefits of a programmable DSP.
Digital signal processing, by its nature, requires many calculations of the form:
A = B*C + D
This may appear to be a simple task, but when speed is also required, we find that specialized, dedicated
hardware to perform this task is very useful.
Multiply, Accumulate (MAC)
Most DSPs have a specialized instruction that allows them to multiply, add, and save the result in a single
cycle. This instruction is usually called MAC (short for multiply and accumulate).
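As a rough illustration of why this operation matters, the sketch below chains multiply-accumulate steps to form a sum of products, the pattern behind most filters. The helper names mac and sum_of_products are hypothetical; on a DSP, each call to mac would correspond to a single MAC instruction.

```python
def mac(acc, b, c):
    """One multiply-accumulate step: returns acc + b * c."""
    return acc + b * c

def sum_of_products(coeffs, samples):
    """Chain MAC steps: A = B*C + D, then E = F*G + A, and so on."""
    acc = 0
    for b, c in zip(coeffs, samples):
        acc = mac(acc, b, c)
    return acc

print(sum_of_products([1, 2, 3], [4, 5, 6]))  # 32
```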
Figure: Drop in Multiplication Times. Multiply time in ns (0 to 600) plotted against year (1971, 1976, 1998), falling to 5 ns.
We have established that for DSPs, we need specialized hardware that is capable of performing multiply and
accumulate functions in the shortest possible time (preferably in a single cycle). However, the central
problem remains. How can we achieve a fast multiply operation? Without a fast multiplier, a worthwhile
DSP design would only be a dream.
Designing fast multipliers was one of the greatest challenges in digital design up until the 1980s. In the
1970s, several of the world’s leading research laboratories sought to make fast digital multipliers a reality.
Multiply Times
In 1971, Lincoln Laboratories designed a multiplier using 10,000 integrated circuits, performing the
operation in just 600 ns. By the mid-1970s, multiply times of 200 ns were becoming commonplace. This
made it possible to design acceptable digital signal processors. These early designs were expensive and
bulky, but they showed that fast multiplication was possible.
In the early 1980s, single-chip DSPs with good performance started to appear, and ever since, multiply times
have continued to drop. Today’s 16-bit fixed-point devices can achieve multiply times of 5 ns. Given the
origins of this technology, this is a remarkable achievement.
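Slide: Digital Computers. Block diagrams of the two architectures (A = address, D = data). Von Neumann machine: one memory holding the stored program and data, connected to the arithmetic logic unit and input/output by a single address bus and data bus. Harvard architecture: separate stored-program and stored-data memories, each with its own address and data buses, connected to the arithmetic logic unit and input/output.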
Now let us have a closer look at the internal architecture of computers so we can see how this has affected
the design of DSP chips.
Stored Program Machines
Computers need instructions to operate. At every clock cycle, they must be told what to do. If the
instructions are stored, the computer just has to fetch and execute them. Such computers are called stored
program machines. Our computer typically fetches an instruction and then data, operates on the data, and
returns the resulting data to the store.
Stored program machines use one of two well-known and widely used computer architectures: von Neumann and
Harvard. The diagram shows the structure of the two architectures.
von Neumann Architecture
Von Neumann machines store program and data in the same memory area. In this type of machine,
an instruction contains the operation command and the address of the data on which the operation is
performed. There are two basic operation units within these machines: the arithmetic logic unit (ALU) and
the input/output unit. The ALU performs the core operations: multiply, add, subtract, and many more. It is
on these very simple core operations that complex software, such as word processing software, can be built.
The input/output unit manages the flow of external data for the machine.
Harvard Architecture
The primary difference between Harvard architecture and von Neumann architecture is that with Harvard,
program and data memories are physically separate and have separate transmission paths. This enables the
machine to fetch instructions and data simultaneously, which can greatly enhance performance. Harvard
machines also have ALUs and input/output units.
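The benefit of fetching instructions and data at the same time can be illustrated with a deliberately simplified cycle count. This is a toy model, not a description of any real processor: it assumes every bus transfer takes one cycle and ignores pipelining, caches, and execution time. The function names are hypothetical.

```python
def von_neumann_cycles(n_instructions, data_accesses=1):
    """Shared bus: the instruction fetch and each data access take turns."""
    return n_instructions * (1 + data_accesses)

def harvard_cycles(n_instructions, data_accesses=1):
    """Separate buses: the instruction fetch overlaps with a data access."""
    return n_instructions * max(1, data_accesses)

print(von_neumann_cycles(100))  # 200
print(harvard_cycles(100))      # 100
```

Under these assumptions, a program that reads one data word per instruction needs half as many bus cycles on the Harvard machine.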
Von Neumann and Harvard Architecture History
The history of these two architectures is very interesting. The Harvard architecture was developed by
Howard Aiken in the late 1930s at Harvard University, with the Harvard Mark I becoming operational in
1944. The University of Pennsylvania followed in 1946 with the development of the Electronic Numerical
Integrator and Computer (ENIAC).
John von Neumann, a Hungarian-born mathematician, suggested a simpler and lower cost architecture,
namely a single memory for program and data. This simple solution has set the standard ever since. In
1951, the Institute for Advanced Study in Princeton built the first von Neumann machine.
Which Architecture is Best Suited for DSP?
Common general-purpose personal computers use processors designed with the von Neumann architecture,
while the Harvard architecture is more commonly used in specialized microprocessors for real-time and
embedded applications.
DSPs typically use Harvard architecture, although von Neumann DSPs also exist. Many signal and image
processing applications require fast, real-time machines. The drawback to using a true Harvard architecture
is that since it uses separate program and data memories, it needs twice as many address and data pins on
the chip and twice as much external memory. Unfortunately, as the number of pins or chips increases, so
does the price.
Electronic designers, who have had to tackle problems like these before, have come up with an elegant
solution: a single data and address bus is used externally, while two (or more) separate buses for program
and data are used internally. Timing (multiplexing) handles the separation of program and data
information. In one clock cycle, the program information flows on the pins, and in the second cycle, data
follows on the same pins. Program and data information is then routed onto separate internal program and
data buses. Such machines are called modified Harvard architecture processors because the internal
architecture is Harvard while the external architecture is von Neumann. The performance of modified
Harvard architecture processors typically compares well with the performance of true Harvard architecture
processors because most DSP chips also incorporate multiple internal RAM/ROM cells for high-use
instructions and data. This significantly reduces the time used for external sequential program and data
access associated with classic von Neumann processors.
Slide: A Typical DSP System. Block diagram of a DSP chip connected to memory, optional converters (ADC for analog to digital, DAC for digital to analog), and communication ports (serial and parallel).
Components of a Typical DSP System
Typical DSP systems consist of a DSP chip, memory, possibly an analog-to-digital converter (ADC), a
digital-to-analog converter (DAC), and communication channels. Not all DSP systems have the same
architecture with the same components. The selection of components in a DSP system depends on the
application. For example, a sound system would probably require A/D and D/A converters, whereas an
image processing system may not.
DSP Chip
A DSP chip can contain many hardware elements; some of the more common ones are listed below.
Central Arithmetic Unit
This part of the DSP performs major arithmetic functions such as multiplication and addition. It is
the part that makes the DSP so fast in comparison with traditional processors.
Auxiliary Arithmetic Unit
DSPs frequently have an auxiliary arithmetic unit that performs pointer arithmetic, mathematical
calculations, or logical operations in parallel with the main arithmetic unit.
Serial Ports
DSPs normally have internal serial ports for high-speed communication with other DSPs and data
converters. These serial ports are directly connected to the internal buses to improve performance,
to reduce external address decoding problems, and to reduce cost.
Memory
Memory holds information, data, and instructions for DSPs and is an essential part of any DSP system.
Although DSPs are intelligent machines, they still need to be told what to do. Memory devices hold a series
of instructions that tell the DSP which operations to perform on the data (i.e., information). In many cases,
the DSP reads some data, operates on it, and writes it back. Almost all DSP systems have some type of
memory device, whether it is on-chip memory or off-chip memory; however, on-chip memory operates
faster.
A/D and D/A Converters
Converters provide the translator function for the DSP. Since the DSP can only operate on digital data,
analog signals from the outside world must be converted to digital signals. When the DSP provides an
output, it may need to be converted back to an analog signal to be perceived by the outside world.
Analog-to-digital converters (ADCs) accept analog input and turn it into digital data that consist of only 0s
and 1s. Digital-to-analog converters (DACs) perform the reverse process; they accept digital data and
convert it to a continuous analog signal.
Ports
Communication ports are necessary for a DSP system. Raw information is received and processed; then that
information is transmitted to the outside world through these ports. For example, a DSP system could output
information to a printer through a port. The most common ports are serial and parallel ports. A serial port
accepts a serial (single) stream of data and converts it to the processor format. When the processor wishes to
output serial data, the port accepts processor data and converts it to a serial stream (e.g., modem connections
on PCs). A parallel port does the same job, except the output and input are in parallel (simultaneous)
format. The most common example of a parallel port is a printer port on a PC.
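The conversion a serial port performs between a processor word and a stream of single bits can be sketched as follows. The 8-bit word width and the most-significant-bit-first order are assumptions made for the example; real ports differ in both, and the function names are hypothetical.

```python
def to_serial(word, width=8):
    """Parallel word -> list of bits, most significant bit first."""
    return [(word >> i) & 1 for i in range(width - 1, -1, -1)]

def from_serial(bits):
    """List of bits (most significant first) -> parallel word."""
    word = 0
    for bit in bits:
        word = (word << 1) | bit   # shift in one bit at a time
    return word

print(to_serial(0b10010110))           # [1, 0, 0, 1, 0, 1, 1, 0]
print(from_serial([1, 0, 0, 1, 0, 1, 1, 0]))  # 150
```

A parallel port skips this step because all bits of the word travel at once on separate lines.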
Slide: Practical DSP Systems (hi-fi equipment, toys, videophones, modems, phone systems, 3D graphics, image processing, and more).
Practical Applications for DSP Systems
Since their introduction to the market, DSPs have found a wide variety of applications. They are used in
everyday hi-fi systems as well as high-end virtual-reality applications. Generally, DSP is not an expensive
technology. Some practical DSP systems are:
• Hi-Fi Equipment
• Toys
• Videophones
• Modems
• Phone Systems
• 3D Graphics Systems
• Image Processing Systems
Hi-Fi Equipment (Music Systems)
DSPs are now being used in sound processors that can create the illusion of three-dimensional sound or
modify the acoustics of a room to give the illusion of very large rooms and auditoriums. The result is movie
theater quality sound in a home music system.
Toys
Today, DSP technology is integrated in children's toys. Talking toys are commonplace; by pressing the
picture of a dog, children can hear it bark. They can also learn their alphabet by singing along with a
teaching toy. This clearly demonstrates that DSP technology is not expensive.
Videophones
Videophones will affect the lives of people from all walks of life. They are quickly improving in quality. It
is only a matter of time before prices drop and videophones become widely used. DSPs are used for
compression and decompression of images in videophones. There are several international standards for
compressing moving images. Programmable DSPs are well suited to evolving standards, since supporting a new
standard may require only a software update.
Modems
As the Internet has grown, so has the use of modems. To handle the ever-increasing
communications load, modems have become faster and more efficient. DSPs perform vital functions in
modems such as modulating the digital bit stream into a signal compatible with a phone line, canceling line
echoes, and compressing and decompressing data.
Phone Systems
These days, it is quite common to call a company and be answered by a machine that provides alternatives
such as: “Say 1 for sales,” “Say 2 for technical support,” and so on. These phone systems use DSPs to
perform the function of voice recognition. DSPs are also commonly used in the communications industry for
the add-on features you can get from your telephone company like caller ID, voice messaging, and call back.
3D Graphics Systems
Most flight simulators use 3D real-time graphics to enhance realism. To calculate the necessary details in
three dimensions (and to be able to do this 30 times every second) requires very efficient and powerful
processors. DSPs are now widely used in virtual-reality applications.
Image Processing Systems
Personal handheld digital cameras are also now becoming widespread. DSPs are used to perform the
conversion of charge-coupled device (CCD) chip analog voltages (video) to compressed data, which is then
stored digitally in nonvolatile EEPROM (electrically erasable programmable ROM). The DSP also senses the
buttons, controls exposure times, provides the CCD gate timing, and downloads images to the PC.
DSPs are also used extensively in image processing, such as robot vision, machine vision, image
compression, and fingerprint recognition. A simple example of an image-processing application is the
inspection of printed circuit boards. The system works by recording the image of a working board and
comparing (subtracting) it to newly manufactured ones as they pass beneath a CCD camera. These systems
also use the efficient multiply and add cycles in DSPs to perform two-dimensional filtering.
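The board-inspection idea, comparing by subtraction, can be sketched with two tiny grayscale images held as lists of rows. The pixel values, the threshold of 10, and the function name are all assumptions for illustration; a real system would also align the images and filter them first.

```python
def defect_pixels(reference, candidate, threshold=10):
    """Return (row, col) positions where the images differ by more than threshold."""
    defects = []
    for r, (ref_row, cand_row) in enumerate(zip(reference, candidate)):
        for c, (ref_px, cand_px) in enumerate(zip(ref_row, cand_row)):
            if abs(ref_px - cand_px) > threshold:   # subtract and compare
                defects.append((r, c))
    return defects

good_board = [[200, 200, 50], [200, 50, 50]]    # image of a working board
new_board = [[198, 203, 50], [200, 180, 50]]    # image of a new board
print(defect_pixels(good_board, new_board))     # [(1, 1)]
```

Small differences caused by lighting or sensor noise stay below the threshold, while the large difference at row 1, column 1 is flagged.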
Analog Advantages
• Low cost and simplicity in some applications
  – Attenuators/amplifiers
  – Simple filters
• Wide bandwidth (GHz)
• Low signal levels
• Infinite effective sampling rate
  – Infinite resolution in frequency
  – No aliasing/reconstruction issues
• Infinite resolution in amplitude
  – No quantization noise
Digital Signal Processing (DSP) Advantages
• Repeatability
  – Low sensitivity to component tolerances
  – Low sensitivity to temperature changes
  – Low sensitivity to aging effects
  – Nearly identical performance from unit to unit
  – Matched circuits cost less
• High noise immunity
• In many applications DSP offers higher performance and lower cost
  – CD players versus phonographic turntable
Slide: Why Digital Processing? Signal path: ADC, process, DAC. Advantages of digital processing: programmability, stability, repeatability, special applications.
So, Why Convert From Analog to Digital?
Some applications require analog designs, and some require digital designs. To process signals digitally,
they must be converted from analog form to digital numbers. After a signal is processed, it is then often
converted back to analog form. Considering this overhead, digital processing must offer some clear
advantages. These include:
• Programmability
• Stability
• Repeatability
• Special Applications
Slide: Programmability. One hardware = many tasks: the same hardware runs software 1 through N as a low-pass filter, a music synthesizer, or a motor control. Upgradability and flexibility: a digital upgrade means developing new code; an analog upgrade means soldering in a new component.
Programmability
A single piece of DSP hardware can perform many functions. For example, a multimedia PC can
play music and also function as a word processor if it is loaded with suitable programs. This ability to use
the same hardware for many functions provides important flexibility. You can implement any new function
you think of, as long as you can program it.
Upgradability
Once you have designed and implemented your system, you may want to upgrade or add new functions.
Perhaps you would like to adapt your system to a new environment. With a digital system, this means
modifying your code. With an analog system, this could involve obtaining and soldering in new
components, or even a complete redesign.
Flexibility
A single DSP board can be made to perform many functions by simply loading new programs into it. In our
demonstrations, we are using the same DSK board as a music tune generator and as a low-pass filter by
simply loading it with different software. This flexibility reduces design time and complexity. With analog
circuits, a new circuit has to be designed for each new function.
Stability
The stability of analog circuits depends upon several factors. Analog circuits are affected by temperature
and aging, among other things. Also, two analog systems using the same design and components may differ
in performance.
Slide: Analog Variability. Analog circuits are affected by temperature, aging, and tolerance of components. Two analog systems using the same design and components may differ in performance. Example: 1 kΩ + 10 years = 1.1 kΩ.
Temperature
Analog components such as resistors, capacitors, diodes, and operational amplifiers are affected by
temperature, humidity, and aging. A temperature-sensitive analog circuit may perform quite differently in
the UK than in Egypt, where the temperatures are different. This could prove disastrous for a company that
sells its products worldwide.
Digital circuits do not gradually change their characteristics over time, temperature, or humidity. They
either work or they don’t work. In other words, digital circuits are repeatable as long as they are designed
with enough tolerance to operate properly over the range of expected conditions.
Aging
The effects of component aging can be detrimental to analog circuits as characteristics and performance
change. These effects can sometimes be anticipated, or their effect may not be critical. Analog designers
must be aware of these effects.
Tolerances
Components such as resistors and capacitors have tolerances. If a component value is only accurate to
within 10%, two apparently identical analog circuits could perform differently enough to cause operational
problems. This can make design, manufacturing, and support expensive.
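A short calculation shows how quickly 10% tolerances add up. The sketch below assumes a simple RC low-pass filter with cutoff frequency 1/(2πRC), a nominal 1 kΩ resistor, and a nominal 100 nF capacitor; the circuit and the component values are chosen for illustration and do not come from the course material.

```python
import math

def rc_cutoff_hz(r_ohms, c_farads):
    """Cutoff frequency of a first-order RC low-pass filter."""
    return 1.0 / (2 * math.pi * r_ohms * c_farads)

R, C = 1_000, 100e-9                            # assumed nominal values
nominal = rc_cutoff_hz(R, C)                    # about 1592 Hz
lowest = rc_cutoff_hz(R * 1.1, C * 1.1)         # both parts 10% high
highest = rc_cutoff_hz(R * 0.9, C * 0.9)        # both parts 10% low

print(f"{lowest:.0f} Hz to {highest:.0f} Hz around {nominal:.0f} Hz")
```

Two units built from the same parts list could therefore have cutoff frequencies that differ by a factor of about 1.5 (1.21 / 0.81), roughly 1315 Hz against 1965 Hz.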
Slide: Digital Repeatability (perfect reproducibility). Nearly identical performance from unit to unit; performance not affected by tolerance; no drift in performance due to temperature or aging; guaranteed accuracy. A CD player always plays the same music quality.
Digital Repeatability
A properly designed digital circuit will produce the same result every time, in addition to being identical
from unit to unit. If the same multiplication is performed on 500 computers, all 500 computers should
produce the same result. Component tolerances, aging, and temperature drifts also do not affect digital
circuits nearly as much.
A properly designed digital circuit will produce the same results in the UK as in Egypt, even when the
temperatures are different. On the other hand, 500 analog circuits could produce a range of results.
In digital circuitry, logical 1s and 0s are defined when an analog voltage is above or below an analog voltage
threshold. For a digital circuit to be repeatable, the analog voltage which represents the logical 1s and 0s
needs to be sufficiently greater or less than the threshold so as not to be affected by circuit variations or
noise. The only concern is that timing restrictions and maximum device ratings should not be exceeded. If
proper digital inputs are not maintained, the 1s and 0s can be corrupted, making a normally repeatable
digital circuit suddenly fail. On the other hand, analog circuit characteristics will tend to gradually drift.
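The role of the threshold can be shown with a small simulation. It assumes a 3.3 V logic high, 0 V logic low, a threshold halfway between them, and bounded random noise; all of these values and the function names are illustrative assumptions rather than the ratings of any real device.

```python
import random

def transmit(bits, v_high=3.3, noise_peak=0.5, seed=0):
    """Turn bits into voltages and add bounded random noise (assumed model)."""
    rng = random.Random(seed)
    return [(v_high if b else 0.0) + rng.uniform(-noise_peak, noise_peak)
            for b in bits]

def receive(voltages, threshold=1.65):
    """Decide each bit by comparing the voltage with the threshold."""
    return [1 if v > threshold else 0 for v in voltages]

bits = [1, 0, 1, 1, 0, 0, 1, 0]
noisy = transmit(bits)               # voltages are no longer exactly 0 V or 3.3 V
print(receive(noisy) == bits)        # True while the noise stays within the margin
```

As long as the noise peak (0.5 V here) is smaller than the distance from each logic level to the threshold (1.65 V), every bit is recovered exactly. If the noise exceeds that margin, bits flip abruptly rather than degrading gradually.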
Digital accuracy is determined by the number of bits used and is guaranteed to remain the same. With
analog circuits, the number of bits is effectively infinite, but the effects of noise, tolerances, and linearity can
rapidly diminish performance.
A digital CD player consistently produces the same high-quality digital music and is limited primarily
by the analog components that are still required. Analog components in a CD player include the DAC,
laser, laser pickup, read head actuator, spindle motor, and headphones.
Slide: Performance. Some special functions are best implemented digitally: lossless compression, adaptive filters, and linear phase filters. Plots show gain and phase against frequency, with frequencies f1 and f2 marked.
Compression
Storage media such as hard disk drives, and satellite communications links for telephone and video, are
examples of resources with limited capacity and bandwidth. More would be better,
but installing additional hardware tends to be very expensive. In these cases, costs are passed on to the
consumer in one way or another. An example would be the substantial cost difference between a 20-minute
and a 2-minute phone call, especially if the call is long distance.
Although the prices for installing more advanced hardware tend to be on a downward curve, our need for
more information is on an even more aggressive upward curve. Data compression can be a valuable tool for
providing adequate performance from available resources, and at a reasonable cost.
Let us consider the example of a satellite link or transmission channel. If one megabyte of data is
compressed down to half a megabyte and then transmitted, a decompressor can then recover the original data
at the other end. Considering that the transmission line is only aware that half a megabyte of data has been
passed, the data channel bandwidth is effectively doubled.
A DSP can compress raw binary data and signals through the use of appropriate software programs.
Lossless compression programs are suitable for exact binary data transfers. On the other hand, programs
designed for compressing speech and video offer much higher compression ratios but with some loss in
signal quality. Analog circuits can also be used for some very simple forms of lossless compression but offer
very little flexibility.
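Lossless compression is easy to demonstrate with the zlib module from the Python standard library. This is a general-purpose compressor used here only as an illustration, not a DSP-specific algorithm, and the repetitive test data is an assumption chosen so that it compresses well.

```python
import zlib

original = b"DSP " * 1000                 # 4000 bytes of repetitive data
compressed = zlib.compress(original)      # what would travel over the link
restored = zlib.decompress(compressed)    # recovered at the other end

ratio = len(original) / len(compressed)
print(restored == original)               # True: every byte is recovered
print(f"compression ratio: {ratio:.0f} to 1")
```

The ratio depends entirely on the data. Highly repetitive input like this compresses far better than the two-to-one example in the text, while already compressed or random data may not shrink at all.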
Adaptive Filters
DSP systems have been developed that cancel some of the noise within cabins of cars, helicopters, and
airplanes. The noises cancelled were those caused by engine vibrations. The noise cancellation systems
used the engine speed as a reference and produced an anti-noise signal from speakers to cancel the cabin
noise. Feedback from microphones in each headrest (or headphone) was used to adapt the characteristics of
the anti-noise until the best possible noise reduction was achieved. The system then continued to adapt
periodically to track changes in the cabin noise.
A DSP system can easily adapt to some changes in environmental variables. An adaptive algorithm simply
calculates the new parameters required and stores them back in main memory, overwriting the previous
values. A very basic level of adaptation is possible in analog systems, but the complete change of a complex
set of filter characteristics (used in noise cancellation) is beyond the practical scope of analog signal
processing.
A notch filter with a steep roll-off would be one example of a filter that might be needed to
implement noise cancellation. In this case, the DSP has the ability to recalculate suitable notches to remove
the vibration noise as the engine RPM changes. It is virtually impossible to produce the many required
tunable filters using analog techniques alone.
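The adapt-and-overwrite loop described above can be sketched with the least mean squares (LMS) algorithm, a common textbook choice that is used here as an assumption; the text does not say which algorithm the cabin systems used. The sinusoidal reference, the four filter taps, and the step size mu are also illustrative.

```python
import math

def lms_cancel(reference, noisy, taps=4, mu=0.05):
    """Adaptively subtract a filtered copy of the reference from the noisy signal."""
    weights = [0.0] * taps
    history = [0.0] * taps               # most recent reference samples
    residual = []
    for x, d in zip(reference, noisy):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate             # noise left after cancellation
        # Overwrite the old weights with the newly calculated ones.
        weights = [w + mu * error * h for w, h in zip(weights, history)]
        residual.append(error)
    return residual

n = range(2000)
reference = [math.sin(0.5 * k) for k in n]                # engine-speed reference
cabin_noise = [0.8 * math.sin(0.5 * k + 1.0) for k in n]  # same tone, shifted and scaled
residual = lms_cancel(reference, cabin_noise)

start = sum(e * e for e in residual[:100])
end = sum(e * e for e in residual[-100:])
print(end < 0.01 * start)   # True: the residual noise shrinks as the filter adapts
```

If the amplitude or phase of the cabin noise changes, the same loop keeps adjusting the weights without any change to the hardware.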
Linear Phase Filters
There are some valuable signal processing techniques that are difficult or impossible to implement with analog
circuits. The classic example is the linear phase filter, which is difficult to design in analog form and, even
then, works only over a limited bandwidth. With a digitally implemented filter, it is possible to delay every
component frequency by the same amount, so that the phase shift is proportional to frequency. This is possible
by using a finite impulse response (FIR) filter. This term will be explained later.
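As a preview, the sketch below shows the property that makes linear phase easy to obtain digitally: an FIR filter with symmetric coefficients delays every frequency by the same number of samples. The five coefficient values and the test frequencies are assumptions chosen for illustration.

```python
import math

def fir(coeffs, signal):
    """Direct-form FIR filter: each output is a sum of multiply-accumulates."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(coeffs):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out

coeffs = [0.1, 0.2, 0.4, 0.2, 0.1]     # symmetric taps (assumed values)
delay = (len(coeffs) - 1) // 2         # 2 samples, whatever the frequency

for w in (0.2, 1.0):                   # two test frequencies in radians per sample
    x = [math.sin(w * n) for n in range(40)]
    y = fir(coeffs, x)
    gain = coeffs[2] + 2 * coeffs[1] * math.cos(w) + 2 * coeffs[0] * math.cos(2 * w)
    # After the start-up transient, y[n] is x[n - delay] scaled by the gain.
    worst = max(abs(y[n] - gain * x[n - delay]) for n in range(4, 40))
    print(f"w = {w}: largest mismatch {worst:.1e}")
```

Both frequencies come out scaled by different gains but shifted by exactly the same two samples, which is what linear phase means.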
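Slide: DSP Development (Tools of the Trade). Flowchart: software design, code written in a high-level language or assembler (for example, ADD A, B), and test on the emulator and DSP. If the result is OK, the code becomes the product; otherwise development returns to the design step.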
The Program
The DSP chip is a piece of hardware that cannot function without the intelligence of a program. A program
is a series of instructions that perform certain functions. In our demonstrations, we will see some examples
of programming to compose simple musical tunes. To write these programs, we must use the tools of the
trade.
Assemblers
Assemblers generate machine-level code from text instructions. Let us assume we were given the following
two lines to remember:
ADD A, B
111000100101010001001
Since we understand written words better than a series of 1s and 0s, which line is easier to understand and
remember? Assemblers take our text instructions and convert them into machine language. This relieves us
of the burden of having to remember binary instructions for DSP. We will talk more about assembly
language in the next chapter.
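What an assembler does can be illustrated with a toy translator. The 8-bit instruction format, the opcode table, and the register codes below are entirely hypothetical; they are not the TMS320C54x encoding and exist only to show the text-to-binary step.

```python
# Hypothetical 8-bit encoding, NOT the TMS320C54x instruction format:
# bits 7-4 = opcode, bits 3-2 = first operand, bits 1-0 = second operand.
OPCODES = {"ADD": 0b0001, "SUB": 0b0010, "MPY": 0b0011}
REGISTERS = {"A": 0b00, "B": 0b01}

def assemble(line):
    """Translate a line such as 'ADD A, B' into an 8-bit binary string."""
    mnemonic, operands = line.split(maxsplit=1)
    first, second = [name.strip() for name in operands.split(",")]
    word = (OPCODES[mnemonic] << 4) | (REGISTERS[first] << 2) | REGISTERS[second]
    return format(word, "08b")

print(assemble("ADD A, B"))  # 00010001
```

A real assembler also handles labels, addressing modes, and error checking, but the core job is this table lookup.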
High-Level Language
High-level languages are like assembly languages, but much friendlier. Assembly languages have very basic
instructions, such as multiply, add, and compare. High-level languages have higher-level instructions, such
as print, and repeat until equal to zero. Therefore, it is easier to write programs in high-level languages.
While it is easier to write in high-level languages, assembly language can produce programs that are able to
execute faster. For this reason, both have their uses in DSPs. Sometimes it is necessary to write time-critical
sections of a program in assembly. A complete program may have sections of code in assembly and sections
in a high-level language. It is easy to combine both types of code into a single executable program.
Assembly and high-level programming languages make it possible to program DSPs to perform a variety of
functions.
Simulators
Flight simulators make you feel as if you are in the cockpit of a plane without the cost of an actual airplane,
fuel, or risk of crashing. Likewise, a DSP simulator is a software implementation of a DSP chip. A
simulator typically runs on a computer (PC or workstation), simulating almost all of the functionality of the
DSP. Simulators are used to analyze the feasibility of a design, and to check whether it will work, before the
design is committed to hardware.
Emulators
An emulator allows us to directly control and debug the results of instructions executing on the DSP.
Modern emulators do not replace the DSP chip on the board but exert their control through a serial
emulation scan path. Using these devices, it is possible to see all of the internal changes in the device at
each step. Developers can execute the instructions one step at a time, check voltage levels for correct
operation, and check each result in their own time. Emulators are invaluable tools in development
environments.
Debugger
A debugger interface is used to display program execution information in a useable format for the
programmer. The data displayed in the debugger windows is essentially a formatted data print of the
contents of the DSP memory. This memory is simply loaded into the PC using either an emulator or a
communications link with the PC using appropriate software.
For example, the memory window can display (and edit) data in hexadecimal, float or integer formats, but
the data is nonetheless binary 1s and 0s to the DSP. Likewise, the disassembly window is simply a
reformatting of the binary value into a recognizable alphanumeric mnemonic.
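The idea that one stored bit pattern can be shown in several formats can be sketched in a few lines of Python. This is only an illustration and not part of any debugger: the word value is hypothetical, and the float view assumes the IEEE 754 single-precision layout that Python's struct module uses, which is not necessarily the layout a given DSP uses.

```python
import struct

# A hypothetical 32-bit word as it might sit in memory (value chosen for illustration).
word = 0x41200000

raw = struct.pack(">I", word)                # the same four bytes, big-endian

as_hex = f"0x{word:08X}"                     # hexadecimal view
as_unsigned = struct.unpack(">I", raw)[0]    # unsigned integer view
as_signed = struct.unpack(">i", raw)[0]      # signed two's complement integer view
as_float = struct.unpack(">f", raw)[0]       # IEEE 754 single-precision view (assumption)
as_binary = f"{word:032b}"                   # the 1s and 0s that are actually stored

print(as_hex, as_unsigned, as_signed, as_float)
print(as_binary)
```

Every view is produced from the same four bytes; only the formatting changes.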
The CPU register window is a little different in that the C54x register values are not directly accessible as
memory data because they are not memory mapped registers. For the scan emulator, this job is quite easy
since the scan path simply routes through the internal registers of the DSP. For the DSK, this task is
accomplished using a special program that saves and restores the CPU registers from the DSP main memory.
Other than this, the data displayed in the CPU register window is simply another data form.
Debuggers consist of a user interface on the host PC, which can control and modify the contents
and execution of the chip. The user interface displays the contents of RAM, registers, and the disassembly of
the currently loaded program. The major advantage of debuggers over simulators is that they operate in real
time, allowing the designer to assess the performance of the system in a real-time environment.
Development Cycle
After the feasibility of the design is established through simulation, program design can begin. First, the
software is designed. This stage determines the complexity and the modules of the code. The modules of
software are written and tested, and then the full system is put together and tested. If everything works as
required, the result is version 1.0 of the product on the market. If it does not work as required, the process is
repeated until it does. When new requirements and improvements emerge as a result of user feedback, a new
version is produced via the same process.
Number Systems
• Represent numbers digitally
Digit number   7     6     5     4     3     2     1     0
Power of 2     2^7   2^6   2^5   2^4   2^3   2^2   2^1   2^0
Decimal        128   64    32    16    8     4     2     1
• Any number can be represented as a series of 1s and 0s
• Decimal 3 in binary
Digit number   7     6     5     4     3     2     1     0
Binary         0     0     0     0     0     0     1     1
2 + 1 = 3, or 0000 0011
• Decimal 26 in binary
Digit number   7     6     5     4     3     2     1     0
Binary         0     0     0     1     1     0     1     0
16 + 8 + 2 = 26, or 0001 1010
Number Systems
Let us now consider decimal, binary and hexadecimal (hex) number systems. The human-friendly decimal
system uses ten digits, 0 to 9, for number representation. Numbers larger than 9 are represented by carrying
a digit to the left. Number 10 represents one complete decimal count (digit 1x10) and a 0.
Binary
To represent numbers digitally, we are only allowed two binary digits, logic 1 and logic 0. Large numbers
can be represented in the binary system; however, more digits are needed to represent the same number in
the binary than are needed in the decimal system.
Consider the representation of decimal number 3 in binary, as shown in the preceding example. The value
of each binary digit is determined by its position. In the binary system, the maximum value the first digit
can have is 1. The second digit has a maximum value of 2. In an 8-bit system the decimal number 3 is
represented by setting the two least significant bits (LSBs) to 1, or 0000 0011b.
To represent the decimal number 26, higher order binary digits are set to achieve the value 0001 1010b.
In an 8-bit binary system, the largest decimal number that can be represented is 2^8 - 1 = 255 = 11111111b.
And the largest decimal number that can be represented in a 16-bit binary system is 2^16 - 1 = 65,535.
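The positional values described above can be checked with a short Python sketch. The helper names to_binary and place_values are our own and are used only for illustration.

```python
def to_binary(value, bits=8):
    """Return an unsigned integer as a bit string grouped in fours, e.g. '0001 1010'."""
    if not 0 <= value < 2 ** bits:
        raise ValueError(f"{value} does not fit in {bits} unsigned bits")
    digits = format(value, f"0{bits}b")
    return " ".join(digits[i:i + 4] for i in range(0, bits, 4))


def place_values(value, bits=8):
    """List the powers of two whose bits are set, most significant first."""
    return [2 ** n for n in range(bits - 1, -1, -1) if value & (1 << n)]


three = to_binary(3)            # '0000 0011'
twenty_six = to_binary(26)      # '0001 1010'
weights_26 = place_values(26)   # [16, 8, 2]
max_8 = 2 ** 8 - 1              # 255
max_16 = 2 ** 16 - 1            # 65535

print(three, twenty_six, weights_26, max_8, max_16)
```

Summing the entries of weights_26 gives back 26, which is exactly the 16 + 8 + 2 calculation from the slide.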
Binary and Hex
Decimal   0, 1, 2, ..., 9
Binary    0, 1
Hex       0, 1, 2, ..., 9, A, B, C, D, E, F
16 decimal = 0x10 hex
20 decimal = 0x14 hex
• 4 bits of the binary system are represented by a single hex digit
Digit number   3     2     1     0
Binary         1     1     1     1
Decimal        8 + 4 + 2 + 1 = 15
Hex            F
• Decimal 26 in binary and hex
Digit number   7     6     5     4     3     2     1     0
Binary         0     0     0     1     1     0     1     0
Decimal        16 + 8 + 2 = 26
Hex            1 (0001)                A (1010)
26 decimal = 0001 1010 binary = 1A hex
Binary and Hexadecimal
Another useful number system is base 16 or hexadecimal (hex). After digit 9, the letters A to F are
used to represent the values 10 through 15. The largest decimal number that can be represented
with a single-digit hex number is 15, which is F in hex. To represent decimal number 16 in hex, the next
digit position is used: 0x10 hex. To distinguish hex numbers from decimal we will use a preceding 0x.
Another common convention is to follow the number with an 'h'. In this case, the first hexadecimal digit
must be a decimal digit (0-9) so that the string cannot be mistaken for a symbol in a program. For
example, 0F3h is a valid hexadecimal representation, while F3h could be confused with a symbol.
A similar convention is used for binary numbers: the binary 1 and 0 digits are followed by a 'b'. Because
the first character is always a numeric 0 or 1 and the last character is 'b', the string can only be read as a
binary value.
Hex notation is very useful because large numbers can be represented with fewer digits than with the binary
or decimal system. The hex format is also extremely convenient for digital or binary systems because each
hex digit replaces exactly four binary digits. This is because the biggest single hex number (0xF, or 0Fh) is
represented exactly in four binary digits: 0xF hex = 1111b.
The 8-bit binary representation of decimal 26 is 00011010b, or 0x1A (hex) which is much shorter. The hex
system may look confusing at first, but when you need to convert to binary numbers, or represent large
numbers, you will soon realize the benefits.
We will explain how to convert from hex to decimal and back in Chapter 6.
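Until then, the claim that each hex digit replaces exactly four binary digits can be tried out with a small Python sketch. The function name hex_and_binary is our own, chosen for this illustration.

```python
def hex_and_binary(value, bits=8):
    """Return (hex string, grouped binary string) for an unsigned integer."""
    hex_digits = format(value, f"0{bits // 4}X")
    # Expand each hex digit into its own 4-bit group.
    groups = [format(int(digit, 16), "04b") for digit in hex_digits]
    return "0x" + hex_digits, " ".join(groups)


print(hex_and_binary(15, bits=4))   # ('0xF', '1111')
print(hex_and_binary(26))           # ('0x1A', '0001 1010')

# Python accepts the 0x prefix directly; the trailing-'h' form is parsed by
# dropping the 'h' and reading the remaining digits in base 16.
print(0x10, 0x14)                   # 16 20
print(int("0F3h"[:-1], 16))         # 243
```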
Signed Integers
• Signed magnitude integers
Signed decimal   Hex           Binary sign   Binary number
 2               00 00 00 02   0             000 0000 0000 0000 0000 0000 0000 0010
 3               00 00 00 03   0             000 0000 0000 0000 0000 0000 0000 0011
-2               80 00 00 02   1             000 0000 0000 0000 0000 0000 0000 0010
-3               80 00 00 03   1             000 0000 0000 0000 0000 0000 0000 0011
Signed Integers
To perform arithmetic, we need to be able to represent signed numbers. In the binary system, the most
significant bit (MSB) is used to indicate the sign of a number. When the MSB is set to 1, the number is
negative and when it is set to 0, the number is positive. Two conventions, signed magnitude and signed
two’s complement (2’s complement) exist for representing signed numbers.
The signed magnitude convention is familiar to us since this is how we represent negative decimal numbers.
For example, a + or – symbol plays the role of the sign bit, so the negative of +10 is written –10. However, this
leaves an interesting question about the value of 0. Is it +0 or –0, or are they the same? Another issue is
how to simplify the digital hardware in a DSP or microprocessor since smaller and faster circuits are an
advantage.
Two’s Complement Notation
Digit number   7      6     5     4     3     2     1     0
Bit weight     -2^7   2^6   2^5   2^4   2^3   2^2   2^1   2^0
Decimal        -128   64    32    16    8     4     2     1
Decimal 3
Binary two’s complement   0      0     0     0     0     0     1     1
Decimal calculation       +2 +1 = 3
Decimal -2
Binary two’s complement   1      1     1     1     1     1     1     0
Decimal calculation       -128 +64 +32 +16 +8 +4 +2 +0 = -2
The table on the previous page shows binary notation for the signed integers 2, 3, -2 and -3 using a signed
32-bit system. The first two positive integers are represented in the standard fashion. The last two negative
numbers have the top bit set to 1, but the rest of the representation remains the same.
This sign-representation system is not convenient for a binary or digital machine. The machine needs to
assess the sign bit and then carry out addition or subtraction, depending on the state of the sign bit. A
more convenient system would use two’s complement notation to perform both addition and subtraction with
the same hardware.
Two’s Complement Notation
To make it easier to understand two’s complement notation, our example uses an 8-bit binary representation.
For positive numbers, such as the example +3, the MSB is set to 0, and the other bit values are exactly the
same as in standard binary notation.
The two’s complement notation of a negative number is quite different. If the MSB is set to 1, the MSB
represents a negative value for that bit position. The top bit in an 8-bit system would therefore represent
-2^7, or -128, with the rest of the bits again representing positive values. The sum of the decimal
values of the bits that are set gives us the number’s decimal value. To represent –2 in two’s complement, the
top bit is set to 1, representing –128, and the lower-order bits are set to make the result of the addition of all
bits equal to –2. In this case, –2 = –128 + 126.
The hardware method that is used to implement a two’s complement converter and adder is even simpler.
This method negates a number by simply inverting all the bits and adding a 1 (as a carry bit) to the LSB. If
a 1 is added to the LSB, this causes a carry into the upper bits, which may ripple carry bits all the way to the
top bit. During addition, each binary bit cell receives two bits from the two operands, plus any carry that
may propagate from the next lower-bit cell. The 1 that is added into the LSB is simply implemented as a
carry bit as if it were coming from the next lower-bit cell.
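Both views of two's complement, the weighted sum of the bits and the invert-and-add-1 negation, can be sketched in Python. This models the arithmetic only, not the hardware adder; the function names are our own.

```python
def twos_complement_value(pattern, bits=8):
    """Decode a bit pattern by summing bit weights; the MSB weighs -2^(bits-1)."""
    total = 0
    for n in range(bits):
        if pattern & (1 << n):
            weight = -(1 << n) if n == bits - 1 else (1 << n)
            total += weight
    return total


def negate(pattern, bits=8):
    """Negate by inverting every bit and adding 1, keeping only `bits` bits."""
    mask = (1 << bits) - 1
    return ((pattern ^ mask) + 1) & mask


plus_two = 0b0000_0010
minus_two = negate(plus_two)                  # 0b1111_1110

print(format(minus_two, "08b"))               # 11111110
print(twos_complement_value(minus_two))       # -2, i.e. -128 + 126

# The same adder performs subtraction: 3 - 2 is computed as 3 + (-2),
# and the carry out of the top bit is simply discarded.
difference = (0b0000_0011 + minus_two) & 0xFF
print(twos_complement_value(difference))      # 1
```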
The most negative and most positive values in two’s complement form for an 8-bit system are as shown:
Most positive   +1 * 2^7 - 1 = +127   >>   0111 1111b   >>   0 + 64 + 32 + 16 + 8 + 4 + 2 + 1
Most negative   –1 * 2^7     = –128   >>   1000 0000b   >>   –128 + 0
A 16-bit system would have a range of
Most positive   +1 * 2^15 - 1 = +32767
Most negative   –1 * 2^15     = –32768
For a 32-bit system such as a TMS320C31 32-bit processor
Most positive   +1 * 2^31 - 1 = +2147483647
Most negative   –1 * 2^31     = –2147483648
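The three ranges above follow one formula, which a short Python sketch makes explicit. The function name signed_range is our own.

```python
def signed_range(bits):
    """Return (most negative, most positive) for a two's complement word."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


for bits in (8, 16, 32):
    low, high = signed_range(bits)
    print(f"{bits:2d}-bit: {low} to +{high}")
```

Note that the range is not symmetric: there is always one more negative value than positive values, because zero takes one of the patterns whose sign bit is 0.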
Fixed-Point Notation
Conventions
• Number range is between 1 and -1
• Decimal point is always in a fixed location (e.g., 0.74, 0.34, etc.)
• Multiplying a fraction by a fraction always results in a fraction and will not produce an overflow (e.g., 0.99 x 0.9999 = less than 1)
• Successive additions may cause overflow
Why?
• Signal processing is multiplication-intensive
• Fixed-point notation prevents overflow (useful with a small dynamic range)
• Fixed-point notation is less expensive
How is fixed-point notation realized in a DSP?
• Most fixed-point DSPs are 16 bits
• The range of numbers that can be represented is 32767 to -32768
• The most common fixed-point format is Q15
Q15 Notation
Bit 15         sign
Bits 14 to 0   two’s complement number
Fixed-Point Notation
Fixed-point notation, sometimes called fractional-point notation or ’Q’ format, uses an implied binary point
to represent binary fractions. This point always remains at a fixed location. The dynamic range of a
processor is the range between the smallest and the largest number it can represent. When the dynamic
range is limited, results that fall outside it overflow.
In a 16-bit processor, the dynamic range is 32767 to –32768. Such a small dynamic range can easily create
overflows. For example, 200 × 350 = 70000, which is an overflow!
However, if the number range is limited, or more precisely scaled, to +1 to –1, a multiplication could never
produce an overflow. For example, the multiplication of two fractional numbers within the range of 1 to –1
must always produce a result that is also a fraction. The result is therefore confined to be within the range of
1 to –1. Unfortunately, successive additions can produce overflow values outside the range of 1 to –1. This
point should be remembered when performing fixed-point arithmetic.
Signal processing is both multiplication and addition intensive. An overflow can have serious consequences
(e.g., unintentionally clipping a large signal). A fixed-point system can solve this problem either by
checking for overflows after each math operation, or by knowing that the inputs and outputs of the operation
are bounded and well behaved.
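The overflow behavior described in this section can be reproduced with ordinary Python integers. The sketch below assumes a 16-bit register that wraps around when no overflow protection is applied; the helper names wrap16 and fits16 are our own.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def wrap16(value):
    """Keep the low 16 bits and reinterpret them as a signed two's complement value."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def fits16(value):
    """True when the value is inside the 16-bit signed range."""
    return INT16_MIN <= value <= INT16_MAX


product = 200 * 350            # 70000, outside the 16-bit range
wrapped = wrap16(product)      # what a wrapping 16-bit register would hold

frac_product = 0.99 * 0.9999   # two fractions: the product stays inside (-1, 1)
frac_sum = 0.75 + 0.5          # two fractions: the sum can leave the range

print(product, fits16(product), wrapped)
print(frac_product, frac_sum)
```

The wrapped value bears no resemblance to the true product, which is why an undetected overflow is so damaging to a signal.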
Why Use a Fixed-Point System?
The cost of implementing many DSP systems is strongly dependent on the amount of chip silicon used to get
the job done, with most of the chip silicon being either in the processor or in the surrounding memory. If the
chip silicon is mostly used for data storage, such as long audio delay buffers, video or coefficient tables, the
difference between 16- and 32-bit data storage can be as much as 2:1.
Furthermore, routing twice as many signals around the chip and system board can consume extra space and
drive up the power consumption. Another advantage of short 16-bit fixed-point chips is that by making the
core processor small, not only are the chips smaller and less expensive, they are also usually a bit faster.
This may again lower the price of the DSP chip, which is an important consideration in price-sensitive,
high-volume applications. However, if a 16-bit system must also perform 32-bit operations, these advantages
can be lost and the system can end up costing more. If a system can tolerate a smaller dynamic range and
resolution, then the use of 16-bit data can be an economic advantage.
Fixed-Point Q Notation
As we have seen in multiplication and addition, overflows can be a problem for fixed-point DSPs. To
eliminate this problem, a programming convention called Q format is introduced where fixed-point DSPs
operate on fractional numbers which, by definition, cannot saturate. The principle of Q notation is the
application of a simple scaling coefficient to convert fractions to integers that a fixed-point DSP is designed
to handle. (Note that this is not an issue for floating-point DSP).
The letter Q represents the ‘Quantity of fractional bits’ and the number following the Q indicates the number
of bits that are used for the fraction. This divides the number into an upper and lower region of bits where
the upper region contains the sign bit and any whole integer values, and the lower bits hold the fraction.
Any Q format is possible, but Q15 is the most widespread in 16-bit DSPs and Q31 is most often used for
32-bit DSPs.
In Q15 format, an implied binary point is placed between bit 15 and bit 14. The upper region in this case is
only one MSB (for a 16-bit DSP), which is essentially the sign bit, or bits 15–31 in a 32-bit DSP. The
remaining 15 bits are used to represent the fractional part of the number. To convert a Q-format integer to a
floating-point value, a scaling coefficient is needed. If the Q number is 15, the coefficient or resolution of
the fraction will be 2^–15 or 30.518e–6.
Q15 Format
Dynamic range in Q15
Number     Fractional number   Scaled integer for Q15
Biggest    0.999               32767
Smallest   -1.000              -32768
Number representations in Q15
Decimal   Q15 = Decimal x 2^15   Q15 integer
0.5       0.5 x 32768            16384
0.05      0.05 x 32768           1638
0.0012    0.0012 x 32768         39
Rules for operations
• Avoid operations with numbers larger than 1
  2.0 x (0.5 x 0.45) = (0.2 x 0.5 x 0.45) x 10 = (0.5 x 0.45) + (0.5 x 0.45)
• Scale numbers before the operation
  0.5 in Q15 = 0.5 x 32768 = 16384
Dynamic Range in Q15
The dynamic range, or ratio of largest to smallest magnitude levels, is the same for Q15 and normal integers.
It is the scaling coefficient that sets the two apart, and other than this, you may have difficulty knowing
which format is in use. As mentioned previously, to prevent overflows the inputs and outputs can be
constrained to fractions in the range of 1 to –1 by simply applying a scaling coefficient.
Number Representation in Q15
Scaling a number is simple:
Integer = Q15_fractional_number × 2^15
The second table on the slide shows several examples of scaling.
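The scaling rule can be written as two small Python functions. This is a sketch under two assumptions of our own: the scale factor is exactly 2^15 = 32768, and the fractional remainder is truncated rather than rounded. The names to_q15 and from_q15 are hypothetical.

```python
Q15_SCALE = 1 << 15   # 2^15 = 32768


def to_q15(x):
    """Scale a fraction in [-1, 1) to a 16-bit integer, truncating toward zero."""
    if not -1.0 <= x < 1.0:
        raise ValueError("Q15 can only hold values in [-1, 1)")
    return int(x * Q15_SCALE)


def from_q15(n):
    """Scale a Q15 integer back to a fraction."""
    return n / Q15_SCALE


for value in (0.5, 0.05, 0.0012):
    q = to_q15(value)
    print(value, q, from_q15(q))
```

Scaling back does not return exactly 0.05 or 0.0012; the difference is the quantization error, which is always smaller than one step of 2^-15.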
Rules for Operations
The most important rule in using the Q15 fixed-point format is to avoid using a number larger than 1 or
smaller than –1. There are some instances where this can be safely violated. For example, a property of a
2’s complement adder is that if an addition overflow occurs, exceeding the available 16-bit range, a
subtraction can unwrap the result back down into a valid range. Generally, however, it is best to avoid the
problem in the first place. If a dynamic range greater than 32767 to –32768 (i.e., a 16-bit system) is
required, it is also possible to perform longhand arithmetic in pieces, but this consumes CPU cycles and
data.
The bottom portion of the slide shows an example where multiplying 0.5 and 0.45 (unscaled for clarity)
results in another fraction, which is not a problem. Multiplying the product by 2 can be done using two
methods. One method is to multiply one of the inputs by 2 first. If the result of this intermediate operation
exceeds +/–1.0, we will have a problem. The inputs could be scaled down first and then scaled up
afterwards, but this is also far from efficient. An alternative method is to add the product to itself,
effectively multiplying by 2. This is one of the difficulties of using the fixed-point operation. The
programmer needs to think about these issues and plan ahead.
Another important rule is that all numbers must be scaled to the same Q format (Q15 in our examples),
placing the decimal points of both operands in the same place, before an addition or subtraction is
performed. Generally, this is also practiced in multiplication. However since the scaling coefficients are
multiplied, the correct fraction can be retrieved using yet another scaling constant. Nevertheless, mixing Q
formats is not desired.
Q15 Operations
Addition
Decimal               Q15                     Scale back (Q15 / 32768)
0.5 + 0.05 = 0.55     16384 + 1638 = 18022    0.55
0.5 – 0.05 = 0.45     16384 – 1638 = 14746    0.45
Multiplication: 2 x 0.5 x 0.45
Decimal                 Q15                          Back to Q15 (Product / 32768)
0.5 x 0.45 = 0.225      16384 x 14745 = 241582080    7373
Decimal                 Q15                          Scale back (Q15 / 32768)
0.225 + 0.225 = 0.45    7373 + 7373 = 14746          0.45
Q15 Addition
Q15-format addition is shown in the first example above. The numbers 0.5 and 0.05 are each scaled by
32768 (the Q15 coefficient, 2^15) and then added. Since both numbers are scaled to the same Q format, the
decimal point in both will be in the same place (bit 15). The sum is then scaled back to verify the result.
In the second example, the correct subtraction (sum of two's complement) is 14746, and the expected scaled
result is 14746 / 32768 = 0.45.
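The two additions can be replayed in Python using the scaled integers directly. The scale factor of 2^15 = 32768 is assumed, and the saturating helper q15_add_sat is our own addition, showing one common way to handle sums that leave the 16-bit range; it is not taken from the slide.

```python
Q15_SCALE = 32768   # 2^15

a = 16384           # 0.5 in Q15
b = 1638            # 0.05 in Q15 (truncated)

q_sum = a + b       # same Q format, so the integers add directly
q_diff = a - b

print(q_sum, round(q_sum / Q15_SCALE, 2))     # 18022 0.55
print(q_diff, round(q_diff / Q15_SCALE, 2))   # 14746 0.45


def q15_add_sat(x, y):
    """Add two Q15 integers, clamping to the 16-bit range instead of wrapping."""
    return max(-32768, min(32767, x + y))


# 0.75 + 0.5 would be 1.25, which Q15 cannot hold; the result is clamped.
print(q15_add_sat(24576, 16384))              # 32767
```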
Q15 Multiplication
When scaled numbers are multiplied, the scaling coefficients are multiplied. To compensate, a second
scaling factor that will put the data in the correct bit position is used. The Q15 multiplication shown gives
an idea of how large the numbers can get. But as we can also see, the division by the Q-15 coefficient scales
the number back down and we get the correct result.
Anticipating this, the multiplier on a 16-bit DSP produces a 32-bit result. In actuality, the result is packed
into the upper bits and comes with two sign bits. The programmer can either downshift to the lower 16 bits,
or can left-shift up by one bit before storing the upper bits. Both methods will produce the same result, but
the DSP is usually optimized to do the up-shift by 1 bit and store, so this is normally what is done.
We can see how a Q15*Q15 multiply works by examining the process in long hand. In particular when the
scaling coefficients are multiplied the result is a new scaling constant with a Q value equal to the sum of the
two Q constants used on the operands. Given A and B in Q15 format, the result C=A*B is
C = (A * 2^15) * (B * 2^15) = A * B * 2^30
C = (A * Q_x) * (B * Q_y) = A * B * Q_(x+y)
It is evident that the output is no longer in Q15 format. To compensate, we need to ask where the new
decimal point is. By noting that 2^30 is the same as saying Q30, we know that the decimal point is at bit 30 of
the 32-bit result. To get back to 2^15 (Q15), we can multiply by 2^–15 (a shift right by 15), or multiply once
more by 2^1 (a shift left by 1) to obtain a Q31 result. In this case, the correct bits are packed into the upper
16 bits of the DSP register.
The fixed-point Q format has the advantage of preventing overflows but certainly introduces complications
for the DSP programmer.
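The Q15 multiply can be sketched in Python with plain integers standing in for the 32-bit product register. The rounding step (adding 2^14 before the shift) is our own assumption, chosen because it rounds the intermediate value 7372.5 up to 7373; a truncating shift gives 7372 instead. The function name q15_mul is hypothetical.

```python
def q15_mul(a, b):
    """Multiply two Q15 integers and return a Q15 result.

    a * b is a Q30 value; adding 2^14 rounds to nearest before the
    15-bit right shift moves the binary point back to Q15.
    """
    product_q30 = a * b
    return (product_q30 + (1 << 14)) >> 15


half = 16384                      # 0.5 in Q15
x = 14745                         # 0.45 in Q15 (truncated)

product_q30 = half * x            # 241582080, a Q30 value
p = q15_mul(half, x)              # 7373, about 0.225 in Q15
doubled = p + p                   # multiply by 2 by adding the product to itself

print(product_q30, p, doubled, doubled / 32768)

# Without rounding, the two methods from the text agree with each other:
down_shift = product_q30 >> 15            # shift right by 15
upper_16 = (product_q30 << 1) >> 16       # shift left by 1, keep the upper 16 bits
print(down_shift, upper_16)               # 7372 7372
```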
TMS Floating-Point Format
TMS single-precision floating-point format
Bit No   31 ... 24   23      22 ... 0
Field    e           s       f
Width    8 bits      1 bit   23 bits
e = exponent is a signed two’s complement 8-bit field and determines the location of the binary Q point
s = sign of mantissa (s = 0 positive, s = 1 negative)
f = fractional part of the mantissa; an implied 1.0 is added to this fraction but is not allocated in the bit field since this value is always present
Conversion equations
               Binary            Decimal                  Equation
s = 0          X = 01.f x 2^e    X = 01.f x 2^e           1
s = 1          X = 10.f x 2^e    X = (-2 + 0.f) x 2^e     2
Special case   s = 0, e = -128   X = 0
Exponent (e)
Decimal             0     1     127   -1    -128
Hex (two’s comp.)   00    01    7F    FF    80
Copyright  1998, Texas Instruments Incorporated All Rights Reserved
Floating-Point Formats
The C54x device is fixed-point. A popular floating-point standard (used, for example, in TMS320C67x
devices) is IEEE 754. The differences between various floating-point formats are
actually insignificant, and conversion can be performed in ASIC hardware or software.
TMS320 Single-Precision Floating-Point Format
The preceding table shows the bit assignments of the TMS320 single-precision floating-point format. The top
eight bits represent the exponent (e) in two’s complement notation. Bit 23 (s) is the sign bit of the mantissa,
and the lower 23 bits are the fraction (f) of the mantissa. A value of 1.0 is also implied in the mantissa, but is
not allocated a bit position since it is always present. This format is called floating-point because the implied
binary point floats, depending on how large the exponent is. The exponent is essentially a variable Q
value that is automatically adjusted for maximum precision and range by the hardware.
Conversion Equations
The middle table on the slide shows the conversion equations for the TMS320 single-precision floating-point
format. The second column shows the binary and the third column shows the decimal version of the same
equation. The decimal version of the equation is easier to understand. There are two different equations for
positive and negative mantissa. We will use decimal examples of both equations to aid in the understanding
of this format.
The representation of 0.0 is a special case where any number with an exponent of -128 (0x80) is treated as
zero. Since -128 is the smallest possible value for the exponent, the scaling coefficient for these numbers
would produce very small values. The convention used in the assembler is to represent zero as 0x80000000.
For example, all of the following numbers are treated as the value 0:
• 0x80000000
• 0x80123456
• 0x80876345
This is a special case worth remembering.
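The two conversion equations and the zero special case can be combined into one small decoder. This Python sketch follows only the field layout and equations given in this lecture (e in bits 31-24, s in bit 23, f in bits 22-0); the function name decode_tms320_float is hypothetical and this is not a TI library routine.

```python
def decode_tms320_float(word):
    """Decode a 32-bit word laid out as e (bits 31-24), s (bit 23), f (bits 22-0)."""
    e = (word >> 24) & 0xFF
    if e >= 0x80:                            # exponent is 8-bit two's complement
        e -= 0x100
    s = (word >> 23) & 1
    f = (word & 0x7FFFFF) / float(1 << 23)   # the fraction 0.f

    if e == -128:                            # special case: treated as zero
        return 0.0
    if s == 0:
        mantissa = 1.0 + f                   # Equation 1: 01.f
    else:
        mantissa = -2.0 + f                  # Equation 2: 10.f in two's complement
    return mantissa * 2.0 ** e


for word in (0x00000000, 0x03700000, 0x00800000, 0xFF000000,
             0x80000000, 0x80123456, 0x80876345):
    print(f"0x{word:08X} -> {decode_tms320_float(word)}")
```

The last three words all decode to zero because their exponent field is 0x80, regardless of the sign and fraction bits.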
Floating-Point Numbers
Calculate 1.0e0
In hex      00 00 00 00
In binary   0000 0000 0000 0000 0000 0000 0000 0000
s = 0, so Equation 1 applies: X = 01.f x 2^e
f = 0
e = 0
X = 01.0 x 2^0 = 1.0
Calculate 1.5e01
In hex      03 70 00 00
In binary   0000 0011 0111 0000 0000 0000 0000 0000
s = 0, so Equation 1 applies: X = 01.f x 2^e
e = 0000 0011 = 3
f = s111 ... = 0.5 + 0.25 + 0.125 = 0.875
X = 01.875 x 2^3 = 15.0 decimal
Floating-Point Numbers
Let us now find the binary representation of 1.0e0.
Since this is a positive number, the sign bit s=0. Therefore, Equation 1 applies. The fractional part of the
mantissa (f) is 0 (f=0), and the exponent (e) is also 0 (e=0). Now that we know the decimal values for all the
appropriate parts, we can express the 32-bit binary format. The fractional part of the mantissa (f) is zero, and
is represented by setting bits 0 to 22 and the sign bit to 0. This leaves the top eight bits for the exponent (e),
which are also set to 0. The top part of the slide shows the binary and hex representation for 1.0e0.
The binary representation of the floating-point number decimal 1.5e01 is next. This number is positive,
which implies that the sign bit s = 0 and that Equation 1 applies. Knowing that 1.875 x 8 = 15, the
fractional part of the mantissa is 0.875 (the 1.0 is implied) and the exponent e = 3 (2^3 = 8). Adding fractions
0.5, 0.25 and 0.125 together yields 0.875, which corresponds to setting the top three bits (20, 21 and 22) of
the fraction to 1. The binary representation of the floating-point value is shown in the bottom part of the
slide.
Calculating negative floating-point numbers is slightly different. Although it is important to understand
how binary representations correlate with decimal floating-point values, it is rarely necessary to perform the
calculation.
More on Floating Point
Calculate -2.0e0
In hex      00 80 00 00
In binary   0000 0000 1000 0000 0000 0000 0000 0000
s = 1, so Equation 2 applies: X = (-2.0 + 0.f) x 2^e
f = 0
e = 0
X = (-2.0 + 0.0) x 2^0 = -2.0
Addition
1.5 + (-2.0) = -0.5
Multiplication
1.5e00 x 1.5e01 = 2.25e01 = 22.5
Negative Floating-Point Numbers
The binary representation in TMS320 format for -2.0e0 is now considered. Since the number is negative, the
sign bit s = 1, and Equation 2 is applied. The mantissa is actually in two’s complement, so the fraction f (0.f)
is added to a decimal value of -2.0. The mantissa, -2.0 + f, is then multiplied by the exponent multiplier, 2^e.
To arrive at a value of -2.0, the fraction (f) and the exponent (e) are therefore both 0, with the sign bit set to 1.
The binary representation for -2.0e0 is shown on the top portion of the slide.
Addition and Multiplication
Addition and multiplication of floating-point numbers are straightforward. The bottom portion of the slide
shows an example of each. The DSP programmer does not need to do any scaling or take any special
precautions before or after an addition, subtraction or multiply since this is all done in hardware. This is one
of the reasons why it is easier to program floating-point DSP devices.
Dynamic Range
Ranges of number systems
Numbers                   Base 2                 Decimal             Two’s complement hex
Largest Integer           2^31 - 1               2 147 483 647       7F FF FF FF
Smallest Integer          -2^31                  -2 147 483 648      80 00 00 00
Largest Q15               2^15 - 1               32 767              7F FF
Smallest Q15              -2^15                  -32 768             80 00
Largest Floating Point    (2 - 2^-23) x 2^127    3.402823 x 10^38    7F 7F FF FD
Smallest Floating Point   -2 x 2^127             -3.402823 x 10^38   83 39 44 6E
• The dynamic range of floating-point representation is very large
• Conclusion
  • Largest integer x (1.5 x 10^29) ~= largest floating point
  • Largest Q15 x (1.03 x 10^34) ~= largest floating point
Comparison of Dynamic Ranges
The dynamic range in a number system means the distance, in unit steps, between the largest and smallest
number in that system. The larger the dynamic range is, the lower the risk of overflow
conditions. Some signal-processing applications need a larger dynamic range than others. For example, a
radar application may be trying to extract a tiny signal of only a few microvolts buried in noise with an
average level of several volts.
The top table on the slide shows the dynamic ranges of a 32-bit signed integer notation, a fixed-point Q15
format used by 16-bit fixed-point DSPs, and a 32-bit TMS320 single-precision floating-point format. It is
clear that the TMS320 floating-point format has a larger dynamic range, but to fully appreciate the
difference, consider the constant by which the largest integer must be multiplied to reach the biggest
value in TMS320 single-precision floating-point format. This constant is very large, indicating the vast
difference in the dynamic ranges of the 32-bit signed integer notation and the TMS320 single-precision
floating-point notation. The same comparison with the Q15 format reveals an even bigger difference.
Clearly, the TMS320 single-precision floating-point format has a much larger dynamic range than the other
number systems. This large dynamic range is one reason for the popularity of floating-point devices such as
the TMS320C67x in certain signal-processing applications.
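The size of these multiplying constants can be estimated with a few lines of Python. The sketch takes the largest floating-point value as (2 - 2^-23) x 2^127, the expression given in the slide table, and uses ordinary Python floats for the arithmetic; the variable names are our own.

```python
largest_int32 = 2 ** 31 - 1
largest_q15 = 2 ** 15 - 1
largest_float = (2 - 2 ** -23) * 2.0 ** 127   # largest positive value from the table

ratio_int = largest_float / largest_int32     # constant for the 32-bit integer
ratio_q15 = largest_float / largest_q15       # constant for Q15

print(f"largest floating point: {largest_float:.6e}")
print(f"ratio to largest 32-bit integer: {ratio_int:.2e}")
print(f"ratio to largest Q15 integer: {ratio_q15:.2e}")
```

Both ratios come out at roughly the orders of magnitude quoted on the slide, about 10^29 for the 32-bit integer and about 10^34 for Q15.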
Note that the resolution in TMS320 single-precision floating-point is 24 bits. This extra precision is a big
benefit in applications such as digital audio. Humans also tend to respond to audio in a logarithmic way that
is very similar to floating point. Applications and demonstration examples that take advantage of this can be
downloaded from the Texas Instruments web site.
Fixed vs. Floating Point
• DSP devices are designed as floating point or fixed point
• Fixed-point devices are usually 16-bits, e.g. TMS320C5x
• Floating-point devices are usually 32-bits, e.g. TMS320C3x
• Floating-point devices usually have a full set of fixed-point instructions
• Floating point devices are easier to program
• Fixed-point devices can emulate floating point in software
Comparison
Characteristic        Floating point    Fixed point
Dynamic range         much larger       smaller
Resolution            comparable        comparable
Speed                 comparable        comparable
Ease of programming   much easier       more difficult
Compiler efficiency   more efficient    less efficient
Power consumption     comparable        comparable
Chip cost             comparable        comparable
System cost           comparable        comparable
Design cost           less              more
Time to market        faster            slower
DSP Devices
DSP devices are designed as fixed- or floating-point devices. The design philosophy, data paths, and
internal modules of each device are different. Generally, fixed-point devices address high-volume and
inexpensive applications while floating-point devices target high-performance applications. But these
differences are becoming hard to distinguish because the price of floating-point devices continues to fall.
Fixed-point devices, such as the TMS320C54x device, are usually 16 bits with fewer external pins.
Floating-point devices, such as the TMS320C67x device, are commonly 32 bits. Floating-point devices
usually have a full set of fixed-point instructions and can be used as fixed-point processors without any speed
penalty. Fixed-point devices can emulate floating-point devices in software, but there is a speed penalty
because the conversion from fixed- to floating-point is performed in software.
Comparison of Fixed vs. Floating-Point Devices
A table comparing fixed- and floating-point devices shows the key characteristics of each type of system.
Floating point relieves the designer of most dynamic-range considerations in the design, but it can cost
more in CPU and additional memory.
Fixed-point systems tend to be slightly faster and to consume less power, yet the parallelism and greater
precision of 32-bit data can sometimes easily outweigh any speed penalty.
Floating-point devices are much easier to program; there is less concern with scaling, dynamic-range issues
and, in most cases, resolution. Resolution is often determined by bus width, but bus width also drives system cost.
C compilers for floating-point devices are much more efficient than C compilers for fixed-point devices.
The primary reason for this is that the fixed-point devices do not have large register sets and therefore need
software modules for number conversions to provide a reasonable C interface. For example, when the C
programmer declares a floating-point number, an assembler routine needs to convert this format into
fixed-point format and back again after the processor has executed the necessary operations. Fixed-point devices
typically have 1 or 2 accumulators, whereas the C67x floating-point family has sixteen 32-bit registers that
can be used for math operations. Having more registers to work with is an advantage for a C compiler.
These are important points in choosing a device for an application. Programs written in C will tend to favor
floating-point devices.
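The number conversion mentioned above can be sketched in a few lines. This is only an illustration, assuming the common Q15 fixed-point format (one sign bit and 15 fractional bits in a 16-bit word); the text does not name a format, and the function names are hypothetical.

```python
Q15_SCALE = 1 << 15  # 32768; Q15 is an assumed format, not one named in the text


def float_to_q15(x: float) -> int:
    """Convert a float to a 16-bit Q15 integer, saturating at the 16-bit limits."""
    n = int(round(x * Q15_SCALE))
    return max(-32768, min(32767, n))


def q15_to_float(n: int) -> float:
    """Convert a Q15 integer back to a float."""
    return n / Q15_SCALE


def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 values; the Q30 product is scaled back to Q15 and saturated."""
    product = (a * b) >> 15
    return max(-32768, min(32767, product))
```

The scaling and saturation steps are the work that a floating-point device performs in hardware and that a fixed-point programmer (or the compiler's support routines) must handle explicitly.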
Power consumption depends heavily on both the system architecture and the software that is used. In a
CMOS design, power is consumed when a capacitive node is charged from one supply rail to the other. If
the change in state does not occur, no power is consumed. Since the processor, memory and surrounding
system board may consist of millions of internal and external nodes, it is important to toggle as few as
possible to complete a given task. The other variable is the capacitance of each node, which should be
minimized. Simply put, more toggles per second and higher capacitance equate to more power usage.
Power consumption is therefore related to clock rates and data-bus width. If it takes fewer cycles to get the
same job done on a wider bus, the net power usage may be the same or even better. For example, a 16-bit
fixed-point device might use a similar amount of power when compared to a 32-bit floating-point device
using only 16 bits of its data bus.
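The relationship between toggling, capacitance, and power can be put in numbers with the standard first-order CMOS estimate, in which each charging event draws C × V² of energy from the supply. The sketch below assumes that model; the function name and any values passed to it are hypothetical and do not describe a particular device.

```python
def dynamic_power_watts(charge_events_per_second: float,
                        node_capacitance_farads: float,
                        supply_volts: float) -> float:
    """First-order CMOS switching power: events/s * C * V^2.

    Ignores leakage and short-circuit current; an illustrative model only.
    """
    return charge_events_per_second * node_capacitance_farads * supply_volts ** 2
```

Under this model, halving either the number of charging events or the node capacitance halves the switching power, which is the point the paragraphs above make in words.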
The cost of floating-point chips is also becoming comparable to that of traditional fixed-point devices.
Floating-point system costs need not be high just because the devices use 32 bits internally. Minimum
systems are made possible through the efficient use of internal RAM and fewer external components. The
DSP Starter Kits and the applications that run on them are excellent examples of a minimal
system.
The cost of programming, measured in the salary dollars paid out to a programmer, can also be a deciding
factor. Selling many units to a very large market can absorb the extra time required for a fixed-point design.
For smaller markets, or when time to market is important, the low design costs of floating-point are very
beneficial.
Selecting a device for a particular application is a complex decision and should be considered carefully along
with any other points that are specific to the design. Our discussion highlighted some of the more important
considerations.
TMS320 Family
16-Bit Fixed-Point Devices
• ’C1x: Hard-Disk Controllers
• ’C2x: Fax Machines
• ’C2xx: Embedded Control
• ’C5x: Voice Processing
• ’C54x: Digital Cellular Phones
32-Bit Floating-Point Devices
• ’C3x: Videophones
• ’C4x: Parallel Processing
Other Devices
• ’C6x: Advanced VLIW Processor (Wireless Base Stations/Pooled Modems)
• ’C8x: Video Conferencing
TMS320 Family
The Texas Instruments TMS320 family of DSP devices covers a wide range, from a 16-bit fixed-point device
to a single-chip parallel-processor device. In the past, DSPs were used only in specialized applications. Now
they are in many mass-market consumer products that are continuously entering new market segments.
Let us briefly consider the Texas Instruments TMS320 family of DSP devices and their typical applications.
C1x, C2x, C2xx, C5x, C54x
The width of the data bus on these devices is 16 bits. All have modified Harvard architectures. They have
been used in toys, hard disk drives, modems, cellular phones, and active car suspensions.
C3x
The width of the data bus in the C3x series is 32 bits. Because of their reasonable cost and floating-point
performance, these devices suit many applications, including filters of almost any kind, analyzers, hi-fi
systems, voice-mail, imaging, bar-code readers, motor control, 3D graphics, and scientific processing.
C4x
This range is designed for parallel processing. The C4x devices have a 32-bit data bus and are floating-point.
They have an optimized on-chip communication channel, which enables a number of them to be put together
to form a parallel-processing cluster. The C4x range devices have been used in virtual reality, image
recognition, telecom routing, and parallel-processing systems.
C6x
The C6x devices feature VelociTI, an advanced very long instruction word (VLIW) architecture developed
by Texas Instruments. Eight functional units, including two multipliers and six arithmetic logic units
(ALUs), provide 1600 MIPS of cost-effective performance. The C6x DSPs are optimized for multi-channel,
multifunction applications, including wireless base stations, pooled modems, remote-access servers, digital
subscriber loop systems, cable modems, and multi-channel telephone systems.
C8x
The C80 is the first processor in this range. It has parallel processing on a single piece of silicon with four
advanced DSPs (ADSPs) and a RISC master processor. It is used in high-performance video telephony, 3D
computer graphics, virtual reality, and a number of multimedia applications. A lower-cost version, the C82,
features two ADSPs and the RISC master processor.
TMS320C54x Architecture
[Block diagram: system control interface; program address generation logic (PAGEN: PC, IPTR, RC, BRC, RSA, REA); data address generation logic (DAGEN: ARAU0, ARAU1, AR0-AR7, ARP, BK, DP, SP); address/data bus pairs PAB/PB, CAB/CB, DAB/DB, and EAB/EB; memory and external interface; peripheral interface; EXP encoder; T register; multiplier (17 × 17) with sign control and fractional logic; adder (40) with zero, saturate, and round logic; barrel shifter; ALU (40); accumulators A (40) and B (40); compare unit with MSW/LSW select, TRN, and TC.]
Legend: A = accumulator A; B = accumulator B; C = CB data bus; D = DB data bus; E = EB data bus; M = MAC unit; P = PB program bus; S = barrel shifter; T = T register; U = ALU
Copyright  1998, Texas Instruments Incorporated All Rights Reserved
TMS320C54x Architecture
The C54x DSPs use an advanced modified Harvard architecture that maximizes processing power with eight
buses. Separate program and data spaces allow simultaneous access to program instructions and data,
providing a high degree of parallelism. For example, three reads and one write can be performed in a single
cycle. Instructions with parallel store and application-specific instructions fully utilize this architecture. In
addition, data can be transferred between data and program spaces. Such parallelism supports a powerful set
of arithmetic, logic, and bit-manipulation operations that can all be performed in a single machine cycle.
Also, the C54x includes the control mechanisms to manage interrupts, repeated operations, and function
calling.
Fixed-point processors represent numbers as a magnitude and sign within a certain number of bits. For the
C54x, it is 16 bits. This is in contrast to floating-point processors, which represent numbers as a mantissa
scaled by an exponent. Fixed-point processors have a smaller dynamic range (the range of numbers that
can be represented) than floating-point processors, but are also less complex and consequently less
expensive. If the extended dynamic range is not needed, a fixed-point processor may be a more cost-efficient
choice.
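To make the dynamic-range difference concrete, the sketch below computes the range of a 16-bit 2s-complement word and its dynamic range in decibels, then uses Python's math.frexp to show how a floating-point value separates into a fraction and an exponent. Python floats follow the IEEE format of the host machine, not the format of any TMS320 device, so the second part only illustrates the principle.

```python
import math


def fixed_point_range(bits: int = 16):
    """Smallest and largest integers of a signed 2s-complement word."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def dynamic_range_db(bits: int = 16) -> float:
    """Ratio of the largest magnitude to the smallest nonzero step, in dB."""
    return 20.0 * math.log10(2 ** (bits - 1))


def split_float(x: float):
    """Return (fraction, exponent) such that x == fraction * 2**exponent."""
    return math.frexp(x)
```

A 16-bit word gives roughly 90 dB by this measure. A floating-point value carries its scale in the exponent, so the same number of fraction bits can cover a far wider span of magnitudes.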
Bus Structure
The C54x architecture is built around eight major 16-bit buses (four program/data buses and four address
buses):
• The program bus (PB) carries the instruction code and immediate operands from program
memory.
• Three data buses (CB, DB, and EB) interconnect to various elements, such as the CPU, data
address generation logic, program address generation logic, on–chip peripherals, and data
memory.
• The CB and DB carry the operands that are read from data memory.
• The EB carries the data to be written to memory.
• Four address buses (PAB, CAB, DAB, and EAB) carry the addresses needed for instruction
execution.
• The C54x can generate up to two data-memory addresses per cycle using the two auxiliary
register arithmetic units (ARAU0 and ARAU1).
The PB can carry data operands stored in program space (for instance, a coefficient table) to the multiplier
and adder for multiply/accumulate operations or to a destination in data space for data move instructions
(MVPD and READA). This capability, in conjunction with the feature of dual-operand read, supports the
execution of single-cycle, 3-operand instructions such as the FIRS instruction.
The C54x also has an on-chip bidirectional bus for accessing on-chip peripherals. This bus is connected to
DB and EB through the bus exchanger in the CPU interface. Accesses that use this bus can require two or
more cycles for reads and writes, depending on the peripheral’s structure. Table 1 summarizes the buses
used by various types of accesses.
Table 1. Bus Usage for Read and Write Accesses
                            Address Bus                 Data Bus
Access Type                 PAB    CAB    DAB    EAB    PB     CB     DB     EB
Program read                √      -      -      -      √      -      -      -
Program write               √      -      -      -      -      -      -      √
Data single read            -      -      √      -      -      -      √      -
Data dual read              -      √      √      -      -      √      √      -
Data long (32-bit) read     -      √(hw)  √(lw)  -      -      √(hw)  √(lw)  -
Data single write           -      -      -      √      -      -      -      √
Data read/data write        -      -      √      √      -      -      √      √
Dual read/coefficient read  √      √      √      -      √      √      √      -
Peripheral read             -      -      √      -      -      -      √      -
Peripheral write            -      -      -      √      -      -      -      √
Legend: hw = high 16-bit word; lw = low 16-bit word; - = bus not used
Internal Memory Organization
The C54x memory is organized into three individually selectable spaces: program, data, and I/O space. All
C54x devices contain both random-access memory (RAM) and read-only memory (ROM). Among the
devices, two types of RAM are represented: dual-access RAM (DARAM) and single-access RAM (SARAM).
Table 2 shows how much ROM, DARAM, and SARAM are available on the different C54x devices. The
C54x also has 26 CPU registers plus peripheral registers that are mapped in data-memory space.
Table 2. Program and Data Memory on the TMS320C54x Devices
Memory Type      ’541   ’542   ’543   ’545   ’546   ’548   ’549   ’5402  ’5410  ’5420
ROM:             28K    2K     2K     48K    48K    2K     16K    4K     16K    0
  Program        20K    2K     2K     32K    32K    2K     16K    4K     16K    0
  Program/data   8K     0      0      16K    16K    0      16K    4K     0      0
DARAM†           5K     10K    10K    6K     6K     8K     8K     16K    8K     32K
SARAM†           0      0      0      0      0      24K    24K    0      56K    168K
† You can configure the dual-access RAM (DARAM) and single-access RAM (SARAM) as data memory or program/data memory.
ALU Functional Diagram
[Block diagram: CB15-CB0 and DB15-DB0 pass through sign control (SXM); 40-bit inputs come from accumulators A and B, the T register, the shifter output (40), and the MAC output; multiplexers select the X and Y inputs of the ALU; the control and status bits shown are SXM, OVM, C16, C, OVA/OVB, ZA/ZB, and TC.]
Legend: A = accumulator A; B = accumulator B; C = CB data bus; D = DB data bus; M = MAC unit; S = barrel shifter; T = T register; U = ALU
Central Processing Unit (CPU)
The C54x CPU contains:
• 40–bit arithmetic logic unit (ALU)
• Two 40–bit accumulators
• Barrel shifter
• 17 × 17–bit multiplier
• 40–bit adder
• Compare, select, and store unit (CSSU)
• Data address generation unit
• Program address generation unit
Arithmetic Logic Unit (ALU)
The C54x performs 2s-complement arithmetic with a 40-bit arithmetic logic unit (ALU) and two 40-bit
accumulators (accumulators A and B). The ALU can also perform Boolean operations. The ALU uses these
inputs:
• 16-bit immediate value
• 16-bit word from data memory
• 16-bit value in the temporary register, T
• Two 16-bit words from data memory
• 32-bit word from data memory
• 40-bit word from either accumulator
Accumulators
Accumulators A and B (see Figure 1) store the output from the ALU or the multiplier/adder block. They can
also provide a second input to the ALU; accumulator A can be an input to the multiplier/adder. Each
accumulator is divided into three parts:
• Guard bits (bits 39-32)
• High-order word (bits 31-16)
• Low-order word (bits 15-0)
Instructions are provided for storing the guard bits, for storing the high- and the low-order accumulator
words in data memory, and for transferring 32-bit accumulator words in or out of data memory. Also, either
of the accumulators can be used as temporary storage for the other.
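The three-part accumulator layout can be sketched as plain bit-field extraction. The function names below are hypothetical; only the bit positions come from the list above.

```python
MASK40 = (1 << 40) - 1


def split_accumulator(acc: int):
    """Return (guard bits 39-32, high word 31-16, low word 15-0) of a 40-bit value."""
    acc &= MASK40
    guard = (acc >> 32) & 0xFF
    high = (acc >> 16) & 0xFFFF
    low = acc & 0xFFFF
    return guard, high, low


def to_signed40(acc: int) -> int:
    """Interpret a 40-bit pattern as a 2s-complement number."""
    acc &= MASK40
    return acc - (1 << 40) if acc & (1 << 39) else acc
```

For example, adding the largest positive 32-bit value four times gives a sum that no longer fits in 32 bits; the excess lands in the guard bits instead of wrapping, which is why the guard bits are useful as headroom during long accumulations.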
Barrel Shifter
The C54x barrel shifter has a 40-bit input connected to the accumulators or to data memory (using CB or
DB), and a 40-bit output connected to the ALU or to data memory (using EB). The barrel shifter can
produce a left shift of 0 to 31 bits and a right shift of 0 to 16 bits on the input data. The shift requirements
are defined in the shift count field of the instruction, the shift count field (ASM) of status register ST1, or in
the temporary register T (when it is designated as a shift count register).
The barrel shifter and the exponent encoder normalize the values in an accumulator in a single cycle. The
LSBs of the output are filled with 0s, and the MSBs can be either zero filled or sign extended, depending on
the state of the sign-extension mode bit (SXM) in ST1. Additional shift capabilities enable the processor to
perform numerical scaling, bit extraction, extended arithmetic, and overflow prevention operations.
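A simplified software model of the shift range and the effect of SXM is sketched below. It handles only a 16-bit memory operand and is not cycle-accurate; the function names are hypothetical, and details such as accumulator inputs and overflow flags are deliberately left out.

```python
MASK40 = (1 << 40) - 1


def load_operand(word16: int, sxm: bool) -> int:
    """Extend a 16-bit word: sign-extended when sxm is True, zero-filled otherwise."""
    word16 &= 0xFFFF
    if sxm and (word16 & 0x8000):
        return word16 - 0x10000
    return word16


def barrel_shift(value: int, shift: int) -> int:
    """Shift by -16..31 (positive = left) and return the 40-bit result pattern."""
    if not -16 <= shift <= 31:
        raise ValueError("shift must be in the range -16..31")
    shifted = value << shift if shift >= 0 else value >> -shift
    return shifted & MASK40
```

With SXM set, the word 8000h shifted left by 16 becomes FF80000000h (the sign fills the upper bits); with SXM clear, the same operation gives 0080000000h.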
Multiplier/Adder Unit
The multiplier/adder unit performs 17 × 17-bit 2s-complement multiplication with a 40-bit addition in a
single instruction cycle. In the C54x architecture, the multiplier is 17 × 17 bits so that a signed number
can be multiplied by an unsigned number. Although the original data from memory is 16-bit, each operand is
extended to 17 bits before it enters the multiplier: signed operands are sign-extended, and unsigned
operands receive a 0 in the 17th bit.
The multiplier/adder block consists of several elements: a multiplier, an adder, signed/unsigned input
control logic, fractional control logic, a zero detector, a rounder (2s complement), overflow/saturation logic,
and a 16-bit temporary storage register (T). The multiplier has two inputs: one input is selected from T, a
data-memory operand, or high part of accumulator A; the other is selected from program memory, data
memory, accumulator A, or an immediate value.
The fast, on-chip multiplier allows the C54x to perform operations such as convolution, correlation, and
filtering efficiently. In addition, the multiplier and ALU together execute multiply/accumulate (MAC)
computations and ALU operations in parallel in a single instruction cycle. This function is used in
determining the Euclidean distance and in implementing symmetrical and least mean square (LMS) filters,
which are required for complex DSP algorithms.
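The multiply/accumulate operation at the heart of filtering can be sketched as follows. The sketch assumes Q15 samples and coefficients and a 40-bit accumulator; it does not model fractional mode, rounding, or the parallel ALU path, and the function names are hypothetical.

```python
def wrap40(value: int) -> int:
    """Wrap an integer to the signed 40-bit range of an accumulator."""
    value &= (1 << 40) - 1
    return value - (1 << 40) if value & (1 << 39) else value


def mac(acc: int, x: int, coeff: int) -> int:
    """One multiply/accumulate step: acc + x * coeff, kept to 40 bits."""
    return wrap40(acc + x * coeff)


def fir_q15(history, coeffs) -> int:
    """Dot product of Q15 samples and Q15 coefficients, returned as saturated Q15."""
    acc = 0
    for x, c in zip(history, coeffs):
        acc = mac(acc, x, c)
    return max(-32768, min(32767, acc >> 15))
```

Each loop iteration corresponds to one MAC; on the device the hardware completes such a step in a single instruction cycle, whereas this model simply spells out the arithmetic.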
Compare, Select, and Store Unit (CSSU)
The compare, select, and store unit (CSSU) performs maximum comparisons between the accumulator’s
high and low word, allows both the test/control flag bit (TC) in status register ST0 and the transition register
(TRN) to keep their transition histories, and selects the larger word in the accumulator to store into data
memory. The CSSU also accelerates Viterbi-type butterfly computations with optimized on-chip hardware.
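The compare-and-select behavior can be sketched in software. The polarity used here (a 0 is shifted into TRN and written to TC when the high word is larger, a 1 otherwise) is an assumption of this sketch and should be checked against the instruction set reference; the function names are hypothetical.

```python
def to_signed16(word: int) -> int:
    """Interpret a 16-bit pattern as a 2s-complement number."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def compare_select(acc: int, trn: int):
    """Compare the accumulator's high and low words.

    Returns (word_to_store, new_trn, tc). The decision bit is shifted into
    the LSB of the 16-bit transition register so that a history is kept.
    """
    high = to_signed16(acc >> 16)
    low = to_signed16(acc)
    if high > low:
        stored, bit = high, 0   # assumed polarity: high word selected -> 0
    else:
        stored, bit = low, 1    # assumed polarity: low word selected -> 1
    new_trn = ((trn << 1) | bit) & 0xFFFF
    return stored & 0xFFFF, new_trn, bit
```

In a Viterbi decoder, the two words would hold competing path metrics, and the decision history collected in TRN would later be used for traceback.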
On-Chip ROM
The on-chip ROM is part of the program memory space and, in some cases, part of the data memory space.
The amount of on-chip ROM available on each device varies, as shown in Table 2. On devices with a small
amount of ROM (2K words), the ROM contains a bootloader that is useful for booting to faster on-chip or
external RAM. For bootloading details on all C54x devices except the ’548 and ’549, see TMS320C54x DSP
Reference Set, Volume 4: Applications Guide, SPRU173.
On devices with larger amounts of ROM, a portion of the ROM may be mapped into both data and program
space (except the ’5410). The larger ROMs are also custom ROMs: you provide the code or data to be
programmed into the ROM in object file format, and Texas Instruments generates the appropriate process
mask to program the ROM.
On–Chip Dual–Access RAM (DARAM)
The DARAM is composed of several blocks. Because each DARAM block can be accessed twice per
machine cycle, the central processing unit (CPU) and peripherals such as the buffered serial port (BSP) and
host port interface (HPI) can read from and write to a DARAM memory address in the same cycle. The
DARAM is always mapped in data space and is primarily intended to store data values. It can also be
mapped into program space and used to store program code.
On–Chip Single–Access RAM (SARAM)
The SARAM is composed of several blocks. Each block is accessible once per machine cycle for either a
read or a write. The SARAM is always mapped in data space and is primarily intended to store data values.
It can also be mapped into program space and used to store program code.
On-Chip Memory Security
The C54x maskable memory security option protects the contents of on-chip memories. When you designate
this option, no instruction that has originated externally can access the on-chip memory spaces.
Two C54x Memory Maps
'541 Program Memory
0000h-13FFh   External (OVLY = 0)
0000h-007Fh   Reserved (OVLY = 1)
0080h-13FFh   On-chip DARAM (OVLY = 1)
1400h-8FFFh   External
9000h-FF7Fh   On-chip ROM (MP/MC = 0) or external (MP/MC = 1)
FF80h-FFFFh   Interrupt vectors: internal (MP/MC = 0) or external (MP/MC = 1)
'541 Data Memory
0000h-005Fh   Memory-mapped registers
0060h-007Fh   Scratch-pad DARAM
0080h-13FFh   On-chip DARAM
1400h-DFFFh   External
E000h-FFFFh   External (DROM = 0)
E000h-FEFFh   On-chip ROM (DROM = 1)
FF00h-FFFFh   Reserved (DROM = 1)
Memory–Mapped Registers
The data memory space contains memory-mapped registers for the CPU and the on-chip peripherals. These
registers are located on data page 0, simplifying access to them. The memory-mapped access provides a
convenient way to save and restore the registers for context switches and to transfer information between the
accumulators and the other registers.
Direct Addressing Block Diagram
[Block diagram: DAGEN forms the effective address from the 9-bit DP or the 16-bit SP and the 7 LSBs of the IR (dma), as selected by CPL. The address is driven on DAB(16) for a read, or on EAB(16) for a write or CAB(16) for a 32-bit read; data moves on data buses DB(16) and EB(16).]
CPL = 0: EA = DP : offset(IR)
CPL = 1: EA = SP + offset(IR)
Legend: EA = effective address; IR = instruction register
Data Addressing
The C54x offers seven basic data addressing modes:
• Immediate addressing uses the instruction to encode a fixed value.
• Absolute addressing uses the instruction to encode a fixed address.
• Accumulator addressing uses accumulator A to access a location in program memory as data.
• Direct addressing uses seven bits of the instruction to encode the lower seven bits of an address.
  The seven bits are used with the data page pointer (DP) or the stack pointer (SP) to determine
  the actual memory address.
• Indirect addressing uses the auxiliary registers to access memory.
• Memory-mapped register addressing uses the memory-mapped registers without modifying
  either the current DP value or the current SP value.
• Stack addressing manages adding and removing items from the system stack.
During the execution of instructions using direct, indirect, or memory-mapped register addressing, the data–
address generation logic (DAGEN) computes the addresses of data–memory operands.
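The direct-addressing calculation can be sketched as follows, using the two cases shown in the block diagram for this section: with CPL = 0 the 9-bit DP is concatenated with the 7-bit offset, and with CPL = 1 the offset is added to the 16-bit SP. The function name and argument names are hypothetical.

```python
def direct_address(dma: int, dp: int, sp: int, cpl: int) -> int:
    """Return the 16-bit effective address for direct addressing.

    dma: 7-bit offset taken from the instruction
    dp:  9-bit data page pointer
    sp:  16-bit stack pointer
    cpl: 0 selects DP-relative addressing, 1 selects SP-relative addressing
    """
    offset = dma & 0x7F
    if cpl == 0:
        return ((dp & 0x1FF) << 7) | offset
    return (sp + offset) & 0xFFFF
```

For instance, with DP = 3 and an offset of 25h the DP-relative address is 01A5h, while with SP = 1000h the same offset gives 1025h. Each data page therefore covers 128 words.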
C54x Program Memory
[Block diagram: PAGEN contains the PC and the repeat registers RC, BRC, RSA, and REA.]
Program Memory Addressing
Program memory is usually addressed on a C54x device with the program counter (PC). With some
instructions, however, absolute addressing may be used to access data items that have been stored in program
memory.
The PC is loaded by the program-address generation logic (PAGEN) and is used to fetch individual
instructions. Typically, the PAGEN increments the PC as sequential instructions are fetched. However, the
PAGEN may load the PC with a nonsequential value as a result of some instructions or other operations.
Operations that cause a discontinuity include branches, calls, returns, conditional operations, single–
instruction repeats, multiple–instruction repeats, reset, and interrupts. For calls and interrupts, the current
PC is saved onto the stack; it is referenced by the stack pointer (SP). When the called function or interrupt
service routine is finished, the PC value that was saved is restored from the stack via a return instruction.
For a detailed discussion of the hardware and software factors in program address generation, see Chapter 7,
Program Memory Addressing.
C54x Pipeline
Stages, in order along the time axis:
• Prefetch: Loads PAB with the PC's contents
• Fetch: Loads PB with the fetched instruction word
• Decode: Loads IR with the contents of PB; decodes the IR's contents
• Access: Loads DAB with the data1 read address, if required; loads CAB with the data2 read address, if required; updates auxiliary registers and stack pointer
• Read: Loads DB with the data1 read operand; loads CB with the data2 read operand; loads EAB with the data3 write address, if required
• Execute/write: Executes the instruction and loads EB with write data
Pipeline Operation
An instruction pipeline consists of a sequence of operations that occur during the execution of an instruction.
The C54x pipeline has six levels: prefetch, fetch, decode, access, read, and execute. At each of the levels, an
independent operation occurs. Because these operations are independent, from one to six instructions can be
active in any given cycle, each instruction at a different stage of completion. Typically, the pipeline is full
with a sequential set of instructions, each at one of the six stages. When a PC discontinuity occurs, such as
during a branch, call, or return, one or more stages of the pipeline may be temporarily unused. For more
details about the pipeline operation, see Chapter 7, Pipeline.
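The overlap of instructions in the six stages can be visualized with a small model of an ideal pipeline, one that never stalls and never sees a PC discontinuity. This is only a sketch of the principle; the function name is hypothetical.

```python
STAGES = ["prefetch", "fetch", "decode", "access", "read", "execute"]


def pipeline_occupancy(num_instructions: int):
    """Return, for each cycle, a dict mapping stage name to instruction index.

    Instruction i enters prefetch in cycle i and advances one stage per cycle.
    """
    cycles = []
    total_cycles = num_instructions + len(STAGES) - 1
    for cycle in range(total_cycles):
        active = {}
        for depth, stage in enumerate(STAGES):
            instr = cycle - depth
            if 0 <= instr < num_instructions:
                active[stage] = instr
        cycles.append(active)
    return cycles
```

For eight instructions the model takes 13 cycles. Only one stage is busy in the first cycle, all six are busy from the sixth cycle onward, and the pipeline drains at the end, which matches the "from one to six instructions" description above.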
On–Chip Peripherals
All the C54x devices have the same CPU, but different on–chip peripherals are connected to their CPUs. The
C54x devices have these on–chip peripheral options:
• General-purpose I/O pins: XF and BIO
• Timer
• Clock generator
• Host port interface
  • 8-bit standard (’542, ’545, ’548, ’549)
  • 8-bit enhanced (’5402, ’5410; see note below)
  • 16-bit enhanced (’5420; see note below)
• Synchronous serial port (’541, ’545, and ’546)
• Buffered serial port (’542, ’543, ’545, ’546, ’548, and ’549)
• Multichannel buffered serial port (McBSP) (’5402, ’5410, and ’5420; see note below)
• Time-division multiplexed (TDM) serial port (’542, ’543, ’548, and ’549)
• Software-programmable wait-state generator
• Programmable bank-switching module
Note: Enhanced Peripherals. For more detailed information on the enhanced peripherals, see SPRU302,
TMS320C54x DSP, Enhanced Peripherals: Volume 5.
General–Purpose I/O Pins
Each C54x device has two general-purpose I/O pins: BIO and XF. BIO is an input pin that can be used to
monitor the status of external devices. XF is a software-controlled output pin that allows you to signal
external devices.
Software–Programmable Wait–State Generator
The software-programmable wait–state generator extends external bus cycles up to seven machine cycles (14
machine cycles in the ’549, ’5402, ’5410, and ’5420) to interface with slower off-chip memory and I/O
devices. The software wait-state generator is incorporated without any external hardware. For off-chip
memory accesses, from zero to seven wait states can be specified within the software wait-state register
(SWWSR) for each 32K-word block of program and data memory, and for the 64K-word block of I/O space.
Programmable Bank-Switching Logic
The programmable bank–switching logic can automatically insert one cycle when an access crosses memory
bank boundaries inside program memory or data memory. One cycle can also be inserted when an access
crosses from program memory to data memory. This extra cycle prevents bus contention by allowing
memory devices to release the bus before other devices start driving the bus. The size of memory bank for
bank switching is defined by the bank switching control register (BSCR).
Host Port Interface
The host port interface (HPI) is a parallel port that provides an interface to a host processor. Information is
exchanged between the C54x and the host processor through C54x on-chip memory that is accessible to both
the host processor and the C54x. Table 3 identifies the HPI-equipped C54x devices.
Table 3. Host Port Interfaces on the TMS320C54x DSP
Host Port Interface    ’541   ’542   ’543   ’545   ’546   ’548   ’549   ’5402  ’5410  ’5420
Standard 8-bit HPI     0      1      0      1      0      1      1      0      0      0
Enhanced 8-bit HPI     0      0      0      0      0      0      0      1      1      0
Enhanced 16-bit HPI    0      0      0      0      0      0      0      0      0      1
Hardware Timer
The C54x features a 16–bit timing circuit with a 4-bit prescaler. The timer counter is decremented by 1 at
every CLKOUT cycle. Each time the counter decrements to 0, a timer interrupt is generated. The timer can
be stopped, restarted, reset, or disabled by specific status bits.
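The time between timer interrupts can be estimated from the CLKOUT frequency and the two counter reload values. The sketch assumes that each counter counts down to zero and then reloads, so a reload value of n gives n + 1 counts; that reload behavior and the argument names are assumptions of this example, not statements from the text above.

```python
def timer_interrupt_period(clkout_hz: float,
                           prescale_reload: int,
                           period_reload: int) -> float:
    """Seconds between timer interrupts for a 4-bit prescaler and 16-bit counter."""
    if not 0 <= prescale_reload <= 0xF:
        raise ValueError("prescaler reload value must fit in 4 bits")
    if not 0 <= period_reload <= 0xFFFF:
        raise ValueError("period reload value must fit in 16 bits")
    return (prescale_reload + 1) * (period_reload + 1) / clkout_hz
```

With a hypothetical 100-MHz CLKOUT, a prescaler reload of 9 and a period reload of 9999 give 10 × 10,000 clock cycles per interrupt, or about 1 ms.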
Clock Generator
The clock generator consists of an internal oscillator and a phase–locked loop (PLL) circuit. The clock
generator can be driven internally by a crystal resonator with the internal oscillator or externally by a clock
source. The PLL circuit can generate an internal CPU clock by multiplying the clock source by a specific
factor; thus, you should use a clock source with a lower frequency than that of the CPU.
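Serial Port Interface Block Diagram
[Block diagram: the 16-bit data bus connects to the data receive register DRR (16) and the data transmit register DXR (16). On the receive side, pin DR feeds the shift register RSR (16), clocked by CLKR and framed by FSR; a byte/word counter and load control logic move RSR into DRR, and RINT is generated on the RSR-DRR transfer. On the transmit side, load control logic and a byte/word counter move DXR into the shift register XSR (16), which drives pin DX, clocked by CLKX and framed by FSX; XINT is generated on the DXR-XSR transfer.]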
Serial Ports
The serial ports on the C54x vary by device, and are represented by four types: synchronous, buffered,
multichannel buffered (McBSP), and time-division multiplexed (TDM). See Table 4 for the number of each
type on the various C54x devices. The sections that follow provide an introduction to the four types of serial
ports. For more details about these ports, see Chapter 9, Serial Ports. For detailed information about the
McBSPs, see volume 5 of this reference set: TMS320C54x DSP, Enhanced Peripherals, literature number
SPRU302.
Table 4. Serial Port Interfaces on the TMS320C54x Devices
Serial Ports             ’541   ’542   ’543   ’545   ’546   ’548   ’549   ’5402  ’5410  ’5420
Synchronous              2      0      0      1      1      0      0      0      0      0
Buffered                 0      1      1      1      1      2      2      0      0      0
Multichannel Buffered    0      0      0      0      0      0      0      2      3      6
TDM                      0      1      1      0      0      1      1      0      0      0
Synchronous Serial Ports
Synchronous serial ports are high-speed, full-duplex serial ports that provide direct communication with
serial devices such as codecs, analog-to-digital (A/D) converters, and other serial systems. When more than
one synchronous serial port resides on a C54x, these ports are identical but independent. Each synchronous
serial port can operate at up to one-fourth the machine cycle rate (CLKOUT). The synchronous serial port
transmitter and receiver are double buffered and individually controlled by maskable external interrupt
signals. Data is framed either as bytes or as words.
Buffered Serial Ports
A buffered serial port (BSP) is a synchronous serial port that is enhanced with an autobuffering unit and is
clocked at the full CLKOUT rate. It is full-duplex and double-buffered to offer flexible data stream
length. The autobuffering unit supports high-speed transfers and reduces the overhead of servicing
interrupts.
Multichannel Buffered Serial Ports (McBSPs)
The McBSP is an enhanced buffered serial port that includes the following standard features: buffered data
registers, full duplex communication, and independent clocking and framing for receive and transmit. In
addition, the McBSP includes the following enhanced features: internal programmable clock and frame
generation, multichannel mode, and general purpose I/O. For detailed information about the McBSPs, see
volume 5 of this reference set: TMS320C54x DSP, Enhanced Peripherals, literature number SPRU302.
TDM Serial Ports
A time-division multiplexed (TDM) serial port is a synchronous serial port that is enhanced to allow time–
division multiplexing of the data. It can be configured for either synchronous operations or for TDM
operations and is commonly used in multiprocessor applications.
C54x External Bus Interface
[Timing diagram: CLKOUT; internal PB fetch, CB/DB reads, and EB write; external address bus A(22-0) and data bus D(15-0) carrying the corresponding write, read, and fetch accesses.]
External Bus Interface
The C54x can address up to 64K words of data memory, 64K words of program memory (8M words in the
’548, ’549, and ’5410; 1M words in the ’5402; 256K words in the ’5420), and up to 64K words of 16–bit
parallel I/O ports. Accesses to either external memory or I/O ports take place through the external interface.
Individual space-select signals, DS, PS, and IS, allow the selection of physically separate spaces.
The interface’s external ready input signal and software–generated wait states allow the processor to
interface with memory and I/O devices of many different speeds. The interface’s hold modes allow an
external device to take control of the C54x buses; in this way, an external device can access the resources in
the program, data, and I/O spaces.
External memory can be accessed by most C54x instructions. However, accessing I/O ports requires the use
of special instructions: PORTR and PORTW.
IEEE Standard 1149.1 Scanning Logic
The IEEE Standard 1149.1 scanning–logic circuitry is used for emulation and testing purposes only. This
logic provides the boundary scan to and from the interfacing devices. Also, it can be used to test pin-to-pin
continuity as well as to perform operational tests on devices peripheral to the C54x. The IEEE Standard
1149.1 scanning logic is interfaced to internal scanning-logic circuitry that has access to all of the on–chip
resources. Thus, the C54x can perform on-board emulation using the IEEE Standard 1149.1 serial scan pins
and the emulation–dedicated pins.
REFERENCES
Ahmed, Irfan (ed.). [1991]. Digital Control Applications With the TMS320 Family, Texas Instruments,
Dallas, TX, 1991.
Allen, J. [1975]. “Computer Architecture for Signal Processing,” Proceedings of the IEEE, vol. 63, no. 4,
pp. 624-633, April 1975
Arazi, Benjamin. [1988]. A Commonsense Approach to the Theory of Error Correcting Codes, MIT Press,
Cambridge, MA
Augarten, S. [1984]. Bit by Bit, Ticknor & Fields, New York
Auslander, E. [1993]. “Digital signal processing and the emerging markets of the ’90s,” Le Traitement du
Signal et ses Applications, Actes des Conferences, DSP’93
Bell, C. G. and Newell, A. [1971]. Computer Structures, McGraw-Hill, New York
Bowen, B. A. and Brown, W. R. [1982]. VLSI Systems Design for Digital Signal Processing, Volume 1:
Signal Processing and Signal Processors, Prentice-Hall, Englewood Cliffs, NJ
Cooley, J. W., Lewis, P. A. W. and Welch, P. D. [1967]. “Historical Notes on the Fast Fourier Transform,”
IEEE Transactions on Audio and Electroacoustics, Vol AU-15, No. 2, pp.76-79, June 1967
Cooley, J. W. and Tukey, J. W. [1965]. “An algorithm for the machine calculation of complex Fourier
series,” Math. of Comput., Vol 19, pp. 297-301
Danielson, G. C. and Lanczos, C. [1942]. “Some improvements in practical Fourier analysis and their
application to X-ray scattering from liquids,” J. Franklin Inst., Vol 233, pp. 365-380 and 435-452,
April 1942
DeFatta, David J.; Lucas, Joseph G. and Hodgkiss, William S. [1988]. Digital Signal Processing: A System
Design Approach, John Wiley, New York
Dote, Y. [1990]. Servo Motor and Motion Control using Digital Signal Processors, Prentice-Hall,
Englewood Cliffs, NJ
Hanselmann, H. [1987]. “Implementation of Digital Controllers - A Survey,” Automatica, Vol. 23, No. 1,
1987
Hayes, John P. [1979]. Computer Architecture and Organization, McGraw-Hill International
Heideman, Michael T., Johnson, Don H. and Burrus, C. Sidney [1984]. “Gauss and the History of the Fast
Fourier Transform,” IEEE ASSP Magazine, pp. 14-21, October 1984
Jury, E. I. [1964]. Theory and Application of the Z-Transform Method, John Wiley, New York
Lewis, F. [1992]. Applied Optimal Control & Estimation: Digital Design & Implementation, Prentice-Hall,
Englewood Cliffs, NJ
Lynn, Paul A. [1982]. The Analysis and Processing of Signals, MacMillan, London
Oppenheim, A. V. and Schafer, R. W. [1975 and 1988]. Digital Signal Processing, Prentice-Hall,
Englewood Cliffs, NJ
Runge, C. [1903]. Zeit. für Math. und Physik, Vol 48, p. 43.