INSTRUCTOR'S GUIDE
INTRODUCTION TO DSP

LECTURE 1: INTRODUCTION TO DSP

OBJECTIVES

This introduction is designed to answer several basic questions for the beginning student and to familiarize the student experienced in digital signal processing with the design and architecture of the Texas Instruments TMS320C54x device. The topics presented in this section include:

• The definition of digital signal processing (DSP)
• The benefits of digital signal processors
• Practical applications or uses of digital signal processing
• General DSP design and architecture
• Specific DSP architecture (TMS320C54x)

What Is DSP?

[Slide: What Is DSP? Diagram labels: analog computer, digital computer, ADC, DSP, DAC, output.]

Definition of a Digital Signal Processor

A digital signal processor (DSP) is an integrated circuit designed for high-speed data manipulation, and is used in audio, communications, image manipulation, and other data-acquisition and data-control applications.

How Digital Signal Processing Works

To understand how digital signal processing works, you must first understand the difference between analog and digital signals. Analog signals, which include sound intensity, pressure, and light intensity, are continuously variable. Each of our senses is sensitive to different kinds of analog signals. Our ears are sensitive to sound, our eyes are sensitive to light, and so on. Once we receive a signal, our sensory organs convert it to an electrical signal and send it to our analog computer (the brain). Our brains are very powerful parallel computers whose performance is currently unmatched by any digital computer. Our brains not only analyze the information received, but also make decisions using this data.
Digital signals are those that are transmitted within or between computers, in which information is represented by discrete states (for example, high voltages and low voltages) rather than by continuously variable levels in a continuous stream, as in an analog signal.

How Analog and Digital Signals Work Together

Digital technology, such as personal computers (PCs), assists us in many ways: writing documents, spell checking, and drawing. Unfortunately, the world is analog, and electronic analog computers are not as versatile as digital computers. Therefore, in order to make use of the tremendous processing power that digital technology offers us, we must do the following:

• Convert the analog signals into electrical signals, using a transducer (such as a microphone, as shown in the diagram).
• Digitize these signals (i.e., convert them from analog to digital using an analog-to-digital converter (ADC)), as shown in the diagram.

Once the signal is in digital form, our computer can easily process it through a digital signal processor. The DSP specializes in processing these signals, which makes it slightly different from microcomputers, microcontrollers, or general-purpose microprocessors.

After the DSP has processed the signal, the output signal must be converted back to analog form so that we can sense it. This is the digital-to-analog conversion (DAC) stage in the diagram. A loudspeaker, for example, would convert the analog signal coming from the DAC into sound.

So, we can see that to process the signal digitally, we need to convert it at least twice. Is it worth it? As you will see, it really is, at least until someone designs an analog computer as versatile as a digital one.
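The convert, process, and convert-back chain described above can be illustrated with a short Python sketch. Everything specific in it is an assumption made for illustration: the 8-bit resolution, the -1.0 V to +1.0 V input range, and the function names adc, dac, and process are hypothetical and do not describe any particular converter or the TMS320C54x.

```python
import math

BITS = 8                      # assumed converter resolution
LEVELS = 2 ** BITS            # 256 discrete codes
V_MIN, V_MAX = -1.0, 1.0      # assumed analog input range in volts
STEP = (V_MAX - V_MIN) / (LEVELS - 1)

def adc(voltage):
    """Convert an analog voltage to the nearest integer code (0..LEVELS-1)."""
    clipped = min(max(voltage, V_MIN), V_MAX)
    return round((clipped - V_MIN) / STEP)

def dac(code):
    """Convert an integer code back to an analog voltage."""
    return V_MIN + code * STEP

def process(code):
    """Placeholder digital processing: halve the amplitude around mid-scale."""
    mid = (LEVELS - 1) / 2
    return round(mid + (code - mid) / 2)

# Sample one cycle of a sine wave at 16 points (the "analog" input)
samples = [math.sin(2 * math.pi * n / 16) for n in range(16)]
codes = [adc(v) for v in samples]              # analog -> digital
outputs = [dac(process(c)) for c in codes]     # process, then digital -> analog

for v, c, out in zip(samples, codes, outputs):
    print(f"{v:+.3f} V -> code {c:3d} -> {out:+.3f} V")
```

Each conversion rounds to the nearest available level, so under these assumptions a converted value can differ from the original by up to half a step. That rounding is the price paid for moving the signal into digital form.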
[Slide: Multiply and Add. Add: 0001 + 0010 = 0011 (1 + 2 = 3). Multiply: 5 x 3 = 15, with 0011 shifted and added multiple times. MAC operation, the most common operation in DSP: A = B*C + D, E = F*G + A, and so on. Multiply, add, and accumulate typically takes 70 clock cycles with ordinary processors and 1 clock cycle with the MAC instruction of a digital signal processor.]

Why Do We Need Digital Signal Processors?

Why do we need a digital signal processor? Can we not use a general-purpose microprocessor to process signals as well? Let us try to answer this question by giving an example of some arithmetic operations performed by DSPs.

Add and Subtract

Add and subtract operations are performed quite simply by general-purpose microprocessors in a single clock cycle or very few clock cycles. Binary addition is similar to decimal addition. Our example shows adding 1 plus 2. The result is decimal 3.

Multiply and Divide

The multiply and divide operations are more complex. A digital multiply operation consists of a series of shift and add operations. Our example shows a multiplication of 3 by 5. Division, which is more complex, will not be discussed here. It is discussed in TMS320C54x DSP Reference Set, Vol. 2 Mnemonic Instruction Set, Chapter 2, reference number SPRU172B. The entry for the subtract conditionally (SUBC) instruction describes this process.

General-purpose microprocessors are quite slow in performing multiply and divide operations. They typically execute a series of shift, add, and subtract operations sequentially from their microcode to perform a single multiply operation, which may consume many cycles.

The DSP performs multiplication in a single cycle by implementing all shift and add operations in parallel. The circuitry is relatively complex and consumes a considerable number of transistors.
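The shift-and-add multiplication of 3 by 5 described above can be sketched in Python as follows. This is only an illustration of the sequential method for non-negative integers; the function name shift_add_multiply is hypothetical, and the code does not represent the microcode of any real processor.

```python
def shift_add_multiply(a, b):
    """Multiply two non-negative integers using only shifts and adds."""
    product = 0
    shift = 0
    while b:
        if b & 1:                    # this bit of the multiplier is set
            product += a << shift    # add the correspondingly shifted multiplicand
        b >>= 1                      # move on to the next multiplier bit
        shift += 1
    return product

result = shift_add_multiply(3, 5)
print(format(result, "08b"), result)   # 00001111 15
```

The loop makes one pass per bit of the multiplier, which is why a sequential multiply costs many cycles. A hardware multiplier forms all of these shifted partial products at once and sums them in parallel.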
The benefit of this dedicated hardware is very fast multiplication, which is required for processing most digital signals. When general-purpose DSPs are not fast enough, the signal is processed either using analog circuits (which may have some drawbacks) or in specialized DSP hardware designed only for that task. This eliminates many of the benefits of a programmable DSP.

Digital signal processing, by its nature, requires many calculations of the form:

A = B*C + D

This may appear to be a simple task, but when speed is also required, specialized, dedicated hardware for this task is very useful.

Multiply, Accumulate (MAC)

Most DSPs have a specialized instruction that allows them to multiply, add, and save the result in a single cycle. This instruction is usually called MAC (short for Multiply, Accumulate).

[Slide: Drop in Multiplication Times. Plot of multiply time (ns, 0 to 600) against year (1971, 1976, 1998), falling to 5 ns.]

We have established that for DSPs, we need specialized hardware that is capable of performing multiply and accumulate functions in the shortest possible time (preferably in a single cycle). However, the central problem remains. How can we achieve a fast multiply operation? Without a fast multiplier, a worthwhile DSP design would only be a dream. Designing fast multipliers was one of the greatest challenges in digital design up until the 1980s. In the 1970s, several of the world's leading research laboratories sought to make fast digital multipliers a reality.

Multiply Times

In 1971, Lincoln Laboratories designed a multiplier using 10,000 integrated circuits, performing the operation in just 600 ns. By the mid-1970s, multiply times of 200 ns were becoming commonplace. This made it possible to design acceptable digital signal processors. These early designs were expensive and bulky, but they showed that fast multiplication was possible.
In the early 1980s, single-chip DSPs with good performance started to appear, and ever since, multiply times have continued to drop. Today's 16-bit fixed-point devices can achieve multiply times of 5 ns. Given the origins of this technology, this is a remarkable achievement.

[Slide: Digital Computers. von Neumann machine: a single store for program and data, connected by an address (A) bus and a data (D) bus to the arithmetic logic unit and the input/output unit. Harvard architecture: a stored program and stored data in separate memories, each with its own address (A) and data (D) bus to the arithmetic logic unit, plus the input/output unit.]

Now let us have a closer look at the internal architecture of computers so we can see how this has affected the design of DSP chips.

Stored Program Machines

Computers need instructions to operate. At every clock cycle, they must be told what to do. If the instructions are stored, the computer just has to fetch and execute them. Such computers are called stored program machines. Our computer typically fetches an instruction and then data, operates on the data, and returns the resulting data to the store. Stored program machines use two well-known and widely used computer architectures: von Neumann and Harvard. The diagram shows the structure of the two architectures.

von Neumann Architecture

von Neumann machines store program and data in the same memory area. In this type of machine, an instruction contains the operation command and the address of the data on which the operation is performed. There are two basic operation units within these machines: the arithmetic logic unit (ALU) and the input/output unit. The ALU performs the core operations: multiply, add, subtract, and many more. It is on these very simple core operations that complex software, such as word processing software, can be built. The input/output unit manages the flow of external data for the machine.
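The fetch, operate, and store sequence of a stored program machine can be sketched with a toy interpreter in Python. The machine below is entirely hypothetical: its four operations, its single accumulator, and its memory layout are invented for illustration and are not the instruction set of any real processor. Program and data share one memory, as in a von Neumann machine, and each instruction holds an operation and the address of its data.

```python
def run(memory):
    """Execute a program stored in the same memory as its data."""
    acc = 0          # accumulator inside the ALU
    pc = 0           # program counter: address of the next instruction
    while True:
        op, addr = memory[pc]        # fetch the instruction from memory
        pc += 1
        if op == "LOAD":
            acc = memory[addr]       # fetch data from the same memory
        elif op == "ADD":
            acc += memory[addr]
        elif op == "MUL":
            acc *= memory[addr]
        elif op == "STORE":
            memory[addr] = acc       # return the result to the store
        elif op == "HALT":
            return memory
        else:
            raise ValueError(f"unknown operation: {op}")

memory = [
    ("LOAD", 6),    # address 0: acc = B
    ("MUL", 7),     # address 1: acc = B * C
    ("ADD", 8),     # address 2: acc = B * C + D
    ("STORE", 9),   # address 3: A = acc
    ("HALT", 0),    # address 4: stop
    None,           # address 5: unused
    3,              # address 6: B
    5,              # address 7: C
    2,              # address 8: D
    0,              # address 9: A (result)
]
run(memory)
print(memory[9])    # 17
```

In this sketch every instruction needs two accesses to the same memory, one to fetch the instruction and one to reach its data, and they happen one after the other.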
Harvard Architecture

The primary difference between Harvard architecture and von Neumann architecture is that in Harvard architecture, the program and data memories are physically separate, each with its own transmission path. This enables the machine to fetch instructions and data simultaneously, which can greatly enhance performance. Harvard machines also have ALUs and input/output units.

Von Neumann and Harvard Architecture History

The history of these two architectures is very interesting. The Harvard architecture was developed by Howard Aiken in the late 1930s at Harvard University, with the Harvard Mark 1 becoming operational in 1944. The University of Pennsylvania followed in 1946 with the development of the Electronic Numerical Integrator and Computer (ENIAC). John von Neumann, a Hungarian-born mathematician, suggested a simpler and lower cost architecture, namely a single memory for program and data. This simple solution has set the standard ever since. In 1951, the Institute for Advanced Study in Princeton built the first von Neumann machine.

Which Architecture is Best Suited for DSP?

Common general-purpose personal computers use processors designed with the von Neumann architecture, while the Harvard architecture is more commonly used in specialized microprocessors for real-time and embedded applications. DSPs typically use Harvard architecture, although von Neumann DSPs also exist.

Many signal and image processing applications require fast, real-time machines. The drawback to using a true Harvard architecture is that since it uses separate program and data memories, it needs twice as many address and data pins on the chip and twice as much external memory. Unfortunately, as the number of pins or chips increases, so does the price.
Electronic designers, who have had to tackle problems like these before, have come up with an elegant solution: a single data and address bus is used externally, while two (or more) separate buses for program and data are used internally. Timing (multiplexing) handles the separation of program and data information. In one clock cycle, the program information flows on the pins, and in the second cycle, data follows on the same pins. Program and data information is then routed onto separate internal program and data buses. Such machines are called modified Harvard architecture processors because the internal architecture is Harvard while the external architecture is von Neumann.

The performance of modified Harvard architecture processors typically compares well with the performance of true Harvard architecture processors because most DSP chips also incorporate multiple internal RAM/ROM cells for high-use instructions and data. This significantly reduces the time spent on the sequential external program and data accesses associated with classic von Neumann processors.

[Slide: A Typical DSP System. DSP chip; memory; converters (optional): analog to digital, digital to analog; communication ports: serial, parallel.]

Components of a Typical DSP System

Typical DSP systems consist of a DSP chip, memory, possibly an analog-to-digital converter (ADC) and a digital-to-analog converter (DAC), and communication ports. Not all DSP systems have the same architecture with the same components. The selection of components in a DSP system depends on the application. For example, a sound system would probably require A/D and D/A converters, whereas an image processing system may not.

DSP Chip

A DSP chip can contain many hardware elements; some of the more common ones are listed below.

Central Arithmetic Unit

This part of the DSP performs major arithmetic functions such as multiplication and addition.
It is the part that makes the DSP so fast in comparison with traditional processors.

Auxiliary Arithmetic Unit

DSPs frequently have an auxiliary arithmetic unit that performs pointer arithmetic, mathematical calculations, or logical operations in parallel with the main arithmetic unit.

Serial Ports

DSPs normally have internal serial ports for high-speed communication with other DSPs and data converters. These serial ports are directly connected to the internal buses to improve performance, to reduce external address decoding problems, and to reduce cost.

Memory

Memory holds information, data, and instructions for DSPs and is an essential part of any DSP system. Although DSPs are intelligent machines, they still need to be told what to do. Memory devices hold a series of instructions that tell the DSP which operations to perform on the data (i.e., information). In many cases, the DSP reads some data, operates on it, and writes it back. Almost all DSP systems have some type of memory device, whether it is on-chip memory or off-chip memory; however, on-chip memory operates faster.

A/D and D/A Converters

Converters provide the translator function for the DSP. Since the DSP can only operate on digital data, analog signals from the outside world must be converted to digital signals. When the DSP provides an output, it may need to be converted back to an analog signal to be perceived by the outside world. Analog-to-digital converters (ADCs) accept analog input and turn it into digital data that consists of only 0s and 1s. Digital-to-analog converters (DACs) perform the reverse process; they accept digital data and convert it to a continuous analog signal.

Ports

Communication ports are necessary for a DSP system. Raw information is received and processed; then that information is transmitted to the outside world through these ports. For example, a DSP system could output information to a printer through a port.
The most common ports are serial and parallel ports. A serial port accepts a serial (single) stream of data and converts it to the processor format. When the processor wishes to output serial data, the port accepts processor data and converts it to a serial stream (e.g., modem connections on PCs). A parallel port does the same job, except the output and input are in parallel (simultaneous) format. The most common example of a parallel port is a printer port on a PC.

[Slide: Practical DSP Systems.]

Practical Applications for DSP Systems

Since their introduction to the market, DSPs have found a wide variety of applications. They are used in everyday hi-fi systems as well as high-end virtual-reality applications. Generally, DSP is not an expensive technology. Some practical DSP systems are:

• Hi-Fi Equipment
• Toys
• Videophones
• Modems
• Phone Systems
• 3D Graphics Systems
• Image Processing Systems

Hi-Fi Equipment (Music Systems)

DSPs are now being used in sound processors that can create the illusion of three-dimensional sound or modify the acoustics of a room to give the illusion of very large rooms and auditoriums. The result is movie theater quality sound in a home music system.

Toys

Today, DSP technology is integrated in children's toys. Talking toys are commonplace; by pressing the picture of a dog, children can hear it bark. They can also learn their alphabet by singing along with a teaching toy. This clearly demonstrates that DSP technology is not expensive.

Videophones

Videophones will affect the lives of people from all walks of life. They are quickly improving in quality. It is only a matter of time before prices drop and videophones become widely used. DSPs are used for compression and decompression of images in videophones.
There are several international standards for compressing moving images. Programmable DSPs are the perfect answer to evolving standards, since adopting a new standard may only require a software update.

Modems

As the Internet has grown, so has the use of modems. To be able to handle the ever-increasing communications load, modems have become faster and more efficient. DSPs perform vital functions in modems such as modulating the digital bit stream into a signal compatible with a phone line, canceling line echoes, and compressing and decompressing data.

Phone Systems

These days, it is quite common to call a company and be answered by a machine that provides alternatives such as: "Say 1 for sales," "Say 2 for technical support," and so on. These phone systems use DSPs to perform the function of voice recognition. DSPs are also commonly used in the communications industry for the add-on features you can get from your telephone company like caller ID, voice messaging, and call back.

3D Graphics Systems

Most flight simulators use 3D real-time graphics to enhance realism. Calculating the necessary details in three dimensions (and doing this 30 times every second) requires very efficient and powerful processors. DSPs are now widely used in virtual-reality applications.

Image Processing Systems

Personal handheld digital cameras are also now becoming widespread. DSPs are used to perform the conversion of charge-coupled device (CCD) chip analog voltages (video) to compressed data, which is then stored digitally in nonvolatile EEPROM (electrically erasable programmable ROM). The DSP also senses the buttons, controls exposure times, provides the CCD gate timing, and downloads images to the PC.

DSPs are also used extensively in image processing, such as robot vision, machine vision, image compression, and fingerprint recognition. A simple example of an image-processing application is the inspection of printed circuit boards.
The system works by recording the image of a working board and comparing it with (subtracting it from) the images of newly manufactured boards as they pass beneath a CCD camera. These systems also use the efficient multiply and add cycles in DSPs to perform two-dimensional filtering.

Analog Advantages

• Low cost and simplicity in some applications
  – Attenuators/amplifiers
  – Simple filters
• Wide bandwidth (GHz)
• Low signal levels
• Infinite effective sampling rate
  – Infinite resolution in frequency
  – No aliasing/reconstruction issues
• Infinite resolution in amplitude
  – No quantization noise

Digital Signal Processing (DSP) Advantages

• Repeatability
  – Low sensitivity to component tolerances
  – Low sensitivity to temperature changes
  – Low sensitivity to aging effects
  – Nearly identical performance from unit to unit
  – Matched circuits cost less
• High noise immunity
• In many applications DSP offers higher performance and lower cost
  – CD players versus phonographic turntable

[Slide: Why Digital Processing? Signal path: ADC, process, DAC.]

So, Why Convert From Analog to Digital?

Some applications require analog designs, and some require digital designs. To be processed digitally, signals must be converted from analog form to digital numbers. After a signal is processed, it is then often converted back to analog form. Given this overhead, digital processing must offer some clear advantages. These include:

• Programmability
• Stability
• Repeatability
• Special Applications

[Slide: Programmability. One hardware = many tasks: the same hardware runs software 1 (low-pass filter), software 2 (music synthesizer), and so on up to software N (motor control). Upgradability and flexibility: a digital upgrade means developing new code; an analog upgrade means soldering in a new component.]

Programmability

A single piece of DSP hardware can perform many functions. For example, a multimedia PC can play music and also function as a word processor if it is loaded with suitable programs. This ability to use the same hardware for many functions provides important flexibility. You can implement any new function you think of, as long as you can program it.

Upgradability

Once you have designed and implemented your system, you may want to upgrade or add new functions. Perhaps you would like to adapt your system to a new environment. With a digital system, this means modifying your code. With an analog system, this could involve obtaining and soldering in new components, or even a complete redesign.

Flexibility

A single DSP board can be made to perform many functions by simply loading new programs into it. In our demonstrations, we are using the same DSK board as a music tune generator and as a low-pass filter by simply loading it with different software. This flexibility reduces design time and complexity. With analog circuits, a new circuit has to be designed for each new function.

Stability

The stability of analog circuits depends upon several factors. Analog circuits are affected by temperature and aging, among other things. Also, two analog systems using the same design and components may differ in performance.

[Slide: Analog Variability. Analog circuits are affected by temperature, aging, and tolerance of components. Two analog systems using the same design and components may differ in performance. Example: 1 kΩ + 10 years = 1.1 kΩ.]

Temperature

Analog components such as resistors, capacitors, diodes, and operational amplifiers are affected by temperature, humidity, and aging.
A temperature-sensitive analog circuit may perform quite differently in the UK than in Egypt, where the temperatures are different. This could prove disastrous for a company that sells its products worldwide. Digital circuits do not gradually change their characteristics with time, temperature, or humidity. They either work or they don't work. In other words, digital circuits are repeatable as long as they are designed with enough tolerance to operate properly over the range of expected conditions.

Aging

The effects of component aging can be detrimental to analog circuits as characteristics and performance change. These effects can sometimes be anticipated, or their effect may not be critical. Analog designers must be aware of these effects.

Tolerances

Components such as resistors and capacitors have tolerances. If a component value is only accurate to within 10%, two apparently identical analog circuits could perform differently enough to cause operational problems. This can make design, manufacturing, and support expensive.

[Slide: Digital Repeatability. Perfect reproducibility: nearly identical performance from unit to unit; performance not affected by tolerance; no drift in performance due to temperature or aging; guaranteed accuracy. A CD player always plays the same music quality.]

Digital Repeatability

A properly designed digital circuit will produce the same result every time, in addition to being identical from unit to unit. If the same multiplication is performed on 500 computers, all 500 computers should produce the same result. Component tolerances, aging, and temperature drifts also do not affect digital circuits nearly as much. A properly designed digital circuit will produce the same results in the UK as in Egypt, even when the temperatures are different. On the other hand, 500 analog circuits could produce a range of results.
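The contrast between 500 analog circuits and 500 digital computations can be sketched numerically in Python. The example is an assumption chosen for illustration, not a circuit taken from this guide: it models a first-order RC low-pass filter, whose cutoff frequency is 1/(2πRC), with hypothetical nominal values of 1 kΩ and 100 nF and with each component drawn uniformly from within 10% of nominal.

```python
import math
import random

R_NOMINAL = 1_000.0      # ohms (assumed value)
C_NOMINAL = 100e-9       # farads (assumed value)
TOLERANCE = 0.10         # components accurate to within 10%

def cutoff_hz(r, c):
    """Cutoff frequency of a first-order RC low-pass filter."""
    return 1.0 / (2.0 * math.pi * r * c)

random.seed(0)           # fixed seed so the run is repeatable
analog_units = []
for _ in range(500):
    # Each unit gets its own slightly different resistor and capacitor
    r = R_NOMINAL * random.uniform(1 - TOLERANCE, 1 + TOLERANCE)
    c = C_NOMINAL * random.uniform(1 - TOLERANCE, 1 + TOLERANCE)
    analog_units.append(cutoff_hz(r, c))

# The same multiplication performed on 500 digital units
digital_units = [1234 * 5678 for _ in range(500)]

nominal = cutoff_hz(R_NOMINAL, C_NOMINAL)
print(f"nominal cutoff: {nominal:.1f} Hz")
print(f"analog spread:  {min(analog_units):.1f} to {max(analog_units):.1f} Hz")
print(f"distinct digital results: {len(set(digital_units))}")
```

Under these assumptions the worst-case analog cutoff lies between the nominal value divided by 1.21 and the nominal value divided by 0.81, while every digital unit returns exactly the same product.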
In digital circuitry, logical 1s and 0s are defined when an analog voltage is above or below an analog voltage threshold. For a digital circuit to be repeatable, the analog voltage that represents the logical 1s and 0s needs to be sufficiently greater or less than the threshold so as not to be affected by circuit variations or noise. The only concern is that timing restrictions and maximum device ratings should not be exceeded. If proper digital inputs are not maintained, the 1s and 0s can be corrupted, making a normally repeatable digital circuit suddenly fail. Analog circuit characteristics, on the other hand, tend to drift gradually.

Digital accuracy is determined by the number of bits used and is guaranteed to remain the same. With analog circuits, the number of bits is effectively infinite, but the effects of noise, tolerances, and linearity can rapidly diminish performance.

A digital CD player consistently produces the same high-quality digital music and is limited primarily by the analog components that are still required. Analog components in a CD player include the DAC, laser, laser pickup, read head actuator, spindle motor, and headphones.

[Slide: Performance. Some special functions are best implemented digitally: lossless compression, adaptive filters, linear phase filters. Plots of gain and phase against frequency, with frequencies f1 and f2 marked.]

Compression

Storage media such as hard disk drives, and satellite communications links for telephone and video, are examples of resources that are limited in available size or bandwidth. More would be better, but installing additional hardware tends to be very expensive. In these cases, costs are passed on to the consumer in one way or another. An example would be the substantial cost difference between a 20-minute and a 2-minute phone call, especially if the call is long distance.
Although the prices for installing more advanced hardware tend to be on a downward curve, our need for more information is on an even more aggressive upward curve. Data compression can be a valuable tool for providing adequate performance from available resources, and at a reasonable cost.

Let us consider the example of a satellite link or transmission channel. If one megabyte of data is compressed down to half a megabyte and then transmitted, a decompressor can then recover the original data at the other end. Considering that the transmission line is only aware that half a megabyte of data has been passed, the data channel bandwidth is effectively doubled.

A DSP can compress raw binary data and signals through the use of appropriate software programs. Lossless compression programs are suitable for exact binary data transfers. On the other hand, programs designed for compressing speech and video offer much higher compression ratios but with some loss in signal quality. Analog circuits can also be used for some very simple forms of lossless compression but offer very little flexibility.

Adaptive Filters

DSP systems have been developed that cancel some of the noise within cabins of cars, helicopters, and airplanes. The noises cancelled were those caused by engine vibrations. The noise cancellation systems used the engine speed as a reference and produced an anti-noise signal from speakers to cancel the cabin noise. Feedback from microphones in each headrest (or headphone) was used to adapt the characteristics of the anti-noise until the best possible noise reduction was achieved. The system then continued to adapt periodically to track changes in the cabin noise.

A DSP system can easily adapt to some changes in environmental variables. An adaptive algorithm simply calculates the new parameters required and stores them back in main memory, overwriting the previous values.
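The adapt-and-overwrite loop described above can be sketched in Python. The text does not name the algorithm used in the cabin systems, so this sketch substitutes the well-known least mean squares (LMS) update as an assumption. The single-tone noise model, the two coefficients, the step size, and all variable names are likewise hypothetical choices made for illustration.

```python
import math

MU = 0.05                    # adaptation step size (assumed)
TAPS = 2                     # number of adaptive coefficients (assumed)
OMEGA = 2 * math.pi / 8      # assumed noise frequency in radians per sample

weights = [0.0] * TAPS       # the adaptive parameters held in memory
history = [0.0] * TAPS       # most recent reference samples, newest first
errors = []

for n in range(2000):
    reference = math.sin(OMEGA * n)                  # derived from engine speed
    cabin_noise = 0.8 * math.sin(OMEGA * n + 0.6)    # noise heard in the cabin

    history = [reference] + history[:-1]
    # Estimate of the cabin noise; a speaker would play its inverse
    estimate = sum(w * x for w, x in zip(weights, history))
    error = cabin_noise - estimate                   # residual at the microphone

    # LMS update: nudge each coefficient to reduce the residual,
    # overwriting the previous values
    weights = [w + MU * error * x for w, x in zip(weights, history)]
    errors.append(abs(error))

print(f"mean residual, first 50 samples: {sum(errors[:50]) / 50:.4f}")
print(f"mean residual, last 50 samples:  {sum(errors[-50:]) / 50:.6f}")
```

In this idealized setting the residual shrinks toward zero as the coefficients settle. If the amplitude, phase, or frequency of the noise changed, the same loop would keep adjusting the coefficients to follow it.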
A very basic level of adaptation is possible in analog systems, but the complete change of a complex set of filter characteristics (used in noise cancellation) is beyond the practical scope of analog signal processing. A notch filter with a steep cut-off would be one example of a filter that might be needed to implement noise cancellation. In this case, the DSP has the ability to recalculate suitable notches to remove the vibration noise as the engine RPM changes. It is virtually impossible to produce the many required tunable filters using analog techniques alone.

Linear Phase Filters

There are some valuable signal processing techniques that are difficult or impossible to produce by analog procedures. The classic example is the linear phase filter, which is difficult to design in analog and, even then, works only over a limited bandwidth. With a digitally implemented filter, it is possible to keep the phase shift of each component frequency consistent with all other frequencies. This is possible by using a finite impulse response (FIR) filter. This term will be explained later.

[Slide: DSP Development. Tools of the trade: high-level language, assembler code (ADD A, B and its binary machine code), emulator, and test. Flow: software design, DSP, test; if the result is not OK, repeat; if it is OK, product.]

The Program

The DSP chip is a piece of hardware that cannot function without the intelligence of a program. A program is a series of instructions that perform certain functions. In our demonstrations, we will see some examples of programming to compose simple musical tunes. To write these programs, we must use the tools of the trade.

Assemblers

Assemblers generate machine-level code from text instructions. Let us assume we were given the following two lines to remember:

ADD A, B
111000100101010001001

Since we understand written words better than a series of 1s and 0s, which line is easier to understand and remember?
Assemblers take our text instructions and convert them into machine language. This relieves us of the burden of having to remember binary instructions for the DSP. We will talk more about assembly language in the next chapter.

High-Level Language

High-level languages are like assembly languages, but much friendlier. Assembly languages have very basic instructions, such as multiply, add, and compare. High-level languages have higher-level instructions, such as print, and repeat until equal to zero. Therefore, it is easier to write programs in high-level languages.

While it is easier to write in high-level languages, assembly language can produce programs that execute faster. For this reason, both have their uses in DSPs. Sometimes it is necessary to write time-critical sections of a program in assembly. A complete program may have sections of code in assembly and sections in a high-level language. It is easy to combine both types of code into a single executable program. Assembly and high-level programming languages make it possible to program DSPs to perform a variety of functions.

Simulators

Flight simulators make you feel as if you are in the cockpit of a plane without the cost of an actual airplane, fuel, or risk of crashing. Likewise, a DSP simulator is a software implementation of a DSP chip. A simulator typically runs on a computer (PC or workstation), simulating almost all of the functionality of the DSP. Simulators are used to analyze the feasibility of designs before the designs are committed to hardware. They are also very useful in determining whether or not a particular design will work.

Emulators

An emulator allows us to directly control and debug the results of instructions executing on the DSP. Modern emulators do not replace the DSP chip on the board but exert their control through a serial emulation scan path.
Using an emulator, it is possible to see all of the internal changes in the DSP at each step. Developers can execute the instructions one step at a time, check voltage levels for correct operation, and check each result in their own time. Emulators are invaluable tools in development environments.

Debugger

A debugger interface displays program execution information in a usable format for the programmer. The data displayed in the debugger windows is essentially a formatted printout of the contents of the DSP memory. This memory is simply loaded into the PC using either an emulator or a communications link with the PC using appropriate software. For example, the memory window can display (and edit) data in hexadecimal, float or integer formats, but the data is nonetheless binary 1s and 0s to the DSP. Likewise, the disassembly window is simply a reformatting of the binary value into a recognizable alphanumeric mnemonic.

The CPU register window is a little different in that the C54x register values are not directly accessible as memory data because they are not memory-mapped registers. For the scan emulator, this job is quite easy since the scan path simply routes through the internal registers of the DSP. For the DSK, this task is accomplished using a special program that saves and restores the CPU registers from the DSP main memory. Other than this, the data displayed in the CPU register window is simply another data form.

Debuggers consist of a user interface on the host PC, which can control and modify the contents and execution of the chip. The user interface displays the contents of RAM, registers, and the disassembly of the currently loaded program. The major advantage of debuggers over simulators is that they operate in real time, allowing the designer to assess the performance of the system in a real-time environment.
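The point made above about the memory window, that one stored bit pattern can be shown in several formats, can be sketched in a few lines of Python. This is only an illustration running on a host PC: the word value is an arbitrary example, and the float view uses the host's IEEE 754 single-precision format (via the standard struct module), not a DSP-specific format.

```python
import struct

# One 32-bit word as it might sit in memory (value chosen only for illustration).
word = bytes.fromhex("3FC00000")

as_hex = word.hex().upper()               # hexadecimal view of the bits
as_int = int.from_bytes(word, "big")      # unsigned integer view of the same bits
as_float = struct.unpack(">f", word)[0]   # IEEE 754 single-precision view (assumption)

print(as_hex, as_int, as_float)
```

The stored bits never change; only the formatting applied by the display does.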
Development Cycle

After the feasibility of the design is established through simulation, program design can begin. First, the software is designed. This stage determines the complexity and the modules of the code. The modules of software are written and tested, and then the full system is put together and tested. If everything works as required, the result is version 1.0 of the product on the market. If it does not work as required, the process is repeated until it does. When new requirements and improvements emerge as a result of user feedback, a new version is produced via the same process.

Slide: Number Systems

• Represent numbers digitally

  Digit number    7     6     5     4     3     2     1     0
  Weight          2^7   2^6   2^5   2^4   2^3   2^2   2^1   2^0
  Decimal value   128   64    32    16    8     4     2     1

• Any number can be represented as a series of 1s and 0s
• Decimal 3 in binary: 0000 0011 (2 + 1 = 3)
• Decimal 26 in binary: 0001 1010 (16 + 8 + 2 = 26)

Number Systems

Let us now consider decimal, binary and hexadecimal (hex) number systems. The human-friendly decimal system uses ten digits, 0 to 9, for number representation. Numbers larger than 9 are represented by carrying a digit to the left. The number 10 represents one complete decimal count (digit 1 x 10) and a 0.

Binary

To represent numbers digitally, we are only allowed two binary digits, logic 1 and logic 0. Large numbers can be represented in the binary system; however, more digits are needed to represent the same number in binary than are needed in the decimal system. Consider the representation of decimal number 3 in binary, as shown in the preceding example. The value of each binary digit is determined by its position.
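The positional weighting described above can be sketched in Python. The function names to_binary and from_binary are hypothetical helpers introduced only for this illustration; they simply walk the digit positions from 7 down to 0 and use the weight 2^n of each position.

```python
def to_binary(value, bits=8):
    """Return the bit string of a non-negative integer, most significant bit first."""
    if not 0 <= value < 2 ** bits:
        raise ValueError("value does not fit in the given number of bits")
    digits = []
    for position in range(bits - 1, -1, -1):   # digit numbers 7 down to 0
        weight = 2 ** position
        if value >= weight:
            digits.append("1")
            value -= weight
        else:
            digits.append("0")
    return "".join(digits)


def from_binary(bit_string):
    """Sum the weights of every position that holds a 1."""
    bits = len(bit_string)
    return sum(2 ** (bits - 1 - i) for i, b in enumerate(bit_string) if b == "1")


print(to_binary(3))    # 00000011
print(to_binary(26))   # 00011010
```

Both outputs match the slide: 3 is 2 + 1, and 26 is 16 + 8 + 2.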
In the binary system, the maximum value the first (rightmost) digit can have is 1. The second digit has a maximum value of 2. In an 8-bit system, the decimal number 3 is represented by setting the two least significant bits (LSBs) to 1, giving 0000 0011b. To represent the decimal number 26, higher-order binary digits are set to achieve the value 0001 1010b.

In an 8-bit binary system, the largest decimal number that can be represented is 2^8 – 1 = 255 = 1111 1111b. The largest decimal number that can be represented in a 16-bit binary system is 2^16 – 1 = 65,535.

Slide: Binary and Hex

  Decimal digits   0, 1, 2, ..., 9
  Binary digits    0, 1
  Hex digits       0, 1, 2, ..., 9, A, B, C, D, E, F

  16 decimal = 0x10 hex
  20 decimal = 0x14 hex

• Four bits of the binary system are represented by a single hex digit: 8 + 4 + 2 + 1 = 15 decimal = 1111b = 0xF
• Decimal 26 in binary and hex: 16 + 8 + 2 = 26 decimal = 0001 1010b = 0x1A

Binary and Hexadecimal

Another useful number system is base 16, or hexadecimal (hex). After digit 9, the letters A to F are used to represent the values 10 through 15. The largest decimal number that can be represented with a single-digit hex number is 15, which is F in hex. To represent decimal number 16 in hex, the next digit position is used: 0x10 hex.

To distinguish hex numbers from decimal, we will use a preceding 0x. Another common convention is to follow the number with an 'h'. In this case, the first hexadecimal digit must be a decimal numeric digit (0-9) to avoid confusing the resulting string with a symbol in a program. For example, 0F3h would be a valid hexadecimal representation while F3h could be confused with a symbol. A similar convention to the trailing 'h' for hexadecimal is used for binary numbers by following the binary 1 and 0 digits with a 'b'.
Again, because the first character of the digit string is a numeric 0 or 1 and the string ends with a trailing 'b', the character string can only be a binary value.

Hex notation is very useful because large numbers can be represented with fewer digits than with the binary or decimal system. The hex format is also extremely convenient for digital or binary systems because each hex digit replaces exactly four binary digits. This is because the biggest single hex digit (0xF, or 0Fh) is represented exactly in four binary digits: 0xF hex = 1111b. The 8-bit binary representation of decimal 26 is 00011010b, or 0x1A in hex, which is much shorter. The hex system may look confusing at first, but when you need to convert to binary numbers, or represent large numbers, you will soon realize the benefits.

We will explain how to convert from hex to decimal and back in Chapter 6.

Slide: Signed Integers

• Signed magnitude integers

  Signed decimal   Hex           Sign   Number (binary)
   2               00 00 00 02   0      000 0000 0000 0000 0000 0000 0000 0010
   3               00 00 00 03   0      000 0000 0000 0000 0000 0000 0000 0011
  -2               80 00 00 02   1      000 0000 0000 0000 0000 0000 0000 0010
  -3               80 00 00 03   1      000 0000 0000 0000 0000 0000 0000 0011

Signed Integers

To perform arithmetic, we need to be able to represent signed numbers. In the binary system, the most significant bit (MSB) is used to indicate the sign of a number. When the MSB is set to 1, the number is negative, and when it is set to 0, the number is positive. Two conventions exist for representing signed numbers: signed magnitude and signed two’s complement (2’s complement).

The signed magnitude convention is familiar to us since this is how we represent negative decimal numbers. For example, a +/– symbol plays the role of the sign bit, so the negative of +10 is written as –10. However, this leaves an interesting question about the value of 0. Is it +0 or –0, or are they the same?
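The question about zero can be made concrete with a small Python sketch of the signed magnitude convention. The helper names sign_magnitude and from_sign_magnitude are hypothetical, introduced only for this illustration; the 32-bit width follows the slide table.

```python
def sign_magnitude(value, bits=32):
    """Encode an integer as a sign bit (MSB) followed by its magnitude."""
    magnitude = abs(value)
    if magnitude >= 2 ** (bits - 1):
        raise ValueError("magnitude does not fit in the remaining bits")
    sign = 1 if value < 0 else 0
    return (sign << (bits - 1)) | magnitude


def from_sign_magnitude(word, bits=32):
    """Decode a sign-magnitude word back to an integer."""
    magnitude = word & (2 ** (bits - 1) - 1)
    return -magnitude if word >> (bits - 1) else magnitude


print(f"{sign_magnitude(2):08X}")    # 00000002
print(f"{sign_magnitude(-2):08X}")   # 80000002

# Two different bit patterns both decode to zero: "+0" and "-0".
POSITIVE_ZERO = 0x00000000
NEGATIVE_ZERO = 0x80000000
print(from_sign_magnitude(POSITIVE_ZERO) == from_sign_magnitude(NEGATIVE_ZERO))
```

The last line prints True: signed magnitude spends two distinct bit patterns on the same value, which hardware would have to treat as equal.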
Another issue is how to simplify the digital hardware in a DSP or microprocessor, since smaller and faster circuits are an advantage.

Slide: Two’s Complement Notation

  Digit number             7      6     5     4     3     2     1     0
  Weight                   –2^7   2^6   2^5   2^4   2^3   2^2   2^1   2^0
  Decimal value            –128   64    32    16    8     4     2     1

  3 in two’s complement    0      0     0     0     0     0     1     1      +2 +1 = 3
  –2 in two’s complement   1      1     1     1     1     1     1     0      –128 +64 +32 +16 +8 +4 +2 +0 = –2

The preceding Signed Integers table shows binary notation for the signed integers 2, 3, –2 and –3 using a signed 32-bit system. The first two, positive, integers are represented in the standard fashion. The last two, negative, numbers have the top bit set to 1, but the rest of the representation remains the same. This sign-representation system is not convenient for a binary or digital machine. The machine needs to assess the sign bit and then carry out addition or subtraction, depending on the direction of the sign bit. A more convenient system would use two’s complement notation to perform both addition and subtraction with the same hardware.

Two’s Complement Notation

To make it easier to understand two’s complement notation, our example uses an 8-bit binary representation. For positive numbers, such as the example +3, the MSB is set to 0, and the other bit values are exactly the same as in standard binary notation.

The two’s complement notation of a negative number is quite different. If the MSB is set to 1, the MSB represents a negative value for that bit position. The top bit in an 8-bit system would therefore represent –2^7, or –128, with the rest of the bits again representing positive values. The sum of the decimal values of all the bits that are set gives us the number’s decimal value. To represent –2 in two’s complement, the top bit is set to 1, representing –128, and the lower-order bits are set to make the sum of all bits equal to –2. In this case, –2 = –128 + 126.
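The weighted-sum reading of two’s complement described above can be sketched in Python. The function names are hypothetical helpers for this illustration only; the decoder gives the MSB a negative weight and every other bit a positive weight, exactly as in the slide.

```python
def twos_complement_value(bit_string):
    """Sum the weights of the set bits; the MSB weight is negative."""
    bits = len(bit_string)
    total = 0
    for i, b in enumerate(bit_string):
        position = bits - 1 - i
        weight = -(2 ** position) if i == 0 else 2 ** position
        if b == "1":
            total += weight
    return total


def to_twos_complement(value, bits=8):
    """Return the two's complement bit string of a signed integer."""
    if not -(2 ** (bits - 1)) <= value < 2 ** (bits - 1):
        raise ValueError("value does not fit in the given number of bits")
    return format(value & (2 ** bits - 1), f"0{bits}b")


print(to_twos_complement(3))               # 00000011
print(to_twos_complement(-2))              # 11111110
print(twos_complement_value("11111110"))   # -2
```

The last result is the slide’s sum: –128 + 64 + 32 + 16 + 8 + 4 + 2 = –2.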
The hardware method used to implement a two’s complement converter and adder is even simpler. This method negates a number by simply inverting all the bits and adding a 1 (as a carry bit) to the LSB. Adding a 1 to the LSB can cause a carry into the upper bits, which may ripple all the way to the top bit. During addition, each binary bit cell receives two bits from the two operands, plus any carry that may propagate from the next lower-bit cell. The 1 that is added into the LSB is simply implemented as a carry bit, as if it were coming from the next lower-bit cell.

The largest negative and positive values in two’s complement form for an 8-bit system are as shown:

  Most positive   +1 x 2^7 – 1 = +127   0111 1111b   0 +64 +32 +16 +8 +4 +2 +1
  Most negative   –1 x 2^7 = –128       1000 0000b   –128 +0

A 16-bit system would have a range of

  Most positive   +1 x 2^15 – 1 = +32767
  Most negative   –1 x 2^15 = –32768

For a 32-bit system such as a TMS320C31 32-bit processor

  Most positive   +1 x 2^31 – 1 = +2147483647
  Most negative   –1 x 2^31 = –2147483648

Slide: Fixed-Point Notation

Conventions
• Number range is between 1 and –1
• Decimal point is always in a fixed location (e.g., 0.74, 0.34, etc.)
• Multiplying a fraction by a fraction always results in a fraction and will not produce an overflow (e.g., 0.99 x 0.9999 = less than 1)
• Successive additions may cause overflow

Why?
• Signal processing is multiplication-intensive
• Fixed-point notation prevents overflow (useful with a small dynamic range)
• Fixed-point notation is less expensive

How is fixed-point notation realized in a DSP?
• Most fixed-point DSPs are 16 bits
• The range of numbers that can be represented is 32767 to –32768
• The most common fixed-point format is Q15

Q15 notation: bit 15 is the sign; bits 14 to 0 are the two’s complement number.

Fixed-Point Notation

Fixed-point notation, sometimes called fractional-point notation or ’Q’ format, uses an implied binary point to represent binary fractions. This point always remains at a fixed location.

The dynamic range of a processor is the range between the smallest and the largest number it can represent. In a 16-bit processor, the dynamic range is limited to 32767 to –32768. Such a small dynamic range can easily create overflows. For example, 200 x 350 = 70000, which is an overflow! However, if the number range is limited, or more precisely scaled, to +1 to –1, a multiplication can essentially never produce an overflow. The product of two fractional numbers within the range of 1 to –1 must always be a fraction as well, so the result is confined to the range of 1 to –1. (The one exception is –1 x –1 = +1, which falls just outside the largest representable fraction.)

Unfortunately, successive additions can produce overflow values outside the range of 1 to –1. This point should be remembered when performing fixed-point arithmetic. Signal processing is both multiplication and addition intensive. An overflow can have serious consequences (e.g., unintentionally clipping a large signal). A fixed-point system can solve this problem either by checking for overflows after each math operation, or by knowing that the inputs and outputs of the operation are bounded or well behaved.

Why Use a Fixed-Point System?

The cost of implementing many DSP systems is strongly dependent on the amount of chip silicon used to get the job done, with most of the chip silicon being either in the processor or in the surrounding memory.
If the chip silicon is mostly used for data storage, such as long audio delay buffers, video or coefficient tables, the difference between 16- and 32-bit data storage can be as much as 2:1. Furthermore, routing twice as many signals around the chip and system board can consume extra space and drive up the power consumption.

Another advantage of short 16-bit fixed-point chips is that by making the core processor small, not only are the chips smaller and less expensive, they are also usually a bit faster. This may again lower the price of the DSP chip, which is an important consideration in price-sensitive volume applications. However, if a 16-bit system must also perform 32-bit operations, these advantages can be lost and the system can end up costing more. If a system can tolerate a smaller dynamic range and resolution, then the use of 16-bit data can be an economic advantage.

Fixed-Point Q Notation

As we have seen in multiplication and addition, overflows can be a problem for fixed-point DSPs. To reduce this problem, a programming convention called Q format is introduced, in which fixed-point DSPs operate on fractional numbers whose products, by definition, stay within the fractional range. The principle of Q notation is the application of a simple scaling coefficient to convert fractions to the integers that a fixed-point DSP is designed to handle. (Note that this is not an issue for floating-point DSPs.)

The letter Q represents the ‘Quantity of fractional bits’, and the number following the Q indicates the number of bits that are used for the fraction. This divides the number into an upper and a lower region of bits, where the upper region contains the sign bit and any whole integer values, and the lower bits hold the fraction.

Any Q format is possible, but Q15 is the most widespread in 16-bit DSPs and Q31 is most often used for 32-bit DSPs. In Q15 format, with bits numbered from 0 at the LSB as on the slide, an imaginary binary point is placed between bit 15 (the sign bit) and bit 14.
The upper region in this case is only the MSB (for a 16-bit DSP), which is essentially the sign bit, or bits 15–31 in a 32-bit DSP (numbering bits from 0 at the LSB). The remaining 15 bits are used to represent the fractional part of the number.

To convert a Q-format integer to a floating-point value, a scaling coefficient is needed. If the Q number is 15, the coefficient or resolution of the fraction will be 2^–15, or 30.518e–6.

Slide: Q15 Format

Dynamic range in Q15

  Number     Fractional number   Scaled integer for Q15
  Biggest     0.999               32767
  Smallest   –1.000              –32768

Number representations in Q15 (Q15 = Decimal x 2^15)

  Decimal   Calculation      Q15 integer
  0.5       0.5 x 32767      16384
  0.05      0.05 x 32767     1638
  0.0012    0.0012 x 32767   39

Rules for operations
• Avoid operations with numbers larger than 1:
  2.0 x (0.5 x 0.45) = (0.2 x 0.5 x 0.45) x 10 = (0.5 x 0.45) + (0.5 x 0.45)
• Scale numbers before the operation: 0.5 in Q15 = 0.5 x 32767 = 16384

Dynamic Range in Q15

The dynamic range, or ratio of largest to smallest magnitude levels, is the same for Q15 and normal integers. It is the scaling coefficient that sets the two apart; other than this, you may have difficulty knowing which format is in use. As mentioned previously, to prevent overflows the inputs and outputs can be constrained to fractions in the range of 1 to –1 by simply applying a scaling coefficient.

Number Representation in Q15

Scaling a number is simple:

  Integer = Q15_fractional_number x 2^15

The second table on the slide shows several examples of scaling.

Rules for Operations

The most important rule in using the Q15 fixed-point format is to avoid using a number larger than 1 or smaller than –1. There are some instances where this can be safely violated. For example, a property of a 2’s complement adder is that if an addition overflows, exceeding the available 16-bit range, a subsequent subtraction can unwrap the result back down into the valid range. Generally, however, it is best to avoid the problem in the first place.
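The scaling rule and the wrap-around property of a two’s complement adder can be sketched as follows. This is a minimal illustration, not a description of any particular DSP: the helper names are hypothetical, the scale factor is taken as 2^15 from the formula above, and rounding to nearest with clamping at 32767 is an assumption made so that +1.0 stays representable.

```python
Q15_SCALE = 2 ** 15   # 32768, from Integer = fraction x 2^15


def float_to_q15(x):
    """Scale a fraction to a Q15 integer, clamped to the 16-bit range (assumption)."""
    n = round(x * Q15_SCALE)
    return max(-32768, min(32767, n))


def q15_to_float(n):
    """Scale a Q15 integer back to a fraction."""
    return n / Q15_SCALE


def wrap16(n):
    """Keep the low 16 bits and reinterpret them as two's complement."""
    n &= 0xFFFF
    return n - 0x10000 if n & 0x8000 else n


print(float_to_q15(0.5), float_to_q15(0.05), float_to_q15(0.0012))   # 16384 1638 39

# 0.75 + 0.5 overflows the 16-bit range, but subtracting 0.5 again
# unwraps the intermediate result back to the correct value.
a, b = float_to_q15(0.75), float_to_q15(0.5)
overflowed = wrap16(a + b)           # wraps to a negative number
recovered = wrap16(overflowed - b)   # back in range
print(overflowed, recovered, q15_to_float(recovered))
```

The intermediate sum is wrong on its own; only the final result is valid, which is why the text recommends avoiding the situation where possible.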
If a dynamic range greater than 32767 to –32768 (i.e., a 16-bit system) is required, it is also possible to perform longhand arithmetic in pieces, but this consumes CPU cycles and data memory.

The bottom portion of the slide shows an example where multiplying 0.5 and 0.45 (unscaled for clarity) results in another fraction, which is not a problem. Multiplying the product by 2 can be done using two methods. One method is to multiply one of the inputs by 2 first. If the result of this intermediate operation exceeds +/–1.0, we will have a problem. The inputs could be scaled down first and then scaled up afterwards, but this is also far from efficient. An alternative method is to add the product to itself, effectively multiplying by 2. This is one of the difficulties of using fixed-point operations. The programmer needs to think about these issues and plan ahead.

Another important rule is that all numbers must be scaled to the same Q format (Q15 in our examples), placing the binary points of both operands in the same place, before an addition or subtraction is performed. Generally, this is also practiced in multiplication. However, since the scaling coefficients are multiplied, the correct fraction can be retrieved using yet another scaling constant. Nevertheless, mixing Q formats is not desirable.

Slide: Q15 Operations

Addition

  Decimal              Q15                     Scale back (Q15 / 32767)
  0.5 + 0.05 = 0.55    16384 + 1638 = 18022    0.55
  0.5 – 0.05 = 0.45    16384 – 1638 = 14746    0.45

Multiplication: 2 x 0.5 x 0.45

  Decimal                Q15                           Back to Q15 (product / 32767)   Scale back (Q15 / 32767)
  0.5 x 0.45 = 0.225     16384 x 14745 = 241582080     7373                            0.225
  0.225 + 0.225 = 0.45   7373 + 7373 = 14746                                           0.45

Q15 Addition

Q15-format addition is shown in the first example above. The numbers 0.5 and 0.05 are each scaled by 32767 (the Q15 coefficient) and then added.
Since both numbers are scaled to the same Q format, the binary point in both will be in the same place (bit 15). The sum is then scaled back to verify the result. In the second example, the correct subtraction (sum of two’s complement) is 14746, and the expected scaled result is 14746 / 32767 = 0.45.

Q15 Multiplication

When scaled numbers are multiplied, the scaling coefficients are multiplied. To compensate, a second scaling factor that will put the data in the correct bit position is used. The Q15 multiplication shown gives an idea of how large the numbers can get. But as we can also see, the division by the Q15 coefficient scales the number back down and we get the correct result.

Anticipating this, the multiplier on a 16-bit DSP produces a 32-bit result. In actuality, the result is packed into the upper bits and comes with two sign bits. The programmer can either downshift to the lower 16 bits, or left-shift by one bit before storing the upper bits. Both methods will produce the same result, but the DSP is usually optimized to do the up-shift by 1 bit and store, so this is normally what is done.

We can see how a Q15 x Q15 multiply works by examining the process in longhand. In particular, when the scaling coefficients are multiplied, the result is a new scaling constant with a Q value equal to the sum of the two Q constants used on the operands. Given A and B in Q15 format, the result C = A x B is

  C = (A x 2^15) x (B x 2^15) = A x B x 2^30

and, in general, for operands in Qx and Qy format,

  C = (A x 2^x) x (B x 2^y) = A x B x 2^(x+y)

It is evident that the output is no longer in Q15 format. To compensate, we need to ask where the new binary point is. By noting that 2^30 is the same as saying Q30, we know that the binary point is at bit 30 of the 32-bit result. To get back to 2^15 (Q15), we can multiply by 2^–15 (a shift right by 15), or multiply yet again by 2^1 to reach a Q31 result.
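The two ways of returning a Q30 product to Q15 can be sketched in Python. The function names are hypothetical, the operands are assumed to be scaled by 2^15 (so 0.45 becomes 14746 rather than the slide’s 14745), and Python’s unbounded integers stand in for the 32-bit product register, so the sketch ignores the –1 x –1 corner case.

```python
def q15_mul_shift_right(a, b):
    """Q15 x Q15 gives a Q30 product; shift right by 15 to return to Q15."""
    return (a * b) >> 15


def q15_mul_upper_word(a, b):
    """Shift the Q30 product left by 1 (Q31) and keep the upper 16 bits."""
    return ((a * b) << 1) >> 16


half = 16384          # 0.5 in Q15
point_45 = 14746      # 0.45 in Q15, scaled by 2^15 (assumption)

product = q15_mul_shift_right(half, point_45)
print(product, product / 2 ** 15)              # 7373, about 0.225
print(q15_mul_upper_word(half, point_45))      # same integer
print(product + product)                       # 14746, i.e. 2 x 0.5 x 0.45
```

Both functions return the same integer; the second variant corresponds to the multiply by 2^1 up to a Q31 result.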
With the shift to Q31, the correct bits are packed into the upper 16 bits of the DSP register.

The fixed-point Q format has the advantage of preventing overflows but certainly introduces complications for the DSP programmer.

Slide: TMS Floating-Point Format

TMS single-precision floating-point format

  Bit no.   31 ... 24   23      22 ... 0
  Field     e           s       f
  Width     8 bits      1 bit   23 bits

  e = exponent; a signed two’s complement 8-bit field that determines the location of the binary Q point
  s = sign of mantissa (s = 0 positive, s = 1 negative)
  f = fractional part of the mantissa; an implied 1.0 is added to this fraction but is not allocated in the bit field since this value is always present

Conversion equations

  Equation   Sign    Binary            Decimal
  1          s = 0   X = 01.f x 2^e    X = 01.f x 2^e
  2          s = 1   X = 10.f x 2^e    X = (–2 + 0.f) x 2^e

  Special case: s = 0, e = –128 gives X = 0

Exponent (e)

  Decimal                  0    1    127   –1   –128
  Hex (two’s complement)   00   01   7F    FF   80

Copyright 1998, Texas Instruments Incorporated. All Rights Reserved.

Floating-Point Formats

Although the C54x device is fixed-point, a popular floating-point standard (used, for example, in TMS320C67x devices) is IEEE 754. The differences between the various floating-point formats are actually insignificant, and conversion can be performed in ASIC hardware or software.

TMS320 Single-Precision Floating-Point Format

The preceding table shows an example of the TMS320C67x floating-point bit assignments. The top eight bits represent the exponent (e) in two’s complement notation. Bit 23 (s) is the sign bit of the mantissa, and the lower 23 bits are the fraction (f) of the mantissa. A value of 1.0 is also implied in the mantissa, but is not allocated a bit position since it is always present. This format is called floating-point because the implied binary point floats around, depending on how large the exponent is.
The exponent is essentially a variable Q value that is automatically adjusted for maximum precision and range by the hardware.

Conversion Equations

The middle table on the slide shows the conversion equations for the TMS320 single-precision floating-point format. The second column shows the binary version and the third column shows the decimal version of the same equation. The decimal version of the equation is easier to understand. There are two different equations, one for a positive and one for a negative mantissa. We will use decimal examples of both equations to aid in the understanding of this format.

The representation of 0.0 is a special case where any number with an exponent of –128 (0x80) is treated as zero. Since –128 is the smallest possible value for the exponent, the scaling coefficient for these numbers would produce very small values. The convention used in the assembler is to represent zero as 0x80000000. For example, all of the following numbers are treated as the value 0:

• 0x80000000
• 0x80123456
• 0x80876345

This is a special case worth remembering.

Slide: Floating-Point Numbers

Calculate 1.0e0
  In hex:      00 00 00 00
  In binary:   0000 0000 0000 0000 0000 0000 0000 0000
  s = 0, so Equation 1 applies: X = 01.f x 2^e
  f = 0, e = 0
  X = 01.0 x 2^0 = 1.0

Calculate 1.5e01
  In hex:      03 70 00 00
  In binary:   0000 0011 0111 0000 0000 0000 0000 0000
  s = 0, so Equation 1 applies: X = 01.f x 2^e
  e = 3, f = 0.5 + 0.25 + 0.125 = 0.875
  X = 01.875 x 2^3 = 15.0 decimal

Floating-Point Numbers

Let us now find the binary representation of 1.0e0. Since this is a positive number, the sign bit s = 0. Therefore, Equation 1 applies. The fractional part of the mantissa (f) is 0 (f = 0), and the exponent (e) is also 0 (e = 0). Now that we know the decimal values for all the appropriate parts, we can express the 32-bit binary format.
The fractional part of the mantissa (f) is zero, and is represented by setting bits 0 to 22 and the sign bit to 0. This leaves the top eight bits for the exponent (e), which are also set to 0. The top part of the slide shows the binary and hex representation for 1.0e0.

The binary representation of the floating-point number decimal 1.5e01 is next. This number is positive, which implies that the sign bit s = 0 and that Equation 1 applies. Knowing that 1.875 x 8 = 15, the fractional part of the mantissa is 0.875 (the 1.0 is implied) and the exponent e = 3 (2^3 = 8). Adding the fractions 0.5, 0.25 and 0.125 together yields 0.875, which corresponds to setting the top three bits (20, 21 and 22) of the fraction to 1. The binary representation of the floating-point value is shown in the bottom part of the slide.

Calculating negative floating-point numbers is slightly different. Although it is important to understand how binary representations correlate with decimal floating-point values, it is rarely necessary to perform the calculation.

Slide: More on Floating Point

Calculate –2.0e0
  In hex:      00 80 00 00
  In binary:   0000 0000 1000 0000 0000 0000 0000 0000
  s = 1, so Equation 2 applies: X = (–2.0 + 0.f) x 2^e
  f = 0, e = 0
  X = (–2.0 + 0.0) x 2^0 = –2.0

Addition:         1.5 + (–2.0) = –0.5
Multiplication:   1.5e00 x 1.5e01 = 2.25e01 = 22.5

Negative Floating-Point Numbers

The binary representation in TMS320 format for –2.0e0 is now considered. Since the number is negative, the sign bit s = 1, and Equation 2 is applied. The mantissa is actually in two’s complement, so the fraction f (0.f) is added to a decimal value of –2.0. The mantissa, –2.0 + f, is then multiplied by the exponent multiplier, 2^e. To arrive at a value of –2.0, the fraction (f) and the exponent (e) are therefore both 0, with the sign bit set to 1. The binary representation for –2.0e0 is shown on the top portion of the slide.
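The two conversion equations and the zero special case can be collected into a short Python sketch. The function name decode_tms320_float is hypothetical; the bit layout (e in bits 31 to 24, s in bit 23, f in bits 22 to 0) and the equations are taken from the slides above, and treating every word with e = –128 as zero follows the text.

```python
def decode_tms320_float(word):
    """Decode a 32-bit word laid out as e (bits 31..24), s (bit 23), f (bits 22..0)."""
    e = (word >> 24) & 0xFF
    if e >= 0x80:
        e -= 0x100                      # 8-bit two's complement exponent
    s = (word >> 23) & 0x1
    f = (word & 0x7FFFFF) / 2 ** 23     # fraction 0.f
    if e == -128:
        return 0.0                      # special case: exponent -128 means zero
    if s == 0:
        return (1.0 + f) * 2.0 ** e     # Equation 1: X = 01.f x 2^e
    return (-2.0 + f) * 2.0 ** e        # Equation 2: X = (-2 + 0.f) x 2^e


print(decode_tms320_float(0x00000000))   # 1.0
print(decode_tms320_float(0x03700000))   # 15.0
print(decode_tms320_float(0x00800000))   # -2.0
print(decode_tms320_float(0x80123456))   # 0.0
```

The four calls reproduce the worked examples: 1.0e0, 1.5e01, –2.0e0, and one of the patterns that is treated as zero.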
Addition and Multiplication

Addition and multiplication of floating-point numbers are simplicity itself. The bottom portion of the slide shows an example of each. The DSP programmer does not need to do any scaling or take any special precautions before or after an addition, subtraction or multiplication, since this is all done in hardware. This is one of the reasons why it is easier to program floating-point DSP devices.

Slide: Dynamic Range

Ranges of number systems

  Number                    Base 2                 Decimal              Two’s complement hex
  Largest integer           2^31 – 1               2 147 483 647        7F FF FF FF
  Smallest integer          –2^31                  –2 147 483 648       80 00 00 00
  Largest Q15               2^15 – 1               32 767               7F FF
  Smallest Q15              –2^15                  –32 768              80 00
  Largest floating point    (2 – 2^–23) x 2^127    3.402823 x 10^38     7F 7F FF FD
  Smallest floating point   –2 x 2^127             –3.402823 x 10^38    83 39 44 6E

• The dynamic range of floating-point representation is very large
• Conclusion
  • Largest integer x (1.5 x 10^29) ≈ largest floating point
  • Largest Q15 x (1.03 x 10^34) ≈ largest floating point

Comparison of Dynamic Ranges

The dynamic range in a number system means the distance, in unit steps, between the largest and smallest number in that system. The larger the dynamic range is, the less potential it has of creating overflow conditions. Some signal-processing applications need a larger dynamic range than others. For example, a radar application may be trying to extract a tiny signal of only a few µV buried in noise with an average level of several volts.

The top table on the slide shows the dynamic ranges of a 32-bit signed integer notation, a fixed-point Q15 format used by 16-bit fixed-point DSPs, and a 32-bit TMS320 single-precision floating-point format.
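The constants in the slide’s conclusion can be checked with a few lines of Python. This is only a rough check using the base-2 expressions from the table; the variable names are introduced for this illustration.

```python
largest_int32 = 2 ** 31 - 1                    # largest 32-bit signed integer
largest_q15 = 2 ** 15 - 1                      # largest Q15 scaled integer
largest_float = (2 - 2 ** -23) * 2.0 ** 127    # largest TMS320 single-precision value

ratio_int = largest_float / largest_int32      # factor from largest integer to largest float
ratio_q15 = largest_float / largest_q15        # factor from largest Q15 to largest float

print(f"{largest_float:.6e}")
print(f"{ratio_int:.3e}")
print(f"{ratio_q15:.3e}")
```

The two ratios come out on the order of 10^29 and 10^34, close to the approximate constants quoted on the slide.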
It is clear that the TMS320 floating-point format has a larger dynamic range, but to fully appreciate the difference in dynamic range, you must multiply the largest integer by a constant value to reach the biggest value in TMS320 single-precision floating-point format. This constant is very large, indicating the vast difference in the dynamic ranges of the 32-bit signed integer notation and the TMS320 single-precision floating-point notation. The same comparison with the Q15 format reveals an even bigger difference.

Clearly, the TMS320 single-precision floating-point format has a much larger dynamic range than the other number systems. The TMS320C67x single-precision floating-point architecture is just one reason for its popularity in certain signal-processing applications.

Note that the resolution in TMS320 single-precision floating-point is 24 bits. This extra precision is a big benefit in applications such as digital audio. Humans also tend to respond to audio in a logarithmic way that is very similar to floating point. Applications and demonstration examples that take advantage of this can be downloaded from the Texas Instruments web site.

Slide: Fixed vs. Floating Point

• DSP devices are designed as floating point or fixed point
• Fixed-point devices are usually 16 bits, e.g. TMS320C5x
• Floating-point devices are usually 32 bits, e.g. TMS320C3x
• Floating-point devices usually have a full set of fixed-point instructions
• Floating-point devices are easier to program
• Fixed-point devices can emulate floating point in software

Comparison

  Characteristic        Floating point    Fixed point
  Dynamic range         much larger       smaller
  Resolution            comparable        comparable
  Speed                 comparable        comparable
  Ease of programming   much easier       more difficult
  Compiler efficiency   more efficient    less efficient
  Power consumption     comparable        comparable
  Chip cost             comparable        comparable
  System cost           comparable        comparable
  Design cost           less              more
  Time to market        faster            slower

DSP Devices

DSP devices are designed as fixed- or floating-point devices. The design philosophy, data paths, and internal modules of each device are different. Generally, fixed-point devices address high-volume and inexpensive applications while floating-point devices target high-performance applications. But these differences are becoming hard to distinguish because the price of floating-point devices continues to fall.

Fixed-point devices, such as the TMS320C54x device, are usually 16 bits with fewer external pins. Floating-point devices, such as the TMS320C67x device, are commonly 32 bits. Floating-point devices usually have a full set of fixed-point instructions and can be used as fixed-point processors without any speed penalty. Fixed-point devices can emulate floating-point devices in software, but there is a speed penalty because the conversion from fixed to floating point is performed in software.

Comparison of Fixed vs. Floating-Point Devices

The comparison table of fixed- and floating-point devices shows the key characteristics of each system. Floating point relieves the designer of most dynamic-range considerations in the design, but can cost more in CPU and additional memory.
Fixed-point systems tend to be slightly faster and to consume less power, yet the parallelism and greater precision of 32-bit data can sometimes easily outweigh any speed penalty. Floating-point devices are much easier to program; there is less concern with scaling, dynamic-range issues and, in most cases, resolution. Resolution is often determined by bus width, but this also drives system cost.

C compilers for floating-point devices are much more efficient than C compilers for fixed-point devices. The primary reason for this is that the fixed-point devices do not have large register sets and therefore need software modules for number conversions to provide a reasonable C interface. For example, when the C programmer declares a floating-point number, an assembler routine needs to convert this format into fixed-point format and back again after the processor has executed the necessary operations.

Fixed-point devices typically have 1 or 2 accumulators, whereas the C67x floating-point family has sixteen 32-bit registers that can be used for math operations. Having more registers to work with is an advantage for a C compiler. These are important points in choosing a device for an application. Programs written in C will tend to favor floating-point devices.

Power consumption depends heavily on both the system architecture and the software that is used. In a CMOS design, power is consumed when a capacitive node is charged from one supply rail to the other. If the change in state does not occur, no power is consumed. Since the processor, memory and surrounding system board may consist of millions of internal and external nodes, it is important to toggle as few as possible to complete a given task. The other variable is the capacitance of each node, which should be minimized. Simply put, more toggles per second and higher capacitance equate to more power usage.
Power consumption is therefore related to clock rates and data-bus width. If it takes fewer cycles to get the same job done on a wider bus, the net power usage may be the same or even better. For example, a 16-bit fixed-point device might use a similar amount of power to a 32-bit floating-point device that uses only 16 bits of its data bus.

The cost of floating-point chips is also becoming comparable to that of traditional fixed-point devices. Floating-point system costs need not be high just because the devices use 32 bits internally. Minimum systems are made possible through the efficient use of internal RAM and fewer external components. The DSP Starter Kits and all of the applications that run on them are an excellent example of a minimal system.

The cost of programming, measured in the salary dollars paid out to a programmer, can also be a deciding factor. Selling many units to a very large market can absorb the extra time required for a fixed-point design. For smaller markets, or when time to market is important, the low design costs of floating point are very beneficial.

Selecting a device for a particular application is a complex decision and should be considered carefully along with any other points that are specific to the design. Our discussion highlighted some of the more important considerations.

[Slide: TMS320 Family]

16-Bit Fixed-Point Devices
’C1x     Hard-Disk Controllers
’C2x     Fax Machines
’C2xx    Embedded Control
’C5x     Voice Processing
’C54x    Digital Cellular Phones

32-Bit Floating-Point Devices
’C3x     Videophones
’C4x     Parallel Processing

Other Devices
’C6x     Advanced VLIW Processor; Wireless Base Stations/Pooled Modems
’C8x     Video Conferencing

TMS320 Family

The Texas Instruments TMS320 family of DSP devices covers a wide range, from a 16-bit fixed-point device to a single-chip parallel-processor device. In the past, DSPs were used only in specialized applications.
Now they are in many mass-market consumer products that are continuously entering new market segments. Let us briefly consider the Texas Instruments TMS320 family of DSP devices and their typical applications.

C1x, C2x, C2xx, C5x, C54x

The width of the data bus on these devices is 16 bits. All have modified Harvard architectures. They have been used in toys, hard disk drives, modems, cellular phones, and active car suspensions.

C3x

The width of the data bus in the C3x series is 32 bits. Because of their reasonable cost and floating-point performance, these devices are suitable for many applications. These include filters of almost any kind, analyzers, hi-fi systems, voice mail, imaging, bar-code readers, motor control, 3D graphics, and scientific processing.

C4x

This range is designed for parallel processing. The C4x devices have a 32-bit data bus and are floating-point devices. They have an optimized on-chip communication channel, which enables a number of them to be put together to form a parallel-processing cluster. The C4x range devices have been used in virtual reality, image recognition, telecom routing, and parallel-processing systems.

C6x

The C6x devices feature VelociTI, an advanced very long instruction word (VLIW) architecture developed by Texas Instruments. Eight functional units, including two multipliers and six arithmetic logic units (ALUs), provide 1600 MIPS of cost-effective performance. The C6x DSPs are optimized for multi-channel, multifunction applications, including wireless base stations, pooled modems, remote-access servers, digital subscriber loop systems, cable modems, and multi-channel telephone systems.

C8x

The C80 is the first processor in this range. It has parallel processing on a single piece of silicon with four advanced DSPs (ADSPs) and a RISC master processor. It is used in high-performance video telephony, 3D computer graphics, virtual reality, and a number of multimedia applications.
A lower-cost version, the C82, features two ADSPs and the RISC master processor.

[Slide: TMS320C54x Architecture. Block diagram whose elements include the system control interface; the program address generation logic (PAGEN) with PC, IPTR, RC, BRC, RSA, and REA; the data address generation logic (DAGEN) with ARAU0, ARAU1, AR0-AR7, ARP, BK, DP, and SP; the memory and external interface; the peripheral interface; the PAB/PB, CAB/CB, DAB/DB, and EAB/EB buses; the EXP encoder; the T register; the 17 × 17 multiplier with fractional control, 40-bit adder, zero detect, saturation, and rounding; the 40-bit ALU; accumulators A(40) and B(40); the barrel shifter; and the compare unit with MSW/LSW select, TRN, and TC. Legend: A = accumulator A, B = accumulator B, C = CB data bus, D = DB data bus, E = EB data bus, M = MAC unit, P = PB program bus, S = barrel shifter, T = T register, U = ALU.]

Copyright 1998, Texas Instruments Incorporated. All Rights Reserved.

TMS320C54x Architecture

The C54x DSPs use an advanced modified Harvard architecture that maximizes processing power with eight buses. Separate program and data spaces allow simultaneous access to program instructions and data, providing a high degree of parallelism. For example, three reads and one write can be performed in a single cycle. Instructions with parallel store and application-specific instructions fully utilize this architecture. In addition, data can be transferred between data and program spaces. Such parallelism supports a powerful set of arithmetic, logic, and bit-manipulation operations that can all be performed in a single machine cycle. Also, the C54x includes the control mechanisms to manage interrupts, repeated operations, and function calling.

Fixed-point processors represent numbers as a magnitude and sign within a certain number of bits. For the C54x, it is 16 bits. This is in contrast to floating-point processors that represent numbers as magnitude multiplied by an exponent.
Fixed-point processors have a smaller dynamic range (the range of numbers that can be represented) than floating-point processors, but are also less complex and consequently less expensive. If the extended dynamic range is not needed, a fixed-point processor may be a more cost-efficient choice.

Bus Structure

The C54x architecture is built around eight major 16-bit buses (four program/data buses and four address buses):

• The program bus (PB) carries the instruction code and immediate operands from program memory.
• Three data buses (CB, DB, and EB) interconnect various elements, such as the CPU, data address generation logic, program address generation logic, on-chip peripherals, and data memory.
• The CB and DB carry the operands that are read from data memory.
• The EB carries the data to be written to memory.
• Four address buses (PAB, CAB, DAB, and EAB) carry the addresses needed for instruction execution.
• The C54x can generate up to two data-memory addresses per cycle using the two auxiliary register arithmetic units (ARAU0 and ARAU1).

The PB can carry data operands stored in program space (for instance, a coefficient table) to the multiplier and adder for multiply/accumulate operations or to a destination in data space for data move instructions (MVPD and READA). This capability, in conjunction with the dual-operand read feature, supports the execution of single-cycle, 3-operand instructions such as the FIRS instruction.

The C54x also has an on-chip bidirectional bus for accessing on-chip peripherals. This bus is connected to DB and EB through the bus exchanger in the CPU interface. Accesses that use this bus can require two or more cycles for reads and writes, depending on the peripheral’s structure. Table 1 summarizes the buses used by various types of accesses.

Table 1. Bus Usage for Read and Write Accesses

Access Type                   Address Bus(es)       Data Bus(es)
Program read                  PAB                   PB
Program write                 PAB                   EB
Data single read              DAB                   DB
Data dual read                CAB, DAB              CB, DB
Data long (32-bit) read       CAB (hw), DAB (lw)    CB (hw), DB (lw)
Data single write             EAB                   EB
Data read/data write          DAB, EAB              DB, EB
Dual read/coefficient read    PAB, CAB, DAB         PB, CB, DB
Peripheral read               DAB                   DB
Peripheral write              EAB                   EB

Legend: hw = high 16-bit word; lw = low 16-bit word

Internal Memory Organization

The C54x memory is organized into three individually selectable spaces: program, data, and I/O space. All C54x devices contain both random-access memory (RAM) and read-only memory (ROM). Among the devices, two types of RAM are represented: dual-access RAM (DARAM) and single-access RAM (SARAM). Table 2 shows how much ROM, DARAM, and SARAM are available on the different C54x devices. The C54x also has 26 CPU registers plus peripheral registers that are mapped in data-memory space.

Table 2. Program and Data Memory on the TMS320C54x Devices

Memory Type       ’541   ’542   ’543   ’545   ’546   ’548   ’549   ’5402   ’5410   ’5420
ROM:              28K    2K     2K     48K    48K    2K     16K    4K      16K     0
  Program         20K    2K     2K     32K    32K    2K     16K    4K      16K     0
  Program/data    8K     0      0      16K    16K    0      16K    4K      0       0
DARAM†            5K     10K    10K    6K     6K     8K     8K     16K     8K      32K
SARAM†            0      0      0      0      0      24K    24K    0       56K     168K

† You can configure the dual-access RAM (DARAM) and single-access RAM (SARAM) as data memory or program/data memory.
[Slide: ALU Functional Diagram. Elements shown include CB15-CB0, DB15-DB0, the T register, accumulators A and B, the 40-bit shifter output, the MAC output, sign control, the X and Y input multiplexers, the 40-bit ALU, and the SXM, OVM, C16, C, OVA/OVB, ZA/ZB, and TC bits. Legend: A = accumulator A, B = accumulator B, C = CB data bus, D = DB data bus, M = MAC unit, S = barrel shifter, T = T register, U = ALU.]

Central Processing Unit (CPU)

The C54x CPU contains:

• 40-bit arithmetic logic unit (ALU)
• Two 40-bit accumulators
• Barrel shifter
• 17 × 17-bit multiplier
• 40-bit adder
• Compare, select, and store unit (CSSU)
• Data address generation unit
• Program address generation unit

Arithmetic Logic Unit (ALU)

The C54x performs 2s-complement arithmetic with a 40-bit arithmetic logic unit (ALU) and two 40-bit accumulators (accumulators A and B). The ALU can also perform Boolean operations. The ALU uses these inputs:

• 16-bit immediate value
• 16-bit word from data memory
• 16-bit value in the temporary register, T
• Two 16-bit words from data memory
• 32-bit word from data memory
• 40-bit word from either accumulator

Accumulators

Accumulators A and B (see Figure 1) store the output from the ALU or the multiplier/adder block. They can also provide a second input to the ALU; accumulator A can be an input to the multiplier/adder. Each accumulator is divided into three parts:

• Guard bits (bits 39-32)
• High-order word (bits 31-16)
• Low-order word (bits 15-0)

Instructions are provided for storing the guard bits, for storing the high- and the low-order accumulator words in data memory, and for transferring 32-bit accumulator words in or out of data memory. Also, either of the accumulators can be used as temporary storage for the other.
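The three accumulator fields described above can be illustrated with a short sketch. This is a software model for classroom use, not device code; the function names `split_accumulator` and `to_signed40` are hypothetical.

```python
# Software model of a 40-bit accumulator (illustrative names, not TI APIs).
MASK40 = (1 << 40) - 1


def split_accumulator(acc):
    """Return (guard bits 39-32, high word 31-16, low word 15-0)."""
    acc &= MASK40                      # keep only the 40 accumulator bits
    guard = (acc >> 32) & 0xFF
    high = (acc >> 16) & 0xFFFF
    low = acc & 0xFFFF
    return guard, high, low


def to_signed40(acc):
    """Interpret a 40-bit pattern as a 2s-complement value."""
    acc &= MASK40
    return acc - (1 << 40) if acc & (1 << 39) else acc


if __name__ == "__main__":
    # Adding two maximum positive 32-bit values overflows 32 bits,
    # but the result still fits below the guard bits' sign position.
    total = 0x7FFFFFFF + 0x7FFFFFFF
    print([hex(part) for part in split_accumulator(total)])
    print(to_signed40(total))
```

The guard bits give headroom, so a run of additions can exceed the 32-bit range without the sign of the 40-bit result being corrupted.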
Barrel Shifter

The C54x barrel shifter has a 40-bit input connected to the accumulators or to data memory (using CB or DB), and a 40-bit output connected to the ALU or to data memory (using EB). The barrel shifter can produce a left shift of 0 to 31 bits and a right shift of 0 to 16 bits on the input data. The shift requirements are defined in the shift count field of the instruction, the shift count field (ASM) of status register ST1, or in the temporary register T (when it is designated as a shift count register).

The barrel shifter and the exponent encoder normalize the values in an accumulator in a single cycle. The LSBs of the output are filled with 0s, and the MSBs can be either zero filled or sign extended, depending on the state of the sign-extension mode bit (SXM) in ST1. Additional shift capabilities enable the processor to perform numerical scaling, bit extraction, extended arithmetic, and overflow prevention operations.

Multiplier/Adder Unit

The multiplier/adder unit performs 17 × 17-bit 2s-complement multiplication with a 40-bit addition in a single instruction cycle. In the C54x architecture, the multiplier is 17 × 17 bits wide so that it can multiply a signed number by an unsigned number. Although the original data from memory is 16 bits wide, each operand is extended to 17 bits before it reaches the multiplier: a signed operand is sign-extended into the 17th bit, and an unsigned operand has a 0 placed in the 17th bit.

The multiplier/adder block consists of several elements: a multiplier, an adder, signed/unsigned input control logic, fractional control logic, a zero detector, a rounder (2s complement), overflow/saturation logic, and a 16-bit temporary storage register (T). The multiplier has two inputs: one input is selected from T, a data-memory operand, or the high part of accumulator A; the other is selected from program memory, data memory, accumulator A, or an immediate value.

The fast, on-chip multiplier allows the C54x to perform operations such as convolution, correlation, and filtering efficiently.
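The multiply/accumulate operation at the heart of convolution, correlation, and filtering can be modeled in a few lines. The sketch below is a software illustration only: it assumes Q15 operands, it assumes that fractional mode shifts each product left by one bit, and its function names are hypothetical.

```python
# Software model of a multiply/accumulate loop (assumptions noted above).
MASK40 = (1 << 40) - 1


def wrap40(value):
    """Wrap an integer to a signed 40-bit accumulator value."""
    value &= MASK40
    return value - (1 << 40) if value & (1 << 39) else value


def mac_q15(samples, coeffs, fractional=True):
    """Sum of products of 16-bit operands, accumulated in 40 bits."""
    acc = 0
    for x, h in zip(samples, coeffs):
        product = x * h                # signed 16 x 16-bit product
        if fractional:
            product <<= 1              # assumed: Q15 x Q15 aligned to Q31
        acc = wrap40(acc + product)
    return acc


def high_word_q15(acc):
    """Bits 31-16 of the accumulator as a signed 16-bit (Q15) result."""
    word = (acc >> 16) & 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


if __name__ == "__main__":
    acc = mac_q15([16384], [16384])    # 0.5 x 0.5 in Q15
    print(high_word_q15(acc) / 32768)  # fraction stored in the high word
```

Each pass through the loop body corresponds to one multiply and one addition, which is the pair of operations the multiplier/adder unit completes in a single instruction cycle.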
In addition, the multiplier and ALU together execute multiply/accumulate (MAC) computations and ALU operations in parallel in a single instruction cycle. This function is used in determining the Euclidean distance and in implementing symmetrical and least mean square (LMS) filters, which are required for complex DSP algorithms.

Compare, Select, and Store Unit (CSSU)

The compare, select, and store unit (CSSU) performs maximum comparisons between the accumulator’s high and low word, allows both the test/control flag bit (TC) in status register ST0 and the transition register (TRN) to keep their transition histories, and selects the larger word in the accumulator to store into data memory. The CSSU also accelerates Viterbi-type butterfly computations with optimized on-chip hardware.

On-Chip ROM

The on-chip ROM is part of the program memory space and, in some cases, part of the data memory space. The amount of on-chip ROM available on each device varies, as shown in Table 2.

On devices with a small amount of ROM (2K words), the ROM contains a bootloader that is useful for booting to faster on-chip or external RAM. For bootloading details on all C54x devices except the ’548 and ’549, see TMS320C54x DSP Reference Set, Volume 4: Applications Guide, SPRU173.

On devices with larger amounts of ROM, a portion of the ROM may be mapped into both data and program space (except the ’5410). The larger ROMs are also custom ROMs: you provide the code or data to be programmed into the ROM in object file format, and Texas Instruments generates the appropriate process mask to program the ROM.

On-Chip Dual-Access RAM (DARAM)

The DARAM is composed of several blocks. Because each DARAM block can be accessed twice per machine cycle, the central processing unit (CPU) and peripherals such as the buffered serial port (BSP) and host port interface (HPI) can read from and write to a DARAM memory address in the same cycle.
The DARAM is always mapped in data space and is primarily intended to store data values. It can also be mapped into program space and used to store program code.

On-Chip Single-Access RAM (SARAM)

The SARAM is composed of several blocks. Each block is accessible once per machine cycle for either a read or a write. The SARAM is always mapped in data space and is primarily intended to store data values. It can also be mapped into program space and used to store program code.

On-Chip Memory Security

The C54x maskable memory security option protects the contents of on-chip memories. When you designate this option, no instruction that has originated externally can access the on-chip memory spaces.

[Slide: Two C54x Memory Maps]

’541 Program Memory
0000h-13FFh    OVLY = 0: External
               OVLY = 1: 0000h-007Fh Reserved; 0080h-13FFh On-chip DARAM
1400h-8FFFh    External
9000h-FF7Fh    MP/MC = 0: On-chip ROM
               MP/MC = 1: External
FF80h-FFFFh    MP/MC = 0: Interrupt vectors (internal)
               MP/MC = 1: Interrupt vectors (external)

’541 Data Memory
0000h-005Fh    Memory-mapped registers
0060h-007Fh    Scratch-pad DARAM
0080h-13FFh    On-chip DARAM
1400h-DFFFh    External
E000h-FFFFh    DROM = 0: External
               DROM = 1: E000h-FEFFh On-chip ROM; FF00h-FFFFh Reserved

Memory-Mapped Registers

The data memory space contains memory-mapped registers for the CPU and the on-chip peripherals. These registers are located on data page 0, simplifying access to them. The memory-mapped access provides a convenient way to save and restore the registers for context switches and to transfer information between the accumulators and the other registers.
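As a way of reading the ’541 data memory map from the slide, the following sketch classifies a data address by region. The address ranges are taken from the slide; the function name `data_region_541` and the region strings are illustrative assumptions.

```python
# Region lookup for the '541 data memory map shown on the slide.
def data_region_541(address, drom=0):
    """Name the '541 data-memory region that contains a 16-bit address."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    if address <= 0x005F:
        return "memory-mapped registers"
    if address <= 0x007F:
        return "scratch-pad DARAM"
    if address <= 0x13FF:
        return "on-chip DARAM"
    if address <= 0xDFFF:
        return "external"
    # E000h-FFFFh depends on the DROM setting.
    if drom == 0:
        return "external"
    return "on-chip ROM" if address <= 0xFEFF else "reserved"


if __name__ == "__main__":
    for addr in (0x0010, 0x0070, 0x1000, 0x2000, 0xE000):
        print(hex(addr), data_region_541(addr), data_region_541(addr, drom=1))
```

A table like this is also a convenient classroom check that the memory-mapped registers and the scratch-pad DARAM together occupy the lowest 128 words of data memory.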
[Slide: Direct Addressing Block Diagram. DAGEN forms the effective address (EA) from the 9-bit DP or the 16-bit SP and the 7 LSBs of the instruction register (IR), the data memory address (dma). When CPL = 0, EA = DP : offset(IR); when CPL = 1, EA = SP + offset(IR). The EA is placed on DAB(16) for a read, or on EAB(16) for a write or CAB(16) for a 32-bit read; data travels on DB(16) and EB(16). Legend: EA = effective address, IR = instruction register.]

Data Addressing

The C54x offers seven basic data addressing modes:

• Immediate addressing uses the instruction to encode a fixed value.
• Absolute addressing uses the instruction to encode a fixed address.
• Accumulator addressing uses accumulator A to access a location in program memory as data.
• Direct addressing uses seven bits of the instruction to encode the lower seven bits of an address. The seven bits are used with the data page pointer (DP) or the stack pointer (SP) to determine the actual memory address.
• Indirect addressing uses the auxiliary registers to access memory.
• Memory-mapped register addressing uses the memory-mapped registers without modifying either the current DP value or the current SP value.
• Stack addressing manages adding and removing items from the system stack.

During the execution of instructions using direct, indirect, or memory-mapped register addressing, the data-address generation logic (DAGEN) computes the addresses of data-memory operands.

[Slide: C54x Program Memory. PAGEN contains the PC and the repeat registers RC, BRC, RSA, and REA.]

Program Memory Addressing

Program memory is usually addressed on a C54x device with the program counter (PC). With some instructions, however, absolute addressing may be used to access data items that have been stored in program memory. The PC, which is loaded by the program-address generation logic (PAGEN), is used to fetch individual instructions.
Typically, the PAGEN increments the PC as sequential instructions are fetched. However, the PAGEN may load the PC with a nonsequential value as a result of some instructions or other operations. Operations that cause a discontinuity include branches, calls, returns, conditional operations, single-instruction repeats, multiple-instruction repeats, reset, and interrupts. For calls and interrupts, the current PC is saved onto the stack; it is referenced by the stack pointer (SP). When the called function or interrupt service routine is finished, the PC value that was saved is restored from the stack via a return instruction.

For a detailed discussion of the hardware and software factors in program address generation, see Chapter 7, Program Memory Addressing.

[Slide: C54x Pipeline]

Stage            Operation
Prefetch         Loads PAB with the PC's contents
Fetch            Loads PB with the fetched instruction word
Decode           Loads IR with the contents of PB
                 Decodes the IR's contents
Access           Loads DAB with the data1 read address, if required
                 Loads CAB with the data2 read address, if required
                 Updates auxiliary registers and stack pointer
Read             Loads DB with the data1 read operand
                 Loads CB with the data2 read operand
                 Loads EAB with the data3 write address, if required
Execute/write    Executes the instruction and loads EB with write data

Pipeline Operation

An instruction pipeline consists of a sequence of operations that occur during the execution of an instruction. The C54x pipeline has six levels: prefetch, fetch, decode, access, read, and execute. At each of the levels, an independent operation occurs. Because these operations are independent, from one to six instructions can be active in any given cycle, each instruction at a different stage of completion. Typically, the pipeline is full with a sequential set of instructions, each at one of the six stages.
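The idea of a full pipeline can be shown with a small occupancy model. The sketch assumes an idealized case that the text does not guarantee: one new instruction enters prefetch every cycle and nothing stalls. The function names are hypothetical.

```python
# Idealized occupancy model of a six-stage pipeline (no stalls assumed).
STAGES = ("prefetch", "fetch", "decode", "access", "read", "execute")


def stage_of(instruction, cycle):
    """Stage of instruction i in a given cycle, or None if not in flight.

    Instruction i is assumed to enter prefetch in cycle i.
    """
    index = cycle - instruction
    return STAGES[index] if 0 <= index < len(STAGES) else None


def active_instructions(cycle, total):
    """Map each in-flight instruction number to its current stage."""
    stages = {i: stage_of(i, cycle) for i in range(total)}
    return {i: s for i, s in stages.items() if s is not None}


if __name__ == "__main__":
    for cycle in range(8):
        print(cycle, active_instructions(cycle, total=10))
```

In this model the pipeline fills during the first five cycles and then holds six instructions at once, one per stage, for as long as execution stays sequential.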
When a PC discontinuity occurs, such as during a branch, call, or return, one or more stages of the pipeline may be temporarily unused. For more details about the pipeline operation, see Chapter 7, Pipeline.

On-Chip Peripherals

All the C54x devices have the same CPU, but different on-chip peripherals are connected to their CPUs. The C54x devices have these on-chip peripheral options:

• General-purpose I/O pins: XF and BIO
• Timer
• Clock generator
• Host port interface
    - 8-bit standard (’542, ’545, ’548, ’549)
    - 8-bit enhanced (’5402, ’5410; see note below)
    - 16-bit enhanced (’5420; see note below)
• Synchronous serial port (’541, ’545, and ’546)
• Buffered serial port (’542, ’543, ’545, ’546, ’548, and ’549)
• Multichannel buffered serial port (McBSP) (’5402, ’5410, and ’5420; see note below)
• Time-division multiplexed (TDM) serial port (’542, ’543, ’548, and ’549)
• Software-programmable wait-state generator
• Programmable bank-switching module

Note: Enhanced Peripherals
For more detailed information on the enhanced peripherals, see SPRU302, TMS320C54x DSP, Enhanced Peripherals: Volume 5.

General-Purpose I/O Pins

Each C54x device has two general-purpose I/O pins: BIO and XF. BIO is an input pin that can be used to monitor the status of external devices. XF is a software-controlled output pin that allows you to signal external devices.

Software-Programmable Wait-State Generator

The software-programmable wait-state generator extends external bus cycles up to seven machine cycles (14 machine cycles in the ’549, ’5402, ’5410, and ’5420) to interface with slower off-chip memory and I/O devices. The software wait-state generator is incorporated without any external hardware. For off-chip memory accesses, from zero to seven wait states can be specified within the software wait-state register (SWWSR) for each 32K-word block of program and data memory, and for the 64K-word block of I/O space.
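The five wait-state settings described above (two program blocks, two data blocks, and I/O space) each need three bits to hold a value from zero to seven. The sketch below packs them into one 16-bit word. The bit positions used here are an assumption made for illustration and should be checked against the device documentation before use; the field and function names are hypothetical.

```python
# Assumed layout: five 3-bit fields packed from bit 0 upward.
# Verify against the device data sheet; these positions are illustrative.
FIELD_SHIFTS = {
    "program_low": 0,    # lower 32K-word block of program memory
    "program_high": 3,   # upper 32K-word block of program memory
    "data_low": 6,       # lower 32K-word block of data memory
    "data_high": 9,      # upper 32K-word block of data memory
    "io": 12,            # 64K-word block of I/O space
}


def pack_swwsr(**wait_states):
    """Pack per-block wait-state counts (0-7) into a 16-bit value."""
    unknown = set(wait_states) - set(FIELD_SHIFTS)
    if unknown:
        raise ValueError("unknown field(s): " + ", ".join(sorted(unknown)))
    value = 0
    for name, shift in FIELD_SHIFTS.items():
        states = wait_states.get(name, 0)
        if not 0 <= states <= 7:
            raise ValueError(name + " must be between 0 and 7")
        value |= states << shift
    return value


if __name__ == "__main__":
    # Slow external I/O, one wait state on upper data memory, none elsewhere.
    print(hex(pack_swwsr(io=7, data_high=1)))
```

Keeping the field positions in one table makes it straightforward to correct the layout for a particular device without touching the packing logic.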
Programmable Bank-Switching Logic

The programmable bank-switching logic can automatically insert one cycle when an access crosses memory bank boundaries inside program memory or data memory. One cycle can also be inserted when an access crosses from program memory to data memory. This extra cycle prevents bus contention by allowing memory devices to release the bus before other devices start driving the bus. The size of the memory bank for bank switching is defined by the bank-switching control register (BSCR).

Host Port Interface

The host port interface (HPI) is a parallel port that provides an interface to a host processor. Information is exchanged between the C54x and the host processor through C54x on-chip memory that is accessible to both the host processor and the C54x. Table 3 identifies the HPI-equipped C54x devices.

Table 3. Host Port Interfaces on the TMS320C54x DSP

Host Port Interface    ’541   ’542   ’543   ’545   ’546   ’548   ’549   ’5402   ’5410   ’5420
Standard 8-bit HPI     0      1      0      1      0      1      1      0       0       0
Enhanced 8-bit HPI     0      0      0      0      0      0      0      1       1       0
Enhanced 16-bit HPI    0      0      0      0      0      0      0      0       0       1

Hardware Timer

The C54x features a 16-bit timing circuit with a 4-bit prescaler. The timer counter is decremented by 1 at every CLKOUT cycle. Each time the counter decrements to 0, a timer interrupt is generated. The timer can be stopped, restarted, reset, or disabled by specific status bits.

Clock Generator

The clock generator consists of an internal oscillator and a phase-locked loop (PLL) circuit. The clock generator can be driven internally by a crystal resonator with the internal oscillator or externally by a clock source. The PLL circuit can generate an internal CPU clock by multiplying the clock source by a specific factor; thus, you can use a clock source with a lower frequency than that of the CPU.
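The timer and clock generator descriptions can be combined into a short worked calculation. The sketch assumes that CLKOUT runs at the CPU clock rate and that a counter loaded with N takes N + 1 clock cycles to reach 0, for both the 4-bit prescaler and the 16-bit counter. The parameter names `period` and `prescale` are illustrative, not register names from the lecture.

```python
# Worked timing sketch under the assumptions stated in the lead-in.
def cpu_clock_hz(source_hz, pll_factor):
    """CPU clock produced by multiplying the clock source in the PLL."""
    return source_hz * pll_factor


def timer_interrupt_hz(clkout_hz, period, prescale):
    """Timer interrupt rate for a 16-bit period and 4-bit prescale value."""
    if not 0 <= period <= 0xFFFF:
        raise ValueError("period must fit in 16 bits")
    if not 0 <= prescale <= 0xF:
        raise ValueError("prescale must fit in 4 bits")
    # Each counter takes (value + 1) cycles to count down to 0.
    return clkout_hz / ((prescale + 1) * (period + 1))


if __name__ == "__main__":
    clock = cpu_clock_hz(10_000_000, 10)        # 10 MHz source, factor of 10
    print(clock, timer_interrupt_hz(clock, period=9999, prescale=9))
```

With these example values, a 10 MHz source is multiplied to a 100 MHz CPU clock, and the timer divides that clock by 100,000 to give one interrupt per millisecond.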
[Slide: Serial Port Interface Block Diagram. The 16-bit data bus connects to DRR(16) and DXR(16). RSR(16) and XSR(16) each have load control logic and a byte/word counter. RINT is generated on an RSR-to-DRR transfer and XINT on a DXR-to-XSR transfer. Pins shown: FSR, FSX, CLKR, CLKX, DR, and DX.]

Serial Ports

The serial ports on the C54x vary by device, and are represented by four types: synchronous, buffered, multichannel buffered (McBSP), and time-division multiplexed (TDM). See Table 4 for the number of each type on the various C54x devices. The sections that follow provide an introduction to the four types of serial ports. For more details about these ports, see Chapter 9, Serial Ports. For detailed information about the McBSPs, see volume 5 of this reference set: TMS320C54x DSP, Enhanced Peripherals, literature number SPRU302.

Table 4. Serial Port Interfaces on the TMS320C54x Devices

Serial Ports             ’541   ’542   ’543   ’545   ’546   ’548   ’549   ’5402   ’5410   ’5420
Synchronous              2      0      0      1      1      0      0      0       0       0
Buffered                 0      1      1      1      1      2      2      0       0       0
Multichannel Buffered    0      0      0      0      0      0      0      2       3       6
TDM                      0      1      1      0      0      1      1      0       0       0

Synchronous Serial Ports

Synchronous serial ports are high-speed, full-duplex serial ports that provide direct communication with serial devices such as codecs, analog-to-digital (A/D) converters, and other serial systems. When more than one synchronous serial port resides on a C54x, these ports are identical but independent. Each synchronous serial port can operate at up to one-fourth the machine cycle rate (CLKOUT). The synchronous serial port transmitter and receiver are double buffered and individually controlled by maskable external interrupt signals. Data is framed either as bytes or as words.
Buffered Serial Ports

A buffered serial port (BSP) is a synchronous serial port that is enhanced with an autobuffering unit and is clocked at the full CLKOUT rate. It is full-duplex and double-buffered to offer flexible data stream length. The autobuffering unit supports high-speed transfers and reduces the overhead of servicing interrupts.

Multichannel Buffered Serial Ports (McBSPs)

The McBSP is an enhanced buffered serial port that includes the following standard features: buffered data registers, full-duplex communication, and independent clocking and framing for receive and transmit. In addition, the McBSP includes the following enhanced features: internal programmable clock and frame generation, multichannel mode, and general-purpose I/O. For detailed information about the McBSPs, see volume 5 of this reference set: TMS320C54x DSP, Enhanced Peripherals, literature number SPRU302.

TDM Serial Ports

A time-division multiplexed (TDM) serial port is a synchronous serial port that is enhanced to allow time-division multiplexing of the data. It can be configured for either synchronous operations or for TDM operations and is commonly used in multiprocessor applications.

[Slide: C54x External Bus Interface. Timing diagram showing CLKOUT; PB fetches; CB/DB reads; EB writes; and the resulting fetch, read, and write activity on the external address lines A(22-0) and data lines D(15-0).]

External Bus Interface

The C54x can address up to 64K words of data memory, 64K words of program memory (8M words in the ’548, ’549, and ’5410; 1M words in the ’5402; 256K words in the ’5420), and up to 64K words of 16-bit parallel I/O ports. Accesses to either external memory or I/O ports take place through the external interface. Individual space-select signals, DS, PS, and IS, allow the selection of physically separate spaces.
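The memory sizes quoted above follow directly from the number of address lines, since each additional line doubles the addressable space. The sketch below works through the arithmetic; the function names are hypothetical, and the address-line counts it derives are computed from the quoted sizes rather than taken from a data sheet.

```python
# Relationship between address-line count and addressable words.
K = 1024
M = 1024 * 1024


def addressable_words(address_bits):
    """Number of words reachable with the given number of address lines."""
    return 1 << address_bits


def address_bits_for(words):
    """Smallest number of address lines that can reach `words` locations."""
    return (words - 1).bit_length()


if __name__ == "__main__":
    for label, words in (("64K", 64 * K), ("256K", 256 * K),
                         ("1M", 1 * M), ("8M", 8 * M)):
        print(label, "words need", address_bits_for(words), "address lines")
```

The 8M-word case works out to 23 address lines, which agrees with the A(22-0) address bus shown on the external bus interface slide.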
The interface’s external ready input signal and software-generated wait states allow the processor to interface with memory and I/O devices of many different speeds. The interface’s hold modes allow an external device to take control of the C54x buses; in this way, an external device can access the resources in the program, data, and I/O spaces.

External memory can be accessed by most C54x instructions. However, accessing I/O ports requires the use of special instructions: PORTR and PORTW.

IEEE Standard 1149.1 Scanning Logic

The IEEE Standard 1149.1 scanning-logic circuitry is used for emulation and testing purposes only. This logic provides the boundary scan to and from the interfacing devices. Also, it can be used to test pin-to-pin continuity as well as to perform operational tests on devices peripheral to the C54x. The IEEE Standard 1149.1 scanning logic is interfaced to internal scanning-logic circuitry that has access to all of the on-chip resources. Thus, the C54x can perform on-board emulation using the IEEE Standard 1149.1 serial scan pins and the emulation-dedicated pins.