EMBEDDED SYSTEM DESIGN OF JPEG IMAGE COMPRESSION

Chintan G Govani
B.E., Gujarat University, India, 2007

PROJECT

Submitted in partial satisfaction of the requirements for the degree of
MASTER OF SCIENCE
in
ELECTRICAL AND ELECTRONIC ENGINEERING
at
CALIFORNIA STATE UNIVERSITY, SACRAMENTO
FALL 2010

A Project by Chintan G Govani

Approved by:
Jing Pang, Ph.D., Project Advisor
Preetham Kumar, Ph.D., Second Reader

Student: Chintan G Govani

I certify that this student has met the requirements for format contained in the University format manual, and that this project is suitable for shelving in the Library and credit is to be awarded for the Project.

Dr. Preetham Kumar, Graduate Coordinator
Department of Electrical and Electronic Engineering

Abstract

The main goal of this project is to implement the DCT and quantization steps of the JPEG image compression algorithm in hardware. In this project, the JPEG algorithm converts an image from BMP format into JPEG format. The main step of the algorithm, the discrete cosine transform (DCT), is implemented in hardware on an ATmega32 microcontroller, and the other parts are implemented in an application based on the Microsoft Foundation Class (MFC) library. The other main task in this project was to interface the microcontroller with the computer so that it can receive data from the computer, perform the DCT on it, and send the processed data back to the computer. The communication link is RS-232, with a MAX232 chip converting signals between RS-232 and TTL levels.

The MFC application takes a BMP image as input. It then extracts the raw data from the image, sends it to the microcontroller for further processing, and waits until the microcontroller finishes. As soon as the microcontroller is done, it sends the data back to the MFC application, which completes the remaining steps of the JPEG compression algorithm and creates a JPEG image that is much smaller than the BMP original. The report further discusses how the MFC application is implemented, how the hardware is set up, and how the interface between the computer and the microcontroller is established.

Jing Pang, Ph.D., Committee Chair

ACKNOWLEDGMENTS

First of all, before going into the details of this project, I would like to thank Dr. Jing Pang for allowing me to work under her guidance and for providing encouragement and advice throughout this project, without which it would not have been completed. I would also like to thank my teammate Parikshit Nigam for providing support while working on this project. I also want to thank Dr. Preetham Kumar for providing guidance and proofreading this report. Special thanks to Dr. Suresh Vadhava, Department Chair of Electrical and Electronics Engineering, for his great support and suggestions.

Finally, I would like to thank all the faculty members of the Electrical and Electronics Engineering department for their help from the start to the end of my master's degree at California State University, Sacramento.

TABLE OF CONTENTS

Acknowledgments
List of Tables
List of Figures
Chapter
1. INTRODUCTION
   1.1 Introduction to JPEG Algorithm
   1.2 Purpose of Project
   1.3 Organization of Project Report
2. 2-DIMENSIONAL DISCRETE COSINE TRANSFORM
   2.1 Introduction to DCT
   2.2 Coefficients
   2.3 Quantization
3. HARDWARE SYSTEM OVERVIEW
   3.1 ATmega32 Microcontroller Description
       3.1.1 Features
       3.1.2 Pin Layout of ATmega32
       3.1.3 Pin Function - General Description
       3.1.4 Pin Function - Alternate Description
   3.2 Block Diagram of ATmega32 Micro-Controller
   3.3 Oscillator
   3.4 Memories
   3.5 USART
       3.5.1 Clock Generator
       3.5.2 USART Frame Format
       3.5.3 USART Registers
   3.6 Driver/Receiver MAX232
       3.6.1 Pin Layout of MAX 232
       3.6.2 Pin Description
       3.6.3 Functional Description
4. SYSTEM DESIGN AND IMPLEMENTATION
   4.1 Flow of Project
   4.2 Block Diagram of System
   4.3 Software Implementation
       4.3.1 Initialization of Micro-Controller
       4.3.2 Implementation of DCT
   4.4 Code Optimization
5. CONCLUSION
Appendix
References

LIST OF TABLES

Table 1 Input Image Pixel Matrix of Size 8x8
Table 2 Output DCT Coefficient Matrix of Size 8x8
Table 3 Quantization matrix for JPEG Standard
Table 4 Output matrix after quantization
Table 5 Input matrix before quantization
Table 6 General functionality of ATmega32 pins
Table 7 Port A pins alternate functionality
Table 8 Port B pins alternate functionality
Table 9 Port C pins alternate functionality
Table 10 Port D pins alternate functionality
Table 11 Equations for calculating Baud Rate
Table 12 UCSRA Description
Table 13 UCSRB Description
Table 14 UCSRC Description
Table 15 UBRRH and UBRRL Description
Table 16 Pin Description of MAX 232
Table 17 Voltage ranges for RS-232 and TTL

LIST OF FIGURES

Figure 1 Steps involved in JPEG Compression Algorithm
Figure 2 Method for computing 2D-DCT using 1D-DCT
Figure 3 Saturn and its 2-D DCT
Figure 4 Image reconstructed using all the DCT coefficients
Figure 5 Image reconstructed utilizing 75% of DCT coefficients
Figure 6 Image reconstructed utilizing 50% of DCT coefficients
Figure 7 Image reconstructed using 25% of DCT coefficients
Figure 8 Pin Layout of ATmega32
Figure 9 ATmega32 Block Diagram
Figure 10 Crystal Oscillator Connections
Figure 11 Block Diagram of USART
Figure 12 USART Frame Format
Figure 13 Pin Diagram of MAX 232
Figure 14 Logic Diagram for Driver / Receiver
Figure 15 Flow chart of project
Figure 16 Block diagram of a system
Figure 17 Flow chart for software implementation
Figure 18 Performance improvement using code optimization

Chapter 1
INTRODUCTION

1.1 Introduction to JPEG Algorithm

The name JPEG is an acronym for "Joint Photographic Experts Group", the committee that created the JPEG standard and other related standards. JPEG is an international standard for image compression. The steps involved in JPEG compression are shown in figure 1. The main focus of this project is on the discrete cosine transform (DCT) and quantization steps of the JPEG algorithm.

Figure 1 Steps involved in JPEG Compression Algorithm

The algorithm starts by dividing the whole image into blocks of 8x8 pixels. All subsequent operations are performed on these blocks. If the image dimensions are not an integer multiple of 8, the encoder has to pad the incomplete blocks with dummy data. The values in each 8x8 block are then centered on zero, so that they range from -128 to 127, by subtracting 128 from each value of the 8x8 matrix. The DCT is performed on the resulting matrix. The DCT and quantization are discussed in more detail in the following chapter.
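The block splitting and level shift described above can be sketched in C as follows. This is only a minimal illustration, not the code used in this project: the function names `blocks_along` and `extract_block` are hypothetical, the image is assumed to be 8-bit gray scale stored row by row, and the "dummy data" for incomplete blocks is assumed to be a copy of the nearest edge pixel, which is one common choice among several.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK 8

/* Number of 8x8 blocks needed to cover `pixels` samples along one axis. */
size_t blocks_along(size_t pixels)
{
    return (pixels + BLOCK - 1) / BLOCK;
}

/*
 * Copy the block at block coordinates (bx, by) out of a gray scale image
 * and centre it on zero by subtracting 128 from every sample.
 * Samples that fall outside the image are filled by repeating the nearest
 * edge pixel (an assumed padding rule, see the text above).
 */
void extract_block(const uint8_t *img, size_t width, size_t height,
                   size_t bx, size_t by, int8_t out[BLOCK][BLOCK])
{
    for (size_t r = 0; r < BLOCK; r++) {
        size_t y = by * BLOCK + r;
        if (y >= height)
            y = height - 1;              /* clamp to last row */
        for (size_t c = 0; c < BLOCK; c++) {
            size_t x = bx * BLOCK + c;
            if (x >= width)
                x = width - 1;           /* clamp to last column */
            /* 0..255 becomes -128..127 */
            out[r][c] = (int8_t)((int)img[y * width + x] - 128);
        }
    }
}
```

For a 10x3 image, for example, `blocks_along` reports two blocks horizontally and one vertically, and the second block repeats column 9 of the image in its last six columns.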
The algorithm has specifications for lossy as well as lossless image compression. The lossless mode is not widely used. Lossy compression is the most popular because it greatly reduces the size of the original image, which helps not only in saving disk space but also in transmitting images quickly from one point to another. The loss depends on the compression ratio, so the compression parameters can be adjusted according to the required image size. The higher the compression ratio, the smaller the image, so there is a tradeoff between the size and the quality of an image. JPEG can compress a color image by 10 to 20 times. For example, an image of size 200K results in a JPEG image of only 10-20K. It can compress a gray scale image by 4 to 5 times without causing visible loss in the image. Another important property of the JPEG algorithm is that the decoding speed can be varied by using different approximations for the required calculations.

The JPEG algorithm relies on the fact that human eyes are less sensitive to small changes in the color of an image than to changes in brightness. For this reason, the algorithm is mainly used in applications where images are viewed by humans and not by machines, because a machine might easily detect the changes in color. Another advantage of JPEG is that it can store color information at 24 bits per pixel. The main disadvantage of the lossy JPEG compression algorithm is that it loses more information each time an image is compressed and decompressed again.

1.2 Purpose of Project

The main objective of this project is to study the JPEG image compression algorithm and implement the discrete cosine transform (DCT) as well as quantization using the ATmega32 microcontroller.
Once the DCT is successfully implemented, the project can be extended to implement the other steps of the JPEG algorithm on the microcontroller, and external flash memory can also be interfaced.

1.3 Organization of Project Report

The report contains a detailed description of every aspect of this project, including the results obtained and possible future enhancements.

Chapter two discusses in detail the major steps involved in JPEG compression, which are the DCT and quantization. It also provides a brief introduction to the implementation of the DCT using floating point and fixed point methods.

Chapter three describes the hardware components used for implementing this project. The major components are the ATmega32 microcontroller and the MAX232 receiver/transmitter IC.

Chapter four explains the system design and implementation with the help of a block diagram and also discusses the interfacing of the microcontroller with the computer in order to transfer data.

Chapter five concludes the report and provides a perspective on future implementation.

Chapter 2
2-DIMENSIONAL DISCRETE COSINE TRANSFORM

2.1 Introduction to DCT

The 2-dimensional discrete cosine transform is an integral part of the JPEG algorithm. The DCT is the most important and most costly step in the JPEG compression process. Fundamentally, the DCT converts a spatial domain representation into a frequency domain representation. The discrete cosine transform is related to the real part of the discrete Fourier transform. Since only the real values are taken and the imaginary values are discarded, it has lower energy. The two dimensional discrete cosine transform is calculated by first doing a 1-D DCT on the rows followed by a 1-D DCT on the columns, or vice versa.

2.2 Coefficients

The 2D-DCT can be computed by performing the 1D-DCT on rows and columns separately, as shown in figure 2 below. The top-left value in the 8-by-8 output matrix is called the "DC value", which is proportional to the average value of the block.
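The row-column method can be sketched in C as follows. This is a floating point illustration only, not the optimized code of this project: it assumes the standard DCT-II definition with orthonormal scaling, and the names `cos_pi16`, `dct_1d` and `dct_2d` are hypothetical. A small cosine lookup table is used instead of calling a math library function, which is a common choice on small microcontrollers.

```c
#define N 8

/* cos(k*pi/16) for k = 0..8 */
static const float COS_TAB[9] = {
    1.00000000f, 0.98078528f, 0.92387953f, 0.83146961f, 0.70710678f,
    0.55557023f, 0.38268343f, 0.19509032f, 0.00000000f
};

/* cos(m*pi/16) for any non-negative m, using the symmetry of the cosine. */
float cos_pi16(unsigned m)
{
    m %= 32u;                       /* the cosine has period 2*pi */
    if (m > 16u)
        m = 32u - m;                /* cos(2*pi - t) =  cos(t) */
    if (m > 8u)
        return -COS_TAB[16u - m];   /* cos(pi - t)   = -cos(t) */
    return COS_TAB[m];
}

/* 8-point 1-D DCT-II with orthonormal scaling (assumed convention). */
void dct_1d(const float in[N], float out[N])
{
    for (unsigned u = 0; u < N; u++) {
        float sum = 0.0f;
        for (unsigned x = 0; x < N; x++)
            sum += in[x] * cos_pi16((2u * x + 1u) * u);
        /* sqrt(1/8) for the DC term, sqrt(2/8) for all other terms */
        out[u] = sum * (u == 0 ? 0.35355339f : 0.5f);
    }
}

/* 2-D DCT: 1-D DCT on every row, then 1-D DCT on every column. */
void dct_2d(float in[N][N], float out[N][N])
{
    float tmp[N][N], col[N], res[N];

    for (unsigned r = 0; r < N; r++)
        dct_1d(in[r], tmp[r]);

    for (unsigned c = 0; c < N; c++) {
        for (unsigned r = 0; r < N; r++)
            col[r] = tmp[r][c];
        dct_1d(col, res);
        for (unsigned r = 0; r < N; r++)
            out[r][c] = res[r];
    }
}
```

With this scaling, the DC value of an 8x8 block is 8 times the mean of the block, so a block in which every sample equals 10 gives a DC value of 80 and AC values of zero (up to rounding).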
All other values in the block are "AC values", which represent changes across the height and width of the block. The main idea behind the DCT is to separate the high and low frequency information in the image so that it becomes easy to eliminate the high frequency components without losing the low frequency components [2].

Figure 2 Method for computing 2D-DCT using 1D-DCT

In mathematical form, the DCT for a given block of size N x N can be given by

$$F(u,v) = \frac{2}{N}\, C(u)\, C(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y) \cos\left[\frac{(2x+1)u\pi}{2N}\right] \cos\left[\frac{(2y+1)v\pi}{2N}\right] \qquad (1)$$

where $f(x,y)$ is the value at position $(x,y)$ of the input block, $F(u,v)$ is the DCT coefficient at position $(u,v)$, and $C(k) = 1/\sqrt{2}$ for $k = 0$ and $C(k) = 1$ otherwise.

As the JPEG standard uses a block of size 8x8, we can put N=8 in equation 1. So the DCT step in the JPEG algorithm takes 64 input values and produces a unique two dimensional set of 64 values, which are called DCT coefficients. In the 8x8 output matrix (64 values), the top-left value is called the DC coefficient (zero frequency) and the other 63 values are called AC coefficients. The main advantage of the DCT is that it accumulates most of the energy in the low frequency components. Information in the high frequency components can then be discarded without noticeably affecting the quality of the resulting image, as the loss is not visibly detected by human eyes [2].

In the example below, an 8x8 input block from a gray scale image and the output block after the DCT are shown in tables 1 and 2.

140 144 147 140 140 155 179 175
144 152 140 147 140 148 167 179
152 155 136 167 163 162 152 172
168 145 156 160 152 155 136 160
162 148 156 148 140 136 147 162
147 167 140 155 155 140 136 162
136 156 123 167 162 144 140 147
148 155 136 155 152 147 147 136

Table 1 Input Image Pixel Matrix of Size 8x8 [6]

186 -18  15  -9  23  -9 -14  19
 21 -34  26  -9 -11  11  14   7
-10 -24  -2   6   3   3 -20  -1
 -8  -5  14 -15  -3  -3  -3   8
 -3  10   8   1  18  18  18  15
  4  -2 -18   8  -4  -4   1  -7
  9   1  -3   4  -7  -7  -1  -2
  0  -8  -2   2   4  -6  -6   0

Table 2 Output DCT Coefficient Matrix of size 8x8 [6]

An example of the DCT is shown in figures 3 to 7 below for an image of Saturn.
The percentage in each figure indicates the fraction of the coefficients that is used to reconstruct the image. For example, if an image is of size 100x100, then the DCT with 25% retains only 100x100x0.25 (=2500) coefficients out of the total of 10,000 coefficients. When the number of coefficients is decreased, the entropy is also reduced. The reduction in entropy can be seen by looking at the histograms. The discrete cosine transform concentrates most of the energy in the lower frequencies. The values at higher frequencies are small and most of the time can be approximated by zero. Human eyes are more sensitive to lower frequencies than to higher frequencies, and the DCT takes advantage of this fact. Implementing the DCT in hardware is challenging, as even an optimized implementation requires about 18 multiplications and 29 additions.

Note the increase in blur and the loss in sharpness of the reconstructed image as more and more DCT coefficients are discarded. Still, the image can be viewed without major loss. This property is exploited to reduce the size on disk when storing the image and to reduce the cost and bandwidth when transmitting the image. [2]

Figure 3 Saturn and its 2-D DCT [2]
Figure 4 Image reconstructed using all the DCT coefficients [2]
Figure 5 Image reconstructed utilizing 75% of DCT coefficients [2]
Figure 6 Image reconstructed utilizing 50% of DCT coefficients [2]
Figure 7 Image reconstructed using 25% of DCT coefficients [2]

2.3 Quantization

Quantization is performed by dividing each of the frequency domain components by the corresponding value in a quantization matrix and then rounding the result to the nearest integer. The quantization matrix is a JPEG standard 8x8 matrix with 64 predefined values. The quantization matrix values are shown in table 3 below.
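The divide-and-round operation described above can be sketched in C as follows. This is a minimal illustration under stated assumptions rather than the project's own code: the names `JPEG_LUMA_Q`, `div_round` and `quantize` are hypothetical, the matrix is stored row by row with the values of the standard JPEG luminance table (the same values as table 3), and ties are rounded away from zero. Only integer arithmetic is used so that no floating point library is needed.

```c
#include <stdint.h>

/* Standard JPEG luminance quantization matrix, stored row by row. */
static const uint8_t JPEG_LUMA_Q[64] = {
    16, 11, 10, 16,  24,  40,  51,  61,
    12, 12, 14, 19,  26,  58,  60,  55,
    14, 13, 16, 24,  40,  57,  69,  56,
    14, 17, 22, 29,  51,  87,  80,  62,
    18, 22, 37, 56,  68, 109, 103,  77,
    24, 35, 55, 64,  81, 104, 113,  92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103,  99
};

/*
 * Integer division with rounding to the nearest integer.
 * `den` must be positive; ties are rounded away from zero (assumed rule).
 */
int16_t div_round(int16_t num, int16_t den)
{
    if (num >= 0)
        return (int16_t)((num + den / 2) / den);
    return (int16_t)(-((-num + den / 2) / den));
}

/* Quantize one block of 64 DCT coefficients with the matrix `q`. */
void quantize(const int16_t dct[64], const uint8_t q[64], int16_t out[64])
{
    for (int i = 0; i < 64; i++)
        out[i] = div_round(dct[i], (int16_t)q[i]);
}
```

For instance, a DC coefficient of 190 divided by the first matrix entry, 16, gives 11.875, which is stored as 12.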
The compression ratio of the overall JPEG algorithm can be varied by multiplying the quantization matrix by a scaling factor. A low scaling factor provides excellent quality in the resulting image, but the image will be larger than one produced with a high scaling factor. Because of the rounding operation, quantization is the step of the JPEG compression algorithm in which information is lost. High frequency components are mostly divided by larger values than low frequency components in order to achieve higher compression with less visible loss in the resulting image.

16  11  10  16  24  40  51  61
12  12  14  19  26  58  60  55
14  13  16  24  40  57  69  56
14  17  22  29  51  87  80  62
18  22  37  56  68 109 103  77
24  35  55  64  81 104 113  92
49  64  78  87 103 121 120 101
72  92  95  98 112 100 103  99

Table 3 Quantization matrix for JPEG Standard

Mathematically, the quantization can be performed by the following equation:

Quantized Value (a, b) = round( DCT (a, b) / Quantum (a, b) ) ........ (2)

For example, table 4 below shows the output matrix after quantization with the quantization matrix of table 3, performed on the input matrix shown in table 5.

12   1  -1   0   0   0   0   0
-3  -5   1   0   0   0   0   0
-6   5   0   0   0   1   0   0
-4  -2   0   1   0   0   0   0
-3  -1   1   0   0   0   0   0
-2   2   1   0   0   0   0   0
 0   0  -1   0   0   0   0   0
-1   0   0   0   0   0   0   0

Table 4 Output matrix after quantization

190  13 -10  -7   1  -1   0   8
-45 -60  14  12  -5   2  -5  -8
-86  62   2 -17   4  40  -4   5
-54 -37 -10  31  24   7  -6   2
-87 -40  50 -18  38 -21  -1   0
-63  62  89  12  -8   6  10  -7
-18  13 -55  45  -6  12   8  10
-55  34 -14 -13  15  -9  -3   0

Table 5 Input matrix before quantization

Chapter 3
HARDWARE SYSTEM OVERVIEW

In this project, I have used the ATmega32 microcontroller for communication with the PC as well as for performing the discrete cosine transform (DCT) operation of the JPEG algorithm. The Texas Instruments MAX232 IC was used to interface the ATmega32 with the PC in order to transmit and receive data.
3.1 ATmega32 Microcontroller Description

3.1.1 Features

Many other microcontrollers are available, but this microcontroller was used for this project specifically because of the following features.

1. It is a 40-pin PDIP SoC package, which is easy to use for general purpose applications.
2. It has a programmable serial USART with the pins RXD and TXD, which is most useful in this project for transferring data between the microcontroller and the PC.
3. It supports ISP (In-System Programming), which means that the on-chip flash and EEPROM can be programmed without removing the controller from its socket [3].
4. It is a low power microcontroller.
5. It has separate program memory of 32 KB and data memory of 2 KB, which is more than enough for the application of this project.
6. It has its own instruction set and can be programmed using it. It can also be programmed in the C language, with a cross compiler converting the program into hex code.

3.1.2 Pin Layout of ATmega32

Figure 8 Pin Layout of ATmega32 [3]

3.1.3 Pin Function - General Description

Table 6 explains the general function of each pin of the ATmega32. Similar pins are grouped together to make the functionality easier to understand.

Pin Name              Pin Functionality
VCC                   Power supply for the chip.
GND                   Ground.
PORT A (PA0...PA7)    8-bit bi-directional I/O port. Port pins have internal pull-up resistors. Port pins are tri-stated when reset is active.
PORT B (PB0...PB7)    8-bit bi-directional I/O port. Port pins have internal pull-up resistors. Port pins are tri-stated when reset is active.
PORT C (PC0...PC7)    8-bit bi-directional I/O port. Port pins have internal pull-up resistors. Port pins are tri-stated when reset is active.
PORT D (PD0...PD7)    8-bit bi-directional I/O port. Port pins have internal pull-up resistors. Port pins are tri-stated when reset is active.
RESET                 A low level on this pin for longer than the minimum pulse length puts the whole chip into the reset condition, even if the clock is not running.
XTAL1                 Input to the internal oscillator amplifier as well as to the internal clock operating circuit.
XTAL2                 Output from the internal oscillator amplifier.
AVCC                  Supply pin for port A and the Analog-to-Digital converter.
AREF                  Analog reference pin for the Analog-to-Digital converter.

Table 6 General functionality of ATmega32 pins [3]

3.1.4 Pin Function - Alternate Description

Table 6 provides a general description of all the pins of the ATmega32 microcontroller. All four ports (Port A, Port B, Port C and Port D) also have alternate functions, which are described in tables 7 to 10.

Table 7 Port A pins alternate functionality [3]
Table 8 Port B pins alternate functionality [3]
Table 9 Port C pins alternate functionality [3]
Table 10 Port D pins alternate functionality [3]

3.2 Block Diagram of ATmega32 Micro-Controller

Figure 9 ATmega32 Block Diagram [3]

3.3 Oscillator

There are many options available for clock generation in the ATmega32 microcontroller. I have used a crystal oscillator to generate the clock for this project. As shown in figure 10 below, there are two crystal pins available: XTAL1, which is the input, and XTAL2, which is the output of an inverting amplifier that can be configured for use as an on-chip clock generator.

Figure 10 Crystal Oscillator Connections [3]

A CKOPT fuse is available to select between two different oscillator amplifier modes. Programming CKOPT gives a full rail-to-rail swing on the resulting clock and also provides a higher frequency range. If CKOPT is not programmed, the output swing is smaller and the available frequency range is more limited. The other important point about the crystal oscillator connections is that C1 and C2 should always be equal [3].
3.4 Memories

The main advantage of the ATmega32 is that it has separate program and data memory on-chip. It also has extra EEPROM available for data storage. The ATmega32 has on-chip reprogrammable flash memory of 32 Kbytes to store the program code, which is enough for an application like the one I have developed. It has a separate secure section for a boot program.

There are a total of 2144 locations available for data storage, divided between the register file, I/O memory and internal SRAM. The first 96 locations are reserved for the register file and I/O locations, while the other 2048 (2K) locations are dedicated to SRAM data memory.

If required, extra flash memory can be interfaced to the microcontroller for data storage. Many applications require more data memory than is available on-chip. For example, in this project the DCT is performed block by block by the microcontroller and the resulting data is then transferred to the computer. If the DCT had to be performed on all the blocks before sending the data back to the computer, the microcontroller's on-chip memory would not be enough and extra flash memory would have to be interfaced with it.

3.5 USART

The USART (Universal Synchronous Asynchronous Receiver Transmitter) is a serial communication device. It was used in this project to communicate with the computer in order to transfer data to and from the microcontroller. The block diagram of the USART is shown in figure 11 below. There are three main sections inside the USART: the clock generator, the receiver and the transmitter.

Figure 11 Block Diagram of USART [3]

3.5.1 Clock Generator

The clock generator generates the clock required by the receiver and the transmitter. The USART has four different modes available for clock generation.

1) Normal Asynchronous mode
2) Double Speed Asynchronous mode
3) Master Synchronous mode
4) Slave Synchronous mode

The clock frequency is set by the baud rate generator, which in turn is set by programming the USART Baud Rate Register (UBRR). The receiver uses the output of the baud rate generator directly, while the transmitter divides it by 2, 8 or 16 depending on the mode of operation. There are different equations for calculating the baud rate as well as for calculating the value to program into the UBRR register, as shown in table 11.

Table 11 Equations for calculating Baud Rate [3]

3.5.2 USART Frame Format

The USART frame has the following parts:

a) Start bit
b) 5, 6, 7, 8 or 9 data bits
c) No, even or odd parity bit
d) 1 or 2 stop bits

The USART frame always starts with a start bit, which indicates the start of the frame, followed by the data bits. The parity bit is optional. The frame format sequence is shown in figure 12.

Figure 12 USART Frame Format [3]

Where,
St = Start bit, which is always low
0 to 8 = Data bits
P = Parity bit (optional), even or odd
Sp = Stop bit, always high
IDLE = No transfer, line must be high

The frame format is defined when the USART is initialized for communication. There are different registers inside the USART which need to be programmed before initiating any transfer.

3.5.3 USART Registers

1) UCSRA (USART Control and Status Register A)

Table 12 shows the description of this register.
Bit Number 7 6 5 4 3 2 1 0 RXC TXC UDRE FE DOR PE U2X MPCM Read/Write R R/W R R R R R/W R/W Initial Value 0 0 1 0 0 0 0 0 Bit Name Table 12 UCSRA Description [3] Bit 7 - RXC (USART Receive Complete) Set: When receive buffer is not empty Clear: When receive buffer is empty Bit 6 - TXC (USART Transmit Complete) Set: When entire frame in transmit buffer is out Clear: When transmit complete interrupt is generated 26 Bit 5 - UDRE (USART Data Register Empty) Set: Means that transmit buffer is empty and ready to receive new data Clear: Transmit buffer is not empty Bit 4 - FE (Frame Error) Set: If the first stop bit of received data is zero Clear: If stop bit of received data is one Bit 3 - DOR (Data Overrun) Set: When data overrun is detected Clear: Always initialized it as clear Bit 2 - PE (Parity Error) Set: If next data received in receive buffer has parity error Clear: Always initialized it with zero Bit 1 - U2X (Double the USART Transmission Speed) Set: Indicates that Asynchronous operation is running with double speed Clear: Always write zero for synchronous operation Bit 0 - MPCM (Multi-Processor Communication Mode) 27 Set: Enables multi-processor communication mode Clear: Write zero when using only one processor 2) UCSRB (USART Control and Status Register B) Bit Number Bit Name 7 6 RXCIE TXCIE 5 4 3 2 1 0 UDRIE RXEN TXEN UCSZ2 RXB8 TXB8 Read/Write R/W R R/W R/W R/W R/W R/W R R/W Initial Value 0 0 0 0 0 0 0 0 Table 13 UCSRB Description [3] Bit 7 – RXCIE (Receive Complete Interrupt Enable) - Setting this bit enables interrupt for RXC flag in UCSRA Bit 6- TXCIE (Transmit Complete Interrupt Enable) - Setting this bit enables interrupt for TXC flag in UCSRA Bit 5- UDRIE (USART Data Register Empty Interrupt Enable) - Setting this bit enables interrupt for UDRE flag in UCSRA Bit 4- RXEN (Receiver Enable) - Setting this bit Enables the UART receiver and override normal functionality of pin 28 Bit 3- TXEN (Transmitter Enable) - Setting this bit Enables the UART transmitter 
and override normal functionality of pin Bit 2- UCSZ2 (Character Size) - This bit in combination with UCSZ1:0 sets the number of bit in a frame for communication Bit 1- RXB8 (Receive Data Bit 8) - It is the ninth data bit of received frame when using a frame with nine bits Bit 0- TXB8 (Transmit Data Bit 8) - It is the ninth data bit of transmitted frame when using a frame with nine bits 3) UCSRC (USART Control and Status Register C) Bit No. Bit Name Read/Write Initial 7 6 5 4 3 2 1 URSEL UMSEL UPM1 UPM0 USBS UCSZ1 UCSZ0 0 UCPOL R/W R/W R/W R/W R/W R/W R/W R/W 1 0 0 0 0 1 1 0 Table 14 UCSRC Description [3] 29 Bit 7- URSEL (Register Select) Set: When using UCSRC register Clear: When using UBRRH register Bit 6- UMSEL (USART Mode Select) Set: Enables asynchronous operation Clear: Enables synchronous operation Bit 5:4- UPM1:0 (Parity Mode) 00: Parity mode disabled 01: Reserved 10: Even parity enabled 11: Odd parity enabled Bit 3- USBS (Stop Bit Select) Set: Enables communication with 2-stop bit Clear: Enables communication with 1-stop bit Bit2:1- UCSZ1:0 (Character Size) - Works in combination with UCSZ2 for selecting number of data bit in a frame 30 Bit 0- UCPOL (Clock Polarity) Set: Enables transmitted data to be changed on falling edge of clock and received data to be sampled on rising edge of clock Clear: Enables transmitted data to be changed on rising edge of clock received data to be sampled on falling edge of clock 4) UBBRH and UBBRL (USART Baud Rate Registers) Bit No. 15 14 13 12 Bit Name URSEL - - - Bit name 11 10 9 8 UBBR [11:8] UBBR [7:0] Bit No. 7 6 5 4 3 2 1 0 Read/Write R/W R R R R/W R/W R/W R/W Read/Write R/W R/W R/W R/W R/W R/W R/W R/W Table 15 UBBRH and UBBRL Description [3] An initial value for all bits is zero. Bit 15- URSEL (Register Select) Set: Selects UCSRC register Clear: Selects UBBRH register Bit 14:12- Reserved Bits 31 Bit 11:0- UBBR 11:0 (USART Baud Rate Register) - Bits 11:8 are from UBBRH and bits 7:0 are from UBBRL register. 
  - These bits are used to set the baud rate.

3.6 Driver/Receiver MAX 232

The microcontroller communicates with the PC through an IC called the MAX 232, which provides simple receiver and transmitter functionality. The main feature of this IC is that it contains two driver/receiver pairs.

3.6.1 Pin Layout of MAX 232

Figure 13 Pin Diagram of MAX 232 [4]

3.6.2 Pin Description

Pin Name (Pin Number)   Pin Description
C1+ (1)                 + connector for capacitor C1
Vs+ (2)                 Output of voltage pump
C1- (3)                 - connector for capacitor C1
C2+ (4)                 + connector for capacitor C2
C2- (5)                 - connector for capacitor C2
Vs- (6)                 Output of voltage pump
T2OUT (7)               Driver 2 output
R2IN (8)                Receiver 2 input
R2OUT (9)               Receiver 2 output
T2IN (10)               Driver 2 input
T1IN (11)               Driver 1 input
R1OUT (12)              Receiver 1 output
R1IN (13)               Receiver 1 input
T1OUT (14)              Driver 1 output
GND (15)                Ground
VCC (16)                Power supply

Table 16 Pin Description of MAX 232 [4]

3.6.3 Functional Description

The basic principle of the MAX 232 is to convert signals between an RS-232 serial port and the TTL levels used by digital logic circuits. The drivers convert from TTL to RS-232 and the receivers convert from RS-232 to TTL. Each conversion changes the voltage level on the corresponding pin. The functionality of the receiver and transmitter is shown in the following figure.

Figure 14 Logic Diagram for Driver / Receiver [4]

RS-232 uses voltages in the range of -15V to -3V for logic “1” and +3V to +15V for logic “0”. These voltage ranges are not compatible with digital logic circuits and therefore need to be converted to appropriate levels before the two can communicate. Table 17 shows the voltage ranges for RS-232 and TTL along with the corresponding logic level.
RS-232            TTL               Logic
-15V to -3V       +2V to +5V        1
+3V to +15V       0V to +0.8V       0

Table 17 Voltage ranges for RS-232 and TTL [10]

Chapter 4
SYSTEM DESIGN AND IMPLEMENTATION

4.1 Flow of Project

The main goal of this project was to implement the DCT and quantization steps of the JPEG algorithm on a micro-controller in order to convert an image from BMP format to JPEG format. This is accomplished with a combination of software and hardware. The flow chart of the project implementation is shown in figure 15. The micro-controller is involved in only two steps of that flow chart.

First, the input, a gray scale image, is loaded into buffers and its headers are processed according to the type of image by an MFC application. For valid BMP files the data is sampled. The micro-controller then communicates with the MFC application on the computer in order to start the data transfer. An 8-byte header is sent to the micro-controller first. The first byte tells the micro-controller what action to take on the incoming data: if its value is zero (0) the micro-controller performs DCT, and if it is one (1) the micro-controller performs IDCT on the incoming values. The next two bytes give the number of blocks of data that will be transferred. The remaining 5 bytes are reserved for future use. After the header, the micro-controller receives the quantization table from the computer.

Once these are received, the micro-controller decides which function needs to be called. It reads a 64-byte character array and passes this data to the “DCT ()” function. The “DCT ()” function takes some time to process the data, so the MFC application polls for the key “done” from the micro-controller. Once processing of the given block is complete, the micro-controller sends the key “done” to the MFC application. The micro-controller then sends the output of “DCT ()”, 128 bytes (64 short int).
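The header layout described above can be sketched in C as follows. This is only an illustration under stated assumptions, not the project's code: the type and function names (transfer_header, parse_header, reply_bytes) are hypothetical, the operation codes follow the byte values given in the text, and the block count is assumed to arrive high byte first, as the appendix listing assembles it.

```c
#include <stdint.h>

#define HEADER_SIZE 8   /* 1 action byte + 2 count bytes + 5 reserved bytes */

/* Action codes as described in the text (names are assumptions). */
enum { OP_DCT = 0, OP_IDCT = 1 };

/* Hypothetical container for the fields that are actually used. */
typedef struct {
    uint8_t  op;           /* byte 0: action to perform            */
    uint16_t block_count;  /* bytes 1-2: number of 8x8 blocks      */
} transfer_header;

/* Extract the used fields from the raw 8-byte header.
   Bytes 3..7 are reserved and ignored here. */
transfer_header parse_header(const uint8_t raw[HEADER_SIZE])
{
    transfer_header h;
    h.op = raw[0];
    h.block_count = (uint16_t)(((uint16_t)raw[1] << 8) | raw[2]);
    return h;
}

/* Bytes the micro-controller returns for a whole transfer:
   per block, the 4-character key "done" plus 64 coefficients of 2 bytes. */
uint32_t reply_bytes(uint16_t block_count)
{
    return (uint32_t)block_count * (4u + 64u * 2u);
}
```

A sketch like this lets the PC side compute in advance how many bytes to expect back for a given image, which is useful when sizing the receive buffer.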
This information is written back into the files in the required order, and the file pointers and dynamically allocated memory are freed. The MFC application then processes that data, adds the header, and converts it into a JPEG file.

Figure 15 Flow chart of project

4.2 Block diagram of System

The block diagram of the whole system is shown in figure 16 below.

Figure 16 Block diagram of a system

As shown in figure 16, the system consists of the ATmega32 micro-controller, the MAX232 receiver/transmitter, connectors and a power supply. The main part of this system is the micro-controller, which performs DCT and quantization. The MAX232 interfaces the micro-controller with the computer to send and receive data. Only two pins are required for this interface: PD0 and PD1 of port D. In normal mode these pins work as general input or output, but when used for serial communication they work as RXD and TXD respectively. The micro-controller exchanges data with the computer serially through the RXD and TXD pins of its internal USART. The USART needs to be initialized (programmed) before any transfer starts.

4.3 Software Implementation

The software for DCT and quantization is written in C, and the code is converted into assembly language by the CodeVision compiler. The following flowchart shows the implementation of the software code.

Figure 17 Flow chart for software implementation

4.3.1 Initialization of Micro-Controller

As mentioned before, all the registers inside the micro-controller need to be programmed properly before serial communication between the micro-controller and the computer can start. The following code snippet shows the initialization of the ports and pin directions as well as the initialization of the USART at the start of a transfer.
    // Port A initialization
    PORTA=0x00;    // Input Port
    DDRA=0x00;
    // Port B initialization
    PORTB=0x00;    // Input Port
    DDRB=0x00;
    // Port C initialization
    PORTC=0x00;    // Input Port
    DDRC=0x00;
    // Port D initialization
    PORTD=0x00;    // Input Port
    DDRD=0x00;

    // USART initialization
    UCSRA=0x00;
    UCSRB=0x18;    // Receiver and Transmitter ON
    UCSRC=0x86;    // Communication Parameters: 8 Data, 1 Stop, No Parity
    UBRRH=0x00;    // USART Baud rate: 9600
    UBRRL=0x67;

4.3.2 Implementation of DCT

The micro-controller first fetches an 8x8 block of data from the computer and then performs DCT on that data. After the DCT is completed for one block, the micro-controller sends the data back to the computer and then starts fetching a new block. The process continues until all blocks of the image are completed. The following code snippet shows the implementation of the DCT function.

    for(u=0;u<=7;u++)
    {
        for(v=0;v<=7;v++)
        {
            // scale factor: (1/4)*C(u)*C(v), with C(0)=1/sqrt(2)
            if(u==0 && v==0)
                {coeff=.125;}
            else if (v==0 || u==0)
                {coeff=1.414*.125;}
            else
                {coeff=.25;}
            x=0;
            for(i=0;i<=7;i++)
            {
                for(j=0;j<=7;j++)
                {
                    x=x+(norm[i][j]*(cos(3.142*(i+.5)*u/8))*(cos(3.142*(j+.5)*v/8)));
                }
            }
            dct1[u][v]=x*coeff;
        }
    }

4.4 Code Optimization

Digital signal processing implementations require a lot of computation and simulation time, so it is very important to write optimized code. Software tools and compilers are very good at optimization nowadays, but they are still not always as good as required, and it often becomes necessary to optimize the code by hand after compilation. Optimization is recommended only after an initial implementation has been completed successfully.

In this project I implemented the functionality of the micro-controller in C and then converted that C code into assembly code using a compiler called CodeVision. Although the compiler is good at optimizing the resulting assembly code, there are some portions of the C code that can be optimized better by hand coding than by the compiler.
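Loop control is one such portion. As a minimal sketch (not taken from the project code; the function names are hypothetical), the two functions below visit the same 64 positions of an 8x8 block, first counting up against a limit and then counting down to zero. On AVR a count-down loop can end with a decrement followed by a branch-if-not-zero, so no separate compare is needed; whether a particular compiler actually emits that form is an assumption.

```c
#include <stdint.h>

/* Ascending loops: each pass compares the counter against the limit 7. */
uint8_t visit_counting_up(void)
{
    uint8_t i, j, visits = 0;
    for (i = 0; i <= 7; i++) {
        for (j = 0; j <= 7; j++) {
            visits++;            /* stands in for the per-element work */
        }
    }
    return visits;
}

/* Descending loops: the decrement itself decides whether to repeat,
   so the loop test needs no extra register holding the limit. */
uint8_t visit_counting_down(void)
{
    uint8_t i = 8, j, visits = 0;
    do {
        j = 8;
        do {
            visits++;            /* same per-element work as above */
        } while (--j);           /* repeat until j reaches 0 */
    } while (--i);               /* repeat until i reaches 0 */
    return visits;
}
```

Both forms do the same amount of work per element; the difference is only in the loop bookkeeping, which is why this kind of change shows up as a small saving in code size rather than a change in results.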
I have optimized some functionality of the micro-controller by writing it directly in assembly language, which helps in two ways. First, the hand-written code is more optimized than the compiler output. Second, the compiler does not even have to try to optimize that portion of the code, as it is already written in assembly language. One may ask why the whole code is not written in assembly if that is more optimized. The answer is that it is not always feasible to write assembly code for a whole design, because digital signal processing is very hard to implement in assembly alone: it requires a lot of mathematics and computation.

The following code snippets show one optimization, in which the load, compare and branch sequence used for each loop test is replaced by a decrement and a single branch. This also removes the extra register required for the comparison during the test.

Code for fetching data from computer:

Before optimization:

    // for(i=0;i<=7;i++)
        CLR  R8
    _0x4:
        LDI  R30,LOW(7)   // load the loop limit 7 into R30
        CP   R30,R8       // compare R30 with R8
        BRLO _0x5         // if R30 < R8, come out of “ i ” loop
    // for(j=0;j<=7;j++)
        CLR  R9
    _0x7:
        LDI  R30,LOW(7)   // load the loop limit 7 into R30
        CP   R30,R9       // compare R30 with R9
        BRLO _0x8         // if R30 < R9, come out of “ j ” loop
    // j++ and go inside loop for j
        INC  R9           // increment counter to compare with “ j ” limit
        RJMP _0x7         // jump for next iteration of “ j ”
    _0x8:
        INC  R8           // increment counter to compare with “ i ” limit
        RJMP _0x4         // jump to next iteration for “ i ”

After optimization:

        LDI  R16,8        // i loop, load register R16 with 8
    fori:                 // label for “ i ” loop
        LDI  R18,8        // j loop, load register R18 with 8
    forj:                 // label for “ j ” loop
        DEC  R18          // decrement R18 by 1 (for j loop)
        BRNE forj         // repeat the j loop until R18 reaches 0
        DEC  R16          // decrement R16 by 1 (for i loop)
        BRNE fori         // repeat the i loop until R16 reaches 0

Performance difference:
Flash usage before optimization = 11.7%
Flash usage after optimization = 11.2%

Figure 18 Performance improvement using code optimization

Chapter 5
CONCLUSION

After a detailed study of the JPEG
compression algorithm, I now understand how it is used to compress an image. I also learned how to trade off the size of an image against its quality by controlling the quantization. The JPEG algorithm takes advantage of the limitations of the human eye to reduce the size of an image.

This project is mainly based on image processing, and its main goal was to implement the discrete cosine transform and quantization on a micro-controller and to interface the micro-controller with a computer. I did not know much about the subject when I first started, but while working on this project I learned many important concepts in image processing as well as DSP (Digital Signal Processing).

The milestones achieved in this project are:
1. Successfully implemented DCT and quantization using the micro-controller.
2. Interfaced the micro-controller with a computer.

In the future, this project can be expanded to implement other steps of the JPEG algorithm on the micro-controller, such as parsing the data from the image and Huffman coding, and to interface an extra flash memory in order to process more data.
APPENDIX

    #include <mega32.h>
    #include <delay.h>
    #include <math.h>
    // Standard Input/Output functions
    #include <stdio.h>

    // Declare global variables
    short c[8][8],norm[8][8];
    char qt[8][8];
    int u,v;
    float coeff;
    float x;
    char pixel[8][8];
    unsigned char i,j,q;
    short p;            // block counter, same width as notimes so counts above 255 do not wrap
    short temp_short;
    char key[5]="done",temp;
    short notimes;

    void DCT()
    {
        // int i,j;
        float dct1[8][8];
        float temp;

        //store pixel into short int array norm
        for(i=0;i<=7;i++)
        {
            for(j=0;j<=7;j++)
            {
                if(pixel[i][j]&0x80)
                    norm[i][j]=(0xff00|pixel[i][j]);
                else
                    norm[i][j]=pixel[i][j];
            }
        }

        //DCT
        for(u=0;u<=7;u++)
        {
            for(v=0;v<=7;v++)
            {
                if(u==0 && v==0)
                    {coeff=.125;}
                else if (v==0 || u==0)
                    {coeff=1.414*.125;}
                else
                    {coeff=.25;}
                x=0;
                for(i=0;i<=7;i++)
                {
                    for(j=0;j<=7;j++)
                    {
                        x=x+(norm[i][j]*(cos(3.142*(i+.5)*u/8))*(cos(3.142*(j+.5)*v/8)));
                    }
                }
                dct1[u][v]=x*coeff;
            }
        }

        //implementation of quantization
        for (i=0;i<=7;i++)
        {
            for(j=0;j<=7;j++)
            {
                temp=(dct1[i][j])/(qt[i][j]);
                c[i][j]=(short int)(temp+.5);
            }
        }
    }

    void call_dct()
    {
        PORTA.0=1;
        for(p=0;p<notimes;p++)
        {
            // receive one 8x8 block of pixels
            for(i=0;i<8;i++)
                for(j=0;j<8;j++)
                {
                    pixel[i][j]=getchar();
                }
            DCT();
            // send the key "done"
            for(i=0;i<4;i++)
                putchar(key[i]);
            // send each coefficient, high byte first
            for(i=0;i<8;i++)
                for(j=0;j<8;j++)
                {
                    temp_short=c[i][j]&0xff00;
                    temp=(temp_short>>8);
                    putchar(temp);
                    //delay_ms(10);
                    temp=c[i][j]&0xff;
                    putchar(temp);
                }
        }
        PORTA.0=0;
    }

    void main(void)
    {
        unsigned char header[8];

        // Input/Output Ports initialization
        // Port A initialization
        PORTA=0x00;
        DDRA=0x00;
        // Port B initialization
        PORTB=0x00;
        DDRB=0x00;
        // Port C initialization
        DDRC=0x00;
        // Port D initialization
        PORTD=0x00;
        DDRD=0x00;

        // Timer/Counter 0 initialization
        TCCR0=0x00;
        TCNT0=0x00;
        OCR0=0x00;

        // Timer/Counter 1 initialization
        TCCR1A=0x00;
        TCCR1B=0x00;
        TCNT1H=0x00;
        TCNT1L=0x00;
        ICR1H=0x00;
        ICR1L=0x00;
        OCR1AH=0x00;
        OCR1AL=0x00;
        OCR1BH=0x00;
        OCR1BL=0x00;

        // Timer/Counter 2 initialization
        ASSR=0x00;
        TCCR2=0x00;
        TCNT2=0x00;
        OCR2=0x00;

        // External Interrupt(s) initialization
        MCUCR=0x00;
        MCUCSR=0x00;

        // Timer(s)/Counter(s) Interrupt(s) initialization
        TIMSK=0x00;

        // USART Baud rate: 9600
        UCSRA=0x00;
        UCSRB=0x18;
        UCSRC=0x86;
        UBRRH=0x00;
        UBRRL=0x67;

        ACSR=0x80;
        SFIOR=0x00;

        while (1)
        {
            // receive the 8-byte header
            for(i=0;i<8;i++)
                header[i]=getchar();
            // bytes 1 and 2 hold the number of blocks, high byte first
            temp_short=header[1];
            temp_short<<=8;
            temp_short|=header[2];
            notimes=temp_short;
            // receive the quantization table
            for(i=0;i<8;i++)
                for(j=0;j<8;j++)
                {
                    qt[i][j]=getchar();
                }
            // call_idct() is not included in this listing
            if(header[0]==1)
                call_dct();
            else
                call_idct();
        };
    }

REFERENCES

[1] Randall C. Reininger and Jerry D. Gibson, “Distributions of the Two-Dimensional DCT Coefficients for Images”, IEEE Transactions on Communications, Vol. 31, Issue 6, June 1983
[2] Syed Ali Khayam, “Discrete Cosine Transform (DCT): Theory and Application”, Michigan State University, March 2003
[3] Atmel Corporation, “ATmega32 microcontroller”, pp 1-170, July 2010
[4] Texas Instruments, “MAX232 Drivers/Receivers”, pp 1-9, Oct. 2002
[5] Agostini, Silva and Bampi, “Pipelined fast 2D DCT Architecture for JPEG image compression”, Integrated Circuits and System Design, Issue 2001, pp 226-231, 2001
[6] Viranchi Dwivedi, “JPEG Image Compression and Decompression with Modeling of DCT Coefficients on the Texas Instrument Video Processing Board TMS320DM6437”, Master Project Report, California State University, Sacramento, Summer 2010
[7] David Taubman and Michael Marcellin, “JPEG 2000: Image Compression Fundamentals, Standards and Practice”, Springer 2001, ISBN: 079237519X
[8] Al Bovik, “Handbook of Image & Video Processing”, Academic Press Series, 1999
[9] Sakamoto and Tase, “Software JPEG for a 32-bit MCU with dual issue”, IEEE Transactions on Consumer Electronics, Vol. 44, Issue 4, Nov 1998