EMBEDDED SYSTEM DESIGN OF JPEG IMAGE COMPRESSION Chintan G Govani

B.E., Gujarat University, India, 2007
PROJECT
Submitted in partial satisfaction of
the requirements for the degree of
MASTER OF SCIENCE
in
ELECTRICAL AND ELECTRONIC ENGINEERING
at
CALIFORNIA STATE UNIVERSITY, SACRAMENTO
FALL
2010
EMBEDDED SYSTEM DESIGN OF JPEG IMAGE COMPRESSION
A Project
by
Chintan G Govani
Approved by:
__________________________________, Project Advisor
Jing Pang, Ph.D.
__________________________________, Second Reader
Preetham Kumar, Ph.D.
____________________________
Date
Student: Chintan G Govani
I certify that this student has met the requirements for format contained in the University
format manual, and that this project is suitable for shelving in the Library and credit is to
be awarded for the Project.
__________________________, Graduate Coordinator
Dr. Preetham Kumar
Department of Electrical and Electronic Engineering
________________
Date
Abstract
of
EMBEDDED SYSTEM DESIGN OF JPEG IMAGE COMPRESSION
by
Chintan G Govani
The main goal of this project is to implement the DCT and quantization steps of the JPEG
image compression algorithm in hardware. In this project, the JPEG algorithm converts an
image from BMP format into JPEG format. The main step of the algorithm, the discrete
cosine transform (DCT), is implemented in hardware on an ATmega32 micro-controller, and
the remaining steps are implemented in an application based on the Microsoft Foundation
Class (MFC) library.
The other main task in this project was to interface the micro-controller with the
computer, so that it can receive data from the computer, perform the DCT on it and send
the processed data back. The communication runs over RS-232, with a MAX-232 chip
converting the signals between RS-232 and TTL levels.
The MFC application takes a BMP image as input. It extracts the raw data from the image,
sends it to the micro-controller for processing and waits until the micro-controller has
finished. As soon as the micro-controller is done, it sends the data back to the MFC
application, which completes the remaining steps of the JPEG compression algorithm and
creates a JPEG image that is much smaller than the BMP original.
The report further discusses how the MFC application is implemented, how the hardware is
set up and how the interface between the computer and the micro-controller is established.
______________________, Committee Chair
Jing Pang, Ph.D.
______________________
Date
ACKNOWLEDGMENTS
First of all, before going into the details of this project, I would like to thank
Dr. Jing Pang for allowing me to work under her guidance and for providing
encouragement and advice throughout this project, without which it would not have
been completed. I would also like to thank my teammate Parikshit Nigam for his support
while working on this project. I also want to thank Dr. Preetham Kumar for providing
guidance and proofreading this report. Special thanks to Dr. Suresh Vadhava, Department
Chair of Electrical and Electronics Engineering, for their great support and suggestions.
Finally, I would like to thank all the faculty members of the Electrical and Electronics
Engineering department for their help from the start to the end of my master's degree at
California State University, Sacramento.
TABLE OF CONTENTS
Acknowledgments
List of Tables
List of Figures
Chapter
1. INTRODUCTION
1.1 Introduction to JPEG Algorithm
1.2 Purpose of Project
1.3 Organization of Project Report
2. 2-DIMENSIONAL DISCRETE COSINE TRANSFORM
2.1 Introduction to DCT
2.2 Coefficients
2.3 Quantization
3. HARDWARE SYSTEM OVERVIEW
3.1 ATmega32 Microcontroller Description
3.1.1 Features
3.1.2 Pin Layout of ATmega32
3.1.3 Pin Function - General Description
3.1.4 Pin Function - Alternate Description
3.2 Block Diagram of ATmega32 Micro-Controller
3.3 Oscillator
3.4 Memories
3.5 USART
3.5.1 Clock Generator
3.5.2 USART Frame Format
3.5.3 USART Registers
3.6 Driver/Receiver MAX232
3.6.1 Pin Layout of MAX 232
3.6.2 Pin Description
3.6.3 Functional Description
4. SYSTEM DESIGN AND IMPLEMENTATION
4.1 Flow of Project
4.2 Block Diagram of System
4.3 Software Implementation
4.3.1 Initialization of Micro-Controller
4.3.2 Implementation of DCT
4.4 Code Optimization
5. CONCLUSION
Appendix
References
LIST OF TABLES
Table 1 Input Image Pixel Matrix of Size 8x8
Table 2 Output DCT Coefficient Matrix of size 8x8
Table 3 Quantization matrix for JPEG Standard
Table 4 Output matrix after quantization
Table 5 Input matrix before quantization
Table 6 General functionality of ATmega32 pins
Table 7 Port A pins alternate functionality
Table 8 Port B pins alternate functionality
Table 9 Port C pins alternate functionality
Table 10 Port D pins alternate functionality
Table 11 Equations for calculating Baud Rate
Table 12 UCSRA Description
Table 13 UCSRB Description
Table 14 UCSRC Description
Table 15 UBRRH and UBRRL Description
Table 16 Pin Description of MAX 232
Table 17 Voltage ranges for RS-232 and TTL
LIST OF FIGURES
Figure 1 Steps involved in JPEG Compression Algorithm
Figure 2 Method for computing 2D-DCT using 1D-DCT
Figure 3 Saturn and its 2-D DCT
Figure 4 Image reconstructed using all the DCT coefficients
Figure 5 Image reconstructed utilizing 75% of DCT coefficients
Figure 6 Image reconstructed utilizing 50% of DCT coefficients
Figure 7 Image reconstructed using 25% of DCT coefficients
Figure 8 Pin Layout of ATmega32
Figure 9 ATmega32 Block Diagram
Figure 10 Crystal Oscillator Connections
Figure 11 Block Diagram of USART
Figure 12 USART Frame Format
Figure 13 Pin Diagram of MAX 232
Figure 14 Logic Diagram for Driver / Receiver
Figure 15 Flow chart of project
Figure 16 Block diagram of a system
Figure 17 Flow chart for software implementation
Figure 18 Performance improvement using code optimization
Chapter 1
INTRODUCTION
1.1 Introduction to JPEG Algorithm
The name JPEG is an acronym for “Joint Photographic Experts Group”, the committee
that created the JPEG standard and other related standards. JPEG is an international
standard for compressing images. The steps involved in JPEG compression are shown in
figure 1. The main focus of this project is on the discrete cosine transform (DCT)
and quantization steps of the JPEG algorithm.
Figure 1 Steps involved in JPEG Compression Algorithm
The algorithm starts by dividing the whole image into individual blocks of 8x8 pixels.
All subsequent operations are performed on these blocks. If the image does not divide
into a whole number of 8x8 blocks, the encoder has to pad the incomplete blocks with
dummy data. The values in each 8x8 block are then centered on zero by subtracting 128
from every value, which shifts the range from 0 to 255 to -128 to 127. The DCT is
performed on the resulting matrix. The DCT and quantization are discussed in more
detail in the following chapter.
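The block splitting and level shift described above can be sketched in C as follows. This is only an illustrative sketch, not the project's code: the function names `blocks_along` and `level_shift` are hypothetical, and a block is assumed to be stored as 64 bytes in row-major order.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_EDGE 8u   /* JPEG works on 8x8 pixel blocks */

/* Number of 8x8 blocks needed to cover `pixels` pixels along one axis.
   A partial last block counts as a full block and must be padded. */
unsigned blocks_along(unsigned pixels)
{
    return (pixels + BLOCK_EDGE - 1u) / BLOCK_EDGE;
}

/* Centre 64 unsigned samples (0..255) on zero by subtracting 128,
   giving values in the range -128..127. */
void level_shift(const uint8_t in[64], int8_t out[64])
{
    assert(in && out);
    for (int i = 0; i < 64; i++)
        out[i] = (int8_t)((int)in[i] - 128);
}
```

For a 100-pixel-wide image, `blocks_along(100)` gives 13, so the last block in each row holds 4 real pixels and 4 columns of padding.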
The standard specifies both lossy and lossless image compression. The lossless mode is
not widely used. Lossy compression is far more popular because it greatly reduces the
size of the original image, which saves disk space and lets images be transmitted from
one point to another quickly. The amount of loss depends on the compression ratio, so
the compression parameters can be adjusted to meet a required image size. The higher
the compression ratio, the smaller the image, which means there is a tradeoff between
the size and the quality of an image. JPEG can compress a color image by 10 to 20
times. For example, a 200K image yields a JPEG image of only 10-20K. It can compress a
gray scale image by 4 to 5 times without causing visible loss. Another important
property of the JPEG algorithm is that the decoding speed can be varied by using
different approximations for the required calculations. The JPEG algorithm relies on
the fact that human eyes are less sensitive to small changes in color than to changes
in brightness. For this reason, the algorithm is mainly used in applications where
images are viewed by humans and not by machines, because a machine might easily detect
the changes in color. One other advantage of JPEG is that it can store color
information at 24 bits per pixel. The main disadvantage of lossy JPEG compression is
that it loses more information each time an image is compressed and decompressed.
1.2 Purpose of Project
The main objective of this project is to study the JPEG image compression algorithm
and to implement the discrete cosine transform (DCT) as well as quantization on an
ATmega32 micro-controller. Once the DCT is successfully implemented, the project can
be extended to implement the other steps of the JPEG algorithm on the micro-controller,
and external flash memory can also be interfaced.
1.3 Organization of Project Report
The report contains a detailed description of every aspect of this project, including
the results obtained and possible future enhancements.
Chapter two discusses in detail the major steps involved in JPEG compression, which
are the DCT and quantization. It also gives a brief introduction to the implementation
of the DCT using floating point and fixed point methods.
Chapter three describes the hardware components used to implement this project. The
major components are the ATmega32 micro-controller and the MAX232 receiver/transmitter
IC.
Chapter four explains the system design and implementation with the help of a block
diagram and also discusses the interfacing of the micro-controller with the computer
in order to transfer data.
Chapter five concludes the report and gives a perspective on future implementation.
Chapter 2
2-DIMENSIONAL DISCRETE COSINE TRANSFORM
2.1 Introduction to DCT
The 2-dimensional discrete cosine transform (DCT) is an integral part of the JPEG
algorithm. It is the most important and most costly step in the JPEG compression
process. Fundamentally, the DCT converts a spatial domain representation into a
frequency domain representation. The DCT is closely related to the discrete Fourier
transform (DFT), but it uses only real cosine basis functions, so its coefficients are
real and no imaginary part has to be computed or stored. The two-dimensional DCT is
calculated by first performing a 1-D DCT on the rows followed by a 1-D DCT on the
columns, or vice versa.
2.2 Coefficients
The 2D-DCT can be computed by performing the 1D-DCT on rows and columns separately,
as shown in figure 2 below. The top-left value in the 8-by-8 matrix is called the
“DC value” and is proportional to the average value of the block. All other values in
the block are “AC values”, which represent changes across the height and width of the
block. The main idea behind the DCT is to separate the high and low frequency
information in the image, so that the high frequency components can be eliminated
without losing the low frequency components [2].
Figure 2 Method for computing 2D-DCT using 1D-DCT
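In mathematical form, the DCT of a block of size N x N is given by
$$\mathrm{DCT}(a,b) = \frac{2}{N}\,C(a)\,C(b)\sum_{x=0}^{N-1}\sum_{y=0}^{N-1} f(x,y)\,\cos\!\left[\frac{(2x+1)\,a\,\pi}{2N}\right]\cos\!\left[\frac{(2y+1)\,b\,\pi}{2N}\right] \qquad (1)$$
where f(x, y) is the level-shifted pixel value at position (x, y), a and b are the
frequency indices from 0 to N-1, and C(k) = 1/\sqrt{2} for k = 0 and C(k) = 1 otherwise.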
Since the JPEG standard uses blocks of size 8x8, we can put N = 8 in equation 1. The
DCT step in the JPEG algorithm therefore takes 64 input values and produces a unique
two-dimensional set of 64 values, called the DCT coefficients.
In the 8x8 output matrix (64 values), the top-left value is called the DC coefficient
(zero frequency) and the other 63 values are called AC coefficients. The main
advantage of the DCT is that it concentrates most of the energy in the low frequency
components. Information in the high frequency components can then be discarded without
visibly affecting the quality of the resulting image, because human eyes do not detect
it [2].
In the example below, an 8x8 input block from a gray scale image and the output block
after the DCT are shown in tables 1 and 2.
140  144  147  140  140  155  179  175
144  152  140  147  140  148  167  179
152  155  136  167  163  162  152  172
168  145  156  160  152  155  136  160
162  148  156  148  140  136  147  162
147  167  140  155  155  140  136  162
136  156  123  167  162  144  140  147
148  155  136  155  152  147  147  136
Table 1 Input Image Pixel Matrix of Size 8x8 [6]
186  -18   15   -9   23   -9  -14   19
 21  -34   26   -9  -11   11   14    7
-10  -24   -2    6    3    3  -20   -1
 -8   -5   14  -15   -3   -3   -3    8
 -3   10    8    1   18   18   18   15
  4   -2  -18    8   -4   -4    1   -7
  9    1   -3    4   -7   -7   -1   -2
  0   -8   -2    2    4   -6   -6    0
Table 2 Output DCT Coefficient Matrix of size 8x8 [6]
An example of the DCT applied to an image of Saturn is shown in figures 3 to 7. The
percentage in each figure indicates the share of coefficients used to reconstruct the
image. For example, if an image is of size 100x100, then the DCT with 25% retains only
100x100x0.25 (= 2,500) coefficients out of a total of 10,000. When the number of
coefficients is decreased, the entropy is also reduced, which can be seen in the
histograms. The discrete cosine transform concentrates most of the energy in the lower
frequencies. The values at higher frequencies are small and can most of the time be
approximated by zero. Human eyes are more sensitive to lower frequencies than to
higher frequencies, and the DCT takes advantage of this fact. Implementing the DCT in
hardware is challenging, as even the most optimized algorithm uses at least 18
multiplications and 29 additions. Note the increase in blur and the loss of sharpness
in the reconstructed image as more and more DCT coefficients are discarded. Still, the
image can be viewed without major loss. This property is exploited to reduce the size
on disk when storing an image and to reduce the cost and bandwidth when transmitting
it [2].
Figure 3 Saturn and its 2-D DCT [2]
Figure 4 Image reconstructed using all the DCT coefficients [2]
Figure 5 Image reconstructed utilizing 75% of DCT coefficients [2]
Figure 6 Image reconstructed utilizing 50% of DCT coefficients [2]
Figure 7 Image reconstructed using 25% of DCT coefficients [2]
2.3 Quantization
Quantization is performed by dividing each frequency domain component by the
corresponding value in a quantization matrix and then rounding the result to the
nearest integer. The quantization matrix is a JPEG standard 8x8 matrix with 64
predefined values, shown in table 3 below. The compression ratio of the overall JPEG
algorithm can be varied by multiplying the quantization matrix by a scaling factor. A
low scaling factor gives excellent quality in the resulting image, but the image is
larger than one produced with a high scaling factor. Quantization is the only lossy
step in the JPEG compression algorithm, because of this rounding operation. High
frequency components are mostly quantized with larger values than low frequency
components, to achieve higher compression with less visible loss in the resulting
image.
 16   11   10   16   24   40   51   61
 12   12   14   19   26   58   60   55
 14   13   16   24   40   57   69   56
 14   17   22   29   51   87   80   62
 18   22   37   56   68  109  103   77
 24   35   55   64   81  104  113   92
 49   64   78   87  103  121  120  101
 72   92   95   98  112  100  103   99
Table 3 Quantization matrix for JPEG Standard
Mathematically, quantization is given by the following equation:
Quantized Value (a, b) = round( DCT (a, b) / Quantum (a, b) ) (2)
For example, table 4 below shows the output matrix after quantization with the
quantization matrix of table 3, performed on the input matrix shown in table 5 below.
 12    1   -1    0    0    0    0    0
 -3   -5    1    0    0    0    0    0
 -6    5    0    0    0    1    0    0
 -4   -2    0    1    0    0    0    0
 -3   -1    1    0    0    0    0    0
 -2    2    1    0    0    0    0    0
  0    0   -1    0    0    0    0    0
 -1    0    0    0    0    0    0    0
Table 4 Output matrix after quantization
190   13  -10   -7    1   -1    0    8
-45  -60   14   12   -5    2   -5   -8
-86   62    2  -17    4   40   -4    5
-54  -37  -10   31   24    7   -6    2
-87  -40   50  -18   38  -21   -1    0
-63   62   89   12   -8    6   10   -7
-18   13  -55   45   -6   12    8   10
-55   34  -14  -13   15   -9   -3    0
Table 5 Input matrix before quantization
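The quantization step of equation 2 can be sketched in C as shown below. This is an illustrative sketch under stated assumptions: the names `quantize` and `quantize_block` and the `scale_percent` parameter are hypothetical, rounding is to the nearest integer as described in the text, and blocks are stored as 64 entries in row-major order.

```c
#include <assert.h>
#include <stdint.h>

/* Divide a DCT coefficient by its quantum and round to the nearest integer
   (ties are rounded away from zero). */
int16_t quantize(int16_t coeff, uint16_t quantum)
{
    assert(quantum != 0);
    int32_t q = (int32_t)quantum;
    int32_t c = (int32_t)coeff;
    int32_t half = q / 2;
    return (int16_t)((c >= 0 ? c + half : c - half) / q);
}

/* Quantize a 64-entry block. scale_percent = 100 uses the matrix unchanged,
   larger values give coarser quantization and a smaller file. */
void quantize_block(const int16_t dct[64], const uint8_t qtable[64],
                    unsigned scale_percent, int16_t out[64])
{
    for (int i = 0; i < 64; i++) {
        uint32_t q = ((uint32_t)qtable[i] * scale_percent + 50u) / 100u;
        if (q == 0)
            q = 1;   /* keep the divisor valid for very small scale factors */
        out[i] = quantize(dct[i], (uint16_t)q);
    }
}
```

With the first entries of tables 3 and 5, `quantize(190, 16)` gives 12 and `quantize(-60, 12)` gives -5. Integer arithmetic is used throughout because the ATmega32 has no floating point hardware.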
Chapter 3
HARDWARE SYSTEM OVERVIEW
In this project, I used an ATmega32 micro-controller for communication with the PC as
well as for performing the discrete cosine transform (DCT) step of the JPEG algorithm.
A Texas Instruments MAX232 IC was used to interface the ATmega32 with the PC in order
to transmit and receive data.
3.1 ATmega32 Microcontroller Description
3.1.1 Features
Many other micro-controllers are available, but this one was used for this project
because of the following features.
1. It comes in a 40-pin PDIP package, which is easy to use for general purpose
applications.
2. It has a programmable serial USART with RXD and TXD pins, which is most useful for
this project for transferring data between the micro-controller and the PC.
3. It supports ISP (In-System Programming), which means that the on-chip memory can be
programmed without removing the controller from its socket [3].
4. It is a low power micro-controller.
5. It has 32 K bytes of program memory and separate data memory of 2 K bytes, which is
more than enough for the application of this project.
6. It has its own instruction set and can be programmed in assembly. It can also be
programmed in C, using a cross compiler to convert the program into hex code.
3.1.2 Pin Layout of ATmega32
Figure 8 Pin Layout of ATmega32 [3]
3.1.3 Pin Function - General Description
Table 6 explains the general function of each pin of the ATmega32. Similar pins are
grouped together to make the functionality easier to understand.
Pin Name | Pin Functionality
VCC | Power supply for the chip
GND | Ground
PORT A (PA0…PA7) | 8-bit bi-directional I/O port. Port pins have internal pull-up resistors. Port pins are tri-stated when reset is active.
PORT B (PB0…PB7) | 8-bit bi-directional I/O port. Port pins have internal pull-up resistors. Port pins are tri-stated when reset is active.
PORT C (PC0…PC7) | 8-bit bi-directional I/O port. Port pins have internal pull-up resistors. Port pins are tri-stated when reset is active.
PORT D (PD0…PD7) | 8-bit bi-directional I/O port. Port pins have internal pull-up resistors. Port pins are tri-stated when reset is active.
RESET | A low level on this pin for longer than the minimum pulse length puts the whole chip in the reset condition, even if the clock is not running.
XTAL 1 | Input to the internal oscillator amplifier and to the internal clock operating circuit.
XTAL 2 | Output from the internal oscillator amplifier.
AVCC | Supply pin for port A and the Analog-to-Digital converter.
AREF | Analog reference pin for the Analog-to-Digital converter.
Table 6 General functionality of ATmega32 pins [3]
3.1.4 Pin Function - Alternate Description
Table 6 provides a general description of all the pins of the ATmega32
micro-controller. All four ports (Port A, Port B, Port C and Port D) also have
alternate functions, which are described in tables 7 to 10.
Table 7 Port A pins alternate functionality [3]
Table 8 Port B pins alternate functionality [3]
Table 9 Port C pins alternate functionality [3]
Table 10 Port D pins alternate functionality [3]
3.2 Block Diagram of ATmega32 Micro-Controller
Figure 9 ATmega32 Block Diagram [3]
3.3 Oscillator
Several options are available for clock generation in the ATmega32 micro-controller.
I used a crystal oscillator to generate the clock for this project.
As shown in figure 10 below, there are two crystal pins: XTAL1 is the input and XTAL2
is the output of an inverting amplifier, which can be configured for use as an on-chip
clock generator.
Figure 10 Crystal Oscillator Connections [3]
A CKOPT fuse selects between two oscillator amplifier modes. When CKOPT is programmed,
the oscillator output swings rail-to-rail and a wider frequency range is available.
When CKOPT is not programmed, the output swing is smaller and the usable frequency
range is reduced. The other important point about the crystal oscillator connections
is that C1 and C2 should always be equal [3].
3.4 Memories
The main advantage of the ATmega32 is that it has separate program and data memory
on-chip. It also has extra EEPROM available for data storage.
The ATmega32 has on-chip reprogrammable flash memory for storing the program. Its size
is 32 Kbytes, which is enough for an application like the one I have developed. It has
a separate, secured section for a boot program.
There are a total of 2144 locations for data storage, divided between the register
file, the I/O memory and the internal SRAM. The first 96 locations are reserved for
the register file and I/O locations, while the other 2048 (2K) locations are dedicated
to SRAM data memory.
If required, extra flash memory can be interfaced to the micro-controller for data
storage. Many applications require more data memory than is available on-chip. In this
project, for example, the micro-controller performs the DCT block by block and
transfers each result to the computer. If the DCT had to be performed on all the
blocks before sending the data back to the computer, the micro-controller's on-chip
memory would not be enough and extra flash memory would have to be interfaced with it.
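The memory argument above can be made concrete with a small calculation. The sketch below assumes that every DCT coefficient is stored as a 16-bit integer, which is an assumption and not a figure from the project, and it ignores the stack and all other variables that also live in SRAM. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SRAM_BYTES 2048u        /* internal SRAM of the ATmega32 */
#define BYTES_PER_COEFF 2u      /* assumption: one 16-bit value per coefficient */

/* Bytes needed to hold the coefficients of every 8x8 block of an image. */
uint32_t coeff_buffer_bytes(uint32_t width, uint32_t height)
{
    assert(width > 0 && height > 0);
    uint32_t blocks = ((width + 7u) / 8u) * ((height + 7u) / 8u);
    return blocks * 64u * BYTES_PER_COEFF;
}

/* Non-zero if the whole coefficient buffer would fit in internal SRAM. */
int fits_in_sram(uint32_t width, uint32_t height)
{
    return coeff_buffer_bytes(width, height) <= SRAM_BYTES;
}
```

Under these assumptions a single block needs 128 bytes, while a 100x100 image needs 169 blocks or 21,632 bytes, more than ten times the internal SRAM.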
3.5 USART
The USART (Universal Synchronous and Asynchronous Receiver and Transmitter) is a
serial communication device. It was used in this project to transfer data between the
computer and the micro-controller.
The block diagram of the USART is shown in figure 11 below. The USART has three main
sections: the clock generator, the receiver and the transmitter.
Figure 11 Block Diagram of USART [3]
3.5.1 Clock Generator
The clock generator generates the clocks required by the receiver and the transmitter.
The USART has four clock generation modes:
1) Normal asynchronous mode
2) Double speed asynchronous mode
3) Master synchronous mode
4) Slave synchronous mode
The clock frequency is set by the baud rate generator, which in turn is set by
programming the USART Baud Rate Register (UBRR). The receiver uses the baud rate
generator output directly, while the transmitter divides it by 2, 8 or 16 depending on
the mode of operation. Table 11 gives the equations for calculating the baud rate and
for calculating the value to program into the UBRR register.
Table 11 Equations for calculating Baud Rate [3]
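As a sketch of how these equations are used, the C code below computes the UBRR value for a given oscillator frequency and baud rate with the relation UBRR = f_osc / (D x baud) - 1, where the divisor D is 16 for normal asynchronous mode, 8 for double speed asynchronous mode and 2 for synchronous master mode, as given in the ATmega32 datasheet. The function and type names are hypothetical, and the 8 MHz clock and 9600 baud used in the note below are example values, not settings taken from this project.

```c
#include <assert.h>
#include <stdint.h>

/* Divisor D in UBRR = f_osc / (D * baud) - 1 for each operating mode. */
typedef enum {
    USART_ASYNC_NORMAL = 16,
    USART_ASYNC_DOUBLE = 8,
    USART_SYNC_MASTER  = 2
} usart_mode_t;

/* UBRR value closest to the requested baud rate. */
uint16_t ubrr_for(uint32_t f_osc, uint32_t baud, usart_mode_t mode)
{
    assert(baud != 0);
    uint32_t div = (uint32_t)mode * baud;
    return (uint16_t)((f_osc + div / 2u) / div - 1u);
}

/* Baud rate actually produced by a given UBRR value (truncated to an integer). */
uint32_t baud_for(uint32_t f_osc, uint16_t ubrr, usart_mode_t mode)
{
    return f_osc / ((uint32_t)mode * ((uint32_t)ubrr + 1u));
}
```

For an 8 MHz clock and 9600 baud in normal asynchronous mode this gives UBRR = 51, which produces an actual rate of about 9615 baud, an error of roughly 0.2%.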
3.5.2 USART Frame Format
The USART frame has following parts:
a) Start bit
b) 5,6,7,8 or 9 data bits
c) No, even or odd parity bit
d) 1 or 2 stop bits
The USART frame always begins with a start bit, which marks the start of the frame,
followed by the data bits. The parity bit is optional. The frame format sequence is
shown in figure 12.
Figure 12 USART Frame Format [3]
Where,
St = Start bit, always low
0 to 8 = Data bits
P = Parity bit (optional), even or odd
Sp = Stop bit, always high
IDLE = No transfer, line must be high
The frame format is defined when the USART is initialized for communication. Several
registers inside the USART need to be programmed before any transfer is initiated.
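The frame format determines how many bits go over the line for each data byte, and therefore how long one 8x8 block takes to transfer. The sketch below illustrates this. The function names are hypothetical, and the 9600 baud figure in the note is an example value rather than the setting used in this project.

```c
#include <assert.h>
#include <stdint.h>

/* Total bits on the line for one frame: start + data + optional parity + stop. */
unsigned frame_bits(unsigned data_bits, int parity_enabled, unsigned stop_bits)
{
    assert(data_bits >= 5 && data_bits <= 9);
    assert(stop_bits == 1 || stop_bits == 2);
    return 1u + data_bits + (parity_enabled ? 1u : 0u) + stop_bits;
}

/* Time in microseconds to send `frames` frames back to back at `baud`. */
uint32_t transfer_time_us(uint32_t frames, unsigned bits_per_frame, uint32_t baud)
{
    assert(baud != 0);
    return (uint32_t)(((uint64_t)frames * bits_per_frame * 1000000u) / baud);
}
```

With 8 data bits, no parity and 1 stop bit, each frame is 10 bits long, so the 64 bytes of one block take about 66.7 ms at 9600 baud.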
3.5.3 USART Registers
1) UCSRA (USART Control and Status Register A)
Table 12 shows the description of this register.
Bit Number      7      6      5      4      3      2      1      0
Bit Name        RXC    TXC    UDRE   FE     DOR    PE     U2X    MPCM
Read/Write      R      R/W    R      R      R      R      R/W    R/W
Initial Value   0      0      1      0      0      0      0      0
Table 12 UCSRA Description [3]
Bit 7 - RXC (USART Receive Complete)
Set: When there is unread data in the receive buffer
Clear: When the receive buffer is empty
Bit 6 - TXC (USART Transmit Complete)
Set: When the entire frame has been shifted out and the transmit buffer holds no new data
Clear: When the transmit complete interrupt is executed
Bit 5 - UDRE (USART Data Register Empty)
Set: The transmit buffer is empty and ready to receive new data
Clear: The transmit buffer is not empty
Bit 4 - FE (Frame Error)
Set: If the first stop bit of the received data is zero
Clear: If the stop bit of the received data is one
Bit 3 - DOR (Data Overrun)
Set: When a data overrun is detected
Clear: Always write this bit to zero when writing to UCSRA
Bit 2 - PE (Parity Error)
Set: If the next data in the receive buffer has a parity error
Clear: Always write this bit to zero when writing to UCSRA
Bit 1 - U2X (Double the USART Transmission Speed)
Set: Doubles the transfer rate for asynchronous operation
Clear: Always write zero for synchronous operation
Bit 0 - MPCM (Multi-Processor Communication Mode)
Set: Enables multi-processor communication mode
Clear: Write zero when using only one processor
2) UCSRB (USART Control and Status Register B)
Bit Number      7      6      5      4      3      2      1      0
Bit Name        RXCIE  TXCIE  UDRIE  RXEN   TXEN   UCSZ2  RXB8   TXB8
Read/Write      R/W    R/W    R/W    R/W    R/W    R/W    R      R/W
Initial Value   0      0      0      0      0      0      0      0
Table 13 UCSRB Description [3]
Bit 7 - RXCIE (Receive Complete Interrupt Enable)
- Setting this bit enables the interrupt on the RXC flag in UCSRA
Bit 6 - TXCIE (Transmit Complete Interrupt Enable)
- Setting this bit enables the interrupt on the TXC flag in UCSRA
Bit 5 - UDRIE (USART Data Register Empty Interrupt Enable)
- Setting this bit enables the interrupt on the UDRE flag in UCSRA
Bit 4 - RXEN (Receiver Enable)
- Setting this bit enables the USART receiver and overrides the normal port function
of the RXD pin
Bit 3 - TXEN (Transmitter Enable)
- Setting this bit enables the USART transmitter and overrides the normal port
function of the TXD pin
Bit 2 - UCSZ2 (Character Size)
- This bit, in combination with UCSZ1:0, sets the number of data bits in a frame
Bit 1 - RXB8 (Receive Data Bit 8)
- The ninth data bit of the received frame when using frames with nine data bits
Bit 0 - TXB8 (Transmit Data Bit 8)
- The ninth data bit of the transmitted frame when using frames with nine data bits
3) UCSRC (USART Control and Status Register C)
Bit Number      7      6      5      4      3      2      1      0
Bit Name        URSEL  UMSEL  UPM1   UPM0   USBS   UCSZ1  UCSZ0  UCPOL
Read/Write      R/W    R/W    R/W    R/W    R/W    R/W    R/W    R/W
Initial Value   1      0      0      0      0      1      1      0
Table 14 UCSRC Description [3]
Bit 7 - URSEL (Register Select)
Set: When accessing the UCSRC register
Clear: When accessing the UBRRH register
Bit 6 - UMSEL (USART Mode Select)
Set: Enables synchronous operation
Clear: Enables asynchronous operation
Bit 5:4 - UPM1:0 (Parity Mode)
00: Parity mode disabled
01: Reserved
10: Even parity enabled
11: Odd parity enabled
Bit 3 - USBS (Stop Bit Select)
Set: Enables communication with 2 stop bits
Clear: Enables communication with 1 stop bit
Bit 2:1 - UCSZ1:0 (Character Size)
- Works in combination with UCSZ2 to select the number of data bits in a frame
Bit 0 - UCPOL (Clock Polarity)
Set: Transmitted data is changed on the falling edge of the clock and received data is
sampled on the rising edge of the clock
Clear: Transmitted data is changed on the rising edge of the clock and received data
is sampled on the falling edge of the clock
4) UBRRH and UBRRL (USART Baud Rate Registers)
Bit No.      15     14   13   12   11   10   9    8
Bit Name     URSEL  -    -    -    UBRR [11:8]
Read/Write   R/W    R    R    R    R/W  R/W  R/W  R/W
Bit No.      7      6    5    4    3    2    1    0
Bit Name     UBRR [7:0]
Read/Write   R/W    R/W  R/W  R/W  R/W  R/W  R/W  R/W
Table 15 UBRRH and UBRRL Description [3]
The initial value of all bits is zero.
Bit 15 - URSEL (Register Select)
Set: Selects the UCSRC register
Clear: Selects the UBRRH register
Bit 14:12 - Reserved Bits
Bit 11:0 - UBRR 11:0 (USART Baud Rate Register)
- Bits 11:8 are from the UBRRH register and bits 7:0 are from the UBRRL register.
- These bits are used to set the baud rate.
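The register value for a given baud rate follows from the asynchronous normal-mode relation UBRR = f_osc / (16 x baud) - 1. The sketch below evaluates it on a host compiler. The helper names and the 16 MHz clock are assumptions made for illustration; 16 MHz is the clock that is consistent with the value 0x67 used for 9600 baud in the initialization code of this report.

```c
#include <stdint.h>

/* Hypothetical helper: UBRR value for asynchronous normal mode (U2X = 0).
 * The division is rounded to the nearest integer to keep the baud rate
 * error small. */
static uint16_t ubrr_for_baud(uint32_t f_osc_hz, uint32_t baud)
{
    uint32_t divisor = 16UL * baud;
    return (uint16_t)((f_osc_hz + divisor / 2) / divisor - 1);
}

/* Split the 12-bit value across the two registers.
 * Bit 15 (URSEL) stays zero so that the write goes to UBRRH. */
static uint8_t ubrr_high(uint16_t ubrr) { return (uint8_t)((ubrr >> 8) & 0x0F); }
static uint8_t ubrr_low(uint16_t ubrr)  { return (uint8_t)(ubrr & 0xFF); }
```

With these assumptions, ubrr_for_baud(16000000UL, 9600UL) evaluates to 103 (0x67), so UBRRH would be written with 0x00 and UBRRL with 0x67.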
3.6 Driver/Receiver MAX 232
The micro-controller communicates with the PC through an IC called MAX 232,
which provides line driver (transmitter) and receiver functionality. The main
feature of this IC is that it contains two driver/receiver pairs.
3.6.1 Pin Layout of Max 232
Figure 13 Pin Diagram of MAX 232 [4]
3.6.2 Pin Description
Pin Name (Pin Number)   Pin Description
C1+ (1)                 + connector for capacitor C1
VS+ (2)                 Positive output of voltage pump
C1- (3)                 - connector for capacitor C1
C2+ (4)                 + connector for capacitor C2
C2- (5)                 - connector for capacitor C2
VS- (6)                 Negative output of voltage pump
T2OUT (7)               Driver 2 output
R2IN (8)                Receiver 2 input
R2OUT (9)               Receiver 2 output
T2IN (10)               Driver 2 input
T1IN (11)               Driver 1 input
R1OUT (12)              Receiver 1 output
R1IN (13)               Receiver 1 input
T1OUT (14)              Driver 1 output
GND (15)                Ground
VCC (16)                Power supply
Table 16 Pin Description of MAX 232 [4]
3.6.3 Functional Description
The basic function of the MAX 232 is to convert signals between the levels of an
RS-232 serial port and the TTL levels used by digital logic circuits. The drivers
convert from TTL to RS-232 and the receivers convert from RS-232 to TTL. The
conversion changes the voltage level on the corresponding pin. The function of
the receiver and the driver is shown in the following figure.
Figure 14 Logic Diagram for Driver / Receiver [4]
RS-232 uses voltages in the range of -15V to -3V for logic “1” and +3V to +15V
for logic “0”. These voltage ranges are not compatible with digital logic
circuits and therefore need to be converted to appropriate levels before the two
can communicate. Table 17 shows the voltage ranges for RS-232 and TTL
along with the corresponding logic levels.
RS-232          TTL             Logic
-15V to -3V     +2V to +5V      1
+3V to +15V     0V to +0.8V     0
Table 17 Voltage ranges for RS-232 and TTL [10]
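The RS-232 column of Table 17 can be written as a small classification function. This is only an illustrative sketch; the function name and the return convention (-1 for a voltage outside both ranges) are assumptions and are not part of the project code.

```c
/* Hypothetical helper: map an RS-232 line voltage to its logic level
 * according to Table 17. Returns 1 or 0 for a valid level and -1 for a
 * voltage that lies in neither range. */
static int rs232_logic_level(double volts)
{
    if (volts >= -15.0 && volts <= -3.0)
        return 1;   /* negative voltage represents logic 1 */
    if (volts >= 3.0 && volts <= 15.0)
        return 0;   /* positive voltage represents logic 0 */
    return -1;      /* undefined region or out of range */
}
```

The inverted relation between voltage sign and logic level is the reason the MAX 232 drivers and receivers are inverting.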
Chapter 4
SYSTEM DESIGN AND IMPLEMENTATION
4.1 Flow of Project
The main goal of this project was to implement the DCT and quantization steps
of the JPEG algorithm on a micro-controller in order to convert an image from BMP
format to JPEG format. This is accomplished with a combination of software and
hardware. The flow chart of the project implementation is shown in figure 15. The
micro-controller is involved in only two steps of that flow chart.
First, the input gray scale image is loaded into buffers and its headers are
processed according to the type of image by an MFC application. For valid BMP files
the data is sampled.
The micro-controller communicates with the MFC application on the computer in
order to start the data transfer. First, an 8-byte header is sent to the
micro-controller. The first byte tells the micro-controller what action to take on
the incoming data. If the first byte is zero (0) the micro-controller performs DCT
on the data, and if it is one (1) the micro-controller performs IDCT on the incoming
values. The next two bytes contain the number of blocks of data that are going to be
transferred. The remaining 5 bytes are reserved for future use.
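The header layout described above can be sketched as a small parser. The struct and function names are hypothetical, and the byte order of the block count (high byte first) is an assumption taken from the way the appendix code assembles it from header[1] and header[2]. The command byte is returned as received, without interpreting it.

```c
#include <stdint.h>

/* Hypothetical view of the 8-byte transfer header. */
typedef struct {
    uint8_t  command;      /* byte 0: selects the operation        */
    uint16_t block_count;  /* bytes 1-2: number of 8x8 blocks      */
} transfer_header;

/* Assumption: the block count is sent high byte first.
 * Bytes 3 to 7 are reserved and ignored here. */
static transfer_header parse_header(const uint8_t raw[8])
{
    transfer_header h;
    h.command = raw[0];
    h.block_count = (uint16_t)(((uint16_t)raw[1] << 8) | raw[2]);
    return h;
}
```

For example, a header starting with the bytes 0x00, 0x01, 0x2C would announce 300 blocks.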
After the header, the micro-controller receives the quantization table from the
computer. Once it is received, the micro-controller decides which function needs to
be called. It then reads one block of data into a 64-byte character array. This data
is passed to the “DCT ()” function within the micro-controller.
The “DCT ()” function takes some time to process the data in the micro-controller,
so the MFC application polls for the key “done” from the micro-controller. Once
processing of the given block is complete, the micro-controller sends the key “done”
to the MFC application. The micro-controller then sends the output of “DCT ()”,
which is 128 bytes (64 short int).
This information is written back into the files in the required order, and the file
pointers as well as the dynamically allocated memory are freed. The MFC application
then processes that data, inserts the header part and converts it into a JPEG
file.
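On the computer side, the reply for one block consists of the 4-byte key followed by 128 data bytes. The sketch below shows how such a reply could be checked and unpacked. Both function names are hypothetical, and the high-byte-first order is an assumption based on the order in which the appendix code sends the two halves of each value.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the 4 received bytes are the key "done". */
static int is_done_key(const uint8_t key[4])
{
    return memcmp(key, "done", 4) == 0;
}

/* Rebuild 64 signed 16-bit coefficients from the 128 received bytes.
 * Assumption: each coefficient arrives high byte first. */
static void unpack_coefficients(const uint8_t in[128], int16_t out[64])
{
    for (int k = 0; k < 64; k++) {
        uint16_t word = (uint16_t)(((uint16_t)in[2 * k] << 8) | in[2 * k + 1]);
        /* Convert the two's complement bit pattern without relying on
         * implementation-defined narrowing. */
        out[k] = (word & 0x8000u) ? (int16_t)((int32_t)word - 65536)
                                  : (int16_t)word;
    }
}
```

The bytes 0xFF, 0xFE unpack to the coefficient -2, and the bytes 0x01, 0x00 unpack to 256.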
Figure 15 Flow chart of project
4.2 Block diagram of System
The block diagram of the whole system is shown in figure 16 below.
Figure 16 Block diagram of a system
As shown in figure 16, the system consists of the ATmega32 micro-controller,
the MAX232 receiver/transmitter, connectors and a power supply. The main part of
this system is the micro-controller, which performs DCT and quantization. The MAX232
is used to interface the micro-controller with the computer to send and receive data.
Only two pins are required for this interface: PD0 and PD1 of port D. In normal
mode these pins work as general-purpose inputs or outputs, but when used for serial
communication they work as RXD and TXD respectively. The micro-controller sends
and receives data serially through the RXD and TXD pins of its internal USART.
The USART needs to be initialized (programmed) before any transfer starts.
4.3 Software Implementation
The software for DCT and quantization is implemented in the C language and
the code is converted into assembly language using the CodeVisionAVR compiler. The
following flowchart shows the implementation of the software code.
Figure 17 Flow chart for software implementation
4.3.1 Initialization of Micro-Controller
As mentioned before, the registers inside the micro-controller need to be
programmed properly in order to start serial communication between the
micro-controller and the computer. The following code snippet shows the
initialization of the ports and the direction of their pins as well as the
initialization of the USART at the start of a transfer.

// Port A initialization
PORTA=0x00; // Input port
DDRA=0x00;

// Port B initialization
PORTB=0x00; // Input port
DDRB=0x00;

// Port C initialization
PORTC=0x00; // Input port
DDRC=0x00;

// Port D initialization
PORTD=0x00; // Input port
DDRD=0x00;

// USART initialization
UCSRA=0x00;
UCSRB=0x18; // Receiver and transmitter ON
UCSRC=0x86; // Communication parameters: 8 data bits, 1 stop bit, no parity
UBRRH=0x00; // USART baud rate: 9600
UBRRL=0x67;
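The two control values in the snippet above can be read against the USART register descriptions by composing them from the individual bit positions. The sketch below does this on a host compiler. The enum and constant names are introduced here for illustration only; the bit positions are those listed for UCSRB and UCSRC.

```c
#include <stdint.h>

/* Bit positions taken from the UCSRB and UCSRC register descriptions. */
enum { RXEN = 4, TXEN = 3 };               /* UCSRB */
enum { URSEL = 7, UCSZ1 = 2, UCSZ0 = 1 };  /* UCSRC */

/* Receiver and transmitter enabled, all USART interrupts disabled. */
static const uint8_t UCSRB_VALUE = (1u << RXEN) | (1u << TXEN);

/* URSEL selects UCSRC; UMSEL = 0 (asynchronous), UPM1:0 = 00 (no parity),
 * USBS = 0 (1 stop bit), UCSZ1:0 = 11 (8 data bits, with UCSZ2 = 0). */
static const uint8_t UCSRC_VALUE = (1u << URSEL) | (1u << UCSZ1) | (1u << UCSZ0);
```

The two constants evaluate to 0x18 and 0x86, which are the values written to UCSRB and UCSRC.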
4.3.2 Implementation of DCT
The micro-controller first fetches an 8x8 block of data from the computer and then
performs DCT on that data. After the DCT is completed for one block, the
micro-controller sends the data back to the computer and then starts fetching a new
block. The process continues until all blocks of the image are completed.
The following code snippet shows the implementation of the DCT function.
for(u=0;u<=7;u++)
{
    for(v=0;v<=7;v++)
    {
        // Scale factor for coefficient (u,v)
        if(u==0 && v==0)
            {coeff=.125;}
        else if (v==0 || u==0)
            {coeff=1.414*.125;}
        else
            {coeff=.25;}
        // Accumulate the weighted sum over all 64 input values
        x=0;
        for(i=0;i<=7;i++)
        {
            for(j=0;j<=7;j++)
            {
                x=x+(norm[i][j]*(cos(3.142*(i+.5)*u/8))*(cos(3.142*(j+.5)*v/8)));
            }
        }
        dct1[u][v]=x*coeff;
    }
}
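The snippet above calls cos() twice for every term of the sum. A common alternative, sketched here and not taken from the project code, is to read the cosine values from a small constant table, since every argument is a multiple of pi/16. All names in this sketch are hypothetical, and it uses double arithmetic so that it can be checked on a host compiler.

```c
/* cos(m*pi/16) for m = 0..8 */
static const double COS16[9] = {
    1.0, 0.9807852804, 0.9238795325, 0.8314696123, 0.7071067812,
    0.5555702330, 0.3826834324, 0.1950903220, 0.0
};

/* cos(m*pi/16) for any m >= 0, using the symmetries of the cosine. */
static double cos_pi16(int m)
{
    m %= 32;                                     /* period is 2*pi        */
    if (m > 16) m = 32 - m;                      /* cos(2*pi - x) = cos x */
    return (m > 8) ? -COS16[16 - m] : COS16[m];  /* cos(pi - x) = -cos x  */
}

/* 2-D DCT of one 8x8 block with the same scale factors as the snippet:
 * 1/4 in general, multiplied by 1/sqrt(2) for each of u and v that is 0. */
static void dct8x8(short in[8][8], double out[8][8])
{
    for (int u = 0; u < 8; u++) {
        for (int v = 0; v < 8; v++) {
            double sum = 0.0;
            for (int i = 0; i < 8; i++)
                for (int j = 0; j < 8; j++)
                    sum += in[i][j] * cos_pi16((2 * i + 1) * u)
                                    * cos_pi16((2 * j + 1) * v);
            double cu = (u == 0) ? COS16[4] : 1.0;  /* COS16[4] = 1/sqrt(2) */
            double cv = (v == 0) ? COS16[4] : 1.0;
            out[u][v] = 0.25 * cu * cv * sum;
        }
    }
}
```

A quick sanity check is a block in which every value equals 10: only the coefficient out[0][0] is non-zero, and it equals 8 times the input value, that is 80.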
4.4 Code Optimization
Implementation in digital signal processing requires a lot of computation and
simulation time, so it is very important to write optimized code. Software tools and
compilers are very good at optimization nowadays, but they are still not as good as
required. Many times it becomes necessary to optimize the code after compilation.
The optimization is recommended after successful completion of an initial
implementation.
In this project I have implemented the functionality of the micro-controller in the
C language and then converted that C code into assembly code using the CodeVisionAVR
compiler. Although the compiler is good at optimizing the resulting assembly code,
there are some portions of the C code which can be optimized better by hand coding
than by the compiler.
I have written some functionality of the micro-controller directly in assembly
language, which helps in two ways. First, the code is more optimized than the
compiler output, and second, the compiler does not even have to try to optimize that
portion of the code as it is already written in assembly language. One may ask why
the whole code is not written in assembly if that is more optimized. The answer is
that it is not always feasible to write assembly code for the whole design, because
digital signal processing is very hard to implement in assembly alone as it requires
a lot of mathematics and computation.
The following code snippets show an optimization that replaces the load, compare and
branch sequence of the loop test with a decrement followed by a single branch
instruction. It also removes the extra register required for the comparison during
the test.
Code for fetching data from the computer:
Before optimization:
// for(i=0;i<=7;i++)
    CLR  R8              // i = 0
_0x4:
    LDI  R30,LOW(7)      // load the loop limit 7 into R30
    CP   R30,R8          // compare R30 with i (R8)
    BRLO _0x5            // if R30 < R8, leave the "i" loop
// for(j=0;j<=7;j++)
    CLR  R9              // j = 0
_0x7:
    LDI  R30,LOW(7)      // load the loop limit 7 into R30
    CP   R30,R9          // compare R30 with j (R9)
    BRLO _0x8            // if R30 < R9, leave the "j" loop
// loop body (not shown)
    INC  R9              // j++
    RJMP _0x7            // jump to the next iteration of the "j" loop
_0x8:
    INC  R8              // i++
    RJMP _0x4            // jump to the next iteration of the "i" loop
_0x5:                    // end of the "i" loop
After optimization:
    LDI  R16,8           // i loop: load register R16 with 8
fori:                    // label for the "i" loop
    LDI  R18,8           // j loop: load register R18 with 8
forj:                    // label for the "j" loop
// loop body (not shown)
    DEC  R18             // decrement R18 by 1 (j counter)
    BRNE forj            // repeat the "j" loop until R18 reaches 0
    DEC  R16             // decrement R16 by 1 (i counter)
    BRNE fori            // repeat the "i" loop until R16 reaches 0
Performance difference:
Flash usage before optimization = 11.7%
Flash usage after optimization = 11.2%
Figure 18 Performance improvement using code optimization
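The same idea can be expressed at the C level: a loop that counts down to zero needs only a "not zero" test, which is what the DEC/BRNE pair provides. The sketch below compares the two loop forms; the function names are hypothetical, and whether a particular compiler actually emits DEC/BRNE for the second form is an assumption that has to be checked in the generated listing.

```c
#include <stdint.h>

/* Counts up and compares the counter against a limit on every pass. */
static uint8_t count_up_iterations(void)
{
    uint8_t n = 0;
    for (uint8_t i = 0; i <= 7; i++)
        for (uint8_t j = 0; j <= 7; j++)
            n++;
    return n;
}

/* Counts down to zero; the loop test is only "counter != 0". */
static uint8_t count_down_iterations(void)
{
    uint8_t n = 0;
    uint8_t i = 8;
    do {
        uint8_t j = 8;
        do {
            n++;
        } while (--j != 0);
    } while (--i != 0);
    return n;
}
```

Both functions execute the loop body 64 times. In the count-down form the counters run from 8 to 1 instead of 0 to 7, so a loop body that uses them as array indices has to be adapted, for example by stepping a pointer through the array.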
Chapter 5
CONCLUSION
After a detailed study of the JPEG compression algorithm, I now understand how it
is used to compress an image. I also learned how to trade off the size of an image
against its quality by controlling the quantization. The JPEG algorithm takes
advantage of the limitations of the human eye to reduce the size of an image.
This project is mainly based on image processing, and its main goal was to
implement the discrete cosine transform and quantization on a micro-controller and
to interface the micro-controller with a computer. I did not have much idea about
the project when I first started it, but while working on it I got to know many
important concepts of image processing as well as DSP (Digital Signal Processing).
The milestones achieved in this project are:
1. Successfully implemented DCT and quantization using a micro-controller
2. Successfully interfaced a computer with the micro-controller
In future, this project can be expanded to implement other steps of the JPEG
algorithm, such as parsing the data from the image and Huffman coding, on the
micro-controller, and also to interface an extra flash memory to process more data.
APPENDIX
#include <mega32.h>
#include <delay.h>
#include<math.h>
// Standard Input/Output functions
#include <stdio.h>
// Declare global variables
short c[8][8],norm[8][8];
char qt[8][8];
int u,v;
float coeff;
float x;
char pixel[8][8];
unsigned char i,j,q;
unsigned short p; // block counter; must be able to count up to notimes (16 bits)
short temp_short;
char key[5]="done",temp;
short notimes;
void DCT()
{
    // int i,j;
    float dct1[8][8];
    float temp;

    // Store each pixel, sign-extended, into the short int array norm
    for(i=0;i<=7;i++)
    {
        for(j=0;j<=7;j++)
        {
            if(pixel[i][j]&0x80)
                norm[i][j]=(0xff00|pixel[i][j]);
            else
                norm[i][j]=pixel[i][j];
        }
    }

    // DCT
    for(u=0;u<=7;u++)
    {
        for(v=0;v<=7;v++)
        {
            // Scale factor for coefficient (u,v)
            if(u==0 && v==0)
                {coeff=.125;}
            else if (v==0 || u==0)
                {coeff=1.414*.125;}
            else
                {coeff=.25;}
            x=0;
            for(i=0;i<=7;i++)
            {
                for(j=0;j<=7;j++)
                {
                    x=x+(norm[i][j]*(cos(3.142*(i+.5)*u/8))*(cos(3.142*(j+.5)*v/8)));
                }
            }
            dct1[u][v]=x*coeff;
        }
    }

    // Quantization: divide by the table entry and round to the nearest integer
    for (i=0;i<=7;i++)
    {
        for(j=0;j<=7;j++)
        {
            temp=(dct1[i][j])/(qt[i][j]);
            // Round half away from zero; adding .5 alone mis-rounds negative values
            if(temp<0)
                c[i][j]=(short int)(temp-.5);
            else
                c[i][j]=(short int)(temp+.5);
        }
    }
}
void call_dct()
{
    PORTA.0=1;
    for(p=0;p<notimes;p++)
    {
        // Receive one 8x8 block of pixels
        for(i=0;i<8;i++)
            for(j=0;j<8;j++)
            {
                pixel[i][j]=getchar();
            }
        DCT();
        // Tell the computer that the block has been processed
        for(i=0;i<4;i++)
            putchar(key[i]);
        // Send each 16-bit result, high byte first
        for(i=0;i<8;i++)
            for(j=0;j<8;j++)
            {
                temp_short=c[i][j]&0xff00;
                temp=(temp_short>>8);
                putchar(temp);
                //delay_ms(10);
                temp=c[i][j]&0xff;
                putchar(temp);
            }
    }
    PORTA.0=0;
}
void main(void)
{
unsigned char header[8];
// Input/Output Ports initialization
// Port A initialization
PORTA=0x00;
DDRA=0x00;
// Port B initialization
PORTB=0x00;
DDRB=0x00;
// Port C initialization
DDRC=0x00;
// Port D initialization
PORTD=0x00;
DDRD=0x00;
// Timer/Counter 0 initialization
TCCR0=0x00;
TCNT0=0x00;
OCR0=0x00;
// Timer/Counter 1 initialization
TCCR1A=0x00;
TCCR1B=0x00;
TCNT1H=0x00;
TCNT1L=0x00;
ICR1H=0x00;
ICR1L=0x00;
OCR1AH=0x00;
OCR1AL=0x00;
OCR1BH=0x00;
OCR1BL=0x00;
// Timer/Counter 2 initialization
ASSR=0x00;
TCCR2=0x00;
TCNT2=0x00;
OCR2=0x00;
// External Interrupt(s) initialization
MCUCR=0x00;
MCUCSR=0x00;
// Timer(s)/Counter(s) Interrupt(s) initialization
TIMSK=0x00;
// USART Baud rate: 9600
UCSRA=0x00;
UCSRB=0x18;
UCSRC=0x86;
UBRRH=0x00;
UBRRL=0x67;
ACSR=0x80;
SFIOR=0x00;
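while (1)
{
    // Receive the 8-byte header
    for(i=0;i<8;i++)
        header[i]=getchar();
    // Bytes 1 and 2 hold the number of blocks, high byte first
    temp_short=header[1];
    temp_short<<=8;
    temp_short|=header[2];
    notimes=temp_short;
    // Receive the 8x8 quantization table
    for(i=0;i<8;i++)
        for(j=0;j<8;j++)
        {
            qt[i][j]=getchar();
        }
    // Select the operation from the first header byte
    if(header[0]==1)
        call_dct();
    else
        call_idct();
}
}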
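The quantization step of the listing can be tried on a host compiler in isolation. The sketch below is a hypothetical stand-alone helper, not part of the project code. It divides one DCT coefficient by its quantization table entry and rounds to the nearest integer, with halves rounded away from zero, and it assumes that the table entry is never zero.

```c
/* Hypothetical helper: quantize one DCT coefficient.
 * Assumption: q is non-zero. */
static short quantize(float coeff, unsigned char q)
{
    float t = coeff / (float)q;
    /* The cast truncates toward zero, so the offset must carry the
     * sign of the value to give round-to-nearest. */
    return (short)(t >= 0.0f ? t + 0.5f : t - 0.5f);
}
```

With a table entry of 10, the coefficients 27 and -27 quantize to 3 and -3, while 4 and -4 both quantize to 0.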
REFERENCES
[1] Randall C. Reininger and Jerry D. Gibson, “Distributions of the Two-Dimensional DCT Coefficients for Images”, IEEE Transactions on Communications, Vol. 31, Issue 6, June 1983
[2] Syed Ali Khayam, “Discrete Cosine Transform (DCT): Theory and Application”, Michigan State University, March 2003
[3] Atmel Corporation, “ATmega32 microcontroller”, pp 1-170, July 2010
[4] Texas Instruments, “MAX232 Drivers/Receivers”, pp 1-9, Oct. 2002
[5] Agostini, Silva and Bampi, “Pipelined fast 2D DCT Architecture for JPEG image compression”, Integrated Circuits and System Design, Issue 2001, pp 226-231, 2001
[6] Viranchi Dwivedi, “JPEG Image Compression and Decompression with Modeling of DCT Coefficients on the Texas Instrument Video Processing Board TMS320DM6437”, Master Project Report, California State University, Sacramento, Summer 2010
[7] David Taubman and Michael Marcellin, “JPEG 2000: Image Compression Fundamentals, Standards and Practice”, Springer 2001, ISBN: 079237519X
[8] Al Bovik, “Handbook of Image & Video Processing”, Academic Press Series, 1999
[9] Sakamoto and Tase, “Software JPEG for a 32-bit MCU with dual issue”, IEEE Transactions on Consumer Electronics, Vol. 44, Issue 4, Nov 1998