JPEG Decoder Design - UCR Libraries Websites

EE175WS-00-11
JPEG Decoder Design
Team # 11
EE 175AB: Senior Design Project
June 14, 2000
John P. Jones
Technical Advisor: Frank Vahid
Project Advisor: Barry S. Todd
Executive Summary
This design project consisted of designing and implementing a JPEG decoder system.
JPEG is a commonly used digital image compression algorithm officially known as ISO
Standard 10918-1. JPEG coding allows digital images to be stored in compressed form, with compression ratios anywhere from 12:1 to 100:1 depending on the acceptable loss in image quality.
This JPEG decoder system is primarily intended for use in consumer electronics devices
such as digital cameras. This use requires the design to have a number of features
including low price, low power, compact design, and high speed. To meet these
requirements the system was designed as a custom digital logic component described in
the standard hardware description language VHDL (VHSIC Hardware Description
Language). A VHDL design is a high-level description of the system that synthesis tools then translate into a digital circuit.
A number of challenges had to be met to design and implement a JPEG decoder in hardware rather than in software running on a microprocessor. JPEG coding normally requires many floating-point calculations. Since these calculations are not efficiently implemented in custom hardware, they were replaced by scaled fixed-point approximations. The JPEG decoding algorithm also requires a substantial amount of memory. To reduce the memory requirements, only the core algorithm, which works on relatively small blocks of data, was implemented.
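To illustrate the kind of scaled fixed-point approximation described above, the C sketch below stores a real-valued constant (for example a cosine factor from the DCT) as an integer scaled by 2^12 and then multiplies using integer operations only. The Q12 format and the names to_q12 and mul_q12 are assumptions made for illustration; they are not necessarily the word widths or scaling used in this project's VHDL.

```c
#include <stdint.h>

/* Hypothetical Q12 fixed-point format: a real value r is stored as
   round(r * 2^12) in a signed integer. */
#define FRAC_BITS 12
#define ONE_Q12   (1 << FRAC_BITS)

/* Convert a real constant to Q12, rounding to nearest (done once, offline). */
static int32_t to_q12(double r) {
    return (int32_t)(r >= 0 ? r * ONE_Q12 + 0.5 : r * ONE_Q12 - 0.5);
}

/* Multiply an integer sample by a Q12 constant and scale back to an
   integer, using only integer arithmetic. */
static int32_t mul_q12(int32_t x, int32_t c_q12) {
    int64_t p = (int64_t)x * c_q12;                     /* exact product, scaled by 2^12 */
    p += (p >= 0) ? (ONE_Q12 / 2) : -(ONE_Q12 / 2);     /* round half away from zero */
    return (int32_t)(p / ONE_Q12);                      /* C division truncates toward zero */
}
```

For example, cos(π/4) ≈ 0.7071 is stored as 2896, and mul_q12(100, 2896) gives 71, matching the rounded floating-point product 70.71. In hardware the final division by 2^12 reduces to a shift, which is why power-of-two scaling is attractive.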
To test and demonstrate the design, a Field Programmable Gate Array (FPGA) prototype board was purchased. Unfortunately, due to time constraints, the design could not be downloaded to the prototype board. The JPEG decoder system does, however, work in simulation, and the results of that simulation will be presented.
Acknowledgements
I would like to thank the following individuals for their assistance throughout the course of
this project. Their help has greatly improved the project, my understanding of the
concepts and the design process in general.
Dr. Frank Vahid
Dr. Vahid has been very helpful in providing the resources necessary for this project.
Dr. Vahid also helped me decide upon this project and provided ideas on where to find information helpful to the project.
Barry Todd
Mr. Todd has provided a lot of guidance on the design process and project
management.
Tony Givargis
Tony Givargis has helped considerably by providing me with a book on Graphics File
Formats and a rough Inverse Discrete Cosine Transform unit, which I was able to
modify and incorporate into the design. Tony’s PC-side serial interface library for Windows was also used to communicate with the XESS board.
Jeremy Thorpe
Jeremy Thorpe helped configure and test the board and develop the serial communication devices on the board. In the early stages of testing, Jeremy and I worked together to solve our common problems communicating with the development board.
Keywords / Terminology
Following is a list of some important terminology used in this report and a brief description
of its usage.

JPEG (Joint Photographic Experts Group)
The Joint Photographic Experts Group is a standardization body that develops standards for continuous-tone image coding.

ISO 10918-1
ISO 10918-1 is the formal name for the basic image compression algorithm
developed by the Joint Photographic Experts Group and is commonly called
JPEG. This is the algorithm that is discussed and decoded in this project.

VHDL (VHSIC (Very High Speed Integrated Circuit) Hardware Description Language)
VHDL is an IEEE standardized language for describing the function and behavior of a
digital logic device.

VLSI (Very Large Scale Integration)
Very Large Scale Integration refers to the process of designing and implementing digital systems using CMOS integrated circuit technology.

SOC (System On a Chip)
As minimum chip feature sizes decrease, the usable capacity of a single IC is growing rapidly. Many designers are working to build entire systems on a single chip from components, in the same way that ICs are commonly interconnected on a printed circuit board today.

Rapid Prototyping
Rapid Prototyping is the effort to reduce turn-around time for design testing by initially testing designs on a programmable logic device before they are sent to a fabrication plant for prototyping.

FPGA (Field Programmable Gate Array)
A Field Programmable Gate Array is a type of re-configurable logic device that uses
an array of logic blocks that can be programmed and interconnected to one
another to implement both combinational and sequential logic.

CPLD (Complex Programmable Logic Device)
A Complex Programmable Logic Device is another type of re-configurable logic
device that uses a number of PLA (Programmable Logic Array) type devices
interconnected on a single chip.

XILINX
XILINX is a company that builds and sells FPGAs, CPLDs and a number of software
packages that allow these chips to be programmed from a number of different
sources including VHDL code.

XESS (X Engineering Software Systems)
XESS Corporation is a manufacturer of prototype boards with Xilinx FPGAs. The
prototype board used for testing in this project was manufactured by XESS.

DCT (Discrete Cosine Transform)
JPEG image compression is based on a discrete frequency transformation called the Discrete Cosine Transform. This transformation is related to the standard Discrete Fourier Transform but uses only real-valued cosine basis functions and concentrates most of a typical image block's energy in a few low-frequency coefficients, which makes it well suited to image processing.

Huffman Coding
Huffman Coding is an algorithm to minimize the length of messages by using a short
code word to encode highly probable symbols such as the letter ‘e’ and longer
code words for less probable symbols such as the letter ‘z’.

JFIF (JPEG File Interchange Format)
The JPEG File Interchange Format is a commonly used file format for storing and communicating JPEG-encoded data streams. JFIF files are commonly named with a .JPG file extension.

BMP (Bitmap)
A Bitmap is a device independent format for describing a graphics image as a simple
array of pixel values. This method is commonly used for displays and image
processing algorithms since it provides a simple Cartesian representation of the
data.
Table Of Contents
EXECUTIVE SUMMARY
ACKNOWLEDGEMENTS
KEYWORDS / TERMINOLOGY
TABLE OF CONTENTS
INTRODUCTION
PROBLEM STATEMENT
  SPECIFICATION
    General Description
    Performance Requirements
SOLUTION
  ALTERNATE SOLUTIONS ANALYSIS
    Software Implementation
    Hardware Implementation
    Solutions Analysis Table
  ENGINEERING ANALYSIS
  DESIGN OVERVIEW
  HUFFMAN DECODER
  RUN-LENGTH DECODER
  QUANTIZATION DECODER
  INVERSE DISCRETE COSINE TRANSFORMATION
  XESS DEVELOPMENT BOARD INTERFACING
TESTING PROCEDURE
  Simulation
  Synthesis & Hardware Testing
RESULTS
  BUDGET / RESOURCES
  COMPARISON TO SPECIFICATIONS
    Precision
    Chip Area
    Speed
    Power
CONCLUSIONS AND RECOMMENDATIONS
  WHAT WAS LEARNED
  WHAT WENT WRONG
  FUTURE WORK
REFERENCE DOCUMENTS
APPENDICES
  FIXED-POINT ARITHMETIC
  SCHEMATICS
    JPEG Decoder Unit
    Huffman Decoder / Run-Length Decoder Unit
    Quantization Decoder Unit
    Inverse Discrete Cosine Transformation Unit
  VHDL SOURCE CODE
    JPEG Library
    JPEG Decoder Unit
    Huffman / Run-Length Decoder Unit
    Quantization Decoder Unit
    Inverse Discrete Cosine Transform Unit
    Serial Input Controller
    Serial Output Controller
    Memory Input Controller
  MATLAB & C++ CODE
    Data Create & Test Matlab Script
    Huffman Coding in C
    DCT Test Matlab Code
    Computation of DCT Coefficient Matrix
    Quantization Testing in Matlab
    Image DCT, Quantization, De-Quantization, IDCT Testing in Matlab
  XESS XSV BOARD V1.0 MANUAL
Introduction
Since modern computer systems are required to store and transmit vast amounts of data, the field of data compression has become very important. One form of data that is commonly processed by computer systems is graphic images. To compress graphic images, the Joint Photographic Experts Group (JPEG) developed a method of compressing images by reducing the precision of the high-frequency portions of images. This allows the images to be stored more compactly without sacrificing the important low-frequency portions.
This is done by first dividing the image into 8 pixel by 8 pixel data blocks and performing a transformation on each block that expresses it as a linear combination of sinusoidal components at harmonic frequencies. The magnitudes of the components corresponding to the higher-frequency harmonics are then stored with less precision than the lower frequencies. This filtering loses some of the detail of the image but retains most of the image’s information, since the human eye acts as an integrator, which reduces the contribution of high-detail portions of our visual field. After being filtered, the data is coded so that large values are stored with larger numbers of bits than smaller values. This process allows a variable-length coding of the data for compression. Finally, the data is Huffman coded so that more frequent data values are stored as shorter codes.
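In baseline JPEG, this variable-length step represents each nonzero coefficient by a size category, the number of bits needed for its magnitude, followed by that many additional bits; negative values are sent as the low bits of the value minus one, which is the one's complement of the magnitude. The category (combined with a zero run length for AC coefficients) is what is then Huffman coded. A minimal C sketch of the rule and of its decoder-side inverse follows; the function names are hypothetical and the code is not taken from this project.

```c
/* Size category: number of bits needed to represent |v| (0 for v == 0).
   Assumes |v| is small, as JPEG coefficients are. */
static int jpeg_category(int v) {
    unsigned int m = (unsigned int)(v < 0 ? -v : v);
    int s = 0;
    while (m) { s++; m >>= 1; }
    return s;
}

/* Additional bits sent after the category: v itself for v > 0, and the
   low s bits of (v - 1) for v < 0, i.e. the one's complement of |v|. */
static unsigned int jpeg_extra_bits(int v, int s) {
    unsigned int mask = (1u << s) - 1u;
    return (unsigned int)(v < 0 ? v - 1 : v) & mask;
}

/* Decoder side: recover the signed value from its s additional bits.
   A leading 0 bit marks a negative value. */
static int jpeg_extend(unsigned int bits, int s) {
    if (s == 0) return 0;
    if (bits < (1u << (s - 1)))
        return (int)bits - (int)((1u << s) - 1u);
    return (int)bits;
}
```

For example, -3 falls in category 2 with additional bits 00, and 5 falls in category 3 with additional bits 101; jpeg_extend reverses the mapping.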
This algorithm for image compression is formally known as ISO 10918-1 but is commonly referred to as JPEG after the standardization body that developed it. JPEG is frequently used both on the Internet and in consumer electronics devices such as digital cameras.
Restoring a JPEG image to uncompressed data, commonly stored as a bitmap (a device-independent representation of the array of pixels that make up an image), requires a device called a JPEG decoder. This device performs the inverse of the JPEG encoder, which encodes bitmap images as JPEG streams. Usually JPEG encoders and decoders are written as programs in a high-level language such as C or C++ and run on general-purpose microprocessors.
The purpose of this project is to design and implement a JPEG decoding system that can
be incorporated into a digital camera design. This application requires the JPEG decoder
to be simple, fast, low power, and easily integrated into a larger system. Such systems
have been built before and are commonly used in consumer electronics devices such as
digital cameras. JPEG decoder designs similar to the one built for this project are also available for purchase and can be incorporated into larger designs.
Problem Statement
This project involved the design of a JPEG decoder. JPEG, the Joint Photographic Experts Group, is a standardization body that produces standards for continuous-tone image coding. Perhaps the best known such standard is ISO 10918-1, a widely used image compression standard. The JPEG decoder designed in this project will be used to decode a JPEG File Interchange Format (JFIF) file into an uncompressed bitmap file. JFIF is the file format commonly associated with JPEG and is used widely on the Internet and in consumer electronics devices to store still image data. While the JPEG standard (ISO 10918-1) defines a large class of related compression algorithms, the JPEG decoder designed for this project will focus on the simplest and most widely used such algorithm, known as baseline JPEG.
The wide use of JPEG in consumer electronics devices such as digital cameras creates a need for a fast, low-power implementation that is capable of meeting the demands of the overall system. As with any digital system, the JPEG decoder could be implemented either in software, running on a general-purpose microprocessor or, more likely, a special-purpose processor such as a Digital Signal Processor (DSP), or in custom hardware circuitry. The advantages and disadvantages of both software and hardware implementations will be discussed shortly. While this project will produce only the JPEG decoder, much of the design would be reusable in the design of a JPEG encoder.
This project will demonstrate the JPEG decoder using a Field Programmable Gate Array
(FPGA). The FPGA will be programmed with the JPEG decoder design and will receive
input JPEG images from a serial communication link with a computer system and send
the decoded output images back to the computer for viewing.
Specification
General Description
This project requires that a system for decoding JPEG images into a standard bitmap
image representation be implemented. This system must adhere to the baseline JPEG
standard described by ISO 10918-1.
Performance Requirements
Precision
Keep average per pixel error to within 3% of a standard floating-point implementation of
JPEG decoding.
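The specification does not state exactly how the 3% average per-pixel error is measured. One possible reading, assumed here, is the mean absolute difference between the decoder's 8-bit output pixels and those of a floating-point reference decoder, expressed as a percentage of the full 0 to 255 range. The C sketch below computes that metric; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Mean absolute per-pixel error between a test image and a reference
   image, as a percentage of the 8-bit full-scale range (255).
   This interpretation of the 3% requirement is an assumption. */
static double mean_error_percent(const uint8_t *test, const uint8_t *ref, size_t n) {
    if (n == 0) return 0.0;
    unsigned long long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (unsigned long long)abs((int)test[i] - (int)ref[i]);
    return 100.0 * (double)sum / ((double)n * 255.0);
}

/* Returns 1 when the (assumed) precision requirement is met, else 0. */
static int meets_precision_spec(const uint8_t *test, const uint8_t *ref, size_t n) {
    return mean_error_percent(test, ref, n) <= 3.0;
}
```

Under this reading, an image whose pixels differ from the reference by an average of 7 levels (about 2.7% of 255) would pass, while an average difference of 8 levels (about 3.1%) would fail.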
Chip Area
Maintain a reasonable area for the implementation of the JPEG decoder. The target FPGA has a capacity of about 300 thousand gates. Since placement and routing of components produce less than optimal usage of an FPGA, the goal is to keep the gate count at about 140 thousand gates. This target gate count will help ensure that the entire design fits onto the FPGA.
Speed
It is desirable to maximize the JPEG decoder’s speed. While the speed of the circuit is highly dependent on implementation technology, the JPEG decoder must be able to perform at speeds between those of a software implementation of JPEG decoding and those of a fully optimized, commercially available JPEG decoder design. Although speed is a crucial design point in a production design, it will not be emphasized in this FPGA prototype; the design should nevertheless be suitable for later optimization toward a specific application.
Power
Lastly, the power consumption of the JPEG decoder must be within acceptable limits. However, since the power consumed by the prototype is determined by the FPGA used rather than by the design itself, only simulation data will be available to predict the actual power consumption of the JPEG decoder when implemented as an ASIC (Application Specific Integrated Circuit).
Solution
Alternate Solutions Analysis
As previously mentioned, a digital system can be implemented either in software running on a microprocessor or with a custom-designed digital logic circuit. These are the major realms of digital system design, and each has a wide variety of design decisions associated with it.
Software Implementation
Implementation of the JPEG decoding algorithm in software is very common. There are numerous open-source software implementations of JPEG in languages such as C and C++. The existence of this software and easy access to C compilers for most microprocessors simplify the software design to the point where only moderate coding would be required to adapt one of these implementations to a specific use. Since microprocessors are relatively affordable at low volumes, such an implementation would be attractive for a small-volume product. In some applications that already use a microprocessor, it would be reasonable for the microprocessor to bear the extra load of JPEG decoding, but in many situations the microprocessor is a valued resource that would be better utilized performing other calculations.
Hardware Implementation
The repetitive, well-defined nature of JPEG decoding lends itself very well to a hardware implementation, in which ease of design and implementation is traded for a faster, lower-power solution at the cost of greater design effort. In addition to these conventional arguments for a hardware implementation, the constantly expanding capacity of chips, driven by the continual progress of Moore’s Law (the observation that chip capacity doubles roughly every 18 months), provides another reason to consider a hardware implementation of JPEG decoding. In recent years, the increased chip capacity has allowed a general-purpose microprocessor to be combined with custom logic units on a single chip. By placing these external units on the same piece of silicon as the microprocessor, the costs of communication are greatly reduced. As the capacity of integrated circuits continues to increase, such System-On-a-Chip designs will continue to grow in popularity.
VHDL – VHSIC Hardware Description Language
VHSIC (Very High Speed Integrated Circuit) Hardware Description Language (VHDL) is a
powerful language used for the description of digital circuits. VHDL allows high-level behavioral descriptions and low-level structural descriptions to be mixed and connected within a single design. Using these multiple levels of abstraction together allows the design process to focus first on testing functionality and then on optimizing the critical areas of the design by specifying them at a more detailed level, where the designer can tune the circuit as needed to meet specifications.
Verilog
Verilog is another standardized hardware description language. In contrast to VHDL, Verilog is more widely used in United States industry.
Schematic Capture
Before hardware description languages such as VHDL and Verilog became popular the
standard industry practice was to use CAD programs to draw board and chip layouts from
standard components and simple logic gates. This method is similar to structural
architecture description in hardware description languages. Schematic capture is
basically a graphical way of connecting standard and custom discrete components
together to develop a digital system. While sophisticated automated routing tools are
available in packages such as Protel, the placement of the components must be done by
hand in most instances, which increases the design complexity of the process. Modern
HDL synthesis tools take advantage of regular design structures such as Field
Programmable Gate Arrays (FPGAs) to simplify the task of placement and routing of logic
for a design.
Magic Layout Design Editor
Magic is a layout design editor for CMOS technology in which the actual transistors are drawn from layers of silicon with varying impurity levels and insulating layers. These transistors are then connected by metal layers into the actual physical structure of a microchip. The output of such a tool can then be extracted to a simulation tool such as PSpice, where accurate, albeit intolerably slow, simulations can be run on the design. Finally, the masks defined by the Magic-generated layout would be sent to manufacturing, where the actual manufacturing masks would be generated and the chip could be produced.
Solutions Analysis Table
Design Metrics vs. Solutions | Software   | VHDL – Behavioral Synthesis | VHDL – Structural Description | Schematic Capture / Magic
Design Ease / Design Cost    | Winner     | Good                        | Acceptable                    | Unacceptable
Speed                        | Worst Case | Acceptable                  | Good                          | Good
Power                        | Worst Case | Good                        | Good                          | Winner
Chip Area Efficiency         | Worst Case | Acceptable                  | Good                          | Winner
Accuracy                     | Winner     | Good                        | Good                          | Good
Unit Cost (@ High Volume)    | Poor       | Good                        | Good                          | Good
This table shows that while the low-level solutions toward the right side have better performance attributes, the high-level solutions toward the left side have better practicality attributes. The behavioral or algorithmic VHDL description method provides a strong compromise between software design, in which JPEG is usually implemented, and higher-performance hardware solutions.
Engineering Analysis
The relationships between layout tools, schematic capture tools, and hardware description languages closely parallel the corresponding relationships in software between machine languages, assembly languages, and high-level programming languages.
Just as the current focus in software design is on reusable, machine-independent algorithmic descriptions, the use of a technology-independent description language such as VHDL or Verilog is strongly preferred. The effort expended on designing HDL descriptions of digital circuits can be reused and optimized as logic synthesis tools become more powerful. This potential for improving designs through the advancement of synthesis tools and implementation technologies makes it practical to build large libraries of digital designs that are reused as software libraries are today. A reusable design described in an HDL has been termed Intellectual Property, or IP, a name that conveys the great potential importance of reusable designs.
For all these reasons, digital system design in a hardware description language is becoming the preferred method for hardware design, and the relative ease of design in an HDL is comparable to software implementation in a high-level programming language such as C or C++.
The proposed design for the JPEG decoder allows the stages of the JPEG decoding pipeline to operate in parallel, increasing the operation’s concurrency and thus its speed. Because of this explicit parallelism, a hardware implementation of JPEG decoding has great potential to be faster than software implementations.
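The speed benefit of pipelining can be estimated with a standard timing model. As a simplifying assumption (not a measurement from this project), suppose the decoder has S stages, stage k needs t_k clock cycles per 8x8 block, each stage can accept a new block as soon as it finishes the previous one, and N blocks are decoded:

```latex
T_{\text{seq}} = N \sum_{k=1}^{S} t_k, \qquad
T_{\text{pipe}} = \sum_{k=1}^{S} t_k + (N-1)\max_{k} t_k, \qquad
\lim_{N\to\infty} \frac{T_{\text{seq}}}{T_{\text{pipe}}} = \frac{\sum_{k=1}^{S} t_k}{\max_{k} t_k}
```

For hypothetical stage times of 64, 64 and 128 cycles with N = 1000 blocks, T_seq = 256,000 cycles while T_pipe = 256 + 999 × 128 = 128,128 cycles, so the slowest stage sets the throughput and is the natural target for optimization.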
Design Overview
The JPEG decoder device was designed and implemented in VHDL at a behavioral
description level of abstraction to be synthesized to logic gates. The design of a JPEG
decoder in VHDL will provide a robust, hardware technology independent description.
The decoder could then be downloaded to a Field Programmable Gate Array for testing
and verification.
Figure 1 is a general block diagram representing the JPEG encoding process. A good
understanding of the encoding process will help illuminate some of the design options of
the decoding process while describing the fundamental problem at hand in greater detail.
Figure 1: Block diagram of the JPEG encoding process [1]
As shown in Figure 1, the JPEG encoding process is performed on blocks of an image that are 8 pixels wide by 8 pixels high. Each of these data blocks is encoded in a sequence of three operations. First, the image block is transformed using a two-dimensional Forward Discrete Cosine Transform (FDCT) to determine its spectral components. After the FDCT is performed, the upper left corner of the coefficient matrix contains the DC component of the block and the lower right corner contains the highest-frequency components. Since the human eye does not readily perceive high-frequency changes, the high-frequency components can be stored with less precision than the more important low-frequency components. This low-pass filtering of the image is performed by the next stage, which quantizes the data in exactly this manner. The Quantization table used in this step therefore determines the exact filter characteristics, and thus the compression ratio and quality of the encoded JPEG image. Finally, the Coding stage transforms the 8x8 quantized block into a linear stream of values and then assigns shorter binary codes to more frequently occurring values and longer binary codes to less frequently occurring values, minimizing the length of the encoded message. The Coding table used in this step also affects the compression ratio, since the table must accurately match the relative frequencies of the input values to achieve good compression.
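For reference, the baseline JPEG forward DCT and quantization steps described above are commonly written as follows, where f(x,y) is the level-shifted sample (pixel value minus 128 for 8-bit data), F(u,v) is the DCT coefficient, Q(u,v) is the quantization table entry, and F_Q(u,v) is the quantized value passed to the Coding stage:

```latex
F(u,v) = \frac{1}{4}\, C(u)\, C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y)
         \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16},
\qquad
C(k) = \begin{cases} 1/\sqrt{2}, & k = 0 \\ 1, & k > 0 \end{cases}

F_Q(u,v) = \operatorname{round}\!\left( \frac{F(u,v)}{Q(u,v)} \right)
```

F(0,0) is the DC term in the upper left corner. Larger entries of Q(u,v) in the high-frequency positions discard more precision there, which is the low-pass filtering effect described above.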
The JPEG decoding process is the inverse transformation: the encoded data is first decoded and restored to 8 pixel by 8 pixel data blocks by a preliminary Decoding stage. This stage combines two distinct coding schemes, Huffman decoding and run-length decoding. Next, the Quantization table specification is used to approximately recover the spectral components of the image block. While the low-frequency components may be fully restored, the high-frequency components may be severely distorted; however, this distortion is barely perceptible. Finally, the Inverse Discrete Cosine Transform approximately recovers the original 8x8 data block. Figure 2 is a detailed block diagram showing this process, which is implemented by the JPEG decoder designed for this project.
Figure 2 [1]
Huffman Decoder
The first stage of the JPEG decoding process is Huffman decoding of the data values. The Huffman decoder was designed to read in a Huffman Table extracted from the JPEG data file and to use that table to decode the input. Since Huffman codes are of variable length, the decoder must make decisions one bit at a time. It therefore reads the input one bit at a time, while a separate process buffers the input data and delivers it to the decoder as needed. Decoding is exactly like walking down a binary tree: starting at the root of the Huffman tree, the decoder makes a single decision based on each new input bit until it reaches a leaf at the end of the path. At that point the decoder looks up the decoded value for that leaf in the Huffman Table. The following figure shows a Huffman tree and the corresponding codes for a 2-bit message in which ‘00’ is very common and ‘10’ and ‘11’ are very rare.
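Value   Code
00      0
01      10
10      110
11      111

(Tree: from the root, bit 0 leads to the leaf 00 and bit 1 to an internal node; at that node, bit 0 leads to 01 and bit 1 to a second internal node, whose 0 and 1 branches lead to 10 and 11.)
Figure 3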
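The bit-serial tree walk can be modelled in software. The C sketch below is a minimal model, not the project's VHDL: it hard-codes the tree of Figure 3 as a node array (the `HuffNode` layout, `BitReader` and the helper names are hypothetical) and reads the input most-significant bit first, making one decision per bit.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node layout: a leaf has value >= 0; an internal node has
   value -1 and the indices of its 0 and 1 children in child[]. */
typedef struct {
    int child[2];
    int value;
} HuffNode;

/* The tree of Figure 3, with values 00, 01, 10, 11 returned as 0..3. */
static const HuffNode fig3_tree[] = {
    { { 1,  2}, -1 },   /* 0: root                */
    { {-1, -1},  0 },   /* 1: leaf 00 (code 0)    */
    { { 3,  4}, -1 },   /* 2: internal node       */
    { {-1, -1},  1 },   /* 3: leaf 01 (code 10)   */
    { { 5,  6}, -1 },   /* 4: internal node       */
    { {-1, -1},  2 },   /* 5: leaf 10 (code 110)  */
    { {-1, -1},  3 },   /* 6: leaf 11 (code 111)  */
};

/* Hypothetical bit source standing in for the separate buffering process:
   delivers one bit at a time, most significant bit of each byte first. */
typedef struct {
    const uint8_t *buf;
    size_t nbits;
    size_t pos;
} BitReader;

static int next_bit(BitReader *br)
{
    if (br->pos >= br->nbits)
        return -1;                               /* input exhausted */
    int bit = (br->buf[br->pos / 8] >> (7 - br->pos % 8)) & 1;
    br->pos++;
    return bit;
}

/* Walk from the root, one decision per input bit, until a leaf is reached.
   Returns the decoded value, or -1 if the input runs out mid-code. */
static int huff_decode(const HuffNode *tree, BitReader *br)
{
    int n = 0;
    while (tree[n].value < 0) {
        int bit = next_bit(br);
        if (bit < 0)
            return -1;
        n = tree[n].child[bit];
    }
    return tree[n].value;
}
```

Decoding the bit string 0 10 110 111 0 (packed as the bytes 0x5B, 0x80, with 10 valid bits) returns the values 00, 01, 10, 11 and 00 in turn, reported as 0, 1, 2, 3 and 0.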
Run-Length Decoder
The decoded 8-bit word from the Huffman Decoder consists of a 4-bit run-length followed by a 4-bit data-length. The run-length is the number of zero data values that occur between the last non-zero data value and the current one. The data-length is the number of bits, following this 8-bit word, that make up the actual non-zero data value. A data-length of 0 signifies either the end of a data block or, if the run-length is 15, a run of 16 consecutive zero data values. Since both the Huffman Decoder and the Run-Length Decoder have to read the input bit by bit, I decided to merge these two operations into a single VHDL entity. The data values are then read from the input and decoded according to the following rules. If the high-order bit of the data value is 0, the value is negative and is sign-extended with ones, since we are using a signed 2's complement number system. If the high-order bit is 1, the value is positive and is sign-extended with zeros. Finally, 1 is added to each negative value to obtain its correct value. This leaves a gap between the least negative and the least positive number that is exactly large enough to hold the values coded with fewer bits (for 3-bit data values, the gap is -3 through 3). Table 4 below lists the 3-bit data values and their decodings.
Value to be coded   Conversion if negative (value - 1)   8-bit 2's complement   Encoded value
-7                  -8                                   11111000               000
-6                  -7                                   11111001               001
-5                  -6                                   11111010               010
-4                  -5                                   11111011               011
 4                  -                                    00000100               100
 5                  -                                    00000101               101
 6                  -                                    00000110               110
 7                  -                                    00000111               111
Table 4 [1]
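The decoding rules above, together with the run-length byte format, can be sketched in C. This is a hypothetical software model (the names `extend_value` and `place_coefficient` are illustrative, not taken from the project's VHDL) that works on zig-zag indices 0 to 63 and, as described above, treats any zero data-length with a run-length other than 15 as the end of the block.

```c
#include <stdint.h>

/* Hypothetical helper: expand an s-bit data field into a signed value
   following Table 4. A leading 0 bit means negative: sign-extend with
   ones, then add 1, which equals bits - 2^s + 1. */
static int32_t extend_value(uint32_t bits, unsigned s)
{
    if (s == 0)
        return 0;
    if (bits < (1u << (s - 1)))                       /* high-order bit is 0 */
        return (int32_t)bits - (int32_t)(1u << s) + 1;
    return (int32_t)bits;                             /* high-order bit is 1 */
}

/* Hypothetical helper: apply one Huffman-decoded byte rs (high nibble =
   run-length, low nibble = data-length) and its data bits to a block,
   starting at zig-zag index k. Returns the next free index; 64 means the
   block is complete. */
static int place_coefficient(int32_t block[64], int k, uint8_t rs, uint32_t bits)
{
    unsigned run = rs >> 4;
    unsigned size = rs & 0x0F;

    if (size == 0) {
        if (run == 15) {                              /* 16 consecutive zeros */
            for (int i = 0; i < 16 && k < 64; i++)
                block[k++] = 0;
            return k;
        }
        while (k < 64)                                /* end of block */
            block[k++] = 0;
        return 64;
    }
    for (unsigned i = 0; i < run && k < 64; i++)      /* skipped zero values */
        block[k++] = 0;
    if (k < 64)
        block[k++] = extend_value(bits, size);
    return k;
}
```

For example, the byte 0x23 (run-length 2, data-length 3) followed by the bits 010 writes two zeros and then -5, matching the -5 row of Table 4.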
Quantization Decoder
The Quantization Decoder requests data values from its input, multiplies each one by the corresponding entry in the Quantization table, and places the product in the appropriate location in the 8x8 JPEG data block. During JPEG encoding, the frequency components of the data block are ordered so that the low frequency components come first and the higher frequency components follow. To do this, the frequency matrix is traversed in the zig-zag order shown in the following diagram.
Figure 5 [1]
This data block is then passed on to the Inverse Discrete Cosine Transform unit. Since the Quantization Decoder sits in the middle of the JPEG decoder pipeline and is relatively simple, I decided to make it the master device of the JPEG decoder. It requests data from the Huffman Decoder, assembles a data block from that data, and then requests the Inverse Discrete Cosine Transform unit to decode the block. This allows almost all of the Quantization Decoder's operations to be performed while the Huffman Decoder is running, which takes a long time because it has to make a decision at every bit. This increased parallelism is one of the major advantages of a hardware-based design.
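The dequantize-and-place step can be sketched in C. This is a hypothetical model (helper names are illustrative) that assumes the quantization table is stored in zig-zag order, as it is in a JPEG file, and generates the zig-zag scan of Figure 5 one anti-diagonal at a time instead of storing it as a 64-entry table.

```c
#include <stdint.h>

/* Hypothetical helper: zz[k] is the row-major index (row * 8 + col) of the
   k-th coefficient in zig-zag order. */
static void build_zigzag(int zz[64])
{
    int k = 0;
    for (int s = 0; s <= 14; s++) {            /* s = row + col */
        int lo = (s > 7) ? s - 7 : 0;
        int hi = (s < 7) ? s : 7;
        if (s % 2 == 0) {                      /* even: bottom-left to top-right */
            for (int r = hi; r >= lo; r--)
                zz[k++] = r * 8 + (s - r);
        } else {                               /* odd: top-right to bottom-left */
            for (int r = lo; r <= hi; r++)
                zz[k++] = r * 8 + (s - r);
        }
    }
}

/* Hypothetical helper: multiply the 64 values received in zig-zag order by
   their quantization steps and store them in a row-major 8x8 block.
   ASSUMPTION: qtab is in zig-zag order, as a JPEG file stores it. */
static void dequantize_block(const int32_t in[64], const uint16_t qtab[64], int32_t block[64])
{
    int zz[64];
    build_zigzag(zz);
    for (int k = 0; k < 64; k++)
        block[zz[k]] = in[k] * (int32_t)qtab[k];
}
```

The generated order starts 0, 1, 8, 16, 9, 2, ... and ends at index 63, the highest frequency coefficient in the lower right corner.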
Inverse Discrete Cosine Transformation
The Inverse Discrete Cosine Transform unit is by far the most complex unit in the JPEG decoder. The IDCT requires many multiplications and additions of irrational values and is computationally intensive. Because a floating-point ALU is difficult to design, large, and slow, floating-point arithmetic is rarely implemented in custom hardware, except in the data path of a microprocessor, where it can be shared among many different uses. I quickly realized that this design would have to work around this problem, so I chose to implement the IDCT using only scaled fixed-point arithmetic. After extensive Matlab testing I decided to use a 16-bit whole part followed by an 8-bit fractional extension. The input is therefore extended from 16 bits to 24 bits before the calculations are performed. After computation, the output is rounded back to whole numbers and reduced to the final 8-bit output.
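The format conversions around the IDCT can be sketched in C. This sketch assumes each 24-bit 16.8 value is held in a 32-bit integer, that rounding is half away from zero, and that the final 8-bit result is a signed value saturated to -128..127; the report does not say how out-of-range values are reduced, so the saturation is an assumption. (Baseline JPEG decoders also add a level shift of 128 to obtain unsigned 0..255 samples; that step is omitted here.) All names are hypothetical.

```c
#include <stdint.h>

#define FRAC_BITS 8                  /* 8-bit fractional extension (16.8 format) */
#define FX_ONE    (1 << FRAC_BITS)   /* 1.0 is stored as 256 */

/* Extend a 16-bit input to the 24-bit internal form. Multiplying by 256 is
   the 8-bit left shift, written so it is well defined for negative inputs. */
static int32_t extend_to_fixed(int16_t x)
{
    return (int32_t)x * FX_ONE;
}

/* Round a 16.8 value to the nearest whole number, halves away from zero. */
static int32_t round_fixed(int32_t a)
{
    if (a >= 0)
        return (a + FX_ONE / 2) / FX_ONE;
    return -((-a + FX_ONE / 2) / FX_ONE);
}

/* ASSUMPTION: the 8-bit result is signed and saturated to [-128, 127]. */
static int8_t reduce_to_8bit(int32_t a)
{
    int32_t r = round_fixed(a);
    if (r > 127)
        r = 127;
    if (r < -128)
        r = -128;
    return (int8_t)r;
}
```

For example, the internal value 2587 (about 10.105) rounds to 10, and an out-of-range 300 saturates to 127 under the assumed clamping.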
The equations for the two-dimensional 8x8 Discrete Cosine Transform and its inverse are:

$$\text{FDCT:}\quad S_{u,v} = \frac{1}{4} C_u C_v \sum_{x=0}^{7} \sum_{y=0}^{7} s_{x,y} \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16}$$

$$\text{IDCT:}\quad s_{x,y} = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C_u C_v S_{u,v} \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16}$$

$$C_n = \begin{cases} \dfrac{1}{\sqrt{2}}, & n = 0 \\ 1, & n \neq 0 \end{cases}$$

These equations can be rewritten as linear transformations using matrices:

$$S = D \, s \, D^{T}$$

$$D^{T} S \, D = D^{T} D \, s \, D^{T} D = s$$

$$s = D^{T} S \, D$$
Here D is the constant 8x8 matrix formed from the cosine values and constants above, with entries $D_{u,x} = \frac{1}{2} C_u \cos\frac{(2x+1)u\pi}{16}$. The derivation relies on D being orthogonal: its transpose is also its inverse, so $D^{T} D = I$. The JPEG decoder uses this linear transformation to compute the IDCT as $s = D^{T} S D$, since computing the product of two matrices is relatively straightforward.
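A floating-point reference model of this matrix formulation is useful for checking a fixed-point implementation. The C sketch below uses hypothetical names and double precision rather than the project's 16.8 fixed point. It builds D from a small table of cos(kπ/16) values (so no math library is needed) and computes s = D^T S D as two ordinary 8x8 matrix products.

```c
/* cos(k*pi/16) for k = 0..8; other k follow from symmetry. */
static const double COS16[9] = {
    1.0,
    0.98078528040323043,
    0.92387953251128674,
    0.83146961230254524,
    0.70710678118654752,
    0.55557023301960218,
    0.38268343236508977,
    0.19509032201612827,
    0.0
};

/* cos(k*pi/16) for any k >= 0, using period 32 and cos(pi - t) = -cos(t). */
static double cos16(int k)
{
    int m = k % 32;
    if (m > 16)
        m = 32 - m;                 /* cos(2*pi - t) = cos(t) */
    if (m > 8)
        return -COS16[16 - m];      /* cos(pi - t) = -cos(t) */
    return COS16[m];
}

/* D[u][x] = (C_u / 2) cos((2x+1) u pi / 16), C_0 = 1/sqrt(2), C_u = 1 otherwise. */
static void build_dct_matrix(double D[8][8])
{
    for (int u = 0; u < 8; u++) {
        double cu = (u == 0) ? COS16[4] : 1.0;   /* 1/sqrt(2) = cos(pi/4) */
        for (int x = 0; x < 8; x++)
            D[u][x] = 0.5 * cu * cos16((2 * x + 1) * u);
    }
}

/* s = D^T * S * D, computed as T = D^T * S followed by s = T * D. */
static void idct_matrix(double D[8][8], double S[8][8], double s[8][8])
{
    double T[8][8];
    for (int x = 0; x < 8; x++)
        for (int v = 0; v < 8; v++) {
            double acc = 0.0;
            for (int u = 0; u < 8; u++)
                acc += D[u][x] * S[u][v];
            T[x][v] = acc;
        }
    for (int x = 0; x < 8; x++)
        for (int y = 0; y < 8; y++) {
            double acc = 0.0;
            for (int v = 0; v < 8; v++)
                acc += T[x][v] * D[v][y];
            s[x][y] = acc;
        }
}
```

As a check, a block whose only non-zero coefficient is a DC value of 80 decodes to a flat block of 10s, because every entry of the first row of D is 1/(2√2) and 80/8 = 10.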
XESS Development Board Interfacing
The XESS board that I chose for this project has turned out to be a very useful and versatile development board. The documentation and development tools provided by XESS have been very helpful and made it possible to develop the basic communication units required to talk to the board. To communicate with the PC, Jeremy and I decided to use the onboard serial port and to implement VHDL entities on the board that communicate with the PC over it. To do this we had to reprogram the CPLD (Complex Programmable Logic Device) on the board to route the serial pins to the FPGA, since these lines were not originally connected to the FPGA. Once the serial pins were routed to the FPGA, we designed two very simple VHDL entities to control receiving and transmitting data. The Serial Input Controller listens to the serial receive line, reads bytes of data as they arrive, and presents each byte as an 8-bit output value. The Serial Output Controller waits for a signal to send an 8-bit input value and, when it receives this signal, transmits the value over the serial port to the PC. Both entities are based on a simple Finite State Machine that reads or writes the data one bit at a time.
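The receive and transmit behaviour can be modelled with a small state machine. The C sketch below is a hypothetical software model, not the project's VHDL: it assumes standard 8N1 framing (one start bit at 0, eight data bits least-significant first, one stop bit at 1) and one line sample per bit period, leaving out the baud-rate timing and mid-bit sampling that real hardware needs.

```c
#include <stdint.h>

/* Hypothetical 8N1 receiver FSM states.
   ASSUMPTION: one line sample per bit period, already aligned to bit centres. */
typedef enum { RX_IDLE, RX_DATA, RX_STOP } RxState;

typedef struct {
    RxState state;
    uint8_t shift;   /* data bits collected so far */
    int     count;   /* number of data bits received */
} SerialRx;

/* Feed one line sample. Returns 1 and stores the byte in *out when a frame
   with a valid stop bit completes; returns 0 otherwise. */
static int serial_rx_step(SerialRx *rx, int line, uint8_t *out)
{
    switch (rx->state) {
    case RX_IDLE:
        if (line == 0) {                 /* start bit: line pulled low */
            rx->shift = 0;
            rx->count = 0;
            rx->state = RX_DATA;
        }
        return 0;
    case RX_DATA:
        if (line)                        /* least-significant bit arrives first */
            rx->shift |= (uint8_t)(1u << rx->count);
        if (++rx->count == 8)
            rx->state = RX_STOP;
        return 0;
    case RX_STOP:
        rx->state = RX_IDLE;
        if (line == 1) {                 /* valid stop bit: deliver the byte */
            *out = rx->shift;
            return 1;
        }
        return 0;                        /* framing error: byte discarded */
    }
    return 0;
}

/* Hypothetical transmitter model: the 10 line levels of one 8N1 frame. */
static void serial_tx_frame(uint8_t byte, int levels[10])
{
    levels[0] = 0;                       /* start bit */
    for (int i = 0; i < 8; i++)
        levels[1 + i] = (byte >> i) & 1; /* data bits, LSB first */
    levels[9] = 1;                       /* stop bit */
}
```

Feeding the frame produced by `serial_tx_frame(0xA5, ...)` back into `serial_rx_step` delivers 0xA5 when the stop bit arrives; a 0 in the stop-bit position is treated as a framing error and the byte is dropped.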
Testing Procedure
Since prototype chip fabrication is prohibitively expensive for this project, the actual JPEG decoder chip could not be built and tested. Instead, testing of the JPEG decoder design proceeded in two phases: software simulation and hardware testing on a Field Programmable Gate Array (FPGA). This process of software simulation of VHDL code followed by FPGA testing is commonly referred to as Rapid Prototyping, since these tools allow the design to iterate much more quickly than in a conventional development cycle.
Simulation
To simulate the JPEG decoder, the program Active HDL was used. This program allows VHDL code to be written and simulated in a single development environment and offers an advanced logic-analyzer-style waveform display for observing how signals evolve over simulation time, so different signals can easily be viewed and checked for errors. The procedure consists of creating an empty design project and adding all of the VHDL source code to it. The code can then be compiled and the simulation started. The first time, the software asks which entity is the top-level entity to simulate; this can be changed later as needed. The simulator then displays the available VHDL entities, their input and output ports, and their internal signals for observation. Once the desired signals have been added to the waveform viewer, the simulation is run by specifying how long to run it. The simulator runs for the specified interval and displays the waveforms for that interval.
The simulation of the JPEG decoder was successful and yielded good results. The simulation helped considerably during the design process, since it let me pinpoint which signals were behaving incorrectly and where in the code each problem arose.
Synthesis & Hardware Testing
The plan was to complete synthesis of the JPEG decoder, download the device to the FPGA on the XESS board, and run it so that JPEG data blocks could be sent to the board for decoding and the decoded data returned to the PC and assembled into a viewable image. While many of the pieces for such a setup are in place, the project ran out of time and this testing could not be performed. Preliminary work on the serial communication units has been completed, and additional work has been done to design a memory interface controller. The memory interface controller can successfully read and write the on-board memory, but some timing issues remain unresolved when the device is both reading and writing. I have been using a logic analyzer to find and fix errors, and it has proven to be a very important tool. The logic analyzer is easily connected to the expansion port pins on the XESS development board, so signals from the FPGA can be viewed directly.
Results
The implementation of the JPEG decoder was not completed. All of the pieces of the JPEG decoder have been written and simulated, and the simulations show that they work. These pieces were then assembled, and a full simulation of the JPEG decoder was run for a single block of a JPEG image.
The second part of the project, synthesizing the JPEG decoder and testing it on the XESS development board, has not been completed.
Budget / Resources
Since this project consisted of designing and testing a JPEG decoder in simulation, followed by testing the design on a reprogrammable logic device, it required extensive use of expensive software tools and prototyping hardware but no other expenditure. The VHDL tools needed to write, compile, test, and synthesize a VHDL description, as well as the FPGA prototype hardware, are very expensive, and the Technical Advisor, Frank Vahid, generously provided access to these resources. Dr. Vahid supplied the funds to purchase the XESS XSV-300 Virtex Prototyping Board, which I selected for this project. The board has all of the required features, including a Xilinx Virtex-class FPGA that can implement a design of approximately 300 thousand gates at reasonable speed, and an easy interface with the software packages being used. The total cost of the board was $899.00. The board also has many other features that will be used by Dr. Vahid and his research group, as well as by possible future Senior Design Projects under Dr. Vahid.
Comparison to Specifications
The specifications required that the JPEG decoder operate not just in simulation but actually function on the board. In addition, the glue logic needed to repeat the JPEG decoding process for every block of an image and to interface with the standard JPEG file format (JFIF) was not completed. I will discuss these two issues separately.
First, there was not enough time to get the JPEG decoder working on the board, because getting it operational in simulation took so long. There were no major stumbling blocks other than time constraints. I believe that if I had scheduled more time for gradually integrating the working parts of the design on the board, the project would now operate on the board.
Second, the current JPEG decoder needs a substantial amount of external control to deliver the appropriate data. Ideally, a standard JFIF file could be delivered to the decoder and a decoded image bitmap would be produced. Unfortunately, time did not permit me to add this control logic. As it stands, the JPEG decoder design, if implemented in hardware, could significantly increase the speed of decoding JPEG images when connected to a microprocessor programmed to control it and do the necessary bookkeeping. This in itself is an important result, since it demonstrates how a system can be implemented as a hybrid hardware/software design to improve performance without sacrificing versatility.
Precision
The specification required that the average per-pixel error be below 3%, measured against a standard floating-point implementation. This design goal appears to have been met. Matlab experiments show that my approximate method of computing the IDCT introduces approximately 0.245% error into the image. The VHDL simulation results agree with this prediction to within 0.06%: the actual error introduced in simulation was 0.227%, which is slightly less than the predicted value. These data show that the JPEG decoder is sufficiently accurate.
Chip Area
The specification required that the design synthesize and fit on the FPGA on the development board, which has a capacity of about 300 thousand gates. Since this part of the project was not completed, this design goal has not been evaluated. I believe that the JPEG decoder is simple enough to fit within this limit, but time constraints prevented me from determining the actual number of gates required.
Speed
From simulation, completing one 8x8 block of data takes about 3,700 clock cycles. Since I anticipate no problems running this design at 50MHz, and possibly as fast as about 100MHz, I calculate that a 640 by 480 pixel image consisting of 4,800 data blocks will take about one third of a second at 50MHz. This gives a throughput of about 7 megabits per second. This speed is acceptable: while it is slower than a standard PC decoding images, it can keep up with less powerful microprocessors.
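A worked version of this estimate, assuming a single 8-bit component per pixel (a grayscale image) and the figure of about 3,700 cycles per block from simulation:

$$
\begin{aligned}
\text{blocks} &= \frac{640 \times 480}{8 \times 8} = 4{,}800 \\
\text{cycles} &= 4{,}800 \times 3{,}700 = 17{,}760{,}000 \\
t_{50\,\mathrm{MHz}} &= \frac{17{,}760{,}000}{50 \times 10^{6}\ \mathrm{Hz}} \approx 0.355\ \mathrm{s} \\
\text{throughput}_{50\,\mathrm{MHz}} &= \frac{640 \times 480 \times 8\ \text{bits}}{0.355\ \mathrm{s}} \approx 6.9\ \mathrm{Mbit/s}
\end{aligned}
$$

At 100 MHz the same image would take about 0.178 s, or roughly 13.8 Mbit/s. A color image with three full-resolution components (no chroma subsampling) would need three times as many blocks and therefore three times as long.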
Power
As noted in the specifications, implementing this project on an FPGA does not give a good measure of the power requirements of the design, since the power used depends on the particular FPGA it runs on.
Conclusions And Recommendations
This project proved to be considerably more challenging and time-consuming than originally projected. While the overall project was not completed, the decoder itself is essentially complete and simulation shows that it works. In addition, many of the components needed for communication and memory storage on the prototype board were developed. I believe that future projects building on this one could use this work to complete this project or a similar one, and that a two-member team could extend these results and deliver a system that decodes JPEG images downloaded to the board.
The theoretical foundations of the JPEG algorithm such as the Discrete Cosine Transform
and Huffman coding are good applications of the concepts learned in Digital Signal
Processing and Digital Communications.
What Was Learned
The project showed that the design and implementation of complex systems in hardware is considerably different from software. While the JPEG algorithm is relatively simple to implement in software, it does not translate easily into a hardware implementation. I also gained a lot of valuable experience in project management.
It was also very instructive to see how the frequency transformations learned about
throughout my studies can be applied to a seemingly simple problem such as
compression. I have an increased awareness of the complexities of the JPEG algorithm.
The chosen prototype board produced by XESS worked well and was well documented. I
believe that the board is capable of being useful for a number of different design projects.
What Went Wrong
The failure of this project is due primarily to time delays and the complexity of the project. Even though the project was not finished, I believe that I completed a great deal of work and that my final result, a working simulation of the JPEG decoder, is significant. The project turned out to be considerably more challenging than I originally anticipated, and I believe it would have been much better if I had taken only a minimal course load during the second quarter of the project so that I could focus on it almost exclusively.
Future Work
First of all, I believe that this project is too much for a single beginning engineer to handle alone. Further work on this project should be attempted by a group whose members are well trained in VHDL programming and experienced in the design of digital systems. Beyond finishing this project, there are a number of related topics that would be interesting to explore…
• Prototype Board – Since this project did not succeed in getting the JPEG decoder working on the prototype board, future work should start by replicating the successful simulation results on the actual board.
• Huffman Table Extraction – The Huffman Table format used in the JPEG decoder is not the same as the information stored in a normal JPEG file. This could be addressed by deriving the decoder's form of the Huffman tree from the table data stored in the file. I have done this with some simple C code, which could serve as a basis for doing it in VHDL.
• Color Space Transformation – The JPEG decoder does not currently use a specific color space and generally treats the data as an array of bytes. For image applications, hardware usually uses the RGB color space, but many JPEG streams use the YCbCr color space, since the Cb and Cr (chrominance) components can be stored more efficiently [1].
• ‘Fast’ Cosine Transformation Realization – The Discrete Cosine Transform is very similar to the Discrete Fourier Transform, and the efficient algorithms for computing the Discrete Fourier Transform, collectively known as Fast Fourier Transform algorithms, can be applied to improve the efficiency of the Discrete Cosine Transform as well. While I used a simple algorithm based on linear transformations to implement the Inverse Discrete Cosine Transform, a Fast Cosine Transform implementation may be a better alternative.
• JPEG Encoder – A JPEG encoder could also be developed as an extension of this project. I would suggest that the encoder use a predetermined Huffman code tree and one of a few selected Quantization tables. This would reduce the complexity of the design considerably. The corresponding decoder would also be simpler, since the codes would be known without having to extract them from the data stream, which I found to be difficult and inefficient. The compression would also be better, since the codes would not need to be stored explicitly in the file. I believe that this type of implementation is better suited to a digital camera, since many features of the data set are well known and generality is not needed. (For example, a digital camera does not need to encode an image of any size, only images at one of a few different resolutions.)
• JPEG 2000 – The Joint Photographic Experts Group has now submitted a draft of a completely new image compression standard to replace JPEG. This new standard is expected to be approved later this year and to quickly replace the current JPEG standard. The new algorithm uses more advanced wavelet analysis methods to provide better compression, image quality, and versatility. I believe that the design of a JPEG 2000 system, perhaps using a Digital Signal Processor, would make a good project.
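As a possible starting point for the Huffman Table Extraction item, the C sketch below follows the code-assignment procedure of the JPEG standard. A JPEG file stores only the number of codes of each length from 1 to 16 plus the symbol values in code order; the codes themselves are regenerated by counting upward and shifting left by one bit whenever the code length increases. The function name and array layout are hypothetical, and this is not the author's C code.

```c
#include <stdint.h>

/* Hypothetical helper: counts[i] is the number of codes of length i + 1
   (i = 0..15), as stored in a JPEG file. Codes are assigned in symbol order
   by counting upward; the running code is shifted left by one bit each time
   the length increases. Writes at most 256 entries and returns the number
   of codes generated. */
static int build_huffman_codes(const uint8_t counts[16], uint16_t codes[256], uint8_t lengths[256])
{
    unsigned code = 0;
    int n = 0;
    for (int len = 1; len <= 16; len++) {
        for (int i = 0; i < counts[len - 1] && n < 256; i++) {
            codes[n] = (uint16_t)code++;
            lengths[n] = (uint8_t)len;
            n++;
        }
        code <<= 1;                          /* next length: append a 0 bit */
    }
    return n;
}
```

With one code of length 1, one of length 2 and two of length 3, this reproduces the codes 0, 10, 110 and 111 of Figure 3, so the tree in that figure can be rebuilt from the length counts alone.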
Reference Documents
The following is a list of the documents that I have referred to and found to be useful in
the course of this project.
[1] C. W. Brown and B. J. Shepherd, Graphics File Formats: Reference and Guide. Greenwich, CT: Manning Publications Co., 1995.
[2] XESS Corporation, XSV Board v1.0 Manual. Apex, NC: XESS Corporation, 2000.
[3] Hsu et al., VHDL Modeling for Digital Design Synthesis. Norwell, MA: Kluwer Academic Publishers, 1995.
[4] Xentec, “JPEG_CODEC –X_JPEG Short Form Datasheet,” http://www.xentecinc.com/X-datasheets/x_jpeg_rev1.4.pdf (current June 11, 2000).
[5] T. Tran, “Fast Multiplierless Approximation of the DCT,” Johns Hopkins University, ECE Department, http://thanglong.ece.jhu.edu/Tran/Pub/intDCT-SPL.pdf (current June 11, 2000).
[6] Collosseum Builders Inc., “Image Library Source Code Version 3,” http://www.collosseumbuilders.com/imageformats/compressedimageformats.html (current June 11, 2000).
Appendices
Fixed-Point Arithmetic
To efficiently compute the Inverse Discrete Cosine Transform in hardware, a scaled fixed-point arithmetic system was used to simulate real numbers. This number system consists of a 16-bit whole number part followed by an 8-bit fractional part. Interpreted as a signed 2's complement value, this allows numbers from -32,768 to 32,767.99609375 (that is, 32,767 + 255/256) to be stored in 24 bits as follows.
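$$A = a_{15} a_{14} a_{13} a_{12} a_{11} a_{10} a_{9} a_{8} a_{7} a_{6} a_{5} a_{4} a_{3} a_{2} a_{1} a_{0} \,.\, a_{-1} a_{-2} a_{-3} a_{-4} a_{-5} a_{-6} a_{-7} a_{-8}$$

where $a_i \in \{0, 1\}$, and

$$A = \sum_{i=-8}^{15} a_i \, 2^{i} = 2^{-8} \sum_{i=0}^{23} a_{i-8} \, 2^{i} = \hat{A} \cdot 2^{-8}$$

where $\hat{A}$ is the ordinary 24-bit binary integer formed by the same bits. Addition works as expected in this system due to linearity:

$$A + B = (\hat{A} + \hat{B}) \cdot 2^{-8}$$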
However, multiplication introduces an extra factor:

$$A \cdot B = (\hat{A} \cdot \hat{B}) \cdot 2^{-8} \cdot 2^{-8}$$

To correct for this extra factor, the integer product of two numbers must be shifted right by an extra 8 bits. In principle this could be done by modifying the multiplication algorithm (the ordinary grade-school long-multiplication algorithm) to shift right once more per iteration, but this is not feasible when working with a predefined multiplication algorithm. A decimal analogy illustrates the problem: 1.0 × 1.0 = 1.0, but when each operand is scaled by 10 to allow one fractional digit, the integer product is 10 × 10 = 100, which reads as 10.0 unless an additional shift is applied to arrive at the expected answer of 1.0.
With this scaled fixed-point system and the appropriate correction for multiplication, I was able to approximate the Inverse Discrete Cosine Transform using only standard integer arithmetic.
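The correction can be sketched in C, assuming each 24-bit 16.8 value is held in a 32-bit integer and the raw product is kept in a 64-bit intermediate (the names `fx_add` and `fx_mul` are hypothetical). Dividing by 256 truncates toward zero, which C defines for negative values; an arithmetic right shift on typical compilers would instead round toward negative infinity.

```c
#include <stdint.h>

#define FX_FRAC_BITS 8
#define FX_ONE (1 << FX_FRAC_BITS)   /* 1.0 is stored as 256 */

/* Addition needs no correction: (A^ + B^) * 2^-8 = A + B. */
static int32_t fx_add(int32_t a, int32_t b)
{
    return a + b;
}

/* The raw integer product carries an extra factor of 2^-8, so it is divided
   by 2^8 once more. A 64-bit intermediate holds the full product of two
   24-bit operands; division truncates toward zero for negative results. */
static int32_t fx_mul(int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * (int64_t)b;
    return (int32_t)(p / FX_ONE);
}
```

This mirrors the decimal example above: for 1.0 × 1.0 the raw product 256 × 256 = 65,536 would read as 256.0, and dividing by 256 once more gives 256, which is 1.0.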
Schematics
The following sections show the schematics of the different components of the JPEG
decoder. They primarily depict the I/O characteristics of each device and their
interconnection structure. The implementation of these devices is described by a VHDL
process that is basically a high-level representation of a Finite State Machine with
Datapath. This behavior is best analyzed by reading the VHDL source presented later.
JPEG Decoder Unit
(Schematic symbol: JPEG Decoder Unit, with ports in_req, inp(0:7), in_rdy, go, rdy, value(0:7), rst, clk)
Here the JPEG Decoder Unit takes a stream of input data in the form of 8-bit words and produces the JPEG decoded values from the data stream. The beginning of the input data stream contains both the Huffman Code Table and the Quantization Table. The decoder logic extracts these tables and sends them to the Huffman Decoder and the Quantization Decoder, which are connected in series with the Inverse Discrete Cosine Transform Unit. The output of the IDCT unit is then placed, one entry at a time, on the 8-bit output value.
Huffman Decoder / Run-Length Decoder Unit
(Schematic symbol: Huffman Decoder, with ports in_req, inp(0:7), in_rdy, code, go, rdy, value(15:0), rst, clk)
Here code is the Huffman Code Table extracted from the input stream, inp is an 8-bit word from the data stream, and value is the Huffman and run-length decoded data value extracted from the data stream. The other signals are control signals.
Quantization Decoder Unit
(Schematic symbol: Quantization Decoder, with ports in_req, in_val(15:0), in_rdy, quan, go, rdy, outp, rst, clk)
Here quan is the Quantization Table extracted from the input stream, and in_val is the 16-bit word from the Huffman Decoder. outp is the assembled 8x8 JPEG data block matrix. It consists of 64 entries, each a 16-bit value that is the product of one of the last 64 in_val words from the Huffman Decoder and the corresponding term of the Quantization Table. These are the restored DCT coefficients, which are sent to the IDCT unit to recover the actual data values. The other signals are control signals.
Inverse Discrete Cosine Transformation Unit
(Schematic symbol: Inverse Discrete Cosine Transform, with ports X, O, go, rdy, rst, clk)
Here the IDCT Unit computes the Inverse Discrete Cosine Transform of its input and produces the result on its output. To do this, the IDCT Unit uses two matrix multiplication sub-units and an output rounding sub-unit, which are described next. First the input is extended to the internal scaled fixed-point form of the input matrix; then the two matrix multiplications, by the constant DCT matrix and by its inverse, are performed; finally the output is rounded back to an 8-bit integer-valued matrix.
Matrix Multiplication Sub-Unit
(Schematic symbol: Matrix Multiplier, with ports inp1, inp2, outp, go, rdy, rst, clk)
Here the Matrix Multiplier computes the product inp1*inp2 (inp1 pre-multiplying inp2) and delivers the result on outp.
Output Rounding Sub-Unit
(Schematic symbol: Output Rounder, with ports inpt, outp, go, rdy, rst, clk)
Here the Output Rounder takes its input, an 8x8 matrix of internal scaled fixed-point values, rounds it to an integer-valued 8x8 matrix, and outputs the result.
VHDL Source Code
JPEG Library
JPEG Decoder Unit
Huffman / Run-Length Decoder Unit
Quantization Decoder Unit
Inverse Discrete Cosine Transform Unit
Serial Input Controller
Serial Output Controller
Memory Input Controller
Matlab & C++ Code
Data Create & Test Matlab Script
Huffman Coding in C
DCT Test Matlab Code
Computation of DCT Coefficient Matrix
Quantization Testing in Matlab
Image DCT, Quantization, De-Quantization, IDCT Testing in Matlab
XESS XSV Board V1.0 Manual