Demonstration System for A Low Power Video Compression Integrated Circuit

by

Charatpong Chotigavanich

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of

Master of Engineering

at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

February 2000

© Charatpong Chotigavanich, MM. All rights reserved.

The author hereby grants to MIT permission to reproduce and distribute publicly paper and electronic copies of this thesis document in whole or in part.

Author: Department of Electrical Engineering and Computer Science, January 20, 2000

Certified by: Anantha P. Chandrakasan, Associate Professor of Electrical Engineering, Thesis Supervisor

Accepted by: Arthur C. Smith, Chairman, Department Committee on Graduate Theses

Abstract

This thesis presents a demonstration system for an ultra-low-power video compression integrated circuit. The system digitizes an analog video signal and compresses it using the integrated circuit, which uses the wavelet transform and zero-tree coding algorithm to achieve a high compression ratio at ultra-low power. The compressed data is then sent to a PC, where it is decoded and played as a movie in real time.

Thesis Supervisor: Anantha P. Chandrakasan
Title: Associate Professor of Electrical Engineering

Acknowledgments

My thesis could not have been accomplished without the following people. First of all, I would like to thank Prof. Anantha Chandrakasan for agreeing to supervise this thesis.
I greatly appreciate his help, suggestions, and patience. Throughout the term, he has been an excellent consultant who gave me advice not only on academic issues but also on life after school in general.

I also would like to thank Rex Min for his great assistance with almost everything. I learned a lot from his past work and his hands-on experience. He has been a great instructor who would leave his desk at the busiest time of his day just to help me figure out some minor bugs. This thesis would not even exist without him.

I thank Jim MacArthur. He is the guy who actually educated me and taught me real-world engineering lessons. In addition, Jim is a kind of entertainer whose character provided great relief for me in the lab. We shared a lot of thoughts about many projects, business, laws, and even startups. If I ever become a millionaire in the future, this is the man I will give my first million to.

I would like to thank Keith Fife for lending me an ISR cable to program CPLDs. He also helped me with the board layout and other circuit problems.

I also thank Thomas Simon for his advice and his patience. Although I was literally an annoyance to him during debugging, he stayed calm and continued to help me out.

I am thankful to many people in the lab. They have been very nice and helpful in general. Thanks to: Alice Wang, Raj Amirtharajah, Vadim Gutnik, and Jim Goodman.

I thank all my friends who hung around with me when I was not in the lab. Thanks to Sandia Ren, Laurie Qian, and Duncan Bryce, who also helped me review this thesis. Thanks to Nuwong Chollacoop for his help in moving my stuff to my new apartment while I was busy with my thesis.

I owe tremendous gratitude to Preeyanuch Sangtrirutnugul. I could actually say that she is a co-author of this thesis. After all, she stayed up with me until 7am every day, making sure that I wouldn't do something foolish.
Finally, I am grateful to my parents and my family for their continuing support throughout my entire life. Without them, I would not even be writing this thesis.

If I have left out anyone I should be thankful to, I would like to apologize here.

Contents

1 Introduction
  1.1 Background on Low-Power Video Processing
  1.2 System Overview
  1.3 Design Considerations
2 Hardware System
  2.1 High-Level Block Diagram
  2.2 Image EPROM
  2.3 Input Frame Buffer
  2.4 Video Compression Integrated Circuit (EZW Chip)
  2.5 EZW Output Frame Buffer
    2.5.1 Bit Packing Finite State Machine
    2.5.2 Synchronizer Finite State Machine
    2.5.3 Read/Write Buffer Switch
  2.6 Parallel Port Interface
  2.7 Complex Programmable Logic Devices (CPLD)
3 Software System
  3.1 Direct Memory Access Device Driver
  3.2 Decoder
  3.3 Video Player
  3.4 User Interface (UI)
4 System Implementation
  4.1 Hardware System
  4.2 Software System
5 System Performance
6 Conclusion and Future Improvement
  6.1 Conclusion
  6.2 Ideas for the Future
Bibliography
A Schematic Diagrams
B VHDL Code
  B.1 VHDL Code for CPLD 1
    B.1.1 i2c.vhd
    B.1.2 ntsc.vhd
    B.1.3 ezw-sram-drive.vhd
    B.1.4 top-a.vhd
  B.2 VHDL Code for CPLD 2
    B.2.1 sram-switch.vhd
    B.2.2 ezw-out.vhd
    B.2.3 parallel-sram.vhd
    B.2.4 top-b.vhd
C Decoder Code
  C.1 Decoder Code in C
    C.1.1 StdAfx.h
    C.1.2 StdAfx.cpp
    C.1.3 vdodma3.h
    C.1.4 vdodma3.cpp
    C.1.5 resource.h
    C.1.6 makefrm.c

List of Figures

2-1 Overview block diagram of the hardware system
2-2 Timing diagram of output signals from Bt829A
2-3 An example of a buffer using a one-port SRAM
2-4 A mechanism used to prevent image corruption
2-5 Finite state machine of the left control logic module
2-6 Finite state machine of the right control logic module
2-7 How a frame is loaded
2-8 Addressing scheme of the input buffer
2-9 Combinational logic which controls the tri-state outputs of SRAM buffer and image EPROM
2-10 Timing Diagram of Input Signals to Video Compression Integrated Circuit
2-11 Timing Diagram of Output Signals from Video Compression Integrated Circuit
2-12 64MHz to 500KHz clock divider
2-13 Schematic Diagram of the Output Frame Buffer Controller
2-14 Bit Packing Finite State Machine
2-15 Bit Storage Format
2-16 An alternative of how to store a bit
2-17 State diagram of parts of Read/Write Buffer Switch
2-18 Schematic Diagram of parts of Read/Write Buffer Switch
2-19 Timing Diagram of the Parallel Port Protocol in ECP mode
2-20 State diagram of finite state machine that controls the parallel port interface
2-21 Logic blocks in the first CPLD
2-22 Logic blocks in the second CPLD
3-1 Block Diagram of the Software System
3-2 Double frame buffer
3-3 The Working System
4-1 The front side of the unpopulated PCB
4-2 The back side of the unpopulated PCB
4-3 The Finished PCB with all components in place
A-1 Schematic Diagram of Analog to Digital Converter and Its Control Logic
A-2 Schematic diagram of EZW chip
A-3 Three instruction EPROMs for the EZW chip
A-4 Schematic diagram of the image EPROM
A-5 Schematic diagram of the SRAM buffer and its control logic modules
A-6 Schematic Diagram of i2c programmer
A-7 Schematic Diagram of Output Frame Buffer
A-8 Schematic Diagram of Parallel Port Interface
A-9 Schematic Diagram of Reset Control Logic

List of Tables

2.1 Bt829A pin descriptions
2.2 Pin descriptions of the EZW chip
2.3 Mapping Between Centronics pinouts and D-SUB pinouts
5.1 The relationship between the number of frames decoded and the latency
5.2 The performance of this EZW video chip

Chapter 1

Introduction

1.1 Background on Low-Power Video Processing

Video compression has been an important research topic for the past several years, and a large number of algorithms and hardware devices have been developed as a result. These algorithms and devices usually target better image quality and higher compression ratios. Many video compression algorithms were developed for desktop computers, where the amount of power required for computation is not a major concern. However, as portable applications have become widespread, researchers have become aware of power consumption as an important issue, and the need for low-power algorithms and hardware has become inevitable. For portable video devices, data compression is one obvious way of saving power.
Data compression reduces not only the capacity required for data storage but also power consumption, because there is less data to be transmitted. A wireless camera is a clear example of how a system can benefit from data compression: the RF transmitter consumes less power when it sends a smaller amount of data through a wireless network. Thus, many algorithms for video data compression have been developed, such as JPEG and MPEG. Most of them fall under the category of "lossy compression", where image quality is reduced in exchange for a higher compression ratio.

However, having a powerful algorithm is not enough for a low-power application. How an algorithm is implemented in hardware actually accounts for most of the power consumption. There are many general-purpose processors (GPPs) capable of running virtually any video compression algorithm. But those GPPs usually consume too much power, and portable devices which use such processors, e.g. laptop computers, usually have large batteries and require long recharging almost daily. Therefore, a low-power integrated circuit is a necessity for this video application.

This thesis demonstrates the performance of a "wavelet transform and zero-tree coding" video compression integrated circuit which consumes little power and yields a high compression ratio. Designed by Thomas Simon as a part of his PhD thesis [4], the chip is a massively parallel SIMD processor which uses the wavelet transform and zero-tree coding algorithm [2] to compress and encode digital video data. The parallel nature of the processor is key to achieving low power. When parallelism is introduced into a system, the computation can be done faster.
But for an application whose input or output is stream data with a fixed rate of arrival or departure, finishing early brings no benefit. Instead, the power supply voltage can be reduced, which lengthens the computation time just enough for the parallel system to still meet the required rate. At the lower supply voltage, the circuit consumes less power; in other words, the speed-up from parallelism is traded for lower power consumption. Using this technique, the core of this video compression chip operates at 1.5V, consumes 300-400 µW, and yields a compression ratio of up to 300:1 for acceptable image quality.

1.2 System Overview

The system presented in this thesis is a tool to demonstrate how much power can be saved by the parallel architecture. Its function is to digitize and compress a real-time video signal from a camera, send the data to a PC, decode the data, and finally play the video on the PC's monitor. The entire process can be performed in real time with a frame latency of a few seconds.

The demonstration system consists of hardware and software. The hardware is a 7.5"x7.5" printed circuit board carrying the "wavelet transform and zero-tree coding" video compression integrated circuit and several other electrical components. The board handles NTSC signal digitization, digital video compression, and data transmission to a PC. The software is developed for the Microsoft Windows platform and runs on Windows 95, Windows 98, and Windows NT. It is responsible for data reception from the board, data decompression, and video playback.

1.3 Design Considerations

* Ease of Use. Since this system will be frequently used to demonstrate the performance of the video compression integrated circuit, it has to be easy to operate. The hardware is designed to have only a few controls on the circuit board, and the user interface of the software is designed to be user-friendly.

* Ease of Debugging. The system is designed with modularity. It consists of several small modules which are abstracted away from one another, and the interface between modules was kept consistent throughout the design process. In addition, the circuit board has numerous accessible ports for logic probes. The electrical components on the board are placed in sockets so that they can be replaced if damaged.

* Extensibility. This demonstration system can potentially be a submodule of another system, such as a wireless camera; therefore, it has to be extensible. The circuit board uses surface-mounted complex programmable logic devices (CPLDs) which can be programmed on-board. These CPLDs make it easy to modify the logic in the future. The circuit also uses EPROMs for data storage, so the contents can be changed if necessary.

* Small Printed Circuit Board Area. The area of the printed circuit board is minimized. All electrical components are placed tightly next to one another to achieve minimal board area.

Chapter 2

Hardware System

This chapter discusses the hardware part of the system, which handles NTSC signal digitization, digital video compression, and data transmission to a PC. The architecture is divided into several circuit modules. The interface specifications between modules were kept consistent throughout the design process so that an internal change in one module would not affect the others. The modular design of the hardware reduces the possibility of bugs and greatly speeds up the design process.

[Block diagram: an NTSC video source feeds the A2D, followed by the input buffer, the video compression integrated circuit, the output buffer, and the parallel port interface; an image EPROM provides the alternative input. The A2D controller, buffer controller, EPROM controller, and video IC controller are implemented in the CPLDs.]
Figure 2-1: Overview block diagram of the hardware system

2.1 High-Level Block Diagram

Figure 2-1 illustrates a high-level block diagram of the hardware system. The system can receive video input from two sources. One is an analog video signal from an NTSC video source, and the other is a digital video signal programmed on an EPROM. An analog video input is digitized into a digital video stream, which is buffered and appropriately formatted for further processing. The digital video input from the EPROM, on the other hand, requires neither analog-to-digital conversion nor frame buffering, and thus can be processed right away. One of these two sources, selected by a controller, is passed on to the video compression integrated circuit, where the compression and encoding operations take place. The output, a stream of bits, is buffered again before it is sent to a PC through a parallel port interface circuit which governs the transmission process.

The A2D converter receives an NTSC signal from an NTSC source and produces an 8-bit gray-scale image that can be further processed. The NTSC source can be any device, such as a video player or a video camera, which provides a standard NTSC video signal. Because NTSC is a widespread standard in North America, using the NTSC interface allows this system to accept input from a variety of video devices. The design of the A2D module in this system is a slightly modified version of the one from [3]. More details about the Bt829A, the video decoder chip used in this module, can be found in [3] and [8]. Manufactured by Rockwell Semiconductor, the Bt829A is a widely used video decoder found in several video appliances, including personal computers. In addition to its ease of use, the chip supports a variety of video signal formats, such as NTSC and PAL.
It is also capable of adjusting frame size and frame resolution, and of zooming. In short, this chip is very powerful, versatile, and cheap. Table 2.1, replicated from [3], shows the pin descriptions of this chip.

Pin         I/O            Description
YIN         Analog input   NTSC video signal input.
SCL, SDA    I/O            Clock and data lines for I2C serial bus for device programming.
I2CCS       I              LSB of 8-bit I2C device address.
RST         I              Reset.
XTOI        I              NTSC clock (28.636 MHz).
VD[15..8]   O              Digitized luminance output in 8-bit mode.
VD[7..0]    O              Digitized chrominance output, used in addition to VD[15..8] in 16-bit mode.
DVALID      O              Data Valid. High when valid data (image pixel or blanking) is being output. Low during blanking intervals or when no pixel is output due to scaling.
ACTIVE      O              Active Video. High when an active image area is being output.
VACTIVE     O              Vertical blanking. Low during active vertical lines.
FIELD       O              Odd/even field indication. "1" denotes that an odd field is being digitized.
HRESET      O              Horizontal Reset. Falling edge denotes a new horizontal scan line.
VRESET      O              Vertical Reset. Falling edge denotes a new field.
QCLK        O              "Qualified Clock." Gated such that edges occur only when valid, active image pixels are being output.
CLKx1       O              14.31818 MHz clock output. All output signals are timed with respect to this clock.
OE          I              Tri-state control for certain outputs.

Table 2.1: Bt829A pin descriptions

[Timing diagrams in three panels:
(a) Signals VD[15..0], DVALID, ACTIVE, and CLKx1. Pixels are valid when DVALID and ACTIVE are both high. All signals are synchronized to CLKx1, which is 14.31818MHz. The falling edge of DVALID signifies that a new field is being output.
(b) Signals HRESET, DVALID, and ACTIVE. This diagram, a zoomed-out view of (a), displays the relationship between HRESET, DVALID, and ACTIVE.
(c) Signals VRESET, HRESET, and VACTIVE. This diagram, a zoomed-out view of (b), shows the relationship between VRESET, HRESET, and VACTIVE.]
Figure 2-2: Timing diagram of output signals from Bt829A

The Bt829A digitizes the analog NTSC signal from input YIN.
Synchronized to CLKx1, the digital outputs VD[15..8] and VD[7..0] represent luminance and chrominance respectively. Because this system only needs 8 bits of gray-scale data, the chrominance information is ignored, so only VD[15..8] is used, as displayed in Figure A-1. The timing diagram of the output signals is illustrated in Figure 2-2.

Since the Bt829A requires programming upon startup, a control logic module provides a programming interface to the chip using the I2C protocol [7]. This control logic is implemented on a CPLD to allow flexible design and ease of debugging. It is a finite state machine (FSM) which programs the Bt829A through the SCL and SDA pins with the desired parameters. The implementation of the FSM was obtained from [3], with some parameters adjusted for this demonstration system. Several sets of parameters yield 128x128 output frames, and each set results in a different image size and quality.

In this demonstration system, two sets of parameters were tested. The first set programs the Bt829A to digitize a frame at a resolution of 256x256 and then scale it down to 128x128, both vertically and horizontally. This method creates jagged horizontal lines on every frame because the Bt829A does not scale interlaced frames very well. The other set eliminates this problem by using non-interlaced mode, or decimating the even fields of the digitized frame, so that the frame has a resolution of 128x256, which is then horizontally scaled down to 128x128. As a result, the second parameter set yields a much better and clearer video output.

2.2 Image EPROM

One of the significant challenges in implementing a system that involves real-time data is debugging. In a system whose input is real-time and nondeterministic, the output is also nondeterministic. It is exceptionally difficult to determine from a nondeterministic output whether or not the system is operating correctly, because there is nothing to compare the output to.
A good method for debugging such a system is to input some test vectors and determine whether the output is as expected. The image EPROM serves precisely this purpose. The schematic diagram of the image EPROM is shown in Figure A-4. This EPROM stores the digital pixels of one video frame and provides an alternative input to the NTSC signal, as mentioned in the previous section. The data to be programmed on the EPROM is extracted from an 8-bit 128x128 gray-scale PGM image. The PGM image format is basically an array of raw digital pixels preceded by a text header which specifies the dimensions of the image and the gray-scale depth. The header is simply skipped to extract the raw pixels. A program written in C converts the PGM image into the appropriate size and format for the EPROM. This program is included in the Appendix.

2.3 Input Frame Buffer

The input frame buffer is simply an SRAM used to ensure that pixels are delivered in the appropriate order from the A2D to the video compression chip. Furthermore, to prevent video image corruption, the buffer also absorbs the difference between the output rate of the A2D and the input rate of the video compression chip. The image EPROM, however, does not need this buffer because an EPROM is itself a pre-programmed buffer. The pixels from the image EPROM are sent directly to the video compression chip without buffering. Eliminating the buffering step for the EPROM makes the datapath less complicated and, therefore, more efficient.

There are several ways to implement a buffer. One typical way is displayed in Figure 2-3. In this implementation, the controller determines when to read and write the SRAM using two tri-state buffers. The controller cannot read and write simultaneously because there is only one data port.

[Schematic: a controller drives the ADDR and R/W lines of a one-port SRAM; WRITE DATA and READ DATA share the SRAM's single DATA I/O port through two tri-state buffers.]
Figure 2-3: An example of a buffer using a one-port SRAM
Although this implementation might be sufficient, the controller has to operate at a very high frequency to keep up with the data rate. Another approach is to use a dual-port SRAM: one port is used solely to write data from the A2D, and the other is used to read data. Figure A-5 shows the schematic diagram of the SRAM and its control logic modules. The module on the left, UCYP_1:D, controls the write side of the buffer, and the one on the right, UCYP_1:E, controls the read side and also the output enable of the image EPROM discussed earlier. To reduce the complexity of the system, both control logic modules are implemented on a CPLD. With this implementation, the buffer can be read and written simultaneously, and the controllers can operate at half of the 28.63MHz that a one-port SRAM would require.

One of the main functions of the buffer is to handle the rate difference between the A2D and the video compression chip. The output rate of the A2D converter is 30 frames per second, but the input rate of the video compression chip is 30.518 frames per second (discussed below). If the write address and read address are not controlled, every once in a while the two will overlap, causing image corruption. Therefore, a control mechanism is necessary to ensure that a frame cannot be read and written at the same time. To prevent image corruption, the SRAM buffer is divided into 4 memory banks, each capable of storing exactly 1 frame. The read and write control logic modules are designed never to access the same bank simultaneously. Figure 2-4 explains how the mechanism works.

[Diagram: the SRAM holds four frames, from frame #0 at the lowest addresses to frame #3 at the highest. Read side (data to the video compression chip): when a frame has been completely read, the next frame is read if it is not being written; if it is being written, the current frame is repeated. Write side (data from the A2D): when a frame has been completely written, the next frame is written; after frame #3, frame #0 is next.]
Figure 2-4: A mechanism used to prevent image corruption

As shown in Figure 2-4, there are four frames. Each frame contains 128x128 pixels and occupies address bits 0-13, or 2^14 bytes, of the SRAM buffer. The highest address bits, 14 and 15, identify the frame number. The frame being written is always kept at least one frame ahead of the frame being read. Since the reading rate is faster than the writing rate, every 1 or 2 seconds a frame finishes being read while the next one is still being written. In this case, the frame just read has to be repeated to give the next frame time to finish writing. This frame repetition happens only once every few seconds and has no significant effect on the final image. In fact, it is not noticeable at all in the final video output.

In the schematic diagram in Figure A-5, there are two controllers implemented on a CPLD for design flexibility. The controller on the left of the SRAM is the "write controller" and the one on the right is the "read controller". The write controller is a simple finite state machine which receives inputs from the A2D and writes the data to the SRAM buffer, starting from frame 0. As shown in Figure 2-5, the machine has only 2 states: WAITING and LOADING. The machine is synchronized to the CLKx1 output of the Bt829A chip. At first, the machine waits in the WAITING state. When pixels are delivered from the Bt829A, that is, when VRESET and FIELD from the Bt829A equal 0 and 1 respectively, the machine proceeds to the LOADING state, where it starts loading pixels into the SRAM buffer. When all pixels are stored, the machine goes back to the WAITING state and prepares to write to the next frame of the buffer.

[State diagram: WAITING (Address Count is set to 0) moves to LOADING when VRESET = 0 and FIELD = 1. LOADING returns to WAITING when Address Count = 16383 (128x128 - 1); at that point BANK is set to 0 if BANK = 3, otherwise BANK = BANK + 1.]
Figure 2-5: Finite state machine of the left control logic module

Unlike the write controller, the read controller has 2 functions: delivering pixels in a specific order to the video compression chip, using the mechanism above to prevent image corruption, and controlling the tri-state outputs of the image EPROM and of the SRAM buffer. The part that delivers pixels is essentially a finite state machine, shown in Figure 2-6. Synchronized to the clock of the video chip, the machine starts in the RESET state, where the video chip is also reset, and then goes to the DELAY START state. The DELAY START state does nothing but idle for a few cycles to allow some time for the video chip to become ready. After the DELAY START state, the machine waits in the WAITING state until the write control logic finishes writing frame 0, and then moves to the START state. In this state, the machine asserts the start signal for the video chip and then begins loading pixels continuously in the LOADING state. Looping forever in the LOADING state, the machine repeats a frame when the previously described condition occurs.

[State diagram: RESET (send the reset signal to the video chip), after one clock cycle DELAY START (idle for 4 cycles), after 4 clock cycles WAITING (until the left control logic has finished writing a frame), then START SIGNAL (send a pulse of STARTFRM to the video chip), after one clock cycle LOADING (loop forever in this state to deliver pixels continuously, repeating a frame when necessary).]
Figure 2-6: Finite state machine of the right control logic module

Due to the SIMD architecture of the video chip, frame pixels cannot be loaded in order. Instead, the pixels are loaded as shown in Figure 2-7.
One image frame is divided into 1024 4x4 sub-frames. The first pixel of each sub-frame is loaded first, one sub-frame at a time, starting from the top-left sub-frame (e.g. pixel A1, then B1, then C1, ..., then D1, then E1, then F1, ...). After that, the next pixel of each sub-frame is loaded (e.g. A2, then B2, then C2, ..., then D2, then E2, then F2, ...), and so on until all the pixels are loaded. Although this method of pixel loading seems complicated, it is actually easy to implement. The addressing scheme is simply a crossing of the address lines, as shown in Figure 2-8. With this address crossing scheme, when the Count Address counts up from 0 to 16383, the SRAM is automatically accessed in the order described above.

The other function of the read control logic, controlling the tri-state outputs of the SRAM and the EPROM, is implemented as a set of combinational logic, as illustrated in Figure 2-9. Since the output pins of the image EPROM are connected directly to the output port of the SRAM buffer, this control logic is necessary to prevent bus contention. Taking its input from a switch, the logic simply determines whether the data source is the SRAM buffer or the image EPROM.
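The read-side addressing described in this section (bank selection plus crossed address lines) can be sketched in a few lines of Python. This is only an illustrative model, not the CPLD logic itself. It assumes that each bank stores its frame in raster order (address = row x 128 + column) and that both the sub-frames and the pixels inside a sub-frame are visited row by row, as Figure 2-7 suggests. The bit assignment and all function names are hypothetical.

```python
# Illustrative model of the read-side addressing; all names are hypothetical.
# Assumptions (not confirmed by the text): each bank stores its frame in
# raster order (address = row * 128 + column), and sub-frames as well as the
# pixels inside a sub-frame are visited row by row.

NUM_BANKS = 4                 # four frame banks, selected by address bits 14-15
PIXELS_PER_FRAME = 1 << 14    # 128 x 128 pixels, address bits 0-13


def crossed_address(count):
    """Permute the 14 counter bits into a raster address inside one bank."""
    sub_col = count & 0x1F            # bits 0-4: sub-frame column (0-31)
    sub_row = (count >> 5) & 0x1F     # bits 5-9: sub-frame row (0-31)
    pix_col = (count >> 10) & 0x3     # bits 10-11: column inside the 4x4 sub-frame
    pix_row = (count >> 12) & 0x3     # bits 12-13: row inside the 4x4 sub-frame
    row = (sub_row << 2) | pix_row    # 7-bit image row
    col = (sub_col << 2) | pix_col    # 7-bit image column
    return (row << 7) | col


def next_write_bank(bank):
    """After a frame is completely written, move on (frame #3 wraps to #0)."""
    return (bank + 1) % NUM_BANKS


def next_read_bank(read_bank, write_bank):
    """Read the next frame unless it is still being written; else repeat."""
    candidate = (read_bank + 1) % NUM_BANKS
    return read_bank if candidate == write_bank else candidate


def sram_address(bank, count):
    """Full SRAM address: bank number in bits 14-15, crossed count below."""
    return (bank << 14) | crossed_address(count)


if __name__ == "__main__":
    # Counts 0, 1, 2 correspond to pixels A1, B1, C1; count 1024 to pixel A2.
    print([crossed_address(c) for c in (0, 1, 2, 1024)])
```

Because the mapping only rearranges bits, it costs no logic in hardware: each counter output is simply wired to a different SRAM address pin, which is what "a crossing of the address lines" amounts to.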
Footnote 1: Although the idle state is not required, it is recommended in [4].

Figure 2-7: How a frame is loaded (each 4x4 sub-frame A, B, C, ..., D holds pixels numbered 1 to 16 in raster order; sub-frames E, F, ... form the next row of sub-frames)

Figure 2-8: Addressing scheme of the input buffer (Count Addr lines 0 to 13 are wired to SRAM Addr lines 0 to 13 in a crossed order)

Figure 2-9: Combinational logic which controls the tri-state outputs of SRAM buffer and image EPROM (input from a switch; outputs are CS and OE of the SRAM and CS and OE of the image EPROM)

2.4 Video Compression Integrated Circuit (EZW Chip)

The video compression integrated circuit is the heart of this demonstration system. Its core performs the most complex processing in the system yet consumes the least power. Utilizing the wavelet transform and the zero-tree coding algorithm [2], this chip is a massively parallel video processor designed by Thomas Simon as part of his PhD thesis [4]. The chip is designed specifically to compress an 8-bit, 128x128-pixel digital video stream with 8 levels of adjustment for image quality (i.e. compression ratio). It compresses a group of 16 frames at a time and outputs a series of bits for the entire group of 16 compressed frames. It consumes 300-400 µW of power at a compression ratio of approximately 200:1 for good image quality. The key to the low power of this EZW chip is its parallel SIMD architecture.
Contrary to intuition, parallelism can sometimes save power, especially when the required output rate is fixed. A parallel circuit can generally finish a calculation more rapidly. However, it is unnecessary for the circuit to compute faster when the data rate is fixed. The circuit can therefore trade the spare time for lower power consumption by decreasing its operating voltage. When the operating voltage is lowered, the computation time lengthens, so the voltage is reduced just enough that the circuit still satisfies the required output rate. Although a parallel architecture usually consumes more power, the low operating voltage can offset the increase and generally results in lower overall power consumption.

Packaged in a 208-pin PGA, this chip requires 3 external instruction EPROMs, which store several sets of instructions used to compute different levels of compression. Each instruction set is burned onto the EPROMs at a different location, which can be selected by 4 on-board switches. These switches allow real-time adjustment of compression ratio and image quality.

Despite the complexity of the integrated circuit itself, its input and output interface is quite simple to build a system around. Table 2.2 shows the pin descriptions of the chip. In addition to the input and output pins described in the table, the EZW chip also has many debugging pins which are not used in this demonstration system.

Figure 2-10: Timing Diagram of Input Signals to Video Compression Integrated Circuit (signals shown: RESET, START, EZW CLK IN, and the 8-bit EZW DATA IN bus carrying Pixel 0, Pixel 1, Pixel 2, ...; the diagram is annotated "wait a few cycles")

The input interface of this chip is very simple. Figure 2-10 shows the timing diagram of input signals to the video compression integrated circuit. When the chip is powered up, it has to be reset before any computation is performed. All input signals are synchronized to the EZW CLK IN clock signal, which is fixed at 500KHz.
The reset signal, RESETFRM, has to be asserted for at least 1 clock cycle. After the chip is reset, it waits for the STARTFRM signal, which should also last at least 1 clock cycle. After the STARTFRM signal is asserted, each pixel is read continuously into the chip at the next rising edge of the CLK input signal. When all pixels of a frame have been delivered, the first pixel of the next frame is delivered immediately at the next clock cycle. At 500KHz, the chip can process 500x10^3/(128x128) = 30.518 frames per second.

Table 2.2: Pin descriptions of the EZW chip

Pin          I/O  Description
CLK          I    500KHz clock input. All inputs are synchronized to this clock.
RESETFRM     I    Reset signal. It puts the chip into a reset state, where it waits for the start signal.
STARTFRM     I    Start pulse. It signifies that the next data is valid at the next clock cycle.
PIX[7..0]    I    Digital video pixel.
INST[19..0]  I    Instructions for computation. In this system, the EZW chip receives instructions from 3 EPROMs.
CLKOUT       O    Clock for DOUT.
DOUT         O    Data output bit with respect to CLKOUT.
ENDGRP       O    A pulse that signifies the end of a compressed group.
VCC          I    5V power supply for the output driver.
Vdd          I    Power supply for the core of the chip. It can range from 1.5V to 2.5V.
VWW          I    Power supply. VWW should be Vdd + 1V.

Figure 2-11: Timing Diagram of Output Signals from Video Compression Integrated Circuit (signals shown: EZW DATA OUT, EZW DATA OUT CLK, EZW GROUP CLK)

The output interface of the EZW chip, shown in Figure 2-11, is even simpler than the input interface. There are only 3 output pins from the chip, as described in Table 2.2. DOUT is the output data bit and should be read at the rising edge of the CLKOUT signal. ENDGRP marks the end of a 16-frame group. The chip can assert CLKOUT sparingly or in bursts, depending on the spatial and temporal content of the input frames. As described previously, the rate of the output bits depends on the voltage level of Vdd. The period of the EZW DAT CLK increases when Vdd decreases.
At the lowest Vdd of 1.5V, the EZW DAT CLK has its longest period of 200 ns. The period of the EZW DAT CLK decreases to 90 ns when Vdd is at its highest value of 2.5V.

This EZW video compression chip requires only a small controller. It needs a reset signal and a start signal, which are provided by the Input Frame Buffer module. The 500KHz clock is generated by dividing the 64MHz system clock by 128, as shown in Figure 2-12. To save circuit board space, the clock divider is implemented on a CPLD.

Figure 2-12: 64MHz to 500KHz clock divider (a 7-bit register clocked at 64MHz is incremented by 1 every cycle; its top bit Q[6] is the 500KHz clock)

2.5 EZW Output Frame Buffer

The EZW Output Frame Buffer buffers the output from the video compression chip before it is sent to a PC. Similar to the Input Frame Buffer, the Output Frame Buffer is composed of SRAM chips and control logic implemented on a CPLD. The EZW Output Frame Buffer receives serial data output from the video chip and writes it to one part of the memory buffer while, at the same time, another part of the buffer is being read out to the PC. In other words, this buffer is essentially a ping-pong buffer.

Figure 2-13: Schematic Diagram of the Output Frame Buffer Controller (blocks: Bit Packing FSM fed by EZW DATA BIT, EZW DATA CLK and EZW GROUP CLK; Read/Write Buffer Switch; 1Mb SRAM #0 and 1Mb SRAM #1, each 8 bits wide; Parallel Port Interface Module; control signals include Write Finish, Read Finish and OK)

Unlike the Input Buffer described in the previous section, the size of this buffer needs to be estimated. Since the number of bits from the video compression chip varies, the capacity of the buffer has to be large enough to hold all the output bits in the worst case. A conservative estimate is obtained through a simple calculation: The video chip compresses a group of 16, or 2^4, frames.
Each frame has 128 x 128 x 8, or 2^(7+7+3) = 2^17, bits. Therefore, 16 uncompressed frames have 2^21 = 2M bits. To hold both the read and write buffers, the SRAM should have 4M bits of capacity. However, a dual port SRAM with 4M bits of capacity is not easy to find. In fact, the estimate above is too conservative. In practice, the video compression has a compression ratio of about 100:1 or more, and even in the worst case the ratio is certainly greater than 2:1, which reduces the capacity requirement by at least a half. Nonetheless, even 2M-bit dual port SRAMs are still rare.

To get around this problem, a simpler buffer structure is designed. Since 1M-bit SRAMs with an 8-bit-wide data path are commercially available from IDT, the buffer is split into 2 SRAM chips with another controller implemented on a Cypress CPLD. The final design of the EZW Output Frame Buffer is shown in Figure 2-13. Furthermore, choosing an 8-bit-wide datapath also simplifies the system greatly because the parallel port datapath is 8 bits wide as well.

In Figure 2-13, there are three additional control modules worth discussing here: Bit Packing FSM, Synchronizer FSM, and Read/Write Buffer Switch. Similar to all other parts of the system, these three modules are implemented on a CPLD for design flexibility and ease of debugging.

Figure 2-14: Bit Packing Finite State Machine (states named in the text: RESET, WAIT FOR ONE, WRITE SRAM, WAIT FOR ZERO, WAIT FOR OK, THROW AWAY GROUP; transitions depend on EZW DAT CLK and EZW GRP CLK)

2.5.1 Bit Packing Finite State Machine

As described in the previous section, the output from the video compression chip is serial. However, the data path of this system is 8 bits wide.
Implemented on a Cypress CPLD, this Bit Packing Finite State Machine packs 8 bits together to form a byte and pulses a WE signal to write the byte to a buffer. Figure 2-14 shows how the finite state machine functions. Although the state diagram looks complicated, the concept is simple: the machine waits for output data from the video chip and writes it to the buffer.

Figure 2-15: Bit Storage Format (each bit from EZW DATA OUT, clocked by EZW DATA OUT CLK, is placed into a 1-byte buffer starting at the MSB, bit 7, and moving toward the LSB, bit 0)

The machine starts from the RESET state, then moves directly to the WAIT FOR ONE state in order to wait for data. When EZW DAT CLK=1, which means that the EZW DAT BIT holds a valid bit, the machine grabs that bit and writes it to a temporary buffer, as shown in Figure 2-15. As the figure shows, the machine writes that bit to the most significant slot available. The following incoming bits are written to the next slot on the right. When the temporary space is full, meaning that all 8 bits have been stored, the machine asserts a WE signal to the Read/Write Buffer Switch and clears the temporary space. If the temporary space is not full, it simply stores the new bit. The machine then waits for EZW DAT CLK to become 0 in the WAIT FOR ZERO state. The process repeats until EZW GRP CLK=1.

When EZW GRP CLK=1, that is, when all the bits have been received from the EZW video chip, the machine asserts the WE signal to write whatever it has to the buffer and notifies the Read/Write Buffer Switch by asserting a READY signal. The machine then waits for the OK signal from the Read/Write Buffer Switch module in the WAIT FOR OK state. The OK signal is asserted only after both the Parallel Port Interface and the Bit Packing FSM have finished transmitting and storing bits, respectively, and it means that the Read/Write Buffer Switch has swapped the read and write buffers.
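The MSB-first packing performed by the Bit Packing FSM can be mimicked in software. The following Python sketch is only an illustration, not the CPLD implementation: the function name pack_bits_msb_first is hypothetical, and the zero padding of a final partial byte is an assumption based on the temporary space being cleared after every write.

```python
# Illustrative software model of MSB-first bit packing.
# pack_bits_msb_first is a hypothetical name, not taken from the thesis.

def pack_bits_msb_first(bits):
    """Pack an iterable of 0/1 values into bytes, first bit into bit 7."""
    out = bytearray()
    acc = 0      # models the temporary 8-bit space
    filled = 0   # number of slots already used in the temporary space
    for bit in bits:
        acc |= (bit & 1) << (7 - filled)  # most significant free slot
        filled += 1
        if filled == 8:                   # temporary space full: write a byte
            out.append(acc)
            acc, filled = 0, 0            # clear the temporary space
    if filled:                            # end of group: write what is left
        out.append(acc)                   # unused low bits stay 0 (assumption)
    return bytes(out)


if __name__ == "__main__":
    print(pack_bits_msb_first([1, 0, 1]).hex())
```

With this ordering, reading the bytes from left to right reproduces the bit order produced by the video chip.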
Only after the OK signal is received can the Bit Packing FSM begin capturing output bits again. In some cases, a new bit can arrive from the EZW video chip while the machine is waiting for the OK signal. This situation occurs because the EZW video chip operates independently of the EZW Output Frame Buffer. In such a case, the machine simply throws all the bits of that group away in the THROW AWAY GROUP state.

Since the Bit Packing FSM captures bits by sampling EZW DAT CLK, the clock that drives the FSM must be fast enough that the FSM does not lose any data. From the finite state machine diagram in Figure 2-14, the longest loop between 2 consecutive incoming bits occurs when the machine traverses the following states in order: WAIT FOR ONE, WRITE SRAM, and WAIT FOR ZERO. This means that the clock that drives this state machine has to be fast enough to finish writing a byte to the SRAM while the EZW DAT CLK still holds its value; otherwise the system would lose 1 bit of information. At the maximum Vdd of the video compression chip, Vdd = 2.5V, the EZW DAT CLK yields the shortest positive pulse, which is 90 ns. So, to guarantee that the system captures all the bits, this FSM has to traverse at least 3 states within 90 ns. Because 64MHz corresponds to 5.76 cycles within 90 ns, this clock frequency was chosen for this FSM. In addition, 64 is conveniently a power of 2, so the clock can easily be divided to generate other clock speeds.

Figure 2-16: An alternative of how to store a bit (each bit from EZW DATA OUT is placed into the 1-byte buffer starting at the LSB, bit 0)

An alternative way to implement this finite state machine is to write the incoming bit to the least significant bit instead of the most significant bit of the 8-bit temporary space, as in Figure 2-16. However, this alternative would increase the complexity of the system, especially the decoding software.
The decoding software expects a stream of bits that are in the same order as the output bits from the video compression chip. Therefore, if the least significant bit were written first, the software would have to reverse the order of bits every time it received a byte of data from the parallel port.

2.5.2 Synchronizer Finite State Machine

As the name suggests, this finite state machine synchronizes the Bit Packing FSM and the Parallel Port Interface module so that they start their processes at the same time. It ensures that the read buffer has been completely read and that the write buffer has been completely written before they are swapped. Figure 2-17 shows the state transition diagram of this FSM.

Figure 2-17: State diagram of parts of Read/Write Buffer Switch (states: WAIT FOR DONE and WAIT FOR CLEAR; transitions depend on WRITE FINISH and READ FINISH; the toggle signal is inverted on each swap)

This machine consists of just two states, WAIT FOR DONE and WAIT FOR CLEAR. In the WAIT FOR DONE state, the machine waits for a finish signal from both the Bit Packing FSM and the Parallel Port Interface. When both of them have sent their finish signals to the Synchronizer (not necessarily at the same time), it proceeds to the WAIT FOR CLEAR state. In this state, the machine sends a swap signal to the Read/Write Buffer Switch and simultaneously sends an OK back to the two circuit modules to signify that the read and write buffers have been swapped. The machine goes back to the WAIT FOR DONE state when the two circuit modules de-assert their finish signals.

2.5.3 Read/Write Buffer Switch

Implemented on a CPLD, this circuit module is a switch whose function is to alternate the roles of the read and write SRAM buffers upon receiving a toggle signal from the Synchronizer FSM. The swapping process is nothing but a re-routing of signals, as shown in Figure 2-18.
From the schematic diagram in Figure 2-18, the circuit utilizes tri-state buffers to route the signals to their appropriate destinations. The tri-state buffers are controlled by the TOGGLE signal sent from the Synchronizer FSM. The roles of buffers 1 and 0 alternate in accord with the TOGGLE signal. In other words, when TOGGLE = 1, buffers 1 and 0 are the write and read buffers, respectively, and vice versa.

Figure 2-18: Schematic Diagram of parts of Read/Write Buffer Switch (TOGGLE from the Synchronizer FSM steers WRITE ADDRESS[15..0], WRITE DATA[7..0], WRITE CS,OE,WE and READ ADDRESS[15..0], READ DATA[7..0], READ CS,OE,WE to SRAM BUFFER 0 and SRAM BUFFER 1)

2.6 Parallel Port Interface

Implemented in a Cypress CPLD, this circuit module is the connection between the demonstration system and the computer. The main function of this module is to transmit data from the EZW Output Frame Buffer to the PC via the parallel port. Although the parallel port is normally used to transfer data from a PC to a peripheral, in this demonstration system it can transmit data from the board to the PC by operating in ECP mode. The protocol to transfer data is a simple hand-shaking protocol, shown in Figure 2-19.

Figure 2-19: Timing Diagram of the Parallel Port Protocol in ECP mode (signals shown: D0-D7 (bidirectional), PeriphAck (board to PC), Ack (board to PC), HostAck (PC to board), HostClk (PC to board), ReverseReq (PC to board), AckReverseReq (board to PC))

Here are the steps to transfer data from the board to a PC:
1. When the board is ready, it brings AckReverseReq high and PeriphAck low.
2. When the PC is ready to transmit the data, it brings HostClk high and HostAck low.
3. After a delay of at least 0.5 ms, the PC pulls ReverseReq low to request incoming data.
4. The board acknowledges by bringing AckReverseReq low.
5. The parallel port can send either "data" or "command" depending on the PeriphAck signal.
The difference between a "data" byte and a "command" byte is that the byte is written into a different memory space on the PC. PeriphAck is high for transmitting "data", and low for "command". Because this demonstration board always sends data, PeriphAck is always high.
6. The board brings Ack low to notify the PC that the data is ready to be read. At this point the data bus D0-D7 must hold a valid value.
7. The PC acknowledges that it has read the data by pulling HostAck high.
8. The board confirms that the PC has accepted the data by bringing Ack high.
9. The PC finishes one byte transmission by bringing HostAck low.
10. The process repeats starting from step 5 until all the bytes are transferred.

Table 2.3: Mapping Between Centronics pinouts and D-SUB pinouts

D-SUB pin  Signal         Function             Source   Register  Centronics pin
1          HostClk        Strobe               PC       Control   1
2          D0             Data Bit 0           PC       Data      2
3          D1             Data Bit 1           PC       Data      3
4          D2             Data Bit 2           PC       Data      4
5          D3             Data Bit 3           PC       Data      5
6          D4             Data Bit 4           PC       Data      6
7          D5             Data Bit 5           PC       Data      7
8          D6             Data Bit 6           PC       Data      8
9          D7             Data Bit 7           PC       Data      9
10         Ack            Acknowledge          Printer  Status    10
11         PeriphAck      Printer Busy         Printer  Status    11
12         AckReverseReq  Out of Paper         Printer  Status    12
13         Select         Printer Online       Printer  Status    13
14         HostAck        Automatic Line Feed  PC       Control   14
15         Error          Error                Printer  Status    32
16         ReverseReq     Initialize Printer   PC       Control   31
17         P1284          Select Printer       PC       Control   36
18-25      Gnd            Ground                                  19-30

Since this demonstration system only sends data to the PC, ReverseReq is always low. The state diagram of the finite state machine that controls this transfer process is illustrated in Figure 2-20. While maintaining the hand-shaking process with the PC, the machine reads from the read buffer of the EZW Output Frame Buffer. Since the maximum size of each DMA transfer is 32K bytes, the data has to be divided into 4 blocks of 32K bytes, so the total number of transferred bits is 1M bits.
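The per-byte handshake in steps 5 to 9 and the division into 32K-byte DMA blocks can be sketched as a small Python model. This is a simplified illustration rather than the CPLD logic: the names reverse_transfer and split_blocks are hypothetical, the signal levels are modeled as plain integers, and the idle levels assumed at the start (Ack high, HostAck low) are the levels left by steps 1 to 4 and by the end of the previous byte.

```python
# Simplified model of the board-to-PC byte handshake (steps 5 to 9).
# reverse_transfer and split_blocks are hypothetical names for illustration.

DMA_BLOCK = 32 * 1024  # maximum size of one DMA transfer, in bytes


def split_blocks(data: bytes, size: int = DMA_BLOCK):
    """Divide the read buffer contents into DMA-sized blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def reverse_transfer(block: bytes):
    """Return (bytes seen by the PC, log of signal events) for one block."""
    sig = {"PeriphAck": 1, "Ack": 1, "HostAck": 0}  # assumed idle levels
    received, log = bytearray(), []
    for byte in block:
        bus = byte                                   # step 5: board drives D0-D7
        sig["Ack"] = 0                               # step 6: data is valid
        log.append("board: Ack low")
        received.append(bus)                         # PC latches the byte
        sig["HostAck"] = 1                           # step 7: PC has read it
        log.append("pc: HostAck high")
        sig["Ack"] = 1                               # step 8: board confirms
        log.append("board: Ack high")
        sig["HostAck"] = 0                           # step 9: byte finished
        log.append("pc: HostAck low")
    return bytes(received), log


if __name__ == "__main__":
    blocks = split_blocks(bytes(128 * 1024))         # 1M bits in the read buffer
    print(len(blocks), "blocks of", len(blocks[0]), "bytes")
```

Each byte costs four signal transitions in this model, which is why the transfer rate is bounded by how quickly the PC side can toggle HostAck.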
This demonstration system uses a 36-pin Centronics (IEEE 1284-B) connector because it is easy to wire and compatible with many other types of connectors. Since a standard PC usually uses a 25-pin D-SUB (IEEE 1284-A) receptacle, the Parallel Port Interface Module has to map Centronics pinouts to D-SUB pinouts using the information in Table 2.3. This table was originally intended for a printer pinout mapping, but the mapping is compatible with this demonstration system as well.

Figure 2-20: State diagram of finite state machine that controls the parallel port interface (states: WAIT FOR REQ, SEND DATA, WAIT FOR HOSTACK UP, WAIT FOR HOSTACK DOWN, UPDATE ADDR, WAIT FOR NEXT DMA, WAIT FOR OK; a DMA block ends when the read address reaches 2^15 (32K), and the buffer ends when it reaches 2^17 (1M), after which read finish is set to 1 until OK = 1)

Figure 2-21: Logic blocks in the first CPLD (Bt829A Control Logic, Input Frame Buffer Controller, and EZW Controller and Clock Divider, connected to the Bt829A, the Image EPROM, the EZW video compression chip, and RESET)

2.7 Complex Programmable Logic Devices (CPLD)

The programmability of the Complex Programmable Logic Device (CPLD) allows flexibility of design and ease of implementation and debugging. As mentioned in several of the preceding sections, most of the logic blocks are implemented on a CPLD. These logic blocks are usually control logic modules, finite state machines, or other complicated logic blocks.

The CPLD used in this system is the Ultra 37256 manufactured by Cypress Semiconductor. The device has 16 logic blocks, each with 16 macrocells, for a total of 256 macrocells. The code is written in VHDL, which is compiled by the Galaxy VHDL compiler and then programmed into the device by Cypress's ISR programming software tool. The VHDL code is simulated using Cypress's Warp VHDL simulator to verify the functionality and timing of the generated logic.
In addition, each Cypress CPLD can be re-programmed on-board via a JTAG connector. This on-board re-programmability simplifies the debugging process tremendously.

Figure 2-22: Logic blocks in the second CPLD (Bit Packing FSM, Synchronizer FSM, Read/Write Buffer Switch, and Parallel Port Interface, connected to the data from the EZW video compression chip, the two SRAM buffers, the PC, and the RESET and AUX buttons)

Due to the limited size and number of I/O pins of a Cypress CPLD, this demonstration system uses 2 Cypress Ultra 37256 CPLDs. The first CPLD contains the logic blocks shown in Figure 2-21, and the second contains those shown in Figure 2-22. All the logic blocks shown in both figures have been discussed in the previous sections. The number of macrocells used in each CPLD is balanced so that both of them still have some space left for debugging and future expansion. In this system, both CPLDs have approximately 60% macrocell utilization.

The only controls for these two CPLDs are two on-board buttons, RESET and AUX. The RESET button resets both CPLDs to their default or start state. The AUX button is an extra button for debugging purposes. It has proven to be extremely useful for debugging, as it was heavily used to generate several test vectors. This use of the AUX button would not have been possible if the CPLDs could not be programmed on-board.

Chapter 3 Software System

This chapter discusses the software part of this demonstration system. The function of the software is to read data from the parallel port, decode the data, and display the decoded images as a movie. As with the hardware system, modularity helps in developing software that is fast, readable, and easy to debug. The software for this demonstration system is divided into 4 parts: Direct Memory Access (DMA) Device Driver, Decoder, Video Player, and User Interface.
The software is built into an executable (.exe) file which can be executed on any Microsoft Windows platform.

Figure 3-1: Block Diagram of the Software System (blocks: User Interface, Decode, Display)

3.1 Direct Memory Access Device Driver

This software module is the connection between the hardware and the PC. In order to display the video in real time, the transmission rate from the circuit board to the PC has to be fast enough. For an uncompressed video stream, the rate required is 128x128x8x30, or about 4M bits per second. Given a compression ratio of about 10, the expected data rate for compressed video is approximately 0.4M bps, which can be handled by a parallel port in Direct Memory Access (DMA) mode.

In DMA mode, the parallel port controller circuit in a PC bypasses the operating system memory routines and directly accesses the PC's main memory (thus, direct memory access). The DMA mode helps reduce the workload of the processor and increases the speed of data transfer. While the processor is decoding the data, the DMA controller can receive data from the parallel port and store it in memory. The processor only needs to initiate the transfer, after which it can perform other tasks while waiting for all the data to be stored in memory.

The DMA device driver used in this demonstration system is the WinRT DMA driver from BlueWater Systems Incorporation. BlueWater Systems provides a tool suite to use together with the Microsoft Visual C++ compiler to build the software. The driver has two versions: WinRT for Windows 9x and WinRT for Windows NT.

3.2 Decoder

This module takes the data stored in memory by the WinRT DMA driver and decodes it into raw images. It is based on a decoder developed by Thomas Simon [4] in C on a Unix platform. The functionality of this decoder is very limited.
It can only decode a stream of "0" and "1" characters, which represent bit 0 and bit 1 respectively, and that stream must be in a file with a specific name. The output of this decoder is a series of 128x128 PGM image files. In addition, the original decoder decodes only one group of frames and terminates without freeing the memory it has allocated.

Modified for compatibility, the decoder used in this demonstration system loops forever and frees all the allocated memory so as to prevent it from running out of memory. However, after the modification, the output of the decoder was still not fast enough to keep up with the input from the circuit board. To resolve this difficulty, some frames had to be ignored. So, instead of decoding all 16 images, the decoder decodes only the first few images, stops, discards the rest of the data, and starts with the next group of frames. This solution works well and speeds up the process significantly, although the final video output shows some discontinuities.

3.3 Video Player

This module of the software displays the final stream of image frames on the display device. The final image is a 128x128-pixel grey-scale bitmap. To display the images as a movie, the PC simply draws a series of output images one after another. If the software just draws the images directly one by one, the video will not be smoothly continuous. In fact, if such a drawing algorithm were implemented, one would actually be able to see each pixel being drawn. To prevent this problem, the image should first be drawn in the background and then displayed when it is finished.

Figure 3-2 illustrates how using a background page helps smooth the video playback. As shown in Figure 3-2, there are two copies of memory pages for the video display. The "primary page" is being displayed while the "background page" is being drawn. During the vertical sync, the monitor's electron gun is brought back to the upper left corner.
While no image is being drawn on the screen during this vertical sync, the processor can swap the "primary page" and the "background page" pointers. As a result, every video frame displayed is clear and complete. The video display module of this demonstration system utilizes the Microsoft DirectDraw library, which supports such a drawing method.

Figure 3-2: Double frame buffer (one pointer refers to the page being displayed and another to the page in background; the 2 pages are swapped during vertical sync)

3.4 User Interface (UI)

The user interface is the layer on top of all the modules. It is the connection between a user and the software, and it makes the software easily manageable. The UI is written to conform to the Microsoft Windows API so that it is compatible with all Microsoft Windows operating systems. The look and feel of the UI is shown in Figure 3-3. The UI is composed of 3 Windows components:
1. Radio buttons. These buttons let users select the image quality (or compression ratio) of the image.
2. Text area. The text area is located at the top left corner of the window. It shows the frame number being displayed and the current compression ratio.
3. Video panel. It is a 128x128 image panel displaying the actual video image.

Figure 3-3: The Working System

Chapter 4 System Implementation

After the design was completed, the system was implemented. The circuit board was specifically designed to facilitate the testing and debugging process. The software was developed to be portable and user-friendly. After testing and debugging, the system operated properly, and the functionality of the EZW video chip was verified.

The implementation process started immediately after the completion of the initial design. The hardware system was more complicated and, therefore, prone to more mistakes than the software.
Unlike in the software, a bug that occurs in the hardware is usually difficult and expensive to fix. Thus, it is crucial to pay special attention to the hardware system to minimize the number of bugs.

4.1 Hardware System

Due to the careful, modular design, the implementation was very straightforward. The hardware system was implemented on a single printed circuit board using a design tool suite from Accel Technologies. A schematic diagram was drawn graphically in Accel Schematic. Accel PCB then extracted the board layout from the schematic diagram. All the components were hand-placed to optimize the area of the printed circuit board. The connections between components were generated automatically by the Accel Auto-Route tool. Unfortunately, most of the time Accel Auto-Route was unable to find traces for all the connections, and the remaining connections were finished manually. The result was a 7.5" x 7.5" 8-layered PCB, as shown in Figures 4-1, 4-2, and 4-3.

Figure 4-1: The front side of the unpopulated PCB

The board consists of 8 signal layers. Since there are numerous power and ground connections, power and ground planes are necessary to help reduce ground bounce. In addition to the 5V power supply and ground planes, the board has another power supply plane specifically designed for the EZW chip because it requires a stable power supply. The ground plane is placed in between the 2 power supply planes to form bypass capacitors that stabilize the system even more. Located at the top left corner of the board is the Bt829A chip, which contains both analog and digital pins. To isolate the analog signals from the digital noise, all the planes are notched around the Bt829A chip.

Figure 4-2: The back side of the unpopulated PCB

Signal traces are 6 mils in width. The power supply and ground traces (necessary for surface-mounted components) are 8 mils.
The board has 101 electrical components, 1382 pads, and 621 via holes (each 14 mils in diameter). The board was fabricated by Compunetics Incorporation.

4.2 Software System

The software was implemented with Microsoft Visual C++, which has good technical support and easily accessible on-line documentation.

The DMA device driver chosen for this system is the WinRT DMA Driver from BlueWater Systems Incorporation. The WinRT DMA Device Driver is designed for the parallel port in DMA mode. The driver has a set of C libraries which can be easily included in the program. Reading data from the parallel port is done through a simple function call, and the data is then transferred without adding more workload on the processor.

Figure 4-3: The Finished PCB with all components in place

The decoder was ported from Thomas Simon's original code to Microsoft Visual C++. Since there is not much difference between Unix C and Microsoft Visual C++, the porting was easy and seamless.

The user interface is written to conform to the Microsoft Windows application programming interface (API). By using a standard API, the software can operate on any Microsoft Windows operating system, which makes it more portable.

Chapter 5 System Performance

The circuit can capture images from a camera, compress them, and send them to a PC, which then displays the video in real time. There is about a 2 second frame latency because of the delay from the software decoder. Although the EZW chip can keep up with the real-time input, the decoder written for the PC is extremely slow. The software spent approximately 30 seconds decoding about half a second of compressed video. Clearly, the software could not maintain real-time video playback at this rate of decoding. Therefore, to achieve a real-time result, the decoder was modified so that only a few frames are decoded and displayed every 0.5 seconds. The delay of the decoder also accounts for the frame latency.
While the PC is decoding the data, the board has to wait until the process is completed before a new set of bits can be transmitted to the PC. The latency therefore depends on the number of frames decoded every 0.5 seconds. Table 5.1 shows the relationship between the number of frames decoded and the latency: the latency increases approximately linearly with the number of frames. By keeping the number of frames decoded at 1 or 2, the system can sustain real-time video playback at an acceptable rate of 2 frames per second with an approximate latency of 2 seconds.

Number of Frames Decoded    Approximate Latency (seconds)
1                           2
2                           3
3                           5
4                           7
5                           9
6                           11
7                           12
8                           13
9                           14
10                          15

Table 5.1: The relationship between the number of frames decoded and the latency

EZW Passes           6       5       4       3       2
Compression Ratio    82:1    130:1   250:1   490:1   1020:1

Table 5.2: The performance of this EZW video chip

This EZW video chip yields an excellent compression ratio, as shown in Table 5.2. For good image quality, the chip gives a compression ratio of 82:1 on average. The compression ratio increases as the number of EZW passes decreases. As the table shows, the chip can even achieve a ratio of about 1000:1 with acceptable image quality. The power consumed by the chip ranged from 450 to 750 µW, depending on the spatial and temporal content of the video image.

Chapter 6

Conclusion and Future Improvement

6.1 Conclusion

This thesis presents a demonstration system for an ultra-low-power video compression and encoding integrated circuit. The system was designed and implemented using the video chip and off-the-shelf parts. The datapath of the system was carefully planned to achieve real-time video playback. The most challenging part was designing the hardware and the software together so that they were compatible and resulted in a smooth, reliable system. The resulting system is user-friendly and requires only minimal setup.
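The near-linear latency trend reported in Table 5.1 can be checked with an ordinary least-squares fit. The sketch below is only an illustration: the data points are copied from Table 5.1, while the function name and the choice of a straight-line model are assumptions and not part of the measurement procedure.

```python
# Least-squares line through the (frames decoded, latency) pairs of Table 5.1.
FRAMES = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
LATENCY_S = [2, 3, 5, 7, 9, 11, 12, 13, 14, 15]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    slope, intercept = fit_line(FRAMES, LATENCY_S)
    print(f"latency ~= {slope:.2f} * frames + {intercept:.2f} seconds")
```

With the tabulated values, the fitted slope is roughly 1.5 seconds of added latency per additional frame decoded.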
A number of design methodologies were used in designing this demonstration system. Modularity, abstraction, and hierarchy significantly eased the implementation and debugging of the system. The system was separated into several small modules. The interfaces between these modules were defined at the beginning and maintained throughout the design process so that a change in one module would not affect the others. These design principles also leave room for further improvements and extension modules in the future.

6.2 Ideas for the Future

Although the system functions properly, it can be further improved in the following ways:

1. The circuit board can be made smaller. A number of components, such as resistors and bypass capacitors, can be replaced with surface-mounted ones. The unused space in the CPLDs can be used to store the instructions for the EZW chip in place of the EPROMs.

2. To change the compression level, a user must flip a set of DIP switches on the board. This is rather inconvenient because the user has to change the compression level in both the software and the hardware. An alternative is to have the circuit read the compression level from the software on the PC instead of from the DIP switches.

3. The system can be made wireless. The parallel port cable that connects the board and the PC can be replaced by a pair of wireless modules, one attached to the board and the other to the PC. The two modules would communicate with each other using IR or RF signals, so that data can be sent from the board to the PC without any cable.

4. Because the current software decoder is extremely slow, many frames are thrown away and the final video displays only two frames per second. If the decoder is optimized further, the system can potentially display more frames per second.

The above ideas are just examples.
There are also other ideas that could be applied to improve the system substantially.

Bibliography

[1] Jan Axelson. Parallel Port Complete: Programming, Interfacing, & Using the PC's Parallel Printer Port. Lakeview Research, 1997.
[2] Jerome M. Shapiro. Embedded image coding using zerotrees of wavelet coefficients. IEEE Transactions on Signal Processing, 41:3445-3462, December 1993.
[3] Rex K. Min. Demonstration System for a Low-Power Video Coder and Decoder. MS thesis, Massachusetts Institute of Technology, 1999.
[4] Thomas Simon. A Low Power Video Compression Chip for Portable Applications. PhD thesis, Massachusetts Institute of Technology, 1999.
[5] Jon Bates, Tim Tompkins. Using Visual C++ 6. Que Corporation, 1998.
[6] Viktor Toth. Visual C++ 5 Unleashed. Sams Publishing Inc., 1997.
[7] Rockwell Semiconductor Systems. I2C-Bus Reference Guide. June 1997.
[8] Rockwell Semiconductor Systems. Bt829A/827A VideoStream II Decoders. Preliminary product datasheet, November 1997.
[9] Integrated Device Technology, Inc. IDT7008S/L High-Speed Dual-Port Static RAM. Preliminary product datasheet, June 1998.
[10] Integrated Device Technology, Inc. IDT71124 CMOS Static RAM. Product datasheet, September 1999.
[11] Fairchild Semiconductor. NM27C256 High Performance CMOS EPROM. Datasheet, July 1998.

Appendix A

Schematic Diagrams

All the files in this thesis are in charatc/Thesis/ directory.
Figure A-1: Schematic Diagram of Analog to Digital Converter and Its Control Logic

Figure A-4: Schematic diagram of the image EPROM

Figure A-5: Schematic diagram of the SRAM buffer and its control logic modules

Figure A-6: Schematic Diagram of i2c programmer

Figure A-7: Schematic Diagram of Output Frame Buffer
Figure A-8: Schematic Diagram of Parallel Port Interface

Figure A-9: Schematic Diagram of Reset ...

Appendix B

VHDL Code

B.1 VHDL Code for CPLD 1

B.1.1 i2c.vhd

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE work.std_arith.ALL;

entity i2c_drive is
  port (
    sclW, sdaW: out std_ulogic;
    sclR_raw, sdaR: in std_ulogic;
    clk3: in std_ulogic;
    reset: in std_ulogic
  );
end entity i2c_drive;

architecture i2c_arch of i2c_drive is
  signal sclR: std_ulogic; -- synchronized version of sclR_raw
  type sclkStates is (low, high);
  signal sclkState, next_sclkState: sclkStates;
  constant maxPhase: integer := 31; -- SCK changes phase every "maxPhase" cycles of clk3
  constant midPhase: integer := 15; -- phase where data changes, start/stop conditions are sent, etc.
  signal phase, next_phase: std_logic_vector(4 downto 0);
  signal sendGo: std_ulogic;
  signal bitNum, next_bitNum: integer range 0 to 7; -- 0 to 7 addresses bits in a byte
  signal byteToSend: std_logic_vector(7 downto 0);
  type sdaStates is (idle, startCond, sendByte0, getAck0, ack0Passed,
                     sendByte1, getAck1, ack1Passed, sendByte2, getAck2,
                     success, ackFailed, stopCond);
  signal sdaState, next_sdaState: sdaStates;
  type sendStates is (sendCmd, waitForReply, done);
  signal sendState, next_sendState: sendStates;
  signal cmdNum, next_cmdNum: integer range 0 to 15;
  constant lastCmdNum: integer := 11;
  signal oldReset: std_ulogic;
begin
  sclkFsm: process(sclkState, phase, sclR)
  begin
    case sclkState is
      when low => -- drive SCL low for "phase" cycles of clk3
        sclW <= '0';
        if (phase = maxPhase) then -- time to flip SCL?
          next_sclkState <= high;
          next_phase <= (others => '0');
        else
          next_sclkState <= low;
          next_phase <= phase + 1;
        end if;
      when high => -- drive SCL high for "phase" cycles of clk3
        sclW <= '1';
        if (phase = maxPhase) then -- time to flip SCL?
          next_sclkState <= low;
          next_phase <= (others => '0');
        else
          next_sclkState <= high;
          if (sclR = '0') then -- is Brooktree holding SCL line LOW?
            next_phase <= (others => '0'); -- wait for BT to release SCL
          else
            next_phase <= phase + 1;
          end if;
        end if;
    end case;
  end process;

  sdaFsm: process (sdaState, phase, sclR, sdaR, byteToSend, sendGo, bitNum)
  begin
    case sdaState is
      when idle => -- wait until we are ready to generate a start condition
        sdaW <= '1';
        next_bitNum <= 7;
        if (sendGo = '1') and (sclR = '1') and (phase = midPhase) then
          next_sdaState <= startCond; -- drop SDA to signal a start condition
        else
          next_sdaState <= idle;
        end if;
      when startCond => -- drop SDA now to signal start condition.
        sdaW <= '0';
        next_bitNum <= 7;
        if (sclR = '0') and (phase = midPhase) then
          next_sdaState <= sendByte0;
        else
          next_sdaState <= startCond;
        end if;
      when sendByte0 =>
        if (bitNum = 7) or (bitNum = 3) then -- send "10001000" to device
          sdaW <= '1'; -- this initiates a write to the BT
        else
          sdaW <= '0';
        end if;
        if (sclR = '0') and (phase = midPhase) then
          next_bitNum <= bitNum - 1;
          if (bitNum = 0) then
            next_sdaState <= getAck0; -- if that was the last bit, switch states
          else
            next_sdaState <= sendByte0;
          end if;
        else
          next_sdaState <= sendByte0;
          next_bitNum <= bitNum;
        end if;
      when getAck0 =>
        sdaW <= '1'; -- drive high onto bus, so that BT can pull the line low to acknowledge
        next_bitNum <= 7; -- set to 8 so that sendByte1 will
        if (sclR = '1') and (phase = midPhase) then
          if (sdaR = '0') then
            next_sdaState <= ack0Passed; -- successful ACK, send another byte
          else
            next_sdaState <= ackFailed; -- no ACK. give up.
          end if;
        else
          next_sdaState <= getAck0;
        end if;
      when ack0Passed => -- wait for middle of SCK low before sending next bit
        sdaW <= '1';
        next_bitNum <= 7;
        if (sclR = '0') and (phase = midPhase) then
          next_sdaState <= sendByte1;
        else
          next_sdaState <= ack0Passed;
        end if;
      when sendByte1 =>
        sdaW <= byteToSend(bitNum); -- drive bit "bitNum" onto the bus
        if (sclR = '0') and (phase = midPhase) then
          next_bitNum <= bitNum - 1;
          if (bitNum = 0) then
            next_sdaState <= getAck1; -- if that was the last bit, switch states
          else
            next_sdaState <= sendByte1;
          end if;
        else
          next_sdaState <= sendByte1;
          next_bitNum <= bitNum;
        end if;
      when getAck1 =>
        sdaW <= '1'; -- drive high onto bus, so that BT can pull the line low to acknowledge
        next_bitNum <= 7;
        if (sclR = '1') and (phase = midPhase) then
          if (sdaR = '0') then
            next_sdaState <= ack1Passed; -- successful ACK, send another byte
          else
            next_sdaState <= ackFailed; -- no ACK. give up.
          end if;
        else
          next_sdaState <= getAck1;
        end if;
      when ack1Passed => -- wait for middle of SCK low before sending next bit
        sdaW <= '1';
        next_bitNum <= 7;
        if (sclR = '0') and (phase = midPhase) then
          next_sdaState <= sendByte2;
        else
          next_sdaState <= ack1Passed;
        end if;
      when sendByte2 =>
        sdaW <= byteToSend(bitNum); -- drive bit "bitNum" onto the bus
        if (sclR = '0') and (phase = midPhase) then
          next_bitNum <= bitNum - 1;
          if (bitNum = 0) then
            next_sdaState <= getAck2; -- if that was the last bit, switch states
          else
            next_sdaState <= sendByte2;
          end if;
        else
          next_sdaState <= sendByte2;
          next_bitNum <= bitNum;
        end if;
      when getAck2 =>
        sdaW <= '1'; -- drive high onto bus, so that BT can pull the line low to acknowledge
        next_bitNum <= 7;
        if (sclR = '1') and (phase = midPhase) then
          if (sdaR = '0') then
            next_sdaState <= success;
          else
            next_sdaState <= ackFailed;
          end if;
        else
          next_sdaState <= getAck2;
        end if;
      when success => -- signal that we successfully completed a transaction
        sdaW <= '1';
        next_bitNum <= 7;
        if (sclR = '0') and (phase = midPhase) then
          next_sdaState <= stopCond;
        else
          next_sdaState <= success;
        end if;
      when ackFailed => -- signal that we could not complete the transaction
        sdaW <= '1';
        next_bitNum <= 7;
        if (sclR = '0') and (phase = midPhase) then
          next_sdaState <= stopCond;
        else
          next_sdaState <= ackFailed;
        end if;
      when stopCond =>
        sdaW <= '0'; -- pull line down during low SCK.
                     -- returning to idle state during middle of SCK high
                     -- pulls SDA up and generates stop condition.
        next_bitNum <= 7;
        if (sclR = '1') and (phase = midPhase) then
          next_sdaState <= idle;
        else
          next_sdaState <= stopCond;
        end if;
    end case;
  end process;

  sendCommands: process(sendState, sdaState, cmdNum)
  begin
    case sendState is
      when sendCmd => -- tell process above that we wish to send data
        if (sdaState = startCond) then -- has it kicked off yet?
          next_sendState <= waitForReply;
          next_cmdNum <= cmdNum;
        else -- keep asserting our send request
          next_sendState <= sendCmd;
          next_cmdNum <= cmdNum;
        end if;
      when waitForReply =>
        if (sdaState = success) then
          if (cmdNum = lastCmdNum) then
            next_sendState <= done;
            next_cmdNum <= cmdNum;
          else
            next_sendState <= sendCmd;
            next_cmdNum <= cmdNum + 1;
          end if;
        elsif (sdaState = ackFailed) then
          next_sendState <= sendCmd;
          next_cmdNum <= cmdNum;
        else
          next_sendState <= waitForReply;
          next_cmdNum <= cmdNum;
        end if;
      when done => -- all done!
        -- don't ever execute this code again (until next reset)
        next_sendState <= done;
        next_cmdNum <= cmdNum;
    end case;
  end process;

  sendGo <= '1' when (sendState = sendCmd) else '0';

  btProgram: process(sdaState, cmdNum)
  begin
    if (sdaState = sendByte1) then -- BT register addresses
      case cmdNum is
        when 0 => byteToSend <= "00011111"; -- software reset (SRESET)
        when 1 => byteToSend <= "00000001"; -- input format (IFORM)
        when 2 => byteToSend <= "00000010"; -- temporal decimation (TDEC)
        when 3 => byteToSend <= "00000011"; -- MSB cropping (CROP)
        when 4 => byteToSend <= "00000100"; -- vertical delay (VDELAY_LO)
        when 5 => byteToSend <= "00000101"; -- vertical active (VACTIVE_LO)
        when 6 => byteToSend <= "00000110"; -- horizontal delay (HDELAY_LO)
        when 7 => byteToSend <= "00000111"; -- horizontal active (HACTIVE_LO)
        when 8 => byteToSend <= "00001000"; -- horizontal scale hi (HSCALE_HI)
        when 9 => byteToSend <= "00001001"; -- horizontal scale lo (HSCALE_LO)
        when 10 => byteToSend <= "00010011"; -- vertical scale hi (VSCALE_HI)
        when 11 => byteToSend <= "00010100"; -- vertical scale lo (VSCALE_LO)
        when others => byteToSend <= "11110000";
      end case;
    elsif (sdaState = sendByte2) then -- command arguments
      case cmdNum is
        when 0 => byteToSend <= "00000000"; -- don't care
        when 1 => byteToSend <= "01001001"; -- force NTSC mode
        when 2 => byteToSend <= "00000000"; -- no decimation
        when 3 => byteToSend <= "00010000"; -- VD VA HD HA
        when 4 => byteToSend <= "01001000"; -- VDELAY = 136
        when 5 => byteToSend <= "10000000"; -- VACTIVE = 384 (high bit in R3)
        when 6 => byteToSend <= "10010000"; -- HDELAY = 144
        when 7 => byteToSend <= "10000000"; -- HACTIVE = 128 (high bit in R3)
        when 8 => byteToSend <= "00001110"; -- HSCALE HI = 0x0E
        when 9 => byteToSend <= "11101110"; -- HSCALE LO = 0xEE
        when 10 => byteToSend <= "01111111"; -- VSCALE HI = 0x1F
        when 11 => byteToSend <= "00000000"; -- VSCALE LO = 0x00
        when others => byteToSend <= "11110000";
      end case;
    else
      byteToSend <= (others => '0');
    end if;
  end
  process;

  --sclR <= sclW; -- enabled for simulation only

  clockUpdate: process(clk3, sclR_raw)
  begin
    sclR <= sclR_raw; -- synchronize the incoming SCLK signal
    if (clk3'event) and (clk3 = '1') then
      if (reset = '1') then
        phase <= (others => '0');
        sclkState <= low;
        sdaState <= idle;
        bitNum <= 7;
        sendState <= sendCmd;
        cmdNum <= 0;
      else
        phase <= next_phase;
        sclkState <= next_sclkState;
        sdaState <= next_sdaState;
        bitNum <= next_bitNum;
        sendState <= next_sendState;
        cmdNum <= next_cmdNum;
      end if;
    end if;
  end process;
end;

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE work.std_arith.ALL;

package i2c_pack is
  component i2c_drive
    port (
      sclW, sdaW: out std_ulogic;
      sclR_raw, sdaR: in std_ulogic;
      clk3: in std_ulogic;
      reset: in std_ulogic
    );
  end component;
end package i2c_pack;

B.1.2 ntsc.vhd

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE work.std_arith.ALL;

entity ntsc_sram_drive is
  port (
    -- SRAM left side addresses
    --whichRam: buffer std_ulogic;
    --whichRamNot: buffer std_ulogic;
    ntscAdr: buffer std_logic_vector(15 downto 0);
    ce1: out std_ulogic; -- Always tie to '1';
    cs: out std_ulogic;
    rw: out std_ulogic;
    -- Brooktree 829A
    dvalid, active, hreset, vreset, vactive: in std_ulogic;
    field, qclk, clkx1: in std_ulogic;
    oe, rst, i2ccs, pwrdn: out std_ulogic;
    -- misc. inputs
    reset: in std_ulogic
  );
end entity ntsc_sram_drive;

architecture ntsc_arch of ntsc_sram_drive is
  type States is (waiting, loading);
  signal state, nextState: States;
  signal ntscAdrCount: std_logic_vector(13 downto 0);
  signal page: std_logic_vector(1 downto 0);
  signal aboutToLoad: std_ulogic; -- pulses high on wait->load state transition, for Mealy outputs
  signal useless: std_ulogic;
begin
  stateMachine: process (state, nextState, vreset, field, ntscAdrCount)
  begin
    case state is
      when waiting => -- CHECK: is first field odd or even?
        -- (assumed odd)
        if (vReset = '0') and (field = '1') then
          nextState <= loading;
          aboutToLoad <= '1';
        else
          nextState <= waiting;
          aboutToLoad <= '0';
        end if;
      when loading =>
        aboutToLoad <= '0';
        if (ntscAdrCount = "11111111111111") then
          nextState <= waiting;
        else
          nextState <= loading;
        end if;
    end case;
  end process stateMachine;

  cs <= '0' when (dvalid = '1') and (active = '1') and (clkx1 = '0') else '1';
  ntscAdr <= page & ntscAdrCount(12 downto 7) & ntscAdrCount(13) & ntscAdrCount(6 downto 0);
  ce1 <= '1';
  i2ccs <= '0';
  pwrdn <= '0';
  rw <= '0';
  oe <= '0';
  -- remember FIELD should be routed as an address bit!

  clockUpdate: process (clkx1, reset)
  begin
    if (reset = '1') then
      state <= waiting;
      rst <= '0';
      page <= (others => '1');
    elsif (clkx1'event) and (clkx1 = '1') then
      state <= nextState;
      rst <= '1';
      if (aboutToLoad = '1') then
        page <= page + 1;
      end if;
      if (state = waiting) then
        ntscAdrCount <= (others => '0');
      elsif (dvalid = '1') and (active = '1') then
        ntscAdrCount <= ntscAdrCount + 1;
      end if;
    end if;
  end process clockUpdate;
end architecture ntsc_arch;

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE work.std_arith.ALL;

package ntsc_pack is
  component ntsc_sram_drive
    port (
      -- SRAM left side addresses
      ntscAdr: buffer std_logic_vector(15 downto 0);
      ce1: out std_ulogic; -- Always tie to '1';
      cs: out std_ulogic;
      rw: out std_ulogic;
      -- Brooktree 829A
      dvalid, active, hreset, vreset, vactive: in std_ulogic;
      field, qclk, clkx1: in std_ulogic;
      oe, rst, i2ccs, pwrdn: out std_ulogic;
      -- misc.
      -- inputs
      reset: in std_ulogic
    );
  end component;
end package ntsc_pack;

B.1.3 ezw_sram_drive.vhd

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE work.std_arith.ALL;

entity ezw_sram_drive is
  port (
    -- Source of input (EPROM, or CAMERA)
    -- FROM EXTERNAL BUTTON
    source: in std_ulogic;
    -- input from left side SRAM
    page1, page0: in std_ulogic;
    -- EZW driver signals
    ezwClk: out std_ulogic;
    ezwReset: out std_ulogic;
    ezwStart: out std_ulogic;
    -- SRAM RIGHT SIDE and EEPROM control signals
    dctAdr: buffer std_logic_vector(15 downto 0);
    ce1, csSRAM, oeSRAM: out std_ulogic;
    csEPROM, oeEPROM: out std_ulogic;
    ezwClkIn: in std_ulogic;
    clk16: in std_ulogic;
    reset: in std_ulogic
  );
end entity ezw_sram_drive;

architecture ezw_sram_drive_arch of ezw_sram_drive is
  type States is (waiting, loading, startSignal, resetState, delayStart);
  signal currentState, nextState: States;
  signal readPage, decReadPage, minus: std_logic_vector(1 downto 0);
  signal cntAdr: std_logic_vector(14 downto 0);
  signal page0Sync: std_ulogic;
  signal clkCnt: std_logic_vector(6 downto 0);
  signal delayVec: std_logic_vector(8 downto 0);
  signal delayEZWStart: std_logic_vector(2 downto 0);
begin
  ezwClkGenerator: process(clk16)
  begin
    if (clk16'event) and (clk16 = '1') then
      clkCnt <= clkCnt + 1;
    end if;
  end process ezwClkGenerator;

  ezwClk <= clkCnt(6);

  -- source = 1 is to select EPROM
  ce1 <= '1';
  csSRAM <= source;
  oeSRAM <= source;
  csEPROM <= not (source);
  oeEPROM <= not (source);
  readPage <= page1 & page0 when (source = '0') else (others => '0');
  minus <= readPage - decReadPage;

  -- Rearrange output address
  dctAdr <= decReadPage(1 downto 0) & cntAdr(9 downto 5) & cntAdr(13 downto 12)
            & cntAdr(4 downto 0) & cntAdr(11 downto 10);

  FSM: process(currentState, page0Sync, page0, cntAdr, source, delayVec, delayEZWStart)
  begin
    case currentState is
      when resetState =>
        ezwStart <= '0';
        nextState <= delayStart;
      when delayStart =>
        ezwStart <= '0';
        if (delayEZWStart = "100") then
          nextState <= waiting;
        else
          nextState <= delayStart;
        end if;
      when waiting =>
        ezwStart <= '0';
        -- if EPROM, no need to wait for BT
        if (page0Sync = not page0) or (source = '1') then
          nextState <= startSignal;
        end if;
      when startSignal =>
        ezwStart <= '1';
        nextState <= loading;
      when loading =>
        ezwStart <= '0';
        nextState <= loading;
    end case;
  end process FSM;

  clkUpdate: process(ezwClkIn, reset, cntAdr)
  begin
    if (rising_edge(ezwClkIn)) then
      if (reset = '1') then
        currentState <= resetState;
        ezwReset <= '1';
        cntAdr <= (others => '0');
        delayEZWStart <= (others => '0');
        decReadPage <= "10";
      else
        ezwReset <= '0';
        page0Sync <= page0;
        currentState <= nextState;
        if (currentState = loading) then
          if (cntAdr(13 downto 0) = "11111111111111") then
            cntAdr <= (others => '0');
            if (source = '0') then
              if (minus = "01") or ((readPage = "00") and (decReadPage = "11")) then
                decReadPage <= decReadPage;
              else
                decReadPage <= decReadPage + 1;
              end if;
            else
              decReadPage <= not decReadPage;
            end if;
          else
            cntAdr <= cntAdr + 1;
          end if;
        elsif (currentState = waiting) then
          cntAdr <= (others => '0');
          delayVec <= (others => '0');
          delayEZWStart <= (others => '0');
        elsif (currentState = delayStart) then
          delayEZWStart <= delayEZWStart + 1;
        end if;
      end if;
    end if;
  end process clkUpdate;
end architecture ezw_sram_drive_arch;

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE work.std_arith.ALL;

package ezw_pack is
  component ezw_sram_drive
    port (
      -- Source of input (EPROM, or CAMERA)
      -- FROM EXTERNAL BUTTON
      source: in std_ulogic;
      -- input from left side SRAM
      page1, page0: in std_ulogic;
      -- EZW driver signals
      ezwClk: out std_ulogic;
      ezwReset: out std_ulogic;
      ezwStart: out std_ulogic;
      -- SRAM RIGHT SIDE and EEPROM control signals
      dctAdr: buffer std_logic_vector(15 downto 0);
      ce1, csSRAM, oeSRAM: out std_ulogic;
      csEPROM, oeEPROM: out std_ulogic;
      ezwClkIn: in std_ulogic;
      clk16: in std_ulogic;
      reset: in std_ulogic
    );
  end component;
end package ezw_pack;

B.1.4 top_a.vhd

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE work.std_arith.ALL;
USE work.i2c_pack.ALL;
USE work.ntsc_pack.ALL;
USE work.ezw_pack.ALL;

entity
toplev_a is
  port (
    -- Input from switches
    globalReset: in std_ulogic;
    source: in std_ulogic; -- select b/w camera and EPROM
    aux: in std_ulogic;
    -- I2C protocol
    i2c_scl_drive, i2c_sda_drive: out std_ulogic;
    i2c_scl_read, i2c_sda_read: in std_ulogic;
    -- Brooktree 829A
    bt_dvalid, bt_active, bt_hreset, bt_vreset, bt_vactive: in std_ulogic;
    bt_field, bt_qclk, bt_clkx1: in std_ulogic;
    bt_oe, bt_rst, bt_i2ccs, bt_pwrdn: out std_ulogic;
    -- LEFT NTSC SRAM
    bt_adr: buffer std_logic_vector(15 downto 0);
    bt_ce1: out std_ulogic;
    bt_cs: out std_ulogic;
    bt_rw: out std_ulogic;
    -- RIGHT NTSC SRAM
    dctAdr: buffer std_logic_vector(15 downto 0);
    ce1, csSRAM, oeSRAM: out std_ulogic;
    csEPROM, oeEPROM: out std_ulogic;
    -- EZW driver signals
    ezwClk: out std_ulogic;
    ezwReset: out std_ulogic;
    ezwStart: out std_ulogic;
    ezwClkIn: in std_ulogic;
    -- Some clock signals (I2C also uses this.. check?)
    clk3out: out std_ulogic;
    clk3, clk16: in std_ulogic
  );
end entity toplev_a;

architecture toplev_a_arch of toplev_a is
  signal adrBufEZW: std_logic_vector(16 downto 0);
  signal endAdrEZW: std_logic_vector(16 downto 0);
  signal datBufEZW: std_logic_vector(7 downto 0);
  signal csEZW, oeEZW, weEZW: std_ulogic;
  signal adrBufPB: std_logic_vector(16 downto 0);
  signal datBufPB: std_logic_vector(7 downto 0);
  signal csPB, oePB, wePB: std_ulogic;
  signal globalResetSy: std_ulogic;
  signal doneEZW, donePB: std_ulogic;
  signal okToGo: std_ulogic;
  signal clk3Cnt: std_logic_vector(4 downto 0);
begin
  clk3outGen: process(clk16)
  begin
    if (clk16'event) and (clk16 = '1') then
      clk3Cnt <= clk3Cnt + 1;
    end if;
  end process;

  clk3out <= clk3Cnt(4);

  i2cPart: i2c_drive port map (
    sclW => i2c_scl_drive,
    sdaW => i2c_sda_drive,
    sclR_raw => i2c_scl_read,
    sdaR => i2c_sda_read,
    clk3 => clk3,
    reset => globalResetSy
  );

  ntscPart: ntsc_sram_drive port map (
    ntscAdr => bt_adr,
    ce1 => bt_ce1,
    cs => bt_cs,
    rw => bt_rw,
    -- Brooktree 829A
    dvalid => bt_dvalid,
    active => bt_active,
    hreset => bt_hreset,
    vreset => bt_vreset,
    vactive => bt_vactive,
    field => bt_field,
    qclk => bt_qclk,
    clkx1 => bt_clkx1,
    oe =>
      bt_oe,
    rst => bt_rst,
    i2ccs => bt_i2ccs,
    pwrdn => bt_pwrdn,
    -- misc. inputs
    reset => globalResetSy
  );

  ezw_sram_drive_part: ezw_sram_drive port map (
    source => source,
    page1 => bt_adr(15),
    page0 => bt_adr(14),
    ezwClk => ezwClk,
    ezwReset => ezwReset,
    ezwStart => ezwStart,
    dctAdr => dctAdr,
    ce1 => ce1,
    csSRAM => csSRAM,
    oeSRAM => oeSRAM,
    csEPROM => csEPROM,
    oeEPROM => oeEPROM,
    clk16 => clk16,
    ezwClkIn => ezwClkIn,
    reset => globalResetSy
  );

  clkEdge: process(clk16)
  begin
    if (clk16'event) and (clk16 = '1') then
      globalResetSy <= not globalReset;
    end if;
  end process clkEdge;
end architecture toplev_a_arch;

B.2 VHDL Code for CPLD 2

B.2.1 sram_switch.vhd

-- Basically, this is a huge mux
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE work.std_arith.ALL;

entity sram_switch is
  port (
    -- Shakehand signals
    doneEZW, donePB: in std_ulogic;
    okToGo: out std_ulogic;
    -- Output
    adrOutTop, adrOutBut: buffer std_logic_vector(16 downto 0);
    datOutTop, datOutBut: inout std_logic_vector(7 downto 0);
    csOutTop, oeOutTop, weOutTop: out std_ulogic;
    csOutBut, oeOutBut, weOutBut: out std_ulogic;
    -- Input from ezw_out
    adrBufEZW: in std_logic_vector(16 downto 0);
    datBufEZW: in std_logic_vector(7 downto 0);
    csEZW, oeEZW, weEZW: in std_ulogic;
    -- Input from parallelbuf
    adrBufPB: in std_logic_vector(16 downto 0);
    datBufPB: out std_logic_vector(7 downto 0);
    csPB, oePB, wePB: in std_ulogic;
    toggleOut: out std_ulogic;
    reset: in std_ulogic;
    clk16: in std_ulogic
  );
end entity sram_switch;

architecture sram_switch_arch of sram_switch is
  type States is (waitForBothDone, waitForBothClear, resetState);
  signal currentState, nextState: States;
  signal toggle, changeToggle: std_ulogic;
begin
  toggleOut <= toggle;

  FSM: process(currentState, doneEZW, donePB)
  begin
    case currentState is
      when resetState =>
        nextState <= waitForBothDone;
        changeToggle <= '0';
      when waitForBothDone =>
        okToGo <= '0';
        if (doneEZW = '1') and (donePB = '1') then
          nextState <= waitForBothClear;
          changeToggle <= '1';
        else
          nextState <= waitForBothDone;
          changeToggle <= '0';
        end
        if;
      when waitForBothClear =>
        changeToggle <= '0';
        okToGo <= '1';
        if (doneEZW = '0') and (donePB = '0') then
          nextState <= waitForBothDone;
        else
          nextState <= waitForBothClear;
        end if;
    end case;
  end process FSM;

  switch: process(toggle, adrBufEZW, datBufEZW, csEZW, oeEZW, weEZW,
                  adrBufPB, datOutBut, csPB, oePB, wePB, adrBufPB, datOutTop)
  begin
    if (toggle = '0') then
      adrOutTop <= adrBufEZW;
      datOutTop <= datBufEZW;
      csOutTop <= csEZW;
      oeOutTop <= oeEZW;
      weOutTop <= weEZW;
      adrOutBut <= adrBufPB;
      datOutBut <= (others => 'Z');
      datBufPB <= datOutBut;
      csOutBut <= csPB;
      oeOutBut <= oePB;
      weOutBut <= wePB;
    else
      adrOutTop <= adrBufPB;
      datOutTop <= (others => 'Z');
      datBufPB <= datOutTop;
      csOutTop <= csPB;
      oeOutTop <= oePB;
      weOutTop <= wePB;
      adrOutBut <= adrBufEZW;
      datOutBut <= datBufEZW;
      csOutBut <= csEZW;
      oeOutBut <= oeEZW;
      weOutBut <= weEZW;
    end if;
  end process switch;

  clkEdge: process(clk16)
  begin
    if (rising_edge(clk16)) then
      if (reset = '1') then
        currentState <= resetState;
        toggle <= '0';
      else
        currentState <= nextState;
        if (changeToggle = '1') then
          toggle <= not (toggle);
        end if;
      end if;
    end if;
  end process clkEdge;
end architecture sram_switch_arch;

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE work.std_arith.ALL;

package sram_switch_pack is
  component sram_switch
    port (
      -- Shakehand signals
      doneEZW, donePB: in std_ulogic;
      okToGo: out std_ulogic;
      -- Output
      adrOutTop, adrOutBut: buffer std_logic_vector(16 downto 0);
      datOutTop, datOutBut: inout std_logic_vector(7 downto 0);
      csOutTop, oeOutTop, weOutTop: out std_ulogic;
      csOutBut, oeOutBut, weOutBut: out std_ulogic;
      -- Input from ezw_out
      adrBufEZW: in std_logic_vector(16 downto 0);
      datBufEZW: in std_logic_vector(7 downto 0);
      csEZW, oeEZW, weEZW: in std_ulogic;
      -- Input from parallelbuf
      adrBufPB: in std_logic_vector(16 downto 0);
      datBufPB: out std_logic_vector(7 downto 0);
      csPB, oePB, wePB: in std_ulogic;
      toggleOut: out std_ulogic;
      reset: in std_ulogic;
      clk16: in std_ulogic
    );
  end component;
end package sram_switch_pack;

B.2.2 ezw_out.vhd

LIBRARY ieee;
USE
ieee.std_logic_1164.ALL;
USE work.std_arith.ALL;

entity ezw_out is
  port (
    -- Output of ezw
    datBit: in std_ulogic;
    datClk: in std_ulogic;
    grpClk: in std_ulogic;
    -- SRAM Buffer
    adrBuf: buffer std_logic_vector(16 downto 0);
    datBuf: buffer std_logic_vector(7 downto 0);
    cs, oe, we: out std_ulogic;
    -- Shakehand signal
    okToGo: in std_ulogic;
    done: out std_ulogic;
    -- Clk 16MHz because bit rate of ezw is 5MHz max
    clk16: in std_ulogic;
    -- DEBUG
    datBitSyncOut, datClkSyncOut: out std_ulogic;
    reset: in std_ulogic
  );
end entity ezw_out;

architecture ezw_out_arch of ezw_out is
  type States is (resetState, writeLastByte, waitForOkToGo, writeSRAM,
                  waitForOne, waitForZero, waitGrpClkZero, throwAwayGroup);
  signal currentState, nextState: States;
  signal datBitSync, datClkSync, grpClkSync: std_ulogic;
  signal bitCnt: std_logic_vector(3 downto 0);
  signal updateAdrBuf, updateBitCnt: std_ulogic;
begin
  datBitSyncOut <= datBitSync;
  datClkSyncOut <= datClkSync;
  oe <= '1';
  cs <= '0';

  FSM: process(currentState, grpClkSync, datBitSync, datClkSync, bitCnt, okToGo)
  begin
    case currentState is
      when resetState =>
        datBuf <= (others => '0');
        we <= '1';
        done <= '0';
        updateAdrBuf <= '0';
        updateBitCnt <= '0';
        nextState <= waitForOne;
      when waitForOne =>
        updateAdrBuf <= '0';
        we <= '1';
        done <= '0';
        if (grpClkSync = '1') then
          nextState <= writeLastByte;
        elsif (datClkSync = '1') then
          if (bitCnt = "0000") then
            datBuf(7) <= datBitSync;
          elsif (bitCnt = "0001") then
            datBuf(6) <= datBitSync;
          elsif (bitCnt = "0010") then
            datBuf(5) <= datBitSync;
          elsif (bitCnt = "0011") then
            datBuf(4) <= datBitSync;
          elsif (bitCnt = "0100") then
            datBuf(3) <= datBitSync;
          elsif (bitCnt = "0101") then
            datBuf(2) <= datBitSync;
          elsif (bitCnt = "0110") then
            datBuf(1) <= datBitSync;
          elsif (bitCnt = "0111") then
            datBuf(0) <= datBitSync;
          end if;
          if (bitCnt = "0111") then
            updateBitCnt <= '0';
            nextState <= writeSRAM;
          else
            updateBitCnt <= '1';
            nextState <= waitForZero;
          end if;
        else
          updateBitCnt <= '0';
          nextState <= waitForOne;
        end if;
      when
waitForZero =>
        updateAdrBuf <= '0';
        updateBitCnt <= '0';
        we <= '1';
        done <= '0';
        if (datClkSync = '0') then
          nextState <= waitForOne;
        else
          nextState <= waitForZero;
        end if;
      when writeSRAM =>
        we <= '0';
        done <= '0';
        updateBitCnt <= '0';
        updateAdrBuf <= '1';
        if (datClkSync = '1') then
          nextState <= waitForZero;
        else
          nextState <= waitForOne;
        end if;
      when waitGrpClkZero =>
        we <= '1';
        done <= '0';
        if (grpClkSync = '0') then
          nextState <= waitForOkToGo;
        else
          nextState <= waitGrpClkZero;
        end if;
      when writeLastByte =>
        we <= '0';
        done <= '0';
        nextState <= waitGrpClkZero;
      when waitForOkToGo =>
        we <= '1';
        done <= '1';
        datBuf <= (others => '0');
        if (okToGo = '1') and (grpClkSync = '0') then
          nextState <= waitForOne;
        elsif (datClkSync = '1') then
          -- okToGo too slow, throw away until next grpClkSync
          nextState <= throwAwayGroup;
        else
          nextState <= waitForOkToGo;
        end if;
      when throwAwayGroup =>
        we <= '1';
        done <= '1';
        datBuf <= (others => '0');
        if (grpClkSync = '0') then
          nextState <= throwAwayGroup;
        else
          nextState <= waitForOkToGo;
        end if;
    end case;
  end process FSM;

  clkEdge: process(clk16)
  begin
    if (rising_edge(clk16)) then
      if (reset = '1') then
        bitCnt <= (others => '0');
        adrBuf <= (others => '0');
        currentState <= resetState;
      else
        currentState <= nextState;
        datBitSync <= datBit;
        datClkSync <= datClk;
        grpClkSync <= grpClk;
        if (updateAdrBuf = '1') then
          adrBuf <= adrBuf + 1;
        end if;
        if (updateBitCnt = '1') then
          bitCnt <= bitCnt + 1;
        end if;
        if (currentState = waitForOkToGo) then
          bitCnt <= (others => '0');
          adrBuf <= (others => '0');
        elsif (currentState = writeSRAM) then
          bitCnt <= (others => '0');
        end if;
      end if;
    end if;
  end process clkEdge;
end architecture ezw_out_arch;

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE work.std_arith.ALL;

package ezw_out_pack is
  component ezw_out
    port(
      -- Output of ezw
      datBit: in std_ulogic;
      datClk: in std_ulogic;
      grpClk: in std_ulogic;
      -- SRAM Buffer
      adrBuf: buffer std_logic_vector(16 downto 0);
      datBuf: buffer std_logic_vector(7 downto 0);
      cs, oe, we:
out std_ulogic;
      -- Shakehand signal
      okToGo: in std_ulogic;
      done: out std_ulogic;
      -- Clk
      clk16: in std_ulogic;
      -- DEBUG
      datBitSyncOut, datClkSyncOut: out std_ulogic;
      reset: in std_ulogic
    );
  end component;
end package ezw_out_pack;

B.2.3 parallel_sram.vhd

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE work.std_arith.ALL;

entity parallel_sram is
  port(
    reset: in std_ulogic;
    -- Output to sram buffer
    adrBuf: out std_logic_vector(16 downto 0);
    datBuf: in std_logic_vector(7 downto 0);
    cs, oe, we: out std_ulogic;
    -- Parallel port interface pins
    dout: inout std_logic_vector(7 downto 0);
    p1284Mode,nReverseReq,hostAck,hostClk: in std_ulogic;
    nAckReverseReq,dir,nAck,periphAck: out std_ulogic;
    periphClk: buffer std_ulogic;
    -- signal to indicate that entire buffer's been transferred
    okToGo: in std_ulogic;
    done: out std_ulogic;
    clk16: in std_ulogic
  );
end entity parallel_sram;

architecture parallel_sram_arch of parallel_sram is
  type States is (waitForNextDMA,updateAdr,resetState,waitForReq,
                  sendDat,waitForOk,waitForHostAckUp,waitForHostAckDown,
                  doneOneByte,sendDone);
  signal currentState, nextState: States;
  signal hostClkA: std_ulogic;
  signal hostAckA,nReverseReqA: std_ulogic;
  signal doutSync, dat, cmd, datBufTmp: std_logic_vector(7 downto 0);
  signal cntAdr: std_logic_vector(17 downto 0);
  signal blockCnt: std_logic_vector(15 downto 0);
  signal port2PC,port2PCA,pc2Port: std_ulogic;
  signal datOrNCmdOut: std_ulogic; -- 1 if sending data, 0 if sending cmd
  signal hostAckDown,hostAckUp,reverseReqUp: std_ulogic;
  signal periphClkDelay: std_ulogic;
begin
  cs <= '0';
  oe <= '0';
  we <= '1';
  port2PC <= (p1284Mode) and (not nReverseReqA);
  pc2Port <= not port2PC;
  adrBuf <= cntAdr(16 downto 0);
  dir <= port2PCA or port2PC;
  dout <= datBufTmp;
  datBufTmp <= datBuf when (datOrNCmdOut = '1') and
                           ((port2PCA and port2PC) = '1') else
               "01010111" when (datOrNCmdOut = '0') and
                               ((port2PCA and port2PC) = '1') else
               (others => 'Z');
  nAckReverseReq <= pc2Port;
  periphAck <= datOrNCmdOut;

  FSM:
process(currentState,nReverseReqA,hostAckA,datOrNCmdOut,
               okToGo,blockCnt,cntAdr)
  begin
    case currentState is
      when resetState =>
        datOrNCmdOut <= '0';
        done <= '0';
        periphClkDelay <= '1';
        nextState <= waitForReq;
      when waitForReq =>
        datOrNCmdOut <= '0';
        done <= '0';
        periphClkDelay <= '1';
        if (nReverseReqA = '0') and (hostAckA = '0') then
          nextState <= sendDat;
        else
          nextState <= waitForReq;
        end if;
      when waitForNextDMA =>
        if (nReverseReqA = '0') then
          periphClkDelay <= hostAckA;
          nextState <= waitForNextDMA;
        elsif (cntAdr = "100000000000000000") then
          nextState <= waitForOk;
        else
          nextState <= waitForReq;
        end if;
      when sendDat =>
        done <= '0';
        datOrNCmdOut <= '1'; -- sending data
        periphClkDelay <= '0';
        nextState <= waitForHostAckUp;
      when waitForHostAckUp =>
        if (hostAckA = '0') then
          nextState <= waitForHostAckUp;
        else
          nextState <= waitForHostAckDown;
        end if;
      when waitForHostAckDown =>
        periphClkDelay <= '1';
        if (hostAckA = '1') then
          nextState <= waitForHostAckDown;
        else
          nextState <= doneOneByte;
        end if;
      when doneOneByte =>
        -- Or done DMA
        if (blockCnt(15 downto 0) = "1000000000000000") then
          nextState <= waitForNextDMA;
        else
          nextState <= updateAdr;
        end if;
      when updateAdr =>
        nextState <= waitForReq;
      when sendDone =>
        -- Have to send the ending byte
        datOrNCmdOut <= '0';
        periphClkDelay <= '0';
        nextState <= waitForHostAckUp;
      when waitForOk =>
        datOrNCmdOut <= '0';
        done <= '1';
        if (okToGo = '1') then
          nextState <= waitForReq;
        else
          nextState <= waitForOk;
        end if;
    end case;
  end process;

  clkEdge: process(clk16)
  begin
    if (rising_edge(clk16)) then
      hostClkA <= hostClk;
      hostAckA <= hostAck;
      nReverseReqA <= nReverseReq;
      doutSync <= dout;
      port2PCA <= port2PC;
      periphClk <= periphClkDelay;
      nAck <= periphClkDelay;
      if (reset = '1') then
        currentState <= resetState;
        cntAdr <= (others => '0');
        blockCnt <= (others => '0');
      else
        currentState <= nextState;
      end if;
      if (currentState = updateAdr) then
        cntAdr <= cntAdr + 1;
        blockCnt <= blockCnt + 1;
      elsif (currentState = waitForOk) then
        cntAdr <= (others =>
'0');
      elsif (currentState = waitForNextDMA) then
        blockCnt <= (others => '0');
      end if;
    end if;
  end process clkEdge;
end architecture parallel_sram_arch;

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE work.std_arith.ALL;

package parallel_sram_pack is
  component parallel_sram
    port(
      reset: in std_ulogic;
      -- Output to sram buffer
      adrBuf: out std_logic_vector(16 downto 0);
      datBuf: in std_logic_vector(7 downto 0);
      cs, oe, we: out std_ulogic;
      -- Parallel port interface pins
      dout: inout std_logic_vector(7 downto 0);
      p1284Mode,nReverseReq,hostAck,hostClk: in std_ulogic;
      nAckReverseReq,dir,nAck,periphAck: out std_ulogic;
      periphClk: buffer std_ulogic;
      -- signal to indicate that entire buffer's been transferred
      okToGo: in std_ulogic;
      done: out std_ulogic;
      clk16: in std_ulogic
    );
  end component;
end package parallel_sram_pack;

B.2.4 top_b.vhd

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE work.std_arith.ALL;
USE work.ezw_out_pack.ALL;
USE work.sram_switch_pack.ALL;
USE work.parallel_sram_pack.ALL;

entity toplev_b is
  port (
    -- Input from switches
    globalReset: in std_ulogic;
    aux: in std_ulogic;
    -- EZW output signals
    ezwDatBit: in std_ulogic;
    ezwDatClk: in std_ulogic;
    ezwGrpClk: in std_ulogic;
    -- EZW SRAM buffer signal
    adrOutTop, adrOutBut: buffer std_logic_vector(16 downto 0);
    datOutTop, datOutBut: inout std_logic_vector(7 downto 0);
    csOutTop,oeOutTop,weOutTop: out std_ulogic;
    csOutBut,oeOutBut,weOutBut: out std_ulogic;
    -- Parallel port interface pins
    dout: inout std_logic_vector(7 downto 0);
    p1284Mode,nReverseReq,hostAck,hostClk: in std_ulogic;
    nAckReverseReq,dir,nAck,periphAck: out std_ulogic;
    periphClk: buffer std_ulogic;
    -- Some clock signals(I2C also uses this.. check?)
    clk3,clk16: in std_ulogic;
    -- debug port
    debug0, debug1, debug2: out std_ulogic;
    oldDatOutBut7,oldDatOutBut4: in std_ulogic
  );
end entity toplev_b;

architecture toplev_b_arch of toplev_b is
  signal tmp1, tmp2: std_ulogic;
  signal adrBufEZW: std_logic_vector(16 downto 0);
  signal datBufEZW: std_logic_vector(7 downto 0);
  signal doutTmp: std_logic_vector(7 downto 0);
  signal csEZW, oeEZW, weEZW: std_ulogic;
  signal adrBufPB: std_logic_vector(16 downto 0);
  signal datBufPB: std_logic_vector(7 downto 0);
  signal csPB, oePB, wePB: std_ulogic;
  signal globalResetSync: std_ulogic;
  signal doneEZW, donePB: std_ulogic;
  signal okToGo: std_ulogic;
  signal toggleOut: std_ulogic;
  signal tmp: std_ulogic;
  signal datBitSyncOut, datClkSyncOut: std_ulogic;
begin
  tmp1 <= oldDatOutBut7;
  tmp2 <= oldDatOutBut4;
  debug0 <= doneEZW;
  debug1 <= datClkSyncOut;
  debug2 <= datBitSyncOut;

  ezw_out_part: ezw_out port map(
    done => doneEZW,
    okToGo => okToGo,
    -- Output of ezw
    datBit => ezwDatBit,
    datClk => ezwDatClk,
    grpClk => ezwGrpClk,
    -- SRAM Buffer
    adrBuf => adrBufEZW,
    datBuf => datBufEZW,
    cs => csEZW,
    oe => oeEZW,
    we => weEZW,
    -- Clk
    clk16 => clk16,
    datClkSyncOut => datClkSyncOut,
    datBitSyncOut => datBitSyncOut,
    reset => globalResetSync
  );

  parallel_sram_part: parallel_sram port map(
    reset => globalResetSync,
    -- Input from ezwout
    okToGo => okToGo,
    -- Output to sram buffer
    adrBuf => adrBufPB,
    datBuf => datBufPB,
    cs => csPB,
    oe => oePB,
    we => wePB,
    -- Parallel port interface pins
    dout => dout,
    p1284Mode => p1284Mode,
    nReverseReq => nReverseReq,
    hostAck => hostAck,
    hostClk => hostClk,
    nAckReverseReq => nAckReverseReq,
    dir => dir,
    nAck => nAck,
    periphAck => periphAck,
    periphClk => periphClk,
    -- signal to indicate that entire buffer's been transferred
    done => donePB,
    clk16 => clk16
  );

  sram_switch_part: sram_switch port map(
    donePB => donePB,
    doneEZW => doneEZW,
    okToGo => okToGo,
    -- Output
    adrOutTop => adrOutTop,
    adrOutBut => adrOutBut,
    datOutTop => datOutTop,
    datOutBut => datOutBut,
    csOutTop => csOutTop,
    oeOutTop => oeOutTop,
    weOutTop => weOutTop,
    csOutBut => csOutBut,
    oeOutBut => oeOutBut,
    weOutBut => weOutBut,
    -- Input from ezwout
    adrBufEZW => adrBufEZW,
    datBufEZW => datBufEZW,
    csEZW => csEZW,
    oeEZW => oeEZW,
    weEZW => weEZW,
    -- Input from parallelbuf
    adrBufPB => adrBufPB,
    csPB => csPB,
    oePB => oePB,
    wePB => wePB,
    -- Output to parallelbuf
    datBufPB => datBufPB,
    toggleOut => toggleOut,
    reset => globalResetSync,
    clk16 => clk16
  );

  clkUpdate: process(clk16)
  begin
    if (rising_edge(clk16)) then
      globalResetSync <= not globalReset;
    end if;
  end process clkUpdate;
end architecture toplev_b_arch;

Appendix C

Decoder Code

C.1 Decoder Code in C

The files included here are from Microsoft Visual C++ 6.0. Note that some standard files are not included.

C.1.1 StdAfx.h

// stdafx.h : include file for standard system include files,
// or project specific include files that are used frequently, but
// are changed infrequently
//

#if !defined(AFX_STDAFX_H__A9DB83DB_A9FD_11D0_BFD1_444553540000__INCLUDED_)
#define AFX_STDAFX_H__A9DB83DB_A9FD_11D0_BFD1_444553540000__INCLUDED_

#if _MSC_VER > 1000
#pragma once
#endif // _MSC_VER > 1000

#define WIN32_LEAN_AND_MEAN // Exclude rarely-used stuff from Windows headers

// Windows Header Files:
#include <windows.h>

// C RunTime Header Files
#include <stdlib.h>
#include <malloc.h>
#include <memory.h>
#include <tchar.h>

// Local Header Files

// TODO: reference additional headers your program requires here

//{{AFX_INSERT_LOCATION}}
// Microsoft Visual C++ will insert additional declarations
// immediately before the previous line.
#endif // !defined(AFX_STDAFX_H__A9DB83DB_A9FD_11D0_BFD1_444553540000__INCLUDED_)

C.1.2 StdAfx.cpp

// stdafx.cpp : source file that includes just the standard includes
// vdodma3.pch will be the pre-compiled header
// stdafx.obj will contain the pre-compiled type information

#include "stdafx.h"

// TODO: reference any additional headers you need in STDAFX.H
// and not in this file

C.1.3 vdodma3.h

#if !defined(AFX_VDODMA3_H__0D3F1D66_43AE_11D3_B30A_9D05D47BE08A__INCLUDED_)
#define AFX_VDODMA3_H__0D3F1D66_43AE_11D3_B30A_9D05D47BE08A__INCLUDED_

#if _MSC_VER > 1000
#pragma once
#endif // _MSC_VER > 1000

#include "resource.h"

#endif // !defined(AFX_VDODMA3_H__0D3F1D66_43AE_11D3_B30A_9D05D47BE08A__INCLUDED_)

C.1.4 vdodma3.cpp

// vdodma3.cpp : Defines the entry point for the application.
//

#include "stdafx.h"
#include "resource.h"
#include "WinRTctl.h"
#include "ioaccess.h"
#include <winioctl.h>
#include <ddraw.h>
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <conio.h>

#define NAME "EZW DEMO WITH DMA"
#define TITLE "EZW DEMO WITH DMA"
#define IMAGEWIDTH 256
#define FILESIZE 16384
#define FILESIZE0 65536
#define HEADERFILESIZE 15
#define TIMERID 1
#define TIMERRATE 33
#define BITS_IN_A_FRAME 131072
#define MAX_LOADSTRING 100

//***** DMA STUFF **********//
//Parallel port stuff
#define BLOCKCNT 3 // Or 4 blocks of DMA transfers
#define DMASIZE 32768 //16384

#define EcpAFifo 0x0378;
#define LptDsr 0x0379;
// bit 7 = inverted version of Busy
// bit 6 = nAck
// bit 5 = PError
// bit 4 = Select
// bit 3 = nFault
#define PeriphRequestMask 0x008;
#define LptDcr 0x037A;
// bit 5 = direction. 0 = out, 1 = in
// bit 4 = ackIntEn. 1 enables an interrupt on rising edge of nAck
// bit 3 = inverted nSelectIn
// bit 2 = nInit
// bit 1 = inverted nAutoFd
// bit 0 = inverted nStrobe
#define EcpDFifo 0x0778;
#define LptCnfgA 0x0778;
// bit 7: 1 = interrupts are level, 0 = interrupts are pulses
// bit 6-4: P-word size:
//   0x00 PWord = 2 bytes
//   0x01 PWord = 1 byte
//   0x02 PWord = 4 bytes
// bit 3: reserved
// bit 2: nByteInTransceiver (for recovery)
// bit 1-0: fractional Pword count (for recovery)
#define LptCnfgB 0x0779;
// bit 7: 1 = compression enabled
// bit 6: value of ISA iReq line (read only)
// bit 5-3: selects IRQ:
//   111 = 5, 110 = 15, 101 = 14, 100 = 11,
//   011 = 10, 010 = 9, 001 = 7 (default), 000 = jumpered
// bits 2-0: selects DMA channel:
//   111 = 7, 110 = 6, 101 = 5 (16-bit default), 100 = jumpered 16-bit,
//   011 = 3, 010 = 2, 001 = 1, 000 = jumpered 8-bit
#define LptEcr 0x077A;
// bits 7:5 = mode
//   000: standard parallel port mode
//   001: PS/2 parallel port mode (direction tri-states data lines)
//   010: parallel port FIFO mode (direction = 0 only)
//   011: ECP mode.
//   100: undefined
//   101: undefined
//   110: test mode. data not sent to port
//   111: configuration mode: cnfga and cnfgb regs are accessible
// bit 4: 0 enables interrupt pulse on falling edge of nFault
//        1 disables interrupts
// bit 3: dmaEn: 0 disables DMA, 1 enables DMA (when serviceIntr = 0)
// bit 2: serviceIntr: 1 disables DMA and service interrupts
//        0 enables service interrupts (which set serviceIntr to 1)
//        If dmaEn = 1, int when terminal count is reached
//        If dmaEn = 0, FIFO service int.
// bit 1: FIFO full
// bit 0: FIFO empty

static char cCmd;
static char cStat=0;
static char cTemp;
static char cBit0,cBit1,cBit2,cBit3,cBit4,cBit5,cBit6,cBit7;
static char LowAdd,MidAdd,HighAdd;
static char DmaBuf[DMASIZE];
static char * LookHere = DmaBuf;
static char BlockBuf[64]; //temp

//WinRT variables
static HANDLE hWinRT;
static DWORD iWinRTlength; // length of returned buffers
// DMA buffer information returned from the driver
static WINRTDMABUFFERINFORMATION DmaInformation;
static ULONG Length, DmaLength; // length of buffers from API calls
#define DAS16DMAKNOWNBUFFERFILLER 'X' // known information to used to fill the buffer
//static USHORT NumberOfBytes = 16384; //size of each actual DMA transfer
//static USHORT NumberOfBytes = 64;
#define NUMBEROFBYTES 32768
static USHORT NumberOfBytes = NUMBEROFBYTES;

static TCHAR szErrorMsg[128];
static TCHAR szErrorTitle[128];
static int j = 0;
static int block = 0;
//***** END OF DMA STUFF ***********//

//***** Display DDRAW STUFF ********//
static BOOL pause = FALSE;
//***** END OF Display DDRAW STUFF *******//

//***** EZW and DECODE Constants *******//
#define NUMGRP 1 // or 1 of 16 frames
#define GRPSIZE 16
int NUMFRM = 1; //Number of frames to be displayed; 1 is 1(< than 16)
int EZWPASSES = 6; // default value
#define BITBUFSIZE NUMGRP * NUMBEROFBYTES * (BLOCKCNT+1) * 8 + 256

#define valuebits 11
#define top_value ((1 << valuebits) - 1) //2^11 - 1 = 2047
#define first_qtr ((top_value >> 2) + 1) //2047 >> 2 = 511 + 1 = 512
#define half (first_qtr << 1) //512 << 1 = 1024
#define third_qtr (first_qtr + half) //512 + 1024 = 1536
#define max_freq 255
#define tbls 3
#define root_sym 0
#define pos_sym 1
#define neg_sym 2
#define zero_sym 3

char filename_in[] = "C:\\Users\\charatc\\VC++\\RF";
char filename_out[] = "C:\\Users\\charatc\\VC++\\RF";
char filename_org[] = "C:\\Users\\charatc\\VC++\\org.000";
char filename_rpt[] = "C:\\Users\\charatc\\VC++\\diff.rpt";
//****** END of EZW and DECODE Constants ********//

static LPDIRECTDRAW lpDD; // DirectDraw object
static LPDIRECTDRAWSURFACE lpDDSPrimary; // DirectDraw primary surface
static LPDIRECTDRAWSURFACE lpDDSBack; // DirectDraw back surface
BOOL bActive; // is application active?
static HCURSOR hArrowCursor, hWaitCursor; // Mouse

// buttons to choose ezw passes
static HWND hBtnP2;
static HWND hBtnP3;
static HWND hBtnP4;
static HWND hBtnP5;
static HWND hBtnP6;

HINSTANCE hInst; // current instant
TCHAR szWindowClass[MAX_LOADSTRING];
TCHAR szTitle[MAX_LOADSTRING];

//static DDSURFACEDESC ddrval;
//HRESULT ddscaps;
//DDSCAPS ddsd;

short bitBuf[BITBUFSIZE];
static unsigned long int bitBufIndex, totalBits;
static long int ratio;
static int frame = 0;
static int framedisplay = 0;
static int BitsPerPixel = 8;
static int BitsPerPixel0 = 32;
static HBITMAP hbm;
static char ImageData[20][FILESIZE0];
static char ImageData0[FILESIZE];
char *framePtr;
char *erStr;
HDC hdcImage = NULL;
static char pHBuffer[HEADERFILESIZE];
char rptBuf[NUMGRP][256];

//********* EZW Routines ***********//
int imagecols, imagerows, sblvls, top_bit, image_mean;
int group_size, groups, bottom_bit, alu_bits, sign_bitsmask;
int zero_poss, ezw_passes;
int filter_args[5] = {0, 0, 0, 0, 0};
int id = 0;
int freq[tbls][2];
int places[tbls];
int code[tbls];

void check_precision(int val) {
  if (val < 0) {
    if (~val & sign_bitsmask)
      wsprintf(erStr, "ERROR = %d precision overflow val = %d\n", id, val);
  }
  else if (val & sign_bitsmask)
    wsprintf(erStr, "ERROR = %d precision overflow val = %d\n", id, val);
}

int fadd(int a,int b) {
  int ans = a + b;
  id = id * 10;
  check_precision(ans);
  id = id / 10;
  return ans;
}

int fsub(int a,int b) {
  int ans = a + 1 + (~b);
  id = id * 100;
  check_precision(ans);
  id = id / 100;
  return ans;
}

int faddh(int a,int b) {
  int ans = (a >> 1) + (b >> 1) + (a & b & 1);
  id = id * 10;
  check_precision(ans);
  id = id / 10;
  return ans;
}

int fsubh(int a,int b) {
  int ans = (a >> 1) + 1 + (~b >> 1);
  id = id * 100;
  check_precision(ans);
  id = id / 100;
  return ans;
}

/* 5 x 5
*/
double dec_lo_pass5[] = {-0.0761, 0.3536, 0.8593, 0.3536, -0.0761};
double dec_hi_pass5[] = {-0.0761, -0.3536, 0.8593, -0.3536, -0.0761};

int enc_lo5() {
  int a;
  a = faddh(filter_args[0], filter_args[4]);
  a = fsub(a >> 3, filter_args[2]);
  a = fsub(a >> 1, a);
  a = fadd(a, filter_args[1]);
  a = faddh(a, filter_args[3]);
  a = faddh(a, filter_args[2]);
  return a;
}

int enc_hi5() {
  int a;
  a = faddh(filter_args[0], filter_args[4]);
  a = fsubh(a >> 3, filter_args[2]);
  a = fsub(a, a);
  a = fsubh(a, filter_args[1]);
  a = fsub(a, filter_args[3] >> 1);
  a = faddh(a, filter_args[2]);
  return a;
}

int dec_filterlen = 5;
double *dec_lo_pass = dec_lo_pass5;
double *dec_hi_pass = dec_hi_pass5;

typedef int (*MYPROC)();
int enc_filterlen = 5;
int encrenorm[] = {3, 2, 1, 0};
MYPROC enc_lo_pass = enc_lo5;
MYPROC enc_hi_pass = enc_hi5;

char *strappend(char *str1,char *str2) {
  char *result;
  result = (char *) calloc((strlen(str1) + strlen(str2) + 1), sizeof(char));
  strcpy(result, str1);
  strcat(result, str2);
  return result;
}

char *makefrm_name(char *name, int frnum) {
  char *ans,*ret;
  ans = (char *) calloc(5, sizeof(char));
  if (frnum < 10) wsprintf(ans, ".00%d", frnum);
  else if (frnum < 100) wsprintf(ans, ".0%d", frnum);
  else wsprintf(ans, ".%d", frnum);
  ret = strappend(name,ans);
  free(ans);
  return ret;
}

int **make2dintarray(int d1, int d2) {
  int **array;
  int index;
  array = (int **) calloc(d1, sizeof(int *));
  for (index = 0; index < d1; index++)
    array[index] = (int *) calloc(d2,sizeof(int));
  return array;
}

void destroy2dintarray(int **ptr, int d1, int d2) {
  int i;
  for (i=0;i<d1;i++) free(ptr[i]);
  free(ptr);
}

double **make2ddblarray(int d1, int d2) {
  double **array;
  int index;
  array = (double **) calloc(d1, sizeof(double *));
  for (index = 0; index < d1; index++)
    array[index] = (double *) calloc(d2,sizeof(double));
  return array;
}

void destroy2ddblarray(double **ptr, int d1, int d2) {
  int i;
  for (i=0;i<d1;i++) free(ptr[i]);
  free(ptr);
}

int ***make3dintarray(int d1, int d2, int d3) {
  int
***array;
  int index;
  array = (int ***) calloc(d1, sizeof(int **));
  for (index = 0; index < d1; index++)
    array[index] = make2dintarray(d2,d3);
  return array;
}

void destroy3dintarray(int ***ptr, int d1, int d2, int d3) {
  int i;
  for (i=0;i<d1;i++) destroy2dintarray(ptr[i],d2,d3);
  free(ptr);
}

double ***make3ddblarray(int d1, int d2, int d3) {
  double ***array;
  int index;
  array = (double ***) calloc(d1, sizeof(double **));
  for (index = 0; index < d1; index++)
    array[index] = make2ddblarray(d2,d3);
  return array;
}

void destroy3ddblarray(double ***ptr, int d1, int d2, int d3) {
  int i;
  for (i=0;i<d1;i++) destroy2ddblarray(ptr[i],d2,d3);
  free(ptr);
}

int mag(int arg) {
  if (arg < 0) arg = 0 - arg;
  return arg;
}

int reflect(int arg,int bound) {
  arg = mag(arg);
  if (arg >= bound) arg = (2 * (bound - 1)) - arg;
  return arg;
}

void check_p_sgn(int val, int p, int sgn) {
  if (sgn) {
    if (~val & (0 - (1 << (p - 1))))
      wsprintf(erStr,"ERROR = precision overflow val = %x\n", val);
  }
  else if (val & (0 - (1 << (p - 1))))
    wsprintf(erStr,"ERROR = precision overflow val = %x\n", val);
}

void update_model(int sym,int table) {
  int cums0, cums1, total;
  freq[table][sym]++;
  places[table] = freq[table][1] > freq[table][0];
  check_p_sgn(freq[table][0], 9, 0);
  check_p_sgn(freq[table][1], 9, 0);
  total = freq[table][0] + freq[table][1];
  check_p_sgn(total, 9, 0);
  for (cums0 = valuebits - 4; ((1 << cums0) & total) == 0; cums0--);
  for (cums1 = valuebits - 4;
       ((1 << cums1) & freq[table][1 - places[table]]) == 0; cums1--);
  check_p_sgn(cums0, 4, 0);
  check_p_sgn(cums1, 4, 0);
  code[table] = cums0 - cums1;
  if (code[table] < 1) code[table] = 1;
  check_p_sgn(code[table], 4, 0);
  if (total == max_freq) {
    freq[table][0] = (freq[table][0] >> 1) | 1;
    freq[table][1] = (freq[table][1] >> 1) | 1;
  }
}

void initarith_model() {
  int j;
  for (j = 0; j < tbls; j++) {
    code[j] = 1;
    places[j] = 0;
    freq[j][0] = 1;
    freq[j][1] = 1;
  }
}

void flag_dscndnts(int col,int row,int lvl,int **part_of_tree) {
  if (lvl > 0) {
    part_of_tree[col][row] = 1;
    flag_dscndnts((col << 1)    ,(row << 1)    ,lvl - 1,part_of_tree);
    flag_dscndnts((col << 1) + 1,(row << 1)    ,lvl - 1,part_of_tree);
    flag_dscndnts((col << 1)    ,(row << 1) + 1,lvl - 1,part_of_tree);
    flag_dscndnts((col << 1) + 1,(row << 1) + 1,lvl - 1,part_of_tree);
  }
}
//************* END of EZW Routines ************//

//************* DECODE Routines ***************//
FILE *inFile,*outFile;
double ***image_syn;
int low, high;
int value;
int **found, **mags, **signs, **prev_frame;

int even(int arg) {
  return 1 - (arg & 1);
}

void synthesize_image(int lvl) {
  int row, col, coef, index, cols, rows;
  cols = imagecols >> (lvl - 1);
  rows = imagerows >> (lvl - 1);
  for (row = 0; row < rows; row++)
    for (col = 0; col < cols; col++) {
      image_syn[col][row][1] = 0;
      for (coef = 0; coef < dec_filterlen; coef++) {
        index = reflect(row + (dec_filterlen / 2) - coef, rows);
        if (even(index))
          image_syn[col][row][1] += dec_lo_pass[coef] *
            image_syn[col][index / 2][0];
        index = reflect(row + (dec_filterlen / 2) - coef - 1, rows);
        if (even(index))
          image_syn[col][row][1] += dec_hi_pass[coef] *
            image_syn[col][(index / 2) + (rows / 2)][0];
      }
    }
  for (row = 0; row < rows; row++)
    for (col = 0; col < cols; col++) {
      image_syn[col][row][0] = 0;
      for (coef = 0; coef < dec_filterlen; coef++) {
        index = reflect(col + (dec_filterlen / 2) - coef, cols);
        if (even(index))
          image_syn[col][row][0] += dec_lo_pass[coef] *
            image_syn[index / 2][row][1];
        index = reflect(col + (dec_filterlen / 2) - coef - 1, cols);
        if (even(index))
          image_syn[col][row][0] += dec_hi_pass[coef] *
            image_syn[(index / 2) + (cols / 2)][row][1];
      }
    }
}

int input_bit() {
  short next_bit = bitBuf[bitBufIndex];
  bitBufIndex++;
  if (next_bit == 1) return 1;
  else return 0;
}

void initarith_decode() {
  int i;
  value = 0;
  for (i = 0; i < valuebits; i++)
    value = (value << 1) + input_bit();
}

int decode_sym(int table) {
  int range = high - low + 1;
  int neg_size = (- range) >> code[table];
  int trunc = ((- range) & ((1 << code[table]) - 1)) > 0;
  int place = (1 << (8 - code[table])) >
              ((((value - low + !trunc) << 8) - 1) / range);
  int sym = places[table] ^ place;
  if (place) high = low + ~neg_size;
  else low = low + ~neg_size + 1;
  for (; 1;) {
    if (high < half)
      ;
    else if (low >= half) {
      value -= half;
      low -= half;
      high -= half;
    }
    else if ((low >= first_qtr) && (high < third_qtr)) {
      value -= first_qtr;
      low -= first_qtr;
      high -= first_qtr;
    }
    else break;
    low = 2 * low;
    high = (2 * high) + 1;
    value = (2 * value) + input_bit();
  }
  if (table) update_model(sym, table);
  return sym;
}

int decode_dom_sym() {
  int sym1 = 0;
  if (zero_poss) sym1 = decode_sym(1);
  if (sym1) return zero_sym;
  if (decode_sym(2))
    if (decode_sym(0)) return neg_sym;
    else return pos_sym;
  return root_sym;
}

int decode_sub_bit() {
  return decode_sym(0);
}

void decode_pass(int bit) {
  int row, col, sym, x, y, rowy, colx;
  int **part_of_tree = make2dintarray(imagecols,imagerows);
  int rows = imagerows >> sblvls;
  int cols = imagecols >> sblvls;
  int sub_bit = bit >> 1;

  initarith_model();
  zero_poss = 1;
  for (row = 0; row < rows; row++)
    for (col = 0; col < cols; col++) {
      if (found[col][row] == 0) {
        sym = decode_dom_sym();
        if (sym == pos_sym)
          {found[col][row] = 1; mags[col][row] = bit; }
        else if (sym == neg_sym)
          {found[col][row] = 1; signs[col][row] = 1; mags[col][row] = bit; }
        else if (sym == root_sym) {
          flag_dscndnts(col + cols,row       ,sblvls,part_of_tree);
          flag_dscndnts(col       ,row + rows,sblvls,part_of_tree);
          flag_dscndnts(col + cols,row + rows,sblvls,part_of_tree);
        }
      }
      if (sub_bit && found[col][row])
        if (decode_sub_bit())
          mags[col][row] = mags[col][row] | sub_bit;
    }

  initarith_model();
  if (bit == (1 << (alu_bits - 2 - encrenorm[2] - bottom_bit)))
    zero_poss = 0;
  for (row = 0; row < rows; row++) {
    for (col = 0; col < cols; col++) {
      if ((found[col + cols][row] == 0) &&
          (part_of_tree[col + cols][row] == 0)) {
        sym = decode_dom_sym();
        if (sym == pos_sym)
          {found[col + cols][row] = 1; mags[col + cols][row] = bit; }
        else if (sym == neg_sym)
          {found[col + cols][row] = 1; signs[col + cols][row] = 1;
           mags[col + cols][row] = bit; }
        else if (sym == root_sym)
          flag_dscndnts(col + cols,row,sblvls,part_of_tree);
      }
      if (sub_bit && found[col + cols][row])
        if (decode_sub_bit())
          mags[col + cols][row] = mags[col + cols][row] | sub_bit;
    }
    for (col = 0; col < cols; col++) {
      if ((found[col][row + rows] == 0) &&
          (part_of_tree[col][row + rows] == 0)) {
        sym = decode_dom_sym();
        if (sym == pos_sym)
          {found[col][row + rows] = 1; mags[col][row + rows] = bit; }
        else if (sym == neg_sym)
          {found[col][row + rows] = 1; signs[col][row + rows] = 1;
           mags[col][row + rows] = bit; }
        else if (sym == root_sym)
          flag_dscndnts(col,row + rows,sblvls,part_of_tree);
      }
      if (sub_bit && found[col][row + rows])
        if (decode_sub_bit())
          mags[col][row + rows] = mags[col][row + rows] | sub_bit;
      if ((found[col + cols][row + rows] == 0) &&
          (part_of_tree[col + cols][row + rows] == 0)) {
        sym = decode_dom_sym();
        if (sym == pos_sym)
          {found[col + cols][row + rows] = 1;
           mags[col + cols][row + rows] = bit; }
        else if (sym == neg_sym)
          {found[col + cols][row + rows] = 1;
           signs[col + cols][row + rows] = 1;
           mags[col + cols][row + rows] = bit; }
        else if (sym == root_sym)
          flag_dscndnts(col + cols,row + rows,sblvls,part_of_tree);
      }
      if (sub_bit && found[col + cols][row + rows])
        if (decode_sub_bit())
          mags[col + cols][row + rows] =
            mags[col + cols][row + rows] | sub_bit;
    }
  }

  initarith_model();
  if (bit == (1 << (alu_bits - 2 - encrenorm[1] - bottom_bit)))
    zero_poss = 0;
  if (bit < (1 << (alu_bits - 1 - encrenorm[1] - bottom_bit))) {
    int rows = imagerows >> 2;
    int cols = imagecols >> 2;
    for (row = 0; row < rows; row++)
      for (col = cols; col < (cols << 1); col++) {
        if ((found[col][row] == 0) && (part_of_tree[col][row] == 0)) {
          sym = decode_dom_sym();
          if (sym == pos_sym)
            {found[col][row] = 1; mags[col][row] = bit; }
          else if (sym == neg_sym)
            {found[col][row] = 1; signs[col][row] = 1;
             mags[col][row] = bit; }
          else if (sym == root_sym)
            flag_dscndnts(col,row,2,part_of_tree);
        }
        if (sub_bit && found[col][row])
          if (decode_sub_bit())
            mags[col][row] = mags[col][row] | sub_bit;
      }
    for (row = rows; row < (rows << 1); row++)
      for (col = 0; col < cols; col++) {
        if ((found[col][row] == 0) && (part_of_tree[col][row] == 0)) {
          sym = decode_dom_sym();
          if (sym == pos_sym)
            {found[col][row] = 1; mags[col][row] = bit; }
          else if (sym == neg_sym)
            {found[col][row] = 1; signs[col][row] = 1;
             mags[col][row] = bit; }
          else if (sym == root_sym)
            flag_dscndnts(col,row,2,part_of_tree);
        }
        if (sub_bit && found[col][row])
          if (decode_sub_bit())
            mags[col][row] = mags[col][row] | sub_bit;
      }
    for (row = rows; row < (rows << 1); row++)
      for (col = cols; col < (cols << 1); col++) {
        if ((found[col][row] == 0) && (part_of_tree[col][row] == 0)) {
          sym = decode_dom_sym();
          if (sym == pos_sym)
            {found[col][row] = 1; mags[col][row] = bit; }
          else if (sym == neg_sym)
            {found[col][row] = 1; signs[col][row] = 1;
             mags[col][row] = bit; }
          else if (sym == root_sym)
            flag_dscndnts(col,row,2,part_of_tree);
        }
        if (sub_bit && found[col][row])
          if (decode_sub_bit())
            mags[col][row] = mags[col][row] | sub_bit;
      }
  }

  initarith_model();
  zero_poss = 0;
  if (bit < (1 << (alu_bits - 1 - encrenorm[0] - bottom_bit))) {
    int rows = imagerows >> 1;
    int cols = imagecols >> 1;
    for (y = 0; y < 2; y++)
      for (x = 0; x < 2; x++)
        for (rowy = 0; rowy < rows; rowy += 2)
          for (colx = cols; colx < (cols << 1); colx += 2) {
            row = rowy + y;
            col = colx + x;
            if ((found[col][row] == 0) && (part_of_tree[col][row] == 0)) {
              sym = decode_dom_sym();
              if (sym == pos_sym)
                {found[col][row] = 1; mags[col][row] = bit; }
              else if (sym == neg_sym)
                {found[col][row] = 1; signs[col][row] = 1;
                 mags[col][row] = bit; }
              else if (sym == root_sym)
                flag_dscndnts(col,row,1,part_of_tree);
            }
            if (sub_bit && found[col][row])
              if (decode_sub_bit())
                mags[col][row] = mags[col][row] | sub_bit;
          }
    for (y = 0; y < 2; y++)
      for (x = 0; x < 2; x++)
        for (rowy = rows; rowy < (rows << 1); rowy += 2)
          for (colx = 0; colx < cols; colx += 2) {
            row = rowy + y;
            col = colx + x;
            if ((found[col][row] == 0) && (part_of_tree[col][row] == 0)) {
              sym = decode_dom_sym();
              if (sym == pos_sym)
                {found[col][row] = 1; mags[col][row] = bit; }
              else if (sym == neg_sym)
                {found[col][row] = 1; signs[col][row] = 1;
                 mags[col][row] = bit; }
              else if (sym == root_sym)
                flag_dscndnts(col,row,1,part_of_tree);
            }
            if (sub_bit && found[col][row])
              if (decode_sub_bit())
                mags[col][row] = mags[col][row] | sub_bit;
          }
    for (y = 0; y < 2; y++)
      for (x = 0; x < 2; x++)
        for (rowy = rows; rowy < (rows << 1); rowy += 2)
          for (colx = cols; colx < (cols << 1); colx += 2) {
            row = rowy + y;
            col = colx + x;
            if ((found[col][row] == 0) && (part_of_tree[col][row] == 0)) {
              sym = decode_dom_sym();
              if (sym == pos_sym)
                {found[col][row] = 1; mags[col][row] = bit; }
              else if (sym == neg_sym)
                {found[col][row] = 1; signs[col][row] = 1;
                 mags[col][row] = bit; }
              else if (sym == root_sym)
                flag_dscndnts(col,row,1,part_of_tree);
            }
            if (sub_bit && found[col][row])
              if (decode_sub_bit())
                mags[col][row] = mags[col][row] | sub_bit;
          }
  }
  destroy2dintarray(part_of_tree,imagecols,imagerows);
}

void dumpimage(int frm) {
  unsigned char temp_char;
  int row, col;
  for (row = 0; row < imagerows; row++)
    for (col = 0; col < imagecols; col++) {
      image_syn[col][row][0] += image_mean;
      temp_char = (char) image_syn[col][row][0];
      if (image_syn[col][row][0] < 0) temp_char = 0;
      if (image_syn[col][row][0] > 255) temp_char = 255;
      ImageData[frm][((row*imagecols)+col)*4  ] = temp_char;
      ImageData[frm][((row*imagecols)+col)*4+1] = temp_char;
      ImageData[frm][((row*imagecols)+col)*4+2] = temp_char;
    }
}
//************* END of DECODE Routines *********//

//************* DMA Routines *******************//
BOOL OpenWinRT(VOID) {
  hWinRT = WinRTOpenDevice(0, FALSE); //open device 0, no sharing
  if (hWinRT == INVALID_HANDLE_VALUE) {
    wsprintf(szErrorTitle,"ERROR");
    wsprintf(szErrorMsg,"Can't Start HWinRT Driver");
    return(FALSE);
  }
  return(TRUE);
}

BOOL PointEcpOut(VOID) {
//%%
// start up the preprocessor
//
//#SetSize 8
//#SetAbsolute On
//#OnError pointout_trap
//
// DimB cTemp;
//
// cTemp = 0x034; // mode 001, disable int, dma, serviceint
// outp(LptEcr,cTemp);
//
// cTemp = inp(LptDcr);    //get device control reg
// cTemp = cTemp & 0x00C4; // bring bits 5,4,3,1,0 low
// cTemp = cTemp | 0x0004; // and bit 2 high. This sets direction (bit 5) to OUT,
//                         // disables nAck int (bit 4),
//                         // brings nSelectIn (1284mode) high (bit 3)
//                         // brings nInit (nReverseRequest) high (bit 2)
//                         // brings nAutoFd (nCmd/Data) high (bit 1)
//                         // brings nStrobe high (bit 0)
//                         // Note bits 0, 1, and 3 are inverted.
// outp(LptDcr,cTemp);
//
// cTemp = 0x0074; //go to mode 011 (ECP)
// outp(LptEcr,cTemp);
//%%
{
WINRT_CONTROL_ITEM _WinRTpp01[] = {
// command param1 param2
{DIM,0x10040001,0x0074},// 0x00 constant
{DIM,0x10040001,0x0004},// 0x01 constant
{DIM,0x10040001,0x00C4},// 0x02 constant
{DIM,0x10040001,0x0034},// 0x03 constant
{DIM,0x00010001, 0x0},// 0x04 cTemp
{MATH,0x000D0004,0x00030000},
{MATH,0x000D0004,0x00042000},
{OUTP_BA,0,0},
{INP_BA,0,0},
{MATH,0x000D0004,0x00044000},
{MATH,0x00080004,0x00040002},
{MATH,0x00090004,0x00040001},
{MATH,0x000D0004,0x00042000},
{OUTP_BA,0,0},
{MATH,0x000D0004,0x00000000},
{MATH,0x000D0004,0x00042000},
{OUTP_BA,0,0},
};
_WinRTpp01[ 4].value = (ULONG)cTemp;
_WinRTpp01[ 7].port = LptEcr;
_WinRTpp01[ 8].port = LptDcr;
_WinRTpp01[13].port = LptDcr;
_WinRTpp01[16].port = LptEcr;
if (!WinRTProcessIoBufferDirect(hWinRT,
                                _WinRTpp01,
                                sizeof(_WinRTpp01),
                                &iWinRTlength))
  goto pointout_trap;
cTemp = (UCHAR)_WinRTpp01[ 4].value;
//
}
  return (TRUE);
pointout_trap:
  wsprintf(szErrorTitle,"ERROR");
  wsprintf(szErrorMsg,"HWinRT error in PointEcpOut");
  WinRTCloseDevice(hWinRT);
  return(FALSE);
}

BOOL PointEcpIn(VOID)
{
//
// need to set direction to 0, strobe to 0, autoFD to 0, mode to 011 (ECP mode)
// switch directions by first switching to mode 001,
// negotiating for the channel, and setting
// mode back to 011
//%%
// start up the preprocessor
//
//#SetSize 8
//#SetAbsolute On
//#OnError pointin_trap
//
// DimB cStat;
//
// cStat = 0x034; // mode 001, disable int, dma, serviceint
// outp(LptEcr,cStat);
//
// cStat = inp(LptDcr);    //get device control reg
// cStat = cStat & 0x00C4; // bring bits 5,4,3,1,0 low
// cStat = cStat | 0x0004; // and bit 2 high. This sets direction (bit 5) to OUT,
//                         // disables nAck int (bit 4),
//                         // brings nSelectIn (1284mode) high (bit 3)
//                         // brings nInit (nReverseRequest) high (bit 2)
//                         // brings nAutoFd (nCmd/Data) high (bit 1)
//                         // brings nStrobe high (bit 0)
//                         // Note bits 0, 1, and 3 are inverted.
// outp(LptDcr,cStat);
//
// //we're in the default mode. Now switch the direction to in
//
// cStat = cStat | 0x0020; //bring bit 5(dir) high
// outp(LptDcr,cStat);
//
// cStat = cStat & 0x00FB; //bring bit 2 (nInit, nReverseRequest) low
// outp(LptDcr,cStat);
//
// cStat = 0x0074; //go to mode 011 (ECP)
// outp(LptEcr,cStat);
//%%
{
WINRT_CONTROL_ITEM _WinRTpp02[] = {
// command param1 param2
{DIM,0x10040001,0x0074},// 0x00 constant
{DIM,0x10040001,0x00FB},// 0x01 constant
{DIM,0x10040001,0x0020},// 0x02 constant
{DIM,0x10040001,0x0004},// 0x03 constant
{DIM,0x10040001,0x00C4},// 0x04 constant
{DIM,0x10040001,0x0034},// 0x05 constant
{DIM,0x00010001, 0x0},// 0x06 cStat
{MATH,0x000D0006,0x00050000},
{MATH,0x000D0006,0x00062000},
{OUTP_BA,0,0},
{INP_BA,0,0},
{MATH,0x000D0006,0x00064000},
{MATH,0x00080006,0x00060004},
{MATH,0x00090006,0x00060003},
{MATH,0x000D0006,0x00062000},
{OUTP_BA,0,0},
{MATH,0x00090006,0x00060002},
{MATH,0x000D0006,0x00062000},
{OUTP_BA,0,0},
{MATH,0x00080006,0x00060001},
{MATH,0x000D0006,0x00062000},
{OUTP_BA,0,0},
{MATH,0x000D0006,0x00000000},
{MATH,0x000D0006,0x00062000},
{OUTP_BA,0,0},
};
_WinRTpp02[ 6].value = (ULONG)cStat;
_WinRTpp02[ 9].port = LptEcr;
_WinRTpp02[10].port = LptDcr;
_WinRTpp02[15].port = LptDcr;
_WinRTpp02[18].port = LptDcr;
_WinRTpp02[21].port = LptDcr;
_WinRTpp02[24].port = LptEcr;
if (!WinRTProcessIoBufferDirect(hWinRT,
                                _WinRTpp02,
                                sizeof(_WinRTpp02),
                                &iWinRTlength))
  goto pointin_trap;
cStat = (UCHAR)_WinRTpp02[ 6].value;
//
}
  return(TRUE);
pointin_trap:
  wsprintf(szErrorTitle,"ERROR");
  wsprintf(szErrorMsg,"HWinRT error in PointEcpIn");
  WinRTCloseDevice(hWinRT);
  return(FALSE);
}

BOOL StartDmaIn(VOID)
{
  // LowAdd=(BYTE)(SramAddress);
  // MidAdd=(BYTE)(SramAddress>>8);
  // HighAdd=(BYTE)(SramAddress>>16);
  if(OpenWinRT()==FALSE) {
    return(FALSE);
  }
  //can I assume ECP is already pointing out?
  //if(PointEcpOut()==FALSE)
  //  return(FALSE);
//%%
// start up the preprocessor
//
//#SetSize 8
//#SetAbsolute On
//#OnError StartDmaIn_trap
//
// DimB LowAdd;
// DimB MidAdd;
// DimB HighAdd;
//
// outp(EcpAFifo,0x030); //write SRAM low address to HOSTCMD
// outp(EcpDFifo,LowAdd);
// outp(EcpAFifo,0x034); //write SRAM mid address to HOSTCMD
// outp(EcpDFifo,MidAdd);
// outp(EcpAFifo,0x038); //write SRAM high address to HOSTCMD
// outp(EcpDFifo,HighAdd);
// outp(EcpAFifo,0x040); //init HOSTCMD with SRAMWR
//
  if(PointEcpIn()==FALSE) {
    return(FALSE);
  }
  //
  // prepare the WinRT DMA common buffer and get the DMA buffer information
  if (!WinRTSetupDmaBuffer(hWinRT, &DmaInformation, &Length)) {
    wsprintf(szErrorTitle,"ERROR");
    wsprintf(szErrorMsg,"WinRTSetupDmaBuffer failed");
    WinRTCloseDevice(hWinRT);
    return(FALSE);
  }
  if (DmaInformation.Length < DMASIZE) {
    wsprintf(szErrorTitle,"ERROR");
    wsprintf(szErrorMsg, "WinRT can't allocate enough buffer memory");
    WinRTFreeDmaBuffer(hWinRT, &DmaInformation, &Length);
    WinRTCloseDevice(hWinRT);
    return(FALSE);
  }
//%%
// start up the preprocessor
//#SetSize 8
//#SetAbsolute On
//#OnError StartDmaIn_trap
//
// //start the printer port dma
// DimB cStat;
// DimW NumberOfBytes;
//
// cStat = 0x07C; //set bit 3 to enable DMA
// outp(LptEcr,cStat);
//
// cStat = 0x078; //clear bit 2 to start DMA
// outp(LptEcr,cStat);
//
// // start the DMA
// DmaStart(FALSE,NumberOfBytes); // start DMA in
//%%
{
WINRT_CONTROL_ITEM _WinRTpp03[] = {
// command param1 param2
{DIM,0x10040001,0x0078},// 0x00 constant
{DIM,0x10040001,0x007C},// 0x01 constant
{DIM,0x00010001, 0x0},// 0x02 cStat
{DIM,0x00020001, 0x0},// 0x03 NumberOfBytes
{MATH,0x000D0002,0x00010000},
{MATH,0x000D0002,0x00022000},
{OUTP_BA,0,0},
{MATH,0x000D0002,0x00000000},
{MATH,0x000D0002,0x00022000},
{OUTP_BA,0,0},
{MATH,0x000D0003,0x00032000},
{DMA_START,0x0,0x0},
};
_WinRTpp03[ 2].value = (ULONG)cStat;
_WinRTpp03[ 3].value = (ULONG)NumberOfBytes;
_WinRTpp03[ 6].port = LptEcr;
_WinRTpp03[ 9].port = LptEcr;
if (!WinRTProcessDmaBufferDirect(hWinRT,
                                 _WinRTpp03,
                                 sizeof(_WinRTpp03),
                                 &iWinRTlength))
  goto StartDmaIn_trap;
cStat = (UCHAR)_WinRTpp03[ 2].value;
NumberOfBytes = (USHORT)_WinRTpp03[ 3].value;
//
}
  WinRTCloseDevice(hWinRT);
  return (TRUE);
StartDmaIn_trap:
  wsprintf(szErrorTitle,"ERROR");
  wsprintf(szErrorMsg,"WinRT driver failure in StartDmaIn");
  WinRTCloseDevice(hWinRT);
  return(FALSE);
}

BOOL FinishDmaIn (LPSTR lpDmaDest,int a)
{
  BOOL ReturnValue = TRUE;
  ULONG TimeoutTime;
  if(OpenWinRT()==FALSE) {
    return (FALSE);
  }
  TimeoutTime=GetTickCount()+1000; //1 second DMA timeout
  //wait for LptEcr bit 2 to go high again, signalling DMA done
  while((cStat & 0x0004) == 0) {
    if(GetTickCount()>TimeoutTime) {
      wsprintf(szErrorTitle,"ERROR");
      wsprintf(szErrorMsg,"DMA timed out:%d",a);
      ReturnValue=FALSE;
      goto PastDmaInWait;
    }
//%%
// start up the preprocessor
// #SetSize 8
// #SetAbsolute On
// DimB cStat;
//
// cStat = inp(LptEcr);
//%%
{
WINRT_CONTROL_ITEM _WinRTpp04[] = {
// command param1 param2
{DIM,0x00010001, 0x0},// 0x00 cStat
{INP_BA,0,0},
{MATH,0x000D0000,0x00004000},
};
_WinRTpp04[ 0].value = (ULONG)cStat;
_WinRTpp04[ 1].port = LptEcr;
(void)WinRTProcessIoBuffer(hWinRT,
                           _WinRTpp04,
                           sizeof(_WinRTpp04),
                           &iWinRTlength);
cStat = (UCHAR)_WinRTpp04[ 0].value;
//
}
  }
PastDmaInWait:
//%%
// start up the preprocessor
// #SetSize 8
// #SetAbsolute On
//
// DmaFlush();
{
WINRT_CONTROL_ITEM _WinRTpp05[] = {
// command param1 param2
{DMA_FLUSH,0x0,0x0},
};
(void)WinRTProcessIoBuffer(hWinRT,
                           _WinRTpp05,
                           sizeof(_WinRTpp05),
                           &iWinRTlength);
//
}
  //take a look at the buffer memory.
  memcpy(lpDmaDest,DmaInformation.pVirtualAddress,NumberOfBytes);
  // release the DMA buffer back to the system
  if (!WinRTFreeDmaBuffer(hWinRT, &DmaInformation, &Length)) {
    wsprintf(szErrorTitle,"ERROR");
    wsprintf(szErrorMsg,"WinRTFreeDmaBuffer failed");
    WinRTCloseDevice(hWinRT);
    return(FALSE);
  }
  WinRTCloseDevice(hWinRT);
  return(ReturnValue);
}
//************* END OF DMA Routines ************//

/*
 * finiObjects
 *
 * finished with all objects we use; release them
 */
static void finiObjects( void )
{
  if( lpDD != NULL ) {
    if( lpDDSPrimary != NULL ) {
      lpDDSPrimary->Release();
      lpDDSPrimary = NULL;
    }
    lpDD->Release();
    lpDD = NULL;
  }
} /* finiObjects */

//***This routine compares the source
//***file and the output file from the decoder***
static BOOL bitDiff(HDC hdc)
{
  int grp,found;
  FILE *ORG,*DEST,*RPT;
  char o,d;
  char *tmpStr;
  char buf[256];
  unsigned long i=0;
  int rptPos = 0;
  ZeroMemory(rptBuf,sizeof(rptBuf));
  //wsprintf(buf,"Bit diff ..... ");
  //TextOut(hdc,local_x+50,local_y+50,buf,lstrlen(buf));
  if ( (ORG = fopen(filename_org,"r")) == NULL ) {
    //wsprintf(buf,"Cannot open org file to compare");
    //TextOut(hdc,local_x+400,local_y+50,buf,lstrlen(buf));
    return FALSE;
  }
  // if ((RPT = fopen(filename_rpt,"a+")) == NULL) {
  //   wsprintf(buf,"Cannot open report file");
  //   TextOut(hdc,400,0,buf,lstrlen(buf));
  //   return FALSE;
  // }
  for (grp=0;grp<groups;grp++) {
    tmpStr = make_frm_name(filename_out,grp);
    if ((DEST = fopen(tmpStr,"r")) == NULL) {
      free(tmpStr);
      return FALSE;
    }
    //****** Compare routine *******//
    free (tmpStr);
    i = 0;
    fseek(ORG,0L,SEEK_SET);
    found = 0;
    while (feof(ORG) == 0) {
      o = fgetc(ORG);
      d = fgetc(DEST);
      if ((o != '0') && (o != '1')) break;
      wsprintf(buf,"o:%c d:%c",o,d);
      TextOut(hdc,400,rptPos,buf,lstrlen(buf));
      if (o != d) {
        found = 1;
        wsprintf (rptBuf[grp], "Diff at:%d
        TextOut(hdc,400,rptPos,rptBuf[grp], lstrlen(rptBuf[grp]));
        rptPos = rptPos + 20;
        break;
      }
      i++;
    }
    if (found == 0) {
      wsprintf(rptBuf[grp], "No diff(%d bits)
      TextOut(hdc,400,rptPos,rptBuf[grp], lstrlen(rptBuf[grp]));
      rptPos = rptPos + 20;
    }
  }
  fclose(DEST);
  fclose(ORG);
  return TRUE;
}

//This routine reads 2 blocks of data group(2 of 4 DMA transfers)
//or 2 of 16 frames and decode it
static BOOL readDMAandDecode(HWND hwnd,HDC hdc)
{
  HANDLE hSrc;
  DWORD dwRead;
  char buf[256];
  char cBuf[1];
  short ch;
  int i,j,grp,row,col,bit,lvl;
  long local_x, local_y;
  imagecols = 128;
  imagerows = 128;
  alu_bits = 12;
  sblvls = 3;
  group_size = GRPSIZE;
  groups = NUMGRP;
  ezw_passes = EZWPASSES;
  bottom_bit = 8 - ezw_passes;
  image_syn = make3ddblarray(imagecols,imagerows,2);
  signs = make2dintarray(imagecols,imagerows);
  mags = make2dintarray(imagecols,imagerows);
  found = make2dintarray(imagecols,imagerows);
  prev_frame = make2dintarray(imagecols,imagerows);
  ZeroMemory (ImageData, sizeof (ImageData));
  ZeroMemory (ImageDataO, sizeof (ImageDataO));
  ZeroMemory (bitBuf, sizeof(bitBuf));
  SetBkColor( hdc, RGB( 255, 255, 255 ) );
  SetTextColor( hdc, RGB( 0, 0, 0 ) );
  bitBufIndex = 0;
  totalBits = 0;
  RECT rt;
  GetClientRect(hwnd,&rt);
  local_x = rt.top;
  local_y = rt.left;
  //Initiate DMA transfer
  for (grp=0;grp<groups;grp++) { // 2 groups of data (2 of 16 frames)
    wsprintf(buf,"DMA Transfer group:%d ",grp);
    TextOut(hdc,local_x+50,local_y+50,buf,lstrlen(buf));
    for (block=0;block<=BLOCKCNT;block++) { // 4 blocks of DMA transfers
      if (StartDmaIn() == FALSE) {
        TextOut(hdc,local_x+50,local_y+50,
                szErrorMsg,lstrlen(szErrorMsg));
        MessageBox(hwnd,szErrorMsg,szErrorTitle, MB_OK);
        return(FALSE);
      }
      if (FinishDmaIn(DmaBuf,block) == FALSE) {
        TextOut(hdc,local_x+50,local_y+50,
                szErrorMsg,lstrlen(szErrorMsg));
        MessageBox(hwnd,szErrorMsg,szErrorTitle, MB_OK);
        return(FALSE);
      }
      //Write buffer to file
      for (i=0;i<NumberOfBytes;i++) {
        for (j=7;j>=0;j--) {
          ch = DmaBuf[i] & (1<<j) ? 1 : 0;
          bitBuf[bitBufIndex] = ch;
          bitBufIndex++;
        }
      }
      if (block == BLOCKCNT) {
        if (OpenWinRT()==FALSE) { return FALSE; }
        if (PointEcpOut()==FALSE) { return FALSE; }
        WinRTCloseDevice(hWinRT);
      }
    }
  }
  bitBufIndex = 0;
  for (grp = 0; grp < groups; grp++) {
    low = 0;
    high = top_value; //2047
    initarithdecode();
    initarithmodel();
    for (row = 0; row < imagerows; row++)
      for (col = 0; col < imagecols; col++)
        prev_frame[col][row] = 0;
    image_mean = 0;
    for (bit = 0; bit < 8; bit++)
      image_mean = (image_mean << 1) + decode_sym(0);
    for (frame = 0; frame < group_size; frame++) {
      //*** Display stuff
      wsprintf(buf, "Decoding group:%d frame:%d",grp,(grp*group_size)+frame);
      TextOut(hdc,local_x+50,local_y+50,buf,lstrlen(buf));
      for (row = 0; row < imagerows; row++)
        for (col = 0; col < imagecols; col++) {
          mags[col][row] = 0;
          signs[col][row] = 0;
          found[col][row] = 0;
        }
      for (top_bit = 1 << (9 - bottom_bit);
           top_bit && !decode_sym(0);
           top_bit = top_bit >> 1);
      for (bit = top_bit; bit; bit = (bit >> 1))
        decode_pass(bit);
      if (top_bit && frame) {
        image_mean = 0;
        for (bit = 0; bit < 8; bit++)
          image_mean = (image_mean << 1) + decode_sym(0);
      }
      for (row = 0; row < imagerows; row++)
        for (col = 0; col < imagecols; col++) {
          if (found[col][row]) {
            if (signs[col][row])
              prev_frame[col][row] -= mags[col][row] << (bottom_bit + 1);
            else
              prev_frame[col][row] += mags[col][row] << (bottom_bit + 1);
          }
          image_syn[col][row][0] = (double) prev_frame[col][row];
        }
      for (lvl = sblvls; lvl > 0; lvl--)
        synthesize_image(lvl);
      // Instead of dumping it to a file, dump it to an array.
      dumpimage((grp*group_size)+frame);
      totalBits = bitBufIndex;
      if ( ((grp*group_size)+frame) == NUMFRM-1 ) {
        destroy3ddblarray(image_syn,imagecols,imagerows,2);
        destroy2dintarray(signs,imagecols,imagerows);
        destroy2dintarray(mags,imagecols,imagerows);
        destroy2dintarray(found,imagecols,imagerows);
        destroy2dintarray(prev_frame,imagecols,imagerows);
        return TRUE;
      }
    }
  }
  //****** Done decoding
  destroy3ddblarray(image_syn,imagecols,imagerows,2);
  destroy2dintarray(signs,imagecols,imagerows);
  destroy2dintarray(mags,imagecols,imagerows);
  destroy2dintarray(found,imagecols,imagerows);
  destroy2dintarray(prev_frame,imagecols,imagerows);
  return TRUE;
}

static BOOL boardInit(HWND hwnd)
{
  if(OpenWinRT()==FALSE) { return(FALSE); }
  if (PointEcpOut()==FALSE) { return FALSE; }
  WinRTCloseDevice(hWinRT);
  return TRUE;
}

static BOOL readGarbage(HWND hwnd)
{
  int i,loc_block;
  //Read to clear the SRAM on the board. (2 DMA reads)
  for (i=0;i<2;i++) {
    for (loc_block=0;loc_block<=BLOCKCNT;loc_block++) {
      if (StartDmaIn() == FALSE) {
        MessageBox(hwnd,szErrorMsg,szErrorTitle,MB_OK);
        return(FALSE);
      }
      if (FinishDmaIn(DmaBuf,loc_block) == FALSE) {
        MessageBox(hwnd,szErrorMsg,szErrorTitle,MB_OK);
        return(FALSE);
      }
      if (loc_block == BLOCKCNT) {
        if (OpenWinRT()==FALSE) { return FALSE; }
        if (PointEcpOut()==FALSE) { return FALSE; }
        WinRTCloseDevice(hWinRT);
      }
    }
  }
  return TRUE;
}

LRESULT CALLBACK WindowProc( HWND hWnd, UINT message,
                             WPARAM wParam, LPARAM lParam )
{
  PAINTSTRUCT ps;
  RECT rc;
  SIZE size;
  TCHAR szHello[MAX_LOADSTRING];
  LoadString(hInst, IDS_HELLO, szHello, MAX_LOADSTRING);
  static HWND hPickedButton = NULL;
  static BYTE phase = 0;
  static BYTE error = 0;
  static char buf [256];
  int wmId, wmEvent;
  HDC hdc;
  long local_x, local_y;
  switch( message )
  {
  case WM_ACTIVATEAPP:
    bActive = wParam;
    break;
  case WM_CREATE:
    // Set mouse pointer..
    hArrowCursor = LoadCursor (NULL, IDC_ARROW);
    hWaitCursor = LoadCursor (NULL, IDC_WAIT);
    SetCursor (hArrowCursor);
    hBtnP2 = CreateWindow ("BUTTON", "Level 2 (0100)",
                           WS_CHILD | WS_VISIBLE | BS_RADIOBUTTON,
                           300,50,130,30,
                           hWnd, (HMENU)BTN_P2, hInst, NULL);
    hBtnP3 = CreateWindow ("BUTTON", "Level 3 (0011)",
                           WS_CHILD | WS_VISIBLE | BS_RADIOBUTTON,
                           300,80,130,30,
                           hWnd, (HMENU)BTN_P3, hInst, NULL);
    hBtnP4 = CreateWindow ("BUTTON", "Level 4 (0010)",
                           WS_CHILD | WS_VISIBLE | BS_RADIOBUTTON,
                           300,110,130,30,
                           hWnd, (HMENU)BTN_P4, hInst, NULL);
    hBtnP5 = CreateWindow ("BUTTON", "Level 5 (0001)",
                           WS_CHILD | WS_VISIBLE | BS_RADIOBUTTON,
                           300,140,130,30,
                           hWnd, (HMENU)BTN_P5, hInst, NULL);
    hBtnP6 = CreateWindow ("BUTTON", "Level 6 (0000)",
                           WS_CHILD | WS_VISIBLE | BS_RADIOBUTTON,
                           300,170,130,30,
                           hWnd, (HMENU)BTN_P6, hInst, NULL);
    EZWPASSES = 6;
    SendMessage(hBtnP6, BM_SETCHECK, 1, 0L);
    break;
  //case WM_SETCURSOR:
  //  SetCursor(NULL);
  //  return TRUE;
  case WM_TIMER:
    // ZeroMemory (ImageDataO, sizeof(ImageDataO));
    RECT rt;
    GetClientRect(hWnd,&rt);
    local_x = rt.top;
    local_y = rt.left;
    // Flip surfaces
    if( bActive ) {
      if (lpDDSPrimary->GetDC(&hdc) == DD_OK) {
        wsprintf(buf, "Frame:%d%d%d ",
                 (frame_display%1000)/100,
                 (frame_display%100)/10,(frame_display%10));
        TextOut( hdc, local_x + 50, local_y + 50, buf, lstrlen(buf) );
        if (totalBits != 0)
          ratio = BITS_IN_A_FRAME*GRPSIZE/totalBits;
        else
          ratio = 1;
        wsprintf(buf, "Total bits used:%d. Ratio %d : 1 ",totalBits,ratio);
        TextOut(hdc, local_x + 50, local_y + 70, buf , lstrlen(buf) );
        for (int k=0;k<groups;k++) {
          TextOut(hdc,400,k*20,rptBuf[k],lstrlen(rptBuf[k]));
        }
        framePtr = ImageData[frame_display];
        if(SetBitmapBits(hbm,sizeof(ImageData[frame_display]),
                         framePtr) == 0) {
          PostMessage(hWnd, WM_CLOSE, 0, 0);
        }
        if ( StretchBlt( hdc, 100, 100, 128, 128,
                         hdcImage, 0, 0, 128, 128, SRCCOPY ) == FALSE ) {
          PostMessage(hWnd, WM_CLOSE, 0, 0);
        }
        lpDDSPrimary->ReleaseDC (hdc);
      }
    }
    //Move to next frame_display
    if (!pause) {
      if (frame_display >= (NUMFRM-1)) {
        frame_display = 0;
        if (lpDDSPrimary->GetDC(&hdc) == DD_OK) {
          if (!readDMAandDecode(hWnd,hdc)) {
            MessageBox(hWnd, "ERROR(2):readDMAandDecode!",
                       "ERROR",MB_OK);
            exit(0);
          }
          lpDDSPrimary->ReleaseDC(hdc);
        }
      } else {
        frame_display++;
      }
    }
    break;
  case WM_KEYDOWN:
    switch( wParam )
    {
    case VK_ESCAPE:
    case VK_F12:
      PostMessage(hWnd, WM_CLOSE, 0, 0);
      break;
    case VK_SPACE:
      pause = !pause;
      break;
    case VK_RIGHT:
      if (pause) {
        if (frame_display >= (NUMFRM-1)) {
          frame_display = 0;
        } else {
          frame_display++;
        }
      }
      break;
    case VK_LEFT:
      if (pause) {
        if (frame_display <= 0) {
          frame_display = NUMFRM-1;
        } else {
          frame_display--;
        }
      }
      break;
    }
    break;
  case WM_COMMAND:
    wmId = LOWORD(wParam);
    wmEvent = HIWORD(wParam);
    switch (wmId) {
    case BTN_P2:
      EZWPASSES = 2;
      goto ChangePasses;
      break;
    case BTN_P3:
      EZWPASSES = 3;
      goto ChangePasses;
      break;
    case BTN_P4:
      EZWPASSES = 4;
      goto ChangePasses;
      break;
    case BTN_P5:
      EZWPASSES = 5;
      goto ChangePasses;
      break;
    case BTN_P6:
      EZWPASSES = 6;
      goto ChangePasses;
    ChangePasses:
      hPickedButton = (HWND)lParam;
      SendMessage(hBtnP2, BM_SETCHECK, 0, 0L);
      SendMessage(hBtnP3, BM_SETCHECK, 0, 0L);
      SendMessage(hBtnP4, BM_SETCHECK, 0, 0L);
      SendMessage(hBtnP5, BM_SETCHECK, 0, 0L);
      SendMessage(hBtnP6, BM_SETCHECK, 0, 0L);
      SendMessage(hPickedButton, BM_SETCHECK, 1, 0L);
      break;
    default:
      return DefWindowProc(hWnd, message, wParam, lParam);
    }
    break;
  case WM_DESTROY:
    finiObjects();
    PostQuitMessage( 0 );
    break;
  }
  return DefWindowProc(hWnd, message, wParam, lParam);
} /* WindowProc */

static BOOL doInit( HINSTANCE hInstance, int nCmdShow )
{
  HWND hwnd;
  WNDCLASSEX wc;
  DDSURFACEDESC ddsd;
  DDSCAPS ddscaps;
  HRESULT ddrval;
  //HDC hdc;
  char buf[256];
  /*
   * set up and register window class
   */
  wc.cbSize = sizeof(WNDCLASSEX);
  wc.style = CS_HREDRAW | CS_VREDRAW;
  wc.lpfnWndProc = (WNDPROC)WindowProc;
  wc.cbClsExtra = 0;
  wc.cbWndExtra = 0;
  wc.hInstance = hInstance;
  wc.hIcon = LoadIcon( hInstance, IDI_APPLICATION );
  wc.hCursor = LoadCursor( NULL, IDC_ARROW );
  wc.hbrBackground = (HBRUSH)(COLOR_WINDOW+1);
  wc.lpszMenuName = (LPCSTR)IDC_VDODMA3;
  wc.lpszClassName = szWindowClass;
  wc.hIconSm = LoadIcon(wc.hInstance, (LPCTSTR)IDI_SMALL);
  RegisterClassEx( &wc );
  /*
   * create a window
   */
  hwnd = CreateWindow(
      szWindowClass,
      szTitle,
      WS_OVERLAPPEDWINDOW,
      0,
      0,
      GetSystemMetrics( SM_CXSCREEN ),
      GetSystemMetrics( SM_CYSCREEN ),
      NULL,
      NULL,
      hInstance,
      NULL );
  if( !hwnd ) {
    return FALSE;
  }
  ShowWindow( hwnd, nCmdShow );
  UpdateWindow( hwnd );
  if (!boardInit(hwnd)) {
    MessageBox(hwnd, "Something wrong in boardInit","ERROR",MB_OK);
    return FALSE;
  }
  MessageBox(hwnd, "Initialization Completed.\nReset the board.",
             "Initialization",MB_OK);
  if (!readGarbage(hwnd)) {
    return FALSE;
  }
  //**** Creating BITMAP *******//
  hbm = CreateBitmap (128,128,1,32,framePtr);
  if ( hbm == NULL ) {
    MessageBox(hwnd,"ERROR Creating Bitmap!", "ERROR", MB_OK);
    return FALSE;
  }
  hdcImage = CreateCompatibleDC( NULL );
  SelectObject( hdcImage, hbm );
  //**** DONE Creating BITMAP *******//
  /*
   * create the main DirectDraw object
   */
  //Set frame display to last frame to initiate DMA transfer
  //right away
  frame_display = NUMFRM-1;
  ddrval = DirectDrawCreate( NULL, &lpDD, NULL );
  if( ddrval == DD_OK ) {
    // Get exclusive mode
    ddrval = lpDD->SetCooperativeLevel( hwnd, DDSCL_NORMAL );
    //DDSCL_EXCLUSIVE | DDSCL_FULLSCREEN );
    if(ddrval == DD_OK ) {
      //ddrval = lpDD->SetDisplayMode( 640, 480, 8 );
      //if( ddrval == DD_OK ) {
      // Create the primary
      ddsd.dwSize = sizeof( ddsd );
      ddsd.dwFlags = DDSD_CAPS;
      ddsd.ddsCaps.dwCaps = DDSCAPS_PRIMARYSURFACE;
      ddrval = lpDD->CreateSurface( &ddsd, &lpDDSPrimary, NULL );
      if( ddrval == DD_OK ) {
        ZeroMemory( &ddsd, sizeof( ddsd ) );
        ddsd.dwSize = sizeof( ddsd );
        ddrval = lpDDSPrimary->GetSurfaceDesc( &ddsd );
        if( ddrval == DD_OK ) {
          //lpDDSPrimary->ReleaseDC(hdc);
          //if (lpDDSBack->GetDC(&hdc) == DD_OK)
          //SetBkColor( hdc, RGB( 0, 0, 255 ) );
          //SetTextColor( hdc, RGB( 255, 255, 0 ) );
          //lpDDSBack->ReleaseDC(hdc);
          // Create a timer to flip the pages
          if( SetTimer( hwnd, TIMERID, TIMERRATE, NULL)) {
            //if (!readFiles(hwnd)) {
            //  return FALSE;
            //}
            return TRUE;
          }
        }
      }
    }
  }
  wsprintf(buf, "Direct Draw Init Failed (%08lx)\n", ddrval );
  MessageBox( hwnd, buf, "ERROR", MB_OK );
  finiObjects();
  DestroyWindow( hwnd );
  return FALSE;
}

int APIENTRY WinMain(HINSTANCE hInstance,
                     HINSTANCE hPrevInstance,
                     LPSTR lpCmdLine,
                     int nCmdShow)
{
  // TODO: Place code here.
  MSG msg;
  LoadString(hInstance, IDS_APP_TITLE, szTitle, MAX_LOADSTRING);
  LoadString(hInstance, IDC_VDODMA3, szWindowClass, MAX_LOADSTRING);
  if( !doInit( hInstance, nCmdShow ) ) {
    return FALSE;
  }
  while (GetMessage(&msg, NULL, 0, 0)) {
    TranslateMessage(&msg);
    DispatchMessage(&msg);
  }
  return msg.wParam;
}

C.1.5 resource.h

//{{NO_DEPENDENCIES}}
// Microsoft Developer Studio generated include file.
// Used by vdodma3.rc
//
#define IDC_MYICON                      2
#define IDD_VDODMA3_DIALOG              102
#define IDD_ABOUTBOX                    103
#define IDS_APP_TITLE                   103
#define IDM_ABOUT                       104
#define IDM_EXIT                        105
#define IDS_HELLO                       106
#define IDI_VDODMA3                     107
#define IDI_SMALL                       108
#define IDC_VDODMA3                     109
#define IDR_MAINFRAME                   128
#define IDC_STATIC                      -1

//Define button resources
#define BTN_P2 202
#define BTN_P3 203
#define BTN_P4 204
#define BTN_P5 205
#define BTN_P6 206

// Next default values for new objects
//
#ifdef APSTUDIO_INVOKED
#ifndef APSTUDIO_READONLY_SYMBOLS
#define _APS_NEXT_RESOURCE_VALUE        132
#define _APS_NEXT_COMMAND_VALUE         32772
#define _APS_NEXT_CONTROL_VALUE         1000
#define _APS_NEXT_SYMED_VALUE           110
#endif
#endif

C.1.6 makefrm.c

/* This program is used to extract a PGM image file to get
   raw digital pixels and EPROM data in Intel 83 format. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

FILE *inFile;
FILE *outFile;
FILE *out2File;
FILE *epromFile;
char *filename_in;
int **image;
int imagecols, imagerows;
int pe_x, pe_y;
int addr;

/* Return a newly allocated string holding str1 followed by str2. */
char *strappend(char *str1,char *str2) {
  char *result;
  result = calloc((strlen(str1) + strlen(str2) + 1), sizeof(char));
  strcpy(result, str1);
  strcat(result, str2);
  return result;
}

/* Build a frame file name: name plus a three-digit ".NNN" extension. */
char *make_frm_name(char *name, int frnum) {
  char *ans;
  ans = calloc(5, sizeof(char));
  if (frnum < 10) sprintf(ans, ".00%d", frnum);
  else if (frnum < 100) sprintf(ans, ".0%d", frnum);
  else sprintf(ans, ".%d", frnum);
  return strappend(name, ans);
}

int **make2dintarray(int d1, int d2) {
  int **array;
  int index;
  array = calloc(d1, sizeof(int *));
  for (index = 0; index < d1; index++)
    array[index] = (int *) calloc(d2,sizeof(int));
  return array;
}

/* Read one binary PGM (P5) frame from inFile into image[col][row]. */
void loadimage() {
  unsigned char temp_char;
  int row, col;
  fscanf(inFile, "P5 %d %d 255", &imagecols, &imagerows);
  fread(&temp_char,1,1,inFile);
  for (row = 0; row < imagerows; row++)
    for (col = 0; col < imagecols; col++) {
      fread(&temp_char,1,1,inFile);
      if (feof(inFile))
        {printf("ERROR: too few pixels\n"); exit(0); }
      image[col][row] = (int) temp_char;
    }
  fread(&temp_char,1,1,inFile);
  if (feof(inFile) == 0)
    {printf("ERROR: too many pixels\n"); exit(0); }
}

int max(int a, int b) {
  if (a > b) return a;
  else return b;
}

int posonly(int a) {
  if (a < 0) return 0;
  else return a;
}

/* Write the centered 4*pe_x by 4*pe_y window of the frame twice:
   as one-byte Intel hex records (epromFile) and as a PGM (out2File). */
void dumpimage() {
  unsigned char temp_char;
  int row, col, x, y, pix;
  int x0 = posonly(imagecols - (4 * pe_x)) >> 1;
  int y0 = posonly(imagerows - (4 * pe_y)) >> 1;
  fprintf(out2File,"P5 %d %d 255\n", 4 * pe_x, 4 * pe_y);
  for (y = 0; y < 4; y++)
    for (x = 0; x < 4; x++)
      for (row = 0; row < pe_y; row++)
        for (col = 0; col < pe_x; col++) {
          pix = image[x0 + x + (4 * col)][y0 + y + (4 * row)];
          /* pix = 0; */
          /* fprintf(outFile, "%.2X\n", pix); */
          fprintf(epromFile, ":01%.4X00%.2X%.2X\n", addr, pix,
                  ((- (pix + 1 + (addr >> 8) + (addr & 255))) & 255));
          addr++;
        }
  for (row = 0; row < (pe_y * 4); row++)
    for (col = 0; col < (pe_x * 4); col++) {
      /* image[x0 + col][y0+row] = 0; */
      temp_char = (char) image[x0 + col][y0 + row];
      fwrite(&temp_char,1,1,out2File);
    }
}

int main (int argc,char *argv[]) {
  int offset, frame, frms, col, row;
  argc--; argv++;
  if(argc != 5) {
    printf("ERROR: Invalid number of arguments.\n");
    printf(
      "USAGE: makefrm offset frms pe_x pe_y. pe_x=32 pe_y=32 for 128x128\n");
    exit(0);
  }
  sscanf(*argv++, "%d", &offset);
  sscanf(*argv++, "%d", &frms);
  sscanf(*argv++, "%d", &pe_x);
  sscanf(*argv++, "%d", &pe_y);
  filename_in = *argv;
  inFile = fopen(strappend(filename_in, ".000"), "r");
  fscanf(inFile, "P5 %d %d 255", &imagecols, &imagerows);
  fclose(inFile);
  image = make2dintarray(max(imagecols, pe_x * 4),
                         max(imagerows, pe_y * 4));
  for (row = 0; row < max(imagerows, pe_y * 4); row++)
    for (col = 0; col < max(imagecols, pe_x * 4); col++) {
      image[col][row] = 127;
    }
  addr = 0;
  epromFile = fopen("pix.83", "w");
  for (frame = 0; frame < frms; frame++) {
    inFile = fopen(make_frm_name(filename_in, (frame + offset)), "r");
    loadimage();
    fclose(inFile);
    /* outFile = fopen(make_frm_name("frmdat", frame), "w"); */
    out2File = fopen(make_frm_name("infrm", frame), "w");
    dumpimage();
    /* fclose(outFile); */
    fclose(out2File);
  }
  fprintf(epromFile, ":00000001FF\n");
  fclose(epromFile);
  return 0;
}
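
The EPROM records that makefrm.c writes each carry a single data byte followed by a checksum. The sketch below isolates that arithmetic so a record can be checked by hand. It is only an illustration: the helper names hex_checksum, hex_record, and hex_record_ok are hypothetical and do not appear in the thesis code, and addresses are assumed to fit in 16 bits.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Checksum of a one-byte data record: the two's complement of the sum of
   the byte count (1), both address bytes, the record type (0) and the
   data byte, truncated to 8 bits. Same expression as in dumpimage(). */
unsigned hex_checksum(unsigned addr, unsigned data)
{
    unsigned sum = 1 + ((addr >> 8) & 255) + (addr & 255) + 0 + (data & 255);
    return (0u - sum) & 255;
}

/* Format one record (without newline) into buf, which must hold 14 bytes. */
void hex_record(char *buf, unsigned addr, unsigned data)
{
    sprintf(buf, ":01%.4X00%.2X%.2X", addr & 0xFFFF, data & 255,
            hex_checksum(addr, data));
}

/* A record is consistent when all of its bytes, checksum included,
   sum to zero modulo 256. Returns 1 if consistent, 0 otherwise. */
int hex_record_ok(const char *rec)
{
    unsigned byte, sum = 0;
    if (rec[0] != ':') return 0;
    for (rec++; isxdigit((unsigned char)rec[0]) &&
                isxdigit((unsigned char)rec[1]); rec += 2) {
        sscanf(rec, "%2x", &byte);
        sum += byte;
    }
    return (sum & 255) == 0;
}
```

For example, a pixel value of 0x7F at address 0 gives the record :010000007F80, and the closing record :00000001FF written by main passes the same consistency check.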