International Journal of Engineering Trends and Technology (IJETT), Volume 6 Number 3, Dec 2013

Cost Effective and High Speed Design of DMA Memory Copy Accelerator

Jhansirani Koti 1, G. Mani Kumar 2
1 PG Student (M. Tech), Dept. of ECE, Chirala Engineering College, Chirala, A.P, India.
2 Assistant Professor, Dept. of ECE, Chirala Engineering College, Chirala, A.P, India.

Abstract: Direct memory access (DMA) is a feature of modern computers that allows certain hardware subsystems within the computer to access system memory independently of the central processing unit (CPU). Without DMA, when the CPU is using programmed input/output, it is typically fully occupied for the entire duration of the read or write operation, and is thus unavailable to perform other work. With DMA, the CPU initiates the transfer, does other operations while the transfer is in progress, and receives an interrupt from the DMA controller when the operation is done. This feature is useful any time the CPU cannot keep up with the rate of data transfer, or where the CPU needs to perform useful work while waiting for a relatively slow I/O data transfer. A Universal Asynchronous Receiver/Transmitter (UART) is a type of "asynchronous receiver/transmitter", a piece of computer hardware that translates data between parallel and serial forms. The universal designation indicates that the data format and transmission speeds are configurable and that the actual electrical signaling levels are typically handled by a special driver circuit external to the UART. A UART is usually an individual integrated circuit (or part of one) used for serial communications over a computer or peripheral device serial port. UARTs are now commonly included in microcontrollers. A UART IP soft core based on DMA mode is proposed and elaborated, making use of these characteristics of DMA.
The entire UART IP soft core in DMA mode mainly includes the following 5 sub-modules: UART send controller, UART receive controller, register file with the Avalon-MM Slave interface, Master Read type DMA controller with the Avalon-MM Master interface, and Master Write type DMA controller with the Avalon-MM Master interface. The design of the UART IP soft core based on DMA mode is simulated using the Modelsim tool and synthesized using the Xilinx tool.

ISSN: 2231-5381, http://www.ijettjournal.org

1. Introduction

This thesis portrays a novel architecture for a Universal Asynchronous Receiver Transmitter. UARTs are used for asynchronous serial data communication between remote embedded systems. The UART is used for interfacing computers or microprocessors to an asynchronous serial data channel. The receiver converts the serial start, data, parity and stop bits into parallel data. The transmitter converts parallel data into serial form and automatically adds start, parity and stop bits. The data word length can be 5, 6, 7 or 8 bits. Parity may be odd or even. Parity checking and generation can be inhibited. The stop bits may be one or two, or one and one-half when transmitting 5-bit code.

The UART can be used in a wide range of applications including modems, printers, peripherals and remote data acquisition systems. Utilizing the Intersil advanced scaled SAJI IV CMOS process permits operation at clock frequencies up to 8.0MHz (500K Baud). Power requirements, by comparison, are reduced from 300mW to 10mW. Status logic increases flexibility and simplifies the user interface.

This UART reference design contains a receiver and a transmitter. The receiver performs serial-to-parallel conversion on the asynchronous data frame received from the serial data input SIN. The transmitter performs parallel-to-serial conversion on the 8-bit data received from the CPU. In order to synchronize the asynchronous serial data and to ensure data integrity, Start, Parity and Stop bits are added to the serial data. An example of the UART frame format is shown in Figure 1 below.

Figure 1 Basic Application of UART

The Universal Asynchronous Receiver Transmitter (UART) is a popular and widely-used device for data communication in the field of telecommunication. There are different versions of UARTs in the industry. Some of them contain FIFOs for the receiver/transmitter data buffering and some of them have the 9 Data bits mode (Start bit + 9 Data bits + Parity + Stop bits). This application note describes a fully configurable UART optimized for and implemented in a variety of Lattice devices, which have superior performance and architecture compared to existing semiconductor ASSPs (application-specific standard products).

This design can also be instantiated many times to get multiple UARTs in the same device. To make the design easy to embed into a larger implementation, the bi-directional data bus is separated into two buses, DIN and DOUT, instead of using tri-state buffers. The transmitter and receiver both share a common internal Clk16X clock. This internal clock, which needs to be 16 times the desired baud rate clock frequency, is obtained from the on-board clock through the MCLK input directly.

Serial Data Transmission & Reception

The Universal Asynchronous Receiver/Transmitter (UART) takes bytes of data and transmits the individual bits in a sequential fashion.
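The frame layout described in the Introduction (a start bit, 5 to 8 data bits, optional odd or even parity, then stop bits) can be illustrated with a minimal Python sketch. The function name `build_frame` and its parameters are hypothetical and chosen only for this illustration. The sketch assumes the data bits are sent least significant bit first (D0 to D7, as Section 3 states for the send controller) and leaves out the one-and-one-half stop bit option, which cannot be expressed as a whole number of bit slots.

```python
def build_frame(value, data_bits=8, parity="none", stop_bits=1):
    """Return the line levels of one UART frame as a list of 0/1 values.

    Assumptions for this sketch: LSB-first data order, idle/stop level 1,
    start level 0, and only whole stop bits (1 or 2).
    """
    if data_bits not in (5, 6, 7, 8):
        raise ValueError("data word length must be 5, 6, 7 or 8 bits")
    if parity not in ("none", "even", "odd"):
        raise ValueError("parity must be 'none', 'even' or 'odd'")
    if stop_bits not in (1, 2):
        raise ValueError("this sketch supports 1 or 2 stop bits only")
    if not 0 <= value < (1 << data_bits):
        raise ValueError("value does not fit in the data word")

    data = [(value >> i) & 1 for i in range(data_bits)]  # D0 first
    frame = [0] + data                                   # start bit is low
    if parity != "none":
        ones_are_odd = sum(data) % 2
        # Even parity makes the total count of 1s even; odd parity makes it odd.
        frame.append(ones_are_odd if parity == "even" else ones_are_odd ^ 1)
    frame += [1] * stop_bits                             # stop bits are high
    return frame


if __name__ == "__main__":
    print(build_frame(0x41))                 # 8 data bits, no parity, 1 stop bit
    print(build_frame(0x41, parity="odd"))   # same byte with an odd parity bit
```

With the default settings the frame is 10 bits long, which is the start + 8 data + stop format used later by the send and receive controllers.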
At the destination, a second UART reassembles the bits into complete bytes. Each UART contains a shift register, which is the fundamental method of conversion between serial and parallel forms. Serial transmission of digital information (bits) through a single wire or other medium is much more cost effective than parallel transmission through multiple wires.

The UART usually does not directly generate or receive the external signals used between different items of equipment. Separate interface devices are used to convert the logic level signals of the UART to and from the external signaling levels. External signals may be of many different forms. Examples of standards for voltage signaling are RS-232, RS-422 and RS-485 from the EIA. Historically, current (in current loops) was used in telegraph circuits. Some signaling schemes do not use electrical wires. Examples of such are optical fiber, IrDA (infrared), and (wireless) Bluetooth in its Serial Port Profile (SPP). Some signaling schemes use modulation of a carrier signal (with or without wires). Examples are modulation of audio signals with phone line modems, RF modulation with data radios, and the DC-LIN for power line communication.

Communication may be simplex (in one direction only, with no provision for the receiving device to send information back to the transmitting device), full duplex (both devices send and receive at the same time) or half duplex (devices take turns transmitting and receiving).

Receiver

All operations of the UART hardware are controlled by a clock signal which runs at a multiple of the data rate. For example, each data bit may be as long as 16 clock pulses. The receiver tests the state of the incoming signal on each clock pulse, looking for the beginning of the start bit. If the apparent start bit lasts at least one-half of the bit time, it is valid and signals the start of a new character. If not, the spurious pulse is ignored. After waiting a further bit time, the state of the line is again sampled and the resulting level clocked into a shift register. After the required number of bit periods for the character length (5 to 8 bits, typically) have elapsed, the contents of the shift register are made available (in parallel fashion) to the receiving system.
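The receiver behaviour described above (a clock at 16 times the data rate, a start bit accepted only if the line stays low for half a bit time, then one sample per bit time) can be sketched in Python as follows. This is a simplified software model, not the paper's hardware: the names `frame_8n1`, `to_samples` and `receive` are hypothetical, the frame is assumed to be start + 8 data bits (LSB first) + 1 stop bit, and no resynchronization on data edges is attempted.

```python
OVERSAMPLE = 16  # receiver clock pulses per bit, as in the text's example


def frame_8n1(value):
    """Start bit (0), eight data bits LSB first, one stop bit (1)."""
    return [0] + [(value >> i) & 1 for i in range(8)] + [1]


def to_samples(bits, factor=OVERSAMPLE):
    """Hold each bit level for `factor` receiver clock pulses."""
    return [bit for bit in bits for _ in range(factor)]


def receive(samples, data_bits=8, factor=OVERSAMPLE):
    """Decode characters from an idle-high line sampled `factor` times per bit."""
    received = []
    half = factor // 2
    i = 0
    while i + half < len(samples):
        if samples[i] == 1:
            i += 1                      # line idle, keep looking for a start bit
            continue
        if any(samples[i:i + half + 1]):
            i += 1                      # low pulse shorter than half a bit: ignore
            continue
        centre = i + half               # middle of the start bit
        last = centre + (data_bits + 1) * factor  # middle of the stop bit
        if last >= len(samples):
            break                       # frame runs past the captured samples
        value = 0
        for bit in range(data_bits):
            # Sample each data bit one bit time after the previous sample.
            value |= samples[centre + (bit + 1) * factor] << bit
        if samples[last] == 1:          # a valid stop bit must be high
            received.append(value)
        i = last + 1
    return received


if __name__ == "__main__":
    line = [1] * OVERSAMPLE + to_samples(frame_8n1(0x41)) + [1] * OVERSAMPLE
    print(receive(line))
```

Sampling in the middle of each bit is what lets a simple receiver tolerate a small mismatch between transmitter and receiver clocks, as long as the stop bit is still sampled inside its own bit period.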
The UART will set a flag indicating new data is available, and may also generate a processor interrupt to request that the host processor transfers the received data.

The best UARTs "resynchronize" on each change of the data line that is more than a half-bit wide. In this way, they reliably receive when the transmitter is sending at a slightly different speed than the receiver. (This is the normal case, because communicating units usually have no shared timing system apart from the communication signal.) Simplistic UARTs may merely detect the falling edge of the start bit, and then read the center of each expected data bit. A simple UART can work well if the data rates are close enough that the stop bits are sampled reliably.

It is a standard feature for a UART to store the most recent character while receiving the next. This "double buffering" gives a receiving computer an entire character transmission time to fetch a received character. Many UARTs have a small first-in, first-out (FIFO) buffer memory between the receiver shift register and the host system interface. This allows the host processor even more time to handle an interrupt from the UART and prevents loss of received data at high rates.

Transmitter

Transmission operation is simpler since it is under the control of the transmitting system. As soon as data is deposited in the shift register after completion of the previous character, the UART hardware generates a start bit, shifts the required number of data bits out to the line, generates and appends the parity bit (if used), and appends the stop bits. Since transmission of a single character may take a long time relative to CPU speeds, the UART will maintain a flag showing busy status so that the host system does not deposit a new character for transmission until the previous one has been completed; this may also be done with an interrupt. Since full-duplex operation requires characters to be sent and received at the same time, practical UARTs use two different shift registers for transmitted characters and received characters.

DMA CONTROLLER

Devices such as magnetic disks or optical disks can transfer data bytes to memory faster than a software program can read them. In such applications we use a dedicated hardware device called a DMA controller to manage the data transfer. The DMA controller temporarily borrows the address bus, data bus and control bus from the microprocessor and transfers the data bytes directly from the disk controller to a series of memory locations. Because the data transfer is handled totally in hardware, it is much faster than it would be if done by program instructions. A DMA controller can also transfer data from memory to a port. Some DMA devices can even do memory-to-memory transfers to implement fast block transfers.

INTEL 8257 DMA CONTROLLER

The 8257 is a programmable, 4-channel direct memory access controller, in the sense that 4 peripherals can request data transfer. Each channel can transfer 16,384 (16K) bytes; the MPU provides a 16-bit starting address, a 14-bit count for the number of bytes, the direction of data transfer and control words to the 8257.

FUNCTIONAL DESCRIPTION

The 8257 is a programmable DMA device which, when coupled with a single Intel 8212 I/O device, provides a complete 4-channel DMA controller for use in Intel microcomputer systems. After being initialized by software, the 8257 can transfer data to peripheral devices directly, on receiving a DMA transfer request from an enabled peripheral. The transfer of data is carried out in the following way. The 8257:
1) Acquires control of the system bus.
2) Acknowledges the requesting peripheral which is connected to the highest priority channel.
3) Outputs the LSB of the memory address on to system address lines A0-A7 and outputs the MSB of the memory address to the 8212 I/O port via the data bus (the 8212 places these bits on A8-A15).
4) Generates the appropriate memory and I/O read/write control signals that cause the peripheral to receive or deposit a data byte directly from or to the addressed location of memory.

The 8257 will retain control of the system bus and repeat the transfer sequence as long as a peripheral maintains its DMA request. Thus the 8257 can transfer a block of data to/from a high-speed peripheral in a single "burst". When the specified number of data bytes have been transferred, the 8257 activates its terminal count (TC) output, informing the CPU that the operation is complete.

MODES OF OPERATION

The 8257 can operate in three modes of operation: DMA write, DMA read and DMA verify. The two data transfer modes are described below.

a) DMA WRITE
In this mode, data is transferred from the peripheral device to the memory (that is, I/O read, memory write).

b) DMA READ
In this mode, data is transferred from the memory to the peripheral device (that is, memory read, I/O write).
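The block transfer behaviour described for the 8257 (a 16-bit starting address, a count of up to 16,384 bytes, a terminal count signal when the block is finished, and the DMA write and DMA read directions) can be modelled with a short Python sketch. This is a behavioural illustration only: the class `DmaChannel` and the functions `dma_write` and `dma_read` are hypothetical names, the count is held as a plain number of bytes rather than in the device's actual register encoding, and bus arbitration and channel priority are not modelled.

```python
MAX_BYTES = 1 << 14  # 16,384 bytes per channel, per the text


class DmaChannel:
    """One channel: current memory address, bytes remaining, terminal count flag."""

    def __init__(self, start_address, count):
        if not 0 <= start_address <= 0xFFFF:
            raise ValueError("starting address must fit in 16 bits")
        if not 1 <= count <= MAX_BYTES:
            raise ValueError("count must be between 1 and 16,384 bytes")
        self.address = start_address
        self.remaining = count
        self.terminal_count = False

    def _advance(self):
        # Move to the next memory location and raise TC after the last byte.
        self.address = (self.address + 1) & 0xFFFF
        self.remaining -= 1
        if self.remaining == 0:
            self.terminal_count = True


def dma_write(channel, peripheral_bytes, memory):
    """DMA write: I/O read, memory write (peripheral to memory)."""
    for byte in peripheral_bytes:
        if channel.terminal_count:
            break                      # programmed block length reached
        memory[channel.address] = byte
        channel._advance()
    return channel.terminal_count


def dma_read(channel, memory):
    """DMA read: memory read, I/O write (memory to peripheral)."""
    delivered = bytearray()
    while not channel.terminal_count:
        delivered.append(memory[channel.address])
        channel._advance()
    return bytes(delivered)


if __name__ == "__main__":
    memory = bytearray(0x10000)        # 64K address space reachable with A0-A15
    done = dma_write(DmaChannel(0x2000, 4), b"UART!", memory)
    print(done, bytes(memory[0x2000:0x2005]))
    print(dma_read(DmaChannel(0x2000, 4), memory))
```

In the example only four bytes are stored even though the peripheral offers five, because the transfer stops when the programmed count is exhausted; the CPU would learn of completion from the terminal count signal rather than by polling each byte.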
2. Block diagram for UART-DMA

Figure 2 Block diagram of DMA controller

ABOUT ARCHITECTURE

The UART IP soft core is designed here using DMA transmission. Its overall architecture is shown in Figure 2. The entire UART IP soft core in DMA mode mainly includes the following 5 sub-modules: UART send controller, UART receive controller, register file with the Avalon-MM Slave interface, Master Read type DMA controller with the Avalon-MM Master interface, and Master Write type DMA controller with the Avalon-MM Master interface.

When the NIOSII processor sends data through the serial port, it must first configure the UART send controller and the Master Read type DMA controller through the register file with the Avalon-MM Slave interface, setting the baud rate of the serial port, the number of bytes of data to be sent and the base address of the data stored in memory. Second, it writes the data to be sent to the specified location in memory and then starts the Master Read type DMA controller, so that the data stored in memory is sent out byte by byte through the UART send controller. When all the data has been sent, an interrupt is generated to the NIOSII processor to inform it that the transmission of serial data is completed, so that it can start the next data transmission. Since the whole sending process is managed by the Master Read type DMA controller, the NIOSII processor can concentrate on other work without being disturbed, and the utilization of the NIOSII CPU increases greatly.

When the NIOSII processor needs to receive data through the serial port, it must first configure the UART receive controller and the Master Write type DMA controller through the register file with the Avalon-MM Slave interface, setting the baud rate of the serial port, the number of bytes of data to be received and the base address at which the data is stored in memory. Second, it starts the Master Write type DMA controller, so that the data received through the UART receive controller is stored byte by byte at the specified location in memory. When all the data has been received, an interrupt is generated to the NIOSII processor to inform it that the reception of serial data is completed, so that it can read the received data from memory for processing and start the next data reception. Since the whole process of data reception is managed by the Master Write type DMA controller, the NIOSII processor can concentrate on other work without being disturbed, and the utilization of the NIOSII CPU increases substantially.

3. DESCRIPTION OF MODULES

A. UART Tx controller Design:

The transmission of the serial port uses the basic frame format. First, the line is driven low for the start bit; then, under the control of the baud rate clock, the 8 data bits are sent from D0 to D7; finally, the line is driven high for the stop bit. In this paper, a UART send controller is designed as a finite state machine in the Verilog HDL hardware description language, which completes the timing control of the serial port data transmission. Its state transition diagram is shown in Figure 3.

Figure 3 The state transition diagram of the UART sending controller

As can be seen from Figure 3, the state machine is initially in the idle state. When the Master Read type DMA controller starts a data transmission, the state machine moves into the data_valid state. The three states data_valid, read_fifo and load are mainly used to fetch a byte that the Master Read type DMA controller has read from memory, splice it together with the start bit and stop bit, and move it into the transmit shift register. After that, the state machine enters the send state. The two states send and finish are mainly used to send the data in the transmit shift register bit by bit under the control of the serial port baud clock. When the process of sending the data in the transmit shift register is completed, the state machine enters the block_finish state.
In the block_finish state, the send state machine checks the number of data bytes that have been sent out. If this number is less than the number of bytes that should be sent, not all the data has been sent, so the count is incremented by 1 and the state machine enters the data_valid state to read and send the next data byte. The state machine does not enter the master_done state until all the data bytes have been sent out. In the master_done state, the state machine detects whether this DMA transfer is completed. If it is, the state machine generates an interrupt signal and enters the idle state. At this point, a full serial data transmission in DMA transfer mode is complete.

B. UART receiver controller Design:

The reception of the serial port uses the basic frame format. First, the low level of the start bit is detected. Then the data byte is received bit by bit under the control of the baud rate clock. Finally, the high level of the stop bit is received. In this paper, a UART receive controller is designed as a finite state machine in the Verilog HDL hardware description language, which completes the timing control of the serial port data reception. Its state transition diagram is shown in Figure 4.

Figure 4 The state transition diagram of the UART receiver controller

As can be seen from Figure 4, the state machine is in the idle state at the beginning. When the Master Write type DMA controller is started to conduct a reception of data, the state machine enters the start state. The two states start and ready are mainly used to clear the receiver shift register and the bit counter and to prepare for receiving a byte of data. When it is ready, the state machine enters the recv state. The two states recv and finish are mainly used to receive the byte of data bit by bit under the control of the serial port baud rate clock and store it in the reception shift register. When the reception of a byte of data is completed, the state machine enters the load state. The two states load and buffer_ready are mainly used to move the data byte to the Master Write type DMA controller, which completes the write of the byte to memory. Then the state machine enters the block_finish state. In this state, the state machine checks the number of data bytes that have been received. If this number is less than the number of bytes that should be received, not all the data has been received, so the count is incremented by 1 and the state machine enters the ready state to receive the next data byte and send it to the Master Write type DMA controller. The state machine does not enter the master_done state until all the data bytes have been received. The two states master_done and get_done detect whether this DMA transfer is completed. If it is, the state machine generates an interrupt signal and enters the idle state. At this point, a full serial data reception in DMA transfer mode is complete.

C. The design of the register file with the interface of Avalon-MM Slave:

The AVALON bus, an open interconnect bus, can be used to connect master (main) peripherals and slave (minor) peripherals. A master peripheral can initiate bus transfers on the AVALON bus, while a slave peripheral can only respond to bus transfers. A master peripheral connects to the AVALON bus using the Avalon-MM Master interface, while a slave peripheral uses the Avalon-MM Slave interface. The register file designed in this paper is a peripheral with the Avalon-MM Slave interface. It contains a total of four 32-bit registers, whose specific structure and function are shown in Table 1. The NIOSII processor accesses these 4 registers by base address plus address offset, and through them controls the configuration of the UART IP soft core in DMA mode as well as the reception and transmission of the serial data.

D. The design of the Master Read type of DMA controller with the interface of Avalon-MM Master:

The Master Read type DMA controller designed in this paper is a peripheral with an Avalon-MM Master port. It performs basic read transfers through the switching fabric between the Avalon-MM Master port and the AVALON bus, so that it can read the specified length of data from memory, starting at the specified address, and send it byte by byte to the UART send controller.

E. The design of the Master Write type of DMA controller with the interface of Avalon-MM Master:

The Master Write type DMA controller designed in this paper is a peripheral with an Avalon-MM Master port. It performs basic write transfers through the switching fabric between the Avalon-MM Master port and the AVALON bus, so that it can continuously store the specified length of data received from the UART receive controller in memory, starting at the specified address.
4. Results & Conclusions

The entire UART IP soft core in DMA mode is designed with the following 5 sub-modules: UART send controller, UART receive controller, register file with the Avalon-MM Slave interface, Master Read type DMA controller with the Avalon-MM Master interface, and Master Write type DMA controller with the Avalon-MM Master interface. The write and read operations of the UART IP soft core based on DMA mode are simulated using the Modelsim tool, and the design is synthesized using the Xilinx tool. From the reports it can be concluded that the proposed UART design achieves better speed performance, up to the GHz range. It is therefore a cost effective solution where cost is the major issue. Figures 5 and 6 show the simulation results of the read and write operations of the designed DMA cycle. Figures 7 and 8 show the block diagram and the RTL schematic of the designed DMA unit.

Figure 5 Simulation Result for Top Module Read operation

Figure 6 Simulation Result for Top module Write operation

Figure 7 Block diagram of UART_DMA soft core

Figure 8 RTL schematic of UART_DMA soft core

Acknowledgements

The authors would like to thank the anonymous reviewers for their comments, which were very helpful in improving the quality and presentation of this paper.

Authors Profile:

Jhansirani Koti is pursuing her M. Tech at Chirala Engineering College, Chirala, in the department of Electronics & Communications Engineering (ECE), with specialization in VLSI & Embedded systems.

G. Mani Kumar is working as an Assistant Professor in the Department of ECE in CEC Chirala. He has 5 years of teaching experience and 1 year of industrial experience in various organizations.