VENSOFT Technologies, www.ieeedeveloperslabs.in Email: info@ieeedeveloperslabs.in Contact: 9448847874 VLSI PROJECTS-2013-2014 1. An Image Steganography Technique using X-Box Mapping Abstract: Image Steganography is a method of concealing information into a cover image to hide it. Least Significant-Bit (LSB) based approach is most popular steganographic techniques in spatial domain due to its simplicity and hiding capacity. This paper presents a novel technique for Image Steganography based on LSB using X-box mapping where we have used several Xboxes having unique data. The embedding part is done by this Steganography algorithm where we use four unique X-boxes with sixteen different values (represented by 4- bits) and each value is mapped to the four LSBs of the cover image. This mapping provides sufficient security to the payload because without knowing the mapping rules no one can extract the secret data (payload). 2. FPGA Implementation of high speed 8-bit Vedic multiplier using barrel shifter Abstract- This paper describes the implementation of an 8-bit Vedic multiplier enhanced in terms of propagation delay when compared with conventional multiplier like array multiplier, Braun multiplier, modified booth multiplier and Wallace tree multiplier. In our design we have utilized 8-bit barrel shifter which requires only one clock cycle for ‘n’ number of shifts. The design is implemented and verified using FPGA and ISE Simulator. The core was implemented on Xilinx Spartan-6 family xc6s1x75T-3-fgg676 FPGA. The propagation delay comparison was extracted from the synthesis report and static timing report as well. The design could achieve propagation delay of 6.781ns using barrel shifter in base selection module and multiplier 3. Design of Low Power TPG Using LP-LFSR Abstract- This paper presents a novel test pattern generator which is more suitable for built in self test (BIST) structures used for testing of VLSI circuits. The objective of the BIST is to reduce power dissipation without affecting the fault coverage. The proposed test pattern generator reduces the switching activity among the test patterns at the most. In this approach, the single input change patterns generated by a counter and a gray code generator are Exclusive–ORed with the seed generated by the low power linear feedback shift register [LP-LFSR]. The proposed scheme is evaluated by using, a synchronous pipelined 4x4 and 8x8 Braun array multipliers. The System-On-Chip (SOC) approach is adopted for implementation on Altera Field Programmable Gate Arrays (FPGAs) based SOC kits with Nios II soft-core processor. From the implementation results, it is verified that the testing power for the proposed method is reduced by a significant percentage. 4. An Implementation of AES Algorithm Based on FPGA Abstract—An implementation of high speed AES algorithm based on FPGA is presented in this paper in order to improve the safety of data in transmission. The mathematic principle, encryption process and logic structure of AES algorithm are introduced. So as to reach the purpose of improving the system computing speed, the pipelining and parallel processing methods were used. The simulation results VENSOFT Technologies, www.ieeedeveloperslabs.in Email: info@ieeedeveloperslabs.in Contact: 9448847874 VENSOFT Technologies, www.ieeedeveloperslabs.in Email: info@ieeedeveloperslabs.in Contact: 9448847874 show that the high-speed AES encryption algorithm implemented correctly. Using the method of AES encryption the data could be protected effectively. 5. Speed optimization of a FPGA based modified Viterbi Decoder Abstract — In the modern era of electronics and communication decoding and encoding of any data(s) using VLSI technology requires low power, less area and high speed constrains. The viterbi decoder using survivor path with necessary parameters for wireless communication is an attempt to reduce the power and cost and at the same time increase the speed compared to normal decoder. This paper presents three objectives. Firstly, an orthodox viterbi decoder is designed and simulated. For faster process application, the Gate Diffused Input Logic (GDIL) based viterbi decoder is designed using Xilinx ISE, simulated and synthesized successfully. The new proposed GDIL viterbi provides very less path delay with low power simulation results. Secondly, the GDIL viterbi is again compared with our proposed technique, which comprises a Survivor Path Unit (SPU) implements a trace back method with DRAM. This proposed approach of incorporating DRAM stores the path information in a manner which allows fast read access without requiring physical partitioning of the DRAM. This leads to a comprehensive gain in speed with low power effects. Thirdly, all the viterbi decoders are compared, simulated, synthesized and the proposed approach shows the best simulation and synthesize results for low power and high speed application in VLSI design. The Add-Compare-Select (ACS) and Trace Back (TB) units and its sub circuits of the decoder(s) have been operated in deep pipelined manner to achieve high transmission rate. Although the register exchange based survivor unit has better throughput when compared to trace back unit, but in this paper by introducing the RAM cell between the ACS array and output register bank, a significant amount of reduction in path delay has been observed. All the designing of viterbi is done using Xilinx ISE 12.4 and synthesized successfully in the FPGA Spartan-3 target device operated at 64.516 MHz clock frequency, reduces almost 41% of total path delay. 6. Traffic-aware Design of a High Speed FPGA Network Intrusion Detection System Abstract—Security of today’s networks heavily rely on Network Intrusion Detection Systems (NIDSs). The ability to promptly update the supported rule sets and detect new emerging attacks makes Field Programmable Gate Arrays (FPGAs) a very appealing technology. An important issue is how to scale FPGA-based NIDS implementations to ever faster network links. Whereas a trivial approach is to balance traffic over multiple, but functionally equivalent, hardware blocks, each implementing the whole rule set (several thousands rules), the obvious cons is the linear increase in the resource occupation. In this work, we promote a different, traffic-aware, modular approach in the design of FPGA-based NIDS. Instead of purely splitting traffic across equivalent modules, we classify and group homogeneous traffic, and dispatch it to differently capable hardware blocks, each supporting a (smaller) rule set tailored to the specific traffic category. We implement and validate our approach using the rule set of the well known Snort NIDS, and we experimentally investigate the emerging trade-offs and advantages, showing resource savings up to 80% based on real world traffic statistics gathered from an operator’s backbone. 7. Design and Implementation of Automated Wave-Pipelined Circuit using ASIC Abstract— Wave-pipelining enables a digital circuit to be operated at higher frequency. In the literature, only trial and error and manual procedures are adopted for the choice of the optimum value of clock and clock skew between the I/O registers of wave-pipelined circuits. The major contribution of this paper is the proposal for automating the above procedure for the ASIC implementation of wave- VENSOFT Technologies, www.ieeedeveloperslabs.in Email: info@ieeedeveloperslabs.in Contact: 9448847874 VENSOFT Technologies, www.ieeedeveloperslabs.in Email: info@ieeedeveloperslabs.in Contact: 9448847874 pipelined circuits using built in self test approach. This is studied by a multiplier using dedicated AND gate by adopting three different schemes: wave-pipelining, pipelining and non-pipelining. From the implementation results, it is verified that the wavepipelined multipliers are faster by a factor of 1.08 compared to the non-pipelined multipliers. The wavepipelined Multiplier dissipates less power in the factor of 1.43 compared to the pipelined multiplier. 8. FPGA-based adaptive noise cancellation for ultrasonic NDE application Abstract— Adaptive filter has been widely used in different applications for interference cancellation, predication, inverse modeling and identifications. In this paper, Field Programmable Gate Array (FPGA)based adaptive noise cancellation is studied for adaptive filtering in ultrasonic non-destructive evaluation. Simulation and experimental results showed that backscattered noise from microstructures inside material can be efficiently reduced by adaptive filter. Additionally, four different architectures of filter realization on FPGA are discussed and compared. This type of study could have a broad range of applications such as target detection, object localization and pattern recognition. 9. FPGA Based Implementation of a Double Precision IEEE Floating-Point Adder Abstract-Floating-Point addition imposes a great challenge during implementation of complex algorithm in hard realtime due to the enormous computational burden associated with repeated calculations with high precision numbers. Moreover, at the hardware level, any basic addition or subtraction circuit has to incorporate the alignment of the significands. This paper presents a novel technique to implement a double precision IEEE floating-point adder that can complete the operation within two clock cycles. The proposed technique has exhibited improvement in the latency and also in the operational chip area management. The proposed double precision IEEE floating-point adder has been implemented with XC2V6000 and XC3SI500 Xilinx© FPGA devices. 10. Least Complex S-Box and Its Fault Detection for Robust Advanced Encryption Standard Algorithm Abstract- Advanced Encryption Standard (AES) is the symmetric key standard for encryption and decryption. In this work, a 128-bit AES encryption and decryption using Rijndael Algorithm is designed and synthesized using verilog code. The fault detection scheme for their hardware implementation plays an important role in making the AES robust to the internal and malicious faults. In the proposed AES, a composite field S-Box and inverse S-Box is implemented using logic gates and divided them into five blocks. Any natural or malicious faults which defect the logic gates are detected using parity based fault detection scheme. For increasing the fault exposure, the predicted parities of each of the block S-box and inverse S-box are obtained. The multi-bit parity prediction approach has low cost and high error coverage than the approaches using single bit parities. The Field Programmable Gate Array (FPGA) implementation of the fault detection structure has better hardware and time complexities. 11. An Application Instance of FPGA in the Field of PHM Abstract-With the extensive application of new types of sensors, the rate of data acquisition and transmission are becoming a problem in Prognostic and Health Management (PHM). This paper presents an FPGA-based method for high speed data acquisition and transmission taking full advantage of the parallel processing capabilities and easy to modify and upgrade features of FPGA, to cope with the VENSOFT Technologies, www.ieeedeveloperslabs.in Email: info@ieeedeveloperslabs.in Contact: 9448847874 VENSOFT Technologies, www.ieeedeveloperslabs.in Email: info@ieeedeveloperslabs.in Contact: 9448847874 difficulties of highspeed data acquisition and transmission. This article focuses on the components performance and characteristics of the FPGA based fault prediction and diagnosis system. The system uses an FPGA chip as the core operational component to achieve fast transfer of large amounts of data by FIFO. The use of programmable FPGA hardware, makes the data acquisition system design flexible, while improving the reliability of the system. It is also important for achieving the real-time online monitoring of failure. 12. A Moving Window Architecture for a HW/SW Codesign Based Canny Edge Detection for FPGA Abstract - This paper proposes an accelerator for Canny edge detection implemented on FPGA. The proposed architecture relies on a moving window consisting of 7x8 pixels, which performs the more computational complex operations of the algorithm: smoothing, gradient’s magnitude and direction computation, nonmaximum suppression and double thresholding. By employing the proposed window, intermediate results are stored within the FPGA, without the need to buffer them in large memory structures. Furthermore, the design has a high throughput rate, due to its large numbers of pipeline stages, allowing considerable performance for the proposed algorithm. 13. An Electrocardiogram Diagnostic System Implemented in FPGA Abstract—In this paper, we present a signal processing method capable of detecting angina in electrocardiograms, and its implementation in Field Programmable Gate Array - FPGA. The adopted procedure is based on fuzzy clustering to reduce the amount of data sampling and a comparison with samples from a previously established database. By using the correlation method on the samples, it is possible to establish an initial indication of angina. The reduced number of samples of the clustering process turns the processing simpler and allows its hardware (FPGA) implementation. According to the tests conducted, the method achieves 85% correct diagnoses. 14. Enhanced Area Effi cient Architecture for 128 bit Modified CSLA Abstract- In the design of Integrated circuits, area occupancy plays a vital role because of increasing necessity of portable systems. Carry Select Adder (CSLA) is a fast adder used in dataprocessing processors for performing fast arithmetic functions. From the structure of the CSLA, the scope is to reduce the area of CSLA based on the efficient gate-level modification. In this paper 128 bit Regular Linear CSLA, Modified Linear CSLA, Regular Square-root CSLA (SQRT CSLA) and Modified SQRT CSLA architectures have been developed and compared. However, the Regular CSLA is still area-consuming due to the dual RippleCarry Adder (RCA) structure. For reducing area, the CSLA can be implemented by using a single RCA and an add-one circuit instead of using dual RCA. Comparing the Regular Linear CSLA with Regular SQRT CSLA, the Regular SQRT CSLA has reduced area as well as comparing the Modified Linear CSLA with Modified SQRT CSLA; the Modified SQRT CSLA has reduced area. The results and analysis show that the Modified Linear CSLA and Modified SQRT CSLA provide better outcomes than the Regular Linear CSLA and Regular SQRT CSLA respectively. This project was aimed for implementing high performance optimized FPGA architecture. Modelsim 6.3c is used for simulating the CSLA and synthesized using Xilinx PlanAhead12.2. Then the implementation is done in spartan3 FPGA Kit. 15. PNOC: Implementation on Verilog for FPGA VENSOFT Technologies, www.ieeedeveloperslabs.in Email: info@ieeedeveloperslabs.in Contact: 9448847874 VENSOFT Technologies, www.ieeedeveloperslabs.in Email: info@ieeedeveloperslabs.in Contact: 9448847874 Abstract—Network on Chip (NoC) architectures provide a very efficient means for performance enhancement in digital circuits. The paper describes a NoC implementation that is specifically targeted towards FPGA based designs. Our implementation is based on a lightweight circuit-switched architecture called programmable NoC (PNoC). It is captured in the Verilog hardware description language and is implemented using the Xilinx Virtex-II pro FPGA (XC2Vp30-7) device at 126 MHz. The proposed architecture allows parametrization at the compile time for the number of nodes and amount of data. Moreover, experimental results have confirmed that the proposed implementation is the most efficient one in terms of performance. 16. Real-time FPGA-based Template Matching Module for Visual Inspection Application Abstract-Template matching enables localization of objects under inspection, but suffers from long computation due to high computation complexity. In this work, a real-time FPGA-based template matching module which accelerates time consuming normalized cross-correlation (NCC) template matching was presented. To improve NCC computation speed, we simplified the original NCC algorithm and designed pipelined parallel processing circuit architecture. Experimental results showed that our FPGA module accelerates NCC speed up to 80 times faster than PC performance. The real-time template matching module has been integrated into an LED die inspection system to localize LED dies. This realtime template matching module can be applied to LED, PV, and semiconductor inspection and manufacturing applications. Furthermore, this module can also be applied to vision applications for either service or industrial robots. 17. High-performance Implementation of a New Hash Function on FPGA Abstract— Skein has the advantages of higher security (resisting against traditional attacks), faster speed and selectable parameters. Therefore, it becomes a strong competitor for next generation secure hash algorithm standard (SHA-3) which will be used widely in communication and security for substitution of SHA-2. The problems of existing works lie in implementation for only one structure and lack detailed comparison of different structures. Based on analysis of the algorithm, we accomplished three structures (iterative, 4-unrolled and 8-unrolled) of Skein and ported the designs to FPGA respectively. Finally detailed analysis and comparison with our different structures and other implementations are provided from aspects of hardware resource and performance. The results show that our implementation has better performance and takes up less hardware resources than existing works under the same structure. Our implementation can meet the requirement of real-time and high performance field. 18. Improved Floating-Point Matrix Multiplier Abstract – Floating-point matrix multiplier is widely used in scientific computations. A great deal of efforts has been made to achieve higher performance. The matrix multiplication consists of many multiplications and accumulations. Yang and Duh proposed a modular design of floating-point matrix multiplier which reserving intermediate result as two vectors. It brings shorter delay but more cost. This work modifies Yang and Duh’s design with Booth encoding in multiplication to reduce the number of VENSOFT Technologies, www.ieeedeveloperslabs.in Email: info@ieeedeveloperslabs.in Contact: 9448847874 VENSOFT Technologies, www.ieeedeveloperslabs.in Email: info@ieeedeveloperslabs.in Contact: 9448847874 partial products. As the result, the improved floating-point matrix multiplier has better performance with shorter delay and much less hardware cost than Yang and Duh’s design. 19. An area efficient multiplexer based CORDIC Abstract—In the literature, multiplexer has been proposed for the ASIC implementation of unrolled CORDIC (Coordinate Rotation DIgital Computer) processor. In this paper, the efficacy of this approach is studied for the implementation on FPGA. For this study, both non pipelined and 2 level pipelined CORDIC with 8 stages and using two schemes – one using adders in all the stages and another using multiplexers in the second and third stages. A 16 bit CORDIC for generating the sine/cosine functions is implemented using all the four schemes on both Xilinx Virtex 6 FPGA(XC6VLX240) and Altera Cyclone II FPGA(EP2C20F484C7). From the implementation results, it is found that the nonpipelined and pipelined CORDICs using multiplexer requires 1.6, 1.4 times lower area in Xilinx FPGA and 1.8, 1.6 times lower area in Altera FPGA than that using only adders. This is achieved without reduction in speed. 20. Simulation and Implementation of LDPC Code in FPGA Abstract—The paper deals with implementation of Low-Density Parity-Check (LDPC) codes [1] in FPGAbased bridge for Free- Space Optical link. The coder was designed with a regular parity matrix for code rate 1/2. The matrix of dimension 8x16 for the experimental implementation was found using a random search in MATLAB. The main advantage of this matrix is the decoder can correct all single-bit errors. The simulation for all possible values shows that Bit Error Ratio (BER) is zero. This result was not obtained with other matrices. An experimental communication channel was realized with encoder and decoder implemented in FPGA Virtex 5 development board ML505. DIP switches are sources for information bits and these values are shown on LCD display. The bit-flipping method is used in decoder and result code word is shown in the second line on the LCD display. 21.FPGA based implementation of a fuzzy Neural network modular architecture For embedded systems This paper presents a FPGA based approach for a modular architecture of Fuzzy Neural Networks (FNN) to embed easily different topologies set up. The project is based on a Takagi – Hayashi (T-H) method for the construction and tuning of fuzzy rules, this is commonly referred as neural network driven fuzzy reasoning. The proposed architecture approach consists of two main configurable modules: a Multilayer Perceptron – MLP with sigmoidal activation function that composes the first module to determine a Fuzzy membership function; the second employs an MLP with pure linear activation function to define the consequents. The DSPBuilder® software along the Simulink® is used to connect, set and synthesize the Fuzzy Neural Network desired. Other hardware components employed in the architecture proposed cooperate to the system modularity. The system was tested and validated through a control problem and an interpolation problem. Several papers proposed different hardware architecture to implement hybrid systems by using Fuzzy logic and Neural Network. However, there is no approach with this specific neural network driven fuzzy reasoning by T-H method and the aim to be embedded. The SelfOrganizing Map (SOM) and Levenberg-Marquardt back propagation were used to train the FNN proposed off-line. VENSOFT Technologies, www.ieeedeveloperslabs.in Email: info@ieeedeveloperslabs.in Contact: 9448847874 VENSOFT Technologies, www.ieeedeveloperslabs.in Email: info@ieeedeveloperslabs.in Contact: 9448847874 22. Efficient Interleaver Design for MIMO-OFDM Based Communication Systems on FPGA In this paper, we present a memory-efficient and faster interleaver implementation technique for MIMO-OFDM communication systems on FPGA. The IEEE 802.16 standard is used as a reference for simulation, implementation, and analysis. A method for the interleaver design on FPGA and its memory utilization results are presented. Our design utilizes the minimum required on-chip memory for the interleaver implementation. Using the proposed interleaver design method, the data rates for MIMOOFDM based communication systems are doubled for 2×2 MIMO systems without using the transmit diversity. 23. A FPGA-Based Deep Packet Inspection Engine for Network Intrusion Detection System Abstract— Pattern matching has became a bottleneck of software based Network Intrusion Detection System (NIDS) as the number of signature have recently increased dramatically. Many FPGA based architectures for detecting malicious patterns have been proposed recently. However, these approaches have just considered matching pattern separately while more and more complex combination of several patterns are utilized to describe intrusion activities. In this paper we present our work which concentrates on multi-pattern signature and propose a FPGA based deep packet inspection engine for NIDS. The system can support both static and dynamic patterns. We employ Snort signature set and realize our system on Net FPGA platform. The evaluation on real network environment shows that our system can maintain gigabit line rate throughput without dropping packets. 24. FPGA Implementation of 16-Point Radix-4 Complex FFT Core Using NEDA Abstract--NEDA is one of the techniques to implement many digital signal processing systems that require multiply and accumulate units. FFT is one of the most employed blocks in many communication and signal processing systems. This paper proposes FPGA implementation of a 16 point radix-4 complexFFT core using NEDA. The proposed design has improvement in terms of hardware utilization compared to traditional methods. The design has been implemented on a range of FPGAs to compare the performance. The proposed design has a power consumption of 728.89 mW on XC2VP100-6FF1704 FPGA at 50 MHz. The maximum frequency achieved is 114.27 MHz on XC5VLX330-2FF1760 FPGA at a cost of higher power and the maximum throughput observed is 1828.32 Mbit/s and minimum slice delay product observed is 9.18. The design is also implemented using synopsys DC synthesis for both 65 nm and 180 nm technology libraries. 25. A High-Performance FPGA-Based Implementation of the LZSS Compression Algorithm The increasing growth of embedded networking applications has created a demand for highperformance logging systems capable of storing huge amounts of high-bandwidth, typically redundant data. An efficient way of maximizing the logger performance is doing a real-time compression of the logged stream. In this paper we present a flexible high-performance implementation of the LZSS compression algorithm capable of processing up to 50 MB/s on a Virtex-5 FPGA chip. We exploit the independently addressable dual-port block RAMs inside the FPGA chip to achieve an average performance of 2 clock cycles per byte. To make the compressed stream compatible with the ZLib library [1] we encode the LZSS algorithm output using a fixed Huffman table defined by the Deflate specification [2]. We also demonstrate how changing the amount of memory allocated to various internal tables impacts the performance and compression ratio. Finally, we provide a cycle-accurate VENSOFT Technologies, www.ieeedeveloperslabs.in Email: info@ieeedeveloperslabs.in Contact: 9448847874 VENSOFT Technologies, www.ieeedeveloperslabs.in Email: info@ieeedeveloperslabs.in Contact: 9448847874 estimation tool that allows finding a trade-off between FPGA resource utilization, compression ratio and performance for a specific data sample. 26. FPGA Implementation of heterogeneous multicore platform with SMID/MIMD custom accelerators 27. Novel High Speed Vedic Mathematics Multiplie using Compressors Abstract—With the advent of new technology in the fields of VLSI and communication, there is also an ever growing demand for high speed processing and low area design. It is also a well known fact that the multiplier unit forms an integral part of processor design. Due to this regard, high speed multiplier architectures become the need of the day. In this paper, we introduce a novel architecture to perform high speed multiplication using ancient Vedic maths techniques. A new high speed approach utilizing 4:2 compressors and novel 7:2 compressors for addition has also been incorporated in the same and has been explored. Upon comparison, the compressor based multiplier introduced in this paper, is almost two times faster than the popular methods of multiplication. With regards to area, a 1% reduction is seen. The design and experiments were carried out on a Xilinx Spartan 3e series of FPGA and the timing and area of the design, on the same have been calculated. 28. FPGA Implementation of BASK-BFSK-BPSK Digital Modulators Abstract: Field-programmable gate-array (FPGA) implementations of binary amplitude-shift keying (BASK), binary frequency shift keying (BFSK), and binary phase-shift keying (BPSK) digital modulators are presented. The proposed designs are aimed at educational purposes in a digital communication course. They employ the minimum number of blocks necessary for achieving BASK, BFSK, and BPSK modulation, and for full integration with the other functional parts of the Altera Development and Education (DE2) FPGA board. The input carrier signal and the bit stream (modulating signal) are user controllable. These digital modulators were developed and compiled to a Verilog Hardware Description Language (HDL) netlist, and were later implemented into an Altera DE2 FPGA board. The functionality of these digital modulators was demonstrated through simulations using the Quartus II simulation software, and experimental measurements of the real-time modulated signal via an oscilloscope. 29. Design and Implementation of Adaptive filtering algorithm for Noise Cancellation in speech signal on FPGA Abstract- In recent years FPGA systems are replacing dedicated Programmable Digital Signal Processor (PDSP) systems due to their greater flexibility and higher bandwidth, resulting from their parallel architecture. This paper presents the applicability of a FPGA system for speech processing. Here adaptive filtering technique is used for noise cancellation in speech signal. Least Mean Squares (LMS ) , one of the widely used algorithm in many signal processing environment , is implemented for adaption of the filter coefficients. The cancellation system is implemented in VHDL and tested for noise cancellation in speech signal. The simulation of VHDL design of adaptive filter is performed and analyzed on the basis of Signal to Noise ratio (SNR) and Mean Square Error (MSE). 30. A Programmable FPGA-based 8-Channel Arbitrary Waveform Generator for Medical Ultrasound Research Activities VENSOFT Technologies, www.ieeedeveloperslabs.in Email: info@ieeedeveloperslabs.in Contact: 9448847874 VENSOFT Technologies, www.ieeedeveloperslabs.in Email: info@ieeedeveloperslabs.in Contact: 9448847874 Abstract: In modern ultrasound imaging systems, digital transmit beamformer module typically generates accurate control of the amplitude of individual elements in a multielement array probe, as well as of the time delays and phase between them, to enable the acoustic beam to be focused and/or steered electronically. However, these systems do not provide the ultrasound researchers access to transmit front-end module. This paper presents the development of a digital transmit beamformer system for generating simultaneous arbitrary waveforms, specifically designed for research purposes. The proposed architecture has 8 independent excitation channels and uses an FPGA (Field Programmable Gated Array) device for electronic steering and focusing of ultrasound beam. The system allows operation in pulse-echo mode, with pulse repetition rate of excitation from 62.5 Hz to 8 kHz, center frequency from 500 kHz to 20 MHz, excitation voltage over 100 Vpp, and individual control of amplitude apodization, phase angle and time delay trigger. Experimental results show that this technique is suitable for generating the excitation waveforms needed for medical ultrasound imaging researches. 31. Area-Time Efficient Scaling-Free CORDIC Using Generalized Micro-Rotation Selection Abstract—This paper presents an area-time efficient CORDIC algorithm that completely eliminates the scale-factor. By suitable selection of the order of approximation of Taylor series the proposed CORDIC circuit meets the accuracy requirement, and attains the desired range of convergence. Besides we have proposed an algorithm to redefine the elementary angles for reducing the number of CORDIC iterations. A generalized micro-rotation selection technique based on high speed most-significant-1-detection obviates the complex search algorithms for identifying the micro-rotations. The proposed CORDIC processor provides the flexibility to manipulate the number of iterations depending on the accuracy, area and latency requirements. Compared to the existing recursive architectures the proposed one has 17% lower slice-delay product on Xilinx Spartan XC2S200E device. 32. ADPLL Design and Implementation on FPGA Abstract:-This paper presents the ADPLL design using Verilog and its implementation on FPGA. ADPLL is designed using Verilog HDL. Xilinx ISE 10.1 Simulator is used for simulating Verilog Code.This paper gives details of the basic blocks of an ADPLL. In this paper, implementation of ADPLL is described in detail. Its simulation results using Xilinx are also discussed. It also presents the FPGA implementation of ADPLL design on Xilinx vertex5 xc5vlx110t chip and its results. The ADPLL is designed of 200 kHz central frequency. The operational frequency range of ADPLL is 189 Hz to 215 kHz, which is lock range of the design. 33. Implementation and Comparison of Effective Area Efficient Architectures for CSLA Abstract- In the design of Integrated circuits, area occupancy plays a vital role because of increasing necessity of portable systems. Carry Select Adder (CSLA) is a fast adder used in data processing processors for performing fast arithmetic functions. From the structure of the CSLA, the scope is to reduce the area of CSLA based on the efficient gate-level modification. In this paper 128 bit Regular Linear CSLA, Modified Linear CSLA, Regular Square-root CSLA (SQRT CSLA) and Modified SQRT CSLA architectures have been developed and compared. However, the Regular CSLA is still area-consuming due to the dual RippleCarry Adder (RCA) structure. For reducing area, the CSLA can be implemented by using a single RCA and an add-one circuit instead of using dual RCA. Comparing the Regular Linear CSLA with Regular SQRT CSLA, the Regular SQRT CSLA has reduced area as well as comparing the Modified Linear CSLA with Modified SQRT CSLA; the Modified SQRT CSLA has reduced area. The results and VENSOFT Technologies, www.ieeedeveloperslabs.in Email: info@ieeedeveloperslabs.in Contact: 9448847874 VENSOFT Technologies, www.ieeedeveloperslabs.in Email: info@ieeedeveloperslabs.in Contact: 9448847874 analysis show that the Modified Linear CSLA and Modified SQRT CSLA provide better outcomes than the Regular Linear CSLA and Regular SQRT CSLA respectively. This project was aimed for implementing high performance optimized FPGA architecture. Modelsim 6.3c is used for simulating the CSLA and synthesized using Xilinx PlanAhead12.2. Then the implementation is done in spartan3 FPGA Kit. 34. Automatic Gain Control on FPGA for Software- Defined Radios Abstract— This paper introduces an efficient implementation of the Automatic Gain Control (AGC) for an IEEE 802.15.3c compliant receiver developed using a Field-Programmable Gate Arrays (FPGAs), both feed-forward and feed-backward AGCs are designed, implemented and evaluated. 35. Design of Plural-Multiplier Based on CORDIC Algorithm for FFT Application Abstract—CORDIC plural-multiplier is the key module to affecting the speed and accuracy of FFT processor. Considering these demands, the problem of CORDIC algorithm is discussed in detail and the according optimization methods are given in this paper. Then, the hardware pipelining structure of the CORDIC multiplier is put forward. Comparison results about RTL simulation results with MATLAB calculation indicate that the design is feasible and practical. 36. VLSI Friendly ECG QRS Complex Detector for Body Sensor Networks Abstract—This paper aims to present a very-large-scale integration (VLSI) friendly electrocardiogram (ECG) QRS detector for body sensor networks. Baseline wandering and background noise are removed from original ECG signal by a mathematical morphological method. Then the multipixel modulus accumulation is employed to act as a low-pass filter to enhance the QRS complex and improve the signal-to-noise ratio. The performance of the algorithm is evaluated with standard MIT-BIH arrhythmia database and wearable exercise ECG Data. Corresponding power and area efficient VLSI architecture is designed and implemented on a commercial nano-FPGA. High detection rate and high speed demonstrate the effectiveness of the proposed detector. 37. FPGA Architecture for OFDM Software Defined Radio with an optimized Direct Digital Frequency Synthesizer Abstract— A Software Defined Radio (SDR) is a transmitter and receiver system that uses digital signal processing (DSP) for coding, decoding, modulating, and demodulating data. This paper presents the framework for hardware implementation of SDR using Orthogonal Frequency Division Multiplexing (OFDM). The framework comprises of VLSI mapping of algorithms, Orthogonal Frequency Division Multiplexing(OFDM), Quadrature Phase Shift Keying (QPSK), Fast Fourier Transform (FFT) Algorithms and most importantly, the algorithm for Direct Digital Frequency Synthesis (DDFS). A digital frequency synthesizer with optimized time and area resources has been proposed for the SDR. This VLSI implementation of the DDFS computes the sine and cosine function on a single edge of clock, thus proving to be optimized in terms of area and speed. Fixed-Point implementation was accomplished with ModelSim simulator. Verilog HDL was used as a description language for mapping Algorithms in VLSI. Xilinx Spartan 3 XC3S200 Field Programmable Gate Array (FPGA) was chosen as a Hardware Platform for the System Implementation. VENSOFT Technologies, www.ieeedeveloperslabs.in Email: info@ieeedeveloperslabs.in Contact: 9448847874 VENSOFT Technologies, www.ieeedeveloperslabs.in Email: info@ieeedeveloperslabs.in Contact: 9448847874 38. FPGA based Area optimized and efficient Architecture for NMS and Thresholding used for Canny Edge Detector Abstract—— In this paper, we present an architecture for Non Maximal Suppression used in Canny edge detection algorithm that results in significantly reduced memory requirements decreased latency and increased throughput with no loss in edge detection. The new algorithm uses a low-complexity 8-bin non-uniform gradient magnitude histogram to compute block-based hysteresis thresholds that are used by the Canny edge detector. Furthermore, FPGA-based hardware architecture of our proposed algorithm is presented in this paper and the architecture is synthesized on the Xilinx Virtex 5 FPGA. The design development is done in VHDL and simulates the results in modelsim 6.3 using Xilinx 12.2. 39. Self-Programmable Multipurpose Digital Filter Design Based on FPGA Abstract: Conventional digital filter can only obtain fixed frequency domain characteristics at a time. In order to obtain variable characteristics, the digital filter’s type, number of taps and coefficients should be changed constantly such that the desired frequency-domain characteristics can be obtained. This paper proposes method for self-programmable Variable Digital Filter (VDF) design based on FPGA. Taking finite impulse response (FIR) digital filter as an example, we implement a digital filter system by using Custom embedded Micro-Processor, programmable FIR macro module, coefficient-loader, clock manager and A/D or D/A controller and other modules. The self-programmable VDF can provide the best solution for realization of digital filter algorithms, which are the low-pass, high-pass, band-pass and band-stop filter algorithms with variable frequency domain characteristics. The design examples with minimum 1 to maximum 32 taps FIR filter, based on Modelsim post-routed simulation and onboard running on XUPV5-LX110T, are provided to demonstrate the effectiveness of the proposed method and the online-adaptability of variable digital filters. 40. FPGA-based reconfigurable control for fault tolerant Back-to-back converter without redundancy Abstract— In this paper, an FPGA-based fault tolerant back-to-back converter without redundancy is studied. Before fault occurrence, the fault tolerant converter operates like a conventional back-to-back six-leg converter and after the fault it becomes a five-leg converter. Design, implementation and experimental verification of an FPGA-based reconfigurable control strategy for this converter are discussed. This reconfigurable control strategy allows the continuous operation of the converter with minimum affection from a fault in one of the semiconductor switches. A very fast detection scheme is used to detect and locate the fault. Implementation of the fault detection and of the fully digital control schemes on a single FPGA is realized, based on a suited methodology for rapid prototyping. FPGA in loop and also experimental tests are carried out and the results are presented. These results confirm the capability of the proposed reconfigurable control and fault tolerant structure. 41. Reliable On-Chip Network Design Using an Agent-based Management Method Abstract—As the complexity of evolving integrated circuits and the number of cores in each chip increase, reliability aspects are becoming an important issue in complex chip designs. In this paper, we present an on-chip network architecture that incorporates a novel agent-based management method to enhance the reliability and performance of network-based Chip Multi-Processor (CMP) and System-on Chip (SoC) designs against faulty links and routers. In addition, to utilize the fault information required for the routing process in a scalable manner, we classify the fault information to be exploited in the VENSOFT Technologies, www.ieeedeveloperslabs.in Email: info@ieeedeveloperslabs.in Contact: 9448847874 VENSOFT Technologies, www.ieeedeveloperslabs.in Email: info@ieeedeveloperslabs.in Contact: 9448847874 proposed distributed and hierarchical management structure. The experimental results show that the proposed architecture incurs only a small hardware overhead. 42. Advanced Cryptographic System for data Encryption and Decryption Abstract- in this paper, we mainly focus on the implementation of Advanced Cryptographic System using two crypto algorithms in a single chip to provide high security and high performance. This paper proves the confidentiality of data over insecure medium. The two algorithms namely 128-bit AES and RSA implemented in a single chip prove difficult for the hacker to crack the system. It combines two transformations of AES algorithms which achieves high speed and less area on chip. This system generates keys internally, which also achieves high security and high performance with faster execution of the algorithm. 43. Mapping FPGA to Field Programmable neural Network Array (FPNNA) Abstract: My paper presents the implementation of a generalized back-propagation multilayer perceptron (MLP) architecture, on FPGA, described in VLSI hardware description language (VHDL). The development of hardware platforms is not very economical because of the high hardware cost and quantity of the arithmetic operations required in online artificial neural networks (ANNs), i.e., general purpose ANNs with learning capability. Besides, there remains a dearth of hardware platforms for design space exploration, fast prototyping, and testing of these networks. Our general purpose architecture seeks to fill that gap and at the same time serve as a tool to gain a better understanding of issues unique to ANNs implemented in hardware, particularly using field programmable gate array (FPGA).This work describes a platform that offers a high degree of parameterization, while maintaining generalized network design with performance comparable to other hardware-based MLP implementations. Application of the hardware implementation of ANN with back-propagation learning algorithm for a realistic application is also presented. 44. Design and Implementation of Reed Solomon Decoder for 802.16 Network using FPGA Abstract—This paper presents a design and FPGA implementation of a reconfigurable FEC Decoder based on Reed Solomon Code for WiMax Network. The implementation, written in Very High Speed hardware description Language (VHDL) is based on Berlekamp Massey, Forney and Chein Algorithm. The 802.16 network standards recommend the use of Reed-Solomon code RS (255,239), which is implemented and discussed in this paper. It is targeted to be applied in a forward error correction system based on 802.16 network standard to improve the overall performance of the system. The objective of this work is to implement a Reed- Solomon VHDL code to measure the performance of the RS Decoder on Xilinx Virtex II pro (xc2vp50- 5-ff1148) and Xilinx Spartan 3e (xc3s500e-4-fg320) FPGA Them performance of the implemented RS codec on both FPGAs will be compared .The performance metrics to be used are the area occupied by the design and the speed at which the design can run. 45. Wavelet-Based SC-FDMA System ABSTRACT: Recent research has shown that the Single Carrier Frequency Division Multiple Access (SCFDMA) is an attractive technology for uplink broadband wireless communications, because it does not have the problems of Orthogonal Frequency Division Multiple Access (OFDMA) such as the large Peakto-Average Power Ratio (PAPR). In this paper, an efficient transceiver scheme for the SC-FDMA systems, using the wavelet transform is proposed. In the proposed scheme, the Fast Fourier Transform (FFT) and VENSOFT Technologies, www.ieeedeveloperslabs.in Email: info@ieeedeveloperslabs.in Contact: 9448847874 VENSOFT Technologies, www.ieeedeveloperslabs.in Email: info@ieeedeveloperslabs.in Contact: 9448847874 its inverse (IFFT) are replaced by the Discrete Wavelet Transform (DWT) and its inverse (IDWT). Wavelet filter banks at the transmitter and the receiver have the ability to reduce distortion in the reconstructed signals, while retaining all the significant features present in the signals. The performance of the proposed scheme is investigated with different wireless channels. Simulation results show that the proposed scheme provides better performance, when compared to the conventional SC-FDMA, while the complexity of the system is slightly increased. 46. An Improved VLSI Architecture of S-box for AES Encryption Abstract—This paper presents an improved VLSI architecture of S-box for AES encryption system. Certain basic blocks in conventional architecture are replaced by efficient multiplexers and an optimized combinational logic to facilitate speed improvement. The proposed as well as conventional architecture are implemented in Xilinx FPGA and 0.18 μm standard cell ASIC technology. ASIC implementation indicates speed enhancement while maintaining constant area compared to conventional architecture. FPGA implementation also confirms speed improvement of about 0.6 ns along with low utilization of FPGA fabrics. Furthermore, there is significant power improvement (155 %) compared to conventional structure. 47. FPGA Based Design and Implementation of Modified Viterbi Decoder for a Wi-Fi Receiver Abstract— Viterbi Decoders are employed in digital wireless communication systems to decode the convolution codes which are the forward error correcting codes. Although widely-used, the most popular communications decoding algorithm, the Viterbi Algorithm (VA), requires an exponential increase in hardware complexity to achieve greater decode accuracy. When the applications based with wireless technology has been developed tremendously with the world. The constraint length associated with the input bits are large, hence it needs to implement the larger constraint length with lesser hardware and lesser computations for decode the original data. When the decoding process uses the Modified Viterbi Algorithm (MVA) computations 50% reduced and reduction in the hardware utilization, which follows the maximum- likelihood path. It shows plan ahead associated with the modified Viterbi decoder implementation using Xilinx tool in verilog design. An implementation on Field Programmable Gate Arrays (FPGA) provides user flexibility to a programmable solutions and lowering the cost. 48. VLSI Architecture for a Reconfigurable Spectrally Efficient FDM Baseband Transmitter Abstract—Spectrally efficient FDM (SEFDM) systems employ non-orthogonal overlapped carriers to improve spectral efficiency for future communication systems. One of the key research challenges for SEFDM systems is to demonstrate efficient hardware implementations for transmitters and receivers. Focusing on transmitters,this paper explains the SEFDM concept and examines the complexity of published modulation algorithms, with particular consideration to implementation issues. We then present two new variants of a digital baseband transmitter architecture for SEFDM, based on a modulation algorithm which employs the discrete Fourier transform (DFT) implemented efficiently using the fast Fourier transform (FFT). The algorithm requires multiple FFTs, which can be configured either as parallel transforms, which is optimal for throughput or using a multi-stream FFT architecture, for reduced circuit area. We propose a simplified approach to IFFT pruning for pipeline architectures, based on a token-flow control style, specifically optimized for the SEFDM application. Reconfigurable implementations for different bandwidth compression ratios, including conventional OFDM, are easily derived from the proposed implementations. The SEFDM transmitters have been synthesized, placed VENSOFT Technologies, www.ieeedeveloperslabs.in Email: info@ieeedeveloperslabs.in Contact: 9448847874 VENSOFT Technologies, www.ieeedeveloperslabs.in Email: info@ieeedeveloperslabs.in Contact: 9448847874 and routed in a commercial 32 nm CMOS process technology and also verified in FPGA. We report circuit area and simulated power dissipation figures, which confirm the feasibility of SEFDM transmitters. 49. Research and Implementation of Speaker Recognition Algorithm Based on FPGA Abstract: As one of biometric identification technologies, speaker recognition shows better application prospects in many fields. At present, the implementation of speaker recognition algorithm on the hardware is mostly based on System on a Programmable Chip(SOPC) platform of Field Programmable Gate Array(FPGA) with Nios II Intellectual Property(IP) core. And the algorithm can be selected and optimized effectively on SOPC. However the high-speed and parallel operation of FPGA can’t be fully utilized. By researching and analyzing the speaker recognition algorithm, a FPGA-based speaker recognition method is presented in this paper. This method includes the Voice Active Detection (VAD), Mel Frequency Cepstrum Coefficient (MFCC) extraction and Vector Quantization (VQ) recognition algorithm. By using the technology such as Ping-pang operation, Pipeline and module reuse, it makes full use of FPGA’s speed. After testing, this method can effectively meet the requirement of real-time data processing. 50. Efficient Window-Architecture Design using Completely Scaling-Free CORDIC Pipeline Abstract—Filtering being one of the most important modules in signal processing paradigm, this paper presents an FPGA implementation of various window-functions using CORDIC algorithm to minimize area-delay product. The existing window architecture uses a linear CORDIC processor in series with circular CORDIC processor that results in a long pipeline. Firstly, we replace the linear CORDIC with multiple optimized shift-add networks to reduce area and pipeline depth. Secondly, the conventional circular CORDIC processor is replaced by a completely scaling-free CORDIC processor to further improve the area-time efficiency of the existing design. As a result, the proposed window-architecture, on an average requires approximately 64.34% less pipeline stages and saves upto 48% area. Both the existing and the proposed window-architecture are capable of generating Hanning, Hamming and Blackman window families. VENSOFT Technologies, www.ieeedeveloperslabs.in Email: info@ieeedeveloperslabs.in Contact: 9448847874