Journal of Signal Processing Systems · February 2023
https://doi.org/10.1007/s11265-023-01847-x

Low Complexity Implementation of OTFS Transmitter using Fully Parallel and Pipelined Hardware Architecture

Sai Kumar Dora¹ · Himanshu B. Mishra¹ · Manodipan Sahoo¹

Received: 27 April 2022 / Revised: 16 January 2023 / Accepted: 16 January 2023
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023

Abstract
In this work, we develop the conventional hardware architecture of an orthogonal time frequency space (OTFS) modulation based wireless transmitter, aimed at highly reliable communication between high-speed moving devices. We use parallel, deeply pipelined hardware architectures of fast Fourier transforms/inverse fast Fourier transforms (FFTs/IFFTs) to accelerate the execution of OTFS on a field-programmable gate array (FPGA) with high accuracy and maximum throughput.
Additionally, we propose an optimized OTFS transmitter architecture with a modified Booth multiplier and memory that i) requires fewer hardware resources and ii) provides higher throughput. The synthesis results show that the conventional OTFS architecture achieves a maximum throughput of 186.95 Tbps using 132,233 lookup tables (LUTs) and 26,586 flip-flops at a 100 MHz clock frequency, whereas the proposed OTFS hardware achieves a higher throughput of 196.67 Tbps with 75,026 LUTs at a maximum operating frequency of 139.64 MHz on the 7vx485tffg1157-1 FPGA device.

Keywords OTFS · FFT · IFFT · FPGA · Maximum operating frequency · Throughput

* Sai Kumar Dora, saikumar19190@gmail.com
* Himanshu B. Mishra, himanshu@iitism.ac.in
Manodipan Sahoo, manodipan@iitism.ac.in
¹ Electronics Engineering Department, IIT (ISM)-Dhanbad, Dhanbad, India

1 Introduction
In current wireless communication standards, orthogonal frequency division multiplexing (OFDM) is a well-established technology for quasi-static frequency-selective fading channels, because it i) combats the inter-symbol interference (ISI) that arises from the frequency-selective nature of the channel and ii) can be implemented with an efficient hardware architecture [1]. In high-speed wireless communication scenarios, however, Doppler shifts give rise to inter-carrier interference (ICI), which significantly degrades the performance of OFDM systems that use traditional transceivers [2]. Serial/parallel interference cancellation and operator perturbation techniques, which iteratively manipulate a system of linear equations in traditional OFDM, have been proposed in [3, 4] to overcome this ICI problem. Furthermore, the authors in [5] proposed linear and non-linear equalization techniques to eliminate the ICI.
Transmit diversity and frequency-domain equalization schemes have also been proposed for time-varying channels in [5, 6] to improve system performance. However, these techniques cannot completely nullify the ICI in time-varying systems [7, 8]. Recently, Hadani et al. [9] proposed a novel modulation scheme known as orthogonal time frequency space (OTFS) modulation, which converts a fast-fading channel into an almost quasi-static channel by multiplexing the transmit symbols in the delay-Doppler domain. Consequently, with the signal processing algorithms developed in [10, 12, 13], OTFS provides a lower bit error rate (BER) than the existing OFDM scheme in high-mobility scenarios [10, 11].

It can be observed from the existing literature [14, 15] that the basic OFDM transmitter architecture is very simple, as it consists of only one transform, i.e., the IFFT. The OTFS transmitter, on the other hand, is more complex, as it consists of two operations: the inverse symplectic fast Fourier transform (ISFFT) and the Heisenberg transform. We further observe from [16] that i) the ISFFT consists of many IFFT and FFT operations and ii) the Heisenberg transform consists of IFFT operations only. In light of these observations, we believe that knowledge of existing OFDM transmitter architectures will help in building a robust OTFS transmitter architecture. We therefore discuss the existing literature on hardware architectures for OFDM transmitters. The adder is the basic building block of efficient hardware implementations, and the authors in [17] proposed a novel approach for designing robust 64-bit and 32-bit full adder circuits in quantum-dot cellular automata using a rotated majority gate methodology.
The evolution of hardware implementations has given rise to new standards for the design and evaluation of fast algorithms that emphasize consistency, modularity, and localization in the architectural realization. The ultimate goals of an effective system model in modern communication systems are to decrease energy consumption, increase network lifetime, and improve delay and throughput [18]; among the game-changing technologies for the wireless communications field are the Internet of Things (IoT) and machine-to-machine communications [19]. In this context, systolic arrays are well known for the VLSI implementation of processors, as they satisfy the above-mentioned architectural requirements [20]. Accordingly, the authors in [21] developed an efficient OFDM transmitter by implementing the IFFT architecture with systolic arrays. We also observe from [22] that an efficient log maximum a posteriori (MAP) turbo encoder/decoder and a pipelined IFFT processor have been used to further enhance the throughput and reduce the complexity of the OFDM transmitter. A single-delay-feedback pipelined memoryless FFT/IFFT processor improves area and power efficiency [20], whereas a log-MAP turbo decoder achieves good error-correction capability and high speed; however, these techniques cannot cope with time-varying channels at vehicle speeds of 100–500 km/h. Since OTFS is a novel air interface that outperforms OFDM in high-Doppler scenarios, developing efficient hardware architectures for both its transmitter and receiver deserves significant research attention. To the best of our knowledge, no existing literature describes the hardware implementation of the OTFS transmitter and receiver. In this paper, we focus on developing the architecture of the OTFS transmitter.
In particular, we develop the conventional architecture and propose an optimized architecture for an 8 × 8 OTFS transmitter, where 8 × 8 denotes 8 delay and 8 Doppler bins. In the proposed architecture, we use memory instead of rotators in the ISFFT module and implement an efficient rotator using a modified Booth multiplier. Moreover, we analyse the finite word length of the OTFS transmitter output and find the optimum word length at each butterfly stage using a simulation-based trial method to reduce hardware cost [23]. This method yields the system's minimum word length and produces accurate results even under worse-than-expected conditions. The OTFS architecture is implemented as fully parallel and pipelined.

The rest of this paper is organized as follows. Section 2 presents a brief overview of the OTFS system model. Section 3 describes the step-by-step procedure of the hardware implementation of the OTFS transmitter system model and the conventional hardware architecture. The proposed low-complexity hardware architecture for the OTFS transmitter is discussed in Section 4, and Section 5 presents a numerical comparison of the conventional and optimized OTFS transmitter architectures. Finally, conclusions are drawn in Section 6.

1.1 Notations
We use the following notation throughout the paper. An upper-case boldface letter $\mathbf{A} \in \mathbb{C}^{M\times N}$ denotes an $M \times N$ complex matrix, and a lower-case boldface letter $\mathbf{x} \in \mathbb{C}^{P\times 1}$ denotes a $P \times 1$ vector. The Hermitian and transpose of a matrix $\mathbf{A}$ are denoted by $(\mathbf{A})^H$ and $(\mathbf{A})^T$, respectively.

2 OTFS System Model
In this section, we discuss the system model of the OTFS transmitter. We consider a SISO-OTFS system in which each frame is transmitted over M delay and N Doppler bins. The basic block diagram of the OTFS transmitter is shown in Fig. 1 [24].

Figure 1 Block Diagram of Orthogonal Time Frequency Space (OTFS) Modulation.

We consider $x[k, l]$ as the transmitted symbol at the kth Doppler and lth delay bin.
In practice, these symbols are drawn from the constellation of a digital modulation scheme and can be arranged in a matrix $\mathbf{X} \in \mathbb{C}^{M\times N}$. Next, the serial-to-parallel converter produces the vector $\mathbf{x} \in \mathbb{C}^{MN\times 1}$, which can be represented mathematically as $\mathbf{x} = \mathrm{vec}(\mathbf{X}^T)$. The ISFFT then transforms the parallel symbols $\mathbf{x}$ from the delay-Doppler domain to the time-frequency domain. The time-frequency symbols $\mathbf{X}_{\mathrm{ISFFT}}$ at the output of the ISFFT block can be written in matrix form as [16]

$$\mathbf{X}_{\mathrm{ISFFT}} = \mathbf{F}_M \mathbf{X} \mathbf{F}_N^H, \tag{1}$$

where $\mathbf{F}_K \in \mathbb{C}^{K\times K}$ denotes the K-point FFT matrix with entries $\mathbf{F}_K(p,q) = \frac{1}{\sqrt{K}} W_K^{pq}$, $p, q = 0, \ldots, K-1$, and $W_K^{pq} = e^{-j2\pi pq/K}$. A transmit windowing matrix $\mathbf{W}_{\mathrm{tx}}$ is multiplied element-wise with $\mathbf{X}_{\mathrm{ISFFT}}$, generating the 2D time-frequency symbols $\mathbf{X}_{\mathrm{FT}} \in \mathbb{C}^{M\times N}$:

$$\mathbf{X}_{\mathrm{FT}} = \mathbf{X}_{\mathrm{ISFFT}} \odot \mathbf{W}_{\mathrm{tx}} = \mathbf{F}_M \mathbf{X} \mathbf{F}_N^H. \tag{2}$$

Here, we consider a trivial rectangular window to avoid the computational cost of complex multiplications, i.e., $\mathbf{W}_{\mathrm{tx}} \in \mathbb{C}^{M\times N}$ is the all-ones matrix, which in turn results in the following time-domain signal $\mathbf{S} \in \mathbb{C}^{M\times N}$ at the output of the Heisenberg transform shown in Fig. 1:

$$\mathbf{S} = \mathbf{F}_M^H \mathbf{X}_{\mathrm{FT}} = \mathbf{F}_M^H \mathbf{F}_M \mathbf{X} \mathbf{F}_N^H = \mathbf{X} \mathbf{F}_N^H. \tag{3}$$

The vector representation of the discrete-time signal is $\mathbf{s} = \mathrm{vec}(\mathbf{S})$.

3 Conventional Hardware Architecture for OTFS Transmitter
In this section, we discuss the step-by-step procedure of the conventional hardware implementation of OTFS shown in Fig. 2. Initially, we consider a stream of binary information bits (a digital sequence) from the analog-to-digital converter (ADC) [16]. Next, using a digital modulation technique, the binary sequence is mapped to a sequence of complex numbers, which is then converted from serial to parallel and stored in an array of size MN × 1. This array of complex numbers can be represented mathematically as $\mathbf{x} = \mathrm{vec}(\mathbf{X}^T)$.
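As a floating-point cross-check of Eqs. (1)–(3), the Python/NumPy sketch below computes the transmitted vector in two ways: directly from the matrix form, and in the order the hardware uses (N-point IFFTs, bit-position exchange, M-point FFTs, then the Heisenberg M-point IFFTs). The function names are hypothetical, the BPSK test input and the rectangular window are the assumptions stated in the text, and NumPy's `norm="ortho"` option supplies the $1/\sqrt{K}$ scaling of $\mathbf{F}_K$.

```python
import numpy as np


def dft_matrix(K: int) -> np.ndarray:
    """Unitary K-point DFT matrix F_K with F_K(p, q) = W_K^{pq} / sqrt(K)."""
    p = np.arange(K)
    return np.exp(-2j * np.pi * np.outer(p, p) / K) / np.sqrt(K)


def otfs_tx_matrix_form(X: np.ndarray) -> np.ndarray:
    """Eqs. (1)-(3) with a rectangular (all-ones) window; returns s = vec(S)."""
    M, N = X.shape
    F_M, F_N = dft_matrix(M), dft_matrix(N)
    X_isfft = F_M @ X @ F_N.conj().T        # Eq. (1): ISFFT
    W_tx = np.ones((M, N))                   # rectangular window: all ones
    X_ft = X_isfft * W_tx                    # Eq. (2): element-wise windowing
    S = F_M.conj().T @ X_ft                  # Eq. (3): Heisenberg transform
    return S.reshape(-1, order="F")          # vec() stacks the columns of S


def otfs_tx_stagewise(X: np.ndarray) -> np.ndarray:
    """Same chain, ordered like the hardware: IFFTs, exchange, FFTs, IFFTs."""
    X_ifft = np.fft.ifft(X.T, axis=0, norm="ortho")       # F_N^H X^T: N-point IFFT, M times
    X_isfft = np.fft.fft(X_ifft.T, axis=0, norm="ortho")  # F_M (X F_N^H): M-point FFT, N times
    S = np.fft.ifft(X_isfft, axis=0, norm="ortho")        # Heisenberg: M-point IFFT, N times
    return S.reshape(-1, order="F")
```

Because Eq. (3) collapses to $\mathbf{X}\mathbf{F}_N^H$ under the rectangular window, a model like this is a convenient golden reference for checking the fixed-point outputs of the hardware; it says nothing about word lengths or pipelining.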
In the next step, the bit positions are exchanged to obtain $\mathbf{X}^T$, which gives $\mathbf{X}_{\mathrm{IFFT}} = \mathbf{F}_N^H \mathbf{X}^T$ by applying an N-point IFFT M times in parallel. We then exchange the bit positions of the data sequence again to obtain $\mathbf{X}_{\mathrm{IFFT}}^T = \mathbf{X}\mathbf{F}_N^H$, which undergoes an M-point FFT N times in parallel to generate the ISFFT data, i.e., $\mathbf{X}_{\mathrm{ISFFT}} = \mathbf{F}_M \mathbf{X}_{\mathrm{IFFT}}^T$, as given in Eq. (1). In general, a transmit windowing matrix is needed to provide full diversity and to randomize the phases of the transmitted symbols to eliminate inter-cell interference [5]. In this architecture, we assume for simplicity that the window matrix contains all 1's, which results in the time-frequency symbols $\mathbf{X}_{\mathrm{FT}}$ given in Eq. (2). These time-frequency symbols are converted into the time-domain symbols $\mathbf{S}$ through the Heisenberg transform, $\mathbf{S} = \mathbf{F}_M^H \mathbf{X}_{\mathrm{ISFFT}}$, as in Eq. (3). We observe from Eq. (3) that the Heisenberg transform consists of N parallel M-point IFFTs. Finally, a parallel-to-serial converter enables serial transmission of these symbols.

Figure 2 Flow chart of Orthogonal Time Frequency Space (OTFS) Modulation.

Note that the FFT and IFFT are the basic building blocks of the ISFFT and Heisenberg modules in the OTFS transmitter architecture. An FFT/IFFT can be implemented in hardware in three ways: memory-based implementation, pipelined implementation, and direct implementation. In the memory-based architecture [25], the data of each FFT/IFFT stage is stored in memory and reused. In the direct implementation, each addition and multiplication of the FFT/IFFT is mapped to its own hardware adder or multiplier. Pipelined hardware architectures perform the FFT/IFFT computations on a continuous data stream.
If the FFT size equals the number of subcarriers, the direct implementation acts as a parallel and pipelined architecture and provides the highest performance. Pipelined hardware architectures [26, 27] can be implemented as single-path delay feedback, multi-path delay feedback, and multi-path delay commutator (MDC) structures.

Figure 3 Conventional inverse symplectic fast Fourier transform (ISFFT) architecture using Radix-2² multi path delay commutator (R2MDC).

The radix-2² MDC (R2MDC) is the most traditional method for implementing a pipelined FFT/IFFT architecture. In this architecture, the input data sequence is split into two parallel streams, and appropriately sized delays set the distance between the data elements entering each butterfly. In the OTFS architecture, we implement the ISFFT module with a decimation-in-frequency inverse fast Fourier transform (DIF-IFFT) and a decimation-in-time fast Fourier transform (DIT-FFT). The DIF-IFFT produces its outputs in bit-reversed order and the DIT-FFT takes its inputs in bit-reversed order, so the connection between them is hard-wired during code generation and requires no extra hardware. Moreover, the architecture is completely parallel, and all outputs are produced in parallel. Pipelining is one of the major factors in achieving high performance in FPGA circuits, so a register level is added after each butterfly stage to obtain deep pipelining. We next discuss the detailed conventional architectures of the ISFFT and Heisenberg modules. Fig. 3 depicts a fully parallel and pipelined conventional ISFFT processor based on the radix-2² MDC hardware architecture, which is the most straightforward architecture for implementing the 8 × 8 ISFFT. Implementing the 8 × 8 ISFFT typically requires eight complex coefficients (C2), and the hardware implementation keeps the additive butterflies (BF) separate from the trivial and non-trivial rotators.
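The claim that the DIF-IFFT output order matches the DIT-FFT input order can be checked with a small software model. The sketch below (illustrative Python, hypothetical function names, unnormalised transforms) implements an in-place radix-2 DIF IFFT, whose output comes out in bit-reversed order, and a radix-2 DIT FFT, which expects bit-reversed input. Wiring one straight into the other, with no reordering, returns the original sequence scaled by the transform length. It is a 1-D illustration of the principle only; in the ISFFT the exchange of bit positions between the two stages is likewise a fixed wiring pattern.

```python
import cmath


def bit_reverse(i: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of index i (e.g. 1 -> 4 for 3 bits)."""
    return int(format(i, f"0{bits}b")[::-1], 2)


def dif_ifft(x: list[complex]) -> list[complex]:
    """Radix-2 DIF IFFT (unnormalised): natural-order in, bit-reversed out."""
    a, n = list(x), len(x)
    span = n // 2
    while span >= 1:
        for start in range(0, n, 2 * span):
            for j in range(span):
                w = cmath.exp(2j * cmath.pi * j / (2 * span))  # IFFT twiddle W^{-j}
                u, v = a[start + j], a[start + j + span]
                a[start + j] = u + v                # butterfly sum
                a[start + j + span] = (u - v) * w   # difference, then rotation
        span //= 2
    return a


def dit_fft(a_in: list[complex]) -> list[complex]:
    """Radix-2 DIT FFT: bit-reversed-order in, natural-order out."""
    a, n = list(a_in), len(a_in)
    span = 1
    while span < n:
        for start in range(0, n, 2 * span):
            for j in range(span):
                w = cmath.exp(-2j * cmath.pi * j / (2 * span))  # FFT twiddle W^{j}
                u, v = a[start + j], a[start + j + span] * w    # rotation first
                a[start + j] = u + v
                a[start + j + span] = u - v
        span *= 2
    return a
```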
The 8 × 8 ISFFT consists of six butterfly stages. Each butterfly stage of an 8-point FFT/IFFT contains four butterflies, one register level is inserted after each butterfly stage, and each register level takes one clock cycle. The 8 × 8 ISFFT requires eight parallel IFFTs, eight parallel FFTs, and one hard-wired exchange of bit positions (EBP).

Figure 4 Conventional rotator hardware implementation using carry look ahead (CLA) adders and barrel shifters.

Each 8-point FFT/IFFT requires two complex multipliers, and each complex multiplier is implemented as a multiplierless rotator. The rotator uses shift-and-add operations to reduce delay and resource utilization. While implementing the ISFFT and Heisenberg transform hardware, we consider three major technical issues: i) the rotators require careful design, otherwise they occupy a large area and consume high power; ii) rotators must be implemented with efficient shift-and-add techniques to minimise delay and complexity; and iii) the rotator coefficients must be chosen to achieve accurate results with a small number of LUTs [28]. The implementation of the rotator therefore plays a major role in the hardware modelling of OTFS modulation, since a large number of rotators is required to implement a fully parallel ISFFT and Heisenberg transform. In an 8-point FFT/IFFT, the rotators are simple and need little extra hardware because the non-trivial twiddle factors, of the form C ± jS, have real and imaginary parts of equal magnitude, i.e., C = ±S, where C = cos θ and S = sin θ. Let the rotator input be x + jy, where x is the real part and y is the imaginary part. The rotator first computes Cx and Cy, and then, depending on the signs of the twiddle factor components, an addition or subtraction produces the result. Fig. 4 shows the conventional hardware architecture of the rotator using barrel shifters, which multiplies its input by 0.7071.
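To make the C = ±S shortcut concrete, the sketch below multiplies x + jy by an odd power of $W_8$ using only the two products Cx and Cy followed by sign-selected additions. Here C is taken to be the 16-bit fixed-point constant 1011010100000101 (46341/2^16 ≈ 0.7071, the constant used by the modified rotator described later), Cx and Cy are formed by plain shift-and-add over the constant's set bits, and results stay scaled by 2^16. This is an illustrative Python model with hypothetical names, not the barrel-shifter datapath of Fig. 4.

```python
import math

C_Q16 = 0b1011010100000101  # 46341, i.e. about 0.7071 in Q0.16


def times_c(v: int) -> int:
    """v * C_Q16 using only shifts and adds (one term per set bit of C)."""
    acc = 0
    for bit in range(16):
        if (C_Q16 >> bit) & 1:
            acc += v << bit
    return acc


def rotate_w8_odd(x: int, y: int, k: int) -> tuple[int, int]:
    """Multiply x + jy by W_8^k = exp(-j*pi*k/4) for odd k; result scaled by 2**16."""
    if k % 2 == 0:
        raise ValueError("only the non-trivial (odd k) twiddles are handled here")
    cx, cy = times_c(x), times_c(y)          # the only 'multiplications'
    angle = math.pi * k / 4
    sc = 1 if math.cos(angle) > 0 else -1    # sign of Re{W}
    ss = 1 if -math.sin(angle) > 0 else -1   # sign of Im{W}
    re = sc * cx - ss * cy                   # (a + jb)(x + jy), a = sc*C, b = ss*C
    im = sc * cy + ss * cx
    return re, im
```

For k = 1, for example, the function returns C(x + y) + jC(y − x), i.e., one addition and one subtraction after the two constant multiplications.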
The conventional rotator employs six carry look ahead (CLA) adders and six shifters.

Figure 5 Hardware implementation of 8 × 8 fully parallel inverse symplectic fast Fourier transform (ISFFT).

In M × N OTFS preprocessing, the ISFFT is used to build the two-dimensional symbols and translate them to the time-frequency domain. The M × N inputs are taken in natural order, i.e., positions 0, 1, 2, 3, ..., (M × N) − 1, as shown in Fig. 5, where M = N = 8. The inputs are divided into two equal groups according to even and odd positions, each containing MN/2 symbols, and each group is further divided into two groups according to its even and odd positions. This process is repeated until each group contains N elements. These bit-exchanged positions are assigned to the IFFT modules. Likewise, the symbol positions at the IFFT module outputs are divided into even and odd parts until each group contains M elements, and these exchanged positions are given to the FFT stage. In the 8 × 8 ISFFT architecture, the natural order of the bit positions is b5 b4 b3 b2 b1 b0 and the exchanged order is b2 b1 b0 b5 b4 b3. This means that the natural input order 0, 1, 2, 3, 4, ..., 63 becomes 0, 8, 16, 24, ..., 56, 1, 9, ..., 63 after the exchange. In the IFFT stage, the first radix-2 butterfly stage of the first IFFT module (8 pt IFFT_1) operates on the symbol pairs at positions (0,32), (8,40), (16,48), and (24,56), i.e., (000000,100000), (001000,101000), (010000,110000), and (011000,111000) in binary. The two indices of each pair are identical except for the most significant bit (MSB), b5. The same holds for all pairs in the first butterfly stage of the ISFFT.
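The bit-position exchange and the butterfly pairing can be reproduced in a few lines. In the illustrative Python sketch below, the hypothetical helper `exchange_bits` maps b5 b4 b3 b2 b1 b0 to b2 b1 b0 b5 b4 b3, which amounts to transposing the 8 × 8 index grid, and the hypothetical `butterfly_pairs` lists the index pairs that each radix-2 DIF stage of one 8-point module combines, assuming an in-place DIF ordering.

```python
def exchange_bits(i: int) -> int:
    """b5 b4 b3 b2 b1 b0 -> b2 b1 b0 b5 b4 b3 for the 64 lanes of the 8 x 8 ISFFT."""
    return ((i & 0b111) << 3) | (i >> 3)


def butterfly_pairs(lanes: list[int]) -> list[list[tuple[int, int]]]:
    """Index pairs combined by each radix-2 DIF stage of one 8-point module."""
    stages, span = [], len(lanes) // 2
    while span >= 1:
        pairs = [(lanes[s + j], lanes[s + j + span])
                 for s in range(0, len(lanes), 2 * span)
                 for j in range(span)]
        stages.append(pairs)
        span //= 2
    return stages


order = [exchange_bits(i) for i in range(64)]   # lane order after the exchange
ifft_1 = butterfly_pairs(order[:8])             # first IFFT module, lanes 0, 8, ..., 56
```

Running it on the lanes of IFFT_1 gives the (0,32), (8,40), (16,48), (24,56) pairs of the first stage, and pairs differing only in b4 and only in b3 in the second and third stages.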
In the same way, the paired indices differ only in b4 in the second butterfly stage and only in b3 in the third butterfly stage. After the third butterfly stage, the outputs undergo the bit-position exchange discussed earlier; since the numbers of rows and columns are equal in the 8 × 8 ISFFT, both exchanges are identical. The input of the FFT stage is in bit-exchanged order and its output is in natural order. In the fourth butterfly stage, the butterflies operate on the pairs at positions (0,4), (1,5), (2,6), and (3,7), i.e., (000000,000100), (000001,000101), (000010,000110), and (000011,000111) in binary, so only b2 differs within each pair. In the fifth and sixth butterfly stages, the paired indices differ only in b1 and b0, respectively. The output of the ISFFT is in natural order. The conventional hardware architecture of the 8 × 8 OTFS transmitter occupies a large area and consumes considerable power: it requires 132,233 LUTs, 26,586 FFs, and 48 non-trivial complex multipliers, each of which requires 12 adders and 12 shifters. We therefore propose an efficient architecture that reduces the hardware complexity, which is discussed in the next section.

Figure 6 Proposed pipeline inverse symplectic fast Fourier transform (ISFFT) processor from Fig. 5 using Radix-2² multi path delay commutator (R2MDC).

4 Proposed Low Complexity Hardware Architecture for OTFS Transmitter
The proposed OTFS transmitter architecture is implemented with the radix-2² DIF-FFT algorithm and an MDC feed-forward pipelined architecture with fewer trivial and non-trivial rotators; reducing the non-trivial rotators matters most, because each of them occupies considerable area.
Thus, by using memory instead of non-trivial multipliers in the ISFFT, the number of non-trivial rotators is decreased. In the proposed OTFS architecture, the non-trivial rotators are implemented with the modified Booth algorithm and multiplierless operations. Trivial rotators are implemented using 2's complement negation and by interchanging the real and imaginary parts and/or changing the sign of the data. OTFS modulation consists of a preprocessing block, the ISFFT, which is a back-to-back connection of the IFFT and FFT as shown in Fig. 6, followed by a conventional time-frequency modulator, the Heisenberg transform. Both the ISFFT and the Heisenberg transform contain trivial and non-trivial rotators. The trivial rotator angles are 0°, 90°, 180° and 270°, corresponding to multiplication by 1, j, −1 and −j, respectively [29]. These complex multiplications are considered trivial because they do not require any LUTs or adders. Non-trivial rotators, in contrast, play a major role in the hardware implementation of OTFS. The conventional 8 × 8 ISFFT architecture requires 32 non-trivial rotators, since each of its FFTs and IFFTs requires two non-trivial multipliers. In the proposed architecture, we do not use any non-trivial multipliers in the first butterfly stage of the ISFFT; they are replaced with a ROM because the output values of the first butterfly stage can be predicted. Removing these rotators also removes the additional pipeline registers they would otherwise introduce, which occupy area and add delay.

In this work, OTFS modulation employs a BPSK digital modulator whose outputs are either 1 or −1, and these are given to the first butterfly stage through the serial-to-parallel converter. The outputs of the first butterfly stage are either 0 or ±2, and they are then passed through the complex multipliers $W_8^0$, $W_8^{-1}$, $W_8^{-2}$, and $W_8^{-3}$. The output of the complex multiplier is either 0 or ±0.7071.
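The ROM substitution rests on the input alphabet being finite: with BPSK symbols a first-stage butterfly can only produce a few values, so its product with each non-trivial twiddle takes only a few values too. The Python sketch below enumerates them. The text quotes butterfly outputs of 0 or ±2 but multiplier outputs of 0 or ±0.7071, which implies a normalisation that is not spelled out; we therefore expose a hypothetical `scale` parameter (our assumption, not the paper's stated design) and show both the unscaled case and a 1/2 scaling.

```python
import cmath
import math
from itertools import product


def first_stage_rom_contents(scale: float = 1.0) -> tuple[set, set]:
    """Enumerate every value the first-stage non-trivial twiddles can output.

    `scale` is a hypothetical per-butterfly normalisation (1.0 means none)."""
    bpsk = (-1, 1)
    # Sum and difference outputs of a radix-2 butterfly fed by two BPSK symbols
    butterfly_out = {scale * (a + s * b)
                     for a, b in product(bpsk, bpsk) for s in (1, -1)}
    rom = set()
    for k in (-1, -3):  # non-trivial IFFT twiddles W_8^-1 and W_8^-3
        w = cmath.exp(-2j * math.pi * k / 8)
        for u in butterfly_out:
            z = u * w
            rom.add(round(z.real, 4))   # each real/imaginary part is one ROM word
            rom.add(round(z.imag, 4))
    return butterfly_out, rom
```

Without scaling the non-zero parts are ±√2 ≈ ±1.4142; with a 1/2 scaling they become ±0.7071, so in either case only three distinct words need to be stored.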
These predicted values, 0 and ±0.7071, are therefore stored in a ROM beforehand, and the output is read from the ROM according to the complex multiplier input. In the proposed ISFFT architecture, instead of 16 complex multipliers we use a ROM with only three stored values, which reduces the number of pipeline registers and also consumes less area. To achieve high performance with efficient area utilization, the remaining non-trivial rotators are replaced with the proposed modified rotator [30] shown in Fig. 7. The modified rotator multiplies its input by the 16-bit binary constant 1011010100000101, which represents 0.7071 when read as a 16-bit fraction (46341/2^16). If the real or imaginary part of the rotator input is N bits wide, the output is N + 16 bits wide. The modified rotator uses four CLA adders and three concatenations instead of six adders and six shifters; the concatenation blocks require no LUTs because they are just hard-wired connections. In this modified rotator, the N input bits are given to the first CLA adder, and the input with two zeros concatenated on its left side is given to the second CLA adder. The first eight bits of the second CLA adder output are given to the final output, and the remaining bits are given to the first and third CLA adders. Three zeros are concatenated on the left side of the second CLA adder output, and the result is given to the third CLA adder. The first fourteen bits of the third CLA adder output and bits 2 to N + 1 of the first CLA adder output are given to the fourth CLA adder. The first two bits of the first CLA adder output, the fourth CLA adder output, and the last eight bits of the third CLA adder output are given to the final output. In the proposed 8 × 8 OTFS architecture, the pipelined ISFFT module is implemented using the R2MDC technique shown in Fig. 6, and the modified rotator is implemented using concatenations and CLA adders as shown in Fig. 7.

Figure 7 Hardware implementation of modified rotator using modified Booth algorithm.
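The rotator constant and its recoding can be checked in a few lines of Python. The sketch below shows the modified (radix-4) Booth recoding of 1011010100000101, a multiplier built from the recoded digits, and one possible four-adder decomposition that shares the subexpression 5x. The decomposition is our own illustration of how four adders can suffice, not necessarily the wiring of Fig. 7; all names are hypothetical.

```python
C_Q16 = 0b1011010100000101  # 46341 / 2**16, about 0.7071 (the rotator constant)


def booth_radix4_digits(c: int, width: int = 16) -> list[int]:
    """Modified (radix-4) Booth recoding of a non-negative constant.

    Returns digits d_i in {-2, -1, 0, 1, 2} with c == sum(d_i * 4**i)."""
    bits = [(c >> i) & 1 for i in range(width + 2)]  # zero-extended: c stays positive
    digits, prev = [], 0                             # prev holds b_{2i-1}; b_{-1} = 0
    for i in range(0, width + 1, 2):
        digits.append(-2 * bits[i + 1] + bits[i] + prev)
        prev = bits[i + 1]
    return digits


def times_c_booth(x: int) -> int:
    """x * C_Q16 with one shifted (and possibly negated) partial product per digit."""
    acc = 0
    for i, d in enumerate(booth_radix4_digits(C_Q16)):
        if d:
            term = (x << (abs(d) - 1)) << (2 * i)  # |d| = 2 is one extra shift
            acc += term if d > 0 else -term
    return acc


def times_c_four_adders(x: int) -> int:
    """One possible 4-adder form: 46341 = 5 + 5*2**8 + 5*2**13 + 2**12."""
    x5 = x + (x << 2)        # adder 1: 5x
    t = x5 + (x5 << 8)       # adder 2
    t = t + (x5 << 13)       # adder 3
    return t + (x << 12)     # adder 4
```

Booth recoding alone leaves seven non-zero digits for this constant; it is reusing 5x (binary 101, which appears three times in the constant) that brings the adder count down to four. The product of an N-bit input and this 16-bit constant fits in N + 16 bits, consistent with the output width stated above.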
The ISFFT module consists of an IFFT stage with 8 IFFTs and an FFT stage with 8 FFTs, all operating in parallel. Within the ISFFT, the IFFT stage is followed by the FFT stage, and the Heisenberg transform, i.e., the 8-point IFFT applied 8 times in parallel, follows the ISFFT, as shown in Fig. 8. The OTFS modulation process thus consists of two IFFT module stages and one FFT module stage, each containing three butterfly stages, for a total of 9 butterfly stages, with one register level placed after each butterfly stage. In the ISFFT module, the IFFT stage has no non-trivial multipliers, while each FFT has two non-trivial rotations, so the entire FFT stage takes 16 non-trivial rotations, each implemented with the modified rotator of Fig. 7. The Heisenberg stage requires another 16 non-trivial rotations. The overall implementation of the proposed 8 × 8 OTFS modulation therefore requires i) 32 non-trivial rotations, ii) 64 trivial multipliers for the trivial window multiplication, and iii) 576 adders.

In this paper, 64 delay-Doppler information symbols are considered for the implementation of 8 × 8 OTFS modulation. Each information symbol is two bits long and is stored in registers. A digital BPSK modulator modulates each information symbol, and the modulated symbols are mapped onto a set of 2D orthogonal basis functions that cover the bandwidth and duration of the transmission. The 64 serial constellation symbols, each 2 bits long, are converted into 64 parallel symbols by the serial-to-parallel converter. Next, these 64 parallel symbols are fed to the first butterfly stage of the 8 × 8 ISFFT. The input and output word lengths of the ISFFT are 2 bits and 20 bits, respectively.
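The operation counts quoted above follow from simple arithmetic, sketched below in Python. The function is hypothetical and hard-codes the assumptions stated in the text: three 8-point transform stages (ISFFT IFFT, ISFFT FFT, Heisenberg IFFT) of eight parallel modules each, two non-trivial rotations per module, and 32 radix-2 butterflies (one adder and one subtractor each) per butterfly stage.

```python
import math


def otfs_operation_counts(M: int = 8, N: int = 8) -> dict[str, int]:
    """Operation counts for the fully parallel 8 x 8 transmitter, following the text."""
    if (M, N) != (8, 8):
        raise ValueError("the per-module twiddle count below assumes 8-point transforms")
    stages_per_transform = int(math.log2(M))      # 3 radix-2 stages per 8-point FFT/IFFT
    transform_stages = 3                          # ISFFT IFFT, ISFFT FFT, Heisenberg IFFT
    modules_per_stage = 8                         # 8 parallel 8-point modules each
    nontrivial_per_module = 2                     # two non-trivial rotations per module
    butterfly_stages = transform_stages * stages_per_transform
    adders = butterfly_stages * (M * N // 2) * 2  # 32 butterflies/stage, add + subtract
    conventional = transform_stages * modules_per_stage * nontrivial_per_module
    proposed = conventional - modules_per_stage * nontrivial_per_module  # IFFT stage -> ROM
    return {
        "butterfly_stages": butterfly_stages,
        "adders": adders,
        "window_multipliers": M * N,
        "nontrivial_conventional": conventional,
        "nontrivial_proposed": proposed,
    }
```

The 48 to 32 reduction comes entirely from the ISFFT IFFT stage, whose 16 rotations are replaced by the ROM.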
The 20-bit ISFFT output symbols are then modulated by the Heisenberg transform, which produces a 27-bit output at its final butterfly stage; the remaining least significant bits are truncated, with the retained word length chosen by the simulation-based trial method. The resulting 64 parallel symbols, each 27 bits long, are given to the parallel-to-serial converter, which converts them into 64 serial symbols. Note that in the ISFFT and Heisenberg transform, if the input of a butterfly stage is N bits long, the output of each butterfly adder or subtractor is N + 1 bits long. Similarly, if the input of a rotator is N bits long, its output is N + 16 bits long. Each input and output symbol period is 10 ns, corresponding to a 100 MHz clock. Using the proposed OTFS architecture, we obtain a high throughput of 196.67 Tbps. The input and output bit rates of the 8 × 8 OTFS transmitter are 200 Mbps and 2.7 Gbps, respectively.

Figure 8 Hardware architecture flow of conventional and optimized Orthogonal Time Frequency Space modulation (OTFS) transmitter.

Table 1 Specifications of conventional OTFS (with Figs. 3 and 4) and optimized OTFS (with Figs. 6 and 7) transmitter architecture.

| Parameters | Conventional architecture | Optimized architecture |
|---|---|---|
| OTFS size | 8×8 | 8×8 |
| Symbols | 64 | 64 |
| FFT & IFFT size | 8 | 8 |
| Non-trivial multiplications | 48 | 32 |
| Input word length (bits) | 2 | 2 |
| Output word length (bits) | 27 | 27 |
| Heisenberg transforms | 8 | 8 |
| Butterfly stages | 9 | 9 |

Figure 9 Timing waveform of the signals in optimized OTFS transmitter.

Table 2 Estimated performance metrics of conventional architecture (with Figs. 3 and 4) and optimized architecture (with Figs. 6 and 7) implemented on 7vx485tffg1157-1 FPGA device.
| Parameters | Conventional architecture | Optimized architecture | % of improvement |
|---|---|---|---|
| Frequency (MHz) | 100 | 100 | – |
| Time period (ns) | 10 | 10 | – |
| Latency (ms) | 1.42 | 1.42 | – |
| Max frequency (MHz) | 132.74 | 139.64 | 5.2 |
| Min time period (ns) | 7.5 | 7.16 | 4.5 |
| Throughput (Tbps) | 186.95 | 196.67 | 5.2 |
| LUTs | 132,233 | 75,026 | 43.2 |
| FFs | 26,586 | 26,924 | 1.27 |
| Power (W) | 1.42 | 1.273 | 10.35 |

The word length of the output symbol is 27 bits. For deep pipelining, a register level is placed after each module and after each butterfly adder or subtractor.

5 Results
The input and output specifications of the conventional and optimized 8 × 8 OTFS transmitters are shown in Table 1. The 8 × 8 OTFS transmitter handles 64 information symbols. It requires one 8 × 8 ISFFT (which contains 8 FFTs and 8 IFFTs) and eight Heisenberg transforms. Since each transform contains three butterfly stages, 9 butterfly stages are required to implement the 8 × 8 OTFS transmitter. In the proposed architecture, the number of non-trivial rotators is reduced from 48 to 32. Table 2 compares the synthesis results of the conventional and modified architectures of the fully parallel and pipelined 8 × 8 OTFS transmitter. For the conventional 8 × 8 OTFS architecture, the maximum operating frequency and clock frequency are estimated to be 132.74 MHz and 100 MHz, respectively. Its throughput is estimated as 186.95 Tbps with 132,233 LUTs and 26,586 flip-flops, and it contains 576 adders and 48 non-trivial multipliers. The input and output timing waveforms of the proposed 8 × 8 OTFS transmitter are shown in Fig. 9, where the number of input symbols in the delay-Doppler domain is 64. Fig. 9a shows the input data of the OTFS system. Each input symbol is 2 bits wide and takes 10 ns to process, so transmitting the 64 input symbols takes 640 ns.
(b) illustrates the starting point of the receiver input of the OTFS system. Here, (c) illustrates the real part output and (d) illustrates the imaginary part output. The size of the real part and imaginary part of the OTFS transmitter output is 27-bits. We notice that the maximum operating frequency and clock frequency of an optimized architecture of 8 × 8 OTFS transmitter are estimated as 139.64 MHz and 100 MHz. The throughput of the OTFS system based on maximum frequency is estimated as 196.67 Tbps, which occupies 75, 026 LUTs and 26, 924 flip-flops on 7vx485tffg1157 − 1 FPGA hardware device. 6 Conclusion In this manuscript, we proposed the fully parallel and pipelined VLSI hardware implementation of conventional and optimized 8 × 8 OTFS transmitter that can achieve 196.67 Tbps throughput at 139.64 MHz maximum operating frequency for future 5G and 6G wireless communication systems. Furthermore, the proposed hardware has been implemented with 32 non-trivial multipliers instead of 48 non-trivial multipliers. The 8 × 8 OTFS transmitter requires two IFFT stages and one FFT stage and each stage requires 3 butterfly stages that means it consists of 9 butterfly stages. In the proposed architecture design, a low latency and MDC feed forward pipelined FFT and IFFT architectures are applied to the transmitter. The 7vx485tffg1157-1 FPGA device has been used to verify these transmitters in Xilinx Vivado software environment. This FPGA device contains 303, 600 LUTs. The conventional hardware architecture requires 132, 233 LUTs i.e., it requires 43.5% of LUTs on FPGA board. The proposed hardware architecture consumes 75, 026 LUTs i.e., it occupies 24.7% on FPGA board. In this work, it reduces around 20% of LUTs on 7vx485tffg1157-1 FPGA board if optimized hardware architecture is used instead of conventional hardware architecture of the OTFS Transmitter. Data Availability This manuscript has no associated data. References 1. 
Yan, H., Wan, L., Zhou, S., Shi, Z., Cui, J.-H., Huang, J., & Zhou, H. (2012). DSP based receiver implementation for OFDM acoustic modems. Physical Communication, 5, 22–32.
2. Ribeiro, C., Gomes, R., Duarte, L., Hammoudeh, A., & Caldeirinha, R. F. (2020). Multi-gigabit/s OFDM real-time based transceiver engine for emerging 5G MIMO systems. Physical Communication, 38, 100957.
3. Molisch, A. F., Toeltsch, M., & Vermani, S. (2007). Iterative methods for cancellation of intercarrier interference in OFDM systems. IEEE Transactions on Vehicular Technology, 56, 2158–2167.
4. Wang, Z., Zhou, S., Giannakis, G. B., Berger, C. R., & Huang, J. (2011). Frequency-domain oversampling for zero-padded OFDM in underwater acoustic communications. IEEE Journal of Oceanic Engineering, 37, 14–24.
5. Shen, W., Dai, L., An, J., Fan, P., & Heath, R. W. (2019). Channel estimation for orthogonal time frequency space (OTFS) massive MIMO. IEEE Transactions on Signal Processing, 67, 4204–4217.
6. Gorokhov, A., & Linnartz, J.-P. (2004). Robust OFDM receivers for dispersive time-varying channels: Equalization and channel acquisition. IEEE Transactions on Communications, 52, 572–583.
7. Hadani, R., & Monk, A. (2018). OTFS: A new generation of modulation addressing the challenges of 5G. arXiv preprint http://arxiv.org/abs/1802.02623
8. Shrimali, Y., & Sharma, J. B. (2021). Efficient HDL implementation of turbo coded MIMO-OFDM physical layer. In Nanoelectronics, Circuits and Communication Systems (pp. 739–753). Springer.
9. Hadani, R., Rakib, S., Molisch, A., Ibars, C., Monk, A., Tsatsanis, M., Delfeld, J., Goldsmith, A., & Calderbank, R. (2017). Orthogonal time frequency space (OTFS) modulation for millimeter-wave communications systems. In 2017 IEEE MTT-S International Microwave Symposium (IMS) (pp. 681–683). IEEE.
10. Ramachandran, M. K., & Chockalingam, A. (2018). MIMO-OTFS in high-Doppler fading channels: Signal detection and channel estimation. In 2018 IEEE Global Communications Conference (GLOBECOM) (pp. 206–212). IEEE.
11. Singh, P., Yadav, K., Mishra, H. B., & Budhiraja, R. (2022b). BER analysis for OTFS zero forcing receiver. IEEE Transactions on Communications, (pp. 1–1).
12. Mishra, H. B., Singh, P., Prasad, A. K., & Budhiraja, R. (2021). OTFS channel estimation and data detection designs with superimposed pilots. IEEE Transactions on Wireless Communications.
13. Singh, P., Gupta, A., Mishra, H. B., & Budhiraja, R. (2022a). Low-complexity ZF/MMSE MIMO-OTFS receivers for high-speed vehicular communication. IEEE Open Journal of the Communications Society, 3, 209–227.
14. Hieu, N. T., Tu, N. T., Ha, N. V., Nguyen, T. T. T., & Phu, B. H. (2012). FPGA implementation of MIMO-OFDM STBC systems. In 2012 International Conference on Control, Automation and Information Sciences (ICCAIS) (pp. 349–353).
15. Murali, K. R., & Chockalingam, A. (2018). On OTFS modulation for high-Doppler fading channels. In 2018 Information Theory and Applications Workshop (ITA) (pp. 1–10).
16. Raviteja, P., Hong, Y., Viterbo, E., & Biglieri, E. (2018). Practical pulse-shaping waveforms for reduced-cyclic-prefix OTFS. IEEE Transactions on Vehicular Technology, 68, 957–961.
17. Kassa, S., Gupta, P., Kumar, M., Stephan, T., & Kannan, R. (2021). Rotated majority gate-based 2n-bit full adder design in quantum-dot cellular automata nanotechnology. Circuit World.
18. Chithaluru, P., Stephan, T., Kumar, M., & Nayyar, A. (2022). An enhanced energy-efficient fuzzy-based cognitive radio scheme for IoT. Neural Computing and Applications, 34, 19193–19215.
19. Iraqi, Y., & Al-Dweik, A. (2020). Efficient information transmission using smart OFDM for IoT applications. IEEE Internet of Things Journal, 7, 8397–8409.
20. Kung, S. (1985). VLSI array processors. IEEE ASSP Magazine, 2, 4–22.
21. Nadal, J., Nour, C. A., & Baghdadi, A. (2015). Low-complexity pipelined architecture for FBMC/OQAM transmitter. IEEE Transactions on Circuits and Systems II: Express Briefs, 63, 19–23.
22. Shaheen, I. A., & Zekry, A. (2019). Design and implementation of FBMC/OQAM transceiver for 5G wireless communication system. In 2019 International Conference on Promising Electronic Technologies (ICPET) (pp. 73–79). IEEE.
23. Yoshizawa, S., & Miyanaga, Y. (2008). Use of a variable wordlength technique in an OFDM receiver to reduce energy dissipation. IEEE Transactions on Circuits and Systems I: Regular Papers, 55, 2848–2859.
24. Raviteja, P., Phan, K. T., Hong, Y., & Viterbo, E. (2018). Interference cancellation and iterative detection for orthogonal time frequency space modulation. IEEE Transactions on Wireless Communications, 17, 6501–6515.
25. Chang, C.-K., Hung, C.-P., & Chen, S.-G. (2003). An efficient memory-based FFT architecture. In Proceedings of the 2003 International Symposium on Circuits and Systems (ISCAS’03) (Vol. 2, pp. II–II). IEEE.
26. Macleod, M. D. (2005). Multiplierless implementation of rotators and FFTs. EURASIP Journal on Advances in Signal Processing, 2005, 1–8.
27. Nibouche, O., Boussakta, S., Darnell, M., & Benaissa, M. (2010). Algorithms and pipeline architectures for 2-D FFT and FFT-like transforms. Digital Signal Processing, 20, 1072–1086.
28. Garrido, M., Möller, K., & Kumm, M. (2018). World’s fastest FFT architectures: Breaking the barrier of 100 GS/s. IEEE Transactions on Circuits and Systems I: Regular Papers, 66, 1507–1516.
29. Liu, Y., Wang, C., Sun, J., Du, S., & Hong, Q. (2022). One-step calculation circuit of FFT and its application. IEEE Transactions on Circuits and Systems I: Regular Papers, (pp. 1–13).
30. Immareddy, S., & Sundaramoorthy, A. (2022). A survey paper on design and implementation of multipliers for digital system applications. Artificial Intelligence Review, (pp. 1–29).

Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Sai Kumar Dora was born in Andhra Pradesh, India, in 1994. He received the Diploma in Electronics and Communication Engineering from Dr. BRA GMR Polytechnic College, Andhra Pradesh, in 2012. He then completed his graduation in Electronics and Communication Engineering from JNTU Kakinada, Andhra Pradesh, in 2015, and in 2019 he received his postgraduate degree in System and Signal Processing from JNTU-UCEV College, Andhra Pradesh. He then worked as a project research fellow at IIT (ISM) from 2020 to 2022. He is currently pursuing his Ph.D. degree at IIT (ISM) Dhanbad, and his areas of interest are systems and signal processing, hardware circuit design for wireless communication, and FPGA prototyping.

Himanshu B. Mishra received the M.Tech. degree in Electronics and Communication Engineering from National Institute of Technology Rourkela, Odisha, India, in 2012, and the Ph.D. degree in Electrical Engineering from Indian Institute of Technology Kanpur, Uttar Pradesh, India, in 2016. He is currently working as an Assistant Professor with the Electronics Engineering Department, Indian Institute of Technology (Indian School of Mines) Dhanbad, India. His current research interests include parameter estimation, optimization techniques, and transceiver design for next-generation wireless technologies, including orthogonal time frequency space (OTFS), filter-bank multicarrier (FBMC), intelligent reflecting surfaces (IRS), and massive MIMO. He received the Best M.Tech. Award in the communication and signal processing domain in 2012.

Manodipan Sahoo was born in Haldia, West Bengal, India, in 1983. He received the M.Tech. degree in Instrument Technology from the Indian Institute of Science, Bangalore, in 2006, and the Ph.D. degree from IIEST, Shibpur, India, in 2016. His Ph.D. thesis was on “Modeling and Analysis of Carbon Nanotube and Graphene Nanoribbon based Interconnects”. He is currently serving as an Assistant Professor in the Department of Electronics Engineering, Indian Institute of Technology (Indian School of Mines), Dhanbad, India. His research interests include modeling and simulation of nano-interconnects and nano-devices, VLSI circuits and systems, and the Internet of Things. He has published more than 50 articles in archival journals and refereed conference proceedings. He has also published a book entitled “Modelling and Simulation of CNT and GNR Interconnects” with Lambert Academic Publishers in 2019, and a book chapter entitled “Modelling Interconnects for Future VLSI Circuit Applications” with IET Publishers in 2019. He is a Senior Member of IEEE, a Fellow of IETE, a Member of IEI, and a Life Member of the Instrument Society of India.