FFT PROCESSORS IN OTFS

Low Complexity Implementation of OTFS Transmitter using Fully Parallel and
Pipelined Hardware Architecture
Article in Journal of Signal Processing Systems · February 2023
DOI: 10.1007/s11265-023-01847-x
Sai Kumar Dora1 · Himanshu B. Mishra1 · Manodipan Sahoo1
Received: 27 April 2022 / Revised: 16 January 2023 / Accepted: 16 January 2023
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023
Abstract
In this work, we develop the conventional hardware architecture of an orthogonal time frequency space (OTFS)
modulation based wireless transmitter, which targets highly reliable communication between high-speed moving
devices. We use a parallel and deeply pipelined hardware architecture of fast Fourier transforms/inverse fast
Fourier transforms (FFTs/IFFTs) to accelerate the execution of OTFS on a field-programmable gate array (FPGA)
with high accuracy and maximum throughput. Additionally, we propose an optimized OTFS transmitter architecture
with a modified Booth multiplier and memory that i) requires low hardware resources and ii) provides high
throughput. The synthesis results show that the conventional OTFS architecture achieves a maximum throughput of
186.95 Tbps with 132,233 lookup tables (LUTs) and 26,586 flip-flops at a 100 MHz clock frequency. The proposed
OTFS hardware achieves a better throughput of 196.67 Tbps with 75,026 LUTs at a maximum operating frequency of
139.64 MHz on the 7vx485tffg1157-1 FPGA device.
Keywords OTFS · FFT · IFFT · FPGA · Maximum operating frequency · Throughput
1 Introduction
In current wireless communication standards, orthogonal frequency division multiplexing (OFDM) is widely
used in quasi-static frequency-selective fading scenarios because it i) combats the inter-symbol
interference (ISI) that arises from the frequency-selective nature of the channel and ii) can be
implemented with an efficient hardware architecture [1]. However, in high-speed wireless communication
scenarios, Doppler shifts give rise to inter-carrier interference (ICI), which in turn significantly
degrades the performance of OFDM systems with traditional transceivers [2]. Serial/parallel interference
cancellation and operator perturbation techniques, which are iterative techniques for solving a system of
linear equations in traditional OFDM, have been proposed in [3, 4] to overcome this ICI problem.
Furthermore, the authors in [5] proposed linear and non-linear equalization techniques to eliminate the
ICI. Transmit diversity and frequency domain equalization schemes were also proposed in [5, 6] for
time-varying channels to improve the system performance. However, these techniques cannot completely
nullify the ICI in time-varying systems [7, 8].
* Sai Kumar Dora
saikumar19190@gmail.com
* Himanshu B. Mishra
himanshu@iitism.ac.in
Manodipan Sahoo
manodipan@iitism.ac.in
1 Electronics Engineering Department, IIT (ISM)-Dhanbad, Dhanbad, India
Recently, Hadani et al. [9] proposed a novel modulation scheme known as orthogonal time frequency space
(OTFS) modulation, which converts the fast fading channel into an almost quasi-static channel by
multiplexing the transmit symbols in the delay-Doppler domain. Consequently, OTFS provides a better bit
error rate (BER) than the existing OFDM modulation scheme in high mobility scenarios [10, 11] by using the
novel signal processing algorithms developed in [10, 12, 13].
It can be observed from the existing literature [14, 15] that the basic architecture of an OFDM
transmitter is very simple, as it consists of only one transform, i.e., the IFFT. On the other hand, the
architecture of an OTFS transmitter is more complex, as it consists of two operations known as the inverse
symplectic fast Fourier transform (ISFFT) and the Heisenberg transform. We further observe from [16] that
i) the ISFFT consists of many IFFT and FFT operations and ii) the Heisenberg transform consists of only
IFFT operations. In light of these observations, we believe that knowledge of the existing OFDM
transmitter architectures will help to build a robust architecture for the OTFS transmitter. We therefore
now discuss the existing literature on hardware architectures for OFDM transmitters.
The basic building block of an efficient hardware implementation is the adder, and the authors in [17]
proposed a novel approach for designing robust 64-bit and 32-bit full adder circuits in quantum-dot
cellular automata using a rotated majority gate methodology. The evolution of hardware implementations has
given rise to new standards for the design and evaluation of fast algorithms that emphasize consistency,
modularity, and localization in the architectural realization. The ultimate purposes of an effective
system model in modern communication systems are to decrease energy consumption, increase network
lifetime, and improve delay and throughput [18], and the Internet of Things (IoT) and machine-to-machine
communications are among the essential game-changing technologies for the wireless communications field
[19]. In this context, systolic arrays are well known for the VLSI implementation of processors, as they
satisfy the above-mentioned architectural requirements [20]. Therefore, the authors in [21] developed an
efficient OFDM transmitter by implementing the IFFT architecture using systolic arrays. We also observe
from [22] that an efficient log maximum a posteriori (MAP) turbo encoder/decoder and a pipelined IFFT
processor are used to further enhance the throughput and reduce the complexity of the OFDM transmitter. A
single delay feedback based pipelined memoryless FFT/IFFT processor helps to improve the area and power
efficiency [20], whereas a log MAP turbo decoder achieves good error correction capacity and high speed.
However, these techniques do not work in the presence of a time-varying channel at vehicle speeds of
100-500 km/h.
Since OTFS is a novel air interface that provides better performance than OFDM in high Doppler scenarios,
it requires significant research attention to develop efficient hardware architectures for both the
transmitter and the receiver. To the best of our knowledge, no literature exists that describes the
hardware implementation of the OTFS transmitter and receiver. In this paper, we focus on developing the
architecture of the OTFS transmitter. In particular, we develop the conventional architecture and propose
an optimized architecture for an 8 × 8 OTFS transmitter, where 8 × 8 denotes 8 delay and 8 Doppler bins.
In the proposed architecture, we use memory instead of rotators in the ISFFT module and implement an
efficient rotator by using a modified Booth multiplier. Moreover, we analyse the finite word length of the
output of the OTFS transmitter and find the optimum word length at each butterfly stage using a
simulation-based trial method to reduce hardware cost [23]. This method yields the system's minimum word
length and produces accurate results even under worse-than-expected conditions. We consider the OTFS
architecture implementation to be fully parallel and pipelined.
Figure 1 Block Diagram of Orthogonal Time Frequency Space (OTFS) Modulation.
The rest of this paper is organized as follows. In Section 2, we present a brief overview of the OTFS
system model. Section 3 is dedicated to the step-by-step procedure of the hardware implementation of the
OTFS transmitter system model and the conventional hardware architecture. The proposed low complexity
hardware architecture for the OTFS transmitter is discussed in Section 4, and Section 5 presents a
numerical comparison of the conventional and optimized architectures of the OTFS transmitter. Finally,
conclusions are drawn in Section 6.
1.1 Notations
We use the following notations throughout the paper. An upper case boldface letter
$\mathbf{X} \in \mathbb{C}^{M \times N}$ indicates an M × N complex matrix, and a lower case boldface
letter $\mathbf{x} \in \mathbb{C}^{P \times 1}$ denotes a P × 1 vector. The Hermitian and transpose
operations of a matrix $\mathbf{A}$ are denoted by $(\mathbf{A})^H$ and $(\mathbf{A})^T$, respectively.
2 OTFS System Model
In this section, we discuss the system model for OTFS transmitter. Here, we consider SISO-OTFS system in which each
frame is transmitted over M delay and N Doppler bins. The
basic block diagram for the OTFS transmitter is shown in
Fig. 1 [24].
We consider x[k, l] as the transmitted symbol at the kth Doppler and lth delay. In practice, these symbols
are drawn from the constellation of a digital modulation scheme and can be represented by a matrix
$\mathbf{X} \in \mathbb{C}^{M \times N}$. Next, the serial-to-parallel converter gives an output
$\mathbf{x} \in \mathbb{C}^{MN \times 1}$, which can be represented mathematically as
$\mathbf{x} = \mathrm{vec}(\mathbf{X}^T)$. Next, the ISFFT operation transforms the parallel symbols
$\mathbf{x}$ from the delay-Doppler domain to time-frequency domain symbols, which is explained
mathematically as follows. The time-frequency symbols $\mathbf{X}_{\mathrm{ISFFT}}$ at the output of the
ISFFT block can be written in matrix form as [16]
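$$\mathbf{X}_{\mathrm{ISFFT}} = \mathbf{F}_M \mathbf{X} \mathbf{F}_N^H \tag{1}$$
where $\mathbf{F}_K \in \mathbb{C}^{K \times K}$ denotes the K-point FFT matrix with
$\mathbf{F}_K(p, q) = \{ \frac{1}{\sqrt{K}} W_K^{pq} \}_{p,q=0}^{K-1}$, where
$W_K^{pq} = e^{-j 2 \pi pq / K}$. A transmit windowing matrix $\mathbf{W}_{tx}$ is multiplied element-wise
with $\mathbf{X}_{\mathrm{ISFFT}}$, generating 2D time-frequency symbols
$\mathbf{X}_{\mathrm{FT}} \in \mathbb{C}^{M \times N}$, which can be written as
$$\mathbf{X}_{\mathrm{FT}} = \mathbf{X}_{\mathrm{ISFFT}} \odot \mathbf{W}_{tx} = \mathbf{F}_M \mathbf{X} \mathbf{F}_N^H \tag{2}$$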
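Here, we consider a trivial rectangular window to reduce the computational complexity of complex
multiplications, i.e., $\mathbf{W}_{tx} \in \mathbb{C}^{M \times N}$ is the matrix of all 1's, which in
turn results in the following time domain signal $\mathbf{S} \in \mathbb{C}^{M \times N}$ at the output of
the Heisenberg transform shown in Fig. 1:
$$\mathbf{S} = \mathbf{F}_M^H \mathbf{X}_{\mathrm{FT}} = \mathbf{F}_M^H \mathbf{F}_M \mathbf{X} \mathbf{F}_N^H = \mathbf{X} \mathbf{F}_N^H \tag{3}$$
The vector representation of the discrete time domain signal is $\mathbf{s} = \mathrm{vec}(\mathbf{S})$.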
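As an illustration only (it is not part of the synthesized design), the collapse of the ISFFT followed by
the Heisenberg transform in Eq. (3) can be checked numerically. The sketch below uses plain Python with
unitary DFT matrices; the helper names and the deterministic BPSK test pattern are assumptions made for
this example.

```python
import cmath

def dft_matrix(k):
    """Unitary K-point DFT matrix with entries exp(-j*2*pi*p*q/K) / sqrt(K)."""
    scale = 1 / k ** 0.5
    return [[scale * cmath.exp(-2j * cmath.pi * p * q / k) for q in range(k)]
            for p in range(k)]

def hermitian(a):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[a[r][c].conjugate() for r in range(len(a))] for c in range(len(a[0]))]

def matmul(a, b):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def otfs_full_chain(x):
    """ISFFT (Eq. 1), all-ones window (Eq. 2), then Heisenberg transform (Eq. 3)."""
    f_m, f_n = dft_matrix(len(x)), dft_matrix(len(x[0]))
    x_isfft = matmul(matmul(f_m, x), hermitian(f_n))
    x_ft = x_isfft  # element-wise product with a matrix of ones changes nothing
    return matmul(hermitian(f_m), x_ft)

def otfs_shortcut(x):
    """Right-hand side of Eq. (3): S = X F_N^H."""
    return matmul(x, hermitian(dft_matrix(len(x[0]))))

def max_abs_diff(a, b):
    return max(abs(u - v) for row_a, row_b in zip(a, b) for u, v in zip(row_a, row_b))

# Hypothetical 8 x 8 BPSK delay-Doppler frame (arbitrary deterministic pattern).
x_dd = [[1 if (3 * k + 5 * l) % 4 < 2 else -1 for l in range(8)] for k in range(8)]
print(max_abs_diff(otfs_full_chain(x_dd), otfs_shortcut(x_dd)))
```

The printed difference should be of the order of floating-point rounding, which reflects that
$\mathbf{F}_M^H \mathbf{F}_M$ is the identity for the unitary FFT matrix defined above.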
3 Conventional Hardware Architecture
for OTFS Transmitter
In this section, we discuss the step-by-step procedure of the conventional hardware implementation of
OTFS, as shown in Fig. 2. Initially, we consider a stream of binary information bits (digital sequence)
that arises from the analog-to-digital converter (ADC) [16]. Next, using digital modulation techniques,
the binary sequences are mapped into a sequence of complex numbers, which is converted from serial to
parallel and stored in an array of size MN × 1. This array of complex numbers can be represented
mathematically as $\mathbf{x} = \mathrm{vec}(\mathbf{X}^T)$. In the next step, the bit positions are
exchanged to obtain $\mathbf{X}^T$, which gives rise to
$\mathbf{X}_{\mathrm{IFFT}} = \mathbf{F}_N^H \mathbf{X}^T$ by applying the N-point IFFT M times in
parallel. We further exchange the bit positions of the data sequence to obtain
$\mathbf{X}_{\mathrm{IFFT}}^T = \mathbf{X} \mathbf{F}_N^H$, which undergoes N parallel M-point FFT
operations to generate the ISFFT data, i.e.,
$\mathbf{X}_{\mathrm{ISFFT}} = \mathbf{F}_M \mathbf{X}_{\mathrm{IFFT}}^T$, as given in Eq. (1). Here, a
transmit windowing matrix is required to provide full diversity and to randomize the phases of the
transmitted symbols to eliminate the inter-cell interference [5]. In this architecture, we assume for
simplicity that the window matrix contains all 1's, which results in the time-frequency symbols
$\mathbf{X}_{\mathrm{FT}}$ represented mathematically in Eq. (2). These time-frequency symbols are
converted into the time domain symbols $\mathbf{S}$ through the Heisenberg transform, which is represented
mathematically as $\mathbf{S} = \mathbf{F}_M^H \mathbf{X}_{\mathrm{ISFFT}}$ in Eq. (3). We observe from
Eq. (3) that the Heisenberg transform consists of N parallel M-point IFFTs. We further use a
parallel-to-serial converter to enable serial transmission of these symbols.
Figure 2 Flow chart of Orthogonal Time Frequency Space (OTFS) Modulation.
Note that the FFT and IFFT are the basic building blocks of the ISFFT and Heisenberg modules in the
architecture of the OTFS transmitter. The FFT/IFFT hardware architecture can be implemented in three ways:
memory-based implementation, pipelined implementation, and direct implementation. In the memory-based
architecture [25], the data of each FFT/IFFT stage is stored in memory and reused. In the direct
implementation architecture, each addition and multiplication of the FFT/IFFT is mapped to its own adder
and multiplier. Pipelined hardware architectures compute the FFT/IFFT continuously. If the size of the FFT
is equal to the number of sub-carriers, then the direct implementation architecture acts as a parallel and
pipelined architecture, providing the highest performance. The pipelined hardware architecture [26, 27]
can be implemented as single-path delay feedback, multi-path delay feedback, or multi-path delay
commutator (MDC). The Radix-2² MDC (R2MDC) is the most traditional method for implementing a pipelined
FFT/IFFT architecture. In this architecture, the input data sequence is split into two parallel streams,
and the correct distance between the data elements entering each butterfly is set by the appropriate delay.
Figure 3 Conventional inverse symplectic fast Fourier transform (ISFFT) architecture using Radix-2²
multi-path delay commutator (R2MDC).
In the OTFS architecture, we have implemented the ISFFT module with a decimation-in-frequency inverse fast
Fourier transform (DIF-IFFT) and a decimation-in-time fast Fourier transform (DIT-FFT). The output of the
DIF-IFFT is generated in bit-reversed order, which is also the input order of the DIT-FFT, so the
connection between them is hard-wired during the code generation process and requires no extra hardware.
Moreover, the architecture is completely parallel, and all outputs are provided in parallel. One of the
major factors in achieving high performance in FPGA circuits is pipelining. A register level is added
after each butterfly stage to achieve deep pipelining.
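The bit-reversed output order of a decimation-in-frequency transform can be seen in a small software
model. The following sketch is a generic radix-2 DIF FFT in Python, not the R2MDC hardware itself; the
function names are assumptions for this example, and pipeline registers are not modelled.

```python
import cmath

def dif_fft_bit_reversed(x):
    """Radix-2 DIF FFT; the result is left in bit-reversed order (no reordering pass)."""
    data = list(x)
    n = len(data)          # assumed to be a power of two
    span = n // 2
    while span >= 1:       # one loop iteration corresponds to one butterfly stage
        for start in range(0, n, 2 * span):
            for k in range(span):
                w = cmath.exp(-2j * cmath.pi * k / (2 * span))  # twiddle factor
                a, b = data[start + k], data[start + k + span]
                data[start + k] = a + b               # butterfly adder
                data[start + k + span] = (a - b) * w  # butterfly subtractor + rotator
        span //= 2
    return data

def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of the index i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

samples = [complex(i % 3, -i) for i in range(8)]
out = dif_fft_bit_reversed(samples)
# Frequency bin k is found at position bit_reverse(k, 3) of the DIF output.
natural_order = [out[bit_reverse(k, 3)] for k in range(8)]
```

Because a DIT transform accepts its input in exactly this bit-reversed order, chaining DIF and DIT stages
needs only fixed wiring between them, which is the property used in the text above.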
We next discuss the detailed conventional architecture of the ISFFT and Heisenberg modules. Fig. 3 depicts
a fully parallel and pipelined conventional ISFFT processor based on the Radix-2² MDC hardware
architecture. It is the most straightforward architecture for implementing the 8 × 8 ISFFT. Typically,
implementing the 8 × 8 ISFFT requires eight complex coefficients (C2), and the hardware implementation
separates the additive butterflies (BF) from the trivial and non-trivial rotators. The 8 × 8 ISFFT is made
up of six butterfly stages, and each butterfly stage of an FFT/IFFT requires four butterflies. One
register level is inserted after each butterfly stage, and each register level requires one clock cycle.
The 8 × 8 ISFFT requires eight parallel IFFTs, eight parallel FFTs, and one hard-wired exchange of bit
positions (EBP). Each 8-point FFT/IFFT requires two complex multipliers, and each complex multiplier is
implemented with a multiplierless rotator. The rotator is implemented using advanced shift-and-add
operations to reduce the delay and resource utilization.
Figure 4 Conventional rotator hardware implementation using carry look-ahead (CLA) adders and barrel
shifters.
While implementing the ISFFT and Heisenberg transform hardware architecture, we consider some major
technical issues: i) the rotators require proper design knowledge, otherwise they occupy more area and
consume more power, ii) the rotators must be implemented with advanced add-and-shift techniques to minimise
the delay and complexity, and iii) the coefficients of the rotators must be selected to achieve accurate
results with fewer LUTs [28]. The implementation of the rotator plays a major role in the hardware
modelling of OTFS modulation, since a large number of rotators is required to implement the fully parallel
ISFFT and Heisenberg transform.
The hardware implementation of the rotators in the 8-point FFT/IFFT is simple and avoids extra hardware
because the real and imaginary parts of the non-trivial twiddle factor (which has the form C ± jS) are
equal in magnitude, i.e., C = ±S, where C is cos θ and S is sin θ. Let us consider the rotator input as
x + jy, where x is the real part and y is the imaginary part. The rotator first computes Cx and Cy, and
then, depending on the input sign, an addition or subtraction produces the result. Fig. 4 shows the
conventional hardware architecture of the rotator using barrel shifters, which multiplies its input by
0.7071. The conventional architecture employs six carry look-ahead (CLA) adders and six shifters.
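A minimal software sketch of such a multiplierless rotation is given below. It assumes the twiddle factor
C − jC with C ≈ 0.7071 and approximates C by 181/256 using shifts and adds; this 8-bit approximation and
the function names are choices made for the illustration and do not reproduce the exact adder and shifter
arrangement of Fig. 4.

```python
def times_0p7071(v):
    """Approximate v * 0.7071 as v * 181 / 256 using only shifts and adds.

    181 = 128 + 32 + 16 + 4 + 1, so the product is a sum of shifted copies of v.
    """
    return ((v << 7) + (v << 5) + (v << 4) + (v << 2) + v) >> 8

def rotate_c_minus_jc(x, y):
    """Multiply x + jy by C - jC: the result is C(x + y) + jC(y - x)."""
    cx, cy = times_0p7071(x), times_0p7071(y)  # first compute Cx and Cy
    return cx + cy, cy - cx                    # then one addition and one subtraction

real, imag = rotate_c_minus_jc(1000, -400)
```

For the input 1000 − j400, the exact products are about 424.3 and −989.9, so the integer outputs of the
sketch agree to within the truncation error of the shifts.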
Figure 5 Hardware implementation of 8 × 8 fully parallel
inverse symplectic fast Fourier
transform (ISFFT).
In the M × N OTFS preprocessing, the ISFFT is used to build two-dimensional symbols and translate them to
the time-frequency domain. The M × N inputs are taken in natural order of bit position, i.e.,
0, 1, 2, 3, ..., (M × N) − 1, as shown in Fig. 5, where M = N = 8. The inputs are divided into two equal
groups based on odd and even bit positions, so that each group contains MN/2 symbols, and each group is
further divided into two groups depending on even and odd positions. This process is repeated until the
number of elements in each group equals N. These exchanged bit positions are assigned to the IFFT modules.
Furthermore, the symbol positions at the output of the IFFT modules are divided into even and odd parts
until the number of elements in each group equals M, and these exchanged bit positions are then given to
the FFT stage.
In the 8 × 8 ISFFT architecture, the natural order of the bit positions is b5 b4 b3 b2 b1 b0 and the order
of the exchanged bit positions is b2 b1 b0 b5 b4 b3. This means that the natural order of the input is
0, 1, 2, 3, 4, ..., 63, and the order after exchanging the bit positions is 0, 8, 16, 24, ..., 63. In the
IFFT stage, the first Radix-2 butterfly stage of the first IFFT module (8 pt IFFT_1) operates on symbol
pairs with position values (0,32), (8,40), (16,48), and (24,56), which in binary form are
(000000,100000), (001000,101000), (010000,110000), and (011000,111000). In each pair, the two index values
are the same except for the most significant bit (MSB), which is the b5 position. This holds for all pairs
of bit positions in the first butterfly stage of the ISFFT. Likewise, the positions of the butterfly pairs
are the same except for the b4 position in the second butterfly stage and the b3 position in the third
butterfly stage. After the third butterfly stage, the outputs of the butterflies have their bit positions
exchanged, as discussed earlier. In the 8 × 8 ISFFT, the numbers of rows and columns are equal, so both
bit position exchanges are the same. The input of the FFT stage is in bit-exchanged order and its output
is in natural order. In the fourth butterfly stage, the butterflies operate on symbol pairs with position
values (0,4), (1,5), (2,6), and (3,7), which in binary form are (000000,000100), (000001,000101),
(000010,000110), and (000011,000111). In each pair, all positions are unchanged except the b2 position. In
the fifth and sixth butterfly stages, the pair index values are the same except for the b1 and b0
positions, respectively. The output of the ISFFT is in natural order.
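The index bookkeeping described above can be reproduced with a few lines of Python. The sketch below is
only an illustration of the bit-position exchange and of the butterfly pairing; the function names are
assumptions and it does not model the hardware wiring itself.

```python
def exchange_bit_positions(i):
    """Map the 6-bit index b5 b4 b3 b2 b1 b0 to b2 b1 b0 b5 b4 b3."""
    return ((i & 0b111) << 3) | (i >> 3)

def butterfly_pairs(bit, indices):
    """Pair every index whose given bit is 0 with the index that has that bit set."""
    return [(i, i | (1 << bit)) for i in indices if not i & (1 << bit)]

exchanged_order = [exchange_bit_positions(i) for i in range(64)]
first_ifft_inputs = exchanged_order[:8]          # 0, 8, 16, ..., 56

stage1_pairs = butterfly_pairs(5, first_ifft_inputs)  # pairs differ only in b5
stage4_pairs = butterfly_pairs(2, range(8))           # pairs differ only in b2

print(stage1_pairs)  # [(0, 32), (8, 40), (16, 48), (24, 56)]
print(stage4_pairs)  # [(0, 4), (1, 5), (2, 6), (3, 7)]
```

Applying `exchange_bit_positions` twice returns the original index, which matches the remark that both
exchanges are identical when the numbers of rows and columns are equal.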
The conventional hardware architecture of the 8 × 8 OTFS transmitter occupies a large area and consumes
high power. It requires 132,233 LUTs, 26,586 FFs, and 48 non-trivial complex multipliers. Each non-trivial
complex multiplier requires 12 adders and 12 shifters. We propose an efficient architecture to reduce the
complexity of the hardware implementation, which is discussed in the section below.
Figure 6 Proposed pipelined inverse symplectic fast Fourier transform (ISFFT) processor from Fig. 5 using
Radix-2² multi-path delay commutator (R2MDC).
4 Proposed Low Complexity Hardware
Architecture for OTFS Transmitter
The proposed OTFS transmitter architecture is implemented with the Radix-2² DIF-FFT algorithm and an MDC
feed-forward pipelined architecture with fewer trivial and non-trivial rotators, where each non-trivial
rotator occupies more area. By inserting memory in place of the non-trivial multipliers in the ISFFT, the
number of non-trivial rotators is decreased. In the proposed OTFS architecture, the non-trivial rotators
are implemented with the modified Booth algorithm and multiplierless operations. Trivial rotators are
implemented by taking the 2's complement, interchanging the real and imaginary parts, and/or changing the
sign of the data.
OTFS modulation is composed of a preprocessing block, the ISFFT, which is the back-to-back connection of
the IFFT and FFT as shown in Fig. 6, followed by a conventional frequency-time modulator, the Heisenberg
transform. The ISFFT and Heisenberg transform consist of trivial and non-trivial rotators. The trivial
rotator coefficient angles are 0°, 90°, 180°, and 270°, with corresponding values 1, j, −1, and −j,
respectively [29]. These complex multipliers are considered trivial because they do not require any LUTs
or adders for implementation. In the hardware implementation of OTFS, the non-trivial rotators play a
major role. The conventional 8 × 8 ISFFT architecture requires 32 non-trivial rotators, as each FFT and
IFFT requires two non-trivial multipliers. In the proposed architecture, we do not use any non-trivial
multipliers in the first butterfly stage of the ISFFT. Here, the non-trivial multipliers are replaced with
ROM because the output values of the first butterfly stage can be predicted. Non-trivial rotators would
otherwise lead to additional pipeline registers, which occupy more area and consume more time.
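The reason the four trivial rotations need no multiplier can be made concrete with a short Python sketch.
It is an illustration under the assumption that a sample is held as a (real, imaginary) pair of integers;
the function name is hypothetical.

```python
def trivial_rotate(x, y, quarter_turns):
    """Multiply x + jy by 1, j, -1 or -j using only swaps and sign changes."""
    quarter_turns %= 4
    if quarter_turns == 0:      # multiply by 1: pass through
        return x, y
    if quarter_turns == 1:      # multiply by j: swap parts, negate the new real part
        return -y, x
    if quarter_turns == 2:      # multiply by -1: negate both parts
        return -x, -y
    return y, -x                # multiply by -j: swap parts, negate the new imaginary part

# Cross-check against ordinary complex multiplication.
factors = [1, 1j, -1, -1j]
results = [trivial_rotate(3, -5, q) for q in range(4)]
```

In hardware, the swap is a matter of wiring and each negation is a 2's complement, which is consistent
with the description of the trivial rotators above.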
In this work, OTFS modulation employs a BPSK digital modulator whose outputs are either 1 or −1, and these
are given to the first butterfly stage through the serial-to-parallel converter. The outputs of the first
butterfly stage are either 0 or ±2, and they are then passed through the complex multipliers $W_8^{0}$,
$W_8^{-1}$, $W_8^{-2}$, and $W_8^{-3}$. The output of the complex multiplier is either 0 or ±0.7071. So,
these predicted output values of the complex multiplier, 0 and ±0.7071, are stored in the ROM beforehand.
The output is read from the ROM depending on the complex multiplier input. In the proposed ISFFT
architecture, instead of 16 complex multipliers, we use ROM with only three stored values, which reduces
the number of pipeline registers and consumes less area.
To achieve high performance with efficient area utilization, the non-trivial rotators are replaced with
the proposed modified rotator [30], which is shown in Fig. 7. The modified rotator multiplies any input by
the 16-bit binary constant 1011010100000101, which represents 0.7071 in decimal. If the input size of the
real or imaginary part of the rotator is N bits, then the output size is N + 16 bits. In the modified
rotator, we use four CLA adders and three concatenations instead of six adders and six shifters. The
concatenation block does not require LUTs because it is just a hard-wired connection.
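The role of the 16-bit constant can be checked with a short fixed-point sketch. The code below uses a
plain shift-and-add over the set bits of the constant; it is not the modified Booth recoding or the
four-adder structure of Fig. 7, and the names used are assumptions for this illustration.

```python
COEFF = 0b1011010100000101   # 16-bit constant quoted in the text
FRACTION_BITS = 16           # the constant is read as COEFF / 2**16

def multiply_shift_add(x):
    """Return x * COEFF, built from one shifted copy of x per set bit of COEFF."""
    total = 0
    for bit in range(FRACTION_BITS):
        if (COEFF >> bit) & 1:
            total += x << bit
    return total

def output_word_length(n):
    """An N-bit operand times a 16-bit constant fits in N + 16 bits."""
    return n + FRACTION_BITS

print(COEFF / 2 ** FRACTION_BITS)       # close to 0.7071
product = multiply_shift_add(12345)     # same value as 12345 * COEFF
scaled = product >> FRACTION_BITS       # integer part of 12345 * 0.7071
```

Dividing the constant by 2^16 gives approximately 0.70711, and the product of an N-bit unsigned operand
with the constant never needs more than N + 16 bits, in line with the word-length rule stated above.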
In this modified rotator, the N input bits are given to the first CLA adder, and the input with two zeros
concatenated on its left side is given to the second CLA adder. The first eight bits of the second CLA
adder output are given to the final output, and the remaining bits are given to the first and third CLA
adders. Three zeros are concatenated on the left side of the second CLA adder output, and the result is
given to the third CLA adder. The first fourteen bits of the third CLA adder output and bits two to N + 1
of the first CLA adder output are given to the fourth CLA adder. The first two bits of the first CLA adder
output, the fourth CLA adder output, and the last eight bits of the third CLA adder output are given to
the final output.
In the proposed 8 × 8 OTFS architecture, the pipelined ISFFT module is implemented using the R2MDC
technique, which is shown in Fig. 6, and the modified rotator is implemented using concatenations and CLA
adders, as shown in Fig. 7. The ISFFT module consists of an FFT stage and an IFFT stage, where the FFT
stage consists of 8 FFTs and the IFFT stage consists of 8 IFFTs, which are used in parallel. Within the
ISFFT, the FFT stage follows the IFFT stage. Then, the 8-point Heisenberg transform, i.e., the IFFT, is
used 8 times in parallel as shown in Fig. 8. The OTFS modulation process thus consists of two IFFT module
stages and one FFT module stage, each of which contains three butterfly stages, giving 9 butterfly stages
in total, and one register level is placed after each butterfly stage. In the ISFFT module, the IFFT stage
does not have any non-trivial multipliers, each FFT has two non-trivial rotations, and the entire FFT
stage takes 16 non-trivial rotations. Each non-trivial rotator is implemented with the modified rotator
shown in Fig. 7. Next, the Heisenberg stage requires 16 non-trivial rotations. The overall implementation
of the proposed 8 × 8 OTFS modulation requires i) 32 non-trivial rotations, ii) 64 trivial multipliers for
implementing the trivial window multiplication, and iii) 576 adders.
Figure 7 Hardware implementation of modified rotator using modified Booth algorithm.
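The quoted resource counts can be cross-checked with simple arithmetic. The sketch below is one
decomposition that is consistent with the reported figures; the assumption that every butterfly
contributes one adder and one subtractor is ours and is not stated explicitly in the text.

```python
POINTS = 8                  # size of every FFT/IFFT
TRANSFORMS_PER_MODULE = 8   # eight parallel transforms in each module stage
MODULES = 3                 # ISFFT IFFT stage, ISFFT FFT stage, Heisenberg IFFT stage
ROTATORS_PER_TRANSFORM = 2  # non-trivial rotators in one 8-point FFT/IFFT

def butterfly_adders(points, transforms):
    """Adders plus subtractors needed by fully parallel radix-2 butterflies."""
    stages = points.bit_length() - 1   # log2(points) butterfly stages
    butterflies_per_stage = points // 2
    return transforms * stages * butterflies_per_stage * 2  # assumed: 1 adder + 1 subtractor

butterfly_stages = MODULES * (POINTS.bit_length() - 1)
adders = butterfly_adders(POINTS, TRANSFORMS_PER_MODULE * MODULES)
conventional_rotators = MODULES * TRANSFORMS_PER_MODULE * ROTATORS_PER_TRANSFORM
# In the proposed design the IFFT stage of the ISFFT uses ROM instead of rotators.
proposed_rotators = (MODULES - 1) * TRANSFORMS_PER_MODULE * ROTATORS_PER_TRANSFORM

print(butterfly_stages, adders, conventional_rotators, proposed_rotators)  # 9 576 48 32
```

Under these assumptions the totals of 9 butterfly stages, 576 adders, and 48 versus 32 non-trivial
rotators follow directly from the 8 × 8 frame size.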
In this paper, 64 delay-Doppler information symbols are considered for the implementation of 8 × 8 OTFS
modulation. The length of each information symbol is two bits, and the symbols are stored in registers. A
digital BPSK modulator modulates each information symbol, and these symbols are mapped onto a set of 2D
orthogonal basis functions that cover the bandwidth and duration of the transmission. The 64 serial
constellation symbols are converted into 64 parallel symbols by a serial-to-parallel converter, where each
symbol has a length of 2 bits. Next, these 64 parallel symbols are fed to the first butterfly stage of the
8 × 8 ISFFT. The bit lengths of the input and output of the ISFFT are 2 bits and 20 bits, respectively.
These 20-bit symbols are passed to the Heisenberg transform, which produces 27-bit outputs (i.e., at the
final butterfly stage of the Heisenberg transform), and the remaining least significant bits are truncated
using the simulation-based trial method. These 64 parallel information symbols (each 27 bits long) are
given to the parallel-to-serial converter, which converts them into 64 serial information symbols. It is
worth noting that in the ISFFT and Heisenberg transform, if the length of the butterfly stage input is N
bits, then the output of each butterfly stage adder or subtractor is (N + 1) bits. Furthermore, if the
input length of a rotator is N bits, the length of its output is (N + 16) bits.
Figure 8 Hardware architecture flow of conventional and optimized Orthogonal Time Frequency Space
modulation (OTFS) transmitter.
Table 1 Specifications of conventional OTFS (with Figs. 3 and 4) and optimized OTFS (with Figs. 6 and 7)
transmitter architecture.
Parameters                    Conventional architecture   Optimized architecture
OTFS size                     8 × 8                       8 × 8
Symbols                       64                          64
FFT & IFFT size               8                           8
Non-trivial multiplication    48                          32
Input word length (bits)      2                           2
Output word length (bits)     27                          27
Heisenberg transforms         8                           8
Butterfly stages              9                           9
The time period and bandwidth of each input and output symbol are 10 ns and 100 MHz, respectively. We
obtain a high throughput of 196.67 Tbps using the proposed architecture of the OTFS transmitter. The input
and output bit rates of the 8 × 8 OTFS transmitter are 200 Mbps and 2.7 Gbps, respectively. The word
length of the output symbol is 27 bits. For deep pipelining, a register level is placed after each module
and after each butterfly adder or subtractor.
Figure 9 Timing waveform of the signals in optimized OTFS transmitter.
Table 2 Estimated performance metrics of conventional architecture (with Figs. 3 and 4) and optimized
architecture (with Figs. 6 and 7) implemented on 7vx485tffg1157-1 FPGA device.
Parameters             Conventional architecture   Optimized architecture   % of improvement
Frequency (MHz)        100                         100
Time Period (ns)       10                          10
Latency (ms)           1.42                        1.42
Max frequency (MHz)    132.74                      139.64                   5.2
Min time period (ns)   7.5                         7.16                     4.5
Throughput (Tbps)      186.95                      196.67                   5.2
LUTs                   132,233                     75,026                   43.2
FFs                    26,586                      26,924                   1.27
Power (W)              1.42                        1.273                    10.35
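The bit rates and the improvement column of Table 2 follow from simple ratios, which the short Python
sketch below recomputes from the tabulated values. The dictionary layout and function names are
assumptions for this illustration; the relative change is taken as an absolute value, so the flip-flop
entry is reported as a magnitude even though the optimized design uses slightly more flip-flops.

```python
TABLE_2 = {
    # parameter: (conventional, optimized), values copied from Table 2
    "Max frequency (MHz)": (132.74, 139.64),
    "Min time period (ns)": (7.5, 7.16),
    "Throughput (Tbps)": (186.95, 196.67),
    "LUTs": (132233, 75026),
    "FFs": (26586, 26924),
    "Power (W)": (1.42, 1.273),
}

def relative_change_percent(conventional, optimized):
    """Magnitude of the change relative to the conventional value, in percent."""
    return abs(optimized - conventional) / conventional * 100

def bit_rate_bps(word_length_bits, clock_hz):
    """One word per clock cycle: bits per second = word length x clock frequency."""
    return word_length_bits * clock_hz

changes = {name: relative_change_percent(*pair) for name, pair in TABLE_2.items()}
input_rate = bit_rate_bps(2, 100_000_000)    # 2-bit symbols at 100 MHz
output_rate = bit_rate_bps(27, 100_000_000)  # 27-bit symbols at 100 MHz

for name, value in changes.items():
    print(f"{name}: {value:.2f} %")
```

With a 100 MHz clock, 2-bit inputs correspond to 200 × 10^6 bits per second and 27-bit outputs to
2.7 × 10^9 bits per second, and the recomputed percentages agree with the table to within rounding.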
5 Results
The input and output specifications of the conventional and optimized 8 × 8 OTFS transmitters are shown in
Table 1. The 8 × 8 OTFS transmitter has 64 information symbols. It requires one 8 × 8 ISFFT (which
contains 8 FFTs and 8 IFFTs) and eight Heisenberg transforms. Since each transform contains three
butterfly stages, 9 butterfly stages are required for implementing the 8 × 8 OTFS transmitter. In the
proposed architecture, the number of non-trivial rotators is decreased from 48 to 32.
Table 2 shows the difference in synthesis results between the conventional and modified architectures of
the fully parallel and pipelined 8 × 8 OTFS transmitter. We observe from the simulation results of the
conventional 8 × 8 OTFS architecture that the maximum operating frequency and clock frequency are
estimated to be 132.74 MHz and 100 MHz, respectively. The throughput of the OTFS system is estimated as
186.95 Tbps with 132,233 LUTs and 26,586 flip-flops. This architecture contains 576 adders and 48
non-trivial multipliers.
The input and output timing waveforms of the proposed architecture of the 8 × 8 OTFS transmitter are shown
in Fig. 9. Here, the number of input symbols in the delay-Doppler domain is 64. Fig. 9a illustrates the
input data of the OTFS system. Each input symbol is 2 bits in size and takes 10 ns to process, so it takes
640 ns to transmit the 64 symbols of input data. Fig. 9b illustrates the starting point of the receiver
input of the OTFS system, while Fig. 9c and Fig. 9d illustrate the real part and the imaginary part of the
output, respectively. The real part and imaginary part of the OTFS transmitter output are each 27 bits in
size. We notice that the maximum operating frequency and clock frequency of the optimized architecture of
the 8 × 8 OTFS transmitter are estimated as 139.64 MHz and 100 MHz, respectively. The throughput of the
OTFS system based on the maximum frequency is estimated as 196.67 Tbps, with 75,026 LUTs and 26,924
flip-flops occupied on the 7vx485tffg1157-1 FPGA device.
6 Conclusion
In this manuscript, we proposed fully parallel and pipelined VLSI hardware implementations of the conventional and
optimized 8 × 8 OTFS transmitters. The optimized design achieves a throughput of 196.67
Tbps at a maximum operating frequency of 139.64 MHz, targeting future 5G and 6G wireless communication
systems. Furthermore, the proposed hardware has been
implemented with 32 non-trivial multipliers instead of 48.
The 8 × 8 OTFS transmitter requires
two IFFT stages and one FFT stage, and each of these requires 3
butterfly stages, giving 9 butterfly stages in total. In
the proposed design, low-latency MDC feedforward
pipelined FFT and IFFT architectures are applied to
the transmitter. The 7vx485tffg1157-1 FPGA device has been
used to verify these transmitters in the Xilinx Vivado software
environment. This FPGA device contains 303,600 LUTs. The
conventional hardware architecture requires 132,233 LUTs,
i.e., about 43.5% of the LUTs on the device, whereas the proposed
hardware architecture consumes 75,026 LUTs, i.e., about
24.7%. Using the optimized architecture instead of the conventional
one therefore frees about 19% of the LUTs on the 7vx485tffg1157-1 device
(57,207 LUTs), which is a relative reduction of about 43% in the LUT usage
of the OTFS transmitter.
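The resource percentages in this section follow directly from the reported LUT counts. The Python sketch below recomputes them and separates two ways of expressing the saving: as a share of the device's LUTs and as a reduction relative to the conventional design. Variable names are illustrative, and all inputs are the figures quoted in the text.

```python
DEVICE_LUTS = 303_600          # LUTs available on the 7vx485tffg1157-1 device
CONVENTIONAL_LUTS = 132_233
OPTIMIZED_LUTS = 75_026
CONVENTIONAL_MULTIPLIERS = 48  # non-trivial multipliers
OPTIMIZED_MULTIPLIERS = 32

# Utilization of each design as a percentage of the device.
conv_pct = 100 * CONVENTIONAL_LUTS / DEVICE_LUTS
opt_pct = 100 * OPTIMIZED_LUTS / DEVICE_LUTS

# Saving expressed in absolute LUTs, in device share, and relative to the baseline.
saved_luts = CONVENTIONAL_LUTS - OPTIMIZED_LUTS
saved_pct_of_device = 100 * saved_luts / DEVICE_LUTS
saved_pct_relative = 100 * saved_luts / CONVENTIONAL_LUTS

# Relative reduction in non-trivial multipliers.
multiplier_reduction_pct = (
    100 * (CONVENTIONAL_MULTIPLIERS - OPTIMIZED_MULTIPLIERS) / CONVENTIONAL_MULTIPLIERS
)

print(f"conventional: {conv_pct:.2f}% of device, optimized: {opt_pct:.2f}% of device")
print(f"saved: {saved_luts} LUTs = {saved_pct_of_device:.2f}% of device, "
      f"{saved_pct_relative:.2f}% relative to conventional")
print(f"multiplier reduction: {multiplier_reduction_pct:.1f}%")
```

The conventional utilization evaluates to roughly 43.56%, which the text reports as 43.5%. The saving is just under 19% of the device's LUTs, or about 43% of the conventional design's LUT usage, and the multiplier count drops by one third.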
Data Availability This manuscript has no associated data.
References
1. Yan, H., Wan, L., Zhou, S., Shi, Z., Cui, J.-H., Huang, J., & Zhou,
H. (2012). DSP based receiver implementation for OFDM acoustic modems. Physical Communication, 5, 22–32.
2. Ribeiro, C., Gomes, R., Duarte, L., Hammoudeh, A., &
Caldeirinha, R. F. (2020). Multi-gigabit/s OFDM real-time based
transceiver engine for emerging 5G MIMO systems. Physical
Communication, 38, 100957.
3. Molisch, A. F., Toeltsch, M., & Vermani, S. (2007). Iterative methods for cancellation of intercarrier interference in OFDM systems.
IEEE Transactions on Vehicular Technology, 56, 2158–2167.
4. Wang, Z., Zhou, S., Giannakis, G. B., Berger, C. R., & Huang, J.
(2011). Frequency-domain oversampling for zero-padded OFDM
in underwater acoustic communications. IEEE Journal of Oceanic
Engineering, 37, 14–24.
5. Shen, W., Dai, L., An, J., Fan, P., & Heath, R. W. (2019). Channel
estimation for orthogonal time frequency space OTFS massive
MIMO. IEEE Transactions on Signal Processing, 67, 4204–4217.
6. Gorokhov, A., & Linnartz, J.-P. (2004). Robust OFDM receivers
for dispersive time-varying channels: Equalization and channel
acquisition. IEEE Transactions on Communications, 52, 572–583.
7. Hadani, R., & Monk, A. (2018). OTFS: A new generation of
modulation addressing the challenges of 5G. arXiv preprint http://​
arxiv.​org/​abs/​1802.​02623
8. Shrimali, Y., & Sharma, J. B. (2021). Efficient hdl implementation
of turbo coded mimo-ofdm physical layer. In Nanoelectronics,
Circuits and Communication Systems (pp. 739–753). Springer.
9. Hadani, R., Rakib, S., Molisch, A., Ibars, C., Monk, A., Tsatsanis,
M., Delfeld, J., Goldsmith, A., & Calderbank, R. (2017). Orthogonal time frequency space OTFS modulation for millimeter-wave
communications systems. In 2017 IEEE MTT-S International
Microwave Symposium (IMS) (pp. 681–683). IEEE.
10. Ramachandran, M. K., & Chockalingam, A. (2018). MIMO-OTFS
in high-Doppler fading channels: Signal detection and channel
estimation. In 2018 IEEE Global Communications Conference
(GLOBECOM) (pp. 206–212). IEEE.
11. Singh, P., Yadav, K., Mishra, H. B., & Budhiraja, R. (2022b). BER
analysis for OTFS zero forcing receiver. IEEE Transactions on
Communications, (pp. 1–1).
12. Mishra, H. B., Singh, P., Prasad, A. K., & Budhiraja, R. (2021).
OTFS channel estimation and data detection designs with superimposed pilots. IEEE Transactions on Wireless Communications, .
13. Singh, P., Gupta, A., Mishra, H. B., & Budhiraja, R. (2022a). Lowcomplexity ZF/MMSE MIMO-OTFS receivers for high-speed
vehicular communication. IEEE Open Journal of the Communications Society, 3, 209–227.
14. Hieu, N. T., Tu, N. T., Ha, N. V., Nguyen, T. T. T., & Phu, B. H.
(2012). FPGA implementation of MIMO-OFDM STBC systems.
In 2012 International Conference on Control, Automation and
Information Sciences (ICCAIS) (pp. 349–353).
15. Murali, K. R., & Chockalingam, A. (2018). On OTFS modulation
for high-doppler fading channels. In 2018 Information Theory and
Applications Workshop (ITA) (pp. 1–10).
16. Raviteja, P., Hong, Y., Viterbo, E., & Biglieri, E. (2018). Practical
pulse-shaping waveforms for reduced-cyclic-prefix OTFS. IEEE
Transactions on Vehicular Technology, 68, 957–961.
17. Kassa, S., Gupta, P., Kumar, M., Stephan, T., & Kannan, R.
(2021). Rotated majority gate-based 2n-bit full adder design in
quantum-dot cellular automata nanotechnology. Circuit World, .
18. Chithaluru, P., Stephan, T., Kumar, M., & Nayyar, A. (2022). An
enhanced energy-efficient fuzzy-based cognitive radio scheme for
iot. Neural Computing and Applications, 34, 19193–19215.
13
Journal of Signal Processing Systems
19. Iraqi, Y., & Al-Dweik, A. (2020). Efficient information transmission using smart ofdm for iot applications. IEEE Internet of
Things Journal, 7, 8397–8409.
20. Kung, S. (1985). Vlsi array processors. IEEE ASSP Magazine, 2, 4–22.
21. Nadal, J., Nour, C. A., & Baghdadi, A. (2015). Low-complexity
pipelined architecture for FBMC/OQAM transmitter. IEEE Transactions on Circuits and Systems II: Express Briefs, 63, 19–23.
22. Shaheen, I. A., & Zekry, A. (2019). Design and implementation
of FBMC/OQAM transceiver for 5G wireless communication system. In 2019 International Conference on Promising Electronic
Technologies (ICPET) (pp. 73–79). IEEE.
23. Yoshizawa, S., & Miyanaga, Y. (2008). Use of a variable wordlength technique in an OFDM receiver to reduce energy dissipation. IEEE Transactions on Circuits and Systems I: Regular
Papers, 55, 2848–2859.
24. Raviteja, P., Phan, K. T., Hong, Y., & Viterbo, E. (2018). Interference cancellation and iterative detection for orthogonal time
frequency space modulation. IEEE Transactions on Wireless Communications, 17, 6501–6515.
25. Chang, C.-K., Hung, C.-P., & Chen, S.-G. (2003). An efficient
memory-based FFT architecture. In Proceedings of the 2003 International Symposium on Circuits and Systems, 2003. ISCAS’03.
(pp. II–II). IEEE volume 2.
26. Macleod, M. D. (2005). Multiplierless implementation of rotators and
FFTs. EURASIP Journal on Advances in Signal Processing, 2005, 1–8.
27. Nibouche, O., Boussakta, S., Darnell, M., & Benaissa, M. (2010).
Algorithms and pipeline architectures for 2-D FFT and FFT-like
transforms. Digital Signal Processing, 20, 1072–1086.
28. Garrido, M., Möller, K., & Kumm, M. (2018). World’s fastest FFT
architectures: Breaking the barrier of 100 GS/s. IEEE Transactions on Circuits and Systems I: Regular Papers, 66, 1507–1516.
29. Liu, Y., Wang, C., Sun, J., Du, S., & Hong, Q. (2022). One-step
calculation circuit of fft and its application. IEEE Transactions on
Circuits and Systems I: Regular Papers, (pp. 1–13).
30. Immareddy, S., & Sundaramoorthy, A. (2022). A survey paper
on design and implementation of multipliers for digital system
applications. Artificial Intelligence Review, (pp. 1–29).
Publisher’s Note Springer Nature remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.
Springer Nature or its licensor (e.g. a society or other partner) holds
exclusive rights to this article under a publishing agreement with the
author(s) or other rightsholder(s); author self-archiving of the accepted
manuscript version of this article is solely governed by the terms of
such publishing agreement and applicable law.
Sai Kumar Dora was born in Andhra Pradesh, India, in 1994. He received
the Diploma degree in Electronics and Communication Engineering
from Dr. BRA GMR Polytechnic College, Andhra Pradesh, in 2012. He
then completed his graduation in Electronics and Communication
Engineering from JNTU Kakinada, Andhra Pradesh, in 2015, and in
2019 he received a postgraduate degree in System and Signal Processing
from JNTU-UCEV College, Andhra Pradesh. He then worked as a
project research fellow at IIT (ISM) from 2020 to 2022. Currently, he is
pursuing his PhD degree at IIT (ISM) Dhanbad. His areas of interest are
systems and signal processing, hardware circuit design for wireless
communication, and FPGA prototyping.
Himanshu B. Mishra received the M. Tech. degree in Electronics
and Communication Engineering from National Institute of Technology
Rourkela, Odisha, India, in 2012, and the Ph.D. degree in
Electrical Engineering from Indian Institute of Technology
Kanpur, Uttar Pradesh, India, in 2016. He is currently working as
an Assistant Professor with the Electronics Engineering Department,
Indian Institute of Technology (Indian School of Mines)
Dhanbad, India. His current research interests include
parameter estimation, optimization techniques, and transceiver
design for next-generation wireless technologies, including
orthogonal time frequency space (OTFS), filter-bank multi-carrier (FBMC),
intelligent reflecting surface (IRS), and massive MIMO. He received
the Best M.Tech. award in the communication and signal processing
domain in 2012.
Manodipan Sahoo was born in Haldia, West Bengal, India, in
1983. He received the M.Tech. degree in Instrument Technology from
Indian Institute of Science, Bangalore, in 2006, and the PhD
degree from IIEST, Shibpur, India, in 2016. His PhD thesis was on
“Modeling and Analysis of Carbon Nanotube and Graphene
Nanoribbon based Interconnects”. He is currently serving as an
Assistant Professor in the Department of Electronics Engineering,
Indian Institute of Technology (Indian School of Mines), Dhanbad, India.
His research interests include modeling and simulation of
nanointerconnects and nano-devices, VLSI circuits and systems, and the
Internet of Things. He has published more than 50 articles in archival
journals and refereed conference proceedings. He has also published a book
entitled “Modelling and Simulation of CNT and GNR Interconnects”
with Lambert Academic Publishers in 2019 and a book chapter
entitled “Modelling Interconnects for Future VLSI Circuit Applications”
with IET Publishers in 2019. He is a Senior
Member of IEEE, Fellow of IETE, Member of IEI, and Life Member of
Instrument Society of India.