Ultra-wideband Digital Baseband

by
Raúl Blázquez
Submitted to the Department of Electrical Engineering and Computer
Science
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
May 2006
© Massachusetts Institute of Technology 2006. All rights reserved.
Author
Department of Electrical Engineering and Computer Science
May 25, 2006
Certified by
Anantha P. Chandrakasan
Professor of Electrical Engineering and Computer Science
Thesis Supervisor
Accepted by
Arthur C. Smith
Chairman, Department Committee on Graduate Students
Ultra-wideband Digital Baseband
by
Raúl Blázquez
Submitted to the Department of Electrical Engineering and Computer Science
on May 25, 2006, in partial fulfillment of the
requirements for the degree of
Doctor of Philosophy
Abstract
In February 2002 the FCC approved the use of ultra-wideband (UWB) signals for communication purposes in the band from 3.1 GHz to 10.6 GHz, effectively opening 7.5 GHz of free unlicensed bandwidth. Two main constraints govern the use of this band: a maximum EIRP spectral density of -41.3 dBm/MHz and a minimum instantaneous bandwidth of 500 MHz. One of the main driving applications of this technology is high data rate communication over short distances.
In this thesis two digital baseband receivers for impulse UWB have been designed. The first one was designed for baseband UWB pulses and achieves 193 kbps of wireless communication using impulses of 300 MHz bandwidth and 2% duty cycle; it was part of a system-on-a-chip.
The second baseband achieves 100 Mbps using impulses of 500 MHz bandwidth in the FCC-compliant band, as part of a complete UWB system. Because of this bandwidth, multipath becomes very relevant as the data rate increases into the range of hundreds of megabits per second. The current multipath model, used for the development of IEEE standard 802.15.3a, is a modified Saleh-Valenzuela model [1] whose impulse response has a root mean square duration from 5 to 25 ns. The maximum data rate in a UWB system depends on the signal-to-noise ratio and the multipath. The baseband assesses the quality of the channel and exposes several knobs that control the complexity of the signal processing, which allows higher levels of the communication hierarchy to fine-tune the receiver, trading off number of operations and power dissipation against quality of service. It includes an MLSE and a RAKE receiver to compensate for multipath. It has been implemented in 0.18 µm CMOS technology using a National Semiconductor process. The chip has been demonstrated in a wireless system.
Thesis Supervisor: Anantha P. Chandrakasan
Title: Full Professor
Acknowledgments
When I came to MIT, I admit that it did not occur to me that I would work on digital
circuits. After all, I had moved in the previous years, steadily but surely, towards
signal processing. But after talking with Professor Anantha Chandrakasan, I decided
to take his class 6.374, and became interested in circuit design, since it was a natural
evolution, not only having worked on signal processing but also implementing that
same signal processing. I would like to thank Anantha for recognizing this opportunity
and for allowing me to pursue in his group a research project with signal processing
components, while permitting me to delve deep in the circuit design field. Anantha's
advice, encouragement, enthusiasm, and guidance to explore this field of research
proved essential during these years to achieve the final goal. I would like to thank
him also for his patience, and for his care for the well-being of all his students, especially
taking into account the size of his research group. I also had the opportunity of being
his teaching assistant in 6.374, an experience enriching and rewarding in and of itself.
For Anantha's example as an educator, an engineer, a scientist and a person, I
consider it a privilege to have worked with him.
I would like to thank also Professors Lizhong Zheng and Moe Win, for their
encouragement, advice, feedback and patience during the development of this thesis.
I would like to thank Peter Holloway and his team at National Semiconductor
Inc. not only for the fabrication of the chip, but also for his patience and work in
facilitating the process of tape-out of the second chip implemented in this thesis.
Without them I would not have graduated on time.
I would like to thank Fred Lee, for his continuous help in the testing of the chip,
where his debugging skills, his enthusiasm, and his proficiency with the soldering iron
have cleared most of the obstacles. I feel lucky to have shared his friendship, his sense
of humour, and his conversation, ranging from circuits to good food and to the human
and the divine in life and science, during the long hours shared in the cubicle and in the
lab. I would also like to thank David Wentzloff for his support, optimism, sense of
humour, and for teaching me how to solder a chip to a board. I would like to thank
Manish Bhardwaj for his advice since I met him my first year at MIT. At that time,
we were in contiguous cubicles and shared 6.241. I would like to thank him for all the
moments shared that included fruitful discussions on wireless, cinema, coding, future
plans, and ups, lows and in-betweens of grad school. Thanks also for helping me to
clear the last obstacles of this work and with the flow of this thesis. I would like to
thank Vivienne Sze and David Wentzloff for proof-reading this thesis.
I would also like to thank the members of the UWB group: Puneet Newaskar,
Vivienne Sze, Brian Ginsburg, Johnna Powell, Nathan Ackerman, Ashutosh Bhardwaj
and Kyle Gilpin. I am lucky to have been part of this team, and thanks to them I
have appreciated and admired the beauty and difficulty involved in the different parts
that comprise a communication system.
I would like to thank Daniel Finchelstein, Frank Honoré, Alice Wang and Michael
McIlrath for their continuous help with the quirks, manias, and Murphy's law compliance of the tools. Without their help, there would not have been tape-outs.
Many thanks to all the other members of the Digital Integrated Circuits and
Systems group, both present and past: Naveen Verma, Denis Daly, Payam Lajevardi,
Alex Kern, Nathan Ickes, Joyce Kwong, Yogesh Ramadass, Tao Pan, Taeg Sang Cho,
Vikram Chandrasekhar, Fred Chen, Nigel Drego, Rex Min, CheeWe Ng, SeongHwan
Cho, Julia Cline, Piyada Phanaphat, Theodoros Konstantakopoulos, Nisha Checka,
Shamik Das, Travis Simpkins, and Eugene Shih. They have made the group a fun
and interesting place to be and work.
I would like to thank Margaret Flaherty for her help with paperwork, finding
rooms for meetings, thesis defense, aligning the schedules of several professors for my
committee meetings, and, in general, making sure the only challenges I had to meet
were technical. I would like to thank also Debroah Hodges-Pabon for making MTL
a vibrant place through socials, seminars, and other activities. I would like to thank
Marilyn Pierce for her patience during these years even when I submitted my theses
at the eleventh hour.
I would like to thank La Caixa Fellowship Program, for the opportunity they
gave me to pursue my research interests abroad. Their efficient management of the
different stages of the fellowship makes it one of the best possible ways of starting
graduate studies in an American university. This research has also been sponsored
by an Intel Fellowship, Hewlett-Packard under the HP/MIT Alliance, and the NSF.
I would like to thank also those who provided invaluable support outside the lab.
Thanks to Pablo Vila, friend, colleague, roommate, for his support during all these
years, for sharing long conversations about life, wireless communications, music, and
"temazos llenapistas". I would also like to thank Virginia Romero, Ismael Calleja,
Ana Bravo, Luis Enrique Garcia for their friendship and support during this time,
no matter the distance, the time difference and my crazy schedule every time I went
back to Europe. Thanks to Susana, Clara, Emilenne, Fran, Andres, Eduardo, Karen,
Juan, Ana, Parmesh, along with the rest of the people I met, befriended and got to
know in Boston, allowing me to keep a balance between life and grad studies. Thanks
for being there.
I would like to thank Aidita for her unconditional support, advice and, above all,
love. The trip is always more important than the destination, and I am grateful to
have shared these last two years with you. This thesis has all the more value because in
this time I met you and you changed my life.
Finally, I would like to thank my parents, Magdalena and Felix, for their continuous, unconditional, unrelenting love and support during all these years. Anything
that I could write here would be but a pale shade of what they mean to me. This
work is as much yours as mine.
Finalmente, me gustaría dar las gracias a mis padres, Magdalena y Felix, por su
amor y apoyo continuo, incondicional e infalible durante todos estos años. Cualquier
cosa que yo pudiera escribir sería sólo un pálido reflejo de lo que representan para mí.
Este trabajo es el resultado no sólo de mi esfuerzo, sino también del vuestro.
Contents
1 Introduction
1.1 Background
1.2 UWB Signals
1.3 Characteristics of UWB Signals
1.4 UWB Applications
1.5 Previously Used Architectures
1.6 Signal Processing Techniques
1.7 Power Dissipation in UWB Systems
1.8 Thesis Contributions
2 A Baseband Processor for a Baseband UWB Transceiver
2.1 UWB Signals
2.2 System Trade-offs
2.3 Architectural Choices for Clock Generation and ADC
2.4 Digital Baseband
2.4.1 Functionality
2.4.2 A parallelized approach
2.4.3 Architecture
2.5 Performance Results
3 System Analysis for the FCC Compliant System
3.1 Objectives of the Design
3.2 Homodyne vs Heterodyne architecture
3.3 Specification of the ADC
3.3.1 Signal definition
3.3.2 Automatic Gain Control
3.3.3 Demodulating Architectures
3.3.4 Simulations and Analysis
3.4 Choice of UWB Signal
3.5 Multipath
3.5.1 Channel Model
3.5.2 Data-Aided Channel Estimation
3.5.3 Rake Receiver
3.5.4 MLSE Equalizer
3.6 Choice of Packet Format
3.7 Baseband Functionality
3.8 Non-idealities Model
3.9 Link Budget
3.10 Summary
4 FPGA Implementation
4.1 Architecture of the Discrete Platform
4.1.1 Transmitter
4.1.2 Front-end
4.1.3 Receiver
4.1.4 Protocol
4.2 Application in the Digital Baseband Design
4.2.1 Limitations of the Digital Platform
4.2.2 Specifications and Interfaces
4.2.3 Architecture of the Baseband
4.2.4 State Machine
4.2.5 Results
4.3 Application for Testing Multitone-FSK
4.3.1 Signal Definition
4.3.2 Receiver Architecture
4.4 Conclusions
5 ASIC Implementation of a Baseband for FCC Compliant UWB
5.1 Functionality of the Chip
5.2 Interfaces and Clock Structure
5.3 High-speed Clock Domain
5.4 Correlators/Matched Filter Block
5.5 Channel Analysis Module
5.6 Timing Synchronization
5.7 MLSE Equalizer
5.8 Implementation and Results
6 Conclusions and future work
6.1 Thesis summary
6.2 Conclusions
6.3 Future work
A Link budget
A.1 Spreadsheet equations
A.1.1 Notation
A.1.2 Definition of the parameter K
A.1.3 Link budget and sensitivity
A.1.4 Extra losses due to the pulse shape
A.1.5 Receiver constraints
A.1.6 ADC constraints and detection
A.1.7 Gain Specifications
A.1.8 Noise Figure specification
B Comments on signal generation
B.1 Defining the transmitted signal
B.2 Jitter in the transmitter
B.3 Channel impact
B.4 Summary of the model
B.5 Dealing with a complex non-linearity
B.6 Dealing with an I-Q unbalance
List of Figures
1-1 EIRP mask approved by the FCC [2].
1-2 Intended applications.
1-3 Architecture of UWB receiver by Berkeley Wireless Research Center.
1-4 Correlator channel in a CDMA receiver.
1-5 Architecture of baseband of UWB receiver by Sony Corp.
2-1 BER as a function of the SNR (a) or SIR (b) for different ADCs.
2-2 Baseband processor block diagram.
2-3 Pd as a function of D, the relative position between the pulse and the template.
2-4 Coarse acquisition process as a Markov chain (D = Correct detection; FD = False detection).
2-5 Change of probability of detection due to a difference in frequencies between transmitter and receiver.
2-6 Correlators Architecture.
2-7 Groups of four consecutive samples required.
2-8 Block diagram of the retiming block.
2-9 Implementation of the correlation bank.
2-10 Fine tracking subsystem block diagram.
2-11 Coarse acquisition block diagram.
2-12 Single chip UWB transceiver photograph.
3-1 500 MHz bandwidth channelization with FCC compliant power spectral density.
3-2 Architectures for the receiver.
3-3 Receiver architectures for different UWB modulations.
3-4 Probability of error for the AWGN limited case, OFDM UWB.
3-5 Probability of error for the AWGN limited case, pulsed UWB.
3-6 Probability of error for the interference limited case, OFDM UWB.
3-7 Probability of error for the interference limited case, pulsed UWB.
3-8 Example of UWB BPSK baseband signal, before up-conversion.
3-9 500 MHz pulse with carrier 5 GHz. Courtesy of David Wentzloff.
3-10 Procedure to compensate for multipath.
3-11 Example of the clusters in one instance of the channels in [3].
3-12 Minimum SNR at the input to achieve a 10 dB SNR in the channel estimation as a function of the number of bits of the samples and the length of the integration. No saturation.
3-13 Minimum SNR at the input to achieve a 10 dB SNR in the channel estimation as a function of the number of bits of the samples and the length of the integration. 6 dB saturation.
3-14 Functional diagram of a Rake receiver.
3-15 Functional diagram of the Rake receiver that will be implemented in this UWB system.
3-16 Modified Rake receiver.
3-17 Losses in the modified Rake receiver as a function of the normalized threshold and the channel model.
3-18 Losses associated with the parameter LMLSE in the Viterbi demodulator.
3-19 Design of the data packet. Courtesy of V. Sze.
3-20 Required functionality of the digital baseband.
3-21 A simplified block diagram of a direct conversion front end.
3-22 Explanation of the losses due to shape of the pulse.
3-23 Minimum received power as a function of the center frequency at 10 m.
3-24 Range of the AGC.
3-25 Maximum noise figure of the receiver.
4-1 Block diagram of the discrete prototype.
4-2 Discrete prototype transmitter. Courtesy of N. Ackerman.
4-3 Discrete Prototype receiver. Courtesy of Fred S. Lee.
4-4 Boards related to the ADC and baseband of the discrete prototype. Courtesy of N. Ackerman.
4-5 Losses due to misrepresentation of the channel impulse response in the discrete prototype.
4-6 Block diagram of the discrete prototype baseband.
4-7 Control Signals for the Serial to Parallel Register.
4-8 Block diagram of the basic structure for the correlators and matched filter.
4-9 Block diagram of the retiming block.
4-10 Part of the preamble of a data packet as measured in the discrete prototype, without (above) and with an interference (below).
4-11 Example of MFSK signal. Courtesy of Cheng Luo.
4-12 Architecture for demodulation of Multitone FSK [4].
5-1 Block diagram of the full transceiver.
5-2 Block diagram of the functionality of the chip implemented.
5-3 State machine implemented in the system.
5-4 Block diagram of the high speed clock domain.
5-5 Block diagram of the retiming block.
5-6 Retiming block.
5-7 Block diagram of the correlators.
5-8 Block diagram of a correlator group.
5-9 Block diagram of the minimal unit of the correlators.
5-10 Block diagram of the channel analysis subsystem.
5-11 Structure of one of the 25 components of "Threshold Check" Block.
5-12 Structure of one of the 25 components of the blocks "Threshold Comply" and "Complex Conjugation".
5-13 Block diagram of the MMSE weight estimator.
5-14 Block diagram of the Costas loop.
5-15 8-state Trellis diagram.
5-16 Locating the most probable path in a 8-state Trellis.
5-17 Block diagram of the MLSE equalizer.
5-18 Robust UWB baseband layout.
5-19 Testing board.
5-20 Interface signals when a packet has been detected.
5-21 Interface signals showing a sequence of demodulated bits.
5-22 Probability of error measures in the ASIC.
5-23 Demonstration of a QoS - Power trade-off.
5-24 Structure of the data packet.
List of Tables
2.1 Model results for a Gaussian pulse
2.2 Chip Measurements
3.1 ao values set by AGC
3.2 Multipath Channel Models
Chapter 1
Introduction
Although the concept of ultra-wideband modulation has been known and used for several decades [5], it is currently being revisited by the integrated circuits community as a viable high-speed, last-meter wireless link technology [6, 7]. Because of their large bandwidth, propagation characteristics [8], and timing resolution, ultra-wideband signals bring special advantages to wireless communication that make them well suited to some specific applications, while at the same time posing interesting challenges. In this chapter, I introduce UWB signals and communications, identify the characteristics that distinguish them from conventional narrowband communications and the challenges they pose, and review the state of the art in the application of UWB signals to wireless communication.
1.1 Background
Although the terms "ultra-wideband" (UWB) and "impulse radio" are recent [9], impulse radio can be considered the first form of wireless data communication ever used: Marconi's spark-gap transmitter produced signals that would fit the definition when he communicated from Lavernock Point, South Wales, to Flat Holm Island on May 13, 1897. The spark-gap transmitter produced a signal with a frequency of approximately 500 kHz, a maximum average power of 35 kW and a peak pulse power of several tens of MW. The message received was three dots, the Morse code for the letter S. Impulse radio was nevertheless dropped for a while in favor of narrowband communications, in which the information is encoded in the phase, the amplitude or the frequency of a carrier. Narrowband signals are easily separated using filtering and heterodyne and superheterodyne architectures. The fact that they are band-limited also simplifies their control and regulation by government agencies such as the Federal Communications Commission in the United States of America.
Meanwhile, although not for communication purposes, electromagnetic impulses were used for RADAR and positioning applications. During the Second World War, the use of RADAR became widespread in the military, as did countermeasures against such systems. It was shown that the spatial resolution cell of a RADAR system is inversely proportional to the bandwidth of the signals used. As the bandwidth of the signal increases, the system obtains greater detail, which not only allows the position of the target to be located precisely but also helps to identify the nature of the target (with what is called a RADAR signature and pattern recognition procedures). With time, low probability of interception became more relevant, and purely impulsive signals were replaced with signals of the same bandwidth but larger duty cycle, which kept the same capabilities while reducing the probability of interception.
At the end of the 1960s, signals that could be classified as UWB appeared under the names of carrier-free, baseband, time domain, non-sinusoidal, and orthogonal function radio signals [5]. At the same time, the commercial development of sample-and-hold receivers for oscilloscopes at Tektronix Inc. also aided the UWB field. In 1973, the Ross and Robbins patents [5] pioneered the use of UWB signals, under these other names, in a range of applications including both communications and RADAR. These patents already include methods for generating pulse trains, methods for modulating a pulse train, methods for switching to generate RF pulse train signals, methods for detection and reception, and appropriately efficient antennas. It has been claimed [5] that by 1975 a UWB-like system, for communications or RADAR, could be constructed from components purchased from Tektronix. In fact, impulse RADAR systems have been commercially available since this time, for applications such as ground, wall and foliage penetration, position location, collision warning for avoidance, fluid level detection, intruder detection and vehicle RADAR measurements [10, 11, 12].
Starting in the early 1990s, the use of impulses for communication purposes was revisited by Win and Scholtz [13, 14, 15, 16, 17, 18, 19, 20], and impulse radio was defined [9]. Pulse position modulation was almost exclusively adopted during the initial development of UWB radios because negating ultra-short pulses was difficult to implement. It was not until the late 1990s that the name ultra-wideband and the acronym UWB became popular. By this time, pulse negation had become easier to implement, and pulse amplitude modulation attracted interest [21]. It was also in this decade that the first start-ups and companies working directly on UWB for communication purposes appeared.
In 2002, the Federal Communications Commission of the United States of America authorized the use of ultra-wideband signals for communication purposes [2] in the band from 3.1 GHz to 10.6 GHz, effectively opening 7.5 GHz of bandwidth to communication applications as long as some constraints were met. First, the minimum instantaneous bandwidth for a signal to be considered UWB is 500 MHz. Second, the signal must meet an EIRP mask, shown in figure 1-1, with a maximum equivalent isotropic radiated power spectral density of -41.3 dBm/MHz and even more stringent limits in other bands. These restrictions were intended to limit the interference of the new UWB devices with existing services in the same frequency bands. The main consequence of this ruling was that it did not specify any particular type of modulation for the UWB signal, nor any particular use of the available bandwidth. For that reason, the definition of UWB signals started to encompass signals that do not correspond to more traditional UWB concepts. After the FCC approved the use of UWB for communication purposes, a larger variety of approaches, suggested by different companies, appeared.
Figure 1-1: EIRP mask approved by the FCC [2]. (Plot of the emission limit versus frequency in GHz; the legend distinguishes the Part 15 bound from the First Report and Order mask.)
1.2 UWB Signals
Initially, a UWB signal was defined as any signal whose bandwidth is larger than 1.5 GHz or larger than 25% of its center frequency. Such a large bandwidth was obtained by using very short impulses, of sub-nanosecond duration. The impulses used initially [16] were the Gaussian pulse, the Gaussian monocycle (first derivative of the Gaussian pulse) or the second derivative of the Gaussian pulse. UWB signals for communication purposes are currently restricted by the FCC to the band between 3.1 GHz and 10.6 GHz, a minimum bandwidth of 500 MHz, and a maximum equivalent isotropic radiated power (EIRP) spectral density of -41.3 dBm/MHz [2].
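As a rough worked example of what these limits imply, the sketch below integrates a flat spectral density at the FCC limit over the occupied bandwidth to obtain the total EIRP. The function names are illustrative and do not come from the thesis, and a perfectly flat spectrum is an idealizing assumption.

```python
import math

# Maximum EIRP spectral density from the FCC ruling [2].
FCC_PSD_DBM_PER_MHZ = -41.3


def max_eirp_dbm(bandwidth_mhz: float) -> float:
    """Total EIRP in dBm of a signal that fills `bandwidth_mhz` exactly
    at the flat spectral density limit (idealized assumption)."""
    return FCC_PSD_DBM_PER_MHZ + 10.0 * math.log10(bandwidth_mhz)


def dbm_to_mw(p_dbm: float) -> float:
    """Convert a power in dBm to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)


if __name__ == "__main__":
    # Minimum UWB bandwidth and the full 3.1-10.6 GHz band.
    for bw_mhz in (500.0, 7500.0):
        p_dbm = max_eirp_dbm(bw_mhz)
        print(f"{bw_mhz:6.0f} MHz: {p_dbm:6.2f} dBm, {dbm_to_mw(p_dbm) * 1e3:.1f} uW")
```

Under these assumptions a 500 MHz signal is limited to roughly -14.3 dBm (about 37 µW), and even a signal filling the whole 7.5 GHz band stays near -2.5 dBm (about 0.56 mW), which is why UWB links are restricted to short ranges.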
In narrowband signals the information is encoded in the phase, the frequency or the amplitude of a sinusoid. Although most of the initial work in UWB was done on pulse position modulation (PPM) [17], different modulation schemes have been explored over the years:
* Pulse position modulation - PPM [9, 17, 18]. In this case, the signal follows a time-hopping format. Assuming there is more than one transmitter, the signal sent by the kth transmitter is:
$$ s^{(k)}(t^{(k)}) = \sum_{j=-\infty}^{\infty} w_{tr}\left(t^{(k)} - jT_f - c_j^{(k)}T_c - d_{\lfloor j/N_s \rfloor}^{(k)}\right) \qquad (1.1) $$
where $t^{(k)}$ is the kth transmitter's clock time and $T_f$ is the pulse repetition time. This modulation scheme has been shown to asymptotically perform better than direct sequence CDMA in a multipath environment [19, 20]. Its drawback is the additional complexity required in the demodulator. PPM was almost exclusively adopted in the early development of UWB radios because negating ultra-short pulses was difficult to implement. Another modulation scheme that does not require pulse negation is on-off keying (OOK), where symbol "1" is represented by transmitting a pulse, and "0" by transmitting nothing.
* Pulse amplitude modulation - PAM [15]. The signal follows a scheme close to that of a direct sequence code division multiple access (DS-CDMA) signal, as shown:
$$ s^{(k)}(t^{(k)}) = \sum_{j=-\infty}^{\infty} \sum_{i=0}^{N_s-1} b_j^{(k)} c_i^{(k)} w_{tr}\left(t^{(k)} - jN_sT_f - iT_f\right) \qquad (1.2) $$
where $b_j^{(k)}$ represents the sequence of symbols and $c_i^{(k)}$ a pseudorandom sequence. The difference between DS-CDMA and this modulation scheme as applied to UWB is that the duty cycle of the waveform $w_{tr}(t)$ is small. Although its capacity bound is lower than that of the PPM scheme, the lower complexity of the receiver and its 3 dB advantage over PPM for binary modulation make it amenable to practical implementation [21]. A special case of PAM is Binary Phase Shift Keying (BPSK), or antipodal modulation. This kind of modulation has been found to be asymptotically inefficient for large bandwidths [19]. On the other hand, the transceivers associated with this kind of modulation are less complex, and synchronization to this kind of signal is straightforward.
* Hybrid Direct-Sequence/Time-Hopping-CDMA (DS/TH-CDMA) modulation.
In this case the signal is represented as:
oo
sk)(t(k))
No-1
_
(k) k)w
(t(k) _ jTf
-Ck)T
-
d
NJ)
(1.3)
j=-oo i=0
This scheme has more degrees of freedom than the previous two. It is possible to approach the capacity levels obtained by PPM with a lower receiver complexity than that of the PPM scheme.
* Transmitted reference UWB [22, 23, 24, 25]. In this case, a reference pulse is sent before each information pulse, which allows a very simple demodulation process at the cost of 3 dB of SNR. The signal can be described as:
$$ s^{(k)}(t) = \sum_{i} \left[ b_r^{(k)}(t - iN_sT_f) + d_i^{(k)}\, b_d^{(k)}(t - iN_sT_f) \right] \qquad (1.4) $$
where $b_r^{(k)}$ and $b_d^{(k)}$ represent the reference pulses and the data pulses respectively:
$$ b_r^{(k)}(t) = \sum_{j=0}^{N_s/2-1} a_j^{(k)}\, p\left(t - j2T_f - c_j^{(k)}T_p\right) \qquad (1.5) $$
$$ b_d^{(k)}(t) = \sum_{j=0}^{N_s/2-1} a_j^{(k)}\, p\left(t - j2T_f - c_j^{(k)}T_p - T_r\right) \qquad (1.6) $$
Transmitted-reference (TR) signaling, in conjunction with an autocorrelation receiver, offers a low-complexity alternative to Rake reception. Further information on this modulation scheme can be found in [26, 27, 28].
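To make the difference between the time-hopping PPM and the DS-style PAM formats listed above more concrete, the following minimal sketch computes where the pulses of each format are placed and with which sign. It is an illustration under stated assumptions, not the signal generator used in this thesis: the function names, the periodic reuse of the hopping code, and the explicit PPM shift `delta` applied to a "1" bit are all choices made here for the example.

```python
import math


def gaussian_monocycle(t: float, tau: float) -> float:
    """First derivative of a Gaussian pulse, up to a scale factor.
    `tau` sets the pulse width (hypothetical parameter name)."""
    return -(t / tau) * math.exp(-0.5 * (t / tau) ** 2)


def th_ppm_pulse_times(bits, hop_code, n_s, t_f, t_c, delta):
    """Pulse times for time-hopping PPM: frame j starts at j*T_f, is hopped
    by c_j*T_c, and is shifted by `delta` when the current bit is 1.
    Each bit is repeated over n_s frames; the hop code repeats periodically."""
    times = []
    for j in range(len(bits) * n_s):
        bit = bits[j // n_s]
        hop = hop_code[j % len(hop_code)]
        times.append(j * t_f + hop * t_c + delta * bit)
    return times


def ds_pam_pulses(bits, chip_code, t_f):
    """(time, amplitude) pairs for DS-style antipodal PAM: symbol j spans
    N_s = len(chip_code) frames and pulse i carries the sign b_j * c_i."""
    n_s = len(chip_code)
    pulses = []
    for j, bit in enumerate(bits):
        b = 1 if bit else -1  # antipodal (BPSK) mapping
        for i, chip in enumerate(chip_code):
            pulses.append(((j * n_s + i) * t_f, b * chip))
    return pulses


def synthesize(pulses, tau, t_grid):
    """Sum of monocycles placed at the given (time, amplitude) pairs."""
    return [sum(a * gaussian_monocycle(t - t0, tau) for t0, a in pulses)
            for t in t_grid]


if __name__ == "__main__":
    # Times in nanoseconds: 100 ns frames, 10 ns hop slots, 1 ns PPM shift.
    print(th_ppm_pulse_times([1, 0], [0, 2, 1], n_s=3, t_f=100, t_c=10, delta=1))
    print(ds_pam_pulses([1, 0], [1, -1, 1], t_f=100))
```

In the PPM case the information only moves the pulses in time, whereas in the PAM case the pulse positions are fixed and the information changes their sign, which is why PAM requires pulse negation in the transmitter.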
Other schemes that have been reported are orthogonal waveform and block orthogonal
modulation schemes.
Due to the redefinition of ultra-wideband since the FCC ruling, a more varied set of non-impulsive modulations has been considered, including OFDM signals [29, 30, 31], as well as other impulse UWB modulations [32, 33]. Although some of their relevant characteristics will be analyzed in chapter 3, a thorough study of all possible UWB modulations is beyond the scope of this thesis. This thesis focuses on impulse UWB systems; details on the kind of modulation used are given in the following chapters. The main reasons for this decision will be expanded in chapter 3, but they hinge upon some of the challenges that a transceiver of these characteristics must meet.
1.3 Characteristics of UWB Signals
The main characteristics of UWB signals are associated with their bandwidth, which is at least an order of magnitude larger than that of other signals used for communications. Current wideband standards consider signals of 20 MHz (802.11a [34]) or less. UWB signals promise large data rates, low probability of interception and the capability of estimating the distance between transceivers with a precision as good as a few centimeters. Further claims are their resilience to multipath, fading and narrowband interferers. High data-rate UWB transceivers are dominated by a digital baseband that performs most of the required functionality.
Shannon's capacity equation states that:
$$ C = BW \log_2\left(1 + \frac{S}{N}\right) \qquad (1.8) $$
where BW represents the bandwidth of the signal, S represents its power and N represents the noise power in the same bandwidth. This expression shows that capacity
grows linearly with bandwidth but only logarithmically with signal power, making
UWB amenable to large data rates. The application of this equation is, on the other
hand, limited to single user communications in an AWGN channel.
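The contrast between linear growth with bandwidth and logarithmic growth with signal power can be checked numerically. The following is a minimal sketch of equation (1.8); the 20 MHz baseline, the 0 dB signal-to-noise ratio and the factor of 25 are illustrative assumptions, not figures from this thesis, and the ratio S/N is held fixed when the bandwidth is widened.

```python
import math


def shannon_capacity_bps(bandwidth_hz, snr_linear):
    """AWGN channel capacity C = BW * log2(1 + S/N), in bit/s."""
    return bandwidth_hz * math.log2(1.0 + snr_linear)


# Hypothetical baseline: 20 MHz of bandwidth at S/N = 1 (0 dB).
base = shannon_capacity_bps(20e6, 1.0)

# Option A: same S/N, bandwidth multiplied by 25 (20 MHz -> 500 MHz).
wider = shannon_capacity_bps(500e6, 1.0)

# Option B: same bandwidth, S/N multiplied by 25.
stronger = shannon_capacity_bps(20e6, 25.0)

print(f"baseline      : {base / 1e6:8.1f} Mbit/s")
print(f"25x bandwidth : {wider / 1e6:8.1f} Mbit/s")
print(f"25x S/N       : {stronger / 1e6:8.1f} Mbit/s")
```

Under these assumptions, multiplying the bandwidth by 25 multiplies the capacity by 25, whereas multiplying S/N by 25 multiplies it by only log2(26), a factor below 5.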
In any receiver it is possible to detect and separate echoes of the signal as long
as they arrive at the receiver with delay differences of the order of magnitude of the
duration of the impulses. When they arrive closer together than this, they combine,
adding either constructively or destructively. The phenomenon in which several echoes
of the same signal arrive close enough to combine
is known as fading, and it is a purely narrowband phenomenon. Larger bandwidths
allow better timing resolution and, in the case of multipath, make it possible to separate
the echoes that arrive at the receiver. Under these conditions, it is feasible to
use a Rake receiver to gather the energy from these echoes, obtaining a diversity
gain from a situation that would have caused fading in a narrowband setting.
Since the distance between transmitter and receiver is proportional to the time
delay between the instant the signal is transmitted and the instant it
is demodulated, it is possible in any communication system to measure the distance
between transmitter and receiver. The variance of the time estimation is inversely
proportional to the bandwidth of the signal transmitted. For a bandwidth of 1 GHz,
time delays with a difference of 1 ns can be distinguished directly from the received
bits (if we are using a BPSK modulation). A delay of 1 ns, taking into account
only a direct path, is equivalent to a distance of 30 cm, which gives UWB transceivers
very good locating properties. UWB ranging has been studied in [35].
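The arithmetic behind the 30 cm figure can be written out explicitly. This is a minimal sketch that assumes, as the text does, a resolvable delay of roughly the reciprocal of the bandwidth and free-space propagation along a direct path; the function names are illustrative.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def time_resolution_s(bandwidth_hz):
    """Rule of thumb from the text: resolvable delay is about 1 / bandwidth."""
    return 1.0 / bandwidth_hz


def delay_to_distance_m(delay_s):
    """Distance travelled by the signal over a direct path in delay_s seconds."""
    return SPEED_OF_LIGHT_M_PER_S * delay_s


resolution_s = time_resolution_s(1e9)          # 1 GHz bandwidth -> 1 ns
distance_m = delay_to_distance_m(resolution_s)

print(f"time resolution : {resolution_s * 1e9:.1f} ns")
print(f"range resolution: {distance_m * 100:.1f} cm")
```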
The low probability of interception (LPI) stems from the fact that, to effectively
intercept an unknown signal, a complexity at least equal in order of magnitude to
that of the intended receiver is needed. The complexity of the interceptor would in
general grow with the bandwidth of the signal and with the length of the pseudorandom
sequences used to randomize it. The communication capabilities of any signal
depend on its average power, while the possibility of effectively intercepting it depends
on its peak power. Since bandwidth spreading keeps the average power constant
(and with it the capability of transmitting information)
while reducing the peak power (and with it the probability of interception), UWB
signals can be designed to have a low probability of interception.
The tolerance to powerful narrowband interferers stems from the fact that a narrowband
interferer is filtered out by a filter matched to the input signal.
For that reason, even when the power in the band of the interferer is completely
dominated by the interferer, the UWB receiver is able to de-correlate the interferer.
This is the same effect that has already been observed in signals such as direct sequence
code division multiple access signals.
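The de-correlation effect can be illustrated with a chip-rate toy model. The sketch below is an assumption-laden example and not the receiver of this thesis: it uses a length-31 maximal-length sequence (from a hypothetical 5-bit LFSR) as the spreading code, one sample per chip, and a sinusoidal interferer with three times the chip amplitude. Correlating with the code adds the 31 signal chips coherently, while the tone contribution stays bounded because the code spectrum is flat.

```python
import math


def m_sequence(taps, nbits):
    """One period of a +/-1 maximal-length sequence from a Fibonacci LFSR."""
    state = [1] * nbits
    chips = []
    for _ in range(2 ** nbits - 1):
        chips.append(1 - 2 * state[-1])        # bit 0 -> +1, bit 1 -> -1
        feedback = 0
        for tap in taps:
            feedback ^= state[tap - 1]
        state = [feedback] + state[:-1]
    return chips


code = m_sequence(taps=(5, 2), nbits=5)        # assumed code, 31 chips
n = len(code)

signal_amplitude = 1.0                         # desired chip amplitude
interferer_amplitude = 3.0                     # hypothetical in-band tone

# One data bit (+1) spread by the code, plus a sinusoidal interferer.
tone = [interferer_amplitude * math.cos(2 * math.pi * 3 * i / n) for i in range(n)]
received = [signal_amplitude * c + t for c, t in zip(code, tone)]

# Matched filtering: correlate the received chips with the local code.
signal_out = sum(signal_amplitude * c * c for c in code)
interferer_out = sum(t * c for t, c in zip(tone, code))
total_out = sum(r * c for r, c in zip(received, code))

print("signal contribution    :", signal_out)
print("interferer contribution:", round(interferer_out, 3))
print("decided bit            :", 1 if total_out > 0 else 0)
```

In this model the signal contribution grows with the code length N, while the tone contribution is bounded by the interferer amplitude times sqrt(N + 1), so the bit decision survives an interferer that dominates the received chips.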
The simplicity of the transceiver is associated with understanding UWB signals as
baseband signals. For this reason, it was assumed that it is possible to design a
transceiver whose front-ends are greatly simplified compared to those of normal narrowband
communications. The signal is generated in the transmitter without the need
to up-convert it. For that reason the transmitter lacks a mixer and a carrier generator,
and on certain occasions the digital part drives the antenna directly. On the
receiver side, the front-end does not require down-conversion. The front-end lacks a
mixer and the whole band is amenable to sampling using an ADC. The whole
signal processing may then be performed in the digital domain. This implies lower cost,
lower power, ease of design and most of the associated benefits of CMOS technology
scaling [36]. Digital architectures were found to outperform analog approaches [37].
Furthermore, they allow for considerable flexibility: a single receiver may support
different modulation schemes, bit rates, qualities of service and operating ranges,
and change these parameters dynamically.
There are limitations to these claims. Equation (1.8) assumes that the signal is
transmitted in the presence of only AWGN. Neither multipath, fading, multiple users, nor
other interferers are taken into account. The linearity of the receiver, determined
by the RF front-end and the analog-to-digital converter, will limit the performance in
other respects, such as the possibility of using a Rake to compensate for the multipath
and the resilience to narrowband interferers. As long as the transmitted signal or
the additive white Gaussian noise (AWGN) is the dominant signal in the front-end,
the performance degrades gracefully and can even be extrapolated from that of a 1-bit
ADC. If, on the other hand, the receiver is captured by a narrowband interferer, the
performance of the system degrades sharply, as will be shown in the following chapters.
Regarding the locating capabilities, the problems encountered by UWB systems are
largely determined by the complex multipath environment. These problems
have already been explored in every other location system based on triangulation,
where the direct signal may be highly attenuated (for example, by a wall) and
echoes that arrive by more convoluted routes appear stronger. Since
UWB systems are conceived to be used in the indoor environment, these problems
may reduce the possibility of using the location capabilities unless the rooms are
carefully modeled.
1.4 UWB Applications
Although it can be argued that UWB is amenable to any communication application,
there are currently two main drivers for this technology, as shown in Figure 1-2. The
main limitations of the use of larger bandwidths include the attenuation of the signal.
The emission limits in Part 15 of the FCC rules restrict either the reach of the
transmitted signal to very short distances or the data rate to very low values. In any
case, UWB communication
is expected to be used in many consumer electronics products in the near future.
Figure 1-2 shows the current trends in applications for UWB communications. On
one side we have very high data rate, very short range applications such as wireless
personal area networks (WPAN). Possible applications are varied, from connecting
peripherals to a computer (replacing in this way Bluetooth in PC architectures)
to communicating wirelessly from a DVD player to either a flat screen or a sophisticated
sound system. The IEEE 802.15.3a standard group specified different modes of
transmission for various ranges: 110 Mbps at 10 m, 220 Mbps at 4 m, and an optional
mode of 480 Mbps at 1 m. There were two proposals that met these criteria: Multiband
OFDM, backed by a consortium of more than 60 corporations, is an extension
of the standards 802.11a and 802.16e to a larger bandwidth that retains most of
the rest of their characteristics [33]; and the proposal presented by Motorola is based
on CDMA M-BOK (multi-bit bi-orthogonal keying) signals [32]. After several years
of stalemate, the standardization group disbanded without generating a standard.
On the other hand, with smaller data rates on the order of 100 kbps or less,
there is a large space of applications that includes RFID for inventory control and
similar applications. In this case, since a lower data rate is used, much larger
distances might be achieved. In addition, the locating capabilities of UWB signals add to
the value of these applications. IEEE standard group 802.15.4a is currently developing
a standard for these applications.

Figure 1-2: Intended applications. (Data rate, from 500 kb/s to 500 Mb/s, plotted against
distance, from 1 m to 100 m; legible region labels: IEEE 802.15.3a, WPAN, WLAN, USB &
Multimedia.)
1.5 Previously Used Architectures
This section reviews the architecture of three receivers, some of whose characteristics
we will borrow. First, the Berkeley Wireless Research Center proposal uses a high-speed
ADC right at the output of the LNA and performs all the signal processing in
the digital domain. Second, as a paradigm of a broadband system, a CDMA receiver
shares characteristics with the UWB system, and part of the intuition obtained
there can be applied here. Finally, I will point out some of the characteristics of an
FCC-compliant impulse UWB transceiver that was published at the International
Solid-State Circuits Conference (ISSCC).
In 2005, an impulse ultra-wideband transceiver was presented in [21]. Its main
contribution is that it is a traditionally conceived baseband UWB transceiver. The
system operates in the band from 0 to 960 MHz and targets applications that
require low data rates, such as sensor networks. The transceiver uses binary antipodal
modulation, and its architecture is shown in Figure 1-3. The front-end has no mixers.
A 1-bit ADC is used after the low-noise amplifier and the
variable gain amplifier. The 1-bit resolution allows a sharp reduction in
the power dissipation of the ADC and avoids the need for automatic gain
control. On the other hand, as shown in [38], this transceiver is easily captured by
in-band narrowband interferers. All the signal processing is performed in the digital
domain, and it uses extensive digital correlation. Several parameters, such as the shape of
the pulse, the length of the code and even the use of PPM or BPSK, can be changed
seamlessly as the receiver works. The timing control is partially performed in the
digital domain, since the clocks that perform the sampling are directly controlled
from the digital baseband. This kind of timing loop is no longer necessary, as has
been shown in multiple implementations of other broadband systems. This system
supports low communication rates (~100 kbps) and ranging capabilities over short
distances (~10 m).

Figure 1-3: Architecture of UWB receiver by Berkeley Wireless Research Center.
Since the authorization of UWB by the FCC for communication purposes, the restriction
against using the spectrum below 3.1 GHz for high speed communications has
changed the approach to UWB systems, making their characteristics more conventional
and closer to those of prior wideband systems. Concretely, the claim that a much simpler
RF front-end is required may no longer apply as before, since down-conversion
helps to relax the specifications of the ADC and the baseband. The architecture of the
baseband then has several similarities with standard CDMA systems; the main
difference between them and an impulse UWB system is that the duty cycle of the UWB
signal is smaller than 100%. For that reason, the architecture of a UWB transceiver
borrows important characteristics from classic CDMA transceivers. These systems have
been used extensively since the 1980s for a large range of applications,
from radar to communications to positioning (Global Positioning System, GPS
[39]). From CDMA transceivers we will borrow the acquisition, synchronization and
tracking algorithms.
Figure 1-4: Correlator channel in a CDMA receiver.

Figure 1-4 shows the different parts of the baseband of a CDMA receiver, from
the antenna to the demodulation channel. The two most important characteristics of this
receiver are that almost no feedback between the digital and the analog parts is
needed (only the automatic gain control), and that the synchronization process has a part
that is hardwired (and performs the correlations) and a part that is programmable and
can be changed to adapt it to the current situation of the receiver. Since the impulse
UWB modulation can be interpreted as a direct sequence code division multiplexed
signal, some of these techniques for time synchronization may be adapted to a larger
bandwidth and to different data rates and duty cycles. This architecture receives the
samples of the signal at an intermediate frequency. In order to recover the data
signal from the CDMA signal, it is necessary to perform the last frequency
down-conversion by multiplying the incoming signal by the carrier, and to correlate it with
the pseudorandom code. Both are locally generated signals that need to be properly
synchronized. Two tasks must then be performed:
* Carrier synchronization: A Phase Locked Loop (PLL) does not have, by itself,
a wide enough pull-in range to lock onto the signal. A Frequency Locked
Loop (FLL), while able to lock onto signals with a variety of center
frequencies, is too noisy to track the signal properly after
achieving lock. The solution is easy because we are in the digital domain and part
of the loop is programmable. Of the blocks in Figure 1-4, only the
correlators (integrate and dump blocks, plus the multipliers before them) and the
code and carrier generators are hardwired. The loop filter and, in general, all
decisions related to the data coming from the integrate and dump blocks are
controlled at low frequency through software. That way, different situations
can be detected, either an FLL with a large pull-in range or a Costas loop (the I-Q
version of the classical PLL [40]) with a good noise response can be used, and
the functionality can be changed on the fly.
* Code synchronization: The code is also affected by the Doppler effect but, as it is a
lower frequency signal, the effect is smaller. It is more important in this case to align
the chips of the incoming signal with those of the locally generated code. Due to the
autocorrelation properties of the pseudonoise code, a misalignment larger than half a
chip causes the signal to be lost almost completely. A linear procedure in the form
of a Delay Locked Loop (DLL) provides a very good noise bandwidth, but it is
ineffective at the beginning of the search, because the procedure is only linear
within half a chip of perfect alignment. In order to bring the local generator
within half a chip of the code in the incoming signal, a coarse (non-linear)
synchronization algorithm must be used. As in the carrier synchronization, the loop
is closed by software and is programmable.

Figure 1-5: Architecture of baseband of UWB receiver by Sony Corp.
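The coarse code synchronization step described in the list above can be sketched as a search over all code phases. The example below is a simplified illustration and not the acquisition algorithm of any of the cited receivers: it assumes one sample per chip, no noise, and a length-31 maximal-length sequence from a hypothetical 5-bit LFSR. It correlates the received chips against every cyclic shift of the local code and picks the largest output, which is the point from which a fine tracking loop such as a DLL could take over.

```python
def m_sequence(taps, nbits):
    """One period of a +/-1 maximal-length sequence from a Fibonacci LFSR."""
    state = [1] * nbits
    chips = []
    for _ in range(2 ** nbits - 1):
        chips.append(1 - 2 * state[-1])        # bit 0 -> +1, bit 1 -> -1
        feedback = 0
        for tap in taps:
            feedback ^= state[tap - 1]
        state = [feedback] + state[:-1]
    return chips


def circular_correlation(received, code):
    """Correlator output for every candidate code phase (cyclic shift)."""
    n = len(code)
    return [sum(received[i] * code[(i - shift) % n] for i in range(n))
            for shift in range(n)]


code = m_sequence(taps=(5, 2), nbits=5)        # assumed pseudonoise code
n = len(code)

true_offset = 11                               # unknown to the receiver
received = [code[(i - true_offset) % n] for i in range(n)]

correlation = circular_correlation(received, code)
estimated_offset = max(range(n), key=lambda shift: correlation[shift])

print("correlator outputs:", correlation)
print("estimated offset  :", estimated_offset)
```

Only the aligned phase produces a large output; every other phase is close to zero, which is why a misaligned local code loses the signal almost completely. With noise present, a practical search would compare the peak against a threshold rather than simply taking the maximum.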
At ISSCC 2005 in San Francisco, several UWB transceivers were presented. Although most
of the systems presented were associated with MB-OFDM proposals,
Sony and Mixed Signal Systems presented a 3.1 to 5 GHz
CMOS DSSS UWB transceiver [41]. It is FCC compliant and transmits information that is
spread with a chip rate of 1 Gchips/s in the baseband block. The
impulses are further shaped to lower the power density at 3.1 GHz, to increase the
total transmitted power by flattening the spectrum of the transmitted signal, and
to pre-equalize the waveform of the transmit signal for the RX filter characteristic.
The transceiver includes a baseband block to process the samples.
The baseband receives 2-bit ADC samples at 1
GSample/s. The synchronization to both the phase of the carrier and the spreading
code is performed by controlling the phase of the ADC clock. Figure 1-5 shows the
block diagram of the proposed baseband in this transceiver. The transceiver was
implemented in a 0.18 µm CMOS process and consumes 105 mW in transmit mode
and 280 mW in receive mode from a 1.8 V supply.
1.6 Signal Processing Techniques
The channel allocated to UWB communication signals is impaired by severe multipath
and in-band interferers. Regarding multipath, it has already been shown that
the UWB signal does not suffer fading, so little fading margin is required to guarantee
reliable communications [42]. There are several comprehensive studies on the statistical
properties of the UWB indoor wireless channel [43, 44, 45, 46]. The IEEE
802.15.3a group chose the multipath model presented in [3]. It is a Saleh-Valenzuela [1]
model with two modifications: a log-normal distribution is used instead of a Rayleigh
distribution for the multipath gain magnitude, and independent fading is assumed
for each cluster as well as for each ray within the cluster. In order to compensate for the
effects of multipath in the receiver, the channel impulse response will be estimated
and this information will be used in a Rake receiver and in an MLSE detector. The
following paragraphs present some known results in these areas.
Channel estimation is critical in the context of narrowband and spread spectrum
systems. The procedures already developed for DS-CDMA can be adapted to UWB
systems [16], although the high sample rates required usually motivate the search for
alternatives. Some impulse response estimators developed in [47, 48] are based on the
maximum likelihood (ML) criterion. The problem of channel parameter estimation
in UWB communications has also been addressed in [49], in this case in the context
of the signal energy captured by Rake receivers as a function of the number of Rake
fingers. [47] looks at both data-aided (DA, in which a training sequence is used)
and non-data-aided (NDA) estimation. In both cases, the objective of the algorithms
is to separate each component of the multipath, estimating its delay (with respect
to a reference initial time of arrival) and the attenuation associated with that
path. The channel is assumed to have unlimited bandwidth, and for that reason the
complexity of the algorithms grows without bound as more multipath
components are considered. They also analyze the characteristics of the transceiver
taking into account the presence of several users transmitting at the same time. The
system developed here has been conceived as a time-division multiple access (TDMA)
system, so the problem of multiuser detection will not be addressed, and the matched
filter [50] may be considered the optimum receiver.
Joint timing synchronization and channel estimation has recently been pursued in [51],
in this case using least squares (LS) estimates of both the timing offset
and the channel impulse response, assuming Nyquist sampling of the baseband signal.
Both sub-Nyquist schemes [52] and FFT-based approaches [53] to the channel
estimation problem are also present in the literature. Normally, sub-Nyquist schemes
trade a lower complexity for a larger minimum signal-to-noise
ratio, since only a subband of the total signal is used and the energy that falls outside
this bandwidth is not taken into account. Spread spectrum signal acquisition has
been studied theoretically in [54, 55, 56, 57].
Since the bandwidth of UWB signals allows the separation and characterization of
a large number of the components of the received channel, it is natural to use a Rake
receiver [50] in order to compensate for the multipath and also to take advantage of the
multipath diversity in the presence of obstacles to boost the SNR [16, 58]. It has
already been indicated in [19] that as the number of multipath components of the channel
increases, the amount of energy that can be used for channel estimation grows, and
the capacity of the channel goes asymptotically to zero. The use of Rake receivers for
UWB signals has been explored in [47, 59, 60, 48, 61, 62]. Concretely, [48] presents
a comprehensive analysis of several different implementations of the
Rake receiver as more or less complexity is available, introducing the terms all-Rake
(ARake) and selective-Rake (SRake). As we will see in chapter 3, we will use an
intermediate solution, a partial Rake receiver [59].
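The difference between these Rake variants can be made concrete by comparing how much channel energy each one captures. The sketch below uses the usual definitions (all-Rake combines every resolvable path, selective-Rake the strongest paths, partial-Rake the first arriving paths); the tap values are hypothetical and are not taken from any measured or standardized channel.

```python
def all_rake(taps):
    """ARake: combine every resolvable path."""
    return list(range(len(taps)))


def selective_rake(taps, fingers):
    """SRake: combine the `fingers` strongest paths (requires ranking all paths)."""
    ranked = sorted(range(len(taps)), key=lambda i: abs(taps[i]), reverse=True)
    return sorted(ranked[:fingers])


def partial_rake(taps, fingers):
    """PRake: combine the first `fingers` arriving paths (no ranking needed)."""
    return list(range(min(fingers, len(taps))))


def captured_energy_fraction(taps, selected):
    """Fraction of the total channel energy seen by the selected fingers."""
    total = sum(abs(h) ** 2 for h in taps)
    captured = sum(abs(taps[i]) ** 2 for i in selected)
    return captured / total


# Hypothetical channel taps, ordered by time of arrival.
taps = [0.3, 0.8, 0.1, 0.5, 0.05, 0.4, 0.2, 0.1]

arake = captured_energy_fraction(taps, all_rake(taps))
srake = captured_energy_fraction(taps, selective_rake(taps, 3))
prake = captured_energy_fraction(taps, partial_rake(taps, 3))

print(f"ARake (8 fingers): {arake:.3f}")
print(f"SRake (3 fingers): {srake:.3f}")
print(f"PRake (3 fingers): {prake:.3f}")
```

With the same number of fingers, the selective Rake captures at least as much energy as the partial Rake, at the cost of having to estimate and rank all the paths first.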
In order to combat the intersymbol interference (ISI) that arises when
transmitting with a symbol duration that is shorter than the duration of the channel
impulse response, it is necessary to use a maximum likelihood sequence estimator [63].
Several of the possible procedures and approximations to the perfect MLSE receiver
were explained in [64]. Since this kind of receiver has been applied to the decoding of
convolutional codes, its architecture is already well known and some high performance
Viterbi decoders have been reported in the literature [65, 66, 67, 68, 69, 70].
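A minimal sketch of maximum likelihood sequence estimation with the Viterbi algorithm is given here for illustration. It assumes BPSK symbols, a known two-tap channel whose echo is stronger than the main tap, a known symbol preceding the block, and a small deterministic disturbance; all of these values are hypothetical and the code is not the equalizer implemented in this thesis.

```python
def mlse_viterbi(received, h, initial_symbol=+1):
    """MLSE of a BPSK sequence sent through a 2-tap channel h = (h0, h1)."""
    symbols = (-1, +1)
    # State = previous symbol. Only the known initial symbol starts at cost 0.
    metric = {s: (0.0 if s == initial_symbol else float("inf")) for s in symbols}
    paths = {s: [] for s in symbols}
    for r in received:
        new_metric, new_paths = {}, {}
        for cur in symbols:                    # hypothesised current symbol
            best = None
            for prev in symbols:               # hypothesised previous symbol
                expected = h[0] * cur + h[1] * prev
                cand = metric[prev] + (r - expected) ** 2
                if best is None or cand < best[0]:
                    best = (cand, paths[prev] + [cur])
            new_metric[cur], new_paths[cur] = best
        metric, paths = new_metric, new_paths
    final_state = min(symbols, key=lambda s: metric[s])
    return paths[final_state]


h = (0.6, 1.0)                                 # hypothetical channel, strong echo
tx = [+1, -1, -1, +1, +1, -1, +1, -1]
noise = [0.2, -0.2] * 4                        # small deterministic disturbance

received = []
prev = +1                                      # symbol sent before the block
for s, nk in zip(tx, noise):
    received.append(h[0] * s + h[1] * prev + nk)
    prev = s

decoded = mlse_viterbi(received, h)
naive = [1 if r > 0 else -1 for r in received]  # symbol-by-symbol slicer

print("transmitted:", tx)
print("MLSE       :", decoded)
print("slicer     :", naive)
```

In this example the symbol-by-symbol slicer is misled by the echo of the previous symbol, while the sequence estimator recovers the transmitted block because it evaluates each candidate sequence against the channel model.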
Although for design purposes we will take into account the presence of narrowband
interferers and how their presence degrades the performance of the system, beyond
detecting the presence of an in-band interferer we will not take further action to
reduce its impact.
1.7 Power Dissipation in UWB Systems
Recent years have seen power awareness grow in importance in
digital systems. The recent development of wideband and ultra-wideband wireless
systems, in which the use of sophisticated signal processing and coding techniques
allows recovering the signal under stringent conditions of noise and interference, has
contributed to this trend. As the bandwidth of the signal is increased, frequency
diversity can be used to boost the signal to noise ratio in the receiver. It has also
been a noticeable trend that the transmitted power is no longer a large percentage
of the total power of the system, since it is outweighed by signal processing and
bias currents in the front-end. A larger percentage of energy is devoted to
the signal processing and the digital parts than to the rest of the system. Although
power scales down more easily with technology in the digital domain than in the analog,
this trend is being slowed by the fact that leakage increases the current density that
is continuously drawn from the power supply [71].
Power awareness in communications systems is defined [72] as the awareness of the
exact performance demands of the user and the environment. A power-aware system
consumes just enough energy to achieve the desired level of performance, and not one
decibel, byte or hertz more. Power-aware systems exhibit this characteristic at all
levels of the system hierarchy. Energy trade-offs are enabled at the circuit level and
exploited at the algorithm level. This thesis will show that, for the system to become
power aware, it must sense the parameters that
allow it to adapt the signal processing intelligently to the current environment and user
requests. The ability to trade off performance for energy savings within the node,
together with collaborative processing among nodes, reduces the overall energy dissipated
in the network. Energy inefficiencies in the system must be confronted and eliminated.
The adverse conditions of the UWB channel require sophisticated signal processing
in any of the proposals to ensure the modes of transmission required by IEEE
802.15.3a. Therefore, the digital baseband of such a transceiver consumes a significant
percentage of its total power dissipation. Some UWB transceivers, along with their
power levels, have already been reported in the literature:
* RF front-end: 117.5 mW [73], implemented using 0.25 µm SiGe BiCMOS
technology.
* Clock and carrier generator: 73.44 mW [74], implemented using 0.25 µm SiGe BiCMOS
technology.
* Digital baseband: 523 mW [75]. This system includes a complex low-density
parity-check (LDPC) code demodulator, which constitutes the most important
contribution of the system.
* A pulsed UWB system [41], based on DSSS and implemented in 0.18 µm CMOS
technology, consumes 280 mW in receive mode.
It was also estimated in the MBOA white paper that an MBOA
transceiver in a 90 nm CMOS process would consume 93 mW in transmission and 169
mW in reception. For lower data rates, some architectures were compared in [76].
Taking this into account, quality of service may be traded off against complexity
and power dissipation, depending on the channel quality and environment. In this
thesis, the digital baseband for an FCC-compliant pulsed UWB transceiver is
developed. A prototype is designed that focuses on providing a flexible platform
that exposes several knobs in the architecture to control this trade-off explicitly,
adapting the power dissipation to the required quality of service and the channel
characteristics.
1.8 Thesis Contributions
As part of the learning process of this thesis, three UWB digital basebands have been
designed, two of them implemented as dedicated digital circuits. The first one is part
of a system-on-a-chip prototype oriented to the demodulation of UWB baseband
signals from 0 to 500 MHz, and for this reason it is not FCC compliant. It was
implemented using 0.18 µm CMOS technology working at 1.8 V to achieve a data
rate of 322 kbps. The second UWB baseband was designed for a discrete prototype
and implemented using an FPGA. With this digital baseband it was possible to
obtain a wireless data rate of either 100 Mbps using an arbitrary waveform generator
as transmitter, or 50 Mbps using a dedicated impulse generator as transmitter with an
FCC-compliant signal. A second ASIC was designed to implement an FCC-compliant
UWB baseband working at 100 Mbps, also using an FCC-compliant signal. This second
ASIC was also implemented in 0.18 µm CMOS technology working at 1.8 V. The
subsystems implemented here include 150 correlators in parallel to reduce
the time to achieve coarse acquisition, a programmable partial Rake that may use up
to 25 complex taps, and a Viterbi-like MLSE equalizer. The two main drivers of this
architecture are to supply a good characterization of the environment in which the
transceiver operates, and to provide higher levels of the architecture with knobs with
which to trade off power dissipation and data rate against quality of service, adapting
the transceiver closely to the channel state.
For the implementation of these baseband processors, it was necessary to map
several signal processing algorithms, such as correlations or a Rake receiver, to an
efficient parallel architecture that would allow further optimizations such as dynamic
voltage scaling. The various prototypes were organized around a central structure
of correlators that perform several different tasks during the demodulation process,
and several auxiliary sub-blocks that complete the functionality necessary for
demodulation. Although this has not been fully exploited in this thesis, it would be a good
starting point for future optimization.
Since the complexity of the signal processing and the power dissipation are linked,
this thesis has explicitly exposed the link between signal processing complexity, power
dissipation and quality of service.
Chapter 2
A Baseband Processor for a Baseband UWB Transceiver
In this chapter, the architecture, implementation and measurements of a baseband
processor for pulsed ultra-wideband signals are presented. Although originally designed
and tested for baseband pulses over a wireless link, this architecture may be
easily scaled to larger bandwidths or applied to an FCC-compliant transceiver by
adding functionality to the RF front-end for up/down-conversion within the 3.1 GHz
to 10.6 GHz band. This architecture was implemented using a 0.18 µm CMOS process
at 1.8 V. The transceivers developed in chapter 3 and the following chapters start from
a modification of this initial implementation.
2.1 UWB Signals
For this transceiver, BPSK was chosen because, for binary signals, it has a 3 dB
signal to noise ratio (SNR) advantage over PPM [21] (considered as an orthogonal
modulation [50]). This work focuses on a receiver for pulsed UWB signals using 0-500
MHz baseband pulses. In this implementation, each bit of information is represented
by a sequence of 31 pulses with a width of T_p = 2 ns, and every two consecutive
pulses are separated by T_f = N_f T_p with N_f = 50, resulting in a very low duty
cycle (~2%) [6] and a bit duration of D_bit = 1550 T_p. The information is encoded in the
sign of the pulses, which also depends on the corresponding bit of a Gold code sequence
c_i of length N_c = 31. A Gold code is chosen for its good autocorrelation properties,
which allow good synchronization to the received packet. Although it also
offers very good cross-correlation characteristics, this design does not take into account
more than one user. Channelization could be implemented by assigning a
different Gold code to each user [50]. A family of Gold codes is obtained by initializing
one of the two shift registers used to generate the Gold code with different seeds.
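The construction of a Gold code family from two shift registers can be sketched as follows. This is an illustration under stated assumptions and not the generator used in the chip: the feedback taps (5, 2) and (5, 4, 3, 2) are a commonly tabulated pair for length-31 codes and should be checked against a preferred-pair table before use, and the seeds are arbitrary non-zero register states. One register keeps a fixed seed while the seed of the other selects the member of the family.

```python
def lfsr_bits(taps, seed):
    """One period (2**n - 1 bits) of a Fibonacci LFSR with the given feedback taps."""
    state = list(seed)
    bits = []
    for _ in range(2 ** len(state) - 1):
        bits.append(state[-1])
        feedback = 0
        for tap in taps:
            feedback ^= state[tap - 1]
        state = [feedback] + state[:-1]
    return bits


def gold_code(seed_b, seed_a=(1, 1, 1, 1, 1), taps_a=(5, 2), taps_b=(5, 4, 3, 2)):
    """XOR of two LFSR sequences, mapped to +/-1 chips (taps are assumptions)."""
    seq_a = lfsr_bits(taps_a, seed_a)
    seq_b = lfsr_bits(taps_b, seed_b)
    return [1 - 2 * (a ^ b) for a, b in zip(seq_a, seq_b)]


def periodic_correlation(x, y):
    """Periodic correlation of two +/-1 sequences for every cyclic shift."""
    n = len(x)
    return [sum(x[i] * y[(i + shift) % n] for i in range(n)) for shift in range(n)]


# Each non-zero seed of the second register gives a different family member.
seeds = [(1, 0, 0, 0, 0), (0, 1, 0, 0, 0), (1, 1, 0, 0, 1)]
family = [gold_code(seed) for seed in seeds]

m_seq = [1 - 2 * b for b in lfsr_bits((5, 2), (1, 1, 1, 1, 1))]
auto = periodic_correlation(m_seq, m_seq)
cross = periodic_correlation(family[0], family[1])

print("code length             :", len(family[0]))
print("m-sequence autocorr set :", sorted(set(auto)))
print("cross-correlation values:", sorted(set(cross)))
```

For a true preferred pair of length 31, the cross-correlation between family members is expected to take only the values -9, -1 and 7; the printed set is a quick way to check whether the assumed taps have that property.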
Suppose the bit-stream is denoted by a sequence of binary symbols b_j (with values
+1 for bit 1 or -1 for bit 0) for j = -\infty, ..., \infty. Let A denote the amplitude of each
pulse p(t). Then, the transmitted signal is:
s_{BPSK}(t) = A \sum_{j=-\infty}^{\infty} \sum_{i=0}^{N_c-1} b_j c_i \, p(t - j N_c T_f - i T_f)    (2.1)
where c_i is the Gold code. This signal provides a processing gain [50]:
PG = 10 \log_{10} \frac{N_c}{d} = 32 dB    (2.2)
where N_c is the length of the Gold code (31) and d is the duty cycle of the signal.
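As a numerical check of equation (2.2) and of the signal parameters given in this section, the following sketch recomputes the duty cycle, the bit duration and the processing gain from N_c = 31, N_f = 50 and T_p = 2 ns. The variable names are illustrative; the values are those stated in the text.

```python
import math

N_C = 31            # pulses (Gold code chips) per bit
N_F = 50            # frame length in units of the pulse width
T_P = 2e-9          # pulse width in seconds

frame_duration_s = N_F * T_P                 # T_f = N_f * T_p
duty_cycle = T_P / frame_duration_s          # d = 1 / N_f
bit_duration_s = N_C * frame_duration_s      # 1550 * T_p
bit_rate_bps = 1.0 / bit_duration_s
processing_gain_db = 10 * math.log10(N_C / duty_cycle)

print(f"duty cycle      : {duty_cycle * 100:.1f} %")
print(f"bit duration    : {bit_duration_s * 1e6:.2f} us")
print(f"bit rate        : {bit_rate_bps / 1e3:.1f} kbps")
print(f"processing gain : {processing_gain_db:.1f} dB")
```

The bit duration of 1550 T_p corresponds to 3.1 µs, which is consistent with the 322 kbps data rate quoted for the first prototype in Section 1.8, and the processing gain evaluates to about 31.9 dB, which rounds to the 32 dB of equation (2.2).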
The data packet comprises a preamble and the payload. During the whole
packet, the same Gold code is used. The preamble is composed of a sequence of pulses
whose signs follow several repetitions of the Gold code, plus a final sequence of 31
pulses in which the sign of the Gold code is reversed. This last sequence represents a
bit 0 (as opposed to the previous repetitions, which represent a sequence of bits 1) and
indicates the end of the preamble and the beginning of the payload.
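The preamble structure can be sketched at the level of pulse signs. The example below is an idealized illustration: it assumes the receiver is already aligned to the code boundaries, uses four repetitions (the text only says "several"), and substitutes an arbitrary placeholder sequence of 31 signs for the actual Gold code.

```python
def build_preamble(code, repetitions):
    """Pulse signs: `repetitions` bits 1 (+code) followed by one bit 0 (-code)."""
    bit_signs = [+1] * repetitions + [-1]
    return [b * c for b in bit_signs for c in code]


def despread(pulse_signs, code):
    """Correlate each block of len(code) pulses with the code."""
    n = len(code)
    return [sum(x * c for x, c in zip(pulse_signs[k:k + n], code))
            for k in range(0, len(pulse_signs), n)]


def payload_start(pulse_signs, code):
    """Index of the first payload pulse: just after the first negative block."""
    for index, value in enumerate(despread(pulse_signs, code)):
        if value < 0:
            return (index + 1) * len(code)
    return None


# Placeholder +/-1 sequence of length 31 (NOT a real Gold code).
code = [+1 if (i * i) % 31 < 16 else -1 for i in range(31)]

preamble = build_preamble(code, repetitions=4)
outputs = despread(preamble, code)
start = payload_start(preamble, code)

print("correlator outputs per bit:", outputs)
print("payload starts at pulse   :", start)
```

The correlator output stays at +31 during the repetitions and flips to -31 on the sign-reversed sequence, which is the event that marks the end of the preamble.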
The time to achieve packet synchronization is a critical specification of any high
data rate wireless system. The preamble must be long enough to guarantee
a high probability of achieving signal acquisition. The importance of the duration
of the preamble of the data packet stems from the fact that the energy spent during
the preamble is not spent in proper demodulation of the encoded information. Therefore,
it represents an overhead that is spread over the whole data packet. The longer the
data packet is, the less important it is to have a long preamble. For short, bursty
traffic, a long preamble implies that a significant percentage of energy is not spent in
demodulating the payload. The target application of the transceivers of which this
baseband is part is the transmission of files and information between computers.
In this environment, previous wireless standards for both Wireless Local Area Networks
and Wireless Personal Area Networks can serve as benchmarks against which to
compare our results. IEEE standard 802.11a [34] is a wideband standard for WLAN
with a maximum data rate of 54 Mbps. Some of the characteristics of this standard
will be used as a benchmark for our prototypes. For example, IEEE standard 802.11a
has packets with a preamble of 22 µs [34]. An objective of the systems designed in
this thesis is to achieve an acquisition time of the same order of magnitude.
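The weight of the preamble in short and long packets can be quantified with a simple ratio. The sketch below assumes constant power over the packet, so that energy is proportional to duration; the preamble length of 8 bits and the two payload sizes are hypothetical values chosen only for illustration.

```python
def preamble_overhead(preamble_bits, payload_bits):
    """Fraction of the packet energy spent on the preamble, assuming constant
    power so that energy is proportional to duration."""
    return preamble_bits / (preamble_bits + payload_bits)


PREAMBLE_BITS = 8                      # hypothetical preamble length

short_packet = preamble_overhead(PREAMBLE_BITS, payload_bits=32)
long_packet = preamble_overhead(PREAMBLE_BITS, payload_bits=4096)

print(f"short, bursty packet: {short_packet * 100:.1f} % of the energy in the preamble")
print(f"long packet         : {long_packet * 100:.2f} % of the energy in the preamble")
```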
2.2 System Trade-offs
An important consideration in the receiver architecture is determining where to place
the analog/digital partition. We chose a digital architecture that also implements
all the synchronization in the digital domain. The advantages are the simplification
of the analog elements in the transceiver, its scalability, and the possibility of exploring digital channel adaptability and recovery. Performing the synchronization in
the digital domain eliminates the need to feed a signal from the digital domain to
the clock generation subsystem. As a preliminary prototype that would serve as a
proof of concept, it was chosen not to implement an automatic gain control even if a
conventional wireless communication system requires it.
Figure 2-1: BER as a function of the SNR (a) or SIR (b) for different ADCs. (a) Noise limited environment, horizontal axis SNR (dB). (b) Interference limited environment, horizontal axis SIR (dB). Curve labels recoverable from the plot include 2 bits and 4 bits.
For that purpose, the signal is sampled at twice the Nyquist rate [50]. It is
possible to show that by doing this the information that we have in a baseband signal
for timing control purposes is equivalent to that of the I-Q direct conversion scheme
in which both the in-phase and the quadrature components are sampled at the same
time.
A digital architecture depends on the feasibility of the analog to digital converter
(ADC) required to digitize the signal. To allow for an all-digital timing recovery, the
ADC must sample at 2 GSPS, oversampling at twice the Nyquist rate. A FLASH
ADC architecture is well suited for such a high sampling rate [77]. Since the power
consumption in FLASH ADCs scales exponentially with the number of bits of resolution, minimizing ADC resolution is crucial to reduce the power consumed in the
receiver. Four bits of resolution are sufficient to be closer than 1 dB to the infinite resolution ADC curve for bit error rate. This is true in both a noise limited environment
where the signal is degraded by AWGN and in an interference limited environment,
where the signal is corrupted by a powerful narrowband sinusoidal interferer [38].
Figure 2-1(a) shows the effects of ADC resolution on transceiver bit-error rate (BER)
for different SNRs. Figure 2-1(b) is the equivalent plot for the interference limited
case in terms of the signal-to-interference ratio (SIR).
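The cost of each extra bit can be made concrete with a small sketch. It relies on the standard fact that a b-bit flash converter needs 2^b - 1 comparators; the function names and the uniform mid-rise quantizer clipped at a full scale of ±1 are illustrative assumptions and do not describe the ADC of the implemented chip.

```python
def flash_comparators(bits: int) -> int:
    """Number of comparators in a b-bit flash ADC (2**b - 1)."""
    return 2 ** bits - 1


def quantize(x: float, bits: int, full_scale: float = 1.0) -> float:
    """Uniform mid-rise quantizer with clipping (hypothetical ADC model)."""
    levels = 2 ** bits
    step = 2.0 * full_scale / levels
    # Clip to the representable range, then return the centre of the step.
    x = max(-full_scale, min(full_scale - 1e-12, x))
    index = int((x + full_scale) // step)
    return -full_scale + (index + 0.5) * step


# Comparator count doubles (roughly) with every added bit of resolution.
for b in (1, 2, 4, 6):
    print(b, "bits ->", flash_comparators(b), "comparators")
```

Under this model a 4-bit converter needs 15 comparators, while 6 bits would already need 63, which is the exponential growth that motivates keeping the resolution low.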
The baseband UWB receiver uses a front-end that amplifies the incoming impulse
signal [78]. After it, the baseband processor shown in Figure 2-2 demodulates the
signal. The following sections describe some details of the clock generation subsystem
and the ADC, and the full design of the digital baseband.
Figure 2-2: Baseband processor block diagram. Recoverable labels: signal from RF front-end, fast clocking domain, slow clocking domain.
2.3
Architectural Choices for Clock Generation and ADC
The 4-bit ADC in the UWB receiver is comprised of four FLASH time interleaved
channels running on 500 MHz phase-offset clocks supplied by the PLL, achieving
a sampling rate of f_s = 2 GSPS. It was designed by Puneet P. Newaskar. A clear
advantage of the use of a flash time interleaved ADC is that the maximum frequency
that is generated in the receiver is 500 MHz instead of fs. The samples from the four
channels are aligned to the same 500 MHz clock edge instead of creating a sample data
rate clocked at 2 GHz. They are then presented in parallel to the digital back-end at
this reduced data rate.
The input to the digital baseband is the samples obtained from a flash interleaved
ADC. This has two advantages. First, overall performance is determined by the
average of all four channels rather than being limited by the worst case. This occurs
because the digital back-end adds groups of four consecutive samples, coming from the
four different channels, and treats the result as a single sample. This reduces the need
for calibration across the channels as required in most time-interleaved ADCs, since
the errors would be averaged out. The second advantage is that data is supplied to
the back-end at a reduced data rate. The outputs of the different interleaved channels
of the ADC are presented in parallel to the digital back-end. The samples from the
four channels are aligned to the same 500 MHz clock edge instead of creating a sample
data rate clocked at 2 GHz. For this same reason, the clock generation subsystem
does not need to generate a 2 GHz clock, but 4 phases of a 500 MHz clock, one of
them used for the digital back-end.
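The averaging argument can be checked with a minimal sketch. It assumes that sample n is produced by channel n mod 4 and that each channel carries a fixed additive offset; the offset values and the test signal are hypothetical. Because any four consecutive samples contain exactly one sample from each channel, the group sum is shifted by the same constant regardless of where the group starts.

```python
def group_sums(samples, start):
    """Sum non-overlapping groups of four consecutive samples from `start`."""
    return [sum(samples[i:i + 4]) for i in range(start, len(samples) - 3, 4)]


def interleave_offsets(samples, offsets):
    """Add a per-channel offset; sample n is assumed to come from channel n % 4."""
    return [s + offsets[n % 4] for n, s in enumerate(samples)]


ideal = [((7 * n) % 11) - 5 for n in range(40)]   # arbitrary test signal
offsets = [1, -2, 0, 3]                           # hypothetical channel offsets
measured = interleave_offsets(ideal, offsets)

for start in range(4):
    errors = [m - i for m, i in zip(group_sums(measured, start),
                                    group_sums(ideal, start))]
    # Every group holds one sample per channel, so the error is constant.
    assert all(e == sum(offsets) for e in errors)
```

The sketch covers only static offsets; gain or timing mismatch between channels would need a separate model.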
Since the system implements a fully digital synchronization algorithm, the only
input to the clocking system is the reference crystal clock, and its sole function is to
track it. The jitter requirements are mostly constrained by the digital back-end of the
receiver. Given that the probability of losing synchronization during a 1024-bit data
packet with an rms clock jitter of 100 ps is smaller than 0.01, and the degradation in the
SNR introduced in the ADC by the same jitter is smaller than 1 dB, a ring-oscillator-based VCO can be used to generate the 500 MHz clocks required for the ADC.
The ring oscillator, designed by Fred S. Lee, consists of four differential inverters,
producing the four 90 degree phase-shifted clocks that drive the time-interleaved
FLASH ADC.
2.4
Digital Baseband
The digital back-end implements the functionality required to synchronize and demodulate the data packet. The digital baseband implements the entire synchronization
algorithm in the digital domain without feeding back any control signal to the other
blocks, and achieves packet synchronization in less than 70 μs.
2.4.1
Functionality
This receiver will be in one of two functional states: the coarse acquisition state
looks for the presence of a data packet and achieves synchronization, and the fine
tracking state performs the demodulation of the data packet after coarse acquisition
was achieved. The following paragraphs present the specification of the different
blocks of the digital back-end.
This transceiver is being developed as a proof of concept for the parallelized
architecture. For that purpose it contains only the minimal functionality to acquire
and demodulate a baseband signal. It assumes the signal is baseband so that no
carrier recovery functionality is included. It also assumes that the Automatic Gain
Control works perfectly, so it is not implemented.
Matched filter
The digital back-end recovers the information contained in the data packets from
the samples given by the ADC using an approximation to the matched filter [50]. A
matched filter is the optimum receiver for a signal in AWGN, and implies the correlation of the received signal with a local template synchronized to it and comprised
of perfect replicas of the received pulses separated by the inter-pulse interval and
whose signs coincide with that of the Gold code. This receiver uses a sequence of
rectangular pulses instead of perfect replicas of the received pulses, avoiding the use
of multipliers. The correlation of the incoming signal with a rectangular pulse of
width Tp is equivalent to adding four consecutive samples taken at 2 GSPS. For a
Gaussian pulse, 1.7 dB of SNR are lost, compared to the ideal matched filter.
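A minimal sketch of this multiplier-free correlation follows. The pulse shape, the three-chip code and the frame length of eight samples are hypothetical values chosen to keep the example short; only the structure (a sum of four consecutive samples per pulse, added or subtracted according to the code) is taken from the text.

```python
def correlate_bit(samples, code, samples_per_frame, window=4, delay=0):
    """Correlate with a train of rectangular pulses signed by `code`.

    Each pulse contributes the plain sum of `window` consecutive samples,
    added or subtracted according to the code chip, so no multiplier is used.
    """
    total = 0
    for k, chip in enumerate(code):
        start = delay + k * samples_per_frame
        window_sum = sum(samples[start:start + window])
        total += window_sum if chip > 0 else -window_sum
    return total


# Hypothetical received signal: one signed pulse at the start of each frame.
code = [1, -1, 1]
pulse = [1, 2, 2, 1]
frame = 8
rx = [0] * (frame * len(code))
for k, chip in enumerate(code):
    for j, p in enumerate(pulse):
        rx[k * frame + j] = chip * p

print(correlate_bit(rx, code, frame, delay=0))   # template aligned
print(correlate_bit(rx, code, frame, delay=2))   # template half a window late
```

With the template aligned, every pulse contributes its full sum; when the template is offset by half a window, only part of each pulse falls inside the integration window and the correlation drops accordingly.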
Since the correlation is implemented completely in the digital domain, the use of
a T_p larger than the optimum does not accelerate the coarse acquisition process and
it has roughly the same number of mathematical operations as performing the correlation of the incoming signal with several differently delayed templates. Moreover,
a T_p larger than the optimum reduces the SNR at the output of the matched filter.
Therefore, the width of the integration window is chosen to maximize the processing
gain.
Figure 2-3: Pd as a function of D, the relative position between the pulse and the
template. Curves: rectangular pulse, triangular pulse, Gaussian pulse. Horizontal axis: D (ns).
Coarse acquisition
The coarse acquisition algorithm detects the presence of a data packet and estimates
its delay with a precision of half the width of the pulse. For that purpose, it locates a
peak of correlation between the incoming signal and a sequence of differently delayed
local templates. The result of each of these correlations is compared to a predefined threshold, Th_0, and if this threshold is met, packet acquisition is declared. The
complexity of this process grows with the signal bandwidth and the length of the Gold
code, and as the duty cycle decreases. This is a serial search. Some papers propose a non-serial search [79, 80, 81], but in practice this usually implies a loss of time while the
corresponding correlations are obtained. Some interesting work on synchronization in
multipath environments using dirty templates has been published in [82]. PPM coarse
acquisition has been studied in [83].
The difference in delay of two consecutive templates affects the number of opportunities available to detect the signal. A smaller difference implies a larger probability of
detection of the signal, but also an increase in the complexity of the receiver. Setting
this difference in delay equal to T_p = 2 ns gives a reasonable probability of detection of
roughly 0.9 for a Gaussian pulse, as shown in Figure 2-3. This choice also simplifies
the timing design in the receiver as all clocks have the same frequency.
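The effect of the template spacing on the detection opportunities can be illustrated with a toy serial search. This is a sketch under simplifying assumptions: a single noiseless pulse, a hypothetical pulse shape, and candidate windows spaced exactly one window width apart, as in the choice described above.

```python
def window_sum(samples, start, width):
    """Correlation with a rectangular template: sum of `width` samples."""
    return sum(samples[start:start + width])


def serial_search(samples, width, threshold):
    """Test windows spaced one width apart; return the first index that passes."""
    for index, start in enumerate(range(0, len(samples) - width + 1, width)):
        if window_sum(samples, start, width) >= threshold:
            return index
    return None


PULSE = [1, 2, 2, 1]   # hypothetical pulse samples


def received(position, length=20):
    """Noiseless signal with one pulse starting at `position`."""
    rx = [0] * length
    for j, p in enumerate(PULSE):
        rx[position + j] = p
    return rx


print(serial_search(received(8), 4, 5))    # pulse centred in one window
print(serial_search(received(10), 4, 5))   # pulse straddles two windows
```

When the pulse straddles two windows its energy is split between them, so a threshold that is met in the aligned case is missed; this is the loss in probability of detection that a smaller template spacing would recover at the price of more correlations.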
If the transmitter and receiver clocks are assumed to have the same frequency, the
coarse acquisition process may be modeled as a Markov chain, as shown in Figure
2-4. The delay of the received pulses with respect to the different local templates is
assumed to be constant. The states from 1 to N = N_c · N_f represent tests in which
the received signal is correlated with a template with a different delay. The initial
state can be any state from 1 to N = N_c · N_f. The states i and i + 1 contain the
pulses aligned with an error smaller than half the width of the pulse. If the pulse is
detected there, the chain goes to state D, correct detection (with probabilities P_d,i or P_d,i+1).
The rest of the states do not contain the pulses. Any detection in those states implies
a false detection (state FD) with probability P_fd. If no signal is detected, the chain
jumps from each state to the next one.
Figure 2-4: Coarse acquisition process as a Markov chain (D = Correct detection; FD
= False detection).
Using this information, it is possible to obtain the average number of iterations
to achieve lock (E[k]) and the probability of correct detection (P_cd):
$$E[k] = \frac{N A + \sum_{n=1}^{N} n \Pr[n]}{1 - A} \qquad (2.3)$$
$$P_{cd} = \frac{\left(P_i + P_{i+1}(1 - P_i)\right) \prod_{j=0}^{i-1} (1 - P_j)}{1 - A} \qquad (2.4)$$
where $A = \prod_{j=0}^{N-1} (1 - P_j)$. It is assumed that the probability of declaring a detection
in slot j, for j from 0 to N - 1, is P_j. For states i and i + 1, P_i = P_d,i and P_i+1 = P_d,i+1.
For other values of j, P_j = P_fd.
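Expressions (2.3) and (2.4) can be evaluated numerically with a short sketch. It assumes that the search starts at slot 0, that Pr[n] is the probability that the first detection of a cycle occurs at the n-th test, and that a detection in any slot ends the search; the function name and the example probabilities are hypothetical. It does not perform the averaging over delays used to build Table 2.1.

```python
def acquisition_stats(p, i):
    """Evaluate E[k] (2.3) and P_cd (2.4) for one cyclic serial search.

    p[j] is the probability of declaring a detection in slot j (0..N-1);
    slots i and i+1 are the aligned ones. The search is assumed to start
    at slot 0 (an assumption made for this sketch).
    """
    n_slots = len(p)
    survive = 1.0      # probability of reaching the current slot undetected
    first_hit = []     # Pr[n]: first detection at test n of a cycle
    for pj in p:
        first_hit.append(survive * pj)
        survive *= 1.0 - pj
    a = survive        # A: probability of a full cycle without detection
    weighted = sum((n + 1) * pr for n, pr in enumerate(first_hit))
    mean_k = (n_slots * a + weighted) / (1.0 - a)
    p_cd = (first_hit[i] + first_hit[i + 1]) / (1.0 - a)
    return mean_k, p_cd


# Hypothetical example: N = 31 * 50 slots, P_fd = 1e-4, aligned slots at 0.9.
slots = [1e-4] * (31 * 50)
slots[700] = slots[701] = 0.9
print(acquisition_stats(slots, 700))
```

The numerator of P_cd in (2.4) is simply the sum of the first-detection probabilities of the two aligned slots, which is how the sketch computes it.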
Averaging these expressions for all the possible delays of the incoming signals,
and choosing N_c = 31 and N_f = 50, Table 2.1 is obtained. The probability of correct
detection drops sharply when P_fd = 10^-3. This happens because P_fd is comparable
to 1/N, and in N trials, a false detection may arise before there is an opportunity of
testing the right delay. To ensure a reasonable probability of correct detection, P_fd
must be much lower than 1/N. The coarse acquisition algorithm designed for the
implemented system assumes 50-way parallelization. For a Gaussian pulse and using
an SNR and threshold that ensure P_fd = 10^-4, the average time to declare coarse
acquisition is 65 μs.
In this discussion, it has been assumed that the frequencies of the transmitter
and receiver clocks were exactly the same. A large difference in frequencies spreads
the total energy of the pulses not only across the correlations with two consecutive
templates but across three or more, reducing the probability of detection. It was proven that
for the specified clock stability of 20 ppm, typical in crystal oscillators, the current
specification is robust enough and the loss in probability of detection is negligible, as
shown in Figure 2-5. r in this figure represents the difference in delay between the
boundary of two consecutive integration windows and the center of the impulses. r = 0.5 ns
implies that the impulse is centered in one integration window. As r decreases,
part of the energy of the impulses appears in a different integration window and the
probability of detection decreases. At the same time, a larger
deviation in frequency (k) is needed to affect this probability of detection.
Figure 2-5: Change of probability of detection due to a difference in frequencies
between transmitter and receiver. Horizontal axis: k in ppm.
This work was presented in [84].
Fine tracking
Once packet synchronization is declared, depending on the length of the data packet,
it may or may not be necessary to include a fine tracking algorithm to keep time synchronization for the duration of the data packet. For the specifications of the system,
if the difference between the transmitter and receiver clocks is 20 ppm and there were
no fine tracking mechanism, after 250 pulses half the energy of the incoming pulses is
not included in the correlation with the local template with which the received signal
was initially aligned. Since each bit is represented by N_c = 31 pulses, this allows a
maximum of 8 bits in the packet.
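The arithmetic behind this limit can be reproduced as a sketch. It assumes an inter-pulse interval of 100 ns, inferred here from the 50·T_p spacing with T_p = 2 ns quoted elsewhere in this chapter; the function names are hypothetical.

```python
def drift_after_pulses(n_pulses, pulse_interval_s, ppm):
    """Accumulated timing drift in seconds for a clock offset given in ppm."""
    return n_pulses * pulse_interval_s * ppm * 1e-6


def max_bits_without_tracking(max_pulses, pulses_per_bit):
    """Whole bits that fit before the drift limit is reached."""
    return max_pulses // pulses_per_bit


# Assumed inter-pulse interval: 50 windows of 2 ns each = 100 ns.
drift = drift_after_pulses(250, 100e-9, 20)
print(f"drift after 250 pulses: {drift * 1e9:.2f} ns")
print("bits without tracking:", max_bits_without_tracking(250, 31))
```

Under these assumptions a 20 ppm offset accumulates about half a nanosecond over 250 pulses, and 250 pulses at 31 pulses per bit leave room for 8 complete bits.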
Table 2.1: Model results for a Gaussian pulse
P_fd     E[k]    P_cd
10^-3    526     0.48
10^-4    915     0.91
10^-5    1066    0.99
For fine tracking, a classical Delay Locked Loop (DLL) [40] is used. It straddles the
incoming pulses between two consecutive local templates, representing early and late
versions of the signal. The relative values of these correlations are used to estimate
the delay of the incoming signal.
Since all of the timing control is performed in the digital backend, it is not possible
to continuously adjust the delay of the local templates generated in the receiver with
respect to the incoming signal. The only feasible delay adjustments are an integer
number of samples. Due to the architecture used, only a limited range of delay
corrections can be applied. The architecture presented in the next section allows for
corrections of -3, -2, -1, 0, 1, 2 and 3 samples. It was tested through simulation that
this is sufficient for this system.
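A minimal sketch of how an early-late comparison can be turned into an integer sample correction is shown below. The normalized discriminator, the loop gain of four samples and the sign convention are assumptions made for illustration; the source only states that the relative values of the two correlations are used and that corrections are limited to -3 to 3 samples.

```python
def early_late_error(early, late):
    """Normalized early-late discriminator in [-1, 1] (0 when balanced)."""
    total = abs(early) + abs(late)
    if total == 0:
        return 0.0
    return (abs(late) - abs(early)) / total


def delay_correction(error, gain_samples=4, limit=3):
    """Map the error to a whole number of samples, clamped to +/- limit.

    `gain_samples` is a hypothetical loop gain; positive values are taken
    to mean that the template must be moved later (assumed convention).
    """
    raw = round(gain_samples * error)
    return max(-limit, min(limit, raw))


print(delay_correction(early_late_error(40, 40)))   # balanced correlations
print(delay_correction(early_late_error(10, 70)))   # energy mostly in late
```

The clamp reflects the architectural limit on the correction range: a larger estimated error still produces a step of at most three samples.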
2.4.2
A parallelized approach
In this section the procedure to use a parallel architecture for the implementation of
the matched filter at an acceptable clock frequency is explained. For that purpose we
will make use of the concept of poly-phase implementation of filters that is usually
used in decimation filters [85].
Let us assume that we are trying to lock to a signal that repeats indefinitely a
pseudorandom sequence of N_c bits, with a duty cycle of 1/N_f, which for the sake of
argument implies that the transmitted signal has only one non-zero sample out of
every N_f consecutive samples. The sequence that we are trying to lock to can then
be written as:
$$x[n] = \sum_{k=0}^{N_c - 1} c_k \, \delta[n - k N_f] \qquad (2.5)$$
where c_k = ±1 is the k-th element of the pseudorandom sequence.
The matched filter to this signal is defined as
$$h[n] = x[N_c N_f - 1 - n] \qquad (2.6)$$
In impulse UWB systems with a very low duty cycle, this filter is very long in the
number of taps, but most of them are equal to zero. In addition, a direct form I
implementation of this FIR requires N_c · N_f - 1 registers, although only N_c - 1 adders
and N_c multipliers (by 1 or -1, so they are amenable to simple implementation). It
is necessary for later developments to notice that:
$$h[n N_f + k] = 0 \qquad (2.7)$$
for k = 0, 1, ..., N_f - 2, and for all values of n.
All these blocks should be working at the sampling frequency (in this case 500
Msamples/s). It is desirable to obtain an architecture with the same functionality
that allows running most of the mathematical operations at a lower clock frequency.
Assuming that a serial-to-parallel operation in N_f ways happens after the application
of the matched filter, the structure of the matched filter followed by this block would look as
shown in Figure 2-6(a).
The first step would be to consider using a poly-phase decomposition of the filter
h[n] in N_f phases:
$$e_k[n] = h[n N_f + k] \qquad (2.8)$$
There are N_f phases (from 0 to N_f - 1), and each of the down-sampled outputs of
the filters can be obtained as shown in Figure 2-6(b).
Figure 2-6: Correlators Architecture. (a) Model of the matched filter with a serial to parallel block at the output. (b) Poly-phase decomposition of the filter. (c) Correlators architecture.
Now we take into account the special characteristics of the filter h[n]. As we have
defined h[n], only one of the poly-phase components is non-zero. This component
would be e_{N_f-1}[n]. For that reason, it is possible to simplify and obtain all the
outputs of the matched filter with an architecture as shown in Figure 2-6(c).
The architecture shown in Figure 2-6(c) requires a total of N_f(N_c - 1) registers,
N_f(N_c - 1) adders and N_c · N_f multipliers. The clock frequency has been reduced at
the cost of adding complexity to the receiver. It is possible to reduce the complexity at
the cost of not obtaining all the outputs of the filters, and, because of that, increasing
the time to achieve signal acquisition. The correlator architecture employed in this
transceiver, and its variations as shown in the next chapters will exploit this trade-off.
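The equivalence between the full-rate matched filter and a bank of N_f slow correlators can be verified with a small sketch. It takes h[n] = x[N_c·N_f - 1 - n], the time reversal that places all non-zero taps in phase N_f - 1; the code values, N_f = 4 and the test signal are hypothetical, and the function names are introduced only for this example.

```python
def matched_filter(code, n_f):
    """Taps h[n] = x[Nc*Nf - 1 - n] for x[n] = sum_k code[k] * delta[n - k*Nf]."""
    length = len(code) * n_f
    x = [0] * length
    for k, chip in enumerate(code):
        x[k * n_f] = chip
    return [x[length - 1 - n] for n in range(length)]


def direct_fir(h, signal):
    """Reference: full-rate FIR, y[n] = sum_m h[m] * signal[n - m]."""
    out = []
    for n in range(len(signal)):
        acc = 0
        for m, tap in enumerate(h):
            if tap and n - m >= 0:
                acc += tap * signal[n - m]
        out.append(acc)
    return out


def parallel_correlators(code, n_f, signal):
    """Nf correlators, each fed one input phase at 1/Nf of the sample rate."""
    n_c = len(code)
    taps = code[::-1]                    # the only non-zero poly-phase component
    out = [0] * len(signal)
    for phase in range(n_f):
        branch = signal[phase::n_f]      # decimated input of this correlator
        for m in range(len(branch)):
            acc = sum(taps[j] * branch[m - j] for j in range(n_c) if m - j >= 0)
            n = m * n_f + phase + (n_f - 1)   # matching full-rate output index
            if n < len(out):
                out[n] = acc
    return out


code = [1, -1, 1]
signal = [((5 * n) % 7) - 3 for n in range(30)]
assert direct_fir(matched_filter(code, 4), signal) == parallel_correlators(code, 4, signal)
```

Each correlator in the bank only ever touches samples spaced N_f apart, which is why its arithmetic can run at the reduced clock rate while the bank as a whole still produces every output of the matched filter.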
2.4.3
Architecture
The digital back-end is divided into a fast and a slow clocking domain, as shown in
Figure 2-2. The fast domain uses a custom layout, a 500 MHz clock coming from the
PLL, and is composed of a retiming block, a block that performs a 10x parallelization
of the incoming ADC samples, and the main control of the digital back-end. The
parallelization allows the slow domain to work with a 30 MHz clock. The correlations
and other mathematical operations needed to implement the synchronization and
demodulation are performed in 2's-complement fixed-point arithmetic in the slow
domain, using standard ASIC flow.
The retiming block provides the one-sample delay granularity required for fine
tracking. The groups of four samples that are the inputs to the correlators may start
with any arbitrary sample and may include samples belonging to two different ADC
vectors, as shown in Figure 2-7. These groups are obtained by selectively delaying
the outputs of one or more ADC interleaved channels in the retiming block shown
in Figure 2-8. Since the first operation in the correlator block is to add together
four consecutive samples, the outputs of the retiming block need not be reordered
chronologically in this case. This block is controlled by a four state finite state
machine whose value is updated based on the relative delay of the incoming signal
with respect to the local templates.
Figure 2-7: Groups of four consecutive samples required
After the parallelization, the outputs of the fast clocking domain are processed by
10 correlators as shown in Figure 2-9. In each 30 MHz clock cycle, the four samples at
the input of each correlator are added together, implementing the correlation with a rectangular pulse of
width T_p. The result of this addition is either added or subtracted, depending on the
value of the Gold code, to the 11-bit value stored in the shift registers five cycles before
(50·T_p, equal to the time between two consecutive pulses). Since the time between
two consecutive pulses is equal to 5 cycles, in order to cope with this duration, coarse
acquisition is decided only upon N_c - 1 pulses instead of N_c pulses. All multipliers
in Figure 2-9 are implemented using 2-to-1 multiplexers because in each of them one
of the coefficients represents a single bit. Each correlator performs five correlations
at the same time, equivalent to the output of an FIR of D_bit · f_s = 6200 coefficients
with values equal to 1, -1 or 0. The outputs from the ten correlators (a total of 50
correlation values) are used by the coarse acquisition subsystem, but only the first
two are active during fine tracking. The Gold code generator is implemented with two
shift registers, each of them generating a linear recursive sequence of which both the
coefficients of the generating polynomial and the seed values are programmable. If the
incoming signal is properly aligned to one of the first local templates in a correlation,
at the end of the iteration, the 50 correlations contain the samples of the channel
impulse response. This information was not further used in this transceiver, but it
may be used in future prototypes in a RAKE receiver to compensate for the multipath
and in an MLSE sequence estimator to make up for the inter-symbol interference if
the system uses an inter-pulse interval shorter than the channel impulse response.
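The Gold code generator described above can be sketched with two programmable linear recursive sequences whose outputs are combined by XOR. The recurrences, the seeds and the mapping of bits to signs below are hypothetical choices of degree 5, which gives the period of 31 used in this chapter; they are not the values programmed in the chip.

```python
def lfsr_sequence(taps, seed, length):
    """Linear recursive sequence a[n] = XOR of a[n - t] for t in taps."""
    seq = list(seed)
    while len(seq) < length:
        bit = 0
        for t in taps:
            bit ^= seq[-t]
        seq.append(bit)
    return seq[:length]


def gold_code(taps_a, seed_a, taps_b, seed_b, length=31):
    """XOR two LFSR outputs and map bits {0, 1} to signs {+1, -1}."""
    a = lfsr_sequence(taps_a, seed_a, length)
    b = lfsr_sequence(taps_b, seed_b, length)
    return [1 - 2 * (x ^ y) for x, y in zip(a, b)]


# Hypothetical degree-5 recurrences and seeds (period 2**5 - 1 = 31).
TAPS_A, TAPS_B = (3, 5), (1, 2, 3, 5)
code = gold_code(TAPS_A, [1, 0, 0, 0, 0], TAPS_B, [1, 1, 1, 1, 1])
print(code)
```

Changing either seed shifts one sequence relative to the other and yields a different member of the code family, which is what makes programmable seeds useful in the generator.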
The duration of a correlation iteration is (N_c + 1) · 133 ns. The extra 133 ns
provides additional time to compare the correlation results to Th_0, and, if packet
synchronization is not declared, it is used to delay the position of the local templates
by T_f with respect to the incoming signal.
The fine tracking subsystem shown in Figure 2-10 provides the functionality required to close the DLL. The division needed in the delay estimation is avoided by
multiplying by an approximation to the inverse stored in a ROM. The ROM stores
32 seven-bit numbers and the five most significant bits of the numerator are used
to choose the output. The coefficients of the filter are programmable, and it incorporates Baugh-Wooley multipliers. The delay decoder transforms the output of the
filter into signals relevant to the fast clocking domain: the new state of the retiming
block and indication of the need to start correlations a 500 MHz clock cycle before
(signal Advance) or later (signal Delay). The fine tracking subsystem also provides
a flag to restart coarse acquisition when the signal is lost. All the outputs of the
fine tracking system are ready in six 30 MHz cycles. Since there are only five 30
MHz cycles between every two consecutive pulses, there is not sufficient time between
the last pulse of the Gold code sequence and the first pulse of the next sequence to
perform this operation. This is the reason why the DLL only uses N_c - 1 pulses to
estimate the delay even if N_c pulses are used to recover the value of the transmitted
bit. This leads to a negligible loss of processing gain for the delay estimation.
Figure 2-8: Block diagram of the retiming block
As the 50 correlations are completed, they are read into the coarse acquisition
subsystem, whose block diagram is shown in Figure 2-11. The memory block provides
not only the value of the maximum but also the values in the two adjacent positions.
The two simplified fine tracking subsystems lack the loop filter shown in Figure 2-10
except for its direct path (b_0). Only the simplified fine tracking subsystem using the
two positions with the most energy will be used to initialize the DLL. Detection of
the signal is given in six 30 MHz cycles, and the rest of the outputs are ready in seven
cycles thereafter. In order to provide enough time interval for these operations, the
evaluation of the 50 correlations starts only when Nc - 1 pulses have been integrated.
Still, during the change of state that happens after declaring packet synchronization,
the first two pulses of the next bit are lost. This implies a negligible loss of processing
gain in the demodulation of the first bit of the payload.
All thresholds, coefficients and other parameters used in the digital back-end must
be configured before utilization by using data fed through a serial port.
2.5
Performance Results
Figure 2-12 shows a photograph of the 0.18 μm ASIC. The PLL was verified at 500
MHz and can provide much higher clock frequencies (up to 2 GHz). The ADC was
verified using the testing method presented in [86], and proved to have an ENOB
larger than 3. The digital back-end is completely functional at a clock frequency of
300 MHz, but not at 500 MHz. The part that failed in the baseband was the path
of the signals generated in the slow-speed clock domain that needed to be registered
into the high-speed clock domain. The frequency range for the coarse acquisition
algorithm between a pair of transceivers is shown to be ±3%. At 300 MHz a data
rate of 193 kbps was demonstrated. Table 2.2 contains a summary of overall chip
measurements. Most of these circuits can scale to 500 MHz. This architecture is
scalable to larger bandwidths.
Figure 2-9: Implementation of the correlation bank.
Figure 2-10: Fine tracking subsystem block diagram.
Figure 2-11: Coarse acquisition block diagram.
Figure 2-12: Single chip UWB transceiver photograph.
Table 2.2: Chip Measurements
Chip specifications
Process Technology    0.18 μm
Die Size              4.3 mm x 2.9 mm
Bit Rate              193 kbps
Power Consumption
PLL                   45 mW
CLK Buffers           65 mW
Baseband              75 mW
Chapter 3
System Analysis for the FCC Compliant System
In this chapter a method for obtaining the specifications of a baseband system for a
UWB transceiver is developed. A system specification optimizes the resources of the
different subblocks that comprise the wireless system in order to achieve a certain data
rate at a given distance. The limitations of the different subblocks, the specific
challenges of the UWB system, and the constraints imposed on its architecture for
practical reasons are taken into consideration in the choice of the algorithms that
are implemented in the baseband. For UWB systems, the impact of the extreme
multipath and the techniques used to compensate for its effects will be carefully
analyzed. Emphasis is given to the programmability of the different elements of
the baseband subsystem as a way to prove the trade-off between power dissipation,
complexity and quality of service.
First, the objective of the system in terms of data rate and distance will be
decided in relation to the different standards or proposals available for similar systems. Then, the reasons that support the choice of a homodyne architecture for the
UWB receiver are presented. The kind of signal chosen depends on the ADC specification, and a comparison between impulse signals and multiband-OFDM signals is
presented. After this, the main challenge of UWB communications, multipath, and
its solution will be addressed, and the algorithms to be implemented in the digital
baseband will be defined and developed for fixed-point arithmetic implementation and
programmability, making explicit the trade-off between complexity of signal processing and quality of service. With the sensitivity defined as a minimum SNR obtained
from this section, an analysis of the link budget is presented at the end of the chapter,
along with the model under which the final system was simulated.
The results of this chapter will be applied in the following chapters to analyze two
different systems: the one implemented in an FPGA as part of a discrete prototype
in which the main concern is simplicity, and the one implemented in an ASIC, where
the objectives are robustness and programmability.
3.1
Objectives of the Design
In our choice of the objective of this UWB system we decided to explore the high
data-rate applications and chose to obtain a data rate of 100 Mbps at 10 m. This
follows closely one of the modes specified for the IEEE 802.15.3a effort. This will be
achieved with a signal that meets the requirements of the spectral mask indicated in
Figure 1-1, in the band from 3.1 GHz to 10.6 GHz. Two problems are relevant for
this kind of signal: coexistence with other narrowband signals and multipath.
The interference that UWB systems would cause on already existing systems is
almost negligible, but it has been proven that, in spite of initial assumptions, UWB
systems are vulnerable to narrowband interferers. The initial claim that UWB systems would be able to easily filter out narrowband interferers depends on the linearity
of the transceiver. If the front-end does not provide enough linearity (given by the
RF front-end and the ADC), the receiver may be saturated by a powerful narrowband
interferer. Since the bandwidth of UWB systems is 500 MHz or larger, the linearity constraint in the ADC is associated with its bandwidth, and obtaining more than 4
bits at those frequencies leads to very power-hungry designs that are not suitable for
wireless applications. Furthermore, the linearity constraints add to the bandwidth
and low-noise constraints in the LNA, making these elements of the RF front-end
harder to design. The band approved for UWB communication includes the
U-NII band at 5 GHz, already used by the wireless local area network (WLAN) standard IEEE 802.11a, a strong narrowband interferer. Reference [87] expands on the effect of
narrowband interferers on wideband wireless communications. Some techniques to
reduce the impact of narrowband interferers on MB-OFDM signals have been presented in
[88].
For these reasons, most of the proposals submitted to the IEEE 802.15.3a committee chose to divide the 3.1-10.6 GHz band into a number of subbands of around
500 MHz bandwidth¹, the minimum allowed by [2]. This choice makes it possible to avoid the
U-NII band by simply not using the 500 MHz bands that collide with it, and to filter
this interference out in the front-end. The minimum bandwidth reduces the
design stress on several of the blocks and allows the focus to be placed on the algorithms. The baseband
designed here follows this trend. In this thesis the in-band interferer will not be actively addressed beyond this design choice. Figure 3-1 shows how the different
500 MHz channels fit in the spectral mask given by the FCC.
The other important problem is multipath. The multipath model used here is the
one presented in [3] used for the IEEE 802.15.3a standardization group. An effective
high data rate transceiver in this band should provide robust communication under
severe multipath conditions. This requires sophisticated signal processing that will
increase the power dissipated in the baseband. The two main drivers of this architecture are the possibility of providing a good characterization of the environment of the
transceiver, and of providing higher levels of the architecture with programmability
that allows trading off power dissipation and data rate with quality of service in order
to adapt the transceiver closely to the service that needs to be provided and the
channel state. In addition, a fast signal acquisition algorithm must be implemented
to reduce the duration of the preamble to a value comparable with current wireless
systems (≈ 20 μs).
¹ The MB-OFDM proposal uses a 528 MHz channelization in order to simplify the design of the local
oscillators used in the transceiver.
Figure 3-1: 500 MHz bandwidth channelization with FCC compliant power spectral
density. Horizontal axis: frequency in GHz. Legend: First Report and Order, Part 15 bound, UWB channelization.
3.2
Homodyne vs Heterodyne architecture
The architecture choice in the front end determines the type of signal processing
that is required in the baseband. There are two main options [89]. The first one is
the heterodyne receiver shown in Figure 3-2(a) and the other one is the homodyne
receiver, depicted in Figure 3-2(b).
A heterodyne receiver needs to handle the rejection of the image band. This
implies a trade-off between the value of the intermediate frequency and this rejection.
The larger the intermediate frequency chosen, the easier it is to reject the image band,
but also the larger the Q required for the channel select filter and the larger the sampling
frequency required for the ADC afterwards. On the other hand, a small intermediate frequency
reduces the sampling frequency of the ADC and the Q required in the channel select
filter, while complicating the design of the image reject filter. The heterodyne receiver
by default will use an ADC whose sampling frequency is larger than the bandwidth
of the signal, and further signal processing is required in the digital baseband to
down-convert it all the way to baseband.
A homodyne receiver need not worry about the image band, because the received
signal is directly converted to baseband. On the other hand, the homodyne receiver
needs to duplicate the receiver chain after the down-conversion. The homodyne
receiver also requires two ADCs instead of one, but their sampling frequency may be
of the same order of magnitude as the bandwidth of the signal that is being processed.
In addition, a homodyne receiver needs to take into account DC offsets present in the
signal (from either LO leakage or interferer leakage) and I/Q mismatch.
Figure 3-2: Architectures for the receiver. a) Heterodyne receiver. b) Homodyne receiver.
In our case, we chose the second option. The main reason was the trade-off regarding
image band rejection that a heterodyne architecture would impose.
Concretely, let us consider an 802.11a interferer at 5.6 GHz that appears in the image
band of the UWB channel right below it. Let us also assume that the intermediate
frequency chosen is 300 MHz (so that the band of interest ends up between 50 MHz and
550 MHz), and that after this down-conversion the signal is sampled at 1.2 Gsps. Under
these conditions, if 10 dB of attenuation of the image band is required, a filter with a
slope of 140 dB/decade is needed, which makes its design very difficult. The
homodyne architecture allows a reduced ADC sampling rate and a reduced digital
baseband clock frequency.
3.3 Specification of the ADC
The broad definition of UWB by the FCC allowed for complex modulations such as
OFDM to be included in the denomination of UWB signals. Taking into account that
OFDM allows a very clean and elegant solution to multipath [90], it was considered
as an alternative to impulse UWB for the UWB transceiver. In this section we will
compare both signals taking into account the specification of the ADC.
3.3.1 Signal definition
In order to be able to compare the performance of both kinds of signals, the following
approach is chosen so that each of them achieves a data-rate of 100 Mbps with a
bandwidth of 500 MHz.
* Pulsed UWB: Each bit of information is represented by one pulse of width 2 ns
and the distance between the beginning of two consecutive pulses is 10 ns.
* OFDM UWB with 256 carriers: Each OFDM symbol occupies, sampled at 1
GHz, 256 samples. In a heavy multipath environment, the expected RMS spread
of the channel impulse response is approximately 25 ns [2]. The cyclic prefix
has been conservatively chosen to be greater than this value; it has a length of
54 ns (54 samples). 31 bits of information are encoded in each symbol using
repetition codes so that each bit is modulating more than one carrier in BPSK.
In order to whiten the spectrum, the total number of carriers is divided into
four blocks of 31 carriers, while reserving the remainder of them for channel
estimation. The four blocks contain exactly the same bits of information, but
each scrambled in a unique way. In order to obtain a real baseband signal, the
symbols modulating the conjugate carriers are complex conjugates themselves.
These signals are sampled at a frequency of 1 GHz. Both signals provide the same
data rate. Because they occupy the same channel, they have a similar processing
gain. The following differences between the two signals should be highlighted:
* OFDM UWB inherently provides a simple mechanism of channel equalization.
On the other hand, a receiver using pulsed UWB should also include an equalizer
to mitigate the effects of the channel.
* Synchronization of the receiver is performed differently depending on the type of
signal transmitted. This leads to different kinds of synchronization algorithms
for the two signals. In the simulations shown in this section, it is assumed that
the receiver has achieved perfect synchronization and no jitter or time errors
are considered.
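The data rates of the two signal definitions above can be checked with a few lines of arithmetic. This is a minimal sketch using only the numbers stated in this section; the variable names are illustrative and not taken from the thesis.

```python
# Pulsed UWB: one bit per pulse, one pulse every 10 ns (values from the text).
PULSE_WIDTH_NS = 2.0
PULSE_PERIOD_NS = 10.0
pulsed_rate_mbps = 1e3 / PULSE_PERIOD_NS           # bits per microsecond = Mbps
pulsed_duty_cycle = PULSE_WIDTH_NS / PULSE_PERIOD_NS

# OFDM UWB: 256 samples plus a 54-sample cyclic prefix at 1 GHz, 31 bits per symbol.
SAMPLE_PERIOD_NS = 1.0
FFT_SAMPLES = 256
PREFIX_SAMPLES = 54
BITS_PER_SYMBOL = 31
symbol_duration_ns = (FFT_SAMPLES + PREFIX_SAMPLES) * SAMPLE_PERIOD_NS
ofdm_rate_mbps = 1e3 * BITS_PER_SYMBOL / symbol_duration_ns

print("pulsed UWB rate (Mbps):", pulsed_rate_mbps)
print("OFDM UWB rate (Mbps):", ofdm_rate_mbps)
print("OFDM symbol duration (ns):", symbol_duration_ns)
```

Both signals come out at 100 Mbps, which is the condition the comparison relies on.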
3.3.2 Automatic Gain Control
For the simulations we take into account the use of an ADC with a finite number
of bits and observe how the system performance changes as the number of bits of the
ADC goes down. The presence of an ADC in the system implies the need for an
automatic gain control (AGC). It is assumed that this system has an instantaneous
AGC, in order to focus on the impact of quantization noise on the demodulation of
the signal. The ADC model has a fixed input range, from -1 to 1. If the number of
bits is $b$, then the quantization step is $\Delta = 2^{1-b}$. The AGC is calibrated with the
assumption that, in addition to the input signal, there is only AWGN at the input
of the ADC. Then, the quantization noise can be assumed to be a uniform random
variable of variance $\Delta^2/12$. This assumes that the total input signal amplitude is
neither very large (which would saturate the ADC) nor very small (in which case the
quantization noise power tends to $\Delta^2/4$, as only the two smallest levels of the ADC are
exercised). The AGC avoids these extremes. It scales its noisy input signal
by a factor $\alpha$ such that the ADC is fed an optimal input mean square voltage of $\sigma_o^2$,
given in Table 3.1 for different resolutions. Thanks to this block, it is safe to assume
henceforth that the quantization noise power added by the ADC is $\Delta^2/12$ for all
input SNRs.
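The uniform-noise approximation can be checked numerically. The sketch below assumes a mid-rise uniform quantizer (the thesis does not state which quantizer type was simulated), a Gaussian input, and the 4-bit RMS input level from Table 3.1; the function names are illustrative.

```python
import math
import random

def quantize(x, bits):
    """Mid-rise uniform quantizer with input range [-1, 1] (assumed model)."""
    delta = 2.0 ** (1 - bits)              # quantization step, Delta = 2^(1-b)
    levels = 2 ** bits
    k = math.floor(x / delta)
    k = max(-levels // 2, min(levels // 2 - 1, k))   # clip to the ADC range
    return (k + 0.5) * delta

def quantization_noise_power(bits, sigma, n=200_000, seed=1):
    """Monte Carlo estimate of E[(Q(x) - x)^2] for a Gaussian input of RMS sigma."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, sigma)
        err = quantize(x, bits) - x
        total += err * err
    return total / n

bits = 4
sigma_o = 0.1425                            # 4-bit entry of Table 3.1
delta = 2.0 ** (1 - bits)
measured = quantization_noise_power(bits, sigma_o)
print("measured noise power:", measured)
print("Delta^2 / 12        :", delta ** 2 / 12.0)
```

With the input level set this way the measured error power stays close to $\Delta^2/12$, which is the assumption used in the rest of the section.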
3.3.3 Demodulating Architectures
The demodulation of the bits depends on the kind of signal received. Two types of
receivers are considered:
* OFDM receiver [90]: Shown in Figure 3-3(a). After sampling the incoming
signal, the samples corresponding to the same OFDM symbol are separated.
The samples that belong to the cyclic prefix are removed. An FFT is performed
over the rest of the samples. Since each bit of information has been spread over
several carriers, the coefficients of the FFT corresponding to those carriers are
added. The sign of this number is related to the value of the data bit.
* Matched filter receiver [50]: Shown in Figure 3-3(b). After sampling, the incoming signal is correlated with a replica of the representation of the data bit.
For pulsed UWB, the incoming signal is correlated with a representation of the
pulse shape (in this case, a rectangular pulse). The sign of the correlation result
indicates the value of the data bit.
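The matched filter receiver for pulsed UWB can be sketched as follows. This is a minimal illustration under the assumptions stated in this section (1 GHz sampling, a 2 ns rectangular pulse every 10 ns, perfect synchronization); the function and constant names are hypothetical.

```python
SAMPLES_PER_BIT = 10        # 10 ns pulse repetition period at 1 GHz sampling
TEMPLATE = [1.0, 1.0]       # 2 ns rectangular pulse, used as correlation template

def modulate(bits):
    """Map each bit to one signed rectangular pulse followed by silence."""
    signal = []
    for bit in bits:
        sign = 1.0 if bit else -1.0
        frame = [sign * t for t in TEMPLATE]
        frame += [0.0] * (SAMPLES_PER_BIT - len(TEMPLATE))
        signal.extend(frame)
    return signal

def demodulate(samples):
    """Correlate each bit interval with the template and take the sign."""
    bits = []
    for start in range(0, len(samples), SAMPLES_PER_BIT):
        window = samples[start:start + len(TEMPLATE)]
        corr = sum(s * t for s, t in zip(window, TEMPLATE))
        bits.append(1 if corr > 0 else 0)
    return bits

tx_bits = [1, 0, 1, 1, 0]
rx_bits = demodulate(modulate(tx_bits))
print(rx_bits)
```

Because only the sign of the correlation is used, the same decision rule still applies when the samples are coarsely quantized before correlation.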
3.3.4 Simulations and Analysis
The signals are compared in terms of their behavior in a noise-limited environment
and an interference-limited environment. Noise samples are uncorrelated. The ADC is
preceded by an AGC which sets the power fed to the converter based on the previously
described policy. The Monte Carlo simulations required for the plots provide a
standard deviation in the error of the estimate of the probability of error ($P_e$)
under 10% for a $P_e$ of $10^{-5}$, under 3% for a $P_e$ of $10^{-4}$ and less than 1% for $P_e$ greater
than or equal to $10^{-3}$. The results are shown in Figure 3-4 for the OFDM UWB signal.
Figure 3-5 represents the result for the pulsed UWB signal. In both cases, the results
were obtained for ADC resolutions of 1, 2, 3, 4 and 5 bits. The curves for a 6-bit
ADC were also obtained but since they are already very close to the ideal case with no
quantization, they are not presented here. In both figures, the curve that represents
the performance for the ideal case with no quantization is provided for comparison.
Table 3.1: $\sigma_o$ values set by AGC
ADC Resolution (bits)   2        3        4        5
$\sigma_o$              0.2850   0.2025   0.1425   0.1231
Figure 3-3: Receiver architectures for different UWB modulations. a) Architecture of an OFDM receiver. b) Architecture of an impulse receiver.
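The precision figures quoted for the Monte Carlo simulations can be related to the number of simulated bits. The sketch below assumes independent bit errors, so that the error count is binomial; the thesis does not state how many bits were simulated, and the function names are illustrative.

```python
import math

def relative_std(p_e, n_bits):
    """Relative standard deviation of the estimate of P_e after n_bits trials."""
    return math.sqrt((1.0 - p_e) / (n_bits * p_e))

def bits_required(p_e, target):
    """Number of simulated bits needed to reach a target relative standard deviation."""
    return math.ceil((1.0 - p_e) / (p_e * target ** 2))

# Precision targets quoted in the text: (P_e, relative standard deviation).
for p_e, target in [(1e-5, 0.10), (1e-4, 0.03), (1e-3, 0.01)]:
    print(p_e, target, bits_required(p_e, target))
```

Under this assumption, all three targets are met by a single run length on the order of $10^7$ bits per point.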
It is seen in these figures that the OFDM UWB signal performs slightly worse than
the pulsed UWB signal. This difference is caused by the presence of a cyclic prefix
in the OFDM UWB signals. This prefix reduces the real power used for detection of
the information bit by the ratio of the prefix length to the sum of the OFDM symbol
length and the prefix length.
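As a worked example with the signal parameters of Section 3.3.1 (256 samples per OFDM symbol and a 54-sample cyclic prefix), the fraction of the transmitted power spent on the prefix and the resulting penalty are:

```latex
\frac{N_{cp}}{N_{sym} + N_{cp}} = \frac{54}{256 + 54} \approx 0.174,
\qquad
10 \log_{10}\!\left(\frac{256 + 54}{256}\right) \approx 0.83\ \mathrm{dB}
```

Here $N_{sym}$ and $N_{cp}$ are shorthand introduced for this example only. A loss of less than 1 dB is consistent with the OFDM curves being only slightly worse than the pulsed UWB ones.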
[Figure 3-4 plot; x-axis: SNR (dB); curves: no quantization, 1 bit, 2 bits, 3 bits, 4 bits]
Figure 3-4: Probability of error for the AWGN limited case, OFDM UWB
[Figure 3-5 plot; x-axis: SNR (dB); curves: no quantization, 1 bit, 2 bits, 3 bits, 4 bits]
Figure 3-5: Probability of error for the AWGN limited case, pulsed UWB
These simulations allow the comparison of the two signals in two
regimes:
* High SNR regime: In the case of OFDM UWB, as the SNR increases, $P_e$ tends
to a saturation value and cannot be reduced further. In pulsed UWB,
$P_e$ can be made arbitrarily small by increasing the SNR. For the
OFDM UWB symbol, as the SNR increases, the only relevant noise term is the
quantization noise, whose power asymptotes to a constant value of $\Delta^2/12$, with
$\Delta$ being the quantization step. As an FFT of 256 points is performed, the
demodulated symbol for each carrier contains noise that is a combination of 256
samples of quantization noise, with weights of equal amplitude but different
phases. The result can be assumed to be Gaussian. In this case, $P_e$
converges to the probability of error that corresponds to an SNR:
$$\mathrm{SNR} = \frac{12 P_s}{\Delta^2} = 3 P_s 2^{2b} \qquad (3.1)$$
where $P_s$ is the signal power. For an OFDM UWB signal in the high SNR
regime, each additional bit in the ADC provides a better saturation $P_e$. In
the case of pulsed UWB, a small number of samples is used for each bit. The
effect of AWGN can be understood as a change of the sign of a sample with respect
to the sign it should have. As the amplitude of the signal increases relative to the
amplitude of the noise, the probability of a sign change decreases monotonically
and an arbitrarily low $P_e$ is achieved. The behavior of OFDM
UWB signals can also be explained in terms of the clipping of these signals by
the ADC. Signals with a large peak-to-average ratio, such as OFDM, are more
vulnerable to clipping. In this case clipping causes inter-carrier interference that
increases the $P_e$.
* Low SNR regime: For the case of pulsed UWB, 1 or 2 bits are sufficient, since the
curves for these resolutions are close enough to the ideal case. Due to the saturation
effect observed previously, the curves of OFDM UWB for a low number of bits
are farther away from the ideal ones. A $P_e$ of $10^{-4}$ can still be achieved in the
OFDM case by using an ADC of at least 3 bits.
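The saturation SNR of equation (3.1) can be evaluated for each ADC resolution. The sketch below assumes that, at high input SNR, the signal power at the ADC input is approximately $\sigma_o^2$ from Table 3.1; this is an assumption made for illustration, not a statement from the thesis, and the names are hypothetical.

```python
import math

# RMS input level set by the AGC for each resolution (Table 3.1).
SIGMA_O = {2: 0.2850, 3: 0.2025, 4: 0.1425, 5: 0.1231}

def quantization_step(bits):
    """Delta = 2^(1-b) for an ADC input range of [-1, 1]."""
    return 2.0 ** (1 - bits)

def saturation_snr(bits, signal_power):
    """Signal power over quantization noise power, Delta^2 / 12, as in (3.1)."""
    return signal_power / (quantization_step(bits) ** 2 / 12.0)

def to_db(x):
    return 10.0 * math.log10(x)

for bits in sorted(SIGMA_O):
    p_s = SIGMA_O[bits] ** 2        # assumed: signal dominates the ADC input
    print(bits, "bits:", round(to_db(saturation_snr(bits, p_s)), 2), "dB")
```

Under this assumption the saturation SNR grows with every added bit, which matches the statement that each additional bit improves the saturation $P_e$.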
For the interference-limited case, a pure sinusoid is chosen as a replacement for a
modulated carrier with a finite data bandwidth. Thus, there are no abrupt changes
of phase over the duration of the representation of one information bit (in the
case of pulsed UWB) or over the duration of an OFDM symbol, including the
cyclic prefix (in the case of OFDM UWB). Its frequency is a uniform random variable
in the range from 0 to half the sampling rate. Its initial phase is an independent
uniform random variable from 0 to $2\pi$. The Monte Carlo simulation provides the
same precision as the simulations of the AWGN limited case. The simulation results
are shown in Figures 3-6 (OFDM signal) and 3-7 (pulsed UWB).
For SIR values greater than those represented in the figures, $P_e$ drops to zero. This
is because the interference is an amplitude-limited signal and its amplitude is small
enough that the samples of the signal do not change. There is a threshold effect
at SIR = -3 dB for pulsed UWB and at SIR = 11 dB for OFDM UWB. If the
signal modulation is more complex, each bit incorporates a higher number of samples
of the interference. The shortcomings of OFDM signals in the presence of strong
non-linearities may be mitigated by the use of coding techniques, although this
necessarily increases the complexity of the receiver. In all cases, while the curves for
1 and 2 bits are still slightly far from the ideal curve, those for 3 or 4 bits come close
enough to it.
[Figure 3-6 plot; x-axis: SIR (dB); curves: no quantization and several ADC resolutions]
Figure 3-6: Probability of error for the interference limited case, OFDM UWB
[Figure 3-7 plot; x-axis: SIR (dB); curves: no quantization, 1 bit, 2 bits, 3 bits, 4 bits]
Figure 3-7: Probability of error for the interference limited case, pulsed UWB
Two conclusions are derived from here. First, a BPSK signal is better behaved
in the presence of non-linearities. An OFDM signal requires a minimum number of
bits in the ADC, even in the best SNR conditions, to work with a low enough bit-error
rate, while a BPSK signal can achieve arbitrarily low BERs with a 1-bit ADC when
the SNR is high enough. Second, even if it is possible to argue that similar performance
is achievable with both types of signal, pulsed UWB allows the number of bits in
the ADC to be reduced and adapted to the channel quality, which reduces the power
consumption of both the ADC and the digital baseband when the SNR or the channel
quality is good enough. The better behavior of impulses in the presence of non-linearities,
which allows a more drastic trade-off in terms of complexity, was the main reason to
choose BPSK impulses over OFDM signals in our system. This work was presented in [91].
[Figure 3-8 plot; x-axis: Time (ns)]
Figure 3-8: Example of UWB BPSK baseband signal, before up-conversion
3.4 Choice of UWB Signal
This baseband has been designed for impulse UWB. Each bit of information is represented by the sign of one impulse (BPSK). The signal consists of a sequence of
500 MHz bandwidth pulses² that are up-converted to one of 14 channels (sub-bands)
of the bandwidth available in the 3.1-10.6 GHz band. The interval between two
consecutive pulses is $T_s = 10$ ns during the payload. Since each bit of information is
carried by a single pulse, the data rate is 100 Mbps. The data
packet structure will be described in a later section. Figure 3-8 shows an example of
the baseband UWB signal representing, using BPSK, a sequence of three bits. Figure
3-9 shows a 500 MHz pulse up-converted with a carrier of 5 GHz.
Figure 3-9: 500 MHz pulse with carrier 5 GHz. Courtesy of David Wentzloff
2 In this thesis, from here on, bandwidth refers to the -10 dB bandwidth.
3.5 Multipath
In this section, the channel model in which the UWB system should work is presented.
After that, the different algorithms involved in the compensation of the multipath and
ISI are developed.
3.5.1 Channel Model
The channel allocated to UWB communication signals is impaired by severe multipath. The IEEE 802.15.3a task group chose the multipath model presented in [3]. It is a
Saleh-Valenzuela [1] model with two modifications: a lognormal distribution is used
instead of a Rayleigh distribution for the multipath gain magnitude, and independent
fading is assumed for each cluster as well as each ray within the cluster. These two
changes fit the available channel measurements better but make the mathematical study
of the model more complicated. Four different types of channels were provided for
the transceiver simulations. Their characteristics are shown in Table 3.2. This table
includes the number of paths whose attenuation with respect to the strongest path is
smaller than 10 dB ($NP_{10\,\mathrm{dB}}$), and the average number of paths that contain
85% of the total energy ($NP(85\%)$). The multipath model consists of the following
discrete time impulse response:
$$h_i(t) = X_i \sum_{l=0}^{L} \sum_{k=0}^{K} \alpha_{k,l}^{i}\, \delta(t - T_l^{i} - \tau_{k,l}^{i}) \qquad (3.2)$$
Since we are using the model as it is, we will not delve into how this model is
generated. For more details, please refer to [3, 1, 92]. [3] provides a realization
of the model that tries to match important characteristics of the channel. Since it is
difficult to match all possible channel characteristics, the main characteristics of the
channel that are used to derive the model in [3] are: mean excess delay, RMS delay
spread, number of multipath components (defined as the number of multipath arrivals
that are within 10 dB of the peak multipath arrival), and power decay profile. Table
3.2 shows these characteristics for the four channel models that are provided for
testing.
[Figure 3-10 block diagram; recoverable block labels: Matched, Equalizer]
Figure 3-10: Procedure to compensate for multipath
The large bandwidth of the UWB signal makes it possible to separate the echoes that arrive
at the receiver with a delay separation larger than the duration of the UWB impulses
(≈ 2 ns), and to use this information to implement a Rake receiver that gathers
as much of the multipath energy as possible. Additionally, since the RMS delay of the channel
can be larger than the inter-symbol period (10 ns), equalization is needed to compensate
for the inter-symbol interference (ISI). The procedure to compensate for the channel
multipath consists of first estimating the channel impulse response, and then using
this information in both a Rake receiver (matched filter) and an equalizer, as shown
in Figure 3-10. In the following sections these aspects will be analyzed.
3.5.2 Data-Aided Channel Estimation
Channel estimation in UWB communications has been previously addressed in [59, 93,
94, 95] to assess the signal energy capture in Rake receivers as a function of the number
of fingers. In these papers, an isolated monocycle is transmitted through the channel
and the corresponding received waveform is recorded. The problem is to approximate
the actual channel with a channel with $L_c$ branches. The degree of matching depends
on $L_c$, and the minimum value of $L_c$ required for a good match establishes the number
of fingers that a Rake receiver must possess to efficiently exploit the channel diversity.
The approach in this receiver matches that of [47], whose authors lump
together the effect of the multiuser situation as additional additive white Gaussian
noise.
Table 3.2: Multipath Channel Models
Channel   Description     Mean Delay   RMS Delay   NP10dB   NP (85%)
CM1       LOS, 0-4 m      5.05 ns      5.28 ns     -        24
CM2       NLOS, 0-4 m     10.38 ns     8.03 ns     -        36.1
CM3       NLOS, 4-10 m    14.18 ns     14.28 ns    35       61.54
CM4       Extreme NLOS    -            25.00 ns    -        -
[Figure 3-11 plot; x-axis: Delay (ns)]
Figure 3-11: Example of the clusters in one instance of the channels in [3].
Figure 3-11 shows a subset of the echoes in one instance of the CM1 channel. There
are clusters that contain several echoes of different amplitude and sign in an interval
of duration smaller than 2 ns. As we are using impulses of 500 MHz bandwidth, it
is not possible to separate these echoes in the receiver. Due to the bandwidth of the
signal, the number of echoes that combine is small, which does not allow the application of a
Rayleigh or a Ricean model (both of which rely on the central limit theorem). Assuming there is
no inter-symbol interference, the equivalent low-pass signal received is:
$$h_{lp}(t) = b \sum_{l=0}^{L} \sum_{k=0}^{K} \alpha_{k,l}\, p(t - T_l - \tau_{k,l}) \qquad (3.3)$$
where $p(t)$ is the pulse shape. The objective is to estimate the equivalent low-pass
channel [50], after sampling:
$$h[n] = \sum_{i=0}^{L_c-1} a_i\, \delta[n - i] \qquad (3.4)$$
where $a_i$ is a complex number, and the channel has already been sampled at the
Nyquist rate. This expression assumes that the channel impulse response can be
reliably represented with $L_c$ consecutive samples, either because the channel impulse
response is shorter in duration than $L_c \cdot T_b$, where $T_b$ is the inverse of the sampling
frequency, or because the taps that arrive outside this interval have very small SNR
compared to the ones inside it. The only parameters are then the amplitudes, taking
into account that if no echo is found at a certain delay, its associated amplitude
$a_i$ is zero. The value of $L_c$ will be determined in the section on Rake receivers and
MLSE equalizers.
For the purpose of the Rake receiver, it is not necessary to separate the information
of the pulse shape p(t) from that of the multipath. Only the aggregate result of their
convolution, as indicated in (3.4) is required to implement the Rake receiver as an
approximation to the matched filter.
It is assumed that the channel coherence time is much longer than the data packet,
so that the channel impulse response does not change during its duration. If the
channel impulse response is estimated during the preamble, this information need
not be updated during the rest of the packet. During the preamble a known sequence
of signed impulses is sent. From each impulse sent we obtain a noisy snapshot of the
impulse response, assuming that the separation between consecutive impulses ensures
there is no inter-symbol interference. Let us define
$$\mathbf{h} = [h[0], h[1], \ldots, h[L_c-1]]^T \qquad (3.5)$$
$$\mathbf{w}[n] = [w[0], w[1], \ldots, w[L_c-1]]^T \qquad (3.6)$$
$$\mathbf{r}[n] = b[n]\,\mathbf{h} + \mathbf{w}[n] \qquad (3.7)$$
In these expressions $\mathbf{h}$ (channel impulse response), $\mathbf{w}[n]$ (AWGN) and $\mathbf{r}[n]$ (received snapshot of the channel impulse response) are all complex vectors. $b[n]$ is the
transmitted symbol and in general it could also be a complex number. Since we are
using BPSK, $b[n] = \pm A$. During the preamble, the values of $b[n]$ are the elements
of a known pseudorandom sequence and can be either 1 or -1. $\mathbf{w}[n]$ is stationary,
white and Gaussian, with an autocovariance matrix equal to $\sigma^2 \mathbf{I}_{L_c}$, with $\sigma$ being
the standard deviation of the Gaussian noise, and $\mathbf{I}_{L_c}$ the identity matrix of $L_c$ rows
and columns. Since $b[n]$ is known for each $\mathbf{r}[n]$, it is possible to create a sequence of
new random variables:
$$\mathbf{r}'[n] = b[n]\,\mathbf{r}[n] = \mathbf{h} + b[n]\,\mathbf{w}[n] = \mathbf{h} + \mathbf{w}_1[n] \qquad (3.8)$$
Because $b[n] = \pm 1$, $\mathbf{w}_1[n]$ is also stationary, white and Gaussian, with a covariance
matrix equal to $\sigma^2 \mathbf{I}_{L_c}$. The problem of estimating the channel impulse response is
then simply that of obtaining the mean of the random vector $\mathbf{r}'[n]$ from several realizations
($N_c$) of this vector. Each realization of this vector corresponds to the reception of one
impulse in the preamble. This is the least square error algorithm [96]. The
procedure for the channel estimation consists of:
1. Collect a set of $N_c$ received vectors $\mathbf{r}[n]$ with $n = 1, 2, \ldots, N_c$, associated with a
known sequence of bits $b[n]$.
2. For each $\mathbf{r}[n]$ obtain $\mathbf{r}'[n]$ using equation (3.8).
3. The channel impulse response is estimated as:
$$\hat{\mathbf{h}} = \sum_{n=1}^{N_c} \mathbf{r}'[n] \qquad (3.9)$$
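The three steps above can be sketched in a few lines. This is a minimal illustration, not the implemented hardware: the channel taps, noise level and preamble length are hypothetical, the preamble is a generic pseudorandom ±1 sequence rather than the Gold code used in the system, and the accumulated sum is divided by $N_c$ to obtain the sample mean (a positive scale factor does not change the sign decisions made later).

```python
import random

def estimate_channel(snapshots, preamble):
    """Average b[n] * r[n] over the preamble (steps 1 to 3 above)."""
    n_c = len(preamble)
    taps = len(snapshots[0])
    acc = [0j] * taps
    for b, r in zip(preamble, snapshots):
        for i in range(taps):
            acc[i] += b * r[i]          # remove the known sign, then accumulate
    return [a / n_c for a in acc]

def simulate_snapshots(h, preamble, sigma, rng):
    """One noisy snapshot r[n] = b[n] * h + w[n] per preamble impulse (no ISI)."""
    snapshots = []
    for b in preamble:
        snapshots.append([b * hi + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
                          for hi in h])
    return snapshots

rng = random.Random(0)
h = [0.9 + 0.1j, 0j, -0.4j, 0.2 + 0j, 0j]                # hypothetical channel taps
preamble = [rng.choice((-1, 1)) for _ in range(31)]      # hypothetical preamble, N_c = 31
h_hat = estimate_channel(simulate_snapshots(h, preamble, 0.3, rng), preamble)
print([round(abs(est - true), 3) for est, true in zip(h_hat, h)])
```

Each tap error shrinks roughly as $1/\sqrt{N_c}$, which is why a longer preamble tolerates a lower input SNR.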
The previous analysis assumes a linear receiver and that the impulses in the preamble are separated enough to ensure that there is no ISI. It is possible to enforce the
non-ISI constraint by separating the impulses in the preamble by a larger time interval than in the payload. Regarding the linearity, both the impact of the number
of bits of the ADC and its saturation due to an error in the automatic gain control
must be analyzed. These effects are modeled through simulation, and the results are
shown in Figures 3-12 and 3-13. These figures show the contours of the minimum SNR
at the input of the channel estimation required to obtain an SNR of 10 dB in the
estimation of the channel impulse response, as a function of the number of bits of the
ADC and the number of impulses $N_c$ used to obtain the channel impulse response
estimate. Figure 3-12 shows these curves when the gain provided by the front-end
ensures that the full range of the ADC is used without saturation. Using ADCs of
4 or 3 bits and integrating for $N_c > 10$ ensures a very good SNR in the estimation
even at very low input SNR. Figure 3-13 shows what happens when the gain of the system
is 6 dB larger than in Figure 3-12, causing saturation. In this case, although 4 bits
still allow the same behavior as before, there is a noticeable decrease in performance
for 3 bits. Nevertheless, for $N_c > 10$ the channel impulse response estimation is still reliable.
This relaxes the constraints on the automatic gain control (AGC): steps of
6 dB of gain in the front end should be enough for the channel estimation to work
properly. Apart from that, $N_c$ equal to the length of the Gold code is chosen for the
channel estimation, because it is a very conservative value that would tolerate even
worse performance of the AGC.
[Figure 3-12 plot; x-axis: Number of pulses integrated]
Figure 3-12: Minimum SNR at the input to achieve a 10 dB SNR in the channel
estimation as a function of the number of bits of the samples and the length of the
integration. No saturation
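As a reference point for these curves, the ideal averaging gain can be written down directly. The expression below ignores quantization and saturation (which the simulations include) and uses the shorthand $\mathrm{SNR}_{in}$ and $\mathrm{SNR}_{\hat{h}}$, introduced here only for this example. Averaging $N_c$ snapshots with independent noise of covariance $\sigma^2 \mathbf{I}_{L_c}$ reduces the noise variance to $\sigma^2/N_c$, so:

```latex
\mathrm{SNR}_{\hat{h}} = N_c \cdot \mathrm{SNR}_{in}
\quad\Longrightarrow\quad
\mathrm{SNR}_{\hat{h}}\,[\mathrm{dB}] = \mathrm{SNR}_{in}\,[\mathrm{dB}] + 10 \log_{10} N_c
```

For $N_c = 10$ the gain is 10 dB, so in this idealized case an input SNR of 0 dB would already give a 10 dB SNR in the estimate; the additional margin seen in the figures is attributable to the finite ADC resolution and to saturation.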
3.5.3 Rake Receiver
Multiuser detection [97, 98] is known to be the optimal solution even in a multipath
environment but, as its complexity increases exponentially with the number of users,
it is often impractical. The optimum receiver for detecting signals in a multipath
environment, when the observation noise is modeled as additive white Gaussian noise
(AWGN), is a matched filter or a correlation receiver, where the reference (template)
signal is the response of the transmission medium to a transmitted signal (the composite
of the channel and the transmitted signal). A Rake receiver resolves the components of
a received signal (arriving at different times) and combines them to provide diversity
in multipath environments [99]. A Rake receiver is a suboptimal solution for a multiuser
environment, since it models the interferers as AWGN as well. It is a good trade-off
between high performance and low complexity. In addition, it represents the building
block for other schemes performing multiuser interference cancellation.
[Figure 3-13 plot; x-axis: Number of pulses integrated]
Figure 3-13: Minimum SNR at the input to achieve a 10 dB SNR in the channel
estimation as a function of the number of bits of the samples and the length of the
integration. 6 dB saturation
It is known that as the spreading bandwidth of a signal increases, the number of
resolvable multipath components available also increases, making the signal amenable
to improvement by the use of a Rake receiver. In [100, 101, 60, 59, 62], the term all
Rake (ARake) receiver is used to describe the receiver with unlimited resources (taps
or correlators) and instant adaptability, so that it can, in principle, combine all of
the resolved multipath components, even if their number increases with the spreading
bandwidth. However, the number of multipath components that can be utilized in an
implemented Rake is limited by power consumption, design complexity and channel
estimation [49]. The opposite approach to the ARake is Selection Combining
(SC), whereby the received signal is selected from one of the $L_c$ available diversity
branches. Another well known approach is maximal-ratio combining (MRC)
[102]. In MRC, the received signals from all the diversity branches are weighted and
combined to maximize the instantaneous signal-to-noise ratio (SNR) at the combiner
output.
The fact that in a practical receiver not all the multipath components can be taken
into account has been addressed in several studies that use a reduced-complexity
multipath combining system that selects the $L$ best paths (from the $L_c$ available) and
then combines them based on a chosen criterion. Those receivers are known as selective
Rake (SRake) receivers. Selecting the "best" paths can be accomplished by selecting
the multipath components with the largest signal-to-noise ratio (SNR), corresponding
to those echoes with the smallest attenuation. A hybrid scheme in which $L$ out of $L_c$
components are selected and then combined using MRC has been developed for DS-CDMA signals in several of the publications cited in this section. These publications
analyze the performance of such a scheme under the assumption that the channel is
a slowly varying wide-sense stationary uncorrelated scattering (WSSUS) channel.
Figure 3-14: Functional diagram of a Rake receiver
A Rake receiver can be understood as an FIR filter with an impulse response:
$$h[n] = \sum_{i=0}^{L-1} a_i\, \delta(n - m_i) \qquad (3.10)$$
in which the number of taps $L$ is fixed, but both $a_i$ and $m_i$ are configurable. A
possible implementation is shown in Figure 3-14. Normally, out of the whole impulse
response, the $L$ most powerful components would be detected and the FIR would be
set accordingly. In our case, the Rake receiver is modeled with the following
expression:
$$h[n] = \sum_{i=0}^{L-1} a_i\, \delta(n - i) \qquad (3.11)$$
Figure 3-15 shows the functional diagram of an implementation of this filter. In
this case, there is a minimum number of taps with consecutive fixed delays, but the
amplitudes are programmable and individual taps may be turned off if the corresponding
weight $a_i$ is equal to zero (marked in Figure 3-15 as a switch). This model of Rake
assumes that the maximum length of the channel impulse response is equal to $L \cdot T_b$,
where $T_b = 2$ ns is the inverse of the sampling frequency. After the channel impulse
response is estimated (as shown in the previous section), each of the weights is compared
to a preprogrammed threshold, and only those taps with weights that exceed the
threshold are used in the Rake. Instead of having a fixed number of fingers, this
Rake uses every sample of the channel impulse response that meets a programmable
threshold $Th$. Either all the samples with the same absolute value are used simultaneously or none at all. For example, Figure 3-16(a) shows the impulse response of
a CM1 channel. Figure 3-16(b) corresponds to the result of sampling this channel
impulse response with 4-bit precision and using a Rake receiver of 6 fingers. Figure
3-16(c) shows the equivalent impulse response if we choose all those samples
that exceed a threshold equal to $Th = 1$ LSB. By using a threshold, the number of
fingers becomes a random variable that adapts to the channel impulse response. The block
that searches for the most powerful samples in the channel impulse response, required
in a conventional Rake receiver, is replaced by the comparison of the samples to the threshold
$Th$. This reduces the complexity of the total receiver.
Figure 3-15: Functional diagram of the Rake receiver that will be implemented in
this UWB system
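The difference between the threshold Rake and a conventional fixed-finger Rake can be illustrated with a small sketch. It assumes real-valued taps for brevity (the actual channel estimate is complex), treats a tap as selected when its magnitude is greater than or equal to the threshold, and uses hypothetical tap values; none of the names come from the thesis.

```python
def quantize_taps(taps, bits):
    """Express taps in [-1, 1) as signed integers in units of one LSB."""
    levels = 2 ** (bits - 1)
    return [max(-levels, min(levels - 1, round(t * levels))) for t in taps]

def threshold_rake(taps_lsb, threshold_lsb):
    """Keep every tap whose magnitude meets the threshold; switch the rest off."""
    return [t if abs(t) >= threshold_lsb else 0 for t in taps_lsb]

def best_l_rake(taps_lsb, fingers):
    """Conventional selective Rake: search for the strongest taps, keep `fingers`."""
    order = sorted(range(len(taps_lsb)), key=lambda i: abs(taps_lsb[i]), reverse=True)
    keep = set(order[:fingers])
    return [t if i in keep else 0 for i, t in enumerate(taps_lsb)]

estimate = [0.05, 0.62, -0.31, 0.0, 0.14, -0.12, 0.04, 0.26]   # hypothetical estimate
taps = quantize_taps(estimate, bits=4)
print("quantized taps   :", taps)
print("threshold, 1 LSB :", threshold_rake(taps, 1))
print("threshold, 2 LSB :", threshold_rake(taps, 2))
print("best 2 fingers   :", best_l_rake(taps, 2))
```

The threshold version needs only one comparison per tap, whereas the fixed-finger version needs a search over the whole estimate and must break ties between taps of equal magnitude.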
For simulation purposes, the transmitted signal can be written as follows:
s(t) = \sum_{j=-\infty}^{\infty} b_j p(t - jT_f)    (3.12)
This signal undergoes the channel impulse response h(t), which is assumed to have a
maximum duration of 25 samples, so the received signal can be modeled as
\mathbf{r}_j = b_j \mathbf{h} + \mathbf{w}    (3.13)
where \mathbf{r}_j is a complex vector holding the samples that contain any information
about bit b_j, \mathbf{h} represents the channel impulse response sampled at the Nyquist rate
and down-converted to the low pass equivalent of the signal, and \mathbf{w} is AWGN.
Figure 3-16: Modified Rake receiver. (a) CM1 channel impulse response; (b) channel after conventional 6-finger Rake; (c) channel after threshold Rake. Horizontal axis: time (samples).
In this case, the received signal after matched filtering is
r_j = \mathbf{h}^H \mathbf{r}_j = b_j \sum_{i=0}^{L_c-1} |h_i|^2 + \mathbf{h}^H \mathbf{w}    (3.14)
where the superscript H denotes the conjugate transpose. The signal to
noise ratio obtained is
\mathrm{SNR} = \frac{1}{N_0} \sum_{i=0}^{L_c-1} |h_i|^2    (3.15)
The value used for demodulation is obtained as follows:
d = \Re\{ \hat{\mathbf{h}}^H (\mathbf{h} b_j + \mathbf{w}) \}    (3.16)
If d > 0 the demodulated bit is a 1, and if d < 0 the demodulated bit is a 0. \hat{\mathbf{h}}
represents the estimated channel impulse response. This expression assumes that
there is no Multiple Access Interference (MAI), and that transmitter and receiver are
properly synchronized (both phase and delay). Taking into account that \mathbf{w} is AWGN
with variance N_0, it is possible to obtain the mean and variance of d and, due to the
Gaussianity of the noise, a straightforward derivation of the probability of error
as a function of the Eb/N0 at the input of the Rake receiver. Since
quantization both in the channel impulse response estimation and in the input signal
makes this structure difficult to analyze, simulation with the channels indicated in [3]
was used to characterize the performance of the receiver.
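For reference, the unquantized derivation mentioned above can be sketched as follows. The assumptions are mine and are not stated in this form in the text: the channel estimate is perfect (the estimated response equals \mathbf{h}), the bits map to b_j = +1 or -1, and each noise sample is circularly symmetric complex Gaussian with total variance N_0, so that its real part carries N_0/2.

```latex
% Sketch under the assumptions stated in the lead-in (perfect estimate, b_j = \pm 1).
E_h = \sum_{i=0}^{L_c-1} |h_i|^2, \qquad
d = b_j E_h + \Re\{\mathbf{h}^H \mathbf{w}\}, \qquad
\Re\{\mathbf{h}^H \mathbf{w}\} \sim \mathcal{N}\!\left(0, \tfrac{N_0 E_h}{2}\right)

% Error probability for equiprobable bits, with Q(x) the Gaussian tail function.
P_e = Q\!\left(\frac{E_h}{\sqrt{N_0 E_h / 2}}\right)
    = Q\!\left(\sqrt{\frac{2 E_h}{N_0}}\right)
```

With E_h read as the received energy per bit, this reduces to the usual BPSK result Q(\sqrt{2E_b/N_0}); a different noise normalization would change the factor of 2.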
Figure 3-17 shows the losses of this Rake receiver as a function of the normalized
threshold and the channel model with respect to the perfect Rake receiver. Since a
decrease of Th implies a larger number of paths is taken into account, Figure 3-17
shows that signal processing complexity may be traded off with quality of service. The
channel estimation was obtained with a precision of 4 bits. The maximum loss for any
channel is 6 dB as compared to the optimum ARake. This plot makes explicit a tradeoff between signal processing complexity (as the threshold increases, the complexity
of the receiver is smaller) with the quality of service (as the threshold increases, the
minimum SNR required to obtain a fixed performance increases).
3.5.4 MLSE Equalizer
The Viterbi based MLSE equalizer is used in this architecture to compensate for the
inter-symbol interference that occurs when the channel impulse response is longer
than the time between two consecutive pulses. It is possible to obtain the number of
states in an MLSE equalizer depending on the length of the impulse response of the
channel, Lmp [63]. The number of states required for a BPSK signal in the MLSE
equalizer is LMLSE with:
LMLSE = ...    (3.17)
Figure 3-17: Losses in the modified Rake receiver as a function of the normalized
threshold and the channel model.
The MLSE equalizer implemented will be able to cope with a predefined maximum
channel impulse response. It is possible to downscale the MLSE equalizer if the length
of the impulse response does not require the use of the entire number of states. LMLSE
can also be interpreted as the length of the channel impulse response considered in
the MLSE equalizer. LMLSE = 1 would indicate a channel impulse response shorter
than 10 ns or 5 samples. In this case, it is assumed that there is no ISI. LMLSE = 2
indicates a channel impulse response longer than 10 ns and shorter than 20 ns. Each
symbol affects the next one, and an MLSE of 4 states is required. LMLSE = 3 requires
an MLSE of 8 states and LMLSE = 4 requires an MLSE of 16 states.
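The mapping described in this paragraph can be written as a small helper. This is a sketch under assumptions: the ceiling rule and the function names are mine, inferred from the examples in the text (10 ns symbol period; 4, 8 and 16 states for LMLSE = 2, 3 and 4), and the behaviour at exact multiples of 10 ns is not specified in the text.

```python
import math

SYMBOL_PERIOD_NS = 10.0  # payload pulse spacing stated in the text


def mlse_length(channel_length_ns, symbol_period_ns=SYMBOL_PERIOD_NS):
    """Number of symbol periods spanned by the channel (assumed ceiling rule)."""
    return max(1, math.ceil(channel_length_ns / symbol_period_ns))


def mlse_states(l_mlse):
    """States for BPSK, following the 4/8/16 pattern quoted in the text."""
    return 2 ** l_mlse


# Example: a 25 ns channel spans three symbol periods and needs 8 states.
example_length = mlse_length(25.0)
example_states = mlse_states(example_length)
```

The doubling of the state count with every extra symbol period is what makes the choice of LMLSE a first-order area decision.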
Figure 3-18 shows the losses in SNR associated with the MLSE equalizer as a
function of parameter LMLSE, for the different channels that were introduced in Table
3.2. This figure assumes a Rake receiver with a threshold Th equal to 1 LSB. For both
channel models CM1 and CM2, even without using an MLSE equalizer (or using one
of 2 states), a maximum loss of 1 dB is obtained. Channel model CM3 will provide a
satisfactory performance with an MLSE equalizer of 4 states. Only CM4 requires higher
complexity than this. This figure was obtained using a channel impulse response
representation of 4 bits (real and imaginary parts) and a maximum channel impulse
response of 25 taps.
Since the complexity of the MLSE equalizer is exponential in the parameter LMLSE
[65], it is important to constrain it as much as possible. This is one of the structures
that will take a larger percentage of the area of the ASIC. For this reason, instead
of choosing an MLSE equalizer with LMLSE = 4, a complexity of
LMLSE = 3 is chosen, which, according to Figure 3-18, should be enough for the channel
model used in these simulations. Additional specifications that can already be chosen
are the maximum channel impulse response that will be considered (25 taps), and
the number of bits required for the channel impulse response representation (4 bits
for real and 4 bits for imaginary parts).
Figure 3-18: Losses associated with the parameter LMLSE in the Viterbi demodulator.
3.6 Choice of Packet Format
Each data packet consists of a preamble and payload as shown in Figure 3-19. During
the preamble duration, the receiver should detect the presence of a packet and achieve
coarse timing synchronization. The receiver also estimates the channel impulse
response. For that purpose, the preamble is comprised of a series of 500 MHz
bandwidth impulses in which the separation between every two consecutive impulses
allows estimating the channel impulse response without having to compensate for
inter-symbol interference (ISI). The preamble is, as in the previous transceiver, built
with a sequence of bits, each of them represented by 31 consecutive pulses modulated
in sign by a pseudorandom code (Gold code). The preamble contains 16 repetitions of
this Gold code. The duration of the interval between every two consecutive pulses in
the preamble is denoted Tp, which is chosen to be an integer multiple of the sampling
time in the receiver for convenience, so that Tp = Np·Tb = 60 ns with Np = 30
an integer. During the payload, the pulse repetition frequency (PRF) increases in order
to achieve the required data rate. The time interval between every two consecutive
impulses is Nb·Tb = 10 ns. In the payload every bit of information is represented
by only one impulse, so that the processing gain comes only from the duty cycle of
the signal, which is still less than 100%. The amplitude of the impulses does not change
from the preamble to the payload, although this means that the average power during
the preamble is smaller than the average power during the payload. This also means
that the SNR changes from the preamble to the payload. Since the processing gain
also changes (number of impulses that represent a bit of information, duty cycle),
Figure 3-19: Design of the data packet. Courtesy of V. Sze.
(Diagram labels: preamble with pulse spacing >10 ns, covering State 1 acquisition, State 2 channel estimation and State 3 end of preamble detection; payload with 10 ns pulse spacing, State 4.)
their specification is also independent.
The last repetition of the preamble has the sign of the impulses reversed and serves
to detect the end of the preamble. This preamble ensures a 90% packet detection rate
at the sensitivity level. The total duration of the preamble is 30 µs, comprised of
16 repetitions of the Gold code. The payload has a length of 5 kbits and the pulse
repetition period is 10 ns.
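The packet timing quoted in this section can be checked with a few lines of arithmetic. The constants below come from the text; the variable names are mine, "5 kbits" is assumed to mean 5000 bits, and the two decibel figures are simple ratios rather than values stated in the thesis.

```python
import math

T_B_NS = 2.0            # sampling period Tb
N_P = 30                # preamble pulse spacing in samples (Tp = 60 ns)
N_B = 5                 # payload pulse spacing in samples (10 ns)
GOLD_CODE_LENGTH = 31   # pulses per preamble bit
CODE_REPETITIONS = 16   # Gold code repetitions in the preamble
PAYLOAD_BITS = 5000     # "5 kbits", assumed to mean 5000 bits

preamble_ns = CODE_REPETITIONS * GOLD_CODE_LENGTH * N_P * T_B_NS
payload_ns = PAYLOAD_BITS * N_B * T_B_NS
payload_rate_mbps = 1e3 / (N_B * T_B_NS)       # one bit per payload pulse

# Gain from combining 31 pulses per preamble bit, and the drop in average
# power caused by the six times longer pulse spacing in the preamble.
code_gain_db = 10 * math.log10(GOLD_CODE_LENGTH)
avg_power_drop_db = 10 * math.log10(N_P / N_B)
```

The preamble comes out at 29.76 µs, consistent with the roughly 30 µs quoted, and the payload pulse spacing gives the 100 Mbps raw data rate.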
3.7 Baseband Functionality
The algorithms described in the previous sections of this chapter are fully implemented
in a custom ASIC. Some of them are also implemented in an FPGA in a discrete
prototype, depending on the resources available in the FPGA. For both the complete
receiver and the simplified version to be implemented in the discrete prototype, the
receiver works as a state machine with four states. These four states, which are related
to the packet detection and demodulation, are the following:
1. Packet detection (PD) - The incoming signal is correlated with the pseudorandom sequence in order to detect a peak of correlation. The output of the
correlation is compared with a threshold. While the threshold is not met, the
receiver keeps looking for the peak of correlation. Once the threshold is met,
the packet presence is declared, and the position of the maximum is assumed
to contain the echo with the most energy of the multipath.
2. Channel estimation (CE) - The largest echo is assigned to the third tap of the
estimated channel impulse response. Another repetition of the pseudorandom
sequence in the preamble is used to estimate the channel impulse response.
Since the PRF in the preamble is lower than that of the payload, as long as the
channel impulse response is shorter than 60 ns, the receiver is able to obtain a
clear estimation of the channel impulse response. This channel impulse response
is truncated and quantized in the digital domain to reduce it to a number of bits
that is feasible for implementation. At this stage, the system will choose the
effective impulse response that is used for the next stages. There is a chance
that PD happened with one of the last repetitions of the pseudorandom code. If
that happens, and CE is performed on the final repetition of the pseudorandom
code (where the signs are reversed), this situation is detected, and taken into
account. The next state is skipped and the receiver jumps directly to payload.
3. End of preamble detection (EPD) - The receiver in this stage looks for the end of the
preamble, which is marked by the negative of the pseudorandom sequence. During this stage
the maximum of the correlation is also compared to a second threshold in order to ensure
that the signal has not been lost and that the system has not locked to a false alarm.
This threshold is lower than the one used in PD, since it only needs to reject
false alarms with a high probability.
4. Payload (PL) - Once the beginning of the payload is detected, the system
changes to payload demodulation. The matched filter is programmed with the
full impulse response estimated in CE and adjusted according to the parameters of the receiver. Normal communication systems have variable-length payloads. This usually involves including more complex information in the preamble
(the length of the packet, usually with some protection for this information). In
our case the payload length is controlled by the program, and does not change. Once payload
demodulation starts, it repeats for a fixed number of cycles during which both
frequency and delay tracking are activated as required.
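The four states listed above can be sketched as a small transition function. This is an illustrative model only: the flag names are hypothetical, and two transitions are my assumptions rather than statements from the text, namely falling back to PD when the EPD check fails and returning to PD once the payload has been demodulated.

```python
from enum import Enum


class State(Enum):
    PD = 1    # packet detection
    CE = 2    # channel estimation
    EPD = 3   # end of preamble detection
    PL = 4    # payload demodulation


def next_state(state, *, peak_found=False, ce_done=False,
               ce_on_last_repetition=False, end_of_preamble=False,
               lock_lost=False, payload_done=False):
    """Return the next receiver state given hypothetical status flags."""
    if state is State.PD:
        return State.CE if peak_found else State.PD
    if state is State.CE:
        if not ce_done:
            return State.CE
        # CE on the sign-reversed final repetition skips EPD (from the text).
        return State.PL if ce_on_last_repetition else State.EPD
    if state is State.EPD:
        if lock_lost:
            return State.PD          # assumption: restart the search
        return State.PL if end_of_preamble else State.EPD
    return State.PD if payload_done else State.PL  # assumption: rearm after PL


# Example walk through a normally received packet.
trace = [State.PD]
trace.append(next_state(trace[-1], peak_found=True))
trace.append(next_state(trace[-1], ce_done=True))
trace.append(next_state(trace[-1], end_of_preamble=True))
```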
Figure 3-20 shows the block diagram of the digital baseband necessary to robustly
demodulate the UWB signal including the algorithms that were chosen in previous
sections of this chapter. In this block diagram, the programmable features used to
adapt the receiver to the channel characteristics are indicated. The energy spread
caused by the multipath can be compensated using a Rake receiver [63] that provides
up to 25 taps. The inter-symbol interference due to multipath is addressed with an
MLSE demodulator [50] with 8 states. These elements require an estimation of the
impulse response that may be obtained during the packet synchronization using the
correlators. The channel impulse response is estimated with a maximum precision
of 4 bits, using the information of 31 impulses. The input samples to the digital
baseband must have 4 bits, although fewer bits are required if the SNR is high or the
multipath is not severe. Timing synchronization is achieved with both a Delay Locked
Loop (DLL) and a Phase Locked Loop (PLL) [40]. Only the automatic gain control
is fed back to the RF front-end. The whole timing synchronization is performed in
the digital domain.
Figure 3-20: Required functionality of the digital baseband.
3.8 Non-idealities Model
Although this usage is not completely consistent, I will consider the front-end of the UWB system
to be all the functional blocks that process the signal from the antenna output to the
analog-to-digital converter. Strictly speaking, the RF front end includes only those
elements that process the signal while it is on the carrier. This includes the LNA and all the
programmable gain stages that are used before the down-conversion of the signal.
After the down-conversion, which should remove the carrier from the signal and yield
the in-phase and quadrature components of the low pass equivalent of the signal, the
signal is further filtered and amplified, but this is performed in the baseband
domain.
A direct conversion transceiver can be modeled as shown in Figure 3-21. The
signal coming from the antenna is processed by an RF front-end. The RF front-end
function is to separate the desired signal from other signals present in the environment
in different bands. It is usually comprised of a low noise amplifier that helps to reduce
the final system noise figure and provides a minimum gain to the system. In addition,
some filtering is provided that, together with the baseband filters included in the
baseband part, helps to isolate the band of interest where the signal is being received
from the rest of the signals present. The specification of this filtering depends on
the coexistence expected with the other signals with which UWB is
competing in the spectrum.
The RF Front-end must meet the specifications on noise figure and linearity over
a bandwidth larger than 500 MHz. The impulse responses of both the antenna and
the RF front-end add to that of the channel. Since the receiver will only be able to
deal with a maximum channel impulse response set by design, the RF front-end must
be designed to meet this constraint.
Figure 3-21: A simplified block diagram of a direct conversion front end.
For each of the blocks shown in Figure 3-21 we will obtain an equivalent model
that will allow us to model its limitations. The following elements are taken into account:
* Antenna: The antenna is assumed to be a linear filter of the energy that it
receives from the environment. For that reason, all that is considered is an impulse response function ha(t) with a Fourier transform Ha(jω). It
will also be assumed that it has a noise temperature of Ta in kelvin.
* Low Noise Amplifier: The LNA has a linear impulse response hLNA(t), with a
Fourier transform HLNA(jω) that has a maximum gain of GLNA. It adds a noise
figure FLNA and it includes a non-linearity that will be modeled as a memoryless
non-linearity, characterized by a third order intermodulation component
a3,LNA and a fifth order intermodulation component a5,LNA.
* Mixer: The down-converter is a multiplier that multiplies the input signal with
a cosine function to obtain the in-phase component of the equivalent low-pass
(ELP) of the incoming signal and with a sine function to obtain the quadrature
component. Its non-idealities are mostly modeled as a non-linearity characterized by the coefficients a3,m and a5,m, a noise figure Fm, and an I-Q unbalance.
For the I-Q unbalance the in-phase branch is the temporal reference, and the
quadrature branch has both a difference in gain of Am and a difference in phase
of Arme.
* Baseband filter: This filter also provides some programmable gain, Gf, and
an impulse response hf(t). Finally, it is also assumed to introduce a non-linearity characterized by the coefficients a3,f and a5,f.
Although the two
baseband filters will have slightly different impulse responses, we will not address
this problem specifically in this thesis. It also has a noise figure Ff.
* ADC: The ADC is modeled as a perfect analog-to-digital converter.
Although it is possible that the in-phase and quadrature ADCs have slightly different
gains and different integral and differential non-linearities, this will not be
specifically addressed in this thesis.
This system, for simulation purposes is simplified to the following characteristics:
* AWGN: All the noise is added at the input of the system as consistent with any
linear system analysis. The noise figure will be only specified for this part.
* ELP: Everything is referred to the equivalent low pass filter model of the signal.
* Channel impulse response: The total impulse response of the front-end, being
the convolution of the impulse responses added by each of the components of
the system. The specification of this will be a maximum length of the channel
impulse response given as a number of samples.
* Non-linearity: Modeled as a single non-linearity, applied only to the part of the band that falls directly
into the baseband after going through the mixer. For this reason the
analysis of this part may be oversimplified.
* Unbalance between the in-phase and quadrature components: Given in this case
only by a difference in gain of Am and a difference in phase of Arme.
The model including these characteristics is fully developed in Appendix A. For the
purposes of the specification of the system, in order to obtain a maximum packet
error rate (PER) of 10% a minimum SNR of 4 dB is required. This SNR is defined
as that of the payload of the data packet. During the preamble, since the duty cycle
of the signal is smaller, the SNR decreases. But this is compensated by the use of a
known sequence of 31 impulses during the preamble.
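As an illustration of the I-Q unbalance term in the simplified model, the following sketch applies a gain and phase mismatch to one complex low-pass sample, keeping the in-phase branch as the reference. It is a common textbook form and not necessarily the model developed in Appendix A: the function name, the parameter names and the sign convention of the phase error are assumptions.

```python
import math


def apply_iq_imbalance(x, gain_q=1.0, phase_err_rad=0.0):
    """Distort one complex baseband sample with a quadrature-branch mismatch.

    The in-phase branch is the reference. The quadrature branch is assumed to
    use a local oscillator with relative gain gain_q and phase error
    phase_err_rad, which leaks part of the I component into Q.
    """
    i_out = x.real
    q_out = gain_q * (x.imag * math.cos(phase_err_rad)
                      - x.real * math.sin(phase_err_rad))
    return complex(i_out, q_out)


# Example: 10% gain mismatch and 5 degrees of phase error (hypothetical values).
distorted = apply_iq_imbalance(1 + 1j, gain_q=1.1,
                               phase_err_rad=math.radians(5.0))
```

With gain_q = 1 and a zero phase error the sample is returned unchanged, which is a quick sanity check of the convention used here.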
3.9 Link Budget
For the path loss model I will follow the model that was presented in [3]. Another related
study is [103]. A free space path loss model is adopted for propagation.
This model is based on the narrowband path loss calculations (known as the Friis
transmission formula), and justification for its use was provided in [104]:
L_p = 20 \log_{10}\left( \frac{4 \pi d f_c}{c} \right)    (3.18)
where d is the distance between transmitter and receiver, c = 3 × 10^8 m/s and f_c is the geometric center frequency of the waveform:
f_c = \sqrt{f_{min} f_{max}}    (3.19)
f_min and f_max are the -10 dB edges of the waveform spectrum. The effect of multipath
in UWB signals is already made explicit by the use of the channel impulse responses
indicated in [3].
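A short numerical sketch of the free space path loss model follows. The function names and the example band edges (3.1 GHz and 3.6 GHz) are hypothetical, and the distance is passed explicitly, as in the usual form of the Friis formula.

```python
import math

C = 3.0e8  # speed of light in m/s, as used in the text


def geometric_center_frequency(f_min_hz, f_max_hz):
    """Geometric mean of the -10 dB band edges."""
    return math.sqrt(f_min_hz * f_max_hz)


def free_space_path_loss_db(distance_m, f_c_hz):
    """Free space path loss in dB at the given distance and frequency."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * f_c_hz / C)


# Example: a hypothetical 500 MHz band at the bottom of the FCC range.
f_c = geometric_center_frequency(3.1e9, 3.6e9)
loss_10m_db = free_space_path_loss_db(10.0, f_c)
loss_1m_db = free_space_path_loss_db(1.0, f_c)
```

Under this model every tenfold increase in distance adds 20 dB of loss, regardless of the center frequency.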
It is not possible to generate a causal impulse that perfectly fits the spectrum mask
given by the FCC. For that reason, whatever power spectral density is generated, its
maximum must be fitted to the maximum allowed by the FCC. As a consequence, part
of the maximum total power that the UWB system would be allowed to use will be
lost, as shown in Figure 3-22. These losses are relative to a signal that
makes perfect use of the available band (a sequence of sinc pulses).
Figure 3-22: Explanation of the losses due to shape of the pulse.
This loss can be proved to be:
L_{\text{pulse shape}} = 10 \log_{10}\left( \frac{2\pi \cdot BW}{\int_{-\infty}^{\infty} |P_0(j\omega)|^2 \, d\omega} \right)    (3.20)
where BW is the bandwidth desired for the signal and Po(jw) is the Fourier transform
of the pulse used for the BPSK (po(t)) such that its maximum (in frequency) is
equal to 1. If we perform this for a Gaussian pulse (or an approximation to the
Gaussian pulse) we obtain Lpulse shape = 2.33 dB. The full development of these results
is shown in Appendix A. For the power spectral density of a PPM signal, refer to
[105, 106, 107, 108].
Taking this into account, and applying the formula that relates the system noise
figure to the propagation loss for different 500 MHz bands in the FCC compliant
band, we find that to ensure a sensitivity of -81 dBm (signal power received at
10 m distance), the maximum noise figure of the receiver when it is set to provide the
maximum gain is 5 dB. Figure 3-23 shows the minimum received power at 10 m as
a function of the center frequency of the UWB signal. Figure 3-24 shows the range
of the automatic gain control required for each center frequency taking into account
a minimum distance of 30 cm and a maximum distance of 10 m. Finally, Figure
3-25 shows the maximum noise figure allowed in the system depending on the center
frequency.
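The received power at 10 m can be estimated from numbers already given in this chapter. The sketch below is a simplified budget under my own assumptions (0 dBi antennas and no implementation losses beyond the pulse shape loss); the thesis figures may include further terms, and the function names are hypothetical.

```python
import math

C = 3.0e8                          # speed of light in m/s
EIRP_DENSITY_DBM_PER_MHZ = -41.3   # FCC limit quoted in the abstract
BANDWIDTH_MHZ = 500.0              # signal bandwidth
PULSE_SHAPE_LOSS_DB = 2.33         # value quoted for a Gaussian pulse


def max_tx_power_dbm():
    """Total transmit power allowed once the pulse shape loss is removed."""
    return (EIRP_DENSITY_DBM_PER_MHZ
            + 10.0 * math.log10(BANDWIDTH_MHZ)
            - PULSE_SHAPE_LOSS_DB)


def received_power_dbm(distance_m, f_c_hz):
    """Received power under free space loss with 0 dBi antennas (assumed)."""
    path_loss_db = 20.0 * math.log10(4.0 * math.pi * distance_m * f_c_hz / C)
    return max_tx_power_dbm() - path_loss_db


# Example: 10 m link at a 4 GHz center frequency.
p_rx_4ghz = received_power_dbm(10.0, 4.0e9)
```

For a center frequency near 4 GHz this simplified budget lands close to the -81 dBm sensitivity quoted above, and the received power falls as the center frequency increases.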
3.10 Summary
The objective of this chapter was to specify a UWB system that transmits a raw
data-rate of 100 Mbps at 10 m distance using a bandwidth of 500 MHz in the FCC
compliant UWB band (from 3.1 GHz to 10.6 GHz). It has been established in this
chapter that a homodyne architecture is better suited for ultra-wideband signals and
that impulse UWB offers the possibility of scaling down the complexity of the receiver
Figure 3-23: Minimum received power as a function of the center frequency at 10 m.
Figure 3-24: Range of the AGC.
Figure 3-25: Maximum noise figure of the receiver.
when the SNR and the channel impulse response are good. A data packet is defined
that is comprised of a preamble and a payload. The preamble is composed of 16
repetitions of a Gold code of 31 bits, in which every two consecutive impulses are
separated by an interval of 60 ns. The Gold code is used during the detection of the
data packet and the 16 repetitions ensure a time to achieve packet acquisition of 30
µs. The separation between impulses allows the estimation of the channel impulse
response with reduced or no ISI. The channel impulse response is estimated using
the information obtained by receiving a sequence of 31 impulses. Each tap of the
channel impulse response is represented with a complex number in which both real
and imaginary parts have 4-bit precision. The sensitivity of this system is -81 dBm
with a noise figure of 5 dB.
Chapter 4
FPGA Implementation
As part of the process of developing a complete UWB system, a complete prototype
was built in the Digital Circuits and Systems group with off-the-shelf components.
This prototype was used to validate some of the theoretical claims of the system in
real-time conditions. As the components of the system are designed and fabricated,
they can also be individually substituted into the prototype to verify overall system
functionality. Due to the flexibility of the prototype, its applicability is not restricted
to only impulse UWB signals, and other kinds of modulations can be tested.
This transceiver is the result of the work of a group of students of the Digital
Circuits and Systems group: The dedicated pulse generator of this transceiver was
designed by David Wentzloff. The RF front-end of this transceiver was designed by
Fred Lee. The digital baseband processor, comprising both off-the-shelf and custom
designed boards, was designed by Kyle Gilpin. The software interface between this
set of boards and a PC was designed by Nathan Ackerman. Nathan Ackerman also
provided an application interface to be able to send data packets through the wireless
link provided. The digital baseband implemented in the digital baseband processor
was designed by Vivienne Sze and myself.
4.1 Architecture of the Discrete Platform
The primary purpose of the UWB development platform is to allow rapid prototyping
and performance characterization of a UWB communication channel. Also, the UWB
development platform aims to provide testing of logic designs before ASIC fabrication.
In order to accomplish this, the UWB development platform must natively contain all
components essential for transmission and reception of data over UWB. Furthermore,
the UWB development platform must be modular to allow replacement of modules
without loss of functionality for testing and characterization.
Figure 4-1 shows the block diagram of the discrete prototype. It can be divided
into three distinct sections: the transmitter, the receiver, and the ADC and baseband
processing. The baseband UWB signal is generated using either a programmable
arbitrary waveform generator (AWG) or a dedicated pulse generator built using off-the-shelf components. The signal is then up-converted to a center frequency that may
Figure 4-1: Block diagram of the discrete prototype.
be selected using the programmable oscillator. The link between the transmitter and
receiver can be made through wireless transmission using various antennas [109, 110]
and spatial configurations to emulate a wide range of channels. The transmitter and
receiver may otherwise be directly connected through a cable with a variable attenuator to emulate an ideal channel. The receiver replicates a direct conversion receiver
front-end. After this, the signal is sampled by a dual ADC. The sampling frequency
of this ADC can go up to 1 GSPS, but for most of the testing purposes it was kept
down to only 500 MSPS. The output of this ADC may be either processed in real-time
using a digital baseband implemented in an FPGA, or buffered and processed offline
using Matlab. With this approach, virtually any baseband algorithm not requiring
real-time control of the front-end may be tested. This includes acquisition and fine
tracking, channel estimation, interferer rejection, and demodulation. Feedback loops
such as automatic gain control require real-time sampling, and therefore cannot be
tested using this acquisition board.
The characteristics of the blocks of this discrete platform are summarized in the
following subsections.
4.1.1 Transmitter
The transmitter up-converts the baseband signal to an arbitrary center frequency by
direct multiplication of the baseband signal with a sinusoid, as shown in Figure 4-1.
Figure 4-2: Discrete prototype transmitter. Courtesy of N. Ackerman.
There are two choices regarding the generation of the baseband signal: the signal may
be generated using a dedicated impulse generator or an arbitrary waveform generator
(AWG).
The dedicated impulse generator can generate BPSK impulses with an up-converted
bandwidth of 500 MHz. All logic functions were implemented using commercially
available Emitter-Coupled Logic (ECL) components. This transmitter may generate
impulses every 20 ns, achieving a pulse repetition frequency (PRF) of 50 MHz. At
each interval of 20 ns, the transmitter may arbitrarily transmit a positive impulse, a
negative impulse or no impulse at all. This last feature is included because during the
data packet preamble the pulse repetition frequency is lower than that used during
the payload. The interface between the transmitter and a computer is implemented
using a board by Opal Kelly. This board contains a USB 2.0 interface that allows fast communication with the PC, and a Field Programmable Gate Array (FPGA) that allows
interfacing with the transmitter board. This FPGA has enough local memory to store
one packet of data while it is being transmitted. Figure 4-2 shows a photograph of
the transmitter board, the FPGA board and the oscillator board when they are used
in this configuration.
The other option to generate the baseband signal is to use an AWG. The Tektronix
AWG710 allows generating any signal that can be represented with up to 8 bits of
precision at a maximum data rate of 4 GSPS. It contains a memory that would
allow storing up to 4 ms of data sampled at this rate, allowing for a large number
of data packets. Using an AWG enables a large amount of flexibility in the shape
of the pulses transmitted, modulation scheme and duration of transmission. For
example, although this work focuses on pulse-based systems, OFDM can also be
synthesized as long as the equivalent low pass signal has no imaginary part. The
samples for the AWG are generated using a PC and downloaded to the instrument.
Various non-idealities may be added to the signal prior to generating the samples
such as non-linearity. In-band interferers such as 802.11a or random tones may also
Figure 4-3: Discrete Prototype receiver. Courtesy of Fred S. Lee.
be added. The AWG is useful for implementing one-way communication with greater
flexibility than the impulse generator, but is limited in how fast new data can be
downloaded to the instrument. This platform is also flexible enough to generate
various waveforms within a bandwidth of 500 MHz, allowing the comparison between
different modulation schemes.
4.1.2 Front-end
The RF front-end is built entirely using discrete components. Its realization is shown
in Figure 4-3. As shown in Figure 4-1, the received signal is amplified by two cascaded
LNAs, then split and applied to two identical passive mixers performing I/Q direct
conversion. The 90° phase shift in the local oscillator (LO) is implemented by fixed,
unequal delays in the LO transmission lines to each mixer. This method of phase
shifting provides quadrature tones at 5.355 GHz, but also allows for tuning of the
I/Q error simply by tuning the RF center frequency. Tunable phase error is desirable
in the prototype for testing the robustness of the digital baseband. It is possible to
replace these transmission lines by a wideband 90° phase shifter. After frequency
down-conversion, the baseband signals are filtered and amplified with an adjustable
gain before being digitized.
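The dependence of the I/Q phase error on the center frequency can be sketched as follows. This assumes ideal, dispersion-free delay lines whose delay difference is exactly a quarter period at 5.355 GHz; the actual difference could be longer by whole LO periods, which would make the error more sensitive to frequency. The function names are hypothetical.

```python
DESIGN_FREQ_HZ = 5.355e9  # frequency at which the tones are in quadrature


def quarter_period_delay_s(f_hz=DESIGN_FREQ_HZ):
    """Delay difference that gives a 90 degree shift at f_hz (assumed minimal)."""
    return 1.0 / (4.0 * f_hz)


def lo_phase_shift_deg(f_hz, delay_s):
    """Phase shift produced by a fixed delay at LO frequency f_hz."""
    return 360.0 * f_hz * delay_s


def iq_phase_error_deg(f_hz, delay_s):
    """Deviation from ideal quadrature at LO frequency f_hz."""
    return lo_phase_shift_deg(f_hz, delay_s) - 90.0


delay = quarter_period_delay_s()
error_at_5ghz = iq_phase_error_deg(5.0e9, delay)
```

Under these assumptions, moving the LO from 5.355 GHz down to 5 GHz introduces roughly 6 degrees of quadrature error, which illustrates how tuning the center frequency tunes the I/Q error.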
4.1.3 Receiver
At the receiver end there are again two possible options for processing. First, it is
possible to sample the baseband I and Q signals from the front-end with a dual-channel
8-bit 500 Msamples/s ADC board that interfaces to a PC directly through the PCI
bus. Alternatively, it is possible to use a system of boards that includes the ADC and several
FPGAs that allow implementing both a real-time baseband to demodulate the data
packet and a USB 2.0 interface to send the received packets to a PC.
Figure 4-4: Boards related to the ADC and baseband of the discrete prototype.
Courtesy of N. Ackerman
Figure 4-4 shows the different boards, some custom and some off-the-shelf, required to implement the ADC, the digital baseband and the required interface. Hardware
located on top of one another in the figure indicates electrical connections. The details of the design of this platform can be found in [111]. These boards provide a
high speed dual ADC capable of sampling two independent input signals at 1 GSPS
with 8-bit precision (the high speed Atmel ADC - AT84AD001), and a Virtex2Pro
VP30-6 FPGA that will be used to implement the digital baseband of the UWB
system. Implementing the digital baseband in an FPGA provides
considerable flexibility to test in real time the impact of different baseband
architectures in very little time.
4.1.4 Protocol
The current platform implements a one-directional wireless link. In order to be able to
provide measures of packet drop rates and probability of error, an API is implemented
that allows parsing data packets into a format that may be correctly interpreted by
the transmitter boards. On the receiver side, once a data packet has been received
in the digital baseband board, its content is buffered and sent through the USB 2.0
interface. If a UWB signal is detected, the signal processor will retrieve the data bits
from the UWB signal and send them to the module for transportation to the PC.
At the same time, the module responsible for receiving desired data to transmit will
receive incoming transmission requests from the PC and send the appropriate data
to the discrete pulse generator for creation of a UWB signal. The integrity of the
data packet is checked in the receiver PC and an acknowledgement signal is sent back
through the intranet (wired) to the transmitter computer.
4.2 Application in the Digital Baseband Design
4.2.1 Limitations of the Digital Platform
The main limitation of the digital baseband implemented in this backend is the maximum number of gates available. The FPGA used is a Xilinx Virtex2Pro VP30-6, with
1 million gates. How these gates are used depends on the architecture chosen, and a
direct translation of the Verilog generated for the chip may not be the most efficient
application of the resources in the FPGA. For the development of the digital baseband, less than half of this is available, in order to avoid severe routing problems and
to use part of the FPGA to debug the baseband. The components required to monitor the baseband inside the FPGA are automatically added when using Chipscope
software by Xilinx.
In a real wireless system, every clock and oscillator in the transmitter is generated
from the same reference (carrier generator and 100 MHz digital clock controlling the
pulse repetition frequency), and the local oscillator and the sampling clock (500 MHz)
in the receiver are generated from another reference. The differences between the
carrier generator in the transmitter and the local oscillator in the receiver lead to
a difference in phase when the signal is demodulated. That is usually corrected in the
receiver with a phase-locked loop (PLL) subsystem. The errors in frequency of the
100 MHz digital clock in the transmitter and the 500 MHz digital clock in the receiver
translate into a drift of the incoming pulses generated in the transmitter with respect to
the sampling instants in the receiver. This is corrected in the receiver using a
delay-locked loop (DLL). In a real wireless system where there is only one timing
reference in the transmitter and another timing reference in the receiver, the errors
that are corrected by the PLL and the DLL are correlated. It is possible to take
advantage of this either by using the information of the PLL (more precise) to refine
the DLL, or by omitting the DLL and extrapolating the corrections it would generate
from the corrections that the PLL is generating. In our system, since there are four
independent timing references, the errors corrected by the DLL and the PLL are
independent, and neither correction can be derived from the other: phase and delay
must be handled independently.
The local oscillators used for the carrier in the transmitter and the carrier in the
receiver are very stable. Their frequency is generated with a precision better than 2
ppm. The change of phase between the transmitter and the receiver caused by this
difference is negligible for the duration of a data packet and no PLL is required. Only
the initial phase needs to be corrected, and that is compensated by the matched filter.
The 100 MHz and the 500 MHz clocks are not as stable as the carrier references. For
that reason the change in delay over the duration of the packet is large enough to make
the use of a DLL necessary.
4.2.2 Specifications and Interfaces
The received signal is comprised of data packets like the one described in
the previous chapter, but with some different parameters. First, during the preamble,
the incoming pulses are separated not by 60 ns but by only 40 ns. This is less than what
[Figure 4-5 plot: loss curves for channel models CM1, CM2, CM3 and CM4; x-axis: Assumed Channel Length (samples)]
Figure 4-5: Losses due to misrepresentation of the channel impulse response in the
discrete prototype.
is needed to properly combat every possible multipath situation, but it allows a very
convenient partition of the architecture. In addition, it is necessary to create a simple
infrastructure that allows the reception of data at a raw data rate of 100 Mbits/s (time
between consecutive impulses in the payload equal to 10 ns) or 50 Mbits/s (time
between consecutive impulses in the payload equal to 20 ns). The reason for this is
that currently there are two options for the transmitter. On one hand, the arbitrary
waveform generator (AWG) can be used for the discrete prototype; with that
equipment it is possible to generate packets at 100 Mbps, but it is not possible to
dynamically change the packet that is sent wirelessly. On the other hand, the custom
pulse generator board may be used instead; it generates packets at 50 Mbps, but offers
dynamic control of the packet that is transmitted.
The total number of gates available on the FPGA for actual implementation of the
circuit is lower than that required to fully implement and test the transceiver that will
be developed in chapter 5. For that reason, a simpler version, with less functionality, is
implemented in this digital baseband. The changes are a lower number of correlations
in parallel (20 as compared to 150), a 5-tap partial Rake (as compared to the 25-tap
Rake presented in the next chapter), and the omission of both the Viterbi-like MLSE
and the automatic gain control. The impact of these adjustments on the final
performance of the system is shown in Figure 4-5.
The interface between the dual ADC and the digital baseband is comprised of
a vector of four complex signals in which both the real and imaginary parts are
represented with four bits. Although the ADCs are able to provide 8-bit precision
in every case, only the 4 most significant bits are used as input to the baseband,
both to reduce the percentage of the total FPGA required and to make it work under
conditions similar to those of the final ASIC. The vectors are properly
Figure 4-6: Block diagram of the discrete prototype baseband.
synchronized to a 125 MHz clock that is also the main clock of the digital baseband.
The output of the baseband is comprised of a vector of four demodulated bits in
parallel, a valid data signal, and a 25 MHz clock synchronized to the data. The valid
data signal goes high to indicate that the data packet is ready to be read and stays
high while the vector of demodulated bits represents valid data. This signal may be
used as an interrupt when interfacing with a computer that is to read the received
data packets.
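The reduction from 8-bit ADC samples to the 4 most significant bits described above can be sketched as follows. This is a minimal illustration, assuming the ADC samples are two's-complement integers; the function names `keep_msbs` and `pack_vector` are hypothetical and do not come from the prototype.

```python
def keep_msbs(sample: int, in_bits: int = 8, out_bits: int = 4) -> int:
    """Keep the out_bits most significant bits of a two's-complement sample."""
    lo, hi = -(1 << (in_bits - 1)), (1 << (in_bits - 1)) - 1
    if not lo <= sample <= hi:
        raise ValueError("sample does not fit in in_bits")
    # Arithmetic right shift drops the LSBs (rounds toward minus infinity).
    return sample >> (in_bits - out_bits)


def pack_vector(samples):
    """Truncate a vector of four complex samples given as (I, Q) integer pairs."""
    if len(samples) != 4:
        raise ValueError("the interface carries four complex samples per clock")
    return [(keep_msbs(i), keep_msbs(q)) for i, q in samples]


if __name__ == "__main__":
    print(pack_vector([(127, -128), (0, 15), (16, -16), (-1, 1)]))
```

Plain truncation is used here because it needs no extra logic; whether the actual design rounds or truncates is not stated in the text.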
4.2.3 Architecture of the Baseband
Figure 4-6 shows the block diagram implemented in the system. The samples are
provided by the ADC as vectors of four consecutive complex samples, properly aligned
with the rising edge of a 125 MHz clock. The FPGA allows some parts to work at
125 MHz, but performing all the operations at this frequency
would require special care in the routing of the different circuits inside the FPGA.
At this frequency, only the retiming block and a serial-to-parallel conversion of the
incoming data with 5x parallelization are performed. After this operation, the
unit of processing is a vector of 20 chronologically ordered samples, and the clock
necessary to process them is only 25 MHz, obtained internally from the 125 MHz
clock. All the mathematical operations are implemented at 25 MHz, which allows
this part to be designed with standard automatic place and route. For that
reason there is a fast clock domain and a slow clock domain, as in the previous
prototype.
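The 5x serial-to-parallel conversion can be modelled behaviourally as below. This is only a software sketch of the data movement (four samples per 125 MHz cycle in, twenty samples per 25 MHz cycle out); the function name `serial_to_parallel` is an assumption, and clocking and latching details are not modelled.

```python
def serial_to_parallel(fast_vectors, factor=5):
    """Group `factor` consecutive fast-clock vectors into one slow-clock vector."""
    slow_vectors, register, count = [], [], 0
    for vec in fast_vectors:
        register.extend(vec)          # shift the new samples into the register
        count += 1
        if count == factor:           # one slow-clock period has elapsed
            slow_vectors.append(register)
            register, count = [], 0
    return slow_vectors


if __name__ == "__main__":
    # Ten fast-clock vectors of four samples, numbered chronologically.
    fast = [[4 * n + k for k in range(4)] for n in range(10)]
    for vec in serial_to_parallel(fast):
        print(vec)
```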
A clean interface between the high speed clock domain (running at 125 MHz)
and the slow speed clock domain (running at 25 MHz) is needed. The signals going
[Figure 4-7 waveforms: clk25, Clk125, Counter1[2:0], Counter2_fast, Inputs[15:0], tmp[79:0]]
Figure 4-7: Control Signals for the Serial to Parallel Register.
to the slow speed clock domain should be latched into this clock domain after they
have been ordered in a vector, using for that the control signals indicated in Figure
4-7. This figure shows that the rising edge used to latch the data into the slow clock
domain is safely separated from the rising edge of the high speed clock domain that
presents the data at the output of the serial to parallel register right before. The
decision data, including the control to the retiming block, is latched to the 125 MHz
clock domain at least two cycles after it became stable as outputs of the slow clock
domain.
Correlators
The FPGA size limits the number of parallel correlators that it is possible to implement.
The transmitter board allows transmitting impulses at integer multiples of 10 ns.
Because of this, the separation between consecutive impulses for the duration
of the preamble is chosen to be an integer multiple of the time between two
consecutive impulses in the payload. The architecture used for the correlators is
shown in Figure 4-8.
The basic correlator unit in this case is comprised of five parallel correlators that
are not time shared, so that each of them accumulates only one correlation at any
instant during coarse acquisition. In this diagram, the variables w[i] have several uses,
as do the correlators themselves. During coarse acquisition the correlators accumulate
the correlation of the incoming signal with the pseudorandom
sequence. For that reason the value of the variables w[i] is common to all the
correlators and contains the bits of the pseudorandom sequence.
The outputs of the correlators, c[i], are not added together in groups of 5. The
unit of correlators comprises 5 correlators because the distance between
two consecutive impulses in the payload is 10 ns, which is the same interval as
five samples at 500 Msamples per second. During the channel estimation state, only
the top five correlators are working. After a full correlation of the pseudorandom
sequence we obtain an estimate of five taps of the channel impulse response. Since
they are the five taps closest to the position of the maximum, they are also expected
to contain most of the multipath energy. Those taps are estimated
with nine bits of precision. After state 2, the outputs of the 5 correlators of the first
group of correlators are used (by conjugating them) to obtain a matched filter of only
five taps. This matched filter is used during states 3 and 4 (payload), and then discarded. It
Figure 4-8: Block diagram of the basic structure for the correlators and matched
filter.
is assumed that the channel for each packet is white, but the coherence time is larger
than the duration of the packet.
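The step of turning the five estimated channel taps into a matched filter by conjugation can be illustrated with a short sketch. It assumes the five received samples are already aligned with the five taps and ignores noise and fixed-point precision; the names `matched_filter_taps` and `rake_output` are hypothetical.

```python
def matched_filter_taps(channel_estimate):
    """Matched filter taps are the complex conjugates of the estimated channel taps."""
    return [complex(h).conjugate() for h in channel_estimate]


def rake_output(samples, taps):
    """Combine the aligned samples of one impulse with the matched filter taps."""
    return sum(w * x for w, x in zip(taps, samples))


if __name__ == "__main__":
    h = [1 + 1j, 0.5j, -1 + 0j, 0.25 + 0j, 0j]        # illustrative 5-tap channel
    taps = matched_filter_taps(h)
    for bit in (+1, -1):                               # BPSK impulse through the channel
        received = [bit * tap for tap in h]
        decision = rake_output(received, taps).real
        print(bit, decision)
```

With a perfect estimate the output is the transmitted sign scaled by the total energy of the five taps, so the sign of the real part recovers the bit.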
Timing Synchronization
As indicated in a previous section, the transmitter carrier generator and the receiver
local oscillator are very well tuned, eliminating the need for a phase-locked loop (PLL)
during the packet demodulation, since the total phase change for the duration of a 16
kbit packet is negligible. On the other hand, the clocks controlling the sampling rate
of the ADC and the pulse repetition frequency do not possess the same stability. For
that reason, it is necessary to implement a delay-locked loop (DLL). It follows
the same scheme that was already used in the first prototype developed in chapter 2.
The only relevant change with respect to the previous version is the input. During
the preamble, the input to the DLL is given by the energy accumulated in the previous
and next taps (correlators 2 and 4 in Figure 4-8) while they were correlated with
the matched filter. During the payload demodulation, since each bit of information
is encoded using only one impulse, the sum of the energy of those same taps over 32
cycles of the 25 MHz clock is accumulated. This takes into account the delay of one
impulse out of every four when working at 100 Mbps, or one out of every two for a
50 Mbps data rate.
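A DLL driven by the energy in the previous and next taps can be sketched as an early-late comparison. The text only specifies the inputs of the loop; the decision rule, the `margin` parameter and the sign convention below are assumptions made for illustration, not the scheme of chapter 2.

```python
def dll_update(prev_tap_outputs, next_tap_outputs, margin=0.0):
    """Return a one-sample timing correction from accumulated tap energies.

    +1 moves the sampling window toward the next tap, -1 toward the previous
    tap, and 0 leaves the timing unchanged (assumed convention).
    """
    e_prev = sum(abs(x) ** 2 for x in prev_tap_outputs)
    e_next = sum(abs(x) ** 2 for x in next_tap_outputs)
    if e_next > e_prev + margin:
        return +1
    if e_prev > e_next + margin:
        return -1
    return 0


if __name__ == "__main__":
    # 32 accumulated outputs per tap, as in one payload update interval.
    print(dll_update([0.2 + 0.1j] * 32, [0.6 - 0.3j] * 32))
```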
The one-sample granularity required in this tracking loop is obtained using a
block similar to the one designed for the prototype developed in chapter 2. In this
case, the first operation does not combine four consecutive samples uniformly,
since there is a partial Rake of five taps, and since the separation between
consecutive data impulses is equal to five or ten samples. It is therefore necessary
to provide the input samples to the serial-to-parallel block as a chronologically ordered
[Figure 4-9 diagram: inputs IN0 to IN3, MUX, outputs Out0 to Out3, control input ControlPos]
Figure 4-9: Block diagram of the retiming block.
vector. For that reason, after some samples are selectively delayed, a programmable
connection matrix is used as shown in Figure 4-9.
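The combination of selective delays and a connection matrix can be modelled as a sliding window over two consecutive input vectors. This is a behavioural sketch only: the name `control_pos` echoes the ControlPos signal in Figure 4-9, but its encoding as a delay in samples is an assumption.

```python
def retime(vectors, control_pos, width=4):
    """Delay a stream of `width`-sample vectors by control_pos samples.

    Each output vector stays chronologically ordered: it is built from the
    last control_pos samples of the previous vector followed by the first
    samples of the current one.
    """
    if not 0 <= control_pos < width:
        raise ValueError("control_pos must be in [0, width)")
    out, prev = [], [0] * width           # delayed samples start at zero
    for cur in vectors:
        window = prev + list(cur)         # 2 * width samples in time order
        start = width - control_pos
        out.append(window[start:start + width])
        prev = list(cur)
    return out


if __name__ == "__main__":
    stream = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
    for shift in range(4):
        print(shift, retime(stream, shift))
```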
4.2.4 State Machine
During the packet detection and demodulation process, this baseband will go through
the following states:
1. Packet detection - In each cycle of the system, 20 correlations are calculated and
their results are compared to the threshold. If any of them meets the threshold,
coarse acquisition is finished. The transition to the next state consists of aligning
the next received impulse with the third correlator in Figure 4-6.
2. Channel Estimation - In the next iteration, the first five taps of the channel
impulse response are calculated. This is done by performing a further correlation
with the incoming signal. The results are stored with a precision of four bits to
perform the correlation. The sign of the center tap is compared to the one
stored during coarse acquisition. By doing this, we ensure that we notice
the end of the preamble if we come too close to it. If the signs differ, it
indicates that the end of the preamble has been reached, and that the channel
estimate is also reversed in sign.
3. End of Preamble Detection - The same operation is repeated, but now
the final 5 samples are taken into account and the channel estimated in the
previous state is used to maximize the SNR. We are looking for a change in phase of
180 degrees.
4. Payload - Here the correlators are not working as such, since no previous
result is stored to compute the next ones.
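The four states listed above can be summarised as a small transition function. This is a sketch under assumptions: the flag names are hypothetical, the direct jump from channel estimation to payload on a sign change is an interpretation of item 2, and the return to packet detection after the payload is assumed rather than stated.

```python
from enum import Enum


class State(Enum):
    PACKET_DETECTION = 1
    CHANNEL_ESTIMATION = 2
    END_OF_PREAMBLE = 3
    PAYLOAD = 4


def next_state(state, threshold_met=False, sign_flipped=False, payload_done=False):
    """Return the state for the next iteration given the current decisions."""
    if state is State.PACKET_DETECTION:
        return State.CHANNEL_ESTIMATION if threshold_met else state
    if state is State.CHANNEL_ESTIMATION:
        # A sign change here means the preamble ended during estimation (assumed).
        return State.PAYLOAD if sign_flipped else State.END_OF_PREAMBLE
    if state is State.END_OF_PREAMBLE:
        # Keep looking for the 180 degree phase change that ends the preamble.
        return State.PAYLOAD if sign_flipped else state
    # Payload: assumed to return to packet detection once the packet is read.
    return State.PACKET_DETECTION if payload_done else state


if __name__ == "__main__":
    s = State.PACKET_DETECTION
    for flags in ({"threshold_met": True}, {}, {"sign_flipped": True}, {"payload_done": True}):
        s = next_state(s, **flags)
        print(s.name)
```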
[Figure 4-10 plots: measured preamble amplitude versus time (µs), two panels]
Figure 4-10: Part of the preamble of a data packet as measured in the discrete
prototype, without (above) and with (below) an interference.
4.2.5 Results
Figure 4-10 shows an example of a part of the preamble of the data packet as measured
in this discrete prototype using an FCC compliant signal centered at 5.355 GHz, both
in a wireless link without interference and in one affected by an interferer with an
SIR of -11 dB in the preamble. The receiver achieved packet synchronization in the
presence of the interferer, and the length of the channel impulse response was measured
to be below the inter-pulse interval of the payload. Packets of 10000 bits were
perfectly demodulated without ISI.
This discrete prototype was used to demonstrate a 100 Mbps data rate using the
AWG with data packets of 32 kbits payload. Using the transmitter set-up at 50 Mbps,
the API was used to transmit a continuous data stream comprised of a sequence of
JPEG images.
4.3 Application for Testing Multitone-FSK
The flexibility provided by this system allows a wide range of tests. This
section presents the first test of a communication scheme that has been proved to be
optimal in [4].
4.3.1 Signal Definition
In frequency shift keying (FSK) systems, different symbols are represented by
sinusoids with different frequencies. For multitone FSK, the symbols are combinations of
multiple sinusoids with different frequencies in the band. If we have a set of M mutually
orthogonal frequencies over the allocated bandwidth, every possible combination
of these tones is a possible symbol to use. For Q-tone FSK, there are $\binom{M}{Q}$ possible
Q-tone combinations from M tones. Let S denote the complete set of symbols, and
$S_m$ a symbol in the set, i.e., $S_m \in S$. Let $T_s$ be the duration of the symbol. Then,
each symbol can be represented as [4]:
$$x(t) = \sum_{k \in S_m} e^{i 2 \pi f_k t}, \qquad 0 < t < T_s \tag{4.1}$$
[Figure 4-11 plot: example MFSK waveform, amplitude versus time]
Figure 4-11: Example of MFSK signal. Courtesy of Cheng Luo
Some of these values must be specified as a function of the characteristics of the
channel and the bandwidth available. Let $(\Delta f)_c$ denote the coherence
bandwidth of the channel. Thus, two sinusoids with frequency separation greater
than $(\Delta f)_c$ are affected differently by the channel. In the same way we define
$(\Delta t)_c$ as the time spread of the channel. If we send two impulses with delays closer
than $(\Delta t)_c$, those impulses will be blurred together by the channel impulse response
if they are at the same frequency. The inter-symbol interval is chosen to be larger than
$(\Delta t)_c$ to ensure that there is no ISI. The separation in frequency of the available
tones is also made larger than $(\Delta f)_c$, to ensure that the fading of each tone can be
assumed to be independent of the fading of all the other tones. If the alphabet built in
this way is large enough, the probability of collisions between symbols of different
users is very low. Parameters such as the duty cycle or the separation between frames
may be optimized to reduce it further. The resulting signal is shown in Figure 4-11.
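The symbol alphabet and the symbol waveform of equation (4.1) can be sketched numerically. The example below assumes baseband tones at integer multiples of a common spacing, which makes them orthogonal over the symbol when the spacing equals the inverse of the symbol duration; the function names and all numeric values are illustrative assumptions.

```python
import cmath
import math
from itertools import combinations


def symbol_alphabet(num_tones, tones_per_symbol):
    """All Q-tone combinations out of M tones, as tuples of tone indices."""
    return list(combinations(range(num_tones), tones_per_symbol))


def symbol_samples(tone_indices, tone_spacing_hz, symbol_time_s, num_samples):
    """Sample x(t) = sum over k in S_m of exp(i 2 pi f_k t), with f_k = k * spacing."""
    dt = symbol_time_s / num_samples
    return [
        sum(cmath.exp(2j * math.pi * k * tone_spacing_hz * n * dt) for k in tone_indices)
        for n in range(num_samples)
    ]


if __name__ == "__main__":
    M, Q = 8, 2
    alphabet = symbol_alphabet(M, Q)
    print(len(alphabet), math.comb(M, Q))      # alphabet size equals "M choose Q"
    T = 1e-6                                   # assumed symbol duration
    x = symbol_samples(alphabet[0], 1 / T, T, 64)
    print(abs(x[0]))                           # all tones add in phase at t = 0
```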
The impulses can be made longer, with reduced amplitude, in order to comply
with any regulatory limit imposed on the system, since the integration performed in
the receiver depends only on the duration of the impulse and can be made as long as
needed. The integration is non-coherent, so its gain is noticeably lower than that of
coherent integration.
[Figure 4-12 diagram: the input signal feeds a bank of parallel filters, one per tone frequency]
Figure 4-12: Architecture for demodulation of Multitone FSK [4].
4.3.2 Receiver Architecture
The receiver uses a bank of matched filters with their central frequencies tuned to
each of the M tones [4]. The simplicity of the receiver is shown in Figure 4-12. After
the matched filter, the power in each of the outputs is averaged over an interval on the
order of the duration of the symbol, and the output of each channel is compared
to a threshold chosen, using a Neyman-Pearson curve, to minimize false detections
when the tone is not being used.
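A filter bank followed by power detection and a threshold can be sketched as below. It is a simplified noise-free model: each "matched filter" is a correlation with one tone over the whole symbol, tones are assumed to sit at integer multiples of the inverse symbol duration, and the threshold is a fixed illustrative value rather than one derived from a Neyman-Pearson curve. The function names are hypothetical.

```python
import cmath
import math


def tone_powers(signal, num_tones):
    """Power at the output of a correlator matched to each of the tones."""
    n_samples = len(signal)
    powers = []
    for k in range(num_tones):
        corr = sum(
            x * cmath.exp(-2j * math.pi * k * n / n_samples)
            for n, x in enumerate(signal)
        ) / n_samples
        powers.append(abs(corr) ** 2)
    return powers


def detect_tones(signal, num_tones, threshold):
    """Indices of the tones whose detected power exceeds the threshold."""
    return {k for k, p in enumerate(tone_powers(signal, num_tones)) if p > threshold}


if __name__ == "__main__":
    n_samples, sent = 64, (1, 5)
    signal = [
        sum(cmath.exp(2j * math.pi * k * n / n_samples) for k in sent)
        for n in range(n_samples)
    ]
    print(sorted(detect_tones(signal, num_tones=8, threshold=0.5)))
```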
It has been shown [4] that this modulation scheme, with this conceptually
simple receiver, achieves a capacity close to the wideband capacity limit
independently of the channel multipath and for typical channel parameters. This receiver
remains robust when the separation between the tones is smaller than $(\Delta f)_c$ (in which
case the fading and channel impulse response for these two tones are correlated). When
this happens, multitone FSK still approaches the wideband capacity, but the
convergence is slower. No study quantifying this slower convergence
has been done yet [4].
4.4 Conclusions
In this chapter we have presented a flexible platform for testing of UWB transceivers.
This platform is flexible enough to test non-impulse signals (such as multitone-FSK),
while at the same time providing the right functionality to test UWB systems under real-time conditions. This system has been used to implement a smaller scale
FCC compliant UWB system than the receiver developed in chapter 5. Using this
prototype, wireless data links of 100 Mbps and 50 Mbps were demonstrated.
Chapter 5
ASIC Implementation of a Baseband for FCC Compliant UWB
This chapter presents the architecture developed to implement a robust, FCC compliant
UWB transceiver. The architecture implements
the functionality that was presented in chapter 3 in order to obtain a robust receiver.
This chapter includes both a description of the architecture and the results of the
measurements required for testing the functionality and the power dissipation.
5.1 Functionality of the Chip
This circuit is part of a system designed in our group for FCC compliant UWB
communication in the 3.1 to 10.6 GHz band. Figure 5-1 shows the architecture of
the system. It comprises a homodyne receiver; the front-end [112], the
transmitter [113], and the ADC [114] have been designed by other students in the
Digital Circuits and Systems Group.
The UWB data packet is comprised of a preamble and a payload, both of fixed
length. The preamble is comprised of impulses that are transmitted with a separation
interval of 60 ns. It consists of 13 repetitions of a Gold code of length 31 that
will be used by the digital baseband to detect the presence of the packet and to
achieve packet acquisition. The larger time interval between impulses in the preamble
allows estimating the channel impulse response with a lower impact of inter-symbol
interference. The payload, on the other hand, is comprised of a sequence of BPSK
impulses of 500 MHz bandwidth transmitted with a pulse repetition frequency of 100
MHz. Since each bit of information is represented by one impulse, this system allows
transmitting a raw data rate of 100 Mbits/s. The packet length is 5 Kbits.
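The packet parameters above fix a few derived quantities that are used repeatedly in the rest of the chapter. The short sketch below works them out; the constant and function names are illustrative, and only values stated in the text (500 MS/s sampling, Gold code of length 31, 13 repetitions, 60 ns and 10 ns impulse spacing) are used as inputs.

```python
SAMPLE_PERIOD_NS = 2            # 500 MS/s sampling
GOLD_CODE_LENGTH = 31           # impulses per code period
PREAMBLE_REPETITIONS = 13
PREAMBLE_SPACING_NS = 60
PAYLOAD_SPACING_NS = 10         # 100 MHz pulse repetition frequency


def preamble_impulses():
    """Total number of impulses in the preamble."""
    return PREAMBLE_REPETITIONS * GOLD_CODE_LENGTH


def preamble_duration_ns():
    """Duration of the preamble in nanoseconds."""
    return preamble_impulses() * PREAMBLE_SPACING_NS


def samples_between_impulses(spacing_ns):
    """Number of ADC samples between two consecutive impulses."""
    return spacing_ns // SAMPLE_PERIOD_NS


def raw_rate_mbps(spacing_ns):
    """Raw data rate when each impulse carries one bit."""
    return 1000 / spacing_ns


if __name__ == "__main__":
    print(preamble_impulses(), "impulses,", preamble_duration_ns() / 1000, "us of preamble")
    print(samples_between_impulses(PREAMBLE_SPACING_NS), "samples per preamble impulse")
    print(samples_between_impulses(PAYLOAD_SPACING_NS), "samples per payload impulse")
    print(raw_rate_mbps(PAYLOAD_SPACING_NS), "Mbits/s")
```

The preamble therefore lasts 24.18 µs, with 30 samples between preamble impulses and 5 samples between payload impulses.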
The digital baseband performs the detection and demodulation of the data packets. It is implemented based on the assumption that it receives samples from a direct
conversion receiver with synchronized ADCs in the in-phase and quadrature channels.
It was determined in chapter 3 that the signal processing required in the demodulation
[Figure 5-1 diagram: transmitter chain (bit to transmit, pulse generator) and receiver chain]
Figure 5-1: Block diagram of the full transceiver.
of the signal depends on the channel quality. Because of this, the digital baseband is
designed with two main objectives. First, it is able to estimate the channel
impulse response during the preamble of the data packet, and to use this information
in both a partial RAKE and an MLSE for channel compensation. Second, it provides
the possibility of activating and deactivating different subsystems of the baseband,
making it possible to scale the complexity (and the energy dissipation) of the signal
processing applied to demodulate the signal, allowing it to adapt to the channel quality.
The digital baseband has also been designed to minimize the number of signals that
are fed back to the analog front-end and the ADC. Only the automatic gain control is
sent to the front-end, and the whole synchronization is performed autonomously in the
digital baseband. Figure 5-2 presents the block diagram implemented in this system.
5.2 Interfaces and Clock Structure
This baseband was designed to work with a dual, scalable successive approximation
register (SAR) ADC [77] sampling at 500 MS/s. Each of the two ADCs is comprised
of 6 parallel ADCs, each of them sampling at 1/6th of the total sampling frequency.
Following the approach already explored in the first prototype, the outputs of this ADC
are presented in parallel to the baseband, as chronologically ordered vectors of six
consecutive samples aligned with an 83.3 MHz clock (1/6th of the sampling rate) that
is used to latch this information into the baseband. Each sample represents a complex
number that has a real part and an imaginary part, both with 5 bits.
The baseband is divided in two clock domains: a high-speed clock domain running
at 83.3 MHz, and a slow-speed clock domain running at 16.6 MHz. The 83.3 MHz
clock is provided externally to the baseband chip by the ADC. The 16.6 MHz clock is
generated internally by dividing the high-speed clock frequency by 5. The high-speed
clock domain contains a retiming block for delay tracking, a serial-to-parallel
[Figure 5-2 diagram labels: Threshold, Number of states, To RF Front-end, Programmable Features]
Figure 5-2: Block diagram of the functionality of the chip implemented.
converter that performs a 5x parallelization, and the main control of the baseband.
The low-speed clock domain implements all the signal processing, taking advantage
of the longer period of the clock. It also determines the next state in the receiver.
The demodulated bits are presented through a parallel interface, six bits at a time
at 16.6 MHz, together with a signal that indicates when the bits presented are valid
data bits and that may be used as an interrupt.
The baseband can be programmed using a serial port. This procedure loads a shift
register with several numeric values that are required for the normal function of the
chip (threshold, filter taps), and a sequence of flags that indicates which parts of the
system should be activated or not. This vector serves to adapt the complexity of the
receiver to the signal detected. As different subsystems are presented in this chapter,
the programmability options available for each of them will be introduced. In the
following sections we will provide details of the different subblocks of the receiver.
We will start with the functionality implemented in the high-speed clock domain.
Taking into account the complexity of the slow-speed clock domain, a section will be
dedicated to the correlators, the timing synchronization blocks, the channel analysis
sub-system and the MLSE equalizer.
The baseband system goes through a state machine of four states during the
detection and demodulation of a data packet, as shown in Figure 5-3. The duration
of each of these states varies with the state itself. We define an "iteration" as the
time necessary for the receiver to gather enough data to decide whether
to jump from one state to another, or to perform a correction in the phase or the delay
of the received signal. During an iteration, no control adjustments of any kind are made
in the receiver. An iteration comprises an integer number of
cycles of the 16.66 MHz clock. The number of cycles for states 1 (PD), 2 (CE), and
[Figure 5-3 diagram states: Packet Detection, Channel Estimation, Preamble End, Payload Detection]
Figure 5-3: State machine implemented in the system.
3 (EPD) are 36, 31 and 31 respectively. The durations of states CE and EPD are tied
to the length of the Gold code. The duration of the iteration in state PD is linked
to the number of different delays that the system tests in each iteration. Since
the receiver performs 150 correlations in each iteration, it tests a time interval
equivalent to 5 cycles of the 16.66 MHz clock, and this must be added to the duration of
the iteration so that in the next iteration the next 150 possible delays are tested. In
state 4 (PL), taking into account that the duration of the payload is fixed, and that
once in this state the baseband will provide a sequence of 5k bits regardless of the input,
the iteration represents the interval at which both the delay-locked loop and the Costas
loop perform an update. Each iteration of state 4 takes 32 cycles of the 16.66 MHz clock,
and there are a total of 32 iterations for the total payload. This number of cycles
was chosen because it is an integer power of 2 that still allows tracking a total
frequency difference of 100 ppm between transmitter and receiver oscillators.
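The iteration lengths quoted above follow from simple arithmetic, which the sketch below reproduces. It assumes that one slow clock cycle carries 30 samples (one preamble impulse interval) and, for the drift estimate, that the 100 ppm figure is applied directly to the sampling clock; the function names are illustrative.

```python
from fractions import Fraction

SAMPLES_PER_SLOW_CYCLE = 30     # 5 x 6 samples latched per 16.66 MHz cycle
GOLD_CODE_LENGTH = 31           # one preamble impulse per slow cycle


def packet_detection_cycles(num_correlations):
    """Iteration length in state PD: one code period plus the delays covered."""
    return GOLD_CODE_LENGTH + num_correlations // SAMPLES_PER_SLOW_CYCLE


def drift_in_samples(num_cycles, offset_ppm):
    """Timing drift, in sample periods, accumulated over num_cycles slow cycles."""
    return Fraction(num_cycles * SAMPLES_PER_SLOW_CYCLE * offset_ppm, 10**6)


if __name__ == "__main__":
    print(packet_detection_cycles(150), packet_detection_cycles(30))
    print(float(drift_in_samples(32, 100)), "samples of drift per payload iteration")
    print(float(drift_in_samples(32 * 32, 100)), "samples of drift over 32 iterations")
```

Under these assumptions the drift within one 32-cycle iteration is about a tenth of a sample, so a one-sample correction per update is enough to follow it.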
5.3 High-speed Clock Domain
Figure 5-4 shows the block diagram of the high-speed clock domain. This block
receives the input from the dual ADC as a vector of six consecutive samples aligned
with the rising edge of an 83.3 MHz clock. For the purposes of processing the signal
in the receiver, this block performs a 5-way parallelization, providing vectors of
30 consecutive samples. In this way, it is possible to reduce the clock frequency in
most of the receiver to 16.66 MHz, simplifying the timing design of the most complex
mathematical operations, which are performed only at this reduced frequency.
Although 16.66 MHz is a low enough frequency for the process chosen, the critical
path in some of the subsystems (specifically the MLSE) will be close to the period
of this clock. The parallelization is performed in the serial-to-parallel block, and at
the output the samples are latched using the slow clock in order to provide a clean
interface with the slow-speed clock domain.
[Figure 5-4 diagram: ADC inputs of 6 complex samples (I = 5 bits, Q = 5 bits, 60 bits) clocked at 83.3 MHz; sample groups 0-5 through 24-29 delivered to the correlators at 16.6 MHz; control signals exchanged with the slow-speed clock domain: InitFrame, StateDemod (2 bits), Counter (6 bits), NextState (2 bits), Res, SwapperIn (3 bits), DelayIn, WaitIn (5 bits)]
Figure 5-4: Block diagram of the high speed clock domain.
The high-speed clock domain also includes the main control of the system. Following
the paradigm already established in previous designs, the control of the system
is distributed, with individual subsystems receiving from the main control the 16.66
MHz clock, the state in which the receiver is operating and a signal (InitFrame) that
indicates when an "iteration" starts. Decisions related to changes of state are taken
in subsystems present in the slow-speed clock domain. Only the result, expressed as
[Figure 5-5 diagram: inputs IN0 to IN5, MUX, outputs Out0 to Out5, control input ControlPos]
Figure 5-5: Block diagram of the retiming block.
the next state, plus some minimum additional information (timing adjustments necessary for the retiming block and/or extra wait cycles) are sent to the main control,
simplifying its design.
The high-speed clock domain also includes a retiming block, whose block diagram
is shown in Figure 5-5. Its function is the same as that of the equivalent structures in
the previous receivers. There are two differences in the implementation. First, since the
vectors now have 6 elements instead of 4, the structure used before must be expanded
accordingly in order to be able to provide the vectors indicated in Figure 5-6. Second,
in this receiver the different elements of these vectors are processed independently:
consecutive samples are not added together before further processing. For that
reason, the output of the retiming block must be chronologically ordered, and the
retiming block requires a configurable connection matrix.
5.4 Correlators/Matched Filter Block
The correlators block is part of the slow clock domain working at 16.66 MHz. It
receives, from the high speed clock domain with every rising edge of the 16.66 MHz
clock, a vector of 30 consecutive complex samples, with real and imaginary parts
represented with 5-bit 2s-complement binary numbers. It is possible to program this
block to perform either 30 or 150 correlations during each iteration of packet detection.
This impacts the length of the preamble required to ensure that the packet is detected.
This trade-off will be explored in the last section of this chapter.
[Figure 5-6 diagram: alignment of the sample vectors ADC0[n] to ADC5[n] for the different retiming positions]
Figure 5-6: Retiming block.
The correlators block in this implementation has grown from the simple structure
used in the previous implementations to a complex block that allows even a 25-tap
partial Rake to be implemented. Its functionality is extended as compared to
previous receivers, since it does not only calculate the correlations of the incoming
signal with the pseudorandom sequence. It indicates the position of the maximum
of correlation, if any, during coarse acquisition. It provides the channel impulse
response estimate for use in other subsystems. It also implements the variable
Rake receiver. Figure 5-7 shows the high-level block diagram of the correlators, which
also indicates the inputs and outputs required by this block.
The correlator block is comprised of 6 correlator groups, each of them including 5
"slices". The outputs of the correlator block depend on the programmability and on
the state. The summary of the features is the following. During packet detection it is
possible to program the block either to calculate 30 correlations in parallel
or to calculate 150 correlations in parallel. The duration of the iteration depends
on the number of correlations calculated. In the case of 30 correlations, the iteration
has a length of 32 cycles of the 16.66 MHz clock. In the case of 150 correlations,
the iteration has a length of 36 cycles of the 16.66 MHz clock. In both cases, the
correlations are estimated with 28 out of the total of 31 impulses, which reduces the
signal to noise ratio of the peak of correlation by 2 dB. The outputs provided in
either case for the other blocks to process are a signal that
indicates if the packet detection threshold was met (ThreshMet0), the value of the
maximum sample (RealforCheck and ImagforCheck), its L1 metric (MaximumValue),
and its position (PosAxis1, PosAxis2 and PosAxis3). Of the three position values,
only PosAxis2 and PosAxis3 are required when the baseband is programmed
to perform only 30 correlations during the packet detection state.
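The peak search performed during packet detection can be sketched as follows. The L1 metric is taken here as the sum of the absolute values of the real and imaginary parts, and the position is returned as a single index; how the hardware splits that index into PosAxis1, PosAxis2 and PosAxis3 is not modelled, and the function names are hypothetical.

```python
def l1_metric(z):
    """L1 metric of a complex correlation: |Re| + |Im| (no multipliers needed)."""
    return abs(z.real) + abs(z.imag)


def find_peak(correlations, threshold):
    """Return (threshold_met, maximum_value, position) for a set of correlations."""
    position = max(range(len(correlations)), key=lambda i: l1_metric(correlations[i]))
    maximum_value = l1_metric(correlations[position])
    return maximum_value >= threshold, maximum_value, position


if __name__ == "__main__":
    corr = [1 + 1j, -3 + 4j, 2 - 2j]
    print(find_peak(corr, threshold=5))
    print(find_peak(corr, threshold=8))
```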
During the Channel Estimation state, three outputs of the block are relevant.
Figure 5-7: Block diagram of the correlators.
First, the estimated channel impulse response appears in ChanOutReal (real part) and
ChanOutImag (imaginary part). This output is given with 10 bits in both the real
and imaginary parts. Second, signal ThreshMet1 indicates whether a second threshold, lower than
the first, has been met, confirming that the previous packet detection was not a false
alarm. Finally, SignRealMax, SignImagMax, RealforCheck and ImagforCheck may be
used to detect a change of polarity in the signature sequence, which would indicate
that the preamble has ended and that the payload should be demodulated immediately
afterwards, skipping the preamble end detection step.
During the Preamble End Detection state, the relevant outputs are ThreshMet1
(indicating whether the signal is still present), SignImagMax, SignRealMax, RealforCheck,
and ImagforCheck (to detect the end of the preamble). Finally, during the payload (PL)
state, OutputCorrReal and OutputCorrImag contain the real and imaginary parts
of the outputs of the Rake receiver. Six outputs are obtained at a time, each of them
rounded to 10 bits. They are fed to the MLSE demodulator. The first of the outputs
(represented by the LSBs of the whole vector) is also used as input to the Costas
loop. In addition, PrevReal, PrevImag, NextReal and NextImag are used as inputs to
the DLL.
Figure 5-8 shows the block diagram of a correlator group, comprised of five "slices".
It obtains partial results of the matched filter. During packet detection, either 5 or 25
correlations with the signature sequence are calculated. Their maximum is obtained
and its position is given as two 3-bit values. The latency for this is two cycles of the slow
clock after the last correlation. During channel estimation, five weights of the channel
impulse response are calculated. During payload demodulation, this block already receives
all the estimated taps of the channel impulse response, and they are fed into the five
slices as shown in Figure 5-8. The outputs of the slices are added together in groups of
five, so that the outputs of the first multipliers in each slice are added together, the
outputs of the second multipliers are added together, and so on. Blocks associated with
a set of five consecutive taps that have been set to 0 are
shut down directly. Since the number of multipliers activated depends on the length
of the channel impulse response used for the Rake receiver, the number of outputs
obtained from this structure will vary from one to five as the channel impulse response
grows in length.
Figure 5-9 shows one "slice". In the correlator bank we use 30 of these units, and
they perform different functions depending on the programming or the state of the
baseband.
The input x to this block is a complex number in which both the real and imaginary parts are represented with five bits. x represents one arbitrary element of the
vector of 30 samples that is input to the correlators. The other inputs to this block are
up to five complex taps in which both the real and imaginary parts are represented as four-bit 2s-complement numbers. x is multiplied by up to
five different taps, and the signal InitZero controls whether each result is accumulated
in the registers. When InitZero is 1, the product of x with one of the five taps is added to the number stored in the
register. When InitZero is 0, the product is stored directly in the
register. This block also offers the possibility of obtaining an estimate of the power
Figure 5-8: Block diagram of a correlator group.
Figure 5-9: Block diagram of the minimal unit of the correlators.
of the value contained in the register using the L1 metric. The five metrics are then
compared to produce the value and position of the register with the largest power.
During packet detection, the slice can be programmed to perform either
one or five correlations at the same time.
During PD, the taps take the values of the bits of the pseudorandom sequence
that is used as the signature sequence in the preamble. In this way, each of the correlators obtains the correlation of the incoming signal with the signature
sequence. After a full iteration, the results of the five correlations are compared,
the one with the largest energy is chosen, and its position is given as output. During
channel estimation, the first of the correlators has the bits of the signature
sequence as its weights, while the other four have zero weight. The result of the first correlator after
the channel estimation iteration is one of the taps of the channel impulse response,
represented with ten bits for both the real and imaginary parts. During
EPD, again only the first correlator is used, and its weights are the bits of the
signature sequence. Finally, during PL, the weights of the slice contain up to four bits
of the complex channel impulse response estimate. If the channel impulse response
(after nulling the coefficients that do not meet the threshold) is found to
have a length of up to 5 samples (10 ns), only the first multiplier in each slice is
used. If it has a length of 6 to 10 samples,
2 multipliers are activated; for 11 to 15 samples, 3 multipliers; for 16 to 20 samples, 4
multipliers; and for 21 to 25 samples, 5 multipliers. The results are combined with the
results from other slices to obtain the output of the partial Rake.
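The behaviour of one slice can be sketched in Python. This is only a behavioural model under stated assumptions, not the implemented hardware: the class name `Slice`, the method names, the helper `active_multipliers` and the use of Python complex numbers in place of fixed-point words are all hypothetical choices made for illustration.

```python
class Slice:
    """Behavioural sketch of one correlator slice (five multiply-accumulate paths)."""

    def __init__(self):
        self.regs = [0j] * 5  # one accumulator register per tap

    def step(self, x, taps, init_zero):
        """Multiply sample x by each of the five taps and update the registers.

        Following the text: InitZero = 1 adds the product to the register,
        InitZero = 0 stores the product directly.
        """
        for k, w in enumerate(taps):
            product = x * w
            self.regs[k] = self.regs[k] + product if init_zero else product

    def peak(self):
        """Return (L1 metric, position) of the register with the largest L1 metric."""
        metrics = [abs(r.real) + abs(r.imag) for r in self.regs]
        pos = max(range(5), key=metrics.__getitem__)
        return metrics[pos], pos


def active_multipliers(cir_length):
    """Multipliers enabled per slice for a channel length of 1..25 samples."""
    return -(-cir_length // 5)  # ceiling division: 1-5 -> 1, 6-10 -> 2, ...
```

During packet detection the taps would hold bits of the signature sequence, while during the payload they would hold the quantized channel taps; the model does not distinguish between the two uses.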
The L1 metric is used to estimate the output of the correlators:
$$\|y[n]\|_1 = \left|\Re\{y[n]\}\right| + \left|\Im\{y[n]\}\right| \qquad (5.1)$$
The L2 metric requires two multipliers with 10-bit inputs and 20-bit outputs
and a 20-bit adder (unless there is some truncation afterwards). The L1 metric
only requires three 10-bit adders. Taking into account that there are five of these
L1 metric blocks in each slice and a total of 30 slices, the area and power savings due
to the use of L1 instead of L2 are not negligible. The L1 metric has replaced the L2
metric in every instance in this ASIC. An example of its implementation can be seen
in Figure 5-11.
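As an illustration of what is given up by replacing L2 with L1, the short Python sketch below compares both metrics for a complex sample. The function names are hypothetical. The only property it relies on is the standard bound |y| <= |Re y| + |Im y| <= sqrt(2) |y|, so the L1 metric overestimates the magnitude by at most about 3 dB.

```python
import math


def l1_metric(y):
    """Sum of absolute values of the real and imaginary parts (adders only)."""
    return abs(y.real) + abs(y.imag)


def l2_metric(y):
    """Euclidean magnitude (needs two multiplications and a square root)."""
    return math.sqrt(y.real ** 2 + y.imag ** 2)


def worst_case_ratio(n_angles=360):
    """Largest L1/L2 ratio over unit-magnitude samples at evenly spaced angles."""
    ratios = []
    for k in range(n_angles):
        angle = 2 * math.pi * k / n_angles
        y = complex(math.cos(angle), math.sin(angle))
        ratios.append(l1_metric(y) / l2_metric(y))
    return max(ratios)
```

The ratio is 1 on the real and imaginary axes and reaches sqrt(2) at 45 degrees, so a peak search based on L1 can rank two samples differently from L2 only when their magnitudes are within that factor of each other.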
5.5 Channel Analysis Module
The channel analysis block takes as input the channel impulse response estimated
in the correlators and obtains the data necessary for both the MLSE demodulator
and the Rake receiver. It takes into account the settings programmed in the chip
and performs the analysis so as to reduce the complexity of the computations as much
as possible. It analyzes the effective length of the channel impulse response in order to
turn off as much of the functionality of the back-end as possible. The block
diagram of this subsystem is shown in Figure 5-10. It works with a 16.66 MHz
clock. The inputs to the channel analysis block are the 25 consecutive samples of
the channel impulse response read from the first 25 correlators during state CE of
the receiver. Each of these samples has real and imaginary parts represented as
10-bit 2s-complement binary numbers. All the arithmetic in these blocks is fixed-point
arithmetic. This block is programmable. It accepts as inputs the number of
bits used to represent the channel impulse response estimate (2, 3 or 4), and
the minimum threshold for using a channel impulse response tap
(dependent on the number of bits of the representation indicated previously).
During channel estimation, the position that was detected as the first one with
the maximum energy is aligned with the third correlator out of the total of thirty.
The maximum amplitude is expected to occur within the first five samples of
the channel, given the exponential decay profile of the channel model [3]. The first
five samples are used to determine which set of bits will be used as the most significant
bits. Once this most significant bit position is identified, it is used
as a rough normalization of the weights, or automatic gain control, for the channel
impulse response estimate. By using the MSB detector we are performing a de facto normalization of the channel impulse response, which was estimated with 10-bit
precision in the correlator blocks, in steps of 6 dB (since the maximum possible
value from one step to the next is roughly 6 dB apart). With the inclusion of the MSB
detector, the specifications on the AGC are relaxed, allowing the system to work with
the granularity provided by the front-end (6 dB). After the MSB has been obtained,
this information is used to reduce the internal representation of the channel to the
number of bits programmed (2 to 4 for both real and imaginary parts).
The reduction itself is done by rounding, not truncating, since truncating to such a small number of bits would introduce severe positive biases (offsets). The channel impulse response samples are thus rounded from any initial
number of bits to 2, 3 or 4 bits, depending on the programmed value.
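A minimal Python sketch of this normalize-then-round step is given below. It is an interpretation, not the implemented circuit: the function names, the representation of taps as integer (real, imaginary) pairs, the round-half-up rule and the saturation to the n-bit 2s-complement range are all assumptions. Each unit of `shift` corresponds to one 6 dB normalization step.

```python
def msb_position(taps):
    """Number of magnitude bits needed by the largest component of the given taps."""
    peak = max(max(abs(re), abs(im)) for re, im in taps)
    return max(peak.bit_length(), 1)


def round_to_bits(value, shift, n_bits):
    """Round-to-nearest right shift, then saturate to n_bits 2s-complement."""
    if shift > 0:
        value = (value + (1 << (shift - 1))) >> shift  # add half an LSB, then shift
    lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    return max(lo, min(hi, value))


def reduce_cir(taps, n_bits):
    """Reduce integer (re, im) taps to n_bits, scaled by the MSB of the first five taps."""
    mag_bits = msb_position(taps[:5])
    # Keep the sign bit plus (n_bits - 1) magnitude bits below the detected MSB.
    shift = max(mag_bits - (n_bits - 1), 0)
    return [(round_to_bits(re, shift, n_bits), round_to_bits(im, shift, n_bits))
            for re, im in taps]
```

For example, with a largest component of 300 among the first five taps and a 4-bit target, the sketch shifts by six bits, so 300 maps to 5 and -300 maps to -5.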
Once the signals have been rounded to the desired number of bits, the next step
consists of comparing them to the preprogrammed threshold. What is compared to the
threshold is the sum of the absolute values of the real and imaginary parts of each
tap. This is done in the Threshold Check block, which is comprised of 25
blocks like the one shown in Figure 5-11. Again, the L1
metric was chosen instead of the Euclidean (L2) metric for simplicity of implementation,
although in this case the impact is smaller than in the correlators, since
only 25 comparators for four-bit numbers are required.
The Threshold Check block generates a T[j] signal
for each of the input taps that indicates whether the threshold has been met. After that,
the "OR" of each group of five consecutive samples is also obtained as T[0] to T[4].
This signal is used to obtain a rough estimate of the channel impulse response
length, which is given in multiples of five samples, or 10 ns. These
summarized signals can be used to add granularity to turning off parts of both the
channel impulse analysis and the correlators. The number of taps used in each slice
in the correlators is decided here as the number of five-tap groups that are used at
all.
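The threshold check and the five-sample summary can be sketched as follows. The function names are hypothetical, "meeting" the threshold is assumed to mean greater than or equal, and the effective length is assumed to be counted up to the last group that contains a surviving tap; none of these details is confirmed by the text.

```python
def threshold_check(taps, threshold):
    """Per-tap flag: 1 when |re| + |im| (L1 metric) meets the threshold."""
    return [int(abs(re) + abs(im) >= threshold) for re, im in taps]


def group_summary(flags, group=5):
    """OR of each group of five consecutive per-tap flags."""
    return [int(any(flags[i:i + group])) for i in range(0, len(flags), group)]


def effective_groups(summary):
    """Number of five-tap groups up to and including the last active one."""
    last = max((i for i, s in enumerate(summary) if s), default=-1)
    return last + 1


def null_below_threshold(taps, flags):
    """Zero every tap whose flag is 0, as the nulling step does afterwards."""
    return [tap if f else (0, 0) for tap, f in zip(taps, flags)]
```

With 25 taps in which only taps 0 and 7 meet the threshold, the summary is [1, 1, 0, 0, 0] and the effective length is two groups, that is, 10 samples or 20 ns.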
The Threshold Comply block nulls out those samples of the channel impulse response that did not meet the threshold, and afterwards, the results are conjugated
Figure 5-10: Block diagram of the channel analysis subsystem.
Figure 5-11: Structure of one of the 25 components of "Threshold Check" Block.
before being used in the correlators. The blocks Threshold Comply and Complex
Conjugation are comprised of 25 blocks like the one shown in Figure 5-12. For the MLSE
demodulator it is necessary to obtain the autocorrelation of the channel impulse response and down-sample it at the symbol rate. The MLSE Weights block, depicted
in Figure 5-13, obtains this result. In this figure, the block labelled Dot Product
takes as inputs two complex vectors of four consecutive taps of the channel impulse
response estimate and computes their inner product. The outputs of this block are four
weights that will be used in the MLSE equalizer.
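The quantity computed here, the channel autocorrelation sampled at the symbol rate, can be illustrated with the Python sketch below. It is a floating-point illustration only: the function name, the choice of five samples per symbol (10 ns symbol spacing at 2 ns per sample) and the reading of the four weights as lags of 0 to 3 symbols are assumptions, and the sketch does not model how the Dot Product blocks partition the taps.

```python
def mlse_weights(h, samples_per_symbol=5, n_weights=4):
    """Autocorrelation of the channel taps h at multiples of the symbol period."""
    weights = []
    for k in range(n_weights):
        lag = k * samples_per_symbol
        # Inner product of h with a copy of itself delayed by `lag` samples.
        r = sum(h[n + lag] * h[n].conjugate() for n in range(len(h) - lag))
        weights.append(r)
    return weights
```

The lag-0 weight is the total energy of the taps used, and the remaining weights measure how strongly each symbol overlaps the following ones, which is the inter-symbol interference the equalizer has to resolve.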
This block is only used once during the packet demodulation process, after the
channel has been estimated in state CE. The results are not loaded into the respective
blocks that use them until state PL. The total time required for the computations is
four cycles. In this case we take advantage of the long period of
the slow clock domain.
5.6 Timing Synchronization
There are two possible timing errors that must be corrected during packet demodulation. In this case, the preamble is short enough that no correction is
needed during the preamble itself (states CE and EPD). This system as designed has four
independent timing references, in the same way that was shown in chapter 4. The
difference now is that it cannot be ensured that the frequency difference between
the transmitter carrier and the receiver oscillator is small enough. Both a DLL and
a Costas loop [40] are implemented in this transceiver.
The DLL used in this transceiver is exactly the same one that has been used
Figure 5-12: Structure of one of the 25 components of the blocks "Threshold Comply"
and "Complex Conjugation".
Figure 5-13: Block diagram of the MMSE weight estimator.
Figure 5-14: Block diagram of the Costas loop.
in previous prototypes. For that reason, we will not describe it again. On the other
hand, this is the first receiver where a Costas Loop is implemented. Its block diagram
is shown in Figure 5-14.
The output of the correlators during PL consists of six complex outputs of the
matched filter in parallel. For the purpose of detecting the phase error, only the first
of these six outputs is used, ignoring the rest. This allows reducing the data rate
by 2, 3 or 6 without having to make any changes to how the DLL and the Costas
loop work. In BPSK, the symbols should be aligned with the real axis, with positive
real part if a 1 was sent, and negative real part if a 0 was sent. For the Costas loop,
if a 1 is received, the complex number is accumulated. If a 0 is received, its negated
value is accumulated instead. This accumulation is performed during 32
cycles of the 16.66 MHz clock, for a duration of 1.92 µs. At the end of this interval,
assuming the output of the accumulator is I + jQ, the phase error is approximated
as:
$$\phi_e \approx \operatorname{sign}(I) \cdot Q \qquad (5.2)$$
This approximation is accurate, up to a scale factor, when |Q| << |I|, that is, for small phase errors. The phase estimate is filtered with a
programmable filter and its output scaled to five significant bits. These five significant
bits are used as the address of a 32-position ROM, each position holding
a complex number that represents a specific phase correction, as also shown in Figure
5-14. The phase corrections are stored in the ROM consecutively, so that an increase
in the phase correction translates into an increase of the ROM address. We also
take into account that the phase corrections are periodic, so that when an overflow
occurs in the address, the corrections are still continuous since the address goes back
to small values.
Even though only the first of the outputs of the partial Rake is used for phase error
estimation, the correction is applied to all six outputs of the Rake. The total latency
from the output of the partial Rake to the update of the phase correction is 3 cycles of
the 16.66 MHz clock.
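The loop described above can be sketched in Python as follows. This is a floating-point illustration under several assumptions that the text does not confirm: the ROM is assumed to hold 32 equally spaced unit phasors, the programmable filter is replaced by a single gain, and the function names and the default gain value are hypothetical.

```python
import cmath
import math

ROM_SIZE = 32  # five address bits
# Assumed ROM contents: equally spaced unit phasors, so that a larger address
# means a larger phase correction and the address wraps around periodically.
ROM = [cmath.exp(2j * math.pi * a / ROM_SIZE) for a in range(ROM_SIZE)]


def phase_error(outputs, bits):
    """Accumulate y for a 1 and -y for a 0, then apply the sign(I) * Q approximation."""
    acc = 0j
    for y, b in zip(outputs, bits):
        acc += y if b else -y
    return math.copysign(1.0, acc.real) * acc.imag


def costas_update(address, outputs, bits, gain=0.1):
    """One loop iteration: correct, measure the residual error, step the ROM address."""
    corrected = [y * ROM[address] for y in outputs]
    step = round(gain * phase_error(corrected, bits))
    # A positive residual error calls for a smaller correction phase;
    # the modulo operation models the address overflow.
    return (address - step) % ROM_SIZE
```

Feeding the function 32 BPSK symbols rotated by a fixed phase offset and iterating a few times moves the address towards the ROM entry that cancels the offset to within one address step.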
5.7 MLSE Equalizer
The use of a Viterbi demodulator for MLSE equalization follows a similar path to
that of demodulating a convolutional code. Figure 5-15 shows the Trellis diagram
that corresponds to an eight-state MLSE equalizer, as specified in chapter 3. During
the demodulation process, every time an impulse is received, it is assumed that the
initial state of the demodulator may be any of the eight initial states indicated in
Figure 5-15. The final state depends on the initial state and on the demodulated
value of the impulse received (each state has two output branches, depending on whether
the next bit is a 0 or a 1). As more impulses are received, the Trellis shown
in Figure 5-16, with one slice per impulse received, represents the procedure to
find the most probable path and demodulate the information at the same time. An
example of a possible path, with the demodulated bits associated with it, is also shown
in Figure 5-16. The objective of the MLSE algorithm is to find the path along the Trellis
that maximizes the maximum likelihood metric.
For our MLSE equalizer we will use a classical architecture like that shown in Figure
5-17. It consists of a branch metric unit (BMU), an add-compare-select unit (ACS)
and a trace-back unit (TBU). The purpose of the BMU is to calculate all the branch
metrics as a function of the channel impulse response and the output of the partial Rake.
These branch metrics are added in the ACS to the metrics of
the previous states, and the result serves to choose one branch out of all the branches
that arrive at every final state. The TBU stores the values of the initial states and
the paths followed in order to perform the trace-back function and demodulate
the received bits. All these blocks are implemented in the slow clock domain. The
first two perform their operations in only one cycle. The last one adds a latency that
depends on the number of cycles required for trace-back and decode. In our transceiver
the MLSE equalizer receives six inputs at a time. Six bits must be demodulated with
every positive clock edge, although the total latency is not important. The solution
for this, instead of performing only one Trellis iteration per cycle, is to unroll the
algorithm six times and to perform the equivalent of six iterations per clock edge.
The main impact of this occurs in the BMU, since now it must obtain all the
metrics associated with all the possible paths from the initial states to the final states
after six slices of the Trellis. An elegant solution appears since all the possible paths
that share the initial state and the final state also have in common the initial state
metric. A prior decision may be taken without looking at the initial state metrics
Figure 5-15: 8-state Trellis diagram.
Figure 5-16: Locating the most probable path in a 8-state Trellis.
Figure 5-17: Block diagram of the MLSE equalizer.
that reduces the number of outputs of the BMU to only 64 (one per pair of initial and
final states). Once this simplification is performed, the ACS only needs to
decide among eight paths to each of the eight final states. This is done with a
Radix-8 ACS unit.
Since it is assumed that each bit can collide with the next
three, the duration of the trace-back operation must be at least five times this
number, that is, 15 bits. Since six bits are associated with each path in each operation
of the MLSE, the trace-back would need to go over three of these iterations. We will
be conservative and perform the trace-back operation over four of these iterations.
The TBU is designed with a depth of 16, in which each position contains a path for
six bits. Three pointers travel along the TBU, as shown in Figure 5-17:
one performs the write operation, another the trace-back operation,
and a third the decode operation. The size of the TBU is chosen so that these
pointers never collide. The latency associated with the TBU is equal to 12 cycles of the
16.66 MHz clock.
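The pre-selection argument can be checked with a small Python sketch. It is an illustration under assumptions, not the implemented BMU and ACS: the state is assumed to be the last three bits, metrics are maximized, per-step branch metrics are supplied as an arbitrary integer table `step_metric[t][state][bit]`, and all function names are hypothetical. The sketch compares six ordinary radix-2 iterations against one pre-selected radix-8 iteration.

```python
import itertools

N_STATES = 8  # three bits of channel memory
UNROLL = 6    # trellis slices processed per slow-clock cycle


def next_state(state, bit):
    """Shift the new bit into the three-bit state register."""
    return ((state << 1) | bit) & 0b111


def radix2_reference(pm, step_metric):
    """Six ordinary add-compare-select iterations (maximizing the metric)."""
    for t in range(UNROLL):
        new = [float("-inf")] * N_STATES
        for s in range(N_STATES):
            for b in (0, 1):
                f = next_state(s, b)
                new[f] = max(new[f], pm[s] + step_metric[t][s][b])
        pm = new
    return pm


def preselect(step_metric):
    """Best six-step metric for each (initial, final) state pair: 64 values."""
    bm = [[float("-inf")] * N_STATES for _ in range(N_STATES)]
    for s0 in range(N_STATES):
        for bits in itertools.product((0, 1), repeat=UNROLL):
            s, total = s0, 0
            for t, b in enumerate(bits):
                total += step_metric[t][s][b]
                s = next_state(s, b)
            # Paths sharing s0 and the final state share the initial metric,
            # so the best one can be chosen before that metric is known.
            bm[s0][s] = max(bm[s0][s], total)
    return bm


def radix8_acs(pm, bm):
    """Choose, for each final state, the best of the eight initial states."""
    return [max(pm[s] + bm[s][f] for s in range(N_STATES)) for f in range(N_STATES)]
```

Because eight six-bit paths connect every initial state to every final state, pre-selection reduces 512 candidate paths to 64 branch metrics, and the radix-8 ACS yields the same path metrics as the six radix-2 iterations.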
5.8 Implementation and Results
The previous architecture was implemented in a 0.18 µm CMOS process, at 1.8 V.
Figure 5-18 shows the layout of the chip. It was implemented using a normal digital
flow, synthesizing all the blocks as a whole. The total area of the chip is 3.8 mm x
3.6 mm = 13.7 mm². It used approximately 1.5 million gates, of which 47% belong
to the correlators and the high-speed clock domain, and 36% belong to the MLSE
equalizer. For the purpose of testing, a PCB was designed as shown in Figure
5-19. This board allows the custom ASIC to be connected either to a logic analyzer, for
testing under controlled inputs and outputs, or to an interface board
with the ADC that was designed by Brian Ginsburg.
The first stage of testing was performed with the logic analyzer. In this case,
it was not possible to test the system with a complex signal (active inputs in both
the in-phase and quadrature channels) because the pattern generator of the logic
analyzer does not have enough outputs to generate both the input signal and the
control signals required by the chip. For this reason, part
of the testing was performed with real inputs only. Under these circumstances, the Costas loop
cannot be tested, since it is designed to correct the phase error of a complex signal.
Another limitation of this phase of testing is the maximum clock frequency that can
be generated with this pattern generator, which limits the testing to 70 MHz instead of
83.3 MHz. Figure 5-20 shows an example of the signals captured by the logic analyzer.
EnBits is a signal that goes from zero to one whenever a packet has been detected
and the output bits of the ASIC (shown in the line Bits) are valid demodulated
bits. The signal EnBits may be used in the interface with another board or a PC
as an enable for an input register. This figure also shows the inputs to the ASIC
(InputsReal0 to InputsReal5), corresponding to 6 samples in parallel at a frequency
five times faster than that of Bits, simulating the output of the ADC. Signal Swap
represents the control of the retiming block. Figure 5-21 shows the sequence of bits
Figure 5-18: Robust UWB baseband layout.
Figure 5-19: Testing board.
sent and demodulated in these tests. This sequence of bits is a Manchester code and
was chosen to ensure that no long sequence of impulses of the same sign would
then be used in the RF front-end.
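The run-length property that motivates this choice can be illustrated with a short Python sketch. The mapping used here (1 becomes the pair 1, 0 and 0 becomes the pair 0, 1) is one common Manchester convention and is an assumption, as are the function names; the text does not state which convention was used in the tests.

```python
def manchester_encode(bits):
    """Map each bit to two half-bit symbols: 1 -> (1, 0), 0 -> (0, 1)."""
    out = []
    for b in bits:
        out.extend((1, 0) if b else (0, 1))
    return out


def longest_run(symbols):
    """Length of the longest run of identical consecutive symbols."""
    best = run = 1
    for prev, cur in zip(symbols, symbols[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best
```

Since the two symbols of every pair differ, identical symbols can only meet across a pair boundary, so no run is ever longer than two regardless of the data.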
A signal with Gaussian noise was simulated in the pattern generator in order to
estimate the baseband sensitivity. Figure 5-22 shows the measured bit-error rate for
different SNRs in two cases: first, when the input signal is comprised of Gaussian
impulses with AWGN; second, when the input signal has gone through a CM2 channel.
Since the input signal follows the assumptions of the simulations presented in chapter
3, the results closely follow the simulated curves. For each point with a bit-error rate
larger than 10^-4, 10 data packets were used to perform the bit-error rate estimation.
For lower bit-error rates, the estimation was based on the results of 20 packets.
Figure 5-23 demonstrates the trade-off between the complexity of the signal processing
and the power dissipation. As the threshold that controls the number of active taps of the
partial Rake receiver decreases, more taps are used and the total
power dissipation increases. For this plot a CM2 channel [3] was used, and the
MLSE equalizer was enabled. In this plot, the threshold never goes above an amplitude of
5. During testing it was found that, because of the
normalization performed in the channel analysis subsystem, sometimes no sample of
the channel impulse response met the threshold after normalization when the threshold
was larger than 5, even if the packet was detected with good SNR. As a result, the
whole packet was lost.
If we change the number of bits of the internal representation of the channel impulse response (it can be chosen to be 2, 3 or 4 bits), these power results do not
change appreciably. The reason is that the internal representation is changed simply by nulling out those bits that are not used. Since a 2s-complement
binary representation is used, a change of sign implies the switching of the MSBs,
Figure 5-20: Interface signals when a packet has been detected.
Figure 5-21: Interface signals showing a sequence of demodulated bits.
Figure 5-22: Probability of error measures in the ASIC. (Horizontal axis: SNR (dB). Curves: simulated, no quantization; ASIC, clean pulse; ASIC, CM2.)
even if it is not needed. To avoid this problem and achieve a better energy trade-off,
it would have been necessary to use an architecture that ensures that the multipliers
switch only the relevant bits.
This ASIC has also been tested integrated in a full wireless system at a frequency of
62.5 MHz. Figure 5-24 shows the end of the preamble and the beginning of the
payload obtained in these conditions. A 7% reduction in packet acquisition resulted
from observing a change of sign in the real part of the output of the matched filter
to detect the end of the preamble. This loss would be avoided by looking instead at
the change of sign after the phase correction given by the Costas loop has been taken
into account.
A wireless data rate of 85 Mbps was obtained at 62.5 MHz. With the logic analyzer,
the ASIC consumes a maximum of 83 mA from a 2 V power source. The simulated
values for these conditions were 76 mA at 1.8 V, of which 28 mA correspond to the
MLSE equalizer and 40 mA correspond to the high-speed clock domain and the
correlators. This power dissipation can be reduced by up to 45% by changing the threshold
used to select the taps of the channel impulse response estimate. This proves the
possibility of trading off energy dissipation against quality of service. In a 5 kbit packet,
the energy dissipated at full functionality is 2.4 nJ/bit.
Figure 5-23: Demonstration of a QoS - Power trade-off. (Horizontal axis: threshold for channel, maximum 7.)
Figure 5-24: Structure of the data packet.
Chapter 6
Conclusions and future work
6.1 Thesis summary
Although impulse signals were used for some time in radar applications, only recently have they been revisited for communication purposes. The authorization of
the band from 3.1 to 10.6 GHz for communication purposes under certain restrictions
has spurred the research and development of applications using UWB signals. There
are currently two main drivers of the technology: applications that try to achieve very
high data rates at very short distances, and applications that achieve very low data
rates at larger distances. In this thesis we have focused on the challenges associated
with impulse UWB targeting high data rates.
The use of ultra-wideband signals for wireless communications presents some advantages over conventional narrowband signals while at the same time posing some
interesting challenges. UWB signals have better time definition than narrowband
signals. Multipath does not appear as fading since individual echoes may be detected
independently, and their energy collected. To compensate for the multipath it is necessary to estimate the channel impulse response, and to use this information in both
a matched filter (that gathers the energy from the different echoes of the signal) and
an equalizer (to compensate for possible inter-symbol interference). UWB receivers
may be successful in multipath environments at the cost of increasing the complexity
and power dissipation of the digital baseband. In this context, the power dissipation
of the digital baseband becomes a relevant part of the total power budget, and should
be optimized.
In this thesis, the design and implementation of a baseband for UWB wireless
systems has been explored through several prototypes. First, a custom ASIC oriented
to baseband UWB signals was designed. It was shown that, for this application, an
ADC of only 4 bits ensured reliable signal demodulation. The baseband was designed
for a signal in the band from 0 to 500 MHz, as part of a system-on-a-chip implemented
in 0.18 µm CMOS technology working at 1.8 V. Each bit of information is represented
with 31 baseband impulses of 2 ns width and 2% duty cycle, resulting in a raw data
rate of 322 kbps. The digital baseband is completely functional at a clock frequency
of 300 MHz but not at 500 MHz. The frequency range for the coarse acquisition
algorithm between a pair of transceivers is ±3%. The average time to declare coarse
acquisition is 65 µs. At 300 MHz, a data rate of 193 kbps was demonstrated. The
baseband consumes 75 mW. This architecture is also scalable to larger bandwidths.
Second, the specification of an FCC-compliant UWB system with a raw data rate of
100 Mbps at 10 m distance using a bandwidth of 500 MHz in the band from 3.1 GHz
to 10.6 GHz was analyzed. It has been established in this thesis that a homodyne
architecture is better suited for ultra-wideband signals. The main challenge in this
kind of wireless communication is multipath. In order to cope with it, it is necessary
to estimate the channel impulse response and use this information in a Rake receiver
and in an equalizer to compensate for inter-symbol interference. It is possible to
estimate the channel quality and adapt the signal processing available in the digital
baseband to the concrete channel quality. For the purpose of exhibiting this trade-off,
impulse UWB signals are better suited than OFDM signals where the complexity of
the signal processing is already fixed to avoid inter-carrier interference. It has been
proven that impulse UWB offers the opportunity of reducing the number of bits of
the ADC under favorable conditions of SNR or channel impulse response.
A data packet is defined that is comprised of a preamble and a payload. The
preamble is composed of 16 repetitions of a Gold code of 31 bits, in which every two
consecutive impulses are separated by an interval of 60 ns. The Gold code is used
during the detection of the data packet, and the 16 repetitions ensure a time to achieve
packet acquisition of 30 µs. The separation between impulses allows the estimation of
the channel impulse response with reduced or no ISI. The channel impulse response
is estimated using the information obtained by receiving a sequence of 31 impulses.
Each tap of the channel impulse response is represented with a complex number in
which both real and imaginary parts have 4-bit precision. During the payload, the
impulses are separated by 10 ns, and each bit of information is represented by the
sign of only one impulse. The sensitivity of this system is -81 dBm with a noise figure
of 5 dB.
Third, a complete discrete prototype was built in the Digital Circuits and Systems
group with off-the-shelf components. This prototype was used to validate some of the
theoretical claims of the system in real-time conditions. As the components of the
system are designed and fabricated, they can also be individually substituted into
the prototype to verify overall system functionality. The second UWB baseband was
designed for this discrete prototype and implemented using an FPGA. This prototype
is FCC compliant and uses 500 MHz subbands from 3.1 GHz to 10.6 GHz. With this
digital baseband it was possible to obtain either a data rate of 100 Mbps using an
arbitrary waveform generator or 50 Mbps using a dedicated impulse generator. Due
to the flexibility of this discrete prototype, it is not restricted to impulse UWB signals,
and other modulations can also be tested. For example, it was used to test multitone FSK modulation.
Finally, a second ASIC was designed to implement a robust, FCC-compliant UWB
baseband working at 100 Mbps. This second ASIC was designed using 0.18 µm CMOS
technology working at 1.8 V. Among the subsystems implemented are 150 correlators
in parallel to reduce the time to achieve coarse acquisition, a programmable partial
Rake that may use up to 25 complex taps and an MLSE equalizer. This thesis has
explicitly exposed the link between signal processing complexity, power dissipation
and quality of service. The packet acquisition is achieved in 30 µs. The ASIC consumes a
maximum of 83 mA from a 1.8 V power source. This power dissipation can be reduced
by 45% by changing the threshold of the channel impulse response estimation, proving
the possibility of trading off energy dissipation with quality of service. In a 5 kbit
packet, the energy dissipated at the full functionality is 2.4 nJ/bit.
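The energy-per-bit figure can be cross-checked from the other numbers quoted in this summary. The sketch below is only an illustration: it assumes that the maximum current is drawn for the whole packet (30 µs preamble plus payload), that "5 kbit" means 5000 payload bits, and that the 45% saving applies uniformly over the packet. None of these assumptions is stated explicitly in the text.

```python
# Values quoted in the summary above.
SUPPLY_V = 1.8          # supply voltage (V)
MAX_CURRENT_A = 83e-3   # maximum current drawn (A)
PREAMBLE_S = 30e-6      # packet acquisition time (s)
PAYLOAD_BITS = 5000     # "5 kbit" packet, read here as 5000 bits (assumption)
DATA_RATE_BPS = 100e6   # payload data rate (bit/s)


def energy_per_bit(power_scale=1.0):
    """Energy per payload bit (J), assuming constant power over preamble and payload."""
    power_w = SUPPLY_V * MAX_CURRENT_A * power_scale
    packet_s = PREAMBLE_S + PAYLOAD_BITS / DATA_RATE_BPS
    return power_w * packet_s / PAYLOAD_BITS


full_nj = energy_per_bit() * 1e9
# 45% lower power, assumed to hold uniformly over the packet.
reduced_nj = energy_per_bit(power_scale=0.55) * 1e9
print(f"full: {full_nj:.2f} nJ/bit, reduced: {reduced_nj:.2f} nJ/bit")
```

Under these assumptions the full-functionality value comes out close to the 2.4 nJ/bit quoted above, which suggests that the preamble overhead is included in that figure.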
6.2
Conclusions
In this work it has been determined that the main challenge in a UWB baseband for
high-data rate applications is multipath compensation. Because of the bandwidth of
the signal, it is possible to separate the different echoes that comprise the channel
impulse response and use this information to gather signal energy using a Rake receiver. However, multipath also causes inter-symbol interference, which requires an equalizer to
compensate for it. The Rake receiver and the equalizer severely increase the complexity of the digital baseband and, together with the high sampling rate needed for
high-bandwidth signals, cause the digital baseband to consume a significant part of the
system power budget.
Impulse UWB allows trading off signal processing complexity with quality of service, depending on the channel quality. When the received SNR is high and the
channel impulse response is short enough that no ISI occurs, it is possible to
reduce the energy dissipated per packet by not using the MLSE equalizer and also
reducing the number of taps that the Rake is using (this can be done by increasing
the threshold used for the channel impulse response). If, on the other hand, the SNR
is low, and the channel impulse response causes ISI, it is necessary to use the full
complexity of the system to recover the information. This trade-off has been explicitly proven in this thesis in terms of the loss of SNR and the complexity in chapter
3 and in terms of an explicit power dissipation difference in chapter 5. MB-OFDM
does not allow this trade-off.
One of the most important specifications of a UWB system is
the number of bits of the ADC. It has been proven that for the applications designed
in this thesis, a 4-bit ADC allows reliable demodulation. In the interference limited
case, for a data rate of 100 Mbps, it even allows operation with a SIR of -7 dB.
Still, the received impulses depend not only on the transmitted impulses but also
on the channel impulse response and on the transfer functions of both transmitter
and receiver antennas and the receiver front-end. For this reason, it is advisable to
estimate the received impulse shape as part of the channel impulse response. This
limitation was discovered in the first custom ASIC designed in this thesis and its
solution implemented both in the discrete prototype and in the second custom ASIC.
Ultra-low-power UWB systems might be designed if the data rate is reduced enough
to ensure that no ISI occurs. If it is also possible to reduce the data rate so that each bit
of information is represented with more than one impulse, there is no need for a Rake
receiver either, and the architecture is simplified. As the data rate increases, with
constant signal bandwidth, the number of symbols affected by the channel impulse
response increases linearly. The Rake complexity also increases linearly but, more
importantly, the complexity of the MLSE equalizer increases exponentially. To keep
the complexity to a minimum, it is necessary to keep data rates low.
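The linear and exponential growth mentioned above can be illustrated with a few lines of arithmetic. This is a sketch under stated assumptions: the 20 ns channel length is a hypothetical value, the modulation is taken as binary, and the MLSE is assumed to be a Viterbi-style trellis whose state count is the alphabet size raised to the channel memory. The function names are illustrative and do not come from the thesis.

```python
def symbols_spanned(delay_spread_ns, data_rate_mbps):
    """Symbol periods covered by the channel impulse response (integer ceiling)."""
    # delay spread [ns] * rate [Mbit/s] / 1000 = delay spread in symbol periods
    return max(1, -(-delay_spread_ns * data_rate_mbps // 1000))


def mlse_states(span, alphabet_size=2):
    """Trellis states of a Viterbi-based MLSE with channel memory span - 1."""
    return alphabet_size ** (span - 1)


DELAY_SPREAD_NS = 20  # hypothetical channel length, not taken from the thesis
for rate in (50, 100, 200, 400):
    span = symbols_spanned(DELAY_SPREAD_NS, rate)
    print(rate, "Mbps:", span, "symbols,", mlse_states(span), "states")
```

The number of symbols spanned (and hence the Rake and ISI span) doubles when the rate doubles, while the number of trellis states is squared and then halved, which is the exponential growth referred to in the text.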
6.3
Future work
In terms of future lines of work, the exploration of further ways of reducing the
power dissipation seems promising. In this work we focused on the specific trade-off that involves the complexity of the signal processing and the quality of service.
Circuit techniques that reduce power dissipation, specifically turning off the different
segments of the system when they are not needed, should be explored. Among the
possible techniques, dynamic voltage scaling, clock gating, and the use of high-Vt
devices to reduce leakage are likely to prove very effective.
This thesis used an MLSE equalizer. This is not the only architecture available
for this purpose, although it represents the theoretically optimum solution to the ISI
problem. It would be desirable to thoroughly study other architectures (such as zero-forcing or decision-feedback schemes), and the different complexity/quality of service
trade-offs that each of them would involve.
Finally, we have not addressed the problem of in-band narrowband interferers. It
was initially claimed that, not only would a UWB signal cause negligible interference to already existing services, but it would also be robust to strong interference.
This assumes that the whole system is linear. Linearity constraints are imposed by
the RF front end and the ADC. This sharply reduces the tolerance of UWB signals to
narrowband interferers to only 7 dBr. Active cancellation of in-band interferers, by
estimating the frequency of the narrowband interferers in the digital baseband and
providing this information to the RF-front end, would help to improve the robustness
of the receiver.
Appendix A
Comments on Link Budget
In this chapter, the equations that were used for the link budget specification are
presented.
A.1
Notation
The meaning of the symbols used in this appendix is shown below.
$f_{min}$ - Minimum frequency of the band (Hz) (A.1)
$f_{max}$ - Maximum frequency of the band (Hz) (A.2)
$S_d$ - Maximum spectral density (dBm/MHz) (A.3)
$BW = f_{max} - f_{min}$ - Signal bandwidth (Hz) (A.4)
$P_t = S_d + 10\log_{10}\left(BW / 10^6\right)$ - Average transmitted power (dBm) (A.5)
$G_t$ - Transmitter antenna gain (dB) (A.6)
$f_c = \sqrt{f_{max} f_{min}}$ - Geometric center frequency (Hz) (A.7)
$L_1 = 20\log_{10}\left(4\pi f_c \cdot 1\,\mathrm{m} / c\right)$ - Path loss at 1 m (dB) (A.8)
$d_{min}$ - Minimum distance (m) (A.9)
$L_{2min} = 20\log_{10}\left(d_{min} / 1\,\mathrm{m}\right)$ - Extra path loss at $d_{min}$ (dB) (A.10)
$d_{max}$ - Maximum distance (m) (A.11)
$L_{2max} = 20\log_{10}\left(d_{max} / 1\,\mathrm{m}\right)$ - Extra path loss at $d_{max}$ (dB) (A.12)
$G_r$ - Receiver antenna gain (dB) (A.13)
$Z_0$ - Reference resistance (Ω) (A.14)
$A_{ADC}$ - Peak amplitude value for the ADC (A.15)
$K_{ADC}$ - Constant from ADC (A.16)
$M$ - Design margin (dB) (A.17)
$PRF$ - Pulse repetition frequency (Hz) (A.18)
A.2
Definition of the Parameter K
K is defined in the following equation:
$E_p = \int_{-\infty}^{\infty} |p(t)|^2\,dt = K A^2$ (V²s) (A.19)
where p(t) is the mathematical representation of the pulse shape, A is the peak value
of the impulse, and Ep is an amount proportional to the energy of the pulse. In order
to get energy, in Joules, it is necessary to consider an impedance where this energy
is dissipated. Then the energy Ef is:
$E_f = \frac{K A^2}{Z_1}$ (A.20)
$Z_1$ is the input resistance (not necessarily 50 Ω). The power is:
$P = K \cdot A^2 \cdot PRF$ (V²) (A.21)
$P_f = \frac{K \cdot A^2 \cdot PRF}{Z_1}$ (W) (A.22)
When these impulses are up-converted, all these expressions must be divided by 2.
$E_p = \int_{-\infty}^{\infty} |p(t) \cos\omega t|^2\,dt = \frac{K A^2}{2}$ (V²s) (A.23)
$E_f = \frac{K A^2}{2 Z_1}$ (J) (A.24)
Applied to power expressions:
$P = \frac{K \cdot A^2 \cdot PRF}{2}$ (V²) (A.25)
$P_f = \frac{K \cdot A^2 \cdot PRF}{2 Z_1}$ (W) (A.26)
A.2.1
Gaussian Pulse
The equations for this kind of pulse are:
$p(t) = A e^{-t^2 / 2\sigma^2}$ (A.27)
$P(j\omega) = A \sigma \sqrt{2\pi}\, e^{-\sigma^2 \omega^2 / 2}$ (A.28)
$E_p = \sigma \sqrt{\pi} A^2$ (A.29)
from where
$K = \sigma \sqrt{\pi}$ (A.30)
It is also very useful to relate the standard deviation of the Gaussian pulse with the
bandwidth of the signal at certain attenuations:
$\sigma = \frac{0.241}{BW_{10dB}}$ (A.31)
$\sigma = \frac{0.132}{BW_{3dB}}$ (A.32)
From these equations, the standard deviation required for a Gaussian pulse of 10 dB
bandwidth equal to 250 MHz is $0.964 \times 10^{-9}$ s. This would be 250 MHz of baseband
bandwidth or 500MHz of passband bandwidth (once up-converted).
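The constants 0.241 and 0.132 follow from the Gaussian power spectrum, which is proportional to $e^{-\sigma^2\omega^2}$. The sketch below solves for $\sigma$ at a given attenuation; the function name is illustrative, and the bandwidth is taken as the one-sided (baseband) bandwidth, which is the reading that reproduces the numbers in the text.

```python
import math


def gaussian_sigma(bandwidth_hz, attenuation_db):
    """Standard deviation (s) of a Gaussian pulse whose power spectrum is
    attenuation_db below its peak at the one-sided bandwidth bandwidth_hz."""
    # exp(-sigma^2 * w^2) = 10^(-attenuation_db / 10), with w = 2*pi*bandwidth
    return math.sqrt(attenuation_db * math.log(10) / 10) / (2 * math.pi * bandwidth_hz)


sigma_10db = gaussian_sigma(250e6, 10.0)
print(f"sigma * BW at 10 dB: {sigma_10db * 250e6:.4f}")
print(f"sigma * BW at  3 dB: {gaussian_sigma(250e6, 3.0) * 250e6:.4f}")
print(f"sigma for 250 MHz at 10 dB: {sigma_10db * 1e9:.3f} ns")
```

The unrounded constant gives a slightly larger standard deviation than the 0.964 ns obtained with the rounded value 0.241; the difference is in the third significant digit.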
A.2.2
RC-charge Pulse
The equations for this kind of pulse are:
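$p(t) = A \frac{1 - e^{-t/\tau}}{1 - e^{-T/\tau}} u(t) - A \frac{1 - e^{-(t-T)/\tau}}{1 - e^{-T/\tau}} u(t - T)$ (A.33)
$P(j\omega) = \frac{A}{1 - e^{-T/\tau}} \cdot \frac{1 - e^{-j\omega T}}{j\omega (1 + j\omega\tau)}$ (A.34)
$E_p = A^2 \frac{T - \tau (1 - e^{-T/\tau})}{(1 - e^{-T/\tau})^2}$ (A.35)
from where
$K = \frac{T - \tau (1 - e^{-T/\tau})}{(1 - e^{-T/\tau})^2}$ (A.36)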
With $T$ = 2 ns and $\tau$ = 1.11 ns, the baseband bandwidth is 250 MHz and the bandpass
bandwidth is 500MHz.
A.3
Link Budget and Sensitivity
The sensitivity, defined as the minimum required power with which the receiver works
properly, is simulated for a reference pulse repetition frequency $PRF_{ref}$. It gives:
$SNR_{ref} = \frac{A^2 K \cdot PRF_{ref}}{\sigma^2}$ (A.37)
The probability of error is still the same when the pulses are closer together or further
apart, as long as they do not collide, because it depends only on the ratio $A/\sigma$. It is
possible to refer this sensitivity to other PRFs as follows:
$SNR_{ref} = 10\log\frac{A^2 K}{\sigma^2} + 10\log PRF_{ref}$ (A.38)
$SNR = 10\log\frac{A^2 K}{\sigma^2} + 10\log PRF$ (A.39)
from where
$SNR = SNR_{ref} - 10\log PRF_{ref} + 10\log PRF = SNR_{ref} + 10\log\frac{PRF}{PRF_{ref}}$ (A.40)
Once $SNR_{ref}$ and $PRF_{ref}$ are obtained for a given $P_e$, bandwidth and
bit rate, SNR is obtained as a function of PRF.
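The scaling of the required SNR with the pulse repetition frequency is a one-line computation. The sketch below is illustrative only; the reference values in the example are hypothetical and the function name does not come from the thesis.

```python
import math


def snr_at_prf(snr_ref_db, prf_ref_hz, prf_hz):
    """Refer a simulated SNR (dB) at prf_ref_hz to another pulse repetition frequency."""
    return snr_ref_db + 10 * math.log10(prf_hz / prf_ref_hz)


# Hypothetical reference point: 10 dB required at 50 MHz PRF.
for prf in (25e6, 50e6, 100e6):
    print(f"PRF {prf / 1e6:.0f} MHz -> SNR {snr_at_prf(10.0, 50e6, prf):.2f} dB")
```

Doubling the PRF raises the SNR figure by about 3 dB because the same per-pulse ratio of amplitude to noise now corresponds to twice the average signal power.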
A.4
Extra Losses Due to Pulse Shape
In this section the losses due to the fact that a sinc pulse is not used are considered.
They are associated with not taking advantage of all the power spectral density available.
The power of the square spectrum (see Figure 3-22) is
$\int_0^{2\pi BW} A^2\,d\omega = A^2\, 2\pi BW$ (A.41)
whereas the pulse used in transmission, assumed to be $p_0(t)$ with
Fourier transform $P_0(j\omega)$ and unit maximum amplitude, has a power:
$\int_0^{\infty} A^2 |P_0(j\omega)|^2\,d\omega$ (A.42)
The losses can be defined as:
$L_{shape} = \frac{2\pi BW}{\int_0^{\infty} |P_0(j\omega)|^2\,d\omega}$ (A.43)
A.4.1
Gaussian Pulse
For a Gaussian pulse:
$P_0(j\omega) = e^{-\sigma^2 \omega^2 / 2}$ (A.44)
Taking into account that
$\int_0^{\infty} |P_0(j\omega)|^2\,d\omega = \frac{\sqrt{\pi}}{2\sigma}$ (A.45)
the following result is obtained
$L_{gauss} = 4\sqrt{\pi} \cdot BW \sigma$ (A.46)
Assuming the BW is for -10 dB, $L_{gauss}$ = 2.33 dB.
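The Gaussian shape loss can be reproduced numerically. The sketch below assumes that the spectrum is normalized to unit peak, that the integral runs over positive frequencies only, and that BW is the one-sided 10 dB bandwidth; these readings are the ones that reproduce the value quoted above, and the function name is illustrative.

```python
import math


def gaussian_shape_loss_db(bw_hz, sigma_s):
    """Loss (dB) of a unit-peak Gaussian spectrum relative to a flat spectrum of width bw_hz."""
    # Closed form of the integral of exp(-sigma^2 w^2) for w from 0 to infinity.
    pulse_integral = math.sqrt(math.pi) / (2 * sigma_s)
    return 10 * math.log10(2 * math.pi * bw_hz / pulse_integral)


BW = 250e6
sigma = math.sqrt(math.log(10)) / (2 * math.pi * BW)  # 10 dB bandwidth equal to BW
print(f"Gaussian shape loss: {gaussian_shape_loss_db(BW, sigma):.2f} dB")
```

Because the product of bandwidth and standard deviation is fixed once the attenuation level is chosen, the loss does not depend on the actual bandwidth.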
A.4.2
RC-charge Pulse
Starting from the pulse shape:
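$P_0(j\omega) = \frac{1 - e^{-j\omega T}}{T\, j\omega (1 + j\omega\tau)}$ (A.47)
using a version of the Parseval equality:
$\int_{-\infty}^{\infty} |P(j\omega)|^2\,d\omega = 2\pi \int_{-\infty}^{\infty} |p(t)|^2\,dt = 2\pi E_p$ (A.48)
From where:
$L_{RC} = 10\log\frac{2 \cdot BW \cdot T^2}{T - \tau (1 - e^{-T/\tau})}$ (A.49)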
This formula gives 2.7dB for a 250MHz baseband bandwidth.
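The 2.7 dB figure can be checked with the pulse parameters given earlier for the RC-charge pulse ($T$ = 2 ns, $\tau$ = 1.11 ns). The sketch below uses the closed form that results from applying the Parseval equality to a spectrum normalized to unit peak at DC; that normalization is an assumption made here because it is the one that reproduces the quoted value, and the function name is illustrative.

```python
import math


def rc_shape_loss_db(bw_hz, t_pulse_s, tau_s):
    """Loss (dB) of the RC-charge pulse relative to a flat spectrum of width bw_hz.

    Assumes the spectrum is scaled to unit amplitude at DC, so that the
    one-sided integral of |P0|^2 equals pi * (T - tau*(1 - exp(-T/tau))) / T^2.
    """
    energy_term = t_pulse_s - tau_s * (1 - math.exp(-t_pulse_s / tau_s))
    return 10 * math.log10(2 * bw_hz * t_pulse_s ** 2 / energy_term)


loss_db = rc_shape_loss_db(250e6, 2e-9, 1.11e-9)
print(f"RC-charge shape loss: {loss_db:.2f} dB")
```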
A.5
Receiver Constraints
The minimum desired received power is
$P_{rmin} = P_t - L_1 - L_{2max} - L_{ex} + G_r$ (dBm) (A.50)
and the maximum desired received power is
$P_{rmax} = P_t - L_1 - L_{2min} - L_{ex} + G_r$ (dBm) (A.51)
This is the power given by the antenna when the antenna is matched. The use of this
equation implies assumptions on the matching of the receiver antenna. It does not
provide any data on the transmitter, as $P_t$ lumps together the gain of the antenna,
implying any matching effects are already included in the number.
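The received-power limits can be evaluated with a short script. This is a sketch under stated assumptions: the path loss at 1 m is taken as the free-space value $20\log_{10}(4\pi f_c \cdot 1\,\mathrm{m}/c)$, the extra loss grows as $20\log_{10}(d / 1\,\mathrm{m})$, and the sub-band, distances, antenna gain and extra loss in the example are hypothetical values chosen for illustration.

```python
import math

C = 299_792_458.0  # speed of light (m/s)


def tx_power_dbm(sd_dbm_per_mhz, bw_hz):
    """Average transmitted power for a flat spectral density over bw_hz."""
    return sd_dbm_per_mhz + 10 * math.log10(bw_hz / 1e6)


def path_loss_1m_db(fc_hz):
    """Free-space path loss at 1 m (assumed propagation model)."""
    return 20 * math.log10(4 * math.pi * fc_hz * 1.0 / C)


def received_power_dbm(pt_dbm, fc_hz, distance_m, l_ex_db, g_r_db):
    """Received power at distance_m, following the structure of the link budget."""
    extra_loss = 20 * math.log10(distance_m / 1.0)
    return pt_dbm - path_loss_1m_db(fc_hz) - extra_loss - l_ex_db + g_r_db


# Hypothetical example: lowest 500 MHz sub-band, 1 m to 10 m, 0 dB antenna gain.
f_min, f_max = 3.1e9, 3.6e9
pt = tx_power_dbm(-41.3, f_max - f_min)
fc = math.sqrt(f_min * f_max)
pr_max = received_power_dbm(pt, fc, 1.0, 2.33, 0.0)
pr_min = received_power_dbm(pt, fc, 10.0, 2.33, 0.0)
print(f"Pt = {pt:.2f} dBm, Prmax = {pr_max:.2f} dBm, Prmin = {pr_min:.2f} dBm")
```

The difference between the two limits is the dynamic range that the variable part of the front-end gain has to cover.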
A.6
ADC Constraints and Detection
The following formulas assume that the automatic gain control works perfectly. They
refer to signals at the input of the ADC.
* Minimum Desired Received Voltage
$V_{min} = \sqrt{\frac{2 Z_1 10^{(P_{rmin} + 30)/10}}{K \cdot PRF}}$ (mV) (A.52)
* Maximum Desired Received Voltage
$V_{max} = \sqrt{\frac{2 Z_1 10^{(P_{rmax} + 30)/10}}{K \cdot PRF}}$ (mV) (A.53)
* Desired Voltage at the Input of ADC
$V_{sadc} = \frac{K_{ADC} \cdot A_{ADC}}{\sqrt{K \cdot PRF\,(1 + 10^{-SNR/10})}}$ (mV) (A.54)
* Desired Power at the Input of ADC
$P_{sadc} = -30 + 20\log K_{ADC} - 10\log Z_1 + 20\log A_{ADC} - 10\log\left(1 + 10^{-SNR/10}\right)$ (dBm) (A.55)
* Maximum Noise Standard Deviation at Input of ADC
$\sigma_{adc} = K_{ADC} \cdot A_{ADC} \sqrt{\frac{10^{-SNR/10}}{1 + 10^{-SNR/10}}}$ (mV) (A.56)
* Maximum Noise Power at the Input of ADC
$P_{nadc} = -30 + 20\log K_{ADC} - 10\log Z_1 + 20\log A_{ADC} - 10\log\left(1 + 10^{SNR/10}\right)$ (dBm) (A.57)
A.6.1
Explanation of $K_{ADC}$ and $A_{ADC}$
This analysis is performed for a baseband signal, right at the input of the ADC.
$P_S = \frac{A^2 K \cdot PRF}{Z_1}$ (A.58)
$P_N = \frac{\sigma^2}{Z_1}$ (A.59)
Where $P_S$ is the signal power and $P_N$ is the noise power. The policy of the AGC can
be described as follows:
Total Power: $X = A^2 K \cdot PRF + \sigma^2$ (A.60)
$Scaling = \frac{\sqrt{X}}{K_{ADC}}$ (A.61)
Where the Scaling represents the number by which the input signal is divided to fit the ADC
input range. This Scaling is used as follows:
$A_{prime} = \frac{A}{Scaling} = \frac{A \cdot K_{ADC}}{\sqrt{X}}$ (A.62)
$\sigma_{prime} = \frac{\sigma}{Scaling} = \frac{\sigma \cdot K_{ADC}}{\sqrt{X}}$ (A.63)
$SNR = 10\log\frac{A^2 K \cdot PRF}{\sigma^2}$ (A.64)
Where $A_{prime}$ and $\sigma_{prime}$ refer to the signals at the input of the ADC.
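The total power is then:
$X = \sigma^2 + A^2 K \cdot PRF = \sigma^2 \left(1 + \frac{A^2 K \cdot PRF}{\sigma^2}\right) = \sigma^2 \left(1 + 10^{SNR/10}\right)$ (A.65)
Scaling is rewritten as
$Scaling = \frac{\sigma \sqrt{1 + 10^{SNR/10}}}{K_{ADC}}$ (A.66)
then
$A_{prime} = \frac{A \cdot K_{ADC}}{\sigma \sqrt{1 + 10^{SNR/10}}} = \frac{K_{ADC}}{\sqrt{K \cdot PRF\,(1 + 10^{-SNR/10})}}$ (A.67)
$\sigma_{prime} = \frac{K_{ADC}}{\sqrt{1 + 10^{SNR/10}}} = K_{ADC} \sqrt{\frac{10^{-SNR/10}}{1 + 10^{-SNR/10}}}$ (A.68)
besides
$A^2 K \cdot PRF = \sigma^2\, 10^{SNR/10}$ (A.69)
Then
$A = \sigma \sqrt{\frac{10^{SNR/10}}{K \cdot PRF}}$ (A.70)
In order to normalize to the input scale of the ADC, $A_{prime}$ and $\sigma_{prime}$ are multiplied by
$A_{ADC}$. The units depend on those of $A_{ADC}$.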
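The AGC policy described in this section can be checked numerically: scaling the signal by the square root of the total power divided by $K_{ADC}$ should give the same levels as the closed forms written in terms of the SNR. The sketch below is an illustration with hypothetical numbers; the function names are not taken from the thesis.

```python
import math


def agc_levels(a_peak, sigma, k_pulse, prf, k_adc):
    """Peak amplitude and noise deviation after the AGC scaling."""
    total_power = a_peak ** 2 * k_pulse * prf + sigma ** 2
    scaling = math.sqrt(total_power) / k_adc
    return a_peak / scaling, sigma / scaling


def agc_levels_from_snr(snr_db, k_pulse, prf, k_adc):
    """Same quantities written only in terms of the SNR (closed forms)."""
    a_prime = k_adc / math.sqrt(k_pulse * prf * (1 + 10 ** (-snr_db / 10)))
    sigma_prime = k_adc / math.sqrt(1 + 10 ** (snr_db / 10))
    return a_prime, sigma_prime


# Hypothetical operating point.
A, SIGMA, K, PRF, K_ADC = 0.01, 0.002, 1e-9, 1e8, 0.25
snr_db = 10 * math.log10(A ** 2 * K * PRF / SIGMA ** 2)
direct = agc_levels(A, SIGMA, K, PRF, K_ADC)
closed = agc_levels_from_snr(snr_db, K, PRF, K_ADC)
print(direct, closed)
```

Both routes give the same pair of values, which shows that after the AGC the levels seen by the ADC depend only on the SNR, on $K \cdot PRF$ and on $K_{ADC}$, not on the absolute received amplitude.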
A.7
Gain Specifications
The following equations refer to the total minimum and maximum gains required of
the analog front-end.
* Minimum Gain in the Margin
$G_{min} = P_{sadc} - P_{rmax}$ (dB) (A.71)
* Maximum Gain in the Margin
$G_{max} = P_{sadc} - P_{rmin}$ (dB) (A.72)
* Constant Part of the Gain
$G_{floor} = G_{min}$ (dB) (A.73)
* Variable Part of the Gain
$G_{var} = G_{max} - G_{min}$ (dB) (A.74)
A.8
Noise Figure Specification
This formula refers to the maximum noise figure allowed to the analog front-end.
$F = P_r - SNR - M - 10\log BW + 174$ (A.75)
In this formula $P_r$ must be expressed in dBm.
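A short numerical example of the noise figure specification follows. The received power of -81 dBm is the sensitivity quoted in the summary chapter; the required SNR and the margin used here are hypothetical values picked only to show how the formula is applied, and the function name is illustrative.

```python
import math


def max_noise_figure_db(p_r_dbm, snr_db, margin_db, bw_hz):
    """Maximum front-end noise figure allowed, with thermal noise at -174 dBm/Hz."""
    return p_r_dbm - snr_db - margin_db - 10 * math.log10(bw_hz) + 174


# Hypothetical SNR requirement (1 dB) and margin (0 dB) for a 500 MHz bandwidth.
nf = max_noise_figure_db(p_r_dbm=-81.0, snr_db=1.0, margin_db=0.0, bw_hz=500e6)
print(f"maximum noise figure: {nf:.2f} dB")
```

Every additional dB of margin or of required SNR reduces the allowed noise figure by the same amount.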
Appendix B
Comments on Signal Generation
In this chapter, the procedure to include the non-idealities of both the transmitter
and the receiver in simulations is demonstrated.
B.1
Transmitted Signal
Two different signals are considered. The difference is the timing origin of pulses
with respect to the phase of the carrier used to up-convert them. The theoretical
transmitted signal is defined as:
$s(t) = \sum_{k=0}^{\infty} b_k\, p(t - kT_s)$ (B.1)
where $b_k$ is the sequence of transmitted bits, $p(t)$ is the transmitted pulse, and $T_s$ is
the time interval between consecutive pulses. There are two different cases. In the
first case,
$p(t) = g(t)$ (B.2)
$s_{tx}(t) = s(t) \cdot \cos(\omega_c t + \phi)$ (B.3)
In the second case,
$p(t) = g(t) \cdot \cos(\omega_c t + \phi)$ (B.4)
$s_{tx}(t) = s(t)$ (B.5)
In these expressions, $s_{tx}(t)$ represents the signal that is transmitted. Two cases are
considered:
$s_1(t) = \sum_{k=0}^{\infty} b_k\, p(t - kT_s) \cos(\omega_c t + \phi)$ (B.6)
$s_2(t) = \sum_{k=0}^{\infty} b_k\, p(t - kT_s) \cos(\omega_c (t - kT_s) + \phi)$
$= \sum_{k=0}^{\infty} b_k\, p(t - kT_s) \cos(\omega_c t + \phi - \omega_c kT_s)$ (B.7)
B.2
Jitter in the Transmitter
Jitter affects the instants of generation of pulses ($\delta t_k$) and the carrier phase ($\delta\phi_1(t)$). The
effect is different depending on the kind of transmitted signal. Taking them into
account:
$s_1(t) = \sum_{k=0}^{\infty} b_k\, p(t - kT_s - \delta t_k) \cos(\omega_c t + \phi + \delta\phi_1(t))$ (B.8)
$s_2(t) = \sum_{k=0}^{\infty} b_k\, p(t - kT_s - \delta t_k) \cos(\omega_c t + \phi - \omega_c kT_s - \omega_c \delta t_k)$ (B.9)
B.3
Channel Impulse Response
The description of the channel is given as a sequence of amplitude and delays. From
this expression, the equivalent low pass representation of the channel is obtained. Let
the impulse response of the channel be:
$h(t) = \sum_{i=0}^{N-1} a_i\, \delta(t - t_i)$ (B.10)
where $a_i$ are real numbers, $t_i$ are delays and $N$ is the number of multipath components in the model. We can also summarize the input signal to the channel as:
$s(t) = x(t) \cos((\omega_0 + \omega_c) t + \phi(t))$
$= x(t) \cos(\omega_0 t) \cos(\omega_c t + \phi(t)) - x(t) \sin(\omega_0 t) \sin(\omega_c t + \phi(t))$ (B.11)
$= s_I(t) \cos\omega_0 t + s_Q(t) \sin\omega_0 t$
Where the sub-index I represents in-phase component and the sub-index Q represents
quadrature component. The output of the channel is therefore:
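$r(t) = s(t) * h(t) = \sum_{i=0}^{N-1} a_i\, s(t - t_i)$
$= \sum_{i=0}^{N-1} a_i s_I(t - t_i) \cos\omega_0 (t - t_i) + \sum_{i=0}^{N-1} a_i s_Q(t - t_i) \sin\omega_0 (t - t_i)$
$= \sum_{i=0}^{N-1} a_i s_I(t - t_i) \left(\cos\omega_0 t \cos\omega_0 t_i + \sin\omega_0 t \sin\omega_0 t_i\right)$ (B.12)
$+ \sum_{i=0}^{N-1} a_i s_Q(t - t_i) \left(\sin\omega_0 t \cos\omega_0 t_i - \cos\omega_0 t \sin\omega_0 t_i\right)$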
From this it is possible to write:
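$r(t) = \left(\sum_{i=0}^{N-1} a_i s_I(t - t_i) \cos\omega_0 t_i - \sum_{i=0}^{N-1} a_i s_Q(t - t_i) \sin\omega_0 t_i\right) \cos\omega_0 t$ (B.13)
$+ \left(\sum_{i=0}^{N-1} a_i s_I(t - t_i) \sin\omega_0 t_i + \sum_{i=0}^{N-1} a_i s_Q(t - t_i) \cos\omega_0 t_i\right) \sin\omega_0 t$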
Let
$s_L(t) = s_I(t) + j s_Q(t)$ (B.14)
$h_L(t) = \sum_{i=0}^{N-1} a_i e^{j\omega_0 t_i} \delta(t - t_i)$ (B.15)
$e^{j\omega_0 t_i} = \cos\omega_0 t_i + j \cdot \sin\omega_0 t_i$ (B.16)
Taking into account the previous equations,
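$s_L(t) * h_L(t) = \sum_{i=0}^{N-1} a_i e^{j\omega_0 t_i} \left(s_I(t - t_i) + j \cdot s_Q(t - t_i)\right)$
$= \sum_{i=0}^{N-1} a_i \left(s_I(t - t_i) \cos\omega_0 t_i - s_Q(t - t_i) \sin\omega_0 t_i\right)$ (B.17)
$+ j \sum_{i=0}^{N-1} a_i \left(s_I(t - t_i) \sin\omega_0 t_i + s_Q(t - t_i) \cos\omega_0 t_i\right)$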
Therefore the signal at the output of the channel is
$r(t) = \mathrm{Re}\left\{ \left(s_L(t) * h_L(t)\right) e^{-j\omega_0 t} \right\}$ (B.18)
B.4
Signal at the Input of the Receiver
In a realistic receiver there is a difference in frequency between the transmitter (tx)
and receiver (rx) carriers:
$\omega_{rx} = \omega_0$ (B.19)
$\omega_{tx} = \omega_0 + \Delta\omega_0$ (B.20)
but in general $\Delta\omega_0 \ll \omega_0$. There is one jitter that affects the center position of the
pulses ($\delta t_k$) and another one that affects the phase of the carrier ($\delta\phi_1(t)$). For the
first kind of signal, taking into account these jitters:
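$s_1(t) = \left(\sum_{k=0}^{\infty} b_k\, g(t - kT_s - \delta t_k)\right) \cos(\omega_{tx} t + \phi + \delta\phi_1(t))$
$= \left(\sum_{k=0}^{\infty} b_k\, g(t - kT_s - \delta t_k)\right) \cos(\omega_0 t + \Delta\omega_0 t + \phi + \delta\phi_1(t))$ (B.21)
$= \left(\sum_{k=0}^{\infty} b_k\, g(t - kT_s - \delta t_k)\right) \cos(\Delta\omega_0 t + \phi + \delta\phi_1(t)) \cos\omega_0 t$
$- \left(\sum_{k=0}^{\infty} b_k\, g(t - kT_s - \delta t_k)\right) \sin(\Delta\omega_0 t + \phi + \delta\phi_1(t)) \sin\omega_0 t$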
From where
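$s_{1I}(t) = \left(\sum_{k=0}^{\infty} b_k\, g(t - kT_s - \delta t_k)\right) \cos(\Delta\omega_0 t + \phi + \delta\phi_1(t))$ (B.22)
$s_{1Q}(t) = \left(\sum_{k=0}^{\infty} b_k\, g(t - kT_s - \delta t_k)\right) \sin(\Delta\omega_0 t + \phi + \delta\phi_1(t))$ (B.23)
$s_{1L}(t) = \left(\sum_{k=0}^{\infty} b_k\, g(t - kT_s - \delta t_k)\right) e^{j\Delta\omega_0 t} e^{j\phi} e^{j\delta\phi_1(t)}$ (B.24)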
Following a similar procedure for the second kind of the signal
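$s_2(t) = \sum_{k=0}^{\infty} b_k\, g(t - kT_s - \delta t_k) \cos\left(\omega_{tx}(t - kT_s - \delta t_k) + \phi\right)$
$= \sum_{k=0}^{\infty} b_k\, g(t - kT_s - \delta t_k) \cos\left((\omega_0 + \Delta\omega_0)(t - kT_s - \delta t_k) + \phi\right)$ (B.25)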
Defining
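$\phi_{k,tx}(t) = -\omega_0 kT_s - \omega_0 \delta t_k + \Delta\omega_0 t - \Delta\omega_0 kT_s - \Delta\omega_0 \delta t_k + \phi$ (B.26)
Therefore
$s_{2L}(t) = \sum_{k=0}^{\infty} b_k\, g(t - kT_s - \delta t_k)\, e^{j\phi_{k,tx}(t)}$ (B.27)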
From these equations
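$s_1(t) = \mathrm{Re}\left\{s_{1L}(t) e^{j\omega_0 t}\right\}$ (B.28)
$s_2(t) = \mathrm{Re}\left\{s_{2L}(t) e^{j\omega_0 t}\right\}$ (B.29)
The transmitted signal is obtained from the equivalent low pass model as
$s(t) = \mathrm{Re}\left\{s_L(t) e^{j\omega_0 t}\right\} = s_I(t) \cos\omega_0 t - s_Q(t) \sin\omega_0 t$ (B.30)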
Assuming the channel is
$h(t) = \sum_{i=0}^{N-1} a_i\, \delta(t - t_i)$ (B.31)
With $a_i$ complex. The procedure to use the channel with the low pass model of the
signal is:
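$r(t) = s(t) * h(t) = \left(s_I(t) \cos\omega_0 t - s_Q(t) \sin\omega_0 t\right) * \sum_{i=0}^{N-1} a_i\, \delta(t - t_i)$
$= \sum_{i=0}^{N-1} a_i \left(s_I(t - t_i) \cos(\omega_0 t - \omega_0 t_i) - s_Q(t - t_i) \sin(\omega_0 t - \omega_0 t_i)\right)$
$= \sum_{i=0}^{N-1} a_i \left(s_I(t - t_i) \cos\omega_0 t \cos\omega_0 t_i + s_I(t - t_i) \sin\omega_0 t \sin\omega_0 t_i\right)$
$- \sum_{i=0}^{N-1} a_i \left(s_Q(t - t_i) \sin\omega_0 t \cos\omega_0 t_i - s_Q(t - t_i) \cos\omega_0 t \sin\omega_0 t_i\right)$
$= \sum_{i=0}^{N-1} a_i \left(s_I(t - t_i) \cos\omega_0 t_i + s_Q(t - t_i) \sin\omega_0 t_i\right) \cos\omega_0 t$ (B.32)
$- \sum_{i=0}^{N-1} a_i \left(s_Q(t - t_i) \cos\omega_0 t_i - s_I(t - t_i) \sin\omega_0 t_i\right) \sin\omega_0 t$
$= \sum_{i=0}^{N-1} a_i\, \mathrm{Re}\left\{\left(s_I(t - t_i) + j s_Q(t - t_i)\right)\left(\cos\omega_0 t_i - j \sin\omega_0 t_i\right) e^{j\omega_0 t}\right\}$
$= \mathrm{Re}\left\{r_L(t) e^{j\omega_0 t}\right\}$
with
$r_L(t) = h_L(t) * s_L(t)$ (B.33)
$h_L(t) = \sum_{i=0}^{N-1} a_i e^{-j\omega_0 t_i} \delta(t - t_i)$ (B.34)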
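The equivalence between filtering the passband signal and filtering its low pass model can be verified numerically. The sketch below compares both routes for a small multipath channel. Everything specific in it is an assumption made for illustration: the 4 GHz carrier, the tap gains and delays, and the Gaussian envelope are hypothetical, and the tap gains are kept real so that the passband output is real without further conventions.

```python
import cmath
import math

W0 = 2 * math.pi * 4e9                                  # hypothetical carrier (rad/s)
TAPS = [(1.0, 0.0), (0.5, 1.3e-9), (-0.3, 2.9e-9)]      # hypothetical (a_i, t_i), real gains


def s_lowpass(t):
    """Hypothetical complex envelope s_L(t): Gaussian pulse with a fixed phase."""
    return math.exp(-t * t / (2 * (0.5e-9) ** 2)) * cmath.exp(1j * 0.4)


def s_passband(t):
    """Passband signal s(t) = Re{s_L(t) exp(j w0 t)}."""
    return (s_lowpass(t) * cmath.exp(1j * W0 * t)).real


def r_direct(t):
    """Channel applied directly to the passband signal."""
    return sum(a * s_passband(t - ti) for a, ti in TAPS)


def r_from_lowpass(t):
    """Channel applied to the envelope with taps rotated by exp(-j w0 t_i)."""
    r_l = sum(a * cmath.exp(-1j * W0 * ti) * s_lowpass(t - ti) for a, ti in TAPS)
    return (r_l * cmath.exp(1j * W0 * t)).real


worst = max(abs(r_direct(k * 0.05e-9) - r_from_lowpass(k * 0.05e-9)) for k in range(-40, 120))
print(f"largest difference between both routes: {worst:.2e}")
```

Working with the rotated taps lets a simulation run at the envelope sampling rate instead of at a rate high enough to represent the carrier.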
B.5
ELP of a Non-linearity
Let the input to the non linearity be the output of the channel, that is
$r(t) = \mathrm{Re}\left\{r_L(t) e^{j\omega_0 t}\right\} = r_I(t) \cos\omega_0 t - r_Q(t) \sin\omega_0 t$ (B.35)
A fifth-order non-linearity is assumed:
$z(t) = r(t) + a_2 r^2(t) + a_3 r^3(t) + a_4 r^4(t) + a_5 r^5(t)$ (B.36)
The objective of this section is to obtain the coefficients $b_0$, $b_{c,i}$ and $b_{s,i}$ in the
following expression:
$z(t) = b_0 + b_{c,1} \cos\omega_0 t - b_{s,1} \sin\omega_0 t$
$+ b_{c,2} \cos 2\omega_0 t - b_{s,2} \sin 2\omega_0 t$
$+ b_{c,3} \cos 3\omega_0 t - b_{s,3} \sin 3\omega_0 t$ (B.37)
$+ b_{c,4} \cos 4\omega_0 t - b_{s,4} \sin 4\omega_0 t$
$+ b_{c,5} \cos 5\omega_0 t - b_{s,5} \sin 5\omega_0 t$
The square term in (B.36) is expanded as:
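$r^2(t) = \frac{r_I^2(t) + r_Q^2(t)}{2} + \frac{r_I^2(t) - r_Q^2(t)}{2} \cos 2\omega_0 t - r_I(t) r_Q(t) \sin 2\omega_0 t$ (B.38)
For the $r^3(t)$ term in (B.36):
$r^3(t) = r_I^3(t) \cos^3\omega_0 t - 3 r_I^2(t) r_Q(t) \cos^2\omega_0 t \sin\omega_0 t$
$+ 3 r_I(t) r_Q^2(t) \cos\omega_0 t \sin^2\omega_0 t - r_Q^3(t) \sin^3\omega_0 t$ (B.39)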
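Where:
$\cos^3\omega_0 t = \frac{3}{4} \cos\omega_0 t + \frac{1}{4} \cos 3\omega_0 t$ (B.40)
$\cos^2\omega_0 t \sin\omega_0 t = \frac{1}{4} \sin\omega_0 t + \frac{1}{4} \sin 3\omega_0 t$ (B.41)
$\cos\omega_0 t \sin^2\omega_0 t = \frac{1}{4} \cos\omega_0 t - \frac{1}{4} \cos 3\omega_0 t$ (B.42)
$\sin^3\omega_0 t = \frac{3}{4} \sin\omega_0 t - \frac{1}{4} \sin 3\omega_0 t$ (B.43)
Therefore,
$r^3(t) = \frac{3}{4} \left(r_I^2(t) + r_Q^2(t)\right) \left(r_I(t) \cos\omega_0 t - r_Q(t) \sin\omega_0 t\right)$
$+ \frac{1}{4} \left(r_I^2(t) - 3 r_Q^2(t)\right) r_I(t) \cos 3\omega_0 t$ (B.44)
$+ \frac{1}{4} \left(r_Q^2(t) - 3 r_I^2(t)\right) r_Q(t) \sin 3\omega_0 t$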
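For the $r^4(t)$ term in (B.36):
$r^4(t) = r_I^4(t) \cos^4\omega_0 t - 4 r_I^3(t) r_Q(t) \cos^3\omega_0 t \sin\omega_0 t$
$+ 6 r_I^2(t) r_Q^2(t) \cos^2\omega_0 t \sin^2\omega_0 t$
$- 4 r_I(t) r_Q^3(t) \cos\omega_0 t \sin^3\omega_0 t$ (B.45)
$+ r_Q^4(t) \sin^4\omega_0 t$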
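The required expressions for this are:
$\cos^4\omega_0 t = \frac{3}{8} + \frac{1}{2} \cos 2\omega_0 t + \frac{1}{8} \cos 4\omega_0 t$ (B.46)
$\cos^3\omega_0 t \sin\omega_0 t = \frac{1}{4} \sin 2\omega_0 t + \frac{1}{8} \sin 4\omega_0 t$ (B.47)
$\cos^2\omega_0 t \sin^2\omega_0 t = \frac{1}{8} - \frac{1}{8} \cos 4\omega_0 t$ (B.48)
$\cos\omega_0 t \sin^3\omega_0 t = \frac{1}{4} \sin 2\omega_0 t - \frac{1}{8} \sin 4\omega_0 t$ (B.49)
$\sin^4\omega_0 t = \frac{3}{8} - \frac{1}{2} \cos 2\omega_0 t + \frac{1}{8} \cos 4\omega_0 t$ (B.50)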
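Therefore:
$r^4(t) = \frac{3}{8} \left(r_I^4(t) + 2 r_I^2(t) r_Q^2(t) + r_Q^4(t)\right)$
$+ \frac{1}{2} \left(r_I^4(t) - r_Q^4(t)\right) \cos 2\omega_0 t$
$- r_I(t) r_Q(t) \left(r_I^2(t) + r_Q^2(t)\right) \sin 2\omega_0 t$
$+ \frac{1}{8} \left(r_I^4(t) - 6 r_I^2(t) r_Q^2(t) + r_Q^4(t)\right) \cos 4\omega_0 t$ (B.51)
$- \frac{1}{2} r_I(t) r_Q(t) \left(r_I^2(t) - r_Q^2(t)\right) \sin 4\omega_0 t$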
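For the $r^5(t)$ term in (B.36):
$r^5(t) = r_I^5(t) \cos^5\omega_0 t - 5 r_I^4(t) r_Q(t) \cos^4\omega_0 t \sin\omega_0 t$
$+ 10 r_I^3(t) r_Q^2(t) \cos^3\omega_0 t \sin^2\omega_0 t$
$- 10 r_I^2(t) r_Q^3(t) \cos^2\omega_0 t \sin^3\omega_0 t$ (B.52)
$+ 5 r_I(t) r_Q^4(t) \cos\omega_0 t \sin^4\omega_0 t$
$- r_Q^5(t) \sin^5\omega_0 t$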
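The required expressions for this are:
$\cos^5\omega_0 t = \frac{5}{8} \cos\omega_0 t + \frac{5}{16} \cos 3\omega_0 t + \frac{1}{16} \cos 5\omega_0 t$ (B.53)
$\cos^4\omega_0 t \sin\omega_0 t = \frac{1}{8} \sin\omega_0 t + \frac{3}{16} \sin 3\omega_0 t + \frac{1}{16} \sin 5\omega_0 t$ (B.54)
$\cos^3\omega_0 t \sin^2\omega_0 t = \frac{1}{8} \cos\omega_0 t - \frac{1}{16} \cos 3\omega_0 t - \frac{1}{16} \cos 5\omega_0 t$ (B.55)
$\cos^2\omega_0 t \sin^3\omega_0 t = \frac{1}{8} \sin\omega_0 t + \frac{1}{16} \sin 3\omega_0 t - \frac{1}{16} \sin 5\omega_0 t$ (B.56)
$\cos\omega_0 t \sin^4\omega_0 t = \frac{1}{8} \cos\omega_0 t - \frac{3}{16} \cos 3\omega_0 t + \frac{1}{16} \cos 5\omega_0 t$ (B.57)
$\sin^5\omega_0 t = \frac{5}{8} \sin\omega_0 t - \frac{5}{16} \sin 3\omega_0 t + \frac{1}{16} \sin 5\omega_0 t$ (B.58)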
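From where:
$r^5(t) = \frac{5}{8} \left(r_I^2(t) + r_Q^2(t)\right)^2 \left(r_I(t) \cos\omega_0 t - r_Q(t) \sin\omega_0 t\right)$
$+ \frac{5}{16} \left(r_I^4(t) - 2 r_I^2(t) r_Q^2(t) - 3 r_Q^4(t)\right) r_I(t) \cos 3\omega_0 t$
$- \frac{5}{16} \left(3 r_I^4(t) + 2 r_I^2(t) r_Q^2(t) - r_Q^4(t)\right) r_Q(t) \sin 3\omega_0 t$ (B.59)
$+ \frac{1}{16} \left(r_I^4(t) - 10 r_I^2(t) r_Q^2(t) + 5 r_Q^4(t)\right) r_I(t) \cos 5\omega_0 t$
$- \frac{1}{16} \left(5 r_I^4(t) - 10 r_I^2(t) r_Q^2(t) + r_Q^4(t)\right) r_Q(t) \sin 5\omega_0 t$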
After this, the coefficients $b_0$, $b_{c,i}$ and $b_{s,i}$ are obtained. For the DC term:
$b_0 = \frac{a_2}{2} \left(r_I^2(t) + r_Q^2(t)\right) + \frac{3 a_4}{8} \left(r_I^2(t) + r_Q^2(t)\right)^2$ (B.60)
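For the fundamental term:
$b_{c,1} = \left(1 + \frac{3 a_3}{4} \left(r_I^2(t) + r_Q^2(t)\right) + \frac{5 a_5}{8} \left(r_I^2(t) + r_Q^2(t)\right)^2\right) r_I(t)$ (B.61)
$b_{s,1} = \left(1 + \frac{3 a_3}{4} \left(r_I^2(t) + r_Q^2(t)\right) + \frac{5 a_5}{8} \left(r_I^2(t) + r_Q^2(t)\right)^2\right) r_Q(t)$ (B.62)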
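The fundamental-term coefficients can be checked by projecting the output of the polynomial non-linearity onto the carrier over one period. The sketch below does this numerically; it assumes that the in-phase and quadrature components are constant over the carrier period, and the coefficient values are hypothetical. Only the odd-order terms contribute to the fundamental, which is why the even coefficients do not appear in the gain.

```python
import math


def fundamental_gain(r_i, r_q, a3, a5):
    """Envelope-dependent gain applied to both r_I and r_Q at the fundamental."""
    power = r_i * r_i + r_q * r_q
    return 1 + 0.75 * a3 * power + 0.625 * a5 * power * power


def project_fundamental(r_i, r_q, a2, a3, a4, a5, n=4096):
    """Project z onto cos and -sin over one carrier period (r_I, r_Q held constant)."""
    b_c = b_s = 0.0
    for k in range(n):
        theta = 2 * math.pi * k / n          # carrier phase w0 * t
        r = r_i * math.cos(theta) - r_q * math.sin(theta)
        z = r + a2 * r ** 2 + a3 * r ** 3 + a4 * r ** 4 + a5 * r ** 5
        b_c += z * math.cos(theta)
        b_s -= z * math.sin(theta)
    return 2 * b_c / n, 2 * b_s / n


# Hypothetical envelope sample and non-linearity coefficients.
R_I, R_Q = 0.3, -0.2
A2, A3, A4, A5 = 0.1, -0.5, 0.05, 0.2
gain = fundamental_gain(R_I, R_Q, A3, A5)
b_c1, b_s1 = project_fundamental(R_I, R_Q, A2, A3, A4, A5)
print(gain * R_I, b_c1, gain * R_Q, b_s1)
```

A negative third-order coefficient gives a gain below one that falls as the envelope grows, which is the usual compression behavior of a front-end.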
For the second harmonic:
$b_{c,2} = \frac{a_2}{2} \left(r_I^2(t) - r_Q^2(t)\right) + \frac{a_4}{2} \left(r_I^4(t) - r_Q^4(t)\right)$ (B.63)
$b_{s,2} = a_2 r_I(t) r_Q(t) + a_4 r_I(t) r_Q(t) \left(r_I^2(t) + r_Q^2(t)\right)$ (B.64)
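For the third harmonic:
$b_{c,3} = \left(\frac{a_3}{4} \left(r_I^2(t) - 3 r_Q^2(t)\right) + \frac{5 a_5}{16} \left(r_I^4(t) - 2 r_I^2(t) r_Q^2(t) - 3 r_Q^4(t)\right)\right) r_I(t)$ (B.65)
$b_{s,3} = \left(\frac{a_3}{4} \left(3 r_I^2(t) - r_Q^2(t)\right) + \frac{5 a_5}{16} \left(3 r_I^4(t) + 2 r_I^2(t) r_Q^2(t) - r_Q^4(t)\right)\right) r_Q(t)$ (B.66)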
For the fourth harmonic:
$b_{c,4} = \frac{a_4}{8} \left(r_I^4(t) - 6 r_I^2(t) r_Q^2(t) + r_Q^4(t)\right)$ (B.67)
$b_{s,4} = \frac{a_4}{2} r_I(t) r_Q(t) \left(r_I^2(t) - r_Q^2(t)\right)$ (B.68)
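For the fifth harmonic:
$b_{c,5} = \frac{a_5}{16} \left(r_I^4(t) - 10 r_I^2(t) r_Q^2(t) + 5 r_Q^4(t)\right) r_I(t)$ (B.69)
$b_{s,5} = \frac{a_5}{16} \left(5 r_I^4(t) - 10 r_I^2(t) r_Q^2(t) + r_Q^4(t)\right) r_Q(t)$ (B.70)
B.6
Modelling I-Q unbalance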
In order to obtain the in-phase and quadrature components, the input signal is multiplied by:
In-phase $= \cos(\omega_0 t + \delta\phi_2(t))$ (B.71)
Quadrature $= -(1 + \Delta) \sin(\omega_0 t + \Delta\phi + \delta\phi_2(t))$ (B.72)
where $\Delta$ represents the amplitude unbalance, $\Delta\phi$ represents the phase unbalance
and $\delta\phi_2(t)$ the jitter in the receiver oscillator. A low pass filter (l.p.f.) after the
down-conversion is assumed to remove higher-frequency spurs. If the input to the
down-converter is
$y(t) = \mathrm{Re}\left\{y_L(t) e^{j\omega_0 t}\right\} = y_I(t) \cos\omega_0 t - y_Q(t) \sin\omega_0 t$ (B.73)
then the in-phase component is obtained as:
$I(t) = 2 \cdot \mathrm{l.p.f.}\left\{y(t) \cdot \cos(\omega_0 t + \delta\phi_2(t))\right\}$
$= y_I(t) \cos\delta\phi_2(t) + y_Q(t) \sin\delta\phi_2(t)$
and the quadrature component is obtained as:
$Q(t) = -2 \cdot \mathrm{l.p.f.}\left\{y(t) \cdot (1 + \Delta) \sin(\omega_0 t + \Delta\phi + \delta\phi_2(t))\right\}$
$= y_Q(t) (1 + \Delta) \cos(\Delta\phi + \delta\phi_2(t)) - y_I(t) (1 + \Delta) \sin(\Delta\phi + \delta\phi_2(t))$ (B.74)
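The effect of the unbalance on the down-converted components can be reproduced by mixing and averaging over one carrier period, which stands in for an ideal low pass filter. The sketch below is an illustration under stated assumptions: $y_I$, $y_Q$ and the oscillator jitter are held constant over the period, the numerical values are hypothetical, and the closed form used for the quadrature branch carries a minus sign on the $y_I$ term because that is what the mixing computation yields.

```python
import math


def iq_closed_form(y_i, y_q, amp_unbalance, phase_unbalance, jitter):
    """Down-converted I and Q written directly in terms of y_I and y_Q."""
    i_out = y_i * math.cos(jitter) + y_q * math.sin(jitter)
    angle = phase_unbalance + jitter
    q_out = (1 + amp_unbalance) * (y_q * math.cos(angle) - y_i * math.sin(angle))
    return i_out, q_out


def iq_by_mixing(y_i, y_q, amp_unbalance, phase_unbalance, jitter, n=1024):
    """Mix with the unbalanced oscillators and average over one carrier period."""
    i_acc = q_acc = 0.0
    for k in range(n):
        theta = 2 * math.pi * k / n          # carrier phase w0 * t
        y = y_i * math.cos(theta) - y_q * math.sin(theta)
        i_acc += y * math.cos(theta + jitter)
        q_acc += y * (1 + amp_unbalance) * math.sin(theta + phase_unbalance + jitter)
    return 2 * i_acc / n, -2 * q_acc / n


# Hypothetical values: 5% amplitude unbalance, 0.1 rad phase unbalance, 0.02 rad jitter.
closed = iq_closed_form(0.7, -0.4, 0.05, 0.1, 0.02)
mixed = iq_by_mixing(0.7, -0.4, 0.05, 0.1, 0.02)
print(closed, mixed)
```

With no unbalance and no jitter both functions return the original pair $(y_I, y_Q)$, so the model reduces to an ideal down-converter.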
Bibliography
[1] A.A.M. Saleh and R.A. Valenzuela, "A Statistical Model for Indoor Multipath
Propagation," IEEE Journal on Selected Areas in Communications, vol. SAC-5,
no. 2, pp. 128-137, Feb. 1987.
[2] Federal Communications Commission, Ultra-Wideband (UWB) First Report
and Order, Federal Communication Commission, Feb. 2002.
[3] J. Foerster, "Channel Modeling Sub-Committee Report Final," Tech. Rep.,
IEEE P802.15 Working Group for Wireless Personal Area Networks (WPANs),
Feb. 2002.
[4] C. Luo, M. Medard, L. Zheng, "On Approaching Wideband Capacity Using
Multitone FSK," IEEE Journal on Selected Areas in Communications, vol. 23,
no. 9, pp. 1830-1838, September 2005.
[5] T.W. Barrett, "History of UltraWideBand (UWB) Radar & Communications:
Pioneers and Innovators," in Proceedings of Progress In Electromagnetics Symposium, Cambridge MA, 2000.
[6] K. Siwiak, P. Withington, S. Phelan, "Ultra-wide band radio: the emergence
of an important new technology," in Vehicular Technology Conference 2001,
2001, vol. 2, pp. 1169-1172.
[7] S. Roy, J.R. Foerster, V.S. Somayazulu, D.G. Leeper, "Ultrawideband Radio Design: The Promise of High-Speed, Short-Range Wireless Connectivity,"
Proceedings of the IEEE, vol. 92, no. 2, pp. 295-311, February 2004.
[8] I.I. Immoreev and A.N. Sinyavin, "Features of Ultra-wideband signals radiation," in Proceedings of the IEEE Conference on Ultra Wideband Systems and
Technologies, 2002, pp. 345-349.
[9] M.Z. Win and R.A. Scholtz, "Impulse Radio: How it Works," IEEE Communications Letters, vol. 2, no. 2, pp. 36-38, February 1998.
[10] C.L. Bennett, G.F. Ross, "Time-domain electromagnetics and its application,"
Proceedings of the IEEE, vol. 66, pp. 299-318, 1978.
[11] R.N. Morey, "Geophysical survey system employing electromagnetic impulses,"
U.S. Patent 3,806,795, Apr. 1974.
[12] H.F. Harmuth, Sequency Theory, Academic Press, 1977.
[13] R.A. Scholtz, "Multiple Access with Time-Hopping Impulse Modulation," in
Proceedings of the MILCOM conference, 1993, pp. 447-450.
[14] M.Z. Win and R.A. Scholtz, "Ultra-Wide Bandwidth Time-hopping, Spread-Spectrum Impulse Radio for Wireless Multiple Access Communications," IEEE
Transactions on Communications, vol. 48, no. 4, pp. 679-691, Apr. 2000.
[15] M.Z. Win, R.A. Scholtz, L.W. Fullerton, "Time-hopping SSMA techniques for
impulse radio with an analog modulated data subcarrier," in Proceedings of
the IEEE Fourth International Symposium on Spread Spectrum Techniques and
Applications, Mainz, Germany, September 1996, pp. 359-394.
[16] M.Z. Win and R.A. Scholtz, "On the Robustness of Ultra-Wide Bandwidth
Signals in Dense Multipath Environments," IEEE Communications Letters, vol.
2, no. 2, pp. 51-53, Feb. 1998.
[17] M.Z. Win and R.A. Scholtz, "On the Energy Capture of Ultrawide Bandwidth
Signals in Dense Multipath Environments," IEEE Communications Letters,
vol. 2, no. 9, pp. 245-247, Sept. 1998.
[18] R.J. Cramer, R.A. Scholtz, M.Z. Win, "An Evaluation of the Ultra-wideband
Propagation Channel," IEEE Transactions on Antennas and Propagation, vol. 50,
no. 5, pp. 516-570, May 2002.
[19] D. Cassioli, M.Z. Win, A.F. Molisch, "The Ultra-wide Bandwidth Indoor Channel: from Statistical Model to Simulations," IEEE Journal on Selected Areas
in Communications, vol. 20, no. 6, pp. 1247-1257, August 2002.
[20] M.Z. Win and R.A. Scholtz, "Characterization of Ultra-wide Bandwidth Wireless Indoor Channel: A Communication Theoretic View," IEEE Journal on
Selected Areas in Communications, vol. 20, no. 9, pp. 1613-1627, December
2002.
[21] C.J. Le Martret and G.B. Giannakis, "All Digital PAM Impulse Radio for
Multiple Access through Frequency Selective Multipath," in Proc. of the IEEE
2000 Global Telecommunications Conference, 2000, pp. 77-81.
[22] L. Yang and G.B. Giannakis, "Ultra-wideband Communications: An Idea
Whose Time Has Come," IEEE Signal Processing Magazine, vol. 21, no. 6,
pp. 26-54, November 2004.
[23] C.J. Le Martret and G.B. Giannakis, "All Digital PPM Impulse Radio for
Multiple Access through Frequency Selective Multipath," in Proceedings of the
2000 IEEE Sensor Array and Multichannel Signal Processing Workshop, 2000,
pp. 22-26.
[24] C.N. Georghiades, "On PPM Sequences with Good Autocorrelation Properties," IEEE Transactions on Information Theory, vol. 34, no. 3, pp. 571-576,
May 1988.
[25] M. Medard and R.G. Gallager, "Bandwidth Scaling for Fading Multipath Channels," IEEE Transactions on Information Theory, vol. 48, no. 4, pp. 840-852,
April 2002.
[26] I.E. Telatar and D.N.C. Tse, "Capacity and Mutual Information of Wideband
Multipath Fading Channels," IEEE Transactions on Information Theory, vol.
46, no. 4, pp. 1384-1400, July 2000.
[27] I.D. O'Donnell and R.W. Brodersen, "An Ultra-Wideband Transceiver Architecture for Low Power, Low Rate, Wireless Systems," IEEE Transactions on
Vehicular Technology, vol. 54, no. 5, pp. 1623-1631, September 2005.
[28] T.Q.-S. Quek and M.Z. Win, "Performance Analysis of Ultrawide Bandwidth
Transmitted-reference Communications," in Proceedings of the IEEE Semiannual Vehicular Technology Conference, Milan, Italy, May 2004, vol. 3, pp.
1285-1289.
[29] T.Q.-S. Quek and M.Z. Win, "Analysis of UWB Transmitted Reference Communication Systems in Dense Multipath Channels," IEEE Journal on Selected
Areas in Communications, vol. 23, no. 9, pp. 1863-1874, September 2005.
[30] R.T. Hoctor and H.W. Tomlinson, "An Overview of Delay-Hopped,
Transmitted-Reference RF Communications," Tech. Rep., GE Research & Development Center, January 2002.
[31] J.D. Choi and W.E. Stark, "Performance of Ultra-wideband Communications
with Suboptimal Receivers in Multipath Channels," IEEE Journal on Selected
Areas in Communications, vol. 20, no. 9, pp. 1754-1766, December 2002.
[32] W.M. Gifford and M.Z. Win, "On Transmitted-Reference UWB Communications," in Proceedings of the 38th Asilomar Conference on Signals, Systems and
Computers, Pacific Grove, CA, November 2004, pp. 1526-1531, Invited Paper.
[33] T.Q.S. Quek, M.Z. Win, D. Dardari, "UWB Transmitted-Reference Signalling
Schemes - Part I: Performance Analysis," in Proceedings of the IEEE International Conference on Ultra-Wideband, Zurich, Switzerland, September 2005,
pp. 587-592.
[34] T.Q.S. Quek, M.Z. Win, D. Dardari, "UWB Transmitted-Reference Signalling
Schemes - Part 2: Narrowband Interference Analysis," in Proceedings of
the IEEE International Conference on Ultra-Wideband, Zurich, Switzerland,
September 2005, pp. 593-598.
[35] Anuj Batra et al., "TI Physical Layer Proposal for IEEE 802.15 Task Group
3a," Tech. Rep., Texas Instruments, May 2003.
[36] J. Balakrishnan, A. Batra, A. Dabak, "A Multi-band OFDM System for UWB
Communication," in Proceedings of the Conference on Ultra-Wideband Systems
and Technologies, 2003, pp. 354-358.
[37] E. Saberinia and A. Tewfik, "Pulsed and Non-pulsed OFDM Ultra Wideband
Wireless Personal Area Networks," in Proceedings of the 2003 IEEE Conference
on Ultra Wideband Systems and Technologies, November 2003, pp. 275-270.
[38] R. Roberts, "XtremeSpectrum CFP Document," Tech. Rep., Physical Layer
Submission to IEEE P802.15 Working Group for Wireless Personal Area Networks (WPANs), July 2003.
[39] A. Batra, J. Balakrishnan, A. Dabak, R. Gharpurey, P. Fontaine, J. Lin, "Time-Frequency Interleaved Orthogonal Frequency Division Multiplexing," Tech.
Rep., Physical Layer Submission to IEEE P802.15 Working Group for Wireless
Personal Area Networks (WPANs), May 2003.
[40] "IEEE 802.11a, supplement to Standard IEEE 802.11. Part 11: Wireless LAN
Medium Access Control (MAC) and Physical Layer (PHY) specifications: High-speed Physical Layer in the 5 GHz Band," Tech. Rep., IEEE, Sept. 1999.
[41] D.B. Jourdan, J.J. Deyst, M.Z. Win, N. Roy, "Monte-Carlo Localization
in Dense Multipath Environments using UWB Ranging," in Proceedings of
the IEEE International Conference on Ultra-Wideband, Zurich, Switzerland,
September 2005, pp. 314-319.
[42] J. Rabaey, A. Chandrakasan, B. Nikolic, Digital Integrated Circuits, Prentice
Hall, 2nd edition, 2002.
[43] P. Newaskar, R. Blazquez, A. Chandrakasan, "A/D Precision Requirements for
an Ultra-Wideband Radio Receiver," in Proc. of the 2002 IEEE Workshop on
SIPS, 2002, pp. 270-275.
[44] M.S. Braasch and A.J. Van Dierendonck, "GPS Receiver Architectures and
Measurements," Proceedings of the IEEE, vol. 87, no. 1, pp. 48-64, January
1999.
[45] H. Meyr and G. Ascheid, Synchronization in Digital Communications, Volume
1: Phase-Frequency-Locked Loops and Amplitude Control, Wiley Interscience,
1990.
[46] S. Iida, K. Tanaka, H. Suzuki, N. Yoshikawa, N. Shoji, B. Griffiths, D. Mellor, F.
Hayden, I. Butler, J. Chatwin, "A 3.1 to 5 GHz CMOS DSSS UWB Transceiver
for WPANs," in Proceedings of the 2005 IEEE International Solid-State Circuits
Conference, 2005, pp. 214-215.
[47] J.R. Foerster, "The Effects of Multipath Interference on the Performance of
UWB systems in an Indoor Wireless Channel," in Proc. of the 2001 IEEE
Vehicular Technology Conference, May 2001, pp. 1176-1180.
[48] S.S. Ghassemzadeh, R. Jana, C.W. Rice, W. Turin, V. Tarokh, "A Statistical
Path Loss Model for Inhome UWB Channels," in Proc. of the 2002 IEEE
UWBST, May 2002, pp. 59-64.
[49] D. Cassioli, M.Z. Win, A.F. Molisch, "A Statistical Model for the UWB Indoor
Channel," in Proc. of the 2001 IEEE Vehicular Technology Conference, May
2001, pp. 1159-1163.
[50] Vincenzo Lottici, Aldo D'Andrea, Umberto Mengali, "Channel Estimation for
Ultra-Wideband Communications," IEEE Journal on Selected Areas in Communications, vol. 20, no. 9, pp. 1638-1645, December 2002.
[51] J.G. Proakis, Digital Communications, McGraw Hill Inc, fourth edition, 2000.
[52] C. Carbonelli, U. Mengali, U. Mitra, "Synchronization and Channel Estimation
for UWB Signals," in Proceedings of the Global Telecommunications Conference,
2003, pp. 764-768.
[53] I. Maravic, J. Kusuma, M. Vetterli, "Low-Sampling Rate UWB Channel Characterization and Synchronization," Journal of Communication Networks, vol.
5, no. 4, pp. 319-327, 2002.
[54] Z. Wang and X. Yang, "Ultra wide-band communications with blind channel
estimation based on first-order statistics," in Proceedings of the International
Conference in Acoustics, Speech and Signal Processing, 2004.
[55] W. Suwansantisuk and M.Z. Win, "Fundamental Limits on Spread Spectrum
Signal Acquisition," in Proceedings of the Conference on Information Science
and Systems, Baltimore, MD, March 2005.
[56] W. Suwansantisuk, M.Z. Win, L.A.Shepp, "Properties of the Mean Acquisition
Time for Wide-Bandwidth Signals in Dense Multipath Channels," in Proceedings of the 3rd SPIE International Symposium on Fluctuation and Noise in
Communication Systems, Austin, TX, May 2005, pp. 121-135.
[57] W. Suwansantisuk and M.Z. Win, "On the Asymptotic Performance of Multi-Dwell Signal Acquisition in Dense Multipath Channels," in Proceedings of
the IEEE International Conference on Ultra-Wideband, Zurich, Switzerland,
September 2005, Invited Paper.
[58] W. Suwansantisuk and M.Z. Win, "Multipath Aided Rapid Acquisition: Optimal Search Strategies," IEEE Transactions on Information Theory, vol. tbp,
pp. tbp, 2006.
[59] G.R. Aiello and G.D. Rogerson, "Ultra-wideband wireless systems," IEEE
Microwave Magazine, vol. 4, no. 2, pp. 36-47, June 2003.
[60] M.Z. Win, G. Chrisikos, N.R. Sollenberger, "Performance of Rake Reception in
Dense Multipath Channels: Implications of Spreading Bandwidth and Selection
Diversity Order," IEEE Journal on Selected Areas in Communications, vol. 18,
no. 8, pp. 1516-1525, August 2000.
[61] M.Z. Win, G. Chrisikos, N.R. Sollenberger, "Effects of Chip Rate on Selective
Rake Combining," IEEE Communications Letters, vol. 4, no. 7, pp. 233-235,
July 2000.
[62] L. Yang and G.B. Giannakis, "A General Model and SINR Analysis of Low
Duty-Cycle UWB Access Through Multipath With Narrowband Interference
and Rake Reception," IEEE Transactions on Wireless Communications, vol.
4, no. 4, pp. 1818-1833, July 2004.
[63] D. Cassioli, M.Z. Win, F. Vatalaro, A.F. Molisch, "Performance of Low-Complexity RAKE Reception in a Realistic UWB Channel," in Proc. of the
2002 IEEE International Conference on Communications, 2002, pp. 763-767.
[64] G.D. Forney, "Maximum Likelihood Sequence Estimation of Digital Sequences
in the Presence of Intersymbol Interference," IEEE Transactions on Information Theory, vol. 18, pp. 363-378, May 1972.
[65] A. Hafeez and W.E. Stark, "Decision Feedback Sequence Estimation for Unwhitened ISI Channels with Applications to Multiuser Detection," IEEE Journal on Selected Areas in Communications, vol. 16, no. 9, pp. 1785-1795, December 1998.
[66] P.J. Black, T.H.Y. Meng, "A 1-Gb/s, Four-State, Sliding Block Viterbi Decoder," IEEE Journal of Solid State Circuits, vol. 32, pp. 797-805, June 1997.
[67] C.B. Shung, H.D. Lin, R. Cypher, P.H. Siegel, H.K. Thapar, "Area-efficient
Architectures for the Viterbi Algorithm - Part I: Theory," IEEE Transactions
on Communications, vol. 41, no. 4, pp. 636-644, April 1993.
[68] C.B. Shung, H.D. Lin, R. Cypher, P.H. Siegel, H.K. Thapar, "Area-Efficient
Architectures for the Viterbi Algorithm - Part II: Applications," IEEE Transactions on Communications, vol. 41, no. 5, pp. 802-807, May 1993.
[69] M.A. Bickerstaff, et al., "A Unified Turbo/Viterbi Channel Decoder for 3GPP
Mobile Wireless in 0.18-μm CMOS," IEEE Journal of Solid-State Circuits, vol.
37, no. 11, pp. 1555-1564, November 2002.
[70] X. Liu and M.C. Papaefthymiou, "Design of a 20-Mb/s 256-State Viterbi Decoder," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol.
11, no. 6, pp. 965-975, December 2003.
[71] E. Yeo, S.A. Augsburger, W.R. Davis, B. Nikolic, "A 500-Mb/s Soft-Output
Viterbi Decoder," IEEE Journal of Solid-State Circuits, vol. 38, no. 7, pp.
1234-1241, July 2003.
[72] A.P. Chandrakasan, S. Sheng, R.W. Brodersen, "Low-power Digital CMOS
Design," IEEE Journal of Solid State Circuits, vol. 27, no. 4, pp. 473-484,
April 1992.
[73] R. Min, M. Bhardwaj, S.H. Cho, N. Ickes, E. Shih, A. Sinha, A. Wang, A.
Chandrakasan, "Energy-Centric Enabling Technologies For Wireless Sensor
Networks," IEEE Wireless Communications, vol. 9, no. 4, pp. 28-39, August
2002.
[74] J. Bergervoet, K. Harish, G. van der Weide, D. Leenaerts, R. van de Beek, H.
Waite, Y. Zhang, S. Aggarwal, C. Razzell, R. Roovers, "An Interference Robust
Receive Chain for UWB Radio in SiGe BiCMOS," in Proceedings of the 2005
IEEE International Solid-State Circuits Conference, 2005, pp. 200-201.
[75] D. Leenaerts, R. van de Beek, G. van der Weide, J. Bergervoet, K.S. Harish, H.
Waite, Y. Zhang, C. Razzell, R. Roovers, "A SiGe BiCMOS 1 ns Fast Hopping
Frequency Synthesizer for UWB Radio," in Proceedings of the 2005 IEEE
International Solid-State Circuits Conference, 2005, pp. 202-203.
[76] H.Y. Liu, C.C. Lin, Y.W. Lin, C.C. Chung, K.L. Lin, W.C. Chang, L.H. Chen,
H.S. Chang, C.Y. Lee, "A 480Mb/s LDPC-COFDM-Based UWB Baseband
Transceiver," in Proceedings of the 2005 IEEE International Solid-State Circuits
Conference, 2005, pp. 444-445.
[77] M. Verhelst, W. Vereecken, M. Steyaert, W. Dehaene, "Architectures for Low
Power Ultra-wideband Radio Receivers in the 3.1-5GHz Band for Data Rates
< 10Mbps," in Proceedings of the ISLPED, Newport Beach, California, USA,
2004, pp. 280-285.
[78] B. Razavi, Principles of Data Conversion System Design, Wiley-IEEE Press,
1994.
[79] F.S. Lee, D. Wentzloff, A. Chandrakasan, "An Ultra-Wideband Baseband Front-End," in Digest of Papers of the 2004 Radio Frequency Integrated Circuits
Symposium, June 2004, pp. 493-496.
[80] W. Suwansantisuk, M.Z. Win, L.A. Shepp, "On the Performance of Wide-Bandwidth Signal Acquisition in Dense Multipath Channels," IEEE Transactions on Vehicular Technology, vol. 54, no. 5, pp. 1584-1594, September 2005.
[81] W. Suwansantisuk and M.Z. Win, "Optimal Search Strategies for Ultrawide
Bandwidth Signal Acquisition," in Proceedings of the IEEE International Conference on Ultra-Wideband, Zurich, Switzerland, September 2005, pp. 349-354.
[82] E.A. Homier and R.A. Scholtz, "Rapid Acquisition of Ultra-wideband signals
in the dense multipath channel," in Proceedings of the IEEE Conference on
Ultra Wideband Systems and Technologies, 2002, pp. 105-109.
[83] L. Yang and G. B. Giannakis, "Blind UWB Timing with a Dirty Template,"
in Proceedings of the International Conference in Acoustics, Speech and Signal
Processing, 2004, pp. 509-512.
[84] R. Gagliardi, J. Robbins, H. Taylor, "Acquisition Sequences in PPM Communications," IEEE Transactions on Information Theory, vol. IT-33, no. 5, pp.
738-744, September 1987.
[85] R. Blazquez, P. Newaskar, A. Chandrakasan, "Coarse Acquisition for Ultrawideband Digital Receivers," in Proc. of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, Apr. 2003, vol. 4, pp. 137-140.
[86] A.V. Oppenheim, R.W. Schafer, J.R. Buck, Discrete-Time Signal Processing,
Prentice Hall Signal Processing Series. Prentice Hall, second edition, 1999.
[87] J. Dornberg, H.S. Lee, D.A. Hodges, "Full-speed testing of A/D converters,"
IEEE Journal of Solid-State Circuits, vol. 19, no. 12, pp. 22-26, December 1984.
[88] A. Giorgetti, M. Chiani, M.Z. Win, "The Effect of Narrowband Interference on
Wideband Wireless Communication Systems," IEEE Transactions on Communications, vol. 53, no. 12, pp. 2139-2149, December 2005.
[89] D. Gerakoulis and P. Salmi, "An Interference Suppressing OFDM System for
Ultra Wide Bandwidth Radio Channels," in Proceedings of the IEEE Conference on Ultra Wideband Systems and Technologies, 2002, pp. 259-264.
[90] B. Razavi, RF Microelectronics, Communications, Engineering and Emerging
Technologies. Prentice Hall, first edition, 1998.
[91] J.A.C. Bingham, "Multicarrier Modulation for Data Transmission: An Idea
Whose Time Has Come," IEEE Communications Magazine, vol. 28, no. 5, pp.
5-14, May 1990.
[92] R. Blazquez, F.S. Lee, D. Wentzloff, P. Newaskar, J. Powell, A. Chandrakasan,
"Digital Architecture for an Ultra-wideband Radio Receiver," in Proc. of the
2003 IEEE Vehicular Technology Conference, 2003, vol. 2, pp. 1303-1307.
[93] W.R. Braun and U. Dersch, "A Physical Mobile Radio Channel Model," IEEE
Transactions on Vehicular Technology, vol. 40, no. 2, pp. 472-482, May 1991.
[94] M.Z. Win and J.H. Winters, "Analysis of Hybrid Selection/Maximal-Ratio
Combining in Rayleigh Fading," IEEE Transactions on Communications, vol.
47, no. 12, pp. 1773-1776, December 1999.
[95] M.Z. Win and J.H. Winters, "Analysis of Hybrid Selection/Maximal-Ratio
Combining in Rayleigh Fading," in Proceedings of the IEEE International Conference on Communications, Vancouver, Canada, June 1999, vol. 1, pp. 6-10.
[96] M.Z. Win and J.H. Winters, "Analysis of Hybrid Selection/Maximal-Ratio
Combining of Diversity Branches with Unequal SNR in Rayleigh Fading," in
Proceedings of the 49th Annual International Vehicular Technology Conference,
Houston, TX, May 1999, vol. 1, pp. 215-220.
[97] A. Papoulis, Probability, Random Variables, and Stochastic Processes, Electrical & Electronic Engineering. McGraw-Hill International Editions, third edition,
1991.
[98] S. Verdu, Multiuser Detection, Cambridge University Press, 1998.
[99] Q. Li and L.A. Rusch, "Multiuser Detection for DS-CDMA UWB in the Home
Environment," IEEE Journal on Selected Areas in Communications, vol. 20,
no. 9, pp. 1701-1711, December 2002.
[100] N. Kong and L.B. Milstein, "Combined Average SNR of A Generalized Diversity Selection Combining Scheme," in Proceedings of the IEEE International
Conference on Communications, June 1998, vol. 3, pp. 1556-1560.
[101] M.Z. Win and Z.A. Kostic, "Impact of Spreading Bandwidth on Rake Reception in Dense Multipath Channels," IEEE Journal on Selected Areas in
Communications, vol. 17, no. 10, pp. 1794-1806, October 1999.
[102] M.Z. Win and Z.A. Kostic, "Virtual Path Analysis of Selective Rake Receiver
in Dense Multipath Channels," IEEE Communications Letters, vol. 3, no. 11,
pp. 308-310, November 1999.
[103] W.C. Jakes, Microwave Mobile Communications, IEEE Press, Piscataway, NJ,
08855-1331, IEEE Press classic reissue edition, 1995.
[104] J. Foerster and Q. Li, "UWB Channel Modeling Contribution from Intel,"
Tech. Rep., IEEE P802.15-02/279-SG3a.
[105] M.Z. Win, "A Unified Spectral Analysis of Generalized Time-Hopping Spread-Spectrum Signals in the Presence of Timing Jitter," IEEE Journal on Selected
Areas in Communications, vol. 20, no. 9, pp. 1664-1676, December 2002.
[106] M.Z. Win, "Spectral Density of Random Time-hopping Spread-spectrum UWB
Signals," IEEE Communications Letters, vol. 6, no. 12, pp. 526-528, December
2002.
[107] A. Ridolfi and M.Z. Win, "Ultrawide Bandwidth Signals as Shot-Noise: a
Unifying Approach," IEEE Journal on Selected Areas in Communications, vol.
24, no. 4, pp. 899-905, April 2006.
[108] J. Romme and L. Piazzo, "On the Power Spectral Density of Time-Hopping
Impulse Radio," in Proceedings of the Conference on Ultra-wideband Systems
and Technologies, 2002, pp. 241-244.
[109] J. Powell and A.P. Chandrakasan, "Differential and Single Ended Elliptical
Antennas for 3.1-10.6 GHz Ultra Wideband Communication," in Proceedings
of the IEEE Antennas and Propagation Society International Symposium, June
2004.
[110] J. Powell and A.P. Chandrakasan, "Spiral Slot Antenna and Circular Disc
Monopole Antenna for 3.1-10.6 GHz Ultra Wideband Communications," in
Proceedings of the 2004 International Symposium on Antennas and Propagation,
June 2004.
[111] N. Ackerman, "A Platform for Ultra Wideband Communication Systems," M.S.
thesis, Massachusetts Institute of Technology, May 2005.
[112] F.S. Lee and A.P. Chandrakasan, "A BiCMOS Ultra-wideband 3.1-10.6GHz
Front-End," in Proceedings of the IEEE CICC, September 2005.
[113] D.D. Wentzloff and A.P. Chandrakasan, "A 3.1-10.6 GHz Ultra-wideband
Pulse-shaping Mixer," in IEEE Radio Frequency IC Symposium, June 2005.
[114] B. Ginsburg and A.P. Chandrakasan, "Dual Scalable 500MS/s, 5b Time-Interleaved SAR ADCs for UWB Applications," in Proceedings of the IEEE
CICC, 2005.