Ultra-wideband Digital Baseband

by

Raúl Blázquez

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY

May 2006

© Massachusetts Institute of Technology 2006. All rights reserved.

Author: Department of Electrical Engineering and Computer Science, May 25, 2006

Certified by: Anantha P. Chandrakasan, Professor of Electrical Engineering and Computer Science, Thesis Supervisor

Accepted by: Arthur C. Smith, Chairman, Department Committee on Graduate Students

Abstract

The FCC approved the use of ultra-wideband signals for communication purposes in February 2002 in the band from 3.1 GHz to 10.6 GHz, effectively opening 7.5 GHz of free unlicensed bandwidth. There are two main constraints on the use of this band: a maximum EIRP spectral density of -41.3 dBm/MHz and a minimum instantaneous bandwidth of 500 MHz. One of the main driving applications of this technology is high data rate communication over short distances. In this thesis two digital baseband receivers for impulse UWB have been designed. The first one was designed for baseband UWB pulses and achieves 193 kbps of wireless communication using impulses of 300 MHz bandwidth and 2% duty cycle, and was part of a system-on-a-chip. The second baseband achieves 100 Mbps using impulses of 500 MHz bandwidth in the FCC compliant band, as part of a whole UWB system.
Because of the large signal bandwidth, multipath becomes very relevant as the data rate increases into the range of hundreds of megabits per second. The current multipath model, used for the development of IEEE standard 802.15.3a, is a modified Saleh-Valenzuela model [1] in which the root mean square duration of the impulse response ranges from 5 to 25 ns. The maximum data rate in a UWB system depends on the signal to noise ratio and the multipath. The baseband assesses the quality of the channel and exposes several useful knobs that control the complexity of the signal processing it implements. This allows higher levels of the communication hierarchy to fine-tune the receiver, trading off number of operations and power dissipation against quality of service. The second baseband includes an MLSE equalizer and a Rake receiver to compensate for multipath. It has been implemented in 0.18 µm CMOS technology using a National Semiconductor process. The chip has been demonstrated in a wireless system.

Thesis Supervisor: Anantha P. Chandrakasan
Title: Full Professor

Acknowledgments

When I came to MIT, I admit that it did not occur to me that I would work on digital circuits. After all, I had moved in the previous years, slowly but surely, towards signal processing. But after talking with Professor Anantha Chandrakasan, I decided to take his class 6.374, and became interested in circuit design. It was a natural evolution: not only working on signal processing but also implementing that same signal processing. I would like to thank Anantha for recognizing this opportunity and for allowing me to pursue in his group a research project with signal processing components, while permitting me to delve deep into the circuit design field. Anantha's advice, encouragement, enthusiasm, and guidance to explore this field of research proved essential during these years to achieve the final goal.
I would like to thank him also for his patience, and for his care for the well-being of all his students, especially taking into account the size of his research group. I also had the opportunity of being his teaching assistant in 6.374, an experience enriching and rewarding in and of itself. For Anantha's example as an educator, an engineer, a scientist and a person, I consider it a privilege to have worked with him.

I would like to thank also Professors Lizhong Zheng and Moe Win, for their encouragement, advice, feedback and patience during the development of this thesis.

I would like to thank Peter Holloway and his team at National Semiconductor Inc. not only for the fabrication of the chip, but also for their patience and work in facilitating the tape-out of the second chip implemented in this thesis. Without them I would not have graduated on time.

I would like to thank Fred Lee for his continuous help in the testing of the chip, where his debugging skills, his enthusiasm, and his proficiency with the soldering iron cleared most of the obstacles. I feel lucky to have shared his friendship, his sense of humour, and his conversation, ranging from circuits to good food, and to the human and the divine in life and science, during the long hours shared in the cubicle and in the lab. I would also like to thank David Wentzloff for his support, optimism, sense of humour, and for teaching me how to solder a chip to a board.

I would like to thank Manish Bhardwaj for his advice since I met him in my first year at MIT. At that time, we were in contiguous cubicles and shared 6.241. I would like to thank him for all the moments shared, which included fruitful discussions on wireless, cinema, coding, future plans, and the ups, downs and in-betweens of grad school. Thanks also for helping me to clear the last obstacles of this work and with the flow of this thesis.

I would like to thank Vivienne Sze and David Wentzloff for proofreading this thesis.
I would also like to thank the members of the UWB group: Puneet Newaskar, Vivienne Sze, Brian Ginsburg, Johnna Powell, Nathan Ackerman, Ashutosh Bhardwaj and Kyle Gilpin. I am lucky to have been part of this team, and thanks to them I have come to appreciate and admire the beauty and difficulty involved in the different parts that make up a communication system.

I would like to thank Daniel Finchelstein, Frank Honoré, Alice Wang and Michael McIlrath for their continuous help with the quirks, manias, and Murphy's law compliance of the tools. Without their help, there would not have been tape-outs.

Many thanks to all the other members of the Digital Integrated Circuits and Systems group, both present and past: Naveen Verma, Denis Daly, Payam Lajevardi, Alex Kern, Nathan Ickes, Joyce Kwong, Yogesh Ramadass, Tao Pan, Taeg Sang Cho, Vikram Chandrasekhar, Fred Chen, Nigel Drego, Rex Min, CheeWe Ng, SeongHwan Cho, Julia Cline, Piyada Phanaphat, Theodoros Konstantakopoulos, Nisha Checka, Shamik Das, Travis Simpkins, and Eugene Shih. They have made the group a fun and interesting place to be and work.

I would like to thank Margaret Flaherty for her help with paperwork, finding rooms for meetings and the thesis defense, aligning the schedules of several professors for my committee meetings, and, in general, making sure the only challenges I had to meet were technical. I would like to thank also Debroah Hodges-Pabon for making MTL a vibrant place through socials, seminars, and other activities. I would like to thank Marilyn Pierce for her patience during these years, even when I submitted my theses at the eleventh hour.

I would like to thank the La Caixa Fellowship Program for the opportunity they gave me to pursue my research interests abroad. Their efficient management of the different stages of the fellowship makes it one of the best possible ways of starting graduate studies in an American university.
This research has also been sponsored by an Intel Fellowship, Hewlett-Packard under the HP/MIT Alliance, and the NSF.

I would like to thank also those who provided invaluable support outside the lab. Thanks to Pablo Vila, friend, colleague, roommate, for his support during all these years, and for sharing long conversations about life, wireless communications, music, and "temazos llenapistas". I would also like to thank Virginia Romero, Ismael Calleja, Ana Bravo and Luis Enrique Garcia for their friendship and support during this time, no matter the distance, the time difference and my crazy schedule every time I went back to Europe. Thanks to Susana, Clara, Emilenne, Fran, Andres, Eduardo, Karen, Juan, Ana, Parmesh, along with the rest of the people I met, befriended and got to know in Boston, who allowed me to keep a balance between life and grad studies. Thanks for being there.

I would like to thank Aidita for her unconditional support, advice and, above all, love. The trip is always more important than the destination, and I am grateful to have shared these last two years with you. This thesis has all the more value because in this time I met you and you changed my life.

Finally, I would like to thank my parents, Magdalena and Felix, for their continuous, unconditional, unrelenting love and support during all these years. Anything that I could write here would be but a pale shade of what they mean to me. This work is as much yours as mine.

Finalmente, me gustaría dar las gracias a mis padres, Magdalena y Félix, por su amor y apoyo continuo, incondicional e infalible durante todos estos años. Cualquier cosa que yo pudiera escribir sería sólo un pálido reflejo de lo que representan para mí. Este trabajo es el resultado no sólo de mi esfuerzo, sino también del vuestro.

Contents

1 Introduction
  1.1 Background
  1.2 UWB Signals
  1.3 Characteristics of UWB Signals
  1.4 UWB Applications
  1.5 Previously Used Architectures
  1.6 Signal Processing Techniques
  1.7 Power Dissipation in UWB Systems
  1.8 Thesis Contributions
2 A Baseband Processor for a Baseband UWB Transceiver
  2.1 UWB Signals
  2.2 System Trade-offs
  2.3 Architectural Choices for Clock Generation and ADC
  2.4 Digital Baseband
    2.4.1 Functionality
    2.4.2 A parallelized approach
    2.4.3 Architecture
  2.5 Performance Results
3 System Analysis for the FCC Compliant System
  3.1 Objectives of the Design
  3.2 Homodyne vs Heterodyne architecture
  3.3 Specification of the ADC
    3.3.1 Signal definition
    3.3.2 Automatic Gain Control
    3.3.3 Demodulating Architectures
    3.3.4 Simulations and Analysis
  3.4 Choice of UWB Signal
  3.5 Multipath
    3.5.1 Channel Model
    3.5.2 Data-Aided Channel Estimation
    3.5.3 Rake Receiver
    3.5.4 MLSE Equalizer
  3.6 Choice of Packet Format
  3.7 Baseband Functionality
  3.8 Non-idealities Model
  3.9 Link Budget
  3.10 Summary
4 FPGA Implementation
  4.1 Architecture of the Discrete Platform
    4.1.1 Transmitter
    4.1.2 Front-end
    4.1.3 Receiver
    4.1.4 Protocol
  4.2 Application in the Digital Baseband Design
    4.2.1 Limitations of the Digital Platform
    4.2.2 Specifications and Interfaces
    4.2.3 Architecture of the Baseband
    4.2.4 State Machine
    4.2.5 Results
  4.3 Application for Testing Multitone-FSK
    4.3.1 Signal Definition
    4.3.2 Receiver Architecture
  4.4 Conclusions
5 ASIC Implementation of a Baseband for FCC Compliant UWB
  5.1 Functionality of the Chip
  5.2 Interfaces and Clock Structure
  5.3 High-speed Clock Domain
  5.4 Correlators/Matched Filter Block
  5.5 Channel Analysis Module
  5.6 Timing Synchronization
  5.7 MLSE Equalizer
  5.8 Implementation and Results
6 Conclusions and future work
  6.1 Thesis summary
  6.2 Conclusions
  6.3 Future work
A Link budget
  A.1 Spreadsheet equations
    A.1.1 Notation
    A.1.2 Definition of the parameter K
    A.1.3 Link budget and sensitivity
    A.1.4 Extra losses due to the pulse shape
    A.1.5 Receiver constraints
    A.1.6 ADC constraints and detection
    A.1.7 Gain Specifications
    A.1.8 Noise Figure specification
B Comments on signal generation
  B.1 Defining the transmitted signal
  B.2 Jitter in the transmitter
  B.3 Channel impact
  B.4 Summary of the model
  B.5 Dealing with a complex non-linearity
  B.6 Dealing with an I-Q unbalance

List of Figures

1-1 EIRP mask approved by the FCC [2].
1-2 Intended applications.
1-3 Architecture of UWB receiver by Berkeley Wireless Research Center.
1-4 Correlator channel in a CDMA receiver.
1-5 Architecture of baseband of UWB receiver by Sony Corp.
2-1 BER as a function of the SNR (a) or SIR (b) for different ADCs.
2-2 Baseband processor block diagram.
2-3 Pd as a function of D, the relative position between the pulse and the template.
2-4 Coarse acquisition process as a Markov chain (D = Correct detection; FD = False detection).
2-5 Change of probability of detection due to a difference in frequencies between transmitter and receiver.
2-6 Correlators Architecture.
2-7 Groups of four consecutive samples required.
2-8 Block diagram of the retiming block.
2-9 Implementation of the correlation bank.
2-10 Fine tracking subsystem block diagram.
2-11 Coarse acquisition block diagram.
2-12 Single chip UWB transceiver photograph.
3-1 500 MHz bandwidth channelization with FCC compliant power spectral density.
3-2 Architectures for the receiver.
3-3 Receiver architectures for different UWB modulations.
3-4 Probability of error for the AWGN limited case, OFDM UWB.
3-5 Probability of error for the AWGN limited case, pulsed UWB.
3-6 Probability of error for the interference limited case, OFDM UWB.
3-7 Probability of error for the interference limited case, pulsed UWB.
3-8 Example of UWB BPSK baseband signal, before up-conversion.
3-9 500 MHz pulse with carrier 5 GHz. Courtesy of David Wentzloff.
3-10 Procedure to compensate for multipath.
3-11 Example of the clusters in one instance of the channels in [3].
3-12 Minimum SNR at the input to achieve a 10 dB SNR in the channel estimation as a function of the number of bits of the samples and the length of the integration. No saturation.
3-13 Minimum SNR at the input to achieve a 10 dB SNR in the channel estimation as a function of the number of bits of the samples and the length of the integration. 6 dB saturation.
3-14 Functional diagram of a Rake receiver.
3-15 Functional diagram of the Rake receiver that will be implemented in this UWB system.
3-16 Modified Rake receiver.
3-17 Losses in the modified Rake receiver as a function of the normalized threshold and the channel model.
3-18 Losses associated with the parameter LMLSE in the Viterbi demodulator.
3-19 Design of the data packet. Courtesy of V. Sze.
3-20 Required functionality of the digital baseband.
3-21 A simplified block diagram of a direct conversion front end.
3-22 Explanation of the losses due to shape of the pulse.
3-23 Minimum received power as a function of the center frequency at 10 m.
3-24 Range of the AGC.
3-25 Maximum noise figure of the receiver.
4-1 Block diagram of the discrete prototype.
4-2 Discrete prototype transmitter. Courtesy of N. Ackerman.
4-3 Discrete prototype receiver. Courtesy of Fred S. Lee.
4-4 Boards related to the ADC and baseband of the discrete prototype. Courtesy of N. Ackerman.
4-5 Losses due to misrepresentation of the channel impulse response in the discrete prototype.
4-6 Block diagram of the discrete prototype baseband.
4-7 Control Signals for the Serial to Parallel Register.
4-8 Block diagram of the basic structure for the correlators and matched filter.
4-9 Block diagram of the retiming block.
4-10 Part of the preamble of a data packet as measured in the discrete prototype, without (above) and with an interference (below).
4-11 Example of MFSK signal. Courtesy of Cheng Luo.
4-12 Architecture for demodulation of Multitone FSK [4].
5-1 Block diagram of the full transceiver.
5-2 Block diagram of the functionality of the chip implemented.
5-3 State machine implemented in the system.
5-4 Block diagram of the high speed clock domain.
5-5 Block diagram of the retiming block.
5-6 Retiming block.
5-7 Block diagram of the correlators.
5-8 Block diagram of a correlator group.
5-9 Block diagram of the minimal unit of the correlators.
5-10 Block diagram of the channel analysis subsystem.
5-11 Structure of one of the 25 components of "Threshold Check" Block.
5-12 Structure of one of the 25 components of the blocks "Threshold Comply" and "Complex Conjugation".
5-13 Block diagram of the MMSE weight estimator.
5-14 Block diagram of the Costas loop.
5-15 8-state Trellis diagram.
5-16 Locating the most probable path in an 8-state Trellis.
5-17 Block diagram of the MLSE equalizer.
5-18 Robust UWB baseband layout.
5-19 Testing board.
5-20 Interface signals when a packet has been detected.
5-21 Interface signals showing a sequence of demodulated bits.
5-22 Probability of error measures in the ASIC.
5-23 Demonstration of a QoS - Power trade-off.
5-24 Structure of the data packet.

List of Tables

2.1 Model results for a Gaussian pulse
2.2 Chip Measurements
3.1 a0 values set by AGC
3.2 Multipath Channel Models

Chapter 1

Introduction

Although the concept of ultra-wideband modulation has been known and used for several decades already [5], it is currently being revisited by the integrated circuits community as a viable high-speed, last-meter wireless link technology [6, 7]. Ultra-wideband signals, because of their large bandwidth, propagation characteristics [8], and timing definition, bring special advantages to wireless communication that make them amenable to some specific applications, while at the same time posing interesting challenges. In this chapter, I will introduce UWB signals and communications, and identify the characteristics that distinguish them from normal narrowband communications, the challenges they pose, and the state of the art in the application of UWB signals for wireless communication purposes.
1.1 Background

Although the terms "ultra-wideband" (UWB) and "impulse radio" are recent [9], impulse radio can be considered the first wireless data signal ever used: Marconi's spark-gap transmitter produced signals that fit the definition when he communicated from Lavernock Point, South Wales, to Flat Holm Island on May 13th, 1897. The spark-gap transmitter produced a signal with a frequency of approximately 500 kHz, a maximum average power of 35 kW and a peak pulse power of several tens of MW. The message received was three dots, the Morse code for the letter S.

Impulse radio was then set aside for a while in favor of narrowband communications, in which the information is encoded in the phase, the amplitude or the frequency of a carrier. Narrowband signals are easily separated using filtering and heterodyne and superheterodyne architectures. The fact that they are bandwidth limited also simplifies their control and regulation by government agencies such as the Federal Communications Commission in the United States of America.

Meanwhile, although not for communication purposes, electromagnetic impulses were used for RADAR and positioning applications. During the Second World War, the use of RADAR became widespread in the military, as did countermeasures against such systems. It was shown that the size of the smallest detail a RADAR system can resolve is inversely proportional to the bandwidth of the signals used. As the bandwidth of the signal increases, the system obtains greater detail, which not only locates the target precisely but also helps to identify its nature (using what is called a RADAR signature and pattern recognition procedures). With time, low probability of interception became more relevant, and purely impulsive signals were replaced with signals of the same bandwidth but larger duty cycle, which kept the same capabilities while reducing the probability of interception.
At the end of the 1960s, signals that could be classified as UWB appeared under the names of carrier-free, baseband, time domain, non-sinusoidal, and orthogonal function radio signals [5]. At the same time, the commercial development of sample and hold receivers for oscilloscopes at Tektronix Inc. also aided the UWB field. In 1973, the Ross and Robbins patents [5] pioneered the use of UWB signals, under these other names, in a range of applications including both communications and RADAR. These patents already include: methods for generating pulse trains; methods for modulating a pulse train; methods for switching to generate RF pulse train signals; methods for detection and receiving; and appropriately efficient antennas. It has been claimed [5] that by 1975 a UWB-like system, for communications or RADAR, could be constructed from components purchased from Tektronix. In fact, impulse RADAR systems have been commercially available since this time, for applications such as ground-, wall- and foliage-penetration, position-location, collision warning for avoidance, fluid level detection, intruder detection and vehicle RADAR measurements [10, 11, 12].

Starting in the early 1990s, Win and Scholtz revisited the use of impulses for communication purposes [13, 14, 15, 16, 17, 18, 19, 20], and impulse radio was defined [9]. Pulse position modulation was almost exclusively adopted during the initial development of UWB radios because negating ultra-short pulses was difficult to implement. It was not until the late 1990s that the name ultra-wideband and the acronym UWB became popular. By this time, pulse negation had become easier to implement, and pulse amplitude modulation attracted interest [21]. It was also in this decade that the first start-ups and companies working directly on UWB for communication purposes appeared.
In 2002, the Federal Communications Commission of the United States of America authorized the use of ultra-wideband signals for communication purposes [2] in the band from 3.1 GHz to 10.6 GHz, effectively opening 7.5 GHz of bandwidth to communication applications as long as some constraints were met. First, the minimum instantaneous bandwidth for a signal to be considered UWB is 500 MHz. The second important constraint is an EIRP mask, shown in figure 1-1, with a maximum equivalent isotropic radiated power (EIRP) spectral density of -41.3 dBm/MHz in the 3.1 GHz to 10.6 GHz band and even more stringent limits in other bands. These restrictions were intended to limit the interference of the new UWB devices with already existing services in the same frequency bands.

The main impact of this ruling came from the fact that it did not specify any particular type of modulation for the UWB signal, nor any particular use of the available bandwidth. For that reason, the definition of UWB signals started to encompass signals that do not correspond to the more traditional UWB concepts. After the FCC approved the use of UWB for communication purposes, a larger variety of approaches, suggested by different companies, appeared.

[Figure 1-1: EIRP mask approved by the FCC [2], plotted against frequency in GHz. Legend: Part 15 bound; First Report and Order.]

1.2 UWB Signals

Initially, UWB signals were defined as any signal whose bandwidth is larger than 1.5 GHz or whose bandwidth is larger than 25% of its center frequency. The way to obtain such a large bandwidth was to use impulses of very short, sub-nanosecond duration. The impulses used were initially [16] either the Gaussian pulse, the Gaussian monocycle (first derivative of the Gaussian pulse) or the second derivative of the Gaussian pulse.
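As an illustration of the pulse shapes and of the early bandwidth-based definition just described, the following Python sketch evaluates a Gaussian pulse and its first two derivatives, and checks a band against the early UWB definition. It is only a sketch under stated assumptions: the function names, the width parameter sigma, the unit-amplitude normalization, and the choice of the midpoint of the band as the center frequency are all hypothetical and are not taken from the thesis or from [16].

```python
import math


def gaussian_pulse(t, sigma):
    """Unit-amplitude Gaussian pulse; sigma (seconds) sets the width (assumed form)."""
    return math.exp(-t ** 2 / (2 * sigma ** 2))


def gaussian_monocycle(t, sigma):
    """First time derivative of gaussian_pulse."""
    return -t / sigma ** 2 * gaussian_pulse(t, sigma)


def gaussian_second_derivative(t, sigma):
    """Second time derivative of gaussian_pulse."""
    return (t ** 2 / sigma ** 4 - 1 / sigma ** 2) * gaussian_pulse(t, sigma)


def is_uwb_early_definition(f_low_hz, f_high_hz):
    """Early definition: bandwidth > 1.5 GHz or > 25% of the center frequency.

    Assumption: the center frequency is the midpoint of the band edges.
    """
    bandwidth = f_high_hz - f_low_hz
    center = (f_high_hz + f_low_hz) / 2
    return bandwidth > 1.5e9 or bandwidth > 0.25 * center


if __name__ == "__main__":
    sigma = 0.1e-9  # hypothetical sub-nanosecond width
    for t_ns in (-0.2, -0.1, 0.0, 0.1, 0.2):
        t = t_ns * 1e-9
        print(t_ns, gaussian_pulse(t, sigma), gaussian_monocycle(t, sigma),
              gaussian_second_derivative(t, sigma))
    print(is_uwb_early_definition(3.1e9, 10.6e9))
```

Smaller values of sigma give shorter pulses and therefore wider spectra, which is the mechanism by which sub-nanosecond impulses reach the bandwidths quoted above.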
UWB signals for communication purposes are currently restricted by the FCC to the band between 3.1 GHz and 10.6 GHz, a minimum bandwidth of 500 MHz, and a maximum equivalent isotropic radiated power (EIRP) spectral density of -41.3 dBm/MHz [2].

In narrowband signals the information is encoded in the phase, the frequency or the amplitude of a sinusoid. Although initially most of the work in UWB was done on pulse position modulation (PPM) [17], in recent years different modulation schemes have been explored:

* Pulse position modulation - PPM [9, 17, 18]. In this case, the signal follows a time-hopping format. Assuming there is more than one transmitter, the signal sent by the kth transmitter is:

\[
s^{(k)}\left(t^{(k)}\right) = \sum_{j=-\infty}^{\infty} w_{tr}\left(t^{(k)} - jT_f - c_j^{(k)} T_c - d_{\lfloor j/N_s \rfloor}^{(k)}\right) \tag{1.1}
\]

where $t^{(k)}$ is the kth transmitter's clock time and $T_f$ is the pulse repetition time. This modulation scheme has been shown to perform asymptotically better than direct sequence CDMA in a multipath environment [19, 20]. Its drawback is the additional complexity required in the demodulator. PPM was almost exclusively adopted in the early development of UWB radios because negating ultra-short pulses was difficult to implement. Another modulation scheme that does not require pulse negation is on-off keying (OOK), where the symbol "1" is represented by transmitting a pulse, and "0" by transmitting nothing.

* Pulse amplitude modulation - PAM [15]. The signal follows a scheme close to that of a direct sequence code division multiple access (DS-CDMA) signal:

\[
s^{(k)}\left(t^{(k)}\right) = \sum_{j=-\infty}^{\infty} \sum_{i=0}^{N_s-1} b_j^{(k)} c_i^{(k)} w_{tr}\left(t - jN_sT_f - iT_f\right) \tag{1.2}
\]

where $b_j^{(k)}$ represents the sequence of symbols and $c_i^{(k)}$ a pseudorandom sequence. The difference between DS-CDMA and this modulation scheme as applied to UWB is that the duty cycle of the waveform $w_{tr}(t)$ is small.
Although the bound on its capacity is lower than that of the PPM scheme, the lower complexity of the receiver and the 3 dB advantage over PPM for binary modulation make it amenable to practical implementation [21]. A special case of PAM is binary phase-shift keying (BPSK), or antipodal modulation. This kind of modulation has been found to be asymptotically inefficient for large bandwidths [19]. On the other hand, the transceivers associated with this kind of modulation are less complex, and synchronization to this kind of signal is straightforward.

* Hybrid Direct-Sequence/Time-Hopping-CDMA (DS/TH-CDMA) modulation. In this case the signal is represented as:

\[
s^{(k)}\left(t^{(k)}\right) = \sum_{j=-\infty}^{\infty} \sum_{i=0}^{N_s-1} b_j^{(k)} c_i^{(k)} w_{tr}\left(t^{(k)} - jT_f - c_j^{(k)} T_c - d_{\lfloor j/N_s \rfloor}^{(k)}\right) \tag{1.3}
\]

This scheme has more degrees of freedom than the previous two. It is possible to approach the capacity levels obtained by PPM with a lower receiver complexity than that of the PPM scheme.

* Transmitted reference UWB [22, 23, 24, 25]. In this case, a reference pulse is sent before each information pulse, which allows a very simple demodulation process at the cost of 3 dB of SNR. The signal can be described as:

\[
s^{(k)}(t) = \sum_{i} \left[ b_r^{(k)}\left(t - iN_sT_f\right) + d_i^{(k)} b_d^{(k)}\left(t - iN_sT_f\right) \right] \tag{1.4}
\]

where $b_r^{(k)}(t)$ and $b_d^{(k)}(t)$ represent the reference pulses and the data pulses respectively:

\[
b_r^{(k)}(t) = \sum_{j=0}^{N_s/2-1} a_j^{(k)} p\left(t - j2T_f - c_j^{(k)} T_p\right) \tag{1.5}
\]

\[
b_d^{(k)}(t) = \sum_{j=0}^{N_s/2-1} a_j^{(k)} p\left(t - j2T_f - c_j^{(k)} T_p - T_r\right) \tag{1.6}
\]

Transmitted-reference (TR) signaling, in conjunction with an autocorrelation receiver, offers a low-complexity alternative to Rake reception. Further information on this modulation scheme can be found in [26, 27, 28].

Other schemes that have been reported are orthogonal waveform and block orthogonal modulation schemes. Due to the redefinition of ultra-wideband since the FCC ruling, a more varied set of non-impulsive modulations has been considered, including OFDM signals [29, 30, 31], as well as other impulse UWB modulations [32, 33].
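To make the pulse amplitude modulation format of equation (1.2) more concrete, the following Python sketch builds a sampled, low duty cycle pulse train for a single transmitter and recovers the symbols by correlating with the pulse and the pseudorandom code. It is a minimal sketch under simplifying assumptions that are not part of the thesis: time is discretized in samples, frame_len stands for the frame time T_f expressed in samples, the channel is ideal and noise-free, the receiver knows the frame timing exactly, and all function and variable names are hypothetical.

```python
def pam_transmit(symbols, code, pulse, frame_len):
    """Sampled version of equation (1.2) for one transmitter (illustrative only).

    symbols: sequence b_j of +1/-1 values
    code:    pseudorandom sequence c_i of +1/-1 values, length N_s
    pulse:   samples of the transmitted waveform, shorter than one frame
    """
    if len(pulse) > frame_len:
        raise ValueError("pulse must fit inside one frame")
    n_s = len(code)
    signal = [0.0] * (len(symbols) * n_s * frame_len)
    for j, b in enumerate(symbols):
        for i, c in enumerate(code):
            start = (j * n_s + i) * frame_len  # delay j*N_s*T_f + i*T_f in samples
            for n, p in enumerate(pulse):
                signal[start + n] += b * c * p
    return signal


def pam_receive(signal, code, pulse, frame_len):
    """Correlate each frame with the pulse, despread with the code, take the sign."""
    n_s = len(code)
    n_symbols = len(signal) // (n_s * frame_len)
    decisions = []
    for j in range(n_symbols):
        statistic = 0.0
        for i, c in enumerate(code):
            start = (j * n_s + i) * frame_len
            statistic += c * sum(signal[start + n] * p for n, p in enumerate(pulse))
        decisions.append(1 if statistic >= 0 else -1)
    return decisions


if __name__ == "__main__":
    pulse = [0.5, 1.0, -1.0, -0.5]  # hypothetical zero-mean pulse
    code = [1, -1, 1, 1]            # hypothetical pseudorandom sequence, N_s = 4
    symbols = [1, -1, -1, 1]
    tx = pam_transmit(symbols, code, pulse, frame_len=50)
    print(pam_receive(tx, code, pulse, frame_len=50))
```

With a 4-sample pulse in a 50-sample frame, most of the transmitted samples are zero, which is the low duty cycle property that distinguishes this format from conventional DS-CDMA.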
Although some of their relevant characteristics will be analyzed in chapter 3, a thorough study of all the possible UWB modulations exceeds the scope of this thesis. In this thesis, we will focus on impulse UWB systems; details on the kind of modulation used and the reasons for it are given in the following chapters. The main reasons for this decision will be expanded in chapter 3, but they hinge upon some of the challenges that a transceiver of these characteristics must meet.

1.3 Characteristics of UWB Signals

The main characteristics of UWB signals are associated with their bandwidth, which is at least an order of magnitude larger than that of other signals used for communications. Current wideband standards consider 20 MHz signals (802.11a [34]) or narrower. UWB signals promise large data rates, low probability of interception, and the capability of estimating the distance between transceivers with a precision as good as a few centimeters. Further claims are their resilience to multipath, fading and narrowband interferers. High data rate UWB transceivers are dominated by a digital baseband that performs most of the required functionality.

The Shannon capacity equation states that:

\[
C = BW \log_2\left(1 + \frac{S}{N}\right) \tag{1.8}
\]

where BW represents the bandwidth of the signal, S represents its power and N represents the noise power in the same bandwidth. This expression shows that capacity grows linearly with bandwidth but only logarithmically with signal power, making UWB amenable to large data rates. The application of this equation is, on the other hand, limited to single user communications in an AWGN channel.

In any receiver it is possible to detect and separate echoes of the signal as long as they arrive at the receiver with delay differences of the order of magnitude of the duration of the impulses. When they arrive closer together than this, they combine, and may add either constructively or destructively.
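The resolvability condition described above can be illustrated with a small numerical experiment. The Python sketch below adds a pulse to a delayed echo of itself and reports the energy of the sum. It is a toy model under assumptions that are not taken from the thesis: a two-path channel, integer sample delays, an arbitrary 4-sample oscillating pulse, and hypothetical function names.

```python
def pulse_energy(samples):
    """Energy of a sampled waveform (sum of squared samples)."""
    return sum(x * x for x in samples)


def two_path_signal(pulse, delay, gain=1.0):
    """Direct path plus one echo delayed by `delay` samples and scaled by `gain`."""
    out = [0.0] * (len(pulse) + delay)
    for n, p in enumerate(pulse):
        out[n] += p                  # direct path
        out[n + delay] += gain * p   # echo
    return out


if __name__ == "__main__":
    pulse = [1.0, -1.0, 1.0, -1.0]  # hypothetical pulse, energy 4
    for delay in (0, 1, 2, 3, 4, 7):
        print(delay, pulse_energy(two_path_signal(pulse, delay)))
    # delay 0 -> 16.0 (fully constructive)
    # delay 1 -> 2.0  (mostly destructive)
    # delay 2 -> 12.0 (partly constructive)
    # delay 3 -> 6.0
    # delay 4 -> 8.0  (echoes no longer overlap: energies simply add)
    # delay 7 -> 8.0
```

Once the delay reaches the pulse duration, the received energy no longer depends on the exact delay, so a receiver that can separate the two echoes sees a stable total energy instead of a delay-dependent sum.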
The phenomenon in which several echoes of the same signal arrive close enough to combine is known as fading, and it is a purely narrowband phenomenon. Larger bandwidths allow better timing resolution and, in the case of multipath, the possibility of separating the echoes that arrive at the receiver. Under these conditions, it is feasible to use a Rake receiver to gather the energy from these echoes, obtaining a diversity gain from a situation that would have caused fading in a narrowband setting.

Since the distance between transmitter and receiver is proportional to the time delay between the instant the signal is transmitted and the instant it is demodulated, it is possible in any communication system to measure the distance between transmitter and receiver. The variance of the time estimate is inversely proportional to the bandwidth of the transmitted signal. For a bandwidth of 1 GHz, time delays with a difference of 1 ns can be distinguished directly from the received bits (if we are using a BPSK modulation). A delay of 1 ns, taking into account only a direct path, is equivalent to a distance of 30 cm, allowing very good locating properties in UWB transceivers. UWB ranging has been studied in [35].

The low probability of interception (LPI) stems from the fact that, to effectively intercept an unknown signal, a complexity at least equal in order of magnitude to that of the intended receiver is needed. The complexity of the interceptor would in general grow with the length of the pseudorandom sequences used to randomize the transmitted signal and with its bandwidth. The communication capabilities of any signal depend on its average power, while the possibility of effectively intercepting it depends on its peak power.
Since bandwidth spreading keeps the average power constant (and with it the capability of transmitting information) while reducing the peak power (and with it the probability of interception), UWB signals can be built with a low probability of interception.

The tolerance to powerful narrowband interferers stems from the fact that a narrowband interferer is filtered out by the use of a filter matched to the input signal. For that reason, even when the power in the band of the interferer is completely dominated by the interferer, the UWB receiver is able to de-correlate the interferer. This is the same effect that has already been observed in signals such as direct sequence code division multiple access signals.

The simplicity of the transceiver is associated with understanding UWB signals as baseband signals. For this reason, it was assumed that it is possible to design a transceiver whose front-ends are greatly simplified compared to those of conventional narrowband communications. The signal is generated in the transmitter without the need to up-convert it. The transmitter therefore lacks a mixer and a carrier generator and, on certain occasions, the digital part drives the antenna directly. On the receiver side, the front-end does not require down-conversion: it lacks a mixer, and the whole band is amenable to sampling using an ADC. The whole signal processing may then be performed in the digital domain. This implies lower cost, lower power, ease of design and most of the associated benefits of CMOS technology scaling [36]. Digital architectures were found to outperform analog approaches [37]. Furthermore, they allow for considerable flexibility: a single receiver may support different modulation schemes, bit rates, qualities of service and operating ranges, and change these parameters dynamically.

There are limitations to these claims.
Equation (1.8) assumes that the signal is transmitted in the presence of AWGN only. Multipath, fading, multiple users and other interferers are not considered. The linearity of the receiver, determined by the RF front-end and the analog-to-digital converter, limits the performance in other respects, such as the possibility of using a Rake receiver to compensate for the multipath and the resilience to narrowband interferers. As long as the transmitted signal or the additive white Gaussian noise (AWGN) is the dominant signal in the front-end, the performance degrades gracefully and can even be extrapolated from that of the 1-bit ADC. If, on the other hand, the receiver is captured by a narrowband interferer, the performance of the system degrades sharply, as will be shown in the following chapters.

Regarding the locating capabilities, the problems encountered by UWB systems are largely determined by the complex multipath environment. These problems have already been explored in every other location system based on triangulation, where the direct signal may be highly attenuated (by the presence of a wall) while echoes that arrive by more convoluted routes appear stronger. Since UWB systems are conceived for indoor use, these problems may limit the use of the location capabilities unless the rooms are carefully modeled.

1.4 UWB Applications

Although it can be argued that UWB is amenable to any communication application, there are currently two main drivers for this technology, as shown in Figure 1-2. A main limitation of using larger bandwidths is the attenuation of the signal. Part 15 of the FCC rules limits either the reach of the transmitted signal to very short distances or the data rate to very low values. In any case, UWB communication is expected to be used in many consumer electronics products in the near future.
Figure 1-2 shows the current trends in applications for UWB communications. On one side there are very high data rate, very short distance applications such as wireless personal area networks (WPAN). Possible applications are varied, from connecting peripherals to a computer (replacing in this way Bluetooth in PC architectures) to communicating wirelessly from a DVD player to either a flat screen or a sophisticated sound system. The IEEE 802.15.3a standard group specified different modes of transmission for various ranges: 110 Mbps at 10 m, 220 Mbps at 4 m, and an optional mode of 480 Mbps at 1 m. Two proposals met these criteria. Multiband OFDM, backed by a consortium of more than 60 corporations, is an extension of the standards 802.11a and 802.16e to a larger bandwidth that retains most of their other characteristics [33]. The proposal presented by Motorola is based on CDMA M-BOK (M-ary bi-orthogonal keying) signals [32]. After several years of stalemate, the standardization group disbanded without producing a standard.

On the other hand, at lower data rates, on the order of 100 kbps and less, there is a large space of applications that includes RFID for inventory control and similar applications. In this case, since a lower data rate is used, much larger distances might be achieved. In addition, the locating capabilities of UWB signals add to the value of these applications. IEEE standard group 802.15.4a is currently developing a standard for these applications.

Figure 1-2: Intended applications.

1.5 Previously Used Architectures

This section shows the architecture of three receivers, some of whose characteristics we will borrow. First, the Berkeley Wireless Research Center proposal uses a high-speed ADC right at the output of the LNA and performs all the signal processing in the digital domain.
Second, as a paradigm of a broadband system, a CDMA receiver shares characteristics with the UWB system, and part of the intuition obtained there can be applied. Finally, I will point out some of the characteristics of an FCC-compliant impulse UWB transceiver that has been published at the International Solid-State Circuits Conference (ISSCC).

In 2005, an impulse ultra-wideband transceiver was presented in [21]. Its main contribution is that it is a traditionally conceived baseband UWB transceiver. The system focuses on the band from 0 to 960 MHz and on applications that require low data rates, such as sensor networks. This transceiver uses binary antipodal modulation, and its architecture is shown in Figure 1-3. The system implements a front-end with no mixers. A 1-bit ADC is used after the low-noise amplifier and the variable gain amplifier. The chosen resolution of 1 bit allows a sharp reduction in the power dissipation of the ADC and avoids the need for an automatic gain control. On the other hand, as shown in [38], this transceiver is easily captured by in-band narrowband interferers.

Figure 1-3: Architecture of UWB receiver by Berkeley Wireless Research Center.

The whole signal processing is performed in the digital domain, and it uses extensive digital correlation. Several parameters, such as the shape of the pulse, the length of the code and even the use of PPM or BPSK, can be changed seamlessly as the receiver works. The timing control is partially performed in the digital domain, since the clocks that perform the sampling are directly controlled from the digital baseband. This kind of timing loop is no longer necessary, as has been proved in multiple implementations of other broadband systems. This system supports low communication rates (approximately 100 kbps) and ranging capabilities over short distances (approximately 10 m).
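As an illustration of why a 1-bit converter followed by digital correlation can still recover the data, the following minimal sketch quantizes noisy antipodal samples to their sign and correlates them with a known code. Everything in it is an illustrative assumption rather than a description of the receiver in [21]: the random code of 1000 chips, the amplitude and noise level, and the function names `one_bit_adc` and `correlate_bit`.

```python
import random


def one_bit_adc(sample):
    """Quantize a sample to a single bit, represented as +1 or -1."""
    return 1 if sample >= 0.0 else -1


def correlate_bit(samples, code):
    """Correlate 1-bit samples with the spreading code and decide the bit sign."""
    total = sum(one_bit_adc(s) * c for s, c in zip(samples, code))
    return (1 if total >= 0 else -1), total


rng = random.Random(0)                       # fixed seed for repeatability
code = [rng.choice((-1, 1)) for _ in range(1000)]
bit = -1                                     # transmitted antipodal symbol
amplitude, sigma = 0.5, 1.0                  # per-sample SNR of about -6 dB

# Each received sample is the spread symbol plus Gaussian noise.
received = [amplitude * bit * c + rng.gauss(0.0, sigma) for c in code]
decision, total = correlate_bit(received, code)
print(decision, total)
```

Each individual 1-bit sample is unreliable at this noise level, but the correlation accumulates many of them, so the sign of the sum follows the transmitted bit.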
Since the authorization of UWB by the FCC for communication purposes, the restriction on using the spectrum below 3.1 GHz for high-speed communications has changed the approach to UWB systems, making their characteristics more conventional and closer to those of prior wideband systems. Concretely, the claim that a much simpler RF front-end is required may no longer apply as before, since down-conversion helps to relax the specifications of the ADC and the baseband. The architecture of the baseband then has several similarities with standard CDMA systems, the main difference being that the duty cycle of the impulse UWB signal is smaller than 100%. For that reason, the architecture of a UWB transceiver borrows important characteristics from classic CDMA transceivers. These systems have been used extensively since the 1980s in a wide range of applications, from radar to communications to positioning (Global Positioning System, GPS [39]). From CDMA transceivers we will borrow the acquisition, synchronization and tracking algorithms.

Figure 1-4 shows the different parts of the baseband of a CDMA receiver, from the antenna to the demodulation channel. The two most important characteristics of this receiver are that almost no feedback between the digital and the analog part is needed (only the automatic gain control), and that the synchronization process has a part that is hardwired (and performs the correlations) and a part that is programmable and can be changed to adapt it to the current situation of the receiver. Since the impulse UWB modulation can be interpreted as a direct-sequence code-division multiplexed signal, some of these techniques for time synchronization may be adapted to a larger bandwidth and to different data rates and duty cycles.

Figure 1-4: Correlator channel in a CDMA receiver.

This architecture receives the samples of the signal at an intermediate frequency.
In order to recover the data signal from the CDMA signal, it is necessary to perform the last frequency down-conversion by multiplying the incoming signal with the carrier, and to correlate with the pseudorandom code. Both are locally generated signals that need to be properly synchronized. Two tasks must then be performed:

* Carrier synchronization: A Phase Locked Loop (PLL) does not have, by itself, a wide enough pull-in range to lock onto the signal, while a Frequency Locked Loop (FLL), which is able to lock onto signals with a variety of center frequencies, is too noisy to track the signal properly once lock has been achieved. The solution is easy because we are in the digital domain and part of the loop is programmable. In the block diagram of Figure 1-4, only the correlators (integrate and dump blocks, plus the multipliers before them) and the code and carrier generators are hardwired. The loop filter and, in general, all decisions related to the data coming from the integrate and dump blocks are controlled at low frequency through software. That way, different situations can be detected, an FLL with a large pull-in range or a Costas loop (I-Q version of the classical PLL [40]) with a good noise response can be used, and the functionality can be changed on the fly.

Figure 1-5: Architecture of baseband of UWB receiver by Sony Corp.

* Code synchronization: The code is also affected by the Doppler effect but, as it is a lower frequency signal, the effect is smaller. It is more important in this case to align the chips of the incoming signal with those of the locally generated code. Due to the autocorrelation properties of the pseudonoise code, a misalignment larger than half a chip causes the signal to be lost almost completely. A linear procedure in the form of a Delay Locked Loop (DLL) provides a very good noise bandwidth, but it is ineffective at the beginning of the search, because the procedure is only linear within half a chip of perfect alignment.
In order to bring the local generator within half a chip of the code in the incoming signal, a coarse (non-linear) synchronization algorithm must be used. As in the carrier synchronization, the loop is closed by software and is programmable.

Several UWB transceivers were presented at ISSCC 2005 in San Francisco. Although most of the systems presented were associated with MB-OFDM proposals, the system presented by Sony and Mixed Signal Systems was a 3.1 to 5 GHz CMOS DSSS UWB transceiver [41]. It was FCC compliant and transmitted information spread with a chip rate of 1 Gchip/s in the baseband block. The impulses are further shaped to lower the power density at 3.1 GHz, to increase the total transmitted power by flattening the spectrum of the transmitted signal, and to pre-equalize the waveform of the transmit signal for the RX filter characteristic. As part of this transceiver, a baseband block to process the samples was included. The baseband receives 2-bit samples from the ADC at 1 GSample/s. The synchronization to both the phase of the carrier and the spreading code is performed by controlling the phase of the ADC clock. Figure 1-5 shows the block diagram of the proposed baseband in this transceiver. This transceiver was implemented in a 0.18 µm CMOS process and consumes 105 mW in transmit mode and 280 mW in receive mode from a 1.8 V supply.

1.6 Signal Processing Techniques

The channel allocated to UWB communication signals is impaired by severe multipath and in-band interferers. Regarding multipath, it has already been shown that the UWB signal does not suffer fading, requiring little fading margin to guarantee reliable communications [42]. There are several comprehensive studies on the statistical properties of the UWB indoor wireless channel [43, 44, 45, 46]. The IEEE 802.15.3a group chose the multipath model presented in [3].
It is a Saleh-Valenzuela [1] model with two modifications: a log-normal distribution is used instead of a Rayleigh distribution for the multipath gain magnitude, and independent fading is assumed for each cluster as well as for each ray within the cluster.

In order to compensate for the effects of multipath in the receiver, the channel impulse response will be estimated and this information will be used in a Rake receiver and in an MLSE detector. In the following paragraphs some known results in these areas are presented.

Channel estimation is critical in the context of narrowband and spread spectrum systems. The procedures already developed for DS-CDMA can be adapted to UWB systems [16], although the high sample rates required usually motivate the search for alternatives. Some impulse response estimators developed in [47, 48] are based on the maximum likelihood (ML) criterion. The problem of channel parameter estimation in UWB communications has also been addressed in [49], in this case in the context of the signal energy captured by Rake receivers as a function of the number of Rake fingers. [47] looks at both data-aided (DA, in which a training sequence is used) and non-data-aided (NDA) estimation. In both cases, the objective of the algorithms is to separate each component of the multipath, estimating its delay (with respect to a reference initial time of arrival) and the attenuation associated with that path. The channel is assumed to have unlimited bandwidth, and for that reason the complexity of the algorithms grows without bound as more multipath components are considered. They also analyze the characteristics of the transceiver taking into account the presence of several users transmitting at the same time. The system developed here has been conceived as a time-division multiple access (TDMA) system, so the problem of multiuser detection will not be addressed, and the matched filter [50] may be considered the optimum receiver.
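The data-aided idea described above, estimating the delay and gain of each path from a known training sequence, can be sketched with a simple correlation estimator. This is not the ML estimator of [47, 48]; it is a minimal example under stated assumptions: a periodic length-31 maximal-length training sequence (generated from the polynomial x^5 + x^2 + 1), a noiseless channel, and hypothetical tap delays and gains. All function names are illustrative.

```python
def m_sequence(length=31):
    """Maximal-length sequence from x^5 + x^2 + 1, mapped to +1/-1."""
    bits = [1, 0, 0, 0, 0]
    while len(bits) < length:
        bits.append(bits[-5] ^ bits[-3])     # s[n+5] = s[n] XOR s[n+2]
    return [1 if b else -1 for b in bits]


def channel_output(training, taps):
    """Circularly convolve the periodic training sequence with the channel taps."""
    n = len(training)
    return [sum(g * training[(i - d) % n] for d, g in taps.items())
            for i in range(n)]


def estimate_channel(received, training):
    """Correlate the received samples with every circular shift of the training."""
    n = len(training)
    return [sum(received[i] * training[(i - d) % n] for i in range(n)) / n
            for d in range(n)]


training = m_sequence()
true_taps = {0: 1.0, 3: 0.5, 7: -0.25}       # hypothetical delays (samples) and gains
estimate = estimate_channel(channel_output(training, true_taps), training)

# Keep the three strongest estimated paths, listed by increasing delay.
by_strength = sorted(range(len(estimate)), key=lambda d: abs(estimate[d]), reverse=True)
strongest = sorted(by_strength[:3])
print(strongest, [round(estimate[d], 3) for d in strongest])
```

The estimated gains are slightly biased because the off-peak periodic autocorrelation of the training sequence is small but not zero; selecting only the strongest paths is also the kind of decision a Rake receiver with a limited number of fingers has to make.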
Joint timing synchronization and channel estimation has recently been pursued in [51], in this case using least squares (LS) estimates of both the timing offset and the channel impulse response, assuming Nyquist sampling of the baseband signal. Both sub-Nyquist schemes [52] and FFT-based approaches [53] to the channel estimation problem are also present in the literature. Sub-Nyquist schemes normally trade lower complexity for a larger minimum signal-to-noise ratio, since only a subband of the total signal is used and the energy that falls outside this bandwidth is not taken into account. Spread spectrum signal acquisition has been studied theoretically in [54, 55, 56, 57].

Since the bandwidth of UWB signals allows the separation and characterization of a large number of the components of the received channel, it is natural to use a Rake receiver [50] to compensate for the multipath and also to take advantage of the multipath diversity in the presence of obstacles to boost the SNR [16, 58]. It has already been indicated in [19] that, as the number of multipath components of the channel increases, the amount of energy used for channel estimation grows, and the capacity of the channel goes asymptotically to zero. The use of Rake receivers for UWB signals has been explored in [47, 59, 60, 48, 61, 62]. Concretely, [48] presents a comprehensive analysis of several different implementations of the Rake receiver as more or less complexity is available, and introduces the terms all-Rake (ARake) and selective-Rake (SRake). As we will see in chapter 3, we will use an intermediate solution, a partial Rake receiver [59].

In order to combat the intersymbol interference (ISI) that arises when the symbol duration is shorter than the duration of the channel impulse response, it is necessary to use a maximum likelihood sequence estimator (MLSE) [63].
Several of the possible procedures and approximations to the perfect MLSE receiver were explained in [64]. Since this kind of receiver has been applied to convolutional codes, its architecture is already well known, and some high-performance Viterbi decoders have been reported in the literature [65, 66, 67, 68, 69, 70].

Although for design purposes we will take into account the presence of narrowband interferers and how they degrade the performance of the system, we will not take any action to reduce their impact beyond detecting the presence of an in-band interferer.

1.7 Power Dissipation in UWB Systems

Recent years have seen the importance of power awareness in digital systems increase. The recent development of wideband and ultra-wideband wireless systems, in which sophisticated signal processing and coding techniques allow the signal to be recovered under stringent conditions of noise and interference, has contributed to this trend. As the bandwidth of the signal is increased, frequency diversity can be used to boost the signal-to-noise ratio in the receiver. It has also been a noticeable trend that the transmitted power is no longer a large percentage of the total power of the system, since it is outweighed by signal processing and by bias currents in the front-end. A larger percentage of the energy is devoted to signal processing and the digital parts than to the rest of the system. Although power scales down more easily with technology in the digital domain than in the analog domain, this trend is being slowed by the fact that leakage increases the current density that is continuously drawn from the power supply [71].

Power awareness in communications systems is defined [72] as the awareness of the exact performance demands of the user and the environment. A power-aware system consumes just enough energy to achieve the desired level of performance, and not one decibel, byte or hertz more.
Power-aware systems exhibit this characteristic at all levels of the system hierarchy. Energy trade-offs are enabled at the circuit level and exploited at the algorithm level. It will be shown in this thesis that, for the system to become power aware, it must incorporate techniques for sensing the parameters that allow the signal processing to be adapted intelligently to the current environment and user requests. The ability to trade off performance for energy savings within the node, together with collaborative processing among nodes, reduces the overall energy dissipated in the network. Energy inefficiencies in the system must be confronted and eliminated.

The adverse conditions of the UWB channel require sophisticated signal processing in any of the proposals to ensure the modes of transmission required by IEEE 802.15.3a. Therefore, the digital baseband of such a transceiver consumes a significant percentage of its total power dissipation. Some UWB transceivers, along with their power levels, have already been reported in the literature:

* RF front-end: 117.5 mW [73], implemented using SiGe BiCMOS 0.25 µm technology.
* Clock and carrier generator: 73.44 mW [74], implemented using SiGe BiCMOS 0.25 µm technology.
* Digital baseband: 523 mW [75]. This system includes a complex low-density parity-check (LDPC) code demodulator, which constitutes its most important contribution.
* A pulsed UWB system [41], based on DSSS and implemented in 0.18 µm CMOS technology, consumes 280 mW.

It was also estimated in the MBOA white paper that an MBOA transceiver in a 90 nm CMOS process would consume 93 mW in transmission and 169 mW in reception. For lower data rates, some architectures were compared in [76]. Taking this into account, quality of service may be traded off against complexity and power dissipation, depending on the channel quality and environment.

In this thesis, the digital baseband for an FCC-compliant pulsed UWB transceiver will be developed.
A prototype will be designed that provides a flexible platform exposing several knobs in the architecture to control this trade-off explicitly, adapting the power dissipation to the required quality of service and the channel characteristics.

1.8 Thesis Contributions

As part of the learning process of this thesis, three UWB digital basebands have been designed, two of them implemented as dedicated digital circuits. The first one is part of a system-on-a-chip prototype oriented to the demodulation of UWB baseband signals from 0 to 500 MHz, and for this reason it is not FCC compliant. It was implemented in 0.18 µm CMOS technology working at 1.8 V to achieve a data rate of 322 kbps. The second UWB baseband was designed for a discrete prototype and implemented using an FPGA. With this digital baseband it was possible to obtain a wireless data rate of either 100 Mbps, using an arbitrary waveform generator as transmitter, or 50 Mbps, using a dedicated impulse generator as transmitter with an FCC-compliant signal. A second ASIC was designed to implement an FCC-compliant UWB baseband working at 100 Mbps, also using an FCC-compliant signal. This second ASIC was also implemented in 0.18 µm CMOS technology working at 1.8 V. The subsystems implemented include 150 correlators in parallel to reduce the time to achieve coarse acquisition, a programmable partial Rake that may use up to 25 complex taps, and a Viterbi-like MLSE equalizer. The two main drivers of this architecture are to supply a good characterization of the environment in which the transceiver operates, and to provide higher levels of the architecture with knobs with which to trade off power dissipation and data rate against quality of service, adapting the transceiver closely to the channel state.
For the implementation of these baseband processors, it was necessary to map several signal processing algorithms, such as correlations or a Rake receiver, onto an efficient parallel architecture that would allow further optimizations such as dynamic voltage scaling. The various prototypes were organized around a central structure of correlators that performed several different tasks during the demodulation process, with several auxiliary sub-blocks completing the functionality necessary for demodulation. Although this has not been fully exploited in this thesis, it would be a good starting point for future optimization. Since the complexity of the signal processing and the power dissipation are linked, this thesis has explicitly exposed the link between signal processing complexity, power dissipation and quality of service.

Chapter 2

A Baseband Processor for a Baseband UWB Transceiver

In this chapter, the architecture, implementation and measurements of a baseband processor for pulsed ultra-wideband signals are presented. Although originally designed and tested for baseband pulses over a wireless link, this architecture may be easily scaled to larger bandwidths or applied to an FCC-compliant transceiver by adding functionality to the RF front-end for up/down-conversion within the 3.1 GHz to 10.6 GHz band. This architecture was implemented using a 0.18 µm CMOS process at 1.8 V. The transceivers developed in chapter 3 and the following chapters start from a modification of this initial implementation.

2.1 UWB Signals

For this transceiver, BPSK was chosen because, for binary signals, it has a 3 dB signal-to-noise ratio (SNR) advantage over PPM [21] (considered as an orthogonal modulation [50]). This work focuses on a receiver for pulsed UWB signals, using 0-500 MHz baseband pulses.
In this implementation, each bit of information is represented by a sequence of 31 pulses of width $T_p = 2$ ns, and every two consecutive pulses are separated by $T_f = N_f T_p$ with $N_f = 50$, resulting in a very low duty cycle (approximately 2%) [6] and a bit duration of $D_{bit} = 1550\,T_p$. The information is encoded in the sign of the pulses, which also depends on the corresponding element of a Gold code sequence $c_i$ of length $N_c = 31$. A Gold code is chosen for its good autocorrelation properties, which allow good synchronization to the received packet. Although it also offers very good cross-correlation characteristics, this design does not consider more than one user. Channelization could be implemented by assigning a different Gold code to each user [50]. A family of Gold codes is obtained by initializing one of the two shift registers used to generate the Gold code with different seeds.

Suppose the bit stream is denoted by a sequence of binary symbols $b_j$ (with values +1 for bit 1 or -1 for bit 0) for $j = -\infty, \ldots, \infty$. Let $A$ denote the amplitude of each pulse $p(t)$. Then, the transmitted signal is:

$$s_{BPSK}(t) = A \sum_{j=-\infty}^{\infty} \sum_{i=0}^{N_c-1} b_j c_i \, p(t - jN_cT_f - iT_f) \qquad (2.1)$$

where $c_i$ is the Gold code. This signal provides a processing gain [50]:

$$PG = 10 \log_{10}\frac{N_c}{d} = 32 \text{ dB} \qquad (2.2)$$

where $N_c$ is the length of the Gold code (31) and $d$ is the duty cycle of the signal.

The data packet comprises a preamble and the payload. The same Gold code is used during the whole packet. The preamble is composed of a sequence of pulses whose signs follow several repetitions of the Gold code, plus a final sequence of 31 pulses in which the sign of the Gold code is reversed. This last sequence represents a bit 0 (as opposed to the previous repetitions, which represent a sequence of bits 1) and indicates the end of the preamble and the beginning of the payload.

The time to achieve packet synchronization is a critical specification of any high data rate wireless system.
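The signal construction of this section can be sketched in a few lines. The thesis does not state which shift-register polynomials were used, so the pair x^5 + x^2 + 1 and x^5 + x^4 + x^3 + x^2 + 1 below is an assumption (a commonly tabulated preferred pair for length 31), as are the seeds and all function names. The timing constants are those given in the text.

```python
import math

T_P = 2e-9          # pulse width, 2 ns
N_F = 50            # frame length in pulse widths
N_C = 31            # Gold code length
T_F = N_F * T_P     # pulse repetition period


def lfsr_sequence(taps, seed, length=N_C):
    """Degree-5 shift register: s[n+5] is the XOR of s[n+t] for t in taps."""
    bits = list(seed)
    while len(bits) < length:
        n = len(bits) - 5
        bit = 0
        for t in taps:
            bit ^= bits[n + t]
        bits.append(bit)
    return bits


def gold_code(seed_b):
    """XOR two m-sequences; changing seed_b selects another code of the family."""
    a = lfsr_sequence((2, 0), (1, 0, 0, 0, 0))       # assumed x^5 + x^2 + 1
    b = lfsr_sequence((4, 3, 2, 0), seed_b)          # assumed x^5 + x^4 + x^3 + x^2 + 1
    return [1 if (x ^ y) == 0 else -1 for x, y in zip(a, b)]


def pulse_train(bits, code):
    """(start time, sign) of every pulse in Equation (2.1), bits given as +1/-1."""
    return [((j * len(code) + i) * T_F, b * c)
            for j, b in enumerate(bits) for i, c in enumerate(code)]


code = gold_code((1, 1, 1, 1, 1))
pulses = pulse_train([+1, -1], code)

duty_cycle = T_P / T_F
bit_rate = 1.0 / (N_C * T_F)
processing_gain_db = 10 * math.log10(N_C / duty_cycle)
print(len(pulses), round(bit_rate), round(processing_gain_db, 1))
```

With these parameters one bit lasts 31 frames of 100 ns each, and the processing gain of Equation (2.2) evaluates to just under 32 dB.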
The preamble must be long enough to guarantee a high probability of achieving signal acquisition. The importance of the duration of the preamble stems from the fact that the energy spent during the preamble is not spent in demodulating the encoded information. Therefore, it represents an overhead that is spread over the whole data packet. The longer the data packet is, the less important it is to have a long preamble. For short, bursty traffic, a long preamble implies that a significant percentage of the energy is not spent in demodulating the payload.

The target application of the transceivers of which this baseband is part is the transmission of files and information between computers. In this environment, previous wireless standards for both Wireless Local Area Networks (WLAN) and Wireless Personal Area Networks (WPAN) can serve as benchmarks against which to compare our results. IEEE standard 802.11a [34] is a wideband standard for WLAN with a maximum data rate of 54 Mbps. Some of the characteristics of this standard will be used as a benchmark for our prototypes. For example, IEEE standard 802.11a has packets with a preamble of 22 µs [34]. An objective of the systems designed in this thesis is to achieve an acquisition time of the same order of magnitude.

2.2 System Trade-offs

An important consideration in the receiver architecture is determining where to place the analog/digital partition. We chose a digital architecture that also implements all the synchronization in the digital domain. The advantages are the simplification of the analog elements in the transceiver, its scalability, and the possibility of exploring digital channel adaptability and recovery. Performing the synchronization in the digital domain eliminates the need to feed a signal from the digital domain to the clock generation subsystem.

Figure 2-1: BER as a function of the SNR (a) or SIR (b) for different ADCs. (a) Noise limited environment. (b) Interference limited environment.

As a preliminary prototype that would serve as a proof of concept, it was chosen not to implement an automatic gain control even if a conventional wireless communication system requires it. To support the all-digital synchronization, the signal is sampled at twice the Nyquist rate [50].
It is possible to show that, by doing this, the information available in a baseband signal for timing control purposes is equivalent to that of an I-Q direct conversion scheme in which in-phase and quadrature components are sampled at the same time.

A digital architecture depends on the feasibility of the analog to digital converter (ADC) required to digitize the signal. To allow for an all-digital timing recovery, the ADC must sample at 2 GSPS, oversampling at twice the Nyquist rate. A FLASH ADC architecture is well suited for such a high sampling rate [77]. Since the power consumption of FLASH ADCs scales exponentially with the number of bits of resolution, minimizing the ADC resolution is crucial to reduce the power consumed in the receiver. Four bits of resolution are sufficient to be within 1 dB of the bit error rate curve of the infinite resolution ADC. This is true both in a noise limited environment, where the signal is degraded by AWGN, and in an interference limited environment, where the signal is corrupted by a powerful narrowband sinusoidal interferer [38]. Figure 2-1(a) shows the effect of ADC resolution on the transceiver bit error rate (BER) for different SNRs. Figure 2-1(b) is the equivalent plot for the interference limited case in terms of the signal-to-interference ratio (SIR).

The baseband UWB receiver uses a front-end that amplifies the incoming impulse signal [78]. After it, the baseband processor shown in Figure 2-2 demodulates the signal. The following sections describe some details of the clock generation subsystem and the ADC, and the full design of the digital baseband.

Figure 2-2: Baseband processor block diagram (signal from the RF front-end, fast clocking domain, slow clocking domain).
2.3 Architectural Choices for Clock Generation and ADC

The 4-bit ADC in the UWB receiver is comprised of four time-interleaved FLASH channels running on 500 MHz phase-offset clocks supplied by the PLL, achieving a sampling rate of $f_s$ = 2 GSPS. It was designed by Puneet P. Newaskar. A clear advantage of a time-interleaved flash ADC is that the maximum frequency generated in the receiver is 500 MHz instead of $f_s$.

The input to the digital baseband is the set of samples obtained from the time-interleaved flash ADC. This has two advantages. First, overall performance is determined by the average of all four channels rather than being limited by the worst case. This occurs because the digital back-end adds groups of four consecutive samples, coming from the four different channels, and treats the result as a single sample. This reduces the need for the calibration across channels that most time-interleaved ADCs require, since the errors are averaged out. The second advantage is that data is supplied to the back-end at a reduced data rate. The samples from the four channels are aligned to the same 500 MHz clock edge, instead of creating a sample stream clocked at 2 GHz, and are presented in parallel to the digital back-end. For this same reason, the clock generation subsystem does not need to generate a 2 GHz clock, but four phases of a 500 MHz clock, one of which is used for the digital back-end.

Since the system implements a fully digital synchronization algorithm, the only input to the clocking system is the reference crystal clock, and its sole function is to track it. The jitter requirements are mostly constrained by the digital back-end of the receiver.
Given that the probability of losing synchronization during a 1024-bit data packet with an rms clock jitter of 100 ps is smaller than 0.01, and that the degradation in SNR introduced in the ADC by the same jitter is smaller than 1 dB, a ring-oscillator-based VCO can be used to generate the 500 MHz clocks required for the ADC. The ring oscillator, designed by Fred S. Lee, consists of four differential inverters, producing the four 90 degree phase-shifted clocks that drive the time-interleaved FLASH ADC.

2.4 Digital Baseband

The digital back-end implements the functionality required to synchronize and demodulate the data packet. It implements the entire synchronization algorithm in the digital domain, without feeding back any control signal to the other blocks, and achieves packet synchronization in less than 70 μs.

2.4.1 Functionality

The receiver is always in one of two functional states: the coarse acquisition state, which looks for the presence of a data packet and achieves synchronization, and the fine tracking state, which demodulates the data packet once coarse acquisition has been achieved. The following paragraphs present the specification of the different blocks of the digital back-end.

This transceiver is being developed as a proof of concept for the parallelized architecture. For that purpose it contains only the minimal functionality needed to acquire and demodulate a baseband signal. It assumes the signal is baseband, so no carrier recovery functionality is included. It also assumes that the Automatic Gain Control works perfectly, so it is not implemented.

Matched filter

The digital back-end recovers the information contained in the data packets from the samples given by the ADC using an approximation to the matched filter [50].
A matched filter is the optimum receiver for a signal in AWGN. It implies the correlation of the received signal with a local template that is synchronized to it and comprised of perfect replicas of the received pulses, separated by the inter-pulse interval, whose signs coincide with those of the Gold code. This receiver uses a sequence of rectangular pulses instead of perfect replicas of the received pulses, avoiding the use of multipliers. The correlation of the incoming signal with a rectangular pulse of width $T_p$ is equivalent to adding four consecutive samples taken at 2 GSPS. For a Gaussian pulse, 1.7 dB of SNR are lost compared to the ideal matched filter. Since the correlation is implemented completely in the digital domain, the use of a $T_p$ larger than the optimum does not accelerate the coarse acquisition process, and it requires roughly the same number of mathematical operations as correlating the incoming signal with several differently delayed templates. Moreover, a $T_p$ larger than the optimum reduces the SNR at the output of the matched filter. Therefore, the width of the integration window is chosen to maximize the processing gain.

Figure 2-3: $P_d$ as a function of $D$ (ns), the relative position between the pulse and the template, for rectangular, triangular and Gaussian pulses.

Coarse acquisition

The coarse acquisition algorithm detects the presence of a data packet and estimates its delay with a precision of half the width of the pulse. For that purpose, it locates a peak of correlation between the incoming signal and a sequence of differently delayed local templates. The result of each of these correlations is compared to a predefined threshold, $Th_0$, and if this threshold is met, packet acquisition is declared. The complexity of this process grows with the signal bandwidth and the length of the Gold code, and as the duty cycle becomes smaller. This is a serial search.
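The rectangular-template correlation and the serial search described above can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not the implemented hardware: the function names `window_sums` and `serial_search`, the toy 5-pulse sign pattern used in place of the Gold code, and the noiseless test signal are all hypothetical.

```python
def window_sums(samples, width=4):
    """Correlate with a rectangular pulse: add groups of `width` consecutive samples."""
    return [sum(samples[i:i + width])
            for i in range(0, len(samples) - width + 1, width)]


def serial_search(samples, code, n_f, threshold):
    """Test candidate delays one after another, in units of one integration window.

    code: +1/-1 signs of the pulses; n_f: windows between consecutive pulses.
    Returns (delay, correlation) for the first delay that meets the threshold,
    or None if no tested delay does.
    """
    w = window_sums(samples)
    span = (len(code) - 1) * n_f
    for delay in range(len(w) - span):
        corr = sum(c * w[delay + k * n_f] for k, c in enumerate(code))
        if corr >= threshold:
            return delay, corr
    return None


# Toy signal (assumption): 5 pulses, one every 10 windows, first pulse in window 3.
code = [1, 1, 1, -1, 1]
n_f, true_delay = 10, 3
samples = [0] * (4 * 60)
for k, c in enumerate(code):
    start = 4 * (true_delay + k * n_f)
    samples[start:start + 4] = [c] * 4

result = serial_search(samples, code, n_f, threshold=15)
```

In this noiseless toy case the search stops at the first delay whose correlation meets the threshold; with noise, the threshold would set the balance between missed and false detections.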
Some papers propose a nonserial search [79, 80, 81], but in practice this usually implies a loss of time while the corresponding correlations are obtained. Some interesting work on synchronization in multipath environments using dirty templates has been published in [82]. PPM coarse acquisition has been studied in [83].

The difference in delay between two consecutive templates affects the number of opportunities available to detect the signal. A smaller difference implies a larger probability of detecting the signal, but also an increase in the complexity of the receiver. Setting this difference in delay equal to $T_p$ = 2 ns gives a reasonable probability of detection of roughly 0.9 for a Gaussian pulse, as shown in Figure 2-3. This choice also simplifies the timing design in the receiver, as all clocks have the same frequency.

Figure 2-4: Coarse acquisition process as a Markov chain (D = Correct detection; FD = False detection).

If the transmitter and receiver clocks are assumed to have the same frequency, the coarse acquisition process may be modeled as a Markov chain, as shown in Figure 2-4. The delay of the received pulses with respect to the different local templates is assumed to be constant. The states from 1 to $N = N_c \cdot N_f$ represent tests in which the received signal is correlated with a template with a different delay. The initial state can be any state from 1 to $N$. The states $i$ and $i+1$ contain the pulses aligned with an error smaller than half the width of the pulse. If the pulse is detected there, the chain goes to state D, correct detection (with probabilities $P_{d,i}$ or $P_{d,i+1}$). The rest of the states do not contain the pulses. Any detection in those states implies a false detection (state FD) with probability $P_{fd}$. If no signal is detected, the chain jumps from each state to the next one. Using this information, it is possible to obtain the average number of iterations to achieve lock ($E[k]$) and the probability of correct detection ($P_{cd}$).
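A numerical sketch of this chain, in Python, is given here as an illustration only. It assumes that the search starts at slot 0 and wraps around after the last slot, and it treats a detection declared in any slot (correct or false) as the end of the search. The function name `acquisition_stats` and the example probabilities are hypothetical, and the sketch does not average over the possible delays of the incoming signal.

```python
def acquisition_stats(p, correct):
    """Mean number of tests until a detection is declared, and P(correct detection).

    p[j]: probability of declaring a detection in slot j (search starts at slot 0).
    correct: set of slot indices that really contain the pulses.
    """
    n_slots = len(p)
    survive = 1.0            # probability of reaching slot j with no detection yet
    weighted_tests = 0.0     # sum over the first pass of n * Pr[detection at test n]
    correct_first_pass = 0.0
    for j, pj in enumerate(p):
        hit = survive * pj   # detection declared at test j + 1 of the first pass
        weighted_tests += (j + 1) * hit
        if j in correct:
            correct_first_pass += hit
        survive *= 1.0 - pj
    a = survive              # a whole pass ends without any detection
    mean_tests = (n_slots * a + weighted_tests) / (1.0 - a)
    p_cd = correct_first_pass / (1.0 - a)
    return mean_tests, p_cd


# Example with hypothetical values: 31 * 50 slots, pulses in slots i and i + 1.
n_slots, i = 31 * 50, 700
p = [1e-4] * n_slots
p[i] = p[i + 1] = 0.9
mean_tests, p_cd = acquisition_stats(p, {i, i + 1})
```

Dividing by `1 - a` accounts for the passes that end without any detection, after which the search wraps around and starts again.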
These two quantities are given by

$$E[k] = \frac{N A + \sum_{n=1}^{N} n \Pr[n]}{1 - A} \tag{2.3}$$

$$P_{cd} = \frac{\left(P_i + P_{i+1}(1 - P_i)\right) \prod_{j=0}^{i-1} (1 - P_j)}{1 - A} \tag{2.4}$$

where $A = \prod_{j=0}^{N-1} (1 - P_j)$. It is assumed that the probability of declaring a detection in slot $j$, for $j$ from 0 to $N - 1$, is $P_j$. For states $i$ and $i+1$, $P_i = P_{d,i}$ and $P_{i+1} = P_{d,i+1}$. For other values of $j$, $P_j = P_{fd}$. Averaging these expressions over all the possible delays of the incoming signal, and choosing $N_c = 31$ and $N_f = 50$, Table 2.1 is obtained. The probability of correct detection drops sharply when $P_{fd} = 10^{-3}$. This happens because $P_{fd}$ is comparable to $1/N$, and in $N$ trials a false detection may arise before there is an opportunity to test the right delay. To ensure a reasonable probability of correct detection, $P_{fd}$ must be much lower than $1/N$. The coarse acquisition algorithm designed for the implemented system assumes 50-way parallelization. For a Gaussian pulse, and using an SNR and threshold that ensure $P_{fd} = 10^{-4}$, the average time to declare coarse acquisition is 65 μs.

Figure 2-5: Change of probability of detection due to a difference in frequencies between transmitter and receiver ($k$ in ppm).

In this discussion, it has been assumed that the frequencies of the transmitter and receiver clocks are exactly the same. A large difference in frequencies spreads the total energy of the pulses not only across the correlations with two consecutive templates but across three or more, reducing the probability of detection. It was proven that for the specified clock stability of 20 ppm, typical in crystal oscillators, the current specification is robust enough and the loss in probability of detection is negligible, as shown in Figure 2-5. In this figure, $\tau$ represents the difference in delay between the boundary of two consecutive integration windows and the center of the impulses. $\tau = 0.5$ ns implies that the impulse is centered in one integration window. As $\tau$ decreases, part of the energy of the impulses appears in a different integration window and the probability of detection decreases.
But at the same time, a larger deviation in frequency ($k$) is needed to affect this probability of detection. This work was presented in [84].

Fine tracking

Once packet synchronization is declared, depending on the length of the data packet, it may or may not be necessary to include a fine tracking algorithm to keep time synchronization for the duration of the data packet. For the specifications of the system, if the difference between the transmitter and receiver clocks is 20 ppm and there is no fine tracking mechanism, after 250 pulses half the energy of the incoming pulses is no longer included in the correlation with the local template with which the received signal was initially aligned. Since each bit is represented by $N_c = 31$ pulses, this allows a maximum of 8 bits in the packet.

Table 2.1: Model results for a Gaussian pulse

$P_{fd}$      $E[k]$    $P_{cd}$
$10^{-3}$     526       0.48
$10^{-4}$     915       0.91
$10^{-5}$     1066      0.99

For fine tracking, a classical Delay Locked Loop (DLL) [40] is used. It straddles the incoming pulses between two consecutive local templates, representing early and late versions of the signal. The relative values of these correlations are used to estimate the delay of the incoming signal. Since all of the timing control is performed in the digital back-end, it is not possible to continuously adjust the delay of the local templates generated in the receiver with respect to the incoming signal. The only feasible delay adjustments are an integer number of samples. Due to the architecture used, only a limited range of delay corrections can be applied. The architecture presented in the next section allows for corrections of -3, -2, -1, 0, 1, 2 and 3 samples. Simulation showed that this is sufficient for this system.

2.4.2 A parallelized approach

This section explains how a parallel architecture can be used to implement the matched filter at an acceptable clock frequency.
For that purpose we make use of the concept of poly-phase implementation of filters, which is commonly used in decimation filters [85]. Let us assume that we are trying to lock to a signal that repeats indefinitely a pseudorandom sequence of $N_c$ bits with a duty cycle of $1/N_f$, which for the sake of argument implies that the transmitted signal has only one non-zero sample out of every $N_f$ consecutive samples. The sequence that we are trying to lock to can then be written as

$$x[n] = \sum_{k=0}^{N_c-1} \delta[n - k \cdot N_f] \tag{2.5}$$

The matched filter to this signal is defined as

$$h[n] = x[N_c \cdot N_f - 1 - n] \tag{2.6}$$

In impulse UWB systems with a very low duty cycle, this filter has a very large number of taps, but most of them are equal to zero. In addition, a direct form I implementation of this FIR requires $N_c \cdot N_f - 1$ registers, although only $N_c - 1$ adders and $N_c$ multipliers (by 1 or -1, so they are amenable to simple implementation). It is necessary for later developments to notice that

$$h[n \cdot N_f + k] = 0 \tag{2.7}$$

for $k = 0, 1, \ldots, N_f - 2$ and for all values of $n$. All these blocks should be working at the sampling frequency (in this case 500 Msamples/s). It is desirable to obtain an architecture with the same functionality that allows most of the mathematical operations to run at a lower clock frequency.

Figure 2-6: Correlators architecture. (a) Model of the matched filter with a serial to parallel block at the output. (b) Poly-phase decomposition of the filter. (c) Correlators architecture.

Assuming that a series to parallel operation in $N_f$ ways happens after the application of the matched filter, the structure of the matched filter followed by the series to parallel block would look as shown in Figure 2-6(a). The first step is to consider a poly-phase decomposition of the filter $h[n]$ in $N_f$ phases:

$$e_k[n] = h[n \cdot N_f + k] \tag{2.8}$$

There are $N_f$ phases (from 0 to $N_f - 1$).
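The sparsity stated in Equation 2.7 and the phases of Equation 2.8 can be checked numerically with a small Python sketch. The helper names (`template`, `matched_filter`, `polyphase`) and the toy sizes are hypothetical. The sketch also lets each pulse carry a +1/-1 sign, which is an assumption added for illustration; the matched filter is taken as the time-reversed template, $h[n] = x[L - 1 - n]$ with $L = N_c \cdot N_f$.

```python
def template(code, n_f):
    """One period of the sequence: one non-zero sample every n_f samples."""
    x = [0] * (len(code) * n_f)
    for k, c in enumerate(code):
        x[k * n_f] = c
    return x


def matched_filter(x):
    """Time-reversed template, h[n] = x[L - 1 - n]."""
    return x[::-1]


def polyphase(h, n_f):
    """Phases e_k[n] = h[n * n_f + k] for k = 0 .. n_f - 1."""
    return [h[k::n_f] for k in range(n_f)]


# Toy sizes (assumption): 3 pulses with hypothetical signs, duty cycle 1/4.
code, n_f = [1, 1, -1], 4
h = matched_filter(template(code, n_f))
phases = polyphase(h, n_f)
nonzero_phases = [k for k, e in enumerate(phases) if any(e)]
```

With these definitions the only phase holding non-zero taps is the last one, and it contains the pulse signs in reversed order.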
Each of the down-sampled outputs of the filter can then be obtained as shown in Figure 2-6(b). Now we take into account the special characteristics of the filter $h[n]$. As we have defined $h[n]$, only one of the poly-phase components is non-zero. This component is $e_{N_f-1}[n]$. For that reason, it is possible to simplify the structure and obtain all the outputs of the matched filter with the architecture shown in Figure 2-6(c).

The architecture shown in Figure 2-6(c) requires a total of $N_f(N_c - 1)$ registers, $N_f(N_c - 1)$ adders and $N_c \cdot N_f$ multipliers. The clock frequency has been reduced at the cost of adding complexity to the receiver. It is possible to reduce the complexity at the cost of not obtaining all the outputs of the filters and, because of that, increasing the time to achieve signal acquisition. The correlator architecture employed in this transceiver, and its variations shown in the next chapters, exploit this trade-off.

2.4.3 Architecture

The digital back-end is divided into a fast and a slow clocking domain, as shown in Figure 2-2. The fast domain uses a custom layout and a 500 MHz clock coming from the PLL, and is composed of a retiming block, a block that performs a 10x parallelization of the incoming ADC samples, and the main control of the digital back-end. The parallelization allows the slow domain to work with a 30 MHz clock. The correlations and other mathematical operations needed to implement the synchronization and demodulation are performed in 2's-complement fixed-point arithmetic in the slow domain, using a standard ASIC flow.

The retiming block provides the one-sample delay granularity required for fine tracking. The groups of four samples that are the inputs to the correlators may start with any arbitrary sample and may include samples belonging to two different ADC vectors, as shown in Figure 2-7. These groups are obtained by selectively delaying the outputs of one or more ADC interleaved channels in the retiming block shown in Figure 2-8.
Since the first operation in the correlator block is to add together four consecutive samples, the outputs of the retiming block need not be reordered chronologically in this case. This block is controlled by a four-state finite state machine whose state is updated based on the relative delay of the incoming signal with respect to the local templates.

Figure 2-7: Groups of four consecutive samples required.

After the parallelization, the outputs of the fast clocking domain are processed by 10 correlators, as shown in Figure 2-9. In each 30 MHz clock cycle, the four samples at the input of a correlator are added together, implementing the correlation with a rectangular pulse of width $T_p$. Depending on the value of the Gold code, the result of this addition is either added to or subtracted from the 11-bit value stored in the shift registers five cycles before ($50 \cdot T_p$, equal to the time between two consecutive pulses). Since the time between two consecutive pulses is equal to five cycles, in order to cope with this duration, coarse acquisition is decided upon only $N_c - 1$ pulses instead of $N_c$ pulses. All multipliers in Figure 2-9 are implemented using 2-to-1 multiplexers, because in each of them one of the coefficients represents a single bit. Each correlator performs five correlations at the same time, equivalent to the output of an FIR of $D_{bit} \cdot f_s = 6200$ coefficients with values equal to 1, -1 or 0. The outputs from the ten correlators (a total of 50 correlation values) are used by the coarse acquisition subsystem, but only the first two are active during fine tracking.

The Gold code generator is implemented with two shift registers, each of them generating a linear recursive sequence for which both the coefficients of the generating polynomial and the seed values are programmable.
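A generator of this kind can be sketched in Python as two programmable linear feedback shift registers whose outputs are combined bit by bit. This is an illustrative model only. The function names, the seeds, and the two degree-5 recurrences (corresponding to the polynomials $x^5 + x^2 + 1$ and $x^5 + x^4 + x^3 + x^2 + 1$, which are commonly listed as a preferred pair for length-31 Gold codes) are assumptions; the text does not state which polynomials or seeds were programmed on the chip.

```python
def lfsr(taps, seed, length):
    """Linear recursive sequence a[n + m] = XOR of a[n + t] for t in taps.

    seed holds the first m bits (must not be all zero).
    """
    state = list(seed)
    out = []
    for _ in range(length):
        out.append(state[0])
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        state = state[1:] + [feedback]
    return out


def gold_code(taps_a, seed_a, taps_b, seed_b, length):
    """XOR the two register outputs and map bits {0, 1} to signs {+1, -1}."""
    a = lfsr(taps_a, seed_a, length)
    b = lfsr(taps_b, seed_b, length)
    return [1 - 2 * (x ^ y) for x, y in zip(a, b)]


# Assumed recurrences and seeds, chosen only for illustration.
TAPS_A = (0, 2)           # a[n+5] = a[n] ^ a[n+2]
TAPS_B = (0, 2, 3, 4)     # a[n+5] = a[n] ^ a[n+2] ^ a[n+3] ^ a[n+4]
code = gold_code(TAPS_A, [1, 0, 0, 0, 0], TAPS_B, [1, 1, 1, 1, 1], 31)
```

Changing either seed shifts one sequence relative to the other and yields a different member of the same code family, which is one reason for making the seeds programmable.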
If the incoming signal is properly aligned to one of the first local templates in a correlation, at the end of the iteration the 50 correlations contain the samples of the channel impulse response. This information was not further used in this transceiver, but it may be used in future prototypes in a RAKE receiver to compensate for the multipath, and in an MLSE sequence estimator to make up for the inter-symbol interference if the system uses an inter-pulse interval shorter than the channel impulse response.

The duration of a correlation iteration is $(N_c + 1) \cdot 133$ ns. The extra 133 ns provides additional time to compare the correlation results to $Th_0$ and, if packet synchronization is not declared, it is used to delay the position of the local templates $T_f$ with respect to the incoming signal.

The fine tracking subsystem shown in Figure 2-10 provides the functionality required to close the DLL. The division needed in the delay estimation is avoided by multiplying by an approximation to the inverse stored in a ROM. The ROM stores 32 seven-bit numbers, and the five most significant bits of the numerator are used to choose the output. The coefficients of the filter are programmable, and it incorporates Baugh-Wooley multipliers. The delay decoder transforms the output of the filter into signals relevant to the fast clocking domain: the new state of the retiming block and an indication of the need to start correlations one 500 MHz clock cycle earlier (signal Advance) or later (signal Delay). The fine tracking subsystem also provides a flag to restart coarse acquisition when the signal is lost. All the outputs of the fine tracking system are ready in six 30 MHz cycles. Since there are only five 30 MHz cycles between every two consecutive pulses, there is not sufficient time between the last pulse of the Gold code sequence and the first pulse of the next sequence to perform this operation. This is the reason why the DLL only uses $N_c - 1$ pulses to estimate the delay even if $N_c$ pulses are used to recover the value of the transmitted bit. This leads to a negligible loss of processing gain for the delay estimation.

Figure 2-8: Block diagram of the retiming block.

As the 50 correlations are completed, they are read into the coarse acquisition subsystem, whose block diagram is shown in Figure 2-11. The memory block provides not only the value of the maximum but also the values in the two adjacent positions. The two simplified fine tracking subsystems lack the loop filter shown in Figure 2-10 except for its direct path ($b_0$). Only the simplified fine tracking subsystem using the two positions with the most energy is used to initialize the DLL. Detection of the signal is given in six 30 MHz cycles, and the rest of the outputs are ready seven cycles thereafter. In order to provide enough time for these operations, the evaluation of the 50 correlations starts only when $N_c - 1$ pulses have been integrated. Still, during the change of state that happens after declaring packet synchronization, the first two pulses of the next bit are lost. This implies a negligible loss of processing gain in the demodulation of the first bit of the payload.

All thresholds, coefficients and other parameters used in the digital back-end must be configured before use with data fed through a serial port.

2.5 Performance Results

Figure 2-12 shows a photograph of the 0.18 μm ASIC. The PLL was verified at 500 MHz and can provide much higher clock frequencies (up to 2 GHz). The ADC was verified using the testing method presented in [86], and proved to have an ENOB larger than 3.

Figure 2-9: Implementation of the correlation bank.

The digital back-end is completely functional at a clock frequency of 300 MHz, but not at 500 MHz. The part of the baseband that failed was the path of the signals generated in the slow clock domain that needed to be registered into the fast clock domain. The frequency range for the coarse acquisition algorithm between a pair of transceivers is shown to be ±3%. At 300 MHz a data rate of 193 kbps was demonstrated. Table 2.2 contains a summary of overall chip measurements. Most of these circuits can scale to 500 MHz. This architecture is scalable to larger bandwidths.

Figure 2-10: Fine tracking subsystem block diagram.

Figure 2-11: Coarse acquisition block diagram.

Figure 2-12: Single chip UWB transceiver photograph.

Table 2.2: Chip Measurements

Chip specifications
Process Technology      0.18 μm
Die Size                4.3 mm x 2.9 mm
Bit Rate                193 kbps
Power Consumption
  PLL                   45 mW
  CLK Buffers           65 mW
  Baseband              75 mW

Chapter 3

System Analysis for the FCC Compliant System

In this chapter a method for obtaining the specifications of a baseband system for a UWB transceiver is developed. A system specification optimizes the resources of the different subblocks that comprise the wireless system in order to achieve a certain data rate at a given distance. The limitations of the different subblocks, the specific challenges of the UWB system, and the constraints imposed on its architecture for practical reasons are taken into consideration in the choice of the algorithms that are implemented in the baseband. For UWB systems, the impact of the extreme multipath and the techniques used to compensate for its effects will be carefully analyzed. Emphasis is given to the programmability of the different elements of the baseband subsystem as a way to prove the trade-off between power dissipation, complexity and quality of service.
First, the objective of the system in terms of data rate and distance is decided in relation to the different standards or proposals available for similar systems. Then, the reasons that support the choice of a homodyne architecture for the UWB receiver are presented. The kind of signal chosen depends on the ADC specification, and a comparison between impulse signals and multiband-OFDM signals is presented. After this, the main challenge of UWB communications, multipath, and its solution are addressed, and the algorithms to be implemented in the digital baseband are defined and developed for fixed-point arithmetic implementation and programmability, making explicit the trade-off between complexity of signal processing and quality of service. With the sensitivity defined as a minimum SNR obtained from this section, an analysis of the link budget is presented at the end of the chapter, along with the model under which the final system was simulated. The results of this chapter will be applied in the following chapters to analyze two different systems: the one implemented in an FPGA as part of a discrete prototype, in which the main concern is simplicity, and the one implemented in an ASIC, where the objectives are robustness and programmability.

3.1 Objectives of the Design

In choosing the objective of this UWB system, we decided to explore high data-rate applications and to target a data rate of 100 Mbps at 10 m. This follows closely one of the modes specified for the IEEE 802.15.3a effort. This will be achieved with a signal that meets the requirements of the spectral mask indicated in Figure 1-1, in the band from 3.1 GHz to 10.6 GHz. Two problems are relevant for this kind of signal: coexistence with other narrowband signals and multipath.
The interference that UWB systems would cause on already existing systems is almost negligible, but it has been shown that, in spite of initial assumptions, UWB systems are vulnerable to narrowband interferers. The initial claim that UWB systems would be able to easily filter out narrowband interferers depends on the linearity of the transceiver. If the receiver does not provide enough linearity (given by the RF front-end and the ADC), it may be saturated by a powerful narrowband interferer. Since the bandwidth of UWB systems is 500 MHz or larger, the linearity constraint in the ADC is associated with its bandwidth, and obtaining more than 4 bits at those frequencies leads to very power hungry designs that are not suitable for wireless applications. Furthermore, the linearity constraints add to the bandwidth and low-noise constraints in the LNA, making these elements of the RF front-end harder to design. The band approved for UWB communication includes the U-NII band at 5 GHz, already used by the wireless local area network (WLAN) standard IEEE 802.11a, a strong, narrowband interferer. The effect of narrowband interferers on wideband wireless communications is expanded on in [87]. Some techniques to reduce the impact of narrowband interferers on MB-OFDM signals have been presented in [88].

For these reasons, most of the proposals submitted to the IEEE 802.15.3a committee chose to divide the 3.1-10.6 GHz band into a number of subbands of around 500 MHz bandwidth¹, the minimum allowed by [2]. This choice allows avoiding the U-NII band by simply not using the 500-MHz bands that collide with it, and filtering this interference out in the front-end. The minimum bandwidth reduces the design stress on several of the blocks and allows the focus to be placed on the algorithms. The baseband designed here follows this trend. In this thesis the in-band interferer will not be actively addressed beyond this design choice.
Figure 3-1 shows how the different 500-MHz channels fit in the spectral mask given by the FCC.

Figure 3-1: 500 MHz bandwidth channelization with FCC compliant power spectral density (curves: First Report and Order, Part 15 bound, UWB channelization; horizontal axis: frequency in GHz).

The other important problem is multipath. The multipath model used here is the one presented in [3], used by the IEEE 802.15.3a standardization group. An effective high data rate transceiver in this band should provide robust communication under severe multipath conditions. This requires sophisticated signal processing that will increase the power dissipated in the baseband. The two main drivers of this architecture are to provide a good characterization of the environment of the transceiver, and to provide higher levels of the architecture with programmability that allows trading off power dissipation and data rate with quality of service, in order to adapt the transceiver closely to the service that needs to be provided and to the channel state. In addition, a fast signal acquisition algorithm must be implemented to reduce the duration of the preamble to a value comparable with current wireless systems (≈ 20 μs).

¹ The MB-OFDM proposal uses a 528 MHz channelization in order to simplify the design of the local oscillators used in the transceiver.

3.2 Homodyne vs Heterodyne architecture

The architecture choice in the front end determines the type of signal processing that is required in the baseband. There are two main options [89]. The first one is the heterodyne receiver shown in Figure 3-2(a) and the other one is the homodyne receiver, depicted in Figure 3-2(b). A heterodyne receiver needs to handle the rejection of the image band.
This implies a trade-off between the value of the intermediate frequency and this rejection. The larger the intermediate frequency chosen, the easier it is to reject the image band, but also the larger the Q required for the channel select filter and the larger the sampling frequency of the ADC afterwards. On the other hand, a small intermediate frequency reduces the sampling frequency of the ADC and the Q required in the channel select filter, while complicating the design of the image reject filter. The heterodyne receiver by default uses an ADC whose sampling frequency is larger than the bandwidth of the signal, and further signal processing is required in the digital baseband to down-convert it all the way to baseband.

A homodyne receiver need not worry about the image band, because the received signal is directly converted to baseband. On the other hand, the homodyne receiver needs to duplicate the receiver chain after the down conversion. The homodyne receiver also requires two ADCs instead of one, but their sampling frequency may be of the same order of magnitude as the bandwidth of the signal that is being processed. In addition, a homodyne receiver needs to take into account DC offsets present in the signal (from both LO leakage and interferer leakage) and I/Q mismatch.

Figure 3-2: Architectures for the receiver. (a) Heterodyne receiver. (b) Homodyne receiver.

In our case, we chose the second option, and the main reason was the trade-off regarding image band rejection if a heterodyne architecture were used. Concretely, let us consider an 802.11a interferer at 5.6 GHz that appears in the image band of the UWB channel right below it. Let us also assume that the intermediate frequency chosen is 300 MHz (so that the objective band ends up from 50 MHz to 550 MHz), and that after this down-conversion the signal is sampled at 1.2 Gsps.
Under these conditions, if 10 dB of attenuation of the image band is required, a filter that ensures a slope of 140 dB/decade is needed, which makes its design really difficult. The homodyne architecture allows a reduced ADC sampling rate and a reduced digital baseband clock frequency.
3.3 Specification of the ADC
The broad definition of UWB by the FCC allowed complex modulations such as OFDM to be included under the denomination of UWB signals. Taking into account that OFDM allows a very clean and elegant solution to multipath [90], it was considered as an alternative to impulse UWB for the UWB transceiver. In this section we compare both signals taking into account the specification of the ADC.
3.3.1 Signal definition
In order to compare the performance of both kinds of signals, they are defined so that each of them achieves a data rate of 100 Mbps with a bandwidth of 500 MHz.
* Pulsed UWB: Each bit of information is represented by one pulse of width 2 ns, and the distance between the beginning of two consecutive pulses is 10 ns.
* OFDM UWB with 256 carriers: Each OFDM symbol occupies 256 samples when sampled at 1 GHz. In a heavy multipath environment, the expected RMS spread of the channel impulse response is approximately 25 ns [2]. The cyclic prefix has been conservatively chosen to be greater than this value; it has a length of 54 ns (54 samples). 31 bits of information are encoded in each symbol using repetition codes, so that each bit modulates more than one carrier in BPSK. In order to whiten the spectrum, the total number of carriers is divided into four blocks of 31 carriers, while the remainder is reserved for channel estimation. The four blocks contain exactly the same bits of information, but each is scrambled in a unique way. In order to obtain a real baseband signal, the symbols modulating conjugate carriers are complex conjugates of each other.
These signals are sampled at a frequency of 1 GHz.
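As a quick sanity check of the two signal definitions above, the following sketch recomputes the data rate of each one from the numbers given in the text. The function names are illustrative and not part of the thesis.

```python
# Back-of-the-envelope check of the two signal definitions (values from the text).

def pulsed_rate_bps(bits_per_pulse: int, pulse_period_s: float) -> float:
    """Data rate when each pulse carries `bits_per_pulse` bits."""
    return bits_per_pulse / pulse_period_s


def ofdm_rate_bps(bits_per_symbol: int, n_samples: int, prefix_samples: int,
                  sample_rate_hz: float) -> float:
    """Data rate of one OFDM symbol transmitted together with its cyclic prefix."""
    symbol_time_s = (n_samples + prefix_samples) / sample_rate_hz
    return bits_per_symbol / symbol_time_s


pulsed = pulsed_rate_bps(1, 10e-9)        # one bit every 10 ns
ofdm = ofdm_rate_bps(31, 256, 54, 1e9)    # 31 bits every 256 + 54 = 310 ns
print(f"pulsed UWB: {pulsed / 1e6:.1f} Mbps, OFDM UWB: {ofdm / 1e6:.1f} Mbps")
```

Both definitions come out at the same rate because the 54-sample prefix stretches the OFDM symbol to 310 ns, which is exactly 31 pulse periods of 10 ns.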
Both signals provide the same data rate. Because they occupy the same channel, they have a similar processing gain. The following differences between the two signals should be highlighted:
* OFDM UWB inherently provides a simple mechanism of channel equalization. On the other hand, a receiver using pulsed UWB should also include an equalizer to mitigate the effects of the channel.
* Synchronization of the receiver is performed differently depending on the type of signal transmitted. This leads to different kinds of synchronization algorithms for the two signals. In the simulations shown in this section, it is assumed that the receiver has achieved perfect synchronization, and no jitter or timing errors are considered.
3.3.2 Automatic Gain Control
For the simulations we take into account the use of an ADC with a finite number of bits and observe how the system performance changes as the number of bits of the ADC goes down. The presence of an ADC in the system implies the necessity of an automatic gain control (AGC). It is assumed that this system has an instantaneous AGC in order to focus on the impact of quantization noise on the demodulation of the signal. The ADC model has a fixed input range, from -1 to 1. If the number of bits is $b$, then the quantization step is $\Delta = 2^{1-b}$. The AGC is calibrated with the assumption that, in addition to the input signal, there is only AWGN at the input of the ADC. Then, the quantization noise can be assumed to be a uniform random variable of variance $\Delta^2/12$. This assumes that the total input signal amplitude is neither very large (avoiding saturation of the ADC) nor very small (in which case the quantization noise tends to $\Delta^2/4$, as only the two smallest levels of the ADC are exercised). The AGC avoids these extremes. The AGC scales its noisy input signal by a factor $\alpha$ such that the ADC is fed an optimal input mean square voltage of $\sigma_o^2$, given in Table 3.1 for different resolutions.
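The ADC model described in this section can be illustrated with a minimal sketch. It assumes a mid-rise uniform quantizer (the text does not state the quantizer type, although the $\Delta^2/4$ limit for very small inputs is consistent with it) and a Gaussian input whose RMS value is taken from Table 3.1; the function names are hypothetical.

```python
import random


def quant_step(bits: int) -> float:
    """Quantization step for an input range of [-1, 1]: delta = 2**(1 - bits)."""
    return 2.0 ** (1 - bits)


def quantize(x: float, bits: int) -> float:
    """Mid-rise uniform quantizer (assumed type); inputs outside [-1, 1] saturate."""
    step = quant_step(bits)
    index = int((x + 1.0) // step)               # index of the cell containing x
    index = min(max(index, 0), 2 ** bits - 1)    # clip to the available levels
    return -1.0 + (index + 0.5) * step           # output the centre of the cell


def measured_noise_power(bits: int, sigma: float, n: int = 20000, seed: int = 0) -> float:
    """Empirical mean square quantization error for a Gaussian input of RMS `sigma`."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, sigma)
        total += (quantize(x, bits) - x) ** 2
    return total / n


bits, sigma_o = 4, 0.1425                        # 4-bit entry of Table 3.1
print(measured_noise_power(bits, sigma_o), quant_step(bits) ** 2 / 12)
```

With the input level set by the AGC, the measured error power should sit close to $\Delta^2/12$, which is the assumption used in the rest of the section.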
Because of the AGC, it is safe to assume henceforth that the quantization noise power added by the ADC is $\Delta^2/12$ for all input SNRs.
Table 3.1: $\sigma_o$ values set by AGC
ADC resolution (bits)   2        3        4        5
$\sigma_o$              0.2850   0.2025   0.1425   0.1231
3.3.3 Demodulating Architectures
The demodulation of the bits depends on the kind of signal received. Two types of receivers are considered:
* OFDM receiver [90]: Shown in Figure 3-3(a). After sampling the incoming signal, the samples corresponding to the same OFDM symbol are separated. The samples that belong to the cyclic prefix are removed. An FFT is performed over the rest of the samples. Since each bit of information has been spread over several carriers, the coefficients of the FFT corresponding to those carriers are added. The sign of this number is related to the value of the data bit.
* Matched filter receiver [50]: Shown in Figure 3-3(b). After sampling, the incoming signal is correlated with a replica of the representation of the data bit. For pulsed UWB, the incoming signal is correlated with a representation of the pulse shape (in this case, a rectangular pulse). The sign of the correlation result indicates the value of the data bit.
Figure 3-3: Receiver architectures for different UWB modulations. a) Architecture of an OFDM receiver. b) Architecture of an impulse receiver.
3.3.4 Simulations and Analysis
The signals are compared in terms of their behavior in a noise-limited environment and in an interference-limited environment. Noise samples are uncorrelated. The ADC is preceded by an AGC which sets the power fed to the converter based on the previously described policy. The Monte Carlo simulations required for the plots provide a standard deviation in the error of the estimation of the probability of error (Pe) under 10% for a Pe of $10^{-5}$, under 3% for a Pe of $10^{-4}$, and less than 1% for a Pe greater than or equal to $10^{-3}$. The results are shown in Figure 3-4 for the OFDM UWB signal.
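The noise-limited experiment for pulsed UWB can be sketched as a small Monte Carlo loop. This is only an illustration under stated assumptions, not the simulator used for the figures: the pulse is taken as two samples (2 ns at 1 GHz), the SNR is defined per sample, the AGC scales the total input power to $\sigma_o^2$ from Table 3.1, and the quantizer is assumed to be mid-rise. All names are hypothetical.

```python
import math
import random

SIGMA_O = {2: 0.2850, 3: 0.2025, 4: 0.1425, 5: 0.1231}   # Table 3.1


def quantize(x: float, bits: int) -> float:
    """Mid-rise uniform quantizer over [-1, 1] (assumed ADC model)."""
    step = 2.0 ** (1 - bits)
    index = min(max(int((x + 1.0) // step), 0), 2 ** bits - 1)
    return -1.0 + (index + 0.5) * step


def simulate_pe(snr_db: float, bits: int, n_bits: int = 20000,
                samples_per_pulse: int = 2, seed: int = 1) -> float:
    """Estimate the bit error probability of a quantized matched-filter receiver."""
    rng = random.Random(seed)
    noise_std = math.sqrt(1.0 / 10 ** (snr_db / 10))       # unit pulse amplitude
    gain = SIGMA_O[bits] / math.sqrt(1.0 + noise_std ** 2)  # instantaneous AGC
    errors = 0
    for _ in range(n_bits):
        b = rng.choice((-1, 1))
        # Correlating with a rectangular template is a sum over the pulse samples.
        corr = sum(quantize(gain * (b + rng.gauss(0.0, noise_std)), bits)
                   for _ in range(samples_per_pulse))
        decided = 1 if corr > 0 else -1                     # ties are decided as -1
        if decided != b:
            errors += 1
    return errors / n_bits


for snr_db in (-10, 0, 10):
    print(snr_db, simulate_pe(snr_db, bits=4))
```

Reaching the precision quoted in the text for a Pe of $10^{-5}$ would need far more trials than this sketch runs by default.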
Figure 3-5 represents the result for the pulsed UWB signal. In both cases, the results were obtained for ADC resolutions of 1, 2, 3, 4 and 5 bits. The curves for a 6-bit ADC were also obtained, but since they are already very close to the ideal case with no quantization, they are not presented here. In both figures, the curve that represents the performance for the ideal case with no quantization is provided for comparison. It is seen in these figures that the OFDM UWB signal performs slightly worse than the pulsed UWB signal. This difference is caused by the presence of a cyclic prefix in the OFDM UWB signal. This prefix reduces the real power used for detection of the information bit by the ratio of the prefix length to the sum of the OFDM symbol length and the prefix length.
Figure 3-4: Probability of error for the AWGN limited case, OFDM UWB. (Probability of error versus SNR in dB. Legend entries recovered: no quantization, 1 bit, 2 bits, 3 bits, 4 bits.)
Figure 3-5: Probability of error for the AWGN limited case, pulsed UWB. (Probability of error versus SNR in dB. Legend entries recovered: no quantization, 1 bit, 2 bits, 3 bits, 4 bits.)
These simulations allow the comparison of the two signals for two regimes:
* High SNR regime: In the case of OFDM UWB, as the SNR increases, Pe tends to a saturation value and cannot be reduced further. In pulsed UWB, Pe can be made arbitrarily small by increasing the SNR. In the case of the OFDM UWB symbol, as the SNR increases, the only relevant noise term is the quantization noise, which asymptotes to a constant value of $\Delta^2/12$, with $\Delta$ being the quantization step. As an FFT of 256 points is performed, the demodulated symbol for each carrier contains noise that results from combining 256 samples of quantization noise with weights of equal amplitude but different phases. The result can be assumed to be Gaussian. In this case, Pe converges to the probability of error that corresponds to an SNR:
$$\mathrm{SNR} = \frac{12}{\Delta^2} P_s = 3 P_s 2^{2b} \tag{3.1}$$
where $P_s$ is the signal power. For an OFDM UWB signal in the high SNR regime, each additional bit in the ADC provides a better saturation Pe. In the case of pulsed UWB, a small number of samples is used for each bit. The effect of AWGN can be understood as a change of sign of the sample compared to the sign it should have. As the amplitude of the signal increases relative to the amplitude of the noise, the probability of changing the sign decreases monotonically and an arbitrarily low Pe is achieved. The behavior of OFDM UWB signals can also be explained in terms of the clipping of these signals by the ADC. Signals with a large peak-to-average ratio, such as OFDM, are more vulnerable to clipping. In this case clipping causes inter-carrier interference that increases Pe.
* Low SNR regime: For the case of pulsed UWB, 1 or 2 bits are sufficient, since the curves are close enough to the ideal case. Due to the saturation effect observed previously, the curves of OFDM UWB for a low number of bits are farther away from the ideal ones.
A Pe of $10^{-4}$ can still be achieved in the OFDM case by using an ADC of at least 3 bits.
For the interference-limited case, a pure sinusoid is chosen as a replacement for a modulated carrier with a finite data bandwidth. Thus, there are no abrupt changes of phase either over the duration of the representation of one information bit (in the case of pulsed UWB) or over the duration of an OFDM symbol, including the cyclic prefix (in the case of OFDM UWB). Its frequency is a uniform random variable in the range from 0 to half the sampling rate. Its initial phase is an independent uniform random variable from 0 to $2\pi$. The Monte Carlo simulation provides the same precision as the simulations of the AWGN limited case. The simulation results are shown in Figures 3-6 (OFDM signal) and 3-7 (pulsed UWB). For SIR values greater than those represented in the figures, Pe drops to zero. This is because the interference is an amplitude-limited signal, and its amplitude is small enough that the samples of the signal do not change. There is a threshold effect at SIR = -3 dB for pulsed UWB and at SIR = 11 dB for OFDM UWB. If the signal modulation is more complex, each bit incorporates a higher number of samples of the interference. The shortcomings of OFDM signals in the presence of strong non-linearities may be mitigated by the use of coding techniques, although this requires by default an increase in the complexity of the receiver. For all of them, while 1 and 2 bits are still slightly far from the ideal curve, 3 or 4 bits come close enough to it.
Figure 3-6: Probability of error for the interference limited case, OFDM UWB. (Probability of error versus SIR in dB.)
Figure 3-7: Probability of error for the interference limited case, pulsed UWB. (Probability of error versus SIR in dB.)
Figure 3-8: Example of UWB BPSK baseband signal, before up-conversion. (Horizontal axis: time in ns.)
Two conclusions are derived from here. First, a BPSK signal is better behaved in the presence of non-linearities. An OFDM signal requires a minimum number of bits in the ADC, even in the best SNR conditions, to work with a low enough bit-error rate, while a BPSK signal can achieve arbitrarily low BERs with a 1-bit ADC when the SNR is high enough. Even if it is possible to argue that similar performance is achievable with both types of signal, pulsed UWB allows reducing the number of bits in the ADC and adapting it to the channel quality, reducing the power consumption of both the ADC and the digital baseband when the SNR or the channel quality is good enough.
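The saturation of the OFDM curves discussed in this section can be made concrete with a short calculation. With a quantization step $\Delta = 2^{1-b}$ and a quantization noise power of $\Delta^2/12$, the quantization-limited SNR is $12 P_s/\Delta^2 = 3 P_s 2^{2b}$. The sketch below tabulates it, taking $P_s = \sigma_o^2$ from Table 3.1 as an assumed signal power at the ADC input in the high SNR limit; the function name is illustrative.

```python
import math

SIGMA_O = {2: 0.2850, 3: 0.2025, 4: 0.1425, 5: 0.1231}   # Table 3.1


def saturation_snr_db(signal_power: float, bits: int) -> float:
    """Quantization-limited SNR in dB: P_s / (delta**2 / 12), delta = 2**(1 - bits)."""
    step = 2.0 ** (1 - bits)
    return 10.0 * math.log10(signal_power / (step ** 2 / 12.0))


for bits, sigma_o in SIGMA_O.items():
    # Assumption: at high input SNR the AGC output power is essentially signal power.
    print(bits, round(saturation_snr_db(sigma_o ** 2, bits), 1))
```

For a fixed signal power each extra bit is worth about 6 dB. Because the AGC set point in Table 3.1 drops as the resolution grows, the gain per bit at the AGC operating point differs from that figure.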
The better behavior of impulses in the presence of non-linearities, which allows a more drastic trade-off in terms of complexity, was the main reason to choose BPSK impulses over OFDM signals in our system. This work was presented in [91].
3.4 Choice of UWB Signal
This baseband has been designed for impulse UWB. Each bit of information is represented by the sign of one impulse (BPSK). The signal comprises a sequence of 500 MHz bandwidth pulses (see footnote 2) that are up-converted to one of 14 channels (sub-bands) of the bandwidth available in the 3.1-10.6 GHz band. The interval between every two consecutive pulses is $T_s = 10$ ns during the payload. Each bit of information is represented in BPSK by only one pulse, achieving a data rate of 100 Mbps. The data packet structure will be described in a later section. Figure 3-8 shows an example of the baseband UWB signal representing, using BPSK, a sequence of three bits. Figure 3-9 shows a 500 MHz pulse up-converted with a carrier of 5 GHz.
2 Here and in the rest of this thesis, bandwidth refers to the -10 dB bandwidth.
Figure 3-9: 500 MHz pulse with carrier 5 GHz. Courtesy of David Wentzloff.
3.5 Multipath
In this section, the channel model in which the UWB system should work is presented. After that, the different algorithms involved in the compensation of the multipath and ISI are developed.
3.5.1 Channel Model
The channel allocated to UWB communication signals is impaired by severe multipath. The IEEE 802.15.3a group chose the multipath model presented in [3]. It is a Saleh-Valenzuela [1] model with two modifications: a lognormal distribution is used instead of a Rayleigh distribution for the multipath gain magnitude, and independent fading is assumed for each cluster as well as for each ray within the cluster. These two changes fit the available channel measurements better, but make the mathematical study of the model more complicated. Four different types of channels were provided for the transceiver simulations. Their characteristics are shown in Table 3.2.
This table includes the number of paths that have an attenuation smaller than 10 dB with respect to the most powerful path ($NP_{10dB}$), and the average number of paths that include 85% of the total energy ($NP(85\%)$). The multipath model consists of the following discrete time impulse response:
$$h_i(t) = X_i \sum_{l=0}^{L} \sum_{k=0}^{K} \alpha^i_{k,l}\, \delta(t - T^i_l - \tau^i_{k,l}) \tag{3.2}$$
where $\alpha^i_{k,l}$ are the multipath gain coefficients, $T^i_l$ is the delay of the $l$-th cluster, $\tau^i_{k,l}$ is the delay of the $k$-th ray relative to the arrival of its cluster, and $X_i$ represents the shadowing for the $i$-th realization. Since we are using the model as it is, we will not delve into how this model is generated. For more details, please refer to [3, 1, 92]. Reference [3] provides a realization of the model that tries to match important characteristics of the channel. Since it is difficult to match all possible channel characteristics, the main characteristics of the channel that are used to derive the model in [3] are: mean excess delay, RMS delay spread, number of multipath components (defined as the number of multipath arrivals that are within 10 dB of the peak multipath arrival), and power decay profile. Table 3.2 shows these characteristics for the four channel models that are provided for testing. The large bandwidth of the UWB signal allows separating the echoes that arrive at the receiver with a delay separation larger than the duration of the UWB impulses (approximately 2 ns), and using this information to implement a Rake receiver in order to gather all the possible multipath energy. Additionally, since the RMS delay of the channel is larger than the inter-symbol period (10 ns), equalization is needed to compensate for the inter-symbol interference (ISI). The procedure to compensate for the channel multipath consists of, first, estimating the channel impulse response and, then, using this information in both a Rake receiver (matched filter) and an equalizer, as shown in Figure 3-10. In the following sections these aspects will be analyzed.
Figure 3-10: Procedure to compensate for multipath. (Block diagram: matched filter followed by equalizer.)
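The channel characteristics named above can be computed from any list of path delays and gains. The sketch below uses a made-up four-path channel, not a realization of the model in [3], and it assumes that $NP(85\%)$ counts the strongest paths first; the function name and the toy values are illustrative.

```python
import math


def channel_stats(delays_ns, gains):
    """Return (mean excess delay, RMS delay spread, NP_10dB, NP(85%)) of a tap list."""
    powers = [abs(g) ** 2 for g in gains]
    total = sum(powers)
    first = min(delays_ns)
    excess = [t - first for t in delays_ns]          # delays relative to first arrival
    mean_delay = sum(t * p for t, p in zip(excess, powers)) / total
    second_moment = sum(t * t * p for t, p in zip(excess, powers)) / total
    rms_spread = math.sqrt(second_moment - mean_delay ** 2)
    # Paths within 10 dB of the strongest path.
    np_10db = sum(1 for p in powers if p >= max(powers) / 10.0)
    # Smallest number of paths (strongest first, an assumption) holding 85% of the energy.
    np_85, captured = 0, 0.0
    for p in sorted(powers, reverse=True):
        captured += p
        np_85 += 1
        if captured >= 0.85 * total:
            break
    return mean_delay, rms_spread, np_10db, np_85


toy_delays_ns = [0.0, 2.0, 4.0, 10.0]
toy_gains = [1.0, 0.5, -0.25, 0.1]
print(channel_stats(toy_delays_ns, toy_gains))
```

Averaging these quantities over many realizations of a channel model is what produces figures comparable to those in Table 3.2.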
3.5.2 Data-Aided Channel Estimation
Channel estimation in UWB communications has been previously addressed in [59, 93, 94, 95] to assess the signal energy capture in Rake receivers as a function of the number of fingers. In these papers, an isolated monocycle is transmitted through the channel and the corresponding received waveform is recorded. The problem is to approximate the actual channel with a channel with $L_c$ branches. The degree of matching depends on $L_c$, and the minimum value of $L_c$ required for a good match establishes the number of fingers that a Rake receiver must possess to efficiently exploit the channel diversity. The approach in this receiver matches that of [47]. In that paper, the authors lump together the effect of the multiuser situation as additional additive white Gaussian noise. Figure 3-11 shows a subset of the echoes in one instance of the CM1 channel. There are clusters that contain several echoes of different amplitude and sign in an interval of duration smaller than 2 ns. As we are using impulses of 500 MHz bandwidth, it is not possible to separate these echoes in the receiver. Due to the bandwidth of the signal, the number of echoes that combine is small, which does not allow the application of a Rayleigh or a Ricean model (both of which rely on the central limit theorem).
Table 3.2: Multipath Channel Models
Channel   Description     Mean Delay   RMS Delay   NP_10dB   NP (85%)
CM1       LOS, 0-4 m      5.05 ns      5.28 ns     -         24
CM2       NLOS, 0-4 m     10.38 ns     8.03 ns     -         36.1
CM3       NLOS, 4-10 m    14.18 ns     14.28 ns    35        61.54
CM4       Extreme NLOS    -            25.00 ns    -         -
Figure 3-11: Example of the clusters in one instance of the channels in [3]. (Horizontal axis: delay in ns.)
Assuming there is no inter-symbol interference, the equivalent low-pass signal received is:
$$h_{lp}(t) = b \sum_{l=0}^{L} \sum_{k=0}^{K} \alpha_{k,l}\, p(t - T_l - \tau_{k,l}) \tag{3.3}$$
where $p(t)$ is the pulse shape. The objective is to estimate the equivalent low-pass channel [50], after sampling:
$$h_e[n] = \sum_{i=0}^{L_c-1} a_i\, \delta[n - i] \tag{3.4}$$
where $a_i$ is a complex number, and the channel has already been sampled at the Nyquist rate. This expression assumes that the channel impulse response can be reliably represented with $L_c$ consecutive samples, either because the channel impulse response is shorter in duration than $L_c \cdot T_b$, where $T_b$ is the inverse of the sampling frequency, or because the taps that arrive outside this interval have a very small SNR compared to the ones inside it. The only parameters are then the amplitudes, taking into account that if no echo is found at a certain delay, its associated amplitude $a_i$ is zero. The value of $L_c$ will be determined in the sections on Rake receivers and MLSE equalizers. For the purpose of the Rake receiver, it is not necessary to separate the information of the pulse shape $p(t)$ from that of the multipath. Only the aggregate result of their convolution, as indicated in (3.4), is required to implement the Rake receiver as an approximation to the matched filter. It is assumed that the channel coherence time is much longer than the data packet, so that the channel impulse response does not change during the packet. If the channel impulse response is estimated during the preamble, this information need not be updated during the rest of the packet. During the preamble a known sequence of signed impulses is sent. From each impulse sent we obtain a noisy snapshot of the impulse response, assuming that the separation between consecutive impulses ensures there is no inter-symbol interference.
Let us define
$$\mathbf{h} = [h[0], h[1], \ldots, h[L_c - 1]]^T \tag{3.5}$$
$$\mathbf{w}[n] = [w[0], w[1], \ldots, w[L_c - 1]]^T \tag{3.6}$$
$$\mathbf{r}[n] = b[n]\,\mathbf{h} + \mathbf{w}[n] \tag{3.7}$$
In these expressions $\mathbf{h}$ (channel impulse response), $\mathbf{w}[n]$ (AWGN) and $\mathbf{r}[n]$ (received snapshot of the channel impulse response) are complex vectors. $b[n]$ is the transmitted symbol, and in general it could also be a complex number. Since we are using BPSK, $b[n] = \pm A$. During the preamble, the values of $b[n]$ are the elements of a known pseudorandom sequence and can be either 1 or -1. $\mathbf{w}[n]$ is stationary, white and Gaussian, with an autocovariance matrix equal to $\sigma^2 I_{L_c}$, with $\sigma$ being the standard deviation of the Gaussian noise and $I_{L_c}$ the identity matrix of $L_c$ rows and columns. Since for each $\mathbf{r}[n]$ the value of $b[n]$ is known, it is possible to create a sequence of new random vectors:
$$\mathbf{r}_1[n] = b[n]^{-1}\,\mathbf{r}[n] = \mathbf{h} + b[n]^{-1}\,\mathbf{w}[n] = \mathbf{h} + \mathbf{w}_1[n] \tag{3.8}$$
Since $|b[n]| = 1$ during the preamble, it is trivial to show that $\mathbf{w}_1[n]$ is also stationary, white and Gaussian, with a covariance matrix equal to $\sigma^2 I_{L_c}$. The problem of estimating the channel impulse response is then simply that of obtaining the mean of the random vector $\mathbf{r}_1[n]$ from several realizations ($N_c$) of this vector. Each realization corresponds to the reception of one impulse in the preamble. This corresponds to the least square error algorithm [96]. The procedure for the channel estimation consists of:
1. Collect a set of $N_c$ received vectors $\mathbf{r}[n]$ with $n = 1, 2, \ldots, N_c$, associated with a known sequence of bits $b[n]$.
2. For each $\mathbf{r}[n]$ obtain $\mathbf{r}_1[n]$ using equation (3.8).
3. The channel impulse response is estimated as:
$$\mathbf{h}_e = \sum_{n=1}^{N_c} \mathbf{r}_1[n] \tag{3.9}$$
which is the sample mean of $\mathbf{r}_1[n]$ up to the constant factor $1/N_c$.
The previous analysis assumes a linear receiver and that the impulses in the preamble are separated enough to ensure that there is no ISI. It is possible to ensure the non-ISI constraint by separating the impulses in the preamble by a larger time interval than in the payload.
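The three-step procedure above can be sketched in a few lines. The channel taps, the noise level and the random preamble signs below are made-up stand-ins (the actual preamble uses a Gold code), and the function names are hypothetical.

```python
import random


def estimate_channel(snapshots, symbols):
    """Accumulate b[n]**-1 * r[n] over the preamble, as in equations (3.8) and (3.9)."""
    n_taps = len(snapshots[0])
    h_e = [0j] * n_taps
    for r, b in zip(snapshots, symbols):
        for i in range(n_taps):
            h_e[i] += r[i] / b          # removing the known sign gives h + noise
    return h_e


def simulate_preamble(h, symbols, noise_std, rng):
    """ISI-free snapshots r[n] = b[n] * h + w[n] with complex white Gaussian noise."""
    return [[b * tap + complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
             for tap in h]
            for b in symbols]


rng = random.Random(0)
h_true = [0.8 + 0.1j, -0.3j, 0.2 + 0j, 0j, -0.1 + 0.05j]    # toy channel
symbols = [rng.choice((-1, 1)) for _ in range(31)]           # stand-in for the known code
snapshots = simulate_preamble(h_true, symbols, 0.1, rng)
h_e = estimate_channel(snapshots, symbols)
h_mean = [tap / len(symbols) for tap in h_e]                 # normalize only to compare
print([f"{tap:.2f}" for tap in h_mean])
```

The division by the number of snapshots is only needed here to compare against the toy channel; a BPSK Rake decision depends on the sign of the correlation and is unaffected by a common positive scale factor.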
Regarding the linearity, both the impact of the number of bits of the ADC and its saturation due to an error in the automatic gain control must be analyzed. These effects are modeled through simulation, and the results are shown in Figures 3-12 and 3-13. These figures show the contours of the minimum SNR required to obtain an SNR of 10 dB in the estimation of the channel impulse response, as a function of the number of bits of the ADC and the number of impulses $N_c$ used to obtain the channel impulse response estimate. Figure 3-12 shows these curves when the gain provided by the front end ensures that the full range of the ADC is used without saturation. Using ADCs of 4 or 3 bits and integrating for $N_c > 10$ ensures a very good SNR in the estimation even at very low input SNR. Figure 3-13 shows what happens when the gain of the system is 6 dB larger than in Figure 3-12, causing saturation. In this case, although 4 bits still allow the same behavior as before, there is a noticeable decrease in performance for 3 bits. Still, for $N_c > 10$, the channel impulse response estimate remains reliable. This serves to relax the constraints on the automatic gain control (AGC). Steps of 6 dB of gain in the front end should then be enough for the channel estimation to work properly. Apart from that, for the channel estimation, $N_c$ equal to the length of the Gold code is chosen, because it is a very conservative value that would tolerate even worse performance of the AGC.
Figure 3-12: Minimum SNR at the input to achieve a 10 dB SNR in the channel estimation as a function of the number of bits of the samples and the length of the integration. No saturation at the input of the channel estimation. (Horizontal axis: number of pulses integrated.)
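As a point of reference for these contours, an ideal linear receiver with no quantization improves the SNR of the estimate by a factor equal to the number of snapshots combined. The sketch below evaluates that idealized bound only; it ignores the ADC effects that Figures 3-12 and 3-13 capture, and the function name is illustrative.

```python
import math


def min_input_snr_db(target_snr_db: float, n_pulses: int) -> float:
    """Input SNR needed to reach the target after combining n_pulses snapshots.

    Idealized: linear receiver, white noise, no quantization or saturation.
    """
    return target_snr_db - 10.0 * math.log10(n_pulses)


for n_pulses in (1, 10, 31):
    print(n_pulses, round(min_input_snr_db(10.0, n_pulses), 1))
```

Under this idealization, combining as many snapshots as the Gold code length used in the preamble (31) lowers the required input SNR by roughly 15 dB relative to a single snapshot.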
3.5.3 Rake Receiver
Multiuser detection [97, 98] is known to be the optimal solution even in a multipath environment but, as its complexity increases exponentially with the number of users, it is often impractical. The optimum receiver for detecting signals in a multipath environment, when the observation noise is modeled as additive white Gaussian noise (AWGN), is a matched filter or a correlation receiver, where the reference (template) signal is the response of the transmission medium to a transmitted signal (composite of the channel and the transmitted signal). A Rake receiver resolves the components of a received signal (arriving at different times) and combines them to provide diversity in multipath environments [99]. A Rake receiver is a suboptimal solution for a multiuser environment, since it models the interferers also as AWGN. It is a good trade-off between high performance and low complexity. In addition, it represents the building block for other schemes performing multiuser interference cancellation. It is known that as the spreading bandwidth of a signal increases, the number of resolvable multipath components available also increases, making the signal amenable to improvement by the use of a Rake receiver. In [100, 101, 60, 59, 62], the term all Rake (ARake) receiver is used to describe the receiver with unlimited resources (taps or correlators) and instant adaptability, so that it can, in principle, combine all of the resolved multipath components, even if their number increases with the spreading bandwidth. However, the number of multipath components that can be utilized in an implemented Rake is limited by power consumption, design complexity and channel estimation [49].
Figure 3-13: Minimum SNR at the input to achieve a 10 dB SNR in the channel estimation as a function of the number of bits of the samples and the length of the integration. 6 dB saturation. (Horizontal axis: number of pulses integrated.)
The opposite approach to the ARake is Selection Combining (SC), whereby the received signal is selected from one out of the $L_c$ available diversity branches. Another well known approach is maximal-ratio combining (MRC) [102]. In MRC, the received signals from all the diversity branches are weighted and combined to maximize the instantaneous signal-to-noise ratio (SNR) at the combiner output. The fact that in a practical receiver not all the multipath components can be taken into account has been addressed in several studies that use a reduced-complexity multipath combining system that selects the $L$ best paths (from the $L_c$ available) and then combines them based on a chosen criterion. Those receivers are known as selective Rake (SRake) receivers. Selecting the "best" paths can be accomplished by selecting the multipath components with the largest signal-to-noise ratio (SNR), corresponding to those echoes with smaller attenuation. A hybrid scheme in which $L$ out of $L_c$ components are selected and then combined using MRC has been developed for DS-CDMA signals in several of the publications cited in this section. These publications analyze the performance of such a receiver under the assumption that the channel is a slowly varying wide-sense stationary uncorrelated scattering (WSSUS) channel. A Rake receiver can be understood as an FIR filter with an impulse response:
$$h[n] = \sum_{i=0}^{L-1} a_i\, \delta(n - m_i) \tag{3.10}$$
in which the number of taps $L$ is fixed, but both $a_i$ and $m_i$ are configurable. A possible implementation is shown in Figure 3-14. Normally, out of the whole impulse response, the $L$ most powerful components would be detected and the FIR would be set accordingly. In our case, the Rake receiver is modeled with the following expression:
$$h[n] = \sum_{i=0}^{L-1} a_i\, \delta(n - i) \tag{3.11}$$
Figure 3-15 shows the functional diagram of an implementation of this filter.
Figure 3-14: Functional diagram of a Rake receiver.
In this case, there is a minimum number of taps with consecutive fixed delays, but the amplitudes are programmable, and parts of the filter (those whose corresponding weight $a_i$ is equal to zero) may be turned off (marked in Figure 3-15 as a switch). This model of the Rake assumes that the maximum length of the channel impulse response is equal to $L \cdot T_b$, where $T_b = 2$ ns is the inverse of the sampling frequency. After the channel impulse response is estimated (as shown in the previous section), each of the weights is compared to a preprogrammed threshold, and only those taps with weights that exceed the threshold are used in the Rake. Instead of having a fixed number of fingers, this Rake uses every sample of the channel impulse response that meets a programmable threshold Th. Either all the samples with the same absolute value are used simultaneously or none at all. For example, Figure 3-16(a) shows the impulse response of a CM1 channel. Figure 3-16(b) corresponds to the result of sampling this channel impulse response with 4-bit precision and using a Rake receiver of 6 fingers. Figure 3-16(c) shows the equivalent impulse response if we choose all those samples of the impulse response that go over a threshold equal to Th = 1 LSB. By using a threshold, the number of fingers is a random variable and adapts to the channel impulse response. The block that searches for the most powerful samples in the channel impulse response, required in a conventional Rake receiver, is replaced by the comparison of the samples with the threshold Th. This reduces the complexity of the total receiver.
Figure 3-15: Functional diagram of the Rake receiver that will be implemented in this UWB system.
For simulation purposes, the transmitted signal can be written as follows:
$$s(t) = \sum_j b_j\, p(t - jT_f) \tag{3.12}$$
This signal passes through the channel impulse response $h(t)$, which is assumed to have a maximum duration of 25 samples.
So the received signal can be modeled as
$$\mathbf{r}_j = b_j\,\mathbf{h} + \mathbf{w}_j \tag{3.13}$$
where $\mathbf{r}_j$ is a complex vector containing the samples that carry any information about bit $b_j$. $\mathbf{h}$ represents the channel impulse response sampled at the Nyquist rate and down-converted to the low-pass equivalent of the signal. $\mathbf{w}_j$ is AWGN. In this case, the received signal after matched filtering is
$$\mathbf{h}^H \mathbf{r}_j = \mathbf{h}^H (b_j\,\mathbf{h} + \mathbf{w}_j) = b_j \sum_{i=0}^{L_c-1} |h_i|^2 + \mathbf{h}^H \mathbf{w}_j \tag{3.14}$$
where the superscript $H$ indicates the conjugate transpose. The signal to noise ratio obtained is
$$\mathrm{SNR} = \frac{1}{N_0} \sum_{i=0}^{L_c-1} |h_i|^2 \tag{3.15}$$
The value used for demodulation is obtained as follows:
$$d = \Re\left\{ \hat{\mathbf{h}}^H (\mathbf{h}\, b_j + \mathbf{w}_j) \right\} \tag{3.16}$$
so that if $d > 0$ the demodulated bit is a 1, and if $d < 0$ the demodulated bit is a 0. $\hat{\mathbf{h}}$ represents the estimated channel impulse response. This expression assumes that there is no Multiple Access Interference (MAI), and that transmitter and receiver are properly synchronized (both in phase and in delay). Taking into account that $\mathbf{w}_j$ is AWGN with variance $N_0$, it is possible to obtain the mean and variance of $d$ and, due to the Gaussianity of the noise, a very straightforward derivation of the probability of error as a function of the $E_b/N_0$ at the input of the Rake receiver. Since quantization both in the channel impulse response estimation and in the input signal makes this structure difficult to analyze, simulation with the channels indicated in [3] was used to characterize the performance of the receiver. Figure 3-17 shows the losses of this Rake receiver with respect to the perfect Rake receiver, as a function of the normalized threshold and the channel model. Since a decrease of Th implies that a larger number of paths is taken into account, Figure 3-17 shows that signal processing complexity may be traded off with quality of service. The channel estimation was obtained with a precision of 4 bits.
Figure 3-16: Modified Rake receiver. (a) CM1 channel impulse response. (b) Channel after conventional 6-finger Rake. (c) Channel after threshold Rake. (Horizontal axis: time in samples.)
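The threshold Rake and the decision rule can be sketched as follows. The text does not specify how a complex weight is compared with Th, so the sketch assumes a comparison of the tap magnitude; the channel values are made up, and the loss expression is the standard output-SNR ratio between a mismatched filter and the ideal matched filter. All names are hypothetical.

```python
import math


def threshold_rake(h_est, threshold):
    """Keep the estimated taps whose magnitude meets the threshold, zero the rest."""
    return [tap if abs(tap) >= threshold else 0j for tap in h_est]


def rake_output(weights, received):
    """Decision variable d = Re{w^H r} for one bit interval."""
    return sum((w.conjugate() * r).real for w, r in zip(weights, received))


def demodulate(weights, received):
    """Bit decision: 1 if d > 0, otherwise 0."""
    return 1 if rake_output(weights, received) > 0 else 0


def loss_db(h, weights):
    """Output SNR loss of the filter `weights` relative to the ideal matched filter."""
    num = abs(sum(w.conjugate() * x for w, x in zip(weights, h))) ** 2
    den = sum(abs(x) ** 2 for x in h) * sum(abs(w) ** 2 for w in weights)
    return -10.0 * math.log10(num / den)


h = [0.9 + 0j, 0.1 + 0.1j, -0.5j, 0.05 + 0j, 0.3 + 0j]    # toy channel, perfectly estimated
weights = threshold_rake(h, 0.25)
fingers = sum(1 for w in weights if w != 0)
print(fingers, round(loss_db(h, weights), 3))
print(demodulate(weights, h), demodulate(weights, [-x for x in h]))
```

Raising the threshold switches off more taps, which lowers the number of multiplications per bit at the cost of a larger loss. This is the complexity versus quality-of-service trade-off described in the text.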
The maximum loss for any channel is 6 dB as compared to the optimum ARake. This plot makes explicit a tradeoff between signal processing complexity (as the threshold increases, the complexity of the receiver decreases) and quality of service (as the threshold increases, the minimum SNR required to obtain a fixed performance increases).

Figure 3-17: Losses in the modified Rake receiver as a function of the normalized threshold and the channel model.

3.5.4 MLSE Equalizer

The Viterbi-based MLSE equalizer is used in this architecture to compensate for the inter-symbol interference that occurs when the channel impulse response is longer than the time between two consecutive pulses. The number of states in an MLSE equalizer can be obtained from the length of the impulse response of the channel, $L_{imp}$ [63]. The number of states required for a BPSK signal in the MLSE equalizer is $2^{L_{MLSE}}$, with:

$$L_{MLSE} = \left\lceil \frac{L_{imp}}{N_b} \right\rceil \tag{3.17}$$

where $N_b = 5$ is the number of samples between two consecutive pulses. The MLSE equalizer implemented will be able to cope with a predefined maximum channel impulse response. It is possible to downscale the MLSE equalizer if the length of the impulse response does not require the use of the entire number of states. $L_{MLSE}$ can also be interpreted as the length of the channel impulse response considered in the MLSE equalizer. $L_{MLSE} = 1$ indicates a channel impulse response shorter than 10 ns, or 5 samples. In this case, it is assumed that there is no ISI. $L_{MLSE} = 2$ indicates a channel impulse response longer than 10 ns and shorter than 20 ns. Each symbol affects the next one, and an MLSE of 4 states is required. $L_{MLSE} = 3$ requires an MLSE of 8 states and $L_{MLSE} = 4$ requires an MLSE of 16 states.

Figure 3-18 shows the losses in SNR associated with the MLSE equalizer as a function of the parameter $L_{MLSE}$, for the different channels that were introduced in Table 3.2. This figure assumes a Rake receiver with a threshold Th equal to 1 LSB.
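The relation between channel length and equalizer size listed above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the handling of a channel of exactly 5, 10, or 15 samples (rounded into the lower case here) is an assumption, since the text only states the ranges.

```python
import math

# 10 ns payload pulse period divided by the 2 ns sampling period.
SAMPLES_PER_PULSE_PERIOD = 5

def mlse_length(channel_samples):
    """Number of pulse periods spanned by the channel (L_MLSE), at least 1."""
    return max(1, math.ceil(channel_samples / SAMPLES_PER_PULSE_PERIOD))

def mlse_states(l_mlse):
    """Trellis states for BPSK, following the counts given in the text."""
    return 2 ** l_mlse

for samples in (4, 8, 12, 18):
    l_mlse = mlse_length(samples)
    print(samples, "samples -> L_MLSE =", l_mlse, "->", mlse_states(l_mlse), "states")
```

Because the state count doubles with every additional pulse period covered, capping $L_{MLSE}$ is the main lever for limiting the area of the equalizer.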
For both channel models CM1 and CM2, even without using an MLSE equalizer (or using one of 2 states), a maximum loss of 1 dB is obtained. Channel model CM3 provides a satisfactory performance with an MLSE equalizer of 4 states. Only CM4 requires higher complexity than this. This figure was obtained using a channel impulse response representation of 4 bits (for each of the real and imaginary parts) and a maximum channel impulse response of 25 taps.

Figure 3-18: Losses associated with the parameter $L_{MLSE}$ in the Viterbi demodulator.

Since the complexity of the MLSE equalizer is exponential in the parameter $L_{MLSE}$ [65], it is important to constrain it as much as possible. This is one of the structures that will take a larger percentage of the area of the ASIC. For this reason, instead of choosing an MLSE equalizer with a complexity of $L_{MLSE} = 4$, a complexity of $L_{MLSE} = 3$ is chosen, which, according to Figure 3-18, should be enough for the channel models used in these simulations. Additional specifications that can already be chosen are the maximum channel impulse response that will be considered (25 taps) and the number of bits required for the channel impulse response representation (4 bits for the real part and 4 bits for the imaginary part).

3.6 Choice of Packet Format

Each data packet consists of a preamble and a payload, as shown in Figure 3-19. During the preamble, the receiver should detect the presence of a packet and achieve coarse timing synchronization. The receiver also obtains an estimate of the channel impulse response. For that purpose, the preamble is comprised of a series of 500 MHz bandwidth impulses in which the separation between every two consecutive impulses allows estimating the channel impulse response without having to compensate for inter-symbol interference (ISI). The preamble is, as in the previous transceiver, built with a sequence of bits, each of them represented by 31 consecutive pulses modulated in sign by a pseudorandom code (Gold code).
The preamble contains 16 repetitions of this Gold code. The duration of the interval between every two consecutive pulses in the preamble is denoted $T_p$, which is chosen to be an integer multiple of the sampling time in the receiver for convenience, so that $T_p = N_p T_b = 60$ ns with $N_p = 30$ an integer. During the payload, the pulse repetition frequency (PRF) increases in order to achieve the required data rate. The time interval between every two consecutive impulses is $T_{pl} = N_b T_b = 10$ ns. In the payload every bit of information is represented by only one impulse, so that the processing gain comes only from the duty cycle of the signal, which is still less than 100%. The amplitude of the impulses does not change from the preamble to the payload, although this means that the average power during the preamble is smaller than the average power during the payload. This also means that the SNR changes from the preamble to the payload. Since the processing gain also changes (number of impulses that represent a bit of information, duty cycle), their specifications are also independent.

Figure 3-19: Design of the data packet, showing the preamble (State 1: acquisition, State 2: channel estimation, State 3: end of preamble detection) and the payload (State 4). Courtesy of V. Sze.

The last repetition of the preamble has the sign of the impulses reversed and serves to detect the end of the preamble. This preamble ensures a 90% packet detection rate at the sensitivity level. The total duration of the preamble is 30 µs, comprised of 16 repetitions of the Gold code. The payload has a length of 5 kbits and the pulse repetition period is 10 ns.

3.7 Baseband Functionality

The algorithms described in the previous sections of this chapter are fully implemented in a custom ASIC. Some of them are also implemented in an FPGA in a discrete prototype, depending on the resources available in the FPGA.
For both the complete receiver and the simplified version implemented in the discrete prototype, the receiver works as a state machine of four states. These four states, which are related to packet detection and demodulation, are the following:

1. Packet detection (PD) - The incoming signal is correlated with the pseudorandom sequence in order to detect a peak of correlation. The output of the correlation is compared with a threshold. While the threshold is not met, the receiver keeps looking for the peak of correlation. Once the threshold is met, the packet presence is declared, and the position of the maximum is assumed to contain the echo with the most energy of the multipath.

2. Channel estimation (CE) - The largest echo is assigned to the third tap of the estimated channel impulse response. Another repetition of the pseudorandom sequence in the preamble is used to estimate the channel impulse response. Since the PRF in the preamble is lower than that of the payload, as long as the channel impulse response is shorter than 60 ns, the receiver is able to obtain a clear estimation of the channel impulse response. This channel impulse response is truncated and quantized in the digital domain to reduce it to a number of bits that is feasible for implementation. At this stage, the system chooses the effective impulse response that is used for the next stages. There is a chance that PD happened with one of the last repetitions of the pseudorandom code. If that happens, and CE is performed on the final repetition of the pseudorandom code (where the signs are reversed), this situation is detected and taken into account: the next state is skipped and the receiver jumps directly to payload.

3. End of preamble detection (EPD) - In this stage the receiver looks for the end of the preamble, which is marked by the negated pseudorandom sequence.
During this stage, the maximum of the correlation is also compared to a second threshold in order to ensure that the signal has not been lost and that the system has not locked to a false alarm. This threshold is lower than the one used in PD, since it only needs to detect false alarms with a high probability.

4. Payload (PL) - Once the beginning of the payload is detected, the system changes to payload demodulation. The matched filter is programmed with the full impulse response estimated in CE and adjusted according to the parameters of the receiver. Normal communication systems have a variable-length payload. This usually involves including more complex information in the preamble (the length of the packet, usually with some protection for this information). In our case the length is controlled by the program and does not change. Once payload demodulation starts, it repeats for a fixed number of cycles during which both frequency and delay tracking are activated as required.

Figure 3-20 shows the block diagram of the digital baseband necessary to robustly demodulate the UWB signal, including the algorithms that were chosen in previous sections of this chapter. In this block diagram, the programmable features used to adapt the receiver to the channel characteristics are indicated. The energy spread caused by the multipath can be compensated using a Rake receiver [63] that provides up to 25 taps. The inter-symbol interference due to multipath is addressed with an MLSE demodulator [50] with 8 states. These elements require an estimation of the impulse response that may be obtained during the packet synchronization using the correlators. The channel impulse response is estimated with a maximum precision of 4 bits, using the information of 31 impulses.
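The four-state flow described above can be summarized as a small transition function. The Python sketch below is only an illustration: the argument names and the default threshold values are hypothetical, and returning to packet detection when the second threshold is not met (and after the payload ends) is an assumption about how a false alarm or a finished packet would be handled.

```python
from enum import Enum

class State(Enum):
    PD = "packet detection"
    CE = "channel estimation"
    EPD = "end of preamble detection"
    PL = "payload"

def next_state(state, corr_peak=0.0, pd_threshold=1.0, epd_threshold=0.5,
               ce_done=False, reversed_code=False, payload_done=False):
    """Return the next receiver state.

    corr_peak     -- magnitude of the current correlation peak
    ce_done       -- channel estimation over one code repetition has finished
    reversed_code -- the sign-reversed (last) code repetition was observed
    payload_done  -- the fixed number of payload bits has been demodulated
    """
    if state is State.PD:
        return State.CE if corr_peak >= pd_threshold else State.PD
    if state is State.CE:
        if not ce_done:
            return State.CE
        # CE on the reversed repetition: skip EPD and go straight to payload.
        return State.PL if reversed_code else State.EPD
    if state is State.EPD:
        if corr_peak < epd_threshold:
            return State.PD  # assumption: treat as false alarm or lost signal
        return State.PL if reversed_code else State.EPD
    if state is State.PL:
        return State.PD if payload_done else State.PL
    raise ValueError(f"unknown state: {state}")

print(next_state(State.PD, corr_peak=1.5))
```

Keeping the second threshold below the first reflects the text: once a packet has been declared, the check only has to reject clear false alarms.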
The input samples to the digital baseband must have 4 bits, although fewer bits are required if the SNR is high or the multipath is not severe. Timing synchronization is achieved with both a Delay Locked Loop (DLL) and a Phase Locked Loop (PLL) [40]. Only the automatic gain control is fed back to the RF front-end. The whole timing synchronization is performed in the digital domain.

Figure 3-20: Required functionality of the digital baseband.

3.8 Non-idealities Model

Although it is not strictly accurate, I will consider the front-end of the UWB system to be all the functional blocks that process the signal from the antenna output to the analog-to-digital converter. Strictly speaking, the RF front-end includes only those elements that process the signal with the carrier: the LNA and all the programmable gain stages that are used before the down-conversion of the signal. After the down-conversion, which removes the carrier from the signal and yields the in-phase and quadrature components of the low-pass equivalent of the signal, the signal is further filtered and amplified, but this is performed in the baseband domain.

A direct conversion transceiver can be modeled as shown in Figure 3-21. The signal coming from the antenna is processed by an RF front-end. The function of the RF front-end is to separate the desired signal from other signals present in the environment in different bands. It is usually comprised of a low noise amplifier that helps to reduce the final system noise figure and provides a minimum gain to the system. In addition, some filtering is provided that, together with the baseband filters included in the baseband part, helps to isolate the band of interest, where the signal is being received, from the rest of the signals present. This specification depends on the expected coexistence with the other signals with which UWB competes in the spectrum.
The RF front-end must meet the specifications on noise figure and linearity over a bandwidth larger than 500 MHz. The impulse responses of both the antenna and the RF front-end add to that of the channel. Since the receiver will only be able to deal with a maximum channel impulse response set by design, the RF front-end must be designed to meet this constraint.

Figure 3-21: A simplified block diagram of a direct conversion front end.

For each of the blocks shown in Figure 3-21 we will obtain an equivalent model that allows us to model its limitations. The following things are taken into account:

* Antenna: The antenna is assumed to be a linear filter of the energy that it receives from the environment. For that reason, it is described only by an impulse response function $h_a(t)$ with a Fourier transform $H_a(j\omega)$. It will also be assumed that it has a noise temperature of $T_a$ in kelvin.
* Low Noise Amplifier: The LNA has a linear impulse response $h_{LNA}(t)$, with a Fourier transform $H_{LNA}(j\omega)$ that has a maximum gain of $G_{LNA}$. It adds a noise figure $F_{LNA}$ and it includes a non-linearity that will be modeled as a memory-less non-linearity, characterized by a third order intermodulation component $a_{3,LNA}$ and a fifth order intermodulation component $a_{5,LNA}$.
* Mixer: The down-converter is a multiplier that multiplies the input signal with a cosine function to obtain the in-phase component of the equivalent low-pass (ELP) of the incoming signal and with a sine function to obtain the quadrature component. Its non-idealities are mostly modeled as a non-linearity characterized by the coefficients $a_{3,m}$ and $a_{5,m}$, a noise figure $F_m$, and an I-Q unbalance. For the I-Q unbalance the in-phase branch is the temporal reference, and the quadrature branch has both a difference in gain of $\Delta_m$ and a difference in phase of $\Delta\phi_m$.
* Baseband filter: This filter also provides some programmable gain, $G_f$, and an impulse response $h_f(t)$.
Finally, it is also assumed to introduce a non-linearity characterized by the coefficients $a_{3,f}$ and $a_{5,f}$. Although the two baseband filters will have slightly different impulse responses, we will not specifically address this problem in this thesis. It also has a noise figure $F_f$.
* ADC: It will be modeled as a perfect analog-to-digital converter. Although it is possible that the in-phase and quadrature ADCs have slightly different gains and different integral and differential non-linearities, this will not be specifically addressed by this thesis.

For simulation purposes, this system is simplified to the following characteristics:

* AWGN: All the noise is added at the input of the system, consistent with linear system analysis. The noise figure is specified only at this point.
* ELP: Everything is referred to the equivalent low-pass model of the signal.
* Channel impulse response: The total impulse response of the front-end is the convolution of the impulse responses contributed by each of the components of the system. It is specified as a maximum length of the channel impulse response, given as a number of samples.
* Non-linearity: Modeled as a single non-linearity applied to the part of the band that falls directly into the baseband after going through the mixer. For this reason the analysis of this part may be oversimplified.
* Unbalance between the in-phase and quadrature components: Given in this case only by a difference in gain of $\Delta_m$ and a difference in phase of $\Delta\phi_m$.

The model including these characteristics is fully developed in Appendix A. For the purposes of the specification of the system, a minimum SNR of 4 dB is required in order to obtain a maximum packet error rate (PER) of 10%. This SNR is defined as that of the payload of the data packet. During the preamble, since the duty cycle of the signal is smaller, the SNR decreases. But this is compensated by the use of a known sequence of 31 impulses during the preamble.
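A back-of-the-envelope check of the last two sentences, under assumptions made only for this illustration: the SNR is taken as average signal power over noise power in the same bandwidth, the pulse amplitude is the same in preamble and payload, and the 31 pulses of one code repetition are combined coherently without loss. The variable names are hypothetical.

```python
import math

def to_db(ratio):
    """Convert a power ratio to decibels."""
    return 10 * math.log10(ratio)

PREAMBLE_PULSE_PERIOD_NS = 60   # pulse spacing in the preamble
PAYLOAD_PULSE_PERIOD_NS = 10    # pulse spacing in the payload
PULSES_PER_PREAMBLE_BIT = 31    # length of the Gold code

# Fewer pulses per unit time lowers the average signal power in the preamble.
duty_cycle_change_db = to_db(PAYLOAD_PULSE_PERIOD_NS / PREAMBLE_PULSE_PERIOD_NS)

# Coherent combining of the known 31-pulse sequence (idealized).
combining_gain_db = to_db(PULSES_PER_PREAMBLE_BIT)

net_change_db = duty_cycle_change_db + combining_gain_db
print(f"duty cycle: {duty_cycle_change_db:.2f} dB, "
      f"combining: {combining_gain_db:.2f} dB, net: {net_change_db:.2f} dB")
```

Under these idealized assumptions the combining gain outweighs the reduction in average power, which is in line with the statement that the known sequence compensates for the lower duty cycle.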
3.9 Link Budget

For the path loss model I will follow the model that was presented in [3]. Another related study is [103]. A free space path loss model is adopted for propagation. This model is based on the narrowband path loss calculations (known as the Friis transmission formula), and justification for its use was provided in [104]:

$$L = 20 \log_{10}\left( \frac{4 \pi d f_c}{c} \right) \tag{3.18}$$

where $d$ is the distance, $c = 3 \times 10^8$ m/s and $f_c$ is the geometric center frequency of the waveform:

$$f_c = \sqrt{f_{min} f_{max}} \tag{3.19}$$

and $f_{min}$ and $f_{max}$ are the -10 dB edges of the waveform spectrum. The effect of multipath in UWB signals is already made explicit by the use of the channel impulse responses indicated in [3].

It is not possible to generate a causal impulse that perfectly fits the spectrum mask given by the FCC. For that reason, whatever power spectral density is generated, its maximum must be fitted to the maximum allowed by the FCC. As a consequence, part of the maximum total power that the UWB system would be allowed to use is lost, as shown in Figure 3-22. These are losses as compared to a signal that makes perfect use of the available band (a sequence of sinc pulses).

Figure 3-22: Explanation of the losses due to the shape of the pulse. Horizontal axis: frequency (MHz).

This loss can be shown to be:

$$L_{pulse\ shape} = 10 \log_{10}\left( \frac{2\pi \cdot BW}{\int_{-\infty}^{\infty} |P_0(j\omega)|^2 \, d\omega} \right) \tag{3.20}$$

where $BW$ is the bandwidth desired for the signal and $P_0(j\omega)$ is the Fourier transform of the pulse used for the BPSK, $p_0(t)$, normalized such that its maximum (in frequency) is equal to 1. If we perform this for a Gaussian pulse (or an approximation to the Gaussian pulse) we obtain $L_{pulse\ shape} = 2.33$ dB. The full development of these results is shown in Appendix A. For the power spectral density of a PPM signal, refer to [105, 106, 107, 108].
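The link budget terms above can be evaluated numerically. The sketch below is an illustration under stated assumptions, not the derivation of Appendix A: the Gaussian power spectrum is assumed to have its -10 dB edges exactly $BW$ apart, the antennas are assumed to have 0 dBi gain, and the function names are hypothetical. The pulse shape loss is computed in hertz, which is equivalent to the expression in (3.20) written in rad/s.

```python
import numpy as np

C = 3e8  # speed of light in m/s, as used in the text

def geometric_center_frequency(f_min, f_max):
    """Geometric mean of the -10 dB band edges, as in (3.19)."""
    return np.sqrt(f_min * f_max)

def free_space_path_loss_db(f_c, distance_m):
    """Friis free-space path loss in dB at distance_m meters."""
    return 20 * np.log10(4 * np.pi * distance_m * f_c / C)

def gaussian_pulse_shape_loss_db(bw_hz=500e6, edge_db=10.0):
    """Loss of a Gaussian spectrum (peak 1) against a flat spectrum of width bw_hz.

    Assumption: |P0(f)|^2 = exp(-f^2 / (2 sigma^2)) drops by edge_db at +-bw_hz/2.
    """
    sigma = (bw_hz / 2) / np.sqrt(2 * np.log(10 ** (edge_db / 10)))
    f = np.linspace(-10 * sigma, 10 * sigma, 200001)
    psd = np.exp(-f ** 2 / (2 * sigma ** 2))
    area = np.sum(psd) * (f[1] - f[0])  # simple Riemann sum of the spectrum
    return 10 * np.log10(bw_hz / area)

def received_power_dbm(f_c, distance_m, bw_hz=500e6, psd_limit_dbm_per_mhz=-41.3):
    """Received power for a pulse whose spectral peak sits at the FCC limit."""
    eirp_dbm = psd_limit_dbm_per_mhz + 10 * np.log10(bw_hz / 1e6)
    return (eirp_dbm
            - gaussian_pulse_shape_loss_db(bw_hz)
            - free_space_path_loss_db(f_c, distance_m))

print(f"pulse shape loss: {gaussian_pulse_shape_loss_db():.2f} dB")
print(f"received power at 10 m, 4 GHz: {received_power_dbm(4e9, 10.0):.1f} dBm")
```

With the -10 dB bandwidth assumption, the numerical loss agrees with the 2.33 dB quoted for the Gaussian pulse to within rounding, which suggests the assumption is at least compatible with the text.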
Taking this into account, and applying the formula that relates the system noise figure to the propagation loss for the different 500 MHz bands in the FCC compliant band, a sensitivity of -81 dBm (signal power received at 10 m distance) requires that the noise figure of the receiver, when it is set to provide the maximum gain, be at most 5 dB. Figure 3-23 shows the minimum received power at 10 m as a function of the center frequency of the UWB signal. Figure 3-24 shows the range of the automatic gain control required for each center frequency, taking into account a minimum distance of 30 cm and a maximum distance of 10 m. Finally, Figure 3-25 shows the maximum noise figure allowed in the system depending on the center frequency.

Figure 3-23: Minimum received power as a function of the center frequency at 10 m.

Figure 3-24: Range of the AGC as a function of the center frequency.

Figure 3-25: Maximum noise figure of the receiver as a function of the center frequency.

3.10 Summary

The objective of this chapter was to specify a UWB system that transmits a raw data rate of 100 Mbps at 10 m distance using a bandwidth of 500 MHz in the FCC compliant UWB band (from 3.1 GHz to 10.6 GHz). It has been established in this chapter that a homodyne architecture is better suited for ultra-wideband signals and that impulse UWB offers the possibility of scaling down the complexity of the receiver when the SNR and the channel impulse response are good. A data packet is defined that is comprised of a preamble and a payload. The preamble is composed of 16 repetitions of a Gold code of 31 bits, in which every two consecutive impulses are separated by an interval of 60 ns. The Gold code is used during the detection of the data packet, and the 16 repetitions give a time to achieve packet acquisition of 30 µs. The separation between impulses allows the estimation of the channel impulse response with reduced or no ISI.
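The packet timing quoted in this summary follows directly from the packet parameters. The short Python sketch below reproduces the arithmetic; the constant names are hypothetical, and reading "5 kbits" as 5000 bits is an assumption.

```python
GOLD_CODE_LENGTH = 31          # pulses per preamble bit
PREAMBLE_REPETITIONS = 16      # repetitions of the Gold code
PREAMBLE_PULSE_PERIOD_NS = 60  # spacing between preamble pulses
PAYLOAD_BITS = 5000            # assumption: "5 kbits" read as 5000 bits
PAYLOAD_PULSE_PERIOD_NS = 10   # one pulse per payload bit

def preamble_duration_us():
    """Total preamble duration in microseconds."""
    pulses = GOLD_CODE_LENGTH * PREAMBLE_REPETITIONS
    return pulses * PREAMBLE_PULSE_PERIOD_NS / 1000

def payload_duration_us():
    """Total payload duration in microseconds."""
    return PAYLOAD_BITS * PAYLOAD_PULSE_PERIOD_NS / 1000

def raw_payload_rate_mbps():
    """Raw bit rate during the payload in Mbps."""
    return 1000 / PAYLOAD_PULSE_PERIOD_NS

print(preamble_duration_us(), payload_duration_us(), raw_payload_rate_mbps())
```

The preamble comes out slightly under the rounded 30 µs figure, and one pulse every 10 ns gives the 100 Mbps raw rate targeted by the chapter.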
The channel impulse response is estimated using the information obtained by receiving a sequence of 31 impulses. Each tap of the channel impulse response is represented with a complex number in which both real and imaginary parts have 4-bit precision. The sensitivity of this system is -81 dBm with a noise figure of 5 dB.

Chapter 4

FPGA Implementation

As part of the process of developing a complete UWB system, a complete prototype was built in the Digital Circuits and Systems group with off-the-shelf components. This prototype was used to validate some of the theoretical claims of the system in real-time conditions. As the components of the system are designed and fabricated, they can also be individually substituted into the prototype to verify overall system functionality. Due to the flexibility of the prototype, its applicability is not restricted to impulse UWB signals, and other kinds of modulations can be tested.

This transceiver is the result of the work of a group of students of the Digital Circuits and Systems group. The dedicated pulse generator of this transceiver was designed by David Wentzloff. The RF front-end of this transceiver was designed by Fred Lee. The digital baseband processor, comprised of both off-the-shelf and custom designed boards, was designed by Kyle Gilpin. The software interface between this set of boards and a PC was designed by Nathan Ackerman. Nathan Ackerman also provided an application interface to send data packets through the wireless link provided. The digital baseband implemented in the digital baseband processor was designed by Vivienne Sze and myself.

4.1 Architecture of the Discrete Platform

The primary purpose of the UWB development platform is to allow rapid prototyping and performance characterization of a UWB communication channel. The UWB development platform also aims to provide testing of logic designs before ASIC fabrication.
In order to accomplish this, the UWB development platform must natively contain all components essential for transmission and reception of data over UWB. Furthermore, the UWB development platform must be modular to allow replacement of modules without loss of functionality for testing and characterization.

Figure 4-1 shows the block diagram of the discrete prototype. It can be divided into three distinct sections: the transmitter, the receiver, and the ADC and baseband processing. The baseband UWB signal is generated using either a programmable arbitrary waveform generator (AWG) or a dedicated pulse generator built using off-the-shelf components. The signal is then up-converted to a center frequency that may be selected using the programmable oscillator. The link between the transmitter and receiver can be made through wireless transmission using various antennas [109, 110] and spatial configurations to emulate a wide range of channels. The transmitter and receiver may otherwise be directly connected through a cable with a variable attenuator to emulate an ideal channel.

Figure 4-1: Block diagram of the discrete prototype.

The receiver replicates a direct conversion receiver front-end. After this, the signal is sampled by a dual ADC. The sampling frequency of this ADC can go up to 1 GSPS, but for most of the testing purposes it was kept down to only 500 MSPS. The output of this ADC may be either processed in real-time using a digital baseband implemented in an FPGA, or buffered and processed offline using Matlab. With this approach, virtually any baseband algorithm not requiring real-time control of the front-end may be tested. This includes acquisition and fine tracking, channel estimation, interferer rejection, and demodulation. Feedback loops such as automatic gain control require real-time sampling, and therefore cannot be tested using this acquisition board. The characteristics of the blocks of this discrete platform are summarized in the following subsections.
4.1.1 Transmitter

The transmitter up-converts the baseband signal to an arbitrary center frequency by direct multiplication of the baseband signal with a sinusoid, as shown in Figure 4-1.

Figure 4-2: Discrete prototype transmitter. Courtesy of N. Ackerman.

There are two choices regarding the generation of the baseband signal: the signal may be generated using a dedicated impulse generator or an arbitrary waveform generator (AWG). The dedicated impulse generator can generate BPSK impulses with an up-converted bandwidth of 500 MHz. All logic functions were implemented using commercially available Emitter-Coupled Logic (ECL) components. This transmitter may generate impulses every 20 ns, achieving a pulse repetition frequency (PRF) of 50 MHz. At each interval of 20 ns, the transmitter may arbitrarily transmit a positive impulse, a negative impulse or no impulse at all. This last feature is included because during the data packet preamble the pulse repetition frequency is lower than that used during the payload.

The interface between the transmitter and a computer is implemented using a board by Opal Kelly. This board contains a USB 2.0 interface that allows fast communication with the PC, and a Field Programmable Gate Array (FPGA) that allows interfacing with the transmitter board. This FPGA has enough local memory to store one packet of data while it is being transmitted. Figure 4-2 shows a photograph of the transmitter board, the FPGA board and the oscillator board when they are used in this configuration.

The other option to generate the baseband signal is to use an AWG. The Tektronix AWG710 allows generating any signal that can be represented with up to 8 bits of precision at a maximum sample rate of 4 GSPS. It contains a memory that allows storing up to 4 ms of data sampled at this rate, allowing for a large number of data packets. Using an AWG enables a large amount of flexibility in the shape of the pulses transmitted, the modulation scheme and the duration of transmission.
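The three-valued slot scheme of the dedicated impulse generator described above (positive impulse, negative impulse, or no impulse in each 20 ns slot) can be sketched as a mapping from bits to slot symbols. This is an illustration only: the function names, the mapping of bit 1 to a positive impulse, and the number of empty slots inserted between preamble pulses are assumptions, not details of the transmitter board.

```python
def payload_slots(bits):
    """One impulse per 20 ns slot: +1 for a 1 bit, -1 for a 0 bit (assumed mapping)."""
    return [1 if bit else -1 for bit in bits]

def preamble_slots(code_chips, empty_slots_between=1):
    """Lower the PRF by leaving empty slots (0) after each preamble impulse."""
    slots = []
    for chip in code_chips:
        slots.append(1 if chip else -1)
        slots.extend([0] * empty_slots_between)
    return slots

# Hypothetical 3-chip code fragment followed by 4 payload bits.
packet = preamble_slots([1, 0, 1]) + payload_slots([1, 1, 0, 1])
print(packet)
```

With one empty slot after each impulse, the preamble pulse spacing becomes twice the slot duration, which is how a fixed-rate generator can produce the lower preamble PRF.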
For example, although this work focuses on pulse-based systems, OFDM can also be synthesized as long as the equivalent low-pass signal has no imaginary part. The samples for the AWG are generated using a PC and downloaded to the instrument. Various non-idealities, such as non-linearity, may be added to the signal prior to generating the samples. In-band interferers such as 802.11a or random tones may also be added. The AWG is useful for implementing one-way communication with greater flexibility than the impulse generator, but is limited in how fast new data can be downloaded to the instrument. This platform is also flexible enough to generate various waveforms within a bandwidth of 500 MHz, allowing the comparison between different modulation schemes.

4.1.2 Front-end

The RF front-end is built entirely using discrete components. Its realization is shown in Figure 4-3. As shown in Figure 4-1, the received signal is amplified by two cascaded LNAs, then split and applied to two identical passive mixers performing I/Q direct conversion. The 90° phase shift in the local oscillator (LO) is implemented by fixed, unequal delays in the LO transmission lines to each mixer. This method of phase shifting provides quadrature tones at 5.355 GHz, but also allows for tuning of the I/Q error simply by tuning the RF center frequency. Tunable phase error is desirable in the prototype for testing the robustness of the digital baseband. It is possible to replace these transmission lines by a wideband 90° phase shifter. After frequency down-conversion, the baseband signals are filtered and amplified with an adjustable gain before being digitized.

Figure 4-3: Discrete prototype receiver. Courtesy of Fred S. Lee.

4.1.3 Receiver

At the receiver end there are again two possible options for processing. First, it is possible to sample the baseband I and Q signals from the front-end with a dual-channel 8-bit 500 Msamples/s ADC board that interfaces to a PC directly through the PCI bus.
Alternatively, it is possible to use a system of boards that includes the ADC and several FPGAs, which allow implementing both a real-time baseband to demodulate the data packet and a USB 2.0 interface to send the received packets to a PC. Figure 4-4 shows the different boards, some custom and some off-the-shelf, required to implement the ADC, the digital baseband and the required interface. Boards drawn on top of one another in the figure are electrically connected. The details of the design of this platform can be found in [111]. These boards provide a high speed dual ADC capable of sampling two independent input signals at 1 GSPS with 8-bit precision (the high speed Atmel ADC AT84AD001), and a Virtex2Pro VP30-6 FPGA that is used to implement the digital baseband of the UWB system. The fact that the digital baseband is implemented in an FPGA gives considerable flexibility to test in real time, and in very little time, the impact of different baseband architectures.

Figure 4-4: Boards related to the ADC and baseband of the discrete prototype. Courtesy of N. Ackerman.

4.1.4 Protocol

The current platform implements a one-directional wireless link. In order to provide measurements of packet drop rates and error probabilities, an API is implemented that formats data packets so that they can be correctly interpreted by the transmitter boards. On the receiver side, once a data packet has been received in the digital baseband board, its content is buffered and sent through the USB 2.0 interface. If a UWB signal is detected, the signal processor retrieves the data bits from the UWB signal and sends them to the module that transports them to the PC. At the same time, the module responsible for receiving the data to transmit accepts incoming transmission requests from the PC and sends the appropriate data to the discrete pulse generator for creation of a UWB signal.
The integrity of the data packet is checked in the receiver PC and an acknowledgement signal is sent back through the (wired) intranet to the transmitter computer.

4.2 Application in the Digital Baseband Design

4.2.1 Limitations of the Digital Platform

The main limitation of the digital baseband implemented in this backend is the maximum number of gates available. The FPGA used is a Xilinx Virtex2Pro VP30-6, with 1 million gates. How these gates are used depends on the architecture chosen, and a direct translation of the Verilog generated for the chip may not be the most efficient use of the resources in the FPGA. For the development of the digital baseband, less than half of this is available, in order to avoid severe routing problems and to use part of the FPGA to debug the baseband. The components required to monitor the baseband inside the FPGA are automatically added when using the Chipscope software by Xilinx.

In a real wireless system, every clock and oscillator in the transmitter is generated from the same reference (the carrier generator and the 100 MHz digital clock controlling the pulse repetition frequency), and the local oscillator and the sampling clock (500 MHz) in the receiver are generated from another reference. The differences between the carrier generator in the transmitter and the local oscillator in the receiver lead to a difference in phase when the signal is demodulated. That is usually corrected in the receiver with a phase-locked loop (PLL) subsystem. The errors in frequency between the 100 MHz digital clock in the transmitter and the 500 MHz digital clock in the receiver translate into a drift of the incoming pulses generated in the transmitter with respect to the sampling instants in the receiver. This is corrected in the receiver using a delay-locked loop (DLL).
In a real wireless system where there is only one timing reference in the transmitter and another timing reference in the receiver, the errors that are corrected by the PLL and the DLL are correlated. It is possible to take advantage of this either by using the information of the PLL (more precise) to refine the DLL, or by not using a DLL and extrapolating the corrections a DLL would generate from the corrections that the PLL is generating. In our system, since there are four independent timing references, the errors corrected by the DLL and the PLL are independent, and the two loops cannot be derived from one another. The local oscillators used for the carrier in the transmitter and the carrier in the receiver are very stable. Their frequency is generated with a precision better than 2 ppm. The change of phase between the transmitter and the receiver caused by this difference is negligible for the duration of a data packet and no PLL is required. Only the initial phase needs to be corrected, and that is compensated by the matched filter. The 100 MHz and 500 MHz clocks are not as stable as the carrier references. For that reason the change in delay over the duration of the packet is large enough to make the use of a DLL necessary.

4.2.2 Specifications and Interfaces

Figure 4-5: Losses due to misrepresentation of the channel impulse response in the discrete prototype, as a function of the assumed channel length (samples), for channel models CM1 to CM4.

The signal received is going to be comprised of data packets like the one described in the previous chapter, but with some different parameters. First, during the preamble, the incoming pulses are not separated by 60 ns, but only by 40 ns. This is less than what is needed
In addition, it is necessary to create a simple infrastructure that allows the reception of data at a raw data rate of 100 Mbits/s (time between consecutive impulses in the payload equal to 10 ns) or 50 Mbits/s (20 ns between consecutive impulses). The reason is that there are currently two options for the transmitter. On the one hand, the discrete prototype can use the arbitrary waveform generator (AWG), which can generate packets at 100 Mbps but cannot change dynamically the packet that is sent wirelessly. On the other hand, the custom pulse generator board generates packets at 50 Mbps but offers dynamic control of the packet that is transmitted.

The total number of gates available on the FPGA for the actual implementation of the circuit is lower than that required to fully implement and test the transceiver developed in chapter 5. For that reason, a simpler version with less functionality is implemented in this digital baseband. The changes are a lower number of parallel correlations (20 as compared to 150), a 5-tap partial Rake (as compared to the 25-tap Rake presented in the next chapter), and the omission of both the Viterbi-like MLSE and the automatic gain control. The impact of these adjustments on the final performance of the system is shown in Figure 4-5.

The interface between the dual ADC and the digital baseband is comprised of a vector of four complex signals in which both the real and imaginary parts are represented with four bits. Although the ADCs can provide 8-bit precision in every case, only the 4 most significant bits are used as input to the baseband, both to reduce the fraction of the FPGA required and to make the baseband work under conditions similar to those of the final ASIC.

Figure 4-6: Block diagram of the discrete prototype baseband.

The vectors are properly
synchronized to a 125 MHz clock that is also the main clock of the digital baseband. The output of the baseband is comprised of a vector of four demodulated bits in parallel, a valid data signal, and a 25 MHz clock synchronized to the data. The valid data signal goes high to indicate that the data packet is ready to be read and stays high while the vector of demodulated bits represents valid data. This signal may be used as an interrupt when interfacing with a computer that is to read the received data packets.

4.2.3 Architecture of the Baseband

Figure 4-6 shows the block diagram implemented in the system. The samples are provided by the ADC as vectors of four consecutive complex samples, aligned with the rising edge of a 125 MHz clock. The FPGA allows some parts to work at 125 MHz, but performing all the operations at this frequency would require special care in the routing of the different circuits inside the FPGA. At this frequency, only the retiming block and a serial-to-parallel conversion of the incoming data (a 5x parallelization) are performed. After this operation, the unit of processing is a vector of 20 chronologically ordered samples, and the clock needed to process them is only 25 MHz, obtained internally from the 125 MHz clock. All the mathematical operations are implemented at 25 MHz, which allows this part to be designed with standard automatic place and route. For that reason there is a fast clock domain and a slow clock domain, as in the previous prototype.

A clean interface between the high-speed clock domain (running at 125 MHz) and the slow-speed clock domain (running at 25 MHz) is needed.

Figure 4-7: Control Signals for the Serial to Parallel Register (waveforms of clk25, Clk125, two counters, Inputs[15:0] and tmp[79:0]).

The signals going
to the slow-speed clock domain should be latched into this clock domain after they have been ordered in a vector, using the control signals indicated in Figure 4-7. This figure shows that the rising edge used to latch the data into the slow clock domain is safely separated from the preceding rising edge of the high-speed clock that presents the data at the output of the serial-to-parallel register. The decision data, including the control of the retiming block, is latched into the 125 MHz clock domain at least two cycles after it became stable at the outputs of the slow clock domain.

Correlators

The FPGA size limits the number of parallel correlators that can be implemented. The transmitter board allows transmitting impulses at integer multiples of 10 ns. Because of this, the separation between consecutive impulses in the preamble is chosen to be an integer multiple of the time between consecutive impulses in the payload.

The architecture used for the correlators is shown in Figure 4-8. The basic correlator unit is comprised of five parallel correlators that are not time shared, so that each of them accumulates only one correlation at any instant during coarse acquisition. In this diagram, the variables w[i] have several uses, as do the correlators themselves. During coarse acquisition the correlators accumulate the correlation of the incoming signal with the pseudorandom sequence. For that reason the value of w[i] is common to all the correlators and contains the bits of the pseudorandom sequence. In this state the outputs of the correlators, c[i], are not added together in groups of 5. The unit comprises 5 correlators because the distance between two consecutive impulses in the payload is 10 ns, which is the same interval as five samples at 500 Msamples per second.
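To make the coarse-acquisition step concrete, the following minimal sketch models a bank of correlators in which every correlator shares the same code weights and tests one candidate delay. The 20 candidate delays and the 20-sample impulse spacing (40 ns at 500 Msamples per second) follow the text; everything else is an assumption made for illustration: the samples are real-valued, the code is a random +/-1 sequence of length 31 standing in for the actual pseudorandom sequence, the channel is noiseless with a single path, and the function and variable names are hypothetical.

```python
import random

SPACING = 20  # samples between preamble impulses (40 ns at 500 MS/s)

def correlate_bank(samples, code, spacing=SPACING):
    """Correlate the samples with the code at `spacing` candidate delays.

    Correlator d accumulates samples[d + k*spacing] * code[k], so each
    correlator tests one possible position of the impulse train.
    """
    return [
        sum(samples[d + k * spacing] * c for k, c in enumerate(code))
        for d in range(spacing)
    ]

# Assumed +/-1 code and a noiseless single-path signal at delay 7.
rng = random.Random(0)
code = [rng.choice((-1, 1)) for _ in range(31)]
true_delay = 7
samples = [0.0] * (len(code) * SPACING)
for k, c in enumerate(code):
    samples[true_delay + k * SPACING] = float(c)

outputs = correlate_bank(samples, code)
best = max(range(SPACING), key=lambda d: abs(outputs[d]))
print(best, outputs[best])
```

In this idealized case only the correlator aligned with the impulse train accumulates the full code energy, which is the quantity the receiver compares against the detection threshold.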
During the channel estimation state, only the top five correlators are working. After a full correlation with the pseudorandom sequence we obtain an estimate of five taps of the channel impulse response. Since they are the five taps closest to the position of the maximum, they are also expected to be the ones containing most of the multipath energy. Those taps are estimated with nine bits of precision. After state 2, the outputs of the 5 correlators of the first group are conjugated to obtain a matched filter of only five taps. This matched filter is used during states 3 and 4 (payload) and then discarded. It is assumed that the channel for each packet is white, but that the coherence time is larger than the duration of the packet.

Figure 4-8: Block diagram of the basic structure for the correlators and matched filter.

Timing Synchronization

As indicated in a previous section, the transmitter carrier generator and the receiver local oscillator are very well tuned, eliminating the need for a phase-locked loop (PLL) during packet demodulation, since the total phase change over the duration of a 16 kbit packet is negligible. On the other hand, the clocks controlling the sampling rate of the ADC and the pulse repetition frequency do not possess the same stability. For that reason, it is necessary to implement a delay-locked loop (DLL). It follows the same scheme that was used in the first prototype, developed in chapter 2. The only relevant change with respect to the previous version is the input. During the preamble, the input to the DLL is given by the energy accumulated in the previous and next taps (correlators 2 and 4 in Figure 4-8) while they were correlated with the matched filter. During the payload demodulation, since each bit of information is encoded using only one impulse, the sum of the energy of those same taps over 32 cycles of the 25 MHz clock is accumulated.
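The use of the previous and next taps as DLL inputs can be illustrated with a generic early-late comparison. This is only a sketch of the general technique: the text does not specify the discriminator, the threshold, or the sign convention, so all three are assumptions here, as are the function names.

```python
def early_late_error(prev_tap, next_tap):
    """Energy of the tap after the peak minus energy of the tap before it.

    Assumed convention: a positive value suggests the pulse arrives later
    than the current alignment, a negative value that it arrives earlier.
    """
    return abs(next_tap) ** 2 - abs(prev_tap) ** 2

def timing_correction(prev_tap, next_tap, threshold):
    """Return -1, 0 or +1 samples of correction (hypothetical policy)."""
    error = early_late_error(prev_tap, next_tap)
    if error > threshold:
        return 1
    if error < -threshold:
        return -1
    return 0

# Accumulated complex tap values are made up for the example.
print(timing_correction(1 + 0j, 3 + 0j, threshold=1.0))
```

A dead zone around zero, as modeled by the threshold, is one simple way to avoid correcting the delay on noise alone.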
This accumulation takes into account the delay of one impulse out of every four when working at 100 Mbps, or one out of every two at a 50 Mbps data rate.

The one-sample granularity required in this tracking loop is obtained using a block similar to the one designed for the prototype developed in chapter 2. In this case, the first operation does not uniformly combine four consecutive samples, since there is a partial Rake of five taps and the separation between consecutive data impulses is five or ten samples. Instead, the input samples must be provided to the serial-to-parallel block as a chronologically ordered vector. For that reason, after some samples are selectively delayed, a programmable connection matrix is used, as shown in Figure 4-9.

Figure 4-9: Block diagram of the retiming block (inputs IN0 to IN3, multiplexers driven by ControlPos, outputs Out0 to Out3).

4.2.4 State Machine

During the packet detection and demodulation process, this baseband goes through the following states:

1. Packet detection - In each cycle of the system, 20 correlations are calculated and their results are compared to the threshold. If any of them meets the threshold, coarse acquisition is finished. The transition to the next state consists of aligning the next received impulse with the third correlator in Figure 4-6.
2. Channel Estimation - In the next iteration, the first five taps of the channel impulse response are calculated by performing a further correlation with the incoming signal. The results are stored with a precision of four bits to perform the correlation. The sign of the center tap is compared to the one stored during coarse acquisition. This ensures that the end of the preamble is noticed if the packet was detected too close to it. If the signs differ, the end of the preamble has been reached and the channel estimate is also reversed in sign.
3. End of Preamble Detection - The same correlation is repeated, but now taking into account the final 5 samples and using the channel estimated in the previous state to maximize the SNR. The receiver looks for a change in phase of 180 degrees.
4. Payload - Here the correlators are not working as such, since no previous result is stored to compute the next ones.

Figure 4-10: Part of the preamble of a data packet as measured in the discrete prototype, without (above) and with (below) an interference. Horizontal axis: time (µs).

4.2.5 Results

Figure 4-10 shows part of the preamble of a data packet as measured in this discrete prototype using an FCC compliant signal centered at 5.355 GHz in a wireless link, both without interference and with an interferer at an SIR of -11 dB during the preamble. The receiver achieved packet synchronization in the presence of the interferer, and the duration of the channel impulse response was measured to be shorter than the inter-pulse interval of the payload. Packets of 10000 bits were demodulated perfectly, without ISI. This discrete prototype was used to demonstrate a 100 Mbps data rate using the AWG with data packets of 32 kbits of payload. Using the transmitter set up at 50 Mbps, the API was used to transmit a continuous data stream comprised of a sequence of JPEG images.

4.3 Application for Testing Multitone-FSK

The flexibility provided by this system allows a wide range of tests. This section shows the first test of a communication scheme that has been proved to be optimal in [4].

4.3.1 Signal Definition

In frequency shift keying (FSK) systems, different symbols are represented by sinusoids with different frequencies. For multitone FSK, the symbols are combinations of multiple sinusoids with different frequencies in the band.
If we have a set of $M$ mutually orthogonal frequencies over the allocated bandwidth, every possible combination of these tones is a possible symbol to use. For $Q$-tone FSK, there are $\binom{M}{Q}$ possible $Q$-tone combinations from $M$ tones. Let $S$ denote the complete set of symbols, and $S_m$ a symbol in the set, i.e., $S_m \in S$. Let $T_s$ be the duration of the symbol. Then, each symbol can be represented as [4]:

$$x(t) = \sum_{k \in S_m} e^{i 2 \pi f_k t}, \qquad 0 < t < T_s \tag{4.1}$$

Figure 4-11: Example of MFSK signal. Courtesy of Cheng Luo.

Some of these values must be specified as a function of the characteristics of the channel and the bandwidth available. Let $(\Delta f)_c$ denote the coherence bandwidth of the channel. Thus, two sinusoids with a frequency separation greater than $(\Delta f)_c$ are affected differently by the channel. In the same way we define $(\Delta t)_c$ as the time spread of the channel. If we send two impulses at the same frequency with delays closer than $(\Delta t)_c$, those impulses will be blurred together by the channel impulse response. The inter-symbol interval is chosen to be larger than $(\Delta t)_c$ to ensure that there is no ISI. The separation in frequency of the available tones is also made larger than $(\Delta f)_c$, to ensure that the fading of each tone can be assumed to be independent of the fading of all the other tones. If the alphabet built in this way is large enough, the probability of collisions between symbols of different users is very low. Parameters such as the duty cycle or the separation between frames may be optimized to reduce this probability. The resulting signal is shown in Figure 4-11. The impulses can be made longer, and their amplitude reduced, in order to comply with any legal rule imposed on the system.
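The size of the multitone alphabet and the construction of one symbol can be sketched as follows. The sketch assumes that each symbol is a sum of unit-amplitude complex exponentials at the selected tone frequencies; the values M = 8 and Q = 2 and the function names are hypothetical and chosen only to keep the example small.

```python
import cmath
from itertools import combinations
from math import comb

def mfsk_alphabet(num_tones, tones_per_symbol):
    """All Q-tone subsets of M tone indices; each subset is one symbol."""
    return list(combinations(range(num_tones), tones_per_symbol))

def mfsk_sample(symbol, freqs_hz, t):
    """x(t) = sum over k in the symbol of exp(i*2*pi*f_k*t)."""
    return sum(cmath.exp(2j * cmath.pi * freqs_hz[k] * t) for k in symbol)

# Assumed example: M = 8 tones, Q = 2 tones per symbol.
M, Q = 8, 2
alphabet = mfsk_alphabet(M, Q)
print(len(alphabet), comb(M, Q))
```

Enumerating the subsets explicitly is only practical for small M; for a realistic alphabet the binomial coefficient alone gives the number of symbols.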
The impulses can be lengthened in this way because the integration performed in the receiver depends only on the duration of the impulse and can be made as long as needed (it is non-coherent integration and, for that reason, the integration gain is appreciably lower than that of coherent integration).

Figure 4-12: Architecture for demodulation of Multitone FSK [4].

4.3.2 Receiver Architecture

The receiver uses a bank of matched filters with their central frequencies tuned to each of the $M$ tones [4]. The simplicity of the receiver is shown in Figure 4-12. After the matched filter, the power at each output is averaged over an interval on the order of the symbol duration, and the output of each channel is compared to a threshold chosen, using a Neyman-Pearson curve, to minimize false detections when the tone is not being used. It has been shown [4] that this modulation scheme, with this conceptually simple receiver, achieves a capacity close to the wideband capacity limit independently of the channel multipath and for typical channel parameters. This receiver is robust when the frequency separation between tones is smaller than $(\Delta f)_c$ (so that the fading of the two tones is correlated). In that case multitone FSK still approaches the wideband capacity, but the convergence is slower. No study of how slow this convergence is has been done yet [4].

4.4 Conclusions

In this chapter we have presented a flexible platform for testing UWB transceivers. This platform is flexible enough to test non-impulse signals (such as multitone FSK), while at the same time providing the functionality needed to test UWB systems under real-time conditions. The system has been used to implement a smaller-scale version of the FCC compliant UWB receiver developed in chapter 5. Using this prototype, wireless data links of 100 Mbps and 50 Mbps were demonstrated.
Chapter 5

ASIC Implementation of a Baseband for FCC Compliant UWB

This chapter presents the architecture developed to implement a robust FCC compliant UWB transceiver. The architecture implements the functionality presented in chapter 3 in order to obtain a robust receiver. The chapter includes both a description of the architecture and the results of the measurements required to test the functionality and the power dissipation.

5.1 Functionality of the Chip

This circuit is part of a system designed in our group for FCC compliant UWB communication in the 3.1 to 10.6 GHz band. Figure 5-1 shows the architecture of the system. It comprises a homodyne receiver in which the front-end [112], the transmitter [113], and the ADC [114] have been designed by other students in the Digital Circuits and Systems Group. The UWB data packet is comprised of a preamble and a payload, both of fixed length. The preamble is comprised of impulses transmitted with a separation of 60 ns. It consists of 13 repetitions of a Gold code of length 31 that is used by the digital baseband to detect the presence of the packet and to achieve packet acquisition. The larger time interval between impulses in the preamble allows the channel impulse response to be estimated with a lower impact of inter-symbol interference. The payload, on the other hand, is comprised of a sequence of BPSK impulses of 500 MHz bandwidth transmitted with a pulse repetition frequency of 100 MHz. Since each bit of information is represented by one impulse, this system allows a raw data rate of 100 Mbits/s. The packet length is 5 Kbits.

The digital baseband performs the detection and demodulation of the data packets. Its implementation assumes that it receives samples from a direct conversion receiver with synchronized ADCs in the in-phase and quadrature channels.
It was determined in chapter 3 that the signal processing required in the demodulation of the signal depends on the channel quality. Because of this, the digital baseband is designed with two main objectives. First, it is able to estimate the channel impulse response during the preamble of the data packet, and to use this information in both a partial RAKE and an MLSE for channel compensation. Second, it allows different subsystems of the baseband to be activated and deactivated, making it possible to scale the complexity (and the energy dissipation) of the signal processing applied to demodulate the signal, and so to adapt to the channel quality.

Figure 5-1: Block diagram of the full transceiver.

The digital baseband has also been designed to minimize the number of signals that are fed back to the analog front-end and the ADC. Only the automatic gain control is sent to the front-end, and the whole synchronization is performed autonomously in the digital baseband. Figure 5-2 presents the block diagram implemented in this system.

5.2 Interfaces and Clock Structure

This baseband was designed to work with a dual scalable successive approximation register (SAR) ADC [77] sampling at 500 MS/s. Each of the two ADCs is comprised of 6 parallel ADCs, each sampling at 1/6th of the total sampling frequency. Following the approach already explored in the first prototype, the outputs of this ADC are presented in parallel to the baseband, as chronologically ordered vectors of six consecutive samples aligned with an 83.3 MHz clock (1/6th of the sampling rate) that latches this information into the baseband. Each sample represents a complex number that has a real part and an imaginary part, both with 5 bits.
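The interface figures quoted above can be cross-checked with a few lines of arithmetic. The sketch only restates numbers from the text (500 MS/s, 6 parallel ADCs per channel, 5 bits each for the real and imaginary parts); the variable names are illustrative.

```python
SAMPLE_RATE_HZ = 500e6
ADC_LANES = 6            # parallel sub-ADCs per channel
BITS_PER_COMPONENT = 5   # bits for the real part and for the imaginary part

# Clock that latches one vector of six complex samples into the baseband.
lane_clock_hz = SAMPLE_RATE_HZ / ADC_LANES

# Width of one input vector: six complex samples, two components each.
bus_width_bits = ADC_LANES * 2 * BITS_PER_COMPONENT

# Raw input throughput seen by the baseband.
throughput_bps = bus_width_bits * lane_clock_hz

print(f"{lane_clock_hz / 1e6:.1f} MHz, {bus_width_bits} bits, "
      f"{throughput_bps / 1e9:.1f} Gbit/s")
```

The resulting vector width and latch clock match the 83.3 MHz clock described in the text.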
The baseband is divided into two clock domains: a high-speed clock domain running at 83.3 MHz, and a slow-speed clock domain running at 16.6 MHz. The 83.3 MHz clock is provided externally to the baseband chip by the ADC. The 16.6 MHz clock is generated internally by dividing the high-speed clock frequency by 5. The high-speed clock domain contains a retiming block for delay tracking, a serial-to-parallel converter that performs a 5x parallelization, and the main control of the baseband. The low-speed clock domain implements all the signal processing, taking advantage of the longer clock period. It also determines the next state of the receiver.

Figure 5-2: Block diagram of the functionality of the chip implemented.

The demodulated bits are presented through a parallel interface, six bits at a time at 16.6 MHz, together with a signal that indicates when the bits presented are valid data bits and that may be used as an interrupt.

The baseband can be programmed using a serial port. This procedure loads a shift register with several numeric values required for the normal operation of the chip (threshold, filter taps) and a sequence of flags that indicate which parts of the system should be activated. This vector serves to adapt the complexity of the receiver to the signal detected. As the different subsystems are presented in this chapter, the programmability options available for each of them will be introduced.

The following sections provide details of the different subblocks of the receiver, starting with the functionality implemented in the high-speed clock domain. Given the complexity of the slow-speed clock domain, sections are dedicated to the correlators, the timing synchronization blocks, the channel analysis subsystem and the MLSE equalizer.
The baseband goes through a state machine of four states during the detection and demodulation of a data packet, as shown in Figure 5-3. The duration of each state varies with the state itself. We define an "iteration" as the time necessary for the receiver to gather enough data to decide whether to jump from one state to another, or to perform a correction in the phase or the delay of the received signal. During an iteration, no control adjustments of any kind are made in the receiver. An iteration comprises an integer number of cycles of the 16.66 MHz clock. The numbers of cycles for states 1 (PD), 2 (CE), and 3 (EPD) are 36, 31 and 31 respectively.

Figure 5-3: State machine implemented in the system (packet detection, channel estimation, preamble end detection, payload).

The durations of states CE and EPD are tied to the length of the Gold code. The duration of the iteration in state PD is linked to the number of different delays that the system tests in each iteration. Since the receiver performs 150 correlations in each iteration, it tests a time interval equivalent to 5 cycles of the 16.66 MHz clock, and this must be added to the duration of the iteration so that the next 150 possible delays are tested in the next iteration. In state 4 (PL), the duration of the payload is fixed and, once in this state, the baseband provides a sequence of 5k bits regardless of the input, so the iteration represents the interval at which both the delay-locked loop and the Costas loop perform an update. Each iteration of state 4 takes 32 cycles of the 16.66 MHz clock, and there are a total of 32 iterations for the whole payload. This number of cycles was chosen because it is an integer power of 2 that still allows tracking a total frequency difference of 100 ppm between the transmitter and receiver oscillators.

5.3 High-speed Clock Domain

Figure 5-4 shows the block diagram of the high-speed clock domain.
This block receives the input from the dual ADC as a vector of six consecutive samples aligned with the rising edge of an 83.3 MHz clock. For the purposes of processing the signal in the receiver, this block performs a 5-way parallelization, providing vectors of 30 consecutive samples. In this way, the clock frequency in most of the receiver can be reduced to 16.66 MHz, simplifying the timing design of the most complex mathematical operations, which are performed only at this reduced frequency. Although 16.66 MHz is a low frequency for the process chosen, the critical path in some of the subsystems (specifically the MLSE) is close to the period of this clock. The parallelization is performed in the serial-to-parallel block, and at its output the samples are latched using the slow clock in order to provide a clean interface with the slow-speed clock domain.

Figure 5-4: Block diagram of the high speed clock domain (interface signals with the slow-speed clock domain include InitFrame, StateDemod (2 bits), Counter (6 bits), NextState (2 bits), Res, SwapperIn (3 bits), DelayIn and WaitIn (5 bits)).

The high-speed clock domain also includes the main control of the system. Following the paradigm established in previous designs, the control of the system is distributed: individual subsystems receive from the main control the 16.66 MHz clock, the state in which the receiver is operating, and a signal (InitFrame) that indicates when an "iteration" starts. Decisions related to changes of state are taken in subsystems in the slow-speed clock domain.
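Going back to the serial-to-parallel block of this clock domain, its 5-way parallelization can be pictured with a short behavioral sketch. It assumes that five consecutive six-sample vectors are simply concatenated in arrival order and that an incomplete group is never presented; the function name and the integer sample labels are hypothetical.

```python
def serial_to_parallel(vectors, factor=5):
    """Group `factor` consecutive input vectors into one wide vector.

    Incomplete trailing groups are dropped, on the assumption that the
    register only presents a vector once it is full.
    """
    wide = []
    for start in range(0, len(vectors) - factor + 1, factor):
        group = vectors[start:start + factor]
        wide.append([sample for vec in group for sample in vec])
    return wide

# Ten fast-clock vectors of six samples, numbered chronologically.
fast = [list(range(6 * i, 6 * i + 6)) for i in range(10)]
slow = serial_to_parallel(fast)
print(len(slow), len(slow[0]))
```

Each output vector holds 30 chronologically ordered samples, so the downstream logic needs only one clock edge for every five edges of the fast clock.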
Only the result of these decisions, expressed as the next state, plus some minimal additional information (timing adjustments for the retiming block and/or extra wait cycles), is sent to the main control, simplifying its design.

Figure 5-5: Block diagram of the retiming block (inputs IN0 to IN5, multiplexers driven by ControlPos, outputs Out0 to Out5).

The high-speed clock domain also includes a retiming block, whose block diagram is shown in Figure 5-5. Its function is the same as that of the corresponding structures in the previous receivers. There are two differences in the implementation. First, since the vectors now have 6 elements instead of 4, the structure used before must be expanded accordingly in order to provide the vectors indicated in Figure 5-6. Second, in this receiver the different elements of these vectors are processed independently: consecutive samples are not added together before further processing. For that reason, the output of the retiming block must be chronologically ordered, and the retiming block requires a configurable connection matrix.

5.4 Correlators/Matched Filter Block

The correlators block is part of the slow clock domain, working at 16.66 MHz. With every rising edge of the 16.66 MHz clock it receives from the high-speed clock domain a vector of 30 consecutive complex samples, with real and imaginary parts represented as 5-bit two's-complement numbers. This block can be programmed to perform either 30 or 150 correlations during each iteration of packet detection. This choice affects the length of the preamble required to ensure that the packet is detected. This trade-off will be explored in the last section of this chapter.
Figure 5-6: Retiming block.

The correlators block in this implementation has grown from the simple structure used in the previous implementations to a complex block that allows even a 25-tap partial RAKE to be implemented. Its functionality is extended compared to previous receivers, since it does more than calculate the correlations of the incoming signal with the pseudorandom sequence. It indicates the position of the correlation maximum, if any, during coarse acquisition. It provides the channel impulse response estimate for use in other subsystems. It also implements the variable Rake receiver. Figure 5-7 shows the high-level block diagram of the correlators, including the inputs and outputs required by this block. The correlator block is comprised of 6 correlator groups, each including 5 "slices". The outputs of the correlator block depend on the programming and on the state. The features are summarized as follows.

During packet detection, the block can be programmed to calculate either 30 or 150 correlations in parallel. The duration of the iteration depends on the number of correlations calculated: 32 cycles of the 16.66 MHz clock for 30 correlations, and 36 cycles for 150 correlations. In both cases, the correlations are estimated with 28 out of the total of 31 impulses, which reduces the signal-to-noise ratio of the correlation peak by 2 dB.
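The two iteration lengths quoted above are consistent with a simple reading of the text: one slow-clock cycle per impulse of the length-31 code (the preamble impulses are 60 ns apart, which is 30 samples or one cycle of the 16.66 MHz clock), plus the number of cycles spanned by the delays under test. The sketch below encodes that reading; it is an interpretation of the numbers in the text rather than a description of the control logic, and the names are hypothetical.

```python
SAMPLES_PER_SLOW_CYCLE = 30   # one 30-sample vector per 16.66 MHz cycle
CODE_LENGTH = 31              # impulses per Gold code period

def pd_iteration_cycles(num_correlations):
    """Slow-clock cycles per packet-detection iteration (assumed model)."""
    window_cycles = num_correlations // SAMPLES_PER_SLOW_CYCLE
    return CODE_LENGTH + window_cycles

print(pd_iteration_cycles(30), pd_iteration_cycles(150))
```

With 150 correlations the window of delays under test spans five slow-clock cycles, and with 30 correlations it spans one, which reproduces the two programmed iteration lengths.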
In either packet detection configuration, the outputs that the correlators block provides for the other blocks to process are a signal that indicates whether the packet detection threshold was met (ThreshMet0), the value of the maximum sample (RealforCheck and ImagforCheck), its L1 metric (MaximumValue), and its position (PosAxis1, PosAxis2 and PosAxis3). Of the three position values, only PosAxis2 and PosAxis3 are required when the baseband is programmed to perform only 30 correlations during the packet detection state.

During the Channel Estimation state, three outputs of the block are relevant. First, the estimated channel impulse response appears in ChanOutReal (real part) and ChanOutImag (imaginary part), each given with 10 bits. Second, the signal ThreshMet1 indicates whether a second threshold, lower than the first, has been met, confirming that the previous packet detection was not a false alarm. Finally, SignRealMax, SignImagMax, RealforCheck and ImagforCheck may be used to detect a change of polarity in the signature sequence, which would indicate that the preamble has ended and that the payload should be demodulated immediately afterwards, skipping the preamble end detection step.

Figure 5-7: Block diagram of the correlators (outputs to coarse acquisition: PosAxis1, PosAxis2 and PosAxis3 (3 bits each), MaximumValue (11 bits) and ThreshMet0).

During the Preamble End Detection state, the relevant outputs are ThreshMet1 (indicating whether the signal is still present) and SignImagMax, SignRealMax, RealforCheck, and ImagforCheck (to detect the end of the preamble).
Finally, during payload demodulation, OutputCorrReal and OutputCorrImag contain the real and imaginary parts of the outputs of the Rake receiver. Six outputs are obtained at a time, each rounded to 10 bits. They are fed to the MLSE demodulator. The first of the outputs (represented by the LSBs of the whole vector) is also used as input to the Costas loop. In addition, PrevReal, PrevImag, NextReal and NextImag are used as inputs to the DLL.

Figure 5-8 shows the block diagram of a correlator group, comprised of five "slices". It obtains partial results of the matched filter. During packet detection, either 5 or 25 correlations with the signature sequence are calculated. Their maximum is obtained and its position is given as two 3-bit values. The latency of this operation is two cycles of the slow clock after the last correlation. During channel estimation, five weights of the channel impulse response are calculated. During payload demodulation, this block receives all the estimated taps of the channel impulse response, which are fed into the five slices as shown in Figure 5-8. The outputs of the slices are added together in groups of five, so that the outputs of the first multipliers in each slice are added together, the outputs of the second multipliers are added together, and so on. Blocks associated with a set of five consecutive taps that have all been set to 0 are shut down directly. Since the number of multipliers activated depends on the length of the channel impulse response used for the Rake receiver, the number of outputs obtained from this structure varies from one to five as the channel impulse response grows in length.

Figure 5-9 shows one "slice". The correlator bank uses 30 of these units, and they perform different functions depending on the programming and on the state of the baseband. The input x to this block is a complex number in which both the real and imaginary parts are represented with five bits.
x represents one arbitrary element of the vector of 30 samples that is input to the correlators. Other inputs to this block are up to five complex taps in which both the real and imaginary parts are represented as 2s-complement numbers with four bits. x is multiplied by up to five different taps, and the result may or may not be accumulated in the registers depending on the signal InitZero. When InitZero is 1, the number stored in the register is added to the product of x with one of the five taps. When InitZero is 0, the product of x with one of the five taps is stored directly in the register. This block also offers the possibility of obtaining an estimate of the power of the value contained in the register using the L1 metric. The five metrics are then compared to produce the value and position of the register value with the largest power.

Figure 5-8: Block diagram of a correlator group.

Figure 5-9: Block diagram of the minimal unit of the correlators.

The slice can be programmed during packet detection (PD) to perform either one or five correlations at the same time. During PD, the taps take the values of the bits of the pseudorandom sequence that is used as the signature sequence in the preamble. In this way, each of the correlators obtains the correlation of the incoming signal with the signature sequence.
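The multiply-accumulate behaviour of one slice, as described above, can be illustrated with a short behavioural sketch. It is not the thesis implementation: Python complex numbers stand in for the fixed-point registers, and the names `Slice`, `step` and `strongest` are hypothetical. The InitZero polarity follows the text (1 accumulates, 0 overwrites).

```python
def l1(z):
    """L1 metric of a complex value: |Re| + |Im|."""
    return abs(z.real) + abs(z.imag)


class Slice:
    """Behavioural model of one correlator slice (hypothetical naming)."""

    def __init__(self, n_taps=5):
        self.reg = [0j] * n_taps  # one accumulation register per tap

    def step(self, x, taps, init_zero):
        """Multiply the input sample x by each tap and update the registers."""
        for i, w in enumerate(taps):
            prod = x * w
            # Per the text: InitZero = 1 adds to the stored value,
            # InitZero = 0 stores the product directly.
            self.reg[i] = self.reg[i] + prod if init_zero else prod

    def strongest(self):
        """Return (L1 metric, position) of the register with the largest metric."""
        metrics = [l1(r) for r in self.reg]
        pos = max(range(len(metrics)), key=metrics.__getitem__)
        return metrics[pos], pos


s = Slice()
s.step(3 + 2j, [1, -1, 0, 0, 0], init_zero=0)  # first sample: overwrite
s.step(2 + 1j, [1, 1, 0, 0, 0], init_zero=1)   # second sample: accumulate
print(s.strongest())
```

In hardware the comparison would operate on fixed-width integers; the sketch only shows the data flow.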
After a full iteration, the results of the five correlations are compared, the one with the largest energy is chosen, and its position is given as output. During channel estimation, the first of the correlators has as weights the bits of the signature sequence, but the other four have zero weight. The result of the first correlator after the channel estimation iteration is one of the taps of the channel impulse response. The result is represented with ten bits for both the real and imaginary parts. During EPD, again only the first correlator is used, and the weights contain the bits of the signature sequence. Finally, during PL, the weights of the slice hold the complex channel impulse response estimate, with up to four bits per component. If the channel impulse response was detected (after nulling out the coefficients that do not meet the threshold) to have a length of 5 samples or fewer (10 ns), only the first multiplier in each slice is used. If the channel impulse response is detected to have a length of 6 to 10 samples, 2 multipliers are activated; for 11 to 15 samples, 3 multipliers; for 16 to 20 samples, 4 multipliers; and for 21 to 25 samples, 5 multipliers. The results are combined with the results from other slices to obtain the output of the partial Rake.

The L1 metric is used to estimate the power of the correlator outputs:

$\|y[n]\|_1 = |\Re\{y[n]\}| + |\Im\{y[n]\}|$ (5.1)

The L2 metric requires two multipliers with 10-bit inputs and 20-bit outputs and a 20-bit adder (unless there is some truncation afterwards). The L1 metric requires only three 10-bit adders. Taking into account that there are five of these L1 metric blocks in each slice and a total of 30 slices, the area and power savings due to the use of L1 instead of L2 are not negligible. The L1 metric has replaced the L2 metric in every instance in this ASIC. An example of its implementation can be seen in Figure 5-11.
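As a small illustration of the two points above, the following sketch contrasts the L1 and L2 metrics and maps a channel impulse response length to the number of multipliers active in each slice. The function names are hypothetical, and integer inputs stand in for the fixed-point register values.

```python
def l1_metric(re, im):
    """|Re| + |Im|: needs only adders."""
    return abs(re) + abs(im)


def l2_metric(re, im):
    """Re^2 + Im^2: needs two multipliers and one adder."""
    return re * re + im * im


def active_multipliers(cir_length, taps_per_group=5, max_groups=5):
    """Multipliers used per slice for a channel of cir_length samples.

    1 to 5 samples -> 1, 6 to 10 -> 2, ..., 21 to 25 -> 5.
    """
    groups = -(-cir_length // taps_per_group)  # ceiling division
    return min(max_groups, groups)


for length in (5, 6, 15, 25):
    print(length, active_multipliers(length))
```

Note that the two metrics do not always rank values in the same order: (5, 0) has the larger L2 metric, while (3, 3) has the larger L1 metric. The L1 metric is therefore an approximation of the power, not an equivalent measure.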
5.5 Channel Analysis Module

The channel analysis block takes as input the channel impulse response estimated in the correlators and obtains the data necessary for both the MLSE demodulator and the Rake receiver. It takes into account the settings programmed in the chip and performs the analysis so as to reduce the complexity of the computations as much as possible. It analyzes the effective length of the channel impulse response in order to turn off as much of the back-end functionality as possible. The block diagram of this subsystem is shown in Figure 5-10. It works with a 16.66 MHz clock.

The inputs to the channel analysis block are the 25 consecutive samples of the channel impulse response read from the first 25 correlators during state CE of the receiver. Each of these samples has real and imaginary parts represented as 10-bit 2s-complement binary numbers. All the arithmetic in these blocks is fixed-point arithmetic. This block is programmable. It admits as inputs the number of bits used to represent the channel impulse response estimate (2, 3 or 4) and the minimum threshold for using a channel impulse response tap (dependent on the number of bits of the representation).

During channel estimation, the position that was detected as the first one with the maximum energy is aligned with the third correlator out of the total of thirty. It is expected that the maximum amplitude occurs in the first five samples of the channel, given the exponential decay profile of the channel model [3]. The first five samples are used to determine which set of bits will be used as the most significant bits. Once the most significant bit that switches has been identified, it is used as a rough normalization of the weights, or automatic gain control, for the channel impulse response estimation.
By using the MSB detector we are performing a de facto normalization of the channel impulse response that was estimated with 10-bit precision in the correlator blocks, in steps of 6 dB (since the maximum representable value changes by roughly 6 dB from one step to the next). With the inclusion of the MSB detector, the specifications on the AGC are relaxed, allowing the system to work with the granularity provided by the front-end (6 dB). After the MSB has been obtained, this information is used to reduce the internal representation of the channel to the number of bits programmed (2 to 4 for both real and imaginary parts). The reduction itself is done by rounding, not truncating, since truncating to this small number of bits would introduce a severe bias (offset). The channel impulse response samples are thus reduced from their initial number of bits to 2, 3 or 4 bits, depending on the programmed value.

Once the signals have been rounded to the desired number of bits, the next step consists of comparing them to the preprogrammed threshold. What is compared to the threshold is the sum of the absolute values of the real and imaginary parts of each tap. This is done in the Threshold Check block, which is comprised of 25 blocks like the one shown in Figure 5-11. Again, the L1 metric was chosen over the Euclidean (L2) metric for simplicity of implementation, although in this case the impact is smaller than in the correlators, since only 25 comparators for four-bit numbers are required. The Threshold Check block generates a T[j] signal for each of the input taps that indicates whether the threshold has been met. After that, the "OR" of each group of five consecutive samples is also obtained as T[0] to T[4]. This signal is used to determine a rough estimate of the channel impulse response length.
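The normalization, rounding and threshold steps described above can be sketched as follows. This is a behavioural illustration under several assumptions that the text does not specify: the MSB is taken from the largest component magnitude among the first five taps, rounding is round-half-up with saturation to the signed range, and a tap "meets" the threshold when its L1 metric is greater than or equal to it. The function names are hypothetical.

```python
def reduce_precision(taps, n_bits):
    """Round 10-bit (re, im) integer taps down to n_bits signed integers.

    The shift is derived from the peak of the first five taps (assumed
    MSB-detector behaviour), so each extra shift is a step of about 6 dB.
    """
    peak = max(max(abs(re), abs(im)) for re, im in taps[:5])
    shift = max(0, peak.bit_length() - (n_bits - 1))
    hi, lo = (1 << (n_bits - 1)) - 1, -(1 << (n_bits - 1))

    def rnd(v):
        if shift:
            v = (v + (1 << (shift - 1))) >> shift  # round, do not truncate
        return max(lo, min(hi, v))                 # saturate (assumption)

    return [(rnd(re), rnd(im)) for re, im in taps]


def threshold_check(taps, threshold, group=5):
    """Return (taps with failing entries nulled, per-tap flags, per-group OR)."""
    flags = [abs(re) + abs(im) >= threshold for re, im in taps]
    groups = [any(flags[i:i + group]) for i in range(0, len(flags), group)]
    kept = [t if f else (0, 0) for t, f in zip(taps, flags)]
    return kept, flags, groups


cir = [(300, -120), (150, 90), (40, -30), (10, 5), (0, 0), (20, 10)] + [(0, 0)] * 19
reduced = reduce_precision(cir, n_bits=4)
kept, flags, groups = threshold_check(reduced, threshold=2)
print(reduced[:3], groups)
```

With these example values only the first group of five taps survives, so a single multiplier per slice would remain active in the partial Rake.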
The channel length estimate is given in multiples of five samples, or 10 ns. These summarized signals can be used to add granularity when turning off parts of both the channel impulse analysis and the correlators. The number of taps used in each slice of the correlators is decided here as the number of five-tap groups that are used at all. The Threshold Comply block nulls out those samples of the channel impulse response that did not meet the threshold, and afterwards the results are conjugated before being used in the correlators. The blocks Threshold Comply and Complex Conjugation are comprised of 25 blocks like the one shown in Figure 5-12.

Figure 5-10: Block diagram of the channel analysis subsystem.

Figure 5-11: Structure of one of the 25 components of the "Threshold Check" block.

For the MLSE demodulator it is necessary to obtain the autocorrelation of the channel impulse response and down-sample it at the symbol rate. The MLSE Weights block, depicted in Figure 5-13, obtains this result. In this figure, the block labelled Dot Product takes as inputs two complex vectors of four consecutive taps of the channel impulse response estimate and computes their inner product. The outputs of this block are four weights that will be used in the MLSE equalizer. This block is used only once during the packet demodulation process, after the channel has been estimated in state CE. The results are not loaded into the respective blocks that use them until state PL. The total time required for the computations is four cycles. In this case we take advantage of the long period of the slow clock domain.
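The quantity that the MLSE Weights block computes, the autocorrelation of the channel impulse response sampled at the symbol rate, can be written out directly. The sketch below is only a reference computation, not the Dot Product hardware structure. It assumes five samples per symbol (10 ns impulse spacing at 2 ns per sample), lags of 0 to 3 symbols for the four weights, and a particular conjugation convention; the function name is hypothetical.

```python
def mlse_weights(h, samples_per_symbol=5, n_weights=4):
    """Autocorrelation of h at lags 0, 1, 2, ... symbols.

    r[k] = sum_n h[n + k*samples_per_symbol] * conj(h[n])
    """
    weights = []
    for k in range(n_weights):
        lag = k * samples_per_symbol
        r = sum(h[n + lag] * h[n].conjugate() for n in range(len(h) - lag))
        weights.append(r)
    return weights


# Toy 25-tap channel: a main path and one echo exactly one symbol later.
h = [0j] * 25
h[0] = 1 + 0j
h[5] = 0.5j
print(mlse_weights(h))
```

The lag-0 weight is the channel energy, and the non-zero lag-1 weight reflects the inter-symbol interference created by the echo.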
5.6 Timing Synchronization

There are two possible timing errors that must be corrected during the packet demodulation. In this case, the preamble is short enough that this correction is not needed during the preamble (states CE and EPD). This system as designed has four independent timing references, in the same way that was shown in chapter 4. The difference now is that it cannot be ensured that the frequency difference between the transmitter carrier and the receiver oscillator is small enough. Both a DLL and a Costas loop [40] are implemented in this transceiver. The DLL used in this transceiver is exactly the same one that has been used in previous prototypes. For that reason, we will not describe it again. On the other hand, this is the first receiver in which a Costas loop is implemented. Its block diagram is shown in Figure 5-14.

Figure 5-12: Structure of one of the 25 components of the blocks "Threshold Comply" and "Complex Conjugation".

Figure 5-13: Block diagram of the MMSE weight estimator.

Figure 5-14: Block diagram of the Costas loop.

The output of the correlators during PL consists of six complex outputs of the matched filter in parallel. For the purpose of detecting the phase error, only the first of these six outputs is used, and the rest are ignored. This allows reducing the data rate by 2, 3 or 6 without having to make any changes to how the DLL and the Costas loop work. In BPSK, the symbols should be aligned with the real axis, with positive real part if a 1 was sent and negative real part if a 0 was sent. For the Costas loop, if a 1 is received, the complex number is accumulated. If a 0 is received, its negated value is accumulated instead.
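The data-removal accumulation just described can be sketched in a few lines. The sketch assumes that the bit decisions are already available and uses `cmath.phase` to compute the exact angle of the accumulated value; that exact angle is only an illustrative choice to show what the accumulator encodes, and the function name is hypothetical.

```python
import cmath


def accumulate_data_removed(symbols, bits):
    """Accumulate BPSK symbols with the data modulation removed.

    A symbol decided as 1 is added as is; a symbol decided as 0 is negated
    first, so that all contributions point along the same direction.
    """
    acc = 0j
    for z, b in zip(symbols, bits):
        acc += z if b == 1 else -z
    return acc


# Toy example: ideal BPSK symbols rotated by a residual phase of 0.1 rad.
offset = 0.1
bits = [1, 0, 0, 1, 1, 0, 1, 0]
symbols = [(1 if b else -1) * cmath.exp(1j * offset) for b in bits]
acc = accumulate_data_removed(symbols, bits)
print(cmath.phase(acc))
```

Because the data is removed before accumulation, the angle of the accumulated value equals the residual phase rotation in this noise-free example, which is the quantity the loop has to correct.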
This accumulation process is performed during 32 cycles of the 16.66 MHz clock, for a duration of 1.92 µs. At the end of this interval, assuming the output of the accumulator is $I + jQ$, the phase error is approximated as:

$\phi_e \approx \mathrm{sign}(I) \cdot Q$ (5.2)

This approximation is accurate when $|Q| \ll |I|$, that is, when the phase error is small. The phase estimate is filtered with a programmable filter and its output is scaled to five significant bits. These five bits are used as the address of a 32-position ROM, each position of which holds a complex number representing a specific phase correction, as also shown in Figure 5-14. The phase corrections are stored in the ROM consecutively, so that an increase in the phase correction translates into an increase of the ROM address. We also take into account that the phase corrections are periodic, so that when the address overflows, the corrections remain continuous as the address wraps back to small values. Even though only the first of the outputs of the partial Rake is used for phase error estimation, the correction is applied to all six outputs of the Rake. The total latency from the output of the partial Rake to the update of the phase correction is 3 cycles of the 16.66 MHz clock.

5.7 MLSE Equalizer

The use of a Viterbi demodulator for MLSE equalization follows a path similar to that of decoding a convolutional code. Figure 5-15 shows the Trellis diagram that corresponds to an eight-state MLSE equalizer, as specified in chapter 3. During the demodulation process, every time an impulse is received, it is assumed that the initial state of the demodulator may be any of the eight initial states indicated in Figure 5-15. The final state depends on the initial state and on the demodulated value of the impulse received (each state has two output branches, depending on whether the next bit is a 0 or a 1).
As more impulses are received, the Trellis shown in Figure 5-16, with one slice per impulse received, represents the procedure for finding the most probable path and demodulating the information at the same time. An example of a possible path, with the demodulated bits associated with it, is also shown in Figure 5-16. The objective of the MLSE algorithm is to find the path along the Trellis that maximizes the maximum likelihood metric.

For our MLSE equalizer we use a classical architecture such as that shown in Figure 5-17. It consists of a branch metric unit (BMU), an add-compare-select unit (ACS) and a trace-back unit (TBU). The purpose of the BMU is to calculate all the branch metrics as a function of the channel impulse response and the output of the partial Rake. These branch metrics are used in the ACS, where they are accumulated onto the metrics of the previous states, and the result serves to choose one branch out of all the branches that arrive at every final state. The TBU stores the values of the initial states and the paths followed, in order to perform the trace-back function and demodulate the received bits. All these blocks are implemented in the slow clock domain. The first two perform their operations in only one cycle. The last one adds a latency that depends on the number of cycles required for trace-back and decode.

In our transceiver the MLSE equalizer receives six inputs at a time. Six bits must be demodulated with every positive clock edge, although the total latency is not important. The solution, instead of performing only one Trellis iteration per cycle, is to unroll the algorithm six times and perform the equivalent of six iterations per clock edge. The main impact of this occurs in the BMU, since it must now obtain all the metrics associated with all the possible paths from the initial states to the final states after six slices of the Trellis.
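The effect of unrolling can be checked with a small behavioural sketch. It assumes an eight-state trellis whose state is the last three bits, generic per-step branch metrics `bm[k][state][bit]` supplied by the caller, and a metric that is maximized; all names are hypothetical. The unrolled version groups the six-step paths by their initial and final state (eight paths join each pair), keeps the best one of each group, and only then adds the initial state metrics.

```python
import itertools

N_STATES = 8   # three bits of memory
UNROLL = 6     # trellis slices processed per slow-clock cycle


def next_state(state, bit):
    """Shift the new bit into the three-bit state."""
    return ((state << 1) | bit) & (N_STATES - 1)


def sequential(pm, bm):
    """Six ordinary add-compare-select iterations, one trellis slice each."""
    for k in range(UNROLL):
        new = [float("-inf")] * N_STATES
        for s in range(N_STATES):
            for bit in (0, 1):
                ns = next_state(s, bit)
                new[ns] = max(new[ns], pm[s] + bm[k][s][bit])
        pm = new
    return pm


def unrolled(pm, bm):
    """One combined step covering six slices."""
    # block[i][f]: best six-step branch-metric sum from state i to state f.
    block = [[float("-inf")] * N_STATES for _ in range(N_STATES)]
    for i in range(N_STATES):
        for bits in itertools.product((0, 1), repeat=UNROLL):
            s, total = i, 0
            for k, bit in enumerate(bits):
                total += bm[k][s][bit]
                s = next_state(s, bit)
            block[i][s] = max(block[i][s], total)
    # Radix-8 selection: each final state chooses among eight initial states.
    return [max(pm[i] + block[i][f] for i in range(N_STATES))
            for f in range(N_STATES)]
```

Both functions return the same state metrics because adding a common initial metric does not change which path is best within a group; integer metrics keep the comparison exact in this sketch.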
An elegant solution appears since all the possible paths that share the initial state and the final state also have the initial state metric in common. A prior decision may therefore be taken, without looking at the initial state metrics, that reduces the number of outputs of the BMU to only 64 (one per pair of initial and final states). Once this simplification is performed, the ACS needs only to decide among eight paths to each of the eight final states. This is done with a Radix-8 ACS unit.

Figure 5-15: 8-state Trellis diagram.

Figure 5-16: Locating the most probable path in an 8-state Trellis.

Figure 5-17: Block diagram of the MLSE equalizer.

Taking into account the assumption that each bit can collide with the next three, the duration of the trace-back operation must be at least equal to five times this number, that is, 15 bits. Since in each operation of the MLSE six bits are associated with each path, the trace-back would need to go over three of these iterations. We will be conservative and perform the trace-back operation over four of these iterations. The TBU is designed with a depth of 16, in which each position contains a path for six bits. Three pointers travel along the TBU, as shown in Figure 5-17. One pointer performs the write operation, another the trace-back operation, and a third the decode operation. The size of the TBU is chosen so that these pointers never collide. The latency associated with the TBU is equal to 12 cycles of the 16.66 MHz clock.

5.8 Implementation and Results

The previous architecture was implemented in a 0.18 µm CMOS process, at 1.8 V. Figure 5-18 shows the layout of the chip. It was implemented using a normal digital flow, synthesizing all the blocks as a whole.
The total area of the chip is 3.8 mm x 3.6 mm = 13.7 mm². It uses approximately 1.5 million gates, of which 47% belong to the correlators and the high-speed clock domain, and 36% belong to the MLSE equalizer.

Figure 5-18: Robust UWB baseband layout.

For the purpose of testing, a PCB was designed as shown in Figure 5-19. This board allows the custom ASIC to be connected either to a logic analyzer, for testing under controlled inputs and outputs, or to an interface board with the ADC that was designed by Brian Ginsburg.

Figure 5-19: Testing board.

The first stage of testing was performed with the logic analyzer. In this case, it was not possible to test the system with a complex signal (active inputs in both the in-phase and quadrature channels) because the pattern generator of the logic analyzer does not have enough outputs to generate both the input signal and the control signals required by the chip. For this reason, part of the testing was performed with real inputs only. Under these circumstances, the Costas loop cannot be tested, since it is designed to correct the phase error of a complex signal. Another limitation of this phase of testing is the maximum clock frequency that can be generated with this pattern generator, which limits the testing to 70 MHz instead of 83.3 MHz.

Figure 5-20 shows an example of the signals captured by the logic analyzer. EnBits is a signal that goes from zero to one whenever a packet has been detected and the output bits of the ASIC (shown in the line Bits) are valid demodulated bits. The signal EnBits may be used in the interface with another board or a PC as an enable for an input register. This figure also shows the inputs to the ASIC (InputsReal0 to InputsReal5), corresponding to six samples in parallel at a frequency five times that of Bits, simulating the output of the ADC. Signal Swap represents the control of the retiming block. Figure 5-21 shows the sequence of bits sent and demodulated in these tests. This sequence of bits is a Manchester code and was chosen to ensure that no long sequence of impulses of the same sign would be used in the RF front-end.

A signal with Gaussian noise was simulated in the pattern generator in order to estimate the baseband sensitivity. Figure 5-22 shows the measured bit-error rate for different SNRs in two cases: first, when the input signal is comprised of Gaussian impulses with AWGN; second, when the input signal has gone through a CM2 channel. Since the input signal follows the assumptions of the simulations presented in chapter 3, the results closely follow the simulated curves. For each point with a bit-error rate larger than $10^{-4}$, 10 data packets were used to perform the bit-error rate estimation. For lower bit-error rates, the estimation was based on the results of 20 packets.

Figure 5-23 demonstrates the trade-off between the complexity of the signal processing and the power dissipation. As the threshold that controls the number of active taps of the partial Rake receiver decreases, more taps are used and the total power dissipation increases. For this plot a CM2 channel [3] was used, and the MLSE equalizer was enabled. In this plot, the threshold never goes above amplitude 5. During the testing it was found that, because of the normalization performed in the channel analysis subsystem, sometimes no sample of the channel impulse response met the threshold after normalization when the threshold was larger than 5, even if the packet was detected with good SNR. In that case the whole packet was lost. If we change the number of bits of the internal representation of the channel impulse response (it can be chosen to be 2, 3 or 4 bits), these power results do not change appreciably.
The reason for this is that the procedure to change the internal representation simply nulls out those bits that are not used. Since a 2s-complement binary representation is used, a change of sign implies the switching of the MSBs, even if it is not needed. To avoid this problem and achieve a better energy trade-off, it would have been necessary to use an architecture that ensures that the multipliers switch only the relevant bits.

Figure 5-20: Interface signals when a packet has been detected.

Figure 5-21: Interface signals showing a sequence of demodulated bits.

Figure 5-22: Probability of error measured in the ASIC (bit-error rate versus SNR in dB; curves: simulated with no quantization, ASIC with clean pulse, ASIC with CM2 channel).

This ASIC has also been tested integrated in a full wireless system at a frequency of 62.5 MHz. Figure 5-24 shows the end of the preamble and the beginning of the payload obtained under these conditions. A 7% reduction in packet acquisition resulted from using a change of sign in the real part of the matched filter output to detect the end of the preamble. This loss would be avoided by looking instead at the change of sign after the phase correction given by the Costas loop has been taken into account. A wireless data rate of 85 Mbps was obtained at 62.5 MHz.

With the logic analyzer, the ASIC consumes a maximum of 83 mA from a 2 V power source. The simulated values for these conditions were 76 mA at 1.8 V, of which 28 mA correspond to the MLSE equalizer and 40 mA correspond to the high-speed clock domain and the correlators.
This power dissipation can be reduced by up to 45% by changing the threshold used to select the taps of the channel impulse response estimate. This proves the possibility of trading off energy dissipation against quality of service. In a 5 kbit packet, the energy dissipated at full functionality is 2.4 nJ/bit.

Figure 5-23: Demonstration of a QoS - Power trade-off (horizontal axis: threshold for channel, maximum 7).

Figure 5-24: Structure of the data packet.

Chapter 6

Conclusions and future work

6.1 Thesis summary

Although impulse signals have been used for some time in radar applications, only recently have they been revisited for communication purposes. The authorization of the band from 3.1 to 10.6 GHz for communication purposes under certain restrictions has spurred the research and development of applications using UWB signals. There are currently two main drivers of the technology: applications that try to achieve very high data rates at very short distances, and applications that achieve very low data rates at larger distances. In this thesis we have focused on the challenges associated with impulse UWB targeting high data rates.

The use of ultra-wideband signals for wireless communications presents some advantages over conventional narrowband signals while at the same time posing some interesting challenges. UWB signals have better time definition than narrowband signals. Multipath does not appear as fading, since individual echoes may be detected independently and their energy collected.
To compensate for the multipath it is necessary to estimate the channel impulse response, and to use this information in both a matched filter (which gathers the energy from the different echoes of the signal) and an equalizer (to compensate for possible inter-symbol interference). UWB receivers may be successful in multipath environments at the cost of increasing the complexity and power dissipation of the digital baseband. In this context, the power dissipation of the digital baseband becomes a relevant part of the total power budget, and should be optimized.

In this thesis, the design and implementation of a baseband for UWB wireless systems has been explored through several prototypes. First, a custom ASIC oriented to baseband UWB signals was designed. It was shown that, for this application, an ADC of only 4 bits ensured reliable signal demodulation. The baseband was designed for a signal in the band from 0 to 500 MHz, as part of a system-on-a-chip implemented in 0.18 µm CMOS technology working at 1.8 V. Each bit of information is represented with 31 baseband impulses of 2 ns width and 2% duty cycle, resulting in a raw data rate of 322 kbps. The digital baseband is completely functional at a clock frequency of 300 MHz but not at 500 MHz. The frequency range for the coarse acquisition algorithm between a pair of transceivers is ±3%. The average time to declare coarse acquisition is 65 µs. At 300 MHz, a data rate of 193 kbps was demonstrated. The baseband consumes 75 mW. This architecture is also scalable to larger bandwidths.

Second, the specification of an FCC-compliant UWB system with a raw data rate of 100 Mbps at 10 m distance, using a bandwidth of 500 MHz in the band from 3.1 GHz to 10.6 GHz, was analyzed. It has been established in this thesis that a homodyne architecture is better suited for ultra-wideband signals. The main challenge in this kind of wireless communication is multipath.
In order to cope with it, it is necessary to estimate the channel impulse response and use this information in a Rake receiver and in an equalizer to compensate for inter-symbol interference. It is possible to estimate the channel quality and adapt the signal processing available in the digital baseband to the specific channel quality. For the purpose of exhibiting this trade-off, impulse UWB signals are better suited than OFDM signals, where the complexity of the signal processing is already fixed in order to avoid inter-carrier interference. It has been proven that impulse UWB offers the opportunity of reducing the number of bits of the ADC under favorable conditions of SNR or channel impulse response.

A data packet is defined that is comprised of a preamble and a payload. The preamble is composed of 16 repetitions of a Gold code of 31 bits, in which every two consecutive impulses are separated by an interval of 60 ns. The Gold code is used during the detection of the data packet, and the 16 repetitions ensure a time to achieve packet acquisition of 30 µs. The separation between impulses allows the estimation of the channel impulse response with reduced or no ISI. The channel impulse response is estimated using the information obtained by receiving a sequence of 31 impulses. Each tap of the channel impulse response is represented with a complex number in which both real and imaginary parts have 4-bit precision. During the payload, the impulses are separated by 10 ns, and each bit of information is represented by the sign of only one impulse. The sensitivity of this system is -81 dBm with a noise figure of 5 dB.

Third, a complete discrete prototype was built in the Digital Circuits and Systems group with off-the-shelf components. This prototype was used to validate some of the theoretical claims of the system in real-time conditions.
As the components of the system are designed and fabricated, they can also be individually substituted into the prototype to verify overall system functionality. The second UWB baseband was designed for this discrete prototype and implemented using an FPGA. This prototype is FCC compliant and uses 500 MHz subbands from 3.1 GHz to 10.6 GHz. With this digital baseband it was possible to obtain a data rate of either 100 Mbps using an arbitrary waveform generator or 50 Mbps using a dedicated impulse generator. Thanks to its flexibility, this discrete prototype is not restricted to impulse UWB signals, and other modulations can be tested with it. For example, it was used to test multitone FSK modulation.

Finally, a second ASIC was designed to implement a robust, FCC-compliant UWB baseband working at 100 Mbps. This second ASIC was designed using 0.18 µm CMOS technology working at 1.8 V. Among the subsystems implemented are 150 correlators in parallel to reduce the time to achieve coarse acquisition, a programmable partial Rake that may use up to 25 complex taps, and an MLSE equalizer. This thesis has explicitly exposed the link between signal processing complexity, power dissipation and quality of service. The packet acquisition is achieved in 30 µs. The ASIC consumes a maximum of 83 mA from a 1.8 V power source. This power dissipation can be reduced by 45% by changing the threshold of the channel impulse response estimation, proving the possibility of trading off energy dissipation against quality of service. In a 5 kbit packet, the energy dissipated at full functionality is 2.4 nJ/bit.

6.2 Conclusions

In this work it has been determined that the main challenge in a UWB baseband for high data rate applications is multipath compensation. Because of the bandwidth of the signal, it is possible to separate the different echoes that comprise the channel impulse response and use this information to gather signal energy using a Rake receiver.
But the multipath also causes inter-symbol interference, which requires an equalizer to compensate for it. The Rake receiver and the equalizer severely increase the complexity of the digital baseband and, together with the large sampling rate needed for high-bandwidth signals, cause the digital baseband to consume a significant part of the system power budget.

Impulse UWB allows trading off signal processing complexity against quality of service, depending on the channel quality. When the received SNR is high and the channel impulse response is short enough to ensure that no ISI occurs, it is possible to reduce the energy dissipated per packet by not using the MLSE equalizer and by reducing the number of taps that the Rake uses (this can be done by increasing the threshold used for the channel impulse response). If, on the other hand, the SNR is low and the channel impulse response causes ISI, it is necessary to use the full complexity of the system to recover the information. This trade-off has been explicitly proven in this thesis in terms of the loss of SNR and the complexity in chapter 3, and in terms of an explicit power dissipation difference in chapter 5. MB-OFDM does not allow this trade-off.

One of the most important specifications of a UWB system is the number of bits of the ADC. It has been proven that, for the applications designed in this thesis, a 4-bit ADC allows reliable demodulation. In the interference-limited case, for a data rate of 100 Mbps, it even allows operation with a SIR of -7 dB. Still, the received impulses depend not only on the transmitted impulses but also on the channel impulse response and on the transfer functions of the transmitter and receiver antennas and of the receiver front-end. For this reason, it is advisable to estimate the received impulse shape as part of the channel impulse response.
This limitation was discovered in the first custom ASIC designed in this thesis, and its solution was implemented both in the discrete prototype and in the second custom ASIC.

An ultra-low-power UWB system might be designed if the data rate is reduced enough to ensure that no ISI occurs. If the data rate can be reduced further, so that each bit of information is represented by more than one impulse, there is no need for a Rake receiver either, and the architecture is simplified. As the data rate increases with constant signal bandwidth, the number of symbols affected by the channel impulse response increases linearly. The Rake complexity also increases linearly but, more importantly, the complexity of the MLSE equalizer increases exponentially. To keep the complexity to a minimum, it is necessary to keep data rates low.

6.3 Future work

In terms of future lines of work, the exploration of further ways of reducing the power dissipation seems promising. This work focused on the concrete trade-off between the complexity of the signal processing and the quality of service. Circuit techniques that reduce power dissipation, specifically turning off the different segments of the system when they are not needed, should be explored. Among the possible techniques, dynamic voltage scaling, clock gating, and the use of high-Vt devices to reduce leakage should prove very effective.

This thesis used an MLSE equalizer. This is not the only architecture available for this purpose, although it represents the theoretically optimum solution to the ISI problem. It would be desirable to study other architectures thoroughly (such as zero-forcing or decision-feedback schemes), together with the different complexity/quality of service trade-offs that each of them would involve.

Finally, we have not addressed the problem of in-band narrowband interferers.
It was initially claimed that a UWB signal would not only cause negligible interference to existing services, but would also be robust to strong interference. This assumes that the whole system is linear. In practice, linearity constraints are imposed by the RF front end and the ADC, which sharply reduces the tolerance of a UWB signal to narrowband interferers to only 7 dBr. Active cancellation of in-band interferers, by estimating the frequency of the narrowband interferers in the digital baseband and providing this information to the RF front end, would help to improve the robustness of the receiver.

Appendix A

Comments on Link Budget

In this appendix, the equations that were used for the link budget specification are presented.

A.1 Notation

The meaning of the symbols used in this appendix is shown below.

$f_{min}$ - Minimum frequency of the band (Hz) (A.1)
$f_{max}$ - Maximum frequency of the band (Hz) (A.2)
$S_d$ - Maximum spectral density (dBm/MHz) (A.3)
$BW = f_{max} - f_{min}$ - Signal bandwidth (Hz) (A.4)
$P_t = S_d + 10 \log_{10}\left(BW / 10^6\right)$ - Average transmitted power (dBm) (A.5)
$G_t$ - Transmitter antenna gain (dB) (A.6)
$f_c = \sqrt{f_{max} f_{min}}$ - Geometric center frequency (Hz) (A.7)
$L_1 = 20 \log_{10}\left(4 \pi f_c \cdot 1\,\mathrm{m} / c\right)$ - Path loss at 1 m (dB), with $c$ the speed of light (A.8)
$d_{min}$ - Minimum distance (m) (A.9)
$L_{2min} = 20 \log_{10}\left(d_{min} / 1\,\mathrm{m}\right)$ - Extra path loss at $d_{min}$ (dB) (A.10)
$d_{max}$ - Maximum distance (m) (A.11)
$L_{2max} = 20 \log_{10}\left(d_{max} / 1\,\mathrm{m}\right)$ - Extra path loss at $d_{max}$ (dB) (A.12)
$G_r$ - Receiver antenna gain (dB) (A.13)
$Z_0$ - Reference resistance (Ω) (A.14)
$A_{ADC}$ - Peak amplitude value for the ADC (A.15)
$K_{ADC}$ - Constant from the ADC (A.16)
$M$ - Design margin (dB) (A.17)
$PRF$ - Pulse repetition frequency (Hz) (A.18)

A.2 Definition of the Parameter K

K is defined by the following equation:

$$E_p = \int |p(t)|^2\,dt = K A^2 \quad (\mathrm{V^2 s}) \tag{A.19}$$

where $p(t)$ is the mathematical representation of the pulse shape, $A$ is the peak value of the impulse, and $E_p$ is a quantity proportional to the energy of the pulse.
In order to get energy, in joules, it is necessary to consider an impedance in which this energy is dissipated. Then the energy $E_f$ is:

$$E_f = \frac{K A^2}{Z_1} \quad (\mathrm{J}) \tag{A.20}$$

$Z_1$ is the input resistance (not necessarily 50 Ω). The power is:

$$P = K \cdot A^2 \cdot PRF \quad (\mathrm{V^2}) \tag{A.21}$$

$$P_f = \frac{K \cdot A^2 \cdot PRF}{Z_1} \quad (\mathrm{W}) \tag{A.22}$$

When these impulses are up-converted, all these expressions must be divided by 2:

$$E = \int_{-\infty}^{\infty} |p(t) \cos \omega t|^2\,dt = \frac{K A^2}{2} \quad (\mathrm{V^2 s}) \tag{A.23}$$

$$E_f = \frac{K A^2}{2 Z_1} \quad (\mathrm{J}) \tag{A.24}$$

Applied to the power expressions:

$$P = \frac{K \cdot A^2 \cdot PRF}{2} \quad (\mathrm{V^2}) \tag{A.25}$$

$$P_f = \frac{K \cdot A^2 \cdot PRF}{2 Z_1} \quad (\mathrm{W}) \tag{A.26}$$

A.2.1 Gaussian Pulse

The equations for this kind of pulse are:

$$p(t) = A e^{-t^2 / 2\sigma^2} \tag{A.27}$$

$$P(j\omega) = A \sigma \sqrt{2\pi}\, e^{-\sigma^2 \omega^2 / 2} \tag{A.28}$$

$$E_p = \sigma \sqrt{\pi}\, A^2 \tag{A.29}$$

from where

$$K = \sigma \sqrt{\pi} \tag{A.30}$$

It is also very useful to relate the standard deviation of the Gaussian pulse to the bandwidth of the signal at certain attenuations:

$$\sigma = \frac{0.241}{BW_{-10\,\mathrm{dB}}} \tag{A.31}$$

$$\sigma = \frac{0.132}{BW_{-3\,\mathrm{dB}}} \tag{A.32}$$

From these equations, the standard deviation required for a Gaussian pulse with a 10 dB bandwidth of 250 MHz is $0.964 \cdot 10^{-9}$ s. This corresponds to 250 MHz of baseband bandwidth or 500 MHz of passband bandwidth (once up-converted).

A.2.2 RC-charge Pulse

The equations for this kind of pulse are:

$$p(t) = A \frac{1 - e^{-t/\tau}}{1 - e^{-T/\tau}}\, u(t) - A \frac{1 - e^{-(t-T)/\tau}}{1 - e^{-T/\tau}}\, u(t - T) \tag{A.33}$$

$$P(j\omega) = \frac{A}{1 - e^{-T/\tau}} \cdot \frac{1 - e^{-j\omega T}}{j\omega (1 + j\omega\tau)} \tag{A.34}$$

$$E_p = A^2\, \frac{T - \tau \left(1 - e^{-T/\tau}\right)}{\left(1 - e^{-T/\tau}\right)^2} \tag{A.35}$$

from where

$$K = \frac{T - \tau \left(1 - e^{-T/\tau}\right)}{\left(1 - e^{-T/\tau}\right)^2} \tag{A.36}$$

With $T = 2$ ns and $\tau = 1.11$ ns, the baseband bandwidth is 250 MHz and the bandpass bandwidth is 500 MHz.

A.3 Link Budget and Sensitivity

The sensitivity, defined as the minimum required power with which the receiver works properly, is simulated for a reference pulse repetition frequency $PRF_{ref}$. It gives:

$$SNR_{ref} = \frac{A^2 K \cdot PRF_{ref}}{\sigma^2} \tag{A.37}$$

The probability of error remains the same when the pulses are closer together or further apart, as long as they do not collide, because it depends only on the ratio $A/\sigma$.
It is possible to refer this sensitivity to other PRFs as follows:

$$SNR_{ref} = 10 \log \frac{A^2 K}{\sigma^2} + 10 \log PRF_{ref} \tag{A.38}$$

$$SNR = 10 \log \frac{A^2 K}{\sigma^2} + 10 \log PRF \tag{A.39}$$

from where

$$SNR = SNR_{ref} - 10 \log PRF_{ref} + 10 \log PRF = SNR_{ref} + 10 \log \frac{PRF}{PRF_{ref}} \tag{A.40}$$

Once $SNR_{ref}$ and $PRF_{ref}$ are obtained for a concrete $P_e$, a bandwidth and a bit rate, SNR is obtained as a function of PRF.

A.4 Extra Losses Due to Pulse Shape

In this section the losses due to the fact that a sinc pulse is not used are considered. They are associated with not taking advantage of all the available power spectral density. The power of the square spectrum (see Figure 3-22) is

$$\int_0^{2\pi BW} A^2\,d\omega = A^2\, 2\pi BW \tag{A.41}$$

whereas the pulse actually used in transmission, assumed to be $p_0(t)$ with Fourier transform $P_0(j\omega)$ and unit maximum amplitude, has a power:

$$A^2 \int_0^{\infty} |P_0(j\omega)|^2\,d\omega \tag{A.42}$$

The losses can be defined as:

$$L_{shape} = \frac{2\pi BW}{\int_0^{\infty} |P_0(j\omega)|^2\,d\omega} \tag{A.43}$$

A.4.1 Gaussian Pulse

For a Gaussian pulse:

$$L_{gauss} = 4 \sqrt{\pi} \cdot BW \cdot \sigma \tag{A.44}$$

This follows by taking into account that

$$P_0(j\omega) = e^{-\sigma^2 \omega^2 / 2} \tag{A.45}$$

from which the following result is obtained:

$$\int_0^{\infty} |P_0(j\omega)|^2\,d\omega = \frac{\sqrt{\pi}}{2\sigma} \tag{A.46}$$

Assuming the BW is for -10 dB, $L_{gauss} = 2.33$ dB.

A.4.2 RC-charge Pulse

Starting from the pulse shape:

$$P_0(j\omega) = \frac{1}{T} \cdot \frac{1 - e^{-j\omega T}}{j\omega (1 + j\omega\tau)} \tag{A.47}$$

and using a version of the Parseval equality:

$$\int_{-\infty}^{\infty} |P(j\omega)|^2\,d\omega = 2\pi \int_{-\infty}^{\infty} |p(t)|^2\,dt = 2\pi E_p \tag{A.48}$$

From where:

$$L_{RC} = 10 \log \frac{2 \cdot BW \cdot T^2}{T - \tau \left(1 - e^{-T/\tau}\right)} \tag{A.49}$$

This formula gives 2.7 dB for a 250 MHz baseband bandwidth.

A.5 Receiver Constraints

The minimum desired received power is

$$P_{rmin} = P_t - L_1 - L_{2max} - L_{ex} + G_r \quad (\mathrm{dBm}) \tag{A.50}$$

and the maximum desired received power is

$$P_{rmax} = P_t - L_1 - L_{2min} - L_{ex} + G_r \quad (\mathrm{dBm}) \tag{A.51}$$

This is the power delivered by the antenna when the antenna is matched. The use of this equation implies assumptions about the matching of the receiver antenna. It does not provide any data on the transmitter, because $P_t$ lumps together the transmitted power and the gain of the antenna, which implies that any matching effects are already included in that number.
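The numerical values quoted in this appendix (a standard deviation of about 0.964 ns, and pulse-shape losses of 2.33 dB and 2.7 dB) can be cross-checked with a few lines of code. The following is a minimal sketch, not the tool used in the thesis: the function names are illustrative, and the closed forms for the shape losses are the ones that reproduce the quoted figures when the spectrum integral is taken over positive frequencies and BW is the 250 MHz baseband bandwidth.

```python
import math


def gaussian_sigma(bw_hz, atten_db=10.0):
    """Std. deviation of a Gaussian pulse whose power spectrum, proportional to
    exp(-sigma^2 w^2), is atten_db down at frequency bw_hz (cf. A.31, A.32)."""
    return math.sqrt(atten_db / 10.0 * math.log(10.0)) / (2.0 * math.pi * bw_hz)


def gaussian_k(sigma):
    """K for the Gaussian pulse (A.30)."""
    return sigma * math.sqrt(math.pi)


def gaussian_shape_loss_db(bw_hz, sigma):
    """Pulse-shape loss for the Gaussian pulse, in dB (cf. A.44)."""
    return 10.0 * math.log10(4.0 * math.sqrt(math.pi) * bw_hz * sigma)


def rc_k(t_len, tau):
    """K for the RC-charge pulse (A.36)."""
    x = 1.0 - math.exp(-t_len / tau)
    return (t_len - tau * x) / x ** 2


def rc_shape_loss_db(bw_hz, t_len, tau):
    """Pulse-shape loss for the RC-charge pulse, in dB (cf. A.49)."""
    x = 1.0 - math.exp(-t_len / tau)
    return 10.0 * math.log10(2.0 * bw_hz * t_len ** 2 / (t_len - tau * x))


def tx_power_dbm(sd_dbm_per_mhz, bw_hz):
    """Average transmitted power from the spectral density limit (A.5)."""
    return sd_dbm_per_mhz + 10.0 * math.log10(bw_hz / 1e6)


def snr_at_prf(snr_ref_db, prf_ref_hz, prf_hz):
    """Refer a simulated sensitivity to another pulse repetition frequency (A.40)."""
    return snr_ref_db + 10.0 * math.log10(prf_hz / prf_ref_hz)


if __name__ == "__main__":
    bw = 250e6  # baseband bandwidth (Hz)
    sigma = gaussian_sigma(bw)
    print(f"sigma   = {sigma * 1e9:.3f} ns")
    print(f"K_gauss = {gaussian_k(sigma) * 1e9:.3f} ns")
    print(f"L_gauss = {gaussian_shape_loss_db(bw, sigma):.2f} dB")
    print(f"K_RC    = {rc_k(2e-9, 1.11e-9) * 1e9:.3f} ns")
    print(f"L_RC    = {rc_shape_loss_db(bw, 2e-9, 1.11e-9):.2f} dB")
    print(f"P_t     = {tx_power_dbm(-41.3, 500e6):.2f} dBm")
```

The small difference between the computed standard deviation and 0.964 ns comes from rounding the constant 0.241 in (A.31).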
A.6 ADC Constraints and Detection

The following formulas assume that the automatic gain control works perfectly. They refer to signals at the input of the ADC.

* Minimum Desired Received Voltage

$$V_{min} = \sqrt{\frac{2 Z_1 \cdot 10^{(P_{rmin} + 30)/10}}{K \cdot PRF}} \quad (\mathrm{mV}) \tag{A.52}$$

* Maximum Desired Received Voltage

$$V_{max} = \sqrt{\frac{2 Z_1 \cdot 10^{(P_{rmax} + 30)/10}}{K \cdot PRF}} \quad (\mathrm{mV}) \tag{A.53}$$

* Desired Voltage at the Input of ADC

$$V_{sadc} = \frac{K_{ADC} \cdot A_{ADC}}{\sqrt{K \cdot PRF \left(1 + 10^{-SNR/10}\right)}} \quad (\mathrm{mV}) \tag{A.54}$$

* Desired Power at the Input of ADC

$$P_{sadc} = -30 + 20 \log K_{ADC} - 10 \log Z_1 + 20 \log A_{ADC} - 10 \log \left(1 + 10^{-SNR/10}\right) \quad (\mathrm{dBm}) \tag{A.55}$$

* Maximum Noise Standard Deviation at Input of ADC

$$\sigma_{adc} = K_{ADC} \cdot A_{ADC} \sqrt{\frac{10^{-SNR/10}}{1 + 10^{-SNR/10}}} \quad (\mathrm{mV}) \tag{A.56}$$

* Maximum Noise Power at the Input of ADC

$$P_{nadc} = -30 + 20 \log K_{ADC} - 10 \log Z_1 + 20 \log A_{ADC} - 10 \log \left(1 + 10^{SNR/10}\right) \quad (\mathrm{dBm}) \tag{A.57}$$

A.6.1 Explanation of $K_{ADC}$ and $A_{ADC}$

This analysis is performed for a baseband signal, right at the input of the ADC.

$$P_S = \frac{A^2 K \cdot PRF}{Z_1} \tag{A.58}$$

$$P_N = \frac{\sigma^2}{Z_1} \tag{A.59}$$

where $P_S$ is the signal power and $P_N$ is the noise power. The policy of the AGC can be described as follows:

$$X = A^2 K \cdot PRF + \sigma^2 \quad \text{(total power)} \tag{A.60}$$

$$Scaling = \frac{\sqrt{X}}{K_{ADC}} \tag{A.61}$$

where Scaling is the number by which the input signal is divided to fit the ADC input range. This Scaling is used as follows:

$$A_{prime} = \frac{A}{Scaling} = \frac{A \cdot K_{ADC}}{\sqrt{X}} \tag{A.62}$$

$$\sigma_{prime} = \frac{\sigma}{Scaling} = \frac{\sigma \cdot K_{ADC}}{\sqrt{X}} \tag{A.63}$$

$$SNR = 10 \log \frac{A^2 K \cdot PRF}{\sigma^2} \tag{A.64}$$

where $A_{prime}$ and $\sigma_{prime}$ refer to the signals at the input of the ADC. The total power is then:

$$X = \sigma^2 + A^2 K \cdot PRF = \sigma^2 \left(1 + \frac{A^2 K \cdot PRF}{\sigma^2}\right) = \sigma^2 \left(1 + 10^{SNR/10}\right) \tag{A.65}$$

Scaling is rewritten as

$$Scaling = \frac{\sigma \sqrt{1 + 10^{SNR/10}}}{K_{ADC}} \tag{A.66}$$

then

$$A_{prime} = \frac{A}{\sigma} \cdot \frac{K_{ADC}}{\sqrt{1 + 10^{SNR/10}}} \tag{A.67}$$

$$\sigma_{prime} = \frac{K_{ADC}}{\sqrt{1 + 10^{SNR/10}}} \tag{A.68}$$

In addition,

$$\frac{A^2 K \cdot PRF}{\sigma^2} = 10^{SNR/10} \tag{A.69}$$

Then

$$A_{prime} = \frac{K_{ADC}}{\sqrt{K \cdot PRF \left(1 + 10^{-SNR/10}\right)}} \tag{A.70}$$

In order to normalize to the input scale of the ADC, $A_{prime}$ and $\sigma_{prime}$ are multiplied by $A_{ADC}$. The units depend on those of $A_{ADC}$.
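The AGC policy of section A.6.1 can be checked numerically. The sketch below is illustrative only (the function names and the example operating point are assumptions, not values from the thesis): it applies the scaling of (A.60) to (A.63) directly and compares the result with the closed forms that depend only on the SNR, K, PRF and $K_{ADC}$.

```python
import math


def agc_scale(a_peak, sigma, k, prf, k_adc):
    """Apply the AGC policy: divide signal and noise by Scaling = sqrt(X) / K_ADC."""
    total_power = a_peak ** 2 * k * prf + sigma ** 2      # X, (A.60)
    scaling = math.sqrt(total_power) / k_adc              # (A.61)
    return a_peak / scaling, sigma / scaling              # (A.62), (A.63)


def agc_closed_form(snr_db, k, prf, k_adc):
    """Closed forms in terms of the SNR only (cf. A.68 and A.70)."""
    g = 10.0 ** (snr_db / 10.0)
    a_prime = k_adc / math.sqrt(k * prf * (1.0 + 1.0 / g))
    sigma_prime = k_adc / math.sqrt(1.0 + g)
    return a_prime, sigma_prime


if __name__ == "__main__":
    # Hypothetical operating point, chosen only to exercise the formulas.
    a_peak, sigma = 10e-3, 2e-3          # peak amplitude and noise std (V)
    k, prf, k_adc = 1.7e-9, 100e6, 0.25  # pulse constant (s), PRF (Hz), ADC constant
    snr_db = 10.0 * math.log10(a_peak ** 2 * k * prf / sigma ** 2)  # (A.64)
    print("direct      :", agc_scale(a_peak, sigma, k, prf, k_adc))
    print("closed form :", agc_closed_form(snr_db, k, prf, k_adc))
```

One consequence worth noting is that, after scaling, the total power $A_{prime}^2 K \cdot PRF + \sigma_{prime}^2$ always equals $K_{ADC}^2$, which is what gives $K_{ADC}$ its meaning as a back-off constant relative to the ADC range.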
A.7 Gain Specifications

The following equations refer to the total minimum and maximum gains required of the analog front end.

* Minimum Gain in the Margin

$$G_{min} = P_{sadc} - P_{rmax} \quad (\mathrm{dB}) \tag{A.71}$$

* Maximum Gain in the Margin

$$G_{max} = P_{sadc} - P_{rmin} \quad (\mathrm{dB}) \tag{A.72}$$

* Constant Part of the Gain

$$G_{floor} = G_{min} \quad (\mathrm{dB}) \tag{A.73}$$

* Variable Part of the Gain

$$G_{var} = G_{max} - G_{min} \quad (\mathrm{dB}) \tag{A.74}$$

A.8 Noise Figure Specification

This formula refers to the maximum noise figure allowed for the analog front end.

$$F = P_r - SNR - M - 10 \log BW + 174 \tag{A.75}$$

In this formula $P_r$ must be expressed in dBm.

Appendix B

Comments on Signal Generation

In this appendix, the procedure used to include the non-idealities of both the transmitter and the receiver in simulations is presented.

B.1 Transmitted Signal

Two different signals are considered. The difference between them is the timing origin of the pulses with respect to the phase of the carrier used to up-convert them. The theoretical transmitted signal is defined as:

$$s(t) = \sum_{k=0}^{\infty} b_k\, p(t - kT_s) \tag{B.1}$$

where $b_k$ is the sequence of transmitted bits, $p(t)$ is the transmitted pulse, and $T_s$ is the time interval between consecutive pulses. There are two different cases. In the first case,

$$p(t) = g(t) \tag{B.2}$$

$$s_{tx}(t) = s(t) \cdot \cos(\omega_c t + \phi) \tag{B.3}$$

In the second case,

$$p(t) = g(t) \cdot \cos(\omega_c t + \phi) \tag{B.4}$$

$$s_{tx}(t) = s(t) \tag{B.5}$$

In these expressions, $s_{tx}(t)$ represents the signal that is transmitted. In terms of the baseband pulse $g(t)$, the two cases are:

$$s_1(t) = \sum_{k=0}^{\infty} b_k\, g(t - kT_s) \cos(\omega_c t + \phi) \tag{B.6}$$

$$s_2(t) = \sum_{k=0}^{\infty} b_k\, g(t - kT_s) \cos\left(\omega_c (t - kT_s) + \phi\right) = \sum_{k=0}^{\infty} b_k\, g(t - kT_s) \cos(\omega_c t + \phi - \omega_c k T_s) \tag{B.7}$$

B.2 Jitter in the Transmitter

Jitter affects the instants at which pulses are generated ($\delta t_k$) and the carrier phase ($\delta\phi_1(t)$). The effect is different depending on the kind of transmitted signal. Taking both into account:

$$s_1(t) = \sum_{k=0}^{\infty} b_k\, g(t - kT_s - \delta t_k) \cos\left(\omega_c t + \phi + \delta\phi_1(t)\right) \tag{B.8}$$

$$s_2(t) = \sum_{k=0}^{\infty} b_k\, g(t - kT_s - \delta t_k) \cos\left(\omega_c t + \phi - \omega_c k T_s - \omega_c \delta t_k\right) \tag{B.9}$$

B.3 Channel Impulse Response

The description of the channel is given as a sequence of amplitudes and delays.
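From this description, the equivalent low-pass representation of the channel is obtained. Let the impulse response of the channel be:

$$h(t) = \sum_{i=0}^{N-1} a_i\, \delta(t - t_i) \tag{B.10}$$

where $a_i$ are real numbers, $t_i$ are delays and $N$ is the number of multipath components in the model. The input signal to the channel can be summarized as:

$$\begin{aligned} s(t) &= x(t) \cos\left((\omega_c + \omega_0) t + \phi(t)\right) \\ &= x(t) \cos(\omega_0 t) \cos(\omega_c t + \phi(t)) - x(t) \sin(\omega_0 t) \sin(\omega_c t + \phi(t)) \\ &= s_I(t) \cos \omega_0 t + s_Q(t) \sin \omega_0 t \end{aligned} \tag{B.11}$$

where the sub-index I denotes the in-phase component and the sub-index Q denotes the quadrature component. The output of the channel is therefore:

$$\begin{aligned} r(t) = s(t) * h(t) &= \sum_{i=0}^{N-1} a_i\, s(t - t_i) \\ &= \sum_{i=0}^{N-1} a_i\, s_I(t - t_i) \cos \omega_0 (t - t_i) + \sum_{i=0}^{N-1} a_i\, s_Q(t - t_i) \sin \omega_0 (t - t_i) \\ &= \sum_{i=0}^{N-1} a_i\, s_I(t - t_i) \left(\cos \omega_0 t \cos \omega_0 t_i + \sin \omega_0 t \sin \omega_0 t_i\right) \\ &\quad + \sum_{i=0}^{N-1} a_i\, s_Q(t - t_i) \left(\sin \omega_0 t \cos \omega_0 t_i - \cos \omega_0 t \sin \omega_0 t_i\right) \end{aligned} \tag{B.12}$$

From this it is possible to write:

$$\begin{aligned} r(t) &= \left(\sum_{i=0}^{N-1} a_i\, s_I(t - t_i) \cos \omega_0 t_i - \sum_{i=0}^{N-1} a_i\, s_Q(t - t_i) \sin \omega_0 t_i\right) \cos \omega_0 t \\ &\quad + \left(\sum_{i=0}^{N-1} a_i\, s_I(t - t_i) \sin \omega_0 t_i + \sum_{i=0}^{N-1} a_i\, s_Q(t - t_i) \cos \omega_0 t_i\right) \sin \omega_0 t \end{aligned} \tag{B.13}$$

Let

$$s_l(t) = s_I(t) + j s_Q(t) \tag{B.14}$$

$$h_l(t) = \sum_{i=0}^{N-1} a_i\, e^{j \omega_0 t_i}\, \delta(t - t_i) \tag{B.15}$$

$$e^{j \omega_0 t_i} = \cos \omega_0 t_i + j \sin \omega_0 t_i \tag{B.16}$$

Taking into account the previous equations,

$$\begin{aligned} s_l(t) * h_l(t) &= \sum_{i=0}^{N-1} a_i\, e^{j \omega_0 t_i} \left(s_I(t - t_i) + j s_Q(t - t_i)\right) \\ &= \sum_{i=0}^{N-1} a_i \left(s_I(t - t_i) \cos \omega_0 t_i - s_Q(t - t_i) \sin \omega_0 t_i\right) \\ &\quad + j \sum_{i=0}^{N-1} a_i \left(s_I(t - t_i) \sin \omega_0 t_i + s_Q(t - t_i) \cos \omega_0 t_i\right) \end{aligned} \tag{B.17}$$

Therefore the signal at the output of the channel is

$$r(t) = \mathrm{Re}\left\{\left(s_l(t) * h_l(t)\right) e^{-j \omega_0 t}\right\} \tag{B.18}$$

B.4 Signal at the Input of the Receiver

In a realistic receiver there is a difference in frequency between the transmitter (tx) and receiver (rx) carriers:

$$\omega_{rx} = \omega_0 \tag{B.19}$$

$$\omega_{tx} = \omega_0 + \Delta\omega_0 \tag{B.20}$$

but in general $\Delta\omega_0 \ll \omega_0$. There is one jitter that affects the center position of the pulses ($\delta t_k$) and another one that affects the phase of the carrier ($\delta\phi_1(t)$).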
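For the first kind of signal, taking into account these jitters:

$$\begin{aligned} s_1(t) &= \left(\sum_{k=0}^{\infty} b_k\, g(t - kT_s - \delta t_k)\right) \cos\left(\omega_{tx} t + \phi + \delta\phi_1(t)\right) \\ &= \left(\sum_{k=0}^{\infty} b_k\, g(t - kT_s - \delta t_k)\right) \cos\left(\omega_0 t + \Delta\omega_0 t + \phi + \delta\phi_1(t)\right) \\ &= \left(\sum_{k=0}^{\infty} b_k\, g(t - kT_s - \delta t_k)\right) \cos\left(\Delta\omega_0 t + \phi + \delta\phi_1(t)\right) \cos \omega_0 t \\ &\quad - \left(\sum_{k=0}^{\infty} b_k\, g(t - kT_s - \delta t_k)\right) \sin\left(\Delta\omega_0 t + \phi + \delta\phi_1(t)\right) \sin \omega_0 t \end{aligned} \tag{B.21}$$

From where

$$s_{1I}(t) = \left(\sum_{k=0}^{\infty} b_k\, g(t - kT_s - \delta t_k)\right) \cos\left(\Delta\omega_0 t + \phi + \delta\phi_1(t)\right) \tag{B.22}$$

$$s_{1Q}(t) = \left(\sum_{k=0}^{\infty} b_k\, g(t - kT_s - \delta t_k)\right) \sin\left(\Delta\omega_0 t + \phi + \delta\phi_1(t)\right) \tag{B.23}$$

$$s_{1l}(t) = \left(\sum_{k=0}^{\infty} b_k\, g(t - kT_s - \delta t_k)\right) e^{j \Delta\omega_0 t}\, e^{j\phi}\, e^{j \delta\phi_1(t)} \tag{B.24}$$

Following a similar procedure for the second kind of signal:

$$\begin{aligned} s_2(t) &= \sum_{k=0}^{\infty} b_k\, g(t - kT_s - \delta t_k) \cos\left(\omega_{tx} (t - kT_s - \delta t_k) + \phi\right) \\ &= \sum_{k=0}^{\infty} b_k\, g(t - kT_s - \delta t_k) \cos\left((\omega_0 + \Delta\omega_0)(t - kT_s - \delta t_k) + \phi\right) \end{aligned} \tag{B.25}$$

Defining

$$\phi_{k,tx}(t) = -\omega_0 k T_s - \omega_0 \delta t_k + \Delta\omega_0 t - \Delta\omega_0 k T_s - \Delta\omega_0 \delta t_k + \phi \tag{B.26}$$

Therefore

$$s_{2l}(t) = \sum_{k=0}^{\infty} b_k\, g(t - kT_s - \delta t_k)\, e^{j \phi_{k,tx}(t)} \tag{B.27}$$

From these equations

$$s_1(t) = \mathrm{Re}\left\{s_{1l}(t)\, e^{j \omega_0 t}\right\} \tag{B.28}$$

$$s_2(t) = \mathrm{Re}\left\{s_{2l}(t)\, e^{j \omega_0 t}\right\} \tag{B.29}$$

The transmitted signal is obtained from the equivalent low-pass model as

$$s(t) = \mathrm{Re}\left\{s_l(t)\, e^{j \omega_0 t}\right\} = s_I(t) \cos \omega_0 t - s_Q(t) \sin \omega_0 t \tag{B.30}$$

Assuming the channel is

$$h(t) = \sum_{i=0}^{N-1} a_i\, \delta(t - t_i) \tag{B.31}$$

with $a_i$ complex.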
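The procedure for using the channel with the low-pass model of the signal is:

$$\begin{aligned} r(t) = s(t) * h(t) &= \left(s_I(t) \cos \omega_0 t - s_Q(t) \sin \omega_0 t\right) * \sum_{i=0}^{N-1} a_i\, \delta(t - t_i) \\ &= \sum_{i=0}^{N-1} a_i \left(s_I(t - t_i) \cos(\omega_0 t - \omega_0 t_i) - s_Q(t - t_i) \sin(\omega_0 t - \omega_0 t_i)\right) \\ &= \sum_{i=0}^{N-1} a_i \left(s_I(t - t_i) \cos \omega_0 t \cos \omega_0 t_i + s_I(t - t_i) \sin \omega_0 t \sin \omega_0 t_i\right) \\ &\quad - \sum_{i=0}^{N-1} a_i \left(s_Q(t - t_i) \sin \omega_0 t \cos \omega_0 t_i - s_Q(t - t_i) \cos \omega_0 t \sin \omega_0 t_i\right) \\ &= \sum_{i=0}^{N-1} a_i \left(s_I(t - t_i) \cos \omega_0 t_i + s_Q(t - t_i) \sin \omega_0 t_i\right) \cos \omega_0 t \\ &\quad - \sum_{i=0}^{N-1} a_i \left(s_Q(t - t_i) \cos \omega_0 t_i - s_I(t - t_i) \sin \omega_0 t_i\right) \sin \omega_0 t \\ &= \mathrm{Re}\left\{\sum_{i=0}^{N-1} a_i \left(s_I(t - t_i) + j s_Q(t - t_i)\right) \left(\cos \omega_0 t_i - j \sin \omega_0 t_i\right) e^{j \omega_0 t}\right\} \\ &= \mathrm{Re}\left\{r_l(t)\, e^{j \omega_0 t}\right\} \end{aligned} \tag{B.32}$$

with

$$r_l(t) = h_l(t) * s_l(t) \tag{B.33}$$

$$h_l(t) = \sum_{i=0}^{N-1} a_i\, e^{-j \omega_0 t_i}\, \delta(t - t_i) \tag{B.34}$$

B.5 ELP of a Non-linearity

Let the input to the non-linearity be the output of the channel, that is

$$r(t) = \mathrm{Re}\left\{r_l(t)\, e^{j \omega_0 t}\right\} = r_I(t) \cos \omega_0 t - r_Q(t) \sin \omega_0 t \tag{B.35}$$

A fifth-order non-linearity is assumed:

$$z(t) = r(t) + a_2 r^2(t) + a_3 r^3(t) + a_4 r^4(t) + a_5 r^5(t) \tag{B.36}$$

The objective of this section is to obtain the coefficients $b_0$, $b_{c,i}$ and $b_{s,i}$ in the following expression:

$$\begin{aligned} z(t) = b_0 &+ b_{c,1} \cos \omega_0 t - b_{s,1} \sin \omega_0 t + b_{c,2} \cos 2\omega_0 t - b_{s,2} \sin 2\omega_0 t \\ &+ b_{c,3} \cos 3\omega_0 t - b_{s,3} \sin 3\omega_0 t + b_{c,4} \cos 4\omega_0 t - b_{s,4} \sin 4\omega_0 t \\ &+ b_{c,5} \cos 5\omega_0 t - b_{s,5} \sin 5\omega_0 t \end{aligned} \tag{B.37}$$

The square term in (B.36) is expanded as:

$$r^2(t) = \frac{r_I^2(t) + r_Q^2(t)}{2} + \frac{r_I^2(t) - r_Q^2(t)}{2} \cos 2\omega_0 t - r_I(t) r_Q(t) \sin 2\omega_0 t \tag{B.38}$$

For the $r^3(t)$ term in (B.36):

$$r^3(t) = r_I^3(t) \cos^3 \omega_0 t - 3 r_I^2(t) r_Q(t) \cos^2 \omega_0 t \sin \omega_0 t + 3 r_I(t) r_Q^2(t) \cos \omega_0 t \sin^2 \omega_0 t - r_Q^3(t) \sin^3 \omega_0 t \tag{B.39}$$

where:

$$\cos^3 \omega_0 t = \frac{3}{4} \cos \omega_0 t + \frac{1}{4} \cos 3\omega_0 t \tag{B.40}$$

$$\cos^2 \omega_0 t \sin \omega_0 t = \frac{1}{4} \sin \omega_0 t + \frac{1}{4} \sin 3\omega_0 t \tag{B.41}$$

$$\cos \omega_0 t \sin^2 \omega_0 t = \frac{1}{4} \cos \omega_0 t - \frac{1}{4} \cos 3\omega_0 t \tag{B.42}$$

$$\sin^3 \omega_0 t = \frac{3}{4} \sin \omega_0 t - \frac{1}{4} \sin 3\omega_0 t \tag{B.43}$$

Therefore,

$$\begin{aligned} r^3(t) &= \frac{3}{4} \left(r_I^2(t) + r_Q^2(t)\right) \left(r_I(t) \cos \omega_0 t - r_Q(t) \sin \omega_0 t\right) \\ &\quad + \frac{1}{4} \left(r_I^2(t) - 3 r_Q^2(t)\right) r_I(t) \cos 3\omega_0 t + \frac{1}{4} \left(r_Q^2(t) - 3 r_I^2(t)\right) r_Q(t) \sin 3\omega_0 t \end{aligned} \tag{B.44}$$

For the $r^4(t)$ term in (B.36):

$$\begin{aligned} r^4(t) &= r_I^4(t) \cos^4 \omega_0 t - 4 r_I^3(t) r_Q(t) \cos^3 \omega_0 t \sin \omega_0 t + 6 r_I^2(t) r_Q^2(t) \cos^2 \omega_0 t \sin^2 \omega_0 t \\ &\quad - 4 r_I(t) r_Q^3(t) \cos \omega_0 t \sin^3 \omega_0 t + r_Q^4(t) \sin^4 \omega_0 t \end{aligned} \tag{B.45}$$

The required expressions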
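for this are:

$$\cos^4 \omega_0 t = \frac{3}{8} + \frac{1}{2} \cos 2\omega_0 t + \frac{1}{8} \cos 4\omega_0 t \tag{B.46}$$

$$\cos^3 \omega_0 t \sin \omega_0 t = \frac{1}{4} \sin 2\omega_0 t + \frac{1}{8} \sin 4\omega_0 t \tag{B.47}$$

$$\cos^2 \omega_0 t \sin^2 \omega_0 t = \frac{1}{8} - \frac{1}{8} \cos 4\omega_0 t \tag{B.48}$$

$$\cos \omega_0 t \sin^3 \omega_0 t = \frac{1}{4} \sin 2\omega_0 t - \frac{1}{8} \sin 4\omega_0 t \tag{B.49}$$

$$\sin^4 \omega_0 t = \frac{3}{8} - \frac{1}{2} \cos 2\omega_0 t + \frac{1}{8} \cos 4\omega_0 t \tag{B.50}$$

Therefore:

$$\begin{aligned} r^4(t) &= \frac{3}{8} \left(r_I^4(t) + 2 r_I^2(t) r_Q^2(t) + r_Q^4(t)\right) + \frac{1}{2} \left(r_I^4(t) - r_Q^4(t)\right) \cos 2\omega_0 t \\ &\quad - r_I(t) r_Q(t) \left(r_I^2(t) + r_Q^2(t)\right) \sin 2\omega_0 t + \frac{1}{8} \left(r_I^4(t) - 6 r_I^2(t) r_Q^2(t) + r_Q^4(t)\right) \cos 4\omega_0 t \\ &\quad - \frac{1}{2} r_I(t) r_Q(t) \left(r_I^2(t) - r_Q^2(t)\right) \sin 4\omega_0 t \end{aligned} \tag{B.51}$$

For the $r^5(t)$ term in (B.36):

$$\begin{aligned} r^5(t) &= r_I^5(t) \cos^5 \omega_0 t - 5 r_I^4(t) r_Q(t) \cos^4 \omega_0 t \sin \omega_0 t + 10 r_I^3(t) r_Q^2(t) \cos^3 \omega_0 t \sin^2 \omega_0 t \\ &\quad - 10 r_I^2(t) r_Q^3(t) \cos^2 \omega_0 t \sin^3 \omega_0 t + 5 r_I(t) r_Q^4(t) \cos \omega_0 t \sin^4 \omega_0 t - r_Q^5(t) \sin^5 \omega_0 t \end{aligned} \tag{B.52}$$

The required expressions for this are:

$$\cos^5 \omega_0 t = \frac{5}{8} \cos \omega_0 t + \frac{5}{16} \cos 3\omega_0 t + \frac{1}{16} \cos 5\omega_0 t \tag{B.53}$$

$$\cos^4 \omega_0 t \sin \omega_0 t = \frac{1}{8} \sin \omega_0 t + \frac{3}{16} \sin 3\omega_0 t + \frac{1}{16} \sin 5\omega_0 t \tag{B.54}$$

$$\cos^3 \omega_0 t \sin^2 \omega_0 t = \frac{1}{8} \cos \omega_0 t - \frac{1}{16} \cos 3\omega_0 t - \frac{1}{16} \cos 5\omega_0 t \tag{B.55}$$

$$\cos^2 \omega_0 t \sin^3 \omega_0 t = \frac{1}{8} \sin \omega_0 t + \frac{1}{16} \sin 3\omega_0 t - \frac{1}{16} \sin 5\omega_0 t \tag{B.56}$$

$$\cos \omega_0 t \sin^4 \omega_0 t = \frac{1}{8} \cos \omega_0 t - \frac{3}{16} \cos 3\omega_0 t + \frac{1}{16} \cos 5\omega_0 t \tag{B.57}$$

$$\sin^5 \omega_0 t = \frac{5}{8} \sin \omega_0 t - \frac{5}{16} \sin 3\omega_0 t + \frac{1}{16} \sin 5\omega_0 t \tag{B.58}$$

From where:

$$\begin{aligned} r^5(t) &= \frac{5}{8} \left(r_I^2(t) + r_Q^2(t)\right)^2 \left(r_I(t) \cos \omega_0 t - r_Q(t) \sin \omega_0 t\right) \\ &\quad + \frac{5}{16} \left(r_I^4(t) - 2 r_I^2(t) r_Q^2(t) - 3 r_Q^4(t)\right) r_I(t) \cos 3\omega_0 t \\ &\quad - \frac{5}{16} \left(3 r_I^4(t) + 2 r_I^2(t) r_Q^2(t) - r_Q^4(t)\right) r_Q(t) \sin 3\omega_0 t \\ &\quad + \frac{1}{16} \left(r_I^4(t) - 10 r_I^2(t) r_Q^2(t) + 5 r_Q^4(t)\right) r_I(t) \cos 5\omega_0 t \\ &\quad - \frac{1}{16} \left(5 r_I^4(t) - 10 r_I^2(t) r_Q^2(t) + r_Q^4(t)\right) r_Q(t) \sin 5\omega_0 t \end{aligned} \tag{B.59}$$

After this, the coefficients $b_0$, $b_{c,i}$ and $b_{s,i}$ are obtained.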
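The power expansions above are easy to get wrong by a sign or a factor, so a numerical check is useful. The sketch below is an illustration rather than part of the original simulation flow: for fixed values of $r_I$ and $r_Q$ it evaluates the fifth-order non-linearity of (B.36) over one carrier period, projects it onto each harmonic, and compares a few of the results with the closed-form coefficients that follow from (B.38), (B.44), (B.51) and (B.59). The function names and the test values are arbitrary.

```python
import math


def harmonic_coeffs(r_i, r_q, a, n_harm=5, n=64):
    """Project z(theta) onto cos/sin harmonics, with theta = w0*t over one period.

    a = (a2, a3, a4, a5). Returns (b0, b_c, b_s) following the convention
    z = b0 + sum_h [ b_c[h] cos(h theta) - b_s[h] sin(h theta) ].
    """
    a2, a3, a4, a5 = a
    b_c = [0.0] * (n_harm + 1)
    b_s = [0.0] * (n_harm + 1)
    for m in range(n):
        th = 2.0 * math.pi * m / n
        r = r_i * math.cos(th) - r_q * math.sin(th)
        z = r + a2 * r**2 + a3 * r**3 + a4 * r**4 + a5 * r**5
        for h in range(n_harm + 1):
            b_c[h] += z * math.cos(h * th)
            b_s[h] -= z * math.sin(h * th)
    b0 = b_c[0] / n
    b_c = [2.0 * v / n for v in b_c]
    b_s = [2.0 * v / n for v in b_s]
    return b0, b_c, b_s


def closed_form(r_i, r_q, a):
    """A subset of the closed-form coefficients derived from the expansions."""
    a2, a3, a4, a5 = a
    p = r_i**2 + r_q**2
    gain1 = 1.0 + 0.75 * a3 * p + 0.625 * a5 * p**2
    return {
        "b0": 0.5 * a2 * p + 0.375 * a4 * p**2,
        "bc1": gain1 * r_i,
        "bs1": gain1 * r_q,
        "bc3": (0.25 * a3 * (r_i**2 - 3 * r_q**2)
                + 5.0 / 16.0 * a5 * (r_i**4 - 2 * r_i**2 * r_q**2 - 3 * r_q**4)) * r_i,
        "bs4": 0.5 * a4 * r_i * r_q * (r_i**2 - r_q**2),
        "bs5": a5 / 16.0 * (5 * r_i**4 - 10 * r_i**2 * r_q**2 + r_q**4) * r_q,
    }


if __name__ == "__main__":
    coeffs = (0.1, -0.05, 0.02, 0.01)
    b0, b_c, b_s = harmonic_coeffs(0.7, -0.4, coeffs)
    ref = closed_form(0.7, -0.4, coeffs)
    print("b0  ", b0, ref["b0"])
    print("bc1 ", b_c[1], ref["bc1"])
    print("bs4 ", b_s[4], ref["bs4"])
```

Because the highest harmonic in the signal is the fifth, 64 uniformly spaced samples per period are enough for the discrete projection to be exact up to rounding error.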
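For the DC term:

$$b_0 = \frac{a_2}{2} \left(r_I^2(t) + r_Q^2(t)\right) + \frac{3 a_4}{8} \left(r_I^2(t) + r_Q^2(t)\right)^2 \tag{B.60}$$

For the fundamental term:

$$b_{c,1} = \left(1 + \frac{3 a_3}{4} \left(r_I^2(t) + r_Q^2(t)\right) + \frac{5 a_5}{8} \left(r_I^2(t) + r_Q^2(t)\right)^2\right) r_I(t) \tag{B.61}$$

$$b_{s,1} = \left(1 + \frac{3 a_3}{4} \left(r_I^2(t) + r_Q^2(t)\right) + \frac{5 a_5}{8} \left(r_I^2(t) + r_Q^2(t)\right)^2\right) r_Q(t) \tag{B.62}$$

For the second harmonic:

$$b_{c,2} = \frac{a_2}{2} \left(r_I^2(t) - r_Q^2(t)\right) + \frac{a_4}{2} \left(r_I^4(t) - r_Q^4(t)\right) \tag{B.63}$$

$$b_{s,2} = a_2\, r_I(t) r_Q(t) + a_4\, r_I(t) r_Q(t) \left(r_I^2(t) + r_Q^2(t)\right) \tag{B.64}$$

For the third harmonic:

$$b_{c,3} = \left(\frac{a_3}{4} \left(r_I^2(t) - 3 r_Q^2(t)\right) + \frac{5 a_5}{16} \left(r_I^4(t) - 2 r_I^2(t) r_Q^2(t) - 3 r_Q^4(t)\right)\right) r_I(t) \tag{B.65}$$

$$b_{s,3} = \left(\frac{a_3}{4} \left(3 r_I^2(t) - r_Q^2(t)\right) + \frac{5 a_5}{16} \left(3 r_I^4(t) + 2 r_I^2(t) r_Q^2(t) - r_Q^4(t)\right)\right) r_Q(t) \tag{B.66}$$

For the fourth harmonic:

$$b_{c,4} = \frac{a_4}{8} \left(r_I^4(t) - 6 r_I^2(t) r_Q^2(t) + r_Q^4(t)\right) \tag{B.67}$$

$$b_{s,4} = \frac{a_4}{2}\, r_I(t) r_Q(t) \left(r_I^2(t) - r_Q^2(t)\right) \tag{B.68}$$

For the fifth harmonic:

$$b_{c,5} = \frac{a_5}{16} \left(r_I^4(t) - 10 r_I^2(t) r_Q^2(t) + 5 r_Q^4(t)\right) r_I(t) \tag{B.69}$$

$$b_{s,5} = \frac{a_5}{16} \left(5 r_I^4(t) - 10 r_I^2(t) r_Q^2(t) + r_Q^4(t)\right) r_Q(t) \tag{B.70}$$

B.6 Modelling I-Q unbalance

In order to obtain the in-phase and quadrature components, the input signal is multiplied by:

$$\text{In-phase} = \cos\left(\omega_0 t + \delta\phi_2(t)\right) \tag{B.71}$$

$$\text{Quadrature} = -(1 + \Delta) \sin\left(\omega_0 t + \Delta\phi + \delta\phi_2(t)\right) \tag{B.72}$$

where $\Delta$ represents the amplitude unbalance, $\Delta\phi$ represents the phase unbalance and $\delta\phi_2(t)$ the jitter in the receiver oscillator. A low-pass filter (l.p.f.) after the down-conversion is assumed to remove the higher-frequency spurs. If the input to the down-converter is

$$y(t) = \mathrm{Re}\left\{y_l(t)\, e^{j \omega_0 t}\right\} = y_I(t) \cos \omega_0 t - y_Q(t) \sin \omega_0 t$$

then the in-phase component is obtained as:

$$I(t) = 2 \cdot \mathrm{l.p.f.}\left\{y(t) \cdot \cos\left(\omega_0 t + \delta\phi_2(t)\right)\right\} = y_I(t) \cos \delta\phi_2(t) + y_Q(t) \sin \delta\phi_2(t) \tag{B.73}$$

and the quadrature component is obtained as:

$$\begin{aligned} Q(t) &= -2 \cdot \mathrm{l.p.f.}\left\{y(t) \cdot (1 + \Delta) \sin\left(\omega_0 t + \Delta\phi + \delta\phi_2(t)\right)\right\} \\ &= y_Q(t) (1 + \Delta) \cos\left(\Delta\phi + \delta\phi_2(t)\right) - y_I(t) (1 + \Delta) \sin\left(\Delta\phi + \delta\phi_2(t)\right) \end{aligned} \tag{B.74}$$

Bibliography

[1] A.A.M. Saleh and R.A. Valenzuela, "A Statistical Model for Indoor Multipath Propagation," IEEE Journal on Selected Areas in Communications, vol. SAC-5, no. 2, pp. 128-137, Feb. 1987.
[2] Federal Communications Commission, Ultra-Wideband (UWB) First Report and Order, Federal Communication Commission, Feb. 2002.
[3] J.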
Foerster, "Channel Modeling Sub-Committee Report Final," Tech. Rep., IEEE P802.15 Working Group for Wireless Personal Area Networks (WPANs), Feb. 2002.
[4] C. Luo, M. Medard, L. Zheng, "On Approaching Wideband Capacity Using Multitone FSK," IEEE Journal on Selected Areas in Communications, vol. 23, no. 9, pp. 1830-1838, September 2005.
[5] T.W. Barrett, "History of UltraWideBand (UWB) Radar & Communications: Pioneers and Innovators," in Proceedings of Progress In Electromagnetics Symposium, Cambridge MA, 2000.
[6] K. Siwiak, P. Withington, S. Phelan, "Ultra-wide band radio: the emergence of an important new technology," in Vehicular Technology Conference 2001, 2001, vol. 2, pp. 1169-1172.
[7] S. Roy, J.R. Foerster, V.S. Somayazulu, D.G. Leeper, "Ultrawideband Radio Design: The Promise of High-Speed, Short-Range Wireless Connectivity," Proceedings of the IEEE, vol. 92, no. 2, pp. 295-311, February 2004.
[8] I.I. Immoreev and A.N. Sinyavin, "Features of Ultra-wideband signals radiation," in Proceedings of the IEEE Conference on Ultra Wideband Systems and Technologies, 2002, pp. 345-349.
[9] M.Z. Win and R.A. Scholtz, "Impulse Radio: How it Works," IEEE Communication Letters, vol. 2, no. 2, pp. 36-38, February 1998.
[10] C.L. Bennett, G.F. Ross, "Time-domain electromagnetics and its application," Proceedings of the IEEE, vol. 66, pp. 299-318, 1978.
[11] R.N. Morey, "Geophysical survey system employing electromagnetic impulses," U.S. Patent 3,806,795, Apr. 1974.
[12] H.F. Harmuth, Sequency Theory, Academic Press, 1977.
[13] R.A. Scholtz, "Multiple Access with Time-Hopping Impulse Modulation," in Proceedings of the MILCOM conference, 1993, pp. 447-450.
[14] M.Z. Win and R.A. Scholtz, "Ultra-Wide Bandwidth Time-hopping, Spread-Spectrum Impulse Radio for Wireless Multiple Access Communications," IEEE Transactions on Communications, vol. 48, no. 4, pp. 679-691, Apr. 2000.
[15] M.Z. Win, R.A. Scholtz, L.W.
Fullerton, "Time-hopping SSMA techniques for impulse radio with an analog modulated data subcarrier," in Proceedings of the IEEE Fourth International Symposium on Spread Spectrum Techniques and Applications, Mainz, Germany, September 1996, pp. 359-394.
[16] M.Z. Win and R.A. Scholtz, "On the Robustness of Ultra-Wide Bandwidth Signals in Dense Multipath Environments," IEEE Communication Letters, vol. 2, no. 2, pp. 51-53, Feb. 1998.
[17] M.Z. Win and R.A. Scholtz, "On the Energy Capture of Ultrawide Bandwidth Signals in Dense Multipath Environments," IEEE Communications Letters, vol. 2, no. 9, pp. 245-247, Sept. 1998.
[18] R.J. Cramer, R.A. Scholtz, M.Z. Win, "An Evaluation of the Ultra-wideband Propagation Channel," IEEE Transactions on Antennas Propagation, vol. 50, no. 5, pp. 516-570, May 2002.
[19] D. Cassioli, M.Z. Win, A.F. Molisch, "The Ultra-wide Bandwidth Indoor Channel: from Statistical Model to Simulations," IEEE Journal on Selected Areas of Communication, vol. 20, no. 6, pp. 1247-1257, August 2002.
[20] M.Z. Win and R.A. Scholtz, "Characterization of Ultra-wide Bandwidth Wireless Indoor Channel: A Communication Theoretic View," IEEE Journal on Selected Areas of Communication, vol. 20, no. 9, pp. 1613-1627, December 2002.
[21] C.J. Le Martret and G.B. Giannakis, "All Digital PAM Impulse Radio for Multiple Access through Frequency Selective Multipath," in Proc. of the IEEE 2000 Global Telecommunications Conference, 2000, pp. 77-81.
[22] L. Yang and G.B. Giannakis, "Ultra-wideband Communications: An Idea Whose Time Has Come," IEEE Signal Processing Magazine, vol. 21, no. 6, pp. 26-54, November 2004.
[23] C.J. Le Martret and G.B. Giannakis, "All Digital PPM Impulse Radio for Multiple Access through Frequency Selective Multipath," in Proceedings of the 2000 IEEE Sensor Array and Multichannel Signal Processing Workshop, 2000, pp. 22-26.
[24] C.N. Georghiades, "On PPM Sequences with Good Autocorrelation Properties," IEEE Transactions on Information Theory, vol.
34, no. 3, pp. 571-576, May 1988.
[25] M. Medard and R.G. Gallager, "Bandwidth Scaling for Fading Multipath Channels," IEEE Transactions on Information Theory, vol. 48, no. 4, pp. 840-852, April 2002.
[26] I.E. Telatar and D.N.C. Tse, "Capacity and Mutual Information of Wideband Multipath Fading Channels," IEEE Transactions on Information Theory, vol. 46, no. 4, pp. 1384-1400, July 2000.
[27] I.D. O'Donnell and R.W. Brodersen, "An Ultra-Wideband Transceiver Architecture for Low Power, Low Rate, Wireless Systems," IEEE Transactions on Vehicular Technology, vol. 54, no. 5, pp. 1623-1631, September 2005.
[28] T.Q.-S. Quek and M.Z. Win, "Performance Analysis of Ultrawide Bandwidth Transmitted-reference Communications," in Proceedings of the IEEE Semiannual Vehicular Technology Conference, Milan, Italy, May 2004, vol. 3, pp. 1285-1289.
[29] T.Q.-S. Quek and M.Z. Win, "Analysis of UWB Transmitted Reference Communication Systems in Dense Multipath Channels," IEEE Journal on Selected Areas of Communication, vol. 23, no. 9, pp. 1863-1874, September 2005.
[30] R.T. Hoctor and H.W. Tomlinson, "An Overview of Delay-Hopped, Transmitted-Reference RF Communications," Tech. Rep., GE Research & Development Center, January 2002.
[31] J.D. Choi and W.E. Stark, "Performance of Ultra-wideband Communications with Suboptimal Receivers in Multipath Channels," IEEE Journal on Selected Areas in Communications, vol. 20, no. 9, pp. 1754-1766, December 2002.
[32] W.M. Gifford and M.Z. Win, "On Transmitted-Reference UWB Communications," in Proceedings of the 38th Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, November 2004, pp. 1526-1531, Invited Paper.
[33] T.Q.S. Quek, M.Z. Win, D. Dardari, "UWB Transmitted-Reference Signalling Schemes - Part I: Performance Analysis," in Proceedings of the IEEE International Conference on Ultra-Wideband, Zurich, Switzerland, September 2005, pp. 587-592.
[34] T.Q.S. Quek, M.Z. Win, D.
Dardari, "UWB Transmitted-Reference Signalling Schemes - Part 2: Narrowband Interference Analysis," in Proceedings of the IEEE International Conference on Ultra-Wideband, Zurich, Switzerland, September 2005, pp. 593-598.
[35] Anuj Batra et al., "TI Physical Layer Proposal for IEEE 802.15 Task Group 3a," Tech. Rep., Texas Instruments, May 2003.
[36] J. Balakrishnan, A. Batra, A. Dabak, "A Multi-band OFDM System for UWB Communication," in Proceedings of the Conference on Ultra-Wideband Systems and Technologies, 2003, pp. 354-358.
[37] E. Saberinia and A. Tewfik, "Pulsed and Non-pulsed OFDM Ultra Wideband Wireless Personal Area Networks," in Proceedings of the 2003 IEEE Conference on Ultra Wideband Systems and Technologies, November 2003, pp. 275-270.
[38] R. Roberts, "XtremeSpectrum CFP Document," Tech. Rep., Physical Layer Submission to IEEE P802.15 Working Group for Wireless Personal Area Networks (WPANs), July 2003.
[39] A. Batra, J. Balakrishnan, A. Dabak, R. Gharpurey, P. Fontaine, J. Lin, "Time-Frequency Interleaved Orthogonal Frequency Division Multiplexing," Tech. Rep., Physical Layer Submission to IEEE P802.15 Working Group for Wireless Personal Area Networks (WPANs), May 2003.
[40] "IEEE 802.11a, supplement to Standard IEEE 802.11. Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) specifications: High-speed Physical Layer in the 5 GHz Band," Tech. Rep., IEEE, Sept. 1999.
[41] D.B. Jourdan, J.J. Deyst, M.Z. Win, N. Roy, "Monte-Carlo Localization in Dense Multipath Environments using UWB Ranging," in Proceedings of the IEEE International Conference on Ultra-Wideband, Zurich, Switzerland, September 2005, pp. 314-319.
[42] J. Rabaey, A. Chandrakasan, B. Nikolic, Digital Integrated Circuits, Prentice Hall, 2nd edition, 2002.
[43] P. Newaskar, R. Blazquez, A. Chandrakasan, "A/D Precision Requirements for an Ultra-Wideband Radio Receiver," in Proc. of the 2002 IEEE Workshop on SIPS, 2002, pp. 270-275.
[44] M.S. Braasch and A.J.
Van Dierendonck, "GPS Receiver Architectures and Measurements," Proceedings of the IEEE, vol. 87, no. 1, pp. 48-64, January 1999.
[45] H. Meyr and G. Ascheid, Synchronization in Digital Communications, Volume 1: Phase-Frequency-Locked Loops and Amplitude Control, Wiley Interscience, 1990.
[46] S. Iida, K. Tanaka, H. Suzuki, N. Yoshikawa, N. Shoji, B. Griffiths, D. Mellor, F. Hayden, I. Butler, J. Chatwin, "A 3.1 to 5 GHz CMOS DSSS UWB Transceiver for WPANs," in Proceedings of the 2005 IEEE International Solid-State Circuits Conference, 2005, pp. 214-215.
[47] J.R. Foerster, "The Effects of Multipath Interference on the Performance of UWB systems in an Indoor Wireless Channel," in Proc. of the 2001 IEEE Vehicular Technology Conference, May 2001, pp. 1176-1180.
[48] S.S. Ghassemzadeh, R. Jana, C.W. Rice, W. Turin, V. Tarokh, "A Statistical Path Loss Model for Inhome UWB Channels," in Proc. of the 2002 IEEE UWBST, May 2002, pp. 59-64.
[49] D. Cassioli, M.Z. Win, A.F. Molisch, "A Statistical Model for the UWB Indoor Channel," in Proc. of the 2001 IEEE Vehicular Technology Conference, May 2001, pp. 1159-1163.
[50] Vincenzo Lottici, Aldo D'Andrea, Umberto Mengali, "Channel Estimation for Ultra-Wideband Communications," IEEE Journal on Selected Areas in Communications, vol. 20, no. 9, pp. 1638-1645, December 2002.
[51] J.G. Proakis, Digital Communications, McGraw Hill Inc, fourth edition, 2000.
[52] C. Carbonelli, U. Mengali, U. Mitra, "Synchronization and Channel Estimation for UWB Signals," in Proceedings of the Global Telecommunications Conference, 2003, pp. 764-768.
[53] I. Maravic, J. Kusuma, M. Vetterli, "Low-Sampling Rate UWB Channel Characterization and Synchronization," Journal of Communication Networks, vol. 5, no. 4, pp. 319-327, 2002.
[54] Z. Wang and X. Yang, "Ultra wide-band communications with blind channel estimation based on first-order statistics," in Proceedings of the International Conference in Acoustics, Speech and Signal Processing, 2004.
[55] W.
Suwansantisuk and M.Z. Win, "Fundamental Limits on Spread Spectrum Signal Acquisition," in Proceedings of the Conference on Information Science and Systems, Baltimore, MD, March 2005.
[56] W. Suwansantisuk, M.Z. Win, L.A. Shepp, "Properties of the Mean Acquisition Time for Wide-Bandwidth Signals in Dense Multipath Channels," in Proceedings of the 3rd SPIE International Symposium on Fluctuation and Noise in Communication Systems, Austin, TX, May 2005, pp. 121-135.
[57] W. Suwansantisuk and M.Z. Win, "On the Asymptotic Performance of Multi-Dwell Signal Acquisition in Dense Multipath Channels," in Proceedings of the IEEE International Conference on Ultra-Wideband, Zurich, Switzerland, September 2005, Invited Paper.
[58] W. Suwansantisuk and M.Z. Win, "Multipath Aided Rapid Acquisition: Optimal Search Strategies," IEEE Transactions on Information Theory, vol. tbp, pp. tbp, 2006.
[59] G.R. Aiello and G.D. Rogerson, "Ultra-wideband wireless systems," IEEE Microwave Magazine, vol. 4, no. 2, pp. 36-47, June 2003.
[60] M.Z. Win, G. Chrisikos, N.R. Sollenberger, "Performance of Rake Reception in Dense Multipath Channels: Implications of Spreading Bandwidth and Selection Diversity Order," IEEE Journal on Selected Areas in Communications, vol. 18, no. 8, pp. 1516-1525, August 2000.
[61] M.Z. Win, G. Chrisikos, N.R. Sollenberger, "Effects of Chip Rate on Selective Rake Combining," IEEE Communications Letters, vol. 4, no. 7, pp. 233-235, July 2000.
[62] L. Yang and G.B. Giannakis, "A General Model and SINR Analysis of Low Duty-Cycle UWB Access Through Multipath With Narrowband Interference and Rake Reception," IEEE Transactions on Wireless Communications, vol. 4, no. 4, pp. 1818-1833, July 2004.
[63] D. Cassioli, M.Z. Win, F. Vatalaro, A.F. Molisch, "Performance of Low-Complexity RAKE Reception in a Realistic UWB Channel," in Proc. of the 2002 IEEE International Conference on Communications, 2002, pp. 763-767.
[64] G.D.
Forney, "Maximum Likelihood Sequence Estimation of Digital Sequences in the Presence of Intersymbol Interference," IEEE Transactions on Information Theory, vol. 18, pp. 363-378, May 1972.
[65] A. Hafeez and W.E. Stark, "Decision Feedback Sequence Estimation for Unwhitened ISI Channels with Applications to Multiuser Detection," IEEE Journal on Selected Areas in Communications, vol. 16, no. 9, pp. 1785-1795, December 1998.
[66] P.J. Black, T.H.Y. Meng, "A 1-Gb/s, Four-State, Sliding Block Viterbi Decoder," IEEE Journal of Solid State Circuits, vol. 32, pp. 797-805, June 1997.
[67] C.B. Shung, H.D. Lin, R. Cypher, P.H. Siegel, H.K. Thapar, "Area-efficient Architectures for the Viterbi Algorithm - Part I: Theory," IEEE Transactions on Communications, vol. 41, no. 4, pp. 636-644, April 1993.
[68] C.B. Shung, H.D. Lin, R. Cypher, P.H. Siegel, H.K. Thapar, "Area-Efficient Architectures for the Viterbi Algorithm - Part II: Applications," IEEE Transactions on Communications, vol. 41, no. 5, pp. 802-807, May 1993.
[69] M.A. Bickerstaff, et al., "A Unified Turbo/Viterbi Channel Decoder for 3GPP Mobile Wireless in 0.18-µm CMOS," IEEE Journal of Solid-State Circuits, vol. 37, no. 11, pp. 1555-1564, November 2002.
[70] X. Liu and M.C. Papaefthymiou, "Design of a 20-Mb/s 256-State Viterbi Decoder," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 11, no. 6, pp. 965-975, December 2003.
[71] E. Yeo, S.A. Augsburger, W.R. Davis, B. Nikolic, "A 500-Mb/s Soft-Output Viterbi Decoder," IEEE Journal of Solid-State Circuits, vol. 38, no. 7, pp. 1234-1241, July 2003.
[72] A.P. Chandrakasan, S. Sheng, R.W. Brodersen, "Low-power Digital CMOS Design," IEEE Journal of Solid State Circuits, vol. 27, no. 4, pp. 473-484, April 1992.
[73] R. Min, M. Bhardwaj, S.H. Cho, N. Ickes, E. Shih, A. Sinha, A. Wang, A. Chandrakasan, "Energy-Centric Enabling Technologies For Wireless Sensor Networks," IEEE Wireless Communications, vol. 9, no. 4, pp. 28-39, August 2002.
[74] J.
Bergervoet, K. Harish, G. van der Weide, D. Leenaerts, R. van de Beek, H. Waite, Y. Zhang, S. Aggarwal, C. Razzell, R. Roovers, "An Interference Robust Receive Chain for UWB Radio in SiGe BiCMOS," in Proceedings of the 2005 IEEE International Solid-State Circuits Conference, 2005, pp. 200-201.
[75] D. Leenaerts, R. van de Beek, G. van der Weide, J. Bergervoet, K.S. Harish, H. Waite, Y. Zhang, C. Razzell, R. Roovers, "A SiGe BiCMOS 1 ns Fast Hopping Frequency Synthesizer for UWB Radio," in Proceedings of the 2005 IEEE International Solid-State Circuits Conference, 2005, pp. 202-203.
[76] H.Y. Liu, C.C. Lin, Y.W. Lin, C.C. Chung, K.L. Lin, W.C. Chang, L.H. Chen, H.S. Chang, C.Y. Lee, "A 480Mb/s LDPC-COFDM-Based UWB Baseband Transceiver," in Proceedings of the 2005 IEEE International Solid-State Circuits Conference, 2005, pp. 444-445.
[77] M. Verhelst, W. Vereecken, M. Steyaert, W. Dehaene, "Architectures for Low Power Ultra-wideband Radio Receivers in the 3.1-5GHz Band for Data Rates < 10Mbps," in Proceedings of the ISLPED, Newport Beach, California, USA, 2004, pp. 280-285.
[78] B. Razavi, Principles of Data Conversion System Design, Wiley-IEEE Press, 1994.
[79] F.S. Lee, D. Wentzloff, A. Chandrakasan, "An Ultra-Wideband Baseband Front-End," in Digest of Papers of the 2004 Radio Frequency Integrated Circuits Symposium, June 2004, pp. 493-496.
[80] W. Suwansantisuk, M.Z. Win, L.A. Shepp, "On the Performance of Wide-Bandwidth Signal Acquisition in Dense Multipath Channels," IEEE Transactions on Vehicular Technology, vol. 54, no. 5, pp. 1584-1594, September 2005.
[81] W. Suwansantisuk and M.Z. Win, "Optimal Search Strategies for Ultrawide Bandwidth Signal Acquisition," in Proceedings of the IEEE International Conference on Ultra-Wideband, Zurich, Switzerland, September 2005, pp. 349-354.
[82] E.A. Homier and R.A.
Scholtz, "Rapid Acquisition of Ultra-wideband Signals in the Dense Multipath Channel," in Proceedings of the IEEE Conference on Ultra Wideband Systems and Technologies, 2002, pp. 105-109.
[83] L. Yang and G.B. Giannakis, "Blind UWB Timing with a Dirty Template," in Proceedings of the International Conference in Acoustics, Speech and Signal Processing, 2004, pp. 509-512.
[84] R. Gagliardi, J. Robbins, H. Taylor, "Acquisition Sequences in PPM Communications," IEEE Transactions on Information Theory, vol. IT-33, no. 5, pp. 738-744, September 1987.
[85] R. Blazquez, P. Newaskar, A. Chandrakasan, "Coarse Acquisition for Ultrawideband Digital Receivers," in Proc. of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, Apr. 2003, vol. 4, pp. 137-140.
[86] A.V. Oppenheim, R.W. Schafer, J.R. Buck, Discrete-Time Signal Processing, Prentice Hall Signal Processing Series. Prentice Hall, second edition, 1999.
[87] J. Dornberg, H.S. Lee, D.A. Hodges, "Full-speed testing of A/D converters," IEEE Journal of Solid-State Circuits, vol. 19, no. 12, pp. 22-26, December 1984.
[88] A. Giorgetti, M. Chiani, M.Z. Win, "The Effect of Narrowband Interference on Wideband Wireless Communication Systems," IEEE Transactions on Communications, vol. 53, no. 12, pp. 2139-2149, December 2005.
[89] D. Gerakoulis and P. Salmi, "An Interference Suppressing OFDM System for Ultra Wide Bandwidth Radio Channels," in Proceedings of the IEEE Conference on Ultra Wideband Systems and Technologies, 2002, pp. 259-264.
[90] B. Razavi, RF Microelectronics, Communications, Engineering and Emerging Technologies. Prentice Hall, first edition, 1998.
[91] J.A.C. Bingham, "Multicarrier Modulation for Data Transmission: An Idea Whose Time Has Come," IEEE Communications Magazine, vol. 28, no. 5, pp. 5-14, May 1990.
[92] R. Blazquez, F.S. Lee, D. Wentzloff, P. Newaskar, J. Powell, A. Chandrakasan, "Digital Architecture for an Ultra-wideband Radio Receiver," in Proc.
of the 2003 IEEE Vehicular Technology Conference, 2003, vol. 2, pp. 1303-1307.
[93] W.R. Braun and U. Dersch, "A Physical Mobile Radio Channel Model," IEEE Transactions on Vehicular Technology, vol. 40, no. 2, pp. 472-482, May 1991.
[94] M.Z. Win and J.H. Winters, "Analysis of Hybrid Selection/Maximal-Ratio Combining in Rayleigh Fading," IEEE Transactions on Communications, vol. 47, no. 12, pp. 1773-1776, December 1999.
[95] M.Z. Win and J.H. Winters, "Analysis of Hybrid Selection/Maximal-Ratio Combining in Rayleigh Fading," in Proceedings of the IEEE International Conference on Communications, Vancouver, Canada, June 1999, vol. 1, pp. 6-10.
[96] M.Z. Win and J.H. Winters, "Analysis of Hybrid Selection/Maximal-Ratio Combining of Diversity Branches with Unequal SNR in Rayleigh Fading," in Proceedings of the 49th Annual International Vehicular Technology Conference, Houston, TX, May 1999, vol. 1, pp. 215-220.
[97] A. Papoulis, Probability, Random Variables, and Stochastic Processes, Electrical & Electronic Engineering. McGraw-Hill International Editions, third edition, 1991.
[98] S. Verdu, Multiuser Detection, Cambridge University Press, 1998.
[99] Q. Li and L.A. Rusch, "Multiuser Detection for DS-CDMA UWB in the Home Environment," IEEE Journal on Selected Areas in Communications, vol. 20, no. 9, pp. 1701-1711, December 2002.
[100] N. Kong and L.B. Milstein, "Combined Average SNR of A Generalized Diversity Selection Combining Scheme," in Proceedings of the IEEE International Conference on Communications, June 1998, vol. 3, pp. 1556-1560.
[101] M.Z. Win and Z.A. Kostic, "Impact of Spreading Bandwidth on Rake Reception in Dense Multipath Channels," IEEE Journal on Selected Areas in Communications, vol. 17, no. 10, pp. 1794-1806, October 1999.
[102] M.Z. Win and Z.A. Kostic, "Virtual Path Analysis of Selective Rake Receiver in Dense Multipath Channels," IEEE Communications Letters, vol. 3, no. 11, pp. 308-310, November 1999.
[103] W.C.
Jakes, Microwave Mobile Communications, IEEE Press, Piscataway, NJ, 08855-1331, IEEE Press classic reissue edition, 1995.
[104] J. Foerster and Q. Li, "UWB Channel Modeling Contribution from Intel," Tech. Rep., IEEE P802.15-02/279-SG3a.
[105] M.Z. Win, "A Unified Spectral Analysis of Generalized Time-Hopping Spread-Spectrum Signals in the Presence of Timing Jitter," IEEE Journal on Selected Areas in Communications, vol. 20, no. 9, pp. 1664-1676, December 2002.
[106] M.Z. Win, "Spectral Density of Random Time-hopping Spread-spectrum UWB Signals," IEEE Communications Letters, vol. 6, no. 12, pp. 526-528, December 2002.
[107] A. Ridolfi and M.Z. Win, "Ultrawide Bandwidth Signals as Shot-Noise: a Unifying Approach," IEEE Journal on Selected Areas in Communications, vol. 24, no. 4, pp. 899-905, April 2006.
[108] J. Romme and L. Piazzo, "On the Power Spectral Density of Time-Hopping Impulse Radio," in Proceedings of the Conference on Ultra-wideband Systems and Technologies, 2002, pp. 241-244.
[109] J. Powell and A.P. Chandrakasan, "Differential and Single Ended Elliptical Antennas for 3.1-10.6 GHz Ultra Wideband Communication," in Proceedings of the IEEE Antennas and Propagation Society International Symposium, June 2004.
[110] J. Powell and A.P. Chandrakasan, "Spiral Slot Antenna and Circular Disc Monopole Antenna for 3.1-10.6 GHz Ultra Wideband Communications," in Proceedings of the 2004 International Symposium on Antennas and Propagation, June 2004.
[111] N. Ackerman, "A Platform for Ultra Wideband Communication Systems," M.S. thesis, Massachusetts Institute of Technology, May 2005.
[112] F.S. Lee and A.P. Chandrakasan, "A BiCMOS Ultra-wideband 3.1-10.6GHz Front-End," in Proceedings of the IEEE CICC, September 2005.
[113] D.D. Wentzloff and A.P. Chandrakasan, "A 3.1-10.6 GHz Ultra-wideband Pulse-shaping Mixer," in IEEE Radio Frequency IC Symposium, June 2005.
[114] B. Ginsburg and A.P.
Chandrakasan, "Dual Scalable 500MS/s, 5b Time-Interleaved SAR ADCs for UWB Applications," in Proceedings of the IEEE CICC, 2005.