Engineering and Industrial Services
White Paper

Forward-Looking SoC-based PHY Architecture for Macro and Small Cell LTE eNode-Bs

About the Authors

Venkatasubramanian Viraraghavan
Venkatasubramanian Viraraghavan has more than 12 years of experience in the telecom industry and is currently the Technical Architect of the LTE Physical Layer IP developed at Tata Consultancy Services (TCS). He has gained expertise in the physical layer of most radio standards, such as CDMA2000, WCDMA Release 99 and HSPA+, TD-SCDMA, LTE and LTE-Advanced. In his previous role, he was a System Architect for one of TCS' telecom clients. He also has over 3 years of experience in multimedia research, particularly in music signal processing. He holds a Master's degree in System Science and Signal Processing from the Indian Institute of Science, Bangalore and a Bachelor's degree in Electrical Engineering from the Indian Institute of Technology, Madras.

Kulandaivel P
Kulandaivel P has over 13 years of experience in the design and development of wireless products. He currently leads the LTE Physical Layer IP development program at TCS. Kulandaivel started his career as a Research Engineer working on wireless baseband design for an Enhanced TDMA system. He has since worked on Mobile WiMAX base station development, satellite communication and data communication. A keen interest in signal processing and hardware design, and contributions to several international conference papers, have further enriched his expertise. He holds a Master's degree in Engineering (Applied Electronics) from the Coimbatore Institute of Technology.

Dimitri Dey
Dimitri Dey is a Systems Engineer at TCS with approximately three years of experience in the LTE Physical Layer and Protocol Stack. He holds a Bachelor's degree in Technology from the Gurunanak Institute Of Technology.
Heterogeneous networks, such as a mix of macro and small cell (pico, femto) base stations, enable flexible, low-cost deployments and provide a uniform broadband experience to users anywhere in the network. However, supporting multiple base station form factors poses challenges in physical layer development. Conventional Digital Signal Processor (DSP) architectures cannot meet these demands, prompting platform providers to develop new, more efficient System on Chip (SoC) solutions. Additionally, maintaining different versions of physical layer software for different form factors can require significant modification and rework.

This paper describes an efficient, forward-looking architecture that handles various form factors of Long Term Evolution (LTE) base stations with minimal software modification and without architecture changes. The proposed architecture supports easy migration to next generation SoCs as well as to more powerful SoCs of the same generation. Using this architecture, we have migrated across two generations of SoCs for LTE Release-9 small and macro cells, and are ready to implement a multi-sector, LTE-Advanced (LTE-A) macro base station. This solution is part of our intellectual property (IP) development initiative.
Contents
Section I Introduction
Section II Current Trends in Base Station SoCs
Section III Evolution of SoCs for 4G systems
Section IV Forward looking and flexible architecture for LTE Macro base station
Section V Forward looking and flexible architecture for LTE small cell base station SoC
Section VI Forward looking and flexible architecture for LTE-Advanced macro base station SoC
Section VII Real time implementation results
Section VIII Conclusion
Section IX Acknowledgments

Section I - Introduction
The growing popularity of smartphones, tablets, and laptops has led to an increase in data and video content, requiring additional bandwidth (BW) and better quality of service. Wireless subscribers expect high-speed network access anytime, anywhere. Current wireless standards, such as 3rd Generation Partnership Project (3GPP) LTE, can offer the data rates required by demanding applications. However, LTE radio access is approaching the limits of what can be gained by increasing the BW and improving the signal-to-noise ratio, the two quantities that determine the channel capacity defined by Shannon's theorem, suitably adapted for Multiple Input Multiple Output (MIMO) channels. Therefore, LTE-Advanced increases the spectrum available for mobile data applications through carrier aggregation. Another way to increase overall mobile network capacity is to increase the carrier-to-interference ratio while decreasing cell size and deploying small cell technologies. LTE-A addresses improvements in spectral efficiency per unit area.

One of the key components that affect the throughput of such networks is the physical layer (PHY), especially the baseband. LTE and LTE-A base station (eNode-B) baseband design challenges, such as higher data rates and lower latency, have compelled designers to adopt design methodologies that predominantly leverage heterogeneous system design.
Heterogeneous architectures consisting of multiple DSP cores, cores for higher layer processing, and hardware (HW) accelerators are being developed to fulfill the requirements of wireless broadband standards. Heterogeneous implementations combine the flexibility and programmability of DSPs with the higher performance (higher throughput and lower power) of hardware accelerators. Several advanced SoC solutions employing such heterogeneous architectures are available for developing macro, micro, pico and femto LTE eNode-Bs. The next two sections discuss current trends in base station SoCs and trace the evolution of such SoCs for 4G LTE. The paper then details our forward-looking architecture, which can be easily ported onto future generations of SoCs, and provides examples of how this porting can be achieved for LTE small cells and LTE-Advanced macro cells. Finally, it presents the real time implementation results for our current implementation and for its projected evolution in step with the SoCs.

Section II - Current Trends in Base Station SoCs
In the early stages of evolution of a wireless standard (exemplified by LTE, CDMA2000, WCDMA and so on), the physical layer for a base station is often implemented on a board with multiple DSPs and Field Programmable Gate Arrays (FPGAs). These boards are typically general purpose boards and enable Software-Defined-Radio (SDR) implementations of multiple standards.

[Figure 1: Wireless base-station architecture. Blocks shown: higher layer board(s) with an ARM, Freescale or other processor, memory and Ethernet backhaul; a high-speed (HS) interface; baseband board(s) with DSP 1, DSP 2, DSP 3, FPGA(s) and memory; an HS serial interface; the RF subsystem.]

Correspondingly, the 'higher layers' (HL), that is, Layer 2 and Layer 3, for such standards are implemented on other suitable boards, which typically have an ARM or Freescale processor.
Again, the board is not limited to any particular wireless standard. The RF subsystem is separate and is not considered in detail in this paper, except to state that a high-speed serial interface of differential signals connects the RF board and the baseband board. Figure 1 captures this architecture. Indeed, until now, this has been a common architecture for test and measurement equipment as well.

Wireless applications demand very high processing power, and the chosen processors need to be suitably high-end. In parallel, the integration of ASICs and heterogeneous processors, enabled by Moore's law, has resulted in SoC solutions. Further, ASICs and processors have grown in processing power. ASICs now cater to wireless technologies, and current SoCs have the wherewithal to meet all the processing needs of a multi-sector base station. What was done on several boards with several processors can now be compressed into a single SoC at a fraction of the cost of all the boards needed for equivalent functionality (even after correcting for current prices). The SoC will also draw a fraction of the total power consumed. This paper goes on to consider the evolution of such SoCs in more detail.

In summary, typical SoCs for wireless base stations consist of the following:
- Standard-independent processing cores for the physical layer (DSP cores)
- Standard-independent processing cores for the higher layers (HL cores)
- Standard-dependent accelerator ASICs for data processing for the physical layer and higher layers. Note that more than one standard may be catered to. The interface presented by these accelerators will allow the choice of the standard, but it is rare to find SoCs that perform complete data processing for earlier standards such as GSM.
- Industry-standard RF interfaces (CPRI, OBSAI and so on)
- Industry-standard inter-board interfaces such as Ethernet, SRIO and PCI-express. These are used for connecting to the backhaul in a base station.
- Peripherals such as the DMA engine and memory controllers

Section III - Evolution of SoCs for 4G systems
This section examines the general trend in the evolution of SoCs for 4G systems from the physical layer point of view. Figure 2 shows the Physical Downlink Shared CHannel (PDSCH) and the Physical Uplink Shared CHannel (PUSCH) chains in an eNodeB (except the channel estimation block) [1,2]. A list of blocks in increasing order of computational complexity and decreasing order of configurability is presented below:
Block 1. DL symbol rate processing (scrambling to resource mapping)
Block 2. UL soft bits processing (de-mapping to LLR generation)
Block 3. UL symbol rate processing (channel equalization and IDFT)
Block 4. CRC, Turbo Encoding and rate matching
Block 5. Cyclic Prefix addition and removal
Block 6. Half-subcarrier shift removal
Block 7. IDFT/FFT/IFFT
Block 8. Turbo decoding

Starting from a complete software solution, if the aim is to move blocks to hardware, a natural strategy, based on computational complexity, is to move them in the reverse order of the above list. The evolution of SoCs itself is evidence of this strategy. In the first generation SoCs, only turbo decoding and FFT/IFFT/IDFT, the most computationally intensive but well-defined blocks, were provided as hardware accelerators. Figure 2 presents these blocks in dashed boxes.

[1] 3GPP TS 36.211 V10.5.0 (2012-06), 3rd Generation Partnership Project; Technical Specification Group Radio Access Network; Evolved Universal Terrestrial Radio Access (E-UTRA); Physical Channels and Modulation (Release 10); accessed Nov 2014
[2] 3GPP TS 36.212 V10.6.0 (2012-06), 3rd Generation Partnership Project; Technical Specification Group Radio Access Network; Evolved Universal Terrestrial Radio Access (E-UTRA); Multiplexing and channel coding (Release 10); accessed Nov 2014

[Figure 2: Evolution of hardware for LTE data channels in eNodeB SoCs. The figure shows the UL data channel chain (CP removal, half-subcarrier shift, FFT, equalization with matrix inversion, IDFT, demodulation, descrambling, deinterleaver, rate dematching, turbo decoder, HARQ, ACK/NACK) and the DL data channel (PDSCH) chain (CRC24a, CB segmentation, CRC24b, turbo encoding, rate matching, scrambling, modulation, layer mapping, pre-coding, resource mapping, control insertion, IFFT, CP insertion). Blocks are numbered as in the list above and grouped by the legend "1st, 2nd and 3rd Generation SoCs", "2nd and 3rd Generation SoCs" and "3rd Generation SoCs".]

The second generation of SoCs provided hardware acceleration for Blocks 3 to 6. The current (that is, 3rd) generation of SoCs completes the chain partially, if not fully, by providing hardware acceleration for Blocks 1 and 2. Third generation SoCs leave very little of the LTE data channel processing to software. The number of instances of hardware accelerator blocks has also increased in 3rd generation SoCs, which helps to reduce the complexity of implementing the baseband design. The order in which blocks are accelerated in hardware is not the same as the logical ordering of blocks in the data chains. This has a bearing on the design of forward-looking software.

In reality, several vendors offer SoCs for 3G and 4G systems, each with unique features and advantages. Thus, this generic analysis of LTE will not precisely fit any available SoC. Nevertheless, it is relevant because there are not many ways of splitting the LTE physical layer data chains. Each SoC may leave out parts of the blocks described above, which then have to be completed in software. A common component not addressed by hardware acceleration is channel estimation (not shown in Figure 2); this allows eNodeB developers to use proprietary algorithms that can differentiate their product.

The control of hardware accelerators resides in DSP cores that are also present on the same SoC. These cores are also used to complete the physical layer functionality (DL control channels, UL control channels and physical signals).
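
The acceleration order described in this section can be sketched as a small lookup model. This is only an illustrative sketch: the names BLOCKS, NEWLY_ACCELERATED, hardware_blocks and software_blocks are assumptions made for this example, and the model treats Blocks 1 and 2 as fully accelerated in 3rd generation SoCs, whereas the text notes that this may be only partial on a given SoC. Channel estimation, which is outside the numbered list, stays in software in every generation and is not modelled.

```python
# Illustrative model of the block list in Section III; all names are hypothetical.
BLOCKS = {
    1: "DL symbol rate processing (scrambling to resource mapping)",
    2: "UL soft bits processing (de-mapping to LLR generation)",
    3: "UL symbol rate processing (channel equalization and IDFT)",
    4: "CRC, turbo encoding and rate matching",
    5: "Cyclic prefix addition and removal",
    6: "Half-subcarrier shift removal",
    7: "IDFT/FFT/IFFT",
    8: "Turbo decoding",
}

# Blocks that each SoC generation newly moves into hardware, as stated in the text.
# Assumption: Blocks 1 and 2 are counted as fully accelerated in generation 3.
NEWLY_ACCELERATED = {1: {7, 8}, 2: {3, 4, 5, 6}, 3: {1, 2}}


def hardware_blocks(generation):
    """Return the set of block numbers accelerated in hardware up to a generation."""
    accelerated = set()
    for gen in range(1, generation + 1):
        accelerated |= NEWLY_ACCELERATED[gen]
    return accelerated


def software_blocks(generation):
    """Return the sorted block numbers still left to the DSP cores."""
    return sorted(set(BLOCKS) - hardware_blocks(generation))


if __name__ == "__main__":
    for gen in (1, 2, 3):
        remaining = software_blocks(gen)
        print(f"Generation {gen}: blocks left in software: {remaining}")
```

Because hardware support grows from Block 8 downwards while the data chains run in a different order, the blocks left in software at each generation are not contiguous in the chain, which is why the software has to be designed with the next generation in mind.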
The number of instances of DSP cores, higher layer cores and hardware accelerators (for L1 and L2) is decided by the target application of the SoC. LTE and LTE-A macro cells require many cores and a set of hardware accelerators. This is addressed by 3rd generation SoCs. For small cells, a small number of cores and the minimal set of hardware accelerators will suffice. For LTE-A and for macro cells with multiple sectors or carriers, many sets of cores and hardware accelerators will be needed. It is a challenge to design a software architecture that can easily migrate to more powerful SoCs and scale up to the available processing power. The following sections present a solution to this challenge. An evolution similar to the one described here is occurring in the same SoCs for WCDMA, and the SoCs support both 3G and 4G acceleration simultaneously.

Section IV - Forward looking and flexible architecture for LTE Macro base station
In the industry, it is usual for physical layer developers to port their software onto several generations or variants of platforms and to maintain more than one stream simultaneously. To minimize rework, and considering the evolution of SoCs, a systematic approach to software architecture and design can cater to all generations of SoCs. The principles of such a design are explained below. An architecture that adheres to these principles is likely to be more 'forward-compatible' than one that does not.

Principle 1. Software design should cater to a family of SoCs, rather than a single SoC. For this to be possible, advance information from the SoC manufacturer is needed, which may entail a partnership with them. In the absence of such a partnership, the design team needs to follow releases periodically and look up data sheets of upcoming or just-released SoCs.

Principle 2. Hardware accelerators of next-generation SoCs should be mimicked in dedicated DSP cores in the current generation SoCs, to the extent that available information about next generation SoCs permits. This way, the control software on the main controlling DSP core that manages tasks and threads will undergo little change when migrating from one generation to another. There are two added advantages. First, the number of times the controlling DSP core is interrupted will mirror that in the next generation SoC. Second, parameters for processing will be copied as in the next generation SoC (the core that is mimicking the accelerator should be stateless), thus simulating its environment more accurately. Contrast this with a linear software implementation (function call or software-posted task), which will not identify such interaction issues early.

Principle 3. Inherent virtual addressing should be exploited to isolate sector contexts (in a multi-sector deployment). This brings some automatic scalability, which saves significant effort and cost.

Principle 4. Flexibility in the framework for deploying software modules in different cores is an obvious requirement.

[Figure 3: Evolution of hardware for LTE data channels in eNode-B SoCs. Blocks shown: 2nd Gen hardware accelerator; antenna interfaces; memory subsystem; DDR3 memory controller; multicore fabric bus; GigEth controller; SRIO; Core 0: FAPI L1C + DL control SW; Core 1: FAPI L1C + UL control SW + PRACH; Core 2: unused; Core 3: PDSCH SW components; Core 4: PUSCH SW components; Core 5: unused.]

As an example of an architecture that follows the above principles, we illustrate our architecture for a Release-9 LTE eNode-B physical layer for a single sector on a multi-core DSP [3]. This DSP has six cores and one hardware accelerator, the latter corresponding to that in second generation SoCs. Looking ahead at 3rd generation SoCs, it was decided that core 0 and core 1 would retain similar functionality in both generations of SoCs.
Core 3 and core 4 would respectively mimic the accelerators on the 3rd generation SoC for Blocks 1 and 2 in Section III. Figure 3 illustrates this architecture, which exemplifies Principles 1 and 2.

[3] Freescale, Six core Digital Signal Processor with Security, Rev 2, Dec 2013, accessed Feb 2014, http://www.freescale.com/files/dsp/doc/data_sheet/MSC8157E.pdf

To complete the picture, the downlink signals and channels (Cell-Specific Reference Signals, Primary and Secondary Synchronization Sequences, the Physical Control Format Indicator Channel, the Physical HARQ Indicator Channel and the Physical Downlink Control Channel) are implemented in software in core 0. The uplink physical channels and signals (the Physical Uplink Control Channel, the Physical Random Access Channel (PRACH) and the Sounding Reference Signal) are implemented in core 1. These software modules are optimized to achieve the standard-specific latency using well-known techniques: compilation at the highest optimization level for execution speed, loop unrolling, use of the const and restrict keywords, efficient use of cacheable memory, use of Direct Memory Access (DMA) and so on.

Section V - Forward looking and flexible architecture for LTE small cell base station SoC
Figure 4 presents the currently used architecture for the 3rd generation, small cell SoC [3,4].

[Figure 4: Small LTE cell architecture on a 3rd Generation SoC. Blocks shown: 3rd Gen hardware accelerator; antenna interfaces; memory subsystem; DDR3 memory controller; multicore fabric bus; GigEth controller; SRIO; Core 0: FAPI L1C + DL control SW; Core 1: FAPI L1C + UL control SW + PRACH; L2, L3 Core 0 (Power Architecture); L2, L3 Core 1 (Power Architecture).]

[4] Texas Instruments, TMS320TCI6614 Communications Infrastructure KeyStone SoC, Feb 2013, accessed Feb 2014.
http://www.ti.com/lit/ds/symlink/tms320tci6614.pdf

Comparing this with Figure 3 reveals that two higher layer cores have been added, four DSP cores have been removed, and the hardware accelerator has been enhanced. Given the hardware available, only one LTE Release-9 sector is proposed (or two, if the software and hardware usage are optimized significantly further). Because of the forward looking architecture described in Section IV, neither the two unused cores of the second generation SoC nor the two cores (3 and 4) that mimic the hardware accelerators of this SoC are now needed. Further, the software for core 0 and core 1 will undergo only slight change, largely to address the drivers for the newly available hardware accelerators.

Section VI - Forward looking and flexible architecture for LTE-Advanced macro base station SoC
For an LTE-Advanced implementation on a 3rd generation SoC [5,6], the architecture in Section V can be extended easily. First, the software architecture can be extended to handle more than one component carrier (two will suffice for high data-rate cells).

[Figure 5: Macro LTE-A cell architecture on a 3rd Generation SoC. Blocks shown: 3rd Gen hardware accelerator (more instances); antenna interfaces; L2, L3 cores (Power Architecture); memory subsystem; multicore fabric bus; GigEth controller; SRIO; Core 0 (Sector 0 DL); Core 1 (Sector 0 UL); Core 2 (Sector 1 DL); Core 3 (Sector 1 UL); Core 4 UL Control Ch (Sector 2 DL); Core 5 (Sector 3 UL).]

[5] Texas Instruments, Multicore DSP+ARM KeyStone II System-on-Chip (SoC), Oct 2013, accessed Feb 2014, http://www.ti.com/lit/ds/symlink/tci6636k2h.pdf
[6] Freescale, B4860: QorIQ Qonverge B4860 Baseband Processor, accessed Feb 2014, http://www.freescale.com/webapp/sps/site/prod_summary.jsp?code=B4860

Second, to scale to multiple sectors or to additional component carriers, the same software can be deployed in additional cores (Core 2 to Core 5).
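
The core assignments of Figures 3 and 4, and the step of deploying the same software in additional cores, can be sketched as plain deployment tables. This is a minimal sketch and not our actual framework: the names GEN2_MACRO, GEN3_SMALL_CELL and scale_out are hypothetical, and the rule that sector n uses core 2n for the downlink and core 2n + 1 for the uplink is an assumption made only for this illustration.

```python
# Hypothetical deployment tables: core index -> software role.
# Roles follow the core labels of Figure 3 (2nd generation, single sector).
GEN2_MACRO = {
    0: "FAPI L1C + DL control",
    1: "FAPI L1C + UL control + PRACH",
    2: "unused",
    3: "PDSCH components (mimics a 3rd generation accelerator)",
    4: "PUSCH components (mimics a 3rd generation accelerator)",
    5: "unused",
}

# Roles follow the DSP core labels of Figure 4 (3rd generation, small cell).
GEN3_SMALL_CELL = {
    0: "FAPI L1C + DL control",
    1: "FAPI L1C + UL control + PRACH",
}


def scale_out(single_sector, num_cores):
    """Replicate a two-core (DL, UL) sector deployment across num_cores cores.

    Assumption for this sketch: sector n uses core 2n for DL and core 2n + 1
    for UL, and each sector runs the same software as the single-sector case.
    """
    if num_cores % 2:
        raise ValueError("this sketch needs an even number of cores")
    deployment = {}
    for core in range(num_cores):
        sector, role_index = divmod(core, 2)
        deployment[core] = f"sector {sector}: {single_sector[role_index]}"
    return deployment


if __name__ == "__main__":
    # The controlling cores keep the same roles across both generations.
    same = all(GEN2_MACRO[c] == GEN3_SMALL_CELL[c] for c in GEN3_SMALL_CELL)
    print("core 0/1 roles unchanged:", same)
    for core, role in scale_out(GEN3_SMALL_CELL, 6).items():
        print(f"core {core}: {role}")
```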
Such a deployment is made much simpler by the use of Principle 3 of Section IV (see Figure 5). Of course, for such scalability to be feasible, the hardware accelerator throughputs must match, which is usually achieved in SoCs by increasing the number of instances (also depicted in Figure 5). In the B4860 SoC [6], the extra core performance can be used to deploy more sectors and carriers than predicted by a simple linear extension of the performance on the PSC9132QDS platform [7].

Section VII - Real time implementation results
The performance we achieved using the forward-looking architecture with the parallel programming and multi-core methodology discussed above is given in Table 1. We note that at these performance numbers, or scaled equivalents depending on the system configuration, the solution also interworks with a 3rd-party UE simulator. The results show that in the downlink, the theoretical maximum rate for two antennas has been reached, while the solution also works with 4 to 8 UEs/TTI.

Table 1: Real time implementation results
Key Performance Indicator (KPI) | Macro Cell Base Station | Small Cell Base Station (Note 2) | Macro Cell with Multi Sector (Note 3)
3GPP LTE release | 3GPP LTE Release 9 | 3GPP LTE Release 9, LTE Small Cell Solution | 3GPP LTE Release 10, LTE-Advanced
DSP hardware / SoC used | MSC8157 | PSC9132 | B4860
Number of UEs/TTI supported | 4 UEs/TTI | 8 UEs/TTI | 20 UEs/TTI
Number of connected users (Note 1) | 128 | 32 | 512
Data rate achieved (DL/UL) in Mbps | 150/50 | 150/50 | 600/150
MIMO configuration | 2x2 | 2x2 | 4x4
Software portability (% lines of code changed from the current implementation) | Current | 15% | 25%

Notes:
1. Applicable only if PHY stores semi-static information
2. Projected figures, implementation is in progress
3. Projected figures for proposed implementation

[7] Freescale, BSC9132: QorIQ Qonverge BSC9132 Dual-core Processor and Dual-core DSP, accessed Feb 2014, http://www.freescale.com/webapp/sps/site/prod_summary.jsp?code=BSC9132

Section VIII - Conclusion
LTE/LTE-A eNode-B design is challenged by the need to handle various base station requirements, which call for software architecture changes and the corresponding code changes. The long lead time in obtaining the latest SoC hardware for porting an existing solution poses another operational challenge. This impacts the time to market of the solution and the meeting of customer milestones with new requirements. A forward looking architecture addresses these challenges, at least partly, on the existing hardware itself.

We have described our LTE eNode-B physical layer implementation, which leverages such a forward looking architecture and reduces the time for porting it onto various SoCs targeting base stations for small cells, macro cells and LTE-A cells. As pointed out, achieving such porting with minimal software modification requires only the documentation of the latest and upcoming SoCs from the vendor, rather than waiting for the hardware itself.

The projected real time implementation results for LTE-Advanced indicate that the maximum performance (600 Mbps in the DL and 150 Mbps in the UL) can be realized using Freescale's macro base station SoC, the B4860. This SoC can also be used to implement multi-sector and multi-carrier LTE-Advanced eNodeBs. The data rates achieved can be stretched even further by making efficient use of multi-core synchronization within the same forward looking architecture. The real time implementation results achieved with our forward looking architecture for an LTE Release 9 small cell solution, together with the proposed LTE-Advanced performance metrics, meet both the stringent protocol demands and the business goal of cost-effective upgrades.
Section IX - Acknowledgments
The authors would like to thank Mr. Rajarama Nayak, Head, Embedded Technology Solutions Group, Mr. Srinath Chitlapalli, Head, Telecom EIS, and Dr. Sudharsan G for their contribution to this paper.

Contact
For more information, contact eis.marketing@tcs.com

About Tata Consultancy Services (TCS)
Tata Consultancy Services is an IT services, consulting and business solutions organization that delivers real results to global business, ensuring a level of certainty no other firm can match. TCS offers a consulting-led, integrated portfolio of IT and IT-enabled infrastructure, engineering and assurance services. This is delivered through its unique Global Network Delivery Model(TM), recognized as the benchmark of excellence in software development. A part of the Tata Group, India's largest industrial conglomerate, TCS has a global footprint and is listed on the National Stock Exchange and Bombay Stock Exchange in India.

All content / information present here is the exclusive property of Tata Consultancy Services Limited (TCS). The content / information contained here is correct at the time of publishing. No material from here may be copied, modified, reproduced, republished, uploaded, transmitted, posted or distributed in any form without prior written permission from TCS. Unauthorized use of the content / information appearing here may violate copyright, trademark and other applicable laws, and could result in criminal or civil penalties.

Copyright © 2014 Tata Consultancy Services Limited
For more information, visit us at www.tcs.com