Cloud-Based Apps Drive the Need for Frequency-Flexible Clock Generators in Converged Data Center Networks

By Phil Callahan, Senior Marketing Manager, Timing Products, Silicon Labs

Silicon Labs Rev 1.0

Introduction

Skyrocketing network bandwidth demands driven by consumer mobile devices and cloud-based streaming services, such as Netflix, Hulu, YouTube, Spotify, Pandora, online gaming and others, are pushing Internet Infrastructure suppliers to develop data center systems that support dramatically higher data rates, such as 10, 40 and 100 Gbps. In addition, the increasing popularity of commercial cloud computing, which offers network-based computing and storage as a service, is further accelerating the demand for application-flexible, high-bandwidth networks in today’s data centers.

Figure 1 illustrates the impact of these popular cloud-based streaming services on the growth of Internet traffic bandwidth. Cisco’s Visual Networking Index (VNI) Forecast (June 2014) projects the following market trends:

- Cloud applications and services such as Netflix, YouTube, Pandora, and Spotify will account for 90 percent of total mobile data traffic by 2018.
- Global network traffic will be three times larger in 2018 than in 2013, equivalent to streaming 33B DVDs/month, or 46M DVDs/hour.
- By 2018, consumer online gaming traffic will be four times higher than it was in 2013.

Figure 1. Cisco VNI June 2014

Cloud-Based Apps Drive Network Convergence in the Data Center

To reliably deliver a Netflix video or a Spotify high-quality audio stream, service providers must be equipped with data center hardware that supports three primary networks, as shown in Figure 2:

- LAN/WAN networks commonly comprise 1 Gb, 10 Gb, and/or 100 Gb Ethernet switches connected in a mesh switch fabric for the data center LAN, and OTN (Optical Transport Networking) interconnects to the WAN. These networks deliver the content from the data center to the cloud and, ultimately, to the user.
- Compute networks comprise many server and switch “blades” interconnected using copper cables, PCB backplanes or optical links. These interconnects use a combination of 1 Gb and 10 Gb Ethernet, PCIe, and in some cases, InfiniBand. Network interfaces in compute networks must support not only high data rates but also very low latency, which is critical for streaming video and audio service quality.
- Storage networks are primarily based on Fibre Channel, Gb or 10 Gb Ethernet switches and direct connections to storage subsystems using PCIe. These networks store considerable amounts of content, requiring multi-gigabit capable protocols.

Figure 2. Data Center Network Overview

To meet the rapidly expanding Internet bandwidth demands of content providers, compute and storage networks for data centers must become flatter and more horizontally interconnected. Known as the “converged data center,” this flatter architecture is required to improve server-to-server and server-to-storage communication within the data center, which directly impacts latency and the quality of streaming services. In addition to delivering latency performance advantages, the converged data center architecture is highly scalable and lends itself to software virtualization of compute server and storage hardware resources, supporting rapid changes in service bandwidth demands. Some vendors refer to this architecture as Software Defined Networking (SDN).
Traditional Clock Tree Designs for Converged Data Centers Are Too Complex

As data center compute and storage networks become horizontally interconnected with multi-gigabit Ethernet, Fibre Channel, and PCIe links embedded into pluggable, high-density blades, they place new demands on system engineers, especially the clock tree designers. Designers must find clock tree solutions that support both increasing functional densities and the multitude of high-bandwidth network protocols while reducing PCB footprints, power and costs.

Let’s consider a traditional clock tree design approach for a data center switch blade, as shown in Figure 3. Whether this blade is implemented on a PCB using multiple ICs or based primarily on a single system-on-a-chip (SoC) solution, the compute switch blade’s primary function is to support simultaneous, high-bandwidth, low-latency communications between the LAN, compute server blades and storage devices. Data center switch blades support the consolidation of multi-gigabit LAN and multi-protocol storage traffic into highly scalable networks. However, the traditional clock tree used to support data center switch blades is complicated (see Figure 3), requiring eight clock tree components:

- Three crystal oscillators (XOs)
- Three buffer ICs
- Two clock generator ICs

Figure 3. Data Center Switch Blade Using Traditional Clock Tree

Multi-Lane SerDes and PHY Reference Clocks

A major reason for clock tree complexity is that high-speed communications links fundamentally rely on multi-lane, multi-gigabit serializer/de-serializers (SerDes) and physical layer devices (PHYs) for each network interface type. SerDes chips and PHYs are critical building blocks for data center switch blades. Depending on the network type (LAN/WAN, compute, storage), protocol (GbE, 10 GbE, Fibre Channel, PCIe), and transmission medium (fiber optic cabling, copper cables or PCB backplanes), each multi-gigabit SerDes or PHY device requires a low-jitter reference clock, and many operate at different frequencies.

Due to protocol and physical media standard differences, these reference clocks are seldom integer-related. For example, the 161.1328125 MHz clock is fractionally related (by 66/64) to the 156.25 MHz clock. This fractional relationship makes the simultaneous generation of low-jitter SerDes clocks much more challenging, as fractional dividers must be used. Fractional dividers used in traditional clock generators produce significantly higher jitter than integer dividers used in integer-only PLL clock generators, forcing designers to use more expensive, dedicated XOs to generate each unique frequency.
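The 66/64 relationship above can be checked with exact rational arithmetic. The following Python sketch is only an illustration: the helper names `ratio` and `lowest_common_frequency` are hypothetical, and the result says nothing about the VCO range of any particular device. It shows that an integer-only divider chain would need a source frequency that is a common multiple of every reference clock it has to produce.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def ratio(f_a, f_b):
    """Exact ratio of two frequencies given as decimal strings (in MHz)."""
    return Fraction(f_a) / Fraction(f_b)


def lowest_common_frequency(freqs_mhz):
    """Smallest frequency (MHz) that is an integer multiple of every clock.

    An integer-only divider chain would have to start from a multiple of
    this value to produce all of the listed clocks exactly.
    """
    fracs = [Fraction(f) for f in freqs_mhz]
    # For fractions in lowest terms: lcm(numerators) / gcd(denominators).
    num = reduce(lambda a, b: a * b // gcd(a, b), (f.numerator for f in fracs))
    den = reduce(gcd, (f.denominator for f in fracs))
    return Fraction(num, den)


if __name__ == "__main__":
    r = ratio("161.1328125", "156.25")
    print("ratio:", r, "equals 66/64:", r == Fraction(66, 64))

    clocks = ["156.25", "161.1328125"]
    common = lowest_common_frequency(clocks)
    print("lowest common frequency (MHz):", float(common))
    for f in clocks:
        print(f, "MHz needs integer divider", common / Fraction(f))
```

For these two clocks the ratio reduces to 33/32, and the lowest common frequency is 5156.25 MHz (33 x 156.25 MHz, or 32 x 161.1328125 MHz). Adding further unrelated rates to the list pushes that common frequency higher still, which is one way to see why integer-only designs end up with a dedicated oscillator per frequency.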
CPU, Memory and System Clocks

While the jitter requirements of some ICs (such as SerDes and PHY clocks) may be very strict, other switch blade functions have less stringent requirements (100 MHz PCIe, variable 75 to 150 MHz CPU, and 166.66 MHz DDR-333 memory clocks). However, given the limited flexibility and integration level of traditional solutions, clock tree designers have been forced to use multiple clock generators, crystal oscillators (XOs) and buffers to complete the clock tree.

Therefore, to meet increasing demands for higher network port density and bandwidth in data center switch blades, clock tree designers need clock generators that offer:

- Multiple low-phase-jitter SerDes and PHY reference clocks that are fully compliant with the stringent jitter performance specifications required by the dominant networking (1/10/100G Ethernet), storage (Fibre Channel, PCIe) and computing (PCIe, InfiniBand) standards. Generally, the jitter specifications range from about 1 ps RMS to less than 300 fs RMS (12 kHz to 20 MHz).
- Frequency flexibility to enable simultaneous generation of a wide range of integer and fractionally related clock frequencies while adhering to the stringent network, compute and storage clock jitter specifications. The ability to change frequencies on the fly, without affecting other outputs, is also highly desirable. For example, this enables speed-grading CPUs to meet different product cost and market needs.
- Highest level of integration to provide significant reductions in PCB area, cost and component count, maximizing system port density and reducing cost per bit.

A New Approach to Clock Tree Design for Converged Data Centers

In contrast to traditional clock generators, next-generation clocking solutions, such as Silicon Labs’ Si5341/40 clock family, leverage fractional- and integer-frequency-synthesis flexibility and higher levels of integration.
This architectural approach delivers an efficient, cost-effective, single-chip solution that integrates all discrete timing functions into a single IC without sacrificing jitter performance. Silicon Labs’ proprietary MultiSynth fractional divider technology is key to enabling the Si5341 clock to simultaneously generate any integer or fractional frequency up to 800 MHz on any output, with typical jitter < 150 fs.

As shown in Figure 4, the Si5341 clock uses a single, low-power VCO to drive five independent MultiSynth fractional dividers, which are connected via a non-blocking cross point switch to an array of 10 clock outputs. In the first stage of this architecture, the MultiSynth high-speed fractional-N divider seamlessly switches between the two closest integer divider values to produce an exact output clock frequency with 0 ppm frequency synthesis error. To eliminate phase errors generated by this process, MultiSynth calculates the relative phase difference between the clock produced by the fractional-N divider and the desired output clock and dynamically adjusts the phase to match the ideal clock waveform. This novel approach makes it possible to generate any-output clock frequencies from 1 kHz to 800 MHz with 0 ppm error. The result is better than 100 fs RMS phase jitter performance (12 kHz to 20 MHz) in integer mode and less than 150 fs in its synthesis mode, which simultaneously generates both fractional and integer-related clocks. To learn more, see the related white paper, Innovative DSPLL® and MultiSynth Clock Architecture Enables High-Density 10/40/100G Line Card Designs.

Figure 4. Si5341 Functional Diagram
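The divider-switching behavior described above can be sketched with a generic first-order fractional-N model. This is not Silicon Labs’ proprietary MultiSynth implementation: the function name `fractional_n_sequence` and the 5000 MHz source clock are assumptions chosen only to make the arithmetic easy to follow. The sketch alternates between the two nearest integer divide values so that the average output frequency is exact, and it reports the per-edge timing error that a phase-correction stage would then have to remove.

```python
from fractions import Fraction


def fractional_n_sequence(divide_ratio, cycles):
    """Generic first-order fractional-N divider model (illustrative only).

    Returns the integer divide value used on each output cycle and the
    timing error of each output edge relative to the ideal edge, measured
    in source-clock periods (negative means the edge arrives early).
    """
    base = divide_ratio.numerator // divide_ratio.denominator
    frac = divide_ratio - base          # fractional part, 0 <= frac < 1
    acc = Fraction(0)                   # running fractional remainder
    elapsed = 0                         # source-clock periods counted so far
    divides, edge_errors = [], []
    for k in range(1, cycles + 1):
        acc += frac
        if acc >= 1:                    # remainder overflow: use base + 1 once
            acc -= 1
            n = base + 1
        else:
            n = base
        elapsed += n
        divides.append(n)
        edge_errors.append(elapsed - k * divide_ratio)
    return divides, edge_errors


if __name__ == "__main__":
    # Hypothetical 5000 MHz source clock; not a device parameter.
    target = Fraction("5000") / Fraction("161.1328125")
    divides, errors = fractional_n_sequence(target, target.denominator)
    print("divide ratio:", target)
    print("divide values used:", sorted(set(divides)))
    print("source periods per pattern:", sum(divides))
    print("worst edge error (source periods):", min(errors))
```

With the assumed 5000 MHz source, the divide ratio is 1024/33, so the model divides by 31 on 32 of every 33 output cycles and by 32 once. The average is exact (1024 source periods per 33 output cycles), while individual edges arrive up to 32/33 of a source period early, which is the kind of error the phase adjustment described above is meant to cancel.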
Frequency Flexibility Transforms Clock Tree Designs for Data Center Switch Blades

By using the frequency-flexible, ultra-low jitter, 10-output Si5341 clock generator, developers can reduce the clock tree for a data center switch blade from eight discrete components to just one high-performance clock generator (see Figure 5).

Figure 5. Traditional vs. Si5341-Based Clock Tree (left: Traditional Clock Tree for Data Center Switch Blade; right: Silicon Labs Solution for Data Center Switch Blade)

Traditional Clock Tree Challenges

- Many clocks with diverse frequencies
- PHY and SerDes clocks require very low jitter
- Many different signaling formats are required

Silicon Labs Solution

- Any frequency, any format, any output
- < 100 fs jitter (integer mode); < 150 fs (fractional)
- Ten output clocks consolidate the clock tree

Summary

Cloud-based streaming services are driving growing demands for higher data rates. To meet these demands, high-speed networking and data center equipment requires frequency-flexible clock generator ICs. High-performance, frequency-flexible Si5341/40 clock generators are capable of generating any frequency on any output with best-in-class jitter performance (< 100 fs RMS in integer mode and < 150 fs RMS in fractional synthesis mode).
Data center clock tree designers can leverage these new clock products and Silicon Labs’ ClockBuilder Pro software to minimize the timing component BOM count and complexity required to build highly flexible, high-bandwidth Internet Infrastructure equipment for the converged data center.