Evaluating New Technologies for Test and Measurement: PCI Express, Multicore Processing, and Microsoft Windows Vista
NIDays 2007 Worldwide Virtual Instrumentation Conference

Evaluating Test and Measurement Buses
[Figure: max bandwidth (MB/s, increasing is better) versus approximate latency (μs, decreasing is better), with regions marked Good, Better, and Best. Buses plotted: PCI Express (x4), PCI/PXI (32/33), Gigabit Ethernet, Hi-Speed USB, IEEE 1394a, Fast Ethernet, USB 1.1, VME/VXI, GPIB (HS488), GPIB (488.1).]

Increasing Bus Bandwidth Opens New Applications
[Figure: number of bits versus sample rate (S/s) for ISA, PCI, and PCI Express. Application areas shown: Multichannel Audio, IF Communications, Data Acquisition, High-Resolution Digitizers, Instrument Control, High-Speed Imaging.]

PCI Express Overview
• Evolutionary version of PCI
  – Uses same software model as PCI, ensuring compatibility
• Inside every new PC and notebook today
• Low cost – built into PC chipsets
• Serial interconnect at 2.5 Gb/s
  – PCI transactions are packetized and then serialized
  – Low-voltage differential signaling, point-to-point, 8b/10b encoded
  – Bandwidth is dedicated per slot and in both directions
  – Multiple lanes can be grouped together to form links
    • x1 (by 1) has bandwidth of 250 MB/s/direction
    • x16 (by 16) has bandwidth of 4 GB/s/direction
• Scalable interconnect – chip-to-chip, backplane, or cabled
• Roadmap for longevity with Gen-2 clocking (5 Gb/s)

Dedicated Bandwidth per Device
[Figure: total system bus throughput (MB/s) versus number of devices for PCIe x4, PCIe x1, PCI (32/33), and Gigabit Ethernet.]

Software Layer
• PCI software-model compatible
  – 100% OS and driver-level compatible
  – PCI enumeration, configuration, and power management mechanisms
  – Existing operating systems boot with no changes (including BIOS)
• PCI Express hierarchy mapped using PCI elements
  – Host bridges
  – P2P bridges
  – All enumerated using the regular PCI device configuration space
• PCI capability
pointer for PCI Express-specific extensions

Physical Layer
[Figure: x1 lane between Device A and Device B, with data and clock carried in each direction. Packet fields shown: Frame, Sequence Number, Request, CRC, Frame.]
• Point-to-point, differential interconnect with two endpoints
• Low-voltage signaling, AC coupled
• Two unidirectional links, no sideband signals
• Bit rate: 2.5 Gb/s/pin/direction and beyond
• Clocking: Embedded clock signaling using 8b/10b encoding
• Link widths (per direction): x1, x2, x4, x8, x12, x16, x32
• Gen-2 (5 Gb/s) speed increase

PCI Express and PCI Slots on a Motherboard
[Figure: motherboard with 3 x1 PCI Express slots, 1 x16 PCI Express slot, and 2 PCI slots.]

PCI Express Cards
Examples of different PCI Express link widths: x1, x4, and x16
• NI PCIe-GPIB Instrument Control (x1)
• NI PCIe-1429 Image Acquisition (x4)
• PCI Express Graphics Card (x16)

Up-Plugging and Down-Plugging
Up-plugging: Installing boards in higher-lane slots
• Allowed by PCI Express
• Example: Plugging a x4 module into a x8 slot
• Caveat: Motherboard vendors are only required to support a x1 data rate in this configuration
  – Full-bandwidth support is vendor specific
  – Example: x16 slots may operate as x1, even for x4 cards
Down-plugging: Installing boards in lower-lane slots
• Physically prevented by the design of the slots and connectors for the desktop form factor
• Allowed in PXI Express and CompactPCI Express

ExpressCard – PCI Express for Laptops
• Both x1 PCI Express and Hi-Speed USB signaling on host
• 34 mm and 54 mm form factors
• PXI embedded controllers include ExpressCard/34 slot

PCI Express Industry Adoption
• First PCI Express desktops shipped mid 2004
• First ExpressCard laptops shipped early 2005
• PCI and PCI Express are side-by-side in all Intel/Dell roadmaps
• Primary consumer driver is graphics processing (gamers, video editing)
  – PCI Express x16 replacing AGP

National Instruments Shipping Products
• NI PCIe-GPIB (x1)
• NI PCIe-6251 M Series (x1)
• NI
PCIe-6259 M Series (x1)
• NI PCIe-1429 Camera Link (x4)
• NI PCIe-1430 Camera Link (x4)
• NI PCIe-8361 MXI-Express (x1)
• NI PCIe-8362 MXI-Express (x1)
• NI PCIe-8371 MXI-Express (x4)
• NI PCIe-8372 MXI-Express (x4)
• NI ExpressCard-8360 MXI-Express

PCI Express Advantages
• Software compatibility with PCI
• High bandwidth (over 4 GB/s)
• Scalable bandwidth
• Dedicated bandwidth per slot
• Low latency
• Peer-to-peer communication
• Internal and external operation
• Long life (20+ years in the mainstream market)

PXI Express – Integrating PCI Express into the PXI Backplane
• Up to 6 GB/s backplane and 2 GB/s slot bandwidth
• Backward compatibility
  – Complete software compatibility
  – Hybrid slot definition – install modules with either PCI or PCI Express signaling in a single slot
• Enhanced synchronization capabilities
  – 100 MHz differential clock, differential triggering

PXI and Hybrid Slots Ensure Compatibility

PXI Slots

Hybrid Slots

PXI Express Hybrid Slots
• x8 PCIe (up to 2 GB/s)
• Differential Clk. 100 & Star Triggers
• Power
• Trigger Bus
• Star Trigger
• Clk.
10
• Reserved Pins
• Local Bus (typically unused)
• 32/33 PCI (132 MB/s per system)
[Figure: connector comparison of PXI Express, hybrid, and PXI slots.]

Hybrid Slot Flexibility
[Figure: modules that fit a hybrid slot: PXI Express peripheral module, 32-bit CompactPCI module, hybrid slot compatible PXI module.]

NI PXIe-1062Q Hybrid Chassis
[Figure: hybrid slot configuration. Labels: PXI: 2; PXI or PXIe: 3; PXIe Only: 4; hybrid slots marked H.]

PXI-8105 Dual-Core Embedded Controller
• Industry’s highest-performance embedded controller
• Up to 100% higher performance for multithreaded apps
• 2.0 GHz dual-core Intel Core Duo processor T2500
• Dual-channel 667 MHz DDR2 RAM
• Gigabit Ethernet
• ExpressCard/34 slot
• 4 Hi-Speed USB ports
• 60 GB SATA hard drive
• DVI-I video

NI PXI-1033 Chassis with Integrated MXI Express Controller
• 110 MB/s sustained throughput with MXI-Express remote control
• Rugged, compact package with slots for five peripheral modules
• Quiet operation, with acoustic noise emissions as low as 38 dBA
• Kit includes chassis with integrated controller, host card (PCI Express or ExpressCard), and cable

PXI Express Video Demo – NIWeek 2006 Keynote

What Is Multicore Processing?
• Multicore processors contain two or more cores, or computing engines, in one physical processor
• Multicore processors simultaneously execute two or more computing tasks
• Why multicore? Because of power and performance issues, continuing to rely solely on increases in processor clock rates to improve performance is not feasible

Multicore Programming
“One Holy Grail of computer science research has been finding a way to let a compiler take care of parallelization.” - Richard Wirt, Intel Senior Fellow
[Figure: panels labeled C and LabVIEW.]

Multicore vs. Multiprocessor vs. Hyperthreaded
Multiprocessor
• Multiprocessor systems include two or more physical processors
• Multiprocessor systems duplicate computing resources that are often shared in multicore systems (front-side bus, etc.)
• Multiprocessor systems usually cost more than comparable multicore systems, which need only a single processor, processor socket, etc.
Hyperthreaded
• A hyperthreaded processor “acts like” two physical processors
• Certain resources are duplicated (register set, etc.), but the execution unit is shared
• Hyperthreaded systems include multiple logical processors

Multitasking
• Multitasking environments (Windows XP, etc.) allow multiple applications to run at the same time
• With a multicore processor, these multiple applications can simultaneously execute on the processor cores

Multithreading
• Multithreaded applications separate their tasks into independent threads
• A multicore processor can simultaneously execute these threads

Demo: Multithreaded Application Executing on a Dual-Core Processor

PXI-8105 LabVIEW Benchmarks
[Figure: bar chart comparing the PXI-8105 and the PXI-8196 for a LabVIEW 8.0 multithreaded VI and a LabVIEW 8.0 single-threaded VI. Horizontal axis runs from 0 to 250; bars are annotated 100% and 25%.]

The Future of Multicore Processing
• Architecture improvements to further reduce power and improve memory bandwidth
• Multiprocessor systems with multicore processors
• More processor cores
• Quad-core processors will be released in 2007

Microsoft Windows Vista Overview
• Visualization and Search
• Security Changes
• .NET 3.0 API
• Vista x86 versus Vista x64
• Vista Availability
• Vista System Requirements

Graphics and Visualization

Vista x86 versus Vista x64
• Vista x86 (32-bit): 32-bit application executes in user mode; 32-bit service or driver executes in kernel mode
• Vista x64 (64-bit) with WoW emulation: 32-bit application executes in user mode; 64-bit service or driver executes in kernel mode
• Vista x64 (64-bit): 64-bit application executes in user mode; 64-bit service or driver executes in kernel mode
• NI software: 2007; after 2007

Vista System Requirements
• Minimum (XP-like experience)
  – 1 GHz “Modern” Processor
  – 512 MB RAM
  – DirectX 9 Video
• Premium (“Aero” experience)
  – 1 GHz “Modern” Processor
  – 1 GB RAM
  – DirectX 9 Video with 128 MB VRAM

Vista-ready LabVIEW 8.2.1 released on Monday, April 9th
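
Worked example: where the bus bandwidth figures come from
The bandwidth numbers quoted in the PCI Express slides (250 MB/s per direction for x1, 4 GB/s per direction for x16) and for 32/33 PCI (132 MB/s) follow from simple arithmetic on the line rate, the 8b/10b encoding, and the bus width. The Python sketch below reproduces that arithmetic. The function names are hypothetical and chosen only for illustration, and the results are raw rates after encoding; packet and protocol overhead, which the slides do not quantify, would lower real throughput.

```python
def pcie_lane_mb_per_s(line_rate_gbps: float = 2.5) -> float:
    """Usable one-direction bandwidth of a single PCI Express lane in MB/s.

    8b/10b encoding sends 10 line bits for every 8 data bits, so only
    8/10 of the line rate carries data.
    """
    data_bits_per_s = line_rate_gbps * 1e9 * 8 / 10
    return data_bits_per_s / 8 / 1e6  # bits -> bytes -> megabytes


def pcie_link_mb_per_s(lanes: int, line_rate_gbps: float = 2.5) -> float:
    """One-direction bandwidth of a link made of `lanes` grouped lanes."""
    return lanes * pcie_lane_mb_per_s(line_rate_gbps)


def parallel_bus_mb_per_s(width_bits: int, clock_mhz: float) -> float:
    """Peak bandwidth of a parallel bus moving `width_bits` per clock.

    This bandwidth is shared by every device on the bus.
    """
    return width_bits / 8 * clock_mhz


if __name__ == "__main__":
    for lanes in (1, 4, 16):
        print(f"PCIe x{lanes}: {pcie_link_mb_per_s(lanes):.0f} MB/s per direction")
    print(f"PCIe x1 at Gen-2 (5 Gb/s): {pcie_link_mb_per_s(1, 5.0):.0f} MB/s per direction")
    print(f"PCI 32/33: {parallel_bus_mb_per_s(32, 33):.0f} MB/s shared")
```

Unlike the shared PCI figure, the PCI Express figure applies to each slot and to each direction separately.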