Reconfigurable Architectures
• Forces that drive a reconfigurable architecture
  – Price
    • Mass production: 100K to millions of units
    • Experimental: 1 to tens of units
  – Granularity of reconfiguration
    • Fine grain
    • Coarse grain
  – Degree of system integration/coupling
    • Tightly coupled
    • Loosely coupled
• All are a function of the application that will run on the architecture
1 - CPRE 583 (Reconfigurable Computing): Reconfigurable Computing Architectures Iowa State University (Ames)

Example Points in (Price, Granularity, Coupling) Space
[Figure: axes Price ($100's to $1M's), Granularity (Fine to Coarse), and Coupling (Loose to Tight). Labeled examples: an Intel/AMD processor pipeline (Decode, Exec, Int, float, Store) with an RFU, and a PC connected over Ethernet to an ML507 board.]

What's the Point of a Reconfigurable Architecture?
• Performance metrics
  – Computational
    • Throughput
    • Latency
  – Power
    • Total power dissipation
    • Thermal
  – Reliability
    • Recovery from faults
• Increase application performance!

Typical Approach for Increasing Performance
• Implement the application/algorithm in software
  – It is often easier to write an application in software
• Profile the application (e.g., gprof)
  – Determine where the application spends its time
• Identify kernels of interest
  – e.g., the application spends 90% of its time in function matrix_multiply()
• Design custom hardware/instructions to accelerate the kernel(s)
  – Analyze the kernel to determine how to extract fine- or coarse-grain parallelism (does any parallelism even exist?)
• Amdahl's Law!
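Amdahl's Law limits how much this profile-then-accelerate approach can gain. If a fraction f of the original runtime is sped up by a factor s and the rest is unchanged, the overall speedup is 1 / ((1 - f) + f / s). The Python sketch below is a minimal illustration that uses the slide's 90% matrix_multiply() figure. The function name amdahl_speedup is our own and does not come from any library.

```python
def amdahl_speedup(fraction_accelerated: float, kernel_speedup: float) -> float:
    """Overall speedup when `fraction_accelerated` of the original runtime
    runs `kernel_speedup` times faster and the rest is unchanged."""
    if not 0.0 <= fraction_accelerated <= 1.0:
        raise ValueError("fraction_accelerated must be between 0 and 1")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    serial = 1.0 - fraction_accelerated                  # part left in software
    accelerated = fraction_accelerated / kernel_speedup  # part moved to custom hardware
    return 1.0 / (serial + accelerated)


if __name__ == "__main__":
    f = 0.9  # e.g. 90% of runtime spent in matrix_multiply()
    for s in (2, 10, 100):
        print(f"kernel {s:>3}x faster -> application {amdahl_speedup(f, s):.2f}x faster")
    # Even an infinitely fast kernel is capped at 1 / (1 - f) = 10x.
    print(f"upper bound: {amdahl_speedup(f, float('inf')):.2f}x")
```

The 10% of runtime left in software caps the payoff at 10x. Beyond that point, a faster kernel buys little, which is why the slide asks whether any parallelism exists before hardware is designed.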
Granularity

Granularity: Coarse Grain
• rDPA: reconfigurable Data Path Array
• Function units with programmable interconnects
[Figure: example rDPA, a 3x3 array of ALUs]

Granularity: Fine Grain
• FPGA: Field Programmable Gate Array
• Sea of general-purpose logic gates
[Figure: example FPGA, a 4x4 array of CLBs (Configurable Logic Blocks)]

Granularity: Trade-offs
• Trade-offs associated with LUT size
• Example: 2-LUT (4 = 2x2 bits) vs. 10-LUT (1024 = 32x32 bits)
[Figure: 1024 bits of configuration spent either as an array of 2-LUTs (1024 / 4 = 256 of them) or as a single 10-LUT]
• Microprocessor-style operation
[Figure (built up over several slides): an operation with inputs op (4 bits), A (3 bits), and B (3 bits), i.e. 10 inputs in total, mapped onto the 2-LUT array and onto 10-LUTs]
• Bit logic and constants
  – Example: (A and "1100") or (B or "1000")
[Figure (built up over several slides): 4-bit inputs A and B, an AND with the constant "1100", an OR with the constant "1000", and a final OR, mapped onto 2-LUTs vs. 10-LUTs]
  – Highlighted: the area that was required using 2-LUTs
  – With 10-LUTs it is much worse: each 10-LUT has only one output

Granularity: Example Architectures
• Fine grain: GARP
• Coarse grain: PipeRench

Granularity: GARP
[Figure: the Garp chip, containing a CPU (with I-cache and D-cache) and an RFU with a config cache, both connected to memory]
• RFU organization: N rows of PEs (Processing Elements); each row has 1 control block and 16 execution PEs, each 2 bits wide
• Example computations in one cycle: A<<10 | (b&c) and (A - 2*b + c)

Granularity: GARP (impact of configuration size)
• 1 GHz bus frequency
• 128-bit memory bus
• 512 Kbits of configuration
• On an RFU context switch, how long does it take to load a new full configuration?
  – 512 Kbits / (128 bits x 1 GHz) = ~4,000 bus cycles, i.e. ~4 microseconds
• Estimated time for the CPU to perform a context switch: ~5 microseconds
• Result: a ~2x increase in context-switch latency!!
• Reference: "The Garp Architecture and C Compiler", http://www.cs.cmu.edu/~tcal/IEEE-Computer-Garp.pdf

Granularity: PipeRench
• Coarse granularity
• Even higher-level programming
• Reference papers
  – PipeRench: A Coprocessor for Streaming Multimedia Acceleration (ISCA 1999): http://www.cs.cmu.edu/~mihaib/research/isca99.pdf
  – PipeRench Implementation of the Instruction Path Coprocessor (Micro 2000): http://class.ee.iastate.edu/cpre583/papers/piperench_Micro_2000.pdf

Granularity: PipeRench
[Figure: three rows of three PEs; each PE is an 8-bit ALU with a register file; rows are linked by interconnect, with a global bus alongside]

Granularity: PipeRench (pipelining)
[Figure (built up one cycle at a time): the PE array beside a diagram of pipeline stages 0-4 over cycles 1-6]
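The cycle-by-cycle PipeRench diagrams can be reproduced with a small Python model of pipelined reconfiguration. It assumes the behavior the diagrams suggest:

- In each cycle, one physical stage is configured, in round-robin order, with the next virtual stage.
- A physical stage keeps its virtual stage until it is overwritten.
- If the whole virtual pipeline fits in the physical stages, reconfiguration stops.

This is a sketch under those assumptions, not PipeRench's actual configuration controller. The function names are hypothetical.

```python
from typing import List, Optional

Grid = List[List[Optional[int]]]


def pipelined_reconfiguration(num_virtual: int, num_physical: int, num_cycles: int) -> Grid:
    """Return grid[p][c - 1]: the virtual stage held by physical stage p in cycle c.

    Assumed model (read off the slides, not PipeRench's real controller):
    in cycle c, physical stage (c - 1) % num_physical is configured with
    virtual stage (c - 1) % num_virtual and keeps it until overwritten.
    If every virtual stage fits, configuration stops once all are loaded.
    None means the physical stage has not been configured yet.
    """
    if num_virtual < 1 or num_physical < 1:
        raise ValueError("need at least one virtual and one physical stage")
    fits = num_physical >= num_virtual
    resident: List[Optional[int]] = [None] * num_physical
    grid: Grid = [[] for _ in range(num_physical)]
    for cycle in range(1, num_cycles + 1):
        if not (fits and cycle > num_virtual):
            resident[(cycle - 1) % num_physical] = (cycle - 1) % num_virtual
        for p in range(num_physical):
            grid[p].append(resident[p])
    return grid


def show(grid: Grid) -> None:
    """Print the grid in the same layout as the slide diagrams."""
    num_cycles = len(grid[0])
    print("cycle    " + " ".join(str(c) for c in range(1, num_cycles + 1)))
    for p, row in enumerate(grid):
        print(f"stage {p}  " + " ".join("." if v is None else str(v) for v in row))


if __name__ == "__main__":
    show(pipelined_reconfiguration(num_virtual=5, num_physical=5, num_cycles=6))
    print()
    show(pipelined_reconfiguration(num_virtual=5, num_physical=3, num_cycles=6))
```

With five virtual stages on three physical stages, the rows come out as 0 0 0 3 3 3, . 1 1 1 4 4, and . . 2 2 2 0. Physical stage 0 is reloaded with virtual stage 3 in cycle 4, and virtual stage 0 wraps around into physical stage 2 in cycle 6.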
Granularity: PipeRench (virtualization)
[Figure (built up one cycle at a time): beside the five-stage diagram, the same pipeline stages 0-4 run on only three physical pipeline stages. Final state; each entry is the pipeline stage held in that cycle, and "." means not yet configured:]

  Cycle               1  2  3  4  5  6
  Pipeline stage 0    0  0  0  3  3  3
  Pipeline stage 1    .  1  1  1  4  4
  Pipeline stage 2    .  .  2  2  2  0

Degree of Integration/Coupling
• Independent reconfigurable coprocessor
  – The reconfigurable fabric does not communicate directly with the CPU
• Processor + Reconfigurable Processing Fabric (RPF)
  – Loosely coupled on the same chip
  – Tightly coupled on the same chip

Degree of Integration/Coupling
[Figure: baseline system. CPU pipeline (Fetch, Decode, Execute with ALU and FPU, Memory, Write Back), L1 and L2 caches, memory controller, main memory, DMA controller, and an I/O controller serving USB, PCI, PCI-Express, a NIC, and a SATA hard drive]
[Figure: the same system with the RPF attached at the I/O controller, alongside USB, PCI, and the NIC]
[Figure: the same system with the RPF placed near the DMA and memory controllers]
[Figure: the same system with the RPF between the memory controller and the I/O controller, configured by the CPU through a Config I/F]
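One way to read these attachment points is by how much it costs to move operands to the fabric and results back on each call. The Python sketch below illustrates this only roughly. It reduces each attachment point to a single per-call overhead and checks whether offloading a kernel beats running it on the CPU. Every latency in it is hypothetical and chosen for illustration, not measured. The class and function names are also our own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attachment:
    """Where the reconfigurable fabric sits, reduced to one number.

    invoke_overhead_ns is a hypothetical per-call cost (moving operands to the
    fabric and results back); the values used below are illustrative only.
    """
    name: str
    invoke_overhead_ns: float


def offload_pays_off(att: Attachment, cpu_ns: float, fabric_ns: float) -> bool:
    """True if per-call overhead plus fabric compute time beats the CPU alone."""
    return att.invoke_overhead_ns + fabric_ns < cpu_ns


def max_overhead_ns(cpu_ns: float, fabric_ns: float) -> float:
    """Largest per-call overhead at which offloading still breaks even."""
    return cpu_ns - fabric_ns


if __name__ == "__main__":
    cpu_ns, fabric_ns = 2000.0, 200.0  # hypothetical kernel: 10x faster on the fabric
    options = [
        Attachment("RFU in the CPU execute stage (tight)", 5.0),
        Attachment("RPF near the memory controller", 300.0),
        Attachment("RPF on the I/O bus (loose)", 5000.0),
    ]
    print(f"break-even overhead: {max_overhead_ns(cpu_ns, fabric_ns):.0f} ns")
    for att in options:
        verdict = "offload" if offload_pays_off(att, cpu_ns, fabric_ns) else "stay on CPU"
        print(f"{att.name}: {verdict}")
```

Under these made-up numbers, the same 10x-faster kernel is worth offloading only when the fabric is attached closely enough to keep the per-call overhead under the break-even value.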
Degree of Integration/Coupling (continued)
[Figure: the same system with the RPF given its own I/O connections (USB, PCI, NIC, PCI-Express), plus the Config I/F]
[Figure: an RFU (Reconfigurable Functional Unit) inside the CPU's Execute stage, next to the ALU and FPU]

Next Class
• Reconfiguration Management
  – Chapter 4

Questions/Comments/Concerns
• Write down
  – The main point of the lecture
  – One thing that's still not quite clear, OR
  – If everything is clear, an example of how to apply something from the lecture

Lecture notes

Granularity: PipeRench
• Scheduling virtual stages onto physical stages
• Partial/dynamic reconfiguration (each cycle)

Granularity: GARP
• Impact of configuration size on performance
• Context switching
• Garp features
  – Dynamically reconfigurable
  – Stores multiple configurations in an on-chip cache (4)
  – One configuration at a time
• Example app mapping to GARP (loop)
• Amdahl's Law
• The Garp Architecture and C Compiler: http://www.cs.cmu.edu/~tcal/IEEE-Computer-Garp.pdf

Overview
• Dimensions
  – Price
  – Granularity
  – Coupling
  – To optimize application performance (compute (throughput, latency), power, reliability)
• RPF to efficiently implement VICs
  – The main picture the authors want to convey
• What's the point of having a reconfigurable architecture?
  – Example: increase application performance
    • App -> SW/CPU
    • Profile
    • Identify compute-intensive kernels
    • Design custom hardware/instructions (Amdahl's Law)
  – Intel FPL paper: a great example; read it by Friday

Reconfigurable Architectures
• RPF -> VIC (short slide)
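Two of the Garp notes above can be checked with a small Python model: the configuration-size arithmetic from the Garp slide (1 GHz bus, 128-bit memory bus, 512 Kbits of configuration) and the fact that Garp keeps several configurations (4) in an on-chip cache. The model makes these assumptions:

- One full-width bus word is transferred per bus cycle, with no other overhead.
- "512 Kbits" means 512 x 1024 bits, which gives 4.096 microseconds rather than a round 4.
- The configuration cache uses LRU replacement. The slides do not state Garp's real replacement policy or the cost of a cache hit, so a hit is simply counted as "no load from memory".

The kernel names in the example sequence are hypothetical.

```python
import math
from collections import OrderedDict
from collections.abc import Hashable, Iterable


def config_load_time_s(config_bits: int, bus_width_bits: int, bus_freq_hz: float) -> float:
    """Time to stream one full configuration from memory.

    Assumes one full-width bus transfer per bus cycle and no other overhead.
    """
    cycles = math.ceil(config_bits / bus_width_bits)
    return cycles / bus_freq_hz


def full_loads(requests: Iterable[Hashable], cache_slots: int = 4) -> int:
    """Count configuration requests that must be fetched from memory.

    The on-chip configuration cache is modeled as LRU (an assumption,
    not Garp's documented policy).
    """
    cache: OrderedDict = OrderedDict()
    loads = 0
    for cfg in requests:
        if cfg in cache:
            cache.move_to_end(cfg)  # hit: mark as most recently used
        else:
            loads += 1  # miss: full configuration load over the memory bus
            cache[cfg] = None
            if len(cache) > cache_slots:
                cache.popitem(last=False)  # evict the least recently used entry
    return loads


if __name__ == "__main__":
    load_s = config_load_time_s(512 * 1024, 128, 1e9)
    cpu_switch_s = 5e-6  # the slide's estimate for a CPU context switch
    print(f"full RFU configuration load: {load_s * 1e6:.3f} us")
    print(f"context switch incl. reload: {(cpu_switch_s + load_s) / cpu_switch_s:.2f}x")

    # Hypothetical sequence of kernels being switched onto the RFU
    seq = ["fir", "dct", "fir", "crc", "dct", "fir"]
    n = full_loads(seq, cache_slots=4)
    print(f"{n} of {len(seq)} configuration requests need a full load "
          f"({n * load_s * 1e6:.3f} us in total)")
```

The ratio printed for a single reload (about 1.8x) matches the slide's "~2x". The cache model shows why keeping several configurations on chip matters: only requests that miss pay the full memory-bus load.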