Novel Architectures
Nigel Topham
ICSA, School of Informatics

Overview
– Background
– My research interests
– ICSA research profile

10 Feb 2004 Team Talent Review - Nigel Topham

Brief bio
85-89: Lecturer in Computer Science, Edinburgh
90-91: Lead architect, ACRI, France
  – Multiprocessor GaAs supercomputer (Decoupled Access/Execute)
  – Compiler research followed
92-99: Back at Edinburgh
  – Scalable VLIW architectures for embedded and DSP apps
  – Novel compiler + micro-architecture techniques
  – Founding director of ICSA
  – Left to form a startup
99-03: Siroyan Ltd, co-founder and chief architect
  – Developed scalable VLIW DSP (OneDSP)
  – Delivered on time, as synthesised Verilog IP
  – Highly optimised compiler technology was key
  – Right-first-time 10-million-gate silicon in 0.15um (and also 0.18um)
  – Proven in TSMC and UMC foundries
  – BlueLogic program
  – VC problems in 2002 curtailed development, IP sold to Altera
03-03: ARC International, chief architect (ARC 600)

OneDSP overview
Scalable DSP + RISC
Configurable core: 2 to 16 MACs/cycle
Clustered VLIW architecture
  – Partitioned register files
  – Register rotation (for s/w pipelining)
  – Predicated execution (for s/w pipelining)
  – Extensive SIMD capabilities
Low power synthesisable core
  – 0.1 mW / MHz / MAC @ 0.13um
  – 250 MHz worst-case
Sophisticated compiler technology
  – Combined software pipelining and register allocation
  – Peak performance achievable through compiled DSP code
Announced at MPF’01, product released on schedule
[Figure: OneDSP core family SR-1C, SR-2C, SR-4C, SR-8C, annotated "50mW, 500 MIPS @ 250 MHz" and "200mW, 2GIPS @ 250 MHz"]

ARC 600
290 MHz worst-case, 0.13um ASIC process
Dimensions: 1.05mm x 1.87mm (2 sq.mm)
Challenge:
  – 50% higher clock frequency
  – 25% lower power consumption
  – 6 months, team of 4 (grew to 30 during verification)
Results:
  – Again completed on time
  – Announced MPF’03 (Oct’03)
  – All performance targets met
Novelty:
  – I-cache power saving techniques
  – Static branch prediction (inherently low power)
  – Coded for power-aware and physically-aware synthesis
  – 40 µW / MHz power consumption (CPU)
[Figure: ARC 600 pipeline diagram showing stages IF, DE, RF, EX, WB; 8 KB 4-way associative data cache; XY memory banks with AGUs; DSP stages DSP 1 to DSP 3 with 16x16 multipliers, rounding and saturation; FFT butterfly support]

Research interests
Mixture of Deeply Technical + Methodology issues
Micro-architecture
  – Low power (new techniques)
  – High performance (for low-power embedded systems)
  – Energy aware, adaptive micro-architecture
  – What are the 5-10 yr challenges?
Compiler optimisation
  – Compiler / architecture synergy: working together is better
  – Energy aware compilation: existing systems very inefficient
High-performance embedded systems
  – Configurable computing
  – NRE cost mandates flexibility in most scenarios
System-on-chip design
  – IP is expensive to produce, doubtful business model in isolation
  – Rarely trusted (re-verification problem)
  – How can eScience help?

ICSA research summary
Optimising compilers (Mike O’Boyle)
  – High-level restructuring
  – Iterative and adaptive compilation
QCD architecture simulation (Roland Ibbett, Tony Kennedy)
  – Design space exploration through architecture simulation
  – Hardware/software co-simulation for performance prediction
Speckled computing (Arvind)
  – Similar to Smart Dust concept
  – Scottish university consortium to develop demonstrators
Parallel computing
  – Structured parallelism (Murray Cole)
  – Compiler/architecture support for cellular multi-processors (IBM, Marcelo Cintra)
New EU Network of Excellence (almost) announced
  – High Performance Embedded Architecture and Compilation
  – ICSA is UK coordinator