Intel Single Chip Cloud Computer (SCC): An Overview
by Karthik.V.M.

Motivations for SCC [1]
- Many-core processor research
  - High-performance, power-efficient fabric
  - Fine-grain power management
  - Message-based programming support
- Parallel programming research
  - Better support for scale-out model servers
    - OS, communication architecture
  - Scale-out programming model for client
    - Programming languages, runtimes
Courtesy: Intel

SCC Feature Set
- First silicon with 48 IA cores on a single die
- Power envelope 125 W, core @ 1 GHz, mesh @ 2 GHz
- Message-passing architecture
  - No coherent shared memory
  - Proof of concept for a scalable many-core solution
- Next-generation 2D mesh interconnect
  - Bisection bandwidth 1.5 Tb/s to 2 Tb/s, average power 6 W to 12 W
- Fine-grain dynamic power management
Courtesy: Intel

SCC System Overview
(Figure. Courtesy: Intel)

Die Architecture
(Figure. Courtesy: Intel)

Voltage and Frequency Islands
(Figure. Courtesy: Intel)

Package and Test Board
(Figure. Courtesy: Intel)

Core & Router Fmax
(Figure. Courtesy: Intel)

SCC Platform Board Overview
(Figure. Courtesy: Intel)

SCC Software
- SCC-customized Linux
- Cross compilers for the Pentium processor available for C++ and Fortran
- Cross-compiled MPI2, including the iTAC trace analyzer, available
- C++ programming framework "baremetal C" available for creating bare-metal apps, OS, etc.
- Management Console PC software sccGui
Courtesy: Intel

Programmer's View of SCC
(Figure. Courtesy: Intel)

RCCE: A Small Library for Many-Core Communication [2][3]
- Compact, lightweight communication
- Research vehicle to see how message-passing APIs map to many cores
- One can work close to the hardware (e.g. manipulate the message-passing buffer, MPB)
- The same program executes on all cores
- Has MPI-style APIs and power management APIs
- Two API levels: gory and non-gory
- RCCE emulator
Courtesy: Intel

Software Managed Cache Coherence
- Implementing hardware-managed cache coherence is difficult
  - Limited power budget
  - High complexity and validation effort
- Software-managed coherence
  - Scales with the number of cores
  - Multiple apps running in separate coherency domains
  - Dynamically reconfigurable coherency domains
  - Most apps are RO-shared (read-only), few RW-shared (read-write)
Courtesy: Intel

Software Managed Cache Coherence (cont.)
- Shared virtual memory can be used to support coherency (like distributed shared memory, DSM)
- Coherency is maintained by regions being owned exclusively
- A region can then be handed over to another core for exclusive operation
- Some regions are jointly accessible
- No coherence traffic until ownership changes
- Consistency is guaranteed only at release/acquire points
Courtesy: Intel

Separated Coherency Domains
(Figure. Courtesy: Intel)

Multiple SCC Chips: Wider Coherency
(Figure. Courtesy: Intel)

References
[1] J. Howard et al., "A 48-core IA-32 message-passing processor with DVFS in 45nm CMOS," in 2010 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), Feb. 7-11, 2010, pp. 108-109.
[2] T. G. Mattson and R. F. van der Wijngaart, "RCCE: a small library for many-core communication," Intel Corporation, Tech. Rep., May 2010.
[3] T. G. Mattson, M. Riepen, T. Lehnig, P. Brett, W. Haas, P. Kennedy, J. Howard, S. Vangal, N. Borkar, G. Ruhl, and S. Dighe, "The 48-core SCC processor: the programmer's view," in Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, ser. SC '10. Washington, DC, USA: IEEE Computer Society, 2010, pp. 1-11. [Online]. Available: http://dx.doi.org/10.1109/SC.2010.53
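
Supplementary sketch: ownership hand-off
The Software Managed Cache Coherence slides describe regions that are owned exclusively, handed over between cores, and kept consistent only at release/acquire points. The toy Python model below illustrates that idea. It is a sketch under stated assumptions, not SCC or RCCE code: the names SharedRegion, Core, acquire, release, read, and write are hypothetical, and each core's non-coherent cache is approximated by a private list copy.

```python
class SharedRegion:
    """Toy model of one region of shared memory (hypothetical, not an SCC API)."""

    def __init__(self, size):
        self.memory = [0] * size  # backing store reachable by every core
        self.owner = None         # id of the core that owns the region, if any


class Core:
    """Toy core with a private, non-coherent copy of each region it touches."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.cache = {}  # SharedRegion -> private copy, possibly stale

    def acquire(self, region):
        """Take exclusive ownership and refresh the private copy."""
        if region.owner is not None:
            raise RuntimeError("region is already owned")
        region.owner = self.core_id
        self.cache[region] = list(region.memory)

    def release(self, region):
        """Write the private copy back and give up ownership."""
        self._require_owner(region)
        region.memory[:] = self.cache[region]
        region.owner = None

    def write(self, region, index, value):
        """Writes stay in the private copy until release()."""
        self._require_owner(region)
        self.cache[region][index] = value

    def read(self, region, index):
        """Reads use the private copy; no coherence traffic is generated."""
        if region not in self.cache:
            self.cache[region] = list(region.memory)
        return self.cache[region][index]

    def _require_owner(self, region):
        if region.owner != self.core_id:
            raise PermissionError("core does not own this region")


if __name__ == "__main__":
    region = SharedRegion(4)
    core0, core1 = Core(0), Core(1)
    core0.acquire(region)
    core0.write(region, 0, 42)
    print(core1.read(region, 0))  # 0: the write has not been released yet
    core0.release(region)
    print(core1.read(region, 0))  # 0: core1 still holds a stale copy
    core1.acquire(region)
    print(core1.read(region, 0))  # 42: acquire refreshed the copy
```

In this model a core that reads without acquiring may see stale data, which mirrors the slide's point that consistency is guaranteed only at release/acquire points and that no coherence traffic occurs until ownership changes.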