“Single-chip Cloud Computer”

A many-core research platform from Intel Labs

Tor Lund-Larsen

Intel Labs

Agenda

Introduction

Motivation for the SCC

Introducing the Intel “Single-chip Cloud Computer”

Introducing the SCC Co-Traveler program

Summary, Q & A

Intel in Braunschweig

• Intel acquisition in 2000

• Today: 100+ research scientists, engineers, and students

• Two main Intel groups on-site

– IAG: Intel Architecture Group (product development)

– IL: Intel Labs (research)

One of the largest Intel R&D sites in Europe

Intel Labs Braunschweig

Research Charter:

• Emulation technology; accelerate IA architecture & design

• “Tera-scale”/Many-Core processor prototyping

• Memory architectures; break the “memory wall”

• Joint research programs with European universities

Emulation infrastructure and technology

Tera-scale microprocessors

Memory architectures

Compute evolving to “Tera-Scale”

[Figure: application classes plotted by performance (KIPS, MIPS, GIPS, TIPS) against dataset size (kilobytes, megabytes, gigabytes, terabytes). Application labels: text; multimedia; 3D and video; model-based apps; entertainment, learning; financial analytics; personal media creation and management; health and medicine. Processor eras shown: single-core, multi-core, many-core (Intel tera-scale research).]

Tera-Scale Scaling Challenges

• Energy Efficiency

• Design Complexity

• Programming Models

• Emerging Applications

Single-chip Cloud Computer

• Experimental many-core CPU/Platform for “Tera-Scale” HW/SW research

• Many-core processor research: hardware

• Parallel programming research: software

• Research platform shared with industry and academic collaborators to enable/encourage “tera-scale” explorations

Energy Efficiency:

• Dynamic voltage/frequency scaling

• 1/3 power reduction for core-core I/O

Design Complexity:

• Array of small IA-based tiles could lead to more agile, flexible designs

Programming Model:

• Message-passing approach proven to scale to thousands of processors

Application Development:

• Sharing with Microsoft* & others for academic, industry innovation

The SCC Platform

• Debug tools – Memory Reader, SoftRAM, R/W SCC Config Registers

• Provides SW-based virtual I/O (e.g. performance widget)

• Konsole w/ SSH connections to all booted cores

• Command-line tools – sccBoot, sccReset, sccBMC, sccKonsole

Intel SCC – a complete HW/SW “Tera-Scale” research platform

A closer look

• 24 Dual-core tiles (48 IA cores)

• 24 Routers

• Mesh network with 256 GB/s bisection bandwidth

• 4 integrated DDR3 memory controllers

• 1.3 Billion transistors
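The tile and core counts above can be checked with a little arithmetic. The sketch below also estimates router-to-router hop counts. It assumes the 24 tiles form a 6 x 4 grid numbered row by row, that cores are numbered tile by tile, and that messages take a dimension-ordered path, none of which these slides state. All function names are hypothetical.

```python
# Assumptions (not from the slides): 6 x 4 tile grid, row-major tile numbering,
# cores numbered tile by tile, dimension-ordered (X then Y) routing.
COLS, ROWS = 6, 4
CORES_PER_TILE = 2  # from the slide: 24 dual-core tiles


def tile_of(core_id: int) -> int:
    """Return the tile hosting a core under the assumed numbering."""
    return core_id // CORES_PER_TILE


def coords(tile_id: int) -> tuple[int, int]:
    """Return the (column, row) position of a tile in the assumed grid."""
    return tile_id % COLS, tile_id // COLS


def hops(src_core: int, dst_core: int) -> int:
    """Router-to-router hops between two cores (Manhattan distance)."""
    sx, sy = coords(tile_of(src_core))
    dx, dy = coords(tile_of(dst_core))
    return abs(sx - dx) + abs(sy - dy)


if __name__ == "__main__":
    print("cores:", COLS * ROWS * CORES_PER_TILE)
    print("hops between opposite corners:", hops(0, 47))
```

Two cores on the same tile share a router, so under these assumptions their hop count is zero.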

Dual-core SCC Tile

[Figure: one tile containing Core 1 and Core 2, each with its own L2 cache, plus a message buffer and a router (R) connecting the tile to the mesh]

Dynamic Power Management

Fine-grain, software-controlled power management

• 8 voltage and frequency islands

• Dynamic range: 25-125 W

• Each tile can run at a different frequency

• 6 banks of 4 tiles can run at different voltages

• Independent V&F control for I/O network & MCs
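As a rough illustration of how per-tile frequencies and per-bank voltages interact, here is a minimal sketch. It assumes each bank's voltage must be high enough for the fastest tile in that bank. The frequency/voltage table, the tile-to-bank mapping, and all names are hypothetical and are not SCC specifications.

```python
TILES = 24
TILES_PER_BANK = 4  # from the slide: 6 banks of 4 tiles

# Hypothetical table of (max frequency in MHz, minimum voltage in V).
# Illustrative numbers only, not measured SCC operating points.
VF_TABLE = [(400, 0.7), (800, 0.9), (1000, 1.1)]


def min_voltage(freq_mhz: int) -> float:
    """Lowest voltage in the hypothetical table that supports freq_mhz."""
    for max_freq, volts in VF_TABLE:
        if freq_mhz <= max_freq:
            return volts
    raise ValueError(f"{freq_mhz} MHz exceeds the table")


def bank_of(tile_id: int) -> int:
    """Assumed mapping: consecutive tile ids share a voltage bank."""
    return tile_id // TILES_PER_BANK


def bank_voltages(tile_freqs: list[int]) -> dict[int, float]:
    """Pick one voltage per bank, driven by the fastest tile in that bank."""
    volts: dict[int, float] = {}
    for tile_id, freq in enumerate(tile_freqs):
        bank = bank_of(tile_id)
        volts[bank] = max(volts.get(bank, 0.0), min_voltage(freq))
    return volts


if __name__ == "__main__":
    freqs = [400] * TILES
    freqs[5] = 1000  # one busy tile raises the voltage of its whole bank
    print(bank_voltages(freqs))
```

In this model a single fast tile sets the voltage for its whole bank, which is why software would prefer to group busy work onto tiles that share a bank.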

[Figure: array of tiles, each with a router (R), grouped into voltage (V) and frequency (F) islands; 48 IA cores at 25-125 W]

Advancing Parallel SW Research

• The SCC eliminates significant complexity & power by removing hardware cache coherency

• Enables exploration of more scalable alternatives:

– Message passing models common in datacenter, HPC

– Software-managed, adaptive cache coherency
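To make the message-passing alternative concrete, here is a minimal sketch in which each simulated core keeps its data private and cooperates only through explicit messages. Python threads and `queue.Queue` mailboxes stand in for cores and on-chip message buffers; this is an illustration of the programming style and does not use the SCC messaging library. The function name `ring_sum` is hypothetical.

```python
import queue
import threading


def ring_sum(values: list[int]) -> int:
    """Sum one private value per simulated core by passing a token around a ring."""
    n = len(values)
    mailboxes = [queue.Queue() for _ in range(n)]  # one inbox per simulated core
    result: queue.Queue = queue.Queue()

    def core(rank: int) -> None:
        private = values[rank]  # only this core reads its own value
        if rank == 0:
            mailboxes[1 % n].put(private)      # start the token
            result.put(mailboxes[0].get())     # wait for it to come back
        else:
            partial = mailboxes[rank].get()    # blocking receive
            mailboxes[(rank + 1) % n].put(partial + private)  # forward

    threads = [threading.Thread(target=core, args=(r,)) for r in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result.get()


if __name__ == "__main__":
    print(ring_sum(list(range(48))))  # one value per core on a 48-core chip
```

No core reads another core's data directly, so nothing here relies on hardware keeping caches coherent; all coordination happens through the blocking send and receive calls.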

SCC Co-Traveler Program

Goal: Enable tera-scale research by industry and academic institutions

• Access to SCC systems, tools, documentation, open-source SW, support, etc.

• Access to an “ecosystem” of users, Intel-sponsored conferences, and SCC workshops

• Working with 100+ partners around the world

• Deployment in waves; currently executing wave #1

"We're very excited about Intel's SCC. In the Barrelfish project we are designing OS architectures for future multi-core and many-core systems. The chip's memory system and message passing support are a great fit for us, and it's an ideal vehicle for us to test and validate our ideas."

– Prof. Timothy Roscoe, ETH Zürich

"The upcoming Single-chip Cloud Computer is of great interest to application developers and tools researchers. The availability of the hardware will greatly accelerate our development of applications and tools for massively parallel computing platforms."

– Prof. Wen-Mei Hwu, University of Illinois, UPCRC@Illinois co-director

SCC Co-Traveling in action

• Financial analytics w/ shared virtual memory

• Microsoft Visual Studio

• Advanced power management

• JavaScript physics modeling

• HPC parallel workloads

• Hadoop web search

SCC Co-Traveler Timeline

[Timeline chart covering January to September]

• Introduction symposia/workshops: Santa Clara, Feb 12; Germany, Mar 16

• Research proposal process (1st wave): registration; applications deadline 15 Apr; notification 1st week of May

• Workshops and conferences: EWME, May 10-12; others TBD

• Documentation/SW availability: Overview, Messaging LIB, EAS, How To Use Linux; sample workflows; final docs; website active

• Platform availability: beta testers; general HW availability; datacenter for remote access

Summary:

• The many-core/tera-scale compute transition will happen!

• The Intel SCC is an experimental many-core research platform designed to help address the “tera-scale” HW and SW challenges:

Energy Efficiency:

• Dynamic voltage/frequency scaling

• 1/3 power reduction for core-core I/O

Design Complexity:

• Array of small IA-based tiles could lead to more agile, flexible designs

Programming Model:

• Message-passing approach proven to scale to thousands of processors

Application Development:

• Sharing with Microsoft* & others for academic, industry innovation

• Intel is making the SCC available worldwide to enable industry and academic research and innovation.

You are invited to join us!

Thank You!

• Visit our website: www.intel.com/info/scc

• Technical questions about the SCC platform and research: SCC_Technical_Questions@intel.com

Copyright © 2010, Intel Corporation. All Rights Reserved.

Partner Disclaimer: It is acknowledged that the use of the word "Partner" is a commonly used term in the technology industry to designate a marketing relationship between otherwise unaffiliated companies and is used in accordance with this common usage herein. The use of the word "Partner" herein shall not be deemed to nor is it intended to create a partnership, agency, joint venture or other similar arrangement between Intel and such partners and the employees, agents and representatives of one party shall not be deemed to be employees, agents or representatives of the other. Intel and the partners shall be deemed to be independent contractors and shall have no authority to bind each other.

Intel and the Intel logo are trademarks of Intel Corporation in the United States and other countries.

* Other names and brands may be claimed as the property of others.

Acknowledgements

Jim Held, Jason Howard, Saurabh Dighe, Yatin Hoskote, Sriram Vangal, David Finan, Gregory Ruhl, David Jenkins, Howard Wilson, Nitin Borkar, Gerhard Schrom, Fabrice Pailet, Shailendra Jain, Tiju Jacob, Satish Yada, Sraven Marella, Praveen Salihundam, Vasantha Erraguntla, Michael Riepen, Guido Droege, Joerg Lindemann, Matthias Gries, Thomas Apel, Kersten Henriss, Tor Lund-Larsen, Sebastian Steibl, Shekhar Borkar, Vivek De, Rob Van Der Wijngaart, Timothy Mattson

Extending Tera-scale Research

2006 Many-core Prototype: “Teraflops Research Processor”

• Many simple FP cores

• Validated tiled-design concept

• Tested HW limits of a mesh network

• Sleep capabilities at core and circuit level

• Lightweight message passing

• Limited programmability for basic benchmarks

• Primarily a circuit experiment

2009 Many-core Prototype: “Single-chip Cloud Computer”

• Many fully-functional IA cores

• Prototypes a tiled-design microprocessor

• Improved mesh with 3x performance/watt

• Dynamic voltage & frequency scaling

• Message passing & controlled memory sharing

• Full programmability for application research

• Circuit & software research vehicle

Intel Labs Braunschweig

• Design:

– IA core and the message-passing solution

– DDR3 memory controller design

• Validation:

– Logic validation of the complete chip

– FPGA emulation for pre-silicon SW prototyping

• Platform:

– Test-bed system validation and bring-up platform

• Software:

– A Linux OS

– Platform firmware, operational SW & drivers