Data Processing & the LCG


Data Processing and the LHC Computing Grid (LCG)

Jamie Shiers [Jamie.Shiers@cern.ch]

Database Group, IT Division

CERN, Geneva, Switzerland http://cern.ch/db/

Overview

• Brief overview / recap of LHC (emphasis on Data)

• The LHC Computing Grid

• The Importance of the Grid

• The role of the Database Group (IT-DB)

• Summary

Jamie Shiers LCG Data Processing 2

LHC Overview

CERN – European Organization for Nuclear Research

The LHC Machine

CMS

Data Rates:

• 1PB/s from detector

• 100MB/s – 1.5GB/s to ‘disk’

• 5-10PB growth / year

• ~3GB/s per PB of data

Data Processing:

• 100,000 of today’s fastest PCs
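The rates above imply some simple arithmetic, sketched here in Python. The decimal unit constants and the 365-day year are assumptions made for illustration; they are not figures from the slide.

```python
# Back-of-envelope check of the data rates quoted on the slide.
# Assumptions: decimal units (1 PB = 1e15 bytes) and a 365-day year.
MB, GB, PB = 1e6, 1e9, 1e15
SECONDS_PER_YEAR = 365 * 24 * 3600

detector_rate = 1 * PB                            # bytes/s from the detector
to_disk_low, to_disk_high = 100 * MB, 1.5 * GB    # bytes/s written to 'disk'

# Reduction factor needed between the detector and storage.
reduction_max = detector_rate / to_disk_low
reduction_min = detector_rate / to_disk_high

# Average write rate implied by 5-10 PB of growth per year.
avg_low = 5 * PB / SECONDS_PER_YEAR
avg_high = 10 * PB / SECONDS_PER_YEAR

print(f"reduction factor: {reduction_min:.1e} to {reduction_max:.1e}")
print(f"average rate for 5-10 PB/yr: {avg_low / MB:.0f} to {avg_high / MB:.0f} MB/s")
```

Under these assumptions the yearly growth corresponds to an average of roughly 160 to 320 MB/s, which sits inside the quoted 100MB/s to 1.5GB/s range.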


LHC: Higgs Decay into 4 muons

(+30 minimum bias events)

Reconstructed tracks with pt > 25 GeV

All charged tracks with pt > 2 GeV

Selectivity: 1 in 10^13

- 1 person in a thousand world populations

- A needle in 20 million haystacks
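The two analogies can be checked with integer arithmetic. This is only a sketch: reading each analogy as a simple division of 10^13 is an assumption, and the variable names are illustrative.

```python
# Rough check of the two analogies for a selectivity of 1 in 10^13.
selectivity = 10**13   # one selected event per this many collisions

# "1 person in a thousand world populations":
# world population that the analogy implies.
people_per_world = selectivity // 1000

# "A needle in 20 million haystacks":
# number of straws per haystack that the analogy implies.
straws_per_haystack = selectivity // 20_000_000

print(people_per_world)
print(straws_per_haystack)
```

The first figure comes out at 10^10 people per "world", the same order of magnitude as the real population; the second implies haystacks of half a million straws each.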

Data

• RAW: 1PB/yr (1PB/s prior to reduction!)

• ESD: 100TB/yr

• AOD: 10TB/yr

• TAG: 1TB/yr

[Diagram labels: Tier0, Tier1; seq. to random access; Users]
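The step from one data tier to the next can be made explicit with a few lines of Python. This is a sketch that assumes the 1PB/yr figure belongs to RAW, with ESD, AOD and TAG at 100TB, 10TB and 1TB per year; the dictionary layout is purely illustrative.

```python
# Yearly volume per data tier, in TB.
# Assumption: the 1PB/yr figure on the slide is the RAW volume.
tiers = {"RAW": 1000, "ESD": 100, "AOD": 10, "TAG": 1}

# Reduction factor between each tier and the next, more refined one.
names = list(tiers)
factors = {
    f"{upper}->{lower}": tiers[upper] / tiers[lower]
    for upper, lower in zip(names, names[1:])
}

total_tb = sum(tiers.values())

for step, factor in factors.items():
    print(f"{step}: x{factor:.0f}")
print(f"total per year: {total_tb} TB")
```

Each refinement step shrinks the data by a factor of ten, so the derived tiers together add only about 11% to the RAW volume.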

LHC Data Grid Hierarchy

CERN/Outside Resource Ratio ~1:2

Tier0/(Σ Tier1)/(Σ Tier2) ~1:1:1

• Experiment to Online System: ~PByte/sec

• Online System to Tier 0 +1: ~100-400 MBytes/sec

• Tier 0 +1: CERN, 700k SI95; ~1 PB Disk; Tape Robot

• Tier 0 +1 to Tier 1: ~2.5 Gbits/sec

• Tier 1: IN2P3 Center, RAL Center, INFN Center, FNAL (200k SI95; 600 TB)

• Tier 1 to Tier 2: 2.5 Gbps

• Tier 2: Tier2 Center

• Tier 2 to Tier 3: ~2.5 Gbps

• Tier 3: Institutes, with a physics data cache

• Tier 4: Workstations, 100 - 1000 Mbits/sec
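Two of the hierarchy figures can be cross-checked numerically. The sketch below assumes that CERN's share is the Tier0 term alone, that 1 TB is 1e12 bytes, and that a link runs at its full line rate with no protocol overhead; none of these assumptions is stated on the slide.

```python
# Capacity shares implied by Tier0 : sum(Tier1) : sum(Tier2) ~ 1:1:1.
tier0, tier1_total, tier2_total = 1.0, 1.0, 1.0

# Assumption: CERN's share is the Tier0 term only.
cern_to_outside = tier0 / (tier1_total + tier2_total)

# Time to move FNAL's quoted 600 TB over one 2.5 Gbit/s link.
# Assumptions: 1 TB = 1e12 bytes, full line rate, no protocol overhead.
link_bits_per_s = 2.5e9
volume_bytes = 600e12
transfer_days = volume_bytes * 8 / link_bits_per_s / 86400

print(f"CERN : outside = 1 : {1 / cern_to_outside:.0f}")
print(f"600 TB over 2.5 Gbps: about {transfer_days:.0f} days")
```

The 1:1:1 split reproduces the quoted CERN/outside ratio of ~1:2, and the transfer estimate of about three weeks shows why bulk data movement between tiers has to be planned rather than done on demand.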

HEP Data Analysis

• Physicists work on analysis “channels”

• Find collisions with similar features

• Physics extracted by collective iterative discovery – small groups of professors and students

• Each institute has ~10 physicists working on one or more channels

• Order 1000 physicists in 100 institutes in 10 countries


LHC Computing Characteristics

• Perfect parallelism

  - Independent events (collisions)

• Bulk of the data is read-only, in conventional files

  - New versions rather than updates

  - Meta-data (few %) in databases

• Very large aggregate requirements

  - computation, data, i/o

• Chaotic workload

  - unpredictable demand, data access patterns

  - no limit to the requirements
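Because events are independent and the input is read-only, a job can be split into chunks, run anywhere, and merged afterwards. The Python sketch below illustrates this; `reconstruct`, `process_chunk` and `split` are hypothetical helpers, the toy events are invented, and threads merely stand in for worker nodes (in CPython they would not speed up CPU-bound work).

```python
from concurrent.futures import ThreadPoolExecutor


def reconstruct(event):
    """Hypothetical per-event step: here, just sum the hit values."""
    return sum(event)


def process_chunk(events):
    # Each chunk is read-only input; its result depends on no other chunk.
    return [reconstruct(e) for e in events]


def split(items, n_chunks):
    """Split items into n_chunks contiguous, nearly equal slices."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


# Toy "events": lists of numbers standing in for detector hits.
events = [[i, i + 1, i + 2] for i in range(10)]

serial = process_chunk(events)

# Threads stand in for independent worker nodes.
with ThreadPoolExecutor(max_workers=4) as pool:
    parts = list(pool.map(process_chunk, split(events, 4)))
parallel = [result for part in parts for result in part]

print(parallel == serial)
```

The chunked run gives the same results as the serial one whatever the number of chunks, which is the property that lets the workload spread across many sites.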


The LHC Computing Grid (LCG) http://cern.ch/LCG/

From Mainframes to the Grid


The GRID Vision

[Diagram labels: Complex problem; Solution; GRID; Computing resources; Data; Knowledge; Instruments; People]

And Reality…

[Diagram labels: physicist; CERN Tier 0; CERN Tier 1; UK, USA, France, Italy, Japan, Germany, ……….; Lab a, Lab b, Lab c, Lab m; Uni a, Uni b, Uni n, Uni x, Uni y]

The opportunity of Grid technology

[Diagram labels: Tier 0 Centre at CERN; CERN Tier 0; CERN Tier 1; Tier2; Tier3 physics department; Desktop; CERN; UK, USA, Italy, Japan, Spain, Germany; Lab a, Lab b, Lab c, Lab m; Uni a, Uni n, Uni x, Uni y; ATLAS, CMS, LHCb; "grid for a"; "grid for a physics"]

The promise of Grid technology

Virtual Computing Centre

The user:

• sees the image of a single cluster

• does not need to know

  - where the data is

  - where the processing capacity is

  - how things are interconnected

  - the details of the different hardware

• is not concerned by the conflicting policies of the equipment owners and managers


The LHC Computing Grid Project

Goal: prepare and deploy the LHC computing environment

• applications support

  - develop and support the common tools, frameworks, and environment needed by the physics applications

• computing system

  - build and operate a global data analysis environment, integrating large local computing fabrics and high bandwidth networks, to provide a service for ~6K researchers in over ~40 countries

This is not yet another grid technology project; it is a grid deployment project

The LHC Computing Grid Project

Two phases

Phase 1 – 2002-05

• Development and prototyping

• Operate a 50% prototype of the facility needed by one of the larger experiments

Phase 2 – 2006-08

• Installation and operation of the full world-wide initial production Grid for all four experiments


Leveraging Other Grid Projects

• Many national, regional Grid projects

• significant R&D funding for Grid middleware

• scope for divergence

• global grids need standards

• the trick will be to recognise and be willing to migrate to the winning solutions

[Figure labels: US projects; European projects; CrossGrid]

LHC Computing Grid Project

The first Milestone within one year: deploy a Global Grid Service

• sustained 24 X 7 service

• including sites from three continents

• identical or compatible Grid middleware and infrastructure

• several times the capacity of the CERN facility

• and as easy to use

Having stabilised this base service – progressive evolution –

• number of nodes, performance, capacity and quality of service

• integrate new middleware functionality

• migrate to de facto standards as soon as they emerge


LCG Production

• CMS preparing for distributed data challenge, starting Q3 2003, ending Q1 2004

• Tier0 (CERN), 2-3 Tier1, 5-10 Tier2

• Total data volume ~100TB

• Need to be production ready with Grid Computing System and Applications by 1st July 2003
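The average data rate implied by the challenge can be estimated as follows. This is a sketch under stated assumptions: the challenge is taken to run from the first day of Q3 2003 to the last day of Q1 2004, 1 TB is taken as 1e12 bytes, and the data flow is taken to be uniform. The slide states none of these.

```python
from datetime import date

# Assumptions: start of Q3 2003 to end of Q1 2004, 1 TB = 1e12 bytes,
# and a uniform data flow over the whole period.
start = date(2003, 7, 1)
end = date(2004, 3, 31)
total_bytes = 100e12            # ~100TB total data volume

duration_days = (end - start).days
avg_mb_per_s = total_bytes / (duration_days * 86400) / 1e6

print(f"{duration_days} days, about {avg_mb_per_s:.1f} MB/s on average")
```

Averaged this way the rate is only a few MB/s, so the difficulty of such a challenge lies less in raw bandwidth than in sustaining a distributed production service across many sites.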


The Importance of the Grid

Birth of the Web

• Original proposal 1989

• 1992 – explosion inside HEP

• 1993 – explosion across the world

• Largely due to NCSA Mosaic browser

• Now totally ubiquitous: every firm must have a Website!


US Grid Projects

• NASA Information Power Grid

• DOE Science Grid

• NSF National Virtual Observatory

• NSF GriPhyN

• DOE Particle Physics Data Grid

• NSF TeraGrid

• DOE ASCI Grid

• DOE Earth Systems Grid

• DARPA CoABS Grid

• NEESGrid

• DOH BIRN

• NSF iVDGL


European Grid Projects

• UK e-Science Grid

• Netherlands – VLAM, PolderGrid

• Germany – UNICORE, Grid proposal

• France – Grid funding approved

• Italy – INFN Grid

• Eire – Grid proposals

• Switzerland - Network/Grid proposal

• Hungary – DemoGrid, Grid proposal

• Nordic Grid …

• SPAIN:


EU Grid Projects

• DataGrid (CERN, ..)

• EuroGrid (Unicore)

• DataTag (TTT…)

• Astrophysical Virtual Observatory

• GRIP (Globus/Unicore)

• GRIA (Industrial applications)

• GridLab (Cactus Toolkit)

• CrossGrid (Infrastructure Components)

• EGSO (Solar Physics)


IBM and the Grid

Interview with Irving Wladawsky-Berger

• ‘Grid computing is a set of research management services that sit on top of the OS to link different systems together’

• ‘We will work with the Globus community to build this layer of software to help share resources’

• ‘All of our systems will be enabled to work with the grid, and all of our middleware will integrate with the software’


Industrial Engagement?

‘Grid Computing is one of the three next big things for Sun and our customers’

Ed Zander, COO Sun

‘The alignment of OGSA with XML Web services is important because it will make Internet-scale, distributed Grid Computing possible’

Robert Wahbe, General Manager of Web Services, Microsoft

Oracle: starting Grid activities…


HP and Grids

The Grid fabric will be:

• Soft – share everything, failure tolerant

• Dynamic – resources will constantly come and go, no steady state, ever

• Federated – global structure not owned by any single authority

• Heterogeneous – from supercomputer clusters to P2P PCs

John Manley, HP Labs


The Role of the Database Group

CERN-IT-DB Provides…

• Database Infrastructure for CERN laboratory

• Currently based on Oracle – all sectors

• Applications support in certain key areas

• Oracle Application Server, Engineering Data Management Service

• Physics Data Management support

• Services for the LHC experiments, Applications, …

• Grid Data Management

• European DataGrid WP2 – Data Management

• Corresponding LCG Services

• LCG Persistency Project: POOL


Los Endos
