Data Processing and the LHC Computing Grid (LCG)
Jamie Shiers [Jamie.Shiers@cern.ch]
Database Group, IT Division
CERN, Geneva, Switzerland http://cern.ch/db/
• Brief overview / recap of LHC (emphasis on Data)
• The LHC Computing Grid
• The Importance of the Grid
• The role of the Database Group (IT-DB)
• Summary
Jamie Shiers LCG Data Processing 2
LHC Overview
CERN – European Organization for Nuclear Research
Data Rates:
• ~1 PB/s from the detector
• 100 MB/s – 1.5 GB/s to ‘disk’
• 5–10 PB growth / year
• ~3 GB/s per PB of data
Data Processing:
• ~100,000 of today’s fastest PCs
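To put these rates in perspective, a short back-of-the-envelope sketch (the rates are from the bullets above; the helper function and seconds-per-year constant are mine):

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # ~3.15e7 s

def annual_volume_pb(rate_mb_per_s):
    """Data volume in PB accumulated over one year at a sustained rate in MB/s."""
    return rate_mb_per_s * SECONDS_PER_YEAR / 1e9  # 1 PB = 1e9 MB

# Writing to 'disk' at the low end of the quoted range, 100 MB/s, for a
# full year already gives a few PB -- consistent with 5-10 PB/yr growth
# once higher rates and multiple experiments are included:
print(round(annual_volume_pb(100), 2))  # ~3.15 PB
```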
LHC: Higgs decay into 4 muons (+30 minimum-bias events)
Reconstructed tracks with pt > 25 GeV
All charged tracks with pt > 2 GeV
Selectivity: 1 in 10^13
- 1 person in a thousand world populations
- A needle in 20 million haystacks
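The analogies can be sanity-checked numerically; note the world-population and straws-per-haystack figures below are my own order-of-magnitude assumptions, not from the slide:

```python
selectivity = 1e-13  # 1 event in 10^13, from the slide

world_population = 6e9           # assumed, roughly right circa 2002
one_in_1000_populations = 1 / (1000 * world_population)
print(one_in_1000_populations)   # ~1.7e-13, same order as 1e-13

straws_per_haystack = 5e5        # assumed, order of magnitude only
needle_odds = 1 / (20e6 * straws_per_haystack)
print(needle_odds)               # 1e-13
```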
[Figure: per-experiment event data tiers – RAW ~1 PB/yr (1 PB/s prior to reduction!), ESD ~100 TB/yr, AOD ~10 TB/yr, TAG ~1 TB/yr; access ranges from sequential bulk processing at Tier0/Tier1 to random access by users]
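The tier volumes imply a roughly constant reduction factor from one tier to the next, which a small sketch makes explicit (the dictionary simply restates the figure's numbers):

```python
# Per-experiment yearly volumes by data tier, in TB (from the figure):
data_tiers = {
    "RAW": 1000,  # ~1 PB/yr (1 PB/s prior to online reduction!)
    "ESD": 100,   # Event Summary Data
    "AOD": 10,    # Analysis Object Data
    "TAG": 1,     # event tags for fast selection
}

names = list(data_tiers)  # dicts preserve insertion order (Python 3.7+)
for a, b in zip(names, names[1:]):
    print(f"{a} -> {b}: x{data_tiers[a] // data_tiers[b]} reduction")
# Each step down the chain is a factor ~10 in volume.
```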
[Figure: the LHC multi-tier computing model]
• Experiment online system (~1 PB/s at the detector) → Tier 0+1 at CERN at ~100–400 MB/s: 700k SI95, ~1 PB disk, tape robot
• Tier 1 regional centres (IN2P3, RAL, INFN; FNAL: 200k SI95, 600 TB), linked at ~2.5 Gb/s
• Tier 2 centres, also at ~2.5 Gb/s
• Tier 3: institutes with physics data caches
• Tier 4: workstations at 100–1000 Mb/s
• CERN/outside resource ratio ~1:2; Tier0 : (Σ Tier1) : (Σ Tier2) ~ 1:1:1
• Physicists work on analysis “channels”
• Find collisions with similar features
• Physics extracted by collective iterative discovery – small groups of professors and students
• Each institute has ~10 physicists working on one or more channels
• Order 1000 physicists in 100 institutes in 10 countries
Perfect parallelism
• independent events (collisions)
• bulk of the data is read-only – in conventional files
• new versions rather than updates
• meta-data (a few %) in databases
Very large aggregate requirements
• computation, data, I/O
Chaotic workload
• unpredictable demand, data access patterns
• no limit to the requirements
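Because events are independent, reconstruction is "embarrassingly parallel": each event can be processed on any node in any order, producing new outputs rather than updates. A minimal sketch (the toy events and trivial `reconstruct` stand-in are mine):

```python
from concurrent.futures import ThreadPoolExecutor

def reconstruct(event):
    # Stand-in for real per-event reconstruction (track fitting, clustering...).
    # Reads its input and produces a new result -- never updates shared state.
    return sum(event)

events = [[1, 2], [3, 4], [5, 6]]   # toy "collisions"
with ThreadPoolExecutor(max_workers=2) as ex:
    results = list(ex.map(reconstruct, events))  # order-preserving parallel map
print(results)  # [3, 7, 11]
```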
The LHC Computing Grid (LCG) http://cern.ch/LCG/
[Figure: the Grid as the solution to a complex problem – bringing together computing resources, data, instruments, people and knowledge]
[Figure: a physicist at Uni x reaches, through national labs and universities (Lab a/b/c, Uni a/b/…/n) in the UK, USA, France, Italy, Germany, Japan, …, the CERN Tier 1 and CERN Tier 0 centres]
The opportunity of Grid technology
[Figure: Tier 2 and Tier 3 sites (labs, university physics departments, desktops) in the UK, USA, Italy, Spain, Germany, Japan, … joined into virtual grids – per experiment (ATLAS, CMS, LHCb) and per physics group – around the Tier 0/Tier 1 centre at CERN]
The user sees the image of a single cluster and does not need to know:
• where the data is
• where the processing capacity is
• how things are interconnected
• the details of the different hardware
and is not concerned by the conflicting policies of the equipment owners and managers.
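One way to picture that transparency is a replica catalogue behind a single logical namespace; the sketch below is purely illustrative (the `Grid` class, file names and site names are hypothetical, not an actual LCG interface):

```python
class Grid:
    """Hypothetical facade: users name data logically; placement stays hidden."""

    def __init__(self):
        self._replicas = {}  # logical file name -> list of hosting sites

    def register(self, lfn, site):
        self._replicas.setdefault(lfn, []).append(site)

    def open(self, lfn):
        # The user never says *where*; the grid picks a replica
        # (here, trivially, the first one registered).
        site = self._replicas[lfn][0]
        return f"reading {lfn} from {site}"

grid = Grid()
grid.register("/lhc/run42/raw.root", "FNAL")
grid.register("/lhc/run42/raw.root", "CERN")
print(grid.open("/lhc/run42/raw.root"))  # reading /lhc/run42/raw.root from FNAL
```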
Goal – prepare and deploy the LHC computing environment
• applications support – develop and support the common tools, frameworks, and environment needed by the physics applications
• computing system – build and operate a global data analysis environment, integrating large local computing fabrics and high-bandwidth networks, to provide a service for ~6,000 researchers in ~40 countries
This is not yet another grid technology project – it is a grid deployment project.
Two phases
Phase 1 – 2002-05
• Development and prototyping
• Operate a 50% prototype of the facility needed by one of the larger experiments
Phase 2 – 2006-08
• Installation and operation of the full world-wide initial production Grid for all four experiments
Many national and regional Grid projects (US projects, European projects, CrossGrid, …)
• significant R&D funding for Grid middleware
• scope for divergence
• global grids need standards
The trick will be to recognise, and be willing to migrate to, the winning solutions.
The first milestone: within one year, deploy a global Grid service
• sustained 24 x 7 service
• including sites from three continents
• identical or compatible Grid middleware and infrastructure
• several times the capacity of the CERN facility
• and as easy to use
Having stabilised this base service – progressive evolution:
• number of nodes, performance, capacity and quality of service
• integrate new middleware functionality
• migrate to de facto standards as soon as they emerge
• CMS is preparing for a distributed data challenge, starting Q3 2003 and ending Q1 2004
• Tier0 (CERN), 2–3 Tier1 centres, 5–10 Tier2 centres
• Total data volume ~100 TB
• Need to be production-ready with the Grid Computing System and Applications by 1st July 2003
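A rough rate check on the challenge; the ~6-month duration below is my reading of "Q3 2003 to Q1 2004", so treat the result as order-of-magnitude only:

```python
total_tb = 100                                # total data volume from the slide
days = 180                                    # assumed: roughly two quarters
rate_mb_s = total_tb * 1e6 / (days * 86400)   # 1 TB = 1e6 MB
print(round(rate_mb_s, 1))  # ~6.4 MB/s sustained, averaged over the run
```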
The Importance of the Grid
• Original proposal 1989
• 1992 – explosion inside HEP
• 1993 – explosion across the world
• Largely due to NCSA Mosaic browser
• Now totally ubiquitous: every firm must have a Website!
• NASA Information Power Grid
• DOE Science Grid
• NSF National Virtual Observatory
• NSF GriPhyN
• DOE Particle Physics Data Grid
• NSF TeraGrid
• DOE ASCI Grid
• DOE Earth Systems Grid
• DARPA CoABS Grid
• NEESGrid
• DOH BIRN
• NSF iVDGL
• UK e-Science Grid
• Netherlands – VLAM, PolderGrid
• Germany – UNICORE, Grid proposal
• France – Grid funding approved
• Italy – INFN Grid
• Eire – Grid proposals
• Switzerland - Network/Grid proposal
• Hungary – DemoGrid, Grid proposal
• Nordic Grid …
• Spain:
• DataGrid (CERN, ..)
• EuroGrid (Unicore)
• DataTag (TTT…)
• Astrophysical Virtual Observatory
• GRIP (Globus/Unicore)
• GRIA (Industrial applications)
• GridLab (Cactus Toolkit)
• CrossGrid (Infrastructure Components)
• EGSO (Solar Physics)
Interview with Irving Wladawsky-Berger (IBM)
• ‘Grid computing is a set of resource management services that sit on top of the OS to link different systems together’
• ‘We will work with the Globus community to build this layer of software to help share resources’
• ‘All of our systems will be enabled to work with the grid, and all of our middleware will integrate with the software’
‘Grid Computing is one of the three next big things for Sun and our customers’
– Ed Zander, COO, Sun
‘The alignment of OGSA with XML Web services is important because it will make Internet-scale, distributed Grid Computing possible’
– Robert Wahbe, General Manager of Web Services, Microsoft
Oracle: starting Grid activities…
The Grid fabric will be:
• Soft – share everything, failure tolerant
• Dynamic – resources will constantly come and go, no steady state, ever
• Federated – global structure not owned by any single authority
• Heterogeneous – from supercomputer clusters to P2P PCs
John Manley, HP Labs
The Role of the Database Group
• Database infrastructure for the CERN laboratory
• Currently based on Oracle – all sectors
• Applications support in certain key areas
• Oracle Application Server, Engineering Data Management Service
• Physics Data Management support
• Services for the LHC experiments, Applications, …
• Grid Data Management
• European DataGrid WP2 – Data Management
• Corresponding LCG Services
• LCG Persistency Project: POOL