Replica Optimisation Within The EU DataGrid David Cameron e-Science Summer School

16 – 21 September 2002
Summary
• The need for Grid.
• Grid architecture.
• Replica management and optimisation through economic models.
• Grid simulation – OptorSim.
• Some results.
• Simulation demo.
The Large Hadron Collider
Complexity and CPU Requirements
Complex events:
• Large number of signals
• “good” signals are covered with background
Many events:
• 10^9 events/experiment/year
• 1-25 MB/event raw data
• several PB per year
→ Need world-wide:
• 7*10^6 SPECint95 (3*10^8 MIPS)
• Several PB of storage space
→ GRID computing
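As a rough cross-check of the figures above, the yearly raw data volume follows from multiplying the event count by the event size. The sketch below assumes decimal units (1 PB = 10^9 MB); the function and constant names are illustrative only.

```python
# Figures quoted on the slide; decimal units (1 PB = 10**9 MB) are an assumption.
EVENTS_PER_YEAR = 10**9        # events/experiment/year
RAW_MB_PER_EVENT = (1, 25)     # raw data size range, MB/event
MB_PER_PB = 10**9

def yearly_raw_volume_pb(events, mb_per_event):
    """Raw data volume per experiment per year, in PB."""
    return events * mb_per_event / MB_PER_PB

low, high = (yearly_raw_volume_pb(EVENTS_PER_YEAR, s) for s in RAW_MB_PER_EVENT)
print(f"{low:g} to {high:g} PB per experiment per year")
# prints: 1 to 25 PB per experiment per year
```

This is consistent with the "several PB per year" quoted on the slide.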
A Physics Event
• Gated electronics response from a proton-proton collision
    • Raw data: hit addresses, digitally converted charges and times
• Marked by a unique code:
    • Proton bunch crossing number, RF bucket
    • Event number
• Collected, Processed, Analyzed, Archived…
    • Variety of data objects become associated
• Event “migrates” through analysis chain:
    • may be reprocessed;
    • selected for various analyses;
    • replicated to various locations.
[Figure: ATLAS Barrel Inner Detector, H→bb event]
Data Structure
[Diagram: parallel processing chains for real and simulated data]
• Real data: Trigger System, Data Acquisition, Level 3 trigger; Raw Data and Trigger Tags; Run Conditions and Calibration Data; Reconstruction; Event Summary Data (ESD) and Event Tags
• Simulated data: Physics Models; Monte Carlo Truth Data; Detector Simulation; MC Raw Data; Reconstruction; MC Event Summary Data (ESD) and MC Event Tags
REAL and SIMULATED data required
Data Hierarchy
• RAW (~2 MB/event): Recorded by DAQ, triggered events. Detector digitisation.
• ESD (~100 kB/event): Reconstructed information. Pseudo-physical information: clusters, track candidates (electrons, muons), etc.
• AOD (~10 kB/event): Selected information. Physical information: transverse momentum, association of particles, jets, (best) id of particles, physical info for relevant “objects”.
• TAG (~1 kB/event): Analysis information. Relevant information for fast event selection.
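To make the size reduction at each level concrete, the per-event sizes can be scaled to a full year of data. This is a back-of-envelope sketch: it assumes 10^9 events per year (the figure quoted on the LHC slide), decimal units, and that every event is kept at every level. All names are illustrative.

```python
EVENTS_PER_YEAR = 10**9  # events/experiment/year, as quoted on the LHC slide

# Approximate per-event sizes from the slide, in kB (decimal units assumed).
SIZE_KB = {"RAW": 2000, "ESD": 100, "AOD": 10, "TAG": 1}
KB_PER_TB = 10**9

def yearly_volume_tb(level):
    """Yearly volume of one data level in TB, if every event is kept."""
    return EVENTS_PER_YEAR * SIZE_KB[level] / KB_PER_TB

for level in SIZE_KB:
    factor = SIZE_KB["RAW"] // SIZE_KB[level]
    print(f"{level}: ~{yearly_volume_tb(level):g} TB/year "
          f"(reduction factor vs RAW: {factor})")
```

Under these assumptions RAW comes to about 2000 TB (2 PB) per year, ESD to 100 TB, AOD to 10 TB and TAG to 1 TB.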
Physics Analysis
[Diagram: physics analysis data flow across the tiers, labelled “INCREASING DATA FLOW”]
• Data products shown: ESD (Data or Monte Carlo), Event Tags, Analysis Object Data (AOD), Calibration Data, Raw Data, Physics Objects
• Processing steps shown: Event Selection, Analysis, Skims, Physics Analysis
• Tiers: Tier 0,1 (Collaboration wide); Tier 2 (Analysis Groups); Tier 3, 4 (Physicists)
Tier-0 - CERN
• Commodity Processors + IBM (mirrored) EIDE Disks.
• Storage Systems.
• 2004 Scale: ~1,000 CPUs, ~1 PBytes
UK Tier-1 RAL
New Computing Farm
• 4 racks holding 156 dual 1.4GHz Pentium III CPUs.
• Each box has 1GB of memory, a 40GB internal disk and 100Mb ethernet.
Tape Robot
• Upgraded last year.
• Uses 60GB STK 9940 tapes.
• 45TB current capacity; could hold 330TB.
50TByte disk-based Mass Storage Unit
• 50TByte after RAID 5 overhead.
• PCs are clustered on network switches with up to 8x1000Mb ethernet out of each rack.
2004 Scale: 1000 CPUs, 0.5 PBytes
UK Tier-2 ScotGRID
• 59 IBM X Series 330, dual 1 GHz Pentium III with 2GB memory
• IBM X Series 370 PIII Xeon with 512 MB memory, 32 x 512 MB RAM
• 70 x 73.4 GB IBM FC Hot-Swap HDD
2004 Scale: 300 CPUs, 0.1 PBytes
Grid architecture
Replica management
Replica Manager
• copyFile()
• copyAndRegisterFile()
• listReplicas()
• deleteFile()
Replica Catalogue (LFN → PFNs)
• registerEntry()
• unregisterEntry()
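The relationship between the two components can be sketched as a toy in-memory model: the catalogue maps one logical file name (LFN) to the physical file names (PFNs) of its replicas, and the manager copies files and keeps the catalogue in step. This is not the EU DataGrid API. The method names mirror the slide in Python naming style, but every signature, the `storage` set standing in for real storage elements, and the example file names are assumptions made for illustration.

```python
from collections import defaultdict

class ReplicaCatalogue:
    """Toy LFN -> set of PFNs mapping (hypothetical, not the EDG catalogue)."""
    def __init__(self):
        self._entries = defaultdict(set)

    def register_entry(self, lfn, pfn):
        self._entries[lfn].add(pfn)

    def unregister_entry(self, lfn, pfn):
        self._entries[lfn].discard(pfn)
        if not self._entries[lfn]:      # drop LFNs with no replicas left
            del self._entries[lfn]

    def list_replicas(self, lfn):
        return sorted(self._entries.get(lfn, ()))

class ReplicaManager:
    """Toy manager; `storage` is a set of PFNs standing in for real sites."""
    def __init__(self, catalogue):
        self.catalogue = catalogue
        self.storage = set()

    def copy_file(self, source_pfn, dest_pfn):
        if source_pfn not in self.storage:
            raise FileNotFoundError(source_pfn)
        self.storage.add(dest_pfn)

    def copy_and_register_file(self, lfn, source_pfn, dest_pfn):
        self.copy_file(source_pfn, dest_pfn)
        self.catalogue.register_entry(lfn, dest_pfn)

    def list_replicas(self, lfn):
        return self.catalogue.list_replicas(lfn)

    def delete_file(self, lfn, pfn):
        self.storage.discard(pfn)
        self.catalogue.unregister_entry(lfn, pfn)

# Usage with made-up file names.
catalogue = ReplicaCatalogue()
manager = ReplicaManager(catalogue)
manager.storage.add("site1:/data/f1")
catalogue.register_entry("lfn:f1", "site1:/data/f1")
manager.copy_and_register_file("lfn:f1", "site1:/data/f1", "site2:/data/f1")
replicas = manager.list_replicas("lfn:f1")
```

After the copy, `replicas` holds both physical names for the single logical name.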
Submitting a Job
[Diagram: a User, the Scheduler, and the Grid containing Site 1, Site 2 and Site 3]
Replica Optimisation
Optimise use of computing, storage and network resources.
Short term optimisation:
• Minimise running time of current job.
• “Get me the files for my job as quickly as possible.”
Long term optimisation:
• Minimise running time of all jobs.
• “Make sure files are in the best places for all my future jobs.”
Optimisation Through Economic Models
• Files represent goods.
• Bought by Computing Elements for jobs.
• Bought and sold by Storage Elements to make “profit”.
• Investment decisions use a file's projected future value, estimated from previous file access patterns.
• Storage Elements can buy popular files independently of running jobs.
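The investment decision can be illustrated with a deliberately simple stand-in. The slides do not give the value-prediction function, so this sketch assumes that a file's value is its access count over a recent window of requests, that all files are the same size, and that a Storage Element buys a file only if it is worth more than the least valuable file it would have to drop. The function names and the `window` parameter are hypothetical.

```python
from collections import Counter

def predicted_value(file_id, history, window=10):
    """Stand-in estimate: accesses to file_id in the last `window` requests."""
    return Counter(history[-window:])[file_id]

def should_buy(candidate, stored, history, capacity, window=10):
    """Return (buy, evict): whether to replicate `candidate`, and what to drop."""
    if candidate in stored:
        return False, None
    if len(stored) < capacity:          # free space: buying costs nothing
        return True, None
    cheapest = min(stored, key=lambda f: predicted_value(f, history, window))
    if (predicted_value(candidate, history, window)
            > predicted_value(cheapest, history, window)):
        return True, cheapest           # candidate is worth more than cheapest
    return False, None

history = ["a", "b", "c", "c", "c", "a", "c"]
decision = should_buy("c", stored={"a", "b"}, history=history, capacity=2)
```

Here "c" has been requested four times against one request for "b", so the decision is to buy "c" and drop "b".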
Replica optimiser: architecture
• Access Mediator (AM): contacts replica optimisers to locate the cheapest copies of files and makes them locally available.
• Storage Broker (SB): manages files stored in the storage element, trying to maximise profit for the finite amount of storage space available.
• P2P Mediator (P2PM): establishes and maintains P2P communication between grid sites.
Auction Mechanism
Use Vickrey auction:
• Every seller makes a bid lower than the asking price.
• The file is bought from the lowest bidder at the second-lowest price.
Ensures:
• Low price for purchaser.
• Trading fairness.
• Minimal messaging.
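The pricing rule can be sketched as a second-price reverse auction: sellers submit prices, the lowest bid wins, and the winner is paid the second-lowest bid. The function name, the seller labels and the handling of a single bidder (who is paid their own bid here) are assumptions; the slides do not say what happens in that case.

```python
def vickrey_reverse_auction(bids):
    """bids: dict seller -> offered price. Returns (winner, price_paid)."""
    if not bids:
        raise ValueError("no bids")
    # Sort by price, then by seller name so that ties resolve deterministically.
    ranked = sorted(bids.items(), key=lambda item: (item[1], item[0]))
    winner, lowest = ranked[0]
    # Second-lowest price; with a single bidder fall back to their own bid.
    price = ranked[1][1] if len(ranked) > 1 else lowest
    return winner, price

result = vickrey_reverse_auction({"SE_A": 12.0, "SE_B": 9.5, "SE_C": 15.0})
# result == ("SE_B", 12.0): SE_B bid lowest and is paid the second-lowest bid
```

Because the price paid does not depend on the winner's own bid, a seller gains nothing by bidding above its true cost, which is the usual argument for the fairness of this auction form.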
OptorSim – a replica optimiser simulation
• Need to tune optimisation algorithms.
• Develop Grid simulation in Java.
• Input network configuration and files and jobs.
• Job: transfer the files defined in job description to CE running job.
OptorSim – a replica optimiser simulation
• Schedule to CE using CEcost = queueSize + accessCost
• Files requested according to access pattern:
    • Sequential
    • Random
    • Unitary random walk
    • Gaussian random walk
    • Zipf distribution (not yet implemented).
• No “processing” involved, only file transfer.
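The scheduling rule and the four implemented access patterns can be sketched as follows. This is not OptorSim code (OptorSim is written in Java). The cost formula is the one on the slide. The function names, the wrap-around at the ends of the file list, the random starting position and the default `sigma` are assumptions made for illustration.

```python
import random

def ce_cost(queue_size, access_cost):
    """CEcost = queueSize + accessCost, as on the slide (units left abstract)."""
    return queue_size + access_cost

def choose_ce(candidates):
    """candidates: dict CE name -> (queue_size, access_cost). Cheapest CE wins."""
    return min(candidates, key=lambda ce: (ce_cost(*candidates[ce]), ce))

def sequential(files, n):
    """Files in order, wrapping around at the end of the list."""
    return [files[i % len(files)] for i in range(n)]

def uniform_random(files, n, rng):
    """Each request picks any file with equal probability."""
    return [rng.choice(files) for _ in range(n)]

def unitary_random_walk(files, n, rng):
    """Each request moves one position up or down from the previous one."""
    i = rng.randrange(len(files))
    out = []
    for _ in range(n):
        out.append(files[i])
        i = (i + rng.choice((-1, 1))) % len(files)
    return out

def gaussian_random_walk(files, n, rng, sigma=2.0):
    """Each step is drawn from a Gaussian and rounded to a whole position."""
    i = rng.randrange(len(files))
    out = []
    for _ in range(n):
        out.append(files[i])
        i = (i + round(rng.gauss(0, sigma))) % len(files)
    return out

files = [f"file{i}" for i in range(10)]
rng = random.Random(0)                  # seeded so runs are repeatable
best = choose_ce({"CE1": (3, 10), "CE2": (1, 5)})   # CE2: cost 6 against 13
walk = unitary_random_walk(files, 20, rng)
```

The random-walk patterns keep successive requests close together in the file list, which is what gives a replication strategy some locality to exploit.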
OptorSim – a replica optimiser simulation
Input site policies and experiment data files (simplified CDF jobs).

Data Sample            Number of Files    Total Size (GB)
Central J/ψ            120                1200
High pt electrons      20                 200
Inclusive electrons    500                5000
Inclusive muons        140                1400
High Et photons        580                5800
Z0 -> b bbar           60                 600
Tested replication strategies:
• No replication
• Always Replicate, Delete Oldest File
• Always Replicate, Delete Least Accessed File
• Economic Model
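The three non-economic strategies differ only in whether a requested file is replicated locally and which file is deleted to make room. A minimal sketch follows, assuming equal-sized files, a capacity counted in file slots (at least one), and ties in access count broken by age. The class and policy names are hypothetical, and the economic model is not reproduced here.

```python
class StorageElement:
    """Fixed number of equal-sized file slots (a simplification)."""
    def __init__(self, capacity, policy):
        self.capacity = capacity      # number of slots, assumed >= 1
        self.policy = policy          # "none", "oldest" or "least_accessed"
        self.arrival = {}             # file -> time it was replicated here
        self.accesses = {}            # file -> access count since replication
        self.clock = 0
        self.remote_reads = 0

    def request(self, f):
        self.clock += 1
        if f in self.arrival:         # already held locally
            self.accesses[f] += 1
            return "local"
        self.remote_reads += 1
        if self.policy == "none":     # no replication: always read remotely
            return "remote"
        if len(self.arrival) >= self.capacity:
            if self.policy == "oldest":
                victim = min(self.arrival, key=self.arrival.get)
            else:                     # least accessed, oldest first on ties
                victim = min(self.accesses,
                             key=lambda x: (self.accesses[x], self.arrival[x]))
            del self.arrival[victim], self.accesses[victim]
        self.arrival[f] = self.clock
        self.accesses[f] = 1
        return "replicated"

def run(policy, requests, capacity=2):
    """Replay a request list and report the stored files and remote reads."""
    se = StorageElement(capacity, policy)
    for f in requests:
        se.request(f)
    return sorted(se.arrival), se.remote_reads

requests = ["a", "b", "a", "c"]
oldest = run("oldest", requests)                  # (["b", "c"], 3)
least_accessed = run("least_accessed", requests)  # (["a", "c"], 3)
no_replication = run("none", requests)            # ([], 4)
```

On this short request list the two deletion policies keep different files: "delete oldest" drops "a" even though it was just reused, while "delete least accessed" drops "b".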
Results
The economic model is 40% better for sequential access but no better for the other access patterns. This is expected, since the economic model is tuned for sequential access.
Future Work
• 3rd party replication
• SAM access patterns
• Integration: Optor, Reptor, Testbed
Conclusions
Simulation shows Eco Model successful.
Further simulation will help tune algorithms.
Integration into testbed code “soon”.
OptorSim
DEMO!