Replica Optimisation Within The EU DataGrid
David Cameron
e-Science Summer School, 16-21 September 2002

Summary
- The need for the Grid
- Grid architecture
- Replica management and optimisation through economic models
- Grid simulation: OptorSim
- Some results
- Simulation demo

The Large Hadron Collider

Complexity
- Complex events: a large number of signals, and the "good" signals are buried in background
- Many events: 10^9 events/experiment/year, 1-25 MB/event of raw data, several PB per year

CPU Requirements
- Needed world-wide: 7x10^6 SPECint95 (3x10^8 MIPS)
- Several PB of storage space
- Hence: Grid computing

A Physics Event
- Gated electronics response from a proton-proton collision
- Raw data: hit addresses, digitally converted charges and times
- Marked by a unique code: proton bunch crossing number, RF bucket, event number
- Collected, processed, analysed, archived...
- A variety of data objects become associated with it
- The event "migrates" through the analysis chain: it may be reprocessed, selected for various analyses, and replicated to various locations

Data Structure
- Real data: the trigger system, data acquisition and level-3 trigger produce Trigger Tags and Raw Data; reconstruction, using run conditions and calibration data, produces Event Summary Data (ESD) and Event Tags
- Simulated data: physics models produce Monte Carlo truth data; detector simulation produces MC raw data; reconstruction produces MC Event Summary Data and MC Event Tags
- Both REAL and SIMULATED data are required

Data Hierarchy
- RAW (~2 MB/event): recorded by DAQ; triggered events; detector digitisation
- ESD (~100 kB/event): reconstructed information; pseudo-physical information: clusters, track candidates (electrons, muons), etc.
- AOD (~10 kB/event): selected information; physical information: transverse momentum, association of particles, jets, (best) identification of particles; physical info for relevant "objects"
- TAG (~1 kB/event): analysis information; relevant information for fast event selection

Physics Analysis (data flow increasing down the tiers)
- Tier 0, 1 (collaboration-wide): raw data, calibration data, event tags, event selection; produces ESD (data or Monte Carlo) and Analysis Object Data (AOD)
- Tier 2 (analysis groups): analysis and skims on AOD; produces physics objects
- Tier 3, 4 (physicists): physics analysis

Tier-0: CERN
- Commodity processors + IBM (mirrored) EIDE disks; storage systems
- 2004 scale: ~1,000 CPUs, ~1 PB

UK Tier-1: RAL
- New computing farm: 4 racks holding 156 dual 1.4 GHz Pentium III CPUs; each box has 1 GB of memory, a 40 GB internal disk and 100 Mb ethernet
- Tape robot: 50 TB disk-based mass storage unit, upgraded last year; uses 60 GB STK 9940 tapes; 45 TB current capacity, could hold 330 TB after RAID 5 overhead
- PCs are clustered on network switches with up to 8x1000 Mb ethernet out of each rack
- 2004 scale: 1,000 CPUs, 0.5 PB

UK Tier-2: ScotGRID
- 59 IBM xSeries 330: dual 1 GHz Pentium III with 2 GB memory
- IBM xSeries 370: PIII Xeon with 512 MB memory, 32 x 512 MB RAM
- 70 x 73.4 GB IBM FC hot-swap HDD
- 2004 scale: 300 CPUs, 0.1 PB

Grid architecture

Replica Management
- Replica Manager: copyFile(), copyAndRegisterFile(), listReplicas(), deleteFile()
- Replica Catalogue: maps each logical file name (LFN) to its physical file names (PFNs); registerEntry(), unregisterEntry()

Submitting a Job
- Diagram: the user submits a job to the Scheduler, which dispatches it to one of the Grid sites (Site 1, Site 2, Site 3, ...)

Replica Optimisation
- Optimise use of computing, storage and network resources
- Short-term optimisation: minimise the running time of the current job. "Get me the files for my job as quickly as possible"
- Long-term optimisation: minimise the running time of all jobs. "Make sure files are in the best places for all my future jobs"

Optimisation Through Economic Models
- Files represent goods
- Bought by Computing Elements for jobs
- Bought and sold by Storage Elements to make "profit"
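The Replica Catalogue operations named earlier (registerEntry(), unregisterEntry(), listReplicas()) amount to maintaining an LFN-to-PFNs mapping. A minimal illustrative Java sketch follows; the class shape and the in-memory map are assumptions for illustration, not the actual EU DataGrid interfaces.

```java
import java.util.*;

// Illustrative sketch of a Replica Catalogue: maps each logical file
// name (LFN) to the set of physical file names (PFNs) of its replicas.
// Everything beyond the three operation names is an assumption.
public class ReplicaCatalogue {
    private final Map<String, Set<String>> entries = new HashMap<>();

    // Register a replica: record a PFN under the given LFN.
    public void registerEntry(String lfn, String pfn) {
        entries.computeIfAbsent(lfn, k -> new HashSet<>()).add(pfn);
    }

    // Unregister a single replica of a logical file.
    public void unregisterEntry(String lfn, String pfn) {
        Set<String> pfns = entries.get(lfn);
        if (pfns != null) {
            pfns.remove(pfn);
            if (pfns.isEmpty()) entries.remove(lfn);
        }
    }

    // List all known physical copies of a logical file.
    public Set<String> listReplicas(String lfn) {
        return entries.getOrDefault(lfn, Collections.emptySet());
    }
}
```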
- Investment decisions are based on projected future value, estimated from previous file access patterns
- Storage Elements can buy popular files independently of running jobs

Replica Optimiser: Architecture
- Access Mediator (AM): contacts replica optimisers to locate the cheapest copies of files and makes them locally available
- Storage Broker (SB): manages the files stored in a Storage Element, trying to maximise profit for the finite amount of storage space available
- P2P Mediator (P2PM): establishes and maintains P2P communication between Grid sites

Auction Mechanism
- Use a Vickrey auction: every seller makes a bid lower than the asking price; the file is bought from the lowest bidder at the second-lowest price
- This ensures a low price for the purchaser, trading fairness, and minimal messaging

OptorSim: A Replica Optimiser Simulation
- The optimisation algorithms need tuning, so we developed a Grid simulation in Java
- Input: network configuration, files and jobs
- A job transfers the files defined in its job description to the Computing Element (CE) running the job
- Jobs are scheduled to a CE using CEcost = queueSize + accessCost
- Files are requested according to an access pattern: sequential, random, unitary random walk, Gaussian random walk, or Zipf distribution (not yet implemented)
- No "processing" is involved, only file transfer
- Input: site policies and experiment data files (simplified CDF jobs)

  Data Sample           Number of Files   Total Size (GB)
  Central J/psi                     120              1200
  High pt electrons                  20               200
  Inclusive electrons               500              5000
  Inclusive muons                   140              1400
  High Et photons                   580              5800
  Z0 -> b bbar                       60               600

- Tested replication strategies: no replication; always replicate, delete oldest file; always replicate, delete least-accessed file

Economic Model Results
- The economic model is 40% better for sequential access but no better for the other patterns - as expected, since the model is tuned for sequential access

Future Work
- Third-party replication
- SAM access patterns
- Integration of Optor and Reptor into the testbed

Conclusions
- Simulation shows the economic model is successful
- Further simulation will help tune the algorithms
- Integration into testbed code "soon"

OptorSim DEMO!
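The auction mechanism described earlier is a reverse Vickrey auction: sites holding a replica bid a price, the lowest bidder supplies the file, and the buyer pays the second-lowest bid. A minimal sketch, with illustrative names that are not the actual OptorSim classes:

```java
// Sketch of the reverse Vickrey auction used for replica purchase:
// the lowest-bidding seller wins, but is paid the second-lowest bid.
public class VickreyAuction {
    // Returns {winningBidIndex, pricePaid}; needs at least two bids.
    public static double[] run(double[] bids) {
        if (bids.length < 2)
            throw new IllegalArgumentException("need at least two bids");
        int winner = 0;
        for (int i = 1; i < bids.length; i++)
            if (bids[i] < bids[winner]) winner = i;
        // Second-lowest bid: the minimum over all non-winning bids.
        double price = Double.MAX_VALUE;
        for (int i = 0; i < bids.length; i++)
            if (i != winner && bids[i] < price) price = bids[i];
        return new double[]{winner, price};
    }
}
```

Because the winner's payment does not depend on its own bid, each seller's best strategy is to bid its true cost, which is what gives the trading fairness claimed on the slide.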
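The scheduling rule CEcost = queueSize + accessCost can be sketched as picking the Computing Element with the smallest combined cost. The parallel-array representation and cost units here are assumptions for illustration:

```java
// Sketch of OptorSim-style job scheduling: each candidate CE has a
// queue of waiting jobs and an access cost for fetching the job's
// files; the job goes to the CE minimising CEcost = queueSize + accessCost.
public class Scheduler {
    public static int cheapestCE(int[] queueSizes, double[] accessCosts) {
        int best = 0;
        double bestCost = queueSizes[0] + accessCosts[0];
        for (int i = 1; i < queueSizes.length; i++) {
            double cost = queueSizes[i] + accessCosts[i];  // CEcost
            if (cost < bestCost) { bestCost = cost; best = i; }
        }
        return best;
    }
}
```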
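The two baseline replication strategies the economic model was tested against ("always replicate, delete oldest file" and "always replicate, delete least-accessed file") differ only in which file a full Storage Element evicts. A sketch, where the per-file bookkeeping is an assumption for illustration:

```java
import java.util.*;

// Sketch of a Storage Element's replacement policies: when space is
// needed, evict either the oldest replica or the least-accessed one.
public class StorageElement {
    static class FileInfo {
        final long arrival;   // logical time the replica was created
        int accesses;         // how often it has been read
        FileInfo(long arrival) { this.arrival = arrival; }
    }
    private final Map<String, FileInfo> files = new LinkedHashMap<>();
    private long clock = 0;

    public void store(String lfn)  { files.put(lfn, new FileInfo(clock++)); }
    public void access(String lfn) { files.get(lfn).accesses++; }

    // "Always replicate, delete oldest file"
    public String evictOldest() {
        return evict(Comparator.comparingLong(
                (Map.Entry<String, FileInfo> e) -> e.getValue().arrival));
    }

    // "Always replicate, delete least accessed file"
    public String evictLeastAccessed() {
        return evict(Comparator.comparingInt(
                (Map.Entry<String, FileInfo> e) -> e.getValue().accesses));
    }

    private String evict(Comparator<Map.Entry<String, FileInfo>> order) {
        String victim = Collections.min(files.entrySet(), order).getKey();
        files.remove(victim);
        return victim;
    }
}
```

The economic model replaces both rules with an eviction choice based on each file's projected future value, estimated from past access patterns.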