Exploitation of the StarLight/UKLight link for the CDF experiment
Valeria Bartsch, Nicola Pezzi, Mark Lancaster (University College London)
Contents:
● CDF, its physics & why we need StarLight
● Grid software of our collaboration
● exploitation of the StarLight link at UCL
CDF
● located at Fermilab, close to Chicago
● proton/antiproton collisions at the Tevatron at an energy of 1.96 TeV (at the moment the highest artificial collision energy in the world)
● multipurpose detector with discovery potential for the Higgs, studies of B physics and measurements of Standard Model parameters
● integrated luminosity of about 1 fb⁻¹ per year
● good tracking due to the silicon tracker, which can pinpoint displaced vertices
● calorimeters for energy measurements
UCL is engaged in the W width and W mass measurements (which shed light on the Higgs particle) and in Bs mixing.
1995: discovery of the top quark.
[Diagram: CDF computing model: raw data, reconstructed data, reconstructed MC and user data flow between central storage, the data handling services, central and remote farms, central and remote analysis systems, and user desktops.]
● The experiment has ~800 physicists, of which ~50 are in the UK.
● The experiment produces large amounts of data, which are stored in the US:
  ● ~1000 TB per year
  ● ~2000 TB stored to date; this is expected to rise to 10,000 TB by 2008
● UK physicists:
  ● need to be able to copy datasets (~0.5-10 TB) quickly to the UK
  ● reprocess this data (with better calibrations) and share it with other UK physicists and other CDF physicists worldwide
► Transfer enormous amounts of data needed for different activities (scalable)
► Don't want to know the details [where files sit, where jobs run] (helpful)
► … sometimes over large distances and with commodity hardware (robust)
Solution…
► a data handling and job management system
► maintains knowledge of what we are doing and what we did (monitoring and bookkeeping)
► maximises use of our resources (efficient)
► SAM is used by CDF for data handling; the CAF used to be our batch system but has been enhanced into a Grid system
Remote facilities: provide user analysis, MC generation and (for DZero) reprocessing; different stages of service: for users at their own institutes, for users of their own experiment, opportunistic use of Grid systems.
Central storage: dCache (developed in collaboration with DESY, Hamburg), Enstore tape robots, SAM (Sequential Access via Metadata) & Grid software.
Central systems: still the major facilities for user analysis; CDF: 1000 GHz CPU, DZero: …..
CDF: reprocessing farms
[Diagram: SAM station internals: producers/project managers/consumers, file storage clients and server, the station & cache manager, file stager(s) and workers, temp disk and cache disk, with data-flow and control paths to the MSS or other stations.]
SAM is a distributed data movement and management service: data replication is achieved by the use of disk caches during file routing.
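Purely as an illustration of this cache-based routing idea (invented Python, not the real SAM interface; the class, station names and paths are made up for the sketch):

```python
import os
import shutil

class StationCache:
    """Toy model of a SAM station's disk cache: a requested file is first
    looked up locally, otherwise it is pulled from the MSS or another
    station and left in the cache, so later consumers (and downstream
    stations) get a local replica for free."""

    def __init__(self, name, cache_dir, upstream=None):
        self.name = name
        self.cache_dir = cache_dir
        self.upstream = upstream          # MSS or another station
        os.makedirs(cache_dir, exist_ok=True)

    def local_path(self, filename):
        return os.path.join(self.cache_dir, filename)

    def deliver(self, filename):
        """Return a local path for the requested file, fetching and
        caching it on the way if necessary."""
        path = self.local_path(filename)
        if os.path.exists(path):
            return path                   # cache hit: no network traffic
        if self.upstream is None:
            raise FileNotFoundError(filename)
        source = self.upstream.deliver(filename)   # route via upstream cache
        shutil.copy(source, path)         # a replica now lives here too
        return path

# Example routing chain (illustrative paths): MSS -> StarLight station -> UCL station
mss = StationCache("fnal-mss", "/tmp/mss")
starlight = StationCache("starlight", "/tmp/starlight-cache", upstream=mss)
ucl = StationCache("ucl-hep", "/tmp/ucl-cache", upstream=starlight)
```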
CDF: 10k/20k files declared/day, 15k files consumed/day, 8 TB of files consumed/day; tests at selected SAM stations are starting; the main consumption of data is still central, but remote use is on the rise.
The CAF: submit and forget until receiving a mail. It does all the job handling and the negotiation with the data handling system without the user needing to know the details (sketched below).
Most monitoring is done via a regular web interface.
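As an illustration of the submit-and-forget pattern (hypothetical Python, not the real CAF code; submit_job and job_finished stand in for the actual Condor interaction, and the mail addresses are placeholders):

```python
import smtplib
import time
from email.message import EmailMessage

def submit_and_notify(submit_job, job_finished, user_email, poll_interval=300):
    """Submit a job, poll until it finishes, then mail the user.
    The user does not interact with the job again after submission."""
    job_id = submit_job()
    while not job_finished(job_id):
        time.sleep(poll_interval)         # the system, not the user, waits

    msg = EmailMessage()
    msg["Subject"] = f"CAF job {job_id} finished"
    msg["From"] = "caf@example.org"       # placeholder address
    msg["To"] = user_email
    msg.set_content("Your job has completed; output is in your output area.")
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)
```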
[Diagram: CAF architecture: Kerberos-authenticated Submitter, Monitor and Mailer daemons, an HTTPd/cgi-bin fetch for the web monitoring, and Condor worker nodes running the jobs.]
[Diagram: plain Condor pool: the Schedd holds the user jobs, the Negotiator assigns nodes to jobs using the user priorities kept by the Collector, and a Starter on each worker node runs a user job.]
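For illustration only (invented names; real Condor matchmaking also involves fair-share accounting, claims and ranks), a toy version of this negotiation cycle in Python:

```python
from collections import defaultdict

def negotiate(idle_jobs, free_nodes, user_priority):
    """Toy Condor-style negotiation cycle: hand out free nodes to the idle
    jobs of the best-priority users first.  idle_jobs is a list of
    (user, job_id) pairs, free_nodes a list of node names, and
    user_priority maps user -> priority value (lower = better, as in
    Condor's effective user priority)."""
    matches = []
    jobs_by_user = defaultdict(list)
    for user, job_id in idle_jobs:
        jobs_by_user[user].append(job_id)

    # Serve users in priority order, one job per node, until nodes run out.
    for user in sorted(jobs_by_user, key=lambda u: user_priority.get(u, 1e9)):
        for job_id in jobs_by_user[user]:
            if not free_nodes:
                return matches
            node = free_nodes.pop()
            matches.append((job_id, node))   # a Starter on `node` runs the job
    return matches

# Example: two users, three free worker nodes
print(negotiate([("alice", 1), ("bob", 2), ("alice", 3)],
                ["wn01", "wn02", "wn03"],
                {"alice": 0.5, "bob": 5.0}))
```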
[Diagram: direct Grid submission: the Schedd hands user jobs to Globus, which assigns Grid nodes to the jobs; all control sits on the Grid site, so the CAF monitoring would need to be reimplemented.]
[Diagram: Condor glide-ins: Globus assigns Grid nodes to VOs by starting glide-ins; once running, the glide-ins join the Condor pool, so the Negotiator again assigns the nodes to user jobs from the Schedd, with the Collector holding user priorities and Starters on the Grid nodes running the user jobs.]
[Diagram: GlideCAF: the Glidekeeper monitors nr_waiting_jobs and nr_queued_glideins and submits glide-ins through Condor-G to the Globus service using a CDF service proxy (GSI); user jobs then run under the CDF user's proxy in Starters on the Grid nodes, which report back to the Collector and the Monitor.]
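The Glidekeeper logic amounts to keeping enough glide-ins queued to cover the waiting jobs. A minimal sketch, assuming hypothetical helpers count_waiting_jobs, count_queued_glideins and submit_glidein in place of the real Condor queries and the Condor-G/Globus submission:

```python
import time

def glidekeeper_loop(count_waiting_jobs, count_queued_glideins,
                     submit_glidein, max_glideins=200, period=60):
    """Toy GlideCAF-style control loop: whenever there are more waiting
    user jobs than glide-ins already queued at the Grid site, submit
    additional glide-ins (pilot Condor daemons), up to a cap.  Each
    glide-in, once it starts on a Grid node, joins the CAF Condor pool
    and pulls a user job in the normal way."""
    while True:
        nr_waiting_jobs = count_waiting_jobs()
        nr_queued_glideins = count_queued_glideins()
        deficit = nr_waiting_jobs - nr_queued_glideins
        for _ in range(max(0, min(deficit, max_glideins - nr_queued_glideins))):
            submit_glidein()      # via Condor-G -> Globus, CDF service proxy
        time.sleep(period)
```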
Goals (taken from an earlier presentation by Mark):
● achieve 500 Mbit/sec between the CDF data store @ FNAL and UCL
● set up a large CPU/disk facility at UCL to utilise this I/O for:
  ● simulated data (MC) production
  ● [real CDF data reprocessing => not possible because there is no storage here]
  ● CDF data analysis
● open up the facility to the 400+ CDF users
● connect Liverpool to FNAL via UCL
StarLight / UKLight link
● 10 Gbit/sec link between Fermilab and UCL
● 1 Gbit/sec dedicated to CDF
● part of it is the UKLight link, in the US the StarLight link
● as usual, the real data transfer rate is limited in the last 10 metres
Beware of fishing boats!
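Why tuning matters at all: the TCP window needed to keep such a long pipe full is the bandwidth-delay product. Assuming an illustrative FNAL-UCL round-trip time of about 100 ms (an assumed figure, not a measurement):

```python
# Bandwidth-delay product: the TCP window needed to keep the pipe full.
rate_bits_per_s = 500e6          # target: 500 Mbit/sec
rtt_s = 0.100                    # assumed FNAL <-> UCL round-trip time

window_bytes = rate_bits_per_s * rtt_s / 8
print(f"required TCP window ~ {window_bytes / 1e6:.2f} MB")   # 6.25 MB

# Default Linux TCP buffers of the era were well below this, hence the
# kernel tuning and/or parallel GridFTP streams described below.
```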
● finalised the optimum disk, kernel and gridftp configurations necessary to maximise and monitor (web100) throughput (see the sketch after this list)
● sustained the milestone rate of 500 Mbit/sec between the CDF data store and UCL disks
● finalised the configurations and modifications to the CDF software necessary to submit CDF jobs to LCG/GridPP resources
● ran CDF jobs under LCG on the UCL HEP cluster
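A hedged sketch of the kind of GridFTP invocation and kernel settings involved (the URLs, stream count and buffer sizes are illustrative assumptions, not the values actually deployed at UCL):

```python
import subprocess

# Illustrative GridFTP transfer with parallel streams and a large TCP
# buffer (globus-url-copy's -p and -tcp-bs options); the source and
# destination URLs and the numbers are placeholders.
subprocess.run([
    "globus-url-copy",
    "-p", "8",                    # parallel TCP streams
    "-tcp-bs", "8388608",         # 8 MB TCP buffer per stream
    "gsiftp://fnal-door.example.gov/pnfs/cdf/dataset/file.root",
    "file:///data/cdf/file.root",
], check=True)

# Matching kernel tuning (Linux sysctls) to allow large TCP windows,
# again with illustrative values:
#   sysctl -w net.core.rmem_max=16777216
#   sysctl -w net.core.wmem_max=16777216
#   sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
#   sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"
```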
● Starlight cluster: 2 SAM stations with 1 TB RAID0 each, connected via the 1 Gbit/sec StarLight link
● UCL HEP cluster: SAM station, CAF, ~20 worker nodes
● UCL CCC cluster: SAM station, CAF, ~1000 worker nodes
● Liverpool cluster: SAM station, CAF, ~1000 worker nodes
● Liverpool and the CCC cluster are Tier-2 centres in LCG terminology
[Diagram: current status: StarLight link (1 Gbit/sec), Starlight cluster (2 SAM stations with 1 TB RAID0 each) and UCL HEP cluster (SAM station, CAF, ~20 worker nodes), with some components marked "not yet deployed".]
● CCC cluster: not yet deployed
● Liverpool: not yet forwarding data
● if possible, monitoring from our GlideCAF at uclhep
For comparison: last year's plans & our progress.
CDF plans:
● utilise the 100 GHz farm at UCL and then >1000 GHz across London in the GridPP/ATLAS Tier-2 centres
● complete "Gridification" of the CDF software infrastructure, moving to:
  ● a file catalog using the FNAL "SAM" product (done)
  ● file transfer using GridFTP (done)
  ● job submission (JIM) that interfaces to PBS etc. (using the CAF interface now)
● develop an interface to LCG to exploit UK CPU resources (done)
● ultimate aim:
  ● send data from FNAL to multiple UK sites via StarLight
  ● reprocess data at multiple UK sites and have the data transparently visible to other CDF users worldwide
  ● transfer data back to FNAL over StarLight and update the catalog
[Timeline 01/06, 06/06, 12/06: improve data transfer rates; CDF s/w @ UCL HEP; CDF s/w @ UCL CCC; data transfer to Liverpool.]
● porting of the CDF environment from UCL HEP to UCL CCC, and subsequent maintenance of this environment
● establishing UK/StarLight connectivity from UCL CCC to FNAL/CDF
● provision of sufficient disk space to make this attractive to CDF colleagues, and thus a success beyond high transfer rates:
  • requires ~20-50 TB; we had hoped to use the UCL SAN project for this, but it is severely delayed
  • instead, user analysis jobs will copy data as needed from FNAL
  • mitigate the lack of disk space by providing more CPU via Liverpool
● availability of the link:
  • CDF usage will require on-demand push and pull of data to & from FNAL
  • CDF will take/analyse data until 2010; what happens at the end of ESLEA?