Exploitation of the StarLight/UKLight link for the CDF experiment
Valeria Bartsch, Nicola Pezzi, Mark Lancaster
University College London
Content:
● CDF, physics & why we need StarLight
● Grid software of our collaboration
● Exploitation of the StarLight link at UCL
CDF
● located at Fermilab, close to Chicago
● proton/antiproton collisions at the Tevatron at a centre-of-mass energy of 1.96 TeV
(at the moment the highest artificial collision energy in the world)
● multipurpose detector with discovery potential for the Higgs, studies of b physics and measurement of Standard Model parameters
● luminosity of about 1 fb^-1 per year
● good tracking due to a silicon tracker which can pinpoint displaced vertices
● calorimeters for energy measurements
UCL is engaged in the W width and W mass measurements, which shed light on the Higgs particle
Bs mixing
1995: discovery of the top quark
[Diagram: CDF computing model. Components: Remote Farms, Central Farms, Data Handling Services, User Desktops, Central Storage, Remote Analysis Systems, Central Analysis Systems. Data types: Raw Data, Reco Data, Reco MC, User Data.]
● The experiment has ~800 physicists, of which ~50 are in the UK.
● The experiment produces large amounts of data, which are stored in the US:
   • ~1000 TB per year
   • ~2000 TB of data stored to date, expected to rise to 10,000 TB by 2008
● UK physicists:
   • need to be able to copy datasets (~0.5-10 TB) quickly to the UK
   • reprocess this data (with better calibrations) and share it with other UK physicists and other CDF physicists worldwide
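As a rough illustration of what "quickly" means for datasets of this size, the sketch below converts a dataset size and a sustained network rate into a transfer time. It reads the dataset range quoted above as 0.5 to 10 TB (decimal terabytes); the three rates in the loop are illustrative assumptions, not measurements.

```python
def transfer_hours(size_tb: float, rate_mbit_s: float) -> float:
    """Hours needed to move size_tb terabytes at rate_mbit_s megabits per second."""
    bits = size_tb * 1e12 * 8               # decimal terabytes -> bits
    seconds = bits / (rate_mbit_s * 1e6)    # megabits per second -> bits per second
    return seconds / 3600.0


if __name__ == "__main__":
    # Dataset sizes as read from the slide; rates are assumed for illustration.
    for size_tb in (0.5, 10.0):
        for rate in (100.0, 500.0, 1000.0):
            hours = transfer_hours(size_tb, rate)
            print(f"{size_tb:5.1f} TB at {rate:6.0f} Mbit/s: {hours:7.1f} h")
```

At a sustained 500 Mbit/s, one terabyte takes roughly four and a half hours, so the largest datasets need about two days of uninterrupted transfer.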
► Transfer enormous amounts of data needed for different activities (scalable)
► … sometimes over large distances and with commodity hardware (robust)
► Don't want to know the details [where files sit, where jobs run] (helpful)
► Maintain knowledge of what we are doing and what we did (monitoring and bookkeeping)
► Maximize use of our resources (efficient)
Solution…
► A data handling and job handling system
► SAM is used by CDF; CAF used to be our old batch system but has been enhanced into a Grid system
Remote Facilities
● provide user analysis, MC generation, reprocessing for DZero
● different stages of service: for users at their own institutes, for users of their own experiment, opportunistic use of Grid systems
Central Storage
● dCache: developed in collaboration with DESY (Hamburg)
● enstore robots
Sequential Access via Metadata (SAM) & Grid software
Central Systems
● still the major facilities for user analysis
● CDF: 1000 GHz CPU, DZero: …..
● CDF: reprocessing farms
[Diagram: SAM station architecture. Components: Producers/Consumers, Project Managers, Temp Disk, Cache Disk, File Storage Clients, File Storage Server, Station & Cache Manager, File Stager(s), eworkers, MSS or other station. Legend: data flow, control.]
SAM is a distributed data movement and management service: data replication is achieved by the use of disk caches during file routing.
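The idea of replication through disk caches during file routing can be sketched in a few lines. The class below is a hypothetical stand-in, not the SAM API: each station keeps a small least-recently-used cache, and a file requested at a remote station is pulled through every station on the route, leaving a replica in each cache it passes.

```python
from collections import OrderedDict


class StationCache:
    """Hypothetical station with a fixed-size LRU disk cache (not the SAM API)."""

    def __init__(self, name, capacity, upstream=None, storage=None):
        self.name = name
        self.capacity = capacity        # maximum number of cached files
        self.upstream = upstream        # next station on the route, or None
        self.storage = storage or {}    # stand-in for mass storage (MSS)
        self.cache = OrderedDict()      # file name -> contents, oldest first

    def get(self, filename):
        # Cache hit: mark the file as recently used and serve it locally.
        if filename in self.cache:
            self.cache.move_to_end(filename)
            return self.cache[filename]
        # Cache miss: route the request upstream, or read from mass storage.
        if self.upstream is not None:
            data = self.upstream.get(filename)
        else:
            data = self.storage[filename]   # raises KeyError for unknown files
        # Keep a replica here, evicting the least recently used file if full.
        self.cache[filename] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)
        return data


if __name__ == "__main__":
    central = StationCache("central", capacity=2, storage={"a": "A", "b": "B"})
    remote = StationCache("remote", capacity=2, upstream=central)
    remote.get("a")
    print(sorted(remote.cache), sorted(central.cache))
```

After the single request in the usage example, the file is cached at both stations, so a second request at either station is served without touching mass storage.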
► SAM usage at FNAL (CDF):
● 10k/20k files declared per day
● 15k files consumed per day
● 8 TByte of files consumed per day
● main consumption of data is still central; remote use is on the rise
[Plot annotations: tests; selected SAM stations start]
● CAF: submit and forget until receiving a mail
● Does all the job handling and negotiation with the data handling system without the user knowing
● Most monitoring via a regular Web interface
[Diagram: CAF components: HTTPd (cgi-bin fetch), Kerberos, Monitor, Mailer, Submitter, Condor, worker nodes]
[Diagram: Condor-based CAF: Schedd (user jobs), Collector, Negotiator (user priorities), Starters running user jobs on the worker nodes]
● Negotiator assigns nodes to jobs
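The matchmaking step shown in the diagram, where the Negotiator assigns nodes to jobs according to user priorities, can be illustrated with a minimal sketch. This is not Condor's actual algorithm; the function name, the data layout and the convention that a smaller priority number is served first are all assumptions made for the example.

```python
def negotiate(free_nodes, queued_jobs, user_priority):
    """Assign free nodes to queued jobs, best-priority users first.

    free_nodes:    list of node names
    queued_jobs:   list of (job_id, user) tuples in submission order
    user_priority: dict user -> number; smaller is served earlier (assumption)
    Returns a dict job_id -> node; jobs without a node stay queued.
    """
    # sorted() is stable, so jobs with equal priority keep submission order.
    ordered = sorted(queued_jobs, key=lambda job: user_priority[job[1]])
    # zip() stops at the shorter list, so only as many jobs start as nodes exist.
    return {job_id: node for (job_id, _user), node in zip(ordered, free_nodes)}


if __name__ == "__main__":
    jobs = [("j1", "alice"), ("j2", "bob"), ("j3", "alice")]
    priorities = {"alice": 2.0, "bob": 1.0}
    print(negotiate(["node1", "node2"], jobs, priorities))
```

With two free nodes and three queued jobs, the job of the better-ranked user starts first and the last job remains in the queue until another node is free.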
[Diagram: plain Globus submission: Schedd (user jobs), Globus, Grid nodes running user jobs]
● Globus assigns nodes to jobs
● All control on the Grid site
● Monitoring would need to be reimplemented
[Diagram: glidein-based submission: Schedd (user jobs), Collector, Negotiator (user priorities), Globus, glideins starting Starters on Grid nodes, which run the user jobs]
● Negotiator assigns nodes to jobs
● Globus assigns nodes to VOs
[Diagram: Glidekeeper (monitors nr_waiting_jobs and nr_queued_glideins), CondorG, Globus, CDF service proxy, Collector, Condor]
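The diagram indicates that the Glidekeeper monitors two quantities, nr_waiting_jobs and nr_queued_glideins. One plausible way to turn them into a submission decision is sketched below. The policy, the function name and the max_queued cap are assumptions for illustration; the slides do not state the actual rule.

```python
def glideins_to_submit(nr_waiting_jobs, nr_queued_glideins, max_queued=50):
    """Number of new glideins to request (hypothetical policy).

    Request enough glideins to cover the waiting jobs that are not already
    matched by a queued glidein, without exceeding max_queued in the queue.
    """
    shortfall = nr_waiting_jobs - nr_queued_glideins   # jobs with no glidein pending
    headroom = max_queued - nr_queued_glideins         # room left under the cap
    return max(0, min(shortfall, headroom))


if __name__ == "__main__":
    for waiting, queued in [(10, 3), (100, 20), (5, 8)]:
        print(waiting, queued, glideins_to_submit(waiting, queued))
```

Capping the number of queued glideins keeps the Grid site queue from filling with pilot jobs that would find no user job to run once they start.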
[Diagram: GSI security: CDF service proxy, CDF user, Globus, Starters]
Goals (taken from an earlier presentation by Mark):
● Achieve 500 Mbit/sec between the CDF data store @ FNAL and UCL
● Set up a large CPU/disk facility at UCL to utilise this I/O for:
   • simulated data (MC) production
   • [real CDF data reprocessing => not possible because no storage here]
   • CDF data analysis
● Open up the facility to the 400+ CDF users
● Connect Liverpool to FNAL over UCL
StarLight / UKLight link
● 10 Gbit/sec link between Fermilab and UCL
● 1 Gbit/sec dedicated to CDF
● part of it is the UKLight link; in the US it is the StarLight link
● as usual, the real data transfer rate is limited at the last 10 metres
Beware of fishing boats!
● finalised the optimum disk, kernel and GridFTP configurations necessary to maximise and monitor (web100) throughput
● sustained the milestone rate of 500 Mbit/sec between the CDF datastore and UCL disks
● finalised the configurations and modifications to CDF software necessary to submit CDF jobs to LCG/GridPP resources
● ran CDF jobs under LCG on the UCLHEP cluster
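One quantity that usually matters when tuning kernel TCP settings for a long-distance link is the bandwidth-delay product, the amount of data in flight that the socket buffers must be able to hold. The sketch below computes it; the slides do not say which parameters were tuned, and the 100 ms round-trip time is an assumed placeholder rather than a measured value for the FNAL to UCL path.

```python
def bdp_bytes(rate_mbit_s: float, rtt_ms: float) -> float:
    """Bandwidth-delay product in bytes for a given rate and round-trip time."""
    bits_in_flight = rate_mbit_s * 1e6 * (rtt_ms / 1e3)
    return bits_in_flight / 8


if __name__ == "__main__":
    ASSUMED_RTT_MS = 100.0   # hypothetical transatlantic round-trip time
    for rate in (500.0, 1000.0):
        mbytes = bdp_bytes(rate, ASSUMED_RTT_MS) / 1e6
        print(f"{rate:6.0f} Mbit/s, {ASSUMED_RTT_MS:.0f} ms RTT: {mbytes:.2f} MB in flight")
```

Under these assumptions a single TCP stream needs several megabytes of buffer to fill the link, which is far above typical default limits of that era and is one reason parallel GridFTP streams are commonly used.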
[Diagram: Liverpool cluster (SAM station, CAF, ~1000 worker nodes); CCC cluster (SAM station, CAF, ~1000 worker nodes); StarLight link (1 Gbit/sec); StarLight cluster (2 SAM stations with 1 TB RAID0 each); HEP UCL cluster (SAM station, CAF, ~20 worker nodes)]
Liverpool and CCC clusters: Tier2 centres in LCG terminology
[Diagram: current status: StarLight link (1 Gbit/sec); StarLight cluster (2 SAM stations with 1 TB RAID0 each); HEP UCL cluster (SAM station, CAF, ~20 worker nodes); remaining components marked "not yet deployed"]
● CCC cluster not yet deployed
● Liverpool: not yet data forwarding
● if possible, monitoring from our GlideCAF at uclhep
For comparison: last year's plans & our progress
CDF Plans
● Utilise the 100 GHz farm at UCL and then > 1000 GHz across London in the GridPP/ATLAS Tier2 centres.
● Complete "Gridification" of the CDF software infrastructure, moving to:
   • file catalog using the FNAL "SAM" product (done)
   • file transfer using GridFTP (done)
   • job submission (JIM) that interfaces to PBS etc. (using the CAF interface now)
● Develop an interface to LCG to exploit UK CPU resources (done)
● Ultimate aim:
   • Send data from FNAL to multiple UK sites via StarLight.
   • Reprocess data at multiple UK sites and have the data transparently visible to other CDF users worldwide.
   • Transfer data back to FNAL over StarLight and update the catalog.
[Timeline with ticks at 01/06, 06/06 and 12/06: improve data transfer rates; CDF s/w @ UCL HEP; CDF s/w @ UCL CCC; data transfer to Liverpool]
● porting of the CDF environment from UCL HEP to UCL CCC
● subsequent maintenance of this environment
● establishing UKLight/StarLight connectivity from UCL CCC to FNAL/CDF
● provision of sufficient disk space to make this attractive to CDF colleagues and thus a success beyond high transfer rates
   • requires ~20-50 TB
   • we hoped to use the UCL SAN project for this, but it is severely delayed
   • instead, user analysis jobs will copy data as needed from FNAL
● mitigate the lack of disk space by providing more CPU via Liverpool
● availability of the link
   • CDF usage will require on-demand push and pull of data, to & from FNAL
   • CDF will take/analyse data until 2010: what happens at the end of ESLEA?