ATLAS on UKLight

Large Hadron Collider
• The LHC will collide beams of protons at an energy of 14 TeV.
• Using the latest superconducting technologies, it will operate at about -271°C (1.9 K), just above absolute zero.
• With its 27 km circumference, the accelerator will be the largest superconducting installation in the world.
4 LHC Experiments
ATLAS
- General purpose: origin of mass, supersymmetry, micro black holes, where did the antimatter go?
- Also top quarks, the Standard Model, heavy-ion physics (quark-gluon plasma)
- 2,000 scientists from 34 countries
As featured in FHM!
LHC Data Challenges
Starting from this event…
…we are looking for this "signature".
Selectivity: 1 in 10^13
- Like looking for 1 person in a thousand world populations
- Or for a needle in 20 million haystacks!
A particle collision = an event. We need:
• Detectors to record
• Triggers to select
• Computing and software to process/reconstruct/simulate
• Computing, software & physicists to refine selection and analyse
Why ATLAS needs UKLight
• Most data is simulated/analysed away from CERN
• Need high bandwidth for file transfer: ~10s of PB/year (see the sketch below)
• Disc-to-disc SCP at 1 MB/s, or single-site analysis, is not an option
• A Grid is needed for particle physics
• LCG and UK GridPP are needed
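The need for a dedicated high-bandwidth path follows from simple arithmetic. A back-of-the-envelope sketch, assuming only the ~10 PB/year order of magnitude quoted above:

```python
# Back-of-the-envelope: why 1 MB/s disc-to-disc SCP cannot keep up with ~10 PB/year.
SECONDS_PER_YEAR = 365 * 24 * 3600

data_per_year_pb = 10                       # order of magnitude from the slide
data_per_year_mb = data_per_year_pb * 1e9   # 1 PB = 10^9 MB

scp_rate_mb_s = 1.0                         # single-stream disc-disc SCP
required_rate_mb_s = data_per_year_mb / SECONDS_PER_YEAR
years_to_copy = data_per_year_mb / scp_rate_mb_s / SECONDS_PER_YEAR

print(f"Sustained rate needed: ~{required_rate_mb_s:.0f} MB/s "
      f"(~{required_rate_mb_s * 8 / 1000:.1f} Gbps)")
print(f"At 1 MB/s, one year of data would take ~{years_to_copy:.0f} years to copy")
```

The sustained rate alone (~300 MB/s, several Gbps) is well beyond what a production campus link with firewalls can be expected to carry.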
Three Main Aims of the ATLAS ESLEA Collaborators
• Increasing the capability of large bulk file transfers over the WAN
• Real-time analysis of calibration/alignment data
• Monitoring links:
  - Real Time Monitor
  - WeatherMap
  - Archival history / billing info
Tier Model
• Tiered service varies the level of support/functionality
• Complex data flows
• UK has distributed Tier-2s comprising multiple university sites
• RAL is the UK Tier-1
• Lancaster is in the Tier-2 NorthGrid, with Manchester, Liverpool, Sheffield and Daresbury Lab
[Diagram: the Tier-0 feeds multiple Tier-1s, each of which serves a number of Tier-2 sites.]
Atlas Tier-1 Model and Data Flows
[Diagram: real data storage, reprocessing and distribution. RAW, ESD and AOD(m) datasets flow from the Tier-0 into the Tier-1 tape, disk buffer, CPU farm and disk storage, and onwards to the other Tier-1s and the Tier-2s. Each flow is annotated with file size (10 MB to 1.6 GB), file rate (0.004 to 0.2 Hz), files/day, MB/s (2 to 44 MB/s) and TB/day (0.16 to 3.66 TB/day). Simulation and analysis data flows come on top of this.]
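The per-flow figures in the diagram are internally consistent: the bandwidth is just file size times file rate, and the daily volumes follow directly. A minimal sketch reproducing the quoted RAW/ESD/AODm numbers (the diagram's own figures are rounded):

```python
# Reproduce the per-flow figures: files/day, MB/s and TB/day all follow from
# file size x file rate.
def flow(label: str, file_size_mb: float, rate_hz: float) -> None:
    files_per_day = rate_hz * 86400
    mb_per_s = file_size_mb * rate_hz
    tb_per_day = mb_per_s * 86400 / 1e6
    print(f"{label}: {files_per_day/1000:.2f}K f/day, "
          f"{mb_per_s:.0f} MB/s, {tb_per_day:.2f} TB/day")

flow("RAW  (1.6 GB/file @ 0.02 Hz)", 1600, 0.02)  # quoted: 1.7K f/day, 32 MB/s, 2.7 TB/day
flow("ESD2 (0.5 GB/file @ 0.02 Hz)", 500, 0.02)   # quoted: 1.7K f/day, 10 MB/s, 0.8 TB/day
flow("AODm (500 MB/file @ 0.04 Hz)", 500, 0.04)   # quoted: 3.4K f/day, 20 MB/s, 1.6 TB/day
```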
Testing of Model
• Day-to-day running & challenges
• Covers both the experiment software and the generic middleware stack
• Data Challenges were used by the experiments to test experiment software
  - "Can we actually simulate data and know how to store/analyse it?"
• Service Challenges through (W)LCG were used to test middleware and networks
  - "Can we provide the services that the experiments want/need?"
• A combination of service and data challenges towards full rates is ongoing and will continue to ramp up until full data taking in Spring '07
• The latest phase of ATLAS involvement started 19th June
Hardware Configuration
• Computing Element (CE) and other standard LCG/EGEE services
• Monitoring node
• EGEE User Interface (UI) node
• Storage Element (SE)
  - Storage Resource Manager (SRM) dCache
  - Head node
  - 6 pool nodes
  - 2 x 6 TB RAID5 storage arrays
Network Configuration
• LAN continually evolving
• Connected to both UKLight and the production network
• 100 Mbps limit to the University network for management/service communication
• Upgrading the SE-CE connection to 1 Gbps
• DHCP/DNS organised via the University/RNO
• Heavy use of static routes, with backup over the JANET (production) network (see the route sketch below)
[Diagram: the CE and production services sit on the production network, with routes to RAL and the rest of the world; the SE and UKLight endpoint connect through the Cisco 7609 router onto the UKLight network, alongside UKLight monitoring on a high-spec 2.4 TB RAID machine.]
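A minimal sketch of how such a static-route split might be expressed on the SE host. The interface name, gateway and remote subnet below are illustrative placeholders, not the actual site values: traffic for the remote transfer endpoints is pinned to the UKLight interface, while the default route stays on the production/JANET side as the backup path.

```python
# Sketch: pin traffic for the remote SE subnets to the UKLight interface, leaving
# the default route (JANET/production network) as the backup path.
# Subnet, gateway and device names are illustrative placeholders only; needs root.
import subprocess

UKLIGHT_DEV = "eth1"                  # dedicated UKLight-facing interface (placeholder)
UKLIGHT_GW = "10.10.10.1"             # next hop on the UKLight circuit (placeholder)
REMOTE_SUBNETS = ["192.0.2.0/24"]     # remote Tier-1 SE subnet (placeholder)

def add_static_routes() -> None:
    for subnet in REMOTE_SUBNETS:
        # Equivalent to: ip route add <subnet> via <gateway> dev <device>
        subprocess.run(
            ["ip", "route", "add", subnet, "via", UKLIGHT_GW, "dev", UKLIGHT_DEV],
            check=True,
        )

if __name__ == "__main__":
    add_static_routes()
```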
Software Configuration
• Scientific Linux 3 OS with the linux_2.4.21-40 kernel
• Basic TCP tuning to increase the default/max TCP window size and txqueuelen (sketched below)
• LCG software stack including:
  - dCache srmcp
  - globus-url-copy
  - EGEE File Transfer Service (FTS)
• Access to:
  - EGEE User Interface (UI) to initiate transfers
  - ATLAS Distributed Data Management (DDM)
• Monitoring:
  - Ping, Traceroute, Iperf, Pathload and Pathrate for line testing
  - MRTG for Cisco 7609 monitoring / input to the RTM/Weathermap
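A minimal sketch of the kind of "basic" TCP tuning referred to above on a 2.4-series Linux kernel. The actual values used on the hosts are not given in the slides; the buffer sizes and queue length below are illustrative only.

```python
# Sketch: raise the default/maximum TCP window sizes via /proc/sys and lengthen
# the interface transmit queue. Values are illustrative; requires root.
import subprocess

TCP_SETTINGS = {
    "/proc/sys/net/core/rmem_max": "16777216",              # max receive buffer (bytes)
    "/proc/sys/net/core/wmem_max": "16777216",              # max send buffer (bytes)
    "/proc/sys/net/ipv4/tcp_rmem": "4096 87380 16777216",   # min / default / max
    "/proc/sys/net/ipv4/tcp_wmem": "4096 65536 16777216",   # min / default / max
}

def tune(device: str = "eth1", txqueuelen: int = 10000) -> None:
    for path, value in TCP_SETTINGS.items():
        with open(path, "w") as f:
            f.write(value)
    # A longer transmit queue helps sustain a full 1 Gbps pipe on a high-RTT path.
    subprocess.run(["ifconfig", device, "txqueuelen", str(txqueuelen)], check=True)

if __name__ == "__main__":
    tune()
```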
Lancaster-RAL Link
• T1-T2 transfer testing
  - The bulk of T2 transfers will be to the T1
• 1 Gbps line
• Iperf/Pathrate/Pathload measurements close to the design rate (a line-test sketch follows below)
• Traceroute shows the path reduced from 12 hops to 2
• Avoids production-network-induced bottlenecks
  - 400 Mbps firewall at RAL
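As a minimal illustration of the memory-to-memory line testing that Iperf performs, here is a self-contained single-stream sketch; the port, payload size and test duration are placeholders, and the real tests used Iperf/Pathrate/Pathload rather than this script.

```python
# Sketch: crude memory-to-memory throughput test between two hosts, similar in
# spirit to a single-stream Iperf run.
# Usage: "python linetest.py server" on one host, "python linetest.py <host>" on the other.
import socket, sys, time

PORT = 5001
CHUNK = b"\0" * (1 << 20)   # 1 MB payload per send
DURATION = 30               # seconds of sending on the client side

def server() -> None:
    with socket.create_server(("", PORT)) as srv:
        conn, _ = srv.accept()
        total, start = 0, time.time()
        while (data := conn.recv(1 << 20)):
            total += len(data)
        secs = time.time() - start
        print(f"received {total/1e6:.0f} MB in {secs:.1f}s "
              f"-> {total * 8 / secs / 1e6:.0f} Mbps")

def client(host: str) -> None:
    with socket.create_connection((host, PORT)) as sock:
        end = time.time() + DURATION
        while time.time() < end:
            sock.sendall(CHUNK)

if __name__ == "__main__":
    server() if sys.argv[1] == "server" else client(sys.argv[1])
```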
File Transfer Tests
• Tested using:
  - The dCache version of GridFTP, srmcp
  - Also using an FTS server to control transfers
• Achieved:
  - Peak of 948 Mbps
  - Transferred 8 TB in 24 hours (900+ Mbps aggregate rate)
  - Transferred 36 TB in 1 week (500+ Mbps aggregate rate)
• Parallel file transfers increase the rate
  - Better utilisation of bandwidth
  - Staggered initialisation of transfers reduces the overhead from the start-up/cessation of individual transfers: rate increases from 150 Mbps to 900 Mbps (sketched below)
  - 2% (18 Mbps) reverse traffic flow for a 900 Mbps transfer
• File size affects the rate of transfer
• FTS transfers not yet as successful as srmcp-only transfers
  - Greater overhead
  - A single FTS file transfer gives 150 Mbps, the same as srmcp
  - 400 Mbps maximum for 10 concurrent FTS transfers
• All transfers are single stream
  - Single-stream rate varies from 150 to 180 Mbps as file size increases from 1 to 10 GB
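A minimal sketch of the staggered-start idea: several srmcp copies run concurrently, offset in time so that the slow start-up and tear-down phases of individual transfers overlap with the steady-state phase of others. The SURLs, concurrency and stagger interval below are illustrative placeholders, not the values used in the tests.

```python
# Sketch: run up to CONCURRENCY srmcp copies at once, staggering their start times
# so per-transfer start-up/tear-down overhead overlaps other transfers' steady state.
# SURLs, concurrency and stagger interval are illustrative placeholders.
import subprocess, threading, time

CONCURRENCY = 10
STAGGER_SECONDS = 30

def copy(src_surl: str, dst_surl: str) -> int:
    # One dCache srmcp transfer (GridFTP underneath).
    return subprocess.run(["srmcp", src_surl, dst_surl]).returncode

def staggered_transfers(file_pairs) -> None:
    threads = []
    for i, (src, dst) in enumerate(file_pairs):
        if i >= CONCURRENCY:
            threads[i - CONCURRENCY].join()     # keep at most CONCURRENCY in flight
        t = threading.Thread(target=copy, args=(src, dst))
        t.start()
        threads.append(t)
        time.sleep(STAGGER_SECONDS)             # offset the start-up phases
    for t in threads:
        t.join()
```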
RAL-CERN Link
• Link on UKLight as the initial connection
• To be replaced by the LCG Optical Private Network
  - Replacement via SJ5/Geant2
• UKLight to SARA; then Netherlight to CERN
• Support/manpower available for assistance
• 4 Gbps
Lancaster-SARA Link
• Link not yet active
• SARA capacity is underused; RAL capacity is currently too small for UK simulation storage
• SARA is also planned by ATLAS to act as the Tier-1 in case of RAL downtime (FTS, catalogues, etc.)
• Tests similar to the Lancaster-RAL tests
  - T1-T2 testing
  - Study of the effect of international/extended link length, alternate protocols, etc. (see the window-size sketch below)
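One reason link length matters for TCP throughput is the bandwidth-delay product: a single stream can only fill the pipe if its window covers bandwidth x RTT. A rough illustration; the RTT values below are illustrative, not measured figures for these links.

```python
# Bandwidth-delay product: the TCP window needed to fill a pipe grows with RTT,
# so longer (international) paths need much larger windows. RTTs are illustrative.
def window_needed_mb(bandwidth_gbps: float, rtt_ms: float) -> float:
    """TCP window (MB) needed to sustain the full rate on a single stream."""
    return bandwidth_gbps * 1e9 / 8 * (rtt_ms / 1e3) / 1e6

for label, rtt_ms in [("short UK path (e.g. Lancaster-RAL)", 6),
                      ("extended international path (e.g. Lancaster-SARA)", 20)]:
    print(f"{label}: ~{window_needed_mb(1.0, rtt_ms):.1f} MB window for 1 Gbps")
```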
Lancaster-Manchester Link
• Intra-Tier-2 site testing
• 1 Gbps
• Initial Iperf results of 220/800 Mbps with and without "basic" TCP tuning
• "Homogeneous distributed Tier-2"
  - dCache head node at Lancaster, pool nodes at both Lancaster and Manchester
• Test transfers to/from RAL
• Test of job submission to a close CE
• Possible testing of xrootd
Lancaster-Edinburgh Link
ƒ
ƒ
ƒ
ƒ
ƒ
ƒ
Link not active yet
UDT Protocol Testing
ESLEA box
6x400GB SATA RAID disks with 3Ware SATA
RAID controller
Details of software configuration TBC
Hand over to Barney::::::