The Computing System for the Belle Experiment
Ichiro Adachi
KEK
representing the Belle DST/MC production group
CHEP03, La Jolla, California, USA
March 24, 2003
• Introduction: Belle
• Belle software tools
• Belle computing system & PC farm
• DST/MC production
• Summary
Introduction
• Belle experiment
  – B-factory experiment at KEK
  – studies CP violation in the B meson system; started in 1999
  – recorded ~120M B meson pairs so far
  – the KEKB accelerator is still improving its performance
120 fb⁻¹: the largest B meson data sample at the Υ(4S) region in the world
Belle detector
[Figure: the Belle detector, with an example of event reconstruction (a fully reconstructed event)]
Belle software tools
• Home-made kits
  – “B.A.S.F.” for framework
    • Belle AnalySis Framework
    • unique framework for every step of event processing
    • modules loaded dynamically (see the sketch below)
    • event-by-event parallel processing on SMP
  – “Panther” for I/O package
    • unique data format from DAQ to user analysis
    • bank system with zlib compression
  – reconstruction & simulation library
    • written in C++
• Other utilities
  – CERNLIB/CLHEP…
  – Postgres for database
[Diagram: event flow in B.A.S.F. — input with Panther, then unpacking, calibration, tracking, vertexing, clustering, particle ID, and diagnosis modules (each loaded dynamically as a shared object), output with Panther]
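As a rough illustration of the module architecture described above, here is a minimal C++ sketch of a framework that chains pluggable event-processing modules. The class and method names (Module, begin_run, event) are hypothetical stand-ins, not the actual B.A.S.F. API.

// Hypothetical sketch of a B.A.S.F.-style module chain; the real
// framework's API differs. It illustrates reconstruction steps as
// pluggable modules driven by a common event loop.
#include <memory>
#include <vector>

struct Event { /* in Belle, Panther banks would live here */ };

class Module {                          // one event-processing step
public:
    virtual ~Module() = default;
    virtual void begin_run() {}         // e.g. fetch calibration constants
    virtual void event(Event& ev) = 0;  // process a single event
    virtual void end_run() {}
};

class Tracking : public Module {        // example step: track finding
public:
    void event(Event& /*ev*/) override { /* fit tracks from hit banks */ }
};

int main() {
    // B.A.S.F. would dlopen() shared objects named in a steering
    // script; here a single module is hard-wired for brevity.
    std::vector<std::unique_ptr<Module>> chain;
    chain.push_back(std::make_unique<Tracking>());

    std::vector<Event> run(3);          // stand-in for Panther input
    for (auto& m : chain) m->begin_run();
    for (Event& ev : run)
        for (auto& m : chain) m->event(ev);
    for (auto& m : chain) m->end_run();
}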
Belle computing system
[Diagram: computing network for batch jobs and DST/MC production — Sun computing servers (500 MHz × 4 CPUs, 38 hosts), work group servers (500 MHz × 4 CPUs, 9 hosts), PC farms, online tape servers (Fujitsu) with a 500 TB tape library (Compaq), and HSM servers with a 120 TB HSM library, all connected through GbE switches. The user analysis & storage system comprises 100 hosts at 1 GHz, an 8 TB file server, and 4 TB of disk. A Super-SINET 1 Gbps link connects KEK to university resources and user PCs at Tokyo, Nagoya, and Tohoku.]
Computing requirements
• Reprocess entire beam data in 3 months
  – Once reconstruction code is updated or constants are improved, fast turn-around is essential to perform physics analyses in a timely manner
• MC sample size at least 3 times larger than real data
  – Analyses are maturing, and understanding systematic effects in detail requires a sufficiently large MC sample
⇒ Added more PC farms and disks
PC farm upgrade
Total CPU (GHz) = CPU clock speed (GHz) × (# of CPUs per node) × (# of nodes)
[Chart: total CPU (GHz) from 1999.1 to 2003.1 in half-year steps, rising to ~1500 GHz — CPU power boosted for DST & MC production, with a large delivery in Dec. 2002 and more to come soon]
Total CPU has tripled in the last two years (a worked instance of the formula follows below).
60 TB (total) of disk has also been purchased for storage.
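To make the formula above concrete, here is a worked instance using the Compaq nodes listed on the next slide (0.7 GHz Xeons, 4 CPUs per host, 60 hosts):

\[
  0.7\,\mathrm{GHz} \times 4\ \mathrm{CPUs} \times 60\ \mathrm{nodes}
  = 168\,\mathrm{GHz}
\]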
Belle PC farm CPUs
– heterogeneous system from various vendors
– CPU processors: Intel Xeon / Pentium III / Pentium 4 / Athlon
[Pie chart of PC farm CPU capacity (~1500 GHz total):
 • NEC: 84 PCs (Pentium 4, 2.8 GHz), 470 GHz, 31% — will come soon
 • Appro: 113 PCs (Athlon 2000+), 380 GHz, 25% — setting up done
 • Fujitsu: 127 PCs (Pentium III, 1.26 GHz), 320 GHz, 21%
 • Compaq: 60 PCs (Intel Xeon, 0.7 GHz), 168 GHz, 11%
 • Dell: 36 PCs (Pentium III, ~0.5 GHz) and other small groups, 2–3% each]
DST production & skimming scheme
1. Production (reprocessing)
   raw data → data transfer via Sun servers → PC farm → DST data written to disk, with histograms and log files for monitoring
2. Skimming
   DST data on disk → Sun servers → skims such as the hadronic data sample, written to disk or HSM for user analysis, again with histograms and log files
Output skims
• Physics skims from reprocessing
– “Mini-DST” (4-vectors) format
– Create a hadronic sample as well as typical physics channels (up to ~20 skims)
  • many users do not have to go through the whole hadronic sample (see the sketch below)
– Write data onto disk at Nagoya (~350 km away from KEK) directly using NFS (thanks to the Super-SINET link of 1 Gbps)
[Diagram: reprocessing output at the KEK site fans out into skims — hadronic mini-DST, full recon, J/ψ inclusive, b→s, and D* samples — with mini-DSTs written over the 1 Gbps link to Nagoya, ~350 km from KEK]
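The one-pass, many-output skimming idea can be sketched as below; the stream names and selection predicates are illustrative stand-ins, not Belle's actual skim definitions.

// Sketch of one-pass skimming: each event is tested against several
// skim predicates and counted for every matching output stream.
// In real production each match would be written to a mini-DST.
#include <functional>
#include <iostream>
#include <string>
#include <vector>

struct Event { int n_tracks = 0; bool has_jpsi = false; };

struct Skim {
    std::string name;
    std::function<bool(const Event&)> select;  // hypothetical cut
    long n_kept = 0;
};

int main() {
    std::vector<Skim> skims = {
        {"hadronic",       [](const Event& e) { return e.n_tracks >= 3; }},
        {"jpsi_inclusive", [](const Event& e) { return e.has_jpsi; }},
    };
    std::vector<Event> dst = {{5, true}, {2, false}, {7, false}};
    for (const Event& ev : dst)              // single pass over the DST
        for (Skim& s : skims)
            if (s.select(ev)) ++s.n_kept;    // route to matching skims
    for (const Skim& s : skims)
        std::cout << s.name << ": " << s.n_kept << " events\n";
}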
Processing power & failure rate
• Processing power
  – Processing ~1 fb⁻¹ per day with 180 GHz
    • Allocate 40 PC hosts (0.7 GHz × 4 CPUs) for daily production to keep up with DAQ
  – 2.5 fb⁻¹ per day possible
  – Processing speed (in case of MC) with one 1 GHz CPU, per B meson pair:
    • Reconstruction: 3.4 sec
    • Geant simulation: 2.3 sec
    (a rough throughput cross-check follows below)
• Failure rate
  – module crash: < 0.01%
  – tape I/O error: 1%
  – process communication error: 3%
  – network trouble / system error: negligible
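As a rough consistency check of these numbers (my arithmetic, assuming a few ×10⁶ hadronic events per fb⁻¹ from the ~3–4 nb hadronic cross section at the Υ(4S) — an assumption not stated on the slide):

\[
  \frac{180\,\mathrm{GHz}}{1\,\mathrm{GHz}}
  \times \frac{86400\,\mathrm{s/day}}{3.4\,\mathrm{s/event}}
  \approx 4.6 \times 10^{6}\ \mathrm{events/day}
  \;\sim\; 1\,\mathrm{fb}^{-1}/\mathrm{day}
\]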
Reprocessing 2001 & 2002
• Reprocessing
  – major library & constants update in April
  – sometimes we have to wait for constants
• The final batch of beam data taken before the summer shutdown has always been reprocessed in time
[Chart: reprocessing progress — 78 fb⁻¹ in 3 months for summer 2002; 30 fb⁻¹ in 2.5 months for summer 2001]
MC production
• Produce ~2.5 fb⁻¹ per day with 400 GHz of Pentium III
  – Resources at remote sites also used
• Size: 15~20 GB per 1M events
  – 4-vectors only
• Run dependent
[Diagram: minimal set of run-dependent generic MC — for each run (Run# xxx), the beam data file supplies the IP profile and run-dependent background, from which B0, B+B−, charm, and light-quark MC samples are generated as mini-DSTs]
MC production 2002
• Keep producing generic MC samples
  – PC farm shared with DST production
  – Switch from DST to MC production can be made easily
• Reached 1100M events in March 2003: samples 3 times larger than the 78 fb⁻¹ data set completed (a volume estimate follows below)
[Chart: cumulative MC events produced through 2002–2003, with steps at minor changes and major updates of the library]
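Combining this event count with the ~15–20 GB per million events quoted on the previous slide gives a rough estimate of the total generic-MC volume (my arithmetic, not a number from the slides):

\[
  1100\ \mathrm{M\ events} \times (15\text{--}20)\,\mathrm{GB/M\ events}
  \approx 17\text{--}22\ \mathrm{TB}
\]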
MC production at remote sites
• Total CPU resources at remote sites (~300 GHz available) are similar to KEK's
• 44% of MC samples have been produced at remote sites
  – All data transferred to KEK via network
    • 6~8 TB in 6 months (see the rate estimate below)
[Charts: CPU resources available (GHz) and MC events produced (M events) at each site — VPI, Tokyo, Hawaii, Tohoku, Riken, TIT, Nagoya, and KEK; 44% of the events were produced at remote sites]
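For scale, the quoted transfer volume corresponds to a modest average rate on the 1 Gbps link (my arithmetic, taking ~7 TB over 6 months ≈ 1.6 × 10⁷ s):

\[
  \frac{7 \times 10^{12}\,\mathrm{B} \times 8\,\mathrm{bit/B}}
       {1.6 \times 10^{7}\,\mathrm{s}}
  \approx 3.5\ \mathrm{Mbps}\ \text{on average}
\]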
Future prospects
• Short term
  – Software: standardize utilities
  – Purchase more CPUs and/or disks if budget permits…
  – Efficient use of resources at remote sites
    • Centralized at KEK → distributed Belle-wide
  – Grid computing technology… survey & application just started
    • Data file management
    • CPU usage
• SuperKEKB project
  – Aim for 10^35 (or more) cm^-2 s^-1 luminosity from 2006
  – Physics rate ~100 Hz for B meson pairs
  – 1 PB/year expected
  – A new computing system like the LHC experiments' can be a candidate
Summary
• The Belle computing system has been working well. More than 250 fb⁻¹ of real beam data have been successfully (re)processed.
• MC samples 3 times larger than the beam data have been produced so far.
• Will add more CPU in the near future for quick turn-around as we accumulate more data.
• Grid computing technology would serve us well; we have started considering its application in our system.
• For SuperKEKB, we need many more resources. This may have a rather big impact on our system.