The Computing System for the Belle Experiment
Ichiro Adachi, KEK, representing the Belle DST/MC production group
CHEP03, La Jolla, California, USA, March 24, 2003

• Introduction: Belle
• Belle software tools
• Belle computing system & PC farm
• DST/MC production
• Summary

Introduction
• Belle experiment
  – B-factory experiment at KEK
  – studies CP violation in the B meson system; data taking started in 1999
  – ~120M B meson pairs (120 fb-1) recorded so far, the largest B meson data sample at the Υ(4S) region in the world
  – the KEKB accelerator is still improving its performance

Belle detector
[Figure: the Belle detector and an example of event reconstruction, a fully reconstructed event]

Belle software tools
• Home-made kits
  – "B.A.S.F." (Belle AnalySis Framework) as the framework
    • the single framework for every step of event processing
    • event-by-event parallel processing on SMP hosts
    • modules are loaded dynamically as shared objects (a hedged module sketch follows below)
  – "Panther" as the I/O package
    • one data format from DAQ to user analysis
    • bank system with zlib compression
  – reconstruction & simulation library written in C++
• Other utilities
  – CERNLIB/CLHEP, ...
  – Postgres for the database
• Event flow: input with Panther → unpacking → calibration → tracking → vertexing → clustering → particle ID → diagnosis → output with Panther, each step a module loaded into B.A.S.F.
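To make the framework picture above concrete, here is a minimal sketch of what a dynamically loaded, per-event module could look like. It is an illustration only: the class name, the begin_run/event/end_run hooks and the simple event record are assumptions made for this sketch, not the actual B.A.S.F. or Panther interfaces.

  // Illustrative sketch of a framework module in the spirit of B.A.S.F.
  // All names here (EventRecord, HadronicSkimModule, the hook methods) are
  // assumptions for illustration, not the real Belle interfaces.
  #include <cstdio>
  #include <vector>

  struct Track { double px, py, pz, e; };   // one reconstructed charged track

  struct EventRecord {                      // stand-in for Panther-style banks
    int exp_no, run_no, event_no;
    std::vector<Track> tracks;
  };

  // A module is compiled into a shared object, loaded dynamically by the
  // framework, and called once per event; on an SMP host the framework can
  // process several events in parallel through such calls.
  class HadronicSkimModule {
  public:
    void begin_run(int run_no) { std::printf("begin run %d\n", run_no); }

    // Return true to keep the event in the hadronic skim (toy criteria).
    bool event(const EventRecord& evt) {
      double evis = 0.0;
      for (const Track& t : evt.tracks) evis += t.e;
      return evt.tracks.size() >= 3 && evis > 2.0;   // illustrative cuts
    }

    void end_run() { std::printf("end run\n"); }
  };

  int main() {                              // tiny driver standing in for the framework
    HadronicSkimModule mod;
    EventRecord evt{7, 123, 1, {{0.1, 0.2, 0.3, 1.1}, {0.0, 0.4, 0.1, 0.9}, {0.2, 0.1, 0.5, 1.3}}};
    mod.begin_run(evt.run_no);
    std::printf("kept: %s\n", mod.event(evt) ? "yes" : "no");
    mod.end_run();
    return 0;
  }

In the real system the framework, not a main(), drives the event loop, and the Panther banks are read from and written to zlib-compressed files.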
Belle computing system
• Computing network for batch jobs and DST/MC production
  – Sun computing servers: 500 MHz x 4 CPUs, 38 hosts
  – PC farms
  – online tape server, Fujitsu tape library (500 TB), Compaq servers, GbE switches
• User analysis & storage system
  – work group servers: 500 MHz x 4 CPUs, 9 hosts
  – HSM servers with a 120 TB HSM library
  – file server (8 TB) and disk (4 TB)
  – user PCs: ~1 GHz, 100 hosts
• University resources at Tokyo, Nagoya and Tohoku connected via Super-SINET at 1 Gbps

Computing requirements
• Reprocess the entire beam data in 3 months
  – once reconstruction code is updated or constants are improved, fast turn-around is essential to perform physics analyses in a timely manner
• MC sample at least 3 times larger than the real data
  – analyses are maturing, and understanding systematic effects in detail needs a sufficiently large MC sample
• → added more PC farms and disks

PC farm upgrade
• Total CPU (GHz) = processor speed (GHz) x number of CPUs per node x number of nodes
[Chart: total CPU in GHz from Jan. 1999 to Jan. 2003, reaching ~1500 GHz including systems still to come; the latest batch was delivered in Dec. 2002]
• CPU power boosted for DST & MC production: total CPU has become 3 times bigger in the last two years
• 60 TB (total) of disk has also been purchased for storage

Belle PC farm CPUs
• heterogeneous system from various vendors, with Intel Xeon / Pentium III / Pentium 4 / Athlon processors
  – Dell: 36 PCs (Pentium III, ~0.5 GHz)
  – Compaq: 60 PCs (Intel Xeon 0.7 GHz; 168 GHz)
  – Fujitsu: 127 PCs (Pentium III 1.26 GHz; 380 GHz)
  – Appro: 113 PCs (Athlon 2000+; 320 GHz), setting up done
  – NEC: 84 PCs (Pentium 4 2.8 GHz; 470 GHz), will come soon

DST production & skimming scheme
1. Production (or reproduction): raw data is transferred to the Sun servers and processed on the PC farm; the output is DST data on disk, plus histograms and log files.
2. Skimming: DST data on disk or HSM is skimmed on the Sun servers into samples such as the hadronic data sample, again with histograms and log files, for user analysis.

Output skims
• Physics skims from reprocessing
  – "mini-DST" (4-vector) format
  – a hadronic sample is created as well as typical physics channels (up to ~20 skims, e.g. full reconstruction, J/ψ inclusive, b→sγ, D*)
    • many users do not have to go through the whole hadronic sample
  – mini-DST data are written directly onto disk at Nagoya (~350 km from KEK) using NFS, thanks to the 1 Gbps Super-SINET link

Processing power & failure rate
• Processing power
  – ~1 fb-1 processed per day with 180 GHz; 40 PC hosts (0.7 GHz x 4 CPUs) are allocated for daily production to keep up with the DAQ
  – 2.5 fb-1 per day is possible
  – processing speed (for MC) on one 1 GHz CPU: reconstruction 3.4 s/event, Geant simulation 2.3 s/event (a rough consistency check is sketched below)
• Failure rate for one B meson pair < 0.01%
  – breakdown: module crashes, tape I/O errors (1%), process communication errors (3%), network trouble/system errors (negligible)
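The per-event times above can be turned into a back-of-the-envelope throughput check. The sketch below assumes that CPU time scales inversely with clock speed and uses an assumed number of events per fb-1 (the slides do not quote one), so it is only a rough consistency check against the ~1 fb-1 per day figure, not the production bookkeeping.

  // Rough throughput estimate; the 1/clock-speed scaling and the
  // events-per-fb-1 figure are assumptions, not numbers from the slides.
  #include <cstdio>

  int main() {
    const double sec_per_event_1GHz = 3.4;   // reconstruction time quoted for a 1 GHz CPU
    const double farm_GHz = 180.0;           // CPU power quoted for daily DST production
    const double seconds_per_day = 86400.0;

    // Events the farm can reconstruct per day under the scaling assumption.
    double events_per_day = seconds_per_day * farm_GHz / sec_per_event_1GHz;
    std::printf("events/day  : %.2e\n", events_per_day);   // ~4.6e6 events/day

    // Converting to fb-1/day needs the number of processed events per fb-1;
    // ~4.5 million per fb-1 is an illustrative assumption that would
    // reproduce the quoted ~1 fb-1 per day.
    const double events_per_invfb = 4.5e6;   // assumed, for illustration only
    std::printf("fb-1 per day: %.1f\n", events_per_day / events_per_invfb);
    return 0;
  }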
Reprocessing 2001 & 2002
• Reprocessing takes about 3 months
  – major library & constants update in April
  – sometimes we have to wait for constants
• For summer 2001: 30 fb-1 reprocessed; for summer 2002: 78 fb-1 reprocessed in 2.5 months
• The final bit of beam data taken before the summer shutdown has always been reprocessed in time

MC production
• Produce ~2.5 fb-1 per day with 400 GHz of Pentium III
  – resources at remote sites are also used
• Size: 15~20 GB per 1M events (4-vectors only)
• Run-dependent minimal set of generic MC: for each beam-data run, the run-dependent background and IP profile are taken from the beam data file, and B0, B+B-, charm and light-quark MC mini-DSTs are produced for that run number

MC production 2002
• Generic MC samples are produced continuously
  – the PC farm is shared with DST production; switching from DST to MC production can be made easily (a minor change)
• Reached 1100M events in March 2003: samples 3 times larger than the 78 fb-1 beam data are complete
[Chart: accumulated generic MC events vs. time, with the major library update marked]

MC production at remote sites
• Total CPU resources at remote sites (~300 GHz) are similar to those at KEK
• 44% of the MC samples have been produced at remote sites
  – all data are transferred back to KEK via the network: 6~8 TB in 6 months
[Charts: CPU resources (GHz) and MC events produced (M events) at KEK, Nagoya, TIT, RIKEN, Tohoku, Hawaii, Tokyo and VPI]

Future prospects
• Short term
  – software: standardize utilities
  – purchase more CPUs and/or disks if the budget permits
  – efficient use of resources at remote sites: move from centralized at KEK to distributed Belle-wide
  – grid computing technology: survey & application just started (data file management, CPU usage)
• SuperKEKB project
  – aim for a luminosity of 10^35 cm-2 s-1 (or more) from 2006
  – physics rate of ~100 Hz for B-meson pairs; 1 PB/year expected
  – a new computing system like those of the LHC experiments can be a candidate

Summary
• The Belle computing system has been working fine. More than 250 fb-1 of real beam data has been successfully (re)processed.
• MC samples 3 times larger than the beam data have been produced so far.
• More CPU will be added in the near future for quick turnaround as we accumulate more data.
• Grid computing technology would be a good friend of ours; we have started considering its application in our system.
• For SuperKEKB we need much more resources, which may have a rather big impact on our system.