SAMGrid: JIM and CDF Development
Rick St. Denis, University of Glasgow
• CDF Accepts the Need for the Grid
  – Requirements
• How to Meet the Need
  – Status of SAMGrid for CDF
4 March 2004 GridPP 9th Collaboration Meeting

Spokespersons’ Requirements for CDF
Maximize physics output @ low Lumi
  – L3 output rate: 80 -> 360 Hz by 06
CDF needs the Grid
Director’s review, International Finance Committee: 50% computing outside FNAL
CDFGrid supported by FNAL PAC

Scale of CDF Requirements
        THz     % offsite   CPU speed   # duals
FY04    3.7     25%         3 GHz       150
FY05    9.0     50%         5 GHz       +360
FY06    16.5    50%         8 GHz       +220
6-7 sites, 100 duals each, by 2006 + 700 @FNAL

CDF Computing Model
• Develop analysis on desktop
  – Access to all CDF data from anywhere
• Large scale processing on batch clusters
  – Submission from anywhere
  – Interactive tools: ls, top, head/tail/cat
  – Output to scratch space or desktop
Implemented now with CAF

Use Cases for Summer 2004
• User Level MC Production
  – All CDF users have access
  – No data on site -> SAM write
• User Level Data Access
  – All users have access
  – Selected samples on site: Full SAM Support
SAM essential for Summer 2004

Medium Term Vision
• Many sites
• Fully transparent submission to all of CDF resources: 75% FNAL, 25% outside
• Fully transparent input and output of data

Summer 04 Functionality
• User selects submission site, saying what dataset they will use
• System checks they can do this (privileges)
• User access with SAM/dCache
• User registers output with SAM

October 04
• To extend beyond 25% outside computing, JIM is essential: JIM test for CDF June 04, production October 04
• HOWEVER: It already seems that the 25% resources are not sufficient for the production passes: will want JIM earlier.
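The dual-CPU counts in the Scale of CDF Requirements table (150, +360, +220) can be roughly reproduced from the other columns. The sketch below is a back-of-envelope check, not something taken from the slides: it assumes the THz column is total CPU demand, that the offsite share is that total times the offsite percentage, and that one "dual" contributes two processors at that year's clock speed. The names `REQUIREMENTS`, `offsite_thz`, `duals_needed` and `incremental_duals` are made up for illustration.

```python
# Rough check of the offsite dual-CPU counts in the requirements table.
# Assumptions (not stated on the slide): the THz column is total CPU demand,
# and one "dual" contributes two processors at that year's clock speed.

REQUIREMENTS = {
    # fiscal year: (total THz, offsite fraction, clock speed in GHz)
    "FY04": (3.7, 0.25, 3.0),
    "FY05": (9.0, 0.50, 5.0),
    "FY06": (16.5, 0.50, 8.0),
}


def offsite_thz(total_thz, offsite_fraction):
    """CPU demand (THz) that has to be met outside FNAL."""
    return total_thz * offsite_fraction


def duals_needed(delta_thz, clock_ghz):
    """Number of dual-CPU nodes that supply delta_thz at the given clock speed."""
    return delta_thz * 1000.0 / (2.0 * clock_ghz)


def incremental_duals(requirements):
    """Additional offsite duals per year, bought at that year's clock speed."""
    result = {}
    previous = 0.0
    for year, (total, fraction, clock) in requirements.items():
        current = offsite_thz(total, fraction)
        result[year] = duals_needed(current - previous, clock)
        previous = current
    return result


if __name__ == "__main__":
    for year, duals in incremental_duals(REQUIREMENTS).items():
        print(f"{year}: about {duals:.0f} additional offsite duals")
```

Under these assumptions the results come out close to the quoted figures (roughly 154, 358 and 234 duals), which suggests the table counts offsite nodes added per year; the FY06 figure differs from +220 by a few percent, so the slide may have used slightly different inputs.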
CDF Grid from a User Perspective
[Diagram labels: CAF Gui/CLI; AC++ (uses SAM); Grid; Only Outside Grid; Fermilab; Lab; FermiCAF; sites: Italy, Toronto, Korea, Taiwan, UK]

CDF Grid Strategy
• 25% of CDF computing from external resources. All CDF computing on CDF Grid by April 15: utilize resources fully controlled by CDF: Kerberos/fbsng: dCAF + SAM
• October 15, 2004: JIM to capture shared resources
• June 2005: 50% of computing resources external

Simple JIM
[Diagram labels: Anywhere; @ each site; Desktop; Private LAN; Globus GK; CAF Submitter; SAM Station; @ regional centers; Condor Submitter; WN; Private LAN; dCache; @FNAL; SAM DB; Condor Matchmaker]
June 2004 testing; June 2005 required

Detailed JIM
[Diagram legend: flow of job, data, meta-data]
[Diagram labels: User Interface (x3); Submission (x2); Global Job Queue; Resource Selector; Grid Client; Match Making; Global DH Services; Info Gatherer; SAM Naming Server; Info Collector; SAM Log Server; Resource Optimizer; MSS; Cluster; Data Handling; Local Job Handling; SAM Station (+other servs); Grid Gateway; SAM Stager(s); Local Job Handler (CAF, D0MC, BS, ...); AAA; Worker Nodes; SAM DB Server; Site; RC MetaData Catalog; Bookkeeping Service; Info Manager; JIM Advertise; Dist.FS; Cache; MDS; Web Serv; Info Providers; Grid Monitoring; XML DB server; Site Conf.; Glob/Loc JID map; ...; User Tools; Site, Site, Site]

Meeting the Needs
• Progress in SAM
• JIM Status
• RunJob
• CDFGridWorkshop: “Nerd’s Paradise”
• Strict Project Management and process to respond to operational issues

Progress in SAM
• Dbserver, the database server between applications and Oracle, was upgraded to use a common schema for CDF and D0.
• All CDF data files are in SAM
• Sam is in beta testing on the CDF CAF (1200 cpus): passed 20TB/Day delivery
• Minos uses SAM for its Data Handling
• Steve Mrenna (Phenomenology) is depositing ALPGEN files in SAM for common CDF/D0 use.

JIM Deployment Issues
Focus:
• 200 jobs each getting 200 files generated 120000 requests simultaneously to the DBServer!
  – Sensible sam: reliability went to 60%. Now add retries.
Communication with the expert!
Training Users
• D0 has D0Tools: big script; determines where user is and copies files: harder to get into a sandbox
• CAF conditions users!
Distribution and compatibility:
• This has made great strides with SAM, now time for JIM

RunJob
• Dedicated farms at FNAL will go away and RunJob will be used for production processing of data
• CDF will use RunJob for MC production
• Dave Evans worked for CDF for 2 months: has made CDFRunJob based on RunJob (Shakar), a tool common to CMS. Morag will work on this.

Florida workshop:
• 11 installations in about 2 hours. Integrated with dCAF in 2 cases in 2 days. Now 20!
• 3 in Asia, 4 in Europe
• 6 sites committed to summer 2004 usage of their facilities for all of CDF (mostly MC)
• Sam installation now: initsam cdf <stationname>
• Follow-up on April 1.
• Each site has a local user support person to reduce load on core development team.
• Generally: Security ate 80% of the effort!

Florida Workshop: After 2 Days
Installations progress
Participating Institutes installation and testing progress
[Table header, flattened in extraction: Sam CDF Sam Sam Caf Caf DCAF Sam Sam File INSTITUTE krb5 Sam sam_par_ret File AC++Dump Head Node Works Station AC++Dump Store Software Store on CAF Remote]
MIT Yes ?
Korea Yes Yes Yes Yes Yes knu Yes Yes
Pisa Yes Yes Yes Yes Yes pisa Yes Yes Yes
Japan Yes Yes Yes Problems Yes japan Yes Yes Yes
Karlsruhe Yes Yes Yes Problems Yes fzzka Yes Yes Yes
Liverpool Yes Yes Yes Problems Yes liverpool Yes Yes Yes (In progress)
[Remaining rows, cell alignment lost in extraction:]
Toronto Yes
Taiwan Yes Yes
TTU Yes
Glasgow Yes Yes Yes Yes
toronto Yes Yes
taiwan Yes Yes Yes Yes Yes Yes Yes
-ttu, -ttu-phys Yes In Progress Yes
glasgow Yes
UCSD Yes Yes Yes Yes Yes ucsd Yes
CNAF Yes Yes Yes Yes Yes cnaf Yes
Yes Yes Yes Yes

[Plot: 2TB/Day: Karlsruhe]

[Plot: CDF Dcache on CAF. ALL CDF on CAF reads 20TB/Day]

Dcache and SAM
• Dcache shapes traffic into disk: if a SAM cache is large, need to use Dcache instead of nfs mounts
• Dcache gives the user what is requested. 1TB gets same priority as 1GB: CDF users must send email requesting data to be staged.
• SAM examines consumption rate before staging next files
  – No EMAIL needed.
• SAM uses Dcache for its caching at FNAL.
• This needs further work with SRM

SAMGrid Management
[Organization chart: Sam Management Team; Sam Project Leaders; Sam Technical Leaders; Sam Operations And Projects; Sam Design]

SamGrid Development Process
[Diagram labels: Chaired by Project Leaders; Chaired by Technical Managers; SAMGrid Operations/Projects; Issue Raised; SAMGrid Design; SAMGrid Management Team; Grid Deliverables; Subproject]

Subproject Organization
• Each Subproject has a subproject leader (SPL) responsible for making a plan and reporting progress.
• Each Subproject has one of the Technical leaders evaluating against an assessment template.
• No deliverable requires more than 3 months of work to deliver.

SubProject Assessment Template
1. Background Documents
2. Project Definition/Mission Statement
3. Deliverables and timetable
4. Inter-project deliverables
5. Project status
6. Challenges and Critical Path Items
7. Lessons Learned
8. Project specific comments, alternate views

SAMGrid Assigned SubProjects
[Chart labels, in extracted order: MC / Reconstruction; Housekeeping; Work Flow Package; MCRequest; Housekeeping; H Stream for CDF; JIM:MCD0; Test Harness; User analysis Apps; JIM:D0Tools; Infrastructure; Common API; Retire CDF Replica Catalog; Database Server Rewrite; Database Servers to Linux; Configuration Management; Caching; Metadata Query with configurable Params]

Status of Assessments
• Subprojects defined
• Interviews conducted on about ½
• Assessment reports being written

Conclusions
• CDF has embraced the need for the Grid to achieve its physics mission
• Progress in deployment and robustness testing has SAM in use in CDF
• JIM is rapidly solving its problems
• … with the help of a review and management process
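
The JIM Deployment Issues slide reports that many jobs hitting the DBServer at once dropped reliability to 60%, and that the remedy was to add retries. The slides do not say how those retries were implemented, so the following is only a generic sketch of one common approach: retry with exponential backoff and random jitter, so that jobs that failed together do not all retry at the same instant. `TransientError` and `call_with_retries` are hypothetical names, not part of SAM or JIM.

```python
import random
import time


class TransientError(Exception):
    """Stand-in for a failed or timed-out request to a database server."""


def call_with_retries(request, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call request() until it succeeds or max_attempts have been made.

    The wait before retry k is a random fraction of
    min(max_delay, base_delay * 2**(k - 1)), which spreads out clients
    that all failed at the same moment.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up and let the caller see the failure
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * rng())  # random jitter in [0, delay)
```

A job would wrap each server call, for example `call_with_retries(lambda: lookup(file_name))`, where `lookup` is whatever function actually contacts the server and raises `TransientError` on failure. The `sleep` and `rng` parameters exist only so the waiting behaviour can be replaced in tests.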