Open Science Grid

In the first part of 2007 the Open Science Grid (OSG) consortium established and operated the first generation of a shared national cyberinfrastructure, bringing together advanced grid technologies to provide end-to-end distributed capabilities to a broad range of compute-intensive applications. The OSG 0.6.0 software stack, released early this year, offers the community additional capabilities integrated into the Virtual Data Toolkit. These capabilities were used to deliver opportunistic computing to the DZero Tevatron experiment at Fermilab, demonstrated their power in the data challenges for the CERN LHC experiments, are being tested at scale by the LIGO gravitational wave and STAR nuclear physics experiments, and provide a basis for the use of OSG by non-physics communities. Training and outreach to campus organizations, together with targeted engagement activities, are bringing additional users and communities to the emerging cyberinfrastructure.

D0 Reprocessing on Opportunistically Available Resources

During the first half of 2007 DZero reprocessed its complete dataset. Over 50% of the events were processed using opportunistically available resources on the OSG (see Figure 1). This was an important demonstration of the ability of the OSG Consortium stakeholders to contribute resources to the common infrastructure while still maintaining control over their own use. DZero used more than twelve sites on the OSG, including the LHC Tier-1s in the US (BNL and Fermilab), university Tier-2 centers, LIGO sites and other university sites. On OSG, DZero sustained execution of over 1,000 simultaneous jobs and moved more than 70 terabytes of data overall. "This is the first major production of real high energy physics data (as opposed to simulations) ever run on OSG resources," said Brad Abbott of the University of Oklahoma, head of the DZero computing group. Reprocessing was completed in June. Towards the end of the production run the throughput on OSG was more than 5 million events per day, two to three times more than originally planned.

Figure 1: D0 Event Processing on Different Grids

In addition to the reprocessing effort, OSG provided 300,000 CPU hours to DZero for one of the most precise measurements to date of the top quark mass, helping to achieve this result in time for the spring physics conferences.

LHC Data Challenges: Simulated Data Distribution and Analysis

As part of the preparations for data acquired from the accelerator at CERN, the ATLAS and CMS experiments organize "data challenges" which test the performance and functionality of their global data distribution and analysis systems. The latest round of activities covered the managed distribution and placement of data around the world, including moving data from storage resources at CERN to more than 10 Tier-1 sites for each experiment worldwide. Movement of data between the Tier-1s and the OSG university Tier-2 sites was also part of these exercises (see Figure 2). Sustained performance is as important as the peak throughput delivered, and in many cases more difficult to achieve. Each experiment uses the Globus GridFTP protocol to distribute the data, the Enabling Grids for E-sciencE (EGEE) gLite File Transfer Service (FTS) to manage contention for and policies within the network pipes, and an experiment-specific data placement service (known as DQ2 for ATLAS and PhEDEx for CMS) to orchestrate the data catalogs, namespaces and management.
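The raw throughput in these exercises comes from many point-to-point GridFTP transfers scheduled by FTS and the experiments' placement services. As a rough illustration only (the hostnames, paths and stream count below are hypothetical and not taken from the challenge configurations), a single transfer between two storage elements can be driven from Python by shelling out to the standard globus-url-copy client:

```python
# Hedged sketch: drive one GridFTP transfer with globus-url-copy.
# Hostnames, paths and stream count are made up for illustration; in the real
# challenges transfers are scheduled by FTS and DQ2/PhEDEx, not by hand.
import subprocess

SOURCE = "gsiftp://se.tier1.example.org/store/mc/sample_0001.root"   # hypothetical
DEST = "gsiftp://se.tier2.example.edu/store/mc/sample_0001.root"     # hypothetical

def transfer(source: str, dest: str, streams: int = 4) -> None:
    """Copy one file between storage elements using parallel GridFTP streams."""
    cmd = [
        "globus-url-copy",
        "-p", str(streams),   # number of parallel data streams
        "-vb",                # report transfer performance
        source,
        dest,
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    transfer(SOURCE, DEST)
```

In production the transfers are not issued one at a time like this: FTS queues and retries them according to per-channel policies, while DQ2 and PhEDEx decide which datasets are placed at which sites.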
End-to-end analysis and production job scheduling and throughput are another important aspect of the exercises, which included support from OSG. Both experiments achieved throughputs of more than 10,000 jobs a day, with success rates of more than 80%. For data taking next year all these rates must increase by factors of 2-5. The tests are continuing and the sites and middleware are being scaled to meet the deliverables.

Figure 2: CMS Data Transfer

Virtual Data Toolkit Extensions

The OSG Virtual Data Toolkit (VDT) provides the underlying packaging and distribution of the OSG software stack. The distribution was initially built and supported by the Trillium projects: GriPhyN, iVDGL and PPDG. VDT continues to be the packaging and distribution vehicle for Condor, Globus, MyProxy, and common components of the OSG and EGEE software. VDT-packaged components are also used by EGEE, the LIGO Data Grid, the Australian Partnership for Advanced Computing and the UK national grid, and the underlying middleware versions and testing infrastructure are shared between OSG and TeraGrid. The VDT distribution (see Figure 3) is available as either a set of pacman caches or RPMs, with specific distributions available for making processing farms or storage resources accessible from a grid infrastructure. In the first nine months of the OSG project the VDT has been further extended to include: accounting probes, collectors and a central repository for accounting information, contributed by Fermilab; the EGEE CEMon information manager, which converts information from the MDS2 LDIF format to Condor ClassAds; Virtual Organization (VO) management registration software developed for the Worldwide LHC Computing Grid (WLCG) and used by most physics collaborations; and an additional implementation of storage software, interfaced through the Storage Resource Management (SRM) interface. The dCache software, provided by a collaboration between the DESY laboratory in Hamburg and Fermilab, is also in use by the WLCG and High Energy Physics experiments in the US. VDT releases are tested on the OSG Integration Grid before being put in production. VDT is an effective vehicle for the rapid managed dissemination of security patches to the component middleware. Patches and updates are provided to the installation administrators for security and bug fixes.

Figure 3: Timeline of VDT Releases, showing the growth in the number of major components from January 2002 to July 2007, with milestones from VDT 1.0 (Globus 2.0b, Condor-G 6.3.1) through VDT 1.3.6 (for OSG 0.2), VDT 1.3.9 (for OSG 0.4) and VDT 1.6.1 (for OSG 0.6.0).

Interoperability and Federation

Campus Grids

OSG includes within its scope support for gateways between campus grids and the OSG infrastructure: FermiGrid: The Fermilab campus grid provides a uniform interface to OSG and dispatches jobs to available resources on site. A shared data area allows sharing of data across the local sites. Grid Laboratory Of Wisconsin (GLOW): Work continues to enable applications to be automatically "elevated" to OSG, which is challenging because the security infrastructures of the two facilities differ, and to allow GLOW users to use their existing local Kerberos identities.
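To make the idea of "elevating" campus jobs concrete, the sketch below shows the kind of routing decision such a gateway has to make: run the job on local resources when slots are free, or forward it to OSG when the campus pool is full and the user has a usable grid credential. The names and policy here are purely illustrative and are not the actual FermiGrid or GLOW implementation.

```python
# Illustrative sketch of a campus-grid gateway's routing decision (not the
# actual FermiGrid or GLOW code).  A real gateway must also map the user's
# local Kerberos identity to a grid credential before forwarding a job.
from dataclasses import dataclass

@dataclass
class Job:
    owner: str          # local campus username
    executable: str
    wants_osg: bool     # user opted in to opportunistic OSG running

def route_job(job: Job, free_local_slots: int, has_grid_credential: bool) -> str:
    """Decide where a campus job should run."""
    if free_local_slots > 0:
        return "local"   # campus resources are available
    if job.wants_osg and has_grid_credential:
        return "osg"     # "elevate" the job to the wider grid
    return "queue"       # wait for local slots to free up

# Example: campus pool is full and a credential exists, so the job is elevated.
print(route_job(Job("alice", "analysis_app", True),
                free_local_slots=0, has_grid_credential=True))
```

As the text above notes, the hard part in practice is the credential step: local identities such as Kerberos principals must be translated into grid credentials that OSG sites will accept.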
The "football problem" from Lehigh University is the first such application to be thus "elevated".

OSG collaborates with Educause, Internet2 and TeraGrid to sponsor day-long workshops local to university campuses, called CyberInfrastructure (CI) Days. These workshops bring expertise to the campus to foster discussions among research and teaching faculty, IT facilities and CIOs. The first such workshop, held at the University of California, Davis in March, had an extremely positive response. At least four more CI Days will happen in the fall at the University of New Mexico, Elizabeth College, the University of Arkansas and in collaboration with Clemson University.

Interoperability

OSG interoperates with EGEE in support of the LHC and other physics VOs. This now works well, based on the correct configuration of the information service. OSG sites must also report the results of site functional tests to the WLCG in support of the LHC infrastructure. WLCG and OSG have worked together on common definitions for the output of such tests, and these are being promulgated to the wider community. The foundation of federation is translation of published information to a format that other grids can use. OSG contributors continue to participate in this activity as part of the Open Grid Forum GLUE work.

Engagement of Non-Physics Communities

The OSG Engagement activity's mission is to work closely with new communities over periods of several months to help them use the production infrastructure and transition to being full contributing members of the OSG. In the nine months since the start of the OSG project, engagement activities have succeeded in:

- Production running of the Rosetta application from the Kuhlman Laboratory in North Carolina, opportunistically using more than one hundred thousand CPU hours across more than thirteen OSG sites. Experience shows that once a site has been "commissioned" it is fairly stable against errors unless, and until, a scheduled maintenance occurs, and once jobs are submitted they run quickly across the grid (see Figure 4).

- Production runs of the Weather Research and Forecasting (WRF) application, using more than one hundred and fifty thousand CPU hours on the NERSC OSG site at Lawrence Berkeley National Laboratory.

- Improvement of the performance of the nanoWire application from the nanoHUB project on sites on the OSG and TeraGrid, such that stable running of batches of five hundred jobs across more than five sites is routine.

- Adaptation of the ATLAS workload management system, PanDA, for the Chemistry at Harvard Molecular Mechanics (CHARMM) program for macromolecular simulations, in this case for the study of water penetration in staphylococcal nuclease by Dr. A. Damjanovic at Johns Hopkins University, who has used over thirty thousand CPU hours on twelve OSG resources over the last few months.

Figure 4: Rosetta Jobs Submitted across the OSG

Grid Schools and the Education Virtual Organization

The heart of the Open Science Grid education and training effort builds on the successful annual grid schools run by the Trillium projects. Each of the OSG grid schools consists of two to three days of lectures and hands-on practical sessions. Schools have so far been held at the University of Illinois at Chicago and at the University of Texas Brownsville (UTB, a Minority Serving Institution), with a third school taking place at the beginning of August at the University of Nebraska, Lincoln (an EPSCoR state).
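A typical first practical exercise is the simplest possible grid interaction: with a valid proxy in hand, run a trivial command on a remote gatekeeper to confirm that the site accepts your credential. The sketch below is only an assumed example of such an exercise (the gatekeeper hostname is invented); it wraps the standard globus-job-run client shipped in the VDT.

```python
# Hedged sketch of a first grid-school exercise: run /bin/hostname on a remote
# gatekeeper with globus-job-run (from the VDT) and print the result.  The
# gatekeeper address is hypothetical; a real class would use a training site.
import subprocess

GATEKEEPER = "gate.training.example.edu"   # hypothetical training gatekeeper

def hello_grid(gatekeeper: str) -> str:
    """Submit a trivial job and return the remote worker's hostname."""
    # Assumes a valid proxy already exists (e.g. created with grid-proxy-init).
    result = subprocess.run(
        ["globus-job-run", gatekeeper, "/bin/hostname"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()

if __name__ == "__main__":
    print("Job ran on:", hello_grid(GATEKEEPER))
```

Later exercises typically build up from this starting point to managed job submission and data movement across several sites.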
The schools are focused on graduate-level students, but at each session several faculty members have signed up with their students. This has resulted in good follow-up after the classes, as the teachers provide a foundation for continuing to use the material. As well as schools in the US, international organizations, to date in Argentina, Colombia and Brazil, have used the material provided (see Figure 5). For the first time, OSG contributed to the annual International Summer School for Grid Computing, which is organized by the National e-Science Centre in Edinburgh. Ten students who had taken the short OSG course were able to spend two weeks in immersive hands-on training, in close contact with the staff and in a group of more than 60 students from around the world. The concepts and challenges of distributed computing are taught in tandem with hands-on exercises using today's technologies and systems. After attending a school, participants can register with the OSG Virtual Organization and access opportunistically available resources. At the University of Texas Brownsville, for example, students are continuing to work with LIGO on their data analysis.

Figure 5: Sample of Grid School Curriculum

Acknowledgements

OSG is supported by the Department of Energy Office of Science SciDAC-2 program from the High Energy Physics, Nuclear Physics and Advanced Software and Computing Research programs, and by the National Science Foundation Mathematical and Physical Sciences, Office of Cyberinfrastructure and Office of International Science and Engineering directorates.