OSG-doc-932
January 15, 2010
www.opensciencegrid.org

Report to the US Department of Energy
December 2009

Miron Livny, University of Wisconsin (PI, Technical Director)
Ruth Pordes, Fermilab (Co-PI, Executive Director)
Kent Blackburn, Caltech (Co-PI, Council co-Chair)
Paul Avery, University of Florida (Co-PI, Council co-Chair)

Table of Contents

1. Executive Summary
   1.1 What is Open Science Grid?
   1.2 Science enabled by Open Science Grid
   1.3 OSG cyberinfrastructure research
   1.4 Technical achievements in 2009
   1.5 Challenges facing OSG
2. Contributions to Science
   2.1 ATLAS
   2.2 CMS
   2.3 LIGO
   2.4 ALICE
   2.5 D0 at Tevatron
   2.6 CDF at Tevatron
   2.7 Nuclear physics
   2.8 MINOS
   2.9 Astrophysics
   2.10 Structural Biology
   2.11 Multi-Disciplinary Sciences
   2.12 Computer Science Research
3. Development of the OSG Distributed Infrastructure
   3.1 Usage of the OSG Facility
   3.2 Middleware/Software
   3.3 Operations
   3.4 Integration and Site Coordination
   3.5 Virtual Organizations Group
   3.6 Engagement
   3.7 Campus Grids
   3.8 Security
   3.9 Metrics and Measurements
   3.10 Extending Science Applications
   3.11 Scalability, Reliability, and Usability
   3.12 Workload Management System
   3.13 Internet2 Joint Activities
   3.14 ESNET Joint Activities
4. Training, Outreach and Dissemination
   4.1 Training and Content Management
   4.2 Outreach Activities
   4.3 Internet dissemination
5. Participants
   5.1 Organizations
   5.2 Partnerships and Collaborations
6. Cooperative Agreement Performance
7. Publications Using OSG Infrastructure

Sections of this report were provided by: the scientific members of the OSG Council, OSG PIs and Co-PIs, and OSG staff and partners. Paul Avery and Chander Sehgal acted as the editors.

1. Executive Summary

1.1 What is Open Science Grid?

Open Science Grid (OSG) aims to transform processing- and data-intensive science by operating and evolving a cross-domain, self-managed, nationally distributed cyberinfrastructure (Figure 1). OSG's distributed facility, composed of laboratory, campus and community resources, is designed to meet the current and future needs of scientific Virtual Organizations (VOs) at all scales. It provides a broad range of common services and support, a software platform, and a set of operational principles that organize and support users and resources in Virtual Organizations. OSG is jointly funded, until 2011, by the Department of Energy and the National Science Foundation.

Figure 1: Sites in the OSG Facility

OSG does not own any computing or storage resources. Rather, these are contributed by the members of the OSG Consortium and used both by the owning VO and by other VOs. OSG resources are summarized in Table 1.
Table 1: OSG computing resources

  Grid-interfaced processing resources on the production infrastructure:    91
  Grid-interfaced data storage resources on the production infrastructure:  58
  Campus Infrastructures interfaced to the OSG:                             8 (Clemson, FermiGrid, Purdue, Wisconsin, Buffalo, Nebraska, Oklahoma, SBGrid)
  National Grids interoperating with the OSG:                               3 (EGEE, NDGF, TeraGrid)
  Processing resources on the Integration infrastructure:                   21
  Grid-interfaced data storage resources on the Integration infrastructure: 9
  Cores accessible to the OSG infrastructure:                               ~54,000
  Tape storage accessible to the OSG infrastructure:                        ~23 Petabytes at the LHC Tier-1s
  Disk storage accessible to the OSG infrastructure:                        ~15 Petabytes
  CPU wall clock usage of the OSG infrastructure:                           average of 32,000 CPU-days/day during Oct. 2009

The overall usage of OSG continues to grow, though utilization by each stakeholder varies depending on its needs during any particular interval. Overall use of the facility in 2009 was approximately 240M hours, a 70% increase over the previous year. (Detailed usage plots can be found in the attached document.) During stable normal operations, OSG provides approximately 700,000 CPU wall clock hours a day (~30,000 CPU-days a day), with peaks occasionally exceeding 900,000 CPU wall clock hours a day; approximately 100,000 to 200,000 opportunistic wall clock hours are available on a daily basis for resource sharing. The efficiency of use depends on the ratio of CPU to I/O in the applications as well as on existing demand. Based on our transfer accounting (which is in its early days and undergoing in-depth validation), we see over 300TB of data movement (both intra- and inter-site) on a daily basis, with peaks of 600TB/day; of this, we estimate 25% is GridFTP-based transfers and the rest is via LAN-optimized protocols.
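The relationship between the daily wall clock figures and the "CPU-days per day" unit used in Table 1 is a straight division by 24; a minimal sketch, using the illustrative numbers quoted above:

```python
# One CPU-day = 24 wall clock hours on one core, so aggregate daily
# wall clock hours divide by 24 to give the "CPU-days/day" figure.

def cpu_days_per_day(wall_clock_hours_per_day: float) -> float:
    return wall_clock_hours_per_day / 24.0

steady_state = cpu_days_per_day(700_000)  # ~29,167, i.e. the ~30,000 quoted above
peak = cpu_days_per_day(900_000)          # 37,500 CPU-days/day
```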
1.2 Science enabled by Open Science Grid

OSG's infrastructure supports a broad scope of scientific research activities, including the major physics collaborations, nanoscience, biological sciences, applied mathematics, engineering, computer science, and, through the Engagement program, other non-physics research disciplines. The distributed facility is heavily used, as described below and in the attached document showing usage charts.

A strong OSG focus in the last year has been supporting the ATLAS and CMS collaborations' preparations for LHC data taking, which has recently re-started. Each experiment has run significant workload tests (including STEP09) and data distribution and analysis challenges, as well as maintained significant ongoing simulation processing. The OSG infrastructure has performed well in these exercises and is ready to support full data taking (as of the date of this report we have successfully supported the initial run in December 2009). OSG is actively partnering with ATLAS and CMS to develop and deploy mechanisms that will enable productive use of over 40 new Tier-3 sites in the United States.

OSG has made significant accomplishments in support of the science of the Consortium members and stakeholders. In 2009 LIGO significantly ramped up Einstein@Home production on OSG to search for gravitational radiation from spinning neutron-star pulsars. D0 published 42 papers in 2009 that utilized OSG computing facilities, including a 62% increase in Monte Carlo production over 2008. Similarly, CDF utilized OSG facilities to publish 56 papers (with 16 additional submissions) and 83 conference papers in 2009. CMS recently submitted for publication 23 physics papers based on cosmic ray analyses and the recent December 2009 collision run, all of which were based on significant use of OSG computing facilities. Similarly, ATLAS submitted a total of 8 papers for publication that relied on OSG resources and services.
Overall, more than 150 publications/submissions in 2009 (listed in Section 7) depended on support from the OSG (not just "cycles" but also the software and other services we provide). Beyond the physics communities, the structural biology group at Harvard Medical School, mathematics research at the University of Colorado, chemistry calculations at Buffalo, and protein structure modeling and prediction applications have made sustained (though cyclic) use of the production infrastructure. The Harvard paper was published in Science.

1.3 OSG cyberinfrastructure research

As a comprehensive collaboratory, OSG continues to provide a laboratory for research activities to deploy and extend advanced distributed computing technologies in the following areas:

- Research on the operation of a scalable heterogeneous cyberinfrastructure in order to improve its effectiveness and throughput. As part of this research we have developed a comprehensive set of "availability" probes and a reporting infrastructure that allow site and grid administrators to quantitatively measure and assess the robustness and availability of resources and services.
- Deployment and scaling in production use of "pilot-job" workload management systems: ATLAS PanDA and CMS glideinWMS. These developments were crucial to the experiments meeting their analysis job throughput targets.
- Scalability and robustness enhancements to Condor technologies. For example, extensions to Condor to support pilot job submission were developed, significantly increasing the job throughput possible on each Grid site.
- Scalability and robustness testing of enhancements to Globus grid technologies. For example, testing of the alpha and beta releases of the Globus GRAM5 package provided feedback to Globus ahead of the official release, improving the quality of the released software.
- Scalability and robustness testing of BeStMan, XrootD, dCache, and HDFS storage technologies at scale to determine their capabilities and provide feedback to the development teams to help meet the needs of the OSG stakeholders.
- Operational experience with a widely distributed security infrastructure that assesses usability and availability together with response, vulnerability and risk.
- Support of inter-grid gateways that enable transfer of data and cross-execution of jobs, including transport of information, accounting, and service availability information between OSG and the European Grids supporting the LHC experiments (EGEE/WLCG).
- Use of the Wisconsin GLOW campus grid "grid-router" to move data and jobs transparently from the local infrastructure to national OSG resources.
- Prototype testing of the OSG FermiGrid-to-TeraGrid gateway to enable greater integration and thus easier access to appropriate resources for the science communities.
- Integration and scaling enhancement of BOINC-based applications (LIGO's Einstein@home) submitted through grid interfaces.
- Further development of a hierarchy of matchmaking services (OSG MM) and Resource Selection Services (ReSS) that collect information from most of the OSG sites and provide community-based matchmaking services that are further tailored to particular application needs.
- Investigation and testing of policy and scheduling algorithms to support "opportunistic" use and backfill of resources that are not otherwise being used by their owners, using information services such as GLUE, matchmaking, and workflow engines including Pegasus and Swift.
- Comprehensive job accounting across most OSG sites, with published summaries for each VO and site. This work also supports a per-job information finding utility for security forensic investigations.
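The matchmaking services mentioned above pair a job's requirements against attributes advertised by sites. A minimal illustrative sketch of this ClassAd-style pattern follows; it is not the ReSS/OSG MM implementation, and all attribute and site names are hypothetical:

```python
# ClassAd-style matchmaking in miniature: each site advertises attributes,
# and a job supplies a requirements predicate evaluated against each advert.
# Site names and attribute names below are invented for illustration.

sites = [
    {"Name": "SiteA", "FreeCPUs": 120, "OS": "SL5", "AllowsOpportunistic": True},
    {"Name": "SiteB", "FreeCPUs": 0,   "OS": "SL5", "AllowsOpportunistic": True},
    {"Name": "SiteC", "FreeCPUs": 300, "OS": "SL4", "AllowsOpportunistic": False},
]

def match(requirements, sites):
    """Return the names of sites whose advertised attributes satisfy the job."""
    return [s["Name"] for s in sites if requirements(s)]

# An opportunistic job that needs free SL5 slots:
job = lambda s: s["FreeCPUs"] > 0 and s["OS"] == "SL5" and s["AllowsOpportunistic"]
matched = match(job, sites)  # ["SiteA"]
```

In the production services the adverts come from the grid information systems (e.g. GLUE) rather than a hard-coded list, and ranking as well as filtering is applied.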
1.4 Technical achievements in 2009

More than three quarters of the OSG staff directly support the operation and software for ongoing stakeholder productions and applications (and leverage at least an equivalent amount of contributor effort); the remaining quarter mainly engages new customers, extends and proves software and capabilities, and provides management, communications, etc. In 2009, specific technical activities that directly supported science include:

- OSG released a stable, production-capable OSG 1.2 software package on a schedule that enabled the experiments to deploy and test the cyberinfrastructure before LHC data taking. This release also allowed the Tier-1 sites to transition to being fully OSG supported, eliminating the need for separate integration of EGEE gLite components and simplifying software layering for applications that use both EGEE and OSG.
- OSG carried out "prove-in" of reliable critical services (e.g. BDII) for the LHC and operation of services at levels that meet or exceed the needs of the experiments. This effort included robustness tests of the production infrastructure against failures and outages, and validation of information by OSG as well as the WLCG.
- Collaboration with STAR continued toward deploying the STAR software environment as virtual machine images on grid and cloud resources. We have successfully tested publish/subscribe mechanisms for VM instantiation (Clemson) as well as VMs managed by batch systems (Wisconsin).
- Extensions work with LIGO resulted in adapting Einstein@Home for Condor-G submission, enabling a greater than 5x increase in the use of OSG by Einstein@Home.
- Collaborative support of ALICE, Geant4, NanoHub, and SBGrid has increased their productive access to and use of OSG, along with initial support for IceCube and GlueX. Engagement efforts and outreach to science communities this year have led to work and collaboration with more than 10 additional research teams.
- Security program activities continue to improve our defenses and capabilities for incident detection and response, via review of our procedures by peer grids and adoption of new tools and procedures.
- The metrics and measurements effort has continued to evolve and provides a key set of functions enabling the US-LHC experiments to understand their performance against plans, and to assess overall performance and production across the infrastructure for all communities. In addition, the metrics function handles the reporting of many key data elements to the LHC on behalf of US-ATLAS and US-CMS.
- At-scale testing of various software elements, with feedback to the development teams to help achieve the needed performance goals. In addition, this effort tested new candidate technologies (e.g. CREAM and ARC) in the OSG environment and provided feedback to WLCG.
- Contributions to the PanDA and glideinWMS workload management software that have helped improve capability and supported broader adoption within the experiments.
- Collaboration with ESnet and Internet2 has provided a distribution, deployment, and training framework for new network diagnostic tools such as perfSONAR, and extensions of the partnerships for Identity Management.
- Training, Education, and Outreach activities have reached numerous professionals and students who may benefit from leveraging OSG and the national CI. The International Science Grid This Week electronic newsletter (www.isgtw.org) continues to experience significant subscription growth.

In summary, OSG continues to demonstrate that national cyberinfrastructure based on federation of distributed resources can effectively meet the needs of researchers and scientists.
1.5 Challenges facing OSG

We continue to work and make progress on the challenges we face in meeting our longer-term goals:

- Dynamic sharing of tens of locally owned, used and managed compute and storage resources, with minimal additional human effort (to use and administer) and limited negative impact on the communities owning them.
- Utilizing shared storage with groups other than the owner. This is not only more difficult than (quantized) CPU cycle sharing, but also less well supported by the available middleware.
- Federation of local and community identity/authorization attributes within the OSG authorization infrastructure.
- The effort and testing required for inter-grid bridges involves significant costs, both in the initial stages and in continuous testing and upgrading. Ensuring correct, robust end-to-end reporting of information across such bridges remains fragile and human-effort intensive.
- Validation and analysis of availability and reliability testing, accounting and monitoring information. Validation of the information is incomplete, needs continuous attention, and can be human-effort intensive.
- The scalability and robustness of the infrastructure has not yet reached the scales needed by the LHC for analysis operations in the out years. The US LHC software and computing leaders have indicated that OSG needs to provide ~2x the interface performance over the next year or two, and robustness to upgrades and configuration changes throughout the infrastructure needs to be improved.
- Full usage of available resources. A job "pull" architecture (e.g., the pilot mechanism) provides higher throughput and better management than one based on static job queues, but now we need to move to the next level of effective usage.
- Automated site selection capabilities are inadequately deployed and embryonic in the capabilities needed, especially when faced with the plethora of errors and faults that naturally result from a heterogeneous mix of resources and applications with greatly varying I/O, CPU and data provisioning requirements.
- User and customer frameworks are important for engaging non-physics communities in active use of grid computing technologies; for example, the structural biology community has ramped up its use of OSG enabled via portals and community outreach and support.
- A common operations infrastructure across heterogeneous communities can be brittle. Efforts to improve the early detection of faults and problems before they impact the users help everyone.
- Analysis and assessment of, and recommendations based on, usage, performance, accounting and monitoring information is a key need that requires dedicated and experienced effort.
- Transitioning students from the classroom to being users is possible but remains a challenge, partially limited by the effort OSG can dedicate to this activity.
- Use of new virtualization, multi-core and job parallelism techniques, and scientific and commercial cloud computing. We have two new satellite projects funded in these areas: the first on High Throughput Parallel Computing (HTPC) on OSG resources, for an emerging class of applications consisting of large ensembles (hundreds to thousands) of modestly parallel (4- to ~64-way) jobs; the second a research project to do application testing over the ESnet 100-Gigabit network prototype, using storage and compute end-points supplied by the Magellan cloud computing facilities at ANL and NERSC.
- Collaboration and partnership with TeraGrid, under the umbrella of a Joint Statement of Agreed Upon Principles signed in August 2009.
New activities have been undertaken with TeraGrid, including representation at one another's management meetings, tests of submitting jobs to one another's infrastructure, and exploration of how to accommodate the different resource access mechanisms of the two organizations.

These challenges are not unique to OSG. Other communities face similar challenges in educating new entrants to advance their science through large-scale distributed computing resources.

2. Contributions to Science

2.1 ATLAS

In preparation for the startup of the LHC collider at CERN in December, the ATLAS collaboration performed computing challenges of increasing complexity and scale. Monte Carlo production is ongoing with some 50,000 concurrent jobs worldwide, about 10,000 of which run on resources provided by the distributed ATLAS computing facility operated in the U.S., comprising the Tier-1 center at BNL and 5 Tier-2 centers located at 9 different institutions. As the facility completed its readiness for distributed analysis, a steep ramp-up of analysis jobs, in particular at the Tier-2 sites, was observed.

Based upon the results achieved during the computing challenges and at the start of ATLAS data taking, we can confirm that the tiered, grid-based computing model is the most flexible structure currently conceivable to process, reprocess, distill, disseminate, and analyze ATLAS data. We found, however, that the Tier-2 centers may not be sufficient to reliably serve as the primary analysis engine for more than 400 U.S. physicists. As a consequence, a third tier with computing and storage resources located geographically close to the researchers was defined as part of the analysis chain, an important component to buffer the U.S. ATLAS analysis system from unforeseen future problems. Further, the enhancement of U.S. ATLAS institutions' Tier-3 capabilities is essential and is planned to be built around the short- and long-term analysis strategies of each U.S. group.
An essential component of this strategy is the creation of a centralized support structure, because the considerable obstacle to creating and sustaining campus-based computing clusters is the continuing support they require. Anticipating that institutions would not have access to IT professionals to install and maintain these clusters, U.S. ATLAS has spent a considerable amount of effort to develop an approach for a low-maintenance implementation of Tier-3 computing. While computing expertise within U.S. ATLAS was sufficient to develop ideas at a fairly high level, only the combined expert knowledge and associated effort of U.S. ATLAS and OSG facilities personnel eventually resulted in a package that is easily installable and maintainable by scientists.

Dan Fraser of the Open Science Grid (OSG) has organized regular Tier-3 liaison meetings between several members of the OSG facilities, U.S. ATLAS and U.S. CMS. Topics discussed during these meetings include cluster management, site configuration, site security, storage technology, site design, and experiment-specific Tier-3 requirements. Based on information exchanged at these meetings, several aspects of the U.S. ATLAS Tier-3 design were refined, and both the OSG and U.S. ATLAS Tier-3 documentation were improved and enhanced.

OSG has hosted two multi-day working meetings at the University of Wisconsin in Madison, attended by members of the OSG Virtual Data Toolkit (VDT) group, the Condor development team, the OSG Tier-3 group and U.S. ATLAS. As a result, U.S. ATLAS changed the way user accounts are to be handled, based on discussions with the system manager of the University of Wisconsin GLOW cluster, leading to a simpler design for smaller Tier-3 sites. In addition, a method for Condor batch software installation using yum repositories and RPM files was developed. As a result, the Condor and VDT teams created an official yum repository for software installation, similar to how the Scientific Linux operating system is installed. Detailed instructions were written for the installation of U.S. ATLAS Tier-3 sites, including instructions for components as complex as storage management systems based on xrootd.

Following the trips to Madison, U.S. ATLAS installed an entire Tier-3 cluster using virtual machines on one multi-core desktop machine. This virtual cluster is used for documentation development and U.S. ATLAS Tier-3 administrator training. Marco Mambelli of OSG has provided much assistance in the configuration and installation of the software for the virtual cluster. OSG also supports a crucial user software package (wlcg-client, with support for xrootd added by OSG) used by all U.S. ATLAS Tier-3 users, a package that enhances and simplifies the user environment at the Tier-3 sites.

Today U.S. ATLAS (contributing to ATLAS as a whole) relies quite extensively on services and software provided by OSG, as well as on processes and support systems that have been produced or evolved by OSG. Partly this originates in the fact that U.S. ATLAS has fully committed itself to relying on OSG for several years now. Over roughly the last 3 years U.S. ATLAS itself has invested heavily in OSG in many respects: human and computing resources, operational coherence and more. In addition, and essential to the operation of the worldwide distributed ATLAS computing facility, the OSG efforts have aided the integration with WLCG partners in Europe and Asia. The derived components and procedures have become the basis for support and operation covering the interoperation between OSG, EGEE, and other grid sites relevant to ATLAS data analysis.
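The yum/RPM installation path described above centers on a repository definition that sites drop into place and then install from with standard tooling. A hedged sketch of such a definition follows; the repository name, URL, and key path are hypothetical placeholders, not the actual VDT/Condor repository details:

```ini
# Hypothetical yum repository definition, e.g. /etc/yum.repos.d/condor-example.repo
# (all names and URLs below are illustrative, not the real VDT repository)
[condor-vdt-example]
name=Example Condor/VDT yum repository
baseurl=http://repo.example.org/condor/el5/$basearch
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-example
```

With a file like this in place, a site administrator installs and updates the batch software through the distribution's standard package workflow (e.g. `yum install condor`), the same way Scientific Linux packages are managed, which is what makes the Tier-3 setup maintainable without dedicated IT staff.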
OSG provides software components that allow interoperability with European ATLAS sites, including selected components from the gLite middleware stack such as the LCG client utilities (for file movement, supporting space tokens as required by ATLAS) and file catalogs (server and client). It is vital to U.S. ATLAS that the present level of service continue uninterrupted for the foreseeable future, and that all of the services and support structures upon which U.S. ATLAS relies today have a clear transition or continuation strategy.

Based on experience and observations, U.S. ATLAS suggested starting work on a coherent middleware architecture, rather than having OSG continue to provide a distribution that is a heterogeneous software system of components contributed by a wide range of projects. One of the difficulties we ran into several times was due to inter-component functional dependencies, which can only be avoided if there is good communication and coordination between component development teams. A technology working group, chaired by a member of the U.S. ATLAS facilities group (John Hover, BNL), has been asked to help the OSG Technical Director by investigating, researching, and clarifying design issues, resolving questions directly, and summarizing technical design trade-offs so that the component project teams can make informed decisions. To achieve the goals as set forth, OSG needs an explicit, documented system design, or architecture, so that component developers can make compatible design decisions and virtual organizations (VOs) such as U.S. ATLAS can develop their own applications based on the OSG middleware stack as a platform. As a short-term goal, the creation of a design roadmap is in progress.

Regarding support services, the OSG Grid Operations Center (GOC) infrastructure at Indiana University is at the heart of the operations and user support procedures. It is also integrated with the GGUS infrastructure in Europe, making the GOC part of a globally connected system for worldwide ATLAS computing operation.

Middleware deployment support is an essential and complex function upon which the U.S. ATLAS facilities are fully dependent. The need for continued support for testing, certifying and building middleware as a solid foundation on which our production and distributed analysis activities run has been served very well so far and will continue to exist, as will the need for coordination of the roll-out, deployment, debugging and support of the middleware services. In addition, some level of preproduction deployment testing has been shown to be indispensable and must be maintained. This is currently supported through the OSG Integration Test Bed (ITB), which provides the underlying grid infrastructure at several sites with a dedicated test instance of PanDA, the ATLAS Production and Distributed Analysis system, running on top of it. This implements the essential validation processes that accompany the incorporation of new services, and of new versions of existing grid middleware services, into the VDT, the coherent OSG software component repository. In fact, U.S. ATLAS relies on the VDT and OSG packaging, installation, and configuration processes that lead to a well-documented and easily deployable OSG software stack.

U.S. ATLAS greatly benefits from OSG's Gratia accounting services, as well as the information services and probes that provide statistical data about facility resource usage and site information, passed to the application layer and to the WLCG for review of compliance with MOU agreements.

One of the essential functions of grid operations is operational security coordination. The coordinator is provided by OSG today, and relies on good contacts with security representatives at the U.S. ATLAS Tier-1 center and Tier-2 sites.
Thanks to activities initiated and coordinated by OSG a strong operational security community has grown up in the U.S. in the past few years, driven by the needs of ensuring that security problems are well coordinated across the distributed infrastructure. Moving on to ATLAS activities that were performed on the OSG facility infrastructure in 2009 the significant feature, besides dissemination and processing of collision data late in the year, was the preparation for and the execution of the STEP’09 (Scale Testing for the Experimental Program 2009) exercise, which provided a real opportunity to perform a significant scale test with all experiments scheduling exercises in synchrony with each other. The main goals were to utilize specific aspects of the computing model that had not previously been fully tested. The two most important tests were the reprocessing at Tier-1 centers with the experiments recalling the data from tape at the full anticipated rates, and the testing of analysis workflows at the Tier-2 centers. In the U.S., and in several other regions, the results were very encouraging with the Tier2 centers exceeding the metrics set by ATLAS. As an example, with more than 150,000 successfully completed analysis jobs at 94% efficiency U.S. ATLAS took the lead in ATLAS. This was achieved while the Tier-2s were also continuing to run Monte Carlo production jobs, scaled to keep the Tier-2 resources fully utilized. Not a single problem with the OSG provided infrastructure was encountered during the execution of the exercise. However, there is an area of concern that may impact the facilities’ performance in the future. As we continuously increase the number of job slots at sites the performance of pilot submission through CondorG and the underlying Globus Toolkit 2 (GT2) based gatekeeper has to keep up without slowing down the job throughput in particular when running short jobs. 
When addressing this point with the OSG facilities team we found that they were open to evaluating and incorporating recently developed components, such as the CREAM CE provided by EGEE developers in Italy. Intensive tests are already being conducted by the Condor team in Madison, and integration issues were discussed in December between the PanDA team and the Condor developers. Other important achievements are connected with the start of LHC operations on November 23rd. As all ATLAS detector components were working according to their specifications, including the data acquisition, trigger chain, and prompt reconstruction at the Tier-0 center at CERN, timely data dissemination to, and stable operation of, the facility services in the U.S. were the most critical aspects we were focusing on. We are pleased to report that throughout the (short) period of data taking (ended on December 16th) data replication was observed at short latency, with data transfers from the Tier-0 to the Tier-1 center at BNL typically completed within 2.6 hours and transfers from the Tier-1 to five Tier-2 and one Tier-3 center completed in 6 hours on average. The full RAW dataset (350 TB) and the derived datasets were sent to BNL, with the derived datasets (140 TB) further distributed to the Tier-2s. About 100 users worldwide started immediately to analyze the collision data, with 38% of the analysis activities observed to run on the resources provided in the U.S. Due to improved alignment and condition data that became available following first-pass reconstruction of the RAW data at CERN, a worldwide reprocessing campaign was conducted by ATLAS over the Christmas holidays. The goal of the campaign was to process all good runs at the Tier-1 centers and have the results ready by January 1st, 2010. All processing steps (RAW -> ESD, ESD -> AOD, HIST, NTUP and TAG) and the replication of newly produced datasets to the Tier-1 and Tier-2 centers worldwide were completed well before the end of the year.
Figure 2: OSG CPU hours used by ATLAS (57M hours total), color coded by facility.

In the area of middleware extensions, US ATLAS continued to benefit from the OSG's support for and involvement in PanDA, the US ATLAS-developed distributed processing and analysis system. PanDA is layered over the OSG's job management, storage management, security, and information system middleware and services, providing a uniform interface and utilization model for the experiment's exploitation of the grid, extending not only across OSG but across EGEE and NorduGrid as well. PanDA is the basis for distributed analysis and production ATLAS-wide, and is also used by the OSG itself as a WMS available to OSG VOs, with the addition this year of a PanDA-based service for OSG Integration Test Bed (ITB) test job submission, monitoring, and automation. This year the OSG's WMS extensions program contributed the effort and expertise essential to a program of PanDA security improvements that culminated in the acceptance of PanDA's security model by the WLCG, and the implementation of that model on the foundation of the glexec system, the new OSG and EGEE standard service supporting the WLCG-approved security model for pilot-job-based systems. The OSG WMS extensions program supported Maxim Potekhin and Jose Caballero at BNL, working closely with the ATLAS PanDA core development team at BNL and CERN, who provided the design, implementation, testing, and technical liaison (with ATLAS computing management, the WLCG security team, and developer teams for glexec and ancillary software) for PanDA's glexec-based pilot security system. This system is now ready for deployment once glexec services are offered at ATLAS production and analysis sites.
The work was well received by ATLAS and WLCG, both of which have agreed that in a new round of testing to proceed in early 2010 (validating ARGUS, a new component of the glexec software stack), other experiments will take the lead role in testing and validation, in view of the large amount of work contributed by Jose and Maxim in 2009. This will allow Jose and Maxim more time in 2010 for a new area we are seeking to build as a collaborative effort between ATLAS and OSG: WMS monitoring software and information systems. ATLAS and US ATLAS are in the process of merging what have been distinct monitoring efforts, a PanDA/US effort and a WLCG/CERN/ARDA effort, together with a newly opening development effort on an overarching ATLAS Grid Information System (AGIS) that will integrate ATLAS-specific information with the grid information systems. Discussions are beginning with the objective of integrating this merged development effort with the OSG program in information systems and monitoring. US ATLAS and the OSG WMS effort are also continuing to seek ways in which the larger OSG community can leverage PanDA and other ATLAS developments. In response to feedback from prospective PanDA users outside ATLAS, a simple web-based data handling system was developed that integrates well with PanDA data handling, allowing VOs using PanDA to easily integrate basic dataflow as part of PanDA-managed processing automation. We also look forward to the full deployment of PanDA for ITB testbed processing automation. In other joint OSG work, Condor-G constitutes the critical backbone of PanDA's pilot job submission system, and as mentioned earlier in this report we have benefited greatly from several rounds of consultation and consequent Condor improvements to increase the scalability of the submission system for pilot jobs.
We expect that in 2010 the OSG and its WMS effort will continue to be the principal source for PanDA security enhancements, and an important contributor to further integration with middleware (particularly Condor) and to scalability/stress testing.

2.2 CMS

US-CMS relies on Open Science Grid for critical computing infrastructure, operations, and security services. These contributions have allowed US-CMS to focus experiment resources on being prepared for analysis and data processing, by saving effort in areas covered by OSG. OSG provides a common set of computing infrastructure on top of which CMS, with development effort from the US, has been able to build a reliable processing and analysis framework that runs on the Tier-1 facility at Fermilab, the project-supported Tier-2 university computing centers, and opportunistic Tier-3 centers at universities. There are currently 18 Tier-3 centers registered with the CMS computing grid in the US, which provide additional simulation and analysis resources to the US community. In addition to common interfaces, OSG has provided the packaging, configuration, and support of the storage services. Since the beginning of OSG the operations of storage at the Tier-2 centers have improved steadily in reliability and performance. OSG is playing a crucial role here for CMS in that it operates a clearinghouse and point of contact between the sites that deploy and operate this technology and the developers. In addition, OSG fills in gaps left open by the developers in areas of integration, testing, and tools to ease operations. The stability of the computing infrastructure has not only benefitted CMS. CMS' use of resources (see Figure 3 and Figure 4) has been very much cyclical so far, thus allowing for significant use of the resources by other scientific communities. OSG is an important partner in Education and Outreach, and in maximizing the impact of the investment in computing resources for CMS and other scientific communities.
In addition to computing infrastructure, OSG plays an important role in US-CMS operations and security. OSG has been crucial in ensuring that US interests are addressed in the WLCG. The US accounts for a large fraction of the collaboration, both in terms of participants and capacity, but a small fraction of the sites that make up WLCG. OSG is able to provide a common infrastructure for operations including support tickets, accounting, availability monitoring, interoperability, and documentation. As CMS has entered the operations phase, the need for sustainable security models and regular accounting of available and used resources has become more important. The common accounting and security infrastructure and the personnel provided by OSG represent significant benefits to the experiment, with the teams at Fermilab and the University of Nebraska providing the development and operations support, including the reporting and validation of the accounting information between the OSG and WLCG.

Figure 3: OSG CPU hours used by CMS (63M hours total), color coded by facility.

Figure 4: Number of unique CMS users of the OSG computing facility.

In addition to these general roles OSG plays for CMS, OSG made the following concrete contributions to major milestones in CMS during the last year: Within the last year, CMS transitioned to using glideinWMS in production, both for reprocessing at the Tier-1s and for analysis at the Tier-2s and Tier-3s. Burt Holzman at Fermilab and Igor Sfiligoi at UCSD have been spearheading these efforts. OSG has been crucial in providing expertise during deployment and integration, and in working with the Condor team on resolving a variety of scalability issues within glideinWMS. In addition, OSG provided developer expertise towards the integration of glideinWMS with the CMS data analysis framework, CRAB. The next steps in this area are to work out deployment issues with CERN.
Through the leadership of Frank Wuerthwein at UCSD, OSG is working with DISUN and Condor on a more firewall-friendly glideinWMS, a requirement for CERN deployment as well as for overcoming the next set of scalability hurdles. OSG's introduction of lcg-cp/ls/rm, UberFTP, and MyProxy into the VDT client installation has eliminated the dependencies on the gLite middleware at CMS sites, thus significantly simplifying deployment and operations of the sites. The OSG work on CE scalability and robustness has made CE overloads a rare occurrence at present scales for CMS. The only time we still experience this as a problem is when VOs other than CMS use our sites in non-optimal ways. The investment by the OSG storage team, led by Tanya Levshina at Fermilab, in support of BeStMan SRM is an excellent example where OSG provided something requested by stakeholders other than CMS that has proven to be very beneficial to CMS. CMS now deploys BeStMan, supported through OSG's Alex Sim, a member of Arie Shoshani's team at LBNL, as part of the preferred Storage Element (SE) solution at Tier-3s, as well as a high-performance SE option for the Tier-2s. The latter, in combination with HDFS or Lustre as file systems, reached 100 Hz lcg-ls rates during scalability tests done by OSG at three of the CMS Tier-2 sites. This is a 20x increase within the last year. OSG has recently started to include integration and deployment support for the complete package of BeStMan/GridFTP/FUSE/HDFS for use at Tier-2s and Tier-3s. Without OSG, this kind of "technology transfer" into CMS would not have happened. OSG, through the contributions of Brian Bockelman at the University of Nebraska working with the CMS and OSG storage teams, has contributed to an improved understanding of the IO characteristics of CMS analysis jobs, and is providing advice and expertise to a global CMS effort to measure the IO capability of all of the CMS sites worldwide.
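A metadata-operation rate like the 100 Hz lcg-ls figure above is measured by timing repeated invocations of the operation against a live endpoint. The sketch below is a minimal, generic timing harness; the commented lcg-ls wrapper uses a hypothetical SRM endpoint and requires the lcg_util client tools to actually run.

```python
import time
from typing import Callable


def measure_rate(op: Callable[[], None], n_calls: int = 100) -> float:
    """Time n_calls invocations of op() and return the achieved rate in Hz."""
    start = time.perf_counter()
    for _ in range(n_calls):
        op()
    elapsed = time.perf_counter() - start
    return n_calls / elapsed


# To benchmark an SRM Storage Element, op would wrap the metadata call
# under test, e.g. (hypothetical endpoint; requires lcg_util tools):
#   import subprocess
#   op = lambda: subprocess.run(
#       ["lcg-ls", "srm://se.example.edu:8443/store"],
#       check=True, capture_output=True)
#   print(f"{measure_rate(op):.1f} Hz")
```

In a real test the client would typically be run concurrently from many worker nodes, so the single-client rate above is a lower bound on what the SE can sustain.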
OSG helped get this activity off the ground in global CMS, with the expectation that lessons learned will flow back to OSG to be disseminated to the wider OSG community. The CMS goal here is to improve the CPU efficiency for data analysis at Tier-2s and Tier-3s worldwide. We expect this activity to continue well into 2010. Finally, let us mention a major success in connection with the first LHC collisions observed in the CMS detector in the afternoon of November 23rd, 2009. By the end of the day, 4 Tier-2s and one Tier-3 on OSG had transferred the complete 2/3 TB of collision data taken that day, and were making it accessible for analysis to everybody in the collaboration. We consider this an excellent validation of the LHC computing model, of which OSG is a major part. Since then OSG sites have kept up with data taking, and are presently hosting all the relevant primary and secondary datasets from collisions, as well as the corresponding Monte Carlo samples.

2.3 LIGO

The Einstein@Home data analysis application, which searches for gravitational radiation from spinning neutron stars using data from the Laser Interferometer Gravitational Wave Observatory (LIGO) detectors, was identified as an excellent LIGO application for production runs on OSG. Because this particular search is virtually unbounded in the scientific merit achievable with additional computing resources, any available resource can provide additional scientific benefit. The original code base for Einstein@Home evolved significantly during the first half of this year to support the use of Condor-G for job submission and job management. This has removed internal dependencies in the code base on the particular job manager in use at various grid sites around the world. Several other modifications to the code ensued to address stability, reliability, and performance. By May 2009, the code was running reliably in production on close to 20 sites across the OSG that support job submission from the LIGO Virtual Organization.
Since late summer of 2009 the number of wall clock hours utilized by Einstein@Home on the OSG has nearly doubled from what was typically available in the first half of the year. Since January 1, 2009, more than 5 million wall clock hours have been contributed by the Open Science Grid to this important LIGO data analysis application (Figure 5). This is primarily due to the effort made to deploy and run the application on a larger number of OSG sites, and to working with local site authorities to understand local policies and how best to support them in the running application.

Figure 5: OSG usage by LIGO's Einstein@Home application for the year, running at production levels. Production running with the new Condor-G job submission interface began in March 2009; the increases seen since early September are due to increases in the number of sites compatible with the code.

In the past year, LIGO has also begun to investigate ways to migrate the data analysis workflows searching for gravitational radiation from binary black holes and neutron stars onto the Open Science Grid for production-scale utilization. The binary inspiral data analyses typically involve working with tens of terabytes of data in a single workflow. Collaborating with the Pegasus Workflow Planner developers at USC-ISI, LIGO has identified changes to both Pegasus and the binary inspiral workflow codes to more efficiently utilize the OSG, where data must be moved from LIGO archives to storage resources near the worker nodes on OSG sites. One area of particular focus has been the understanding and integration of Storage Resource Management (SRM) technologies used at OSG Storage Element (SE) sites to house the vast amounts of data used by the binary inspiral workflows, so that worker nodes running the binary inspiral codes can effectively access the data. An SRM Storage Element has been established on the LIGO Caltech OSG integration testbed site.
This site has 120 CPU cores with approximately 30 terabytes of storage currently configured under SRM. The SE uses BeStMan and Hadoop for the distributed file system shared among the worker nodes. Using Pegasus for the workflow planning, workflows for the binary inspiral data analysis application needing tens of terabytes of LIGO data have successfully run on this system. Effort is now underway to translate this success onto OSG production sites that support the SRM storage element. LIGO has also been working closely with the OSG, DOE Grids, and ESnet to evaluate the implications of its requirements on authentication and authorization within its own LIGO Data Grid user community and how these requirements map onto the security model of the OSG.

2.4 ALICE

The ALICE experiment at the LHC relies on a mature grid framework, AliEn, to provide computing resources in a production environment for the simulation, reconstruction, and analysis of physics data. Developed by the ALICE Collaboration, the framework has been fully operational for several years, deployed at ALICE and WLCG Grid resources worldwide. This past year, in conjunction with plans to deploy ALICE Grid resources in the US, the ALICE VO and OSG began the process of developing a model to integrate OSG resources into the ALICE Grid. In early 2009, an ALICE-OSG joint task force was formed to support the inclusion of ALICE Grid activities in OSG. The task force developed a series of goals leading to a common understanding of the AliEn and OSG architectures. The OSG Security team reviewed and approved a proxy renewal procedure common to ALICE Grid deployments for use on OSG sites. A job-submission mechanism was implemented whereby an ALICE VO-box service deployed on the NERSC-PDSF OSG site submitted jobs to the PDSF cluster through the OSG interface. The submission mechanism was activated for ALICE production tasks and operated for several months.
The task force validated ALICE OSG usage through normal reporting means and verified that site operations were sufficiently stable for ALICE production tasks at low job rates and with minimal data requirements, as allowed by the available local resources. The ALICE VO is currently registered with OSG, supports a representative in the OSG VO forum, and provides an Agent to the OSG-RA for issuing DOE Grid certificates to ALICE collaborators. ALICE use of OSG will grow as ALICE resources are deployed in the US. These resources will provide the data storage facilities needed to expand ALICE use of OSG and add compute capacity on which the AliEn-OSG interface can be utilized at full ALICE production rates.

2.5 D0 at Tevatron

The D0 experiment continues to rely heavily on OSG infrastructure and resources in order to meet the computing demands of the experiment. The D0 experiment has successfully used OSG resources for many years and plans to continue this very successful relationship into the foreseeable future. All D0 Monte Carlo simulation is generated at remote sites, with OSG continuing to be a major contributor. During the past year, OSG sites simulated 390 million events for D0, approximately 1/3 of all production. This is an increase of 62% relative to the previous year. The increase is due to many factors. The opportunistic use of storage elements has improved. The SAM infrastructure has been improved, resulting in increased OSG job efficiency. In addition, many sites have modified their preemption policies, which improves the throughput of D0 jobs. Improved efficiency, increased resources (D0 used 24 sites in the past year and uses 21 regularly), automated job submission, use of resource selection services, and expeditious use of opportunistic computing continue to play a vital role in high event throughput. The total number of D0 OSG MC events produced over the past several years is 726 million (Figure 6).
Over the past year, the average number of Monte Carlo events produced per week by OSG has remained approximately constant. Since we use the computing resources opportunistically, it is notable that, on average, we can maintain an approximately constant rate of MC event production. When additional resources become available (April and May of 2009 in Figure 7) we are quickly able to take advantage of them. Over the past year we were able to produce over 10 million events/week on nine separate occasions, and a record of 13 million events/week was set in May 2009. With the turn-on of the LHC experiments, it will be more challenging to obtain computing resources opportunistically. It is therefore very important that D0 use the resources it can obtain efficiently, and D0 plans to keep working with OSG and Fermilab computing to further improve the efficiency of Monte Carlo production on OSG sites. The primary processing of D0 data continues to be run using OSG infrastructure. One of the very important goals of the experiment is to have the primary processing of data keep up with the rate of data collection. It is critical that the processing keep up in order for the experiment to quickly find any problems in the data and to avoid accumulating a backlog. Typically D0 is able to keep up with the primary processing of data by reconstructing nearly 6 million events/day. However, when the accelerator collides at very high luminosities, it is difficult to keep up with the data using our standard resources. Since the computing farm and the analysis farm share the same infrastructure, however, D0 is able to increase its processing power quickly to improve its daily processing of data. During the past year, D0 moved some of its analysis nodes to processing nodes in order to increase its throughput. Figure 8 shows the cumulative number of data events processed per day.
In March the additional nodes were implemented, and one can see the improvement in events reconstructed. These additional nodes allowed D0 to quickly finish processing its collected data, and the borrowed nodes were then returned to the analysis cluster. Having this flexibility to move analysis nodes to processing nodes in times of need is a tremendous asset. Over the past year D0 has reconstructed nearly 2 billion events on OSG facilities. OSG resources have allowed D0 to meet its computing requirements in both Monte Carlo production and data processing. This has directly contributed to D0 publishing 37 papers in 2009; see http://www-d0.fnal.gov/d0_publications/#2009.

Figure 6: Cumulative number of D0 MC events generated by OSG during the past year.

Figure 7: Number of D0 MC events generated per week by OSG during the past year.

Figure 8: Cumulative number of D0 data events processed by OSG infrastructure. In March additional nodes were added to the processing farm to increase its throughput. The flat region in August and September corresponds to the time when the accelerator was down for maintenance, so no events needed to be processed.

2.6 CDF at Tevatron

In 2009, the CDF experiment produced 83 new results for the winter and summer conferences using OSG infrastructure and resources. These resources support the work of graduate students, who are producing one thesis per week, and the collaboration as a whole, which is submitting a publication of new physics results every ten days. Over one billion Monte Carlo events were produced by CDF in the last year. Most of this processing took place on OSG resources. CDF also used OSG infrastructure and resources to support the processing of 1.8 billion raw data events that were streamed to 3.4 billion reconstructed events, which were then processed into 5.7 billion ntuple events; an additional 1.5 billion ntuple events were created from Monte Carlo data.
Detailed numbers of events and volume of data are given in Table 2 (total data since 2000) and Table 3 (data taken in 2009).

Table 2: CDF data collection since 2000

  Data Type       Volume (TB)   # Events (M)   # Files
  Raw Data             1398.5         9658.1    1609133
  Production           1782.8        12988.5    1735186
  MC                    796.4         5618.5     907689
  Stripped-Prd           76.8          712.1      75108
  Stripped-MC             0.5            3.0        533
  MC Ntuple             283.2         4138.3     257390
  Total                4866.5        51151.9    5058987

Table 3: CDF data collection in 2009

  Data Type       Volume (TB)   # Events (M)   # Files
  Raw Data              286.4         1791.7     327598
  Production            558.2         3395.9     466755
  MC                    196.8         1034.6     228153
  Stripped-Prd           21.2          127.1      17066
  Stripped-MC             0               0           0
  Ntuple                187.3         5732.6     155129
  MC Ntuple              86.0         1482.2      75954
  Total                1335.8        13564.2    1270655

The OSG provides the collaboration computing resources through two portals. The first, the North American Grid portal (NAmGrid), covers the functionality of MC generation in an environment that requires the full software to be ported to the site, with only Kerberos- or grid-authenticated access to remote storage for output. The second portal, CDFGrid, provides an environment that allows full access to all CDF software libraries and methods for data handling, including dCache, disk-resident dCache, and file servers for small ntuple analysis. CDF, in collaboration with OSG, aims to improve the infrastructural tools in the coming years to increase the usage of Grid resources, particularly in the area of distributed data handling. Since March 2009, CDF has been operating the pilot-based Workload Management System (glideinWMS) as the submission method to remote OSG sites. This system went into production on the NAmGrid portal. Figure 9 shows the number of running jobs on NAmGrid and demonstrates that there has been steady usage of the facilities, while Figure 10, a plot of the queued requests, shows that there is large demand.
The highest priority in the last year has been to validate sites for reliable Monte Carlo generation and to develop metrics to demonstrate smooth operations. One impediment to smooth operation has been the rate at which jobs are lost and restarted by the batch system. There were a significant number of restarts until May 2009, after which the rate tailed off significantly. At that point, it was noticed that most restarts occurred at specific sites. These sites were subsequently removed from NAmGrid. Those sites, and any new site, are tested and certified in an integration instance of the NAmGrid portal using Monte Carlo jobs that have previously been run in production. A large resource provided by KISTI in Korea is being brought into operation and will provide a large Monte Carlo production resource with a high-speed connection to Fermilab for storage of the output. It will also provide a cache that will allow the data handling functionality to be exploited. We are also adding more monitoring to the CDF middleware to allow faster identification of problem sites or individual worker nodes. Issues of data transfer and the applicability of opportunistic storage are being studied as part of the effort to understand issues affecting reliability. Significant progress has been made by simply adding retries with a backoff in time, on the assumption that failures occur most often at the far end.

Figure 9: Running CDF jobs on NAmGrid.

Figure 10: Waiting CDF jobs on NAmGrid, showing large demand, especially in preparation for the 42 results sent to Lepton-Photon in August 2009 and the rise in demand for the winter 2010 conferences.

A legacy glide-in infrastructure developed by the experiment ran through 2009 until December 8 on the portal to on-site OSG resources (CDFGrid). This has now been replaced by the same glideinWMS infrastructure used in NAmGrid. Plots of the running jobs and queued requests are shown in Figure 11 and Figure 12.
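The retry-with-backoff approach described above for CDF data transfers can be sketched as follows. This is a minimal illustration, not CDF's production code; the retry count and delays are assumed values, and the growing wait gives a transient far-end problem time to clear before the next attempt.

```python
import time


def transfer_with_retries(transfer, max_attempts: int = 4,
                          base_delay_s: float = 30.0, sleep=time.sleep):
    """Run transfer(); on failure, wait with an exponentially growing
    delay and try again, assuming most failures are transient problems
    at the far end. Raises the last error if every attempt fails."""
    for attempt in range(max_attempts):
        try:
            return transfer()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the failure
            sleep(base_delay_s * (2 ** attempt))  # 30 s, 60 s, 120 s, ...
```

The injectable `sleep` parameter is a design convenience that makes the policy testable without real delays; in production it would default to `time.sleep`.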
The very high demand for the CDFGrid resources observed during the winter conference season (leading to 41 new results), and again during the summer conference season (leading to an additional 42 new results), is noteworthy. Queues exceeding 50,000 jobs can be seen.

Figure 11: Running CDF jobs on CDFGrid.

Figure 12: Waiting CDF jobs on CDFGrid.

A clear pattern of CDF computing has emerged. There is high demand for Monte Carlo production in the months after the conference season, and for both Monte Carlo and data starting about two months before the major conferences. Since the implementation of opportunistic computing on CDFGrid in August, the NAmGrid portal has been able to take advantage of the computing resources on FermiGrid that were formerly only available through the CDFGrid portal. This has led to very rapid production of Monte Carlo in the period between conferences, when the generation of Monte Carlo datasets is the main computing demand. In May 2009, CDF conducted a review of the CDF middleware and its usage of Condor and OSG. While no major issues were identified, a number of cleanup projects were suggested. These have all been implemented and will add to the long-term stability and maintainability of the software. A number of issues affecting operational stability and efficiency have arisen in the last year. These issues, with examples and with solutions or requests for further OSG development, are described here. Architecture of an OSG site: Among the major issues we encountered in achieving smooth and efficient operations was a serious unscheduled downtime lasting several days in April. Subsequent analysis found the direct cause to be incorrect parameters set on disk systems that were simultaneously serving the OSG gatekeeper software stack and end-user data output areas. No OSG software was implicated in the root cause analysis; however, the choice of architecture was a contributing cause.
This is a lesson worth consideration by OSG: a best-practices recommendation coming from a review of the implementation of computing resources across OSG could be a worthwhile investment. Service level and Security: Since April, Fermilab has had a new protocol for upgrading Linux kernels with security updates. An investigation of the security kernel releases from the beginning of 2009 showed that, for both SLF4 and SLF5, the time between releases was smaller than the maximum time allowed by Fermilab for a kernel to be updated. Since each update requires a reboot of all services, draining the long (72-hour) job queues has forced NAmGrid and CDFGrid to be down for three days every two months. A rolling reboot scheme has been developed and deployed, but careful sequencing of critical servers is still being worked out. Opportunistic Computing/Efficient resource usage: The issue of preemption has been important to CDF this year. CDF has the role of both providing and exploiting opportunistic computing. During the April downtime already mentioned above, the preemption policy as provider caused operational difficulties characterized vaguely as "unexpected behavior". The definition of expected behavior was worked out, and preemption was re-enabled after the August 2009 conferences ended. From the point of view of exploiting opportunistic resources, the management of preemption policies at sites has a dramatic effect on the ability of CDF to utilize those sites opportunistically. Some sites, for instance, modify their preemption policy from time to time. Tracking these changes requires careful communication with site managers to ensure that the job duration options visible to CDF users are consistent with the preemption policies. CDF has added queue lengths in an attempt to provide this match. The conventional way in which CDF Monte Carlo producers compute their production strategy, however, requires queue lengths that exceed the typical preemption time by a considerable margin.
To address this problem, the current submission strategies are being re-examined. A more general review of the Monte Carlo production process at CDF will be carried out in February 2010. A second common impediment to opportunistic usage is the exercise of a preemption policy that kills jobs immediately when a higher-priority user submits a job, rather than a more graceful policy that allows completion of the lower-priority job within some time frame. CDF has removed all such sites from NAmGrid because the effective job failure rate is too high for most users to tolerate. This step has significantly reduced the OSG resources available to CDF, which now essentially consist of those sites at which CDF has paid for computing. Clear policies and guidelines on opportunistic usage, and publication of those policies in human- and computer-readable form, are needed so that the most efficient use of computing resources may be achieved. Infrastructure Reliability and Fault tolerance: Restarts of jobs are not yet completely eliminated. The main cause seems to be individual worker nodes that are faulty and reboot from time to time. While it is possible to blacklist nodes at the portal level, better sensing of these faults and removal at the OSG infrastructure level would be more desirable. We continue to emphasize the importance of stable running and minimization of infrastructure failures, so that users can reliably assume that failures are the result of errors in their own processing code, thereby avoiding the need to continually question the infrastructure. Job and Data handling interaction: Job restarts also cause a loss of synchronization between the job handling and data handling. A separate effort is under way to improve the recovery tools within the data handling infrastructure. Tools and design are needed to allow the job and data handling to be integrated and to allow fault tolerance for both systems to remain synchronized.
Management of input Data resources: Access to data, both from databases and from data files, has caused service problems in the last year. The implementation of opportunistic running on CDFGrid from NAmGrid, coupled with decreased demand on CDFGrid for file access in the post-conference period and a significant demand for new Monte Carlo datasets for the 2010 winter conference season, placed huge loads on the CDF database infrastructure and caused grid job failures due to overloading of the databases. This has been traced to complex queries whose computations should be done locally rather than on the server. Modifications to the simulation code are being implemented, and throttling of the Monte Carlo production is required until the new code becomes available in January 2010. During the conference crunch in July 2009, there was huge demand on the data-handling infrastructure, and the 350 TB disk cache was being turned over every two weeks: effectively, files were moved from tape to disk, used by jobs, and deleted. This in turn left many FermiGrid worker nodes sitting idle waiting for data. A program to understand the causes of idle computing nodes, from this and other sources, has been initiated, and CDF users are asked to describe more accurately what work they are doing when they submit jobs by filling in qualifiers in the submit command. In addition, work to implement pre-staging of files and to monitor its effectiveness is nearing completion. This and the database overload issue point to a general resource management problem: the resource requirements of jobs running on OSG should be examined in a more considered way and would benefit from more thought by the community at large. The usage of OSG has been fruitful for CDF, and the ability to add large new resources such as KISTI, as well as more moderate resources, within a single job submission framework has been extremely useful.
The collaboration has produced significant new results in the last year with the processing of huge data volumes, and significant consolidation of the tools has occurred. In the next year, the collaboration looks forward to a bold computing effort in the push to see evidence for the Higgs boson, a task that will require further innovation in data handling and significant computing resources in order to reprocess the large quantities of Monte Carlo and data needed to achieve the desired improvements in tagging efficiencies. We look forward to another year with high publication rates and interesting discoveries.
2.7 Nuclear physics
The STAR experiment has continued to use data movement capabilities between its established Tier-1 and Tier-2 centers: between BNL and LBNL (Tier-1), and with Wayne State University and NPI/ASCR in Prague (two fully functional Tier-2 centers). A new center, the Korea Institute of Science and Technology Information (KISTI), joined the STAR collaboration as a full partnering facility and resource provider in 2008, and activities surrounding the exploitation of this new potential have taken a large part of STAR's effort in the 2008/2009 period. The RHIC run of 2009 was projected to bring STAR a fully integrated new data acquisition system with data throughput capabilities growing from the 100 MB/sec reached in 2004 to 1000 MB/sec. This is the second time in the experiment's lifetime that STAR computing has had to cope with an order-of-magnitude growth in data rates. A threshold in STAR's physics program was thus reached where leveraging all resources across all available sites has become essential to success. Since the resources at KISTI had the potential to absorb up to 20% of the cycles needed for one pass of data production in early 2009, efforts were focused on bringing the average data transfer throughput from BNL to KISTI to 1 Gb/sec.
It was projected (Section 3.2 of "The STAR Computing resource plan", STAR Notes CSN0474, http://drupal.star.bnl.gov/STAR/starnotes/public/csn0474) that such a rate would sustain the need up to 2010, after which a maximum of 1.5 Gb/sec would cover the currently projected physics program up to 2015. Thanks to help from ESnet, Kreonet, and collaborators at both end institutions, this performance was reached (see http://www.bnl.gov/rhic/news/011309/story2.asp, "From BNL to KISTI: Establishing High Performance Data Transfer From the US to Asia", and http://www.lbl.gov/cs/Archive/news042409c.html, "ESnet Connects STAR to Asian Collaborators"). At this time baseline Grid tools are used and the OSG software stack has not yet been deployed; STAR plans to include a fully automated job processing capability and return of data results using BeStMan/SRM (Berkeley's implementation of an SRM server). Encouraged by the progress on network tuning for the BNL/KISTI path and driven by the expected data flood from Run-9, the computing team re-addressed all of its network data transfer capabilities, especially between BNL and NERSC and between BNL and MIT. MIT has been a silent Tier-2: a site providing resources for local scientists' research and R&D work but not providing resources to the collaboration as a whole. MIT has been active since the work on Mac/X-Grid reported in 2006, a well-spent effort which has evolved into leveraging additional standard Linux-based resources, and data samples are now routinely transferred between BNL and MIT. The BNL/STAR gatekeepers have all been upgraded, and all data transfer services are being re-tuned based on the new topology. Initially planned for the end of 2008, the strengthening of transfers to/from well-established sites was a milestone delayed by six months to the benefit of the BNL/KISTI data transfer.
A research activity involving STAR and the computer science department in Prague has been initiated to improve the data management program and network tuning. We are studying and testing a multi-site data transfer paradigm: coordinating the movement of datasets to and from multiple locations (sources) in an optimal manner, using a planner that takes into account the performance of the network and of each site. This project relies on knowledge of the file locations at each site and a known network data transfer speed as initial parameters (as data is moved, the speed can be reassessed, so the system is a self-learning component). The project has already shown impressive gains over a standard peer-to-peer approach to data transfer. Although this activity has so far impacted OSG in a minimal way, we will use the OSG infrastructure to test our implementation and prototyping. To this end, we paid close attention to the protocols and concepts used in Caltech's Fast Data Transfer (FDT) tool, as its streaming approach has non-trivial consequences for mitigating TCP protocol shortcomings. Design considerations and initial results were presented at the Grid2009 conference and published in the proceedings as "Efficient Multi-Site Data Movement in distributed Environment". The implementation is not yet fully ready, however, and we expect further development in 2010. Our Prague site also presented its previous work on setting up a fully functional Tier-2 site at the CHEP 2009 conference, and summarized our work on the Scalla/Xrootd and HPSS interaction: how to achieve efficient retrieval of data from mass storage using advanced request queuing techniques based on file location on tape while respecting fair-share constraints.
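The planning idea described above — assign each file to a source according to estimated finish time, then refine the speed estimates as transfers complete — can be sketched as a small greedy planner. The data structures and the smoothing update are our own illustrative assumptions, not the actual STAR/Prague implementation:

```python
# Illustrative multi-source transfer planner: files are assigned to
# source sites greedily by estimated finish time, and per-site speed
# estimates can be re-assessed as transfers complete (self-learning).

def plan_transfers(file_locations, site_speed_mbps, file_size_mb):
    """file_locations: {filename: [sites holding a replica]};
    returns {site: [files to pull from it]} balancing per-site load."""
    busy_until = {s: 0.0 for s in site_speed_mbps}   # queued seconds of work
    plan = {s: [] for s in site_speed_mbps}
    # Schedule the least-replicated files first: they have fewest options.
    for fname, sites in sorted(file_locations.items(), key=lambda kv: len(kv[1])):
        # Pick the source that would finish this file soonest.
        best = min(sites, key=lambda s: busy_until[s] + file_size_mb[fname] / site_speed_mbps[s])
        plan[best].append(fname)
        busy_until[best] += file_size_mb[fname] / site_speed_mbps[best]
    return plan

def update_speed(old_mbps, observed_mbps, alpha=0.3):
    """Self-learning step: blend an observed rate into the estimate."""
    return (1 - alpha) * old_mbps + alpha * observed_mbps
```

With two sources of unequal speed, the planner naturally routes singly-replicated files to their only holder and spills the remaining load onto the slower site only when the faster one is saturated.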
The respective abstracts, "Setting up Tier2 site at Golias/Prague farm" and "Fairshare scheduling algorithm for a tertiary storage system", are available at http://indico.cern.ch/abstractDisplay.py?abstractId=432&confId=35523 and http://indico.cern.ch/abstractDisplay.py?abstractId=431&confId=35523. STAR has continued to use and consolidate the BeStMan/SRM implementation and has continued active discussion, steering, and integration of the messaging format from the Center for Enabling Distributed Petascale Science's (CEDPS) Troubleshooting team, in particular targeting the use of BeStMan client/server troubleshooting for faster detection of, and recovery from, errors and performance anomalies. The BeStMan and syslog-ng deployments at NERSC provide early testing of new features in a production environment, especially for logging and recursive directory tree file transfers. At the time of this report, an implementation is available whereby BeStMan-based messages are passed to a collector using syslog-ng. Several problems have already been found, leading to a strengthening of the product. We had hoped to have a case study within months, but we are at this time missing a data-mining tool able to correlate (and hence detect) complex problems and automatically send alarms. The collected logs have nevertheless been useful for finding and identifying problems from a single source of information. STAR has also finished developing its own job tracking and accounting system, a simple approach based on adding tags at each stage of the workflow and collecting the information via recorded database entries and log parsing. The work was presented at the CHEP 2009 conference ("Workflow generator and tracking at the rescue of distributed processing. Automating the handling of STAR's Grid production", Contribution ID 475, CHEP 2009, http://indico.cern.ch/contributionDisplay.py?contribId=475&confId=35523).
STAR has also continued an effort to collect information at the application level, building on and learning from in-house user-centric and workflow-centric monitoring packages. The STAR SBIR Tech-X/UCM project, aimed at providing a fully integrated User Centric Monitoring (UCM) toolkit, has reached its end-of-funding cycle. The project is being absorbed by STAR personnel aiming to deliver a workable monitoring scheme at the application level. The reshaped UCM library has been used in nightly and regression testing to help further development (mainly scalability, security, and integration into a Grid context). Several components needed reshaping, as the initial design approach was too complex and slowed down maintenance and upgrades. To this end, a new SWIG ("Simplified Wrapper and Interface Generator") based approach was used, reducing the overall size of the interface package by more than an order of magnitude. The knowledge gained, and a working infrastructure based on syslog-ng, may very well provide a simple mechanism for merging UCM with the CEDPS vision. Furthermore, STAR has developed a workflow analyzer for experimental data production (mainly simulation) and presented the work at the CHEP 2009 conference as "Automation and Quality Assurance of the Production Cycle" (http://indico.cern.ch/abstractDisplay.py?abstractId=475&confId=35523), now accepted for publication. The toolkit developed in this activity allows the extraction of independent accounting and statistical information, such as task efficiency and success percentage, keeping a good record of Grid-based production. Additionally, a job feeder was developed that automatically throttles job submission across multiple sites, keeping all sites at maximal occupancy while also detecting problems (gatekeeper downtimes and other issues). The feeder can automatically resubmit failed jobs up to N times; a single resubmission brings the overall job success efficiency to 97%.
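The feeder behavior just described — throttled submission over multiple sites with bounded automatic resubmission — can be sketched minimally as follows. The interfaces (the run_job callback and the site list) are illustrative assumptions; the real feeder talks to grid gatekeepers asynchronously:

```python
# Minimal sketch of a job feeder with bounded automatic resubmission.
# run_job(site, job) -> True on success is a hypothetical callback.

import itertools
from collections import deque

def feed_jobs(jobs, sites, run_job, max_retries=1):
    """Cycle submissions over `sites`; resubmit a failed job up to
    `max_retries` times. Returns (succeeded, failed) job lists."""
    site_cycle = itertools.cycle(sites)
    pending = deque((j, 0) for j in jobs)        # (job, attempts so far)
    succeeded, failed = [], []
    while pending:
        job, attempts = pending.popleft()
        site = next(site_cycle)                  # simple round-robin spread
        if run_job(site, job):
            succeeded.append(job)
        elif attempts < max_retries:
            pending.append((job, attempts + 1))  # automatic resubmission
        else:
            failed.append(job)                   # give up after N retries
    return succeeded, failed
```

With max_retries=1 this mirrors the single-resubmission policy mentioned above; the 97% success figure is the report's measurement, not a property of this sketch.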
This tool was used in the Amazon EC2 exercise (see later section). STAR grid data processing and job handling operations have continued their progression toward fully Grid-based operation relying on the OSG software stack and the OSG Operations Center issue tracker. The STAR operations support team has been efficient in addressing issues, and overall the stability of the grid infrastructure seems to have increased. To date, however, STAR has mainly achieved simulated data production on Grid resources. Since reaching a milestone in 2007, it has become routine to utilize non-STAR-dedicated resources from the OSG for the Monte Carlo event generation pass and to run the full response simulator chain (which requires the whole STAR framework to be installed) on STAR's dedicated resources. On the other hand, the relative proportion of processing contributions using non-STAR-dedicated resources has been marginal (and mainly on the FermiGrid resources in 2007). This disparity is explained by the fact that full event reconstruction requires the complete STAR software stack and environment, which is difficult or impossible to recreate on arbitrary grid resources; hence, access to generic and opportunistic resources is simply impractical and does not match the realities and needs of running experiments in physics production mode. In addition, STAR's science simply cannot suffer the risk of heterogeneous or non-reproducible results due to subtle library or operating system dependencies, and the overall workforce needed to ensure seamless results on all platforms exceeds our operational funding profile.
Hence, STAR has been a strong advocate of moving toward a model relying on the use of virtual machines (see the contribution at the OSG booth at CHEP 2007) and has since worked closely, to the extent possible, with the CEDPS Virtualization activity, seeking the benefits of truly opportunistic use of resources by creating a complete pre-packaged environment (with a validated software stack) in which jobs will run. Such an approach would allow STAR to run any of its job workflows (event generation, simulated data reconstruction, embedding, real event reconstruction, and even user analysis) while respecting STAR's policies of reproducibility, implemented as complete software stack validation. The technology has huge potential: beyond providing a means of reaching non-dedicated sites, it would allow software provisioning of Tier-2 centers with a minimal workforce to maintain the software stack, hence maximizing the return on investment of Grid technologies. The multitude of platform combinations and the fast dynamic of changes (OS upgrades and patches) otherwise make reaching the diverse resources available on the OSG workforce-constraining and economically unviable.
Figure 13: Corrected STAR recoil jet distribution vs pT
This activity reached a world-premiere milestone when STAR made use of Amazon EC2 resources, using the Nimbus Workspace service to carry out part of its simulation production and handle a late request.
These activities were written up in iSGTW ("Clouds make way for STAR to shine", http://www.isgtw.org/?pid=1001735), Newsweek ("Number Crunching Made Easy: Cloud computing is making high-end computing readily available to researchers in rich and poor nations alike", http://www.newsweek.com/id/195734), SearchCloudComputing ("Nimbus cloud project saves brainiacs' bacon", http://searchcloudcomputing.techtarget.com/news/article/0,289142,sid201_gci1357548,00.html), and HPCWire ("Nimbus and Cloud Computing Meet STAR Production Demands", http://www.hpcwire.com/offthewire/Nimbus-and-Cloud-Computing-Meet-STAR-Production-Demands-42354742.html?page=1). This was the very first time cloud computing had been used in the HENP field for scientific production work with full confidence in the results. The results were presented during a plenary talk at the CHEP 2009 conference (Figure 13) and represent a breakthrough in the production use of clouds. We are working with the OSG management on the inclusion of this technology into OSG's program of work. Continuing this activity in the second half of 2009, STAR has undertaken testing of various models of cloud computing on OSG since the EC2 production run. The model used on EC2 was to deploy a full OSG-like compute element with a gatekeeper and worker nodes. Several groups within the OSG offered to assist and implement diverse approaches. The second model, deployed at Clemson University, uses a persistent gatekeeper with worker nodes launched from a STAR-specific VM image. Within the image, a Condor client registers to an external Condor master, making the whole batch system completely transparent to the end user: the instantiated VMs appear just like other nodes, and STAR jobs slide into the VM instances, where they find a fully supported STAR environment and software package.
The result is similar to configuring a special batch queue meeting the application requirements contained in the VM image; many batch jobs can then be run in that queue. This model is being used at a few sites in Europe, as described at a recent HEPiX meeting (http://indico.cern.ch/conferenceTimeTable.py?confId=61917). A third model has been preliminarily tested at Wisconsin, where the VM image itself acts as the payload of the batch job and is launched for each job. This is similar to the Condor glide-in approach, and also to the pilot-job method, where the useful application work is performed after the glide-in or pilot job starts. This particular model is not well matched to the present STAR SUMS workflow, as jobs would need to be pulled into the VM instance rather than integrated as a submission via a standard gatekeeper (the gatekeeper interaction only starts instances). However, our MIT team will pursue testing at Wisconsin, attempt a demonstrator simulation run, and measure efficiency and evaluate practicality within this approach. One goal of the testing at Clemson and Wisconsin is to eventually reach a level where scalable performance can be compared with running on traditional clusters. The effort in that direction is helping to identify various technical issues, including configuration of the VM image to match the local batch system (contextualization), considerations for how a particular VM image is selected for a job, policy and security issues concerning the content of the VM image, and how the different models fit different workflow management scenarios. Our experience and results in this cloud/grid integration domain are very encouraging regarding the potential usability and benefits of deploying application-specific virtual machines, and also indicate that a concerted effort is necessary in order to address the numerous issues exposed and reach an optimal deployment model.
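The Clemson model, in which the Condor client inside the VM image reports to an external master, can be sketched as an HTCondor worker-node configuration fragment baked into the image. The hostname and security settings below are placeholders, not the actual Clemson or STAR configuration:

```
# Hypothetical condor_config fragment inside a STAR-specific VM image.
CONDOR_HOST = condor-master.example.edu   # external central manager
DAEMON_LIST = MASTER, STARTD              # worker-node daemons only
START = TRUE                              # accept jobs once the VM boots
ALLOW_WRITE = $(CONDOR_HOST)              # let the manager match jobs here
```

Once the instantiated VM starts these daemons, it registers with the pool and appears to users as just another batch slot, which is what makes the approach transparent.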
All STAR physics publications acknowledge the resources provided by the OSG.
2.8 MINOS
Over the last three years, computing for MINOS data analysis has greatly expanded to use more of the OSG resources available at Fermilab. The scale of computing has increased from about 50 traditional batch slots to typical user jobs running on over 1,000 cores, with a strong desire to expand to about 5,000 cores (over the past 12 months MINOS has used 3.1M hours on OSG from 1.16M submitted jobs). This computing resource, combined with 90 TB of dedicated BlueArc (NFS-mounted) file storage, has allowed MINOS to move ahead with traditional and advanced analysis techniques, such as Neural Network, Nearest Neighbor, and Event Library methods. These computing resources are critical as the experiment moves beyond the early, somewhat simpler charged-current physics to the more challenging neutral-current, νe, and other analyses which push the limits of the detector. A few hundred cores of offsite computing at collaborating universities are used for occasional Monte Carlo generation, and MINOS is also starting to use TeraGrid resources at TACC, hoping to greatly speed up its latest processing pass.
2.9 Astrophysics
The Dark Energy Survey (DES) used approximately 40,000 hours of OSG resources in 2009, with DES simulation activities ramping up in the latter part of the year. The most recent DES simulation run produced 3.5 Terabytes of simulated imaging data, which were used for testing the DES data management data processing pipelines as part of DES Data Challenge 5. These simulations consisted of 2,600 mock science images, covering 150 square degrees of the sky, along with another 900 calibration images. Each 1-GB DES image is produced by a single job on OSG and simulates the stars and galaxies on the sky covered in a single 3-square-degree pointing of the DES camera.
The processed simulated data are also being actively used by the DES science working groups for the development and testing of their science analysis codes. DES expects to roughly triple its usage of OSG resources over the following 12 months, as we work to produce a larger simulation data set for DES Data Challenge 6 in 2010.
2.10 Structural Biology
Biomedical computing
In 2009 we established the NEBioGrid VO to support biomedical research on OSG (nebiogrid.org) and to integrate the regional biomedical resources in the New England area. NEBioGrid has deployed the OSG MatchMaker service provided by RENCI, which performs automatic resource selection and scheduling. When combined with Condor's DAGMan, job submission, remote assignment, and resubmission in the event of a failure can be automated with minimal user intervention, allowing NEBioGrid to streamline the utilization of a wide range of OSG resources for its researchers. A researcher at Harvard Medical School expressed interest in utilizing OSG resources for his project; within a week's time NEBioGrid had an initial batch of jobs running on OSG sites, and the submission scripts and job workflow have since been refined to improve the success rate and efficiency of the researcher's continuing computations. A framework has also been established for a researcher at Massachusetts General Hospital, which will allow him to utilize OSG to access computational resources far surpassing what was available to him previously. Work is also in progress with researchers at Northeastern University and Children's Hospital who have expressed a need and desire to utilize OSG. NEBioGrid collaborated with the Harvard Medical School West Quad Computing Group to help set up a new cluster configured as an OSG CE; this work was completed within a couple of weeks, bringing their CE into operational status and enabling it to receive remote jobs.
A cluster utilizing InfiniBand for MPI calculations is currently being installed in collaboration with Children's Hospital and the Immune Disease Institute. This cluster will be used for long-running molecular dynamics simulations and will be made available as an OSG CE with MPI support in 2010. Progress is under way with another HMS IT group to integrate their cluster into OSG. NEBioGrid holds regular seminars and workshops on HPC and grid computing related subjects, some organized in collaboration with the Harvard University Faculty of Arts and Sciences. In April the topics were the LSF batch management system and Harvard's Odyssey cluster, which contributes to the ATLAS experiment. In July Ian Stokes-Rees spoke about collaborative web portal interfaces to HPC resources. In August Johan Montagnat spoke on large-scale medical image analysis using the European grid infrastructure. In October a full-day workshop was held in collaboration with Northeastern University; over a dozen speakers presented on the topic of biomedical computing on GPUs. In December Theresa Kaltz spoke about InfiniBand and other high-speed interconnects.
Structural biology computing
There were two major developments in 2009. First, SBGrid has deployed a range of applications onto over a dozen OSG sites. The primary applications that have been utilized are two molecular replacement programs: Molrep and Phaser. These programs are widely used by structural biologists to identify the 3-D structure of proteins by comparing imaging data from the unknown structure to that of known protein fragments. Typically a data set for an unknown structure is compared to a single set of protein coordinates; SBGrid has developed a technique to do this analysis with 100,000 fragments, requiring between 2,000 and 15,000 hours for a single structure study, depending on the exact application and configuration parameters.
Our early analysis with Molrep indicated that the signals produced by models with very weak sequence identity are, in many cases, too weak to produce a meaningful ranking. The study was repeated using Phaser, a maximum-likelihood application that requires significant computing resources (searches with individual coordinates take between 2 and 10 minutes for crystals with a single molecule in an asymmetric unit, and longer for crystals with many copies of the same molecule). With Phaser a significant improvement in the sensitivity of global molecular replacement was achieved, and we have recently identified several very encouraging cases. For example, the structure of NADH:FMN oxidoreductase (PDB code 2VZF; Figure 14B, gray), originally solved by an experimental phasing method, could be phased with the coordinates of SCOP model 1zwkb1 (Figure 14B, teal). The SCOP model was one of four structures that formed a distinct cluster apart from the rest of the SCOP models, and yet it has very weak sequence identity (<15%). In the testing phase of the project, between September and November 2009, SBGrid ramped up its utilization of OSG, running 42,000 jobs and consuming over 200,000 CPU hours. The portal and our new method have recently been tested by users from Yale University and UCSF, and will soon become available to all members of the structural biology community.
Figure 14: Global molecular replacement. A - after searching with 100,000 SCOP domains, four models form a distinct cluster (highlighted). B - one of the SCOP domains in the cluster (teal) superimposes well with the 2VZF coordinates (gray), although the sequence identity between the two structures is minimal. C - the SBGrid molecular replacement portal deploys computations to OSG resources. Typical runtime for an individual search with a single SCOP domain is 10 minutes.
In the second major development, in collaboration with the Harrison laboratory at Harvard Medical School, we deployed on the grid a method to build and refine a three-dimensional (3D) protein structure primarily based on orientational constraints obtained from nuclear magnetic resonance (NMR) data. We measured three sets of orthogonal residual dipolar coupling (rdc) constraints corresponding to protein backbone atom pairs (N-NH, N-CO, and CO-CA vectors). In our protocol, the experimental data is first divided into small, sequential rdc sets. Each set is used to scan a database of known backbone fragments. We generated a database of 5-, 7-, 9-, 12-, 15-, 20-, and 25-residue fragments by processing a selected group of high-quality X-ray and NMR structures deposited in the Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB). For each protein fragment in the database, we score our experimental data using the Prediction of Alignment from Structure procedure (PALES). The top-scoring fragments define the local protein conformation; the global protein structure is refined in a second step using Xplor. Each PALES comparison requires the prediction of an alignment tensor from a known 3D coordinate file (each fragment) and the statistical analysis of the fit to the experimental dataset. Every PALES analysis takes about 50 seconds to complete on a current state-of-the-art single-core processor. A typical fragment database consists of ~300,000 fragments, and our experimental data (obtained on a 32 kDa membrane protein) is divided into ~300 sequential rdc sets. This requires ~10^9 comparisons, or ~52 days for completion on a single-core computer. The scoring step can easily be divided into ~1000 independent jobs, reducing the execution time to only a few hours per database. We have used the SBGrid computing infrastructure to develop our protocol and to efficiently process our data. We are currently refining the 3D protein structure.
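The scoring step parallelizes trivially because each fragment-versus-rdc-set comparison is independent. A minimal chunking helper along these lines (our illustration, not the actual SBGrid submission code) shows how a fragment list could be divided into ~1000 nearly equal grid jobs:

```python
# Illustrative helper: split independent PALES work items into a fixed
# number of nearly equal grid jobs. The item list and job count stand
# in for the ~300,000 fragments and ~1000 jobs described in the text.

def chunk(items, n_jobs):
    """Divide `items` into at most n_jobs contiguous, nearly equal chunks."""
    k, r = divmod(len(items), n_jobs)
    chunks, start = [], 0
    for i in range(n_jobs):
        end = start + k + (1 if i < r else 0)  # spread the remainder evenly
        chunks.append(items[start:end])
        start = end
    return [c for c in chunks if c]            # drop empties if items < n_jobs

print(chunk(list(range(10)), 3))  # -> [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Each chunk becomes one independent grid job, which is what reduces the serial ~52-day scan to a few hours of wall time per database.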
2.11 Multi-Disciplinary Sciences
The Engagement team has worked directly with researchers in the areas of biochemistry (Xu), molecular replacement (PRAGMA), molecular simulation (Schultz), genetics (Wilhelmsen), information retrieval (Blake), economics and mathematical finance (Buttimer), computer science (Feng), industrial engineering (Kurz), and weather modeling (Etherton). The computational biology team led by Jinbo Xu of the Toyota Technological Institute at Chicago uses the OSG for production simulations on an ongoing basis; their protein prediction software, RAPTOR, is likely one of the top three such programs worldwide. A chemist from the NYSGrid VO is using several thousand CPU hours a day, sustained, as part of modeling the virial coefficients of water. During the past six months, a collaborative task force between the Structural Biology Grid (the computation group at Harvard) and OSG has resulted in the porting of their applications to run across multiple sites on the OSG; they plan to publish science based on the production runs of the past few months.
2.12 Computer Science Research
A collaboration between the OSG extensions program, the Condor project, US ATLAS, and US CMS is using the OSG to test new workload and job management scenarios that provide "just-in-time" scheduling across the OSG sites, using "glide-in" methods to schedule a pilot job locally at a site which then requests user jobs for execution as and when resources become available. This includes use of the "GLExec" component, which the pilot jobs use to provide the site with the identity of the end user of a scheduled executable.
3. Development of the OSG Distributed Infrastructure
3.1 Usage of the OSG Facility
The OSG facility provides the platform that enables production by the science stakeholders; this includes operational capabilities, security, software, integration, and engagement capabilities and support.
We continue to focus on providing stable and reliable "production" level capabilities that the OSG science stakeholders can rely on for their computing work, with timely support when needed. The stakeholders continue to ramp up their use of OSG, and the ATLAS and CMS VOs are ready for the LHC data taking that is now beginning: each has run several workload tests (including STEP09), and the OSG infrastructure has performed well in these exercises and is ready to support full data taking.
Figure 15: OSG facility usage vs. time, broken down by VO
In the last year, the usage of OSG resources by VOs has continued to increase, from 3,500,000 hours per week to over 5,000,000 hours per week, sustained; additional detail is provided in Attachment 1, "Production on the OSG." OSG provides an infrastructure that supports a broad scope of scientific research activities, including the major physics collaborations, nanoscience, biological sciences, applied mathematics, engineering, and computer science. Most of the current usage continues to be in the area of physics, but non-physics use of OSG is a growth area, with current usage exceeding 200,000 hours per week (averaged over the last year) spread over 13 VOs.
Figure 16: OSG facility usage vs. time, broken down by site ("Other" represents the summation of all other smaller sites)
With about 80 sites, the production provided on OSG resources continues to grow; the usage varies depending on the needs of the stakeholders. During stable normal operations, OSG provides approximately 700,000 CPU wall clock hours a day, with peaks occasionally exceeding 900,000 CPU wall clock hours a day; approximately 150,000 opportunistic wall clock hours are available on a daily basis for resource sharing. In addition, OSG is devoting significant effort and technical planning to preparing for a large influx of CMS (~20 new) and ATLAS (~20 new) Tier-3 sites that will be coming online early in 2010.
These ~40 Tier-3 sites are notable because many of their administrators are not expected to have formal computer science training. To support these sites (in collaboration with ATLAS and CMS), OSG has focused on creating both documentation and a support structure suitable for them. To date the effort has been directed in multiple directions:
- Onsite help and hands-on assistance to the ATLAS and CMS Tier-3 coordinators in setting up their Tier-3 test sites, including several multi-day meetings to bring together the OSG experts needed to answer and document specific issues relevant to the Tier-3s. OSG also hosts regular meetings with these coordinators to discuss issues and plan steps forward.
- OSG packaging and support for Tier-3 components such as XRootd, which is projected to be installed at over half of the Tier-3 sites (primarily ATLAS sites). This includes testing and working closely with the XRootd development team via bi-weekly meetings.
- Refactoring OSG workshops to draw in the smaller sites by incorporating tutorials and detailed instruction. The site-administrator workshop in Indianapolis this year, for example, presented in-depth tutorials on installing and/or upgrading CE and SE components. Over half of the sites represented were able to install or upgrade a component while at the workshop, and the remainder completed their upgrades approximately two weeks later.
- OSG documentation for Tier-3s that begins from the point where systems have only an operating system installed. Sections on site planning, file system setup, basic networking, and cluster setup and configuration were therefore written, together with more detailed explanations of each step: https://twiki.grid.iu.edu/bin/view/Tier3/WebHome.
- Working directly with new CMS and ATLAS site administrators as they start to deploy their sites and, in some cases, work through these documents.
- Bi-weekly site meetings geared toward Tier-3s, in conjunction with the ongoing site coordination effort, including office hours held three times a week to discuss issues that arise involving all aspects of the sites. The site meetings are just beginning and are still mostly attended by Tier-2s, but as new Tier-3s come online we expect Tier-3 attendance to pick up.

As the LHC re-starts data taking in December 2009, the OSG infrastructure has withstood the test of the latest I/O and computational challenges: STEP09 for ATLAS and CMS, the ATLAS User Analysis Test (UAT), and the CMS physics challenge. As a result, OSG has demonstrated that it is meeting the needs of its USCMS and USATLAS stakeholders and is well positioned to handle the planned ramp-up of I/O and analysis forecast for 2010.

3.2 Middleware/Software

To enable a stable and reliable production platform, the middleware/software effort has increased its focus on capabilities that improve administration, upgrades, and user support. Between January 2009 and December 2009, OSG’s software efforts focused on supporting OSG 1.0; developing, releasing, and supporting OSG 1.2; and a new focus on so-called native packaging. As in all major software distributions, significant effort must be given to ongoing support. OSG 1.0 was released in August 2008 and was extensively documented in last year’s annual report. Subsequently, there have been 24 incremental updates to OSG 1.0, which demonstrates that OSG 1.0 achieved one of our main goals: supporting incremental updates to the software stack, something that has traditionally been challenging. That said, there were still significant challenges to supporting incremental updates using our distribution mechanism (a packaging tool called Pacman), so in early 2009 we developed OSG 1.2. Our goal was to release OSG 1.2 before the restart of the LHC so that sites could install the new version.
We had a pre-release in June 2009, and it was formally released in July 2009, which gave sites sufficient time to upgrade if they chose to do so; roughly 60% of the sites in OSG have done so. Since the initial release in June we have issued 11 software updates to OSG 1.2. Most of the updates to the OSG software stack were “standard” updates featuring numerous bug fixes, security fixes, and occasional minor feature upgrades. It should be noted that this general maintenance consumes roughly 50% of the OSG Software effort. There have been several software updates and events in the last year that are worthy of deeper discussion. As background, the OSG software stack is based on the VDT grid software distribution. The VDT is grid-agnostic and is used by several grid projects including OSG, TeraGrid, and WLCG; the OSG software stack is the VDT with the addition of OSG-specific configuration.

1) VDT 1.10.1v (released in April 2009) was a significant new update that stressed our ability to supply a major incremental upgrade without requiring complete re-installations. To do this, we supplied a new update program that assists site administrators with the updating process and ensures that it is done correctly. This updater will be used for all future updates provided by the VDT. The update provided a new version of Globus, an update to our authorization infrastructure, and an update to our information infrastructure. It underwent significant testing both internally and by VOs in our integration testbed.

2) OSG 1.2 was released in July 2009. Not only did it significantly improve our ability to provide updates to users, but it also added support for a new operating system (Debian 5), which is required by LIGO. This differed from VDT 1.10.1v in that we tackled a new source of packaging difficulty that had prevented us from easily adding new platform support without disrupting our ability to provide incremental updates for other users.
3) Since the summer of 2009, we have been focusing on the needs of the upcoming ATLAS and CMS Tier-3 sites, particularly with respect to new storage solutions. We have improved our packaging, testing, and releasing of BeStMan, XRootd, and Hadoop, which form a large part of our set of storage solutions.

4) We have emphasized improving our storage solutions in OSG. This is partly for the Tier-3 effort mentioned in item #3, but also for broader use in OSG. For example, we have created new testbeds for XRootd and Hadoop and expanded our test suite to ensure that the storage software we support and release is well tested and understood internally. We have started regular meetings with the XRootd developers and ATLAS to make sure that we understand how development is proceeding and what changes are needed. We have also provided new tools to help users query our information system to discover information about deployed storage systems, which has traditionally been hard in OSG. We expect these tools to be particularly useful to LIGO and SCEC, though other VOs will likely benefit as well. We also conducted an in-person storage forum in June 2009 at Fermilab to help us better understand the needs of our users and to connect them directly with storage experts.

5) We have begun intense efforts to provide the OSG software stack as so-called “native packages” (e.g. RPMs on Red Hat Enterprise Linux). With the release of OSG 1.2, we have pushed the packaging abilities of our Pacman-based infrastructure as far as we can. While our established users are willing to use Pacman, there has been steady pressure to package software in a way that is closer to how they get software from their OS vendors. With the onset of Tier-3s, this effort has become more important because system administrators at Tier-3s are often less experienced and have less time to devote to managing their OSG sites.
We have wanted to support native packages for some time but have not had the effort to do so due to other priorities; it has become clear that we must do this now. We are initially focusing on the needs of the LIGO experiment, but in early 2010 we hope to gradually phase in native package support to meet the needs of ATLAS, CMS, and other VOs, eventually supporting the entire OSG software stack as native packages.

The VDT continues to be used by external collaborators. EGEE/WLCG uses portions of the VDT (particularly Condor, Globus, UberFTP, and MyProxy). The VDT team maintains close contact with EGEE/WLCG through the OSG Software Coordinator’s (Alain Roy’s) weekly attendance at the EGEE Engineering Management Team’s phone call. TeraGrid and OSG continue to maintain a base level of interoperability by sharing a common Globus code base, a release of Globus patched for OSG’s and TeraGrid’s needs. The VDT software and storage coordinators (Alain Roy and Tanya Levshina) are members of the WLCG Technical Forum, which is addressing ongoing gaps, needs, and evolution of the WLCG infrastructure in the face of data taking.

3.3 Operations

The OSG Grid Operations Center (GOC) provides the central point of operational support for the Open Science Grid and coordinates various distributed OSG services. The GOC performs real-time monitoring of OSG resources, supports users, developers, and system administrators, maintains critical information services, provides incident response, and acts as a communication hub. The primary goals of the OSG Operations group are: supporting and strengthening the autonomous OSG resources, building operational relationships with peering grids, providing reliable grid infrastructure services, ensuring timely action and tracking of operational issues, and assuring quick response to security incidents.
In the last year, the GOC continued to provide the OSG with a reliable facility infrastructure while improving services to offer more robust tools to the OSG stakeholders. During the last year, the GOC continued to provide and improve tools and services for the OSG:
- The OSG Information Management (OIM) database, the definitive source of topology and administrative information about OSG users, resources, support agencies, and virtual organizations, was updated to allow new data to be provided to the WLCG based on the installed capacity requirements for Tier-1 and Tier-2 resources. Additional WLCG issue-tracking requirements were also added to ensure preparedness for LHC data taking.
- The Resource and Service Validation (RSV) monitoring tool received a new command line configuration tool plus several new tests, including security, accounting, and central service probes.
- The BDII (Berkeley Database Information Index), which is critical to CMS production, is now functioning under a Service Level Agreement (SLA) that was reviewed and approved by the affected VOs and the OSG Executive Board.
- The MyOSG information consolidation tool achieved production deployment and is now heavily used within the OSG for status information, as well as being ported to peering grids such as EGEE and WLCG. MyOSG allows administrative, monitoring, information, validation, and accounting services to be displayed within a single user-defined interface.
- The GOC-provided public trouble-ticket-viewing interface received updates to improve usability. This interface allows issues to be tracked and updated by users, and it also allows GOC personnel to use OIM meta-data to route tickets much more quickly, reducing the time needed to look up contact information for resources and support agencies.
We also continued our efforts to improve service availability via the completion of several hardware and service upgrades:
- The GOC services located at IU were moved to a new, more robust physical environment, providing much more reliable power and network stability.
- The BDII was thoroughly tested to be sure that outage conditions on a single BDII server were quickly identified and corrected at the WLCG top-level BDII.
- A migration of many services to a virtual machine environment is now complete, allowing flexibility in providing high-availability services.
- Virtual machines for training were provided to GOC staff to help troubleshoot installation and job submission issues.

OSG Operations is ready to support the LHC re-start and continues to refine and improve its capabilities for these stakeholders. We have completed preparation for the stress of the LHC startup on services by testing, by putting proper failover and load-balancing mechanisms in place, and by implementing administrative automation for several user services. Service reliability for GOC services has always been high, and we now gather metrics that show the reliability of these services exceeds the requirements of the Service Level Agreements (SLAs). SLAs have been finalized for the BDII and MyOSG, and SLAs for the OSG software cache and RSV are being reviewed by stakeholders. Regular release schedules for all GOC services have been implemented to enhance user testing and the regularity of software release cycles for GOC-provided services. One additional staff member will be added to support the increase in ticket load expected with the LHC re-start. During 2009, the GOC and Operations team completed the transition from the building phase, for both service infrastructure and user support, into a period of stable operational service for the OSG community.
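The service-reliability metrics mentioned above reduce, at their simplest, to a computation over service status samples compared against an SLA target. A minimal sketch is below; the probe records and the SLA thresholds are illustrative assumptions, not the actual GOC data or SLA figures.

```python
# Sketch: compute service availability from (timestamp, up/down) status
# samples and compare it against an SLA target. All numbers here are
# invented for illustration; real GOC probe data and SLAs differ.

def availability(samples):
    """Fraction of status samples in which the service was up."""
    if not samples:
        return 0.0
    up = sum(1 for _, ok in samples if ok)
    return up / len(samples)

def meets_sla(samples, target):
    """True if measured availability meets the SLA target."""
    return availability(samples) >= target

# One day of hypothetical 30-minute probe results for a BDII instance,
# with a single failed probe at slot 13:
probes = [(t, t != 13) for t in range(48)]

print(round(availability(probes), 4))   # 47 of 48 samples up
print(meets_sla(probes, target=0.97))
```

In practice such samples would come from RSV test results rather than an in-memory list, but the aggregation logic is the same.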
Preparations completed, we are now focusing on providing a reliable, sustainable operational infrastructure and a well-trained, proactive support staff for OSG production activities.

3.4 Integration and Site Coordination

The OSG Integration & Sites Coordination activity continues to play a central role in improving the quality of grid software releases prior to deployment on the OSG and in helping sites deploy and operate OSG services, thereby achieving greater success in production. The major focus was preparation for OSG 1.2, the golden release for the LHC restart. Beginning in April 2009, the 3-site Validation Test Bed (VTB) was used repeatedly to perform installation, configuration, and functional testing of pre-releases of the VDT across three sites. These tests shake out basic usability prior to release to the many-site, many-platform, many-VO Integration Test Bed (ITB). In July, the deliverable of a well-tested OSG 1.2 suitable for LHC running was met through the combined efforts of both OSG core staff and the wider community of OSG site administrators and VO testers participating in the validation. The face-to-face site administrators’ workshop hosted by the OSG GOC in Indianapolis in July featured deployment of OSG 1.2 as its main theme. The VTB and ITB test beds also serve as platforms for testing the main configuration tools of the OSG software stack, which are developed as part of this activity. The configuration system has grown to meet the requirements of the services offered in the OSG software stack, including configuration of storage element software. A new tool suite developed in the Software Tools Group for RSV (Resource and Service Validation) configuration and operation testing was vetted on the VTB repeatedly, with feedback given to the development team from the site administrator’s viewpoint.
The persistent ITB continues to provide an ongoing operational environment for VO testing at will, as well as being the main platform during the ITB testing prior to a major OSG release. Overall the ITB comprises 12 sites providing compute elements and four sites providing storage elements (dCache and BeStMan packages implementing SRM v1.1 and v2.2 protocols); 36 validation processes were defined across these compute and storage resources in readiness for the production release. Pre-deployment validation of applications from 12 VOs was coordinated with the OSG Users group. Other accomplishments include both dCache and SRM-BeStMan storage element testing on the ITB; delivery of new releases of the configuration tool as discussed above; and testing of an XRootd distributed storage system as delivered by the OSG Storage group. The OSG Release Documentation continues to receive significant edits from the community of OSG participants. This collection of wiki-based documents captures the installation, configuration, and validation processes used throughout integration and deployment. These documents were updated and received review input from all corners of the OSG community. In the past year we have taken a significant step towards increasing direct contact with the site administration community through use of a persistent chat room (“Campfire”). We offer three-hour sessions at least three days a week where OSG core Integration or Sites support staff are available to discuss issues, troubleshoot problems, or simply “chat” about OSG-specific issues. The sessions are archived and searchable. Finally, a major new initiative meant to improve the effectiveness of OSG release validation has begun, whereby a suite of test jobs can be executed through the (pilot-based) Panda workflow system.
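Test jobs dispatched this way ultimately run against a site's compute element via Condor-G. As a rough illustration of what sits underneath such an injection tool, the sketch below generates a minimal Condor-G submit description; the gatekeeper contact string, executable name, and file names are hypothetical placeholders, not output of the actual OSG tool.

```python
# Sketch: build a minimal Condor-G submit description targeting one
# compute element. The gatekeeper contact and file names are invented;
# the real ITB injection tool manages these details for the admin.

def make_submit(gatekeeper, executable, arguments=""):
    """Return a Condor-G (grid universe) submit description string."""
    lines = [
        "universe = grid",
        "grid_resource = gt2 %s" % gatekeeper,   # pre-WS GRAM gatekeeper
        "executable = %s" % executable,
        "arguments = %s" % arguments,
        "output = test_$(Cluster).out",
        "error = test_$(Cluster).err",
        "log = test_$(Cluster).log",
        "queue",
    ]
    return "\n".join(lines)

# Hypothetical ITB CE and 'hello-world' validation payload:
sub = make_submit("itb-ce.example.edu/jobmanager-condor", "hello_world.sh")
print(sub)
```

The generated text would be handed to `condor_submit`; the Condor-G job log it produces is one of the monitoring inputs mentioned below.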
The test jobs can be of any type and flavor; the current set includes simple ‘hello-world’ jobs, jobs that are CPU-intensive, and jobs that exercise access to/from the storage element associated with the CE. Importantly, ITB site administrators are provided a command line tool they can use to inject jobs aimed at their site into the system, and then monitor the results using the full monitoring framework (pilot and Condor-G logs, job metadata, etc.) for debugging and validation at the job level. As time goes on, additional workloads can be created and executed by the system, simulating components of VO workloads.

3.5 Virtual Organizations Group

The Virtual Organizations (VO) Group coordinates and supports the portfolio of the “at-large” science community VOs in OSG, except for the three major stakeholders (ATLAS, CMS, and LIGO), which are directly supported by the OSG Executive Team. At various times through the year, science communities were provided assistance in planning their use of OSG. Direct input was gathered from nearly 20 at-large VOs and reported to the OSG Council on behalf of ALICE, CDF, CIGI, CompBioGrid, D0, DES, DOSAR, Fermilab VO, GEANT4, GPN, GRASE, GROW, GUGrid, IceCube, JDEM, Mariachi, NanoHUB, NYSGrid, and SBGrid. These inputs provided roadmaps of intended use to enable OSG to better support the goals of the science stakeholders; the roadmaps covered: scope of use; VO mission; average and peak grid utilization quantifiers; resource provisioning; and plans, needs, and milestones. From time to time a VO may need special assistance to use OSG effectively. In such cases, OSG management establishes ad hoc task forces, coordinated by the VO Group, to provide support for the science community. In the last year, joint task forces were organized with ALICE, D0, Geant4, NanoHUB, and SBGrid. With mutual sharing of expertise between OSG and the VOs, each task force addressed specialized matters to enable measurably higher productivity for each VO.
Care was taken to try to maximize long-term residual impact and sustainability beyond the near-term goals of each effort.

We continued our efforts to strengthen the effective use of OSG by VOs. D0 increased to 60–80% efficiency at 80,000–120,000 hours/day, which contributed to new levels of D0 Monte Carlo production, reaching a new peak of 13 million events per week; in effect, D0 enhanced its harnessing capacity on OSG, transforming it into ‘a publication per week.’ CDF undertook readiness exercises to prepare for the Linux 5 and Kerberos CA transitions. The Fermilab VO, with its wide array of more than 12 individual sub-VOs, continued efficient operations and refocused on the OSG-TeraGrid gateway, proving its concept at limited scale with D0 as a cross-grid consumer. We conducted a range of activities to jump-start VOs that are new users of OSG or looking to increase their leverage of OSG. ALICE was enabled to start operations using its scale of resources at NERSC and LLNL. NanoHUB was enabled to ramp up from nominal utilization to 500 hours/day with multiple streams of nanotechnology applications. The SBGrid resource infrastructure was enabled, and its molecular biology science applications started up across OSG. GEANT4’s EGEE-based biannual regression-testing production runs were expanded onto the OSG, assisting in its toolkit’s quality releases for BaBar, MINOS, ATLAS, CMS, and LHCb. And we continue to actively address plans for scaling up operations of additional communities including IceCube, CompBioGrid, and GPN. Pre-release validation by science stakeholders was completed for OSG Release 1.2 in partnership with the OSG Integration team. After validation on ITB 1.1, official green flags for OSG 1.2 were given by 13 VOs including ATLAS, CMS, LIGO, and STAR. Earlier, as part of ITB 0.9.2, a smaller-scale cycle was organized for the incremental Release 1.0.1.
By working closely with VOs in planning schedules for the timely release of software, we achieved validation within 4 weeks. Balancing schedules with thoroughness of testing remains a tradeoff, and we will continue to address and improve these processes in the coming year. Weekly VO Forum teleconferences were organized for regular one-on-one interaction between representatives of at-large VOs and staff members of OSG. We increased emphasis on assessing the viability, and merger or closure, of a VO if this can better serve its own community’s success roadmap and value proposition. As a result: two VOs, GUGrid and Mariachi, were closed due to lack of growth in activity, based on their own self-assessment; two new registrations were encouraged to merge into existing VOs, Minerva into the Fermilab VO and CALICE into ILC; and four new registrations were vetted and encouraged to form official VOs: GlueX, GROW, JDEM, and NEBioGrid. Efficient and effective use of the OSG continues to be a driving force for this team. Building and maintaining productivity with sustained growth is important for the success of the diverse set of science communities in OSG. In the coming year, the VO Group will continue to work toward this objective.

3.6 Engagement

A priority of the Open Science Grid is helping science communities benefit from the OSG infrastructure by working closely with these communities over periods of several months. The Engagement activity brings the power of the OSG infrastructure to scientists and educators beyond high-energy physics and uses the experiences gained from working with new communities to drive requirements for the natural evolution of OSG.
To meet these goals, Engagement helps by:
- providing an understanding of how to use the distributed infrastructure;
- adapting applications to run effectively on OSG sites;
- encouraging the deployment of community-owned distributed infrastructures;
- working with the OSG Facility to ensure the needs of the new community are met;
- providing common tools and services in support of the engagement communities; and
- working directly with and in support of the new end users, with the goal of having them transition to full contributing members of the OSG.

These goals and methods continue to evolve from the foundation created in previous years. Use of the OSG by the Engagement-supported scientists continues to grow. Figure 17 shows the diversity and level of activity among Engagement users for the previous year, and Figure 18 shows the distribution by OSG facility of the more than 6 million CPU hours consumed by Engagement users, which is more than double the previous year’s activity.

Figure 17: Engage user activity for one year

Figure 18: CPU hours by facility for Engage Users

The Engagement team continues to work with active production users including: Steffen Bass (+3 staff), theoretical physics, Duke University; Anton Betten, mathematics, Colorado State; Jinbo Xu (+1 staff), protein structure prediction, Toyota Technological Institute; Eun Jung Choi, biochemistry, UNC-CH; Vishagan Ratnaswamy, mechanical engineering, New Jersey Institute of Technology; Abishek Patrap (+2 staff), systems biology, Institute for Systems Biology; Damian Alvarez Paggi, molecular simulation, Universidad de Buenos Aires; Eric Delwart, metagenomics, UCSF; and Tai Boon Tan, molecular simulation, SUNY Buffalo.
New engagements this year have led to work and collaboration with more than 10 research teams, including: Peter Rose and Andreas Prlic, RCSB PDB at UCSD; Bo Xin and Ian Shipsey, LSST, Purdue; Ridwan Sakidja, materials science, UW Madison; Dan Katz and student Allan Espinosa, computer science, U. Chicago; Lene Jung Kjaer, Cooperative Wildlife Research Laboratory, SIU; Zi-Wei Lin, physics, Eastern Carolina University; and Jarrett Byrnes, Marine Science Institute, UCSB. In addition, the Engagement effort has helped LIGO evolve its site selection methods for running its application on OSG, and has led to meaningful and growing use of OSG by the RENCI Science Portal, a separately funded TeraGrid Science Gateway project. For the coming year, we have seeded relationships with a number of research teams, including several 2009 awardees of the US/European Digging Into Data Challenge, the University of Maryland Institute for Genome Sciences, and the National Biomedical Computational Resource (NBCR) project, as well as significant integration and enhancement work supporting Dr. Steffen Bass (Duke Physics) and his newly awarded 4-year NSF CDI award.

3.7 Campus Grids

The Campus Grids team’s goal is to include as many US universities as possible in the national cyberinfrastructure. By helping universities understand the value of campus grids and of resource sharing through the OSG national framework, this initiative aims at democratizing cyberinfrastructure by providing resources to all users in a collaborative manner. In the last year, the Campus Grids team worked closely with the OSG Education team to provide training and outreach opportunities to new campuses. A workshop for new site administrators was organized at Clemson and was attended by representatives from four campuses: the University of Alabama, the University of South Carolina, Florida International University in Miami, and the Saint Louis Genome Center.
A second workshop was held at RENCI and included representatives from Duke, UNC-Chapel Hill, and NCSU, among others. UNC-CH has recently launched its TarHeel Grid, and Duke University is actively implementing a campus grid incubated by a partnership between academic computing and the physics department. Sites that have been engaged in the past continue to collaborate with us, such as the Genome Center and the New Jersey Higher Education Network, with various levels of success. Overall, 24 sites have had various levels of contact with the Campus Grids team, and seven are now in active deployment. A collaborative activity was started with SURAGrid to study interoperability of the grids, and an international collaboration was initiated this period with the White Rose Grid from the UK National Grid Service in Leeds as well as the UK Interest Group on Campus Grids. (Barr Von Oehsen from Clemson presented the Clemson Grid at a Campus CI SIG workshop in Cardiff in September 2009.) As a follow-up to work from the prior year, the Clemson Windows grid contributed significant compute cycles to Einstein@Home, run by LIGO, and reached the rank of 4th highest contributor worldwide; a new accounting probe has recently been completed to report this usage to the OSG accounting system, and it will be made available to other campus sites that contribute via BOINC projects. Finally, as a teaching activity, five Clemson students (two funded by OSG and two funded by CI-TEAM) competed in the TeraGrid 2009 parallel computing competition and finished second. One of them, an African-American female student, attended the ISSGC 2009 in France thanks to OSG financial support. Some campuses have started to demonstrate high interest in cloud technologies, and the campus CI area is working on this in collaboration with the STAR project.
STAR provided a virtual machine image containing their certified software, and this image was deployed at Clemson and the University of Wisconsin to investigate mechanisms for running STAR jobs within VMs. According to STAR scientist Jerome Lauret (a member of the OSG Council), the Clemson setup, which transparently runs jobs submitted to an OSG CE inside the provided VM image, is as easy to use as the EC2 cloud environment that STAR has tested.

3.8 Security

The Security team continued its multi-faceted approach to meeting its primary goals: maintaining operational security, developing security policies, acquiring or developing necessary security tools and software, and disseminating security knowledge and awareness. We are evolving with increased focus toward providing an operationally secure infrastructure that balances usability and security requirements. Our mission statement differs from last year’s in that we now emphasize “usability.” This change stems from both our own observations and the feedback we received from our community. We continued our efforts to improve the OSG security program. The security officer asked for an external peer evaluation of the OSG security program, which was conducted by the EGEE and TeraGrid security officers and a senior security expert from the University of Wisconsin. The committee produced a report of their conclusions and suggestions, and we are currently adjusting our work accordingly. This peer evaluation proved so successful that we have received requests from the EGEE and TeraGrid officers to join evaluations of their own security programs.
In collaboration with the ESNet Authentication and Trust Fabric team, we organized two security workshops: 1) the Living in an Evolving Identity World workshop brought technical experts together to discuss security systems; and 2) the OSG Identity Management Requirements Gathering workshop brought the user communities together to discuss usage of the security systems. Usability surfaced as a key element in both workshops. Historically, we have understood that the ease of use of the PKI infrastructure greatly affects the security of our environment: a too-complex security environment forces users to circumvent security altogether or raises the barriers to entry into the grid world. We conducted a detailed user survey with our VO contacts to pinpoint problem areas (the workshop reports are available at https://twiki.grid.iu.edu/bin/view/Security/OsgEsnetWorkshopReport). Accordingly, we have assigned priority to improving these problem areas in the next year. During the past year, we increased our preparations for the LHC re-start of data taking. We conducted a risk assessment of our infrastructure and held a 2-day blueprint meeting with our executive team. We identified the risks to our authentication infrastructure that could significantly affect OSG operations. We subsequently met with ESNet management, since they provide the backbone of our authentication infrastructure. We jointly developed mitigation plans for the identified risks and have been implementing them. For example, we completed a tool to monitor suspicious behavior in the ESNet certificate issuance infrastructure. We also identified our software distribution mechanism as a high-risk attack target and therefore decided to use signed software packages that can be automatically validated by site administrators during software updates.
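The idea behind automatic validation of signed packages can be sketched very simply: a site compares the digest of a downloaded package against the value published in a (presumed signed and trusted) manifest. The sketch below is an illustration only; the package name, manifest format, and use of a bare SHA-256 digest are assumptions, and the actual OSG/VDT mechanism (e.g. GPG-signed packages) differs in detail.

```python
# Sketch: validate a downloaded package against a checksum taken from a
# signed manifest. Illustrative only; the real OSG/VDT signing scheme
# is different (the manifest itself would carry a verified signature).
import hashlib

def sha256_hex(data):
    """SHA-256 digest of raw package bytes, as a hex string."""
    return hashlib.sha256(data).hexdigest()

def validate_package(name, payload, manifest):
    """manifest maps package name -> expected sha256 hex digest."""
    expected = manifest.get(name)
    return expected is not None and sha256_hex(payload) == expected

# Hypothetical package and its trusted manifest entry:
payload = b"fake package contents"
manifest = {"osg-client-1.2.tar.gz": sha256_hex(payload)}

print(validate_package("osg-client-1.2.tar.gz", payload, manifest))      # intact
print(validate_package("osg-client-1.2.tar.gz", b"tampered", manifest))  # altered
```

A tampered or substituted package fails the comparison and would be rejected by the updater before installation.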
In order to increase the security quality of VDT software, we reached out to our external software providers and exchanged security practices such as vulnerability announcement, patching policies, and update frequencies. With the help of Professor Barton Miller of the University of Wisconsin, we organized a “Secure Programming Workshop” in conjunction with the OGF. The tutorial taught software providers secure coding principles with hands-on code evaluation. A similar tutorial was also given at the EGEE’09 conference to reach out to European software providers, whose software forms a significant part of the VDT code.

We formed a software vulnerability process with the help of the VDT team. The security team watches the security bulletin boards on a daily basis and triggers the vulnerability evaluation process if a possible vulnerability is suspected that could affect the VDT code. To measure the LHC Tier-2 sites’ ability to respond to an ongoing incident, we conducted an incident drill. The drill included all LHC Tier-2 sites and a LIGO site, 19 sites in total. Each site was tested one-on-one; a security team member generated a non-malicious attack and assessed the site’s response. In general the sites performed very well, with only a handful of sites needing minor assistance or support in implementing the correct response actions. Our next goal is to conduct an incident drill to measure the preparedness of our user community and the VOs. During the last year, the USATLAS and USCMS management asked OSG for increased security support for the LHC US Tier-3 sites. Tier-3s are usually small sites managed by graduate students and thus may lack the security skills of a professional site administrator. To address this need: We focused on Tier-3s at the OSG Site Administrators Workshop. Before the workshop, we individually contacted each site, asked about their needs, and encouraged participation. Their feedback helped shape our preparation significantly.
After the workshop, we continued our efforts by meeting with a Tier-3 site administrator one-on-one each week. The summaries of our meetings can be found at https://twiki.grid.iu.edu/bin/view/Security/FY10SecuritySupport, and this effort continues in the coming year. We give special guidance to our Tier-3 sites during security vulnerabilities. So far, we have had two high-risk root-level kernel vulnerabilities; in addition to our regular response process, we contacted each Tier-3 site individually and helped them test and patch their systems. There is a clear need for this type of support, at least until Tier-3s become more familiar with site management. However, individualized care for close to 20 sites is not sustainable at our existing effort level, and we are investigating automated tools to relieve this burden. In addition, we want to build a security community in which each site administrator is an active contributor and can work with his or her peers. We have started a security blog and a ticketing system to accommodate information sharing, and we plan to increase awareness of these tools to create an active community. OSG did not have any grid incident during the past year. Our efforts were preventive in nature, and we spent most of our effort on software patching (both grid and systems software) and on responding to tangential incidents that were not attacks on the grid infrastructure but affected OSG sites indirectly. For example, we had attacks coming in via SSH ports and affecting HEP users, but none of the attacks targeted the grid middleware or grid user credentials. Nevertheless, we undertook response as though OSG had been attacked. We started an incident-sharing community across the TeraGrid, EGEE, NorduGrid, and OSG security officers; this continues to be immensely useful for operational security.
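The automated tools we are investigating for relieving the per-site support burden reduce, in the simplest case, to comparing a site's local security settings against recommended values and flagging mismatches. A minimal sketch follows; all setting names and values are hypothetical examples, not the actual OSG-recommended configuration.

```python
# Sketch: compare local security settings against recommended values
# and report mismatches. Setting names and values are invented for
# illustration; the real OSG-recommended configuration differs.

RECOMMENDED = {
    "PermitRootLogin": "no",
    "Protocol": "2",
    "X11Forwarding": "no",
}

def audit(local, recommended=RECOMMENDED):
    """Return (setting, local_value, recommended_value) mismatches."""
    problems = []
    for key, want in sorted(recommended.items()):
        have = local.get(key, "<unset>")
        if have != want:
            problems.append((key, have, want))
    return problems

# Hypothetical local configuration with one wrong and one missing value:
site_cfg = {"PermitRootLogin": "yes", "Protocol": "2"}
for key, have, want in audit(site_cfg):
    print("%s: local=%s recommended=%s" % (key, have, want))
```

Run regularly across sites, such a check turns one-on-one configuration reviews into a report a small team can triage.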
During the last year, we identified a problem with updating OSG-recommended security configurations at sites; we identified the need for an administrative update tool to solve the issue, and this tool is now in testing for a future release. To provide better monitoring, the security team wrote a suite of security probes that allow a site administrator to compare local security configurations against OSG-suggested values; these monitoring probes have been released with OSG 1.2. In the next year, we plan to focus more on monitoring efforts, which we consider a critical element of our risk assessment and mitigation plans.

3.9 Metrics and Measurements

OSG Metrics and Measurements strives to give OSG management, VOs, and external entities quantitative details of the project's growth throughout its lifetime. The focus for FY09 was the "internal metrics" report, for which OSG Metrics collaborated with all the internal OSG areas to develop quantitative metrics for each area. The goal was to help each area become aware of its progress throughout the year. This work culminated in OSG document #887, completed in September 2009.

OSG Metrics works with OSG Gratia operations to validate the data coming from the accounting system. This accounting data is sent to the WLCG on behalf of the USLHC VOs and is used for many reports and presentations to the US funding agencies. Based on the Gratia databases, the OSG Metrics team maintains a website that integrates with the accounting system and provides an expert-level view of all the OSG accounting data. This data includes RSV test results, job records, transfer records, storage statistics, cluster size statistics, and historical batch system status for most OSG sites. During FY10, we will work to migrate the operation of this website to OSG Operations and to integrate the data further with MyOSG, in order to make it more accessible and user-friendly.
OSG Metrics collaborates with the storage area to expand the number of sites reporting transfers to the accounting system. We have begun to roll out new storage reporting, based on new software capabilities developed for Gratia. The focus of the storage reporting is to provide a general overview of site usage centrally, but it also provides a wide range of detailed statistics to local system administrators. The Metrics and Measurements area will also begin investigating how to incorporate network performance monitoring data into its existing reporting activities. Several VOs are actively deploying perfSONAR-based servers that provide site-to-site monitoring of the network infrastructure, and we would like to further encourage network measurements on the OSG.

The Metrics and Measurements area continues to be involved with the coordination of WLCG-related reporting efforts. It sends monthly reports to the JOT that highlight MOU sites' monthly availability and activity. It produces monthly data for the metric thumbnails and other graphs available on the OSG homepage. OSG Metrics participated in the WLCG Installed Capacity discussions during FY09. The Installed Capacity project was tasked with providing the WLCG with accurate and up-to-date information on the amount of CPU and storage capacity available throughout the worldwide infrastructure. The OSG worked on the USLHC experiments' behalf to communicate their needs and put the reporting infrastructure in place using MyOSG/OIM. The accounting extensions effort requires close coordination with the Gratia project at FNAL, and includes effort contributed by ATLAS.

Based on a request from the OSG Council, we prepared a report on the various fields of science utilizing the grid, detailing the number of users and the number of wall clock hours per field. The majority of the effort was spent working with the community VOs (as opposed to the experiment VOs) on a means to classify their users.
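The per-field report described above reduces to a simple aggregation over accounting records. The sketch below shows the shape of that computation; the record layout and sample values are invented, and in practice the data comes from the Gratia job records with the field-of-science classification supplied by the VOs.

```python
from collections import defaultdict

# Invented sample of accounting records: (user, field_of_science, wall_hours).
# Real data would be drawn from the Gratia accounting database, with the
# field-of-science classification of users supplied by the community VOs.
records = [
    ("alice", "Physics", 120.0),
    ("bob",   "Biology",  40.0),
    ("alice", "Physics",  60.0),
    ("carol", "Biology",  25.0),
]

# Aggregate total wall clock hours and distinct users per field of science.
hours_by_field = defaultdict(float)
users_by_field = defaultdict(set)
for user, field, hours in records:
    hours_by_field[field] += hours
    users_by_field[field].add(user)

for field in sorted(hours_by_field):
    print(field, len(users_by_field[field]), hours_by_field[field])
```

Counting distinct users (a set per field) rather than job records is what distinguishes the "number of users" column of the Council report from the wall-hours column.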
Also from stakeholder input, the OSG assessed the various areas of contribution and documented the value delivered by OSG. The goal of this activity was to estimate the benefit and cost effectiveness, and thus provide a basis for discussion of the value of the Open Science Grid (OSG).

3.10 Extending Science Applications

In addition to operating a facility, the OSG includes a program of work that extends the support of science applications, in terms of both the complexity and the scale of the applications that can be effectively run on the infrastructure. We solicit input from the scientific user community concerning both operational experience with the deployed infrastructure and extensions to its functionality. We identify limitations, and address them with our stakeholders in the science community. In the last year of work, the high-level focus has been threefold: (1) improve the scalability, reliability, and usability of the infrastructure, as well as our understanding thereof; (2) establish and operate a workload management system for OSG-operated VOs; and (3) establish the capability to use storage in an opportunistic fashion at sites on OSG.

We continued with our previously established processes designed to understand and address the needs of our primary stakeholders: ATLAS, CMS, and LIGO. The OSG "senior account managers" responsible for interfacing to each of these three stakeholders meet, at least quarterly, with their senior management to go over their issues and needs. Additionally, we document the stakeholders' desired work lists from OSG and cross-map these requirements to the OSG WBS; these lists are updated quarterly and serve as a communication method for tracking and reporting on progress.

3.11 Scalability, Reliability, and Usability

As the scale of the hardware that is accessible via the OSG increases, we need to continuously assure that the performance of the middleware is adequate to meet the demands.
There were four major goals in this area for the last year, and they were met via close collaboration between developers, user communities, and OSG.

1. At the job submission client level, the goal is 20,000 jobs running simultaneously and 200,000 jobs run per day from a single client installation, with a success rate in excess of 95% while doing so. The job submission client goals were met in collaboration with CMS, CDF, Condor, and DISUN, using glideinWMS. This was done via a mix of controlled-environment tests and large-scale challenge operations across the entirety of the WLCG. For the controlled-environment tests, we developed an "overlay grid" for large-scale testing on top of the production infrastructure. This test infrastructure provides in excess of 20,000 batch slots across a handful of OSG sites. An initial large-scale challenge operation was done in the context of CCRC08, the 2008 LHC computing challenge. Condor scalability limitations across high-latency networks were discovered; this led to substantial redesign and reimplementation of core Condor components, and to subsequent successful scalability testing with a CDF client installation in Italy submitting to the CMS server test infrastructure on OSG. Testing with this testbed exceeded the scalability goal of 20,000 jobs running simultaneously and 200,000 jobs per day across the Atlantic. This paved the way for production operations to start in CMS across the 7 Tier-1 centers. Data analysis operations across the roughly 50 Tier-2 and Tier-3 centers available worldwide today are more challenging, as expected, due to the much more heterogeneous level of support at those centers. However, during STEP09, the 2009 LHC data analysis computing challenge, a single glideinWMS instance exceeded 10,000 jobs running simultaneously over about 50 sites worldwide with a success rate in excess of 95%. The success rate here excludes storage as well as application failures.
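The qualification that the quoted success rate excludes storage and application failures is a bookkeeping rule, sketched below. The job-record layout and outcome categories are invented for illustration; the real accounting comes from glideinWMS and its monitoring, not from this code.

```python
# Sketch of the success-rate bookkeeping described above: failures
# attributed to storage or to the user application are excluded from the
# denominator, so the rate reflects only the submission infrastructure.
# Record layout and outcome labels are hypothetical.
jobs = [
    {"id": 1, "outcome": "success"},
    {"id": 2, "outcome": "storage_failure"},      # excluded from the rate
    {"id": 3, "outcome": "application_failure"},  # excluded from the rate
    {"id": 4, "outcome": "success"},
    {"id": 5, "outcome": "infrastructure_failure"},
]

EXCLUDED = {"storage_failure", "application_failure"}

# Only jobs whose failures (if any) are attributable to the workload
# management infrastructure itself are counted.
counted = [j for j in jobs if j["outcome"] not in EXCLUDED]
successes = sum(1 for j in counted if j["outcome"] == "success")
rate = successes / len(counted)
print(f"{rate:.0%}")  # 2 successes out of 3 counted jobs
```

The design choice here mirrors the text: a stage-out failure says nothing about glideinWMS itself, so it is removed from both numerator and denominator rather than counted as an infrastructure failure.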
For the coming years, CMS expects to have access to up to 40,000 batch slots across the WLCG, so Condor is making further modifications to reach that goal. CMS also requested that glideinWMS become more firewall friendly, which also affects Condor functionality. We are committed to helping the Condor team by performing functionality and scalability tests.

2. At the storage scheduling level, the present goal is a 50 Hz file handling rate, significantly up from the 1 Hz rate goal of last year. An SRM scalability of up to 100 Hz was achieved using BeStMan and HadoopFS at UNL, Caltech, and UCSD. dCache-based SRM, however, is still limited to below 10 Hz. The increase in the goal was needed in order to cope with the increasing scale of operations by the large LHC VOs, ATLAS and CMS. The driver here is stage-out of files produced during data analysis. The large VOs find that the dominant source of failure in data analysis is the stage-out of results, followed by read-access problems as the second most likely failure. In addition, storage reliability is receiving much more attention now, given its measured impact on job success rate. The impact of jobs on storage has become a more visible issue in part because of the large improvements in the submission tools, monitoring, and error accounting within the last two years. The improvements in submission tools, coupled with the increased scale of resources available, are driving up the load on storage, while the improvements in monitoring and error accounting allow us to fully identify the sources of errors. OSG is very actively engaged in understanding the issues involved, working with both the major stakeholders and partner grids.

3. At the functionality level, this year's goal was to improve the capability for opportunistic space use. It has been deployed at several dCache sites on OSG, and successfully used by D0 for production operations on OSG.
CDF is presently in the testing stage for adapting opportunistic storage into their production operations on OSG. However, a lot of work remains before opportunistic storage is an easy-to-use and widely available capability of OSG.

4. OSG has successfully transitioned to a "bridge model" with regard to the WLCG for its information, accounting, and availability assessment systems. This means that there are aggregation points at the OSG GOC via which all of these systems propagate information about the entirety of OSG to the WLCG. For the information system this implies a single point of failure, the BDII at the OSG GOC: if this service fails, then all resources on OSG disappear from view. ATLAS and CMS have chosen different ways of dealing with this. While ATLAS maintains its own "cached" copy of the information inside Panda, CMS depends on the WLCG information system. To understand the impact of the CMS choice, OSG has done scalability testing of the BDII. We find that the service is reliable up to a query rate of 10 Hz. The OSG GOC has deployed monitoring of the query rate of the production BDII in response to this finding, in order to understand the operational risk implied by this single point of failure. The monitoring data for the current year showed a stable query rate of about 2 Hz, so while we will continue to monitor the BDII, we do not expect any additional effort to be needed.

In addition, we have continued to work on a number of lower priority objectives:

a. Globus has started working on a new release of GRAM, called GRAM5. OSG is committed to working with them on integration and large-scale testing. We have been involved in the validation of alpha and beta releases, providing feedback to Globus ahead of the official release in order to improve the quality of the released software. This also involved working with Condor on the client side, since all OSG VOs rely on Condor for submission to GRAM-based CEs.
GRAM5 is now supported in Condor version 7.3.2 and above. We plan to continue integration and scalability testing after the release of GRAM5 in order to make it a fully supported CE in the OSG software stack.

b. WLCG has expressed interest in deploying CREAM on WLCG sites in Europe and Asia. CREAM is a CE developed by INFN in EGEE. Since both ATLAS and CMS use Condor as the client to Grid resources and run jobs on WLCG resources worldwide, OSG has agreed to work with Condor to integrate CREAM support. The OSG activity involved testing Condor development releases against a test CREAM instance provided by INFN. As a result, CREAM is now supported in Condor version 7.3.2 and above. We are now in the process of running scalability tests against CREAM, by deploying a CREAM instance at UCSD and interfacing it to the test cluster of 2,000 batch slots. This will allow us both to validate the Condor client and to evaluate CREAM as a possible CE candidate for OSG. We expect this work to be completed in early 2010. We are also waiting for CREAM deployment on the production infrastructure on EGEE to allow for realistic large-scale testing.

c. WLCG also uses the ARC CE; ARC is the middleware developed at and deployed on NorduGrid. We have been involved in testing the Condor client against ARC resources, which resulted in several improvements in the Condor code. For the next year, we plan to deploy an ARC CE at an OSG site and perform large-scale scalability tests, with the aim both to validate the Condor client and to evaluate ARC as a possible CE candidate for OSG.

d. CMS has approached OSG for help in understanding the I/O characteristics of CMS analysis jobs. We helped by providing advice and expertise, which resulted in much improved CPU efficiencies of CMS software on OSG sites. CMS is continuing this work by testing all of their sites worldwide.
OSG will continue to be involved in these tests at the level of consulting, in order to learn from them and to document the results for a wider audience.

e. In the area of usability, an "operations toolkit" for dCache was improved. The intent of the toolkit is to provide a clearing house of operations tools that have been developed at experienced dCache installations, and to derive from that experience a set of tools for all dCache installations supported by OSG. This is significantly decreasing the cost of operations and has lowered the threshold of entry. Site administrators from both the US and Europe continue to upload tools, resulting in several releases. These releases have been downloaded by a number of sites, and are in regular use across the US as well as at some European sites.

f. Another usability milestone was the creation of a package integrating BeStMan and Hadoop. By providing integration and support for this package, we expect to greatly reduce the effort needed by OSG sites to deploy it; CMS Tier-2s and Tier-3s in particular have expressed interest in using it. In the next year we plan to further improve it based on feedback received from early adopters.

g. A package containing a set of procedures that allow us to automate scalability and robustness tests of a Compute Element has been created. The intent is to be able to quickly "certify" the performance characteristics of new middleware, a new site, or a deployment on new hardware. The package has already been used to test GRAM5 and CREAM. Moreover, we want to offer this as a service to our resource providers so that they can assess the performance of their deployed or soon-to-be-deployed infrastructure.

h. Work has started on putting together a framework for using Grid resources to perform consistent scalability tests against centralized services, such as CEs and SRMs.
The intent is similar to that of the previous package: to quickly "certify" the performance characteristics of new middleware, a new site, or a deployment on new hardware, but using hundreds of clients instead of one. Using Grid resources allows us to achieve this scale, but requires additional synchronization mechanisms for the tests to be performed in a reliable and repeatable manner.

3.12 Workload Management System

The primary goal of the OSG Workload Management System (WMS) effort is to build, integrate, test, and support the operation of a flexible set of software tools and services for efficient and secure distribution of workload among OSG sites. There are currently two suites of software utilized for that purpose within OSG: Panda and glideinWMS, both drawing heavily on Condor software.

The Panda system continued as a supported WMS service for the Open Science Grid, and served as a crucial infrastructure element of the ATLAS experiment at the LHC. We completed the migration of Panda software to an Oracle database backend, which enjoys strong support from major OSG stakeholders and allows us to host an instance of the Panda server at CERN, where ATLAS is located, creating efficiencies in support and operations. Work has started on the Panda monitoring system upgrade, which will allow for better integration with the ATLAS Production Dashboard, as well as for easier implementation of user interfaces suitable for Panda applications outside of ATLAS. The Panda system proved itself in a series of challenging scalability tests conducted in the months preceding the LHC run in the fall of 2009. To foster wider adoption of Panda in the OSG user community, we created a prototype of a data service that will make Panda easier for individual users to use, by providing a Web-based user interface for uploading and managing input and output data, and a secure backend that allows Panda pilot jobs to both download and transmit data as required by the Panda workflow.
No additional software is required on the users' desktop PCs or lab computers; this will be helpful for smaller research groups, who may lack the manpower to support larger software stacks. This, in part, helped us start collaborating with the BNL research group of the Long Baseline Neutrino Experiment at DUSEL, who expressed strong interest in using Panda to meet their science goals. Other potential users include an interdisciplinary group at the University of Wisconsin (Milwaukee), who are starting work on an advanced laser spectrometry system. We started using Panda to facilitate the ITB (Integration Testbed) activity in OSG, which allows site administrators to monitor test job execution from a single Web location and have test results automatically documented via the Panda logging mechanism.

Progress was made with the glideinWMS system toward the project goal of pilot-based large-scale workload management. Version 2.0 has been released and is capable of servicing multiple virtual organizations with a single deployment. The FermiGrid facility has expressed interest in putting this in service for VOs based at Fermilab. Experiments such as CMS, CDF, and MINOS are currently using glideinWMS in their production activities. The IceCube project recently used the glideinWMS facility at UCSD to run their jobs on OSG resources. Discussions are underway with new potential adopters, including DZero and CompBioGrid. We also continued the maintenance of gLExec (user ID management software), a collaborative effort with EGEE, as a project responsibility.

In the area of WMS security enhancements, we completed the integration of gLExec into Panda. It is also actively used in glideinWMS.
In addition to giving the system more flexibility from a security and authorization standpoint, this also allows us to maintain a high level of interoperability of the OSG workload management software with our WLCG collaborators in Europe, by following a common set of policies and using compatible tools, thus enabling both Panda and glideinWMS to operate transparently in both domains. An important part of this activity was the integration test of a new suite of user authorization and management software (SCAS) developed by WLCG, which involved testing upgrades of gLExec and its interaction with site infrastructure. We made a significant contribution to that activity and continue to actively support further work in that area.

Work continued on the development of the Grid User Management System (GUMS) for OSG. This is an identity mapping service which allows sites to operate on the Grid while relying on traditional methods of user authentication, such as UNIX accounts or Kerberos. Based on our experience with GUMS in production since 2004, a number of new features have been added which enhance its usefulness for OSG.

In this reporting period, we faced and addressed the following challenges:

1. The absence of a lightweight, easy-to-maintain data transfer mechanism for jobs executed in the Panda pilot-based framework. This was resolved by creating such a system, which is now available to users.

2. The legacy nature of certain parts of the Panda information system, notably monitoring, which impeded integration with the experiment-wide dashboard software of ATLAS, and which necessitated specialization of user interfaces in other cases. We addressed this by initiating work to upgrade Panda monitoring, moving it to a modern platform and a modular structure with a network API.

3. The lack of awareness among potential entrants to OSG of the capabilities and advantages of the Workload Management Systems run by our organization.
This will need to be rectified by producing a set of comprehensive documentation available from the OSG Web site.

This program of work continues to be important for the science community and OSG for several reasons. First, having a reliable WMS is a crucial requirement for a science project involving large-scale distributed computing that processes vast amounts of data. A few of the OSG key stakeholders, in particular the LHC experiments ATLAS and CMS, fall squarely into that category, and the Workload Management Systems supported in the OSG serve as a key enabling factor for these communities. Second, drawing new entrants to OSG will provide the benefit of access to opportunistic resources to organizations that otherwise would not be able to achieve their research goals. As more improvements are made to the system, Panda will be in a position to serve a wider spectrum of science disciplines.

3.13 Internet2 Joint Activities

Internet2 collaborates with OSG to develop and test a suite of tools and services that make it easier for OSG sites to support their widely distributed user community. A second goal is to leverage the work within OSG to create scalable solutions that will benefit the entire Internet2 membership. Identifying and resolving performance problems continues to be a major challenge for OSG site administrators. A complication in resolving these problems is that lower than expected performance can be caused by problems in the network infrastructure, the host configuration, or the application behavior. Advanced tools can quickly isolate the problem(s) and will go a long way toward improving the grid user experience and making grids more useful to science communities. In the past year, Internet2 has worked with the OSG software developers to incorporate several advanced network diagnostic tools into the VDT package.
These client programs interact with perfSONAR-based servers deployed on the Internet2 and ESnet backbones to allow on-demand testing of poorly performing sites. By enabling OSG site administrators and end users to test any individual compute or storage element in the OSG environment, we can reduce the time it takes to begin the network troubleshooting process. It also allows site administrators or users to quickly determine whether a performance problem is due to the network, a host configuration issue, or application behavior.

In addition to deploying client tools via the VDT, Internet2 staff, working with partners in the US and internationally, have created a simple live-CD distribution mechanism for the server side of these tools (the perfSONAR Performance Toolkit). This bootable CD allows an OSG site administrator to quickly stand up a perfSONAR-based server to support OSG users. These perfSONAR hosts automatically register their existence in a distributed global database, making it easy to find new servers as they become available. Internet2 staff also identified an affordable 1U rack-mountable computer that can be used to run this server software. OSG site administrators can now order this standard hardware, ensuring that they can quickly get started with a known good operating environment.

These servers provide two important functions for OSG site administrators. First, they provide an end point for the client tools deployed via the VDT package; OSG users and site administrators can run on-demand tests to begin troubleshooting performance problems. Second, they host regularly scheduled tests between peer sites, allowing a site to continuously monitor the network performance between itself and the peer sites of interest. The USATLAS community has deployed perfSONAR hosts and is currently using them to monitor network performance between the Tier-1 and Tier-2 sites.
Internet2 has attended weekly USATLAS calls to provide ongoing support of these deployments, and has released regular bug fixes. Additionally, the USCMS community has recently begun to deploy perfSONAR Performance Toolkit based hosts to evaluate their usefulness for that community of users. Finally, on-demand testing and regular monitoring can be performed both to peer sites and to the Internet2 or ESnet backbone networks, using either the client tools or the perfSONAR servers. Internet2 will continue to interact with the OSG administrator community to learn ways to improve this distribution mechanism.

Another major task for Internet2 is to provide training on the installation and use of these tools and services. In the past year Internet2 has participated in several OSG site administrator workshops and the annual OSG all-hands meeting, and has interacted directly with the LHC community to determine how the tools are being used and what improvements are required. Internet2 has provided hands-on training in the use of the client tools, including the command syntax and the interpretation of test results. Internet2 has also provided training in the setup and configuration of the perfSONAR server, allowing site administrators to quickly bring up their servers. Finally, Internet2 staff have participated in several troubleshooting exercises; this includes running tests, interpreting the test results, and guiding the OSG site administrator through the troubleshooting process.

3.14 ESNET Joint Activities

OSG critically depends on ESnet for the network fabric over which data is transferred to and from the Laboratories and to/from LIGO Caltech (by specific MOU). ESnet is part of the collaboration delivering and supporting the perfSONAR tools that are now in the VDT distribution. OSG makes significant use of ESnet's collaborative tools for telephone and video meetings.
ESnet and OSG are also starting discussions toward collaboration in testing of the 100 Gigabit network testbed as it becomes available in the future.

OSG is the major user of the ESnet DOE Grids Certificate Authority for the issuing of X.509 certificates (Figure 19). Registration, renewal, and revocation are done through the OSG Registration Authority and ESnet-provided web interfaces. ESnet and OSG collaborate on the user interface tools needed by the OSG stakeholders for the management and reporting of certificates.

Figure 19: Number and distribution of valid certificates currently in use by OSG communities.

OSG and ESnet have worked on a contingency plan to organize responses to an incident that would make the issued certificates untrustworthy. We also partner as members of the identity management accreditation bodies in the Americas (TAGPMA) and globally (the International Grid Trust Federation, IGTF).

OSG and ESnet jointly organized a workshop on Identity Management in November with two complementary goals: to look broadly at the identity management landscape and evolving trends regarding identity in the web arena, and to gather input and requirements from the OSG communities about their current issues and expected future needs. A main result of the analysis of web-based technologies is that the ability to delegate responsibility, which is essential for grid computing, is only beginning to appear as a feature of web technologies, and so far only on interactive timescales. A significant result from gathering input from the users is that the communities tend either to be satisfied with the current identity management functionality, or to be dissatisfied with it and see a strong need for more fully integrated identity handling across the range of collaborative services used for their scientific research. The results of this workshop and requirements-gathering survey are being used to help plan the future directions for work in this area.

4.
Training, Outreach and Dissemination

4.1 Training and Content Management

During 2009, OSG began an evolution of its Education and Training area, from a combination of general grid technology education and more specific training toward a focus on training in the creation and use of grid technologies. This shift and reduction in scope was undertaken to accommodate an increase in the effort needed to improve the quality and content relevancy of our documentation and training material. Consistent with the change in focus, Content Management has been added to the area, with the realization that improved documentation forms the basis for improved training. A study of OSG documentation, conducted in the spring of 2009, involved interviews of 29 users and providers of OSG documentation and subsequent analysis of the results to produce recommendations for improvement. These recommendations were reviewed and analyzed in a two-day workshop of relevant experts. The implementation of the recommendations began in October 2009 and will continue through 2010.

The OSG Training program brings domain scientists and computer scientists together to provide a rich training ground for the engagement of students, faculty, and researchers in learning the OSG infrastructure, applying it to their discipline, and contributing to its development. During 2009, OSG sponsored and conducted numerous training events for students and faculty and participated in additional invited workshops. As a result, grid computing training reached about 200 new professionals.
Training organized and delivered by OSG in 2009 is identified in the following table:

  Workshop                           Length    Location            Month
  Chile Grid Workshop                3 days    Santiago, Chile     January
  OSG Site Administrators Workshop   2 days    Clemson, SC         March
  New Mexico Grid School             3 days    Albuquerque, NM     April
  North Carolina Grid School         3 days    Chapel Hill, NC     April
  Site Administrators Workshop       2 days    Indianapolis, IN    August
  Colombia Grid Workshop             2 weeks   Bogotá, Colombia    October

OSG staff also participated as keynote speakers, instructors, and/or presenters at four venues in 2009, as detailed in the following table:

  Venue                                           Length       Location                     Month
  International Winter School on Grid Computing   8 sessions   Online (6 weeks)             Feb-Mar
  Easter recess UJ Research Cluster               6 days       Johannesburg, South Africa   April
  International Summer School on Grid Computing   11 days      Nice, France                 July
  Grace Hopper Conference 09                      4 days       Tucson, AZ                   October

OSG was a co-organizer of the International Summer School on Grid Computing (ISSGC09). The OSG Education team arranged sponsorship (from NSF) for US-based students to attend this workshop; we selected and prepared ten US students who participated in the school. In addition, OSG staff contributed directly to the International Grid School by attending, presenting, and being involved in lab exercises, development, and student engagement. Another aspect of the EGEE-OSG collaboration at the education level is participation in the International Winter School on Grid Computing (IWSGC09); US graduates of these grid schools include system administrators of CMS Tier-3 sites. The content of the training material has been enhanced over the past year to include an updated module targeting new OSG Grid site administrators. Work continues to enhance the on-line delivery of user training, and we are seeing an increasing number of students and faculty signing up for the self-paced online course (about 50 active participants).
In addition to individuals, we have made the course accessible as support material for graduate courses on grid computing at universities around the country, such as Rochester Institute of Technology, University of Missouri at St. Louis, and Clemson University.

The Content Management project has defined a collaborative process for managing the production of high-quality documentation, defined document standards and templates, reviewed and identified documents for improvement or elimination, begun improvement of storage documentation, assisted in the production of ATLAS and CMS Tier-3 documentation, and begun reorganization of documentation access by user role. This work will continue through 2010.

4.2 Outreach Activities

We present a selection of the presentations and book publications from OSG in the past year:

- Joint EGEE and OSG workshop at High Performance and Distributed Computing (HPDC 2009): "Workshop on Monitoring, Logging and Accounting (MLA) in Production Grids." http://indico.fnal.gov/conferenceDisplay.py?confId=2335
- Presentation at BIO-IT World Conference & Expo 2009, "Ramping Up Your Computational Science to a Global Scale on the Open Science Grid." http://www.bioitworldexpo.com/
- Presentation to the International Committee for Future Accelerators (ICFA). http://www-conf.slac.stanford.edu/icfa2008/Livny103008.pdf
- Contributions to the DOE Grass Roots Cyber Security R&D Town Hall white paper.
- Book contribution for "Models and Patterns in Production Grids," being coordinated by TeraGrid. http://osg-docdb.opensciencegrid.org/0008/000800/001/OSG-production-V2.pdf
- Workshop at the Grace Hopper conference.
http://gracehopper.org/2008/assets/GHC2008Program.pdf

Members of the OSG are participating in workshops to chart the future of cyberinfrastructure in the US:

- The NSF Task Force on Campus Bridging (Miron Livny, John McGee)
- Software sustainability (Miron Livny, Ruth Pordes)
- DOE Cybersecurity (Mine Altunay)
- ESnet requirements gathering (Miron Livny, Frank Wuerthwein)
- HPC Best Practices Workshop (Alain Roy)

In the area of international outreach, we were active in South America and maintained our connection to the completed site in South Africa. OSG staff conducted grid training schools at the University of Johannesburg in April 2009, followed by a series of meetings with domain scientists aimed at providing guidance in using cluster and grid computing techniques to advance their research. In South America, a partnership with the GridUNESP regional grid was agreed to, and Horst Severini from the University of Oklahoma accepted the request to be the OSG liaison. He has attended two workshops in São Paulo to help the community establish their regional infrastructure, supported by OSG services, software, and experienced staff. OSG provided the staff for teaching a two-week course in Colombia, again to support the next steps in the implementation of their regional cyberinfrastructure.

OSG co-sponsored the International Summer School on Grid Computing in France (http://www.issgc.org/), sponsoring 10 students to attend the two-week workshop and providing a keynote speaker and three teachers for lectures and hands-on exercises. Six of the students have responded to our follow-up queries and are continuing to use cyberinfrastructure, whether local, OSG, or TeraGrid.

OSG continued co-editorship of the highly successful International Science Grid This Week newsletter, www.isgtw.org. OSG is very appreciative that DOE and NSF have been able to supply funds matching the European effort starting in January 2009. A new full-time editor was hired effective July 2009.
OSG also gave presentations at the online International Winter School on Grid Computing. http://www.iceage-eu.org/iwsgc09/index.cfm

Future work will include increased collaboration with TeraGrid.

4.3 Internet Dissemination

OSG co-sponsors the weekly electronic newsletter International Science Grid This Week (http://www.isgtw.org/), a joint collaboration with the Enabling Grids for E-sciencE (EGEE) project in Europe. Additional contributions, as well as a member of the editorial board, come from the TeraGrid. The newsletter has been very well received, having just published 152 issues, with subscribers totaling approximately 5,641, an increase of over 35% in the last year. The newsletter covers the global spectrum of science, and of projects that support science, using distributed computing. In the last year, due to the increased support from NSF and DOE, we have been able to increase the US effort contribution and thus improve our ability to showcase US contributions.

In addition, the OSG has a comprehensive web site to inform and guide stakeholders and new users of the OSG at http://www.opensciencegrid.org. As part of this we solicit and publish research highlights from our stakeholders; research highlights for the last year are accessible via the following links:

Linking grids uncovers genetic mutations: Superlink-online helps geneticists perform compute-intensive analyses to discover disease-causing anomalies. August 2009

Grid helps to filter LIGO's data: Researchers must process vast amounts of data to look for signals of gravitational waves, minute cosmic ripples that carry information about the motion of objects in the universe. August 2009

Grid-enabled virus hunting: Scientists use distributed computing to compare sequences of DNA in order to identify new viruses. June 2009

Clouds make way for STAR to shine: The STAR experiment, running compute-intensive analyses on the OSG to study nuclear decay, has successfully made use of cloud computing as well.
April 2009

Single and Playing Hard to Get: High-energy physicists recently discovered the top quark produced as a single particle. The tools and techniques that made this discovery possible advance the goal of observing the Higgs particle. March 2009

The research highlights above, and those published in prior years, are available at http://www.opensciencegrid.org/About/What_We%27re_Doing/Research_Highlights.

5. Participants

5.1 Organizations

The members of the Council and list of project organizations:

1. Boston University
2. Brookhaven National Laboratory
3. California Institute of Technology
4. Clemson University
5. Columbia University
6. Distributed Organization for Scientific and Academic Research (DOSAR)
7. Fermi National Accelerator Laboratory
8. Harvard University (Medical School)
9. Indiana University
10. Lawrence Berkeley National Laboratory
11. Purdue University
12. Renaissance Computing Institute
13. Stanford Linear Accelerator Center (SLAC)
14. University of California San Diego
15. University of Chicago
16. University of Florida
17. University of Illinois Urbana-Champaign/NCSA
18. University of Nebraska – Lincoln
19. University of Wisconsin, Madison
20. University at Buffalo (Council)
21. US ATLAS
22. US CMS
23. STAR
24. LIGO
25. CDF
26. D0
27. Condor
28. Globus

5.2 Partnerships and Collaborations

The OSG relies on external project collaborations to develop the software to be included in the VDT and deployed on OSG. Our collaborations are continuing with: Community Driven Improvement of Globus Software (CDIGS), SciDAC-2 Center for Enabling Distributed Petascale Science (CEDPS), Condor, the dCache collaboration, Data Intensive Science University Network (DISUN), Energy Sciences Network (ESnet), Internet2, National LambdaRail (NLR), the BNL/FNAL Joint Authorization project, LIGO Physics at the Information Frontier, Fermilab Gratia accounting, the SDM project at LBNL (BeStMan), SLAC Xrootd, Pegasus at ISI, and U.S. LHC software and computing.
OSG also has close working arrangements with "Satellite" projects, defined as independent projects contributing to the OSG roadmap, with collaboration at the leadership level. Four new satellite projects are:

- High Throughput Parallel Computing (HTPC) on OSG resources, for an emerging class of applications in which large ensembles (hundreds to thousands) of modestly parallel (4- to ~64-way) jobs are run.
- Application testing over the ESnet 100-Gigabit network prototype, using storage and compute end-points supplied by the Magellan cloud computing projects at ANL and NERSC.
- CorralWMS, to enable user access to provisioned resources and "just-in-time" available resources for a single workload; it integrates and builds on previous work on OSG's GlideinWMS and on Corral, a provisioning tool used to complement the Pegasus WMS used on TeraGrid.
- VOSS: "Delegating Organizational Work to Virtual Organization Technologies: Beyond the Communications Paradigm" (OCI funded, NSF 0838383).

The European EGEE infrastructure will transition to several separate projects in March 2010. OSG has been active in talking to the new leadership to ensure a smooth transition that does not impact our stakeholders. We hold several joint EGEE-OSG-WLCG technical meetings a year; at the last one, in December 2009, the following list of existing contact points and activities was revisited in light of the organizational changes:

- EGI Helpdesk: continues within EGI-InSPIRE; the EGI Helpdesk (previously GGUS) will continue to be run by the same team.
- Grid Operations Center ticketing systems interfaces.
- Security Incident Response.
- Joint Monitoring Group (MyOSG, MyEGEE); MyEGI development undertaken by CERN (on behalf of EGI-InSPIRE) will establish a relationship with IU.
- Interoperations (possibly including interoperability) testing.
- Software Security Validation. [Security Vulnerability Group continues, including EMI]
- Joint Operations meetings and collaboration. [May stop naturally]
- Joint Security Policy Group. [Security Policy Group]
- IGTF.
- Infrastructure Policy Working Group. [European e-Infrastructure Forum as well]
- Middleware Security Working Group. [Software Security Group - EGI - EMI representation]
- Dashboards. [HUC EGI-InSPIRE]

In the context of the WLCG Collaboration:

- WLCG Management Board.
- WLCG Grid Deployment Board.
- WLCG Technical Forum.
- Accounting: Gratia-APEL interfaces.
- MyOSG-SAM interface. [Exchange/integration of OSG monitoring records]
- Xrootd collaboration and support.
- European Middleware Initiative.
- Storage Resource Manager (SRM) specification, interfaces, and testing.
- dCache.org collaboration.
- Glexec authorization projects.
- OGF Production Grid Infrastructure Working Group.
- Virtual Data Toolkit support and Engineering Management Team.

We are starting to work with TeraGrid to better understand the scope of our joint activities. These activities are summarized in Table 4.

Table 4: Joint OSG – TeraGrid activities

Task | OSG Owner | TG Owner | Stage
OSG & TG Joint Activity Tracking and Reporting | Chander Sehgal | Tim Cockerill | Active
iSGTW continuation and joint support plan; proposal for US-based contributions | Judy Jackson | Elizabeth Leake | Planning
Resource Allocation Analysis and Recommendations | Miron Livny, Chander Sehgal | Kent Milfeld | Planning
Resource Accounting Inter-operation and Convergence Recommendations | Phillip Canal, Brian Bockelman | Kent Milfeld, David Hart | Planning
Explore how campus outreach and activities can be coordinated | John McGee, Ruth Pordes | Kay Hunt, Scott Lathrop | Active
SCEC Application to use both OSG & TG | John McGee, Mats Rynge | Dan Katz | Active
Joint Middleware Distributions (Client side) | Alain Roy, ? | Lee Liming, J.P. Navarro | Active
Security Incident Response | Mine Altunay, Jim Barlow | Jim Marsteller, Von Welch | Active
Workforce Development | Ruth Pordes, Miron Livny | John Towns, Scott Lathrop | Planning
Joint Student Activities (e.g., TG conference, summer schools) | David Ritchie, Jim Weichel | Scott Lathrop, Laura McGiness | Active
Infrastructure Policy Group (run by Bob Jones, EGEE) | Ruth Pordes, Miron Livny | J.P. Navarro, Phil Andrews | Active

6. Cooperative Agreement Performance

OSG has put in place processes and activities that meet the terms of the Cooperative Agreement and Management Plan:

- The Joint Oversight Team meets periodically, as scheduled by DOE and NSF, via phone to hear about OSG progress, status, and concerns. Follow-up items are reviewed and addressed by OSG as needed.
- Two intermediate progress reports were submitted to NSF in February and June of 2007.
- The Science Advisory Group (SAG) met in June 2007, and the OSG Executive Board has addressed feedback from the Advisory Group. The SAG membership was revised in August 2009; telephone discussions with each member were about half complete as of the end of December 2009.
- In February 2008, a DOE annual report was submitted. In July 2008, an annual report was submitted to NSF. In December 2008, a DOE annual report was submitted. In June 2009, an annual report was submitted to NSF.
- As requested by DOE and NSF, OSG staff provide proactive support in workshops and collaborative efforts to help define, improve, and evolve the US national cyberinfrastructure.

7. Publications Using OSG Infrastructure

ATLAS experiment

1. arXiv:0912.2642. Title: Readiness of the ATLAS Liquid Argon Calorimeter for LHC Collisions. Authors: The ATLAS Collaboration. Comments: 31 pages, 34 figures, submitted to EPJC [v2: Fig. 1 fixed]. Subjects: Instrumentation and Detectors (physics.ins-det); High Energy Physics - Experiment (hep-ex)
2. arXiv:0910.5156. Title: Alignment of the ATLAS Inner Detector Tracking System. Authors: Grant Gorfine, for the ATLAS Collaboration. Comments: To be published in the proceedings of DPF-2009, Detroit, MI, July 2009, eConf C090726. Subjects: Instrumentation and Detectors (physics.ins-det); High Energy Physics - Experiment (hep-ex)
3.
arXiv:0910.2991. Title: Commissioning of the ATLAS Liquid Argon Calorimeters. Authors: Mark S. Cooke, for the ATLAS Collaboration. Comments: Proceedings from the CIPANP 2009 Conference. Subjects: Instrumentation and Detectors (physics.ins-det)
4. arXiv:0910.2767. Title: ATLAS Muon Detector Commissioning. Authors: E. Diehl, for the ATLAS muon collaboration. Comments: 6 pages, 18 figures. To be published in the proceedings of DPF-2009, Detroit, MI, July 2009, eConf C090726. Subjects: Instrumentation and Detectors (physics.ins-det); High Energy Physics - Experiment (hep-ex)
5. arXiv:0910.0847. Title: Results from the Commissioning of the ATLAS Pixel Detector with Cosmic data. Authors: E. Galyaev, for the ATLAS collaboration. Comments: To be published in the proceedings of DPF-2009, Detroit, MI, July 2009, eConf C090726; 9 pages, 13 figures, 9 references. Subjects: Instrumentation and Detectors (physics.ins-det); High Energy Physics - Experiment (hep-ex)
6. arXiv:0910.0097. Title: Scalable Database Access Technologies for ATLAS Distributed Computing. Authors: A. Vaniachine, for the ATLAS Collaboration. Comments: 6 pages, 7 figures. To be published in the proceedings of DPF-2009, Detroit, MI, July 2009, eConf C090726. Subjects: Instrumentation and Detectors (physics.ins-det); Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC); High Energy Physics - Experiment (hep-ex)
7. arXiv:0907.0441. Title: Readiness of the ATLAS Experiment for First Data. Authors: Thilo Pauly, for the ATLAS Collaboration. Comments: 8 pages, 18 figures, to appear in the Proceedings of the XLIV Rencontres de Moriond, Electroweak Session, La Thuile, March 11, 2009. Subjects: Instrumentation and Detectors (physics.ins-det)
8. arXiv:0905.3977. Title: Results from the commissioning of the ATLAS Pixel detector. Authors: J. Biesiada, for the ATLAS Collaboration. Comments: 3 pages.
Submitted to NIM A as part of the proceedings of the TIPP09 conference, held at Tsukuba, Japan. Subjects: Instrumentation and Detectors (physics.ins-det); High Energy Physics - Experiment (hep-ex)

CMS experiment

1. arXiv:0912.1189. Title: Search for SUSY at LHC in the first year of data-taking. Authors: Michele Pioppi, on behalf of the CMS collaboration. Comments: To be published in the proceedings of MAD09, Antananarivo; 14 pages. Subjects: High Energy Physics - Experiment (hep-ex)
2. arXiv:0911.5434. Title: Commissioning and Performance of the CMS Pixel Tracker with Cosmic Ray Muons. Authors: CMS Collaboration. Subjects: Instrumentation and Detectors (physics.ins-det)
3. arXiv:0911.5422. Title: Performance of the CMS Level-1 Trigger during Commissioning with Cosmic Ray Muons. Authors: CMS Collaboration. Subjects: Instrumentation and Detectors (physics.ins-det)
4. arXiv:0911.5397. Title: Measurement of the Muon Stopping Power in Lead Tungstate. Authors: CMS Collaboration. Subjects: Instrumentation and Detectors (physics.ins-det)
5. arXiv:0911.4996. Title: Commissioning and Performance of the CMS Silicon Strip Tracker with Cosmic Ray Muons. Authors: CMS Collaboration. Subjects: Instrumentation and Detectors (physics.ins-det)
6. arXiv:0911.4994. Title: Performance of CMS Muon Reconstruction in Cosmic-Ray Events. Authors: CMS Collaboration. Subjects: Instrumentation and Detectors (physics.ins-det)
7. arXiv:0911.4992. Title: Performance of the CMS Cathode Strip Chambers with Cosmic Rays. Authors: CMS Collaboration. Subjects: Instrumentation and Detectors (physics.ins-det)
8. arXiv:0911.4991. Title: Performance of the CMS Hadron Calorimeter with Cosmic Ray Muons and LHC Beam Data. Authors: CMS Collaboration. Subjects: Instrumentation and Detectors (physics.ins-det)
9.
arXiv:0911.4904. Title: Fine Synchronization of the CMS Muon Drift-Tube Local Trigger using Cosmic Rays. Authors: CMS Collaboration. Subjects: Instrumentation and Detectors (physics.ins-det)
10. arXiv:0911.4895. Title: Calibration of the CMS Drift Tube Chambers and Measurement of the Drift Velocity with Cosmic Rays. Authors: CMS Collaboration. Subjects: Instrumentation and Detectors (physics.ins-det)
11. arXiv:0911.4893. Title: Performance of the CMS Drift-Tube Local Trigger with Cosmic Rays. Authors: The CMS Collaboration. Subjects: Instrumentation and Detectors (physics.ins-det)
12. arXiv:0911.4889. Title: Commissioning of the CMS High-Level Trigger with Cosmic Rays. Authors: CMS Collaboration. Subjects: Instrumentation and Detectors (physics.ins-det)
13. arXiv:0911.4881. Title: Identification and Filtering of Uncharacteristic Noise in the CMS Hadron Calorimeter. Authors: The CMS Collaboration. Subjects: Instrumentation and Detectors (physics.ins-det)
14. arXiv:0911.4877. Title: Performance of CMS Hadron Calorimeter Timing and Synchronization using Cosmic Ray and LHC Beam Data. Authors: CMS Collaboration. Subjects: Instrumentation and Detectors (physics.ins-det)
15. arXiv:0911.4855. Title: Performance of the CMS Drift Tube Chambers with Cosmic Rays. Authors: The CMS Collaboration. Subjects: Instrumentation and Detectors (physics.ins-det)
16. arXiv:0911.4845. Title: Commissioning of the CMS Experiment and the Cosmic Run at Four Tesla. Authors: CMS Collaboration. Subjects: Instrumentation and Detectors (physics.ins-det)
17. arXiv:0911.4842. Title: CMS Data Processing Workflows during an Extended Cosmic Ray Run. Authors: CMS Collaboration. Subjects: Instrumentation and Detectors (physics.ins-det)
18.
arXiv:0911.4770. Title: Aligning the CMS Muon Chambers with the Muon Alignment System during an Extended Cosmic Ray Run. Authors: CMS Collaboration. Subjects: Instrumentation and Detectors (physics.ins-det)
19. arXiv:0911.4045. Title: Performance Study of the CMS Barrel Resistive Plate Chambers with Cosmic Rays. Authors: CMS Collaboration. Subjects: Instrumentation and Detectors (physics.ins-det)
20. arXiv:0911.4044. Title: Time Reconstruction and Performance of the CMS Electromagnetic Calorimeter. Authors: CMS Collaboration. Subjects: Instrumentation and Detectors (physics.ins-det)
21. arXiv:0911.4022. Title: Alignment of the CMS Muon System with Cosmic-Ray and Beam-Halo Muons. Authors: CMS Collaboration. Subjects: Instrumentation and Detectors (physics.ins-det)
22. arXiv:0910.5530. Title: Precise Mapping of the Magnetic Field in the CMS Barrel Yoke using Cosmic Rays. Authors: The CMS Collaboration. Subjects: Instrumentation and Detectors (physics.ins-det); Accelerator Physics (physics.acc-ph)
23. arXiv:0910.3564. Title: Search for Extra Dimensions in the Diphoton Channel. Authors: Selda Esen, CMS Collaboration. Comments: 6 pages, 9 figures, conference proceeding; to be published in the proceedings of DPF-2009, Detroit, MI, July 2009, eConf C090726. Subjects: High Energy Physics - Experiment (hep-ex)

DZero experiment

1. Determination of the Strong Coupling Constant from the Inclusive Jet Cross Section in pp Collisions at √s = 1.96 TeV. Published 12/29/09: Phys. Rev. D 80, 111107 (2009), arXiv:0911.2710. 0.7 fb-1
2. Direct Measurement of the W Boson Width. Published 12/4/09: Phys. Rev. Lett. 103, 231802 (2009), arXiv:0909.4814. 1.0 fb-1
3. Measurement of the Top Quark Mass in Final States with Two Leptons. Published 11/20/09: Phys. Rev. D 80, 092006 (2009), arXiv:0904.3195. 1.0 fb-1
4. Measurement of the t-Channel Single Top Quark Production Cross Section. Published 11/19/09: Phys.
Lett. B 682, 363 (2010), arXiv:0907.4259. 2.3 fb-1
5. Measurement of Z/γ*+jet+X Angular Distributions in pp Collisions at √s = 1.96 TeV. Published 11/13/09: Phys. Lett. B 682, 370 (2010), arXiv:0907.4286. 1.0 fb-1
6. Search for Charged Higgs Bosons in Top Quark Decays. Published 11/12/09: Phys. Lett. B 682, 278 (2009), arXiv:0908.1811. 1.0 fb-1
7. Measurement of Dijet Angular Distributions at √s = 1.96 TeV and Searches for Quark Compositeness and Extra Spatial Dimensions. Published 11/5/09: Phys. Rev. Lett. 103, 191803 (2009), arXiv:0906.4819. 0.70 fb-1
8. Measurement of the WW Production Cross Section with Dilepton Final States in pp Collisions at √s = 1.96 TeV and Limits on Anomalous Trilinear Gauge Couplings. Published 11/2/09: Phys. Rev. Lett. 103, 191801 (2009), arXiv:0904.0673. 1.1 fb-1
9. Combination of tt Cross Section Measurements and Constraints on the Mass of the Top Quark and its Decay into Charged Higgs Bosons. Published 10/19/09: Phys. Rev. D 80, 071102 (2009), arXiv:0903.5525. 1.0 fb-1
10. Search for Pair Production of First-Generation Leptoquarks in pp Collisions at √s = 1.96 TeV. Published 10/8/09: Phys. Lett. B 681, 224 (2009), arXiv:0907.1048. 1.0 fb-1
11. Measurement of the W Boson Mass. Published 10/1/09: Phys. Rev. Lett. 103, 141801 (2009), arXiv:0908.0766. 1.0 fb-1
12. Search for Charged Higgs Bosons in Decays of Top Quarks. Published 9/30/09: Phys. Rev. D 80, 051107 (2009), arXiv:0906.5326. 0.90 fb-1
13. Measurement of Trilinear Gauge Boson Couplings from WW + WZ → lνjj Events in pp Collisions at √s = 1.96 TeV. Published 9/23/09: Phys. Rev. D 80, 053012 (2009), arXiv:0907.4398. 1.1 fb-1
14. Direct Measurement of the Mass Difference Between Top and Antitop Quarks. Published 9/21/09: Phys. Rev. Lett. 103, 132001 (2009), arXiv:0906.1172. 1.0 fb-1
15. A Novel Method for Modeling the Recoil in W Boson Events at Hadron Colliders. Published 8/27/09: Nucl.
Instrum. Methods in Phys. Res. A 609, 250 (2009), arXiv:0907.3713.
16. Observation of Single Top-Quark Production (PRL Editors' Suggestion; "Physics" Synopsis article). Published 8/24/09: Phys. Rev. Lett. 103, 092001 (2009), arXiv:0903.0850. 2.3 fb-1
17. Search for Dark Photons from Supersymmetric Hidden Valleys. Published 8/17/09: Phys. Rev. Lett. 103, 081802 (2009), arXiv:0905.1478. 4.1 fb-1
18. Search for Associated Production of Charginos and Neutralinos in the Trilepton Final State using 2.3 fb-1 of Data. Published 8/13/09: Phys. Lett. B 680, 34 (2009), arXiv:0901.0646. 2.3 fb-1
19. Search for Resonant Pair Production of Neutral Long-Lived Particles Decaying to bb in pp Collisions at √s = 1.96 TeV. Published 8/13/09: Phys. Rev. Lett. 103, 071801 (2009), arXiv:0906.1787. 3.6 fb-1
20. Search for Squark Production in Events with Jets, Hadronically Decaying Tau Leptons and Missing Transverse Energy at √s = 1.96 TeV. Published 8/7/09: Phys. Lett. B 680, 24 (2009), arXiv:0905.4086. 1.0 – 2.1 fb-1
21. Search for Next-to-Minimal Supersymmetric Higgs Bosons in the h → aa → μμ μμ, μμ ττ Channels using pp Collisions at √s = 1.96 TeV. Published 8/3/09: Phys. Rev. Lett. 103, 061801 (2009), arXiv:0905.3381. 4.2 fb-1
22. Measurement of the tt Production Cross Section and Top Quark Mass Extraction Using Dilepton Events in pp Collisions. Published 7/21/09: Phys. Lett. B 679, 177 (2009), arXiv:0901.2137. 1.0 fb-1
23. Search for the Standard Model Higgs Boson in Tau Final States. Published 6/25/09: Phys. Rev. Lett. 102, 251801 (2009), arXiv:0903.4800. 1.0 fb-1
24. Search for Resonant Diphoton Production with the DØ Detector. Published 6/12/09: Phys. Rev. Lett. 102, 231801 (2009), arXiv:0901.1887. 2.7 fb-1
25. Relative Rates of B Meson Decays into ψ(2S) and J/ψ. Published 6/10/09: Phys. Rev. D 79, 111102(R) (2009), arXiv:0805.2576. 1.3 fb-1
26.
Measurements of Differential Cross Sections of Z/γ*+jets+X Events in pp Collisions at √s = 1.96 TeV. Published 5/29/09: Phys. Lett. B 678, 45 (2009), arXiv:0903.1748. 1.0 fb-1
27. Measurement of the Zγ → ννγ Production Cross Section and Limits on Anomalous ZZγ and Zγγ Couplings in pp Collisions at √s = 1.96 TeV. Published 5/22/09: Phys. Rev. Lett. 102, 201802 (2009), arXiv:0902.2157. 2.6 fb-1
28. Search for Charged Higgs Bosons Decaying into Top and Bottom Quarks in pp Collisions. Published 5/14/09: Phys. Rev. Lett. 102, 191802 (2009), arXiv:0807.0859. 0.90 fb-1
29. Measurement of γ + b + X and γ + c + X Production Cross Sections in pp Collisions at √s = 1.96 TeV. Published 5/12/09: Phys. Rev. Lett. 102, 192002 (2009), arXiv:0901.0739. 1.0 fb-1
30. Search for Long-Lived Charged Massive Particles with the DØ Detector. Published 4/22/09: Phys. Rev. Lett. 102, 161802 (2009), arXiv:0809.4472. 1.1 fb-1
31. Evidence of WW+WZ Production with Lepton+Jets Final States in pp Collisions at √s = 1.96 TeV. Published 4/21/09: Phys. Rev. Lett. 102, 161801 (2009), arXiv:0810.3873. 1.1 fb-1
32. Search for the Lightest Scalar Top Quark in Events with Two Leptons in pp Collisions at √s = 1.96 TeV. Published 4/17/09: Phys. Lett. B 675, 289 (2009), arXiv:0811.0459. 1.0 fb-1
33. Search for Anomalous Top-Quark Couplings with the DØ Detector. Published 3/4/09: Phys. Rev. Lett. 102, 092002 (2009), arXiv:0901.0151. 1.0 fb-1
34. Evidence for the Decay Bs0→Ds(*)Ds(*) and a Measurement of ΔΓsCP/Γs. Published 3/3/09: Phys. Rev. Lett. 102, 091801 (2009), arXiv:0811.2173. 2.8 fb-1
35. Measurement of the Lifetime of the Bc± Meson in the Semileptonic Decay Channel. Published 3/2/09: Phys. Rev. Lett. 102, 092001 (2009), arXiv:0805.2614. 1.3 fb-1
36. Search for Admixture of Scalar Top Quarks in the tt Lepton+Jets Final State at √s = 1.96 TeV. Published 2/20/09: Phys. Lett.
B 674, 4 (2009), arXiv:0901.1063. 0.90 fb-1
37. Search for Neutral Higgs Bosons at High tanβ in the b(h/H/A)→bτ+τ− Channel. Published 2/6/09: Phys. Rev. Lett. 102, 051804 (2009), arXiv:0811.0024. 0.30 fb-1
38. Search for Large Extra Spatial Dimensions in the Dielectron and Diphoton Channels in pp Collisions at √s = 1.96 TeV. Published 2/6/09: Phys. Rev. Lett. 102, 051601 (2009), arXiv:0809.2813. 1.0 fb-1
39. Search for Associated W and Higgs Boson Production in pp Collisions at √s = 1.96 TeV. Published 2/4/09: Phys. Rev. Lett. 102, 051803 (2009), arXiv:0808.1970. 1.1 fb-1
40. Measurement of the Semileptonic Branching Ratio of Bs0 to an Orbitally Excited Ds** State: Br(Bs0→Ds1−(2536)μ+νX). Published 2/3/09: Phys. Rev. Lett. 102, 051801 (2009), arXiv:0712.3789. 1.3 fb-1
41. Measurement of the Angular and Lifetime Parameters of the Decays Bd0→J/ψK*0 and Bs0→J/ψφ. Published 1/20/09: Phys. Rev. Lett. 102, 032001 (2009), arXiv:0810.0037. 2.8 fb-1

CDF experiment

1. Measurement of the WW + WZ Production Cross Section Using the Lepton + Jets Final State at CDF II. T. Aaltonen et al., The CDF Collaboration, submitted to Phys. Rev. Lett. November 24, 2009. Fermilab-Pub-09-593-E. arXiv:0911.4449
2. Search for Fermion-Pair Decays qq-bar --> (tW-/p)(t-bar W+/-) in Same-Charge Dilepton Events. T. Aaltonen et al., The CDF Collaboration, submitted to Phys. Rev. Lett. December 5, 2009. Fermilab-Pub-09-628-E. arXiv:0912.1057
3. Search for Technicolor Particles Produced in Association with a W Boson at CDF. T. Aaltonen et al., The CDF Collaboration, submitted to Phys. Rev. Lett. December 10, 2009. Fermilab-Pub-09-619-E. arXiv:0912.2059
4.
Measurement of the W+W- Production Cross Section and Search for Anomalous WWgamma and WWZ Couplings in p anti-p Collisions at s**(1/2) = 1.96 TeV. T. Aaltonen et al., The CDF Collaboration, submitted to Phys. Rev. Lett. December 22, 2009. Fermilab-Pub-09-635-E. arXiv:0912.4691
5. A Search for the Higgs Boson Using Neural Networks in Events with Missing Energy and b-quark Jets in p anti-p Collisions at s**(1/2) = 1.96 TeV. T. Aaltonen et al., The CDF Collaboration, Fermilab-Pub-09-586-E. arXiv:0911.3935. Submitted to Phys. Rev. Lett. November 19, 2009.
6. Search for Pair Production of Supersymmetric Top Quarks in Dilepton Events from p anti-p Collisions at s**(1/2) = 1.96 TeV. T. Aaltonen et al., The CDF Collaboration, submitted to Phys. Rev. Lett. December 7, 2009. Fermilab-Pub-09-614-E.
7. A Study of the Associated Production of Photons and b-quark Jets in p anti-p Collisions at s**(1/2) = 1.96 TeV. T. Aaltonen et al., The CDF Collaboration, submitted to Phys. Rev. Lett. December 16, 2009. Fermilab-Pub-09-634-E. arXiv:0912.3453
8. Measurement of the Inclusive Isolated Prompt Photon Cross Section in p anti-p Collisions at s**(1/2) = 1.96 TeV using the CDF Detector. T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D80, 111106 (2009). arXiv:0910.3623
9. A Search for the Associated Production of the Standard Model Higgs Boson in the All-Hadronic Channel. T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 103, 221801 (2009). arXiv:0907.0810
10.
Search for New Physics with a Dijet plus Missing E(t) Signature in p anti-p Collisions at s**(1/2) = 1.96 TeV. T. Aaltonen et al., The CDF Collaboration, submitted to Phys. Rev. Lett. December 23, 2009. Fermilab-Pub-09-635-E. arXiv:0912.4691
11. Measurement of the Top Quark Mass in the Dilepton Channel using m_T2 at CDF. T. Aaltonen et al., The CDF Collaboration, submitted to Phys. Rev. D Rapid Communications November 17, 2009. Fermilab-Pub-09-580-E. arXiv:0911.2956
12. Measurements of Branching Ratios and CP Asymmetries in B+/- --> D_CP K+/- Decays in Hadron Collisions. T. Aaltonen et al., The CDF Collaboration, submitted to Phys. Rev. D November 2, 2009. Fermilab-Pub-09-549-E. arXiv:0911.0425
13. Search for New Color-Octet Vector Particle Decaying to t anti-t in p anti-p Collisions at s**(1/2) = 1.96 TeV. T. Aaltonen et al., The CDF Collaboration, submitted to Phys. Lett. B November 16, 2009. Fermilab-Pub-09-578-E. arXiv:0911.3112
14. Search for Higgs Bosons Predicted in Two-Higgs-Doublet Models Via Decays to Tau Lepton Pairs in 1.96 TeV p anti-p Collisions. T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 103, 201801 (2009).
15. Measurement of the tt-bar Production Cross Section in 2 fb-1 of p anti-p Collisions at s**(1/2) = 1.96 TeV Using Lepton Plus Jets Events with Soft Muon Tagging. T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D79, 052007 (2009). arXiv:0901.4142
16.
First Observation of B-bar0_s --> D+/-_s K-/+ and Measurement of the Ratio of Branching Fractions B(B-bar0_s --> D+/-_s K-/+)/B(B-bar0_s -> D+_s pi+-) T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 103, 191802 (2009) Abstract, PDF (287KiB). 9372: Added: Wednesday, 2009 November 11 10:35:13 CST 17. Search for Anomalous Production of Events with Two Photons and Additional Energetic Objects at CDF T. Aaltonen et al., The CDF Collaboration, Fermilab-Pub-09-535-E. arXiv: 0910.5170. Submitted to Phys. Rev. D October 27, 2009. Abstract, PostScript (1106KiB). 9886: Added: Tuesday, 2009 October 27 15:25:07 CDT 18. Measurements of the Top-Quark Mass Using Charged Particle Tracking T. Aaltonen et al., The CDF Collaboration, submitted to Phys. Rev. D October 7, 2009. Fermilab-Pub-09-464-E. arXiv: 0910.0969. Abstract, PDF (841KiB). 9840: Added: Wednesday, 2009 October 07 10:30:19 CDT 19. A Search for the Higgs Boson Produced in Association with Z --> ell+ ell- Using the Matrix Element Method at CDF II T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D80, 071101 (2009). arXiv: 0908.3534. Abstract, PDF (443KiB). 9810: Added: Monday, 2009 October 26 15:14:50 CDT 20. Observation of the Omega_b and Measurement of the Properties of the Xi-_b and Omega-_b Baryons T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D80, 072003 (2009). arXiv: 0905.3123. Abstract, PostScript (1203KiB). 9726: Added: Monday, 2009 October 26 14:01:34 CDT 21. Precision Measurement of the X(3872) Mass in J/psi pi+_ pi+- Decays T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 103, 152001 (2009). Abstract, PDF (189KiB). 9700: Added: Monday, 2009 October 26 14:32:24 CDT 22. First Observation of Vector Boson Pairs in a Hadronic Final State at the Tevatron Collider T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 103, 091803 (2009). arXiv: 0905.4714. Abstract, PDF (214KiB). 9784: Added: Tuesday, 2009 September 08 16:01:39 CDT 23. 
Measurement of the Mass of the Top Quark Using the Invariant Mass of Lepton Pairs in Soft Muon b-tagged Events T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D80, 051104 (2009). arXiv: 0906.5371. Abstract, PostScript (434KiB). 9739: Added: Tuesday, 2009 September 15 12:24:35 CDT 71 24. Search for a Standard Model Higgs Boson in WH --> l nu bb-bar in p anti-p Collisions at s**(1/2) = 1.96 TeV Phys. Rev. Lett. 103, 101802 (2009). arXiv: 0906.5613. Abstract, PDF (177KiB). 9720: Changed: Tuesday, 2009 September 15 10:39:57 CDT 25. Search for Charged Higgs Bosons in Decays of Top Quarks in p anti-p Collisions at s**(1/2) = 1.96 TeV T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 103, 101803 (2009). arXiv: 0907.1269. Abstract, PDF (171KiB). 9687: Added: Tuesday, 2009 September 15 11:19:04 CDT 26. Search for the Neutral Current Top Quark Decay t --> Zc Using Ratio of Z-Boson + 4 Jets to W-Boson + 4 Jets Production T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D80, 052001 (2009). arXiv: 0905.0277. Abstract, PDF (525KiB). 9656: Added: Wednesday, 2009 September 16 14:09:28 CDT 27. Measurement of the b-jet Cross Section in Events with a W Boson in p anti-p Collisions at s**(1/2) = 1.96 TeV T. Aaltonen et al., The CDF Collaboration, Fermilab-Pub-09-416-E. arXiv: 0909.1505. Submitted to Phys. Rev. Lett. September 8, 2009. Abstract, PDF (286KiB). 9649: Added: Tuesday, 2009 September 08 15:12:52 CDT 28. Search for Anomalous Production of Events with a Photon, Jet, b-quark Jet, and Missing Transverse Energy T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D80, 052003 (2009). arXiv: 0905.0231. Abstract, PDF (987KiB). 9627: Added: Wednesday, 2009 September 16 14:41:30 CDT 29. Search for Hadronic Decays of W and Z Bosons in Photon Events in p anti-p Collisions at s**(1/2) = 1.96 TeV T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D80, 052011 (2009). arXiv: 0803.4264. Abstract, PostScript (989KiB). 
9050: Added: Tuesday, 2009 September 29 13:09:53 CDT 30. Measurement of d sigma/dy of Drell-Yan e+e- Pairs in the Z Mass Region from p anti-p Collisions at s**(1/2) = 1.96 TeV T. Aaltonen et al., The CDF Collaboration, Fermilab-Pub-09-402-E. arXiv: 0908.3914. Submitted to Phys. Rev. Lett. August 27, 2009. Abstract, PostScript (347KiB). 9832: Added: Thursday, 2009 August 27 14:51:03 CDT 31. Observation of Electroweak Single Top Quark Production T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 103, 092002 (2009). arXiv: 0903.0885. Abstract, PDF (193KiB). 9715: Added: Thursday, 2009 September 03 16:21:53 CDT 32. Search for a Fermiophobic Higgs Boson Decaying into Diphotons in p anti-p Collisions at s**(1/2) = 1.96 TeV T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 103, 061803 (2009). arXiv: 0905.0413. Abstract, PDF (154KiB). 9655: Added: Wednesday, 2009 August 12 11:47:12 CDT 72 33. Search for the Production of Narrow tb-bar Resonances in 1.9 fb-1 of p anti-p Collisions at s**(1/2) = 1.96 TeV T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 103, 041801 (2009). arXiv: 0902.3276. Abstract, PDF (178KiB). 9507: Added: Wednesday, 2009 August 12 13:31:09 CDT 34. Production of psi(2S) Mesons in p anti-p Collisions at 1.96 TeV T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D80, 031103 (2009). arXiv: 0905.1982. Abstract, PDF (242KiB). 9205: Changed: Wednesday, 2009 August 12 11:48:22 CDT 35. Searching the Inclusive ell_gamma Missing E(t) + b-quark Signature for Radiative Top Quark Decay and Non-Standard-Model Processes T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D80, 011102 (2009). arXiv: 0906.0518. Abstract, PDF (313KiB). 9675: Added: Wednesday, 2009 August 19 11:12:36 CDT 36. Search for Hadronic Decays of W and Z in Photon Events in p anti-p Collisions at s**(1/2) = 1.96 TeV T. Aaltonen et al., The CDF Collaboration, Eur. Phys. J. C62: 319:326 (2009). arXiv: 0903.2060. Abstract, PDF (161KiB). 
9652: Added: Tuesday, 2009 September 29 14:22:10 CDT 37. Search for a Standard Model Higgs Boson Production in Association with a W Boson using a Neural Network Discriminant at CDF T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D80, 012002 (2009) Abstract, PDF (252KiB). 9639: Added: Wednesday, 2009 August 19 14:42:33 CDT 38. Search for Long-Lived Massive Charged Particles in 1.96 TeV p anti-p Collisions T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 103, 021802 (2009). arXiv: 0902.1266. Abstract, PDF (235KiB). 9552: Added: Tuesday, 2009 July 14 14:48:09 CDT 39. Observation of New Charmless Decays of Bottom Hadrons T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 103, 031801 (2009) Abstract, PostScript (267KiB). 9525: Added: Wednesday, 2009 August 19 14:14:31 CDT 40. Precision Measurement of the X(3872) Mass in J/psi pi++ pi+- Decays T. Aaltonen et al., The CDF Collaboration, Fermilab-Pub-09-330-E. arXiv: 0906.5218. Submitted to Phys. Rev. Lett. June 29, 2009. Abstract, PostScript (257KiB). 8311: Added: Friday, 2009 July 10 15:12:59 CDT 41. First Measurement of the tt-bar Differential Cross Section d sigma/dM_tt-bar in p anti-p Collisions at s**(1/2) = 1.96 TeV T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 102, 222003 (2009). arXiv: 0903.2850. Abstract, PDF (199KiB). 9664: Added: Tuesday, 2009 June 09 11:03:10 CDT 42. Evidence for a Narrow Near-Threshold Structure in the J/psi phi Mass Spectrum in B+ --> J/psi phi K+ Decays T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 102, 242002 (2009). arXiv: 73 0903.2229. Abstract, PostScript (1047KiB). 9638: Added: Tuesday, 2009 June 30 12:27:35 CDT 43. Search for Gluino-Mediated Bottom Squark Production in p anti-p Collisions at s**(1/2) = 1.96 TeV T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 102, 221801 (2009). arXiv: 0903.2618. Abstract, PostScript (420KiB). 9631: Added: Thursday, 2009 June 11 11:12:10 CDT 44. 
Search for Exclusive Z Boson Production and Observation of High Mass p anti-p --> p gamma gamma p-bar --> p ell+ ell- p-bar Events in p anti-p Collisions at s**(1/2) = 1.96 TeV T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 102, 222002 (2009). arXiv: 0902.2816. Abstract, PDF (151KiB). 9630: Added: Tuesday, 2009 June 09 11:50:04 CDT 45. Measurement of the Particle Production and Inclusive Differential Cross Sections in p anti-p Collisions at s**(1/2) = 1.96 TeV T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D79, 112005 (2009). Abstract, PDF (455KiB). 9550: Added: Tuesday, 2009 July 07 13:41:22 CDT 46. Search for WW and WZ Production in Lepton Plus Jets Final States at CDF T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D79, 112011 (2009). arXiv: 0903.0814. Abstract, PDF (397KiB). 9546: Added: Tuesday, 2009 June 30 12:07:45 CDT 47. First Simultaneous Measurement of the Top Quark Mass in the Lepton + Jets and Dilepton Channels at CDF T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D79, 092005 (2009). arXiv: 0809.4808. Abstract, PDF (1384KiB). 9515: Changed: Wednesday, 2009 July 08 10:28:48 CDT 48. A Measurement of the t anti-t Cross Section in p anti-p Collisions at sqrt s = 1.96 TeV using Dilepton Events with a Lepton plus Track Selection T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D79, 112007 (2009). arXiv: 0903.5263. Abstract, PDF (842KiB). 9510: Added: Tuesday, 2009 July 07 12:54:48 CDT 49. Measurement of the Top Quark Mass at CDF using the "neutrino phi weighting" Template Method on a Lepton Plus Isolated Track Sample T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D79, 072005 (2009). arXiv: 0901.3773. Abstract, PDF (849KiB). 9505: Added: Friday, 2009 June 05 16:23:12 CDT 50. Observation of Exclusive Charmonium Production and gamma gamma -> mu+ mu- in p antip Collisions at s**(1/2) = 1.96 TeV T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 102, 242001 (2009). arXiv: 0902.1271. Abstract, PDF (265KiB). 
9477: Changed: Tuesday, 2009 July 07 11:45:57 CDT 51. Measurement of the k_T Distribution of Particles in Jets Produced in p anti-p Collisions at s**(1/2) = 1.96 TeV T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 102, 232002 (2009). arXiv: 74 0811.2820. Abstract, PostScript (271KiB). 9476: Added: Tuesday, 2009 July 07 14:59:58 CDT 52. Search for New Particles Decaying into Dijets in Proton-Antiproton Collisions at s**(1/2) = 1.96 TeV T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D79, 112002 (2009). arXiv: 0812.4036. Abstract, PDF (244KiB). 9347: Added: Tuesday, 2009 June 09 08:58:08 CDT 53. Search for the Decays B0_(s) --> e+ mu- and B0_(s) --> e+ e- in CDF Run II T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 102, 201801 (2009). arXiv: 0901.3803. Abstract, PDF (355KiB). 9597: Added: Wednesday, 2009 September 16 15:50:54 CDT 54. Search for Top-Quark Production via Flavor-Changing Neutral Currents in W + 1 Jet Events at CDF T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 102, 151801 (2009). arXiv: 0812.3400. Abstract, PDF (161KiB). 9581: Added: Monday, 2009 June 08 10:34:07 CDT 55. Direct Measurement of the W Production Charge Asymmetry in p anti-p Collisions at s**(1/2) = 1.96 TeV T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 102, 181801 (2009). arXiv: 0901.2169. Abstract, PostScript (395KiB). 9388: Added: Thursday, 2009 June 11 15:01:37 CDT 56. Top Quark Mass Measurement in the tt-bar All Hadronic Channel Using a Matrix Element Technique in p anti-p Collisions at s**(1/2) = 1.96 TeV T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D79, 072010 (2009). arXiv: 0811.1062. Abstract, PDF (622KiB). 9293: Changed: Tuesday, 2009 June 30 11:59:11 CDT 57. Measurement of the b-Hadron Production Cross Section Using Decays to mu- D0X Final States in p anti-p Collisions at s**(1/2) = 1.96 TeV T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D79, 092003 (2009). arXiv: 0903.2403. Abstract, PostScript (941KiB). 
8890: Added: Friday, 2009 May 29 13:57:22 CDT 58. Top Quark Mass Measurement in the Lepton Plus Jets Channel Using a Modified Matrix Element Method T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D79, 072001 (2009). arXiv:0812.4469. Abstract, PostScript (1297KiB). 9491: Added: Thursday, 2009 April 16 09:34:00 CDT 59. Measurement of the Top Quark Mass with Dilepton Events Selected Using Neuroevolution at CDF II Phys. Rev. Lett. 102, 152001 (2009). arXiv:0807.4652. Abstract, PDF (297KiB). 9235: Added: Thursday, 2009 April 16 09:48:20 CDT 60. Inclusive Search for Squark and Gluino Production in p anti-p Collisions at s**(1/2) = 1.96 TeV T. Aaltonen et al., The CDF Collaboration, Phys. Phys. Lett. 102, 121801 (2009). arXiv: 75 0811.2512. Abstract, PDF (184KiB). 9594: Added: Wednesday, 2009 July 08 10:53:02 CDT 61. A Search for High-Mass Resonances Decaying to Dimuons at CDF T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 102, 091805 (2009). arXiv:0811.0053. Abstract, PDF (223KiB). 9516: Added: Thursday, 2009 March 12 10:53:13 CDT 62. Measurement of W-Boson Helicity Fractions in Top-Quark Decays using cos theta* T. Aaltonen et al., The CDF Collaboration, Phys. Lett. B479, p. 160-167 (2009). arXiv: 0811.0344. Abstract, PostScript (458KiB). 9471: Added: Thursday, 2009 July 09 13:51:10 CDT 63. Measurement of the Cross Section for b Jet Production in Events with a Z Boson in p anti-p Collisions at s**(1/2) = 1.96 TeV T. Aaltonen et al., The CDF Collaboraton, Phys. Rev. D79, 052008 (2009). arXiv: 0812.4458. Abstract, PDF (251KiB). 9466: Added: Thursday, 2009 July 09 10:31:55 CDT 64. Measurement of the Fraction of tt-bar Production via Gluon-Gluon Fusion in p anti-p Collisions at s**(1/2) = 1.96 TeV T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D79, 031101 (2009) Abstract, PostScript (707KiB). 9356: Added: Tuesday, 2009 March 03 16:28:16 CST 65. Measurement of Resonance Parameters of Orbitally Excited Narrow B0 Mesons T. 
Aaltonen et al., The CDF Collaboration, Phys. Rev. 102, 102003 (2009). arXiv:0809.5007. Abstract, PDF (169KiB). 9352: Added: Monday, 2009 March 23 10:49:26 CDT 66. Search for New Physics in the mu mu + e/mu + missing E(t) Channel with a low-p(t) Lepton Threshold at the Collider Detector at Fermilab T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D79, 052004 (2009). arXiv:0810.3213. Abstract, PDF (388KiB). 9311: Added: Monday, 2009 March 23 13:07:44 CDT 67. Direct Bound on the Total Decay Width of the Top Quark in p anti-p Collisions at s**(1/2) = 1.96 TeV T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 102, 042001 (2009). arXiv:0808.2167. Fermilab-Pub-08-302-E. Abstract, PDF (186KiB). 9218: Added: Wednesday, 2009 February 11 15:00:13 CST 68. First Measurement of the Ratio of Branching Fractions B(Lambda0(b) -> Lambda+_c munu-bar_mu)/B(Lambda0(b) --> Lambda+_c pi+-) T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D79, 032001 (2009). arXiv:0810.3213. Abstract, PostScript (1976KiB). 8477: Added: Friday, 2009 February 13 11:57:37 CST 69. Search for Maximal Flavor Violating Scalars in Same-Charge Lepton Pairs in p anti-p Collisions at s**(1/2) = 1.96 TeV T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 102, 041801 (2009). arXiv:0809.4903. Abstract, PDF (267KiB). 9452: Added: Wednesday, 2009 January 28 10:52:55 CST 76 70. Search for a Higgs Boson Decaying to Two W Bosons at CDF T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 102, 021802 (2009). arXiv:0809.3930. Abstract, PDF (162KiB). 9368: Added: Tuesday, 2009 January 27 16:17:59 CST 71. Search for High-Mass e+e- Resonances in p anti-p Collisions at s**(1/2) = 1.96 TeV T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 102, 031801 (2009). arXiv:0810.2059. Abstract, PDF (180KiB). 9259: Added: Wednesday, 2009 January 28 10:28:42 CST 72. Search for the Rare B Decays B+ --> mu+ mu- K+, B0 --> mu+ mu- K*(892)0, and B0(s) --> mu+ mu- phi at CDF T. 
Aaltonen et al., The CDF Collaboration, Phys. Rev. D79, 011104(R) (2009). arXiv:0804.3908. Abstract, PostScript (470KiB). 9119: Added: Wednesday, 2009 January 28 11:52:46 CST Other research projects 1. Structure of rotavirus outer-layer protein VP7 bound with a neutralizing Fab, Aoki et al., Science (2009) vol. 324 (5933) pp. 1444-7 2. A Probabilistic Graphical Model for Ab Initio Folding, in Research in Computational Biology, Feng Zhao, Jian Peng, Joe DeBartolo, Karl F. Freed, Tobin R. Sosnick and Jinbo Xu, ISBN 978-3-642-02007-0, SpringerLink 2009. 3. Sudden stratospheric warmings seen in MINOS deep underground muon data, Osprey S., et al., Geophys. Res. Lett., 36, L05809, doi:10.1029/2008GL036359 (2009). 4. Sixth, seventh and eighth virial coefficients of the Lennard-Jones model, A. J. Schultz and D. A. Kofke, Mol. Phys. 107 2309 (2009). 5. Lattice Strain Due to an Atomic Vacancy, SD Li, M. S. Sellers, C. Basaran, A. J. Schultz, D. A. Kofke, Int. J. Mol. Sci. 10 2798 (2009). 6. Virial coefficients of Lennard-Jones mixtures, A. J. Schultz and D. A. Kofke, J. Chem. Phys. 130 224104 (2009). 7. Interpolation of virial coefficients, A. J. Schultz and D. A. Kofke, Mol. Phys. 111 1431 (2009). 8. Fourth and Fifth Virial Coefficients of Polarizable Water, K. M. Benjamin, A. J. Schultz and D. A. Kofke, J. Phys. Chem. B 113 7810 (2009). 9. There is no Drake/Larson linear space on 30 points, Anton Betten, Dieter Betten, J. Combinatorial Designs, 28: 48-70 (2010) 10. A Self-Organized Grouping (SOG) Method for Efficient Grid Resource Discovery, Padmanabhan, A., Ghosh, S., and Wang, S. 2010. Journal of Grid Computing, Forthcoming, 2010. DOI: 10.1007/s10723-009-9145-0. 11. An Interoperable Information Service Solution for Grids, in Cyberinfrastructure Technologies and Applications (edited by Cao, J.), Padmanabhan, A., Shook, E., Liu, Y., and Wang, S., Nova Science Publishers, Inc. (2009) 77