OSG-doc-932
January 15, 2010
www.opensciencegrid.org

Report to the US Department of Energy
December 2009

Miron Livny, University of Wisconsin (PI, Technical Director)
Ruth Pordes, Fermilab (Co-PI, Executive Director)
Kent Blackburn, Caltech (Co-PI, Council co-Chair)
Paul Avery, University of Florida (Co-PI, Council co-Chair)
Table of Contents

1.  Executive Summary .................................................................................. 3
    1.1   What is Open Science Grid? ............................................................... 3
    1.2   Science enabled by Open Science Grid ............................................... 4
    1.3   OSG cyberinfrastructure research ....................................................... 5
    1.4   Technical achievements in 2009 ......................................................... 6
    1.5   Challenges facing OSG ...................................................................... 7
2.  Contributions to Science ........................................................................... 9
    2.1   ATLAS .............................................................................................. 9
    2.2   CMS ................................................................................................ 13
    2.3   LIGO ............................................................................................... 16
    2.4   ALICE ............................................................................................. 18
    2.5   D0 at Tevatron ................................................................................. 18
    2.6   CDF at Tevatron .............................................................................. 20
    2.7   Nuclear physics ................................................................................ 26
    2.8   MINOS ............................................................................................ 30
    2.9   Astrophysics .................................................................................... 30
    2.10  Structural Biology ........................................................................... 31
    2.11  Multi-Disciplinary Sciences .............................................................. 33
    2.12  Computer Science Research ............................................................. 33
3.  Development of the OSG Distributed Infrastructure .................................. 34
    3.1   Usage of the OSG Facility ................................................................ 34
    3.2   Middleware/Software ....................................................................... 36
    3.3   Operations ....................................................................................... 38
    3.4   Integration and Site Coordination ..................................................... 39
    3.5   Virtual Organizations Group ............................................................. 40
    3.6   Engagement ..................................................................................... 41
    3.7   Campus Grids .................................................................................. 44
    3.8   Security ........................................................................................... 44
    3.9   Metrics and Measurements ............................................................... 47
    3.10  Extending Science Applications ....................................................... 48
    3.11  Scalability, Reliability, and Usability ................................................ 48
    3.12  Workload Management System ........................................................ 51
    3.13  Internet2 Joint Activities .................................................................. 52
    3.14  ESNET Joint Activities .................................................................... 54
4.  Training, Outreach and Dissemination ..................................................... 55
    4.1   Training and Content Management .................................................... 55
    4.2   Outreach Activities .......................................................................... 56
    4.3   Internet dissemination ...................................................................... 57
5.  Participants ............................................................................................. 59
    5.1   Organizations ................................................................................... 59
    5.2   Partnerships and Collaborations ........................................................ 59
6.  Cooperative Agreement Performance ....................................................... 62
7.  Publications Using OSG Infrastructure ..................................................... 63
Sections of this report were provided by the scientific members of the OSG Council, OSG PIs and Co-PIs, and OSG staff and partners. Paul Avery and Chander Sehgal acted as the editors.
1. Executive Summary

1.1 What is Open Science Grid?
Open Science Grid (OSG) aims to transform processing and data intensive science by operating
and evolving a cross-domain, self-managed, nationally distributed cyber-infrastructure (Figure
1). OSG’s distributed facility, composed of laboratory, campus and community resources, is designed to meet the current and future needs of scientific Virtual Organizations (VOs) at all
scales. It provides a broad range of common services and support, a software platform, and a set
of operational principles that organizes and supports users and resources in Virtual Organizations. OSG is jointly funded, until 2011, by the Department of Energy and the National Science
Foundation.
Figure 1: Sites in the OSG Facility
OSG does not own any computing or storage resources. Rather, these are contributed by the
members of the OSG Consortium and used both by the owning VO and other VOs. OSG resources are summarized in Table 1.
Table 1: OSG computing resources
Number of Grid interfaced processing resources on the production infrastructure: 91
Number of Grid interfaced data storage resources on the production infrastructure: 58
Number of Campus Infrastructures interfaced to the OSG: 8 (Clemson, FermiGrid, Purdue, Wisconsin, Buffalo, Nebraska, Oklahoma, SBGrid)
Number of National Grids interoperating with the OSG: 3 (EGEE, NDGF, TeraGrid)
Number of processing resources on the Integration infrastructure: 21
Number of Grid interfaced data storage resources on the Integration infrastructure: 9
Number of Cores accessible to the OSG infrastructure: ~54,000
Size of Tape storage accessible to the OSG infrastructure: ~23 Petabytes @ LHC Tier-1s
Size of Disk storage accessible to the OSG infrastructure: ~15 Petabytes
CPU Wall Clock usage of the OSG infrastructure: Average of 32,000 CPU days/day during Oct. 2009
The overall usage of OSG continues to grow, though utilization by each stakeholder varies depending on its needs during any particular interval. Overall use of the facility in 2009 was approximately 240M hours, a 70% increase from the previous year! (Detailed usage plots can be found in the attached document.) During stable normal operations, OSG provides approximately 700,000 CPU wall clock hours a day (~30,000 CPU days a day), with peaks occasionally exceeding 900,000 CPU wall clock hours a day; approximately 100,000 to 200,000 opportunistic wall clock hours are available on a daily basis for resource sharing. The efficiency of use depends on the ratio of CPU to I/O in the applications as well as on existing demand. Based on our transfer accounting (which is in its early days and undergoing in-depth validation), we see over 300 TB of data movement (both intra- and inter-site) on a daily basis, with peaks of 600 TB/day; of this, we estimate 25% is GridFTP-based transfers and the rest is via LAN-optimized protocols.
1.2 Science enabled by Open Science Grid
OSG’s infrastructure supports a broad scope of scientific research activities, including the major
physics collaborations, nanoscience, biological sciences, applied mathematics, engineering,
computer science, and, through the Engagement program, other non-physics research disciplines.
The distributed facility is heavily used, as described below and in the attached document showing usage charts.
A strong OSG focus in the last year has been supporting the ATLAS and CMS collaborations' preparations for LHC data taking, which has recently re-started. Each experiment has run significant workload tests (including STEP09) and data distribution and analysis challenges, as well as maintained significant ongoing simulation processing. The OSG infrastructure has performed well in these exercises and is ready to support full data taking (as of the date of this report we have successfully supported the initial run in December 2009). OSG is actively partnering with ATLAS and CMS to develop and deploy mechanisms that will enable productive use of over 40 new Tier-3 sites in the United States.
OSG has made significant accomplishments in support of the science of the Consortium members and stakeholders. In 2009 LIGO significantly ramped up Einstein@Home production on OSG to search for gravitational radiation from spinning neutron star pulsars. D0 published 42 papers in 2009 that utilized OSG computing facilities, and its Monte Carlo production on OSG increased 62% over 2008. Similarly, CDF utilized OSG facilities to publish 56 papers (with 16 additional submissions) and 83 conference papers in 2009. CMS recently submitted for publication 23 physics papers based on cosmic ray analyses and the December 2009 collision run, all of which were based on significant use of OSG computing facilities. Similarly, ATLAS submitted a total of 8 papers for publication that relied on OSG resources and services. Overall, more than 150 publications/submissions in 2009 (listed in Section 7) depended on support from the OSG (not just in "cycles" but in software and the other services we provide).
Besides the physics communities, the structural biology group at Harvard Medical School, mathematics research at the University of Colorado, chemistry calculations at Buffalo, and protein structure modeling and prediction applications have sustained (though cyclic) use of the production infrastructure. The Harvard group's paper was published in Science.
1.3 OSG cyberinfrastructure research
As a comprehensive collaboratory OSG continues to provide a laboratory for research activities
to deploy and extend advanced distributed computing technologies in the following areas:
• Research on the operation of a scalable heterogeneous cyber-infrastructure in order to improve its effectiveness and throughput. As part of this research we have developed a comprehensive set of "availability" probes and a reporting infrastructure that allow site and grid administrators to quantitatively measure and assess the robustness and availability of the resources and services.
• Deployment and scaling in production use of "pilot-job" workload management systems, namely ATLAS PanDA and CMS glideinWMS (a minimal sketch of pilot-style submission appears after this list). These developments were crucial to the experiments meeting their analysis job throughput targets.
• Scalability and robustness enhancements to Condor technologies. For example, extensions to Condor to support pilot job submissions were developed, significantly increasing the job throughput possible on each Grid site.
• Scalability and robustness testing of enhancements to Globus grid technologies. For example, testing of the alpha and beta releases of the Globus GRAM5 package provided feedback to Globus ahead of the official release, in order to improve the quality of the released software.
• Scalability and robustness testing of BeStMan, XrootD, dCache, and HDFS storage technologies at scale to determine their capabilities and provide feedback to the development teams to help meet the needs of the OSG stakeholders.
• Operational experience with a widely distributed security infrastructure that assesses usability and availability together with response, vulnerability and risk.
• Support of inter-grid gateways that support transfer of data and cross-execution of jobs, including transport of accounting and service availability information between OSG and the European Grids supporting the LHC experiments (EGEE/WLCG); usage of the Wisconsin GLOW campus grid "grid-router" to move data and jobs transparently from the local infrastructure to the national OSG resources; and prototype testing of the OSG FermiGrid-to-TeraGrid gateway to enable greater integration and thus easier access to appropriate resources for the science communities.
• Integration and scaling enhancement of BOINC-based applications (LIGO's Einstein@Home) submitted through grid interfaces.
• Further development of a hierarchy of matchmaking services (OSG MM) and Resource Selection Services (ReSS) that collect information from most of the OSG sites and provide community-based matchmaking services that are further tailored to particular application needs.
• Investigations and testing of policy and scheduling algorithms to support "opportunistic" use and backfill of resources that are not otherwise being used by their owners, using information services such as GLUE, matchmaking, and workflow engines including Pegasus and Swift.
• Comprehensive job accounting across most OSG sites with published summaries for each VO and site. This work also supports a per-job information finding utility for security forensic investigations.
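As a concrete illustration of the pilot-job pattern referenced in the list above, the sketch below shows the general shape of pilot submission as a Condor-G grid-universe job driven from Python. The gatekeeper, proxy path, wrapper script, and task-queue URL are illustrative placeholders rather than real OSG endpoints, and production systems such as PanDA and glideinWMS layer site selection, monitoring, and security on top of this minimal core.

    #!/usr/bin/env python
    # Minimal, illustrative sketch of pilot-style submission via Condor-G.
    # Gatekeeper, proxy path, wrapper script, and queue URL are placeholders;
    # production pilot factories (PanDA, glideinWMS) are far more elaborate.
    import subprocess
    import tempfile

    SUBMIT_TEMPLATE = """\
    universe      = grid
    grid_resource = gt2 {gatekeeper}/jobmanager-condor
    executable    = pilot_wrapper.sh
    arguments     = --task-queue https://pilot-factory.example.org/queue
    x509userproxy = /tmp/x509up_pilot
    output        = pilot_$(Cluster).$(Process).out
    error         = pilot_$(Cluster).$(Process).err
    log           = pilot.log
    queue {npilots}
    """

    def submit_pilots(gatekeeper, npilots=10):
        """Write a Condor-G submit file for `npilots` pilots and run condor_submit."""
        with tempfile.NamedTemporaryFile("w", suffix=".sub", delete=False) as f:
            f.write(SUBMIT_TEMPLATE.format(gatekeeper=gatekeeper, npilots=npilots))
            submit_file = f.name
        # Each pilot, once it lands in a batch slot, calls back to the VO's task
        # queue and pulls real payload work (the essence of the pilot approach).
        subprocess.check_call(["condor_submit", submit_file])

    if __name__ == "__main__":
        # In production the target site comes from a matchmaking/site-selection
        # service (e.g. ReSS or OSG MM); here it is simply hard-coded.
        submit_pilots("gatekeeper.example.edu", npilots=10)

The essential point is that the pilot occupies the batch slot first and only then fetches work, which is what lets the experiments keep analysis slots busy and recover quickly from site-level failures.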
1.4 Technical achievements in 2009
More than three quarters of the OSG staff directly support the operation and software for the ongoing stakeholder productions and applications (and leverage at least an equivalent amount of contributor effort); the remaining quarter mainly engages new customers, extends and proves software and capabilities, and provides management and communications. In 2009, specific technical activities that directly supported science included:
• OSG released a stable, production-capable OSG 1.2 software package on a schedule that enabled the experiments to deploy and test the cyberinfrastructure before LHC data taking. This release also allowed the Tier-1 sites to transition to being fully OSG-supported, eliminating the need for separate integration of EGEE gLite components and simplifying software layering for applications that use both EGEE and OSG.
• OSG carried out a "prove-in" of reliable critical services (e.g. BDII) for the LHC and operation of services at levels that meet or exceed the needs of the experiments. This effort included robustness tests of the production infrastructure against failures and outages and validation of information by the OSG as well as the WLCG.
• Collaboration with STAR continued toward deploying the STAR software environment as virtual machine images on grid and cloud resources. We have successfully tested publish/subscribe mechanisms for VM instantiation (Clemson) as well as VMs managed by batch systems (Wisconsin).
• Extensions work with LIGO resulted in adapting Einstein@Home for Condor-G submission, enabling a greater than 5x increase in the use of OSG by Einstein@Home.
• Collaborative support of ALICE, Geant4, NanoHub, and SBGrid has increased their productive access to and use of OSG, and initial support has been provided for IceCube and GlueX.
• Engagement efforts and outreach to science communities this year have led to work and collaboration with more than 10 additional research teams.
• Security program activities have continued to improve our defenses and capabilities for incident detection and response, via review of our procedures by peer grids and the adoption of new tools and procedures.
• The metrics and measurements effort has continued to evolve and provides a key set of functions in enabling the US-LHC experiments to understand their performance against plans, and in assessing the overall performance and production across the infrastructure for all communities. In addition, the metrics function handles the reporting of many key data elements to the LHC on behalf of US-ATLAS and US-CMS.
• Work on at-scale testing of various software elements and feedback to the development teams to help achieve the needed performance goals. In addition, this effort tested new candidate technologies (e.g. CREAM and ARC) in the OSG environment and provided feedback to WLCG.
• Contributions to the PanDA and glideinWMS workload management software that have helped improve capability and supported broader adoption of these within the experiments.
• Collaboration with ESnet and Internet2 has provided a distribution, deployment, and training framework for new network diagnostic tools such as perfSONAR, and extensions in the partnerships for Identity Management.
• Training, Education, and Outreach activities have reached out to numerous professionals and students who may benefit from leveraging OSG and the national CI. The International Science Grid This Week electronic newsletter (www.isgtw.org) continues to experience significant subscription growth.
In summary, OSG continues to demonstrate that national cyberinfrastructure based on
federation of distributed resources can effectively meet the needs of researchers and scientists.
1.5 Challenges facing OSG
We continue to work on, and make progress against, the challenges we face in meeting our longer-term goals:
• Dynamic sharing of tens of locally owned, used and managed compute and storage resources, with minimal additional human effort (to use and administer) and limited negative impact on the communities owning them.
• Utilizing shared storage by groups other than the owners, which is not only more difficult than (quantized) CPU cycle sharing, but also less well supported by the available middleware.
• Federation of the local and community identity/authorization attributes within the OSG authorization infrastructure.
• The effort and testing required for inter-grid bridges involves significant costs, both in the initial stages and in continuous testing and upgrading. Correct, robust end-to-end reporting of information across such bridges remains fragile and human-effort intensive.
• Validation and analysis of availability and reliability testing, accounting and monitoring information. Validation of the information is incomplete, needs continuous attention, and can be human-effort intensive.
• The scalability and robustness of the infrastructure has not yet reached the scales needed by the LHC for analysis operations in the out years. The US LHC software and computing leaders have indicated that OSG needs to provide roughly a 2x improvement in interface performance over the next year or two, and the robustness to upgrades and configuration changes throughout the infrastructure needs to be improved.
• Full usage of available resources. A job "pull" architecture (e.g., the pilot mechanism) provides higher throughput and better management than one based on static job queues, but now we need to move to the next level of effective usage.
• Automated site selection capabilities are inadequately deployed and are also embryonic relative to the capabilities needed, especially when faced with the plethora of errors and faults that naturally result from a heterogeneous mix of resources and applications with greatly varying I/O, CPU and data provision and requirements.
• User and customer frameworks are important for engaging non-physics communities in active use of grid computing technologies; for example, the structural biology community has ramped up its use of OSG, enabled via portals and community outreach and support.
• A common operations infrastructure across heterogeneous communities can be brittle. Efforts to improve the early detection of faults and problems before they impact the users help everyone.
• Analysis, assessment, and recommendations based on usage, performance, accounting and monitoring information are a key need that requires dedicated and experienced effort.
• Transitioning students from the classroom to being users is possible but remains a challenge, partially limited by the effort OSG can dedicate to this activity.
• Use of new virtualization, multi-core and job parallelism techniques, and scientific and commercial cloud computing. We have two new satellite projects funded in these areas: the first on High Throughput Parallel Computing (HTPC) on OSG resources for an emerging class of applications that run large ensembles (hundreds to thousands) of modestly parallel (4- to ~64-way) jobs; the second a research project to do application testing over the ESnet 100-Gigabit network prototype, using the storage and compute end-points supplied by the Magellan cloud computing resources at ANL and NERSC.
• Collaboration and partnership with TeraGrid, under the umbrella of a Joint Statement of Agreed Upon Principles signed in August 2009. New activities have been undertaken, including representation at one another's management meetings, tests on submitting jobs to one another's infrastructure, and exploration of how to accommodate the different resource access mechanisms of the two organizations.
These challenges are not unique to OSG. Other communities are facing similar challenges in educating new entrants to advance their science through large-scale distributed computing resources.
2. Contributions to Science

2.1 ATLAS
In preparation for the startup of the LHC collider at CERN in December, the ATLAS collaboration was performing computing challenges of increasing complexity and scale. Monte Carlo production is ongoing with some 50,000 concurrent jobs worldwide, with about 10,000 jobs running
on resources provided by the distributed ATLAS computing facility operated in the U.S., comprising the Tier-1 center at BNL and 5 Tier-2 centers located at 9 different institutions. As the
facility completed its readiness for distributed analysis a steep ramp-up of analysis jobs in particular at the Tier-2 sites was observed.
Based upon the results achieved during the computing challenges and at the start of ATLAS data
taking we can confirm that the tiered, grid-based, computing model is the most flexible structure
currently conceivable to process, reprocess, distill, disseminate, and analyze ATLAS data. We
found, however, that the Tier-2 centers may not be sufficient to reliably serve as the primary
analysis engine for more than 400 U.S. physicists. As a consequence a third tier with computing
and storage resources located geographically close to the researchers was defined as part of the
analysis chain as an important component to buffer the U.S. ATLAS analysis system from unforeseen, future problems. Further, the enhancement of U.S. ATLAS institutions’ Tier-3 capabilities is essential and is planned to be built around the short and long-term analysis strategies of
each U.S. group.
An essential component of this strategy is the creation of a centralized support structure, because the considerable obstacle to creating and sustaining campus-based computing clusters is the continuing support required. In anticipation of not having access to IT professionals to install and maintain these clusters at each institution, U.S. ATLAS has spent a considerable amount of effort to develop an approach for a low-maintenance implementation of Tier-3 computing. While computing expertise in U.S. ATLAS was sufficient to develop ideas at a fairly high level, only the combined expert knowledge and associated effort from U.S. ATLAS and OSG facilities personnel eventually resulted in a package that is easily installable and maintainable by scientists.
Dan Fraser of the Open Science Grid (OSG) has organized regular Tier 3 Liaison meetings between several members of the OSG facilities, U.S. Atlas and U.S. CMS. During these meetings,
topics discussed include cluster management, site configuration, site security, storage technology, site design, and experiment-specific Tier-3 requirements. Based on information exchanged
at these meetings several aspects of the U.S. Atlas Tier-3 design were refined. Both the OSG and
U.S. Atlas Tier-3 documentation was improved and enhanced.
OSG has hosted two multi-day working meetings at the University of Wisconsin in Madison
with members from the OSG Virtual Data Toolkit (VDT) group, Condor development team, the
OSG Tier-3 group and U.S. Atlas attending. As a result U.S. Atlas changed the way that user accounts were to be handled based on discussions with the system manager of the University of
Wisconsin GLOW cluster, thus leading to a simpler design for smaller Tier-3 sites. Also, the method for Condor batch software installation using yum repositories and RPM files was developed. As a result the Condor and VDT teams created an official yum repository for software installation, similar to how the Scientific Linux operating system is installed. Detailed instructions
were written for the installation of U.S. Atlas Tier-3 sites, including instructions for components
as complex as storage management systems based on xrootd.
Following the trips to Madison, U.S. Atlas installed an entire Tier 3 cluster using virtual machines on one multi core desktop machine. This virtual cluster is used for documentation development and U.S. Atlas Tier-3 administrator training. Marco Mambelli of OSG has provided
much assistance in the configuration and installation of the software for the virtual cluster. Also,
OSG supports a crucial user software package (wlcg-client with support for xrootd added by
OSG) used by all U.S. Atlas Tier 3 users, a package that enhances and simplifies the user environment at the Tier 3 sites.
Today U.S. ATLAS (contributing to ATLAS as a whole) relies quite extensively on services and
software provided by OSG, as well as on processes and support systems that have been produced
or evolved by OSG. Partly this originates in the fact that U.S. ATLAS committed itself fully to relying on OSG several years ago. Over the course of roughly the last three years U.S. ATLAS
itself invested heavily in OSG in many aspects – human and computing resources, operational
coherence and more. In addition and essential to the operation of the worldwide distributed
ATLAS computing facility the OSG efforts have aided the integration with WLCG partners in
Europe and Asia. The derived components and procedures have become the basis for support and
operation covering the interoperation between OSG, EGEE, and other grid sites relevant to
ATLAS data analysis. OSG provides Software components that allow interoperability with European ATLAS sites, including selected components from the gLite middleware stack such as
LCG client utilities (for file movement, supporting space tokens as required by ATLAS), and file
catalogs (server and client).
It is vital to U.S. ATLAS that the present level of service continue uninterrupted for the foreseeable future, and that all of the services and support structures upon which U.S. ATLAS relies today have a clear transition or continuation strategy.
Based on experience and observations, U.S. ATLAS made a suggestion to start work on a coherent middleware architecture rather than having OSG continue to provide a distribution that is a heterogeneous software system consisting of components contributed by a wide range of projects. One of the difficulties we ran into several times was due to inter-component functional dependencies that can only be avoided if there is good communication and coordination between component development teams. A technology working group, chaired by a member of the U.S. ATLAS facilities group (John Hover, BNL), has been asked to help the OSG Technical Director by investigating, researching, and clarifying design issues, resolving questions directly, and summarizing technical design trade-offs such that the component project teams can make informed decisions. In order to achieve the goals as they were set forth, OSG needs an explicit, documented system design, or architecture, so that component developers can make compatible design decisions and virtual organizations (VOs) such as U.S. ATLAS can develop their own applications based on the OSG middleware stack as a platform. As a short-term goal, the creation of a design roadmap is in progress.
Regarding support services the OSG Grid Operations Center (GOC) infrastructure at Indiana
University is at the heart of the operations and user support procedures. It is also integrated with
the GGUS infrastructure in Europe making the GOC a globally connected system for worldwide
ATLAS computing operation.
Middleware deployment support is an essential and complex function the U.S. ATLAS facilities
are fully dependent upon. The need for continued support for testing, certifying and building middleware as a solid foundation on which our production and distributed analysis activities run has been served very well so far and will continue to exist in the future, as does the need for coordination of the roll-out, deployment, debugging and support of the middleware services. In addition the
need for some level of preproduction deployment testing has been shown to be indispensable and
must be maintained. This is currently supported through the OSG Integration Test Bed (ITB)
providing the underlying grid infrastructure at several sites with a dedicated test instance of
PanDA, the ATLAS Production and Distributed Analysis system, running on top of it. This implements the essential function of validation processes that accompany the incorporation of new grid middleware services, and new versions of existing ones, into the VDT, the coherent OSG software component
repository. In fact U.S. ATLAS relies on the VDT and OSG packaging, installation, and configuration processes that lead to a well-documented and easily deployable OSG software stack.
U.S. ATLAS greatly benefits from OSG's Gratia accounting services, as well as the information
services and probes that provide statistical data about facility resource usage and site information
passed to the application layer and to WLCG for review of compliance with MOU agreements.
One of the essential parts of grid operations is that of operational security coordination. The coordinator is provided by OSG today, and relies on good contacts to security representatives at the
U.S. ATLAS Tier-1 center and Tier-2 sites. Thanks to activities initiated and coordinated by
OSG a strong operational security community has grown up in the U.S. in the past few years,
driven by the needs of ensuring that security problems are well coordinated across the distributed
infrastructure.
Moving on to ATLAS activities that were performed on the OSG facility infrastructure in 2009,
the significant feature, besides dissemination and processing of collision data late in the year,
was the preparation for and the execution of the STEP’09 (Scale Testing for the Experimental
Program 2009) exercise, which provided a real opportunity to perform a significant scale test
with all experiments scheduling exercises in synchrony with each other. The main goals were to
utilize specific aspects of the computing model that had not previously been fully tested. The two
most important tests were the reprocessing at Tier-1 centers with the experiments recalling the
data from tape at the full anticipated rates, and the testing of analysis workflows at the Tier-2
centers. In the U.S., and in several other regions, the results were very encouraging, with the Tier-2 centers exceeding the metrics set by ATLAS. As an example, with more than 150,000 successfully completed analysis jobs at 94% efficiency, U.S. ATLAS took the lead in ATLAS. This was achieved while the Tier-2s were also continuing to run Monte Carlo production jobs, scaled to keep the Tier-2 resources fully utilized. Not a single problem with the OSG-provided infrastructure was encountered during the execution of the exercise. However, there is an area of concern that may impact the facilities' performance in the future. As we continuously increase the number of job slots at sites, the performance of pilot submission through Condor-G and the underlying Globus Toolkit 2 (GT2) based gatekeeper has to keep up without slowing down the job throughput, in particular when running short jobs. When addressing this point with the OSG facilities team we found that they were open to evaluating and incorporating recently developed components, like the CREAM CE provided by EGEE developers in Italy. Intensive tests are already being conducted by the Condor team in Madison, and integration issues were discussed in December
between the PanDA team and Condor developers.
Other important achievements are connected with the start of LHC operations on November 23rd.
As all ATLAS detector components were working according to their specifications, including the
data acquisition, trigger chain and prompt reconstruction at the Tier-0 center at CERN, timely data dissemination to, and stable operation of, the facility services in the U.S. were the most critical aspects we were focusing on. We are pleased to report that throughout the (short) period of
data taking (ended on December 16th) data replication was observed at short latency, with data
transfers from the Tier-0 to the Tier-1 center at BNL typically completed within 2.6 hours and
transfers from the Tier-1 to 5 Tier-2 and 1 Tier-3 centers completed in 6 hours on average. The
full RAW dataset (350 TB) and the derived datasets were sent to BNL, with the derived datasets
(140 TB) further distributed to the Tier-2s. About 100 users worldwide started immediately to
analyze the collision data, with 38% of the analysis activities observed to run on the resources
provided in the U.S. Due to improved alignment and condition data that became available following first pass reconstruction of the RAW data at CERN a worldwide reprocessing campaign
was conducted by ATLAS over the Christmas holidays. The goal of the campaign was to process
all good runs at the Tier-1 centers and have the results ready by January 1st, 2010. All processing
steps (RAW -> ESD, ESD -> AOD, HIST, NTUP and TAG) and the replication of newly produced datasets to the Tier-1 and Tier-2 centers worldwide were completed well before the end of
the year.
Figure 2: OSG CPU hours used by ATLAS (57M hours total), color coded by facility.
In the area of middleware extensions, US ATLAS continued to benefit from the OSG's support
for and involvement in the US ATLAS developed distributed processing and analysis system
(PanDA) layered over the OSG's job management, storage management, security and
information system middleware and services, providing a uniform interface and utilization model
for the experiment's exploitation of the grid, extending not only across OSG but EGEE and
Nordugrid as well. PanDA is the basis for distributed analysis and production ATLAS-wide, and
is also used by the OSG itself as a WMS available to OSG VOs, with the addition this year of a
PanDA based service for OSG Integrated Testbed (ITB) test job submission, monitoring and
automation. This year the OSG's WMS extensions program contributed the effort and expertise
essential to a program of PanDA security improvements that culminated in the acceptance of
PanDA's security model by the WLCG, and the implementation of that model on the foundation
of the glexec system that is the new OSG and EGEE standard service supporting the WLCG-
approved security model for pilot job based systems. The OSG WMS extensions program
supported Maxim Potekhin and Jose Caballero at BNL, working closely with the ATLAS PanDA
core development team at BNL and CERN, who provided the design, implementation, testing
and technical liaison (with ATLAS computing management, the WLCG security team, and
developer teams for glexec and ancillary software) for PanDA's glexec-based pilot security
system. This system is now ready for deployment once glexec services are offered at ATLAS
production and analysis sites. The work was well received by ATLAS and WLCG, both of which
have agreed that in a new round of testing to proceed in early 2010 (validating a new component
of the glexec software stack, ARGUS), other experiments will take the lead role in testing and
validation in view of the large amount of work contributed by Jose and Maxim in 2009. This will
allow Jose and Maxim more time in 2010 for a new area we are seeking to build as a
collaborative effort between ATLAS and OSG, in WMS monitoring software and information
systems. ATLAS and US ATLAS are in the process of merging what have been distinct
monitoring efforts, a PanDA/US effort and a WLCG/CERN/ARDA effort, together with a newly
opening development effort in an overarching ATLAS Grid Information System (AGIS) that will
integrate ATLAS-specific information with the grid information systems. Discussions are
beginning with the objective of integrating this merged development effort with the OSG
program in information systems and monitoring. US ATLAS and the OSG WMS effort are also
continuing to seek ways in which the larger OSG community can leverage PanDA and other
ATLAS developments. In response to feedback from prospective PanDA users outside ATLAS,
a simple web-based data handling system was developed that integrates well with PanDA data
handling to allow VOs using PanDA to easily integrate basic dataflow as part of PanDA-managed processing automation. We also look forward to the full deployment of PanDA for ITB
testbed processing automation. In other joint OSG work, CondorG constitutes the critical
backbone of PanDA's pilot job submission system, and as mentioned earlier in this report we
have benefited greatly from several rounds of consultation and consequent Condor
improvements to increase the scalability of the submission system for pilot jobs. We expect that
in 2010 the OSG and its WMS effort will continue to be the principal source for PanDA security
enhancements, and an important contributor to further integration with middleware (particularly
Condor), and scalability/stress testing.
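To make the glexec-based pilot security model discussed above more concrete, the following sketch shows, in schematic form, how a pilot can hand a user payload to glexec so that it runs under the payload owner's identity rather than the pilot's. The environment variable names follow the glexec convention, but the binary location, proxy paths, and payload command are illustrative placeholders, and the real PanDA pilot wraps proxy retrieval, validation, and logging around this step.

    #!/usr/bin/env python
    # Schematic sketch of a pilot invoking glexec to run a user payload under
    # the payload owner's identity (the WLCG-approved model described above).
    # The glexec path, proxy paths, and payload command are placeholders.
    import os
    import subprocess

    def run_payload_via_glexec(user_proxy, payload_cmd):
        """Execute payload_cmd as the user mapped from user_proxy via glexec."""
        env = os.environ.copy()
        # glexec authorizes and maps the *payload* user from this credential ...
        env["GLEXEC_CLIENT_CERT"] = user_proxy
        # ... and copies this proxy into the switched user's environment.
        env["GLEXEC_SOURCE_PROXY"] = user_proxy
        # The pilot's own proxy remains in X509_USER_PROXY for its other duties.
        return subprocess.call(["/usr/sbin/glexec"] + payload_cmd, env=env)

    if __name__ == "__main__":
        rc = run_payload_via_glexec(
            user_proxy="/tmp/payload_user_proxy.pem",       # placeholder path
            payload_cmd=["/bin/sh", "run_analysis_job.sh"],  # placeholder payload
        )
        print("payload exit status:", rc)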
2.2 CMS
US-CMS relies on Open Science Grid for critical computing infrastructure, operations, and security services. These contributions have allowed US-CMS to focus experiment resources on being prepared for analysis and data processing, by saving effort in areas provided by OSG. OSG
provides a common set of computing infrastructure on top of which CMS, with development effort from the US, has been able to build a reliable processing and analysis framework that runs
on the Tier-1 facility at Fermilab, the project-supported Tier-2 university computing centers, and
opportunistic Tier-3 centers at universities. There are currently 18 Tier-3 centers registered with
the CMS computing grid in the US which provide additional simulation and analysis resources to
the US community.
In addition to common interfaces, OSG has provided the packaging, configuration, and support
of the storage services. Since the beginning of OSG the operations of storage at the Tier-2 centers have improved steadily in reliability and performance. OSG is playing a crucial role here for
CMS in that it operates a clearinghouse and point of contact between the sites that deploy and
operate this technology and the developers. In addition, OSG fills in gaps left open by the developers in areas of integration, testing, and tools to ease operations. The stability of the computing
infrastructure has not only benefitted CMS. CMS’ use of resources (see Figure 3 and Figure 4)
has been very much cyclical so far, thus allowing for significant use of the resources by other
scientific communities. OSG is an important partner in Education and Outreach, and in maximizing the impact of the investment in computing resources for CMS and other scientific communities.
In addition to computing infrastructure OSG plays an important role in US-CMS operations and
security. OSG has been crucial to ensure US interests are addressed in the WLCG. The US is a
large fraction of the collaboration both in terms of participants and capacity, but a small fraction
of the sites that make up WLCG. OSG is able to provide a common infrastructure for operations, including support tickets, accounting, availability monitoring, interoperability and documentation. As CMS has entered the operations phase, the need for sustainable security models
and regular accounting of available and used resources has become more important. The common accounting and security infrastructure and the personnel provided by OSG represent significant benefits to the experiment, with the teams at Fermilab and the University of Nebraska
providing the development and operations support, including the reporting and validation of the
accounting information between the OSG and WLCG.
Figure 3: OSG CPU hours used by CMS (63M hours total), color coded by facility.
Figure 4: Number of unique CMS users of the OSG computing facility.
In addition to these general roles OSG plays for CMS, OSG made the following concrete contributions to major milestones in CMS during the last year:
• Within the last year, CMS transitioned to using glideinWMS in production for both reprocessing at the Tier-1s as well as analysis at the Tier-2s and Tier-3s. Burt Holzman at Fermilab and Igor Sfiligoi at UCSD have been spearheading these efforts. OSG has been crucial in providing expertise during deployment and integration, and in working with the Condor team on resolving a variety of scalability issues within glideinWMS. In addition, OSG provided developer expertise towards the integration of glideinWMS with the CMS data analysis framework, CRAB. The next steps in this area are to work out deployment issues with CERN. Through the leadership of Frank Wuerthwein at UCSD, OSG is working with DISUN and Condor on a more firewall-friendly glideinWMS, a requirement for CERN deployment as well as for overcoming the next set of scalability hurdles.
• OSG's introduction of lcg-cp/ls/rm, UberFTP, and MyProxy into the VDT client installation has eliminated the dependencies on the gLite middleware at CMS sites, thus significantly simplifying deployment and operations of the sites (a sketch of typical lcg-cp/lcg-ls usage appears after this list).
• The OSG work on CE scalability and robustness has made CE overloads a rare occurrence at present scales for CMS. The only time we still experience this as a problem is when VOs other than CMS use our sites in non-optimal ways.
• The investment by the OSG storage team, led by Tanya Levshina at Fermilab, in support of BeStMan SRM is an excellent example where OSG provided something requested by stakeholders other than CMS, but that has proven to be very beneficial to CMS. CMS now deploys BeStMan, supported through OSG's Alex Sim who is a member of Arie Shoshani's team at LBNL, as part of the preferred Storage Element (SE) solution at Tier-3s, as well as a high-performance SE option for the Tier-2s. The latter, in combination with HDFS or Lustre as file systems, reached 100 Hz lcg-ls rates during scalability tests done by OSG at three of the CMS Tier-2 sites. This is a 20x increase within the last year. OSG has recently started to include integration and deployment support for the complete package of BeStMan/gridFTP/FUSE/HDFS for use at Tier-2s and Tier-3s. Without OSG, this kind of "technology transfer" into CMS would not have happened.
• OSG, through the contributions of Brian Bockelman at the University of Nebraska working with the CMS and OSG storage teams, has contributed to an improved understanding of the I/O characteristics of CMS analysis jobs, and is providing advice and expertise to an effort in global CMS to measure the I/O capability of all of the CMS sites worldwide. OSG helped get this activity off the ground in global CMS, with an expectation that lessons learned will flow back to OSG to be disseminated to the wider OSG community. The CMS goal here is to improve the CPU efficiency for data analysis at Tier-2s and Tier-3s worldwide. We expect this activity to continue well into 2010.
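As a rough illustration of the client tools mentioned in the list above, the sketch below shows how a Tier-3 or worker-node script might invoke lcg-ls and lcg-cp from the VDT client against a BeStMan SRM endpoint. The hostname, port, and storage paths are placeholders rather than real CMS endpoints, and the exact SURL form (web-service path, SFN syntax) and option set vary with site configuration and lcg-utils version.

    #!/usr/bin/env python
    # Illustrative use of the lcg-utils clients shipped in the VDT client.
    # The endpoint, port, and paths below are placeholders, not real CMS sites.
    import subprocess

    SE = "srm://se.tier2.example.edu:8443/srm/v2/server?SFN=/store/user/alice"

    def list_directory(surl):
        """List an SRM directory (the kind of call used in the 100 Hz lcg-ls tests)."""
        subprocess.check_call(["lcg-ls", "-b", "-D", "srmv2", surl])

    def fetch_file(surl, local_path):
        """Copy a file from the SE to local disk."""
        subprocess.check_call(
            ["lcg-cp", "-b", "-D", "srmv2", "--vo", "cms", surl, "file://" + local_path]
        )

    if __name__ == "__main__":
        list_directory(SE)
        fetch_file(SE + "/ntuple_001.root", "/tmp/ntuple_001.root")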
Finally, let us mention a major success in connection with the first LHC collisions observed in
the CMS detector in the afternoon of November 23rd, 2009. By the end of the day, 4 Tier-2s and
one Tier-3 on OSG had transferred the complete 2/3 TB of collision data taken that day, and were
making it accessible for analysis to everybody in the collaboration. We consider this an excellent
validation of the LHC computing model, of which OSG is a major part. Since then OSG sites
have kept up with data taking, and are presently hosting all the relevant primary and secondary
datasets from collisions, as well as the corresponding Monte Carlos.
2.3 LIGO
The Einstein@Home data analysis application that searches for gravitational radiation from
spinning neutron stars using data from the Laser Interferometer Gravitational Wave Observatory
(LIGO) detectors was identified as an excellent LIGO application for production runs on OSG.
Because this particular search is virtually unbounded in the scientific merit achieved by additional computing resources, any available resource can provide additional scientific benefit. The
original code base for Einstein@Home has evolved significantly during the first half of this year
to support the use of Condor-G for job submission and job management. This has removed internal dependencies in the code base on the particular job manager in use at various grid sites
around the world. Several other modifications to the code ensued to address stability, reliability
and performance. By May 2009, the code was running reliably in production on close to 20 sites
across the OSG that support job submission from the LIGO Virtual Organization.
Since late summer of 2009 the number of wall clock hours utilized by Einstein@Home on the OSG has nearly doubled from what was typically available in the first half of the year. Since January 1, 2009, more than 5 million wall clock hours have been contributed by the Open Science Grid to this important LIGO data analysis application (Figure 5). This is primarily due to the effort made to deploy and run the application on a larger number of OSG sites, and to working with local site authorities to understand local policies and how best to support them in the running application.
Figure 5: OSG usage by LIGO's Einstein@Home application for the year, running at production levels. The new code using the Condor-G job submission interface began running in March of 2009. Increases seen since early September are due to increases in the number of sites compatible with the code.
In the past year, LIGO has also begun to investigate ways to migrate the data analysis workflows
searching for gravitational radiation from binary black holes and neutron stars onto the Open
Science Grid for production scale utilization. The binary inspiral data analyses typically involve
working with tens of terabytes of data in a single workflow. Collaborating with the Pegasus
Workflow Planner developers at USC-ISI, LIGO has identified changes to both Pegasus and to
the binary inspiral workflow codes to more efficiently utilize the OSG where data must be
moved from LIGO archives to storage resources near the worker nodes on OSG sites. One area
of particular focus has been on the understanding and integration of Storage Resource Management (SRM) technologies used in OSG Storage Element (SE) sites to house the vast amounts of
data used by the binary inspiral workflows so that worker nodes running the binary inspiral
codes can effectively access the data. A SRM Storage Element has been established on the LIGO
Caltech OSG integration testbed site. This site has 120 CPU cores with approximately 30 terabytes of storage currently configured under SRM. The SE is using BeStMan and Hadoop for the
distributed file system shared among the worker nodes. Using Pegasus for the workflow planning, workflows for the binary inspiral data analysis application needing tens of terabytes of LIGO data have successfully run on this system. Effort is now underway to translate this success
onto OSG production sites that support the SRM storage element.
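For readers unfamiliar with the planning step mentioned above, the sketch below shows roughly how a pre-built abstract workflow (DAX) is planned and submitted with Pegasus against an OSG site. The site handle, DAX file name, and output location are illustrative placeholders, the option spellings follow the Pegasus 2.x command line and may differ between releases, and the actual inspiral workflows configure many additional properties (SRM-based staging, data reuse) not shown here.

    #!/usr/bin/env python
    # Rough sketch of planning and submitting an abstract workflow (DAX) with
    # Pegasus against an OSG execution site. Site handle, DAX file, and output
    # location are placeholders; real inspiral workflows set many more options.
    import subprocess

    def plan_and_submit(dax_file, exec_site, output_site="local"):
        """Run pegasus-plan on the DAX and hand the planned DAG to Condor DAGMan."""
        subprocess.check_call([
            "pegasus-plan",
            "--dax", dax_file,        # abstract workflow produced by the LIGO pipeline
            "--sites", exec_site,     # OSG site as named in the Pegasus site catalog
            "--output", output_site,  # where final data products are staged
            "--dir", "work",          # base directory for the planned submit files
            "--submit",               # submit the resulting DAG to Condor DAGMan
        ])

    if __name__ == "__main__":
        plan_and_submit("inspiral_workflow.dax", exec_site="LIGO_CIT_ITB")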
LIGO has also been working closely with the OSG, DOE Grids, and ESnet to evaluate the implications of its requirements on authentication and authorization within its own LIGO Data Grid
user community and how these requirements map onto the security model of the OSG.
2.4 ALICE
The ALICE experiment at the LHC relies on a mature grid framework, AliEn, to provide computing resources in a production environment for the simulation, reconstruction and analysis of
physics data. Developed by the ALICE Collaboration, the framework has been fully operational
for several years, deployed at ALICE and WLCG Grid resources worldwide. This past year, in
conjunction with plans to deploy ALICE Grid resources in the US, the ALICE VO and OSG
have begun the process of developing a model to integrate OSG resources into the ALICE Grid.
In early 2009, an ALICE-OSG Joint task force was formed to support the inclusion of ALICE
Grid activities in OSG. The task force developed a series of goals leading to a common understanding of AliEn and OSG architectures. The OSG Security team reviewed and approved a
proxy renewal procedure common to ALICE Grid deployments for use on OSG sites. A job-submission mechanism was implemented whereby an ALICE VO-box service, deployed on the NERSC-PDSF OSG site, submitted jobs to the PDSF cluster through the OSG interface. The
submission mechanism was activated for ALICE production tasks and operated for several
months. The task force validated ALICE OSG usage through normal reporting means and verified that site operations were sufficiently stable for ALICE production tasks at low job rates and
with minimal data requirements as allowed by the available local resources.
The ALICE VO is currently a registered VO with OSG, supports a representative in the OSG VO
forum and an Agent to the OSG-RA for issuing DOE Grid Certificates to ALICE collaborators.
ALICE use of OSG will grow as ALICE resources are deployed in the US. These resources will
provide the data storage facilities needed to expand ALICE use of OSG and add compute capacity on which the AliEn-OSG interface can be utilized at full ALICE production rates.
2.5 D0 at Tevatron
The D0 experiment continues to rely heavily on OSG infrastructure and resources in order to
achieve the computing demands of the experiment. The D0 experiment has successfully used
OSG resources for many years and plans on continuing with this very successful relationship into
the foreseeable future.
All D0 Monte Carlo simulation is generated at remote sites, with OSG continuing to be a major
contributor. During the past year, OSG sites simulated 390 million events for D0, approximately
1/3 of all production. This is an increase of 62% relative to the previous year. The increase is due to many factors. The opportunistic use of storage elements has improved. The SAM infrastructure has been improved, resulting in increased OSG job efficiency. In addition, many sites have modified their preemption policies, which improves the throughput for the D0 jobs. Improved efficiency, increased resources (D0 used 24 sites in the past year and uses 21
regularly), automated job submission, use of resource selection services and expeditious use of
opportunistic computing continue to play a vital role in high event throughput. The total number
of D0 OSG MC events produced over the past several years is 726 million events (Figure 6).
Over the past year, the average number of Monte Carlo events produced per week by OSG continues to remain approximately constant. Since we use the computing resources opportunistically, it is interesting to find that, on average, we can maintain an approximately constant rate of
MC events. When additional resources become available (April and May of 2009 in Figure 7)
we are quickly able to take advantage of this. Over the past year we were able to produce over
10 million events/week on nine separate occasions and a record of 13 million events/week was
set in May 2009. With the turn-on of the LHC experiments, it will be more challenging to obtain computing resources opportunistically. It is therefore very important that D0 use the resources it can obtain efficiently, and D0 plans to continue to work with OSG and Fermilab
computing to continue to improve the efficiency of Monte Carlo production on OSG sites.
The primary processing of D0 data continues to be run using OSG infrastructure. One of the very
important goals of the experiment is to have the primary processing of data keep up with the rate
of data collection. It is critical that the processing of data keep up in order for the experiment to
quickly find any problems in the data and to keep the experiment from having a backlog of data.
Typically D0 is able to keep up with the primary processing of data by reconstructing nearly 6 million events/day. When the accelerator operates at very high luminosities, however, it is difficult to keep up with the data using our standard resources. Since the computing farm and the analysis farm have the same infrastructure, D0 is able to increase its processing power
quickly to improve its daily processing of data. During the past year, D0 moved some of its
analysis nodes to processing nodes in order to increase its throughput. Figure 8 shows the cumulative number of data events processed per day. In March the additional nodes were implemented and one can see the improvement in events reconstructed. These additional nodes allowed D0
to quickly finish processing its collected data and the borrowed nodes were then returned to the
analysis cluster. Having this flexibility to move analysis nodes to processing nodes in times of
need is a tremendous asset. Over the past year D0 has reconstructed nearly 2 billion events on
OSG facilities.
OSG resources have allowed D0 to meet its computing requirements in both Monte Carlo production and data processing. This has directly contributed to D0 publishing 37 papers in 2009; see http://www-d0.fnal.gov/d0_publications/#2009.
Figure 6: Cumulative number of D0 MC events generated by OSG during the past year.
Figure 7: Number of D0 MC events generated per week by OSG during the past year.
Figure 8: Cumulative number of D0 data events processed by OSG infrastructure. In March additional nodes were added to the processing farm to increase the throughput of the processing farm.
The flat region in August and September corresponds to the time when the accelerator was down
for maintenance so no events needed to be processed.
2.6 CDF at Tevatron
In 2009, the CDF experiment produced 83 new results for the winter and summer conferences using OSG infrastructure and resources. These resources support the work of graduate
students, who are producing one thesis per week, and the collaboration as a whole, which is
submitting a publication of new physics results every ten days. Over one billion Monte Carlo
events were produced by CDF in the last year. Most of this processing took place on OSG resources. CDF also used OSG infrastructure and resources to support the processing of 1.8 billion
raw data events that were streamed to 3.4 billion reconstructed events, which were then processed into 5.7 billion ntuple events; an additional 1.5 billion ntuple events were created from
Monte Carlo data. Detailed numbers of events and volume of data are given in Table 2 (total data
since 2000) and Table 3 (data taken in 2009).
Table 2: CDF data collection since 2000

    Data Type        Volume (TB)    # Events (M)    # Files
    Raw Data            1398.5          9658.1      1609133
    Production          1782.8         12988.5      1735186
    MC                   796.4          5618.5       907689
    Stripped-Prd          76.8           712.1        75108
    Stripped-MC            0.5             3.0          533
    MC Ntuple            283.2          4138.3       257390
    Total               4866.5         51151.9      5058987
Table 3: CDF data collection in 2009

    Data Type        Data Volume (TB)    # Events (M)    # Files
    Raw Data               286.4            1791.7        327598
    Production             558.2            3395.9        466755
    MC                     196.8            1034.6        228153
    Stripped-Prd            21.2             127.1         17066
    Stripped-MC                0                 0             0
    Ntuple                 187.3            5732.6        155129
    MC Ntuple               86.0            1482.2         75954
    Total                 1335.8           13564.2       1270655
The OSG provides the collaboration computing resources through two portals. The first, the North American Grid portal (NAmGrid), covers MC generation in an environment that requires the full software to be ported to the site and only Kerberos- or grid-authenticated access to remote storage for output. The second portal, CDFGrid, provides an environment that allows full access to all CDF software libraries and to the data-handling methods, which include dCache, disk-resident dCache, and file servers for small ntuple analysis. CDF, in collaboration with OSG, aims to improve the infrastructure tools over the next few years to increase the usage of Grid resources, particularly in the area of distributed data handling.
Since March 2009, CDF has been operating the pilot-based Workload Management System
(glideinWMS) as the submission method to remote OSG sites. This system went into production
on the NAmGrid portal. Figure 9 shows the number of running jobs on NAmGrid and demonstrates that there has been steady usage of the facilities, while Figure 10, a plot of the queued requests, shows that there is large demand. The highest priority in the last year has been to validate
sites for reliable usage of Monte Carlo generation and to develop metrics to demonstrate smooth
operations. One impediment to smooth operation has been the rate at which jobs are lost and restarted by the batch system. There were a significant number of restarts until May 2009, after
which the rate tailed down significantly. At that point, it was noticed that most restarts occurred
at specific sites. These sites were subsequently removed from NAmGrid. Those sites and any
new site are tested and certified in an integration instance of the NAmGrid portal using Monte
Carlo jobs that have previously been run in production.
A large resource provided by Korea at KISTI is being brought into operation; it will provide substantial Monte Carlo production capacity with a high-speed connection to Fermilab for storage of the output, and it will also provide a cache that will allow the data-handling functionality to be exploited. We are also adding more monitoring to the CDF middleware to allow faster identification of problem sites or individual worker nodes. Issues of data transfer and the applicability of opportunistic storage are being studied as part of the effort to understand what affects reliability. Significant progress has been made simply by adding retries with a time back-off, on the assumption that failures occur most often at the far end.
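To illustrate the kind of retry logic involved, the sketch below shows a generic transfer wrapper that retries with an increasing delay; the function and parameter names are hypothetical and do not correspond to the actual CDF middleware.

    import random
    import time

    def transfer_with_retries(copy_fn, src, dest, max_attempts=5, base_delay=30):
        """Attempt a transfer, retrying with an increasing back-off delay.

        copy_fn is any callable that raises an exception on failure; the
        back-off assumes most failures are transient problems at the far end.
        """
        for attempt in range(1, max_attempts + 1):
            try:
                copy_fn(src, dest)
                return
            except Exception as err:
                if attempt == max_attempts:
                    raise
                # Exponential back-off with jitter so retries do not synchronize.
                delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay)
                print("transfer failed (%s); retry %d in %.0f s" % (err, attempt, delay))
                time.sleep(delay)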
Figure 9: Running CDF jobs on NAmGrid
Figure 10: Waiting CDF jobs on NAmGrid, showing large demand, especially in preparation for
the 42 results sent to Lepton-Photon in August 2009 and the rise in demand for the winter 2010
conferences.
A legacy glide-in infrastructure developed by the experiment ran on the portal to on-site OSG resources (CDFGrid) through 2009 until December 8. It has now been replaced by the same glideinWMS infrastructure used on NAmGrid. Plots of the running jobs and queued requests are shown in Figure 11 and Figure 12. The very high demand for CDFGrid resources observed during the winter conference season (leading to 41 new results), and again during the summer conference season (leading to an additional 42 new results), is noteworthy.
Queues exceeding 50,000 jobs can be seen.
Figure 11: Running CDF jobs on CDFGrid
Figure 12: Waiting CDF jobs on CDFGrid
A clear pattern of CDF computing has emerged. There is high demand for Monte Carlo production in the months after the conference season, and for both Monte Carlo and data starting about
two months before the major conferences. Since the implementation of opportunistic computing
on CDFGrid in August, the NAmGrid portal has been able to take advantage of the computing
resources on FermiGrid that were formerly only available through the CDFGrid portal. This has
led to very rapid production of Monte Carlo in the period between conferences, when the generation of Monte Carlo datasets is the main computing demand.
In May 2009, CDF conducted a review of the CDF middleware and usage of Condor and OSG.
While there were no major issues identified, a number of cleanup projects were suggested. These
have all been implemented and will add to the long-term stability and maintainability of the
software.
A number of issues affecting operational stability and efficiency have arisen in the last year. These issues, together with examples, solutions, or requests for further OSG development, are described here.
• Architecture of an OSG site: One of the major issues we encountered in achieving smooth and efficient operations was a serious unscheduled downtime lasting several days in April. Subsequent analysis found the direct cause to be incorrect parameters set on disk systems that were simultaneously serving the OSG gatekeeper software stack and end-user data output areas. No OSG software was implicated in the root-cause analysis; however, the choice of architecture was a contributing cause. The lesson for OSG is that a best-practices recommendation, derived from a review of how computing resources are implemented across OSG, could be a worthwhile investment.
• Service level and Security: Since April, Fermilab has had a new protocol for upgrading Linux kernels with security updates. An investigation of the security kernel releases from the beginning of 2009 showed that, for both SLF4 and SLF5, the time between releases was shorter than the maximum time allowed by Fermilab for a kernel to remain un-updated. Since applying these updates requires a reboot of all services, NAmGrid and CDFGrid have been forced down roughly every two months for three days to drain the long (72-hour) job queues. A rolling-reboot scheme has been developed and deployed, but careful sequencing of critical servers is still being worked out.
• Opportunistic Computing/Efficient resource usage: The issue of preemption has been important to CDF this year; CDF both provides and exploits opportunistic computing. During the April downtime mentioned above, the preemption policy that CDF applies as a provider caused operational difficulties characterized vaguely as "unexpected behavior". The definition of expected behavior was worked out, and preemption was enabled after the August 2009 conferences ended.
From the point of view of exploiting opportunistic resources, the management of preemption policies at sites has a dramatic effect on CDF's ability to utilize those sites opportunistically. Some sites, for instance, modify their preemption policy from time to time. Tracking these changes requires careful communication with site managers to ensure that the job-duration options visible to CDF users remain consistent with the preemption policies. CDF has added queue lengths in an attempt to provide this match; the conventional way in which CDF Monte Carlo producers plan their production, however, requires queue lengths that exceed the typical preemption time by a considerable margin. To address this problem, the current submission strategies are being re-examined, and a more general review of the Monte Carlo production process at CDF will be carried out in February 2010.
A second common impediment to opportunistic usage is a preemption policy that kills jobs immediately when a higher-priority user submits a job, rather than a more graceful policy that allows the lower-priority job to complete within some time frame. CDF has removed all such sites from NAmGrid because the effective job failure rate is too high for most users to tolerate. This step has significantly reduced the OSG resources available to CDF, which now consist essentially of those sites at which CDF has paid for computing.
Clear policies and guidelines on opportunistic usage, and publication of those policies in both human- and machine-readable form, are needed so that the most efficient use of computing resources can be achieved.
• Infrastructure Reliability and Fault tolerance: Job restarts have not yet been completely eliminated. The main cause appears to be individual worker nodes that are faulty and reboot from time to time. While it is possible to blacklist nodes at the portal level, better sensing of these faults and removal of the nodes at the OSG infrastructure level would be preferable. We continue to emphasize the importance of stable running and the minimization of infrastructure failures, so that users can reliably assume that failures are the result of errors in their own processing code rather than continually having to question the infrastructure.
• Job and Data handling interaction: Job restarts also cause a loss of synchronization between job handling and data handling. A separate effort is under way to improve the recovery tools within the data-handling infrastructure. Tools and design work are needed to integrate job handling and data handling and to provide the fault tolerance that keeps the two systems synchronized.
• Management of input Data resources: Access to data, both from databases and from data files, caused service problems in the last year. The implementation of opportunistic running on CDFGrid from NAmGrid, coupled with decreased demand on CDFGrid for file access in the post-conference period and a significant demand for new Monte Carlo datasets for the 2010 winter conference season, put huge loads on the CDF database infrastructure and led to grid job failures due to overloading of the databases. This has been traced to complex queries whose computations should be done locally rather than on the server. Modifications to the simulation code are being implemented, and throttling of the Monte Carlo production is required until the new code is available in January 2010.
During the conference crunch in July 2009, there was huge demand on the data-handling infrastructure and the 350 TB disk cache was being turned over every two weeks: files were moved from tape to disk, used by jobs, and deleted. This in turn left many FermiGrid worker nodes sitting idle waiting for data. A program to understand the causes of idle computing nodes, from this and other sources, has been initiated, and CDF users are asked to describe more accurately what work they are doing when they submit jobs by filling in qualifiers in the submit command. In addition, work to implement pre-staging of files and to monitor its effectiveness is nearing completion. Both this and the database overload point to a general resource-management problem: the resource requirements of jobs running on OSG should be examined in a more considered way and would benefit from more thought by the community at large.
The usage of OSG by CDF has been fruitful, and the ability to add large new resources such as KISTI, as well as more moderate resources, within a single job-submission framework has been extremely useful. The collaboration has produced significant new results in the last
year with the processing of huge data volumes. Significant consolidation of the tools has occurred. In the next year, the collaboration looks forward to a bold computing effort in the push
to see evidence for the Higgs boson, a task that will require further innovation in data handling
and significant computing resources in order to reprocess the large quantities of Monte Carlo and
data needed to achieve the desired improvements in tagging efficiencies. We look forward to
another year with high publication rates and interesting discoveries.
2.7 Nuclear physics
The STAR experiment has continued to use its data-movement capabilities between its established Tier-1 and Tier-2 centers: between BNL and LBNL (Tier-1), and between BNL and Wayne State University and NPI/ASCR in Prague (two fully functional Tier-2 centers). A new center, the Korea Institute of Science and Technology Information (KISTI), joined the STAR collaboration as a full partnering facility and resource provider in 2008, and activities surrounding the exploitation of this new resource have occupied a large part of STAR's effort in the 2008/2009 period.
The 2009 RHIC run was projected to bring STAR a fully integrated new data acquisition system with throughput growing from the 100 MB/sec reached in 2004 to 1000 MB/sec. This is the second time in the experiment's lifetime that STAR computing has had to cope with an order-of-magnitude growth in data rates. Hence, a threshold in STAR's physics program was reached where leveraging all resources across all available sites has become essential to success. Since the resources at KISTI have the potential to absorb up to 20% of the cycles needed for one pass of data production in early 2009, efforts were focused on bringing the average data-transfer throughput from BNL to KISTI to 1 Gb/sec. It was projected (Section 3.2 of the STAR computing resource plan, "The STAR Computing resource plan", STAR Notes CSN0474,
http://drupal.star.bnl.gov/STAR/starnotes/public/csn0474) that such a rate would sustain the need
up to 2010, after which a maximum of 1.5 Gb/sec would cover the currently projected physics program up to 2015. Thanks to help from ESnet, Kreonet, and collaborators at both end institutions, this performance was reached (see http://www.bnl.gov/rhic/news/011309/story2.asp,
“From BNL to KISTI: Establishing High Performance Data Transfer From the US to Asia” and
http://www.lbl.gov/cs/Archive/news042409c.html, "ESnet Connects STAR to Asian Collaborators"). At this time baseline grid tools are used and the OSG software stack has not yet been deployed; STAR plans to add a fully automated job processing capability with return of data results using BeStMan/SRM (Berkeley's implementation of an SRM server).
Encouraged by the progress on the network tuning for the BNL/KISTI path and driven by the
expected data flood from Run-9, the computing team re-addressed all of its network data transfer
capabilities, especially between BNL and NERSC and between BNL and MIT. MIT has been a silent Tier-2, a site providing resources for local scientists' research and R&D work but not for the collaboration as a whole. MIT has been active since the work on Mac/X-Grid reported in 2006, a well-spent effort that has evolved into leveraging additional standard Linux-based resources. Data samples are routinely transferred between BNL and MIT. The BNL/STAR gatekeepers have all been upgraded, and all data-transfer services are being re-tuned for the new topology. Initially planned for the end of 2008, the strengthening of transfers to and from the well-established sites was delayed by six months to the benefit of the BNL/KISTI data transfer.
A research activity involving STAR and the computer science department in Prague has been initiated to improve the data-management program and network tuning. We are studying and testing a multi-site data transfer paradigm that coordinates the movement of datasets to and from multiple locations (sources) in an optimal manner, using a planner that takes into account the performance of the network and of each site. This project relies on knowledge of the file locations at each site and on known network transfer speeds as initial parameters; as data is moved, the speeds can be re-assessed, so the system is self-learning. The project has already shown impressive
gains over a standard peer-to-peer approach for data transfer. Although this activity has so far
impacted OSG in a minimal way, we will use the OSG infrastructure to test our implementation
and prototypes. To this end, we paid close attention to the protocols and concepts used in Caltech's Fast Data Transfer (FDT) tool, as its streaming approach has a non-trivial impact on the shortcomings of the TCP protocol. Design considerations and initial results were presented at the
Grid2009 conference and published in the proceedings as “Efficient Multi-Site Data Movement in
distributed Environment". The implementation is not fully ready, however, and we expect further development in 2010. Our Prague site also presented its previous work on setting up a fully functional Tier-2 site at the CHEP 2009 conference, and summarized our work on the Scalla/Xrootd and HPSS interaction and on achieving efficient retrieval of data from mass storage using advanced request-queuing techniques based on file location on tape while respecting fair-share constraints. The respective abstracts, "Setting up Tier2 site at Golias/Prague farm" and "Fair-share scheduling algorithm for a tertiary storage system", are available at
http://indico.cern.ch/abstractDisplay.py?abstractId=432&confId=35523
and
http://indico.cern.ch/abstractDisplay.py?abstractId=431&confId=35523.
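As a rough illustration of the planner idea described above (not the actual STAR/Prague implementation), the following sketch picks the currently fastest source for each file and updates its per-link speed estimate after every transfer, so the plan adapts as data moves:

    from collections import defaultdict

    class MultiSiteTransferPlanner:
        """Toy multi-site planner: choose the replica on the currently fastest
        link, then fold the observed throughput back into the estimate."""

        def __init__(self, initial_speeds_mbps=None, alpha=0.3):
            # Keyed by (source_site, destination_site); units are MB/s.
            self.speed = defaultdict(lambda: 1.0, initial_speeds_mbps or {})
            self.alpha = alpha  # weight given to the newest measurement

        def pick_source(self, replica_sites, dest):
            return max(replica_sites, key=lambda src: self.speed[(src, dest)])

        def record_transfer(self, src, dest, size_mb, seconds):
            observed = size_mb / max(seconds, 1e-6)
            previous = self.speed[(src, dest)]
            # Exponential moving average: the "self-learning" re-assessment.
            self.speed[(src, dest)] = (1 - self.alpha) * previous + self.alpha * observed

A real planner would, in addition, account for concurrent transfers and per-site load, as the published work does.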
STAR has continued to use and consolidate the BeStMan/SRM implementation and has continued active discussion, steering, and integration of the messaging format from the Center for Enabling Distributed Petascale Science (CEDPS) Troubleshooting team, in particular targeting the use of BeStMan client/server troubleshooting for faster detection of, and recovery from, errors and performance anomalies. BeStMan and syslog-ng deployments at NERSC provide early testing of new features in a production environment, especially for logging and for recursive directory-tree file transfers. At the time of this report, an implementation is available whereby BeStMan-based messages are passed to a collector using syslog-ng. Several problems have already been found, leading to a strengthening of the product. We had hoped to have a case study within months, but we are at this time missing a data-mining tool able to correlate (and hence detect) complex problems and automatically raise alarms; the collected logs have nevertheless been useful for finding and identifying problems from a single source of information. STAR has also finished developing its own job tracking and accounting system, a simple approach based on adding tags at each stage of the workflow and collecting the information via recorded database entries and log parsing. The work was presented at the CHEP 2009 conference ("Workflow generator and tracking at the rescue of distributed processing. Automating the handling of STAR's Grid production", Contribution ID 475, CHEP 2009, http://indico.cern.ch/contributionDisplay.py?contribId=475&confId=35523).
STAR has also continued an effort to collect information at the application level and to build on and learn from in-house user-centric and workflow-centric monitoring packages. The STAR SBIR TechX/UCM project, aimed at providing a fully integrated User Centric Monitoring (UCM) toolkit, has reached the end of its funding cycle. The project is being absorbed by STAR personnel, who aim to deliver a workable monitoring scheme at the application level. The reshaped UCM library has been used in nightly and regression testing to help further development (mainly scalability, security, and integration into a grid context). Several components needed reshaping, as the initial design approach was too complex and slowed down maintenance and upgrades. To this end, a new SWIG ("Simplified Wrapper and Interface Generator") based approach was used, reducing the overall size of the interface package by more than an order of magnitude. The knowledge gained and a working infrastructure based on syslog-ng may well provide a simple mechanism for merging UCM with the CEDPS vision. Furthermore, STAR has developed a workflow analyzer for experimental data production (mainly simulation) and presented the work at the CHEP 2009 conference as "Automation and Quality Assurance of the Production Cycle" (http://indico.cern.ch/abstractDisplay.py?abstractId=475&confId=35523), now accepted for publication. The toolkit developed in this activity extracts independent accounting and statistical information, such as task efficiency and success percentage, allowing a good record to be kept of production carried out in grid-based operation. Additionally, a job feeder was developed that automatically throttles job submission across multiple sites, keeping all sites at maximal occupancy while also detecting problems (gatekeeper downtimes and other issues). The feeder can automatically re-submit failed jobs up to N times, bringing the overall job success efficiency with only one re-submission to 97%. This tool was used in the Amazon EC2 exercise (see later section).
STAR grid data processing and job handling operations have continued their progression toward a fully grid-based operation relying on the OSG software stack and the OSG Operations Center issue tracker. The STAR operations support team has been efficiently addressing issues and stability, and overall the grid infrastructure stability appears to have increased. To date, however, STAR has mainly achieved simulated data production on grid resources. Since reaching a milestone in 2007, it has become routine to use non-STAR-dedicated resources from the OSG for the Monte Carlo event-generation pass and to run the full response-simulator chain (which requires the whole STAR framework to be installed) on STAR's dedicated resources. On the other hand, the relative proportion of processing contributions using non-STAR-dedicated resources has been marginal (and mainly on FermiGrid resources in 2007). This disparity is explained by the fact that the complete STAR software stack and environment, which is difficult or impossible to recreate on arbitrary grid resources, is necessary for full event-reconstruction processing; access to generic and opportunistic resources is therefore simply impractical and does not match the realities and needs of running an experiment in physics production mode. In addition, STAR's science simply cannot suffer the risk of heterogeneous or non-reproducible results due to subtle library or operating-system dependencies, and the workforce needed to ensure seamless results on all platforms exceeds our operational funding profile. Hence, STAR has been a strong advocate for moving toward a model relying on the use of virtual machines (see the contribution at the OSG booth at CHEP 2007) and has since worked closely, to the extent possible, with the CEDPS Virtualization activity, seeking the benefits of truly opportunistic use of resources by creating a complete pre-packaged environment (with a validated software stack) in which jobs will run. Such an approach would allow STAR to run any of its job workflows (event generation, simulated data reconstruction, embedding, real event reconstruction, and even user analysis) while respecting STAR's reproducibility policies, implemented as complete software-stack validation. The technology has huge potential for allowing (beyond a means of reaching non-dedicated sites) software provisioning of Tier-2 centers with a minimal workforce to maintain the software stack, hence maximizing the return on investment of grid technologies. The multitude of configuration combinations and the fast dynamic of changes (OS upgrades and patches) otherwise make reaching the diverse resources available on the OSG workforce-constraining and economically unviable.
Figure 13: Corrected STAR recoil jet distribution vs pT
This activity reached a world-premiere milestone when STAR made use of Amazon EC2 resources, using the Nimbus Workspace service to carry out part of its simulation production and handle a late request. These activities were written up in iSGTW ("Clouds make way for STAR to shine", http://www.isgtw.org/?pid=1001735), Newsweek ("Number Crunching Made Easy - Cloud computing is making high-end computing readily available to researchers in rich and poor nations alike", http://www.newsweek.com/id/195734), SearchCloudComputing ("Nimbus cloud project saves brainiacs' bacon", http://searchcloudcomputing.techtarget.com/news/article/0,289142,sid201_gci1357548,00.html), and HPCWire ("Nimbus and Cloud Computing Meet STAR Production Demands", http://www.hpcwire.com/offthewire/Nimbus-and-CloudComputing-Meet-STAR-Production-Demands-42354742.html?page=1). This was the very first
time cloud computing had been used in the HENP field for scientific production work with full
confidence in the results. The results were presented in a plenary talk at the CHEP 2009 conference (Figure 13) and represent a breakthrough in the production use of clouds. We are working with OSG management on including this technology in OSG's program of work.
Continuing on this activity in the second half of 2009, STAR has undertaken testing of various
models of cloud computing on OSG since the EC2 production run. The model used on EC2 was
to deploy a full OSG-like compute element with gatekeeper and worker nodes. Several groups
within the OSG offered to assist and implement diverse approaches. The second model, deployed
at Clemson University, uses a persistent gatekeeper with worker nodes launched from a STAR-specific VM image. Within the image, a Condor client registers to an external Condor master, making the whole batch system completely transparent to the end user (the instantiated VMs appear just like other nodes, and the STAR jobs slide into the VM instances, where they find a fully supported STAR environment and software package). The result is similar to configuring a special batch queue that meets the application requirements contained in the VM image; many batch jobs can then be run in that queue. This model is being used at a few sites in Europe, as described at a recent HEPiX meeting (http://indico.cern.ch/conferenceTimeTable.py?confId=61917). A third model has been preliminarily tested at Wisconsin, where the VM image itself acts as the payload of the batch job and is launched for each job. This is similar to the Condor glide-in approach and to the pilot-job method, where the useful application work is performed after the glide-in or pilot job starts. This particular model is not well matched to the present STAR SUMS workflow, as jobs would need to be pulled into the VM instance rather than integrated as a submission via a standard gatekeeper (the gatekeeper interaction only starts instances).
However, our MIT team will pursue testing at Wisconsin and attempt a demonstrator simulation run to measure efficiency and evaluate the practicality of this approach. One goal of the
testing at Clemson and Wisconsin is to eventually reach a level where scalable performance can
be compared with running on traditional clusters. The effort in that direction is helping to identify various technical issues, including configuration of the VM image to match the local batch
system (contextualization), considerations for how a particular VM image is selected for a job,
policy and security issues concerning the content of the VM image, and how the different models
fit different workflow management scenarios.
Our experience and results in this cloud/grid integration domain are very encouraging regarding
the potential usability and benefits of being able to deploy application specific virtual machines,
and also indicate that a concerted effort is necessary in order to address the numerous issues exposed and reach an optimal deployment model.
All STAR physics publications acknowledge the resources provided by the OSG.
2.8 MINOS
Over the last three years, computing for MINOS data analysis has greatly expanded to use more
of the OSG resources available at Fermilab. The scale of computing has increased from about 50
traditional batch slots to typical user jobs running on over 1,000 cores, with a strong desire to
expand to about 5,000 cores (over the past 12 months they have used 3.1M hours on OSG from
1.16M submitted jobs). This computing resource, combined with 90 TB of dedicated BlueArc
(NFS mounted) file storage, has allowed MINOS to move ahead with traditional and advanced
analysis techniques, such as Neural Network, Nearest Neighbor, and Event Library methods.
These computing resources are critical as the experiment moves beyond the early, somewhat simpler Charged Current physics to the more challenging Neutral Current, νe, and other analyses which push the limits of the detector. We use a few hundred cores of offsite computing at collaborating universities for occasional Monte Carlo generation. MINOS is also starting to use
TeraGrid resources at TACC, hoping to greatly speed up their latest processing pass.
2.9 Astrophysics
The Dark Energy Survey (DES) used approximately 40,000 hours of OSG resources in 2009,
with DES simulation activities ramping up in the latter part of the year. The most recent DES
simulation run produced 3.5 Terabytes of simulated imaging data, which were used for testing
the DES data management and data processing pipelines as part of DES Data Challenge 5. These
simulations consisted of 2,600 mock science images, covering 150 square degrees of the sky,
along with another 900 calibration images. Each 1-GB-sized DES image is produced by a single
job on OSG and simulates the stars and galaxies on the sky covered in a single 3-square-degree
pointing of the DES camera. The processed simulated data are also being actively used by the
DES science working groups for development and testing of their science analysis codes. DES
expects to roughly triple its usage of OSG resources over the next 12 months, as we work to produce a larger simulation data set for DES Data Challenge 6 in 2010.
2.10 Structural Biology
Biomedical computing
In 2009 we established the NEBioGrid VO to support biomedical research on OSG (nebiogrid.org) and to integrate the regional biomedical resources in the New England area. NEBioGrid has deployed the OSG MatchMaker service provided by RENCI, which provides automatic resource selection and scheduling. When combined with Condor's DAGMan, job submission, remote assignment, and resubmission in the event of a failure can be automated with minimal user intervention. This will allow NEBioGrid to streamline the utilization of a wide range of OSG resources for its researchers.
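As a simple illustration of the resubmission idea (not the MatchMaker code itself), a DAGMan input file can ask Condor to retry each node automatically; the sketch below, with hypothetical file and job names, generates such a file:

    def write_dag(job_names, submit_file, dag_path="workflow.dag", retries=3):
        """Write a DAGMan input file in which every node is retried up to
        `retries` times if it exits with a non-zero status."""
        with open(dag_path, "w") as dag:
            for name in job_names:
                dag.write("JOB %s %s\n" % (name, submit_file))
                dag.write("VARS %s task=\"%s\"\n" % (name, name))
                dag.write("RETRY %s %d\n" % (name, retries))

    # Example: a 100-task batch, each task resubmitted up to 3 times,
    # then run with "condor_submit_dag workflow.dag".
    write_dag(["task%03d" % i for i in range(100)], "analysis.sub")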
A researcher at Harvard Medical School expressed interest in utilizing OSG resources for his
project; within a week's time NEBioGrid had an initial batch of jobs running on OSG sites. They
have continued to refine submission scripts and job workflow to improve the success rate and
efficiency of the researcher's continuing computations. A framework has also been established
for a researcher at Massachusetts General Hospital, which will allow him to utilize OSG to access computational resources far surpassing what was available to him previously. Work is also
in progress with researchers at Northeastern University and Children's Hospital who have expressed a need and desire to utilize OSG.
NEBioGrid collaborated with the Harvard Medical School West Quad Computing Group to help
set up a new cluster configured as an OSG CE. This work was completed within a couple of weeks
to bring their CE into operational status and receive remote jobs. A cluster utilizing Infiniband
for MPI calculations is currently being installed in collaboration with Children's Hospital and the
Immune Disease Institute. This cluster will be used for long-running molecular dynamics simulations, and will be made available as an OSG CE with MPI support in 2010. Progress is under
way with another HMS IT group to integrate their cluster into OSG.
NEBioGrid holds regular seminars and workshops on HPC and grid computing related subjects.
Some talks are organized in collaboration with the Harvard University Faculty of Arts and Sciences. In April the topics were the LSF batch management system, and Harvard's Odyssey cluster, which contributes to the ATLAS experiment. In July Ian Stokes-Rees spoke about collaborative web portal interfaces to HPC resources. In August Johan Montagnat spoke on the topic of
large-scale medical image analysis using the European grid infrastructure. In October a full-day
workshop was held in collaboration with Northeastern University; over a dozen speakers presented on the topic of Biomedical computing on GPUs. In December Theresa Kaltz spoke about
Infiniband and other High-Speed Interconnects.
Structural biology computing
There were two major developments in 2009. First, SBGrid has deployed a range of applications
onto over a dozen OSG sites. The primary applications that have been utilized are two molecular
replacement programs: Molrep and Phaser. These programs are widely used by structural biologists to identify the 3-D structure of proteins by comparing imaging data from the unknown
structure to that of known protein fragments. Typically a data set for an unknown structure is
compared to a single set of protein coordinates. SBGrid has developed a technique to do this
analysis with 100,000 fragments, requiring between 2000 and 15,000 hours for a single structure
study, depending on the exact application and configuration parameters. Our early analysis with
Molrep indicated that signals produced by models with very weak sequence identity are, in many
cases, too weak to produce a meaningful ranking. The study was repeated using Phaser, a maximum-likelihood application that requires significant computing resources (searches with individual coordinates take between 2 and 10 minutes for crystals with a single molecule in the asymmetric unit, and longer for crystals with many copies of the same molecule). With Phaser, a significant improvement in the sensitivity of global molecular replacement was achieved, and we have recently identified several very encouraging cases. For example, the structure of NADH:FMN oxidoreductase (PDB code 2VZF, Figure 14B gray, originally solved by an experimental phasing method) could be phased with the coordinates of SCOP model 1zwkb1 (Figure 14B teal). The SCOP model was one of four structures that formed a cluster distinct from the rest of the SCOP models, yet it has very weak sequence identity (<15%). In the testing phase of the project, between September and November 2009, SBGrid ramped up its utilization of OSG, running 42,000 jobs consuming
over 200,000 CPU hours. The portal and our new method have been recently tested by users
from Yale University and UCSF, and will soon become available to all members of the structural
biology community.
Figure 14: Global molecular replacement. A - after searching with 100,000 SCOP domains, four models form a distinct cluster (highlighted). B - one of the SCOP domains in the cluster (teal) superimposes well with the 2VZF coordinates (grey), although the sequence identity between the two structures is minimal. C - the SBGrid molecular replacement portal deploys computations to OSG resources; the typical runtime for an individual search with a single SCOP domain is 10 minutes.
In the second major development, in collaboration with the Harrison laboratory at Harvard Medical
School, we deployed on the grid a method to build and refine a three dimensional (3D) protein
structure primarily based on orientational constraints obtained from nuclear magnetic resonance
(NMR) data. We measured three sets of orthogonal residual dipolar couplings (rdc) constraints
corresponding to protein backbone atom pairs (N-NH, N-CO, and CO-CA vectors). In our protocol, the experimental data is first divided into small, sequential rdc sets. Each set is used to scan a database of known backbone fragments. We generated a database of 5-, 7-, 9-, 12-, 15-, 20-, and 25-residue fragments by processing a selected group of high-quality X-ray and NMR structures deposited in the Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB). For each protein fragment in the database, we score our experimental data using the Prediction of Alignment from Structure procedure (PALES). The top-scoring fragments define the local protein conformation; the global protein structure is refined in a second step using xplor. Each PALES comparison requires the prediction of an alignment tensor from a known 3D coordinate file (each fragment) and a statistical analysis of the fit to the experimental dataset. Every PALES analysis takes about 50 seconds to complete on a current state-of-the-art single-core processor. A typical fragment database consists of ~300,000 fragments, and our experimental data (obtained on a 32 kDa membrane protein) is divided into ~300 sequential rdc sets. This requires ~10^9 comparisons, or ~52 days for completion on a single-core computer. The scoring step can, however, easily be divided into ~1000 independent jobs, reducing the execution time to only a few hours per database. We have used the SBGrid computing infrastructure to develop our protocol and to process our data efficiently. We are currently refining the 3D protein structure.
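A sketch of how the scoring step can be split into independent grid jobs (the fragment identifiers and job count below are illustrative, not the SBGrid submission code):

    def make_pales_jobs(fragment_ids, n_jobs=1000):
        """Split the fragment database into ~n_jobs slices; each grid job
        scores its slice of fragments against every rdc set and writes its
        own result file, which a final step merges and ranks."""
        chunk = -(-len(fragment_ids) // n_jobs)   # ceiling division
        return [fragment_ids[i:i + chunk] for i in range(0, len(fragment_ids), chunk)]

    # For ~300,000 fragments this yields ~1000 slices of ~300 fragments each,
    # turning weeks of serial work into a few hours of parallel running.
    job_lists = make_pales_jobs(["frag%06d" % i for i in range(300000)])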
2.11 Multi-Disciplinary Sciences
The Engagement team has worked directly with researchers in the areas of: biochemistry (Xu),
molecular replacement (PRAGMA), molecular simulation (Schultz), genetics (Wilhelmsen), information retrieval (Blake), economics, mathematical finance (Buttimer), computer science
(Feng), industrial engineering (Kurz), and weather modeling (Etherton).
The computational biology team led by Jinbo Xu of the Toyota Technological Institute at Chicago uses the OSG for production simulations on an ongoing basis. Their protein prediction software, RAPTOR, is likely to be one of the top three such programs worldwide.
A chemist from the NYSGrid VO is using several thousand CPU hours a day, sustained, as part of the modeling of virial coefficients of water. During the past six months, a collaborative task force between the Structural Biology Grid (a computation group at Harvard) and OSG has resulted in the porting of their applications to run across multiple sites on the OSG. They plan to publish science based on production runs from the past few months.
2.12 Computer Science Research
A collaboration between the OSG extensions program, the Condor project, US ATLAS, and US CMS is using the OSG to test new workload and job management scenarios that provide "just-in-time" scheduling across the OSG sites, using "glide-in" methods to schedule a pilot job locally at a site; the pilot then requests user jobs for execution as and when resources are available. This includes use of the "GLExec" component, which the pilot jobs use to provide the site with the identity of the end user of a scheduled executable.
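In outline, a pilot behaves like the sketch below; the queue endpoint and the convention that it returns a command line are hypothetical, and a production pilot would additionally invoke GLExec to switch to the end user's identity before running the payload:

    import subprocess
    import urllib.request

    QUEUE_URL = "https://workload-manager.example.org/next-job"   # placeholder endpoint

    def run_pilot():
        """Once the pilot lands on a worker node, it repeatedly asks the
        central queue for user work and runs it until the queue is empty."""
        while True:
            with urllib.request.urlopen(QUEUE_URL) as resp:
                payload = resp.read().decode().strip()
            if not payload:                 # no work left: release the batch slot
                break
            # Assumed convention: the queue returns a command line to execute.
            subprocess.run(payload, shell=True, check=False)

    if __name__ == "__main__":
        run_pilot()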
3. Development of the OSG Distributed Infrastructure
3.1 Usage of the OSG Facility
The OSG facility provides the platform that enables production by the science stakeholders; this
includes operational capabilities, security, software, integration, and engagement capabilities and
support. We are continuing our focus on providing stable and reliable "production" level capabilities that the OSG science stakeholders can rely on for their computing work, with timely support available when needed.
The stakeholders continue to ramp up their use of OSG, and the ATLAS and CMS VOs are ready for the LHC data taking that is now beginning. Each has run several workload tests (including STEP09); the OSG infrastructure has performed well in these exercises and is ready to support full data taking.
Figure 15: OSG facility usage vs. time broken down by VO
In the last year, the usage of OSG resources by VOs has continued to increase from 3,500,000
hours per week to over 5,000,000 hours per week, sustained; additional detail is provided in attachment 1 entitled “Production on the OSG.” OSG provides an infrastructure that supports a
broad scope of scientific research activities, including the major physics collaborations, nanoscience, biological sciences, applied mathematics, engineering, and computer science. Most of the
current usage continues to be in the area of physics but non-physics use of OSG is a growth area
with current usage exceeding 200,000 hours per week (averaged over the last year) spread over
13 VOs.
Figure 16: OSG facility usage vs. time broken down by Site.
(Other represents the summation of all other “smaller” sites)
With about 80 sites, the production provided on OSG resources continues to grow; the usage varies depending on the needs of the stakeholders. During stable normal operations, OSG provides
approximately 700,000 CPU wall clock hours a day with peaks occasionally exceeding 900,000
CPU wall clock hours a day; approximately 150,000 opportunistic wall clock hours are available
on a daily basis for resource sharing.
In addition, OSG is devoting significant effort and technical planning to preparing for a large influx of CMS (~20 new) and Atlas (~20 new) Tier-3 sites that will be coming online early in 2010. These ~40 Tier-3 sites are notable because many of their administrators are not expected to have formal computer-science training. To support these sites (in collaboration with Atlas and CMS), OSG has focused on creating both documentation and a support structure suitable for them. To date the effort has been directed along multiple lines:
• Onsite help and hands-on assistance to the Atlas and CMS Tier-3 coordinators in setting up
their Tier-3 test sites including several multi-day meetings to bring together the OSG experts
needed to answer and document specific issues relevant to the Tier-3s. OSG hosts regular
meetings with these coordinators as well to discuss issues and plan steps forward.
• OSG packaging and support for Tier-3 components, such as XRootd, that are projected to be
installed at over half of the Tier-3 sites (primarily Atlas sites). This includes testing and
working closely with the Xrootd development team via bi-weekly meetings.
• Refactoring OSG workshops to draw in the smaller sites by incorporating tutorials and detailed instruction. The site-admin workshop in Indianapolis this year, for example, presented
in-depth tutorials for installing and/or upgrading CE and SE components. Over half of the
sites represented were able to install or upgrade a component while at the workshop, and the
remainder completed their upgrades approximately two weeks later.
• OSG documentation for Tier-3s that begins from the point where systems have only an operating system installed. Sections on site planning, file-system setup, basic networking instructions, and cluster setup and configuration were therefore written, together with more detailed explanations of each step (https://twiki.grid.iu.edu/bin/view/Tier3/WebHome).
• Working directly with new CMS and Atlas site administrators as they start to deploy their sites and, in some cases, work through the documentation.
• Bi-weekly site meetings geared toward Tier-3s, in conjunction with the ongoing site coordination effort, including office hours held three times a week to discuss issues that arise involving all aspects of the sites. The site meetings are just beginning and are still mostly attended by Tier-2s, but as new Tier-3s start to come online, we expect Tier-3 attendance to pick up.
As the LHC re-starts data-taking in December 2009, the OSG infrastructure has withstood the
test of the latest I/O and computational challenges: STEP09 for Atlas and CMS, the Atlas User
Analysis Test (UAT), and the CMS physics challenge. As a result OSG has demonstrated that it
is meeting the needs of USCMS and USATLAS stakeholders and is well positioned to handle the
planned ramp-up of I/O and analysis forecasts for 2010.
3.2 Middleware/Software
To enable a stable and reliable production platform, the middleware/software effort has increased its focus on capabilities that improve administration, upgrades, and support. Between January 2009 and December 2009, OSG's software efforts focused on supporting OSG 1.0; on developing, releasing, and supporting OSG 1.2; and on so-called native packaging.
As in all major software distributions, significant effort must be given to ongoing support. OSG
1.0 was released in August 2008 and was extensively documented in last year’s annual report.
Subsequently, there have been 24 incremental updates to OSG 1.0, which demonstrates that OSG 1.0 met one of our main goals: the ability to provide incremental updates to the software stack, something that had traditionally been challenging.
That said, there were still significant challenges to supporting incremental updates with our distribution mechanism (packaging software called Pacman), so in early 2009 we developed OSG 1.2. Our goal was to release OSG 1.2 before the restart of the LHC so that sites could install the new version. We had a pre-release in June 2009 and a formal release in July 2009, which gave sites sufficient time to upgrade if they chose to do so; roughly 60% of the sites in OSG have upgraded, and since the initial release we have issued 11 software updates to OSG 1.2.
Most of the software updates to the OSG software stack were "standard" updates featuring numerous bug fixes, security fixes, and occasional minor feature upgrades. It should be noted that this general maintenance consumes roughly 50% of the OSG software effort.
There have been several software updates and events in the last year that are worthy of deeper
discussion. As background, the OSG software stack is based on the VDT grid software distribution. The VDT is grid-agnostic and used by several grid projects including OSG, TeraGrid, and
WLCG. The OSG software stack is the VDT with the addition of OSG-specific configuration.
1) VDT 1.10.1v (released in April 2009) was a significant new update that stressed our ability
to supply a major incremental upgrade without requiring complete re-installations. To do
this, we supplied a new update program that assists site administrators with the updating process and ensures that it is done correctly. This updater will be used for all future updates
provided by the VDT. The update provided a new version of Globus, an update to our authorization infrastructure, and an update to our information infrastructure. It underwent significant testing both internally and by VOs in our integration testbed.
2) OSG 1.2 was released in July 2009. Not only did it significantly improve our ability to provide updates to users, but it also added support for a new operating system (Debian 5), which
is required by LIGO. This differed from VDT 1.10.1v in that we tackled a new source of
packaging difficulty which prevented us from easily adding new platform support without
disrupting our ability to provide incremental updates for other users.
3) Since the summer of 2009, we have been focusing on the needs of the upcoming ATLAS and
CMS Tier-3 sites, particularly with respect to new storage solutions. We have improved our packaging, testing, and releasing of BeStMan, XRootd, and Hadoop, which form a large part of our set of storage solutions.
4) We have emphasized improving our storage solutions in OSG. This is partly for the Tier-3
effort mentioned in item #3, but also for broader use in OSG. For example, we have created new testbeds for Xrootd and Hadoop and expanded our test suite to ensure that the storage software we support and release is well tested and understood internally. We have started
regular meetings with the XRootd developers and ATLAS to make sure that we understand
how development is proceeding and what changes are needed. We have also provided new
tools to help users query our information system for discovering information about deployed
storage systems, which has traditionally been hard in OSG. We expect these tools to be particularly useful to LIGO and SCEC, though other VOs will likely benefit as well. We also
conducted an in-person storage forum in June 2009 at Fermilab, to help us better understand
the needs of our users and to directly connect them with storage experts.
5) We have begun intense efforts to provide the OSG software stack as so-called “native packages” (e.g. RPM on Red Hat Enterprise Linux). With the release of OSG 1.2, we have pushed
the packaging abilities of our infrastructure (based on Pacman) as far as we can. While our
established users are willing to use Pacman, there has been a steady pressure to package
software in a way that is more similar to how they get software from their OS vendors. With
the onset of Tier-3s, this effort has become more important because system administrators at
Tier-3s are often less experienced and have less time to devote to managing their OSG sites.
We have wanted to support native packages for some time but have not had the effort available to do so, due to other priorities; it has become clear that we must do this now. We are initially
focusing on the needs of the LIGO experiment, but in early 2010 hope to gradually phase in
native package support to meet the needs of ATLAS, CMS, and other VOs, eventually supporting the entire OSG software stack as native packages.
The VDT continues to be used by external collaborators. EGEE/WLCG uses portions of VDT
(particularly Condor, Globus, UberFTP, and MyProxy). The VDT team maintains close contact
with EGEE/WLCG due to the OSG Software Coordinator's (Alain Roy's) weekly attendance at
the EGEE Engineering Management Team's phone call. TeraGrid and OSG continue to maintain
a base level of interoperability by sharing a code base for Globus, which is a release of Globus,
patched for OSG's and TeraGrid's needs. The VDT software and storage coordinators (Alain Roy and Tanya Levshina) are members of the WLCG Technical Forum, which is addressing ongoing gaps, needs, and evolution of the WLCG infrastructure in the face of data taking.
3.3 Operations
The OSG Grid Operations Center (GOC) provides the central point for operational support for
the Open Science Grid and provides the coordination for various distributed OSG services. The
GOC performs real time monitoring of OSG resources, supports users, developers and system
administrators, maintains critical information services, provides incident response, and acts as a
communication hub. The primary goals of the OSG Operations group are: supporting and
strengthening the autonomous OSG resources, building operational relationships with peering
grids, providing reliable grid infrastructure services, ensuring timely action and tracking of operational issues, and assuring quick response to security incidents. In the last year, the GOC continued to provide the OSG with a reliable facility infrastructure while at the same time improving
services to offer more robust tools to the OSG stakeholders.
During the last year, the GOC continued to provide and improve tools and services for the OSG:
• The OSG Information Management (OIM) database, which provides the definitive source of topology and administrative information about OSG users, resources, support agencies, and virtual organizations, was updated to allow new data to be provided to the WLCG based on installed-capacity requirements for Tier-1 and Tier-2 resources. Additional WLCG issue-tracking requirements were also added to ensure preparedness for LHC data taking.
• The Resource and Service Validation (RSV) monitoring tool has received a new command-line configuration tool plus several new tests, including security, accounting, and central-service probes.
• The BDII (Berkeley Database Information Index), which is critical to CMS production, is
now functioning with an approved Service Level Agreement (SLA) which was reviewed and
approved by the affected VOs and the OSG Executive Board.
• The MyOSG information consolidation tool achieved production deployment and is now heavily used within OSG for status information, as well as being ported to peering grids such as EGEE and WLCG. MyOSG allows administrative, monitoring, information, validation, and accounting services to be displayed within a single user-defined interface.
• The GOC-provided public trouble-ticket viewing interface has received updates to improve usability. This interface allows issues to be tracked and updated by users, and it also allows GOC personnel to use OIM metadata to route tickets much more quickly, reducing the amount of time needed to look up contact information for resources and support agencies.
We also continued our efforts to improve service availability through the completion of several hardware and service upgrades:
• The GOC services located at IU were moved to a new, more robust physical environment,
providing much more reliable power and network stability.
• The BDII was thoroughly tested to ensure that outage conditions on a single BDII server are
quickly identified and corrected at the WLCG top-level BDII.
• A migration of many services to a virtual-machine environment is now complete, allowing flexibility in providing high-availability services.
• Virtual machines for training were provided to GOC staff to help troubleshoot installation
and job submission issues.
OSG Operations is ready to support the LHC re-start and continues to refine and improve its capabilities for these stakeholders. We have completed preparation for the stress of the LHC startup on services by testing, by putting proper failover and load-balancing mechanisms in place,
and by implementing administrative automation with several user services. Service reliability
for GOC services has always been high, and we now gather metrics showing that the reliability of these services exceeds the requirements of the Service Level Agreements (SLAs). SLAs have been finalized for the BDII and MyOSG, and SLAs for the OSG software cache and RSV are being reviewed by stakeholders. Regular release schedules for all GOC services have been implemented to enhance user testing and the regularity of software release cycles for GOC-provided services.
One additional staff member will be added to support the load increase in tickets expected with
the LHC re-start. During 2009, the GOC and Operations team completed the transition from the
building phase for both service infrastructure and user support into a period of stable operational
service for the OSG community. Preparations completed, we are now focusing on providing a
reliable, sustainable operational infrastructure and a well-trained, proactive support staff for OSG
production activities.
3.4 Integration and Site Coordination
The OSG Integration & Sites Coordination activity continues to play a central role in helping
improve the quality of grid software releases prior to deployment on the OSG and in helping
sites deploy and operate OSG services thereby achieving greater success in production.
The major focus was preparation for OSG 1.2, the golden release for the LHC restart. Beginning in April 2009, the three-site Validation Test Bed (VTB) was used repeatedly to perform installation, configuration, and functional testing of pre-releases of the VDT. These
tests shake out basic usability prior to release to the many-site, many-platform, many-VO integration test bed (ITB). In July, the deliverable of a well-tested OSG 1.2 suitable for LHC running was met through the combined efforts of both OSG core staff and the wider community of
OSG site administrators and VO testers participating in the validation. The face-to-face site
administrator’s workshop hosted by the OSG GOC in Indianapolis in July featured deployment
of OSG 1.2 as its main theme.
The VTB and ITB test beds also serve as a platform for testing the main configuration tools of
the OSG software stack, which are developed as part of this activity. The configuration system
has grown to meet the requirements of the services offered in the OSG software stack including
configuration of storage element software.
A new tool suite developed in the Software Tools Group for RSV (resource service validation)
configuration and operation testing was vetted on the VTB repeatedly, with feedback given to
the development team from the site administrator’s viewpoint.
The persistent ITB continues to provide an on-going operational environment for VO testing at
will, as well as being the main platform during the ITB testing prior to a major OSG release.
Overall the ITB comprises 12 sites providing compute elements and four sites providing storage elements (dCache and BeStMan packages implementing the SRM v1.1 and v2.2 protocols); 36 validation processes were defined across these compute and storage resources in readiness for the production release. Pre-deployment validation of applications from 12 VOs was coordinated with the OSG Users group. Other accomplishments include both dCache and SRM-BeStMan
storage element testing on the ITB; delivery of new releases of the configuration tool as discussed above; and testing of an Xrootd distributed storage system as delivered by the OSG Storage group.
The OSG Release Documentation continues to receive significant edits from the community of OSG participants. The collection of wiki-based documents captures the installation, configuration, and validation procedures used throughout the integration and deployment processes. These documents were updated and received review input from all corners of the OSG community.
In the past year we have taken a significant step towards increasing direct contact with the site administration community through use of a persistent chat room (“Campfire”). We offer three-hour sessions at least three days a week during which OSG core Integration or Sites support staff are available to discuss issues, troubleshoot problems, or simply “chat” about OSG-specific issues. The sessions are archived and searchable.
Finally, a major new initiative meant to improve the effectiveness of OSG release validation has
begun whereby a suite of test jobs can be executed through the (pilot-based) Panda workflow
system. The test jobs can be of any type and flavor; the current set includes simple ‘hello-world’
jobs, jobs that are CPU-intensive, and jobs that exercise access to/from the associated storage
element of the CE. Importantly, ITB site administrators are provided a command line tool they
can use to inject jobs aimed for their site into the system, and then monitor the results using the
full monitoring framework (pilot and Condor-G logs, job metadata, etc.) for debugging and validation at the job level. As time goes on, additional workloads can be created and executed by the system, simulating components of VO workloads.
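As an illustration of the kind of command-line injection tool described above, the following minimal sketch (in Python) builds a job description for one of the test flavors and posts it to a submission endpoint. The endpoint URL, flavor names, and field names are hypothetical placeholders, not the actual interfaces of the OSG or Panda tooling.

    #!/usr/bin/env python
    """Illustrative sketch only: a minimal command-line tool, in the spirit of
    the one provided to ITB site administrators, for injecting a test job aimed
    at a specific site into a pilot-based (Panda-like) workflow service.  The
    service URL, job flavors, and field names are hypothetical."""

    import argparse
    import json
    import urllib.request

    FLAVORS = {
        "hello":   {"transformation": "hello-world.sh", "cpu_minutes": 1},
        "cpu":     {"transformation": "cpu-burn.sh",    "cpu_minutes": 30},
        "storage": {"transformation": "se-io-test.sh",  "cpu_minutes": 10},
    }

    def main():
        parser = argparse.ArgumentParser(description="Inject an ITB validation job")
        parser.add_argument("--site", required=True, help="target ITB site name")
        parser.add_argument("--flavor", choices=FLAVORS, default="hello")
        parser.add_argument("--server", default="https://itb-wms.example.org/inject",
                            help="hypothetical injection endpoint")
        args = parser.parse_args()

        job = dict(FLAVORS[args.flavor], site=args.site, source="itb-site-admin")
        request = urllib.request.Request(
            args.server, data=json.dumps(job).encode(),
            headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(request) as response:
            # The (hypothetical) service returns a job identifier that can then
            # be followed in the pilot and Condor-G logs and the job metadata.
            print("submitted:", response.read().decode())

    if __name__ == "__main__":
        main()

A site administrator would invoke such a tool with, for example, "--site MY_ITB_SITE --flavor storage" and then follow the returned identifier through the monitoring framework.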
3.5
Virtual Organizations Group
The Virtual Organizations (VO) Group coordinates and supports the portfolio of the “at-large” science community VOs in OSG, except for the three major stakeholders (ATLAS, CMS, and LIGO), which are directly supported by the OSG Executive Team.
At various times through the year, science communities were provided assistance in planning
their use of OSG. Direct input was gathered from nearly 20 at-large VOs and reported to the
OSG Council on behalf of ALICE, CDF, CIGI, CompBioGrid, D0, DES, DOSAR, Fermilab VO,
GEANT4, GPN, GRASE, GROW, GUGrid, IceCube, JDEM, Mariachi, NanoHUB, NYSGrid,
and SBGrid. These inputs provided roadmaps of intended use to enable OSG to better support the goals of the science stakeholders; the roadmaps covered scope of use, VO mission, average and peak grid utilization, resource provisioning, and plans, needs, and milestones.
From time to time a VO may need special assistance to effectively use OSG. In such cases, OSG
management establishes ad hoc task forces coordinated by the VO Group to provide support for
the science community. In the last year, joint task forces were organized with ALICE, D0,
Geant4, NanoHUB, and SBGrid. With mutual sharing of expertise between OSG and the VOs,
each task force addressed specialized matters to enable measurably higher productivity for each
VO. Care was taken to try to maximize long-term residual impact and sustainability, beyond the
near term goals of each effort.
We continued our efforts to strengthen the effective use of OSG by VOs. D0 increased to 60-80% efficiency at 80,000-120,000 hours/day, which contributed to new levels of D0 Monte Carlo production, reaching a new peak of 13 million events per week; in effect, D0 enhanced its harnessing capacity on OSG, transforming it into ‘a publication per week.’ CDF undertook readiness exercises to prepare for the Linux 5 and Kerberos CA transitions. The Fermilab VO, with its wide array of more than 12 individual sub-VOs, continued efficient operations and refocused on the OSG-TeraGrid gateway, proving the concept at limited scale with D0 as a cross-grid consumer.
We conducted a range of activities to jump-start VOs that are new users of OSG or looking to
increase their leverage of OSG. ALICE was enabled to start operations using its resources at NERSC and LLNL. NanoHUB was enabled to ramp up from nominal utilization to 500 hours/day with multiple streams of nanotechnology applications. The SBGrid resource infrastructure was enabled and its molecular biology applications started up across OSG. GEANT4's EGEE-based biannual regression-testing production runs were expanded onto the OSG, assisting in quality releases of its toolkit for BaBar, MINOS, ATLAS, CMS, and LHCb.
And we continue to actively address plans for scaling up operations of additional communities
including IceCube, CompBioGrid, and GPN.
Pre-release validation by Science stakeholders was completed for OSG Release 1.2 in partnership with the OSG Integration team. After validation on ITB 1.1, official green flags for OSG
1.2 were given by 13 VOs including ATLAS, CMS, LIGO, and STAR. Earlier, as part of ITB 0.9.2, a smaller-scale cycle was organized for the incremental Release 1.0.1. By working closely with VOs in planning schedules for timely release of software, we achieved validation within four weeks. Balancing schedules with thoroughness of testing remains a tradeoff, and we will continue to address and improve these processes in the coming year.
Weekly VO Forum teleconferences were organized for regular one-on-one interaction between
representatives of at-large VOs and staff members of OSG. We increased emphasis on assessing
viability, and merger or closure, of a VO when this can better serve its own community's success roadmap and value proposition. As a result, two VOs, GUGrid and Mariachi, were closed due to lack of growth in activity, based on their own self-assessment; two new registrations were encouraged to merge into existing VOs, Minerva into the Fermilab VO and CALICE into ILC; four
new registrations were vetted and encouraged to form official VOs - GlueX, GROW, JDEM, and
NEBioGrid.
Efficient and effective use of the OSG continues to be a driving force for this team. Building
and maintaining productivity with sustained growth is important for success of the diverse set of
Science communities in OSG. In the coming year, the VO Group will continue to work toward
this objective.
3.6
Engagement
A priority of Open Science Grid is helping science communities benefit from the OSG infrastructure by working closely with these communities over periods of several months. The Engagement activity brings the power of the OSG infrastructure to scientists and educators beyond
high-energy physics and uses the experiences gained from working with new communities to
drive requirements for the natural evolution of OSG. To meet these goals, Engagement helps in: providing an understanding of how to use the distributed infrastructure; adapting applications to run effectively on OSG sites; encouraging the deployment of community-owned distributed infrastructures; working with the OSG Facility to ensure the needs of the new community are met; providing common tools and services in support of the engagement communities; and working directly with and in support of the new end users, with the goal of having them become full contributing members of the OSG. These goals and methods continue to evolve from the foundation created in previous years.
Use of the OSG by Engagement-supported scientists continues to grow. Figure 17 shows the diversity and level of activity among Engagement users for the previous year, and Figure 18 shows the distribution by OSG facility of the more than 6 million CPU hours consumed by Engagement users, more than double the previous year's activity.
Figure 17: Engage user activity for one year
Figure 18: CPU hours by facility for Engage Users
The Engagement team continues to work with active production users including:
 Steffen Bass (+3 staff), theoretical physics, Duke University;
 Anton Betten, mathematics, Colorado State;
 Jinbo Xu (+1 staff), protein structure prediction, Toyota Technological Institute;
 Eun Jung Choi, biochemistry, UNC-CH;
 Vishagan Ratnaswamy, mechanical engineering, New Jersey Institute of Technology;
 Abishek Patrap (+2 staff), systems biology, Institute for Systems Biology;
 Damian Alvarez Paggi, molecular simulation, Universidad de Buenos Aires;
 Eric Delwart, metagenomics, UCSF;
 Tai Boon Tan, molecular simulation, SUNY Buffalo.
New engagements this year have led to work and collaboration with more than 10 research
teams, including:
 Peter Rose and Andreas Prlic, RCSB PDB at UCSD;
 Bo Xin and Ian Shipsey, LSST, Purdue;
 Ridwan Sakidja, materials science, UW Madison;
 Dan Katz and student Allan Espinosa, computer science, U.Chicago;
 Lene Jung Kjaer, Cooperative Wildlife Research Laboratory, SIU;
 Zi-Wei Lin, physics, East Carolina University;
 Jarrett Byrnes, UCSB - Marine Science Institute;
In addition, the Engagement effort has helped LIGO evolve its site selection methods for running its application on OSG. The Engagement effort has also led to meaningful and growing use of OSG by the RENCI Science Portal, a separately funded TeraGrid Science Gateway
project.
For the coming year, we have seeded relationships with a number of research teams, including several 2009 awardees of the US/European Digging Into Data Challenge, the University of Maryland Institute for Genome Sciences, and the National Biomedical Computational Resource project (NBCR), and we have begun significant integration and enhancement work supporting Dr. Steffen Bass (Duke Physics) and his newly awarded 4-year NSF CDI award.
3.7
Campus Grids
The Campus Grids team's goal is to include as many US universities as possible in the national cyberinfrastructure. By helping universities understand the value of campus grids and of resource sharing through the OSG national framework, this initiative aims to democratize cyberinfrastructure by making resources available to all users in a collaborative manner.
In the last year, the campus grids team worked closely with the OSG education team to provide
training and outreach opportunities to new campuses. A workshop for new site administrators
was organized at Clemson and was attended by representatives from four campuses: University
of Alabama, the University of South Carolina, Florida International University in Miami, and the Saint Louis
Genome Center. A second workshop was held at RENCI, and included representatives from
Duke, UNC-Chapel Hill, and NCSU among others. UNC-CH has recently launched its TarHeel
Grid, and Duke University is actively implementing a campus grid being incubated by a partnership between academic computing and the physics department. Sites that have been engaged in
the past continue to collaborate with us – the Genome Center and the New Jersey Higher Education Network – with varying levels of success. Overall, 24 sites have been in contact with the campus grids team at various levels, and seven are now in active deployment. Finally, a collaborative
activity was started with SURAGrid to study interoperability of the grids; and an international
collaboration with the White Rose Grid from the UK National Grid Service in Leeds as well as
the UK Interest Group on Campus Grids was initiated this period. (Barr von Oehsen from Clemson presented the Clemson grid at a Campus CI SIG workshop in Cardiff in September 2009.)
As a follow-up to work from the prior year, the Clemson Windows grid contributed significant
compute cycles to Einstein@Home, run by LIGO, and reached the rank of 4th highest contributor worldwide. A new accounting probe has recently been completed to report this usage to the OSG accounting system; it will be made available to other campus sites that contribute via BOINC projects. Finally, as a teaching activity, five Clemson students (of which two are funded
by OSG and two are funded by CI-TEAM) competed in the TeraGrid 2009 parallel computing
competition and finished second. One of them, an African-American female, attended the ISSGC
2009 in France thanks to OSG financial support.
Some campuses have started to demonstrate high interest in cloud technologies, and the campus CI area is working on this in collaboration with the STAR project. STAR provided a virtual machine image containing their certified software, and this image was deployed at Clemson and the University of Wisconsin to investigate mechanisms for running STAR jobs within VMs. According to STAR scientist Jerome Lauret (a member of the OSG Council), the Clemson setup, which transparently runs jobs submitted to an OSG CE inside the provided VM image, is as easy to use as the EC2 cloud environment that STAR has tested.
3.8
Security
The Security team continued its multi-faceted approach to meeting its primary goals of maintaining operational security, developing security policies, acquiring or developing necessary security tools and software, and disseminating security knowledge and awareness. We are evolving with an increased focus on providing an operationally secure infrastructure that balances usability and security requirements. Our mission statement differs from last year's in that we now emphasize “usability.” This change stems from both our own observations and the feedback we received from our community.
We continued our efforts to improve the OSG security program. The security officer asked for
an external peer evaluation of the OSG security program, which was conducted by the EGEE and TeraGrid security officers and a senior security person from the University of Wisconsin. The committee produced a report of its conclusions and suggestions, and we are currently adjusting our work accordingly. This peer evaluation proved so successful that we have received requests from the EGEE and TeraGrid security officers to join evaluations of their security programs.
In collaboration with the ESnet Authentication and Trust Fabric Team, we organized two security workshops: 1) the Living in an Evolving Identity World Workshop brought technical experts together to discuss security systems; and 2) the OSG Identity Management Requirements Gathering Workshop brought the user communities together to discuss usage of the security systems. Usability surfaced as a key element in both workshops. Historically, we have understood that the ease of use of the PKI infrastructure greatly affects the security of our environment: an overly complex security environment forces users to circumvent security altogether or raises the barriers to entry into the grid world. We conducted a detailed user survey with our VO contacts to pinpoint problem areas; the workshop reports are available at https://twiki.grid.iu.edu/bin/view/Security/OsgEsnetWorkshopReport. Accordingly, we have given added priority to improving these problem areas in the next year.
During the past year, we increased our preparations for the LHC re-start of data taking:

We conducted a risk assessment of our infrastructure and held a 2-day blueprint meeting with
our executive team. We identified the risks to our authentication infrastructure that could
significantly affect OSG operations. We subsequently met with ESNet management since
they provide the backbone of our authentication infrastructure. We jointly developed
mitigation plans for the identified risks and have been implementing the plans. For example,
we completed a tool to monitor suspicious behavior in ESNet certificate issuance
infrastructure.

We identified that our software distribution mechanism is a high-risk attack target, and thus we decided to use signed software packages that can be automatically validated by site administrators during software updates (a brief verification sketch follows this list).

In order to increase the security quality of VDT software, we reached out to our external
software providers and exchanged security practices such as vulnerability announcement procedures, patching policies, and update frequencies.

With the help of Professor Barton Miller of the University of Wisconsin, we organized a “Secure Programming Workshop” in conjunction with the OGF. The tutorial taught software providers secure coding principles with hands-on code evaluation. A similar tutorial was also given at the EGEE'09 conference to reach European software providers, whose software forms a significant part of the VDT code.

We formed a software vulnerability process with the help of the VDT team. The security team
watches the security bulletin boards on a daily basis and triggers the vulnerability evaluation
process if a possible vulnerability is suspected that could affect the VDT code.

To measure the ability of LHC Tier-2 sites to respond to an ongoing incident, we conducted an incident drill. The drill included all LHC Tier-2 sites and a LIGO site, 19 sites in total. Each site was tested one-on-one: a security team member generated a non-malicious attack and assessed the site's response. In general the sites performed very well, with only a handful of sites needing minor assistance or support in implementing the correct response actions. Our next
goal is to conduct an incident drill to measure the preparedness of our user community and
the VOs.
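Referring back to the signed-package item above, the sketch below shows the general pattern of automatic validation during an update: verify a detached GPG signature on a downloaded package before installing it. It is illustrative only; the file names are placeholders, a detached-signature scheme is assumed, and this is not the actual VDT update mechanism.

    #!/usr/bin/env python
    """Illustrative sketch only: verify a detached GPG signature on a downloaded
    software package before installing it, as a site update script might do.
    File names are placeholders; this is not the actual VDT update mechanism."""

    import subprocess
    import sys

    def verify_package(package="osg-update.tar.gz",
                       signature="osg-update.tar.gz.sig"):
        """Return True only if the detached signature checks out against the
        signing key already imported into the local GPG keyring."""
        result = subprocess.run(["gpg", "--verify", signature, package],
                                capture_output=True, text=True)
        if result.returncode != 0:
            print("signature verification FAILED:", result.stderr, file=sys.stderr)
            return False
        return True

    if __name__ == "__main__":
        sys.exit(0 if verify_package() else 1)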
During the last year, the USATLAS and USCMS management asked OSG for increased security
support for the LHC US Tier-3 sites. Tier-3s usually are small sites managed by graduate students. Thus, they may lack the security skills of a professional site administrator. To address
this need:

We focused on Tier-3s at the OSG Site Administrators Workshop. Before the workshop, we
individually contacted each site, asked about their needs, and encouraged participation. Their feedback helped shape our preparations significantly.

After the workshop, we continued our efforts by meeting with a Tier-3 site admin one-on-one each week. The summaries of our meetings can be found at https://twiki.grid.iu.edu/bin/view/Security/FY10SecuritySupport, and this effort continues in the coming year.

We give special guidance to our Tier-3 sites during security vulnerabilities. So far, we have
had two high-risk root-level kernel vulnerabilities. In addition to our regular response
process, we contacted each Tier-3 site individually, and helped them test and patch their
systems. There is a clear need for this type of support at least until Tier-3s become more
familiar with site management. However, individualized care for close to 20 sites is not
sustainable due to our existing effort level and we are investigating automated tools to relieve
this burden. In addition, we want to build a security community, where each site admin is an
active contributor and can work with his/her peers. We have started a security blog and a
ticketing system to accommodate information sharing and we plan to increase awareness of
these tools to create an active community.
OSG did not have any grid incident during the past year. Our efforts were preventive in nature, and we spent most of our effort on software patching (both grid and systems software) and on responding to tangential incidents that were not attacks on the grid infrastructure but affected OSG sites indirectly. For example, we had attacks coming in via SSH ports and affecting HEP users, but none of the attacks targeted the grid middleware or grid user credentials. Nevertheless, we responded as though OSG had been attacked. We started an incident-sharing community across
TeraGrid, EGEE, NorduGrid, and OSG security officers; this continues to be immensely useful
for operational security.
During the last year, we identified a problem with updating OSG-recommended security configurations at sites; an administrative update tool is needed to solve the issue, and this tool is now in testing for a future release. To provide better monitoring, the security team wrote a suite of security probes that allow a site administrator to compare local security configurations against OSG-suggested values; these monitoring probes have been released with
OSG 1.2. In the next year, we plan to focus more on monitoring efforts which we consider a
critical element of our risk assessment and mitigation plans.
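The following minimal sketch illustrates, under stated assumptions, the kind of comparison such a probe performs: read local security-related settings and report any that differ from OSG-recommended values. The file path, setting names, recommended values, and output format are hypothetical examples and do not reproduce the actual probe content.

    #!/usr/bin/env python
    """Illustrative sketch only: the kind of check performed by security probes
    that compare local security-related settings against OSG-recommended
    values.  File names, keys, and recommended values are hypothetical."""

    RECOMMENDED = {
        # hypothetical setting -> recommended value
        "PermitRootLogin": "no",     # e.g. an sshd-style setting
        "crl_update_hours": "6",     # how often CRLs should be refreshed
    }

    def read_local_settings(path="/etc/osg/local-security.conf"):
        """Parse a simple 'key value' file; the path is a placeholder."""
        settings = {}
        with open(path) as handle:
            for line in handle:
                line = line.strip()
                if line and not line.startswith("#"):
                    key, _, value = line.partition(" ")
                    settings[key] = value.strip()
        return settings

    def compare(local, recommended=RECOMMENDED):
        """Return (key, local_value, recommended_value) for every mismatch."""
        return [(key, local.get(key, "<missing>"), value)
                for key, value in recommended.items() if local.get(key) != value]

    if __name__ == "__main__":
        mismatches = compare(read_local_settings())
        print("status:", "OK" if not mismatches else "WARNING")
        for key, have, want in mismatches:
            print("  %s: local=%s recommended=%s" % (key, have, want))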
3.9
Metrics and Measurements
OSG Metrics and Measurements strives to give OSG management, VOs, and external entities
quantitative details of the project’s growth throughout its lifetime. The focus for FY09 was the
“internal metrics” report, where OSG Metrics collaborated with all the internal OSG areas to
come up with quantitative metrics for each area. The goal was to help each area become aware
of its progress throughout the year. This work culminated in OSG document #887, completed in September 2009.
OSG Metrics works with OSG Gratia operations to validate the data coming from the accounting
system. This accounting data is sent to the WLCG on behalf of the USLHC VOs and is used for
many reports and presentations to the US funding agencies. Based on the Gratia databases, the
OSG Metrics team maintains a website that integrates with the accounting system and provides
an expert-level view of all the OSG accounting data. This data includes RSV test results, job
records, transfer records, storage statistics, cluster size statistics, and historical batch system status for most OSG sites. During FY10, we will be working to migrate the operation of this website to OSG operations and integrate the data further with MyOSG (in order to make it more accessible and user-friendly).
OSG Metrics collaborates with the storage area to expand the number of sites reporting transfers
to the accounting system. We have begun to roll out new storage reporting, based on new software capabilities developed for Gratia. The focus of the storage reporting is to provide a general
overview of site usage centrally, but it also provides a wide range of detailed statistics to local
system administrators. The Metrics and Measurements area will also begin investigating how to
incorporate the network performance monitoring data into its existing reporting activities. Several VOs are actively deploying perfSONAR-based servers that provide site-to-site monitoring of
the network infrastructure, and we would like to further encourage network measurements on the
OSG.
The Metrics and Measurements area continues to be involved with the coordination of WLCG-related reporting efforts. It sends monthly reports to the JOT highlighting MOU sites' monthly
availability and activity. It produces monthly data for the metric thumbnails and other graphs
available on the OSG homepage. OSG Metrics participated in the WLCG Installed Capacity
discussions during FY09. The Installed Capacity project was tasked with providing the WLCG
with accurate and up-to-date information on the amount of CPU and storage capacity available
throughout the worldwide infrastructure. The OSG worked on the USLHC experiments' behalf
to communicate their needs and put the reporting infrastructure in place using MyOSG/OIM.
The accounting extensions effort requires close coordination with the Gratia project at FNAL,
and includes effort contributed by ATLAS.
Based on a request from the OSG Council, we prepared a report on the various fields of science utilizing the grid, detailing the number of users and the number of wall clock hours per field. The majority of the effort was spent working with the community VOs (as
opposed to an experiment VO) on a means to classify their users. Also from stakeholder input,
the OSG assessed the various areas of contribution and documented the value delivered by OSG.
The goal of this activity was to develop an estimate of the benefit and cost effectiveness, and thus provide a basis for discussion of the value of the Open Science Grid (OSG).
3.10
Extending Science Applications
In addition to operating a facility, the OSG includes a program of work that extends the support
of science applications in terms of both the complexity and the scale of the applications that can be effectively run on the infrastructure. We solicit input from the scientific user community both on operational experience with the deployed infrastructure and on extensions to the functionality of that infrastructure. We identify limitations, and address those
with our stakeholders in the science community. In the last year of work, the high level focus
has been threefold: (1) improve the scalability, reliability, and usability as well as our understanding thereof; (2) establish and operate a workload management system for OSG operated
VOs; and (3) establish the capability to use storage in an opportunistic fashion at sites on OSG.
We continued with our previously established processes designed to understand and address the
needs of our primary stakeholders: ATLAS, CMS, and LIGO. The OSG “senior account managers” responsible for interfacing with each of these three stakeholders meet, at least quarterly, with
their senior management to go over their issues and needs. Additionally, we document the
stakeholders' desired worklists from OSG and cross-map these requirements to the OSG WBS;
these lists are updated quarterly and serve as a communication method for tracking and reporting
on progress.
3.11
Scalability, Reliability, and Usability
As the scale of the hardware that is accessible via the OSG increases, we need to continuously
ensure that the performance of the middleware is adequate to meet the demands. There were four major goals in this area for the last year, and they were met through close collaboration among developers, user communities, and OSG.
1. At the job submission client level, the goal is 20,000 jobs running simultaneously and
200,000 jobs run per day from a single client installation, while achieving a success rate in excess of 95%. The job submission client goals were met in collaboration with
CMS, CDF, Condor, and DISUN, using glideinWMS. This was done via a mix of controlled
environment and large scale challenge operations across the entirety of the WLCG. For the
controlled environment tests, we developed an “overlay grid” for large scale testing on top of
the production infrastructure. This test infrastructure provides in excess of 20,000 batch slots
across a handful of OSG sites. An initial large-scale challenge operation was done in the
context of CCRC08, the 2008 LHC computing challenge. Condor scalability limitations
across large latency networks were discovered and this led to substantial redesign and
reimplementation of core Condor components, and subsequent successful scalability testing
with a CDF client installation in Italy submitting to the CMS server test infrastructure on
OSG. Testing with this testbed exceeded the scalability goal of 20,000 running
simultaneously and 200,000 jobs per day across the Atlantic. This paved the way for
production operations to start in CMS across the 7 Tier-1 centers. Data Analysis Operations
across the roughly 50 Tier-2 and Tier-3 centers available worldwide today is more
challenging, as expected, due to the much more heterogeneous level of support at those
centers. However, during STEP09, the 2009 LHC data analysis computing challenge, a
single glideinWMS instance exceeded 10000 jobs running simultaneously over about 50 sites
worldwide with a success rate in excess of 95%. Success rate here excludes storage as well as
application failures. For the coming years, CMS is expecting to have access to up to 40,000
batch slots across WLCG, so Condor is making further modifications to reach that goal. CMS
also requested that glideinWMS become more firewall-friendly, which also affects Condor functionality. We are committed to helping the Condor team by performing functionality and
scalability tests.
2. At the storage scheduling level, the present goal is to have 50 Hz file-handling rates, significantly up from the 1 Hz rate goal of last year. An SRM scalability of up to 100 Hz was achieved using BeStMan and HadoopFS at UNL, Caltech, and UCSD. dCache-based SRM is, however, still limited to below 10 Hz. The increase in the goal was needed in order to cope with the increasing scale of operations by the large LHC VOs ATLAS and CMS. The driver here is stage-out of files produced during data analysis. The large VOs find that the dominant source of failure in data analysis is the stage-out of results, followed by read-access problems as the second most likely failure. In addition, storage reliability is receiving
much more attention now, given its measured impact on job success rate. In a way, the
impact of jobs on storage has become more and more of a visible issue in part because of the
large improvements in the submission tools, monitoring, and error accounting within the last
two years. The improvements in submission tools coupled with the increased scale of
resources available is driving up the load on storage. The improvements in monitoring and
error accounting are allowing us to fully identify the sources of errors. OSG is very actively
engaged in understanding the issues involved, working with both the major stakeholders as
well as partner grids.
3. At the functionality level, this year’s goal was to improve the capability of opportunistic
space use. Opportunistic storage has been deployed at several dCache sites on OSG and was successfully used by D0 for production operations on OSG. CDF is presently in the testing stage for adapting opportunistic storage into their production operations on OSG. However, a lot of work remains before opportunistic storage is an easy-to-use and widely available capability of OSG.
4. OSG has successfully transitioned to a “bridge model” with regard to WLCG for its
information, accounting, and availability assessment systems. This implies that there are
aggregation points at the OSG GOC via which all of these systems propagate information
about the entirety of OSG to WLCG. For the information system this implies a single point
of failure, the BDII at the OSG GOC. If this service fails then all resources on OSG
disappear from view. ATLAS and CMS have chosen different ways of dealing with this.
While ATLAS maintains its own “cached” copy of the information inside Panda, CMS
depends on the WLCG information system. To understand the impact of the CMS choice,
OSG has done scalability testing of the BDII. We find that the service is reliable up to a
query rate of 10 Hz. The OSG GOC has deployed monitoring of the query rate of the
production BDII in response to this finding, in order to understand the operational risk
implied by this single point of failure. The monitoring data of the current year showed a
stable query rate of about 2 Hz, so while we will continue to monitor the BDII, we do not expect any additional effort to be needed. (A minimal sketch of this query-timing approach follows this list.)
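The following is the minimal sketch of the query-timing approach referenced in item 4. The BDII is an LDAP service (conventionally served on port 2170), so a simple estimate of the rate one client can sustain is obtained by timing repeated ldapsearch queries. The host name and search filter below are placeholders, and the actual scalability tests were considerably more extensive.

    #!/usr/bin/env python
    """Illustrative sketch only: timing repeated LDAP queries against a BDII
    instance to estimate the query rate one client can sustain.  The host name
    and search base are placeholders, not the production GOC BDII."""

    import subprocess
    import time

    BDII_HOST = "bdii.example.org"             # placeholder host
    SEARCH_BASE = "mds-vo-name=local,o=grid"   # conventional BDII search base

    def one_query():
        """Run a single anonymous ldapsearch and return its wall-clock time."""
        start = time.time()
        subprocess.run(
            ["ldapsearch", "-x", "-LLL", "-h", BDII_HOST, "-p", "2170",
             "-b", SEARCH_BASE, "objectClass=GlueCE", "GlueCEUniqueID"],
            stdout=subprocess.DEVNULL, check=True)
        return time.time() - start

    if __name__ == "__main__":
        times = [one_query() for _ in range(20)]
        print("mean query time: %.2f s  (~%.1f Hz from one client)"
              % (sum(times) / len(times), len(times) / sum(times)))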
In addition, we have continued to work on a number of lower priority objectives:
a. Globus has started working on a new release of GRAM, called GRAM5. OSG is committed
to work with them on integration and large scale testing. We have been involved in the
validation of alpha and beta releases, providing feedback to Globus ahead of the official
release, in order to improve the quality of the released software. This also involved working
with Condor on the client side, since all OSG VOs rely on Condor for submission to GRAM-based CEs. GRAM5 is now supported in Condor version 7.3.2 and above. We plan to
continue integration and scalability testing after the release of GRAM5 in order to make it a
fully supported CE in the OSG software stack.
b. WLCG has expressed interest in deploying CREAM on WLCG sites in Europe and Asia.
CREAM is a CE developed by INFN in EGEE. Since both ATLAS and CMS use Condor as
the client to Grid resources and run jobs on WLCG resources worldwide, OSG has agreed to
work with Condor to integrate CREAM support. The OSG activity involved testing of
Condor development releases against a test CREAM instance provided by INFN. As a result,
CREAM is now supported in Condor version 7.3.2 and above. We are now in the process of
running scalability tests against CREAM, by deploying a CREAM instance at UCSD and
interfacing it to the test cluster of 2000 batch slots. This will allow us to both validate the
Condor client, as well as to evaluate CREAM as a possible CE candidate for OSG. We
expect this work to be completed in early 2010. We are also waiting for CREAM deployment
on the production infrastructure on EGEE to allow for realistic large scale testing.
c. WLCG also uses the ARC CE; ARC is the middleware developed at and deployed on
NorduGrid. We have been involved in testing the Condor client against ARC resources,
which resulted in several improvements in the Condor code. For the next year, we plan to
deploy an ARC CE at an OSG site and perform large scale scalability tests, with the aim to
both validate the Condor client, as well as to evaluate ARC as a possible CE candidate for
OSG.
d. CMS has approached OSG for help in understanding the I/O characteristics of CMS
analysis jobs. We helped by providing advice and expertise which resulted in much improved
CPU efficiencies of CMS software on OSG sites. CMS is continuing this work by testing all
of their sites worldwide. OSG will continue to be involved in these tests at a consulting level, in order to learn from the work and document it for a wider audience.
e. In the area of usability, an “operations toolkit” for dCache was improved. The intent of the
toolkit is to provide a “clearing house” of operations tools that have been developed at
experienced dCache installations, and derive from that experience a set of tools for all
dCache installations supported by OSG. This is significantly decreasing the cost of
operations and has lowered the threshold of entry. Site administrators from both the US and
Europe continue to contribute tools, resulting in several releases. These releases have been
downloaded by a number of sites, and are in regular use across the US, as well as some
European sites.
f. Another usability milestone was the creation of a package integrating BeStMan and Hadoop.
By providing integration and support for this package, we expect to greatly reduce the effort
needed by OSG sites to deploy it; CMS Tier-2s and Tier-3s in particular have expressed
interest in using it. In the next year we plan to further improve it based on feedback received
from the early adopters.
g. A package containing a set of procedures that allow us to automate scalability and robustness
tests of a Compute Element has been created. The intent is to be able to quickly “certify” the
performance characteristics of new middleware, a new site, or deployment on new hardware.
The package has already been used to test GRAM5 and CREAM (see the sketch after this list). Moreover, we want to offer this as a service to our resource providers so that they can assess the performance of their deployed or soon-to-be-deployed infrastructure.
h. Work has started on putting together a framework for using Grid resources for performing
consistent scalability tests against centralized services, like CEs and SRMs. The intent is
similar to the previous package, to quickly “certify” the performance characteristics of new
middleware, a new site, or deployment on new hardware, but using hundreds of clients
instead of one. Using Grid resources allows us to achieve this, but requires additional
synchronization mechanisms to be performed in a reliable and repeatable manner.
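As a rough illustration of the certification idea in items (g) and (h), the sketch below generates a burst of simple Condor-G probe jobs aimed at a compute element and submits them; the success rate can then be tallied from the job log. The gatekeeper contact string is a placeholder, and the actual OSG package performs a far broader set of checks.

    #!/usr/bin/env python
    """Illustrative sketch only: the general shape of an automated CE
    "certification" run -- submit a burst of simple Condor-G probe jobs at a
    compute element and count how many complete.  The gatekeeper name is a
    placeholder and the real OSG package performs many more checks."""

    import subprocess

    GATEKEEPER = "ce.example.org/jobmanager-condor"   # placeholder CE contact
    NUM_JOBS = 50

    SUBMIT_FILE = """universe        = grid
    grid_resource   = gt2 %s
    executable      = /bin/hostname
    transfer_executable = false
    output          = certify.out.$(Cluster).$(Process)
    error           = certify.err.$(Cluster).$(Process)
    log             = certify.log
    queue %d
    """ % (GATEKEEPER, NUM_JOBS)

    if __name__ == "__main__":
        with open("certify.sub", "w") as handle:
            handle.write(SUBMIT_FILE)
        # Submit the burst; the success rate can then be read back from
        # certify.log, e.g. by counting job termination events.
        subprocess.run(["condor_submit", "certify.sub"], check=True)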
3.12
Workload Management System
The primary goal of the OSG Workload Management System (WMS) effort is to build, integrate,
test and support operation of a flexible set of software tools and services for efficient and secure
distribution of workload among OSG sites. There are currently two suites of software utilized for
that purpose within OSG: Panda and glideinWMS, both drawing heavily on Condor software.
The Panda system continued as a supported WMS service for the Open Science Grid, and served
as a crucial infrastructure element of the ATLAS experiment at the LHC. We completed the migration of the Panda software to an Oracle database backend, which enjoys strong support from major OSG stakeholders and allows us to host an instance of the Panda server at CERN, where ATLAS is located, creating efficiencies in support and operations. Work has started on the Panda
monitoring system upgrade, which will allow for a better integration with ATLAS Production
Dashboard, as well as for easier implementation of user interfaces suitable for Panda applications
outside of ATLAS. The Panda system has proven itself in a series of challenging scalability tests
conducted in the months preceding the LHC run in the fall of 2009.
To foster wider adoption of Panda in the OSG user community, we created a prototype of a data
service that will make Panda easier for individual users to use, by providing a Web-based user interface for uploading and managing input and output data, and a secure backend that allows Panda pilot jobs to both download and transmit data as required by the Panda workflow. No additional software is required on the users' desktop PCs or lab computers, and this will be helpful for smaller research groups, who may lack the manpower to support larger software stacks. This, in part, helped us start collaborating with the BNL research group of the Long Baseline Neutrino Experiment at DUSEL, which expressed strong interest in using Panda to meet its science goals. Other potential users include an interdisciplinary group at the University of Wisconsin-Milwaukee, who are
starting work on an advanced laser spectrometry system.
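A minimal sketch of the pilot-side interaction such a data service enables is shown below, assuming a hypothetical HTTPS endpoint and authentication with the job's X.509 proxy; it does not reproduce the actual service interface.

    #!/usr/bin/env python
    """Illustrative sketch only: how a pilot job might fetch input from, and
    return output to, a Web-based data service of the kind prototyped here.
    The URLs are placeholders and authentication with the job's X.509 proxy
    is assumed; the real service interface is not reproduced."""

    import os
    import requests

    SERVICE = "https://pandadata.example.org"          # hypothetical endpoint
    PROXY = os.environ.get("X509_USER_PROXY", "/tmp/x509up_u%d" % os.getuid())
    CA_PATH = "/etc/grid-security/certificates"        # site CA directory

    def fetch_input(dataset, filename):
        """Download one input file staged by the user through the Web UI."""
        url = "%s/input/%s/%s" % (SERVICE, dataset, filename)
        reply = requests.get(url, cert=PROXY, verify=CA_PATH)
        reply.raise_for_status()
        with open(filename, "wb") as handle:
            handle.write(reply.content)

    def store_output(dataset, filename):
        """Upload a produced file so the user can retrieve it from the Web UI."""
        url = "%s/output/%s/%s" % (SERVICE, dataset, filename)
        with open(filename, "rb") as handle:
            requests.put(url, data=handle, cert=PROXY,
                         verify=CA_PATH).raise_for_status()

    if __name__ == "__main__":
        fetch_input("example-dataset", "input.dat")
        # ... run the payload ...
        store_output("example-dataset", "result.dat")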
We started using Panda to facilitate the ITB (Integration Testbed) activity in OSG, which allows
site administrators to monitor test job execution from a single Web location and have test results
automatically documented via the Panda logging mechanism.
Progress was made with the glideinWMS system toward the project goal of pilot-based
large-scale workload management. Version 2.0 has been released and is capable of servicing
multiple virtual organizations with a single deployment. The FermiGrid facility has expressed
interest in putting this in service for VOs based at Fermilab. Experiments such as CMS, CDF and
MINOS are currently using glideinWMS in their production activities. The IceCube project recently used the glideinWMS facility at UCSD to run their jobs on OSG resources. Discussions
are underway with new potential adopters including DZero and CompBioGrid. We also continued the maintenance of gLExec (user ID management software), a collaborative effort with EGEE, as a project responsibility.
In the area of WMS security enhancements, we completed integration of gLExec into Panda. It is
also actively used in glideinWMS. In addition to giving the system more flexibility from a security
and authorization standpoint, this also allows us to maintain a high level of interoperability of the
OSG workload management software with our WLCG collaborators in Europe, by following a
common set of policies and using compatible tools, thus enabling both Panda and glideinWMS
to operate transparently in both domains. An important part of this activity was the integration
test of a new suite of user authorization and management software (SCAS) developed by WLCG,
which involved testing upgrades of gLExec and its interaction with site infrastructure. We made
a significant contribution to that activity and continue to actively support further work in that area.
Work continued on the development of the Grid User Management System (GUMS) for OSG. This is an identity mapping service that allows sites to operate on the Grid while relying on traditional
methods of user authentication, such as UNIX accounts or Kerberos. Based on our experience
with GUMS in production since 2004, a number of new features have been added which enhance
its usefulness for OSG. In this reporting period, we faced and addressed the following challenges:

The absence of a lightweight, easy-to-maintain data transfer mechanism for jobs executed in
the Panda pilot-based framework. This was resolved by creating such a system, which is now
available to users.

The legacy nature of certain parts of the Panda information system, notably monitoring,
which impeded integration with the experiment-wide dashboard software of ATLAS and required specialized user interfaces in other cases. We addressed this by initiating work to upgrade Panda monitoring, moving it to a modern platform and a modular structure with a network API.

The lack of awareness among potential entrants to OSG of the capabilities and advantages of the Workload Management Systems run by our organization. This will be rectified by producing a set of comprehensive documentation available from the OSG Web site.
This program of work continues to be important for the science community and OSG for several
reasons. First, having a reliable WMS is a crucial requirement for a science project involving
large scale distributed computing which processes vast amounts of data. A few of the OSG key
stakeholders, in particular LHC experiments ATLAS and CMS, fall squarely in that category,
and the Workload Management Systems supported in the OSG serve as a key enabling factor for
these communities. Second, drawing new entrants to OSG will provide the benefit of access to opportunistic resources to organizations that otherwise would not be able to achieve their research
goals. As more improvements are made to the system, Panda will be in a position to serve a wider spectrum of science disciplines.
3.13
Internet2 Joint Activities
Internet2 collaborates with OSG to develop and test a suite of tools and services that make it easier for OSG sites to support their widely distributed user community. A second goal is to leverage the work within OSG to create scalable solutions that will benefit the entire Internet2 membership. Identifying and resolving performance problems continues to be a major challenge for
OSG site administrators. A complication in resolving these problems is that lower than expected
performance can be caused by problems in the network infrastructure, the host configuration, or
the application behavior. Advanced tools can quickly isolate problem(s) and will go a long way
toward improving the grid user experience and making grids more useful to science communities.
In the past year, Internet2 has worked with the OSG software developers to incorporate several
advanced network diagnostic tools into the VDT package. These client programs interact with
perfSONAR-based servers deployed on the Internet2 and ESnet backbones to allow on-demand
testing of poorly performing sites. By enabling OSG site administrators and end users to test any
individual compute or storage element in the OSG environment, we can reduce the time it takes
to begin the network troubleshooting process. It also allows site administrators or users to quickly determine whether a performance problem is due to the network, a host configuration issue, or
application behavior.
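As a simple illustration of such on-demand testing, the sketch below wraps one of the client tools (here the bwctl throughput client, assuming it is installed locally) to run a test against a perfSONAR measurement host; the server name is a placeholder.

    #!/usr/bin/env python
    """Illustrative sketch only: run an on-demand throughput test from a worker
    or storage node toward a perfSONAR measurement host, in the spirit of the
    client tools distributed with the VDT.  Assumes the bwctl client is
    installed locally; the server name is a placeholder."""

    import subprocess
    import sys

    def throughput_test(server="ps.example.org"):
        """Invoke bwctl with the remote perfSONAR host as the far endpoint and
        print its output so the admin can judge whether the path underperforms."""
        result = subprocess.run(["bwctl", "-c", server],
                                capture_output=True, text=True)
        print(result.stdout)
        if result.returncode != 0:
            print("bwctl failed:", result.stderr, file=sys.stderr)
        return result.returncode

    if __name__ == "__main__":
        server = sys.argv[1] if len(sys.argv) > 1 else "ps.example.org"
        sys.exit(throughput_test(server))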
In addition to deploying client tools via the VDT, Internet2 staff, working with partners in the
US and internationally, have created a simple live-CD distribution mechanism for the server side
of these tools (perfSONAR-Performance-Toolkit). This bootable CD allows an OSG site-admin
to quickly stand up a perfSONAR-based server to support the OSG users. These perfSONAR
hosts automatically register their existence in a distributed global database, making it easy to find
new servers as they become available. Internet2 staff also identified an affordable 1U rack-mountable computer that can be used to run this server software. OSG site administrators can
now order this standard hardware, ensuring that they can quickly get started with a known good
operating environment.
These servers provide two important functions for OSG site administrators. First, they provide an end point for the client tools deployed via the VDT package; OSG users and site administrators can run on-demand tests to begin troubleshooting performance problems. Second, they host regularly scheduled tests between peer sites, allowing a site to continuously monitor the network performance between itself and the peer sites of interest. The USATLAS community has deployed perfSONAR hosts and is currently using them to monitor network performance between the Tier-1 and Tier-2 sites. Internet2 has attended weekly USATLAS calls to provide on-going support of these deployments, and has released regular bug fixes. Additionally, the USCMS community has recently begun to deploy perfSONAR-Performance-Toolkit-based hosts to evaluate their usefulness for that community of users. Finally, on-demand testing and regular monitoring can be performed both to peer sites and to the Internet2 or ESnet backbone networks using either the client tools or the perfSONAR servers. Internet2 will continue to interact with the OSG admin community to learn ways to improve this distribution mechanism.
Another major task for Internet2 is to provide training on the installation and use of these tools
and services. In the past year Internet2 has participated in several OSG site-admin workshops,
the annual OSG all-hands meeting, and interacted directly with the LHC community to determine how the tools are being used and what improvements are required. Internet2 has provided
hands-on training in the use of the client tools, including the command syntax and interpreting
the test results. Internet2 has also provided training in the setup and configuration of the
perfSONAR server, allowing site-administrators to quickly bring up their server. Finally, Internet2 staff has participated in several troubleshooting exercises; this includes running tests, interpreting the test results and guiding the OSG site-admin through the troubleshooting process.
3.14
ESNET Joint Activities
OSG critically depends on ESnet for the network fabric over which data is transferred to and
from the Laboratories and to/from LIGO Caltech (by specific MOU). ESnet is part of the collaboration delivering and supporting the perfSONAR tools that are now in the VDT distribution.
OSG makes significant use of ESnet's collaborative tools for telephone and video meetings. ESnet and OSG are starting discussions toward collaborating on testing of the 100-Gigabit network testbed as it becomes available.
OSG is the major user of the ESnet DOE Grids Certificate Authority for the issuing of X.509 certificates (Figure 19). Registration, renewal, and revocation are done through the OSG Registration Authority and ESnet-provided web interfaces. ESnet and OSG collaborate on the user interface tools needed by the OSG stakeholders for management and reporting of certificates.
Figure 19: Number and distribution of valid certificates currently in use by OSG communities.
OSG and ESnet have worked on a contingency plan to organize responses to an incident that
would make the issued certificates untrustworthy. We also partner as members of the identity
management accreditation bodies in America (TAGPMA) and globally (International Grid Trust
Federation, IGTF).
OSG and ESnet jointly organized a workshop on Identity Management in November with two complementary goals: to look broadly at the identity management landscape and evolving trends regarding identity in the web arena, and to gather input and requirements from the OSG communities about their current issues and expected future needs. A main result of the analysis of web-based technologies is that the ability to delegate responsibility, which is essential for grid computing, is just beginning to become a feature of web technologies, and still only for interactive timescales. A significant result from gathering input from the users is that the communities tend
to either be satisfied with the current identity management functionality or are dissatisfied with
the present functionality and see a strong need to have more fully integrated identity handling
across the range of collaborative services used for their scientific research. The results of this
workshop and requirements gathering survey are being used to help plan the future directions for
work in this area.
4.
Training, Outreach and Dissemination
4.1
Training and Content Management
During 2009, OSG began an evolution of its Education and Training area from a combination of
general grid technology education and more specific training toward a focus on training in creation and use of grid technologies. This shift and reduction in scope in education and training was
undertaken to accommodate an increase in the effort needed to improve the quality and content relevance of our documentation and training material.
Consistent with the change in focus, Content Management has been added to the area with the
realization that improved documentation forms the basis for improved training. A study of OSG
documentation, conducted in the spring of 2009, involved interviews of 29 users and providers of
OSG documentation and subsequent analysis of the results to produce recommendations for improvement. These study recommendations were reviewed and analyzed in a two-day workshop
of relevant experts. The implementation of those recommendations began in October, 2009 and
will continue through 2010.
The OSG Training program brings domain scientists and computer scientists together to provide
a rich training ground for the engagement of students, faculty and researchers in learning the
OSG infrastructure, applying it to their discipline and contributing to its development.
During 2009, OSG sponsored and conducted numerous training events for students and faculty
and participated in additional invited workshops. The result was that grid computing training
reached about 200 new professionals. Training organized and delivered by OSG in 2009 is identified in the following table:
Workshop                            Length    Location            Month
Chile Grid Workshop                 3 days    Santiago, Chile     January
OSG Site Administrators Workshop    2 days    Clemson, SC         March
New Mexico Grid School              3 days    Albuquerque, NM     April
North Carolina Grid School          3 days    Chapel Hill, NC     April
Site Administrators Workshop        2 days    Indianapolis, IN    August
Colombia Grid Workshop              2 weeks   Bogotá, Colombia    October
OSG staff also participated as keynote speakers, instructors, and/or presenters at four venues in
2009 as detailed in the following table:
Venue                                           Length       Location                     Month
International Winter School on Grid Computing   8 sessions   Online - 6 wks               Feb-Mar
Easter recess UJ Research Cluster               6 days       Johannesburg, South Africa   April
International Summer School on Grid Computing   11 days      Nice, France                 July
Grace Hopper Conference 09                      4 days       Tucson, AZ                   Oct
OSG was a co-organizer of the International Summer School on Grid Computing (ISSGC09).
The OSG Education team arranged sponsorship (from NSF) for US-based students to attend this
workshop; we selected and prepared ten US students who participated in the school. In addition,
OSG staff provided direct contributions to the International Grid School by attending, presenting,
and being involved in lab exercises, development, and student engagement. Another aspect of the
EGEE-OSG collaboration at the education level involves the participation in the International
Winter School on Grid Computing (IWSGC09); US graduates of these grid schools include system administrators of CMS Tier-3 sites.
The content of the training material has been enhanced over the past year to include an updated
module targeting new OSG Grid site administrators. Work continues to enhance the on-line delivery of user training and we are seeing an increased number of students and faculty that are
signing up for the self-paced online course (about 50 active participants). In addition to individuals, we have made the course accessible as support material for graduate courses on grid computing at universities around the country, such as Rochester Institute of Technology, University
of Missouri at St. Louis and Clemson University.
The Content Management project has defined a collaborative process for managing production of
high-quality documentation, defined document standards and templates, reviewed and identified
documents for improvement or elimination, begun improvement of storage documentation, assisted in production of ATLAS and CMS Tier-3 documentation, and begun reorganization of
documentation access by user role. This work will continue through 2010.
4.2
Outreach Activities
We present a selection of the presentations and book publications from OSG in the past year:

Joint EGEE and OSG workshop at the High Performance and Distributed Computing conference (HPDC 2009): “Workshop on Monitoring, Logging and Accounting (MLA) in Production Grids.” http://indico.fnal.gov/conferenceDisplay.py?confId=2335

Presentation at BIO-IT WORLD CONFERENCE & EXPO 2009, Ramping Up Your Computational Science to a Global Scale on the Open Science Grid http://www.bioitworldexpo.com/

Presentation to International Committee for Future Accelerators (ICFA) http://wwwconf.slac.stanford.edu/icfa2008/Livny103008.pdf

Contributions to the DOE Grass Roots Cyber Security R&D Town Hall white paper.

Book contribution for “Models and Patterns in Production Grids” being coordinated by
TeraGrid. http://osg-docdb.opensciencegrid.org/0008/000800/001/OSG-production-V2.pdf

Workshop at Grace Hopper conference. http://gracehopper.org/2008/assets/GHC2008Program.pdf
Members of the OSG are participating in workshops to chart the future of cyberinfrastructure in
the US:

The NSF Task Force on Campus Bridging (Miron Livny, John McGee)

Software sustainability (Miron Livny, Ruth Pordes)

DOE Cybersecurity (Mine Altunay)

ESnet requirements gathering (Miron Livny, Frank Wuerthwein)

HPC Best Practices Workshop (Alain Roy).
In the area of international outreach, we were active in South America and maintained our connection to the completed site in South Africa. OSG staff conducted grid training schools at the University of Johannesburg in April 2009, followed by a series of encounters with domain scientists with the goal of providing guidance in using cluster and grid computing techniques to advance their research. In South America, a partnership with the GridUNESP regional grid was agreed to, and Horst Severini from the University of Oklahoma accepted the request to be the OSG liaison. He has attended two workshops in São Paulo to help the community establish their regional infrastructure supported by OSG services, software, and experienced staff. OSG
provided the staff for teaching a 2-week course in Colombia, again to support the next steps in
the implementation of their regional cyberinfrastructure.

Co-sponsorship of the International Summer School on Grid Computing in France
(http://www.issgc.org/). OSG sponsored 10 students to attend the 2-week workshop and provided a keynote speaker and 3 teachers for lectures and hands-on exercises. Six of the students have responded to our follow-up queries and are continuing to use cyberinfrastructure – local, OSG, and TeraGrid.

Continued co-editorship of the highly successful International Science Grid This Week newsletter, www.isgtw.org. OSG is very appreciative that DOE and NSF have been able to supply funds matching the European effort starting in January 2009. A new full-time editor
was hired effective July 2009. Future work will include increased collaboration with
TeraGrid.

Presentations at the online International Winter School on Grid Computing
http://www.iceage-eu.org/iwsgc09/index.cfm
4.3
Internet dissemination
OSG co-sponsors the weekly electronic newsletter called International Science Grid This Week
(http://www.isgtw.org/); this is a joint collaboration with the Enabling Grids for E-sciencE (EGEE) project in Europe. Additional contributions, as well as a member of the editorial board,
come from the TeraGrid. The newsletter has been very well received, having just published 152
issues with subscribers totaling approximately 5,641, an increase of over 35% in the last year.
This newsletter covers the global spectrum of science, and projects that support science, using
distributed computing. In the last year, due to the increased support from NSF and DOE, we
have been able to increase the US effort contribution and thus improve our ability to showcase
US contributions.
In addition, the OSG has a comprehensive web site to inform and guide stakeholders and new
users of the OSG at http://www.opensciencegrid.org. As part of this we solicit and publish research highlights from our stakeholders; research highlights for the last year are accessible via
the following links:

Linking grids uncovers genetic mutations
Superlink-online helps geneticists perform compute-intensive analyses to discover disease-causing anomalies. August 2009

Grid helps to filter LIGO’s data
Researchers must process vast amounts of data to look for signals of gravitational waves, minute cosmic ripples that carry information about the motion of objects in the universe. August 2009

Grid-enabled virus hunting
Scientists use distributed computing to compare sequences of DNA in order to identify new
viruses. June 2009

Clouds make way for STAR to shine
The STAR experiment, running compute-intensive analyses on the OSG to study nuclear decay, has successfully made use of cloud computing as well. April 2009

Single and Playing Hard to Get
High-energy physicists recently discovered the top quark produced as a single particle. The
tools and techniques that made this discovery possible advance the goal of observing the
Higgs particle. March 2009
The abovementioned research highlights and those published in prior years are available at
http://www.opensciencegrid.org/About/What_We%27re_Doing/Research_Highlights.
5. Participants
5.1 Organizations
The members of the Council and the list of project organizations:
1. Boston University
2. Brookhaven National Laboratory
3. California Institute of Technology
4. Clemson University
5. Columbia University
6. Distributed Organization for Scientific and Academic Research (DOSAR)
7. Fermi National Accelerator Laboratory
8. Harvard University (Medical School)
9. Indiana University
10. Lawrence Berkeley National Laboratory
11. Purdue University
12. Renaissance Computing Institute
13. Stanford Linear Accelerator Center (SLAC)
14. University of California San Diego
15. University of Chicago
16. University of Florida
17. University of Illinois Urbana Champaign/NCSA
18. University of Nebraska – Lincoln
19. University of Wisconsin, Madison
20. University of Buffalo (council)
21. US ATLAS
22. US CMS
23. STAR
24. LIGO
25. CDF
26. D0
27. Condor
28. Globus
5.2 Partnerships and Collaborations
The OSG relies on external project collaborations to develop the software to be included in the
VDT and deployed on OSG. Our collaborations are continuing with: Community Driven Improvement of Globus Software (CDIGS), SciDAC-2 Center for Enabling Distributed Petascale
Science (CEDPS), Condor, dCache collaboration, Data Intensive Science University Network
(DISUN), Energy Sciences Network (ESNet), Internet2, National LambdaRail (NLR),
BNL/FNAL Joint Authorization project, LIGO Physics at the Information Frontier, Fermilab
Gratia Accounting, SDM project at LBNL (BeStMan), SLAC Xrootd, Pegasus at ISI, U.S. LHC
software and computing.
OSG also has close working arrangements with “Satellite” projects, defined as independent projects contributing to the OSG roadmap, with collaboration at the leadership level. Four new satellite projects are:
 High Throughput Parallel Computing (HTPC) on OSG resources for an emerging class of
applications in which large ensembles (hundreds to thousands) of modestly parallel (4- to ~64-way) jobs are run.
 Application testing over the ESnet 100-Gigabit network prototype, using the storage and
compute end-points supplied by the Magellan cloud computing testbeds at ANL and NERSC.
 CorralWMS, to enable user access to provisioned resources and “just-in-time” available resources for a single workload; this will integrate and build on previous work on OSG's GlideinWMS
and Corral, a provisioning tool used to complement the Pegasus WMS on TeraGrid.
 VOSS: “Delegating Organizational Work to Virtual Organization Technologies: Beyond the
Communications Paradigm” (OCI-funded, NSF 0838383).
The European EGEE infrastructure will transition to several separate projects in March 2010.
OSG has been active in talking to the new leadership to ensure a smooth transition that does not
impact our stakeholders. We hold several joint EGEE-OSG-WLCG technical meetings a year.
At the last one, in December 2009, the following list of existing contact points/activities was revisited in light of the organizational changes.
Continuing within EGI-InSPIRE:
 The EGI Helpdesk (previously GGUS), which will continue to be run by the same team.
 Grid Operations Center ticketing systems interfaces.
 Security Incident Response.
 Joint Monitoring Group (MyOSG, MyEGEE). MyEGI development undertaken by CERN
(on behalf of EGI-InSPIRE) will establish a relationship with IU.
 Interoperations (possibly including Interoperability) Testing.
 Software Security Validation. [Security Vulnerability Group continues, including EMI]
 Joint Operations meetings and collaboration [May stop naturally].
 Joint Security Policy Group. [Security Policy Group]
 IGTF.
 Infrastructure Policy Working Group. [European e-Infrastructure Forum as well]
 Middleware Security Working Group. [Software Security Group - EGI - EMI representation]
 Dashboards. [HUC EGI-InSPIRE]
In the context of the WLCG Collaboration:
 WLCG Management Board.
 WLCG Grid Deployment Board.
 WLCG Technical Forum.
 Accounting Gratia-APEL interfaces.
 MyOSG-SAM interface [Exchange/Integration of OSG monitoring records].
 Xrootd collaboration and support.
European Middleware Initiative:
 Storage Resource Manager (SRM) specification, interfaces and testing.
 dCache.org collaboration.
 Glexec
 Authorization projects.
 OGF Production Grid Infrastructure Working Group.
 Virtual Data Toolkit support and Engineering Management Team.
We are starting to work with TeraGrid to better understand the scope of our joint activities.
These activities are summarized in Table 4.
Table 4: Joint OSG – TeraGrid activities
Task Mission/Goals | OSG Owner | TG Owner | Stage
OSG & TG Joint Activity Tracking and Reporting | Chander Sehgal | Tim Cockerill | Active
iSGTW continuation and joint support plan; proposal for US based contributions | Judy Jackson | Elizabeth Leake | Planning
Resource Allocation Analysis and Recommendations | Miron Livny, Chander Sehgal | Kent Milfeld | Planning
Resource Accounting Inter-operation and Convergence Recommendations | Phillip Canal, Brian Bockelman | Kent Milfeld, David Hart | Planning
Explore how campus outreach and activities can be coordinated | John McGee, Ruth Pordes | Kay Hunt, Scott Lathrop | Active
SCEC Application to use both OSG & TG | John McGee, Mats Rynge | Dan Katz | Active
Joint Middleware Distributions (Client side) | Alain Roy, ? | Lee Liming, J.P. Navarro | Active
Security Incidence Response | Mine Altunay, Jim Barlow | Jim Marsteller, Von Welch | Active
Workforce Development | Ruth Pordes, Miron Livny | John Towns, Scott Lathrop | Planning
Joint Student Activities (e.g. TG conference, summer schools, etc.) | David Ritchie, Jim Weichel | Scott Lathrop, Laura McGiness | Active
Infrastructure Policy Group (run by Bob Jones, EGEE) | Ruth Pordes, Miron Livny | J.P. Navarro, Phil Andrews | Active
6. Cooperative Agreement Performance
OSG has put in place processes and activities that meet the terms of the Cooperative Agreement
and Management Plan:
 The Joint Oversight Team meets periodically, as scheduled by DOE and NSF, via phone to
hear about OSG progress, status, and concerns. Follow-up items are reviewed and addressed
by OSG as needed.
 Two intermediate progress reports were submitted to NSF in February and June of 2007.
 The Science Advisory Group (SAG) met in June 2007, and the OSG Executive Board has addressed feedback from the group. The SAG membership was revised in
August 2009, and telephone discussions with each member were half complete as of the end of December 2009.
 In February 2008, a DOE annual report was submitted.
 In July 2008, an annual report was submitted to NSF.
 In December 2008, a DOE annual report was submitted.
 In June 2009, an annual report was submitted to NSF.
As requested by DOE and NSF, OSG staff provides proactive support in workshops and collaborative efforts to help define, improve, and evolve the US national cyberinfrastructure.
7. Publications Using OSG Infrastructure
ATLAS experiment
1. arXiv:0912.2642 [ps, pdf, other]
Title: Readiness of the ATLAS Liquid Argon Calorimeter for LHC Collisions
Authors: The ATLAS Collaboration
Comments: 31 pages, 34 figures, submitted to EPJC [v2: Fig. 1 fixed]
Subjects: Instrumentation and Detectors (physics.ins-det); High Energy Physics - Experiment
(hep-ex)
2. arXiv:0910.5156 [ps, pdf, other]
Title: Alignment of the ATLAS Inner Detector Tracking System
Authors: Grant Gorfine, for the ATLAS Collaboration
Comments: To be published in the proceedings of DPF-2009, Detroit, MI, July 2009, eConf
C090726
Subjects: Instrumentation and Detectors (physics.ins-det); High Energy Physics - Experiment
(hep-ex)
3. arXiv:0910.2991 [ps, pdf, other]
Title: Commissioning of the ATLAS Liquid Argon Calorimeters
Authors: Mark S. Cooke, for the ATLAS Collaboration
Comments: Proceedings from the CIPANP 2009 Conference
Subjects: Instrumentation and Detectors (physics.ins-det)
4. arXiv:0910.2767 [pdf, other]
Title: ATLAS Muon Detector Commissioning
Authors: E. Diehl, for the ATLAS muon collaboration
Comments: 6 pages, 18 figures. To be published in the proceedings of DPF-2009, Detroit,
MI, July 2009, eConf C090726
Subjects: Instrumentation and Detectors (physics.ins-det); High Energy Physics - Experiment
(hep-ex)
5. arXiv:0910.0847 [ps, pdf, other]
Title: Results from the Commissioning of the ATLAS Pixel Detector with Cosmic data
Authors: E. Galyaev, for the ATLAS collaboration
Comments: To be published in the proceedings of DPF-2009, Detroit, MI, July 2009, eConf
C090726. Contents: 9 pages, 13 figures, 9 references
Subjects: Instrumentation and Detectors (physics.ins-det); High Energy Physics - Experiment
(hep-ex)
6. arXiv:0910.0097 [pdf, other]
Title: Scalable Database Access Technologies for ATLAS Distributed Computing
Authors: A. Vaniachine, for the ATLAS Collaboration
Comments: 6 pages, 7 figures. To be published in the proceedings of DPF-2009, Detroit, MI,
July 2009, eConf C090726
Subjects: Instrumentation and Detectors (physics.ins-det); Databases (cs.DB); Distributed,
Parallel, and Cluster Computing (cs.DC); High Energy Physics - Experiment (hep-ex)
7. arXiv:0907.0441 [pdf, other]
Title: Readiness of the ATLAS Experiment for First Data
Authors: Thilo Pauly, for the ATLAS Collaboration
Comments: 8 pages, 18 figures, to appear in the Proceedings of the XLIV Rencontres de
Moriond, Electroweak Session, La Thuile, March 11, 2009
Subjects: Instrumentation and Detectors (physics.ins-det)
8. arXiv:0905.3977 [ps, pdf, other]
Title: Results from the commissioning of the ATLAS Pixel detector
Authors: J. Biesiada, for the ATLAS Collaboration
Comments: 3 pages. Submitted to NIM A as part of the proceedings of the TIPP09 conference, held at Tsukuba, Japan
Subjects: Instrumentation and Detectors (physics.ins-det); High Energy Physics - Experiment
(hep-ex)
CMS experiment
1. arXiv:0912.1189 [pdf, other]
Title: Search for SUSY at LHC in the first year of data-taking
Authors: Michele Pioppi on behalf of (the) CMS collaboration
Comments: To be published in the proceedings of MAD09, Antananarivo. 14 pages
Subjects: High Energy Physics - Experiment (hep-ex)
2. arXiv:0911.5434 [pdf, other]
Title: Commissioning and Performance of the CMS Pixel Tracker with Cosmic Ray Muons
Authors: CMS Collaboration
Subjects: Instrumentation and Detectors (physics.ins-det)
3. arXiv:0911.5422 [pdf, other]
Title: Performance of the CMS Level-1 Trigger during Commissioning with Cosmic Ray
Muons
Authors: CMS Collaboration
Subjects: Instrumentation and Detectors (physics.ins-det)
4. arXiv:0911.5397 [pdf, other]
Title: Measurement of the Muon Stopping Power in Lead Tungstate
Authors: CMS Collaboration
Subjects: Instrumentation and Detectors (physics.ins-det)
5. arXiv:0911.4996 [pdf, other]
Title: Commissioning and Performance of the CMS Silicon Strip Tracker with Cosmic Ray
Muons
Authors: CMS Collaboration
Subjects: Instrumentation and Detectors (physics.ins-det)
6. arXiv:0911.4994 [pdf, other]
Title: Performance of CMS Muon Reconstruction in Cosmic-Ray Events
Authors: CMS Collaboration
Subjects: Instrumentation and Detectors (physics.ins-det)
7. arXiv:0911.4992 [pdf, other]
Title: Performance of the CMS Cathode Strip Chambers with Cosmic Rays
Authors: CMS Collaboration
Subjects: Instrumentation and Detectors (physics.ins-det)
8. arXiv:0911.4991 [pdf, other]
Title: Performance of the CMS Hadron Calorimeter with Cosmic Ray Muons and LHC Beam
Data
Authors: CMS Collaboration
Subjects: Instrumentation and Detectors (physics.ins-det)
9. arXiv:0911.4904 [pdf, other]
Title: Fine Synchronization of the CMS Muon Drift-Tube Local Trigger using Cosmic Rays
Authors: CMS Collaboration
Subjects: Instrumentation and Detectors (physics.ins-det)
10. arXiv:0911.4895 [pdf, other]
Title: Calibration of the CMS Drift Tube Chambers and Measurement of the Drift Velocity
with Cosmic Rays
Authors: CMS Collaboration
Subjects: Instrumentation and Detectors (physics.ins-det)
11. arXiv:0911.4893 [pdf, other]
Title: Performance of the CMS Drift-Tube Local Trigger with Cosmic Rays
Authors: The CMS Collaboration
Subjects: Instrumentation and Detectors (physics.ins-det)
12. arXiv:0911.4889 [pdf, other]
Title: Commissioning of the CMS High-Level Trigger with Cosmic Rays
Authors: CMS Collaboration
Subjects: Instrumentation and Detectors (physics.ins-det)
13. arXiv:0911.4881 [pdf, other]
Title: Identification and Filtering of Uncharacteristic Noise in the CMS Hadron Calorimeter
Authors: The CMS Collaboration
Subjects: Instrumentation and Detectors (physics.ins-det)
14. arXiv:0911.4877 [pdf, other]
Title: Performance of CMS Hadron Calorimeter Timing and Synchronization using Cosmic
Ray and LHC Beam Data
Authors: CMS Collaboration
Subjects: Instrumentation and Detectors (physics.ins-det)
15. arXiv:0911.4855 [pdf, other]
Title: Performance of the CMS Drift Tube Chambers with Cosmic Rays
Authors: The CMS Collaboration
Subjects: Instrumentation and Detectors (physics.ins-det)
16. arXiv:0911.4845 [pdf, other]
Title: Commissioning of the CMS Experiment and the Cosmic Run at Four Tesla
Authors: CMS Collaboration
Subjects: Instrumentation and Detectors (physics.ins-det)
17. arXiv:0911.4842 [pdf, other]
Title: CMS Data Processing Workflows during an Extended Cosmic Ray Run
Authors: CMS Collaboration
Subjects: Instrumentation and Detectors (physics.ins-det)
18. arXiv:0911.4770 [pdf, other]
Title: Aligning the CMS Muon Chambers with the Muon Alignment System during an Extended Cosmic Ray Run
Authors: CMS Collaboration
Subjects: Instrumentation and Detectors (physics.ins-det)
19. arXiv:0911.4045 [pdf, other]
Title: Performance Study of the CMS Barrel Resistive Plate Chambers with Cosmic Rays
Authors: CMS Collaboration
Subjects: Instrumentation and Detectors (physics.ins-det)
20. arXiv:0911.4044 [pdf, other]
Title: Time Reconstruction and Performance of the CMS Electromagnetic Calorimeter
Authors: CMS Collaboration
Subjects: Instrumentation and Detectors (physics.ins-det)
21. arXiv:0911.4022 [pdf, other]
Title: Alignment of the CMS Muon System with Cosmic-Ray and Beam-Halo Muons
Authors: CMS Collaboration
Subjects: Instrumentation and Detectors (physics.ins-det)
22. arXiv:0910.5530 [pdf, other]
Title: Precise Mapping of the Magnetic Field in the CMS Barrel Yoke using Cosmic Rays
Authors: The CMS Collaboration
Subjects: Instrumentation and Detectors (physics.ins-det); Accelerator Physics (physics.acc-ph)
23. arXiv:0910.3564 [pdf, other]
Title: Search for Extra Dimensions in the Diphoton Channel
Authors: Selda Esen, CMS Collaboration
Comments: 6 pages, 9 figures, conference proceeding. To be published in the proceedings of
DPF-2009, Detroit, MI, July 2009, eConf C090726
Subjects: High Energy Physics - Experiment (hep-ex)
DZero experiment
1. Determination of the Strong Coupling Constant from the Inclusive Jet Cross Section in pp
Collisions at √s = 1.96 TeV
Published 12/29/09: Phys. Rev. D 80, 111107 (2009), arXiv.org:0911.2710 plots 0.7 fb-1
2. Direct Measurement of the W Boson Width
Published 12/4/09: Phys. Rev. Lett. 103, 231802 (2009), arXiv.org:0909.4814 plots 1.0 fb-1
3. Measurement of the Top Quark Mass in Final States with Two Leptons
Published 11/20/09: Phys. Rev. D 80, 092006 (2009), arXiv.org:0904.3195 plots 1.0 fb-1
4. Measurement of the t-Channel Single Top Quark Production Cross Section
Published 11/19/09: Phys. Lett. B 682, 363 (2010), arXiv.org:0907.4259 plots 2.3 fb-1
5. Measurement of Z/γ*+jet+X Angular Distributions in pp Collisions at √s = 1.96 TeV
Published 11/13/09: Phys. Lett. B 682, 370 (2010), arXiv.org:0907.4286 plots 1.0 fb-1
6. Search for Charged Higgs Bosons in Top Quark Decays
Published 11/12/09: Phys. Lett. B 682, 278 (2009), arXiv.org:0908.1811 plots 1.0 fb-1
7. Measurement of Dijet Angular Distributions at √s = 1.96 TeV and Searches for Quark Compositeness and Extra Spatial Dimensions
Published 11/5/09: Phys. Rev. Lett. 103, 191803 (2009), arXiv.org:0906.4819 plots 0.70 fb-1
8. Measurement of the WW Production Cross Section with Dilepton Final States in pp Collisions at √s = 1.96 TeV and Limits on Anomalous Trilinear Gauge Couplings
Published 11/2/09: Phys. Rev. Lett. 103, 191801 (2009), arXiv.org:0904.0673 plots 1.1 fb-1
9. Combination of tt Cross Section Measurements and Constraints on the Mass of the Top
Quark and its Decay into Charged Higgs Bosons
Published 10/19/09: Phys. Rev. D 80, 071102 (2009), arXiv.org:0903.5525 plots 1.0 fb-1
10. Search for Pair Production of First-Generation Leptoquarks in pp Collisions at √s = 1.96 TeV
Published 10/8/09: Phys. Lett. B 681, 224 (2009), arXiv.org:0907.1048 plots 1.0 fb-1
11. Measurement of the W Boson Mass
Published 10/1/09: Phys. Rev. Lett. 103, 141801 (2009), arXiv.org:0908.0766 plots 1.0 fb-1
12. Search for Charged Higgs Bosons in Decays of Top Quarks
Published 9/30/09: Phys. Rev. D 80, 051107 (2009), arXiv.org:0906.5326 plots 0.90 fb-1
13. Measurement of Trilinear Gauge Boson Couplings from WW + WZ → lνjj Events in pp Collisions at √s = 1.96 TeV
Published 9/23/09: Phys. Rev. D 80, 053012 (2009), arXiv.org:0907.4398 plots 1.1 fb-1
14. Direct Measurement of the Mass Difference Between Top and Antitop Quarks
Published 9/21/09: Phys. Rev. Lett. 103, 132001 (2009), arXiv.org:0906.1172 plots 1.0 fb-1
15. A Novel Method for Modeling the Recoil in W Boson Events at Hadron Colliders
Published 8/27/09: Nucl. Instrum. Methods in Phys. Res. A 609, 250 (2009),
arXiv.org:0907.3713 plots
16. Observation of Single Top-Quark Production PRL Editors' Suggestion "Physics" Synopsis
article
Published 8/24/09: Phys. Rev. Lett. 103, 092001 (2009), arXiv.org:0903.0850 plots in the
paper more plots for talks 2.3 fb-1
17. Search for Dark Photons from Supersymmetric Hidden Valleys
Published 8/17/09: Phys. Rev. Lett. 103, 081802 (2009), arXiv.org:0905.1478 plots 4.1 fb-1
18. Search for Associated Production of Charginos and Neutralinos in the Trilepton Final State
using 2.3 fb-1 of Data
Published 8/13/09: Phys. Lett. B 680, 34 (2009), arXiv.org:0901.0646 plots 2.3 fb-1
19. Search for Resonant Pair Production of Neutral Long-Lived Particles Decaying to bb
in pp Collisions at √s = 1.96 TeV
Published 8/13/09: Phys. Rev. Lett. 103, 071801 (2009), arXiv.org:0906.1787 plots 3.6 fb-1
20. Search for Squark Production in Events with Jets, Hadronically Decaying Tau Leptons
and Missing Transverse Energy at √s = 1.96 TeV
Published 8/7/09: Phys. Lett. B 680, 24 (2009), arXiv.org:0905.4086 plots 1.0 – 2.1 fb-1
21. Search for Next-to-Minimal Supersymmetric Higgs Bosons in the
h → aa → μμ μμ, μμ ττ Channels using pp Collisions at √s = 1.96 TeV
Published 8/3/09: Phys. Rev. Lett. 103, 061801 (2009), arXiv.org:0905.3381 plots 4.2 fb-1
22. Measurement of the tt Production Cross Section and Top Quark Mass Extraction
Using Dilepton Events in pp Collisions
Published 7/21/09: Phys. Lett. B 679, 177 (2009), arXiv.org:0901.2137 plots 1.0 fb-1
23. Search for the Standard Model Higgs Boson in Tau Final States
Published 6/25/09: Phys. Rev. Lett. 102, 251801 (2009), arXiv.org:0903.4800 plots 1.0 fb-1
24. Search for Resonant Diphoton Production with the DØ Detector
Published 6/12/09: Phys. Rev. Lett. 102, 231801 (2009), arXiv.org:0901.1887 plots 2.7 fb-1
25. Relative Rates of B Meson Decays into ψ(2S) and J/ψ
Published 6/10/09: Phys. Rev. D 79, 111102(R) (2009), arXiv.org:0805.2576 plots 1.3 fb-1
26. Measurements of Differential Cross Sections of Z/γ*+jets+X Events in pp Collisions at √s =
1.96 TeV
Published 5/29/09: Phys. Lett. B 678, 45 (2009), arXiv.org:0903.1748 plots 1.0 fb-1
27. Measurement of the Zγ → ννγ Production Cross Section and Limits on Anomalous ZZγ and
Zγγ Couplings
in pp Collisions at √s = 1.96 TeV
Published 5/22/09: Phys. Rev. Lett. 102, 201802 (2009), arXiv.org:0902.2157 plots 2.6 fb-1
28. Search for Charged Higgs Bosons Decaying into Top and Bottom Quarks in pp Collisions
Published 5/14/09: Phys. Rev. Lett. 102, 191802 (2009), arXiv.org:0807.0859 plots 0.90 fb-1
29. Measurement of γ + b + X and γ + c + X Production Cross Sections in pp Collisions at √s =
1.96 TeV
Published 5/12/09: Phys. Rev. Lett. 102, 192002 (2009), arXiv.org:0901.0739 plots 1.0 fb-1
30. Search for Long-Lived Charged Massive Particles with the DØ Detector
Published 4/22/09: Phys. Rev. Lett. 102, 161802 (2009), arXiv.org:0809.4472 plots 1.1 fb-1
31. Evidence of WW+WZ Production with Lepton+Jets Final States in pp Collisions at √s = 1.96
TeV
Published 4/21/09: Phys. Rev. Lett. 102, 161801 (2009), arXiv.org:0810.3873 plots 1.1 fb-1
32. Search for the Lightest Scalar Top Quark in Events with Two Leptons in pp Collisions at √s
= 1.96 TeV
Published 4/17/09: Phys. Lett. B 675, 289 (2009), arXiv.org:0811.0459 plots 1.0 fb-1
33. Search for Anomalous Top-Quark Couplings with the DØ Detector
Published 3/4/09: Phys. Rev. Lett. 102, 092002 (2009), arXiv.org:0901.0151 plots 1.0 fb-1
34. Evidence for the Decay Bs0→Ds(*)Ds(*) and a Measurement of ΔΓsCP/Γs
Published 3/3/09: Phys. Rev. Lett. 102, 091801 (2009), arXiv.org:0811.2173 plots 2.8 fb-1
35. Measurement of the Lifetime of the Bc± Meson in the Semileptonic Decay Channel
Published 3/2/09: Phys. Rev. Lett. 102, 092001 (2009), arXiv.org:0805.2614 plots 1.3 fb-1
36. Search for Admixture of Scalar Top Quarks in the tt Lepton+Jets Final State at √s = 1.96
TeV
Published 2/20/09: Phys. Lett. B 674, 4 (2009), arXiv.org:0901.1063 plots 0.90 fb-1
37. Search for Neutral Higgs Bosons at High tanβ in the b(h/H/A)→bτ+τ− Channel
Published 2/6/09: Phys. Rev. Lett. 102, 051804 (2009), arXiv.org:0811.0024 plots 0.30 fb-1
38. Search for Large Extra Spatial Dimensions in the Dielectron and Diphoton Channels
in pp Collisions at √s = 1.96 TeV
Published 2/6/09: Phys. Rev. Lett. 102, 051601 (2009), arXiv.org:0809.2813 plots 1.0 fb-1
39. Search for Associated W and Higgs Boson Production in pp Collisions at √s = 1.96 TeV
Published 2/4/09: Phys. Rev. Lett. 102, 051803 (2009), arXiv.org:0808.1970 plots 1.1 fb-1
40. Measurement of the Semileptonic Branching Ratio of Bs0 to an Orbitally Excited Ds** State:
Br(Bs0→Ds1−(2536)μ+νX)
Published 2/3/09: Phys. Rev. Lett. 102, 051801 (2009), arXiv.org:0712.3789 plots 1.3 fb-1
41. Measurement of the Angular and Lifetime Parameters of the Decays Bd0→J/ψK*0 and
Bs0→J/ψφ
Published 1/20/09: Phys. Rev. Lett. 102, 032001 (2009), arXiv.org:0810.0037 plots 2.8 fb-1
CDF experiment
1. Measurement of the WW + WZ Production Cross Section Using the Lepton + Jets Final State
at CDF II
T. Aaltonen et al., The CDF Collaboration, submitted to Phys. Rev. Lett. November 24,
2009. Fermilab-Pub-09-593-E. arXiv: 0911.4449.
Abstract, PDF (181KiB). 9964: Added: Monday, 2009 December 07 14:40:47 CST
2. Search for Fermion-Pair Decays qq-bar --> (tW-/+)(t-bar W+/-) in Same-Charge Dilepton
Events
T. Aaltonen et al., The CDF Collaboration, submitted to Phys. Rev. Lett. December 5, 2009.
Fermilab-Pub-09-628-E. arXiv: 0912.1057.
Abstract, PDF (251KiB). 9919: Added: Thursday, 2009 December 17 09:11:25 CST
3. Search for Technicolor Particles Produced in Association with a W Boson at CDF
T. Aaltonen et al., The CDF Collaboration, submitted to Phys. Rev. Lett. December 10, 2009.
Fermilab-Pub-09-619-E. arXiv: 0912.2059.
Abstract, PDF (189KiB). 9907: Added: Thursday, 2009 December 10 13:19:32 CST
4. Measurement of the W+W- Production Cross Section and Search for Anomalous WWgamma
and WWZ Couplings in p anti-p Collisions at s**(1/2) = 1.96 TeV
T. Aaltonen et al., The CDF Collaboration, submitted to Phys. Rev. Lett. December 22, 2009.
Fermilab-Pub-09-635-E. arXiv: 0912.4691
Abstract, PostScript (414KiB). 9902: Added: Friday, 2010 January 08 14:06:29 CST
5. A Search for the Higgs Boson Using Neural Networks in Events with Missing Energy and b-quark Jets in p anti-p Collisions at s**(1/2) = 1.96 TeV
T. Aaltonen et al., The CDF Collaboration, Fermilab-Pub-09-586-E. arXiv: 0911.3935. Submitted to Phys. Rev. Lett. November 19, 2009.
Abstract, PDF (303KiB). 9838: Added: Monday, 2009 December 07 10:20:07 CST
6. Search for Pair Production of Supersymmetric Top Quarks in Dilepton Events from p anti-p
Collisions at s**(1/2) = 1.96 TeV
T. Aaltonen et al., The CDF Collaboration, submitted to Phys. Rev. Lett. December 7, 2009.
Fermilab-Pub-09-614-E.
Abstract, PDF (257KiB). 9823: Added: Monday, 2009 December 07 14:14:38 CST
7. A Study of the Associated Production of Photons and b-quark Jets in p anti-p Collisions at
s**(1/2) = 1.96 TeV
T. Aaltonen et al., The CDF Collaboration, submitted to Phys. Rev. Lett. December 16, 2009.
Fermilab-Pub-09-634-E. arXiv: 0912.3453.
Abstract, PostScript (1069KiB). 9803: Added: Tuesday, 2009 December 22 12:17:10 CST
8. Measurement of the Inclusive Isolated Prompt Photon Cross Section in p anti-p Collisions at
s**(1/2) = 1.96 TeV using the CDF Detector
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D80, 111106 (2009). arXiv: 0910.3623.
Abstract, PDF (178KiB). 9729: Added: Friday, 2010 January 08 14:29:27 CST
9. A Search for the Associated Production of the Standard Model Higgs Boson in the All-Hadronic Channel
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 103, 221801 (2009). arXiv:
0907.0810.
Abstract, PostScript (467KiB). 9669: Added: Monday, 2009 December 07 15:30:51 CST
10. Search for New Physics with a Dijet plus Missing E(t) Signature in p anti-p Collisions at
s**(1/2) = 1.96 TeV
T. Aaltonen et al., The CDF Collaboration, submitted to Phys. Rev. Lett. December 23, 2009.
Fermilab-Pub-09-635-E. arXiv: 0912.4691.
Abstract, PostScript (364KiB). 9329: Changed: Friday, 2010 January 08 13:56:14 CST
11. Measurement of the Top Quark Mass in the Dilepton Channel using m_T2 at CDF
T. Aaltonen et al., The CDF Collaboration, submitted to Phys. Rev. D Rapid Communications November 17, 2009. Fermilab-Pub-09-580-E. arXiv: 0911.2956.
Abstract, PDF (298KiB). 9885: Added: Tuesday, 2009 November 17 10:34:33 CST
12. Measurements of Branching Ratios and CP Asymmetries in B+/- --> D_CP K+/- Decays in
Hadron Collisions
T. Aaltonen et al., The CDF Collaboration, submitted to Phys. Rev. D November 2, 2009.
Fermilab-Pub-09-549-E. arXiv: 0911.0425.
Abstract, PDF (218KiB). 9708: Added: Monday, 2009 November 09 09:52:39 CST
13. Search for New Color-Octet Vector Particle Decaying to t anti-t in p anti-p Collisions at
s**(1/2) = 1.96 TeV
T. Aaltonen et al., The CDF Collaboration, submitted to Phys. Lett. B November 16, 2009.
Fermilab-Pub-09-578-E. arXiv: 0911.3112.
Abstract, PDF (341KiB). 9636: Changed: Monday, 2009 November 16 13:11:18 CST
14. Search for Higgs Bosons Predicted in Two-Higgs-Doublet Models Via Decays to Tau lepton
Pairs in 1.96 TeV p anti-p Collisions
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 103, 201801 (2009).
Abstract, PDF (231KiB). 9622: Added: Friday, 2009 November 13 09:44:11 CST
15. Measurement of the tt-bar Production Cross Section in 2 fb-1 of p anti-p Collisions at
s**(1/2) 1.96 TeV Using Lepton Plus Jets Events with Soft Muon Tagging
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D79, 052007 (2009). arXiv:
0901.4142.
Abstract, PDF (810KiB). 9587: Added: Wednesday, 2009 November 11 13:23:20 CST
16. First Observation of B-bar0_s --> D+/-_s K-/+ and Measurement of the Ratio of Branching
Fractions B(B-bar0_s --> D+/-_s K-/+)/B(B-bar0_s -> D+_s pi+-)
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 103, 191802 (2009)
Abstract, PDF (287KiB). 9372: Added: Wednesday, 2009 November 11 10:35:13 CST
17. Search for Anomalous Production of Events with Two Photons and Additional Energetic Objects at CDF
T. Aaltonen et al., The CDF Collaboration, Fermilab-Pub-09-535-E. arXiv: 0910.5170. Submitted to Phys. Rev. D October 27, 2009.
Abstract, PostScript (1106KiB). 9886: Added: Tuesday, 2009 October 27 15:25:07 CDT
18. Measurements of the Top-Quark Mass Using Charged Particle Tracking
T. Aaltonen et al., The CDF Collaboration, submitted to Phys. Rev. D October 7, 2009. Fermilab-Pub-09-464-E. arXiv: 0910.0969.
Abstract, PDF (841KiB). 9840: Added: Wednesday, 2009 October 07 10:30:19 CDT
19. A Search for the Higgs Boson Produced in Association with Z --> ell+ ell- Using the Matrix
Element Method at CDF II
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D80, 071101 (2009). arXiv:
0908.3534.
Abstract, PDF (443KiB). 9810: Added: Monday, 2009 October 26 15:14:50 CDT
20. Observation of the Omega_b and Measurement of the Properties of the Xi-_b and Omega-_b
Baryons
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D80, 072003 (2009). arXiv:
0905.3123.
Abstract, PostScript (1203KiB). 9726: Added: Monday, 2009 October 26 14:01:34 CDT
21. Precision Measurement of the X(3872) Mass in J/psi pi+ pi- Decays
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 103, 152001 (2009).
Abstract, PDF (189KiB). 9700: Added: Monday, 2009 October 26 14:32:24 CDT
22. First Observation of Vector Boson Pairs in a Hadronic Final State at the Tevatron Collider
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 103, 091803 (2009). arXiv:
0905.4714.
Abstract, PDF (214KiB). 9784: Added: Tuesday, 2009 September 08 16:01:39 CDT
23. Measurement of the Mass of the Top Quark Using the Invariant Mass of Lepton Pairs in Soft
Muon b-tagged Events
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D80, 051104 (2009). arXiv:
0906.5371.
Abstract, PostScript (434KiB). 9739: Added: Tuesday, 2009 September 15 12:24:35 CDT
24. Search for a Standard Model Higgs Boson in WH --> l nu bb-bar in p anti-p Collisions at
s**(1/2) = 1.96 TeV
Phys. Rev. Lett. 103, 101802 (2009). arXiv: 0906.5613.
Abstract, PDF (177KiB). 9720: Changed: Tuesday, 2009 September 15 10:39:57 CDT
25. Search for Charged Higgs Bosons in Decays of Top Quarks in p anti-p Collisions at s**(1/2)
= 1.96 TeV
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 103, 101803 (2009). arXiv:
0907.1269.
Abstract, PDF (171KiB). 9687: Added: Tuesday, 2009 September 15 11:19:04 CDT
26. Search for the Neutral Current Top Quark Decay t --> Zc Using Ratio of Z-Boson + 4 Jets to
W-Boson + 4 Jets Production
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D80, 052001 (2009). arXiv:
0905.0277.
Abstract, PDF (525KiB). 9656: Added: Wednesday, 2009 September 16 14:09:28 CDT
27. Measurement of the b-jet Cross Section in Events with a W Boson in p anti-p Collisions at
s**(1/2) = 1.96 TeV
T. Aaltonen et al., The CDF Collaboration, Fermilab-Pub-09-416-E. arXiv: 0909.1505. Submitted to Phys. Rev. Lett. September 8, 2009.
Abstract, PDF (286KiB). 9649: Added: Tuesday, 2009 September 08 15:12:52 CDT
28. Search for Anomalous Production of Events with a Photon, Jet, b-quark Jet, and Missing
Transverse Energy
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D80, 052003 (2009). arXiv:
0905.0231.
Abstract, PDF (987KiB). 9627: Added: Wednesday, 2009 September 16 14:41:30 CDT
29. Search for Hadronic Decays of W and Z Bosons in Photon Events in p anti-p Collisions at
s**(1/2) = 1.96 TeV
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D80, 052011 (2009). arXiv:
0803.4264.
Abstract, PostScript (989KiB). 9050: Added: Tuesday, 2009 September 29 13:09:53 CDT
30. Measurement of d sigma/dy of Drell-Yan e+e- Pairs in the Z Mass Region from p anti-p Collisions at s**(1/2) = 1.96 TeV
T. Aaltonen et al., The CDF Collaboration, Fermilab-Pub-09-402-E. arXiv: 0908.3914. Submitted to Phys. Rev. Lett. August 27, 2009.
Abstract, PostScript (347KiB). 9832: Added: Thursday, 2009 August 27 14:51:03 CDT
31. Observation of Electroweak Single Top Quark Production
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 103, 092002 (2009). arXiv:
0903.0885.
Abstract, PDF (193KiB). 9715: Added: Thursday, 2009 September 03 16:21:53 CDT
32. Search for a Fermiophobic Higgs Boson Decaying into Diphotons in p anti-p Collisions at
s**(1/2) = 1.96 TeV
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 103, 061803 (2009). arXiv:
0905.0413.
Abstract, PDF (154KiB). 9655: Added: Wednesday, 2009 August 12 11:47:12 CDT
33. Search for the Production of Narrow tb-bar Resonances in 1.9 fb-1 of p anti-p Collisions at
s**(1/2) = 1.96 TeV
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 103, 041801 (2009). arXiv:
0902.3276.
Abstract, PDF (178KiB). 9507: Added: Wednesday, 2009 August 12 13:31:09 CDT
34. Production of psi(2S) Mesons in p anti-p Collisions at 1.96 TeV
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D80, 031103 (2009). arXiv:
0905.1982.
Abstract, PDF (242KiB). 9205: Changed: Wednesday, 2009 August 12 11:48:22 CDT
35. Searching the Inclusive ell_gamma Missing E(t) + b-quark Signature for Radiative Top
Quark Decay and Non-Standard-Model Processes
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D80, 011102 (2009). arXiv:
0906.0518.
Abstract, PDF (313KiB). 9675: Added: Wednesday, 2009 August 19 11:12:36 CDT
36. Search for Hadronic Decays of W and Z in Photon Events in p anti-p Collisions at s**(1/2) =
1.96 TeV
T. Aaltonen et al., The CDF Collaboration, Eur. Phys. J. C62: 319:326 (2009). arXiv:
0903.2060.
Abstract, PDF (161KiB). 9652: Added: Tuesday, 2009 September 29 14:22:10 CDT
37. Search for a Standard Model Higgs Boson Production in Association with a W Boson using a
Neural Network Discriminant at CDF
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D80, 012002 (2009)
Abstract, PDF (252KiB). 9639: Added: Wednesday, 2009 August 19 14:42:33 CDT
38. Search for Long-Lived Massive Charged Particles in 1.96 TeV p anti-p Collisions
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 103, 021802 (2009). arXiv:
0902.1266.
Abstract, PDF (235KiB). 9552: Added: Tuesday, 2009 July 14 14:48:09 CDT
39. Observation of New Charmless Decays of Bottom Hadrons
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 103, 031801 (2009)
Abstract, PostScript (267KiB). 9525: Added: Wednesday, 2009 August 19 14:14:31 CDT
40. Precision Measurement of the X(3872) Mass in J/psi pi+ pi- Decays
T. Aaltonen et al., The CDF Collaboration, Fermilab-Pub-09-330-E. arXiv: 0906.5218. Submitted to Phys. Rev. Lett. June 29, 2009.
Abstract, PostScript (257KiB). 8311: Added: Friday, 2009 July 10 15:12:59 CDT
41. First Measurement of the tt-bar Differential Cross Section d sigma/dM_tt-bar in p anti-p Collisions at s**(1/2) = 1.96 TeV
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 102, 222003 (2009). arXiv:
0903.2850.
Abstract, PDF (199KiB). 9664: Added: Tuesday, 2009 June 09 11:03:10 CDT
42. Evidence for a Narrow Near-Threshold Structure in the J/psi phi Mass Spectrum in B+ -->
J/psi phi K+ Decays
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 102, 242002 (2009). arXiv:
0903.2229.
Abstract, PostScript (1047KiB). 9638: Added: Tuesday, 2009 June 30 12:27:35 CDT
43. Search for Gluino-Mediated Bottom Squark Production in p anti-p Collisions at s**(1/2) =
1.96 TeV
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 102, 221801 (2009). arXiv:
0903.2618.
Abstract, PostScript (420KiB). 9631: Added: Thursday, 2009 June 11 11:12:10 CDT
44. Search for Exclusive Z Boson Production and Observation of High Mass p anti-p --> p gamma gamma p-bar --> p ell+ ell- p-bar Events in p anti-p Collisions at s**(1/2) = 1.96 TeV
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 102, 222002 (2009). arXiv:
0902.2816.
Abstract, PDF (151KiB). 9630: Added: Tuesday, 2009 June 09 11:50:04 CDT
45. Measurement of the Particle Production and Inclusive Differential Cross Sections in p anti-p
Collisions at s**(1/2) = 1.96 TeV
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D79, 112005 (2009).
Abstract, PDF (455KiB). 9550: Added: Tuesday, 2009 July 07 13:41:22 CDT
46. Search for WW and WZ Production in Lepton Plus Jets Final States at CDF
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D79, 112011 (2009). arXiv:
0903.0814.
Abstract, PDF (397KiB). 9546: Added: Tuesday, 2009 June 30 12:07:45 CDT
47. First Simultaneous Measurement of the Top Quark Mass in the Lepton + Jets and Dilepton
Channels at CDF
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D79, 092005 (2009). arXiv:
0809.4808.
Abstract, PDF (1384KiB). 9515: Changed: Wednesday, 2009 July 08 10:28:48 CDT
48. A Measurement of the t anti-t Cross Section in p anti-p Collisions at sqrt s = 1.96 TeV using
Dilepton Events with a Lepton plus Track Selection
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D79, 112007 (2009). arXiv:
0903.5263.
Abstract, PDF (842KiB). 9510: Added: Tuesday, 2009 July 07 12:54:48 CDT
49. Measurement of the Top Quark Mass at CDF using the "neutrino phi weighting" Template
Method on a Lepton Plus Isolated Track Sample
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D79, 072005 (2009). arXiv:
0901.3773.
Abstract, PDF (849KiB). 9505: Added: Friday, 2009 June 05 16:23:12 CDT
50. Observation of Exclusive Charmonium Production and gamma gamma -> mu+ mu- in p anti-p Collisions at s**(1/2) = 1.96 TeV
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 102, 242001 (2009). arXiv:
0902.1271.
Abstract, PDF (265KiB). 9477: Changed: Tuesday, 2009 July 07 11:45:57 CDT
51. Measurement of the k_T Distribution of Particles in Jets Produced in p anti-p Collisions at
s**(1/2) = 1.96 TeV
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 102, 232002 (2009). arXiv:
0811.2820.
Abstract, PostScript (271KiB). 9476: Added: Tuesday, 2009 July 07 14:59:58 CDT
52. Search for New Particles Decaying into Dijets in Proton-Antiproton Collisions at s**(1/2) =
1.96 TeV
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D79, 112002 (2009). arXiv:
0812.4036.
Abstract, PDF (244KiB). 9347: Added: Tuesday, 2009 June 09 08:58:08 CDT
53. Search for the Decays B0_(s) --> e+ mu- and B0_(s) --> e+ e- in CDF Run II
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 102, 201801 (2009). arXiv:
0901.3803.
Abstract, PDF (355KiB). 9597: Added: Wednesday, 2009 September 16 15:50:54 CDT
54. Search for Top-Quark Production via Flavor-Changing Neutral Currents in W + 1 Jet Events
at CDF
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 102, 151801 (2009). arXiv:
0812.3400.
Abstract, PDF (161KiB). 9581: Added: Monday, 2009 June 08 10:34:07 CDT
55. Direct Measurement of the W Production Charge Asymmetry in p anti-p Collisions at
s**(1/2) = 1.96 TeV
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 102, 181801 (2009). arXiv:
0901.2169.
Abstract, PostScript (395KiB). 9388: Added: Thursday, 2009 June 11 15:01:37 CDT
56. Top Quark Mass Measurement in the tt-bar All Hadronic Channel Using a Matrix Element
Technique in p anti-p Collisions at s**(1/2) = 1.96 TeV
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D79, 072010 (2009). arXiv:
0811.1062.
Abstract, PDF (622KiB). 9293: Changed: Tuesday, 2009 June 30 11:59:11 CDT
57. Measurement of the b-Hadron Production Cross Section Using Decays to mu- D0X Final
States in p anti-p Collisions at s**(1/2) = 1.96 TeV
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D79, 092003 (2009). arXiv:
0903.2403.
Abstract, PostScript (941KiB). 8890: Added: Friday, 2009 May 29 13:57:22 CDT
58. Top Quark Mass Measurement in the Lepton Plus Jets Channel Using a Modified Matrix Element Method
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D79, 072001 (2009).
arXiv:0812.4469.
Abstract, PostScript (1297KiB). 9491: Added: Thursday, 2009 April 16 09:34:00 CDT
59. Measurement of the Top Quark Mass with Dilepton Events Selected Using Neuroevolution at
CDF II
Phys. Rev. Lett. 102, 152001 (2009). arXiv:0807.4652.
Abstract, PDF (297KiB). 9235: Added: Thursday, 2009 April 16 09:48:20 CDT
60. Inclusive Search for Squark and Gluino Production in p anti-p Collisions at s**(1/2) = 1.96
TeV
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 102, 121801 (2009). arXiv:
0811.2512.
Abstract, PDF (184KiB). 9594: Added: Wednesday, 2009 July 08 10:53:02 CDT
61. A Search for High-Mass Resonances Decaying to Dimuons at CDF
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 102, 091805 (2009).
arXiv:0811.0053.
Abstract, PDF (223KiB). 9516: Added: Thursday, 2009 March 12 10:53:13 CDT
62. Measurement of W-Boson Helicity Fractions in Top-Quark Decays using cos theta*
T. Aaltonen et al., The CDF Collaboration, Phys. Lett. B479, p. 160-167 (2009). arXiv:
0811.0344.
Abstract, PostScript (458KiB). 9471: Added: Thursday, 2009 July 09 13:51:10 CDT
63. Measurement of the Cross Section for b Jet Production in Events with a Z Boson in p anti-p
Collisions at s**(1/2) = 1.96 TeV
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D79, 052008 (2009). arXiv:
0812.4458.
Abstract, PDF (251KiB). 9466: Added: Thursday, 2009 July 09 10:31:55 CDT
64. Measurement of the Fraction of tt-bar Production via Gluon-Gluon Fusion in p anti-p Collisions at s**(1/2) = 1.96 TeV
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D79, 031101 (2009)
Abstract, PostScript (707KiB). 9356: Added: Tuesday, 2009 March 03 16:28:16 CST
65. Measurement of Resonance Parameters of Orbitally Excited Narrow B0 Mesons
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 102, 102003 (2009). arXiv:0809.5007.
Abstract, PDF (169KiB). 9352: Added: Monday, 2009 March 23 10:49:26 CDT
66. Search for New Physics in the mu mu + e/mu + missing E(t) Channel with a low-p(t) Lepton
Threshold at the Collider Detector at Fermilab
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D79, 052004 (2009).
arXiv:0810.3213.
Abstract, PDF (388KiB). 9311: Added: Monday, 2009 March 23 13:07:44 CDT
67. Direct Bound on the Total Decay Width of the Top Quark in p anti-p Collisions at s**(1/2) =
1.96 TeV
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 102, 042001 (2009).
arXiv:0808.2167. Fermilab-Pub-08-302-E.
Abstract, PDF (186KiB). 9218: Added: Wednesday, 2009 February 11 15:00:13 CST
68. First Measurement of the Ratio of Branching Fractions B(Lambda0(b) --> Lambda+_c mu- nu-bar_mu)/B(Lambda0(b) --> Lambda+_c pi-)
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D79, 032001 (2009).
arXiv:0810.3213.
Abstract, PostScript (1976KiB). 8477: Added: Friday, 2009 February 13 11:57:37 CST
69. Search for Maximal Flavor Violating Scalars in Same-Charge Lepton Pairs in p anti-p Collisions at s**(1/2) = 1.96 TeV
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 102, 041801 (2009).
arXiv:0809.4903.
Abstract, PDF (267KiB). 9452: Added: Wednesday, 2009 January 28 10:52:55 CST
70. Search for a Higgs Boson Decaying to Two W Bosons at CDF
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 102, 021802 (2009).
arXiv:0809.3930.
Abstract, PDF (162KiB). 9368: Added: Tuesday, 2009 January 27 16:17:59 CST
71. Search for High-Mass e+e- Resonances in p anti-p Collisions at s**(1/2) = 1.96 TeV
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. Lett. 102, 031801 (2009).
arXiv:0810.2059.
Abstract, PDF (180KiB). 9259: Added: Wednesday, 2009 January 28 10:28:42 CST
72. Search for the Rare B Decays B+ --> mu+ mu- K+, B0 --> mu+ mu- K*(892)0, and B0(s) -->
mu+ mu- phi at CDF
T. Aaltonen et al., The CDF Collaboration, Phys. Rev. D79, 011104(R) (2009).
arXiv:0804.3908.
Abstract, PostScript (470KiB). 9119: Added: Wednesday, 2009 January 28 11:52:46 CST
Other research projects
1. Structure of rotavirus outer-layer protein VP7 bound with a neutralizing Fab,
Aoki et al., Science (2009) vol. 324 (5933) pp. 1444-7
2. A Probabilistic Graphical Model for Ab Initio Folding,
in Research in Computational Biology, Feng Zhao, Jian Peng, Joe DeBartolo, Karl F. Freed,
Tobin R. Sosnick and Jinbo Xu, ISBN 978-3-642-02007-0, SpringerLink 2009.
3. Sudden stratospheric warmings seen in MINOS deep underground muon data,
Osprey S., et al., Geophys. Res. Lett., 36, L05809, doi:10.1029/2008GL036359 (2009).
4. Sixth, seventh and eighth virial coefficients of the Lennard-Jones model,
A. J. Schultz and D. A. Kofke, Mol. Phys. 107 2309 (2009).
5. Lattice Strain Due to an Atomic Vacancy,
SD Li, M. S. Sellers, C. Basaran, A. J. Schultz, D. A. Kofke, Int. J. Mol. Sci. 10 2798 (2009).
6. Virial coefficients of Lennard-Jones mixtures,
A. J. Schultz and D. A. Kofke, J. Chem. Phys. 130 224104 (2009).
7. Interpolation of virial coefficients,
A. J. Schultz and D. A. Kofke, Mol. Phys. 111 1431 (2009).
8. Fourth and Fifth Virial Coefficients of Polarizable Water,
K. M. Benjamin, A. J. Schultz and D. A. Kofke, J. Phys. Chem. B 113 7810 (2009).
9. There is no Drake/Larson linear space on 30 points,
Anton Betten, Dieter Betten, J. Combinatorial Designs, 28: 48-70 (2010)
10. A Self-Organized Grouping (SOG) Method for Efficient Grid Resource Discovery,
Padmanabhan, A., Ghosh, S., and Wang, S. 2010. Journal of Grid Computing, Forthcoming,
2010. DOI: 10.1007/s10723-009-9145-0.
11. An Interoperable Information Service Solution for Grids,
in Cyberinfrastructure Technologies and Applications (edited by Cao, J.),
Padmanabhan, A., Shook, E., Liu, Y., and Wang, S., Nova Science Publishers, Inc. (2009)