Indiana University's SC|07 Bandwidth Challenge award-winning project: Using the Data Capacitor for Remote Data Collection, Analysis, and Visualization

Stephen C. Simms^1, Matthew Davy^2, Bret Hammond^3, Matt Link^4, Scott Teige^5, Mu-Hyun Baik^6, Yogita Mantri^7, Richard Lord^8, D.F. ("Rick") McMullen^9, John C. Huffman^10, Kia Huffman^11, Guido Juckeland^12, Michael Kluge^13, Robert Henschel^14, Holger Brunst^15, Andreas Knuepfer^16, Matthias Mueller^17, P.R. Mukund^18, Andrew Elble^19, Ajay Pasupuleti^20, Richard Bohn^21, Sripriya Das^22, James Stefano^23, Gregory G. Pike^24, Douglas A. Balog^25, Craig A. Stewart^26

© Trustees of Indiana University 2007. Released under

Citation: Simms, S.C., M.P. Davy, C.B. Hammond, M.R. Link, C.A. Stewart, S. Teige, M.-H. Baik, Y. Mantri, R. Lord, D.F. McMullen, J.C. Huffman, K. Huffman, G. Juckeland, M. Kluge, R. Henschel, H. Brunst, A. Knuepfer, M. Mueller, P.R. Mukund, A. Elble, A. Pasupuleti, R. Bohn, S. Das, J. Stefano, G.G. Pike, D.A. Balog. 2007. "Indiana University's SC|07 Bandwidth Challenge award-winning project: Using the Data Capacitor for Remote Data Collection, Analysis, and Visualization." Indiana University, Bloomington, IN. Available from: http://hdl.handle.net/2022/14615

Note: This report was substantially completed in 2007, and has been edited and reformatted for deposit in the IUScholarWorks repository in 2012. We note that among the authors the following affiliations and email addresses have changed between 2007 and 2012: Yogita Mantri, Richard Lord, D.F. ("Rick") McMullen, John C. Huffman, Kia Huffman, Robert Henschel, Gregory G. Pike, Douglas A. Balog, Ajay Pasupuleti.

1. Research and Academic Computing Division of University Information Technology Services, Indiana University; ssimms@indiana.edu
2. GlobalNOC, University Information Technology Services, Indiana University; mpd@indiana.edu
3. Research and Academic Computing Division of University Information Technology Services, Indiana University; bret@indiana.edu
4. Research and Academic Computing Division of University Information Technology Services, Indiana University; mrlink@indiana.edu
5. Research and Academic Computing Division of University Information Technology Services, Indiana University; steige@indiana.edu
6. Department of Chemistry and School of Informatics, Indiana University; mbaik@indiana.edu
7. Department of Chemistry and School of Informatics, Indiana University; ymantri@indiana.edu
8. Department of Chemistry and School of Informatics, Indiana University; rllord@indiana.edu
9. Pervasive Technology Labs, Indiana University; mcmullen@indiana.edu
10. Department of Chemistry, Indiana University; jnhuffma@indiana.edu
11. Department of Chemistry, Indiana University; kihuffma@indiana.edu
12. The Center for Information Services and High Performance Computing, Technische Universitaet Dresden; guido.juckeland@tu-dresden.de
13. The Center for Information Services and High Performance Computing, Technische Universitaet Dresden; michael.kluge@tu-dresden.de
14. The Center for Information Services and High Performance Computing, Technische Universitaet Dresden; henschel@tu-dresden.de
15. The Center for Information Services and High Performance Computing, Technische Universitaet Dresden; holger.brunst@tu-dresden.de
16. The Center for Information Services and High Performance Computing, Technische Universitaet Dresden; andreas.knuepfer@tu-dresden.de
17. The Center for Information Services and High Performance Computing, Technische Universitaet Dresden; matthias.mueller@tu-dresden.de
18. Center for Preservation of Ancient Manuscripts, Rochester Institute of Technology; prmeee@rit.edu
19. Center for Preservation of Ancient Manuscripts, Rochester Institute of Technology; aweits@rit.edu
20. Center for Preservation of Ancient Manuscripts, Rochester Institute of Technology; axp1014@rit.edu
21. Center for Preservation of Ancient Manuscripts, Rochester Institute of Technology; rxbeee@rit.edu
22. Center for Preservation of Ancient Manuscripts, Rochester Institute of Technology; sripriyabandi@gmail.com
23. Center for Preservation of Ancient Manuscripts, Rochester Institute of Technology; jvseee@rit.edu
24. Oak Ridge National Laboratory; pikeg@ornl.gov
25. Pittsburgh Supercomputing Center; balog@psc.edu
26. Research and Academic Computing Division of University Information Technology Services and Pervasive Technology Labs, Indiana University; stewart@iu.edu

Executive Summary

The IEEE/ACM SC conference series has for many years included a number of "challenge" events. One of these is the bandwidth challenge, which invites teams of technologists from the nation's most elite supercomputing facilities to push the limits of modern computer networks. In 2007, Indiana University led the team that won the SC|07 bandwidth challenge with its project, "Using the Data Capacitor for Remote Data Collection, Analysis, and Visualization." Competitors were challenged to address the theme "serving as a model," creating methods for fully utilizing a high-speed network path to support end-to-end network applications running across a grid that included the conference's exhibit floor and the participants' home institutions, using production networks.

The IU-led team created a short-term distributed computing grid using storage elements in Bloomington, IN (Data Capacitor) and the IU booth at the SC|07 Exhibition Hall in Reno, NV (Data Capacitor-Reno). Compute elements were distributed across Dresden, Germany; Rochester, NY; and Bloomington, IN. A modest compute resource was housed in the IU booth in the SC|07 Exhibition Hall in Reno, NV. We demonstrated five scholarly applications running simultaneously:

- Modeling and analysis of the amyloid peptide, which is thought to be the cause of Alzheimer's disease, using IU's Big Red Supercomputer. (Led by Mu-Hyun Baik of the IU School of Informatics and IU Bloomington Department of Chemistry.)
- Live acquisition of x-ray crystallography data. (Led by D.F. "Rick" McMullen of Pervasive Technology Labs at Indiana University.)
- Digital preservation of ancient Sanskrit manuscripts. (Led by P.R. Mukund of the Rochester Institute of Technology.)
- Performance analysis of a computational fluid dynamics application by the Technische Universität Dresden using its Vampir/VampirTrace software package. (Led by Matthias Mueller of the Center for Information Services and High Performance Computing.)
- Simulations of a high energy physics reaction between the basic particles of matter. (Led by Scott Teige of Indiana University Information Technology Services.)

We achieved a peak transfer rate of 18.21 Gigabits per second (Gbps) out of a possible maximum of 20 Gbps for a bidirectional 10 Gbps link. Sustained performance was an overall rate of 16.2 Gbps (roughly equivalent to sending 170 CDs of data per minute).
A particularly notable aspect of the overall performance was that Vampir trace data was written at a rate of close to 4 Gbit/sec from Dresden, Germany across the Atlantic to the show floor in Reno, using a transatlantic network path that included Internet2, GÉANT, and the German National Research and Education Network (DFN). Our use of bidirectional data transfer was driven partly by our interest in having this project serve as a model demonstrating the versatility of the Data Capacitor solution, and partly by our desire to fully utilize the 10 Gigabit link that we had been given to work with.

We were proud to participate in this challenge event, and while it was exciting to win, the most important aspect of any challenge event at the IEEE/ACM SC Conference is not who won or who encountered the most difficulties. What matters most is that these events serve as motivation for innovation and for short-term projects that demonstrate what might be routinely possible a year or more into the future. We did just that, and learned a lot in the process.

1. Introduction

The IEEE/ACM SC conference series [1] has for many years included a number of "challenge" events. These challenges create the opportunity to push the boundaries of computing activities forward in the areas of computation, data analysis, and networking. Because there is a certain amount of prestige involved, it is often possible to obtain loans of equipment and create short-term collaborations to achieve new firsts that would not happen in the absence of these challenges. The SC|07 (SC 2007) web site describes the bandwidth challenge as follows:

The High Performance Bandwidth Challenge is an annual competition for leading-edge network applications developed by teams of researchers from around the world, providing a showcase for the technologies and people who provide the networking capabilities crucial to supercomputing. The Bandwidth Challenge, running across SCinet, is designed to test the limits of network capabilities, and past events have showcased multi-gigabit-per-second demonstrations never before thought possible. [2]

In 2006, Indiana University led a team that received an honorable mention in the SC|06 bandwidth challenge [3]. This team included participants from the Pittsburgh Supercomputing Center and Oak Ridge National Laboratory, and the project was titled "All in a Day's Work: Advancing Data Intensive Research with the Data Capacitor." This project achieved a peak of 9.2 Gigabits per second (Gbps) over a 10 Gbps network link, with an approximate sustained average of 5.5 Gigabits/second. Data were moved via a wide area network Lustre file mount between Bloomington, IN and the SC|06 exhibit floor in Tampa, Florida.

The 2007 bandwidth challenge focused on the theme, "Serving as a Model." The SC|07 Call for Participation stated:

This year the Bandwidth Challenge will focus on showcasing those who can serve as a model for end-to-end achievement which should be emulated by others. We've put these great networks in place, now let's make sure everyone can use them to the fullest extent. This is a Call for Participation in the Bandwidth Challenge at SC07. The intention is that your participation not only will benefit your home institution, but that your example will serve as a model for other institutions to follow. … The Challenge this year is: Can you fully utilize one 10 Gig path, end-to-end, disk-to-disk, from SC07 in Reno, Nevada back to your home institution, using the actual production network back home?
Can you realize, demonstrate and publish all the configuration, troubleshooting, tuning and policies, not only to show off at SC07, but to leave a legacy at your home institution whereby your scientists can achieve the same results after you? Can you serve as a model for others to follow? [4]

With the bittersweet success of an honorable mention in 2006, IU expanded its team to include representatives of Technische Universität Dresden (Germany) and the Rochester Institute of Technology (New York, USA). The title of the 2007 project was "Using the Data Capacitor for Remote Data Collection, Analysis, and Visualization" and the abstract submitted to SC|07 was as follows:

Indiana University provides powerful compute, storage, and network resources to a diverse local and national research community. In the past year, through the use of Lustre across the wide area network, IU has been able to extend the reach of its advanced cyberinfrastructure across the nation and across the ocean to Technische Universität Dresden. For this year's bandwidth challenge, a handful of researchers from IU, Rochester Institute of Technology, and the Technische Universität Dresden will run a set of data-intensive applications crossing a range of disciplines from the digital humanities to computational chemistry. Using IU's 535 TB Data Capacitor and an additional component installed on the exhibit floor, we will mount Lustre across the wide area network to demonstrate data collection, analysis, and visualization across distance. [5]

We believe that distributed workflows represent an important category of scientific application workflows that make possible new and more rapid discoveries using grids and distributed workflow tools. We believe that short-term storage systems have a particularly important role to play in distributed workflows. Indeed, we have previously written that "data in a network acts as an uncompressible liquid," and a short-term storage system such as the IU Data Capacitor provides an essential tool to link inputs and outputs within a geographically distributed workflow.

The Data Capacitor, funded in part by a 2005 Major Research Instrumentation grant from the National Science Foundation, is a 535 TB distributed object store file system constructed for short- to mid-term storage of large research data sets; it sits at the center of IU's cyberinfrastructure. The Data Capacitor is based on the Lustre open source file system [6]. The Data Capacitor can be accessed via Lustre file system mounts over wide area networks, allowing it to be used as a powerful tool to accommodate loosely coupled, service-oriented computing [7]. With the ability to mount the file system in multiple locations, it is possible to view and act on the file system from different resources. The analogy with electrical circuits is apt: the Data Capacitor builds up data over time from diverse sources so that it can be "discharged" at high rates into high performance resources, much as a capacitor builds up electrical energy over time and discharges it in a short, powerful burst.

The performance characteristics of wide area file mounts using Lustre have been published recently [8, 9]. So far in 2007 we have demonstrated single file/single client write performance from Oak Ridge National Laboratory to Indiana University in excess of 750 MB/s. With the help of PSC and ORNL, we have successfully pioneered the use of the Lustre distributed object store file system across the wide area network.
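To make concrete what a wide-area Lustre mount looks like from an application's point of view, the minimal sketch below shows a producer on one resource and a consumer on another exchanging data through nothing more than ordinary POSIX file operations. The mount point, directory layout, and file names are hypothetical stand-ins, not the actual Data Capacitor paths.

```python
"""Sketch: once the same Lustre file system is mounted on both the compute
resource and the analysis resource, a workflow step needs nothing more
exotic than ordinary file I/O -- no separate staging or transfer step."""
from pathlib import Path

# Hypothetical wide-area Lustre mount point (the real paths differed).
DATA_CAPACITOR = Path("/mnt/dc-wan/project")

def write_result(run_id: int, payload: bytes, root: Path = DATA_CAPACITOR) -> Path:
    """Producer side (e.g. a compute node in Bloomington): write output
    straight onto the shared file system."""
    out_dir = root / f"run_{run_id:04d}"
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / "output.dat"
    out_file.write_bytes(payload)
    return out_file

def read_result(run_id: int, root: Path = DATA_CAPACITOR) -> bytes:
    """Consumer side (e.g. a visualization host in Reno): the same POSIX
    namespace is visible, so the file simply appears under the mount."""
    return (root / f"run_{run_id:04d}" / "output.dat").read_bytes()

if __name__ == "__main__":
    # For a self-contained demo, fall back to a local directory if the
    # (hypothetical) mount is absent.
    import tempfile
    root = DATA_CAPACITOR if DATA_CAPACITOR.exists() else Path(tempfile.mkdtemp())
    written = write_result(1, b"example output", root)
    print(f"{len(read_result(1, root))} bytes visible at {written}")
```

The point of the sketch is simply that, once the mount is in place, the "transfer" step disappears from the workflow code entirely.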
The file system research has extended the reach of the Data Capacitor, enabling high-speed data transfer between geographically distributed resources and empowering distributed scientific workflows. The particular value to distributed scientific workflows is that data remains in place while analyses and visualization can occur elsewhere, on resources that could be separated by many miles. Multi-step distributed workflows represent an important category of problems that the Data Capacitor will help solve, making possible more rapid discoveries by removing the need for cumbersome data transfer tools and replacing them with standard Unix file system commands. While the bandwidth challenge project was a one-time event that showcased new capabilities, it should serve as a model for grid-based computing centered around the Data Capacitor and its capabilities for short-term file storage as a way to facilitate distributed scientific workflows.

Our demonstration was designed to serve as a model that could be adopted by other research institutions. This project made use of standard TCP/IP transport over production networks (primarily, for the purposes of this demonstration, Internet2) and the Lustre open source file system (version 1.4.10.1), supporting a highly heterogeneous mixture of computing clients (an SGI shared memory Altix system and a mixture of clusters including processors from AMD, Intel, and IBM (Power)). It also represents a model in terms of institutional collaborations, including Technische Universität Dresden (TUD), Rochester Institute of Technology (RIT), Oak Ridge National Laboratory (ORNL), and the Pittsburgh Supercomputing Center (PSC).

In the remainder of this report, we describe the short-term hardware and software infrastructure created as part of this bandwidth challenge project, describe the scientific workflows supported by this project, and describe the results we achieved in the process of winning the SC|07 bandwidth challenge.

2. Network and Hardware Configuration

Figure 1 shows a schematic diagram of the network used as part of the IU-led SC|07 bandwidth challenge competition.

Figure 1. Schematic diagram of networks used as part of the IU-led SC|07 bandwidth challenge project "Using the Data Capacitor for Remote Data Collection, Analysis, and Visualization."

Network segment | Location (start) | Location (end) | Approximate geographic distance | Bandwidth of link
Bloomington IN to SCinet | IU Bloomington | Indiana GigaPOP, Indianapolis | 52 miles | 10 Gigabit
 | Indiana GigaPOP, Indianapolis | CIC OmniPoP, Chicago | 182 miles | 10 Gigabit
 | CIC OmniPoP, Chicago | Internet2, Chicago | negligible | 10 Gigabit
 | Internet2, Chicago | SCinet, Reno | 1,913 miles | 10 Gigabit
 | Total distance | | 2,147 miles |
Rochester NY to SCinet | RIT, Rochester | NYSERNet, Buffalo | 74 miles | 1 Gigabit
 | NYSERNet, Buffalo | Internet2, Chicago | 537 miles | 10 Gigabit
 | Internet2, Chicago | SCinet, Reno | 1,913 miles | 10 Gigabit
 | Total distance | | 2,524 miles |
Dresden Germany to SCinet | TUD, Dresden | DFN, Frankfurt | 287 miles | 10 Gigabit
 | DFN, Frankfurt | GÉANT, Paris | 356 miles | 10 Gigabit
 | GÉANT, Paris | Internet2, Washington DC | 3,840 miles | 10 Gigabit
 | Internet2, Washington DC | SCinet, Reno | 2,597 miles | 10 Gigabit
 | Total distance | | 7,080 miles |

Table 1. Description of legs of the computer network used as part of the IU-led SC|07 bandwidth challenge project "Using the Data Capacitor for Remote Data Collection, Analysis, and Visualization."
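As a rough consistency check on Table 1, the back-of-envelope sketch below estimates the minimum round-trip time imposed by distance alone. It assumes signal propagation in optical fiber at roughly two-thirds of the speed of light (about 200 km per millisecond one way) and treats the geographic distances in the table as the fiber path length, which understates the real route.

```python
# Back-of-envelope propagation check for the three paths in Table 1.
# Assumptions: ~200 km/ms one-way in fiber, and path length ~ geographic
# distance; real routes are longer and add queueing/switching delay.
MILES_TO_KM = 1.609
KM_PER_MS = 200.0

paths_miles = {
    "IU Bloomington -> SCinet (Reno)": 2147,
    "RIT Rochester  -> SCinet (Reno)": 2524,
    "TU Dresden     -> SCinet (Reno)": 7080,
}

for name, miles in paths_miles.items():
    one_way_ms = miles * MILES_TO_KM / KM_PER_MS
    print(f"{name}: >= {2 * one_way_ms:.0f} ms round trip (propagation only)")
```

These floors, roughly 35 ms, 41 ms, and 114 ms round trip, sit below the measured ping times of 63 ms, 59 ms, and 171 ms reported in Section 4; the difference is the extra route length plus queueing and switching delay.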
Figure 2. Hardware (storage and computation) used in the IU-led SC|07 bandwidth challenge project.

The computational and storage hardware used as part of this bandwidth challenge project is shown schematically in Figure 2 and described in more detail below:

Systems in Bloomington, IN (Indiana University):
- Big Red
- Quarry
- Data Capacitor – IUB (permanent): 535 Terabytes of Lustre storage; 14.5 GB/s aggregate write bandwidth

Systems in Rochester, NY:
- Image acquisition server

Systems in Dresden, Germany:
- Neptun

Systems in the IU booth at the SC|07 Exhibition Hall (temporary):
- Computational cluster
- Data Capacitor-Reno
- Visualization

3. Scientific workflows supported

The key motivation for this bandwidth challenge project was to demonstrate several geographically distributed scientific workflows. We supported five distinct areas of science and scholarly endeavor, including studies of religious/philosophical history (not a topic typically encountered in the SCxy Exhibition Hall).

3.1. Modeling and analysis of the amyloid peptide (Mu-Hyun Baik, Yogita Mantri, Richard Lord; Indiana University)

Alzheimer's disease is associated with amyloidal plaques in brain tissue. These plaques are formed by the aggregation of short peptides, the Amyloid-β or Aβ peptides, into insoluble fibrils. Unfortunately, the 3D structure of Aβ is not known. Baik's group recently proposed for the first time a high-resolution structure based on molecular modeling efforts. This model is shown below in Figure 3.

Figure 3. Proposed structural model of the Amyloid-β peptide.

With this model, and the corresponding amino acid sequence of Amyloid-β, we can now ask through computational experiments: which part of Amyloid-β is most critical for structural integrity? Which part should we attack to cause maximal damage and potentially destroy the plaques? To investigate this question, as part of the SC|07 bandwidth challenge project, we performed molecular dynamics simulations on approximately 800 of the most promising mutations in the amino acid sequence of Amyloid-β. Simulations were run on the Big Red supercomputer at IU Bloomington and output data were written to the Data Capacitor-Reno in the IU booth in the SC|07 Exhibition Hall. The scientific workflow is diagrammed schematically in Figure 4.

Figure 4. Scientific workflow for modeling and analysis of the amyloid peptide.
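The fan-out pattern behind Figure 4 can be sketched in a few lines. This is an illustration only: the mutation labels, directory names, and the run_md() placeholder are hypothetical, and the actual molecular dynamics code and job submission machinery on Big Red are not shown.

```python
# Sketch of the fan-out used for the ~800 Amyloid-beta mutants: each run's
# output lands directly on the Lustre mount exported from the show-floor
# Data Capacitor-Reno, so no post-run transfer step is needed.
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

REMOTE_MOUNT = Path("./dc_reno/amyloid")   # stand-in for the wide-area Lustre mount

def run_md(mutation: str) -> bytes:
    """Placeholder for one molecular-dynamics run of a mutated A-beta sequence."""
    return f"trajectory data for mutant {mutation}\n".encode()

def simulate_and_store(mutation: str) -> Path:
    out = REMOTE_MOUNT / f"{mutation}.traj"
    out.write_bytes(run_md(mutation))      # written straight onto the mount
    return out

if __name__ == "__main__":
    REMOTE_MOUNT.mkdir(parents=True, exist_ok=True)
    # Three illustrative single-point mutants stand in for the real campaign
    # of roughly 800 candidate mutations.
    mutants = ["F19A", "E22G", "D23N"]
    with ThreadPoolExecutor(max_workers=4) as pool:
        for path in pool.map(simulate_and_store, mutants):
            print("wrote", path)
```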
3.2. Data Acquisition from a Global Consortium of Crystallography Facilities (D.F. "Rick" McMullen, John C. Huffman, Kia Huffman; Indiana University)

X-ray crystallography is a key method for determining the molecular structure of inorganic compounds. There are several instrument and service providers at university labs and large national facilities. The resolution achievable in determining molecular structures scales with the brightness (intensity) of the X-ray beam: the higher the intensity, the finer the resolution of molecular structure possible.

Labs and beamlines that are active, or coming online, with users and development teams include:
- US: IU, Purdue, Minnesota, Case, Advanced Photon Source
- UK: NCS/Southampton
- Australia: University of Sydney, James Cook, University of Queensland, Adelaide

Scientists who use X-ray diffraction facilities want a variety of capabilities, including real-time remote access, so that they can determine whether they are getting a usable X-ray image. This is important because the crystallization process that is a prerequisite for this sort of structure determination can be difficult; within one sample there can be areas that are well crystallized and other areas that are not, and only the former can be used to determine molecular structure with X-ray diffraction. Figure 5 shows the scientific workflow from the viewpoint of a crystallographer.

Figure 5. Scientific workflow for crystallographic structure determination.

The Common Instrument Middleware Architecture (CIMA) [10] is a middleware initiative to grid-enable remote instruments and sensors. CIMA aims to provide a generalized solution to remote access, data acquisition and control, stream processing, and real-time assimilation. It is funded by the NSF Middleware Initiative and provides straightforward mechanisms for data management.

As part of the SC|07 bandwidth challenge project, inputs from several crystallography facilities were simultaneously carried from their sources to a computational system at IU Bloomington. Crystallographic data were ingested from the following sources:
- Australia: University of Sydney; James Cook University
- US: Indiana University Bloomington (IUB Department of Biology; IU Molecular Science Center; IUB Department of Chemistry); Argonne National Laboratory (APS ChemMatCARS; APS UNI/XOR); Case Western Reserve University; University of Minnesota
- United Kingdom: NCS/Southampton

Data analyses and visualization were managed using CIMA. Data were transported from the above sources to IU Bloomington, where the data were analyzed, and then moved to the Data Capacitor-Reno for visualization on the SC|07 Exhibition Hall floor using compute systems and visualization software in the IU booth there. This workflow is depicted schematically in Figure 6.

Figure 6. Distributed workflow managing and analyzing X-ray diffraction data from a variety of sources, analyzed at IU Bloomington and visualized on the SC|07 Exhibition Hall floor in Reno, NV.
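The ingest/analyze/stage pattern of Figure 6 is sketched below. CIMA's real interfaces are web-service based and are not reproduced here; the fetch_frame() and analyze() stubs and all directory names are hypothetical stand-ins.

```python
# Sketch of the ingest/analyze/stage pattern in Figure 6.  Frames arrive
# from several facilities, an analysis step runs at IU Bloomington, and the
# reduced products are staged onto the Reno-side mount for visualization.
from datetime import datetime, timezone
from pathlib import Path

FACILITIES = ["IUB-chemistry", "APS-ChemMatCARS", "NCS-Southampton", "U-Sydney"]
ANALYSIS_SCRATCH = Path("./iub_scratch")   # stand-in for storage at IU Bloomington
VIS_MOUNT = Path("./dc_reno/cima")         # stand-in for the Data Capacitor-Reno mount

def fetch_frame(facility: str) -> bytes:
    """Stub: one diffraction frame delivered by the facility's instrument service."""
    return f"frame from {facility}\n".encode()

def analyze(frame: bytes) -> bytes:
    """Stub: reduction/analysis step run at IU Bloomington."""
    return frame.upper()

def ingest_once() -> None:
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
    for fac in FACILITIES:
        raw = fetch_frame(fac)
        (ANALYSIS_SCRATCH / fac).mkdir(parents=True, exist_ok=True)
        (ANALYSIS_SCRATCH / fac / f"{stamp}.raw").write_bytes(raw)
        # The analyzed product is written onto the Reno-side mount, where the
        # booth visualization picks it up with ordinary file reads.
        (VIS_MOUNT / fac).mkdir(parents=True, exist_ok=True)
        (VIS_MOUNT / fac / f"{stamp}.reduced").write_bytes(analyze(raw))

if __name__ == "__main__":
    ingest_once()
```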
3.3. Digital preservation of ancient Sanskrit manuscripts (P.R. Mukund, Andrew Elble, Ajay Pasupuleti, Richard Bohn, Sripriya Das, James Stefano; Rochester Institute of Technology)

The Center for Preservation of Ancient Manuscripts at the Rochester Institute of Technology has as its mission the preservation and dissemination of manuscripts from various ancient cultures. The Center is particularly interested in creating a central repository, a digital library, to facilitate access to ancient manuscripts both as images and as searchable text documents. One of the projects of this center is the preservation and digitization of the Sarvamoola Granthas. The Sarvamoola Granthas are the teachings of Shri Madhvacharya (1238-1317), a great Indian philosopher and proponent of Dvaita philosophy. It is a collection of works with commentaries on various important scriptures such as the Vedas, Upanishads, Itihasas, Puranas, Tantras, and Prakaranas. All of the original manuscripts of the Sarvamoola Granthas were incised on palm leaves. These palm leaves are now hundreds of years old, and suffer from exposure to atmospheric elements. The leaves become brittle and difficult to handle, and also become discolored and hard to decipher. Without the creation of a digital repository of these writings, they become unavailable to scholars due to fear of further deterioration, and future generations are deprived of access.

The Vaishnava Literature collection consists of more than 100 microfilm tapes comprising well over 2000 manuscripts (600,000 images) belonging to the Vaishnava tradition. The collection is part of the Vaishnava Literature Conservation Project (VLCP), funded by the Smithsonian Institution and the Institute for Vaishnava Studies. The goal of VLCP was to preserve the ancient manuscripts belonging to the Vaishnava tradition, which were otherwise deteriorating due to lack of proper conservation. A group of researchers and photographers spent over 18 months traveling throughout India in the early 1980s. The outcome of VLCP was a set of microfilms containing the manuscripts of almost all the Vaishnava traditions. After the successful completion of the "Digitization of the Sarvamoola Granthas" project, the principal investigator (PI) of VLCP, Dr. Charles S.J. White, Professor Emeritus at American University, decided to gift a copy of the entire VL microfilm collection to Dr. P.R. Mukund of RIT, to digitally preserve it and make it accessible to scholars worldwide.

Making these documents available digitally involves carefully handling the centuries-old palm leaves, scanning them, and then digitally enhancing the images. Figure 7 shows the palm leaves awaiting transfer to the digital scanner.

Figure 7. Palm leaves containing the writings of Shri Madhvacharya, the Sarvamoola Granthas.

Figure 8. (a) Stitched 8-bit grayscale image without normalization and contrast enhancement; (b) final image after contrast enhancement.

The workflow in this case is very simple, but created by a real need. The scanner and image acquisition server located at the Rochester Institute of Technology create data faster than the images can be stored and managed locally. By transferring images in real time from this server to the Data Capacitor, it is possible to temporarily store data on disk until it can be archived to tape or other long-term storage media. The scholarly workflow in this case was simply the acquisition of images in Rochester, NY and the transfer of that image data to the Data Capacitor-Reno, where the images could be visualized in the IU booth.
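A minimal sketch of this spool-and-drain arrangement follows. The directory names are hypothetical stand-ins for the acquisition server's local spool and the wide-area Lustre mount, and the polling loop is only one of several ways the hand-off could be driven.

```python
# Sketch: the acquisition server keeps only a small local spool and drains
# each finished scan onto the wide-area Lustre mount, freeing local disk
# so scanning can continue at full rate.
import shutil
import time
from pathlib import Path

LOCAL_SPOOL = Path("./spool")            # where the scanner drops finished images
REMOTE_MOUNT = Path("./dc_reno/vlcp")    # stand-in for the wide-area Lustre mount

def drain_once() -> int:
    """Move every completed image off the acquisition server; return the count."""
    REMOTE_MOUNT.mkdir(parents=True, exist_ok=True)
    moved = 0
    for img in sorted(LOCAL_SPOOL.glob("*.tif")):
        shutil.move(str(img), str(REMOTE_MOUNT / img.name))  # frees local disk at once
        moved += 1
    return moved

if __name__ == "__main__":
    LOCAL_SPOOL.mkdir(exist_ok=True)
    for _ in range(3):                   # a real service would loop indefinitely
        print(f"moved {drain_once()} image(s)")
        time.sleep(1)
```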
3.4. Performance analysis of a computational fluid dynamics application (Matthias Mueller, Guido Juckeland, Michael Kluge, Robert Henschel, Holger Brunst, Andreas Knuepfer; Technische Universität Dresden)

Performance analysis and tuning of supercomputer applications is essential to achieving top application performance and performing the largest possible analyses in the shortest possible amount of wall clock time. Vampir [11] is a popular tool used to instrument applications and understand runtime behavior, and this understanding allows programmers to modify the application to improve performance. Within the Vampir tool suite, VampirTrace is used to record and write out performance data. VampirTrace can produce prodigious amounts of output, which is written in the Open Trace Format (OTF) [12]. In fact, management of the trace files can be one of the challenges in analyzing the performance of large applications. Trace data in OTF format is created automatically when running an application that has been instrumented with VampirTrace. A separate executable, VampirServer, is used for visualization of trace data.

During the 2007 bandwidth challenge, trace data from a CFD application running in Dresden was written to a Lustre file system hosted on the Data Capacitor-Reno. The program that was analyzed was Semtex, which simulates the stirring of a conductive fluid by means of a magnetic field. The aim of the simulation is to design a magnetic field such that the stirring process causes minimal turbulence. Figure 9 shows two visualizations of output from this application.

Figure 9. The above images show the velocity in the cylindrical domain with isosurfaces (left) and with a color-coding on intersection planes (right).

Figure 10. Visualization of the parallel execution of a 128-process run of Semtex.

Figure 11. Detailed visualization of the Semtex master process showing stack and I/O events.

Figure 10 and Figure 11 show visualizations, made with VampirServer running on the compute cluster in the IU booth on the SC|07 Exhibition Hall floor, of trace data written from Dresden, Germany to the Data Capacitor-Reno. We considered the trans-Atlantic writing of trace files to be a particularly interesting aspect of this scientific workflow. Imagine being able to dump your trace data to a file system quickly and easily, so that experts on the other side of the globe could help you optimize your code. This workflow is diagrammed schematically in Figure 12.

Figure 12. Scientific workflow for trans-Atlantic use of Vampir to analyze performance. The CFD application ran on Neptun, in Dresden, Germany. Trace data were written to multiple files on the Data Capacitor-Reno and visualized there using the VampirServer application running at SC|07 in Reno, NV.

3.5. Simulations of a high energy physics reaction between the basic particles of matter (Scott Teige, Indiana University)

High energy physics studies the properties of matter at the smallest scale. An interesting reaction is:

$\pi^- p \rightarrow \pi^0 \pi^0 \pi^0 n$   (Equation 1)

The reaction represented above involves an incident pion interacting with a proton and producing three neutral pions and a neutron. However, the particles produced are unstable; each neutral pion decays almost immediately into two gamma rays. What we are actually able to observe is one particle in and six gamma rays out. Gamma rays can be detected and the interesting reaction re-assembled. These experiments are conducted in high-energy physics labs.

Figure 13. Scott Teige standing in front of a particle detector at Brookhaven National Laboratory.

Figure 14. A representation of a high energy photon interacting with the detector.

When a gamma ray hits the lead glass array, many particles are generated; these, in turn, generate the observed signals diagrammed in Figure 14. Figuring out what happened in any given event is done by simulating the observable outcome of a variety of different possible reactions and matching the observed outputs against the outputs generated from the simulations (where one knows exactly what was simulated). In this workflow, we simulated the reaction shown in Equation 1 using the compute nodes at SC|07, sent the reaction data to the CPU resource at Indiana University where it was analyzed, and sent the analyzed results back to SC|07 for visualization and further analysis at a later date.
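To illustrate the matching step just described, the toy sketch below bins "observed" events into a histogram of some reconstructed quantity and scores each simulated hypothesis against it with a Pearson chi-square. The numbers, binning, and figure of merit are purely illustrative and are not taken from the actual analysis.

```python
# Toy template comparison: an observed histogram is compared against
# histograms produced by simulating candidate reactions; the hypothesis
# with the smallest chi-square is the best match.  All values are invented.

def chi_square(observed, template):
    """Pearson chi-square between an observed histogram and a simulated one."""
    return sum((o - t) ** 2 / t for o, t in zip(observed, template) if t > 0)

observed = [12, 45, 80, 44, 10]          # toy histogram of reconstructed events

templates = {
    "pi- p -> pi0 pi0 pi0 n": [10, 42, 85, 41, 12],
    "background hypothesis":  [30, 35, 40, 35, 30],
}

best = min(templates, key=lambda name: chi_square(observed, templates[name]))
for name, template in templates.items():
    print(f"{name:24s} chi2 = {chi_square(observed, template):6.1f}")
print(f"best match: {best}")
```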
4. Results – latency and bandwidth achieved

Overall latency of connections from each endpoint to the SC|07 Exhibition Hall in Reno, NV was as follows (actual measurements based on ping tests):

- IU to Reno: 63 ms
- RIT to Reno: 59 ms
- TUD to Reno: 171 ms

We achieved a peak transfer rate of 18.21 Gigabits per second (Gbps) out of a possible maximum of 20 Gigabits/second for a bidirectional 10 Gbps link. Sustained performance was an overall rate of 16.2 Gigabits/second (roughly equivalent to sending 170 CDs of data per minute). The bidirectional data rates can be seen in Figure 15. A particularly notable aspect of the overall performance was that Vampir trace data was written at a rate of close to 4 Gbit/sec from Dresden, Germany across the Atlantic to the show floor in Reno, using a transatlantic network path that included Internet2, GÉANT, and the German National Research and Education Network (DFN).

Figure 15. A diagram of IU's challenge time window, with green representing outbound data going to the Data Capacitor in Bloomington and red representing incoming data written to the Data Capacitor-Reno.

The goal of the challenge was to demonstrate how use of the 10 Gigabit link could serve as a model for other institutions. Shortly after the bandwidth challenge, the University of Florida set up its own Lustre file system that exports mounts to other universities in Florida across the Florida LambdaRail. In this regard, IU's participation has sparked the imagination of researchers in Florida and will hopefully have an impact on other research groups in the future. The possibilities of sharing data at high speed with colleagues at other institutions are limitless. Couple those possibilities with the ability to perform geographically distributed workflows without explicit data transfer and you have a whole new way of thinking about data solutions. This challenge permitted IU to demonstrate a proof of concept to a wide audience and concluded the first chapter in what we hope will be quite a long book.
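Two back-of-envelope checks on the numbers in this section are sketched below. They assume a nominal 700 MB CD and use the simple bandwidth-delay product as a proxy for how much data must be in flight to keep a 10 Gbps path full at the measured round-trip times; the actual transfers were driven by Lustre clients rather than a single hand-tuned TCP stream.

```python
# Two back-of-envelope checks on the headline numbers.  The 700 MB CD
# capacity and the single-stream framing are simplifying assumptions.

CD_MB = 700                     # nominal capacity of one CD-ROM, in megabytes

sustained_gbps = 16.2
mb_per_minute = sustained_gbps * 1e9 / 8 / 1e6 * 60
print(f"{sustained_gbps} Gbps ~ {mb_per_minute:,.0f} MB/min ~ {mb_per_minute / CD_MB:.0f} CDs/min")

# In-flight data needed to keep a 10 Gbps pipe full at the measured RTTs.
for name, rtt_ms in [("IU", 63), ("RIT", 59), ("TUD", 171)]:
    window_mb = 10e9 * (rtt_ms / 1000) / 8 / 1e6
    print(f"{name}: bandwidth-delay product at 10 Gbps and {rtt_ms} ms ~ {window_mb:.0f} MB")
```

The first line reproduces the "170 CDs per minute" figure (16.2 Gbps works out to roughly 174 of them); the second loop shows why default TCP buffer sizes are inadequate at these distances, since on the order of 80 MB of data must be in flight on the Bloomington path and over 200 MB on the Dresden path.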
5. Acknowledgements

It would not have been possible to carry out this project without the generous and active involvement of our vendor partners, who provided equipment, personnel time, and expertise:
- Force10 Networks
- DataDirect Networks
- Myricom Inc.
- Dell
- Sun (after its purchase of CFS)

We would not have been able to achieve the network performance demonstrated in this bandwidth challenge entry without the expert help and dedication of the staff and leadership of the following networks and network facilities:
- CIC OmniPoP

The scholarly and scientific research projects described here have been supported by the following sources of funding:
- The Data Capacitor project is supported in part by the National Science Foundation under NSF Award Number CNS-0521433 (Craig Stewart, PI; Stephen Simms, Co-PI and project manager; Caty Pilachowski, Randall Bramley, and Beth Plale, Co-PIs).
- IU's involvement in the TeraGrid is supported in part by NSF grants ACI-0338618, OCI-0451237, OCI-0535258, and OCI-0504075.
- Data Acquisition from a Global Consortium of Crystallography Facilities is supported in part by the National Science Foundation under NSF Award Number OCI-0330568 (Donald McMullen, PI; John Huffman, Randall Bramley, Kenneth Chiu, Co-PIs).
- IU's Big Red Supercomputer was funded in part by a grant from the Lilly Endowment, Inc. for the Indiana METACyt Initiative.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation or the Lilly Endowment Inc.

The individual scientific and scholarly projects were led as follows:
- Modeling and analysis of the amyloid peptide (Mu-Hyun Baik)
- Digital preservation of ancient Sanskrit manuscripts (P.R. Mukund)
- Performance analysis of a computational fluid dynamics application (Matthias Müller)
- Simulations of a high energy physics reaction (Scott Teige)

6. References Cited

[1] Association for Computing Machinery SIGARCH and IEEE Computer Society. The SC Conference Series. Available from: http://supercomputing.org/ [cited 2 Jul 2012]
[2] SC07. Challenges. 2007. Available from: http://sc07.supercomputing.org/?pg=challenges.html [cited 2 Jul 2012]
[3] Indiana University Pervasive Technology Institute. SC|06 Bandwidth Challenge. 2006. Available from: https://pti.iu.edu/ci/sc06-bandwidth-challenge [cited 2 Jul 2012]
[4] SC07. SC07 Bandwidth Challenge: End-to-End Achievement: Serving as a Model (Bandwidth Challenge Call for Participation). 2007. Available from: http://sc07.supercomputing.org/html/BWC-Call-Participation.pdf [cited 2 Jul 2012]
[5] SC07. Bandwidth Challenge Finalists: Using the Data Capacitor for Remote Data Collection, Analysis, and Visualization. 2007. Available from: http://sc07.supercomputing.org/schedule/event_detail.php?evid=11464 [cited 2 Jul 2012]
[6] Cluster File Systems, Inc. lustre wiki (archived). 2007. Available from: http://web.archive.org/web/20071025052628/http://wiki.lustre.org/index.php?title=Main_Page [cited 2 Jul 2012]
[7] Simms, S., S. Teige, G. Pike, B. Hammond, Y. Ma, C. Westneat, L.L. Simms and D. Balog. Empowering Distributed Workflow with the Data Capacitor: Maximizing Lustre Performance Across the Wide Area Network. In: Proceedings of the Workshop on Service-Oriented Computing Performance: Aspects, Issues, and Approaches. (Monterey, CA, 2007). Available from: http://portal.acm.org/citation.cfm?id=1272465 [cited 31 Jan 2011]
[8] Simms, S.C., G.G. Pike and D. Balog. Wide Area Filesystem Performance Using Lustre on the TeraGrid. In: Proceedings of TeraGrid 2007. (Madison, WI, 2007). Available from: http://hdl.handle.net/2022/14057 [cited 10 Jan 2012]
[9] Simms, S., M. Davy, B. Hammond, M. Link, C.A. Stewart, R. Bramley, B. Plale, D. Gannon, M.H. Baik, S. Teige, J. Huffman, D. McMullen, D. Balog and G. Pike. All in a Day's Work: Advancing Data-Intensive Research with the Data Capacitor. In: Proceedings of the 2006 ACM/IEEE Conference on Supercomputing. (2006). ACM Press. Available from: http://doi.acm.org/10.1145/1188455.1188711 [cited 15 Nov 2011]
[10] McMullen, D.F., R. Bramley, K. Chiu, H. Davis, T. Devadithya, J.C. Huffman, K. Huffman and T. Reichherzer. The Common Instrument Middleware Architecture: Experiences and Future Directions. In: Signals and Communication Technology. F. Davoli, N. Meyer, R. Pugliese and S. Zappatore, eds. Springer US, 2009. Available from: http://dx.doi.org/10.1007/978-0-387-09663-6_26 [cited 2 Jul 2012]
[11] Vampir. Home page. Available from: http://vampir.eu/ [cited 30 Apr 2010]
[12] ParaTools. Open Trace Format. Available from: http://www.paratools.com/OTF [cited 2 Jul 2012]