National Computational Infrastructure for Lattice Gauge Theory

R. Brower (Boston U.), N. Christ (Columbia U.), M. Creutz (BNL), P. Mackenzie (Fermilab), J. Negele (MIT), C. Rebbi (Boston U.), S. Sharpe (U. Washington), R. Sugar (UCSB) and W. Watson, III (JLab)

The goal of our research is to obtain a quantitative understanding of the physical phenomena encompassed by quantum chromodynamics (QCD), the fundamental theory governing the strong interactions. Achieving this goal requires terascale numerical simulations, which are necessary to solve the fundamental problems in high energy and nuclear physics at the heart of the Department of Energy's large experimental efforts in these fields. The SciDAC Program is enabling U.S. theoretical physicists to develop the software, and to prototype the hardware, they need to carry out terascale simulations of QCD.

The long term goals of high energy and nuclear physicists are to identify the fundamental building blocks of matter, and to determine the interactions among them that lead to the physical world we observe. Remarkable progress has been made through the development of the Standard Model of high energy physics, which provides fundamental theories of the strong, electromagnetic and weak interactions. However, our understanding of the Standard Model is incomplete because it has proven extremely difficult to determine many of the predictions of QCD, the component of the Standard Model that describes the strong interactions. Doing so requires terascale numerical simulations on four-dimensional space-time lattices.

The study of the Standard Model is at the core of the Department of Energy's experimental programs in high energy and nuclear physics. Major goals are to verify the Standard Model or discover its limits; to determine the properties of strongly interacting matter under extreme conditions; and to understand the structure of nucleons and other strongly interacting particles.
Lattice QCD calculations are essential to research in all of these areas. Recent advances in algorithms and calculational methods, coupled with the rapid increase in capabilities of massively parallel computers, have created opportunities for major advances in the next few years. U.S. theoretical physicists must move quickly to take advantage of these opportunities, both to support the experimental programs in a timely fashion and to keep pace with the ambitious plans of theoretical physicists in Europe and Japan. For this reason the entire U.S. lattice QCD community has joined together under the SciDAC Program to build the computational infrastructure needed for the next generation of calculations.

Computational facilities capable of sustaining tens of teraflop/s are needed to meet our near term scientific goals. Lattice QCD has simplifying features: it uses regular grids, and the influence of each lattice site on its neighbors is well understood, leading to uniform, predictable communication patterns. By exploiting these features, it is possible to construct computers for lattice QCD that are far more cost effective than general purpose supercomputers, which must perform well on a wide variety of problems, including those requiring irregular or adaptive grids, non-uniform communication patterns, and massive input/output capabilities. We are targeting a price/performance of $1M or less per sustained teraflop/s in 2004-2005, falling thereafter at roughly 60% per year, in line with Moore's Law.

We have identified two computer architectures that will meet this target. One is the QCDOC, the latest generation of the highly successful Columbia/Riken/Brookhaven National Laboratory (BNL) special purpose computers, developed at Columbia University in partnership with IBM.
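The regularity described above can be made concrete with a small sketch. The following Python fragment (illustrative only; the lattice dimensions and function name are invented, and real lattice QCD codes are written in C/C++ or assembly) shows why communication on a periodic 4-D lattice is uniform and predictable: every site interacts with exactly the same number of nearest neighbors, computed by the same index arithmetic.

```python
# Illustrative sketch, not SciDAC code: nearest-neighbor structure of a
# periodic 4-D space-time lattice. Each site couples only to its 2*d
# nearest neighbors, so the communication pattern is identical at every
# site -- the property that makes special purpose QCD machines feasible.

def neighbors(site, dims):
    """Return the 2*len(dims) nearest neighbors of `site`,
    with periodic (wrap-around) boundary conditions."""
    nbrs = []
    for mu in range(len(dims)):          # one direction per dimension
        for step in (+1, -1):            # forward and backward neighbor
            n = list(site)
            n[mu] = (n[mu] + step) % dims[mu]   # periodic boundary
            nbrs.append(tuple(n))
    return nbrs

dims = (4, 4, 4, 8)                      # a small x, y, z, t lattice
site = (0, 0, 0, 0)
print(len(neighbors(site, dims)))        # prints 8: every 4-D site has 8 neighbors
```

Because the neighbor list is the same simple function of the site coordinates everywhere, the communication volume and pattern are known at compile time, unlike the irregular or adaptive grids mentioned above.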
The other is commodity clusters, which are being specially optimized for lattice QCD at Fermi National Accelerator Laboratory (FNAL) and Thomas Jefferson National Accelerator Facility (JLab).

Under the SciDAC Program we have designed and implemented a QCD Applications Programming Interface (QCD API), which provides a uniform programming environment for achieving high efficiency on the QCDOC, on optimized clusters and on commercial supercomputers. By creating standards for communication interfaces, optimized low level algebraic kernels, optimized high level operators and other run-time functions, the valuable U.S. base of application codes can be easily ported and extended as computer architectures evolve, without duplication of effort by the lattice QCD community. This has been demonstrated on the QCDOC hardware and on clusters.

The QCD API has three layers. At the lowest level are the message passing and linear algebra routines essential for all QCD applications. These have been written, and are optimized for the QCDOC and for clusters. The middle layer provides a data parallel language that enables new applications to be developed rapidly and run efficiently. This language uses the low level communications and linear algebra routines transparently to the applications programmer. C and C++ versions are currently available. In any QCD application, the overwhelming share of the computation is done in a small number of subroutines. The top layer of the QCD API consists of highly optimized versions of these subroutines, which can be called directly from the data parallel language, or from C and C++ code. For the QCDOC, these subroutines have been written in assembly language and thoroughly tested on the initial hardware. Optimization of these computationally intensive subroutines for clusters is in progress.

The SciDAC Program is also supporting the construction of prototype clusters for the study of QCD.
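The three-layer structure of the QCD API can be sketched in miniature. The Python fragment below (a toy illustration with invented names; it is not the actual QCD API, whose layers are C/C++ libraries and assembly kernels) mirrors the division of labor: a site-local linear algebra kernel at the bottom, a data-parallel map over lattice sites in the middle, and a high-level operator a physicist would call at the top.

```python
# Toy illustration of the three-layer QCD API structure (invented names,
# not the real SciDAC interfaces).
import numpy as np

# Bottom layer: low level linear algebra -- an SU(3) color matrix (3x3
# complex) acting on a color 3-vector, the inner kernel of QCD codes.
def su3_apply(link, vec):
    return link @ vec

# Middle layer: data parallel application of a site-local kernel over
# every lattice site, hiding the loop (and, on a real machine, the
# communication) from the applications programmer.
def map_sites(kernel, links, field):
    return np.array([kernel(U, v) for U, v in zip(links, field)])

# Top layer: a high level operator built from the layers below, callable
# directly from user code.
def apply_links(links, field):
    return map_sites(su3_apply, links, field)

n_sites = 8
links = np.array([np.eye(3, dtype=complex)] * n_sites)  # identity gauge links
field = np.ones((n_sites, 3), dtype=complex)            # unit color vectors
out = apply_links(links, field)
print(out.shape)   # one color 3-vector per site
```

In the real API the middle layer's loop is replaced by optimized communication and the bottom layer by hand-tuned kernels (assembly on the QCDOC), but the layering, and the fact that user code touches only the top, is the same.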
The objectives of this work are to determine optimal configurations for the multi-teraflop/s clusters we propose to build in the next few years; to provide platforms for testing the SciDAC software; and to enable important research in QCD. The SciDAC funded work on cluster development is being undertaken collaboratively by Fermilab and JLab/MIT. The initial clusters have proven invaluable for our software effort and our research in QCD. Additional clusters are planned for the current year at both laboratories.

Development work on the QCDOC, which was funded outside the SciDAC Program, has been completed. Initial hardware has been constructed, and tests indicate that it will achieve its design goal of $1M per sustained teraflop/s. The designers are beginning to construct multi-teraflop/s machines. We propose to build a QCDOC sustaining 10 teraflop/s at BNL this year, followed by clusters of comparable capability at FNAL and JLab in 2005 and 2006. These machines will enable major progress in our understanding of the fundamental laws of nature, and of the physical phenomena arising from QCD.