Math Libraries for Windows HPC

Author: Dr. Edward Stewart, IMSL Product Manager, Visual Numerics, Inc.
Published: February 15, 2016

Abstract

This paper discusses the current state of mathematical libraries available for Microsoft® Windows®, with a focus on High Performance Computing (HPC) use and specifically Windows® HPC Server 2008. Software analyzed will be limited to versions designed for 64-bit x86 architectures (specifically Intel EM64T and AMD Opteron) as this is the required platform for Windows HPC Server 2008. An overview of both open source and commercial software options will be provided with a focus on uses in distributed computing.

This paper is intended for software developers familiar with HPC. A high level of expertise in mathematics or details of distributed computing techniques like MPI is not required. Familiarity with Windows tools like Visual Studio® 2008 and the Windows HPC Job Scheduler for Windows HPC Server 2008 would be helpful to follow the examples, but is not necessary.

Starting with an overview of Math Library uses in industry and research, coverage will progress from open source linear algebra tools, to vendor-supplied libraries, to broad commercial libraries. Special libraries for distributed computing will also be discussed. Finally, two examples will be presented that describe developing HPC applications on Windows using math libraries. The first is a distributed application utilizing MPI and the IMSL® Fortran library. The second is a parameter sweep type distributed application utilizing the IMSL C# Library for .NET Applications.
Contents

Usage of Math Libraries
Linear Algebra Libraries
Vendor Libraries
Distributed Computing Libraries
Broad Math Libraries
.NET Math Libraries
Java Math Libraries
SMP Parallelization
Distributed Math Libraries on Windows
Architecture Overview
MPI and the IMSL Fortran Numerical Library
Example: IMSL Fortran on Windows HPC Server 2008 with MS-MPI
Installing the IMSL Fortran Library on Windows
Creating the Project
Running the Example
From the Command Line
Other IMSL Libraries with MPI
Parameter Sweep Distributed Applications
Architecture Overview
Example: Parameter Sweep Distributed Application in C#
Installing the IMSL C# Numerical Library
Creating the Project
Running the Example
Summary
Appendix A
Feedback
More Information and Downloads

Contributors and Acknowledgements

Contributors from Visual Numerics for this document include Dr. Edward Stewart, IMSL Product Manager; Greg Holling, Principal Consulting Engineer; and Ryan Wagner, Technical Support Engineer.

An Overview of Math Libraries for Windows

Usage of Math Libraries

For computer programmers, calling pre-written subroutines to do complex calculations dates back to early computing history. With minimal effort, any developer can write a function that multiplies two matrices, but these same developers would not want to re-write that function for every new program that requires it. Further, with good theory and practice, one can optimize practically any algorithm to run several times faster, though it would typically take several hours to days to match the performance of a highly optimized algorithm. For example, compare a naïve matrix multiplication to a modern algorithm with blocking that ensures efficient use of cache memory. At this point, it makes sense for the original developer, who has a much larger problem to solve, to call on a pre-written function to perform this matrix multiplication quickly and efficiently so that attention can be focused on the higher level application.

The first experience many developers have with such functions is the Numerical Recipes reference books. Positioned as educational material on “The Art of Scientific Computing”, the series contains complete commented source code for hundreds of algorithms. While there is controversy around some of the algorithms themselves, Numerical Recipes has exposed many programmers to the fundamental concept of utilizing pre-written functions instead of starting from scratch. Numerical Recipes is available today as source code in several languages at http://www.nr.com/.
Using Numerical Recipes on Windows® is as easy as copying the source into your application and compiling the code.

Historically, during the 1950s and 1960s the United States had what can be called a “software crisis”, partly because of the space race and cold war, but also partly because of the rapid advances in computing hardware. At that time, one had to be a mathematician to program a computer. Even when Fortran was developed in 1957, the complexities of the hardware and algorithms required a great deal of mathematical knowledge. The software crisis arose because there were far more problems needing solving than capable mathematicians (and therefore programmers), because programs were difficult and time consuming to write, and because the software produced was very error prone.

Commercial libraries arrived in the early 1970s as a solution to the crisis and to address the issue of consistency of numerical results across various computing platforms. Primarily these tools meant that programmers did not have to reinvent the wheel to program well-known algorithms, but they also meant the resulting programs were easier to read and follow. Improved reliability was (and still is for many users) another key reason to justify using a library, as one can rely on a tested, proven collection of shared knowledge instead of an individual mathematician's particular view of the solution.

Scientific computing, and the use of math libraries, was traditionally limited to research labs and engineering disciplines. In recent decades, this niche computing market has blossomed across a variety of industries. While research institutes and universities are still the largest users of math libraries, especially in the High Performance Computing (HPC) arena, industries like financial services and biotechnology are increasingly turning to math libraries as well. Even the business analytics arena around business intelligence and data mining is starting to leverage the existing tools.
From bond pricing and portfolio optimization to exotic instrument evaluations and exchange rate analysis, the financial services industry has a wide variety of requirements for complex mathematical algorithms. Similarly, the biology disciplines have aligned with statisticians to analyze experimental procedures which produce hundreds of thousands of results. With this expanded industry adoption, and with availability in new environments like Microsoft® Windows®, the use of math libraries has grown significantly.

Linear Algebra Libraries

The core area of the math library market implements linear algebra algorithms. More specialized functions, such as numerical optimization and time series forecasting, are often invoked explicitly by users. In contrast, linear algebra functions are often used as key background components for solving a wide variety of problems. Eigenanalysis, matrix inversion and other linear calculations are essential components in nearly every statistical analysis in use today, including regression, factor analysis, discriminant analysis, etc.

The most basic suite of such algorithms is the BLAS (Basic Linear Algebra Subprograms) libraries for basic vector and matrix operations. BLAS are divided into three separate levels. Level 1 contains functions that operate on two vectors; Level 2, on a vector and a matrix; and Level 3, on two matrices. In particular, scalar dot products would fall into Level 1, multiplication of a matrix by a vector into Level 2, and matrix-matrix multiplication into Level 3. BLAS implementations will be discussed later in this paper as parts of other broader libraries.

One of the first such libraries to build on the BLAS foundations is the LINPACK library written in the 1970s.
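As a concrete illustration of the three BLAS levels described above, the following minimal sketch performs one operation of each kind. It is not taken from any library discussed in this paper: to stay self-contained it uses the Fortran intrinsics dot_product and matmul as stand-ins for the corresponding BLAS routines (named in the comments), and the program and variable names are chosen purely for illustration.

```fortran
program blas_levels
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: x(3), y(3), z(3), s
  real(dp) :: a(3,3), b(3,3), c(3,3)
  integer :: i

  x = (/ 1.0_dp, 2.0_dp, 3.0_dp /)
  y = (/ 4.0_dp, 5.0_dp, 6.0_dp /)

  ! A = diag(1, 2, 3), filled column by column; B = 2 * identity
  a = reshape((/ 1.0_dp, 0.0_dp, 0.0_dp, &
                 0.0_dp, 2.0_dp, 0.0_dp, &
                 0.0_dp, 0.0_dp, 3.0_dp /), (/ 3, 3 /))
  b = 0.0_dp
  do i = 1, 3
     b(i,i) = 2.0_dp
  end do

  ! Level 1: vector-vector operation (the BLAS counterpart is DDOT)
  s = dot_product(x, y)

  ! Level 2: matrix-vector operation (the BLAS counterpart is DGEMV)
  z = matmul(a, x)

  ! Level 3: matrix-matrix operation (the BLAS counterpart is DGEMM)
  c = matmul(a, b)

  print *, 'Level 1, x . y       :', s
  print *, 'Level 2, A * x       :', z
  print *, 'Level 3, diag(A * B) :', (c(i,i), i = 1, 3)
end program blas_levels
```

With an optimized BLAS linked in, the same three operations would normally be expressed as calls to the named routines, and the largest gains are typically seen at Level 3, where the ratio of arithmetic to memory traffic is highest.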
LINPACK has now been superseded by LAPACK (the Linear Algebra PACKage), which solves a wide variety of problems including linear systems of equations, least-squares problems, eigensystems (superseding the earlier EISPACK), various decompositions and others. LAPACK is a free library available as source code from the NetLib repository at http://www.netlib.org/lapack/. The source code is Fortran77, but LAPACK95 is also available in Fortran95. Prebuilt binaries are available for some environments, including 32-bit Windows compiled specifically for the Pentium Pro architecture. To use LAPACK on Windows® HPC Server 2008 as a 64-bit library, a developer needs access to a Fortran compiler and will need to rebuild the library. Unfortunately, this is not always straightforward.

For best performance, LAPACK should be used with a BLAS package optimized for specific hardware. NetLib supplies the ATLAS (Automatically Tuned Linear Algebra Software, http://www.netlib.org/atlas/) package for this purpose. APIs for ATLAS are available in both C and Fortran77. ATLAS is included in many open source software packages that extend functionality well beyond linear algebra, but require BLAS functionality. As with LAPACK, Windows HPC Server 2008 developers will want to rebuild the library from source for optimal performance.

GotoBLAS is a BLAS library available from the Texas Advanced Computing Center in binary and source form. GotoBLAS claims to be the fastest available implementation of BLAS. The speedup in the library is obtained by minimizing Translation Look-aside Buffer (TLB) misses, with less relative emphasis on optimizing the usage of L1 and L2 cache. GotoBLAS is available in Fortran source form, and it has been ported to a number of high-performance computing platforms. “Goto” is the name of the developer of the BLAS routines, Kazushige Goto. GotoBLAS is being ported to the 64-bit Windows platform by Dr. Goto with the same excellent performance as the Linux version.
Contact the Texas Advanced Computing Center (http://www.tacc.utexas.edu) to obtain access to GotoBLAS.

Many developers would rather not build libraries from source code; they require pre-built binaries that are already optimized for their development platform. Optimized vendor-supplied libraries have been marketed for this purpose.

Vendor Libraries

Most hardware vendors recognize the need for optimized linear algebra routines for their platforms. Since they are most familiar with the details of their particular platforms, these vendors are in the best position to build and optimize such libraries. For the Windows HPC Server 2008 platform, the two libraries that fall into this category are the Intel Math Kernel Library (MKL) and the AMD Core Math Library (ACML). Both of these include complete implementations of all three levels of BLAS, LAPACK, Fast Fourier Transforms (FFTs), and Random Number Generators. The latest versions include excellent scaling for multi-core hardware and take advantage of low-level processor extensions like SSE, SSE2 and SSE3.

Vendors will often tune particularly popular routines for optimal performance on their latest processor offerings. For example, ACML now includes explicit support for the AMD Barcelona processor in the Level 3 BLAS functions SGEMM and DGEMM (single- and double-precision matrix multiplication). Similarly, Intel has optimizations for the new quad-core Xeon processor 5300 series.

To achieve high performance on Shared Memory Parallel (SMP) systems such as today’s multi-core processors and multi-CPU systems, these libraries leverage OpenMP. To allow the libraries to take full advantage of available hardware, the user sets the environment variable OMP_NUM_THREADS at runtime to the desired number of threads. Alternatively, if the developer wants to parallelize code at a higher level, these libraries are available in single-threaded, but thread-safe versions, so they can be used in explicitly threaded applications.
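For developers who choose to parallelize at a higher level themselves, the following minimal sketch shows the general OpenMP pattern of sharing the iterations of an outer loop among threads. It is an illustration only and does not reflect how MKL or ACML are implemented internally; the program name, array names and problem size are arbitrary. When the compiler's OpenMP option is enabled, the thread count is taken from OMP_NUM_THREADS; without that option the directives are treated as ordinary comments and the loop simply runs serially.

```fortran
program omp_outer_loop
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 500
  real(dp), allocatable :: a(:,:), x(:), y(:)
  real(dp) :: acc
  integer :: i, j

  allocate(a(n,n), x(n), y(n))
  a = 1.0_dp
  x = 2.0_dp

  ! Matrix-vector product y = A * x.
  ! Each outer iteration writes a distinct y(i), so the iterations are
  ! independent and can safely be divided among threads. The inner loop
  ! index and the accumulator must be private to each thread.
  !$omp parallel do private(j, acc) shared(a, x, y)
  do i = 1, n
     acc = 0.0_dp
     do j = 1, n
        acc = acc + a(i,j) * x(j)
     end do
     y(i) = acc
  end do
  !$omp end parallel do

  ! Every entry of y should equal 2 * n for this choice of A and x
  print *, 'y(1) =', y(1)
  print *, 'largest deviation from 2*n =', maxval(abs(y - 2.0_dp * n))
end program omp_outer_loop
```

Because the parallelism here lives in the application loop, a routine called from inside such a loop would normally be the single-threaded, thread-safe variant of the library, which avoids creating more threads than there are cores.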
Both MKL and ACML are commercial libraries available from various resellers. As low-level libraries, they include flexible distribution policies and are relatively inexpensive.

Distributed Computing Libraries

The linear algebra and vendor libraries discussed so far focus on high performance algorithms for single machines. The HPC space and Windows HPC Server 2008 also target distributed computing, where a collection of separate computers, or nodes, is connected together with a high-speed interconnect and used as a single unit. This style of cluster computing often makes use of the Message Passing Interface (MPI) for parallel programming. A number of libraries have been extended to utilize MPI and allow developers to solve large linear algebra problems on these much larger systems without having to explicitly parallelize the algorithms themselves.

The ScaLAPACK (Scalable LAPACK, http://www.netlib.org/scalapack/) library is one such implementation, again hosted by NetLib. This library is a subset of the LAPACK functions that have been redesigned for distributed systems. The overall ScaLAPACK project is actually made up of four components:

- ScaLAPACK for dense and band matrix systems
- PARPACK for sparse eigenvalue systems
- CAPSS for sparse direct systems
- ParPre for sparse iterative solvers

Using ScaLAPACK is quite a bit more complex than using the LAPACK equivalent. Data must be stored in the block cyclic decomposition, and familiarity with MPI within the program is required. For the BLAS component, PBLAS (Parallel Basic Linear Algebra Subprograms) are utilized along with BLACS (Basic Linear Algebra Communication Subprograms) for communication tasks.

Pre-built binaries are available for some systems, but many users may again turn to the supported and optimized vendor options like the Intel Math Kernel Library (MKL). In previous versions, Intel supported a separate version of MKL called Intel Cluster MKL for distributed systems.
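Returning briefly to the block cyclic decomposition that ScaLAPACK requires, the following minimal sketch shows the index arithmetic for the one-dimensional case: which process owns a given global index, and where that index sits in the owner's local storage. It assumes one-based global and local indices, zero-based process ranks, and that the first block is placed on process 0. The module and function names are hypothetical and are not part of ScaLAPACK or any library discussed here.

```fortran
module block_cyclic
  implicit none
contains

  ! Zero-based rank of the process that owns one-based global index g,
  ! for blocks of nb consecutive indices dealt out to nprocs processes
  integer function owner(g, nb, nprocs)
    integer, intent(in) :: g, nb, nprocs
    owner = mod((g - 1) / nb, nprocs)
  end function owner

  ! One-based position of global index g within its owner's local storage
  integer function local_index(g, nb, nprocs)
    integer, intent(in) :: g, nb, nprocs
    ! (number of complete local blocks before it) * nb + offset in block
    local_index = ((g - 1) / (nb * nprocs)) * nb + mod(g - 1, nb) + 1
  end function local_index

end module block_cyclic

program demo_block_cyclic
  use block_cyclic
  implicit none
  integer, parameter :: n = 10, nb = 2, nprocs = 3
  integer :: g

  ! Distribute 10 indices in blocks of 2 over 3 processes
  do g = 1, n
     print '(a,i3,a,i2,a,i2)', 'global ', g, &
           ' -> process ', owner(g, nb, nprocs), &
           ', local ', local_index(g, nb, nprocs)
  end do
end program demo_block_cyclic
```

For a matrix, the same rule is applied independently to the row index and the column index over a two-dimensional process grid. Dealing blocks out in round-robin fashion keeps the work balanced as a factorization proceeds through the matrix, which is the main reason the layout is used.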
With version 10.0, MKL and Cluster MKL have been merged into a single product. Thus MKL now includes BLACS and ScaLAPACK implementations optimized for Intel hardware in distributed architectures. For users who require fundamental distributed linear algebra functions on Windows HPC Server 2008, the Intel MKL is a good choice to avoid all the issues of building such complex libraries from source.

Broad Math Libraries

The libraries mentioned thus far are largely focused on linear algebra. While linear algebra makes up a core piece of any numerical library offering, it is only a subset of the algorithms many of today’s developers require. Since there are many mathematical tools on the market today, the focus of the following discussion is narrowed to callable libraries with native language programming interfaces. This narrowing of focus excludes desktop math tools such as Matlab, PV-WAVE, SAS, Sage, Mathematica, and dozens of others. While some of these packages offer solutions for distributed systems and almost all are available for Windows, a developer writing in standard languages like C, Fortran or any .NET language and seeking high performance solutions will usually opt not to wrap function calls into a desktop analytics package.

With the scope reduced in this way, we are left with a much smaller list of broad math libraries. The GNU Scientific Library (GSL) is one example. The list of areas covered by GSL is extensive and well beyond basic linear algebra. Topics like root finding, numerical optimization, statistical analysis, differential equations, and curve fitting are covered but are just a short sample. GSL is an open source library available under the GNU General Public License (GPL). Limitations of the GPL aside, GSL is challenging to use on a Windows system. Compiled binaries are available as part of the Cygwin environment for Windows, which provides a Linux-like environment.
While it might seem appealing to migrate HPC applications to Windows via the Cygwin environment, it is unfortunately challenging to integrate Cygwin with the standard Windows development environment. Specifically, mixing Windows development tools with Cygwin libraries is difficult at best. As such, unless all development tools and resources for a project exist for the Cygwin environment, GSL is not usually a viable option for the Windows HPC Server 2008 environment. Finally, GSL functions are not MPI-enabled and so distributed calculations across a cluster environment would require significant development effort.

The remaining broad math libraries available for Windows are commercial libraries with a long history of supporting the Microsoft environment. Both the Numerical Algorithms Group (NAG) Library and Visual Numerics’ IMSL Numerical Library are available in C and Fortran versions for Microsoft Windows 64-bit systems. Both of these libraries support the Intel Visual Fortran compiler, Intel C++ compiler and the Microsoft® Visual C++® compiler. As with GSL, the coverage is very wide and goes well beyond linear algebra. Further, all versions of these products will link in vendor-supplied BLAS like MKL or ACML for the best performance for linear algebra functions (either directly utilized through the higher level interface or internally as BLAS).

The NAG and IMSL Numerical Libraries are commercial products with solid documentation and available technical support. Both have been on the market since the early 1970s and have continued to evolve over the decades. For distributed computing environments NAG requires two Fortran libraries, the NAG SMP Library (for shared memory systems) and the NAG Parallel Library. As of this writing, neither of these libraries is available for the Windows operating system. The IMSL Fortran Library is a single product that contains some SMP-enabled functions and MPI-enabled routines.
The MPI functions were expanded in version 6.0 to include a wide variety of ScaLAPACK functions along with utility functions that make distributed computing much more accessible for developers who are not MPI experts. This version of the IMSL Fortran Library is available for 64-bit Windows systems and will be the focus of an example in the following section.

.NET Math Libraries

With a focus on HPC, much of the discussion has been around libraries implemented in Fortran, and to a lesser extent C or C++. Since the focus of this paper is on Windows HPC Server 2008, .NET languages may also play a significant role for many developers. Math libraries for .NET are almost all written in C#. Some are pure C#, like the open source Math.NET project, while others include native code for higher performance, like the NMath product from CenterSpace Software. NMath started out as a C# wrapper for the Intel MKL library, but has expanded into statistical functions and other areas. The Extreme Optimization Numerical Libraries for .NET focus on an object oriented interface and cover a similarly wide variety of functionality.

The IMSL C# Numerical Library comes in two formats as of version 5.0. One version is written in pure C# for developers who require purely managed code implementations. A second version integrates MKL for low-level BLAS functions to boost performance for developers whose projects do not require pure managed code, but who still want a .NET interface. The IMSL C# Numerical Library also includes charting classes with a programmatic interface to allow an easy path to visualization of results using a single tool.

For programming numerical applications, F# is becoming a very popular option (see http://research.microsoft.com/fsharp/fsharp.aspx). This general purpose language combines procedural, object oriented and functional programming language elements, and includes many features that make complex programming tasks easier.
With full integration into the .NET Framework, solid performance, and an interactive scripting environment, there are many advantages to this novel platform. Not only does F# have access to the full .NET class library and integration into Visual Studio® 2008, but third party libraries like those mentioned above are also fully supported. A quick example of calling the IMSL C# Library from the F# Interactive Console is shown in Figure 1.

    MSR F# Interactive, (c) Microsoft Corporation, All Rights Reserved
    F# Version 1.9.4.17, compiling for .NET Framework Version v2.0.50727

    > #r "c:\\program files\\vni\\imsl\\imslcs500\\bin\\imslcs.dll";;
    --> Referenced 'c:\program files\vni\imsl\imslcs500\bin\imslcs.dll'

    > open Imsl.Math;;
    > let g = Sfun.Gamma(0.5);;
    val g : float
    Binding session to 'c:\program files\vni\imsl\imslcs500\bin\imslcs.dll'...
    > g;;
    val it : float = 1.772453851
    >

Figure 1. A sample interactive F# session calling a .NET library.

Java Math Libraries

Java is another platform option along the lines of the .NET Framework. Java is more focused on cross-platform application development, while .NET is more focused on effective development within the Windows environment. Java applications and tools will generally work well in heterogeneous Windows and Unix/Linux environments, for example. Calling C or C++ libraries from Java is an option for some developers, but when cross-platform solutions are required a pure Java library becomes a requirement.

Java has a very large open source community and some numerical libraries are available in this form. The Java Matrix Package (JAMA) covers the basics of linear algebra, while the Colt Project expands on the theme to cover a broader range of algorithms for scientific and technical computing in Java. The project seeing the most active development, with a very ambitious set of future goals, is Commons-Math under the Apache Commons hierarchy of open source projects.
The commercial offerings of Java numerical libraries are not as broad as those for the other platforms. While there are some industry-specific commercial tools (especially for financial services and biotech), the only broad commercial math library for Java is the Visual Numerics JMSL Numerical Library.

SMP Parallelization

With multi-core and many-core hardware becoming commonplace, shared memory parallelization is becoming more popular. The most common interface for SMP parallelization in the mathematical and scientific programming communities is OpenMP. When OpenMP directives are added to an application, a supporting compiler takes care of many of the details of parallelizing the code. Many mathematical algorithms involve repeated looping over data, and very good performance gains can be realized by parallelizing large outer loops in these algorithms. Since Visual Studio 2005, the Microsoft Visual C++ compiler has supported OpenMP.

In addition to OpenMP and with a focus on the .NET platform, Parallel Extensions to the .NET Framework 3.5 is currently available as a Community Technology Preview (see http://www.microsoft.com/downloads/details.aspx). While this is an early release for testing purposes only, the implementation holds a lot of promise for developers who want to multi-thread their .NET applications without managing all the details of the thread pool themselves. This release of the Parallel Extensions is essentially an updated System.Threading library with additional constructs like “Parallel.For” and “Parallel.ForEach”. This syntax is intuitive for anyone familiar with OpenMP, and thus adding SMP threading to .NET code with this library is expected to be straightforward. The extension also includes a System.Threading.Tasks namespace providing support for imperative task parallelism. This namespace includes the expected Task objects as well as a TaskManager class representing the scheduler that executes the tasks.
Finally, this work includes Parallel LINQ (PLINQ), which allows LINQ developers to leverage parallel hardware with minimal impact on the existing programming model.

Visual Numerics is currently investigating the details of this programming model with hopes to integrate the functionality inside a future release of the IMSL C# Numerical Library, providing enhanced performance with no code changes required for existing users. While existing IMSL classes can be leveraged inside parallel blocks, the best performance gain would clearly come from adding parallelism within the multifaceted math algorithms of the library.

Distributed Math Libraries on Windows

Architecture Overview

In this section, we will provide an example of distributed calculations utilizing Microsoft® Message Passing Interface, MS-MPI. The IMSL Fortran Library will be the library of choice due to its support for this Windows environment and also its ease of use for MPI applications.

Consider the typical network topology for Windows HPC Server 2008 shown in Figure 2. The fine black lines and arrows indicate the flow of data in a network where the Head Node will spawn a job that utilizes the parallelization features included in the IMSL Fortran Library. In this case all of the necessary programming tools are installed on the Head Node; these include Visual Studio 2008, Intel Visual Fortran 10 and IMSL Fortran Numerical Library 6. The example codes are compiled and linked on the head node and then executed using the Windows HPC Job Manager. The runtime components are located in a shared folder on the head node that each compute node can access. Information is distributed to each compute node, which then performs its set of calculations, returning the results when work is complete.

Figure 2. A typical cluster network topology for an MS-MPI application.
MPI and the IMSL Fortran Numerical Library

The IMSL Fortran Numerical Library is the primary offering from Visual Numerics for distributed applications and a good choice for the Windows platform given the review above. The library contains many functions that are MPI-enabled to solve large problems on clustered hardware. The functions are centered on linear algebra problems, but also extend into the realm of optimization. Furthermore, with version 6 of the IMSL Fortran Library, the existing API for many routines has been extended to leverage ScaLAPACK behind the scenes. This feature allows new users of MPI and ScaLAPACK to use the familiar IMSL Fortran Library interface instead of learning all the intricate details of MPI. While the MPI interface of the IMSL Fortran Library is based on MPICH2, the MS-MPI implementation included with Windows HPC Server 2008 is compatible and is the best choice when using the Microsoft tools such as the Job Manager.

There are only two requirements to develop with the IMSL Fortran Library on Windows operating systems: 1) a license for the IMSL Fortran Library and 2) a supported compiler. As of 2008, the supported compilers are the Intel Visual Fortran compiler Version 10, Absoft Pro Fortran Version 10.1 and the Portland Group’s PGI Fortran compiler version 7.1-5. One benefit of these “Visual Fortran” compilers is their integration with Visual Studio. Fortran developers who struggle with assorted command line tools and the sparse selection of fully featured editors should find that using Visual Studio for Fortran development significantly improves productivity. The example in this section will use the Intel Visual Fortran compiler.

Using MS-MPI with the IMSL Fortran Library is straightforward. By default, the batch script for compiling Windows Fortran applications makes reference to the MPICH2 binary library file mpi.lib.
To execute using MS-MPI, the link option must be changed to pick up msmpi.lib instead. At runtime, the Compute Cluster Scheduler and Job Manager will be used to define and execute MPI applications bound for the cluster. The graphical user interface, together with the tool’s knowledge of other users’ scheduled projects, makes for a pleasant experience. If certain tasks require specific resources, the tool will wait for them to become available before attempting to execute the distributed task.

The IMSL Fortran Library has two primary methods for distributing problems: the ScaLAPACK API technique and the Box Data Type technique.

In the ScaLAPACK API technique, an IMSL function that references a ScaLAPACK function is used. Instead of having to configure the program manually with MPI calls such as MPI_BCAST and the MPI_COMM_WORLD communicator, the IMSL Fortran Library provides several utilities to configure MPI and ScaLAPACK, easing the burden on the developer. This technique is particularly helpful for programmers new to MPI-based distributed computing. An example from the IMSL Library documentation follows in Figure 3, where the IMSL Library interfaces and utilities are used instead of traditional code. Behind the scenes the ScaLAPACK function DGESVD (which computes the singular value decomposition of a double precision rectangular matrix) is referenced through the call to the IMSL subroutine LSVRR.

    USE MPI_SETUP_INT
    USE IMSL_LIBRARIES
    USE SCALAPACK_SUPPORT
    IMPLICIT NONE
    INCLUDE 'mpif.h'
    ! Declare variables
    INTEGER KBASIS, LDA, LDQR, NCA, NRA, DESCA(9), DESCU(9), &
            DESCV(9), MXLDV, MXCOLV, NSZ, MXLDU, MXCOLU
    INTEGER INFO, MXCOL, MXLDA, LDU, LDV, IPATH, IRANK
    REAL TOL, AMACH
    REAL, ALLOCATABLE :: A(:,:), U(:,:), V(:,:), S(:)
    REAL, ALLOCATABLE :: A0(:,:), U0(:,:), V0(:,:), S0(:)
    PARAMETER (NRA=6, NCA=4, LDA=NRA, LDU=NRA, LDV=NCA)
    NSZ = MIN(NRA,NCA)
    ! Set up for MPI
    MP_NPROCS = MP_SETUP()
    IF (MP_RANK .EQ. 0) THEN
       ! S is allocated further below on every rank, so it is not
       ! allocated here (allocating it twice on rank 0 is an error)
       ALLOCATE (A(LDA,NCA), U(LDU,NCA), V(LDV,NCA))
       ! Set values for A
       A(1,:) = (/ 1.0, 2.0, 1.0, 4.0/)
       A(2,:) = (/ 3.0, 2.0, 1.0, 3.0/)
       A(3,:) = (/ 4.0, 3.0, 1.0, 4.0/)
       A(4,:) = (/ 2.0, 1.0, 3.0, 1.0/)
       A(5,:) = (/ 1.0, 5.0, 2.0, 2.0/)
       A(6,:) = (/ 1.0, 2.0, 2.0, 3.0/)
    ENDIF
    ! Set up a 1D processor grid and define
    ! its context ID, MP_ICTXT
    CALL SCALAPACK_SETUP(NRA, NCA, .TRUE., .TRUE.)
    ! Get the array descriptor entities MXLDA,
    ! MXCOL, MXLDU, MXCOLU, MXLDV, AND MXCOLV
    CALL SCALAPACK_GETDIM(NRA, NCA, MP_MB, MP_NB, MXLDA, MXCOL)
    CALL SCALAPACK_GETDIM(NRA, NSZ, MP_MB, MP_NB, MXLDU, MXCOLU)
    CALL SCALAPACK_GETDIM(NSZ, NCA, MP_MB, MP_NB, MXLDV, MXCOLV)
    ! Set up the array descriptors
    CALL DESCINIT(DESCA, NRA, NCA, MP_MB, MP_NB, 0, 0, MP_ICTXT, &
                  MXLDA, INFO)
    CALL DESCINIT(DESCU, NRA, NSZ, MP_MB, MP_NB, 0, 0, MP_ICTXT, &
                  MXLDU, INFO)
    CALL DESCINIT(DESCV, NSZ, NCA, MP_MB, MP_NB, 0, 0, MP_ICTXT, &
                  MXLDV, INFO)
    ! Allocate space for the local arrays
    ALLOCATE (A0(MXLDA,MXCOL), U0(MXLDU,MXCOLU), &
              V0(MXLDV,MXCOLV), S(NCA))
    ! Map input array to the processor grid
    CALL SCALAPACK_MAP(A, DESCA, A0)
    ! Compute all singular vectors
    IPATH = 11
    TOL = AMACH(4)
    TOL = 10. * TOL
    CALL LSVRR (A0, IPATH, S, TOL=TOL, IRANK=IRANK, U=U0, V=V0)
    ! Unmap the results from the distributed
    ! array back to a non-distributed array.
    ! After the unmap, only Rank=0 has the full array.
    CALL SCALAPACK_UNMAP(U0, DESCU, U)
    CALL SCALAPACK_UNMAP(V0, DESCV, V)
    ! Print results. Only Rank=0 has the solution.
    IF (MP_RANK .EQ. 0) THEN
       CALL WRRRN ('U', U, NRA, NCA)
       CALL WRRRN ('S', S, 1, NCA, 1)
       CALL WRRRN ('V', V)
    ENDIF
    ! Exit ScaLAPACK usage
    CALL SCALAPACK_EXIT(MP_ICTXT)
    ! Shut down MPI
    MP_NPROCS = MP_SETUP('FINAL')
    END

Figure 3. Fortran code example showing the IMSL Fortran Library tools that leverage ScaLAPACK.

This example computes the singular value decomposition of a 6 x 4 matrix A.
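For reference, the factorization computed in Figure 3 can be written out explicitly. This is a sketch in standard singular value decomposition notation; the symbol Σ and the stated shapes are inferred here from the array declarations in the listing (U is NRA x NCA, V is NCA x NCA, S holds NCA values) rather than quoted from the IMSL documentation.

```latex
A = U \,\Sigma\, V^{T}, \qquad
A \in \mathbb{R}^{6 \times 4},\quad
U \in \mathbb{R}^{6 \times 4},\quad
\Sigma = \operatorname{diag}(s_1, s_2, s_3, s_4),\quad
V \in \mathbb{R}^{4 \times 4}
```

In the listing, the vector S stores the diagonal entries of Σ, which is why it is printed as a single row of NCA values.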
The matrices U and V, containing the left and right singular vectors respectively, and the diagonal of S, containing the singular values, are printed using the utility routine WRRRN. More information about this example can be found in the online documentation. Notice that array allocation and several different utility routines are still required, but working from this example is significantly more straightforward for a developer new to MPI than the equivalent non-IMSL ScaLAPACK version. In comparison, calling ScaLAPACK directly would require several calls to BLACS_**** routines and many other functions not required when using the MP_SETUP and SCALAPACK_SETUP convenience routines provided by the IMSL Fortran Library. If the MP_SETUP and SCALAPACK_SETUP calls are removed from the above example, the call to LSVRR is executed on a single computer (not distributed) using the equivalent LAPACK function instead.

In the Box Data Type technique, multiple independent two-dimensional linear algebra problems can be stacked up as planes in a three-dimensional “box”. Individual planes are then distributed among nodes on the cluster for calculation and the results returned to the head node. This technique has shown superlinear scaling for very large problems. Many other parallelized functions are available through overloaded operators.

Example: IMSL Fortran on Windows HPC Server 2008 with MS-MPI

This section provides a detailed walkthrough to build and execute a distributed calculation using MPI. As mentioned above, the Intel Visual Fortran Compiler will be used through the Visual Studio 2008 interface. A subsection at the end describes the steps to build and run the project from the command line interface as well. The walkthrough is based on a basic example of the Box Data Type technique and the overloaded operator .ix., which computes the product of the inverse of matrix A and vector or matrix B (A⁻¹B).
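The Box Data Type idea can be stated compactly. As a sketch, assume a box A holding k planes of size n x n and a conforming box B; the symbols n, k, and the plane index j are notation introduced here for illustration. Applied to boxes, the .ix. operator then solves one independent problem per plane:

```latex
X_j = A_j^{-1} B_j, \qquad j = 1, \dots, k, \qquad A_j \in \mathbb{R}^{n \times n}
```

Because the k problems are independent of one another, the planes can be computed on different nodes, with communication needed only to distribute the inputs and gather the results.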
Installing the IMSL Fortran Library on Windows

To obtain the IMSL Fortran Numerical Library, you can download an evaluation copy from the Visual Numerics website at http://www.vni.com/forms/fortran_download_choice.php. Select the option for “x86_64, Windows XP 64, Intel Fortran Compiler 10.0”; an evaluation CD can also be requested by mail. You will need a valid license key to execute the example, which can be acquired by contacting an Account Manager at Visual Numerics by email.

If you have downloaded the product, first unzip the archive named fnl60_p10484.zip for version 6.0 of the IMSL Fortran Library. The CD contents are the same as this archive, as shown in Figure 4. In either case, start the setup procedure by running the setup.exe application.

Figure 4. Installation files for the IMSL Fortran Library. Run the setup.exe application to start the installation.

Running the setup program will initialize the installation procedure. Select the library to install (there is likely to be only one option) and click Next > as illustrated in Figure 5.

Figure 5. Select the appropriate product to install for the platform.

You will immediately be presented with the option to update system environment variables. These make the product easier to use, and it is recommended that you choose the option. The only case where this option should be declined is when running multiple versions of the IMSL Fortran Library on the same system. This option is shown in Figure 6.

Figure 6. It is recommended to let the setup application update environment variables.

The InstallShield Wizard starts next; click Next > to continue. To install the product, you must agree to the Visual Numerics, Inc. Software License Agreement by clicking Yes on the following screen. If you are a current customer and have a License Number, enter it on the following screen; otherwise enter 999999 and click Next > to continue.
The Installation Location must be specified next; to use the default of “C:\Program Files (x86)\VNI\” click Next > to proceed. The installer is finally ready to copy files; progress is monitored as the files are copied. Once complete, a final dialog will let you click Finish to complete the installation and close the Setup Wizard. The following montage of screenshots, Figure 7 collectively, should help guide you through the process.

Figure 7. Screenshots showing the steps to install the IMSL Fortran Numerical Library.

Creating the Project

Now that all the products have been installed, the next step is to create a new Fortran console Project in Visual Studio 2008 as shown in Figure 8 and Figure 9.

Figure 8. Creating a new Project in Visual Studio 2008.

Figure 9. Selecting an Intel Fortran project in Visual Studio 2008.

This will set up a default blank solution. Create a new source file named Source1.F90 as follows: right-click on Source Files in the Solution Explorer and select Add -> New Item. In the Add New Item dialog, Source should be highlighted with a default filename of Source1.F90. Select Add to create the new file; see Figure 10.

Figure 10. Add a new Fortran source file in Visual Studio 2008.

The next task is to change the Project Property settings to build for the x64 architecture by selecting Build -> Configuration Manager. In this dialog, open the Active Solution Platform drop-down menu, select <New…>, and then set the Type to x64 as shown in Figure 11.

Figure 11. Modifying the Solution Platform to x64 for 64-bit Windows environments.

Close open dialogs and double-click to open Source1.F90 in the Solution Explorer. Here, paste in the following code from an example in the IMSL Fortran documentation for the .ix. operator shown in Figure 12.
Notice the array size is n x n x k for this box data type problem, which translates to k planes of an n x n matrix. Configured in this way, the problem separates naturally into planes, which are distributed across the network using MPI.

    use rand_int
    use norm_int
    use operation_x
    use operation_ix
    use mpi_setup_int
    implicit none
    integer, parameter :: n=32, k=4
    real(kind(1e0)) :: one=1e0
    real(kind(1e0)), dimension(n,n,k) :: A, b, x, err(k)
    call erset(0,1,0)
    ! Setup for MPI.
    MP_NPROCS=MP_SETUP()
    ! Generate random matrices for A and b:
    IF (MP_RANK == 0) THEN
       A = rand(A); b=rand(b)
    END IF
    ! Compute the box solution matrix of Ax = b.
    x = A .ix. b
    ! Check the results.
    err = norm(b - (A .x. x))/(norm(A)*norm(x)+norm(b))
    if (ALL(err <= sqrt(epsilon(one))) .and. MP_RANK == 0) &
       write (*,*) 'Example for .ix. is correct.'
    ! See to any error messages and quit MPI.
    MP_NPROCS=MP_SETUP('Final')
    end

Figure 12. Fortran source code for the example using MPI and the .ix. operator.

In this example, two random boxes A and b are created and the solution x of Ax = b is computed using the .ix. operator, which is MPI-enabled and will distribute the work across the cluster. The result is checked using the .x. operator and the norm function. More information about this example and the use of overloaded operators can be found in the online documentation.

The IMSL Fortran library has not yet been added to the project as a reference, so the next step is to update the Project Properties again to add the Include and Library directories. Select Project -> Console1 Properties to open the next dialog. Under the section Configuration Properties -> Fortran, several directories need to be added under the Additional Include Directories. Click in the blank area, click the dropdown button and select <Edit..> to bring up an edit dialog.
The folders include:

    C:\Program Files\Microsoft HPC Pack 2008 SDK\Include
    C:\Program Files (x86)\VNI\imsl\fnl600\Intel64\include\dll
    C:\Program Files (x86)\Intel\Compiler\Fortran\10.1.021\em64t\Include

Note that the paths may be different on different systems depending on where the various products and tools were installed. Click OK to close the dialog. Figure 13 illustrates this step.

Figure 13. Configuring the additional Include directories in Visual Studio 2008.

Next select the Language item and click on “Process OpenMP Directives” to add the /Qopenmp flag to the compiler options as shown in Figure 14. This is not specifically required for the example here, but your own code may include OpenMP directives, and in that case this switch must be turned on.

Figure 14. Adding the /Qopenmp command line compiler option in Visual Studio 2008.

Click Apply to set this change. Next, a few items need to be added under the Linker section. Close the Fortran part of the tree under Configuration Properties, open the Linker options and select Input. Under Additional Dependencies, add the following items:

    imsl.lib imslsuperlu.lib imslhpc_s.lib imslp_err.lib mkl_scalapack.lib mkl_blacs_mpich2.lib mkl_em64t.lib libguide.lib msmpi.lib msmpifec.lib lmgr.lib kernel32.lib user32.lib netapi32.lib advapi32.lib gdi32.lib comdlg32.lib comctl32.lib wsock32.lib libcrvs.lib libFNPload.lib libsb.lib

Unlike the Include Directories section, there is no pop-up dialog for entering this list. The items must be entered in the field one after another, separated by spaces (or pasted from this document). They add references to several IMSL Fortran Library components and to the MS-MPI files. Use Figure 15 as a guide.

Figure 15. Adding dependent libraries to the project in Visual Studio 2008.

Of course, the project needs path information for these files as well.
Under Linker -> General, add the following paths under the Additional Library Directories section, in the same way as for the Include Directories above:

    C:\Program Files\Microsoft HPC Pack 2008 SDK\lib\amd64
    C:\Program Files (x86)\VNI\imsl\fnl600\Intel64\lib
    C:\Program Files (x86)\Intel\Compiler\Fortran\10.1.021\em64t\Lib
    C:\Program Files\Microsoft SDKs\Windows\v6.0A\Lib\x64

Figure 16 illustrates this step.

Figure 16. Adding additional Library paths in Visual Studio 2008.

Close all the open dialogs, and the solution is finally ready to compile. Select Build -> Build Solution from the main Visual Studio menu. If everything builds properly, the “Build succeeded” message appears in the bottom information area. If not, check the source code and configuration settings for typos and missing pieces.

Running the Example

After the project is built, browse to the output directory (typically Console1\x64\Release) and locate the executable console1.exe. This file must be copied to a network share that is visible to all the nodes on the cluster. For this example, the head node is named “clusterbot” and by convention distributed applications are placed in a shared directory named “tasks”. The full path is \\clusterbot\tasks\VNI\console1.exe.

To execute this code, open the Windows HPC Job Manager by browsing Start -> Programs -> Microsoft® HPC Pack -> HPC Job Manager. To submit a new job, select Actions -> Job Submission -> New Job as shown in Figure 17.

Figure 17. Selecting a New Job in the Windows HPC Job Manager.

This will open the Create New Job dialog box where all the details are entered. Name the Job something descriptive like “IMSL Example” and then select the Task List option from the left hand navigation pane and click Add. In the Command Line field, enter the full path to the console1.exe discussed above along with a leading “mpiexec” entry.
The mpiexec program is required to run MPI applications. The working directories should also be entered as valid shared paths on the network. Also configure the Minimum and Maximum resources as applicable to your configuration. Please refer to Figure 18.

Figure 18. Configuring the Task Details in the Windows HPC Job Manager.

Save these entries and the summary should appear similar to Figure 19.

Figure 19. The summary of the job for the MPI example.

Click Submit and the Job will be added to the queue. The job will appear in the listing of All Jobs in the Windows HPC Job Manager after it has been submitted. At first, its State will be Running, but it will soon change to Finished as shown in Figure 20.

Figure 20. The status of the submitted job in the Windows HPC Job Manager.

From the Command Line

Many developers continue to be very comfortable at the command line; therefore this short section walks through the above example from the command line point of view.

When using the compiler tools from the command line, it is best practice to start with the command line window supplied with the Intel compiler. This window presets environment variables and compiler settings; it can be accessed by using the shortcut Start -> Programs -> Intel Software Development Tools -> Intel Fortran Compiler 10.1 -> Visual Fortran Build Environment. To use IMSL Fortran in this setting, the next step is to run the fnlsetup.bat startup script. The command session to this point may look like the following:

    Intel(R) Visual Fortran Compiler for applications running on Intel(R) 64, Version 10.1.021
    Copyright (C) 1985-2008 Intel Corporation. All rights reserved.

    Setting environment for using Microsoft Visual Studio 2008 x64 cross tools.

    C:\>"c:\Program Files (x86)\vni\imsl\fnl600\Intel64\bin\fnlsetup.bat"
    Setting environment for IMSL Fortran Library - Intel64

    C:\>

Several important and useful environment variables will be set at this stage. Since the standard MPI environment for the IMSL Fortran product is MPICH2, you may need to adjust the LINK_MPI_HPC environment variable to match the following:

    SET LINK_MPI_HPC=imsl.lib imslsuperlu.lib imslhpc_s.lib imslp_err.lib mkl_scalapack.lib mkl_blacs_mpich2.lib mkl_em64t.lib libguide.lib msmpi.lib msmpifec.lib lmgr.lib kernel32.lib user32.lib netapi32.lib advapi32.lib gdi32.lib comdlg32.lib comctl32.lib wsock32.lib libcrvs.lib libFNPload.lib libsb.lib /link /force:multiple

You may also need to add the path to msmpi.lib and msmpifec.lib (typically C:\Program Files\Microsoft HPC Pack 2008 SDK\Lib\amd64) to the LIB environment variable.

To compile the source code, issue the following command utilizing the configured environment variables:

    %MPIF90% %MPIFLAGS% Source1.F90 %LINK_MPI_HPC%

This will build the executable Source1.exe, which again should be copied to a shared location on the network. Using the shared directory location described above, the command to submit the job is as follows:

    C:\>job submit /StdOut:\\clusterbot\tasks\vni\stdout.txt /numnodes:4 mpiexec \\clusterbot\tasks\vni\source1.exe
    Job has been submitted. ID: 1.

Note that a job submitted through the command line in this manner will still appear in the Windows HPC Job Manager GUI. All of the options that can be set in the graphical interface are also available from the command line. The results of a submission are the same either way, so developers who prefer different methods can work side by side on the same cluster. Also, the command line switches passed to “job submit” will override those passed to mpiexec; think of mpiexec as another parameter for the specific job.
The interaction between the GUI and command line is visible in Figure 21, where the command is issued in the console area and the job appears with its current status in the Windows HPC Job Manager.

Figure 21. Submitting an MPI job through the command line is equivalent to using the graphical interface.

Other IMSL Libraries with MPI

The IMSL Fortran Library is not the only IMSL Numerical Library that can be used in MPI settings. For MPI developers writing C/C++ applications, it may be easier to call a C library instead of interfacing with a Fortran library. While no components of the IMSL C Library themselves utilize MPI to distribute calculations, the library can be integrated into parallel applications that require advanced analytical calculations at each node. The IMSL C Numerical Library is thread safe, however, which enables developers to write shared memory parallel applications built on the library.

For .NET developers, the Indiana University group led by Doug Gregor has made bindings to MPI available for managed code. The MPI.NET package can be used with the IMSL C# Numerical Library for .NET Applications in the same way as the IMSL C Library mentioned previously. As a managed code library written in pure C#, the IMSL C# Library integrates easily with .NET tools like MPI.NET.

Parameter Sweep Distributed Applications

Architecture Overview

In this section, we will provide an example of a Parameter Sweep application where the same code is executed on each compute node but with different input data. MS-MPI is not used in this example as the nodes require no communication with each other. Instead, the Windows HPC Job Manager is used to configure a Parameter Sweep job that indicates what program is to be executed, what input parameters are to be used, and where output is to be collected. The runtime components are located in a shared folder on the head node that each compute node can see.
When the job begins to run, information is distributed to each compute node, where the program is run independently of the other compute nodes.

Consider the typical network topology for Windows HPC Server 2008 shown in Figure 22 in contrast to the one described in Figure 2. Here distribution of the code is managed by the Head Node and its tools rather than MS-MPI. Calculations are spawned on individual nodes, each of which requires access to dependencies like the IMSL Libraries, typically through a shared network resource. This network resource could also be used to collect output from each node or instance of the distributed application, but for this basic example output is piped to standard out and collected after the simulation completes.

Figure 22. A typical cluster network topology for a Parameter Sweep application.

Example: Parameter Sweep Distributed Application in C#

The Windows HPC Server 2008 suite of tools allows developers to easily create parameter sweeps where the same code is executed in parallel on different nodes of the cluster. In a typical case, a single-threaded application is written to perform some calculation that can be repeated hundreds or thousands of times by variation of input parameters. The Windows HPC Job Manager has command line and graphical user interfaces to define the tasks to be distributed.

Any of the IMSL Numerical Libraries could be included as a component of the code to be distributed. The following example focuses on the IMSL C# Numerical Library and the distribution of a Monte Carlo simulation .NET application across a cluster. In this example, a small application was created to run a simulation based on a random seed provided via the command line when defining the distributed tasks. Since a single simulation runs very quickly, each individual task performs a number of simulations. Subsequently the results are aggregated together in a file for this simple example.
The rest of this section walks through all of the steps necessary to create and execute a Parameter Sweep application written in C#. Of course any language could be used, not even limited to the .NET family, but this example will leverage Visual Studio and C#; the code is fairly straightforward and should not be challenging to port to other languages.

Installing the IMSL C# Numerical Library

To obtain the IMSL C# Numerical Library, you can request an evaluation copy from the Visual Numerics website at http://www.vni.com/forms/cSharp_registrationForm.php and an evaluation CD will be mailed to you. Alternatively, you can contact an Account Manager at Visual Numerics and request a secure FTP download. You will need a valid license key to execute the example, which can be acquired by contacting an Account Manager at Visual Numerics by email.

If you have downloaded the product, first unzip the archive named p10408.zip. Note that the part number may be updated as new versions are released. The CD contents are the same as this archive. In either case the file listing should look like Figure 23. Start the setup procedure by running the Setup.exe application.

Figure 23. Install files for the IMSL C# Numerical Library. Run the Setup.exe application to start the installation.

The IMSL C# Library can be used in 32-bit and 64-bit environments for .NET 1.1 or .NET 2.0 and greater. For the 64-bit environment of Windows HPC Server 2008, select the third option, “IMSL C# for .NET 2.0 and above, 64-bit FlexLM DLL” as seen in Figure 24. The contents of the library are the same for each version, but this version is built specifically for .NET 2.0 or greater and links in a 64-bit DLL for the license manager.

Figure 24. Select the 64-bit version for Windows HPC Server 2008.

You should now see the initial welcome screen for the Setup Wizard. Click Next > to continue the installation.
Next, you will need to accept the Visual Numerics, Inc. End-User License Agreement by selecting “I accept” and clicking Next >. Enter a User Name and Organization and click Next > again. If you are a current customer and have a License Number, enter it on the following screen; otherwise enter 999999 and click Next > to continue. The Installation Location must be specified next; to use the default of “C:\Program Files (x86)\VNI\” click Next > to proceed. The installer is finally ready to copy files; click Install to begin this step. Progress is monitored as the files are copied. Once complete, a final dialog will let you click Finish to complete the installation and close the Setup Wizard. Click Close on the initial setup dialog to end the procedure. The following montage of screenshots, collectively Figure 25, should help guide you through the process.

Figure 25. Screenshots showing the steps to install the IMSL C# Numerical Library.

The product can now be found in the installation folder. To install your license key, browse to C:\Program Files (x86)\VNI\imsl\license and create or paste the license file as indicated by the information supplied with the license key. You can find the full product documentation under the C:\Program Files (x86)\VNI\imsl\imslcs500\manual folder along with a gallery of demonstration applications in C:\Program Files (x86)\VNI\imsl\imslcs500\gallery. All of the assemblies and shared libraries can be found in C:\Program Files (x86)\VNI\imsl\imslcs500\bin. Note that for all these paths, the folder name “imslcs500” is specific to the 5.0 version of the IMSL C# Library; future versions will have updated version numbers for the folder.

ImslCS.dll is the primary assembly, which is a pure managed code library. For higher performance, Visual Numerics also supplies a version of the library, named ImslCS_mkl.dll, that uses the native C++ Intel Math Kernel Library (MKL) for BLAS functions.
More information about these files and the product can be found in the ReadMe.html file located at C:\Program Files (x86)\VNI\imsl\imslcs500\ReadMe.html.

Creating the Project

To get started, create a new Project that holds a C# Console Application in Visual Studio as shown in Figure 26. Any name is fine, but the default ConsoleApplication1 is used in the example.

Figure 26. Creating a new C# Console Application in Visual Studio 2008.

You will be presented with a standard C# class template with a Namespace and Class that has an empty Main method. The next step is to integrate the IMSL C# Numerical Library into the project by adding a reference to the assembly. The reference can be added by right-clicking on References in the Solution Explorer and selecting “Add Reference…” or by choosing Project -> Add Reference on the Visual Studio menu bar. This will open the Add Reference dialog box shown in Figure 27. Browse to the ImslCS.dll assembly or enter its path in the File Name area; the default path is C:\Program Files (x86)\VNI\imsl\imslcs500\bin\ImslCS.dll.

Figure 27. Adding a reference to the IMSL C# assembly in Visual Studio 2008.

Confirm the assembly is available by entering using Imsl.Math in the source code. The Visual Studio auto-complete feature should display suggestions after the dot is typed. Additionally, whenever a class is referenced, all of the available methods and properties are displayed; selecting a method or constructor will show all of the required parameters. This convenient feature of Visual Studio 2008 is shown in Figure 28, Figure 29 and Figure 30.

Figure 28. Code completion at the Namespace level in Visual Studio 2008.

Figure 29. Code completion showing available methods for an Imsl.Stat.Random object instance in Visual Studio 2008.

Figure 30. Code completion showing required parameters for a method in Visual Studio 2008.
The next step is to enter the source code. This code is shown in Figure 31, but without the hardcoded variance-covariance matrix. The matrix would take over 40 pages to print in the document, so it is collapsed here into a #region block, shown as [covar data]; please refer to Appendix A for details on obtaining the dataset. This is a dense 100 x 100 matrix of double values with the main diagonal containing the variance of each of the 100 assets to be modeled; the off-diagonal element a(i,j) is the covariance of the i-th and j-th assets.

    using System;
    using Imsl.Stat;
    using Imsl.Math;

    namespace Simulate
    {
        class Compute
        {
            private Cholesky chol;
            private double[] bins, portfolioValues;
            private int nVariables;
            private int nSamples = 5000;

            public Compute(int seed)
            {
                [covar data]
                nVariables = covar.GetLength(0);
                chol = new Cholesky(covar);
                portfolioValues = new double[nVariables];
                for (int i = 0; i < nVariables; i++)
                {
                    portfolioValues[i] = 200;
                }
                RunMonteCarlo(seed);
            }

            public void RunMonteCarlo(int seed)
            {
                int nBins = 50;
                double max = 0.01;
                Imsl.Stat.MersenneTwister mt = new MersenneTwister(seed);
                Imsl.Stat.Random random = new Imsl.Stat.Random(mt);
                double center = Portfolio(new double[nVariables]);
                bins = new double[nBins];
                double dx = 2.0 * max / nBins;
                double[] x = new double[nBins];
                for (int k = 0; k < nBins; k++)
                {
                    x[k] = -max + (k + 0.5) * dx;
                }
                // This would typically be a threaded loop
                // but in this serial version, we just work
                // on a set of single samples.
                for (int i = 0; i < nSamples; i++)
                {
                    double[] r = random.NextMultivariateNormal(
                        nVariables, chol);
                    double val = Portfolio(r);
                    double t = (val - center) / center;
                    int j = (int)System.Math.Round((t + max - 0.5 * dx) / dx);
                    Console.Out.WriteLine(j);
                }
            }

            double Portfolio(double[] returns)
            {
                double sum = 0.0;
                for (int k = 0; k < returns.Length; k++)
                {
                    sum += portfolioValues[k] * (1.0 + returns[k]);
                }
                return sum;
            }

            /// <summary>
            /// The main entry point for the application.
            /// One argument is expected, the integer seed
            /// for the random number generator.
            /// </summary>
            static void Main(string[] args)
            {
                int seed;
                try
                {
                    seed = Convert.ToInt32(args[0]);
                }
                catch (Exception)
                {
                    System.Random r = new System.Random();
                    seed = r.Next();
                }
                new Compute(seed);
            }
        }
    }

Figure 31. C# source code for the Monte Carlo model to be run as a parameter sweep.

The Main method expects a random seed to be input as an argument; some error checking is included for testing purposes so that the program will still run if the argument is not provided. The seed is used in the Imsl.Stat.MersenneTwister class to create a set of random numbers for this instance of the application. The data used for the simulation is actually the Cholesky factorization of the variance-covariance matrix computed using the Imsl.Math.Cholesky class. The simulation is rather simple, with no weighting of the assets, and the bin index of each of the nSamples results is written to the console output. The Parameter Sweep configuration will drop these in a central location for easy post-run analysis.

Build the application by selecting Build -> Build Solution in Visual Studio; if all is well, “Build succeeded” appears in the status area.

Running the Example

The binaries required in the deployment include the executable just built (ConsoleApplication1.exe) and the assemblies associated with the IMSL C# Library (ImslCS.dll and LicenseFlexLM.dll). Copy these three files to a shared directory visible to all nodes of the cluster. This example uses \\clusterbot\tasks\VNI. Finally, note that the environment variable LM_LICENSE_FILE must be configured on each node to point to a valid license file.

The pieces are in place, so next it is time to define the job. Open the Windows HPC Job Manager (Start -> Programs -> Microsoft HPC Pack -> Windows HPC Job Manager) and select Actions -> Job Submission -> Parametric Sweep Job as shown in Figure 32.

Figure 32. Submitting a new Parametric Sweep Job using the Windows HPC Job Manager.

This opens the Submit Parametric Sweep Job dialog window. For this example, run 50 tasks across 8 cores for a total of 250,000 simulations (as each individual task does 5000). Therefore, set the End Value index to 50 and modify the Command Line entry to point to the executable built above. The options should look similar to Figure 33.

Figure 33. Task details for a Parametric Sweep Job in the Windows HPC Job Manager.

Click Submit and the job will run as defined. Note, however, that with this quick method the job will only run on a single core. To spread it out to all the nodes, select the Finished job in the Windows HPC Job Manager window and click View Job under Job Actions. Examine the Task List and you will find that the Requested Resources is just “1-1 Cores” (or Socket or Nodes depending on the default resource type configured). To expand this job to run on all resources, click Save Job As and save the job description as an XML file, paramsweep.xml for example. Click Cancel to close the open dialog after saving.

Now select Create New Job From Description File under the Job Submission menu and open the XML file just saved. Under Job Details, set the Minimum and Maximum resources as appropriate for the cluster. With four dual-core nodes on this cluster, we can set this to 8 here as shown in Figure 34.

Figure 34. Configuring the minimum and maximum resources for a job in the Job Details menu.

Next select Task List on the left hand navigation menu and notice the “1-1” under Required Resource, Number of Cores. Set this to match the values defined just above; see Figure 35.

Figure 35. Updating the Required Resources for a parameter sweep job in the Windows HPC Job Manager.

Now you can save the updated XML file with the “Save Job As” button or click Submit to run the job across all the resources on the cluster.
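Once the job finishes, the per-task output has to be combined before it can be plotted. The following is a minimal sketch of such a post-processing program and is not part of the original example. It assumes each task's standard output was captured in a file ending in .out that holds one integer bin index per line, as written by Console.Out.WriteLine(j) in Figure 31. The SimulatePostProcess namespace, the HistogramAggregator class, its CountBins method, and the directory argument are hypothetical names chosen for illustration; only standard .NET classes are used, not the IMSL C# Library.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

namespace SimulatePostProcess
{
    // Hypothetical helper; a separate program from the one in Figure 31.
    public static class HistogramAggregator
    {
        // Counts how often each bin index occurs in the given lines.
        // Blank lines are ignored. Lines that are not integers, or whose
        // value falls outside 0..nBins-1, are tallied in 'skipped'.
        public static int[] CountBins(IEnumerable<string> lines, int nBins,
                                      out int skipped)
        {
            int[] counts = new int[nBins];
            skipped = 0;
            foreach (string line in lines)
            {
                string text = line.Trim();
                if (text.Length == 0)
                {
                    continue;
                }
                int j;
                if (int.TryParse(text, out j) && j >= 0 && j < nBins)
                {
                    counts[j]++;
                }
                else
                {
                    skipped++;
                }
            }
            return counts;
        }

        public static void Main(string[] args)
        {
            // Assumption: the directory holding the *.out files is passed
            // as the first argument; the current directory is the default.
            string dir = args.Length > 0 ? args[0] : ".";
            const int nBins = 50;     // same value as nBins in Figure 31
            const double max = 0.01;  // same value as max in Figure 31
            double dx = 2.0 * max / nBins;

            int[] totals = new int[nBins];
            int skippedTotal = 0;
            foreach (string path in Directory.GetFiles(dir, "*.out"))
            {
                int skipped;
                int[] counts = CountBins(File.ReadAllLines(path), nBins,
                                         out skipped);
                for (int k = 0; k < nBins; k++)
                {
                    totals[k] += counts[k];
                }
                skippedTotal += skipped;
            }

            // One row per bin: the bin center (same formula as x[k] in
            // Figure 31) followed by the total count for that bin.
            for (int k = 0; k < nBins; k++)
            {
                double binCenter = -max + (k + 0.5) * dx;
                Console.WriteLine("{0:F4}\t{1}", binCenter, totals[k]);
            }
            Console.WriteLine("Skipped values: {0}", skippedTotal);
        }
    }
}
```

Indices outside 0 to nBins - 1 correspond roughly to simulated relative changes beyond ±max. Reporting them separately rather than clamping them into the end bins is a design choice of this sketch; the tab-separated output can be pasted into any plotting tool.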
The output of this example will be 50 files in the working directory with names like “24.out”. To view the results in a meaningful way, a separate program can be written that reads in each file and bins the values into a histogram. This job ran the task 50 times across eight cores for a total of 250,000 simulations with a very smooth distribution of results. The output is presented in Figure 36.

Figure 36. Monte Carlo simulation results from a distributed IMSL C# Numerical Library application.

Summary

This document provided an overview of Math Libraries available for the Windows platform, with a specific focus on developers writing distributed applications using Windows HPC Server 2008. A distributed example using the MS-MPI implementation and the IMSL Fortran Numerical Library demonstrated the use of the Intel Fortran Compiler, Visual Studio 2008, and the Windows HPC Job Manager. Finally, a parameter sweep example presented code written in C# leveraging the IMSL C# Numerical Library for .NET Applications. Again, Visual Studio 2008 and the Windows HPC Job Manager were the primary tools.

Appendix A

The raw data used for the Parameter Sweep example is too long to list in-line with the source code in Figure 31. For the sake of completeness, the full data array is available online in a thread at the Visual Numerics Forum so that a reader can use the example code. A ZIP file is attached to the thread and is available for download. The definition for the covar variable should be placed where [covar data] is indicated in the source code listing.

Feedback

Did you find problems with this tutorial? Do you have suggestions that would improve it? Send us your feedback or report a bug on the HPC developer forum.
More Information and Downloads

Informational URL for the IMSL Libraries
http://www.vni.com/products/imsl/index.php

Download link
http://www.vni.com/downloads/index.php

This document was developed prior to the product’s release to manufacturing, and as such, we cannot guarantee that all details included herein will be exactly as what is found in the shipping product.

The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.

This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED, OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

© 2008 Microsoft Corporation. All rights reserved.

Microsoft, Visual C++, Visual Studio, Windows, and the Windows logo are trademarks of the Microsoft group of companies. All other trademarks are property of their respective owners.