High Throughput Computing Week: Introduction to the Digipede Network™

© Copyright Digipede Technologies, LLC. Digipede and the Digipede Network are trademarks of Digipede Technologies, LLC. Microsoft, Excel, Visual Basic, and Visual Studio are registered trademarks of Microsoft Corporation in the United States and/or other countries. All other trademarks are the property of their respective owners.

Digipede Network™ Session 1 Training Guide
High Throughput Computing Digipede Training.doc November, 2007

Table of Contents

Preface
  Intended Audience
  How to Contact Us
  Conventions Used in this Guide
Introduction
  Benefits of the Digipede Network
  Digipede Network System Overview
    How the Infrastructure Works
      Digipede Server
      Digipede Agent
    How to Create and Submit a Job
      Digipede Workbench
      Digipede Framework SDK
      PowerShell Scripting
      jobsubmit.exe
    Software Requirements
      .NET 2.0
      IIS
      Microsoft SQL Server
  Summary
Digipede Software Components
  Digipede Server
    What Gets Installed?
      DigipedeControl
      DigipedeTransfer
      DigipedeWS
      Database
      Services
      Event Log
    Advanced Topics
      Digipede Licensing
      Server Licenses
      Agent Processor Licenses
  Summary
  Digipede Agent
    What Gets Installed?
      Local Storage
      Event Log
    Advanced Topics
      Silent installation for large-scale deployments
    Summary
Digipede Control Basics
  Pools
  Compute Resources
    Agent Availability
    Agent Base Priority
  Agent Administration
    Check-in Frequency
  Users
    System Roles
      System Administrator
      System Monitor
  Job Template Administration
  Job Administration
    Job Control
    Troubleshooting
  Best Practices
    Pool Configuration
    Job Template Management
  Summary
Job Templates, Jobs, and Tasks
  File Definitions and Parameters
    File Definition
      Relevance
      File Transfer
      Location
    Parameter
  Summary
  Job Template
    File Definition
    Version
    Application Control
      Command line
      Standard Out/Standard Error
    .NET APIs
      Executive
      Worker
    COM
      IComWorker
    Job Defaults
    Summary
  Job
    File Definitions
    Parameters
    Settings
    Summary
  Task
    File Definitions
    Parameters
    Result Files
    Summary
Digipede Workbench
  Wizards
    Job Template Wizard
    Job Wizard
    Parameters in Workbench
  Designers
    Job Template Designer
    Job Designer
  Job Tracking
  Saving Job Templates and Jobs
Digipede Control
  Job Template Page
  Jobs Page
  Task Page
Hello World Walkthrough
  Defining the Job Template and Initial Job
Minen Walkthrough
References
Glossary

Preface

Distributed computing has moved from academic research to commercial reality. Organizations today can use existing compute resources to improve the scalability and speed of their most demanding applications.

Choosing the right platform is the key to distributed computing success. In many businesses, most or all of the servers, workstations, and software run on Microsoft Windows, and the developers use Microsoft's Visual Studio .NET software development tools. For these organizations, the Digipede Network, built entirely on the Microsoft .NET platform, is the answer.

As a seasoned Microsoft software developer, Digipede understands the needs of customers using Microsoft technologies. Other distributed computing solutions focus on UNIX and Linux, requiring lengthy implementation, steep learning curves, and a heavy IT burden. In contrast, the Digipede Network is radically easier to buy, install, learn, and use. With its familiar Windows user interface, the Digipede Network allows users to become productive immediately. Unlike competing solutions, it requires no complex scripting, no major modification of existing applications, and no on-site implementation help.

How to Contact Us

Address: Digipede Technologies
3640 Grand Avenue, Suite 206
Oakland, CA 94610
Phone: (510) 834-3645
Community Forums: http://support.digipede.net/community/
Website: http://www.digipede.net/

The Digipede Network delivers the benefits of distributed computing at any scale. Whether you are a small department with five computers or a corporation with thousands of servers, desktops, and cluster nodes, you can benefit.
The Digipede Network can be downloaded, installed, and configured in less than an hour, and you'll be on your way to improved productivity and application performance. It's that simple.

Intended Audience

This document serves as an introduction to the Digipede Network and is a companion guide for Session 1 of the Digipede Training Series. Session 1 training is intended for those who will install and administer the Digipede Network.

We have tested and verified the information provided in this guide; however, you might find that features have changed or been added. Please let us know of any errors you find, as well as any suggestions for future editions.

Conventions Used in this Guide

The following formatting conventions are used in this guide:

Type of term                                        Convention
Reference to another document                       "Quotes"
Reference to another section in this document       Bold
Reference to a table or figure in this document     Bold
Element in user interface                           Bold
File names and paths                                Italics
Command lines, code examples                        Courier font

We'll use the terms grid computing and distributed computing interchangeably. Some people prefer one over the other, but we think of them as meaning the same thing: using many computers together to get work done faster.

Introduction

The Digipede Network is a distributed computing solution that delivers dramatically improved performance for real-world business applications. By harnessing distributed computing, enterprises and developers achieve better speed, scalability, and reliability for their applications. Built entirely on the .NET platform, the Digipede Network is radically easier to buy, install, learn, and use than other grid computing solutions.
It includes the Digipede Framework, with which developers can build scalable, high-performance, distributed applications in the familiar Visual Studio environment. It can also be used with existing software, without re-linking or recompiling. Here are a few examples of how users can reap immediate benefits:

1. Command-line Applications: No Recompiling Required. Any command-line application that does not require user interaction can be distributed as is. The Digipede Network delivers dramatically increased performance on key applications, with no code modification.

2. Enterprise Software: Scale Out the Middle Tier. If your enterprise applications could scale better, so could your business. When your software scales to meet your needs, you can handle more growth, take on larger jobs, and keep your teams more productive. Many enterprise applications are constrained by middle-tier scalability issues; the Digipede Network is designed to eliminate these bottlenecks.

3. Service-Oriented Applications: Don't Let Your Users Wait. As more applications are made available as services, either as part of a Service-Oriented Architecture (SOA) or through any web interface, standardized techniques for scaling those applications to handle variable usage become increasingly important. The Digipede Network features patent-pending technology to provide automatic CPU load balancing and guaranteed quality of service, making it an ideal solution for scaling web services or any SOA. Used in combination with Excel Services and SharePoint Server, the Digipede Network puts the power of grid computing behind Office 2007 servers, creating a solution that delivers the new capabilities of Office 2007 with the scalability you need.
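Item 1 above can be illustrated with a small local sketch. It is not Digipede code and uses no Digipede API: it simply shows why an unmodified command-line program distributes well. The job is a set of independent invocations of the same program, each with its own command-line argument, and each invocation's standard output is collected as that task's result. A local thread pool stands in for the Digipede Agents, and the names PROGRAM, run_task, and run_job are hypothetical.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for an existing, unmodified command-line program:
# it reads its input from argv and writes its result to stdout,
# with no user interaction.
PROGRAM = [sys.executable, "-c", "import sys; n = int(sys.argv[1]); print(n * n)"]


def run_task(arg: str) -> str:
    """Run one task: the same program, invoked with a different argument."""
    completed = subprocess.run(
        PROGRAM + [arg],
        capture_output=True,  # collect stdout/stderr as the task's result
        text=True,
        check=True,           # a non-zero exit code raises CalledProcessError
    )
    return completed.stdout.strip()


def run_job(args: list[str], workers: int = 4) -> list[str]:
    """Run every task in the job; local threads stand in for grid agents."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the inputs
        return list(pool.map(run_task, args))


if __name__ == "__main__":
    print(run_job(["1", "2", "3", "4"]))
```

Because each task depends only on its own arguments, the program needed no changes to run in parallel. On a real grid, the Digipede Agents rather than a local thread pool would run each invocation on the compute nodes.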
Benefits of the Digipede Network

Scales out applications and processes for higher performance:
- Distributes application load across Windows desktops, servers, and clusters.
- Scales from five nodes to thousands.
- Delivers an order-of-magnitude increase in speed and throughput.
- Provides capacity on demand.

Increases productivity:
- Shorter runtimes mean less waiting and more productive work.
- Increased use of idle resources raises IT efficiency.
- Run multiple jobs on your grid simultaneously.
- Submit jobs from any networked computer.
- Flexible and powerful APIs enable your developers to grid-enable applications quickly.
- Developers can focus on business requirements instead of building an in-house grid or distribution platform.

"Installation was straightforward, and the Digipede Framework SDK made grid-enabling our applications far simpler than we'd anticipated. We demonstrated near-linear scalability on a critical application with just a few lines of code, and we got far better management, monitoring, and flexibility than our own tools offered." - actual Digipede customer

First commercial grid computing solution based entirely on .NET:
- Integrates with Visual Studio .NET for developer productivity.
- Integrates with Windows security for consistency with current practices.
- Uses Web services for ease of implementation.
- Offers a development community with forums and sample code.

Relies on a scalable grid computing platform that:
- Guarantees quality of service.
- Guarantees task completion.
- Integrates data transfer.
- Integrates with your current Windows security.
- Provides job monitoring functions.
- Provides automatic CPU load balancing.
- Supports smart caching.

"With the Digipede Network we've been able to handle ten times the load on our Web application with no decrease in quality of service to our users...and we saved about $100,000 in hardware and software licensing costs when compared to alternate solutions." - actual Digipede customer

Low total cost of ownership:
- Radically easier than other grid systems: you can install and administer it yourself.
- Uses standard server and desktop hardware.

Digipede Network System Overview

The Digipede Network consists of the following components:

Infrastructure
- The Digipede Server manages the workflow through the system.
- Digipede Agents™ manage each of the individual desktops, servers, or cluster nodes and the tasks that run on them.

Administration
- Digipede Control™, a website that resides on the same machine as the Digipede Server and provides the administrative user interface for the system.

Job Creation/Submission
- The Digipede Workbench, an easy-to-use Windows application through which users can define and run jobs.
- The Digipede Framework SDK™, a programming API that developers can use to create and submit jobs programmatically.

How the Infrastructure Works

The Digipede Server and the Digipede Agents make up the grid infrastructure. For a compute resource to join the grid, it must be able to connect to the Digipede Server (via HTTP), and a Digipede Agent must be installed on it.

Once the grid infrastructure has been set up, a user can submit a job from any machine on the network. The user's machine does not have to be part of the grid; it simply must be able to communicate with the Digipede Server. For users coming from a cluster background, this can be a new concept: on a cluster, jobs generally must be submitted from the head node. With the Digipede Network, any computer on your network can submit a job to the grid.
Note: If an agent or a compute resource goes down while working, the server automatically reassigns that task to another agent. The server itself can have a separate failover server ready to take over if it goes down. The result is guaranteed quality of service with no single point of failure.

A job is a collection of tasks that is submitted to the Digipede Network as one unit of work. A user creates a job and submits it to the Digipede Server. Once the Digipede Server receives the job, the job information is placed into a prioritized queue of work. As each Digipede Agent checks in with the Digipede Server, it looks at the job queue to see whether there are any tasks available that it can run. When a Digipede Agent identifies a task that it can execute, it takes the task, runs it, and returns the results to the Digipede Server. The Digipede Server then returns the task results to the user. A task executing on a compute node can access any networked resource that the Digipede Agent has the right to use, including file shares, databases, and even the Internet.

On the surface, the flow of job requests through the Digipede Network is very simple, but there is a lot going on under the covers to manage the requests and to ensure optimal use of the grid resources.

Digipede Server

Note: The Digipede Server consists of a Windows service and a web service. The user interface is provided entirely through a browser-based component (see Digipede Control below).

Many grid computing solutions have a job scheduler that assigns tasks to specific compute resources. This requires the job scheduler to keep track of each compute resource's specific information, such as availability, hardware, and installed software. This approach does not scale well, because the server is forced to track and actively monitor each of the compute resources. The Digipede Server instead pushes the task assignment decision off to the Digipede Agents, because each Digipede Agent knows about the compute resource that it is installed on. The Digipede Server makes sure that tasks are completed, keeps track of the jobs and their status, and stores job information.

Jobs often have files associated with them, such as data files or executable files. The Digipede Server supports moving these files through the system to the compute resources where they are needed. This frees the administrator and user from the need to pre-load software and data on the compute resources.

The Digipede Server can support a large grid because the work assignment decisions are actually made by the Digipede Agents.

Guaranteed job completion. The Digipede Server passively monitors all work on the Digipede Network. If a Digipede Agent is unable to finish a task, the Digipede Server puts that task back in the task queue so that another Digipede Agent can execute it. This is how the Digipede Server guarantees that a job completes.

Digipede Agent

The Digipede Agent decides which tasks it can execute based on its hardware, software, and availability. This is called a pull system, and there are many benefits to this approach:

Automatic CPU load balancing. Each Digipede Agent takes a task only when it is available, so work flows to idle compute resources.

Compute node resource information is always up to date. The Digipede Agent collects compute resource configuration information each time it is started. For example, if RAM is added to the compute resource, the Digipede Agent automatically collects the new information when the machine reboots. Knowing the amount of available RAM is important because jobs can be defined with specific hardware requirements. The Digipede Agent uses the most recent system data to decide which tasks it can take. This eliminates the need to notify the Digipede Server about hardware upgrades.
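The pull model described above can be sketched as a small in-memory simulation. This is an illustration under stated assumptions, not Digipede's implementation or API: Server, Agent, check_in, agent_lost, and the min_ram_mb requirement are hypothetical names, the queue lives in memory instead of SQL Server, and a check-in is a method call rather than an HTTP request. The sketch shows three ideas from the text: the server only keeps a prioritized queue and task status, each agent uses its own hardware information to decide what it can take, and tasks held by an agent that drops out go back into the queue.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: str
    priority: int      # larger number = more urgent (assumption for this sketch)
    min_ram_mb: int    # hypothetical hardware requirement attached to the job
    pending: list = field(default_factory=list)  # task ids not yet taken


class Server:
    """Keeps the prioritized queue and task status; never picks an agent."""

    def __init__(self):
        self.jobs = {}      # job_id -> Job, in submission order
        self.running = {}   # (job_id, task_id) -> name of the agent running it
        self.done = set()   # (job_id, task_id) pairs that finished

    def submit(self, job_id, n_tasks, priority=0, min_ram_mb=0):
        self.jobs[job_id] = Job(job_id, priority, min_ram_mb, list(range(n_tasks)))

    def queue(self):
        """Jobs that still have tasks waiting, highest priority first."""
        waiting = [job for job in self.jobs.values() if job.pending]
        # sorted() is stable, so equal-priority jobs keep submission order
        return sorted(waiting, key=lambda job: -job.priority)

    def take(self, job_id, agent_name):
        task_id = self.jobs[job_id].pending.pop(0)
        self.running[(job_id, task_id)] = agent_name  # record who has the task
        return task_id

    def complete(self, job_id, task_id):
        del self.running[(job_id, task_id)]
        self.done.add((job_id, task_id))

    def agent_lost(self, agent_name):
        """Put every task held by an agent that dropped out back in the queue."""
        for (job_id, task_id), owner in list(self.running.items()):
            if owner == agent_name:
                del self.running[(job_id, task_id)]
                self.jobs[job_id].pending.insert(0, task_id)


@dataclass
class Agent:
    name: str
    ram_mb: int  # gathered from the local machine when the agent starts

    def check_in(self, server):
        """Pull model: the agent inspects the queue and picks work it can run."""
        for job in server.queue():
            if job.min_ram_mb <= self.ram_mb:
                return job.job_id, server.take(job.job_id, self.name)
        return None  # nothing this machine is eligible for right now


if __name__ == "__main__":
    server = Server()
    server.submit("render", n_tasks=2, priority=1, min_ram_mb=4096)
    server.submit("report", n_tasks=1, priority=0, min_ram_mb=512)
    desktop = Agent("desktop", ram_mb=2048)
    big_box = Agent("big-box", ram_mb=8192)

    print(desktop.check_in(server))  # ('report', 0): skips "render", too little RAM
    print(big_box.check_in(server))  # ('render', 0)
    server.agent_lost("big-box")     # that task goes back into the queue
    print(big_box.check_in(server))  # ('render', 0) again
```

Because eligibility is evaluated by the agent at check-in time, adding RAM to a machine changes what its agent will take without any change on the server side.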
Note: Only one agent is installed on a compute node, even if the compute node has more than one processor. However, the agent can manage multiple processes simultaneously to take advantage of multiprocessor systems.

Software components can be cached on the compute node. The Digipede Network supports the caching of files to reduce bandwidth utilization. The Digipede Agent knows what software it has installed and cached on the compute node. So if a user has a specific job that is submitted regularly, the required executable and data files can be cached on the compute resource so that they are moved only once. This reduces the bandwidth used to run common jobs and eliminates the time needed to move the common files.

When the agent has identified a job in the queue that it is eligible to work on, it executes the following steps:

1. It identifies any files that it needs in order to work on the job. Files can arrive at the agent in one of three ways: they can be streamed directly through the Digipede Network, copied from a file share, or fetched via HTTP.
2. It notifies the server that it is taking one or more tasks from the job, and asks for those tasks. The job itself determines how many tasks an agent can take simultaneously; an agent may be permitted to run more than one task at a time.
3. It receives the tasks from the server (and the server records which tasks were assigned to that agent).
4. It executes the tasks according to the type of job (command-line, COM, or .NET).
5. As each task completes, it notifies the server of the completion (returning any appropriate results) and asks for more work when appropriate.

How to Create and Submit a Job

The Digipede Network provides several ways for a user to create and submit a job.
Traditional grid computing supports job submission through a scripting language. The Digipede Network goes beyond this by providing tools that make it easier for a user to create and submit jobs.

Flexibility is the key here. Some users are comfortable with programming languages, and some are not. By providing a comprehensive SDK and an easy-to-use user interface, the Digipede Network brings the power of grid computing to more people than ever.

Digipede Workbench
The Digipede Workbench is designed to replace scripting. A GUI application, the Digipede Workbench provides wizards that walk the user through the process of job creation. Jobs can be submitted and monitored right from the UI.

Digipede Framework SDK
The Digipede Framework SDK is a set of libraries and development tools that can be used to manage the Digipede Network programmatically. Using the Digipede Framework SDK, programmers can add the power of grid computing to their own applications and to some types of third-party applications.

PowerShell Scripting
Digipede has released a PowerShell snap-in that allows complete use of the Digipede Framework (job submission, monitoring, and control) from within a scripting environment. While PowerShell alone allows complete access to the Digipede Framework, the snap-in makes management tasks even easier by providing cmdlets for many common tasks.

jobsubmit.exe
Jobsubmit.exe is a command-line application that submits a job to the Digipede Network; it is often used to submit jobs for batch processing. Jobsubmit sends XML files representing jobs; these files can be created with the Digipede Workbench, generated programmatically (serialized from the Digipede Framework), or written by hand.

Software Requirements
The Digipede Network is designed to make grid computing easy and accessible.
To accomplish these objectives, the Digipede Network is able to move files, guarantee job completion, accurately report status, and allow job submission from any computer on the network. The Digipede Network takes advantage of Microsoft technologies and, as a result, requires that certain Microsoft technologies and software be installed on the machines the Digipede Network runs on.

.NET 2.0
The Digipede Network is built using .NET 2.0 and takes advantage of many of the advanced capabilities provided by .NET. .NET 2.0 is required by the Digipede Server, the Digipede Agent, and the Digipede Workbench.

While the Digipede Network itself takes advantage of .NET 2.0, the applications distributed by the Digipede Network do not need to use .NET; they can be unmanaged command-line executables, COM servers, or .NET 1.1 or .NET 2.0 applications.

IIS
Microsoft’s Internet Information Services (IIS) is used by several parts of the Digipede Network: Digipede Control is the web-based administration tool, DigipedeTransfer provides HTTP-based file transport, and DigipedeWS provides web services for the entire Digipede Network. All of these components are installed with the Digipede Server. IIS does not need to be installed on the compute nodes; only the Digipede Server requires IIS.

Microsoft SQL Server
Microsoft SQL Server is used by the Digipede Network to store jobs, job templates, and configuration information. SQL Server is required by the Digipede Server. If the administrator does not have access to a SQL Server installation, the Digipede Server will install and use a SQL Server Express database.

Summary
The Digipede Network is an easy-to-use yet powerful distributed computing tool.
With many job submission tools to choose from, the Digipede Network makes grid computing an accessible and cost-effective tool to improve application performance and scalability.

Digipede Software Components

Digipede Server
The Digipede Server is the communication hub for the Digipede Network and is the first Digipede Network component to be installed. It is recommended that the Digipede Server be installed on a dedicated server machine; however, this is not required. For step-by-step instructions on installing the Digipede Server, please see the “Digipede Network Installation Guide”.

What Gets Installed?
The Digipede Server requires both IIS and SQL Server to work properly. IIS is required because the Digipede Network installs websites for administration and communication. SQL Server is required because the Digipede Network uses a database to store job and configuration information.

DigipedeControl
A multi-page website that provides the administrative user interface for the Digipede Network. Users can submit and monitor jobs via Digipede Control, but most non-administrator users prefer using Digipede Workbench. Having a browser-based administrative tool means that the administrator can monitor and control the Digipede Network from any machine in the enterprise.

DigipedeTransfer
A website that transports files via the HTTP (or HTTPS) protocol. If the network architecture does not permit the use of shares for file copying, you can use DigipedeTransfer to serve files. You can also use it as a destination for result files.

Note: Installation of DigipedeTransfer is optional; it is only required if you are using HTTP for file transport. DigipedeTransfer can also be installed separately from the Digipede Server by running the Digipede Server setup application and choosing a Custom installation.
DigipedeWS
Provides functionality to the Digipede Agents, Digipede Workbench, and any other applications that submit and monitor jobs on the Digipede Network.

Database
SQL Server is required because the Digipede Network creates a database called DigipedeDB. DigipedeDB stores all job, job template, and configuration information for the Digipede Network. The SQL Server instance can be installed on the same computer as the Digipede Server or on a different one. If an administrator is expecting to install a large and active grid, then for optimal performance it is recommended that SQL Server and the Digipede Server be installed on different machines. If the administrator does not have access to a SQL Server installation, the Digipede Server will install and use SQL Server Express. SQL Server Express is free and well suited to a small grid installation.

Note: If you are going to install a failover Digipede Server, you must install SQL Server (and the Digipede database) on a different machine than the Digipede Server. Both the primary and secondary Digipede Servers will be configured to run from that database.

Services
The primary functionality of the Digipede Server is provided by the Digipede Network Service. This program is a Windows service running as the Local System account. It starts automatically at start-up and runs whether or not any users are logged in to the local machine.

Event Log
The installation creates a Digipede Event Log that can be viewed through the Windows Event Viewer administrative tool. This event log is useful for setting up, administering, and troubleshooting the Digipede Network.

Advanced Topics

Digipede Licensing
You will need both server and agent-processor licenses. You can use the Digipede License Manager on your Digipede Server to manage your license, add agent-processor licenses, and activate your license online.
Server Licenses
The Digipede Server will not run without a valid license file. After you download your installation of the Digipede Network, you will receive a license file from Digipede. This file must be installed on your server for the Digipede Server, Web Service, and Digipede Control to run. If you try to start the services or view the website without a valid license, you will receive an error message indicating that the license is invalid. If you feel you have received this message in error, contact Digipede at www.digipede.net/support.

Agent Processor Licenses
Unlike the Digipede Server, Digipede Agents do not need license files. However, each Digipede Server license indicates the number of agent-processor licenses that have been purchased. You can install the agent on as many machines as you like, but only licensed agents will be permitted to perform work on the network.

Note: Digipede Agents are licensed per processor (not per core). A dual-core, single-processor machine takes only one license, but a dual-processor machine takes two.

Summary
The Digipede Server is more than just a Windows service. Using standard, stable, and well-documented Microsoft solutions, the Digipede Server provides communication, storage, and administration services for the Digipede Network.

Digipede Agent
Once the Digipede Server has been installed, you can begin installing the Digipede Agent on the compute resources in your enterprise. The Digipede Server must be running and accessible from the compute resource for the installation to succeed, because the Digipede Agent must be able to connect to the Digipede Server and register. For step-by-step instructions on installing the Digipede Agent, please see the “Digipede Network Installation Guide”.
A Digipede Agent can run on either a shared or a dedicated compute resource, and may be installed on as many compute nodes as you like; the Digipede Server license controls how many of those agents actually perform work on jobs. The Digipede Agent can be installed on the same machine as the Digipede Server. However, this configuration is not recommended for installations of the Digipede Network Professional Edition with large numbers of agents.

What Gets Installed?
On a desktop machine, the Digipede Agent requires Windows XP or higher; on a server machine, Windows 2000 SP4 or higher. For the average company, this means that the Digipede Agent can be installed on any Windows compute resource on the network. After installation, any subsequent configuration of the agent happens through Digipede Control, so you don't need to visit your compute nodes to administer them.

Installation of the Digipede Agent is very simple, and there are multiple ways to start it. A Digipede Agent installation file is installed with the Digipede Server and can be found at:
C:\Inetpub\wwwroot\Digipede\DigipedeControl\Install\Agent\setup.exe

The installer is also accessible via a hyperlink on Digipede Control’s home page. An administrator can open Digipede Control from the target compute resource and click the Digipede Agent download hyperlink to start the installation. The installation file can also be copied to a file share or a disk. It is also possible to install to multiple compute resources simultaneously using silent installation; this functionality is only available in Digipede Network Professional Edition. See Advanced Topics in this section for more details.

The Digipede Agent is made up of multiple components. The three main components are:
NISvc.exe: the Digipede Agent Service. NISvc.exe is a Windows service that starts automatically at start-up and logs on as the Local System account.
NICore.exe: starts and monitors processes, and handles communication with the Digipede Server.
NIUser.exe: provides the System Tray user interface.

Tip: For greater security, install the Digipede Agent service as a specific user account, typically a local or domain account with limited privileges. When you use a specific account, you can use additional features, such as disk quotas and limited directory access, for enhanced security.

The Agent has a minimal user interface; an administrator performs most configuration from within Digipede Control. However, a person using a computer with the Digipede Agent on it can always disable the Agent, which prevents the Agent from degrading performance on a shared resource.

Local Storage
The Digipede Agent executes tasks it selects from the Digipede Server. To execute a task, the Digipede Agent often needs supporting files. These may be executables or data files, and they need to be stored on the compute resource. The Digipede Agent installer creates the directory C:\Documents and Settings\All Users\Application Data\Digipede\Agent to store these files.

Event Log
The Digipede Agent installer creates an Event Log that is viewable through the Windows Event Viewer administrative tool. This event log is useful for administering and troubleshooting your installation.

Advanced Topics

Silent installation for large-scale deployments
If you plan a large-scale deployment of Digipede Agents for the Digipede Network Professional Edition, you can use a “silent” installation. Silent installations do not require any user or administrator interaction. Contact Digipede for information on this functionality.

Summary
Where the Digipede Server is the communication hub for the Digipede Network, the Digipede Agent is the workhorse.
Install a Digipede Agent on each compute resource you want to add to the Digipede Network and seamlessly grow your grid.

Digipede Control Basics
Digipede Control is the Digipede Network’s administration tool. As a thin client, it can be accessed from any machine on the network that has access rights to the web server where it is installed. Digipede Control is automatically installed with the Digipede Server.

Administrators and users can both use Digipede Control. However, they will have different experiences: administrators have menus and abilities that other users do not.

Pools
A pool is a collection of compute resources. Pools allow an administrator to partition the grid into smaller computational groups. This lets the administrator control where jobs run on the grid and control users' access rights to those machines. Compute resources can belong to more than one pool.

Each installation of the Digipede Network contains a Master Pool, which contains every compute node available on the grid. An administrator can create as many pools as needed and may segment the grid for security, technology, or business reasons.

Compute Resources
Digipede Control provides an administrator with the ability to configure the Digipede Agent installed on each compute resource. As the Digipede Agent takes work, it affects both the compute resource that it is running on and the network. Based on network and business needs, an administrator can configure each Digipede Agent to maximize availability while reducing negative effects.

Tip: Nearly every screen in Digipede Control has sorting and filtering. To filter the items being displayed, use the Find tab. To re-sort the items, click the column headers.

Administrators can configure a specific Digipede Agent by opening the Compute Resource page in Digipede Control and selecting the compute resource.
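The pool rules described above (a compute resource can belong to more than one pool, and the Master Pool contains every compute node on the grid) can be modeled with a few sets. The following is a hypothetical Python sketch for illustration only: `PoolRegistry`, its methods, and the node and pool names are invented and are not part of the Digipede Framework SDK. It does not model pool roles or permissions.

```python
from collections import defaultdict


class PoolRegistry:
    """Hypothetical in-memory model of Digipede pools (not an SDK class)."""

    MASTER = "Master Pool"

    def __init__(self):
        # Maps a pool name to the set of compute resources in that pool.
        self._pools = defaultdict(set)

    def register_resource(self, resource):
        # Every compute node on the grid is a member of the Master Pool.
        self._pools[self.MASTER].add(resource)

    def add_to_pool(self, pool, resource):
        # A registered resource may join any number of additional pools.
        if resource not in self._pools[self.MASTER]:
            raise KeyError(f"{resource!r} is not registered on the grid")
        self._pools[pool].add(resource)

    def members(self, pool):
        # Return a copy so callers cannot modify the registry by accident.
        return set(self._pools.get(pool, ()))

    def pools_for(self, resource):
        return {name for name, nodes in self._pools.items() if resource in nodes}


registry = PoolRegistry()
for node in ["node01", "node02", "node03"]:
    registry.register_resource(node)
registry.add_to_pool("Excel Nodes", "node02")  # e.g. machines with Excel installed
registry.add_to_pool("Finance", "node02")
print(sorted(registry.pools_for("node02")))    # ['Excel Nodes', 'Finance', 'Master Pool']
print(sorted(registry.members("Excel Nodes")))  # ['node02']
```

Keeping the Master Pool as the registry of record mirrors the rule that every node belongs to it, so a node can only be added to a narrower pool after it is known to the grid.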
Agent Availability
The Agent Availability option specifies whether the agent is available Always (subject to the Peak Time schedule) or Only when idle (either the screen saver is active or no one is logged in to the machine). By default, the option is set to Always. An administrator may want to set this option to Only when idle if the Digipede Agent is installed on a desktop computer that is sometimes used by a person. For example, the administrator may set a desktop compute resource that is in use during business hours to Only when idle, so that the business user has complete use of the processor when he needs it, but the machine is available to the grid when the user is not there.

Many users make agents Always available even on shared resources, but set the Base Priority (see below) to Low. For most applications, users won't even notice when their computer is working on a Digipede job.

Agent Base Priority
Agent Base Priority specifies the priority at which the Digipede Agent runs processes on the compute resource. These priorities (which are defined by the operating system) are: Low (sometimes called Idle), Below Normal, Normal, High, and Real Time. Windows is a multitasking operating system, so it constantly switches between the currently running processes. The base priority determines how often the operating system gives the processes started by the Digipede Agent access to the CPU. Digipede recommends setting Agent Base Priority to Low on shared resources and High on dedicated resources. While Real Time is available, selecting it could interfere with the operating system and is not recommended.

Agent Administration

Check-in Frequency
The Digipede Agent periodically checks in with the Digipede Server to see if there is any work available for it.
A Digipede Agent that has just completed a task immediately checks to see if there is any more work available. However, if there is no work available, the Digipede Agent waits for the specified period of time. After the check-in time has passed, the Digipede Agent pings the Digipede Server again to see if there is any work.

In the default configuration, a Digipede Agent checks in every 5 seconds; the time is configurable using the Administration Settings page in Digipede Control.

Finding the appropriate check-in frequency depends on several factors. The most important factor is the number of agents on the system. The best measure for the proper check-in frequency is the number of agents checking in per second: take the total number of agents on your system and divide it by the check-in interval in seconds. You can use this number as a guide for tuning your installation. A system with 100 agents set to check in every 5 seconds averages 20 agents checking in per second. A system with 1000 agents checking in at that frequency would have 200 agents hitting the server every second. This puts a load on the server and on the network. The more powerful the machine that the Digipede Server is installed on, the more agent check-ins it will be able to process.

However, there are benefits to having a short check-in time. Because the Digipede Network uses an agent-based pull system, an available agent will not begin working on a job until it checks in; with a check-in frequency of 10 minutes, it could be 10 minutes before all of your agents are working on your job. If you have jobs with a critical need for rapid computation, a short check-in frequency ensures that your agents will be working very soon after a job is submitted.

Choosing the appropriate check-in frequency is one of the most important decisions to make when configuring the Digipede Network.
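The load rule of thumb above (total agents divided by the check-in interval in seconds) and the start-up delay trade-off can be checked with a few lines of arithmetic. This Python sketch reproduces the guide's 100-agent and 1000-agent figures. It assumes check-ins are spread evenly across the interval; the function names are hypothetical, and the 50 check-ins per second target is an arbitrary illustrative value, not a Digipede recommendation.

```python
def checkins_per_second(num_agents, checkin_interval_s):
    """Average number of idle-agent check-ins the server handles per second."""
    if checkin_interval_s <= 0:
        raise ValueError("check-in interval must be positive")
    return num_agents / checkin_interval_s


def interval_for_target_load(num_agents, max_checkins_per_s):
    """Shortest check-in interval (seconds) that keeps the average load at or
    below a chosen target rate."""
    if max_checkins_per_s <= 0:
        raise ValueError("target rate must be positive")
    return num_agents / max_checkins_per_s


def worst_case_start_delay_s(checkin_interval_s):
    """An idle agent that checked in just before a job was submitted will not
    see that job until its next check-in, one full interval later."""
    return checkin_interval_s


print(checkins_per_second(100, 5))              # 20.0  (the guide's first example)
print(checkins_per_second(1000, 5))             # 200.0 (the guide's second example)
print(interval_for_target_load(1000, 50))       # 20.0 seconds
print(worst_case_start_delay_s(10 * 60) / 60)   # 10.0 minutes
```

The two directions of the trade-off are visible together: lengthening the interval divides server load, but it raises the worst-case delay before an idle agent starts on a newly submitted job by the same amount.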
Take into account the nature of the work, the speed of the network, the scalability of the Digipede Server, and the number of machines on the grid.

Users
Users are people or processes that have the right to access the Digipede Network. The Digipede Network Team Edition supports up to five users, while the Digipede Network Professional Edition supports an unlimited number of users.

System Roles
Roles are used to grant rights to users. Identifying users and defining roles is one layer of security available to the administrator.

System Administrator
A System Administrator has full access rights and can:
• access and use Administration pages;
• install agents, delete job templates, enable and disable agent licenses on compute resources, perform database administration, register external resources, administer pools, and administer users;
• submit jobs on the system.

System Monitor
A System Monitor has limited access rights and can change only information tied to his own account. A System Monitor can:
• submit jobs on the system;
• delete his own job templates;
• change his own user profile.

Tip: Most users on your system do not need to be Administrators. Monitors can submit and control their own jobs, and that's all most users need to do!

Job Template Administration
A job template tells the Digipede Network what files need to be on a compute resource to run a job, where to get those files, how to install them, how to execute the job, and how to communicate with the executable. Every job submitted to the Digipede Network has an associated job template. The files specified by a job template reside on a compute resource until the job template is deleted from the system. An administrator can use the Job Template Administration page to delete old and unused job templates from the system.
Deleting a job template from Digipede Control instructs every agent to delete all associated files from its cache.

Tip: For every job template in the system, there may be many files on each of the compute resources. In addition to taking up space on your hard disks, hundreds or thousands of these files can slow the performance of the agents. If your users intend to use a job template again, they should keep it in the system; if they are done with it, it should be deleted.

Job Administration

Job Control
Users and administrators can monitor and control jobs using the Jobs page in Digipede Control. By navigating to the Jobs page, a user can view the progress of jobs running on the system. When a running job is selected, an administrator (or the user who submitted that job) can pause the job by pressing the Pause button. No agents take tasks from a paused job (although they continue working on any tasks currently in progress). A paused job can be resumed using the same button. Similarly, a job can be aborted by clicking the Abort button. No more tasks for that job will be assigned, and any agents working on tasks for that job will stop working as soon as they check in.

Troubleshooting
Error messages for a job (for example, if agents are unable to download files) can be found on the Status tab. Error messages for particular tasks (or task assignments) can be found on the Tasks page: select the job that had an error, and then select the Tasks link. Select the task that failed and click the Task Assignment link to see errors, standard error, standard output, and the command line (when appropriate). To view a list of all task assignments for a job, click the Job Task Assignments link on the Tasks page.
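The pause, resume, and abort behavior described under Job Control, together with the guaranteed-completion rule from the Digipede Server overview (a task an agent cannot finish goes back into the queue), can be summarized as a small state model. This is a hypothetical Python sketch: the `Job` class and its method names are invented for illustration and do not correspond to the Digipede Framework SDK or to how the Digipede Server is actually implemented.

```python
from collections import deque
from enum import Enum


class JobState(Enum):
    RUNNING = "running"
    PAUSED = "paused"
    ABORTED = "aborted"


class Job:
    """Hypothetical model of Jobs page semantics (not a Digipede SDK class)."""

    def __init__(self, task_ids):
        self.state = JobState.RUNNING
        self.pending = deque(task_ids)  # tasks waiting to be assigned
        self.in_progress = {}           # task id -> agent working on it
        self.completed = set()

    def pause(self):
        if self.state is JobState.RUNNING:
            self.state = JobState.PAUSED

    def resume(self):
        if self.state is JobState.PAUSED:
            self.state = JobState.RUNNING

    def abort(self):
        # No further tasks will be assigned once a job is aborted.
        self.state = JobState.ABORTED
        self.pending.clear()

    def take_task(self, agent):
        # Agents take no new tasks from a paused or aborted job.
        if self.state is not JobState.RUNNING or not self.pending:
            return None
        task = self.pending.popleft()
        self.in_progress[task] = agent
        return task

    def check_in(self, task):
        # Returns False when the agent should stop working on `task`,
        # which happens at its first check-in after an abort.
        if self.state is JobState.ABORTED:
            self.in_progress.pop(task, None)
            return False
        return True

    def complete(self, task):
        self.in_progress.pop(task, None)
        self.completed.add(task)

    def agent_failed(self, task):
        # Guaranteed completion: an unfinished task goes back in the queue
        # so that another agent can execute it.
        if self.in_progress.pop(task, None) is not None and self.state is not JobState.ABORTED:
            self.pending.appendleft(task)


job = Job(["t1", "t2", "t3"])
first = job.take_task("agent-A")   # "t1"
job.pause()
print(job.take_task("agent-B"))    # None: a paused job hands out no new tasks
print(job.check_in(first))         # True: work already in progress continues
job.resume()
job.agent_failed(first)            # t1 goes back to the front of the queue
print(job.take_task("agent-B"))    # t1
job.abort()
print(job.check_in("t1"))          # False: the agent stops at its next check-in
```

Note that pausing only affects `take_task`, while aborting also affects `check_in`; that asymmetry is exactly the difference the Jobs page description draws between the two buttons.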
Best Practices

Pool Configuration
To maintain fine control over which users have access to hardware resources, we recommend enabling Enforce Pool Roles on every pool, including the Master Pool. Never give submission rights to any user on the Master Pool.

Tip: Pools can also be used to ensure that certain jobs run on certain machines. For example, if some of your distributed jobs require Excel on the nodes, you could create a pool that consists only of machines that have Excel installed.

Job Template Management
Files defined by a job template are moved to the compute resource for a task to use. Every job is associated with a job template, and for commonly run jobs it is recommended that a user reuse the job template. There are two reasons to do this:
• The job template provides a common high-level definition for the job that may include files and system requirements.
• Files that are defined in the job template can be cached on the compute resource for reuse.

There is a flag in each job template called DiscardAfterUse. Generally, a user sets the DiscardAfterUse flag to true if the job is only going to be submitted once, resulting in a complete cleanup of the files on the compute resources when the job finishes. However, if the job is run often, the user can set the DiscardAfterUse flag to false and leave the files on the compute resource for later use.

Tip: In the default configuration, web service calls are limited to 4 MB. With the ability to stream files and objects, it is easy for Digipede job submissions to become larger than this. Be

Summary
With Digipede Control, an administrator can configure the Digipede Network for his specific business and technology requirements.
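The effect of the DiscardAfterUse flag on bandwidth, described under Job Template Management, can be seen in a toy model of one agent's cache. This is a hypothetical Python sketch, not Digipede code: `AgentFileCache`, its methods, and the template and file names are invented, and the model ignores file sizes and versions. It follows the rules stated in this guide: cached files are reused by later jobs on the same job template, DiscardAfterUse set to true removes them when the job finishes, and deleting the job template removes them in any case.

```python
from dataclasses import dataclass, field


@dataclass
class AgentFileCache:
    """Hypothetical model of one agent's file cache (not Digipede code)."""

    files: dict = field(default_factory=dict)  # job template id -> cached file names
    downloads: int = 0                         # files actually moved to this node

    def prepare(self, template_id, needed_files):
        # Only files that are not already cached for this template are moved.
        cached = self.files.setdefault(template_id, set())
        missing = set(needed_files) - cached
        self.downloads += len(missing)
        cached.update(missing)

    def job_finished(self, template_id, discard_after_use):
        # DiscardAfterUse = true: clean up the files when the job finishes.
        if discard_after_use:
            self.files.pop(template_id, None)

    def template_deleted(self, template_id):
        # Deleting the job template in Digipede Control also clears the cache.
        self.files.pop(template_id, None)


def downloads_for_repeated_job(runs, discard_after_use):
    cache = AgentFileCache()
    for _ in range(runs):
        cache.prepare("blast-template-v1", ["blast.exe", "genome.dat"])
        cache.job_finished("blast-template-v1", discard_after_use)
    return cache.downloads


print(downloads_for_repeated_job(3, discard_after_use=False))  # 2: moved once, then reused
print(downloads_for_repeated_job(3, discard_after_use=True))   # 6: moved again for every job
```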
Job Templates, Jobs, and Tasks
The Digipede Network provides several tools for job submission. While each tool is designed to address a specific type of job submission, the objects required for a job submission are standard. There are three important concepts to understand regarding work submitted to the Digipede Network: the job template, the job, and the task. The relationship between these objects is important; this relationship is shown in the accompanying figure.

A task is an atomic unit of work: work that gets executed on one machine.
A job is a collection of one or more similar tasks.
A job template describes the files necessary to work on a job, along with how to execute those files. A job template is designed to be reusable and can be associated with more than one job.

Job templates, jobs, and tasks are all configurable and have associated properties that can be used to define a specific job submission. In some respects, these objects are hierarchical:
1. A property set in the job template is inherited by any job that uses the job template.
2. A property set in the job is inherited by all tasks defined for that job.
3. Many properties can be overridden. When this is the case, the property value in the more granular object is used. For example, if a shared property is set in both the job template and the job, the setting in the job is used.

The task contains the detailed specification of the work that will occur for a particular job on a particular computer. The tasks for a particular job differ from each other in three respects: each can have unique files, parameters, and serialized data.

File Definitions and Parameters
The job template defines all of the files and parameters that will be used for jobs.
Files and parameters may vary from task to task, or may be the same for all tasks in the job. Job templates, jobs, and tasks each have a collection of files and parameters associated with them.

File Definition
Using the Digipede Network to move files means that the user doesn’t have to pre-install software or data on the compute resources, or figure out how to get result files back to the client machine. The user simply creates a File Definition for each file that will be moved, and the Digipede Network does the rest.

Relevance
Files moved by the Digipede Network can apply to the job template, the job, or a task. Files apply to the:
1. job template when they are needed by every task and job that uses that job template (for example, an executable, DLL, or configuration file).
2. job when they differ from job to job but are the same for each task (for example, a document you are searching).
3. task when they are different for each task (for example, if you are searching for 1000 different strings in a genome and each string is in its own file, those files are task files).

Each file definition in a job template has a relevance, which indicates whether that file belongs to the job template, the job, or the task. A relevance of JobTemplate indicates that the file belongs to the job template. A relevance of JobPlaceholder indicates that the file belongs to the job; in this case, the file is not fully specified until job submission, and every job submitted must fully specify the file. A relevance of InputPlaceholder indicates that the file belongs to the task; in this case, the file is not fully specified until job submission, and every task must fully specify the file.

File Transfer
When creating a file definition, the user must decide how the file will be moved. The two transfer methods employed by the Digipede Network are streamed and hosted.
A streamed file moves from the client machine through the Digipede Network to the Digipede Server; it is then streamed to the compute resource. The advantage of streaming a file is that it does not have to be hosted on a machine that is reachable by the agents; the Digipede Network will move it automatically.

A hosted file, on the other hand, must be located on a machine that is accessible to the agents. When the job is run, the agents copy the file directly from the host machine to the compute resource using a specified protocol. The advantage of using hosted files is that the file is only moved once (directly from the hosting machine to the compute resource).

The hosted transfer type currently supports three protocols: SMB, HTTP, and HTTPS. With the SMB protocol, files are transferred to or from a Windows share. With HTTP and HTTPS, files are transferred using the HTTP protocol (over SSL in the case of HTTPS).

In addition to downloading files via HTTP, the Digipede Agent can upload result files via HTTP using the Digipede Transfer website. The AcceptsFiles.aspx program installed with the Digipede Transfer website allows agents to "push" files to a file server. Digipede Transfer can be installed on any machine with IIS, not just the Digipede Server.

Location
When creating a file definition for a hosted file, the user must indicate where the file can be found on the network. Each job template has remote locations associated with it; the remote locations indicate network paths and transfer protocols for each network location. Each file definition for a hosted file must specify which remote location it will be moved to or from.

Parameter
A parameter is a name-value pair used to define a command-line parameter or to define a variable for a job or task.
If you include the name of the parameter in the command line of a distributed application (for example, blast.exe $(PARAM1)), the Digipede Agent will replace the Parameter with the proper value when it calls the command line. Parameters can be defined for a job template, job, or task. It is also possible to create a placeholder Parameter for a job template that requires a corresponding Parameter definition for a job and task object. This ensures that any job using the job template defines the required Parameter value.

Summary
The Digipede Network uses job templates, jobs, and tasks as the definition of the distributed work. The file definition, parameter, and setting objects provide the details on how and where the job and tasks are to be executed.

Job Template
A job template contains reusable information about a specific type of job. It is, in essence, a template for a job. The job template defines what common files are needed to execute a job, how a job should be started, and whether there are any compute resource requirements. It also defines the parameters and file definitions that must be completed in order to submit the job. A job template is designed to be reusable: many jobs can be submitted against the same job template.

Tip: Digipede recommends that if a job template is going to be reused, the user give it an easily identifiable name.

There are several advantages to reusing job templates. One is that a job template can serve as a repository for default job settings. This ensures that the basics for a specific type of job are already set up and ready for the user. Another advantage is that common files defined in a job template can be cached on the compute resources. Caching files reduces the amount of bandwidth required to execute a job and gives the job a performance boost, because those files do not have to be copied in order to start working.
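The parameter rules described in this section fit in a few lines: values can be set on the job template, job, or task; the more granular object wins; a placeholder must be filled in before submission; and $(NAME) tokens in the command line are replaced by the agent. This is a hypothetical Python sketch, not Digipede's implementation. The function names, the use of plain dictionaries, the -d and -e switches, and the assumption that parameter names consist of letters, digits, and underscores are all illustrative.

```python
import re

# Assumption: parameter tokens look like $(NAME), where NAME is letters,
# digits, or underscores, as in the guide's blast.exe $(PARAM1) example.
PARAM_TOKEN = re.compile(r"\$\((\w+)\)")


def resolve_parameters(template, job, task, placeholders=()):
    """Merge parameter dictionaries so the more granular object wins
    (task over job, job over job template). Placeholder names declared on
    the job template must be given a value by the job or the task."""
    supplied = {**job, **task}
    missing = [name for name in placeholders if name not in supplied]
    if missing:
        raise ValueError(f"placeholder parameters left undefined: {missing}")
    return {**template, **supplied}


def expand_command_line(command_line, params):
    """Replace every $(NAME) token with its resolved value."""
    def substitute(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"no value for parameter {name!r}")
        return str(params[name])

    return PARAM_TOKEN.sub(substitute, command_line)


params = resolve_parameters(
    template={"DB": "genome.dat", "EVALUE": "10"},
    job={"EVALUE": "0.001"},           # overrides the template default
    task={"PARAM1": "query_0042.fa"},  # differs for every task
    placeholders=["PARAM1"],
)
# The -d and -e switches are illustrative only.
print(expand_command_line("blast.exe $(PARAM1) -d $(DB) -e $(EVALUE)", params))
# blast.exe query_0042.fa -d genome.dat -e 0.001
```

Failing loudly on an unknown token, rather than leaving it in the command line, is a design choice of this sketch; it surfaces a missing parameter before a task runs with a malformed command.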
File Definition
Files in the job template can be cached on the compute resource for future use and are then available to any job using the job template. Caching reduces network bandwidth utilization as well as the time it takes to execute a job. The user controls caching with the Discard After Use flag: if this property is set to false, the Digipede Network leaves the files on the compute resource. Cached files remain on the compute resources until the job template is deleted using Digipede Control.

Version
Once a job template has been submitted to the Digipede Network, it cannot be changed. This ensures that a job template cannot be modified while a running job is using it, and that users always know exactly which files their jobs are running against. To change a job template, you must create a new version, make your changes to that version, and submit future jobs using the new version. Creating a new version of an existing job template, rather than a modified copy, has one advantage: files cached by an earlier version are available to the later version. This eliminates the need to redistribute files that are already cached.

Application Control
The Application Control tells the Digipede Agent how to control an executable. It includes the command line, the API, and information on stopping or suspending the job. The distributed application can be started in several ways:
1. Command line: start the executable or script directly.
2. .NET object: an object created by a grid-enabled .NET application.
3. COM server: an object created by a grid-enabled COM application.

Command line
The default Application Control start type is command line. A Digipede Agent can start any command-line application, batch process, or script with both dynamic and static input parameters.
The user defines the command line, and the Digipede Network inserts any dynamic parameters before starting it.

Standard Out/Standard Error
Command-line processes typically write message text to standard output (stdout) and error text to standard error (stderr). These two streams may contain important information that the user would like to retrieve. By default, the Application Control returns standard error text and ignores standard output text. To see all the text produced by the command-line process, set standard output to true. It is recommended that standard error remain set to true so that any errors that occur during execution are returned to the user, who can then determine what went wrong.

.NET APIs
Digipede provides two .NET APIs, the Executive and the Worker. They can be used independently or in conjunction with each other.

Executive
With the Executive design pattern, the job template, job, and tasks are created on the client machine, but the associated .NET objects are created on the compute resource. Because the .NET objects are created remotely, Executive applications can be started from the Digipede Workbench. Additionally, an Executive stays active until the job finishes. This differs from a command-line application, which is associated with a single task and closes once that task is completed. An Executive lets the developer share information between tasks on the compute resource, such as a database connection. It also eliminates the per-task application start-up time.

Worker
The Worker design pattern is the most common pattern used for grid-enabling applications. With the Worker pattern, the job template, job, tasks, and all associated .NET objects are created on the client machine.
With the Worker pattern, the distributed .NET object's class can be defined either in the application itself or in a dynamic-link library (DLL). Putting the distributed class definitions into a DLL allows the developer to grid-enable applications that have graphical user interfaces (GUIs). It can also significantly reduce the footprint on the compute resources.

COM
The Digipede Framework SDK supports the grid-enablement of COM applications. Grid-enabled COM applications create the job template, job, and tasks on the client machine, and the distributed COM objects are created on the compute resource.

IComWorker
The IComWorker pattern is used to grid-enable a COM application. The IComWorker interface must be added to any COM server class you create for distribution on the Digipede Network; the Digipede Agent then uses the IComWorker interface to start the work on the compute resource.

Job Defaults
Settings define the requirements and rules that the Digipede Agent uses to determine whether it can run a task and, if so, how to run it. Unless overridden by the associated job, these job template settings are the default values that define basic job requirements and execution rules.

Summary
The job template is a reusable Digipede object that defines common files, execution requirements, and execution rules for the jobs that use it.

Job
A job contains the details for a specific run of a job template, and it contains one or more tasks. A job definition can also specify job-specific hardware and software requirements, job-level files, and execution parameters.

File Definitions
The file definitions created for a job are specific to that job. If a job template has any file definitions with a relevance of JobPlaceholder, the jobs submitted against that job template must have file definitions for those files.
Because these files are specific to an instance of a job, they are not cached on the compute resource; they are deleted when the job completes. A job file definition has the same file transfer locations as a job template file definition.

Parameters
Similar to file definitions, if a job template has any parameters with a relevance of JobPlaceholder, the jobs submitted against that job template must specify values for those parameters.

Settings
A job inherits the settings defined in the job defaults of the associated job template. To override a job template setting, simply change the setting in the job.

Summary
A job is a specific run of a job template and can either use the default job template settings or be uniquely configured. A job also contains and defines the tasks that are executed on the compute resources.

Task
A task is an atomic piece of work that is executed on a single compute resource. Each job has one or more tasks, and these tasks must be able to execute in parallel. A task can be a call to a command-line application or script, a .NET object, or a COM server.

File Definitions
Some applications require different data files for each task. For any job template that has file definitions with a relevance of InputPlaceholder, the tasks in jobs submitted against the template must have file definitions that specify those files. When a task completes, these files are deleted; they are never cached.

Tip: Digipede Workbench's Job Wizard automatically groups files by filename (ignoring the extension). For example, if you specify the task files input001.inf and input001.dat, Workbench creates one task with two input files. If your input files do not follow this naming convention, you will have to create your tasks manually in the designer.
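The Tip above describes how the Job Wizard turns a set of task files into tasks. The grouping rule is one task per file name, ignoring the extension, and it can be sketched in Python as follows. This illustrates the rule as stated and is not Workbench's code. The function name is hypothetical, and the sketch assumes that only the last extension is ignored, so data.tar.gz would group under data.tar.

```python
import ntpath  # Windows path rules, since task files usually live on Windows shares


def group_task_files(paths):
    """Group task files into tasks: one task per base file name, ignoring the extension.

    Returns a dict mapping each base name to the (sorted) paths that share it;
    each entry corresponds to one task.
    """
    tasks = {}
    for path in sorted(paths):
        name = ntpath.basename(path)        # drop the share/folder part
        stem, _ext = ntpath.splitext(name)  # drop only the last extension
        tasks.setdefault(stem, []).append(path)
    return tasks
```

Given input001.inf, input001.dat, and input002.inf, the sketch produces two tasks. The task input001 receives two input files, and input002 receives one, which matches the example in the Tip.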
Parameters
If the job template specifies that tasks have unique parameters, each task in that job must specify values for those parameters.

Result Files
Tasks may specify result files; these are files that are moved from the compute resources to a specified location after each task completes.

Summary
Tasks define the atomic units of work for a job, and these units of work must be able to execute in parallel.

Digipede Workbench
Traditional grid computing solutions require a user to create jobs with a scripting language, sometimes a proprietary one and sometimes one such as Perl. Digipede recognizes that requiring scripting is a major barrier to grid adoption and created the Digipede Workbench to simplify this arduous task. The Digipede Workbench is a Windows application designed to make it easy for a user to create, submit, and monitor jobs.

Note: For detailed information about the Digipede Workbench, see the "Workbench User Guide," which is installed with the Digipede Workbench.

Wizards
With Digipede Workbench, a user creates a job using the Job Wizard. The Job Wizard consists of pages that walk the user through the job creation process; the user provides the file, parameter, command, and setting information. Once the job has been created, it can be submitted automatically when the Job Wizard closes. It can also be submitted later by loading the job into the Designer and starting it.

Job Template Wizard
After you select New Job (either from the File menu or by clicking the New Job button), Workbench asks whether you would like to use an existing job template. If you answer "No," Workbench opens the Job Template Wizard, which walks you through the process of creating a job template. It automatically creates a job template, remote locations, file definitions, and parameters.
Based on the task files and parameters you specify, it also creates a job to submit. The wizard does not force you to specify the location and relevance of each file or parameter manually. Instead, it asks natural-language questions (e.g., "Will the Digipede Agent install common files for this job?") and interprets the answers to create file definitions with the appropriate relevance. It also allows the user to browse to locations on the network and automatically creates the correct remote locations.

Tip: If you select the Yes, cache the template and common files button on the last page of the wizard, the job template is stored in the system and the common files are cached on the compute resources. Subsequent jobs against this job template are then easier to submit, because you won't need to define common files, file definitions, or remote locations again.

Job Wizard
After you select New Job (either from the File menu or by clicking the New Job button), Workbench asks whether you would like to use an existing job template. If you answer "Yes," Workbench opens the Job Wizard. The Job Wizard is a shortened version of the Job Template Wizard. Rather than having you define all of the file definitions and remote locations, it asks only for details of any file definitions or parameters with InputPlaceholder relevance.

Parameters in Workbench
Workbench can automatically pre-populate parameter values in four different ways:
• Literal: A constant. Literal parameters can be specified as job-relevant (specified only once for the entire job). If a Literal parameter is not job-relevant, you can change it for each task.
• Range: A range of numbers that varies for each task. If you specify a Range parameter (along with input files and parameters from files), you indirectly set the number of tasks in a job. For example, a range from 10 to 10000 stepping by 10 creates 1000 tasks: the first has PARAM1 = 10, the second has PARAM1 = 20, and so on up to PARAM1 = 10000. If you have more than one Range parameter, the number of tasks is determined by the cross product of the sets they generate.
• Random: A random number within a range that you specify. Random parameters can be either real or whole numbers.
• Stored in a File: Each line in a particular file is a set of parameters for your tasks. When the job is submitted, the user specifies the file in which the parameters are located, and the Digipede Network reads the parameters from it.

Designers
Workbench's designer pages give the user full access to all of the information contained in jobs and job templates. After job templates and jobs have been created in the wizards, the user can view and edit them in the designer pages. If you prefer to work in the designers, you can create a blank job template by using the File->New->Blank Job Template option.

Job Template Designer
After a job template has been created, it can be opened in the Job Template Designer, where the user can specify or change aspects of the job template. All the specifications made in the Job Template Wizard are displayed in the Job Template Designer when the user chooses the Job Template Definition view. A job template definition can be changed until it has been submitted. To make changes to a submitted job template, a user must either create a new version of the job template or make a copy of the job template and change the copy.
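The Range arithmetic described under Parameters in Workbench can be checked with a small Python sketch. An inclusive range from start to stop yields one value per task, and several Range parameters combine as a cross product. The function names are hypothetical, and the sketch assumes integer ranges with a positive step. It models the rule stated in this guide rather than Workbench's own code.

```python
from itertools import product


def range_values(start, stop, step):
    """Values of an inclusive Range parameter: start, start+step, ..., up to stop."""
    if step <= 0:
        raise ValueError("step must be positive")
    return list(range(start, stop + 1, step))


def expand_ranges(ranges):
    """Build one task (a dict of parameter values) per combination of Range values.

    `ranges` maps a parameter name to a (start, stop, step) tuple; with several
    Range parameters, the task list is their cross product.
    """
    names = list(ranges)
    value_lists = [range_values(*ranges[name]) for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]
```

For example, expand_ranges({"PARAM1": (10, 10000, 10)}) returns 1000 tasks, from PARAM1 = 10 to PARAM1 = 10000, which matches the example above. Adding a second Range parameter with three values triples that to 3000 tasks. The NumTasks range used in the Hello World walkthrough (1 to 10, step 1) gives 10 tasks.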
Job Designer
Once a job has been created, it can be opened in the Job Designer, where the user can view, specify, or change properties of the job. All the specifications made in the Job Wizard are displayed in the Job Designer when the user chooses the Job Definition view. Any changes made here become the new definition for that job. Unlike a job template, a job can be modified after it has been submitted; the user can then resubmit the changed job.

Job Tracking
The Job Tracking page is a Digipede Workbench tool that allows a user to find and monitor jobs on the Digipede Server. The user can search for jobs within a specified time frame and/or with specific statuses. To view jobs, select the appropriate time range and statuses (e.g., all running jobs submitted today) and click the Find button. Workbench queries the Digipede Server for matching jobs and lists them. Double-click a job (or select it and click Monitor) to get detailed information about that job in a job window. If the job is in an active state (anything except Aborted, Completed, or Failed), the job window actively monitors the progress of the job. If you select the Get history when monitoring jobs checkbox, Workbench downloads the job history. The history includes all task assignment information, such as which tasks ran where and their standard error.

Tip: Check the Only my Jobs box to limit the list to your own jobs. If you also check the Autorefresh box, Workbench refreshes this page every minute.

The Job Tracking page provides much of the same information as the Jobs page in Digipede Control, but it allows users to stay in the application where they build their jobs.
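The job window's behavior described above amounts to a simple polling loop: keep monitoring while a job is active, and stop once it reaches Aborted, Completed, or Failed. The Python sketch below shows that loop under stated assumptions. get_status is a hypothetical callable standing in for whatever query your client makes to the Digipede Server, and the 60-second default interval simply mirrors the Autorefresh period. The sketch does not use any Digipede API.

```python
import time

# States in which a job is no longer active, per the Job Tracking description.
TERMINAL_STATES = frozenset({"Aborted", "Completed", "Failed"})


def is_active(status):
    """A job is active in any state except Aborted, Completed, or Failed."""
    return status not in TERMINAL_STATES


def monitor(get_status, interval_seconds=60.0, sleep=time.sleep):
    """Poll get_status() until the job leaves the active states.

    get_status is a hypothetical callable returning the job's status string.
    Returns every status observed, ending with the terminal one.
    """
    observed = []
    while True:
        status = get_status()
        observed.append(status)
        if not is_active(status):
            return observed
        sleep(interval_seconds)  # wait before the next query
```

Passing a no-op function as sleep lets you exercise the loop without a server, for example by feeding it a scripted sequence of statuses.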
Saving Job Templates and Jobs
Job templates and jobs can be saved to XML files. These XML files can be submitted to the Digipede Server by Digipede Control, opened in another instance of Digipede Workbench, and even edited by hand. To save a job template or job to XML, make sure that its window is active, and then select Save As from the File menu. Although the files contain standard XML, by convention the following extensions are used:
• DNAX: the file contains a job template.
• DNJX: the file contains a job.
• DNSX: the file contains a full "job submission," that is, a job template and a job.

Digipede Control
Digipede Control is the Digipede Network's administration tool. Using Digipede Control, a user can view the status and history of submitted job templates, jobs, and tasks. Each job template, job, and task is assigned an ID, which can be used to identify associated job template, job, and task objects. Optionally, the user can assign names to job templates and jobs to make these associations easier to identify.

Job Template Page
As you can see in Figure 8, Digipede Control lets a user view all the currently defined job templates. The Job Template page (Administration->Administer Job Templates) contains a list of available job templates. To see details about a particular job template, select it from the list. The Information tab (in the top half of the page) then displays detailed information about the selected job template. To view the full definition of a particular job template, select that template and click the View XML for this Template link in the Information tab. Figure 9 shows the contents of the MonteCarloPi job template.
The MonteCarloPi job template was created from the WorkerLibraryForms sample supplied with the Digipede Framework SDK. Binary information (i.e., streamed files) is omitted from the XML file.

Jobs Page
Like the Job Template page, Digipede Control can display a list of the jobs in the system. Click the Jobs link to view the Jobs page. The lower half of the page displays a list of jobs, and the tabs in the top half of the page show details about the selected job. By default, jobs are listed most recent first. Clicking any column heading (ID, Job Name, Priority, Time Started, Progress, Status, or Last Result Time) re-sorts the list by that column. To filter the job list or find a particular job, use the Find tab. To learn more about the individual tasks in a job, select the job in the job list and then click the Task hyperlink above the job list. This takes you to the Task page.

Task Page
The Task page shows the status of the tasks for one job. This page is often used to check the status of the tasks in an executing job, to see which machines were used for a job, or to gather information about a failed job. When a Digipede Agent claims a task, the task is labeled "Assigned." When the Agent executes the task, it is labeled "Running." When the Agent's results are returned to the Digipede Server, the task is labeled "Completed."

Hello World Walkthrough
This walkthrough introduces you to the Digipede Workbench by having you define and run a job. The executable you will distribute is HelloWorld.exe, a simple command-line program that writes to standard output.
HelloWorld.exe can optionally take a command-line argument; if you give it an argument, it echoes that argument in its standard output. In this exercise, you will define and submit a job and job template. Subsequently, you will submit another job against that job template.

Defining the Job Template and Initial Job
1. If you haven't installed Digipede Workbench, do so now. To install Workbench, log in to Digipede Control: open a browser, navigate to HTCServer/DigipedeControl, and enter your username and password. Your username is your machine name (e.g., LABPC01), and your password is the same as your username. Click the Digipede Workbench link and follow the installation instructions.
2. Start Digipede Workbench by selecting Start->All Programs->Digipede->Workbench from your Start menu. Workbench prompts you for your credentials. Select Digipede Network Authentication and enter the same username and password you used for Digipede Control.
3. Start a new job by clicking the “New Job” link in the Common Tasks pane of the Start Page.
4. When prompted with the “Would you like to use an existing Job Template” dialog, answer “No.”
5. Enter a name and description for your job template and click Next.
6. “Hello World” requires a file to be moved. Select the appropriate protocol: if your files are accessible via a file share, choose “Share”; if your files are accessible on a web server, choose “HTTP.”
7. The executable is a “Common” file (also known as a Job Template file). Select “Yes” to common files and click Next.
8. Click the Add button, then browse to and select the HelloWorld.exe file. The file is located in the \\HTCServer\SharedFiles folder.
9. This job does not require Job files or Task files. Select “No” on those screens and click Next on each.
10. Although this job does not require any command-line parameters, we'll define a “Range” parameter in order to create multiple tasks for this job. Select “Yes” and click Next.
11. Add one Range parameter. Enter a name for your parameter (e.g., “NumTasks”) and define the range to go from 1 to 10, step 1. Check the “Can override at Job submission” box. Then click OK, then Next.
12. This application doesn't return result files, nor does it use the Digipede API, so select No on the next two screens.
13. This application uses one command-line parameter: a Task ID. Modify the command line by clicking “Edit.”
14. Add the Task ID to the command line. Click where you would like the parameter to go (right after “HelloWorld.exe”), then double-click “Task ID” (or any other variable, if you prefer). Your command line should look like this:
HelloWorld.exe $(TaskID)
15. Click “No” to notifications and click Next.
16. Because this application generates standard output, click the Advanced Options button and ensure that “Save Standard Output” is selected. Click OK.
17. Your job template and job are ready to submit. To submit immediately, ensure that the “Run Job on Finish” checkbox is selected and click Finish. However, you may wish to familiarize yourself with the details of the job and job template before submitting them; in that case, uncheck the “Run Job on Finish” checkbox and click Finish.

MinEn Walkthrough
This walkthrough demonstrates using the Digipede Network to distribute the calculations of the Minimum Energy executable.
MinEn contains one distributable step: for each input file, the Update_File executable must be run to generate the results for that set of inputs. Digipede Workbench can be used to quickly and easily distribute the work of many Update_File calls. Before beginning this walkthrough, ensure that you have run make_runs.exe to create the input files. Also, you must either create a file share that is “world writable” or use Digipede's HTTP file transfer.
1. Start Digipede Workbench by selecting Start->All Programs->Digipede->Workbench from your Start menu.
2. Start a new job by clicking the “New Job” link in the Common Tasks pane of the Start Page.
3. When prompted with the “Would you like to use an existing Job Template” dialog, answer “No.”
4. Enter a name and description for your job template and click Next.
5. Update_File requires two common files: Update_File.exe and cygwin1.dll. Answer “Yes” to common files, then browse the file share and select these two files.
6. There are no job files for MinEn, so select “No” and click Next.
7. Each task in MinEn requires an input file. Select “Yes” and click Next.
8. Browse to a file share and select one or more input files.
9. The tasks do not have parameters, so click “No” and click Next.
10. MinEn produces one result file. Select “Yes” and browse to the file share where files should be returned. For this exercise, browse to: htc-server/digipedetransfer/AcceptsFiles.aspx
11. Next, give the output file a name (which can be used on the command line) and an expression that will be used to generate the file name. In this case, use the expression $N(INFile).out.
The $(VariableName) syntax indicates that the value of another variable (in this case, the input file name) will be used. The N indicates that the extension should be stripped from the file name, and “.out” appends the .out extension. For a complete definition of the command-line expression syntax, see the Digipede Workbench documentation.
12. This application does not use the Digipede API, so select “My application does NOT use the Digipede API” and click Next.
13. Update_File takes two arguments: the input file and the output file. Because these will be different for each task, you should use a variable for each of them. The wizard will provide a list of the files for this job.
14. We do not have an SMTP server set up for this server, so select “No” to notifications and click Next.
15. Click Finish.

Glossary

Agents
The software that runs on individual compute resources. Agents manage the execution of the distributed application.

Application/data server
A server that provides data and applications to the agents. It can be, but does not need to be, the same server that hosts the Digipede Server. No Digipede software needs to be installed on an application/data server.

Batchable application
A command-line application that does not require any user interaction. These applications are called batchable because they can be run from a batch process.

Compute resources
Computers that are made available on the Digipede Network. These compute resources may be dedicated or shared. Dedicated compute resources are used exclusively for jobs run on the Digipede Network; shared compute resources may also be used for other purposes.

Data transfer
The process by which all data required for a specific task are transferred from a data resource to a compute resource.
Dedicated compute resources
Compute resources that are used exclusively for jobs run on the Digipede Network (for example, nodes in a cluster used exclusively for such applications).

Digipede Agent
The software component that manages the compute resource for the Digipede Network. This is a small, unobtrusive program that does not require any interaction with any user of the compute resource.

Digipede Control
The administrative component of the Digipede Network. Digipede Control is a website (usually hosted on the same computer as the Digipede Server) through which an administrator can monitor and run the Digipede Network.

Digipede Server
The server software that manages jobs and all communication with the Digipede Agent software.

Digipede Transfer
A website that transports files via the HTTP (or HTTPS) protocol. If your network architecture does not permit the use of shares for file copying, you can use Digipede Transfer to serve files. A program in Digipede Transfer called AcceptsFiles.aspx can receive files via HTTP; you can use it as a destination for your result files. You can also install Digipede Transfer on any machine in your organization where you would like to host files.

Digipede Workbench
The software component that defines and runs jobs on the Digipede Network. This Windows smart client can start and monitor jobs, and can run on any machine in an organization.

Distributed application
The application that the Digipede Agent manages for execution on compute resources. This application can be written to communicate directly with the Digipede Network using the Digipede API, or it can be a stand-alone command-line executable.

External resource
Any resource (e.g., a file server, database, or software license) used by applications on the Digipede Network. The Digipede Network can apply limits to ensure that external resources are not overused or overtaxed.

Job
A unit of work to run on the Digipede Network; a single, specific submission of a job template. Often a job is composed of multiple tasks.

Job template
The information necessary to complete a job. A job template tells the Digipede Network what files need to be on a compute resource to run a job, where to get those files, how to install them, how to execute the job, and how to communicate with the executable. Job templates reside on a compute resource until the job template is deleted from the system.

Master application
An application that communicates with the Digipede Server using web services or the Digipede API in order to start, monitor, or control jobs.

Master pool
The collection of all the compute resources.

Pool
A collection of compute resources on which jobs are run.

Shared compute resources
Compute resources that are used for other purposes in addition to running jobs on the Digipede Network (for example, desktops with one or more interactive users).

Task
The part of a job that is run on an individual compute resource. Most jobs are composed of many tasks. Often, the Digipede Agent must copy files to a compute resource in order to run a particular task; the Agent deletes these files after completing the task.