Building a Campus Grid with Existing Resources
LabMan Conference, Notre Dame, June 8-9, 2009
Preston Smith, Purdue University

Special Thanks
• Thanks to the Condor Team at Wisconsin for graciously allowing us to borrow from their tutorial materials!

Outline
• Supercomputers on Campus
  – Campus Grids
  – High-Throughput Computing
  – The impact of the campus grid
• The Condor Software
  – Condor 101, at 200 mph
• Condor from an administrator’s view
  – Policies
  – Networking
  – Security
  – Virtual Appliance

Campus Grids
• Campus grids link computing resources within universities and research institutes, often including geographically distributed sites.
  – Dedicated computing resources
  – Idle non-dedicated computing resources
    • Workstations
    • Student labs
• Campus grids build computational capacity out of an institution’s existing investment in computing resources.

Supercomputers on Campus
• Purdue’s Campus Grid currently has 23,000 cores
  – There are only 21 systems on the 11/2008 Top 500 list with 16,000 or more cores.
• Theoretical peak capacity of the campus grid is 177 teraflops
  – This would place it at #12 on the 11/2008 Top 500 list.
• Acquiring a resource of this scale is expensive!
  – $3 million for compute nodes alone
  – Requires 2,000 square feet of floor space, plus power and cooling

BoilerGrid
• Purdue’s Campus Grid – West Lafayette campus
• 23,000 cores
  – x86_64, ia32, ia64 Linux
    • Idle HPC nodes in Rosen Center clusters
  – Solaris, Mac OS X
  – Windows
    • Instructional lab systems at the main West Lafayette campus and Purdue’s regional campuses

BoilerGrid
• Backfilling on idle HPC cluster nodes
  – Condor runs on idle cluster nodes (nearly 10,000 cores today) when a node isn’t busy with jobs from PBS, the primary scheduler

BoilerGrid
• Windows systems (~7,000 cores)
  – Instructional labs
    • Purdue’s TLT division has run Condor on labs since 2001, supporting student rendering and some faculty research
  – Library terminals
• Dedicated Condor resources
  – GPU rendering cluster
  – FPGA computation accelerator

BoilerGrid around Campus
• To date, the bulk of BoilerGrid cycles are provided by ITaP, Purdue’s central IT
  – Rosen Center for Advanced Computing (RCAC) – Research Computing
    • Community Clusters – see http://www.isgtw.org/?pid=1001247
  – Teaching and Learning Technologies (TLT) – Student Labs
• Centrally operated Linux clusters provide approximately 12,000 cores
• Centrally operated student labs provide 7,000 Windows cores
• That’s already a lot of cores, but there’s more around a large campus like Purdue
  – 27,317 machines, to be exact
  – Can the campus grid cover most of campus?

Target: All of Campus
• Green computing is big everywhere; Purdue is no exception
• CIO’s challenge – power-save your idle computers, or run Condor and join BoilerGrid
  – The University’s President runs Condor on her PC
• Centrally supported workstations have Condor available for install through SCCM.
• Thou shalt turn off thy computer or run Condor

Other Campus Grids
• Grid Laboratory of Wisconsin (GLOW) – University of Wisconsin, Madison
• FermiGrid – Fermi National Accelerator Lab
• Clemson University
• Rochester Institute of Technology

DiaGrid
• A new name for our effort to spread the campus grid gospel beyond Purdue’s borders
  – Perhaps institutions who wear red or green, and may be rivals on the gridiron or hardwood, wouldn’t care to be part of something named “Boiler.”
• We’re regularly asked about implementing a Purdue-style campus grid at institutions without HPC on their campus.
– Federate our campus grids into something far greater than what one institution can do alone
(image: http://farm1.static.flickr.com/124/365647571_e52111b7f4.jpg)

DiaGrid Partners
• Sure, it’d make a good basketball tournament…
• Purdue - West Lafayette
• Purdue Regionals
  – Calumet
  – North Central
  – IPFW
  – Statewide Technology
• Cooperative Extension Offices
• Your Campus??
• Indiana University
• Notre Dame
• Indiana State
• Wisconsin (GLOW)
  – Via JobRouter
• Louisville

National scale: TeraGrid
• The Purdue Condor Pool is a resource available for allocation to anybody in the nation today
• NSF now recognizes high-throughput computing resources as a critical part of the nation’s cyberinfrastructure portfolio going forward.
  – Not just megaclusters, XT5s, Blue Waters, etc., but loosely coupled resources as well
• NSF vision for HPC – sharing among academic institutions to optimize the accessibility and use of HPC as supported at the campus level
  – This matches closely with our goal to spread the gospel of the campus grid via DiaGrid

High Throughput Computing
• Like the Top 500 list, High Performance Computing is often measured in floating-point operations per second (FLOPS)
• High Throughput Computing is instead concerned with how many floating-point operations per month or per year users can extract from their computing environment, rather than how many such operations the environment can deliver per second or per minute.

Impact - Disciplines
• Supply Chain Simulations
• Structural Biology (viruses)
• Astrophysics
• Particle Physics
• Mathematics
• Economics
• Communication
• Materials Science
• Hydrology
• Bioinformatics

Impact
[Charts: BoilerGrid growth from 2003 to 2010 in pool size, unique users, jobs run, and hours delivered.]

Condor

The Condor Software
• Available as a free download from http://www.cs.wisc.edu/condor
• Download Condor for your operating system
  – Available for most UNIX platforms (including Linux and Apple’s OS X)
  – Windows NT / XP / Vista

Full featured system
• Flexible scheduling policy engine via ClassAds
  – Preemption, suspension, requirements, preferences, groups, quotas, settable fair-share, system hold…
• Facilities to manage BOTH dedicated CPUs (clusters) and non-dedicated resources (desktops)
• Transparent checkpoint/migration for many types of serial jobs
• No shared file system required
• Federate clusters with a wide array of grid middleware

Full featured system
• Workflow management (inter-dependencies)
• Support for many job types – serial, parallel, etc.
• Fault-tolerant: can survive crashes and network outages; no single point of failure
• Development APIs: SOAP / web services, DRMAA (C), Perl package, GAHP, flexible command-line tools, MW
• Platforms: Linux i386/IA64, Windows 2k/XP/Vista, Mac OS, FreeBSD, Solaris, IRIX, HP-UX, Compaq Tru64, … lots.
  – IRIX and Tru64 are no longer supported by current releases of Condor

Condor – at 200 mph
• We could talk about Condor all day…
  – So just the highlights
(image: http://www.automopedia.org/wp-content/uploads/2008/05/indy500_start2.jpg)

Meet Phil. He is a scientist with a big problem.
Phil’s Application
• Run a parameter sweep of F(x,y,z) for 200 values of x, 100 values of y, and 30 values of z
  – 200 × 100 × 30 = 600,000 combinations
  – F takes, on average, 6 hours to compute on a “typical” workstation (total = 600,000 × 6 = 3,600,000 hours: about 410 years)
  – F requires a “moderate” (512 MB) amount of memory
  – F performs “moderate” I/O – (x,y,z) is 5 MB and F(x,y,z) is 50 MB

“I have 600,000 simulations to run. Where can I get help? NSF won’t fund the Blue Gene that I requested.”

While sharing a beverage with some colleagues, Phil shares his problem. Somebody asks, “Have you tried Condor?”

Phil installs a “Personal Condor” on his machine…
• What do we mean by a “Personal” Condor?
  – Condor on your own workstation
  – No root / administrator access required
  – No system administrator intervention needed
• After installation, Phil submits his jobs to his Personal Condor…

Phil’s Condor Pool
[Diagram: 600k Condor jobs, e.g. F(3,4,5), queued to a personal Condor on Phil’s workstation]

Personal Condor?! What’s the benefit of a Condor “pool” with just one user and one machine?

Condor will…
• Keep an eye on your jobs and keep you posted on their progress
• Implement your policy on the execution order of the jobs
• Keep a log of your job activities
• Add fault tolerance to your jobs
• Implement your policy on when the jobs can run on your workstation

Definitions
• Job – the Condor representation of your work
• Machine – the Condor representation of the computers that can perform the work
• Matchmaking – matching a job with a machine (“resource”)

Job
• Jobs state their requirements and preferences:
  – “I need a Linux/x86 platform”
  – “I want the machine with the most memory”
  – “I prefer a machine in the chemistry department”

Machine
• Machines state their requirements and preferences:
  – “Run jobs only when there is no keyboard activity”
  – “I prefer to run Phil’s jobs”
  – “I am a machine in the physics department”
  – “Never run jobs belonging to Dr. Smith”

The Magic of Matchmaking
• Jobs and machines state their requirements and preferences
• Condor matches jobs with machines based on those requirements and preferences

Using the Vanilla Universe
• The Vanilla Universe:
  – Allows running almost any “serial” job
  – Provides automatic file transfer, etc.
  – Like vanilla ice cream, it can be used in just about any situation

Make your job batch-ready
• Must be able to run in the background
  – No interactive input
  – No windows
  – No GUI

Create a Submit Description File
• A plain ASCII text file
• Condor does not care about file extensions
• Tells Condor about your job:
  – Which executable, universe, input, output and error files to use, command-line arguments, environment variables, and any special requirements or preferences (more on this later)
• Can describe many jobs at once (a “cluster”), each with different input, arguments, output, etc.

Simple Submit Description File
# Simple condor_submit input file
# (Lines beginning with # are comments)
# NOTE: the words on the left side are not
#       case sensitive, but filenames are!
Universe   = vanilla
Executable = my_job
Output     = output.txt
Queue

Run condor_submit
• You give condor_submit the name of the submit file you have created:
  – condor_submit my_job.submit
• condor_submit:
  – Parses the submit file, checks for errors
  – Creates a “ClassAd” that describes your job(s)
  – Puts job(s) in the job queue
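To make this concrete, here is a minimal sketch of what a submit file for a sweep like Phil’s could look like, queueing one vanilla-universe job per parameter set and using $(Process) to select the input. The executable and file names are invented for this example; this is not Phil’s actual file.

# sweep.submit -- illustrative sketch only
Universe    = vanilla
Executable  = F
Log         = sweep.log
Output      = out.$(Process)
Error       = err.$(Process)
ShouldTransferFiles  = IF_NEEDED
Transfer_input_files = input.$(Process)
Arguments   = input.$(Process)
Queue 600000

Submitting the sweep and checking on its progress then looks like:

condor_submit sweep.submit
condor_q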
ClassAd?
• Condor’s internal data representation
  – Similar to classified ads (as the name implies)
  – Represents an object and its attributes
    • Usually many attributes
  – Can also describe what an object matches with

ClassAd Details
• ClassAds can contain a lot of details
  – The job’s executable is analysis.exe
  – The machine’s load average is 5.6
• ClassAds can specify requirements
  – “I require a machine with Linux”
• ClassAds can specify preferences
  – “This machine prefers to run jobs from the physics group”

ClassAd Details (continued)
• ClassAds are:
  – semi-structured
  – user-extensible
  – schema-free
  – Attribute = Expression

ClassAd Example
• Attribute values can be strings, numbers, or boolean expressions:
MyType       = "Job"
TargetType   = "Machine"
ClusterId    = 1377
Owner        = "roy"
Cmd          = "sim.exe"
Requirements = (Arch == "INTEL") && (OpSys == "LINUX") &&
               (Disk >= DiskUsage) && ((Memory * 1024) >= ImageSize)
…

The Dog ClassAd
Type  = "Dog"
Color = "Brown"
Price = 12

ClassAd for the “Job”
...
Requirements = (Type == "Dog") && (Color == "Brown") && (Price <= 15)
...

Phil’s Condor Pool
[Diagram: still 600k jobs and one personal Condor on Phil’s workstation]
Phil can still only run one job at a time, however.

Good News (from the Boss)
• The Boss says Phil can add his coworkers’ desktop machines into his Condor pool as well… but only if they can also submit jobs.

Adding nodes
• Phil installs Condor on the desktop machines, and configures them with his machine as the central manager
  – The central manager:
    • Central repository for the whole pool
    • Performs job / machine matching, etc.
• These are “non-dedicated” nodes, meaning that they can’t always run Condor jobs

Phil’s Condor Pool
[Diagram: 600k Condor jobs spread across a Condor pool of desktop machines]
Now, Phil and his coworkers can run multiple jobs at a time, so their work completes sooner.

How can my jobs access their data files?

Condor File Transfer
• ShouldTransferFiles = YES
  – Always transfer files to the execution site
• ShouldTransferFiles = NO
  – Rely on a shared filesystem
• ShouldTransferFiles = IF_NEEDED
  – Automatically transfer the files if the submit and execute machines are not in the same FileSystemDomain

Universe   = vanilla
Executable = my_job
Log        = my_job.log
ShouldTransferFiles   = IF_NEEDED
Transfer_input_files  = dataset.$(Process), common.data
Transfer_output_files = TheAnswer.dat
Queue 600

Phil’s Condor Pool
[Diagram: the Condor pool, now with a dedicated cluster added]
With the additional resources, Phil and his coworkers can get their jobs completed even faster.

Now what?
• Some of the machines in the pool can’t run my jobs
  – Not enough RAM
  – Not enough scratch disk space
  – Required software not installed
  – Etc.

Specify Requirements
• An expression (syntax similar to C or Java)
• Must evaluate to True for a match to be made

Universe   = vanilla
Executable = my_job
Log        = my_job.log
InitialDir = run_$(Process)
Requirements = Memory >= 256 && Disk > 10000
Queue 600

Advanced Requirements
• Requirements can match custom attributes in your machine ad
  – Can be added by hand to each machine

Universe   = vanilla
Executable = my_job
Log        = my_job.log
InitialDir = run_$(Process)
Requirements = Memory >= 256 && Disk > 10000 \
               && (HasMATLAB =?= TRUE)
Queue 600

And, Specify Rank
• All matches which meet the requirements can be sorted by preference with a Rank expression.
• The higher the Rank, the better the match.

Universe   = vanilla
Executable = my_job
Log        = my_job.log
Arguments  = -arg1 -arg2
InitialDir = run_$(Process)
Requirements = Memory >= 256 && Disk > 10000
Rank = (KFLOPS*10000) + Memory
Queue 600
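Before submitting, it can help to see how many machines in the pool actually satisfy a Requirements expression. A quick, hedged example using condor_status (assuming the Condor command-line tools are on your PATH):

# List the machines that satisfy the job's requirements
condor_status -constraint 'Memory >= 256 && Disk > 10000'

# Or just show slot totals for the matching machines
condor_status -constraint 'Memory >= 256 && Disk > 10000' -total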
What does the IT shop need to know?

The IT administrator should know about:
– Condor’s daemons
– Policy configuration
– Security
– Virtualization

Typical Condor Pool
[Diagram: processes spawned and ClassAd communication pathways in a typical pool. Every machine runs a condor_master; the central manager also runs the collector and negotiator; submit-only machines run a schedd; execute-only machines run a startd; regular nodes run both a schedd and a startd.]

Job Startup
[Diagram: job startup. The schedd on the submit machine is matched to a startd on an execute machine by the negotiator and collector on the central manager; the schedd spawns a shadow, the startd spawns a starter, and the starter runs the job, with the Condor syscall library talking back to the shadow.]

Ok, now what?
• The default configuration is pretty sane
  – Only start a job when the keyboard is idle for > 15 minutes and there is no CPU load
  – Terminate a job when the keyboard or mouse is used, or when the CPU is busy for more than two minutes
• Can one customize how Condor behaves?

Policy Expressions
• Allow machine owners to specify job priorities, restrict access, and implement local policies

Policy Configuration (The Boss)
• “I asked the computer lab folks to add nodes into Condor… but the jobs from their users have priority there.”

New settings for the lab machines
• Prefer lab jobs
START    = True
RANK     = Department == "Lab"
SUSPEND  = False
CONTINUE = True
PREEMPT  = False
KILL     = False

Submit file with Custom Attribute
• Prefix an entry with “+” to add it to the job ClassAd
Executable  = 3dsmax.exe
Universe    = vanilla
+Department = "Lab"
queue

More Complex RANK
• Give the machine’s owners (psmith and jpcampbe) highest priority, followed by Lab, followed by the Physics department, followed by everyone else.
IsOwner = (Owner == "psmith" || Owner == "jpcampbe")
IsTLT   = (Department =!= UNDEFINED && Department == "Lab")
IsPhys  = (Department =!= UNDEFINED && Department == "Physics")
RANK    = $(IsOwner)*20 + $(IsTLT)*10 + $(IsPhys)

Policy Configuration (The Boss)
• “So far this is okay, but... Condor can use staff desktops when they would otherwise be idle.”

Defining Idle
• One possible definition:
  – No keyboard or mouse activity for 5 minutes
  – Load average below 0.3

Desktops should
• START jobs when the machine becomes idle
• SUSPEND jobs as soon as activity is detected
• PREEMPT jobs if the activity continues for 5 minutes or more
• KILL jobs if they take more than 5 minutes to preempt
• (A configuration sketch of this policy appears after the examples below.)

Policies
• Policies are nearly infinitely customizable!
  – If you can describe it, you can make Condor do it!
• A couple of examples follow

Custom Machine Attributes
• Can add attributes to a machine’s ClassAd, typically done in the local config file
HAS_MATLAB    = TRUE
NETWORK_SPEED = 1000
MATLAB_PATH   = "c:\matlab\bin\matlab.exe"
STARTD_EXPRS  = HAS_MATLAB, MATLAB_PATH, NETWORK_SPEED

Custom Machine Attributes
• Jobs can now specify Rank and Requirements using the new attributes:
Requirements = (HAS_MATLAB =?= UNDEFINED || HAS_MATLAB == TRUE)
Rank = NETWORK_SPEED =!= UNDEFINED && NETWORK_SPEED

START policies
• Time-of-day policy
WorkHours  = ( (ClockMin >= 480 && ClockMin < 1020) && \
               (ClockDay > 0 && ClockDay < 6) )
AfterHours = ( (ClockMin < 480 || ClockMin >= 1020) || \
               (ClockDay == 0 || ClockDay == 6) )
# Only start jobs after hours.
START = $(AfterHours) && $(CPUIdle) && KeyboardIdle > $(StartIdleTime)
# Consider the machine busy during work hours,
# or if the keyboard or CPU are busy.
MachineBusy = ( $(WorkHours) || $(CPUBusy) || $(KeyboardBusy) )

START policies
• Policy to keep your network from saturating with off-campus jobs
SmallRemoteJob = ( DiskUsage <= 30000 && \
                   FileSystemDomain != "my.filesystem.domain" )
# Only start jobs that don't bring along
# huge amounts of data from off-campus.
START = $(SmallRemoteJob) && $(START)
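• And, as promised above, a sketch of the desktop policy described under “Desktops should”: idle for 5 minutes to start, suspend on any activity, preempt after 5 more minutes of activity, kill if vacating takes longer than 5 minutes. This is illustrative rather than BoilerGrid’s actual configuration: the macro names are invented, while KeyboardIdle, LoadAvg, CondorLoadAvg, Activity, and EnteredCurrentActivity are standard machine attributes.
# Desktop-idle policy sketch (macro names invented for this example)
NonCondorLoad = (LoadAvg - CondorLoadAvg)
MachineIdle   = (KeyboardIdle > 300) && ($(NonCondorLoad) < 0.3)
ActivityNow   = (KeyboardIdle < 300) || ($(NonCondorLoad) >= 0.3)

START    = $(MachineIdle)
SUSPEND  = $(ActivityNow)
CONTINUE = $(MachineIdle)
# Preempt a suspended job once activity has persisted for 5 minutes
PREEMPT  = (Activity == "Suspended") && \
           ((CurrentTime - EnteredCurrentActivity) > 300)
# Kill a job that has not finished vacating 5 minutes after preemption
KILL     = (CurrentTime - EnteredCurrentActivity) > 300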
Security

Host/IP Address Security
• The basic security model in Condor
  – Stronger security is available (encrypted communications, cryptographic authentication)
• Can configure each machine in your pool to allow or deny certain actions from different groups of machines

Advanced Security Features
• AUTHENTICATION – establishes who is allowed
• ENCRYPTION – private communications; requires AUTHENTICATION
• INTEGRITY – checksums guard against tampering

Security Features
• Features are individually set as REQUIRED, PREFERRED, OPTIONAL, or NEVER
• Can set a default, and also set each access level (READ, WRITE, etc.) individually
• All default to OPTIONAL
• Leave NEGOTIATION at OPTIONAL

Authentication Complexity
• Authentication comes at a price: complexity
• Authentication between machines requires an authentication system
• Condor supports several existing authentication systems
  – We don’t want to create yet another one

AUTHENTICATION_METHODS
• Authentication requires one or more methods:
  – FS
  – FS_REMOTE
  – GSI
  – Kerberos
  – NTSSPI
  – CLAIMTOBE

Networking

Networking
• Every submit node and every potential execute node must be able to communicate with each other
  – Full bidirectional communication
• Firewalls are a problem
  – We can deal with that; see the next slide
• NAT is more of an issue…

Networking
• Firewalls
  – Port 9618 needs to be open to your central manager, from all of your execute machines
  – Define a range for dynamic ports:
HIGHPORT = 50500
LOWPORT  = 50000
  – And open the corresponding ports in the firewall
  – Condor can install its own exception in the Windows firewall configuration

Virtualization

Condor’s VM Universe
[Diagram: the schedd on the submit machine hands the job to the startd on the execute machine, which runs it inside a virtual machine.]

Condor’s VM Universe
• Rather than submit a program into potentially unknown execution environments, why not submit the environment?
• The VM image is the job
• Job output is the modified VM image
• VMware and Xen are supported
  – (A hedged submit-file sketch appears at the end of this section.)

Virtual Condor Appliance
• Engineering is Purdue’s largest non-central IT organization
  – 4,000 machines
  – Already a BoilerGrid partner, providing nearly 1,000 cores of Linux cluster nodes to BoilerGrid
• But what about desktops? What about Windows?
  – Engineering is interested...

But…
• Engineering leadership wants the ability to sandbox Condor away from systems holding research or business data.
• Can we do this?

Virtual Condor Appliance
• Sure!
• Distribute virtual machine images running a standard OS and Condor configuration
  – CentOS 5.2
  – Virtual private peer-to-peer networking
  – Encryption, authentication

Virtual Condor Appliance
• For us and our partners on campus, this is a win
  – Machine owners get their sandbox
  – Our support load to bring new machine owners online gets lighter
  – Execution environments become consistent
• Much of the support load with new “sites” is firewall and Condor permissions.
  – Virtual machines and the virtual “IPOP” network make that all go away.
• Not only native installers for campus users, but now a VM image
  – With an installer to run virtual nodes as a service
  – Systray app to forward keyboard/mouse events to the virtual guests
• Not dependent on any one virtualization implementation – we can prepare and distribute VM images for KVM, VirtualBox, VMware, Xen, and so on.
  – Just VMware currently
  – We’ll offer more in the future.
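As mentioned on the VM Universe slides, a VM-universe submit file describes the image rather than an executable. Below is a rough sketch for a VMware image under Condor 7.x; the directory path is invented for this example, and the exact keywords vary by Condor release, so treat this as an outline and check the manual for your version.

# vm_appliance.submit -- hedged sketch of a VM universe job
Universe   = vm
# In the vm universe the executable is just a name for the job
Executable = appliance_vm
Vm_type    = vmware
Vm_memory  = 512
Vmware_dir = /path/to/appliance-image
Vmware_should_transfer_files = YES
Log        = vm_appliance.log
Queue

When the job completes, Condor returns the modified disk image as the job’s output, which matches the “the VM image is the job” model above.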
Condor Week 2009

Whew!!!

I could also talk lots about…
• GCB: living with firewalls & private networks
• Federated grids/clusters
• APIs and portals
• MW
• Database support (Quill)
• High-availability fail-over
• Compute On-Demand (COD)
• Dynamic pool creation (“glide-in”)
• Role-based prioritization and accounting
• Strong security, including privilege separation
• Data movement scheduling in workflows
• …

Conclusion
• Campus grids are effective ways of bringing high-performance computing to campus
  – Using the institution’s existing investment in computing
• The Condor software is an excellent framework for implementing a campus grid
  – Flexible
  – Powerful
  – Minimal extra work for lab administrators!
    • Just one more package in your image
• Virtualization with Condor
  – Improves security of machine owners’ systems
  – Improves grid manageability
  – Consistency

The End
Questions?
Interested in a campus grid at your institution? Want to join DiaGrid?
http://www.rcac.purdue.edu/boilergrid