What Makes the Early Birds Get the Worm? Industrial Drivers/Solutions for Grid/Condor
Jason Stowe

Why HTC?
• What are “Enterprise’s” requirements for Grid?
• Based upon first-hand experience

What is an ‘Enterprise’?
• What motivates them?
• Includes Research/Gov’t/Academic environments?
• What about Money? Doesn’t industry spend more on large computations?
  – Remember our community
  – Then just look at Top 10 or 50 of the largest computing sites…
  – Many/most are not at companies

So, it's *Any* Organization Using Condor with…
• Demanding Users
  – Organization => Groups of Demanding Users
• Purchased Computer Capacity
• Need Computation Done. Easily. On a Deadline.
• In-House, Third Party Applications without modification in order to do their jobs

What drives Enterprise Condor Users?
• Cost? Important, certainly
  – Condor works well here, more $$$ for Hardware
• Reliability
  – Uptime, Fault Tolerance, Disaster Recovery
  – Condor provides High Availability, Fault-tolerant Design
  – 20+ Years old, so it is very stable
• Users have varying levels of tolerance for uptime
  – Depends upon the application
  – Commercial endeavors have tighter Service Level Agreements
• What else about SLAs?

SLAs = Latency/Throughput
• Some Applications Require High-throughput
  – Many longer jobs, none lost, even with failures
  – Example: Movies, lots of frames, don't miss any
• Some Require low-latency
  – Many short jobs, need fast response
  – Missing some is fine, we'll decide about retries
  – Example: Trader pricings, fast turn around times
• Throughput and latency are related
  – Overall utilization ~ scheduling latency as % of compute time
  – => Decreasing latency improves utilization (overall goodput)
• Critical aspect: Missing Job Tolerance

The Bottom Line => Productivity
• Employee productivity
  – Easy integration of custom and 3rd party applications
  – Condor does well at these
  – Cycle Computing Training/Consultation on Condor
  – Developers want APIs to submit work: SOAP/DRMAA
• Not only people productivity, but also resource utilization
  – Budgets = looking at how much resources are 'used'
  – Cycle-stealing, fair-share become drivers
  – Corporations do share internally
  – Condor fits perfectly into this scenario
• On-demand Computing is appearing as a driver
  – CycleCloud™ On-demand Condor pools in Beta (contact me)

Competitive Advantage: Computation Options
• Be able to have any competitive advantage they can: Hiring and Computing
• Be able to use any compute technology available
• Want Computing Options for Advantage: Windows/Linux/x86/PS3/etc.
• Avoid “Vendor Lock-in”
  – Because that causes spending lots of

Competitive Advantage: Scale
• Entities' computing usage constantly increases
  – 10 slots, become 100, become several hundred
  – Thanks to Moore’s Law
• With numerous sites having X000s of slots
• Condor can grow with the installation

Competitive Advantage => Scheduling
• Project/User Priorities
• AccountingGroups enable priority for departments/project/etc.
• Quotas for minimum/maximum capacity
• Need Flexible control of how resources are used
• Condor ClassAds and policies are the best at this

Competitive Advantage => Virtualization
• Checkpoint Any Job
• Security on Workstations
• Condor 6.9

Competitive Advantages => Easy Management
• Ability to manage configuration
• Analyze productivity across many resources
• Usage reporting and visualization
• Accountability
  – Audit changes, Authorization
• Monitoring of Machine/Condor
• Condor provides CmdLine Tools
• Community provides support

Grids generate Lots of Data But Need Productivity Solutions
• In Administration, Provisioning, and Analysis
• Need Analysis, Management, Training Solutions

Cycle Computing
• CycleServer™ Management
• Consulting and Advice
• Implementation
• Training
Cycle has Experience with Tens of Condor Pools, 5000+ machines, X0000 slots

Attempting to implement policies for sharing or scaling?
• Consulting for: Policy Creation, Best Practices
• Improvements in Various Environments:
  – 20x in Negotiation Performance
  – 10x in Scheduler Capacity

Need help creating software pipelines or pools?
• Implementation for: Pool Setup, Software Pipeline Development, Low-latency scheduling
• Several Hundred Servers up and running in 1.5 days

Want to bring Users/System Administrators up to speed on Condor?
• Training Classes for: Condor Administration Architecture Best Practices Submission Best Practices Command Lines Issue Diagnosis

Do you need coverage for Condor issues with variable SLAs?
• Condor/CycleServer Support:
  – 24/7 Phone (SLAs in the hours)
  – E-mail based (SLAs in days)

Need Management for Condor Pools to enable Analysis, Reporting, and Auditing?
• CycleServer™ Web-based GUI for Condor Management

CycleServer™ Overview
– Web and Command-line based management system for multiple Condor pools
– Built to run in Java Servlet Containers (e.g. Tomcat, WebLogic, etc.)
– Data persistence to Oracle or PostgreSQL
– Uses XSLT transforms for HTML presentation, so page layout/look can be configured

CycleServer™ Overview
• Configuration Management and Machine Movement
• Auditing for Config Changes and Condor Commands
• Job Status and Diagnostic Information
• Pool Administration w/o logging in to machines
• Authorization/Permissions for Actions on Grid
• Usage Monitoring, Capacity Planning, Reporting
• Easy Pool Status notifications, and viewing
• Easy shutdown of Machines for maintenance

Thank you NeSC. Demo. Questions?
http://www.cyclecomputing.com
jstowe @ cyclecomputing.com
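
Backup: a worked illustration of the relation stated on the SLAs = Latency/Throughput slide (utilization versus scheduling latency). This is a minimal sketch, not part of the original talk, under an assumed model: a slot sits idle for the scheduling latency before each job and is busy for the job's run time, so utilization = run time / (run time + latency). The function name, job lengths, and latencies are hypothetical, chosen only for illustration.

```python
def slot_utilization(run_seconds: float, latency_seconds: float) -> float:
    """Fraction of slot time spent on useful work (goodput).

    Assumed model: the slot idles for latency_seconds before every job,
    then runs the job for run_seconds.
    """
    if run_seconds <= 0 or latency_seconds < 0:
        raise ValueError("run time must be positive and latency non-negative")
    return run_seconds / (run_seconds + latency_seconds)


# Hypothetical workloads: (label, job run time in s, scheduling latency in s)
cases = [
    ("long job, slow scheduling", 3600.0, 60.0),
    ("short job, slow scheduling", 10.0, 60.0),
    ("short job, fast scheduling", 10.0, 1.0),
]

for label, run, latency in cases:
    print(f"{label}: {slot_utilization(run, latency):.1%} utilized")
```

Under this model, short jobs lose the largest share of slot time to scheduling delay, which is consistent with the slide's point that decreasing latency improves overall goodput.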