Scale to the Sky: Adding Cloud Processing to Autodesk® Add-Ins
Mike King – Odeh Engineers, Inc.

CP5132

If you are building Autodesk add-ins and running up against performance problems, is it time to leave the local machine behind? What if you could, with a quick change, be running on a system with 68 GB of RAM? How about an 8-core system with 4.25 GHz each? How about several of these systems connected by high-speed networks? Or maybe you don’t need a lot of power; you just don’t want to tie up the local system, so you use a little machine that is available on demand to offload your long-running operations. The flexibility of cloud computing allows you to pay for only what you need when you need it, and you will learn how to leverage it with your Autodesk extensions. We will be using Amazon EC2 in this class, but the principles are applicable to other frameworks. Autodesk Revit® will be used in this class, but any .NET-based add-in, including those for AutoCAD®, works the same way.

Learning Objectives

At the end of this class, you will be able to:
- Dramatically improve performance of computationally or memory-bound operations
- Recognize good candidate applications for cloud computing
- Extract and serialize data for use by your cloud-based processing
- Set up cloud computing resources and integrate creation and management into your applications so they are available on demand

About the Speaker

Mike King is a software engineer proficient in Microsoft® .NET technology. He has a background in civil/structural engineering and in-depth experience in .NET development of all types. He has worked to customize AutoCAD® from Visual LISP® and VBA to ObjectARX® and Microsoft .NET, and has significant experience with the Revit® Structure API. At Odeh Engineers, Inc., he has developed both line-of-business applications, such as a corporate intranet that incorporates CMS, CRM, and PM functionality, and specific-needs applications such as custom tablet PC tools for documenting field conditions.
mike.king@odehengineers.com

* Updated versions will be posted leading up to AU 2011

Introduction

Cloud computing is a buzzword right now. Unfortunately, that means it’s tossed around a lot and the meaning is diluted and stretched beyond recognition. When we talk about cloud computing in this class, we mean computation as a service. We’re talking about virtual computers complete with operating systems, remote access, installed software and virtualized hardware. Hardware in this environment is abstracted: specs are provided, but not manufacturers. The idea is that the service provider can deliver your specs in any manner they deem cost efficient, as long as your virtual machine performs as promised.

Cloud computing providers run vast data centers and purchase commodity hardware on scales very few companies could dream of. The economies of scale involved often allow them to deliver computing resources at remarkably low cost, with flexibility undreamed of with physical hardware allocations and with virtually no maintenance. Taking advantage of this type of computing in your software applications requires a mind shift, careful planning, evaluation and measurement, but the rewards can be remarkable.

While cloud computing is often associated with startups because of the flexibility and low up-front cost it allows, many large and established companies also take advantage of it. Even Autodesk itself has been taking advantage of cloud computing.
Autodesk Cloud Computing Tools

- 123D Catch Beta (formerly Project Photofly): Digital photos to 3D models
- Project Neon: Cloud-based rendering
- Autodesk Cloud: Software delivered from the cloud on a subscription model
- Project Twitch: Streamed software delivery
- AutoCAD WS: Browser-based AutoCAD
- Project Storm: Cloud-based structural analysis

Benefits

- Do things you simply couldn’t before
- Dramatically improve performance for certain types of applications
- Process lots of data without tying up local resources
- Reduce hardware acquisition and maintenance costs
- Share computing resources easily across locations
- Improve flexibility of server resources
- Alleviate IT burden of planning

Major Providers

- Amazon Elastic Compute Cloud (EC2) - this class
- Microsoft Azure
- Google App Engine (app based, not virtual machines)

Amazon Elastic Compute Cloud (EC2)

EC2 is a web service, or a set of web services, provided by Amazon. More than that, it’s part of a suite of tools called Amazon Web Services (AWS) that is intended to make cloud computing accessible to all developers. Behind the scenes, EC2 is a vast infrastructure of data centers, hardware, personnel, expertise and software applications designed to keep your computing and data resources safe and reliable. You can, for a few dollars, achieve a level of dependability and performance that would otherwise be impractical for all but the largest companies.

Signing Up

You need an Amazon account, and then you need to activate AWS for your account. Once you do that, you’ll receive an Access Key and a Secret Key (more on that later). You can sign up here: http://aws.amazon.com/

AWS is fee based, but you don’t pay for anything you don’t use, and you won’t be on the hook for more than a couple of dollars per month for development and light use. There is a free usage tier, but it does not allow Windows instances (only Linux / UNIX) and has other restrictions.
Complete and up-to-date information is available on their website.

Understanding Amazon Machine Images (AMI)

An AMI is essentially a disk image. For those familiar with VMware, it is similar to a VMware virtual disk. In fact, you can import your VMware virtual disks to create Amazon Machine Images. An AMI also includes some configuration information and, optionally, tags you can use to track your resources. Once created, an AMI can be instantiated any number of times on any type of compatible hardware. Compatible basically means that you can’t run a 64-bit OS on a 32-bit instance (obviously).

Available “base” AMIs from Amazon:
- Basic 32/64 bit “Amazon Linux” (opt. Cluster Instance)
- SUSE Linux Enterprise Server 11 32/64 bit (opt. Cluster Instance)
- Red Hat Enterprise Linux 6.1 32/64 bit
- Microsoft Windows Server 2008 32/64 bit (opt. Cluster Instance)
  o SQL Server 2008 Express & IIS
  o SQL Server 2008 R2 Standard
- Many more from the community

To create your own AMI:
1. Start with an existing AMI
2. Launch it in an instance
3. Connect (SSH or Remote Desktop)
4. Install software, configure, customize
5. Create a new AMI with the current state
6. Repeat!

Keep in mind when building your AMIs that they will be far more scalable if they don’t maintain local state. Rather than saving things to the local machine, store them in a database or put the results in a central location where they can be picked up by the client. This allows you to start up more instances as needed to handle load, or to terminate unneeded instances at any time, without saving anything first, to reduce cost.

Instance Types

Again, make sure to check http://aws.amazon.com/ec2/#instance for up-to-date information on available instance types and their hourly costs. Detailed information current as of 11/14/2011 is available in the accompanying PowerPoint presentation; I’m not going to duplicate it here because I want you to go to the Amazon website and get accurate information!
Keep in mind that the hardware you are running on is abstracted away in nearly all cases. This sometimes makes the performance specs in instance descriptions a bit unfamiliar. Memory is in GB, which isn’t very unusual. Processing power, however, is measured in Compute Units.

1 Compute Unit ~ 1.0-1.2 GHz 2007 Opteron / 2007 Xeon ~ early 2006 1.7 GHz Xeon

IO performance is differentiated, but not specifically measured (Low, Moderate, High, Very High).

Instance Categories

- Standard
  o 1.7 – 15 GB RAM
  o 1 – 8 Compute Units
  o 160 – 1,690 GB local storage
  o Moderate – High IO performance
- High Memory
  o 17.1 – 68.4 GB RAM
  o 6.5 – 26 Compute Units
  o 420 – 1,690 GB local storage
  o Moderate – High IO performance
- High CPU
  o 1.7 – 7 GB RAM
  o 5 – 20 Compute Units
  o 350 – 1,690 GB local storage
  o Moderate – High IO performance
- Other
  o Micro: tiny instance designed for infrequent bursts of CPU
  o Cluster (compute/GPU): 22+ GB RAM, 33.5 Compute Units, 1,690 GB local storage, Very High IO performance

Miscellaneous Costs

- Data transfer
- Static IP
- Extra storage
- Monitoring
- Load balancing
- Scaling

Availability Zones / Regions

Regions are geographical areas with different pricing schemes. Choosing a region near your clients helps reduce latency and improve responsiveness. Each geographic region has multiple isolated availability zones to improve reliability. Certain resources need to be located in the same availability zone to interconnect. Availability zones within each region are designated by letters (e.g. A, B, C). The letter assignments are account specific: they are not consistent between accounts and don’t represent any specific building.

Security Groups

Each running instance is protected by a security group. This group is assigned at instance launch, but its rules can be edited at any time. It is essentially a white-list filter of the source IP addresses and port ranges allowed for inbound connections.
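The white-list behavior of a security group can be pictured with a short sketch. This is not the EC2 API; it is a local Python model of the idea, and the InboundRule type, the is_allowed function and the example addresses are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class InboundRule:
    """One hypothetical white-list entry: a source CIDR block and a port range."""
    source_cidr: str
    from_port: int
    to_port: int


def is_allowed(rules, source_ip, port):
    """Return True if any rule permits this source address and port.

    Anything that no rule matches is rejected, which is what makes the
    filter a white list rather than a black list.
    """
    addr = ip_address(source_ip)
    return any(
        addr in ip_network(rule.source_cidr)
        and rule.from_port <= port <= rule.to_port
        for rule in rules
    )


# Example group (assumed values): Remote Desktop from one office network,
# HTTP from anywhere.
rules = [
    InboundRule("203.0.113.0/24", 3389, 3389),
    InboundRule("0.0.0.0/0", 80, 80),
]

print(is_allowed(rules, "203.0.113.7", 3389))   # office machine, RDP
print(is_allowed(rules, "198.51.100.1", 3389))  # outside address, RDP
```

In this model an empty rule list rejects every inbound connection, so access has to be opened deliberately, one source range and port range at a time.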
Elastic IPs

Fixed IP addresses are available at no cost, as long as they are in use. They can be swapped between running instances at any time, which leads to some interesting reliability and scaling options. They are also useful if your client application needs to directly locate or connect to a cloud instance.

Load Balancing

Optional Auto Scaling and CloudWatch products from Amazon make load balancing and instance scaling effortless for well-architected applications. Granular performance monitoring can help detect bottlenecks and support intelligent capacity planning.

Elastic Block Storage (EBS) Volumes

EC2 instances are not stateful: all data is lost when the instance terminates (unless you create a new AMI from it). For this reason, any persistent data needs to be stored off-instance. There are many options for this, but an appealing one is an EBS volume. These are persistent storage volumes allocated within an availability zone that can be attached to zero or one running instance in that same availability zone. They provide approximately local storage speed, and snapshots can easily be used to track history and provide backups. Attaching an EBS volume at startup is an easy way to allow local persistent storage for your instances without keeping them running all the time.

Other Related Tools

- Amazon Simple Storage Service (S3): Highly reliable, secure storage with URL-accessible objects
- Amazon Virtual Private Cloud (VPC): Provides LAN-like connectivity between multiple running instances
- Amazon Simple Notification Service (SNS): Notification service supporting a variety of protocols (email, HTTP, SMS, etc.)
- AWS Direct Connect: Dedicated private network connection between your office and your instances
- Amazon SimpleDB: Non-relational database in the cloud
- Relational Database Service (RDS): Relational database in the cloud
- Many more!
Choosing Candidate Applications

Criteria

- Memory or computationally bound operations (performance gain)
- Long running operations (free local resources)
- Serializable data (must be able to transfer data from local machine to cloud and back)
- Not time sensitive (not required, but can allow cheap batch processing)
- Strong separation of concerns (software architecture, UI, logic and data separation)
- Input / output distributed (started on one machine and monitored from others)
- Durability requirements (absolutely positively must complete successfully!)

If you’re looking to improve performance, you need to figure out where your existing bottlenecks are. Is your application CPU bound or memory bound, or is it just slow because of network latency or lots of disk operations? You need to understand what is happening in your own application before you can make an informed decision about integrating cloud computing to help boost performance. There is definitely an architecture and complexity cost to adding this type of processing, so be sure that it’s justified. Don’t underestimate the power of Windows Task Manager, but you should also try professional profiling tools like those provided by Red Gate or EQATEC. Try Windows Performance Monitor (PerfMon), or just use integrated Stopwatch objects and logging in your .NET applications to zoom in on problem areas.

In order to take advantage of any kind of distributed architecture, you need to have serializable data. That means you need to be able to take an in-memory .NET object and reduce it to a series of bytes that can be sent between processes, or over a network connection to another process on another machine in another building in another part of the country (or world). That object needs to be reconstructed by the program on that other machine and processed by code there; then the result needs to be similarly deconstructed, transmitted, and reconstructed back on the client.
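The round trip described above can be sketched in a few lines. This is an illustration in Python rather than .NET, and the BeamJob record and its fields are hypothetical stand-ins for whatever plain data you extract from your model. A .NET add-in would use one of the .NET serializers instead, but the shape of the problem is the same.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class BeamJob:
    """Hypothetical plain-data record extracted from a model element."""
    element_id: int
    span_m: float
    loads_kn: list


def to_bytes(job):
    """Client side: reduce the in-memory object to bytes for transmission."""
    return json.dumps(asdict(job)).encode("utf-8")


def from_bytes(payload):
    """Receiving side: rebuild an equivalent object from the received bytes."""
    return BeamJob(**json.loads(payload.decode("utf-8")))


job = BeamJob(element_id=1042, span_m=7.5, loads_kn=[12.0, 18.5])
payload = to_bytes(job)         # what actually travels over the network
restored = from_bytes(payload)  # what the remote process works with

print(len(payload), "bytes on the wire")
print(restored == job)
```

The record holds only numbers and lists, with no references back into the host application. Live API objects generally cannot be shipped to another machine, so extracting the data you need into plain records like this is usually the first step.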
For cloud computing, your data transfer is going to be at WAN speeds, which are usually significantly lower than LAN speeds, so make sure your data transfer time isn’t going to kill any performance gains you get from the split.
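A rough break-even check makes this concrete. The sketch below is a back-of-the-envelope estimate, not a measurement: the function names, payload sizes, link speeds and run times are all assumed values for illustration, and it ignores latency, protocol overhead and instance startup time.

```python
def transfer_seconds(payload_bytes, link_mbps):
    """Time to move a payload over a link rated in megabits per second."""
    return payload_bytes * 8 / (link_mbps * 1_000_000)


def cloud_total_seconds(cloud_compute_s, upload_bytes, download_bytes,
                        up_mbps, down_mbps):
    """Upload time + remote compute time + download time."""
    return (transfer_seconds(upload_bytes, up_mbps)
            + cloud_compute_s
            + transfer_seconds(download_bytes, down_mbps))


# Assumed scenario: 50 MB of input over a 5 Mbps uplink, 5 MB of results
# over a 20 Mbps downlink, and a job that takes 60 s on the cloud instance.
total = cloud_total_seconds(
    cloud_compute_s=60,
    upload_bytes=50_000_000,
    download_bytes=5_000_000,
    up_mbps=5,
    down_mbps=20,
)

for local_s in (600, 100):
    verdict = "worth offloading" if total < local_s else "keep it local"
    print(f"local {local_s} s vs cloud {total:.0f} s: {verdict}")
```

With these assumed numbers, the upload alone takes longer than the remote computation, so the job only pays off when the local run time is several minutes. Measuring your real payload size and uplink speed is the quickest way to find out which side of the line your application falls on.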