Making the Most of Infrastructure as a Service
E.J. Daly, CTO, Creme Global
2014-02-27

Agenda: Creme Global • Cloud Computing and IaaS • Migrate to the Cloud? • Scaling Resources • Business and Management

Creme Global
• EU FP5 Monte Carlo Project, 1999
• CREME Project, 2002-2005
• Creme Software Ltd formed in 2005
• By 2007: HQ in Trinity College (Lloyd Building); team of 4 (MSc HPC graduates)
Since then:
• Consistently listed among the fastest-growing technology companies in Ireland
• Deloitte "Fast 50": 2010: 14th; 2011: 9th; 2012: 20th; 2013: 35th
• "Organic" growth
Today:
• HQ: Trinity Technology & Enterprise Campus
• Team: 23 full-time staff: software engineers, quality assurance, maths modellers, statisticians, food scientists, nutritionists

What exactly does Creme Global do? Predictive intake modelling.
We give decision makers access to the right data, models and expertise in a form that they can understand. We build models and software to calculate consumer exposure to substances (chemicals, flavorings, fragrances, contaminants) present in food, cosmetics, packaging and the environment. These analyses enable decision makers to set regulatory limits based on real consumer exposure.
Creme Global - Services
• Cloud Software & Projects
• High Performance Technical Services
• Data Validation & Curation

Value Chain
• Primary data generation (research, labs, innovation) produces complex data in large volumes
• Analysis turns that data into information (scenarios, risk): accurate and trusted results
• Decisions (policy, regulation, investment) are made better, and with confidence

Creme Global - Benefits: proactively protecting consumer health • understanding exposure assessment • better decisions

Limitations of Traditional Methods
• Large investments in collecting data have been made
• Data sets are reduced to a few basic statistics to make exposure estimates for regulatory purposes
• Exposure estimates are assumed to be conservative, but the level of conservatism is actually unknown
• Results are neither accurate nor realistic: exposure estimates can be wrong by an order of magnitude

Risk Analysis and the Flaw of Averages (Image: www.flawofaverages.com)

Creme Global Methods
Detailed product usage information + occurrence data + expert models (probabilistic and deterministic) → consumer exposure
• Scientifically validated models of consumer exposure and risk assessment, as called for by the FDA, EFSA, SCCS, USDA, FSA, etc.
• Use all the available real data in the exposure model; retain the relationships between intakes and key factors
• Aggregate exposure from multiple sources; assess substances from multiple products
• Cumulative exposure from multiple substances / chemicals simultaneously, from all sources; assess full formulations

Creme Global: Probabilistic Modelling
• In an ideal world, we would have access to complete exposure data for everyone:
  • How much do they consume? How often? Which products?
  • What is the exact chemical concentration in these products?
• This detailed data would enable a (relatively) straightforward calculation of population exposure
• In reality, data is only available for a relatively small proportion of the population
• The software developed by Creme enables estimation of the actual population exposure from this limited data

Creme Database Tables
• Subjects: subject demographics
• Consumption: products and foods consumed
• Brands: market shares, brand loyalties
• Groups: recipe and food groups info
• Correlations: information on correlated variables
• Endpoints: e.g. ARfD, ADI
• Processing: potential processing factors
• Substances: substance / chemical concentrations in products / foods

Creme Global: Probabilistic Modelling
• The software creates a large simulated population based on the observed data, using probabilistic (Monte Carlo) modelling
• The simulated population has the same usage patterns and habits as the real population, and is used to represent it
• Exposure statistics are calculated for the simulated population

Example: dermal exposure to fragrance compounds from a cosmetics product

Dermal Exposure (mg/cm²/day) = (F × A × C × R) / S
F = frequency of use (of the cosmetic product)
A = amount per use
C = chemical concentration
R = retention factor
S = skin surface area

These values are not available for everyone in the population, so we gather information for each parameter from available data collection sources (surveys / studies):
• Frequency of use: survey of 36,000 EU/US consumers (1.2 million recorded events)
• Amount per use: surveys of between 360 and 500 people
• Chemical concentration: fragrance and cosmetics manufacturers
• Retention factors: expert opinion
• Surface area: US EPA

Model Output: a frequency distribution of subjects' exposure (mg/kg/day) against a reference dose; daily average and maximum-day exposure (compared with the ARfD); daily average and lifetime exposure.

Application areas: FOOD, COSMETICS, CROP PROTECTION / PESTICIDES, PACKAGING, MICROBIAL, NANOTECH (GLOBAL)

Cloud Computing
• How is cloud computing different from everything else? (Armbrust et al., 2010)
  • The appearance of infinite computing resources
  • The elimination of an up-front commitment by cloud users
  • The ability to pay for use of computing resources as needed (for example, processors by the hour, storage by the day)
• Definition from NIST (National Institute of Standards and Technology): "Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction."
• More concisely: cloud computing = internet services, and pay for what you use (Image: www.jansipke.nl)

Using IaaS to provide SaaS (Image: A View of Cloud Computing, Armbrust et al., 2010)
Leading Providers of IaaS (Image: blog.appcore.com)
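Circling back to the probabilistic model from the previous section: the (F × A × C × R) / S calculation can be illustrated with a minimal Monte Carlo sketch. The distributions below are purely illustrative placeholders (the real model draws F, A, C, R and S from the survey data listed earlier), so the printed numbers show only the mechanics, not real exposure estimates.

```python
import random
import statistics

def simulate_dermal_exposure(n_subjects=100_000, seed=42):
    """Monte Carlo sketch of Dermal Exposure = (F * A * C * R) / S.

    Each simulated subject draws one value per parameter. The
    distributions are illustrative placeholders, not the
    survey-derived distributions used in the real model.
    """
    rng = random.Random(seed)
    exposures = []
    for _ in range(n_subjects):
        f = rng.lognormvariate(0.0, 0.5)   # frequency of use (uses/day), placeholder
        a = rng.lognormvariate(0.5, 0.4)   # amount per use, placeholder
        c = rng.uniform(0.001, 0.01)       # chemical concentration (fraction), placeholder
        r = rng.betavariate(2, 5)          # retention factor in (0, 1), placeholder
        s = rng.gauss(17_000, 1_500)       # skin surface area (cm^2), placeholder
        exposures.append(f * a * c * r / max(s, 1.0))
    return exposures

exposures = simulate_dermal_exposure()
mean = statistics.mean(exposures)
p95 = sorted(exposures)[int(0.95 * len(exposures))]
print(f"mean = {mean:.2e}, P95 = {p95:.2e} (units depend on the placeholder inputs)")
```

Because the simulated population retains the full distribution of each parameter, upper percentiles such as the P95 can be reported directly instead of multiplying worst-case point estimates, which is exactly the advantage over the "few basic statistics" approach criticised earlier.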
Agenda: Creme Global • Cloud Computing and IaaS • Migrate to the Cloud? • Scaling Resources • Business and Management

Benefits of IaaS
• High-quality, reliable, enterprise-grade infrastructure: servers, storage, networks
• Reduce waste and inefficiency
• Reduce cost: avoid large up-front capital expenditure and large in-house maintenance costs
• Rapid scaling to fit requirements
• Accessible from anywhere

IaaS Benefits: Scalability (Image: www.techtricksworld.com)
• Problem 1: wasted resources
• Problem 2: losing customers?

Negatives of IaaS
• Performance
  • Network (a concern for high-performance computing)
  • Disk (performance and throughput are not as good; a concern for database-heavy applications)
  • Reliability in terms of performance (performance is not always consistent)
  • Cloud performance is usually lower than dedicated hardware, although some providers offer higher performance at a premium ("pay per performance"), and you can sometimes work around poor performance (e.g. RAID arrays)
• Cost
  • More expensive for some applications / workflows, e.g. those with relatively constant load
  • "Pound for pound" more expensive than virtual private hosting or colocation: the added flexibility and scalability comes at a cost

Cloud Computing: Hype Cycle
Entering the "Trough of Disillusionment"?

Moving away from Cloud?
• Recent reports of migrations away from public cloud to in-house / private clouds: Zynga, HubSpot, MemSQL, Uber, Mixpanel, Tradesy
• Eric Frenkiel (MemSQL) estimates that, had the company stuck with Amazon, it would have spent about $900,000 over the next three years; with physical servers, the cost will be closer to $200,000 (wired.com report, Aug 2013)
• Cloud computing is still predicted to grow at a 36% compound annual rate through 2016 (451 Research)
• Cloud computing is not perfect for every business / application
• As IaaS matures, some early adopters may start to consider more sophisticated approaches such as hybrid cloud (in agreement with the Gartner hype cycle)

When is it a good idea to think about IaaS?
• Start-ups
  • Avoid up-front expenditure on hardware
  • Avoid having a dedicated sysadmin function to ensure uptime for clients
  • Flexible, agile, lean: easy to 'pivot'
• Elasticity
  • Not sure about predicted load / usage for the next 12-24 months?
  • Load on your servers is inherently variable: you expect the load to vary a lot for the foreseeable future
If you're not sure:
1) Try to do a cost calculation
2) Is there a difference in the level of service you will be able to offer?

Case Study: Creme Global
• 2006-2009: single HPC cluster (3 rack servers, 8 cores, 16 GB RAM), colocation hosting in Dublin; capacity for up to 2 concurrent assessments / jobs
• 2009: increasing client base, potential clients requesting trials; an evaluation of compute resource requirements was needed

Analysis of Compute Resources
Monitored assessment / job requests on the compute servers over a 4-month period:
• 0 jobs: 89.29% of the time
• 1+ jobs: 10.71% of the time
Problem 1: most of the time there is zero load on the compute servers; compute resources are in use only about 10% of the time (wasted compute resources).
A closer examination of the load when resources are in use (i.e. when clients are actually using the compute resources):
• 1-2 jobs: 75% of in-use time
• 3+ jobs: 25% of in-use time
Problem 2: the system is overloaded 25% of the time it is in use; when clients are using the system, their jobs have to queue a large proportion of the time.
Summary:
• Problem 1: the compute system is usually unused (~90% of the time): a waste of compute resources
• Problem 2: when in use, the system is regularly overloaded (~25% of the time): an unsatisfactory service being offered to customers
This is a typical scenario.
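The "try to do a cost calculation" advice can be made concrete with a rough utilisation-based comparison. The sketch below contrasts always-on dedicated capacity with pay-per-use cloud servers at the ~10.7% utilisation measured in the case study; all prices are invented placeholders, not real quotes from any provider.

```python
# Rough cost comparison: always-on dedicated capacity vs pay-per-use cloud,
# at the ~10.7% utilisation measured in the case study above.
# All prices are illustrative placeholders.

HOURS_PER_MONTH = 730

def dedicated_monthly_cost(servers, cost_per_server=400.0):
    # Colocation / owned hardware: you pay whether the servers are busy or idle.
    return servers * cost_per_server

def cloud_monthly_cost(servers, utilisation, price_per_hour=0.50):
    # IaaS on demand: pay only for the hours the servers actually run.
    return servers * utilisation * HOURS_PER_MONTH * price_per_hour

servers = 3
utilisation = 0.107   # compute resources in use ~10.7% of the time

dedicated = dedicated_monthly_cost(servers)
cloud = cloud_monthly_cost(servers, utilisation)
print(f"dedicated: ${dedicated:.0f}/month, on-demand cloud: ${cloud:.0f}/month")
```

The crossover point matters: at low utilisation the pay-per-use model wins easily, while a workload with near-constant load (utilisation approaching 1.0) can make the dedicated option cheaper, which is exactly the "not perfect for every business" caveat above.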
Agenda: Creme Global • Cloud Computing and IaaS • Migrate to the Cloud? • Scaling Resources • Business and Management

Elastic Scaling of Resources in the Cloud
• This is one of the biggest benefits of using IaaS (and the cloud is generally more expensive because of this benefit)
• Manual or automated?

Scaling Manually
• Initially, you probably won't have a scaling strategy; manually monitoring and scaling usage will provide the data you need to move to automation
• An incomplete or poorly designed automation strategy can end up costing more, or providing a worse service
• Suits the dev / test phase of an application (let developers scale up / down manually)
• Suits requirements that change relatively infrequently, or that can be predicted well
• Scale up (down) as you add (remove) a product or client; this needs good alignment with business development and strategy
• Scaling manually on the cloud is quite similar to virtual private hosting:
  • VPS are usually lower cost than on-demand cloud servers
  • You can get VPS-style hosting from cloud providers and migrate to on-demand when needed
  • Reserved instances (pay some cost up front to lower the overall cost)
(Image: 8kmiles.com)
• Scale up when you need to (e.g. a new contract); scale down when demand falls (e.g. the end of a contract)

Measuring Performance
• Even a manually scaled system will need metrics to measure performance. Examples:
  • Server load: CPU utilization, disk read / write, network I/O, memory usage, disk usage
  • Availability: uptime (%)
  • Response time: database queries, server-side processing, content distribution
  • Queue length: batch processes waiting to start

Scaling Automatically
• Needed when demand on the system changes too rapidly to manage manually (Image: 8kmiles.com)
• AutoScaling: ready-made vs build your own

AutoScaling: Ready-Made Solutions
• IaaS providers offer built-in scaling solutions, and some third-party providers and consultants (e.g. RightScale, 8kmiles.com) can help build a solution for you
• Pros: can be set up quite quickly and relatively cheaply; no need to spend a lot of time and resources on R&D
• Cons: may be limited in scope, and may not be a perfect fit for your scaling requirements

AWS provides built-in autoscaling functionality for your EC2 instances (Image: aws.amazon.com). An autoscaling policy combines:
1) Metrics: "Should we make a change?" Built-in metrics include CPU utilization (%), disk reads (bytes), disk read operations, disk writes (bytes), disk write operations, network in (bytes) and network out (bytes); each can be aggregated as average, min, max, sum or sample count, and compared (<, ≤, >, ≥) against a threshold over a period (1 minute, 5 minutes, 15 minutes, 1 hour, 6 hours).
2) Scaling rules: "What change to make?"

Other Metrics Possible
• EBS (Elastic Block Storage): read / write bytes, read / write operations, idle time, queue length (operations waiting to be completed)
• SQS (Simple Queue Service): number of messages sent / received, number of messages in the queue
• Custom metrics can be defined by the user

Configuring the scaling rules:
• Choose the range of cluster sizes
• Rule for scale up: add a number of instances, or increase size by a certain percent
• Rule for scale down

AutoScaling: Build Your Own Solution
• Full control and customization over the scaling algorithms
• Case studies: Netflix, Creme Global

Custom AutoScaling: Netflix
• Over a 5-day window of request traffic, a general pattern emerges over time, plus noise (unpredictable / random deviations)
• Scaling strategy: 1) predict the general pattern, 2) react to the randomness
• Metric = Requests per Second / Throughput
• A Fast Fourier Transform approximates the observed data as a combination of sine waves, so a scaling plan is ready before demand changes
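The FFT idea above can be sketched in a few lines: approximate the observed demand as a sum of sine waves, keep the strongest frequency components, and extrapolate them past the end of the data. This is an illustration of the general technique, not Netflix's actual implementation; a plain O(n²) discrete Fourier transform is used to keep it dependency-free.

```python
import cmath
import math

def predict_from_fft(history, horizon, n_components=3):
    """Sketch of FFT-based demand prediction: fit the strongest
    sine-wave components of the observed series and extrapolate them."""
    n = len(history)
    mean = sum(history) / n
    centred = [x - mean for x in history]
    # Discrete Fourier transform (O(n^2); fine for a sketch).
    spectrum = [
        sum(centred[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2)
    ]
    # Keep the strongest components (k = 0 is already removed as the mean).
    strongest = sorted(range(1, n // 2), key=lambda k: -abs(spectrum[k]))[:n_components]

    def value_at(t):
        total = mean
        for k in strongest:
            amp = 2 * abs(spectrum[k]) / n       # amplitude of component k
            phase = cmath.phase(spectrum[k])     # phase of component k
            total += amp * math.cos(2 * math.pi * k * t / n + phase)
        return total

    return [value_at(t) for t in range(n, n + horizon)]

# Synthetic daily cycle: 5 "days" of hourly observations, then predict day 6.
history = [100 + 50 * math.sin(2 * math.pi * h / 24) for h in range(5 * 24)]
forecast = predict_from_fft(history, horizon=24, n_components=3)
```

With a clean periodic input like this, the forecast reproduces the daily cycle almost exactly; real traffic would leave a residual ("noise"), which is what the reactive autoscaler handles.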
• Random spikes (deviations from the prediction) are handled reactively using Amazon AutoScaling

Custom AutoScaling: Creme Global
• Job requests are very unpredictable: 0 jobs 89.29% of the time; 1-2 jobs 8.16%; 3+ jobs 2.59%
• Job sizes are variable, from 1 hour to 10+ hours. Run times (jobs of 1+ hour only): 1-2 hr: 46%; 2-3 hr: 15%; 3-4 hr: 8%; 4-5 hr: 9%; 5-6 hr: 3%; 6-7 hr: 3%; 7-8 hr: 2%; 8-9 hr: 2%; 9-10 hr: 1%; 10+ hr: 10%
• Jobs are very much larger than typical requests to a web service, and if resources are low a job cannot start (compare to standard web services)
• Each job requires dedicated resources to run: typically 1-2 jobs per server, and each server is usually close to 100% CPU while processing jobs, so measuring individual server load is not a good metric for scaling
• Jobs are very variable in size, so the length of the job queue alone is not a very good metric for scaling either
• A custom scaling approach was required (http://www.google.com/patents/US20110138055)

Custom AutoScaling: Creme Global
• We devised a more relevant metric to measure the performance of the system: "How long will it take for the last job in the queue to start?"
• How to calculate this metric?
  1. Estimate the time required for each job to complete (running or queued)
  2. Simulate the processing of each job through the queue, in order
  3. Calculate the time that will have to pass before the last job begins to process
1) Estimate the time required for each job to complete (running or queued)
• For jobs that are already running, the application can estimate the percentage of the job completed so far, giving:
  Total time required = (time required so far) / (fraction complete)
  Time remaining = total time required - time required so far
• For jobs that have yet to start, estimate the complexity of the job from input factors, including the number of Monte Carlo iterations requested, the size and complexity of the data sets involved, and the computational complexity of the mathematical model

2) Simulate the processing of each job through the queue, in order
Worked example (three server slots; times are estimated minutes remaining):
• T = 0: running jobs have 32, 7 and 47 min remaining; the queue holds jobs estimated at 5, 30, 62 and 14 min
• T = 7: the 7-min job finishes; the 5-min job starts (running: 25, 5, 40)
• T = 12: the 5-min job finishes; the 30-min job starts (running: 20, 30, 35)
• T = 32: the next job finishes; the 62-min job starts (running: 62, 10, 15)
• T = 42: another job finishes; the last job (14 min) is now ready to start

3) Calculate the time that will have to pass before the last job begins to process
• In the example above, Queue Length (the performance metric) = 42 min

Custom AutoScaling: Creme Global
• Scale-up rules: queue length > 10 min, and no instance already pending
• Scale-down rules: an instance is idle (no running job), the queue is empty, and there are less than 5 min to the next billing hour

Custom AutoScaling: Creme Global
Job queue times (14,623 jobs, 2013-14):
0-2 min: 74%; 2-4 min: 12%; 4-6 min: 5%; 6-8 min: 6%; 8-10 min: 2%; 10-12 min: 1%; 12+ min: 1%
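The queue walkthrough above can be expressed as a short simulation. The function below is a simplified reading of the approach (not the patented implementation): it replays the worked example with three busy slots and four queued jobs, recovers the 42-minute metric, and applies the "queue length > 10 min" scale-up rule.

```python
def time_until_last_job_starts(running, queued, slots):
    """Simulate the queue to find how long until the LAST queued job
    begins processing (the Creme Global queue-length metric, sketched).

    running: estimated minutes remaining for each job currently running
    queued:  estimated run time (minutes) of each waiting job, in queue order
    slots:   number of jobs the cluster can run concurrently
    """
    active = sorted(running)      # minutes remaining per busy slot
    waiting = list(queued)
    clock = 0.0
    while waiting:
        while len(active) < slots and waiting:
            # A slot is free: the next queued job starts now.
            active.append(waiting.pop(0))
            if not waiting:
                return clock      # the last job just started
        # Advance the clock to the next job completion.
        done = min(active)
        clock += done
        active = [t - done for t in active if t - done > 1e-9]
    return clock

# Worked example from the slides: three running jobs, four queued jobs.
running = [32, 7, 47]             # minutes remaining
queued = [5, 30, 62, 14]          # estimated run times, in queue order
metric = time_until_last_job_starts(running, queued, slots=3)

# Scale-up rule from the slides: add capacity if the last job would
# wait more than 10 minutes (assuming no instance is already pending).
scale_up = metric > 10
```

Note how the metric folds both queue length and job size into one number: a long queue of tiny jobs and a short queue of huge jobs both map onto "minutes until the last job starts", which is what the customer actually experiences.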
Agenda: Creme Global • Cloud Computing and IaaS • Scaling Resources • Migrate to the Cloud? • Business and Management

Business and Management Considerations
• Changing roles (software development)
• Data security
• Monitoring costs
• Further cost-saving strategies

Changing Role of Software Developers
• Hardware provisioning is now the responsibility of software development; spinning instances and volumes up and down is part of the day-to-day for developers
• What happens when things get busy? Think about what happens to your desk, your desktop, your inbox...
• Risks:
  • Test / development instances left running
  • Volumes and snapshots without labels
  • It is easy to keep backups "just in case", and they build up over time if not managed
  • Billing is far less transparent (compared to conventional hardware purchasing and budgeting)
  • Easy to scale up, and easy to make a mess: a fixed-resource system self-regulates due to its inherent limits, but a cloud system does not have these limits
• Benefits:
  • Very empowering for some software architects, who can design, build and test the hardware configurations that will support their applications
  • Complements agile and lean development practices
  • Software developers can acquire new skills (e.g. systems engineering, IS management)
  • Streamlines design, development, QA / test, release and support
  • A number of roles merge (software developer, software architect, systems engineer, ...): the result is "DevOps"

Data Protection (Image: aws.amazon.com)
• Data protection is vital to the reputation of IaaS providers: very high standards and auditing processes are in place, and your IaaS provider should be able to grant you access to their auditing reports and security whitepapers
• Employ best practice within your own organization: server upgrades / patches, application security, data encryption (in storage and in transit), the principle of least privilege, defense in depth; refer to guidelines from the Data Protection Commissioner and AWS
• Build trust with customers: provide audit reports from IaaS providers, and documentation on the standards within your own organization

Managing Cost
• IaaS: "pay for what you use", and easy to scale up / down as demand increases / decreases
• But can you accurately predict your cloud computing bill each month? Can you afford to wait until the next bill to find out?
• An unexpectedly large bill could cause cash-flow problems for a small company or start-up

Managing Cost: Monitoring & Alarms
• With AWS, you can view your current monthly spend
• Set up alerts, e.g. "email me when my bill goes over $1,000"
• But what to do next?

Cost Management: Finding Savings
• IaaS = "pay for what you use", including elasticity, scaling, storage quality / redundancy, backups and reliability. To save costs, ask:
1. Can you make an up-front commitment on some servers (i.e. accept less flexibility)? Paying an up-front annual cost for a particular usage of cloud instances reduces the overall cost; AWS Reserved Instances can give up to 65% savings over on-demand instances.
2. Can you build your own auto-scaling application?
3. Can you put some of your data into cheaper archive storage? Data can be expensive to store on the cloud using the standard services; if data is not needed "on demand", cheaper options are available (EBS: 1 TB ≈ $600 per annum; Glacier: 1 TB ≈ $60 per annum).
4. Can you live with having some of your data stored non-redundantly? Standard storage on AWS is 99.999999999% durable.
If data is already stored somewhere else, then 99.99% durability may be sufficient (saving about 20% on cost).
5. Can you live with unpredictable server outages? If your application is fault-tolerant and able to withstand random server outages, consider AWS Spot Instances: unused AWS capacity auctioned to the highest bidder. You will lose any instances that are out-bid, without warning.

Private and Hybrid Cloud

Private Cloud: concerns / considerations
• Is your organization large enough for a private cloud to give the "appearance of infinite compute resources"? If not, don't expect your private cloud to operate under exactly the same rules as the public cloud you're used to
• A private cloud will require in-house IT capability to manage
• Can an internal system provide the same level of service as an enterprise public cloud? Think about network bandwidth / redundancy, uptime, backup, disaster recovery
• Going private for performance: maybe bare metal is what you really need. "Web servers belong in the public cloud. But things like databases - that need really high performance, in terms of [input and output] and reading and writing to memory - really belong on bare-metal servers or private setups." John Engates (CTO, Rackspace)

Hybrid Cloud
• Use private cloud for predictable workloads; overflow to public cloud when needed
• Integration between private and public cloud is very important: network (bandwidth, latency, reliability), application programming interface (API), virtual machine images

Agenda: Creme Global • Cloud Computing and IaaS • Scaling Resources • Migrate to the Cloud? • Business and Management

Thanks to: International Association of Software Architects, Ireland (IASA); Irish Computer Society (ICS)
More info: blog.cremeglobal.com • ie.linkedin.com/in/ejdaly/