Making the Most of
Infrastructure as a Service
E.J. Daly
CTO, Creme Global
2014-02-27
Creme Global
Cloud Computing and IaaS
Migrate to the Cloud?
Scaling Resources
Business and Management
EU
FP5
Monte
Carlo
Project
1999
CREME Project
2002-2005
Creme Software Ltd
Formed 2005
By 2007…
HQ:
Trinity College, Lloyd
Building
Team:
4 People
(MSc HPC Graduates)
Since then…
• Consistently Listed Amongst the Fastest Growing Technology Companies in Ireland
• Deloitte “Fast 50”: 2010: 14th, 2011: 9th, 2012: 20th, 2013: 35th
• “Organic” Growth
Today…
HQ:
Trinity Technology
& Enterprise Campus
Today…
Team:
23 Full Time Staff
Software Engineers
Quality Assurance
Maths Modellers
Statisticians
Food Scientists
Nutritionists
Today…
What exactly does Creme Global do?
Predictive Intake Modelling
We give decision makers access to the right data, models and
expertise in a form that they can understand.
We build models and software to calculate consumer exposure to
substances (chemicals, flavorings, fragrances, contaminants)
present in food, cosmetics, packaging, environment
These analyses enable decision makers to set regulatory limits based
on the real consumer exposure.
Creme Global - Services
High Performance Technical Services
• Cloud Software & Projects
• Data Validation & Curation
• Technical Services
Value Chain
Creme Global
Primary Data Generation (research, labs, innovation)
Complex Data, Large Volumes
Analysis of Data > Information (scenarios, risk)
Accurate and Trusted Results
Decisions (Policy, Regulation, Investment)
Better Decisions and Confidence
Creme Global - Benefits
Proactively Protecting Consumer Health
Understand Exposure Assessment
Better Decisions
Limitations of Traditional Methods
• Large investments in collecting data have been made
• Data sets are reduced to a few basic statistics to make
exposure estimates for regulatory purposes
• Exposure estimates are assumed to be conservative
• Level of conservatism is actually unknown
• Results are not accurate or realistic
• Exposure estimates can be incorrect by an order of magnitude
Risk Analysis and The Flaw of Averages
Image: www.flawofaverages.com
Creme Global Methods
Detailed Product Usage Information
Occurrence Data
Expert Models (Probabilistic & Deterministic)
→ Consumer Exposure
Creme Global Methods
• Scientifically validated models of consumer exposure
and risk assessment
• As called for by FDA, EFSA, SCCS, USDA, FSA, etc…
• Use all the available real data in the exposure model
• Retains relationship between intakes and key factors
• Aggregate Exposure from multiple sources
• Assess substances from multiple products
• Cumulative Exposure from multiple substances /
chemicals simultaneously from all sources
• Assess full formulations
Creme Global:
Probabilistic Modelling
• In an Ideal World, we would have access to complete exposure data for
everyone:
• How much they consume?
• How often?
• Which products?
• The exact chemical concentration in these products?
• This detailed data would enable a (relatively) straightforward
calculation of population exposure
• In reality, data is only available for a relatively small proportion of the
population.
• The software developed by Creme enables estimation of the actual
population exposure from this limited data.
Creme Database Tables
Subjects
Subject demographics
Consumption
Products and Foods consumed
Brands
Market Shares, Brand Loyalties
Groups
Recipe and Food Groups Info
Correlations
Information
on correlated
variables
Endpoints
e.g. ARfD, ADI
Processing
Potential processing factors
Substances
Substance / Chemical
concentrations
in products / foods
Creme Global:
Probabilistic Modelling
• The software creates a large simulated population based on the
observed data, using probabilistic modelling
(Monte Carlo)
• The simulated population has the same usage patterns and habits as
the real population
• This simulated population is used to represent the real population
• Exposure statistics are calculated for the simulated population
Creme Global:
Probabilistic Modelling
• Example: Dermal Exposure to Fragrance Compounds from a Cosmetics Product
Dermal Exposure (mg/cm²/day) = (F × A × C × R) / S
F = Frequency of Use (of Cosmetic Product)
A = Amount Per Use
C = Chemical Concentration
R = Retention Factor
S = Skin Surface Area
Creme Global:
Probabilistic Modelling
Dermal Exposure (mg/cm²/day) = (F × A × C × R) / S
• These values are not available for everyone in the population
• We gather information for each parameter from available data collection sources (surveys / studies):
  • Freq. of Use: Survey of 36,000 EU/US consumers (1.2 million recorded events)
  • Amount per Use: Surveys of between 360 and 500 people
  • Chem. Conc.: Fragrance and Cosmetics Manufacturers
  • Retention Factors: Expert Opinion
  • Surface Area: US EPA
Creme Global:
Probabilistic Modelling
Dermal Exposure (mg/cm²/day) = (F × A × C × R) / S
(Figure: the distribution for each parameter F, A, C, R and S is sampled and combined in the exposure equation)
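The probabilistic calculation can be illustrated with a short Monte Carlo sketch. The distributions and values below are purely illustrative placeholders (the real models draw each parameter from the survey and manufacturer data listed above), but the mechanics are the same: sample each parameter per simulated subject, then combine them in the exposure equation.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000  # size of the simulated population

# Illustrative placeholder distributions only -- in practice each parameter is
# sampled from the survey / manufacturer data sources listed above.
F = rng.poisson(1.2, n)                 # frequency of use (applications per day)
A = rng.lognormal(np.log(500), 0.5, n)  # amount per use (mg of product)
C = rng.uniform(0.001, 0.01, n)         # chemical concentration (fraction)
R = rng.uniform(0.5, 1.0, n)            # retention factor
S = rng.normal(17_000, 1_500, n)        # skin surface area (cm^2)

# Dermal Exposure (mg/cm^2/day) = (F x A x C x R) / S, per simulated subject
exposure = (F * A * C * R) / S

# Population exposure statistics (e.g. median, 95th and 99th percentiles)
print(np.percentile(exposure, [50, 95, 99]))
```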
Model Output
(Charts: distribution of subjects' exposure (mg/kg/day) against the reference dose; daily average and maximum-day exposure against the ARfD; daily average and lifetime exposure)
(Product logos: Food, Global, Cosmetics, Microbial, Crop Protection / Pesticides, Nanotech, Packaging)
Creme Global
Cloud Computing and IaaS
Migrate to the Cloud?
Scaling Resources
Business and Management
Cloud Computing
• How is Cloud Computing different from everything else? (Armbrust et al., 2010)
• The appearance of infinite computing resources
• The elimination of an up-front commitment by cloud users
• The ability to pay for use of computing resources as needed
(for example processors by the hour, storage by the day)
• Definition from NIST (National Institute of Standards and Technology):
“Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable
computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and
released with minimal management effort or service provider interaction.”
• More concisely:
• Cloud Computing = Internet Services & Pay for what you use
Image: www.jansipke.nl
Using IaaS to provide SaaS
Image: A view of Cloud Computing (Armbrust et al., 2010)
Leading Providers of IaaS
Image: blog.appcore.com
Creme Global
Cloud Computing and IaaS
Migrate to the Cloud?
Scaling Resources
Business and Management
Benefits of IaaS
• High Quality, Reliable, Enterprise Grade Infrastructure
• Servers
• Storage
• Networks
• Reduce waste and inefficiency
• Reduce cost
• Avoid large up front capital expenditure
• Avoid large in-house maintenance costs
• Rapid scaling possible to fit requirement
• Accessible from anywhere
IaaS Benefits: Scalability
Image: www.techtricksworld.com
IaaS Benefits: Scalability
Problem 1: Wasted Resources
Problem 2: Losing Customers?
Negatives of IaaS
• Performance
  • Network (High Performance Computing)
  • Disk (lower performance / throughput; an issue for database-heavy applications)
  • Reliability in terms of performance (not always consistent performance)
  • Cloud performance is usually less than dedicated hardware
• Although, some cloud providers can provide higher performance (“pay per performance”)
• You can sometimes work around poor performance (e.g. RAID arrays)
• Cost
• More expensive for some applications / workflows
e.g. workflows with relatively constant load
• “Pound for pound” more expensive than Virtual Private Hosting, Colocation
The added flexibility and scalability comes at a cost
Cloud Computing: Hype Cycle
Entering the “Trough of Disillusionment”?
Moving away from Cloud?
• Recent reports of migrations away from public cloud, to in-house / private clouds:
  Zynga, HubSpot, MemSQL, Uber, Mixpanel, Tradesy
  Eric Frenkiel (MemSQL) estimates that, had the company stuck with Amazon, it would have spent about $900,000 over the next three years.
  But with physical servers, the cost will be closer to $200,000.
  (wired.com report, Aug 2013)
• Cloud Computing predicted to grow at a 36% compound annual rate through 2016 (451 Research)
• Cloud Computing is not perfect for every business / application
• As IaaS matures, some early adopters may start to consider more sophisticated approaches like Hybrid Cloud
• (In agreement with the Gartner Hype Cycle)
When is it a good idea to think about IaaS?
• Start-ups
• Avoid up front expenditure in hardware
• Avoid having a dedicated sys admin function to ensure uptime for clients
• Flexible, Agile, Lean – easy to ‘pivot’
• Elasticity
• Not sure about predicted load / usage for the next 12-24 months?
• Load on servers is inherently variable: you expect the load to vary a lot for the foreseeable
future.
If you’re not sure:
1) Try to do a cost calculation
2) Is there a difference in the level of service you will be able to offer?
Case Study: Creme Global
• 2006-2009:
• Single HPC Cluster (3x rack servers)
• 8 cores
• 16 GB RAM
• Colocation hosting in Dublin
• Capacity:
• Up to 2x concurrent assessments / jobs
• 2009:
• Increasing client base
• Potential clients requesting trials
• Evaluation of compute resource requirements needed…
Analysis of Compute Resources
Monitored assessment / job requests on compute servers over a 4 month period:
• 0 Jobs: 89.29% of the time
• 1+ Jobs: 10.71% of the time
Problem 1: Most of the time there is zero load on the compute servers (wasted compute resources) - resources are in use only about 10% of the time.
Closer examination of load when resources are in use (i.e. when clients are using the compute resources):
• 1-2 Jobs: 75% of the time in use
• 3+ Jobs: 25% of the time in use
Problem 2: When clients are using the system, a large proportion of the time their jobs have to queue - the system is overloaded 25% of the time it is in use.
Analysis of Compute Resources
• Problem 1: Compute system is usually unused
(~90% of the time)
• Waste of compute resources
• Problem 2: When in use, system is regularly overloaded
(~25% of the time)
• Unsatisfactory service being offered to customers
IaaS Benefits: Scalability
Problem 1: Wasted Resources
Problem 2: Losing Customers?
Typical Scenario
Creme Global
Cloud Computing and IaaS
Migrate to the Cloud?
Scaling Resources
Business and Management
Elastic Scaling of Resources in the Cloud
• This is one of the biggest benefits of using IaaS
Cloud is generally more expensive – because of this benefit
• Manual -or- Automated?
Scaling Manually
• Initially, you probably won’t have a scaling strategy
• Manually monitoring and scaling usage will provide the data you need to move to automation
• An incomplete or poorly designed automation strategy can end up costing more, or providing worse service
• Dev / Test phase of an application (let developers scale up/down manually)
• If your requirements will change relatively infrequently
• Good predictions of future requirements
• Scale up (down) as you add (remove) a product or client
• Needs a good alignment with business development and strategy
• Scaling manually on the cloud is quite similar to Virtual Private Hosting
  • VPS are usually lower cost than on-demand cloud servers
  • You can get VPS-style hosting from cloud providers and migrate to on-demand when needed
  • Reserved instances (pay some up front to lower the overall cost)
Scaling Manually
Image: 8kmiles.com
Scale up when you need to (e.g. new contract)
Scale down when demand falls (e.g. end of contract)
Measuring Performance
• Even a manually scaled system will need metrics to measure its performance
• Examples:
• Server Load
CPU Utilization, Disk Read / Write, Network I/O, Memory Usage, Disk Usage
• Availability
Uptime (%)
• Response Time
Database queries, Server-side processing, Content distribution
• Queue Length
Batch processes waiting to start
Scaling Automatically
• Demand on system changes too rapidly to manage manually
Image: 8kmiles.com
Scaling Automatically
AutoScaling: Ready Made vs Build Your Own
AutoScaling: Ready Made Solution
• IaaS providers offering built-in scaling solutions
• Some third-party providers and consultants can help build a solution
for you
• RightScale
• 8kmiles.com
• Pros: can be set up quite quickly and relatively cheaply
Don’t need to spend a lot of time and resources on R&D
• Cons: may be limited in scope
May not be a perfect fit for your scaling requirements
AWS provides built-in autoscaling functionality for
your EC2 instances
Image: aws.amazon.com
1) Metrics: “Should we make a change?”
2) Scaling Rules: “What change to make?”
Metrics: CPU Utilization (%), Disk Reads (Bytes), Disk Read Operations, Disk Writes (Bytes), Disk Write Operations, Network In (Bytes), Network Out (Bytes)
Statistics: Average, Min, Max, Sum, SampleCount
Comparison: <, ≤, >, ≥
Period: 1 Minute, 5 Minutes, 15 Minutes, 1 Hour, 6 Hours
Other Metrics Possible
• EBS (Elastic Block Store)
  • Read / Write Bytes
  • Read / Write Operations
  • Idle time
  • Queue length (operations waiting to be completed)
• SQS (Simple Queue Service)
  • Number of messages sent / received
  • Number of messages in the queue
• Custom metrics can be defined by the user
Choose range of cluster size
Add a number of instances, or increase size by a certain percent
1) Rule for Scale Up
2) Rule for Scale Down
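As a concrete illustration of the ready-made route, the sketch below wires these pieces together with boto3 (the AWS SDK for Python). The group name, thresholds and instance limits are hypothetical; the calls themselves are the standard Auto Scaling and CloudWatch APIs, and a matching scale-down policy and alarm would be configured the same way.

```python
import boto3

autoscaling = boto3.client("autoscaling")
cloudwatch = boto3.client("cloudwatch")

GROUP = "compute-workers"  # hypothetical Auto Scaling group name

# Choose the range of cluster size
autoscaling.update_auto_scaling_group(
    AutoScalingGroupName=GROUP, MinSize=1, MaxSize=10)

# Scaling rule ("what change to make?"): add one instance when triggered
policy = autoscaling.put_scaling_policy(
    AutoScalingGroupName=GROUP,
    PolicyName="scale-up-by-one",
    AdjustmentType="ChangeInCapacity",
    ScalingAdjustment=1,
    Cooldown=300,
)

# Metric rule ("should we make a change?"): average CPU > 70% over two 5-minute periods
cloudwatch.put_metric_alarm(
    AlarmName=f"{GROUP}-high-cpu",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "AutoScalingGroupName", "Value": GROUP}],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=2,
    Threshold=70.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[policy["PolicyARN"]],
)
```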
AutoScaling: Build Your Own Solution
• Full control and customization over the scaling algorithms
• Case Studies:
• Netflix
• Creme Global
Custom AutoScaling: Netflix
(Chart: request rate over 5 days)
General pattern emerges over time
Noise (Unpredictable / Random)
Scaling Strategy:
1) Predict the general pattern
2) React to the randomness
Metric = Requests per Second / Throughput
Fast Fourier Transform (approximation of observed data as a combination of sine waves)
Scaling plan ready before demand changes
Random spikes (deviations from the prediction) are fixed using Amazon AutoScaling
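Netflix's production system is not reproduced here, but the idea can be sketched: approximate the observed metric as a combination of sine waves, discard the weak (noisy) components, and reuse the dominant pattern as the scaling plan for the next day. A rough NumPy illustration, with all parameters hypothetical:

```python
import numpy as np

SAMPLES_PER_DAY = 288  # assuming the metric is sampled every 5 minutes

def predict_next_day(observed, keep=10):
    """Keep the `keep` strongest FFT components of the observed metric
    (the general pattern) and reuse the last fitted day as tomorrow's plan."""
    spectrum = np.fft.rfft(observed)
    weak = np.argsort(np.abs(spectrum))[:-keep]   # indices of all but the strongest components
    spectrum[weak] = 0                            # discard the noise
    fitted = np.fft.irfft(spectrum, len(observed))
    return fitted[-SAMPLES_PER_DAY:]              # predicted pattern for the next day

# Example: 5 days of synthetic requests-per-second data = daily cycle + random noise
t = np.arange(5 * SAMPLES_PER_DAY)
observed = 100 + 40 * np.sin(2 * np.pi * t / SAMPLES_PER_DAY) + np.random.normal(0, 5, t.size)

plan = predict_next_day(observed)  # capacity plan ready before demand changes;
                                   # deviations from it are handled reactively (e.g. AWS AutoScaling)
```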
Custom AutoScaling: Creme Global
• 0 Jobs: 89.29%
• 1-2 Jobs: 8.16%
• 3+ Jobs: 2.59%
(1+ Jobs: 10.75%)
Job Requests – Very Unpredictable
Custom AutoScaling: Creme Global
Job Run Times (1+ hour only):
1-2 hr: 46%, 2-3 hr: 15%, 3-4 hr: 8%, 4-5 hr: 9%, 5-6 hr: 3%, 6-7 hr: 3%, 7-8 hr: 2%, 8-9 hr: 2%, 9-10 hr: 1%, 10+ hr: 10%
Job Size – Variable from 1 hour to 10+ hours
Custom AutoScaling: Creme Global
• Job sizes are very much larger than typical requests to a web service
• If resources are low – job cannot start
Compare to standard web services
• Each job requires dedicated resources to run
Typically 1-2 jobs per server; Each server is usually at close to 100% CPU while processing jobs
Measuring individual server load is not a good measure for scaling
• Jobs are very variable in size
Length of job queue alone is not a very good measure for scaling
• A custom scaling approach was required
http://www.google.com/patents/US20110138055
Custom AutoScaling: Creme Global
• Devised a more relevant Metric to measure the performance of the
system:
“How long will it take for the last job in the queue to start?”
• How to calculate this metric?
1. Estimate the time required for each job to complete (running or queued)
2. Simulate the processing of each job through the queue in order
3. Calculate the time that will have to pass before the last job will begin to process
http://www.google.com/patents/US20110138055
How to calculate this metric?
1) Estimate the time required for each job to complete (running or queued)
• For jobs that are already running:
• The application can estimate the percentage of a job completed so far
• Estimate:
Total time required = (Time required so far) / (Percent complete)
Time remaining = Total time required – Time required so far
• For jobs that have yet to start:
• Estimate the complexity of the job based on input factors, including:
Monte Carlo iterations requested
Size and complexity of the data sets involved
Mathematical model computational complexity
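For jobs that are already running, the step-1 estimate is just the two formulas above; a minimal sketch:

```python
def remaining_minutes(elapsed_minutes, fraction_complete):
    """Estimate time remaining for a running job from its progress so far.
    Total time required = (time required so far) / (fraction complete);
    time remaining = total time required - time required so far."""
    total_required = elapsed_minutes / fraction_complete
    return total_required - elapsed_minutes

# Example: a job running for 20 minutes that reports 40% complete
print(remaining_minutes(20, 0.40))  # -> 30.0 minutes remaining
```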
How to calculate this metric?
2) Simulate the processing of each job through the queue in order
Worked example (times are estimated minutes remaining; three jobs running, four jobs queued):
• T = 0 min: running jobs have 7, 32 and 47 min remaining; queue: 5, 30, 62, 14 min
• T = 7 min: first job done; the 5 min job starts; running: 25, 5, 40 min; queue: 30, 62, 14 min
• T = 12 min: job done; the 30 min job starts; running: 20, 30, 35 min; queue: 62, 14 min
• T = 32 min: job done; the 62 min job starts; running: 62, 10, 15 min; queue: 14 min
• T = 42 min: job done; the last job (14 min) is ready to start
3) Calculate the time that will have to pass before the last job will begin to process
Queue Length (Performance Metric): 42 min
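Steps 2 and 3 amount to a short discrete-event simulation. The sketch below is a minimal illustration (not the production implementation described in the patent) and reproduces the worked example above: three running jobs with 7, 32 and 47 minutes remaining and four queued jobs estimated at 5, 30, 62 and 14 minutes give a queue length of 42 minutes.

```python
import heapq

def time_until_last_job_starts(running_remaining, queued_estimates):
    """Performance metric: minutes until the last job in the queue can start.

    running_remaining -- estimated minutes remaining for each running job
                         (one entry per busy processing slot)
    queued_estimates  -- estimated run time (minutes) of each queued job, in queue order
    """
    if not queued_estimates:
        return 0
    free_at = list(running_remaining)  # absolute times at which each slot frees up
    heapq.heapify(free_at)
    start_time = 0
    for estimate in queued_estimates:
        start_time = heapq.heappop(free_at)             # next slot frees up -> job starts
        heapq.heappush(free_at, start_time + estimate)  # slot busy until that job finishes
    return start_time  # start time of the last queued job = queue length metric

# Worked example from the slides above
print(time_until_last_job_starts([7, 32, 47], [5, 30, 62, 14]))  # -> 42
```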
Custom AutoScaling: Creme Global
• Scale Up Rules:
Queue Length > 10 min
No instance pending
• Scale Down Rules:
Instance is idle (no running job)
Queue is empty
Less than 5 min to another billing hour
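Put together, the scaling decision itself is a couple of comparisons on top of that metric. A simplified sketch of the rules above (the 5-minute billing-hour check reflects AWS's per-instance-hour billing at the time):

```python
def scaling_decision(queue_length_min, instance_pending,
                     instance_idle, queue_empty, minutes_to_billing_hour):
    """Apply the scale-up / scale-down rules from the slides (simplified)."""
    if queue_length_min > 10 and not instance_pending:
        return "scale up"
    if instance_idle and queue_empty and minutes_to_billing_hour < 5:
        return "scale down"
    return "no change"
```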
Custom AutoScaling: Creme Global
Job Queue Times (14,623 Jobs :: 2013-14):
0-2 min: 74%, 2-4 min: 12%, 4-6 min: 5%, 6-8 min: 6%, 8-10 min: 2%, 10-12 min: 1%, 12+ min: 1%
Creme Global
Cloud Computing and IaaS
Migrate to the Cloud?
Scaling Resources
Business and Management
Business and Management Considerations
• Changing Roles (Software Dev)
• Data Security
• Monitoring Costs
• Further Cost Saving Strategies
Changing Role of Software Developers
• Hardware provisioning is now the responsibility of Software Dev
• Spinning up / down instances and volumes is part of the day-to-day for developers
• What happens when things get busy?
  Think about: what happens to your desk, desktop, inbox…
• Risks:
  • Test / Development instances left running
  • Volumes and Snapshots without labels
  • Easy to keep backups “just in case” - they build up over time if not managed
  • Billing is far less transparent (compared to conventional hardware purchase / budgeting)
• Easy to scale up → easy to make a mess!
• A fixed-resource system will self-regulate due to its inherent limits; a cloud system does not have these limits
Changing Role of Software Developers
• Benefits
• Very empowering for some software architects - they can design, build and test the hardware configurations that will support their applications
• Complements Agile and Lean Development practices
• Software developers can acquire new skills
(e.g. systems engineering skills, IS management)
• Streamline design, development, QA / test, release, support
• Merging of a number of roles
• Software Developer, Software Architect, Systems Engineer, …
• Result: “DevOps”
Data Protection
Image: aws.amazon.com
Data protection is vital to the reputation of IaaS providers
Very high standards and auditing processes are in place
Your IaaS provider should be able to grant you access to their auditing reports / whitepapers on security
Employ best practice within your own organization:
- Server Upgrades / Patches
- Application Security
- Data encryption (storage / transit)
- Principle of Least Privilege
- Defense in Depth
- Refer to guidelines: Data Protection Commissioner, AWS
Building trust with customers:
- Provide audit and reports from IaaS providers
- Provide documentation on standards within your organization
Managing Cost
• IaaS: “Pay for what you use”
• IaaS: Easy to scale up / down as demand increases / decreases
• Can you accurately predict Cloud Computing bill each month?
• Can you afford to wait until the next bill to find out?
• An unexpectedly large bill could cause cash flow problems for a small
company or start-up
Managing Cost: Monitoring & Alarms
With AWS, you can view current monthly spend
Set up Alerts
e.g. “Email me when my bill goes over $1,000”
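With AWS this can be done with a CloudWatch alarm on the EstimatedCharges billing metric. A boto3 sketch (the SNS topic ARN is hypothetical, and billing metrics must first be enabled in the account's billing preferences):

```python
import boto3

# Billing metrics are published to CloudWatch in us-east-1 only
cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="monthly-bill-over-1000-usd",
    Namespace="AWS/Billing",
    MetricName="EstimatedCharges",
    Dimensions=[{"Name": "Currency", "Value": "USD"}],
    Statistic="Maximum",
    Period=21600,              # evaluate the estimate every 6 hours
    EvaluationPeriods=1,
    Threshold=1000.0,
    ComparisonOperator="GreaterThanThreshold",
    # Hypothetical SNS topic that emails the finance / ops team
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:billing-alerts"],
)
```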
But, what to do next?
Cost Management: Finding Savings
• IaaS = “Pay for what you use”
including: elasticity, scaling, storage quality / redundancy, backups, reliability
• To save costs:
1. Can you make an upfront commitment on some servers (less flexible)?
2. Can you build your own auto-scaling application?
3. Can you put some of your data into cheaper archive storage?
4. Can you live with having some of your data stored non-redundantly?
5. Can you live with unpredictable server outages?
Cost Management: Finding Savings
1. Can you do with less flexibility in terms of the number of servers?
   Paying an upfront annual cost for a particular usage of Cloud instances will reduce the overall cost.
   AWS provides Reserved Instances which can give up to 65% saving over on-demand instances.
2. Can you build your own auto-scaling application?
3. Can you put some of your data into cheaper archive storage?
   Data can be expensive to store on the Cloud using the standard services.
   If data is not needed “on demand”, then cheaper storage options are available.
   EBS costs: 1TB = $600 per annum; Glacier cost: 1TB = $60 per annum.
4. Can you live with having some of your data stored non-redundantly?
   Standard storage on AWS is 99.999999999% durable.
   If data is already stored somewhere else, then 99.99% durability may be sufficient (saving about 20% on cost).
5. Can you live with unpredictable server outages?
   If your application is fault tolerant and able to withstand random server outages → AWS Spot Instances.
   Spot Instances are unused AWS instances that are auctioned off to the highest bidder.
   You will lose any instances that are out-bid without warning.
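A quick back-of-envelope check using the figures quoted above shows how these options add up (the per-server cost and fleet size are hypothetical placeholders; real prices vary by region and over time):

```python
# Figures from the slides: reserved instances save up to 65% vs on-demand;
# EBS ~ $600 per TB per year vs Glacier ~ $60 per TB per year.
on_demand_per_server = 2000.0   # hypothetical annual on-demand cost per server ($)
reserved_saving = 0.65
ebs_per_tb, glacier_per_tb = 600.0, 60.0

servers, archive_tb = 4, 5      # hypothetical fleet: 4 servers, 5 TB of cold data
annual_saving = (servers * on_demand_per_server * reserved_saving
                 + archive_tb * (ebs_per_tb - glacier_per_tb))
print(f"Estimated annual saving: ${annual_saving:,.0f}")  # -> $7,900
```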
Private and Hybrid Cloud
Private Cloud
• Concerns / Considerations:
• Is your organization large enough to have a private cloud which gives the
“appearance of infinite compute resources”?
• If not, then don’t expect your private cloud system to operate under exactly
the same rules as the public cloud you’re used to
• Private cloud will require in-house IT capability to manage
• Can an internal system provide the same level of service as an enterprise public cloud?
Think about: network bandwidth / redundancy, uptime, backup, disaster
recovery
• Going private for performance: maybe bare metal is what you really need.
“Web servers belong in the public cloud. But things like databases — that need
really high performance, in terms of [input and output] and reading and writing
to memory — really belong on bare-metal servers or private setups.”
John Engates (CTO, Rackspace)
Hybrid Cloud
Use Private Cloud for
predictable workloads
Overflow to Public
Cloud when needed
Integration between Private and Public cloud is
very important:
- Network: Bandwidth, Latency, Reliability
- Application Programming Interface (API)
- Virtual Machine Image
Creme Global
Cloud Computing and IaaS
Migrate to the Cloud?
Scaling Resources
Business and Management
Thanks to:
International Association of Software
Architects, Ireland (IASA)
Irish Computer Society (ICS)
More Info:
blog.cremeglobal.com
ie.linkedin.com/in/ejdaly/