Introduction to Windows Azure

Cloud Computing Futures Group, Microsoft Research
Roger Barga, Jared Jackson, Nelson Araujo,
Dennis Gannon, Wei Lu, and Jaliya Ekanayake
Data centers range in size from “edge” facilities to megascale.
Economies of scale
Approximate costs for a small center (1,000 servers) and a larger, 100K-server center.
Technology       Cost in small data center     Cost in large data center      Ratio
Network          $95 per Mbps/month            $13 per Mbps/month             7.1
Storage          $2.20 per GB/month            $0.40 per GB/month             5.1
Administration   ~140 servers/administrator    >1000 servers/administrator    7.1
Each data center is 11.5 times the size of a football field
A bunch of machines in data centers
Fabric Controller
Owns all data center hardware
Uses inventory to host services
Deploys applications to free resources
Maintains the health of those applications
Maintains health of hardware
If a node goes offline, the FC tries to recover it
If a failed node can’t be recovered, the FC migrates its role instances to a new node:
A suitable replacement location is found
Existing role instances are notified of the change
Manages the service life cycle starting from bare metal
Highly available Fabric Controller (FC)
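The recovery steps listed for the Fabric Controller can be sketched as a small simulation. This is only an illustration, not the FC's actual logic: the `Node` and `ToyFabricController` classes, the `recoverable` flag, and the "least-loaded healthy node" placement rule are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    healthy: bool = True
    recoverable: bool = True  # hypothetical flag: can the node be brought back?
    instances: list = field(default_factory=list)


class ToyFabricController:
    """Hypothetical stand-in for the FC's node-recovery behaviour."""

    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}
        self.notifications = []  # (role instance, message) pairs

    def handle_offline(self, name):
        node = self.nodes[name]
        node.healthy = False
        # Step 1: try to recover the node in place.
        if node.recoverable:
            node.healthy = True
            return name
        # Step 2: find a suitable replacement location.
        # Assumption: "suitable" means the least-loaded healthy node.
        candidates = [n for n in self.nodes.values() if n.healthy]
        if not candidates:
            raise RuntimeError("no healthy node available")
        target = min(candidates, key=lambda n: len(n.instances))
        # Step 3: migrate role instances and notify the existing ones.
        existing = [inst for n in candidates for inst in n.instances]
        moved = list(node.instances)
        target.instances.extend(moved)
        node.instances.clear()
        for inst in existing:
            self.notifications.append((inst, f"{moved} moved to {target.name}"))
        return target.name


fc = ToyFabricController([
    Node("A", recoverable=False, instances=["web_0"]),
    Node("B", instances=["web_1", "worker_0"]),
    Node("C"),
])
new_home = fc.handle_offline("A")
```

Here node A cannot be recovered, so its single role instance moves to the empty node C and the two instances on node B are notified.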
Each node runs:
An Optimized Hypervisor
A Host Virtual Machine
Up to 7 Guest VMs
At Minimum (Small)
CPU: 1.5-1.7 GHz x64
Memory: 1.7 GB
Network: 100+ Mbps
Local Storage: 500 GB
Up to (Extra Large)
CPU: 8 Cores
Memory: 14.2 GB
Local Storage: 2+ TB
Azure Platform
Compute: Web Role, Worker Role
Storage: Blobs, Tables, Queues, Drives
A closer look
[Figure: an application reaches Storage (Blobs, Drives, Tables, Queues) over HTTP; Compute and Storage both sit on top of the Fabric.]
Access
Data is exposed via .NET and RESTful interfaces
Data can be accessed by:
Windows Azure apps
Other on-premise applications or cloud applications
Account: jared
  Container: images
    Blob: PIC01.JPG
    Blob: PIC02.JPG
  Container: movies
    Blob: MOV1.AVI
http://jared.blob.core.windows.net/images/PIC01.JPG
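The account / container / blob hierarchy maps directly onto the URL shown above. A minimal sketch of that mapping follows; the helper name `blob_url` and its `scheme` parameter are made up for this example and are not part of any Azure library.

```python
from urllib.parse import quote


def blob_url(account, container, blob, scheme="http"):
    """Build the REST address of a blob from its account, container, and name."""
    # Percent-encode the path segments so names with spaces stay valid.
    return f"{scheme}://{account}.blob.core.windows.net/{quote(container)}/{quote(blob)}"


url = blob_url("jared", "images", "PIC01.JPG")
```

For a container with public access, an address built this way should be readable with a plain HTTP GET; private containers require an authenticated request.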
Number of Blob Containers
Can have as many Blob Containers as will fit within the storage account limit
Blob Container
A container holds a set of blobs
Set access policies at the container level
Private or publicly accessible
Associate Metadata with Container
Metadata are <name, value> pairs
Up to 8KB per container
Block Blob
Targeted at streaming workloads
Each blob consists of a sequence of blocks
Each block is identified by a Block ID
Size limit 200GB per blob
Page Blob
Targeted at random read/write workloads
Each blob consists of an array of pages
Each page is identified by its offset from the start of the blob
Size limit 1TB per blob
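The difference between the two blob types can be illustrated with a toy in-memory model. This is a sketch under stated assumptions, not the storage service's implementation: the class and method names are hypothetical, the stage-then-commit flow for blocks is a simplification, and the 512-byte page alignment is not stated on the slide.

```python
PAGE_SIZE = 512  # assumption: page writes are aligned to 512-byte pages


class ToyBlockBlob:
    """A blob as an ordered sequence of blocks, each identified by a Block ID."""

    def __init__(self):
        self.staged = {}     # block_id -> bytes uploaded so far
        self.committed = []  # ordered Block IDs that make up the blob

    def put_block(self, block_id, data):
        self.staged[block_id] = data

    def commit(self, block_ids):
        missing = [b for b in block_ids if b not in self.staged]
        if missing:
            raise KeyError(f"unknown block IDs: {missing}")
        self.committed = list(block_ids)

    def read(self):
        return b"".join(self.staged[b] for b in self.committed)


class ToyPageBlob:
    """A blob as an array of pages, each addressed by its offset from the start."""

    def __init__(self, size):
        if size % PAGE_SIZE:
            raise ValueError("size must be a multiple of the page size")
        self.data = bytearray(size)

    def write_pages(self, offset, data):
        if offset % PAGE_SIZE or len(data) % PAGE_SIZE:
            raise ValueError("offset and length must be page-aligned")
        if offset + len(data) > len(self.data):
            raise ValueError("write past end of blob")
        self.data[offset:offset + len(data)] = data

    def read_pages(self, offset, length):
        return bytes(self.data[offset:offset + length])


# Blocks can be uploaded in any order; the committed list fixes the sequence.
bb = ToyBlockBlob()
bb.put_block("b2", b"world")
bb.put_block("b1", b"hello ")
bb.commit(["b1", "b2"])

# Pages are written in place at an offset, which suits random read/write.
pb = ToyPageBlob(4 * PAGE_SIZE)
pb.write_pages(2 * PAGE_SIZE, b"x" * PAGE_SIZE)
```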
[Figure: the same account / container / blob hierarchy, extended one level: each blob is made up of blocks or pages (Block or Page 1, Block or Page 2, Block or Page 3).]
Scalable message paths
Provides loose synchronization
Any number of messages
One week of persistence
Maximum message size 8KB
Visibility timeout
[Figure: producers P1 and P2 enqueue messages 1 to 4; consumers C1 and C2 dequeue them.]
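The visibility timeout is easiest to see in a small simulation. The sketch below is an in-memory toy, not the Azure Queue API: `ToyQueue` and its methods are hypothetical, and time is passed in explicitly as `now` so the behaviour is deterministic.

```python
import itertools

MAX_MESSAGE_BYTES = 8 * 1024  # from the slide: maximum message size 8KB


class ToyQueue:
    def __init__(self, visibility_timeout=30.0):
        self.visibility_timeout = visibility_timeout
        self._messages = {}  # message id -> [body, time at which it is visible]
        self._ids = itertools.count(1)

    def put(self, body, now):
        if len(body.encode("utf-8")) > MAX_MESSAGE_BYTES:
            raise ValueError("message larger than 8KB")
        mid = next(self._ids)
        self._messages[mid] = [body, now]
        return mid

    def get(self, now):
        """Return the first visible message and hide it for the timeout."""
        for mid, entry in self._messages.items():
            if entry[1] <= now:
                entry[1] = now + self.visibility_timeout
                return mid, entry[0]
        return None

    def delete(self, mid):
        """A consumer deletes the message once the work is really done."""
        self._messages.pop(mid, None)


q = ToyQueue(visibility_timeout=30)
q.put("job-1", now=0)
first = q.get(now=0)     # consumer C1 takes the message
hidden = q.get(now=10)   # C2 sees nothing while C1 is working
retry = q.get(now=31)    # C1 never deleted it, so it reappears
q.delete(retry[0])
after_delete = q.get(now=100)
```

If a consumer crashes before calling `delete`, the message becomes visible again after the timeout and another consumer picks it up.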
Provides Structured Storage
Massively Scalable Tables
Billions of entities (rows) and TBs of data
Can use thousands of servers as traffic grows
Data is replicated several times
Table
A storage account can create many tables
Table name is scoped by account
Set of entities (i.e. rows)
Entity
Set of properties (columns)
Required properties
PartitionKey, RowKey and Timestamp
[Figure: entities grouped into Partition 1 and Partition 2.]
Source: Windows Azure Table – Programming Table Storage
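The table / entity / partition structure can be sketched as a nested dictionary. This is an illustrative model only: `Entity` and `ToyTable` are hypothetical names, and the rule that a (PartitionKey, RowKey) pair identifies exactly one entity is stated here as an assumption beyond what the slide says.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Entity:
    partition_key: str  # required property: PartitionKey
    row_key: str        # required property: RowKey
    properties: dict = field(default_factory=dict)
    # Required property: Timestamp (set by the toy model at creation time).
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class ToyTable:
    def __init__(self, name):
        self.name = name
        self._partitions = defaultdict(dict)  # PartitionKey -> {RowKey: Entity}

    def insert(self, entity):
        rows = self._partitions[entity.partition_key]
        if entity.row_key in rows:
            raise KeyError("entity with this PartitionKey/RowKey already exists")
        rows[entity.row_key] = entity

    def get(self, partition_key, row_key):
        return self._partitions.get(partition_key, {})[row_key]

    def partition(self, partition_key):
        """All entities that share one PartitionKey."""
        return list(self._partitions.get(partition_key, {}).values())


table = ToyTable("jobs")
table.insert(Entity("user-a", "001", {"state": "done"}))
table.insert(Entity("user-a", "002", {"state": "running"}))
table.insert(Entity("user-b", "001", {"state": "queued"}))
```

Grouping by PartitionKey is what lets a table spread across many servers: in this toy, each inner dictionary stands for one partition that could live on its own machine.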
A Windows Azure Drive is a Page Blob formatted as a single-volume NTFS Virtual Hard Drive (VHD)
Drives can be up to 1TB
A VM can dynamically mount up to 8 drives
A Page Blob can only be mounted by one VM at a time for read/write
Remote Access via Page Blob
Can upload the VHD to its Page Blob using the blob interface, and then mount it as a Drive
Can download the Drive through the Page Blob interface
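The two mounting rules above (at most 8 drives per VM, one read/write mount per Page Blob) can be expressed as a small bookkeeping class. This is a hypothetical sketch for illustration; `ToyDriveRegistry` does not correspond to any real Azure Drive API.

```python
from collections import defaultdict

MAX_DRIVES_PER_VM = 8  # from the slide: a VM can mount up to 8 drives


class ToyDriveRegistry:
    def __init__(self):
        self._writer = {}                 # page blob -> VM holding the read/write mount
        self._mounted = defaultdict(set)  # VM -> page blobs it has mounted

    def mount(self, vm, page_blob):
        holder = self._writer.get(page_blob)
        if holder is not None and holder != vm:
            raise PermissionError(f"{page_blob} is already mounted read/write by {holder}")
        already = page_blob in self._mounted[vm]
        if not already and len(self._mounted[vm]) >= MAX_DRIVES_PER_VM:
            raise RuntimeError(f"{vm} already has {MAX_DRIVES_PER_VM} drives mounted")
        self._writer[page_blob] = vm
        self._mounted[vm].add(page_blob)

    def unmount(self, vm, page_blob):
        if self._writer.get(page_blob) == vm:
            del self._writer[page_blob]
            self._mounted[vm].discard(page_blob)


registry = ToyDriveRegistry()
registry.mount("vm-1", "data.vhd")
try:
    registry.mount("vm-2", "data.vhd")
    conflict = False
except PermissionError:
    conflict = True  # a second read/write mount is refused

registry.unmount("vm-1", "data.vhd")
registry.mount("vm-2", "data.vhd")  # succeeds once vm-1 has let go
```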
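A closer look
[Figure: HTTP requests pass through the Load Balancer to a Web Role (IIS hosting ASP.NET, WCF, etc.); a Worker Role runs main() { … }. Each role instance runs in a VM with an Agent, on top of the Fabric.]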
Using queues for reliable messaging
1) Web Role (ASP.NET, WCF, etc.) receives work
2) Web Role puts work in the Queue
3) Worker Role (main() { … }) gets work from the Queue
4) Worker Role does the work
To scale, add more of either
Queues are the application glue
• Decouple parts of the application, making them easier to scale independently
• Resource allocation: different priority queues and backend servers
• Mask faults in worker roles (reliable messaging)
Use inter-role communication for performance
• TCP communication between role instances
• Define your ports in the service model
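The web role / worker role pattern built around a queue can be sketched locally with threads. This is a stand-in for illustration only: Python's in-process `queue.Queue` plays the part of the Azure Queue, the function names are made up, and squaring a number stands in for the real work.

```python
import queue
import threading


def web_role(work_queue, requests):
    """Steps 1 and 2: receive work and put it in the queue."""
    for request in requests:
        work_queue.put(request)


def worker_role(work_queue, results, lock):
    """Steps 3 and 4: get work from the queue and do it."""
    while True:
        item = work_queue.get()
        if item is None:  # sentinel: no more work for this worker
            break
        outcome = item * item  # placeholder for the real computation
        with lock:
            results.append(outcome)


def run(requests, n_workers):
    work_queue = queue.Queue()
    results = []
    lock = threading.Lock()
    workers = [
        threading.Thread(target=worker_role, args=(work_queue, results, lock))
        for _ in range(n_workers)
    ]
    for w in workers:
        w.start()
    web_role(work_queue, requests)
    for _ in workers:
        work_queue.put(None)  # one sentinel per worker
    for w in workers:
        w.join()
    return sorted(results)


squares = run(range(10), n_workers=3)
```

Because the two roles only share the queue, changing `n_workers` scales the back end without touching the front-end code, which is the decoupling the bullet points describe.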
Points of interest
[Figure: local development workflow, at work or at home. Develop your app under local source/version control, run it against the Development Fabric and Development Storage, and iterate until the application works locally.]
What is the ‘value add’?
Provide a platform that is scalable and available
Services are always running, with rolling upgrades/downgrades
Failure of any node is expected, so state has to be replicated
Failure of a role (app code) is expected, so recovery is automatic
Services can grow to be large, so provide state management that scales automatically
Handle dynamic configuration changes due to load or failure
Manage data center hardware: from CPU cores, nodes, and racks to network infrastructure and load balancers
Key takeaways
Cloud services have specific design considerations
Always on, distributed state, large scale, fault tolerance
Scalable infrastructure demands a scalable architecture
Stateless roles and durable queues
Windows Azure frees service developers from many platform issues
Windows Azure manages both services and servers
[Figure: BLAST job architecture. Web Role: web portal, web service, job registration. Job Management Role: scaling engine, job scheduler, job registry (Azure Table). Database updating Role: NCBI databases. Workers pull tasks from a global dispatch queue. Azure Blob holds BLAST databases, temporary data, etc.]
• Always design with failure in mind
  - On large jobs it will happen, and it can happen anywhere
• Factoring work into optimal sizes has large performance impacts
  - The optimal size may change depending on the scope of the job
• Test runs are your friend
  - Blowing $20,000 of computation is not a good idea
• Make ample use of logging features
  - When failure does happen, it’s good to know where
• Cutting 10 years of computation down to 1 week is great!!
  - Little Cloud development headaches are probably worth it
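Three of these lessons (factor work into chunks, expect failure, log everything) fit in a few lines. The sketch below is a generic illustration, not the code behind the BLAST job: the helper names, the chunk size of 4, the retry limit, and the simulated failure are all assumptions.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("job")


def chunk(items, size):
    """Factor a job into tasks of at most `size` items each."""
    if size <= 0:
        raise ValueError("chunk size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_with_retry(task, chunk_items, max_attempts=3):
    """Run one task, retrying on failure and logging every attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            result = task(chunk_items)
            log.info("chunk %s done on attempt %d", chunk_items, attempt)
            return result
        except Exception as exc:
            log.warning("chunk %s failed on attempt %d: %s", chunk_items, attempt, exc)
    raise RuntimeError(f"gave up after {max_attempts} attempts")


calls = {"n": 0}


def flaky_sum(xs):
    """Stand-in for real work; the very first call simulates a node failure."""
    calls["n"] += 1
    if calls["n"] == 1:
        raise IOError("simulated node failure")
    return sum(xs)


chunks = chunk(list(range(10)), 4)
totals = [run_with_retry(flaky_sum, c) for c in chunks]
```

Trying a few values of the chunk size on a small test run is a cheap way to find a good task granularity before paying for the full job.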
Thank you!