Slide - Microsoft NT konferenca

ARCHITECTING APPLICATIONS
FOR HIGH SCALABILITY
Leveraging the Windows Azure Platform
Scott Densmore
Sr. Software Development Engineer
Microsoft patterns & practices
ABOUT YOU (AN ASSUMPTION)
• You…
• are a developer
• know C#
• have a basic understanding of Windows Azure
GOALS FOR THIS SESSION
• Learn what is available in Windows Azure to
help you build scalable systems
• (Re)-Discover helpful design patterns
• Learn about practical techniques
• Identify (and avoid) potential problems
TAILSPIN
DEMO
TailSpin Surveys
TAKE THE SURVEY
http://tailspindemo.cloudapp.net/survey/fabrikam/slovenia
Where should my application live?
LOCATION
GEO-LOCATION
WINDOWS AZURE TRAFFIC MANAGER
[Diagram: build sequence showing deployment latencies of 50 ms, 100 ms, and 200 ms]
WINDOWS AZURE TRAFFIC MANAGER
• Load balancing across multiple Hosted Services
• Integrated in the Windows Azure Platform portal
• Performance: directs the user to the best / closest deployment
• Fault Tolerance: redirects traffic to another deployment based on availability
• Round Robin: traffic routed to deployments based on a fixed ratio
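The three policies can be sketched as simple selection functions. This is a minimal Python illustration of the routing idea only, not how Traffic Manager is implemented; the deployment names, health flags, and weights are invented for the example (the latencies echo the figures on the slide).

```python
import itertools

# Hypothetical deployments: name -> (latency in ms, is_healthy, weight)
DEPLOYMENTS = {
    "north-europe": (50, True, 3),
    "west-europe": (100, True, 1),
    "east-us": (200, True, 1),
}

def pick_performance(deployments):
    """Performance: choose the healthy deployment with the lowest latency."""
    healthy = {name: d for name, d in deployments.items() if d[1]}
    return min(healthy, key=lambda name: healthy[name][0])

def pick_failover(deployments, priority):
    """Fault tolerance: choose the first healthy deployment in priority order."""
    for name in priority:
        if deployments[name][1]:
            return name
    raise RuntimeError("no healthy deployment available")

def ratio_cycle(deployments):
    """Round robin: cycle through deployments in proportion to their weights."""
    slots = [name for name, d in deployments.items() for _ in range(d[2])]
    return itertools.cycle(slots)
```

With these weights, `ratio_cycle` hands out "north-europe" three times for every one request sent to each of the other two deployments.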
WINDOWS AZURE TRAFFIC MANAGER
• Multiple factors determine DNS resolution
• Configured by Microsoft
• Geo-IP mapping
• Periodic performance measurement
• Configured by service owner
• Policy: Performance, Failover, Geo, Ratio
• Monitoring
• Currently in CTP
WINDOWS AZURE CDN
• Integrated with Storage
• Delivery from Windows Azure Compute
instances
• HTTPS support
• CTP of Smooth Streaming
LEVERAGING THE CDN
MANAGING CDN CONTENT EXPIRATION
• Default behavior is to fetch once and cache for
up to 72 hrs
• Modify cache control blob header to control the
TTL
• x-ms-blob-cache-control: public, max-age=<value in seconds>
• Think hours, days or weeks
• Higher numbers reduce cost and latency via CDN &
downstream caches
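Since the TTL is given in seconds but is best thought of in hours, days or weeks, a small helper can do the conversion. A minimal sketch, assuming the standard `max-age` directive; the function name `cache_control_value` is hypothetical, and the code only builds the header value, it does not call any storage API.

```python
from datetime import timedelta

def cache_control_value(ttl: timedelta) -> str:
    """Build a cache-control value such as 'public, max-age=604800'."""
    seconds = int(ttl.total_seconds())
    if seconds < 0:
        raise ValueError("TTL must not be negative")
    return f"public, max-age={seconds}"

# Header to send when setting the blob's properties (header name from the slide).
headers = {"x-ms-blob-cache-control": cache_control_value(timedelta(days=7))}
```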
MANAGING CDN CONTENT EXPIRATION
• Use versioned URLs to expire content on demand
• Enables easy rollback and A/B testing
<img src="http://azXXXX.vo.msecnd.net/images/logo.2011-05-29.png" />
[Diagram: versioned files logo.2011-05-01.png and logo.2011-05-29.png]
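A versioned URL of this shape can be generated rather than typed by hand. A minimal sketch, assuming the version is an ISO date placed before the file extension as in the logo example; the helper name `versioned_url` is hypothetical.

```python
from datetime import date
import posixpath

def versioned_url(base_url: str, path: str, version: date) -> str:
    """Insert an ISO date before the extension: logo.png -> logo.2011-05-29.png."""
    stem, ext = posixpath.splitext(path)
    return f"{base_url.rstrip('/')}/{stem}.{version.isoformat()}{ext}"
```

Rolling back then means pointing the page at the earlier date again; the older file is still in storage and may still be cached.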
Who is using my application?
IDENTITY
SHARED ACCESS SIGNATURES
• Provide direct access to content
• Can be time-bound or revoked on demand
• Also works for write access (e.g. user-generated
content)
SHARED ACCESS SIGNATURES
1. Bob to the service (Hosted Compute): “I am Bob & I want X”
2. Service prepares a Shared Access Signature (SAS) to X using the securely stored storage account key
3. Service returns SAS (signed HTTPS URL)
4. Bob uses SAS to access X directly from Blob Storage for reduced latency & compute load
X: non-public blob (e.g. paid or ad-funded content)
Where is the bottleneck?
BALANCING LOAD
USER SESSION
• The load balancer does not affinitize sessions
• Session in Windows Azure
• Session Providers
• SQL Azure
• Table Storage
• Windows Azure AppFabric Caching
• JavaScript on the client
• ViewState (hidden fields)
WINDOWS AZURE APPFABRIC CACHING
• Out of box ASP.NET providers for session state
& page output caching
• Extremely low latency with the local cache
• Local cache enables you to use spare available
memory in your Web tier while the Caching tier
gives you a predictable distributed cache
WINDOWS AZURE APPFABRIC CACHING
• Caches any managed object (CLR objects,
rows, XML, Binary Data…)
• Only requirement is that the object should be
serializable
• Easily integrates into existing applications
• Same managed interfaces as Windows Server
AppFabric Caching
• Secured by the Access Control Service
KEY CACHING PATTERNS
• Reference Data
• A version of the authoritative data, refreshed periodically
• Large number of accesses, mostly read
• Example – Product catalogs
• Activity-oriented Data
• Data generated as part of the app activity, typically logged back to a
backend datastore
• Needs read, write access
• Example – Shopping cart, Session State
• Resource-oriented Data
• Authoritative data, modified by transactions, temporal in nature
• Needs frequent read, limited write access
• Example – Flight Inventory, Stock Quotes
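The reference-data pattern (read-mostly, refreshed periodically) is commonly implemented as cache-aside with a time-to-live. A minimal sketch: an in-process dictionary stands in for the distributed cache, and the class name, `loader` callback, and `clock` parameter are assumptions made for the example.

```python
import time

class ReferenceDataCache:
    """Cache-aside for read-mostly data, reloaded after a fixed TTL."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader      # fetches from the authoritative store
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}         # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        now = self._clock()
        if entry is not None and now < entry[1]:
            return entry[0]        # cache hit
        value = self._loader(key)  # miss or expired: reload and store
        self._entries[key] = (value, now + self._ttl)
        return value
```

Activity-oriented and resource-oriented data need write paths as well, so a sketch for them would add an explicit put or invalidate step.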
PARTITION THE APPLICATION
• Multiple web sites
• Choose the right number of instances and
instance size
• Monitor and scale your application without
redeploying
• Use async processing (Worker Roles)
FUNDAMENTAL DESIGN PATTERN
DELAYED PROCESSING
CALCULATING SURVEY RESULTS
• Two approaches
• Retrieve all the surveys to date at a fixed time
interval, recalculate and then save the summary data
over the existing data
• Retrieve the survey data since the last time the task
ran and update the summary results
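The two approaches can be compared side by side. A minimal sketch, assuming each survey answer is reduced to a single string and the summary is a count per answer; the function names are hypothetical.

```python
from collections import Counter

def recalculate_all(all_answers):
    """Approach 1: rebuild the summary from every answer to date."""
    return Counter(all_answers)

def update_incrementally(summary, new_answers):
    """Approach 2: fold only the answers since the last run into the summary."""
    updated = Counter(summary)
    updated.update(new_answers)
    return updated
```

Both produce the same summary; the first rereads all data on every run, while the second must reliably track which answers it has already processed.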
MAP REDUCE ALGORITHM
• Original concepts come from map and reduce
functions used in functional languages (Haskell,
F#, Erlang)
• Parallelizes operations on a large dataset and
speeds up processing by using multiple compute
nodes
• Dryad is Microsoft’s implementation
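The map and reduce steps can be illustrated on survey answers. A minimal sketch, not related to Dryad's API: threads stand in for separate compute nodes, and all function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def map_partition(answers):
    """Map: count the answers in one partition, independently of the others."""
    return Counter(answers)

def reduce_counts(left, right):
    """Reduce: merge two partial counts into one."""
    return left + right

def summarize(partitions, workers=4):
    # Each partition is mapped in parallel; the partial results are then merged.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_partition, partitions))
    return reduce(reduce_counts, partials, Counter())
```

The reduce step works in any order because merging counts is associative, which is what allows the map work to be spread across nodes.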
DATA STORAGE
TAILSPIN SURVEYS DATA MODEL
SQL AZURE
• Partition (or shard) your data across databases
• Spreads load across multiple database
instances
• Avoid hitting database size limits
• Parallelized queries across more nodes
• Improved query performance on commodity hardware
• Partitioning scheme varies per data set
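One possible partitioning scheme maps each tenant to a database with a stable hash. A minimal sketch, assuming tenant-based sharding; the shard names and the `shard_for` helper are hypothetical.

```python
import hashlib

# Hypothetical database names, one per shard.
SHARDS = ["surveys-db-0", "surveys-db-1", "surveys-db-2"]

def shard_for(tenant_id: str, shards=SHARDS) -> str:
    """Map a tenant to a shard with a stable hash.

    hashlib is used because Python's built-in hash() differs between processes.
    """
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]
```

Modulo hashing reassigns many tenants when the number of shards changes; a lookup table from tenant to database avoids that at the cost of an extra read.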
SQL AZURE
[Diagram: Hosted Compute connected to separate databases for Tenant 1, Tenant 2, and Tenant 3]
TABLE STORAGE
• Don’t be afraid to de-normalize data
• Only two indexes in a table
• Partition Key
• Row Key
• They are not really tables; think of them as entity
bags (key / value storage)
PAGING WITH TABLE STORAGE
• Use the ContinuationToken along with the Take
operation in your query
• The ContinuationToken only accesses the next
page of data
• To implement forward and back you will need a
stack of ContinuationTokens
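The stack of continuation tokens can be sketched as follows. This is a simulation, not the storage client API: `query_page` stands in for a table query with Take, and an integer offset stands in for the opaque ContinuationToken; all names are hypothetical.

```python
def query_page(rows, take, token=None):
    """Stand-in for a table query: one page plus a token for the next page (or None)."""
    start = 0 if token is None else token
    page = rows[start:start + take]
    next_token = start + take if start + take < len(rows) else None
    return page, next_token

class Pager:
    def __init__(self, rows, take):
        self._rows, self._take = rows, take
        self._history = []   # stack of tokens for the pages visited so far
        self._next = None    # token for the page after the current one

    def first(self):
        self._history = [None]
        return self._load(None)

    def forward(self):
        if self._next is None:
            raise IndexError("already on the last page")
        self._history.append(self._next)
        return self._load(self._next)

    def back(self):
        if len(self._history) < 2:
            raise IndexError("already on the first page")
        self._history.pop()  # drop the current page's token
        return self._load(self._history[-1])

    def _load(self, token):
        page, self._next = query_page(self._rows, self._take, token)
        return page
```

Going back re-runs the query with the token on top of the stack, because a token only ever points forward to the next page.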
TABLE STORAGE BEST PRACTICES
• Limit large scans and expect continuation tokens
for queries that scan
• Entity Group Transaction - Batch to reduce costs
and get transaction semantics
• Do not reuse DataServiceContext across
multiple logical operations
• Discard DataServiceContext on failures
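Entity Group Transactions apply to entities that share a PartitionKey and are limited to 100 entities per batch, so writes have to be grouped before they are sent. A minimal sketch of that grouping, with entities represented as plain dictionaries; the function name `plan_batches` is hypothetical and no storage call is made.

```python
from collections import defaultdict

MAX_BATCH_SIZE = 100  # an entity group transaction accepts at most 100 entities

def plan_batches(entities):
    """Group entities by PartitionKey, then split each group into batches of up to 100."""
    by_partition = defaultdict(list)
    for entity in entities:
        by_partition[entity["PartitionKey"]].append(entity)
    batches = []
    for group in by_partition.values():
        for i in range(0, len(group), MAX_BATCH_SIZE):
            batches.append(group[i:i + MAX_BATCH_SIZE])
    return batches
```

Each batch is billed as one transaction and is atomic on its own; atomicity does not extend across batches.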
TABLE STORAGE BEST PRACTICES
• AddObject/AttachTo can throw an exception if the entity
is already being tracked
• A query throws an exception if the resource does not
exist; set IgnoreResourceNotFoundException on the DataServiceContext
BLOB STORAGE
• Blobs can be anything
• Pictures, docs, etc.
• HTML
• XML
• JSON objects
PAGING WITH BLOB STORAGE
• Each item (survey answer) is stored as a blob
(JSON) in a container
• A blob is used to maintain a list of the items
(survey answers) as they were entered by id
• Use an inverted tick count to generate the id of
the answer to make it unique and ordered
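An inverted tick count subtracts the current tick count from the maximum possible value, so that newer ids sort before older ones. A minimal sketch that mirrors .NET ticks (100-nanosecond intervals since 0001-01-01) in Python; the helper names are hypothetical, and uniqueness assumes no two answers share the same timestamp.

```python
from datetime import datetime, timedelta

_EPOCH = datetime(1, 1, 1)
MAX_TICKS = 3155378975999999999  # DateTime.MaxValue.Ticks in .NET

def ticks(moment: datetime) -> int:
    """.NET-style ticks: 100-nanosecond intervals since 0001-01-01."""
    return (moment - _EPOCH) // timedelta(microseconds=1) * 10

def inverted_tick_id(moment: datetime) -> str:
    """Zero-padded to 19 digits so string order matches numeric order."""
    return f"{MAX_TICKS - ticks(moment):019d}"
```

Listing ids in ascending order then returns the most recent answers first, which suits a "latest results" page.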
BLOB STORAGE BEST PRACTICES
• Use parallel block upload count to reduce latency when
uploading a blob
• Client Library uses a default timeout of 90s – use a
size-based timeout
• Snapshots – For block or page reuse, issue block and
page uploads in place of UploadXXX methods in Storage
Client
BLOB STORAGE BEST PRACTICES
• Shared Access Signature
• Use container level policy as it allows revoking permissions
• Share SAS URLs using HTTPS
• Create new containers for blobs like log files that have
a retention period
• Delete logs after 1 month - create new containers every month.
• Container recreation
• Garbage collection of a deleted container can take time; until it
completes, a container with the same name cannot be created.
• Use unique names for containers
RESOURCES
• Books
• http://wag.codeplex.com
• Products
• http://www.microsoft.com/windowsazure
• http://research.microsoft.com/en-us/projects/Dryad/
• Me
• scottden@microsoft.com
• @scottdensmore
• http://scottdensmore.typepad.com
QUESTIONS?
After the session please fill in the questionnaire.
Questionnaires will be sent to you by e-mail and will be available in the profile section of the
NT Conference website www.ntk.si.
Thank you!