SVC32: Lessons Learned: Building Scalable

[Chart: Fans, 0 to 400,000]
RiskMetrics RiskBurst™
RiskMetrics Group
Offers industry-leading products and services in the disciplines of risk management, corporate
governance and financial research & analysis
Scaling on-premise computation to the Cloud
Integration of RiskMetrics' extensive on-premise capability with Windows Azure
We are running 2,000 instances on Windows Azure
We have plans to use 10,000+ instances in 2010
What are RiskMetrics doing with so much computing power?
Calculation of financial risk
Simulate scenarios for the movement of market factors over time & price financial assets in those
scenarios
Notoriously complex – can involve Monte Carlo² for complex asset classes of the kind that triggered the 'credit crunch'
Results in very high computational loads for RiskMetrics
Daily risk analysis load equivalent to calculating risk on 4 trillion US Stocks
Computational loads are characterised by high demand peaks
Strong growth trend in calculation complexity
Peak Load Characteristics
Growth trend in calculation complexity
[Chart: Maximum Complexity of Risk Analysis Processing Request, in relative equity-equivalent units (log scale), 1994–2008. Risk problem complexity has doubled every 6 months, while processor power (Moore's Law) doubles every 2 years.]
Analytics Architecture: Large-Scale Data-Dependent Processing vs. Distributable Work Packets
[Diagram: Market and Pricing Data feeds RiskServers performing Scenario Generation and Aggregation – these services depend on high-speed access to large-scale data stores and caches, including a Velocity Scenario Cache. A Load Balancer distributes self-contained Work Packets to Pricer instances for Scenario Pricing.]
Work Packet Example:
Pricing request for a Mortgage Backed Security
Compute time: 150 ms to 30 s
Analytics Architecture: Integration of Cloud Resources?
[Diagram: the same architecture with many additional Pricer instances behind the Load Balancer for Scenario Pricing, showing where cloud resources could be integrated; Scenario Generation and Aggregation remain dependent on high-speed access to large-scale data stores and the Velocity Scenario Cache.]
RiskBurst™ Project Timeline
March – June: Project conception; choice of platform
July – August: Initial MSFT meetings; RiskMetrics joins TAP
September – October: TAP team actively involved in architecture decisions; engineering work on scaling proof of concept; deep-dive sessions
November – December: Large-scale testing with test load (200–2,000 nodes); 'industrialisation' of the architectural pattern; run in parallel with the in-house solution
Q1 2010: Production; large-scale UAT using the load application; complete work on operational integration
RiskBurst™
An architectural pattern for large scale computational applications
Architectural Pattern
Building large scale computation requires careful design
Problem: Need to avoid the Von Neumann Bottleneck
Keywords: Reason and Instrument
No changes to the application
Run on-premise on HPC Server or in cloud on Azure
Pattern has end-to-end decoupling
Horizontal scaling of decoupled components (see the code sketch after the diagram)
[Diagram: multiple Workload Generation components feed multiple Messaging & Storage components, which in turn feed multiple Computational Resources & Application components – each decoupled tier scales horizontally.]
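To make the decoupling concrete, a minimal C# sketch follows; the names (WorkPacket, IWorkQueue, WorkloadGenerator, PricerWorker) are illustrative rather than the actual RiskBurst™ types. The workload generator and the compute worker share only a message contract and a queue abstraction, so each tier can be scaled horizontally and backed either by on-premise messaging or by Azure queues.

```csharp
// Sketch of the decoupled pattern: workload generation, messaging & storage,
// and computational resources share only a message contract and a queue
// abstraction. All names are illustrative.
using System.Collections.Generic;

public class WorkPacket
{
    public string Id { get; set; }
    public string ScenarioKey { get; set; }   // reference to shared data, not the data itself
    public string AssetPayload { get; set; }  // self-contained pricing request
}

public interface IWorkQueue
{
    void Enqueue(WorkPacket packet);
    WorkPacket TryDequeue();                  // returns null when the queue is empty
}

// Workload generation tier: produces self-contained packets.
public class WorkloadGenerator
{
    private readonly IWorkQueue _queue;
    public WorkloadGenerator(IWorkQueue queue) { _queue = queue; }

    public void Submit(IEnumerable<WorkPacket> packets)
    {
        foreach (var p in packets) _queue.Enqueue(p);
    }
}

// Computational tier: any number of these can run against the same queues,
// on-premise (HPC Server) or in the cloud (Azure worker roles).
public class PricerWorker
{
    private readonly IWorkQueue _input;
    private readonly IWorkQueue _output;
    public PricerWorker(IWorkQueue input, IWorkQueue output)
    {
        _input = input; _output = output;
    }

    public void ProcessOne()
    {
        var packet = _input.TryDequeue();
        if (packet == null) return;           // nothing to do right now
        // ... price the asset under each scenario ...
        _output.Enqueue(packet);              // result travels back the same way
    }
}
```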
RiskBurst™ Workflow: Windows Azure & HPC Server
[Diagram: Scenario Generators make WCF requests to the RiskBurst™ Server, where a Workload Receiver performs batching and sending onto input message queue(s) in Windows Azure. Worker Output Monitoring reads the output message queue(s) in Windows Azure and returns WCF responses (or WCF error responses) to the callers; a Sweeper handles outstanding request timeouts.]
Windows Azure Storage Component Usage
[Diagram: the RiskBurst Server writes jobs to a set of input Azure Queues ("to do" jobs) and to input Blob storage, with support files also held in Blob storage. Worker Role instances read from the input queues, copy support files and data into local storage, and write results to a set of output Azure Queues ("job done") and to output Blob storage. A worker-loop sketch follows.]
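A sketch of the worker-side loop implied by the diagram, assuming the 2009-era Microsoft.WindowsAzure.StorageClient library; the queue names and the Price placeholder are illustrative.

```csharp
// Sketch of a worker role instance consuming the input queue and producing to
// the output queue. Queue names are illustrative and assumed to already exist.
using System.Threading;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public class WorkerLoop
{
    public static void Run(string connectionString)
    {
        var account = CloudStorageAccount.Parse(connectionString);
        var queues = account.CreateCloudQueueClient();
        var input = queues.GetQueueReference("input");
        var output = queues.GetQueueReference("output");

        while (true)
        {
            var message = input.GetMessage();          // one queue item = one batch of work packets
            if (message == null)
            {
                Thread.Sleep(1000);                    // simple pause; a fuller back-off policy is sketched later
                continue;
            }

            string result = Price(message.AsString);   // run the pricing work for the batch
            output.AddMessage(new CloudQueueMessage(result));
            input.DeleteMessage(message);              // delete only after the result is safely queued
        }
    }

    private static string Price(string batch)
    {
        // Placeholder for the unchanged analytics application (WCF call into the pricer).
        return batch;
    }
}
```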
Mapping to the Azure Environment
Visual Studio 2008 Azure development SDK mimics cloud
Mix code running locally in dev with cloud resources such as blob storage or queues
Good for features, does not assist with scale
Existing 32-bit .NET C++/CLI application with 3 third-party libraries
Initial idea – run directly in a web role – but the application is 32-bit(!)
Run within worker role
Preserve WCF interface – no changes whatsoever to analytics app
Only changes to existing code base are:
Retrieve cash-flow library support files from Blob storage on demand (see the sketch below)
Some diagnostic information added
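The on-demand retrieval of support files could look like the following sketch, again assuming the StorageClient API; the "support" container name and local cache path are illustrative.

```csharp
// Sketch: fetch a cash-flow library support file from blob storage only if it is
// not already present in the role's local storage. Container name and local path
// are illustrative assumptions.
using System.IO;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public static class SupportFiles
{
    public static string EnsureLocal(string connectionString, string fileName, string localDir)
    {
        string localPath = Path.Combine(localDir, fileName);
        if (File.Exists(localPath))
            return localPath;                               // already cached on this node

        var account = CloudStorageAccount.Parse(connectionString);
        var container = account.CreateCloudBlobClient().GetContainerReference("support");
        container.GetBlobReference(fileName).DownloadToFile(localPath);   // pull from blob storage on demand
        return localPath;
    }
}
```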
Getting to Cloud Resources: Bandwidth & Latency
Problem: Bandwidth to Azure gateway limited by Internet
Solution: pass by reference & blobs
Replace pass-by-value calls with pass-by-reference
Create key for scenario
Large, repeated objects (scenarios) pushed to blob storage
WCF call contains only key
Each of the 1,000 scenarios is used for all assets
Problem: Communications Latency
Within data centre, 20ms latency on WCF call through HPC SOA platform
Queues and Blob storage are off-device; engineering must respect this!
Work packet: 200 ms computation
Solution: batch requests within input queues
But this means more simultaneous work requests (threads outstanding on input); see the sketch below
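A minimal sketch of the pass-by-reference approach under the same StorageClient assumption: the scenario set is uploaded to blob storage once under a key, each WCF request carries only that key, and workers resolve and cache the scenarios by key. The ScenarioStore type and container name are illustrative.

```csharp
// Sketch: large, repeated scenario data is pushed to blob storage once and then
// referenced by key in every pricing request, so only small messages cross the
// Internet link. Names and the "scenarios" container are illustrative.
using System.Collections.Generic;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public class ScenarioStore
{
    private readonly CloudBlobContainer _container;
    private readonly Dictionary<string, string> _cache = new Dictionary<string, string>();  // per-node cache (not thread-safe in this sketch)

    public ScenarioStore(string connectionString)
    {
        var account = CloudStorageAccount.Parse(connectionString);
        _container = account.CreateCloudBlobClient().GetContainerReference("scenarios");
    }

    // Client side: upload the serialized scenario set once; the key is embedded in WCF calls.
    public string Publish(string scenarioKey, string serializedScenarios)
    {
        _container.GetBlobReference(scenarioKey).UploadText(serializedScenarios);
        return scenarioKey;
    }

    // Worker side: resolve the key, caching locally so each node downloads the set only once.
    public string Resolve(string scenarioKey)
    {
        string scenarios;
        if (!_cache.TryGetValue(scenarioKey, out scenarios))
        {
            scenarios = _container.GetBlobReference(scenarioKey).DownloadText();
            _cache[scenarioKey] = scenarios;
        }
        return scenarios;
    }
}
```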
Utilizing Cloud Resources: Generating Load
Problem: Generating Load for Cloud Resources
Threading architecture
Workload originally generated by synchronous calls in client
Number of outstanding pricing requests = nodes x batch size
Implies large number of threads in wait states in scenario generators
Work request made asynchronous
RiskBurst™ Server Logic
Creates a balanced workload – uses a work item’s average run time
Made calls to RiskBurst™ Server asynchronous
Incoming calls create a batch entry synchronously with the request
A map is created from message id to wait handles
When the batch is full, it is sent on to the Azure input queue
A sweeper thread gathers up output messages and uses the map to associate them with wait handles (see the sketch below)
Scales well to over 1000 simultaneous requests per RiskBurst™ Server
Horizontal scale of RiskBurst™ Servers – each creates own input queue
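A sketch of the batching logic described above, with illustrative names and queue access abstracted away: callers block on a wait handle keyed by message id, full batches are flushed to the input queue, and the sweeper thread completes requests as results arrive.

```csharp
// Sketch of the RiskBurst™ Server batching logic. All names are illustrative;
// a real implementation would also flush partial batches on a timer and apply
// the outstanding-request timeout handled by the sweeper.
using System.Collections.Generic;
using System.Threading;

public class BatchingServer
{
    private readonly object _sync = new object();
    private readonly int _batchSize;
    private List<string> _currentBatch = new List<string>();
    private readonly Dictionary<string, ManualResetEvent> _waiters = new Dictionary<string, ManualResetEvent>();
    private readonly Dictionary<string, string> _results = new Dictionary<string, string>();

    public BatchingServer(int batchSize) { _batchSize = batchSize; }

    // Called for each incoming pricing request; blocks the caller until the result arrives.
    public string Submit(string messageId, string workPacket)
    {
        var handle = new ManualResetEvent(false);
        List<string> toSend = null;
        lock (_sync)
        {
            _waiters[messageId] = handle;                  // map message id -> wait handle
            _currentBatch.Add(workPacket);
            if (_currentBatch.Count >= _batchSize)
            {
                toSend = _currentBatch;                    // batch is full: hand it off
                _currentBatch = new List<string>();
            }
        }
        if (toSend != null) SendToInputQueue(toSend);      // push the batch to the Azure input queue

        handle.WaitOne();                                  // woken by the sweeper thread
        lock (_sync)
        {
            string result = _results[messageId];
            _results.Remove(messageId);
            _waiters.Remove(messageId);
            return result;
        }
    }

    // The sweeper thread calls this for every message read from the output queue.
    public void OnResult(string messageId, string result)
    {
        lock (_sync)
        {
            _results[messageId] = result;
            _waiters[messageId].Set();                     // release the waiting caller
        }
    }

    private void SendToInputQueue(List<string> batch)
    {
        // Placeholder: serialize the batch and add it to the Azure input queue.
    }
}
```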
Horizontal Scaling within the Cloud
Problem: Saturation behaviour of queues
Can create situation where queues are saturated, made worse by retry logic
Complexity due to varied processing time
Controller will move busy queues to independent hardware
Use an exponential back-off algorithm (sketched below)
Batch work items for each queue read or write (using 10 work packets per queue item)
Amortizing the cost of IO against CPU time is key
Batch sizes need to be big enough to occupy the CPU for long enough without swamping the queues
Also, more work items per queue item means fewer queue hits
But larger batches imply more simultaneous outstanding connections on the client side
Variable run-time of assets – from 150 ms to 30 seconds
Carry out processing concurrently with queue access
Pushing IO onto background threads is critical (the writes and the deletes are independent
background tasks)
On-node caching within worker role to avoid queue reads
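A sketch of the worker's polling loop combining the two ideas above – exponential back-off when the input queue is empty, and batches of roughly 10 work packets per queue item; the constants and delegate signatures are illustrative.

```csharp
// Sketch of the polling loop: exponential back-off when the input queue is empty,
// and each queue item carries a batch of work packets so the cost of queue IO is
// amortized against CPU time. Constants and names are illustrative.
using System;
using System.Collections.Generic;
using System.Threading;

public class PollingWorker
{
    private const int MinDelayMs = 100;
    private const int MaxDelayMs = 30000;

    // tryDequeueBatch returns null when the queue is empty; processBatch prices the packets.
    public void Run(Func<List<string>> tryDequeueBatch, Action<List<string>> processBatch)
    {
        int delayMs = MinDelayMs;
        while (true)
        {
            List<string> batch = tryDequeueBatch();
            if (batch == null)
            {
                Thread.Sleep(delayMs);                        // back off while there is no work
                delayMs = Math.Min(delayMs * 2, MaxDelayMs);  // exponential back-off, capped
                continue;
            }

            delayMs = MinDelayMs;                             // work found: reset the back-off
            processBatch(batch);                              // ~10 packets per queue item keeps the CPU busy
        }
    }
}
```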
Exception Management in Distributed Applications
Keep it simple
Large distributed system implies need to engineer robustness to failure
Distinguish between random, unpredictable events and poison-message failures
Do not over-engineer efficient handling of occasional exceptions
Return exceptions to client application
Client can track the number of attempts to process a work item
Distinguish poison messages and give up (see the sketch below)
Parallel handling on HPC Server SOA platform
Complexity from varying message processing times
Time-outs can be caused by several long-running pricings in the same job
Retry time-outs by re-sending all pricings in the batch independently
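A sketch of the client-side retry policy described above; the attempt threshold and type names are illustrative.

```csharp
// Sketch: exceptions come back to the client, which tracks attempts per work item
// and gives up once a threshold is reached. Threshold and names are illustrative.
using System.Collections.Generic;

public class RetryPolicy
{
    private const int MaxAttempts = 3;
    private readonly Dictionary<string, int> _attempts = new Dictionary<string, int>();

    // Returns true if the work item should be retried, false if it is treated as poison.
    public bool ShouldRetry(string workItemId)
    {
        int count;
        _attempts.TryGetValue(workItemId, out count);
        count++;
        _attempts[workItemId] = count;
        return count < MaxAttempts;        // give up on repeatable ("poison") failures
    }
}
```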
Diagnostics and Run-time Monitoring
A challenge for large scale applications, even more so for Cloud
Logging and monitoring must be switchable so as to reduce overhead (see the sketch after this list)
Variable level of diagnostics and logging
Requirement to filter information through decoupled architecture (on node; centralized in Azure;
returned to client)
Key data for architectural pattern
Request and result queue; successful/unsuccessful read, write and delete; time taken for all
operations
Count of 'gets' against an empty request queue
Count of successful/unsuccessful work packets
% Processor Time performance counter
Cache misses
We utilized a custom-built solution during TAP
Nodes broadcast over service bus
Clients subscribe to trace messages
New diagnostic & monitoring package provides platform support
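A minimal sketch of switchable diagnostics using a standard .NET TraceSwitch; this stands in for, rather than reproduces, the custom service-bus broadcaster used during TAP. The switch name is illustrative.

```csharp
// Sketch of switchable diagnostics: a TraceSwitch lets verbose logging be turned
// down in production to reduce overhead. The switch name and messages are
// illustrative, not the RiskBurst™ monitoring package itself.
using System.Diagnostics;

public static class RiskBurstTrace
{
    // Level is set in configuration (e.g. <system.diagnostics> switches in app.config).
    private static readonly TraceSwitch Switch =
        new TraceSwitch("RiskBurstTrace", "RiskBurst diagnostics level");

    public static void Info(string message)
    {
        if (Switch.TraceInfo)
            Trace.WriteLine(message);          // e.g. successful/unsuccessful queue operations
    }

    public static void Verbose(string message)
    {
        if (Switch.TraceVerbose)
            Trace.WriteLine(message);          // e.g. per-packet timings, cache misses
    }
}
```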
Final Comments
Integrating on-premise and cloud applications
Production Services across On-Premise and Cloud
Operational Integration
Fully integrate Windows Azure capabilities with RiskMetrics Operational Infrastructure
Provisioning plus diagnostic & monitoring packages
“Outside-In” Services
Control and visibility of services in the cloud consistent with on-premise services.
Resource View
Nodes
Queues
Blob Stores
Process View
Throughput & Performance
Traceability
Problem identification
Process linkage (intra- & inter-cloud)
Binding SLA Commitments
Operational Support Escalation
RiskBurst™ on Windows Azure
Effective architectural pattern delivers key business benefits
Elastic scaling
Enhanced services
Empowered innovation
High reliability
Improved agility
Windows Azure was an obvious choice of cloud platform
Minimize impedance mismatch between on-premise and off-premise
.NET/WCF/HPC SOA in data center extended to cloud
Configure to run in either environment
Familiar development environment
Massive scalability
View of Azure as extension of OS into Cloud
Undertake work with HPC Server Team in 2010
Ability to target either Azure-hosted WCF services or HPC Server-hosted WCF services in a seamless manner
Synchronization of on-premise Velocity instance with Azure instance
Acknowledgements
Prototype Development:
Stuart Hartley (University of York, UK)
Simon Davies (TAP Programme)

Production Development Team:
Rich Bower (Team Lead)
Kelly Crawford (RiskBurst Server/Client)

Supporting Cast:
Alistair Beagley (DPE / Azure)
Patrick Butler Monterde (TAP Programme)
Azure Product Group (Hoi Vo, Brad Calder, Tom Fahrig, Joe Chau)
Hunter Cadzow & Analytics Development at RiskMetrics
Simon Davies (TAP Programme)
Jonathan Blair (Microsoft Consulting)
Tom Stockdale (RiskMetrics CTO)
channel9.msdn.com/learn
Built by Developers for Developers….
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market
conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT
MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.