Analytics, Data and the Time for High Performance Computing in Financial Markets

Brian Sentance, Chief Executive Officer, Xenomorph

Published: November 2007

Abstract

This white paper describes the current drivers behind the need for greater data analytical power within financial markets, the business and technical challenges these drivers imply, and how the Microsoft® High Performance Computing (HPC) platform can move financial institutions closer to the goal of real-time pricing, trading and risk management across asset classes. Specifically, the paper illustrates many business scenarios present in financial markets that may benefit from the calculation scalability and fault-tolerance of Microsoft HPC, and how Microsoft is enabling greater productivity in financial markets by bringing this compute power closer to the traders, risk managers and researchers that need it.

This is a preliminary document and may be changed substantially prior to final commercial release of the software described herein. The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication. This white paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation. Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

© 2007 Microsoft Corporation. All rights reserved. Microsoft, Active Directory, Excel, Visual Basic, Visual Studio, Windows, Windows Server, and the Windows logo are trademarks of the Microsoft group of companies. All other trademarks are property of their respective owners.

Contents

Data and Analytics in Financial Markets
High Performance Computing Background
Microsoft® and High Performance Computing
Introducing the Microsoft HPC Platform
Microsoft® Windows® Compute Cluster Server 2003
Driving the Need for HPC - Regulation
Basel I, II and Risk Management
Sarbanes Oxley and Spreadsheets
UCITS III, Derivatives and Structured Products
RegNMS and Data Volumes
MiFID, Data Volumes and Data Analysis
Driving the Need for HPC - Competition
Algorithmic Trading, Speed and Data Volumes
Decimalization and US Trading Volumes
Credit Derivatives and Product Complexity
Structured Products Growth in the US
Liability Driven Investment (LDI) and Derivatives
Hedge Fund Valuation and Risk Management
Where to Apply HPC in Financial Markets
Backtesting, Data Mining and Complexity in Algorithmic Trading
New Derivative Products from Quantitative Research
Product Control for New Types of Derivative
Derivatives Trading
Portfolio Optimization
Risk Management
Putting the Productivity in HPC – Some Practical Examples
Introducing High Productivity Computing
Productivity Example #1 – Windows CCS applied to Excel® 2007
Productivity Example #1(a) – Excel-WCCS Adapter for Legacy XLLs
Productivity Example #2 – Windows CCS and Legacy VBA Code
Productivity Example #3 – Windows CCS and MATLAB
Productivity Example #4 – A Windows CCS “Call to Action” for Systems Developers
Summary
Appendix 1 - URL Links for More Information
Appendix 2 - Improvements in Microsoft® Office Excel® 2007
Contributors and Acknowledgments

Chris Budgen, Chief Technical Architect, Xenomorph
Michael Newberry, Product Manager HPC, Microsoft
Jeff Wierer, Senior Product Manager HPC, Microsoft
Antonio Zurlo, Technology Specialist HPC, Microsoft

Data and Analytics in Financial Markets

Financial markets are going through interesting times. Usually the market experiences more than enough activity when one area of regulation is changing or one area of the market is going through rapid expansion. The current business and technical issues faced by financial markets are at extraordinary levels, driven by a multitude of factors spanning both market growth and regulatory change. Those institutions that win will be those that can manage: (i) higher data volumes, more quickly; (ii) greater data complexity, more easily; and (iii) more challenging data analysis, more productively.

All of these challenges in data and analytics management drive the need for greater compute power within financial markets. In order to ensure productive use of this compute power, it is essential that it becomes easily accessible to all practitioners within financial institutions, from traders and risk managers to IT development staff, regardless of whether they are spreadsheet users or involved in the low-level programming of high performance trading applications. This is one of the key design goals of the Microsoft® High Performance Computing (HPC) platform. Microsoft believes that the Microsoft HPC platform is an enabling technology for financial markets institutions that wish to be first to market with innovative new financial products, risk management techniques and trading strategies. The time for high performance computing is now, as institutions must face the challenging journey towards real-time pricing, trading and risk management across asset classes.

High Performance Computing Background

So firstly, let’s take a step back and ask the question: just what is High Performance Computing? Fundamentally, HPC is concerned with running applications and business processes faster and more reliably than ever before. In more detail, HPC is a branch of computer science that studies system designs and programming techniques to extract the best computational performance out of microprocessors. Whilst the origins of HPC are to be found in the specialized supercomputers built for the US government in the 1960s, in recent years distributed (aka parallel) computing using commodity hardware has come to dominate this field.

The simple idea of distributed computing within HPC is to take a calculation, break it into smaller chunks and run these smaller sub-calculations in parallel on multiple processors and servers in order to reduce the overall calculation time. For example, applying one hundred processors in parallel to a calculation has the theoretical potential to reduce overall calculation time by a factor of one hundred. Add to this compute power the ability to re-run and recover from hardware or software failures within the “cluster” of parallel computers, and it becomes easy to understand why HPC technology is so appropriate to the challenges currently faced in financial markets.
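To make this "divide and conquer" idea concrete, the minimal sketch below (Python, purely for illustration; the payoff function and chunking scheme are invented for the example and are not part of any Microsoft API, which instead exposes this capability through Windows CCS, Visual Studio and MS-MPI) splits a simple Monte-Carlo estimate into independent chunks, farms them out to a pool of worker processes, and combines the partial results.

```python
# Illustrative sketch only: split one large Monte-Carlo job into independent
# chunks, run the chunks in parallel, then combine the partial results.
import random
from multiprocessing import Pool

def simulate_chunk(args):
    """Run one chunk of the simulation: average payoff over n_paths random draws."""
    n_paths, seed = args
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        x = rng.gauss(0.0, 1.0)          # one simulated market move
        total += max(x, 0.0)             # toy payoff for illustration
    return total / n_paths

if __name__ == "__main__":
    n_workers, paths_per_chunk = 8, 250_000
    chunks = [(paths_per_chunk, seed) for seed in range(n_workers)]
    with Pool(n_workers) as pool:
        partials = pool.map(simulate_chunk, chunks)   # chunks run in parallel
    estimate = sum(partials) / len(partials)
    print(f"Combined estimate from {n_workers} parallel chunks: {estimate:.4f}")
```

The same pattern scales from the cores of a single machine to the nodes of a compute cluster: the work units are independent, so adding processors reduces elapsed time roughly in proportion, subject to scheduling and communication overheads.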
Microsoft and High Performance Computing

Microsoft has set out to make high performance computing available to all, and nowhere is this goal needed more than in financial markets. In summary, Microsoft HPC technology has been designed to:

- Leverage existing user skills such as familiarity with Microsoft® Windows® and Windows-based applications, Microsoft® Excel® being one example.
- Build upon existing .NET developer skills through tight integration of HPC functionality within Microsoft® Visual Studio®.
- Enable business user productivity through easy-to-use GUI applications for cluster management and job scheduling.
- Provide extensive command line interfaces for power users and developers that need more flexibility and control.
- Offer support for HPC industry standard interfaces such as MPI (Message Passing Interface) for clustering.
- Provide an easy to program job scheduler API for integrating desktop applications directly with the cluster.
- Offer integrated cluster setup, deployment and monitoring capabilities.
- Build upon Microsoft® Active Directory® to offer powerful, built-in cluster security management functionality.

Combining these goals with Microsoft’s partners’ ability to deliver out-of-the-box business functionality based on Microsoft HPC technology means that high performance computing has the potential to go mainstream within financial markets.

Introducing the Microsoft HPC Platform

The Microsoft HPC Platform is fundamentally based upon Windows Compute Cluster Server 2003 (WCCS), which in turn is composed of two components:

- Windows Server® 2003 Compute Cluster Edition (CCE)
- Windows® Compute Cluster Pack (CCP)

Windows CCE is Microsoft’s 64-bit operating system offered specifically for HPC deployment. It offers the power and integration of the Windows Server family at a value point designed for distributed, multiple-server computing. Windows CCP is Microsoft’s 64-bit cluster management suite, offering hitherto unavailable levels of cluster deployment, access and controllability for high performance computing.

In addition to Windows CCS, we need to add two further components to complete the full picture of the Microsoft HPC Platform offering:

- Microsoft Visual Studio
- Microsoft .NET Framework

Microsoft Visual Studio is the most advanced and widely used integrated development environment (IDE) on the market, and has a wealth of features designed to ensure it is the IDE of choice for developing and debugging distributed HPC software. The Microsoft .NET Framework should need no introduction as Microsoft’s strategic managed code programming model for Windows and web application development and deployment.

Microsoft Windows Compute Cluster Server 2003

In this introductory section, let’s spend a little more time looking at the core member of the Microsoft HPC Platform, Windows Compute Cluster Server 2003 (WCCS). Windows CCS provides an integrated application platform for deploying, running, and managing high performance computing applications. For customers who need to solve complex computational problems, Windows CCS accelerates time-to-insight by providing an HPC platform that is easy to deploy, operate, and integrate with your existing infrastructure. Windows CCS allows researchers to focus on research and financial analysts on analysis, with minimal IT administration. Windows CCS operates on a cluster of servers that includes a single head node and one or more compute nodes.
The head node controls and mediates access to the cluster resources and is the single point of management, deployment, and job scheduling for the compute cluster. Windows CCS uses the existing corporate Active Directory infrastructure for security, account management, and overall operations management using tools such as Microsoft Operations Manager 2005 and Microsoft Systems Management Server 2003.

For more detailed information please visit:
http://www.microsoft.com/windowsserver2003/ccs/technology.aspx
http://www.microsoft.com/windowsserver2003/ccs/techresources/default.mspx

Driving the Need for HPC - Regulation

So what is driving this growth towards higher volumes of data, greater complexity and faster analysis, and ultimately the need for high performance computing in financial markets? Regulatory compliance is a major factor behind many market developments, and wherever you look regulators require institutions to manage and analyze more data, ever more quickly. More detailed explanations of the regulations that are taking us towards financial markets that need more compute power, and greater accessibility to it, are outlined below.

Basel I, II and Risk Management

Basel I and II are industry-wide banking regulations defining capital adequacy requirements for banks based around the concepts of market risk, credit risk and operational risk. Basel I and its subsequent 1996 amendment were the drivers behind the use of Value at Risk (VaR) to determine the degree of market risk being undertaken. This calculation is typically data- and calculation-intensive, from the perspectives of both multi-factor market simulation and full revaluation of derivative and fixed income instruments. As such, VaR is ideally suited to the application of HPC in order to enable more detailed scenario and sensitivity analysis and calculations within shorter timeframes. Basel I was also one of the drivers behind the growth of the credit markets, where regulatory weaknesses in Basel I were exploited to free up regulatory capital using credit derivatives. Basel II (effective 2008 worldwide) focuses on adding credit risk and operational risk to the Basel I framework, and brings with it further computational challenges in extending risk methodologies to credit risk management and promoting further innovation in structured credit derivative instruments. HPC offers a technology framework where the need to re-price ever more complex financial instruments can be successfully matched against the compute power needed to achieve this goal in a meaningful business timeframe.

Sarbanes Oxley and Spreadsheets

Sarbanes Oxley is a US federal law passed in response to a number of corporate and accounting scandals in the US, and it enforces higher reporting responsibilities and standards upon any corporation whose securities are publicly traded in the US. Given the ubiquitous usage of spreadsheets in the creation of company reports and accounts, Sarbanes Oxley has been one of the main drivers behind the establishment of “Spreadsheet Management” as a mainstream sector of the software market. Whilst Sarbanes Oxley is not specifically targeted at institutions within financial markets, its implications apply equally to any corporation operating in and around the US financial markets.
Adding further and very specific weight to the business need for spreadsheet management within financial markets is the operational risk associated with the overuse of spreadsheets by traders. Traders typically use spreadsheets as the main tool for pricing, hedging and risk management when derivative products are too complex for mainstream systems to handle. In addition to the direct operational risk of traders reporting on the value of their own positions, regulators can insist on much higher regulatory capital levels if a bank’s risk management systems are not capable of analyzing and reporting upon the more exotic products within a portfolio. Here the Microsoft HPC platform can once again add value, both from the perspective of increasing data transparency for regulators by using the server-based spreadsheet technology found in Excel Services, and by using HPC behind Excel to enable closer to real-time calculation of trading opportunity and risk exposure.

UCITS III, Derivatives and Structured Products

The original UCITS (Undertakings for the Collective Investment of Transferable Securities) directive was intended by the European Union (EU) to create a single European market and regulatory framework for fund management, allowing investment management companies to “passport” between different member states. In many ways its “passporting” goals were similar to those of the more recent MiFID directive from the EU (see below). Whilst its latest incarnation, UCITS III, continues with this aim in the “Management Directive” side of its scope, the “Product Directive” has enabled the increased usage of derivatives and complex structured products by fund management institutions. UCITS III expands the range and type of financial instruments permitted by the original 1985 directive to include:

- Transferable securities and money market instruments.
- Bank deposits.
- Units of other investment funds.
- Financial derivative instruments.
- Index tracking funds.

Overall, a UCITS fund can have up to 20% of its NAV invested in OTC derivatives and up to 35% if exchange traded derivatives are used. The leverage potential of this change is enormous, as is the attractiveness to fund managers of being able to change fund risk/return profiles, particularly in times when stock market performance has been highly variable. These factors have contributed to growth in equity derivatives, credit derivatives and structured products on the buy and sell side, with the competitive drive for product innovation requiring the pricing of instruments of greater complexity. The advantages of using these more complex products do not, however, come without cost. In this regard, both buy-side and sell-side institutions must put the tools in place that enable their technology to scale up along with business need. The Microsoft HPC platform can fulfill this need, enabling reliable and rapid valuation and re-pricing of complex instrument portfolios.

RegNMS and Data Volumes

The Regulation National Market System (RegNMS) aims to promote “best execution” and greater market transparency within US equity markets. The four main components of RegNMS are the:

- Order Protection Rule – preventing an order being executed in one market center at a price inferior to the protected best bid or offer displayed in another (the “trade-through” rule).
- Access Rule – requiring fair, non-discriminatory access to displayed quotations across market centers.
- Sub-Penny Pricing Rule – preventing practitioners from quoting in fractions of a penny to step ahead of existing buy and sell limit orders.
- Market Data Rule – more transparent allocation of market data revenues, combined with the freedom for broker-dealers and market centers to distribute their own data.

The implications of these rules are wide and, although effective October 2007, are still subject to much debate. However, given that aspects of the regulation are concerned with market transparency, it will inevitably increase the need to manage and analyze more data sourced from a multitude of trade execution venues. Whilst many of the preceding examples have been concerned with the pricing of more complex instruments using ever more complex mathematical techniques, here the computational challenge is based upon the sheer volume of data to be analyzed in timeframes that are meaningful from a business perspective. HPC is not the complete answer to these data analysis problems, but the issue is now migrating from a heavily data-centric problem to one that focuses on making more sense of the data through more detailed and faster analysis.

MiFID, Data Volumes and Data Analysis

The Markets in Financial Instruments Directive (MiFID) issued by the European Union (effective November 2007) will be a great catalyst for change in European financial markets. Like RegNMS, MiFID is designed to increase market transparency and focuses on ensuring “best execution” of financial instrument transactions in the EU. Unlike RegNMS, MiFID also has to deal with enforcing market transparency across the twenty-seven member states, and in this way could be considered a more ambitious and far-reaching regulation, the full effects of which are yet to be fully determined. The regulation aims to allow investment firms to “passport” across member states (see the earlier section on UCITS III) as a single regulatory framework is created across the EU. It encourages the creation of alternative trading venues (such as Project Turquoise, for example), which will increase the number of venues from which data may need to be sourced to obtain a full market “picture” of instrument prices and liquidity. Pre-trade and post-trade reporting requirements will additionally put pressure on data management activity at investment institutions. MiFID also has execution of OTC derivatives within its scope, and so more complex products are also in scope for the increasing number of institutions that trade them. Increased data volumes and data complexity look set to be a key implication of MiFID implementation, presenting further challenges to existing data and analytics management technology and processes. Enabling both business users and technologists alike to face these future challenges with confidence is a key design aim of the Microsoft HPC platform.

Driving the Need for HPC - Competition

Whilst regulation increasingly requires institutions to analyze more data more quickly, intense commercial competition is the profit-driven motivation for many institutions to look at high performance computing. The commercial rewards for bringing a new product or service to market quicker than the competition are enormous, and wherever calculation time is a delaying factor, the use of HPC will follow.
Combine this decreased time-to-market with improved end-user productivity and operational efficiency, and the case for deployment of the Microsoft HPC Platform is a strong one, as outlined by the commercial drivers described below.

Algorithmic Trading, Speed and Data Volumes

The algorithmic trading market has undergone extraordinary expansion in the US over recent years, and adoption of this approach to trading is increasing dramatically in Europe and Asia. The days when individual traders would attempt to optimally fill a client buy or sell order by manually scheduling out trades during a day are numbered. Clients are now able to see better order execution outcomes through the use of computer automated or “algorithmic” trading techniques, and the banks and brokers are able to dispense with the cost of expensive trading staff. Additionally, the banks and brokerage firms are now “productizing” the algorithms they have developed as competitive differentiators for their buy-side clients. This market has become a competitive battle around the speed (latency) of execution and the sophistication of the algorithms used. These trading algorithms have given rise to greater trading volumes, as more trades are implemented in smaller transaction amounts across multiple trading venues. Changes in exchange quoting methods (penny quotes, as described below) have also driven this growth in trade volumes. The addition of regulations such as RegNMS in the US and MiFID in the EU has also driven the growth in algorithmic trading activity, as the search for price transparency and liquidity causes algorithmic trading products to search across a greater number of trade execution venues.

Whilst algorithmic trading is ordinarily concerned with the execution of client orders, the broader category of automated trading is also a driver of growth in trading volumes, where traders and investors trade on their own account (i.e. using the organization’s own funds, not driven by client orders). Here traders go in search of “Alpha” from intraday opportunities, often based around techniques such as statistical arbitrage. As equity markets have matured, fewer opportunities for arbitrage trading now exist using daily/end-of-day data, hence the increasing interest in intraday tick data. Additionally, the increase in algorithmic and automated trading has also opened the door to the concept of algorithms that compete against each other, either in terms of better execution or by profiting directly from knowledge of what algorithms are being used in the market.

Decimalization and US Trading Volumes

Whilst the RegNMS regulation addresses sub-penny pricing, it is worth a brief mention of decimalization (penny pricing) of market quotes, and how this has been a catalyst for increased trading volumes in the US. Decimalization in the US equity markets since the year 2000 has contributed greatly to a reduction in trading spreads and has also contributed to an increase in trading volumes, coupled with (and driving) the growth in algorithmic and automated trading.
This pattern is currently being repeated in US exchange-traded option markets, where the recent change to penny quotes (for example, the February 2007 move to penny quotes on OPRA) will increase trading volumes, increase algorithmic trading activity in these markets and increase the amounts of data that need to be analyzed.

Credit Derivatives and Product Complexity

The credit markets have also undergone rapid expansion over recent years, with issuance rising from $180 billion in 1996 to an estimated $20 trillion in 2006 according to the British Bankers Association. Prior to the creation of the credit derivatives markets in the early 1990s, it was very difficult for banks to reduce or offload the credit risk associated with their loan portfolios (i.e. the very real risk that some loans will not be repaid due to company bankruptcy and other circumstances). The credit derivatives markets and the products within them have allowed credit risk to become a tradable commodity, and hence credit risk can be better distributed across the financial markets from highly leveraged institutions (banks) to better-capitalized businesses such as investment institutions and corporations. As mentioned earlier, regulatory arbitrage has also played its part in the growth of this market. The benefits and pitfalls of the advent of the credit derivatives markets are much debated; however, a continued feature of this market is product innovation, as quantitative researchers create ever more complex products and structures. One of the fundamental and enabling building blocks of the market (and still accounting for around a third of the market’s size) is the Credit Default Swap (CDS). In overview terms, a CDS is a financial contract in which the protection buyer pays a fee in return for compensation should a third party default on its payments on some referenced debt obligation. Other more complex products, such as Collateralized Debt Obligations (CDOs) and CDO-squared instruments, can require extensive use of Monte-Carlo simulation methods, since the pricing techniques in this market are new and immature when compared to other better-understood asset classes. Gone are the days when a credit derivatives desk could simply acquire a new server to apply to each new computationally intensive trade undertaken. As trading desks compete for more business within the credit derivatives market, scalability across business processes will be a key success factor. The Microsoft HPC platform is one of the key technologies that will enable scalability and faster pricing of both existing and new types of exotic credit derivatives.

Structured Products Growth in the US

Given the effects of UCITS III on structured products trading in Europe described in the previous section, it is also worth mentioning that structured products are finally experiencing very strong growth in the United States (U.S.). U.S. markets and investors have always been very equity focused, and Europe has led the growth in structured products because of the greater taxation, regulatory and market complexity of the European market. Even so, growth of U.S. structured products issuance from $50 billion in 2005 to around $100 billion in 2006 means that more complex products are here to stay in the US, with all of the complexity of data management and data analysis that constant product innovation implies.
Again, with its computational scalability, reliability and easy integration with Microsoft Excel and Microsoft’s development tools, the Microsoft HPC Platform is the technology to accelerate the management of structured products.

Liability Driven Investment (LDI) and Derivatives

Another factor that has contributed to the increased use of derivatives by asset management institutions is the emergence of “Liability Driven Investment” (LDI) as a mainstream technique for pension fund management. As a relatively recent refinement of Asset and Liability Management (ALM), investment managers are now offering pension funds an LDI service that focuses on better matching of pension fund assets to member liabilities. As described in more detail below, LDI demands the kind of increased compute power available from the Microsoft HPC platform. In a traditional scenario, a pension fund’s asset profile might well be invested primarily in equities, leading to a large exposure to equity market risk. This risk profile is not well matched against the liability profile of long-term cash payments made to retired members of the fund, which is exposed to different risk factors such as interest rates, inflation and longevity. Within LDI, asset management institutions are now competing with the sell-side banks in offering investment overlay services, where the asset risk profile is overlaid with derivative products such as equity, fixed-income and inflation derivatives. In addition to pricing and hedging the derivatives used in LDI, a further computationally intensive stage is the optimization of the asset portfolio against the client’s liability benchmark.

Hedge Fund Valuation and Risk Management

As investment management institutions search for greater sources of trading “Alpha”, investment by fund managers and pension funds in hedge funds has become much more mainstream. Many of the investment and trading strategies employed by hedge funds involve the use of derivatives and more complex investment strategies to generate Alpha, as well as to seek returns that are less correlated with mainstream market indices. The desirability of these more complex strategies and the acceptance of hedge funds as a clearly defined asset class are in turn driving the need for better risk management and derivatives valuation on the buy-side.

Where to Apply HPC in Financial Markets

Now that we have examined some of the drivers behind the growth in data volumes, complexity and speed of analysis, let’s get into more detail on where Microsoft HPC can be applied within financial markets.

Backtesting, Data Mining and Complexity in Algorithmic Trading

Strategists involved in automated and algorithmic trading usually have no shortage of trading ideas that they would like to try out, against a background where they have very little time to develop and backtest them. Even once an algorithm is found that has some potential, it is often the case that the algorithm needs to be re-written and re-tested within an automated trading environment that is production worthy. All of these processes take time, and once again time to market is a key success factor in this rapidly evolving market sector.
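To illustrate why backtesting workloads suit a compute cluster so well, the sketch below (Python, illustrative only; the moving-average crossover strategy, parameter grid and function names such as backtest_params are invented for the example and are not part of any Microsoft or Xenomorph product) scores many parameter combinations of a simple strategy over a synthetic price history. Each combination is evaluated independently, which is exactly the kind of work unit a cluster can fan out across many nodes.

```python
# Illustrative sketch: a backtest parameter sweep is naturally parallel -
# each parameter combination can be scored independently on a separate processor.
from itertools import product
from multiprocessing import Pool
import random

def moving_average(prices, window):
    return [sum(prices[i - window:i]) / window for i in range(window, len(prices) + 1)]

def backtest_params(args):
    """Score one (fast, slow) moving-average crossover combination on a price series."""
    prices, fast, slow = args
    fast_ma = moving_average(prices, fast)[slow - fast:]   # align with the slow series
    slow_ma = moving_average(prices, slow)
    pnl = 0.0
    for i in range(1, len(slow_ma)):
        position = 1 if fast_ma[i - 1] > slow_ma[i - 1] else -1      # long or short signal
        pnl += position * (prices[slow + i - 1] - prices[slow + i - 2])  # next-bar P&L
    return (fast, slow, pnl)

if __name__ == "__main__":
    rng = random.Random(42)
    prices = [100.0]
    for _ in range(2000):                        # synthetic price history for the example
        prices.append(prices[-1] + rng.gauss(0, 1))
    grid = [(prices, f, s) for f, s in product(range(5, 30, 5), range(40, 120, 20)) if f < s]
    with Pool() as pool:                         # each combination runs independently
        results = pool.map(backtest_params, grid)
    best = max(results, key=lambda r: r[2])
    print(f"Best (fast, slow) = ({best[0]}, {best[1]}), PnL = {best[2]:.2f}")
```

On a cluster, each node would receive a slice of the parameter grid rather than a process on one machine, but the structure of the problem is the same.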
Additionally, as the automated trading market develops further, the algorithms used are becoming more complex, combining the real-time statistical analysis of real-time, intraday and historic data. This complexity is only set to increase, especially as new asset classes fall within the grasp of trading automation, implying that real-time derivatives and fixed income pricing will soon be required within all algorithmic trading solutions. Whilst latency is the current key technology metric for algorithmic trading, as algorithms and the asset classes traded by them become more complex, compute power will be fundamental to success in this market.

Against this background of increasing algorithmic complexity sits the growing difficulty of both managing and analyzing vast volumes of stored real-time data. According to recent estimates by the Tabb Group, markets world-wide generate around 3.5 terabytes of real-time data per day, leading to databases at many institutions that are currently many tens of terabytes in size and which will soon expand far beyond these levels. Many institutions now looking at their huge investment in databases of tick data are finding that workstation-bound analysis of these vast datasets by traders is simply not feasible and limits the kinds of analysis that can be done. The question seems to be heading towards: “Well, we have all the data we need, so what shall we do with it now?”. This data analysis challenge demands scalable, fault-tolerant server-side tools that deliver compute power to traders, without sacrificing the traders’ ability to try new ideas and strategies quickly and easily. The Microsoft HPC Platform can deliver this compute power to where it is needed, for practitioners and technologists alike. Faster parameter optimization, faster backtesting, faster re-pricing of derivatives and faster data analysis are all areas that the Microsoft HPC Platform can deliver upon, enabling faster time to market for new automated trading strategies.

New Derivative Products from Quantitative Research

The speed with which a new financial product can come to market is a key competitive factor for many derivative trading desks. Shorter time to market means higher margins can be charged, as market demand can be met in the absence of strong price competition. A reputation for successful innovation also assists in driving market share through perceived “ownership” of expertise in a particular product area. Even for those products that are more widespread, the ability to quote a price to the client faster than the competition can only increase the chances of success. Quant and derivative research teams are at the front line of this battle of innovation, under constant pressure from trading desks to deliver pricing models for new products in timescales that are invariably short and very challenging. Whilst it may be the case that an elegant “closed form” solution can ultimately be found for a derivative product, it is highly likely that for new and complex derivative products financial mathematicians will have to apply numerical methods in order to value the security. Ultimately, faster numerical methods, such as tree-based solutions, may be found for the valuation problem. However, it is highly likely that either the initial valuation (or at the very least the validation) of the new derivative product will be done as some form of Monte-Carlo simulation, a technique that lends itself well to “parallelization” with the Microsoft HPC Platform.
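To illustrate why Monte-Carlo valuation parallelizes so well, the sketch below (Python, illustrative only; the instrument, parameters and helper names are invented for the example and nothing here represents a Microsoft or vendor API) prices a simple European call option by simulating lognormal terminal prices in independent batches, each of which could equally run on a separate compute node, and then averages the discounted payoffs.

```python
# Illustrative sketch: Monte-Carlo pricing of a European call option.
# Each batch of paths is independent, so batches can be distributed across a cluster.
import math
import random
from multiprocessing import Pool

def price_batch(args):
    """Average discounted payoff over one independent batch of simulated paths."""
    n_paths, seed, s0, strike, rate, vol, maturity = args
    rng = random.Random(seed)
    drift = (rate - 0.5 * vol * vol) * maturity
    diffusion = vol * math.sqrt(maturity)
    total = 0.0
    for _ in range(n_paths):
        s_t = s0 * math.exp(drift + diffusion * rng.gauss(0.0, 1.0))  # terminal price
        total += max(s_t - strike, 0.0)                               # call payoff
    return math.exp(-rate * maturity) * total / n_paths

if __name__ == "__main__":
    s0, strike, rate, vol, maturity = 100.0, 105.0, 0.05, 0.2, 1.0
    n_batches, paths_per_batch = 8, 200_000
    batches = [(paths_per_batch, seed, s0, strike, rate, vol, maturity)
               for seed in range(n_batches)]
    with Pool(n_batches) as pool:
        batch_prices = pool.map(price_batch, batches)   # batches priced in parallel
    print(f"Monte-Carlo call price: {sum(batch_prices) / n_batches:.4f}")
```

A genuinely new exotic product would replace the simple payoff and path generation with the researcher's model, but the batching and recombination pattern stays the same, which is why adding processors translates so directly into shorter validation cycles.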
Product Control for New Types of Derivative

Ensuring that the pricing and sensitivity outputs of a pricing model for a new and potentially exotic instrument are accurate before it is traded is a key concern for Product Control departments at many sell-side institutions. These concerns arise from the possible trading losses that may ensue if an instrument is mispriced, from the need to check that an instrument can be hedged and risk managed, and from the need for regulators to see that process is being followed and that unnecessary operational risk is not being taken. Time to market is key for new types of derivative and structured products, as profit margins are often determined by who else in the market can offer a similar service and whether or not competition on price comes into play. Hence the quicker the process of bringing new products to market, the greater the opportunities for increased profitability. One of the main techniques used by quantitative analysts within product control is Monte-Carlo simulation, used to verify independently that the trading desk’s pricing method is both accurate and robust. If this process of model validation can be speeded up linearly through the use of the Microsoft HPC Platform to distribute the compute load across many machines, then time to market is reduced and profit margins will rise. As a final thought for the future, it probably will not be long before product control needs to be involved in the validation of algorithmic trading “products” before they are released to market, with all of the backtesting and data analysis that this process implies.

Derivatives Trading

Once an instrument has been priced and a deal done with a counterparty, it is then down to the derivatives trader to manage and hedge each instrument position within the context of a wider portfolio of derivatives, underlying securities and hedge instruments. Traders must consider both position risk and un-hedged exposure at both a single instrument and a portfolio level, under a wide variety of “what-if” scenarios. Given more (and easier) access to more computing power, traders would be able to revalue their portfolios and analyze more complex market scenarios in shorter periods of time. In this scenario, utilizing the Microsoft HPC Platform to deliver this compute power to traders would enable more robust post-trade hedging, plus faster analysis of the pre-trade market opportunities available.

Portfolio Optimization

Portfolio optimization has been a major subject within academia for a very long time; however, its practical application has never quite met its true potential, as market realities have often interfered with the elegance of some of the mathematical techniques applied. Fixed transaction costs, non-fractional positions and other matters causing mathematical discontinuities have always forced practitioners down the route of more expensive and difficult optimization techniques, such as integer quadratic programming and the like. The advent of Liability Driven Investment and other techniques involving the use of derivative products has added further non-linearity and discontinuities to the mathematical and computational problems of applied optimization. Whilst some of these problems will remain just out of reach due to their computational intractability, the Microsoft HPC Platform delivers the compute performance that will enable practitioners to face more difficult problems with confidence, allowing faster optimization runs and more sophisticated modeling.
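As a small illustration of why these discontinuities push practitioners towards brute computational force, the sketch below (Python, illustrative only; the assets, returns, costs and scoring function are invented for the example) searches over whole-lot allocations for a toy three-asset portfolio, scoring each candidate on expected return less a simple risk penalty and a fixed per-trade cost. Because every candidate is scored independently, this kind of discontinuous search is another naturally parallel workload.

```python
# Illustrative sketch: a brute-force search over integer lot allocations,
# the kind of discontinuous problem that defeats simple closed-form optimizers.
from itertools import product

expected_return = [0.08, 0.05, 0.11]     # toy annual returns per asset
variance        = [0.04, 0.01, 0.09]     # toy variances (correlations ignored here)
lot_value       = 10_000                 # each lot is a fixed, non-fractional amount
fixed_cost      = 50.0                   # fixed transaction cost per asset traded
budget_lots     = 20                     # total lots available to invest
risk_aversion   = 2.0

def score(lots):
    """Expected return minus a simple risk penalty minus fixed per-trade costs."""
    value = sum(n * lot_value for n in lots)
    ret = sum(n * lot_value * r for n, r in zip(lots, expected_return))
    risk = risk_aversion * sum((n * lot_value) ** 2 * v
                               for n, v in zip(lots, variance)) / max(value, 1)
    costs = fixed_cost * sum(1 for n in lots if n > 0)
    return ret - risk - costs

# Every whole-lot allocation that spends the full budget; each scores independently.
candidates = [lots for lots in product(range(budget_lots + 1), repeat=3)
              if sum(lots) == budget_lots]
best = max(candidates, key=score)
print(f"Best allocation (lots per asset): {best}, objective = {score(best):.2f}")
```

Real LDI or benchmark-relative problems have far larger candidate spaces and richer objectives, which is precisely where spreading the evaluation across a cluster pays off.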
Risk Management

Given the economic pain associated with having high regulatory capital reserves enforced upon them, it is unsurprising that the calculation of enterprise Value at Risk (VaR) is one of the key responsibilities of risk management departments within sell-side institutions. VaR can be defined as the level of portfolio loss that would not be expected to be exceeded, under normal market conditions, over a defined time horizon for a given level of probability. For example, under Basel I the VaR calculation is defined as the loss that would only be exceeded one time in one hundred (99% VaR) over a ten-day investment horizon. Taking a different example, if we had a portfolio of financial instruments worth $500 million with a 95% VaR level of $40 million over a one-day timeframe, then only one day in twenty would we expect to suffer losses greater than $40 million.

The two main techniques used to estimate VaR are the historical simulation and Monte-Carlo simulation methods. Historical VaR requires that a period of history is chosen whose set of returns will “represent” a distribution or scenario set of market conditions in the future. Under each scenario, defined by current market conditions perturbed by each historical set of changes, the profits and losses for the current portfolio are calculated. Monte-Carlo VaR is similar, except that Monte-Carlo simulation is used to model the future market scenarios rather than using a period of history to directly model market movements. However, Monte-Carlo VaR requires a multi-factor random simulation of market variables, and in this regard historic data is still required in order to calculate the correlation matrices necessary to drive the simulation. The addition of derivative products to the portfolio of instruments to be analyzed adds further complication, as the whole portfolio may need to be fully revalued for each simulation run. This could result in multiple Monte-Carlo simulations (to value the complex instruments in the portfolio) being run within each VaR scenario (the scenarios themselves being generated by Monte-Carlo methods).

Both methods are computationally intense, but are “embarrassingly parallel” (i.e. calculations that can easily be split and distributed within an HPC environment). The parallel nature of these calculations, along with the fact that many are performed within Microsoft Excel, lends itself well to implementation using the Microsoft HPC Platform, with the aim of spreading this large compute load across multiple servers and processors in order to reduce calculation time. Monte-Carlo simulation also requires correlation matrices to be pre-calculated in addition to performing the actual simulation runs themselves.
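As a worked illustration of the historical simulation method described above, the short sketch below (Python, illustrative only; the two-asset portfolio and return history are synthetic) revalues a portfolio under each historical scenario and reads the 95% one-day VaR off the resulting loss distribution. In a real implementation, each scenario revaluation, particularly full revaluation of derivative positions, is an independent unit of work that can be spread across a compute cluster.

```python
# Illustrative sketch: one-day historical simulation VaR for a toy two-asset portfolio.
# Each scenario revaluation is independent, so scenarios can be distributed across a cluster.
import random

def historical_var(positions, prices, return_history, confidence=0.95):
    """Portfolio loss not expected to be exceeded at the given confidence level."""
    base_value = sum(q * p for q, p in zip(positions, prices))
    losses = []
    for scenario in return_history:                       # one scenario per historical day
        shocked = [p * (1.0 + r) for p, r in zip(prices, scenario)]
        scenario_value = sum(q * p for q, p in zip(positions, shocked))
        losses.append(base_value - scenario_value)        # positive number = loss
    losses.sort()
    index = int(confidence * len(losses)) - 1             # e.g. the 95th percentile loss
    return losses[index]

if __name__ == "__main__":
    rng = random.Random(7)
    positions = [1_000, 5_000]                            # units held in each asset
    prices = [250.0, 40.0]                                # current prices
    # 500 days of synthetic daily returns standing in for real market history
    history = [[rng.gauss(0.0, 0.015), rng.gauss(0.0, 0.02)] for _ in range(500)]
    var_95 = historical_var(positions, prices, history, confidence=0.95)
    print(f"95% one-day historical VaR: ${var_95:,.0f}")
```

With derivatives in the portfolio, the cheap linear revaluation inside the loop becomes a full pricing call, often itself a Monte-Carlo simulation, which is what turns an overnight batch into a multi-hour run and makes distribution across many processors so attractive.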
VaR is usually calculated as part of an end-of-day batch process. Whilst initial implementations may run in an hour or so, it is often the case that with portfolio growth an “overnight” process quickly evolves into something that may run for many hours, challenging the concept of overnight reporting and certainly not allowing any time for re-runs should there be any process failures or strange results needing further investigation. This pressure on the calculation time for VaR will only increase, as the growth in automated trading will eventually require institutions to move towards real-time risk management. Faster, real-time risk management will be the challenge of the future, a challenge that can be met using the compute scalability and fault-tolerance of the Microsoft HPC Platform.

Putting the Productivity in HPC – Some Practical Examples

Introducing High Productivity Computing

One of the key design aims of the Microsoft HPC platform is to make high performance computing available to both business users and technologists alike, and in so doing help to increase the productivity of all users. From a trading and risk management perspective, business users will be able to work on a Windows desktop, using familiar Windows applications such as Excel, scaling seamlessly across a Windows cluster. All this can be done using a single set of security credentials and without requiring any additional skills or training to benefit from the power of distributed computing. From a technologist viewpoint, the Microsoft HPC platform offers integrated access to high performance computing using the Microsoft Visual Studio 2005 IDE, the tightly integrated security and management features mentioned above, powerful command line scripting, and conformance to parallel computing industry standards such as MPI with the Microsoft Message Passing Interface (MS-MPI) and a parallel debugger. Time to market is a key factor within financial markets, and in this way the Microsoft HPC Platform has been designed to address the problem of HPC deployment, management and usage from both a business and technical viewpoint, not just a computational perspective. Given this focus on delivering the complete solution, Microsoft suggests that “High Productivity Computing” is both a valid and valuable alternative definition to consider when talking about why financial markets institutions need HPC.

Productivity Example #1 – Windows CCS applied to Excel 2007

Whilst we have seen some of the business cases for using HPC in financial markets in preceding sections, it is worth stating that the majority of these business cases share an additional common theme: many of them will be front-ended (or at least prototyped) with the “lingua franca” of financial markets, Microsoft Office Excel. The tight integration that is possible between Windows CCS and Microsoft Office Excel 2007 has the potential to “democratize” access to high performance computing. Through spreadsheet access, high performance computing can be delivered directly to the trading, portfolio and risk managers who need it.

Microsoft Office Excel has become a mission-critical application for most financial markets organizations, where business users find that the spreadsheet is the only environment that can quickly bring together position data, complex instrument data, market data and pricing models, and at the same time support the most complex analysis under a wide variety of market scenarios. Whilst Excel has been an overwhelming success in financial markets, these same business users have been pushing Excel to the limit and demanding enterprise performance and reliability.
In summary, this increasingly mission-critical usage of Excel requires:

- Guaranteed execution of mission-critical calculations.
- Improved performance by completing parallel iterations on models or long-running calculations.
- Improved productivity of employees and resources by moving complex calculations off desktops.
- Transparent access to centralized spreadsheet reports and analysis.

Excel add-ins are the primary way in which derivative and fixed income pricing models are deployed and used in financial markets. Complex spreadsheets used to price derivatives, simulate market movements or optimize portfolio composition are compute-intensive and, as we have seen, it would invariably be desirable to greatly speed up such calculations. Given the new multi-threaded calculation engine of Excel 2007 (see Appendix 2 for more detail), traders and risk managers will notice dramatic calculation speed improvements, as Excel can now distribute its cell calculation load in parallel across the available cores and processors on the machine it is running on. In addition, the greatly improved memory-management features of Excel 2007 will ensure that less time is taken accessing physical disk storage for data-intensive calculation runs.

There are many cases, however, where the parallelization of the calculation load on one machine will not be sufficient. In these scenarios, Excel 2007 can be combined with Windows Compute Cluster Server in order to deliver calculation scalability and fault tolerance across many machines containing multiple processors and processor cores. In order to achieve this, user-defined functions can be created and installed on 64-bit servers running WCCS. Using the new multi-threaded calculation engine of Excel 2007, simultaneous calls can be made to the cluster servers to perform remote calculations. Since calculations may be performed in parallel by multiple servers, many complex spreadsheet calculations can be performed far quicker than before whilst, at the same time, the load on the local client machine can be significantly reduced. Hence traders and risk managers can achieve faster time to market with new ideas, and at the same time not bring their local desktop to a standstill as a big spreadsheet is calculated.

With this solution, organizations are able to move formulas out of the Excel workbooks and store them on a set of servers. Additionally, since the processing is moved off the desktop, organizations have the option to lock down access to the formulas used in the calculations and just provide user visibility of the results. This scenario requires creating a local client User-Defined Function (UDF) as an XLL that will schedule jobs on the head node via the web services API. Additionally, a server version of the UDF (or its functional equivalent) will need to be created which will live on the cluster servers. The “server” UDF will perform the calculations and return the results back to Excel.
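The split between the client-side UDF and its server-side counterpart can be sketched conceptually as below. This is pseudocode-style Python for illustration only: submit_to_head_node, price_instrument_server and PRICE_INSTRUMENT are hypothetical placeholders, not the actual Windows CCS job scheduler or Excel XLL interfaces, which are exposed through the CCP APIs and the XLL C API respectively.

```python
# Conceptual sketch only - a real implementation would be an XLL (C/C++) on the
# client and a worker component deployed to the WCCS compute nodes; the function
# names below are hypothetical placeholders, not part of any Microsoft API.

def price_instrument_server(instrument, market_data):
    """Server-side UDF equivalent: runs on a compute node and does the heavy work."""
    # ... full revaluation, Monte-Carlo simulation, etc. ...
    return 42.0  # placeholder result

def submit_to_head_node(function_name, arguments):
    """Hypothetical stand-in for scheduling a job on the cluster head node."""
    # In reality the client XLL would call the cluster's job scheduling interface here,
    # and the head node would dispatch the work to an available compute node.
    return price_instrument_server(*arguments)   # simulated locally for illustration

def PRICE_INSTRUMENT(instrument, market_data):
    """Client-side UDF as seen from the spreadsheet, e.g. =PRICE_INSTRUMENT(...)."""
    # The client function does no heavy computation itself; it forwards the request
    # to the cluster and returns the result to the calling cell.
    return submit_to_head_node("price_instrument_server", (instrument, market_data))

if __name__ == "__main__":
    print(PRICE_INSTRUMENT("ExoticSwap_123", {"curve": "USD-LIBOR"}))
```

The point of the pattern is that the spreadsheet user sees an ordinary worksheet function, while the formula logic lives, and can be locked down, on the cluster.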
Productivity Example #1(a) – Excel-WCCS Adapter for Legacy XLLs

As mentioned in Example #1 above, the integration of Windows CCS with Excel 2007 requires some coding work to build new “server-side” UDF functions and to wrap existing legacy add-ins (often developed as XLLs) so that they can be accessed in Windows CCS from Excel. As a result of its desire to make the application of the Windows HPC Platform open and productive for all users, Microsoft has developed a tool called the Excel-WCCS Adapter. The Excel-WCCS Adapter allows a developer to automatically create a cluster-enabled add-in from an existing XLL, and provides at runtime a system for the distributed execution of XLL function calls on the cluster, typically resulting in accelerated performance for supported scenarios.

The primary benefits of the Excel-WCCS Adapter include:

- No custom development required to adapt existing XLL libraries for compute cluster deployment, provided the XLL UDFs meet the deployment criteria.
- The source code for the original XLL is not required, as the automated proxy generation process analyzes the public interface of the XLL binary component itself to create a compatible compute cluster proxy component.
- Multi-user support, where many different users can each run their own spreadsheets and call the same server-side UDFs on the compute cluster server.
- Multi-XLL support, where each spreadsheet calls UDFs in multiple XLL libraries, each of which can be distributed to the server.
- Mixed deployment (i.e. spreadsheets with both local and remote XLLs are also supported).

It should be pointed out that certain deployment criteria must be met by the legacy XLL before this tool can be applied to it. Those readers interested in reading more about this productivity initiative for Excel add-ins should follow the product information links in Appendix 1.

Productivity Example #2 – Windows CCS and Legacy VBA Code

When designing Excel Services, Microsoft took the decision that “unmanaged” (non-.NET) code was not appropriate for secure, server-based spreadsheet access. Whilst the technical reasons for this approach are understandable and valid, it does leave a productivity gap for those users that are dependent on functionality contained within “unmanaged” Visual Basic® for Applications (VBA) code. Whilst the Microsoft HPC platform allows for many configurations of high performance clustering behind Excel 2007, it is possible and practical to combine legacy versions of Excel and Excel spreadsheets with Windows CCS. In this scenario, the legacy version of Excel would be installed on each compute node of the Windows CCS cluster. An application or command-line script would then marshal which jobs (calculations within spreadsheet workbooks) need to be undertaken by Excel within the cluster, and where the results are to be placed. Obviously, this approach is only appropriate where the desired calculation can be parallelized. However, this is often the case where VBA code has been used to both control and implement Monte-Carlo simulations for risk management and pricing purposes.
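A minimal sketch of the marshalling script described above is given below (Python with the pywin32 COM bindings, purely for illustration; the workbook path, macro name and result range are hypothetical). Each compute node would run a script of this kind against its assigned workbook, with the cluster's job scheduler deciding which workbooks go to which node.

```python
# Illustrative sketch: drive a legacy Excel workbook calculation from a script,
# the pattern a compute node could use when a VBA-based model is farmed out by
# the cluster's job scheduler. Workbook path, macro and range names are hypothetical.
import win32com.client  # pywin32 COM bindings; requires Excel installed on the node

def run_workbook_job(workbook_path, macro_name, result_range):
    excel = win32com.client.Dispatch("Excel.Application")
    excel.Visible = False
    excel.DisplayAlerts = False
    try:
        workbook = excel.Workbooks.Open(workbook_path)
        excel.Run(macro_name)                       # run the legacy VBA simulation
        result = workbook.Worksheets("Results").Range(result_range).Value
        workbook.Close(SaveChanges=False)
        return result
    finally:
        excel.Quit()

if __name__ == "__main__":
    value = run_workbook_job(r"C:\jobs\mc_var_chunk_01.xls",
                             "RunMonteCarloChunk", "B2")
    print(f"Result returned by workbook job: {value}")
```

Because each workbook run is self-contained, splitting a large Monte-Carlo simulation into several workbook "chunks" and recombining the results afterwards follows directly from this pattern.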
Productivity Example #3 – Windows CCS and MATLAB

The MathWorks is one of the world’s leading developers of technical computing and Model-Based Design software for risk managers and quants in financial markets. With an extensive product set based on MATLAB® and Simulink®, The MathWorks provides software and services to solve challenging problems and accelerate innovation in financial markets. MATLAB and Simulink enable risk managers and quants to concentrate on problem solving instead of spending their time programming. The MATLAB development environment lets you interactively analyze and visualize data, develop algorithms, and manage projects. With the Distributed Computing Toolbox and the MATLAB Distributed Computing Engine, users can tackle even larger problems than before by exploiting the computing power of a group of networked computers managed by Windows CCS.

Productivity Example #4 – A Windows CCS “Call to Action” for Systems Developers

Given the levels of Excel usage within financial markets, it is perhaps unsurprising that some of the preceding examples have focused on the use of Windows CCS with Excel. This also reflects Microsoft’s belief that Excel is one of the main ways in which business users in financial markets can begin to harness and benefit from the computational power of HPC. However, Excel is only one way of accessing Windows CCS, and in this regard Microsoft has implemented very tight integration of Windows CCS with the Microsoft Visual Studio 2005 IDE. The .NET Framework and features such as parallel debugging of distributed calculations can help systems developers and independent software vendors to deliver HPC-enabled applications faster than ever before. As we have seen, the market need for HPC-enabled applications is there, and Microsoft is making this transition to distributed architecture computing as simple and straightforward as possible.

Summary

Regulation and competition are currently driving extraordinary levels of change and innovation within financial markets. The ability to manage higher volumes of data, greater complexity of data and faster analysis of data will be crucial in determining which organizations succeed in the markets of the future. High Performance Computing technology will play a key part in delivering the compute power to deal with this data and analytics revolution, and the Microsoft HPC Platform has been designed to maximize the productivity of users in meeting this challenge. Leveraging familiar operating system, application and programming resources, and combining them within one fully integrated management and security environment for clustering, means that the Microsoft HPC Platform is an enabling technology in the journey towards real-time pricing, trading and risk management across asset classes.
Appendix 1 - URL Links for More Information

Product Information
http://www.microsoft.com/hpc
http://www.microsoft.com/hpc/financial

Client Case Studies
http://www.microsoft.com/casestudies/casestudy.aspx?casestudyid=4000000030
http://www.microsoft.com/casestudies/casestudy.aspx?casestudyid=53917

Other Whitepapers
Overview of Windows Compute Cluster Server 2003
http://download.microsoft.com/download/9/e/d/9edcdeab-f1fb-4670-8914c08c5c6f22a5/HPC_Overview.doc
Windows Compute Cluster Server 2003 Reviewer’s Guide
http://www.microsoft.com/windowsserver2003/ccs/reviewersguide.mspx
Deploying and Managing Windows Compute Cluster Server 2003
http://go.microsoft.com/fwlink/?LinkId=55927
Improving Performance in Microsoft Office Excel 2007
http://msdn2.microsoft.com/en-us/library/aa730921.aspx
Spreadsheet Compliance in the 2007 Microsoft Office System
http://download.microsoft.com/download/8/d/7/8d7ea200-5370-4f23-bdcaca1615060ec4/Excel%20Regulatory%20White%20Paper_Final0424.doc
Excel Services Technical Overview
http://office.microsoft.com/search/redir.aspx?AssetID=XT102058301033&CTT=5&Origin=HA102058281033

Blogs
http://blogs.msdn.com/hpc/
http://blogs.msdn.com/excel
http://blogs.msdn.com/cumgranosalis
http://windowshpc.net/

Appendix 2 - Improvements in Microsoft Office Excel 2007

Microsoft has made significant investments towards the improvement of Excel in the 2007 release. With Excel 2007, users will experience a redesigned interface that makes it easier and faster to create spreadsheets that match the growing needs of the market. Here are some of the key features of Excel 2007 that can help to greatly improve the data analysis capabilities of spreadsheet users and add-in developers:

- Expanded capacity for spreadsheets. Previous releases of Excel were limited to 256 columns by 65,536 rows. Users can now create spreadsheets with columns going to XFD (that’s 16,384 columns) and up to 1,048,576 rows.
- Maximized memory usage. Previous releases of Excel could only take advantage of 1 GB of memory. That limit has been removed, and Excel can now take advantage of the maximum amount of memory addressable by the installed 32-bit version of Windows.
- A multi-threaded calculation engine. By default, Excel 2007 will maximize the utilization of all CPUs on a machine. For customers running multi-core, multi-CPU, or hyper-threaded processors, linear improvements can be seen in performance, assuming parallelization of calculations and the use of components and environments that are multi-thread capable.
- Continued compatibility with existing Excel XLL add-ins, with the addition of an extended Excel XLL C API to fully support the functionality outlined above, plus new features such as support for strings up to 32,767 characters in length.

For more information on Excel 2007, visit http://office.microsoft.com/excel and, for more detail on developing add-ins for Excel 2007 using the updated Excel XLL C API, see http://msdn2.microsoft.com/en-us/library/aa730920.aspx