Grid Based Integration of Real-Time Value-at-Risk (VaR) Services

Paul Donachy, Daniel Stødle, Terrence J. Harmer, Ron H. Perrott
Belfast e-Science Centre, www.qub.ac.uk/escience

Brian Conlon, Gavan Corr
First Derivatives plc, www.firstderivatives.com

[some names here]
DataSynapse, Inc., www.datasynapse.com

Version 0.4

Abstract

The workings of the financial sector are largely cyclical, based on approximately 8-hour sessions that overlap as worldwide activity starts in Asia, moves to Europe and finishes in North America. In each of these regions comprehensive risk assessment calculations are performed. One such assessment is Value-at-Risk (VaR), a statistical measure initially developed by J. P. Morgan. VaR calculations are highly computationally intensive, involving vast numbers of financial derivatives calculations, and the financial sector depends heavily on such measures to gain a competitive advantage. At present these tasks are performed on-site in distributed environments. Despite advances in distributed technology, a number of shortcomings exist in current practice. Grid computing presents an alternative technology that provides shared access to heterogeneous resources in a secure, reliable and scalable manner. This paper outlines the shortcomings in current practice and proposes a grid-based, service-oriented framework providing secure, reliable and scalable access to core VaR services. Such a framework would allow the currently active time zone to utilise unused resources in the other global locations. Finally, some initial experimental results from the eScience RiskGrid project are presented.

1 Introduction

The requirements and demands of the financial markets are largely cyclical, based on approximately 8-hour sessions that overlap as worldwide activity starts in Asia, moves to Europe and finishes in North America. As a result of the stock market activity in these regions, comprehensive risk assessments are performed on share portfolios, using stock market transactions as input to detailed and complex simulations.

One such calculation is Value-at-Risk (VaR). VaR is a statistical measure, initially developed by J. P. Morgan in the 1980s for its own internal, company-wide value-at-risk system. Subsequently, in the 1990s, VaR measures became increasingly important and were combined with Profit and Loss statements (P&Ls) in a report for the daily "after market" Treasury meeting in New York.

Such VaR calculations are computationally intensive, as large numbers of financial derivatives calculations are required. Using an 8-processor machine, these calculations can typically take up to 4 hours for a typical share portfolio of 100 trades. A vast amount of historical stock market data is used for such simulations: the daily activity on the NYSE can create up to 2 gigabytes of market transactions. As the calculations are highly data bound, the high-throughput data access they require frequently creates bottlenecks. As a result, the increasing demand for better application performance and consistent reliability continues to outstrip an organization's supply of available computational resources.

Currently, companies in the financial sector depend heavily on such computationally intensive calculations to gain a competitive advantage. An improvement in VaR calculation times provides traders with a more accurate assessment of the potential risk in performing a given share trade. Such increased accuracy in risk assessment has a direct impact on the traders' margins. At present, and in order to meet computational demands, such tasks are largely performed on-site using distributed or multi-processor based systems.
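To make the VaR calculation at the heart of these workloads concrete, the sketch below computes a simple one-day historical-simulation VaR from a vector of portfolio profit-and-loss scenarios. It is a minimal illustration only: the P&L figures, the 95% confidence level and the class and method names are invented for this example and do not come from the RiskGrid implementation.

```java
import java.util.Arrays;

/**
 * Minimal one-day historical-simulation VaR sketch.
 * All inputs (P&L scenarios, confidence level) are illustrative; a production
 * VaR engine would revalue full derivative portfolios for each scenario.
 */
public class HistoricalVaR {

    /** VaR at the given confidence level for a vector of simulated P&L values. */
    static double valueAtRisk(double[] pnl, double confidence) {
        double[] sorted = pnl.clone();
        Arrays.sort(sorted);                          // worst losses (most negative) first
        int index = (int) Math.floor((1.0 - confidence) * sorted.length);
        return -sorted[index];                        // report VaR as a positive loss figure
    }

    public static void main(String[] args) {
        // Hypothetical daily portfolio P&L scenarios derived from historical market moves.
        double[] pnl = { -12000, 3500, -800, 15000, -20000, 4200, -3000, 900, -7500, 6100 };
        System.out.printf("95%% one-day VaR: %.0f%n", valueAtRisk(pnl, 0.95));
    }
}
```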
2 Current Practice

To date, the localised distributed computing technology used within the financial sector has largely been based on SIMD parallelization techniques within homogeneous cluster environments. However, despite the advances in distributed computing technology and the relative success of some such implementations, there are a number of shortcomings in current practice. These include:

• The benefits of distributed computing are only economically viable for organisations with large amounts of available internal resources.
• Only resources within the same administrative domain can be used – potential resources "internal" to an organisation are not used for the purposes of distributed computing externally.
• Although delivering significant performance improvements, the resources available within an individual organisation are in some cases not enough to deliver results on a timely basis – this is the case even if all the resources were theoretically available.
• Peak demand for processing resources often occurs when the supply of available resources is at a minimum.
• There is a lack of standards in areas such as middleware and workflow management.
• Practical testing and application of new research findings in finance and risk management theory (often resulting from advances in related disciplines such as mathematics, neural networks and physics), which requires intensive processing power, is still beyond the capabilities of current distributed processing.
• The lack of timely processing capabilities is impeding research into advanced risk management and pricing algorithms.

3 Grid Based Architecture

The Grid based architecture presented here is based on the Open Grid Services Architecture (OGSA) model [1], derived from the Open Grid Services Infrastructure specification defined by the OGSI Working Group within the GGF [2]. The Open Grid Services Architecture represents an evolution towards a Grid architecture based on Web services concepts and technologies. It describes and defines a service-oriented architecture (SOA) composed of a set of interfaces and their corresponding behaviors that facilitate distributed resource sharing and access in heterogeneous, dynamic environments.

Figure 1: The service-oriented architecture: a service provider PUBLISHes to the service directory, a service requestor FINDs services in the directory and BINDs to the provider over a transport medium.

Figure 1 shows the individual components of the service-oriented architecture. The service directory is the location where information about all available grid services is maintained. A service provider that wants to offer services publishes them by putting appropriate entries into the service directory. A service requestor uses the service directory to find an appropriate service that matches its requirements. An example of such a requirement is the maximum calculation time a service requestor is willing to accept for a given financial VaR calculation service, or the minimum amount of historical market data that is required from a financial database query service. The service directory will thus include not only taxonomies that facilitate the search, but also information such as maximum calculation time, QoS details or the cost associated with a service.

When a service requestor locates a suitable service, it binds to the service provider using binding information maintained in the service directory. The binding information contains the specification of the protocol that the service requestor must use, as well as the structure of the request messages and the resulting responses. The communication between the various agents occurs via an appropriate transport mechanism [3][4]. This architecture is based on a view of service collaboration that is independent of specific programming languages or operating systems. Instead, it relies on already-existing transport technologies (such as HTTP or SMTP) and industry-standard data encoding techniques (such as XML).
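As a rough illustration of the publish/find/bind collaboration described above, the following Java sketch models the three roles with hypothetical types (ServiceDescription, ServiceDirectory, ServiceRequestor). These names and signatures are invented for this example; they are not the OGSA/OGSI or Globus Toolkit APIs.

```java
import java.util.List;

/** Hypothetical description of a published grid service, including QoS metadata. */
record ServiceDescription(String name, String bindingProtocol, String endpoint,
                          double costUsd, int maxCalculationTimeSec) {}

/** The directory role: providers publish, requestors find. */
interface ServiceDirectory {
    void publish(ServiceDescription service);
    List<ServiceDescription> find(String taxonomy, double maxCostUsd, int maxTimeSec);
}

/** The requestor role: find a suitable provider, then bind to it. */
interface ServiceRequestor {
    default ServiceDescription selectAndBind(ServiceDirectory directory) {
        // Look for services in the "VaR" taxonomy costing less than 100 USD and taking under 30 s.
        List<ServiceDescription> matches = directory.find("VaR", 100.0, 30);
        if (matches.isEmpty()) {
            throw new IllegalStateException("no suitable service published");
        }
        ServiceDescription chosen = matches.get(0);
        bind(chosen);                       // establish the protocol-specific connection
        return chosen;
    }
    void bind(ServiceDescription service);
}
```

In the RiskGrid setting the find constraints would carry the QoS attributes mentioned above, such as maximum calculation time and service cost.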
4 VaR Services

Using such a service-oriented architecture, we now present a framework for calculating VaR measures within a grid environment. In such an environment the first step for all service providers that wish to offer services is to publish those services via appropriate entries in the Service Directory (see Figure 2). These entries include those from service providers offering services such as FTSE historical databases, HPC resources, and analysis and presentation resources.

Figure 2: Service Providers A, B and C publish their FTSE historical market database, HPC resource and analysis/presentation resource services to the Service Directory; Client A binds to a grid service instance.

Next, the client asks the Service Directory to find the services needed to provision the fulfilment of a VaR service. These may be found via a portal user interface or dynamically from within a client application. An example of such a request would be "find me services that retrieve all FTSE market data for the past month in format X, that cost less than USD 100 and take less than 30 seconds" (see Figure 3).

Figure 3: Client A searches the Service Directory, which contains information on Service Providers A, B and C, for the last 30 days of FTSE market data in format X and receives a GSH (Grid Service Handle) in response.

When the services are located, the client binds to each service using binding information detailed in the Service Directory. In the above example this may involve specifying the protocol that the client must use to interact with the database service and the transport mechanism to be used, such as JMS or SMTP (see Figure 4).

Figure 4: The client bound to the FTSE historical market database, HPC resource and analysis/presentation resource services.
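A client provisioning a VaR calculation in this framework would therefore chain the three kinds of services roughly as follows. The interfaces and method names (lastNDaysFtse, runVaR, export) are hypothetical stand-ins for the actual grid service invocations and are shown only to illustrate the workflow.

```java
/**
 * Hypothetical client-side flow for provisioning a VaR calculation:
 * locate and bind to the market-data, HPC and presentation services,
 * then chain their results. All interfaces and method names are invented
 * for illustration; they are not the GT3 or RiskGrid APIs.
 */
public class VarClientFlow {

    interface MarketDataService   { double[][] lastNDaysFtse(int days, String format); }
    interface VarComputeService   { double[] runVaR(double[][] marketData, String portfolioId); }
    interface PresentationService { void export(double[] results, String format); }

    // In a real deployment these would be resolved and bound through the Service Directory.
    private final MarketDataService data;
    private final VarComputeService hpc;
    private final PresentationService presentation;

    VarClientFlow(MarketDataService data, VarComputeService hpc, PresentationService presentation) {
        this.data = data;
        this.hpc = hpc;
        this.presentation = presentation;
    }

    /** Retrieve 30 days of FTSE data, run the VaR calculation on the HPC service, export the result. */
    void runMonthlyVar(String portfolioId) {
        double[][] ftse = data.lastNDaysFtse(30, "X");
        double[] var = hpc.runVaR(ftse, portfolioId);
        presentation.export(var, "pdf");
    }
}
```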
5 Test Bed Implementation

Our test implementation is based upon three components: the Globus Toolkit v3.0 [5] to handle Grid interaction and security and to expose the VaR service; DataSynapse LiveCluster [6] to manage a cluster of workstations providing the HPC resources; and K/KDB [7] to handle data storage and the actual VaR calculations. Our cluster consists of 25 nodes running Windows XP and 3 nodes running RedHat Linux, all on mid-range hardware. One of the more powerful Linux nodes also runs the LiveCluster server software and is used to distribute tasks to the slave nodes.

The architecture is based on three tiers, separated into User, Grid and Cluster domains (see Figure 5). In the User domain, different client applications can connect to the Grid, sending commands to the VaR service and receiving notifications from the presentation service as different jobs progress.

Figure 5: Test bed implementation architecture

The Grid domain consists of three important components. The first is the VaR service itself, which exposes a number of different VaR-related calculations to its clients (the available calculations, along with their arguments, are published through a method provided by the service). The second is a KDB service, which connects to a KDB daemon running on any computer reachable from the Grid environment. The KDB service feeds the VaR service with market and portfolio data and is responsible for storing results as they arrive. (In our current implementation the VaR service interfaces directly with the KDB daemon, despite the presence of a KDB service; this gives substantially better performance by skipping two steps of expensive data marshalling and unmarshalling.) The final Grid component is a presentation service, which provides the client applications with a simple way of getting updates as jobs progress, as well as viewing and exporting results in different formats.

The third domain is the Cluster domain. As the user requests execution of different calculations, the VaR service gets the necessary data from the KDB service/daemon and proceeds to send the data to this domain. The Cluster domain is managed by the DataSynapse LiveCluster software, and the calculation is distributed using a simple master-slave approach. A DataSynapse job is used to receive calculation requests from the VaR service in the Grid domain. The job instance handles splitting the computation into manageable tasks and then waits for DataSynapse to distribute the tasks to the available slave nodes (engines in DataSynapse terminology). As tasks complete on the slave nodes, the job receives notifications from the LiveCluster software and proceeds to notify the VaR service of its progress. The VaR service uses these notifications to inform the presentation service that a job has made progress or, in case of failure, that the job has failed. The presentation service continues by notifying any client applications.

On the slave nodes, a simple tasklet (the part of a job that runs on all the slave nodes) is used to receive work from the LiveCluster server. The task descriptions contain an executable K statement, along with the parameters required for the K statement to execute successfully. The tasklet is also responsible for starting up a local K daemon (in case one isn't running), built for the platform the tasklet happens to be running on (Linux or Windows). After starting the daemon, the tasklet sends it the specified K statement and parameters, waiting until the daemon finishes processing the request and results are returned before announcing that it is ready to receive more work. The local K daemons are preloaded with routines that perform the calculations, so the K statement passed to the tasks will in general be a simple function call. In addition, the K daemons are preloaded with semi-static market data, to avoid passing this information around all the time.

As is evident, the architecture we have deployed allows for a vast range of different computations. Not only can simple VaR calculations be distributed with relative ease, but any other relevant financial calculations that fit within the master-slave paradigm can easily be added to the portfolio of available calculations.

6 Results

Our preliminary results are gathered from running a call option pricing function (based on Monte Carlo simulations) on a large volume of stocks. A call option is an option to buy a stock at a fixed price at some time in the future. However, for call options to make sense, it is important that the call option is sold at a reasonable price, for both seller and buyer. The purpose of the calculation is thus to determine how much a call option on a stock valued at, for instance, £10 should cost if the option allows the buyer to acquire the stock for £10 within 3 months. The result of the calculation is the price the buyer will need to pay in order to get the stock option.
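For illustration, a straightforward Monte Carlo pricer for a European call option under a geometric Brownian motion model is sketched below. The 5% risk-free rate, 30% volatility and path count are assumed example values; the K routines used on the test bed may use a different model and implementation.

```java
import java.util.Random;

/**
 * Monte Carlo pricing sketch for a European call option under a
 * geometric Brownian motion model. Parameters are illustrative only.
 */
public class MonteCarloCall {

    /** Risk-neutral Monte Carlo estimate of the call premium. */
    static double price(double spot, double strike, double rate, double vol,
                        double maturityYears, int paths, long seed) {
        Random rng = new Random(seed);
        double drift = (rate - 0.5 * vol * vol) * maturityYears;
        double diffusion = vol * Math.sqrt(maturityYears);
        double payoffSum = 0.0;
        for (int i = 0; i < paths; i++) {
            double terminal = spot * Math.exp(drift + diffusion * rng.nextGaussian());
            payoffSum += Math.max(terminal - strike, 0.0);            // call payoff at maturity
        }
        return Math.exp(-rate * maturityYears) * payoffSum / paths;   // discount the mean payoff
    }

    public static void main(String[] args) {
        // Example from the text: stock at 10, strike 10, 3-month option.
        // The 5% rate and 30% volatility are assumed purely for illustration.
        double premium = price(10.0, 10.0, 0.05, 0.30, 0.25, 1_000_000, 42L);
        System.out.printf("Estimated call premium: %.3f%n", premium);
    }
}
```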
Our test calculations were run on 1, 5, 10, 15, 20 and 25 nodes, pricing 30,000 different call options. The running time of the calculation on the different node configurations is shown in Figure 6.

Figure 6: Running time (in seconds) for the various node configurations (1, 5, 10, 15, 20 and 25 nodes).

Running the calculation on one node took approximately 82 minutes. Running the same calculation on 25 nodes gave a running time of about 3 minutes 20 seconds, a speedup of roughly 24.7 (approximately 4,920 s / 200 s). The graph in Figure 7 shows that the calculation attains an approximately linear speedup, which is welcome but fairly unsurprising considering the nature of the calculation.

Figure 7: Calculation speedup as a function of the number of nodes.

7 Summary

Grid computing technology presents an architectural framework that aims to provide access to heterogeneous resources in a secure, reliable and scalable manner across various administrative boundaries. The financial sector is an ideal candidate to exploit the benefits of such a framework. The initial results presented here are encouraging, with considerable speedup achieved using the prototype commodity HPC service.

In addition to increased performance, a major benefit of such an architecture has been the creation of an integration fabric. Integration of such remote, heterogeneous resources in any enterprise is the major bottleneck and the realm of major Enterprise Application Integration (EAI) activities. Here we have presented a Grid based framework that could provide the basis for an open source reference architecture for the financial sector.

However, before widespread adoption happens within this sector a number of fundamental areas will need to be addressed:

• Security: The area of security, as with various other knowledge-based industries, will be a primary concern and requirement. As Grid technology looks to share resources both internally and externally within organisations, security and integrity of information are not only important but critical to business operations. The whole area of AAA (Authentication, Authorization and Accounting) and the adoption of established security infrastructures within evolving grid standards will play an important role in the uptake of such technology within the financial sector.

• Standards: The financial sector is already heavily loaded with various competing and proprietary standards. The addition of a further set of vendor-specific proprietary grid computing standards will not assist in the adoption and uptake of grid computing. Integration of emerging Grid computing standards (e.g. the Globus Toolkit and OGSA) and involvement within the GGF will play a vital role.

• Management: As grids evolve into heterogeneous arrays of hybrid grid nodes, the management of such grids becomes more and more important, especially within the tightly controlled financial sector. To date, little or no work has been undertaken to investigate a cohesive strategy for managing such arrays of heterogeneous grid elements, and how such management strategies will be integrated into existing enterprise/corporate management and operational support system (OSS) solutions. Such strategies have been overlooked and, if not addressed soon, will inhibit the rapid adoption of Grid technology by the wider industrial community.
References

[1] OGSA, http://www.globus.org/ogsa/
[2] OGSI, http://www.gridforum.org/ogsi-wg/
[3] S. Burbeck, "The Tao of e-Business Services", IBM Corporation, 2000; http://www-4.ibm.com/software/developer/library/ws-tao/index.html
[4] http://www.research.ibm.com/journal/sj/412/leymann.html
[5] The Globus Toolkit, http://www-unix.globus.org/toolkit/
[6] DataSynapse LiveCluster, http://www.datasynapse.com/
[7] K/KDB, Kx Systems, http://www.kx.com/
[8] "Forget the web, make way for the grid", Deutsche Bank, 2000, http://www.b2business.net/DBNewEconomy_report.pdf