Building the e-Science Grid in the UK: GridMon - Grid Network Performance Monitoring

Mark Leese (m.j.leese@dl.ac.uk) and Robin Tasker (r.tasker@dl.ac.uk)
CCLRC Daresbury Laboratory, Warrington, Cheshire WA4 4AD
http://gridmon.dl.ac.uk/

Abstract: At last year’s inaugural AllHands meeting, our paper outlined the proposed development of a comprehensive and extensible network monitoring infrastructure for UK e-Science. This paper initially serves as an update, outlining the last year’s good progress in establishing an infrastructure which is already supplying tangible benefits. The paper then introduces the project’s second phase, which will see GridMon’s integration into Grid technology via compliance with the Open Grid Services Architecture. The starting point for this journey has been the development of GridMon as a web service. This, and future stages of the journey, will be outlined. GridMon is not alone in developing a network monitoring system as a web and/or Grid service. Also described, as part of GridMon’s aim to be a “best of breed” network monitoring system for UK e-Science, are ongoing collaborations such as those with the Internet2 piPEs initiative. Finally, consideration is given to new work which seeks to redress some of the widely observed imbalance between the achieved and expected network performance of end users. By building on relevant research, GridMon hopes to provide “best practice” examples of TCP configuration, with our monitoring results showing these in a real world rather than ‘laboratory’ context.
Glossary:

API  Application Programming Interface
BAR  Backbone Access Router
BW  Bandwidth to the World
CCLRC  Council for the Central Laboratories of the Research Councils
CIM  Common Information Model
EDG  European Data Grid
GGF  Global Grid Forum
HEP  High Energy Physics
IEC  International Electrotechnical Commission
IEPM  Internet End-to-end Performance Monitoring
JANET  Joint Academic NETwork
LFN  Long Fat Network
MCC  Manchester Computing Centre
NMWG  Network Monitoring Working Group
OGSA  Open Grid Services Architecture
piPEs  performance initiative Performance Environment system
QoS  Quality of Service
R-GMA  Relational Grid Monitoring Architecture
RPC  Remote Procedure Call
RTT  Round Trip Time
SJ4  Super JANET 4
SLAC  Stanford Linear Accelerator Centre
SOAP  Simple Object Access Protocol
TCP  Transmission Control Protocol
UCL  University College London
UDDI  Universal Description, Discovery and Integration
UML  Unified Modelling Language
URL  Uniform Resource Locator
WP  Work Package
WSDL  Web Services Description Language
WSP  Web Service Provider
XML  eXtensible Mark-up Language

Introduction

The concepts and practice of network monitoring are well understood and are widely used to identify problems, quantify performance and set expected levels of service.

Monitoring for the Grid builds on these established concepts and practices; however, it is different in intent and purpose. Firstly, Grid monitoring deals with end-to-end performance. Secondly, it is closely coupled with real Grid applications and may allow those applications to vary their transport strategies for optimal performance by, for example, tuning TCP parameters. To facilitate this, the products of monitoring, the network metrics, are made available to the Grid middleware via a publication service. In addition, the data can also be made available to end users and network personnel.
To this end, in June 2002, the UK e-Science Core Programme began funding work to “...design and deploy an infrastructure for network performance monitoring within the UK e-Science community.” This paper describes the first 12 months of the project, and outlines the new work being undertaken: web and Grid services, monitoring collaborations, and TCP tuning.

The Last Year

Before reviewing the last year’s progress, it may be helpful to provide a brief reminder of the architecture GridMon set out to establish 12 months ago.

Fig 1: GridMon architecture
(Diagram labels: Monitoring Host; IperfER, PingER, UDPmon, bbcp/ftp, GridFTP, miperfer; data files, not database; www visualisation; browser; publication service; Grid middleware.)

Monitoring is performed by a kit of tools installed on a suitable machine at each e-Science Centre. Performance data is stored locally on that machine and is published to interested people via a web interface; it will also be made available to the Grid middleware via a publication service. At inception, the publication service could have been implemented using LDAP, R-GMA or as a web or Grid service. This will be discussed in a later section.

Every 30 minutes (90 minutes for bbcp/ftp and GridFTP) each machine performs monitoring between itself and all other e-Science Centres. In this way a mesh of monitoring is created, allowing each centre to build up a picture of the quality of its links to all other centres. The mesh approach is feasible given the relatively low number of sites involved (12-15 in this case).

IperfER, PingER and UDPmon are tools used by the EDG WP7[1-4] group. Bbcp/ftp are end user tools used for network monitoring in an approach pioneered by the IEPM-BW work at SLAC[5]. miperfer[6], a multicast version of IperfER, is a new tool, created in the last 12 months at MCC. It is currently on extended beta trial. The toolkit currently deploys just PingER, IperfER and UDPmon, mirroring the original EDG WP7 approach. The remaining tools will be rolled out in due course.

A presence has been established at all e-Science centres. Some problems exist, but these are being debugged. The rollout has required a great deal of effort; however, good foundations have been laid, which can now be built upon.

Feedback from sites that, as intended, use the tools themselves is favourable. GridMon’s success is further demonstrated by the fact that other work groups (e.g. UK HEP) are requesting to become monitoring hosts. People are also recognising GridMon as a useful vehicle for deploying testing tools that are of interest to them (e.g. miperfer from MCC). In addition, the project is gaining experience that is feeding into other e-Science monitoring projects, such as those being run at Cambridge, UCL and UKERNA.

The remainder of this section highlights the features of what most GridMon users see: the user interface, whose consistent view allows users to navigate with ease across the infrastructure.

The start page for the GridMon installation at each site will feature a UK map, as shown in figure 2. Colour coded ‘blobs’ show the site’s connectivity to other UK sites within the last 30 minutes. Unsurprisingly, the blobs are red, green or amber. Floating text will display the level of packet loss that was last experienced.

Fig 2: active UK map

Mouse clicking on a site (blob) takes the user to the GridMon interface for that site, where they can view the site’s performance data using a form as shown in figure 3.

Fig 3: selection form

The form allows the user to select the remote hosts/sites, metrics and date range that they are interested in. Clicking the View Plot button produces the corresponding data plot, as shown in figure 4. Clicking the View reverse direction button will show the same metric for the same period but in the opposite direction, i.e. load the equivalent web page from the remote end.

Fig 4: data plot
We finish the section by looking at an example of where GridMon has proved useful.

Fig 5: TCP performance

Figure 5 shows a plot of TCP performance from Daresbury to Manchester (upper plot) and Newcastle (lower plot) for a period in December 2002. In this case the level of the graphs is unimportant; we are only interested in their shape. Note that the performance to Manchester is fairly flat, whilst Newcastle suffers a severe daily drop off. The performance to Newcastle was representative of Daresbury’s performance to all sites except Manchester, and since Daresbury is connected into JANET via Manchester, this suggested the existence of a problem between Manchester and the SJ4 core. When prompted, the network staff at Manchester BAR discovered that a router had been misconfigured, causing it to underperform under high loading. Changes resulted in the improvements seen toward the right of the plot.

Web Services

During the lifetime of the project, various methods of publishing data to the Grid middleware have been mentioned, including LDAP and R-GMA. The popularity of these technologies is fading, however, and there is a growing movement towards the use of web and (OGSA) Grid services (a Grid service is essentially a web service with some Grid specific add-ons/pre-requisites).

When new technologies are developed, there is the inevitable temptation to quickly adopt them without considering their ‘true’ value, either to maintain your cutting-edge status or simply because everyone else is doing the same. In this case, however, web and Grid services do offer real benefits.

Use of web and Grid services will lead to much easier integration of differing monitoring architectures, allowing systems to use functionality and data provided by others. In the UK, for example, this would allow simpler integration of the GridMon and new UKERNA monitoring efforts, both e-Science projects.
To fulfil its role as a “best of breed” monitoring solution, GridMon will need to take account of work going on elsewhere, and where possible, get involved. Web and Grid services will make this task easier and improve the chances of success. This will especially be the case if a web or Grid service is combined with a classification system such as that proposed by the GGF NMWG hierarchy document [7]. This document describes a set of network characteristics and a classification hierarchy for those characteristics, aimed at Grid applications and services. The application of the hierarchy will facilitate the creation of common schemata for describing network monitoring data, the idea being that using a standard classification for the measurements you take maximises the portability of your data.

From a GridMon perspective, a web service will be the first to be developed, which can then be extended to a Grid service. The aim is to run them in parallel, so that GridMon can be interrogated by Grid and non-Grid applications alike. The basic web service architecture is shown in figure 6.

Fig 6: web service architecture
(Diagram parties: Client, UDDI registry, WSP. Diagram steps:
1. WSP registers service with registry
2. Client locates suitable service using registry
3. Client requests WSDL doc
4. WSDL tells client how to interact
5. Service & client communicate using XML messages, sent via SOAP)

A client will search a UDDI registry for a service that is of interest. Searches can be performed based on business name, service name or a service category. To make initial contact with a service, the client is given the URL of the service’s WSDL document. This XML document describes the methods (functions) that the service has made available, and how the client should interact with them. Once the client has retrieved the WSDL document it can start using the service, via XML RPCs and XML messages encapsulated in SOAP messages.
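As an illustration of the final step, an RPC-style call wrapped in a SOAP envelope, the following is a minimal sketch using only the Python standard library. The service namespace `urn:example:gridmon`, the operation name `getMetric` and its parameters are hypothetical and do not describe the actual GridMon interface; only the SOAP 1.1 envelope namespace is standard.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical service namespace, for illustration only.
SVC_NS = "urn:example:gridmon"


def build_request(operation, params):
    """Wrap an RPC-style call in a SOAP envelope and return it as text."""
    ET.register_namespace("soapenv", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SVC_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_request(xml_text):
    """Recover the operation name and parameters from a SOAP envelope."""
    envelope = ET.fromstring(xml_text)
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    call = body[0]  # the first child of Body is the RPC call element
    operation = call.tag.split("}", 1)[1]  # strip the "{namespace}" prefix
    params = {child.tag: child.text for child in call}
    return operation, params


if __name__ == "__main__":
    request = build_request(
        "getMetric",
        {"source": "daresbury", "destination": "manchester", "metric": "loss"},
    )
    print(request)
    print(parse_request(request))
```

In practice a SOAP toolkit would generate this plumbing from the WSDL document; the sketch only shows what travels on the wire, with simple string parameters carried inside the envelope body.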
Although beyond the scope of this paper, authorisation and authentication may also be an issue. In the absence of a suitable UDDI registry [8], the GridMon web services can be soft-coded as to the locations of the GridMon web services at other sites.

For simple implementations, the results of using services can be returned as simple data types, such as strings, as they would with other RPC implementations. The only difference here is that results are encapsulated in SOAP. This isn’t very useful, however, when dealing with large and complex datasets, and situations where the service could return differing amounts and types of data.

Enter the schema, a self-describing method of representing data. This self-describing nature makes it easier to share data between clients and services that are capable of parsing schemas (being flexible about what data they can send and receive).

Work has begun, spearheaded by the NMWG, on producing CIM, UML and XML[9] based network monitoring schemas, all based on the group’s hierarchy document. Until these are evaluated, no firm decision can be made over which technology to use. As a proof of concept, however, later iterations of the GridMon web services interface will use an XML schema based on work at UCL and the previously mentioned NMWG schema.

Implementation of an initial web service is in progress, using Apache Tomcat to host the web application, and Apache Axis to provide the SOAP support required to turn the application into a service. This and subsequent versions will be used as a testbed in work conducted by UCL’s eScience Networking Centre of Excellence[10]. This is in addition to ‘proving’ the XML schema, and is yet another example of GridMon adding value.

Collaborations

The piPEs project [11], being run by Internet2’s E2Epi, seeks to reach a network monitoring utopia. In this utopia, when users experience network problems, they have access to a tool which can tell them what the problem is, where it is located, and perhaps most importantly, who should be contacted for its resolution.

Fig 7: sample piPEs topology
(Diagram labels: Host A on Campus X; GigaPoP 1; Backbone, e.g. US Abilene network; GigaPoP 2; Host B on Campus Y; PMPs placed along the path.)

A full description of the architecture is beyond the scope of this paper, but it is worth outlining the salient features:

• In its final form, the piPEs infrastructure will be able to determine complete path (end-to-end) performance by aggregating information relating to the various segments that make up the path, whether these segments are in the same domain or not.
• The basic topology is produced by inserting Performance Monitoring Points (PMPs) at selected stages in a network (nominally alongside routers) as shown in figure 7.
• A battery of tests is periodically performed, providing a minimum set of measurements of loss, jitter, throughput and one way delay. The resulting performance data is stored locally (within that domain) in a database.
• When users or network administrators request information about the state of the network, on-demand tests can be scheduled if the relevant data does not already exist in a local or remote results database. Users require authorisation to perform tests.
• Users have two ways of using the system: the human analysis engine and associated web display for dealing with historic performance, and the testing/analysis engine with associated interface for dealing with the “here and now”.
• A “culprit database” exists to relate support personnel to network domains.

An important point perhaps is that there is nonhuman access to data, other than from other piPEs domains.

The piPEs initiative also has overlap with Dante’s multi-domain monitoring[12]. This will impact on GridMon via its work with piPEs and UCL.
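The idea of deriving end-to-end performance from per-segment measurements, as described for piPEs above, can be sketched as follows. This is an illustrative model only and not the piPEs algorithm: it assumes one-way delays add along the path, losses on different segments are independent, and achievable throughput is limited by the slowest segment. The `Segment` type and `aggregate` function are hypothetical names, and the sample figures are invented for the demonstration.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Segment:
    """Measurements for one path segment between two monitoring points."""
    name: str
    one_way_delay_ms: float
    loss_fraction: float      # 0.0 (no loss) to 1.0 (total loss)
    throughput_mbps: float


def aggregate(segments):
    """Combine per-segment metrics into an end-to-end estimate.

    Assumptions (ours, for illustration): delays are additive, losses are
    independent between segments, and throughput is set by the bottleneck.
    """
    if not segments:
        raise ValueError("a path needs at least one segment")
    delivered = prod(1.0 - s.loss_fraction for s in segments)
    return {
        "one_way_delay_ms": sum(s.one_way_delay_ms for s in segments),
        "loss_fraction": 1.0 - delivered,
        "throughput_mbps": min(s.throughput_mbps for s in segments),
    }


if __name__ == "__main__":
    path = [
        Segment("campus X to GigaPoP 1", 1.0, 0.0, 900.0),
        Segment("backbone", 10.0, 0.5, 400.0),
        Segment("GigaPoP 2 to campus Y", 2.0, 0.0, 600.0),
    ]
    print(aggregate(path))
```

A metric such as jitter does not combine this simply, which is one reason a real multi-domain system needs agreed definitions of each characteristic before segment results can be aggregated.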
And while this work may sound ambitious, with experience suggesting that it may also be difficult to get all parties (domains) to sign up, the obvious benefits make it a worthwhile cause to champion.

As previously mentioned, the SLAC IEPM-BW tools (bbcp/ftp…) will be integrated into GridMon. The tools will first be trialled between CCLRC’s laboratories at Daresbury and Rutherford Appleton.

Some collaboration will also take place with DataTAG WP2[13] regarding the work outlined in the next section: TCP tuning.

This section hopefully highlights the breadth of monitoring initiatives to which the GridMon team has exposure. GridMon is a UK e-Science project, but it doesn’t exist in a vacuum, and is evolving to show the best way to carry out monitoring, based on the best techniques and technologies from around the world.

TCP Tuning

Given the success of GridMon’s first stage in establishing a monitoring infrastructure, it is now possible to carry out work relating to end-to-end TCP performance, using the installed base of GridMon machines as a testbed.

LFNs can be described as network connections that have high RTTs and high bandwidths, so that they resemble long and fat pipes. Problems with TCP’s inability to scale to work with LFNs were discovered as early as the late 1980s[14]. Fixes implemented since are now coming to the limit of their application, as the current definition of an LFN reaches a new order of magnitude.

TCP’s current problems with LFNs and other typical eScience applications are well documented [15] [16] (for a mathematical treatment, see [17]). Matters are not helped by known implementation problems [18]. This has given rise to new TCP implementations such as Fast[19] and Scalable[20] TCP, but with these technologies still at the experimental stage, a clear requirement exists for showing how to achieve optimum performance from existing “standard” TCP implementations, such as Reno. Much work is going into this topic and it is GridMon’s intention to use the available research to demonstrate real-world TCP best practice to UK e-Science using the installed base of monitoring machines.

A full discussion of TCP tuning issues is well beyond the scope of this paper, but an interesting if less frequently used example is interrupt handling. Many NIC drivers offer features to limit or queue the number of interrupt requests sent to the machine’s CPU. This throttling makes the NIC disturb the CPU as little as possible, leaving it free for other tasks. Relaxing these limitations can considerably increase NIC throughput, but at the expense of CPU utilisation, since the CPU is interrupted more frequently. For a typical e-Science Grid application (which is likely to be computationally intensive) there must be a trade off between the requirements for network bandwidth and CPU usage. Work has begun in this area, initially using Gigabit Ethernet enabled machines at Daresbury and Rutherford Appleton. Figure 8 highlights the dangers of disabling various options!

Figure 8: initial TCP tuning

Acknowledgements

The work described here is closely coordinated with work underway within the EDG, and benefits from collaboration with the IEPM work at SLAC, and multicast work at MCC.

Conclusion

The first year of the GridMon project has gone well, with an initial presence established at each of the 12 e-Science Centres. There have been, and continue to be, some technical problems, but this is to be expected with a varied set of installed machines. This does not appear to be an off-putting factor, however, and the success of GridMon is being demonstrated by the fact that non e-Science groups are requesting to become involved. Indeed, as GridMon grows in scope and functionality, its use is expected to widen further.

As we move into the second phase of work, GridMon is well poised to evolve into a “best of breed” monitoring solution, building on work of the GGF, Internet2, SLAC and others, acknowledged leaders in their respective fields.
Providing web and Grid services interfaces will increase GridMon’s user base by attracting users who were uninterested in the human interface, and by generating interest from other network monitoring groups who can now use GridMon with their own developments. TCP tuning can be considered as a value added service, providing a ‘real world’ networking best practice demonstrator using an already available infrastructure. Both these strands of work are being carried out because they will prove to be genuinely useful, rather than being the proving of a technology.

The future, therefore, is bright. Everyone is now familiar with Moore’s law, summing up the rapid growth of semiconductor devices. Networking also moves at a fast pace, and whereas work beyond web/Grid services and TCP tuning may be difficult to predict, evaluating alternate TCP stacks such as Fast and Scalable TCP looks a likely contender. The arrival of SuperJANET5 also raises new possibilities, such as a permanent UK implementation of QoS. Whatever the direction of future UK networking, there is still much to do, and much that is possible. GridMon is funded until June 2004, and hopefully it will be given the opportunity to reach its full potential.

References

1. EDG WP7, Network Services: http://ccwp7.in2p3.fr/
2. IperfER: http://www.hep.ucl.ac.uk/~ytl
3. PingER: http://www-iepm.slac.stanford.edu/pinger/
4. UDPmon: http://www.hep.man.ac.uk/u/rich/
5. IEPM-BW: http://www-iepm.slac.stanford.edu/bw/
6. miperfer: http://www.csar.cfs.ac.uk/staff/daw/
7. B. Lowekamp, B. Tierney, L. Cottrell, R. Hughes-Jones, T. Kielmann, and T. Swany. A Hierarchy of Network Performance Characteristics for Grid Applications and Services. Global Grid Forum, 19 June 2003: http://www-didc.lbl.gov/NMWG/docs/draft-ggf-nmwg-hierarchy-00.pdf
8. R.J. Allan, D. Chohan, X.D. Wang, M. McKeown, J. Colgrave, and M. Dovey. UDDI and WSIL for e-Science. Grid Support Centre, 2002: http://esc.dl.ac.uk/Papers/UDDI/uddi/uddi.html
9. D. Gunter. Schemas for Exchanging Network Measurements with OGSI. NMWG, 19 June 2003: http://www-didc.lbl.gov/NMWG/schemas/NMWG_Schemas_for_OGSI.html
10. Y. Li, P.D. Mealor, M.J. Leese and P. Clarke. Plug ‘n’ Play (Network) Performance Monitoring. To be presented at UK e-Science All Hands Meeting, September 2003.
11. piPEs: http://e2epi.internet2.edu/E2EpiPEs/e2epipe_index.html
12. Dante inter-domain performance monitoring: http://www.dante.net/tf-ngn/perfmonit/
13. DataTAG WP2, High Performance Networks: http://icfamon.dl.ac.uk/DataTAG-WP2/
14. V. Jacobson, R. Braden. RFC1072: TCP Extensions for Long-Delay Paths. IETF, October 1988: http://www.ietf.org/rfc/rfc1072.txt
15. D. Katabi. Congestion Control for High Bandwidth-Delay Product Networks (extended abstract). MIT, February 2003: http://datatag.web.cern.ch/datatag/pfldnet2003/papers/katabi.pdf
16. W. Feng and P. Tinnakornsrisuphap. The Failure of TCP in High-Performance Computational Grids. Proceedings of 2000 Supercomputing Conference (SC ’00): http://csdl.computer.org/dl/proceedings/sc/2000/9802/00/98020037.pdf
17. T.V. Lakshman and U. Madhow. The performance of TCP/IP for networks with high bandwidth-delay products and random loss. IEEE/ACM Trans. Networking, vol. 5, no. 3, pp. 336-350, June 1997: http://www.ece.ucsb.edu/Faculty/Madhow/Publications/ton97.ps
18. V. Paxson, M. Allman, S. Dawson, W. Fenner, J. Griner, I. Heavens, K. Lahey, J. Semke, and B. Volz. RFC2525: Known TCP Implementation Problems. IETF, March 1999: http://www.ietf.org/rfc/rfc2525.txt
19. C. Jin, D. Wei, S. H. Low, G. Buhrmaster, J. Bunn, D. H. Choe, R. L. A. Cottrell, J. C. Doyle, W. Feng, O. Martin, H. Newman, F. Paganini, S. Ravot and S. Singh. FAST TCP: From Theory to Experiment. Caltech, 30 March 2003: http://netlab.caltech.edu/FAST/
20. T. Kelly. Scalable TCP: Improving Performance in Highspeed Wide Area Networks. CERN / University of Cambridge, 21 December 2002: http://datatag.web.cern.ch/datatag/pfldnet2003/papers/kelly.pdf