An Overview of the External Security Review and Performance of the GT4 Components
William (Bill) Allcock
07 April 2005
With slides from: Stu Martin, Lisa Childers, Jarek Gawor, Sam Meder, Sam Lang, Ravi Madduri, Jen Schopf

Security Review

GT4 Security Architecture External Review

Reviewers
- Marty Humphrey (Univ. of Virginia)
- Jim Basney (NCSA)
- Matt Crawford (Fermilab)
- Zach Miller (Condor)
- Jaime Frey (Condor)
- Carey Kireyev (Condor)

The Goals
- A review of the *architecture*; no code was looked at.
- The plan was to have a written report from the review team, which we could respond to and then publish.
- However, due to problems with ANL accounting, Russ (?), the head of the IETF security group and the one person who was going to be paid, did not participate. He was also supposed to write the report.
- At this point we are at the mercy of our volunteer reviewers. We do have an email summary of the issues and recommended changes, discussed here.

Web Services Security
There were two main concerns:
- The Delegation Service should turn over its key pair regularly, to prevent its theft and surreptitious use with future proxy certificates delegated to it. This is now in place.
- The entire container reuses the same key pair, so there is no protection from misbehaving services in the container.

GridFTP Concerns
- Large body of C setuid code. The recommendation is to run the control channel process (the client connection) as a non-privileged user, and to run the data channel process (which moves the data; the control channel connects to it, not the external client) as root setuid, locked down to accept connections only from the control channel.
- We now support anonymous and user/password authentication, so GSI authentication is no longer guaranteed on port 2811. Make sure the [de]activation is very explicit, not just an empty anonymous name.

Other Issues
- Executables, configs, etc. should be owned by someone other than the user that runs the jobs (e.g. globus-exec vs. globus-admin).
- The sudo policy needs to be cut-and-paste, since we are putting a generic tool to a specific use.
- Support the X.509 NameConstraints extension.
- A number of other things have already been fixed, such as explicit destroy in the Delegation Service.

Performance Issues

What Do We Mean by Performance?
- Performance in the broadest sense of the word: how fast, how many, how stable, how easy.
- We keep the URL below (fairly) up to date:
  http://www-unix.globus.org/toolkit/docs/development/4.0-drafts/perf_overview.html

GridFTP

New GT4 GridFTP Implementation
- NOT web services based.
- NOT based on wuftpd; 100% Globus code, so there are no licensing issues.
- Absolutely no protocol change: the new server should work with old servers and custom client code.
- Extremely modular, to allow integration with a variety of data sources (files, mass stores, etc.).
- Striping support is present.
- IPv6 support is included (EPRT, EPSV), but we have a limited environment for testing it.
- Based on XIO.
- wuftpd-specific functionality, such as virtual domains, will NOT be present.

New Server Architecture
The Data Transport Process (data channel) is architecturally three distinct pieces:
- The protocol handler, which talks to the network and understands the data channel protocol.
- The Data Storage Interface (DSI), a well-defined API that may be re-implemented to access things other than POSIX filesystems (a hypothetical sketch of such a pluggable interface follows this list).
- ERET/ESTO processing, the ability to manipulate the data prior to transmission; this is currently handled via the DSI. In 4.2 we plan to support XIO drivers as modules and chaining.
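The DSI is the extension point that lets the same protocol engine sit in front of very different storage systems. The sketch below only illustrates that idea with hypothetical names; it is not the actual GT4 DSI API, whose registration mechanism and callback signatures differ.

    /*
     * Hypothetical sketch of a pluggable storage interface in the spirit of
     * the GridFTP DSI.  All names here are illustrative, not GT4 code.
     */
    #include <stddef.h>
    #include <sys/types.h>
    #include <sys/stat.h>

    typedef struct storage_iface_s {
        int     (*open)(const char *path, int for_write, void **handle);
        ssize_t (*read)(void *handle, off_t offset, void *buf, size_t len);
        ssize_t (*write)(void *handle, off_t offset, const void *buf, size_t len);
        int     (*stat)(const char *path, struct stat *info);
        int     (*close)(void *handle);
    } storage_iface_t;

    /*
     * A POSIX-backed module would fill this table with wrappers around
     * open/pread/pwrite/stat/close; an HPSS, NeST, or SRB module would fill
     * it with calls into that system's client library.  The protocol handler
     * only ever talks to the table, so the data channel code is unchanged.
     */

Because the protocol handler never sees anything but this table, swapping the storage backend does not touch the data channel or the wire protocol, which is why the custom-DSI collaborations listed next are practical.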
We are working with several groups on custom DSIs:
- LANL / IBM for HPSS
- UWis / Condor for NeST
- SDSC for SRB

Current Development Status
- GT 3.9.5 is a beta release (the interfaces won't change). This code base has been in use for over a year; there are bug fixes in CVS.
- We reused the data channel code from wuftpd, so it has been running for several years.
- Initial bandwidth testing is outstanding.
- Memory leaks are approximately 30 KB per 24 hours of transfer.
- One host was supporting 1800 clients.
- One workflow had O(1000) errors with the 3.2.1 server and none with the new server.
- A statically linked version is in VDT 1.3.3.
- Bottom line: you REALLY want to use the GT4 version.

Deployment Scenario under Consideration
- All deployments are striped, i.e. there are separate processes for the control and data channels.
- The control channel runs as a user who can only read and execute the executables, configs, etc.; it can write delegated credentials.
- The data channel is a root setuid process:
  - The outside user never connects to it.
  - If anything other than a valid authentication occurs, it drops the connection.
  - It can be locked down to accept connections only from the control channel machine's IP.
  - The first action after successful authentication is setuid.

Possible Configurations
[Diagram: Typical Installation, Striped (n=1), Striped (n>1), and a future Striped Server, each showing the control and data processes and whether they run as a non-privileged user or as root.]

TeraGrid Striping Results
- Ran a varying number of stripes; ran both memory-to-memory and disk-to-disk.
- Memory-to-memory gave extremely high linear scalability (slope near 1). We achieved 27 Gb/s on a 30 Gb/s link (90% utilization) with 32 nodes.
- Disk-to-disk we were limited by the storage system, but still achieved 17.5 Gb/s.

Memory to Memory Striping Performance
[Chart: bandwidth (Mbps) vs. degree of striping, with curves for 1, 2, 4, 8, 16, and 32 streams.]

Disk to Disk Striping Performance
[Chart: bandwidth (Mbps) vs. degree of striping, with curves for 1, 2, 4, 8, 16, and 32 streams.]

Scalability Results
[Chart: GridFTP server performance uploading a 10 MB file from 1800 clients to ned-6.isi.edu:/dev/null, plotting load, memory used, CPU %, throughput (MB/s), and response time against elapsed time.]

RFT

So, What About Web Services?
- Web services access to data movement is available via the Reliable File Transfer (RFT) service.
- WSRF, WS-Addressing, WSN, and WS-I compliant.
- It is reliable: state is persisted in a database, and it will retry until it either succeeds or meets what you defined as the ultimate failure criteria (a sketch of this persist-and-retry pattern follows the notes below).
- It is a service, similar to a job scheduler: you can submit your data transfer job and go away.

Important Points
- Container-wide database connection pool.
- Container-wide maximum on RFT threads; the total number of transfer threads is limited.
- Each request has a thread pool equal to its concurrency.
- One resource per request.
- Resource lifetime is independent of the transfers; it needs to exceed the transfer lifetime if you want state around to check status.
- Can either wait forever or throw an exception.
- URL expansion can be time consuming; currently, transfers do not start until expansion is complete.
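The reliability claim rests on a simple pattern: persist the request before doing any work, record the outcome of each attempt, and retry until either the transfer succeeds or a caller-defined failure criterion is met. The sketch below shows only that pattern; the type, the helpers, and their behavior are assumptions for illustration, not RFT code.

    /*
     * Illustration of the persist-and-retry pattern described above.
     * persist_request() and do_transfer_attempt() are stand-ins (assumed,
     * not RFT code) for writing state to the database and performing one
     * third-party GridFTP transfer.
     */
    #include <stdbool.h>
    #include <stdio.h>

    typedef struct {
        const char *source_url;   /* e.g. a gsiftp:// source           */
        const char *dest_url;     /* e.g. a gsiftp:// destination      */
        int         attempts;
        int         max_attempts; /* the "ultimate failure" criterion  */
    } transfer_request_t;

    static void persist_request(const transfer_request_t *req)
    {
        /* stand-in for an update against the persistent store */
        printf("persist: %s -> %s (attempt %d)\n",
               req->source_url, req->dest_url, req->attempts);
    }

    static bool do_transfer_attempt(const transfer_request_t *req)
    {
        (void)req;                /* stand-in for one real transfer    */
        return false;
    }

    bool run_reliable_transfer(transfer_request_t *req)
    {
        persist_request(req);             /* state survives a container restart */
        while (req->attempts < req->max_attempts) {
            req->attempts++;
            persist_request(req);         /* record progress before each try    */
            if (do_transfer_attempt(req))
                return true;              /* success                            */
        }
        return false;                     /* failure criterion reached          */
    }

Because the request is re-read from persistent state after a restart, killing the container mid-transfer (as in the recoverability tests described next) leaves nothing for a human to repair.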
RFT Architecture
[Architecture diagram not reproduced.]

RFT Testing
- Current testing: we are in the process of moving the Sloan Digital Sky Survey DR3 archive: 900,000+ files, 6 TB. We have killed the transfer several times for recoverability testing; no human intervention has been required.
- The current maximum request size is approximately 20,000 entries with the default 64 MB heap size.
- Since GRAM uses RFT for staging, all GRAM tests that include staging also test RFT (and GridFTP, and the delegation service, and core, ...).
- Infinite transfer, LAN: killed the container after ~120,000 transfers. The servers were killed by mistake; it was a good test. Found a corner case where Postgres was not able to perform ~3 update queries per second and was using up CPU.
- Infinite transfer, WAN: ~67,000 transfers; killed for the same reason as above.
- Infinite transfer with 3 scripts creating transfer resources of one file with a lifetime of 5 minutes: found a synchronization bug and fixed it.
- Active nightly tests and a script to randomly kill the container and database daemon are in progress.

MDS

MDS Query Results
Only one set of data so far; no data yet for the Trigger Service. Ran at this load for 10 minutes without failure.
- DefaultIndexService: message size 7.5 KB; requests processed: 11,262; elapsed time: 181 seconds; average round-trip time: 16 ms.
- ContainerRegistryService: message size 32 KB; queries processed: 6,232; elapsed time: 181 seconds; average round-trip time: 29 ms.

Long Running Test
- Ran for 14 days (killed by accident during other testing).
- Responded to over 94 million requests.
- 13 ms average query round-trip time; 76 requests per second.
- Has also had DiPerF tests run against it (chart not reproduced).

Java Core

Core Performance
- We have been working hard to increase basic messaging performance: a factor of 4 improvement so far.
- We are testing reliability.
- We have shown that core can scale to a very large number of resources (>>10,000).

Core Messaging Performance
[Chart: time (ms) vs. message size (number of GRAM subjob messages) for the Axis update branch (1/10/05) and CVS head snapshots from 11/01/04, 11/05/04, and 1/10/05.]

Security Performance
- We have measured performance for both WS and transport security mechanisms (see the chart below).
- Transport security is significantly faster than WS security, so we made transport security (i.e. https) our default.
- We are working on making it even faster by using connection caching.

[Chart: time (ms) vs. message size (number of GRAM subjob messages) for transport, message, and conversation security, CVS head of 1/19/2005.]

C WS Core

What Is Implemented
- Serialization / deserialization.
- WS-Addressing.
- Secure Conversation, message security, and transport security (HTTPS, the default).
- Notification consumer (client side), but not notification source (server side).

Clients: Java vs. C
- Java VM startup imposes a large initial overhead.
- Simple Java client request/response: ~5 seconds.
- Simple C client request/response: ~0.5 seconds.

Performance: Service Container
- Without security: Java container 0.36 s average request/response; C container 0.015 s average request/response.
- With security: Java container 0.66 s average request/response; C container 0.12 s average request/response.

C Performance Improvements: HTTP Persistence
Average request/response times:
- No security, no caching: 0.25 s
- No security, with caching: 0.17 s
- With security, no caching: 2.6 s
- With security, with caching: 0.52 s
A sketch of the connection-reuse idea follows.
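The caching rows above amount to reusing one TCP (and TLS) connection across many request/response exchanges instead of paying the connect and handshake cost on every call. The snippet below sketches only that idea with plain sockets and a single cached descriptor; it is not the GT4 C WS core client API, and the function name is an assumption.

    /*
     * Sketch of HTTP connection reuse: keep one connected descriptor and
     * hand it back for subsequent requests.  Plain sockets only; with TLS
     * the saving is larger because the handshake is also amortized.
     */
    #include <netdb.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static int cached_fd = -1;                  /* persists between requests */

    int get_connection(const char *host, const char *port)
    {
        struct addrinfo hints, *res;
        int fd;

        if (cached_fd >= 0)                     /* cache hit: no connect cost */
            return cached_fd;

        memset(&hints, 0, sizeof hints);
        hints.ai_socktype = SOCK_STREAM;
        if (getaddrinfo(host, port, &hints, &res) != 0)
            return -1;

        fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
        if (fd >= 0 && connect(fd, res->ai_addr, res->ai_addrlen) != 0) {
            close(fd);
            fd = -1;
        }
        freeaddrinfo(res);
        cached_fd = fd;
        return cached_fd;
    }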
C Performance Improvements (Planned)
- Improved deserialization performance for optional schema elements.
- WS-Security performance: inlined canonicalization.

C globusrun-ws Performance
- Query delegation factories: 0.046 s
- Query certificate chain: 0.058 s
- CreateManagedJob: 0.12 s
- Active notification: 5.11 s
- Cleanup notification: 0.73 s
- Done notification: 2.29 s
- C client total processing time: 1.12 s

GRAM

Some of Our Goals
- "GRAM should add little to no overhead compared to an underlying batch system."
- Submit as many jobs to GRAM as is possible for the underlying scheduler:
  - Goal: efficiently fill the process table for the fork scheduler.
  - Goal: 10,000 jobs to a batch scheduler.
- Submit and process jobs through GRAM as fast as is possible for the underlying scheduler:
  - Goal: 1 job per second.
- We are not there yet; a range of limiting factors is at play.

Design Decisions
Efforts and features toward these goals:
- Allow job brokers the freedom to optimize (e.g. Condor-G is smarter than globusrun); protocol steps were made optional and shareable.
- Reduced cost for the GRAM service on the host: a single WSRF host environment and better job status monitoring mechanisms.
- More scalable and reliable file handling: GridFTP and RFT instead of globus-url-copy, and removal of the non-scalable GASS caching.
- GT4 tests are performing better than GT3 did, but there is more work to do.

GRAM / GridFTP File System Mapping
- Associates compute resources with GridFTP servers.
- Maps the shared filesystems of the GRAM and GridFTP hosts, e.g. the GRAM host mounts home directories at /pvfs/home while the GridFTP host mounts the same filesystem at /pvfs/users/home.
- GRAM resolves file:/// staging paths to local GridFTP URLs: file:///pvfs/home/smartin/file1... resolves to gsiftp://host.domain:2811/pvfs/users/home/smartin/file1.
- Configured in $GL/etc/gram-service/globus_gram_fs_map_config.xml.
- A minimal sketch of this prefix substitution follows.
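The mapping above is a prefix substitution: a staging path under the GRAM host's mount point is rewritten as a GridFTP URL rooted at the GridFTP host's mount point. The sketch below reuses the example values from the slide; the function and variable names are illustrative, and the real resolution is driven by globus_gram_fs_map_config.xml rather than hard-coded strings.

    /*
     * Minimal sketch of the GRAM-to-GridFTP path mapping as a prefix
     * substitution.  Names are illustrative; the real mapping is read from
     * globus_gram_fs_map_config.xml.
     */
    #include <stdio.h>
    #include <string.h>

    static const char *gram_mount   = "/pvfs/home";
    static const char *gridftp_root = "gsiftp://host.domain:2811/pvfs/users/home";

    static int map_to_gridftp_url(const char *local_path, char *url, size_t len)
    {
        size_t plen = strlen(gram_mount);

        if (strncmp(local_path, gram_mount, plen) != 0)
            return -1;                  /* path is not on a mapped filesystem */
        snprintf(url, len, "%s%s", gridftp_root, local_path + plen);
        return 0;
    }

    int main(void)
    {
        char url[512];

        if (map_to_gridftp_url("/pvfs/home/smartin/file1", url, sizeof url) == 0)
            printf("%s\n", url);  /* gsiftp://host.domain:2811/pvfs/users/home/smartin/file1 */
        return 0;
    }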
GRAM 3.9.4 Performance
Service performance and stability questions:
- Throughput: GRAM can process ~70 /bin/date jobs per minute, and ~60 jobs per minute if they require delegation.
- Job burst: with many simultaneous job submissions, are the error conditions acceptable?
- Max concurrency: how many total jobs can a GRAM service manage at one time without failure?
- Service uptime: under a moderate load, how long can the GRAM service process jobs without failure or a reboot?

Long Running Test
- Ran 500,000+ sequential jobs over 23 days, with staging, delegation, and the fork job manager.
- http://bugzilla.globus.org/bugzilla/show_bug.cgi?id=2582
- As an experiment, we have been tracking some of our work in Bugzilla.

Max Concurrency Test
- The current limit is 32,000 jobs, due to a Linux directory limit; using multiple sub-directories will resolve this, but is not likely to make it into 4.0.
- Simple job to the Condor scheduler: a long-running sleep job; no staging, no streaming, no delegation, no cleanup.
- http://bugzilla.globus.org/bugzilla/show_bug.cgi?id=3090

Max Throughput Tests
- The current limit is approximately 77 jobs per minute.
- Simple job to the fork scheduler: a /bin/date job; no staging, no streaming, no delegation, no cleanup.
- Bottleneck investigation: http://bugzilla.globus.org/bugzilla/show_bug.cgi?id=2521
- With delegation, the limit was 60 jobs per minute: http://bugzilla.globus.org/bugzilla/show_bug.cgi?id=2557

Overall Summary
- Have we done all the testing we want to? Absolutely not.
- Have we done far more testing on GT4 than on any other GT release? You bet.
- Have we done enough testing? We think so.

RLS
- I don't really have results for RLS; it has not changed much, and its testing was done a while ago.
- However, I will point out two projects that are heavily using it (and GridFTP) in production:
  - LIGO (Laser Interferometer Gravitational Wave Observatory)
  - UK QCD (quantum chromodynamics)

LIGO
- They don't have such a pretty web site; however, the numbers are impressive:
  - They produce 1 TB per day.
  - 8 sites.
  - More than 3 million entries in the RLS.
  - More than 30 million files.
- This replication of data using RLS and GridFTP is enabling more gravitational wave data analysts across the world to do more science more efficiently than ever before. Globus RLS and GridFTP are in the critical path for LIGO data analysis.

Installation
- configure / make; lots of binaries are planned.
- Much better platform support.
- Nightly testing on a wide variety of platforms (so it will probably install for you).
- Tools (these need more work), e.g. to check your security configuration: grid-mapfile-check-consistency.