Catalogue Access on the Grid

Birger Koblitz
for the ARDA project
Grid Performance Workshop, Edinburgh, June 22nd, 2005
Overview
● Characteristics of Grid Catalogue Access
● How to Access Databases on the Grid
● File catalogues: Comparing LFC and FiReMan
● AMGA, the ARDA metadata server
● SOAP vs. TCP text streaming
● Conclusions
Grid Catalogues
Most prominent catalogues on the grid are
● File catalogues
● Metadata catalogues
Both catalogue types normally have (relational) database back ends
Special access pattern on the grid (for HEP):
● Write once, read many times
● Distinction between writers (production jobs) and readers (analysis jobs)
● Readers frequently read large amounts of catalogue data in HEP ( O(1k - 100k) entries )
➔ Need fast, bulk read access to DBs
DB Access on a Grid
There are 2 ways to access a DB remotely:
“Traditional” way: ODBC, RAL, ...
● Application → API → Client → Server (SQL-DB)
● Client sends SQL to the server via ODBC, JDBC, or proprietary protocols
+ Performance
+ Simple implementation
− Security, monitoring
− How do you authenticate?
“Service”: RLS, AMI, RefDB, ...
● Application → API → Client → Server (DB-Service → SQL → SQL-DB)
● Client talks to the DB-Service via SOAP, XML-RPC, or text
+ Lightweight client
+ Security: GSI, x509
− Performance
− Implementation: state
LFC and FiReMan
Both are 2nd generation catalogues by CERN
● Support ACLs, GUIDs
LFC is the LCG-2 file catalogue for EGEE:
● C server with proprietary, binary RPCs
● Uses transactions and DB cursors via sessions
● No bulk operations
FiReMan is the gLite catalogue for EGEE:
● Uses SOAP, Java service in Axis
● No DB cursor → no data consistency between calls
● Bulk operations in a single call, run as transactions; no sessions
Test setup:
● Server: 2x Xeon @2.4GHz, 2GB RAM (DB + service)
● Client: PIII @800MHz, multi-threaded client
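The difference between per-entry calls and a single-call bulk operation run as one transaction can be sketched as follows. This is only an illustration using Python's built-in sqlite3 module as a stand-in back end; the table layout and the names insert_single and insert_bulk are hypothetical and are not the LFC or FiReMan interfaces.

```python
import sqlite3

def make_catalogue():
    # In-memory stand-in for the catalogue database (hypothetical schema).
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE entries (lfn TEXT PRIMARY KEY, guid TEXT)")
    return db

def insert_single(db, lfn, guid):
    # One call, one transaction: every entry pays the full commit cost.
    with db:
        db.execute("INSERT INTO entries VALUES (?, ?)", (lfn, guid))

def insert_bulk(db, entries):
    # One call, one transaction for the whole batch: either all entries
    # are stored or none are.
    with db:
        db.executemany("INSERT INTO entries VALUES (?, ?)", entries)

def count(db):
    return db.execute("SELECT COUNT(*) FROM entries").fetchone()[0]

db = make_catalogue()
for i in range(100):
    insert_single(db, f"/grid/single/{i}", f"guid-s-{i}")
insert_bulk(db, [(f"/grid/bulk/{i}", f"guid-b-{i}") for i in range(100)])

# A bulk call that hits a duplicate key is rolled back as a whole.
try:
    insert_bulk(db, [("/grid/new", "guid-n"), ("/grid/bulk/0", "duplicate")])
except sqlite3.IntegrityError:
    pass
total = count(db)
```

In this sketch the failed batch leaves no partial entries behind, which is the consistency property a transactional bulk call is meant to give.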
FC: Performance
[Figure: two panels, each plotted against Number of Threads (1, 2, 5, 10, 20, 50) for Fireman - Single Entry, Fireman - Bulk 100, and LFC. Insertion: Inserts / Second. Reading: Entries Returned / Second. Several points are marked "timeouts".]
➔ LFC faster for single ops, slower for many (bulk operations missing)
➔ FiReMan has problems with many clients
with C. Munro
FC: Protocol Analysis
Study of protocols with authentication enabled:
[Figure: two panels, one per catalogue protocol, plotting Number of Packets against Data Transferred (bytes). The traces are annotated with the protocol steps AUTHENTICATE, GET INTERFACE VERSION, GET SERVICE METADATA, CHECK ENTRY EXISTS, GET STAT, READ DIR, GET NEXT, and RESPONSE.]
Both protocols have large overhead:
➔ Several RPCs needed
➔ Authentication not persistent
➔ SOAP inflates messages by a factor of 5
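How an XML envelope inflates a small request can be illustrated by encoding the same call once as a text line and once as a minimal SOAP 1.1 envelope. This is a rough sketch only: the command name readDir, the namespace urn:example:catalogue, and the argument naming are made up for the example and are not the FiReMan or LFC wire formats, and the resulting ratio depends on the payload and toolkit.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"

def text_request(command, *args):
    # Hypothetical line-oriented request: command and arguments on one line.
    return (" ".join((command,) + args) + "\n").encode("utf-8")

def soap_request(command, *args):
    # Minimal SOAP 1.1 envelope carrying the same call; the operation
    # namespace "urn:example:catalogue" is an assumption of this sketch.
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{urn:example:catalogue}}{command}")
    for i, arg in enumerate(args):
        ET.SubElement(call, f"arg{i}").text = arg
    return ET.tostring(envelope, encoding="utf-8")

text = text_request("readDir", "/grid/atlas/data")
soap = soap_request("readDir", "/grid/atlas/data")
ratio = len(soap) / len(text)  # size inflation for this one request
```

For short requests the envelope dominates the message; for large result sets the per-element tags do.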
with C. Munro
Tools
Main tool for Network tracing: Ethereal
System tracing: strace, gdb
LFC & FiReMan Summary
LFC:
● Fast for single entries
● Relatively small protocol overhead
● Transactions → Consistency
● No bulk operations → Slow for many entries
FiReMan:
● Bulk operations → fast ops on many entries
● No transactions → no consistent reading
● Large protocol overhead
● Timeouts
Both catalogues could reduce protocol overhead
LFC should implement bulk operations
AMGA Server
Implement Metadata server from what we learned:
● Multi-threaded C++ server for Text-streaming & SOAP
● Uses ODBC as RDBMS abstraction: Oracle, PostgreSQL, SQLite
● Sessions supported in SOAP & streaming
● DB cursors
● Streams responses asynchronously
● Iterators for SOAP
[Diagram: on the client side, an Application uses the Client through the Java-API, C++-API, or Python. On the server side, SOAP and TEXT front ends sit behind a security wrapper (GSI, SSL) and a firewall, and feed the MD-Server (Command, Asynchr. Buffer), which reaches PostgreSQL through ODBC.]
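The combination of DB cursors and streamed responses can be sketched as a generator that walks a cursor chunk by chunk and emits one text line per row. This is an illustration under stated assumptions, not AMGA code: it uses Python's built-in sqlite3 in place of ODBC, and the table, the tab-separated line format, and the function name stream_rows are hypothetical.

```python
import sqlite3

def stream_rows(db, sql, params=(), chunk_size=100):
    # Walk a DB cursor chunk by chunk and yield one text line per row,
    # so the full result set is never held in memory at once.
    cursor = db.execute(sql, params)
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            break
        for row in rows:
            yield "\t".join(str(value) for value in row) + "\n"

# Hypothetical metadata table standing in for the real back end.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE metadata (entry TEXT, run INTEGER)")
db.executemany("INSERT INTO metadata VALUES (?, ?)",
               [(f"/grid/file{i}", i) for i in range(250)])

lines = list(stream_rows(db, "SELECT entry, run FROM metadata ORDER BY run"))
```

In a server, each yielded line would be written to the client socket as soon as it is produced instead of being collected into a list.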
Interface: Retrieving data
Bulk transfers to the client are done through iterators on the back end;
resending the query allows statelessness:
● int query(string query, MDResult &result)
● int nextQuery(string query, string token)
● int endQuery(string token)
struct MDResult {
Boolean last;
String token;
String query;
DataChunk chunk;
}
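One way to read the query / nextQuery / endQuery interface is sketched below in Python. It is a toy model, not the AMGA or gLite implementation: the class ToyCatalogue, the substring matching, and the use of the token as a plain offset are all assumptions made for illustration, and results are returned directly rather than through a reference argument.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class MDResult:
    last: bool        # True when no further chunks follow
    token: str        # opaque position marker handed back to the server
    query: str        # the query, echoed so the client can resend it
    chunk: List[str]  # one chunk of matching entries

class ToyCatalogue:
    """In-memory stand-in for the back end (hypothetical)."""

    def __init__(self, entries, chunk_size=3):
        self.entries = list(entries)
        self.chunk_size = chunk_size

    def _page(self, query, offset):
        matches = [e for e in self.entries if query in e]
        chunk = matches[offset:offset + self.chunk_size]
        next_offset = offset + len(chunk)
        return MDResult(last=next_offset >= len(matches),
                        token=str(next_offset), query=query, chunk=chunk)

    def query(self, query):
        return self._page(query, 0)

    def next_query(self, query, token):
        # The client resends the query and the token carries the position,
        # so this toy server keeps no per-client state between calls.
        return self._page(query, int(token))

    def end_query(self, token):
        # Nothing to release here; a stateful server would free its cursor.
        return 0

def fetch_all(service, query):
    result = service.query(query)
    rows = list(result.chunk)
    while not result.last:
        result = service.next_query(result.query, result.token)
        rows.extend(result.chunk)
    service.end_query(result.token)
    return rows

service = ToyCatalogue([f"/grid/run{i}" for i in range(10)] + ["/other/x"])
rows = fetch_all(service, "/grid/")
```

Because every call carries both the query and the token, any server instance could answer the next call, at the cost of re-evaluating the query each time.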
AMGA has streamed versions:
● int getAttr(string pattern, list<string> keys)
Returns values for all keys of the entries matching pattern
➔ Client knows the semantics
● int find(string pattern, string query, Handler &handle)
Returns all entries (no collections) matching pattern and fulfilling query
with gLite DM team
Performance
Extensive performance tests done on LAN:
[Figure: two panels plotted against # clients (1 to 100). Ping: average throughput [calls/sec] for 1000 ping operations, with series TCP-S no KA, TCP-S KA, gSOAP no KA, gSOAP KA; one region is annotated "out of Sockets". getAttr: average throughput [entries/sec] when reading 60 attributes of 1000 entries, with series TCP-S Single, TCP-S Bulk, gSOAP Single, gSOAP Bulk.]
No sessions used, no SSL
➔TCP Streaming in general 2-5 times faster than SOAP
➔Performance very promising
➔Importance of bulk transfers evident
with N. Santos
LAN and WAN
Comparisons of gSOAP and TCP streaming on
LAN and WAN
[Figure: two panels showing Execution Time [s] of 1000 ops (ping, add, get, get Bulk) for Raw no KA, Raw KA, gSOAP no KA, and gSOAP KA. One panel: LAN (0.8ms latency). Other panel: WAN (300ms latency). Bars labelled "get Bulk (x5)" are shown multiplied by 5.]
Single clients:
➔TCP streaming always fastest, but SOAP not bad
➔Times on WAN dominated by latency
➔Streaming dramatically faster on WAN
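Why latency dominates on the WAN follows from a back-of-the-envelope bound: a client that waits for each reply before sending the next request pays at least one latency per round trip. The sketch below assumes the quoted 0.8 ms and 300 ms figures are the cost per round trip and ignores connection setup, server time, and bandwidth; the function name and the ops_per_round_trip parameter are illustrative.

```python
import math

def min_time_s(n_ops, latency_s, ops_per_round_trip=1):
    # Lower bound on wall-clock time for a strictly sequential client:
    # one latency per round trip, nothing else accounted for.
    round_trips = math.ceil(n_ops / ops_per_round_trip)
    return round_trips * latency_s

lan = min_time_s(1000, 0.0008)   # about 0.8 s for 1000 single ops on the LAN
wan = min_time_s(1000, 0.300)    # about 300 s for 1000 single ops on the WAN
wan_bulk = min_time_s(1000, 0.300, ops_per_round_trip=1000)  # one round trip
```

Under these assumptions, packing many results into one streamed or bulk reply removes almost all of the latency cost, which matches the qualitative conclusion above.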
with N. Santos
SOAP Toolkit Performance
1000 ping operations on LAN
SOAP toolkits:
● C++: gSOAP
● Java: Axis
● Python: ZSI
[Figure: Execution Time [s] per client language (C++, Java, Python) for TCP-S no KA, TCP-S KA, SOAP no KA, and SOAP KA.]
SOAP toolkit quality varies widely:
● Took 2 weeks to write SOAP clients in 3 languages
● Toolkits incompatible, only hand-written WSDL works
● SOAP APIs differ between toolkits, whereas BSD sockets are a standard
with N. Santos
Summary
Existing catalogues can still be improved
SOAP is a problematic protocol for DB access:
● No sessions → No DB cursors
● Large overhead
● No streaming
Protocol should be tailored to the task
● Streaming very promising for DB access, necessary on WAN
● Statefulness needed?
LFC & FiReMan are 2nd generation FCs
● Still several issues with bulk ops, sessions, DB cursors, large protocol overhead...
SSL and Sessions
SSL can dramatically reduce performance if no
sessions are used:
[Figure: Pings per second [1/s] for 10 clients, 100 pings each, comparing Connection, Session, and Mult. Connections for TCP-S, TCP-S w. SSL, gSOAP, and gSOAP w. SSL.]