Alastair Dewhurst, Tom Byrne
Tom Byrne, 12th November 2014
• On 15th October we gave an overview talk on our plans for Ceph at the RAL Tier 1.
• This talk provides an update on the progress made, focusing on the xrootd deployment and testing.
• The current Ceph cluster has 7 nodes, using 2013 generation hardware.
• At the last meeting we had the S3 gateway running on a virtual machine (see the sketch after this list).
• We hope to have the firewall holes and x.509 authentication working by next week.
• The S3 gateway ‘does its own thing’ with files, which makes it difficult to use with other plugins.
• We will investigate writing our own WebDAV gateway.
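A minimal sketch of exercising the gateway once the firewall holes are open, assuming radosgw-admin on the gateway node and an s3cmd client whose ~/.s3cfg host_base/host_bucket point at our gateway (testuser and testbucket are hypothetical names):

# gateway node: create a test user; note the access/secret keys it prints
radosgw-admin user create --uid=testuser --display-name="Test User"

# client (with ~/.s3cfg pointed at the gateway and the keys from above):
# round-trip a test object through the S3 interface
s3cmd mb s3://testbucket
s3cmd put testfile s3://testbucket/testfile
s3cmd get s3://testbucket/testfile testfile.back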
• CERN have four XRootD-based plugins for Ceph:
• radosfs (implements files & directories in rados)
• xrootd-rados-oss (interfaces radosfs as an OSS plug-in)
• xrootd-diamond-ofs (adds checksumming & third-party copy (TPC))
• xrootd-auth-change-id (adds NFS-server-style authentication to xrootd)
• Our work has been on the xrootd-diamond-ofs.
• Setup instructions can be found at: https://github.com/cern-eos/eos-diamond/wiki
• Used the RPMs provided on the wiki to set up the XrootD gateway; an illustrative install sketch follows.
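Illustratively, the install step amounts to something like the following (the package names are assumptions based on the plugin names above; the authoritative list is on the CERN wiki):

# on the gateway node, install the plugin RPMs downloaded from the wiki
# (package names are assumed, not taken from the wiki)
yum localinstall xrootd-rados-oss-*.rpm xrootd-diamond-ofs-*.rpm
service xrootd restart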
• Had to set up a cache tier, because the plugin currently does not work directly with erasure coded pools.
• This is because a file is opened and then appended to; CERN are working on patching it to work with EC pools.
• There are two pools: data and metadata. A sketch of the tiering setup is below.
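For reference, the tiering follows the standard Ceph pattern, roughly as below (a sketch; the diamond-cache name, the PG counts and the erasure-code profile defaults are assumptions):

# create an erasure coded data pool and a replicated pool to act as its cache
ceph osd pool create diamond-data 128 128 erasure
ceph osd pool create diamond-cache 128 128 replicated

# attach the cache pool as a writeback tier in front of the EC pool,
# so xrootd writes (open + append) land on the replicated tier
ceph osd tier add diamond-data diamond-cache
ceph osd tier cache-mode diamond-cache writeback
ceph osd tier set-overlay diamond-data diamond-cache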
• The cache tier is using mostly default settings (see the sketch below):
• 3 replicas of the data.
• A ‘cold’ erasure coded copy is created immediately.
• An LRU-style algorithm cleans up data in the cache.
• We would prefer not to use a cache tier and instead have direct access to the erasure coded pool.
• It would be possible to have a ~10% cache tier in front of the storage.
• We believe an erasure coded pool should work well for us, as we are not appending to files.
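The behaviour described above maps onto standard cache-tier settings roughly as follows (a sketch; the diamond-cache pool name and the exact values are assumptions, with the dirty ratio at zero to force the ‘cold’ EC copy to be written out immediately):

ceph osd pool set diamond-cache size 3                        # 3 replicas in the cache
ceph osd pool set diamond-cache hit_set_type bloom            # track object hits for eviction
ceph osd pool set diamond-cache cache_target_dirty_ratio 0.0  # flush to the EC pool at once
ceph osd pool set diamond-cache cache_target_full_ratio 0.8   # evict cold objects at 80% full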
• The plugin splits each file into chunks, which are stored with a GUID in Ceph:
• This makes it hard to manage files and to write other plugins (see the reassembly sketch after the listing).
[root@gdss540 ~]# rados -p diamond-data ls | grep 774b1a83-14d0-4fb9-a6c0-10e36c32febf | sort
774b1a83-14d0-4fb9-a6c0-10e36c32febf
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000001
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000002
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000003
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000004
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000005
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000006
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000007
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000008
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000009
774b1a83-14d0-4fb9-a6c0-10e36c32febf//0000000a
774b1a83-14d0-4fb9-a6c0-10e36c32febf//0000000b
774b1a83-14d0-4fb9-a6c0-10e36c32febf//0000000c
774b1a83-14d0-4fb9-a6c0-10e36c32febf//0000000d
774b1a83-14d0-4fb9-a6c0-10e36c32febf//0000000e
774b1a83-14d0-4fb9-a6c0-10e36c32febf//0000000f
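To illustrate the problem: reading a file back without the plugin means fetching and concatenating the chunks by hand, along these lines (a sketch; it assumes the bare GUID object is the first chunk rather than separate metadata, which we have not verified):

# fetch each chunk in order and stitch the file back together
rados -p diamond-data ls | grep 774b1a83-14d0-4fb9-a6c0-10e36c32febf | sort | \
while read obj; do
    rados -p diamond-data get "$obj" /tmp/chunk
    cat /tmp/chunk >> reassembled.root
done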
[Figure: see https://indico.cern.ch/event/305441/session/5/contribution/37/material/slides/0.pdf]
• We have tried the commands from:
• a UI (using xrootd v3.3.6)
• a node (using xrootd v4.0.4)
• We can copy files in and out:
[root@gdss540 ~]# xrdcp ./ivukotic\:group.test.hc.NTUP_SMWZ.root root://gdss541//root/ivukotic:group.test.hc.NTUP_SMWZ.root.1
[760.2MB/760.2MB][100%][==================================================][95.03MB/s]
[root@gdss540 ~]# xrdcp root://gdss541//root/ivukotic:group.test.hc.NTUP_SMWZ.root /ivukotic\:group.test.hc.NTUP_SMWZ.root
[760.2MB/760.2MB][100%][==================================================][58.48MB/s]
• We can create directories with UNIX-style permissions:
xrdfs gdss541 mkdir "/atlas/?owner=10763&group=1307"
[root@gdss540 ~]# xrdfs gdss541 ls /atlas/
/atlas/ivukotic:group.test.hc.NTUP_SMWZ.root
/atlas/test
• The setup is “fragile” – we frequently need to restart xrootd.
• It dies when doing an “ls -l” (see below).
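The failure mode is easy to reproduce (shown as a sketch; the plain listing works, the long listing kills the daemon):

xrdfs gdss541 ls /atlas/      # works
xrdfs gdss541 ls -l /atlas/   # xrootd dies and needs restarting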
Code from Wahid:
• git clone https://wbhimji@git.cern.ch/reps/FAX
We wanted to try 4 tests:
• Read 10% of the file using a 30MB cache
• Read 100% of the file using a 30MB cache
• Read 10% of the file using a 100MB cache – CRASHED!
• Read 100% of the file using a 100MB cache – CRASHED!
Results with the 30MB cache (both 100MB cache tests crashed):

30MB cache                  1st        2nd        3rd        Average
100% read  CPU time (s)     31.13      31.13      30.5       30.92
           Disk IO (MB/s)   112.654    112.951    113.094    112.8997
10% read   CPU time (s)     15.9       16.35      16.04      16.09667
           Disk IO (MB/s)   110.737    112.13     112.056    111.641
• There are 3 threads of development:
• Get a simplified xrootd to work.
• Look into a GridFTP gateway – we have spoken to Brian Bockelman, who has made the equivalent for HDFS.
• Look into a WebDAV gateway – there are instructions to get started on the Ceph wiki, and we will speak to the DPM developers.
• We need to start looking at xattrs (see the sketch below).
• We have procured a Mac mini for future Calamari builds.
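As a starting point for the xattr work, rados can already attach extended attributes to objects directly; a sketch against one of the chunk objects above (the attribute name and value here are hypothetical):

# attach a (hypothetical) checksum to a chunk object and read it back
rados -p diamond-data setxattr 774b1a83-14d0-4fb9-a6c0-10e36c32febf checksum 1a2b3c4d
rados -p diamond-data getxattr 774b1a83-14d0-4fb9-a6c0-10e36c32febf checksum
rados -p diamond-data listxattr 774b1a83-14d0-4fb9-a6c0-10e36c32febf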
• We got the S3 gateway to work, but it wasn’t quite what we wanted.
• We are testing the Diamond plugin with help from CERN; we do not need all of its features.
• Question: why do all the plugins create their own data formats?
• If we go with an object store we will have to write our own plugins, but this does not appear to be an impossible task.