
Ceph – status update and XRootD testing

Alastair Dewhurst, Tom Byrne

Tom Byrne, 12th November 2014


Introduction

On 15th October we gave an overview talk on plans for Ceph at the RAL Tier 1.

This talk provides an update on progress, focusing on the XRootD deployment and testing.

The current Ceph cluster has 7 nodes of 2013-generation hardware.


S3 gateway

At the last meeting we had the S3 gateway running on a virtual machine.

We hope to have the firewall holes and X.509 authentication working by next week.

The S3 gateway 'does its own thing' with files, which makes it difficult to use alongside other plugins.
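As a hedged illustration of this point (the bucket name is made up, and .rgw.buckets is the default radosgw data pool, which may differ in a tuned setup): an object uploaded over S3 does not appear under its own name in RADOS.

s3cmd mb s3://testbucket                     # make a test bucket (name illustrative)
s3cmd put testfile s3://testbucket/testfile  # upload a file over S3
rados -p .rgw.buckets ls | head              # radosgw's internal object names appear here,
                                             # not 'testfile' itself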

We will investigate writing our own WebDAV gateway.


CERN plugins

CERN have four plugins based on XRootD for Ceph:

• radosfs (implements files & directories in RADOS)

• xrootd-rados-oss (interfaces radosfs as an OSS plug-in)

• xrootd-diamond-ofs (adds checksumming & third-party copy (TPC))

• xrootd-auth-change-id (adds NFS-server-style authentication to xrootd)

Our work has been on xrootd-diamond-ofs.

Setup instructions can be found at: https://github.com/cern-eos/eos-diamond/wiki


XRootD deployment

We used the RPMs provided on the wiki to set up the XRootD gateway.

We had to set up a cache tier because the plugin does not currently work directly with erasure-coded pools: files are opened and then appended to, which erasure-coded pools do not support. CERN are working on a patch to make it work with EC.

There are two pools: data and metadata.
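A minimal sketch of creating the two pools (PG counts are illustrative; diamond-data matches the listing later in this talk, while the metadata pool name is our assumption):

ceph osd pool create diamond-data 1024 1024 erasure   # data pool (erasure coded)
ceph osd pool create diamond-metadata 128 128         # metadata pool (replicated)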


Cache Tier

The cache tier uses mostly default settings:

• 3 replicas of the data

• A 'cold' erasure-coded copy is created immediately

• An LRU algorithm evicts data from the cache

We would prefer not to use a cache tier and instead have direct access to the erasure-coded pool.

It would still be possible to keep a ~10% cache tier in front of the storage.

We believe an erasure-coded pool should work well for us, since we are not appending to files.
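For reference, a minimal sketch of the cache-tier wiring described above (pool names, PG counts and thresholds are illustrative, not our production values):

ceph osd pool create diamond-cache 512 512               # replicated cache pool
ceph osd pool set diamond-cache size 3                   # 3 replicas of the data
ceph osd tier add diamond-data diamond-cache             # attach the cache to the EC data pool
ceph osd tier cache-mode diamond-cache writeback         # writes land in the cache first
ceph osd tier set-overlay diamond-data diamond-cache     # client I/O is redirected via the cache
ceph osd pool set diamond-cache hit_set_type bloom       # hit tracking, used for LRU-style eviction
ceph osd pool set diamond-cache cache_target_dirty_ratio 0.0    # flush the 'cold' EC copy promptly
ceph osd pool set diamond-cache target_max_bytes 1099511627776  # cache size bound (~1TB, illustrative)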


Diamond data

The plugin splits each file into chunks, which are stored under a GUID in Ceph.

This makes it hard to manage files and to write other plugins:

[root@gdss540 ~]# rados -p diamond-data ls | grep 774b1a83-14d0-4fb9-a6c0-10e36c32febf | sort
774b1a83-14d0-4fb9-a6c0-10e36c32febf
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000001
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000002
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000003
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000004
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000005
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000006
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000007
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000008
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000009
774b1a83-14d0-4fb9-a6c0-10e36c32febf//0000000a
774b1a83-14d0-4fb9-a6c0-10e36c32febf//0000000b
774b1a83-14d0-4fb9-a6c0-10e36c32febf//0000000c
774b1a83-14d0-4fb9-a6c0-10e36c32febf//0000000d
774b1a83-14d0-4fb9-a6c0-10e36c32febf//0000000e
774b1a83-14d0-4fb9-a6c0-10e36c32febf//0000000f
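For example (a hedged sketch using the GUID from the listing above), even basic bookkeeping has to go via the GUID rather than a file name:

GUID=774b1a83-14d0-4fb9-a6c0-10e36c32febf
rados -p diamond-data ls | grep -c "^${GUID}//"   # count the chunk objects (15 here)
rados -p diamond-data stat "${GUID}"              # size/mtime of the base object
rados -p diamond-data stat "${GUID}//00000001"    # size/mtime of one chunk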


Diamond meta-data

For details of the Diamond metadata layout, see: https://indico.cern.ch/event/305441/session/5/contribution/37/material/slides/0.pdf

Testing

We have tried commands from:

• a UI (using xrootd v3.3.6)

• a node (using xrootd v4.0.4)

We can copy files in and out:

[root@gdss540 ~]# xrdcp ./ivukotic\:group.test.hc.NTUP_SMWZ.root root://gdss541//root/ivukotic:group.test.hc.NTUP_SMWZ.root.1
[760.2MB/760.2MB][100%][==================================================][95.03MB/s]
[root@gdss540 ~]# xrdcp root://gdss541//root/ivukotic:group.test.hc.NTUP_SMWZ.root /ivukotic\:group.test.hc.NTUP_SMWZ.root
[760.2MB/760.2MB][100%][==================================================][58.48MB/s]


“Filesystem”

xrdfs gdss541 mkdir "/atlas/?owner=10763&group=1307"

We can create directories with UNIX-style permissions.

[root@gdss540 ~]# xrdfs gdss541 ls /atlas/
/atlas/ivukotic:group.test.hc.NTUP_SMWZ.root
/atlas/test

The setup is fragile – we frequently need to restart xrootd.

It dies when doing an "ls -l".
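For reference, this is the command that triggers the crash (host and path as in the examples above):

xrdfs gdss541 ls -l /atlas/    # long listing kills the gateway; a plain 'ls' (above) is fine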


Direct Read

Code from Wahid:

• git clone https://wbhimji@git.cern.ch/reps/FAX

We wanted to try 4 tests:

• Read 10% of the file using a 30MB cache

• Read 100% of the file using a 30MB cache

• Read 10% of the file using a 100MB cache – CRASHED!

• Read 100% of the file using a 100MB cache – CRASHED!

Results with the 30MB cache (three runs and the average):

Read    Metric          1st       2nd       3rd       Average
100%    CPU time /s     31.13     31.13     30.5      30.92
100%    Disk IO MB/s    112.654   112.951   113.094   112.8997
10%     CPU time /s     15.9      16.35     16.04     16.09667
10%     Disk IO MB/s    110.737   112.13    112.056   111.641

Future plans

There are 3 threads of development:

• Get a simplified xrootd setup to work.

• Look into a GridFTP gateway – we have spoken to Brian Bockelman, who has made an equivalent gateway for HDFS.

• Look into a WebDAV gateway – there are instructions to get started on the Ceph wiki, and we will speak to the DPM developers.

We need to start looking at xattrs.
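A minimal sketch of the rados-level xattr calls we would build on (the object name is taken from the earlier listing; the 'checksum' attribute and its value are illustrative):

rados -p diamond-data setxattr 774b1a83-14d0-4fb9-a6c0-10e36c32febf checksum adler32:0x1a2b3c4d
rados -p diamond-data listxattr 774b1a83-14d0-4fb9-a6c0-10e36c32febf
rados -p diamond-data getxattr 774b1a83-14d0-4fb9-a6c0-10e36c32febf checksum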

We have procured a Mac mini for future Calamari builds.


Summary

We got the S3 gateway to work, but it wasn't quite what we wanted.

We are testing the Diamond plugin with help from CERN, although we do not need all of its features.

Question: why do all the plugins create their own data formats?

If we go with an object store we will have to write our own plugins, but this does not appear to be an impossible task.
