Generic policy rules and principles
Jean-Yves Nief
Repository workshop - Garching 20/09/10

Talk overview
An introduction to CC-IN2P3 activity.
iRODS in production:
– Why are we using it?
– Who is using it?
– Prospects.
iRODS rule policies through examples:
– Resource Monitoring System.
– Biomedical applications:
  • Human data.
  • Animal data.
– Arts and Humanities.
– Other rules: mass storage system interface, access rights.
Pitfalls.
Future usages.

CC-IN2P3 activities
Federate the computing needs of the French scientific community in:
– Nuclear and particle physics.
– Astrophysics and astroparticles.
Computing services to international collaborations:
– CERN (LHC), Fermilab, SLAC, …
Now open to biology, Arts & Humanities.

iRODS @ CC-IN2P3: why use it?
National and international collaborations.
Users spread geographically (Europe, America, Australia…).
Need for storage virtualization:
– federation of heterogeneous storage (disks, tapes) and data access systems (MSS, databases…).
– transparent data access for end users.
– middleware working on heterogeneous OSes.
– common logical name space.
– virtual organization (access rights, groups etc…).
– metadata search.
– easy interface with any kind of client application (APIs, drivers).

iRODS @ CC-IN2P3: why use it? (continued)
SRB in use since 2003:
– 3 PB handled for 10 different experiments (HEP, astro, biology).
– Decommissioning: end of 2012?
Limitations:
– no centralized data management (DM).
– no enforcement of DM policy.
iRODS rule-based policies:
– an adequate solution.
– from the user's point of view: virtualization of the data management policy.

iRODS @ CC-IN2P3: who is using it?
Arts and Humanities (Adonis):
– Long term data preservation.
– Web and batch job access.
Biology (phylogenetic), fluid mechanics:
– grid jobs.
Biomedical applications:
– Human and animal imagery.
High Energy physics:
– Neutrino experiment.

iRODS @ CC-IN2P3: who is going to use it?
Astrophysics experiments:
– LSST …
Other biomedical and physics projects.
iRODS will be part of the French NGI.
All the SRB instances are to be moved to iRODS.
1 PB should be reached soon.

Rules examples: Arts and Humanities
Ex: archival and data publication of audio files (CRDO).
[Diagram: CRDO, CC-IN2P3, CINES, Archive, Fedora]
1. Data transfer: CRDO to CINES (Montpellier).
2. Archived at CINES.
3. iRODS transfer to CC-IN2P3: iput file.tar
4. Automatic untar at Lyon + checksum.
5. Automatic registration in Fedora-commons (delayed rule).

Rules examples: biomedical data
Human and animal data (fMRI, PET, MEG etc…), usually in DICOM format.
Main issue for human data:
– It needs to be anonymized!
Need to do metadata searches on DICOM files.
Rule:
1. Check that the file is anonymized: send a warning if it is not.
2. Extract a subset of metadata (based on a list stored in iRODS) from the DICOM files.
3. Add these metadata as user-defined metadata in iRODS.

Rules examples: resource monitoring system
[Diagram: an iRODS iCAT server with its DB, and several iRODS data servers each running a perf script]
1. Ask each server for its metrics: rule engine cron task (msi).
2. Performance script launched on each server.
3. Results sent back to the iCAT.
4. Metrics stored in the iCAT.
5. Compute a «quality factor» for each server, stored in another table: rule engine cron task (msi).

Other rules
Mass Storage System integration:
– Using compound resources: iRODS disk cache + tapes.
– Data on the disk cache is replicated into the MSS asynchronously (1h later) using a delayExec rule.
– Recovery mechanism: retries until success; the delay between retries is doubled at each round.
ACL management:
– Rules needed for fine-grained access rights management.
– E.g.:
  • 3 groups of users (admins, experts, users).
  • ACLs on /<zone-name>/*/rawdata => admins : r/w, experts + users : r
  • ACLs on all other subcollections => admins + experts : r/w, users : r

Developments needed
Scripts/binaries:
– Metadata extraction from DICOM files.
– Registration of files into Fedora-Commons.
– …
Needed whatever storage system is used underneath.
Micro-services:
– ACLs, tar/untar of archive files, …: APIs already available, so this did not require a large amount of work (part of the iRODS distribution).
– Resource Monitoring System: bigger development, includes a modification of the iCAT schema.
Rules:
– Most of them are simple.
– Some require more work (Adonis project), where the workflow is more complex.

Pitfalls and bugs
Writing complex rules:
– Avoid writing them directly using the .irb syntax.
– They become difficult to debug, especially with nested actions.
– Solution: use ruleGen to generate rules in a more user-friendly manner.
Some memory leaks found with irodsReServer with Oracle as a backend: fixed in 2.4.
delayExec syntax bugs: fixed in 2.4 and 2.4.1.
Rules are in a configuration file at the moment:
– They must be consistent on all the iRODS servers.
– They will be in the iCAT database in the future.

Prospects
Rules for database interaction (in progress):
– Will be used by DTM (developed at CC-IN2P3):
  • DTM manages a list of tasks to be processed by a batch cluster.
  • DTM requires a database to manage the tasks.
– A rule launched by the client will interact with the DTM database through iRODS:
  • More security: iRODS used as a proxy server (database behind a firewall, use of iRODS authentication).
  • Database schema upgrades are transparent for the client (no SQL code launched on the client side).
Xmessaging system (part of iRODS):
– Allows messages to be exchanged between different iRODS processes or clients.
– E.g.: could be used to monitor job status in a distributed computing environment.

Acknowledgement
Thanks to:
– Pascal Calvat.
– Yonny Cardenas.
– Thomas Kachelhoffer.
– Pierre-Yves Jallud.

iRODS at CC-IN2P3 03/25/10
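
Appendix: sketch of the recovery mechanism
The recovery mechanism mentioned under "Other rules" (retry until success, with the delay between retries doubled at each round) can be illustrated outside iRODS with a few lines of Python. This is only a sketch of the scheduling logic, not the actual delayExec rule used at CC-IN2P3: the function name, the `initial_delay` value, the optional `max_rounds` cap and the injectable `sleep` are all assumptions introduced for illustration.

```python
import time
from typing import Callable, Optional


def retry_with_doubling_delay(
    action: Callable[[], bool],
    initial_delay: float = 60.0,       # hypothetical starting delay, in seconds
    max_rounds: Optional[int] = None,  # None = retry until success, as on the slide
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `action` until it returns True; return the number of attempts made."""
    delay = initial_delay
    attempts = 0
    while True:
        attempts += 1
        if action():
            return attempts
        if max_rounds is not None and attempts >= max_rounds:
            raise RuntimeError(f"gave up after {attempts} attempts")
        sleep(delay)
        delay *= 2  # the delay between retries doubles at each round


# Usage: a stand-in "replication" that fails twice, then succeeds.
# The waits are recorded instead of slept so the example runs instantly.
recorded_waits = []
outcomes = iter([False, False, True])
attempts = retry_with_doubling_delay(
    lambda: next(outcomes), initial_delay=1.0, sleep=recorded_waits.append
)
print(attempts, recorded_waits)  # prints: 3 [1.0, 2.0]
```

Passing the sleep function as a parameter keeps the schedule testable without real waiting; in a real deployment the equivalent logic would live in the delayed rule itself.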