Template to gather information about existing portal / middleware and infrastructure solutions (for WP4 and WP5)

We assume that three lists will be created (portals, dispatchers, computing resources), linked to one another.

Partner number and name: UU – Partner 8

● Portal solution(s)
  ○ Name
    CS-ROSETTA3
  ○ URL
    http://haddock.science.uu.nl/enmr/services/CS-ROSETTA3
  ○ Institution - who is hosting/responsible for the portal itself
    UU
  ○ Description, applications supported
    Chemical-shift based NMR structure calculations using Rosetta. Several applications are needed to run a calculation, some on the grid and some only locally:
      - CSRosetta2009_05
      - DBScore
      - ProfitV3.1
      - csrosetta3
      - nmrPipe
      - rosetta3.5 (the only one really required on the grid)
      - talosplus
      - …
    Altogether about 30 GB of software and data are required.
  ○ Implementation framework (PHP, Tomcat, Django, …)
    The server front end consists of HTML pages (some with PHP); the CGI scripts are written in Python.
  ○ User AuthN/Z
    ■ Authentication mechanism - certificates, dedicated username/password, identity federations …
      Registration on the WeNMR site is required (a username/password combination is used for submission). Access to the portal is via the WeNMR SSO module (only granted with a valid X509 certificate registered with the enmr.eu VO).
    ■ Authorization mechanism - VOMS, registration procedures, membership renewal
      Connection to the WeNMR SSO module. The server itself uses a robot certificate for submission to the grid.
  ○ Details about size of datasets
    ■ How much data are uploaded by the user on job submission?
      Typically a few to tens of MB.
    ■ How much data are downloaded as the result?
      Results are presented on a web page; the full result archive (gzipped tar archive) can be up to several GB depending on the system size.
    ■ Are any "background" data used, e.g. PDB referred by id?
      No.
  ○ Essential statistics
    ■ Number of active users
      50 registered (WeNMR stats)
    ■ Number of jobs per year
      ~50 runs, translating into 60-70 thousand individual grid jobs
    ■ Average job length and number of CPU cores per job
      The jobs dispatched to the grid as part of the complex workflow have an average runtime of 1.5 hours (EGI accounting portal stats), but this varies greatly with the system size.
  ○ Which job dispatcher is used (from the list in the next section)
    Torque batch commands on the local resources; gLite WMS for the grid.
  ○ Computing resources accessed (from the list below)
    Local clusters (for pre- and post-processing) and EGI grid resources supporting the enmr.eu VO
  ○ (Luna question) Do the compute jobs require a shared drive mount? Are they multinode jobs (e.g. requiring MPI)?
    No.

● Job dispatcher
  ○ Overall description of architecture
    A complex Python workflow creates individual jobs that are sent either to local resources or to the grid. Grid submission is handled by separate grid scripts (mostly csh scripts running as cron daemons).
  ○ Supported backends (local batch system, gLite WMS/CREAM, Dirac, OCCI, …)
    Local batch system (Torque/Maui), gLite WMS
  ○ Standard solution vs. proprietary
    ■ Interfaces, API
  ○ Dispatcher vs. computing resource AuthN - user proxy certificates (using MyProxy?), robotic certificates, …
    Grid submission makes use of a robot certificate (in the name of Alexandre Bonvin).
  ○ Application software distribution - VM images, Docker, xroot, ...
    See above.

● Local computing resources
  ○ Operating system
    Scientific Linux (SL5.X)
  ○ Number of CPUs, Memory, Storage (total / used by portals)
    One cluster with ~180 cores and 4 TB of storage space. Results are stored for only 2 weeks to limit storage requirements. Software and portal account for ~30 GB of disk space; results storage currently in use is 3.5 GB.
  ○ Dispatcher / batch system
    Torque
  ○ Interfaces, APIs
    Not clear what is asked here.
    Seems redundant with the above.

● Remote computing resources
  ○ Provider
    EGI
  ○ Local to specific portal vs. external (accessed remotely)
    All sites that have the software tag VO-enmr.eu-ROSETTA3.3 added
  ○ Number of CPUs (or jobs), Memory, Storage (total / used by portals)
    The portal sends single-CPU jobs to the grid (~50000 per year). Job+data size is typically less than 20 MB. Only local temporary storage is used on the grid; results are recovered via the gLite WMS.
  ○ Interfaces, APIs
    ■ grid (CREAM, …)
  ○ Fixed, dynamic and/or opportunistic resources
    Opportunistic resources from sites supporting the enmr.eu VO that have the proper software tag added

● Software deployment / management
  ○ How is the software deployed? (e.g. sent with the jobs, remotely installed, other)
    Software is deployed and managed by us. Initially we used the software manager role in the enmr.eu VO to install the software in the local software directory on each grid site; in most cases this has now been replaced by CVMFS.
  ○ Licensing scheme
    Free for non-profit users, but a license form is required.

● Storage solutions
  ○ Local and network storage requirements
    An active run on the server can generate several GB of data (even >10 GB) while running. At most 10 concurrent runs are allowed (and typically at most 5 per user).
  ○ How are the results returned to the users?
    Users are notified by email and can access their results on a web page.
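
The local cluster keeps results for only 2 weeks to limit storage requirements, but the answers do not say how that retention is enforced. A minimal sketch, assuming each run writes to its own subdirectory under a results root and that a daily cron job (cron is already used for grid submission) does the cleanup; the function name `purge_old_results` and the directory layout are hypothetical, not taken from the CS-ROSETTA3 code.

```python
# Hypothetical sketch: enforce the 2-week result retention on the local
# cluster, e.g. from a daily cron job. Assumes one subdirectory per run.
import shutil
import time
from pathlib import Path

RETENTION_DAYS = 14  # results are only stored for 2 weeks
SECONDS_PER_DAY = 86400


def purge_old_results(results_root, retention_days=RETENTION_DAYS,
                      now=None, dry_run=False):
    """Delete run directories last modified before the retention cutoff.

    Returns the expired paths (only listed, not deleted, when dry_run=True).
    """
    now = time.time() if now is None else now
    cutoff = now - retention_days * SECONDS_PER_DAY
    expired = []
    for run_dir in sorted(Path(results_root).iterdir()):
        if run_dir.is_dir() and run_dir.stat().st_mtime < cutoff:
            expired.append(run_dir)
            if not dry_run:
                shutil.rmtree(run_dir)
    return expired
```

Calling it first with `dry_run=True` lists what would be deleted. A directory's modification time only changes when entries directly inside it are added or removed, so checking a per-run marker file (written when the run finishes) would be a more robust variant.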
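
Grid jobs are only sent to sites that have the VO-enmr.eu-ROSETTA3.3 software tag. With the gLite WMS this kind of matchmaking is typically expressed in the job's JDL `Requirements` attribute against the GLUE attribute `GlueHostApplicationSoftwareRunTimeEnvironment`. A minimal sketch of generating such a JDL from Python, assuming hypothetical helper names (`make_jdl`, `jdl_list`) and hypothetical file names (`run_rosetta.sh`, `input.tgz`, `output.tgz`); the portal's actual grid scripts are csh and are not shown in the source.

```python
# Hypothetical sketch: generate a gLite JDL for one single-CPU grid job that
# can only be matched to sites publishing the enmr.eu software tag.
SOFTWARE_TAG = "VO-enmr.eu-ROSETTA3.3"


def jdl_list(items):
    """Render a list of file names as a JDL list literal: {"a", "b"}."""
    return "{" + ", ".join(f'"{item}"' for item in items) + "}"


def make_jdl(executable, inputs, outputs, tag=SOFTWARE_TAG):
    """Return JDL text; `inputs`/`outputs` are extra sandbox file names."""
    attributes = [
        f'Executable = "{executable}";',
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        # The executable itself travels in the input sandbox.
        f"InputSandbox = {jdl_list([executable] + inputs)};",
        f"OutputSandbox = {jdl_list(['std.out', 'std.err'] + outputs)};",
        # Matchmaking: only CEs advertising the software tag are eligible.
        f'Requirements = Member("{tag}", '
        "other.GlueHostApplicationSoftwareRunTimeEnvironment);",
    ]
    return "\n".join(attributes) + "\n"


if __name__ == "__main__":
    print(make_jdl("run_rosetta.sh", ["input.tgz"], ["output.tgz"]))
```

The text would then be saved as, for example, `job.jdl` and submitted with `glite-wms-job-submit -a job.jdl`; the output sandbox (results under 20 MB per job, per the answers above) is retrieved through the WMS.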
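
The storage answers cap the server at 10 concurrent runs and typically 5 per user. A minimal in-process sketch of that admission check, assuming a hypothetical `RunLimiter` class with the two limits as defaults. Because the portal's Python CGI scripts run as a separate process per request, a real deployment would have to keep these counts in shared state (a database or a lock-protected file) rather than in memory.

```python
# Hypothetical sketch: admission control for portal runs, enforcing a global
# cap and a per-user cap on concurrently active runs.
import threading
from collections import Counter


class RunLimiter:
    """At most `max_total` active runs overall, `max_per_user` per user."""

    def __init__(self, max_total=10, max_per_user=5):
        self.max_total = max_total
        self.max_per_user = max_per_user
        self._active = Counter()        # user -> number of active runs
        self._lock = threading.Lock()   # guards _active across threads

    def try_start(self, user):
        """Register a new run for `user` if both limits allow it."""
        with self._lock:
            if sum(self._active.values()) >= self.max_total:
                return False
            if self._active[user] >= self.max_per_user:
                return False
            self._active[user] += 1
            return True

    def finish(self, user):
        """Release one slot when a run of `user` completes."""
        with self._lock:
            if self._active[user] > 0:
                self._active[user] -= 1
                if self._active[user] == 0:
                    del self._active[user]
```

A rejected submission would be queued or refused with a message; checking both limits under one lock keeps two simultaneous submissions from both taking the last free slot.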