4. Installation and Configuration

DataGrid
WP1 - WMS SOFTWARE
ADMINISTRATOR AND USER GUIDE
Document identifier: DataGrid-01-TEN-0118-1_2
Date: 07/03/2016
Work package: WP1
Partner: Datamat SpA
Document status
Deliverable identifier:
Abstract: This note provides the administrator and user guide for the WP1 WMS software.
IST-2000-25182
PUBLIC
Delivery Slip
              Name              Partner       Date         Signature
From          Fabrizio Pacini   Datamat SpA   24/11/2003
Verified by   Stefano Beco      Datamat SpA   24/11/2003
Approved by
Document Log
Issue   Date         Comment                       Author
0_0     21/12/2001   First draft                   Fabrizio Pacini
0_1     14/01/2002   Draft                         Fabrizio Pacini
0_2     24/01/2002   Draft                         Fabrizio Pacini
0_3     05/02/2002   Draft                         Fabrizio Pacini
0_4     15/02/2002   Draft                         Fabrizio Pacini
0_5     08/04/2002   Draft                         Fabrizio Pacini
0_6     13/05/2002                                 Fabrizio Pacini
0_7     19/07/2002                                 Fabrizio Pacini
0_8     16/09/2002                                 Fabrizio Pacini
0_9     03/12/2002                                 Fabrizio Pacini
1_0     13/06/2003   First issue for Release 2.0   Fabrizio Pacini, Massimo Sgaravatto
1_1     04/09/2003                                 Fabrizio Pacini
1_2     24/11/2003                                 Fabrizio Pacini
Document Change Record
Issues 0_1, 0_2, 0_3 (Item: General update). Reasons for change:
- Take into account changes in the rpm generation procedure.
- Add missing info about daemons (RB/JSS/CondorG) starting accounts
- Some general corrections
- Add Cancelling and Cancel Reason information.
- Add OUTPUTREADY job state.
- Add new profile rpms.
- Remove /etc/workload* shell scripts.
- Add summary map table (user / daemon).
- Add CEId format check.
- Add new job cancel notification.
- Modified RB/JSS start-up procedure
- Add gridmap-file users/groups issues
- Add proxy certificate usage by daemons
- Job attribute CEId changed to SubmitTo
- Add DGLOG_TIMEOUT setting
- Add workload-profile and userinterface-profile rpms
Issues 0_4, 0_5, 0_6, 0_7 (Item: General Update). Reasons for change:
- Add configure option --enable-wl for system configuration files
- Add installation checking option --with-globus for Globus to the Workload configure
- Add new Information Index configure options
- Remove edg-profile and edg-user-env rpms from II and UI dependencies
- Add security configuration rpms for all the Certificate Authorities to UI dependencies
- Add new parameters to RB configuration file
- Add new Job Exit Code field to the returned job status info
- Remove dependence from SWIG in the userinterface binary rpm
- Modify command options syntax (getopt-like style)
- Add MyProxy server and client package installation/utilisation
- Modify job cancel notification
- Add Userguide rpm
- Modify configure options for the various components
- UI commands modified to use python2 executable
- Clarify myproxy usage
- Explain how RB/LB addresses in the UI config file are used by the commands
- Add --logfile option to the UI commands
- Modify configure options for the various components
- Clarify UI commands --notify option usage
- Add make test target for UI
Issues 0_8, 0_9, 1_0, 1_1 (Item: General Update). Reasons for change:
- Specified dependencies of profile rpms
- Update needed env vars for UI
- Explain how to include default constraints in the job requirements
- Explain that the lc field in the ReplicaCatalog address is now mandatory
- Explain how to specify wildcards and special chars in "Arguments" in the JDL expression
- Defaults for Rank and Requirements in the UI config file
- Added reference to the “.BrokerInfo” file document
- other.CEId in Requirements vs --resource option
- Explain MyProxy Server configuration
- Added description of new parameters in RB configuration file
- RB/JSS databases clean-up procedure added
- Explain usage of RetryCount JDL attribute
- Better explain how to specify wildcards and special chars in "Arguments" in the JDL expression
- Updated reference to JDL Attributes note
- Added Annex on Submission failures analysis
- Refer to WMS release 2
- Description of new UI commands options for interactive jobs (--nogui, --nolisten)
- Added annexes section on job re-submission
Issue 1_2 (Item: General Update). Reasons for change:
- Add voms client APIs rpms among WMS components dependencies
- Update commands description due to the integration with VOMS
- Remove proxy credential creation from UI commands
- Remove --hours option from UI edg-job-submit command

Files
Software Products      User files
Word 2000              DataGrid-01-TEN-0118-1_2.doc
Acrobat Exchange 5.0   DataGrid-01-TEN-0118-1_2.pdf
CONTENT
1. INTRODUCTION
1.1. OBJECTIVES OF THIS DOCUMENT
1.2. APPLICATION AREA
1.3. APPLICABLE DOCUMENTS AND REFERENCE DOCUMENTS
1.4. DOCUMENT EVOLUTION PROCEDURE
1.5. TERMINOLOGY
2. EXECUTIVE SUMMARY
3. WORKLOAD MANAGEMENT SYSTEM OVERVIEW
3.1. DEPLOYMENT OF THE WMS SOFTWARE
4. INSTALLATION AND CONFIGURATION
4.1. LOGGING AND BOOKKEEPING SERVICES
4.1.1. Required software
4.1.1.1. LB local-logger and LB APIs
4.1.1.2. LB Server
4.1.2. Configuration
4.1.2.1. LB Local-Logger
4.1.2.2. LB Server
4.1.3. Environment Variables
4.2. SERVICES RUNNING IN THE “RB NODE”: NS, WM, JC, LM
4.2.1. Required software
4.2.1.1. Globus installation and configuration
4.2.1.1.1. Condor-G installation and configuration
4.2.1.2. ClassAd installation and configuration
4.2.1.3. Boost installation and configuration
4.2.1.4. Replica Manager installation and configuration
4.2.2. Configuration
4.2.2.1. Configuration of the “common” attributes
4.2.2.2. NS configuration
4.2.2.3. WM configuration
4.2.2.4. JC configuration
4.2.2.5. LM configuration
4.2.3. Environment variables
4.2.4. Other requirements and configurations for the “RB node”
4.2.4.1. Customized Gridftp server
4.2.4.2. Grid-mapfile
4.2.4.3. Disk Quota
4.3. SECURITY SERVICES
4.3.1. MyProxy Server
4.3.2. Proxy renewal service
4.3.2.1. Required software
4.3.2.2. Configuration
4.3.2.3. Environment variables
4.4. GRID ACCOUNTING SERVICES
4.4.1. Required software
4.4.1.1. Creating the MySQL databases for the HLR server
4.4.1.2. Creating the MySQL database for the PA server
4.4.2. Configuration
4.4.2.1. Configuring the HLR server
4.4.2.2. Configuring the PA server
4.4.2.3. Configuring the ATM client software
4.4.3. Environment variables
4.5. USER INTERFACE
4.5.1. Required software
4.5.1.1. Python Command Line Interface
4.5.1.2. C++ API
4.5.1.3. Java API
4.5.1.4. Java GUI
4.5.2. RPM installation
4.5.2.1. Python Command Line Interface
4.5.2.2. C++ API
4.5.2.3. Java API
4.5.2.4. Java GUI
4.5.3. Configuration
4.5.3.1. Python Command Line Interface
4.5.3.2. Java GUI
4.5.4. Environment variables
4.5.4.1. Python Command Line Interface
4.5.4.2. Java GUI
5. OPERATING THE SYSTEM
5.1. LB LOCAL-LOGGER
5.1.1. Starting and stopping daemons
5.1.2. Troubleshooting
5.2. LB SERVER
5.2.1. Starting and stopping daemons
5.2.2. Creating custom indices
5.2.3. Purging the LB database
5.2.4. Experimental R-GMA Interface
5.2.5. Troubleshooting
5.3. SERVICES RUNNING IN THE “RB NODE”: NS, WM, JC, LM
5.3.1. Starting and stopping NS, WM, JC and LM daemons
5.3.2. NS, WM, JC, LM troubleshooting
5.4. PROXY RENEWAL
5.4.1. Starting and stopping daemon
5.4.2. Troubleshooting
5.5. PURGER
5.6. GRID ACCOUNTING
5.6.1. Starting and stopping daemon
5.6.1.1. HLR server
5.6.1.2. PA Server
5.6.2. HLR server administration
5.6.2.1. Creating a Fund account
5.6.2.2. Creating a Group account
5.6.2.3. Creating a User account
5.6.2.4. Creating a Resource account
5.6.2.5. Deleting accounts
5.6.3. Troubleshooting
5.7. USER INTERFACE (JAVA GUI)
5.7.1. Troubleshooting
6. USER GUIDE
6.1. USER INTERFACE
6.1.1. Security
6.1.1.1. MyProxy
6.1.1.1.1. MyProxyClient
6.1.2. Common behaviours
6.1.2.1. The --input option
6.1.3. Commands description
6.1.3.1. edg-job-submit
6.1.3.2. edg-job-get-output
6.1.3.3. edg-job-list-match
6.1.3.4. edg-job-cancel
6.1.3.5. edg-job-status
6.1.3.6. edg-job-get-logging-info
6.1.3.7. edg-job-attach
6.1.3.8. edg-job-get-chkpt
7. ANNEXES
7.1. JDL ATTRIBUTES
7.2. JOB STATUS DIAGRAM
7.3. JOB EVENT TYPES
7.4. SUBMISSION FAILURES ANALYSIS
7.5. JOB RESUBMISSION AND RETRYCOUNT
7.6. WILDCARD PATTERNS
7.7. THE MATCH MAKING ALGORITHM
7.7.1. Direct Job Submission
7.7.2. Job submission without data-access requirements
7.7.3. Job submission with data-access requirements
1. INTRODUCTION
This document provides a guide to the installation, configuration and usage of the WP1 WMS
software released within the DataGrid project.
1.1. OBJECTIVES OF THIS DOCUMENT
The goal of this document is to describe the complete process by which the WP1 WMS software
can be installed and configured on the DataGrid test-bed platforms.
It also provides guidelines for operating the whole system and for accessing the
functionality it offers.
1.2. APPLICATION AREA
Administrators can use this document as a basis for installing, configuring and operating
WP1 WMS software. Users can refer to the User Guide chapter for accessing provided
services through the User Interface.
1.3. APPLICABLE DOCUMENTS AND REFERENCE DOCUMENTS
Applicable documents
[A1] JDL Attributes - DataGrid-01-TEN-0142-0_0 – 13/06/2003
     (http://www.infn.it/workload-grid/docs/DataGrid-01-TEN-0142-0_0.{doc,pdf})
[A2] Definition of the architecture, technical plan and evaluation criteria for the resource
     co-allocation framework and mechanisms for parallel job partitioning
     (http://www.infn.it/workload-grid/docs/DataGrid-01-D1.4-0127-1_0.{doc, pdf})
[A3] DataGrid Accounting System - Architecture v 1.0
[A4] Logging and Bookkeeping Architecture – DataGrid-01-TED-0141
     (http://www.infn.it/workload-grid/docs/DataGrid-01-TED-0126-1_0.pdf)
     (http://lindir.ics.muni.cz/dg_public/lb_draft2_formatted.pdf)
[A5] Job Description Language HowTo – DataGrid-01-TEN-0102-02 – 17/12/2001
     (http://www.infn.it/workload-grid/docs/DataGrid-01-TEN-0102-0_2.pdf)
[A6] The Glue CE Schema
     (http://www.cnaf.infn.it/~sergio/datatag/glue/v11/CE/index.htm)
Reference documents
[R1] The Resource Broker Info file – DataGrid-01-TEN-0135-0_0
     (http://www.infn.it/workload-grid/docs/DataGrid-01-TEN-0135-0_0.{doc,pdf})
[R2] LB-API Reference Document – DataGrid-01-TED-0139-0_0
     (http://lindir.ics.muni.cz/dg_public/lb_api.pdf)
[R3] Job Partitioning and Checkpointing – DataGrid-01-TED-0119-0_3
     (https://edms.cern.ch/file/347730/1/DataGrid-01-TED-0119-0_3.pdf)
1.4. DOCUMENT EVOLUTION PROCEDURE
The content of this document is subject to modification in response to the following
events:
- Comments received from DataGrid project members,
- Changes/evolutions/additions to the WMS components.
1.5. TERMINOLOGY
Definitions
Condor   Condor is a High Throughput Computing (HTC) environment that can
         manage very large collections of distributively owned workstations.
Globus   The Globus Toolkit is a set of software tools and libraries aimed at the
         building of computational grids and grid-based applications.
Glossary
class-ad   Classified advertisement
CE         Computing Element
CLI        Command Line Interface
DB         Data Base
DGAS       Datagrid Grid Accounting Service
EDG        European DataGrid
FQDN       Fully Qualified Domain Name
GIS        Grid Information Service, aka MDS
GSI        Grid Security Infrastructure
GUI        Graphical User Interface
HLR        Home Location Register
IS         Information Service
job-ad     Class-ad describing a job
JA         Job Adapter
JC         Job Controller
JDL        Job Description Language
LB         Logging and Bookkeeping Service
LM         Log Monitor
LRMS       Local Resource Management System
MDS        Metacomputing Directory Service, aka GIS
MPI        Message Passing Interface
NS         Network Server
OS         Operating System
PA         Price Authority
PID        Process Identifier
PM         Project Month
RB         Resource Broker
SE         Storage Element
SI00       Spec Int 2000
SMP        Symmetric Multi Processor
TBC        To Be Confirmed
TBD        To Be Defined
UI         User Interface
VO         Virtual Organisation
VOMS       Virtual Organisation Membership server
WM         Workload Manager
WMS        Workload Management System
WP         Work Package
2. EXECUTIVE SUMMARY
This document comprises the following main sections:
Section 3: Workload Management System Overview
Briefly introduces the revised Workload Management System architecture and
discusses the deployment of the WMS components.
Section 4: Installation and Configuration
Describes the changes that need to be made to the environment and the steps to be
performed to install the WMS software on the test-bed target platforms. The resulting
installation tree structure is detailed for each system component.
Section 5: Operating the System
Provides the procedures for starting and stopping the processes and utilities of the
WMS components.
Section 6: User Guide
Describes, in Unix man-page style, all User Interface commands that give the user
access to the services provided by the WMS.
Section 7: Annexes
Expands on topics introduced in the User Guide section that help the user better
understand the system behaviour.
3. WORKLOAD MANAGEMENT SYSTEM OVERVIEW
The revised (release 2) architecture of the EDG Workload Management System (WMS),
which is described in detail in [A2], is represented in Figure 1.
Figure 1: UML diagram describing the new (rel. 2) WMS architecture
The User Interface (UI) is the component that allows users to access the functionalities
offered by the Workload Management System.
The Network Server (NS) is a generic network daemon, responsible for accepting incoming
requests from the UI (e.g. job submission, job removal), which, if valid, are then passed to
the Workload Manager.
The Workload Manager (WM) is the core component of the Workload Management System.
Given a valid request, it has to take the appropriate actions to satisfy it. To do so, it may
need support from other components, which are specific to the different request types.
All these components that offer support to the Workload Manager provide a class whose
interface is inherited from a Helper class. Essentially the Helper, given a JDL expression,
returns a modified one, which represents the output of the required action. For example, if
the request was to find a suitable resource for a job, the input JDL expression will be the one
specified by the user, and the output will be the JDL expression augmented with the CE
choice.
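The Helper contract described above can be illustrated with a minimal Python sketch. It is not the WMS implementation: the names `Helper`, `resolve`, `MatchmakerHelper` and the `ce_id` key are hypothetical, and a plain dictionary with callables stands in for a real JDL expression.

```python
from abc import ABC, abstractmethod


class Helper(ABC):
    """Hypothetical base class: takes a job description, returns a modified copy."""

    @abstractmethod
    def resolve(self, jdl: dict) -> dict:
        ...


class MatchmakerHelper(Helper):
    """Toy matchmaker: augments the job description with a CE choice."""

    def __init__(self, candidates):
        # candidates: list of dicts describing CEs (assumed shape, for illustration)
        self.candidates = candidates

    def resolve(self, jdl: dict) -> dict:
        # Stand-ins for the Requirements and Rank expressions of a job.
        requirements = jdl.get("requirements", lambda ce: True)
        rank = jdl.get("rank", lambda ce: 0)
        suitable = [ce for ce in self.candidates if requirements(ce)]
        if not suitable:
            raise LookupError("no CE satisfies the job requirements")
        best = max(suitable, key=rank)
        result = dict(jdl)            # the input description is left untouched
        result["ce_id"] = best["id"]  # output = input augmented with the CE choice
        return result


ces = [
    {"id": "ce-a.example.org", "free_cpus": 2},
    {"id": "ce-b.example.org", "free_cpus": 8},
]
job = {
    "executable": "/bin/hostname",
    "requirements": lambda ce: ce["free_cpus"] > 0,
    "rank": lambda ce: ce["free_cpus"],
}
augmented = MatchmakerHelper(ces).resolve(job)
print(augmented["ce_id"])  # ce-b.example.org
```

Returning a new description instead of modifying the input mirrors the "given a JDL expression, returns a modified one" wording, and would let several helpers be chained one after another.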
The Resource Broker (RB) or Matchmaker is one of these classes offering support to the
Workload Manager. It provides a matchmaking service: given a JDL expression (e.g. for a
job submission), it finds the resources that best match the request. It interacts with the
Information Service and with the data management services.
The Job Adapter (JA) is responsible for making the final “touches” to the JDL expression for
a job, before it is passed to CondorG for the actual submission. So, besides preparing the
CondorG submission file, this module is also responsible for creating the wrapper script, and
for creating the appropriate execution environment in the CE worker node (this includes the
transfer of the input and of the output sandboxes).
CondorG is the module responsible for performing the actual job management operations
(job submission, job removal, etc.), issued on request of the Workload Manager.
The Log Monitor (LM) is responsible for “watching” the CondorG log file, intercepting
interesting events concerning active jobs, that is events affecting the job state machine (e.g.
job done, job cancelled, etc.), and therefore triggering appropriate actions.
The Logging and Bookkeeping (LB) service stores logging and
bookkeeping information concerning events generated by the various components of the
WMS. Using this information, the LB service keeps a state machine view of each job.
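As an illustration of how a job state can be derived from logged events, the sketch below folds an ordered list of events into a state. The event names and the transition table are hypothetical and deliberately small; the actual event types and job states are covered in the annexes (Job Event Types, Job Status Diagram) and are not reproduced here.

```python
# Hypothetical transition table: (current state, event) -> next state.
TRANSITIONS = {
    ("SUBMITTED", "accepted"): "WAITING",
    ("WAITING", "matched"): "READY",
    ("READY", "transferred"): "SCHEDULED",
    ("SCHEDULED", "started"): "RUNNING",
    ("RUNNING", "finished"): "DONE",
}
# Hypothetical events that end the job regardless of its current state.
TERMINAL_EVENTS = {"cancelled": "CANCELLED", "aborted": "ABORTED"}


def job_state(events, initial="SUBMITTED"):
    """Fold an ordered list of event names into a job state."""
    state = initial
    for event in events:
        if event in TERMINAL_EVENTS:
            return TERMINAL_EVENTS[event]
        # Events that do not apply to the current state are ignored.
        state = TRANSITIONS.get((state, event), state)
    return state


print(job_state(["accepted", "matched", "transferred", "started"]))  # RUNNING
print(job_state(["accepted", "matched", "cancelled"]))               # CANCELLED
```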
As described in section 4.3, a proxy renewal mechanism is available to ensure that a valid
user proxy exists within the WMS for the whole lifetime of a job. This proxy renewal service
relies on the MyProxy software.
The DataGrid Accounting System (DGAS) is another functionality offered by the WMS,
described in detail in [A3]. DGAS has two main purposes:

Economic accounting for Grid Users and Resources
The users pay for resource usage, while the resources earn virtual credits by executing
user jobs

Economic Brokering
Help the Resource Broker in choosing the most suitable resource for a given job
according to the current price of a resource and a pre-defined economic policy.
The HLR (Home Location Register) server is responsible for implementing the first item in the
list; the second is covered by the Price Authority (PA) service. The suggested configuration
is to have one HLR server and one PA server per VO.
3.1. DEPLOYMENT OF THE WMS SOFTWARE
For the deployment of the WMS software, the following types of “boxes” can be
identified:

The User Interface machine, which is used to interact with the functionalities of the
WMS: the WMS User Interface software has to be installed on this machine;
moreover on this machine part of the DGAS HLR client software (the DGAS job-auth
client software) and the LB C and C++ API have to be installed

The “RB node”, where the Network Server, the Workload Manager and its helpers
(Matchmaker and Job Adapter), the Job Controller, CondorG, the Log Monitor, the LB
local logger, the LB C API, the Proxy Renewal components have to be installed

The LB server, where the LB software has to be installed

The Computing Elements (CEs): on the gatekeeper node of each CE the LB local
logger software and part of the DGAS HLR client software (the DGAS ATM client
software) have to be installed. On the WNs it is necessary to install the checkpointing
API and the C and sh LB APIs.

The MyProxy server host

The HLR server, where the HLR server and the PA client software have to be
installed

The PA server, where the PA server software has to be installed
These are the EDG WP1 RPMs needed on the various “machines”:
User Interface machine:
edg-wl-common-api-X.Y.Z-K_gcc3_2_2.i486.rpm
edg-wl-common-api-java-X.Y.Z-K_gcc3_2_2.i486.rpm
edg-wl-dgas-hlr-ui-X.Y.Z-K_gcc3_2_2.i486.rpm
edg-wl-logging-api-c-X.Y.Z-K_gcc3_2_2.i486.rpm
edg-wl-logging-api-cpp-X.Y.Z-K_gcc3_2_2.i486.rpm
edg-wl-ui-api-cpp-X.Y.Z-K_gcc3_2_2.i486.rpm
edg-wl-ui-api-java-X.Y.Z-K_gcc3_2_2.i486.rpm
edg-wl-ui-cli-X.Y.Z-K_gcc3_2_2.i486.rpm
edg-wl-ui-config-X.Y.Z-K_gcc3_2_2.i486.rpm
edg-wl-ui-gui-X.Y.Z-K_gcc3_2_2.i486.rpm
edg-wl-chkpt-api-X.Y.Z-K_gcc3_2_2
edg-wl-config-X.Y.Z-K_gcc3_2_2
edg-wl-bypass-X.Y-Z.i486.rpm
“RB node”:
edg-wl-common-api-X.Y.Z-K_gcc3_2_2.i486.rpm
edg-wl-config-X.Y.Z-K_gcc3_2_2.i486.rpm
edg-wl-interactive-X.Y.Z-K_gcc3_2_2.i486.rpm
edg-wl-locallogger-X.Y.Z-K_gcc3_2_2.i486.rpm
edg-wl-logging-api-c-X.Y.Z-K_gcc3_2_2.i486.rpm
edg-wl-proxyrenewal-X.Y.Z-K_gcc3_2_2.i486.rpm
edg-wl-wm-X.Y.Z-K_gcc3_2_2.i486.rpm
edg-wl-dgas-hlr-jobAuthClient-X.Y.Z-K_gcc3_2_2.i486.rpm
edg-wl-globus-gridftp-X.Y.Z-gxx3_2_2.i486.rpm
edg-wl-bypass-X.Y-Z.i486.rpm
LB server:
edg-wl-config-X.Y.Z-K_gcc3_2_2.i486.rpm
edg-wl-lbserver-X.Y.Z-K_gcc3_2_2.i486.rpm
edg-wl-lbserver-rgma-X.Y.Z-K_gcc3_2_2.i486.rpm
Gatekeeper of CE:
edg-wl-config-X.Y.Z-K_gcc3_2_2.i486.rpm
edg-wl-dgas-hlr-ATMClient-X.Y.Z-K_gcc3_2_2.i486.rpm
edg-wl-locallogger-X.Y.Z-K_gcc3_2_2.i486.rpm
WN of CE:
edg-wl-chkpt-api-X.Y.Z-K_gcc3_2_2.i486.rpm
edg-wl-logging-api-c-X.Y.Z-K_gcc3_2_2.i486.rpm
edg-wl-logging-api-cpp-X.Y.Z-K_gcc3_2_2.i486.rpm
edg-wl-logging-api-sh-X.Y.Z-K_gcc3_2_2.i486.rpm
edg-wl-common-api-X.Y.Z-K_gcc3_2_2.i486.rpm
DGAS HLR server:
edg-wl-dgas-hlr-server-X.Y.Z-K_gcc3_2_2.i486.rpm
edg-wl-dgas-hlr-server-admin-X.Y.Z-K_gcc3_2_2.i486.rpm
DGAS PA server:
edg-wl-dgas-pa-server-X.Y.Z-K_gcc3_2_2.i486.rpm
Note that this list includes only the RPMs of the EDG WP1 software (i.e. the software
required by these RPMs is not listed: details can be found in section 4).
These different types of services do not strictly need to be installed on different
machines. A machine can in fact host different services (for example, the PA server and the
HLR server could run on the same machine).
4. INSTALLATION AND CONFIGURATION
This section describes the procedures for installing and configuring the WP1 WMS
components on the target platforms. For each component, the list of dependencies (i.e. the
software that must be present on the same machine for the component to run) is reported
first, followed by the installation procedure, described through step-by-step examples.
The required configuration items and environment variable settings are also described.
Note that, since the RPMs are generated using gcc 3.2 and RPM 4.0.2, the same
configuration is expected on the target platforms.
4.1. LOGGING AND BOOKKEEPING SERVICES
From the installation point of view, LB services can be split into three main components:

The LB services responsible for accepting messages from their sources and
forwarding them to the logging and/or bookkeeping servers, referred to below as LB
local-logger services.

The LB services responsible for accepting messages from the LB local-logger
services, saving them on their permanent storage and supporting queries generated
by the consumer API, referred to below as LB server services.

The LB APIs (C, C++, sh)
The LB local-logger services must be installed on all the machines hosting processes
pushing information into the LB system, i.e. the “RB node” and the gatekeeper machines of
the CEs. An exception is the submitting machine (i.e. the machine running the User
Interface) on which this component can be installed but is not mandatory.
The LB server services need instead to be installed only on a server machine.
The LB APIs should be installed on the UI machine (C and C++ APIs), “RB node” (C and
C++ APIs) and on the CE worker nodes (C and sh APIs).
4.1.1. Required software
4.1.1.1. LB local-logger and LB APIs
For the installation of the LB local-logger and LB APIs the only software required is the
Globus Toolkit 2.2 (actually only GSI rpms are needed). Globus 2.2 RPMs are available at
http://datagrid.in2p3.fr/distribution/globus/vdt-1.1.8/globus/RPMS/
4.1.1.2. LB Server
For the installation of the LB server, the Globus Toolkit 2.2 is required
(actually only GSI RPMs are needed). Globus 2.2 RPMs are available at
http://datagrid.in2p3.fr/distribution/globus/vdt-1.1.8/globus/RPMS/
Besides the Globus Toolkit, for the LB server to work properly it is also necessary to install
MySQL Distribution 4.0.1 or higher.
Packages and documentation about MySQL can be found at: http://www.mysql.org.
The MySQL RPMs for pc-linux-gnu (i686) are also available at
http://datagrid.in2p3.fr/distribution/external/RPMS/. At least the packages MySQL-4.0.x and
MySQL-client-4.0.x have to be installed for creating and configuring the LB database.
The LB server stores the logging data in a MySQL database, which must therefore be created.
The following assumes that the database and the server daemon (edg-wl-bkserverd) run on the
same machine, which is considered to be secure, i.e. no database authentication is used. In
a different set-up the procedure has to be adjusted accordingly, and a secure database
connection (via ssh tunnel etc.) has to be established.
The action list below contains the placeholders DB_NAME and USER_NAME, for which real
values have to be substituted. They form the database connection string required when
invoking some LB daemons. The suggested value for DB_NAME is 'lbserver20' and for
USER_NAME is 'lbserver'. These values are also the compiled-in defaults (i.e. when they are
used, the database connection string need not be specified at all).
The following steps require MySQL root privileges:
1. Create the database:
mysqladmin -u root -p create DB_NAME
where DB_NAME is the name of the database.
2. Create a dedicated LB database user:
mysql -u root -p -e 'grant create,drop,alter,index,select,insert,update,delete on DB_NAME.* to USER_NAME@localhost'
where USER_NAME is the name of the user running the LB server daemon.
3. Create the database tables:
mysql -u USER_NAME DB_NAME < server.sql
where server.sql is a file containing sql commands for creating needed tables.
server.sql can be found in the directory “<install path>/etc” created by the LB server
rpm installation.
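The three steps above can be collected in a small helper. The following is only a sketch: it builds the command lines without running them, uses the suggested defaults 'lbserver20' and 'lbserver', and assumes that server.sql is reachable under the given path. The function name lb_database_setup_commands is hypothetical and is not part of the LB distribution.

```python
import shlex


def lb_database_setup_commands(db_name="lbserver20", user_name="lbserver",
                               schema_file="server.sql"):
    """Return the three set-up steps as (argv, stdin_file) pairs.

    stdin_file is None when the command reads nothing from standard input.
    """
    grant = (
        "grant create,drop,alter,index,select,insert,update,delete "
        f"on {db_name}.* to {user_name}@localhost"
    )
    return [
        # Step 1: create the database (MySQL root privileges, prompts for password).
        (["mysqladmin", "-u", "root", "-p", "create", db_name], None),
        # Step 2: create the dedicated LB database user.
        (["mysql", "-u", "root", "-p", "-e", grant], None),
        # Step 3: create the tables, reading the SQL commands from server.sql.
        (["mysql", "-u", user_name, db_name], schema_file),
    ]


if __name__ == "__main__":
    # Print the commands in shell form so that they can be reviewed before use.
    for argv, stdin_file in lb_database_setup_commands():
        suffix = f" < {shlex.quote(stdin_file)}" if stdin_file else ""
        print(shlex.join(argv) + suffix)
```

Printing the commands rather than executing them keeps the password prompts and the choice of the actual server.sql location in the hands of the administrator.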
For the LB server it is also necessary to install expat (recommended release is 1.95.2 or
higher), which can be downloaded from: http://datagrid.in2p3.fr/distribution/external/RPMS/.
4.1.2. Configuration
4.1.2.1. LB Local-Logger
The LB local logger has no configuration file.
4.1.2.2. LB Server
The LB server has no configuration file.
By default the LB server is configured with only very few indices so that a limited set of
queries is supported. Upon installation the server administrator may decide to create
additional indices to support further expected user query types. See section 5.2.2 for details.
4.1.3. Environment Variables
All LB components recognize the following environment variables in the same way GSI
handles them:

X509_USER_KEY

X509_USER_CERT

X509_CERT_DIR

X509_USER_PROXY
However, in case of LB daemons, the recommended way for specifying security files
locations is using --cert, --key, --CAdir options explicitly: GSI searches through various
default locations and finding a wrong credential file in some of them may cause unexpected
behaviour.
The Logging library, i.e. the library that is linked into the UI, NS, WM, JC and LM and is
called from the job-wrapper script, recognizes the following environment variables (besides
the X509_* ones listed above):

GLOBUS_HOSTNAME
Hostname that will appear as the source of logged events

EDG_WL_LOG_DESTINATION
<hostname>:<port> of the local-logger to use

EDG_WL_LOG_TIMEOUT
Timeout for standard (asynchronous) logging

EDG_WL_LOG_SYNC_TIMEOUT
Timeout for synchronous logging

EDG_WL_QUERY_SERVER
Default server to query (prefix in JobId overrides this setting)

EDG_WL_QUERY_TIMEOUT
Timeout for queries
All of them have reasonable defaults and need not be set in normal operation (details can be
found in [R2]).
On the submitting machine, if the variable EDG_WL_LOG_DESTINATION is not set, it is
dynamically assigned by the UI so that it refers to the machine where the NS runs. In this
case the timeout of the Logging library functions is automatically increased with respect to
the default value (as recommended for non-local logging).
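As an illustration of the <hostname>:<port> format and of the fallback to the NS machine, the following sketch resolves the logging destination from a set of environment variables. It is not the Logging library code: the function name and the fallback_port argument are hypothetical, and the port used when the variable is unset is an assumption that must be supplied by the caller.

```python
def resolve_log_destination(env, ns_host, fallback_port):
    """Return (host, port) of the local-logger to use.

    env           -- mapping of environment variables (e.g. os.environ)
    ns_host       -- host where the NS runs, used when the variable is unset
    fallback_port -- port assumed in that case (hypothetical parameter)
    """
    value = env.get("EDG_WL_LOG_DESTINATION")
    if not value:
        # Variable not set: refer to the machine where the NS runs.
        return ns_host, fallback_port
    host, separator, port = value.rpartition(":")
    if not separator or not host or not port.isdigit():
        raise ValueError(
            "EDG_WL_LOG_DESTINATION must have the form <hostname>:<port>")
    return host, int(port)
```

For example, with EDG_WL_LOG_DESTINATION set to "logger.example.org:9100" (a made-up host and port) the function returns ("logger.example.org", 9100).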
4.2. SERVICES RUNNING IN THE “RB NODE”: NS, WM, JC, LM
As introduced in section 3, the Network Server (NS), the Workload Manager (WM), the Job
Controller (JC) and the Log Monitor (LM) are dealt with together since they always reside on
the same host (the “RB node”) and consequently are distributed by means of a single rpm.
4.2.1. Required software
For the installation of NS, WM, JC and LM, the following products are expected to be
installed:

Globus

Condor-G

ClassAd library

Boost

LB local logger (whose installation and configuration is discussed in section 4.1)

ReplicaManager from the EDG WP2 distribution
4.2.1.1. Globus installation and configuration
The required Globus release is 2.2. The Globus software can be
downloaded from: http://datagrid.in2p3.fr/distribution/globus/vdt-1.1.8/globus/RPMS/.
Please note that the “RB node” should run a gridftp server (actually a “customized” one: see
section 4.2.4.1), while it should not run a Globus gatekeeper.
4.2.1.1.1. Condor-G installation and configuration
The required Condor-G release is 6.5.1, which can be found at the following URL:
http://datagrid.in2p3.fr/distribution/globus/vdt-1.1.8/condor/RPMS/.
Moreover some additional configuration steps have to be performed in the Condor
configuration file pointed to by the CONDOR_CONFIG environment variable set during
installation. In the $CONDOR_CONFIG file the following attributes need to be modified:
RELEASE_DIR = /opt/condor
LOCAL_DIR = $ENV(GLOBUS_LOCATION)/var/condor
CONDOR_ADMIN = <a valid e-mail address of the Condor-G administrator>
and the following entries need to be added:
AUTHENTICATION_METHODS = CLAIMTOBE
ENABLE_GRID_MONITOR = TRUE
GRID_MONITOR = $(SBIN)/grid_monitor.sh
4.2.1.2. ClassAd installation and configuration
The ClassAd software required is a “customized” classads-0.9.4 release, available in rpm
format (to be installed as root) at:
http://datagrid.in2p3.fr/distribution/external/RPMS.
The ClassAd library documentation can be found at the following URL:
http://www.cs.wisc.edu/condor/classad.
4.2.1.3. Boost installation and configuration
The Boost C++ libraries release required is 1.29 (or higher).
The Boost documentation can be found at the following URL:
http://www.boost.org
The libraries are available in rpm format (to be installed as root) at:
http://datagrid.in2p3.fr/distribution/external/RPMS
4.2.1.4. Replica Manager installation and configuration
The Replica Manager RPMs that must be installed are:
edg-gsoap-base-1.0.3-1.i386.rpm
edg-replica-location-client-c++-1.2.8-1.i386.rpm
edg-replica-optimization-client-c++-1.2.9-1.i386.rpm
edg-replica-metadata-catalog-client-c++-1.2.8-1.i386.rpm
edg-replica-manager-client-c++-1.0.6-1.i386.rpm
After the RPM installation, the configuration files for the various VOs in
<install-dir>/etc/edg-replica-manager have to be set up (please refer to the WP2
documentation for details).
4.2.2. Configuration
Once the rpm installation has been performed, the NS, WM, JC and LM services must be
properly configured. This can be done by editing the
${EDG_WL_CONFIG_DIR}/edg_wl.conf file. If $EDG_WL_CONFIG_DIR has not been
defined, the edg_wl.conf file is looked for first in /opt/edg/etc, then in /etc, and then in
/usr/local/etc.
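The lookup order just described can be sketched as follows. This is an illustration, not the actual WMS code: the function name is hypothetical, and it is an assumption that, when EDG_WL_CONFIG_DIR is defined, only that directory is considered.

```python
import os
import posixpath

# Directories searched, in this order, when EDG_WL_CONFIG_DIR is not defined.
DEFAULT_CONFIG_DIRS = ("/opt/edg/etc", "/etc", "/usr/local/etc")


def find_edg_wl_conf(env=None, is_file=os.path.isfile):
    """Return the path of edg_wl.conf, or None if it cannot be found.

    env     -- mapping of environment variables (defaults to os.environ)
    is_file -- predicate used to test for the file (replaceable in tests)
    """
    env = os.environ if env is None else env
    config_dir = env.get("EDG_WL_CONFIG_DIR")
    # Assumption: an explicitly defined directory disables the default search.
    candidates = [config_dir] if config_dir else list(DEFAULT_CONFIG_DIRS)
    for directory in candidates:
        path = posixpath.join(directory, "edg_wl.conf")
        if is_file(path):
            return path
    return None
```

Passing the file test as a parameter makes it possible to check the search order without touching the real file system.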
This configuration file has the following structure (ClassAd based):
[
Common = [
…
…
];
NetworkServer = [
…
…
];
WorkloadManager = [
…
…
];
JobController = [
…
…
];
LogMonitor = [
…
…
];
]
Therefore the configuration file is composed of 5 parts:

one for the “common” (i.e. “used” by all services) attributes

one for the configuration of the NS

one for the configuration of the WM

one for the configuration of the JC

one for the configuration of the LM
4.2.2.1. Configuration of the “common” attributes
As introduced in the previous section, it is necessary first of all to edit the:
Common = [
…
…
];
part of the configuration file, in order to set the attributes “used” by all the services.
The only “common” attribute that must be specified is:

DGUser is the name of the user account running the NS, WM, JC and LM services.
E.g.:
DGUser = "${EDG_WL_USER}";
4.2.2.2. NS configuration
The Network Server is configured by editing the configuration file and setting the
appropriate attributes in the:
NetworkServer = [
…
…
];
section.
The attributes are listed below, grouped according to the functionality they relate to:

II_Contact, II_Port, II_DN and II_Timeout refer to the II service and respectively
represent the hostname where this service is running, the port number, the base DN
(which represents the distinguished name to use as a starting place for searches in
the information service) to be used when querying the II, and the timeout in seconds
to consider when the II is queried. E.g.:
II_Contact = "grid001f.cnaf.infn.it";
II_Port = 2170;
II_DN = "mds-vo-name=local, o=grid";
II_Timeout = 60;

Gris_Port, Gris_DN and Gris_Timeout respectively represent the port number
where the GRISes run, the base DN to be used when querying the GRISes, and the
timeout in seconds when the GRISes are queried.
The port and the base DN to be used are actually specified in the information service
schema, and the NS relies on these values: the Gris_Port and Gris_DN attributes
specified in the configuration file are considered only if, for some reason, they are
not published in the information service.
E.g.:
Gris_Port = 2135;
Gris_DN = "mds-vo-name=local, o=grid";
Gris_Timeout = 20;

ListeningPort is the port used by the NS to listen for requests coming from the User
Interface. Default value for this parameter is:
ListeningPort = 7772;

MasterThreads defines the maximum number of simultaneous connections with User
Interfaces. Default value is:
MasterThreads = 8;

DispatcherThreads defines the maximum number of simultaneous connections (to
forward the incoming requests) with the Workload Manager. Default value is:
DispatcherThreads = 10;

SandboxStagingPath represents the pathname of the root sandboxes directory, i.e.
the complete pathname referring to the directory where the RB creates both
input/output sandboxes directories and stores the “.Brokerinfo” file. Note that
this directory must not have the sticky bit (o+t) set.
E.g.:
SandboxStagingPath = "/disk/sandbox";

EnableQuotaManagement is a Boolean attribute which specifies whether the user quota
has to be checked to verify that there is enough space to store the input sandbox (see
section 4.2.4.3).
E.g.:
EnableQuotaManagement = true;

MaxInputSandboxSize defines the maximum size (in bytes) for the input sandbox
allowed per job. If the size of the input sandbox for a given job is greater than
MaxInputSandboxSize, then the job is refused.
E.g.:
MaxInputSandboxSize = 10000000;

EnableDynamicQuotaAdjustment and QuotaAdjustmentAmount refer to the
“dynamic” quota (see section 4.2.4.3). If EnableDynamicQuotaAdjustment is true and
a disk quota has not been set by the system administrator for the considered user, then a
virtual quota equal to QuotaAdjustmentAmount is added for that user for each
submitted job, and it is released when the job has completed its execution.
E.g.:
EnableDynamicQuotaAdjustment = true;
QuotaAdjustmentAmount = 10000;

QuotaInsensibleDiskPortion represents the percentage of the disk storing the
sandboxes directories that the administrator wants to keep unassigned. So if the free
disk space is less than the specified percentage, no new jobs can be accepted (see
section 4.2.4.3).
E.g.:
QuotaInsensibleDiskPortion = 2.0;

LogFile and LogLevel refer to the NS log file. LogFile is the name of this file, while
LogLevel specifies the verbosity of the information the NS records in its log
file: 0 is the minimum value (no debug information is written in the log file), while 6 is
the maximum value (full debug). E.g.:
LogFile = "${EDG_WL_TEMP}/NetworkServer/log";
LogLevel = 6;
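To show how the attributes listed above fit together, the following sketch renders a complete NetworkServer section from the example and default values given in this section. The rendering function is hypothetical and is not part of the WMS software; it handles only booleans, numbers and plain strings, and does not escape quotes inside string values.

```python
def render_classad_section(name, attributes):
    """Render one section of edg_wl.conf in the ClassAd-based syntax."""
    lines = [f"{name} = ["]
    for key, value in attributes.items():
        if isinstance(value, bool):          # test bool first: bool is an int
            text = "true" if value else "false"
        elif isinstance(value, (int, float)):
            text = str(value)
        else:
            text = '"' + str(value) + '"'
        lines.append(f"  {key} = {text};")
    lines.append("];")
    return "\n".join(lines)


# Values taken from the examples and defaults quoted in this section.
NETWORK_SERVER = {
    "II_Contact": "grid001f.cnaf.infn.it",
    "II_Port": 2170,
    "II_DN": "mds-vo-name=local, o=grid",
    "II_Timeout": 60,
    "Gris_Port": 2135,
    "Gris_DN": "mds-vo-name=local, o=grid",
    "Gris_Timeout": 20,
    "ListeningPort": 7772,
    "MasterThreads": 8,
    "DispatcherThreads": 10,
    "SandboxStagingPath": "/disk/sandbox",
    "EnableQuotaManagement": True,
    "MaxInputSandboxSize": 10000000,
    "EnableDynamicQuotaAdjustment": True,
    "QuotaAdjustmentAmount": 10000,
    "QuotaInsensibleDiskPortion": 2.0,
    "LogFile": "${EDG_WL_TEMP}/NetworkServer/log",
    "LogLevel": 6,
}

if __name__ == "__main__":
    print(render_classad_section("NetworkServer", NETWORK_SERVER))
```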
4.2.2.3. WM configuration
The Workload Manager is configured by editing the configuration file and
setting the appropriate attributes in the:
WorkloadManager = [
…
…
];
section.
The attributes are listed below, grouped according to the functionality they relate to:

PipeDepth defines the maximum size of the buffer between the dispatcher and the
worker threads. E.g.:
PipeDepth = 10;

NumberOfWorkerThreads represents the size of the worker threads pool. The
default value is:
NumberOfWorkerThreads = 1;

DispatcherType defines the type of the input queue of requests. There shouldn’t be
any reasons to change the provided default value (“filelist”).

Input refers to the WM input “queue” of requests. There shouldn’t be reasons to
change the provided default value. E.g.:
Input = "${EDG_WL_TMP}/workload_manager/input.fl";

MaxRetryCount allows specifying the maximum number of times the WM has to try
to re-schedule and re-submit the job in case the submission to the CE failed (e.g.
globus down on the CE, network problems, etc.).
When a job is submitted specifying the RetryCount attribute in the JDL, the
submission retries attempted by the WM are at most the minimum value between
RetryCount and MaxRetryCount. The default value for this configuration parameter is:
MaxRetryCount = 10;

LogFile and LogLevel refer to the WM log file. LogFile is the name of this file, while
LogLevel allows specifying the verbosity of the information the WM records in its log
file: 0 is the minimum value (no debug information is written in the log file), while 6 is
the maximum value (full debug). E.g.:
LogFile = "${EDG_WL_TEMP}/manager/log/events.log";
LogLevel = 6;
Please note that all directories specified in the WM configuration section are expected to
exist already: the WM does not create directories, so any that are missing have to be
created at installation time.
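Two points of the WM configuration can be sketched in a few lines: the number of retries actually attempted when the JDL specifies RetryCount, and the creation at installation time of the directories the WM expects to find. Both helpers are hypothetical illustrations, not WMS code; the paths are assumed to have their environment variables already expanded.

```python
import os
import tempfile


def effective_retry_count(jdl_retry_count, max_retry_count=10):
    """Retries attempted at most: the smaller of RetryCount and MaxRetryCount."""
    return min(jdl_retry_count, max_retry_count)


def ensure_parent_directories(file_paths):
    """Create the missing parent directory of each file; return those created."""
    created = []
    for file_path in file_paths:
        parent = os.path.dirname(file_path)
        if parent and not os.path.isdir(parent):
            os.makedirs(parent, exist_ok=True)
            created.append(parent)
    return created


if __name__ == "__main__":
    # Demonstration in a throw-away directory standing in for the WMS temp area.
    base = tempfile.mkdtemp()
    wanted = [
        os.path.join(base, "workload_manager", "input.fl"),
        os.path.join(base, "manager", "log", "events.log"),
    ]
    print(ensure_parent_directories(wanted))
    print(effective_retry_count(25))
```

Running a helper of this kind once after the rpm installation, as the user that runs the services, avoids start-up failures caused by a missing input or log directory.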
4.2.2.4. JC configuration
The Job Controller is configured by editing the configuration file and setting
the appropriate attributes in the:
JobController = [
…
…
];
section.
The attributes are listed below, grouped according to the functionality they relate to:

CondorSubmit, CondorRemove, CondorQuery, CondorSubmitDag and CondorRelease
respectively specify the pathnames of the condor_submit, condor_rm,
condor_q, condor_submit_dag and condor_release Condor-G commands.

SubmitFileDir defines the directory where the temporary files (the CondorG submit
file and the job wrapper scripts) are created.
E.g.:
SubmitFileDir = "${EDG_WL_TMP}/jobcontrol/submit";

OutputFileDir defines the directory where the standard output and standard error
files of CondorG are temporarily saved.
E.g.:
OutputFileDir = "${EDG_WL_TMP}/jobcontrol/condorio";

Input refers to the JC input “queue” of requests. There should be no reason to
change the default value.

LogFile and LogLevel refer to the JC log file. LogFile is the name of this file, while
LogLevel allows specifying the verbosity of the information the JC records in its log
file: 0 is the minimum value (no debug information is written in the log file), while 6 is
the maximum value (full debug). E.g.:
LogFile = "${EDG_WL_TEMP}/jobcontrol/log/events.log";
LogLevel = 6;

ContainerRefreshThreshold represents the number of jobs after which the JC has
to re-read the IdRepositoryName LM file (see section 4.2.2.5). There shouldn’t be any
reasons to change the provided default value:
ContainerRefreshThreshold = 1000;

UseFakeForProxy and UseFakeForReal are used for debug purposes. Therefore
there shouldn’t be any reasons to modify the default values:
UseFakeForProxy = false;
UseFakeForReal = false;
4.2.2.5. LM configuration
The Log Monitor is configured by editing the configuration file and setting
the appropriate attributes in the:
LogMonitor = [
…
…
];
section.
The attributes are listed below, grouped according to the functionality they relate to:

CondorLogDir defines the directory name where the CondorG log files (i.e. the files
where the events for the submitted jobs are recorded) are created.
E.g.:
CondorLogDir = "${EDG_WL_TMP}/LM/CondorGlog";

JobsPerCondorLog represents the number of jobs whose events are recorded in
each single CondorG log file. So every JobsPerCondorLog jobs, the CondorG log
file is changed.
E.g.:
JobsPerCondorLog = 1000;

MainLoopDuration defines when the LM reads the CondorG log files: every
MainLoopDuration seconds, the LM reads the CondorG log files.
E.g.:
MainLoopDuration = 10;

CondorLogRecycleDir defines the directory name where the already read (by LM)
CondorG log files are stored.
E.g.:
CondorLogRecycleDir = "${EDG_WL_TMP}/LM/RecCondorGlog";

MonitorInternalDir is the directory where some files needed by the LM service for
internal purposes are created and stored.
E.g.:
MonitorInternalDir = "${EDG_WL_TMP}/LM/internal";

IdRepositoryName is the name of a file used by LM for internal purposes (where the
dgjobid – Condorid correspondences are kept).
E.g.:
IdRepositoryName = "irepository.dat";

AbortedJobsTimeout represents the timeout (in seconds) after which a cancelled job is
forgotten by the LM (useful when the job hangs in the CondorG queue).
E.g.:
AbortedJobsTimeout = 600;

LogFile and LogLevel refer to the LM log file. LogFile is the name of this file, while
LogLevel allows specifying the verbosity of the information the LM records in its log
file: 0 is the minimum value (no debug information is written in the log file), while 6 is
the maximum value (full debug). E.g.:
LogFile = "${EDG_WL_TEMP}/LM/log/events.log";
LogLevel = 6;
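The effect of JobsPerCondorLog described above can be pictured with a one-line computation. This is only a sketch under stated assumptions: jobs are numbered from zero in submission order and CondorG log files are numbered from zero as well; the way the LM actually names its log files is not covered here, and the function name is hypothetical.

```python
def condor_log_index(job_sequence_number, jobs_per_condor_log=1000):
    """Index of the CondorG log file receiving the events of a given job.

    job_sequence_number -- zero-based position of the job in submission order
    jobs_per_condor_log -- value of the JobsPerCondorLog attribute
    """
    if job_sequence_number < 0:
        raise ValueError("job_sequence_number must not be negative")
    if jobs_per_condor_log <= 0:
        raise ValueError("jobs_per_condor_log must be positive")
    # A new log file is started every jobs_per_condor_log jobs.
    return job_sequence_number // jobs_per_condor_log
```

With the example value of 1000, the first thousand jobs (numbers 0 to 999) share log file 0 and job number 1000 is the first one recorded in log file 1.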
4.2.3. Environment variables
The environment variables that have to be set (or can be set) for the NS, WM, JC and LM
services are listed below:

EDG_WL_LOG_DESTINATION
The Logging library i.e. the library providing APIs for logging job events to the LB
reads its immediate logging destination from the environment variable
EDG_WL_LOG_DESTINATION (see section 4.1.3).

CONDOR_CONFIG
This variable has to refer to the CondorG configuration file, usually
/opt/condor/etc/condor_config (see section 4.2.1.1.1).

EDG_WL_CONFIG_DIR
As explained in section 4.2.2, this variable refers to the directory where the
configuration file for the WMS services running on the “RB node” (edg_wl.conf) is
available.

GRIDMAP
This variable must refer to the grid-mapfile (usually /etc/grid-security/grid-mapfile)

LD_LIBRARY_PATH
Should include $GLOBUS_LOCATION/lib, the Boost lib directory and the gcc 3.2 lib
directory

EDG_LOCATION
Should refer to the EDG software installation directory (usually /opt/edg): needed for
the WP2 services used by the RB
If other environment variables are used in the NS/WM/JC/LM configuration
sections, they have to be set as well.
In any case, all variables that must be defined for the proper execution of the WMS services
are set by the relevant start-up scripts.
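For reference, the variables listed above can be assembled as in the following sketch. It does not reproduce the start-up scripts: the function name is hypothetical, the default values are the "usual" locations quoted in this section (with /opt/edg/etc assumed for the configuration directory), and the Globus, Boost and gcc library directories must be supplied by the caller because they depend on the local installation.

```python
def build_rb_environment(globus_location, boost_lib_dir, gcc_lib_dir,
                         edg_location="/opt/edg",
                         condor_config="/opt/condor/etc/condor_config",
                         gridmap="/etc/grid-security/grid-mapfile",
                         config_dir="/opt/edg/etc",
                         log_destination=None):
    """Return a dict with the environment variables used on the RB node."""
    library_path = ":".join([
        globus_location.rstrip("/") + "/lib",   # $GLOBUS_LOCATION/lib
        boost_lib_dir,                          # Boost lib directory
        gcc_lib_dir,                            # gcc 3.2 lib directory
    ])
    env = {
        "CONDOR_CONFIG": condor_config,
        "EDG_WL_CONFIG_DIR": config_dir,
        "GRIDMAP": gridmap,
        "LD_LIBRARY_PATH": library_path,
        "EDG_LOCATION": edg_location,
    }
    if log_destination:
        # Optional: <hostname>:<port> of the local-logger (see section 4.1.3).
        env["EDG_WL_LOG_DESTINATION"] = log_destination
    return env
```

A dictionary of this kind can be merged into os.environ, or passed as the env argument of subprocess.run, when a service is started by hand for debugging.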
4.2.4. Other requirements and configurations for the “RB node”
4.2.4.1. Customized Gridftp server
To ensure the “security” of the input and output sandboxes, a “customized” Gridftp server has
to run on the “RB node”.
With this “patched” Gridftp server, the sandbox files transferred to the “RB node”
belong to the group of the user running the NS, WM, JC and LM services (usually
edguser) and have rwxrwx--- as mask. In this way a user cannot access sandbox files
belonging to other users.
In order to install this “customized” Gridftp server, the following RPM has to be installed:
edg-wl-globus-gridftp-X.Y.Z-K.i486.rpm
By default this rpm installs the software in the “/opt/edg” directory.
After having installed the software, it may be necessary to modify the line:
magicgroup edguser all
in the file:
<INSTALL-PREFIX>/etc/ftpaccess
if edguser is not the group for the user running the WMS services in the “RB node”.
To start/stop this patched gridftp server, the following command has to be issued:
/etc/rc.d/init.d/edg-wl-ftpd start/stop
4.2.4.2. Grid-mapfile
The Globus grid-mapfile (usually located in /etc/grid-security) on the “RB node” must be filled
with the certificate subjects of all the users allowed to use the WMS functionalities. For
security reasons, the users mapped in the grid-mapfile have to belong to groups that are
different from the group of the dedicated user (e.g. edguser) running the NS,
WM, JC, LM services.
4.2.4.3. Disk Quota
When a job is submitted to the WMS, first of all the NS checks if there is enough space to
store its input sandbox files.
Moreover, as explained in section 4.2.2.2, the NS checks that the input sandbox size is not
greater than the value specified as MaxInputSandboxSize in the NS configuration section,
otherwise the job is refused.
As introduced in section 4.2.2.2, it is also possible to enable a disk quota check (by setting
EnableQuotaManagement=true in the NS configuration section). In this case, when a user
submits a job, the NS checks the disk quotas for that particular local account (the one
defined in the grid-mapfile), to see whether the input sandbox files can be moved to the “RB
node”. So, if the disk quota check has been enabled in the NS configuration file, the disk
quota option must be enabled, and disk quotas must be set for the various users allowed
to submit jobs to the WMS (i.e. the ones defined in the grid-mapfile).
If the NS configuration parameter EnableDynamicQuotaAdjustment is set to true and a disk
quota has not been set by the system administrator for the considered user, then a “dynamic”
quota equal to QuotaAdjustmentAmount (another NS configuration parameter) is added for
that user for each submitted job, and it is released when the job has completed its execution.
It is also possible to define (via the QuotaInsensibleDiskPortion NS configuration parameter)
a portion of the disk to be kept unassigned. If the free space of the disk used for storing
input and output sandboxes is less than this percentage value, then no new jobs can be submitted.
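The checks described in this section can be summarised by the following sketch. It is not the NS implementation: the names are hypothetical, the order of the checks is an assumption, all sizes are assumed to be expressed in the same unit (bytes), and the refusal of a job when quota management is enabled but no quota is available for the user is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class NSQuotaConfig:
    max_input_sandbox_size: int            # MaxInputSandboxSize
    enable_quota_management: bool          # EnableQuotaManagement
    enable_dynamic_quota_adjustment: bool  # EnableDynamicQuotaAdjustment
    quota_adjustment_amount: int           # QuotaAdjustmentAmount (unit assumed)
    quota_insensible_disk_portion: float   # QuotaInsensibleDiskPortion, percent


def admit_job(cfg: NSQuotaConfig, sandbox_size: int, disk_free: int,
              disk_total: int, user_quota_left: Optional[int]) -> Tuple[bool, str]:
    """Decide whether the input sandbox of a new job can be accepted.

    user_quota_left is None when no disk quota has been set for the user.
    """
    if sandbox_size > disk_free:
        return False, "not enough disk space for the input sandbox"
    if sandbox_size > cfg.max_input_sandbox_size:
        return False, "input sandbox larger than MaxInputSandboxSize"
    if 100.0 * disk_free / disk_total < cfg.quota_insensible_disk_portion:
        return False, "free disk space below QuotaInsensibleDiskPortion"
    if cfg.enable_quota_management:
        if user_quota_left is None:
            if not cfg.enable_dynamic_quota_adjustment:
                return False, "no disk quota set for this user"
            # "Dynamic" quota: a virtual amount added for this submitted job.
            user_quota_left = cfg.quota_adjustment_amount
        if sandbox_size > user_quota_left:
            return False, "user disk quota exceeded"
    return True, "accepted"
```

With the example values of section 4.2.2.2 (MaxInputSandboxSize = 10000000, QuotaAdjustmentAmount = 10000, QuotaInsensibleDiskPortion = 2.0), a 5000-byte sandbox from a user without a quota is accepted as long as at least 2% of the disk is free.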
4.3. SECURITY SERVICES
The EDG WMS software relies on the GSI mechanisms for authentication.
This means that a valid user proxy must exist within the WMS (not
necessarily in the UI) for the whole lifetime of a job.
A secure way of achieving this (instead of using long-lived proxies) is to exploit the proxy
renewal (PR) mechanisms, which rely on the MyProxy package. The underlying idea is that
the user registers in a MyProxy server a valid long-term certificate proxy that will be used by
the WMS to perform a periodic credential renewal for the submitted job; in this way the user
is no longer obliged to create proxies with a very long lifetime when submitting jobs that
run for a long time.
The MyProxy credential repository system consists of a server and a set of client tools that
can be used to delegate and retrieve credentials to and from a server. Normally, a user
would start by using the myproxy_init client program along with the permanent credentials
necessary to contact the server and delegate a set of proxy credentials to the server along
with authentication information and retrieval restrictions.
The Proxy Renewal (PR) service maintains users' proxy certificates and keeps them valid by
periodically contacting the MyProxy server. The service communicates only through a local
Unix socket, so it must be installed on the same machine as the services that register
proxies (i.e. on the “RB node”).
Therefore:

On the “RB node” it is necessary to deploy the PR software (via the
edg-wl-proxyrenewal-X.Y.Z-K.i486.rpm RPM) and the MyProxy libraries (as discussed in
section 4.3.2)

On the “MyProxy server” node it is necessary to deploy the MyProxy Server software,
discussed in section 4.3.1
The MyProxy Toolkit is available at the following URL: http://myproxy.ncsa.uiuc.edu/
MyProxy version v0.5.3 is recommended for the Datagrid environment.
4.3.1. MyProxy Server
myproxy-server is a daemon that runs on a trusted, secure host and manages a database of
proxy credentials for use from remote sites. Proxies have a lifetime that is controlled by the
myproxy-init program. When a proxy is requested from the myproxy-server via the
myproxy-get-delegation command, further delegation ensures that the lifetime of the new
proxy is shorter than that of the original, for greater security.
A configuration file maintains the list of trusted portals and users that can access this
service. To configure a MyProxy server, one must restrict the users that are allowed to store
credentials in it and, more importantly, the clients that are allowed to retrieve credentials
from it. To do that, edit the configuration file (/etc/myproxy-server.conf) and add the specific
services allowed to carry out proxy renewal. An example of myproxy-server.conf is given below:
accepted_credentials "/C=CZ/O=CESNET/*"
accepted_credentials "/C=IT/O=INFN/*"
authorized_renewers \
"/O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0380.cern.ch"
authorized_renewers \
"/C=IT/O=INFN/OU=gatekeeper/L=CNAF/CN=grid010g.cnaf.infn.it/Email=sitemanager@cnaf.infn.it"
As the example shows, the file contains the subject names of all the resources that are
allowed to renew credentials (the recognized “RB nodes”) and the prefixes of the subject
names of the users that are allowed to store proxies in the repository.
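To make the role of the two directives concrete, the following sketch checks a certificate subject against such a policy. It is not the MyProxy matching code: it assumes shell-style wildcard matching and backslash line continuation as used in the example above, and the function names parse_policy and is_allowed are hypothetical.

```python
import fnmatch
import shlex

# Sample policy text, modelled on the example above.
SAMPLE_CONF = r'''accepted_credentials "/C=CZ/O=CESNET/*"
accepted_credentials "/C=IT/O=INFN/*"
authorized_renewers \
"/O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0380.cern.ch"
'''

def parse_policy(text):
    """Collect the quoted subject patterns listed under each directive."""
    joined = text.replace("\\\n", " ")  # fold backslash-continued lines
    policy = {}
    for line in joined.splitlines():
        tokens = shlex.split(line, comments=True)
        if len(tokens) == 2:
            directive, pattern = tokens
            policy.setdefault(directive, []).append(pattern)
    return policy

def is_allowed(policy, directive, subject):
    """True if the subject matches any pattern of the directive (assumed glob match)."""
    return any(fnmatch.fnmatchcase(subject, pattern)
               for pattern in policy.get(directive, []))
```

With this model, a user whose subject starts with /C=IT/O=INFN/ may store credentials, while only the listed host subjects may renew them.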
To launch the myproxy-server daemon, run the binary <prefix>/sbin/myproxy-server. The
program starts up and puts itself in the background. It accepts connections on TCP port
7512, forking off a separate child to handle each incoming connection.
It logs information via the syslog service. A start-up script (/etc/init.d/myproxy) is provided to
run the service on reboot.
4.3.2. Proxy renewal service
4.3.2.1. Required software
Globus 2.2 (which can be downloaded from
http://datagrid.in2p3.fr/distribution/globus/vdt-1.1.8/globus/RPMS/) and the MyProxy libraries
(version v0.5.3 recommended) are needed for the Proxy Renewal service.
4.3.2.2. Configuration
The PR daemon has no configuration file.
4.3.2.3. Environment variables
The PR daemon recognizes the following environment variables in the same way the GSI
handles them:

X509_USER_KEY

X509_USER_CERT

X509_CERT_DIR

X509_USER_PROXY
4.4. GRID ACCOUNTING SERVICES
4.4.1. Required software
As introduced in section 3, the DGAS services require the installation of:

The HLR server software plus the PA client software on a HLR server machine

The PA server software on a PA server machine

The DGAS job-auth client software on the UI machine

The ATM client software on the gatekeeper node of the CE
The DGAS software requires the Globus Toolkit 2.2 software (actually only the GSI RPMs
are needed). Globus 2 RPMs are available at
http://datagrid.in2p3.fr/distribution/globus/vdt-1.1.8/globus/RPMS/.
Besides Globus Toolkit, for both PA and HLR servers it is also necessary to install MySQL
Distribution 4.0.1 or higher.
Packages and documentation about MySQL can be found at: http://www.mysql.org
In any case, the MySQL RPMs for pc-linux-gnu (i686) are available at
http://datagrid.in2p3.fr/distribution//external/RPMs.
The MySQL database server can basically be run in two ways: as a daemon waiting for both
remote TCP calls and local Unix-socket calls, or as a daemon waiting for local calls only.
DGAS doesn't need MySQL to wait for incoming TCP calls, so if this feature is not needed for
other purposes, it is strongly suggested to make MySQL listen on the Unix socket only. In
order to disable networking in MySQL, the following line:
skip-networking
has to be added to the MySQL configuration file (usually /etc/my.cnf) under the daemon
section of the file (i.e. [mysqld]).
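As a quick check of this setting, the sketch below reads configuration text and reports whether skip-networking appears under [mysqld]. It is an illustration only: the helper name networking_disabled and the sample file content are assumptions, and real my.cnf files that use directives such as !includedir may not parse with Python's configparser.

```python
import configparser

# Hypothetical file content; paths are illustrative only.
SAMPLE_CNF = """[mysqld]
datadir=/var/lib/mysql
skip-networking

[client]
socket=/var/lib/mysql/mysql.sock
"""

def networking_disabled(cnf_text):
    """Return True if skip-networking is listed in the [mysqld] section."""
    parser = configparser.ConfigParser(allow_no_value=True, strict=False,
                                       interpolation=None)
    parser.read_string(cnf_text)
    return parser.has_option("mysqld", "skip-networking")
```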
4.4.1.1. Creating the MySQL databases for the HLR server
The HLR stores its data in two databases that need to be installed and configured. The files
needed to install the databases can be found under <install-prefix>/etc and are:

hlr.sql
Main DB used by the HLR engine.

hlr_tmp.sql
DB where the HLR stores temporary data.
In order to install the HLR DBs, it is first necessary to create them:
# mysqladmin create hlr
# mysqladmin create hlr_tmp
and then the previously listed files have to be used to create the tables in the databases:
# mysql hlr < hlr.sql
# mysql hlr_tmp < hlr_tmp.sql
4.4.1.2. Creating the MySQL database for the PA server
The PA stores its data in one database that needs to be installed and configured. The file
needed to install the database can be found under <install-path>/etc, and is:

pa.sql
Main DB used by the PA engine.
In order to install the PA DB, it is first necessary to create it:
# mysqladmin create pa
and then the previously mentioned file can be used to create the tables in the database:
# mysql pa < pa.sql
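The two-step procedure used for the hlr, hlr_tmp and pa databases (create the database, then load its .sql file) could be scripted along the following lines. This is a minimal sketch: the function create_database and its dry_run flag are hypothetical, and it assumes mysqladmin and mysql are on the PATH and can be run without further credentials, as in the commands above.

```python
import subprocess

def create_database(name, schema_path, dry_run=True):
    """Create database `name` and load its tables from `schema_path`.

    Returns the two command lines; they are executed only if dry_run is False.
    """
    create_cmd = ["mysqladmin", "create", name]
    load_cmd = ["mysql", name]
    if not dry_run:
        subprocess.run(create_cmd, check=True)
        # Equivalent of the shell redirection "mysql <name> < <schema file>".
        with open(schema_path, "rb") as schema:
            subprocess.run(load_cmd, stdin=schema, check=True)
    return [create_cmd, load_cmd]
```

For example, create_database("hlr_tmp", "hlr_tmp.sql") only returns the commands, while passing dry_run=False runs them.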
4.4.2. Configuration
4.4.2.1. Configuring the HLR server
The main options for the HLR server daemon can be set up in its configuration file, which can
be referenced when starting the daemon (see section 5.6.1.1). The file is usually found in
$EDG_WL_LOCATION/etc
The configuration file accepts parameters in the form:
item = "value"
with an item-value pair per line.
These are the parameters that can be specified in the HLR configuration file:

hlr_sql_server
The host where the HLR databases are installed (usually it is the localhost)

hlr_sql_user
The user running the HLR database

hlr_sql_password
The HLR database user password

hlr_sql_dbname
The HLR database name

hlr_tmp_sql_dbname
The HLR tmp database name

hlr_port
The HLR server listening port

hlr_log
The HLR log file name
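The item = "value" format described above is simple enough to read with a few lines of code. The sketch below is not the HLR parser: the function parse_conf is hypothetical, and the sample values (user name, port, log path) are invented for illustration.

```python
import re

# One item = "value" pair per line, as described in the text.
_PAIR = re.compile(r'^\s*(\w+)\s*=\s*"(.*)"\s*$')

# Hypothetical values, for illustration only.
SAMPLE_HLR_CONF = '''hlr_sql_server = "localhost"
hlr_sql_user = "hlruser"
hlr_sql_dbname = "hlr"
hlr_tmp_sql_dbname = "hlr_tmp"
hlr_port = "56568"
hlr_log = "/var/log/hlr.log"
'''

def parse_conf(text):
    """Return a dict of the item/value pairs found in the text."""
    params = {}
    for line in text.splitlines():
        match = _PAIR.match(line)
        if match:
            params[match.group(1)] = match.group(2)
    return params
```

Lines that do not follow the item = "value" pattern are silently skipped in this sketch; a stricter reader might report them instead.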
4.4.2.2. Configuring the PA server
The main options for the PA server daemon can be set up in its configuration file, which can
be referenced when starting the daemon (see section 5.6.1.2). The file is usually found in
<install-path>/etc.
These are the parameters that can be specified in the PA configuration file:

pa_sql_server
The host where the PA database is installed (usually it is the localhost)

pa_sql_user
The user running the PA database

pa_sql_password
The PA database user password

pa_sql_dbname
The PA database name

pa_port
The PA server listening port

pa_log
The PA log file name
4.4.2.3. Configuring the ATM client software
As mentioned before, the DGAS ATM client software has to be installed and configured on
the gatekeeper node of each CE. The full contact strings of the resource PA and HLR have
to be specified via a configuration file.
This configuration file, referenced by the DGAS ATM Client API, is usually
<install-path>/etc/edg-wl-ATMClient-dgas.conf and it has to specify the following two attributes:

res_acct_PA_id
The resource PA, in the format:
res_acct_PA_id = "hostname:portnumber:X509CertSubject"

res_acct_bank_id
The resource HLR, in the format:
res_acct_bank_id = "hostname:portnumber:X509CertSubject"
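A contact string of this form can be split into its three parts as sketched below. The names Contact and parse_contact are hypothetical; the split is limited to the first two colons on the assumption that the certificate subject itself may contain further colons.

```python
from typing import NamedTuple

class Contact(NamedTuple):
    host: str
    port: int
    subject: str

def parse_contact(value):
    """Split "hostname:portnumber:X509CertSubject" into its components."""
    # maxsplit=2 keeps any colon inside the subject untouched.
    host, port, subject = value.split(":", 2)
    return Contact(host, int(port), subject)
```

A value with fewer than three fields, or with a non-numeric port, raises ValueError in this sketch.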
4.4.3. Environment variables
The grid accounting services don’t rely on any environment variables.
4.5. USER INTERFACE
This section describes the steps needed to install and configure the User Interface, which is
the software module of the WMS that allows the user to access the main services made
available by the components of the scheduling sub-layer.
The UI software is distributed in 4 different packages:

The python command line interface

The C++ API

The Java API

The Java GUI
4.5.1. Required software
All the above listed packages have a dependency on the Globus Toolkit software. The
required release is 2.2 from the VDT distribution. It can be downloaded from
http://datagrid.in2p3.fr/distribution/globus/vdt-1.1.8/globus/RPMS/. The needed rpms are
listed here below:

vdt_globus_essentials-EDGVDT1.1.8-5.i386.rpm

vdt_globus_sdk-EDGVDT1.1.8-5.i386.rpm

vdt_compile_globus_core-EDGVDT1.1.8-1.i386.rpm

globus-initialization-2.2.4-2.noarch.rpm
Moreover, the security configuration rpms for all the Certificate Authorities in Testbed2,
available at http://datagrid.in2p3.fr/distribution/datagrid/security/RPMS/, have to be installed
together with the rpm used for renewing your certificate with your CA. The latter is available at
http://datagrid.in2p3.fr/distribution/datagrid/security/RPMS/local/.
Lastly, the MyProxy package should be installed on the UI node in order to allow users to
take advantage of the proxy-renewal feature for long-running jobs. The corresponding rpm
can be found at http://datagrid.in2p3.fr/distribution/external/RPMS and is named as follows:

myproxy-gcc32dbg-client-0.5.3-1.i386.rpm
4.5.1.1. Python Command Line Interface
In order to install the CLI, apart from the proper user interface rpm:

edg-wl-ui-cli-X.Y.Z-K_gcc3_2_2.i486.rpm
the common UI configuration rpm:

edg-wl-ui-config-X.Y.Z-K_gcc3_2_2.i486.rpm

edg-wl-config-X.Y.Z-K_gcc3_2_2.i486.rpm
and the configuration LCFGng objects:

edg-lcfg-cliconfig

edg-lcfg-uicmnconfig
the following WMS and third-party packages are needed:

edg-wl-common-api-X.Y.Z-K_gcc3_2_2.i486.rpm (WMS common lib)

edg-wl-logging-api-cpp-X.Y.Z-K_gcc3_2_2.i486.rpm (LB consumer C++ API)

edg-wl-logging-api-c-X.Y.Z-K_gcc3_2_2.i486.rpm (LB client C API)

edg-wl-bypass-2.5.3-4_gcc3_2_2.i486.rpm (Condor bypass)
Moreover the VOMS client C++ API rpm:

voms-api_gcc3_2_2-1.1.39-1_RH7.3
available at
http://datagrid.in2p3.fr/distribution/autobuild/i386-rh7.3-gcc3.2.2/wp6/RPMS/
is needed on the UI, together with the VO specific rpm containing the credentials of the
VOMS server for the given VO (one rpm per VO is needed):

edg-voms-vo-<VO name>-X.Y-Z.noarch.rpm
available at http://datagrid.in2p3.fr/distribution/datagrid/security/RPMS/ .
Lastly, the Python interpreter, version 2.2.2 must also be installed on the submitting machine.
The rpm for this package is available at http://datagrid.in2p3.fr/distribution/redhat7.3/updates/RPMS as:

python2-2.2.2-11.7.3.i386.rpm

tkinter2-2.2.2-11.7.3.i386.rpm
Information about python and the package sources can be found at www.python.org.
4.5.1.2. C++ API
The UI C++ API is distributed within the following rpm:

edg-wl-ui-api-cpp-X.Y.Z-K_gcc3_2_2.i486.rpm
Moreover the following WMS and third-party packages are needed:

edg-wl-common-api-X.Y.Z-K_gcc3_2_2.i486.rpm (WMS common lib)

edg-wl-logging-api-cpp-X.Y.Z-K_gcc3_2_2.i486.rpm (LB consumer C++ API)

edg-wl-logging-api-c-X.Y.Z-K_gcc3_2_2.i486.rpm (LB client C API)

edg-wl-chkpt-api-X.Y.Z-K_gcc3_2_2.i486.rpm (Checkpointing API)

edg-wl-bypass-2.5.3-4_gcc3_2_2.i486.rpm (Condor bypass)
The VOMS client C++ API rpm:

voms-api_gcc3_2_2-1.1.39-1_RH7.3
available at
http://datagrid.in2p3.fr/distribution/autobuild/i386-rh7.3-gcc3.2.2/wp6/RPMS/
is also needed on the UI, together with the VO specific rpm containing the credentials of the
VOMS server for the given VO (one rpm per VO is needed):

edg-voms-vo-<VO name>-X.Y-Z.noarch.rpm
available at http://datagrid.in2p3.fr/distribution/datagrid/security/RPMS/ .
Lastly, the following external rpms, all available at the following URL:
http://datagrid.in2p3.fr/distribution/external/RPMS need to be installed on the UI node.
They are the customised Condor classads library (see 4.2.1.2 for details):

classads-g3-0.9.4-vh8.i486.rpm
and the Boost C++ libraries (see 4.2.1.3 for details):

boost-g3-1.29.1-vh6.i486.rpm
4.5.1.3. Java API
The UI Java API is distributed within the following rpm:

edg-wl-ui-api-java-X.Y.Z-K.i486.rpm
Moreover the following WMS and third-party packages are needed:

edg-wl-common-api-X.Y.Z-K_gcc3_2_2.i486.rpm (WMS common lib)

edg-wl-common-api-java-X.Y.Z-K.i486.rpm (requestAd jar)

edg-wl-logging-api-cpp-X.Y.Z-K_gcc3_2_2.i486.rpm (LB consumer C++ API)

edg-wl-logging-api-c-X.Y.Z-K_gcc3_2_2.i486.rpm (LB client C API)

edg-wl-bypass-2.5.3-4_gcc3_2_2.i486.rpm (Condor bypass)
Lastly, the following external rpms, all available at
http://datagrid.in2p3.fr/distribution/external/RPMS, need to be installed on the UI node. They
are the customised Condor classads java library version 1.1:

classads-jar-1.1-2.i386.rpm
the Java 2 Development Kit version 1.4 (or greater):

j2sdk-1.4.1_01-fcs.i586.rpm

j2sdk_profile-1.4.1_01-1.noarch.rpm
the Globus Java CoG Kit version 1.0 alpha:

cog-jar-1.0-1_alpha.i386.rpm
the Log4J package version 1.2.6:

log4j-1.2.6-1jpp.noarch.rpm
and the EDG Java Security API:

bouncycastle-jdk14-1.19-2

edg-java-security-1.4.1-1
4.5.1.4. Java GUI
In order to install the Java Graphical User Interface, apart from the proper GUI rpm:

edg-wl-ui-gui-X.Y.Z-K.i486.rpm
the Java API rpm:

edg-wl-ui-api-java-X.Y.Z-K.i486.rpm
the common UI configuration rpm:

edg-wl-ui-config-X.Y.Z-K_gcc3_2_2.i486.rpm

edg-wl-config-X.Y.Z-K_gcc3_2_2.i486.rpm
and the configuration LCFGng objects:

edg-lcfg-guiconfig

edg-lcfg-uicmnconfig
the following WMS and third-party packages are needed:

edg-wl-common-api-X.Y.Z-K_gcc3_2_2.i486.rpm (WMS common lib)

edg-wl-common-api-java-X.Y.Z-K.i486.rpm (requestAd jar)

edg-wl-logging-api-cpp-X.Y.Z-K_gcc3_2_2.i486.rpm (LB consumer C++ API)

edg-wl-logging-api-c-X.Y.Z-K_gcc3_2_2.i486.rpm (LB client C API)

edg-wl-bypass-2.5.3-4_gcc3_2_2.i486.rpm (Condor bypass)
Moreover, the following external rpms, all available at
http://datagrid.in2p3.fr/distribution/external/RPMS, need to be installed on the UI node. They
are the customised Condor classads java library version 1.1:

classads-jar-1.1-2.i386.rpm
the Java 2 Development Kit version 1.4 (or greater):

j2sdk-1.4.1_01-fcs.i586.rpm

j2sdk_profile-1.4.1_01-1.noarch.rpm
the Globus Java CoG Kit version 1.0 alpha:

cog-jar-1.0-1_alpha.i386.rpm
the Log4J package version 1.2.6:

log4j-1.2.6-1jpp.noarch.rpm
and the EDG Java Security API:

bouncycastle-jdk14-1.19-2

edg-java-security-1.4.1-1
4.5.2. RPM installation
All the needed rpms can be downloaded with the command
wget -nd -r <URL>/<rpm name>
and installed with
rpm -ivh <rpm name>
As stated at the beginning of section 4.5.1, all UI packages require the installation of the
Globus Toolkit software release 2.2 from the VDT distribution and of the MyProxy package.
Note that since the RPMs are generated using gcc 3.2 and RPM 4.0.2, the same configuration
is expected on the target platforms.
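When many packages have to be fetched and installed, the two commands above can be generated from a list of rpm names. The sketch below only builds the command lines and does not run them; the function install_commands is hypothetical, while the wget and rpm options are the ones given in the text.

```python
def install_commands(base_url, rpm_names):
    """Return the wget and rpm command lines for each package, in list order."""
    commands = []
    for name in rpm_names:
        url = base_url.rstrip("/") + "/" + name
        commands.append(["wget", "-nd", "-r", url])
        commands.append(["rpm", "-ivh", name])
    return commands
```

The packages are processed in the order given, so dependencies should be listed before the packages that need them.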
Details for each UI package are reported below.
4.5.2.1. Python Command Line Interface
In order to install the python command line User Interface, the following commands have to
be issued with root privileges:
rpm -ivh edg-wl-common-api-X.Y.Z-K_gcc3_2_2.i486.rpm
rpm -ivh edg-wl-logging-api-cpp-X.Y.Z-K_gcc3_2_2.i486.rpm
rpm -ivh edg-wl-logging-api-c-X.Y.Z-K_gcc3_2_2.i486.rpm
rpm -ivh edg-wl-bypass-2.5.3-4_gcc3_2_2.i486.rpm
rpm -ivh edg-wl-config-X.Y.Z-K_gcc3_2_2.i486.rpm
rpm -ivh edg-wl-ui-config-X.Y.Z-K_gcc3_2_2.i486.rpm
rpm -ivh edg-wl-ui-cli-X.Y.Z-K_gcc3_2_2.i486.rpm
By default the rpms install the software in the “/opt/edg” directory.
Moreover, the VOMS API rpms have to be installed as follows:
rpm -ivh voms-api_gcc3_2_2-1.1.39-1_RH7.3
rpm -ivh edg-voms-vo-<VO name>-X.Y-Z.noarch.rpm
Of course, the python2.2 rpms have to be installed too if they are not present on the machine
(they should be included in the RH 7.3 distribution).
4.5.2.2. C++ API
In order to install the UI C++ API, the following commands have to be issued with root
privileges:
rpm -ivh edg-wl-common-api-X.Y.Z-K_gcc3_2_2.i486.rpm
rpm -ivh edg-wl-logging-api-cpp-X.Y.Z-K_gcc3_2_2.i486.rpm
rpm -ivh edg-wl-logging-api-c-X.Y.Z-K_gcc3_2_2.i486.rpm
rpm -ivh edg-wl-chkpt-api-X.Y.Z-K_gcc3_2_2.i486.rpm
rpm -ivh edg-wl-bypass-2.5.3-4_gcc3_2_2.i486.rpm
rpm -ivh edg-wl-ui-api-cpp-X.Y.Z-K_gcc3_2_2.i486.rpm
By default the rpms install the software in the “/opt/edg” directory.
Moreover, the VOMS API and the classads and boost libraries have to be installed as follows:
rpm -ivh voms-api_gcc3_2_2-1.1.39-1_RH7.3
rpm -ivh edg-voms-vo-<VO name>-X.Y-Z.noarch.rpm
rpm -ivh classads-g3-0.9.4-vh8.i486.rpm
rpm -ivh boost-g3-1.29.1-vh6.i486.rpm
4.5.2.3. Java API
In order to install the UI Java API, the following commands have to be issued with root
privileges:
rpm -ivh j2sdk-1.4.1_01-fcs.i586.rpm
rpm -ivh j2sdk_profile-1.4.1_01-1.noarch.rpm
rpm -ivh classads-jar-1.1-2.i386.rpm
rpm -ivh cog-jar-1.0-alpha-1.0-1_alpha.i386.rpm
rpm -ivh log4j-1.2.6-1jpp.noarch.rpm
rpm -ivh bouncycastle-jdk14-1.19-2
rpm -ivh edg-java-security-1.4.1-1
rpm -ivh edg-wl-common-api-X.Y.Z-K_gcc3_2_2.i486.rpm
rpm -ivh edg-wl-logging-api-cpp-X.Y.Z-K_gcc3_2_2.i486.rpm
rpm -ivh edg-wl-logging-api-c-X.Y.Z-K_gcc3_2_2.i486.rpm
rpm -ivh edg-wl-bypass-2.5.3-4_gcc3_2_2.i486.rpm
rpm -ivh edg-wl-common-api-java-X.Y.Z-K.i486.rpm
rpm -ivh edg-wl-ui-api-java-X.Y.Z-K.i486.rpm
By default the WMS rpms install the software in the “/opt/edg” directory.
4.5.2.4. Java GUI
In order to install the Java GUI, the following commands have to be issued with root
privileges:
rpm -ivh j2sdk-1.4.1_01-fcs.i586.rpm
rpm -ivh j2sdk_profile-1.4.1_01-1.noarch.rpm
rpm -ivh classads-jar-1.1-2.i386.rpm
rpm -ivh cog-jar-1.0-alpha-1.0-1_alpha.i386.rpm
rpm -ivh log4j-1.2.6-1jpp.noarch.rpm
rpm -ivh bouncycastle-jdk14-1.19-2
rpm -ivh edg-java-security-1.4.1-1
rpm -ivh edg-wl-common-api-X.Y.Z-K_gcc3_2_2.i486.rpm
rpm -ivh edg-wl-logging-api-cpp-X.Y.Z-K_gcc3_2_2.i486.rpm
rpm -ivh edg-wl-logging-api-c-X.Y.Z-K_gcc3_2_2.i486.rpm
rpm -ivh edg-wl-bypass-2.5.3-4_gcc3_2_2.i486.rpm
rpm -ivh edg-wl-common-api-java-X.Y.Z-K.i486.rpm
rpm -ivh edg-wl-ui-api-java-X.Y.Z-K.i486.rpm
rpm -ivh edg-wl-config-X.Y.Z-K_gcc3_2_2.i486.rpm
rpm -ivh edg-wl-ui-config-X.Y.Z-K_gcc3_2_2.i486.rpm
rpm -ivh edg-wl-ui-gui-X.Y.Z-K.i486.rpm
By default the WMS rpms install the software in the “/opt/edg” directory.
4.5.3. Configuration
The User Interface C++ and Java API packages require no configuration.
The Python command line interface and the GUI instead share a common configuration
section that allows setting VO-specific parameters. This information is provided in the file
edg_wl_ui.conf. There is one such file for each of the supported EDG VOs. These files are
located in the directory
$EDG_WL_LOCATION/etc/<VO name>/
i.e. there is one directory per VO. The VO name is lower case.
These directories are created by the LCFGng object called edg-lcfg-uicmnconfig. If
the installation is not performed using LCFGng, first install the common
configuration rpm (edg-wl-ui-config-X.Y.Z-K_gcc3_2_2.i486.rpm), which creates
the directory:
$EDG_WL_LOCATION/etc/vo_template
containing the file edg_wl_ui.conf; then create in $EDG_WL_LOCATION/etc a directory
for each needed VO and copy into it the file
$EDG_WL_LOCATION/etc/vo_template/edg_wl_ui.conf
suitably updated.
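For installations not managed by LCFGng, the manual step of creating one directory per VO from the template can be sketched as follows. The function create_vo_dirs is hypothetical; it takes the etc directory explicitly (i.e. the value of $EDG_WL_LOCATION/etc) and only copies the template, so each copy still has to be edited for its VO afterwards.

```python
import os
import shutil

def create_vo_dirs(etc_dir, vo_names):
    """Copy vo_template/edg_wl_ui.conf into one lower-case directory per VO."""
    template = os.path.join(etc_dir, "vo_template", "edg_wl_ui.conf")
    targets = []
    for vo in vo_names:
        vo_dir = os.path.join(etc_dir, vo.lower())  # VO directory names are lower case
        os.makedirs(vo_dir, exist_ok=True)
        target = os.path.join(vo_dir, "edg_wl_ui.conf")
        if not os.path.exists(target):              # never overwrite an edited file
            shutil.copyfile(template, target)
        targets.append(target)
    return targets
```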
The edg_wl_ui.conf file is a classad containing the following fields:

VirtualOrganisation
this is a string representing the name of the virtual
organisation the file refers to. It should match the name of the directory
containing the file. This parameter is mandatory.

NSAddresses
this is a string or a list of strings representing the
address or list of addresses (<host fqdn>:<port>) of the Network Servers available for
the given VO. Job submission is performed towards the first NS in the list and, in case
of failure, it is retried on each listed NS until it succeeds or the end of the list is reached.
This parameter is mandatory.

LBAddresses
this is a string or a list of strings representing the
address or list of addresses (<host fqdn>:<port>) of the LB servers available for the
given VO. When job submission is performed, the UI randomly chooses one LB server
from the list and uses it for generating the job identifier, so that all information related
to that job is managed by the chosen LB server. This allows distributing the load
over several LB servers. This parameter is mandatory.

HLRLocation
this is a string representing the address (<host
fqdn>:<port>:<X509contact string>) of the HLR for the given VO. The HLR is the service
responsible for managing the economic transactions and the accounts of users and
resources. This parameter is not mandatory. It is not present in the file by default. If
present, it makes the UI automatically add the HLRLocation JDL attribute to the job
description (if not specified by the user), which enables accounting.

MyProxyServer
this is a string representing the MyProxy server address
(<host fqdn>) for the given VO. This parameter is not mandatory. It is not present in
the file by default. If present, it makes the UI automatically add the MyProxyServer JDL
attribute to the job description (if not specified by the user), which enables proxy
renewal. If the myproxy client package is installed on the UI node, then this
parameter should be set equal to the MYPROXY_SERVER environment variable.
Hereafter is an example of the configuration file for the “atlas” Virtual Organisation. The
file path will hence be $EDG_WL_LOCATION/etc/atlas/edg_wl_ui.conf
[
VirtualOrganisation = "atlas";
NSAddresses = {
"ibm139.cnaf.infn.it:7772",
"grid013f.cnaf.infn.it:9772",
"grid012f.cnaf.infn.it:9772",
"grid004f.cnaf.infn.it:7771"
};
LBAddresses = {
"ibm139.cnaf.infn.it:9000",
"fox.to.infn.it:9000"
};
HLRLocation = "lilith.to.infn.it:56568:/C=IT/O=INFN/OU=Personal Certificate/L=Torino/CN=Andrea Guarise/Email=A.Guarise@to.infn.it";
MyProxyServer = "skurut.cesnet.cz";
]
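The way the UI is described as using NSAddresses (try each NS in order until one succeeds) and LBAddresses (pick one LB server at random) can be illustrated as below. This is a sketch of the behaviour described in the text, not the UI code: the function names and the try_submit callback are hypothetical.

```python
import random

def as_list(value):
    """Both parameters may be a single string or a list of strings."""
    return [value] if isinstance(value, str) else list(value)

def pick_lb(lb_addresses, rng=random):
    """Choose at random the LB server that will manage the job."""
    return rng.choice(as_list(lb_addresses))

def submit_with_failover(ns_addresses, try_submit):
    """Try each NS in list order; return the first that accepts, else None.

    try_submit is a caller-supplied function taking an address and
    returning True on success.
    """
    for address in as_list(ns_addresses):
        if try_submit(address):
            return address
    return None
```

With the “atlas” example above, submission would first be attempted on ibm139.cnaf.infn.it:7772 and would move on to grid013f.cnaf.infn.it:9772 only if that attempt failed.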
4.5.3.1. Python Command Line Interface
Configuration of the Command line User Interface is accomplished through the file
$EDG_WL_LOCATION/etc/edg_wl_ui_cmd_var.conf
This file is installed by the LCFGng object edg-lcfg-cliconfig. If the installation is
not performed using LCFGng, first install the UI rpm
(edg-wl-ui-cli-X.Y.Z-K_gcc3_2_2.i486.rpm), which creates the file:
$EDG_WL_LOCATION/etc/edg_wl_ui_cmd_var.conf.template
then copy it to:
$EDG_WL_LOCATION/etc/edg_wl_ui_cmd_var.conf
and update the content of the latter file as appropriate.
The edg_wl_ui_cmd_var.conf file is a classad containing the following fields:

requirements
this is an expression representing the default value for the
requirements expression in the JDL job description. This parameter is mandatory.
The value of this parameter is assigned by the UI to the requirements attribute in the
JDL if not specified by the user. If the user has instead provided an expression for the
requirements attribute in the JDL, the one specified in the configuration file is added
(in AND) to the existing one. E.g. if in the edg_wl_ui_cmd_var.conf configuration file
there is:
requirements = other.GlueCEStateStatus == "Production" ;
and in the JDL file the user has specified:
requirements = other.GlueCEInfoLRMSType == "PBS";
then the job description that is passed to the NS contains
requirements = (other.GlueCEInfoLRMSType == "PBS") &&
(other.GlueCEStateStatus == "Production");
Obviously, setting the requirements to TRUE in the configuration file has no impact on the
evaluation of job requirements, as it would result in:
requirements = (other.GlueCEInfoLRMSType == "PBS") && TRUE ;

rank this is an expression representing the default value for the rank expression in
the JDL job description. The value of this parameter is assigned by the UI to the rank
attribute in the JDL if not specified by the user. This parameter is mandatory.

RetryCount this is an integer representing the default value for the number of
submission retries for a job upon failure due to some grid component (i.e. not to the
job itself). The value of this parameter is assigned by the UI to the RetryCount
attribute in the JDL if not specified by the user.

DefaultVo
this is a string representing the name of the virtual organisation to be
taken as the user’s VO (VirtualOrganisation attribute in the JDL) if it is not specified by
the user in the credentials' VOMS extension, directly in the job description,
or through the --vo option. This attribute can either be set to “unspecified” or be
omitted from the file altogether to mean that no default is set for the VO.

ErrorStorage this is a string representing the path of the directory where the UI
creates log files. This directory is not created by the UI, so it has to exist
already. Default for this parameter is /tmp.

OutputStorage this is a string defining the path of the directory where the job
OutputSandbox files are stored if not specified by the user through command
options. This directory is not created by the UI, so it has to exist already.
Default for this parameter is /tmp.

ListenerStorage this is a string defining the path of the directory in which the pipes
are created where the edg_grid_console_shadow process saves the job standard
streams for interactive jobs. Default for this parameter is /tmp.

LoggingDestination this is a string defining the address (<host>[:<port>]) of the
logging service (edg-wl-logd logging daemon) to be targeted when logging events.
The UI first checks the environment for the EDG_WL_LOG_DESTINATION variable
and only if this is not set is the value of the LoggingDestination parameter taken into
account.

LoggingTimeout this is an integer representing the timeout in seconds for the
asynchronous logging function called by the UI when logging events to the LB.
The recommended value for UIs that are non-local to the logging service (edg-wl-logd
logging daemon) is not less than 30 seconds.

LoggingSyncTimeout this is an integer representing the timeout in seconds for the
synchronous logging function called by the UI when logging events to the LB.
The recommended value is not less than 30 seconds.

DefaultStatusLevel this is an integer defining the default level of verbosity for the
edg-job-status command. Possible values are 0, 1 and 2. 0 is the default and means
minimum verbosity.

DefaultLogInfoLevel this is an integer defining the default level of verbosity for the
edg-job-get-logging-info command. Possible values are 0, 1 and 2. 0 is the default and
means minimum verbosity.

NSLoggerLevel this is an integer defining the quantity of information logged by the
NS client. Possible values range from 0 to 6. 0 is the default and means that no
information is logged.
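The rule described above for the requirements parameter (use the configured default when the user gives none, otherwise AND the two expressions) can be sketched as follows. The function merge_requirements is hypothetical, and always parenthesising both operands is an assumption of this sketch rather than documented UI behaviour.

```python
def merge_requirements(user_expr, default_expr):
    """Combine the user's requirements with the configured default.

    Expressions are plain strings without the trailing semicolon.
    """
    if not user_expr:
        # Nothing specified in the JDL: the default is used as is.
        return default_expr
    # Otherwise the default is added in AND to the user's expression.
    return "({}) && ({})".format(user_expr, default_expr)
```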
Hereafter is provided an example of the $EDG_WL_LOCATION/etc/edg_wl_ui_cmd_var.conf
configuration file.
[
requirements = other.GlueCEStateStatus == "Production" ;
rank = - other.GlueCEStateEstimatedResponseTime ;
RetryCount = 3 ;
ErrorStorage= "/var/tmp" ;
OutputStorage="/tmp";
ListenerStorage = "/tmp" ;
LoggingTimeout = 30 ;
LoggingSyncTimeout = 45 ;
LoggingDestination = "ibm139.cnaf.infn.it:9002" ;
DefaultStatusLevel = 1 ;
DefaultLogInfoLevel = 0;
NSLoggerLevel = 2;
DefaultVo = "cms";
]
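The precedence between the EDG_WL_LOG_DESTINATION environment variable and the LoggingDestination parameter can be illustrated with the sketch below. The function resolve_logging_destination is hypothetical; it returns the port as None when the address carries no port, since the default port is not stated in this section.

```python
import os

def resolve_logging_destination(conf_value, environ=None):
    """Return (host, port) of the logging service, or None if nothing is set.

    The environment variable, when set, takes precedence over the
    LoggingDestination value read from the configuration file.
    """
    environ = os.environ if environ is None else environ
    destination = environ.get("EDG_WL_LOG_DESTINATION", conf_value)
    if destination is None:
        return None
    host, _, port = destination.partition(":")
    return host, (int(port) if port else None)
```

With the example file above and no environment variable set, the UI would log to ibm139.cnaf.infn.it on port 9002.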
The files:
$EDG_WL_LOCATION/etc/edg_wl_ui_cmd_err.conf
and
$EDG_WL_LOCATION/etc/edg_wl_ui_cmd_help.conf
contain, respectively, the error codes and error messages returned by the UI and the text
describing the usage of the commands.
4.5.3.2. Java GUI
The Java GUI consists of three components:

JobSubmitter

JobMonitor

JDLEditor
Configuration of the Java GUI is accomplished through the file
$EDG_WL_LOCATION/etc/edg_wl_ui_gui_var.conf
This file is installed by the LCFGng object edg-lcfg-guiconfig. If the installation is
not performed using LCFGng, first install the UI rpm
(edg-wl-ui-gui-X.Y.Z-K.i486.rpm), which creates the file:
$EDG_WL_LOCATION/etc/edg_wl_ui_gui_var.conf.template
then copy it to:
$EDG_WL_LOCATION/etc/edg_wl_ui_gui_var.conf
and update the content of the latter file as appropriate.
The edg_wl_ui_gui_var.conf file is a classad containing the following fields:

JDLEDefaultSchema this is a string representing the default schema used by the
JDLEditor for building the rank and requirements expressions in the JDL job
description. This should be the schema of the Information Service describing the
resources targeted by the job submissions.
The following three attributes -- requirements, rank and rankMPI -- are elements of a
sub-classad that has the name of the schema (see example below):

requirements
this is an expression representing the default value for the
requirements expression in the JDL job description. This parameter is mandatory.
The value of this parameter is assigned by the UI to the requirements attribute in the
JDL if not specified by the user. If the user has instead provided an expression for the
requirements attribute in the JDL, the one specified in the configuration file is added
(in AND) to the existing one. E.g. if in the edg_wl_ui_gui_var.conf configuration file
there is:
requirements = other.GlueCEStateStatus == "Production" ;
and in the JDL file the user has specified:
requirements = other.GlueCEInfoLRMSType == "PBS";
then the job description that is passed to the RB will contain
requirements = (other.GlueCEInfoLRMSType == "PBS") && (other.GlueCEStateStatus == "Production");
Obviously the setting TRUE for the requirements in the configuration file does not have
any impact on the evaluation of job requirements as it would result in:
requirements = (other.GlueCEInfoLRMSType == "PBS") && TRUE ;
This parameter is repeated in the configuration file once per each supported schema
(see example below).

rank this is an expression representing the default value for the rank expression in
the JDL job description. The value of this parameter is assigned by the UI to the rank
attribute in the JDL if not specified by the user. This parameter is mandatory. This
parameter is repeated in the configuration file once per each supported schema (see
example below).

rankMPI
this is an expression representing the default value for the rank
expression for MPI jobs (JobType = “MPICH”) in the JDL job description. The value of
this parameter is assigned by the UI to the rank attribute in the JDL if not specified by
the user. If this parameter is not present in the configuration file
then the GUI also uses the expression specified for the rank parameter for
MPI jobs. This parameter is repeated in the configuration file once for each
supported schema (see example below).

RetryCount this is an integer representing the default value for the number of
submission retries for a job upon failure due to some grid component (i.e. not to the
job itself). The value of this parameter is assigned by the UI to the RetryCount
attribute in the JDL if not specified by the user.

ErrorStorage this is a string representing the path of the directory where the
JDLEditor component creates its parsing error log files. This directory is not created by
the GUI component, so it must already exist on the machine. The default for this
parameter is /tmp.

LoggingDestination this is a string defining the address (<host>:[<port>]) of the
logging service (edg-wl-logd logging daemon) to be targeted when logging events.
The UI first checks the environment for the EDG_WL_LOG_DESTINATION variable,
and only if this is not set is the value of the LoggingDestination parameter taken into
account.

LoggingTimeout this is an integer representing the timeout in seconds for the
asynchronous logging function called by the UI when logging events to the LB.
The recommended value for UIs that are non-local to the logging service (edg-wl-logd
logging daemon) is not less than 30 seconds.

LoggingSyncTimeout this is an integer representing the timeout in seconds for the
synchronous logging function called by the UI when logging events to the LB.
The recommended value is not less than 30 seconds.

NSLoggerLevel this is an integer defining the quantity of information logged by the
NS client. Possible values range from 0 to 6. The default is 0, which means that no
information is logged.
An example of the $EDG_WL_LOCATION/etc/edg_wl_ui_gui_var.conf
configuration file is provided below. In this example the supported schemas are Glue and the old EDG
one.
[
JDLEDefaultSchema = "Glue" ;
Glue = [
rank = - other.GlueCEStateEstimatedResponseTime ;
rankMPI = other.GlueCEStateFreeCPUs;
requirements = other.GlueCEStateStatus == "Production";
] ;
EDG = [
rank = - other.EstimatedTraversalTime ;
rankMPI = other.FreeCPUs;
requirements = other.Active;
] ;
RetryCount = 3 ;
ErrorStorage= "/tmp" ;
LoggingTimeout = 30 ;
LoggingSyncTimeout = 60 ;
LoggingDestination = "ibm139.cnaf.infn.it:9002" ;
NSLoggerLevel = 0;
]
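The way the GUI defaults described above combine with a user's JDL can be illustrated with a short Python sketch. It is not the GUI's actual code: the function name apply_gui_defaults, the dictionary representation of the classad, and the choice of taking the defaults from the sub-classad named by JDLEDefaultSchema are assumptions made for illustration only. Expressions are handled as plain strings.

```python
def apply_gui_defaults(job, conf):
    """Return a copy of `job` with the GUI defaults from `conf` applied.

    `job` maps JDL attribute names to values (expressions kept as strings).
    `conf` mirrors the edg_wl_ui_gui_var.conf classad as nested dicts.
    """
    # Assumption: the sub-classad named by JDLEDefaultSchema supplies the defaults.
    schema = conf[conf["JDLEDefaultSchema"]]
    result = dict(job)

    if "requirements" in job:
        # The configured expression is ANDed to the one provided by the user.
        result["requirements"] = "({}) && ({})".format(
            job["requirements"], schema["requirements"])
    else:
        result["requirements"] = schema["requirements"]

    if "rank" not in job:
        if job.get("JobType") == "MPICH":
            # rankMPI is optional; fall back to rank when it is absent.
            result["rank"] = schema.get("rankMPI", schema["rank"])
        else:
            result["rank"] = schema["rank"]

    # RetryCount is only filled in when the user did not specify it.
    result.setdefault("RetryCount", conf["RetryCount"])
    return result


# Hypothetical in-memory copy of part of the example configuration.
EXAMPLE_CONF = {
    "JDLEDefaultSchema": "Glue",
    "Glue": {
        "rank": "- other.GlueCEStateEstimatedResponseTime",
        "rankMPI": "other.GlueCEStateFreeCPUs",
        "requirements": 'other.GlueCEStateStatus == "Production"',
    },
    "RetryCount": 3,
}

if __name__ == "__main__":
    user_job = {"requirements": 'other.GlueCEInfoLRMSType == "PBS"'}
    print(apply_gui_defaults(user_job, EXAMPLE_CONF)["requirements"])
```

Keeping the expressions as strings avoids the need for a classad parser; a real implementation would operate on parsed expressions instead.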
Additional files installed in $EDG_WL_LOCATION/etc are the Information Service schema
definition files:

edg_wl_ui_jdle_<IS_schema>.xml (Glue and EDG are currently supported)
the Condor DTD for parsing job descriptions written in classad/XML format:

condor.dtd
and lastly the Log4j properties file (see 5.7.1):

edg_wl_ui_gui_log4j.properties
4.5.4. Environment variables
Environment variables that have to be set for the User Interface are listed hereafter:

X509_USER_KEY
the user private key file path. Default value is
$HOME/.globus/userkey.pem

X509_USER_CERT
the user certificate file path. Default value is
$HOME/.globus/usercert.pem

X509_CERT_DIR
the trusted certificate directory and ca-signing-policy
directory. Default value is /etc/grid-security/certificates

X509_USER_PROXY
the user proxy certificate file path. Default value is
/tmp/x509up_u<UID> where UID is the user identifier on
the machine as required by GSI.
These variables are used by the GSI layer to establish the security context.
Moreover there are:

GLOBUS_LOCATION
The Globus rpms installation path. It defaults to
/opt/globus

EDG_WL_LOCATION
The User Interface installation path. It has to be set only
if installation has been made in a non-standard location. It defaults to /opt/edg

EDG_WL_LOG_DESTINATION
address (<host>:[<port>]) of the logging service
(edg-wl-logd logging daemon) to be targeted when logging events. This variable
takes precedence over the value set in the UI configuration. It defaults to
localhost:9002.

EDG_WL_LOG_TIMEOUT the timeout in seconds for the asynchronous logging
function called by the UI when logging events to the LB. The recommended value for UIs
that are non-local to the logging service (edg-wl-logd logging daemon) is not less than
30 seconds. This variable takes precedence over the value set in the UI
configuration.

EDG_WL_LOG_SYNC_TIMEOUT this is an integer representing the timeout in
seconds for the synchronous logging function called by the UI when logging events to the
LB. The recommended value is not less than 30 seconds. This variable takes
precedence over the value set in the UI configuration.
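The precedence rule stated for the logging variables (environment first, then the UI configuration, then any built-in default) can be sketched in Python as follows. The helper resolve_setting and the host name lb.example.org are hypothetical; the sketch only mirrors the lookup order described above and is not the UI's own code.

```python
import os


def resolve_setting(env_name, conf_key, conf, environ=None, default=None):
    """Return the effective value of a logging setting.

    Lookup order: environment variable, then configuration attribute,
    then `default`. `conf` is the UI configuration as a plain dict.
    """
    environ = os.environ if environ is None else environ
    value = environ.get(env_name)
    if value:  # an unset or empty variable does not override the configuration
        return value
    return conf.get(conf_key, default)


if __name__ == "__main__":
    conf = {"LoggingDestination": "ibm139.cnaf.infn.it:9002", "LoggingTimeout": 30}
    # The environment variable wins when it is set.
    print(resolve_setting("EDG_WL_LOG_DESTINATION", "LoggingDestination", conf,
                          environ={"EDG_WL_LOG_DESTINATION": "lb.example.org:9002"}))
    # Otherwise the configuration value is used.
    print(resolve_setting("EDG_WL_LOG_DESTINATION", "LoggingDestination", conf,
                          environ={}))
```

Note that values read from the environment are strings, so a caller resolving EDG_WL_LOG_TIMEOUT would still need to convert the result to an integer.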
4.5.4.1. Python Command Line Interface

EDG_WL_UI_CONFIG_VAR
Non-standard location of the command line
interface configuration file edg_wl_ui_cmd_var.conf. This variable must contain the
absolute path of the file.

EDG_WL_UI_CONFIG_VO Non-standard location of the vo-specific UI configuration
file edg_wl_ui.conf. This variable must contain the absolute path of the file.
4.5.4.2. Java GUI

EDG_WL_GUI_CONFIG_VAR
Non-standard location of the GUI
configuration file edg_wl_ui_gui_var.conf. This variable must contain the absolute path
of the file.

EDG_WL_GUI_CONFIG_VO Non-standard location of the vo-specific GUI
configuration file edg_wl_ui.conf. This variable must contain the absolute path of the file.
The GUI components also require the following environment variables to be set if the
corresponding packages are not installed in the standard location (/usr/share/java):

JAVA_INSTALL_PATH
the installation path of Java 2 Development Kit

COG_INSTALL_PATH
the installation path of the Globus Java CoG Kit

LOG4J_INSTALL_PATH
the installation path of the Log4J package

CLASSADJ_INSTALL_PATH
the installation path of the customised Condor classads Java library
5. OPERATING THE SYSTEM
For security purposes all the WMS daemons run with proxy certificates. These certificates
are generated by the start-up scripts described in the following sections, before the
applications are started. The lifetime of proxies created by the start-up scripts is 24 hours. To
provide the daemons with valid proxies for their whole lifetime, administrators need to
ensure that new proxies are generated regularly. This can be achieved by adding the following lines to
the machine's /etc/crontab:
57 2,8,14,20 * * * root service edg-wl-locallogger proxy
57 2,8,14,20 * * * root service edg-wl-lbserver proxy
57 2,8,14,20 * * * root service edg-wl-proxyrenewal proxy
57 2,8,14,20 * * * root service edg-wl-ns proxy
With these entries, cron regenerates the proxies every six hours (at 02:57, 08:57, 14:57 and 20:57).
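When changing the renewal schedule it is worth checking that the longest interval between two cron runs stays well below the 24 hour proxy lifetime. The following Python sketch performs that check for a list of hours such as the one used in the crontab hour field; the function name max_renewal_gap_hours is hypothetical and the sketch assumes all runs happen at the same minute of the hour.

```python
PROXY_LIFETIME_HOURS = 24  # lifetime of proxies created by the start-up scripts


def max_renewal_gap_hours(hours):
    """Return the longest gap, in hours, between consecutive daily runs."""
    hours = sorted(set(hours))
    if not hours:
        raise ValueError("at least one hour is required")
    gaps = [later - earlier for earlier, later in zip(hours, hours[1:])]
    # Gap between the last run of one day and the first run of the next day.
    gaps.append(hours[0] + 24 - hours[-1])
    return max(gaps)


if __name__ == "__main__":
    gap = max_renewal_gap_hours([2, 8, 14, 20])
    print("longest gap:", gap, "hours; safe:", gap < PROXY_LIFETIME_HOURS)
```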
5.1. LB LOCAL-LOGGER
5.1.1. Starting and stopping daemons
To run the LB local-logger services, it suffices to issue as root the following command:
/etc/rc.d/init.d/edg-wl-locallogger start
This makes both the edg-wl-logd and the edg-wl-interlogd processes start.
The same can be done by issuing the following commands:
<install path>/sbin/edg-wl-logd <options>
<install path>/sbin/edg-wl-interlogd <options>
Both daemons recognize a common set of options:
--key=<proxyfile>
despite the name, this should refer to the host proxy file (this
option overrides the value of the environment variable
X509_USER_KEY). An example of option usage:
--key=/tmp/hostproxy.pem
--cert=<certfile>
despite the name, this should refer to the host proxy file (this
option overrides the value of the environment variable
X509_USER_CERT). An example of option usage:
--cert=/tmp/hostproxy.pem
--CAdir=<certdir>
trusted certificate and ca-signing-policy directory (this option
overrides value of the environment variable X509_CERT_DIR).
Here below an example of option usage:
--CAdir=/etc/grid-security/certificates
--file-prefix=<file path>
Absolute path of the file where the logged events are
stored locally. The default value is /tmp/dglog, which
carries a risk of data loss in case of reboot. Note that
the same value must be specified for both daemons.
--socket=<local socket path>
Unix socket used for direct communication
between the daemons.
--debug
make the process run in foreground and produce
diagnostics
--verbose
be more verbose (makes sense with --debug only)
--help
display usage message and exit
edg-wl-logd recognises the following specific option:
--port=<port number> listen on a non-default port
edg-wl-interlogd should currently be invoked with the --book option. It disables sending the
events to persistent log storage, which is not yet supported.
Using the options explicitly is recommended rather than relying on the corresponding
environment variables.
The LB local-logger services can be stopped using the edg-wl-locallogger script with
the stop option.
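Since both daemons must receive the same --file-prefix value and explicit options are recommended, it can help to build the two command lines from a single set of values. The Python sketch below does this; the function name locallogger_commands and the file prefix /var/tmp/dglog are hypothetical, while the option names are those listed above. The sketch only builds the argument lists and does not start anything.

```python
def locallogger_commands(install_path, proxy, ca_dir, file_prefix):
    """Return the argument lists for edg-wl-logd and edg-wl-interlogd."""
    # Options common to both daemons; the same file prefix is used for both.
    common = [
        "--key=" + proxy,
        "--cert=" + proxy,
        "--CAdir=" + ca_dir,
        "--file-prefix=" + file_prefix,
    ]
    logd = [install_path + "/sbin/edg-wl-logd"] + common
    # edg-wl-interlogd should currently be invoked with --book.
    interlogd = [install_path + "/sbin/edg-wl-interlogd"] + common + ["--book"]
    return logd, interlogd


if __name__ == "__main__":
    for argv in locallogger_commands("/opt/edg", "/tmp/hostproxy.pem",
                                     "/etc/grid-security/certificates",
                                     "/var/tmp/dglog"):
        print(" ".join(argv))
```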
5.1.2. Troubleshooting
If the LB local-logger services are started in debug mode (i.e. using the --debug option), the
daemons log fatal failures with syslog().
5.2. LB SERVER
5.2.1. Starting and stopping daemons
To run the LB server services, it suffices to issue as root the following command:
/etc/rc.d/init.d/edg-wl-lbserver start
This starts the edg-wl-bkserverd processes.
The same can be done by issuing the following command:
<install path>/sbin/edg-wl-bkserverd <options>
The daemon recognizes this set of options:
--key=<keyfile>
despite the name, this should refer to the host proxy file (this
option overrides the value of the environment variable
X509_USER_KEY). An example of option usage:
--key=/tmp/hostproxy.pem
--cert=<certfile>
despite the name, this should refer to the host proxy file (this
option overrides the value of the environment variable
X509_USER_CERT). An example of option usage:
--cert=/tmp/hostproxy.pem
--CAdir=<certdir>
trusted certificate and ca-signing-policy directory (this option
overrides value of the environment variable X509_CERT_DIR).
Here below an example of option usage:
--CAdir=/etc/grid-security/certificates
--debug
make the process run in foreground to produce diagnostics
--port=<port number>
listen on a non-default port. Note that the release 2 server
listens also on <port number> + 1 for incoming events.
--mysql=<database>
connect to a non-default MySQL database. The database
string takes the form user/password@hostname:database.
--slaves=<number>
spawn that many slaves, meaning that this number of
client connections can be handled in parallel.
--semaphores=<number>
use that many semaphores for internal job locking.
Defaults to the number of slaves and should not be
changed in normal operation.
--pidfile=<filename>
use non-default PID & lock file.
Using the options explicitly is recommended rather than relying on the corresponding
environment variables.
The LB server services can be stopped using the edg-wl-lbserver script with the
stop option.
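The database string accepted by --mysql has the form user/password@hostname:database. A minimal Python sketch of how such a string could be split into its parts is given below, for example to validate a value before putting it in a start-up script. The function name parse_mysql_spec and the sample credentials are hypothetical, and the sketch assumes that neither the user name nor the database name contains the separator characters.

```python
def parse_mysql_spec(spec):
    """Split 'user/password@hostname:database' into a dict of its parts."""
    # Split on the last '@' so that a password may itself contain '@'.
    credentials, _, location = spec.rpartition("@")
    user, _, password = credentials.partition("/")
    host, _, database = location.rpartition(":")
    if not (user and host and database):
        raise ValueError("expected user/password@hostname:database, got %r" % spec)
    return {"user": user, "password": password, "host": host, "database": database}


if __name__ == "__main__":
    print(parse_mysql_spec("lbuser/secret@localhost:lbdb"))
```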
5.2.2. Creating custom indices
By default the LB server indexes data according to JobId only. Because the querying
capabilities of LB release 2 were considerably extended, the server refuses to process a
query which would not utilize any index – we prevent overloading the underlying database
engine in this way. Consequently, even a trivial query “give me all my jobs” results in an error
in the default setup – under certain conditions processing such query may require handling
gigabytes of data.
The server administrator can create and modify the set of indices, and thereby control the set of
supported queries, using the edg-wl-bkindex utility. It is invoked in the
following way:
edg-wl-bkindex [options] [<index file>]
where the recognised options are:
--mysql=<database>
non-default database to connect to, same as for edg-wl-bkserverd
--verbose
be verbose
--dump
dump the current settings to stdout
--really
really perform reindexing. Without this option the required
actions are reported but not actually done.
The index file follows this syntax (subset of ClassAd syntax):
index-file ::= [ JobIndices = { index-description * } ]
index-description ::= column-description | list-of-columns
list-of-columns ::= { column-description + }
column-description ::= [ column-type; column-name; prefix-len ? ]
column-type ::= type = "user" | type = "system"
column-name ::= name = "actual column name"
prefix-len ::= prefixlen = integer
The only top-level attribute JobIndices is a list (possibly empty) of index descriptions.
Each index description is either a single column or a list of columns (where the order is
important). The column is described by mandatory attributes type and name, and an
optional attribute prefixlen.
Possible values of type are “system” for LB internal attributes and “user” for user tags –
arbitrary name=value pairs assigned to a job by the user.
Currently supported system column names are “owner”, “destination” and “location”.
Names of user tags are arbitrary as long as their length is less than 60 characters and they
contain only ASCII printable characters excluding backtick (`).
The prefixlen value may be used to restrict indexing of columns, which may grow rather
long, to a fixed size. This becomes necessary with compound indices as MySQL limits the
total size of index to 250 bytes only.
The following example index file contains two indices, the first one on a single system
attribute -- job owner, the second one composed from system attribute job destination and
user tag called “experiment number”:
[
JobIndices = {
[ type = "system"; name = "owner" ],
{
[ type = "system"; name = "destination";
prefixlen = 100 ],
[ type = "user"; name = "experiment number";
prefixlen = 100]
}
}
]
There is a sample configuration file, edg_wl_query_index.conf, containing definitions of
indices on all the currently supported indexed system attributes, i.e. “owner”, “destination”,
and “location”.
The edg-wl-bkindex utility should be run with the --really option while the LB server is shut down.
Depending on the actual size of the database, reindexing may take a rather long time. The LB
server becomes aware of the new index setup automatically on its startup.
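The constraints on index descriptions given above (supported system column names, user tag naming rules, and the 250 byte MySQL limit on the total index size) can be checked before running edg-wl-bkindex. The following Python sketch is one possible pre-check and is not part of the LB software: the function names are hypothetical, the index file is represented as Python lists and dicts rather than parsed from ClassAd syntax, and the size check simply sums the prefixlen values that are present.

```python
SYSTEM_COLUMNS = {"owner", "destination", "location"}
MYSQL_INDEX_LIMIT = 250  # bytes, total size of one index


def valid_column(column):
    """Check the type and name of a single column description."""
    name = column.get("name", "")
    if column.get("type") == "system":
        return name in SYSTEM_COLUMNS
    if column.get("type") == "user":
        # User tags: fewer than 60 printable ASCII characters, no backtick.
        printable = all(32 <= ord(ch) <= 126 and ch != "`" for ch in name)
        return 0 < len(name) < 60 and printable
    return False


def valid_index(index):
    """Check a single-column index (dict) or a compound index (list of dicts)."""
    columns = [index] if isinstance(index, dict) else list(index)
    if not columns or not all(valid_column(c) for c in columns):
        return False
    # Simplification: only columns with an explicit prefixlen are counted.
    total = sum(c.get("prefixlen", 0) for c in columns)
    return total <= MYSQL_INDEX_LIMIT


# The example index file from this section, as Python data.
JOB_INDICES = [
    {"type": "system", "name": "owner"},
    [
        {"type": "system", "name": "destination", "prefixlen": 100},
        {"type": "user", "name": "experiment number", "prefixlen": 100},
    ],
]

if __name__ == "__main__":
    print(all(valid_index(index) for index in JOB_INDICES))
```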
5.2.3. Purging the LB database
The edg-wl-bkpurge process, whose executable is installed in <install path>/sbin, is not a
daemon but a utility which should be run periodically (e.g. using a cron job) in order to
remove inactive jobs (i.e. those that entered the Cleared status more than a certain
amount of time ago) from the LB database. This utility recognizes the following set of options:
--log
data being purged from database are dumped to the
stdout
--outfile=<file>
data being purged from database are dumped in the file
named <file>
--mysql=<database>
name of the database to be purged. It must be the same
used by edg-wl-bkserverd (this option is not required in
the standard set-up)
--timeout=<timeout>[smhd]
removes data for all jobs that entered the “Cleared”
status more than <timeout>
[seconds/minutes/hours/days] ago.
--debug
print diagnostics on the stderr
--nopurge
dry run mode. It doesn't really purge (useful for
debugging purposes)
--aborted,
-a
delete from the database data also for jobs that have
entered the “Aborted” status
If --log is specified, the data are dumped in ULM format to stdout (or to <file> if --outfile is given). Normally
information is appended to the file. The file is locked with flock (LOCK_EX) to prevent race
conditions, e.g. with log rotation.
An example use of this utility is the following cron line to delete all data older than 14 days
from the database:
edg-wl-bkpurge --log --outfile=/var/log/dglb-data.log --timeout=14d
In general, the edg-wl-bkpurge utility may generate rather high background load on the
database engine. Therefore it should not be run too frequently (once a day is appropriate),
and preferably at the time of low LB server activity.
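The --timeout value combines a number with one of the unit letters s, m, h or d. As an illustration of what a given value means, the Python sketch below converts such a value to seconds. The function name timeout_to_seconds is hypothetical, and the sketch assumes that a unit letter is always given, since the behaviour without a unit is not described here.

```python
import re

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def timeout_to_seconds(text):
    """Convert a value such as '14d' or '90m' to a number of seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", text)
    if match is None:
        raise ValueError("expected <number>[smhd], got %r" % text)
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


if __name__ == "__main__":
    print(timeout_to_seconds("14d"))
```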
5.2.4. Experimental R-GMA Interface
The LB server release 2 is capable of feeding the R-GMA infrastructure with notifications on
job state changes. The functionality is enabled by starting edg-wl-bkserverd with the option
--rgmaexport. In addition, the environment variables EDG_WL_RGMA_FILE and
EDG_WL_RGMA_SOCK have to be set to point to a file and a local UNIX socket name used for
communication with the R-GMA producer. The producer itself is the Java program
LBProducer. It has to be invoked with the two environment variables set to the same values.
It takes no further arguments.
5.2.5. Troubleshooting
If the LB server services are started in debug mode (that is, using the --debug option), the
daemons log fatal failures with syslog().
5.3. SERVICES RUNNING IN THE “RB NODE”: NS, WM, JC, LM
5.3.1. Starting and stopping NS, WM, JC and LM daemons
Startup of NS, WM, JC and LM can be achieved issuing:
/etc/rc.d/init.d/edg-wl-ns start
/etc/rc.d/init.d/edg-wl-wm start
/etc/rc.d/init.d/edg-wl-jc start
/etc/rc.d/init.d/edg-wl-lm start
In the same way stopping is achieved by:
/etc/rc.d/init.d/edg-wl-ns stop
/etc/rc.d/init.d/edg-wl-wm stop
/etc/rc.d/init.d/edg-wl-jc stop
/etc/rc.d/init.d/edg-wl-lm stop
The startup script for JC also starts and stops the underlying CondorG service. These scripts
start the daemons under the appropriate user accounts. The startup scripts can also be used to
query the current status of the daemons using the status option.
Moreover it is strongly recommended to set the configuration of the machine in such a way
that all these services will be started at the startup of the system.
5.3.2. NS, WM, JC, LM troubleshooting
The NS, WM, JC and LM services produce log files recording their various events. These
files can be used to debug abnormal behaviors of these services. The log-file names and the
level of debugging (i.e. the level of “detail”) can be changed by directly modifying the
configuration file.
5.4. PROXY RENEWAL
5.4.1. Starting and stopping daemon
To run the PR daemon, it suffices to issue as root the following command:
/etc/rc.d/init.d/edg-wl-proxyrenewal start
This makes the edg-wl-renewald process start.
The same can be done issuing the following command:
The daemon recognizes the following set of options:
--debug
make the process run in foreground and produce diagnostics
--repository=<dir> directory where registered proxies will be stored
(/var/spool/edg-wl-renewd by default)
5.4.2. Troubleshooting
If the PR service is started in debug mode (i.e. using the --debug option), the daemon prints
out fatal failures to stdout.
5.5. PURGER
The input/output sandbox directories for a given job are cleared in the “RB node” when the
output sandbox files are retrieved (with the edg-job-get-output command), or when
the job is declared as aborted.
To prevent the file system of the “RB node” from filling up, the system administrator can
run the Storage Purger, which is in charge of cleaning old input/output sandboxes
according to a policy that has to be specified.
The edg-wl-purgeStorage executable (the storage purger) accepts the following options:
-a=<argument>
--allocated-limit=<argument>
Defines a percentage of used space on the input/output sandbox disk. If the used
space exceeds the specified argument, purging is triggered.
-b
--brute-rm
If enabled, ALL the input-output sandboxes directories are removed (use this option
with care).
-e
--enable-progress
Enables the progress indicator bar
-f
--fake-rm
Does not perform any directory removal.
-l=<argument>
--log-file=<argument>
Logs the purge information into the specified file.
If not specified, the default file is $EDG_WL_TMP/edg-wl-purgeStorage-<date><time>.log
-p=<argument>
--staging-path=<argument>
Defines the sandbox staging path (should be the same value specified as
SandboxStagingPath in the NS configuration section, see section 4.2.2.2).
If not specified the directory referenced by the $EDG_WL_TMP environment variable
is considered.
-q
--quiet
Does not create any log file (any settings specified with the option -l will be ignored).
-t=<argument>
--threshold=<argument>
The storage purger deletes the sandbox directories of jobs that have been in
DONE or ABORTED status for at least <argument> seconds (while the
sandbox directories are cleared for all jobs in CLEARED status).
If not specified, the default value is 604800 (one week).
The storage purger should be regularly invoked, for example via a cron job.
These are two examples of cron rules:
edg-wl-purgeStorage-weekly.cron
# Execute the "purger" command every four hours (0 */4), i.e. at midnight,
# 4:00 AM, 8:00 AM, 12:00 noon, 4:00 PM, and 8:00 PM, on each Sunday (sun).
0 */4 * * sun $EDG_WL_LOCATION/sbin/edg-wl-purgeStorage -l $EDG_WL_LOCATION_VAR/log/edg-wl-purgeStorage.log -t 604800
edg-wl-purgeStorage-hourly.cron
# Execute the "purger" command every day except Sunday,
# once per hour, purging if and only if the percentage of
# used space is greater than 40%
0 */1 * * mon-sat $EDG_WL_LOCATION/sbin/edg-wl-purgeStorage -l $EDG_WL_LOCATION_VAR/log/edg-wl-purgeStorage.log -t 604800 -a 40
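The purge policy expressed by the -a and -t options can be summarised by two small predicates. The Python sketch below is an illustration of the rules as described in this section, not the purger's implementation: the function names are hypothetical, and it assumes that purging always proceeds when -a is not given.

```python
ONE_WEEK = 604800  # default threshold, in seconds


def purge_triggered(used_percent, allocated_limit=None):
    """Decide whether a purge run should proceed.

    With an allocated limit (-a), purging is triggered only when the used
    space exceeds it. Assumption: without -a the run always proceeds.
    """
    return allocated_limit is None or used_percent > allocated_limit


def sandbox_removable(status, seconds_in_status, threshold=ONE_WEEK):
    """Decide whether the sandbox directory of one job may be removed."""
    if status == "CLEARED":
        return True  # cleared jobs are removed regardless of age
    if status in ("DONE", "ABORTED"):
        return seconds_in_status >= threshold
    return False


if __name__ == "__main__":
    # Hourly rule from the second cron example: -t 604800 -a 40
    if purge_triggered(used_percent=55, allocated_limit=40):
        print(sandbox_removable("DONE", 8 * 86400))
```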
5.6. GRID ACCOUNTING
5.6.1. Starting and stopping daemon
5.6.1.1. HLR server
To run the HLR server daemon, it suffices to issue as root the following command:
/etc/rc.d/init.d/edg-wl-hlrd start
This makes the edg-wl-dgas-hlrd process start.
The daemon recognizes the following set of options:
--help
-h
Print an informative help message describing the options and then exit.
--conf
-c
Specifies the full path to the configuration file to be used (see section 4.4.2.1).
--port
-p
Specifies listening port number
--log
-l
Specifies the full path for the log file
5.6.1.2. PA Server
To run the PA server daemon, it suffices to issue as root the following command:
/etc/rc.d/init.d/edg-wl-pad start
This makes the edg-wl-dgas-pad process start.
The daemon recognizes the following set of options:
--help
-h
Print an informative help message describing the options and then exit.
--conf
-c
Specifies the full path to the configuration file to be used (see section 4.4.2.2).
--port
-p
Specifies listening port number
--log
-l
Specifies the full path for the log file
5.6.2. HLR server administration
To use the DGAS software, it is necessary first of all to create the accounts for users and
resources.
Users and resources are divided into groups. These groups can be used to collect statistics
about the expenses/earnings of a given subset of the users/resources of the HLR.
As an example, let’s suppose that users Ua, Ub, Uc, Ud and the resources Ra, Rb belong to the
group Ga. When a user (e.g. Ub) spends some credits, his group is debited by the same
amount. When a resource (e.g. Ra) earns some credits, the corresponding group is credited
with the same amount.
Funds are containers of groups. They can be used for example to divide users or resources
belonging to different VOs on HLRs used to manage multiple VOs, or to achieve a better
granularity on large HLRs.
At present this division into groups and funds has limited advantages, since it only
affects the way earnings/expenses are computed, but in future releases of the software it will
allow different priorities to be assigned to users.
The steps needed to create the accounts are:

Creating the fund accounts

Creating the group accounts

Creating the user/resource accounts
5.6.2.1. Creating a Fund account
The command to use is:
edg-wl-dgas-hlr-addFund [OPTIONS]
-i --interactive
-C --Conf <Configuration file name>
-f --fid "fid"
-F --Force "Force the record insertion"
-d --descr "description"
-t --total "total funds"
where:
-C is used to specify the conf file needed by the command to point to the HLR database
-f is used to specify a fund identifier (fid) that will be used to address the fund
-d is used to specify a human readable reminder of what this fund is
-t is used to assign credits to the fund. You can use 0 as a default value
For example, in order to create the fund VO_2, a command such as:
edg-wl-dgas-hlr-addFund -C "/opt/edg/etc/edg-wl-dgas-hlr.conf" \
-f VO_2 -d "Virtual Organization 2 account" -t 0
should be issued.
To check if a fund has been correctly inserted, the following command can be used:
edg-wl-dgas-hlr-queryFund -C ../etc/edg-wl-dgas-hlr.conf
which should get an output like:
|VO_2|Virtual Organization 2 account|0|0|
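The query commands print each record as a line of fields delimited by the | character. When this output has to be checked from a script, a record can be split as in the following Python sketch. The function name parse_record is hypothetical, and the sketch assumes that field values never contain the | character themselves.

```python
def parse_record(line):
    """Split one '|field|field|...|' output line into a list of fields."""
    line = line.strip()
    if len(line) < 2 or not (line.startswith("|") and line.endswith("|")):
        raise ValueError("not a '|'-delimited record: %r" % line)
    # Drop the leading and trailing delimiter, then split on the rest.
    return line[1:-1].split("|")


if __name__ == "__main__":
    fields = parse_record("|VO_2|Virtual Organization 2 account|0|0|")
    # The first two fields are the fund identifier and its description.
    print(fields[0], "-", fields[1])
```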
5.6.2.2. Creating a Group account
The command to be used is:
edg-wl-dgas-hlr-addGroup [OPTIONS]
-i --interactive
-F --Force
-C --Conf <Configuration file name>
-g --gid "gid"
-d --descr "description"
-f --fid "fid"
-t --total "total funds"
-b --booked "booked funds" shouldn't be specified manually
-s --spent "spent funds" shouldn't be specified manually
where:
-C specifies the configuration file.
-g specifies a group identifier (gid) used to address this group
-d is a reminder of what the group is
-f is used to specify the fid of the fund to link this group with
-t is the amount of credits assigned to the group, you can use 0
For example, in order to create the group Group3, it is necessary to issue a command like:
edg-wl-dgas-hlr-addGroup -C "/opt/edg/etc/edg-wl-dgas-hlr.conf" \
-g Group3 -d "Users and resources of VO_2" -f VO_2 -t 0
To check if a Group has been correctly inserted, the following command can be used:
./edg-wl-dgas-hlr-queryGroup -C ../etc/edg-wl-dgas-hlr.conf
which should return an output like:
|Group3|Users and resources of VO_2|VO_2|0|0|0|
5.6.2.3. Creating a User account
The command to be used to create a user account is:
edg-wl-dgas-hlr-addUser [OPTIONS]
-i --interactive
-C --Conf <Configuration file name>
-F --Force "Force record insertion"
-u --uid "uid"
-e --email "email"
-d --descr "description"
-c --cert "cert subject"
-g --gid "gid"
-f --fid "fid"
-a --assigned "assigned funds"
-b --booked "booked funds" shouldn't be specified manually
-s --spent "spent funds" shouldn't be specified manually
where:
-C specifies the configuration file.
-u specifies an identifier for the user (uid) (nothing to do with the Unix uid !)
-d A reminder of who the user is e.g. his real name
-c The User X509 cert subject
-g gid of the user Group.
-f fid of the user fund.
-a amount of credits assigned to the user. Use 0.
For example, to create the user Ua, it is necessary to issue a command like:
edg-wl-dgas-hlr-addUser -C "/opt/edg/etc/edg-wl-dgas-hlr.conf" \
-u Ua -e user.a@userdomain -d "UserA desc" \
-c "UserCertSubject" -g Group3 -f VO_2 -a 0
To check if a User has been correctly inserted, the command:
edg-wl-dgas-hlr-queryUser -C "/opt/edg/etc/edg-wl-dgas-hlr.conf" -U
can be used.
If everything is fine, the output should look like:
|Ua|user.a@userdomain|UserA desc|UserCertSubject|Group3|VO_2|0|0|0|
5.6.2.4. Creating a Resource account
The command used to create a resource account is:
edg-wl-dgas-hlr-addResource [OPTIONS]
-i --interactive
-F --Force
-C --Conf <Configuration file name>
-r --rid "rid"
-e --email "email for contact person"
-d --descr "description"
-c --cert "cert subject"
-g --gid "gid"
-f --fid "fid"
where:
-C specifies the configuration file.
-r specifies an identifier (rid) used to address the resource
-e specifies an email address of a contact person for that resource
-d specifies a description of the resource
-c the CeID of the resource assigned to this account
-g the gid of the resource group
-f the fid of the resource fund
For example to create the resource Ra, it is necessary to issue a command like:
edg-wl-dgas-hlr-addResource -C "/opt/edg/etc/edg-wl-dgas-hlr.conf" \
-r Ra -e resource.administrator@domain -d "Res desc" \
-c "CeID" -g Group3 -f VO_2
To check if a Resource has been correctly inserted, the following command can be used:
edg-wl-dgas-hlr-queryResource -C "/opt/edg/etc/edg-wl-dgas-hlr.conf" -R
The output should be something like:
|Ra|resource.administrator@domain|Res desc|CeID|Group3|VO_2|0|
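Because groups refer to funds, and users and resources refer to both, the accounts have to be created in the order fund, group, user/resource. The Python sketch below assembles the four example commands of this section in that order, as argument lists that could be passed to a process launcher. The function name account_creation_commands is hypothetical; the options and values are those of the examples above, and nothing is executed.

```python
import shlex

HLR_CONF = "/opt/edg/etc/edg-wl-dgas-hlr.conf"


def account_creation_commands(conf=HLR_CONF):
    """Return the example account-creation commands in dependency order."""
    return [
        # 1. The fund comes first, since groups are linked to it.
        ["edg-wl-dgas-hlr-addFund", "-C", conf,
         "-f", "VO_2", "-d", "Virtual Organization 2 account", "-t", "0"],
        # 2. The group, linked to the fund through -f.
        ["edg-wl-dgas-hlr-addGroup", "-C", conf,
         "-g", "Group3", "-d", "Users and resources of VO_2",
         "-f", "VO_2", "-t", "0"],
        # 3. User and resource accounts, linked to group and fund.
        ["edg-wl-dgas-hlr-addUser", "-C", conf,
         "-u", "Ua", "-e", "user.a@userdomain", "-d", "UserA desc",
         "-c", "UserCertSubject", "-g", "Group3", "-f", "VO_2", "-a", "0"],
        ["edg-wl-dgas-hlr-addResource", "-C", conf,
         "-r", "Ra", "-e", "resource.administrator@domain", "-d", "Res desc",
         "-c", "CeID", "-g", "Group3", "-f", "VO_2"],
    ]


if __name__ == "__main__":
    for argv in account_creation_commands():
        print(shlex.join(argv))  # shlex.join requires Python 3.8 or later
```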
5.6.2.5. Deleting accounts
The commands:
edg-wl-dgas-hlr-delFund
edg-wl-dgas-hlr-delGroup
edg-wl-dgas-hlr-delResource
edg-wl-dgas-hlr-delUser
can be used to delete, respectively, fund, group, resource and user accounts.
5.6.3. Troubleshooting
Both the HLR and the PA server produce log files, as described in the previous sections.
These files can be used to debug abnormal behaviors of the services.
5.7. USER INTERFACE (JAVA GUI)
As already mentioned in section 4.5.3.2, the Java GUI encompasses three components:
- JobSubmitter
- JobMonitor
- JDLEditor
To start these components, it suffices to issue respectively the following commands:
$EDG_WL_LOCATION/sbin/edg-wl-ui-jobsubmitter.{sh,csh}
$EDG_WL_LOCATION/sbin/edg-wl-ui-jobmonitor.{sh,csh}
$EDG_WL_LOCATION/sbin/edg-wl-ui-jdleditor.{sh,csh}
where the scripts listed above are installed by the edg-wl-ui-gui-X.Y.Z-K.i486.rpm
rpm. It is important to note that from the JobSubmitter it is possible to start the other GUI
components.
With the exception of the JDLEditor that does not interact with any external WMS module,
before starting the GUI components, the X509_USER_KEY, X509_USER_CERT,
X509_USER_PROXY environment variables have to be set if user credentials are not stored
in the default locations. It is worth recalling that the GUI only needs a valid proxy certificate
for working correctly.
Lastly the environment variables JAVA_INSTALL_PATH, COG_INSTALL_PATH,
LOG4J_INSTALL_PATH and CLASSADJ_INSTALL_PATH that are used by the scripts to
set the java CLASSPATH need to be set if the corresponding packages are not installed in
the standard location (/usr/share/java).
5.7.1. Troubleshooting
The GUI produces log files recording its various events. These files can be used to debug
abnormal behaviors of the three components. The log-file names and the level of debugging
(i.e. the level of “detail”) can be changed by directly modifying the log4j configuration file
$EDG_WL_LOCATION/etc/edg_wl_ui_gui_log4j.properties.
The configuration file is written in Java properties format (i.e. <attribute name>=<attribute
value>), as in the examples reported below.
log4j allows logging information to be sent to multiple destinations, called appenders. The
most important kinds of appenders available are: console, files, GUI components and remote
socket servers. It is possible to log information in a synchronous or asynchronous manner.
When using an appender it is important to define and associate a layout with it. The layout
determines how the logging information is formatted for each logging request. The PatternLayout
allows the user to specify the output format according to her/his preferences or criteria.
Here is an example of configuration file:
# Setting root level
(1) log4j.rootLogger=DEBUG, appender1
# Setting appender1 as console appender
(2) log4j.appender.appender1=org.apache.log4j.ConsoleAppender
# Setting PatternLayout for appender1
(3) log4j.appender.appender1.layout=org.apache.log4j.PatternLayout
(4) log4j.appender.appender1.layout.ConversionPattern=%-4r [%t] %-5p %c - %m%n
Line (1) sets the logging level and the appender used to log information. In this
example the level is DEBUG, meaning that all logging requests are enabled, i.e. all logging
information will be written to the appender. This value can be changed to any of the
available logger levels (DEBUG, INFO, WARN, ERROR and FATAL). If the logging level is set
to ERROR, then only ERROR and FATAL logging requests are enabled. The appender used to log
is set to appender1. Line (2) defines appender1 as a ConsoleAppender, which writes log
information to the user console. Lines (3) and (4) define the layout used to write log
information to the appender: line (3) sets the layout to PatternLayout and line (4)
specifies the pattern (ConversionPattern) to use.
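The threshold rule for the root level can be illustrated with a short Python sketch. It is not part of log4j or of the GUI: the function name enabled_levels is hypothetical and the code only models the ordering of the five levels listed above.

```python
# Levels ordered from the most verbose to the most severe, as listed in the text.
LEVELS = ["DEBUG", "INFO", "WARN", "ERROR", "FATAL"]


def enabled_levels(root_level):
    """Return the levels whose logging requests are enabled for a root level."""
    return LEVELS[LEVELS.index(root_level):]


print(enabled_levels("ERROR"))
print(enabled_levels("DEBUG"))
```

With the root level set to ERROR only ERROR and FATAL requests pass, whereas DEBUG lets every request through.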
The following configuration file defines two appenders: a console appender and a file
appender to write logging information to a file.
# Setting root level
(1) log4j.rootLogger=ERROR, appender1, appender2
# Setting appender1 as console appender
(2) log4j.appender.appender1=org.apache.log4j.ConsoleAppender
# Setting PatternLayout for appender1
(3) log4j.appender.appender1.layout=org.apache.log4j.PatternLayout
(4) log4j.appender.appender1.layout.ConversionPattern=%-4r [%t] %-5p %c - %m%n
# Setting appender2 as external file appender
(5) log4j.appender.appender2=org.apache.log4j.RollingFileAppender
(6) log4j.appender.appender2.File=example.log
(7) log4j.appender.appender2.MaxFileSize=200KB
(8) log4j.appender.appender2.MaxBackupIndex=1
# Setting PatternLayout for appender2
(9) log4j.appender.appender2.layout=org.apache.log4j.PatternLayout
(10) log4j.appender.appender2.layout.ConversionPattern=%-4r [%t] %-5p %c - %m%n
Lines (1) to (4) define appender1 as in the first example. Lines (5) to (10) set appender2
as a file appender. More precisely: line (5) sets the appender as a rolling file (the oldest
lines are lost once the maximum size is reached); line (6) sets the name of the log file;
line (7) sets the maximum file size (200 KB here); line (8) tells log4j to keep one backup
file when roll-over occurs (the file will be named example.log.1). Lines (9) and (10) define
the pattern layout in the same way as done for appender1.
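The roll-over behaviour can be tried out without a Java environment. The sketch below is only an analogy: it uses the RotatingFileHandler of the Python standard library, which follows the same naming scheme (example.log and a single backup example.log.1), and not log4j itself. The helper name make_rolling_logger and the small size limit are assumptions chosen to make the roll-over happen quickly.

```python
import logging
import logging.handlers
import os
import tempfile


def make_rolling_logger(path, max_bytes, backups):
    """Build a logger that rolls `path` over to `path.1` near max_bytes."""
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups
    )
    # Rough Python counterpart of a "%d [%t] %-5p %c - %m%n" conversion pattern.
    handler.setFormatter(logging.Formatter(
        "%(asctime)s [%(threadName)s] %(levelname)-5s %(name)s - %(message)s"
    ))
    logger = logging.getLogger("rolling-demo")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    logger.addHandler(handler)
    return logger, handler


workdir = tempfile.mkdtemp()
log_path = os.path.join(workdir, "example.log")
logger, handler = make_rolling_logger(log_path, max_bytes=2048, backups=1)
for i in range(200):
    logger.error("message number %d", i)
handler.close()
print(sorted(os.listdir(workdir)))
```

Only the current file and one backup survive, whatever the number of messages written, which is the effect of keeping a single backup index.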
Here is the meaning of some of the fields used to define the pattern layout:
%r    number of milliseconds elapsed since the start of the application
%t    thread making the log request
%p    level of the log statement
%c    name of the logger associated with the log request
%m    message of the statement
%n    new line feed
%d    date and time
%F    java file name
%L    java file line number
The log4j configuration file for the GUI installed by the rpm
edg-wl-ui-gui-X.Y.Z-K_gcc3_2_2.i486.rpm is reported below. As can be seen, it sets the
appender as a rolling file (/var/tmp/edg_wl_ui_gui_log4j.log) and the logging level to FATAL.
Some alternatives are provided (commented out) in the file; the user can customize this
file according to her/his needs.
# Setting root level
# log4j.rootLogger=loggerLevel, appenderList
# Possible values for loggerLevel are: DEBUG, INFO, WARN, ERROR, FATAL
# appenderList is a list of appenders separated by a comma
log4j.rootLogger=FATAL, myAppender
# Setting myAppender as ConsoleAppender
# log4j.appender.myAppender=org.apache.log4j.ConsoleAppender
# Setting myAppender as external file appender
# log4j.appender.myAppender=org.apache.log4j.FileAppender
# log4j.appender.myAppender.File=/var/tmp/edg_wl_ui_gui_log4j.log
# Setting myAppender as external rolling file appender
log4j.appender.myAppender=org.apache.log4j.RollingFileAppender
log4j.appender.myAppender.File=/var/tmp/edg_wl_ui_gui_log4j.log
log4j.appender.myAppender.MaxFileSize=500KB
log4j.appender.myAppender.MaxBackupIndex=1
# Setting PatternLayout for myAppender
log4j.appender.myAppender.layout=org.apache.log4j.PatternLayout
# Log4j basic configurator conversion pattern
# log4j.appender.myAppender.layout.ConversionPattern=%-4r [%t] %-5p %c %m%n
# Use this conversion pattern to show java file name and line number
# log4j.appender.myAppender.layout.ConversionPattern=%d [%t] %-5p %c (%F:%L) %n \t %m%n
log4j.appender.myAppender.layout.ConversionPattern=%d [%t] %-5p %c %n \t %m%n
6. USER GUIDE
The User Interface is the WMS software module that gives the user access to the main
services made available by the components of the scheduling sub-layer; it is therefore
the entry point to the whole system.
Sections 6.1.1 and 6.1.2 provide a general description of the UI, dealing with the security
management, common behaviours, environment variables to be set etc. Section 6.1.3
describes the Job Submission User Interface commands in a Unix man-page style.
6.1. USER INTERFACE
The Job Submission UI is the module of the WMS allowing the user to access main services
made available by the components of the scheduling sub-layer. The user interaction with the
system is assured by means of a JDL and a command-driven user interface providing
commands to perform a certain set of basic operations. Main operations made possible by
the UI are:
- Submit a job for execution on a remote Computing Element, also encompassing:
  - automatic resource discovery and selection
  - staging of the application sandbox (input sandbox)
- Find the list of resources suitable to run a specific job
- Cancel one or more submitted jobs
- Retrieve the output files of a completed job (output sandbox)
- Retrieve and display bookkeeping information about submitted jobs
- Retrieve and display logging information about submitted jobs
- Retrieve checkpoint states of a submitted checkpointable job
- Start a local listener for an interactive job
The User Interface depends on two other Workload Management System components:
- the Network Server that provides support for the job control functionality
- the Logging and Bookkeeping Service that provides support for the job monitoring
functionality.
6.1.1. Security
For the DataGrid to be an effective framework for largely distributed computation, users, user
processes and grid services must work in a secure environment.
Due to this, all interactions between WMS components, especially those that are
network-separated, will be mutually authenticated: depending on the specific interaction, an entity
authenticates itself to the other peer using either its own credential or a delegated user
credential or both. For example when the User Interface passes a job to the Network Server,
the UI authenticates using a delegated user credential (a proxy certificate) whereas the NS
uses its own service credential. The same happens when the UI interacts with the Logging
and Bookkeeping service. The UI uses a delegated user credential to limit the risk of
compromising the original credential in the hands of the user.
The user or service identity and their public key are included in an X.509 certificate signed by
an EDG trusted Certification Authority (CA), whose purpose is to guarantee the association
between that public key and its owner.
Consequently, to take advantage of the UI commands the user has to possess a valid X.509
certificate on the submitting machine, consisting of two files: the certificate file and the
private key file. These two files are expected to be in the locations pointed to by
“$X509_USER_CERT” and “$X509_USER_KEY” respectively, or in
“$HOME/.globus/usercert.pem” and “$HOME/.globus/userkey.pem” if the X509 environment
variables are not set.
Actually the user certificate and private key files are not mandatory on the UI machine;
indeed they are only needed for the creation of the delegated user credentials through the
grid-proxy-init or edg-voms-proxy-init commands. It is for example possible to download the
proxy credentials from a trusted site and work with it without having the cert and key
available locally.
What is really needed is the user proxy credential: all UI commands, when started, check
for the existence and expiration date of a user proxy credential in the location pointed to by
“$X509_USER_PROXY” or in “/tmp/x509up_u<UID>” (<UID> is the user identifier in the
submitting machine OS) if the X509 environment variable is not set. If the proxy certificate
does not exist or has expired, the UI returns an error message to the user and exits.
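The lookup rules for the certificate, key and proxy files can be summarised in a few lines of Python. This is a minimal sketch of the rules stated above, not the code used by the UI commands; the function name credential_paths and its arguments are hypothetical, and the sketch does not check the existence or the expiration date of the files.

```python
def credential_paths(env, home, uid):
    """Return the cert, key and proxy locations implied by the X509 variables."""
    return {
        # The environment variable wins; otherwise the default location is used.
        "cert": env.get("X509_USER_CERT") or home + "/.globus/usercert.pem",
        "key": env.get("X509_USER_KEY") or home + "/.globus/userkey.pem",
        "proxy": env.get("X509_USER_PROXY") or "/tmp/x509up_u%d" % uid,
    }


defaults = credential_paths({}, "/home/alice", 501)
custom = credential_paths({"X509_USER_PROXY": "/data/alice/proxy.pem"}, "/home/alice", 501)
print(defaults["proxy"], custom["proxy"])
```

In a real session the arguments would typically be os.environ, the user's home directory and os.getuid().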
Once a job has been submitted by the UI, it passes through several components of the WMS
(e.g. the NS, the WM, the JC, CondorG etc.) before it completes its execution. At each step
operations that are related with the job could require authentication by a certificate. For
example during the scheduling phase, the RB needs to get some information about the user
who wants to schedule a job and the certificate of the user could be needed to access this
information. Similarly, a valid user’s certificate is needed by JC/CondorG to submit a job to
the CE. Moreover the JC has to be able to repeat this process, e.g. if the CE on which the
job is running crashes; therefore a valid user certificate is needed for the whole job
lifetime.
A job gets a valid proxy certificate when it is submitted by the UI to the NS. The validity of
such a certificate is usually set to 12 hours, hence problems could occur if the job spends
more time on the CE (in a queue or running) than the lifetime of its proxy certificate.
In order to submit long-running jobs, users can either generate proxy credentials with a
longer lifetime, using the --valid and --hours options of the grid-proxy-init and
edg-voms-proxy-init commands respectively, or (more safely) rely on the features of the
MyProxy package, as introduced in section 4.3.
The underlying idea is that the user registers in a MyProxy server a valid long-term certificate
proxy that will be used by the WMS to perform a periodic credential renewal for the submitted
job; in this way the user is no longer obliged to create very long lifetime proxies when
submitting jobs lasting for a great amount of time. A more detailed description of this
mechanism is provided in the following paragraph.
6.1.1.1. MyProxy
The MyProxy credential repository system consists of a server and a set of client tools that
can be used to delegate and retrieve credentials to and from a server. Normally, a user
would start by using the myproxy-init client program along with the permanent credentials
necessary to contact the server and delegate a set of proxy credentials to the server along
with authentication information and retrieval restrictions.
6.1.1.1.1. MyProxyClient
The set of binaries provided for the client is made of the following files:
myproxy-init
myproxy-info
myproxy-destroy
myproxy-get-delegation
The myproxy-init command allows you to create and send a delegated proxy to a myproxy server
for later retrieval; in order to launch it you have to ensure that you are able to execute the
grid-proxy-init GLOBUS command (i.e. the binary is visible from your $PATH environment and
the required cert files are either stored in the common path or specified with the X509
variables). You can use the command as follows (you will be asked for your PEM
passphrase):
myproxy-init -s <host name> -t <hours> -d -n
The myproxy-init command stores a user proxy in the repository specified by <host name>
(the -s option). The default lifetime of proxies retrieved from the repository will be set to
<hours> (the -t option) and no password authorization is permitted when fetching the proxy
from the repository (the -n option). The proxy is stored under a username equal to the
subject of your certificate (the -d option).
The myproxy-info command returns the remaining lifetime of the proxy in the repository along
with subject name of the proxy owner (in our case it will be the same as in your proxy
certificate). So if you want to get information about the stored proxies you can issue:
myproxy-info -s <host name> -d
where the -s and -d options have already been explained for the myproxy-init command.
The myproxy-destroy command simply destroys any existing proxy stored in the myproxy
server. You can use it as follows:
myproxy-destroy -s <host name> -d
where the -s and -d options have already been explained for the myproxy-init command.
The myproxy-get-delegation command is used to retrieve a delegated proxy from the
proxies stored in the myproxy server. You can use it as follows:
myproxy-get-delegation -s <host name> -d -t <hours> \
-o <output file> -a <user proxy>
You should end up with a retrieved proxy in <output file>, which is valid for <hours> hours.
It is worth noting that the environment variable MYPROXY_SERVER can be set to tell all
these programs the hostname where the myproxy server is running.
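For scripted use, the myproxy-init invocation described above can be assembled as an argument list. The sketch below is only illustrative: the function name myproxy_init_argv and the host name are hypothetical, and only the options already discussed in this section (-s, -t, -d, -n) are used. When no host is given the -s option is omitted, on the assumption that MYPROXY_SERVER is set in the environment.

```python
def myproxy_init_argv(hours, host=None):
    """Build the myproxy-init argument list described in this section."""
    argv = ["myproxy-init"]
    if host is not None:
        argv += ["-s", host]  # omitted when MYPROXY_SERVER is relied upon
    argv += ["-t", str(hours), "-d", "-n"]
    return argv


# "myproxy.example.org" is a placeholder host name, not a real server.
print(" ".join(myproxy_init_argv(12, host="myproxy.example.org")))
print(" ".join(myproxy_init_argv(12)))
```

The resulting list can be passed to subprocess.run, keeping in mind that the command prompts interactively for the PEM passphrase.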
6.1.2. Common behaviours
A User Interface installation mainly consists of four directories (bin, lib, etc and share) that
are created under the UI installation path, which is usually pointed to by the
EDG_WL_LOCATION environment variable. If this variable is not set or its value is not
correct, the default value is assumed to be “/opt/edg”.
bin contains the command executables; it is therefore recommended to add it to the user
PATH environment variable so that UI commands can be used from any location. lib
contains the shared libraries (wrappers of the NS/LB APIs) implementing functionalities for
accessing the NS and LB services. Moreover lib contains a subdirectory named python
containing some python modules also needed for accessing the underlying services.
etc is the UI configuration area: it contains the file containing the mapping between error
codes and error messages (edg_wl_ui_cmd_err.conf), the file containing the detailed
description of each command (edg_wl_ui_cmd_help.conf) and the actual configuration file
(edg_wl_ui_cmd_var.conf). The latter file is the only one that may need to be edited and
tailored according to the user/platform characteristics and needs. It contains the following
information, which is read by the commands and influences their behaviour (see section 4.5.3
for details):
- default location of the local storage areas for the Output sandbox files,
- default location for the UI log files,
- default values for the JDL mandatory attributes,
- default values for timeouts when logging events to the LB,
- default logging destination,
- user’s default VO,
- default level of information displayed by the monitoring commands
Inside etc there is a directory for each supported EDG Virtual Organisation and named as
the VO (e.g. for atlas we will have etc/atlas/) that contains a vo-specific configuration file
edg_wl_ui.conf specifying the list of Network Servers and LBs accessible for the given VO.
When started, UI commands first check if the EDG_WL_LOCATION is set and then search
for the etc directory containing its configuration files in the following locations, in order of
precedence: “$EDG_WL_LOCATION”, “/opt/edg“, “/“, “/usr/local“. If none of the locations
contains needed files an error is returned to the user.
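The search order can be written down as a short Python sketch. It is not the code of the UI commands: the function name find_ui_etc_dir is hypothetical, and the sketch assumes that the presence of edg_wl_ui_cmd_var.conf is enough to recognise a valid etc directory.

```python
import os


def find_ui_etc_dir(env, exists=os.path.exists):
    """Return the first etc directory holding the UI configuration file."""
    prefixes = []
    if env.get("EDG_WL_LOCATION"):
        prefixes.append(env["EDG_WL_LOCATION"])
    prefixes += ["/opt/edg", "/", "/usr/local"]  # order of precedence from the text
    for prefix in prefixes:
        etc_dir = prefix.rstrip("/") + "/etc"
        if exists(etc_dir + "/edg_wl_ui_cmd_var.conf"):
            return etc_dir
    raise FileNotFoundError("UI configuration files not found")


# A fake file system: only /usr/local/etc holds the configuration file.
fake_files = {"/usr/local/etc/edg_wl_ui_cmd_var.conf"}
found = find_ui_etc_dir({"EDG_WL_LOCATION": "/home/alice/edg"}, exists=fake_files.__contains__)
print(found)
```

Passing the existence check as an argument keeps the sketch testable without touching the real file system.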
Since several users on the same machine can use a single installation of the UI, people
concurrently issuing UI commands share the same configuration files. However, users (or
groups of users) with particular needs can “customise” the UI configuration
through the --config and --config-vo options supported by each UI command.
Indeed every command launched specifying “--config file_path” reads its configuration
settings from the file pointed to by “file_path” instead of the default configuration file. The
same happens for the vo-specific configuration file if the command is started specifying
“--config-vo vo_file_path”. Hence the user only needs to create such files according to her/his
needs and to use the appropriate options to work under “private” settings.
Moreover, if the user wants to make this change permanent, avoiding the use of the --config
option for each issued command, she/he can set the environment variable
EDG_WL_UI_CONFIG_VAR to point to the non-standard path of the configuration file.
If that variable is set, commands will read their settings from the file
“$EDG_WL_UI_CONFIG_VAR”. In any case the --config option takes precedence over all other
settings.
Exactly the same applies to the EDG_WL_UI_CONFIG_VO environment variable and the
--config-vo option.
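The precedence among the --config option, the EDG_WL_UI_CONFIG_VAR variable and the default file can be expressed as a small Python sketch. The function name resolve_config_file and the example paths are hypothetical; the same logic would apply to --config-vo and EDG_WL_UI_CONFIG_VO.

```python
def resolve_config_file(cli_config, env, default_path):
    """Pick the configuration file: --config, then the variable, then the default."""
    if cli_config:
        return cli_config
    if env.get("EDG_WL_UI_CONFIG_VAR"):
        return env["EDG_WL_UI_CONFIG_VAR"]
    return default_path


DEFAULT = "/opt/edg/etc/edg_wl_ui_cmd_var.conf"
env = {"EDG_WL_UI_CONFIG_VAR": "/home/alice/ui_var.conf"}  # illustrative path
print(resolve_config_file("/tmp/private.conf", env, DEFAULT))
print(resolve_config_file(None, env, DEFAULT))
print(resolve_config_file(None, {}, DEFAULT))
```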
It is important to note that, since the job identifier edg_jobId (see section 6.1.3,
edg-job-submit) implicitly holds the information about the LB that is managing the
corresponding job, all the commands taking the edg_jobId as input parameter do not take into
account the LB addresses listed in the configuration file to perform the requested operation,
even if the --config-vo option has been specified.
Hereafter are listed the options that are common to all UI commands:
--config file_path
--noint
--debug
--logfile file_path
--version
--help
The --noint option skips all interactive questions to the user and goes ahead in the command
execution. All warning messages and errors (if any) are written to the file
<command_name>_<UID>_<PID>_<date_time>.log in the location specified in the
configuration file instead of the standard output. It is important to note that when --noint is
specified some checks on “dangerous actions” are skipped. For example if jobs cancellation
is requested with this option, this action will be performed without requiring any confirmation
to the user. The same applies if the command output will overwrite an existing file, so it is
recommended to use the --noint option in a safe context.
The --debug option is mainly intended for testing and debugging purposes; it makes
the commands print additional information while running. Every time an external API function
call is encountered during the command execution, the values of the parameters passed to the
API are printed to the user. The info messages are displayed on the standard output and are
also written, together with possible errors and warnings, to
<command_name>_<UID>_<PID>_<date_time>.log.
If the --noint option is specified together with the --debug option, the debug messages will
not be printed on standard output.
The --logfile <file_path> option allows relocation of the command log file to the location
pointed to by file_path.
The --version and --help options respectively make the commands display the UI current
version and the command usage.
Two further options that are common to almost all commands are --input and --output. The
latter one makes the commands redirect the outcome to the file specified as option argument
whilst the former reads a list of input items from the file given as option argument. The only
exception is the edg-job-list-match command that does not have the --input option.
6.1.2.1. The --input option
For all commands, the file given as argument to the --input option shall contain a list of job
identifiers in the following format: one edg_jobId per line, with comment lines beginning with
a “#” or a “*” character. If the input file contains only one edg_jobId (see the description of
the edg-job-submit command later in this document for details about the edg_jobId format), then
the request is directly submitted taking the edg_jobId as input, otherwise a menu is displayed
to the user listing all the contained items, i.e. something like:
---------------------------------------------------------------------------
1 : https://ibm139.cnaf.infn.it:9000/ZU9yOC7AP7AOEhMAHirG3w
2 : https://ibm139.cnaf.infn.it:9000/ZU9yOC767gJOEhMAHirG3w
3 : https://ibm135.cnaf.infn.it:9000/ZU9yOC7AP7A55TREAHirG3w
4 : https://grid012f.cnaf.infn.it:7846/ZUHY6707AP7AOEhMAHirG3w
5 : https://grid012f.cnaf.infn.it:9000/Cde341P7AOEhMAHirG3w
6 : https://ibm139.cnaf.infn.it:9000/BgT8T6H_L92FsKq3OeTWOw
7 : https://ibm139.cnaf.infn.it:9000/lYlPBQez7fiXx9qq7BEdyw
8 : https://ibm139.cnaf.infn.it:9000/_f0Bm_s6UdFPZIEjSglipg
a : all
q : quit
---------------------------------------------------------------------------
Choose one or more edg_jobId(s) in the list - [1-10]all:
The user can choose one or more jobs from the list entering the corresponding numbers.
Single jobs can be selected specifying the numbers associated to the job identifiers
separated by commas. Ranges can also be selected specifying ends separated by a dash
and it is worth mentioning that it is possible to select at the same time ranges and single
jobs. E.g.:
- 2 makes the command take the second listed edg_jobId as input
- 1,4 makes the command take the first and the fourth listed edg_jobIds as input
- 2-5 makes the command take listed edg_jobIds from 2 to 5 (ends included) as input
- 1,3-5,8 selects the first job id in the list, the ids from the third to the fifth (ends
  included) and finally the eighth one
- all makes the command take all listed edg_jobIds as input
- q makes the command quit
The default value for the choice is all. If the --input option is used together with --noint,
then all edg_jobIds contained in the input file are taken into account by the command.
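The input file format and the selection syntax described in this section can be sketched in Python as follows. This is an illustration of the rules, not the parser used by the UI commands; the function names read_job_ids and parse_selection are hypothetical, and treating an empty answer as the default choice all is an assumption about how the default is applied.

```python
def read_job_ids(lines):
    """Keep one edg_jobId per line, skipping blanks and '#' or '*' comments."""
    ids = []
    for raw in lines:
        line = raw.strip()
        if not line or line[0] in "#*":
            continue
        ids.append(line)
    return ids


def parse_selection(answer, count):
    """Translate a menu answer into sorted 1-based indexes; None means quit."""
    answer = answer.strip().lower()
    if answer == "q":
        return None
    if answer in ("", "a", "all"):  # empty answer taken as the default "all"
        return list(range(1, count + 1))
    chosen = set()
    for part in answer.split(","):
        if "-" in part:
            first, last = (int(x) for x in part.split("-"))
        else:
            first = last = int(part)
        if not 1 <= first <= last <= count:
            raise ValueError("selection out of range: %r" % part)
        chosen.update(range(first, last + 1))
    return sorted(chosen)


job_ids = read_job_ids([
    "# jobs submitted on Monday",
    "https://ibm139.cnaf.infn.it:9000/ZU9yOC7AP7AOEhMAHirG3w",
    "",
    "* second job",
    "https://ibm139.cnaf.infn.it:9000/ZU9yOC767gJOEhMAHirG3w",
])
print(len(job_ids), parse_selection("1,3-5,8", 8))
```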
There are some commands whose --input behaviour differs from the one just described. One
of them is edg-job-submit. First of all, the input file contains in this case CEIds instead of
edg_jobIds; moreover only one CE at a time can be the target of a submission, hence the
user is allowed to choose one and only one CEId. The default value for the choice is “1”, i.e.
the first CEId in the list. This is also the choice automatically made by the command when the
--input option is used together with the --noint one.
The other commands are edg-job-attach and edg-job-get-chkpt, whose --input option
allows the selection of one (just one) of the edg_jobIds contained in the input file.
6.1.3. Commands description
In this section we describe syntax and behavior of the commands made available by the UI
to allow job submission, monitoring and control.
In the commands synopsis the mandatory arguments are shown between angle brackets
(<arg>) whilst the optional ones are shown between square brackets ([arg]).
6.1.3.1. edg-job-submit
Allows the user to submit a job for execution on remote resources in a grid.
SYNOPSIS
edg-job-submit [options] <jdl_file>
Options:
--help
--version
--vo <vo_name>
--input, -i <file_path>
--resource, -r <ce_id>
--chkpt <file_path>
--nolisten
--nogui
--nomsg
--config, -c <file_path>
--config-vo <file_path>
--output, -o <file_path>
--noint
--debug
--logfile <file_path>
DESCRIPTION
edg-job-submit is the command for submitting jobs to the DataGrid and hence allows the
user to run a job at one or several remote resources. edg-job-submit requires as input a job
description file in which job characteristics and requirements are expressed by means of
Condor class-ad-like expressions. While the order of the other arguments does not matter,
the job description file has to be the last argument of this command.
The job description file given in input to this command is syntactically checked and default
values are assigned to some of the mandatory attributes that were not provided, in order to
create a meaningful class-ad. The resulting job-ad is sent to the NS, which forwards it to the
WM; the WM, via the RB/Matchmaker, finds the resource best matching the job (match-making)
and the JC then submits the job to it. The match-making algorithm is described in detail in
Annex 7.7.
Upon successful completion this command returns to the user the submitted job identifier
edg_jobId (a string that identifies unambiguously the job in the whole EDG), generated by the
User Interface, that can be later used as a handle to perform monitor and control operations
on the job (e.g. see edg-job-status described later in this document). The format of the
edg_jobId is as follows:
https://Lbserver_address[:port]/unique_string
The unique_string is an md5 string computed taking into account the following information:
- IP of the User Interface machine,
- timestamp,
- process ID (more UI instances may occur on the same machine),
- sequence or just random number (if the User Interface submits jobs in batches and
  more than one per second can be submitted).
The final md5 sum is encoded using modified Base64 encoding (“:” is used instead of “/”)
ensuring reasonable uniqueness and compactness of job IDs.
The structure of the edg_jobId, which may appear somewhat complex and not easily
readable, has been conceived to ensure uniqueness and at the same time to contain the
information needed by the components of the WMS to fulfil user requests.
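Since the edg_jobId has the form of an https URL, the LB server address it carries can be extracted with a standard URL parser. The following Python sketch only illustrates the format given above; the function name split_job_id is hypothetical and the job identifier is one of the sample values shown earlier in this guide.

```python
from urllib.parse import urlsplit


def split_job_id(edg_job_id):
    """Split an edg_jobId into (LB host, LB port or None, unique string)."""
    parts = urlsplit(edg_job_id)
    if parts.scheme != "https" or not parts.hostname or len(parts.path) < 2:
        raise ValueError("not a valid edg_jobId: %r" % edg_job_id)
    return parts.hostname, parts.port, parts.path[1:]


host, port, unique = split_job_id(
    "https://ibm139.cnaf.infn.it:9000/ZU9yOC7AP7AOEhMAHirG3w"
)
print(host, port, unique)
```

The port is optional in the format, so the second element is None when the identifier does not specify one.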
The --vo option allows the user to specify the Virtual Organisation she/he is currently working
for, in case she/he is working with non-VOMS credentials. Indeed, if the user proxy
credentials currently available on the UI contain VOMS extensions specifying one or more
VOs, then the default VO from the proxy credentials has precedence over all other possible
choices and is taken as the current working VO.
If the --vo option is not used (and the proxy credentials do not contain VOMS extensions),
then the VirtualOrganisation attribute in the JDL is considered. If this attribute has not been
specified in the JDL, then the default VO specified in the
$EDG_WL_LOCATION/etc/edg_wl_ui_cmd_var.conf configuration file (DefaultVo field) is
considered. If no VO can be determined in any of these ways, an error is returned to the user.
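The order in which the working VO is determined can be summarised with a Python sketch. The function name resolve_vo and its arguments are hypothetical; each argument stands for the value found in the corresponding source, or None when that source does not provide a VO.

```python
def resolve_vo(proxy_default_vo, cli_vo, jdl_vo, config_default_vo):
    """Return (vo, source) following the precedence described in the text."""
    candidates = (
        ("proxy VOMS extensions", proxy_default_vo),
        ("--vo option", cli_vo),
        ("JDL VirtualOrganisation", jdl_vo),
        ("DefaultVo in edg_wl_ui_cmd_var.conf", config_default_vo),
    )
    for source, value in candidates:
        if value:
            return value, source
    raise ValueError("no Virtual Organisation could be determined")


print(resolve_vo("atlas", "cms", None, "alice"))
print(resolve_vo(None, None, None, "alice"))
```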
The --resource option can be used to target the job submission to a specific known resource
identified by the provided Computing Element identifier ce_id (returned by
edg-job-list-match, described later in this document). The CE identifier is a string published
in the IS (the GlueCEUniqueID field in the Glue schema) that uniquely identifies a resource
belonging to the Grid. The admitted format for the CEId is:
<full-hostname>:<port-number>/jobmanager-<service>-<queue-name>
where <service> is for example lsf, pbs, bqs, condor but can also be a different string as it is
freely set by the site administrator when the queue is set-up.
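A CEId can be checked against this format with a regular expression. The Python sketch below is not the check performed by the UI: the pattern CE_ID_RE and the function parse_ce_id are hypothetical, the sample identifier is made up for illustration, and the pattern assumes that the <service> part contains no hyphen, so that the first hyphen after it separates the service from the queue name.

```python
import re

# Assumption: <service> contains no '-', so the first '-' after it starts the queue name.
CE_ID_RE = re.compile(
    r"^(?P<host>[^\s:/]+):(?P<port>\d+)/jobmanager-(?P<service>[^-\s]+)-(?P<queue>\S+)$"
)


def parse_ce_id(ce_id):
    """Return the CEId components as a dict, or None if the format does not match."""
    match = CE_ID_RE.match(ce_id)
    if match is None:
        return None
    fields = match.groupdict()
    fields["port"] = int(fields["port"])
    return fields


# Illustrative CEId: the host and queue names are placeholders.
print(parse_ce_id("ce01.example.org:2119/jobmanager-pbs-short"))
print(parse_ce_id("ce01.example.org/jobmanager-pbs-short"))
```

The second call returns None because the port number is missing.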
When the --resource option is specified, the WMS skips completely the match making
process and directly submits the job to the requested CE. It is important to note that in this
case the “.BrokerInfo” file is not generated even if data requirements have been specified in
the JDL, so jobs submitted using this option should not rely on the .BrokerInfo file information
when running on the CE. The “.BrokerInfo” file is a file generated by the RB/Matchmaker
during matchmaking and contains information about the location where input data specified
in the JDL are physically stored, the SEs that are “close” to the CE chosen for submitting the
job etc. It is shipped within the InputSandbox to the CE where the job is going to run so that it
can be used at run-time to get information (through the appropriate API) for accessing data.
Details about the “.BrokerInfo” file and the BrokerInfo API can be found in [R1].
A way for performing direct submission to a given CE and at the same time having the
“.BrokerInfo” file generated by RB and shipped to the CE is to not use the --resource option
and specify the following requirements in the JDL:
Requirements = other.GlueCEUniqueID == <Ce_identifier>;
(e.g. Requirements = other.GlueCEUniqueID == "lxde01.pd.infn.it:2119/jobmanager-lsf-grid01";)
It is also possible to specify the target CE to which the job is submitted using the --input
option. With the --input option an input_file must be supplied containing a list of target CE
ids. In this case the edg-job-submit command parses the input_file and displays on the
standard output the list of CE Ids written in the input_file. The user is then asked to choose
one CEId among the listed ones. The command will then behave exactly as already explained for
the --resource option. The basic idea of this option is to use as input_file the output file
generated by the edg-job-list-match command when used with the --output option (see
edg-job-list-match), which contains the list of CE Ids (if any) matching the requirements
specified in the jobad.jdl file. An example of a possible sequence of commands is:
>$ edg-job-list-match --output CEList.out jobad.jdl
>$ edg-job-submit --input CEList.out jobad.jdl
If CEList.out contains more than one CEId then the user is prompted for choosing one Id
from the list.
It is possible to redirect the returned edg_jobId to an output file using the --output option. If
the file already exists, a check is performed: if the file was previously created by the
command edg-job-submit (i.e. it contains a well defined header), the returned edg_jobId is
appended to the existing file every time the command is launched. If the file was not created
by the command edg-job-submit, the user is prompted to choose whether or not to overwrite
the file. If the answer is no the command aborts.
The edg-job-submit command has a particular behaviour when the job description file
contains the InputSandbox attribute, whose value is a list of file paths on the UI machine local
disk. The purpose of the InputSandbox attribute is to stage, from the UI to
the CE, files that are needed for the execution.
As an example, suppose a job needs for its execution a set of small
files that are available on the submitting machine, and that for
performance reasons it is preferable not to go through the WP2 data transfer services to
stage these files on the executing node. The user can then use the InputSandbox
attribute to specify the files that have to be staged from the submitting machine to the
executing CE. All of them are transferred at job submission time together with the job
class-ad to the NS, which stores them temporarily on its local disk. The JobWrapper then
stages these files on the executing node. The files transferred to
the “RB node” should be small, since filling up the RB node local storage means that no more
jobs of this type can be submitted (see section 4.2.4.3).
This mechanism can also be used to stage a job executable available locally on the UI
machine to the executing CE. In this case the user has to include the file in the
InputSandbox list (specifying its absolute path in the file system of the UI machine) and
specify only the file name as the value of the Executable attribute. Conversely, if the
executable is already available in the file system of the executing machine, the user has to
specify as Executable an absolute path name for this file (if necessary using environment
variables). The same applies to the standard input file, which is specified
through the StdInput JDL attribute.
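The two cases described above can be sketched with the following JDL fragment. It is only an illustration: the paths, the file names and the APP_HOME environment variable are hypothetical and do not come from this guide.

```jdl
// Case 1: executable and standard input staged from the UI machine.
// Both files are listed in the InputSandbox with their absolute path on the UI,
// while Executable and StdInput carry only the file name.
Executable   = "myscript.sh";
StdInput     = "input.txt";
InputSandbox = {"/home/mrossi/jobs/myscript.sh", "/home/mrossi/jobs/input.txt"};

// Case 2 (alternative to the lines above): executable already installed on the
// executing machine. Executable is an absolute path, here built from a
// hypothetical environment variable, and nothing is staged for it.
Executable   = "$(APP_HOME)/bin/analysis";
```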
Since the InputSandbox expression can consist of a large number of file names,
wildcards and environment variables may be used to specify the value of this attribute.
Syntax and allowed wildcards are described in Annex 7.6.
It is important to note that the gridftp protocol (the protocol used for staging the InputSandbox
files) in general does not preserve the x flag. The script specified as Executable in the
JDL (on which chmod +x is done automatically by the WP1 JobWrapper) should therefore perform a
chmod +x on every file transferred within the InputSandbox of the job that needs execution
permission.
For the standard output and error of the job the user shall instead always specify just file
names (without any directory path) through the StdOutput and StdError JDL attributes. To
have them staged back to the UI machine it suffices to list them in the OutputSandbox and,
after job completion, use the edg-job-get-output command described later in this document.
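A minimal sketch of how the standard streams can be declared and listed for retrieval is given below. The script and file names are hypothetical.

```jdl
Executable    = "myscript.sh";
InputSandbox  = {"/home/mrossi/jobs/myscript.sh"};
// Standard output and error: file names only, without any directory path.
StdOutput     = "job.out";
StdError      = "job.err";
// Listing the two files here makes them retrievable with edg-job-get-output.
OutputSandbox = {"job.out", "job.err"};
```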
The list of data specification JDL attributes is completed by the InputData and OutputData
attributes.
InputData refers to data used as input by the job that are not subject to staging and are
stored in one or more storage elements and published in replica catalogues. When the user
specifies the InputData attribute, she/he also has to provide the protocol that the
application is able to “speak” for accessing data (DataAccessProtocol attribute). The
InputData attribute should contain a list of Logical File Names (LFN) and/or Grid Unique
Identifiers (GUID).
There is no need to specify the Replica Location Service to be contacted for resolving the
logical file names and GUIDs to storage file names, as it is determined automatically by the
WP2 software from the VO the user belongs to. This information is provided in the
VirtualOrganisation JDL attribute (filled by the UI).
It is worth noting that the use of the WP2 getAccessCost (see 7.7.3) in the ranking phase,
i.e. ranking CEs according to the cost of accessing data, can be triggered through the JDL
by setting the rank as follows:
Rank = other.DataAccessCost;
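Put together, the data-related attributes described above might appear in a job description as in the following sketch. The logical file names are hypothetical placeholders.

```jdl
// Input data referenced by logical file name (hypothetical names).
InputData          = {"lfn:my-dataset-001", "lfn:my-dataset-002"};
// Protocol the application is able to use for accessing the data.
DataAccessProtocol = "gridftp";
// Rank the matching CEs according to the cost of accessing the input data.
Rank               = other.DataAccessCost;
```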
The OutputData attribute allows instead the user to ask for the automatic upload and
registration of datasets produced by the job on the WN.
Through this attribute it is possible to indicate for each output file the LFN to be used for
registration and the SE on which the file has to be uploaded.
Both LFN and SE are optional in the sense that if no LFN is indicated then it is assigned
automatically by the WP2 services (RM) and if no SE is indicated, the close SE is
considered.
OutputData is a list of classads, where each classad indicates the name of the file to be
uploaded, the logical file to be used and the SE where the file has to be copied. E.g.:
OutputData = {
    [
        OutputFile = "dataset_1.out";
        StorageElement = "se1.cnaf.infn.it";
        LogicalFileName = "lfn:LFN_1"
    ],
    [
        OutputFile = "dataset_2.out";
        StorageElement = "se2.pd.infn.it";
        LogicalFileName = "lfn:LFN_2"
    ],
    [
        OutputFile = "dataset_3.out";
        StorageElement = "se3.cesnet.cz";
        LogicalFileName = "lfn:LFN_3"
    ]
};
If the attribute OutputData is found in the JDL then the JobWrapper at the end of the job calls
the WP2 “copy And Register” service that copies the file from the WN onto the specified SE
and registers it with the given LFN. As usual, logical file names have to be prefixed with the
string “lfn:”.
If the specified LFN is already in use, WP2 RM registers the file with a newly generated
identifier GUID (Grid Unique Identifier).
During this process the JobWrapper creates a file (named
“DSUpload_<unique_jobid_string>.out”) that is put automatically in the OutputSandbox
attribute list by the UI and can then be retrieved by the user. This file contains the results of
the upload and registration process in the following format:
<FILE_NAME> <LFN | ERROR>
e.g. in our case we could have:
dataset_1.out LFN_1
dataset_2.out <GUID2>
dataset_3.out <error code returned by RM>
meaning that dataset_1.out was uploaded successfully and registered as LFN_1,
dataset_2.out was uploaded successfully but with name <GUID2> (assigned by the ERM)
since LFN_2 was already in use and upload of dataset_3.out failed for the reason specified
by the reported error message.
It is worth noting that the StorageElement attributes of the OutputData list are not taken into
account by the RB for the matchmaking, so the job could run on a CE that is not close
to the specified SEs. For this reason it is suggested (unless the user has particular needs) to omit
the StorageElement specification so that the close SEs are automatically taken into account
for the datasets upload.
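Following this suggestion, an OutputData list without StorageElement entries could look like the sketch below. It relies on the statement above that both the LFN and the SE are optional. The file names are the ones used in the previous example.

```jdl
// First file: explicit LFN, upload to the close SE.
// Second file: neither LFN nor SE given, so the LFN is assigned by the
// WP2 services and the close SE is used.
OutputData = {
    [
        OutputFile = "dataset_1.out";
        LogicalFileName = "lfn:LFN_1"
    ],
    [
        OutputFile = "dataset_2.out"
    ]
};
```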
The Arguments attribute in the JDL allows the user to specify all the command line
arguments needed to start the job. They have to be specified as a single string, e.g. the job
sum that is started with:
$ sum N1 N2 -out result.out
is described by:
Executable = "sum";
Arguments = "N1 N2 -out result.out";
If you want to specify a quoted string inside the Arguments then you have to escape quotes
with the \ character. E.g. when describing a job like:
$ grep -i "my name" *.txt
you will have to specify:
Executable = "/bin/grep";
Arguments = "-i \"my name\" *.txt";
Analogously, if the job takes as argument a string containing a special character (e.g. the job
is the tail command issued on a file whose name contains the ampersand character, say
file1&file2), since on the shell line you would have to write:
$ tail -f file1\&file2
in the JDL you’ll have to write:
Executable = "/usr/bin/tail";
Arguments = "-f file1\\\&file2";
i.e. a triple \ before each special character.
In general, special characters such as &, |, >, < are only allowed if specified inside a
quoted string or preceded by triple \.
The character “`” cannot be specified in the Arguments attribute of the JDL.
The RetryCount attribute allows setting the number of submission retries for a job upon
failure due to some grid component (i.e. not to the job itself). RetryCount has to be a
non-negative integer, and the actual number of submission retries for a job is the minimum
of RetryCount itself and the value of the MaxRetryCount parameter in the WM
configuration file (see section 4.2.2.3). Setting RetryCount to 0 disables job
resubmission.
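As a worked illustration of the rule above (the MaxRetryCount value used here is a hypothetical site setting, not a documented default):

```jdl
// Ask for up to 3 resubmissions after failures caused by grid components.
// If the WM configuration sets MaxRetryCount to 2, the job is actually
// resubmitted at most min(3, 2) = 2 times.
RetryCount = 3;
```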
It is important to recall here that the safest way to submit long-running jobs is to use the
proxy renewal feature provided by the WMS. To do this the user should run the myproxy-init
command (see section 6.1.1.1) before edg-job-submit. The myproxy-init command
registers in a MyProxy server a valid long-term certificate proxy that will be used by the
WMS to perform a periodic credential renewal for the submitted job.
When using the myproxy-init command the user has to specify, either through the -s option
or the MYPROXY_SERVER environment variable, the host name of the MyProxy server
where the certificate proxy is to be stored.
To trigger the proxy renewal mechanism, the same MyProxy server address has to be
specified in the JDL through the MyProxyServer attribute (this can also be made a default
behaviour through the configuration, see 4.5.3.1).
An example of the JDL setting is provided hereafter:
MyProxyServer = "skurut.cesnet.cz";
Note that the port number must not be provided.
Interactive jobs are specified by setting the JDL JobType attribute to “Interactive”. When an
interactive job is submitted, the edg-job-submit command starts a grid console shadow
process in the background that listens on a port for the job standard streams. Moreover the
edg-job-submit command opens a new window where the incoming job streams are
forwarded. The port on which the shadow process listens is assigned by the OS, but can be
forced through the ListenerPort attribute in the JDL.
As the command in this case opens an X window, the user should make sure that the DISPLAY
environment variable is correctly set, that an X server is running on the local machine and,
if she/he is connected to the UI node from a remote machine (e.g. with ssh), that secure X11
tunneling is enabled.
If this is not possible, the user can specify the --nogui option that makes the command
provide a simple standard non-graphical interaction with the running job.
Another option that is reserved for interactive jobs is --nolisten: it makes the command
forward the job standard streams coming from the WN to named pipes on the UI machine
whose names are returned to the user together with the OS id of the listener process. This
allows the user to interact with the job through her/his own tools. It is important to note that
when this option is specified, the UI no longer has control over the launched listener process,
which therefore has to be killed by the user (through the returned process id) when the job is
finished.
For interactive jobs the UI automatically requires outbound IP connectivity on the
WN for the job, adding (in AND to the user defined expression) the
other.GlueHostNetworkAdapterOutboundIP attribute to the JDL Requirements expression.
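A job description for an interactive job might therefore look like the following sketch. The script name and the port number are hypothetical, and the exact form of the expression produced by the UI may differ from the one shown in the comment.

```jdl
JobType      = "Interactive";
Executable   = "interactive-app.sh";
InputSandbox = {"/home/mrossi/jobs/interactive-app.sh"};
// Optional: force the port on which the shadow process listens
// (hypothetical port number; by default the OS assigns one).
ListenerPort = 44000;
// User-defined requirements. As described above, the UI puts this expression
// in AND with other.GlueHostNetworkAdapterOutboundIP, i.e. roughly:
//   (other.GlueCEStateStatus == "Production") &&
//       other.GlueHostNetworkAdapterOutboundIP
Requirements = other.GlueCEStateStatus == "Production";
```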
Checkpointable jobs are specified by setting the JDL JobType attribute to “Checkpointable”.
When a checkpointable job is submitted the user can specify the number (or list) of steps into
which the job can be logically decomposed and the step to be considered as the initial one.
This is done by setting the JDL attributes JobSteps and CurrentStep respectively.
CurrentStep is a mandatory attribute and, if not provided by the user, is set automatically to
0 by the UI.
The --chkpt option allows the submission of a checkpointable job specifying as input a
checkpoint state generated by a previously submitted job. This option makes the submitted
job start running from the checkpoint state given in input and not from the very beginning.
The initial checkpoint states to be used with this option can be retrieved by means of the
edg-job-get-chkpt command (see 6.1.3.8). A checkpoint state is a JDL file as described in
[R3].
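The attributes involved can be sketched as follows. The executable name and the number of steps are hypothetical, and only the numeric form of JobSteps is shown.

```jdl
JobType      = "Checkpointable";
Executable   = "chkpt-app.sh";
InputSandbox = {"/home/mrossi/jobs/chkpt-app.sh"};
// The job is logically decomposed into 10 steps (hypothetical value).
JobSteps     = 10;
// Initial step; 0 is also the value set by the UI when the attribute is omitted.
CurrentStep  = 0;
```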
MPI jobs are specified by setting the JDL JobType attribute to “MPICH”. When an MPI job is
submitted the presence of the NodeNumber attribute (which specifies the required number of
CPUs) in the JDL is mandatory, and the UI automatically requires the MPICH runtime
environment to be installed on the CE and a number of CPUs at least equal to the required
number of nodes. This is done by adding (in AND to the user defined expression) the following
expression
(other.GlueCEInfoTotalCPUs >= NodeNumber) &&
Member(other.GlueHostApplicationSoftwareRunTimeEnvironment,"MPICH")
to the JDL Requirements expression.
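The following sketch shows an MPI job description and, in the comment, the expression that would result from the rule above. The executable name and the node count are hypothetical, and the exact text generated by the UI may differ.

```jdl
JobType      = "MPICH";
Executable   = "mpi-app";
InputSandbox = {"/home/mrossi/jobs/mpi-app"};
// Mandatory for MPICH jobs: required number of CPUs.
NodeNumber   = 4;
// User-defined requirements. After the UI has added its own constraints the
// expression used for match-making is expected to be equivalent to:
//   (other.GlueCEStateStatus == "Production") &&
//   (other.GlueCEInfoTotalCPUs >= NodeNumber) &&
//   Member(other.GlueHostApplicationSoftwareRunTimeEnvironment,"MPICH")
Requirements = other.GlueCEStateStatus == "Production";
```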
Lastly the --nomsg option makes the command display neither messages nor errors on the
standard output. Only the edg_jobId assigned to the job is printed if the command
was successful. Otherwise the location of the generated log file containing error messages is
printed on the standard output. This option has been provided to make it easier to use the
edg-job-submit command inside scripts, as an alternative to the --output option.
It is important to note that edg-job-submit is a sort of fire-and-forget command, i.e. it
exits successfully once the JDL has been passed to the NS and the InputSandbox files have
been transferred, regardless of what happens to the job afterwards.
The reason for a job abort can however be found by using edg-job-status
(especially looking at the “Status Reason” field) and edg-job-get-logging-info on
the job identifier returned from the submission.
JOB DESCRIPTION FILE
A job description file contains a description of job characteristics and constraints in a
class-ad style. A general description of the class-ad language is provided in document [A5].
The job description file must be edited by the user to insert relevant information about the job
that is later needed by the RB to perform the match-making. Job description file entries are
strings having the format attribute = expression and are terminated by the semicolon
character. Attribute expressions can span several lines provided the semicolon is put only at
the end of the whole expression. Comments must be preceded by a sharp character (#) or
follow the C++ syntax, i.e. a double slash (//) at the beginning of each line, or
blocks begun and ended respectively with “/*” and “*/”.
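The syntax rules above are illustrated by the short job description below. It is a sketch only: the output file name and the numeric threshold are hypothetical, while the attributes used are ones mentioned elsewhere in this section.

```jdl
# A comment introduced by the sharp character
// A comment in C++ style
/* A comment block
   spanning several lines */
Executable    = "/bin/hostname";
StdOutput     = "hostname.out";
OutputSandbox = {"hostname.out"};
// An expression spanning two lines: only the final semicolon terminates it.
Requirements  = other.GlueCEStateStatus == "Production" &&
                other.GlueCEInfoTotalCPUs >= 2;
```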
Since the class-ad is an extensible language, there is no fixed set of admitted attributes,
i.e. the user can insert in the job description file whatever attribute she/he believes meaningful
to describe her/his jobs. However, only the attributes that can be in some way connected with the
resource attributes published in the IS are taken into account by the Matchmaker/RB for the
match-making process. Unrelated attributes are simply ignored except when they are used to
build the Requirements expression. In the latter case they are evaluated and could
affect the match-making result. The attributes taken into account by the RB together with
their meaning are listed in annex 7.1 and described in detail in document [A1]. That is the
document that has to be followed by the user when writing the JDL description of her/his
jobs.
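The difference between an ignored attribute and one that takes part in the match-making can be sketched as follows. MinCPUs and ProjectTag are hypothetical user-defined names, not attributes known to the WMS.

```jdl
// User-defined attribute referenced in Requirements: it is evaluated and
// therefore affects the match-making result.
MinCPUs      = 8;
Requirements = other.GlueCEInfoTotalCPUs >= MinCPUs;
// User-defined attribute not referenced anywhere: it is ignored for the
// scheduling.
ProjectTag   = "test-campaign";
```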
There is a small subset of JDL attributes that are compulsory, i.e. that have to be present in a
job class-ad before it is sent to the Network Server in order to make
match-making and submission possible.
They can be grouped in two categories: some of them must be provided by the user, whilst
others, if not provided, are filled in by the UI with configurable default values.
Table 1 summarises them.
Attribute            Mandatory                        Mandatory with default value (default value)
Type                                                  yes ("Job")
JobType                                               yes ("Normal")
Executable           yes
Requirements                                          yes (other.GlueCEStateStatus == "Production")
                                                      [configurable]
Rank                                                  yes (other.GlueCEStateFreeCPUs for MPICH jobs;
                                                      - other.GlueCEStateEstimatedResponseTime
                                                      for all other job types) [configurable]
NodeNumber           yes (only for MPICH jobs)
CurrentStep                                           yes (0) (only for checkpointable jobs)
VirtualOrganisation                                   yes [configurable]
DataAccessProtocol   yes (only if InputData has
                     been specified)
InputData            yes (only if DataAccessProtocol
                     has been specified)
Table 1 Mandatory Attributes
In Table 1 the default values for Requirements and Rank can be interpreted respectively as
follows:
- if the user has not provided job constraints then Requirements is set to
  (other.GlueCEStateStatus == "Production"), i.e. the target CE has to be active.
- since in the JDL a greater value of Rank means a better match, if no
  expression for Rank has been provided the resources where the job waits the
  shortest time to pass from the SCHEDULED to the RUNNING status are preferred, hence
  the Rank expression is set to (- other.GlueCEStateEstimatedResponseTime). MPICH
  jobs are an exception, as their default rank is other.GlueCEStateFreeCPUs,
  meaning that the preferred resources are the ones having the highest number of free
  CPUs.
The default values for the Requirements and Rank attributes can be set in the
$EDG_WL_LOCATION/etc/edg_wl_ui_cmd_var.conf file. See section 4.5.3.1 for details on
how to set these defaults.
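Written out explicitly, and assuming the site configuration has not changed them, the defaults described above for a job that is not of type MPICH correspond to the following JDL entries:

```jdl
// Default requirement: the target CE has to be in production.
Requirements = other.GlueCEStateStatus == "Production";
// Default rank: a shorter estimated response time gives a greater rank.
Rank         = -other.GlueCEStateEstimatedResponseTime;
```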
As the classad language (and hence the JDL) is an extensible language, it allows the user to
freely include new attributes within the job description. These attributes are ignored by the
WMS for the scheduling but are passed through by the UI (if their syntax is correct) since
they could be relevant for the submitter or for some other component processing the JDL.
However if the job description file contains attributes that are unknown to the WMS, the UI
will print a warning (when used with the --debug option) listing all of them.
OPTIONS
--help
displays command usage.
--version
displays UI version.
--vo vo_name
This option allows the user to specify the Virtual Organisation she/he is currently
working for.
If the user proxy contains VOMS extensions then the VO specified through this option
is overridden by the default VO contained in the proxy (i.e. this option is only useful
when working with non-VOMS proxies). The following precedence rule is followed for
determining the user's VO:
- the default VO from the user proxy (if it contains VOMS extensions),
- the VO specified through the --vo or --config-vo options,
- the VO specified in the configuration file pointed to by the
  EDG_WL_UI_CONFIG_VO environment variable,
- the VirtualOrganisation attribute in the JDL (if the user proxy contains VOMS
  extensions this value is overridden as above),
- the default VO specified in the
  $EDG_WL_LOCATION/etc/edg_wl_ui_cmd_var.conf (DefaultVo field)
  configuration file.
If none of these sources provides a VO, an error is returned and the submission is
aborted.
--resource ce_id
-r ce_id
if the command is launched with this option, the job-ad sent to the NS contains a line
of the type SubmitTo = ce_id and the job is submitted by the WMS to the resource
identified by ce_id without going through the match-making process. Accepted format
for the CEId is:
<full hostname>:<port number>/jobmanager-<service>-<queue name>
where <service> could be for example lsf, pbs, bqs, condor but can also be a different
string as it is freely set by the site administrator when setting the queue.
Note that when this option is used, the “.BrokerInfo” file is not generated.
--input file_path
-i file_path
if this option is specified, the user will be asked to choose a CEId from a list of CEs
contained in the file_path. Once a CEId has been selected the command behaves as
explained for the --resource option. If this option is used together with the --noint one
and the input file contains more than one CEId, then the first CEId in the list is taken
into account for submitting the job.
--config file_path
-c file_path
if the command is launched with this option, the configuration file pointed to by
file_path is used instead of the standard configuration file.
--config-vo file_path
if the command is launched with this option, the vo-specific configuration file pointed
to by file_path is used instead of the standard vo-specific configuration file.
--output file_path
-o file_path
writes the generated edg_jobId assigned to the submitted job in the file specified by
file_path. file_path can be either a simple name or an absolute path (on the submitting
machine). In the former case the file is created in the current working
directory.
--chkpt file_path
This option can be used only for checkpointable jobs. The state specified as input is a
checkpoint state generated by a previously submitted job. This option makes the
submitted job start running from the checkpoint state given in input and not from the
very beginning.
The initial checkpoint states to be used with this option can be retrieved by means of
the edg-job-get-chkpt command (see 6.1.3.8).
--nogui
This option can be used only for interactive jobs. As the command for such jobs
opens an X window, the user should make sure an X server is running on the local
machine and, if she/he is connected to the UI node from a remote machine (e.g. with
ssh), enable secure X11 tunneling. If this is not possible, the user can specify the
--nogui option that makes the command provide a simple standard non-graphical
interaction with the running job.
--nolisten
This option can be used only for interactive jobs. It makes the command forward the
job standard streams coming from the WN to named pipes on the UI machine whose
names are returned to the user together with the OS id of the listener process. This
allows the user to interact with the job through her/his own tools. It is important to
note that when this option is specified, the UI has no more control over the launched
listener process that has hence to be killed by the user (through the returned process
id) once the job is finished.
--nomsg
this option makes the command print on the standard output only the edg_jobId
generated for the job if submission was successful; the location of the log file
containing messages and diagnostics is printed otherwise.
--noint
if this option is specified every interactive question to the user is skipped and all
warning messages and errors (if any) are written to the file
edg-job-submit_<UID>_<PID>_<timestamp>.log under the /tmp directory. Log file location is
configurable.
--debug
when this option is specified, information about parameters used for the API function
calls inside the command is displayed on the standard output and is also written to the
edg-job-submit_<UID>_<PID>_<timestamp>.log file under the /tmp directory. Log
file location is configurable.
--logfile file_path
when this option is specified, the command log file is relocated to the location pointed
to by file_path.
jdl_file
this is the file containing the JDL describing the job to be submitted. It must be the
last argument of the command.
EXIT STATUS
edg-job-submit exits with a status value of 0 (zero) upon success, and >0 (greater than
zero) upon failure.
EXAMPLES
1. $> edg-job-submit --vo cms myjob1.jdl
where myjob1.jdl is as follows:
##############################################
#
# -------- Job description file ---------#
##############################################
[
JobType = "Normal";
Executable = "$(CMS)/fpacini/exe/sum.exe";
InputData = "lfn:testbed0-00019";
DataAccessProtocol = "gridftp";
Rank = other.GlueCEPolicyMaxCPUTime;
Requirements = other.GlueCEInfoLRMSType == "Condor" && \
    (!(RegExp("*nikhef*",other.GlueCEUniqueID)));
]
The command submits sum.exe to a resource (assumed to hold the executable file) whose LRMS is
Condor and whose CE identifier does not contain the string “nikhef”. The command returns
the following output to the user, containing the job handle (edg_jobId):
================= edg-job-submit Success ==================================
The job has been successfully submitted to the Network Server. Your job is identified by
(edg_jobId):
https://ibm139.cnaf.infn.it:9000/ZU9yOC7AP7AOEhMAHirG3
Use edg-job-status command to display current job status.
======================================================================
2. $> edg-job-submit --chkpt /home/test/state10.chkpt myjob2.jdl
Submits the checkpointable job described by myjob2.jdl that will start running from the initial
state state10.chkpt.
SEE ALSO
[A1], [A2], edg-job-list-match, edg-job-attach, edg-job-get-chkpt.
6.1.3.2. edg-job-get-output
This command retrieves the job output files (specified by the OutputSandbox attribute of the
job-ad) from the RB node and stores them on the submitting machine local disk.
SYNOPSIS
edg-job-get-output [options] <job Id(s)>
Options:
--help
--version
--input, -i <file_path>
--dir <directory_path>
--config, -c <file_path>
--noint
--debug
--logfile <file_path>
DESCRIPTION
The edg-job-get-output command can be used to retrieve the output files of a job that has
been submitted through the edg-job-submit command with a job description file including
the OutputSandbox attribute. After the submission, when the job has terminated its
execution, the user can download the files generated by the job and temporarily stored on
the RB machine as specified by the OutputSandbox attribute, issuing
edg-job-get-output with the edg_jobId returned by edg-job-submit as input. It is also possible
to specify a list of job identifiers when calling this command, or an input file containing
edg_jobIds by means of the --input option. When --input is used, the user is requested to
choose all, one or a subset of the job identifiers contained in the input file.
It is important to note that the OutputSandbox of a submitted job can only be retrieved when
the job has reached the Done status (see Annex 7.2) indicating that the job has successfully
terminated its execution and the OutputSandbox files are ready for retrieval on the RB node.
edg-job-get-output will always fail for jobs that are not yet in the Done status.
The user can decide the local directory path on the UI machine where these files have to be
stored by means of the --dir option, otherwise the retrieved files are put in a default location
specified in the $EDG_WL_LOCATION/etc/edg_wl_ui_cmd_var.conf configuration file
(OutputStorage parameter). In both cases a sub-directory will be added to the path supplied.
The name of this sub-directory is the unique string of the edg_jobId identifier (see command
edg-job-submit for details on the edg_jobId structure) prefixed by the user login name
(value of the LOGNAME environment variable).
If the user wants to use her/his “private” configuration file, this can be done using the option
--config path_name. As a consequence the edg-job-get-output command looks for the file
“path_name” instead of the standard configuration file. If this file does not exist the user is
notified with an error message and the command is aborted.
OPTIONS
--help
displays command usage.
--version
displays UI version.
--dir directory_path
retrieved files (previously listed by the user through the OutputSandbox attribute of
the job description file) are stored in the location indicated by directory_path/<login
name>_<edg_jobId unique string>.
--config file_path
-c file_path
if the command is launched with this option, the configuration file pointed to by
file_path is used instead of the standard configuration file.
--noint
if this option is specified every interactive question to the user is skipped. All warning
messages and errors (if any) are written to the file
edg-job-get-output_<UID>_<PID>_<timestamp>.log under the /tmp directory. Location of log file
is configurable.
--debug
when this option is specified, information about parameters used for the API functions
calls inside the command are displayed on the standard output and are written to dgget_job_output_<UID>_<PID>_<timestamp>.log file under the /tmp directory too.
Location of log file is configurable.
--logfile file_path
when this option is specified, the command log file is relocated to the location pointed
by file_path
edg_jobId
job identifier returned by edg-job-submit. If a list of one or more job identifiers is
specified, edg_jobIds have to be separated by a blank. Job identifiers must be the last
argument of the command.
--input file_path
-i file_path
this option makes the command return the OutputSandbox files for each edg_jobId
contained in the file_path. This option can't be used if one (or more) edg_jobIds have
been already specified. The format of the input file must be as follows: one edg_jobId
for each line and comment lines must begin with a "#" or a "*" character. See 6.1.2.1
for details about this option.
EXIT STATUS
edg-job-get-output exits with a status value of 0 (zero) upon success, >0 upon failure and
<0 upon partial failure. An example of partial failure is when more than one job identifier has
been specified and the OutputSandbox could be retrieved only for some of them.
EXAMPLES
Let us consider the following command issued by the user logged in as mrossi:
$> edg-job-get-output --dir /home/data https://ibm139.cnaf.infn.it:9000/CiXMLojKC_iLsvSHfEhqIQ
It retrieves the files listed in the OutputSandbox attribute of job identified by
https://ibm139.cnaf.infn.it:9000/CiXMLojKC_iLsvSHfEhqIQ from the RB node and stores
them locally in /home/data/mrossi_CiXMLojKC_iLsvSHfEhqIQ.
6.1.3.3. edg-job-list-match
Returns the list of resources fulfilling job requirements specified in the JDL job description
SYNOPSIS
edg-job-list-match [options] <jdl file>
Options:
--help
--version
--verbose
--rank
--config, -c <file_path>
--config-vo <file_path>
--vo <vo_name>
--output, -o <file_path>
--noint
--debug
--logfile <file_path>
DESCRIPTION
edg-job-list-match displays the list of identifiers of the resources that the user is
authorized to use and that satisfy the job requirements included in the job description file.
The CE identifiers are returned either on the standard output or in a file, according to the
chosen command options, and are strings uniquely identifying the CEs published in the IS.
The returned CEIds are listed in decreasing order of rank, i.e. the one with the best (greatest)
rank is in the first place and so on.
The --rank option makes the command also display the rank value for each found CEId.
The --vo option allows the user to specify the Virtual Organisation she/he is currently working
for when she/he is working with non-VOMS credentials. Indeed, if the user proxy
credentials currently available on the UI contain VOMS extensions specifying one or more
VOs, then the default VO from the proxy credentials takes precedence over all other possible
choices and is taken as the current working VO.
If the --vo option is not used (and the proxy credentials do not contain VOMS extensions), then
the VirtualOrganisation attribute in the JDL is considered. If this attribute has not been
specified in the JDL, then the default VO specified in the
$EDG_WL_LOCATION/etc/edg_wl_ui_cmd_var.conf (DefaultVo field) configuration file is
considered. Otherwise an error is returned to the user.
edg-job-list-match requires a job description file in which job characteristics and
requirements are expressed by means of a class-ad. The job description file is first
syntactically checked and then used as the main command-line argument to
edg-job-list-match. The Network Server is only contacted to find job-compatible resources; the job is not
submitted. See the edg-job-submit section 6.1.3.1 for general rules for building the job
description file.
If the user wants to use her/his “private” configuration file, this can be done using the
--config file_path option.
The --verbose option of the edg-job-list-match command can be used to display on the
standard output the class-ad generated from the job description and sent to the RB.
The --output option makes the command save the list of compatible resources into the
specified file. If the provided file name is not an absolute path, then the output file is created
in the current working directory.
JOB DESCRIPTION FILE
See section 6.1.3.1 for details.
OPTIONS
--help
displays command usage.
--version
displays UI version.
--verbose
-v
displays on the standard output the job class-ad that is sent to the Network Server
generated from the job description file. This differs from the content of the job
description file since the UI adds to it some attributes that cannot be directly inserted
by the user (e.g., defaults for Rank and Requirements if not provided,
VirtualOrganisation etc).
--rank
displays the “matching” CEIds and the associated ranking values.
--vo vo_name
This option allows the user to specify the Virtual Organisation she/he is currently
working for.
If the user proxy contains VOMS extensions then the VO specified through this option
is overridden by the default VO contained in the proxy (i.e. this option is only useful
when working with non-VOMS proxies). The following precedence rule is followed for
determining the user's VO:
- the default VO from the user proxy (if it contains VOMS extensions),
- the VO specified through the --vo or --config-vo options,
- the VO specified in the configuration file pointed to by the EDG_WL_UI_CONFIG_VO
  environment variable,
- the VirtualOrganisation attribute in the JDL (if the user proxy contains VOMS
  extensions this value is overridden as above),
- the default VO specified in the $EDG_WL_LOCATION/etc/edg_wl_ui_cmd_var.conf
  (DefaultVo field) configuration file.
If none of the listed attempts succeeds, an error is returned and the command is aborted.
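The precedence rule for determining the VO can be read as a "first available source wins" lookup. The following Python sketch only illustrates that reading and is not the UI implementation: the function resolve_vo, its parameter names and the VO names used in the calls are hypothetical, and each argument stands for a value that the real command would read from the proxy, the command line, the configuration files or the JDL.

```python
from typing import Optional, Tuple


def resolve_vo(proxy_default_vo: Optional[str] = None,
               option_vo: Optional[str] = None,
               env_config_vo: Optional[str] = None,
               jdl_vo: Optional[str] = None,
               conf_default_vo: Optional[str] = None) -> Tuple[str, str]:
    """Return (vo, source) following the precedence order listed in the guide."""
    candidates = [
        ("VOMS extensions in the user proxy", proxy_default_vo),
        ("--vo or --config-vo option", option_vo),
        ("file pointed to by EDG_WL_UI_CONFIG_VO", env_config_vo),
        ("VirtualOrganisation attribute in the JDL", jdl_vo),
        ("DefaultVo field in edg_wl_ui_cmd_var.conf", conf_default_vo),
    ]
    for source, vo in candidates:
        if vo:  # the first source that provides a value wins
            return vo, source
    raise ValueError("no Virtual Organisation could be determined")


# The proxy VO overrides the --vo option (VO names are invented).
print(resolve_vo(proxy_default_vo="vo_a", option_vo="vo_b"))
# Without VOMS extensions and without options, the JDL attribute is used.
print(resolve_vo(jdl_vo="vo_c", conf_default_vo="vo_d"))
```

The ValueError branch corresponds to the case in which none of the sources provides a VO and the command returns an error.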
--config file_path
-c file_path
if the command is launched with this option, the configuration file pointed to by
file_path is used instead of the standard configuration file.
--config-vo file_path
if the command is launched with this option, the vo-specific configuration file pointed
to by file_path is used instead of the standard vo-specific configuration file.
--output file_path
-o file_path
returns the CEIds list in the file specified by file_path. file_path can be either a simple
name or an absolute path (on the submitting machine). In the former case the file
file_path is created in the current working directory.
--noint
if this option is specified every interactive question to the user is skipped. All warning
messages and errors (if any) are written to the file
edg-job-list-match_<UID>_<PID>_<timestamp>.log under the /tmp directory. The location of
the log file is configurable.
--debug
when this option is specified, information about the API functions called inside the
command is displayed on the standard output and is also written to the file
edg-job-list-match_<UID>_<PID>_<timestamp>.log under the /tmp directory. The location of
the log file is configurable.
--logfile file_path
when this option is specified, the command log file is relocated to the location pointed
to by file_path.
jdl_file
this is the file containing the classad describing the job to be submitted. It must be the
last argument of the command.
EXIT STATUS
edg-job-list-match exits with a status value of 0 upon success, and a >0 value upon failure.
EXAMPLES
Let us consider the following command:
$> edg-job-list-match myjob.jdl
where the job description file myjob.jdl looks like:
#########################################
#
# ---- Sample Job Description File ----
#
#########################################
JobType       = "Normal";
Executable    = "sum.exe";
StdInput      = "data.in";
InputSandbox  = {"/home_firefox/fpacini/exe/sum.exe","/home1/data.in"};
OutputSandbox = {"data.out","sum.err"};
Rank          = other.GlueCEPolicyMaxCPUTime;
Requirements  = other.GlueCEInfoLRMSType == "Condor" &&
                other.GlueHostArchitecturePlatformType == "INTEL" &&
                other.GlueHostOperatingSystemName == "LINUX" &&
                other.GlueCEStateFreeCPUs >= 2;
In this case the job requires CEs that are Condor pools of INTEL LINUX machines with at least
2 free CPUs. Moreover, the Rank expression states that queues with a higher maximum CPU
time allowed for jobs are preferred.
The response to such a command looks like the following:
***************************************************************************
Computing Element IDs LIST
The following CE(s) matching your job requirements have been found:
*CEId*
bbq.mi.infn.it:2119/jobmanager-pbs-dque
skurut.cesnet.cz:2119/jobmanager-pbs-wp1
***************************************************************************
$>
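The filtering and ordering described in this example (keep the CEs that satisfy Requirements, then list them in decreasing order of Rank) can be sketched in Python as follows. This is only an illustration of the idea, not the matchmaking performed by the WMS: the Glue attribute names are those used in myjob.jdl, while the CE identifiers, the attribute values and the helper names (make_ce, requirements, rank, list_match) are invented for the example.

```python
from typing import Any, Dict, List, Tuple

CE = Dict[str, Any]


def make_ce(ce_id: str, lrms: str, free_cpus: int, max_cpu_time: int) -> CE:
    """Build a hypothetical CE record using the Glue attribute names from the JDL."""
    return {
        "CEId": ce_id,
        "GlueCEInfoLRMSType": lrms,
        "GlueHostArchitecturePlatformType": "INTEL",
        "GlueHostOperatingSystemName": "LINUX",
        "GlueCEStateFreeCPUs": free_cpus,
        "GlueCEPolicyMaxCPUTime": max_cpu_time,
    }


def requirements(other: CE) -> bool:
    """Python transcription of the Requirements expression in myjob.jdl."""
    return (other["GlueCEInfoLRMSType"] == "Condor"
            and other["GlueHostArchitecturePlatformType"] == "INTEL"
            and other["GlueHostOperatingSystemName"] == "LINUX"
            and other["GlueCEStateFreeCPUs"] >= 2)


def rank(other: CE) -> int:
    """Python transcription of the Rank expression in myjob.jdl."""
    return other["GlueCEPolicyMaxCPUTime"]


def list_match(ces: List[CE]) -> List[Tuple[str, int]]:
    """Return (CEId, rank) pairs for matching CEs, best rank first."""
    matching = [ce for ce in ces if requirements(ce)]
    matching.sort(key=rank, reverse=True)
    return [(ce["CEId"], rank(ce)) for ce in matching]


# Invented CEs: two match, one has the wrong LRMS, one has too few free CPUs.
ces = [
    make_ce("ce-a.example.org:2119/jobmanager-condor-short", "Condor", 4, 120),
    make_ce("ce-b.example.org:2119/jobmanager-condor-long", "Condor", 8, 2880),
    make_ce("ce-c.example.org:2119/jobmanager-pbs-long", "PBS", 16, 5000),
    make_ce("ce-d.example.org:2119/jobmanager-condor-busy", "Condor", 1, 9999),
]
for ce_id, value in list_match(ces):
    print(ce_id, value)
```

Printing the rank value next to each identifier mirrors what the --rank option adds to the command output.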
SEE ALSO
[A1],[A2], edg-job-submit.
6.1.3.4. edg-job-cancel
Cancels one or more submitted jobs.
SYNOPSIS
edg-job-cancel [options] <job Id(s)>
Options:
--help
--version
--all
--input, -i <file_path>
--config, -c <file_path>
--config-vo <file_path>
--vo <vo_name>
--output, -o <file_path>
--noint
--debug
--logfile <file_path>
DESCRIPTION
This command cancels a job previously submitted using edg-job-submit. Before
cancellation, it prompts the user for confirmation. The cancel request is sent to the Network
Server that forwards it to the WM that fulfils it.
edg-job-cancel can remove one or more jobs: the jobs to be removed are identified by their
job identifiers (edg_jobIds returned by edg-job-submit) provided as arguments to the
command and separated by a blank space. The result of the cancel operation is reported to
the user for each specified edg_jobId.
If the --all option is specified, all the jobs owned by the user submitting the command are
removed. When the command is launched with the --all option, no edg_jobId can be
specified. Note that only the owner of a job can remove it. When the
--all option is specified the UI queries each LB listed in the vo-specific configuration file
$EDG_WL_LOCATION/etc/<vo_name>/edg_wl_ui.conf to get the identifiers of all the
jobs owned by the user identified by her/his certificate subject. Afterwards the UI sends a
cancellation request to the NS for each job that is in a status for which cancellation is
allowed.
Job states for which cancellation is allowed are:
- Submitted
- Waiting
- Ready
- Scheduled
- Running
- Unknown
For all the other job states the cancellation request will result in a failure.
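As an illustration of the rule above, the following Python sketch separates the jobs whose state allows cancellation from those for which a request would fail. It is not the UI code: the function name, the shortened job identifiers and the use of "Done" as an example of a non-cancellable state are assumptions made for the example.

```python
from typing import Dict, List, Tuple

# States for which cancellation is allowed, as listed in the guide.
CANCELLABLE_STATES = frozenset(
    {"Submitted", "Waiting", "Ready", "Scheduled", "Running", "Unknown"}
)


def split_by_cancellable(jobs: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Split {job id: state} into (cancellable ids, non-cancellable ids)."""
    allowed, refused = [], []
    for job_id, state in jobs.items():
        (allowed if state in CANCELLABLE_STATES else refused).append(job_id)
    return allowed, refused


# Invented job identifiers and states.
jobs = {
    "https://lb.example.org:9000/job-1": "Running",
    "https://lb.example.org:9000/job-2": "Done",      # assumed non-cancellable state
    "https://lb.example.org:9000/job-3": "Scheduled",
}
allowed, refused = split_by_cancellable(jobs)
print("cancel request can be sent for:", allowed)
print("cancel request would fail for:", refused)
```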
If the user wants to use her/his “private” configuration file, this can be done using the
--config file_path option.
The --input option allows the user to specify a file (file_path) that contains the edg_jobIds
to be removed. The format of the file must be as follows: one edg_jobId per line, and comment
lines must begin with a “#” or a “*” character. When using this option the user is prompted
to choose among all, one or a subset of the listed job identifiers. If file_path is not
an absolute path, the file is searched for in the current working directory.
OPTIONS
--help
displays command usage.
--version
displays UI version.
--all
cancels all jobs owned by the user submitting the command. This option can’t be used
either if one or more edg_jobIds have been specified explicitly or with the --input
option.
--input file_path
-i file_path
cancels the edg_jobIds contained in file_path. This option can’t be used either if one
or more edg_jobIds have been specified or with the --all option. See 6.1.2.1 for
details about this option.
--config file_path
-c file_path
if the command is launched with this option, the configuration file pointed to by
file_path is used instead of the standard configuration file.
--config-vo file_path
if the command is launched with this option, the vo-specific configuration file pointed
to by file_path is used instead of the standard vo-specific configuration file. This
option is allowed only when used together with the --all one.
--vo vo_name
This option allows the user to specify the Virtual Organisation she/he is currently
working for.
If the user proxy contains VOMS extensions then the VO specified through this option
is overridden by the default VO contained in the proxy (i.e. this option is only useful
when working with non-VOMS proxies). The following precedence rule is followed for
determining the user's VO:
- the default VO from the user proxy (if it contains VOMS extensions),
- the VO specified through the --vo or --config-vo options,
- the VO specified in the configuration file pointed to by the EDG_WL_UI_CONFIG_VO
  environment variable,
- the VirtualOrganisation attribute in the JDL (if the user proxy contains VOMS
  extensions this value is overridden as above),
- the default VO specified in the $EDG_WL_LOCATION/etc/edg_wl_ui_cmd_var.conf
  (DefaultVo field) configuration file.
If none of the listed attempts succeeds, an error is returned and the command is
aborted. This option is not allowed when one or more edg_jobIds are specified as
command arguments.
--output file_path
-o file_path
writes the cancel results in the file specified by file_path instead of the standard
output. file_path can be either a simple name or an absolute path (on the submitting
machine). In the former case the file file_path is created in the current working
directory.
--noint
if this option is specified every interactive question to the user is skipped. All warning
messages and errors (if any) are written to the file
edg-job-cancel_<UID>_<PID>_<timestamp>.log under the /tmp directory. The location of the
log file is configurable.
--debug
when this option is specified, information about the API functions called inside the
command is displayed on the standard output and is also written to the file
edg-job-cancel_<UID>_<PID>_<timestamp>.log under the /tmp directory. The location of the
log file is configurable.
--logfile file_path
when this option is specified, the command log file is relocated to the location pointed
to by file_path.
edg_jobId
job identifier returned by edg-job-submit. The job identifier list must be the last
argument of this command.
EXIT STATUS
edg-job-cancel exits with a status value of 0 if all the specified jobs were cancelled
successfully, >0 if errors occurred for each specified job id and <0 in case of partial
failure. An example of partial failure is when more than one job has been specified: some jobs
could be successfully removed while others could not be removed.
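The three outcomes described above (zero, positive, negative) can be summarised with a small Python sketch that derives an overall status from the per-job results. It is a reading of the rule, not the command's code: the text only specifies the sign of the status, so the concrete values 1 and -1 and the function name overall_exit_status are placeholders chosen for the example.

```python
from typing import Sequence


def overall_exit_status(results: Sequence[bool]) -> int:
    """Map per-job results (True = request succeeded) to an overall status.

    0  : every job was processed successfully
    >0 : the request failed for every job (1 is a placeholder value)
    <0 : partial failure, some jobs succeeded and some failed (-1 is a placeholder)
    """
    if not results:
        raise ValueError("at least one job result is required")
    if all(results):
        return 0
    if not any(results):
        return 1
    return -1


print(overall_exit_status([True, True, True]))    # all jobs cancelled
print(overall_exit_status([False, False]))        # nothing cancelled
print(overall_exit_status([True, False, True]))   # partial failure
```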
EXAMPLES
1. $> edg-job-cancel --input joblist.txt
where joblist.txt is a file containing 3 edg_JobIds, displays the following confirmation
message:
Are you sure you want to remove all jobs specified? [y/n]n: y
====================== edg-job-cancel Success =========================
The cancel request for the following job(s) has been successfully submitted
to NS:
- https://ibm139.cnaf.infn.it:9000/nUbiIiMFmY1oIusAaWxPhg
- https://ibm139.cnaf.infn.it:9000/VtMvhs8z7WGCptt92ZMPIQ
- https://ibm139.cnaf.infn.it:9000/yKTKyrdSgHKQ1wwwSocJiw
========================================================================
$>
In this case the command exit code is 0.
2. $> edg-job-cancel --all --noint
removes all jobs owned by the user submitting the command.
SEE ALSO
[A2], edg-job-submit.
6.1.3.5. edg-job-status
Displays bookkeeping information about submitted jobs.
SYNOPSIS
edg-job-status [options] <job Id(s)>
Options:
--help
--version
--all
--input, -i <file_path>
--verbosity, -v <verbosity_value>
--config, -c <file_path>
--config-vo <file_path>
--vo <vo_name>
--output, -o <file_path>
--noint
--debug
--logfile <file_path>
DESCRIPTION
This command prints the status of a job previously submitted using edg-job-submit. The job
status request is sent to the LB that provides the requested information. This can be done
at any time during the job's lifetime.
edg-job-status can monitor one or more jobs: the jobs to be checked are identified by one or
more job identifiers (edg_jobIds returned by edg-job-submit) provided as arguments to the
command and separated by a blank space.
If the --all option is specified, information about all the jobs owned by the user submitting the
command is printed on the standard output. When the command is launched with the --all
option, neither an edg_jobId nor the --input option can be specified.
The --input option allows the user to specify a file (file_path) that contains the edg_jobIds
to monitor. The format of the file must be as follows: one edg_jobId per line, and comment
lines have to begin with a “#” or a “*” character. When using this option the user is prompted
to choose among all, one or a subset of the listed job identifiers. If file_path is not
an absolute path, the file is searched for in the current working directory.
If the user wants to use her/his “private” configuration file, this can be done using the
--config file_path option.
The same applies to the vo-specific configuration file and the --config-vo option.
The --verbosity option allows setting the detail level of the returned information. This option
can be specified with three values: 0, 1 and 2. The default level of verbosity is 0 unless
otherwise specified in the UI configuration file
$EDG_WL_LOCATION/etc/edg_wl_ui_cmd_var.conf (DefaultStatusLevel parameter).
The information displayed at each verbosity level is listed below.
Verbosity equal 0:
- edg_jobId       (the job unique identifier)
- Current Status  (the job current status)
- exit_code       (Unix exit code, if applicable)
- Status Reason   (reason for being in this state)
- Reached on      (date/time when the job entered the current state)
- destination     (ID of the CE where the job has been submitted, if applicable)
With verbosity equal 1, some additional information fields are added such as:
- cancelling      (boolean indicating if a cancellation request for the job is in progress)
- cancelReason    (reason of cancel)
- ce_node         (worker node where the job is executed)
- children_hist   (summary -- histogram -- of children job states)
- children_num    (number of subjobs)
- subjob_failed   (subjob failed -- the parent job will fail too)
- condorId        (Id within Condor-G)
- cpuTime         (consumed CPU time)
- expectUpdate    (boolean indicating that some logged information has not arrived yet)
- expectFrom      (sources of the missing information)
- jobtype         (type of the request: 0 = Job, 1 = DAG)
- lastUpdateTime  (last known event of the job)
- location        (location where the job is being processed)
- network_server  (Network Server handling the job)
- owner           (certificate subject of the job owner)
- resubmitted     (boolean indicating that the job was resubmitted)
Lastly, with verbosity equal 2 there are the following additional fields:
- jdl              (user submitted job description)
- matched_jdl      (full job description after matchmaking)
- condor_jdl       (ClassAd passed to Condor-G for job submission)
- rsl              (job RSL sent to Globus)
- stateEnterTimes  (when all previous states were entered)
Information fields that are not available (i.e. not returned by the LB because not applicable
for the given status) are not printed at all to the user.
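The selection of fields by verbosity level, together with the rule that unavailable fields are not printed, can be sketched in Python as follows. The field names are those listed in this section; the dictionary FIELDS_BY_LEVEL, the function fields_to_print and the sample status record are hypothetical, and the cumulative behaviour (level 1 includes the level 0 fields, level 2 includes both) is an interpretation of the word "additional" in the text.

```python
from typing import Any, Dict, List, Tuple

FIELDS_BY_LEVEL = {
    0: ["edg_jobId", "Current Status", "exit_code", "Status Reason",
        "Reached on", "destination"],
    1: ["cancelling", "cancelReason", "ce_node", "children_hist",
        "children_num", "subjob_failed", "condorId", "cpuTime",
        "expectUpdate", "expectFrom", "jobtype", "lastUpdateTime",
        "location", "network_server", "owner", "resubmitted"],
    2: ["jdl", "matched_jdl", "condor_jdl", "rsl", "stateEnterTimes"],
}


def fields_to_print(status: Dict[str, Any], verbosity: int) -> List[Tuple[str, Any]]:
    """Return the (name, value) pairs to display for the given verbosity level."""
    if verbosity not in FIELDS_BY_LEVEL:
        raise ValueError("verbosity must be 0, 1 or 2")
    selected = []
    for level in range(verbosity + 1):       # levels are treated as cumulative
        for name in FIELDS_BY_LEVEL[level]:
            if name in status:               # fields not returned are skipped
                selected.append((name, status[name]))
    return selected


# Invented status record: no exit_code yet, one level 1 field available.
status = {
    "edg_jobId": "https://lb.example.org:9000/job-1",
    "Current Status": "Scheduled",
    "Status Reason": "Job successfully submitted to Globus",
    "destination": "ce.example.org:2119/jobmanager-pbs-short",
    "owner": "/C=XX/O=Example/CN=Some User",
}
for name, value in fields_to_print(status, 1):
    print(f"{name}: {value}")
```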
The job Status possible values are reported in Annex 7.2. Details on the Job Status Diagram
can be found in [A4].
OPTIONS
--help
displays command usage.
--version
displays UI version.
--all
displays status information about all jobs owned by the user submitting the command.
This option can’t be used either if one or more edg_jobIds have been specified or if
the --input option has been specified. All LBs listed in the vo-specific UI configuration
file $EDG_WL_LOCATION/etc/<vo_name>/edg_wl_ui.conf are contacted to fulfil this
request.
--input input_file
-i input_file
displays bookkeeping info about the edg_jobIds contained in input_file. When using this
option the user is prompted to choose among all, one or a subset of the listed job
identifiers. This option can’t be used either if one or more edg_jobIds have been specified or
if the --all option has been specified. See 6.1.2 for details about this option.
--verbosity verb_level
-v verb_level
sets the detail level of information about the job displayed to the user. Possible values
for verb_level are 0, 1 and 2.
--config file_path
-c file_path
if the command is launched with this option, the configuration file pointed to by
file_path is used instead of the standard configuration file.
--config-vo file_path
if the command is launched with this option, the vo-specific configuration file pointed
to by file_path is used instead of the standard vo-specific configuration file. This
option is allowed only when used together with the --all one.
--vo vo_name
This option allows the user to specify the Virtual Organisation she/he is currently
working for.
If the user proxy contains VOMS extensions then the VO specified through this option
is overridden by the default VO contained in the proxy (i.e. this option is only useful
when working with non-VOMS proxies). The following precedence rule is followed for
determining the user's VO:
- the default VO from the user proxy (if it contains VOMS extensions),
- the VO specified through the --vo or --config-vo options,
- the VO specified in the configuration file pointed to by the EDG_WL_UI_CONFIG_VO
  environment variable,
- the VirtualOrganisation attribute in the JDL (if the user proxy contains VOMS
  extensions this value is overridden as above),
- the default VO specified in the $EDG_WL_LOCATION/etc/edg_wl_ui_cmd_var.conf
  (DefaultVo field) configuration file.
If none of the listed attempts succeeds, an error is returned and the command is
aborted. This option is allowed only when used together with the --all one.
--output file_path
-o file_path
writes the bookkeeping information in the file specified by file_path instead of the
standard output. file_path can be either a simple name or an absolute path (on the
submitting machine). In the former case the file file_path is created in the current
working directory.
--noint
if this option is specified every interactive question to the user is skipped. All warning
messages and errors (if any) are written to the file
edg-job-status_<UID>_<PID>_<timestamp>.log under the /tmp directory. The location of the
log file is configurable.
--debug
when this option is specified, information about the API functions called inside the
command is displayed on the standard output and is also written to the file
edg-job-status_<UID>_<PID>_<timestamp>.log under the /tmp directory. The location of the
log file is configurable.
--logfile file_path
when this option is specified, the command log file is relocated to the location pointed
to by file_path.
edg_jobId
job identifier returned by edg-job-submit. Job identifiers must always be provided as
last arguments of the command.
EXIT STATUS
edg-job-status exits with a value of 0 if the status of all the specified jobs is retrieved
correctly, >0 if errors occurred for each specified job id and <0 in case of partial failure. An
example of partial failure is when more than one job is specified: status info could be
successfully retrieved for some jobs and not retrieved for others.
EXAMPLES
$> edg-job-status -v 0 https://ibm139.cnaf.infn.it:9000/_tO6hdgToYKGCuV68q-gqQ
displays the following lines:
*************************************************************
BOOKKEEPING INFORMATION:
Printing status info for the Job : https://ibm139.cnaf.infn.it:9000/_tO6hdgToYKGCuV68q-gqQ
Current Status:  Scheduled
Destination:     bbq.mi.infn.it:2119/jobmanager-pbs-dque
Status Reason:   Job successfully submitted to Globus
reached on:      Tue May  6 16:14:59 2003
*************************************************************
$>
SEE ALSO
[A1], [A2], [A4], edg-job-submit.
6.1.3.6. edg-job-get-logging-info
Displays logging information about submitted jobs.
SYNOPSIS
edg-job-get-logging-info [options] <job Id(s)>
Options:
--help
--version
--input, -i <file_path>
--verbosity, -v <verbosity_value>
--config, -c <file_path>
--output, -o <file_path>
--noint
--debug
--logfile <file_path>
DESCRIPTION
This command queries the LB persistent DB for logging information about jobs previously
submitted using edg-job-submit. The job logging information is stored permanently by the
LB service and can also be retrieved after the job has completed its life-cycle, unlike
the bookkeeping information, which is in some way “consumed” by the user during the
job existence.
The edg-job-get-logging-info request is sent to the LB service, which queries the DB and
returns the retrieved information. The content of the logging information varies according to
the type of event it relates to. The most common information fields are:
- Event        (event type; possible event types are listed in Annex 7.3)
- source       (WMS component which generated the event)
- result       (result of the attempt)
- destination  (destination where the job is being transferred to)
- timestamp    (timestamp of event generation)
The --verbosity option allows setting the detail level of the returned information. This option
can be specified with three values: 0, 1 and 2. The default level of verbosity is 0 unless
otherwise specified in the UI configuration file
$EDG_WL_LOCATION/etc/edg_wl_ui_cmd_var.conf (DefaultLoggingLevel parameter).
The information listed above is displayed when the chosen verbosity level is 0. If the
command is issued with verbosity level 1, then the following additional information is
shown:
- host           (hostname of the machine where the event was generated)
- dest_host      (destination hostname)
- dest_instance  (instance of destination WMS component)
- user           (identity -- cert. subj. -- of the generator)
- dest_jobid     (destination internal jobid)
- node           (worker node where the executable is run)
- ns             (Network Server handling the job)
- nsubjobs       (number of subjobs)
- local_jobid    (new jobId assigned by the receiving component)
- queue          (destination queue name)
- status_code    (way of job termination/classification of the cancel)
Lastly, if the command is issued with verbosity level 2, additional information, mostly
consisting of the job description within the WMS component that has logged the event, is
printed to the user:
- jdl      (job description)
- job      (job description in receiver language)
- descr    (description of current job transformation -- output of helper)
- classad  (checkpoint state value)
- seqcode  (sequence code assigned to the event)
- level    (logging level -- system, debug, ...)
Data on several jobs can be queried by specifying a list of job identifiers separated by a
blank space as arguments of the command. Moreover, the --input option allows the user to
specify a file (file_path) containing the edg_jobIds whose information is requested. The
format of the file must be as follows: one edg_jobId per line, and comment lines have to
begin with a “#” or a “*” character. When using this option the user is prompted to choose
among all, one or a subset of the listed job identifiers. If file_path is not an absolute
path, the file is searched for in the current working directory.
Each event logged in the LB has an associated log level according to the “Universal Format
for Logger Messages” (see draft-abela-ulm-05.txt, available at
http://www-didc.lbl.gov/NetLogger/draft-abela-ulm-05.txt). The default log level used by WMS
components is System; however, there could be special situations in which problems need to
be investigated and additional events are logged with the Debug log level.
The --output option can be used to have the retrieved information written to the file
identified by file_path instead of the standard output. file_path can be either a simple name
or an absolute path (on the submitting machine). In the former case the file file_path is
created in the current working directory.
If the user wants to use her/his “private” configuration file, this can be done using the
--config file_path option.
OPTIONS
--help
displays command usage.
--version
displays UI version.
--input file_path
-i file_path
retrieves logging info for all edg_jobIds contained in the file_path. This option can’t be
used if one or more edg_jobIds have been specified. See 6.1.2 for details about this
option.
--verbosity verb_level
-v verb_level
sets the detail level of information about the job displayed to the user. Possible values
for verb_level are 0, 1 and 2.
--config file_path
-c file_path
if the command is launched with this option, the configuration file pointed to by
file_path is used instead of the standard configuration file.
--output file_path
-o file_path
writes the logging information in the file specified by file_path instead of the standard
output. file_path can be either a simple name or an absolute path (on the submitting
machine). In the former case the file file_path is created in the current working
directory.
--noint
if this option is specified every interactive question to the user is skipped. All warning
messages and errors (if any) are written to the file
edg-job-logging_<UID>_<PID>_<timestamp>.log under the /tmp directory. The location of the
log file is configurable.
--debug
when this option is specified, information about the API functions called inside the
command is displayed on the standard output and is also written to the file
edg-job-logging_<UID>_<PID>_<timestamp>.log under the /tmp directory. The location of the
log file is configurable.
--logfile file_path
when this option is specified, the command log file is relocated to the location pointed
to by file_path.
edg_jobId
job identifier returned by edg-job-submit. Job identifiers must always be provided as
last arguments for this command.
EXIT STATUS
edg-job-get-logging-info exits with a value of 0 if the logging information of all the
specified jobs is retrieved correctly, >0 if errors occurred for each specified job and <0 in
case of partial failure. An example of partial failure is when more than one job is specified:
the logging info could be successfully retrieved for some jobs and not retrieved for others.
EXAMPLES
1. $> edg-job-get-logging-info \
https://ibm139.cnaf.infn.it:9000/GMUJtnNqe6Lq7w7MfOzeQw --output mylog.txt
writes in file mylog.txt in the current working directory logging information about the job
identified by https://ibm139.cnaf.infn.it:9000/GMUJtnNqe6Lq7w7MfOzeQw.
2. $> edg-job-get-logging-info -v 0 --input $HOME/myIds.txt
where $HOME/myIds.txt contains two job identifiers, displays the following output
----------------------------------------------------------------------
1 : https://ibm139.cnaf.infn.it:9000/D4S_i25ffAsPnKB3iCqeaA
2 : https://ibm139.cnaf.infn.it:9000/2qzyCbPWr7pDY3rNh9PuXA
a : all
q : quit
----------------------------------------------------------------------
Choose one or more edg_jobId(s) in the list - [1-2]all: 2
**********************************************************************
LOGGING INFORMATION:
Printing info for the Job : https://ibm139.cnaf.infn.it:9000/2qzyCbPWr7pDY3rNh9PuXA
--Event: RegJob
- source      = UserInterface
- timestamp   = Wed May 14 10:55:35 2003
--Event: Transfer
- destination = NetworkServer
- result      = START
- source      = UserInterface
- timestamp   = Wed May 14 10:55:36 2003
--Event: Transfer
- destination = NetworkServer
- result      = OK
- source      = UserInterface
- timestamp   = Wed May 14 10:55:44 2003
--Event: Accepted
- source      = NetworkServer
- timestamp   = Wed May 14 10:56:42 2003
--Event: EnQueued
- result      = OK
- source      = NetworkServer
- timestamp   = Wed May 14 10:56:45 2003
**********************************************************************
**********************************************************************
…
…
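Event listings of this kind can be post-processed with a few lines of script. The following Python sketch turns text saved through the --output option into a list of events. It assumes that every event starts with a line containing "Event: <name>" and is followed by "- key = value" lines; both this layout and the function name parse_logging_info are assumptions made for illustration and are not part of the WMS software.

```python
import re

# Assumed layout: an "Event: <name>" header followed by "- key = value" lines.
EVENT_RE = re.compile(r"Event:\s*(\S+)")
FIELD_RE = re.compile(r"^\s*-\s*(\w+)\s*=\s*(.*\S)\s*$")


def parse_logging_info(text):
    """Return a list of (event_name, fields) tuples found in `text`."""
    events = []
    for line in text.splitlines():
        event = EVENT_RE.search(line)
        if event:
            # A new event header starts a fresh, empty field dictionary.
            events.append((event.group(1), {}))
            continue
        field = FIELD_RE.match(line)
        if field and events:
            # Attach the field to the most recently seen event.
            events[-1][1][field.group(1)] = field.group(2)
    return events


SAMPLE = """\
--Event: RegJob
- source      =  UserInterface
- timestamp   =  Wed May 14 10:55:35 2003
--Event: Transfer
- destination =  NetworkServer
- result      =  START
- source      =  UserInterface
- timestamp   =  Wed May 14 10:55:36 2003
"""

for name, fields in parse_logging_info(SAMPLE):
    print(name, fields.get("source"), fields.get("result"))
```

Lines that match neither pattern (separators, the "Printing info" header) are simply skipped, so the same function can be fed the whole content of the output file.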
SEE ALSO
[A2], [A4], edg-job-submit.
6.1.3.7. edg-job-attach
This command starts an interactive session for a previously submitted interactive job.
SYNOPSIS
edg-job-attach [options] <job Id>
Options:
--help
--version
--port, -p      <port_num>
--nogui
--nolisten
--config, -c    <file_path>
--input, -i     <file_path>
--noint
--debug
--logfile       <file_path>
DESCRIPTION
This command starts a listener process (grid_console_shadow) on the UI machine that
attaches to the standard streams of a previously submitted interactive job and displays
them in a dedicated window.
As the command opens an X window, the user should make sure that the DISPLAY environment
variable is correctly set and that an X server is running on the local machine. If she/he is
connected to the UI node from a remote machine (e.g. with ssh), secure X11 tunneling must
be enabled.
The listener process and the window are started automatically by the edg-job-submit
command for interactive jobs. This command is therefore useful, for example, when a problem
on the UI machine caused the interactive session to be lost, or when the user needs to
follow the job from another machine or from another port on the same machine (--port
option).
This command can only be invoked for interactive jobs.
OPTIONS
--help
displays command usage.
--version
displays UI version.
--port port_num
-p port_num
makes the command start a listener on the local machine on the specified port and
log this information to the LB associated with the job.
--nogui
As the edg-job-attach command opens an X window, the user should make sure that an X
server is running on the local machine and, if she/he is connected to the UI node from a
remote machine (e.g. with ssh), enable secure X11 tunneling. If this is not possible,
the user can specify the --nogui option, which makes the command provide a simple
standard non-graphical interaction with the running job.
--nolisten
This option makes the command forward the job standard streams coming from the
WN to named pipes on the UI machine whose names are returned to the user
together with the OS id of the listener process. This allows the user to interact with
the job through her/his own tools. It is important to note that when this option is
specified, the UI has no more control over the launched listener process that has
hence to be killed by the user (through the returned process id) once the job is
finished.
--config file_path
-c file_path
if the command is launched with this option, the configuration file pointed to by
file_path is used instead of the standard configuration file.
--input file_path
-i file_path
allows the user to attach to one (just one) of the edg_jobIds contained in the file file_path.
This option cannot be used if an edg_jobId has been specified. See 6.1.2 for details
about this option.
--noint
if this option is specified every interactive question to the user is skipped. All warning
messages and errors (if any) are written to the file edg-job-attach_<UID>_<PID>_<timestamp>.log under the /tmp directory. The location of the log file
is configurable.
--debug
when this option is specified, information about the API functions called inside the
command is displayed on the standard output and is also written to the file
edg-job-attach_<UID>_<PID>_<timestamp>.log under the /tmp directory. The location of the log
file is configurable.
--logfile file_path
when this option is specified, the command log file is relocated to the location pointed
by file_path
edg_jobId
job identifier returned by edg-job-submit. Job identifiers must always be provided as
last arguments for this command.
EXIT STATUS
edg-job-attach exits with a value of 0 on success and >0 on failure.
EXAMPLES
$> edg-job-attach https://ibm139.cnaf.infn.it:9000/t3KwW8qhXhkYs-ZfNCFidg
displays the following information message:
**********************************************************************
JOB ATTACHED:
The Interactive Session Listener has been successfully launched
with the following parameters:
--Host: 10.1.1.90
Port: 40713
Pid: 18575
**********************************************************************
and opens a window allowing interaction with the job through the standard streams.
6.1.3.8. edg-job-get-chkpt
This command retrieves checkpoint states saved by a previously submitted checkpointable
job.
SYNOPSIS
edg-job-get-chkpt [options] <job Id>
Options:
--help
--version
--cs            <state_num>
--config, -c    <file_path>
--input, -i     <file_path>
--output, -o    <file_path>
--noint
--debug
--logfile       <file_path>
DESCRIPTION
This command allows the user to retrieve one or more checkpoint states saved by a
previously submitted job. Checkpoint states are retrieved from the LB server and are saved
locally into a file in JDL format.
The --cs option allows the user to select the checkpoint state she/he wants to retrieve:
invoking the command with “--cs N” makes it retrieve the last but N job checkpoint state.
Otherwise the last saved state is retrieved.
The path of the output file can be set through the --output option of the command.
This command can be used only for checkpointable jobs.
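The “last but N” selection performed by --cs can be pictured as indexing a list of saved states from its end. The Python sketch below only illustrates the counting: select_checkpoint and the sample state names are hypothetical, and raising an error for an out-of-range N is an assumption, since this guide does not describe how the command behaves in that case.

```python
def select_checkpoint(states, cs=0):
    """Pick the "last but cs" entry from `states`, ordered oldest to newest."""
    if cs < 0 or cs >= len(states):
        # Assumed behaviour: refuse a state number outside the saved range.
        raise IndexError("no checkpoint state 'last but %d'" % cs)
    return states[-1 - cs]


saved = ["state0", "state1", "state2", "state3", "state4"]
print(select_checkpoint(saved))        # cs = 0: the last saved state
print(select_checkpoint(saved, cs=3))  # the last but 3 saved state
```

With five saved states, “--cs 3” therefore designates the second state ever saved, while omitting the option designates the fifth.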
OPTIONS
--help
displays command usage.
--version
displays UI version.
--config file_path
-c file_path
if the command is launched with this option, the configuration file pointed to by
file_path is used instead of the standard configuration file.
--cs state_num
if the command is launched with this option then it retrieves the “last but state_num”
state saved by the job. Last saved state is returned if the option is not used
(equivalent to state_num = 0).
--input file_path
-i file_path
allows the user to select one (just one) of the edg_jobIds contained in the file file_path for
retrieval of the saved checkpoint state. This option cannot be used if an edg_jobId
has been specified. See 6.1.2 for details about this option.
--output file_path
-o file_path
saves the retrieved state in the file specified by file_path. file_path can be either a
simple name or an absolute path (on the submitting machine). In the former case the
file file_path is created in the current working directory. If this option is not used the
retrieved state is displayed on the standard output.
--noint
if this option is specified every interactive question to the user is skipped. All warning
messages and errors (if any) are written to the file edg-job-get-chkpt_<UID>_<PID>_<timestamp>.log under the /tmp directory. The location of the log file is
configurable.
--debug
when this option is specified, information about the API functions called inside the
command is displayed on the standard output and is also written to the file edg-job-get-chkpt_<UID>_<PID>_<timestamp>.log under the /tmp directory. The location of the log
file is configurable.
--logfile file_path
when this option is specified, the command log file is relocated to the location pointed
by file_path
edg_jobId
job identifier returned by edg-job-submit. Job identifiers must always be provided as
last arguments for this command.
EXIT STATUS
edg-job-get-chkpt exits with a value of 0 on success and >0 on failure.
EXAMPLES
The following command retrieves the last but 3 saved checkpoint state of the job and saves it
in the file specified by the user:
$> edg-job-get-chkpt -o myjob.chk --cs 3 https://ibm139.cnaf.infn.it:9000/LNn4rOX17LL30e34hSqGjQ
======================= edg-job-get-chkpt Success =======================
The checkpointable Job state has been successfully retrieved from LB
Server and stored in the file: /home/fpacini/CLI/bin/myjob.chk
=========================================================================
$> more /home/fpacini/CLI/bin/myjob.chk
# Job State Retrieved for
#edg_jobId: https://ibm139.cnaf.infn.it:9000/LNn4rOX17LL30e34hSqGjQ
[
UserData =
[
distribution = false;
hsum_filename =
"gsiftp://lxde01.pd.infn.it/tmp/root_test/hsum_lxde04_1200000.root";
first_event = 1200001
];
7. ANNEXES
7.1. JDL ATTRIBUTES
The JDL is a fully extensible language (i.e. it does not rely on a fixed schema), hence the
user is allowed to use any attribute for the description of a job without incurring
errors. However, only a certain set of attributes (that we will refer to as “supported” attributes)
is taken into account by the WMS components for scheduling and submitting a job. The
“supported” attributes, their meaning and the way to use them to describe a job are dealt with in
detail in document [A1].
7.2. JOB STATUS DIAGRAM
The following diagram shows the statuses that a job can assume during its life cycle.
Figure 2 Job Life Cycle
The job statuses in Figure 2 are briefly described hereafter (see [A4] for further details):
STATUS:
- SUBMITTED: job has been entered by the user on the User Interface but not yet transferred
  to the Network Server for processing.
- WAITING: job has been accepted by the NS and is waiting for Workload Manager processing
  or is being processed by WM Helper modules (e.g., WM is busy, no appropriate
  Computing Element (cluster) has been found yet, required dataset is not available, job is
  waiting for resource allocation).
- READY: job has been processed by the WM and its Helper modules (in particular, an appropriate
  Computing Element has been found) but not yet transferred to the Computing Element
  (local batch system queue) via Job Controller and Condor-G.
- SCHEDULED: job is waiting in the queue on the Computing Element.
- RUNNING: job is running.
- DONE: job exited or is considered to be in a terminal state by Condor-G (e.g., submission
  to CE has failed in an unrecoverable way).
- ABORTED: job processing was aborted by WMS (waiting in the Workload Manager
  queue or Computing Element for too long, over-use of quotas, expiration of user
  credentials, etc.).
- CANCELLED: job has been successfully cancelled on user request.
- CLEARED: output sandbox was transferred to the user or removed due to the timeout.
7.3. JOB EVENT TYPES
The list of job event types that can be returned to the user by the edg-job-get-logging-info
command is reported hereafter. They are organized in several categories:

Events concerning a job transfer between components:
- JobTransfer: A component generates this event when it tries to transfer a job to
  some other component via network interface (protocol). This event contains the
  identification of the receiver and possibly the job description expressed in the
  language accepted by the receiver. The result of the transfer, i.e. success or
  failure, as seen by the sender is also included.
- JobAccepted: A component generates this event when it receives a job from another
  WMS component. This event also contains the locally assigned job identifier.
- JobRefused: Receiving component could not accept the job, the reason being a part
  of the event.
- JobEnqueue: The job is inserted into a queue, e.g., the queue holding the job after it
  is received by Network Server and before it is processed by Workload Manager.
- JobDequeue: The job is removed from queue.
- HelperCall: Helper component is called during the job processing. The type-specific
  data include the name of the called Helper, whether the logging component
  is the called or the calling one, and optionally parameters passed to the Helper.
- HelperReturn: Call to Helper returned.

Events concerning a job state change during processing within a component:
- JobAbort: The job processing is stopped by WMS due to an error condition, the
  event contains the reason for abort.
- JobRun: The job is started on a CE.
- JobDone: Job has exited, has been successfully cancelled or is considered to be
  in terminal state by Condor-G.
- JobResub: The result of resubmission decision after the job has failed.
- JobCleared: The user has successfully retrieved the job results, e.g. the output files
  specified in the output sandbox, or the job results have been deleted due to
  time limit.
- JobCancel: Cancel operation has been attempted on the job.
- JobPurge: The job was purged from the bookkeeping server's database. This event is
  stored only in a logging server.

Events associated with the Workload Manager or Helper modules:
- JobMatch: An appropriate match between a job and a Computing Element has
  been found. The event contains the identifier of the selected CE.
- JobPending: A match between a job and a suitable Computing Element was not
  found, so the job is kept pending by the WM. The event contains the reason
  why no match was found.

Events used to store special information in logging and bookkeeping services:
- JobRegister: Logged by job creator (User Interface) in order to register the job with
  the bookkeeping server.
- JobChkpt: An application-specific checkpoint was created (logged by the
  checkpointing API). Checkpoint tag and ClassAd strings should be included.
- JobListener: Used by UI to store listener network port information for interactive
  jobs. Listener port number, hostname and service name (multiple ports can be
  advertised) are included.
- JobCurJdl: This optional event can be used to report the ClassAd describing the
  current state of job processing (output from Helper modules).

More details on job event types can be found in [A4].
7.4. SUBMISSION FAILURES ANALYSIS
The state of a failed job can be analysed by checking the consistency and
completeness of the job-related events returned by the edg-job-get-logging-info command. If
needed, a further verification should then be performed on the retrieved output files produced
by the job (if any) and by inspecting the log files and debugging information
traced by the various system components.
As explained in section 6.1.3.6, to get the logging information about a job you need to issue
the following command:
edg-job-get-logging-info <job_Identifier>
Since the output of the command can be copious, we advise using the --output option
too, to redirect it to a given file:
edg-job-get-logging-info --output <my_file> <job_Identifier>
Using the -v 2 option then provides more detailed information (the job descriptions at
the various stages are also included):
edg-job-get-logging-info -v 2 --output <my_file> <job_Identifier>
Before using the edg-job-get-logging-info command, it is in some cases useful to check the
edg-job-status output, which can contain information about the cause of a job failure.
As explained in section 6.1.3.5, to get the status information about a job you need to issue the
following command:
edg-job-status <job_Identifier>
As said at the beginning of this section, another way of analysing submission failures is to
inspect the standard output and error of the job generated on the Worker Node and retrieved
on the UI machine through the edg-job-get-output command. A typical example of
an error that can be detected in this way occurs when the user submits a script that in turn
tries to start another script or an executable. E.g. the submitted script is like:
#!/bin/sh
# Use the coincidence file to compare the measurements
curdir=`pwd`
${curdir}/lecture_new_gome_V2_sel1_PT_10
idl appli.pro
Upon job abortion, the error message received through the OutputSandbox retrieval is:
./demo_june: /home/eo004/3042/lecture_new_gome_V2_sel1_PT_10: Permission
denied
The reason for this error is that globus-url-copy (used for staging the InputSandbox files)
in general doesn't preserve the x flag. The script specified as Executable in the JDL (on
which chmod +x is done automatically by the WP1 JobWrapper) should therefore perform a chmod
+x on all the executable files (lecture_new_gome_V2_sel1_PT_10 in this example)
transferred within the InputSandbox of the job.
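When the job Executable is itself a Python script, the same fix can be applied with the standard os and stat modules before the helper programs are started. The sketch below is only an illustration under that assumption: the helper names make_executable and restore_exec_bits are hypothetical, and the execute bits are added for user, group and others, which corresponds to chmod a+x.

```python
import os
import stat


def make_executable(path):
    """Add the execute permission bits to `path`, like `chmod a+x`."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


def restore_exec_bits(names, directory="."):
    """Make every listed InputSandbox file found in `directory` executable.

    Returns the names that were actually found and fixed.
    """
    fixed = []
    for name in names:
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            make_executable(path)
            fixed.append(name)
    return fixed


if __name__ == "__main__":
    # File name taken from the example above; run in the job working directory.
    print(restore_exec_bits(["lecture_new_gome_V2_sel1_PT_10"]))
```

Calling restore_exec_bits at the very beginning of the Executable, before any transferred program is launched, avoids the “Permission denied” failure shown above.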
7.5. JOB RESUBMISSION AND RETRYCOUNT
It is important to note that there are particular cases, for example temporary network
outages in the proximity of the CE, that can make the WMS “think” the job has failed (there is
currently no way to distinguish such situations by means of error reporting from the underlying
components) and hence trigger job resubmission while the job is still running on the WN. This
can result in one or even more copies of the same job running on different CEs, since
the WMS is not able to kill the original job as long as the network is down.
Due to the above-mentioned possibility (although it should occur very rarely), it is advisable for jobs
performing sensitive operations (e.g. committing data into a DB) to disable the WMS resubmission
feature. This can easily be done on a per-job basis by setting the RetryCount attribute in the job
description to 0, and on a “per-session” basis by setting the RetryCount parameter in the UI
configuration to 0 (see 4.5.3.1 and 4.5.3.2).
7.6. WILDCARD PATTERNS
The wildcard patterns that can be included in the InputSandbox attribute expression are used
by the UI to perform file name “globbing” in a fashion similar to the UNIX csh shell. The result
of the “globbing” is a list of the files whose names match any of the specified patterns.
The admitted special characters together with their meaning are listed hereafter:
- *        wildcard for any string
- ?        wildcard for any single character
- [chars]  delimits a wildcard matching any of the enclosed characters. If chars
           contains a sequence of the form a-b then any character between a and
           b (inclusive) will match. Such an expression can be negated by means
           of the special character “!” ([!chars] matches any character not in
           chars).
EXAMPLES
Consider a directory where “ls -F” gives:
1file
2files
ABS
a1
a1.f
ab
ab.f
apple.f
apple.o
apps/
bob
bob.f
bob.o
foo.c
foo.f
gh
h4374.f
h4374.o
john
john.f
john.o
mydir/
stuff/
That is to say some files and directories. The examples below show the way the mentioned
wildcards are expanded (the notation => indicates the result of typing the command).
1) Every two letter file name:
echo ?? => a1 ab gh
2) Every two character name starting with “a”:
echo a? => a1 ab
3) Every file starting with j, o, h, or n:
echo [john]* => h4374.f h4374.o john john.f john.o
4) Include a range, e.g. everything starting with an upper case letter or a digit:
echo [A-Z0-9]* => 1file 2files ABS
5) Negate a range:
echo [!john]*.f => a1.f ab.f apple.f bob.f foo.f
6) Every file starting in “a” and ending in .f:
echo a*.f => a1.f ab.f apple.f
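The expansions above can be reproduced outside the UI with Python's standard fnmatch module, which understands the same four constructs (*, ?, [chars] and [!chars]). This is only a way to experiment with patterns; it is an assumption that the UI globbing behaves identically in every corner case. The helper name glob_names is hypothetical, and the directory entries are listed without the trailing “/” that “ls -F” appends.

```python
from fnmatch import fnmatchcase

# The directory content used in the examples above.
NAMES = ["1file", "2files", "ABS", "a1", "a1.f", "ab", "ab.f", "apple.f",
         "apple.o", "apps", "bob", "bob.f", "bob.o", "foo.c", "foo.f", "gh",
         "h4374.f", "h4374.o", "john", "john.f", "john.o", "mydir", "stuff"]


def glob_names(pattern, names=NAMES):
    """Return the names matching a shell-style wildcard pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


for pattern in ["??", "a?", "[john]*", "[A-Z0-9]*", "[!john]*.f", "a*.f"]:
    print("echo", pattern, "=>", " ".join(glob_names(pattern)))
```

fnmatchcase is used instead of fnmatch so that matching stays case-sensitive on every platform.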
7.7. THE MATCH MAKING ALGORITHM
The main task performed by the RB (aka Matchmaker) is to find the most suitable Computing
Element on which to execute the job. In order to accomplish this task the RB interacts with the
other EDG components. More precisely, the Replica Location Service (RLS) and the
GOUT/Information Index (II) are the two main components which supply the RB with all the
information required for the actual resolution of the matches between job requirements and
Computing Element capabilities (i.e. runtime environments, data access features, processing
resources etc.).
The following sections provide a description of the matchmaking algorithm performed by the
RB. To this end it is worth identifying three different scenarios to be dealt with separately:
- direct job submission,
- job submission without data-access requirements,
- job submission with data-access requirements.
7.7.1. Direct Job Submission
The simplest scenario is the one where the JDL submitted by the UI contains a
link to the resource to which the job is to be submitted, i.e. the Computing Element identifier
(CEId). In this case the RB doesn't perform any matchmaking at all, but simply submits the job
to the specified CE.
[Figure 3: the UI sends the JDL, containing CE = lxde01.pd.infn.it:2119/jobmanager-lsf-grid01,
over the WAN / LAN to the WM - RB, which submits the job over the WAN / LAN to
lxde01.pd.infn.it:2119/jobmanager-lsf-grid01.]
Figure 3 - Submission with specified CEId
It should be pointed out that, if the CEId is specified, the RB neither checks whether the
user who submitted the job is authorised to access the given CE, nor interacts with the RLS
for the resolution of file requirements, if any. The only check performed by the RB is the JDL
syntax check, made while converting the JDL into a ClassAd.
7.7.2. Job submission without data-access requirements
Let us go one step further and consider the scenario where the user specifies a job with
given execution requirements, but without data constraints. Once the JDL has been received
by the RB and successfully converted into a ClassAd (job-ad), the RB starts the actual
matchmaking algorithm to find out whether the characteristics and status of Grid resources
match the job requirements.
The matchmaking algorithm consists of two phases: the requirements check and the
rank computation.
During the requirements check phase the RB contacts the GOUT/II in order to create a set of
CEs suitable for executing the job, i.e. CEs that comply with the user requirements and where
the user is also authorized to submit jobs. Since the CE attributes involved in
the JDL requirements (defined by the user to express his/her needs) usually refer to “static”
information (such as operating system, installed software runtime environments, etc.), the
information cached in the GOUT/II is a good source for testing
matches between job requirements and CE features. Using it is clearly more efficient than
contacting each CE to find out the same information.
[
...
requirements = other.GlueCEInfoLRMSType == "pbs" &&
member("CMS3.2", other.GlueHostApplicationSoftwareRunTimeEnvironment);
rank = other.GlueCEStateFreeCPUs;
...
]
[Figure 4: the UI sends the JDL above over the WAN / LAN to the WM - RB, which retrieves
information about CEs from the II / GOUT over the WAN / LAN. Suitable CEs:
skurut.cesnet.cz:2119/jobmanager-pbs-wp1 and bbq.mi.infn.it:2119/jobmanager-pbs-dque.]
Figure 4 - Requirements checking phase
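The requirements checking phase amounts to filtering the cached CE descriptions with two conditions: the user must be authorized on the CE and the requirements expression must evaluate to true. The Python sketch below illustrates this with plain dictionaries. The function names, the dictionary keys, the user names and the attribute values assigned to each CE are all hypothetical; in the real system the requirements are a ClassAd expression evaluated by the RB against the information published in the GOUT/II.

```python
def requirements_check(ces, user, requirement):
    """Phase 1: keep the CEs that authorize `user` and satisfy `requirement`."""
    return [ce["id"] for ce in ces
            if user in ce["authorized_users"] and requirement(ce)]


# Hypothetical snapshot of the cached CE information (values are invented).
CACHED_CES = [
    {"id": "skurut.cesnet.cz:2119/jobmanager-pbs-wp1", "lrms": "pbs",
     "runtime_env": ["CMS3.2"], "authorized_users": {"alice"}},
    {"id": "bbq.mi.infn.it:2119/jobmanager-pbs-dque", "lrms": "pbs",
     "runtime_env": ["CMS3.2"], "authorized_users": {"alice", "bob"}},
    {"id": "lxde01.pd.infn.it:2119/jobmanager-lsf-grid01", "lrms": "lsf",
     "runtime_env": ["CMS3.2"], "authorized_users": {"alice"}},
]


def wants_pbs_with_cms(ce):
    """Python rendering of the requirements expression of the example JDL."""
    return ce["lrms"] == "pbs" and "CMS3.2" in ce["runtime_env"]


print(requirements_check(CACHED_CES, "alice", wants_pbs_with_cms))
```

No CE is contacted in this phase: the whole selection runs on cached data, which is why it suits attributes that rarely change.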
Once the RB has created the set of suitable CEs where the job can be executed, it
performs the second phase of the matchmaking algorithm, which allows it to acquire
information about the “quality” of the suitable CEs just found.
In the ranking phase the RB directly contacts the LDAP server (i.e. the GRIS) of each involved
CE to obtain the values of the attributes appearing in the rank expression of the received
JDL. It should be pointed out that, unlike in the previous phase, it is better to contact
each suitable CE rather than use the GOUT/II as the source of information, since the rank
attributes usually refer to variables that change very frequently over time (e.g. FreeCPUs,
FreeMemory).
If there are two or more CEs that meet all the requirements and have the same best rank,
then the CE is chosen among them at random (all these CEs have the same
probability of being chosen).
The default policy the RB adopts while performing the matchmaking algorithm is to select the
CEs with the maximum rank value. Therefore, the higher the frequency at which the variables
involved in the rank expression change their values, the higher the probability that the
matchmaking algorithm yields different CEs. As explained in [A1], it is also possible to enable
“fuzziness” in the matchmaking, i.e. to force the matchmaking algorithm to adopt a stochastic
selection criterion while searching for the best matching CE. This can be done by specifying the
following attribute
FuzzyRank = true;
in the submitted JDL. In this case, the rank value associated with each matching CE represents the
probability that the CE has of being selected as the best matching one. Namely, the higher
the rank value, the higher the probability of being selected.
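The two selection policies can be contrasted in a few lines of Python. This is a minimal sketch under stated assumptions, not the RB implementation: the function name choose_ce and the CE names are hypothetical, rank values are assumed to be positive, and the fuzzy policy is assumed to draw a CE with probability proportional to its rank, which is one possible reading of the description above.

```python
import random


def choose_ce(ranks, fuzzy=False, rng=random):
    """Pick a CE id from `ranks`, a dict mapping CE id to a positive rank."""
    if not ranks:
        raise ValueError("no suitable CE")
    ids = list(ranks)
    if fuzzy:
        # Stochastic policy (assumed proportional): higher rank, higher probability.
        return rng.choices(ids, weights=[ranks[i] for i in ids], k=1)[0]
    best = max(ranks.values())
    # Default policy: uniform random choice among the CEs at maximum rank.
    return rng.choice([i for i in ids if ranks[i] == best])


ranks = {"ce-a": 12, "ce-b": 12, "ce-c": 3}
print(choose_ce(ranks))              # ce-a or ce-b, never ce-c
print(choose_ce(ranks, fuzzy=True))  # any of the three, ce-c being the least likely
```

With the default policy a CE below the maximum rank is never selected, whereas with the fuzzy policy it keeps a small chance, which spreads jobs over more resources when ranks are close to each other.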
Rank computation is depicted in Figure 5.
[
...
requirements = other.GlueCEInfoLRMSType == "pbs" &&
member("CMS3.2", other.GlueHostApplicationSoftwareRunTimeEnvironment);
rank = other.GlueCEStateFreeCPUs;
...
]
[Figure 5: the diagram shows the UI, the WM - RB and the II / GOUT connected over the
WAN / LAN, the example JDL above, the retrieval of information about CEs and the suitable CEs
skurut.cesnet.cz:2119/jobmanager-pbs-wp1 and bbq.mi.infn.it:2119/jobmanager-pbs-dque.]
Figure 5 - Rank computation phase
7.7.3. Job submission with data-access requirements
The Resource Broker interacts with the WP2 Replica Management services in order to find
the most suitable Computing Element, taking into account the Storage Elements where
the input data sets are physically stored and where the output data sets should be staged on
completion of job execution.
Before describing the actions taken by the RB upon reception of a JDL where both data-access
and computing requirements are present, it is worth recalling the JDL attributes which
represent a data requirement at the RB side: OutputSE, InputData and DataAccessProtocol,
respectively representing the Storage Element (SE) where the output file should be staged,
the input files (LFNs, GUIDs) required as input for the actual job execution and the protocol
“spoken” by the application to access such files.
The two main phases of the matchmaking algorithm performed by the RB remain
unchanged, but the RB executes the requirements check and the ranking for each class of CEs
satisfying the data-access requirements. Additionally, the RB performs a pre-match
processing to find and classify those CEs satisfying both data-access and user
authorisation requirements.
During the pre-match processing phase the RB contacts the RLS in order to resolve logical
file names and collect all the information about the SEs containing at least one input data file.
This information is used to write the broker-info-file, which is shipped, within
the input sandbox, to the WN where the execution will take place.
At this point the RB is ready to start the CE classification procedure, during which the RB
contacts the II/GOUT in order to find the CEs that satisfy the authorization requirements
and have the OutputSE “close” to them. Using the information retrieved during the file
name resolution, the RB classifies those CEs according to the number of input files stored
in storage element(s) which is (are) close to the CE itself and speak at least one of the
protocols specified in the DataAccessProtocol JDL attribute.
ClassifyCEOnDataAccess
Retrieve SEs information and resolve LFN -> PFN mapping if so required
While there exists a ComputingElement (CEId) such that AuthorizedUser = JDL.CertificateSubject
    Is JDL.OutputSE a CloseSE for CEId?
    No:  proceed with the next CE
    Yes: For each StorageElement close to CEId
             Does at least one SEProtocol for this SE belong to DataAccessProtocol?
             Yes: retrieve the MountPoint and count the number N of distinct InputData
                  files it gives access to
             No:  proceed with the next SE
         End For
         Files_x_CE[CEId] = files.size()
End While
Figure 6 - CEs classification procedure
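The classification procedure can be sketched in Python as a function that builds the Files_x_CE table. This is an illustration under stated assumptions rather than the RB code: the function name classify_ces, the dictionary-based data model and all sample names (CEs, SEs, user, LFNs, protocols) are hypothetical, and the information that the RB obtains from the RLS and the II/GOUT is supplied here as ready-made dictionaries.

```python
def classify_ces(ces, close_ses, se_info, job):
    """Return {ce_id: number of distinct input files reachable from that CE}.

    ces:       ce_id -> set of authorized users
    close_ses: ce_id -> set of SEs close to that CE
    se_info:   se_id -> (set of protocols spoken, set of input files stored)
    job:       dict with "user", "output_se", "protocols" and "input_files"
    """
    files_x_ce = {}
    for ce_id, authorized_users in ces.items():
        if job["user"] not in authorized_users:
            continue  # the user is not authorized on this CE
        if job["output_se"] not in close_ses[ce_id]:
            continue  # the OutputSE is not close to this CE
        files = set()
        for se in close_ses[ce_id]:
            protocols, stored = se_info[se]
            if protocols & job["protocols"]:
                # The SE speaks a requested protocol: count its input files once.
                files |= stored & job["input_files"]
        files_x_ce[ce_id] = len(files)
    return files_x_ce


ces = {"ce1": {"alice"}, "ce2": {"alice"}, "ce3": {"bob"}}
close_ses = {"ce1": {"se1", "se2"}, "ce2": {"se2"}, "ce3": {"se1"}}
se_info = {
    "se1": ({"gridftp"}, {"lfn:a", "lfn:b"}),
    "se2": ({"file", "gridftp"}, {"lfn:b", "lfn:c"}),
}
job = {"user": "alice", "output_se": "se2", "protocols": {"gridftp"},
       "input_files": {"lfn:a", "lfn:b", "lfn:c"}}
print(classify_ces(ces, close_ses, se_info, job))
```

A set is used to accumulate the files so that a file replicated on two close SEs (lfn:b in the sample data) is counted only once for the CE.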
Upon completion of the CE data classification, the RB is ready for the actual matchmaking
and starts the requirements checking phase for each CE belonging to the first non-empty
class of CEs, i.e. the class whose CEs can access the highest number of distinct files. If a CE
doesn't satisfy the user requirements it is removed from its class. The requirements checking
phase is repeated, class after class, until at least one CE matching the user requirements is found.
Once the requirements checking phase is completed, either the RB knows a set of CEs
satisfying both data-access and computing requirements and having access to the maximum
number of distinct input files, or no suitable CE matching such
requirements exists. In the first case the RB starts the ranking phase in order to find the best CE to
which to submit the job.
Start
max_files = max(Files_x_CE)
do
    theJob.SuitableCEs = CEs with max_files
    check requirements (during the check requirements phase the CEs which do not match
    the job requirements are removed from theJob.SuitableCEs)
while theJob.SuitableCEs.empty AND --max_files > 0
If theJob.SuitableCEs is empty: No matching resource found!
Otherwise: compute rank, choose the highest ranked CE
End
Figure 7 - Match-Making algorithm
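The loop of Figure 7 can be sketched in Python as follows. The function name match, the callable arguments and the sample data are hypothetical. The sketch follows the termination condition as read from the figure (the search stops once the decremented max_files is no longer greater than 0) and, for brevity, resolves equal ranks by taking the first CE found instead of choosing at random.

```python
def match(files_x_ce, satisfies_requirements, rank):
    """Return the best CE id, or None when no matching resource is found.

    files_x_ce:             ce_id -> number of distinct input files accessible
    satisfies_requirements: callable(ce_id) -> bool (requirements check)
    rank:                   callable(ce_id) -> number (rank computation)
    """
    if not files_x_ce:
        return None
    max_files = max(files_x_ce.values())
    while True:
        # Class of CEs with max_files files, minus those failing the requirements.
        suitable = [ce for ce, n in files_x_ce.items()
                    if n == max_files and satisfies_requirements(ce)]
        if suitable:
            return max(suitable, key=rank)  # highest ranked CE
        max_files -= 1
        if max_files <= 0:
            return None  # "No matching resource found!"


files_x_ce = {"ce1": 3, "ce2": 2, "ce3": 2}
meets_requirements = {"ce2", "ce3"}
ranks = {"ce1": 10, "ce2": 4, "ce3": 7}
print(match(files_x_ce, meets_requirements.__contains__, ranks.get))
```

In the sample data ce1 has access to the most files but fails the requirements check, so the search moves on to the class of CEs with two files and the rank decides between ce2 and ce3.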
A special case is when the getAccessCost method has been specified as rank (see [A1]),
i.e.:
Rank = other.DataAccessCost;
In this case the CE is chosen among the CEs that satisfy the non-data requirements (i.e. the
ones specified in the Requirements JDL expression) and on which the user is allowed to submit
jobs. The choice of the “best” CE among them is delegated to the getAccessCost
function, i.e. the CE from which the cost of accessing the data is the lowest is the “best” one.