Oracle Grid Engine Installation and Upgrade Guide

Oracle® Grid Engine
Installation and Upgrade Guide
Release 6.2 Update 7
E21973-02
February 2012
Copyright © 2000, 2012, Oracle and/or its affiliates. All rights reserved.
Primary Author: Uma Shankar
Contributor: Andy Schwierskott
This software and related documentation are provided under a license agreement containing restrictions on
use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your
license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license,
transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse
engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is
prohibited.
The information contained herein is subject to change without notice and is not warranted to be error-free. If
you find any errors, please report them to us in writing.
If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it
on behalf of the U.S. Government, the following notice is applicable:
U.S. GOVERNMENT RIGHTS Programs, software, databases, and related documentation and technical data
delivered to U.S. Government customers are "commercial computer software" or "commercial technical data"
pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As
such, the use, duplication, disclosure, modification, and adaptation shall be subject to the restrictions and
license terms set forth in the applicable Government contract, and, to the extent applicable by the terms of
the Government contract, the additional rights set forth in FAR 52.227-19, Commercial Computer Software
License (December 2007). Oracle America, Inc., 500 Oracle Parkway, Redwood City, CA 94065.
This software or hardware is developed for general use in a variety of information management
applications. It is not developed or intended for use in any inherently dangerous applications, including
applications that may create a risk of personal injury. If you use this software or hardware in dangerous
applications, then you shall be responsible to take all appropriate fail-safe, backup, redundancy, and other
measures to ensure its safe use. Oracle Corporation and its affiliates disclaim any liability for any damages
caused by use of this software or hardware in dangerous applications.
Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of
their respective owners.
Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks
are used under license and are trademarks or registered trademarks of SPARC International, Inc. AMD,
Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced
Micro Devices. UNIX is a registered trademark of The Open Group.
This software or hardware and documentation may provide access to or information on content, products,
and services from third parties. Oracle Corporation and its affiliates are not responsible for and expressly
disclaim all warranties of any kind with respect to third-party content, products, and services. Oracle
Corporation and its affiliates will not be responsible for any loss, costs, or damages incurred due to your
access to or use of third-party content, products, or services.
Contents
Preface
  Audience
  Documentation Accessibility
  Related Documents
  Conventions
1 Planning the Installation
  System Requirements
  Disk Space Requirements
  Supported Operating Platforms
  Planning Checklist
  Cluster Design
  Cells
  Cluster Name
  Queue Structure
  Host System Requirements
  Master Host
  Shadow Master Hosts
  Execution Hosts
  Administration Hosts
  Submit Hosts
  User Account Considerations
  User Names
  Installation Accounts
  File Access Permissions
  Network Services
  Installation Methods
  Directory Organization
  Spool Directories under the Root Directory
  Choosing Between Classic Spooling and Database Spooling
  $SGE_ROOT Directory
  Spooling Options
  Database Server and Spooling Host
  Scheduler Profiles
  Getting the Software
  Electronic Download
  CD-ROM Distribution
2 Installing Grid Engine
  Loading the Distribution Files on a Workstation
  How to Load the Distribution Files on a Workstation
  pkgadd Method
  tar Method
  Installing the Software With the GUI Installer
  Requirements
  Express Installation
  Using the Express Installation Mode
  Custom Installation
  Using the Custom Installation Mode
  How to Configure Password-less Access for the root User
  Configuring Password-less ssh Access for the root User
  Configuring Password-less rsh Access for the root User
  Understanding Host and Installation States
  Host Resolving
  Host Validating
  Installation States
  Tweaking start_gui_installer
  Description of start_gui_installer Options
  Using start_gui_installer Options
  Installing as a Different connect_user
  Installing Single Windows Execution Host
  Installing Multiple Windows Execution Hosts
  Troubleshooting the GUI Installer
  FAQs
  Known issues and workarounds
  Installing the Software From the Command Line
  Installation Overview
  Performing an Installation
  How to Install the Master Host
  Installing the Master Host
  Example Master Host Installation
  How to Install Shadow Master Host
  Starting a Shadow Master Host Manually
  Configuring Shadow Master Host Environment Variables
  Example Shadow Master Host Installation
  How to Install Execution Hosts
  Example Execution Host Installation
  How to Register Administration Hosts
  How to Register Submit Hosts
  How to Install the Berkeley DB Spooling Server
  Installing the Increased Security Features
  Why Install the Increased Security Features?
  Additional Setup Required
  How to Install a CSP-Secured System
  How to Generate Certificates and Private Keys for Users
  How to Renew Certificates
  How to Check Certificates
  Displaying a Certificate
  Check Issuer
  Check Subject
  Show Email of Certificate
  Show Validity
  Show Fingerprint
  Verifying the Installation
  How to Verify That the Daemon is Running on the Master Host
  How to Verify That the Daemons Are Running on the Execution Hosts
  How to Run Simple Commands
  How to Submit Test Jobs
  Automating the Installation Process
  Automatic Installation
  Special Considerations
  Using the inst_sge Utility and a Configuration Template
  How to Automate Installation With Increased Security (CSP)
  How to Automate Other Installations Through a Configuration File
  How to Automate the Master Host Installation
  Automating Other Installations Through a Configuration File
  Automatic Uninstallation
  How to Uninstall Execution Hosts Automatically
  How to Uninstall the Master Host Automatically
  How to Uninstall the Shadow Master Host
  How to Start the Automatic Backup
  Troubleshooting Automatic Installation and Uninstallation
  Installing SMF Services
  Why Install SMF Services?
  Additional Setup Required
  How Do SMF Services Compare to the Normal Services?
  qmaster Daemon
  shadowd Daemon
  execd Daemon
  Berkeley RPC Server
  dbwriter Software
  Installing a JMX-Enabled System
  Additional Setup Required
  How to Install a JMX Agent-Enabled System
  How to Generate Certificates, Private Keys and Keystores for Users
  How to Check Certificates, Private Keys and Keystores for Users
  JMX Configuration Files
  jaas.config
  java.policy
  management.properties
  jmx.access
  jmx.password
  logging.properties
  Testing and Troubleshooting
  Removing the Software
  How to Remove the Software Interactively
  How to Remove the Software Using the inst_sge Utility and a Configuration Template
  Additional Software for the Microsoft Operating System
  Additional Software
  Microsoft Services for UNIX
  Unsupported Grid Engine Functionality
  Configuring User Name Mapping
  How to Install Microsoft Services for Unix
  System Requirements
  Services for UNIX Installation
  Post SFU Installation Tasks
  Troubleshooting SFU
  Microsoft Subsystem for UNIX-based Applications
  Unsupported Grid Engine Functionality
  How to Install a Microsoft Subsystem for UNIX-based Applications
  System Requirements
  Installing Subsystem for UNIX-based Applications
  Post Installation Tasks
  Troubleshooting Microsoft Subsystem for UNIX-based Applications
  Changing Default Behavior to Case Sensitivity
  Disabling DEP
  How to Disable DEP for Windows XP Professional, Windows Server 2000 and Windows Server 2003
  How to Disable DEP for Windows Vista (Enterprise and Ultimate) and Windows Server 2008
  Enabling suid Behavior for Interix Programs
  User Management on Windows Hosts
  Managing Users on Windows Hosts
  Windows User Example
  UNIX User Management
  Using Grid Engine in a Microsoft Windows Environment
  Registering Windows User Passwords
  Using the sgepasswd Command
  Adding Windows Hosts to Existing Grid Engine Systems
  How to Add Windows Hosts Later
  Other Installation Issues
  How to Verify and Install Linux Motif Libraries
  How to Install the Software on a System with IPMP
  What Is IP Multipathing?
  Issues Between IPMP and Grid Engine
  Installing the Grid Engine Master Node With IPMP
  Ignoring the Error Messages
  Temporarily Disabling IPMP
  Installing a Grid Engine on an Execution Host With IPMP
  Enabling Administrative and Submit Hosts With IPMP
3 Upgrading Grid Engine
  About Upgrading the Software
  Before You Upgrade
  Constraints
  How to Back Up the Configuration of the Old Cluster
  What the Backup Contains
  How to Back Up the Cluster
  How to Install the 6.2 Software Using the Cloned Configuration Method
  Additional Constraints for the New 6.2 Installation with Cloned Configuration
  Example Upgrade for Cloned Cluster Configuration
  How to Upgrade the Original Cluster to the 6.2 Software (Real Upgrade)
  How to Upgrade from 5.3 to 6.0
A Configuration File Templates
  Configuration File Template
Preface
The Oracle Grid Engine Installation and Upgrade Guide describes how to install Grid
Engine and how to upgrade Grid Engine from a previous version.
Audience
This document is intended for system administrators who install or upgrade
Oracle Grid Engine.
Documentation Accessibility
For information about Oracle's commitment to accessibility, visit the Oracle
Accessibility Program website at
http://www.oracle.com/pls/topic/lookup?ctx=acc&id=docacc.
Access to Oracle Support
Oracle customers have access to electronic support through My Oracle Support. For
information, visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=info or
visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=trs if you are hearing
impaired.
Related Documents
For more information, see the following documents in the Oracle Grid Engine Release
6.2 documentation set:
■ Oracle Grid Engine Release Notes
■ Oracle Grid Engine User Guide
■ Oracle Grid Engine Administration Guide
Conventions
The following text conventions are used in this document:
Convention  Meaning
italic      Italic type indicates book titles, emphasis, or placeholder variables for which you supply particular values.
monospace   Monospace type indicates commands within a paragraph, URLs, code in examples, text that appears on the screen, or text that you enter.
1 Planning the Installation
Before you install the Grid Engine software, you must plan how to achieve the results
that fit your environment. This section helps you make the decisions that affect the rest
of the procedure.
System Requirements
To verify that the systems on which you intend to install Grid Engine conform to
required hardware and software specifications, review the system requirements listed
below.
Disk Space Requirements
The Grid Engine software directory tree has the following fixed disk space
requirements:
■ 50 Mbytes for the installation files without any binaries
■ Between 60 and 100 Mbytes for each set of binaries
The ideal disk space for Grid Engine system spool directories is as follows:
■ 50-200 Mbytes for the master host spool directories
■ 50-200 Mbytes for the Berkeley DB spool directories
The spool directories of the master host and of the execution hosts are configurable
and need not reside under the default location, sge-root.
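The figures above can be combined into a rough sizing estimate. The following Python sketch is illustrative only: the function name, the constant names, and the idea of counting "spool areas" are assumptions made for this example and are not part of the Grid Engine distribution.

```python
# Illustrative estimate only; all names here are hypothetical.
INSTALL_FILES_MB = 50        # installation files without any binaries
BINARIES_MB = (60, 100)      # low and high figure for each set of binaries
SPOOL_MB = (50, 200)         # low and high figure for one spool area
                             # (master host spool or Berkeley DB spool)


def estimate_disk_mb(binary_sets, spool_areas=1):
    """Return a (low, high) estimate in Mbytes for the given layout."""
    if binary_sets < 1 or spool_areas < 0:
        raise ValueError("need at least one set of binaries and a non-negative spool count")
    low = INSTALL_FILES_MB + binary_sets * BINARIES_MB[0] + spool_areas * SPOOL_MB[0]
    high = INSTALL_FILES_MB + binary_sets * BINARIES_MB[1] + spool_areas * SPOOL_MB[1]
    return low, high


if __name__ == "__main__":
    # Example: binaries for two architectures plus one spool area.
    low, high = estimate_disk_mb(binary_sets=2, spool_areas=1)
    print(f"Plan for roughly {low}-{high} Mbytes")
```

Because spool directories can be placed outside sge-root, the spool portion of the estimate might apply to a different file system than the portion for installation files and binaries.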
Note: You must satisfy several Windows platform-specific
prerequisites before you can install Grid Engine on hosts that are
running the Windows operating system. You might need to install
additional software on your computer, which might require additional
disk space. See Microsoft Services for UNIX and Microsoft Subsystem
for UNIX-based Applications.
Supported Operating Platforms
The Grid Engine 6.2 software supports the following operating systems and platforms:
Master Host
■ Solaris 11, 10, 9, and 8 Operating Systems (SPARC Platform Edition)
■ Solaris 9 Operating System (x86 Platform Edition)
■ Solaris 11 and 10 Operating Systems (x64 Platform Edition)
■ Linux x86, kernel 2.4 and higher, glibc >= 2.3.2
■ Linux x64, kernel 2.4 and higher, glibc >= 2.3.2
Compute Host
■ Solaris 11, 10, 9, and 8 Operating Systems (SPARC Platform Edition)
■ Solaris 9 Operating System (x86 Platform Edition)
■ Solaris 11 and 10 Operating Systems (x64 Platform Edition)
■ Linux x86, kernel 2.4 and higher, glibc >= 2.3.2
■ Linux x64, kernel 2.4 and higher, glibc >= 2.3.2
■ Linux IA64, kernel 2.4, 2.6, glibc >= 2.3.2
■ Apple Mac OS X 10.4 (Tiger), PPC platform
■ Apple Mac OS X 10.4 (Tiger), x86 platform
■ Apple Mac OS X 10.5 (Leopard), x86 platform
■ Hewlett Packard HP-UX 11.00 or higher, 32 bit
■ Hewlett Packard HP-UX 11.00 or higher, 64 bit (including HP-UX on IA64)
■ IBM AIX 5.1, 5.3, 6.1
■ Microsoft Windows:
    ■ Server 2003
    ■ XP Professional with at least Service Pack 1
    ■ 2000 Server with at least Service Pack 3
    ■ 2000 Professional with at least Service Pack 3
    ■ Server 2003 Release 2
    ■ Server 2008
    ■ Vista Enterprise
    ■ Vista Ultimate
Planning Checklist
Before you install the Grid Engine software, you must plan how to achieve the results
that fit your environment. This section helps you make the decisions that affect the rest
of the procedure. Write down your installation plan in a table similar to the following
example.
Table 1–1 Planning Checklist
Parameter                                                           Value
$SGE_ROOT directory                                                 ___________________________
Cell name                                                           ___________________________
$SGE_CLUSTER_NAME                                                   ___________________________
Administrative user                                                 ___________________________
sge_qmaster port number (6444 is recommended)                       ___________________________
sge_execd port number (6445 is recommended)                         ___________________________
Master host                                                         ___________________________
Shadow master hosts                                                 ___________________________
Execution hosts                                                     ___________________________
Spooling for each execution host (global or local)                  ___________________________
Windows execution hosts (yes or no)                                 ___________________________
Administration hosts                                                ___________________________
Submit hosts                                                        ___________________________
Group ID range for jobs                                             ___________________________
Spooling mechanism (Berkeley DB or Classic spooling)                ___________________________
Berkeley DB server host (the master or another host)                ___________________________
Berkeley DB spooling directory on the database server               ___________________________
Scheduler tuning profile (Normal, High, Max)                        ___________________________
Installation method (interactive, secure, automated, or upgrade)    ___________________________
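If you prefer to keep the plan in a file rather than on paper, a small shell fragment can record a few of the checklist values and run basic sanity checks. The sketch below is illustrative only: the PLAN_ variable names, the sample values, and the check_plan function are hypothetical and are not read by any Grid Engine tool. The rule that the two ports differ is an assumption made here because both daemons can run on the same host.

```shell
#!/bin/sh
# Hypothetical planning record. These names and sample values are
# illustrative and are not read by any Grid Engine tool.
PLAN_SGE_ROOT=/opt/sge6-2
PLAN_CELL=default
PLAN_QMASTER_PORT=6444
PLAN_EXECD_PORT=6445
PLAN_SPOOLING=classic        # "classic" or "berkeleydb"
PLAN_SCHED_PROFILE=normal    # "normal", "high", or "max"

# Return 0 when the recorded plan is internally consistent, 1 otherwise.
check_plan() {
    case $PLAN_QMASTER_PORT in
        ''|*[!0-9]*) echo "qmaster port is not numeric"; return 1 ;;
    esac
    case $PLAN_EXECD_PORT in
        ''|*[!0-9]*) echo "execd port is not numeric"; return 1 ;;
    esac
    # Assumption: the daemons may share a host, so the ports must differ.
    if [ "$PLAN_QMASTER_PORT" -eq "$PLAN_EXECD_PORT" ]; then
        echo "qmaster and execd must use different ports"; return 1
    fi
    case $PLAN_SPOOLING in
        classic|berkeleydb) ;;
        *) echo "unknown spooling mechanism"; return 1 ;;
    esac
    case $PLAN_SCHED_PROFILE in
        normal|high|max) ;;
        *) echo "unknown scheduler profile"; return 1 ;;
    esac
    echo "plan looks consistent"
}

check_plan
```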
If you are going to install Grid Engine 6.2 on Microsoft Windows Server 2003,
Windows XP Professional with at least Service Pack 1, Windows 2000 Server with at
least Service Pack 3, or Windows 2000 Professional with at least Service Pack 3, acquire
and install Microsoft Services For UNIX. See Microsoft Services for UNIX for more
information.
If you are going to install Grid Engine 6.2 on Microsoft Windows Server 2003 Release
2, Windows Server 2008, Windows Vista Enterprise or Windows Vista Ultimate,
acquire and install Microsoft Subsystem for UNIX-based Applications. See Microsoft
Subsystem for UNIX-based Applications for more information.
If you are going to install Grid Engine 6.2 on a Windows system, create the required
Certificate Security Protocol (CSP) certificates before installing Grid Engine. See How
to Install a CSP-Secured System for information about CSP certificates.
Check Other Installation Issues for applicability.
Cluster Design
Cells
You can set up the Grid Engine system as a single cluster or as a collection of loosely
coupled clusters called cells. The $SGE_CELL environment variable indicates the cluster
being referenced. When the Grid Engine system is installed as a single cluster,
$SGE_CELL is not set, and the value default is assumed for the cell value.
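Scripts that need the cell name can apply the same fallback with shell parameter expansion. The following is a minimal sketch; the function names are hypothetical, and /opt/sge6-2 is only a sample root directory.

```shell
#!/bin/sh
# Resolve the cell name as described above: use $SGE_CELL when it is set,
# otherwise fall back to "default". Function names are hypothetical.
sge_cell_name() {
    echo "${SGE_CELL:-default}"
}

# Path of the per-cell common directory below a given root directory.
sge_common_dir() {
    echo "$1/$(sge_cell_name)/common"
}

sge_common_dir "${SGE_ROOT:-/opt/sge6-2}"
```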
Cluster Name
The $SGE_CLUSTER_NAME environment variable supports unique naming of the cluster.
Unlike the $SGE_CELL variable, there are restrictions on $SGE_CLUSTER_NAME. If you
decide to use Grid Engine SMF services on Solaris 10 or later hosts, you must select a
new $SGE_CLUSTER_NAME. This name becomes part of the name of the Grid Engine SMF
services. The $SGE_CLUSTER_NAME is also used to distinguish multiple rc files for
different clusters.
Note: If your $SGE_CELL name already reflects the desired cluster
name and also satisfies $SGE_CLUSTER_NAME restrictions, set the cluster
name to the $SGE_CELL value. Otherwise, the proposed default value
is pSGE_QMASTER_PORT (the letter p followed by the qmaster port number,
for example p6444), which uniquely identifies the running cluster
by the port on which its qmaster daemon is running. See Installing
SMF Services for more information.
Queue Structure
The installation procedure creates a default cluster queue structure, which is suitable
for getting acquainted with the system. The default queue can be removed after
installation.
Note: No matter what directory is used for the installation of the
software, the administrator can change most settings that were created
by the installation procedure. This change can be made while the
system is running.
Consider the following when determining a queue structure:
■
Whether you need cluster queues for sequential, interactive, parallel, and other job
types
■
Which queue instances to put on which execution hosts
■
How many job slots are needed in each queue
For more detailed information on administering cluster queues, see Oracle Grid Engine
Administration Guide.
Host System Requirements
Master Host
The master host controls the Grid Engine system. This host runs the master daemon
sge_qmaster.
The master host must comply with the following requirements:
■
The host must be a stable platform.
■
The host must not be excessively busy with other processing.
■
At least 60 to 120 Mbytes of unused main memory must be available to run the
Grid Engine system daemons. For very large clusters that include many hundreds
or thousands of hosts and tens of thousands of jobs in the system at any time, 1
GByte or more of unused main memory might be required and 2 CPUs might be
beneficial.
■
The master host must be installed before shadow master, execution,
administration, or submit hosts.
■
(Optional) The Grid Engine software directory, $SGE_ROOT, should be installed
locally to cut down on network traffic.
Note:
Windows hosts cannot act as master hosts.
For more information, see How to Install the Master Host.
Shadow Master Hosts
These hosts back up the functionality of sge_qmaster in case the master host or the
master daemon fails. To be a shadow master host, a machine must have the following
characteristics:
■
It must run sge_shadowd.
■
It must share sge_qmaster status, job information, and queue configuration
information that is logged to disk. In particular, the shadow master hosts need
read/write root or administration user access to the sge_qmaster spool directory
and to the $SGE_ROOT/$SGE_CELL/common directory.
■
The $SGE_ROOT/$SGE_CELL/common/shadow_masters file must contain a line
defining the host as a shadow master host.
Note: If no cell name is specified during installation, the value of
$SGE_CELL is default.
The shadow master host facility is activated for a host as soon as these conditions are
met. You do not need to restart the Grid Engine system daemons to make a host into a
shadow master host.
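The third condition can be checked with a short script. This sketch assumes that the shadow_masters file lists one host name per line and that the name is spelled exactly as the host reports it. The function name is hypothetical.

```shell
#!/bin/sh
# Return 0 if HOST appears on a line of its own in the given shadow_masters
# file. Assumes one host name per line; the function name is hypothetical.
is_shadow_master() {
    host=$1
    file=$2
    grep -qxF "$host" "$file"
}

# Usage sketch (the path follows the text above):
#   is_shadow_master "$(hostname)" \
#       "$SGE_ROOT/${SGE_CELL:-default}/common/shadow_masters"
```

The -x option makes grep match whole lines only, so a host named shadow1 does not match an entry for shadow10.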
Note:
Windows hosts cannot act as shadow master hosts.
For more information, see How to Install Shadow Master Host.
Execution Hosts
Execution hosts run the jobs that users submit to the Grid Engine system. An
execution host must first be set up as an administration host. You run an installation
script on each execution host. For more information, see How to Install Execution
Hosts.
Administration Hosts
Operators and managers of the Grid Engine system use administration hosts to
perform administrative tasks such as reconfiguring queues or adding Grid Engine
users.
The master host installation script automatically makes the master host an
administration host. During the master host installation process, you can add other
administration hosts. You can also manually add administration hosts on the master
host at any time after installation.
Submit Hosts
Jobs can be submitted and controlled from submit hosts. The master host installation
script automatically makes the master host a submit host.
User Account Considerations
User Names
For the Grid Engine system to verify that users submitting jobs have permission to
submit them on the desired execution hosts, users' names must be identical on the
submit and execution hosts. You might therefore have to change user names on some
machines, because Grid Engine user names map directly to system user accounts.
Note: User names on the master host are not relevant for permission
checking. These user names do not have to match or even exist.
Installation Accounts
You can install the Grid Engine software either as the root user or as an unprivileged
user, for example, your own user account. However, if you install the software when
you are logged in as an unprivileged user, the installation allows only that user to run
Grid Engine jobs. Access is denied to all other accounts. Installing the software when
you are logged in as root resolves this restriction. However, root permission is
required for the complete installation procedure. Also, if you install as an unprivileged
user, you are not allowed to use the qrsh, qtcsh, or qmake commands, nor can you run
tightly integrated parallel jobs.
To use SMF on Solaris 10 or later hosts and run the Grid Engine software as an
unprivileged user, perform the following additional steps as root user (or user with
appropriate permissions):
1.
Create the new role sgeadmin for the local user:
roleadd -c "Grid Engine SMF Administrator" -g <group> -d <home_dir> -u <UID> \
    -s <profile_shell> -P "solaris.smf.manage.sge" "sgeadmin"
2.
Assign the just-created role sgeadmin to the user:
usermod -R "sgeadmin" <login>
For a distributed name service, such as NIS, NIS+, or LDAP, create the new role
sgeadmin and assign it to the user:
/usr/sadm/bin/smrole add -D <domain_name> -- -n "sgeadmin" -a "normal_user" \
    -d <home_dir> -c "Grid Engine SMF Administrator" -p "solaris.smf.manage.sge"
File Access Permissions
If you install the software logged in as root, you might have a problem configuring
root read/write access for all hosts on a shared file system. Therefore, you might have
problems putting the $SGE_ROOT files onto a network-wide file system.
You can force Grid Engine software to run all Grid Engine system components
through a non-root administrative user account, for example sgeadmin. With this
setup, this particular user needs only read/write access to the shared $SGE_ROOT file
system.
The installation procedure asks whether files should be created and owned by an
administrative user account. If you answer "Yes" and provide a valid user name, files
are created by this user. Otherwise, the user name under which you run the
installation procedure is used. Create an administrative user, and answer "Yes" to this
question.
Make sure in all cases that the account used for file handling on all hosts has
read/write access to the $SGE_ROOT directory. Also, the installation procedure assumes
that the host from which you access the Grid Engine software distribution media can
write to the $SGE_ROOT directory.
Note: The name of the root user on Windows hosts depends on the
system language of the Windows operating system. You can even
change the name of the root user. The default name for many
languages is the name Administrator.
If your Windows host is a member of a Windows domain, only the local
Administrator is the root user. Neither the members of the Administrators group, nor
the domain Administrator, nor a member of the Domain Admins group are the root
user. See User Management on Windows Hosts for more information about users on
Windows hosts.
Network Services
Determine whether your site's network services are defined in an NIS database or in
an /etc/services file that is local to each workstation. If your site uses NIS, determine
the host name of your NIS server so that you can add entries to the NIS services map.
The Grid Engine system services are sge_execd and sge_qmaster. To add the services
to your NIS map, choose reserved, unused port numbers. The following examples
show sge_qmaster and sge_execd entries.
sge_qmaster 6444/tcp
sge_execd 6445/tcp
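Before you install, you can confirm that both entries exist in a local services file. The following is a minimal sketch for the /etc/services case only; it does not query NIS, and the function name is hypothetical.

```shell
#!/bin/sh
# Print the name of each Grid Engine service that has no tcp entry in the
# given services file. Prints nothing and returns 0 when both are present.
# The function name is hypothetical.
missing_sge_services() {
    file=${1:-/etc/services}
    status=0
    for svc in sge_qmaster sge_execd; do
        # Match the service name in field 1 and "<port>/tcp" in field 2.
        if ! awk -v s="$svc" \
            '$1 == s && $2 ~ /^[0-9]+\/tcp$/ { found = 1 } END { exit !found }' \
            "$file"; then
            echo "$svc"
            status=1
        fi
    done
    return $status
}

# Usage sketch:
#   missing_sge_services /etc/services || echo "add the entries listed above"
```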
Installation Methods
Several methods are available for installing the Grid Engine software:
■
Interactive
■
Interactive, with increased security
■
Automated, using the inst_sge script and a configuration file
■
Upgrade
To decide which installation method you should use, consider the following factors.
■
Do you already have the Grid Engine software installed and running?
    ■
    If so, you will probably want to upgrade. The upgrade process is described in
    Upgrading Grid Engine.
    ■
    If not, the master host installation is only done once. The master host is
    typically installed interactively, as described in Installing the Software With
    the GUI Installer or Installing the Software From the Command Line.
■
Do you need to install just a few execution hosts? If so, then you will probably
want to install them interactively, as described in Installing the Software With the
GUI Installer or Installing the Software From the Command Line.
■
Do you need to install a large number of execution hosts? If so, then you might
want to perform automated installation, using the inst_sge script and a
configuration file. See Using the inst_sge Utility and a Configuration Template.
■
Do you require your grid to use encryption? If so, you have to perform an
interactive installation with increased security. See Installing the Increased
Security Features.
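The questions above amount to a small decision procedure, which can be sketched as a shell function. The function name and argument order are hypothetical. The threshold of 10 execution hosts that separates "a few" from "a large number" is an arbitrary assumption, not a figure from this guide.

```shell
#!/bin/sh
# Suggest an installation method from three planning answers.
# Arguments: already installed (yes/no), encryption required (yes/no),
# number of execution hosts. The 10-host threshold is an arbitrary
# assumption made for this sketch.
suggest_install_method() {
    if [ "$1" = yes ]; then
        echo "upgrade"
    elif [ "$2" = yes ]; then
        echo "interactive with increased security"
    elif [ "$3" -gt 10 ]; then
        echo "automated"
    else
        echo "interactive"
    fi
}

suggest_install_method no no 200    # prints "automated"
```

The suggestion applies mainly to execution hosts. The master host is installed once and is typically installed interactively.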
Directory Organization
When determining the directory organization, you must decide the following:
■
The directory organization, for example, whether you will install a complete
software tree on each workstation, cross-mounted directories, or a partial
directory tree on some workstations.
■
Where to locate each $SGE_ROOT root directory.
Note: Because changing the installation directory or the spool
directories requires a new installation of the system, use extra care to
select a suitable installation directory. You can preserve all important
information from a previous installation.
By default, the installation procedure installs the Grid Engine software, man pages,
spool areas, and the configuration files in a directory hierarchy under the installation
directory as shown in the following figure. If you accept this default behavior, you
should install or select a directory with the access permissions that are described in
File Access Permissions.
Figure 1–1 Sample Directory Hierarchy
You can choose to put the spool areas in other locations during the primary
installation. See Oracle Grid Engine Administration Guide for more detailed instructions
about configuring queues.
Spool Directories under the Root Directory
During the installation of the master host, you must specify the location of a spooling
directory. This directory is used to spool jobs from execution hosts that do not have a
local spooling directory.
Note: If you are using a Windows execution host, you must use the
local spooling directory.
■
On the master host, spool directories are maintained under qmaster-spool-dir.
The location of qmaster-spool-dir is defined during the master host installation
process. The default value of qmaster-spool-dir is
$SGE_ROOT/$SGE_CELL/spool/qmaster.
■
On each execution host, a spool directory called execd-spool-dir is defined
during the execution host installation processes. The default value of
execd-spool-dir is $SGE_ROOT/$SGE_CELL/spool/exec-host. You will get better
performance from execution hosts with local spooling directories than from
execution hosts that have NFS mounted the master host's spooling directory.
Note: If no cell name is specified during installation, the value of
$SGE_CELL is default.
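To see what the default spool locations expand to for a given root directory, cell, and execution host, a short function is enough. This is a sketch only: the function name is hypothetical, and /opt/sge6-2 and node01 are sample values.

```shell
#!/bin/sh
# Print the default spool locations described above.
# Arguments: root directory, cell name (empty means "default"),
# execution host name. The function name is hypothetical.
default_spool_dirs() {
    root=$1
    cell=${2:-default}
    exec_host=$3
    echo "qmaster: $root/$cell/spool/qmaster"
    echo "execd: $root/$cell/spool/$exec_host"
}

default_spool_dirs /opt/sge6-2 "" node01
```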
You do not need to export these directories to other machines. However, exporting the
entire $SGE_ROOT tree and making it write-accessible for the master host and all
execution hosts makes administration easier.
Note: If you use a Lustre fileshare as the spool directory, you should
disable file striping for these directories. For information about how to
disable file striping, refer to the Lustre operation manual located at:
http://wiki.lustre.org/index.php/Lustre_Documentation.
Choosing Between Classic Spooling and Database Spooling
During the installation, you are given the option to choose between classic spooling
and Berkeley DB spooling. If you choose Berkeley DB spooling, you are then given the
option to spool to a local directory or to a separate host, known as a Berkeley DB
spooling server.
Using a Berkeley DB spooling server might provide better performance than classic
spooling. Part of this performance increase is because the master host can make
non-blocking writes to the database, but has to make blocking writes to the text file
used by classic spooling. Also consider file format and data integrity. Writing to the
Berkeley DB provides a greater level of data integrity than writing to a text file.
However, a text file stores data in a format that you can read and edit. Normally, you
do not need to read these files, but the spooling directory contains the messages from
the system daemons, which can be useful for debugging.
$SGE_ROOT Directory
You must create a directory into which to load the contents of the distribution media.
This directory is called the root directory, or $SGE_ROOT. When the Grid Engine system
is running, this directory stores the current cluster configuration and all other data that
must be spooled to disk.
Note: For efficient spooling, place the spooling directories
somewhere other than within $SGE_ROOT.
Use a valid path name for the directory that is network-accessible on all hosts. For
example, if the file system is mounted using automounter, set $SGE_ROOT to /usr/SGE6,
not to /tmp_mnt/usr/SGE6.
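If a script derives the root directory from the current working directory, it can remove the automounter prefix before setting $SGE_ROOT. This sketch handles only the /tmp_mnt prefix used in the example above. The function name is hypothetical.

```shell
#!/bin/sh
# Strip a leading /tmp_mnt automounter prefix so that the result is the
# path that every host sees. Other paths are printed unchanged.
# The function name is hypothetical.
network_path() {
    case $1 in
        /tmp_mnt/*) echo "${1#/tmp_mnt}" ;;
        *)          echo "$1" ;;
    esac
}

network_path /tmp_mnt/usr/SGE6    # prints /usr/SGE6
```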
Note: Throughout this information space, the $SGE_ROOT
environment variable is used to refer to the directory into which the
Grid Engine software is installed.
The $SGE_ROOT directory is the top level of the Grid Engine software directory tree. On
startup, each Grid Engine software component in a cell needs read access to the
$SGE_ROOT/$SGE_CELL/common directory. When Grid Engine software is installed as a
single cluster, the value of $SGE_CELL is default.
For ease of installation and administration, this directory should be readable on all
hosts on which you intend to run the Grid Engine software installation procedure. For
example, you can select a directory that is available across a network file system, such
as NFS. If you choose to select file systems that are local to the hosts, you must copy
the installation directory to each host before you start the installation procedure for the
particular machine. See File Access Permissions for a description of required
permissions.
Spooling Options
During the installation, you are given the option to choose between classic spooling
and Berkeley DB spooling. If you choose Berkeley DB spooling, you are then given the
option to spool to a local directory or to a separate host, known as a Berkeley DB
spooling server.
Using a Berkeley DB spooling server might provide better performance than classic
spooling. Part of this performance increase is because the master host can make
non-blocking writes to the database, but has to make blocking writes to the text file
used by classic spooling. Also consider file format and data integrity. Writing to the
Berkeley DB provides a greater level of data integrity than writing to a text file.
However, a text file stores data in a format that you can read and edit. Normally, you
do not need to read these files, but the spooling directory contains the messages from
the system daemons, which can be useful for debugging.
Database Server and Spooling Host
The master host can store its configuration and state to a Berkeley DB spooling
database. The spooling database can be installed on the master server or on a separate
host. When the Berkeley DB spools into a local directory on the master host, the
performance is better. If you want to set up a shadow master host, you need to use a
separate Berkeley DB spooling server (host). In this case, you have to choose a host
with a configured RPC service. The master host connects through RPC to the Berkeley
DB.
Note: This configuration does not provide a High-Availability (HA)
solution. For example, scripts of pending jobs are not spooled through
BDB spool server and thus are not available for a shadow master.
With the introduction of NFS4 software available with the Solaris 10 operating system,
you can use Berkeley DB spooling on a network file system. You could not use
Berkeley DB spooling on previous NFS versions. This circumstance allows a shadow
host installation spooled on Berkeley DB without setting up an additional Berkeley DB
Spooling Server.
Caution: Although using a shadow master host is more reliable,
using a separate Berkeley DB spooling host results in a potential
security hole. RPC communication as used by the Berkeley DB can be
easily compromised. Only use this alternative if your site is secure and
if users can be trusted to access the Berkeley DB spooling host by
means of TCP/IP communication.
If you choose to use Berkeley DB spooling without a shadow master, you do not need
to set up a separate spooling server. Likewise, if you choose not to use Berkeley DB
spooling, you can set up a shadow master host without setting up a separate spooling
server.
Once you determine whether you need a separate spooling server, you will also need
to determine the location for the spooling directory. The spooling directory must be
local to the spooling server. A default value for the location of the spooling directory is
recommended during installation, but this default value is not suitable when the file
server is different from the master host.
The requirements for the Berkeley DB spooling host are similar to the requirements for
the master host:
■
The host must be a stable platform.
■
The host must not be excessively busy with other processing.
■
At least 60 to 120 Mbytes of unused main memory must be available to run the
Grid Engine system daemons. For very large clusters that include many hundreds
or thousands of hosts and tens of thousands of jobs in the system at any time, one
GByte or more of unused main memory might be required and two CPUs might
be beneficial.
■
(Optional) A separate spooling host must be installed before the master host.
■
(Optional) The $SGE_ROOT directory should be installed locally, to cut down on
network traffic.
Scheduler Profiles
You can choose from three scheduler profiles during the installation process: normal,
high, and max. You can use these predefined profiles as a starting point for Grid
Engine tuning.
Using these profiles, you can optimize the scheduler for one or more of the following:
■
The amount of information that is tracked about a scheduling run
■
The load adjustment during a scheduling run
■
Interval scheduling (the default) or immediate scheduling
You can choose from three scheduler profiles:
■
normal - This profile uses load adaptation and interval scheduling, and reports all
the information that the scheduler gathers during the dispatch cycle. This profile is
the starting point for most grids. Use this profile if your highest priority is
gathering and reporting information about a scheduling run.
■
high - This profile is more appropriate for a large cluster, where throughput is
more important than gathering and reporting all the information from the
scheduler. This profile also uses interval scheduling. Use this profile if you want to
get better performance at the cost of getting less information about your
scheduling runs.
■
max - This profile disables all information gathering and reporting, enables
immediate scheduling, and disables load adaptation. Immediate scheduling is
very useful for sites with high throughput and very short running jobs. The
advantage of immediate scheduling decreases as runtime of the jobs increases.
This profile can be used in clusters of any size where only throughput is important
and everything else is a lower priority.
For more information on how to configure scheduling, see Oracle Grid Engine
Administration Guide.
Getting the Software
The software is distributed through electronic download and on CD-ROM.
Electronic Download
To electronically download a copy of the Grid Engine software, visit SUN.COM. The
product distribution is in pkgadd format for the Solaris Operating System (Solaris OS).
If you would like to download a copy of the open source Grid Engine software, visit
the download center.
CD-ROM Distribution
For information on how to access CD-ROMs, ask your system administrator or refer to
your local system documentation. For instructions, see Loading the Distribution Files
on a Workstation.
2 Installing Grid Engine
To effectively install Grid Engine, perform the following tasks in the order that they
are listed:
Table 2–1 Installation Tasks
Topic                                                     Description
Planning the Installation                                 Strategically plan your installation to achieve results that
                                                          fit your environment.
Loading the Distribution Files on a Workstation           Unpack and load the distribution files onto a workstation.
Installing the Software With the GUI Installer            Learn how to run the new GUI installer and install a whole
                                                          cluster.
Installing the Software From the Command Line             Learn how to run an installation script on the master host
                                                          and on every execution host in the Grid Engine system
                                                          and to register information about administration hosts
                                                          and submit hosts.
Installing the Increased Security Features                Set up your system more securely.
Oracle Grid Engine User’s Guide                           Install the Accounting and Reporting Console, an
                                                          optional feature that enables you to gather live reporting
                                                          data from the Grid Engine system.
Verifying the Installation                                Verify that the daemons are running on the master host and
                                                          on the execution hosts, and learn how to run simple
                                                          commands and submit test jobs.
In addition, you might need to perform one or more related tasks:
Table 2–2 Additional Installation Tasks
Topic                                                     Description
Automating the Installation Process                       Learn how to automate the Grid Engine installation process.
Installing SMF Services                                   Learn how to install the Service Management Facility (SMF)
                                                          services.
Installing a JMX-Enabled System                           Learn how to install a JMX-enabled system.
Removing the Software                                     Learn how to remove the Grid Engine software.
Additional Software for the Microsoft Operating System    Learn how to install Grid Engine on Microsoft Windows
                                                          operating system.
User Management on Windows Hosts                          Learn how to manage user accounts on Windows hosts.
Other Installation Issues                                 Learn how to identify additional considerations for installing
                                                          Grid Engine software.
Loading the Distribution Files on a Workstation
The Grid Engine 6.2 software is distributed on CD-ROM and through electronic
download. The CD-ROM distribution contains a directory named
Sun_Grid_Engine_6_2. The product distribution is in this directory, in both tar.gz format and the pkgadd
format. The pkgadd format is provided for the Solaris Operating System (Solaris OS).
For all supported operating systems, the software is distributed in tar.gz format. For
more on how to obtain the distribution files, see Getting the Software.
How to Load the Distribution Files on a Workstation
Ensure that the file systems and directories that are to contain the Grid Engine
software distribution and the spool and configuration files are set up properly by
setting the access permissions as defined in File Access Permissions.
1.
Provide access to the distribution media. If you downloaded the software, rather
than getting it on CD-ROM, just unzip the files into a directory. This directory
must be located on a file system that has at least 350 MBytes free space.
2.
Log in to a system. Log in preferably on a system that has a direct connection to a
file server.
3.
Create the installation directory. Create an installation directory as described in
$SGE_ROOT Directory.
# mkdir /opt/sge6-2
In these instructions, the installation directory is abbreviated as sge-root.
4.
Install the binaries for all binary architectures that are to be used by any of your
master, execution, and submit hosts in your Grid Engine system cluster. You can
use either the pkgadd Method or the tar Method.
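Step 1 asks for at least 350 MBytes of free space on the file system that receives the unpacked files. One way to check this is with df. The sketch below relies on the POSIX output format selected by the -P option, in which the fourth column is the available space. The function name is hypothetical.

```shell
#!/bin/sh
# Return 0 if the file system that holds DIR has at least NEED_MB Mbytes free.
# df -Pk prints POSIX-format output in 1024-byte units; column 4 of the
# second line is the available space. The function name is hypothetical.
has_free_mb() {
    dir=$1
    need_mb=$2
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((need_mb * 1024)) ]
}

if has_free_mb . 350; then
    echo "enough space for the distribution files"
else
    echo "less than 350 Mbytes free"
fi
```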
pkgadd Method
The pkgadd format is provided for the Solaris Operating System. To facilitate remote
installation, the pkgadd directories are also provided in zip files.
You can install the following packages:
Table 2–3 Installing Packages Using Pkgadd Method
Package       Description
SUNWsgeec     Architecture independent files
SUNWsgeex     Solaris (SPARC platform) 64-bit binaries for Solaris 8, Solaris 9, and Solaris 10 Operating Systems
SUNWsgeei     Solaris (x86 platform) binaries for Solaris 8, Solaris 9, and Solaris 10 Operating Systems
SUNWsgeeax    Solaris (x64 platform) binaries for Solaris 10 Operating System
SUNWsgeea     Accounting and Reporting Console (ARCo) packages for the Solaris and Linux Operating systems.
As you type the following commands, you must be prepared to respond to script
questions about your base directory, sge-root, and the administrative user. The script
requests the choices that you made during the planning steps of this installation. See
Planning the Installation for further details.
At the command prompt, type the following commands, responding to the script
questions.
# cd cdrom_mount_point/Sun_Grid_Engine_6_2
# pkgadd -d ./Common/Packages SUNWsgeec
Depending on the Solaris binary that you need, type one of the following commands:
# pkgadd -d ./Solaris_sparc/Packages SUNWsgee
# pkgadd -d ./Solaris_sparc/Packages SUNWsgeex
# pkgadd -d ./Solaris_x86/Packages SUNWsgeei
# pkgadd -d ./Solaris_x64/Packages SUNWsgeeax
tar Method
For all supported operating systems, the software is distributed in tar.gz format.
Regardless of platform, install the architecture independent file
Common/tar/sge-6_2-common.tar.gz.
The tar files that contain platform-specific binaries use the naming convention of
sge-6_2-bin-architecture.tar.gz.
The following table lists the platform-specific binaries. Install the file for each platform
that you need to support. Note that each platform has its own directory under
Sun_Grid_Engine_6_2.
Table 2–4 Installing Binaries Using Tar Method
Platform-Specific File                                Platform
Solaris_sparc/tar/sge-6_2-bin-solaris-sparcv9.tar.gz  Solaris (SPARC platform) 64-bit binaries for Solaris 8, Solaris 9, and Solaris 10 Operating Systems
Solaris_x86/tar/sge-6_2-bin-solaris-i586.tar.gz       Solaris (x86 platform) binaries for Solaris 8, Solaris 9, and Solaris 10 Operating Systems
Solaris_x64/tar/sge-6_2-bin-solaris-x64.tar.gz        Solaris (x64 platform) 64-bit binaries for Solaris 10
Windows/tar/sge-6_2-bin-windows-x86.tar.gz            Microsoft Windows (x86 platform) 32-bit binaries for Windows 2000, XP and Windows Server 2003
Linux24_i586/tar/sge-6_2-bin-linux24-i586.tar.gz      Linux (x86 platform) binaries for the 2.4 and 2.6 kernel
Linux24_amd64/tar/sge-6_2-bin-linux24-ia64.tar.gz     Linux (Itanium platform) binaries for the 2.4 and 2.6 kernel
Linux24_amd64/tar/sge-6_2-bin-linux24-x64.tar.gz      Linux binaries for the 2.4 and 2.6 kernel
MacOSX/tar/sge-6_2-bin-darwin-ppc.tar.gz              Apple Mac OS/X (PowerPC platform)
MacOSX/tar/sge-6_2-bin-darwin-x64.tar.gz              Apple Mac OS/X (Intel-based platform)
HPUX11/tar/sge-6_2-bin-hp11.tar.gz                    Hewlett-Packard HP-UX 11 or higher
HPUX11/tar/sge-6_2-bin-hp11-64.tar.gz                 64-bit binaries for Hewlett-Packard HP-UX 11 or higher
Aix43/tar/n1ge-6_1-bin-aix51.tar.gz                   IBM AIX 5.1 and 5.3
Type the following commands at the command prompt. In the example, <basedir> is
the abbreviation for the full directory, cdrom-mount-point/Sun_Grid_Engine_6_2.
% su
# cd <sge-root>
# gzip -dc <basedir>/Common/tar/sge-6_2-common.tar.gz | tar xvpf -
# gzip -dc <basedir>/Solaris_sparc/tar/sge-6_2-bin-solsparc32.tar.gz | tar xvpf -
# gzip -dc <basedir>/Solaris_sparc/tar/sge-6_2-bin-solsparc64.tar.gz | tar xvpf -
# SGE_ROOT=<sge-root>; export SGE_ROOT
# util/setfileperm.sh $SGE_ROOT
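When several platform archives are needed, the unpacking sequence above can be wrapped in a small helper. The following is a minimal sketch and not part of the product: the function name unpack_archives and its argument order are assumptions made for illustration, and archive paths are assumed to be absolute because the helper changes into the sge-root directory before unpacking. It uses the same gzip and tar pipeline as the manual steps, without the v flag to keep the output short.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Grid Engine): unpack a list of
# distribution archives into the sge-root directory.
# Usage: unpack_archives <sge-root> <absolute-path-to-archive.tar.gz>...
unpack_archives() {
    sge_root=$1
    shift
    for archive in "$@"; do
        if [ ! -f "$archive" ]; then
            echo "missing archive: $archive" >&2
            return 1
        fi
        # Same pipeline as the manual steps; "-" makes tar read from stdin.
        # The subshell keeps the caller's working directory unchanged.
        (cd "$sge_root" && gzip -dc "$archive" | tar xpf -) || return 1
    done
}
```

After the archives are unpacked, set SGE_ROOT and run util/setfileperm.sh as shown in the manual steps.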
Installing the Software With the GUI Installer
A GUI installer that simplifies the installation process has been available since Grid
Engine 6.2u2. The GUI installer enables you to install a whole cluster interactively. To
install a cluster, you need to set up the environment in a similar way to an automatic
installation.
Requirements
■
The GUI installer requires at least Version 5 of the Java platform.
■
Screen resolution of 1024x768 or larger.
■
(Optional) Password-less ssh or rsh access as root user to all remote hosts that
you want to install. If this requirement is not met, you can only install Grid Engine
components on a local host. You can still use the GUI installer by starting it
locally from each remote host. For more information, see How to Configure
Password-less Access for the root User.
■
Start the installer as root user.
■
Ensure that you start the installation from the qmaster host when password-less
root access is available.
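Before starting the installer, the optional password-less access requirement can be checked from the qmaster host. The following is a rough sketch under stated assumptions: the function name check_passwordless and the SSH_CMD variable are hypothetical, and the default command relies on the OpenSSH options BatchMode and ConnectTimeout, which may not exist in other ssh implementations.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of Grid Engine).
# SSH_CMD is an assumption that lets you substitute another remote
# shell; the default uses OpenSSH options.
SSH_CMD=${SSH_CMD:-"ssh -o BatchMode=yes -o ConnectTimeout=5"}

# Usage: check_passwordless <host>...
# Returns 0 only if every host accepts a root login without a prompt.
check_passwordless() {
    status=0
    for host in "$@"; do
        # BatchMode=yes makes ssh fail instead of asking for a password.
        if $SSH_CMD "root@$host" true </dev/null >/dev/null 2>&1; then
            echo "$host: ok"
        else
            echo "$host: password-less access not working"
            status=1
        fi
    done
    return $status
}
```

For example, check_passwordless grid00 grid01 prints one line per host. Hosts that fail the check can still be installed by starting the GUI installer locally on each of them.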
For information on installation modes supported by the GUI installer, see these topics:
Topic
Description
Express Installation
Enables first-time users to install the software easily. Provides a
significantly reduced set of parameters that need to be
configured. Requires password-less ssh access as root user to all
remote hosts that you want to install.
Custom Installation
Enables you to configure almost all existing options that are
available during the command-line installation. Offers more
advanced features for the cluster host selection. Requires
password-less ssh or rsh access as root user to all remote hosts
that you want to install.
For additional reference information, see these topics:
Topic
Description
How to Configure Password-less Access for the root User
Procedure for configuring password-less ssh or rsh access for the root user to install
a whole Grid Engine (SGE) cluster by using the GUI installer.
Understanding Host and Installation States
Describes the different installation states that you might encounter while using the
GUI installer.
Tweaking start_gui_installer
Describes the command-line options of the start_gui_installer command and how to
use them to fine-tune the performance of the installer.
Troubleshooting the GUI Installer
Contains known issues and their workarounds.
Express Installation
The express installation mode is targeted at first-time users and provides a
significantly reduced set of parameters to configure, with reasonable default values
for most of them. You must have password-less ssh or scp access if you plan to
install Grid Engine on remote hosts. The following steps describe a complete cluster
installation and assume that password-less access is configured.
Using the Express Installation Mode
The express installation steps are as follows.
1.
Start the GUI installer. On the welcome screen, click Next.
Note: Ensure that you start the GUI installer on the qmaster host. As
root, run the start_gui_installer command in your sge-root
directory. For example:
master:/sge# ./start_gui_installer
Starting Installer ...
2.
Choose components to install. Click Next. See the following table for a brief
explanation of options displayed on this screen.
Host type
Description
Qmaster host
Main component in Grid Engine software. You must install
exactly one qmaster component per Grid Engine cluster
installation.
Execution host(s)
Hosts that execute the tasks (jobs).
Shadow host(s)
Hosts that provide a high availability feature to the cluster. In case
the qmaster fails (for example, due to a crash or network issue),
one of the shadow hosts takes over the qmaster responsibility.
Berkeley db host
Selecting this host type implies the Berkeley db spooling server option. Grid Engine
then spools data to a remote server. Not recommended as the
default option.
If you are not sure what you want to install, keep the components selected by
default.
3.
Modify the main configuration details. Click Next.
Figure 2–1 Main Configuration Information
Option
Description
Admin user
Grid Engine processes will be executed under this user name, and
certain directories will be owned by this user.
Qmaster host
Host that will run qmaster daemon (main component). It can be
changed later in the host selection.
Grid Engine root
directory
Directory where you unpacked Grid Engine tar.gz archive or
installed a package (for example, rpm, pkg). It must not contain an
automounter prefix.
Cell name
Name of this Grid Engine cell, a value that identifies an instance of
Grid Engine when several instances run simultaneously.
Cluster name
Name of this Grid Engine instance used by SMF on Solaris
machines. In express installation mode, this field is hidden and
has a default value of p6444. The following naming restrictions
apply to this field: The cluster name must start with a letter
([A-Za-z]), followed by letters, digits ([0-9]), dashes ("-"), or
underscores ("_").
Qmaster port
Port that will be used by the qmaster daemon. Default value is
6444.
Execd port
Port that will be used by the execution daemon. Default value is
6445.
Administrator mail
Email address used by Grid Engine to report issues to the grid
administrator. Default value is none (no emails will be sent).
Automatically start
service(s) at machine
boot
Component (service) will be automatically started at machine boot.
By default, this is selected.
Typically, you provide a valid administrator email address and click Next.
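The naming restriction on the cluster name can be checked before the name is entered. The following is a minimal sketch, not the installer's own validation: the function name valid_cluster_name is hypothetical, and the sketch assumes that a single letter is an acceptable name and that the shell runs in a locale where the ranges A-Z, a-z, and 0-9 cover only ASCII characters (for example, LC_ALL=C).

```shell
#!/bin/sh
# Sketch: check a candidate cluster name against the documented rule
# (a letter first, then letters, digits, dashes, or underscores).
# Returns 0 if the name is acceptable, 1 otherwise.
valid_cluster_name() {
    case $1 in
        '')                 return 1 ;;  # empty name
        [!A-Za-z]*)         return 1 ;;  # first character is not a letter
        *[!A-Za-z0-9_-]*)   return 1 ;;  # contains a disallowed character
        *)                  return 0 ;;
    esac
}
```

For example, valid_cluster_name p6444 succeeds, while valid_cluster_name 6444p fails because the name starts with a digit.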
4.
Select hosts to be installed and fix reported problems. Click Install to start the
installation on the reachable hosts.
Figure 2–2 Selecting Hosts
This screen allows you to select the hosts and components that you would like to
install. Express installation mode has a slightly simplified selection model. Custom
installation mode enables you to change the components that will be selected once
new hosts are added. The qmaster host is added based on the qmaster host value
from the main configuration screen by default. You can select the hosts in one of
two different ways:
■
By a host name, host name pattern, or by an IP address or IP address pattern
■
From a file that you create using the installer's save action
The patterns do not support regular expressions. The supported expressions are
lists and numeric ranges. For more information, see the following table:
Description             Input                                  Resolved Value
Host name               grid00                                 grid00
IP address              192.168.0.1                            192.168.0.1
List of hosts           grid00 grid01 grid03                   grid00 grid01 grid03
List of IP addresses    192.168.0.1 192.168.0.2 192.168.0.5    192.168.0.1 192.168.0.2 192.168.0.5
Host ranges             grid[00-03]                            grid00 grid01 grid02 grid03
Range of IP addresses   192.[168-169].0.[50-60]                192.168.0.50 ... 192.168.0.60,
                                                               192.169.0.50 ... 192.169.0.60
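The list and numeric-range syntax in the table above can be illustrated with a short script. This is an illustrative re-implementation under stated assumptions, not the installer's code: the function names to_decimal, expand_one, and expand_hosts are hypothetical, ranges are assumed to be non-negative decimal numbers, and zero padding is assumed to follow the width of the range's first number.

```shell
#!/bin/sh
# Illustrative re-implementation of the list and numeric-range syntax;
# this is not the installer's own code.

# Strip leading zeros so that "08" is not misread as an octal number.
to_decimal() {
    n=$1
    while [ "${#n}" -gt 1 ] && [ "${n#0}" != "$n" ]; do
        n=${n#0}
    done
    printf '%s\n' "$n"
}

# Expand the first [start-end] range, then recurse on each result.
# The body runs in a subshell "( ... )" so that variables stay local
# to each level of recursion.
expand_one() (
    case $1 in
        *\[*-*\]*)
            prefix=${1%%\[*}      # text before the first '['
            rest=${1#*\[}         # text after the first '['
            range=${rest%%\]*}    # "start-end"
            suffix=${rest#*\]}    # text after the closing ']'
            start=${range%%-*}
            end=${range#*-}
            width=${#start}       # keep zero padding, for example 00..03
            i=$(to_decimal "$start")
            last=$(to_decimal "$end")
            while [ "$i" -le "$last" ]; do
                num=$(printf "%0${width}d" "$i")
                expand_one "$prefix$num$suffix"
                i=$((i + 1))
            done
            ;;
        *)
            printf '%s\n' "$1"
            ;;
    esac
)

# A list is given as separate arguments; expand every item in turn.
expand_hosts() {
    for item in "$@"; do
        expand_one "$item"
    done
}
```

Quote patterns on the command line so that the shell does not treat the brackets as a file name pattern, for example expand_hosts 'grid[00-03]'.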
In the following screen sequence, hosts grid00 to grid10 are added as execution
and submit hosts. However, host grid11 has an error. See Understanding Host
and Installation States for a complete list of errors and possible solutions. Note
that each state has a tooltip that displays a more detailed error message. Once the
errors are resolved on the problematic hosts, select the hosts that you want to verify
and right-click. A pop-up menu enables you to refresh the selected hosts. Optionally,
you can remove invalid hosts. Once the states have been refreshed, either a different
error state or the Reachable state is displayed.
Figure 2–3 Adding Hosts grid00 to grid10
Figure 2–4 Unreachable Host State
5.
(Optional) Modify the host configuration. In the Select hosts screen, select a
host, right-click it, and click Configure. Modify the host configuration, then
click OK.
Figure 2–5 Modifying Host Configuration
Table 2–5 Host Configuration Information
Option
Description
Local execd spool
directory
Directory for local execd spooling data.
JVM library path
Path to the JVM library on the qmaster
and/or shadow hosts.
Additional JVM args
Additional arguments to be used when
starting the JVM in qmaster.
Connect user
The user which will be used to connect to
the remote host using ssh or scp.
Resolve timeout(sec)
Timeout value for any resolving task.
Install timeout (sec)
Timeout value for any installation task.
6.
(Optional) Fix problems reported during pre-install validation, then click Install.
When you click the Install button as described in Step 4, the installation does not
start immediately. First, the installer runs a series of advanced checks on each
host to verify that there is no misconfiguration. If the validation fails, host states
are updated and you can either return to the host selection or continue with the
installation.
Note: Continuing the installation after the installer reports errors
will likely result in a failed installation. Before restarting the
installation, return to the host selection and either resolve
the reported problems or remove the hosts that have configuration
errors.
In the following screen, one host has a configuration error. See Understanding
Host and Installation States for a complete list of errors and possible solutions.
Notice that each state includes a tooltip that displays an error message.
Figure 2–6 Configuration Warning Message
7.
Monitor the progress of the installation, then click Next.
Figure 2–7 Grid Engine Installing Status
Figure 2–8 Success Status of Installing Grid Engine
Figure 2–9 Error Message for Existing Cluster
If there were any failures during the installation, the Failed tab is selected. See
Understanding Host and Installation States for a complete list of installation states.
Click the Log button for each failed installation for more information.
This error is displayed because the cluster name p6444 already exists on this host
(installation was not attempted).
8.
Review the overview information, then click Done.
Figure 2–10 Reviewing Grid Engine Installation Results
Optionally, print or save the information about the Grid Engine configuration for
future reference. The page is also automatically saved to the
$SGE_ROOT/$SGE_CELL/Readme_TIMESTAMP.html file. If the page cannot be saved there
because root is mapped to nobody on an NFS shared file system, it is saved to
/tmp/Readme_TIMESTAMP.html. To verify the installation, go to Verifying the
Installation.
Custom Installation
The custom installation mode is targeted at experienced users. It offers more
advanced customization of the Grid Engine installation than the express installation
mode and provides default values for most of the parameters. You must have
password-less ssh or rsh access if you plan to install Grid Engine on remote hosts.
The following steps assume that password-less access is configured and describe a
cluster installation consisting of:
■
Qmaster host with JMX feature enabled
■
Three execution hosts on various architectures
■
One shadow host
■
One administrative host
■
Four submit hosts
Using the Custom Installation Mode
The custom installation steps are as follows.
1.
Start the GUI Installer. On the welcome screen, click Next.
Note: Ensure that you start the GUI installer on the qmaster host. As
root, run the start_gui_installer command in your sge-root
directory. For example:
master:/sge# ./start_gui_installer
Starting Installer ...
2.
Choose components to install, including a shadow host and the custom
installation option, and click Next. See the following table for a brief explanation
of options displayed on this screen.
Host type
Description
Qmaster host
Main component in Grid Engine software. Exactly one
qmaster component must be installed per Grid Engine
cluster installation.
Execution host(s)
Hosts that execute the tasks (jobs).
Shadow host(s)
Hosts that provide a high availability feature to the
cluster. If the qmaster fails (for example, due to a crash
or network issue), one of the shadow hosts takes over the
qmaster responsibility.
Berkeley db host
Selecting this host type implies the Berkeley db spooling
server option. Grid Engine then spools data to a remote
server. Not recommended as the default option.
3.
Modify the main configuration details. Click Next.
Figure 2–11 Modifying Main Configuration Information
Option
Description
Admin user
Grid Engine processes will be executed under this
user name, and certain directories will be owned by
this user.
Qmaster host
Host that will run qmaster daemon (main
component). It can be changed later in the host
selection.
Grid Engine root directory
Directory where you unpacked the Grid Engine
tar.gz archive or installed a package (for example,
rpm, pkg). It must not contain an automounter prefix.
Cell name
Name of this Grid Engine cell, a value that identifies
an instance of a Grid Engine when several instances
run simultaneously.
Cluster name
Name of this Grid Engine instance used by SMF on
Solaris machines. The following naming restrictions
apply to this field: The cluster name must start with a
letter ([A-Za-z]), followed by letters, digits ([0-9]),
dashes ("-"), or underscores ("_").
Qmaster port
Port that will be used by the qmaster daemon. Default
value is 6444.
Execd port
Port that will be used by the execution daemon.
Default value is 6445.
Group id range
Range of additional group IDs. The group IDs in this
range must not be used anywhere else. The size of the
range determines how many concurrent jobs can run
in Grid Engine. Choose a large value.
Shell name
Shell to be used while connecting to remote hosts
(with ssh or rsh syntax). Expected values for this field
are ssh or rsh.
Copy command
Command to be used while copying files to remote
hosts (with scp or rcp syntax). Expected values for
this field are scp or rcp.
Administrator mail
Email address used by the Grid Engine to report
issues to the grid administrator. Default value is none
(no emails will be sent).
Automatically start service(s)
at machine boot
Component (service) will be automatically started at
machine boot. By default, this is selected.
Use JMX
Triggers installation of a JVM thread in qmaster.
Currently only needed when you plan to install
Service Domain Manager. By default, this is selected.
Ignore domain names
Grid Engine will ignore domain names when
comparing host names. By default, this is selected.
Use CSP product mode
Grid Engine will be installed with certificate security
protocol (CSP). Communication between Grid Engine
daemons will be protected by an SSL certificate. Has
impact on cluster throughput. By default, this is not
selected.
Typically, one would customize the default values and click Next.
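The Group id range option in the table above can be sanity-checked before it is entered. The following is a rough sketch under stated assumptions: the function names gid_range_size and gids_in_use are hypothetical, the range is assumed to be written as first-last, and the value 20000-20100 used in the usage note is an example only. The second function expects group(4)-format input, such as the output of getent group, and uses the awk -v option, which on Solaris is provided by nawk rather than the default awk.

```shell
#!/bin/sh
# Sketch: number of group IDs in a range written as "first-last".
gid_range_size() {
    first=${1%%-*}
    last=${1#*-}
    echo $((last - first + 1))
}

# Sketch: read group(4)-format lines (name:passwd:gid:members) from
# stdin and print the names of groups whose GID lies inside the range.
# Any output means that the range overlaps IDs that are already in use.
gids_in_use() {
    first=${1%%-*}
    last=${1#*-}
    awk -F: -v lo="$first" -v hi="$last" '$3 >= lo && $3 <= hi { print $1 }'
}
```

For example, gid_range_size 20000-20100 prints the number of IDs in that range, and getent group | gids_in_use 20000-20100 lists any existing groups that would conflict with it.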
4.
Modify the JMX configuration details. Click Next.
Figure 2–12 Modifying JMX Configuration Details
Option
Description
JMX port
Port number to be used by JVM thread in qmaster process.
Enable SSL server
authentication
Once enabled, SSL certificate configuration will be presented
later. The server certificate will be used for authentication and
encryption.
Enable SSL client
authentication
Client authentication will be used.
Path to the keystore
Path to Java keystore file that will be created during the qmaster
installation.
Keystore password
Keystore password. Default value is changeit.
Retype password
Password to retype. Default value is changeit.
5.
Modify the spooling configuration. Click Next.
Figure 2–13 Modifying the Spooling Configuration
Option
Description
Qmaster spool directory
Directory for qmaster spooling data.
Global execd spool directory
Execution daemon spool directory that is
used by default for all execution
hosts. Unless overridden in the host
selection screen, each execution host
creates a subdirectory in the global execd
spool directory.
Classic spooling method
Spooling is done in human-readable
format.
Berkeley db spooling method
Spooling is done to a local Berkeley db.
Berkeley db spooling server spooling method
Spooling is done to a Berkeley db server.
Berkeley db host
Host for Berkeley db server, enabled only
when Berkeley db spooling server method
is selected.
Db directory
Berkeley db spooling directory, either on
local host or Berkeley db host, if Berkeley
db spooling server method is selected.
6.
(Optional) Provide SSL certificate information. Click Next.
Figure 2–14 Providing SSL Configuration Details
This screen is displayed only when you have previously selected the JMX or CSP
features. An SSL certificate will be generated as part of qmaster installation. This
certificate will then be used throughout the Grid Engine.
Option
Description
Country code
Two-character country code.
Default value is DE.
State
State. Default value is GERMANY.
Location
Location. Default value is
Building.
Organization
Organization. Default value is
Organisation.
Organization unit
Organization unit. Default value
is Organisation_unit.
Email address
Email address. Default value is
name@yourdomain.com.
7.
Select hosts to be installed and fix reported problems. Click Install to start the
installation on the reachable hosts.
This screen allows you to select the hosts and components that you would like to
install. The qmaster host is added based on the qmaster host value from the main
configuration screen by default. You can select the hosts in one of two different
ways:
■
By a host name, host name pattern, or by an IP address or IP address pattern
■
From a file that you create using the installer's save action
The patterns do not support regular expressions. The supported expressions are
lists and numeric ranges. For more information, see the following table:
Description             Input                                  Resolved Values
Host name               grid00                                 grid00
IP address              192.168.0.1                            192.168.0.1
List of hosts           grid00 grid01 grid05                   grid00 grid01 grid05
List of IP addresses    192.168.0.1 192.168.0.2 192.168.0.5    192.168.0.1 192.168.0.2 192.168.0.5
Host ranges             grid[00-10]                            grid00, grid01, ..., grid10
Range of IP addresses   192.[168-169].0.[50-60]                192.168.0.50 ... 192.168.0.60,
                                                               192.169.0.50 ... 192.169.0.60
In the following screen sequence, six shadow, execution, admin, and submit hosts
are added from a file, and later seven execution and admin hosts are added by
host name. Two hosts (grid11, grid12) have errors; they are unreachable. See
Understanding Host and Installation States for a complete list of errors and
possible solutions. Note that each state has a tooltip that displays a more detailed
error message. Hosts can be refreshed or removed using a context menu. Before
clicking the Add button, you can change the default component selection from
execution and submit host to also include shadow and admin host. The selected
components are applied to any newly added hosts.
Figure 2–15 Selecting Hosts from File
Figure 2–16 Selecting Hosts Using Hostname
Figure 2–17 Unreachable Hosts in Selected Host List
8.
(Optional) Modify the host configuration. In the Select hosts screen, select a
host, right-click it, and click Configure. Modify the host configuration, then
click OK.
Option
Description
Local execd spool
directory
Directory for local execd spooling data.
JVM library path
Path to the JVM library on the qmaster and/or shadow
hosts.
Additional JVM args
Additional arguments to be used when starting the JVM
in qmaster.
Connect user
The user which will be used to connect to the remote
host using ssh or scp.
Resolve timeout (sec)
Timeout value for any resolving task.
Install timeout (sec)
Timeout value for any installation task.
9.
(Optional) Fix problems reported during pre-install validation. Click Install. When
you click the Install button as described in Step 7, the installation does not start
immediately. First, the installer runs a series of advanced checks on each host
to verify that there is no misconfiguration. If the validation fails, host states are
updated and you may return to the host selection or continue with the installation.
Note: Continuing the installation after the installer reports errors
will likely result in a failed installation. Before restarting the
installation, return to the host selection and either resolve
the reported problems or remove the hosts that have configuration
errors.
See Understanding Host and Installation States for a complete list of errors and
possible solutions. An example of a pre-install validation with errors can be found
in Express Installation.
10. Monitor the progress of the installation, then click Next.
If there were any failures during the installation, see Understanding Host and
Installation States for a complete list of installation states. Click the Log button
for each failed installation for more information.
11. Review the overview information, then click Done. Optionally, print or save the
information about the Grid Engine configuration for future reference. The page is
also automatically saved to the $SGE_ROOT/$SGE_CELL/Readme_TIMESTAMP.html
file. If the page cannot be saved there because root is mapped to nobody on an
NFS shared file system, it is saved to /tmp/Readme_TIMESTAMP.html. To repeat the
installation or to install more hosts, click Continue. To verify the installation, go to
Verifying the Installation.
How to Configure Password-less Access for the root User
This section describes how to set up a password-less ssh or rsh access for the root user
to install a whole Grid Engine cluster at once by using the GUI Installer. The Grid
Engine installation must be started on the qmaster host, so you need to first decide
which host is going to be the qmaster host. The following instructions use qmaster as
the qmaster host name. You must replace qmaster with your qmaster host name.
Note: You can skip this procedure if you plan to install Grid Engine
only on a local host.
WARNING: Enabling root login without a password can be a
security risk! The commands and configuration files used in the
following procedure are applicable only to the Solaris 10 operating
system. You can substitute these with commands and configuration
files that are appropriate for your operating system.
Note: Installing a Grid Engine cluster with the CSP option may
additionally require password-less access to the localhost (qmaster
host to the qmaster host).
Configuring Password-less ssh Access for the root User
1.
Enable root login. For security reasons, using ssh as root is disabled on many
platforms by default. Perform the following on each host that you will log in to
using password-less ssh as the root user:
   1. As root, open the /etc/ssh/sshd_config file.
   2. Change PermitRootLogin no to PermitRootLogin yes.
2.
Restart the ssh service on all remote hosts. As root, type the following command.
svcadm disable -st ssh ; svcadm enable ssh
3.
Generate a key pair on the qmaster host. As root, type the following command
to generate the RSA key on the qmaster host. Leave the passphrase empty.
# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
ec:fa:48:55:c4:3d:59:40:a6:27:10:a2:90:11:de:dc root@qmaster
4.
Copy the public key to all remote hosts. Copy the generated public key contained
in the id_rsa.pub file to every remote host that should accept root login without a
password from this host. The following example enables root access to host
grid05 from host qmaster.
qmaster# cat /root/.ssh/id_rsa.pub
ssh-rsa
ACCCB3NzaC1yc2EBBBBBIwAAAIEA1xfRiZMV7xt8EMDollLQH5RTAVz3lIXkr/FTfcbwjuMa0t/PdO9
gBnJY03e1mIIpjDPiqT2IWfdrzHZB4xvl0MBNhMTWf8Gd3WDO4T7/zw7VhlqT6wUl0ncrhzE5BTIMB0
i0X/amgidEzFbL+hE3RvPuowapNZUv+JC1IjDVmmE= root@qmaster
qmaster# ssh grid05
The authenticity of host 'grid05 (192.168.1.5)' can't be established.
RSA key fingerprint is ec:fa:48:55:c4:3d:59:40:a6:27:10:a2:90:11:de:dc.
Are you sure you want to continue connecting (yes/no)? yes
Password:
grid05# mkdir -p ~/.ssh
grid05# echo "ssh-rsa ACCCB3NzaC1yc2EBBBBBIwAAAIEA1xfRiZMV7xt8EMDollLQH5RTAVz3lIXkr/FTfcbwjuMa0t/PdO9gBnJY03e1mIIpjDPiqT2IWfdrzHZB4xvl0MBNhMTWf8Gd3WDO4T7/zw7VhlqT6wUl0ncrhzE5BTIMB0i0X/amgidEzFbL+hE3RvPuowapNZUv+JC1IjDVmmE= root@qmaster" >> ~/.ssh/authorized_keys
5.
Verify that you can connect to the hosts as root without a password. As root,
type the following command.
ssh <remote_password-less_host>
If you are able to connect to the hosts without being prompted, password-less
access to the hosts has been set up. Now, you can invoke the GUI installer using
the start_gui_installer command from your sge-root directory.
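When the public key has to be installed on many hosts, pasting it by hand is error prone. The following is a minimal sketch of an idempotent alternative, assuming that the id_rsa.pub file has already been copied to the remote host: the function name add_authorized_key is hypothetical, the permission values follow common sshd expectations rather than anything stated in this guide, and the grep options -q, -x, and -F are the POSIX ones (on Solaris 10 they are available in /usr/xpg4/bin/grep).

```shell
#!/bin/sh
# Sketch: append a public key to an authorized_keys file only if it is
# not already present, and restrict permissions on the directory and file.
# Usage: add_authorized_key <public-key-file> [ssh-directory]
add_authorized_key() {
    pubkey_file=$1
    ssh_dir=${2:-$HOME/.ssh}
    auth_file=$ssh_dir/authorized_keys

    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
    touch "$auth_file" && chmod 600 "$auth_file" || return 1

    # -F: fixed strings, -x: match whole lines, -f: read patterns from file
    if grep -qxFf "$pubkey_file" "$auth_file"; then
        return 0    # key already installed, nothing to do
    fi
    cat "$pubkey_file" >> "$auth_file"
}
```

Running the function twice with the same key leaves a single entry in the authorized_keys file.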
Configuring Password-less rsh Access for the root User
1.
Enable root login. Normally, the root user can only log in to the console
/dev/console. You can remove this restriction by performing the following.
   1. Open the /etc/default/login file.
   2. Comment out the CONSOLE=/dev/console line by inserting a # character at the
      beginning of the line. You need to perform this for each remote host you
      would like to log in to.
2.
Set up access without a password.
   1. Create a .rhosts file.
   2. Add a single line that contains the qmaster's host name optionally followed by
      a + sign. For example, if foo is the qmaster's host name, add the line foo + or
      simply foo to the .rhosts file.
   3. Copy this file to the root user's home directory on each of the remote hosts
      where you wish to install Grid Engine. This will allow root to log in from the
      qmaster host without a password to any machine that will be part of the
      cluster.
3.
Restart the rlogin service on all remote hosts. As root, type the following command.
svcadm disable -st rlogin ; svcadm enable rlogin
4.
Verify that you can connect to the hosts as root without a password. As root,
type the following command.
rlogin <remote_password-less_host>
If you are able to connect to the hosts without being prompted, password-less
access to the hosts has been set up. Now, you can invoke the GUI installer using
the start_gui_installer command from your sge-root directory. Choose the
Custom Installation mode and replace ssh with rsh and scp with rcp in the Main
configuration panel.
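The two file changes in the rsh procedure can be scripted. The following is a rough sketch under stated assumptions: the function names comment_console and write_rhosts are hypothetical, comment_console only prints the modified content so that you can review it before replacing /etc/default/login, and the restrictive file mode on .rhosts is a common precaution rather than a requirement stated in this guide.

```shell
#!/bin/sh
# Sketch: print a copy of a login defaults file in which an active
# CONSOLE=/dev/console line is commented out. The input file is not
# modified; redirect the output and review it before installing it.
comment_console() {
    sed 's|^CONSOLE=/dev/console|#&|' "$1"
}

# Sketch: write a one-line .rhosts file that names the qmaster host.
# Usage: write_rhosts <qmaster-host-name> <path-to-.rhosts>
write_rhosts() {
    printf '%s\n' "$1" > "$2" && chmod 600 "$2"
}
```

For example, write_rhosts foo /tmp/rhosts.new creates a file that contains the single line foo, which can then be copied to the root user's home directory on each remote host as .rhosts.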
Understanding Host and Installation States
This section lists the different installation states that you might encounter while using
the GUI installer. The installation states can be divided into the following three
categories.
Host Resolving
When a new host is added in the Select hosts screen, the host name State field is
immediately set to New unknown host and host name resolving process is initiated.
The host name is marked as Reachable only if the architecture of the host can be
retrieved. All the other states specify an error. The GUI installer cannot perform any
installation on such a host. The following table lists all possible states.
Table 2–6 Host Installation States
State
Description
New unknown host
Initial state. When the host name is added, the GUI installer
immediately starts resolving the host name or IP address of the
host, if there are available threads in the resolve pool.
Resolving
Temporary state. The host is being resolved based on the host
name or IP address by using the default name service.
Unknown host
Final state. The host cannot be resolved by the name service.
Resolvable
Both temporary and final state. Once the host has been resolved, and if
threads are available in the resolve pool, the installer immediately tries
to get the host's architecture via an ssh or rsh call. If this is the
final state, the installer was probably not able to connect to the
host with ssh or rsh without a password. Check the tooltip message for more
information. Right-click the host, select the Configure action,
and verify that the intended Connect user has been used for the
remote connection to that host.
Contacting
Temporary state. The host has been resolved and the host's
architecture is being retrieved.
Missing remote file
Final state. Missing file '$SGE_ROOT/util/arch' on remote host. Is
the sge-root path the same for the remote host and the local host?
If not, fix the path or refer to using path aliasing.
Reachable
Final state. The host architecture has been retrieved.
Password-less ssh or rsh access to remote hosts is working
properly.
Unreachable
Final state. The host architecture cannot be retrieved.
Password-less ssh or rsh access to remote hosts is not working
properly. See How to Configure Password-less Access for the root
User for more information.
Canceled
Final state. The user has canceled the host resolving process.
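When a host stays in an error state, the same checks can be approximated by hand from the qmaster host. The following is a hypothetical manual probe, not the installer's logic: the function name probe_host and the RESOLVE_CMD and REMOTE_CMD variables are assumptions, the default commands (getent hosts and ssh with the OpenSSH BatchMode option) may need to be replaced on your platform, and the printed messages only loosely correspond to the Unknown host, Unreachable, Missing remote file, and Reachable states in the table above.

```shell
#!/bin/sh
# Hypothetical manual probe that approximates the documented checks;
# the real installer may behave differently. RESOLVE_CMD and REMOTE_CMD
# are assumptions so that rsh or other tools can be substituted.
RESOLVE_CMD=${RESOLVE_CMD:-"getent hosts"}
REMOTE_CMD=${REMOTE_CMD:-"ssh -o BatchMode=yes"}

# Usage: probe_host <host> <sge-root>
probe_host() {
    host=$1
    sge_root=$2
    # 1. Can the name service resolve the host?
    if ! $RESOLVE_CMD "$host" >/dev/null 2>&1; then
        echo "cannot resolve host"
        return
    fi
    # 2. Does a remote command run without a password prompt?
    if ! $REMOTE_CMD "$host" true </dev/null >/dev/null 2>&1; then
        echo "no password-less access"
        return
    fi
    # 3. Is the arch script present under the same sge-root path?
    if ! $REMOTE_CMD "$host" test -f "$sge_root/util/arch" </dev/null >/dev/null 2>&1; then
        echo "missing $sge_root/util/arch"
        return
    fi
    # 4. Retrieve the architecture string.
    arch=$($REMOTE_CMD "$host" "$sge_root/util/arch" </dev/null)
    echo "ok: $arch"
}
```

For example, probe_host grid11 /sge reports the first check that fails for host grid11, which narrows down whether the problem is name resolution, remote access, or a differing sge-root path.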
Host Validating
After the hosts have been resolved and their architecture has been retrieved, they are
moved to the Reachable tab in the Select hosts screen. You can install Grid Engine on a
host that is in the Reachable state. When you click the Install button, the GUI installer
first invokes additional remote host validation. If the installer discovers any
configuration errors (see RED and ORANGE states in the list below), the installation is
not initiated and the appropriate error message is displayed. You can then return to the
Select hosts screen to resolve the errors, or proceed with the installation if you wish.
Table 2–7
Configuration Error and Resolution
Copy timeout
Description: A timeout occurred when copying the check_host or install_component files. See the tooltip for the exact file name.
Problem Resolution: Try again (press the Install button one more time). If the timeout reoccurs, save your host list to a file, stop the installer, and restart it with increased timeout values. See Tweaking start_gui_installer.
Copy failed
Description: Copying the check_host or install_component files to the remote host failed. See the tooltip for the exact file name.
Problem Resolution: Try again (press the Install button one more time). If the problem reoccurs, try to copy any file with scp or rcp to verify that these commands work properly. If they do not, make sure they do before a new installation attempt.
Permission denied
Description: One of the following is not writable: the Berkeley DB, qmaster, or execution daemon spool directory, or the JMX keystore file. See the tooltip for the exact message. The installation will most likely fail if you proceed anyway.
Problem Resolution: Did you start the installation as root? What are the permissions on the first existing directory? Are you on an NFS file system with root mapped to nobody? Is the UID for the admin user the same on the local and remote machine?
Admin user missing
Description: The admin user entered in the main configuration screen does not exist on the remote machine.
Problem Resolution: Set up the host properly so that the name service provides the name properly to the remote machine (or create the user locally).
Directory exists
Description: The Berkeley DB spool directory already exists.
Problem Resolution: Check the remote host for existing Berkeley DB installations. Remove the existing directory.
Wrong FS type
Description: The specified Berkeley DB spool directory is not on a local file system.
Problem Resolution: Go back to the spooling configuration screen and choose a proper local directory.
Unknown error
Description: An unknown error has occurred.
Problem Resolution: Try again (press the Install button one more time). If the error reoccurs, ignore it and try to install anyway.
Reachable
Description: Validation did not discover any issues for this remote host.
Problem Resolution: NA
Canceled
Description: The user canceled further host validation.
Problem Resolution: NA
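The Permission denied resolution asks about the first existing directory, that is, the nearest ancestor of the configured spool directory that already exists. The following small sketch shows one way to find it by hand. first_existing_dir is a hypothetical helper, and the spool path and user name in the usage comment are only examples.

```shell
# Hypothetical helper: print the nearest existing ancestor of a path.
first_existing_dir() {
    dir=$1
    # Walk up one level at a time until a directory that exists is found.
    while [ ! -d "$dir" ] && [ "$dir" != "/" ] && [ "$dir" != "." ]; do
        dir=$(dirname "$dir")
    done
    echo "$dir"
}

# Example use on the remote host (path and user are placeholders):
#   ls -ld "$(first_existing_dir /opt/sge62/default/spool/qmaster)"
#   id sgeadmin
```

Comparing the ls -ld owner and mode with the output of id for the admin user on both machines answers most of the questions listed in the table.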
Installation States
When the installation is started, the host list with the chosen components is
transformed into a task list, which is better suited to handling dependencies. You
may encounter the following states during the installation.
Table 2–8
Installation States
State
Description
Waiting
Task is waiting to be executed.
Processing
Temporary state. Task is being processed.
Timeout
Task did not finish before timeout value has been reached.
Success
Task finished successfully.
Failed
Task finished unsuccessfully. Click the Log button to get more
information.
Tweaking start_gui_installer
The start_gui_installer command starts the Java™ GUI installer. This section
describes the command-line options of start_gui_installer that you might use to
tune the performance of the installer in your environment or to work around
as yet unknown issues.
To display the help text, use the -help option.
master:/sge62u2 # ./start_gui_installer -help
Usage: start_gui_installer [-help] [-resolve_pool=<num>] [-resolve_timeout=<sec>]
[-install_pool=<num>] [-install_timeout=<sec>] [-connect_user=<usr>]
[-connect_mode=windows]
<num> ... decimal number greater than zero
<sec> ... number of seconds, must be greater then zero
<usr> ... user id
If no parameter is specified, the start_gui_installer command is started as if the
following command was called:
master:/sge62u2 # ./start_gui_installer -resolve_pool=12 -resolve_timeout=20
-install_pool=8 -install_timeout=120
Every installation generates installation logs in the sge_root/sge_cell/install_logs
directory. In addition, a GUI log file is created in a $TEMP directory (usually /var/tmp
or /tmp) named SGE_Gui-Installer_Log_<date>.txt.
Description of start_gui_installer Options
Table 2–9
GUI Installer Options
Option
Description
-help
Displays help for start_gui_installer.
-resolve_pool=<num>
Defaults to 12. Defines how many hosts can be resolved in
parallel when adding new hosts, refreshing their states or when
validating hosts. The higher the value the higher load will be
generated when resolving hosts, refreshing host states, copying
an installation script to remote hosts or validating hosts.
-resolve_timeout=<sec>
Defaults to 20 seconds. A timeout value for any operation in a
resolve_pool (resolving hosts, refreshing host states, copying
an installation script to remote host). Host validation has a
timeout which is always equal to 2*resolve_timeout value.
Increase the default value if you see hosts with Unreachable
state and you are sure that password-less access is working
correctly for the connect_user.
-install_pool=<num>
Defaults to 8. Defines how many execution daemons can be
installed in parallel. The higher the value the higher load will be
generated when performing installation tasks.
-install_timeout=<sec>
Defaults to 120 seconds. A timeout value for any installation
task. Increase the default value if you see that the installation
tasks are failing with a Timeout state.
-connect_user=<user>
Defaults to the current user. The user name that is used when
connecting to remote hosts.
-connect_mode=windows
When set, each connect_user is prefixed by a host domain (see
examples below). This is useful when installing multiple
Windows execution hosts that require a different connect_user.
-debug
Starts the installer in a debug mode. Prints a lot of output to the
terminal. Intended for developer purposes, but may provide
additional information when unexpected circumstances occur.
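When tasks end in the Timeout state or hosts stay Unreachable despite working password-less access, the usual remedy is to restart the installer with larger timeouts. The following sketch only prints a command line built from the documented defaults (20 and 120 seconds) multiplied by a factor. scaled_installer_cmd is a hypothetical helper, and the factor is your own choice.

```shell
# Hypothetical helper: print a start_gui_installer command line whose
# timeouts are the documented defaults multiplied by <factor>.
scaled_installer_cmd() {
    factor=$1
    resolve_timeout=$((20 * factor))    # documented default: 20 seconds
    install_timeout=$((120 * factor))   # documented default: 120 seconds
    echo "./start_gui_installer -resolve_timeout=$resolve_timeout -install_timeout=$install_timeout"
    # Host validation always uses twice the resolve timeout.
    echo "# host validation timeout: $((2 * resolve_timeout)) seconds"
}
```

Calling scaled_installer_cmd 3 prints a command with -resolve_timeout=60 and -install_timeout=360, plus a comment line noting the resulting 120-second validation timeout. The pool sizes are left at their defaults, because raising them increases load rather than tolerance for slow hosts.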
Using start_gui_installer Options
Installing as a Different connect_user
Suppose that you cannot log in as the root user, but can log in as another privileged
user with uid=0, called admin. In this case, remote connections are attempted as the
current user, but because of uid=0 the installer connects as root if root is the
primary user with uid=0 on the remote host. The users admin and root have
different home directories, and we assume that password-less access was set up
only for the user admin, so a connection without a password as the current user
fails. The following command forces every remote connection to be
established as the admin user.
master:/sge62u2 # ./start_gui_installer -connect_user=admin
Installing Single Windows Execution Host
Suppose you want to use the installer to add a single Windows execution host to an
existing cluster. The host is called win-01 and belongs to the WIN-01 domain. The
privileged user in this case is Admin (a member of the Administrators group).
Windows hosts can only be installed remotely from a UNIX or Linux system, where
you cannot become the Admin user. Instead, use -connect_user=WIN-01+Admin
to connect as the correct user directly.
master:/sge62u2 # ./start_gui_installer -connect_user=WIN-01+Admin
Installing Multiple Windows Execution Hosts
Suppose you have the additional hosts win-02, belonging to the WIN-02 domain, and
win_vista-01, belonging to the WIN_VISTA-01 domain. All hosts have a privileged
Administrator user. In this case, you can use the following command to start the GUI
installer, which allows you to install all three Windows execution hosts simultaneously.
master:/sge62u2 # ./start_gui_installer -connect_user=Administrator -connect_mode=windows
Every remote connection to host win-01 is made as the WIN-01+Administrator user.
Every remote connection to host win-02 is made as the WIN-02+Administrator user.
Every remote connection to host win_vista-01 is made as the WIN_VISTA-01+Administrator user.
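In the examples above, the domain prefix equals the host name in upper case, joined to the user name with a plus sign. The following sketch reproduces only that pattern so that you can predict the connect user for a host list. windows_connect_user is a hypothetical helper, and the assumption that the domain always matches the upper-cased host name is taken from these examples, not from a documented rule.

```shell
# Hypothetical helper: build DOMAIN+user as in the examples above,
# assuming the domain is the upper-cased host name.
windows_connect_user() {
    host=$1; user=$2
    domain=$(printf '%s' "$host" | tr '[:lower:]' '[:upper:]')
    echo "$domain+$user"
}

# Example: print the connect user for each Windows host.
for h in win-01 win-02 win_vista-01; do
    windows_connect_user "$h" Administrator
done
```

If a host belongs to a domain that differs from its own name, this pattern does not apply and the sketch gives the wrong prefix for that host.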
Troubleshooting the GUI Installer
You will find the known issues and their workarounds in this section as well as
additional answers to some frequently asked questions.
FAQs
I cannot start the installer. It throws an exception!
This is most likely a general problem with GUI applications in your current environment.
You are probably starting the installer on a remote host and either did not export the
DISPLAY variable properly or did not allow remote GUI applications to be displayed on
the target system (where the GUI should appear).
1.
Display variable is not set. If your DISPLAY variable is not set and you are not
working locally on the system, you will see a message similar to the following:
hostA# ./start_gui_installer
Starting Installer ...
java.awt.HeadlessException:
No X11 DISPLAY variable was set, but this program performed an operation which requires it.
    at java.awt.GraphicsEnvironment.checkHeadless(GraphicsEnvironment.java:159)
    at java.awt.Window.<init>(Window.java:317)
    at java.awt.Frame.<init>(Frame.java:419)
    at java.awt.Frame.<init>(Frame.java:384)
    at javax.swing.JFrame.<init>(JFrame.java:150)
    at com.izforge.izpack.installer.GUIInstaller.loadLangPack(Unknown Source)
    at com.izforge.izpack.installer.GUIInstaller.access$000(Unknown Source)
    at com.izforge.izpack.installer.GUIInstaller$1.run(Unknown Source)
    at java.awt.event.InvocationEvent.dispatch(InvocationEvent.java:199)
    at java.awt.EventQueue.dispatchEvent(EventQueue.java:461)
    at java.awt.EventDispatchThread.pumpOneEventForHierarchy(EventDispatchThread.java:242)
    at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:163)
    at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:157)
    at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:149)
    at java.awt.EventDispatchThread.run(EventDispatchThread.java:110)
java.lang.NullPointerException
    at com.izforge.izpack.installer.GUIInstaller.loadGUI(Unknown Source)
    at com.izforge.izpack.installer.GUIInstaller.access$100(Unknown Source)
    at com.izforge.izpack.installer.GUIInstaller$2.run(Unknown Source)
    at java.awt.event.InvocationEvent.dispatch(InvocationEvent.java:209)
    at java.awt.EventQueue.dispatchEvent(EventQueue.java:461)
    at java.awt.EventDispatchThread.pumpOneEventForHierarchy(EventDispatchThread.java:242)
    at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:163)
    at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:157)
    at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:149)
    at java.awt.EventDispatchThread.run(EventDispatchThread.java:110)
If you start the installer on hostA but want to display it on hostB, you need to set
the DISPLAY variable accordingly. If your graphical session on hostB uses display
number 22, type the following command as the user who will start the installer:
hostA# DISPLAY=hostB:22 ; export DISPLAY
See next step to finish the setup.
2.
Remote host does not allow remote GUI applications. In this case you will see a
similar message:
hostA# ./start_gui_installer
Starting Installer ...
Xlib: connection to "hostB:22" refused by server
Xlib: No protocol specified
Exception in thread "main" java.lang.InternalError: Can't connect to X11 window server using 'hostB:22' as the value of the DISPLAY variable.
    at sun.awt.X11GraphicsEnvironment.initDisplay(Native Method)
    at sun.awt.X11GraphicsEnvironment.access$000(X11GraphicsEnvironment.java:53)
    at sun.awt.X11GraphicsEnvironment$1.run(X11GraphicsEnvironment.java:142)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.awt.X11GraphicsEnvironment.<clinit>(X11GraphicsEnvironment.java:131)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:164)
    at java.awt.GraphicsEnvironment.getLocalGraphicsEnvironment(GraphicsEnvironment.java:68)
    at sun.awt.motif.MToolkit.<clinit>(MToolkit.java:93)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:164)
    at java.awt.Toolkit$2.run(Toolkit.java:821)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.awt.Toolkit.getDefaultToolkit(Toolkit.java:804)
    at javax.swing.UIManager.initialize(UIManager.java:1262)
    at javax.swing.UIManager.maybeInitialize(UIManager.java:1245)
    at javax.swing.UIManager.getDefaults(UIManager.java:556)
    at javax.swing.UIManager.put(UIManager.java:841)
    at com.izforge.izpack.installer.GUIInstaller.loadLookAndFeel(Unknown Source)
    at com.izforge.izpack.installer.GUIInstaller.<init>(Unknown Source)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:494)
    at java.lang.Class.newInstance0(Class.java:350)
    at java.lang.Class.newInstance(Class.java:303)
    at com.izforge.izpack.installer.Installer.main(Unknown Source)
You have to explicitly allow remote GUI connections from hostA. Type the
following command as the user running the graphical session on hostB:
hostB# xhost +hostA
Now you may start the start_gui_installer and the Welcome screen should get
displayed on the remote host.
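Both failures above can be caught before the Java stack trace appears. The following is a minimal pre-flight sketch. check_display is a hypothetical helper that only tests whether DISPLAY is set, and it does not verify that the X server accepts the connection.

```shell
# Hypothetical pre-flight check, not part of the Grid Engine distribution.
check_display() {
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set"
        return 1
    fi
    echo "DISPLAY=$DISPLAY"
}

# Example use on hostA:
#   DISPLAY=hostB:22; export DISPLAY
#   check_display && ./start_gui_installer
```

If the check passes but the installer still reports a refused connection, the remaining step is the xhost command on the host that runs the graphical session.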
How can I remove a host from the host selection that I previously added?
Right-click the host and select Remove selected action from the pop-up menu.
Figure 2–18
Removing Selected Hosts
Can I save hosts that I selected in the host selection to a file?
Yes, you can. Select multiple hosts using CTRL + left-click and do a right-click. A
pop-up menu appears allowing you to save all hosts in the current tab or just the
selected hosts.
Qmaster JMX thread does not appear to be running.
The qmaster messages file shows the message could not load libjvm ld.so.1:
sge_qmaster: fatal: jvm_missing: open failed: No such file or directory.
This message means that the installer could not auto-detect a suitable JVM library.
Possible reasons include being on a 64-bit platform without a 64-bit Java installed
on the target hosts. Once you have installed the correct Java, change the libjvm_path
attribute from jvm_missing to the correct path to the JVM library by using the qconf
-mconf command.
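To find a candidate value for libjvm_path, you can search the Java installation for the JVM library. The following is a sketch under the assumption that the library is named libjvm.so, as on Solaris and Linux. find_libjvm is a hypothetical helper, and the directory layout below the Java home differs between Java versions and architectures.

```shell
# Hypothetical helper: list every libjvm.so below a Java home directory.
find_libjvm() {
    java_home=$1
    find "$java_home" -name libjvm.so 2>/dev/null
}

# Example use (the path is a placeholder for your 64-bit Java home):
#   find_libjvm /usr/java
```

If several libraries are listed, pick the one that matches the architecture of sge_qmaster, then enter that path as libjvm_path in the editor opened by qconf -mconf.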
Known issues and workarounds
Installing BDB server always fails with a timeout state.
Currently, you can use the GUI installer to install a BDB server only on the
Solaris OS. On other platforms, use the CLI installation (inst_sge -db) to install
the BDB server locally. You can then use the GUI installer to install qmaster and any
number of shadow and execution hosts if password-less access is configured. See issue
2941 for more information.
Cannot install additional execd hosts from a remote host (different from
qmaster) when qmaster was installed with CSP or JMX SSL and a custom
connect user is used.
It is recommended to always start the installation on the qmaster host. The reason is
that in a subsequent standalone execd installation there is no way to specify a
connect user for the qmaster host. The remote connection to the qmaster host is
attempted as the user who started the GUI installer.
Installing the Software From the Command Line
The instructions in this section assume that you are installing the software on a
computer running the Solaris™ Operating System. Any differences in functionality
on other operating system architectures that the Grid Engine software runs on
are documented in files starting with the string arc_depend in the $SGE_ROOT/doc
directory. The remainder of the file name indicates the operating system architectures
to which the comments in the files apply, as in the arc_depend_irix.asc file.
Also note that there are several prerequisites that you must satisfy for Windows
systems before you can install Grid Engine. See Microsoft Services for UNIX and
Microsoft Subsystem for UNIX-based Applications for further details.
This section does not cover the upgrade process or the installation of the Accounting
and Reporting Module, ARCo. For information about upgrading, see Upgrading Grid
Engine. For information about installing ARCo, the accounting and reporting console,
see the Oracle Grid Engine User’s Guide.
Installation Overview
Note: The instructions in this section are for a new Grid Engine
system only. For instructions on how to install a new system with
additional security protection, see Installing the Increased Security
Features. For instructions on how to upgrade an existing installation
of an earlier version of the Grid Engine software, see Upgrading Grid
Engine.
Full installation includes the following tasks:
■
Running an installation script on the master host and on every execution host in
the Grid Engine system
■
Registering information about administration hosts and submit hosts
Performing an Installation
The following sections describe how to install all the components of the Grid Engine
system, including the master, execution, administration, and submit hosts. If you need
to install the system with enhanced security, see Installing the Increased Security
Features before you continue installation. For more information about installing Grid
Engine SMF services, see Installing SMF Services before you start the installation.
Topic
Description
How to Install the Master Host (Example Master Host Installation)
Procedure for installing the master host.
How to Install Shadow Master Host (Example Shadow Master Host Installation)
Procedure for installing the shadow master hosts.
How to Install Execution Hosts (Example Execution Host Installation)
Procedure for installing the execution host.
How to Register Administration Hosts
Procedure for registering an administration host.
How to Register Submit Hosts
Procedure for registering a submit host.
How to Install the Berkeley DB Spooling Server (Example Berkeley DB Spooling Server Installation)
Procedure for installing the necessary software for Berkeley DB spooling.
How to Install the Master Host
The master host installation procedure creates the appropriate directory hierarchy that
the master daemon requires and starts the Grid Engine master daemon sge_qmaster
on the master host. The master host is also registered as a host with administrative and
submit permission. The installation procedure creates a default configuration for the
system on which it is run. The installation script queries the system for the type of
operating system. The script then makes meaningful settings based on this
information.
If, at any time during the installation, you think something went wrong, you can quit
the installation procedure and restart it.
Before You Begin
■
Extract the Grid Engine software, as described in Loading the Distribution Files on
a Workstation.
■
If you have decided to use an administrative user, as described in User Account
Considerations, you should create that user before installing the master host.
Note:
Windows hosts cannot act as master hosts.
Installing the Master Host
1.
Log in to the master host as root.
2.
If the $SGE_ROOT environment variable is not set, set it by typing:
# SGE_ROOT=<path_to_installation_directory (the directory MUST contain all Grid
Engine files such as Grid Engine binaries)>; export SGE_ROOT
To confirm that you have set the $SGE_ROOT environment variable, type:
# echo $SGE_ROOT
3.
Go to the installation directory.
■
If the directory where the installation files reside is visible from the master
host, change directory (cd) to the installation directory sge-root, and then
proceed to the next step.
■
If the directory is not visible and cannot be made visible, do the following:
–
Create a local installation directory, sge-root, on the master host.
–
Copy the installation files to the local installation directory sge-root
across the network, for example, by using ftp or rcp.
–
Change directory (cd) to the local sge-root directory.
4.
Type the inst_sge -m command, adding the -csp flag if you are installing using
the Certificate Security Protocol method described in Installing the Increased
Security Features. This command starts the master host installation procedure.
You are asked several questions, and you might be required to run some
administrative actions. For a complete installation example, see Example Master
Host Installation.
# ./inst_sge -m
Welcome to the Grid Engine installation
---------------------------------------
Grid Engine qmaster host installation
-------------------------------------
.
.
.
The qmaster installation procedure will take approximately 5-10 minutes.
Hit <RETURN> to continue >>
5.
Choose an administrative account owner. See Example Master Host Installation.
6.
Verify the $SGE_ROOT directory setting. In the example shown in Example Master
Host Installation, the value of $SGE_ROOT is /opt/sge62.
7.
Set up the TCP/IP services for the Grid Engine software. If TCP/IP services have
not been configured, you will be notified. To configure TCP/IP services:
■
Start a new terminal session or window to add the information to the /etc/services
file or your NIS maps.
■
Add the correct ports to the /etc/services file or your NIS services map, as
described in Network Services. The following example adds entries for both
sge_qmaster and sge_execd to your /etc/services file.
...
sge_qmaster 6444/tcp
sge_execd 6445/tcp
■
Save your changes and return to the window where the installation script is
running.
8.
Type the name of your cell or accept the default cell name. The use of Grid Engine
system cells is described in Cells.
■
If you have decided to use cells, type the cell name now.
■
If you have decided not to use cells, press the Return key.
9.
Set up a unique cluster name. For more information, see Cluster Name.
■
To accept the default cluster name, press the Return key.
■
To enter a new cluster name, type the cluster name and press the Return key.
10. Specify a spool directory. For guidelines on disk space requirements for the spool
directory, see Disk Space Requirements. For information on where the spool directory
is installed, see Spool Directories under the Root Directory.
■
To accept the default spool directory, press the Return key.
■
If you want to use a different spool directory, answer y to the prompt and
provide a complete path name to the directory.
11. Specify whether you plan to use Windows-based execution hosts.
■
If you do not plan to use Windows support, answer No.
■
If you want Windows support, answer Yes. You will be asked some
Windows-specific questions later in the installation process. These questions
will be marked as WINDOWS-ONLY.
12. Verify or set the correct file permissions.
■
If you used pkgadd or you know that the file permissions are correct, answer y
to accept the current permissions.
■
Answer n if you need to verify or change the file permissions.
■
WINDOWS ONLY - If you specified that you wanted Windows Execution
Host support in the previous question, you should let the script set the file
permissions for you.
13. Specify whether all Grid Engine hosts for this cluster are located in a single DNS
domain.
■
If all of your Grid Engine system hosts are located in a single DNS domain,
answer y. Grid Engine then ignores domain information, so
hostA and hostA.foo.com are treated as equivalent.
■
If your Grid Engine system hosts are not all located in a single DNS
domain, answer n. You will be asked to configure a default domain to use
in case a host is specified without domain information.
14. Watch while Grid Engine creates directories according to the information that you
provided so far.
15. Specify whether you want to enable the JMX MBean Server to use the SDM Grid
Engine Adapter.
■
If you enable the JMX MBean Server, you are asked to enter the following
information:
–
JAVA_HOME path
–
Additional JVM arguments
–
JMX MBean Server port number
–
JMX SSL server authentication
–
JMX SSL client authentication
–
JMX SSL server keystore path
–
JMX SSL server keystore password
Caution: If you are on a 64-bit system, you need to provide JAVA_HOME
for a 64-bit Java (usually installed as an addition to the 32-bit
Java).
16. Specify whether you want to use classic spooling or Berkeley DB spooling. By
default, Grid Engine uses Berkeley DB spooling. For more information on how to
determine the type of spooling mechanism you want, see Choosing
Between Classic Spooling and Database Spooling.
■
If you choose Berkeley DB spooling, you are asked to choose whether to use a
local directory or a Berkeley DB Spooling Server.
Tip: To use a shadow master host for increased availability of the
database, use the Berkeley DB Spooling Server.
To use a Berkeley DB spooling server, enter y. To install the Berkeley DB
Spooling Server:
–
Start a new terminal session or window and install the software, as
described in How to Install the Berkeley DB Spooling Server.
–
After you have installed the software on the spooling server, return to the
master installation window, and press the Return key.
–
Type the name of the spooling server. In Step 16 of the Example Master
Host Installation, vector is the host name of the spooling server.
–
Type the name of the spooling directory. In Step 16 of the Example Master
Host Installation, /opt/sge62/default/spool/spooldb is the spooling
directory.
■
If you do not want to use a Berkeley DB spooling server, type n. You are asked
to provide the complete path to the database directory. If the directory does
not exist, it is created.
■
To specify classic spooling, type classic.
17. Type a range of IDs that will be assigned dynamically for jobs. See Step 17 in the
Example Master Host Installation. For more information, see Planning Checklist.
18. Verify the spooling directory for the execution daemon. See Step 18 in the
Example Master Host Installation. The Grid Engine administrator must have
access to create and write into this directory. For information on spooling, see
Spool Directories under the Root Directory.
19. Type the email address of the user who should receive problem reports. See Step
19 in the Example Master Host Installation. In the example, the user who will
receive problem reports is me@my.domain.
20. Verify the configuration parameters. See Step 20 in the Example Master Host
Installation
■
If configuration parameters are correct, Grid Engine proceeds to create the
local configuration.
■
If configuration parameters are not correct, type y to change them.
21. Specify whether you want the daemons to start when the system is booted. See
Step 21 in the Example Master Host Installation.
22. WINDOWS-ONLY - If you specified that you want Windows support, you are
asked to create Certificate Security Protocol (CSP) certificates. Even if the system is
not running in CSP mode, it is necessary to create certain CSP certificates for
Windows support. These certificates are automatically generated during the
master host installation. For instructions on how to transfer these certificates to the
Windows execution hosts, see Step 6 of How to Install a CSP-Secured System.
23. WINDOWS-ONLY - Add the Windows Administrator name to the Grid Engine
manager list.
24. Identify the hosts that you will later install as execution hosts. See Step 24 in the
Example Master Host Installation.
Tip: You can list hosts individually, separated by a blank space, or
you can supply a file that contains host names.
Note: You can use the master host for executing jobs. To do so, you
must carry out the execution host installation for the master machine.
However, if you use a very slow machine as master host, or if your
cluster is significantly large, do not use the master host as an
execution host.
25. Select a scheduler profile. See Step 25 in the Example Master Host Installation. For
information on how to determine which profile you should use, see Scheduler
Profiles. Once you answer this question, the installation process is complete.
Several screens of information will be displayed before the script exits.
26. WINDOWS-ONLY - Copy the certificate files to the Windows execution hosts.
You can use a script to perform this function.
Tip: To use this functionality without being asked for a password,
the root user must have password-less rsh or ssh access to the execution hosts.
27. Create the environment variables ($SGE_ROOT and $SGE_CELL) for use with the
Grid Engine software. See Step 27 in the Example Master Host Installation.
Note: If no cell name was specified during installation, the value of
cell is default.
■
If you are using a C shell, type the following command:
% source $SGE_ROOT/$SGE_CELL/common/settings.csh
■
If you are using a Bourne shell or Korn shell, type the following command:
$ . $SGE_ROOT/$SGE_CELL/common/settings.sh
For details about how you can verify that the master host has been set up correctly,
see How to Verify That the Daemon is Running on the Master Host.
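Step 7 of the procedure above depends on sge_qmaster and sge_execd entries in the services database. As a rough sketch, the following function checks a services file for a TCP entry. has_service_entry is a hypothetical helper, it inspects only a flat file (by default /etc/services), and it does not consult NIS maps.

```shell
# Hypothetical helper: succeed if <name> has a "<port>/tcp" entry in the
# given services file (default: /etc/services). Commented-out lines are
# ignored because the name must start the line.
has_service_entry() {
    name=$1; file=${2:-/etc/services}
    grep -Eq "^${name}[[:space:]]+[0-9]+/tcp" "$file"
}

# Example use before running inst_sge -m:
#   for s in sge_qmaster sge_execd; do
#       has_service_entry "$s" || echo "$s is missing from /etc/services"
#   done
```

If you set the ports through the SGE_QMASTER_PORT and SGE_EXECD_PORT environment variables instead, as in the example installation, missing entries are not an error.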
Example Master Host Installation
The following example shows a complete Grid Engine master host installation.
Remember that this is only one step in the entire Grid Engine installation process. The
steps in this example coordinate with the master host installation description at How
to Install the Master Host.
Steps 1-4
% su
# cd sge-install-dir
# ./inst_sge -m

Grid Engine License is displayed.

Do you agree with that license? (y/n) [n] >>

Welcome to the Grid Engine installation
---------------------------------------

Grid Engine qmaster host installation
-------------------------------------

Before you continue with the installation please read these hints:

- Your terminal window should have a size of at least
  80x24 characters

- The INTR character is often bound to the key Ctrl-C.
  The term >Ctrl-C< is used during the installation if you
  have the possibility to abort the installation

The qmaster installation procedure will take approximately 5-10 minutes.

Hit <RETURN> to continue >>
Step 5
Grid Engine admin user account
------------------------------

The current directory

/opt/sge62

is owned by user

myusername

If user >root< does not have write permissions in this directory on *all*
of the machines where Grid Engine will be installed (NFS partitions not
exported for user >root< with read/write permissions) it is recommended to
install Grid Engine that all spool files will be created under the user id
of user >myusername<.

IMPORTANT NOTE: The daemons still have to be started by user >root<.

Do you want to install Grid Engine as admin user >myusername< (y/n) [y] >>

Installing Grid Engine as admin user >myusername<

Hit <RETURN> to continue >>

Choosing Grid Engine admin user account
---------------------------------------

You may install Grid Engine that all files are created with the user id of an
unprivileged user.

This will make it possible to install and run Grid Engine in directories
where user >root< has no permissions to create and write files and directories.

- Grid Engine still has to be started by user >root<

- This directory should be owned by the Grid Engine administrator

Do you want to install Grid Engine
under an user id other than >root< (y/n) [y] >> y

Choosing a Grid Engine admin user name
--------------------------------------

Please enter a valid user name >> sgeadmin

Installing Grid Engine as admin user >sgeadmin<

Hit <RETURN> to continue >>
Step 6
Checking $SGE_ROOT directory
---------------------------
The Grid Engine root directory is:
$SGE_ROOT = /opt/sge62
If this directory is not correct (e.g. it may contain an automounter
prefix) enter the correct path to this directory or hit <RETURN>
to use default [/opt/sge62] >>
Your $SGE_ROOT directory: /opt/sge62
Hit <RETURN> to continue >>
Step 7 Two actions - one for qmaster, one for execd
Grid Engine TCP/IP communication service
---------------------------------------
The port for sge_qmaster is currently set by the shell environment.
SGE_QMASTER_PORT = 10500
Now you have the possibility to set/change the communication ports by using the
>shell environment< or you may configure it via a network service, configured
in local >/etc/services<, >NIS< or >NIS+<, adding an entry in the form
sge_qmaster <port_number>/tcp
to your services database and make sure to use an unused port number.
How do you want to configure the Grid Engine communication ports?
Using the >shell environment<: [1]
Using a network service like >/etc/services<, >NIS/NIS+<: [2]
(default: 1) >> 1
Using the environment variable
$SGE_QMASTER_PORT=10500
as port for communication.
Hit <RETURN> to continue >>
Grid Engine TCP/IP communication service
---------------------------------------
The port for sge_execd is currently set by the shell environment.
SGE_EXECD_PORT = 10501
Now you have the possibility to set/change the communication ports by using the
>shell environment< or you may configure it via a network service, configured
in local >/etc/services<, >NIS< or >NIS+<, adding an entry in the form
sge_execd <port_number>/tcp
to your services database and make sure to use an unused port number.
How do you want to configure the Grid Engine communication ports?
Using the >shell environment<: [1]
Using a network service like >/etc/services<, >NIS/NIS+<: [2]
(default: 1) >> 1
Using the environment variable
$SGE_EXECD_PORT=10501
as port for communication.
Hit <RETURN> to continue >>
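The two choices offered in this step correspond roughly to the settings below. This is a minimal sketch that assumes the example port numbers 10500 and 10501 shown in the transcript; services_entry is a hypothetical helper used only to illustrate the entry format and is not part of Grid Engine.

```shell
# Option 1: set the ports through the shell environment
# (values taken from the example transcript, adjust for your site).
SGE_QMASTER_PORT=10500
SGE_EXECD_PORT=10501
export SGE_QMASTER_PORT SGE_EXECD_PORT

# Option 2: print a services database entry of the form
# "<service> <port_number>/tcp". services_entry is a hypothetical helper.
services_entry() {
    printf '%s %s/tcp\n' "$1" "$2"
}

services_entry sge_qmaster "$SGE_QMASTER_PORT"
services_entry sge_execd "$SGE_EXECD_PORT"
```

The printed lines would still have to be added to /etc/services, NIS or NIS+ by an administrator; the sketch only formats them.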
Step 8
Grid Engine cells
----------------
Grid Engine supports multiple cells.
If you are not planning to run multiple Grid Engine clusters or if you don't
know yet what is a Grid Engine cell it is safe to keep the default cell name
default
If you want to install multiple cells you can enter a cell name now.
The environment variable
$SGE_CELL=<your_cell_name>
will be set for all further Grid Engine commands.
Enter cell name [default] >>
Using cell >default<.
Hit <RETURN> to continue >>
Step 9
Unique cluster name
------------------
The cluster name uniquely identifies a specific Grid Engine cluster.
The cluster name must be unique throughout your organization. The name
is not related to the Grid Engine cell.
The cluster name must start with a letter ([A-Za-z]), followed by letters,
digits ([0-9]), dashes (-) or underscores (_).
Enter new cluster name or hit <RETURN>
to use default [p10500] >>
Your $SGE_CLUSTER_NAME: p10500
Hit <RETURN> to continue >>
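The naming rule stated in this step (a leading letter followed by letters, digits, dashes or underscores) can be checked before the name is entered. The following is a minimal sketch; valid_cluster_name is a hypothetical helper, not a Grid Engine command, and it checks only the character rule, not uniqueness within your organization.

```shell
# Return success if the name starts with a letter and continues with
# letters, digits, dashes or underscores only.
# valid_cluster_name is a hypothetical helper for illustration.
valid_cluster_name() {
    printf '%s\n' "$1" | grep -Eq '^[A-Za-z][A-Za-z0-9_-]*$'
}

if valid_cluster_name "p10500"; then
    echo "p10500: accepted"
fi
if ! valid_cluster_name "10500p"; then
    echo "10500p: rejected"
fi
```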
Step 10
Grid Engine qmaster spool directory
----------------------------------
The qmaster spool directory is the place where the qmaster daemon stores
the configuration and the state of the queuing system.
The admin user >myusername< must have read/write access
to the qmaster spool directory.
If you will install shadow master hosts or if you want to be able to start
the qmaster daemon on other hosts (see the corresponding section in the
Grid Engine Installation and Administration Manual for details) the account
on the shadow master hosts also needs read/write access to this directory.
The following directory
[/opt/sge62/default/spool/qmaster]
will be used as qmaster spool directory by default!
Do you want to select another qmaster spool directory (y/n) [n] >>
Step 11
Windows Execution Host Support
-----------------------------
Are you going to install Windows Execution Hosts? (y/n) [n]
Step 12
Verifying and setting file permissions
-------------------------------------
Did you install this version with >pkgadd< or did you already
verify and set the file permissions of your distribution (y/n) [y] >>
Verifying and setting file permissions
-------------------------------------
We may now verify and set the file permissions of your Grid Engine
distribution.
This may be useful since due to unpacking and copying of your distribution
your files may be unaccessible to other users.
We will set the permissions of directories and binaries to
755 - that means executable are accessible for the world
and for ordinary files to
644 - that means readable for the world
Do you want to verify and set your file permissions (y/n) [y] >>
Verifying and setting file permissions and owner in >3rd_party<
Verifying and setting file permissions and owner in >bin<
Verifying and setting file permissions and owner in >ckpt<
Verifying and setting file permissions and owner in >examples<
Verifying and setting file permissions and owner in >inst_sge<
Verifying and setting file permissions and owner in >install_execd<
Verifying and setting file permissions and owner in >install_qmaster<
Verifying and setting file permissions and owner in >lib<
Verifying and setting file permissions and owner in >mpi<
Verifying and setting file permissions and owner in >pvm<
Verifying and setting file permissions and owner in >qmon<
Verifying and setting file permissions and owner in >util<
Verifying and setting file permissions and owner in >utilbin<
Verifying and setting file permissions and owner in >catman<
Verifying and setting file permissions and owner in >doc<
Verifying and setting file permissions and owner in >include<
Verifying and setting file permissions and owner in >man<
Your file permissions were set
Hit <RETURN> to continue >>
Step 13
Select default Grid Engine hostname resolving method
---------------------------------------------------
Are all hosts of your cluster in one DNS domain? If this is
the case the hostnames
>hostA< and >hostA.foo.com<
would be treated as equal, because the DNS domain name >foo.com<
is ignored when comparing hostnames.
Are all hosts of your cluster in a single DNS domain (y/n) [y] >>
Ignoring domainname when comparing hostnames.
Hit <RETURN> to continue >>
Step 14
Making directories
-----------------
creating directory: /opt/sge62/default/spool/qmaster
creating directory: /opt/sge62/default/spool/qmaster/job_scripts
Hit <RETURN> to continue >>
Step 15
Grid Engine JMX MBean server
---------------------------
In order to use the Service Domain Manager (SDM) SGE adapter
you need to configure a JMX server in qmaster. Qmaster will then
load a Java Virtual Machine through a shared library.
NOTE: Java 1.5 or later is required for the JMX MBean server.
Do you want to enable the JMX MBean server (y/n) [y] >> y
Please give some basic parameters for JMX MBean server
We will ask for
- JAVA_HOME
- additional JVM arguments (optional)
- JMX MBean server port
- JMX ssl authentication
- JMX ssl client authentication
- JMX ssl server keystore path
- JMX ssl server keystore password
Detecting suitable JAVA ...
Please enter JAVA_HOME or press enter [/usr] >> /usr
Please enter additional JVM arguments (optional, default is [-Xmx256m]) >> -Xmx256m
Please enter an unused port number for the JMX MBean server [6444] >> 6444
Enable JMX SSL server authentication (y/n) [y] >> y
Enable JMX SSL client authentication (y/n) [y] >> y
Enter JMX SSL server keystore path
[/var/sgeCA/port6442/def2/private/keystore] >> /var/sgeCA/port6442/def2/private/keystore
Enter JMX SSL server keystore pw (at least 6 characters) >> ********
Using the following JMX MBean server settings.
libjvm_path              >/usr/jdk/instances/jdk1.5.0/jre/lib/sparcv9/server/libjvm.so<
Additional JVM arguments >-Xmx256m<
JMX port                 >6444<
JMX ssl                  >true<
JMX client ssl           >true<
JMX server keystore      >/var/sgeCA/port6442/def2/private/keystore<
JMX server keystore pw   >****************<
Do you want to use these data (y/n) [y] >> y
Hit <RETURN> to continue >>
Making directories
-----------------
creating directory: /cod_home/sge6.2u3/def2/spool/qmaster
creating directory: /cod_home/sge6.2u3/def2/spool/qmaster/job_scripts
Hit <RETURN> to continue >>
Step 16
Setup spooling
--------------
Your Grid Engine binaries are compiled to link the spooling libraries
during runtime (dynamically). So you can choose between Berkeley DB
spooling and Classic spooling method.
Please choose a spooling method (berkeleydb|classic) [berkeleydb] >>
The Berkeley DB spooling method provides two configurations!
1) Local spooling:
The Berkeley DB spools into a local directory on this host (qmaster host)
This setup is faster, but you can't setup a shadow master host
2) Berkeley DB Spooling Server:
If you want to setup a shadow master host, you need to use
Berkeley DB Spooling Server!
In this case you have to choose a host with a configured RPC service.
The qmaster host connects via RPC to the Berkeley DB. This setup is more
failsafe, but results in a clear potential security hole. RPC communication
(as used by Berkeley DB) can be easily compromised. Please only use this
alternative if your site is secure or if you are not concerned about
security. Check the installation guide for further advice on how to achieve
failsafety without compromising security.
Do you want to use a Berkeley DB Spooling Server? (y/n) [n] >> y
Berkeley DB Setup
----------------
Please, log in to your Berkeley DB spooling host and execute "inst_sge -db"
Please do not continue, before the Berkeley DB installation with
"inst_sge -db" is completed, continue with <RETURN>
Berkeley Database spooling parameters
------------------------------------
Please enter the name of your Berkeley DB Spooling Server! >> vector
Do you want to use a Berkeley DB Spooling Server? (y/n) [n] >>
Hit <RETURN> to continue >>
Berkeley Database spooling parameters
------------------------------------
Please enter the Database Directory now, even if you want to spool locally,
it is necessary to enter this Database Directory.
Default: [/opt/sge62/default/spool/spooldb] >> /tmp/dom/spooldb
Dumping bootstrapping information
Initializing spooling database
Hit <RETURN> to continue >>
Step 17
Grid Engine group id range
-------------------------
When jobs are started under the control of Grid Engine an additional group id
is set on platforms which do not support jobs. This is done to provide maximum
control for Grid Engine jobs.
This additional UNIX group id range must be unused group id's in your system.
Each job will be assigned a unique id during the time it is running.
Therefore you need to provide a range of id's which will be assigned
dynamically for jobs.
The range must be big enough to provide enough numbers for the maximum number
of Grid Engine jobs running at a single moment on a single host. E.g. a range
like >20000-20100< means, that Grid Engine will use the group ids from
20000-20100 and provides a range for 100 Grid Engine jobs at the same time
on a single host.
You can change at any time the group id range in your cluster configuration.
Please enter a range >> 20000-20100
Using >20000-20100< as gid range. Hit <RETURN> to continue >>
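The size of a candidate group id range can be worked out before it is entered. The following is a minimal sketch; gid_range_size is a hypothetical helper, and it assumes that both ends of the range are usable. Under that assumption the example range 20000-20100 holds 101 ids, slightly more than the round figure of 100 quoted in the installer text.

```shell
# Print the number of group ids in a range written as LOW-HIGH,
# counting both ends. gid_range_size is a hypothetical helper.
gid_range_size() {
    low=${1%-*}     # text before the dash
    high=${1#*-}    # text after the dash
    echo $(( $high - $low + 1 ))
}

gid_range_size 20000-20100
```

The result should be at least the maximum number of jobs you expect to run at the same time on a single host.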
Step 18
Grid Engine cluster configuration
--------------------------------
Please give the basic configuration parameters of your Grid Engine
installation:
<execd_spool_dir>
The pathname of the spool directory of the execution hosts. User >myusername<
must have the right to create this directory and to write into it.
Default: [/opt/sge62/default/spool] >>
Step 19
Grid Engine cluster configuration (continued)
--------------------------------------------
<administrator_mail>
The email address of the administrator to whom problem reports are sent.
It is recommended to configure this parameter. You may use >none<
if you do not wish to receive administrator mail.
Please enter an email address in the form >user@foo.com<.
Default: [none] >> me@my.domain
Step 20
The following parameters for the cluster configuration were configured:
execd_spool_dir      /opt/sge62/default/spool
administrator_mail   me@my.domain
Do you want to change the configuration parameters (y/n) [n] >> n
Creating local configuration
---------------------------
Creating >act_qmaster< file
Adding default complex attributes
Adding Grid Engine default usersets
Adding >sge_aliases< path aliases file
Adding >qtask< qtcsh sample default request file
Adding >sge_request< default submit options file
Creating >sgemaster< script
Creating >sgeexecd< script
Creating settings files for >.profile/.cshrc<
Hit <RETURN> to continue >>
Step 21
qmaster startup script
---------------------
Do you want to start qmaster automatically at machine boot?
NOTE: If you select "n" SMF will be not used at all! (y/n) [y] >>
Hit <RETURN> to continue >>
Grid Engine qmaster startup
--------------------------
Starting qmaster daemon. Please wait ...
Hit <RETURN> to continue >>
Step 24
Adding Grid Engine hosts
-----------------------
Please now add the list of hosts, where you will later install your execution
daemons. These hosts will be also added as valid submit hosts.
Please enter a blank separated list of your execution hosts. You may
press <RETURN> if the line is getting too long. Once you are finished
simply press <RETURN> without entering a name.
You also may prepare a file with the hostnames of the machines where you plan
to install Grid Engine. This may be convenient if you are installing Grid
Engine on many hosts.
Do you want to use a file which contains the list of hosts (y/n) [n] >> n
Adding admin and submit hosts
----------------------------
Please enter a blank seperated list of hosts.
Stop by entering <RETURN>. You may repeat this step until you are
entering an empty list. You will see messages from Grid Engine
when the hosts are added.
Host(s): host1 host2 host3 host4
host1 added to administrative host list
host1 added to submit host list
host2 added to administrative host list
host2 added to submit host list
host3 added to administrative host list
host3 added to submit host list
host4 added to administrative host list
host4 added to submit host list
Hit <RETURN> to continue >>
Adding admin and submit hosts
----------------------------
Please enter a blank seperated list of hosts.
Stop by entering <RETURN>. You may repeat this step until you are
entering an empty list. You will see messages from Grid Engine
when the hosts are added.
Host(s):
Finished adding hosts. Hit <RETURN> to continue >>
If you want to use a shadow host, it is recommended to add this host
to the list of administrative hosts.
If you are not sure, it is also possible to add or remove hosts after the
installation with <qconf -ah hostname> for adding and <qconf -dh hostname>
for removing this host
Attention: This is not the shadow host installation procedure.
You still have to install the shadow host separately
Do you want to add your shadow host(s) now? (y/n) [y] >>
Adding Grid Engine shadow hosts
------------------------------
Please now add the list of hosts, where you will later install your shadow
daemon.
Please enter a blank separated list of your execution hosts. You may
press <RETURN> if the line is getting too long. Once you are finished
simply press <RETURN> without entering a name.
You also may prepare a file with the hostnames of the machines where you plan
to install Grid Engine. This may be convenient if you are installing Grid
Engine on many hosts.
Do you want to use a file which contains the list of hosts (y/n) [n] >>
Adding admin hosts
-----------------
Please enter a blank seperated list of hosts.
Stop by entering <RETURN>. You may repeat this step until you are
entering an empty list. You will see messages from Grid Engine
when the hosts are added.
Host(s): es-ergb01-01
adminhost "es-ergb01-01" already exists
Hit <RETURN> to continue >>
Please enter a blank seperated list of hosts.
Stop by entering <RETURN>. You may repeat this step until you are
entering an empty list. You will see messages from Grid Engine
when the hosts are added.
Host(s):
Finished adding hosts. Hit <RETURN> to continue >>
Creating the default <all.q> queue and <allhosts> hostgroup
----------------------------------------------------------
root@myhost added "@allhosts" to host group list
root@myhost added "all.q" to cluster queue list
Hit <RETURN> to continue >>
No CSP system installed!
No CSP system installed!
Step 25
Scheduler Tuning
---------------
The details on the different options are described in the manual.
Configurations
-------------
1) Normal
Fixed interval scheduling, report scheduling information,
actual + assumed load
2) High
Fixed interval scheduling, report limited scheduling information,
actual load
3) Max
Immediate Scheduling, report no scheduling information,
actual load
Enter the number of your preferred configuration and hit <RETURN>!
Default configuration is [1] >>
We're configuring the scheduler with >Normal< settings!
Do you agree? (y/n) [y] >>
changed scheduler configuration
Step 27
Using Grid Engine
----------------
You should now enter the command:
source /scratch2/myusername/sge62/default/common/settings.csh
if you are a csh/tcsh user or
# . /scratch2/myusername/sge62/default/common/settings.sh
if you are a sh/ksh user.
This will set or expand the following environment variables:
- $SGE_ROOT          (always necessary)
- $SGE_CELL          (if you are using a cell other than >default<)
- $SGE_CLUSTER_NAME  (always necessary)
- $SGE_QMASTER_PORT  (if you haven't added the service >sge_qmaster<)
- $SGE_EXECD_PORT    (if you haven't added the service >sge_execd<)
- $PATH/$path        (to find the Grid Engine binaries)
- $MANPATH           (to access the manual pages)
Hit <RETURN> to see where Grid Engine logs messages >>
Grid Engine messages
--------------------
Grid Engine messages can be found at:
Startup messages can be found in SMF service log files.
You can get the name of the log file by calling svcs -l <SERVICE_NAME>
E.g.: svcs -l svc:/application/sge/qmaster:p10500
After startup the daemons log their messages in their spool directories.
Qmaster:     /scratch2/myusername/sge62/default/spool/qmaster/messages
Exec daemon: <execd_spool_dir>/<hostname>/messages
Grid Engine startup scripts
---------------------------
Grid Engine startup scripts can be found at:
/scratch2/myusername/sge62/default/common/sgemaster (qmaster)
/scratch2/myusername/sge62/default/common/sgeexecd (execd)
Do you want to see previous screen about using Grid Engine again (y/n) [n] >>
Your Grid Engine qmaster installation is now completed
-----------------------------------------------------
Please now login to all hosts where you want to run an execution daemon
and start the execution host installation procedure.
If you want to run an execution daemon on this host, please do not forget
to make the execution host installation in this host as well.
All execution hosts must be administrative hosts during the installation.
All hosts which you added to the list of administrative hosts during this
installation procedure can now be installed.
You may verify your administrative hosts with the command
# qconf -sh
and you may add new administrative hosts with the command
# qconf -ah <hostname>
Please hit <RETURN> >>
sge_qmaster successfully installed!
How to Install Shadow Master Host
Shadow master hosts are machines in the cluster that can detect a failure of the master
daemon and take over its role as master host. When the shadow master daemon
detects that the master daemon has failed abnormally, it starts up a new master
daemon on the host where the shadow master daemon is running.
The shadow master host file, $SGE_ROOT/$SGE_CELL/common/shadow_masters, contains
the name of the primary master host, which is the machine where the master daemon
initially runs, followed by the names of the shadow master hosts. The order of the
shadow master hosts is significant. The primary master host is the first line in the file.
If the primary master host fails, the shadow master defined in the
second line takes over. If this shadow master also fails, the shadow master defined in
the third line takes over, and so forth. You can affect this order by installing shadow
master daemons first on hosts that you want to be at the top of this list.
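The takeover order described above can be read directly from the shadow master host file. The following is a minimal sketch; takeover_order is a hypothetical helper, and it assumes that the file holds one host name per line, with the primary master host on the first line.

```shell
# Print the takeover order recorded in a shadow_masters file.
# The first non-empty line is the primary master host; every later
# line is a shadow master, in the order in which it would take over.
# takeover_order is a hypothetical helper, not a Grid Engine command.
takeover_order() {
    awk 'NF {
        n++
        if (n == 1) print "primary: " $1
        else print "shadow " (n - 1) ": " $1
    }' "$1"
}

# Example call, assuming SGE_ROOT and SGE_CELL are set:
# takeover_order "$SGE_ROOT/$SGE_CELL/common/shadow_masters"
```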
Steps
1. Log in to the shadow master host as root.
2. If the $SGE_ROOT environment variable is not set, set it by typing:
# SGE_ROOT=<path_to_installation_directory (the directory MUST contain all Grid Engine files such as Grid Engine binaries)>; export SGE_ROOT
To confirm that you have set the $SGE_ROOT environment variable, type:
# echo $SGE_ROOT
3. Go to the installation directory.
■ If the directory where the installation files reside is visible from the shadow
master host, change directory (cd) to the installation directory sge-root, and
then proceed to the next step.
■ If the directory is not visible and cannot be made visible, do the following:
– Create a local installation directory, sge-root, on the shadow master host.
– Copy the installation files to the local installation directory sge-root
across the network, for example, by using ftp or rcp.
– Change directory (cd) to the local sge-root directory.
4. Type the inst_sge -sm command.
This command starts the shadow master host installation procedure. You are
asked several questions, and you might be required to run some administrative
actions.
For a complete installation example, see Example Shadow Master Host
Installation.
# ./inst_sge -sm
See Steps 1-4 in the Example Shadow Master Host Installation.
5. Choose an administrative account owner.
See Step 5 in the Example Shadow Master Host Installation. Use the same
administrative user as in the qmaster installation.
6. Verify the $SGE_ROOT directory setting.
See Step 6 in the Example Shadow Master Host Installation. The value of
$SGE_ROOT in the example is /sge.
7. Type the name of your cell.
See Step 7 in the Example Shadow Master Host Installation.
8. Confirm that the host is known by the qmaster host.
See Step 8 in the Example Shadow Master Host Installation.
9. (Optional) Specify JMX MBean Server values.
This step is presented only if you installed qmaster with the JMX MBean Server.
See Step 9 in the Example Shadow Master Host Installation.
■ Enter the following information:
– JAVA_HOME path
– Additional JVM arguments
Caution: If you are on a 64-bit system, you need to provide JAVA_HOME for a
64-bit Java (usually installed as an addition to the 32-bit Java).
10. Confirm creation of the local configuration.
See Step 10 in the Example Shadow Master Host Installation.
11. Specify whether you want to start the shadow master daemon when the system is
booted.
See Step 11 in the Example Shadow Master Host Installation.
Installation is now complete. See Step 12 in the Example Shadow Master Host
Installation.
Starting a Shadow Master Host Manually
To start a shadow master host manually, the system must be sure either that the old
master daemon has terminated, or that it will terminate without performing actions
that interfere with the newly started shadow master.
In very rare circumstances, you might not be able to determine whether the old master
daemon has terminated or if it will terminate. In such cases, an error message is logged
to the messages log file of the sge_shadowd daemons on the shadow master hosts.
If attempts to open a TCP connection to a master daemon fail permanently, make
sure that no master daemon is running, and then restart the master daemon manually
on any of the shadow master machines. See HOW TO RESTART DAEMONS FROM
THE COMMAND LINE for further details.
Configuring Shadow Master Host Environment Variables
Three environment variables affect the takeover time for a shadow master:
Table 2–10 Shadow Master Host Environment Variables
Variable                  Description
SGE_DELAY_TIME            This variable controls the interval in which sge_shadowd pauses
                          if a takeover bid fails. This value is used only when there are
                          multiple sge_shadowd instances that are contending to be the
                          master (the default is 600 seconds).
SGE_CHECK_INTERVAL        This variable controls the interval in which the sge_shadowd
                          checks the heartbeat file (the default is 60 seconds).
SGE_GET_ACTIVE_INTERVAL   This variable controls the interval when a sge_shadowd instance
                          tries to take over when the heartbeat file has not changed.
These variables interact in the following ways:
1. The master host updates the heartbeat file every 30 seconds.
2. The sge_shadowd daemon checks for changes to the heartbeat file at an interval
defined by the SGE_CHECK_INTERVAL variable. This value must be greater than 30
seconds.
■ If the heartbeat file has been updated, the sge_shadowd daemon restarts the
waiting clock.
■ If the heartbeat file has not been updated, the sge_shadowd daemon continues
to wait until the designated interval defined by the SGE_CHECK_INTERVAL
variable expires. This action ensures that the sge_shadowd daemon is not too
aggressive in trying to take over and allows the master host some leeway in
updating the heartbeat file.
3. When the SGE_GET_ACTIVE_INTERVAL has expired, the sge_shadowd daemon then
takes over if the heartbeat file has still not been updated.
A reasonable configuration might be to set the SGE_CHECK_INTERVAL to 45 seconds and
the SGE_GET_ACTIVE_INTERVAL to 90 seconds. So, after about two minutes, the takeover
will occur. Meanwhile, you get an error message whenever a Grid Engine system
command is run. If you want to check the operation of the shadow host after you have
configured these environment variables, you will have to disconnect the master host's
network cable to simulate a failure.
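The configuration suggested above can be expressed as environment settings for the sge_shadowd daemon. The following is a minimal sketch; takeover_estimate is a hypothetical helper, and it assumes that the two intervals simply add up, which is how the figures in the paragraph above read (45 + 90 seconds, or about two minutes). The variables have to be present in the environment from which sge_shadowd is started.

```shell
# Values from the example configuration in the text.
SGE_CHECK_INTERVAL=45
SGE_GET_ACTIVE_INTERVAL=90
export SGE_CHECK_INTERVAL SGE_GET_ACTIVE_INTERVAL

# Print a rough takeover time in seconds, assuming the intervals add up.
# Rejects a check interval that is not greater than the 30 second
# heartbeat update period. takeover_estimate is a hypothetical helper.
takeover_estimate() {
    if [ "$1" -le 30 ]; then
        echo "SGE_CHECK_INTERVAL must be greater than 30 seconds" >&2
        return 1
    fi
    echo $(( $1 + $2 ))
}

takeover_estimate "$SGE_CHECK_INTERVAL" "$SGE_GET_ACTIVE_INTERVAL"
```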
Note: The file $SGE_ROOT/$SGE_CELL/common/act_qmaster contains
the name of the host that is actually running the sge_qmaster daemon.
If the master daemon is shut down gracefully, the shadow master daemon does not
start up. If you want the shadow master daemon to take over after you shut down the
master daemon gracefully, remove the lock file that is located in the sge_qmaster
spool directory. The default location of this spool directory is
$SGE_ROOT/$SGE_CELL/spool/qmaster.
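After a takeover test, the act_qmaster file described above shows which host is currently running the master daemon. The following is a minimal sketch; current_master is a hypothetical helper that only reads the file, and it assumes the cell name default when no cell is given.

```shell
# Print the host name recorded in act_qmaster.
# $1 is the Grid Engine root directory, $2 the cell name
# (assumed to be "default" when omitted).
# current_master is a hypothetical helper, not a Grid Engine command.
current_master() {
    head -n 1 "$1/${2:-default}/common/act_qmaster"
}

# Example call, assuming SGE_ROOT and SGE_CELL are set:
# current_master "$SGE_ROOT" "$SGE_CELL"
```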
Example Shadow Master Host Installation
The following example shows a complete Grid Engine shadow master host
installation. Remember that this is only an optional step in the entire Grid Engine
installation process. The steps in this example correspond to the steps in
How to Install Shadow Master Host.
Steps 1-4
% su
# cd /sge
# ./inst_sge -sm
Shadow Master Host Setup
------------------------
Make sure, that the host, you wish to configure as a shadow host,
has read/write permissions to the qmaster spool and SGE_ROOT/<cell>/common
directory! For using a shadow master it is recommended to set up a
Berkeley DB Spooling Server
Hit <RETURN> to continue >>
Step 5
Grid Engine admin user account
-----------------------------
The current directory
/sge
is owned by user
sgeadmin
If user >root< does not have write permissions in this directory on *all*
of the machines where Grid Engine will be installed (NFS partitions not
exported for user >root< with read/write permissions) it is recommended to
install Grid Engine that all spool files will be created under the user id
of user >sgeadmin<.
IMPORTANT NOTE: The daemons still have to be started by user >root<.
Do you want to install Grid Engine as admin user >sgeadmin< (y/n) [y] >>
Installing Grid Engine as admin user >sgeadmin<
Hit <RETURN> to continue >>
Step 6
Checking $SGE_ROOT directory
---------------------------
The Grid Engine root directory is not set!
Please enter a correct path for SGE_ROOT.
If this directory is not correct (e.g. it may contain an automounter
prefix) enter the correct path to this directory or hit <RETURN>
to use default [/sge] >>
Your $SGE_ROOT directory: /sge
Hit <RETURN> to continue >>
Step 7
Please enter your SGE_CELL directory or use the default [default] >>
Step 8
Checking hostname resolving
--------------------------
This hostname is known at qmaster as an administrative host.
Hit <RETURN> to continue >>
Step 9
Grid Engine JMX MBean server
---------------------------
In order to use the Service Domain Manager (SDM) SGE adapter
you need to configure a JMX server in qmaster. Qmaster will then
load a Java Virtual Machine through a shared library.
NOTE: Java 1.5 or later is required for the JMX MBean server.
Please give some basic parameters for JMX MBean server
We may ask for
- JAVA_HOME
- additional JVM arguments (optional)
Detecting suitable JAVA ...
Please enter JAVA_HOME or press enter [/usr/jdk/latest] >>
Please enter additional JVM arguments (optional, default is [-Xmx256m]) >>
Using the following JMX MBean server settings.
libjvm_path >/usr/jdk/latest/jre/lib/amd64/server/libjvm.so<
Additional JVM arguments >-Xmx256m<
Do you want to use these data (y/n) [y] >>
Hit <RETURN> to continue >>
Step 10
Creating local configuration
----------------------------
sgeadmin@shadow1 modified "shadow1" in configuration list
Local configuration for host >shadow1< created.
Hit <RETURN> to continue >>
Step 11
shadow startup script
---------------------
Do you want to start shadowd automatically at machine boot?
NOTE: If you select "n" SMF will be not used at all! (y/n) [y] >> y
Hit <RETURN> to continue >>
Step 12
Starting sge_shadowd on host shadow1
Shadowhost installation completed!
How to Install Execution Hosts
The execution host installation procedure creates the appropriate directory hierarchy
required by sge_execd, and starts the sge_execd daemon on the execution host. This
section describes how to install execution hosts interactively from the command line.
You can automate the installation of multiple execution hosts by using the
procedure described in Automating the Installation Process.
Before You Begin
Before installing an execution host, you first need to install the master server as
described in How to Install the Master Host and share the common directory.
Caution: If you fail to share the $SGE_ROOT/$SGE_CELL/common
directory, you will not be able to install execution hosts on nodes other
than the qmaster host.
Note: For Windows only. You must satisfy several prerequisites
before you can install Grid Engine execution hosts with Windows
operating systems.
■ You might have to install additional software on your computer. See Microsoft
Services for UNIX and Microsoft Subsystem for UNIX-based Applications.
■ See the steps described in How to Install a CSP-Secured System - Steps 6a, 6b and
6c.
■ After the installation, each user has to register their Windows password with Grid
Engine using the sgepasswd client application. See User Management on Windows
Hosts for more information.
Steps
1. Log in to the execution host as root.
2. As you did for the master installation, either copy the installation files to a local
installation directory sge-root or use a network installation directory.
3. If the $SGE_ROOT environment variable is not set, set it by typing:
# SGE_ROOT=<path_to_install/unpacked_directory>; export SGE_ROOT
To confirm that you have set the $SGE_ROOT environment variable, type:
# echo $SGE_ROOT
4. Change directory (cd) to the installation directory sge-root.
5. Verify that the execution host has been declared as an administration host.
■ If you do not see the name of this execution host in the output of the qconf
-sh command, you will need to declare it as an administration host.
– Start a new terminal session or window.
– In that window, log into the master host.
– Declare the execution host as an administration host, using the qconf
command.
# qconf -ah quark
quark added to administrative host list
– Log back out of the master host, and continue with the installation of the
execution host.
6. Type the install_execd command, adding the -csp flag if you are installing using
the Certificate Security Protocol method described in Installing the Increased
Security Features. This command starts the execution host installation procedure.
For a complete installation example, see Example Execution Host Installation.
# ./inst_sge -x
Welcome to the Grid Engine execution host installation
------------------------------------------------------
.
.
.
The execution host installation will take approximately 5 minutes.
Hit <RETURN> to continue >>
7. Verify the $SGE_ROOT directory setting. In the example shown in lines 027 through
041 of the Example Execution Host Installation, the value of $SGE_ROOT is
/scratch2/myusername/sge62.
8. Type the name of your cell or accept the default cell name. See lines 042 through
076 of the Example Execution Host Installation. The use of Grid Engine system
cells is described in Cells.
■ If you have decided to use cells, then type the cell name now.
■ If you have decided not to use cells, then press the Return key.
9. The install script checks to see what ports have been defined for the execution
daemon. See lines 077 through 085 of the Example Execution Host Installation. If
no ports have been defined, you will be asked to define them.
10. The install script checks to see whether the admin user already exists. If the admin
user already exists, the script continues uninterrupted. If the admin user does not
exist, the script shows the following screen where you must supply a password for
the admin user. After the admin user is created, press the Return key.
Local Admin User
----------------
The local admin user sgeadmin, does not exist!
The script tries to create the admin user.
Please enter a password for your admin user >>
Creating admin user sgeadmin, now ...
Admin user created, hit <ENTER> to continue!
11. Verify the execution host has been declared as an administration host. See lines
086 through 092 of the Example Execution Host Installation.
12. Specify whether you want to use a local spool directory. See lines 093 through 122
of the Example Execution Host Installation. For information on spooling, see Spool
Directories under the Root Directory.
■ If you do not want a local spool directory, answer n.
■ If you do want a local spool directory, answer y. In the example,
/tmp/dom/execs is used as the local spool directory on domain.com. Choose
any directory that meets the disk space requirements described in Disk Space
Requirements.
13. Specify whether you want execd to start automatically at boot time. See lines 123
through 131 of the Example Execution Host Installation. You might not want to
install the startup script if you are installing a test cluster or you would rather start
the daemon manually on reboot.
14. WINDOWS ONLY - Choose whether to display the GUI for Windows jobs. See
lines 132 through 163 of the Example Execution Host Installation. A Grid Engine
Helper Service is included with the Grid Engine distribution. This service enables
Windows jobs to display a GUI on the visible desktop of the execution host. The
visible desktop is either the desktop of the user currently logged in on the
execution host or the desktop of the next user who will log in. It is not the log in
screen. The Helper Service is an independent component loosely coupled with the
execution daemon. The startup of the Helper Service is plugged into the Services
dialog box in the Windows control panel. You can install only one Helper Service
per host. There can be only one execution daemon installed per Helper Service. The
installation script asks during the installation of an execution host whether you
want to see the GUI of Windows jobs.
15. Specify a queue for this host. See lines 164 through 183 of the Example Execution
Host Installation. Once you answer this question, the installation process is
complete. Several screens of information will be displayed before the script exits.
16. Create the environment variables ($SGE_ROOT and $SGE_CELL) for use with the
Grid Engine Software. See lines 184 through 234 of the Example Execution Host
Installation.
Note: If no cell name was specified during installation, the value of
$SGE_CELL is default.
■ If you are using a C shell, type the following command:
% source $SGE_ROOT/$SGE_CELL/common/settings.csh
■ If you are using a Bourne shell or Korn shell, type the following command:
$ . $SGE_ROOT/$SGE_CELL/common/settings.sh
For details about how you can verify that the execution host has been set up correctly,
see How to Verify That the Daemons Are Running on the Execution Hosts.
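The administrative-host check in step 5 can be scripted. The following is a minimal sketch and not part of the Grid Engine distribution: the function name is_admin_host is hypothetical, and the sketch assumes that qconf -sh prints one host name per line. The function reads the list on standard input, so the matching logic can be tried without a running qmaster.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Grid Engine): succeed if the host
# name given as $1 appears as a whole line on standard input, for example
# in the output of `qconf -sh`.
is_admin_host() {
    # -x matches whole lines only, -F treats the name as a fixed string,
    # -q suppresses output so that only the exit status is used.
    grep -qxF -- "$1"
}

# Example use on the execution host (requires a running qmaster):
#   if qconf -sh | is_admin_host "$(hostname)"; then
#       echo "already an administrative host"
#   else
#       echo "run 'qconf -ah $(hostname)' on the master host first"
#   fi
```

The comparison is exact and case sensitive. If the qmaster lists fully qualified names while hostname prints a short name (or the reverse), the check reports a miss, so pass the name in the same form that qconf -sh prints.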
Example Execution Host Installation
The following example shows a complete Grid Engine execution host installation.
Before you install the execution host, you need to first install the master server as
described in How to Install the Master Host. The line numbers in this example are
referred to from the execution host installation description at How to Install Execution
Hosts.
Steps 1-6 (lines 001 through 026)
% su
# qstat -f
# ./inst_sge -x
Welcome to the Grid Engine execution host installation
------------------------------------------------------
If you haven't installed the Grid Engine qmaster host yet, you must execute
this step (with >install_qmaster<) prior the execution host installation.
For a sucessful installation you need a running Grid Engine qmaster. It is
also necessary that this host is an administrative host.
You can verify your current list of administrative hosts with
the command:
# qconf -sh
You can add an administrative host with the command:
# qconf -ah <hostname>
The execution host installation will take approximately 5 minutes.
Hit <RETURN> to continue >>
Step 7 (lines 027 through 041)
Checking $SGE_ROOT directory
----------------------------
The Grid Engine root directory is:
$SGE_ROOT = /scratch2/myusername/sge62
If this directory is not correct (e.g. it may contain an automounter
prefix) enter the correct path to this directory or hit <RETURN>
to use default [/scratch2/myusername/sge62] >>
Your $SGE_ROOT directory: /scratch2/myusername/sge62
Hit <RETURN> to continue >>
Step 8 (lines 042 through 076)
Grid Engine cells
-----------------
Please enter cell name which you used for the qmaster
installation or press <RETURN> to use [default] >>
Using cell: >default<
Hit <RETURN> to continue >>
... set owner of /var/sgeCA/port10500 to bofur+myusername
... copy /var/sgeCA/port10500/default/userkeys/root to
/var/sgeCA/port10500/default/userkeys/bofur+Administrator
cp: /var/sgeCA/port10500/default/userkeys/root: No such file or directory
... copy /var/sgeCA/port10500/default/userkeys/root to
/var/sgeCA/port10500/default/userkeys/Administrator
cp: /var/sgeCA/port10500/default/userkeys/root: No such file or directory
... copy /var/sgeCA/port10500/default/userkeys/myusername to
/var/sgeCA/port10500/default/userkeys/bofur+myusername
... set owner of /var/sgeCA/port10500/default/userkeys/Administrator to Administrator
... set owner of /var/sgeCA/port10500/default/userkeys/bofur+Administrator to bofur+Administrator
... set owner of /var/sgeCA/port10500/default/userkeys/myusername to myusername
... set owner of /var/sgeCA/port10500/default/userkeys/bofur+myusername to bofur+myusername
... remove old /var/sgeCA/port10500/default/userkeys/root certificates
WINDOWS certificates are copied and permissions are set!
Step 9 (lines 077 through 085)
Grid Engine TCP/IP communication service
----------------------------------------
The port for sge_execd is currently set BOTH as service and by the
shell environment
SGE_EXECD_PORT = 10501
sge_execd service set to port 725
Step 10
If the admin user already exists, the script automatically skips this step. See How to
Install Execution Hosts for more information.
Step 11 (lines 086 through 092)
Checking hostname resolving
---------------------------
This hostname is known at qmaster as an administrative host.
Hit <RETURN> to continue >>
Step 12 (lines 093 through 122)
Local execd spool directory configuration
-----------------------------------------
During the qmaster installation you've already entered a global
execd spool directory. This is used, if no local spool directory is configured.
Now you can configure a local spool directory for this host.
ATTENTION: The local spool directory doesn't have to be located on a local
drive. It is specific to the <local> host and can be located on network drives,
too. But for performance reasons, spooling to a local drive is recommended.
FOR WINDOWS USER: On Windows systems the local spool directory MUST be set
to a local harddisk directory.
Installing an execd without local spool directory makes the host unuseable.
Local spooling on local harddisk is mandatory for Windows systems.
Do you want to configure a local spool directory
for this host (y/n) [n] >> y
Please enter the local spool directory now! >> /tmp/dom/execs
Using local execd spool directory [/tmp/dom/execs]
Hit <RETURN> to continue >>
Creating local configuration
----------------------------
myusername@domain.com modified "domain.com" in configuration list
Local configuration for host >domain.com< created.
Hit <RETURN> to continue >>
Step 13 (lines 123 through 131)
execd startup script
--------------------
We can install the startup script that will
start execd at machine boot (y/n) [y] >> n
Hit <RETURN> to continue >>
Step 14 (lines 132 through 163)
Windows Helper Service Installation
-----------------------------------
If you're going to run Windows job's using GUI support, you have
to install the Windows Helper Service
Do you want to install the Windows Helper Service? (y/n) [n] >> y
Testing, if a service is already installed!
... a service is already installed!
... stopping service!
... uninstalling old service!
Service successfully uninstalled.
... moving new service binary!
... installing new service!
Service successfully installed.
... starting new service!
Hit <RETURN> to continue >>
Grid Engine execution daemon startup
------------------------------------
Starting execution daemon. Please wait ...
starting sge_execd
Hit <RETURN> to continue >>
Step 15 (lines 164 through 183)
Adding a queue for this host
----------------------------
We can now add a queue instance for this host:
- it is added to the >allhosts< hostgroup
- the queue provides 1 slot(s) for jobs in all queues
referencing the >allhosts< hostgroup
You do not need to add this host now, but before running jobs on this host
it must be added to at least one queue.
Do you want to add a default queue instance for this host (y/n) [y] >>
No modification because "bofur" already exists in "hostlist" of "hostgroup"
root@domain.com modified "@allhosts" in host group list
root@domain.com modified "all.q" in cluster queue list
Hit <RETURN> to continue >>
Step 16 (lines 184 through 234)
Using Grid Engine
-----------------
You should now enter the command:
source /scratch2/myusername/sge62/default/common/settings.csh
if you are a csh/tcsh user or
# . /scratch2/myusername/sge62/default/common/settings.sh
if you are a sh/ksh user.
This will set or expand the following environment variables:
- $SGE_ROOT          (always necessary)
- $SGE_CELL          (if you are using a cell other than >default<)
- $SGE_CLUSTER_NAME  (always necessary)
- $SGE_QMASTER_PORT  (if you haven't added the service >sge_qmaster<)
- $SGE_EXECD_PORT    (if you haven't added the service >sge_execd<)
- $PATH/$path        (to find the Grid Engine binaries)
- $MANPATH           (to access the manual pages)
Hit <RETURN> to see where Grid Engine logs messages >>
Grid Engine messages
--------------------
Grid Engine messages can be found at:
/tmp/qmaster_messages (during qmaster startup)
/tmp/execd_messages (during execution daemon startup)
After startup the daemons log their messages in their spool directories.
Qmaster: /scratch2/myusername/sge62/default/spool/qmaster/messages
Exec daemon: <execd_spool_dir>/<hostname>/messages
Grid Engine startup scripts
---------------------------
Grid Engine startup scripts can be found at:
/scratch2/myusername/sge62/default/common/sgemaster (qmaster)
/scratch2/myusername/sge62/default/common/sgeexecd (execd)
Do you want to see previous screen about using Grid Engine again (y/n) [n] >>
Your execution daemon installation is now completed.
How to Register Administration Hosts
The master host is implicitly allowed to run administrative tasks and to submit,
monitor, and delete jobs. The master host does not require any additional installation
or configuration to perform administration functions. By contrast, pure administration
hosts do require registration.
Note: You can also install administration hosts by using the QMON
graphical user interface. See Oracle Grid Engine Administration Guide
for information about configuring administration hosts with QMON.
To register an administration host from the command line:
1. On the master host, log in to the Grid Engine system administrative account, for
example, the sgeadmin account.
2. Type the following command:
% qconf -ah <admin-host-name>[,...]
How to Register Submit Hosts
Note: You can also install submit hosts by using the QMON
graphical user interface. See Oracle Grid Engine Administration Guide.
To register a submit host from the command line:
1. On the master host, log in to the Grid Engine system administrative account, for
example, the sgeadmin account.
2. Type the following command:
% qconf -as <submit-host-name>[,...]
Refer to Oracle Grid Engine Administration Guide for more details and other means to
configure the different host types.
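Both qconf -ah and qconf -as take a comma-separated host list, as the [,...] in the syntax above indicates. The following is a minimal sketch of building that list from separate host names. The function name join_hosts and the host names in the usage comments are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: join all arguments with commas, producing the
# <host>[,...] form shown for qconf -ah and qconf -as.
join_hosts() {
    # "$*" joins the arguments using the first character of IFS.
    # The subshell keeps the IFS change from leaking to the caller.
    (IFS=,; printf '%s\n' "$*")
}

# Example use on the master host, as the administrative user:
#   qconf -ah "$(join_hosts quark lepton)"
#   qconf -as "$(join_hosts quark lepton)"
```

Keeping the join in a function lets the same host list be reused for both the administration and the submit registration.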
How to Install the Berkeley DB Spooling Server
The installation procedure installs the Grid Engine software necessary for Berkeley DB
spooling.
1. Load the Grid Engine software onto a local file system. For details on how to
extract the files, see How to Load the Distribution Files on a Workstation.
2. Log in to the spooling server host as root.
3. If the $SGE_ROOT environment variable is not set, set it by typing:
# SGE_ROOT=sge-root; export SGE_ROOT
To confirm that you have set the $SGE_ROOT environment variable, type:
# echo $SGE_ROOT
4. Change to the installation directory.
# cd $SGE_ROOT
5. Type the inst_sge command with the -db option.
# sge-root/inst_sge -db
This command starts the spooling server installation procedure. You are asked
several questions. If you think something went wrong, you can quit the
installation procedure and restart it at any time.
6. Choose an administrative account owner.
Choosing Grid Engine admin user account
---------------------------------------
You may install Grid Engine that all files are created with the user id of an
unprivileged user.
This will make it possible to install and run Grid Engine in directories
where user >root< has no permissions to create and write files and directories.
- Grid Engine still has to be started by user >root<
- this directory should be owned by the Grid Engine administrator
Do you want to install Grid Engine
under an user id other than >root< (y/n) [y] >> y
Choosing a Grid Engine admin user name
--------------------------------------
Please enter a valid user name >> sgeadmin
Installing Grid Engine as admin user >sgeadmin<
Hit <RETURN> to continue >>
7. Verify the $SGE_ROOT directory setting. In the following example, the value of
$SGE_ROOT is /opt/sge62.
Checking $SGE_ROOT directory
----------------------------
The Grid Engine root directory is:
$SGE_ROOT = /opt/sge62
If this directory is not correct (e.g. it may contain an automounter
prefix) enter the correct path to this directory or hit <RETURN>
to use default [/opt/n1ge6] >>
Your $SGE_ROOT directory: /opt/sge62
Hit <RETURN> to continue >>
8. Type the name of your cell. The use of Grid Engine system cells is described in
Cells.
Grid Engine cells
-----------------
Grid Engine supports multiple cells.
If you are not planning to run multiple Grid Engine clusters or if you don't
know yet what is a Grid Engine cell it is safe to keep the default cell name
default
If you want to install multiple cells you can enter a cell name now.
The environment variable
$SGE_CELL=<your_cell_name>
will be set for all further Grid Engine commands.
Enter cell name [default] >>
9. Select Berkeley DB spooling.
Setup spooling
--------------
Your Grid Engine binaries are compiled to link the spooling libraries
during runtime (dynamically). So you can choose between Berkeley DB
spooling and Classic spooling method.
Please choose a spooling method (berkeleydb|classic) [berkeleydb] >>
10. Verify your host name. In this example, the installation script is being run on
host2.
Berkeley Database spooling parameters
-------------------------------------
You are going to install an RPC Client/Server mechanism!
In this case, qmaster will
contact an RPC server running on a separate server machine.
If you want to use the Grid Engine shadowd, you have to use the
RPC Client/Server mechanism.
Enter database server name or
hit <RETURN> to use default [host2] >>
11. Type the directory path of your spooling directory. You might need to change this
path if this directory is NFS mounted, or if you do not have write permissions to
this directory.
Enter the database directory
or hit <RETURN> to use default [/opt/sge62/default//spooldb] >>
creating directory: /opt/sge62/default//spooldb
12. Start the RPC server.
Now we have to startup the rc script
>/opt/sge62/default/common/sgebdb<
on the RPC server machine
If you already have a configured Berkeley DB Spooling Server,
you have to restart the Database with the rc script now and continue with >NO<
Shall the installation script try to start the RPC server? (y/n) [y] >> y
Starting rpc server on host host2!
The Berkeley DB has been started with these parameters:
Spooling Server Name: host2
DB Spooling Directory: /opt/sge62/default//spooldb
Please remember these values, during Qmaster installation
you will be asked for them! Hit <RETURN> to continue!
13. Specify whether you want Berkeley DB service to start automatically at boot time.
Berkeley DB startup script
--------------------------
We can install the startup script that
Grid Engine is started at machine boot (y/n) [y] >> y
Once you answer this question, the installation process is complete.
14. Create the environment variables for use with the Grid Engine software.
Note: If no cell name was specified during installation, the value of
$SGE_CELL is default.
■ If you are using a C shell, type the following command:
% source $SGE_ROOT/$SGE_CELL/common/settings.csh
■ If you are using a Bourne shell or Korn shell, type the following command:
$ . $SGE_ROOT/$SGE_CELL/common/settings.sh
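The choice between the two settings files can be expressed as a small function. This is a sketch only: the function name settings_file is hypothetical, and the sketch assumes that any shell other than csh or tcsh can read settings.sh.

```shell
#!/bin/sh
# Hypothetical helper: print the settings file to source for a login shell.
#   $1 = shell name or path (for example "$SHELL")
#   $2 = value of SGE_ROOT
#   $3 = cell name (optional; "default" is used when empty or missing)
settings_file() {
    cell=${3:-default}
    # ${1##*/} strips any leading directory, so /bin/tcsh becomes tcsh.
    case "${1##*/}" in
        csh|tcsh) printf '%s\n' "$2/$cell/common/settings.csh" ;;
        *)        printf '%s\n' "$2/$cell/common/settings.sh" ;;
    esac
}

# Example use in a Bourne-compatible shell:
#   . "$(settings_file "$SHELL" /opt/sge62)"
```

The function only prints a path. The file still has to be sourced by the user's own shell, because environment variables set in a child process do not reach the parent.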
Installing the Increased Security Features
Use the instructions in this section to set up your system more securely. These
instructions will help you set up your system with Certificate Security Protocol
(CSP)-based encryption.
Why Install the Increased Security Features?
Instead of transferring messages in clear text, the messages in this secure system are
encrypted with a secret key. The secret key is exchanged using a public/private key
protocol. Users present their certificates through the Grid Engine system to prove
identity. Users receive the certificate to ensure that they are communicating with the
correct systems. After this initial announcement phase, communication continues
transparently in encrypted form. The session is valid only for a certain period, after
which the session must be re-announced.
Additional Setup Required
The steps required to set up the Certificate Security Protocol enhanced version of the
Grid Engine system are similar to the standard setup. You generally follow the
instructions in Planning the Installation, Loading the Distribution Files on a
Workstation, How to Install the Master Host, How to Install Execution Hosts and How
to Register Administration Hosts.
However, the following additional tasks are required:
■ Generating the Certificate Authority (CA) system keys and certificates on the
master host by calling the installation script with the -csp flag
■ Distributing the system keys and certificates to the execution and submit hosts
using a secure method such as ssh
■ Generating user keys and certificates automatically, after master installation
■ Adding new users
Topic                                                    Description
How to Install a CSP-Secured System                      Procedure for installing a CSP-secured system.
How to Generate Certificates and Private Keys for Users  Procedure for generating user-specific certificates and private keys.
How to Renew Certificates                                Procedure for renewing user-specific certificates.
How to Check Certificates                                Procedure for checking user-specific certificates.
How to Install a CSP-Secured System
Install the Grid Engine software as outlined in Performing an Installation, with the
following exception: use the additional flag -csp when invoking the various
installation scripts. To install a CSP-secured system do the following:
1. Change the master host installation procedure. Type the following command and
respond to the prompts from the installation script.
# ./install_qmaster -csp
2. Supply the following information to generate the CSP certificates and keys:
■ Two-letter country code, for example, US for the United States
■ State
■ Location, such as a city
■ Organization
■ Organizational unit
■ CA email address
As the installation proceeds, the Certificate Authority is
created. A CA specific to the Grid Engine system is created on the master host.
The directories that contain information relevant to security are as follows:
■ The publicly accessible CA and daemon certificate are stored in
$SGE_ROOT/$SGE_CELL/common/sgeCA
■ The corresponding private keys are stored in
/var/sgeCA/{sge_service| portSGE_QMASTER_PORT}/cell/private
■ User keys and certificates are stored in
/var/sgeCA/{sge_service| portSGE_QMASTER_PORT}/cell/userkeys/$USER
3. The script prompts you for site information.
4. Confirm whether the information you supplied is correct.
5. After the security-related setup of the master host sge_qmaster is finished, the
script prompts you to continue with the rest of the installation procedure, as in the
following example:
SGE startup script
------------------
Your system wide Grid Engine startup script is installed as:
"/scratch2/eddy/sge_sec/default/common/sgemaster"
Hit Return to continue >>
6. Transfer the directory that contains the private key and the random file to each
execution host.
■ As root on the master host, type the following commands to prepare to copy
the private keys to the machines you set up as execution hosts:
# umask 077
# cd /
# tar cvpf /var/sgeCA/port536.tar /var/sgeCA/port536/default
■ As root on each execution host, use the following commands to securely copy
the files:
# umask 077
# cd /
# scp masterhost:/var/sgeCA/port536.tar .
# umask 022
# tar xvpf /port536.tar
# rm /port536.tar
Note: On a Windows execution host, the tar utility cannot restore
the ownerships and permissions. In this case, the Administrator must
set the ownerships and permissions manually.
■ Type the following command to verify the file permissions:
# ls -lR /var/sgeCA/port536/
The output should look like the following example:
/var/sgeCA/port536/:
total 2
drwxr-xr-x   4 eddy  other   512 Mar  6 10:52 default
/var/sgeCA/port536/default:
total 4
drwx------   2 eddy  staff   512 Mar  6 10:53 private
drwxr-xr-x   4 eddy  staff   512 Mar  6 10:54 userkeys
/var/sgeCA/port536/default/private:
total 8
-rw-------   1 eddy  staff   887 Mar  6 10:53 cakey.pem
-rw-------   1 eddy  staff   887 Mar  6 10:53 key.pem
-rw-------   1 eddy  staff  1024 Mar  6 10:54 rand.seed
-rw-------   1 eddy  staff   761 Mar  6 10:53 req.pem
/var/sgeCA/port536/default/userkeys:
total 4
dr-x------   2 eddy  staff   512 Mar  6 10:54 eddy
dr-x------   2 root  staff   512 Mar  6 10:54 root
/var/sgeCA/port536/default/userkeys/eddy:
total 16
-r--------   1 eddy  staff  3811 Mar  6 10:54 cert.pem
-r--------   1 eddy  staff   887 Mar  6 10:54 key.pem
-r--------   1 eddy  staff  2048 Mar  6 10:54 rand.seed
-r--------   1 eddy  staff   769 Mar  6 10:54 req.pem
/var/sgeCA/port536/default/userkeys/root:
total 16
-r--------   1 root  staff  3805 Mar  6 10:54 cert.pem
-r--------   1 root  staff   887 Mar  6 10:54 key.pem
-r--------   1 root  staff  2048 Mar  6 10:53 rand.seed
-r--------   1 root  staff   769 Mar  6 10:54 req.pem
7. Install the Grid Engine software on each execution host.
# cd $SGE_ROOT
# ./install_execd -csp
8. Respond to the prompts from the installation script. The execution host
installation procedure creates the appropriate directory hierarchy required by
sge_execd, and starts the sge_execd daemon on the execution host. If the root user
does not have write permissions in the $SGE_ROOT directory on all of the machines
where Grid Engine software will be installed, you are asked whether to install the
software as the user to whom the directory belongs. If you answer yes, you must
install the security-related files into that user's $HOME/.sge directory, as shown in
the following example.
% su - sgeadmin
% source $SGE_ROOT/default/common/settings.csh
% $SGE_ROOT/util/sgeCA/sge_ca -copy
% logout
In the above example, sgeadmin is the name of the user who owns the installation
directory.
9. After completing all remaining installation steps, refer to the instructions below in
How to Generate Certificates and Private Keys for Users.
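The examples in this procedure use /var/sgeCA/port536/default, which follows the /var/sgeCA/{sge_service| portSGE_QMASTER_PORT}/cell pattern given in step 2. The following sketch derives that directory name for other ports and cells. The function name ca_dir is hypothetical, and the rule for choosing between the two forms (the port form when a port number is supplied, the sge_service form otherwise) is an assumption read from the pattern; this guide does not state it.

```shell
#!/bin/sh
# Hypothetical helper: print the local CA directory for a cell.
#   $1 = qmaster port number, or empty when no port number is supplied
#   $2 = cell name (optional; "default" is used when empty or missing)
# ASSUMPTION: the port<N> form applies when a port number is given and the
# sge_service form applies otherwise. Check this against your installation.
ca_dir() {
    cell=${2:-default}
    if [ -n "$1" ]; then
        printf '%s\n' "/var/sgeCA/port$1/$cell"
    else
        printf '%s\n' "/var/sgeCA/sge_service/$cell"
    fi
}

# Example use as root, to repeat the permission check from step 6:
#   ls -lR "$(ca_dir "$SGE_QMASTER_PORT" "$SGE_CELL")"
```

Deriving the path in one place avoids hard-coding port536 in the tar, scp, and ls commands when the same procedure is repeated on a cluster that uses a different port.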
How to Generate Certificates and Private Keys for Users
To use the CSP-secured system, the user must have access to a user-specific certificate
and private key. The most convenient method of gaining access is to create a text file
identifying the users.
1. On the master host, create and save a text file that identifies users. Use the format
of the file myusers.txt shown in the following example. The fields of the file are
UNIX_username:Gecos_field:email_address.
eddy:Eddy Smith:eddy@my.org
sarah:Sarah Miller:sarah@my.org
leo:Leo Lion:leo@my.org
2. As root on the master host, type the following command:
# $SGE_ROOT/util/sgeCA/sge_ca -usercert myusers.txt
3. Confirm by typing the following command:
# ls -l /var/sgeCA/port536/default/userkeys
This directory listing produces output similar to the following example.
dr-x------ 2 eddy  staff 512 Mar 5 16:13 eddy
dr-x------ 2 sarah staff 512 Mar 5 16:13 sarah
dr-x------ 2 leo   staff 512 Mar 5 16:13 leo
4. Tell each user listed in the file (myusers.txt in the example) to install the
security-related files in their $HOME/.sge directories by typing the following commands.
% source $SGE_ROOT/default/common/settings.csh
% $SGE_ROOT/util/sgeCA/sge_ca -copy
Users should see the following confirmation (user eddy in the example).
Certificate and private key for user
eddy have been installed
For every Grid Engine software installation, a subdirectory for the corresponding
SGE_QMASTER_PORT number is installed. The following example, based on the
myusers.txt file, shows the directory listing command and its output for user eddy.
% ls -lR $HOME/.sge
/home/eddy/.sge:
total 2
drwxr-xr-x   3 eddy     staff        512 Mar  5 16:20 port536
/home/eddy/.sge/port536:
total 2
drwxr-xr-x   4 eddy     staff        512 Mar  5 16:20 default
/home/eddy/.sge/port536/default:
total 4
drwxr-xr-x   2 eddy     staff        512 Mar  5 16:20 certs
drwx------   2 eddy     staff        512 Mar  5 16:20 private
/home/eddy/.sge/port536/default/certs:
total 8
-r--r--r--   1 eddy     staff       3859 Mar  5 16:20 cert.pem
/home/eddy/.sge/port536/default/private:
total 6
-r--------   1 eddy     staff        887 Mar  5 16:20 key.pem
-r--------   1 eddy     staff       2048 Mar  5 16:20 rand.seed
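The directory layout above follows the pattern $HOME/.sge/port<SGE_QMASTER_PORT>/<cell>. As a small sketch of that pattern, the following hypothetical helper functions (the names are illustrative and not part of Grid Engine) build the certificate and key paths from a port number and an optional cell name, assuming the cell defaults to default.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with Grid Engine) that mirror the
# layout shown in the listing above.
# $1: SGE_QMASTER_PORT number, $2: cell name (assumed to default to "default")
sge_user_cert_path() {
    printf '%s/.sge/port%s/%s/certs/cert.pem\n' "$HOME" "$1" "${2:-default}"
}

sge_user_key_path() {
    printf '%s/.sge/port%s/%s/private/key.pem\n' "$HOME" "$1" "${2:-default}"
}
```

For user eddy and port 536, sge_user_cert_path 536 yields /home/eddy/.sge/port536/default/certs/cert.pem, which matches the listing.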
How to Renew Certificates
1.
Become root on the master host and change to the $SGE_ROOT directory.
# tcsh
# source $SGE_ROOT/default/common/settings.csh
# cd $SGE_ROOT
Note: This assumes that $SGE_CELL is default.
2.
Edit $SGE_ROOT/util/sgeCA/renew_all_certs.csh, and change the number of
days that the certificates are valid:
# extend the validity of the CA certificate by
set CADAYS = 365
# extend the validity of the daemon certificate by
set DAEMONDAYS = 365
# extend the validity of the user certificate by
set USERDAYS = 365
3.
Run the changed script.
# util/sgeCA/renew_all_certs.csh
Note: The default for all extension times is 365 days from the day the
script is run.
4.
Replace the old certificates with the new ones on all hosts that installed them
locally, that is, under /var/sgeCA/... (see the execution daemon installation).
5.
If users have copied certificates and keys to $HOME/.sge, they must run
$SGE_ROOT/util/sgeCA/sge_ca -copy again to gain access to the renewed certificates.
How to Check Certificates
The following sections provide examples of commands related to certificates, where
arch is your system architecture, as in sol-sparc64. Depending on what you want to
do, type one or more of the following commands.
Displaying a Certificate
Type the following as one string with a space between the -in and the ~/.sge
components.
% $SGE_ROOT/utilbin/arch/openssl x509 -in
~/.sge/port536/default/certs/cert.pem -text
Check Issuer
Type the following as one string with a space between the -in and the ~/.sge
components.
% $SGE_ROOT/utilbin/arch/openssl x509 -issuer -in
~/.sge/port536/default/certs/cert.pem -noout
Check Subject
Type the following as one string with a space between the -in and the ~/.sge
components.
% $SGE_ROOT/utilbin/arch/openssl x509 -subject -in
~/.sge/port536/default/certs/cert.pem -noout
Show Email of Certificate
Type the following as one string with a space between the -in and the ~/.sge
components.
% $SGE_ROOT/utilbin/arch/openssl x509 -email -in
~/.sge/port536/default/certs/cert.pem -noout
Show Validity
Type the following as one string with a space between the -in and the ~/.sge
components.
% $SGE_ROOT/utilbin/arch/openssl x509 -dates -in
~/.sge/port536/default/certs/cert.pem -noout
Show Fingerprint
Type the following as one string with a space between the -in and the ~/.sge
components.
% $SGE_ROOT/utilbin/arch/openssl x509 -fingerprint -in
~/.sge/port536/default/certs/cert.pem -noout
Install the security-related files into the admin user's $HOME/.sge directory. If the root
user does not have write permissions in the $SGE_ROOT directory on all of the machines
where Grid Engine software will be installed, you are asked whether to install the
software as the user to whom the directory belongs. If you answer yes, you must install
the security-related files into that user's $HOME/.sge directory, as shown in the
following example, where sgeadmin is the name of the user who owns the installation
directory.
% su - sgeadmin
% source $SGE_ROOT/default/common/settings.csh
% $SGE_ROOT/util/sgeCA/sge_ca -copy
% logout
Verifying the Installation
The verification phase includes the following tasks:
■
Ensuring that the master daemon is running on the master host
■
Ensuring that the daemons are running on all execution hosts
■
Ensuring that you can run simple commands
■
Submitting test jobs
To ensure that the Grid Engine system daemons are running, look for the sge_qmaster
daemon on the master host and the sge_execd daemon on the execution hosts. Once
you have verified that the daemons are running, you can try to use commands and
prepare to submit jobs.
Note: If no cell name was specified during installation, the value of
$SGE_CELL is default.
Topic                                                               Description
How to Verify That the Daemon is Running on the Master Host         Procedure for verifying that the daemon is running on the master host.
How to Verify That the Daemons Are Running on the Execution Hosts   Procedure for verifying that the daemons are running on the execution hosts.
How to Run Simple Commands                                          Procedure for verifying that the Grid Engine software is operational by running some trial commands.
How to Submit Test Jobs                                             Procedure for submitting test jobs.
How to Verify That the Daemon is Running on the Master Host
1.
Log in to the master host. Look in the file $SGE_ROOT/$SGE_CELL/common/act_qmaster
to confirm that you are on the master host.
2.
Verify that the master daemon is running.
■
On BSD-based UNIX systems, type the following command:
% ps -ax | grep sge
You should see output similar to the following example.
14676 p1 S < 4:47 /gridware/sge/bin/solaris/sge_qmaster
■
On systems running a UNIX System 5-based operating system (such as the
Solaris Operating System), type the following command:
% ps -ef | grep sge
You should see output similar to the following example.
root 439 1 0 Jun 2 ? 3:37 /gridware/sge/bin/solaris/sge_qmaster
3.
If you do not see the appropriate string, restart the daemon. To start the master
host daemon, sge_qmaster:
# $SGE_ROOT/$SGE_CELL/common/sgemaster start
4.
Continue the verification process. After you have verified that the master host and
the execution host daemons are running, continue the verification process. See
How to Run Simple Commands.
How to Verify That the Daemons Are Running on the Execution Hosts
1.
Log in to the execution hosts on which you ran the execution host installation
procedure.
2.
Verify that the daemons are running.
■
On BSD-based UNIX systems, type the following command:
% ps -ax | grep sge
You should see output similar to the following example.
14688 p1 S < 4:27 /gridware/sge/bin/solaris/sge_execd
■
On systems running a UNIX System 5-based operating system (such as the
Solaris Operating System), type the following command:
% ps -ef | grep sge
You should see output similar to the following example.
root 171 1 0 Jun 22 ? 7:11 /gridware/sge/bin/solaris/sge_execd
3.
If you do not see similar output, restart the daemon.
# $SGE_ROOT/$SGE_CELL/common/sgeexecd start
4.
Continue the verification process. After you have verified that the master host and
the execution host daemons are running, continue the verification process. See
How to Run Simple Commands below for details.
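A plain ps | grep sge also matches the grep process itself, which can be misleading in scripts. The following sketch shows one way to test for a specific daemon. daemon_running is a hypothetical helper name, and it assumes that the daemon's path is the last field of each ps output line, as in the sample output above (a daemon started with arguments would need a different match).

```shell
#!/bin/sh
# daemon_running NAME
# Hypothetical helper (not shipped with Grid Engine).
# Reads ps output on stdin and succeeds if the last field of some line
# is a path whose final component equals NAME (for example sge_qmaster).
daemon_running() {
    awk -v name="$1" '
        {
            n = split($NF, parts, "/")
            if (parts[n] == name) found = 1
        }
        END { exit !found }
    '
}
```

Typical use would be ps -ef | daemon_running sge_qmaster on the master host and ps -ef | daemon_running sge_execd on each execution host, followed by a check of the exit status.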
How to Run Simple Commands
If both the necessary daemons are running on the master and execution hosts, the Grid
Engine software should be operational. Check by issuing a trial command.
1.
Log in to either the master host or another administrative host. In your standard
search path, make sure to include $SGE_ROOT/bin.
2.
From the command line, type the following command:
% qconf -sconf
This qconf command displays the current global cluster configuration (see
Configuring Clusters). If this command fails, your $SGE_ROOT environment variable
or the port settings are probably not set correctly. Proceed as follows:
1.
Check whether the environment variables SGE_EXECD_PORT and SGE_QMASTER_PORT
are set in the script files, $SGE_ROOT/$SGE_CELL/common/settings.csh or
$SGE_ROOT/$SGE_CELL/common/settings.sh.
Note: If no cell name was specified during installation, the value of
$SGE_CELL is default.
If so, make sure that the environment variables SGE_EXECD_PORT and
SGE_QMASTER_PORT are set to the correct values before you try the command again.
If not, the services database (/etc/services or the NIS services map, for
example) on the machine from which you run the command must provide entries
for both sge_qmaster and sge_execd. If these entries do not exist, add them to
the machine's services database, giving them the same values as are configured
on the master host.
2.
Retry the qconf command.
3.
Try to submit test jobs.
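The port check in step 1 can be sketched as a small script. The helper names service_port and qmaster_port are hypothetical, and the sketch assumes a file in /etc/services format (name, then port/protocol). It does not consult NIS maps.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with Grid Engine).

# service_port NAME [FILE]
# Prints the port number of service NAME from a services-format file
# (default: /etc/services). Comment lines are skipped.
service_port() {
    awk -v name="$1" '
        /^[ \t]*#/ { next }
        $1 == name { split($2, a, "/"); print a[1]; exit }
    ' "${2:-/etc/services}"
}

# qmaster_port [FILE]
# Prefers the SGE_QMASTER_PORT environment variable and falls back to
# the sge_qmaster entry in the services file.
qmaster_port() {
    if [ -n "${SGE_QMASTER_PORT:-}" ]; then
        printf '%s\n' "$SGE_QMASTER_PORT"
    else
        service_port sge_qmaster "$@"
    fi
}
```

If qmaster_port prints nothing, neither the environment variable nor a services entry is available on that machine, which corresponds to the failure case described in step 1.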
How to Submit Test Jobs
Before you start submitting batch scripts to the Grid Engine system, check to see
whether your site's standard shell resource files (.cshrc, .profile, or .kshrc) as well
as your personal resource files contain commands such as stty. Batch jobs do not have
a terminal connection by default, and therefore calls to stty result in an error.
1.
Log in to the master host.
2.
Type the following command.
% rsh <exec-host-name> date
The exec-host-name refers to one of the already installed execution hosts. You
should try this test on all execution hosts if your login or home directories differ
from host to host. The rsh command should give you output similar to the date
command run locally on the master host. If any additional lines contain error
messages, you must fix the cause of the errors before you can run a batch job
successfully. For all command interpreters, you can check on an actual terminal
connection before you run a command such as stty. The following is an example
of a Bourne shell script to test the terminal connection.
tty -s
if [ $? = 0 ]; then
stty erase ^H
fi
The following example shows C shell syntax.
tty -s
if ( $status == 0 ) then
stty erase ^H
endif
3.
Submit one of the sample scripts contained in the $SGE_ROOT/examples/jobs
directory.
% qsub $SGE_ROOT/examples/jobs/simple.sh
4.
Use the qstat command to monitor the job's behavior. For more information
about submitting and monitoring batch jobs, see the Oracle Grid Engine User’s Guide.
5.
After the job finishes executing, check your home directory for the redirected
stdout/stderr files script-name.ejob-id and script-name.ojob-id. The job-id
is a consecutive unique integer number assigned to each job.
In case of problems, see Oracle Grid Engine Administration Guide for fine tuning your
environment and using dtrace for performance tuning.
Automating the Installation Process
This section describes how you can automate the software installation process for the
following reasons:
■
To install the Grid Engine software on many hosts
■
To install the Grid Engine software without user interaction
You can use the $SGE_ROOT/inst_sge utility to install and uninstall Grid Engine
master hosts, execution hosts, shadow hosts, and Berkeley DB spooling server hosts.
You can also use this utility to back up the Grid Engine configuration and
accounting data automatically.
Note: Using the Berkeley DB Spooling Server host does not provide
high availability, and it has no authentication mechanism. It should
only be used on a closed network with fully trusted users.
You can use the inst_sge utility in interactive mode to supplant any of the commands
that were described in Installing the Software From the Command Line.
To simplify automatic installation and backup processes, use the configuration
templates that are located in the $SGE_ROOT/util/install_modules directory.
The automatic installation requires no user interaction. No messages are displayed on
the terminal during the installation.
When the installation finishes, a message indicates where the installation log file
resides. The name of the installation log file has the format
install_hostname_timestamp.log. Normally, you can find information about errors during installation in
this file. In case of serious errors though, the installation script might not be able to
move the log file into the spool directory. In this situation, the log file is placed in the
/tmp directory.
Topic                                                       Description
Automatic Installation                                      Perform an automatic installation by setting up a configuration file.
Automatic Uninstallation                                    Learn how to uninstall hosts automatically.
How to Start the Automatic Backup                           Procedure for backing up configuration and accounting data by using the interactive backup procedure.
Troubleshooting Automatic Installation and Uninstallation   Troubleshoot errors that might be encountered during automatic installation.
Automatic Installation
Special Considerations
The first step in performing an automatic installation is to set up a configuration file.
You can find configuration file templates in the $SGE_ROOT/util/install_modules
directory. Consider the following as you plan your automatic installation:
■
To use automatic installation on remote hosts, the root user must be able to access
those hosts through rsh or ssh without supplying a password.
■
For local spooling, that is, spooling on the master host, no special configuration is
needed. However, the directory where the spooling occurs must not be on an NFS
version 3 volume. You may use an NFS version 4 volume for local spooling.
■
To run the Berkeley DB spooling server on a host other than the master host, you
must install and configure RPC services on this separate host.
To perform this step manually before you start the automatic installation, use the
following command:
./inst_sge -db
You can also use the following command to install automatically the Berkeley DB
Spooling Server:
% ./inst_sge -db -m -x -auto <full-path-to-configuration-file>
This command checks the SPOOLING_SERVER entry within the configuration file and
starts the Berkeley DB installation on the server host.
Note: If you start the automatic installation on the master host, the
entire cluster can be installed with one command. The automatic
installation script accesses the remote hosts through rsh or ssh and
starts the installation remotely. This process requires a
well-formed configuration file, which each host must be able to
read. That file should be installed on each host or shared through
NFS.
Using the inst_sge Utility and a Configuration Template
To automate system installation, use the inst_sge utility in combination with a
configuration file. See How to Automate Other Installations Through a Configuration
File.
Note: You cannot use the auto installation procedure to install
a Windows execution host remotely. You must run the auto
installation procedure directly on the Windows execution host.
How to Automate Installation With Increased Security (CSP)
The automatic installation also supports the Certificate Security Protocol (CSP) mode
described in Installing the Increased Security Features. To use the CSP security mode,
you must fill out the CSP parameters of the template files. The parameters are as
follows:
# This section is used for csp installation mode.
# CSP_RECREATE recreates the certs on each installation, if true.
# In case of false, the certs will be created, if not existing.
# Existing certs won't be overwritten. (mandatory for csp install)
CSP_RECREATE="true"
# The created certs won't be copied, if this option is set to false
# If true, the script tries to copy the generated certs. This
# requires passwordless ssh/rsh access for user root to the
# execution hosts
CSP_COPY_CERTS="false"
# csp information, your country code (only 2 characters)
# (mandatory for csp install)
CSP_COUNTRY_CODE="DE"
# your state (mandatory for csp install)
CSP_STATE="Germany"
# your location, eg. the building (mandatory for csp install)
CSP_LOCATION="Building"
# your organisation (mandatory for csp install)
CSP_ORGA="Organisation"
# your organisation unit (mandatory for csp install)
CSP_ORGA_UNIT="Organisation_unit"
# your email (mandatory for csp install)
CSP_MAIL_ADDRESS="name@yourdomain.com"
To start the installation, type the following command:
inst_sge -m -csp -auto template-file-name
Note: Certificates are created during the installation process. These
certificates have to be copied to each host of the installed cluster. The
installation process can do this for you; however, you need to perform
the following steps to allow the installation process appropriate
permissions to copy the certificates:
1.
Use rsh/rcp or ssh/scp on each host.
2.
Provide the root user with access to each host over ssh or rsh, without
entering a password.
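Because the automatic installation displays no messages on the terminal, it can be useful to check the CSP section of the template before starting. The following is a sketch under stated assumptions: check_csp_conf is a hypothetical helper name, the configuration file is assumed to use plain shell variable assignments as in the excerpt above, and only the parameters marked mandatory there are checked. Sourcing the file executes it, so run this only on files you trust.

```shell
#!/bin/sh
# check_csp_conf FILE
# Hypothetical helper (not shipped with Grid Engine).
# Sources FILE in a subshell and reports CSP parameters that are marked
# "mandatory for csp install" in the template excerpt but are empty.
check_csp_conf() {
    (
        # Subshell: variables from the template do not leak to the caller.
        . "$1" || exit 2
        status=0
        for var in CSP_RECREATE CSP_COUNTRY_CODE CSP_STATE CSP_LOCATION \
                   CSP_ORGA CSP_ORGA_UNIT CSP_MAIL_ADDRESS; do
            eval "val=\${$var:-}"
            if [ -z "$val" ]; then
                echo "missing: $var"
                status=1
            fi
        done
        # The template asks for a country code of exactly 2 characters.
        if [ -n "${CSP_COUNTRY_CODE:-}" ] && [ "${#CSP_COUNTRY_CODE}" -ne 2 ]; then
            echo "CSP_COUNTRY_CODE must be 2 characters"
            status=1
        fi
        exit "$status"
    )
}
```

Pass an absolute path or a path containing a slash, for example check_csp_conf ./my_configuration.conf, so that the shell does not search PATH for the file.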
How to Automate Other Installations Through a Configuration File
In addition to installing the master host, you can perform a variety of other automatic
installations using a similar process. The actual form of the inst_sge command differs
slightly, and different sections of the configuration file apply. This section provides
some examples.
■
To install a shadow host, use the following form of the command:
inst_sge -sm -auto <full-path-to-configuration-file>
Tip: To install more than one shadow host, enter the host names in
the <SHADOW_HOST> parameter section within the configuration file.
■
You can perform a separate execution host installation if the master host was
installed without identified compute hosts or if you need to add
compute hosts. The execution host installation also requires a
configuration file. To install all configured execution hosts, use the following form
of the command:
inst_sge -x -auto <full-path-to-configuration-file>
■
To install the Berkeley database server, use the following form of the command:
inst_sge -db -auto <full-path-to-configuration-file>
See Configuration File Templates.
How to Automate the Master Host Installation
Before You Begin
You need to complete the planning process as outlined in Planning the Installation.
In addition, you need to be able to connect to each of the remote hosts using the rsh or
ssh commands, without supplying a password. If this type of access is not allowed on
your network, you cannot use this method of installation.
Steps
1. Create a copy of the configuration template,
$SGE_ROOT/util/install_modules/inst_template.conf.
# cd $SGE_ROOT/util/install_modules
# cp inst_template.conf my_configuration.conf
2.
Edit your configuration template, using the values from the worksheet you
completed in Planning the Installation. The configuration file template includes
extensive comments to help you decide where appropriate information belongs. See
Configuration File Templates.
3.
Log in as root on the system that you want to be the Grid Engine master host.
4.
Create the $SGE_ROOT directory. The $SGE_ROOT directory is the root directory of
the Grid Engine software hierarchy, for example /opt/sge62.
5.
Go to the $SGE_ROOT directory and start the installation.
# cd $SGE_ROOT
# ./inst_sge -m -auto <full-path-to-configuration-file>
The -m option starts the master host installation and installs the master daemon on the
local machine. In addition, the -auto option sets up any remote hosts, as specified in
the configuration file.
Note: You cannot install a master host remotely. You must always
install a master host locally.
To prevent data loss or destroying an already installed cluster, the automatic
installation terminates if the configured $SGE_CELL directory or the configured
Berkeley DB spooling directory already exists. If the installation terminates, the script
displays the reason for the termination on the screen.
A log file of the master installation is created in the
$SGE_ROOT/default/spool/qmaster directory. The file name has the format
install_hostname_date_time.log.
Tip: You can also combine options if you want to perform multiple
installations with one command. For example, the following
command installs the master daemon on the local machine and
installs all execution hosts that are configured in the configuration file:
./inst_sge -m -x -auto <full-path-to-configuration-file>
Wait for notification that the installation has completed. When the automatic
installation exits successfully, it displays a message similar to the following:
The Install log can be found in the
/opt/sge62/spool/install_myhost_30mar2007_090152.log file.
The installation log file includes any script or error messages that were generated
during installation. If the qmaster_spooling_dir directory exists, the log files will be
in that directory. If the directory does not exist, the log files will be in the /tmp
directory.
Troubleshooting - If you do not want your execution hosts to spool locally, be sure to
set EXECD_SPOOL_DIR_LOCAL="", with no space between the double quotes ("").
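As noted above, the automatic installation terminates if the configured $SGE_CELL directory already exists. Since the installer runs without terminal output, a pre-flight check such as the following sketch can surface that condition beforehand. preflight_cell is a hypothetical helper name, and the cell name is assumed to default to default. The sketch does not check the Berkeley DB spooling directory.

```shell
#!/bin/sh
# preflight_cell SGE_ROOT [SGE_CELL]
# Hypothetical helper (not shipped with Grid Engine).
# Returns non-zero and prints the path if the cell directory already
# exists, which would make the automatic installation terminate.
preflight_cell() {
    cell_dir="$1/${2:-default}"
    if [ -e "$cell_dir" ]; then
        echo "exists: $cell_dir"
        return 1
    fi
    return 0
}
```

A possible use is preflight_cell "$SGE_ROOT" "$SGE_CELL" && ./inst_sge -m -auto <full-path-to-configuration-file>, so that the installer is started only when the cell directory is absent.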
Automatic Uninstallation
You can also uninstall hosts automatically.
Note: Uninstall all compute hosts before you uninstall the master
host. If you uninstall the master host first, you have to uninstall all
execution hosts manually.
To ensure that you have a clean environment, always source the
$SGE_ROOT/$SGE_CELL/common/settings.csh file before proceeding.
Topic                                            Description
How to Uninstall the Master Host Automatically   Procedure for uninstalling the master host automatically.
How to Uninstall Execution Hosts Automatically   Procedure for uninstalling the execution hosts automatically.
How to Uninstall the Shadow Master Host          Procedure for uninstalling the shadow host.
How to Uninstall Execution Hosts Automatically
During the execution host uninstallation, all configuration information for the targeted
hosts is deleted. The uninstallation attempts to stop the exec hosts in a graceful
manner.
First, the queue instances associated with the target host of the uninstallation will be
disabled, so that new jobs will not be started. Then, in sequence, the following actions
are done on each of the running jobs: checkpoint the job; reschedule the job; do forced
rescheduling of the job.
At this point, the queue instance will be empty, and the execution daemon will be shut
down, then the configuration, global spool directory or local spool directory will be
removed.
The configuration file template has a section for identifying hosts that can be
uninstalled automatically. Look for this section:
# Remove this execution hosts in automatic mode
EXEC_HOST_LIST_RM="host1 host2 host3 host4"
Every host in the EXEC_HOST_LIST_RM list will be automatically removed from the
cluster.
To start the automatic uninstallation of execution hosts, type the following command:
% ./inst_sge -ux -auto <full-path-to-configuration-file>
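Because every host in EXEC_HOST_LIST_RM is removed without confirmation, it may be worth listing the hosts that a given configuration file targets before running the command. The following sketch assumes the configuration file uses shell variable syntax as in the excerpt above. list_hosts_to_remove is a hypothetical helper name, and sourcing the file executes it.

```shell
#!/bin/sh
# list_hosts_to_remove FILE
# Hypothetical helper (not shipped with Grid Engine).
# Sources FILE in a subshell and prints each host named in
# EXEC_HOST_LIST_RM on its own line.
list_hosts_to_remove() {
    (
        . "$1" || exit 2
        # Intentionally unquoted: the list is split on whitespace.
        for host in ${EXEC_HOST_LIST_RM:-}; do
            printf '%s\n' "$host"
        done
    )
}
```

With the sample entry shown above, the function prints host1 through host4, one per line.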
How to Uninstall the Master Host Automatically
The master host uninstallation removes all of the Grid Engine configuration files. After
the uninstallation procedure completes, only the binary files remain. If you think that
you will need the configuration information after the uninstallation, perform a backup
of the master host. The master host uninstallation supports both interactive and
automatic mode.
To start the automatic uninstallation of the master host, type the following command:
% ./inst_sge -um -auto <full-path-to-configuration-file>
This command performs the same procedure as in interactive mode, except the user is
not prompted for confirmation of any steps and all terminal output is suppressed.
Once the uninstall process is started, it cannot be stopped.
How to Uninstall the Shadow Master Host
To start the automatic uninstallation of the shadow host, type the following command:
% ./inst_sge -usm -auto <full-path-to-configuration-file>
How to Start the Automatic Backup
The automatic backup procedure backs up configuration and accounting data in much
the same way as the interactive backup procedure. You can run the automatic backup
procedure as a cron job if you want to schedule unattended or periodic backups.
Steps
The automatic backup requires a configuration file, for which a template is located in
the $SGE_ROOT/util/install_modules/backup_template.conf file. Comments within
the configuration file template indicate what values to use for your environment.
After you set up the configuration file, type the following command to start the
automatic backup:
% ./inst_sge -bup -auto <full-path-to-configuration-file>
To prevent overwriting existing backup files, a date/time combination is added to the
end of the backup directory name that is specified in the configuration file.
Example Backup Configuration File
#---------------------------------------------------
# Autobackup Configuration File Template
#---------------------------------------------------
# Please, enter your SGE_ROOT here (mandatory)
SGE_ROOT="/opt/gridengine"
# Please, enter your SGE_CELL here (mandatory)
SGE_CELL="default"
# Please, enter your Backup Directory here
# After backup you will find your backup files here (mandatory)
# The autobackup will add a time /date combination to this dirname
# to prevent an overwriting!
BACKUP_DIR="/opt/backups/ge_backup"
# Please, enter true to get a tar/gz package
# and false to copy the files only (mandatory)
TAR="true"
# Please, enter the backup file name here. (mandatory)
BACKUP_FILE="backup.tar"
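For unattended backups through cron, a small wrapper can make failures easier to spot. The following is a sketch only: run_sge_backup is a hypothetical helper name, and it assumes that inst_sge resides directly in the given Grid Engine root directory and that the configuration file is given as an absolute path (the function changes directory before calling inst_sge).

```shell
#!/bin/sh
# run_sge_backup SGE_ROOT CONF
# Hypothetical wrapper (not shipped with Grid Engine) around the
# automatic backup command shown above.
run_sge_backup() {
    sge_root=$1
    conf=$2
    if [ ! -x "$sge_root/inst_sge" ]; then
        echo "inst_sge not found in $sge_root" >&2
        return 1
    fi
    if [ ! -r "$conf" ]; then
        echo "cannot read $conf" >&2
        return 1
    fi
    # Run from the Grid Engine root, as in the manual command.
    ( cd "$sge_root" && ./inst_sge -bup -auto "$conf" )
}
```

A script that defines this function and ends with a call such as run_sge_backup /opt/gridengine /opt/gridengine/my_backup.conf (both paths are examples) could then be scheduled from the crontab of a suitable user.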
Troubleshooting Automatic Installation and Uninstallation
The following errors might be encountered during auto-installation:
Table 2–11 Troubleshooting Automatic Installation and Uninstallation
Problem: If the $SGE_CELL directory exists, the installation terminates to avoid
overwriting a previous installation.
Solution: Remove or rename the directory.
Problem: If the Berkeley database spooling directory exists, the installation
terminates to avoid overwriting a previous installation.
Solution: This directory must be removed or renamed in order to proceed. Make sure
that the ADMINUSER has permissions to write into the location where the Berkeley
database spooling directory is located. The ADMINUSER will be the owner of the
Berkeley database spooling directory.
Problem: The execution host installation appears to succeed, but the execution
daemon is not started, or no load values are shown.
Solution: Verify that user root is allowed to rsh or ssh to the other host, without
entering a password.
Problem: The JMX thread does not appear to be running. The qmaster messages file
shows the message could not load libjvm ld.so.1: sge_qmaster: fatal: jvm_missing:
open failed: No such file or directory
Solution: Either an incorrect value for SGE_JVM_LIB_PATH is specified in the
installation template file or, if the value was left empty, the installer could not
autodetect a suitable JVM library. Possible reasons include being on a 64-bit
platform and providing a path to a 32-bit JVM library, or not having 64-bit Java
installed at all. After you install the correct Java version, you can change the
libjvm_path attribute from jvm_missing to the correct path to the JVM library by
using the qconf -mconf command.
If your network does not allow user root to have permissions to connect to other hosts
through rsh or ssh without asking for a password, the automatic installation will not
work remotely. In this case, log in to the host and use the following command to start
the automatic installation locally on each host:
% ./inst_sge -x -noremote -auto /tmp/install_config_file.conf
Installing SMF Services
The Service Management Facility (SMF) is a new feature in Solaris 10. It provides a
unified model for controlling services, replaces RC scripts, handles service
dependencies, provides better service availability, and speeds up the boot process. If you
do not use at least Version 10 of the Solaris OS in your cluster, or you do not plan to
use SMF, continue with Installing the Software From the Command Line.
Note: SMF is now the default for the hosts that run at least Version
10 of the Solaris OS. If you want to use the old behavior (RC files) for
the Solaris hosts, you need to start the installation with the -nosmf
option. Use the following command:
./inst_sge -x -nosmf
Installing SMF services includes the following topics:
■
Why Install SMF Services?
■
Additional Setup Required
■
How Do SMF Services Compare to the Normal Services?
Why Install SMF Services?
SMF provides a unified administrative model of the persistent services. It solves many
challenges of the previous approaches. All services have a common place for log files.
Persistent services are automatically restarted on failure. For more information, see
SMF documentation at:
http://hub.opensolaris.org/bin/view/Project+smf-doc/WebHome
Additional Setup Required
If you want unprivileged users to use SMF services, you should create a role named
sge_admin. Assign this role to the users who should be able to manipulate the Grid
Engine SMF services, as described in Planning the Installation.
Then, you can simply answer y when prompted to use SMF during the installation.
How Do SMF Services Compare to the Normal Services?
The biggest difference between SMF and normal services is that SMF does not
consider kill -9 to be a correct service shutdown. SMF responds to kill -9 by
restarting the service.
Within the SMF framework, a service is uniquely identified by its fault management
resource identifier (FMRI).
qmaster Daemon
Service name (FMRI) is svc:/application/sge/qmaster:$SGE_CLUSTER_NAME.
Table 2–12 qmaster Daemon
SGE version   sgemaster stop   qconf -km   kill -15   kill -9   reboot
6.1           stop             stop        stop       stop      restart (1)
6.2           stop             stop        stop       restart   restart
(1) Restart the daemon if RC scripts were installed
shadowd Daemon
Service name (FMRI) is svc:/application/sge/shadowd:$SGE_CLUSTER_NAME.
Table 2–13 shadowd Daemon
SGE version   sgemaster -shadow stop   qconf -km   kill -15   kill -9   reboot
6.1           stop                     stop        stop       stop      restart (1)
6.2           stop                     stop        stop       restart   restart
(1) Restart the daemon if RC scripts were installed
execd Daemon
Service name (FMRI) is svc:/application/sge/execd:$SGE_CLUSTER_NAME.
Table 2–14 execd Daemon
SGE version   sgeexecd stop   qconf -ke   kill -15   kill -9   reboot
6.1           stop            stop        stop       stop      restart (1)
6.2           stop            stop        stop       restart   restart
(1) Restart the daemon if RC scripts were installed
Berkeley RPC Server
Service name (FMRI) is svc:/application/sge/bdb:$SGE_CLUSTER_NAME.
Table 2–15 Berkeley RPC Server
SGE version   berkeley_svc stop   kill -15   kill -9   reboot
6.1           stop                stop       stop      restart (1)
6.2           stop                stop       restart   restart
(1) Restart the server if RC scripts were installed
dbwriter Software
Service name (FMRI) is svc:/application/sge/dbwriter:$SGE_CLUSTER_NAME.
Table 2–16 dbwriter Software
SGE version   sgedbwriter stop   kill -15   kill -9   reboot
6.1           stop               stop       stop      restart (1)
6.2           stop               restart    restart   restart
(1) Restart the dbwriter if RC scripts were installed
Note: Additionally, you can use the new SMF interfaces to interact with
services.
The new actions and commands are:
Table 2–17 New Actions and Commands
Actions                                      Commands
Start service temporarily                    svcadm enable -t FMRI
Start service permanently (across reboots)   svcadm enable FMRI
Stop service temporarily                     svcadm disable -t FMRI
Stop service permanently (across reboots)    svcadm disable FMRI
Restart service                              svcadm restart FMRI
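The FMRI patterns listed for each daemon and the actions in Table 2–17 can be combined in a small helper. The following Python sketch only builds the command lines and does not execute them. The function names and the example cluster name p6444 are illustrative assumptions and are not part of Grid Engine; restart is the standard svcadm subcommand for restarting a service.

```python
# Sketch: build svcadm command lines for the Grid Engine SMF services.
# Function names and the example cluster name are illustrative assumptions.

SGE_SERVICES = ("qmaster", "shadowd", "execd", "bdb", "dbwriter")


def sge_fmri(service, cluster_name):
    """Return the FMRI svc:/application/sge/<service>:<cluster_name>."""
    if service not in SGE_SERVICES:
        raise ValueError("unknown Grid Engine service: %s" % service)
    return "svc:/application/sge/%s:%s" % (service, cluster_name)


def svcadm_command(action, service, cluster_name, temporary=False):
    """Return the argv list for one of the actions in Table 2-17."""
    if action not in ("enable", "disable", "restart"):
        raise ValueError("unsupported action: %s" % action)
    argv = ["svcadm", action]
    # -t makes enable/disable temporary (the change does not survive a reboot).
    if temporary:
        if action == "restart":
            raise ValueError("restart has no temporary variant")
        argv.append("-t")
    argv.append(sge_fmri(service, cluster_name))
    return argv


if __name__ == "__main__":
    print(" ".join(svcadm_command("disable", "execd", "p6444", temporary=True)))
```

Returning an argument list rather than a single string keeps the sketch usable with subprocess.run without shell quoting concerns.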
Installing a JMX-Enabled System
The JMX agent functionality enables access to a subset of sge_qmaster functionality
via the JMX protocol. For Grid Engine 6.2, the main purpose of the JMX agent is to
provide an interface between the SDM Grid Engine adapter and the Grid Engine
system.
Additional Setup Required
The steps required to set up the JMX agent feature of Grid Engine are similar to the
standard setup. You generally follow the instructions in Planning the Installation,
Loading the Distribution Files on a Workstation, How to Install the Master Host, How
to Install Execution Hosts and How to Register Administration Hosts. However, you
have to perform a few additional tasks:
■ Generating necessary configuration files on the master host by calling the
  installation script with the -jmx flag and, depending on the JMX-specific
  installation settings, optionally generating certificates, keys and keystore files.
■ Optional distribution of security relevant files to the execution and submit hosts
  using a secure method such as ssh.
■ Generating user certificates, private keys and keystores automatically, after
  master installation.
■ Adding new users.
■ Tweaking of JMX-specific files.
Topic                                                                Description
How to Install a JMX Agent-Enabled System                            Procedure for installing Grid Engine using the -jmx flag when invoking the qmaster installation scripts.
How to Generate Certificates, Private Keys and Keystores for Users   Procedure for generating user-specific certificates, private keys, and keystores.
How to Check Certificates, Private Keys and Keystores for Users      Procedure for checking certificates, private keys, and keystores.
JMX Configuration Files                                              Describes the JMX configuration files in detail.
Testing and Troubleshooting                                          Testing and troubleshooting a JMX-enabled system.
How to Install a JMX Agent-Enabled System
Install the Grid Engine software as outlined in Installing the Software From the
Command Line, with the following exception: use the additional flag -jmx when
invoking the qmaster installation scripts.
To install a JMX agent enabled system do the following:
1. Change the master host installation procedure. Type the following command and
   respond to the prompts from the installation script.
   # ./install_qmaster -jmx [-csp]
2. Supply the following information to generate necessary configuration files and
   optionally the certificates, keys and keystores:
   ■ JMX agent options
     Caution: If you are on a 64-bit system, you need to provide JAVA_HOME for a
     64-bit Java (usually installed as an addition to the 32-bit Java).
     – JAVA_HOME (mandatory)
     – Additional JVM arguments (optional)
     – JMX MBean server port >= 1024 (mandatory)
     – JMX ssl authentication (default: true)
     – JMX ssl client authentication (default: true)
     – JMX ssl server keystore path
       (/var/sgeCA/{sge_qmaster|port$SGE_QMASTER_PORT}/$SGE_CELL/private/keystore)
     – JMX ssl server keystore password
   ■ Optional certificate specific options, if there is no CA available
     – Two-letter country code, for example, US for the United States
     – State
     – Location, such as a city
     – Organization
     – Organizational unit
     – CA email address
   As the installation proceeds, several JMX specific configuration files are
   installed.
   ■ jvm_threads is set to 1 instead of 0 in $SGE_ROOT/$SGE_CELL/common/bootstrap
     if JMX is enabled:
     ...
     jvm_threads 1
     ...
   ■ Several JMX agent specific configuration files are generated as:
     $SGE_ROOT/$SGE_CELL/common/jmx/jaas.config
     $SGE_ROOT/$SGE_CELL/common/jmx/java.policy
     $SGE_ROOT/$SGE_CELL/common/jmx/jmxremote.access
     $SGE_ROOT/$SGE_CELL/common/jmx/jmxremote.password
     $SGE_ROOT/$SGE_CELL/common/jmx/logging.properties
     $SGE_ROOT/$SGE_CELL/common/jmx/management.properties
     For a detailed description, see the comments in the files and the description
     below.
   ■ Optionally the Certificate Authority is created. The directories that
     contain information relevant to security are as follows:
     – The publicly accessible CA and daemon certificate are stored in
       $SGE_ROOT/$SGE_CELL/common/sgeCA
     – The publicly accessible user certificates are stored in
       $SGE_ROOT/$SGE_CELL/common/sgeCA/usercerts
     – The corresponding private keys and keystore are stored in
       /var/sgeCA/{sge_qmaster|port$SGE_QMASTER_PORT}/$SGE_CELL/private
     – User keys, certificates and keystore are stored in
       /var/sgeCA/{sge_qmaster|port$SGE_QMASTER_PORT}/$SGE_CELL/userkeys/$USER
3. The script prompts you for site information.
4. Confirm whether the information you supplied is correct.
5. Continue the installation. After the security-related setup of the master host is
   finished, the script prompts you to continue with the rest of the installation
   procedure, as in the following example:
   SGE startup script
   -------------------
   Your system wide Grid Engine startup script is installed as:
   "/scratch2/eddy/sge_sec/$SGE_CELL/common/sgemaster"
   Hit Return to continue >>
6. Proceed to the next task. For more information, see Installing the Increased
   Security Features.
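The file and directory locations named in step 2 can be derived from three installation settings. The following Python sketch shows one way to compute them. It assumes the port-based form of the local CA directory (/var/sgeCA/port$SGE_QMASTER_PORT); the function name, the dictionary keys and the example values are hypothetical.

```python
import posixpath


def jmx_security_paths(sge_root, sge_cell, qmaster_port, user=None):
    """Return the JMX and CSP related locations described above as a dict."""
    common = posixpath.join(sge_root, sge_cell, "common")
    # Assumption: the port-based variant of the local CA directory is in use.
    ca_local = posixpath.join("/var/sgeCA", "port%d" % qmaster_port, sge_cell)
    paths = {
        "bootstrap": posixpath.join(common, "bootstrap"),
        "jmx_config_dir": posixpath.join(common, "jmx"),
        "ca_public": posixpath.join(common, "sgeCA"),
        "user_certs": posixpath.join(common, "sgeCA", "usercerts"),
        "private": posixpath.join(ca_local, "private"),
        "server_keystore": posixpath.join(ca_local, "private", "keystore"),
    }
    if user is not None:
        # Per-user keys, certificate and keystore live below userkeys/<user>.
        paths["user_keys"] = posixpath.join(ca_local, "userkeys", user)
    return paths


if __name__ == "__main__":
    for name, path in sorted(jmx_security_paths("/opt/sge", "default", 6444, "eddy").items()):
        print("%-16s %s" % (name, path))
```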
How to Generate Certificates, Private Keys and Keystores for Users
To use the CSP-secured system, the user must have access to a user-specific certificate,
private key and keystore. Usually you first perform the steps outlined in Installing the
Increased Security Features. You can then use the following procedure to generate the
corresponding keystore files for the users.
1. As root on the master host run the following command:
   # $SGE_ROOT/util/sgeCA/sge_ca -userks [-kspwf <kspwf-file>]
2. Confirm that the creation has been successful.
   # ls -lR /var/sgeCA/port$SGE_QMASTER_PORT/$SGE_CELL/userkeys
   /var/sgeCA/port$SGE_QMASTER_PORT/$SGE_CELL/userkeys/:
   total 8
   drwx------   2 eddy     staff        512 Mar 13 11:33 eddy
   drwx------   2 sarah    staff        512 Mar 13 11:33 sarah
   drwx------   2 leo      staff        512 Mar 13 11:33 leo
   /var/sgeCA/port$SGE_QMASTER_PORT/$SGE_CELL/userkeys/eddy:
   total 16
   -rw-------   1 eddy     staff       1586 Mar 13 11:32 cert.pem
   -rw-------   1 eddy     staff        891 Mar 13 11:32 key.pem
   -rw-------   1 eddy     staff       3031 Mar 13 11:33 keystore
   -rw-------   1 eddy     staff       1024 Mar 13 11:32 rand.seed
   -rw-------   1 eddy     staff        818 Mar 13 11:32 req.pem
   ...
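In the listing above, every user directory and key file is accessible only by its owner. The following Python sketch checks that property for a userkeys directory by reporting any entry that grants permissions to group or others. The function name is hypothetical, and this check is an illustration only, not a tool shipped with Grid Engine.

```python
import os
import stat


def insecure_entries(userkeys_dir):
    """Return paths below userkeys_dir that are accessible by group or others."""
    bad = []
    for dirpath, dirnames, filenames in os.walk(userkeys_dir):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            mode = stat.S_IMODE(os.lstat(path).st_mode)
            # Any group or other bit means the entry is not private to its owner.
            if mode & (stat.S_IRWXG | stat.S_IRWXO):
                bad.append(path)
    return sorted(bad)


if __name__ == "__main__":
    import sys
    for entry in insecure_entries(sys.argv[1]):
        print("not owner-only:", entry)
```

An empty result means that all entries below the directory are restricted to their owners, which matches the drwx------ and -rw------- modes shown in the listing.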
How to Check Certificates, Private Keys and Keystores for Users
To confirm that these files contain the intended information, use the following
commands:
■ To display a certificate:
  $SGE_ROOT/util/sgeCA/sge_ca -print <wherever>/cert.pem
■ To display a key:
  $SGE_ROOT/util/sgeCA/sge_ca -printkey <wherever>/key.pem
■ To display a keystore or truststore:
  $JAVA_HOME/bin/keytool -list -v -keystore <wherever>/keystore
You must enter the keystore password to see all entries; otherwise, only the certificates
are visible. For more information, see the Java keytool documentation at:
http://download.oracle.com/javase/1.5.0/docs/tooldocs/solaris/keytool.html
JMX Configuration Files
The following configuration files are installed into $SGE_ROOT/$SGE_CELL/common/jmx
and are explained in detail here. Manual modification is usually not necessary and the
preinstalled configurations should be sufficient.
jaas.config
Before using the JMX interface, you must run a special authentication against
sge_qmaster. This process adds the correct principal, which gives you the necessary role
to access the JMX interfaces in read-only or read-write mode. The responsible section in
the jaas.config file is named GridwareConfig or TestConfig (for testing only). In
general, the jaas.config file defines which login modules are used for which
application case. The choice of the login module is defined either in a configuration file
like management.properties or programmatically. The jaas.config file contains
different sections and allows the replacement of the authentication mechanism, for
example authentication via Unix PAM or via LDAP (see the GridwareConfig section and
TestConfig section below). The different modules can be stacked. For a general
overview of JAAS, see
http://java.sun.com/developer/technicalArticles/Security/jaasv2/
Here the procedure consists of two steps in the GridwareConfig section:
■ Authenticate the user (for example with keystore or Unix login or with LDAP).
■ In the JGDILoginModule, add the JMXPrincipal that gives the defined role to the
  user. This role is used later in the jmxremote.access file to check if the user has
  read-only or read-write access.
/*
* Default login configuration for qmaster's jmx server
*/
GridwareConfig {
/**
* Accepts all clients which have a certificate which is signed with
* the CA certificate.
*/
com.sun.grid.security.login.GECATrustManagerLoginModule requisite
caTop="${com.sun.grid.jgdi.caTop}";
/*
* Accepts all clients which has a valid username/password.
*
* The username/password validation is done with the authuser binary (included
* in the Grid Engine distribution, $SGE_ROOT/utilbin/$ARCH/authuser).
*
* ATTENTION: The authuser binary needs the suid bit. It does not work if grid
* engine is installed on a nosuid file system.
*
*/
com.sun.grid.security.login.UnixLoginModule requisite
sge_root="${com.sun.grid.jgdi.sgeRoot}"
auth_method="system";
/*
* Username password authentication against LDAP.
*
* Alternative username/password authentication if
* com.sun.grid.security.login.UnixLoginModule is not working.
*
* The LDAP specific parameters have to be adjusted to the local requirements
* For details please have a look at the LdapLoginModule javadocs.
*
* ATTENTION: The LdapLoginModule is only available in java 6. The
* parameter libjvm_path must point to a java 6 jvm
* (qconf -sconf | grep libjvm_path)
*/
/*
com.sun.security.auth.module.LdapLoginModule requisite
userProvider="ldap://sun-ds/ou=people,dc=sun,dc=com"
userFilter="(&(uid={USERNAME})(objectClass=inetOrgPerson))"
useSSL=false;
*/
/*
* The JGDILoginModule adds a JGDIPrincipal to the subject. The username of
* the JGDIPrincipal is the name of the first trusted principal. This name
* treated as username for gdi communication.
* For each login a new jgdi session id is created.
*
* In the jmxremote.access file users who can access the system are defined
* Any principal matching these entries is given the corresponding role.
* Usually a jmxPrincipal is defined to give a user access to the system.
* (e.g. com.sun.grid.security.login.UserPrincipal = xyz &
*       jmxPrincipal="controlRole" gives user xyz access under controlRole
* )
*/
com.sun.grid.jgdi.security.JGDILoginModule optional
trustedPrincipal="com.sun.grid.security.login.UserPrincipal"
trustedPrincipal1="com.sun.security.auth.UserPrincipal"
jmxPrincipal="controlRole";
};
/*
* TestConfig accepts any user. Only for testing
*/
TestConfig {
com.sun.grid.security.login.TestLoginModule requisite;
com.sun.grid.jgdi.security.JGDILoginModule optional
trustedPrincipal="com.sun.grid.security.login.UserPrincipal"
jmxPrincipal="controlRole";
};
/*
* Mandatory if native jgdi is used with a csp system
* (e.g. jgdish in csp mode)
*/
jgdi {
com.sun.security.auth.module.KeyStoreLoginModule required
keyStoreURL="file:./keystore"
debug=false;
};
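To see at a glance which login modules are stacked in each section of a jaas.config file, the file can be summarized with a short script. The following Python sketch is a simplified reader written for illustration, not a full JAAS parser: it assumes that option values contain no semicolons and that no quoted value ends in a closing brace directly followed by a semicolon. The function name is hypothetical.

```python
import re


def parse_jaas_config(text):
    """Map each section name to a list of (login_module, control_flag) tuples."""
    # Drop /* ... */ comments first; they may span several lines.
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)
    sections = {}
    # A section has the form "<name> { <entries> };".
    for match in re.finditer(r"(\w+)\s*\{(.*?)\}\s*;", text, flags=re.DOTALL):
        name, body = match.group(1), match.group(2)
        entries = []
        # Each entry ends with ';' and starts with "<class> <control flag>".
        for entry in body.split(";"):
            tokens = entry.split()
            if len(tokens) >= 2:
                entries.append((tokens[0], tokens[1]))
        sections[name] = entries
    return sections


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as handle:
        for section, modules in parse_jaas_config(handle.read()).items():
            print(section)
            for module, flag in modules:
                print("  %-10s %s" % (flag, module))
```

Applied to the listing above, the commented-out LdapLoginModule entry is skipped because comments are removed before the sections are read.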
java.policy
The java.policy file that is used by the JGDIAgent restricts what code running in
sge_qmaster's JVM is permitted to do.
Usually, changes here are only necessary to restrict access to a subset of the overall
functionality. To tweak the policy settings to your needs, it is useful to run the JMX
server with security debugging enabled and to consult the generated logging files.
To do so, run qconf -mconf and set additional_jvm_args to
-Djavax.net.debug=ssl -Djava.security.debug=access,failure
/*
** with LdapLoginModule
**   grant principal com.sun.security.auth.UserPrincipal "controlRole"
**
** with jmxremote.password
**   grant principal javax.management.remote.JMXPrincipal "controlRole"
**
*/
grant codeBase "file:${com.sun.grid.jgdi.sgeRoot}/lib/jgdi.jar" {
permission java.net.SocketPermission
"*:1024-", "accept,connect";
permission java.net.SocketPermission
"localhost:1024-", "listen,resolve";
permission java.lang.RuntimePermission "loadLibrary.jgdi";
permission java.lang.RuntimePermission "shutdownHooks";
permission java.lang.RuntimePermission "setContextClassLoader";
permission javax.security.auth.AuthPermission "createLoginContext.jgdi";
permission javax.security.auth.AuthPermission "doAs";
permission javax.security.auth.AuthPermission "getSubject";
permission java.util.PropertyPermission "*", "read";
permission java.util.logging.LoggingPermission "control";
permission java.io.FilePermission
"${com.sun.grid.jgdi.sgeRoot}/${com.sun.grid.jgdi.sgeCell}/common/jmx/-", "read";
permission java.io.FilePermission "${com.sun.grid.jgdi.sgeRoot}/util/-",
"execute";
permission java.io.FilePermission "${com.sun.grid.jgdi.sgeRoot}/utilbin/-",
"execute";
permission javax.management.MBeanServerPermission "createMBeanServer";
permission javax.management.MBeanPermission "*", "*";
permission javax.management.MBeanTrustPermission "register";
permission java.lang.management.ManagementPermission "monitor";
permission java.lang.management.ManagementPermission "control";
permission java.lang.RuntimePermission "setIO";
permission java.io.FilePermission
"jgdi.stdout", "write";
permission java.io.FilePermission
"jgdi.stderr", "write";
permission java.io.FilePermission
"jgdi0.log.lck", "delete";
permission java.io.FilePermission
"${com.sun.grid.jgdi.sgeRoot}/${com.sun.grid.jgdi.sgeCell}/common/jmx/*", "read";
permission java.io.FilePermission
"${com.sun.grid.jgdi.sgeRoot}/lib/-",
"read";
permission java.lang.RuntimePermission
"accessClassInPackage.sun.management.jmxremote";
permission java.lang.RuntimePermission
"accessClassInPackage.sun.management.resources";
permission java.lang.RuntimePermission "accessClassInPackage.sun.management";
permission java.lang.RuntimePermission "accessClassInPackage.sun.rmi.server";
permission java.lang.RuntimePermission
"accessClassInPackage.sun.management.snmp.util";
permission java.lang.RuntimePermission "accessClassInPackage.sun.rmi.registry";
permission java.util.PropertyPermission "java.rmi.server.randomIDs", "write";
permission javax.security.auth.AuthPermission "modifyPrincipals";
permission javax.security.auth.AuthPermission "createLoginContext.*";
permission javax.security.auth.AuthPermission
"createLoginContext.JMXPluggableAuthenticator";
permission java.security.SecurityPermission "createAccessControlContext";
permission javax.management.remote.SubjectDelegationPermission
"javax.management.remote.JMXPrincipal.controlRole";
};
grant principal javax.management.remote.JMXPrincipal "controlRole" {
permission javax.management.MBeanPermission
"com.sun.grid.jgdi.management.mbeans.JGDIJMX#*", "*";
permission javax.management.MBeanPermission "sun.management.*#*", "*";
permission javax.security.auth.AuthPermission "createLoginContext.jgdi";
permission javax.security.auth.AuthPermission "doAs";
permission javax.security.auth.AuthPermission "getSubject";
permission java.util.PropertyPermission "*", "read";
permission java.util.PropertyPermission "user.timezone", "read,write";
permission java.util.logging.LoggingPermission "control";
permission java.io.FilePermission
"${com.sun.grid.jgdi.sgeRoot}/lib/-",
"read";
permission java.lang.management.ManagementPermission "monitor";
permission java.net.SocketPermission "*", "resolve";
permission javax.management.MBeanPermission
"com.sun.management.UnixOperatingSystem#-[java.lang:type=OperatingSystem]",
"isInstanceOf";
permission javax.management.MBeanPermission
"com.sun.management.UnixOperatingSystem#-[java.lang:type=OperatingSystem]",
"getAttribute";
permission javax.management.MBeanPermission
"com.sun.management.UnixOperatingSystem#ProcessCpuTime[java.lang:type=OperatingSystem]", "getAttribute";
permission javax.management.MBeanPermission
"com.sun.management.UnixOperatingSystem#Name[java.lang:type=OperatingSystem]", "getAttribute";
permission javax.management.MBeanPermission
"com.sun.management.UnixOperatingSystem#Version[java.lang:type=OperatingSystem]", "getAttribute";
permission javax.management.MBeanPermission
"com.sun.management.UnixOperatingSystem#Arch[java.lang:type=OperatingSystem]", "getAttribute";
permission javax.management.MBeanPermission
"com.sun.management.UnixOperatingSystem#AvailableProcessors[java.lang:type=OperatingSystem]", "getAttribute";
permission javax.management.MBeanPermission
"com.sun.management.UnixOperatingSystem#CommittedVirtualMemorySize[java.lang:type=OperatingSystem]", "getAttribute";
permission javax.management.MBeanPermission
"com.sun.management.UnixOperatingSystem#TotalPhysicalMemorySize[java.lang:type=OperatingSystem]", "getAttribute";
permission javax.management.MBeanPermission
"com.sun.management.UnixOperatingSystem#FreePhysicalMemorySize[java.lang:type=OperatingSystem]", "getAttribute";
permission javax.management.MBeanPermission
"com.sun.management.UnixOperatingSystem#TotalSwapSpaceSize[java.lang:type=OperatingSystem]", "getAttribute";
permission javax.management.MBeanPermission
"com.sun.management.UnixOperatingSystem#FreeSwapSpaceSize[java.lang:type=OperatingSystem]", "getAttribute";
permission javax.management.MBeanPermission
"javax.management.MBeanServerDelegate#-[JMImplementation:type=MBeanServerDelegate]", "addNotificationListener";
permission javax.management.MBeanPermission
"javax.management.MBeanServerDelegate#-[JMImplementation:type=MBeanServerDelegate]", "isInstanceOf";
permission javax.management.MBeanPermission
"javax.management.MBeanServerDelegate#-[JMImplementation:type=MBeanServerDelegate]", "getMBeanInfo";
permission javax.management.MBeanPermission
"com.sun.management.UnixOperatingSystem#-[java.lang:type=OperatingSystem]", "queryNames";
permission javax.management.MBeanPermission
"java.util.logging.Logging#-[java.util.logging:type=Logging]", "queryNames";
permission javax.management.MBeanPermission
"javax.management.MBeanServerDelegate#-[JMImplementation:type=MBeanServerDelegate]", "queryNames";
permission javax.management.MBeanPermission
"java.util.logging.Logging#-[java.util.logging:type=Logging]", "isInstanceOf";
permission javax.management.MBeanPermission
"java.util.logging.Logging#-[java.util.logging:type=Logging]", "getMBeanInfo";
permission javax.management.MBeanPermission
"com.sun.management.UnixOperatingSystem#-[java.lang:type=OperatingSystem]",
"getMBeanInfo";
};
grant {
permission java.util.logging.LoggingPermission "control";
permission java.util.PropertyPermission "*", "read";
permission java.util.PropertyPermission "user.timezone", "write";
permission java.lang.RuntimePermission "setIO";
permission java.lang.RuntimePermission "loadLibrary.jgdi";
permission java.io.FilePermission
"jgdi.stdout", "write";
permission java.io.FilePermission
"jgdi.stderr", "write";
permission java.io.FilePermission
"${com.sun.grid.jgdi.sgeRoot}/lib/-",
"read";
permission java.io.FilePermission
"${com.sun.grid.jgdi.sgeRoot}/util/arch", "execute";
permission java.io.FilePermission
"${com.sun.grid.jgdi.sgeRoot}/utilbin/-", "execute";
permission javax.security.auth.AuthPermission "modifyPrincipals";
permission java.io.FilePermission "${com.sun.grid.jgdi.caTop}", "read";
permission java.io.FilePermission "${com.sun.grid.jgdi.caTop}/cacert.pem",
"read";
permission java.io.FilePermission "${com.sun.grid.jgdi.caTop}/ca-crl.pem",
"read";
permission java.io.FilePermission "${com.sun.grid.jgdi.caTop}/usercerts/-",
"read";
permission java.io.FilePermission "${com.sun.grid.jgdi.serverKeystore}",
"read";
};
/*
grant {
permission java.security.AllPermission;
};
*/
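The grant at the end of the listing is commented out; java.security.AllPermission would give every permission to all code if it were active. The following Python sketch checks a policy text for an active AllPermission entry, for example after the file has been edited for debugging. It is a text-level check under simplifying assumptions (comments are removed with regular expressions, and strings containing comment markers are not handled); the function name is hypothetical.

```python
import re


def grants_all_permission(policy_text):
    """Return True if an active (uncommented) AllPermission entry is present."""
    # Remove /* ... */ block comments and // line comments before searching.
    stripped = re.sub(r"/\*.*?\*/", "", policy_text, flags=re.DOTALL)
    stripped = re.sub(r"//[^\n]*", "", stripped)
    pattern = r"permission\s+java\.security\.AllPermission\b"
    return re.search(pattern, stripped) is not None


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as handle:
        if grants_all_permission(handle.read()):
            print("WARNING: policy grants java.security.AllPermission")
```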
management.properties
This file describes the general JMX server configuration. The default template looks
similar to the following example. During the installation process, it is usually adapted
automatically by replacing the @@SGE_*@@ variables with concrete values. The meaning of
the @@SGE_*@@ variables is:
■ @@SGE_JMX_PORT@@ is the configured JMX port
■ @@SGE_JMX_SSL@@ is true or false, depending on whether SSL is enabled for JMX
■ @@SGE_JMX_SSL_CLIENT@@ is true or false, depending on whether client authentication
  is required
■ @@SGE_JMX_SSL_KEYSTORE@@ is the keystore used when SSL is enabled
■ @@SGE_JMX_SSL_KEYSTORE_PW@@ is the corresponding keystore password
■ @@SGE_ROOT@@ is the $SGE_ROOT root directory
■ @@SGE_CELL@@ is the $SGE_CELL name, usually 'default'
#####################################################################
# Default Configuration File for JGDI JMX
#####################################################################
#
# The Management Configuration file (in java.util.Properties format)
# will be read if one of the following system properties is set:
#    -Dcom.sun.grid.jgdi.management.jmxremote.port=<port-number>
# or -Dcom.sun.grid.jgdi.management.config.file=<this-file>
#
# The default Management Configuration file is:
#
#    $SGE_ROOT/{$SGE_CELL|default}/common/jmx/management.properties
#
# ################ Management Agent Port #########################
#
# For setting the JMX RMI agent port use the following line
# com.sun.grid.jgdi.management.jmxremote.port=<port-number>
com.sun.grid.jgdi.management.jmxremote.port=@@SGE_JMX_PORT@@
#####################################################################
#                   RMI Management Properties
#####################################################################
#
# If system property -Dcom.sun.grid.jgdi.management.jmxremote.port=<port-number>
# is set then
#    - A MBean server is started
#    - JRE Platform MBeans are registered in the MBean server
#    - RMI connector is published in a private readonly registry at
#      specified port using a well known name, "jmxrmi"
#    - the following properties are read for JMX remote management.
#
# The configuration can be specified only at startup time.
# Later changes to above system property (e.g. via setProperty method),
# this config file, the password file, or the access file have no effect to the
# running MBean server, the connector, or the registry.
#
#
# ###################### RMI SSL #############################
#
# com.sun.grid.jgdi.management.jmxremote.ssl=true|false
#      Default for this property is true. (Case for true/false ignored)
#      If this property is specified as false then SSL is not used.
#
#For RMI monitoring without SSL use the following line
# com.sun.grid.jgdi.management.jmxremote.ssl=false
com.sun.grid.jgdi.management.jmxremote.ssl=@@SGE_JMX_SSL@@
# com.sun.grid.jgdi.management.jmxremote.ssl.enabled.cipher.suites=<cipher-suites>
#      The value of this property is a string that is a comma-separated list
#      of SSL/TLS cipher suites to enable. This property can be specified in
#      conjunction with the previous property "com.sun.management.jmxremote.ssl"
#      in order to control which particular SSL/TLS cipher suites are enabled
#      for use by accepted connections. If this property is not specified then
#      the SSL RMI Server Socket Factory uses the SSL/TLS cipher suites that
#      are enabled by default.
#
# com.sun.grid.jgdi.management.jmxremote.ssl.enabled.protocols=<protocol-versions>
#      The value of this property is a string that is a comma-separated list
#      of SSL/TLS protocol versions to enable. This property can be specified in
#      conjunction with the previous property "com.sun.management.jmxremote.ssl"
#      in order to control which particular SSL/TLS protocol versions are
#      enabled for use by accepted connections. If this property is not
#      specified then the SSL RMI Server Socket Factory uses the SSL/TLS
#      protocol versions that are enabled by default.
#
# com.sun.grid.jgdi.management.jmxremote.ssl.need.client.auth=true|false
#      Default for this property is false. (Case for true/false ignored)
#      If this property is specified as true in conjunction with the previous
#      property "com.sun.management.jmxremote.ssl" then the SSL RMI Server
#      Socket Factory will require client authentication.
#
#For RMI monitoring with SSL client authentication use the following line
#com.sun.grid.jgdi.management.jmxremote.ssl.need.client.auth=true
com.sun.grid.jgdi.management.jmxremote.ssl.need.client.auth=@@SGE_JMX_SSL_CLIENT@@
#
# ################ RMI User authentication ################
#
# com.sun.grid.jgdi.management.jmxremote.authenticate=true|false
#      Default for this property is true. (Case for true/false ignored)
#      If this property is specified as false then no authentication is
#      performed and all users are allowed all access.
#
# For RMI monitoring without any checking use the following line
# com.sun.grid.jgdi.management.jmxremote.authenticate=false
com.sun.grid.jgdi.management.jmxremote.authenticate=true
#
# ################ RMI Login configuration ###################
#
# com.sun.grid.jgdi.management.jmxremote.login.config=<config-name>
#      Specifies the name of a JAAS login configuration entry to use when
#      authenticating users of RMI monitoring.
#
#      Setting this property is optional - the default login configuration
#      specifies a file-based authentication that uses the password file.
#
#      When using this property to override the default login configuration
#      then the named configuration entry must be in a file that gets loaded
#      by JAAS. In addition, the login module(s) specified in the configuration
#      should use the name and/or password callbacks to acquire the user's
#      credentials. See the NameCallback and PasswordCallback classes in the
#      javax.security.auth.callback package for more details.
#
#      If the property "com.sun.management.jmxremote.authenticate" is set to
#      false, then this property and the password & access files are ignored.
#
# For a non-default login configuration use the following line
# com.sun.grid.jgdi.management.jmxremote.login.config=<config-name>
com.sun.grid.jgdi.management.jmxremote.login.config=GridwareConfig
#
# ################ RMI Password file location ##################
#
# com.sun.grid.jgdi.management.jmxremote.password.file=filepath
#      Specifies location for password file
#      This is optional - default location is
#      $JRE/lib/management/jmxremote.password
#
#      If the property "com.sun.grid.jgdi.management.jmxremote.authenticate" is set to
#      false, then this property and the password & access files are ignored.
# For a non-default password file location use the following line
# com.sun.grid.jgdi.management.jmxremote.password.file=filepath
com.sun.grid.jgdi.management.jmxremote.password.file=@@SGE_ROOT@@/@@SGE_CELL@@/common/jmx/jmxremote.password
#
# ################ RMI Access file location #####################
#
# com.sun.grid.jgdi.management.jmxremote.access.file=filepath
#      Specifies location for access file
#      This is optional - default location is
#      $JRE/lib/management/jmxremote.access
#
#      If the property "com.sun.management.jmxremote.authenticate" is set to
#      false, then this property and the password & access files are ignored.
#      Otherwise, the access file must exist and be in the valid format.
#      If the access file is empty or non-existent then no access is allowed.
#
# For a non-default access file location use the following line
# com.sun.grid.jgdi.management.jmxremote.access.file=filepath
com.sun.grid.jgdi.management.jmxremote.access.file=@@SGE_ROOT@@/@@SGE_CELL@@/common/jmx/jmxremote.access
# For the JGDI keystore module use this settings for the server keystore and keystore password
com.sun.grid.jgdi.management.jmxremote.ssl.serverKeystore=@@SGE_JMX_SSL_KEYSTORE@@
com.sun.grid.jgdi.management.jmxremote.ssl.serverKeystorePasswordFile=@@SGE_JMX_SSL_KEYSTORE@@.password
# moved into $CALOCALTOP/private/keystore.password
# com.sun.grid.jgdi.management.jmxremote.ssl.serverKeystorePassword=<SGE_JMX_SSL_KEYSTORE_PW>
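The replacement of the @@SGE_*@@ variables can be illustrated with a few lines of code. The following Python sketch is not the installation script's actual implementation; it only shows the substitution idea and then reads the result back as simple key=value pairs. The function names and the example values are assumptions, and the properties reader ignores line continuations and escapes that the full java.util.Properties format supports.

```python
import re


def render_template(template, values):
    """Replace every @@NAME@@ placeholder; fail on placeholders without a value."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError("no value for @@%s@@" % name)
        return str(values[name])
    return re.sub(r"@@([A-Z_]+)@@", substitute, template)


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


if __name__ == "__main__":
    template = (
        "com.sun.grid.jgdi.management.jmxremote.port=@@SGE_JMX_PORT@@\n"
        "com.sun.grid.jgdi.management.jmxremote.ssl=@@SGE_JMX_SSL@@\n"
    )
    rendered = render_template(template, {"SGE_JMX_PORT": 6446, "SGE_JMX_SSL": "true"})
    print(parse_properties(rendered))
```

Raising an error for an unknown placeholder, rather than leaving it in place, makes an incomplete set of installation values visible immediately.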
jmxremote.access
The jmxremote.access file defines which principals are mapped to a special role.
######################################################################
#     Default Access Control File for Remote JMX(TM) Monitoring
######################################################################
#
# Access control file for Remote JMX API access to monitoring.
# This file defines the allowed access for different roles. The
# password file (jmxremote.password by default) defines the roles and their
# passwords. To be functional, a role must have an entry in
# both the password and the access files.
#
# Default location of this file is $JRE/lib/management/jmxremote.access
# You can specify an alternate location by specifying a property in
# the management config file $JRE/lib/management/management.properties
# (See that file for details)
#
# The file format for password and access files is syntactically the same
# as the Properties file format. The syntax is described in the Javadoc
# for java.util.Properties.load.
# Typical access file has multiple lines, where each line is blank,
# a comment (like this one), or an access control entry.
#
# An access control entry consists of a role name, and an
# associated access level. The role name is any string that does not
# itself contain spaces or tabs. It corresponds to an entry in the
# password file (jmxremote.password). The access level is one of the
# following:
#    "readonly" grants access to read attributes of MBeans.
#               For monitoring, this means that a remote client in this
#               role can read measurements but cannot perform any action
#               that changes the environment of the running program.
#    "readwrite" grants access to read and write attributes of MBeans,
#               to invoke operations on them, and to create or remove them.
#               This access should be granted to only trusted clients,
#               since they can potentially interfere with the smooth
#               operation of a running program
#
# A given role should have at most one entry in this file. If a role
# has no entry, it has no access.
# If multiple entries are found for the same role name, then the last
# access entry is used.
#
# Default access control entries:
# o The "monitorRole" role has readonly access.
# o The "controlRole" role has readwrite access.
monitorRole   readonly
controlRole   readwrite
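The rules stated in the file comments (one role and one access level per entry, a role without an entry has no access, the last entry for a role wins) can be expressed compactly. The following Python sketch is a simplified reader for illustration: it assumes whitespace-separated entries as in the default file and does not cover the other separators that the Properties format allows. The function names are hypothetical.

```python
def parse_access_file(text):
    """Return {role: access_level}; a later entry for a role overrides an earlier one."""
    access = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        parts = line.split()
        if len(parts) != 2 or parts[1] not in ("readonly", "readwrite"):
            raise ValueError("invalid access entry: %r" % line)
        access[parts[0]] = parts[1]
    return access


def may_write(access, role):
    """A role without an entry has no access at all, so it cannot write."""
    return access.get(role) == "readwrite"


if __name__ == "__main__":
    sample = "monitorRole readonly\ncontrolRole readwrite\n"
    roles = parse_access_file(sample)
    for role in ("monitorRole", "controlRole", "guest"):
        print(role, roles.get(role, "no access"), may_write(roles, role))
```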
jmxremote.password
The password file provides a simple authentication mechanism, although it is not
recommended. The JAAS login module is usually preferred because it is much more
flexible. In the password file, you can specify a password for each role. If a simple
login mechanism is required, change management.properties to use TestConfig instead of
GridwareConfig. TestConfig allows any valid Unix user to connect to the JGDI JMX
server without a password and is intended for testing only.
logging.properties
To enable JGDI and JMX logging, adjust the delivered logging file and restart
sge_qmaster or at least the JMX server. By default, the generated log files are
jgdi0.log, jgdi.stderr and jgdi.stdout in the master spooling directory.
You can also influence logging by changing the additional_jvm_args configuration,
for example to enable additional debugging messages.
#
#
#
Java Logging Configuration for JMX MBean server
2-96 Oracle Grid Engine Installation and Upgrade Guide
JMX Configuration Files
# Specify the handlers to create in the root logger
# (all loggers are children of the root logger)
# The following creates two handlers
# Per default we log to the console
#handlers = java.util.logging.ConsoleHandler
# Use FileHandler
handlers = java.util.logging.FileHandler
# -----------------------------------------------------------------------------
# Definition of log levels
# -----------------------------------------------------------------------------
# Set the default logging level for the root logger
.level = INFO
#com.sun.grid.jgdi.JGDI.level = FINE
#com.sun.grid.jgdi.rmi.level = FINE
#com.sun.grid.jgdi.configuration.xml.XMLUtil.level = FINE
#com.sun.grid.jgdi.configuration.ClusterQueueTestCase.level = FINE
#com.sun.grid.jgdi.management.level = FINER
#com.sun.grid.jgdi.event.level = FINER
# For authuser login module debugging
#com.sun.grid.security.login.level = FINER
#com.sun.grid.util.expect.level = FINER
# -----------------------------------------------------------------------------
# Settings for ConsoleHandler
# -----------------------------------------------------------------------------
# Set the default logging level for new ConsoleHandler instances
java.util.logging.ConsoleHandler.level = INFO
# Set the default formatter for new ConsoleHandler instances
java.util.logging.ConsoleHandler.formatter = com.sun.grid.jgdi.util.SGEFormatter
# -----------------------------------------------------------------------------
# Settings for FileHandler
# -----------------------------------------------------------------------------
# Set the default logging level for new FileHandler instances
java.util.logging.FileHandler.level = ALL
# qmaster runs in qmaster spool dir, so the file is created there
java.util.logging.FileHandler.pattern=jgdi%u.log
java.util.logging.FileHandler.formatter=com.sun.grid.jgdi.util.SGEFormatter
#
# Possible columns:
#
#   time        timestamp of the log message
#   host        hostname of the log message
#   name        name of the logger
#   thread      id of the thread
#   level       log level (short form)
#   source      class and method name
#   level_long  log level (long form)
#
com.sun.grid.jgdi.util.SGEFormatter.columns = time thread source level message
#
# Print the stacktrace of the log record
#
com.sun.grid.jgdi.util.SGEFormatter.withStacktrace=true
#
# Delimiter between columns
#
com.sun.grid.jgdi.util.SGEFormatter.delimiter = |
Testing and Troubleshooting
You can use jconsole to test the connection to the JMX server. The administrator is
responsible for allowing or disallowing access to the system through JMX. To also
force client authentication for jconsole, configure the management.properties file
with the following settings:
■ com.sun.grid.jgdi.management.jmxremote.ssl=true
■ com.sun.grid.jgdi.management.jmxremote.ssl.need.client.auth=true
% jconsole -J-Djava.security.manager=java.rmi.RMISecurityManager \
-J-Djava.security.policy=$SGE_ROOT/util/rmiconsole.policy \
-J-Djavax.net.ssl.trustStore=<server truststore> \
[-J-Djavax.net.ssl.keyStore=/<safe>/mykeystore \
-J-Djavax.net.ssl.keyStorePassword=<mykeystore_pw> \
-J-Djavax.net.ssl.keyPassword=<mykeystore_pw> ] \
[-J-Djavax.net.debug=ssl]
where <server truststore> is usually either /var/sgeCA/port5322/$SGE_CELL/private/keystore
(only the server certificate is accessible without a password) or a special truststore
that the administrator makes available as follows:
keytool -export -alias "root" \
-keystore /var/sgeCA/port$SGE_QMASTER_PORT/$SGE_CELL/private/keystore -rfc \
-file /tmp/jmxserver.cer
keytool -import -file /tmp/jmxserver.cer -keystore /tmp/truststore
Enter keystore password: <pwd>
...
Trust this certificate? [no]: yes
Certificate was added to keystore
The optional arguments are required if client authentication is set to true or for
debugging.
The following simple example can be used to connect through JMX and monitor events:
% java [-Dcom.sun.grid.jgdi.keyStore=\
/var/sgeCA/port$SGE_QMASTER_PORT/$SGE_CELL/private/keystore \
-Dcom.sun.grid.jgdi.caTop="$SGE_ROOT/$SGE_CELL/common/sgeCA" \
-Djava.util.logging.config.file=util/shell_logging.properties ] \
-cp $SGE_ROOT/lib/juti.jar:$SGE_ROOT/lib/jgdi.jar \
com.sun.grid.jgdi.examples.jmxeventmonitor.Main
The optional arguments can be omitted. They serve only to preset the login dialog with
useful values. After a connection has been established once, a preferences file is
written and reused for later connections. To set the correct environment variables,
source the Grid Engine settings.(c)sh file. In the example above, the command must be
run by the admin user to get access to the keystore.
For troubleshooting, the following settings and files might provide additional
insight:
■ The messages file in the master spool directory, if the JMX server cannot be started
■ $SGE_ROOT/$SGE_CELL/common/bootstrap, to check whether jvm_threads is enabled at all
■ The jgdi* log files in the master spool directory, which are the main source for
finding the reason for a failure
■ $SGE_ROOT/$SGE_CELL/common/jmx/logging.properties, to enable more detailed logging
■ qconf -mconf with an additional_jvm_args parameter. For example, add the two
arguments -Djava.security.debug=all -Djavax.net.debug=ssl to trace any
permission and authentication problems.
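The bootstrap check can be scripted. The following is a minimal sketch under stated assumptions: the bootstrap file is assumed to consist of whitespace-separated "name value" lines such as "jvm_threads 1", the function name jvm_threads_enabled is hypothetical, and /path/to/sge is only a placeholder that is used when SGE_ROOT is not set.

```shell
#!/bin/sh
# Hypothetical helper: succeed if jvm_threads has a value greater than 0.
# Assumes whitespace-separated "name value" lines, for example "jvm_threads 1".
jvm_threads_enabled() {
    value=$(awk '$1 == "jvm_threads" { v = $2 } END { print v + 0 }' "$1")
    [ "$value" -gt 0 ]
}

# /path/to/sge is a placeholder; source the settings file first so that
# SGE_ROOT and SGE_CELL point to the real installation.
bootstrap="${SGE_ROOT:-/path/to/sge}/${SGE_CELL:-default}/common/bootstrap"
if [ -r "$bootstrap" ]; then
    if jvm_threads_enabled "$bootstrap"; then
        echo "jvm_threads is enabled"
    else
        echo "jvm_threads is 0 or not set"
    fi
else
    echo "bootstrap file not readable: $bootstrap"
fi
```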
Removing the Software
This section consists of the following topics.
Topic | Description
------+------------
How to Remove the Software Interactively | Procedure for removing the Grid Engine software interactively.
How to Remove the Software Using the inst_sge Utility and a Configuration Template | Procedure for removing the Grid Engine software using the inst_sge utility and a configuration template.
How to Remove the Software Interactively
To remove the software interactively, follow the steps below.
Note: Remove the software from the execution hosts before removing it from the
master host. If you remove the software from the master host first, you cannot
automate the removal of the software from the execution hosts.
1. Ensure that your environment variables are set up properly.
Note: If no cell name was specified during installation, the value of
$SGE_CELL is default.
■ If you are using a C shell, type the following command:
# source $SGE_ROOT/$SGE_CELL/common/settings.csh
■ If you are using a Bourne or Korn shell, type the following command:
# . $SGE_ROOT/$SGE_CELL/common/settings.sh
2. On the master host, issue the $SGE_ROOT/inst_sge -ux command. This example
uninstalls the execution hosts: host1, host2 and host3.
# $SGE_ROOT/inst_sge -ux -host "host1 host2 host3"
Note: You are not prompted for any information during this process.
However, the output from this process will be displayed to the
terminal window where you run the command.
3. (Optional) If you have any shadow master hosts, uninstall them:
# $SGE_ROOT/inst_sge -usm -host "host4"
4. Uninstall the master host.
# $SGE_ROOT/inst_sge -um
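The required order (execution hosts, then shadow master hosts, then the master host) can be written down as a dry run before anything is removed. The following is a minimal sketch: the function name print_uninstall_plan is hypothetical, the host names are the example values from the steps above, and the script only prints the commands without running them.

```shell
#!/bin/sh
# Dry run only: print the inst_sge commands in the required order.
# Nothing is executed. print_uninstall_plan is a hypothetical helper.
print_uninstall_plan() {
    exec_hosts=$1      # space-separated execution hosts
    shadow_hosts=$2    # space-separated shadow master hosts, may be empty
    printf '%s\n' "\$SGE_ROOT/inst_sge -ux -host \"$exec_hosts\""
    if [ -n "$shadow_hosts" ]; then
        printf '%s\n' "\$SGE_ROOT/inst_sge -usm -host \"$shadow_hosts\""
    fi
    printf '%s\n' "\$SGE_ROOT/inst_sge -um"
}

print_uninstall_plan "host1 host2 host3" "host4"
```

Passing an empty second argument leaves out the optional shadow master step.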
How to Remove the Software Using the inst_sge Utility and a
Configuration Template
Unlike the interactive uninstallation method, the automated uninstallation method
suppresses output during the process. Also, the automated method requires a
properly formatted configuration file.
To remove the software using the inst_sge utility and a configuration template,
follow these steps:
Note: Remove the software from the execution hosts before removing it from the
master host. If you remove the software from the master host first, you cannot
automate the removal of the software from the execution hosts.
1. Ensure that your environment variables are set up properly.
Note: If no cell name was specified during installation, the value of
$SGE_CELL is default.
■ If you are using a C shell, type the following command:
# source $SGE_ROOT/$SGE_CELL/common/settings.csh
■ If you are using a Bourne or Korn shell, type the following command:
# . $SGE_ROOT/$SGE_CELL/common/settings.sh
2. Create a copy of the configuration template,
$SGE_ROOT/util/inst_sge_modules/inst_sge_template.conf.
# cd $SGE_ROOT/util/inst_sge_modules/
# cp inst_sge_template.conf my_configuration.conf
3. Edit your copy of the configuration template. Every host that is in the
EXEC_HOST_LIST_RM list will be removed.
# Remove these execution hosts in automatic mode
EXEC_HOST_LIST_RM="host1 host2 host3 host4"
4. On the master host, type the $SGE_ROOT/inst_sge -ux -auto command. This
example uninstalls the execution hosts that are listed in EXEC_HOST_LIST_RM. Type the
following command as one string, with a space between the -auto and the
$SGE_ROOT/util/inst_sge_modules/my_configuration.conf components.
# $SGE_ROOT/inst_sge -ux -auto $SGE_ROOT/util/inst_sge_modules/my_configuration.conf
Note: You are not prompted for any information during this process.
However, the output from this process will be displayed to the
terminal window where you run the command.
5. (Optional) If you have any shadow master hosts, uninstall them. Type the
following command as one string, with a space between the -auto and the
$SGE_ROOT/util/inst_sge_modules/my_configuration.conf components.
# $SGE_ROOT/inst_sge -usm -auto $SGE_ROOT/util/inst_sge_modules/my_configuration.conf
6. Uninstall the master host. Type the following command as one string, with a space
between the -auto and the $SGE_ROOT/util/inst_sge_modules/my_configuration.conf
components.
# $SGE_ROOT/inst_sge -um -auto $SGE_ROOT/util/inst_sge_modules/my_configuration.conf
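Copying the template and setting EXEC_HOST_LIST_RM can be combined into one scripted step. The following is a minimal sketch: the function name set_exec_host_list_rm is hypothetical, and the template is assumed to contain a line that starts with EXEC_HOST_LIST_RM=. Host lists that contain the characters | or & are not handled.

```shell
#!/bin/sh
# Hypothetical helper: write a copy of a configuration template in which
# EXEC_HOST_LIST_RM is set to the given space-separated host list.
# Assumes the template has a line that starts with EXEC_HOST_LIST_RM=.
set_exec_host_list_rm() {
    template=$1
    target=$2
    hosts=$3
    sed "s|^EXEC_HOST_LIST_RM=.*|EXEC_HOST_LIST_RM=\"$hosts\"|" "$template" > "$target"
}

# Usage (run in $SGE_ROOT/util/inst_sge_modules):
#   set_exec_host_list_rm inst_sge_template.conf my_configuration.conf \
#       "host1 host2 host3 host4"
#   grep '^EXEC_HOST_LIST_RM=' my_configuration.conf
```

The template itself is left unchanged, so it can be reused for later runs.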
Additional Software for the Microsoft Operating System
Microsoft Windows Services for UNIX (SFU) and Microsoft Subsystem for
UNIX-based Applications (SUA) make it possible to integrate some Windows
operating systems into existing UNIX environments. SFU and SUA provide
components that simplify network administration and user management across the
UNIX and Windows platforms.
Additional Software
The following sections describe the Microsoft Windows Services for UNIX (SFU) and
Microsoft Subsystem for UNIX-based Applications (SUA) in detail.
Topic | Description
------+------------
Microsoft Services for UNIX | Learn how Microsoft Windows Services for UNIX (SFU) makes it possible to integrate some Windows operating systems into existing UNIX environments.
Microsoft Subsystem for UNIX-based Applications | Learn how Microsoft Subsystem for UNIX-based Applications (SUA) makes it possible to integrate some Windows operating systems into existing UNIX environments.
Changing Default Behavior to Case Sensitivity | Choose between default behavior and case sensitivity for object names.
Disabling DEP | Learn how to disable DEP for different Windows platforms.
Enabling suid Behavior for Interix Programs | Learn how to enable suid behavior for Interix programs.
Microsoft Services for UNIX
Microsoft Windows Services for UNIX (SFU) makes it possible to integrate some
Windows operating systems into existing UNIX environments. SFU provides
components that simplify network administration and user management across the
UNIX and Windows platforms. You can use SFU to do the following:
■ Integrate Windows hosts into Grid Engine clusters. This means that the execution
and client environment of Grid Engine can be used on Microsoft Windows hosts.
You must use Grid Engine in combination with SFU for this to occur.
■ Access the network file system (NFS). This makes it possible for you to share files
between the UNIX and Windows environments.
■ Possibly access account and password services on UNIX and Windows systems
(PCNFS, NIS) using the user mapping service.
■ Synchronize passwords and map authentication credentials between the UNIX
and Windows operating systems. You can use the "single sign-on" capability for
Windows and UNIX environments.
■ Execute UNIX shell scripts and applications on Windows platform-based
computers in full-featured UNIX environments.
Interix, SFU's UNIX environment subsystem, offers the following features:
■ A complete, high-performance UNIX environment. You can use the csh shell or
the ksh shell.
■ Several hundred tools and utilities.
■ A complete set of development tools and libraries that make it possible to port
your UNIX-based applications to the Interix subsystem.
SFU is an essential prerequisite to install Grid Engine on Microsoft Windows Server
2003, Windows XP Professional with at least Service Pack 1, Windows 2000 Server
with at least Service Pack 3, or Windows 2000 Professional with at least Service Pack 3.
For Microsoft Windows Server 2003 Release 2, Windows Server 2008, Windows Vista
Enterprise, and Windows Vista Ultimate, see Microsoft Subsystem for UNIX-based
Applications.
Unsupported Grid Engine Functionality
The following Grid Engine components are not supported in a Microsoft Windows
environment and cannot be used on Windows Hosts even though they are standard to
a Grid Engine installation:
■ Master and Scheduler (sge_qmaster and sge_shadowd)
■ Graphical User Interface (qmon)
■ DRMAA
■ qsh client command
Topic | Description
------+------------
How to Install Microsoft Services for Unix | Learn how to install Microsoft Services for Unix.
Troubleshooting SFU | Learn how to troubleshoot Microsoft Services for Unix.
Configuring User Name Mapping | Learn how to configure user name mapping.
Configuring User Name Mapping
User Name Mapping acts as a single clearinghouse that provides centralized user
mapping services for the NFS client of Interix. User Name Mapping provides a map
between the Windows users and groups on the NFS client, and the corresponding
UNIX users and groups on the NFS server. In principle, these user and group names
might not be identical. However, for users who intend to use Grid Engine, these
names must be identical.
User Name Mapping lets you maintain a single mapping database for the entire
enterprise. This feature makes it easy to configure authentication for multiple
computers running Windows Services for UNIX.
User Name Mapping also permits one-to-many mapping. This lets you associate
multiple Windows accounts with a single UNIX account. To do this, you can use
simple maps, which map Windows and UNIX accounts with identical names. You can
also create advanced maps to associate Windows and UNIX accounts with different
names, which you can use with simple maps. This feature can be useful, for example,
when you do not need to maintain separate UNIX accounts for individuals and would
rather use a few accounts to provide different classes of access permission.
Note: For information about simple and advanced maps, see "Simple
and Advanced Maps" in Help for Services for UNIX. After the
installation has finished, you can find Help for Services for UNIX in
Start -> Programs -> Services for UNIX -> Help for Services for UNIX.
How to Install Microsoft Services for Unix
System Requirements
The following system requirements apply to the SFU installation:
■ You must install at least Version 5.0 of Internet Explorer before running the SFU
setup.
■ You cannot install SFU on a system running Microsoft Services for Network File
System. For example, Microsoft Services for NFS is a component of Windows
Storage Server 2003.
■ You must install the latest Windows service pack before installing SFU and Grid
Engine. Then, you can install additional Windows service packs as they become
available.
■ The hard disk requirements for an SFU installation depend on which components
you need to install. The following installation parameters apply:
   ■ The minimum disk space required is 20 MB.
   ■ The maximum disk space requirement is 360 MB.
   ■ SFU must be installed on a partition that is formatted with the NTFS file
   system.
■ You must disable Data Execution Prevention (DEP). DEP is not compatible with
some parts of SFU and might cause segmentation faults. For more information
about DEP, see
http://support.microsoft.com/kb/875352
To disable DEP, see Disabling DEP. You can find more details concerning SFU
requirements at:
http://www.microsoft.com/windows/sfu/
Services for UNIX Installation
Microsoft's SFU is required to install Grid Engine successfully. You can download SFU
from
http://www.microsoft.com/
Search the site for "Windows Services for Unix" to find the current download
information.
1. Get the SFU distribution media.
2. Execute the application to unzip the files into a directory. This directory must be
located on a file system that has at least 480 MBytes free space.
3. Log in to the Windows system with the Administrator account.
4. Start the setup.exe application that you unpacked previously.
5. Enter your User name and Organization.
6. Accept the license agreement for SFU.
7. Choose the standard installation (recommended) or the custom installation. If disk
space is limited, you might want to choose the custom installation. Make sure that
you install at least the following components:
■ Utilities -> Base Utilities
■ Interix GNU components -> Interix GNU utilities
■ Remote connectivity components -> Telnet Server and Windows Remote Shell
■ If you intend to use NFS shared file systems, you also need Authentication
tools for NFS -> User Mapping and Server for NFS Authentication.
8. Depending on the Windows operating system, you might be presented with
two options concerning SFU security settings, as shown in Figure 2–19.
If you need further information, consult Microsoft's SFU website at:
http://www.microsoft.com/windows/sfu/
Figure 2–19 SFU Security Settings
9. Configure User Name Mapping.
Figure 2–20 Configuring User Name Mapping
Note: User Name Mapping is part of SFU and not part of Grid
Engine. Consult Microsoft documentation and support to set up user
mapping correctly. Your selection in the dialog box shown in Figure 2–20
depends on the hosts and services that are currently provided in your
Windows and UNIX environments. If there is no Remote User Name
Mapping server in your environment, select Local User Name Mapping Server.
Note: You should install SFU and enable the User Name Mapping
service on the host that acts as the Domain Controller for your
Windows environment. All other hosts should contact that Remote
User Name Mapping Server. If you choose Local User Name Mapping
Server, you can select Network Information Services (NIS) to access
your passwd and group NIS maps. Otherwise, select the option that lets
you provide the passwd and group files yourself. See Configuring User Name
Mapping for further details.
10. Depending on your previous selections, you can either enter the NIS Domain
name and NIS Server name or the path of the passwd and group files. Below is an
example of the files that have the standard UNIX format. This means that you can
also use your /etc/passwd and /etc/group files from your UNIX environment.
C:\Unix\etc\passwd
root:x:0:0:UNIX root user:/home/root:/bin/tcsh
user1:x:1002:100:Full name of user1:/home/user1:/bin/tcsh
C:\Unix\etc\group
root::0:
Note: Some NIS maps do not contain an entry for the root user. If
this is the case, follow these steps to map Administrator to root:
■ First create a password file containing the root entry.
■ After the SFU installation is finished, start the Services for UNIX
Administration application and create the mapping:
Administrator <-> root.
■ Switch to NIS mapping.
■ Use simple mapping or add manual mappings.
At this point the installation starts installing components. Wait until all
components are installed.
11. When the installation process finishes, you might need to reboot the machine,
depending on the version of Windows that you are using.
12. Make sure that the Interix Subsystem Startup starts during boot time. If you intend
to use NFS shares and user mapping, then also start Client for NFS and User
Name Mapping. Depending on the installation options and your version of the
Windows operating system, one or more of these services are disabled by default.
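Before you hand a passwd file to User Name Mapping, it can help to confirm that the file has a root entry, because some NIS maps lack one. The following is a minimal sketch to run on a UNIX host: the function name has_passwd_entry is hypothetical, and the sample content repeats the passwd example shown in step 10.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a passwd-format file has an entry for a user.
# The user name is the first colon-separated field of each line.
has_passwd_entry() {
    awk -F: -v user="$2" '$1 == user { found = 1 } END { exit !found }' "$1"
}

# Example with the sample passwd content from step 10
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
root:x:0:0:UNIX root user:/home/root:/bin/tcsh
user1:x:1002:100:Full name of user1:/home/user1:/bin/tcsh
EOF
if has_passwd_entry "$tmp" root; then
    echo "root entry found"
else
    echo "no root entry: create a password file that contains one"
fi
rm -f "$tmp"
```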
Post SFU Installation Tasks
There are several steps you should follow after you install the SFU software.
1. Before you start using SFU and install Grid Engine, check that the user mapping is
working correctly by following these steps:
   1. Open an Interix shell locally on the Interix host.
   2. Use the login command to switch to a known user that is not the
   Administrator.
   3. Verify the access permissions for NFS shares that should be accessible to that
   user.
   4. Try to access these network resources. If the user cannot access a network
   drive and it is an NFS shared drive, most likely the User Name Mapping is not
   working correctly.
2. Check users' home directories. To enable the automounting of the users' home
directories, use the following series of menus:
Control Panel -> Administrative Tools -> Computer Management -> Users ->
Properties -> Profile
Click connect to, select a drive letter, and enter the path of the user's home
directory in UNC notation: \\<server>\<share>\<user home>. Within the Interix
subsystem, you can access all network shares through the special directory
/net/server/share. You can also create links to these directories to access the
shares directly, for example:
ln -s /net/myserver/export/share00/home /home
See Step 5 for how to mount network shares.
3. Enable Administrator names on your machines. Make sure that the administrator
accounts on all machines that are enabled as execution hosts for Grid Engine use
the same account name, such as Administrator. Also make sure that this user has
manager privileges in your Grid Engine cluster. If this is not the case, add the
privileges using qconf -am administrator before the installation of the execution
daemon.
4. Set the editor for CLI commands. Some CLI commands start an editor. Make sure
to set the EDITOR environment variable to vi, or your preferred UNIX editor, within
the Interix subsystem before you start using these commands.
5. Mount network shares. There are two ways to mount network shares to the Interix
host:
   1. Interix provides a directory in which it makes available all network shares that it
   finds by browsing the network. This works similarly to the "Network" folder in
   Windows Explorer, which also searches the network for available shares.
   The directory where these network shares are provided is /net. In the /net
   directory, each automatically discovered host appears as a subdirectory, and
   each of these subdirectories in turn lists all network shares of that host as
   subdirectories. The syntax is /net/server/share, for example,
   /net/myserver/home. Even though ls /net might list no content, and perhaps
   some errors, ls /net/server/share lists the content of the share. The
   errors or missing host names seem to be a bug in displaying the names, but this
   bug does not affect the functionality of the automatically discovered shares. To
   make these shares available under the same path as on a UNIX host, create
   links to these shares. For example, ln -s /net/myserver/home /home makes
   the users' UNIX home directories accessible through /home/username on the
   Windows host. The automatically discovered shares are available at boot time
   for all users who have permission to access the shares. Interix discovers the
   same kinds of network shares (SMB, NFS, CIFS, and so on) that Windows Explorer
   can discover. For this to work, the proper network client must be installed and
   the permissions must be sufficient.
   2. Network shares can also be mapped to drive letters by using the net command
   of Windows. The syntax is /dev/fs/C/Windows/System32/net.exe use <drive
   letter>: \\<computername>\<sharename>. For example:
/dev/fs/C/Windows/System32/net.exe use Z: \\\\myserver\\home
This drive is now accessible through /dev/fs/Z. A link can be created to this
drive to use the same path as on a UNIX host.
Note: As shown in the example above, all backslashes must be
written twice because the shell interprets a single backslash as an
escape character.
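The effect of the doubled backslashes can be tried out without mapping a drive. The following is a minimal sketch for a POSIX shell: printf only prints the argument that a command would receive, and the function name unc_path is a hypothetical helper that builds the path from a server name and a share name.

```shell
#!/bin/sh
# In an unquoted word, each pair of backslashes reaches the command as one.
printf '%s\n' \\\\myserver\\home       # prints: \\myserver\home

# Inside single quotes the shell leaves backslashes untouched.
printf '%s\n' '\\myserver\home'        # prints: \\myserver\home

# Hypothetical helper: build a UNC path from a server name and a share name.
# Inside double quotes, \\ also stands for a single backslash.
unc_path() {
    printf '%s\n' "\\\\$1\\$2"
}
unc_path myserver home                 # prints: \\myserver\home
```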
Troubleshooting SFU
The following section describes some common problems that users may encounter
when installing and using Grid Engine in a Services for UNIX environment on a
Windows system.
Impossible to connect to the Interix subsystem through telnet or rsh.
Make sure that the correct services are started. The corresponding Windows services
must be disabled. The Interix versions of telnetd and rshd must be started. You can
do this task by removing the pound sign (#) from the following lines in
/etc/inetd.conf:
#telnet stream tcp nowait NULL /usr/sbin/in.telnetd in.telnetd -i
#shell stream tcp nowait NULL /usr/sbin/in.rshd in.rshd -a
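Removing the pound signs can be done in an editor or with sed. The following is a minimal sketch: the function name enable_telnet_rsh is hypothetical, and it only prints the modified content so that the result can be reviewed before /etc/inetd.conf is replaced.

```shell
#!/bin/sh
# Hypothetical helper: print an inetd.conf-style file with the leading #
# removed from the telnet and shell (rsh) service lines only.
# All other lines, including other commented-out services, stay unchanged.
enable_telnet_rsh() {
    sed -e 's/^#\(telnet[[:space:]]\)/\1/' \
        -e 's/^#\(shell[[:space:]]\)/\1/' "$1"
}

# Usage: write the result to a new file and compare it with the original.
#   enable_telnet_rsh /etc/inetd.conf > /tmp/inetd.conf.new
#   diff /etc/inetd.conf /tmp/inetd.conf.new
```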
If you still cannot connect to the machine, check your firewall configuration. Do not
block connections to corresponding ports:
Service | Ports
--------+--------
ftp     | 20, 21
ssh     | 22
telnet  | 23
rsh     | 514
The wrong default login shell is started. Why?
Both the .rhosts and hosts.equiv authentications fail after new user accounts are
created or after the passwords of existing users are changed. In this case, run the
regpwd command, and then follow the steps to register passwords correctly.
Why is the access to NFS mounted home directories slow?
User Name Mapping might be the cause. For a large number of user maps, installing
User Name Mapping on a Domain Controller improves performance by reducing
network traffic. You can create a User Name Mapping server pool. This method means
that you use DNS round-robin to create a pool of computers running User Name
Mapping. This provides improved performance on wide area networks and provides
failover when one of the servers is no longer available.
How can I map user root if it does not exist in the NIS maps?
First create a passwd file which contains an entry for the user root. Then, explicitly
map the root account (no basic mapping) using the created passwd file. Finally, change
the mapping to use the NIS maps. Note that the previous root mapping will persist.
NIS Server cannot be contacted during the SFU installation.
Interrupt the SFU installation and make sure that there is no other service or
application running which already configures or uses the NIS server. If this is the case,
then disable this service for the duration of the SFU installation.
The Interix Subsystem of SFU or the User Mapping is not enabled after reboot.
Make sure that Interix Subsystem Startup and User Name Mapping are automatically
started after machine reboot. Also if you use NFS mounted directories, enable the
service by default: Client for NFS.
Queues stick in unknown state for a very long time.
After the installation or restart of an execution host, the corresponding queues
remain in the unknown (u) state for a very long time. This is normal behavior for
Windows machines. After a full load report interval, the u state should be gone. If this
is not the case, then check that the sge_execd has been started on the corresponding
machine.
Microsoft Subsystem for UNIX-based Applications
Microsoft Subsystem for UNIX-based Applications allows you to integrate Windows
operating systems with the existing UNIX environments. This subsystem provides
components that simplify network administration and user management across UNIX
and Windows platforms. You can use this subsystem to perform the following:
■ Integrate Windows hosts with Grid Engine clusters - Enables you to use the
execution and client environment of Grid Engine on Microsoft Windows hosts.
You must use Grid Engine in combination with Microsoft Subsystem for
UNIX-based Applications for this to happen.
■ Access the network file system (NFS) - Enables you to share files between the
UNIX and Windows environments.
■ Synchronize passwords and map authentication credentials between the UNIX
and Windows operating systems - Enables you to use the 'single sign-on'
capability for Windows and UNIX environments.
■ Execute UNIX shell scripts and applications - Enables you to run shell scripts and
applications on Windows platform-based computers in full-featured UNIX
environments.
Interix, the UNIX environment subsystem of Microsoft Subsystem for UNIX-based
Applications, offers the following features:
■ A complete, high-performance UNIX environment. You can use the csh or ksh
shell.
■ Several hundred tools and utilities.
■ A complete set of development tools and libraries that make it possible to port
your UNIX-based applications to the Interix subsystem.
Microsoft Subsystem for UNIX-based Applications is an essential prerequisite to
install Grid Engine on Microsoft Windows Server 2003 Release 2, Windows Server
2008, Windows Vista Enterprise, and Windows Vista Ultimate. For Microsoft
Windows Server 2003, Windows XP Professional with at least Service Pack 1,
Windows 2000 Server with at least Service Pack 3, or Windows 2000 Professional with
at least Service Pack 3, see Microsoft Services for UNIX.
Unsupported Grid Engine Functionality
The following Grid Engine components are not supported in the Microsoft Windows
environment and cannot be used on Windows hosts even though they are standard to
a Grid Engine installation:
■ Master and Scheduler (sge_qmaster and sge_shadowd)
■ Graphical user interface (qmon)
■ DRMAA
■ qsh client command
How to Install a Microsoft Subsystem for UNIX-based Applications
This section describes how to install a Microsoft Subsystem for UNIX-based
Applications.
System Requirements
The system requirements for a Subsystem for UNIX-based Applications installation
are:
■ Microsoft Windows Server 2003 Release 2, Windows Server 2008, Windows Vista
Enterprise, or Windows Vista Ultimate. Windows Vista Business and all lower
Vista versions are not supported by this subsystem.
■ You must install the latest Windows service pack before installing Subsystem for
UNIX-based Applications and Grid Engine. You can install additional Windows
service packs as they become available.
■ The hard disk requirement for a Subsystem for UNIX-based Applications
installation depends on the components that you are planning to install. The
following installation parameters apply:
   ■ The minimum disk space required is 182 MBytes.
   ■ The maximum disk space required is approximately 350 MBytes.
   ■ Subsystem for UNIX-based Applications must be installed on a partition that
   is formatted with the NTFS file system.
■ You must disable the Data Execution Prevention (DEP) feature. The DEP feature is
not compatible with some parts of Subsystem for UNIX-based Applications and
might cause segmentation faults. For more information about DEP, see
http://support.microsoft.com/kb/875352
For information on how to disable DEP, see Disabling DEP.
You can find additional information about Subsystem for UNIX-based Applications
requirements at:
http://technet.microsoft.com/en-us/library/cc779522.aspx
Installing Subsystem for UNIX-based Applications
Microsoft Subsystem for UNIX-based Applications is required for installing Grid
Engine on Windows Vista, Windows Server 2008, and Windows 2003 R2. Subsystem
for UNIX-based Applications is partially delivered with these versions of Windows,
but you also need to download some components from the Microsoft web site.
1.
Install the components of Subsystem for UNIX-based Applications that are
delivered with Windows.
Note: In this procedure Windows Vista is used as an example. Other
supported Windows versions function similarly. You must have the
appropriate administrative privileges to perform the installation.
1.
Click Start.
2.
Click Control Panel.
3.
Click Programs.
4.
Click the Turn Windows features on or off option from the Programs and
Features panel. The Windows Features screen appears.
Figure 2–21
Windows Features Screen
5.
Select the Subsystem for UNIX-based Applications option.
6.
You can also open the Services for NFS tree and select the appropriate option,
if you prefer to use NFS shares.
Note: Ensure that you use SAMBA for networking shares, as you
might have trouble setting up an environment that functions correctly
with both Subsystem for UNIX-based Applications NFS and
Subsystem for Unix NFS clients.
7.
Click OK. Windows installs the new features and might prompt you to insert
the Windows installation DVD.
2.
Download and install the remaining components of Subsystem for UNIX-based
Applications.
1.
Click Start > All Programs. You will notice a new folder named Subsystem for
UNIX-based Applications in the Windows Start menu.
Figure 2–22
Start Menu with Folder Subsystem for UNIX-based Applications
This folder contains the link to the web page where the remaining components
of Subsystem for UNIX-based Applications can be downloaded.
2.
Download the remaining components of Subsystem for UNIX-based
Applications and double-click to open the file. The file will open in a WinZip
Self-Extractor dialog box.
3.
Click Unzip. The utility unzips the files.
4.
Click OK. The Installer is started and the Subsystem for UNIX-based
Applications setup wizard appears.
5.
Click Next. The Customer Information screen appears.
6.
Enter user name and organization name and click Next. The License and
Support Information screen appears.
7.
Accept the terms of the license and click Next. The Installation Options screen
appears.
8.
Select the Custom Installation option. The Selecting Components screen
appears.
9.
Use the preset selections and select GNU Utilities. Ensure that you also select
GNU SDK. The Security Settings screen appears.
Figure 2–23
Selecting Components for Installation
10. Depending on the Windows operating system that you are using, you might
be presented with the above Subsystem for UNIX-based Applications security
settings. Click Next. The Summary screen appears.
11. Select the required disk volume and click Install.
Figure 2–24
Selecting Disk Volume
The installation wizard appears.
12. Click Finish to exit the installation wizard.
13. Reboot the host. After rebooting, you will notice a C Shell, a Korn Shell, and
some additional links and documentation in the folder named Subsystem for
UNIX-based Applications in the Windows Start menu.
Note: You must set the proper firewall rules to access the host. To
allow programs running under Interix to access the network, you
must add psxss.exe to the list of exceptions. psxss.exe is the main
part of the Interix "kernel". This executable is typically located in
C:\Windows\system32\psxss.exe.
3.
Ensure that the Interix Subsystem starts up during system booting. If you intend
to use NFS shares, start the client for NFS. The mapping between UNIX and
Windows user IDs is done by the Windows Active Domain Server; consult the
Subsystem for UNIX-based Applications documentation for more information.
Depending on the installation options and your version of the Windows operating
system, one or more of these services are disabled by default.
Post Installation Tasks
You need to perform the following steps after installing Subsystem for UNIX-based
Applications.
1.
Before you start using Subsystem for UNIX-based Applications and install Grid
Engine, you need to check that the user mapping is working correctly.
1.
Open an Interix shell locally on the Interix host.
2.
Use the login command to switch to a known user who is not an
Administrator.
3.
Verify the access permissions for network shares that should be accessible to
that user.
4.
Try to access these network resources. If the user cannot access a network
drive and it is an NFS shared drive, most likely the User Name Mapping is not
working correctly.
2.
Check the users' home directories. To enable the automounting of the users' home
directories, click Start > Control Panel > Administrative Tools > Computer
Management > Users > Properties > Profile. Click Connect to and select the
required drive letter. Enter the path of the user's home directory in UNC notation,
\\<server>\<share>\<user home>. Within the Interix subsystem, you can access
all network shares through the special directory, /net/server/share. You can also
create links to these directories to access the shares directly, for example, ln -s
/net/myserver/export/share00/home /home. See Step 5 to mount network shares.
3.
Enable Administrator names on your machines. Ensure that the administrator
accounts on all machines that are enabled as execution hosts for Grid Engine use
the same account name, such as Administrator. Ensure that this user has manager
privileges in your Grid Engine cluster. If this is not the case, add the privileges
using qconf -am administrator before the installation of the execution daemon.
4.
Set the editor for CLI commands. Some commands open an editor. Ensure that you
set the EDITOR environment variable to vi, or your preferred UNIX editor, within
the Interix subsystem before you start using UNIX commands.
5.
Mount network shares. There are two ways to mount network shares to the Interix
host:
1.
Interix provides a directory in which it makes available all network shares it
finds by browsing the network. This works similarly to the Network folder in
Windows Explorer, which also searches the network for available shares.
These network shares are provided in the /net directory. Every automatically
discovered host appears as a subdirectory of /net, and each host subdirectory
in turn lists the network shares of that host as subdirectories. The syntax is
/net/server/share, for example, /net/myserver/home.
Even though ls /net might list no content, or only errors, ls /net/server/share
lists the content of the share. The errors and missing host names appear to be
caused by a bug in displaying the names. This bug does not affect the
functionality of the automatically discovered shares.
To make these shares available under the same path as on a UNIX host, create
links to the shares. For example, ln -s /net/myserver/home /home makes the
users' UNIX home directories accessible through /home/username on the Windows
host. The automatically discovered shares are available at boot time for all
users who have permission to access them. Interix discovers the same kinds of
network shares (SMB, NFS, CIFS, and so on) that Windows Explorer can
discover. For this to work, the proper network client must be installed and the
permissions must be sufficient.
2.
Network shares can also be mapped to drive letters by using the Windows net
command. The syntax is /dev/fs/C/Windows/System32/net.exe use <drive
letter>: \\<computername>\<sharename> <devicename>. For example:
/dev/fs/C/Windows/System32/net.exe use Z: \\\\myserver\\home
This drive is now accessible through /dev/fs/Z. A link can be created to this
drive to use the same path as on a UNIX host.
Note: As shown in the example above, all backslashes must be
written twice because the shell interprets a single backslash as an
escape character.
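Both path forms described in this step can be derived mechanically from a UNC name. The following is a minimal sketch and is not part of Subsystem for UNIX-based Applications or Grid Engine. The function names unc_to_net and unc_escape are hypothetical, and myserver and home are the example names used in this step.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# unc_to_net: convert \\server\share[\dir] to /net/server/share[/dir].
unc_to_net() {
    printf '%s\n' "$1" | sed -e 's|^\\\\|/net/|' -e 's|\\|/|g'
}

# unc_escape: double every backslash so that the path can be typed
# unquoted at a shell prompt.
unc_escape() {
    printf '%s\n' "$1" | sed -e 's|\\|\\\\|g'
}

unc_to_net '\\myserver\home'    # prints /net/myserver/home
unc_escape '\\myserver\home'    # prints \\\\myserver\\home
```

Enclosing the UNC name in single quotes, as in the calls above, is an alternative to doubling the backslashes, because the shell does not interpret backslashes inside single quotes.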
Troubleshooting Microsoft Subsystem for UNIX-based Applications
The following section describes some common problems that users may encounter
when installing and using Grid Engine in a Subsystem for UNIX-based Applications
environment on a Windows system.
Impossible to connect to the Interix subsystem through telnet or rsh.
Make sure that the correct services are started. The corresponding Windows services
must be disabled. The Interix versions of telnetd and rshd must be started. You can
do this task by removing the pound sign (#) from the following lines in
/etc/inetd.conf:
#telnet stream tcp nowait NULL /usr/sbin/in.telnetd in.telnetd -i
#shell stream tcp nowait NULL /usr/sbin/in.rshd in.rshd -a
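Removing the pound sign can also be scripted. The following is a minimal sketch that assumes the two entries look exactly like the lines shown above. The function name enable_inetd_services is hypothetical. The function only prints the modified file, so that the result can be reviewed before it replaces /etc/inetd.conf.

```shell
#!/bin/sh
# enable_inetd_services FILE
# Hypothetical helper: print FILE with the leading "#" removed from the
# telnet and shell (rsh) entries. All other lines pass through unchanged.
enable_inetd_services() {
    sed -e 's/^#\(telnet[[:space:]]\)/\1/' \
        -e 's/^#\(shell[[:space:]]\)/\1/' "$1"
}

# Example (not run here): write the result to a new file for review.
#   enable_inetd_services /etc/inetd.conf > /tmp/inetd.conf.new
```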
If you still cannot connect to the machine, check your firewall configuration. Do not
block connections to the corresponding ports:
Service | Ports
--------+-------
ftp     | 20, 21
ssh     | 22
telnet  | 23
rsh     | 514
The wrong default login shell is started. Why?
Both the .rhosts and hosts.equiv authentications fail if new user accounts are created or
if the passwords of existing users are changed. In this case, the command regpwd
needs to be called. After that, follow the steps to register passwords correctly.
Why is the access to NFS mounted home directories slow?
User Name Mapping might be the cause. For a large number of user maps, installing
User Name Mapping on a Domain Controller improves performance by reducing
network traffic. You can create a User Name Mapping server pool. This method means
that you use DNS round-robin to create a pool of computers running User Name
Mapping. This provides improved performance on wide area networks and provides
failover when one of the servers is no longer available.
How can I map user root if it does not exist in the NIS maps?
First create a passwd file which contains an entry for the user root. Then, explicitly
map the root account (no basic mapping) using the created passwd file. Finally,
change the mapping to use the NIS maps. Note that the previous root mapping will
persist.
NIS Server cannot be contacted during the Microsoft Subsystem for UNIX-based
Applications installation.
Interrupt the Microsoft Subsystem for UNIX-based Applications installation and make
sure that there is no other service or application running which already configures or
uses the NIS server. If this is the case, then disable this service for the duration of the
Microsoft Subsystem for UNIX-based Applications installation.
The Interix Subsystem of Microsoft Subsystem for UNIX-based Applications or
the User Mapping is not enabled after reboot.
Make sure that Interix Subsystem Startup and User Name Mapping are automatically
started after machine reboot. Also, if you use NFS-mounted directories, set the
Client for NFS service to start by default.
Queues stick in unknown state for a very long time.
After the installation or restart of an execution host, the corresponding queues
remain in the unknown (u) state for a very long time. This is normal behavior for
Windows machines. After a full load report interval, the u state should be gone. If this
is not the case, then check that sge_execd has been started on the corresponding
machine.
Scripts and binaries are causing core dumps
Make sure the "GNU Utilities" and perhaps the "GNU SDK" were installed during the
Microsoft Subsystem for UNIX-based Applications installation. A lot of scripts and
binaries need libraries included in these packages.
Changing Default Behavior to Case Sensitivity
You might have to choose between default behavior and case sensitivity for object
names, such as file names. Your choice will affect system security as well as how
Microsoft Services for UNIX (SFU) and Microsoft Subsystem for UNIX-based
Applications (SUA) function.
With Microsoft Windows, the names of most objects are case preserving, but case
insensitive. So, you cannot have two files in the same directory named sample.txt and
Sample.txt because Windows regards the names as identical.
However, the UNIX operating system is fully case sensitive. So, UNIX systems
distinguish between object names even when the only difference between those names
is the case of the object name characters. Therefore, sample.txt and Sample.txt could
appear in the same directory and the UNIX system would distinguish between them
when performing operations on the files. For example, the command rm S*.txt would
delete Sample.txt but not sample.txt. To implement typical UNIX behavior, the server
for NFS and the Interix subsystem are normally case sensitive when working with file
names.
This behavior can present security issues, particularly for users who are accustomed to
the case insensitive conventions of Windows. For example, a Trojan horse version of
edit.exe, named EDIT.EXE, could be stored in the same directory as the original. If a
user were to type edit at a Windows command prompt, the Trojan horse version
(EDIT.EXE) could be executed instead of the standard version.
Caution: If case sensitivity is enabled, Windows users should be
made aware of the security issues.
For Windows XP (Professional) and the Windows Server 2003 family, the default
behavior of subsystems (other than the Win32 subsystem) is to preserve case but be
case insensitive. In previous versions of Windows, such subsystems were fully case
sensitive by default. To support standard UNIX behavior, the SFU and SUA setups
allow you to change the default Windows XP and Windows Server 2003 family
behavior for non-Win32 subsystems when installing the base utilities (the Interix
subsystem) or Server for NFS. If you enable case sensitivity and then subsequently
uninstall the base ut
Disabling DEP
How to Disable DEP for Windows XP Professional, Windows Server 2000 and Windows
Server 2003
1.
Right-click the My Computer icon on your desktop.
2.
Click Properties.
3.
In the Properties dialog box, click the Advanced tab.
4.
Click Settings in the Startup and Recovery section.
5.
In the next dialog box, click the Edit button to edit the boot command line of your
Windows installation.
6.
Add /noexecute=alwaysoff or modify an existing /noexecute option.
How to Disable DEP for Windows Vista (Enterprise and Ultimate) and Windows Server
2008
1.
Click Start > All Programs > Accessories.
2.
Right-click Command Prompt.
3.
Left-click Run as Administrator.
4.
Click Allow, if the system prompts you for permission.
5.
Type the following text in the command prompt window.
bcdedit.exe /set {current} nx AlwaysOff
Enabling suid Behavior for Interix Programs
According to the POSIX standard, a file has permissions that include bits to set both a
UID (setuid) and a GID (setgid) when the file is executed. If either or both bits are set
on a file, and a process executes that file, the process gains the UID or GID of the file.
When used carefully, this mechanism allows a non-privileged user to execute
programs that run with the higher privileges of the file's owner or group.
When used incorrectly, however, this behavior can present security risks by allowing
non-privileged users to perform actions that should only be performed by an
administrator. For this reason, the Windows Services for UNIX and Subsystem
for UNIX-based Applications setup programs do not enable support for this
mechanism by default.
You should enable support for setuid behavior because Grid Engine runs programs
that require this support. If you do not enable support for setuid behavior when
installing Windows Services for UNIX, you can enable it later.
User Management on Windows Hosts
Every user of the Grid Engine execution environment of a Windows machine must
have a user account that has the same name as on the UNIX hosts. User accounts
contain information about the user, including name, password, and various optional
entries that determine when and how users log on and how their desktop settings are
stored.
The following sections describe how you would use Windows user management to
support Grid Engine. Windows machines are referred to here using three different
terms. The following table lists the terms and the operating systems which might run
on each corresponding host:
Table 2–18 Windows Machines Terms
Term | Microsoft Windows OS
--------------------+----------------------
Windows Host | Microsoft Windows 2000, Microsoft Windows XP, Microsoft Windows 2000 Server, Microsoft Windows Server 2003
Windows Server | Microsoft Windows 2000 Server, Microsoft Windows Server 2003
Windows Workstation | Microsoft Windows 2000, Microsoft Windows XP
Managing Users on Windows Hosts
It is possible to administer user accounts on all Windows hosts individually. Each
Windows Host has an authentication center which validates user names and
corresponding user rights. User accounts which are defined on a Windows
workstation are referred to here as local user accounts or local users.
Each Windows Host has its own local domain, and each Windows Server has the
ability to make that domain available to other hosts. Account names within a local
domain and account names within a server domain can collide. To avoid such
collisions, you must specify the correct user account by prefixing the user account
name with the domain name followed by a + (plus sign) character.
Windows User Example
The following example illustrates the potential complexity of Windows host
accounts interacting with Windows Domain accounts. Suppose a Windows Workstation
host named CRUNCH has a local user account named Peter. This Windows
Workstation is part of the domain named ENGINEERING. This domain is provided
by a Windows Server which also has a user account named Peter. In this example, the
ENGINEERING domain is the default domain of the host named CRUNCH. The
following table shows the possible results when a person tries to
log in to CRUNCH.
Table 2–19 Using Domain Accounts
Login Name | Result
------------------+--------
CRUNCH+Peter | Peter is logged in with his account as a local user of the machine CRUNCH.
ENGINEERING+Peter | Peter is logged in with the account provided by the Windows Server hosting the ENGINEERING domain.
Peter | This approach is equivalent to using ENGINEERING+Peter because CRUNCH has ENGINEERING as its default domain. Otherwise, the local account would be used.
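The naming rule in the table can be expressed as a small function. The following is a minimal sketch for illustration only. The function name qualify_login is hypothetical, and the sketch assumes that the default domain of the host is passed in explicitly. It does not query Windows.

```shell
#!/bin/sh
# qualify_login NAME DEFAULT_DOMAIN
# Hypothetical helper: print NAME unchanged if it already carries a
# DOMAIN+ prefix, otherwise prefix it with DEFAULT_DOMAIN.
qualify_login() {
    case "$1" in
        *+*) printf '%s\n' "$1" ;;
        *)   printf '%s+%s\n' "$2" "$1" ;;
    esac
}

qualify_login Peter ENGINEERING          # prints ENGINEERING+Peter
qualify_login CRUNCH+Peter ENGINEERING   # prints CRUNCH+Peter
```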
Each domain has a special user account that provides superuser access. The default
name for that account is Administrator. For native Windows, the members of the
Administrators group and of the Domain Admins group in the server domain also
have superuser access. However, for Interix, only the user Administrator of the local
domain is the superuser of the local host.
The local Administrator can start applications in an account without knowing the
password of the user for that account. However, the application would not be able to
access network resources because even the local Administrator is not fully trusted by
the network, unlike the UNIX super user root. Therefore, the Grid Engine administrator
uses the sgepasswd tool to register the users' passwords, as explained in Using Grid
Engine in a Microsoft Windows Environment.
UNIX User Management
UNIX has no equivalent to the Windows domain concept. With UNIX, each user has a
local account and is authenticated as a local account even if the underlying account
information lies on an LDAP or NIS server. The UNIX super user root is similar to the
local Windows super user Administrator. The UNIX super user can start applications
and processes on behalf of UNIX accounts without knowing each corresponding
password.
Using Grid Engine in a Microsoft Windows Environment
The Grid Engine execution environment starts jobs on behalf of the submitting user.
The execution daemon (sge_execd) on UNIX hosts runs as root so that it can start jobs
on behalf of all users.
On Windows hosts, the execution daemon runs as the local Administrator user so that
it can start jobs on behalf of users without knowing their password, but these jobs
would not have the permissions to access network resources. Only fully authenticated
users can access network resources. For a full authentication, the user's password is
needed. Therefore, all users who want to submit jobs to a Windows execution host
have to register their passwords with Grid Engine. The execution daemon still needs
to run as the local Administrator to have the permissions to do several administrative
tasks.
Registering Windows User Passwords
Users who want to start Grid Engine jobs on Windows execution hosts use the
sgepasswd client application to register their Windows passwords. The following
example shows Peter who has a user account in the domain ENGINEERING. Because
ENGINEERING is the principal domain of the Windows execution host CRUNCH,
Peter does not need to register his password for a specific domain. This should be the
default in any properly set up single domain environment. In multiple domain
environments, it might be necessary to register the password explicitly for a specific
domain.
Note: You must run the sgepasswd command on a non-Windows
host.
> sgepasswd
Changing password for Peter
New password:
Re-enter new password:
Password changed
Using the sgepasswd Command
The sgepasswd command changes the Grid Engine password file sgepasswd(5). This
file contains a list of user names and their Windows passwords in encrypted form.
You can use sgepasswd to perform the following tasks:
■
To add a new entry for your user account.
■
To change your existing password, if you know your stored password.
Caution: If Grid Engine tries to run several of your jobs at once on a
Windows execution host and is unable to access a correct password
for your account, the Windows intrusion detection system could
disable your account. To keep your account from being disabled, you
must prevent your pending jobs from being run before you attempt to
change your Windows user password. Once you have changed your
password using sgepasswd on a non-Windows host and then on your
Windows domain, you can allow your jobs to be run again.
Additionally, the root user can change or delete the password entries for other user
accounts. sgepasswd is only available on non-Windows hosts.
The sgepasswd command uses one of the following syntaxes:
sgepasswd [[ -D <domain> ] -d <user> ]
sgepasswd [ -D <domain> ] [ <user> ]
This command supports the following options:
Table 2–20 Supported Options of sgepasswd
Option | Description
----------+-------------
-D domain | By default, sgepasswd adds or modifies the current UNIX user name without a domain specification. You can use this switch to add a domain specification in front of the current user name. Consult your Microsoft Windows documentation for more information about domain users.
-d user | Only root can use this parameter to delete entries from the sgepasswd(5) file.
-help | Prints a listing of all options.
Additionally, the following environment variables affect the operation of this
command.
Table 2–21 Environment Variables Affecting sgepasswd Command
Variable | Description
-------------+-------------
SGE_CERTFILE | Specifies the location of the public key file. By default, sgepasswd uses the file $SGE_ROOT/$SGE_CELL/common/sgeCA/certs/cert.pem.
SGE_KEYFILE | If set, this specifies the location of the private key file. The default file is /var/sgeCA/port$SGE_QMASTER_PORT/$SGE_CELL/private/key.pem.
SGE_RANDFILE | If set, this specifies the location of the rand.seed file. The default file is /var/sgeCA/port$SGE_QMASTER_PORT/$SGE_CELL/private/rand.seed.
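The defaults in Table 2–21 can be written out as shell parameter expansions. The following is a minimal sketch, not part of Grid Engine. The function name sgepasswd_files is hypothetical, and the sketch assumes that an empty variable should be treated the same as an unset one.

```shell
#!/bin/sh
# sgepasswd_files
# Hypothetical helper: print the file locations that result from the
# defaults in Table 2-21 when an SGE_* override is unset or empty.
# Requires SGE_ROOT, SGE_CELL, and SGE_QMASTER_PORT to be set.
sgepasswd_files() {
    ca_dir="/var/sgeCA/port${SGE_QMASTER_PORT}/${SGE_CELL}/private"
    printf 'cert=%s\n' "${SGE_CERTFILE:-$SGE_ROOT/$SGE_CELL/common/sgeCA/certs/cert.pem}"
    printf 'key=%s\n'  "${SGE_KEYFILE:-$ca_dir/key.pem}"
    printf 'rand=%s\n' "${SGE_RANDFILE:-$ca_dir/rand.seed}"
}

# Example (not run here): source the cluster settings file first so that
# the required variables are defined, then call sgepasswd_files.
```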
Adding Windows Hosts to Existing Grid Engine Systems
If you have a running Grid Engine system on which Windows support is not enabled,
you can enable the support manually. The following steps provide a
Windows-enabled Grid Engine system that allows additional Windows execution
hosts.
How to Add Windows Hosts Later
1.
Copy Windows binaries to the $SGE_ROOT directory.
2.
Type the following command:
qconf -mconf
Set the execd_params to enable_windomacc=true.
3.
Type the following command:
qconf -am <win_admin_name>
4.
Run the following command:
$SGE_ROOT/util/sgeCA/sge_ca -init -days 365
5.
For a CSP installation, run the following command:
$SGE_ROOT/util/sgeCA/sge_ca -user <win_admin_name>
6.
Type the following command:
qconf -ah <new_win_hosts>
7.
Copy certificates to each Windows host.
8.
Set the owner of the certificates to ADMINUSER. Use a command similar to the
following example:
chown -R foo:bar /var/sgeCA/port<SGE_QMASTER_PORT>
9.
Run the normal execution daemon installation on each execution host.
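The qconf and sge_ca commands in this procedure can be collected into a checklist before they are run. The following is a minimal sketch that only prints the commands from steps 2 through 6 and executes nothing. The function name print_windows_host_plan and the host names winhost1 and winhost2 are hypothetical. The file copy steps and the final execution daemon installation remain manual.

```shell
#!/bin/sh
# print_windows_host_plan ADMIN HOST [HOST...]
# Hypothetical helper: print, without executing, the commands from
# steps 2 through 6. ADMIN is the Windows administrator account name.
print_windows_host_plan() {
    admin=$1
    shift
    printf '%s\n' 'qconf -mconf   # in the editor, set execd_params to enable_windomacc=true'
    printf 'qconf -am %s\n' "$admin"
    printf '%s\n' '$SGE_ROOT/util/sgeCA/sge_ca -init -days 365'
    printf '$SGE_ROOT/util/sgeCA/sge_ca -user %s   # CSP installations only\n' "$admin"
    for host in "$@"; do
        printf 'qconf -ah %s\n' "$host"
    done
}

print_windows_host_plan Administrator winhost1 winhost2
```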
Other Installation Issues
Additional considerations for installing Grid Engine software are identified in this
section.
How to Verify and Install Linux Motif Libraries
On newer Linux systems, the libXm.so.2 Motif libraries are not always installed,
which results in the inability to run the precompiled Linux qmon binary.
To correct this problem, follow these steps:
1.
Check if the libraries are already present.
% ls -l /usr/X11R6/lib/libXm*
If the /usr/X11R6/lib/libXm.so.2 points to a libXm.so.2.x version, you are
done. Note that a symbolic link to /usr/X11R6/lib/libXm.so.3 does not work. If
the libraries are not present, then continue following these steps.
2.
Download the corresponding openmotif libraries from
http://www.ist.co.uk/DOWNLOADS/motif_download.html or from the SUSE 9.1
distribution (an additional rpm file called openmotif21-* is available).
3.
Install the missing libraries as root. For SUSE 9.1, you install the openmotif21-*
package like any other package. For packages downloaded from
http://www.ist.co.uk, install the libraries as shown in the following example.
# rpm -i --prefix /tmp/test --force \
openmotif-2.1.31-2_IST-JDS2003.i386.rpm
# cd /tmp/test/OpenMotif-2.1.31/lib
# cp libXm.so.2.1 /usr/X11R6/lib
# cd /usr/X11R6/lib
# ln -s libXm.so.2.1 libXm.so.2
4.
Test qmon.
% ldd `which qmon`
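The check in step 1 can be automated. The following is a minimal sketch that assumes the readlink utility is available, as it is on typical Linux systems. The function name motif_ok is hypothetical. It reports success only when libXm.so.2 exists and, if it is a symbolic link, points to a libXm.so.2.x file.

```shell
#!/bin/sh
# motif_ok [DIR]
# Hypothetical helper: succeed if DIR/libXm.so.2 exists and, when it is a
# symbolic link, points to a libXm.so.2.x file rather than to libXm.so.3.
# DIR defaults to /usr/X11R6/lib.
motif_ok() {
    lib="${1:-/usr/X11R6/lib}/libXm.so.2"
    [ -e "$lib" ] || return 1
    if [ -L "$lib" ]; then
        case "$(readlink "$lib")" in
            libXm.so.2.*|*/libXm.so.2.*) return 0 ;;
            *) return 1 ;;
        esac
    fi
    return 0
}

# Example (not run here):
#   motif_ok && echo "Motif 2.1 library found" || echo "install openmotif"
```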
How to Install the Software on a System with IPMP
This section describes how to install the Grid Engine software on hosts with the Solaris
Operating Environment IP Multipathing (IPMP) technology.
What Is IP Multipathing?
IP Multipathing is a technology that allows TCP/IP interfaces to be grouped for
failover and load balancing purposes. If an interface within an IP Multipathing group
fails, the interface is disabled and its IP address is relocated to another interface in the
group. Outbound IP traffic is distributed across the interfaces of a group. For further
details on IP Multipathing, refer to the Oracle Solaris OS documentation at:
http://www.oracle.com/technetwork/indexes/documentation/index.html
Issues Between IPMP and Grid Engine
When you start the Grid Engine daemons on a machine where the main interface is part
of an IPMP group, error messages appear. When the IPMP load balancing distributes
the connections across the interfaces in the group, the IP packets show up at the
receiving end as coming from a different host from the one associated with the main
interface. For example, consider a machine with three interfaces named qfe0, qfe1, and
qfe2, where the IP addresses for these interfaces are 10.1.1.1, 10.1.1.2, and
10.1.1.3 respectively. IPMP would also need an extra address for each interface for
testing. However, that requirement is ignored in this example. Each of these addresses
has a host name associated with it. The hosts table looks like the following example:
10.1.1.1 sge
10.1.1.2 sge-qfe1
10.1.1.3 sge-qfe2
The machine's host name is sge. When a connection is established from sge to another
machine, it might go through sge, sge-qfe1, or sge-qfe2. Upon installation, Grid
Engine will only recognize sge. When Grid Engine receives a connection request from
sge-qfe2, it closes the connection because the request is not from one of the
authorized (or known) nodes.
To solve this problem, use the host_aliases file to tell Grid Engine that sge, sge-qfe1,
and sge-qfe2 all refer to the same machine. The host_aliases file in this case would
look like this:
sge sge-qfe1 sge-qfe2
Note: If you make any changes to the $SGE_ROOT/$SGE_CELL/common/host_aliases
file, you must stop and restart all running
Grid Engine daemons (sge_qmaster and sge_execd). To do this, log in
as root to all your Grid Engine hosts and enter these commands:
/etc/init.d/sgemaster stop
/etc/init.d/sgeexecd stop
/etc/init.d/sgemaster start
/etc/init.d/sgeexecd start
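The effect of a host_aliases line can be pictured as a lookup from any name on the line to the first name on that line. The following is a minimal sketch of that idea, not a description of how Grid Engine resolves names internally. The function name primary_host is hypothetical.

```shell
#!/bin/sh
# primary_host NAME FILE
# Hypothetical helper: print the first field of the line in FILE on which
# NAME appears. Print NAME unchanged if it is not listed in FILE.
primary_host() {
    awk -v name="$1" '
        { for (i = 1; i <= NF; i++) if ($i == name) { print $1; found = 1; exit } }
        END { if (!found) print name }
    ' "$2"
}

# Example (not run here), using the host_aliases file from this section:
#   primary_host sge-qfe2 "$SGE_ROOT/$SGE_CELL/common/host_aliases"
# With the one-line file shown above, this prints the name sge.
```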
Installing the Grid Engine Master Node With IPMP
There are two ways that you can fix this problem:
■
Ignore the error messages during installation. This method is operating system
independent (except for MS Windows).
■
Temporarily disable IPMP on the interface associated with the machine's host
name. This method only works on systems running at least Version 8 of the Solaris
OS.
Ignoring the Error Messages
To ignore the error messages, follow these steps:
1.
Run the inst_sge -m command while ignoring the error messages during the start
up of the daemons.
2.
Shut down the daemons with the /etc/init.d/sgemaster stop
and /etc/init.d/sgeexecd stop commands. Due to the networking errors, some
daemons fail to shut down and must be killed with the kill -9 command. To see
which daemons failed to shut down, use this command: ps -e | grep sge_.
3.
Install the host_aliases file in the $SGE_ROOT/$SGE_CELL/common directory.
4.
Restart the daemons with the /etc/init.d/sgemaster start and
/etc/init.d/sgeexecd start commands.
Temporarily Disabling IPMP
To temporarily disable IPMP, follow these steps:
1.
Identify the interface associated with the machine's host name.
2.
Verify that the interface has IPMP enabled by using the ifconfig interface |
grep groupname command.
3.
Take note of the group name.
4.
Disable IPMP with this command: ifconfig interface group "".
5.
Install the Grid Engine master node.
6.
Install the host_aliases file in the $SGE_ROOT/$SGE_CELL/common directory.
7.
Restart the daemons with the /etc/init.d/sgemaster and /etc/init.d/sgeexecd
commands.
8.
Re-enable IPMP using the following command: ifconfig interface group groupname,
where groupname is the group name that you noted in Step 3.
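Steps 2 through 4 and step 8 can be tied together by capturing the group name in a variable. The following is a minimal sketch that assumes the ifconfig output contains the word groupname followed by the name of the group. The function name ipmp_group is hypothetical, and qfe0 is only an example interface name.

```shell
#!/bin/sh
# ipmp_group
# Hypothetical helper: read ifconfig output on standard input and print
# the word that follows "groupname". Print nothing if no group is found.
ipmp_group() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "groupname") { print $(i + 1); exit } }'
}

# Intended use on the host (not run here):
#   group=$(ifconfig qfe0 | ipmp_group)   # steps 2 and 3: note the group name
#   ifconfig qfe0 group ""                # step 4: disable IPMP
#   ... install the master node and the host_aliases file, restart daemons ...
#   ifconfig qfe0 group "$group"          # step 8: re-enable IPMP
```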
Installing a Grid Engine on an Execution Host With IPMP
Once the host_aliases file is installed and the Grid Engine daemons are restarted, you
can simply start the execution host installation without further problems.
Enabling Administrative and Submit Hosts With IPMP
You have two choices when enabling these hosts with IPMP:
■
Follow the same procedure used for the execution host (updating the
host_aliases file before installation).
■
Add all the host names associated with the administrative or submit host with one
of the following commands:
■
For the administrative host:
qconf -ah <hostname> <alias 1> <alias 2> ...
■
For the submit host:
qconf -as <hostname> <alias 1> <alias 2> ...
3 Upgrading Grid Engine
Note: The following instructions will work only on the Grid Engine
6.2 RR release.
The upgrade procedure is only able to upgrade your software from version 6.0 update
2 or higher. If you are running an older version of the Grid Engine software, such as
5.3 or 6.0, you must upgrade to version 6.0 update 2 or higher and then upgrade again
to version 6.2 as explained below. See How to Upgrade from 5.3 to 6.0.
About Upgrading the Software
Note:
■
The upgrade procedure is now partly destructive. See the
Constraints.
■
The LD_LIBRARY_PATH variable is not set in Grid Engine 6.2
software. Remove the existing LD_LIBRARY_PATH settings from 6.0
before you start a 6.2 installation.
■
Before you begin the upgrade process, make sure that you source
the existing $SGE_ROOT/$SGE_CELL/common/settings.sh or
$SGE_ROOT/$SGE_CELL/common/settings.csh file.
The upgrade procedure uses the cluster configuration information from the older
version of the software to install the Grid Engine 6.2 software on the master host.
Beginning with the Grid Engine 6.2 release, you can install 6.2 to a different $SGE_ROOT
or $SGE_CELL and transfer the old configuration to this cluster. This method is called
cloned cluster configuration. You might want to use this method to accomplish the
following:
■
To test the upgrade before making the real upgrade.
■
To keep the old cluster running.
Before You Upgrade
Choose one of the following methods to upgrade to 6.2:
■
New 6.2 installation (different $SGE_ROOT or $SGE_CELL) using the same
configuration as was used for the old cluster (cloned cluster configuration). If you
use the cloned cluster configuration, you do not have to stop or in any way affect
the original cluster. You simply install a new qmaster and transfer the
configuration from the old cluster to the new one. Then, you manually restart the
new execution daemons on all the original execution hosts. The disadvantage of
the cloned configuration method is that you have to install the new qmaster and
might lose some of the configuration information during the upgrade (see the
Constraints). Another disadvantage is that the original execution host will now
have twice as many slots - one set for the old cluster and one for the new one.
■
Real upgrade of the existing cluster (same $SGE_ROOT and $SGE_CELL).
Constraints
The following constraints apply to both upgrade methods:
■
Dynamic and static load values will be lost (only static values will be recreated).
■
The sharetree usage will be lost.
■
Neither jobs nor advanced reservations (ARs) will be replicated.
■
There might be running or pending jobs in the cluster when the configuration is
saved. If you decide to install the new Grid Engine version in the same $SGE_ROOT
and $SGE_CELL, then you must remove all jobs from the old cluster before the old
cluster is shut down and the new software is installed.
■
The previous state of a disabled queue will be lost if the queue configuration
attribute initial_state is set to default.
How to Back Up the Configuration of the Old Cluster
You can create this backup at any time before you start the upgrade procedure. The
backup is the same for both types of upgrade procedure. To create the backup, at
least the qmaster daemon must be running.
What the Backup Contains
The backup saves the following files:
■
arseqnum
■
jobseqnum
■
act_qmaster
■
bootstrap
■
cluster_name
■
host_aliases
■
qtask
■
sge_aliases
■
sge_ar_request
■
sge_request
■
sge_qstat
■
sge_qquota
■
shadow_masters
■
accounting
■
dbwriter.conf
■
jmx directory
Caution: During the upgrade procedure, you can select the next job
ID. Do not select a job ID that is less than the last job ID in the
accounting file in the backup. If you do, the accounting file will
contain some job IDs twice. This leads to unexpected behaviors. To
avoid the problem, accept the suggested default for the next job ID.
The upgrade procedure calculates a safe value for the default.
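The upgrade steps later in this chapter describe the suggested default as the old jobseqnum + 1000, rounded up to the nearest 1000. The following shell sketch shows that arithmetic. It is not the installer's own code: suggest_next_id is a hypothetical name, and the treatment of exact multiples of 1000 (they are left unchanged after 1000 is added) is an assumption.

```shell
# Hypothetical helper: add 1000 to the last ID found in the backup, then
# round the result up to the next multiple of 1000.
# Assumption: a sum that is already a multiple of 1000 is kept as is.
suggest_next_id() {
    echo $(( ($1 + 1000 + 999) / 1000 * 1000 ))
}
```

With a last job ID of 1, suggest_next_id 1 prints 2000, which matches the value offered in the example upgrade later in this chapter. Any value you type instead of the default should be at least as large as the last job ID in the accounting file.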
The backup process creates the following files:
■
sge_root - old $SGE_ROOT
■
sge_cell - old $SGE_CELL
■
ports - old $SGE_QMASTER_PORT and $SGE_EXECD_PORT
■
win_hosts - A list of registered Windows execution hosts at the time of the backup
The standard qconf client is used to save the complete cluster configuration.
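Before you rely on a backup, it can be useful to confirm that the four files created by the backup process are present. The following shell sketch checks only for their existence. missing_backup_metadata is a hypothetical helper, and the sketch makes no assumption about the contents or format of the files.

```shell
# Hypothetical helper: print the name of each file created by the backup
# process (sge_root, sge_cell, ports, win_hosts) that is absent from the
# given backup directory. Prints nothing when all four are present.
missing_backup_metadata() {
    backup_dir=$1
    for name in sge_root sge_cell ports win_hosts; do
        [ -e "$backup_dir/$name" ] || echo "$name"
    done
}
```

For example, missing_backup_metadata /backups/sge_6.1_June10_2008 prints one file name per line for anything that is missing.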
How to Back Up the Cluster
1.
Either download the backup script or get the backup script from the Grid Engine
6.2 common package (util/upgrade_modules/save_sge_config.sh).
2.
(Optional) Verify that the script is executable.
3.
Source the $SGE_ROOT/$SGE_CELL/common/settings.sh (or .csh) file of the
original cluster.
4.
Run the backup script. The backup script has one argument, which is the path to
the directory in which to store the backup. The directory must not already exist,
but the user must have permission to create it.
Note: You must run the backup script on an admin host (qconf -sh)
as a manager or operator user (typically sgeadmin).
# ./save_sge_config.sh /backups/sge_6.1_June10_2008
The backup process displays a message confirming that the backup succeeded.
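Step 4 requires that the backup directory does not exist yet and that you are allowed to create it. The following shell sketch checks both conditions before you run the script. backup_target_ok is a hypothetical helper, and treating "permission to create" as "the parent directory exists and is writable" is an assumption about how save_sge_config.sh creates the directory.

```shell
# Hypothetical pre-flight check for the directory passed to
# save_sge_config.sh: the target must not exist, and its parent directory
# must exist and be writable by the current user.
backup_target_ok() {
    target=${1%/}
    parent=$(dirname "$target")
    if [ -e "$target" ]; then
        echo "backup directory already exists: $target" >&2
        return 1
    fi
    if [ ! -d "$parent" ] || [ ! -w "$parent" ]; then
        echo "cannot create a directory in: $parent" >&2
        return 1
    fi
    return 0
}
```

A possible use is backup_target_ok /backups/sge_6.1_June10_2008 && ./save_sge_config.sh /backups/sge_6.1_June10_2008.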
How to Install the 6.2 Software Using the Cloned Configuration Method
Additional Constraints for the New 6.2 Installation with Cloned Configuration
For the cloned cluster configuration, you must also define several new variables and
directories that must be different from the original settings:
■
$SGE_ROOT
■
$SGE_CELL
■
$SGE_CLUSTER_NAME
■
$SGE_QMASTER_PORT
■
$SGE_EXECD_PORT
■
Master daemon spooling directory (qmaster_spool_dir)
■
Execution daemon spooling directory (execd_spool_dir)
■
Group ID range for the jobs (gid_range)
Caution: Only one SGE_Helper_Service.exe can run on an
execution host. You cannot use the same Windows execution host for
a 6.0 or 6.1 cluster and a 6.2 cluster.
Note: Because there have been significant changes in the Grid
Engine 6.2 software, loading the configuration adds and removes
some configuration attributes. Adding and removing configuration
attributes might affect the operation of the cluster.
To ensure stability, you should always follow this process:
1.
Upgrade to the new $SGE_ROOT or $SGE_CELL (cloned cluster configuration).
2.
Test that the original cluster configuration did not change and that the
functionality of the cluster remains intact.
3.
Perform the real upgrade of the original cluster, if desired.
Caution: Do not make both the new cluster and the old cluster
available to your users. If you do, execution hosts would offer the
original amount of slots for both clusters and might become
overloaded.
1.
Back up the original cluster settings as described in How to Back Up the Cluster.
2.
(Optional) ARCo upgrade prerequisites. If you use ARCo and you want to have
the data from the old and new cluster in the same ARCo database, do not
install the dbwriter on the new cluster with the old dbwriter's database
parameters until the dbwriter on the old cluster is stopped and all the data
from the old cluster has been inserted into the database. After you install the
dbwriter (with the same database parameters) on the new cluster, you must never
start the dbwriter on the old cluster again; otherwise your database will be
compromised.
1.
Wait to install ARCo on the new cluster until all the jobs are drained from the
old cluster, the cluster is stopped and the old reporting file is processed
completely. There should be no reporting or reporting.processing file in the
$SGE_ROOT/$SGE_CELL/common directory of the old cluster.
Note: Jobs can be submitted and the reporting file generated on the
new cluster, as long as there is no dbwriter installed on the new
cluster.
Caution: There cannot be more than one dbwriter process writing
into the same ARCo database and schema. If you create a new ARCo
database for the new cluster, you cannot later merge it with the old
ARCo database, due to the primary key constraints.
Once the reporting file on the old cluster is processed, on the dbwriter host:
2.
Source the cluster settings.sh (or .csh) file.
3.
Stop the dbwriter:
# $SGE_ROOT/$SGE_CELL/common/sgedbwriter stop
3.
Extract the new 6.2 binaries and common files to the new $SGE_ROOT directory.
Caution: Before starting the next step, ensure that your library path
does not contain a path to libraries from the previous version (for
example, echo $LD_LIBRARY_PATH). If an old library is found on the
library path, remove it or unset the variable completely (for example,
unset LD_LIBRARY_PATH). Use LIBPATH, DYLD_LIBRARY_PATH, or SHLIB_PATH
instead of LD_LIBRARY_PATH on AIX, Mac OS, or HP-UX platforms,
respectively.
4.
Start the new upgrade installation of the qmaster from the new $SGE_ROOT
directory.
# ./inst_sge -upd
This starts the upgrade procedure. See the Example Upgrade for Cloned Cluster
Configuration.
Tip: To enable or disable some additional features like JMX, CSP, or
use old IJS, you must provide additional flags to the upgrade script
the same way you would for qmaster installation. For example, to
upgrade a cluster and enable JMX thread in qmaster and CSP mode
run: ./inst_sge -upd -jmx -csp
5.
Accept the displayed license.
6.
Enter the complete path to the backup directory. For example,
/backups/sge_6.1_June10_2008. See Step 6 in the example.
7.
Enter the new $SGE_ROOT directory. The default is the current directory. For more
information, see $SGE_ROOT Directory. See Step 7 in the example.
8.
Select a new $SGE_CELL directory. The default is the $SGE_CELL directory from the
backup. For more information, see Cells. See Step 8 in the example.
9.
Select a new SGE_QMASTER_PORT number. The default is the $SGE_QMASTER_PORT
number from the backup + 10. See Step 9 in the example.
10. Select a new SGE_EXECD_PORT number. The default is the $SGE_EXECD_PORT number
from the backup + 10. See Step 10 in the example.
11. Select a new qmaster spooling directory. The default is
$SGE_ROOT/$SGE_CELL/spool/qmaster. See Step 11 in the example.
12. Select a new $SGE_CLUSTER_NAME. The default is p$SGE_QMASTER_PORT. For more
information, see Cluster Name. See Step 12 in the example.
13. (Optional) Choose the JMX configuration. For more information about JMX, see
Installing a JMX-Enabled System. If you started the upgrade using the -jmx
option, one of the following choices appears:
1.
Choose if you want to use JMX settings from the backup or use new settings.
This question appears when JMX exists in the backup.
2.
Choose a JMX port. This question appears when JMX does not exist in the
backup.
14. Select a spooling method. For more information on choosing a spooling
mechanism, see Choosing Between Classic Spooling and Database Spooling. See
Step 14 in the example.
15. Choose if you want to use interactive jobs support (IJS) settings from the backup
or use the new defaults for 6.2. In most cases, you should use the new defaults
which enable the new interactive jobs support. Step 15 in the example shows the
new defaults.
Caution: If you changed the QLOGIN_DAEMON, QLOGIN_COMMAND,
RLOGIN_DAEMON, RLOGIN_COMMAND, RSH_DAEMON, or RSH_COMMAND configuration
attributes, you should verify that the new IJS will not break your
site-specific settings.
16. Choose the group id range. The default is the last group id from the backup + 100
and same range. See Step 16 in the example.
17. Select the next job ID. The default is old jobseqnum + 1000, rounded up to the
nearest 1000. See Step 17 in the example.
18. (Optional) Select the next AR ID. This question appears only if arseqnum is in the
backup. The default is old arseqnum + 1000, rounded up to the nearest 1000. See
Step 18 in the example.
19. Select automatic startup options. See Step 19 in the example. One of the following
choices appears:
1.
Choose whether to run qmaster as an SMF service. This question appears only
on systems that run at least version 10 of the Solaris OS.
2.
Choose whether to use RC scripts for qmaster. This question appears on
platforms that are not running at least version 10 of the Solaris OS or if you
started the upgrade using the -nosmf option.
20. Load the old configuration. See Step 20 in the example. If this step fails with a
critical error:
1.
Check the log file /tmp/sge_backup_date.log.
2.
Try to reload the configuration through the
$SGE_ROOT/util/upgrade_modules/load_sge_config.sh script and the arguments
displayed in the previous step.
3.
If the preceding steps do not resolve the problem, stop the upgrade process.
21. (Optional) Upgrade ARCo. If you use ARCo, you need to upgrade it. If you want
to use the same ARCo database, copy the $SGE_ROOT/$SGE_CELL/common/dbwriter.conf
file from the old cluster into the same directory on the new cluster. The file is
sourced, and you are prompted only for any missing information during the
installation of dbwriter.
22. Run the post upgrade procedures.
Note: The post-upgrade procedures are easier when you have root
access to all machines through ssh or rsh without having to enter a
password. To use rsh instead of the default ssh, run the ./inst_sge
command with the -rsh argument. Example:
# ./inst_sge -upd-execd -rsh
1.
Initialize the local execd spool directories. This step creates the local execd
spool directories on the execd hosts with the correct permissions. Run the
following command as root from the master host in $SGE_ROOT directory:
# ./inst_sge -upd-execd
2.
(Optional) Create new RC scripts for the whole cluster.
Caution: This command removes old RC scripts. To keep the old RC
scripts, do not run this command.
To start the services automatically after a reboot, run the following command
as root from the master host in $SGE_ROOT directory:
# ./inst_sge -upd-rc
3.
(Optional) Install or update the Windows helper service. Perform this step to
use the Windows execution hosts with the 6.2 cluster. When connecting to
each Windows execution host, you are prompted for an administrator user to
connect to the Windows host. If all your Windows hosts share the same
administrative user, set the environment variable SGE_WIN_ADMIN to that user
to access all Windows hosts without additional user intervention. Example:
(sh, bash)# export SGE_WIN_ADMIN=Administrator
(csh,tcsh)# setenv SGE_WIN_ADMIN Administrator
To install or update the Windows helper service, run the following command
as root from the master host in $SGE_ROOT directory:
# ./inst_sge -upd-win
Caution: Only one SGE_Helper_Service.exe can run on an
execution host. You cannot use the same Windows execution host for
a 6.0 or 6.1 cluster and a 6.2 cluster.
23. Start the new execution daemons. Optionally, if you can log in without typing a
password, you can start the whole cluster as the root user from the $SGE_ROOT
directory with a single command:
# ./inst_sge -start-all
This command starts the master daemon, shadow daemons, and all execution
daemons.
Upgrade is complete.
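Steps 9, 10, 12, and 16 of the procedure above describe the defaults that the upgrade proposes for the cloned cluster: each port from the backup + 10, a cluster name of p followed by the qmaster port, and a group ID range that starts 100 above the last group ID in the backup. The following shell sketch reproduces that arithmetic so that you can predict the values before you start. The function names are hypothetical and are not part of inst_sge. Reading "same range" as "the same number of group IDs" is an assumption.

```shell
# Hypothetical helpers that mirror the defaults described in steps 9, 10,
# 12 and 16. They only do arithmetic and do not touch any cluster.

# Steps 9 and 10: port number from the backup + 10.
default_clone_port() {
    echo $(( $1 + 10 ))
}

# Step 12: "p" followed by the qmaster port of the new cluster.
default_cluster_name() {
    echo "p$1"
}

# Step 16: start 100 above the last group ID of the old range.
# Assumption: the new range spans as many IDs as the old one.
default_gid_range() {
    first=${1%-*}              # text before the dash
    last=${1#*-}               # text after the dash
    new_first=$(( last + 100 ))
    echo "${new_first}-$(( new_first + last - first ))"
}
```

For example, default_gid_range 20000-20100 prints 20200-20300. Whatever values you accept, they must differ from those of the original cluster, as listed under the additional constraints for the cloned configuration.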
Example Upgrade for Cloned Cluster Configuration
The following upgrade example uses a copy of the existing cluster configuration with
a different $SGE_CELL. This example does not use JMX and there are no Service Tags.
The steps in this example are referred to from the software upgrade description at
How to Install the 6.2 Software Using the Cloned Configuration Method.
Steps 4 and 5
# ./inst_sge -upd
Welcome to the Grid Engine Upgrade Procedure
-------------------------------------------
Before you continue with the upgrade, read these hints:
- Your terminal window should have a size of at least
80x24 characters
- At any time during the upgrade process, use your standard
interrupt key to abort the upgrade. Typically, the interrupt
key combination is Ctrl-C.
The upgrade procedure will take approximately 1-2 minutes.
Hit <RETURN> to continue >>
Step 6
Type the complete path to the Grid Engine configuration backup directory.
------------------------------------------------------------------------
Backup directory >> /tmp/bck
Found backup from Grid Engine 6.1u4 version created on 2008-06-10_10:56:29
Continue with this backup directory (y/n) [y] >>
Step 7
The Grid Engine root directory is:
$SGE_ROOT = /sge
If this directory is not correct (e.g. it may contain an automounter
prefix) enter the correct path to this directory or hit <RETURN>
to use default [/sge] >>
Your $SGE_ROOT directory: /sge
Hit <RETURN> to continue >>
Step 8
Grid Engine cells
----------------
Grid Engine supports multiple cells.
If you are not planning to run multiple Grid Engine clusters or if you don't
know yet what is a Grid Engine cell it is safe to keep the default cell name
default
If you want to install multiple cells you can enter a cell name now.
The environment variable
$SGE_CELL=<your_cell_name>
will be set for all further Grid Engine commands.
Enter cell name [default] >> new_cell
Using cell >new_cell<.
Hit <RETURN> to continue >>
Step 9
Grid Engine TCP/IP communication service
---------------------------------------
The port for sge_qmaster is currently set by the shell environment.
SGE_QMASTER_PORT = 21640
Now you have the possibility to set/change the communication ports by using the
>shell environment< or you may configure it via a network service, configured
in local >/etc/service<, >NIS< or >NIS+<, adding an entry in the form
sge_qmaster <port_number>/tcp
to your services database and make sure to use an unused port number.
How do you want to configure the Grid Engine communication ports?
Using the >shell environment<: [1]
Using a network service like >/etc/service<, >NIS/NIS+<: [2]
(default: 1) >>
Grid Engine TCP/IP communication service
---------------------------------------
Using the environment variable
$SGE_QMASTER_PORT=21640
as port for communication.
Do you want to change the port number? (y/n) [n] >>
Step 10
Grid Engine TCP/IP communication service
---------------------------------------
The port for sge_execd is currently set by the shell environment.
SGE_EXECD_PORT = 21641
Now you have the possibility to set/change the communication ports by using the
>shell environment< or you may configure it via a network service, configured
in local >/etc/service<, >NIS< or >NIS+<, adding an entry in the form
sge_execd <port_number>/tcp
to your services database and make sure to use an unused port number.
How do you want to configure the Grid Engine communication ports?
Using the >shell environment<: [1]
Using a network service like >/etc/service<, >NIS/NIS+<: [2]
(default: 1) >>
Grid Engine TCP/IP communication service
---------------------------------------
Using the environment variable
$SGE_EXECD_PORT=21641
as port for communication.
Do you want to change the port number? (y/n) [n] >>
Step 11
Grid Engine qmaster spool directory
----------------------------------
The qmaster spool directory is the place where the qmaster daemon stores
the configuration and the state of the queuing system.
The admin user >sgeadmin< must have read/write access
to the qmaster spool directory.
If you will install shadow master hosts or if you want to be able to start
the qmaster daemon on other hosts (see the corresponding section in the
Grid Engine Installation and Administration Manual for details) the account
on the shadow master hosts also needs read/write access to this directory.
The following directory
[/sge/new_cell/spool/qmaster]
will be used as qmaster spool directory by default!
Do you want to select another qmaster spool directory (y/n) [n] >>
Step 12
Unique cluster name
-------------------
The cluster name uniquely identifies a specific Grid Engine cluster.
The cluster name must be unique throughout your organization. The name
is not related to the Grid Engine cell.
The cluster name must start with a letter ([A-Za-z]), followed by letters,
digits ([0-9]), dashes (-) or underscores (_).
Enter new cluster name or hit <RETURN>
to use default [p21640] >>
Your $SGE_CLUSTER_NAME: p21640
Hit <RETURN> to continue >>
Step 14
creating directory: /sge/new_cell/spool/qmaster/job_scripts
Setup spooling
--------------
Your Grid Engine binaries are compiled to link the spooling libraries
during runtime (dynamically). So you can choose between Berkeley DB
spooling and Classic spooling method.
Please choose a spooling method (berkeleydb|classic) [berkeleydb] >> classic
Initializing spooling database
Hit <RETURN> to continue >>
Step 15
Interactive Job Support (IJS) Selection
---------------------------------------
The backup configuration includes information for running
interactive jobs. Do you want to use the IJS information from
the backup ('y') or use new default values ('n') (y/n) [y] >> n
Using new interactive job support default setting for a new installation.
Hit <RETURN> to continue >>
Creating >act_qmaster< file
Step 16
Grid Engine group id range
--------------------------
When jobs are started under the control of Grid Engine an additional
group id is set on platforms which do not support jobs. This is done
to provide maximum control for Grid Engine jobs.
This additional UNIX group id range must be unused group id's in your
system. Each job will be assigned a unique id during the time it is
running. Therefore you need to provide a range of id's which will
be assigned dynamically for jobs.
The range must be big enough to provide enough numbers for the
maximum number of Grid Engine jobs running at a single moment on
a single host. E.g. a range like >20000-20100< means, that Grid Engine
will use the group ids from 20000-20100 and provides a range for
100 Grid Engine jobs at the same time on a single host.
You can change at any time the group id range in your cluster configuration.
Please enter a range [34299-34498] >>
Using >34299-34498< as gid range. Hit <RETURN> to continue >>
Grid Engine cluster configuration
---------------------------------
Please give the basic configuration parameters of your Grid Engine
installation:
<execd_spool_dir>
The pathname of the spool directory of the execution hosts. User >sgeadmin<
must have the right to create this directory and to write into it.
Default: [/sge/new_cell/spool] >>
Grid Engine cluster configuration (continued)
---------------------------------------------
<administrator_mail>
The email address of the administrator to whom problem reports are sent.
It is recommended to configure this parameter. You may use >none<
if you do not wish to receive administrator mail.
Please enter an email address in the form >user@foo.com<.
Default: [sgeadmin@qmaster.com] >>
The following parameters for the cluster configuration were configured:
execd_spool_dir       /sge/new_cell/spool
administrator_mail    sgeadmin@qmaster.com
Do you want to change the configuration parameters (y/n) [n] >>
Step 17
Provide a value to use for the next job ID.
-------------------------------------------
Backup contains last job ID 1. As a suggested value, we added 1000
to that number and rounded it up to the nearest 1000.
Increase the value, if appropriate.
Choose the new next job ID [2000] >>
Hit <RETURN> to continue >>
Step 18
Provide a value to use for the next AR ID.
------------------------------------------
Backup contains last AR ID 1. As a suggested value, we added 1000
to that number and rounded it to the nearest 1000.
Increase the value, if appropriate.
Choose the new next AR ID [2000] >>
Hit <RETURN> to continue >>
Step 19
Creating >sgemaster< script
Creating >sgeexecd< script
Creating settings files for >.profile/.cshrc<
Hit <RETURN> to continue >>
qmaster startup script
----------------------
Do you want to start qmaster automatically at machine boot?
NOTE: If you select "n" SMF will be not used at all! (y/n) [y] >> n
Grid Engine qmaster startup
---------------------------
Starting qmaster daemon. Please wait ...
starting sge_qmaster
Hit <RETURN> to continue >>
Step 20
Last step - load configuration from the backup
----------------------------------------------
load command: /sge/util/upgrade_modules/load_sge_config.sh /tmp/bck -mode "copy"
-log C -newijs "false" -gid_range "34299-34498" -admin_mail "sgeadmin@qmaster.com"
-execd_spool_dir "/sge/new_cell/spool"
Hit <RETURN> to continue >>
Loading saved cluster configuration from /tmp/bck (log in
/tmp/sge_backup_load_2008-06-13_17:42:28.log)...
Loading saved cluster configuration from /tmp/bck (log in
/tmp/sge_backup_load_2008-06-13_17:42:28.log)...
Done
If loading the configuration succeeded run these additional commands:
REQUIRED:
inst_sge -upd-execd
This command initializes all execd spool directories.
inst_sge -upd-win
This command connects to all Windows execution hosts and installs
the new Windows helper service on each host.
WARNING: If a helper service from a previous release is running
on this host, the new helper service overwrites it. The
host will run only in a 6.2 cluster.
TIP: This action requires to enter a windows administrator user for each
host interactively. If all your systems share the same administrator you
can set the environment variable SGE_WIN_ADMIN to that user name.
E.g.: (sh, bash) export SGE_WIN_ADMIN=Administrator
(csh,tcsh) setenv SGE_WIN_ADMIN Administrator
OPTIONAL:
inst_sge -upd-rc
This command creates new autostart scripts for the new cluster
and removes any conflicting files.
TIP: To disable SMF on Solaris systems, use the command
inst_sge -upd-rc -nosmf
TIP: Use inst_sge -post-upd to do all above actions
How to Upgrade the Original Cluster to the 6.2 Software (Real Upgrade)
1.
(Optional) Test the cloned cluster, if you used the cloned cluster configuration
method to transfer the configuration to a new 6.2 cluster.
2.
Back up the original cluster settings as described in How to Back Up the Cluster.
3.
Stop the scheduler:
# qconf -ks
4.
Verify that no jobs are running on the cluster.
5.
Stop the old cluster:
# qconf -ke all
# $SGE_ROOT/$SGE_CELL/common/sgemaster stop
6.
(Optional) Stop the Berkeley DB server, if your cluster uses Berkeley DB server
spooling. On the BDB server host:
1.
Source the cluster settings.sh (or .csh) file.
2.
Type the following command:
# $SGE_ROOT/$SGE_CELL/common/sgebdb stop
7.
(Optional) If you use ARCo, ensure that the reporting file has been completely
processed by the dbwriter. There should be no reporting or reporting.processing
file in the $SGE_ROOT/$SGE_CELL/common directory. Once the reporting file is
processed, on the dbwriter host:
1.
Source the cluster settings.sh (or .csh) file.
2.
Stop the dbwriter:
# $SGE_ROOT/$SGE_CELL/common/sgedbwriter stop
WARNING: If you use ARCo, you must completely process the
reporting file and stop the dbwriter before you continue.
8.
Extract the new 6.2 binaries and common files to the $SGE_ROOT directory.
Caution: Do not remove any of the $SGE_ROOT directory contents,
except for the case where the new Sun Grid Engine 6.2 binaries differ
from the existing installation. For example, you might have used your
custom lx26-amd64 binaries, but Sun Grid Engine 6.2 uses
lx24-amd64 for 2.6 kernels. In that case you must remove the old
binaries manually. You must ensure that all binaries for the used
architectures were updated and no architecture with the old version
remains in the $SGE_ROOT directory.
9.
Start the new upgrade on the original qmaster host from the $SGE_ROOT directory.
# ./inst_sge -upd
Tip: To enable or disable some additional features like JMX, CSP, or
to use the old IJS, you must provide additional flags to the upgrade
script in the same way that you would for qmaster installation. For
example, to upgrade a cluster and enable the JMX thread in qmaster
and use CSP mode, run the following command:
./inst_sge -upd -jmx -csp
10. Accept the displayed license.
11. Enter the complete path to the backup directory. For example,
/backups/sge_6.1_June10_2008.
Caution: If you do not specify the original $SGE_ROOT and
$SGE_CELL in the next two steps, the upgrade attempted will not
be the real upgrade. Instead, the cloned cluster configuration method
will be used.
12. Enter the $SGE_ROOT directory. The default is the current directory. For more
information, see $SGE_ROOT Directory.
13. Enter the $SGE_CELL directory. The default is default. For more information, see
Cells.
14. Select a new $SGE_CLUSTER_NAME. The default value is one of the following,
depending on which is found first:
1.
The existing SGE_CLUSTER_NAME
($SGE_ROOT/$SGE_CELL/common/cluster_name)
2.
The SGE_CLUSTER_NAME from the backup
3.
p$SGE_QMASTER_PORT
For more information, see Cluster Name.
15. (Optional) Select the JMX configuration. For more information about JMX, see
Installing a JMX-Enabled System. If you started the upgrade using the -jmx
option, one of the following choices appears:
1.
Choose if you want to use JMX settings from the backup or use new settings.
This question appears when JMX exists in the backup.
2.
Choose a JMX port. This question appears when JMX does not exist in the
backup.
16. Choose if you want to keep the spooling method from the backup.
17. (Optional) Select a spooling method. This prompt appears if you chose not to keep
the spooling method from the backup in the previous step. See Example Master Host
Installation. For more
information on choosing a spooling mechanism, see Choosing Between Classic
Spooling and Database Spooling.
18. Choose if you want to use interactive jobs support (IJS) settings from the backup
or use the new defaults for 6.2. In most cases, you should use the new defaults
which enable the new interactive jobs support.
Caution: If you changed the QLOGIN_DAEMON, QLOGIN_COMMAND,
RLOGIN_DAEMON, RLOGIN_COMMAND, RSH_DAEMON, or RSH_COMMAND configuration
attributes, you should verify that the new IJS will not break your
site-specific settings.
19. Select the next job ID. The default is old jobseqnum + 1000, rounded up to the
nearest 1000.
20. (Optional) Select the next AR ID. This question appears only if arseqnum is in the
backup. The default is old arseqnum + 1000, rounded up to the nearest 1000.
21. Choose automatic startup options. One of the following choices appears:
1.
Choose whether to run qmaster as an SMF service. This question appears only
on systems that run at least version 10 of the Solaris OS.
2.
Choose whether to use RC scripts for qmaster. This question appears on
platforms that are not running at least version 10 of the Solaris OS or if you
started the upgrade using the -nosmf option.
22. Load the old configuration. If this step fails with a critical error:
1.
Check the log file /tmp/sge_backup_date.log.
2.
Try to reload the configuration through the
$SGE_ROOT/util/upgrade_modules/load_sge_config.sh script and the arguments
displayed in the previous step.
3.
If the preceding steps do not resolve the problem, stop the upgrade process.
23. (Optional) Copy the binaries and the common directory to all the hosts in the
cluster, if not on a shared file system. If you use local binaries or a local common
directory for each host, you must copy all the new binaries and the common
directory locally to each host. Ensure that all binaries are updated and no
architecture with the old version remains in the $SGE_ROOT directory.
Note: If you do not perform this operation, the qmaster host will
have Sun Grid Engine 6.2 binaries, while the rest of the cluster will
still have the old version and will not work as desired.
24. (Optional) Upgrade ARCo. If you use ARCo, you need to upgrade it. See the Oracle
Grid Engine User’s Guide for information about upgrading ARCo.
25. Run the post upgrade procedures.
Note: The post-upgrade procedures are easier when you have root
access to all machines through ssh or rsh without having to enter a
password. To use rsh instead of the default ssh, run the ./inst_sge
command with the -rsh argument. Example:
# ./inst_sge -upd-execd -rsh
1. Initialize the local execd spool directories. This step creates the local execd
spool directories on the execd hosts with the correct permissions. Run the
following command as root from the master host in the $SGE_ROOT directory:
# ./inst_sge -upd-execd
2. (Optional) Create new RC scripts for the whole cluster. To start the services
automatically after a reboot, run the following command as root from the master
host in the $SGE_ROOT directory:
# ./inst_sge -upd-rc
Caution: This command removes old RC scripts. To keep the old RC
scripts, do not run this command.
3. (Optional) Install or update the Windows helper service. Perform this step to
use the Windows execution hosts with the 6.2 cluster. When connecting to
each Windows execution host, you are prompted for an administrator user to
connect to the Windows host. If all your Windows hosts share the same
administrative user, set the environment variable SGE_WIN_ADMIN to that user
to access all Windows hosts without additional user intervention. Example:
(sh, bash)# export SGE_WIN_ADMIN=Administrator
(csh,tcsh)# setenv SGE_WIN_ADMIN Administrator
To install or update the Windows helper service, run the following command
as root from the master host in $SGE_ROOT directory:
# ./inst_sge -upd-win
Caution: Only one SGE_Helper_Service.exe can run on an
execution host. You cannot use the same Windows execution host for
a 6.0 or 6.1 cluster and a 6.2 cluster.
26. Start the new execution daemons. Optionally, if you can log in without typing a
password, you can start the whole cluster as the root user from the $SGE_ROOT
directory with a single command:
# ./inst_sge -start-all
This command starts the master daemon, shadow daemons, and all execution
daemons.
Upgrade is complete.
How to Upgrade from 5.3 to 6.0
Before You Begin
Be sure to review Planning the Installation for the information that you will need
during the upgrade process. If you have decided to use an administrative user, as
described in User Account Considerations, you should create that user now. This
procedure assumes that you have already extracted the Grid Engine software, as
described in Loading the Distribution Files on a Workstation.
Note: While you can run Grid Engine 6.0 software concurrently with
your older version of Grid Engine software, you should run the
upgrade procedure when there are no running jobs.
Steps
1. Log in to the master host as root.
2. Load the distribution files. For details, see Loading the Distribution Files on a
Workstation.
3. Ensure that you have set the $SGE_ROOT environment variable by typing:
# echo $SGE_ROOT
If the $SGE_ROOT environment variable is not set, set it now by typing:
# SGE_ROOT=sge-root; export SGE_ROOT
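The check in this step can be scripted. The following is a minimal sketch, assuming a Bourne-compatible shell. The function name check_sge_root is hypothetical and is not part of the Grid Engine distribution. It only confirms that the variable is set, names a directory, and that the directory contains the inst_sge script used later in this procedure.

```shell
# check_sge_root: hypothetical helper, not shipped with Grid Engine.
# Succeeds only if SGE_ROOT is set, is a directory, and contains inst_sge.
check_sge_root() {
    if [ -z "${SGE_ROOT:-}" ]; then
        echo "SGE_ROOT is not set" >&2
        return 1
    fi
    if [ ! -d "$SGE_ROOT" ]; then
        echo "SGE_ROOT=$SGE_ROOT is not a directory" >&2
        return 1
    fi
    if [ ! -f "$SGE_ROOT/inst_sge" ]; then
        echo "no inst_sge script found in $SGE_ROOT" >&2
        return 1
    fi
    echo "SGE_ROOT=$SGE_ROOT looks usable"
}
```

Calling check_sge_root before step 4 gives an early, explicit error instead of a failure inside the upgrade script.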
4. Change to the sge-root installation directory. Select one of the two following
options:
■ If the directory where the installation files reside is visible from the master
host, change directories (cd) to the installation directory sge-root, and then
proceed to How to Install the Master Host.
■ If the directory is not visible and cannot be made visible, do the following:
– Create a local installation directory, sge-root, on the master host.
– Copy the installation files to the local installation directory sge-root
across the network (for example, by using ftp or rcp).
– Change directories (cd) to the local sge-root directory.
5. Run the upgrade command on the master host, and respond to the prompts. This
command starts the master host installation procedure. You are asked several
questions, and you might be required to run some administrative actions. The
syntax of the upgrade command is:
inst_sge -upd 5.3-sge-root-directory 5.3-cell-name
In the following example, the 5.3 sge-root directory is /sge/gridware and the cell
name is default.
# ./inst_sge -upd /sge/gridware default
Welcome to the Grid Engine Upgrade
----------------------------------
Before you continue with the installation please read these hints:
- Your terminal window should have a size of at least
80x24 characters
- The INTR character is often bound to the key Ctrl-C.
The term >Ctrl-C< is used during the upgrade if you
have the possibility to abort the upgrade
The upgrade procedure will take approximately 5-10 minutes.
After this upgrade you will get a running qmaster and schedd with
the configuration of your old installation. If the upgrade was
successfully completed it is necessary to install your execution hosts
with the install_execd script.
Hit <RETURN> to continue >>
6. Choose an administrative account owner. In the following example, the value of
sge-root is /opt/n1ge6, and the administrative user is sgeadmin.
Grid Engine admin user account
------------------------------
The current directory
/opt/n1ge6
is owned by user
sgeadmin
If user >root< does not have write permissions in this directory on *all*
of the machines where Grid Engine will be installed (NFS partitions not
exported for user >root< with read/write permissions) it is recommended to
install Grid Engine that all spool files will be created under the user id
of user >sgeadmin<.
IMPORTANT NOTE: The daemons still have to be started by user >root<.
Do you want to install Grid Engine as admin user >sgeadmin< (y/n) [y] >>
7. Verify the $SGE_ROOT directory setting. In the following example, the value of
$SGE_ROOT is /opt/n1ge6.
Checking $SGE_ROOT directory
----------------------------
The Grid Engine root directory is:
$SGE_ROOT = /opt/n1ge6
If this directory is not correct (e.g. it may contain an automounter
prefix) enter the correct path to this directory or hit <RETURN>
to use default [/opt/n1ge6] >>
8. Set up the TCP/IP services for the Grid Engine software.
■ If the TCP/IP services have not been configured, respond to the installation
messages.
Grid Engine TCP/IP service >sge_qmaster<
----------------------------------------
There is no service >sge_qmaster< available in your >/etc/services< file
or in your NIS/NIS+ database.
You may add this service now to your services database or choose a port
number.
It is recommended to add the service now. If you are using NIS/NIS+ you
should add the service at your NIS/NIS+ server and not to the local
>/etc/services< file.
Please add an entry in the form
sge_qmaster <port_number>/tcp
to your services database and make sure to use an unused port number.
Please add the service now or press <RETURN> to go to entering a port
number >>
■ Start a new terminal session or window to add information to the
/etc/services file or your NIS maps.
■ Add the correct ports to the /etc/services file or your NIS services map, as
described in Network Services. The following example shows how you might
edit your /etc/services file.
...
sge_qmaster 536/tcp
sge_execd   537/tcp
Note: In this example, the entries for both sge_qmaster and sge_execd
are added to /etc/services. Subsequent steps in this example
assume that both entries have been made.
■ Save your changes and return to the window where the installation script is
running.
Please add the service now or press <RETURN> to go to entering a port
number >>
Press <RETURN>. The installation procedure displays the following output:
sge_qmaster 536
Service >sge_qmaster< is now available.
Hit <RETURN> to continue >>
Grid Engine TCP/IP service >sge_execd<
--------------------------------------
Using the service
sge_execd
for communication with Grid Engine.
Hit <RETURN> to continue >>
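Before returning to the installation window, you can confirm that the service entries are in place. The following is a minimal sketch, assuming a Bourne-compatible shell with awk. The function name has_service is hypothetical. It reads only a local services-format file, so it does not see entries that exist solely in NIS maps. On hosts that resolve services through NIS, a lookup with getent services, where that command is available, queries the configured sources instead.

```shell
# has_service NAME [FILE]: hypothetical helper, not shipped with Grid Engine.
# Prints the port registered for NAME/tcp in FILE (default /etc/services)
# and returns 0, or prints nothing and returns 1 if no such entry exists.
has_service() {
    name=$1
    file=${2:-/etc/services}
    awk -v n="$name" '
        # Match lines such as "sge_qmaster 536/tcp"; commented lines do not match.
        $1 == n && $2 ~ /\/tcp$/ {
            sub(/\/tcp$/, "", $2)
            print $2
            found = 1
            exit
        }
        END { exit !found }
    ' "$file"
}
```

For example, has_service sge_qmaster and has_service sge_execd should both succeed once the entries from this step have been saved.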
9. Enter the name of your cell or press Return to use the default. The use of Grid
Engine system cells is described in Cells.
Grid Engine cells
-----------------
Grid Engine supports multiple cells.
If you are not planning to run multiple Grid Engine clusters or if you don't
know yet what is a Grid Engine cell it is safe to keep the default cell name
default
If you want to install multiple cells you can enter a cell name now.
The environment variable
$SGE_CELL=<your_cell_name>
will be set for all further Grid Engine commands.
Enter cell name [default] >>
If you have decided not to use cells, the installation process displays the following
information:
Using cell >default<.
Hit <RETURN> to continue >>
10. Specify a spool directory. For guidelines on disk space requirements for the spool
directory, see Disk Space Requirements. For information on where the spool
directory is installed, see Spool Directories under the Root Directory.
Grid Engine qmaster spool directory
-----------------------------------
The qmaster spool directory is the place where the qmaster daemon stores
the configuration and the state of the queuing system.
The admin user >sgeadmin< must have read/write access
to the qmaster spool directory.
If you will install shadow master hosts or if you want to be able to start
the qmaster daemon on other hosts (see the corresponding section in the
Grid Engine Installation and Administration Manual for details) the account
on the shadow master hosts also needs read/write access to this directory.
The following directory
[/opt/n1ge6/default/spool/qmaster]
will be used as qmaster spool directory by default!
Do you want to select another qmaster spool directory (y/n) [n] >>
■ If you want to accept the default spool directory, press Return to continue.
■ If you do not want to accept the default spool directory, then answer y. In the
following example the /my/spool directory is specified as the master host
spool directory.
Do you want to select another qmaster spool directory (y/n) [n] >> y
Please enter a qmaster spool directory now! >>/my/spool
11. Set the correct file permissions.
Verifying and setting file permissions
--------------------------------------
Did you install this version with >pkgadd< or did you already
verify and set the file permissions of your distribution (y/n) [y] >> n
Verifying and setting file permissions
--------------------------------------
We may now verify and set the file permissions of your Grid Engine
distribution.
This may be useful since due to unpacking and copying of your distribution
your files may be unaccessible to other users.
We will set the permissions of directories and binaries to
755 - that means executable are accessible for the world
and for ordinary files to
644 - that means readable for the world
Do you want to verify and set your file permissions (y/n) [y] >> y
Verifying and setting file permissions and owner in >3rd_party<
Verifying and setting file permissions and owner in >bin<
Verifying and setting file permissions and owner in >ckpt<
Verifying and setting file permissions and owner in >examples<
Verifying and setting file permissions and owner in >install_execd<
Verifying and setting file permissions and owner in >install_qmaster<
Verifying and setting file permissions and owner in >mpi<
Verifying and setting file permissions and owner in >pvm<
Verifying and setting file permissions and owner in >qmon<
Verifying and setting file permissions and owner in >util<
Verifying and setting file permissions and owner in >utilbin<
Verifying and setting file permissions and owner in >catman<
Verifying and setting file permissions and owner in >doc<
Verifying and setting file permissions and owner in >man<
Verifying and setting file permissions and owner in >inst_sge<
Verifying and setting file permissions and owner in >bin<
Verifying and setting file permissions and owner in >lib<
Verifying and setting file permissions and owner in >utilbin<
Your file permissions were set
Hit <RETURN> to continue >>
12. Specify whether all of your Grid Engine system hosts are located in a single DNS
domain.
Select default Grid Engine hostname resolving method
----------------------------------------------------
Are all hosts of your cluster in one DNS domain? If this is
the case the hostnames
>hostA< and >hostA.foo.com<
would be treated as eqal, because the DNS domain name >foo.com<
is ignored when comparing hostnames.
Are all hosts of your cluster in a single DNS domain (y/n) [y] >>
■ If all of your Grid Engine system hosts are located in a single DNS domain,
then answer y.
Are all hosts of your cluster in a single DNS domain (y/n) [y] >> y
Ignoring domainname when comparing hostnames.
Hit <RETURN> to continue >>
■ If not all of your Grid Engine system hosts are located in a single DNS
domain, then answer n.
Are all hosts of your cluster in a single DNS domain (y/n) [y] >> n
The domainname is not ignored when comparing hostnames.
Hit <RETURN> to continue >>
Default domain for hostnames
----------------------------
Sometimes the primary hostname of machines returns the short hostname
without a domain suffix like >foo.com<.
This can cause problems with getting load values of your execution hosts.
If you are using DNS or you are using domains in your >/etc/hosts< file or
your NIS configuration it is usually safe to define a default domain
because it is only used if your execution hosts return the short hostname
as their primary name.
If your execution hosts reside in more than one domain, the default domain
parameter must be set on all execution hosts individually.
Do you want to configure a default domain (y/n) [y] >>
■ Press Return to continue.
– If you want to specify a default domain, then answer y. In the following
example, sun.com is specified as the default domain.
Do you want to configure a default domain (y/n) [y] >> y
Please enter your default domain >> sun.com
Using >sun.com< as default domain. Hit <RETURN> to continue >>
– If you do not want to specify a default domain, then answer n. In the
following example, no default domain is configured.
Do you want to configure a default domain (y/n) [y] >> n
13. Press Return to continue.
Making directories
------------------
creating directory: default/common
creating directory: /opt/n1ge6/default/spool/qmaster
creating directory: /opt/n1ge6/default/spool/qmaster/job_scripts
Hit <RETURN> to continue >>
14. Specify whether you want to use classic spooling or Berkeley DB. For more
information on choosing the spooling mechanism, see Database Server and
Spooling Host.
Setup spooling
--------------
Your SGE binaries are compiled to link the spooling libraries
during runtime (dynamically). So you can choose between Berkeley DB
spooling and Classic spooling method.
Please choose a spooling method (berkeleydb|classic) [berkeleydb] >>
■ If you want to specify Berkeley DB spooling, press Return to continue.
Please choose a spooling method (berkeleydb|classic) [berkeleydb] >>
The Berkeley DB spooling method provides two configurations!
1) Local spooling:
The Berkeley DB spools into a local directory on this host (qmaster host)
This setup is faster, but you can't setup a shadow master host
2) Berkeley DB Spooling Server:
If you want to setup a shadow master host, you need to use
Berkeley DB Spooling Server!
In this case you have to choose a host with a configured RPC service.
The qmaster host connects via RPC to the Berkeley DB. This setup is more
failsafe, but results in a clear potential security hole. RPC communication
(as used by Berkeley DB) can be easily compromised. Please only use this
alternative if your site is secure or if you are not concerned about
security. Check the installation guide for further advice on how to achieve
failsafety without compromising security.
Do you want to use a Berkeley DB Spooling Server? (y/n) [n] >>
– If you want to use a Berkeley DB spooling server, enter y.
Do you want to use a Berkeley DB Spooling Server? (y/n) [n] >> y
Berkeley DB Setup
-----------------
Please, log in to your Berkeley DB spooling host and execute "inst_sge -db"
Please do not continue, before the Berkeley DB installation with
"inst_sge -db" is completed, continue with <RETURN>
Note: Do not press Return until you have completed the Berkeley
DB installation on the spooling server.
Follow these steps to set up a Berkeley DB spooling server:
– Start a new terminal session or window.
– Log in to the spooling server.
– Install the software as described in How to Install the Berkeley DB
Spooling Server.
– After you have installed the software on the spooling server, return to the
master installation window, and press Return to continue.
– Enter the name of the spooling server. In the following example, vector is
the host name of the spooling server.
Berkeley Database spooling parameters
-------------------------------------
Please enter the name of your Berkeley DB Spooling Server! >> vector
– Enter the name of the spooling directory. In the following example,
/opt/n1ge6/default/spooldb is the spooling directory.
Please enter the Database Directory now!
Default: [/opt/n1ge6/default/spooldb] >>
Dumping bootstrapping information
Initializing spooling database
Hit <RETURN> to continue >>
– If you do not want to use a Berkeley DB spooling server, enter n.
Do you want to use a Berkeley DB Spooling Server? (y/n) [n] >> n
Hit <RETURN> to continue >>
Berkeley Database spooling parameters
-------------------------------------
Please enter the Database Directory now, even if you want to spool locally
it is necessary to enter this Database Directory.
Default: [/opt/n1ge6/default/spool/spooldb] >>
Then specify an alternate directory, or press Return to continue.
creating directory: /opt/n1ge6/default/spool/spooldb
Dumping bootstrapping information
Initializing spooling database
Hit <RETURN> to continue >>
■ If you want to specify classic spooling, then enter classic.
Please choose a spooling method (berkeleydb|classic) [berkeleydb] >> classic
Dumping bootstrapping information
Initializing spooling database
Hit <RETURN> to continue >>
15. Enter a group ID range. For more information, see Using the Custom Installation
Mode.
Grid Engine group id range
--------------------------
When jobs are started under the control of Grid Engine an additional group id
is set on platforms which do not support jobs. This is done to provide maximum
control for Grid Engine jobs.
This additional UNIX group id range must be unused group id's in your system.
Each job will be assigned a unique id during the time it is running.
Therefore you need to provide a range of id's which will be assigned
dynamically for jobs.
The range must be big enough to provide enough numbers for the maximum number
of Grid Engine jobs running at a single moment on a single host. E.g. a range
like >20000-20100< means, that Grid Engine will use the group ids from
20000-20100 and provides a range for 100 Grid Engine jobs at the same time
on a single host.
You can change at any time the group id range in your cluster configuration.
Please enter a range >> 20000-20100
Using >20000-20100< as gid range. Hit <RETURN> to continue >>
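Two quick checks can help when choosing the range: how many ids it provides, and whether any existing group already uses one of them. The following is a minimal sketch, assuming a Bourne-compatible shell with awk. Both function names are hypothetical. The second function inspects only a local group file, so groups defined solely in NIS or another directory service are not detected.

```shell
# gid_range_size RANGE: hypothetical helper, not shipped with Grid Engine.
# Prints how many group ids the inclusive range LOW-HIGH contains, or
# returns 1 if RANGE is malformed or HIGH is below LOW.
gid_range_size() {
    case $1 in
        "" | *[!0-9-]* | -* | *- | *-*-*) return 1 ;;
        *-*) ;;
        *) return 1 ;;
    esac
    low=${1%-*}
    high=${1#*-}
    [ "$high" -ge "$low" ] || return 1
    echo $((high - low + 1))
}

# gids_in_range RANGE [FILE]: hypothetical helper.
# Prints the names of groups in FILE (default /etc/group) whose numeric
# gid lies inside RANGE. No output means no conflict in that file.
gids_in_range() {
    low=${1%-*}
    high=${1#*-}
    awk -F: -v lo="$low" -v hi="$high" \
        '$3 + 0 >= lo + 0 && $3 + 0 <= hi + 0 { print $1 }' "${2:-/etc/group}"
}
```

Counted inclusively, the range 20000-20100 used in this example contains 101 ids, slightly more than the 100 concurrent jobs that the installer message mentions.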
16. Verify the spooling directory for the execution daemon. For information on
spooling, see Spool Directories under the Root Directory.
Grid Engine cluster configuration
---------------------------------
Please give the basic configuration parameters of your Grid Engine
installation:
<execd_spool_dir>
The pathname of the spool directory of the execution hosts. User >sgeadmin<
must have the right to create this directory and to write into it.
Default: [/opt/n1ge6/default/spool] >>
17. Enter the email address of the user who should receive problem reports. In this
example, the user who will receive problem reports is me@my.domain.
Grid Engine cluster configuration (continued)
---------------------------------------------
<administator_mail>
The email address of the administrator to whom problem reports are sent.
It's is recommended to configure this parameter. You may use >none<
if you do not wish to receive administrator mail.
Please enter an email address in the form >user@foo.com<.
Default: [none] >> me@my.domain
Once you answer this question, the installation process is complete. The system
displays several screens of information before the script exits. The upgrade
process uses your existing configuration to customize the installation. Output
similar to the following is displayed:
Creating >act_qmaster< file
Creating >sgemaster< script
Creating >sgeexecd< script
creating directory: /tmp/centry
Reading in complex attributes.
Reading in administrative hosts.
Reading in execution hosts.
Reading in submit hosts.
Reading in users:
User "as114086".
User "md121042".
Reading in usersets:
Userset "defaultdepartment".
Userset "deadlineusers".
Userset "admin".
Userset "bchem1".
Userset "bchem2".
Userset "bchem3".
Userset "bchem4".
Userset "damtp7".
Userset "damtp8".
Userset "damtp9".
Userset "econ1".
Userset "staff".
Reading in calendars:
Calendar "always_disabled".
Calendar "always_suspend".
Calendar "test".
Reading in projects:
Project "ap1".
Project "ap2".
Project "high".
Project "low".
Project "p1".
Project "p2".
Project "staff".
Reading in parallel environments:
PE "bench_tight".
PE "make".
Creating settings files for >.profile/.cshrc<
Caution: Do not rename any of the binaries of the distribution. If you
use any scripts or tools in your cluster that monitor the daemons,
make sure to check for the new names.
18. Create the environment variables for use with the Grid Engine software.
Note: If no cell name was specified during installation, the value of
$SGE_CELL is default.
■ If you are using a C shell, type the following command:
% source $SGE_ROOT/$SGE_CELL/common/settings.csh
■ If you are using a Bourne shell or Korn shell, type the following command:
$ . $SGE_ROOT/$SGE_CELL/common/settings.sh
19. Install or upgrade the execution hosts. There are two ways to put the
Sun Grid Engine software on your execution hosts: installation or upgrade. If you
install the execution hosts, the local spool directory configuration and some execd
parameters will be overwritten. If you upgrade the execution hosts, those settings
will remain untouched.
■ To upgrade the software on the execution host, you need to log into each
execution host and run the following command:
# $SGE_ROOT/inst_sge -x -upd
■ To install the software on the execution host:
– If you only have a few execution hosts, you can install them interactively.
You need to log into each execution host, and run the following command:
# $SGE_ROOT/inst_sge -x
Complete instructions for installing execution hosts interactively are in
How to Install Execution Hosts.
– If you have a large number of execution hosts, you should consider
installing them non-interactively. Instructions for installing execution
hosts in an automated way are in Using the inst_sge Utility and a
Configuration Template.
20. If you have configured load sensors on your execution hosts, you will need to
copy these load sensors to the new directory location.
21. Check your complexes. Both the structure of complexes and the rules for
configuring complexes have changed. You can use qconf -sc to list your
complexes. Review the log file that was generated during the master host upgrade,
update.pid. The update.pid file will be placed in the master host spool directory,
which is $SGE_ROOT/$SGE_CELL/spool/ by default. If necessary, you can use qconf
-mc to reconfigure your complexes. For details about configuring resource
attributes, see Oracle Grid Engine Administration Guide.
22. Reconfigure your queues. During the upgrade process, a single default cluster
queue is created. Within this queue you will find all of your installed execution
hosts. It is recommended that you reconfigure your queues. For details about
configuring queues, see Oracle Grid Engine Administration Guide.
A Configuration File Templates
Configuration file templates are located in the $SGE_ROOT/util/install_modules
directory.
A.1 Configuration File Template
#-------------------------------------------------
# SGE default configuration file
#-------------------------------------------------
# Use always fully qualified pathnames, please
# SGE_ROOT Path, this is basic information
#(mandatory for qmaster and execd installation)
SGE_ROOT="/opt/n1ge61"
# SGE_QMASTER_PORT is used by qmaster for communication
# Please enter the port in this way: 1300
# Please do not this: 1300/tcp
#(mandatory for qmaster installation)
SGE_QMASTER_PORT="6444"
# SGE_EXECD_PORT is used by execd for communication
# Please enter the port in this way: 1300
# Please do not this: 1300/tcp
#(mandatory for qmaster installation)
SGE_EXECD_PORT="6445"
# CELL_NAME, will be a dir in SGE_ROOT, contains the common dir
# Please enter only the name of the cell. No path, please
#(mandatory for qmaster and execd installation)
CELL_NAME="default"
# ADMIN_USER, if you want to use a different admin user than the owner,
# of SGE_ROOT, you have to enter the user name, here
# Leaving this blank, the owner of the SGE_ROOT dir will be used as
# admin user
ADMIN_USER=""
# The dir, where qmaster spools this parts, which are not spooled by DB
#(mandatory for qmaster installation)
QMASTER_SPOOL_DIR="/opt/n1ge61/default/spool/qmaster"
# The dir, where the execd spools (active jobs)
# This entry is needed, even if you are going to use
# berkeley db spooling. Only cluster configuration and jobs will
# be spooled in the database. The execution daemon still needs a spool
# directory
#(mandatory for qmaster installation)
EXECD_SPOOL_DIR="/opt/n1ge61/default/spool"
# For monitoring and accounting of jobs, every job will get
# unique GID. So you have to enter a free GID Range, which
# is assigned to each job running on a machine.
# If you want to run 100 Jobs at the same time on one host you
# have to enter a GID-Range like that: 16000-16100
#(mandatory for qmaster installation)
GID_RANGE="20000-20100"
# If SGE is compiled with -spool-dynamic, you have to enter here, which
# spooling method should be used. (classic or berkeleydb)
#(mandatory for qmaster installation)
SPOOLING_METHOD="berkeleydb"
# Name of the Server, where the Spooling DB is running on
# if spooling method is berkeleydb, it must be "none", when
# using no spooling server and it must contain the servername
# if a server should be used. In case of "classic" spooling,
# can be left out
DB_SPOOLING_SERVER="none"
# The dir, where the DB spools
# If berkeley db spooling is used, it must contain the path to
# the spooling db. Please enter the full path. (eg. /tmp/data/spooldb)
# Remember, this directory must be local on the qmaster host or on the
# Berkeley DB Server host. No NFS mount, please
DB_SPOOLING_DIR="/opt/n1ge61/default/spooldb"
# A List of Host which should become admin hosts
# If you do not enter any host here, you have to add all of your hosts
# by hand, after the installation. The autoinstallation works without
# any entry
ADMIN_HOST_LIST="host1"
# A List of Host which should become submit hosts
# If you do not enter any host here, you have to add all of your hosts
# by hand, after the installation. The autoinstallation works without
# any entry
SUBMIT_HOST_LIST="host1"
# A List of Host which should become exec hosts
# If you do not enter any host here, you have to add all of your hosts
# by hand, after the installation. The autoinstallation works without
# any entry
# (mandatory for execution host installation)
EXEC_HOST_LIST="host1"
# The dir, where the execd spools (local configuration)
# If you want configure your execution daemons to spool in
# a local directory, you have to enter this directory here.
# If you do not want to configure a local execution host spool directory
# please leave this empty
EXECD_SPOOL_DIR_LOCAL=""
# If true, the domainnames will be ignored, during the hostname resolving
# if false, the fully qualified domain name will be used for name resolving
HOSTNAME_RESOLVING="true"
# Shell, which should be used for remote installation (rsh/ssh)
# This is only supported, if your hosts and rshd/sshd are configured
# not to ask for a password or to prompt with any message.
SHELL_NAME="rsh"
# Enter your default domain, if you are using /etc/hosts or NIS configuration
DEFAULT_DOMAIN="none"
# If a job stops, fails, or finishes, you can send a mail to this address
ADMIN_MAIL="my.name@sun.com"
# If true, the rc scripts (sgemaster, sgeexecd, sgebdb) will be added,
# to start automatically during boottime
ADD_TO_RC="true"
#If this is "true" the file permissions of executables will be set to 755
#and of ordinary file to 644.
SET_FILE_PERMS="true"
# This option is not implemented, yet.
# When a exechost should be uninstalled, the running jobs will be rescheduled
RESCHEDULE_JOBS="wait"
# Enter a one of the three distributed scheduler tuning configuration sets
# (1=normal, 2=high, 3=max)
SCHEDD_CONF="1"
# The name of the shadow host. This host must have read/write permission
# to the qmaster spool directory
# If you want to setup a shadow host, you must enter the servername
# (mandatory for shadowhost installation)
SHADOW_HOST="hostname"
# Remove these execution hosts in automatic mode
# (mandatory for uninstallation of execution hosts)
EXEC_HOST_LIST_RM="host2 host3 host4"
# This is a Windows specific part of the auto installation template
# If you are going to install Windows execution hosts, you have to enable the
# windows support. To do this, please set the WINDOWS_SUPPORT variable
# to "true". ("false" is disabled)
# (mandatory for qmaster installation, by default WINDOWS_SUPPORT is
# disabled)
WINDOWS_SUPPORT="false"
# Enabling the WINDOWS_SUPPORT, recommends the following parameter.
# The WIN_ADMIN_NAME will be added to the list of SGE managers.
# Without adding the WIN_ADMIN_NAME the execution host installation
# won't install correctly.
# WIN_ADMIN_NAME is set to "Administrator" which is default on most
# Windows systems. In some cases the WIN_ADMIN_NAME can be prefixed with
# the windows domain name (eg. DOMAIN+Administrator)
# (mandatory for qmaster installation)
WIN_ADMIN_NAME="Administrator"
# This parameter sets the number of parallel installation processes.
# To prevent a system overload, or exceeding the number of open file
# descriptors, the user can limit the number of parallel install processes.
# eg. set PAR_EXECD_INST_COUNT="20", maximum 20 parallel execd are installed.
PAR_EXECD_INST_COUNT="20"
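Because the template is a list of shell variable assignments, a copy of it can be sanity-checked before it is used for an automated installation. The following is a minimal sketch, assuming a Bourne-compatible shell. The function name check_autoinst_conf is hypothetical and is not part of the distribution. It checks only the variables that the template comments mark as mandatory for a qmaster installation, plus the rule that ports are plain numbers. It sources the file, which runs any commands the file contains, so use it only on files you trust, and pass an absolute path.

```shell
# check_autoinst_conf FILE: hypothetical helper, not shipped with Grid Engine.
# Runs in a subshell so that sourcing FILE does not change the caller's
# environment. Exit status: 0 = all checks passed, 1 = problems reported,
# 2 = FILE could not be read.
check_autoinst_conf() (
    conf=$1
    vars="SGE_ROOT SGE_QMASTER_PORT SGE_EXECD_PORT CELL_NAME QMASTER_SPOOL_DIR
          EXECD_SPOOL_DIR GID_RANGE SPOOLING_METHOD WINDOWS_SUPPORT WIN_ADMIN_NAME"
    [ -r "$conf" ] || { echo "cannot read $conf" >&2; exit 2; }
    # Drop inherited values so that only assignments in FILE count.
    unset $vars
    . "$conf" || exit 2
    status=0
    for var in $vars; do
        eval "val=\${$var:-}"
        if [ -z "$val" ]; then
            echo "missing: $var"
            status=1
        fi
    done
    # The template asks for a bare port number such as 1300, not 1300/tcp.
    case ${SGE_QMASTER_PORT:-}${SGE_EXECD_PORT:-} in
        *[!0-9]*) echo "ports must be plain numbers, for example 1300"; status=1 ;;
    esac
    exit "$status"
)
```

A typical use is to run check_autoinst_conf on your edited copy of the template and correct every reported line before starting the installation.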