Standards and Policies for Linux

Linux Standards, Policies and Procedures
The Linux Group builds and manages Linux servers in a data center environment. The group
installs, configures, and maintains the hardware and the system software, and works with the
client to ensure the best possible environment for the client’s application.
Documentation for the following standards, policies and procedures can be found at
http://www.stanford.edu/dept/itss/projects/linux
Brief Description of Linux Server Policies and Standards:

The Linux Group is responsible for installing and maintaining the server hardware
and the system software. At present, the group supports Red Hat Enterprise
Linux 2.1 and 3.0.
o Note: access to build documentation is restricted. Please contact Barb
Sidor for access.
o Servers will be built using centralized build procedures. Modifications
and additions to the basic build will be scripted, and changes to a server
will be reflected in the scripts.
o System software includes the operating system, system tools (see list),
Apache web server, system monitoring agents, and security tools.
o The Linux group is not responsible for installing and maintaining
application software, which includes the client’s software, application
tools, and application monitoring agents.
o Upon request, the Linux Group will install the latest version of Tomcat,
but configuring and maintaining it will be the responsibility of the client.
o Upon request, the Linux Group will assist the client in installing an
application, especially when system modifications are needed.
o System applications maintained by the Linux Group will be installed in
/usr/local; application software installed by the Linux Group but
maintained by clients will be installed in /usr/local/apps. Applications
installed and maintained by the client will reside in /home. Modifications
to the default file system can be made if there are client requirements for a
special setup.
o AFS will be installed by default on every server. Whenever possible,
clients’ home directories will be their AFS home directories. On
production servers, AFS should not be used for day-to-day operations.
AFS can be used for home directories, software distribution, and copies of
critical files.
o The Linux Group will maintain system software logs, including Apache
logs. The client will maintain application logs. System logs will be rolled
over daily, and backed up by TSM. In addition, a copy of the logs may be
kept in AFS to enable quick access.
o Patch policy: Critical security patches will be applied immediately.
Routine or low-risk patches/fixes/upgrades will be applied during a
maintenance window. The client will be given sufficient time to test the
patch before it is deployed on production servers.
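The install locations listed above can be summarized as a small lookup. This is only an illustrative sketch: the `Owner` labels and the `install_root` helper are hypothetical names, not part of the Linux Group's build scripts, and clients with special requirements may be given a different layout.

```python
from enum import Enum


class Owner(Enum):
    """Who installs and maintains a piece of software (hypothetical labels)."""
    GROUP_MAINTAINED = "installed and maintained by the Linux Group"
    GROUP_INSTALLED_CLIENT_MAINTAINED = "installed by the Linux Group, maintained by the client"
    CLIENT_MAINTAINED = "installed and maintained by the client"


# Default install roots, taken from the policy text above.
INSTALL_ROOTS = {
    Owner.GROUP_MAINTAINED: "/usr/local",
    Owner.GROUP_INSTALLED_CLIENT_MAINTAINED: "/usr/local/apps",
    Owner.CLIENT_MAINTAINED: "/home",
}


def install_root(owner: Owner) -> str:
    """Return the default directory under which this kind of software is installed."""
    return INSTALL_ROOTS[owner]


if __name__ == "__main__":
    for owner in Owner:
        print(f"{owner.value}: {install_root(owner)}")
```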
Linux Server Security policy (per ITSS policies):
o No one other than system administrators will have root access on the production
Linux servers.
o Application owners who need to execute root commands will be given
access to those specific commands via sudo.
o Tripwire and swatch will be installed by default.
o Clients will access the system via kerberized login or SSH only.
o User accounts must be SUNetIDs.
o Critical security fixes/patches will be installed immediately.
Server Support:
o The Linux Group follows the ITSS standards on problem severity
definitions, escalation policies and response times.
o Each server supported by the Linux Group is assigned a primary and a
secondary support administrator.
o On-call duties for Linux servers are rotated among all the Linux system
administrators.
o Problems are reported through the HelpSU system.
o Escalation in the Linux Group will be sys admin → technical lead →
manager → director.
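The escalation order in the Linux Group can be written down as an ordered chain. The sketch below is illustrative only: `ESCALATION_CHAIN` and `next_level` are hypothetical names, and the response times that trigger an escalation are defined by the ITSS standards, not here.

```python
from typing import Optional

# Escalation order as stated in the policy, from first responder upward.
ESCALATION_CHAIN = ("sys admin", "technical lead", "manager", "director")


def next_level(current: str) -> Optional[str]:
    """Return the role a problem escalates to next, or None at the top of the chain."""
    position = ESCALATION_CHAIN.index(current)  # raises ValueError for an unknown role
    if position + 1 < len(ESCALATION_CHAIN):
        return ESCALATION_CHAIN[position + 1]
    return None


if __name__ == "__main__":
    print(next_level("sys admin"))
```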
Hardware Standards: The Linux Group currently supports Dell hardware.
Hardware Support:
o A pool of hardware spares will be maintained by the Linux Group.
o Load-balanced systems and single application servers (e.g., web servers)
with no client data: A broken server will be removed from the load-balanced
pool, and either repaired and returned, or a spare put in its place.
For single servers, the hardware will be replaced by a spare.
o Servers with critical client data: Vendor 24x7 hardware support. The
vendor will be called in to fix the hardware.
o Medium to large Linux servers: Vendor 24x7 hardware support, because
maintaining a pool of spares for these servers is too costly.
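The hardware support rules above amount to a simple decision. The following sketch is one way to express it; the `Server` fields and the `hardware_support` function are hypothetical, and the fallback for small servers holding non-critical client data is an assumption, since the policy does not cover that case.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """Minimal description of a server (hypothetical fields for illustration)."""
    name: str
    has_critical_client_data: bool = False
    is_medium_or_large: bool = False


def hardware_support(server: Server) -> str:
    """Return the hardware support method implied by the policy."""
    # Critical client data, or hardware too costly to keep spares for:
    # the vendor is called in under the 24x7 support contract.
    if server.has_critical_client_data or server.is_medium_or_large:
        return "vendor 24x7"
    # Load-balanced systems and single application servers with no client data
    # are repaired or replaced from the spares pool.
    # ASSUMPTION: every other small server is treated the same way.
    return "spare pool"


if __name__ == "__main__":
    print(hardware_support(Server("web1")))
    print(hardware_support(Server("db1", has_critical_client_data=True)))
```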
Disaster Recovery: The purpose of disaster recovery is to minimize the impact of
downtime to the client’s business by returning the server to full functionality
quickly, with as little loss of data as possible. The recovery process may vary
between servers in order to minimize downtime. In some cases, it will be quicker
to use TSM for the entire recovery process; in other cases, it will be quicker to
rebuild the server from build scripts, and then restore the client data from TSM.
General standards:
o TSM will be used for daily incremental backups on every server that
contains client data.
▪ Servers that do not contain client data and can be rebuilt from the
central build scripts do not have to be backed up with TSM.
▪ Servers that are part of a load-balanced pool and are identically
configured do not have to be backed up with TSM.
▪ In the case of a server that does not have to be backed up via TSM,
logs and other critical configuration files can be stored in AFS.
o Certain server directories, such as the AFS cache and /tmp, will not be
backed up, in order to save TSM bandwidth.
o Certain critical files or logs may be copied into AFS as a supplemental
backup method in order to facilitate or speed up disaster recovery, but
AFS will not be used to back up client data.
o Important client information: TSM backups are ‘snapshots’ of the server
at one point in the day. In the event of a disaster, the server will be
restored to that point. It may be impossible to restore the server to its
state at the time of the crash. We strongly urge the client to develop a
business continuity plan that documents how to test the application in
these situations and how to restore it, if necessary, to full
functionality. We recommend that the plan cover how to re-install the
application, which critical files are needed for configuration, which
files change often and are necessary to the ‘state’ of the application,
and similar details.
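The rules for deciding which servers receive daily TSM backups can be sketched as a short function. The `Host` fields and `needs_tsm_backup` are hypothetical names. Two points are assumptions rather than stated policy: client data takes priority over the exemptions, and a server that matches neither exemption is backed up.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Backup-relevant facts about a server (hypothetical fields for illustration)."""
    has_client_data: bool
    rebuildable_from_build_scripts: bool = False
    in_identical_load_balanced_pool: bool = False


def needs_tsm_backup(host: Host) -> bool:
    """Return True if the host should receive daily incremental TSM backups."""
    # Every server that contains client data is backed up.
    # ASSUMPTION: this rule wins over the exemptions below.
    if host.has_client_data:
        return True
    # Exempt: no client data and rebuildable from the central build scripts,
    # or an identically configured member of a load-balanced pool.
    if host.rebuildable_from_build_scripts or host.in_identical_load_balanced_pool:
        return False
    # ASSUMPTION: when no exemption applies, back the server up to be safe.
    return True


if __name__ == "__main__":
    print(needs_tsm_backup(Host(has_client_data=True)))
    print(needs_tsm_backup(Host(has_client_data=False, rebuildable_from_build_scripts=True)))
```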
Monitoring:
o The Linux Group will monitor production and development Linux hosts
using the same monitoring method as other Unix servers. For now, this is
mon. The default monitoring will be ping and kftgtd, with the addition of
http if the server runs Apache. Alerts are sent to the on-call admin. In
addition to the remote monitoring, the server will be monitored locally
using Tripwire and swatch.
o Application monitoring will be the responsibility of the client.
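The default remote monitoring set described above can be expressed as a small helper. This is a sketch only: `default_monitors` is a hypothetical function and does not reflect the actual mon configuration syntax or the Linux Group's setup.

```python
from typing import List


def default_monitors(runs_apache: bool) -> List[str]:
    """Return the default remote checks for a host, per the monitoring policy."""
    monitors = ["ping", "kftgtd"]  # applied to every production and development host
    if runs_apache:
        monitors.append("http")  # added only when the server runs Apache
    return monitors


if __name__ == "__main__":
    print(default_monitors(runs_apache=True))
```

Application-level checks are deliberately absent, since application monitoring is the client's responsibility.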