Linux Standards, Policies and Procedures

The Linux Group builds and manages Linux servers in a data center environment. The group installs, configures and maintains the hardware and the system software, and works with the client to ensure the optimum environment for the client’s application.

Documentation for the following standards, policies and procedures can be found at http://www.stanford.edu/dept/itss/projects/linux

Brief Description of Linux Server Policies and Standards:

The Linux Group is responsible for installing and maintaining the server hardware and the system software. At present, the group supports Red Hat Enterprise Linux 2.1 and 3.0.

o Note: access to build documentation is restricted. Please contact Barb Sidor for access.
o Servers will be built using centralized build procedures. Modifications and additions to the basic build will be scripted, and changes to a server will be reflected in the scripts.
o System software includes the operating system, system tools (see list), Apache web server, system monitoring agents, and security tools.
o The Linux Group is not responsible for installing and maintaining application software, which includes the client’s software, application tools, and application monitoring agents.
o Upon request, the Linux Group will install the latest version of Tomcat, but configuring and maintaining it will be the responsibility of the client.
o Upon request, the Linux Group will assist the client in installing an application, especially when system modifications are needed.
o System applications maintained by the Linux Group will be installed in /usr/local; application software installed by the Linux Group but maintained by clients will be installed in /usr/local/apps. Applications installed and maintained by the client will reside in /home. Modifications to the default file system can be made if the client requires a special setup.
o AFS will be installed by default on every server.
  Whenever possible, clients’ home directories will be their AFS home directories. On production servers, AFS should not be used for day-to-day operations. AFS can be used for home directories, software distribution, and copies of critical files.
o The Linux Group will maintain system software logs, including Apache logs. The client will maintain application logs. System logs will be rolled over daily and backed up by TSM. In addition, a copy of the logs may be kept in AFS to enable quick access.
o Patch policy: Critical security patches will be applied immediately. Routine or low-risk patches, fixes and upgrades will be applied during a maintenance window. The client will be given sufficient time to test the patch before it is deployed on production servers.

Linux Server Security policy (per ITSS policies):

o No one other than system administrators will have root access on the production Linux servers.
o Application owners who need to execute root commands will be given access to those specific commands via sudo.
o Tripwire and swatch will be installed by default.
o Clients will access the system via kerberized login or SSH only.
o User accounts must be SUNetIDs.
o Critical security fixes/patches will be installed immediately.

Server Support:

o The Linux Group follows the ITSS standards on problem severity definitions, escalation policies and response times.
o Each server supported by the Linux Group is assigned a primary and a secondary support administrator.
o On-call duties for Linux servers are rotated among all the Linux system administrators.
o Problems are reported through the HelpSU system.
o Escalation in the Linux Group will be: sys admin -> technical lead -> manager -> director.

Hardware Standards:

The Linux Group currently supports Dell hardware.

Hardware Support:

o A pool of hardware spares will be maintained by the Linux Group.
o Load-balanced systems and single application servers (e.g., web servers) with no client data: A broken server will be removed from the load-balanced pool, and either repaired and returned, or a spare put in its place. For single servers, the hardware will be replaced by a spare.
o Servers with critical client data: Vendor 24x7 hardware support. The vendor will be called in to fix the hardware.
o Medium to Large Linux Servers: Vendor 24x7 hardware support. Maintaining a pool of spares for these servers is too costly.

Disaster Recovery:

The purpose of disaster recovery is to minimize the impact of downtime on the client’s business by returning the server to full functionality quickly, with as little loss of data as possible. The recovery process may vary between servers in order to minimize downtime. In some cases, it will be quicker to use TSM for the entire recovery process; in other cases, it will be quicker to rebuild the server from build scripts, and then restore the client data from TSM.

General standards:

o TSM will be used for daily incremental backups on every server that contains client data. Servers that do not contain client data and can be rebuilt from the central build scripts do not have to be backed up with TSM. Servers that are part of a load-balanced pool and are identically configured do not have to be backed up with TSM. For a server that does not have to be backed up via TSM, logs and other critical configuration files can be stored in AFS.
o To save TSM bandwidth, certain server directories, such as the AFS cache and /tmp, will not be backed up.
o Certain critical files or logs may be copied into AFS as a supplemental backup method in order to facilitate or speed up disaster recovery, but AFS will not be used to back up client data.
o Important client information: TSM backups are ‘snapshots’ of the server at one point in the day. In the event of a disaster, the server will be restored to that point.
  It may be impossible to restore the server to the state it was in at the time of the crash. We strongly urge the client to develop a business continuity plan that documents how to test the application after such a restore and how to return the application, if necessary, to full functionality. We recommend that the plan describe how to re-install the application, list the critical files needed for configuration, and list any files that change often and are necessary to the ‘state’ of the application.

Monitoring:

o The Linux Group will monitor production and development Linux hosts using the same monitoring method as other Unix servers. For now, this is mon. The default monitoring will be ping and kftgtd, with the addition of http if the server runs Apache. Alerts are sent to the on-call admin. In addition to the remote monitoring, the server will be monitored locally using Tripwire and swatch.
o Application monitoring will be the responsibility of the client.
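
Example (illustrative only):

The TSM backup rules under Disaster Recovery and the default checks under Monitoring can be read as simple per-server decisions. The Python sketch below is one possible reading, not the Linux Group's actual tooling. The ServerProfile class, its field names, and both function names are hypothetical. It also assumes that a server holding client data is always backed up with TSM, even if it belongs to a load-balanced pool, and that any server not covered by an exemption is backed up; the policy text does not state either point explicitly.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ServerProfile:
    """Hypothetical description of a server; field names are illustrative."""
    has_client_data: bool
    rebuildable_from_build_scripts: bool
    identical_member_of_lb_pool: bool
    runs_apache: bool


def needs_tsm_backup(server: ServerProfile) -> bool:
    """Return True if the server should get daily incremental TSM backups."""
    # Assumption: client data always requires a TSM backup.
    if server.has_client_data:
        return True
    # No client data: exempt if it can be rebuilt from the central build
    # scripts or is an identically configured load-balanced pool member.
    if server.rebuildable_from_build_scripts or server.identical_member_of_lb_pool:
        return False
    # Assumption: anything not covered by an exemption is backed up.
    return True


def default_monitors(server: ServerProfile) -> List[str]:
    """Return the default remote checks: ping and kftgtd, plus http for Apache."""
    monitors = ["ping", "kftgtd"]
    if server.runs_apache:
        monitors.append("http")
    return monitors


if __name__ == "__main__":
    # A pooled web server with no client data.
    web = ServerProfile(
        has_client_data=False,
        rebuildable_from_build_scripts=True,
        identical_member_of_lb_pool=True,
        runs_apache=True,
    )
    print(needs_tsm_backup(web), default_monitors(web))
```

Under this reading, a pooled web server with no client data is exempt from TSM and gets the http check in addition to ping and kftgtd, while any server that holds client data is always backed up.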