RAC on Linux Best Practices

(Configuring Linux for RAC)
This paper will help you to configure Linux for a RAC environment and will cover the
OS enhancements.
RAC on Linux Overview
Oracle provides, with its Unbreakable campaign for Linux (Red Hat AS and United
Linux), a solid solution for the Linux platform. To maximize performance and
availability in a Linux environment, some steps are required within Oracle and at the
operating system level. This paper describes those steps.
· Using Asynchronous I/O
Asynchronous I/O can be used with raw devices (RAWIO) and with the EXT2, EXT3, NFS, and
REISERFS filesystems. At present, asynchronous I/O is not available for the Oracle Clustered File
System (OCFS), because the Linux kernel does not expose the API needed. We hope the API will be
exposed in the Red Hat 3.0 release, so that OCFS can support asynchronous I/O.
To enable asynchronous I/O for Oracle9iR2 on Red Hat Linux Advanced Server 2.1 and
United Linux 1.0 you must re-link Oracle using the Async I/O library, libaio, as follows:
cd $ORACLE_HOME/rdbms/lib
make -f ins_rdbms.mk async_on
make -f ins_rdbms.mk ioracle
Two parameters in init.ora need to be set. Add the following lines to the appropriate
init.ora file:
Parameter settings in init.ora for raw devices:
disk_asynch_io=true (default value is true)
Parameter settings in init.ora file or spfile.ora for filesystem files:
Make sure that all Oracle datafiles reside on filesystems that support
asynchronous I/O. (For example, ext2, ext3)
disk_asynch_io=true (default value is true)
filesystemio_options=asynch
To get better I/O throughput:
For DSS workloads, /proc/sys/fs/aio-max-size must be increased from its default of 131072 bytes to
1 MB (1048576 bytes) or more.
For OLTP workloads, the default of 131072 bytes suffices, because these workloads perform very small
writes.
You can set this value by executing the following command as the root user:
echo 1048576 > /proc/sys/fs/aio-max-size
On United Linux, aio-max-size cannot be larger than 524288 because of an OS limit; if this limit is
exceeded, the LGWR process (the Oracle redo log writer) will crash.
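The sizing guidance above can be summarised in a few lines of Python. This is only an illustrative sketch: the function name and its arguments are hypothetical, and the three constants are the values quoted in this section.

```python
# Values quoted in the text above (bytes).
DEFAULT_AIO_MAX_SIZE = 131072   # kernel default, adequate for OLTP
DSS_AIO_MAX_SIZE = 1048576      # 1 MB, the minimum suggested for DSS
UNITED_LINUX_LIMIT = 524288     # OS limit on United Linux


def suggested_aio_max_size(workload, united_linux=False):
    """Return an aio-max-size value (bytes) that follows the guidance above.

    `workload` is "DSS" or "OLTP" (hypothetical labels for this sketch).
    """
    workload = workload.upper()
    if workload == "DSS":
        size = DSS_AIO_MAX_SIZE
    elif workload == "OLTP":
        size = DEFAULT_AIO_MAX_SIZE
    else:
        raise ValueError("workload must be 'DSS' or 'OLTP'")
    if united_linux:
        # Never exceed the United Linux limit, otherwise LGWR may crash.
        size = min(size, UNITED_LINUX_LIMIT)
    return size
```

The returned number is the value that would be written to /proc/sys/fs/aio-max-size; the sketch does not write to the file itself.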
· Configuring Linux for large Buffer Cache
Oracle9i can allocate and use more than 4 GB of memory for the database buffer cache.
In order to use the extended buffer cache support on Linux, create an in-memory file system on the
/dev/shm mount point equal in size or larger than the amount of memory that you intend to use for the
database buffer cache.
mount -t shm shmfs -o size=8g /dev/shm
To enable the extended buffer cache feature, set the following parameter in init.ora:
USE_INDIRECT_DATA_BUFFERS = true
Dynamic Cache Parameters
The following dynamic cache parameters cannot be used while the extended cache feature is enabled:
DB_CACHE_SIZE
DB_2K_CACHE_SIZE
DB_4K_CACHE_SIZE
DB_8K_CACHE_SIZE
DB_16K_CACHE_SIZE
DB_32K_CACHE_SIZE
If the extended cache feature is enabled, you have to use the DB_BLOCK_BUFFERS parameter to specify
the database cache size.
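Because DB_BLOCK_BUFFERS is a count of buffers rather than a size in bytes, the intended cache size has to be divided by the database block size. The sketch below shows that arithmetic under stated assumptions: the function names are hypothetical, the block size is whatever the database was created with, and per-buffer overhead is ignored.

```python
def db_block_buffers(cache_bytes, db_block_size):
    """Number of buffers needed to cover cache_bytes at the given block size."""
    if db_block_size <= 0:
        raise ValueError("db_block_size must be positive")
    if cache_bytes < 0:
        raise ValueError("cache_bytes must not be negative")
    return cache_bytes // db_block_size


def shm_is_large_enough(cache_bytes, shm_bytes):
    """True if the /dev/shm file system is at least as large as the cache."""
    return shm_bytes >= cache_bytes


GIB = 1024 ** 3
# Example: an 8 GB cache with 8 KB blocks, on an 8 GB /dev/shm.
buffers = db_block_buffers(8 * GIB, 8192)
fits = shm_is_large_enough(8 * GIB, 8 * GIB)
```

With these inputs the buffer count is 1048576, which is the value that would go into DB_BLOCK_BUFFERS for this example.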
Limitations
The following limitations apply to the extended buffer cache feature on Linux:
· You cannot change the size of the buffer cache while the instance is running.
· You cannot create or use tablespaces with non-standard block sizes.
· Increasing Address Space
The currently shipped version of Oracle can use about 1.7 GB of address space for its SGA. To increase this
size, Oracle needs to be relinked with a lower SGA base, and Linux needs to have the mapped base
lowered for processes running Oracle.
This solution is available on RH 2.1 and United Linux 1.0.
First, the SGA base address must be lowered by relinking Oracle as follows:
Shut down all instances of Oracle
cd $ORACLE_HOME/lib
cp -a libserver9.a libserver9.a.org (to make a backup copy)
cd $ORACLE_HOME/rdbms/lib
genksms -s 0x15000000 >ksms.s (lower SGA base to 0x15000000)
make -f ins_rdbms.mk ksms.o (compile in new SGA base address)
make -f ins_rdbms.mk ioracle (relink)
After relinking, the Oracle executable can run on Linux only when mapped_base is lowered for the
processes running Oracle.
The shmmax kernel parameter must be set to 3 GB:
sysctl -w kernel.shmmax=3000000000
(See Metalink Note: 200266.1 for details and a sample program.)
· Using UDP
Efficient interprocess communication is important for RAC because Cache Fusion uses it to
transfer messages and data buffers among the instances.
In Oracle version 9.2.0.1 and onwards, UDP is the default protocol on Linux, and
hence we recommend tuning the UDP send and receive buffer space.
We strongly recommend adjusting the send and receive buffer sizes to 256K.
Tune the default and maximum window sizes with sysctl:
sysctl -w net.core.rmem_max=262144
sysctl -w net.core.wmem_max=262144
sysctl -w net.core.rmem_default=262144
sysctl -w net.core.wmem_default=262144
To check the values, read the following files:
/proc/sys/net/core/rmem_default - default receive window
/proc/sys/net/core/rmem_max - maximum receive window
/proc/sys/net/core/wmem_default - default send window
/proc/sys/net/core/wmem_max - maximum send window
Appendix A contains a small C program which can be used to determine the current send
and receive buffer size.
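As a quick cross-check, the buffer sizes that a newly created UDP socket receives can also be read with the standard socket options SO_SNDBUF and SO_RCVBUF. The Python sketch below is not the C program the paper refers to; it is a minimal illustration, and the function name is hypothetical.

```python
import socket


def udp_buffer_sizes():
    """Return (send_bytes, receive_bytes) for a freshly created UDP socket."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        send_bytes = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
        receive_bytes = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    finally:
        sock.close()
    return send_bytes, receive_bytes


if __name__ == "__main__":
    snd, rcv = udp_buffer_sizes()
    print("send buffer:", snd, "bytes")
    print("receive buffer:", rcv, "bytes")
```

Running it before and after the sysctl changes gives a simple way to see whether new sockets pick up the new defaults.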
· Logical Volume Manager (LVM)
SuSE's United Linux 1.0 offers an LVM that can be used to format and configure the disk layout for use
with RAW or OCFS.
For complete documentation on how the LVM works and how to use it on SuSE, please see the SuSE
web page.
ftp://ftp.suse.com/pub/suse/i386/supplementary/commercial/Oracle/docs/lvm_whitepaper.pdf
One problem remains: the LVM is not cluster aware, so every volume creation and volume
change must be performed on all nodes. LVM is not available on RH 3.0.
· Hangcheck-timer and Oracle Cluster Manager
Detaching watchdogd from the Cluster Manager (Bug 2495915)
The "watchdogd" daemon impacted system availability because it initiated system reboots
under heavy workloads. For this reason, the watchdogd implementation has been removed from
Oracle 9.2.0.2 and above.
In place of watchdogd, versions 9.2.0.2 and above of the oracm for Linux now
include the use of a Linux kernel module called hangcheck-timer. This module is not
required for oracm operation, but its use is highly recommended. This module monitors
the Linux kernel for long operating system hangs that could affect the reliability of a
RAC node and cause corruption of a RAC database. When such a hang occurs, this
module reboots the node. This approach offers the following three advantages over the
watchdogd approach:
- node resets are triggered from within the Linux kernel, making them much less
affected by system load,
- oracm on a RAC node can easily be stopped and reconfigured because its
operation is completely independent of the kernel module,
- The features provided by the hangcheck-timer module closely resemble features
available in the implementation of the Cluster Manager for RAC on the Windows
platform, on which the Cluster Manager on Linux was based.
Configuration Parameter Changes for the oracm on Linux
The deprecation and removal of watchdogd in version 9.2.0.2 (and above) of the
Oracle Cluster Manager for Linux, and the addition of the hangcheck-timer kernel module,
require several parameter changes in the configuration file. The file
"$ORACLE_HOME/oracm/admin/cmcfg.ora" is used to configure the Oracle Cluster
Manager.
1. The removal of the watchdogd means that the following parameters
included in the cmcfg.ora file used by the Oracle Cluster Manager
are no longer valid:
WatchdogTimerMargin
WatchdogSafetyMargin
These parameters should be removed from cmcfg.ora on all nodes in
the cluster.
2. Hangcheck timer introduces the following configuration parameter used in
cmcfg.ora to allow the oracm to know the name of the hangcheck-timer
kernel module so it can determine if it is correctly loaded:
KernelModuleName (see Appendix B)
If the module named in KernelModuleName is correctly specified but
not loaded, or is incorrectly specified, the oracm will
produce a series of error messages in the syslog system log
(/var/log/messages). However, it will not prevent the oracm process
from running. The module must be loaded prior to oracm startup.
It is strongly recommended that this parameter be set correctly on
all nodes in the cluster.
3. Finally, it changes the use of the following configuration
parameter from optional to mandatory:
CMDiskFile (see Appendix B)
This is done to ensure that a CM quorum partition is used and
allows the oracm to more reliably handle certain kinds of hardware
and software errors that affect cluster participation.
4. The inclusion of the hangcheck-timer kernel module also introduces
two new configuration parameters to be used when the module is
loaded:
hangcheck_tick - the hangcheck_tick is an interval indicating how
often the hangcheck-timer checks on the health of
the system.
hangcheck_margin - certain kernel activities may randomly introduce
delays in the operation of the hangcheck-timer.
hangcheck_margin provides a margin of error to
prevent unnecessary system resets due to these
delays.
Taken together, these two parameters indicate how long a RAC node
must hang before the hangcheck-timer module will reset the system. A
node reset will occur when the following is true:
(system hang time) > (hangcheck_tick + hangcheck_margin)
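The reset condition is simple arithmetic, shown here as a short Python sketch. The function names are hypothetical; the inputs are the hangcheck_tick and hangcheck_margin values, in seconds, passed to the module when it is loaded.

```python
def reset_threshold(hangcheck_tick, hangcheck_margin):
    """Seconds a node must hang before hangcheck-timer resets it."""
    return hangcheck_tick + hangcheck_margin


def node_would_reset(hang_seconds, hangcheck_tick, hangcheck_margin):
    """Apply the condition: hang time > hangcheck_tick + hangcheck_margin."""
    return hang_seconds > reset_threshold(hangcheck_tick, hangcheck_margin)


# With hangcheck_tick=30 and hangcheck_margin=180 the threshold is 210 seconds.
threshold = reset_threshold(30, 180)
```

Note that the comparison is strict: a hang of exactly 210 seconds does not satisfy the condition with these values, while a longer one does.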
Example of loading the hangcheck-timer module. Put this into the rc.local script
on RH or the /etc/init.d/oracle script on UL:
# load hangcheck-timer module for ORACM 9.2.0.2 (or higher)
/sbin/insmod /lib/modules/2.4.19-4GB/kernel/drivers/char/hangcheck-timer.o hangcheck_tick=30 hangcheck_margin=180
Recommended Configuration Defaults
Oracle recommends that the hangcheck-timer module be loaded and the
oracm be started with the following parameter values (in addition to
recommendations made elsewhere in the Oracle RAC documentation):
Parameter          Service           Value
hangcheck_tick     hangcheck-timer   30 seconds
hangcheck_margin   hangcheck-timer   180 seconds
KernelModuleName   oracm             hangcheck-timer
MissCount          oracm             > hangcheck_tick + hangcheck_margin (> 210 seconds)
· Linux Monitoring and Configuration Tools
Overall tools                 sar, vmstat
CPU                           /proc/cpuinfo, mpstat, top
Memory                        /proc/meminfo, /proc/slabinfo, free
Disk I/O                      iostat
Network                       /proc/net/dev, netstat, mii-tool
Kernel Version and Release    cat /proc/version
Types of I/O Cards            lspci -vv
Kernel Modules Loaded         lsmod, cat /proc/modules
List all PCI devices (HW)     lspci -v
Startup changes               /etc/sysctl.conf, /etc/rc.local
Kernel messages               /var/log/messages, /var/log/dmesg
OS error codes                /usr/src/linux/include/asm/errno.h
OS calls                      /usr/sbin/strace -p
Appendix A:
=========
cmcfg.ora
=========
HeartBeat=15000
ClusterName=Oracle Cluster Manager, version 9i
KernelModuleName=hangcheck-timer
PollInterval=1000
MissCount=250
PrivateNodeNames=mars-int venus-int
PublicNodeNames=mars venus
ServicePort=9998
CmDiskFile=/dev/quorum
HostName=venus-int
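The parameter rules described earlier (no Watchdog parameters, KernelModuleName and CMDiskFile present, MissCount greater than hangcheck_tick + hangcheck_margin) can be checked mechanically against a file like the one above. The following Python sketch is one possible way to do that; it is not an Oracle tool. The function names are hypothetical, and the sketch assumes simple name=value lines, case-insensitive parameter names, and "#" for comments.

```python
DEPRECATED = {"watchdogtimermargin", "watchdogsafetymargin"}
REQUIRED = {"kernelmodulename", "cmdiskfile"}


def parse_cmcfg(text):
    """Parse name=value lines into a dict keyed by lower-cased name."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        params[name.strip().lower()] = value.strip()
    return params


def check_cmcfg(text, hangcheck_tick=30, hangcheck_margin=180):
    """Return a list of problems found; an empty list means none were found."""
    params = parse_cmcfg(text)
    present = set(params)
    problems = []
    for name in sorted(DEPRECATED & present):
        problems.append("remove deprecated parameter: " + name)
    for name in sorted(REQUIRED - present):
        problems.append("missing parameter: " + name)
    limit = hangcheck_tick + hangcheck_margin
    miss_count = params.get("misscount", "")
    if not miss_count.isdigit() or int(miss_count) <= limit:
        problems.append("MissCount must be greater than %d" % limit)
    return problems


SAMPLE = """\
HeartBeat=15000
KernelModuleName=hangcheck-timer
MissCount=250
CmDiskFile=/dev/quorum
"""
sample_problems = check_cmcfg(SAMPLE)
```

For the abbreviated sample shown in the sketch, which uses values from the file above, no problems are reported, since MissCount=250 exceeds 30 + 180 = 210.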
ocmargs.ora
==========
# Sample configuration file $ORACLE_HOME/oracm/admin/ocmargs.ora
oracm
norestart