ACFS Under Scrutiny Luca Canali, CERN Dawid Wojcik, CERN UKOUG Conference, Birmingham, Nov 2010 Outline ACFS – cluster file system for 11gR2 ASM Use cases Architecture Installation and setup Some investigations of the internals ACFS at CERN Use cases Deployment Performance tests Conclusions ACFS under scrutiny – Luca Canali, Dawid Wojcik 2 ACFS Use Cases Use cases for ACFS with Oracle Automatic Diagnostic Repository (ADR) for 11g RDBMS • unified logging structure for RAC RDBMS related usage: BFILES, datapump dump files, ETL files, miscellaneous log files (RMAN), etc Can also be used for Oracle RDBMS home binaries • shared or non-shared • Allows to take snapshots before applying a patch or a patchset Use cases of ACFS as generic file system Can be deployed for custom applications and application servers No need to have RDBMS installation, only clusterware 11.2 Performance, maintenance and high availability DBAs will find it easy to use ACFS under scrutiny – Luca Canali, Dawid Wojcik 3 ASM and ACFS – Architecture Oracle RAC or Single Instance DBs ASM Cluster File System (ACFS) Third Party File System (optional) ASM dynamic Volume Manager (ADVM) ASM Clusterware 11.2 Oracle Database Files Applications Grid Infrastructure ACFS under scrutiny – Luca Canali, Dawid Wojcik 4 ASM and ACFS ASM: volume manager and cluster file system for Oracle DB files ACFS: POSIX compliant cluster file system Build on top of ASM disk groups For ‘all other files’ (DB not supported on ACFS yet) ACFS leverages ASM and CRS Performance Manageability High Availability Ref: ACFS Technical Overview and Deployment Guide [ID 948187.1] ACFS under scrutiny – Luca Canali, Dawid Wojcik 5 Automatic Storage Management ASM (Automatic Storage Management) Oracle’s cluster file system and volume manager for Oracle databases HA: fault tolerant, online storage reorganization/addition Performance: stripe and mirroring everything Commodity HW: Physics databases at CERN use ASM normal redundancy (similar to RAID 1+0 across multiple disks and storage arrays) DATA_DiskGroup RECOVERY_DiskGroup Failgroup1 Failgroup2 Failgroup3 ACFS under scrutiny – Luca Canali, Dawid Wojcik Failgroup4 6 ASM Dynamic Volume Manager ASM Dynamic Volume Manager (ADVM) New feature in Oracle Clusterware 11.2 Volumes are implemented as ASM files exposed to OS as block devices: /dev/asm/volume_name-x Configurable redundancy, stripe width and stripe columns A dirty region logging file is created if redundancy is mirror or high On top of ADVM volume one can create any file system (ext3, ACFS, ...) Volumes can be resized online File system must also support online resize (ACFS, grow: ext2, ext3, ext4) Further investigations on internals: v$asm_volume, v$asm_file, x$kffxp ACFS under scrutiny – Luca Canali, Dawid Wojcik 7 ASM Cluster File System What is ACFS ASM-based Cluster File System – new in Oracle 11.2 • Built on top of ADVM volumes • Can be used cluster-wide or single-node only Multi platform support (11.2.0.2) Can be shared using NFS, CIFS, … Online file system expansion / shrink Mirror protection when using NORMAL or HIGH redundancy diskgroups/volumes Read-only snapshots built-in Replication, security realms, encryption and tagging introduced in 11.2.0.2 ACFS under scrutiny – Luca Canali, Dawid Wojcik 8 ACFS Integration with the Oracle Software Stack ASM, ADVM and ACFS are integrated with Oracle 11gR2 clusterware ASM and clusterware are tightly integrated in 11gR2 A single ‘GRID HOME’ is used Notable: administration is simplified by storing OCR and voting disk(s) in ASM ADVM and ACFS resources are managed by clusterware Ease maintenance and learning curve for the DBA ACFS under scrutiny – Luca Canali, Dawid Wojcik 9 ACFS Crash Recovery In case of a node crash or force dismounting of ACFS – recovery is needed (three levels) ASM in RAC will use surviving nodes to recover • ASM uses ‘internal files’ such as ACD (Active Change Directory) and COD (Continuing Operation Directory) for this purpose ADVM volumes with normal or high redundancy have associated dirty region logging file (high redundancy by default) – recovery run by ASM processes ACFS utilizes Metadata Transaction Log ACFS under scrutiny – Luca Canali, Dawid Wojcik 10 ACFS setup Setting up ACFS for the first time The quick way: asmca The alternative CLI setup Create and enable volume (enabled on all nodes by default) • Asmcmd: create volume -G {diskroup_name} -s {size} • /dev/asm/{vol_name}-x device is created (Linux) {vol_name} Create ACFS file system • mkfs -t acfs /dev/asm/{vol_name} Register acfs general purpose filesystem with CRS (Allows to mount filesystem automatically with CRS) • acfsutil registry -a -f {vol_path} {acfs_mount_point} • If ACFS will be used for Oracle home use this instead: – srvctl add filesystem -d {vol_path} -v {volume_name} -g {disgroup_name} – Allows to maintain ACFS, ASM and DB dependencies ACFS under scrutiny – Luca Canali, Dawid Wojcik 11 ASM and ACFS internals New ASM background processes in 11gR2 Used to manage interaction ADVM and ASM, IO fencing and clusterware membership • One can see them with ps -elf | grep ASM Volume Driver Background (VDBG) Volume Background (VBG#) processes Volume Membership Background (VMB) ACFS background process More details on metalink note [ID 883028.1] ACFS under scrutiny – Luca Canali, Dawid Wojcik 12 ACFS, ADVM and Linux Kernel modules needed for ADVM and ACFS oracleacfs, oracleadvm, oracleoks • Can been seen on OS level with lsmod | grep oracle Binaries in $GRID_HOME/install/usm/ • One can check location with acfsroot version_check How to remove acfs-related kernel modules Modules are proprietary (non-GPL) and trigger message on kernel tainting in /var/log/messages If don’t want to use acfs or are afraid of kernel tainting acfsroot uninstall See also note [ID 1082851.1] ACFS under scrutiny – Luca Canali, Dawid Wojcik 13 ACFS Command Line Tools Main command line interface – acfsutil Display filesystem information, resize filesystem, register mountpoints, create snapshots, … Can be used to configure new 11.2.0.2 features of security, realms, encryption, replication and tagging Most operations can also be done via GUI tool asmca Other utilities Typical Linux/UNIX: fsck, mkfs, mount, umount afcsdbg – debug tool advmutil – display ADVM information, tune ADVM ACFS under scrutiny – Luca Canali, Dawid Wojcik 14 ACFS Snapshots ACFS snapshots provide point-in-time images Can be used for consistent backups Performed online Copy on first write mechanism (before-images shared between snapshots) Snapshots within the same file system Snapshots visible in CLI or in V$ASM_ACFSSNAPSHOTS You can read snapshots in /mount_point/.ACFS/snaps/ Limited to 63 snapshots per ACFS file system ACFS under scrutiny – Luca Canali, Dawid Wojcik 15 ACFS Replication File system level replication from one primary site to another Can replicate whole ACFS filesystem or only tagged fragments Changes on the primary system captured to replication logs Replication logs send by background processes to destination cluster and replayed there Logs deleted from both system after applying Replication logs stored in the same filesystem – check for free space! Replication can be set-up with acfsutil Possible use case: Replicate ACFS file system data in Data Guard setup ACFS under scrutiny – Luca Canali, Dawid Wojcik 16 CERN experience with ACFS Physics DB HW, a typical setup Dual-CPU quad-core blade servers, 24GB memory, Intel Nehalem low power; 2.26GHz clock Redundant power, mirrored local disks, 4 NIC (2 private/ 2 public), dual HBAs, “RAID 1+0 like” with ASM ACFS under scrutiny – Luca Canali, Dawid Wojcik 18 ACFS and ASM on low-cost storage Advantages High performance Reliability Low cost • Normal redundancy ASM disk groups instead of complicated RAIDs • Cheap SATA disks rather than more expensive enterprise solutions • Can provide mirroring across storage arrays Online operations (grow/shrink, add/remove disks) Disadvantages Can only be used on nodes with clusterware installed • Unless exported via NFS Some operations cause cluster-wide sync – performance impact Async IO not supported – cannot put DB data files on ACFS ACFS under scrutiny – Luca Canali, Dawid Wojcik 19 ACFS use cases at CERN ACFS is used in production at CERN General purpose cluster file system for backup & monitoring cluster – fast and reliable Repository of oracle binaries Temporary storage for large exports/imports Other usages predicted after moving to Oracle 11.2 Automatic Diagnostic Repository (ADR) Export/import directory for each cluster DB Local Oracle homes (?) – snapshots can be used before patching ACFS under scrutiny – Luca Canali, Dawid Wojcik 20 ACFS test setup 64 SATA II, 7k2 RPM, 400GB lower end disks JBOD configuration – visible to ASM as 64 LUNs 45% of each disk’s capacity used for DATA diskgroup For improved IOPS and throughput (OS level partitioning) ASM normal redundancy used – 10.5TB diskgroup Two 800GB and 80GB ADVM normal redundancy volumes created for tests ACFS, ext2 and ext3 file systems created on ADVM volumes No difference in speed between small and large file systems in any of the tests (80GB vs 800GB) ACFS under scrutiny – Luca Canali, Dawid Wojcik 21 Tests conducted Tests conducted using dd tool – different operations and different operation block sizes (all file sizes of 70GB) bonnie++ – generic file system tests (ver. 1.96) Tests presented Comparing ACFS, ext2 ext3 and encrypted ACFS (AES 192-bit) ADVM used in all tests dd command sequential write (synchronous and asynchronous) dd command sequential read (synchronous) bonnie++ - file system block write, rewrite and read; file creation and deletion speed Multithread tests – workload run from 1 node and 2 nodes • Running same workload (2 threads) • Running one workload listing directories on the second node (10x/s) • Streaming tests – one thread writing, second thread reading ACFS under scrutiny – Luca Canali, Dawid Wojcik 22 Write test results results in our enviroment Asynchronous sequential write [MB/s] 240 230 MB/s 220 210 200 218 210 190 232 229 226 217 215 215 231 205 ACFS async [MB/s] 229 217 214 205 203 EXT3 async [MB/s] 180 256 1024 8 32 128 EXT2 async [MB/s] operation block size [kB] MB/s Sychronous sequential write [MB/s] 160 140 120 100 80 60 40 20 0 ACFS sync [MB/s] 148 92 90 120 109 109 102 113 115 EXT2 sync [MB/s] 56 1024 256 128 63 62 23 32 25 24 EXT3 sync [MB/s] 8 operation block sie [kB] ACFS under scrutiny – Luca Canali, Dawid Wojcik 23 Read and write results in our enviroment Sychronous sequential read [MB/s] 300 250 MB/s 200 150 100 ACFS sync [MB/s] 276 278 207 219 218 208 208 209 215 208 208 EXT2 sync [MB/s] 116 116 50 54 EXT3 sync [MB/s] 54 0 8 32 128 256 1024 operation block size [kB] Sequential read and write - encrypted ACFS (AES 192-bit) 50 MB/s 40 30 20 ACFS async write [MB/s] 39 39 39 38 38 ACFS sync write [MB/s] 10 5.5 7.4 5.4 7.5 5.4 7.4 5.2 7.9 256 128 32 4.5 7.7 ACFS sync read [MB/s] 0 1024 8 operation block size [kB] ACFS under scrutiny – Luca Canali, Dawid Wojcik 24 bonnie++ test results results in our enviroment bonnie++ throughput tests 250 210 201 200 155 152 MB/s 207 201 191 150 207 ACFS 138 Encr. ACFS 100 EXT2 50 25 EXT3 7.5 6.7 0 Block write Block rewrite Block read bonnie++ random file creation test 30000 24000 25000 Files/s 20000 ACFS 15000 10000 5000 7800 7500 2800 Encr. ACFS 6200 4800 EXT2 0 Files created/s Files deted/s ACFS under scrutiny – Luca Canali, Dawid Wojcik 25 Multithread test results results in our enviroment bonnie++ throughput tests - multithread comparison average MB/s/thread 250 201 201 152 144 150 98 100 198 183 200 165 126 Write 82 Rewrite 54 50 26 Read 0 1 thread 2 threads (same node) 2 threads (different nodes) 1 thread (ls command 10x/s on second node) bonnie++ throughput tests - write and reader threads average MB/s/thread 250 200 212 194 161 154 150 126 120 100 72 88 Write thread Read thread 50 0 Threads on the same node (1MB block) Threads on the 2 nodes (1MB Threads on the same node (8kB block) block) Threads on the 2 nodes (8kB block) ACFS under scrutiny – Luca Canali, Dawid Wojcik 26 Conclusions ADVM and ACFS Cluster file system integrated in 11.2 Leverages ASM features for non-RDBMS files ACFS usage at CERN Positive experience • Currently used to provide cluster filesystem for our custom DB monitoring Being considered for the 11g RDBMS deployments • To support log files on ADR, … ACFS performance tests • Positive results, more tests in progress ACFS under scrutiny – Luca Canali, Dawid Wojcik 27 Acknowledgments CERN-IT DB group and in particular: Jacek Wojcieszuk More info: http://cern.ch/it-dep/db http://cern.ch/canali ACFS under scrutiny – Luca Canali, Dawid Wojcik 28