ACFS Under Scrutiny - Luca Canali

advertisement
ACFS Under Scrutiny
Luca Canali, CERN
Dawid Wojcik, CERN
UKOUG Conference, Birmingham, Nov 2010
Outline
 ACFS – cluster file system for 11gR2 ASM




Use cases
Architecture
Installation and setup
Some investigations of the internals
 ACFS at CERN

Use cases
 Deployment
 Performance tests
 Conclusions
ACFS under scrutiny – Luca Canali, Dawid Wojcik
2
ACFS Use Cases
 Use cases for ACFS with Oracle

Automatic Diagnostic Repository (ADR) for 11g RDBMS
• unified logging structure for RAC


RDBMS related usage: BFILES, datapump dump files, ETL files,
miscellaneous log files (RMAN), etc
Can also be used for Oracle RDBMS home binaries
• shared or non-shared
• Allows to take snapshots before applying a patch or a patchset
 Use cases of ACFS as generic file system




Can be deployed for custom applications and application servers
No need to have RDBMS installation, only clusterware 11.2
Performance, maintenance and high availability
DBAs will find it easy to use
ACFS under scrutiny – Luca Canali, Dawid Wojcik
3
ASM and ACFS – Architecture
Oracle RAC or Single
Instance DBs
ASM Cluster
File System
(ACFS)
Third Party
File System
(optional)
ASM dynamic Volume Manager
(ADVM)
ASM
Clusterware 11.2
Oracle
Database
Files
Applications
Grid Infrastructure
ACFS under scrutiny – Luca Canali, Dawid Wojcik
4
ASM and ACFS
 ASM: volume manager and cluster file system
for Oracle DB files
 ACFS: POSIX compliant cluster file system


Build on top of ASM disk groups
For ‘all other files’ (DB not supported on ACFS yet)
 ACFS leverages ASM and CRS



Performance
Manageability
High Availability
 Ref: ACFS Technical Overview and Deployment Guide
[ID 948187.1]
ACFS under scrutiny – Luca Canali, Dawid Wojcik
5
Automatic Storage Management
 ASM (Automatic Storage Management)




Oracle’s cluster file system and volume manager for Oracle
databases
HA: fault tolerant, online storage reorganization/addition
Performance: stripe and mirroring everything
Commodity HW: Physics databases at CERN use ASM normal
redundancy (similar to RAID 1+0 across multiple disks and storage
arrays)
DATA_DiskGroup
RECOVERY_DiskGroup
Failgroup1
Failgroup2
Failgroup3
ACFS under scrutiny – Luca Canali, Dawid Wojcik
Failgroup4
6
ASM Dynamic Volume Manager
 ASM Dynamic Volume Manager (ADVM)

New feature in Oracle Clusterware 11.2

Volumes are implemented as ASM files
exposed to OS as block devices: /dev/asm/volume_name-x




Configurable redundancy, stripe width and stripe columns
A dirty region logging file is created if redundancy is mirror or high
On top of ADVM volume one can create any file system (ext3,
ACFS, ...)
 Volumes can be resized online

File system must also support online resize (ACFS, grow: ext2,
ext3, ext4)
 Further investigations on internals:

v$asm_volume, v$asm_file, x$kffxp
ACFS under scrutiny – Luca Canali, Dawid Wojcik
7
ASM Cluster File System
 What is ACFS

ASM-based Cluster File System – new in Oracle 11.2
• Built on top of ADVM volumes
• Can be used cluster-wide or single-node only






Multi platform support (11.2.0.2)
Can be shared using NFS, CIFS, …
Online file system expansion / shrink
Mirror protection when using NORMAL or HIGH
redundancy diskgroups/volumes
Read-only snapshots built-in
Replication, security realms, encryption and tagging
introduced in 11.2.0.2
ACFS under scrutiny – Luca Canali, Dawid Wojcik
8
ACFS Integration with the
Oracle Software Stack
 ASM, ADVM and ACFS are integrated with Oracle
11gR2 clusterware
 ASM and clusterware are tightly integrated in 11gR2


A single ‘GRID HOME’ is used
Notable: administration is simplified by storing OCR and voting
disk(s) in ASM
 ADVM and ACFS resources are managed by
clusterware

Ease maintenance and learning curve for the DBA
ACFS under scrutiny – Luca Canali, Dawid Wojcik
9
ACFS Crash Recovery
 In case of a node crash or force dismounting of
ACFS – recovery is needed (three levels)

ASM in RAC will use surviving nodes to recover
• ASM uses ‘internal files’ such as ACD (Active Change
Directory) and COD (Continuing Operation Directory) for this
purpose

ADVM volumes with normal or high redundancy have
associated dirty region logging file (high redundancy
by default) – recovery run by ASM processes
 ACFS utilizes Metadata Transaction Log
ACFS under scrutiny – Luca Canali, Dawid Wojcik
10
ACFS setup
 Setting up ACFS for the first time

The quick way: asmca
 The alternative CLI setup

Create and enable volume (enabled on all nodes by default)
• Asmcmd: create volume -G {diskroup_name} -s {size}
• /dev/asm/{vol_name}-x device is created (Linux)

{vol_name}
Create ACFS file system
• mkfs -t acfs /dev/asm/{vol_name}

Register acfs general purpose filesystem with CRS (Allows to
mount filesystem automatically with CRS)
• acfsutil registry -a -f {vol_path} {acfs_mount_point}
• If ACFS will be used for Oracle home use this instead:
– srvctl add filesystem -d {vol_path} -v {volume_name} -g
{disgroup_name}
– Allows to maintain ACFS, ASM and DB dependencies
ACFS under scrutiny – Luca Canali, Dawid Wojcik
11
ASM and ACFS internals
 New ASM background processes in 11gR2

Used to manage interaction ADVM and ASM, IO
fencing and clusterware membership
• One can see them with ps -elf | grep ASM

Volume Driver Background (VDBG)
 Volume Background (VBG#) processes
 Volume Membership Background (VMB)
 ACFS background process
 More details on metalink note [ID 883028.1]
ACFS under scrutiny – Luca Canali, Dawid Wojcik
12
ACFS, ADVM and Linux
 Kernel modules needed for ADVM and ACFS

oracleacfs, oracleadvm, oracleoks
• Can been seen on OS level with lsmod | grep oracle

Binaries in $GRID_HOME/install/usm/
• One can check location with acfsroot version_check
 How to remove acfs-related kernel modules

Modules are proprietary (non-GPL) and trigger
message on kernel tainting in /var/log/messages

If don’t want to use acfs or are afraid of kernel tainting
acfsroot uninstall
See also note [ID 1082851.1]


ACFS under scrutiny – Luca Canali, Dawid Wojcik
13
ACFS Command Line Tools
 Main command line interface – acfsutil

Display filesystem information, resize filesystem,
register mountpoints, create snapshots, …
 Can be used to configure new 11.2.0.2 features of
security, realms, encryption, replication and tagging
 Most operations can also be done via GUI tool
asmca
 Other utilities



Typical Linux/UNIX: fsck, mkfs, mount, umount
afcsdbg – debug tool
advmutil – display ADVM information, tune ADVM
ACFS under scrutiny – Luca Canali, Dawid Wojcik
14
ACFS Snapshots




ACFS snapshots provide point-in-time images
Can be used for consistent backups
Performed online
Copy on first write mechanism (before-images
shared between snapshots)
 Snapshots within the same file system


Snapshots visible in CLI or in V$ASM_ACFSSNAPSHOTS
You can read snapshots in /mount_point/.ACFS/snaps/
 Limited to 63 snapshots per ACFS file system
ACFS under scrutiny – Luca Canali, Dawid Wojcik
15
ACFS Replication
 File system level replication from one primary site to
another






Can replicate whole ACFS filesystem or only tagged fragments
Changes on the primary system captured to replication logs
Replication logs send by background processes to destination
cluster and replayed there
Logs deleted from both system after applying
Replication logs stored in the same filesystem – check for free
space!
Replication can be set-up with acfsutil
 Possible use case:

Replicate ACFS file system data in Data Guard setup
ACFS under scrutiny – Luca Canali, Dawid Wojcik
16
CERN
experience
with ACFS
Physics DB HW, a typical setup
 Dual-CPU quad-core blade servers, 24GB memory, Intel
Nehalem low power; 2.26GHz clock
 Redundant power, mirrored local disks, 4 NIC (2 private/
2 public), dual HBAs, “RAID 1+0 like” with ASM
ACFS under scrutiny – Luca Canali, Dawid Wojcik
18
ACFS and ASM on low-cost storage
 Advantages



High performance
Reliability
Low cost
• Normal redundancy ASM disk groups instead of complicated RAIDs
• Cheap SATA disks rather than more expensive enterprise solutions
• Can provide mirroring across storage arrays

Online operations (grow/shrink, add/remove disks)
 Disadvantages

Can only be used on nodes with clusterware installed
• Unless exported via NFS


Some operations cause cluster-wide sync – performance impact
Async IO not supported – cannot put DB data files on ACFS
ACFS under scrutiny – Luca Canali, Dawid Wojcik
19
ACFS use cases at CERN
 ACFS is used in production at CERN



General purpose cluster file system for backup &
monitoring cluster – fast and reliable
Repository of oracle binaries
Temporary storage for large exports/imports
 Other usages predicted after moving to Oracle 11.2

Automatic Diagnostic Repository (ADR)
 Export/import directory for each cluster DB
 Local Oracle homes (?) – snapshots can be used before
patching
ACFS under scrutiny – Luca Canali, Dawid Wojcik
20
ACFS test setup
 64 SATA II, 7k2 RPM, 400GB lower end disks
 JBOD configuration – visible to ASM as 64 LUNs
 45% of each disk’s capacity used for DATA diskgroup

For improved IOPS and throughput (OS level partitioning)
 ASM normal redundancy used – 10.5TB diskgroup
 Two 800GB and 80GB ADVM normal redundancy volumes
created for tests


ACFS, ext2 and ext3 file systems created on ADVM volumes
No difference in speed between small and large file systems in any of
the tests (80GB vs 800GB)
ACFS under scrutiny – Luca Canali, Dawid Wojcik
21
Tests conducted
 Tests conducted using


dd tool – different operations and different operation block sizes (all
file sizes of 70GB)
bonnie++ – generic file system tests (ver. 1.96)
 Tests presented






Comparing ACFS, ext2 ext3 and encrypted ACFS (AES 192-bit)
ADVM used in all tests
dd command sequential write (synchronous and asynchronous)
dd command sequential read (synchronous)
bonnie++ - file system block write, rewrite and read; file creation and
deletion speed
Multithread tests – workload run from 1 node and 2 nodes
• Running same workload (2 threads)
• Running one workload listing directories on the second node (10x/s)
• Streaming tests – one thread writing, second thread reading
ACFS under scrutiny – Luca Canali, Dawid Wojcik
22
Write test results results in
our enviroment
Asynchronous sequential write [MB/s]
240
230
MB/s
220
210
200
218
210
190
232
229
226
217
215
215
231
205
ACFS async [MB/s]
229
217
214
205
203
EXT3 async [MB/s]
180
256
1024
8
32
128
EXT2 async [MB/s]
operation block size [kB]
MB/s
Sychronous sequential write [MB/s]
160
140
120
100
80
60
40
20
0
ACFS sync [MB/s]
148
92
90
120 109 109
102 113 115
EXT2 sync [MB/s]
56
1024
256
128
63
62
23
32
25
24
EXT3 sync [MB/s]
8
operation block sie [kB]
ACFS under scrutiny – Luca Canali, Dawid Wojcik
23
Read and write results in
our enviroment
Sychronous sequential read [MB/s]
300
250
MB/s
200
150
100
ACFS sync [MB/s]
276 278
207 219 218
208
208 209 215
208
208
EXT2 sync [MB/s]
116 116
50
54
EXT3 sync [MB/s]
54
0
8
32
128
256
1024
operation block size [kB]
Sequential read and write - encrypted ACFS (AES 192-bit)
50
MB/s
40
30
20
ACFS async write [MB/s]
39
39
39
38
38
ACFS sync write [MB/s]
10
5.5 7.4
5.4 7.5
5.4 7.4
5.2 7.9
256
128
32
4.5 7.7
ACFS sync read [MB/s]
0
1024
8
operation block size [kB]
ACFS under scrutiny – Luca Canali, Dawid Wojcik
24
bonnie++ test results results
in our enviroment
bonnie++ throughput tests
250
210
201
200
155
152
MB/s
207
201
191
150
207
ACFS
138
Encr. ACFS
100
EXT2
50
25
EXT3
7.5
6.7
0
Block write
Block rewrite
Block read
bonnie++ random file creation test
30000
24000
25000
Files/s
20000
ACFS
15000
10000
5000
7800
7500
2800
Encr. ACFS
6200
4800
EXT2
0
Files created/s
Files deted/s
ACFS under scrutiny – Luca Canali, Dawid Wojcik
25
Multithread test results
results in our enviroment
bonnie++ throughput tests - multithread comparison
average MB/s/thread
250
201
201
152
144
150
98
100
198
183
200
165
126
Write
82
Rewrite
54
50
26
Read
0
1 thread
2 threads (same node)
2 threads (different nodes)
1 thread (ls command 10x/s on
second node)
bonnie++ throughput tests - write and reader threads
average MB/s/thread
250
200
212
194
161
154
150
126
120
100
72
88
Write thread
Read thread
50
0
Threads on the same node
(1MB block)
Threads on the 2 nodes (1MB Threads on the same node (8kB
block)
block)
Threads on the 2 nodes (8kB
block)
ACFS under scrutiny – Luca Canali, Dawid Wojcik
26
Conclusions
 ADVM and ACFS


Cluster file system integrated in 11.2
Leverages ASM features for non-RDBMS files
 ACFS usage at CERN

Positive experience
• Currently used to provide cluster filesystem for our custom DB
monitoring

Being considered for the 11g RDBMS deployments
• To support log files on ADR, …

ACFS performance tests
• Positive results, more tests in progress
ACFS under scrutiny – Luca Canali, Dawid Wojcik
27
Acknowledgments
 CERN-IT DB group and in particular:
 Jacek Wojcieszuk
 More info:
http://cern.ch/it-dep/db
http://cern.ch/canali
ACFS under scrutiny – Luca Canali, Dawid Wojcik
28
Download