Oracle Cache Fusion – In Operation - oracle-info

Agenda
} Cache Fusion
– What is it?
– Cache Coherency Vs. Cache Fusion
– Key Components and terminology
} Cache Fusion in operation
– Lock Mastering & Resource Affinity
– Type of Contentions
– Cache Fusion – I
– Cache Fusion – II
– Examples
} Instance Crash Recovery in RAC
– Key Components in an Instance Crash
– I Pass recovery
– II Pass recovery
} Cache Fusion – What is it?
In previous versions, the private interconnect between the nodes was used only for
messaging. Cache Fusion is the protocol Oracle introduced to share data over that
same interconnect. Data blocks are shipped across the network much as messages are,
which replaces disk I/O, the most expensive component of data transfer, with
memory-to-memory data sharing.
According to the manual:
– Global Cache Service (GCS): Process that implements Cache Fusion. It maintains the
block mode for blocks in the global role. It is responsible for block transfers
between instances. The Global Cache Service employs various background processes
such as the Global Cache Service Processes (LMSn) and Global Enqueue Service
Daemon (LMD).
– Cache Fusion: A diskless cache coherency mechanism in Oracle Real Application
Clusters that provides copies of blocks directly from a holding instance's memory
cache to a requesting instance's memory cache.
}
Cache Coherency
} According to the manual
– The synchronization of data in multiple caches so that reading a memory
location through any cache will return the most recent data written to that
location through any other cache. Sometimes called cache consistency.
} In other words, cache coherency means keeping the status of each resource (block)
consistent across the cluster. Two services provide this together, through the
Global Resource Directory:
– GCS (Global Cache Services)
– GES (Global Enqueue Services)
}
Now both together
} The GCS manages all types of data blocks. Cache coherency is maintained through the GCS by
requiring that instances acquire a resource (lock or enqueue on a block) cluster-wide before
modifying or reading a database block. The GCS is used to synchronize global cache access,
allowing only one instance to modify a block at any single point in time. The GCS, through the
RAC-wide Global Resource Directory, ensures that the status of data blocks cached in any mode in
the cluster is globally visible and maintained.
} Oracle RAC has a multi-versioning architecture. This architecture distinguishes between current
data blocks and one or more consistent read (CR) versions of a block. A current block contains
changes for all committed and yet-to-be-committed transactions. A consistent read (CR) version of
a block represents a consistent snapshot of the data at a previous point in time. A data block can
reside in many buffer caches under the auspices of shared resources.
} In Oracle9i RAC, applying rollback segment information to current blocks produces consistent read
versions of a block. Both the current and consistent read blocks are managed by the GCS.
} To transfer data blocks among database caches, buffers are shipped by means of the high speed
IPC interconnect. Disk writes are only required for cache replacement. If a block is dirty
(modified), a past image (PI) of it is kept in memory before the block is sent. In the event of
failure, Oracle reconstructs the current version of the block by reading the PI blocks.
}
Background Processes and Their Roles
} LMSn – Global Cache Service Processes (GCS)
– Primarily responsible for shipping blocks between buffer caches.
– Creates a CR image whenever there is a cross-instance call for a dirty block.
– LMS must also check constantly with the LMD background process (the GES process) to pick up the
lock requests placed by the LMD process.
– Parameter: GCS_SERVER_PROCESSES, up to 36 as of 10.2, Min. cpu_count/2
} LMON – Lock Monitor Process (GES)
– LMON manages the global locks and resources.
– LMON handles the reconfiguration of locks and resources when an instance joins or leaves the
cluster (during reconfiguration LMON generates trace files).
– LMON also provides cluster group services.
} LMD – Lock Manager Daemon (GES)
– Performs global lock deadlock detection, both local and remote.
– Also monitors for lock conversion timeouts.
– Basically maintains the lock queues and traverses the GES structures.
} LCK – Lock Process
– Manages instance resource requests and cross-instance calls for shared resources.
– During instance recovery, it builds a list of invalid lock elements and validates lock elements.
} DIAG – Diagnostic Daemon
– A new background process in Oracle 10g (new enhanced diagnosability framework).
– Regularly monitors the health of the instance.
– Also checks for instance hangs and deadlocks.
} History of Cache Fusion
Oracle Release | Feature | Description
Prior to 8.1.5 | OPS | OPS used disk-based pings
8.1.5 | Cache Fusion I or Consistent Read Server | Consistent read version of the block is transferred over the interconnect
9i | Cache Fusion II (write/write cache fusion) | Current version of the block is transferred over the interconnect
10g R1 | Oracle Cluster Ready Services (CRS) | CRS eliminates the need for third-party clusterware, though it can be used
10g R2 | Oracle CRS for High Availability | CRS provides high availability for non-Oracle applications
} Key Components in Cache Fusion
Ping
The transfer of a data block from one instance’s buffer cache to another instance’s buffer cache is known as a ping.
Whenever an instance needs a block, it sends a request to the lock master to obtain a lock in the desired mode. If
another instance holds a conflicting lock on the same block, the master asks the current holder to downgrade or release
its lock. This notification is known as a blocking asynchronous trap (BAST). An instance that receives a BAST downgrades
the lock as soon as possible. Before downgrading the lock, however, it might have to write the corresponding block to
disk. That sequence of operations is known as a disk ping or hard ping.
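The request/BAST exchange described above can be sketched as a toy model. This is only an illustration under simplifying assumptions: one block, modes limited to "S" and "X", and a hypothetical `LockMaster` class that is not an Oracle structure. Holders are assumed to comply with a BAST immediately.

```python
from dataclasses import dataclass, field


@dataclass
class LockMaster:
    """Toy lock master for a single block resource (hypothetical model)."""
    holders: dict = field(default_factory=dict)     # instance name -> "S" or "X"
    basts_sent: list = field(default_factory=list)  # (instance, mode held when notified)

    def request(self, instance: str, mode: str) -> None:
        # Two locks conflict unless both are shared.
        for holder, held in list(self.holders.items()):
            if holder != instance and (mode == "X" or held == "X"):
                # The master notifies the holder: this models the BAST.
                self.basts_sent.append((holder, held))
                if mode == "X":
                    del self.holders[holder]    # holder releases its lock
                else:
                    self.holders[holder] = "S"  # holder downgrades X -> S
        self.holders[instance] = mode


master = LockMaster()
master.request("A", "S")
master.request("B", "S")   # shared locks coexist, no BAST needed
master.request("C", "X")   # A and B each receive a BAST and release
```

The model leaves out the disk write that may precede a downgrade; it only shows who gets notified and what each instance holds afterwards.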
CR Fabrication
Whenever a consistent read request arrives from another instance, the holding instance (LMS) has to create a
consistent read image by applying the undo information to the current block. CR fabrication is expensive in I/O
terms, because the undo has to be read into the buffer cache and then applied to the block.
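A minimal sketch of the idea behind CR fabrication, assuming a block is a dictionary of rows and each undo record carries a change number and a before-image. The names `fabricate_cr`, `snapshot_scn` and the record layout are assumptions made for illustration, not Oracle internals. The values mirror the ENG/AUS example used later in the deck.

```python
def fabricate_cr(current_block, undo_records, snapshot_scn):
    """Return a copy of current_block as it looked at snapshot_scn.

    undo_records: list of (scn, slot, column, before_value), oldest first.
    Changes made after snapshot_scn are rolled back, newest first.
    """
    cr = {slot: dict(row) for slot, row in current_block.items()}
    for scn, slot, column, before_value in reversed(undo_records):
        if scn > snapshot_scn:
            cr[slot][column] = before_value
    return cr


# Current block after three updates to ENG (199 -> 200 -> 204 -> 205).
block_42 = {0: {"team": "ENG", "runs": 205}, 1: {"team": "AUS", "runs": 99}}
undo = [(11, 0, "runs", 199), (12, 0, "runs", 200), (13, 0, "runs", 204)]

cr_image = fabricate_cr(block_42, undo, snapshot_scn=10)
```

The current block is left untouched; the CR image is a separate copy, which is why the holder can keep modifying the block while the reader gets its snapshot.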
Past Image (PI) Blocks
PI blocks are copies of blocks in the local buffer cache. Whenever an instance has to send a block it has recently
modified to another instance, it preserves a copy of that block, marking it as PI. An instance is obliged to keep PIs until
that block is written to disk by the current owner of the block. PIs are discarded after the latest version of the block is
written to disk. When a block is written to disk and is known to have global role, indicating the presence of PIs in other
instances’ buffer caches, Global Cache Services (GCS) informs the instances holding the PIs to discard them. With
Cache Fusion, a block is written to disk to satisfy checkpoint requests and so on, not to transfer the block from one
instance to another via disk.
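The PI life cycle can be sketched as follows. This is a toy model under stated assumptions: `InstanceCache`, `ship_dirty_block` and `write_to_disk` are hypothetical names, a block version is just an integer, and the GCS notification is reduced to a loop over all instances.

```python
class InstanceCache:
    """Toy buffer cache for one instance (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.current = {}      # block id -> version held as the current block
        self.past_images = {}  # block id -> version retained as a PI


def ship_dirty_block(sender, receiver, block_id):
    """Send a modified block over the interconnect; the sender keeps a PI."""
    version = sender.current.pop(block_id)
    sender.past_images[block_id] = version
    receiver.current[block_id] = version


def write_to_disk(writer, all_instances, disk, block_id):
    """The current owner writes the block; every PI of it can then be discarded."""
    disk[block_id] = writer.current[block_id]
    for instance in all_instances:
        instance.past_images.pop(block_id, None)


a, b = InstanceCache("A"), InstanceCache("B")
disk = {}
a.current[42] = 3             # A has modified block 42
ship_dirty_block(a, b, 42)    # B wants it: A keeps a PI, B becomes the owner
pi_after_ship = dict(a.past_images)
b.current[42] = 4             # B modifies the block further
write_to_disk(b, [a, b], disk, 42)
```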
Lock Mastering
The memory structure in which GCS keeps information about the usage of a data block (or other sharable resource) is
known as the lock resource. The responsibility of tracking locks is distributed among all the instances, and the required
memory also comes from the participating instances’ System Global Area (SGA). Because ownership of the resources is
distributed in this way, a master node exists for each lock resource. The master node maintains complete information
about current users of and requestors for the lock resource. It also holds information about the PIs of the block.
}
Resource Affinity and Dynamic remastering
} Each block is mastered by exactly one instance at any given point in time
} The resource master can change, depending on how frequently the block is requested by other
instances
– If an instance requests a particular resource 50 times within a 10-minute period, the requesting
instance becomes the master. This is called resource affinity
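The affinity rule above can be sketched with a sliding window. The 10-minute window and the count of 50 come from the slide; everything else (the `AffinityTracker` class, timestamps in seconds, resetting the counters after a remaster) is an assumption made for illustration and not how Oracle implements it.

```python
from collections import deque

WINDOW_SECONDS = 600  # 10 minutes, as stated on the slide
THRESHOLD = 50        # requests within the window, as stated on the slide


class AffinityTracker:
    """Toy remastering decision for a single resource (hypothetical model)."""

    def __init__(self, master):
        self.master = master
        self.requests = {}  # instance -> deque of request timestamps

    def record_request(self, instance, now):
        window = self.requests.setdefault(instance, deque())
        window.append(now)
        # Forget requests that have fallen out of the 10-minute window.
        while window and now - window[0] >= WINDOW_SECONDS:
            window.popleft()
        if instance != self.master and len(window) >= THRESHOLD:
            self.master = instance  # the frequent requester becomes the master
            self.requests.clear()
        return self.master


busy = AffinityTracker(master="A")
for second in range(49):
    busy.record_request("B", now=second)
master_before = busy.master                      # still "A" after 49 requests
master_after = busy.record_request("B", now=49)  # the 50th request within the window

slow = AffinityTracker(master="A")
for minute in range(100):
    slow.record_request("B", now=minute * 60)    # one request per minute
```

An instance that asks only once a minute never accumulates 50 requests inside the window, so the master stays where it is.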
Block Mastering
} In Oracle 9.2
– documentation describes dynamic remastering
– not implemented in code
} In Oracle 10.1
– works at data file level
– very high threshold, so difficult to test
– does occur on some customer sites
– may cause the LMON process to crash in 10.1.0.4 (bug 3659289 - patch available; fixed in 10.1.0.5/10.2.0.1)
} In Oracle 10.2
– works at object level
– thresholds are relatively low
– object remastering is recorded in V$GCSPFMASTER_INFO
}
Cache Fusion- Possible Types of Contention
} Contention for a resource occurs when two or more instances want the same
resource at the same time, for example when a data block in use by one instance is
needed by another. There are three types of contention for data blocks:
} Read/Read contention: never a problem because of the shared disk system. A block read by one
instance can be read by other instances without the intervention of GCS.
} Write/Read contention: addressed in Oracle 8i by the consistent read server. The holding
instance constructs the CR block and ships it to the requesting instance over the interconnect.
} Write/Write contention: addressed by the Cache Fusion technology. Since Oracle 9i, the cluster
interconnect is used in some cases to ship data blocks among the instances that need to modify the
same data block simultaneously.
}
Prior to Cache Fusion (before 8.1.5)
Write/read contention before Cache Fusion
}
Cache Fusion – I aka Consistent Read Server
Write/Read contention - CR Block Transfer in Cache Fusion
Oracle introduced a background process called BSP (Block Server Process), which fabricates the CR version of the block
in the holder’s cache and ships it across the interconnect.
}
Still need to address Write/Write Contention
Write / Write Contention before Cache Fusion – II (before 9i)
}
So now – Cache Fusion – II or Write/Write Cache Fusion
Cache Fusion current block transfer (from 9i r2 )
}
Buffer States In Cache Fusion
Mode / Role | Local | Global
Null: N | NL | NG
Shared: S | SL | SG
Exclusive: X | XL | XG
SL: When an instance has a resource in SL form, it can serve a copy of the block to other instances and it can read
the block from disk. Since the block is not modified, there is no need to write it to disk.
XL: When an instance has a resource in XL form, it has sole ownership of and interest in that resource. It also has the
exclusive right to modify the block. All changes to the block are in its local buffer cache, and it can write the block to
disk. If another instance wants the block, it will contact the instance via GCS.
NL: The NL form is used to protect consistent read blocks. If a block is held in SL mode and another instance wants it in
X mode, the current instance sends the block to the requesting instance and downgrades its own lock to NL.
SG: In SG form, a block is present in one or more instances. An instance can read the block from disk and serve it to
other instances.
XG: In XG form, a block can have one or more PIs, indicating multiple copies of the block in several instances’ buffer
caches. The instance with the XG role has the latest copy of the block and is the most likely candidate to write the
block to disk. GCS can ask the instance with the XG role to write the block to disk or to serve it to another instance.
NG: After discarding PIs when instructed by GCS, the block is kept in the buffer cache with the NG role. This serves only
as the CR copy of the block.
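The six states are simply every combination of a mode letter and a role letter, which a few lines of code make explicit. The helper names below are hypothetical. The two predicates restate the slide text only: X is the only mode that may modify a block, and the global role is what signals that PIs may exist.

```python
MODES = {"N": "Null", "S": "Shared", "X": "Exclusive"}
ROLES = {"L": "Local", "G": "Global"}

# Every buffer state is a mode letter followed by a role letter.
ALL_STATES = [mode + role for mode in MODES for role in ROLES]


def describe(state):
    """Spell out a two-letter buffer state such as 'XG'."""
    mode, role = state
    return f"{MODES[mode]} mode, {ROLES[role]} role"


def may_modify(state):
    """Only an exclusive holder has the right to modify the block."""
    return state[0] == "X"


def may_have_past_images(state):
    """The global role indicates that PIs may exist in other caches."""
    return state[1] == "G"


def holder_after_exclusive_request(state):
    """Transition described for SL: ship the block, then downgrade to NL."""
    if state == "SL":
        return "NL"
    raise ValueError("only the SL case is spelled out in the slide text")
```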
}
Example 1: Reading a Block from Disk
}
Example 2: Reading a Block from the Cache
}
Example 3: Getting a (Cached) Clean Block for Update
}
Example 4: Getting a (Cached) Modified Block for Update and Commit
}
Example 5: Commit the Previously Modified Block and Select the Data
}
Example 6: Write the Dirty Buffers to Disk Due to Checkpoint
}
Example 7: Master Instance Crash
}
Example 7 (continued): What the alert log says about reconfiguration
List of nodes:
012
Global Resource Directory frozen
* dead instance detected - domain 0 invalid = TRUE
Communication channels reestablished
* domain 0 valid = 0 according to instance 0
Wed Jun 21 23:22:22 2006
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Wed Jun 21 23:22:22 2006
LMS 0: 0 GCS shadows cancelled, 0 closed
Wed Jun 21 23:22:22 2006
LMS 2: 0 GCS shadows cancelled, 0 closed
Wed Jun 21 23:22:22 2006
LMS 3: 0 GCS shadows cancelled, 0 closed
Wed Jun 21 23:22:22 2006
LMS 1: 0 GCS shadows cancelled, 0 closed
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Wed Jun 21 23:22:22 2006
LMS 0: 2189 GCS shadows traversed, 332 replayed
Wed Jun 21 23:22:22 2006
LMS 2: 2027 GCS shadows traversed, 364 replayed
Wed Jun 21 23:22:22 2006
LMS 3: 2098 GCS shadows traversed, 364 replayed
Wed Jun 21 23:22:22 2006
LMS 1: 2189 GCS shadows traversed, 343 replayed
Wed Jun 21 23:22:22 2006
Submitted all GCS remote-cache requests
Fix write in gcs resources
Reconfiguration complete
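When reading an excerpt like the one above, it can help to total the per-LMS counters. The following is a small sketch that parses only the "GCS shadows traversed" lines. The regular expression is written against the line format shown in this excerpt, and the function names are made up for this example; other Oracle versions may format these lines differently.

```python
import re

# Matches lines such as: "LMS 0: 2189 GCS shadows traversed, 332 replayed"
SHADOW_LINE = re.compile(r"LMS (\d+): (\d+) GCS shadows traversed, (\d+) replayed")


def summarize_shadows(log_text):
    """Map each LMS number to its (traversed, replayed) counters."""
    per_lms = {}
    for match in SHADOW_LINE.finditer(log_text):
        lms, traversed, replayed = map(int, match.groups())
        per_lms[lms] = (traversed, replayed)
    return per_lms


def totals(per_lms):
    """Sum the traversed and replayed counters across all LMS processes."""
    traversed = sum(t for t, _ in per_lms.values())
    replayed = sum(r for _, r in per_lms.values())
    return traversed, replayed


sample = """\
LMS 0: 2189 GCS shadows traversed, 332 replayed
LMS 2: 2027 GCS shadows traversed, 364 replayed
LMS 3: 2098 GCS shadows traversed, 364 replayed
LMS 1: 2189 GCS shadows traversed, 343 replayed
"""

print(totals(summarize_shadows(sample)))
```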
}
Crash Recovery – Key Components
} Redo Threads and Streams
} Redo Records and Change Vectors
} Checkpoints
– Thread Checkpoint or Local Checkpoint
– Database Checkpoint or Global Checkpoint
– Incremental Checkpoint
– Bounded Recovery
– Block Written Record (BWR)
– Past Image (PI)
– Checkpoints and PI
} I Pass Recovery
} II Pass Recovery
} Merge Threads
Cache Fusion - Crash Instance Recovery
The steps for GRD reconfiguration are as follows:
– Instance death is detected by the cluster manager.
– Requests for PCM locks are frozen.
– Enqueues are reconfigured and made available.
– DLM recovery.
– GCS (PCM lock) is remastered.
– Pending writes and notifications are processed.
The steps for I Pass recovery are as follows:
– The instance recovery (IR) lock is acquired by SMON.
– The recovery set is prepared and built. Memory space is allocated in the SMON Program Global Area (PGA).
– SMON acquires locks on buffers that need recovery.
The steps for II Pass recovery are as follows:
– II Pass is initiated. The database is partially available.
– Blocks are made available as they are recovered.
– The IR lock is released by SMON. Recovery is complete.
– The system is available.
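The two passes can be sketched as a toy model. The slides list the Block Written Record (BWR) as a key component without describing how it is used, so the rule below (a BWR for a block means earlier redo for that block need not be applied) is an assumption of this sketch. The record layout and the function names are also hypothetical, and locking by SMON is left out.

```python
def first_pass(redo_stream):
    """Build the recovery set: blocks that still need redo applied.

    redo_stream holds ("change", block, value) or ("bwr", block) in log order.
    Assumption: a BWR means the block reached disk, so changes logged
    before it can be dropped from the recovery set.
    """
    recovery_set = {}
    for record in redo_stream:
        if record[0] == "change":
            _, block, value = record
            recovery_set.setdefault(block, []).append(value)
        else:
            recovery_set.pop(record[1], None)
    return recovery_set


def second_pass(recovery_set, datafile):
    """Apply the remaining changes; each block is usable once it is done."""
    for block, values in recovery_set.items():
        datafile[block] = values[-1]  # toy rule: the last logged value wins
    return datafile


redo = [
    ("change", 42, 200),
    ("change", 7, 1),
    ("bwr", 42),          # block 42 was written to disk at this point
    ("change", 42, 204),
    ("change", 42, 205),
]
recovery_set = first_pass(redo)
recovered = second_pass(recovery_set, {42: 200, 7: 0})
```

The first pass only decides what has to be recovered; no block is changed until the second pass, which matches the slide's note that the database becomes partially available once II Pass starts.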
}
Example 8: Select the Rows from Instance A
}
Just for a clear understanding ...
} It's time to play ...
}
Cross Instance Consistent Read
[Figure: animated diagram; only its labels survive in this text.]
Instance 1, Session 15:
SELECT runs,wickets
FROM score
WHERE team = 'ENG';
Instance 2, Session 27 (the slide shows increments of 6, 4 and 2):
UPDATE score
SET runs = runs + 6
WHERE team = 'ENG';
LMS0: Build read consistent version of block 42
Undo Header: segment 5 slot 18, state: 10, wrap#: 4E7, dba: 00800777
Data Block 42 and its copy: ITL xid: 0005.018.4E7, uba: 800777.530.12 / 800777.530.13 / 800777.530.14
– slot 0: col1: ENG, col2: 340 / 344 / 350 / 352 (successive values), col3: 1
– slot 1: col1: AUS, col2: 99, col3: 10
Undo Block 800777: seq: 530 irb 12
– record 12 (uba: 5.1): block 42 slot 0, value 340
– record 13 (uba 800777.530.12): block 42 slot 0, value 344
– record 14 (uba 800777.530.13): block 42 slot 0, value 350
}
Committed Block – Block on Disk
[Figure: diagram; only its labels survive in this text.]
Instance 2, Session27:
UPDATE score
SET runs = 200
WHERE team = 'ENG';
UPDATE score
SET runs = 204
WHERE team = 'ENG';
UPDATE score
SET runs = 205
WHERE team = 'ENG';
COMMIT;
Instance 1, Session15:
SELECT runs
FROM score
WHERE team = 'ENG';
Block 42: ENG 205, AUS 99
Undo Block: ENG 199, 200, 204
Other labels on the slide: LMS0, 22:9, 22:10
Committed Block – Block on Buffer Cache
[Figure: diagram; only its labels survive in this text.]
Instance 2, Session27:
UPDATE score
SET runs = 200
WHERE team = 'ENG';
UPDATE score
SET runs = 204
WHERE team = 'ENG';
UPDATE score
SET runs = 205
WHERE team = 'ENG';
COMMIT;
Instance 1, Session15:
SELECT runs
FROM score
WHERE team = 'ENG';
Block 42: ENG 205, AUS 99
Undo Block: ENG 199, 200, 204
Other labels on the slide: LMS0, STOP, 22:9, 22:10
}
Uncommitted Block – Block in Buffer Cache
[Figure: diagram; only its labels survive in this text.]
Instance 2, Session27 (no COMMIT):
UPDATE score
SET runs = 200
WHERE team = 'ENG';
UPDATE score
SET runs = 204
WHERE team = 'ENG';
UPDATE score
SET runs = 205
WHERE team = 'ENG';
Instance 1, Session15:
SELECT runs
FROM score
WHERE team = 'ENG';
Block 42: ENG 205, AUS 99
Copy of Block 42: ENG 199, AUS 99
Undo Block: ENG 199, 200, 204
Other labels on the slide: LMS0, 22:10
}
Uncommitted Block – On Disk
[Figure: diagram; only its labels survive in this text. The slide refers to its notes for additional information.]
Instance 2, Session27 (no COMMIT):
UPDATE score
SET runs = 200
WHERE team = 'ENG';
UPDATE score
SET runs = 204
WHERE team = 'ENG';
UPDATE score
SET runs = 205
WHERE team = 'ENG';
Instance 1, Session15:
SELECT runs
FROM score
WHERE team = 'ENG';
Block 42: ENG 205, AUS 99
Undo Block: ENG 199, 200, 204
Other labels on the slide: LMS0, 22:10
Q & A
}
References:
} Oracle 10g Real Application Clusters Handbook – K Gopalakrishnan
} Julian Dyke – RAC Presentation
} Oracle 10g RAC Administrators Guide