z/OS New Year's resolutions for Saving CPU Cycles & Improving I/O

z/OS New Year's resolutions
for Saving CPU Cycles &
Improving I/O Utilization
February 12, 2013
© 2012 IBM Corporation
Session Agenda
 DB2 and IMS Database and Storage Integration Overview
 DB2 and IMS System Level Backup Methodologies and Storage System Integration
 DB2 and IMS Backups Using Storage-Based Fast-Replication
 Exposing DFSMShsm Resource Utilization
 Optimizing Your Batch Window
Database and Storage Administration
Trends and Directions
 Large IMS and DB2 systems require high availability
– Fast and non-intrusive backup and cloning facilities are required
– Fast recovery capabilities minimize downtime and promote high
availability
– Most backup, recovery and cloning solutions do not leverage storage-based fast-replication facilities
 Storage-based fast-replication facilities are under-utilized
– Tend to be used by storage organizations
– Tend not to be used by database administrators (DBAs)
 Storage-aware database products allow DBAs to use fast-replication in a safe and transparent manner
– Provides fast and non-intrusive backup and cloning operations
– Simplifies recovery operations and reduces recovery time
– Simplifies disaster recovery procedures
Database and Storage Integration
[Diagram labels: Application and Database Management Domain (Mainframe Database Systems); Storage-Aware Database Tools; Storage Administration and Business Continuity Domain (Source Database; Backup, Clone, DR)]
• Organizational Integration
• New Backup Methods
• New Recovery Strategies
• Business Recovery Monitoring
• Cloning Automation
• Disaster Restart Solutions
Host Based Data Copy Options
 Data copy processes use host based CPU and I/O facilities
 More costly and slower than storage-based fast replication
 Volume copy options
– DFSMSdss (IBM)
– FDR (Innovation Data Processing)
– TDMF (IBM)
– FDRPAS (Innovation Data Processing)
 Data set copy options
– DFSMSdss (IBM)
– FDR (Innovation Data Processing)
[Diagram: host-based copy process]
What is Storage-based Fast Replication?
 An instant copy of a volume/data set at a specific point in time
– Builds a bitmap to describe the source volume
– After the bitmap has been created, the source and target volume data can be used immediately
 Data movement (CPU and I/O) offloaded to storage processor
– Frees up resources on host processor
– No host CPU or I/O costs
 For volume replication a relationship is established between a source and a target
– Geometrically similar devices
 Consistency Groups
– Group of volumes copied at exactly the same point in time while maintaining the order of dependent writes
[Diagram: storage processor-based copy process]
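The bitmap idea on this slide can be illustrated with a toy model. The sketch below is only a conceptual illustration, assuming a volume is a list of tracks and that the controller copies a track to the target just before the source track is overwritten (copy-on-write). The class and method names are hypothetical and do not represent any storage vendor's implementation.

```python
class PointInTimeCopy:
    """Toy copy-on-write snapshot of a volume modeled as a list of tracks."""

    def __init__(self, source):
        self.source = source
        self.target = [None] * len(source)
        # Bitmap: True means the track has not yet been copied to the target.
        self.pending = [True] * len(source)

    def _copy_track(self, index):
        # Copy a track once, then clear its bit in the bitmap.
        if self.pending[index]:
            self.target[index] = self.source[index]
            self.pending[index] = False

    def write_source(self, index, data):
        # Preserve the point-in-time image before the source is overwritten.
        self._copy_track(index)
        self.source[index] = data

    def read_target(self, index):
        # An uncopied track is still identical to the source at snapshot time.
        return self.source[index] if self.pending[index] else self.target[index]

    def background_copy(self):
        # Finish the physical copy; this work happens off the host in the real case.
        for index in range(len(self.source)):
            self._copy_track(index)


volume = ["a", "b", "c"]
snapshot = PointInTimeCopy(volume)      # "instant": only the bitmap is built
snapshot.write_source(1, "B")           # production keeps updating the source
print(snapshot.read_target(1), volume[1])
```

Because only the bitmap is built at copy time, both source and target are usable immediately, which is the property the slide describes.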
Advantages Using Storage-Based Fast Replication
 Fast
• Copies data instantaneously
 Provides high availability
• Provides a consistent copy of production without sacrificing availability
• Allows clones or recoveries to be available quicker
 Provides huge cost savings
• Doesn’t use host CPU or I/O resources
– Copy process is done in the storage processor
> Save CPU and I/O costs
• Save personnel time
Database System Level Backup Overview
 A Backup or Clone of the entire DB2 or IMS environment at a point in time
– Recorded in metadata repositories
 Leverages storage-based fast replication to drive the volume backup
– Backup instantly - performed in seconds
– Offloading data copy process to the storage processor saves CPU and I/O resources
– Faster than data set copies
 Backup DB2 and IMS without affecting applications
– Backup windows reduced by replacing image copies
– Extends processing windows
 Data consistency ensures data is dependent-write consistent
– DB2 suspend, IMS suspend
– Storage-based consistency functions
– Equivalent to a power failure
[Diagram labels: DB2 or IMS; Storage-aware DB2 and IMS backup; Storage Processor APIs; Source DB2 or IMS Volumes; Target Volumes (DB2 or IMS System Backup)]
Database System Level Backup Overview
 Backup validation each time ensures successful recoveries
– Insurance that a backup is available
 Automated backup offload (archive/recall)
– Copies system backup from fast replication disk to tape for use at either local or disaster site (or both)
 Can be used in combination with image copies
[Diagram labels: DB2 or IMS; Storage-Aware Backup and Recovery; Storage Processor APIs; Source Database Volumes; System Backup; SLB Offload; Tape Processing]
Benefits of SLB over Image Copies and Change Accums
 Creating SLB with Fast Replication is equivalent to:
– Creating all Image Copies with < 1 second of IMS or DB2 unavailable time
– SLB created using storage processor CPU (not Host CPU)
– Significant CPU cost savings
 Guaranteed Recoverability
– Validation of IMS and DB2 configuration each time SLB is created
 Fast restore with Parallel Log Apply
– Reduces recovery time and complexity
– Executes the restore in parallel with the log apply
 Change Accumulations may not be needed
– System Level Backups can be created frequently
– Save host CPU and I/O
 Significantly reduce costs by using less CPU and I/O resources
– Reduce costs to create backups
– Save cost by reducing number of image copies needed
SLB Disaster Recovery Benefits
 Simplifies disaster recovery operations
– System level backup for restart
– System level backup and roll forward
 Taking full volume dumps for disaster recovery?
– System level backups add automation and a meta-data repository
• Can now use the backup for multiple purposes
 Basis for tape-based DB2 and IMS coordinated recovery
– Restore IMS and DB2 systems back to a transactionally consistent point, which is the backup time or end of the last common log
Integrating SLBs into Recovery
Using an Intelligent Recovery Manager
 Recovers application, individual database, or indexes
– Using Current, Timestamp, or PITR
 Application profile is created in advance
– Single database or group of databases
– Logically related databases and indexes can be included automatically
 Determines best recovery method
– Restores from either IC or SLB
– Indexes that cannot be restored are rebuilt
– Recovery using log apply needs one pass of the logs
– Access to DBs is automatically stopped and restarted at end of recovery
 Storage-based fast-replication performs restore
– Performs an instantaneous data set restore process
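The slide does not say how a recovery manager picks between an image copy (IC) and a system level backup (SLB). One plausible rule, shown here purely as an assumption, is to restore from the most recent backup taken at or before the recovery point and then apply logs forward from there. The `Backup` record and `choose_restore_source` function are hypothetical names for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Backup:
    kind: str            # "IC" or "SLB"
    taken_at: datetime   # point in time the backup represents


def choose_restore_source(backups, recover_to):
    """Return the latest backup not newer than the recovery point, or None."""
    candidates = [b for b in backups if b.taken_at <= recover_to]
    # Assumption: the newest usable backup minimizes the log range to apply.
    return max(candidates, key=lambda b: b.taken_at, default=None)


history = [
    Backup("IC", datetime(2013, 2, 1)),
    Backup("SLB", datetime(2013, 2, 10)),
    Backup("IC", datetime(2013, 2, 12)),
]
chosen = choose_restore_source(history, datetime(2013, 2, 11))
print(chosen.kind if chosen else "no usable backup")
```

A real tool would also weigh factors the slide mentions, such as indexes that must be rebuilt rather than restored.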
Customer Experience Using SLB Resource Assessment Tool
Customer Experience
 EXCP Consumption for Image Copies over 28 day period
– Top 5 systems

     IMS System         EXCPs
1.   IMS1         573,323,342
2.   IMS2         549,197,344
3.   IMS3         547,836,773
4.   IMS4         446,749,090
5.   IMS5         263,317,210

     DB2 System         EXCPs
1.   DB21          88,390,971
2.   DB22          85,007,495
3.   DB23          78,792,982
4.   DB24          53,788,217
5.   DB25          34,337,687
Customer Experience
 Minimizing EXCP Consumption
– Product using Fast Replication Technologies
– Offloads the backup processing
• From the CPU to the Storage Processor
– Reducing number of EXCPs results in:
• CPU reduction
• Reduced elapsed time
• Freed resources for other business processing
 EXCPs consumed today vs. Estimated EXCPs using SLBs
[Charts: IMS and DB2 panels]
Customer Experience
 Backup Processing
– 9 IMS systems
• More than 60 hours of elapsed time running Image Copy backups
– 15 DB2 systems
• More than 57 hours of elapsed time running Image Copy backups
Financials
 Projected Image Copy vs. SLB Cost Savings for IMS

SECTION A - Monthly Image Copy Costs
CPU and I/O Cost:
Total Image Copy CPU seconds                                   429,536
Total Image Copy EXCPs                                   3,316,752,222
Total CPU costs for Image Copies                            $50,685.21
Total EXCP cost for Image Copies                           $132,670.09
Total CPU and EXCP costs for Image Copies                  $183,355.30
Total annual cost of image copies                        $2,200,263.64

SECTION B - System Level Backup
CPU and EXCP Cost:
Per volume CPU seconds (default 0.023)                           0.023
Per volume EXCP (default 155)                                      155
Total CPU costs for specified number of volumes                 $19.74
Total EXCP costs for specified number of volumes                $45.09
Total CPU and EXCP costs                                        $64.82
Total Cost:
System level backups per day - 1 per day / per system                9
Weekly system level backup cost                              $4,083.82
Yearly system level backup cost                            $212,358.86
Total annual cost of system level backups                  $212,358.86

Note: Costs of CPU and EXCPs are agreed upon by Rocket Software and customer. Default values were used for the purpose of this assessment. CPU cost per second used is $0.118 and cost per 1,000 EXCPs used is $0.04.
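The arithmetic behind this slide can be sketched as follows, using the two rates quoted in the note ($0.118 per CPU second, $0.04 per 1,000 EXCPs) and the per-volume defaults from Section B. The function names are illustrative, and `volumes` is a hypothetical input because the slide does not state the volume count. Results may differ from the slide by a few cents, presumably because the slide's inputs were rounded.

```python
CPU_COST_PER_SECOND = 0.118    # USD per CPU second, from the slide note
EXCP_COST_PER_1000 = 0.04      # USD per 1,000 EXCPs, from the slide note


def resource_cost(cpu_seconds, excps):
    """Dollar cost of a given amount of CPU time and EXCPs."""
    return cpu_seconds * CPU_COST_PER_SECOND + excps / 1000 * EXCP_COST_PER_1000


def annual_slb_cost(volumes, systems, cpu_per_volume=0.023, excps_per_volume=155):
    """Yearly cost of one backup per day per system (7 days x 52 weeks).

    `volumes` is an assumed input: the slide does not give the volume count.
    """
    per_backup = resource_cost(volumes * cpu_per_volume, volumes * excps_per_volume)
    return per_backup * systems * 7 * 52


# Section A inputs for IMS, taken from the slide.
monthly_image_copy = resource_cost(429_536, 3_316_752_222)
print(f"Monthly image copy cost: ${monthly_image_copy:,.2f}")
print(f"Annual image copy cost:  ${12 * monthly_image_copy:,.2f}")
```

The weekly and yearly figures in Section B follow the same pattern: per-backup cost times 9 systems times 7 days, then times 52 weeks.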
Financials
 Projected Image Copy vs. SLB Cost Savings for DB2

SECTION A - Monthly Image Copy Costs
CPU and I/O Cost:
Total Image Copy CPU seconds (includes DBM1 Address Space Work)           94,640
Total Image Copy EXCPs (includes DBM1 Address Space Work)          1,244,865,263
Total CPU costs for Image Copies                                      $11,167.47
Total EXCP cost for Image Copies                                      $49,794.61
Total CPU and EXCP costs for Image Copies                             $60,962.08
Total annual cost of image copies                                    $731,544.92

SECTION B - System Level Backup
CPU and EXCP Cost:
Per volume CPU seconds (default 0.023)                                     0.023
  (This includes CPU from the system level backup and DB2 address space for the system level backup operation from testing performed at Rocket)
Per volume EXCP (default 155)                                                155
  (This includes EXCP from the system level backup and DB2 address space for the system level backup operation from testing performed at Rocket)
Total CPU costs for specified number of volumes                            $7.11
Total EXCP costs for specified number of volumes                          $16.24
Total CPU and EXCP costs                                                  $23.35
Total Cost:
System level backups per day - 1 per day / per system                         15
Weekly system level backup cost                                        $2,451.31
Yearly system level backup cost                                      $127,467.88
Total annual cost of system level backups                            $127,467.88

Note: Costs of CPU and EXCPs are agreed upon by Rocket Software and customer. Default values were used for the purpose of this assessment. CPU cost per second used is $0.118 and cost per 1,000 EXCPs used is $0.04.
Financial Summary
 Projected Image Copy vs. SLB Cost Savings Summary

– IMS
System level backup Versus Image Copy Savings
Estimated annual cost of image copies                                                              $2,200,263.64
Estimated savings by replacing 95% of image copies with system level backups                       $2,090,250.46
Estimated annual cost of image copies (retain 5% of image copies) when using system level backups    $110,013.18
Estimated annual cost using system level backup                                                      $212,358.86
Estimated annual cost using system level backup with remaining (5%) image copies                     $322,372.05
Total estimated annual savings using system level backups                                          $1,877,891.59

– DB2
System level backup Versus Image Copy Savings
Estimated annual cost of image copies                                                                $731,544.92
Estimated savings by replacing 95% of image copies with system level backups                         $694,967.67
Estimated annual cost of image copies (retain 5% of image copies) when using system level backups     $36,577.25
Estimated annual cost using system level backup                                                      $127,467.88
Estimated annual cost using system level backup with remaining (5%) image copies                     $164,045.13
Total estimated annual savings using system level backups                                            $567,499.79
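The summary figures follow from three steps: keep 5% of the image copy cost, add the annual SLB cost, and subtract the result from the original image copy cost. A minimal sketch, with hypothetical function and key names, that reproduces the savings lines to within a cent or two of rounding:

```python
def slb_savings(annual_image_copy_cost, annual_slb_cost, replaced_fraction=0.95):
    """Estimate savings when a fraction of image copies is replaced by SLBs."""
    retained = annual_image_copy_cost * (1 - replaced_fraction)
    new_annual_cost = annual_slb_cost + retained
    return {
        "retained_image_copy_cost": retained,
        "new_annual_cost": new_annual_cost,
        "annual_savings": annual_image_copy_cost - new_annual_cost,
    }


# Annual figures taken from the IMS and DB2 financial slides.
for name, image_copies, slb in [
    ("IMS", 2_200_263.64, 212_358.86),
    ("DB2", 731_544.92, 127_467.88),
]:
    result = slb_savings(image_copies, slb)
    print(f"{name}: estimated annual savings ${result['annual_savings']:,.2f}")
```

The 95% replacement rate is the deck's own assumption; passing a different `replaced_fraction` shows how sensitive the savings are to it.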
Exposing DFSMShsm Resource Utilization and Associated Costs
A Look Inside Your DFSMShsm
Costs
– What resources are used by DFSMShsm to perform scheduled and
requested work?
– Migration / Recall / Backup / Recycle
– Successful vs. Unsuccessful (failures)
– Data duplication
Efficiency
– Where can performance and configuration tuning help?
– Reduce failed migrations and backup failures
Savings
– Can reclaiming these resources save CPU?
– Lost tapes, questionable old DFSMShsm data, failed cycles, thrashing, etc.
DFSMShsm Migration Failures
 Data that won’t migrate
– HSM attempts to migrate the data sets every day, using both CPU and I/O until the process fails
– This can go on every day for months, even years because the
administrator is not aware that it’s failing
– These data sets remain on disk, occupying space that should have been
released for new allocations
 Why won’t the data migrate?
– Structural errors
– Not enough space on ML1
– Unknown DSORG or otherwise not manageable by HSM
Migration Failure Error Summary Example
Rc   Count   Message
05    9202   NO MIGRATION VOLUME AVAILABLE
06      15   DUPLICATE DSN IN MCDS
16       8   PRIMARY COPY READ ERROR
19     447   DATA SET IN USE
24       4   DATA SET NOT AVAILABLE FOR MIGRATION
37    1344   NO SPACE ON MIGRATION VOLUME
39       1   RACF PROCESSING ERROR
58      13   MIGRATION OR DBA DBU FAILED
82      24   TAPE MIGRATION UNSUPPORTED
99    3296   UNSUPPORTED DS
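A summary like this is most useful when ranked by share of total failures, since a few return codes usually dominate. The sketch below uses only the return codes and counts from the example; the function name is illustrative.

```python
# Return code -> failure count, copied from the example summary.
failures = {5: 9202, 6: 15, 16: 8, 19: 447, 24: 4,
            37: 1344, 39: 1, 58: 13, 82: 24, 99: 3296}


def rank_failures(counts):
    """Return (return_code, count, percent_of_total), largest count first."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(rc, n, 100 * n / total) for rc, n in ranked]


for rc, n, share in rank_failures(failures)[:3]:
    print(f"Rc {rc:02d}: {n:5d} failures ({share:.1f}% of total)")
```

In this example three return codes account for the large majority of the failed migrations, so fixing those conditions first recovers most of the wasted CPU and I/O.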
DFSMShsm Backup Failures
 Data that Fails Backup
– HSM attempts to back up the data every day
– Sometimes this goes on every day for months, even years, because the administrator is not aware that it’s failing
– These data sets may rely on HSM for backup
 Why can’t the data be backed up?
– Data sets are in use during backup
– Unknown DSORG or otherwise not manageable by HSM
– Errors in the data set
 Does this data really need to be backed up by HSM?
– Are multiple backups of the data occurring?
– Which backup is the right backup?
DFSMShsm Recall Failures
 Data that Fails Recall
– HSM must move the data from ML1 or ML2 storage back to primary DASD
– When a recall request fails, the requesting application may fail as well, causing an outage
– If the data set cannot be recalled and no backup copy exists, application disruption may occur until the situation is resolved
 Why do data recalls fail?
– Data sets are not migrated, not managed by HSM
– Users issue multiple recalls for the same data set
– Tape volume not available
DFSMShsm Data Thrashing Analysis
 Data Sets that are Thrashing
– Thrashing occurs when data is migrated and recalled repeatedly within a short period of time
• These data are typically production GDGs that are created earlier in the month and then used again in weekly or monthly processing
 Thrashing Costs in Terms of CPU
– HSM uses CPU and I/O to migrate and recall data, compressing and decompressing data from ML1
• The compression/decompression is all CPU
• Data migrated to ML2 uses both CPU and I/O; the data may be compressed by the hardware but not by DFSMShsm
• If data on ML2 is being recalled from physical tape, this typically takes longer (wall-clock time) than ML1
• A high number of recalls can place a burden on a virtual tape subsystem since the data has been de-staged to physical tape and must be re-staged into the cache
– Executing jobs (or TSO sessions) wait for recalls
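Thrashing as described above can be detected from a history of migrate and recall events. The sketch below assumes a simplified, hypothetical event record of (data set name, action, date), and treats a recall within 30 days of the preceding migration as one thrash cycle. Both the record layout and the thresholds are assumptions for illustration, not DFSMShsm definitions.

```python
from collections import defaultdict
from datetime import date, timedelta


def find_thrashing(events, window_days=30, min_cycles=2):
    """Return {data_set_name: cycles} for data sets migrated and recalled
    within `window_days` at least `min_cycles` times."""
    history_by_dsn = defaultdict(list)
    for dsn, action, day in events:
        history_by_dsn[dsn].append((day, action))

    window = timedelta(days=window_days)
    thrashing = {}
    for dsn, history in history_by_dsn.items():
        history.sort()                      # chronological order
        cycles = 0
        last_migrate = None
        for day, action in history:
            if action == "MIGRATE":
                last_migrate = day
            elif action == "RECALL" and last_migrate is not None:
                if day - last_migrate <= window:
                    cycles += 1             # migrated, then recalled soon after
                last_migrate = None
        if cycles >= min_cycles:
            thrashing[dsn] = cycles
    return thrashing


events = [
    ("PROD.GDG.G0001V00", "MIGRATE", date(2013, 1, 2)),
    ("PROD.GDG.G0001V00", "RECALL", date(2013, 1, 9)),
    ("PROD.GDG.G0001V00", "MIGRATE", date(2013, 1, 12)),
    ("PROD.GDG.G0001V00", "RECALL", date(2013, 1, 30)),
    ("ARCHIVE.OLD", "MIGRATE", date(2012, 1, 1)),
    ("ARCHIVE.OLD", "RECALL", date(2013, 1, 1)),
]
print(find_thrashing(events))
```

Data sets flagged this way are candidates for a longer residency on primary DASD, which avoids the repeated compression, decompression and recall waits.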
Managing Aged (Unreferenced) Data in DFSMShsm
Retaining Data in DFSMShsm
 DFSMShsm is a Life Cycle Management System
– It makes perfect sense to retain data in DFSMShsm until it expires; that
is why we have DFSMShsm!
– However, there is a substantial cost associated with it!
 Where are the costs?
– In daily RECYCLE
– In the daily backup of the DFSMShsm control data sets
– In duplicating ML2 tapes and/or a mirrored virtual tape subsystem
– In moving the data every 3 years or so to refresh storage media
Cost of Managing Aged Data in DFSMShsm
 Managing inactive data in DFSMShsm for long periods of time has a
cost in terms of daily CPU, I/O and Storage Resources
 Inactive data is data that is 2 years old or older and has not been
recalled (used) in 1 year or more
– CPU and I/O to RECYCLE the tapes (recycle typically runs daily)
• Recycle is the act of deleting expired data and moving non-expired data to another
tape
– Data Storage Costs
• DFSMShsm data is typically stored on DASD, physical TAPE or virtual tape
– DFSMShsm Backup and Reorganization Costs
• Every migrated data set has at least 2 CDS records; 3 if VSAM
• Every data set that is backed up has at least 2 CDS records plus 1 MCC record for
each backup copy
• DFSMShsm Control Data Sets are backed up daily; catalogs are backed up multiple
times per day
– Duplication of ML2 and Backup Data
• The cost of storing inactive data in DFSMShsm is further exacerbated by the duplication of this data for DR purposes
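The slide's definition of inactive data and its record-count rules translate directly into two small checks. This is a sketch with hypothetical function names: the thresholds come from the slide (2 years old, not recalled in 1 year), a year is approximated as 365 days, and the record count is only the lower bound the slide gives.

```python
from datetime import date


def is_inactive(created, last_referenced, today,
                min_age_days=2 * 365, min_idle_days=365):
    """Inactive: at least 2 years old and not recalled in 1 year or more."""
    old_enough = (today - created).days >= min_age_days
    idle_enough = (today - last_referenced).days >= min_idle_days
    return old_enough and idle_enough


def min_cds_records(migrated, is_vsam, backup_copies):
    """Lower bound on control data set records for one data set, per the slide."""
    records = 0
    if migrated:
        records += 3 if is_vsam else 2      # at least 2 records; 3 if VSAM
    if backup_copies > 0:
        records += 2 + backup_copies        # 2 records plus 1 MCC per copy
    return records


today = date(2013, 2, 12)
print(is_inactive(date(2010, 1, 1), date(2011, 6, 1), today))
print(min_cds_records(migrated=True, is_vsam=True, backup_copies=3))
```

Multiplying the per-data-set record count by the number of inactive data sets gives a rough sense of how much control data set content is backed up every day for data nobody uses.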
Cost of Managing Aged Data in DFSMShsm
 Bottom Line…the costs associated are:
– Daily recycle
– Daily backup of the control data sets
– Daily backup of catalogs
– Data duplication (remote mirroring for Business Continuity)
[Chart labels: Backups, Recycle]
In Addition…
 The majority of customers polled are now using a virtual tape
subsystem for DFSMShsm ML2 data
– More tape drives available
– Faster recalls (typically)
– Capability to mirror remotely
 A virtual tape subsystem has an average life span of just 3 years
– Data must be copied from one virtual tape subsystem to another every 3 years
• Data must be “migrated” to newer technology when disk is replaced
– This adds to the cost of data storage for long term retention
 Data that needs to be retained longer than three years will outlive the
virtual tape it’s stored on
– Data with long term storage requirements must be housed on media that can
support the requirement
• Generally, all tape media used in z/OS environments meet these requirements
• Current tape media has a 10-15 year life span
If You MUST Keep This Data…
 Data that was migrated more than 2 years ago and has not been recalled is a candidate for archival
– Possible solution is to use an Archive Manager
– Deletes entries from the MCDS, BCDS and Catalog
– Improves performance and saves CPU resources
 Benefits of archiving aged data
– HSM MCDS and BCDS record count reduction
– DASD space requirements reduction for CDSs and CDS backup copies
– Saved CPU and I/O from moving aged data from tape to tape during recycle
– Tape recycle activity and CPU time reduction
– Related data archived together; expires together
– Possible catalog record count reduction, catalog backup and CPU time reduction
Optimizing Your Batch Window
Performance Challenges are Increasing
 Online availability requirements are increasing
– High demand for information access and up to the minute data
 Service Level Agreements are more stringent
 Determining system-wide impact of application tuning
activities is difficult
 Batch windows are getting smaller
– Batch jobs get bottlenecked by extensive I/Os, blocking their ability to
run at peak speed
– Little time to optimize batch jobs
Why is Optimization of I/O Important?
 Growth and batch processing window constraints
 Business needs out of alignment with application design
 Legacy application integration with e-business
 Data center consolidations
 Extending business without the need to upgrade systems
Time = $$$
Optimization Saves Time
Batch Window Constraints
 Batch processing is composed of:
– CPU Cycles
– Memory
– I/O
How can I/O constraints be reduced to
improve batch elapsed time?
Reducing Batch I/O Constraints
 Look for an intelligent, intuitive and integrated optimization
tool that:
– Significantly reduces elapsed times of batch processing
– Reduces batch processing requirements
– Is storage platform independent
– Automatically enhances buffering to improve batch cycles
I/O Without Using an Optimization Product
[Diagram: on the HOST, a Program issues Get / Put against a small Buffer; EXCP drives many I/O operations to DASD]
 Inefficient I/O operations
 Relying on system defaults
 Low performance
 Improper tuning
 System is not utilized to its maximum capacity
 Lack of flexibility when change is required from sequential to random access (or vice versa)
I/O Using an Optimization Product
[Diagram: on the HOST, a Program issues Get / Put against a large Buffer; EXCP drives fewer I/O operations to DASD]
 Automatically adjusts the buffers
 No need for application modification
 Reduces number of I/Os dramatically
 Increases performance
 Frees system resources
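The relationship between buffer size and the number of I/O requests can be shown with a simple analogy. The sketch below counts sequential read requests against an in-memory stream; it is not a model of z/OS access methods or EXCP accounting, only an illustration that the request count falls roughly in proportion to the buffer size. The function name and sizes are arbitrary.

```python
import io


def count_reads(data, buffer_size):
    """Count the read requests needed to consume `data` sequentially."""
    stream = io.BytesIO(data)
    reads = 0
    while stream.read(buffer_size):     # an empty result means end of data
        reads += 1
    return reads


payload = bytes(1_000_000)              # 1 MB of sample data
for size in (4_096, 32_768, 262_144):
    print(f"buffer {size:>7} bytes: {count_reads(payload, size)} reads")
```

The program's Get / Put logic is unchanged in every case; only the buffering differs, which is why this kind of tuning needs no application modification.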
Optimization Product Functions
 Automated Batch I/O Tuning Solution
– Significantly improve system-wide performance for VSAM and non-VSAM batch processing
– Reductions of batch elapsed time in the range of 25-75%
– Benefits VSAM, non-VSAM (QSAM, BSAM) and VSAM loads
 Accomplishes this by:
– Reducing CPU overhead associated with I/O (EXCPs)
– Exploiting “locality of reference” principle in real storage
• Refers to ‘reuse of specific data, and/or resources, within a relatively small
time duration’
– Adapting NSR/LSR Buffering to changes in file processing
– Enabling VSAM LSR and Hiperspace for high level code
42
Customer Experience
 Results of I/O Optimization – Wall Clock Savings

Job               Without I/O     With I/O        Percent
                  Optimization    Optimization    Improvement
VSAM Job 1        00:00:12.06     00:00:02.20     81.76
VSAM Job 2        00:01:17.53     00:00:17.68     77.20
VSAM Job 3        00:01:38.01     00:00:19.05     80.56
Non-VSAM Job 1    00:00:11.97     00:00:06.36     46.87
Non-VSAM Job 2    00:00:11.74     00:00:06.44     45.14
Load Job 1        00:01:20.71     00:00:14.02     82.63
Load Job 2        00:00:23.03     00:00:05.08     77.94
Load Job 3        00:03:34.37     00:00:33.88     84.20
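The Percent Improvement column is the reduction relative to the unoptimized run. A short sketch, with illustrative function names, that converts the HH:MM:SS.ss values and recomputes the percentage:

```python
def to_seconds(hhmmss):
    """Convert an 'HH:MM:SS.ss' elapsed time to seconds."""
    hours, minutes, seconds = hhmmss.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def percent_improvement(before, after):
    """Reduction as a percentage of the original value."""
    return 100 * (before - after) / before


before = to_seconds("00:00:12.06")      # VSAM Job 1 without I/O optimization
after = to_seconds("00:00:02.20")       # VSAM Job 1 with I/O optimization
print(f"{percent_improvement(before, after):.2f}")
```

The same `percent_improvement` function applies to EXCP counts, where the before and after values are plain integers.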
Customer Experience
 Results of I/O Optimization – EXCP Savings

Job               EXCPs Without I/O    EXCPs With I/O    Percent
                  Optimization         Optimization      Improvement
VSAM Job 4                1,457,551           110,461    92
VSAM Job 5                  847,287            89,247    89
VSAM Job 6                2,589,771           334,058    87
Non-VSAM Job 3            4,839,708         1,995,825    58
Non-VSAM Job 4            3,800,729         1,454,560    61
Load Job 4                9,498,212           227,177    97
Load Job 5                8,665,813           205,981    97
Load Job 6                8,694,282           184,257    97