THE PERFORMANCE INFORMATION GAP PARADOX
ECC CONFERENCE 2012
Dr. Steve Guendert
Principal Engineer/Global Solutions Architect
Brocade Mainframe Solutions
Stephen.guendert@brocade.com
@DrSteveGuendert
© 2012 Brocade Communications Systems, Inc. Company Proprietary Information
Intros: Dr. Steve Guendert
• Brocade Principal Engineer/Solutions Architect, focused on the mainframe (System z)
• Academic
• Ph.D. in MIS, M.S. in MIS, MBA
• IEEE Senior Member
• ACM Senior Member
• Computer Measurement Group (CMG)
• Intl. Board of Directors 2011-present (Director of Publications)
• Storage Subject Chair 2007-2008
• SHARE
• Board of Directors/EDC Program Manager 2007-2011
• zNextGen co-founder
• Author
• CMG Editorial Review Board (ERB), zJournal ERB
• Has published over 40 papers for Brocade and in zJournal, CMG, NaSPA Technical Support, and Disaster Recovery Journal
• Industry experience
• IBM, McDATA, CNT, Brocade, end user
Agenda
• Abstract
• What is the Performance Information Gap Paradox?
• How/why did we get here?
• Hope on the horizon
• Starting to solve the paradox: performance monitoring basics
• Next steps and what is still needed
• Concluding thoughts and observations
Abstract
Soon-to-be-published article in zJournal
• System z I/O technology has made significant advances over the past five years. These advances are quite diverse, yet synergistic, ranging from the mainframe itself (processors, STIs, buses, channels) to the FICON directors, to the storage control units and devices. Speeds and feeds keep getting faster, and new technologies have emerged. Some of these new technologies include MIDAW, HyperPAVs, z High Performance FICON (zHPF), 16 Gbps FICON, FICON Dynamic Channel Path Management (DCM), and the PCIe I/O drawers. Even more changes are on the horizon. Accompanying changes have also occurred in SMF/RMF. Understanding the new technologies is crucial to your job, but so is understanding their performance using SMF/RMF.
What is the Performance Information Gap Paradox?
Paradox defined
• A paradox is a statement or group of statements that leads to a contradiction, or to a situation that (if true) defies logic or reason, similar to circular reasoning.
• Example: the classic time-travel grandfather paradox
Recent mainframe I/O and storage technology enhancements
• Mainframe itself
• New processors
• New internals: STIs, buses, I/O drawers
• Channel technology
• Dynamic Channel Path Management (DCM)
• z/OS FICON Discovery and Auto Configuration (zDAC)
• Storage
• Higher capacity, faster traditional disk and tape/virtual tape technology
• Higher capacity, more capable switching technology
• SSDs
• EAVs, HyperPAVs
Recent mainframe I/O and storage technology enhancements
• SAN
• Larger, faster switching devices
• Further distances (buffer credits), virtual fabrics
• Better management tools
• Protocol/combination
• z High Performance FICON (zHPF)
• MIDAW
Performance Information Gap Paradox
What is it?
• My job/role and what I do
• Conclusion from past five years
• Aforementioned technology advances
• Globally, we have a less-than-optimal understanding of the root functionality of this new technology
• Case in point: zHPF
• This is accompanied by a lack of understanding of the performance implications and/or performance management of the new technology.
• There’s the paradox
How/why did we get here?
“The four horsemen”
The 2007 Global Recession
BIOB
• Job cuts
• Training budgets cut
• Travel budgets cut
• Training-oriented conferences suffered
• SHARE
• CMG
Computer Science/MIS curriculum
• IBM Academic Initiative
• How many departments offer courses in performance management and/or capacity planning?
• “Queuing theory and Statistics is too boring”
• Students end up relying on OJT (on-the-job training) and reading on their own
The evil vendor conspiracy theory
• Advances in technology and decreasing hardware prices lead to “throwing more hardware at performance problems”
• The politician-Silicon Valley industrial complex.
Slow recognition of the budding problem
Lack of proactivity
• Blissful ignorance by vendors and end users?
• Short-sightedness and a short-run focus on quarterly results?
• Vendor personnel suffering from the same training issues?
Hope on the horizon
Feb-March 2012 China business trip
Solving the paradox
Basics/building blocks
What is a performance problem?
• Views vary widely, but they share the following:
• Unacceptably high resource usage
• Unacceptably high response/service time
• Numerically quantifiable pain
• In general:
• The workload is not getting the resources it needs to complete in a timely manner
• Resources are available, but are not obtained fast enough to meet the service level objective(s)
Solving a performance problem
• Acknowledge/identify the problem
• Root cause analysis/root cause determination
• If you cannot determine the root cause, you will not come up with the correct long-term solution
• Temporary “band-aid” fixes mask a symptom but don’t cure the disease
• The classic “blame the buffer credits” game
• In the mainframe world, we use SMF/RMF for this process
Mainframe I/O and storage performance monitoring basics
What’s performance analysis?
Service Level Agreement/Objective (SLA/SLO)
• Match the business needs to the subjective perceptions in IT
• Typically a contract that defines, describes, and enforces measurable specifics.
• Performance SLAs typically focus on average transaction response time:
• CPU, I/O, network, or all of the above
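A minimal sketch of checking an average-response-time SLO, assuming you already have per-transaction response times in milliseconds. The function name `meets_slo`, the sample values, and the 250 ms objective are hypothetical and for illustration only.

```python
from statistics import mean


def meets_slo(response_times_ms, objective_ms):
    """Check an average-response-time objective.

    Returns the average response time and whether it is at or below
    the objective. Inputs are plain numbers in milliseconds.
    """
    avg = mean(response_times_ms)
    return avg, avg <= objective_ms


# Hypothetical sample transaction response times (ms) and a hypothetical
# 250 ms objective; neither comes from a real SLA.
samples = [120.0, 180.0, 240.0, 300.0, 160.0]
avg, met = meets_slo(samples, objective_ms=250.0)
print(f"average = {avg:.1f} ms, objective met: {met}")
```

In this sample the average (200 ms) meets the objective even though one transaction took 300 ms. An SLA stated only as an average can hide individual slow transactions.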
What’s performance analysis?
• Performance analysis refers to the techniques and tools used to enforce the performance objectives defined in your SLAs/SLOs across your IT systems.
• The overall goal is to make the best use of your current resources to meet these objectives without excessive tuning efforts.
• RMF provides an interface to an installation’s System z environment, allowing the end user to accomplish this.
• RMF facilitates reporting and detailed measurements of an installation’s critical resources in its System z environment.
• RMF issues reports about performance problems as they occur, so that your installation can take action before the problems become critical.
RMF (Resource Measurement Facility)
Uses
• Determine that your system is running smoothly
• Detect system bottlenecks caused by contention for resources
• Identify the workload that is delayed and the reason for the delay
• Monitor system failures, system stalls, and failures of selected applications
• Evaluate the service your installation provides to different groups of users
Using RMF
How do I know where to look and what to look for?
• The next slide is a diagram representing a high-level view of a single zEnterprise 196, attached via multiple (non-cascaded) FICON directors to an enterprise-class DASD array.
• The DASD array in the diagram is a somewhat simplistic, generic illustration intended to represent the control unit, the devices, and the adapters connecting the DASD array to the FICON directors.
• Listed on the diagram are the RMF reports used with the various components of this environment.
• These are the most commonly used RMF reports for identification, root cause analysis, and resolution (tuning) of I/O-related performance problems in a modern mainframe environment.
[Diagram not reproduced: a single zEnterprise 196 attached via multiple non-cascaded FICON directors to an enterprise-class DASD array, annotated with the RMF reports used for each component]
The RMF reports
• SMF 78-3: RMF I/O Queuing Activity (IOQ).
• The I/O Queuing Activity report provides information on your installation’s I/O configuration and activity rate, queue lengths, and the percentages of time that one or more I/O components, grouped by logical control unit (LCU), were busy.
• SMF 73: RMF Channel Path Activity (CHAN).
• The Channel Path Activity report provides basic information about channel path use.
• It identifies each channel path by channel path identifier (CHPID) and channel path type.
• It also reports the total channel utilization by the entire mainframe, and the channel utilization by each individual LPAR.
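A minimal sketch of how the total and per-LPAR utilization figures might be used together, assuming the Channel Path Activity data has already been extracted into simple records. The `ChannelSample` fields are hypothetical, not SMF 73 field names. The 50% threshold and all sample values are also assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ChannelSample:
    """One CHPID line from a Channel Path Activity report, heavily simplified.

    Field names are hypothetical, not RMF/SMF 73 field names.
    """
    chpid: str
    chpid_type: str                 # e.g. "FC" for a native FICON channel
    total_util_pct: float           # utilization by the entire mainframe
    lpar_util_pct: dict = field(default_factory=dict)  # LPAR name -> %


def busy_channels(samples, threshold_pct):
    """Return (chpid, total %, busiest LPAR) for CHPIDs over the threshold,
    busiest channel first."""
    flagged = []
    for s in samples:
        if s.total_util_pct > threshold_pct:
            top_lpar = max(s.lpar_util_pct, key=s.lpar_util_pct.get, default=None)
            flagged.append((s.chpid, s.total_util_pct, top_lpar))
    return sorted(flagged, key=lambda t: t[1], reverse=True)


# Hypothetical interval data; the 50% threshold is an illustrative assumption.
samples = [
    ChannelSample("40", "FC", 62.0, {"PROD1": 45.0, "TEST1": 17.0}),
    ChannelSample("41", "FC", 28.0, {"PROD1": 20.0, "TEST1": 8.0}),
    ChannelSample("50", "FC", 71.5, {"PROD1": 30.0, "PROD2": 41.5}),
]
for chpid, util, lpar in busy_channels(samples, threshold_pct=50.0):
    print(f"CHPID {chpid}: {util:.1f}% busy, largest LPAR share: {lpar}")
```

Reporting the busiest LPAR alongside the total helps separate a channel that is generally overloaded from one that a single LPAR is driving hard.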
The RMF reports
• SMF 74-7: RMF FICON Director Activity (FCD).
• Provides useful capacity planning and troubleshooting information for identifying potential bottlenecks and switch latency at the individual port level.
• SMF 74-8: RMF ESS Link Statistics (ESS).
• Provides information and statistics on the utilization and performance of the individual adapters on the DASD array.
The RMF reports
• SMF 74-5: RMF Cache Activity (CACHE).
• Provides cache statistics on a subsystem basis as well as on a more detailed device-level basis.
• SMF 74-1: RMF Device Activity (DEVICE).
• Provides information for all devices in one or more device classes (tape, DASD), or for the other devices you specify in the DEVICE option.
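The value of the CACHE report's device-level detail, alongside its subsystem totals, can be illustrated with a read hit ratio calculation. This is a minimal sketch that assumes read hit and miss counts per volume are already available. The function names, dictionary keys, volume names, and counts are all hypothetical.

```python
def read_hit_ratio(hits, misses):
    """Fraction of read requests satisfied from cache (0.0 if no reads)."""
    total = hits + misses
    return hits / total if total else 0.0


def subsystem_and_device_ratios(volumes):
    """Return (subsystem-wide ratio, {volume: ratio}) from per-volume counts."""
    per_device = {
        vol: read_hit_ratio(c["read_hits"], c["read_misses"])
        for vol, c in volumes.items()
    }
    hits = sum(c["read_hits"] for c in volumes.values())
    misses = sum(c["read_misses"] for c in volumes.values())
    return read_hit_ratio(hits, misses), per_device


# Hypothetical read counts for three volumes behind one control unit.
volumes = {
    "VOL001": {"read_hits": 9_500, "read_misses": 500},
    "VOL002": {"read_hits": 4_000, "read_misses": 1_000},
    "VOL003": {"read_hits": 19_800, "read_misses": 200},
}
subsystem, per_device = subsystem_and_device_ratios(volumes)
worst = min(per_device, key=per_device.get)
print(f"subsystem read hit ratio: {subsystem:.1%}")
print(f"worst volume: {worst} at {per_device[worst]:.1%}")
```

Here the subsystem-wide ratio (about 95%) looks healthy, while one volume sits at 80%. Only the device-level view reveals that.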
Simple methodology
• The Device Activity report is the one you are likely to use first, and most often, in a performance analysis and troubleshooting situation, because it contains the response time information.
• From there, you “drill down” and narrow what you are looking at to find the root cause of the performance problem.
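DASD response time in the Device Activity report is commonly broken down into four components: IOS queue time (IOSQ), pending time (PEND), disconnect time (DISC), and connect time (CONN). A minimal sketch of the drill-down step follows. It finds the dominant component and suggests which of the reports above to examine next. The mapping in `NEXT_REPORTS` is a hypothetical rule of thumb, not an IBM-defined procedure, and the sample values are invented.

```python
# Rule-of-thumb mapping from the dominant response-time component to the
# reports you might examine next. This mapping is an illustrative
# assumption, not an IBM-defined rule.
NEXT_REPORTS = {
    "IOSQ": ["DEVICE", "IOQ"],       # queued in z/OS: check PAV/HyperPAV alias use
    "PEND": ["IOQ", "CHAN", "FCD"],  # channel, director port, or CU busy
    "DISC": ["CACHE"],               # e.g. cache read misses
    "CONN": ["CHAN", "ESS"],         # data transfer on channels and adapter links
}


def suggest_drill_down(components_ms):
    """Given {component: average ms} for one device, report the total
    response time, its dominant component, and where to look next."""
    total = sum(components_ms.values())
    dominant = max(components_ms, key=components_ms.get)
    share = components_ms[dominant] / total if total else 0.0
    return {
        "response_ms": total,
        "dominant": dominant,
        "share": share,
        "next_reports": NEXT_REPORTS[dominant],
    }


# Hypothetical averages for one volume from a Device Activity report.
sample = {"IOSQ": 0.1, "PEND": 0.2, "DISC": 2.4, "CONN": 0.5}
result = suggest_drill_down(sample)
print(f"response {result['response_ms']:.1f} ms, "
      f"{result['dominant']} is {result['share']:.0%}, "
      f"next: {', '.join(result['next_reports'])}")
```

In this sample, disconnect time accounts for most of the response time, which points toward the cache statistics rather than the channels.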
Next steps and what is still needed
• Add performance focused course(s) to curriculum
• At a minimum, have some lectures on the subject
• Encourage performance and capacity planning as a viable career option
• Get active in CMG and/or SHARE
• Use vendors to educate
• ACM and IEEE CS digital libraries
Concluding thoughts and observations
• We need to improve I/O and storage performance education
• We need a renewed focus on capacity planning and performance management as a valuable discipline and career field.
• We have the chance to do this with the new generation of mainframers (zNextGen)
• Follow-on articles in zJournal will focus on the specifics and details of the technologies and RMF records
Thank You