The twelve days of degradation - Individual CMG Regions and SIGs

Non-linear scaling of long running batch jobs:
“The twelve days of degradation”
Computer Measurement Group Conference 2011
Paper # 5002
Session #514
Chris B. Papineau
Software Architect
Oracle Corporation
Denver, CO
Introduction
• Forecast for today:
100% Cloud free
Introduction
• The Twelve Days of Christmas
• A Christmas carol which scales non-linearly
– Singing all twelve verses takes more than 12 times as long as
verse #1 alone
• Why? Each of the twelve verses takes progressively longer
to sing
– Each verse adds something new and includes all previous gifts
… four software fixes
… three hardware upgrades
… two OS patches
And a bug fix for your custom
program!
Introduction
Day #1: 1 gift
Day #2: 3 gifts
Day #6: 21 gifts
Day #12: 78 gifts
. . .
Day #n: n(n+1)/2 gifts
Introduction
TOTAL of all gifts, all 12 days:
$\sum_{n=1}^{12} \frac{n(n+1)}{2} = 364$ total gifts
…. But we digress…….
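The carol arithmetic can be checked with a few lines of C. This is a minimal sketch: the function names `gifts_on_day` and `total_gifts` are illustrative only, and the closed form used in the cross-check, n(n+1)(n+2)/6, is the standard sum of the first n triangular numbers.

```c
#include <assert.h>

/* Gifts sung on day n: 1 + 2 + ... + n = n(n+1)/2. */
unsigned long gifts_on_day(unsigned long n)
{
    return n * (n + 1) / 2;
}

/* Gifts sung over days 1..days, accumulated verse by verse. */
unsigned long total_gifts(unsigned long days)
{
    unsigned long total = 0;
    for (unsigned long n = 1; n <= days; ++n)
        total += gifts_on_day(n);
    /* Cross-check against the closed form days(days+1)(days+2)/6. */
    assert(total == days * (days + 1) * (days + 2) / 6);
    return total;
}
```

`gifts_on_day(12)` gives 78 and `total_gifts(12)` gives 364. The work per verse grows with the verse number, which is the same shape of growth the batch jobs turn out to have.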
Introduction
• This customer case study will address
ONE common cause of non-linearity in
large batch jobs
– It is NOT the only possible cause – just a fairly
common one
– Along the way, performance analysis
principles used will be highlighted
• The cause deals with resource leaks
– …as opposed to memory leaks
• The batch job in question comes from a
typical Enterprise Resource Planning
(ERP) package
Agenda
• Problem Definition
• Analysis
• Results
• Conclusion
Problem Definition
• “If you don't know where you're going, chances are you will
end up somewhere else.”
- Yogi Berra
Problem Definition
• The Problem
– Several batch programs from a typical ERP package
did not scale with increasing data input
– They are all Electronic Data Interchange (EDI) related
• Batch ‘A’ - creates EDI Receiving Advices
– 3000 records: 55m
– 6000 records: 396m
– 9000 records: 722m
• Batch ‘B’ - generates EDI Inbound Purchase Orders
– 3000 records: 8m
– 6000 records: 22m
– 9000 records: 110m
• Batch ‘C’ - processes EDI Inbound Purchase Orders
– 3000 records: 6m
– 6000 records: 22m
– 9000 records: 45m
Problem Definition
• The runtime rises
much more
dramatically than the
amount of input data.
• Batch ‘A’ was by far
the worst and most
disruptive offender
Figure: Data showing the non-linear scaling of batch jobs. Runtime (minutes) per test run:
Run   Records   Batch ‘B’ (R47011)   Batch ‘C’ (R4311Z1I)   Batch ‘A’ (R47071)
1     3000      9.37                 7.36                   55.56
2     3000      8.12                 6.19                   53.40
3     3000      8.00                 6.24                   54.18
4     6000      22.36                22.46                  396.05
5     9000      110.12               45.46                  722.39
Problem Definition
• Platform: IBM iSeries + DB2 DBMS
• Background on the ERP product in question:
– Business Logic in the batch programs is implemented in the C
language in the form of dynamically loaded functions stored in
Dynamically Linked Libraries (DLLs)
• These are called “Business Functions” (BSFNs)
• This is the “Application code”
– End users have access to the source code of the BSFNs
• Power users may modify the code, or create new BSFNs
– BSFNs, in turn, make calls to proprietary libraries of functions which
are part of the ERP infrastructure
• “Tools APIs” provide technical back-end functionality
– Data caching
– Database I/O
– SQL statement construction
– Middleware
– Data formatting
– Other assorted functionalities
Analysis
• You get a sharp pain in your eye when
drinking a cup of coffee…
• Do you:
– Run chemical tests on the coffee, send to a lab,
involve an ophthalmologist, check for food
allergies
OR….
– Take the spoon out of the cup?
Analysis
• Identify low-hanging fruit items first
– “When your mother tells you she loves you, get a second opinion”
- from Primal Fear
• Run conditions verified:
– Was the SAME data set used for each test?
• √ The input data range for all the runs (3000 / 6000 / 9000 records) was identical,
and the database was reset prior to each test – so these were repeatable and
reliable “apples-apples” tests.
– Was nothing else running during the tests?
• √ The system was quiescent before each run
• Operating system tuning
– IBM iSeries experts analyzed temp space, memory pools, paging, etc…
• √ No obvious hardware or OS tuning opportunities were found
– …although memory pools are a very common issue with UBEs on iSeries
Always validate and verify the data before pursuing more
detailed analysis
Analysis
• “Everybody Lies”
- Gregory House, M.D.
Analysis
• With the “low hanging fruit” items addressed, the next step was to turn to
First Principles of Performance Analysis:
– Where is the Code or System Spending its time?
See: “Performance Engineering Parables”
C. Papineau
CMG conference 2010
– This involves collection of profiling data for each batch job.
Performance First Principles: Where is the code and/or system
spending its time?
Analysis
• Numerous options exist for profiling tools, depending on operating
system, language, etc.
– Quantify
– Tprof
– Performance Explorer (PEX)
• The ERP package in question provides a customized debugging
feature which produces output in plain text format
– Enabled by a configuration file setting
– Debugging data output to a text file
• The Batch programs are instrumented to collect the following timestamped debugging data:
– BSFNs
– SQL
– Hierarchical BSFN Call Stack data
– Proprietary technical functions
• Middleware APIs
• Tools APIs
Analysis
• A tool called Performance Workbench (PW) is provided with the ERP product which can
parse this text-based data and produce HTML-based reports and call stacks as output
– The text based data can also be manipulated and analyzed using spreadsheets, Perl programs, or
other text parsing tools of choice
– So – the time-stamped ERP debugging output is used as custom profiling data
• The ERP product provided the custom instrumentation critical to this analysis.
But….
• Custom instrumentation / profiling data can be added to ANY software package to
which source code is available
– Format can be specified as per needs, in whatever manner is most useful to the desired purposes
– Plain text can be readily parsed into human readable format by C, Perl, VB, etc….
• For details on Performance Workbench – see Poster Session 598
– Thursday 5:00 PM - 6:30 PM Room: Woodrow Wilson
Plain text output can be used for instrumentation and targeted
custom profiling
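As an illustration of how little is needed for plain-text instrumentation, here is a minimal C sketch. The line layout (elapsed seconds, event, function name) and the helper names `format_trace_line` and `parse_trace_line` are assumptions made for this example; they are not the ERP product's actual log format or API.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical trace format: "<elapsed seconds> <event> <function name>". */
int format_trace_line(char *buf, size_t buflen, double elapsed_sec,
                      const char *event, const char *name)
{
    assert(buf != NULL && buflen > 0);
    /* Fixed-width columns keep the file easy to import into a spreadsheet. */
    return snprintf(buf, buflen, "%12.6f %-8s %s", elapsed_sec, event, name);
}

/* Parse a line written by format_trace_line(); returns 1 on success. */
int parse_trace_line(const char *line, double *elapsed_sec,
                     char event[16], char name[64])
{
    return sscanf(line, "%lf %15s %63s", elapsed_sec, event, name) == 3;
}
```

Writing one such line on entry to and exit from each function of interest is enough to reconstruct per-call runtimes later, using any text tool.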
Analysis
• Collect profiling logs – 3000 record case
– Run batch jobs with ERP debugging feature enabled
– The 3000 record case provided a sufficiently robust sample size without the
excess overhead of the 6000 or 9000 record cases
• PW was run against the plain text data, producing these outputs:
– HTML Code profile
– Hierarchical Call Stack of Business Functions
• Parse profiling logs with Excel spreadsheet, extract BSFN runtimes
– Columnar / fixed-width nature of the data made it a simple matter
to import to Excel and leverage graphical capabilities
• We will focus on Batch ‘A’ for the purposes of this study, since it was
the most severely degraded and illustrates the key analysis concepts
most clearly. However, similar procedures were applied to all three
jobs, yielding similar results.
Analysis
Performance Workbench profile
of batch job: BSFN call stack output
BSFN: R47011DataBaseUpdateSection()
Import Performance Workbench output into Excel –
graph BSFN timing trends
Individual listing of BSFN runtimes
Graphical analysis of BSFN runtimes
Parse the profiling data with Excel spreadsheet
Analysis
• The two BSFNs which consumed the majority of the time in Batch ‘A’ are
indicated in the PW profile below
• The BSFNs are called about 3000 times each
• This sample size warrants measuring the trend of those runtimes for the
duration of the run
• This is where Excel is useful….
Performance Workbench HTML profiling data – “Where is the code spending its time?”
Critical Business Functions
in profile of Batch Job ‘A’:
F4312EditLine
F4312EndDoc
Analysis
• The degradation seemed to be distributed throughout the
code, and not occurring in one area
• Here are the runtimes for the two key BSFNs for the 3000
record run, plotted as a function of elapsed batch time
– They clearly rise steadily over the course of the batch job:
[Figure: Business function runtimes steadily increase as batch program progresses (x-axis: Elapsed Time)]
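The study read the rising trend off Excel graphs. As a hedged alternative, the same trend can be quantified with a least-squares slope of per-call runtime against elapsed batch time. The function name `trend_slope` is illustrative; a clearly positive slope means the calls get slower as the job progresses.

```c
#include <assert.h>
#include <stddef.h>

/* Least-squares slope of y (per-call runtime) against x (elapsed batch time).
   Returns 0.0 when there are fewer than two points or all x values are equal. */
double trend_slope(const double *x, const double *y, size_t n)
{
    assert(x != NULL && y != NULL);
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (size_t i = 0; i < n; ++i) {
        sx += x[i];
        sy += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }
    double denom = (double)n * sxx - sx * sx;
    if (n < 2 || denom == 0.0)
        return 0.0;
    return ((double)n * sxy - sx * sy) / denom;
}
```

A healthy job would show a slope near zero for each BSFN. A leak of the kind described in this study shows up as a positive slope in every BSFN that touches the leaking resource.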
Analysis
• What could be causing such an “across the board”
degradation?
• What do all the critical parts of the BSFN code
have in common that would exhibit this uniform
rise in runtime?
• The question to ask is:
– What do these batch jobs DO ????
• A look at the profiling data shows a decisive
insight:
– These jobs are very heavy users of data caching
Analysis
The PW profiling data showed thousands of calls to
data caching APIs
• The ERP package in question provides a caching API set
which allows the developer to create lists of indexed data in
memory.
– These lists can be manipulated and modified in many of the same ways as a
database, e.g.: FETCH, DELETE, UPDATE, INSERT.
– There are also advanced features similar to those one would see in a Database
Management System (DBMS), such as cursors and multiple indexes.
• jdeCacheInit() initializes a cache object
• jdeCacheTerminate() releases it
Analysis
• A simple feature in the ERP product’s logging capability tracks the
caches whose handles have been initialized via jdeCacheInit()
but not released via jdeCacheTerminate()
– Many of the ERP product’s applications utilize caching, and the PW
tool + the debug instrumentation recognizes this fact
[Figure: A lengthy list of unterminated caches in the raw ERP debug logging]
Specific product knowledge is critical to performance analysis
of complex systems
Analysis
• de·us ex ma·chi·na
[dey-uh s eks mah-kuh-nuh, dee-uh s eks mak-uh-nuh] noun
– Any artificial or improbable device resolving the difficulties of a plot.
• If the leap to data caching seems a bit improbable:
– While the Performance Engineer does not have to possess expert level knowledge of
the code in question, he or she must have access to such a Subject Matter Expert
(SME).
– In this case, the insight of the cache APIs dominating the profile – provided by such
a SME – was a critical leap which shortened this performance analysis effort greatly.
– A code profile is data, but data is of limited use if it does not provide information.
"I should think you Jedi would have more respect for the
difference between knowledge and wisdom."
- Dexter Jettster, Star Wars: Attack of the Clones
Analysis
• Upon investigation of the debug logs, the crux of this issue
is clear:
– A very large number of uniquely named caches (up to tens of
thousands) are used in all three of these batch jobs
– These caches are not properly terminated during the course of
the job.
• Note that this is a different problem from either of the
following, and should not be confused with them:
– Large number of records in a single cache
– Large number of references to a single cache without terminating
• Neither of the above would cause the sort of phenomenon
we’re seeing….
Analysis
• When many caches are created but not destroyed when no longer
needed, the internal list of cache names in jdeCacheInit() gets
progressively longer, so the calls to jdeCacheInit() take longer
and longer.
• The jdeCacheInit() API is called everywhere, thus having the
effect of making “all” areas of the code quite uniformly slower as the
run progresses.
• This is what we call a “resource leak”
– The actual memory cost is not significant
Resource and memory leaks can have a critical impact on
Performance
Analysis
• The reason for the unique cache names is thread-safe code
– Two users must not be using the same cache in the same process
• This methodology itself is not what causes the problem
– Failing to destroy each uniquely named cache as soon as it is no
longer needed is the true problem
– This causes the unneeded cache names to accumulate in the internal
buffers of the jdeCacheInit() and jdeCacheTerminate() APIs
• …which in turn causes progressively longer calls over the course of processing
the data.
– The impact on the batch jobs over time is very significant.
– The true problem is the large cache name “footprint” – the total number
of unique cache names stored in the buffers at any given time must
remain small.
Analysis
• jdeCache dynamic runtime cache naming: BSFN Code sample:
/* Add (const JCHAR*)before szMathString*/
sprintf(szCacheName, _J("GeneralLedgerCache-%ls"), szUniqueJObID);
/* Use the new cache name */
CacheInitReslt = jdeCacheInit(hUser, hCache, szCacheName, WfIndex);
• Rather than hard coding a cache name, the name for each cache is created
by appending a unique number to a hardcoded string:
– The unique number could be job number, order number, or something else to do with
the business purposes of the app.
– In this case, the variable Job ID (szUniqueJObID) gets sprintf()’ed to a
hardcoded string, e.g. “GeneralLedgerCache-”
– For each record processed, a new unique cache name is dynamically
created
• This is only a problem when the reference count does not go to zero
when the program is finished with the cache, and the cache name
remains in the list.
– In this case – the practice leads to an extremely large number of distinct caches
– which causes the slowness in the bowels of the cache APIs
Analysis
• Why does the accumulation of cache names slow down the batch job
by such a large factor?
– All these thousands of cache names get stored in static internal buffers in
the tools cache APIs.
– They must be processed by jdeCacheInit() and
jdeCacheTerminate() each time.
– The larger the run, the more cache names, and the larger these
buffers get. This is the reason why larger runs become increasingly
unscalable and run longer by non-linear factors.
– There are two while loops in each call to jdeCacheInit() which become
longer and longer each time it is called to create or access a cache.
• jdeCacheInit() must loop through an ever increasing list of all existing cache
names each time a cache is created.
• jdeCacheTerminate(), similarly, must loop through the same bloated list to
find the proper cache to terminate
“Twelve drummers drumming, eleven pipers piping, ten lords-a-leaping….”
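A back-of-the-envelope model shows why the cost grows faster than the input. It rests on a simplifying assumption that is not taken from the Tools source: every call creates one new, never-terminated name and scans all names already in the list. Call number i then scans i-1 names, and the total over N calls is:

```latex
% Assumption: call i finds i-1 leaked names in the list and scans all of them.
\[
  C(N) \;=\; \sum_{i=1}^{N} (i-1) \;=\; \frac{N(N-1)}{2},
  \qquad
  \frac{C(2N)}{C(N)} \;=\; \frac{2(2N-1)}{N-1} \;\approx\; 4 \quad (N \gg 1).
\]
```

Under this model, doubling the input roughly quadruples the scanning work. This is the same n(n+1)/2 shape as the gifts in the carol. Measured runtimes include other costs, so they need not follow the model exactly.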
Analysis
• jdeCacheInit()
– If there is no existing cache of the same name:
creates the new cache and adds the name to the internal list
– If a cache of the indicated name already exists:
increments the reference count of the existing cache
– In either case – the entire list of cache names is searched sequentially
• jdeCacheTerminate()
– If the reference count is > 1: only decreases the reference count
– If the reference count = 1: the cache is actually destroyed and
removed from the list
– In either case – the entire list of cache names is searched sequentially
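The reference-counting behaviour just described can be sketched in C as a toy registry. Everything here (`CacheRegistry`, `cache_init`, `cache_terminate`, `run_batch`) is a hypothetical stand-in written for this example, not the Tools implementation. Unlike the behaviour described above, the sketch stops scanning at the first match, but a miss, which is what every new unique name produces, still walks the entire list.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the internal cache name list. */
typedef struct CacheEntry {
    char name[64];
    int refcount;
    struct CacheEntry *next;
} CacheEntry;

typedef struct {
    CacheEntry *head;
    unsigned long compares;   /* name comparisons performed so far */
} CacheRegistry;

/* Bump the reference count if the name exists, otherwise append it. */
int cache_init(CacheRegistry *r, const char *name)
{
    CacheEntry **link = &r->head;
    for (; *link != NULL; link = &(*link)->next) {
        r->compares++;
        if (strcmp((*link)->name, name) == 0)
            return ++(*link)->refcount;
    }
    CacheEntry *e = calloc(1, sizeof *e);
    if (e == NULL)
        return -1;
    snprintf(e->name, sizeof e->name, "%s", name);
    e->refcount = 1;
    *link = e;                /* append at the tail */
    return 1;
}

/* Drop one reference; unlink and free the entry when it reaches zero. */
int cache_terminate(CacheRegistry *r, const char *name)
{
    CacheEntry **link = &r->head;
    for (; *link != NULL; link = &(*link)->next) {
        r->compares++;
        if (strcmp((*link)->name, name) == 0) {
            CacheEntry *e = *link;
            if (--e->refcount > 0)
                return e->refcount;
            *link = e->next;
            free(e);
            return 0;
        }
    }
    return -1;                /* unknown name */
}

/* Process `records` records with one uniquely named cache each.
   Returns the total number of name comparisons. */
unsigned long run_batch(int records, int leak)
{
    CacheRegistry r = { NULL, 0 };
    char name[64];
    for (int i = 0; i < records; ++i) {
        snprintf(name, sizeof name, "GeneralLedgerCache-%d", i);
        int rc = cache_init(&r, name);
        assert(rc == 1);
        if (!leak)
            cache_terminate(&r, name);
    }
    while (r.head != NULL) {  /* release whatever was leaked */
        CacheEntry *e = r.head;
        r.head = e->next;
        free(e);
    }
    return r.compares;
}
```

With 1,000 records, `run_batch(1000, 1)` performs 499,500 comparisons, while `run_batch(1000, 0)` performs 1,000. Doubling the leaking run to 2,000 records gives 1,999,000 comparisons, about four times as many.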
Analysis
• jdeCacheInit() Tools API internals - code
segment
Loop through every cache object
Compare cache name with every other
cache stored in the list
Analysis
• Back to the ERP debug log …
• There is a repeating pattern of seven caches, each of
which has one instance for each EDI job number
• Each name is a static string with a variable (EDI job #)
appended
• Each of these cache names is stored in the internal
cache name list and must be scanned each time
jdeCacheInit() or jdeCacheTerminate() is called
• …The 21,000 days of Christmas - jdeCache() style
Analysis
• So where do these caches occur in the code?
– This question must be answered if the problem is to be fixed at the
application level.
• This can be determined via a customized string search of the
profiling data.
– Standard text parsing tools, such as those supporting regular expressions,
can be used for this task.
– However, the PW tool provides this functionality as yet another built-in
feature.
• The profiling data can be searched for user-specified strings, and an output file
is produced listing all the BSFNs containing those strings within their scope
• The hardcoded portions of the seven cache names in Batch ‘A’
can be entered as search strings in Performance Workbench to locate
the “unterminated cache” names indicated in the debug log.
Analysis
• The BSFNs containing the seven cache strings within
their scope are indicated, thus it is these BSFNs
which contain the offending unterminated caches.
• Note that a fundamental insight was necessary to
once again simplify the problem scope:
– Namely, the patterns in the cache names which
essentially reduced 21,000 names to seven.
– This is why software performance work is truly
engineering, not rote technician’s tasks.
It is NOT “linear thinking”
Performance analysis involves understanding the data and
deriving original insights, not just rote number crunching
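Grouping thousands of leaked names by their hardcoded prefix is a simple string operation. The sketch below assumes the names have already been extracted from the debug log into an array; `count_with_prefix` is a hypothetical helper written for this example and is not a Performance Workbench function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count how many cache names start with the given hardcoded prefix. */
size_t count_with_prefix(const char *const *names, size_t n, const char *prefix)
{
    assert(names != NULL && prefix != NULL);
    size_t len = strlen(prefix);
    size_t hits = 0;
    for (size_t i = 0; i < n; ++i)
        if (strncmp(names[i], prefix, len) == 0)
            ++hits;
    return hits;
}
```

Running this once per candidate prefix collapses a long list of unique names into a handful of patterns with counts. Matching on a known prefix, rather than stripping trailing digits, also handles names whose fixed part itself ends in digits.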
Analysis
• It should be noted here that the Performance
Workbench tool – customized to this ERP
product - has once again provided an insight vital
to the timely resolution of this issue.
– It has done so in a manner much more
straightforward than that of tools from a third-party
vendor.
– The lesson here is that customized time-stamped
profiling is often the decisive factor in quickly narrowing
down the source of performance problems in cases
where access to the source code is possible.
The use of the proper profiling tools is critical to performance
analysis. This may entail customized profiling and data parsing.
Analysis
• For the 3,000 record case:
– Caches in Batch ‘A’ which are not terminated: total of 21,070 unique caches in
seven patterns in three Business Functions
• BSFN: F4111ClearDetailStack (XT4111Z1)
– F41UI001-<number>
– X4111Z1GL-<number>
– X4111Z1LPACM-<number>
– X4111Z1QTY-<number>
– X4111Z1ACM-<number>
• BSFN: CacheProcessPOHeaderCache (B4302180)
– B4302180F1706<number>
• BSFN: B3003130 (B3003130)
– B3003130-<number>
– Caches in Batch ‘B’ which are not terminated: total of 558 unique caches in
one pattern in one Business Function
• BSFN: F41021DeleteCache (B4200370)
– B4200370C-<number>
– Caches in Batch ‘C’ which are not terminated: total of 3000 unique caches in
one pattern in one Business Function
• BSFN: CacheProcessConfigurationID (B3201470)
– B3201470<number>
Analysis
• Applications Solutions:
– Fix the BSFN code to include appropriate calls to
jdeCacheTerminate()
• This was the case for Batch ‘A’
.. OR, if the code logic dictates:
– Fix the BSFN code to eliminate the superfluous calls to
jdeCacheInit()
• In other words, did we REALLY need to initialize all those caches in the first place…
• This was the case for Batch ‘B’ and ‘C’
Analysis
• Pseudocode showing the essence of the fix for case 1
/* if cache cursor is NULL AND there are no records in the cache,
   terminate the pointer */
if (GeneralLedgerCacheCursor == NULL)
{
    if (jdeCacheGetNumRecords(hGeneralLedgerCache) == 0)
    {
        /* MISSING CALL to jdeCacheTerminateAll() */
        jdeCacheTerminate(hUser, hGeneralLedgerCache);
    }
}
So this is the answer to all
the riddles….
Analysis
• This leads us to another vital factor in
Performance Analysis: the ability to read and
understand the code.
• In order to comprehend details of the problem,
one does not need to be an expert at writing
code.
• However, being adept at grasping the nuances of
code written by others is essential.
– In other words, you do not need to take off and land the
plane, but you should be able to navigate in the air.
The ability to read and understand source code is very
important to determining the cause of Performance Problems
Analysis
• Elephant in the living room:
– The sequential search in the Tools code shown below
– Ordered data storage and efficient sort and search algorithms, such as
shell sort and binary search, are very fundamental “cookie
cutter” concepts. Why was this not addressed?????
Sequential search…YIKES!!!!
Loop through every cache object
Compare cache name with every other
cache stored in the list
Analysis
• The answer was driven by business reasons:
– Modifications to the Tools code layer take longer to deliver to
customers than those to the application level, since the Tools code
is proprietary and can only be changed by the vendor.
– Changes to Tools APIs impact every application which makes calls
to them, and so there is a much longer regression testing cycle
involved.
– Changes to application code, conversely, can be delivered quickly
and tested right in the customer’s environment.
– When a customer’s business cycle is impacted, the fastest means
of delivering a stable solution must be leveraged, so an application-level
solution to the problem was delivered
Science means solving Academic problems
Engineering means solving Business problems.
Analysis
• There actually WAS a Tools code change which resulted from this
effort, which will tend to “cover up” the application programming
oversight which caused this batch scalability predicament.
• This fix used a binary tree algorithm instead of the flawed sequential
search of a “flat” array in jdeCacheInit() and
jdeCacheTerminate().
– It was delivered at a later date, as a part of a regularly scheduled Tools
“service pack”.
• Application-side solution will always work, even without this Tools fix
• With this fix in place, however, ANY batch scaling problems caused by
this issue should improve dramatically
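To illustrate why a logarithmic lookup helps, here is a hedged C sketch. It deliberately substitutes a sorted array with binary search for the binary tree used in the actual fix, because that gives the same logarithmic number of name comparisons without tree-balancing code. All names (`SortedRegistry`, `sorted_init`, `sorted_terminate`, `sorted_leak_compares`) are hypothetical and not part of the Tools APIs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_CACHES 4096

typedef struct { char name[64]; int refcount; } Slot;

typedef struct {
    Slot slots[MAX_CACHES];   /* kept sorted by name */
    size_t count;
    unsigned long compares;   /* name comparisons performed so far */
} SortedRegistry;

/* Binary search: index of name, or its insertion point when *found == 0. */
static size_t find_slot(SortedRegistry *r, const char *name, int *found)
{
    size_t lo = 0, hi = r->count;
    *found = 0;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = strcmp(r->slots[mid].name, name);
        r->compares++;
        if (cmp == 0) { *found = 1; return mid; }
        if (cmp < 0) lo = mid + 1; else hi = mid;
    }
    return lo;
}

int sorted_init(SortedRegistry *r, const char *name)
{
    int found;
    size_t pos = find_slot(r, name, &found);
    if (found)
        return ++r->slots[pos].refcount;
    if (r->count == MAX_CACHES)
        return -1;
    /* Shift the tail up by one slot to keep the array sorted. */
    memmove(&r->slots[pos + 1], &r->slots[pos], (r->count - pos) * sizeof(Slot));
    snprintf(r->slots[pos].name, sizeof r->slots[pos].name, "%s", name);
    r->slots[pos].refcount = 1;
    r->count++;
    return 1;
}

int sorted_terminate(SortedRegistry *r, const char *name)
{
    int found;
    size_t pos = find_slot(r, name, &found);
    if (!found)
        return -1;
    if (--r->slots[pos].refcount > 0)
        return r->slots[pos].refcount;
    memmove(&r->slots[pos], &r->slots[pos + 1],
            (r->count - pos - 1) * sizeof(Slot));
    r->count--;
    return 0;
}

/* Leak `records` uniquely named caches; return total name comparisons. */
unsigned long sorted_leak_compares(int records)
{
    static SortedRegistry r;  /* static: large for a stack variable */
    char name[64];
    assert(records >= 0 && records <= MAX_CACHES);
    memset(&r, 0, sizeof r);
    for (int i = 0; i < records; ++i) {
        snprintf(name, sizeof name, "GeneralLedgerCache-%d", i);
        sorted_init(&r, name);
    }
    return r.compares;
}
```

For 1,000 leaked names, each lookup needs at most 10 comparisons, so the total stays at or below 10,000, against 499,500 for a full sequential scan on every call. The leak is still there, but its cost per call grows logarithmically instead of linearly. The array insert still shifts entries with `memmove`, which a tree avoids.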
Analysis
• Characterization of the
problem….
• Internal testing by the Tools
team using a test driver which
leaves varying numbers of
caches open:
• Throughput drops severely as a
function of the number of unique
cache names left in the internal
list
– If n is the caches leaked per
iteration
– The throughput falls off as
O(1/n)…at least
…the more unique cache
objects left open, the slower
the program runs
[Figure: Test Program: number of cache inits per sec (Windows). Y-axis: Throughput (init per sec), 0 to 2200. X-axis: # of cache objects left open, 0 to 20000.]
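One simple model consistent with the O(1/n) falloff is sketched below. It assumes each init call costs a fixed part a plus a scan part proportional to the number n of names left open. The constants a and b are illustrative and were not measured in this study.

```latex
% Assumption: per-call cost = fixed overhead a + scan cost b per open cache name.
\[
  T_{\text{init}}(n) = a + b\,n
  \quad\Longrightarrow\quad
  \text{throughput}(n) = \frac{1}{a + b\,n} = O\!\left(\frac{1}{n}\right).
\]
```

For small n the fixed overhead dominates and throughput looks flat. Once b·n outweighs a, throughput is roughly inversely proportional to the number of open caches.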
Analysis
• Have many customers complained about scalability
in these batch jobs?
– Customers may not notice it if they are running sufficiently low
volumes, even if the issue has long been present.
– Others may have noticed the problem, but their runtime windows
could be long enough so that it does not cause a business
disruption.
– Ergo – they don’t enter calls…
…performance is often in the
eye of the beholder
Results
• Results:
– Following the simple application code fix, the throughput of all three
batch programs was dramatically improved.
– Note that the runtime of the 9000 record use case for Batch ‘A’, on
which this paper has focused, was reduced from 722 minutes to 30
minutes
Runtime (minutes) before and after the fix:
Batch   # records   Before fix   After fix   % runtime reduction
‘A’     3000        55           14          74.5%
‘A’     6000        396          20          94.9%
‘A’     9000        722          30          95.8%   (The use case that REALLY mattered)
‘B’     3000        8            14          -75.0%
‘B’     6000        22           23          -4.5%
‘B’     9000        110          32          70.9%
‘C’     3000        6            4           33.3%
‘C’     6000        22           7           68.2%
‘C’     9000        45           13          71.1%
Anomalous results (the negative reductions for Batch ‘B’)...likely
due to data inconsistencies
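The “% runtime reduction” column follows directly from the before and after runtimes. A minimal sketch, with the illustrative function name `runtime_reduction_pct`:

```c
#include <assert.h>

/* Percent runtime reduction; negative when the run got slower. */
double runtime_reduction_pct(double before_min, double after_min)
{
    assert(before_min > 0.0);
    return 100.0 * (before_min - after_min) / before_min;
}
```

For Batch ‘A’ at 9000 records, `runtime_reduction_pct(722, 30)` is about 95.8. For Batch ‘B’ at 3000 records, `runtime_reduction_pct(8, 14)` is -75.0, which is the anomalous slowdown shown in the results.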
Results
• Terminating the cache objects clearly
solved a significant resource leak in these
batch programs.
• After the fix was applied, the batch runtimes
rose at a normal, roughly linear rate with the amount
of input data processed.
• The disruptions to the customer’s business
cycle were completely resolved.
Conclusions
• Always validate and verify the data before pursuing more detailed analysis
• Performance First Principles: Where is the code and/or system spending its time?
• Plain text output can be used for instrumentation and targeted profiling
• Specific product knowledge is critical to performance analysis of complex systems
• Resource and memory leaks can have a critical impact on performance
• Performance analysis involves understanding the data and deriving original insights,
not just rote number crunching
• The use of the proper profiling tools is critical to performance analysis. This may
entail customized profiling and data parsing.
• The ability to read and understand source code is very important to determining the
cause of Performance Problems
• Science means solving Academic problems
Engineering means solving Business problems.
Conclusions
• How many gifts would be involved in
singing The 100 Days of Christmas?
• The answer for day 100 is……5050
gifts!!!
P.S. – What the heck is a “lord-a-leaping”????