Java on z/OS: A Fresh Look

Scott Chapman
American Electric Power
Important notes
• I don’t really like Java as a language
• I’m not a Java expert
• Results presented herein may be installation-dependent
• There are a lot of moving parts here
• I understand there’s zAAP on zIIP
• “zAAP” used generically here
• All trademarks of IBM, Oracle, and everybody else hereby recognized
Why Java on z/OS?
Because programmers want to use it
http://xkcd.com/801/
Why Java on z/OS
• Because it enables open source projects that are cool/useful/interesting
• Key trick: run the JVM in ASCII
  ◦ -Dfile.encoding=ISO8859-1
  ◦ Many things will just run with that run-time option!
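One way to confirm the option took effect is to ask the JVM which default charset it ended up with. The following is a minimal sketch, not part of the original slides; the class name EncodingCheck is made up for illustration.

```java
import java.nio.charset.Charset;

// Hypothetical helper: reports which default encoding the JVM is using.
class EncodingCheck {
    // True when the given charset is ISO-8859-1 (Latin-1).
    static boolean isLatin1(Charset cs) {
        return cs.equals(Charset.forName("ISO-8859-1"));
    }

    public static void main(String[] args) {
        Charset cs = Charset.defaultCharset();
        System.out.println("file.encoding property: " + System.getProperty("file.encoding"));
        System.out.println("default charset:        " + cs.name());
        System.out.println("running in Latin-1:     " + isLatin1(cs));
    }
}
```

Running it once with and once without -Dfile.encoding=ISO8859-1 should show whether the JVM picked up the setting.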
What about a GUI?
• Turns out that just works too!
• Start Xming X server on your PC
  ◦ Check the “No Access Control” option
• Set the DISPLAY environment variable
• Run the code
S147774:/u/s147774: >export DISPLAY=10.97.131.15:0
S147774:/u/s147774: >java -Xmx320m -jar ga33.jar
Debugging JavaScript code running in Helma on the mainframe, with the GUI connected to Xming on my laptop
Works better than I expected
Why Java on z/OS
• Because it enables more programming language choices
• JavaScript built in to Java 6
  ◦ Rhino interpreter from Mozilla
• In theory, should be able to run any JVM-based language (I haven’t tested these)
  ◦ Jython
  ◦ Groovy
  ◦ Clojure
  ◦ Scala
  ◦ Ruby (via JRuby)
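The built-in JavaScript support is reached through the javax.script API that arrived with Java 6. The sketch below is an illustration rather than anything from the slides: the class name ScriptDemo is hypothetical, and it assumes the JVM may or may not have a JavaScript engine registered (recent JDKs no longer bundle one), so it checks for null instead of assuming Rhino is present.

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

// Hypothetical demo class: evaluates a JavaScript expression from Java.
class ScriptDemo {
    // Returns the result of the expression, or null when this JVM
    // has no JavaScript engine registered.
    static Object evalJs(String expression) {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("JavaScript");
        if (engine == null) {
            return null;
        }
        try {
            return engine.eval(expression);
        } catch (ScriptException e) {
            throw new IllegalStateException("script failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        Object result = evalJs("6 * 7");
        System.out.println(result == null
                ? "no JavaScript engine available in this JVM"
                : "6 * 7 = " + result);
    }
}
```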
Why Java on z/OS
• It may perform better
  ◦ If you are on a sub-capacity machine
• It may save you money
  ◦ Pretty unlikely
  ◦ Only if you can take some work away from your peaks
Which job is better?
How cheap are zAAP/zIIPs?
• $100K per specialty engine (z196, zEC12)
• How much is $100K?
• Consider adding 1 engine to a z196-710:
a) 710 = 10,250 MIPS, 1191 MSUs
b) 711 = 11,073 MIPS, 1286 MSUs
c) 710 + 1 zIIP = 10,302 + 1,000 MIPS
• z/OS (base) at this level costs $62/MSU
• Scenario B: z/OS base goes up almost $6K/month
• zIIP costs < 17 months of z/OS base
• Not to mention features, DB2, CICS, etc.
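The arithmetic behind the “almost $6K/month” and “< 17 months” figures can be checked with a few lines of Java. This is only a sketch of the slide's simplification: it assumes the flat $62/MSU rate applies to every additional MSU, and the class and method names are invented for illustration.

```java
// Hypothetical calculator for the specialty-engine break-even estimate.
class ZiipBreakEven {
    // Extra MSUs that scenario (b) adds over scenario (a).
    static int msuIncrease(int msusBefore, int msusAfter) {
        return msusAfter - msusBefore;
    }

    // Monthly software cost increase, assuming a flat per-MSU rate.
    static double monthlyIncrease(int msuDelta, double dollarsPerMsu) {
        return msuDelta * dollarsPerMsu;
    }

    // Months of that increase needed to equal the one-time engine price.
    static double breakEvenMonths(double engineCost, double monthlyIncrease) {
        return engineCost / monthlyIncrease;
    }

    public static void main(String[] args) {
        int delta = msuIncrease(1191, 1286);                 // 710 -> 711
        double monthly = monthlyIncrease(delta, 62.0);       // z/OS base only
        double months = breakEvenMonths(100_000.0, monthly); // $100K engine
        System.out.printf("MSU increase: %d%n", delta);
        System.out.printf("z/OS base increase per month: $%.0f%n", monthly);
        System.out.printf("Break-even: %.1f months%n", months);
    }
}
```

With the slide's numbers this works out to 95 MSUs, $5,890 a month, and just under 17 months, before counting any other MSU-priced software.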
What about accessing z/OS services?
• JZOS classes to easily access z/OS-specific constructs
  ◦ z/OS datasets
  ◦ RACF
  ◦ Respond to operator commands
  ◦ Access JES Spool
Ways to Run Java on z/OS
• WebSphere
• CICS
• DB2 Stored Procedures
• Batch
• Started Tasks
• Unix shell
Batch / Started Task options
• BPXBATC
  ◦ BPXBATCH (traditional alias)
  ◦ BPXBATSL (local spawn alias)
  ◦ Traditional approach
  ◦ Difficulty with 100-byte JCL PARM
• JZOS
  ◦ Ships with z/OS
  ◦ Avoids 100-byte PARM limit
  ◦ Adds a lot of flexibility
Measuring Java
zAAP vs. GCP time
• Watch the normalization factor!
  ◦ Most SMF values not normalized
  ◦ Tools/reports may normalize for you
• Consider IFAHONORPRIORITY=NO
  ◦ Avoid using GCPs to help zAAPs
  ◦ Can result in >99% of Java CPU time executed on zAAP
SDSF zAAP vs. GCP columns
JOBNAME   CPU-Time  GCP-Time  zAAP-Time  zACP-Time  zAAP-NTime
P3SR01BS   1514.11      9.53     772.02       2.26     1501.82
P3SR01AS   1706.50     12.82     868.75       1.95     1690.00
P3SR01B     788.55    197.66     281.64       1.53      547.87
P3SR01A     763.01    192.47     272.33       1.10      529.77
P3SR02A    2953.37    422.62    1188.79       5.39     2312.56
P3SR02B    3051.88    437.74    1226.02       6.55     2385.00
P3SR01AS   7281.39     62.56    3698.72      11.47     7195.17
P3SR02BS   2805.58    123.85    1316.22      22.15     2560.45
P3SR01BS   7783.21     63.38    3955.54      14.38     7694.77
P3SR02AS   2591.27    118.60    1216.36      10.74     2366.21
RTMSERVE   2661.39      3.85    1363.45       1.03     2652.34
CPU-Time = TCB + SRB
zAAP-Time = real
zACP-Time = zAAP on GCP
zAAP-NTime = normalized
This data comes from RMF
SMF 30 Accounting
• BPXBATCH vs. BPXBATSL vs. JZOS
• Important due to spawned OMVS tasks
• Single step job results:
  ◦ BPXBATSL: 1 step, 1 job record
  ◦ BPXBATCH: 6 steps, 4 job records
  ◦ CPU time collected on type OMVS records
  ◦ JZOS: 2 steps, 2 job records
  ◦ CPU time almost completely on JOB types
Some interesting calculations
zAAPn = SMF30_TIME_ON_IFA * SMF30ZNF / 256
Percent of work done on zAAP (“generosity” or “offload” factor) =
    zAAPn / (zAAPn + SMF30CPT + SMF30CPU)
Percent of zAAP work sent to GCP (“fallback” percentage) =
    SMF30_TIME_IFA_ON_CP / (SMF30_TIME_ON_IFA + SMF30_TIME_IFA_ON_CP)
Fallback can be <1%, although some fallback is normal and expected
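The calculations above translate directly into code. The sketch below is illustrative only: the class and method names are invented, and the sample inputs are stand-ins loosely based on the first row of the SDSF display earlier in the deck rather than real SMF 30 field values. In particular, the normalization factor of 498 is an assumption inferred from that row (1501.82 / 772.02 is about 498/256), and all GCP time is arbitrarily placed in the first CP term.

```java
// Hypothetical helper implementing the three calculations above.
class ZaapMath {
    // zAAPn = SMF30_TIME_ON_IFA * SMF30ZNF / 256
    static double normalizedZaap(double timeOnIfa, int znf) {
        return timeOnIfa * znf / 256.0;
    }

    // "Offload" factor: percent of the work that ran on zAAP.
    static double offloadPercent(double zaapN, double cpt, double cpu) {
        return 100.0 * zaapN / (zaapN + cpt + cpu);
    }

    // "Fallback": percent of zAAP-eligible time that ran on a GCP.
    static double fallbackPercent(double timeOnIfa, double ifaOnCp) {
        return 100.0 * ifaOnCp / (timeOnIfa + ifaOnCp);
    }

    public static void main(String[] args) {
        // Illustrative inputs in seconds; not taken from actual SMF records.
        double timeOnIfa = 772.02; // stands in for SMF30_TIME_ON_IFA
        double ifaOnCp = 2.26;     // stands in for SMF30_TIME_IFA_ON_CP
        double cpt = 9.53;         // stands in for SMF30CPT
        double cpu = 0.0;          // second CP term, left at zero here
        int znf = 498;             // assumed normalization factor

        double zaapN = normalizedZaap(timeOnIfa, znf);
        System.out.printf("zAAPn:    %.2f%n", zaapN);
        System.out.printf("offload:  %.2f%%%n", offloadPercent(zaapN, cpt, cpu));
        System.out.printf("fallback: %.2f%%%n", fallbackPercent(timeOnIfa, ifaOnCp));
    }
}
```

Because both percentages are ratios, the same code works whether the inputs are in seconds or in the raw units the SMF record carries, as long as all inputs use the same unit.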
Other SMF records
• RMF records
  ◦ Look for breakdown of processor types for both hardware and report / service classes
• WAS 120 records
  ◦ New subtype 9s for WAS 7+ much better!
• HIS type 113 records
  ◦ GCP vs. zAAP vs. zIIP
Java Performance
What about performance?
• Java on the mainframe has a history of performance problems
• Java is inherently “heavy” due to the JVM
• Scott’s Law: “The easier you make it on the programmer, the harder it is on the system”
• Today’s z hardware and software are up to the task!
  ◦ (But you probably want zAAPs!)
Heard at WAS Week 200x…
• “Our goal is to get JVM startup time down to about 1 second.”
• Seemed like a stretch at the time!
  ◦ WAS startup took several minutes
Today: WAS Servant Startup <1 min
15.49.15 STC14327 ---- MONDAY, 18 APR 2011 ----
15.49.15 STC14327 $HASP373 P3SR02AS STARTED
15.49.15 STC14327 IEFUSI BPXBATSL-P3ASRU ABOVE REGION SET TO 1536MB
15.49.15 STC14327 IEF403I P3SR02AS - STARTED - TIME=15.49.15
15.49.16 STC14327 +BBOO0004I WEBSPHERE FOR Z/OS SERVANT PROCESS
                  P3CELL/P3NODEA/P3SR02/P3SR02A IS STARTING.
15.49.16 STC14327 +BBOO0239I WEBSPHERE FOR Z/OS SERVANT PROCESS p3cell/p3nodea/p3sr02a IS
                  STARTING.
15.49.16 STC14327 +BBOO0308I SERVANT PROCESS P3CELL/P3NODEA/P3SR02/P3SR02A IS EXECUTING
                  IN 64-BIT ADDRESSING MODE.
15.49.16 STC14327 +BBOM0007I CURRENT CB SERVICE LEVEL IS build level 7.0.0.12
                  (cf121027.08) release WAS70.ZNATV date 07/09/10 11:02:02.
...
15.49.56 STC14327 +BBOO0222I: WSVR0001I: Server SERVANT PROCESS p3sr02a open for
                  e-business
15.49.57 STC14327 +BBOO0020I INITIALIZATION COMPLETE FOR WEBSPHERE FOR Z/OS SERVANT
                  PROCESS P3SR02A.
15.49.57 STC14327 +BBOO0248I INITIALIZATION COMPLETE FOR WEBSPHERE FOR Z/OS SERVANT
                  PROCESS P3CELL/P3NODEA/P3SR02/P3SR02A.
Not much in that particular servant
Today: HelloWorld in <2 seconds
10.08.55 JOB47259 IEF403I S147774B - STARTED - TIME=10.08.55
10.08.57 JOB47259 -                              --TIMINGS (MINS.)--            ----PAGING COUNTS---
10.08.57 JOB47259 -JOBNAME  STEPNAME PROCSTEP  RC  EXCP   CPU   SRB  CLOCK  SERV  PG  PAGE  SWAP  VIO
10.08.57 JOB47259 -S147774B RUNOMVS            00    59   .00   .00    .02  2524   0     0     0    0
10.08.57 JOB47259 IEF404I S147774B - ENDED - TIME=10.08.57
10.08.57 JOB47259 -S147774B ENDED.  NAME-BPXBATCH TEST  TOTAL CPU TIME= .00  TOTAL ELAPSED TIME= .02
10.08.57 JOB47259 $HASP395 S147774B ENDED
Output
Hello Scott
Java runtime: IBM Corporation 1.6.0, vm version 2.4
Running on: s390 z/OS 01.10.00
Running for: S147774
Classpath: /usr/lpp/java/J6.0/lib:/usr/lpp/java/IBM/J1.3/l
JCL
//RUNOMVS  EXEC PGM=BPXBATCH,
// PARM='SH java -Xms32M -Xmx32M HelloWorldApp Scott'
//SYSOUT   DD SYSOUT=*
//SYSPRINT DD SYSOUT=*
//SYSUDUMP DD SYSOUT=*
//STDENV   DD *
//STDOUT   DD SYSOUT=*
//STDERR   DD SYSOUT=*
z10 EC 504 with zAAP
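The source for HelloWorldApp is not included in the slides. The following is a guess at a program that would produce output of the same shape as shown above, assuming the values came from the standard Java system properties; the actual test program may have been written differently.

```java
// Reconstruction for illustration only; not the author's actual source.
class HelloWorldApp {
    // Greets the first command-line argument, or "World" if none is given.
    static String greeting(String[] args) {
        return "Hello " + (args.length > 0 ? args[0] : "World");
    }

    public static void main(String[] args) {
        System.out.println(greeting(args));
        System.out.println("Java runtime: " + System.getProperty("java.vendor")
                + " " + System.getProperty("java.version")
                + ", vm version " + System.getProperty("java.vm.version"));
        System.out.println("Running on: " + System.getProperty("os.arch")
                + " " + System.getProperty("os.name")
                + " " + System.getProperty("os.version"));
        System.out.println("Running for: " + System.getProperty("user.name"));
        System.out.println("Classpath: " + System.getProperty("java.class.path"));
    }
}
```

A program this small does almost no work of its own, which is why the job's elapsed time is a reasonable proxy for JVM startup cost.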
Small machine
10.51.53 JOB10901 IEF403I S147774B - STARTED - TIME=10.51.53
10.52.04 JOB10901 -                              --TIMINGS (MINS.)--            ----PAGING COUNTS---
10.52.04 JOB10901 -JOBNAME  STEPNAME PROCSTEP  RC  EXCP   CPU   SRB  CLOCK  SERV  PG  PAGE  SWAP  VIO
10.52.04 JOB10901 -S147774B RUNOMVS            00    86   .00   .00    .18  2252   0     0     0    0
10.52.04 JOB10901 IEF404I S147774B - ENDED - TIME=10.52.04
10.52.04 JOB10901 -S147774B ENDED.  NAME-BPXBATCH TEST  TOTAL CPU TIME= .00  TOTAL ELAPSED TIME= .18
10.52.04 JOB10901 $HASP395 S147774B ENDED
z10 BC E02 without zAAPs
Not surprising that ~50 MIPS engines can’t keep up with 450 / 900 MIPS engines
What about doing real work?
• Days of assuming it will run faster on your PC are over
  ◦ Have seen H2 perform better on z/OS
• Still, it is Java, it’s not CPU-free
• Performance may depend on:
  ◦ zAAP and GCP capacity
  ◦ System settings (USS, zFS, WLM)
  ◦ Application code
  ◦ Java settings (heap size, GC policy)
  ◦ Random luck
Application code
• Application code is always important
  ◦ Regardless of the language!
• BufferedReader or ZFile?
  ◦ Classic “it depends”
  ◦ BufferedReader seems like it should be faster
  ◦ But they provide different results: byte array vs. string
  ◦ What you want to do with the result may impact which is best for any given situation
• Java has lots of similar but slightly different ways of doing things
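The byte array vs. string distinction can be illustrated without z/OS. The sketch below is not a ZFile example: ZFile is part of JZOS and needs a z/OS system, so a plain InputStream stands in for the byte-oriented side here. The class name and the sample data are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical comparison of character-oriented and byte-oriented reads.
class BytesVsStrings {
    // Character-oriented: every line is decoded into a String.
    static int countLinesAsStrings(byte[] data) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.ISO_8859_1))) {
            int lines = 0;
            while (reader.readLine() != null) {
                lines++;
            }
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Byte-oriented: scans raw bytes for newlines, no decoding at all.
    static int countNewlineBytes(byte[] data) {
        try (InputStream in = new ByteArrayInputStream(data)) {
            byte[] buffer = new byte[4096];
            int newlines = 0;
            int read;
            while ((read = in.read(buffer)) != -1) {
                for (int i = 0; i < read; i++) {
                    if (buffer[i] == '\n') {
                        newlines++;
                    }
                }
            }
            return newlines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "line one\nline two\nline three\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("lines via BufferedReader:      " + countLinesAsStrings(data));
        System.out.println("newline bytes via InputStream: " + countNewlineBytes(data));
    }
}
```

If the program only needs to count or copy records, the byte path avoids creating a String per line; if it needs to parse text, the decoding has to happen somewhere anyway.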
Heap settings
• Heap settings always seen as an issue
• Size is the usual suggestion
  ◦ Is bigger always better?
  ◦ Does anybody know how much heap they really need? (no)
  ◦ Min / Max sizes same or different?
• Garbage collection policy options
Memory is an issue
• Java’s memory usage can be an issue
• “Requirements” for 100s of MBs are not unusual
  ◦ Often “requirements” seem to be a SWAG
• Java heap size can’t be reliably predicted from the code & expected volumetrics
• Test with reasonable numbers before assuming the requirements are real
  ◦ Be sure to get all processing scenarios!
Garbage Collection Options (IBM Java 6)
• optthruput – default
  ◦ Probably best for batch
• gencon – generational / concurrent
  ◦ Maybe good for large heap, transactional workloads (WAS)
• optavgpause – reduces long pauses
• subpool – “improved” object allocation
• For important workloads, may want to test all of them at various sizes
  ◦ Lots of other heap/GC options too
  ◦ See IBM JDK Diagnostics Guide!
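When comparing policies and heap sizes, it helps to have the program report what the collector actually did. The sketch below uses the standard java.lang.management beans, which are not specific to the IBM JVM. The class name and the allocation loop are made up for illustration, and a real test would run the actual workload instead. On the IBM JVM the policy is chosen with -Xgcpolicy:<name> and the heap bounds with -Xms / -Xmx.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical reporter: prints heap usage and GC activity for this JVM.
class HeapReport {
    // Sum of collection time across all collectors, in milliseconds.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when not supported
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // One-line summary of current heap usage in megabytes.
    static String describeHeap() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return String.format("heap used=%dMB committed=%dMB max=%dMB",
                heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
    }

    public static void main(String[] args) {
        // Create short-lived objects so the collector has something to do.
        long allocated = 0;
        for (int i = 0; i < 200_000; i++) {
            byte[] garbage = new byte[1024];
            allocated += garbage.length;
        }
        System.out.println("allocated bytes: " + allocated);
        System.out.println(describeHeap());
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": collections=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
        System.out.println("total GC ms: " + totalGcMillis());
    }
}
```

Running the same workload under each policy and heap size and comparing these figures alongside the SMF CPU numbers gives a rough picture of how much of the CPU difference is collector overhead.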
Heap size impact - Workload 1
[Chart: zAAPn seconds (axis 0 to 45) for Runs 1 to 5 at heap sizes 32MB, 64MB, 128MB, 256MB, and 512MB]
For some workloads, heap size may not matter
Heap size impact - Workload 2
[Chart: zAAPn seconds (axis 0 to 350) for Runs 1 to 5 at heap sizes 32MB, 64MB, 128MB, 256MB, and 512MB]
Too small a heap can cause a CPU increase
Variable vs. Fixed Heap size
[Chart: zAAPn seconds (axis 0 to 350) for Runs 1 to 5; series WL1 32MB, WL1 32-128MB, WL1 128MB, WL2 32MB, WL2 32-128MB, WL2 128MB]
There might be a slight benefit to a fixed heap size
GC Policy Comparison, Workload 2
[Chart: zAAPn seconds (axis 0 to 800) for Runs 1 to 5; series optthruput 128MB, optavgpause 128MB, subpool 128MB, gencon 128MB, optthruput 32MB, optavgpause 32MB, subpool 32MB]
Heap size most important, but GC policy also can be significant
Runtime options
[Chart: zAAPn seconds (axis 0 to 140) for Runs 1 to 5; series Baseline, jit:count=0, quickstart]
Don’t mess with the JIT!
Quickstart with trivial workload
[Chart: zAAPn seconds (axis 0 to 0.9) for Runs 1 to 5; series baseline, quickstart]
Could be good for certain workloads
So what’s the random thing?
• Much more variation in CPU time measurements with today’s CPUs
  ◦ Superscalar pipeline and cache issues
• Seems to impact my Java work more than I expected
  ◦ Consistently ran same workload
  ◦ Extremely lightly utilized LPAR
  ◦ Lightly utilized zAAPs
  ◦ Same variability over time
• So I tried some more tests…
Java Workload Variability
[Chart: CPU seconds (zAAPn + GCP, axis 0 to 200) over time from 17MAY11:07:45 to 20MAY11:04:45; series Workload1 32MB, Workload1 512MB, Workload1 REXX, Workload2 128MB, Workload2 512MB, plus Trivial 32MB on a secondary axis (CPU seconds for trivial workload, 0 to 2). Test periods marked: One zAAP, Two zAAPs, Zero zAAPs]
Why is this?
• I don’t know, but best guess is CPU cache and memory access effects
• But I thought I’d look at the 113 records to see if I could find anything interesting…
Processor Speed
[Chart: processor speed (axis 0 to 5000); Proc 0 = GCP, Proc 2 = zAAP. Data from test period 1 (One zAAP)]
Executed Instruction Rate
[Chart: executed instruction rate (axis 0 to 400); Proc 0 = GCP, Proc 2 = zAAP]
Seems to confirm our SMF30 data
Level 1 Miss Percentage
[Chart: Level 1 miss percentage (axis 0 to 5); Proc 0 = GCP, Proc 2 = zAAP]
Percent sourced from L1.5 Cache
[Chart: percent sourced from L1.5 cache (axis 0 to 100); Proc 0 = GCP, Proc 2 = zAAP]
L1.5 improvement corresponds to dip in machine usage
Percent TLB Miss of Total CPU
[Chart: percent TLB miss of total CPU (axis 0 to 50); Proc 0 = GCP, Proc 2 = zAAP]
Dip in GCP TLB miss overhead due to machine less busy
Estimated Cycles Per Instruction
[Chart: estimated CPI (axis 0 to 10) for Proc 0 = GCP and Proc 2 = zAAP; each split into Estimated Instruction Complexity CPI (ESTICCPI) and Estimated CPI from Finite Cache/Mem (ESTFINCP)]
My Guesses…
• My test Java workloads were too cache and superscalar friendly
  ◦ Perhaps makes it more susceptible to pipeline hazards
• But:
  ◦ Wouldn’t the REXX workload be even more superscalar and cache friendly?
  ◦ Why were the 113 measurements so consistent?
• Or Java is really doing variable amounts of work?
• Or… something isn’t right someplace?
• Takeaway: Java CPU measurements might be more variable than you expect
Most recent testing
• Repeated testing later in the year
  ◦ z/OS 1.12 vs. 1.10
  ◦ 1 year more recent Java 6 (Fall 2010 vs. Fall 2009)
• Still saw variability, but worst of it was closer to 25-30% instead of upwards of 75%
• Saw similar variability when testing on a z9 with zAAPs
• Saw at least one instance in a production LPAR with similar variability (in 3 executions of the same job, the 1st consumed just over half as much CPU as the later runs)
• Could not readily replicate on a WSC system running under z/VM
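To put a number on run-to-run variability, one simple measure is how far the most expensive run sits above the cheapest. The sketch below is an assumption about how such a figure might be computed; the slides do not say which formula produced the 25-30% and 75% numbers. The sample values are invented, chosen only to mimic a first run using just over half the CPU of the later runs.

```java
// Hypothetical helper for summarizing CPU-time variability across runs.
class CpuSpread {
    // Percentage by which the most expensive run exceeds the cheapest run.
    static double spreadPercent(double[] cpuSeconds) {
        if (cpuSeconds.length == 0) {
            throw new IllegalArgumentException("need at least one run");
        }
        double min = cpuSeconds[0];
        double max = cpuSeconds[0];
        for (double value : cpuSeconds) {
            min = Math.min(min, value);
            max = Math.max(max, value);
        }
        return 100.0 * (max - min) / min;
    }

    public static void main(String[] args) {
        // Made-up CPU seconds for three executions of the same job.
        double[] runs = {52.0, 100.0, 98.0};
        System.out.printf("spread across %d runs: %.1f%%%n",
                runs.length, spreadPercent(runs));
    }
}
```

Feeding it the normalized zAAP plus GCP seconds from repeated SMF 30 records for the same job is one way to track whether a given system shows this effect.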
Summary
• Java enables all sorts of cool things you might not have thought could run on the mainframe
• Mainframe’s Java performance not significantly worse than any other platform
  ◦ (Assuming adequate zAAP capacity)
• Lots of tuning knobs for Java
• Java CPU time measurements might be more variable