Smaller footprint for Java collections (Summary of Thesis)

Smaller Footprint for Java Collections
TechTalk – Yahoo TLV
Yuval Shimron
02/3/2016
Software Memory Bloat
The tendency of (newer) software to have a significantly larger memory footprint
◦ Without good reason
◦ LinkedList<LinkedList<Object>> in Java?
But memory is there (is it?), so why not use as much of it as we want?
◦ Moore’s law vs. Wirth’s law
◦ Microsoft Windows?
CS Technion
June 15th, 2011
2/46
Java & Java Collections Bloat
Java and the JVM hide hardware details
◦ Abstraction, garbage collection, etc.
◦ At a runtime and memory cost
Modern compilers offset the runtime cost
◦ Automatic optimizations, such as JIT compilation
Memory cost is a headache!
◦ A simple HashSet holding 3 ASCII characters (3 bytes) consumes 256 bytes!
◦ Memory-squandering allocation patterns are difficult to optimize automatically
◦ Negative impact on time performance, scalability and usability
My M.Sc. Research
Focuses on the JRE’s java.util.Map and java.util.Set implementations:
◦ java.util.Hash{Map, Set}
◦ java.util.Tree{Map, Set}
◦ Probably the most commonly used Java collections, apart from the java.util.List implementations (LinkedList, ArrayList)
Presents five ‘general use’ memory compaction techniques
Suggests alternative implementations of the above classes that apply these techniques
Main Results
Significant reduction of heap memory overhead
◦ Full compatibility with the existing implementations
◦ HashMap / HashSet
● Up to 50% reduction (common use) in HotSpot32 and up to 70% in HotSpot64
◦ TreeMap / TreeSet
● Up to 60% reduction in HotSpot32 and up to 75% in HotSpot64
Even better results are possible if the algorithm is changed
Runtime effect?
Memory Overhead per Entry
A minimal Set implementation must hold n references to its n keys; a minimal Map must hold 2n references to its n keys and n values
◦ 4 or 8 bytes per entry (32-bit JVM)
◦ 8 or 16 bytes per entry (64-bit JVM)
A practical (runtime-efficient) implementation must also include internal data structure(s) and other fields
The memory overhead refers to this additional data
Java’s Object Memory Model
Every Java object contains a header
◦ 8 bytes (HotSpot32) or 16 bytes (HotSpot64)
Object size is always a multiple of 8 bytes (8-byte alignment)
long, double: 8 bytes each
int: 4 bytes
short, char: 2 bytes each
byte, boolean: 1 byte each
References: 4 (HotSpot32) or 8 (HotSpot64) bytes
Array of length m: m*s bytes for the elements (s = size of an element)
◦ It’s an object, so it also has a header (8/16 bytes)
◦ Its length is also stored (an int field)
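Object Memory Model Example
class A {
    A next;
    int id;
}
class B extends A {
    byte b;
}
Size of new B()?
HotSpot32:
8 + 8 + 1 = 17, aligned up to 24 bytes
HotSpot64:
16 + 12 + 1 = 29, but B’s field starts only after A’s fields are padded to an 8-byte boundary (16 + 12 = 28, padded to 32), so 32 + 1 = 33, aligned up to 40 bytes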
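The arithmetic above can be replayed with a small calculator. This is a minimal sketch of the simplified model on these slides, not a measurement of a real JVM; the rule that a superclass's field block is padded to 8 bytes before the subclass's fields start is an assumption chosen because it reproduces the 40-byte HotSpot64 figure. The class name `ObjectSizeModel` is hypothetical.

```java
// Hypothetical sketch of the slides' simplified object-size model (not a JVM measurement).
// Assumption: each class's fields are padded to an 8-byte boundary before a subclass's
// fields start; the padding after the last class doubles as the object's 8-byte alignment.
final class ObjectSizeModel {
    static int align8(int n) {
        return (n + 7) & ~7; // round up to a multiple of 8
    }

    /**
     * @param header             object header size: 8 (HotSpot32) or 16 (HotSpot64)
     * @param fieldBytesPerClass bytes of fields declared by each class, superclass first
     */
    static int sizeOf(int header, int... fieldBytesPerClass) {
        int offset = header;
        for (int bytes : fieldBytesPerClass) {
            offset = align8(offset + bytes);
        }
        return offset;
    }

    public static void main(String[] args) {
        // class A { A next; int id; }   class B extends A { byte b; }
        System.out.println("new B(), HotSpot32: " + sizeOf(8, 4 + 4, 1));  // reference 4 + int 4, then byte
        System.out.println("new B(), HotSpot64: " + sizeOf(16, 8 + 4, 1)); // reference 8 + int 4, then byte
    }
}
```

Running `main` prints 24 and 40, matching the slide. Real layouts vary with the JVM version and flags (e.g. compressed references), so a layout tool such as OpenJDK's JOL is the better source of truth for actual numbers.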
java.util.HashMap
Figure: HashMap m with table Entry[] t of length 16; every slot is null except t[2]
m.put("architects", 34);
    int hash = hash("architects".hashCode());   // -137491710
    int i = hash & 15;                           // 2
    t[i] = new Entry(hash, key, value, t[i]);    // Entry(-137491710, "architects", 34, null)
m.put("advertisers", 735);
    int hash = hash("advertisers".hashCode());  // -1937791998
    int i = hash & 15;                           // 2
    t[i] = new Entry(hash, key, value, t[i]);    // Entry(-1937791998, "advertisers", 735, t[2])
Resulting bucket:
t[2] → Entry{hash = -1937791998, key = "advertisers", value = Integer(735), next} → Entry{hash = -137491710, key = "architects", value = Integer(34), next = null}
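The put sequence above is separate chaining: mask the hash to pick a slot, then prepend a new Entry to that slot's chain. Below is a minimal sketch of that scheme, assuming a fixed 16-slot table, no resizing and no null keys. The spreading function is an assumption (JDK 8 style), so its hash values differ from the slide's; `ChainedMapSketch` and `bucketSize` are hypothetical names.

```java
import java.util.Objects;

// Hypothetical sketch of HashMap-style separate chaining: fixed 16-slot table,
// no resizing, no null keys. The spreading function is an assumption, so the
// hash values differ from the ones shown on the slide.
final class ChainedMapSketch<K, V> {
    // One node per key, with the same four fields as the Entry in the figure.
    static final class Entry<K, V> {
        final int hash;
        final K key;
        V value;
        final Entry<K, V> next;

        Entry(int hash, K key, V value, Entry<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    final Entry<K, V>[] t = (Entry<K, V>[]) new Entry[16]; // length is a power of two

    static int hash(Object key) {
        int h = key.hashCode();
        return h ^ (h >>> 16); // mix high bits into the low bits used by the mask
    }

    void put(K key, V value) {
        int hash = hash(key);
        int i = hash & (t.length - 1); // same as hash & 15 for 16 slots
        for (Entry<K, V> e = t[i]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) {
                e.value = value; // existing key: replace its value
                return;
            }
        }
        t[i] = new Entry<>(hash, key, value, t[i]); // new key: prepend to the chain
    }

    V get(K key) {
        int hash = hash(key);
        for (Entry<K, V> e = t[hash & (t.length - 1)]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }

    int bucketSize(int i) {
        int n = 0;
        for (Entry<K, V> e = t[i]; e != null; e = e.next) {
            n++;
        }
        return n;
    }
}
```

For example, Integer keys 2 and 18 both land in slot 2 (18 & 15 == 2), so the second put prepends to the first, just as "advertisers" ends up in front of "architects" in the figure.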
Key Density in HashMap
Let n be the current number of keys
The length m of table t is doubled whenever n > m * LF
◦ LF is the load factor, an internal field (default 0.75)
Let p = n/m be the current density
◦ After the first resize (with the defaults):
● 0.375 < p <= 0.75
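The density bounds follow directly from the doubling rule: right after a resize, n = 0.75·m_old + 1 keys sit in 2·m_old slots, so p is just above 0.375, and p never exceeds LF. A minimal simulation of the slide's rule, assuming the default initial length 16 and LF = 0.75 (`DensitySimulation` is a hypothetical name):

```java
// Hypothetical simulation of the slide's resize rule: the table length m doubles
// whenever n > m * LF. Initial length 16 and LF = 0.75 are the assumed defaults.
final class DensitySimulation {
    // Returns {min, max} of p = n / m over all n after the first resize, up to maxKeys.
    static double[] densityRangeAfterFirstResize(int maxKeys) {
        final double lf = 0.75;
        int m = 16;
        boolean resized = false;
        double min = Double.MAX_VALUE;
        double max = 0;
        for (int n = 1; n <= maxKeys; n++) {
            if (n > m * lf) { // inserting the n-th key exceeds the threshold
                m *= 2;
                resized = true;
            }
            if (resized) {
                double p = (double) n / m;
                min = Math.min(min, p);
                max = Math.max(max, p);
            }
        }
        return new double[] {min, max};
    }

    public static void main(String[] args) {
        double[] r = densityRangeAfterFirstResize(1_000_000);
        System.out.println("min p = " + r[0] + ", max p = " + r[1]);
    }
}
```

The minimum creeps toward 0.375 from above as the table grows, and the maximum is exactly 0.75, reached just before each resize.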
Memory Consumption and Overhead of HashMap/HashSet
                        Overhead   Total      Overhead   Total
                        (32-bit)   (32-bit)   (64-bit)   (64-bit)
HashMap (bytes/key)     16 + 4/p   24 + 4/p   32 + 8/p   48 + 8/p
HashSet (bytes/key)     20 + 4/p   24 + 4/p   40 + 8/p   48 + 8/p
HashMap, p = 0.5        24         32         48         64
HashSet, p = 0.5        28         32         56         64
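The formulas in this table can be derived from the object model. A minimal sketch, assuming each entry has one int (hash) and three references (key, value, next), and that the table adds one reference per slot, i.e. ref/p bytes per key; `HashMapFootprint` is a hypothetical name:

```java
// Hypothetical sketch deriving the HashMap/HashSet per-key table from the object model.
// Assumptions: an entry holds an int hash plus three references (key, value, next),
// and each key pays for 1/p table slots, i.e. ref / p bytes.
final class HashMapFootprint {
    static int align8(int n) {
        return (n + 7) & ~7;
    }

    static double totalPerKey(int header, int ref, double p) {
        int entry = align8(header + 4 + 3 * ref); // 24 (HotSpot32) or 48 (HotSpot64)
        return entry + ref / p;
    }

    // Overhead = total minus the payload references: key and value for a map,
    // key only for a set (HashSet's shared dummy value reference counts as overhead).
    static double overheadPerKey(int header, int ref, double p, boolean isMap) {
        return totalPerKey(header, ref, p) - (isMap ? 2 : 1) * ref;
    }

    public static void main(String[] args) {
        double p = 0.5;
        System.out.println("HotSpot32: total " + totalPerKey(8, 4, p)
                + ", HashMap overhead " + overheadPerKey(8, 4, p, true)
                + ", HashSet overhead " + overheadPerKey(8, 4, p, false));
        System.out.println("HotSpot64: total " + totalPerKey(16, 8, p)
                + ", HashMap overhead " + overheadPerKey(16, 8, p, true)
                + ", HashSet overhead " + overheadPerKey(16, 8, p, false));
    }
}
```

At p = 0.5 this gives 32/24/28 bytes (HotSpot32) and 64/48/56 bytes (HotSpot64), the same as the bottom two rows of the table.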
Consumption and Overhead of HashMap/HashSet (HotSpot32)
Consumption and Overhead of HashMap/HashSet (HotSpot64)
Memory Compaction Techniques
Five techniques that compact the internal representation of collections
◦ Should usually be combined (where possible)
◦ The reduction depends on the specific JVM memory model
● Field alignment
● Size of types, e.g., byte in HotSpot vs. J9
(1) null pointer elimination
(2) boolean elimination
(3) Object fusion
(4) Fields pull-up
(5) Fields consolidation
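(1) null pointer elimination
Symptom: class C defines a pointer field p that is null in many of C’s instances
◦ Overhead of up to 8 bytes in both 32- and 64-bit JVMs
Solution: replace the field with a method returning null, and move the field to a subclass Cp used only by instances whose p is non-null
Before:
class C {
    P p;
}
After:
class C {
    P p() { return null; }
}
class Cp extends C {
    P p;
    P p() { return p; }
}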
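The transformation above needs one more ingredient in practice: a factory that picks the representation when the object is created. A minimal sketch using the slide's names C, Cp and P nested in a hypothetical holder class `NullPointerElimination`; it assumes p is fixed at construction, since giving an existing C a non-null p would mean replacing it with a Cp:

```java
// Hypothetical sketch of null pointer elimination with the slide's names C, Cp and P.
// Assumption: p is fixed at construction; a later non-null p would require replacing
// the C object with a Cp.
final class NullPointerElimination {
    static final class P {}

    static class C {
        P p() {
            return null; // no field at all: p is implicitly null
        }

        // Factory that picks the compact representation whenever possible.
        static C of(P p) {
            return p == null ? new C() : new Cp(p);
        }
    }

    static final class Cp extends C {
        private final P p; // only instances with a non-null p pay for the field

        Cp(P p) {
            this.p = p;
        }

        @Override
        P p() {
            return p;
        }
    }
}
```

Callers only ever use `p()`, so the choice of class stays an internal detail of the collection.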
(2) boolean elimination
Symptom: class C defines a single boolean field b
◦ Overhead of up to 8 bytes in both 32- and 64-bit JVMs
Solution: make C abstract and encode b in the choice of concrete subclass
Before:
class C {
    boolean b;
}
After:
abstract class C {
    abstract boolean b();
}
class Ct extends C {
    boolean b() { return true; }
}
class Cf extends C {
    boolean b() { return false; }
}
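A consequence of this technique is that b becomes immutable per object: changing it means creating an object of the other subclass. A minimal sketch with the slide's names C, Ct and Cf nested in a hypothetical holder `BooleanElimination`; the extra `key` field and the `withB` method are illustrative assumptions:

```java
// Hypothetical sketch of boolean elimination with the slide's names C, Ct and Cf.
// The key field and withB() are illustrative additions, not from the slides.
final class BooleanElimination {
    abstract static class C {
        final String key;

        C(String key) {
            this.key = key;
        }

        abstract boolean b(); // the class, not a field, encodes the boolean

        static C of(String key, boolean b) {
            return b ? new Ct(key) : new Cf(key);
        }

        // Changing b means replacing this object with one of the other subclass.
        C withB(boolean b) {
            return b == b() ? this : of(key, b);
        }
    }

    static final class Ct extends C {
        Ct(String key) {
            super(key);
        }

        @Override
        boolean b() {
            return true;
        }
    }

    static final class Cf extends C {
        Cf(String key) {
            super(key);
        }

        @Override
        boolean b() {
            return false;
        }
    }
}
```

In a linked structure, replacing an object also means updating every pointer to it, which is one reason this technique complicates operations such as recoloring tree nodes.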
(3) Object fusion
Symptom: class C holds an ownership pointer, a non-null field referring to an object of type C’ that belongs to C alone
◦ Overhead of an object header, the C → C’ pointer and perhaps a back pointer C’ → C
Solution: move the fields of C’ into C
Before:
class C {
    C’ c’ = new C’();
}
class C’ {
    A a;
    B b;
}
After:
class C {
    A a;
    B b;
}
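The saving from fusion can be made concrete with the simplified HotSpot32 model (8-byte header, 4-byte references, 8-byte alignment). A minimal sketch in a hypothetical holder `ObjectFusion`; C’ is spelled `CPrime` because Java identifiers cannot contain a prime, and `FusedC` stands for the "after" version of C:

```java
// Hypothetical sketch of object fusion. C' is spelled CPrime (Java identifiers cannot
// contain a prime); FusedC is the "after" version of C. Sizes use the simplified
// HotSpot32 model: 8-byte header, 4-byte references, 8-byte alignment.
final class ObjectFusion {
    static final class A {}
    static final class B {}

    // Before: every C owns exactly one CPrime, which is never null or shared.
    static final class C {
        final CPrime cPrime = new CPrime();

        A a() { return cPrime.a; }
        B b() { return cPrime.b; }
    }

    static final class CPrime {
        final A a = new A();
        final B b = new B();
    }

    // After: the fields of CPrime live directly in the fused class.
    static final class FusedC {
        final A a = new A();
        final B b = new B();

        A a() { return a; }
        B b() { return b; }
    }

    static int size32(int referenceFields) {
        return (8 + 4 * referenceFields + 7) & ~7;
    }

    public static void main(String[] args) {
        // The A and B objects are the same in both versions, so they are not counted.
        int before = size32(1) + size32(2); // C (one reference) + CPrime (two references)
        int after = size32(2);              // FusedC (two references)
        System.out.println("HotSpot32 bytes per C: " + before + " -> " + after);
    }
}
```

Under this model each C shrinks from 32 to 16 bytes: one object header and one reference disappear, plus alignment padding.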
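(4) Fields pull-up
Symptom: class C’ inherits from C, and C’s fields do not fully occupy their space due to alignment
◦ Possible alignment waste in C’
Solution: pull the field of C’ up into C, where it fills C’s alignment padding
Before:
class C {
    int i;
    short s;
}
class C’ extends C {
    byte b;
}
After:
class C {
    int i;
    short s;
    byte b;
}
class C’ extends C {
}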
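The effect of the pull-up can be checked with the same kind of size model used for the object-model example. A minimal sketch, assuming (as before) that a superclass's field block is padded to 8 bytes before the subclass's fields start; `FieldsPullUp` is a hypothetical name:

```java
// Hypothetical sketch quantifying fields pull-up for the slide's example.
// Assumed model: each class's field block is padded to 8 bytes before a subclass's
// fields begin, and the final object size is 8-byte aligned.
final class FieldsPullUp {
    static int align8(int n) {
        return (n + 7) & ~7;
    }

    // fieldBytesPerClass: bytes declared by each class, superclass first.
    static int sizeOf(int header, int... fieldBytesPerClass) {
        int offset = header;
        for (int bytes : fieldBytesPerClass) {
            offset = align8(offset + bytes);
        }
        return offset;
    }

    public static void main(String[] args) {
        for (int header : new int[] {8, 16}) {        // HotSpot32, HotSpot64
            int before = sizeOf(header, 4 + 2, 1);    // C {int i; short s;}, C' {byte b;}
            int after = sizeOf(header, 4 + 2 + 1, 0); // b pulled up into C; C' adds nothing
            System.out.println("header " + header + ": C' " + before + " -> " + after);
        }
    }
}
```

In both models C’ shrinks by 8 bytes (24 to 16, and 32 to 24), while plain C instances keep their size because b fits into padding they already had.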
(5) Fields consolidation
Symptom: the same field is defined in a large number of objects
◦ Memory waste due to object headers and possibly alignment
Solution: move the field out of the objects into a single array, indexed like the array that holds the objects
Before:
C[] cs → C { byte b; }, C { byte b; }, …
After:
C[] cs → C { }, C { }, …
byte[] bs → b, b, …   (bs[i] holds the b of cs[i])
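Fields consolidation is essentially a move from "array of objects" to "parallel arrays". A minimal sketch, assuming a toy C with no other fields; the names `FieldsConsolidation` and `CArray` are hypothetical, and the size estimate uses the simplified HotSpot32 model (8-byte header, 12-byte array header, 8-byte alignment):

```java
// Hypothetical sketch of fields consolidation: the byte b each element used to carry
// moves into one byte[] indexed like the element array. Sizes use the simplified
// HotSpot32 model (8-byte object header, 12-byte array header, 8-byte alignment).
final class FieldsConsolidation {
    // After consolidation C has no b field (this toy C has no other fields either).
    static final class C {}

    static final class CArray {
        final C[] cs;
        final byte[] bs; // bs[i] holds the value that used to be cs[i].b

        CArray(int n) {
            cs = new C[n];
            bs = new byte[n];
        }

        void set(int i, C c, byte b) {
            cs[i] = c;
            bs[i] = b;
        }

        byte b(int i) {
            return bs[i]; // the element's position replaces its own field
        }
    }

    static int align8(int n) {
        return (n + 7) & ~7;
    }

    public static void main(String[] args) {
        int n = 1000;
        // The cs array itself is identical in both layouts, so it is not counted.
        int before = n * align8(8 + 1);             // n objects, each with its own byte b
        int after = n * align8(8) + align8(12 + n); // n field-less objects + one byte[n]
        System.out.println("HotSpot32 bytes for " + n + " elements: " + before + " -> " + after);
    }
}
```

The trade-off is that b is only reachable through the element's index, which works naturally inside a collection (e.g. a hash table slot) but not for objects handed out to callers.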
Back to HashMap
Target: apply the techniques with minimal changes to the internal implementation
◦ Full compatibility is achievable
For maximal memory savings, a better understanding of hashing behavior is needed
◦ In particular, of the bucket size distribution
Bucket size distribution
Bucket size distribution
When p < 0.69, most table slots are empty!
◦ A lot of wasted memory
Most non-empty buckets are singletons
◦ Around 77% for p = 0.5
Buckets of size 2 account for most of the rest
◦ Around 19.3% for p = 0.5
We can’t reduce the fraction of empty buckets
◦ That would require a different hashing technique
But we can try to reduce the per-bucket overhead
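These figures agree with a simple idealized model: if keys hash uniformly at random, the number of keys in a bucket is approximately Poisson-distributed with mean p. This is our modeling assumption for illustration, not necessarily how the thesis obtained its numbers; `BucketDistribution` is a hypothetical name.

```java
// Hypothetical sketch, assuming keys hash uniformly at random: the number of keys in
// a bucket is then approximately Poisson-distributed with mean p (the density).
final class BucketDistribution {
    // Probability that a bucket holds exactly k keys: e^(-p) * p^k / k!
    static double poisson(double p, int k) {
        double term = Math.exp(-p);
        for (int i = 1; i <= k; i++) {
            term *= p / i;
        }
        return term;
    }

    // Fraction of non-empty buckets that hold exactly k keys (k >= 1).
    static double amongNonEmpty(double p, int k) {
        return poisson(p, k) / (1 - poisson(p, 0));
    }

    public static void main(String[] args) {
        double p = 0.5;
        System.out.printf("empty slots: %.1f%%%n", 100 * poisson(p, 0));
        System.out.printf("singletons: %.1f%% of non-empty buckets%n", 100 * amongNonEmpty(p, 1));
        System.out.printf("size 2: %.1f%% of non-empty buckets%n", 100 * amongNonEmpty(p, 2));
        // Most slots are empty exactly when e^(-p) > 1/2, i.e. when p < ln 2.
        System.out.printf("most slots empty when p < %.3f%n", Math.log(2));
    }
}
```

At p = 0.5 the model gives about 60.7% empty slots, 77.1% singletons and 19.3% pairs among non-empty buckets, and the "mostly empty" threshold is ln 2 ≈ 0.693, in line with the slide.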
“Fused” Buckets
Buckets of up to size 4 are “fused” into a single object
Bucket of size s has
objects
A single object represents most buckets at the default density
Combines all the memory compaction techniques
FHashMap
Figure: FHashMap m with Entry[] t and byte[] chv, both of length 16; every t slot is null and every chv slot is 0, except at index 2
m.put("architects", 34);
    int hash = hash1("architects".hashCode());   // -106482162
    int i = hash2(hash) & 15;                     // 2
    t[i] = new Entry(key, value);                 // Entry("architects", 34)
    chv[i] = (byte) hash;                         // chv[2] = 14
m.put("advertisers", 735);
    int hash = hash1("advertisers".hashCode());  // -2061004282
    int i = hash2(hash) & 15;                     // 2
    t[i].add(hash, key, value);                   // t[2].add(6, "advertisers", 735)
Resulting bucket (chv[2] = 14 holds the byte hash of key1):
t[2] → Entry{key1 = "architects", value1 = Integer(34), key2 = "advertisers", value2 = Integer(735), hash2 = 6, hash3 = 0, hash4 = 0, hash5 = 0, next = null}
“Squashed” Buckets
An extension of fields consolidation
◦ Adds an array for the values of each bucket’s first key
Buckets of size 1 now have a single field
◦ Which can be referenced directly from the table!
● Eliminating the object header and the reference to it
◦ This is the common case ☺
Downside: significant memory overhead due to empty buckets
◦ Mainly for lower values of p and on 64-bit JVMs
◦ But even then, still better than HashMap
Expected Saving in Memory Overhead – HotSpot32
Expected Saving in Memory Overhead – HotSpot64
Timing Benchmarks
Results are generally inconclusive and sometimes even inconsistent
◦ But they generally show a speedup (or a smaller slowdown) as the table grows, for both {S,F}HashMap
Nevertheless, they show:
◦ Significant slowdown for removals and iterations
◦ Some slowdown for retrievals in small tables
◦ Some speedup for retrievals in large tables
We did not see a significant speedup or slowdown in general benchmarks such as SPECjbb, SPECjvm and DaCapo
java.util.TreeMap
An implementation of the Map interface using a binary search tree
Red-black tree
◦ Leaves and the root are black
◦ Both children of a red node are black
◦ Every simple path from a node to its descendant leaves contains the same number of black nodes
◦ Height is bounded by 2·log₂(n + 1)
◦ Rotations during insertion/removal keep the tree balanced
In java.util.TreeMap
◦ A red-black tree leaf is represented as null
◦ A TreeMap “leaf” node is thus a node whose two children are both (black) null leaves
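The red-black properties listed above can be checked mechanically in one recursive pass that returns the black height. A minimal sketch in which, as in java.util.TreeMap, null stands for a black leaf; the `Node` class is hypothetical (not TreeMap.Entry), and key ordering is not checked:

```java
// Hypothetical sketch checking the slide's red-black properties on a tree whose
// leaves are null (counted as black), as in java.util.TreeMap. Node is a toy class,
// not TreeMap.Entry; binary-search ordering of keys is not checked.
final class RedBlackCheck {
    static final class Node {
        final int key;
        final boolean red;
        final Node left;
        final Node right;

        Node(int key, boolean red, Node left, Node right) {
            this.key = key;
            this.red = red;
            this.left = left;
            this.right = right;
        }
    }

    // Returns the black height of a valid tree, or -1 if a property is violated.
    static int blackHeight(Node root) {
        if (root != null && root.red) {
            return -1; // the root must be black
        }
        return check(root);
    }

    private static int check(Node n) {
        if (n == null) {
            return 1; // null leaf: black
        }
        if (n.red && (isRed(n.left) || isRed(n.right))) {
            return -1; // a red node must have black children
        }
        int l = check(n.left);
        int r = check(n.right);
        if (l < 0 || r < 0 || l != r) {
            return -1; // some paths differ in their number of black nodes
        }
        return l + (n.red ? 0 : 1);
    }

    private static boolean isRed(Node n) {
        return n != null && n.red;
    }
}
```

For example, a black root with two red, childless children is valid with black height 2; making the root red, or adding a red child under a red node, makes `blackHeight` return -1.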
TreeMap node
◦ References: key, value, left, right, parent
◦ boolean color
Memory Consumption and Overhead of TreeMap/TreeSet
                        Overhead   Total      Overhead   Total
                        (32-bit)   (32-bit)   (64-bit)   (64-bit)
TreeMap (bytes/key)     24         32         48         64
TreeSet (bytes/key)     28         32         56         64
Same overhead as HashMap/HashSet at p = 0.5
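These numbers follow from the node layout: TreeMap.Entry has five references (key, value, left, right, parent) and a boolean color, and TreeSet reuses the same entries with a shared dummy value. A minimal sketch under the slides' simplified object model; `TreeMapFootprint` is a hypothetical name:

```java
// Hypothetical sketch deriving the TreeMap/TreeSet table from the simplified object model:
// one entry per key with five references (key, value, left, right, parent) and a boolean.
final class TreeMapFootprint {
    static int align8(int n) {
        return (n + 7) & ~7;
    }

    static int entrySize(int header, int ref) {
        return align8(header + 5 * ref + 1);
    }

    public static void main(String[] args) {
        int[][] models = {{8, 4}, {16, 8}}; // {header, reference size}: HotSpot32, HotSpot64
        for (int[] m : models) {
            int total = entrySize(m[0], m[1]);
            System.out.println("total " + total
                    + ", TreeMap overhead " + (total - 2 * m[1]) // minus key and value references
                    + ", TreeSet overhead " + (total - m[1]));   // minus the key reference
        }
    }
}
```

This prints totals of 32 and 64 bytes with overheads 24/28 and 48/56, as in the table.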
Towards Overhead Reduction
In a typical red-black TreeMap:
◦ 14% of nodes have a single child, which is a leaf
◦ 9% of nodes have two leaf children
◦ 43% of nodes are leaves
◦ 34% of nodes are of other types
“Fused” Tree Nodes
Fuse together two kinds of node groups
◦ A parent node with a single leaf child
● 28% of nodes (parent and child: 2 × 14%)
◦ A parent node with two leaf children
● 27% of nodes (parent and both children: 3 × 9%)
Combines the null pointer elimination, boolean elimination and object fusion techniques
Techniques for Other Nodes
Remaining leaves (11% = 43% - 14% - 18%):
◦ null pointer elimination and boolean elimination
Other nodes (34%):
◦ boolean elimination
FTreeMap
Nine concrete classes of nodes
Implementation is complicated!
◦ Even with dynamic dispatch
◦ Tree rotations in insertion/removal now have many different cases
Implementation is slower (slowdown relative to TreeMap)
◦ Successful search: ~2%
◦ Insertion: ~20%
◦ Iteration: ~220%
◦ Removal: ~30%
Expected Memory Consumption and Overhead Saving – HotSpot32
                        Leaves   Parent w.   Parent w.   Others   Total
                                 one leaf    2 leaves
Fraction of nodes       11%      28%         27%         34%      100%
FTreeMap (bytes/key)    24       16          13.33       32       21.6
FTreeSet (bytes/key)    16       12          8           24       15.4
Memory overhead of 13.6 bytes per key for FTreeMap (21.6 minus 8 bytes for the key and value references)
◦ 43% less than TreeMap (24 bytes per key)
Memory overhead of 11.4 bytes per key for FTreeSet (15.4 minus 4 bytes for the key reference)
◦ 59% less than TreeSet (28 bytes per key)
Expected Memory Consumption and Overhead Saving – HotSpot64
                        Leaves   Parent w.   Parent w.   Others   Total
                                 one leaf    2 leaves
Fraction of nodes       11%      28%         27%         34%      100%
FTreeMap (bytes/key)    40       28          24          56       37.8
FTreeSet (bytes/key)    32       20          16          48       29.8
Memory overhead of 21.8 bytes per key for both FTreeMap and FTreeSet (37.8 minus 16, and 29.8 minus 8)
◦ 55% less than TreeMap (48 bytes per key)
◦ 61% less than TreeSet (56 bytes per key)
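Both FTreeMap tables are weighted averages: each node group's bytes per key, weighted by the group's fraction of nodes. In the sketch below, the per-object reference counts are our inference from the table values (they are not stated on the slides): color is eliminated everywhere, fused objects hold a parent pointer plus 2 or 3 keys (and values), leaves drop their child pointers, and "other" nodes keep left, right and parent. `FTreeFootprint` is a hypothetical name.

```java
// Hypothetical sketch reproducing the FTreeMap/FTreeSet tables. The per-object reference
// counts are inferred from the table values, not stated on the slides.
final class FTreeFootprint {
    static int align8(int n) {
        return (n + 7) & ~7;
    }

    // Node groups: leaves, parent w. one leaf, parent w. 2 leaves, others.
    static final double[] FRACTION = {0.11, 0.28, 0.27, 0.34};
    static final int[] KEYS = {1, 2, 3, 1};       // keys stored per object
    static final int[] OTHER_REFS = {1, 1, 1, 3}; // parent (plus left and right for "others")

    // Expected bytes per key: each group's bytes per key, weighted by its fraction of nodes.
    static double bytesPerKey(int header, int ref, boolean isMap) {
        double sum = 0;
        for (int g = 0; g < FRACTION.length; g++) {
            int refs = OTHER_REFS[g] + KEYS[g] * (isMap ? 2 : 1); // plus key (and value) refs
            sum += FRACTION[g] * align8(header + refs * ref) / KEYS[g];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[][] models = {{8, 4}, {16, 8}}; // {header, reference size}: HotSpot32, HotSpot64
        for (int[] m : models) {
            double map = bytesPerKey(m[0], m[1], true);
            double set = bytesPerKey(m[0], m[1], false);
            System.out.printf("FTreeMap %.2f (overhead %.2f), FTreeSet %.2f (overhead %.2f)%n",
                    map, map - 2 * m[1], set, set - m[1]);
        }
    }
}
```

It prints 21.60/13.60 and 15.44/11.44 bytes for HotSpot32, and 37.76/21.76 and 29.76/21.76 for HotSpot64, which round to the slides' figures.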
Consolidation of Tree Nodes
Apply consolidation to all fields of TreeMap.Entry
Nodes are now represented by integers (array indices)
Requires a resizing scheme for the internal arrays
◦ E.g.:
● Start with a power-of-two length
● Double whenever the keys array gets full
STreeMap
public class STreeMap<K, V> … {
// …
int root = -1;
K[] keys;
V[] vals;
int[] left;
int[] right;
int[] parent;
byte[] color;
// …
}
STreeMap Implementation
Relatively easy and “natural”
Requires minimal changes
◦ Replace each p.some_field expression with some_field[p]
● Where p is now an integer instead of a reference
◦ The value -1 is equivalent to a null reference
Timing benchmarks show almost identical results compared to java.util.TreeMap
◦ Small slowdown in iteration and removal
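The p.some_field → some_field[p] translation and the doubling scheme can be shown on a plain binary search tree. This is a hypothetical sketch, not STreeMap itself: red-black rebalancing (and therefore the color array) is omitted, removal is not supported, and keys and values are stored in Object[] arrays rather than the slide's K[]/V[] to avoid an unsafe generic array cast. `IndexTreeSketch` is a hypothetical name.

```java
import java.util.Arrays;

// Hypothetical sketch (not STreeMap itself): a binary search tree whose nodes are array
// indices, with -1 playing the role of null. Red-black rebalancing, and hence the color
// array, is omitted; removal is not supported.
final class IndexTreeSketch<K extends Comparable<K>, V> {
    static final int NIL = -1;
    int root = NIL;
    int size = 0;
    // Parallel arrays: slot p of each array holds one field of node p.
    Object[] keys = new Object[4]; // start with a power-of-two length
    Object[] vals = new Object[4];
    int[] left = new int[4];
    int[] right = new int[4];
    int[] parent = new int[4];

    @SuppressWarnings("unchecked")
    private K key(int p) {
        return (K) keys[p];
    }

    // Double every array whenever the keys array is full.
    private void growIfFull() {
        if (size < keys.length) {
            return;
        }
        int n = keys.length * 2;
        keys = Arrays.copyOf(keys, n);
        vals = Arrays.copyOf(vals, n);
        left = Arrays.copyOf(left, n);
        right = Arrays.copyOf(right, n);
        parent = Arrays.copyOf(parent, n);
    }

    void put(K k, V v) {
        int p = root;
        int par = NIL;
        int cmp = 0;
        while (p != NIL) { // p.left becomes left[p], p.right becomes right[p]
            par = p;
            cmp = k.compareTo(key(p));
            if (cmp == 0) {
                vals[p] = v; // existing key: replace its value
                return;
            }
            p = cmp < 0 ? left[p] : right[p];
        }
        growIfFull();
        int q = size++; // the new node is the next free index
        keys[q] = k;
        vals[q] = v;
        left[q] = NIL;
        right[q] = NIL;
        parent[q] = par;
        if (par == NIL) {
            root = q;
        } else if (cmp < 0) {
            left[par] = q;
        } else {
            right[par] = q;
        }
    }

    @SuppressWarnings("unchecked")
    V get(K k) {
        int p = root;
        while (p != NIL) {
            int cmp = k.compareTo(key(p));
            if (cmp == 0) {
                return (V) vals[p];
            }
            p = cmp < 0 ? left[p] : right[p];
        }
        return null;
    }
}
```

After 10 insertions the arrays have grown 4 → 8 → 16. Supporting removal would additionally need a way to reuse or compact freed indices, which this sketch does not attempt.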
Expected Memory Consumption and Overhead Saving
The discussion is for the optimal case
◦ Where the array length equals the number of keys
◦ E.g., if the number of keys is known at creation time
A more sophisticated analysis is called for
◦ Taking usage statistics into account
Expected Memory Consumption and Overhead Saving – HotSpot32
                        Overhead   Total
STreeMap (bytes/key)    13         21
STreeSet (bytes/key)    13         17
Memory overhead of 13 bytes per key for both STreeMap and STreeSet
◦ 46% less than TreeMap (24 bytes per key)
◦ 54% less than TreeSet (28 bytes per key)
Expected Memory Consumption and Overhead Saving – HotSpot64
                        Overhead   Total
STreeMap (bytes/key)    13         29
STreeSet (bytes/key)    13         21
Memory overhead of 13 bytes per key for both STreeMap and STreeSet
◦ 73% less than TreeMap (48 bytes per key)
◦ 75% less than TreeSet (56 bytes per key)
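In the optimal case, the STreeMap totals are one slot per key in each parallel array, with array headers ignored. A minimal sketch of that arithmetic; that STreeSet simply omits the values array is our inference from its totals, and `STreeFootprint` is a hypothetical name:

```java
// Hypothetical sketch of the optimal-case STreeMap/STreeSet arithmetic: array lengths
// equal the number of keys and array headers are ignored. That STreeSet has no values
// array is inferred from its totals.
final class STreeFootprint {
    static int bytesPerKey(int ref, boolean isMap) {
        int keys = ref;             // keys[] slot
        int vals = isMap ? ref : 0; // vals[] slot (map only)
        int links = 3 * 4;          // left[], right[], parent[]: int indices
        int color = 1;              // color[]: one byte
        return keys + vals + links + color;
    }

    public static void main(String[] args) {
        for (int ref : new int[] {4, 8}) { // HotSpot32, HotSpot64
            int map = bytesPerKey(ref, true);
            int set = bytesPerKey(ref, false);
            System.out.println("ref " + ref + ": STreeMap total " + map + ", overhead " + (map - 2 * ref)
                    + "; STreeSet total " + set + ", overhead " + (set - ref));
        }
    }
}
```

The overhead is 13 bytes in every case (three ints plus one byte) because int indices do not grow with the reference size, which is why the relative saving is larger on 64-bit JVMs.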
Conclusions
Reducing the overhead of common Java collections is practical, with minimal impact on performance
◦ A performance improvement is even possible
This justifies different implementations of Set and Map, unlike today (where HashSet and TreeSet are backed by a HashMap and a TreeMap)
◦ And perhaps even of Map on 32-bit vs. 64-bit JVMs, e.g., SHashMap vs. FHashMap
The performance impact is hard to predict
◦ Due to the many different usage scenarios
◦ Due to the difficulties of benchmarking Java
Future Research
Tiny collections
◦ Overhead is more significant
◦ Require different implementations
◦ Is it worthwhile?
Automation of compaction techniques
◦ Is it possible?
◦ Could reduce code duplication
Performance
◦ If memory overhead decreases so sharply, why are the implementations not significantly faster?
Things To Consider
Software bloat
◦ “Sickness” of present and future software
Programming in Java
◦ Memory overhead is a serious issue
◦ JVMs define different object memory models
● Which affects this overhead
◦ Collections are not perfectly implemented
● Sometimes an ad-hoc implementation would be a
better choice
The end… ☺