Smaller Footprint for Java Collections
TechTalk – Yahoo TLV
Yuval Shimron
02/3/2016

Software Memory Bloat
Tendency of (newer) computer software to have a significantly large memory footprint
  ◦ Without a good reason
  ◦ LinkedList<LinkedList<Object>> in Java?
But memory is there (is it?), so why not use as much of it as we want?
  ◦ Moore’s law vs. Wirth’s law
  ◦ Microsoft Windows?
CS Technion June 15th, 2011

Java & Java Collections Bloat
Java and the JVM hide hardware technicalities
  ◦ Abstraction, garbage collection, etc.
  ◦ Runtime & memory cost
Modern compilers offset the runtime cost
  ◦ Automatic optimizations, like JIT compilation
Memory cost is a headache!
  ◦ A simple HashSet holding 3 ASCII characters (3 bytes of data) consumes 256 bytes!
  ◦ Wasteful memory-allocation programming is difficult to optimize automatically
  ◦ Negative impact on time performance, scalability and usability

My M.Sc. Research
Focuses on the JRE java.util.Map and java.util.Set implementations:
  ◦ java.util.Hash{Map, Set}
  ◦ java.util.Tree{Map, Set}
  ◦ Probably the most commonly used Java collections, excluding java.util.List (LinkedList, ArrayList)
Presents five ‘general use’ memory compaction techniques
Suggests alternative implementations of the above classes that apply these techniques

Main Results
Significant reduction of the heap memory overhead
  ◦ Full compatibility with the existing implementation
  ◦ HashMap / HashSet
    ● Up to 50% reduction (common use) in HotSpot32 and up to 70% in HotSpot64
  ◦ TreeMap / TreeSet
    ● Up to 60% reduction in HotSpot32 and up to 75% in HotSpot64
Can do even better if we change the algorithm
Runtime effect?
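Where do per-key figures like these come from? They follow from the simple layout model spelled out in the object memory model slides: an 8-byte (HotSpot32) or 16-byte (HotSpot64) object header, 4- or 8-byte references, and object sizes rounded up to a multiple of 8. The Java sketch below is a hypothetical helper (the class FootprintModel and its methods are illustrative names, not part of the talk or of any JDK API) that applies exactly that model to HashMap's entry object (an int hash plus key, value and next references) and to TreeMap's entry object (key, value, left, right and parent references plus a boolean color). It is arithmetic under those stated assumptions, not a measurement; JVMs with compressed references or different field layouts will report other numbers.

```java
// Hypothetical helper: estimates shallow object sizes under the simplified
// HotSpot layout model used in this talk. Not a real JVM measurement.
class FootprintModel {

    // Assumed header and reference sizes for the two JVM flavors in the talk.
    enum Jvm {
        HOTSPOT32(8, 4),
        HOTSPOT64(16, 8);

        final int header; // object header size in bytes
        final int ref;    // reference size in bytes

        Jvm(int header, int ref) {
            this.header = header;
            this.ref = ref;
        }
    }

    // Rounds a positive byte count up to the next multiple of 8.
    static int align8(int bytes) {
        return (bytes + 7) & ~7;
    }

    // Shallow size of an object with the given number of reference fields
    // and the given total size of its primitive fields.
    static int objectSize(Jvm jvm, int refFields, int primitiveBytes) {
        return align8(jvm.header + refFields * jvm.ref + primitiveBytes);
    }

    // Total bytes per key for a HashMap at density p = n/m:
    // one entry object plus 1/p table slots (one reference each) per key.
    static double hashMapBytesPerKey(Jvm jvm, double p) {
        int entry = objectSize(jvm, 3, 4); // key, value, next + int hash
        return entry + jvm.ref / p;
    }

    public static void main(String[] args) {
        for (Jvm jvm : Jvm.values()) {
            int hashEntry = objectSize(jvm, 3, 4); // HashMap entry
            int treeEntry = objectSize(jvm, 5, 1); // TreeMap entry: 5 refs + boolean color
            System.out.println(jvm + ": HashMap.Entry=" + hashEntry
                    + " TreeMap.Entry=" + treeEntry
                    + " HashMap bytes/key at p=0.5=" + hashMapBytesPerKey(jvm, 0.5));
        }
    }
}
```

Under this model, main prints 24, 32 and 32.0 for HOTSPOT32 and 48, 64 and 64.0 for HOTSPOT64, which match the per-key totals given for HashMap (at p = 0.5) and TreeMap in the consumption tables.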
Memory Overhead per Entry
A minimal Set/Map implementation must hold n references to n keys, or 2n references to n keys and n values
  ◦ 4 or 8 bytes per entry (32-bit JVM)
  ◦ 8 or 16 bytes per entry (64-bit JVM)
A practical implementation (in terms of runtime) must also include some internal data structure(s) and other fields
The memory overhead refers to this additional data

Java’s Object Memory Model
Objects in Java always contain a header
  ◦ 8 bytes (HotSpot32) or 16 bytes (HotSpot64)
Object size is always 8-byte aligned
long, double: 8 bytes each
int: 4 bytes; byte, boolean: 1 byte each
References: 4 (HotSpot32) or 8 (HotSpot64) bytes
Array of length m: m*s bytes (s = size of an element)
  ◦ It’s an object, so it has a header (8/16 bytes)
  ◦ Its length is also stored (an int field)

Object Memory Model Example
  class A { A next; int id; }
  class B extends A { byte b; }
Size of new B()?
HotSpot32: 8 + 8 + 1 = 17, aligned up to 24 bytes
HotSpot64: 16 + 12 + 1 = 40 bytes (A’s part, 16 + 12 = 28 bytes, is padded to 32 before B’s byte field, so 33 is aligned up to 40)

java.util.HashMap
HashMap m holds Entry[] t (here of length 16: t[0] … t[15]); each non-empty slot points to a chain of Entry objects {hash, key, value, next}
Example:
  m.put("architects", 34);
    ◦ hash = hash("architects".hashCode()) = -137491710; i = hash & 15 = 2
    ◦ t[2] = new Entry(-137491710, "architects", 34, null)
  m.put("advertisers", 735);
    ◦ hash = hash("advertisers".hashCode()) = -1937791998; i = hash & 15 = 2
    ◦ t[2] = new Entry(-1937791998, "advertisers", 735, t[2])

Key Density in HashMap
We denote by n the current number of keys
The length m of t is doubled whenever n > m * LF
  ◦ LF (load factor) is an internal field (default 0.75)
We denote by p = n/m the current density
  ◦ After the first resize (default settings):
    ● 0.375 < p <= 0.75

Memory Consumption and Overhead of HashMap/HashSet
                       Overhead (32-bit)  Total (32-bit)  Overhead (64-bit)  Total (64-bit)
HashMap (bytes/key)    16 + 4/p           24 + 4/p        32 + 8/p           48 + 8/p
HashSet (bytes/key)    20 + 4/p           24 + 4/p        40 + 8/p           48 + 8/p
HashMap, p = 0.5       24                 32              48                 64
HashSet, p = 0.5       28                 32              56                 64
Total = one Entry object (24 bytes on HotSpot32, 48 on HotSpot64) plus 1/p table slots per key; HashSet is backed by a HashMap, so the totals match
Overhead = total minus the minimal 8/16 bytes per key (map) or 4/8 bytes per key (set)

[Chart: Consumption and Overhead of HashMap/HashSet (HotSpot32)]

[Chart: Consumption and Overhead of HashMap/HashSet (HotSpot64)]

Memory Compaction Techniques
Five techniques that compact the internal representation of collections
  ◦ Should usually be combined (where possible)
  ◦ Reduction depends on the specific JVM memory model
    ● Fields alignment
    ● Size of types, e.g.: byte in HotSpot vs. J9
The five techniques:
  ◦ null pointer elimination
  ◦ boolean elimination
  ◦ Object fusion
  ◦ Fields pull-up
  ◦ Fields consolidation

(1) null pointer elimination
Symptom: class C defines a pointer field p which is null in many of C’s instances
  ◦ Overhead of up to 8 bytes per instance in both 32- and 64-bit JVMs
Solution: only instances with a non-null p carry the field
  Before:
    class C { P p; }
  After:
    class C { P p() { return null; } }
    class Cp extends C { P p; P p() { return p; } }

(2) boolean elimination
Symptom: class C defines a single boolean field b
  ◦ Overhead of up to 8 bytes per instance in both 32- and 64-bit JVMs
Solution: encode the value of b in the concrete class of the object
  Before:
    class C { boolean b; }
  After:
    abstract class C { abstract boolean b(); }
    class Ct extends C { boolean b() { return true; } }
    class Cf extends C { boolean b() { return false; } }

(3) Object fusion
Symptom: class C defines an ownership pointer in a non-null field of type C’
  ◦ Overhead of the C’ object header, the C → C’ pointer, and perhaps a back pointer C’ → C
Solution: inline the fields of C’ into C
  Before:
    class C { C’ c’ = new C’(); }
    class C’ { A a; B b; }
  After:
    class C { A a; B b; }

(4) Fields pull-up
Symptom: class C’ inherits from C, and C is not fully occupied due to alignment
  ◦ Possible alignment waste in C’
Solution: move fields of C’ up into the alignment padding of C
  Before:
    class C { int i; short s; }
    class C’ extends C { byte b; }
  After:
    class C { int i; short s; byte b; }
    class C’ extends C { }

(5) Fields consolidation
Symptom: the same field is defined in a large number of objects
  ◦ Memory waste due to object headers and possibly alignment
Solution: move the field out of the objects into an array parallel to C[] cs
  Before: C[] cs, where every C object holds its own byte b
  After: C[] cs (C no longer holds b) plus byte[] bs, where bs[i] holds the b of cs[i]

Back to HashMap
Target: apply the techniques with minimal changes to the internal implementation
  ◦ Full compatibility is achievable
For maximal memory savings, a better understanding of hashing behavior is needed
  ◦ In particular, of the bucket size distribution

Bucket size distribution
[Chart: distribution of bucket sizes]
When p < 0.69, most table entries are empty!
  ◦ A lot of wasted memory
Most of the non-empty buckets are singletons
  ◦ Around 77% for p = 0.5
Buckets of size 2 make up most of the rest
  ◦ Around 19.3% for p = 0.5
We can’t reduce the fraction of empty buckets
  ◦ That would require a different hashing technique
But we can try to reduce the bucket overhead

“Fused” Buckets
Buckets of up to size 4 are “fused” together
A bucket of size s has … objects
A single object represents most buckets at the default density
Combination of all the memory compaction techniques

FHashMap
FHashMap m holds Entry[] t and byte[] chv, where chv[i] stores (byte) hash of the first key in bucket i
Example:
  m.put("architects", 34);
    ◦ hash = hash1("architects".hashCode()) = -106482162; i = hash2(hash) & 15 = 2
    ◦ t[2] = new Entry("architects", 34); chv[2] = (byte) hash = 14
  m.put("advertisers", 735);
    ◦ hash = hash1("advertisers".hashCode()) = -2061004282; i = 2
    ◦ t[2].add(6, "advertisers", 735), where 6 = (byte) hash
  The fused entry now holds key1 = "architects", value1 = 34, key2 = "advertisers", value2 = 735, hash2 = 6 (hash3 = hash4 = hash5 = 0), next = null

“Squashed” Buckets
An extension of fields consolidation
  ◦ Add an array for the values of the first key in each bucket
Buckets of size 1 now have a single field
  ◦ They can be referenced directly from the table!
    ● Eliminating the object header and the reference to it
  ◦ This is the common case ☺
Bad: significant memory overhead due to empty buckets
  ◦ Mainly for lower values of p and on 64-bit JVMs
  ◦ But even then, still better than HashMap

[Chart: Expected Saving in Memory Overhead – HotSpot32]

[Chart: Expected Saving in Memory Overhead – HotSpot64]

Timing Benchmarks
Results are generally inconclusive and sometimes even inconsistent
  ◦ But they generally show a speedup / less slowdown as the table gets larger, for both {S,F}HashMap
Nevertheless, they show:
  ◦ Significant slowdown for removals and iterations
  ◦ Some slowdown for retrievals in small tables
  ◦ Some speedup for retrievals in large tables
We did not see significant speedup/slowdown in general benchmarks like SPECjbb, SPECjvm and DaCapo

java.util.TreeMap
Implementation of the Map interface using a binary search tree
Red-black tree
  ◦ Leaves & root are black
  ◦ Children of a red node are black
  ◦ Every simple path from a node to its descendant leaves contains the same number of black nodes
  ◦ Height is bounded by 2log(n+1)
  ◦ Rotations during insertion/removal keep the tree balanced
java.util.TreeMap
  ◦ A red-black tree leaf is represented as null
  ◦ A “leaf node” is thus a node whose two children are both (black) null leaves

TreeMap node
Fields: key, value, left, right, parent, boolean color

Memory Consumption and Overhead of TreeMap/TreeSet
                       Overhead (32-bit)  Total (32-bit)  Overhead (64-bit)  Total (64-bit)
TreeMap (bytes/key)    24                 32              48                 64
TreeSet (bytes/key)    28                 32              56                 64
Total = one node per key: header + 5 references + a boolean, aligned (32 bytes on HotSpot32, 64 on HotSpot64)
Same overhead as HashMap/HashSet at p = 0.5

Towards Overhead Reduction
A typical red-black TreeMap:
  ◦ 14% of nodes are nodes with a single leaf child
  ◦ 9% of nodes are nodes with two leaf children
  ◦ 43% of nodes are leaves
  ◦ 34% of nodes are of other types

“Fused” Tree Nodes
Fuse together two types of nodes
  ◦ A parent node with a single leaf child
    ● 28% of nodes (14% parents + 14% leaves)
  ◦ A parent node with two leaf children
    ● 27% of nodes (9% parents + 18% leaves)
Combination of the null pointer elimination, boolean elimination and object fusion techniques

Techniques for Other Nodes
Remaining leaves (11% = 43% − 14% − 18%):
  ◦ null pointer elimination and boolean elimination
Other nodes (34%):
  ◦ boolean elimination

FTreeMap
Nine concrete classes of nodes
Implementation is complicated!
  ◦ Even with dynamic dispatch
  ◦ Tree rotations in insertion/removal now have many different cases
Implementation is slower
  ◦ Successful search: ~2%
  ◦ Insertion: ~20%
  ◦ Iteration: ~220%
  ◦ Removal: ~30%

Expected Memory Consumption and Overhead Saving – HotSpot32
                       Fraction  FTreeMap (bytes/key)  FTreeSet (bytes/key)
Leaves                 11%       24                    16
Parent w. one leaf     28%       16                    12
Parent w. 2 leaves     27%       13.33                 8
Others                 34%       32                    24
Total                  100%      21.6                  15.4
Total = fraction-weighted average of the rows; overhead = total minus the minimal 8 bytes per key (map) or 4 bytes per key (set)
Memory overhead of 13.6 bytes per key for FTreeMap
  ◦ 43% less than TreeMap (24 bytes per key)
Memory overhead of 11.4 bytes per key for FTreeSet
  ◦ 59% less than TreeSet (28 bytes per key)

Expected Memory Consumption and Overhead Saving – HotSpot64
                       Fraction  FTreeMap (bytes/key)  FTreeSet (bytes/key)
Leaves                 11%       40                    32
Parent w. one leaf     28%       28                    20
Parent w. 2 leaves     27%       24                    16
Others                 34%       56                    48
Total                  100%      37.8                  29.8
Memory overhead of 21.8 bytes per key for both FTreeMap and FTreeSet
  ◦ 55% less than TreeMap (48 bytes per key)
  ◦ 61% less than TreeSet (56 bytes per key)

Consolidation of Tree Nodes
Apply consolidation to all fields of TreeMap.Entry
Nodes are now represented by integers
Requires a resizing scheme for the internal arrays
  ◦ E.g.:
    ● Start with a power-of-two length
    ● Double whenever the keys array gets full

STreeMap
  public class STreeMap<K, V> … {
      // …
      int root = -1;
      K[] keys;
      V[] vals;
      int[] left;
      int[] right;
      int[] parent;
      byte[] color;
      // …
  }

STreeMap Implementation
Relatively easy and “natural”
Requires minimal changes
  ◦ Replace each p.some_field expression with a some_field[p] expression
    ● Where p is now an integer instead of a reference
  ◦ The value -1 is equivalent to a null reference
Timing benchmarks show almost identical results compared to java.util.TreeMap
  ◦ Small slowdown in iteration and removal

Expected Memory Consumption and Overhead Saving
The discussion is for the optimal case
  ◦ Where the arrays’ length equals the number of keys
  ◦ E.g.: if the number of keys is known at creation time
A more sophisticated analysis is called for
  ◦ Taking usage statistics into account

Expected Memory Consumption and Overhead Saving – HotSpot32
                       Overhead  Total
STreeMap (bytes/key)   13        21
STreeSet (bytes/key)   13        17
Per key: a keys[] slot (4), a vals[] slot (4, map only), three int links (12) and a color byte (1)
Memory overhead of 13 bytes per key for both STreeMap and STreeSet
  ◦ 46% less than TreeMap (24 bytes per key)
  ◦ 54% less than TreeSet (28 bytes per key)

Expected Memory Consumption and Overhead Saving – HotSpot64
                       Overhead  Total
STreeMap (bytes/key)   13        29
STreeSet (bytes/key)   13        21
Memory overhead of 13 bytes per key for both STreeMap and STreeSet
  ◦ 73% less than TreeMap (48 bytes per key)
  ◦ 77% less than TreeSet (56 bytes per key)

Conclusions
Overhead reduction for Java’s common collections is practical, with minimal impact on performance
  ◦ Even a performance improvement is possible
Justification for different implementations of Set vs. Map, unlike today
  ◦ And perhaps even of Map in 32-bit vs. Map in 64-bit, e.g.: SHashMap vs. FHashMap
Performance impact is hard to predict
  ◦ Due to many different usage scenarios
  ◦ Due to Java benchmarking difficulties

Future Research
Tiny collections
  ◦ Overhead is more significant
  ◦ Require different implementations
  ◦ Is it worthwhile?
Automation of compaction techniques
  ◦ Is it possible?
  ◦ Can reduce code duplication
Performance
  ◦ If memory overhead decreases so dramatically, why are the implementations not significantly faster?

Things To Consider
Software bloat
  ◦ “Sickness” of present and future software
Programming in Java
  ◦ Memory overhead is a serious issue
  ◦ JVMs define different object memory models
    ● Affects this overhead
  ◦ Collections are not perfectly implemented
    ● Sometimes an ad-hoc implementation would be a better choice

The end… ☺