Telegraph Java Experiences
Sam Madden
UC Berkeley
madden@cs.berkeley.edu
2/14/01, RightOrder: Telegraph & Java

Telegraph Overview
  • 100% Java
  • In-memory database
  • Query engine for alternative sources
      • Web
      • Sensors
  • Testbed for adaptive query processing

Telegraph & WWW: FFF
  • Federated Facts and Figures
  • Collect data on the election
  • Based on Avnur and Hellerstein SIGMOD '00 work: Eddies
      • Route tuples dynamically based on source loads and selectivities

fff.cs.berkeley.edu

Architecture Overview
  • Query Parser
      • JLex & CUP
  • Preoptimizer
      • Chooses access paths
  • Eddy
      • Routes tuples to modules

Modules
  • Doubly-Pipelined Hash Joins
  • Index Joins
      • For probing into web pages
  • Aggregates & Group Bys
  • Scans
      • Telegraph Screen Scraper: view web pages as relations

Execution Framework
  • One thread per query
  • Iterator model for queries
  • Experimented with thread per module
      • Linux threads are expensive
  • Two memory management models
      • Java objects
      • Home-rolled byte arrays

Tuples as Java Objects
  • Tuple data stored as a Java object
  • Each in a separate byte array
  • Tuples copied on joins, aggregates
  • Issues
      • Memory management between modules, queries
      • Garbage collector control
      • Allocation overhead
  • Performance: 30,000 200-byte tuples / sec -> 5.9 MB / sec

Tuples As Byte Array
  • All tuples stored in the same byte array / query
  • Surrogate Java objects
  [Figure: one byte array per query, plus a directory of surrogate objects, each holding an (offset, size) pair that points into the array]

Byte Array (cont)
  • Allows explicit control over memory / query (or module)
  • Compaction eliminates garbage collection randomness
  • Lower throughput: 15,000 t/sec
      • No surrogate object reuse
      • Synchronization costs

Other System Pieces
  • XML-based catalog
      • Java introspection helps
  • Applet-based front end
  • JDBC interface
  • Fault tolerance / multiple servers
      • Via simple UNIX tools

RightOrder Questions
  • Performance vs. C
  • JNI issues
  • Garbage collection issues
  • Serialization costs
  • Lots of Java objects
  • JDBC vs ODI

Performance Vs. C
  • JVM + JIT performance encouraging: IBM JIT == 60% of Intel C compiler, faster than MSC for low-level benchmarks
  • IBM JIT 2x faster than HotSpot for Telegraph scans
  • Stability issues
  • www.javalobby.org/features/jpr

JIT Performance vs C
  [Chart comparing: Optimized Intel, Optimized MS, IBM JIT]
  Source: www.javalobby.org/features/jpr

Performance Gotchas
  • Synchronization
      • ~2x function call overhead in HotSpot
      • Used in libraries: Vector, StringBuffer
          • String allocation single most intensive operation in Telegraph
          • Mercator: 20% initial CPU cost
  • Garbage Collection
      • Java dumb about reuse
      • Mercator: 15% cost
      • OceanStore: 30 ms avg latency, 1 s peak

More Gotchas
  • Final methods
      • Declaring methods final allows inlining
  • Serialization
      • RMI, JNI use serialization
      • Philippsen & Haumacher show performance slowness

Performance Tools
  • Tools to address some issues
  • JAX, Jopt: make bytecode smaller, faster
      • www.alphaworks.ibm.com/tech/JAX
  • www.condensity.com
      • Bytecode optimizer
  • www.optimizeit.com
      • Good profiler, memory allocation and garbage collection monitor

JNI Issues
  • Not a part of Telegraph
  • JNI overhead quite large (JDK 1.1.8, PII 300 MHz)
  • Source: Matt Welsh. A System Supporting High Performance Communication and I/O in Java. Master's Thesis, UC Berkeley, 1999.
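
JNI copy figures are easier to judge against a pure-Java baseline for the same bulk copy. The sketch below is one such baseline, not part of Telegraph: it times System.arraycopy on a 100,000-byte buffer, the copy size quoted on the More JNI slide. The class and method names (CopyBaseline, copy, averageNanos) are hypothetical, no native code is called, and the numbers it prints depend entirely on the JVM and machine.

```java
import java.util.Arrays;

// Hypothetical pure-Java baseline; it does NOT cross the JNI boundary.
final class CopyBaseline {
    /** Copies src into a freshly allocated array and returns the copy. */
    static byte[] copy(byte[] src) {
        byte[] dst = new byte[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }

    /** Average wall-clock nanoseconds per copy over the given number of rounds. */
    static long averageNanos(byte[] src, int rounds) {
        long sink = 0; // consume each result so the JIT cannot drop the copy
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            sink += copy(src)[0];
        }
        long elapsed = System.nanoTime() - start;
        if (sink == Long.MIN_VALUE) System.out.println(sink); // keeps sink live
        return elapsed / rounds;
    }

    public static void main(String[] args) {
        byte[] src = new byte[100_000];
        Arrays.fill(src, (byte) 7);
        averageNanos(src, 1_000); // warm-up pass so the JIT has compiled copy()
        System.out.println("avg ns per 100,000-byte copy: " + averageNanos(src, 1_000));
    }
}
```

A JNI version of the same copy would add the cost of crossing into native code and of pinning or copying the array, which is the overhead the slides are concerned with.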

More JNI
  • But, this is being worked on
  • JNI allows synchronization (pin / unpin), thread management
  • IBM JDK: 100,000 B copy in 5 ms, vs 23 ms for 1.1.8 (500 MHz PIII)
  • See http://developer.java.sun.com/developer/onlineTraining/Programming/JDCBook/jni.html
  • GCJ + CNI: access Java objects via C++ classes
      • http://gcc.gnu.org/java/

Garbage Collection Performance
  • Big problem: 1 s or longer to GC lots of objects
  • Most Java GCs blocking (not concurrent or multithreaded)
  • Unexpected latencies
      • OceanStore: network file server, 30 ms avg. latencies for network updates, 1000 ms peak due to GC
  • In high-concurrency apps, such delays disastrous

Garbage Collection Cont.
  • Limited control
      • Runtime.gc() only a hint
      • Runtime.freeMemory() unreliable
      • No way to disable
  • No object reuse
      • Lots of unnecessary memory allocations

Serialization
  • Not in Telegraph
  • Philippsen and Haumacher, "More Efficient Object Serialization." International Workshop on Java for Parallel and Distributed Computing. San Juan, April, 1999.
  • Sun serialization provides versioning
  • Serialization costs for RMI are 50% of total RMI time
  • Discard longevity for 7x speedup
  • Complete class description stored with each serialized object
  • Most standard classes forward compatible (JDK docs note special cases)
  • See http://java.sun.com/products/jdk/1.2/docs/guide/serialization/spec/serialTOC.doc.html

Lots of Objects
  • GC issues serious
  • Memory management
      • GC makes programmers allocate willy-nilly
      • Hard to partition memory space
      • Telegraph byte-array ugliness due to inability to limit usage of concurrent modules, queries

Storage Overheads
  • Java Object class is big:
      • Integer requires 23 bytes in JDK 1.3
      • int requires 4.3 bytes
  • No way to circumvent object fields
  • Use primitives or hand-written serialization whenever possible

JDBC vs ODI
  • No experience with Oracle
  • JDBC overheads are high, but don't have specific performance numbers

Bottom Line
  • Java great for many reasons
      • GC, standard libraries, type safety, introspection, etc.
      • Significant reductions in development and debugging time
  • Java performance isn't bad
      • Especially with some tuning
  • Memory management an issue
  • Lack of control over JVMs bad
      • When to garbage collect, how to serialize, etc.
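
The byte-array tuple model from the earlier slides (one array per query, surrogate objects holding an offset and size, compaction in place of garbage collection) can be sketched roughly as follows. This is a minimal illustration under assumptions, not Telegraph's actual code: the names TupleArena, Surrogate, encode, and free are hypothetical, tuples are fixed to two ints packed by hand, and there is no synchronization.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a per-query byte array with surrogate objects.
final class TupleArena {
    /** Surrogate object: an (offset, size) pair pointing into the shared array. */
    static final class Surrogate {
        int offset;
        final int size;
        boolean live = true;
        Surrogate(int offset, int size) { this.offset = offset; this.size = size; }
    }

    private final byte[] data;                                     // all tuples for one query
    private final List<Surrogate> directory = new ArrayList<>();   // kept in offset order
    private int used = 0;

    TupleArena(int capacity) { data = new byte[capacity]; }

    /** Hand-written serialization: two ints become 8 bytes, with no object headers. */
    static byte[] encode(int a, int b) {
        return ByteBuffer.allocate(8).putInt(a).putInt(b).array();
    }

    /** Copies the tuple bytes to the end of the array and returns its surrogate. */
    Surrogate append(byte[] tuple) {
        if (used + tuple.length > data.length) throw new IllegalStateException("arena full");
        System.arraycopy(tuple, 0, data, used, tuple.length);
        Surrogate s = new Surrogate(used, tuple.length);
        directory.add(s);
        used += tuple.length;
        return s;
    }

    /** Reads a 4-byte int field at fieldOffset within the tuple. */
    int readInt(Surrogate s, int fieldOffset) {
        return ByteBuffer.wrap(data, s.offset + fieldOffset, 4).getInt();
    }

    /** Marks a tuple dead; its bytes are reclaimed by the next compact(). */
    void free(Surrogate s) { s.live = false; }

    /** Slides live tuples toward the front and updates their surrogates' offsets. */
    void compact() {
        List<Surrogate> survivors = new ArrayList<>();
        int write = 0;
        for (Surrogate s : directory) {
            if (!s.live) continue;
            System.arraycopy(data, s.offset, data, write, s.size);
            s.offset = write;
            write += s.size;
            survivors.add(s);
        }
        directory.clear();
        directory.addAll(survivors);
        used = write;
    }

    int used() { return used; }

    public static void main(String[] args) {
        TupleArena arena = new TupleArena(64);
        Surrogate a = arena.append(encode(1, 10));
        Surrogate b = arena.append(encode(2, 20));
        Surrogate c = arena.append(encode(3, 30));
        arena.free(b);
        arena.compact(); // explicit, predictable reclamation instead of waiting on the GC
        System.out.println("bytes used: " + arena.used()
                + ", first=" + arena.readInt(a, 0) + ", last=" + arena.readInt(c, 4));
    }
}
```

The sketch still allocates one surrogate per tuple, which matches the "no surrogate object reuse" cost noted on the Byte Array (cont) slide. Pooling surrogates would be a natural next step, though the slides do not say whether Telegraph did so.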