Telegraph Java Experiences

Sam Madden
UC Berkeley
madden@cs.berkeley.edu
2/14/01
RightOrder : Telegraph & Java
1

Telegraph Overview
100% Java
In-memory database
Query engine for alternative sources
  Web
  Sensors
Testbed for adaptive query processing

Telegraph & WWW : FFF
Federated Facts and Figures
Collect data on the election
Based on Avnur and Hellerstein's SIGMOD '00 work: Eddies
  Route tuples dynamically based on source loads and selectivities
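
The routing idea can be sketched roughly as follows. This is a simplified greedy policy, not the lottery-based scheme from the Eddies paper and not Telegraph's actual code: it tracks only observed selectivities and ignores source load. The names TrackedFilter and EddySketch are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

/** A filter that records how many tuples it has seen and how many passed. */
class TrackedFilter {
    final String name;
    final IntPredicate predicate;
    long seen = 0;
    long passed = 0;

    TrackedFilter(String name, IntPredicate predicate) {
        this.name = name;
        this.predicate = predicate;
    }

    /** Observed pass rate; optimistic 1.0 until the filter has seen data. */
    double selectivity() {
        return seen == 0 ? 1.0 : (double) passed / seen;
    }

    boolean apply(int tuple) {
        seen++;
        boolean ok = predicate.test(tuple);
        if (ok) passed++;
        return ok;
    }
}

/** Hypothetical eddy: picks the order of filters per tuple, not per query. */
class EddySketch {
    private final List<TrackedFilter> filters;

    EddySketch(List<TrackedFilter> filters) {
        this.filters = filters;
    }

    /** Sends the tuple to the unvisited filter with the lowest pass rate first. */
    boolean route(int tuple) {
        boolean[] done = new boolean[filters.size()];
        for (int step = 0; step < filters.size(); step++) {
            int best = -1;
            for (int i = 0; i < filters.size(); i++) {
                if (done[i]) continue;
                if (best < 0 || filters.get(i).selectivity() < filters.get(best).selectivity()) {
                    best = i;
                }
            }
            done[best] = true;
            if (!filters.get(best).apply(tuple)) return false; // dropped early
        }
        return true;
    }

    public static void main(String[] args) {
        TrackedFilter even = new TrackedFilter("even", x -> x % 2 == 0);
        TrackedFilter tenth = new TrackedFilter("tenth", x -> x % 10 == 0);
        EddySketch eddy = new EddySketch(Arrays.asList(even, tenth));
        int out = 0;
        for (int i = 0; i < 100; i++) if (eddy.route(i)) out++;
        System.out.println("passed=" + out + " even.seen=" + even.seen + " tenth.seen=" + tenth.seen);
    }
}
```

Because the order is chosen per tuple from running statistics, the filter that drops more tuples tends to be tried first once enough tuples have been observed.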
fff.cs.berkeley.edu

Architecture Overview
Query Parser
  JLex & CUP
Preoptimizer
  Chooses access paths
Eddy
  Routes tuples to modules

Modules
Doubly-Pipelined Hash Joins
Index Joins
  For probing into web pages
Aggregates & Group Bys
Scans
  Telegraph Screen Scraper: view web pages as relations
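
For readers unfamiliar with doubly-pipelined (symmetric) hash joins, here is a minimal sketch. It is not Telegraph's implementation: the class name SymmetricHashJoin, the integer keys, and the string payloads are assumptions made for illustration. Each arriving tuple is inserted into its own side's hash table and immediately probes the other side, so results flow as soon as either input delivers a match.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SymmetricHashJoin {
    // One hash table per input; both are built incrementally.
    private final Map<Integer, List<String>> left = new HashMap<>();
    private final Map<Integer, List<String>> right = new HashMap<>();

    /** A tuple arrives on the left input; returns the join results it produces. */
    List<String> onLeft(int key, String value) {
        return insertAndProbe(left, right, key, value, true);
    }

    /** A tuple arrives on the right input; returns the join results it produces. */
    List<String> onRight(int key, String value) {
        return insertAndProbe(right, left, key, value, false);
    }

    private static List<String> insertAndProbe(Map<Integer, List<String>> mine,
                                               Map<Integer, List<String>> other,
                                               int key, String value, boolean valueIsLeft) {
        // Insert into this side's table so later tuples from the other side find it.
        mine.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        // Probe the other side's table for matches seen so far.
        List<String> out = new ArrayList<>();
        for (String match : other.getOrDefault(key, Collections.<String>emptyList())) {
            out.add(valueIsLeft ? value + "|" + match : match + "|" + value);
        }
        return out;
    }

    public static void main(String[] args) {
        SymmetricHashJoin join = new SymmetricHashJoin();
        System.out.println(join.onLeft(1, "a"));
        System.out.println(join.onRight(1, "x"));
        System.out.println(join.onLeft(1, "b"));
    }
}
```

Neither input has to finish before output starts, which suits slow or bursty sources such as web pages. The cost is keeping both hash tables in memory.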

Execution Framework
One Thread Per Query
  Experimented with Thread Per Module
  Linux threads are expensive
Iterator Model for Queries
Two Memory Management Models
  Java Objects
  Home-Rolled Byte Arrays
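
A minimal sketch of the iterator model, assuming a simple open/next/close interface. The names TupleIterator, ArrayScan, Select, and IteratorDemo are hypothetical, and tuples are reduced to integers to keep the example short. The whole plan is driven from one calling thread, which corresponds to the one-thread-per-query setup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Pull-based operator: the consumer asks for one tuple at a time. */
interface TupleIterator {
    void open();
    Integer next(); // returns null when the input is exhausted
    void close();
}

/** Leaf operator that scans an in-memory array. */
class ArrayScan implements TupleIterator {
    private final int[] rows;
    private int pos;

    ArrayScan(int[] rows) { this.rows = rows; }

    public void open() { pos = 0; }

    public Integer next() {
        if (pos >= rows.length) return null;
        return rows[pos++];
    }

    public void close() { }
}

/** Filter operator that pulls from its child until a tuple qualifies. */
class Select implements TupleIterator {
    private final TupleIterator child;
    private final IntPredicate predicate;

    Select(TupleIterator child, IntPredicate predicate) {
        this.child = child;
        this.predicate = predicate;
    }

    public void open() { child.open(); }

    public Integer next() {
        Integer t;
        while ((t = child.next()) != null) {
            if (predicate.test(t)) return t;
        }
        return null;
    }

    public void close() { child.close(); }
}

class IteratorDemo {
    /** Drains a plan on the calling thread. */
    static List<Integer> run(TupleIterator plan) {
        List<Integer> out = new ArrayList<>();
        plan.open();
        for (Integer t = plan.next(); t != null; t = plan.next()) out.add(t);
        plan.close();
        return out;
    }

    public static void main(String[] args) {
        TupleIterator plan = new Select(new ArrayScan(new int[] {1, 2, 3, 4, 5, 6}), x -> x % 2 == 0);
        System.out.println(run(plan));
    }
}
```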

Tuples as Java Objects
Tuple data stored as a Java object
Each in separate byte array
Tuples copied on joins, aggregates
Issues
  Memory management between modules, queries, garbage collector control
  Allocation overhead
Performance: 30,000 200-byte tuples / sec -> 5.9 MB / sec

Tuples As Byte Array
All tuples stored in same byte array / query
[Diagram: surrogate Java objects, each holding an (offset, size) pair, point into a shared byte array. Labels: Surrogate Objects, Directory]
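
The layout in the diagram might look roughly like this in code. It is a sketch under assumptions, not Telegraph's code: TupleRef stands in for a surrogate object, TupleArena for the per-query byte array, and the fixed capacity is a hypothetical way to enforce a memory budget.

```java
import java.nio.charset.StandardCharsets;

/** Surrogate: a small handle recording where one tuple lives in the shared array. */
class TupleRef {
    final int offset;
    final int size;

    TupleRef(int offset, int size) {
        this.offset = offset;
        this.size = size;
    }
}

/** One preallocated byte array per query; tuples are appended back to back. */
class TupleArena {
    private final byte[] data;
    private int used = 0;

    TupleArena(int capacityBytes) {
        data = new byte[capacityBytes];
    }

    /** Copies the tuple into the arena, or returns null if the budget is exhausted. */
    TupleRef append(byte[] tuple) {
        if (used + tuple.length > data.length) return null;
        System.arraycopy(tuple, 0, data, used, tuple.length);
        TupleRef ref = new TupleRef(used, tuple.length);
        used += tuple.length;
        return ref;
    }

    /** Copies the tuple bytes back out of the arena. */
    byte[] read(TupleRef ref) {
        byte[] out = new byte[ref.size];
        System.arraycopy(data, ref.offset, out, 0, ref.size);
        return out;
    }

    int bytesUsed() { return used; }

    /** Frees every tuple at once, without involving the garbage collector. */
    void reset() { used = 0; }

    public static void main(String[] args) {
        TupleArena arena = new TupleArena(1024);
        TupleRef ref = arena.append("hello".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(arena.read(ref), StandardCharsets.US_ASCII));
    }
}
```

The query's memory use is bounded by the array size, and reclaiming it is an explicit reset rather than a collector decision. The surrogate objects themselves are still ordinary heap allocations.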

Byte Array (cont)
Allows explicit control over memory / query (or module)
Compaction eliminates garbage collection randomness
Lower throughput: 15,000 t/sec
  No surrogate object reuse
  Synchronization costs

Other System Pieces
XML-Based Catalog
  Java introspection helps
Applet-based Front End
JDBC Interface
Fault Tolerance / Multiple Servers
  Via simple UNIX tools
RightOrder Questions
Performance vs. C
 JNI Issues
 Garbage Collection Issues
 Serialization Costs
 Lots of Java Objects
 JDBC vs ODI


Performance Vs. C
JVM + JIT Performance Encouraging:
  IBM JIT == 60% of Intel C compiler, faster than MSC for low-level benchmarks
  IBM JIT 2x faster than HotSpot for Telegraph scans
Stability Issues
www.javalobby.org/features/jpr

JIT Performance vs C
[Chart comparing Optimized Intel, Optimized MS, and IBM JIT]
Source: www.javalobby.org/features/jpr

Performance Gotchas
Synchronization
  ~2x function call overhead in HotSpot
  Used in libraries: Vector, StringBuffer
    • String allocation single most intensive operation in Telegraph
    • Mercator: 20% initial CPU cost
Garbage Collection
  Java dumb about reuse
  Mercator: 15% cost
  OceanStore: 30 ms avg latency, 1 s peak
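
One way to sidestep both gotchas in thread-confined code is to use unsynchronized classes and reuse a single buffer. The sketch below is illustrative only and the class name RowFormatter is hypothetical. ArrayList is the unsynchronized counterpart of Vector. StringBuilder, the unsynchronized counterpart of StringBuffer, was only added in Java 5, so it was not available when these slides were written.

```java
import java.util.ArrayList;
import java.util.List;

class RowFormatter {
    // One builder reused across rows; setLength(0) clears it without reallocating.
    private final StringBuilder sb = new StringBuilder(64);

    /** Formats one row. Not thread-safe: each thread needs its own formatter. */
    String format(int id, String name) {
        sb.setLength(0);
        sb.append(id).append(',').append(name);
        return sb.toString(); // still allocates the result String
    }

    static List<String> formatAll(String[] names) {
        RowFormatter formatter = new RowFormatter();
        // ArrayList takes no lock per call, unlike Vector.
        List<String> rows = new ArrayList<>(names.length);
        for (int i = 0; i < names.length; i++) {
            rows.add(formatter.format(i, names[i]));
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(formatAll(new String[] {"scan", "join"}));
    }
}
```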

More Gotchas
Finalization
  Declaring methods final allows inlining
Serialization
  RMI, JNI use serialization
  Philippsen & Haumacher show that it is slow

Performance Tools
Tools to address some issues
  JAX, Jopt: make bytecode smaller, faster
    • www.alphaworks.ibm.com/tech/JAX
  www.condensity.com
    • Bytecode optimizer
  www.optimizeit.com
    • Good profiler, memory allocation and garbage collection monitor

JNI Issues
Not a part of Telegraph
JNI overhead quite large (JDK 1.1.8, PII 300 MHz)
Source: Matt Welsh. A System Supporting High Performance Communication and I/O in Java. Master's Thesis, UC Berkeley, 1999.

More JNI
But, this is being worked on
  IBM JDK 100,000 B copy in 5 ms, vs 23 ms for 1.1.8 (500 MHz PIII)
JNI allows synchronization (pin / unpin), thread management
  See http://developer.java.sun.com/developer/onlineTraining/Programming/JDCBook/jni.html
GCJ + CNI: access Java objects via C++ classes
  http://gcc.gnu.org/java/

Garbage Collection
Performance
  Big problem: 1 s or longer to GC lots of objects
  Most Java GCs blocking (not concurrent or multithreaded)
Unexpected Latencies
  OceanStore: network file server, 30 ms avg. latencies for network updates, 1000 ms peak due to GC
  In high-concurrency apps, such delays disastrous

Garbage Collection Cont.
Limited Control
  Runtime.gc() only a hint
  Runtime.freeMemory() unreliable
  No way to disable
No object reuse
  Lots of unnecessary memory allocations
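
Since the JVM offers little control here, a common workaround is to reuse objects by hand. The sketch below shows a hypothetical buffer pool (the name BufferPool and its methods are assumptions, not part of Telegraph), and its main method shows the two Runtime calls named on this slide. Note that Runtime.gc() only requests a collection and Runtime.freeMemory() returns an approximation.

```java
import java.util.ArrayDeque;

/** Hypothetical pool that hands out fixed-size buffers and takes them back. */
class BufferPool {
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();
    private final int bufferSize;
    private int allocations = 0;

    BufferPool(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    /** Returns a recycled buffer if one is available, otherwise allocates. */
    byte[] acquire() {
        byte[] buf = free.poll();
        if (buf == null) {
            allocations++;
            buf = new byte[bufferSize];
        }
        return buf;
    }

    /** Gives a buffer back for reuse; contents are not cleared. */
    void release(byte[] buf) {
        if (buf.length == bufferSize) free.push(buf);
    }

    /** Number of buffers actually allocated so far. */
    int allocations() { return allocations; }

    public static void main(String[] args) {
        BufferPool pool = new BufferPool(200);
        for (int i = 0; i < 30_000; i++) {
            byte[] tuple = pool.acquire();
            tuple[0] = (byte) i; // stand-in for real work
            pool.release(tuple);
        }
        System.out.println("buffers allocated: " + pool.allocations());

        Runtime rt = Runtime.getRuntime();
        long before = rt.freeMemory(); // approximate value
        rt.gc();                       // a request, not a guarantee
        System.out.println("free before=" + before + " after=" + rt.freeMemory());
    }
}
```

Pooling trades allocation and collection cost for manual lifetime management: a buffer must not be used after it is released.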

Serialization
Not in Telegraph
Philippsen and Haumacher, "More Efficient Object Serialization." International Workshop on Java for Parallel and Distributed Computing. San Juan, April, 1999.
  Serialization costs for RMI are 50% of total RMI time
  Discard longevity for 7x speed up
Sun Serialization provides versioning
  Complete class description stored with each serialized object
  Most standard classes forward compatible (JDK docs note special cases)
  See http://java.sun.com/products/jdk/1.2/docs/guide/serialization/spec/serialTOC.doc.html
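
To make the size cost of the stored class description concrete, the sketch below writes the same object two ways: through standard Java serialization and through a hand-written fixed layout. The class Reading and its fields are hypothetical. The hand-written form carries no class description, so it also gives up the versioning that standard serialization provides. The example compares sizes only and makes no claim about speed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical record with one int and one long field. */
class Reading implements Serializable {
    private static final long serialVersionUID = 1L;
    final int sensorId;
    final long value;

    Reading(int sensorId, long value) {
        this.sensorId = sensorId;
        this.value = value;
    }

    /** Hand-written form: exactly 4 + 8 bytes, no class description. */
    byte[] toBytes() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream(12);
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(sensorId);
            out.writeLong(value);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the fixed layout written by toBytes(). */
    static Reading fromBytes(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int id = in.readInt();
            long v = in.readLong();
            return new Reading(id, v);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Standard Java serialization, for size comparison. */
    byte[] toJavaSerialized() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(this);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Reading r = new Reading(7, 123456789L);
        System.out.println("hand-written: " + r.toBytes().length + " bytes");
        System.out.println("java.io serialization: " + r.toJavaSerialized().length + " bytes");
    }
}
```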

Lots of Objects
GC Issues Serious
Memory Management
  GC makes programmers allocate willy-nilly
  Hard to partition memory space
  Telegraph byte-array ugliness due to inability to limit usage of concurrent modules, queries

Storage Overheads
Java Object class is big:
  Integer requires 23 bytes in JDK 1.3
  int requires 4.3 bytes
No way to circumvent object fields
Use primitives or hand-written serialization whenever possible

JDBC vs ODI
No experience with Oracle
JDBC overheads are high, but don't have specific performance numbers

Bottom Line
Java great for many reasons
  GC, standard libraries, type safety, introspection, etc.
  Significant reductions in development and debugging time
Java performance isn't bad
  Especially with some tuning
Memory Management an Issue
Lack of control over JVMs bad
  When to garbage collect, how to serialize, etc.