JavaOne

EnfranchisedMind's Adventures at JavaOne 

Alternative Languages on the JVM 3

No good tools for telling why optimizers bailed out, which is an industry-wide problem.
 
So run a microbenchmark with hot, simple code and GDB it. Debugging flags lie, aren't QA'ed, and are just a problem.
 
ROUND 2: Alternative Concurrency
 
Only talking about Clojure STM as the only alternative concurrency approach [ Scala actor knock? -ed ]. Nothing is shared by default, with shared variables guarded by STM. Using the traveling salesman problem with worker ants, tried up to 600 ants on a 768-CPU Azul box.
 
Tried a contention microbenchmark: many CPUs bumping a shared counter. Performance died: total throughput went down to zero over time, and adding more CPUs made things worse. No graceful fall-off under pressure. No idea if this is a JDK 6 library failure or a Clojure usage model failure.
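For a sense of what "many CPUs bumping a shared counter" looks like, here is a minimal sketch in plain Java. It uses an AtomicLong rather than the Clojure STM the talk measured, and the class name CounterContention and its parameters are made up for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: N threads all bump one shared counter.
// Uses AtomicLong (compare-and-swap), NOT the Clojure STM from the talk.
class CounterContention {
    static long run(int threads, final int incrementsPerThread) {
        final AtomicLong counter = new AtomicLong();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counter.incrementAndGet(); // every thread contends on the same word
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }
}
```

Timing run() for increasing thread counts and plotting increments per second would show whether throughput plateaus or falls off.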
 
STM has not been good for 15 years and it's still not good. A graceful (slow) degrade under livelock would be great, but there's no great hope of it.
 
Note that complex locking schemes suffer the same way. A typical DB gets more concurrent requests, hits a peak, drops a bit, and then craters. Why doesn't the DB queue requests and maintain max throughput?
 
Fall-off Under Load is Bad (FULB). Let's work towards reliable performance under pressure. Queuing requests beyond saturation point would be great. Publishing that would be better. Backpressure reporting at runtime would work, too.
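One way to read "queue beyond saturation and report backpressure" is a fixed worker pool with a bounded queue that counts what it had to turn away. This is only a sketch of that idea using java.util.concurrent; BoundedService and its methods are hypothetical names, and the talk did not prescribe this design.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: fixed workers, bounded queue, visible rejection count.
class BoundedService {
    private final AtomicInteger rejected = new AtomicInteger();
    private final ThreadPoolExecutor pool;

    BoundedService(int workers, int queueCapacity) {
        // Default rejection policy throws RejectedExecutionException when full.
        pool = new ThreadPoolExecutor(workers, workers, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(queueCapacity));
    }

    // Returns false (and records it) instead of piling on more work.
    boolean trySubmit(Runnable task) {
        try {
            pool.execute(task);
            return true;
        } catch (RejectedExecutionException e) {
            rejected.incrementAndGet();
            return false;
        }
    }

    // Backpressure signal a caller or monitor can read at runtime.
    int rejectedCount() {
        return rejected.get();
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

Workers stay at their saturation point while excess requests wait in the queue; once the queue is full, callers are told so rather than dragging throughput down.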
 
Summary: Non-Java languages CAN run as fast on a JVM, but it takes work (especially around inlining).


Alternative Languages on the JVM 2

Despite the huge allocation rate, the GC is not the problem. The allocation itself is the pain. If you make fields "final", there is the cost of a memory fence. This pain could be avoided with escape analysis and some way to avoid Integer.valueOf cache efforts.
 
"Fixing these issues would make these languages run easily 2x faster." [I disbelieve your microbenchmark. - ed]
 
JRuby missed the metal because each minor action dispatches via "call". BimorphicInlining was guessing the wrong target, but a flag implied it worked. Confirmed this with GDB on x86 Java 6. JRuby has apparently fixed this with a flag that they're working on.
 
Issue was calls stitched together with a trampoline. The hope was that the JIT would inline the trampoline, folding up the complex lookup logic.
 
Issue was that the code was not analyzable: it was beyond the JIT's ability to figure out. It needed some kind of profile, but the JIT didn't have one. There's no inlining during profiling, so profiles confused callers per call site: basically, CachingCallSite::call looks like it's picking an arbitrary value.
 
The heavyweight JIT does inline the call, but there's no profile data and no dominant target, so even speculative inlining fails.
 
JRuby 1.3-RC2 --fast (and some other flag) helps quite a bit. (Charlie Nutter dropped him a line to help him out.)
 
So much performance depends on inlining, but the rules on inlining are complex and subtle. There's also a language/bytecode mismatch. Can't assume an Ueber Escape Analysis -- infinite precision math is a huge missing part, too.
 
Doug Lea's fork/join framework has the same issue: a tiny piece of user code underneath a huge pile of library support.


Alternative Languages on the JVM

Bytecodes for alternative languages are different from those for Java, which makes for different performance characteristics. A reputation for being fast is critical for adoption.
 
Testing Java, Scala, Clojure, JRuby, JPC, Jython, and Rhino. [ No Groovy! -ed ]
 
Can a language go "To the Metal"? Write a mini loop of something that's an xor and divide. This is to figure out what the different languages are up to. (Running on the Azul JVM)
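The notes don't reproduce the talk's actual loop, so the following is only a hypothetical stand-in for "a mini loop of an xor and a divide" in Java. The constants and the class name MetalLoop are invented for illustration.

```java
// Hypothetical stand-in for the talk's microbenchmark loop (the real one is not in these notes).
class MetalLoop {
    static int run(int iterations) {
        int sum = 0;
        for (int i = 1; i <= iterations; i++) {
            // One xor and one integer divide per iteration; plain int math
            // wraps on overflow, so no overflow checks or boxing are needed.
            sum += (i ^ 0x55) / 3;
        }
        return sum;
    }
}
```

In a language with fixnum-to-bignum promotion, each "+=" here would also need an overflow check, which is the kind of difference the comparison was probing.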
 
NOTE: The goal here is not to find the "fastest language". Also ignoring blatant language microbenchmark mismatch.
 
Java came out with an unrolled loop in the JIT. Scala generated pretty much the same code. Clojure was very close -- no dispatch logic, but it did have overflow checks. JRuby did not have major inlining. Rhino ended up boxing the values.
 
Java's advantage came from the close fit with JIT expectations. Scala was pretty much there, too, but it had the same semantics as Java (no overflow promotion), and if that's added in, it looks just as bad as the others.
 
JPC is a Java DOS 3.1 emulator. For this example, it had 16000 classes and 7800 compiles.
 
Clojure was "almost close". No obvious subroutine calls in the inner loop. The downside is a lot of ephemeral object allocation, which requires escape analysis to eliminate. There were lots of fixnum overflow checks everywhere, which could be turned off.
 
Jython has massive allocation, and extra locks are thrown in, in addition to the fixnum issue.
 
JavaScript/Rhino used doubles. No fix-num talks.
 
JRuby didn't hit the metal, because the language expected CachingCallSite::call to be inlined and there was substantial pain when it wasn't. This is apparently getting better.


Concurrency Gotchas 5

Safe publication: don't let "this" escape before the constructor finishes running. The most common example is registering yourself as a listener in the constructor. Starting a thread in a constructor is also bad (usually a cleaner/maintainer thread). Static factory methods are the solution.
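A minimal sketch of the static-factory fix for the listener case. The types EventListener, EventSource, and SafeListener are hypothetical, made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener API, for illustration only.
interface EventListener {
    void onEvent(String event);
}

class EventSource {
    private final List<EventListener> listeners = new ArrayList<EventListener>();

    synchronized void register(EventListener l) {
        listeners.add(l);
    }

    synchronized int listenerCount() {
        return listeners.size();
    }
}

class SafeListener implements EventListener {
    private final String name;

    // Private constructor: no registration here, so "this" cannot escape early.
    private SafeListener(String name) {
        this.name = name;
    }

    // Static factory: construct fully first, publish second.
    static SafeListener create(EventSource source, String name) {
        SafeListener listener = new SafeListener(name);
        source.register(listener);
        return listener;
    }

    public void onEvent(String event) {
        // Safe to read "name" here: the object was complete before registration.
    }

    String name() {
        return name;
    }
}
```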
 
Coordination: threads and wait/notify. Don't use stop, suspend, resume, destroy, run on Thread. ThreadGroups are also now passe.
 
To do wait/notify, you need to synchronize on the same lock. It is important to check the condition in a while loop, because wait may return without actually having been notified. And make sure the condition is actually satisfied before proceeding. [And make sure the condition is volatile or otherwise ensured to be visible. -ed]
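The wait-in-a-loop pattern, sketched with a hypothetical Gate class. The flag is only touched while holding the lock, which also takes care of the visibility concern.

```java
// Hypothetical example of a guarded wait; Gate is not from the talk.
class Gate {
    private final Object lock = new Object();
    private boolean open = false; // guarded by "lock"

    void open() {
        synchronized (lock) {
            open = true;
            lock.notifyAll(); // wake every waiter; each rechecks the condition
        }
    }

    void awaitOpen() throws InterruptedException {
        synchronized (lock) {
            while (!open) {  // loop, not "if": wait() may return spuriously
                lock.wait(); // releases the lock while waiting
            }
        }
    }

    boolean isOpen() {
        synchronized (lock) {
            return open;
        }
    }
}
```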
 
Performance: Deadlock. To avoid it: lock splitting (separate the locks and do not hold them at the same time), lock ordering (classic), lock timeout (via ReentrantLock), tryLock (nonblocking, via ReentrantLock). Spin-wait, i.e. "while(!flag) { Thread.sleep(100); }" -- you really want wait/notifyAll. Lock contention is also a problem: most data structures can be broken down into a number of distinct partitions, each with its own lock ("Lock Striping").
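As one illustration of the lock-timeout option, here is a sketch that takes two ReentrantLocks with tryLock and gives up rather than deadlocking. TwoLockTask and runWithBoth are hypothetical names; what to do on failure (retry, back off, report) is left to the caller.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper: run an action under two locks, or give up after a timeout.
class TwoLockTask {
    static boolean runWithBoth(ReentrantLock a, ReentrantLock b,
                               long timeoutMs, Runnable action) {
        try {
            if (!a.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                return false; // could not get the first lock in time
            }
            try {
                if (!b.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                    return false; // back out; the finally below releases "a"
                }
                try {
                    action.run();
                    return true;
                } finally {
                    b.unlock();
                }
            } finally {
                a.unlock();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            return false;
        }
    }
}
```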
 
Then shared some particularly naughty code he found in Hibernate.


Concurrency Gotchas 4

Volatile counters: "++"/"--" is not an atomic operation, so volatile is not sufficient. count++ is "read, modify, write", so the reads and writes will propagate, but the sequence will not be atomic.
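The usual fix is an AtomicInteger, which makes the whole read-modify-write a single atomic step. A small sketch, with HitCounter as a made-up name:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical counter; a "volatile int count; count++" version would lose updates.
class HitCounter {
    private final AtomicInteger count = new AtomicInteger();

    int increment() {
        return count.incrementAndGet(); // atomic read-modify-write
    }

    int get() {
        return count.get();
    }
}
```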
 
Composing atomic actions doesn't work: each call to a thread-safe class is atomic, but a sequence of calls is not, so multiple calls to a thread-safe class aren't thread safe. Slapping a synchronized lock on things is odd, because you may be adding a different (distinct) lock from the one the thread-safe class itself uses. If the class uses synchronized methods, it locks on "this", which is available both inside and outside of the class, so synchronizing on the instance works. Better is to use encapsulated compound methods, as on ConcurrentHashMap.
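A sketch of the difference, using ConcurrentMap.putIfAbsent as the encapsulated compound operation. The Registry class and its method names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical example: "claim a key if nobody has it yet".
class Registry {
    private final ConcurrentMap<String, String> owners =
            new ConcurrentHashMap<String, String>();

    // BROKEN under concurrency: two thread-safe calls with a gap between them.
    boolean claimRacy(String key, String owner) {
        if (!owners.containsKey(key)) {
            owners.put(key, owner); // another thread may have put in the meantime
            return true;
        }
        return false;
    }

    // Check-and-insert as one atomic operation inside the map.
    boolean claim(String key, String owner) {
        return owners.putIfAbsent(key, owner) == null;
    }

    String ownerOf(String key) {
        return owners.get(key);
    }
}
```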
 
Assigning 64-bit values (long, double) is not guaranteed to be atomic on 32-bit JVMs unless the field is volatile.


Concurrency Gotchas 3

* Visibility issues
 
 
No locking at all, or only intermittent locking. This is what "synchronized" on methods was trying to address.
 
You need to synchronize when reading from a mutable data structure, even if the reader isn't modifying anything: another thread may be modifying it at the same time.
 
Double-checked locking is just a fail, because the unsynchronized read doesn't work: it can see a partially constructed object. It can be fixed by making the field volatile. The Holder Idiom (from "Effective Java") is better: put the instance in a static final field of a static nested class.
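The Holder Idiom in sketch form, with LazySingleton as a placeholder name. It leans on the JVM's guarantee that a class is initialized once, lazily, and safely across threads.

```java
// Hypothetical singleton using the initialization-on-demand holder idiom.
class LazySingleton {
    private LazySingleton() {
    }

    // Not loaded or initialized until getInstance() first touches it.
    private static class Holder {
        static final LazySingleton INSTANCE = new LazySingleton();
    }

    static LazySingleton getInstance() {
        return Holder.INSTANCE; // no locking or volatile needed on this path
    }
}
```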
 
Racy single check is okay -- computing a value from final state and storing the result into a field. The potential is for duplicated effort, but that's cheaper than synchronized. The important part is to copy the field into a local variable, check it/work on it/whatever, and then assign the local variable back to the field.
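A sketch of the racy single check, in the style of a cached hash code. The Name class is made up; note the cached field is an int, since the trick depends on writes to the field being atomic (so not long or double).

```java
// Hypothetical immutable value class with a lazily cached hash.
class Name {
    private final String value;
    private int hash; // deliberately not volatile; 0 means "not computed yet"

    Name(String value) {
        this.value = value;
    }

    int cachedHash() {
        int h = hash;             // read the field exactly once, into a local
        if (h == 0) {
            h = value.hashCode(); // derived from final state; threads may repeat this work
            hash = h;             // every thread writes the same value, so the race is benign
        }
        return h;                 // return the local, never re-read the field
    }
}
```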
 
Volatile arrays do not have volatile elements. Notably, AtomicIntegerArray solves this.
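For illustration, a thin wrapper over AtomicIntegerArray, where each element read and write has volatile semantics. The Slots class is a hypothetical name.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical wrapper; "volatile int[] slots" would only make the reference volatile.
class Slots {
    private final AtomicIntegerArray slots;

    Slots(int size) {
        slots = new AtomicIntegerArray(size); // all elements start at 0
    }

    void set(int index, int value) {
        slots.set(index, value); // volatile write of one element
    }

    int get(int index) {
        return slots.get(index); // volatile read of one element
    }

    int increment(int index) {
        return slots.incrementAndGet(index); // atomic per-element update
    }
}
```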


Concurrency Gotchas 2

Whole series of bugs associated with synchronized issues.
 
 
synchronized(null) { } // NPE
 
synchronized(obj) {
  obj = new MyObject(); // Undermined synchronization: the next thread locks a different object
}
 
private static final String LOCK = "LOCK";
synchronized(LOCK) { } // Interned strings use the same lock elsewhere
 
private static final Integer LOCK = 0;
synchronized(LOCK) { } // Same as string issue: small autoboxed Integers are cached
 
// ReentrantLock FAIL: this takes the Lock object's monitor, not the Lock itself
final Lock lock = new ReentrantLock();
synchronized(lock) { }
 
Should be:
 
final Lock lock = new ReentrantLock();
lock.lock();
try {
  // critical section
} finally {
  lock.unlock();
}
 
 
You should lock on the object being protected, or on an explicit private lock object (new Object()).
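A sketch of the explicit lock object, with Inventory as a made-up class. Keeping the lock private and final means no outside code can synchronize on it and it can never be reassigned.

```java
// Hypothetical example of guarding state with a dedicated private lock.
class Inventory {
    private final Object lock = new Object(); // never null, never reassigned, never shared
    private int stock;                        // guarded by "lock"

    void add(int amount) {
        synchronized (lock) {
            stock += amount;
        }
    }

    int stock() {
        synchronized (lock) { // reads take the same lock, for visibility
            return stock;
        }
    }
}
```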


Concurrency Gotchas

Line around the block
 
Alex Miller of Terracotta
 
"Most of the problems I'm going to talk about in this talk, FindBugs is going to have detectors for these problems."
 
3 Categories: shared data, coordination, performance
 
Shared Data
- Locking
- Visibility
- Atomicity
- Safe Publication
 
* Mutable Statics (e.g. DateFormat/Calendar)
  + Create a new one each time you call it
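The "create a new one each time" fix, sketched for SimpleDateFormat. The Timestamps class, the pattern, and the UTC time zone are choices made for this example only.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper; a shared "static final SimpleDateFormat" would be the gotcha,
// because the formatter keeps mutable state and is not thread-safe.
class Timestamps {
    static String format(Date date) {
        // A fresh formatter per call: nothing mutable is shared between threads.
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format.format(date);
    }
}
```

This trades a little allocation for safety, which is usually the right default.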


Modularity: JSR-294 and Beyond 4

Modules will be able to "provide" other names, which creates a way to swap implementations. It may also support annotations.
 
Moving on now: versioning. Chapter 13 of the Java Language Spec specifies what it means for two class files to be "binary compatible". Specifically, it outlaws removing members and changing signatures and thrown exceptions. Versioning specifies the amount of change and the style (signature vs. behavior).
 
Modules can be versioned:
module com.smokejumperit.foo @ 1.0 {
  ...
  requires module jdom @ 1.0;
  ...
}
 
Problems with versions: structure (shape/name), ordering (1.6:u13  
Finally, accessibility: a new visibility level called "module" will be allowed, which makes a member module-private. Basically, that means it is invisible to the outside world, but visible within the module. Also, there's a 'package' level being proposed which would be module-private and limited to the given package.


Modularity: JSR-294 and Beyond 3

The "javac classpath is dead" news is a bit oversold. For one thing, it's still there and usable. For another, modulepath steps in. This allows multiple modules to be compiled together: source files are compiled at the same time. [Interesting that the push is away from schlepping around compiled code. -ed]
 
But classpath doesn't scale, and it's an implementation detail of javac.
 
Using modulepath, you can specify:
planetjdk/src:rssutils/rome
 
...and it will infer that these contain directories representing modules.
 
When compiling com.smokejumperit.foo/* classes, javac looks for com.smokejumperit.foo/module-info.java and infers the module name.
