Alternative Languages on the JVM 2
Despite the huge allocation rate, the GC is not the problem; the allocation itself is the pain. Making fields "final" adds the cost of a memory fence. This pain could be avoided with escape analysis and some way to skip the Integer.valueOf cache work.
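To make the boxing cost concrete, here is a minimal sketch, not taken from the talk, of the same loop written with boxed and with primitive arithmetic. The class and method names are hypothetical. Each boxed step goes through Integer.valueOf, which checks the small-value cache (guaranteed for -128 to 127) and otherwise allocates, unless the JIT can prove the object does not escape.

```java
public class BoxingSketch {
    // Hypothetical stand-in for how a dynamic-language runtime might do math:
    // every intermediate result is boxed again.
    static int sumBoxed(int n) {
        Integer acc = Integer.valueOf(0);
        for (int i = 0; i < n; i++) {
            // Unbox, add, then re-box: a cache check, plus an allocation
            // once the value is outside the cached range.
            acc = Integer.valueOf(acc.intValue() + i);
        }
        return acc.intValue();
    }

    // The same computation on primitives: no cache check, no allocation.
    static int sumPrimitive(int n) {
        int acc = 0;
        for (int i = 0; i < n; i++) {
            acc += i;
        }
        return acc;
    }

    public static void main(String[] args) {
        // Values in -128..127 come from the cache, so these are the same object.
        System.out.println(Integer.valueOf(127) == Integer.valueOf(127));
        System.out.println(sumBoxed(1000) == sumPrimitive(1000));
    }
}
```

Both methods compute the same result; the difference is only in how much work each addition drags along with it.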
"Fixing these issues would make these languages run easily 2x faster." [I disbelieve your microbenchmark. - ed]
JRuby missed the metal because every minor action dispatches via "call". BimorphicInlining was guessing the wrong target, even though a flag implied it worked. He confirmed this with GDB on x86 Java 6. JRuby has apparently fixed this with a flag they're working on.
Issue was calls stitched together with a trampoline. The hope was that the JIT would inline the trampoline, folding up the complex lookup logic.
Issue was that the code was not analyzable: it was beyond the JIT's ability to figure out. It needed some kind of profile, but the JIT didn't collect one. And there's no inlining during profiling, so the profile lumps all callers together instead of keeping them per call site: basically, CachingCallSite::call looks like it's picking an arbitrary value.
The heavyweight JIT does inline the call, but there's no profile data and no dominant target, so even speculative inlining fails.
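The trampoline problem can be sketched in a few lines. This is an illustration under assumed names (TrampolineSketch, DynamicMethod, call), not JRuby's actual code. Each logical call site in run always invokes one target, but every dispatch funnels through the shared call method, so the type profile gathered at the single invoke inside it sees three receiver classes and no dominant one.

```java
public class TrampolineSketch {
    // Hypothetical interface standing in for a dynamic-language method object.
    interface DynamicMethod {
        int invoke(int arg);
    }

    static final class Inc implements DynamicMethod {
        public int invoke(int arg) { return arg + 1; }
    }

    static final class Dbl implements DynamicMethod {
        public int invoke(int arg) { return arg * 2; }
    }

    static final class Neg implements DynamicMethod {
        public int invoke(int arg) { return -arg; }
    }

    // The shared trampoline: all dispatches pass through this one method,
    // so the profile for target.invoke mixes Inc, Dbl and Neg.
    static int call(DynamicMethod target, int arg) {
        return target.invoke(arg);
    }

    static int run(int iterations) {
        DynamicMethod inc = new Inc();
        DynamicMethod dbl = new Dbl();
        DynamicMethod neg = new Neg();
        int x = 0;
        for (int i = 0; i < iterations; i++) {
            x = call(inc, x); // site A: only ever sees Inc
            x = call(dbl, x); // site B: only ever sees Dbl
            x = call(neg, x); // site C: only ever sees Neg
        }
        return x;
    }

    public static void main(String[] args) {
        System.out.println(run(3));
    }
}
```

If call were inlined into each of the three sites before profiling, each copy would see exactly one receiver type; profiled as a single shared method, it has no obvious target to inline.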
JRuby 1.3-RC2 --fast (and some other flag) helps quite a bit. (Charlie Nutter dropped him a line to help him out.)
So much performance depends on inlining, but the rules on inlining are complex and subtle. There's also a language/bytecode mismatch. Can't assume an Ueber Escape Analysis -- infinite precision math is a huge missing part, too.
Doug Lea's fork/join framework has the same issue: a tiny piece of user code underneath a huge pile of library support.
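As a rough illustration of that shape, here is a sketch using the fork/join classes as they later shipped in java.util.concurrent (Java 7), which may differ from the version discussed in the talk. ForkJoinSketch, SumTask and the threshold of 1024 are hypothetical choices. The user's real work is the one-line loop body; everything around it is framework plumbing that the JIT has to see through.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ForkJoinSketch {
    static final class SumTask extends RecursiveTask<Long> {
        private static final int THRESHOLD = 1024; // assumed cutoff for illustration
        private final long[] data;
        private final int lo;
        private final int hi;

        SumTask(long[] data, int lo, int hi) {
            this.data = data;
            this.lo = lo;
            this.hi = hi;
        }

        @Override
        protected Long compute() {
            if (hi - lo <= THRESHOLD) {
                long sum = 0;
                for (int i = lo; i < hi; i++) {
                    sum += data[i]; // the tiny piece of user code
                }
                return sum;
            }
            // Split the range in half, run one half asynchronously
            // and the other in the current thread, then combine.
            int mid = (lo + hi) >>> 1;
            SumTask left = new SumTask(data, lo, mid);
            SumTask right = new SumTask(data, mid, hi);
            left.fork();
            long rightSum = right.compute();
            return left.join() + rightSum;
        }
    }

    static long sum(long[] data) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new SumTask(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] data = new long[10000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(sum(data));
    }
}
```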