Xml and More: JRockit: Parallel vs Concurrent Collectors

The mark and sweep algorithm is the basis of all commercial garbage collectors in JVMs today.^[1,5,12] There are various implementations of mark and sweep algorithm in JRockit, which use different garbage collection strategies.

Unless you really know what you are doing, the –XgcPrio flag is the preferred way to tell JRockit what garbage collection strategies to run.^[1] However, if you want further control over GC behavior, a more fine grained garbage collection strategy can be set from the command line using the –Xgc flag.

In this article, we will examine four Generational Garbage Collectors (Generational GCs) in JRockit, which uses nursery. Note that JRockit refers to the young generations^[2] as nurseries.

Generational Garbage Collection Strategies

The garbage collection strategy is defined by nursery, mark strategy and sweep strategy. The nursery can either be present or not present. The mark and sweep phases can either be concurrent or parallel.

Here we only cover Generational GCs that use the nursery. For example, we will ignore "–Xgc:singlecon"—or single generational concurrent—in this discussion. As shown below, JRockit supports four different garbage collection strategies. They differ only in the using either concurrent or parallel in its mark and sweep phases.

-Xgc: option	Mark	Sweep
genconcon or gencon	Concurrent	Concurrent
genconpar	Concurrent	Parallel
genparpar or genpar (default)	Parallel	Parallel
genparcon	Parallel	Concurrent

GC Mode

Running JRockit with –Xverbose:gc will output plenty of verbose information on what the JVM memory management system is doing. This information includes garbage collections, where they take place (nurseries or old space), changes of GC strategy, and the time a particular garbage collection takes. For example, we have specified the following options:

-Xverbose:gc -Xverbosedecorations=level,module,timestamp,millis,pid

to find the GC mode of each strategy uses. As you can see, different strategies are designed for different purposes:

Throughput, or
Low lantency (or short pausetimes)

-Xgc: option	GC Mode
genconcon or gencon	Garbage collection optimized for short pausetimes, strategy: Generational Concurrent Mark & Sweep.
genconpar	Garbage collection optimized for short pausetimes, strategy: Generational Concurrent Mark, Parallel Sweep.
genparpar or genpar	Garbage collection optimized for throughput, strategy: Generational Parallel Mark & Sweep.
genparcon	Garbage collection optimized for short pausetimes, strategy: Generational Parallel Mark, Concurrent Sweep.

Stopping the World (STW)

Stopping the world means that collectors are halting all executing Java threads to run a collection cycle, thus guaranteeing that new objects are not allocated and objects do not suddenly become unreachable while the collector is running. This has the obvious disadvantage that the application can perform no useful work while a collection cycle is running.

Even though an algorithm such as mark and sweep may run mostly in parallel, it is still a complicated problem that references may change during actual garbage collections. If the Java application is allowed to run, executing arbitrary field assignments and move instructions at the same time as the garbage collector tries to clean up the heap, synchronization between the application and the garbage collector is needed. Stopping the world for short periods of time is necessary for most garbage collectors and this is one of the main sources of latency and non-determinism in a runtime.

Parallel vs Concurrent Collectors

The mark-and-sweep algorithm consists of two phases:^[5]

Mark phase

In which, it finds and marks all accessible objects (or live objects).
Marking is very parallelizable and large parts of a mark phase can also be run concurrently with executing Java code.

Sweep phase

In which, it scans through the heap and reclaims all the unmarked objects.
Sweeping and compaction (JRockit uses partial compaction to avoid fragmentation), however, tend to be more troublesome for parallelization, even though it is fairly simple to compact parts of the heap while others are swept, thus achieving better throughput.

Many stages of a mark and sweep garbage collection algorithm can be made to run concurrently with the executing Java application. Marking is the most critical of the stages, as it usually takes up around 90 percent of the total garbage collection time. The computational complexity of mark and sweep is both a function of the amount of live data on the heap (for mark) and the actual heap size (for sweep).

Parallel collectors require stop-the-world pause for the whole duration of major collection phases (mark or sweep), but employ all available cores to compress pause time. Parallel collectors usually have better throughput, but they are not a good fit for pause critical applications.

Concurrent collectors try to do most work concurrently (though they also do it in parallel on multi-core systems), stopping the application only for short duration. Note that the concurrent collection algorithm in JRockit is fairly different from both HotSpot's concurrent collectors (CMS^[3] and G1^[4] ).

Conclusions

In memory management, the time spent in GC is detrimental to an application's performance. So, minimizing the time spent in GC seems to be a good thing to achieve (or is it?). Collectors use simple stop-the-world strategy consumes less CPU time because it is simple and requires less bookkeeping. However, simple stop-the-world collectors halt all Java threads and they are not a good fit for pause critical applications. Note that lantency is caused by not spending CPU cycle executing Java code.

Optimizing for low lantencies is basically a matter of avoiding stopping the world and let the Java application run as much as possible. However, performing GC and executing Java code concurrently requires a lot more bookkeeping and thus, the total time spent in GC will be longer. Also, if the garbage collector gets too little total CPU time and it can't keep up with the allocation frequency, the heap will fill up and an OutOfMemoryError^[6-8] will be thrown by the JVM. Therefore, another key to low latency is to keep heap usage and fragmentation at a proper level.

To conclude, the tradeoff in memory management is between maximizing throughput and maintaining low latencies. In the real world, we can't achieve both at the same time.

References

Oracle JRockit- The Definitive Guide
Understanding Garbage Collection (HotSpot)
Understanding CMS GC Logs (HotSpot)
Garbage First Garbage Collector Tuning (HotSpot)
Mark-and-Sweep Garbage Collection
How to Debug Native OutOfMemory in JRockit (Xml and More)
Diagnosing OutOfMemoryError or Memory Leaks in JRockit (XML and More)
Diagnosing Java.lang.OutOfMemoryError (XML and More)
JRockit: All Posts on "Xml and More"
JRockit R27.8.1 and R28.3.1 versioning

Note that R28 went from R28.2.9 to R28.3.1—these are just ordinary maintenance releases, not feature releases. There is zero significance to the jump in minor version number.

JRockit: Out-of-the-Box Behavior of Four Generational GC's (Xml and More)

Friday, January 10, 2014

JRockit: Parallel vs Concurrent Collectors