Xml and More: JRockit: Out-of-the-Box Behavior of Four Generational GC's

In [2], we presented four generational GCs in JRockit:

-Xgc:gencon
-Xgc:genpar
-Xgc:genconpar
-Xgc:genparcon

In this article, we examine their out-of-the-box (OOTB)^[7] behavior, with particular attention to adaptive (or automatic) memory management in JRockit.

Adaptive Memory Management

JRockit was the first JVM to recognize that adaptive optimizations based on feedback could be applied to all subsystems in the runtime,^[1] which include:

Code Generation
Memory Management
Threads and Synchronization

In this article, we focus mainly on the memory management of R28. In R28, JRockit adaptively modifies many aspects of garbage collection, but to a lesser extent than R27.^[1]

Adaptive optimizations based on runtime feedback work as follows:^[1] in the beginning, strategy changes are fairly frequent, but once the application has warmed up and reached steady-state behavior, the JVM should settle on an optimal algorithm. If the steady-state behavior later shifts from one kind to another, the JVM may once again change to a better-suited strategy.

So, based on feedback from the memory system, JRockit may heuristically change garbage collection behavior at runtime by:

Changing GC strategies
Automatic heap resizing
Getting rid of memory fragmentation at the right intervals
Recognizing when it is appropriate to "stop the world"
Changing the number of garbage collecting threads

-Xverbose:gc flag

To investigate the OOTB behavior of the four generational GCs mentioned above (see also [2]), we used the following flag:

-Xverbose:gc flag

Typically, the log shows events such as garbage collection strategy changes and heap size adjustments, as well as when each garbage collection takes place and how long it takes.
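As a rough illustration of how such runs can be set up, the sketch below assembles one command line per strategy, each with -Xverbose:gc enabled. The -Xgc and -Xverbose:gc flags are the ones named in this article; the class name, the helper method, and the "benchmark.jar" file name are placeholders invented for the example, not part of our actual test harness.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GcRunCommands {
    // The four generational strategies named in this article.
    static final List<String> STRATEGIES =
            Arrays.asList("gencon", "genpar", "genconpar", "genparcon");

    // Builds the command line for one run with GC logging enabled.
    // The jar name is supplied by the caller; "benchmark.jar" below is hypothetical.
    static List<String> commandFor(String strategy, String jar) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-Xgc:" + strategy);
        cmd.add("-Xverbose:gc");
        cmd.add("-jar");
        cmd.add(jar);
        return cmd;
    }

    public static void main(String[] args) {
        // Print one command line per strategy.
        for (String strategy : STRATEGIES) {
            System.out.println(String.join(" ", commandFor(strategy, "benchmark.jar")));
        }
    }
}
```

Each resulting list could be handed to java.lang.ProcessBuilder, with its output redirected to a separate log file per strategy so that the logs can be compared afterwards.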

OOTB Behavior

The OOTB behavior of the four generational GCs was investigated using one of our benchmarks, which has the following characteristics (see also [3,4]):

High churning rate
Allocating large objects

Here is the Average Response Time (ART, on a relative scale) of the four GC strategies:

GC strategy        Relative ART
-Xgc:genpar        Baseline
-Xgc:gencon        -3.21%
-Xgc:genconpar     -36.56%
-Xgc:genparcon     -17.60%

The tests show that the default throughput GC (i.e., genpar) performs best, while the low-pause-time GCs lag behind. There are a couple of reasons:

Large live data set

Our benchmark has a large live data set (1,471,425 KB), and we assigned it a relatively tight heap (2GB).

The concurrent sweeping phase cannot keep up with the workload generated by the marking phase

This is evidenced by the following:

An emergency parallel sweep was requested for both genconcon and genparcon
But not for genconpar
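To make the "tight heap" point from the first reason concrete, here is a small worked calculation of how much of the heap the live data occupies. It is only a sketch: it assumes that 2GB means 2 GiB (2,097,152 KB), which the article does not state explicitly, and the class and method names are invented for the example.

```java
class HeapOccupancy {
    // Fraction of the heap occupied by live data; both arguments are in KB.
    static double liveFraction(long liveKb, long heapKb) {
        return (double) liveKb / heapKb;
    }

    public static void main(String[] args) {
        long liveKb = 1_471_425L;        // live data size of the benchmark
        long heapKb = 2L * 1024 * 1024;  // assumption: 2GB = 2 GiB = 2,097,152 KB
        double fraction = liveFraction(liveKb, heapKb);
        System.out.println("Live data fraction of heap: " + fraction);
        System.out.println("Free space (KB): " + (heapKb - liveKb));
    }
}
```

Under that assumption, live data takes up about 70% of the heap, leaving roughly 611 MB for new allocations and for the collector to work with.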

Changing GC Strategies at Runtime

The mark-and-sweep algorithm consists of two phases:[1,5,8]

Mark phase

In this phase, the collector finds and marks all accessible (live) objects

Sweep phase

In this phase, the collector scans through the heap and reclaims all unmarked objects
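The two phases above can be sketched on a toy object graph. This is a minimal, single-threaded illustration of the general mark-and-sweep idea, not JRockit's implementation; the Obj class, the list-based "heap", and all names are invented for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

class ToyMarkSweep {
    // A toy heap object: outgoing references plus a mark bit.
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    // Mark phase: trace from the roots and mark every reachable object.
    // Returns the number of objects marked, so work grows with live data.
    static int mark(List<Obj> roots) {
        int visited = 0;
        Deque<Obj> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            Obj o = stack.pop();
            if (o.marked) continue;      // already seen (handles cycles)
            o.marked = true;
            visited++;
            for (Obj r : o.refs) {
                if (!r.marked) stack.push(r);
            }
        }
        return visited;
    }

    // Sweep phase: scan the whole heap, reclaim unmarked objects, and
    // clear the mark bit on survivors. Work grows with the heap size.
    static int sweep(List<Obj> heap) {
        int reclaimed = 0;
        for (Iterator<Obj> it = heap.iterator(); it.hasNext();) {
            Obj o = it.next();
            if (o.marked) {
                o.marked = false;
            } else {
                it.remove();
                reclaimed++;
            }
        }
        return reclaimed;
    }

    public static void main(String[] args) {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c"), d = new Obj("d");
        a.refs.add(b);
        b.refs.add(a);                   // cycle between live objects
        c.refs.add(d);                   // c and d are unreachable from the root
        List<Obj> heap = new ArrayList<>(Arrays.asList(a, b, c, d));
        int marked = mark(Collections.singletonList(a));
        int reclaimed = sweep(heap);
        System.out.println("marked=" + marked + " reclaimed=" + reclaimed);
    }
}
```

In this example only a and b are reachable from the root, so two objects are marked and the other two are reclaimed by the sweep.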

In the GC logs of both gencon and genparcon, we saw the following messages:

gencon

[INFO ][memory ][Thu Jan 9 06:02:12 2014][1389247332488][25536] [OC#6] Changing GC strategy from: genconcon to: genconpar, reason: Emergency parallel sweep requested.

genparcon

[INFO ][memory ][Wed Jan 8 21:15:00 2014][1389215700143][24163] [OC#1] Changing GC strategy from: genparcon to: genparpar, reason: Emergency parallel sweep requested.

Possible reasons include:

Sweeping and compaction (JRockit uses partial compaction to avoid fragmentation) tend to be harder to parallelize. Allowing Java threads to run concurrently with the sweep phase makes sweeping slower, because more bookkeeping and/or synchronization is needed. Also, using fewer GC threads means that the garbage collector may not keep up with the growing set of dead objects.
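When scanning long GC logs for strategy changes like the two messages shown above, a small parser can help. The sketch below is based only on the format of those two sample lines, so the regular expression is an assumption about the log layout, not a documented format; the class and field names are invented for the example.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GcStrategyChangeParser {
    // One parsed "Changing GC strategy" event.
    static final class Change {
        final int ocNumber;   // the number printed after "OC#" in the log line
        final String from;
        final String to;
        final String reason;
        Change(int ocNumber, String from, String to, String reason) {
            this.ocNumber = ocNumber;
            this.from = from;
            this.to = to;
            this.reason = reason;
        }
    }

    // Pattern inferred from the two sample lines in this article (an assumption).
    private static final Pattern CHANGE = Pattern.compile(
            "\\[OC#(\\d+)\\] Changing GC strategy from: (\\w+) to: (\\w+), reason: (.*)$");

    // Returns the parsed event, or empty if the line is not a strategy change.
    static Optional<Change> parse(String line) {
        Matcher m = CHANGE.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new Change(
                Integer.parseInt(m.group(1)), m.group(2), m.group(3), m.group(4).trim()));
    }

    public static void main(String[] args) {
        String line = "[INFO ][memory ][Thu Jan 9 06:02:12 2014][1389247332488][25536] [OC#6] "
                + "Changing GC strategy from: genconcon to: genconpar, "
                + "reason: Emergency parallel sweep requested.";
        parse(line).ifPresent(c ->
                System.out.println(c.from + " -> " + c.to + " at OC#" + c.ocNumber));
    }
}
```

Lines that do not match the pattern yield an empty result, so the parser can be applied to every line of a log without pre-filtering.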

Conclusions

Ideally, an adaptive runtime would never need tuning at all, as the runtime feedback alone would determine how the application should behave for any given scenario at any given time. However, the computational complexity of the mark-and-sweep algorithm is a function of both the amount of live data on the heap (for mark) and the actual heap size (for sweep). Depending on the amount of live data on the heap and the system configuration, the OOTB behavior of the chosen generational GC may or may not be able to keep up with the garbage collection workload.

It is often argued that automatic memory management can slow down execution for certain applications to such an extent that it becomes impractical. This is because automatic memory management can introduce a high degree of non-determinism into a program that requires short response times. It also adds bookkeeping overhead: for example, an Old GC is needed to change garbage collection strategies. To avoid this, some manual tuning may be needed to get good application performance.

Before attempting to tune the performance of an application, it is important to know where the bottlenecks are. That way no unnecessary effort is spent on adding complex optimizations in places where it doesn't really matter.

Saturday, January 11, 2014

