

Sunday, November 23, 2014

G1 GC: Humongous Objects and Humongous Allocations

For the Garbage First Garbage Collector (G1 GC), any object that is more than half a region size is considered a humongous object.  A humongous object is allocated directly into humongous regions.[1]

In this article, we will discuss the following topics:
  • Humongous regions and humongous allocations
  • How humongous allocations impact G1's performance
  • How to detect humongous allocations
    • Basic Investigation
    • Advanced Investigation

Humongous Regions and Humongous Allocations


A humongous object is allocated directly in the humongous regions. These humongous regions are a contiguous set of regions. StartsHumongous marks the start of the contiguous set and ContinuesHumongous marks the continuation of the set.

Before allocating any humongous region, the marking threshold is checked, initiating a concurrent cycle if necessary. Dead humongous objects are freed at the end of the marking cycle, during the cleanup phase, and also during a full garbage collection cycle (but a new implementation has changed this; see the next section).

Since each individual set of StartsHumongous and ContinuesHumongous regions contains just one humongous object, the space between the end of the humongous object and the end of the last region spanned by the object is unused. For objects that are just slightly larger than a multiple of the heap region size, this unused space can cause the heap to become fragmented.

How Do Humongous Allocations Impact G1's Performance?


In the old implementation, humongous objects were not released until after a full concurrent marking cycle.[4]  This is far from ideal for many transaction-based enterprise applications that create short-lived (i.e., within the transaction scope) humongous objects, such as ResultSets generated from JDBC queries.  The heap fills up relatively quickly, leading to unnecessary marking cycles just to reclaim them.

A new implementation, shipped since:
java version "1.8.0_40-ea"
Java(TM) SE Runtime Environment (build 1.8.0_40-ea-b02)
Java HotSpot(TM) 64-Bit Server VM (build 25.40-b05, mixed mode)
handles humongous regions differently and can reclaim them earlier if they are short-lived (see [4] for details).

For investigating humongous allocations in G1, you should begin with the following minimal set of VM options:
  • -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintReferenceGC (-XX:+UseG1GC)

Basic Humongous Allocation Investigation


For a basic humongous allocation investigation, you simply add one more option:
  • -XX:+PrintAdaptiveSizePolicy
This option prints output like the following:
12025.832: [G1Ergonomics (Concurrent Cycles) request concurrent cycle initiation, reason: occupancy higher than threshold, occupancy: 3660578816 bytes, allocation request: 1048592 bytes, threshold: 3650722120 bytes (85.00 %), source: concurrent humongous allocation]
From the above, we can see that a humongous allocation was initiated with a request size of 1048592 bytes. Note that our region size is 1 MB in this test, so the requested object is considered humongous (i.e., 1048592 bytes > half of 1 MB).
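To make this concrete, below is a minimal sketch (the class name and array size are ours, not from the original test) that would trigger a humongous allocation with a 1 MB region size:

// Run with (JDK 8):
//   java -XX:+UseG1GC -XX:G1HeapRegionSize=1m -XX:+PrintGCDetails -XX:+PrintAdaptiveSizePolicy HumongousDemo
public class HumongousDemo {
    public static void main(String[] args) {
        // With a 1 MB region size, any allocation larger than 512 KB is humongous.
        // A 1 MB array (plus the object header) is close to the ~1048592-byte request seen above.
        byte[] big = new byte[1024 * 1024];  // allocated directly in humongous regions
        System.out.println("allocated " + big.length + " bytes");
    }
}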

In the output, you can also find the following entries:
      [Humongous Reclaim: 0.0 ms]
         [Humongous Total: 54]
         [Humongous Candidate: 0]
         [Humongous Reclaimed: 0]
This shows that 54 humongous regions were found in this GC event, and none of them were reclaimed.  The reason is that the current algorithm is relatively conservative: it skips a humongous object for reclamation if there are sparse references into it.  In cases where those references belong to sparse table entries, G1 may improve the reclamation rate by discovering that they are actually not referenced at all after iterating the table.  A performance enhancement for this effort is pending.[5]

Advanced Humongous Allocation Investigation


To investigate humongous allocations in more detail, you can add:
-XX:+PrintAdaptiveSizePolicy -XX:+UnlockExperimentalVMOptions -XX:+G1ReclaimDeadHumongousObjectsAtYoungGC -XX:G1LogLevel=finest -XX:+G1TraceReclaimDeadHumongousObjectsAtYoungGC
Note that the following VM options are experimental:
  • G1TraceReclaimDeadHumongousObjectsAtYoungGC[4]
  • G1ReclaimDeadHumongousObjectsAtYoungGC[3]
  • G1LogLevel (i.e., [fine|finer|finest])
That's why we added the unlock option:
-XX:+UnlockExperimentalVMOptions
With the extra printing options, you will find detailed descriptions of humongous allocations, as shown below:
3708.669: [SoftReference, 0 refs, 0.0000340 secs]3708.669: [WeakReference, 1023 refs, 0.0001790 secs]3708.669: [FinalReference, 295 refs, 0.0007020 secs]3708.670: [PhantomReference, 0 refs, 0.0000060 secs]3708.670: [JNI Weak Reference, 0.0000140 secs]Live humongous 1 region 9 size 65538 with remset 1 code roots 0 is marked 0 live-other 0 obj array 0

Live humongous 1 region 10 size 88066 with remset 1 code roots 0 is marked 0 live-other 0 obj array 0
Live humongous 1 region 15 size 262146 with remset 1 code roots 0 is marked 0 live-other 0 obj array 1
...
Finally, you can also use Java Flight Recorder (JFR) and Java Mission Control (JMC) to get information about humongous objects by showing their allocation sources with stack traces.[7]

References

  1. Garbage First Garbage Collector Tuning
  2. G1GC: Migration to, Expectations and Advanced Tuning
  3. Thread local allocation buffers (TLAB's)
  4. Early reclamation of large objects in G1
    • Introduced since 8u40 b02 and 8u45
    • Was not back ported to 8u20
  5. Early reclaim of large objects that are referenced by a few objects
  6. G1TraceReclaimDeadHumongousObjectsAtYoungGC
    • JDK-8058801
      • Prints information about live and dead humongous objects
        • (Before the bug fix) prints that information only when there is at least one humongous candidate.
        • (After the bug fix) prints that information even when there are no humongous candidates; it also prints the humongous object size in the output
  7. Java Mission Control (JMC) and Java Flight Recorder (JFR)
  8. Garbage-First Garbage Collector (JDK 8 HotSpot Virtual Machine Garbage Collection Tuning Guide)
  9. Other JDK 8 articles on Xml and More
  10. Concurrent Marking in G1

Sunday, November 2, 2014

G1 GC: What Is "to-space exhausted" in GC Log?

As shown in [1], you can enable basic garbage collection (GC) event printing using two options:
  • -XX:+PrintGCTimeStamps (or -XX:+PrintGCDateStamps) 
  • -XX:+PrintGCDetails
This additional printing incurs minimal overhead while providing useful information for understanding the health of your running garbage collector (e.g., the Garbage First GC).

In this article, we will discuss the significance of to-space exhausted events in your GC logs.

What Is "to-space exhausted"?


When you see to-space exhausted messages in GC logs, it means that G1 does not have enough memory for survivor or tenured objects, or for both, and the Java heap cannot be expanded further since it is already at its maximum.[5]  This event only happens in mixed GCs, when G1 tries to copy live objects from the source space to the destination space.  Note that G1 avoids heap fragmentation by compaction (i.e., live objects are copied from source regions to destination regions).

Example message:

5843.494: [GC pause (G1 Evacuation Pause) (mixed)... (to-space exhausted), 0.1145790 secs]

When that happens, the GC pause time will be longer (since both the RSet and CSet[5] need to be re-generated), and sometimes a Full GC may be required to resolve the issue.

A Case Study of "to-space exhausted" Events


Using a benchmark with a large live data size,[3] a "to-space exhausted" event can be triggered if there is not enough breathing room for shuffling live data objects in the Java heap.  For example, we have a test case whose live data set occupies 68% of the Java heap at run time.  In a 4-hour run, we have seen the following sequence of events:

5843.494: [GC pause (G1 Evacuation Pause) (mixed)... (to-space exhausted), 0.1145790 secs]
5848.831: [GC pause (G1 Evacuation Pause) (mixed)... (to-space exhausted), 0.0847160 secs]
5856.640: [GC pause (G1 Evacuation Pause) (mixed)... 0.0864060 secs]
5866.182: [GC pause (G1 Evacuation Pause) (mixed)... 0.0821220 secs]
5875.550: [GC pause (G1 Evacuation Pause) (mixed)... 0.0549090 secs]
5882.353: [GC pause (G1 Evacuation Pause) (mixed)... 0.0827630 secs]
5896.331: [GC pause (G1 Evacuation Pause) (mixed)... 0.0632660 secs]
5901.966: [GC pause (G1 Evacuation Pause) (mixed)... 0.5356580 secs]
5902.813: [GC pause (G1 Evacuation Pause) (mixed)... (to-space exhausted), 1.3480320 secs]
5904.162: [Full GC (Allocation Failure) ... 5.0413590 secs]

A series of "to-space exhausted" events eventually led to a Full GC.  As discussed in [4], adding -Xmn400m to the command line of the same test case further added insult to injury—instead of seeing 3 "to-space exhausted" events, we now saw 20 of them.  This is because fixing the young generation size further cuts down the breathing room for G1.



How to Deal with "to-space exhausted" Events?


"to-space exhausted" events happen when G1 runs out of space (either survivor space or old space; see diagram above) to copy live objects to.  When it happens, sometimes G1's ergonomics can get by the issue by dynamically re-sizing heap regions.  If not, it eventually leads to a Full GC event.

When "to-space exhausted" events happen, it could mean you have a very tight heap allocated for your application.  The easiest approach to resolve this is increasing your heap size if it's possible. Alternatively, sometimes it's possible to tune G1's mixed GC to get by (see [4]). As shown in the above section, we have a high live data occupancy (i.e., 68%) in Java heap.  However, after tuning mixed GC, we can get by with a couple of Full GC's in a 4-hour run, which yields satisfactory result.

Note that a mixed GC collects both young and old regions; if it did not take any old regions, it would be a young(-only) GC. G1's mixed GCs are meant as a replacement for Full GCs: instead of doing all the work at once, the idea is to accomplish the same task in steps. Each step in a mixed GC is smaller than a Full GC, and because G1 does not try to reclaim all garbage as a Full GC does, a mixed GC takes less time.  However, a Full GC is a compaction event that can work around the "to-space exhausted" issue without needing extra space.

Acknowledgement


Some writings here are based on feedback from Thomas Schatzl and Yu Zhang. However, the author assumes full responsibility for the content himself.

References

  1. g1gc logs - basic - how to print and how to understand
  2. g1gc logs - Ergonomics -how to print and how to understand
  3. JRockit: How to Estimate the Size of Live Data Set
  4. JDK 8: Is Tuning MaxNewSize in G1 GC a Good Idea?
  5. Garbage First Garbage Collector Tuning
    • Note that "to-space overflow" message was removed and only "to-space exhausted" remains.
    • RSet (Remembered Set)—Tracks object references into a given region. There is one RSet per region in the heap. The RSet enables the parallel and independent collection of a region. The overall footprint impact of RSets is less than 5%.
    • CSet (Collection Set)—Are the set of regions that will be collected in a GC. All live data in a CSet is evacuated (copied/moved) during a GC. Sets of regions can be Eden, survivor, and/or old generation. CSets have a less than 1% impact on the size of the JVM. 
  6. HotSpot Virtual Machine Garbage Collection Tuning Guide
  7. Garbage-First Garbage Collector Tuning
  8. Getting Started with the G1 Garbage Collector
  9. Garbage-First Garbage Collector (JDK 8 HotSpot Virtual Machine Garbage Collection Tuning Guide)
  10. Other JDK 8 articles on Xml and More
  11. Concurrent Marking in G1

Tuesday, September 30, 2014

JDK 8: Thread Stack Size Tuning

When you upgrade JDK, you should re-examine all JVM options you have set in your Java applications.  For example, let's look at thread stack size tuning specifically.  As suggested in [1], it states:
  • In most applications, 128k happens to be enough for the Java thread stack.
However, after setting that, we ran into the following fatal error:
The stack size specified is too small, Specify at least 228k
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
In this article, we will discuss thread stack size tuning in JDK 8 (i.e., HotSpot VM).

Default Thread Stack Size


When a new thread is launched, the Java virtual machine creates a new Java stack for the thread. As mentioned earlier, a Java stack stores a thread's state in discrete frames.[3] The Java virtual machine only performs two operations directly on Java Stacks: it pushes and pops frames.

The default thread stack size varies with JVM, OS and environment variables. To find out what your default ThreadStackSize is on your platform, use:[1]
java -XX:+PrintFlagsFinal -version
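To home in on the relevant flag, you can filter the output; for example:
java -XX:+PrintFlagsFinal -version | grep ThreadStackSize
which also matches the related CompilerThreadStackSize and VMThreadStackSize flags.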
A typical value is 512k. It is generally larger for 64-bit JVMs because references are 8 bytes rather than 4 bytes in size (though you can compress oops or class pointers if you choose).[2] For example:[4]
In Java SE 6, the default on Sparc is 512k in the 32-bit VM, and 1024k in the 64-bit VM. On x86 Solaris/Linux it is 320k in the 32-bit VM and 1024k in the 64-bit VM.
On Windows, the default thread stack size is read from the binary (java.exe). As of Java SE 6, this value is 320k in the 32-bit VM and 1024k in the 64-bit VM.

In JDK 8, every time the JVM creates a thread, the OS allocates some native memory to hold that thread’s stack, committing more memory to the process until the thread exits. Thread stacks are fully allocated (i.e., committed, not just reserved) when they are created.

This means that if your application spawns a lot of threads, they can consume a significant amount of memory which could otherwise be used by your application or the OS (and it can eventually lead to an OutOfMemoryError).

You can reduce your stack size by running with the -Xss option. For example:
java -server -Xss256k
or
java -server -XX:ThreadStackSize=256 
Note that if you have installed a 64-bit VM binary for Linux, you can omit the -server option.[5]

Virtual Memory Map


In JDK 8, the HotSpot installation comes with a feature named Native Memory Tracking (disabled by default).  To enable it, use:
-XX:NativeMemoryTracking=[off|detail|summary]

After enabling NMT, you can examine the memory footprint taken by either Thread or Thread Stack using:
jcmd <pid> VM.native_memory [summary | detail | baseline | summary.diff | detail.diff | shutdown] [scale= KB | MB | GB]
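For example, a typical workflow (the pid and scale are illustrative) is to take a baseline and later diff against it:

jcmd <pid> VM.native_memory baseline
  ... run your workload ...
jcmd <pid> VM.native_memory summary.diff scale=MB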

 For example, on a 64-bit Linux platform, here is the thread stack size before and after setting -Xss256k:

Before

 Virtual memory map:

[0x0000000040049000 - 0x000000004014a000] reserved and committed 1028KB for Thread Stack from
    [0x00002aec741ca5e4] JavaThread::run()+0x24
    [0x00002aec74083268] java_start(Thread*)+0x108

After

Virtual memory map:

[0x0000000040078000 - 0x00000000400b9000] reserved and committed 260KB for Thread Stack from
    [0x00002b02c69156e4] JavaThread::run()+0x24
    [0x00002b02c67ce338] java_start(Thread*)+0x108


Conclusions


The thread stack is used to push stack frames in nested method calls. If the nesting is so deep that the thread runs out of stack space, the thread dies with a StackOverflowError.[8] If your applications use lots of recursive algorithms, or if they are built on top of a framework utilizing the MVC design pattern such as Oracle ADF, you may want to leave ThreadStackSize at its default.
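To see how the stack size bounds recursion depth on your platform, here is a minimal sketch (the class name is ours); running it with java -Xss256k StackDepthProbe and again with java -Xss2m StackDepthProbe should show very different depths:

public class StackDepthProbe {
    private static int depth = 0;

    private static void recurse() {
        depth++;   // each nested call pushes one more frame onto the thread stack
        recurse();
    }

    public static void main(String[] args) {
        try {
            recurse();
        } catch (StackOverflowError e) {
            // A smaller -Xss means fewer frames fit before the stack is exhausted.
            System.out.println("StackOverflowError at depth " + depth);
        }
    }
}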

However, thread stacks are quite large, particularly for a 64-bit JVM.  In [9], Scott Oaks has advised:
  • As a general rule, many applications can actually run with a 128 KB stack size in a 32-bit JVM, and a 256 KB stack size in a 64-bit JVM.
  • In a 64-bit JVM, there is usually no reason to set this value unless the machine is quite strained for physical memory and the smaller stack size will prevent applications from running out of native memory. 
  • On the other hand, using a smaller (e.g., 128 KB) stack size on a 32-bit JVM is often a good idea, as it frees up memory in the process size and allows the JVM to utilize a larger heap.

Finally, the total footprint of the JVM has a significant effect on its performance, so footprint is one aspect of Java performance that should be routinely monitored.

Thursday, August 28, 2014

JDK 8: Revisiting ReservedCodeCacheSize and CompileThreshold

In [1], someone commented:
Did you specify any extra JVM parameters to reach the state of full CodeCache? Some comments on the internet indicate this happens if you specify too low "-XX:CompileThreshold" and too much bytecode gets compiled by HotSpot very early.

when the following warning was seen:
VM warning: CodeCache is full. Compiler has been disabled.

In this article, we will look at tuning CompileThreshold and ReservedCodeCacheSize in JDK 8.


CompileThreshold



By default, CompileThreshold is set to be 10,000:

     intx CompileThreshold     = 10000       {pd product}

As described in [2], we know that {pd product} means "platform-dependent product option".  Our platform is linux-x64, and that is what will be used in this discussion.

Very often, you see people setting the threshold lower.  For example:
-XX:CompileThreshold=8000
Why?  Since the JIT compiler does not have time to compile every single method in an application, all code starts out initially running in the interpreter, and once it becomes hot enough it gets scheduled for compilation. To help determine when to convert bytecodes to compiled code, every method has two counters:
  • Invocation counter
    • Which is incremented every time a method is entered
  • Backedge counter
    •  Which is incremented every time control flow moves from a higher bytecode index to a lower one
Whenever either counter is incremented by the interpreter it checks them against a threshold, and if they cross this threshold, the interpreter requests a compile of that method.

The threshold used for the invocation counter is called CompileThreshold; the backedge counter uses a more complex formula derived from CompileThreshold and OnStackReplacePercentage.  So, if you set the threshold lower, HotSpot compiles methods earlier.  And, in some cases, that can help the performance of server code.
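To observe this in action, you can add -XX:+PrintCompilation, which logs each method as it gets compiled. For example (MyApp is a placeholder; tiered compilation is disabled here because, as discussed below, CompileThreshold is ignored when it is on):

java -server -XX:-TieredCompilation -XX:CompileThreshold=1000 -XX:+PrintCompilation MyApp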

ReservedCodeCacheSize


The code cache is where the JVM stores the native code generated for compiled methods.  As described in [3], to improve an application's performance, you can set the "reserved" code cache size:
  • -XX:ReservedCodeCacheSize=256m
when tiered compilation is enabled for HotSpot.  Basically, it sets the maximum size for the compiler's code cache.  In [4], we have shown that an application can run faster if tiered compilation is enabled in a server environment.  However, the code cache size also needs to be set larger.

What's New in JDK 8?


We have seen people setting the following JVM options:
  • -XX:ReservedCodeCacheSize=256m -XX:+TieredCompilation
or
  • -XX:CompileThreshold=8000 
in JDK 7.  In JDK 8, do we still need to set them?  The answer is that it depends on the platform.  On linux-x64 platforms, those settings are no longer necessary.  Here we will describe why.

In JDK 8, it chooses the following default values for linux-x64 platforms:

    bool TieredCompilation        = true       {pd product}     
    intx CompileThreshold         = 10000      {pd product}
    uintx ReservedCodeCacheSize   = 251658240  {pd product}
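You can verify the defaults on your own platform with a flag dump; for example (the egrep pattern is illustrative):

java -XX:+PrintFlagsFinal -version | egrep 'TieredCompilation|CompileThreshold|ReservedCodeCacheSize'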


When tiered compilation is enabled, two things happen:
  1. CompileThreshold is ignored
  2. A bigger code cache is needed.  Internally, HotSpot will set it to be 240 MB (i.e., 48 MB * 5)
That's why we say that people don't need to set the following options anymore in JDK8:
  • -XX:ReservedCodeCacheSize=256m -XX:+TieredCompilation 
or
  • -XX:CompileThreshold=8000

Note that the "reserved" code cache is just an address space reservation; it does not really consume any additional physical memory unless it's used.  On 64-bit platforms, it doesn't hurt at all to set a higher value.  However, if you set the cache size too small, you will definitely see a negative impact on your application's performance.

Acknowledgement


Some writings here are based on feedback from Igor Veresov and Vladimir Kozlov. However, the author assumes full responsibility for the content himself.

References

  1. VM warning: CodeCache is full. Compiler has been disabled.
  2. HotSpot: What Does {pd product} Mean?  (Xml and More)
  3. Performance Tuning with Hotspot VM Option: -XX:+TieredCompilation (Xml and More)
  4. A Case Study of Using Tiered Compilation in HotSpot  (Xml and More)
  5. Useful JVM Flags – Part 4 (Heap Tuning)
  6. g1gc logs - Ergonomics -how to print and how to understand 
  7. G1 GC Glossary of Terms
  8. Learn More About Performance Improvements in JDK 8 
  9. HotSpot Virtual Machine Garbage Collection Tuning Guide
  10. Other JDK 8 articles on Xml and More
  11. Tuning that was great in old JRockit versions might not be so good anymore
    • Trying to bring over each and every tuning option from a JR configuration to an HS one is probably a bad idea.
    • Even when moving between major versions of the same JVM, we usually recommend going back to the default (just pick a collector and heap size) and then redoing any tuning work from scratch (if even necessary).

Thursday, February 6, 2014

Hotspot: Creating Flight Recording and Viewing Object Allocation

In [1], Marcus has written an excellent article on "Creating Flight Recording." Based on his advice, we have created a JFR recording with object allocations inside/outside TLAB (Thread Local Allocation Buffer) enabled. Then, we installed Java Mission Control (an Eclipse plug-in) on our 64-bit Windows platform and used it to investigate object allocations in the recording.

In this article, we will visit the following topics:
  • Generating JFR recording
  • Downloading and installing Java Mission Control
  • Viewing object allocations using Java Mission Control

Generating JFR Recording


Following Marcus' Time Fixed Recording example, we have used the following command line options:
  • -XX:+UnlockCommercialFeatures -XX:+FlightRecorder -XX:StartFlightRecording=delay=7800s,duration=300s,name=MyRecording,filename=/tmp/myrecording.jfr,settings=default

In our example, we started a 5-minute recording after HotSpot had run for 7800 seconds. To configure the recording, you can use either of the templates provided in the jre/lib/jfr folder:
  • default.jfc
  • profile.jfc

For our example, we have chosen the default template (note that the settings parameter can also take a path to a template) in which we have enabled the following settings:
<flag name="allocation-profiling-enabled" label="Allocation Profiling">true</flag>

<event path="java/object_alloc_in_new_TLAB">
  <setting name="enabled" control="allocation-profiling-enabled">true</setting>
  <setting name="stackTrace">true</setting>
</event>

<event path="java/object_alloc_outside_TLAB">
  <setting name="enabled" control="allocation-profiling-enabled">true</setting>
  <setting name="stackTrace">true</setting>
</event>

Note that we saved an original copy of default.jfc before making the changes. After running our benchmark, it generated a recording file named:
  • myrecording.jfr
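Alternatively (assuming the target JVM was started with -XX:+UnlockCommercialFeatures -XX:+FlightRecorder), you can start a recording on an already-running process with jcmd:
  • jcmd <pid> JFR.start duration=300s filename=/tmp/myrecording.jfr settings=default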

Downloading and Installing Java Mission Control



Capability                         | Oracle JRockit JDK6 (R28+) | Oracle JDK 7 GA | Oracle JDK 7u40+
Host JRMC/JMC GUI                  | Yes (JRMC)                 | Yes (JMC)       | Yes (JMC)
WLDF JFR Events and Analysis       | Yes                        | Yes             | Yes
JFR, JMC Convergence (JVM Events)  | Yes                        | No              | Yes

                 Table 1  Oracle JDK 7 Java Mission Control Support

You can follow the instructions from [2] to download Java Mission Control—an Eclipse plug-in. However, instead of using https, you should use the http URL. Also, note that:
Java Mission Control 5.2.0 does not support Kepler, only Juno.[3,4]
So, you need to download Eclipse Juno. If you install JMC in Eclipse Kepler, you will see the problem reported in [5].




Viewing object allocations using Java Mission Control


After opening myrecording.jfr, select the Memory tab group on the left.  The Memory tab group shows information on memory management and garbage collections. It comprises these tabs:
  • Overview Tab
  • Garbage Collections Tab
  • GC Times Tab
  • GC Configuration Tab
  • Allocations Tab
  • Object Statistics Tab
The Allocations tab is a good starting point if you suspect that you have problems with object allocation. It contains data about allocated objects and Thread Local Allocation Buffers (TLABs). This tab has information about how much memory each thread has allocated.

To recap: when we did the recording, we enabled the following events:
  • java/object_alloc_in_new_TLAB
  • java/object_alloc_outside_TLAB
which are displayed in JMC as:
  • Allocation in new TLAB
  • Allocation outside TLAB


References

  1. Creating Flight Recordings
  2. Oracle Java Mission Control Downloads
  3. Java Mission Control for Eclipse
  4. Oracle® Java Mission Control
    • Oracle® Java Mission Control is a set of plug-ins for Eclipse 3.8 or 4.2.
  5. Mission Control and Flight Recorder on HotSpot JVM
  6. Which JVM?
    • If opening JFR recordings takes forever,  you may need to increase heap size.
  7. JDBC in Java Mission Control 
  8. JDK Mission Control (JMC) 8.1.0 Downloads 

Monday, September 2, 2013

HotSpot: Using jstat to Explore the Performance Data Memory

HotSpot provides jvmstat instrumentation for performance testing and problem isolation purposes, and it is enabled by default (see -XX:+UsePerfData).

If you run Java application benchmarks, it's also useful to save the PerfData memory to an hsperfdata_<vmid> file on exit by setting:
  • -XX:+PerfDataSaveToFile
A file named hsperfdata_<vmid> will be saved in the WebLogic domain's top-level folder.

How to Read hsperfdata File?


To display statistics collected in PerfData memory, you can use:
  • jstat[3]
    • Experimental JVM Statistics Monitoring Tool - It can attach to an instrumented HotSpot Java virtual machine and collects and logs performance statistics as specified by the command line options. (formerly jvmstat)
There are two ways of showing statistics collected in PerfData memory:
  • Online
    • You can attach to an instrumented HotSpot JVM and collect and log performance statistics at runtime.
  • Offline
    • You can set the -XX:+PerfDataSaveToFile flag and read the contents of the hsperfdata_<vmid> file on JVM exit.
In the following, we show an offline example of reading the hsperfdata_<vmid> file (i.e., a binary file; you need to use jstat[3] to display its contents):
$ /scratch/perfgrp/JVMs/jdk-hs/bin/jstat -class file:///<Path to Domain>/MyDomain/hsperfdata_9872

Loaded    Bytes  Unloaded   Bytes       Time
30600   64816.3         2     3.2      19.74
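For the online case, you can also attach to a running JVM and sample at a fixed interval; for example, the following samples GC utilization every 1000 ms, ten times (the pid is illustrative):

$ jdk-hs/bin/jstat -gcutil <pid> 1000 10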

You can check all available command options supported by jstat using:

$jdk-hs/bin/jstat -options
-class
-compiler
-gc
-gccapacity
-gccause
-gcmetacapacity
-gcnew
-gcnewcapacity
-gcold
-gcoldcapacity
-gcutil
-printcompilation

HotSpot Just-In-Time Compiler Statistics


One of the command options supported by jstat is -compiler, which can provide high-level JIT compiler statistics.

Column        Description
Compiled      Number of compilation tasks performed.
Failed        Number of compilation tasks that failed.
Invalid       Number of compilation tasks that were invalidated.
Time          Time spent performing compilation tasks.
FailedType    Compile type of the last failed compilation.
FailedMethod  Class name and method of the last failed compilation.

In the following, we show the compiler statistics of three managed servers in one WLS domain, using two different JVM builds:

$/scratch/perfgrp/JVMs/jdk-hs/bin/jstat -compiler file:///<Path to Domain>/MyDomain/hsperfdata_9872


JVM1

Compiled Failed Invalid   Time   FailedType FailedMethod
   33210     13       0   232.97          1 oracle/ias/cache/Bucket objInvalidate
   74054     20       0   973.03          1 oracle/security/o5logon/b b
   74600     18       0  1094.21          1 oracle/security/o5logon/b b

JVM2

Compiled Failed Invalid   Time   FailedType FailedMethod
   33287     10       0   246.26          1 oracle/ias/cache/Bucket objInvalidate
   68237     18       0  1022.46          1 oracle/security/o5logon/b b
   67346     18       0   943.79          1 oracle/security/o5logon/b b

Given the above statistics, we could take a next step of analyzing why JVM2 generated fewer compiled methods than JVM1 did. At least this is one of the use cases for PerfData and its associated tool, jstat.

PerfData-Related JVM Options


  • UsePerfData: Flag to disable jvmstat instrumentation for performance testing and problem isolation purposes. Default: true (bool)
  • PerfDataSaveToFile: Save PerfData memory to an hsperfdata_<pid> file on exit. Default: false (bool)
  • PerfDataSamplingInterval: Data sampling interval in milliseconds. Default: 50 ms (intx)
  • PerfDisableSharedMem: Store performance data in standard memory. Default: false (bool)
  • PerfDataMemorySize: Size of the performance data memory region; will be rounded up to a multiple of the native OS page size. Default: 32K (intx)

Note that the default size of the PerfData memory is 32K. Therefore, the hsperfdata_<pid> file dumped on exit is also 32K in size.

References

  1. New Home of Jvmstat Technology
  2. The most complete list of -XX options for Java JVM
  3. jstat - Java Virtual Machine Statistics Monitoring Tool

Saturday, August 10, 2013

Diagnosing Heap Stress in HotSpot

Heap stress is characterized by OutOfMemory conditions or by frequent Full GCs accounting for a certain percentage of CPU time.[6]  To diagnose heap stress, either heap dumps or heap histograms can help.

In this article, we will discuss the following topics:
  1. Heap histogram vs. heap dump[1]
  2. How to generate heap histogram or heap dump in HotSpot


Heap Histogram vs. Heap Dump 


Without much ado, read the companion article[1] for the comparison.  For heap analysis, you can use either jmap or jcmd to do the job[5].  Here we focus only on using jmap.

$ jdk-hs/bin/jmap -help
Usage:
    jmap [option] <pid>
        (to connect to running process)
    jmap [option] <executable> <core>
        (to connect to a core file)
    jmap [option] [server_id@]<remote server IP or hostname>
        (to connect to remote debug server)

where <option> is one of:
    <none>               to print same info as Solaris pmap
    -heap                to print java heap summary
    -histo[:live]        to print histogram of java object heap; if the "live"
                         suboption is specified, only count live objects
    -permstat            to print permanent generation statistics
    -finalizerinfo       to print information on objects awaiting finalization
    -dump:<dump-options> to dump java heap in hprof binary format
                         dump-options:
                           live         dump only live objects; if not specified,
                                        all objects in the heap are dumped.
                           format=b     binary format
                           file=<file>  dump heap to <file>
                         Example: jmap -dump:live,format=b,file=heap.bin <pid>
    -F                   force. Use with -dump:<dump-options> <pid> or -histo
                         to force a heap dump or histogram when <pid> does not
                         respond. The "live" suboption is not supported
                         in this mode.
    -h | -help           to print this help message
    -J<flag>             to pass <flag> directly to the runtime system


Generating Heap Histogram


Heap histograms can be obtained by using jmap (note that you need to use jmap from the same JDK installation which is used to run your applications):

$~/JVMs/jdk-hs/bin/jmap -histo:live 7891 >hs_jmap_7891.txt


 num     #instances         #bytes  class name
----------------------------------------------
   1:       2099805      195645632  [C
   2:        347553       49534472  <constMethodKlass>
   3:       2055692       49336608  java.lang.String
   4:        347553       44501600  <methodKlass>
   5:         30089       36612792  <constantPoolKlass>
   6:       1044560       33425920  java.util.HashMap$Entry
   7:         90868       24909264  [B
   8:         30089       23289072  <instanceKlassKlass>
   9:         22323       18194144  <constantPoolCacheKlass>
  10:        177458       15661816  [Ljava.util.HashMap$Entry;
  11:        642260       15414240  javax.management.ObjectName$Property
  12:        159785       15405144  [Ljava.lang.Object;

In the output, it shows the total size and instance count for each class type in the heap.  For example, there are 2099805 instances of character arrays (i.e., [C), which have a total size of 195645632 bytes.  Because the suboption live was specified, only live objects were counted (i.e., a full GC was forced before the histogram was collected).

Generating Heap Dump


A heap dump is a file containing all the memory contents of a Java application. It can be generated via:

$ ~/JVMs/jdk-hs/bin/jmap -dump:live,file=/tmp/hs_jmap_dump.hprof 7891
Dumping heap to /tmp/hs_jmap_dump.hprof ...
Heap dump file created

Including the live option in jmap forces a full GC to occur before the heap is dumped so that it contains only live objects.  We recommend taking multiple heap dumps (for example, 30 minutes and 1 hour into the run) and then using Eclipse MAT[2,3] to examine them.

Finally, there are other ways to generate a java heap dump:
  • Use jconsole to obtain a heap dump via HotSpotDiagnosticMXBean at runtime (see the sketch after this list)
  • Heap dump will be generated when OutOfMemoryError is thrown by specifying
    • -XX:+HeapDumpOnOutOfMemoryError VM option
  • Use hprof[7]
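As a minimal sketch of the HotSpotDiagnosticMXBean route (the class name and file path are ours):

import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class HeapDumper {
    public static void main(String[] args) throws Exception {
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        // true => dump only live objects (forces a full GC first), as with jmap's live suboption
        bean.dumpHeap("/tmp/hs_mxbean_dump.hprof", true);
    }
}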


References

Wednesday, August 7, 2013

hotspot.log from the HotSpot's Fastdebug Build

I was asked to use a fastdebug VM build to run my benchmark.  I noticed that a new file was generated in our WebLogic domain folder:
  • hotspot.log[1]
To track which methods are getting inlined, you can use the DEBUG VM with the following option:
  • -XX:+PrintInlining
Note that there are some interesting options available only in DEBUG builds[4].


What Is a Fastdebug Build?


There is only scarce information about fastdebug VM builds on the internet.  Here is one description (which may be obsolete)[2]:

  • Hotspot, fastdebug builds, plug&play shared libraries (removal of java_g)
    These may seem unrelated, but when the Hotspot team started building "fastdebug" VM libraries (using -g -O and including assertions), that could just be plugged into any JDK image, that was a game changer. It became possible to plug&play native components when building this way, instead of the java_g builds where all the components had to be built the same way, an all or nothing run environment that was horribly slow and limiting. So we tried to create a single build flow, with just variations on the builds (I sometimes called them "build flavors" of product, debug, and fastdebug). Of course, at the same time, the machines got faster, and perhaps now using a complete debug build of a jdk make more sense? In any case, that fastdebug event influenced matters. We do want different build flavors, but we don't want separate make logic for them.


For a fastdebug build, I've downloaded the following zip file:
  • linux_x64_2.6-fastdebug.zip

hotspot.log


The contents of hotspot.log look like below:

<?xml version='1.0' encoding='UTF-8'?>
<hotspot_log version='160 1' process='7063' time_ms='1375898373959'>
<vm_version>
  <name>Java HotSpot(TM) 64-Bit Server VM</name>
  <release>25.0-b43-fastdebug</release>
  <info>
   Java HotSpot(TM) 64-Bit Server VM (25.0-b43-fastdebug) for linux-amd64 JRE ...
  </info>
</vm_version>
<vm_arguments>
  <args>
   ...
  </args>
  <command>weblogic.Server</command>
  <launcher>SUN_STANDARD</launcher>
  <properties>
   java.vm.specification.name=Java Virtual Machine Specification
   ...
  </properties>
</vm_arguments>

Obviously, there is a ton of other information in the file that can help you identify VM issues.

References

Friday, May 31, 2013

Understanding String Table Size in HotSpot

JDK-6962930[2] requested that the string table size be made configurable.  That bug was resolved on 04/25/2011, and the fix is available in JDK 7.  Another JDK bug[3] requested that the default size (i.e., 1009) of the string table be increased.

In this article, we will examine the following topics:
  • What string table is
  • How to find the number of interned strings in your applications
  • The tradeoff between memory footprint and lookup cost

String Table


In Java, string interning[1] is a method of storing only one copy of each distinct string value, which must be immutable. Interning strings makes some string processing tasks more time- or space-efficient at the cost of requiring more time when the string is created or interned. The distinct values are stored in a string intern pool, which is the string table in HotSpot.
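As a quick illustration of interning semantics (the class name is ours):

public class InternDemo {
    public static void main(String[] args) {
        String a = new String("hello");   // a distinct heap object
        String b = a.intern();            // the canonical copy from the string table
        System.out.println(a == "hello"); // false: a is not the interned instance
        System.out.println(b == "hello"); // true: string literals are interned at class load
    }
}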

The size of the string table (a chained hash table) is configurable in JDK 7.  When the overflow chains become long, performance can degrade.  The current default size of the string table is 1009 buckets, which is too small for applications that stress the string table.  Note that the string table itself is allocated in native memory, but the strings are Java objects.

Increasing the size improves performance (i.e., reduces look-up cost) but increases the StringTable size by 16 bytes on 64-bit systems (8 bytes on 32-bit systems) for every additional entry.  For example, changing the default size to 60013 increases the string table size by 460K on 32-bit systems.

Finding Number of Interned Strings in the Applications


HotSpot provides a product-level option named -XX:+PrintStringTableStatistics, which can be used to print hash table statistics[4].  For example, using one of our applications (hereafter referred to as JavaApp), it prints out the following information:

StringTable statistics:
Number of buckets  : 60013
Average bucket size  : 5
Variance of bucket size : 5
Std. dev. of bucket size: 2
Maximum bucket size  : 17

You can find the above output in your managed server's log file in the WebLogic domain.  Note that we have set the following option:
  • -XX:StringTableSize=60013
So, there are 60013 buckets in the hash table (or string table).

The JDK also ships a tool named jmap, which can be used to find the number of interned strings in your application.  For example, we found the following information using:

$ jdk-hs/bin/jmap -heap 18974
Attaching to process ID 18974, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 24.0-b43

using thread-local object allocation.
Parallel GC with 18 thread(s)

Heap Configuration:
   MinHeapFreeRatio = 40
   MaxHeapFreeRatio = 70
   MaxHeapSize      = 2147483648 (2048.0MB)
   NewSize          = 1310720 (1.25MB)
   MaxNewSize       = 17592186044415 MB
   OldSize          = 5439488 (5.1875MB)
   NewRatio         = 2
   SurvivorRatio    = 8
   PermSize         = 402653184 (384.0MB)
   MaxPermSize      = 402653184 (384.0MB)
   G1HeapRegionSize = 0 (0.0MB)

Heap Usage:
PS Young Generation

<deleted for brevity>

270145 interned Strings occupying 40429904 bytes.


Therefore, we know there are around 270K interned Strings in the table.

Tradeoff Between Memory Footprint and Lookup Cost


Out of curiosity, we tried setting the string table size to 277331 (a prime number) to see how JavaApp performs.  Here are our findings:

  • Average Response Time: +0.75%
  • 90% Response Time: +0.56%

However, the memory footprint has increased:
  • Total Memory Footprint: -1.03%

Finally, here is the hash table statistics based on the new size (i.e., 277331):

StringTable statistics:
Number of buckets       :  277331
Average bucket size     :       1
Variance of bucket size :       1
Std. dev. of bucket size:       1
Maximum bucket size     :       8


The conclusion is that increasing the string table size from 60013 to 277331 helps JavaApp's performance a little bit at the expense of a larger memory footprint.  In this case the benefit is minimal, so keeping the string table size at 60013 is good enough.

References

  1. String Interning (Wikipedia)
  2. JDK 6962930 : make the string table size configurable
  3. JDK 8009928: Increase default value for StringTableSize
  4. Java GC tuning for strings
  5. All other performance tuning articles on XML and More
  6. G1 GC Glossary of Terms


Sunday, April 7, 2013

Understanding CMS GC Logs

In this article, we will examine the following topics:
  • What's the Mark and Sweep algorithm?
  • What's the CMS collector in HotSpot?
  • What's the format of CMS logs?

Mark and Sweep (MS) Algorithm


The mark-and-sweep algorithm consists of two phases[3]: In the first phase, it finds and marks all accessible objects. The first phase is called the mark phase. In the second phase, the garbage collection algorithm scans through the heap and reclaims all the unmarked objects. The second phase is called the sweep phase. The algorithm can be expressed as follows:

for each root variable r
    mark (r);
sweep ();

The computational complexity of mark and sweep is a function of both the amount of live data on the heap (for mark) and the actual heap size (for sweep). A minimal Java sketch of the algorithm follows the list below.
  • Mark
    • Add each object in the root set to a queue
      • Typically, the root set contains all objects that are available without having to trace any references[5], which includes all Java objects on local frames in whatever methods the program is executing when it is halted for GC. This includes:
        • Everything we can obtain from the user stack and registers in the thread contexts of the halted program. 
        • Global data, such as static fields.
    • For each object X in the queue Mark X reachable
      • mark bit is typically associated with each reachable object. 
    • Add all objects referenced from X to the queue
    • Marking is the most critical of the stages, as it usually takes up around 90 percent of the total garbage collection time. 
      • Fortunately, marking is very parallelizable and large parts of a mark phase can also be run concurrently with executing Java code. 
  • Sweep
    • For each object X on the heap, 
      • If X is not marked, garbage collect it
    • Sweeping (or compaction, which is not part of the MS algorithm), however, tends to be more troublesome to run concurrently with the executing Java program.
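Below is the minimal Java sketch promised above, a toy illustration of mark and sweep over an object graph (class names are ours; this is not HotSpot's implementation):

import java.util.*;

class ToyHeap {
    static class Obj {
        final List<Obj> refs = new ArrayList<>(); // outgoing references
        boolean marked;                           // the per-object mark bit
    }

    final List<Obj> heap = new ArrayList<>();   // all objects on the heap
    final List<Obj> roots = new ArrayList<>();  // the root set

    void mark() {
        Deque<Obj> queue = new ArrayDeque<>(roots);  // add each root to a queue
        while (!queue.isEmpty()) {
            Obj x = queue.poll();
            if (!x.marked) {
                x.marked = true;        // mark X reachable
                queue.addAll(x.refs);   // add all objects referenced from X
            }
        }
    }

    void sweep() {
        heap.removeIf(o -> !o.marked);        // reclaim every unmarked (dead) object
        heap.forEach(o -> o.marked = false);  // reset mark bits for the next cycle
    }
}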

Concurrent Mark Sweep (CMS) Collector


CMS (Concurrent Mark Sweep) is one of the garbage collectors implemented in the HotSpot JVM.  It is enabled using:
  • -XX:+UseConcMarkSweepGC -XX:+UseParNewGC

Note that there are two different implementations of parallel collectors for the young generation:
  • UseParNewGC
  • UseParallelGC 

UseParNewGC should be used with CMS, and UseParallelGC should be used with the throughput collector.  Note also that there is another collector named iCMS,[4] which is a variant of CMS.  You should not use iCMS anymore because it is (or will be) deprecated.

CMS is designed to be mostly concurrent, requiring just two quick stop-the-world pauses per old space garbage collection cycle.   These two phases are the initial mark phase (single-threaded) and the remark phase (multithreaded).  It attempts to minimize the pauses due to garbage collection by doing most of the garbage collection work concurrently with the application threads.

CMS does not perform compaction in normal CMS cycle.  However, an old generation overflow will trigger a stop-the-world compacting garbage collection.
  • Par new generation (or Young Gen)
    • Young generation space is further split into 
      • Eden 
      • Survivor spaces
    • On CMS, it would have defaulted to a maximum size 665 MB if you didn't explicitly set it.
    • Survivors from the young generation are evacuated to 
      • Other survivor space, or
      • Old generation space
  • Concurrent mark-sweep generation (or Tenured Gen)
    • Does in-place de-allocation (which can lead to heap fragmentation)
    • Is managed by free lists, which need synchronization
    • Concurrent marking phase
      • Two stop-the-world pauses
        • Initial mark
          • Marks reachable (live) objects
        • Remark
          • Unmarked objects are deduced to be unreachable (dead)
    • Concurrent sweeping phase
      • Sweeps over the heap
      • In-place de-allocates unmarked (dead) objects
      • End of concurrent sweeping--all unmarked objects have been de-allocated

Enabling Logging


We have used the following HotSpot VM options to generate the log file:
  • -Xloggc:/gc_0.log 
  • -verbose:gc 
  • -XX:+PrintGCDetails 
  • -XX:+PrintGCTimeStamps 
  • -XX:+PrintReferenceGC 

CMS GC Logs


Let's take a look at some CMS logs generated with 1.7.0_10:

2715.210: [GC 2715.210: [ParNew2715.282: [SoftReference, 0 refs, 0.0000120 secs]2715.282: [WeakReference, 2433 refs, 0.0004150 secs]2715.283: [FinalReference, 1947 refs, 0.0074410 secs]2715.290: [PhantomReference, 0 refs, 0.0000080 secs]2715.290: [JNI Weak Reference, 0.0000040 secs]: 310460K->27980K(318912K), 0.0804180 secs] 3828421K->3561073K(4158912K), 0.0805140 secs] [Times: user=0.36 sys=0.00, real=0.08 secs]
A young generation (ParNew) collection. The young generation capacity is 318912K, and after the collection its occupancy drops from 310460K to 27980K. This collection took 0.08 secs.

2715.294: [GC [1 CMS-initial-mark: 3533092K(3840000K)] 3564157K(4158912K), 0.0418950 secs] [Times: user=0.04 sys=0.00, real=0.04 secs]

The beginning of a tenured generation collection with the CMS collector. This is the initial marking phase of CMS, where all the objects directly reachable from the roots are marked, and this is done with all the mutator threads stopped.

The capacity of the tenured generation space is 3840000K, and CMS was triggered at an occupancy of 3533092K.

2715.337: [CMS-concurrent-mark-start]
Start of concurrent marking phase.

In the concurrent marking phase, the threads stopped in the first phase are started again, and all the objects transitively reachable from the objects marked in the first phase are marked here.

2717.428: [CMS-concurrent-mark: 2.091/2.091 secs] [Times: user=6.34 sys=0.08, real=2.09 secs]
Concurrent marking took a total of 2.091 seconds of CPU time and 2.091 seconds of wall time, which includes the yield to other threads as well.

2717.428: [CMS-concurrent-preclean-start]
Start of precleaning.

Precleaning is also a concurrent phase. In this phase, we look at the objects in the CMS heap that got updated by promotions from the young generation or new allocations, or got updated by mutators, while we were doing the concurrent marking in the previous phase. By rescanning those objects concurrently, the precleaning phase helps reduce the work in the next stop-the-world "remark" phase.

2717.428: [Preclean SoftReferences, 0.0000020 secs]2717.428: [Preclean WeakReferences, 0.0066370 secs]2717.434: [Preclean FinalReferences, 0.0011100 secs]2717.436: [Preclean PhantomReferences, 0.0003480 secs]2717.475: [CMS-concurrent-preclean: 0.045/0.048 secs] [Times: user=0.19 sys=0.00, real=0.04 secs]

Concurrent precleaning took 0.045 secs of total CPU time and 0.048 secs of wall time.


2717.475: [CMS-concurrent-abortable-preclean-start]

2717.876: [GC 2717.876: [ParNew2717.958: [SoftReference, 0 refs, 0.0000140 secs]2717.958: [WeakReference, 2683 refs, 0.0004340 secs]2717.959: [FinalReference, 1972 refs, 0.0069420 secs]2717.966: [PhantomReference, 0 refs, 0.0000080 secs]2717.966: [JNI Weak Reference, 0.0000050 secs]: 311500K->28150K(318912K), 0.0900310 secs] 3844593K->3577211K(4158912K), 0.0901370 secs] [Times: user=0.41 sys=0.01, real=0.09 secs]

2719.157: [CMS-concurrent-abortable-preclean: 1.582/1.681 secs] [Times: user=4.16 sys=0.07, real=1.68 secs]

2719.157: [GC[YG occupancy: 171027 K (318912 K)]2719.157: [Rescan (parallel) , 0.0680250 secs]2719.225: [weak refs processing2719.226: [SoftReference, 0 refs, 0.0000070 secs]2719.226: [WeakReference, 9940 refs, 0.0010950 secs]2719.227: [FinalReference, 10877 refs, 0.0226430 secs]2719.249: [PhantomReference, 0 refs, 0.0000050 secs]2719.249: [JNI Weak Reference, 0.0000110 secs], 0.0238550 secs]2719.249: [scrub string table, 0.0057930 secs] [1 CMS-remark: 3549060K(3840000K)] 3720087K(4158912K), 0.0989920 secs] [Times: user=0.40 sys=0.01, real=0.10 secs]
In the above log, after the preclean, the 'abortable preclean' phase starts. After the young generation collection, the young gen occupancy drops from 311500K to 28150K. When the young gen occupancy reaches 171027K, which is about 52% of the total capacity, precleaning is aborted and the 'remark' phase is started.
Note that the young generation occupancy also gets printed in the final remark phase.
Remark is a stop-the-world phase. It rescans any residual updated objects in the CMS heap, retraces from the roots, and also processes Reference objects. Here the rescanning work took 0.0680250 secs, reference processing took 0.0238550 secs, and the whole remark pause took 0.0989920 secs.

2719.257: [CMS-concurrent-sweep-start]
The start of sweeping of dead/unmarked objects. Sweeping is a concurrent phase performed with all other threads running.

2722.945: [CMS-concurrent-sweep: 3.606/3.688 secs] [Times: user=8.79 sys=0.12, real=3.69 secs]
Sweeping took 3.69 secs.

2722.945: [CMS-concurrent-reset-start]
Start of reset.

2722.973: [CMS-concurrent-reset: 0.028/0.028 secs] [Times: user=0.05 sys=0.00, real=0.03 secs]
In this phase, the CMS data structures are reinitialized so that a new cycle may begin at a later time. In this case, it took 0.03 secs.

This is how a normal CMS cycle runs.  Note that, in our experiment, we didn't see the "concurrent mode failure", so we don't discuss it here.  For more information on it, read [1].

References

  1. Understanding CMS GC Logs
  2. Java GC, HotSpot's CMS and heap fragmentation
  3. Mark-and-Sweep Garbage Collection
  4. Really? iCMS? Really?
  5. Oracle JRockit--The Definitive Guide
  6. JRockit: Parallel vs Concurrent Collectors (Xml and More)
  7. The Unspoken - CMS and PrintGCDetails (Jon Masamitsu's Weblog)
  8. The Unspoken - Phases of CMS (Jon Masamitsu's Weblog)

Tuesday, March 12, 2013

HotSpot—java.lang.OutOfMemoryError: PermGen space

There could be different causes that lead to out-of-memory error in HotSpot VM.  For example, you can run out of memory in PermGen space:
  • java.lang.OutOfMemoryError: PermGen space

In this article, we will discuss:
  • Java Objects vs Java Classes
  • PermGen Collection[2]
  • Class unloading
  • How to find the classes allocated in PermGen?
  • How to enable class unloading for CMS?
Note that this article is mainly based on Jon's excellent article[1].

Java Objects vs Java Classes


Java objects are instantiations of Java classes. HotSpot VM has an internal representation of those Java objects and those internal representations are stored in the heap (in the young generation or the old generation[2]). HotSpot VM also has an internal representation of the Java classes and those are stored in the permanent generation.

PermGen Collector


The internal representation of a Java object and an internal representation of a Java class are very similar.  From now on, we use Java objects and Java classes to refer to their internal representations.  The Java objects and Java classes are similar to the extent that during a garbage collection both are viewed just as objects and are collected in exactly the same way.

Besides its basic fields, a Java class also includes the following:
  • Methods of a class (including the bytecodes)
  • Names of the classes (in the form of an object that points to a string also in the permanent generation)
  • Constant pool information (data read from the class file, see chapter 4 of the JVM specification for all the details).
  • Object arrays and type arrays associated with a class (e.g., an object array containing references to methods).
  • Internal objects created by the JVM (java/lang/Object or java/lang/Exception, for instance)
  • Information used for optimization by the compilers (JITs)
There are a few other bits of information that end up in the permanent generation but nothing of consequence in terms of size. All these are allocated in the permanent generation and stay in the permanent generation.

Class Loading/Unloading


Back in the old days, most classes were mostly static and custom class loaders were rarely used, so class unloading may not have been necessary.  However, things have changed, and sometimes you could run into the following message:
  • java.lang.OutOfMemoryError: PermGen space
In this case, there are at least two options:
  • Increasing the size of PermGen
  • Enabling class unloading

Increasing the Size of PermGen


Sometimes there is a legitimate need to increase PermGen size by setting the following options:
  • -XX:PermSize=384m -XX:MaxPermSize=384m 
However, before you do that, you may want to find out which Java classes were allocated in PermGen by running:
  • jmap -permstat
This is supported in JDK5 and later on both Solaris and Linux.

Enabling Class Unloading


By default, most HotSpot garbage collectors do class unloading, except the CMS collector[2] (enabled by -XX:+UseConcMarkSweepGC).

If you use CMS collector and run into PermGen's out-of-memory error, you could consider enabling class unloading by setting:
  • -XX:+CMSClassUnloadingEnabled
  • -XX:+CMSPermGenSweepingEnabled

Depending on the release you have, earlier versions (i.e., Java 6 Update 3 or earlier) require you to set both options.  However, in later releases you only need to specify:
  • -XX:+CMSClassUnloadingEnabled

The following are the cases where you want to enable class unloading in the CMS collector:
  • If your application is using multiple class loaders and/or reflection, you may need to enable collecting of garbage in permanent space.
  • Objects in permanent space may have references to normal old space; thus, even if permanent space is not full itself, references from perm to old space may keep some dead objects unreachable for CMS if class unloading is not enabled.
  • Lots of redeployment may pressure PermGen space
    • A class and its classloader both have to be unreachable in order for them to be unloaded. A class X loaded by classloader A and the same class X loaded by classloader B will result in two distinct objects (klasses) in the permanent generation.

References

  1. Understanding GC pauses in JVM, HotSpot's CMS collector.
  2. Understanding Garbage Collection
  3. Presenting the Permanent Generation
  4. Diagnosing Java.lang.OutOfMemoryError
  5. A Case Study of java.lang.OutOfMemoryError: GC overhead limit exceeded
  6. Understanding Garbage Collector Output of Hotspot VM
  7. Java HotSpot VM Options
  8. Eight flavors of java.lang.OutOfMemoryError