Tuesday, September 30, 2014

JDK 8: Thread Stack Size Tuning

When you upgrade the JDK, you should re-examine all the JVM options set for your Java applications.  Take thread stack size tuning as an example.  One common suggestion [1] states:
  • In most applications, 128k happens to be enough for the Java thread stack.
However, after setting the stack size to 128k, we ran into the following fatal error at JVM startup:
The stack size specified is too small, Specify at least 228k
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
In this article, we will discuss thread stack size tuning in JDK 8 (i.e., HotSpot VM).

Default Thread Stack Size


When a new thread is launched, the Java virtual machine creates a new Java stack for the thread. A Java stack stores a thread's state in discrete frames.[3] The Java virtual machine performs only two operations directly on Java stacks: it pushes and pops frames.

The default thread stack size varies with JVM, OS and environment variables. To find out what your default ThreadStackSize is on your platform, use:[1]
java -XX:+PrintFlagsFinal -version
A typical value is 512k. It is generally larger for 64-bit JVMs because references are 8 bytes rather than 4 bytes in size (although you can enable compressed oops or compressed class pointers if you choose).[2] For example:[4]
In Java SE 6, the default on Sparc is 512k in the 32-bit VM, and 1024k in the 64-bit VM. On x86 Solaris/Linux it is 320k in the 32-bit VM and 1024k in the 64-bit VM.
On Windows, the default thread stack size is read from the binary (java.exe). As of Java SE 6, this value is 320k in the 32-bit VM and 1024k in the 64-bit VM.
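
The same value can also be read from inside a running application. The following is a minimal sketch, assuming a HotSpot VM that exposes com.sun.management.HotSpotDiagnosticMXBean; the class name ThreadStackSizeProbe is made up for illustration. HotSpot reports ThreadStackSize in kilobytes, and a value of 0 generally means the OS default is used.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: reads a HotSpot flag from the running VM. */
final class ThreadStackSizeProbe {

    /** Returns the current value of the named VM option as a string. */
    static String readFlag(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // Throws IllegalArgumentException if the VM has no option with this name.
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        // ThreadStackSize is expressed in KB.
        System.out.println("ThreadStackSize (KB) = " + readFlag("ThreadStackSize"));
    }
}
```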

In JDK 8, every time the JVM creates a thread, the OS allocates some native memory to hold that thread’s stack, committing more memory to the process until the thread exits. Thread stacks are fully allocated (i.e., committed, not just reserved) when they are created.

This means that if your application spawns a lot of threads, their stacks can consume a significant amount of memory that could otherwise be used by your application or the OS (and this can eventually lead to an OutOfMemoryError).
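
To get a feel for the numbers, here is a back-of-the-envelope sketch. It assumes every thread uses the full configured stack size; the thread count of 500 is purely hypothetical.

```java
/** Hypothetical helper: rough estimate of memory used by thread stacks. */
final class StackFootprintEstimate {

    /** Total stack memory in KB for the given thread count and per-thread stack size. */
    static long totalStackKb(int threadCount, long stackSizeKb) {
        return (long) threadCount * stackSizeKb;
    }

    public static void main(String[] args) {
        int threads = 500; // hypothetical thread count
        // 500 * 1024 KB = 512000 KB = 500 MB
        System.out.println("1024k stacks: " + totalStackKb(threads, 1024) / 1024 + " MB");
        // 500 * 256 KB = 128000 KB = 125 MB
        System.out.println("256k stacks:  " + totalStackKb(threads, 256) / 1024 + " MB");
    }
}
```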

You can reduce the stack size with the -Xss option or with -XX:ThreadStackSize, whose value is given in kilobytes. For example:
java -server -Xss256k
or
java -server -XX:ThreadStackSize=256 
Note that if you have installed a 64-bit VM binary for Linux, you can omit the -server option.[5]

Virtual Memory Map


In JDK 8, HotSpot comes with a feature named Native Memory Tracking (NMT), which is disabled by default.  To enable it, use:
-XX:NativeMemoryTracking=[off|detail|summary]

After enabling NMT, you can examine the memory used by threads and thread stacks with:
jcmd <pid> VM.native_memory [summary | detail | baseline | summary.diff | detail.diff | shutdown] [scale= KB | MB | GB]

For example, on a 64-bit Linux platform, here is the thread stack mapping reported before and after setting -Xss256k:

Before

Virtual memory map:

[0x0000000040049000 - 0x000000004014a000] reserved and committed 1028KB for Thread Stack from
    [0x00002aec741ca5e4] JavaThread::run()+0x24
    [0x00002aec74083268] java_start(Thread*)+0x108

After

Virtual memory map:

[0x0000000040078000 - 0x00000000400b9000] reserved and committed 260KB for Thread Stack from
    [0x00002b02c69156e4] JavaThread::run()+0x24
    [0x00002b02c67ce338] java_start(Thread*)+0x108
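
The reported sizes can be cross-checked against the address ranges in the two listings. Below is a small sketch that parses such a line and compares the size implied by the range with the size NMT prints. The class name and the regular expression are assumptions written against the two sample lines above, not a general NMT parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: parses an NMT "Thread Stack" mapping line. */
final class NmtThreadStackLine {
    private static final Pattern LINE = Pattern.compile(
            "\\[0x([0-9a-fA-F]+) - 0x([0-9a-fA-F]+)\\] "
            + "reserved and committed (\\d+)KB for Thread Stack");

    private static Matcher match(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("Not a Thread Stack line: " + line);
        }
        return m;
    }

    /** Size in KB implied by the start and end addresses. */
    static long spanKb(String line) {
        Matcher m = match(line);
        long start = Long.parseLong(m.group(1), 16);
        long end = Long.parseLong(m.group(2), 16);
        return (end - start) / 1024;
    }

    /** Size in KB as printed by NMT. */
    static long reportedKb(String line) {
        return Long.parseLong(match(line).group(3));
    }

    public static void main(String[] args) {
        String before = "[0x0000000040049000 - 0x000000004014a000] "
                + "reserved and committed 1028KB for Thread Stack from";
        String after = "[0x0000000040078000 - 0x00000000400b9000] "
                + "reserved and committed 260KB for Thread Stack from";
        // 0x4014a000 - 0x40049000 = 0x101000 bytes = 1028 KB
        System.out.println(spanKb(before) + " KB span, " + reportedKb(before) + " KB reported");
        // 0x400b9000 - 0x40078000 = 0x41000 bytes = 260 KB
        System.out.println(spanKb(after) + " KB span, " + reportedKb(after) + " KB reported");
    }
}
```

In both listings the mapping is 4 KB larger than the nominal stack size (1024 KB and 256 KB); the NMT output itself does not say what the extra 4 KB is used for.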


Conclusions


The thread stack is used to push stack frames in nested method calls. If the nesting is so deep that the thread runs out of space, the thread dies with a StackOverflowError.[8] If your applications use lots of recursive algorithms, or if they are built on top of a framework that uses the MVC design pattern, such as Oracle ADF, you may want to leave ThreadStackSize at its default.
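
One way to see the effect of stack size on nesting depth is to recurse until the stack overflows. The sketch below is illustrative only: it uses the Thread constructor that accepts a requested stack size, which the JVM is free to round or ignore, so the depths it prints vary by platform, JIT state, and frame size. The class name StackDepthProbe is hypothetical.

```java
/** Hypothetical probe: counts how deep a trivial recursion gets before overflowing. */
final class StackDepthProbe {
    private int depth;

    private void recurse() {
        depth++;
        recurse(); // unbounded on purpose; ends with StackOverflowError
    }

    /** Runs the recursion in a new thread created with the requested stack size. */
    static int maxDepth(long stackSizeBytes) {
        StackDepthProbe probe = new StackDepthProbe();
        Thread t = new Thread(null, () -> {
            try {
                probe.recurse();
            } catch (StackOverflowError expected) {
                // The depth reached so far is kept in probe.depth.
            }
        }, "stack-depth-probe", stackSizeBytes);
        t.start();
        try {
            t.join(); // join makes the worker's writes to depth visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for probe", e);
        }
        return probe.depth;
    }

    public static void main(String[] args) {
        System.out.println("depth with  256k requested: " + maxDepth(256L * 1024));
        System.out.println("depth with 1024k requested: " + maxDepth(1024L * 1024));
    }
}
```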

However, thread stacks are quite large, particularly for a 64-bit JVM.  In [9], Scott Oaks has advised:
  • As a general rule, many applications can actually run with a 128 KB stack size in a 32-bit JVM, and a 256 KB stack size in a 64-bit JVM.
  • In a 64-bit JVM, there is usually no reason to set this value unless the machine is quite strained for physical memory and the smaller stack size will prevent applications from running out of native memory. 
  • On the other hand, using a smaller (e.g., 128 KB) stack size on a 32-bit JVM is often a good idea, as it frees up memory in the process size and allows the JVM to utilize a larger heap.

Finally, the total footprint of the JVM has a significant effect on its performance, so footprint is one aspect of Java performance that should be monitored routinely.

Sunday, September 14, 2014

G1 GC: Tuning Mixed Garbage Collections in JDK 8

To tune G1 GC (Garbage First Garbage Collector), a good place to start is [1]. For the latest information, read [2].

Assuming you have a basic knowledge of G1 GC, this article focuses on tuning Mixed GC for better performance.

The Need to Tune Mixed GC


G1 GC is a generational garbage collector; read [3] to learn what eden, survivor, and old generation spaces are.  It also uses a region-based architecture that divides a large contiguous Java heap into multiple fixed-size heap regions.

The destination region for a particular object depends upon the object's age; an object that has aged sufficiently is evacuated to an old generation region (or promoted); otherwise, the object is evacuated to a survivor region and will be included in the CSet[5] of the next young or mixed garbage collection.

During young collections, G1 GC adjusts its young generation (eden and survivor sizes) to meet its pause target. During mixed collections, the G1 GC adjusts the number of old gen regions that are collected based on a target number of mixed garbage collections, the percentage of live objects in each region of the heap, and the overall acceptable heap waste percentage (see details later).

Depending on your application's workload, Full GC could be expensive in G1 GC.  Since Mixed GC collects both young and old regions, you can get better performance by tuning Mixed GC. The goal is to reduce the number of Full GCs while your applications run.

More on Mixed Garbage Collections


Upon successful completion of a concurrent marking cycle, the G1 GC switches from performing young garbage collections to performing mixed garbage collections. In a mixed garbage collection, the G1 GC optionally adds some old regions to the set of eden and survivor regions that will be collected. The exact number of old regions added is controlled by a number of flags that will be discussed later. After the G1 GC collects a sufficient number of old regions (over multiple mixed garbage collections), G1 reverts to performing young garbage collections until the next marking cycle completes.

To summarize, Mixed GC can be characterized by:
  • Collecting both young and old regions
  • Denoted by "[GC pause (G1 Evacuation Pause) (mixed)" in the GC log file[2]
  • Only after a completed marking cycle
  • A sequence of mixed GC events to collect old regions, up to 8 by default
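
Because mixed collections are marked this way in the log, counting them is a simple text match. The sketch below assumes the marker string quoted above; the sample log lines are abbreviated, hypothetical entries rather than real output, and the class name is made up.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: counts mixed evacuation pauses in GC log lines. */
final class MixedPauseCounter {
    static final String MIXED_MARKER = "[GC pause (G1 Evacuation Pause) (mixed)";

    /** Returns how many of the given log lines denote a mixed collection. */
    static long countMixed(List<String> logLines) {
        return logLines.stream()
                .filter(line -> line.contains(MIXED_MARKER))
                .count();
    }

    public static void main(String[] args) {
        // Abbreviated, made-up sample lines for illustration only.
        List<String> sample = Arrays.asList(
                "[GC pause (G1 Evacuation Pause) (young)",
                "[GC pause (G1 Evacuation Pause) (mixed)",
                "[GC pause (G1 Evacuation Pause) (mixed)",
                "[GC pause (G1 Evacuation Pause) (young)");
        System.out.println("mixed pauses: " + countMixed(sample)); // prints "mixed pauses: 2"
    }
}
```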

Tuning Mixed Garbage Collections


You can experiment with the following options to tune Mixed GC:[1,2,4]
  • -XX:InitiatingHeapOccupancyPercent 
    • Percentage of the (entire) heap occupancy to start a concurrent GC cycle. It is used by GCs that trigger a concurrent GC cycle based on the occupancy of the entire heap, not just one of the generations.  The default value is 45.
    • This option can be used to change the marking threshold
      • If the threshold is exceeded, concurrent marking is initiated next.
      • The higher the threshold, the fewer concurrent marking cycles there will be, which also means fewer mixed GC evacuations.
  • -XX:G1MixedGCLiveThresholdPercent 
    • This option can be used to change the threshold that determines whether a region is added to the CSet.
      • Only regions whose live data percentage is less than the threshold will be added to the CSet.
      • The higher the threshold (default: 65), the more likely a region is to be added to the CSet, which also means more mixed GC evacuation and longer evacuation times.
  • -XX:G1HeapWastePercent
    • The amount of reclaimable space, expressed as a percentage of the heap size, below which G1 will stop doing mixed GCs. If the amount of space that can be reclaimed from old generation regions, compared to the total heap, is less than this value, G1 will stop mixed GCs.
    • Current default is 10%.  A lower value, say 5%, will potentially cause G1 to add more expensive region(s) to evacuate for space reclamation.
      • G1 will continue triggering mixed GCs while the reclaimable space is higher than the waste threshold.  So, setting this threshold lower means more mixed GCs, and possibly more expensive ones.
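
The three thresholds above can be summarized as three simple comparisons. The following is a deliberately simplified model of the rules as described in this article, not HotSpot's actual implementation; the class and method names are hypothetical, and the default values are the ones quoted above.

```java
/** Simplified, hypothetical model of the three Mixed GC thresholds described above. */
final class MixedGcThresholdModel {

    /** Concurrent marking starts once heap occupancy exceeds InitiatingHeapOccupancyPercent. */
    static boolean shouldStartMarking(double heapOccupancyPercent, int initiatingHeapOccupancyPercent) {
        return heapOccupancyPercent > initiatingHeapOccupancyPercent;
    }

    /** A region is a CSet candidate only if its live data is below G1MixedGCLiveThresholdPercent. */
    static boolean isCSetCandidate(double regionLivePercent, int liveThresholdPercent) {
        return regionLivePercent < liveThresholdPercent;
    }

    /** Mixed GCs continue while reclaimable space exceeds G1HeapWastePercent of the heap. */
    static boolean shouldContinueMixedGc(double reclaimablePercentOfHeap, int heapWastePercent) {
        return reclaimablePercentOfHeap > heapWastePercent;
    }

    public static void main(String[] args) {
        // Defaults as quoted in this article: 45, 65, and 10.
        System.out.println(shouldStartMarking(50.0, 45));   // true: 50% occupancy exceeds 45%
        System.out.println(isCSetCandidate(70.0, 65));      // false: region is 70% live
        System.out.println(shouldContinueMixedGc(7.0, 10)); // false: only 7% is reclaimable
    }
}
```

With a waste threshold of 5 instead of 10, the last call would return true, which is the "more mixed GCs" effect described above.
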
Besides the above-mentioned options, you may also want to tune the following ones:
  • -XX:G1MixedGCCountTarget 
    • Sets the target number of mixed garbage collections after a marking cycle to collect old regions with at most G1MixedGCLiveThresholdPercent live data. 
    • The default is 8 mixed garbage collections. The goal for mixed collections is to be within this target number.
  • -XX:G1OldCSetRegionThresholdPercent
    • Sets an upper limit on the number of old regions to be collected during a mixed garbage collection cycle. The default is 10 percent of the Java heap.
  • -XX:G1HeapRegionSize
    • Sets the size of a G1 region. The value will be a power of two and can range from 1MB to 32MB. 
    • The goal is to have around 2048 regions based on the minimum Java heap size.
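
The region-size rule of thumb can be sketched as follows. This is an approximation built only from the constraints listed above (about 2048 regions, a power of two, between 1MB and 32MB); rounding down to a power of two is an assumption, and HotSpot's real heuristic may differ in detail, for example in which heap size it starts from. The class name is hypothetical.

```java
/** Hypothetical sketch of the "around 2048 regions" sizing rule. */
final class RegionSizeSketch {
    static final long MB = 1024L * 1024L;
    static final long MIN_REGION_BYTES = 1 * MB;
    static final long MAX_REGION_BYTES = 32 * MB;
    static final int TARGET_REGION_COUNT = 2048;

    /** Returns a power-of-two region size in bytes, clamped to [1MB, 32MB]. */
    static long regionSizeBytes(long heapBytes) {
        long raw = heapBytes / TARGET_REGION_COUNT;
        long atLeastMin = Math.max(raw, MIN_REGION_BYTES);
        long powerOfTwo = Long.highestOneBit(atLeastMin); // assumption: round down
        return Math.min(powerOfTwo, MAX_REGION_BYTES);
    }

    public static void main(String[] args) {
        long gb = 1024 * MB;
        System.out.println(regionSizeBytes(1 * gb) / MB + "MB");   // 512KB raw, clamped up to 1MB
        System.out.println(regionSizeBytes(8 * gb) / MB + "MB");   // 4MB
        System.out.println(regionSizeBytes(12 * gb) / MB + "MB");  // 6MB raw, rounded down to 4MB
        System.out.println(regionSizeBytes(128 * gb) / MB + "MB"); // 64MB raw, clamped to 32MB
    }
}
```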

Conclusion


Each application is unique, so you may need to tune G1 GC iteratively.  Note that all of the above options are product options except G1MixedGCLiveThresholdPercent and G1OldCSetRegionThresholdPercent, which are experimental options and must be unlocked with -XX:+UnlockExperimentalVMOptions before they can be set.  Be warned that Oracle may remove experimental options at its discretion in future releases.

Acknowledgement


Some of the material here is based on feedback from Thomas Schatzl and Yu Zhang. However, the author assumes full responsibility for the content.

References

  1. Garbage First Garbage Collector Tuning
  2. g1gc logs - Ergonomics -how to print and how to understand 
  3. Understanding Garbage Collection  
  4. Garbage First (G1) Garbage Collection Options
  5. CSet
    • The G1 GC reduces heap fragmentation by incremental parallel copying of live objects from one or more sets of regions (called Collection Set (CSet)) into different new region(s) to achieve compaction.
    • The goal of G1 GC is to reclaim as much heap space as possible, starting with those regions that contain the most reclaimable space (i.e., garbage first), while attempting to not exceed the pause time goal.
  6. G1 GC Glossary of Terms
  7. Learn More About Performance Improvements in JDK 8 
  8. HotSpot Virtual Machine Garbage Collection Tuning Guide
  9. Garbage-First Garbage Collector (JDK 8 HotSpot Virtual Machine Garbage Collection Tuning Guide)
  10. G1 GC: Humongous Objects and Humongous Allocations (Xml and More)
  11. Other JDK 8 articles on Xml and More
  12. Concurrent Marking in G1