Cross Column

Showing posts with label Ergonomics. Show all posts

Saturday, November 8, 2014

HotSpot: GC Worker Threads Used in CMS, Parallel, and G1 GC

As discussed in [1], the Java HotSpot virtual machine includes five garbage collectors (or GC):
  • Serial Collector
  • Parallel Collector (or throughput collector)
  • Parallel Compacting Collector
  • Concurrent Mark-Sweep (CMS) Collector
  • Garbage-First (G1) Garbage Collector
Also, in [2], we showed how to find the default HotSpot JVM values.  Some VM options are set on the command line and some are set by the GC's ergonomics.  Either way, all VM options are printed when you run the JVM with:
  • -XX:+PrintFlagsFinal
G1 GC, for example, uses SurvivorRatio, but not InitialSurvivorRatio, MinSurvivorRatio, or TargetSurvivorRatio.  In this article, we will show you one way to tell which VM option is used by which GC.
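
As a complement to reading the -XX:+PrintFlagsFinal output, the three GC thread flags discussed in this article can also be read from inside a running JVM. The sketch below does that through HotSpotDiagnosticMXBean from com.sun.management. The class name GcThreadFlags and the helper describe are made up for illustration, and the approach assumes a HotSpot-based JVM; it is not something the original measurements relied on.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Hypothetical helper class for illustration; not part of the JDK.
class GcThreadFlags {

    // Returns "name = value (origin: ...)", or a note that the flag is unavailable.
    static String describe(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return name + " is not available (no HotSpot diagnostic bean)";
            }
            VMOption option = bean.getVMOption(name);
            return name + " = " + option.getValue()
                    + " (origin: " + option.getOrigin() + ")";
        } catch (IllegalArgumentException e) {
            // Thrown when the flag (or the bean) does not exist in this JVM.
            return name + " is not available in this JVM";
        }
    }

    public static void main(String[] args) {
        String[] flags = {"ParallelGCThreads", "ConcGCThreads", "G1ConcRefinementThreads"};
        for (String flag : flags) {
            System.out.println(describe(flag));
        }
    }
}
```

The origin reported by VMOption gives a hint about who set the value: ERGONOMIC suggests the ergonomics chose it, while VM_CREATION typically means it was passed on the command line.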

GC Threads


In [3], we have discussed the difference between Parallel Collectors and Concurrent Collectors.

Parallel collectors require a stop-the-world pause for the whole duration of major collection phases (mark or sweep), but employ all available cores to compress pause time. Parallel collectors usually have better throughput, but they are not a good fit for pause-critical applications.
Concurrent collectors try to do most work concurrently (though they also do it in parallel on multi-core systems), stopping the application only for short durations. Note that the concurrent collection algorithm in JRockit is fairly different from both of HotSpot's concurrent collectors (CMS and G1).

There are three kinds of GC threads utilized in HotSpot for CMS, Parallel, and G1 garbage collectors:
  • ParallelGCThreads
  • ConcGCThreads 
  • G1ConcRefinementThreads
When you use PrintFlagsFinal to print out all JVM flags for CMS, Parallel, or G1 GC, the results are as follows:

CMS
    uintx ParallelGCThreads           = 23     {product}
    uintx ConcGCThreads               = 6      {product}
    uintx G1ConcRefinementThreads     = 0      {product}
Parallel GC
    uintx ParallelGCThreads           = 23     {product}
    uintx ConcGCThreads               = 0      {product}
    uintx G1ConcRefinementThreads     = 0      {product}
G1 GC
    uintx ParallelGCThreads           = 23     {product}
    uintx ConcGCThreads               = 6      {product}
    uintx G1ConcRefinementThreads     = 23     {product}
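
A full -XX:+PrintFlagsFinal dump runs to hundreds of lines, so a small filter can help pull out just these three flags. The sketch below parses lines in the layout printed above. The class name GcThreadFlagFilter and its regular expression are illustrative assumptions, and the sample lines are copied from the G1 printout in this article rather than produced by a live JVM.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the JDK.
class GcThreadFlagFilter {

    private static final List<String> WANTED =
            Arrays.asList("ParallelGCThreads", "ConcGCThreads", "G1ConcRefinementThreads");

    // Groups: type, name, value. Accepts "=" as well as ":=", which
    // PrintFlagsFinal prints for values changed from their default.
    private static final Pattern LINE =
            Pattern.compile("^\\s*(\\S+)\\s+(\\S+)\\s+:?=\\s+(\\S+)\\s+\\{.*$");

    // Keeps only the three GC thread flags, in the order they appear.
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (m.matches() && WANTED.contains(m.group(2))) {
                result.put(m.group(2), Long.parseLong(m.group(3)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "    uintx ParallelGCThreads           = 23     {product}",
                "    uintx ConcGCThreads               = 6      {product}",
                "    uintx G1ConcRefinementThreads     = 23     {product}");
        // Prints {ParallelGCThreads=23, ConcGCThreads=6, G1ConcRefinementThreads=23}
        System.out.println(parse(sample));
    }
}
```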

According to JDK-8047976,[4] GCs normally don't update flags they do not use, which is why some of the printouts show "0".  To summarize, here are the GC worker threads used in the CMS, Parallel, and G1 garbage collectors:

Garbage Collector | Worker Threads Used
CMS | ParallelGCThreads, ConcGCThreads
Parallel | ParallelGCThreads
G1 | ParallelGCThreads, ConcGCThreads, G1ConcRefinementThreads

Note that the above description is valid for the following JVM version or later ones:
java version "1.8.0_40-ea"
Java(TM) SE Runtime Environment (build 1.8.0_40-ea-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.40-b16, mixed mode)
Because of the JDK-8047976 bug,[4] the behavior is different in earlier versions.

References

  1. HotSpot VM Performance Tuning Tips (Xml and More)
  2. What Are the Default HotSpot JVM Values? (Xml and More)
  3. JRockit: Parallel vs Concurrent Collectors (Xml and More)
  4. Ergonomics for GC thread counts should update the flags
  5. Tuning that was great in old JRockit versions might not be so good anymore
    • Trying to bring over each and every tuning option from a JR configuration to an HS one is probably a bad idea.
    • Even when moving between major versions of the same JVM, we usually recommend going back to the default (just pick a collector and heap size) and then redoing any tuning work from scratch (if even necessary).
  6. HotSpot VM options (JDK 8)

Tuesday, June 24, 2014

HotSpot: What Are Those GC Worker Threads in G1 GC?

On the Internet, you can find some tuning guides for Garbage First Garbage Collector (G1 GC)[1,5,8,9] and some of them may be outdated.

In this article, we try to update some of that information, with a focus on thread tuning.

Ergonomics


Ergonomics for servers was first introduced in Java SE 5.0.[2] It has greatly reduced tuning time for server applications, particularly for heap sizing and advanced GC tuning. In many cases, running on a server with no tuning options at all is the best tuning you can do.

Here we will focus on the number of GC worker threads chosen to run on a specific server based on the ergonomics in G1 GC.

Garbage Collector | Worker Threads Used
Parallel | ParallelGCThreads
CMS | ParallelGCThreads, ConcGCThreads
G1 | ParallelGCThreads, ConcGCThreads, G1ConcRefinementThreads


G1 Threads


In the table below, we have summarized the most important GC worker threads in Oracle's G1 GC implementation. For comparison, Parallel GC only uses ParallelGCThreads, and CMS only uses ParallelGCThreads and ConcGCThreads. Be warned that these values may change in the future if Oracle sees fit.

GC thread counts are based on the number of processors (including virtual processors if hyper-threading[3] is enabled) reported by the system. To see the number of processors in your system, check the file /proc/cpuinfo. In this article, a server with 32 processors is used for the demonstration.



Name | Controlled by | Default if Not Specified | Description
Parallel GC Threads | -XX:ParallelGCThreads | if #proc <= 8, then #proc; if #proc > 8, 8+(#proc-8)*(5/8) | Parallel operations
Parallel Marking Threads | -XX:ConcGCThreads | There are a lot of cases, but mostly it uses max((ParallelGCThreads+2)/4, 1) | -
G1 Main Concurrent Mark GC Thread | - | 1 | Master concurrent thread
G1 Concurrent Refinement Thread | -XX:G1ConcRefinementThreads | ParallelGCThreads+1 | The +1 is the master thread which controls these worker threads.
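
To make the default formulas concrete, here is a minimal sketch that evaluates the ParallelGCThreads and ConcGCThreads rules from the table for a given processor count. The class and method names are hypothetical, only the common case of the ConcGCThreads rule is covered, and the use of truncating integer arithmetic is an assumption that happens to agree with the values reported in this article for 32 processors.

```java
// Hypothetical helper for illustration; it mirrors the table, not HotSpot source.
class GcThreadDefaults {

    // if #proc <= 8, then #proc; otherwise 8 + (#proc - 8) * 5/8
    static int parallelGcThreads(int processors) {
        if (processors <= 8) {
            return processors;
        }
        // Multiply before dividing so integer division truncates only once (assumption).
        return 8 + (processors - 8) * 5 / 8;
    }

    // Common case only: max((ParallelGCThreads + 2) / 4, 1)
    static int concGcThreads(int parallelGcThreads) {
        return Math.max((parallelGcThreads + 2) / 4, 1);
    }

    public static void main(String[] args) {
        int parallel = parallelGcThreads(32);
        System.out.println("ParallelGCThreads = " + parallel);            // 23
        System.out.println("ConcGCThreads = " + concGcThreads(parallel)); // 6

        // The same estimate for the machine this program runs on.
        int here = Runtime.getRuntime().availableProcessors();
        System.out.println("This machine: " + here + " processors, estimated ParallelGCThreads = "
                + parallelGcThreads(here));
    }
}
```

For 32 processors this gives 8 + 24 * 5/8 = 23 parallel threads and (23 + 2) / 4 = 6 concurrent marking threads.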


What Are They?


A G1 garbage collection cycle has multiple phases: it starts with an initial marking phase and ends with a concurrent-cleanup phase. During the cycle, some phases pause all the application threads, and some run concurrently. The GC worker threads used in G1 GC include, but are not limited to:
  • ParallelGCThreads
    • ParallelGCThreads are the threads employed during GC pauses
      • They are responsible for copying (and scanning for) live objects during stop-the-world (or STW) pauses.
      • They are also employed during other STW pauses (such as Remark and Cleanup).
  • ConcGCThreads
    • In concurrent marking, there is one controller thread and a set of worker threads. The number of worker threads is set using ConcGCThreads, or derived as a fraction of ParallelGCThreads if ConcGCThreads is not set.
      • Worker threads are the threads used for marking and determining the liveness of regions, which are run concurrently to the application threads.
      • Increasing ConcGCThreads will make the marking cycle complete faster.
  • G1ConcRefinementThreads
    • G1ConcRefinementThreads are the threads used for updating the remembered sets while application threads are running concurrently.
      • They are responsible for taking the buffers that are used to log object updates and updating the RememberedSets (or RSets)[10] based upon the updates.
    • If G1ConcRefinementThreads is not set, the flag takes the value of ParallelGCThreads, and the number of refinement threads is ParallelGCThreads + 1.
      • For example, if ParallelGCThreads is 23 then we will have 24 refinement threads.
        • One is used for sampling and updating the RSets of young regions (typically young regions have fewer cross-region references, so one thread is enough); the other 23 threads process the filled update buffers.
        • Typically they don't all run at once.
          • They each have a stepped activation threshold, and a thread is activated when the number of completed buffers exceeds its activation threshold.
          • They also have deactivation thresholds and when the number of filled update buffers falls below a thread's deactivation threshold, it is deactivated.
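
The stepped activation and deactivation described above can be pictured with a small simulation. This is only a sketch of the idea: the class name, the threshold values (a base of 10 buffers, a step of 10, deactivation half a step lower) and the update method are all made-up assumptions, not the values or code HotSpot uses.

```java
// Hypothetical model of stepped activation thresholds; not HotSpot code.
class RefinementThresholdSketch {
    private final int[] activate;    // thread i wakes up above this buffer count
    private final int[] deactivate;  // thread i goes idle below this buffer count
    private final boolean[] active;

    RefinementThresholdSketch(int threads, int base, int step) {
        activate = new int[threads];
        deactivate = new int[threads];
        active = new boolean[threads];
        for (int i = 0; i < threads; i++) {
            activate[i] = base + i * step;           // stepped: 10, 20, 30, ...
            deactivate[i] = activate[i] - step / 2;  // slightly lower, to avoid flapping
        }
    }

    // Applies the thresholds for the current number of completed buffers
    // and returns how many threads are active afterwards.
    int update(int completedBuffers) {
        int count = 0;
        for (int i = 0; i < active.length; i++) {
            if (!active[i] && completedBuffers > activate[i]) {
                active[i] = true;
            } else if (active[i] && completedBuffers < deactivate[i]) {
                active[i] = false;
            }
            if (active[i]) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        RefinementThresholdSketch sketch = new RefinementThresholdSketch(4, 10, 10);
        System.out.println(sketch.update(25)); // 2: threads 0 and 1 are above 10 and 20
        System.out.println(sketch.update(18)); // 2: 18 is still above both deactivation marks
        System.out.println(sketch.update(12)); // 1: thread 1 drops out below 15
        System.out.println(sketch.update(45)); // 4: every threshold is exceeded
        System.out.println(sketch.update(0));  // 0: all threads deactivate
    }
}
```

With thresholds like these, a light load keeps most refinement threads idle, which fits the observation that they typically don't all run at once.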

Thread Tuning


For our server with 32 processors, the GC worker thread counts chosen by G1's ergonomics are:
  • # ParallelGCThreads = 23
  • # ConcGCThreads = 6
  • # G1ConcRefinementThreads = 23

To find out what the GC worker thread counts are for your server, you can use -XX:+PrintFlagsFinal.[4] You can also change the default values if you see fit.[6,7] On some systems, the default values can be too high.

References

  1. Garbage First Garbage Collector Tuning (Published August 2013)
  2. Java Heap Sizing: How do I size my Java heap correctly?
  3. Intel® Hyper-Threading Technology
  4. What Are the Default HotSpot JVM Values?
  5. Part #1 - Tuning Java Garbage Collection for HBase
  6. Linux: Understanding Processor Queue in Vmstat Output
  7. How to Troubleshoot High CPU Usage of Java Applications? 
  8. Getting Started with the G1 Garbage Collector 
  9. G1: One Garbage Collector To Rule Them All 
  10. Remembered Sets (RSets) in G1 GC
    • G1 GC uses independent Remembered Sets (RSets) to track references into regions. Independent RSets enable parallel and independent collection of regions because only a region's RSet must be scanned for references into that region, instead of the whole heap. To achieve the task, G1 GC uses a post-write barrier to record changes to the heap and update the RSets. 
  11. D. L. Detlefs, C. H. Flood, S. Heller, and T. Printezis. Garbage-First Garbage Collection. In A. Diwan, editor, Proceedings of the 2004 International Symposium on Memory Management (ISMM 2004), pages 37-48, Vancouver, Canada. October 2004. ACM Press.
  12. T. Printezis and D. L. Detlefs. A Generational Mostly-Concurrent Garbage Collector. In A. L. Hosking, editor, Proceedings of the 2000 International Symposium on Memory Management (ISMM 2000), pages 134-154, Minneapolis, MN, USA, October 2000. ACM Press.
  13. g1gc logs - how to print and how to understand - basic 
  14. G1 GC Glossary of Terms
  15. Garbage-First Garbage Collector (JDK 8 HotSpot Virtual Machine Garbage Collection Tuning Guide)