In this article, we demonstrate a case in which setting MaxNewSize for G1 actually hurts the application's performance. After benchmarking G1 in JDK 7, another author reached a similar conclusion:[14]
- There is little to gain and much to lose from setting the new generation size explicitly.
Live Data Size
To tune the heap size of a Java Application, it's important to find out what its live data size is. To learn how to estimate a Java application's live data size, read [2]. For the benchmark we are using, its live data size is around 1400 MB. With this benchmark, we are interested in knowing how G1 performs with MaxNewSize set to 400MB and Java heap set to 2GB.
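We compare the following two G1 configurations, without and with an explicit young generation size (-Xmn400m sets both the initial and the maximum young generation size to 400 MB):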
- -Xms2g -Xmx2g -XX:+UseG1GC
- -Xms2g -Xmx2g -XX:+UseG1GC -Xmn400m
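When switching between the two configurations, it can help to confirm what the running JVM actually picked up. The following is a minimal sketch using the standard `java.lang.management` beans; the class name `HeapSettingsReport` and its methods are hypothetical helpers, not part of the original benchmark setup.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports the heap limit, JVM flags, and memory pools of the running JVM.
class HeapSettingsReport {

    // Maximum amount of memory the JVM will attempt to use, in MB (close to -Xmx).
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    // Flags passed to the JVM on the command line, e.g. -Xmn400m or -XX:+UseG1GC.
    static List<String> jvmFlags() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    // One line per memory pool: pool name and its maximum size in MB (-1 means undefined).
    static List<String> poolSummary() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool is no longer valid
            }
            long max = usage.getMax();
            long maxMb = (max < 0) ? -1 : max / (1024 * 1024);
            lines.add(pool.getName() + " max=" + maxMb + "MB");
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMb() + " MB");
        System.out.println("JVM flags: " + jvmFlags());
        for (String line : poolSummary()) {
            System.out.println(line);
        }
    }
}
```

Running this class with each of the two command lines above should show whether -Xmn400m appears among the flags. The pool names that get printed depend on the collector in use.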
Concurrent Cycle in G1 GC
G1 has four main operations:[5]
- A young collection
- A background, concurrent cycle
- A mixed collection
- If necessary, a full GC
For the necessary background on G1 GC, see [5-9].
G1 is a concurrent collector. Its concurrent cycle has multiple phases—some of which stop all application threads (denoted by STW) and some of which do not:
- Initial-mark (STW)
- Denoted by:
- [GC pause (G1 Evacuation Pause) (young) (initial-mark)
- [GC pause (Metadata GC Threshold) (young) (initial-mark)
- The G1 GC marks the roots during this phase. This phase is piggybacked on a normal (STW) young garbage collection.
- Root Region Scan
- Denoted by:
- [GC concurrent-root-region-scan-start]
- [GC concurrent-root-region-scan-end
- The G1 GC scans survivor regions of the initial mark for references to the old generation and marks the referenced objects.
- This phase runs concurrently with the application (not STW) and must complete before the next STW young garbage collection can start.
- Concurrent marking
- Denoted by:
- [GC concurrent-mark-start]
- [GC concurrent-mark-end
- The G1 GC finds reachable (live) objects across the entire heap.
- This phase happens concurrently with the application, and can be interrupted by STW young garbage collections.
- Remark (STW)
- Denoted by:
- [GC remark
- This phase is a STW pause that completes the marking cycle.
- It finds objects that the concurrent mark phase missed because Java application threads updated an object after the concurrent collector had finished tracing it.
- G1 GC drains SATB buffers, traces unvisited live objects, and performs reference processing.
- Cleanup (STW)
- Denoted by:
- [GC cleanup 1936M->1931M(2048M)
- Prepare for next concurrent collection by clearing data structures.
- In this final phase, the G1 GC performs the STW operations of accounting and RSet scrubbing.
- During accounting, the G1 GC identifies completely free regions and mixed garbage collection candidates. The cleanup phase is partly concurrent when it resets and returns the empty regions to the free list.
- Concurrent cleanup:
- Denoted by:
- [GC concurrent-cleanup-start]
- [GC concurrent-cleanup-end
- In this phase, G1 reclaims regions which were found empty during marking.
- Adds empty regions to the free list, i.e., thread-local free lists are merged into the global free list
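The log markers listed above can be turned into a small lookup that labels each GC log line with its concurrent-cycle phase. This is only a sketch: the class `G1PhaseClassifier` and its labels are hypothetical, and it assumes the JDK 8 style log markers quoted in this article (including the abort marker discussed later under Diagnosis).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: maps a G1 log line to the concurrent-cycle phase it denotes.
class G1PhaseClassifier {

    // Insertion order matters: the abort marker must be checked before the
    // more general concurrent-mark prefix.
    private static final Map<String, String> MARKERS = new LinkedHashMap<>();
    static {
        MARKERS.put("(initial-mark)", "Initial-mark (STW)");
        MARKERS.put("[GC concurrent-root-region-scan-", "Root region scan (concurrent)");
        MARKERS.put("[GC concurrent-mark-abort", "Concurrent marking aborted");
        MARKERS.put("[GC concurrent-mark-", "Concurrent marking (concurrent)");
        MARKERS.put("[GC remark", "Remark (STW)");
        MARKERS.put("[GC cleanup", "Cleanup (STW)");
        MARKERS.put("[GC concurrent-cleanup-", "Concurrent cleanup (concurrent)");
    }

    // Returns the phase label, or null if the line does not belong to a concurrent cycle.
    static String classify(String logLine) {
        for (Map.Entry<String, String> entry : MARKERS.entrySet()) {
            if (logLine.contains(entry.getKey())) {
                return entry.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(classify("[GC cleanup 1936M->1931M(2048M)"));
        System.out.println(classify("[GC concurrent-mark-start]"));
    }
}
```

Matching with `contains` rather than `startsWith` is a deliberate choice here, since real log lines are usually prefixed with timestamps.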
G1 GC uses the Snapshot-At-The-Beginning (SATB) algorithm, which takes a snapshot of the set of live objects in the heap at the start of a marking cycle. During the concurrent cycle, young garbage collections are allowed, which are triggered when eden fills up (note that initial-mark is implemented using a young collection cycle).
The set of live objects after the marking cycle is composed of the live objects in the snapshot and the objects allocated since the start of the marking cycle. The G1 GC marking algorithm uses a pre-write barrier to record and mark objects that are part of the logical snapshot.
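To make the pre-write barrier idea more concrete, here is a toy model of SATB marking. It is a simplified illustration under stated assumptions, not how HotSpot implements it: the class `SatbSketch`, its single-reference `Node`, and the explicit `drainSatbBuffer` step are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Toy model of SATB marking; all names are hypothetical.
class SatbSketch {

    // A heap object with a single reference field.
    static class Node {
        final String name;
        Node ref;
        Node(String name) { this.name = name; }
    }

    boolean markingActive = false;
    final Deque<Node> satbBuffer = new ArrayDeque<>();
    final Set<Node> marked = new HashSet<>();

    // Pre-write barrier: while marking is active, remember the reference
    // that is about to be overwritten, because it belonged to the snapshot.
    void writeRef(Node holder, Node newValue) {
        if (markingActive && holder.ref != null) {
            satbBuffer.add(holder.ref);
        }
        holder.ref = newValue;
    }

    // Start of a marking cycle: the logical snapshot is taken here.
    void startMarking() {
        markingActive = true;
        marked.clear();
        satbBuffer.clear();
    }

    // Marks every object reachable from start by following ref fields.
    void trace(Node start) {
        for (Node n = start; n != null && marked.add(n); n = n.ref) {
            // marking happens in the loop condition
        }
    }

    // Drains the SATB buffer and traces the recorded objects, as the remark phase does.
    void drainSatbBuffer() {
        while (!satbBuffer.isEmpty()) {
            trace(satbBuffer.poll());
        }
    }
}
```

In a scenario such as root -> a -> b, if the application clears `a.ref` after `startMarking()` but before the marker reaches `a`, tracing from `root` alone misses `b`. Because the barrier recorded `b`, draining the buffer still marks it, so every object that was live in the snapshot is treated as live for this cycle.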
After the concurrent cycle, we expect to see:[5]
- The eden regions before the marking cycle have been completely freed and new eden regions have been allocated
- Old regions could be more occupied because of the promotion of live objects from young regions
- Some old regions are identified to be mostly garbage and become candidates in later mixed or old collection cycles
Diagnosis
The -Xmn400m option forces G1 to use a 400 MB young generation, which leaves G1 only about 250 MB (i.e. 2048 MB - 400 MB - 1400 MB = 248 MB) of breathing room for shuffling live objects around.
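The headroom arithmetic can be spelled out as a small calculation. This is a sketch using the numbers quoted in this article; the class `HeapHeadroom` and its method are hypothetical names.

```java
// Hypothetical helper: free space left once the young generation and live data are accounted for.
class HeapHeadroom {

    // All sizes are in MB.
    static long headroomMb(long heapMb, long youngMb, long liveDataMb) {
        return heapMb - youngMb - liveDataMb;
    }

    public static void main(String[] args) {
        long heapMb = 2048;      // -Xms2g -Xmx2g
        long youngMb = 400;      // -Xmn400m
        long liveDataMb = 1400;  // estimated live data size of the benchmark

        // 2048 - 400 - 1400 = 248
        System.out.println("Headroom: " + headroomMb(heapMb, youngMb, liveDataMb) + " MB");
    }
}
```

With these inputs the headroom comes to 248 MB, which is roughly 12% of the 2 GB heap.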
The small free space has a negative impact on G1's concurrent cycles: most marking cycles were not completed in time. Without -Xmn400m on the command line, we found only one instance of an aborted marking cycle, which is denoted by:
- [GC concurrent-mark-abort]
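One way to compare the two runs is to count how many marking cycles started and how many were aborted in each GC log. The sketch below assumes the log markers shown in this article; the class `MarkingAbortCounter` is a hypothetical helper, and the sample lines are made up purely for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: counts GC log lines that contain a given marker.
class MarkingAbortCounter {

    static int countContaining(List<String> lines, String marker) {
        int count = 0;
        for (String line : lines) {
            if (line.contains(marker)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Illustrative sample only, not taken from the benchmark logs.
        List<String> sample = Arrays.asList(
            "[GC concurrent-mark-start]",
            "[GC concurrent-mark-abort]",
            "[GC concurrent-mark-start]",
            "[GC concurrent-mark-end");

        int started = countContaining(sample, "[GC concurrent-mark-start]");
        int aborted = countContaining(sample, "[GC concurrent-mark-abort]");
        System.out.println("started=" + started + " aborted=" + aborted);
    }
}
```

For a real log, the list of lines could be obtained with `java.nio.file.Files.readAllLines` on the log file; the file location depends on how GC logging was configured.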
Differences between Parallel GC and G1 GC
Note that -Xmn400m is the optimal setting for our benchmark if Parallel GC is used. When we set MaxNewSize to be 400MB for G1, our benchmark regressed. So, what's the difference between G1 GC and Parallel GC?
Initially, G1 GC was designed to replace CMS [10] for its relatively lower and more predictable pause times. This is different from the design of Parallel GC which aims for higher throughput. To help gain higher throughput, Parallel GC tries to:
- adjust the young generation to be as large as possible and let the heap run into a full GC
- This means that more frequent full GCs are expected
- In our 4-hour experiments, we usually see around 50 full GCs with Parallel GC
- However, for G1's performance tuning, we want to avoid full GCs (see [11] for how)
- Note that Parallel GC's full GC pause time is shorter than G1's in the current JDK 8 releases
Acknowledgement
Some of the material here is based on feedback from Thomas Schatzl. The author, however, takes full responsibility for the content.
References
- [1] Garbage First Garbage Collector Tuning
- [2] JRockit: How to Estimate the Size of Live Data Set
- [3] g1gc logs - basic - how to print and how to understand
- [4] g1gc logs - Ergonomics - how to print and how to understand
- [5] Java Performance: The Definitive Guide (Strongly recommended)
- [6] Garbage First Garbage Collector Tuning - Oracle
- [7] Our Collectors by Jon Masamitsu
- [8] Understanding Garbage Collection
- [9] HotSpot VM Performance Tuning Tips
- [10] Understanding CMS GC Logs
- [11] G1 GC: Tuning Mixed Garbage Collections in JDK 8
- [12] G1 GC Glossary of Terms
- [13] Learn More About Performance Improvements in JDK 8
- [14] Benchmarking G1 and other Java 7 Garbage Collectors
- [15] HotSpot Virtual Machine Garbage Collection Tuning Guide
- [16] Getting Started with the G1 Garbage Collector
- [17] Garbage-First Garbage Collector (JDK 8 HotSpot Virtual Machine Garbage Collection Tuning Guide)
- [18] Other JDK 8 articles on Xml and More
- Tuning that was great in old JRockit versions might not be so good anymore
- Trying to bring over each and every tuning option from a JR configuration to an HS one is probably a bad idea.
- Even when moving between major versions of the same JVM, we usually recommend going back to the default (just pick a collector and heap size) and then redoing any tuning work from scratch (if even necessary).