Cross Column

Showing posts with label Hotspot. Show all posts

Saturday, October 24, 2015

Does SPECjbb 2015 Have Less Variability than SPECjbb2013?

What's the advantage of using SPECjbb 2015 over SPECjbb 2013? The answer may be "less variability". In this article, I'm going to examine the credibility of this claim.

Note that the experiments run here are not intended to evaluate the ultimate performance of the JVMs (OpenJDK vs. HotSpot) under different configurations. The OpenJDK and HotSpot builds tested here are based on different JDK 7 versions, but the author believes the differences between OpenJDK and HotSpot builds should be minor.

To recap, the main purpose of this article is comparing the variability of SPECjbb 2013 and SPECjbb 2015.


OpenJDK vs. HotSpot



To begin with, we want to emphasize that OpenJDK and HotSpot actually share most of their code base. Henrik Stahl, VP of Product Management in the Java Platform Group at Oracle, has shed some light on it:
Q: What is the difference between the source code found in the OpenJDK repository, and the code you use to build the Oracle JDK?
A: It is very close - our build process for Oracle JDK releases builds on OpenJDK 7 by adding just a couple of pieces, like the deployment code, which includes Oracle's implementation of the Java Plugin and Java WebStart, as well as some closed source third party components like a graphics rasterizer, some open source third party components, like Rhino, and a few bits and pieces here and there, like additional documentation or third party fonts. Moving forward, our intent is to open source all pieces of the Oracle JDK except those that we consider commercial features such as JRockit Mission Control (not yet available in Oracle JDK), and replace encumbered third party components with open source alternatives to achieve closer parity between the code bases.

Raw Data



Common Settings of the Tests
Different Settings of Tests
  • Test 1
    • OpenJDK 
    • Huge Pages[7]: 0
  • Test 2
    • OpenJDK 
    • Huge Pages: 2200
  • Test 3
    • HotSpot
    • Huge Pages: 0
  • Test 4
    • HotSpot
    • Huge Pages: 2200

SPECjbb 2013

              max-jOPS                       critical-jOPS
              Mean     Std Dev  CV          Mean    Std Dev  CV
  Test 1      2973.40  76.01    2.56%       748.40  70.02    9.36%
  Test 2      3490.60  92.54    2.65%       852.20  76.37    8.96%
  Test 3      3124.20  43.22    1.38%       741.60  36.28    4.89%
  Test 4      3538.00  71.98    2.03%       828.80  76.54    9.24%



SPECjbb 2015

              max-jOPS                       critical-jOPS
              Mean     Std Dev  CV          Mean    Std Dev  CV
  Test 1      2853.80  55.22    1.93%       514.00  20.21    3.93%
  Test 2      3233.40  82.32    2.55%       581.80  45.26    7.78%
  Test 3      2893.20  53.89    1.86%       522.40  18.90    3.62%
  Test 4      3238.60  92.89    2.87%       571.60  36.67    6.42%

Analysis


Coefficient of variation (CV) is used to compare the variability of the experiments. Based on 5 sample points per test, we have come to the following preliminary conclusions:
  • SPECjbb 2013 vs. SPECjbb 2015
    • On critical-jOPS, SPECjbb 2015 exhibits less variability in all tests.
    • On max-jOPS, SPECjbb 2015 exhibits less variability in all tests except Tests 3 and 4.
  • Enabling huge pages or not
    • In both SPECjbb 2013 and 2015, enabling huge pages leads to higher variability (exception: SPECjbb 2013/OpenJDK/critical-jOPS).
  • OpenJDK vs. HotSpot
    • Both JVMs exhibit similar performance on average (i.e., max-jOPS and critical-jOPS).
By no means are our results conclusive.  As more tests are run in the future, we will gain a better picture.  From these preliminary results, it seems wise to choose SPECjbb 2015 over SPECjbb 2013 for future Java SE performance tests.
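For reference, the mean, standard deviation, and CV figures above can be reproduced from raw sample points as follows (a minimal sketch; the class and the sample values below are ours, not part of SPECjbb's tooling):

```java
// Computes mean, sample standard deviation, and coefficient of
// variation (CV = stddev / mean) for a set of benchmark samples.
public class Stats {
    public static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    public static double stdDev(double[] xs) {
        double m = mean(xs);
        double ss = 0;
        for (double x : xs) ss += (x - m) * (x - m);
        return Math.sqrt(ss / (xs.length - 1));  // sample (n-1) variance
    }

    public static double cv(double[] xs) {
        return stdDev(xs) / mean(xs);
    }

    public static void main(String[] args) {
        // Hypothetical max-jOPS samples from five runs
        double[] samples = {2900, 2950, 3000, 3010, 3007};
        System.out.printf("mean=%.2f stddev=%.2f CV=%.2f%%%n",
                mean(samples), stdDev(samples), 100 * cv(samples));
    }
}
```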

Friday, August 15, 2014

jstat Tool: New Metaspace Statistics from -gc Option

In [1], we have shown one way of estimating the size of metaspace[2] using the following command:
jstat -class
However, there is other metadata, as well as implementation overhead (i.e., MetaBlock overhead), not accounted for in the reported sizes.

In this article, we will introduce another way of monitoring the sizes needed for "Metaspace" and "Class Space".

Heap Statistics Reported by PrintGC


As described in [2], the following heap statistics will be printed at JVM exit if you have enabled PrintGC (i.e., PrintGCDetails or PrintHeapAtGC):

Heap
 garbage-first heap   total 2097152K, used 1680819K [0x0000000080000000, 0x0000000100000000, 0x0000000100000000)
  region size 4096K, 27 young (110592K), 9 survivors (36864K)
 Metaspace       used 327768K, capacity 340303K, committed 340780K, reserved 1349632K
  class space    used 37537K, capacity 40670K, committed 40832K, reserved 1048576K

jstat -gc Command


jstat -gc can be used to display garbage-collected heap statistics.  In JDK 8, it displays Metaspace's statistics instead of PermGen's.

-bash-3.2$ jstat -gc 18990 1000 3
Warning: Unresolved Symbol: sun.gc.generation.2.space.0.capacity substituted NaN
Warning: Unresolved Symbol: sun.gc.generation.2.space.0.used substituted NaN
 S0C   S1C      S0U   S1U     EC       EU       OC         OU           PC     PU  
 0.0   16384.0  0.0   16384.0 94208.0  12288.0  1986560.0  1783935.8    �      �     
 0.0   16384.0  0.0   16384.0 94208.0  57344.0  1986560.0  1775743.8    �      �     
 0.0   16384.0  0.0   16384.0 94208.0  69632.0  1986560.0  1775743.8    �      �      

On the first try, it printed the above three lines.  However, the default jstat was run, which belongs to JDK 7.  That is why you still see headers such as "PC" and "PU", along with some gibberish text caused by the "unresolved symbols."  You need to run the jstat from the correct version (i.e., from your JDK 8 installation).  After correcting this error, we saw:

-bash-3.2$ cd JVMs/
-bash-3.2$ jdk-hs/bin/jstat  -gc 18990 1000 3
 S0C   S1C      S0U   S1U     EC       EU       OC         OU         MC       MU       CCSC    CCSU  
 0.0   28672.0  0.0   28672.0 163840.0 45056.0  1904640.0  1663244.2  340992.0 327753.1 40960.0 37549.0  
 0.0   28672.0  0.0   28672.0 163840.0 77824.0  1904640.0  1663244.2  340992.0 327753.1 40960.0 37549.0
 0.0   28672.0  0.0   28672.0 163840.0 86016.0  1904640.0  1663244.2  340992.0 327753.1 40960.0 37549.0 


Note that we now see "MC", "MU", "CCSC", and "CCSU" instead of "PC" and "PU".

Meaning of "MC", "MU", "CCSC", and "CCSU"


There is no good documentation yet on what they mean, but a good guess would be:
  • MC (Metaspace Committed)
  • MU (Metaspace Used)
  • CCSC (Compressed Class Space Committed)
  • CCSU (Compressed Class Space Used)

If you compare the values printed by "jstat -gc" with the values printed at JVM exit, they are similar.  Note that the "jstat -gc" command was issued close to the end of our JVM run.

To learn more about these new headers, read [2].
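Besides jstat, the same numbers can be read programmatically through JMX. Below is a minimal sketch; the pool names "Metaspace" and "Compressed Class Space" are what HotSpot in JDK 8 reports, and other JVMs or versions may use different names:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

// Prints committed and used sizes of the non-heap pools that back the
// MC/MU and CCSC/CCSU columns of "jstat -gc" on HotSpot (JDK 8).
public class MetaspaceProbe {
    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            if (name.equals("Metaspace") || name.equals("Compressed Class Space")) {
                System.out.printf("%s: committed=%dK used=%dK%n",
                        name,
                        pool.getUsage().getCommitted() / 1024,
                        pool.getUsage().getUsed() / 1024);
            }
        }
    }
}
```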

Acknowledgement


Some of the writing here is based on feedback from Jon Masamitsu. However, the author assumes full responsibility for the content.

References

  1. HotSpot: Monitoring and Tuning Metaspace in JDK 8
  2. HotSpot: Understanding Metaspace in JDK 8
  3. jstat - Java Virtual Machine Statistics Monitoring Tool
  4. Interpreting jstat results
  5. JavaOne Wednesday: Permanent Generation Removal Overview
  6. HotSpot: A Case Study of MetaspaceSize Tuning in JDK 8

Friday, October 26, 2012

HotSpot VM Performance Tuning Tips

In some cases, it may be obvious from benchmarking that parts of an application need to be rewritten using more efficient algorithms[1]. Sometimes it may just be enough to provide a more optimal runtime environment by tuning the JVM parameters.

In this article, we will show you some of the HotSpot VM performance tuning tips.

What to tune?


You can tune HotSpot performance on multiple fronts:
  • Code generation[8,10]
  • Memory management
In this article, we will focus on memory management (i.e., the garbage collector).  The goals of tuning the garbage collector include:
  • To make a garbage collector operate efficiently, by
    • Reducing pause time or
    • Increasing throughput
  • To avoid heap fragmentation
    • Different garbage collectors use different compaction schemes to eliminate fragmentation
  • To make it scalable for multithreaded applications on multiprocessor systems
In this article, we will cover the following tuning options:
  1. Client VM or Server VM
  2. 32-bit VM or 64-bit VM
  3. GC strategy
  4. Heap sizing
  5. Further tuning

Client vs. Server VM


The HotSpot Client JVM has been specially tuned to reduce application startup time and memory footprint, making it particularly well suited for client environments. On all platforms, the HotSpot Client JVM is the default.

The Java HotSpot Server VM is similar to the HotSpot Client JVM except that it has been specially tuned to maximize peak operating speed. It is intended for long-running server applications, for which the fastest possible operating speed is generally more important than having the fastest startup time. To invoke the HotSpot Server JVM instead of the default HotSpot Client JVM, use the -server parameter; for example,
  • java -server MyApp
In [7], the authors mention a third HotSpot VM runtime named tiered.   If you are using Java 6 Update 25, Java 7, or later, you may consider using the tiered server runtime as a replacement for the client runtime.   For more details, read [8,10].

32-Bit or 64-Bit VM


The 32-bit JVM is the default for the HotSpot VM. The choice of using a 32-bit or 64-bit JVM is dictated by the memory footprint required by the application along with whether any third-party software used in the application supports 64-bit JVMs and if there are any native components in the Java application. All native components using the Java Native Interface (JNI) in a 64-bit JVM must be compiled in 64-bit mode.

Running 64-bit VM has the following advantages[7]:
  • Larger address space
  • Better performance in two fronts
    • 64-bit JVMs can make use of additional CPU registers
    • Help avoid register spilling
      • Register spilling occurs where there is more live state (i.e. variables) in the application than the CPU has registers. 
and one disadvantage:
  • Increased width for oops
    • Results in fewer oops being available on a CPU cache line and as a result decreases CPU cache efficiency. 
    • This negative performance impact can be mitigated by setting:
      • -XX:+UseCompressedOops VM command line option
Note that client runtimes are not available in 64-bit HotSpot VMs.   See [11] for more details.

GC Strategy


JVM performance is often measured by the effectiveness of its GC.  Garbage collection (GC) reclaims heap space previously allocated to objects that are no longer needed. The process of locating and removing those dead objects can stall your Java application and consume as much as 25 percent of throughput.

The Java HotSpot virtual machine includes five garbage collectors.[27] All the collectors are generational.
  • Serial Collector
    • Both young and old collections are done serially (using a single CPU), in a stop-the-world fashion.
    • The old and permanent generations are collected via a mark-sweep-compact collection algorithm. 
      • The sweep phase “sweeps” over the generations, identifying garbage. The collector then performs sliding compaction, sliding the live objects towards the beginning of the old generation space (and similarly for the permanent generation), leaving any free space in a single contiguous chunk at the opposite end.
    • When to use
      • For most applications that are run on client-style machines and that do not have a requirement for low pause times
    • How to select
      • In the J2SE 5.0 and above, the serial collector is automatically chosen as the default garbage collector on machines that are not server-class machines. On other machines, the serial collector can be explicitly requested by using the -XX:+UseSerialGC command line option.
  • Parallel Collector (or throughput collector)
    • Young generation collection
      • Uses a parallel version of the young generation collection algorithm utilized by the serial collector
      • It is still a stop-the-world and copying collector, but performing the young generation collection in parallel, using many CPUs, decreases garbage collection overhead and hence increases application throughput.
    • Old generation collection
      • Uses the same serial mark-sweep-compact collection algorithm as the serial collector
    • When to use
      • For applications run on machines with more than one CPU and do not have pause time constraints, since infrequent, but potentially long, old generation collections will still occur. 
      • Examples of applications for which the parallel collector is often appropriate include those that do batch processing, billing, payroll, scientific computing, and so on.
    • How to select
      • In the J2SE 5.0 and above, the parallel collector is automatically chosen as the default garbage collector on server-class machines. On other machines, the parallel collector can be explicitly requested by using the -XX:+UseParallelGC command line option.
  • Parallel Compacting Collector
      • Young generation collection
        • Use the same algorithm as that for young generation collection using the parallel collector.
      • Old generation collection
        • The old and permanent generations are collected in a stop-the-world, mostly parallel fashion with sliding compaction
        • The collector utilizes three phases (see [5] for more details):
          • Marking phase
          • Summary phase
          • Compaction phase
      • When to use
        • For applications that are run on machines with more than one CPU and applications that have pause time constraints.
      • How to select
        • If you want the parallel compacting collector to be used, you must select it by specifying the command line option -XX:+UseParallelOldGC.
    • Concurrent Mark-Sweep (CMS) Collector[5,6]
        • Young generation collection
          • The CMS collector collects the young generation in the same manner as the parallel collector. 
        • Old generation collection
          • Most of the collection of the old generation using the CMS collector is done concurrently with the execution of the application. 
          • The CMS collector is the only collector that is non-compacting. That is, after it frees the space that was occupied by dead objects, it does not move the live objects to one end of the old generation. 
            • To minimize the risk of fragmentation, CMS does statistical analysis of object sizes and keeps separate free lists for objects of different sizes.
        • When to use
          • For applications that need shorter garbage collection pauses and can afford to share processor resources with the garbage collector while the application is running. (Due to its concurrency, the CMS collector takes CPU cycles away from the application during a collection cycle.)
          • Typically, applications that have a relatively large set of long-lived data (a large old generation), and that run on machines with two or more processors, tend to benefit from the use of this collector. 
          • Compared to the parallel collector, the CMS collector decreases old generation pauses—sometimes dramatically—at the expense of slightly longer young generation pauses, some reduction in throughput, and extra heap size requirements.
        • How to select
          • If you want the CMS collector to be used, you must explicitly select it by specifying the command line option -XX:+UseConcMarkSweepGC.  
          • If you want it to be run in incremental mode, also enable that mode via the  –XX:+CMSIncrementalMode option.
            • This feature is useful when applications that need the low pause times provided by the concurrent collector are run on machines with small numbers of processors (e.g., 1 or 2).
          • ParNewGC is the parallel young generation collector for use with CMS.  To choose it, you can specify the command line option  ‑XX:+UseParNewGC.
      • Garbage-First (G1) Garbage Collector[31]
        • Summary
          • Differs from CMS in the following ways
            • Compacting
              • Reduce fragmentation and is good for long-running applications 
            • Heap is split into regions
              • Easy to allocate and resize
          • Evacuation pauses 
            • For both young and old regions
        • Young generation collection
          • During a young GC, survivors from the young regions are evacuated to either survivor regions or old regions
            • Done with stop-the-world evacuation pauses
            • Performed in parallel
        • Old generation collection
          • Some garbage objects in regions with very high live ratio may be left in the heap and be collected later
          • Concurrent marking phase
            • Calculates liveness information per region
              • Empty regions can be reclaimed immediately
              • Identifies best regions for subsequent evacuation pauses
            • Remark is done with one stop-the-world pause while initial mark is piggybacked on an evacuation pause
            • No corresponding sweeping phase
            • Different marking algorithm than CMS
          • Old regions are reclaimed by
            • Evacuation pauses
              • Using compaction
              • Where most reclamation is done
            • Remark (when totally empty)
        • When to use
          • The G1 collector is a server-style garbage collector, targeted for multi-processor machines with large memories. It meets garbage collection (GC) pause time goals with high probability, while achieving high throughput. 
        • How to select
          • If you want the G1 garbage collector to be used, you must explicitly select it by specifying the command line option -XX:+UseG1GC.  
      Note that the difference between:
      • -XX:+UseParallelOldGC
      and
      • -XX:+UseParallelGC
      is that -XX:+UseParallelOldGC enables both a multithreaded young generation garbage collector and a multithreaded old generation garbage collector, that is, both minor garbage collections and full garbage collections are multithreaded. -XX:+UseParallelGC enables only a multithreaded young generation garbage collector. The old generation garbage collector used with -XX:+UseParallelGC is single threaded. 

      Using -XX:+UseParallelOldGC also automatically enables -XX:+UseParallelGC. Hence, if you want to use both a multithreaded young generation garbage collector and a multithreaded old generation garbage collector, you need only specify -XX:+UseParallelOldGC.

      Note that the above distinction between -XX:+UseParallelOldGC and -XX:+UseParallelGC is no longer true in JDK 7.  In JDK 7, the following three settings are equivalent:

      • Default
      • -XX:+UseParallelGC
      • -XX:+UseParallelOldGC
      They all use multithreaded collectors for both young generation and old generation.
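One way to verify which collectors a set of flags actually selects is to list the garbage collector MXBeans at runtime. A minimal sketch (the bean names shown in the comment are what HotSpot conventionally reports; they vary by GC strategy and JDK version):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Lists the active collectors with their collection counts and times.
// On HotSpot the names differ per strategy, e.g. "PS Scavenge" /
// "PS MarkSweep" with the parallel collectors, or "G1 Young Generation"
// / "G1 Old Generation" when -XX:+UseG1GC is set.
public class WhichGC {
    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: collections=%d time=%dms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```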

      Heap Sizing


      If a heap size is small, collection will be fast but the heap will fill up more quickly, thus requiring more frequent collections. Conversely, a large heap will take longer to fill up and thus collections will be less frequent, but they may take longer.

      Command line parameters that divide the heap between new and old generations usually cause the greatest performance impact.  If you increase the new generation's size, you often improve the overall throughput; however, you also increase footprint, which may slow down servers with limited memory.

      For more details, you can read [12-15].
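When experimenting with heap sizes, it helps to confirm what the JVM actually got. A minimal sketch using the standard Runtime API (the class name is ours):

```java
// Reports the maximum, committed, and used heap sizes so that
// -Xms/-Xmx experiments can be verified from inside the application.
public class HeapInfo {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024 * 1024;
        System.out.printf("max=%dMB total=%dMB free=%dMB used=%dMB%n",
                rt.maxMemory() / mb,
                rt.totalMemory() / mb,
                rt.freeMemory() / mb,
                (rt.totalMemory() - rt.freeMemory()) / mb);
    }
}
```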

      Further Tuning


      HotSpot's default parameters are effective for most small applications that require fast startup and a small footprint.  However, more often than not, you will find the default settings are not good enough and will need to tune your Java applications further.  As shown in [16], there are many VM options exposed that can be tuned by brave souls.  I'm not going to discuss such tunings in this article, but I'll keep posting articles with VM tunings on this blog.  Stay tuned!

      References

      1. Java Performance Tips
      2. Pick up performance with generational garbage collection
      3. Java HotSpot™ Virtual Machine Performance Enhancements
      4. Oracle JRockit
      5. Memory Management in the Java HotSpot™ Virtual Machine
      6. Understanding GC pauses in JVM, HotSpot's CMS collector
      7. Java Performance by Charlie Hunt and Binu John
      8. Performance Tuning with Hotspot VM Option: -XX:+TieredCompilation
      9. Java Tuning White Paper
      10. A Case Study of Using Tiered Compilation in HotSpot
      11. HotSpot VM Binaries: 32-Bit vs. 64-Bit
      12. HotSpot Performance Option — SurvivorRatio
      13. A Case Study of java.lang.OutOfMemoryError: GC overhead limit exceeded
      14. Understanding Garbage Collection
      15. Diagnosing Java.lang.OutOfMemoryError
      16. What Are the Default HotSpot JVM Values?
      17. Understanding Garbage Collector Output of Hotspot VM
      18. On Stack Replacement in HotSpot JVM
      19. Professional Oracle WebLogic Server by Robert Patrick, Gregory Nyberg, and Philip Aston
      20. Sun Performance and Tuning: Java and the Internet by Adrian Cockroft and Richard Pettit
      21. Concurrent Programming in Java: Design Principles and Patterns by Doug Lea
      22. Capacity Planning for Web Performance: Metrics, Models, and Methods by Daniel A. Menascé and Virgilio A.F. Almeida
      23. Java Performance Tuning (Michael Finocchiaro)
      24. Diagnosing Heap Stress in HotSpot
      25. Introduction to HotSpot JVM Performance and Tuning
      26. Tuning the JVM (video)
        • Frequency of Minor GC dictated by:
          • Application object allocation rate
          • Size of Eden
        • Frequency of object promotion dictated by:
          • Frequency of minor GCs (tenuring)
          • Size of survivor spaces
        • Full GC Frequency dictated by
          • Promotion rate
          • Size of old generation
      27. JEP 173: Retire Some Rarely-Used GC Combinations
      28. G1 GC Glossary of Terms
      29. Learn More About Performance Improvements in JDK 8 
      30. Java SE HotSpot at a Glance
      31. Garbage-First Garbage Collector (JDK 8 HotSpot Virtual Machine Garbage Collection Tuning Guide)
      32. Tuning that was great in old JRockit versions might not be so good anymore
        • Trying to bring over each and every tuning option from a JR configuration to an HS one is probably a bad idea.
        • Even when moving between major versions of the same JVM, we usually recommend going back to the default (just pick a collector and heap size) and then redoing any tuning work from scratch (if even necessary).

      Wednesday, September 12, 2012

      A Case Study of Using Tiered Compilation in HotSpot

      HotSpot has two JITs named c1 (i.e., the client JIT) and c2 (i.e., the server JIT).[1] The client JIT starts fast but provides fewer optimizations, so it is used for GUI applications. The server JIT starts more slowly but provides very good optimizations. The idea of tiered compilation [2] is to get the best of both compilers: first JIT the code with c1, and then, if the code is really hot, recompile it with c2.

      The tiered server runtime is enabled with the following Hotspot VM options:
      • -server -XX:+TieredCompilation
      In this article, we show a case study of using HotSpot (build 23.0-b18, mixed mode) with TieredCompilation off/on.

      Comparison

      You can tune the JVM's performance by tuning either memory management or code generation.  Here we have tuned the code generator by turning tiered compilation on.  With tiered compilation on, more classes are compiled (or compiled more efficiently) and so they execute faster (i.e., +15%).

      Using ATG CRM Demo benchmark, we saw the following KPI changes:



      KPI                      -XX:-TieredCompilation   -XX:+TieredCompilation   % Change
      Total Footprint          4431MB                   4686MB                   -5.4%
      Application Server CPU   23%                      20%                      +15%
      Average Response Time    0.234                    0.217                    +7.8%

      Notes


      1. Here is our setting for ReservedCodeCacheSize[2]:
        -TieredCompilation: 128MB
        +TieredCompilation: 256MB
      2. When we turned tiered compilation on, we also reserved a larger code cache for it (i.e., 256 MB vs. 128 MB).  The code cache is allocated out of native memory (vs. the heap).  The total footprint shown in the table includes native memory, so we see a larger total footprint when tiered compilation is turned on.
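The JIT's activity can also be observed from inside the JVM. A minimal sketch that reports total compilation time and code cache occupancy (the pool name "Code Cache" is what pre-segmented HotSpot builds report; newer JDKs split it into several "CodeHeap" pools, so we match on "Code" here):

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

// Shows how much time the JIT has spent compiling and how much of the
// reserved code cache (cf. ReservedCodeCacheSize) is in use.
public class JitProbe {
    public static void main(String[] args) {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit != null && jit.isCompilationTimeMonitoringSupported()) {
            System.out.printf("%s: total compile time=%dms%n",
                    jit.getName(), jit.getTotalCompilationTime());
        }
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getName().contains("Code")) {
                System.out.printf("%s: used=%dK committed=%dK%n",
                        pool.getName(),
                        pool.getUsage().getUsed() / 1024,
                        pool.getUsage().getCommitted() / 1024);
            }
        }
    }
}
```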

      References

      1. HotSpot Glossary of Terms
      2. Performance Tuning with Hotspot VM Option: -XX:+TieredCompilation (Xml and More)

      Friday, August 31, 2012

      Which JVM?

      When you report java.lang.OutOfMemoryError: Java heap space, you need to tell people which JVM has thrown the exception.  Sometimes this can be obvious and sometimes it isn't (see below).

      An OutOfMemoryError does not necessarily imply a memory leak. The issue might simply be a configuration issue, for example if the specified heap size (or the default size if not specified) is insufficient for the application[1].  In that case, you need to figure out which configuration file to modify.

      jps Command

      You can use jps (Java Virtual Machine Process Status Tool) to list the local VM identifiers of the instrumented HotSpot Java Virtual Machines (JVMs) on the target Linux system.  For example, on my system, it lists:

      $ jps 
      17291 Server
      17468 Jps
      7883 Launcher

      This list includes one Integrated WebLogic Server (i.e., Server) and one JDeveloper (i.e., Launcher). Note that JDeveloper and Integrated WLS use their own JVMs. So, if you're running Integrated WLS from JDeveloper, you need to know which JVM has thrown the OutOfMemoryError exception.
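From inside a running application you can also identify the JVM yourself. A minimal sketch (the class name is ours; on HotSpot, RuntimeMXBean.getName() conventionally returns "pid@hostname", which matches the ids printed by jps):

```java
import java.lang.management.ManagementFactory;

// Prints identifiers that help match an OutOfMemoryError or a stack
// trace to a specific JVM process and implementation.
public class WhoAmI {
    public static void main(String[] args) {
        System.out.println("vm   = " + System.getProperty("java.vm.name")
                + " " + System.getProperty("java.vm.version"));
        System.out.println("name = " + ManagementFactory.getRuntimeMXBean().getName());
    }
}
```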

      Configuration Files


      Each application uses its own configuration file to start its JVM and you need to know where to find it.

      For example, to change heap space sizes, you can modify those parameters in:
      • JDeveloper
        • $JDEV_HOME/jdev/bin/jdev.conf
        • or 'which jdev' and find its containing directory; jdev.conf should be located in the same directory 
        • Note that jdev.conf includes ide.conf which contains the following configurations:
          • AddVMOption  -Xmx640M
          • AddVMOption  -Xms128M
      • WebLogic Server
        • You have two options:
          • Modify MEM_ARGS in the setDomainEnv.sh
          • $setenv USER_MEM_ARGS "-Xms512m -Xmx1024m -XX:MaxPermSize=1024m -XX:CompileThreshold=8000"
            • This will override the settings from WLS scripts as shown below:
              • if [ "${USER_MEM_ARGS}" != "" ] ; then
                MEM_ARGS="${USER_MEM_ARGS}"
                export MEM_ARGS
                fi
        • If this is Integrated WLS, your script folder is located under:
          • ../system11.1.1.6.38.61.92/DefaultDomain/bin
          • On JDev 12c, the heap size is set by the following line (which is embedded in setStartupEnv.sh):[5]
            • set SERVER_MEM_ARGS_64=-Xms1024m -Xmx3072m
              • You can trace the settings starting from startWebLogic.sh, to setDomainEnv.sh, to setSOADomainEnv.sh, and finally to setStartupEnv.sh.
      • Eclipse
        • /eclipse.ini

      Project Properties


      Here is another case which requires the change of project properties in JDeveloper:
      • Scenario:
        • JDeveloper is used to deploy "server-setup-seed-deploy-test" target with "Run Ant Target" and it threw "java.lang.OutOfMemoryError: PermGen space" exception.[2]
      • Solution:
        • In Jdev window, right click the project file >  Project Properties > Ant > Process > Java Options:
          • -Xms512m -Xmx1024m -XX:PermSize=512m -XX:MaxPermSize=1024m
        • Note that a new JVM is launched for the Ant process and it uses the above settings.


      References

      Saturday, August 25, 2012

      Understanding JVM Thread States

      To investigate CPU issues in Java applications, one approach is to diagnose monitor locks and thread activities[1].

      In this article, we will show you:
      • How to generate thread dumps on different VMs
      • What to know about thread states

      JVM Thread States


      A thread can be in only one state at a given point in time. These states are virtual-machine states; they do not reflect any operating system thread states. As an example, the following shows the distribution of thread states on a HotSpot VM:

      $ cat thread.tmp | grep "java.lang.Thread.State" | sort | uniq -c
            3    java.lang.Thread.State: BLOCKED (on object monitor)
           18    java.lang.Thread.State: RUNNABLE
            6    java.lang.Thread.State: TIMED_WAITING (on object monitor)
            2    java.lang.Thread.State: TIMED_WAITING (sleeping)
           13    java.lang.Thread.State: WAITING (on object monitor)
            3    java.lang.Thread.State: WAITING (parking)

      Below we describe some of the thread states that can be found in a thread dump:
      • NEW - this state represents a new thread which has not yet started.
      • RUNNABLE - this state represents a thread which is executing in the underlying JVM. Executing in the JVM doesn't mean the thread is always executing in the OS as well - it may be waiting for a resource from the operating system, such as the processor, while in this state.
      • BLOCKED (on object monitor) - this state represents a thread which has been blocked and is waiting for a monitor to enter (or re-enter) a synchronized block/method. A thread gets into this state while contending for a monitor held by another thread, for example when re-entering a synchronized block after Object.wait.
      • WAITING - this state represents a thread in the waiting state; the wait is over only when some other thread performs an appropriate action. A thread can get into this state by calling Object.wait (without timeout), Thread.join (without timeout), or LockSupport.park.
      • TIMED_WAITING - this state represents a thread which is required to wait at most for a specified time limit. A thread can get into this state by calling any of these methods: Thread.sleep, Object.wait (with timeout), Thread.join (with timeout), LockSupport.parkNanos, or LockSupport.parkUntil.
      • TERMINATED - this state represents a thread which has completed its execution, either by returning from the run() method or by throwing an exception which propagated from the run() method and hence terminated the thread.
      • WAITING (parking) - this means a wait state after being parked. A thread can suspend its execution until a permit is available (or the thread is interrupted, or a timeout expires, etc.) by calling park(). You can give a permit to a thread by calling unpark(). When the permit is available, the parked thread consumes it and exits the park() method. Unlike Semaphore's permits, LockSupport's permits are associated with threads (i.e., a permit is given to a particular thread) and do not accumulate (i.e., there can be at most one permit per thread; when the thread consumes the permit, it disappears).
      Notes:

      1. It's hard to investigate any CPU issue just from one thread dump. So, you need to prepare a series of thread dumps for investigation. For an example of analyzing hanging problems, see [9].
      2. The above thread state information is for HotSpot.  To understand JRockit's thread dump contents, read [4].
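The states above can also be observed directly with Thread.getState(). A minimal sketch (the class and thread below are ours) that drives one thread through NEW, TIMED_WAITING, and TERMINATED:

```java
// Drives one thread through NEW -> TIMED_WAITING -> TERMINATED and
// observes each state from the main thread via Thread.getState().
public class StateDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread sleeper = new Thread(new Runnable() {
            public void run() {
                try {
                    Thread.sleep(500);   // TIMED_WAITING (sleeping)
                } catch (InterruptedException ignored) { }
            }
        });
        System.out.println(sleeper.getState());   // NEW
        sleeper.start();
        Thread.sleep(100);                        // let it reach sleep()
        System.out.println(sleeper.getState());   // typically TIMED_WAITING
        sleeper.join();
        System.out.println(sleeper.getState());   // TERMINATED
    }
}
```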

      Generating Thread Dumps in HotSpot

      $jcmd 15679 Thread.print >thread.tmp

      where 15679 is the process ID. For example, it will print the following messages:

      15679:
      2012-08-24 11:27:27
      Full thread dump Java HotSpot(TM) 64-Bit Server VM (23.0-b21-internal mixed mode):
      
      "Attach Listener" daemon prio=10 tid=0x000000000528e000 nid=0x4653 waiting on condition [0x0000000000000000]
         java.lang.Thread.State: RUNNABLE
      
      "Thread-29" daemon prio=10 tid=0x00002aaab9447000 nid=0x3fe3 runnable [0x000000004408b000]
         java.lang.Thread.State: RUNNABLE
              at java.net.SocketInputStream.socketRead0(Native Method)
              at java.net.SocketInputStream.read(SocketInputStream.java:150)
              at java.net.SocketInputStream.read(SocketInputStream.java:121)
              at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
              at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
              at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
              - locked <0x00000000f5ae4738> (a java.io.BufferedInputStream)
              at com.sun.jndi.ldap.Connection.run(Connection.java:849)
              at java.lang.Thread.run(Thread.java:722)
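
      Besides jcmd, the same information can be captured in-process through the standard java.lang.management API. A minimal sketch (the class name InProcessDump is our own choice):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class InProcessDump {
    public static void main(String[] args) {
        ThreadMXBean mxBean = ManagementFactory.getThreadMXBean();
        // dumpAllThreads(lockedMonitors, lockedSynchronizers) also reports the
        // "- locked <0x...>" style ownership information shown above
        for (ThreadInfo info : mxBean.dumpAllThreads(true, true)) {
            System.out.print(info);  // toString() includes name, state, and stack
        }
    }
}
```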

      Generating Thread Dumps in JRockit[4]

      $jrcmd 1286 print_threads
      where 1286 is the process ID. For example, it prints messages like the following:

      1286:
      
      ===== FULL THREAD DUMP ===============
      Thu Aug 23 17:36:30 2012
      Oracle JRockit(R) R28.1.3-11-141760-1.6.0_24-20110301-1432-linux-x86_64
      
      "Main Thread" id=1 idx=0x4 tid=1287 prio=5 alive, waiting, native_blocked
          -- Waiting for notification on: weblogic/t3/srvr/T3Srvr@0xe48b4598[fat lock]
          at jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native Method)
          at java/lang/Object.wait(J)V(Native Method)
          at java/lang/Object.wait(Object.java:485)
          at weblogic/t3/srvr/T3Srvr.waitForDeath(T3Srvr.java:981)
          ^-- Lock released while waiting: weblogic/t3/srvr/T3Srvr@0xe48b4598[fat lock]
          at weblogic/t3/srvr/T3Srvr.run(T3Srvr.java:490)
          at weblogic/Server.main(Server.java:71)
          at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
          -- end of trace
      
      "(Signal Handler)" id=2 idx=0x8 tid=1288 prio=5 alive, native_blocked, daemon
      
      "(OC Main Thread)" id=3 idx=0xc tid=1289 prio=5 alive, native_waiting, daemon
      ...

      References

      1. Understanding Threads and Locks
      2. Thread States Diagram
      3. Useful tool: jrcmd
      4. Using Thread Dumps
      5. Understanding a Java thread dump
      6. Fun with JStack 
      7. Java 2 Platform, Standard Edition 5.0 "Troubleshooting and Diagnostic Guide"
      8. HotSpot VM Performance Tuning Tips
      9. Analyze Hanging Programs Using Java Thread Traces (XML and More)
      10. Analyzing Thread Dumps in Middleware - Part 1
      11. Analyzing Thread Dumps in Middleware - Part 2

      Saturday, March 31, 2012

      A Case Study of java.lang.OutOfMemoryError: GC overhead limit exceeded

      When you see the following exception:
      • java.lang.OutOfMemoryError: GC overhead limit exceeded
      it means the garbage collector is taking an excessive amount of time while recovering very little memory in each run.

      In this article, we will walk through a real example using the trace records from the garbage collector's output. To learn more about the garbage collector's output, read [1] first.

      Correlating Timestamps between server.log and gc.log

      Two log files generated from Hotspot VM and WebLogic Server are used in this study:
      • server.log
      • gc.log
      Note that the names and locations of the log files are configurable and may differ in your environment. By default, they are generated in the log directory of the WLS domain.

      server.log

      In the server log file, we have spotted OutOfMemoryError at 5:21:40 AM as shown below:
        <Mar 30, 2012 5:21:40 AM PDT> <Error> 
        <Kernel> <BEA-000802> <ExecuteRequest failed
      
        java.lang.OutOfMemoryError: GC overhead limit exceeded.

      We have also noted that the first timestamp in the server log file is 4:13:56 AM:

        <Mar 30, 2012 4:13:56 AM PDT> <Info> <Security> <BEA-090905> <Disabling CryptoJ

      Because the gc log file uses elapsed time (in seconds) since the server started, we know this out-of-memory incident happened 4064 seconds after startup:
      • 5:21:40 - 4:13:56 = 4064 secs.
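      The timestamp arithmetic above can be reproduced with java.time (a small sketch; the class name GcOffset is our own choice):

```java
import java.time.Duration;
import java.time.LocalTime;

public class GcOffset {
    public static void main(String[] args) {
        LocalTime serverStart = LocalTime.of(4, 13, 56);  // first server.log entry
        LocalTime oomTime     = LocalTime.of(5, 21, 40);  // OutOfMemoryError entry
        long elapsedSecs = Duration.between(serverStart, oomTime).getSeconds();
        System.out.println(elapsedSecs + " secs");        // prints "4064 secs"
    }
}
```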
      gc.log

      Searching for the Full GC events, we have found that something went wrong about 3167 seconds after the server started.

      3167.301: [Full GC3168.441:
        [SoftReference, 786 refs, 0.0001560 secs]3168.442:
        [WeakReference, 25548 refs, 0.0041550 secs]3168.446:
        [FinalReference, 7977 refs, 0.0024050 secs]3168.448:
        [PhantomReference, 40 refs, 0.0000070 secs]3168.448:
        [JNI Weak Reference, 0.0000080 secs]
        [PSYoungGen: 100268K->23067K(349568K)]
        [ParOldGen: 2713398K->2752509K(2752512K)]
        2813666K->2775576K(3102080K)
        [PSPermGen: 210744K->210744K(393216K)], 2.8599650 secs]
        [Times: user=18.52 sys=0.03, real=2.85 secs]

      3170.809: [Full GC3171.973:
        [SoftReference, 991 refs, 0.0002140 secs]3171.974:
        [WeakReference, 26280 refs, 0.0041760 secs]3171.978:
        [FinalReference, 12136 refs, 0.0142580 secs]3171.992:
        [PhantomReference, 34 refs, 0.0000100 secs]3171.992:
        [JNI Weak Reference, 0.0000080 secs]
        [PSYoungGen: 197915K->32131K(349568K)]
        [ParOldGen: 2752509K->2752511K(2752512K)]
        2950424K->2784643K(3102080K)
        [PSPermGen: 210744K->210744K(393216K)], 2.7171030 secs]
        [Times: user=17.79 sys=0.04, real=2.72 secs]

      We have noticed that a series of Full GCs happened after the 3167.301-second mark; the output above shows the first two such events. The second Full GC was triggered only about 3.5 seconds after the first. Normally, consecutive Full GCs are interleaved with multiple Minor GCs, but that was not the case here.

      From examining the GC output, we have concluded that:
      • Full GCs happened too frequently (about every 3 to 4 seconds)
      • Old generation space was full and remained full after garbage collection.
      However,  at this time, the exception
      • java.lang.OutOfMemoryError: GC overhead limit exceeded
      was not thrown yet.

      Based on our calculation, we know the exception was thrown about 4064 seconds after the server started. Here is the garbage collector's output around that time:

      4064.456: [Full GC4065.667:
        [SoftReference, 2380 refs, 0.0003730 secs]4065.667:
        [WeakReference, 47627 refs, 0.0091360 secs]4065.677:
        [FinalReference, 3007 refs, 0.0007760 secs]4065.677:
        [PhantomReference, 42 refs, 0.0000070 secs]4065.677:
        [JNI Weak Reference, 0.0000090 secs]
        [PSYoungGen: 174848K->172682K(349568K)]
        [ParOldGen: 2752509K->2752508K(2752512K)]
        2927357K->2925191K(3102080K)
        [PSPermGen: 210598K->210598K(393216K)], 2.3630820 secs]
        [Times: user=16.01 sys=0.05, real=2.37 secs]

      4066.837: [Full GC4068.051:
        [SoftReference, 2355 refs, 0.0003750 secs]4068.051:
        [WeakReference, 43590 refs, 0.0081260 secs]4068.060:
        [FinalReference, 6859 refs, 0.0012270 secs]4068.061:
        [PhantomReference, 42 refs, 0.0000220 secs]4068.061:
        [JNI Weak Reference, 0.0000090 secs]
        [PSYoungGen: 174848K->171709K(349568K)]
        [ParOldGen: 2752508K->2752510K(2752512K)]
        2927356K->2924219K(3102080K)
        [PSPermGen: 210598K->210598K(393216K)], 2.8005630 secs]
        [Times: user=18.47 sys=0.02, real=2.81 secs]

      Heap Size Adjustment

      When a Full GC happens, you need to determine whether it was the occupancy of the old generation space or the occupancy of the permanent generation space that triggered it. In our case, the trigger was not the occupancy of the permanent generation space:
      • [PSPermGen: 210598K->210598K(393216K)]
      From the above, we know the permanent generation capacity was 393216K. After the Full GC, its occupancy remained unchanged at 210598K, well under that capacity.

      However, there was a serious issue with the old generation space:
      • [ParOldGen: 2752508K->2752510K(2752512K)]
      It became full and remained full after space reclamation.
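
      To spot this pattern mechanically across a long gc.log, one can parse the ParOldGen occupancy figures from each Full GC record. A minimal sketch (the class name is our own choice; the regex and sample line follow the log format shown above):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OldGenCheck {
    // Matches occupancy lines such as: [ParOldGen: 2752508K->2752510K(2752512K)]
    static final Pattern OLD_GEN =
            Pattern.compile("\\[ParOldGen: (\\d+)K->(\\d+)K\\((\\d+)K\\)\\]");

    public static void main(String[] args) {
        String line = "[ParOldGen: 2752508K->2752510K(2752512K)]";
        Matcher m = OLD_GEN.matcher(line);
        if (m.find()) {
            long after    = Long.parseLong(m.group(2));  // occupancy after the Full GC
            long capacity = Long.parseLong(m.group(3));  // total old generation space
            // Old gen still nearly 100% full after a Full GC is the signature
            // of the "GC overhead limit exceeded" condition discussed above
            System.out.printf("old gen %.2f%% full after Full GC%n",
                              100.0 * after / capacity);
        }
    }
}
```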

      Conclusion

      If you observe an OutOfMemoryError in the garbage collection logs, try increasing the Java heap size (covering both the young and old generations) or the permanent generation space.  If the issue comes from the old generation, try increasing the Java heap size up to 80% of the physical memory available to the JVM, taking into consideration the memory needed by the OS, the memory needed by other applications running concurrently, and so on.

      Depending on whether the old generation space or the permanent generation space is running out of memory, adjust the heap space sizes as follows:
      • For old generation space OutOfMemoryErrors
        • increase -Xms and -Xmx
      • For permanent generation OutOfMemoryErrors
        • increase -XX:PermSize and -XX:MaxPermSize
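
      After adjusting -Xms and -Xmx, a quick sanity check from inside the application can confirm the new limits took effect. A minimal sketch (the class name HeapCheck is our own choice):

```java
public class HeapCheck {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() reflects -Xmx; totalMemory() is the currently committed heap
        System.out.printf("max heap:  %d MB%n", rt.maxMemory() / (1024 * 1024));
        System.out.printf("committed: %d MB%n", rt.totalMemory() / (1024 * 1024));
    }
}
```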

      References

      1. Understanding Garbage Collector Output of Hotspot VM 
      2. Java Tuning White Paper 
      3. Java 2 Platform, Standard Edition 5.0 "Troubleshooting and Diagnostic Guide"
      4. Which JVM?
