Cross Column

Showing posts with label Hotspot. Show all posts

Saturday, October 24, 2015

Does SPECjbb 2015 Have Less Variability than SPECjbb2013?

What's the advantage of using SPECjbb 2015 over SPECjbb 2013? The answer may be "less variability". In this article, I'm going to examine the credibility of this claim.

Note that the experiments run here are not intended to evaluate the ultimate performance of the JVMs (OpenJDK vs. HotSpot) under different configurations. The OpenJDK and HotSpot builds tested here are based on different JDK 7 versions, but the author believes the differences between OpenJDK and HotSpot builds should be minor.

To recap, the main purpose of this article is comparing the variability of SPECjbb 2013 and SPECjbb 2015.


OpenJDK vs. HotSpot



To begin with, we want to emphasize that OpenJDK and HotSpot actually share most of their code base. Henrik Stahl, VP of Product Management in the Java Platform Group at Oracle, has shed some light on it:
Q: What is the difference between the source code found in the OpenJDK repository, and the code you use to build the Oracle JDK?
A: It is very close - our build process for Oracle JDK releases builds on OpenJDK 7 by adding just a couple of pieces, like the deployment code, which includes Oracle's implementation of the Java Plugin and Java WebStart, as well as some closed source third party components like a graphics rasterizer, some open source third party components, like Rhino, and a few bits and pieces here and there, like additional documentation or third party fonts. Moving forward, our intent is to open source all pieces of the Oracle JDK except those that we consider commercial features such as JRockit Mission Control (not yet available in Oracle JDK), and replace encumbered third party components with open source alternatives to achieve closer parity between the code bases.

Raw Data



Common Settings of the Tests
Different Settings of Tests
  • Test 1
    • OpenJDK 
    • Huge Pages[7]: 0
  • Test 2
    • OpenJDK 
    • Huge Pages: 2200
  • Test 3
    • HotSpot
    • Huge Pages: 0
  • Test 4
    • HotSpot
    • Huge Pages: 2200

SPECjbb 2013

              max-jOPS                       critical-jOPS
              Mean     Std Dev  CV          Mean    Std Dev  CV
  Test 1      2973.40  76.01    2.56%       748.40  70.02    9.36%
  Test 2      3490.60  92.54    2.65%       852.20  76.37    8.96%
  Test 3      3124.20  43.22    1.38%       741.60  36.28    4.89%
  Test 4      3538.00  71.98    2.03%       828.80  76.54    9.24%



SPECjbb 2015

              max-jOPS                       critical-jOPS
              Mean     Std Dev  CV          Mean    Std Dev  CV
  Test 1      2853.80  55.22    1.93%       514.00  20.21    3.93%
  Test 2      3233.40  82.32    2.55%       581.80  45.26    7.78%
  Test 3      2893.20  53.89    1.86%       522.40  18.90    3.62%
  Test 4      3238.60  92.89    2.87%       571.60  36.67    6.42%

Analysis


Coefficient of variation (CV) is used to compare the variability of the experiments. Based on 5 sample points per test, we have come to the following preliminary conclusions:
  • SPECjbb 2013 vs. SPECjbb 2015
    • On critical-jOPS, SPECjbb 2015 exhibits less variability in all tests.
    • On max-jOPS, SPECjbb 2015 exhibits less variability in all tests except Tests 3 and 4.
  • Enabling huge pages or not
    • In both SPECjbb 2013 and 2015, enabling huge pages leads to higher variability (exception: SPECjbb 2013/OpenJDK/critical-jOPS).
  • OpenJDK vs. HotSpot
    • Both JVMs exhibit similar performance on average (i.e., max-jOPS and critical-jOPS).
By no means are our results conclusive.  As more tests are run in the future, we will gain a better picture.  From these preliminary results, it seems wise to choose SPECjbb 2015 over SPECjbb 2013 for future Java SE performance tests.
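For reference, the mean, standard deviation, and CV figures above can be reproduced from raw sample points as follows (a minimal sketch; the class and the sample values below are ours, not part of SPECjbb's tooling):

```java
// Computes mean, sample standard deviation, and coefficient of
// variation (CV = stddev / mean) for a set of benchmark samples.
public class Stats {
    public static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    public static double stdDev(double[] xs) {
        double m = mean(xs);
        double ss = 0;
        for (double x : xs) ss += (x - m) * (x - m);
        return Math.sqrt(ss / (xs.length - 1));  // sample (n-1) variance
    }

    public static double cv(double[] xs) {
        return stdDev(xs) / mean(xs);
    }

    public static void main(String[] args) {
        // Hypothetical max-jOPS samples from five runs
        double[] samples = {2900, 2950, 3000, 3010, 3007};
        System.out.printf("mean=%.2f stddev=%.2f CV=%.2f%%%n",
                mean(samples), stdDev(samples), 100 * cv(samples));
    }
}
```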

Friday, August 15, 2014

jstat Tool: New Metaspace Statistics from -gc Option

In [1], we have shown one way of estimating the size of metaspace[2] using the following command:
jstat -class
However, there is other metadata, as well as implementation overhead (i.e., MetaBlock overhead), not accounted for in the reported sizes.

In this article, we will introduce another way of monitoring the sizes needed for "Metaspace" and "Class Space".

Heap Statistics Reported by PrintGC


As described in [2], the following heap statistics will be printed at JVM exit if you have enabled PrintGC (i.e., PrintGCDetails or PrintHeapAtGC):

Heap
 garbage-first heap   total 2097152K, used 1680819K [0x0000000080000000, 0x0000000100000000, 0x0000000100000000)
  region size 4096K, 27 young (110592K), 9 survivors (36864K)
 Metaspace       used 327768K, capacity 340303K, committed 340780K, reserved 1349632K
  class space    used 37537K, capacity 40670K, committed 40832K, reserved 1048576K

jstat -gc Command


jstat -gc can be used to display garbage-collected heap statistics.  In JDK 8, it displays Metaspace's statistics instead of PermGen's.

-bash-3.2$ jstat -gc 18990 1000 3
Warning: Unresolved Symbol: sun.gc.generation.2.space.0.capacity substituted NaN
Warning: Unresolved Symbol: sun.gc.generation.2.space.0.used substituted NaN
 S0C   S1C      S0U   S1U     EC       EU       OC         OU           PC     PU  
 0.0   16384.0  0.0   16384.0 94208.0  12288.0  1986560.0  1783935.8    �      �     
 0.0   16384.0  0.0   16384.0 94208.0  57344.0  1986560.0  1775743.8    �      �     
 0.0   16384.0  0.0   16384.0 94208.0  69632.0  1986560.0  1775743.8    �      �      

On the first try, it printed the above three lines.  However, the default jstat was run, which belongs to JDK 7.  That is why you still see headers such as "PC" and "PU", along with some gibberish text caused by the "unresolved symbols."  You need to run the jstat from the correct version (i.e., from your JDK 8 installation).  After correcting this error, we saw:

-bash-3.2$ cd JVMs/
-bash-3.2$ jdk-hs/bin/jstat  -gc 18990 1000 3
 S0C   S1C      S0U   S1U     EC       EU       OC         OU         MC       MU       CCSC    CCSU  
 0.0   28672.0  0.0   28672.0 163840.0 45056.0  1904640.0  1663244.2  340992.0 327753.1 40960.0 37549.0  
 0.0   28672.0  0.0   28672.0 163840.0 77824.0  1904640.0  1663244.2  340992.0 327753.1 40960.0 37549.0
 0.0   28672.0  0.0   28672.0 163840.0 86016.0  1904640.0  1663244.2  340992.0 327753.1 40960.0 37549.0 


Note that we now see "MC", "MU", "CCSC", and "CCSU" instead of "PC" and "PU".

Meaning of "MC", "MU", "CCSC", and "CCSU"


There is no good documentation yet on what they mean, but a good guess would be:
  • MC (Metaspace Committed)
  • MU (Metaspace Used)
  • CCSC (Compressed Class Space Committed)
  • CCSU (Compressed Class Space Used)

If you compare the values printed by "jstat -gc" with the values printed at JVM exit, they are similar.  Note that the "jstat -gc" command was issued close to the end of our JVM run.

To learn more about these new headers, read [2].
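Besides jstat, the same numbers can be read programmatically through JMX. Below is a minimal sketch; the pool names "Metaspace" and "Compressed Class Space" are what HotSpot in JDK 8 reports, and other JVMs or versions may use different names:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

// Prints committed and used sizes of the non-heap pools that back the
// MC/MU and CCSC/CCSU columns of "jstat -gc" on HotSpot (JDK 8).
public class MetaspaceProbe {
    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            if (name.equals("Metaspace") || name.equals("Compressed Class Space")) {
                System.out.printf("%s: committed=%dK used=%dK%n",
                        name,
                        pool.getUsage().getCommitted() / 1024,
                        pool.getUsage().getUsed() / 1024);
            }
        }
    }
}
```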

Acknowledgement


Some of the writing here is based on feedback from Jon Masamitsu. However, the author assumes full responsibility for the content.

References

  1. HotSpot: Monitoring and Tuning Metaspace in JDK 8
  2. HotSpot: Understanding Metaspace in JDK 8
  3. jstat - Java Virtual Machine Statistics Monitoring Tool
  4. Interpreting jstat results
  5. JavaOne Wednesday: Permanent Generation Removal Overview
  6. HotSpot: A Case Study of MetaspaceSize Tuning in JDK 8

Friday, October 26, 2012

HotSpot VM Performance Tuning Tips

In some cases, it may be obvious from benchmarking that parts of an application need to be rewritten using more efficient algorithms[1]. Sometimes it may just be enough to provide a more optimal runtime environment by tuning the JVM parameters.

In this article, we will show you some of the HotSpot VM performance tuning tips.

What to tune?


You can tune HotSpot performance on multiple fronts:
  • Code generation[8,10]
  • Memory management
In this article, we will focus on memory management (i.e., the garbage collector).  The goals of tuning the garbage collector include:
  • To make a garbage collector operate efficiently, by
    • Reducing pause time or
    • Increasing throughput
  • To avoid heap fragmentation
    • Different garbage collectors use different compaction schemes to eliminate fragmentation
  • To make it scalable for multithreaded applications on multiprocessor systems
In this article, we will cover the following tuning options:
  1. Client VM or Server VM
  2. 32-bit VM or 64-bit VM
  3. GC strategy
  4. Heap sizing
  5. Further tuning

Client vs. Server VM


The HotSpot Client JVM has been specially tuned to reduce application startup time and memory footprint, making it particularly well suited for client environments. On all platforms, the HotSpot Client JVM is the default.

The Java HotSpot Server VM is similar to the HotSpot Client JVM except that it has been specially tuned to maximize peak operating speed. It is intended for long-running server applications, for which the fastest possible operating speed is generally more important than having the fastest startup time. To invoke the HotSpot Server JVM instead of the default HotSpot Client JVM, use the -server parameter; for example,
  • java -server MyApp
In [7], the authors mention a third HotSpot VM runtime named tiered.   If you are using Java 6 Update 25, Java 7, or later, you may consider using the tiered server runtime as a replacement for the client runtime.   For more details, read [8,10].

32-Bit or 64-Bit VM


The 32-bit JVM is the default for the HotSpot VM. The choice of using a 32-bit or 64-bit JVM is dictated by the memory footprint required by the application along with whether any third-party software used in the application supports 64-bit JVMs and if there are any native components in the Java application. All native components using the Java Native Interface (JNI) in a 64-bit JVM must be compiled in 64-bit mode.

Running 64-bit VM has the following advantages[7]:
  • Larger address space
  • Better performance in two fronts
    • 64-bit JVMs can make use of additional CPU registers
    • Help avoid register spilling
      • Register spilling occurs where there is more live state (i.e. variables) in the application than the CPU has registers. 
and one disadvantage:
  • Increased width for oops
    • Results in fewer oops being available on a CPU cache line and as a result decreases CPU cache efficiency. 
    • This negative performance impact can be mitigated by setting:
      • -XX:+UseCompressedOops VM command line option
Note that client runtimes are not available in 64-bit HotSpot VMs.   See [11] for more details.

GC Strategy


JVM performance is often measured by the effectiveness of its GC.  Garbage collection (GC) reclaims heap space previously allocated to objects that are no longer needed. The process of locating and removing those dead objects can stall your Java application and consume as much as 25 percent of throughput.

The Java HotSpot virtual machine includes five garbage collectors.[27] All the collectors are generational.
  • Serial Collector
    • Both young and old collections are done serially (using a single CPU), in a stop-the-world fashion.
    • The old and permanent generations are collected via a mark-sweep-compact collection algorithm. 
      • The sweep phase “sweeps” over the generations, identifying garbage. The collector then performs sliding compaction, sliding the live objects towards the beginning of the old generation space (and similarly for the permanent generation), leaving any free space in a single contiguous chunk at the opposite end.
    • When to use
      • For most applications that are run on client-style machines and that do not have a requirement for low pause times
    • How to select
      • In the J2SE 5.0 and above, the serial collector is automatically chosen as the default garbage collector on machines that are not server-class machines. On other machines, the serial collector can be explicitly requested by using the -XX:+UseSerialGC command line option.
  • Parallel Collector (or throughput collector)
    • Young generation collection
      • Uses a parallel version of the young generation collection algorithm utilized by the serial collector
      • It is still a stop-the-world and copying collector, but performing the young generation collection in parallel, using many CPUs, decreases garbage collection overhead and hence increases application throughput.
    • Old generation collection
      • Uses the same serial mark-sweep-compact collection algorithm as the serial collector
    • When to use
      • For applications run on machines with more than one CPU and do not have pause time constraints, since infrequent, but potentially long, old generation collections will still occur. 
      • Examples of applications for which the parallel collector is often appropriate include those that do batch processing, billing, payroll, scientific computing, and so on.
    • How to select
      • In the J2SE 5.0 and above, the parallel collector is automatically chosen as the default garbage collector on server-class machines. On other machines, the parallel collector can be explicitly requested by using the -XX:+UseParallelGC command line option.
  • Parallel Compacting Collector
      • Young generation collection
        • Use the same algorithm as that for young generation collection using the parallel collector.
      • Old generation collection
        • The old and permanent generations are collected in a stop-the-world, mostly parallel fashion with sliding compaction
        • The collector utilizes three phases (see [5] for more details):
          • Marking phase
          • Summary phase
          • Compaction phase
      • When to use
        • For applications that are run on machines with more than one CPU and applications that have pause time constraints.
      • How to select
        • If you want the parallel compacting collector to be used, you must select it by specifying the command line option -XX:+UseParallelOldGC.
    • Concurrent Mark-Sweep (CMS) Collector[5,6]
        • Young generation collection
          • The CMS collector collects the young generation in the same manner as the parallel collector. 
        • Old generation collection
          • Most of the collection of the old generation using the CMS collector is done concurrently with the execution of the application. 
          • The CMS collector is the only collector that is non-compacting. That is, after it frees the space that was occupied by dead objects, it does not move the live objects to one end of the old generation. 
            • To minimize the risk of fragmentation, CMS does statistical analysis of object sizes and keeps separate free lists for objects of different sizes.
        • When to use
          • For applications that need shorter garbage collection pauses and can afford to share processor resources with the garbage collector while the application is running. (Due to its concurrency, the CMS collector takes CPU cycles away from the application during a collection cycle.)
          • Typically, applications that have a relatively large set of long-lived data (a large old generation), and that run on machines with two or more processors, tend to benefit from the use of this collector. 
          • Compared to the parallel collector, the CMS collector decreases old generation pauses—sometimes dramatically—at the expense of slightly longer young generation pauses, some reduction in throughput, and extra heap size requirements.
        • How to select
          • If you want the CMS collector to be used, you must explicitly select it by specifying the command line option -XX:+UseConcMarkSweepGC.  
          • If you want it to be run in incremental mode, also enable that mode via the  –XX:+CMSIncrementalMode option.
            • This feature is useful when applications that need the low pause times provided by the concurrent collector are run on machines with small numbers of processors (e.g., 1 or 2).
          • ParNewGC is the parallel young generation collector for use with CMS.  To choose it, you can specify the command line option  ‑XX:+UseParNewGC.
      • Garbage-First (G1) Garbage Collector[31]
        • Summary
          • Differs from CMS in the following ways
            • Compacting
              • Reduce fragmentation and is good for long-running applications 
            • Heap is split into regions
              • Easy to allocate and resize
          • Evacuation pauses 
            • For both young and old regions
        • Young generation collection
          • During a young GC, survivors from the young regions are evacuated to either survivor regions or old regions
            • Done with stop-the-world evacuation pauses
            • Performed in parallel
        • Old generation collection
          • Some garbage objects in regions with very high live ratio may be left in the heap and be collected later
          • Concurrent marking phase
            • Calculates liveness information per region
              • Empty regions can be reclaimed immediately
              • Identifies best regions for subsequent evacuation pauses
            • Remark is done with one stop-the-world pause while initial mark is piggybacked on an evacuation pause
            • No corresponding sweeping phase
            • Different marking algorithm than CMS
          • Old regions are reclaimed by
            • Evacuation pauses
              • Using compaction
              • Where most reclamation is done
            • Remark (when totally empty)
        • When to use
          • The G1 collector is a server-style garbage collector, targeted for multi-processor machines with large memories. It meets garbage collection (GC) pause time goals with high probability, while achieving high throughput. 
        • How to select
          • If you want the G1 garbage collector to be used, you must explicitly select it by specifying the command line option -XX:+UseG1GC.  
      Note that the difference between:
      • -XX:+UseParallelOldGC
      and
      • -XX:+UseParallelGC
      is that -XX:+UseParallelOldGC enables both a multithreaded young generation garbage collector and a multithreaded old generation garbage collector, that is, both minor garbage collections and full garbage collections are multithreaded. -XX:+UseParallelGC enables only a multithreaded young generation garbage collector. The old generation garbage collector used with -XX:+UseParallelGC is single threaded. 

      Using -XX:+UseParallelOldGC also automatically enables -XX:+UseParallelGC. Hence, if you want to use both a multithreaded young generation garbage collector and a multithreaded old generation garbage collector, you need only specify -XX:+UseParallelOldGC.

      Note that the above distinction between -XX:+UseParallelOldGC and -XX:+UseParallelGC is no longer true in JDK 7.  In JDK 7, the following three settings are equivalent:

      • Default
      • -XX:+UseParallelGC
      • -XX:+UseParallelOldGC
      They all use multithreaded collectors for both young generation and old generation.
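One way to verify which collectors a set of flags actually selects is to list the garbage collector MXBeans at runtime. A minimal sketch (the bean names shown in the comment are what HotSpot conventionally reports; they vary by GC strategy and JDK version):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Lists the active collectors with their collection counts and times.
// On HotSpot the names differ per strategy, e.g. "PS Scavenge" /
// "PS MarkSweep" with the parallel collectors, or "G1 Young Generation"
// / "G1 Old Generation" when -XX:+UseG1GC is set.
public class WhichGC {
    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: collections=%d time=%dms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```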

      Heap Sizing


      If a heap size is small, collection will be fast but the heap will fill up more quickly, thus requiring more frequent collections. Conversely, a large heap will take longer to fill up and thus collections will be less frequent, but they may take longer.

      Command line parameters that divide the heap between new and old generations usually cause the greatest performance impact.  If you increase the new generation's size, you often improve the overall throughput; however, you also increase footprint, which may slow down servers with limited memory.

      For more details, you can read [12-15].
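When experimenting with heap sizes, it helps to confirm what the JVM actually got. A minimal sketch using the standard Runtime API (the class name is ours):

```java
// Reports the maximum, committed, and used heap sizes so that
// -Xms/-Xmx experiments can be verified from inside the application.
public class HeapInfo {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024 * 1024;
        System.out.printf("max=%dMB total=%dMB free=%dMB used=%dMB%n",
                rt.maxMemory() / mb,
                rt.totalMemory() / mb,
                rt.freeMemory() / mb,
                (rt.totalMemory() - rt.freeMemory()) / mb);
    }
}
```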

      Further Tuning


      HotSpot's default parameters are effective for most small applications that require fast startup and a small footprint.  However, more often than not, you will find the default settings are not good enough and will need to tune your Java applications further.  As shown in [16], there are many VM options exposed that can be tuned by brave souls.  I'm not going to discuss such tunings in this article, but I'll keep posting articles with VM tunings on this blog.  Stay tuned!

      References

      1. Java Performance Tips
      2. Pick up performance with generational garbage collection
      3. Java HotSpot™ Virtual Machine Performance Enhancements
      4. Oracle JRockit
      5. Memory Management in the Java HotSpot™ Virtual Machine
      6. Understanding GC pauses in JVM, HotSpot's CMS collector
      7. Java Performance by Charlie Hunt and Binu John
      8. Performance Tuning with Hotspot VM Option: -XX:+TieredCompilation
      9. Java Tuning White Paper
      10. A Case Study of Using Tiered Compilation in HotSpot
      11. HotSpot VM Binaries: 32-Bit vs. 64-Bit
      12. HotSpot Performance Option — SurvivorRatio
      13. A Case Study of java.lang.OutOfMemoryError: GC overhead limit exceeded
      14. Understanding Garbage Collection
      15. Diagnosing Java.lang.OutOfMemoryError
      16. What Are the Default HotSpot JVM Values?
      17. Understanding Garbage Collector Output of Hotspot VM
      18. On Stack Replacement in HotSpot JVM
      19. Professional Oracle WebLogic Server by Robert Patrick, Gregory Nyberg, and Philip Aston
      20. Sun Performance and Tuning: Java and the Internet by Adrian Cockroft and Richard Pettit
      21. Concurrent Programming in Java: Design Principles and Patterns by Doug Lea
      22. Capacity Planning for Web Performance: Metrics, Models, and Methods by Daniel A. Menascé and Virgilio A.F. Almeida
      23. Java Performance Tuning (Michael Finocchiaro)
      24. Diagnosing Heap Stress in HotSpot
      25. Introduction to HotSpot JVM Performance and Tuning
      26. Tuning the JVM (video)
        • Frequency of Minor GC dictated by:
          • Application object allocation rate
          • Size of Eden
        • Frequency of object promotion dictated by:
          • Frequency of minor GCs (tenuring)
          • Size of survivor spaces
        • Full GC Frequency dictated by
          • Promotion rate
          • Size of old generation
      27. JEP 173: Retire Some Rarely-Used GC Combinations
      28. G1 GC Glossary of Terms
      29. Learn More About Performance Improvements in JDK 8 
      30. Java SE HotSpot at a Glance
      31. Garbage-First Garbage Collector (JDK 8 HotSpot Virtual Machine Garbage Collection Tuning Guide)
      32. Tuning that was great in old JRockit versions might not be so good anymore
        • Trying to bring over each and every tuning option from a JR configuration to an HS one is probably a bad idea.
        • Even when moving between major versions of the same JVM, we usually recommend going back to the default (just pick a collector and heap size) and then redoing any tuning work from scratch (if even necessary).

      Wednesday, September 12, 2012

      A Case Study of Using Tiered Compilation in HotSpot

      HotSpot has two JITs named c1 (i.e., the client JIT) and c2 (i.e., the server JIT).[1] The client JIT starts fast but provides fewer optimizations, so it is used for GUI applications. The server JIT starts more slowly but provides very good optimizations. The idea of tiered compilation [2] is to get the best of both compilers: first JIT the code with c1, and then, if the code is really hot, recompile it with c2.

      The tiered server runtime is enabled with the following Hotspot VM options:
      • -server -XX:+TieredCompilation
      In this article, we show a case study of using HotSpot (build 23.0-b18, mixed mode) with TieredCompilation off/on.

      Comparison

      You can tune the JVM's performance by tuning either memory management or code generation.  Here we have tuned the code generator by turning tiered compilation on.  With tiered compilation on, more classes are compiled (or compiled more efficiently) and so they execute faster (i.e., +15%).

      Using ATG CRM Demo benchmark, we saw the following KPI changes:



      KPI                      -XX:-TieredCompilation   -XX:+TieredCompilation   % Change
      Total Footprint          4431MB                   4686MB                   -5.4%
      Application Server CPU   23%                      20%                      +15%
      Average Response Time    0.234                    0.217                    +7.8%

      Notes


      1. Here is our setting for ReservedCodeCacheSize[2]:
        -TieredCompilation: 128MB
        +TieredCompilation: 256MB
      2. When we turned tiered compilation on, we also reserved a larger code cache for it (i.e., 256 MB vs. 128 MB).  The code cache is allocated out of native memory (vs. the heap).  The total footprint shown in the table includes native memory, so we see a larger total footprint when tiered compilation is turned on.
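The JIT's activity can also be observed from inside the JVM. A minimal sketch that reports total compilation time and code cache occupancy (the pool name "Code Cache" is what pre-segmented HotSpot builds report; newer JDKs split it into several "CodeHeap" pools, so we match on "Code" here):

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

// Shows how much time the JIT has spent compiling and how much of the
// reserved code cache (cf. ReservedCodeCacheSize) is in use.
public class JitProbe {
    public static void main(String[] args) {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit != null && jit.isCompilationTimeMonitoringSupported()) {
            System.out.printf("%s: total compile time=%dms%n",
                    jit.getName(), jit.getTotalCompilationTime());
        }
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getName().contains("Code")) {
                System.out.printf("%s: used=%dK committed=%dK%n",
                        pool.getName(),
                        pool.getUsage().getUsed() / 1024,
                        pool.getUsage().getCommitted() / 1024);
            }
        }
    }
}
```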

      References

      1. HotSpot Glossary of Terms
      2. Performance Tuning with Hotspot VM Option: -XX:+TieredCompilation (Xml and More)

      Friday, August 31, 2012

      Which JVM?

      When you report java.lang.OutOfMemoryError: Java heap space, you need to tell people which JVM has thrown the exception.  Sometimes this can be obvious and sometimes it isn't (see below).

      An OutOfMemoryError does not necessarily imply a memory leak. The issue might simply be a configuration issue, for example if the specified heap size (or the default size if not specified) is insufficient for the application[1].  In that case, you need to figure out which configuration file to modify.

      jps Command

      You can use jps (Java Virtual Machine Process Status Tool) to list the local VM identifiers of the instrumented HotSpot Java Virtual Machines (JVMs) on the target Linux system.  For example, on my system, it lists:

      $ jps 
      17291 Server
      17468 Jps
      7883 Launcher

      This list includes one Integrated WebLogic Server (i.e., Server) and one JDeveloper (i.e., Launcher). Note that JDeveloper and Integrated WLS use their own JVMs. So, if you're running Integrated WLS from JDeveloper, you need to know which JVM has thrown the OutOfMemoryError exception.
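From inside a running application you can also identify the JVM yourself. A minimal sketch (the class name is ours; on HotSpot, RuntimeMXBean.getName() conventionally returns "pid@hostname", which matches the ids printed by jps):

```java
import java.lang.management.ManagementFactory;

// Prints identifiers that help match an OutOfMemoryError or a stack
// trace to a specific JVM process and implementation.
public class WhoAmI {
    public static void main(String[] args) {
        System.out.println("vm   = " + System.getProperty("java.vm.name")
                + " " + System.getProperty("java.vm.version"));
        System.out.println("name = " + ManagementFactory.getRuntimeMXBean().getName());
    }
}
```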

      Configuration Files


      Each application uses its own configuration file to start its JVM and you need to know where to find it.

      For example, to change heap space sizes, you can modify those parameters in:
      • JDeveloper
        • $JDEV_HOME/jdev/bin/jdev.conf
        • or 'which jdev' and find its containing directory; jdev.conf should be located in the same directory 
        • Note that jdev.conf includes ide.conf which contains the following configurations:
          • AddVMOption  -Xmx640M
          • AddVMOption  -Xms128M
      • WebLogic Server
        • You have two options:
          • Modify MEM_ARGS in the setDomainEnv.sh
          • $setenv USER_MEM_ARGS "-Xms512m -Xmx1024m -XX:MaxPermSize=1024m -XX:CompileThreshold=8000"
            • This will override the settings from WLS scripts as shown below:
              • if [ "${USER_MEM_ARGS}" != "" ] ; then
                MEM_ARGS="${USER_MEM_ARGS}"
                export MEM_ARGS
                fi
        • If this is Integrated WLS, your script folder is located under:
          • ../system11.1.1.6.38.61.92/DefaultDomain/bin
          • On JDev 12c, the heap size is set by the following line (which is embedded in setStartupEnv.sh):[5]
            • set SERVER_MEM_ARGS_64=-Xms1024m -Xmx3072m
              • You can trace the settings starting from startWebLogic.sh, to setDomainEnv.sh, to setSOADomainEnv.sh, and finally to setStartupEnv.sh.
      • Eclipse
        • /eclipse.ini

      Project Properties


      Here is another case which requires the change of project properties in JDeveloper:
      • Scenario:
        • JDeveloper is used to deploy "server-setup-seed-deploy-test" target with "Run Ant Target" and it threw "java.lang.OutOfMemoryError: PermGen space" exception.[2]
      • Solution:
        • In Jdev window, right click the project file >  Project Properties > Ant > Process > Java Options:
          • -Xms512m -Xmx1024m -XX:PermSize=512m -XX:MaxPermSize=1024m
        • Note that a new JVM is launched for the Ant process and it uses the above settings.


      References

      Saturday, August 25, 2012

      Understanding JVM Thread States

      To investigate CPU issues in Java applications, one approach is to diagnose monitor locks and thread activities[1].

      In this article, we will show you:
      • How to generate thread dumps on different VMs
      • What to know about thread states

      JVM Thread States


      A thread can be in only one state at a given point in time. These states are virtual-machine states; they do not reflect any operating system thread states. As an example, the following shows the distribution of thread states on a HotSpot VM:

      $ cat thread.tmp | grep "java.lang.Thread.State" | sort | uniq -c
            3    java.lang.Thread.State: BLOCKED (on object monitor)
           18    java.lang.Thread.State: RUNNABLE
            6    java.lang.Thread.State: TIMED_WAITING (on object monitor)
            2    java.lang.Thread.State: TIMED_WAITING (sleeping)
           13    java.lang.Thread.State: WAITING (on object monitor)
            3    java.lang.Thread.State: WAITING (parking)

      Below we describe some of the thread states that can be found in a thread dump:
      • NEW - this state represents a new thread which has not yet started.
      • RUNNABLE - this state represents a thread which is executing in the underlying JVM. Executing in the JVM doesn't mean the thread is always executing in the OS as well - it may be waiting for a resource from the operating system, such as the processor, while in this state.
      • BLOCKED (on object monitor) - this state represents a thread which has been blocked and is waiting for a monitor to enter (or re-enter) a synchronized block/method. A thread gets into this state while contending for a monitor held by another thread, for example when re-entering a synchronized block after Object.wait.
      • WAITING - this state represents a thread in the waiting state; the wait is over only when some other thread performs an appropriate action. A thread can get into this state by calling Object.wait (without timeout), Thread.join (without timeout), or LockSupport.park.
      • TIMED_WAITING - this state represents a thread which is required to wait at most for a specified time limit. A thread can get into this state by calling any of these methods: Thread.sleep, Object.wait (with timeout), Thread.join (with timeout), LockSupport.parkNanos, or LockSupport.parkUntil.
      • TERMINATED - this state represents a thread which has completed its execution, either by returning from the run() method or by throwing an exception which propagated from the run() method and hence terminated the thread.
      • WAITING (parking) - this means a wait state after being parked. A thread can suspend its execution until a permit is available (or the thread is interrupted, or a timeout expires, etc.) by calling park(). You can give a permit to a thread by calling unpark(). When the permit is available, the parked thread consumes it and exits the park() method. Unlike Semaphore's permits, LockSupport's permits are associated with threads (i.e., a permit is given to a particular thread) and do not accumulate (i.e., there can be at most one permit per thread; when the thread consumes the permit, it disappears).
      Notes:

      1. It's hard to investigate any CPU issue just from one thread dump. So, you need to prepare a series of thread dumps for investigation. For an example of analyzing hanging problems, see [9].
      2. The above thread state information is for HotSpot.  To understand JRockit's thread dump contents, read [4].
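The states above can also be observed directly with Thread.getState(). A minimal sketch (the class and thread below are ours) that drives one thread through NEW, TIMED_WAITING, and TERMINATED:

```java
// Drives one thread through NEW -> TIMED_WAITING -> TERMINATED and
// observes each state from the main thread via Thread.getState().
public class StateDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread sleeper = new Thread(new Runnable() {
            public void run() {
                try {
                    Thread.sleep(500);   // TIMED_WAITING (sleeping)
                } catch (InterruptedException ignored) { }
            }
        });
        System.out.println(sleeper.getState());   // NEW
        sleeper.start();
        Thread.sleep(100);                        // let it reach sleep()
        System.out.println(sleeper.getState());   // typically TIMED_WAITING
        sleeper.join();
        System.out.println(sleeper.getState());   // TERMINATED
    }
}
```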

      Generating Thread Dumps in HotSpot

      $jcmd 15679 Thread.print >thread.tmp

      where 15679 is the process ID. For example, it will print the following messages:

      15679:
      2012-08-24 11:27:27
      Full thread dump Java HotSpot(TM) 64-Bit Server VM (23.0-b21-internal mixed mode):
      
      "Attach Listener" daemon prio=10 tid=0x000000000528e000 nid=0x4653 waiting on condition [0x0000000000000000]
         java.lang.Thread.State: RUNNABLE
      
      "Thread-29" daemon prio=10 tid=0x00002aaab9447000 nid=0x3fe3 runnable [0x000000004408b000]
         java.lang.Thread.State: RUNNABLE
              at java.net.SocketInputStream.socketRead0(Native Method)
              at java.net.SocketInputStream.read(SocketInputStream.java:150)
              at java.net.SocketInputStream.read(SocketInputStream.java:121)
              at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
              at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
              at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
              - locked <0x00000000f5ae4738> (a java.io.BufferedInputStream)
              at com.sun.jndi.ldap.Connection.run(Connection.java:849)
              at java.lang.Thread.run(Thread.java:722)
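
      Besides jcmd, the same information can be captured in-process through the standard java.lang.management API. A minimal sketch (the class name InProcessDump is our own choice):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class InProcessDump {
    public static void main(String[] args) {
        ThreadMXBean mxBean = ManagementFactory.getThreadMXBean();
        // dumpAllThreads(lockedMonitors, lockedSynchronizers) also reports the
        // "- locked <0x...>" style ownership information shown above
        for (ThreadInfo info : mxBean.dumpAllThreads(true, true)) {
            System.out.print(info);  // toString() includes name, state, and stack
        }
    }
}
```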

      Generating Thread Dumps in JRockit[4]

      $jrcmd 1286 print_threads
      where 1286 is the process ID. For example, it prints messages like the following:

      1286:
      
      ===== FULL THREAD DUMP ===============
      Thu Aug 23 17:36:30 2012
      Oracle JRockit(R) R28.1.3-11-141760-1.6.0_24-20110301-1432-linux-x86_64
      
      "Main Thread" id=1 idx=0x4 tid=1287 prio=5 alive, waiting, native_blocked
          -- Waiting for notification on: weblogic/t3/srvr/T3Srvr@0xe48b4598[fat lock]
          at jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native Method)
          at java/lang/Object.wait(J)V(Native Method)
          at java/lang/Object.wait(Object.java:485)
          at weblogic/t3/srvr/T3Srvr.waitForDeath(T3Srvr.java:981)
          ^-- Lock released while waiting: weblogic/t3/srvr/T3Srvr@0xe48b4598[fat lock]
          at weblogic/t3/srvr/T3Srvr.run(T3Srvr.java:490)
          at weblogic/Server.main(Server.java:71)
          at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
          -- end of trace
      
      "(Signal Handler)" id=2 idx=0x8 tid=1288 prio=5 alive, native_blocked, daemon
      
      "(OC Main Thread)" id=3 idx=0xc tid=1289 prio=5 alive, native_waiting, daemon
      ...

      References

      1. Understanding Threads and Locks
      2. Thread States Diagram
      3. Useful tool: jrcmd
      4. Using Thread Dumps
      5. Understanding a Java thread dump
      6. Fun with JStack 
      7. Java 2 Platform, Standard Edition 5.0 "Troubleshooting and Diagnostic Guide"
      8. HotSpot VM Performance Tuning Tips
      9. Analyze Hanging Programs Using Java Thread Traces (XML and More)
      10. Analyzing Thread Dumps in Middleware - Part 1
      11. Analyzing Thread Dumps in Middleware - Part 2

      Saturday, March 31, 2012

      A Case Study of java.lang.OutOfMemoryError: GC overhead limit exceeded

      When you see the following exception:
      • java.lang.OutOfMemoryError: GC overhead limit exceeded
      it means the garbage collector is taking an excessive amount of time while recovering very little memory in each run.

      In this article, we will walk through a real example using the trace records from the garbage collector's output. To learn more about the garbage collector's output, read [1] first.

      Correlating Timestamps between server.log and gc.log

      Two log files generated from Hotspot VM and WebLogic Server are used in this study:
      • server.log
      • gc.log
      Note that the names and locations of the log files are configurable and may differ in your environment. By default, they are generated in the log directory of the WLS domain.

      server.log

      In the server log file, we have spotted OutOfMemoryError at 5:21:40 AM as shown below:
        <Mar 30, 2012 5:21:40 AM PDT> <Error> 
        <Kernel> <BEA-000802> <ExecuteRequest failed
      
        java.lang.OutOfMemoryError: GC overhead limit exceeded.

      We have also noted that the first timestamp in the server log file is 4:13:56 AM:

        <Mar 30, 2012 4:13:56 AM PDT> <Info> <Security> <BEA-090905> <Disabling CryptoJ

      Because the gc log file uses elapsed time (in seconds) since the server started, we know this out-of-memory incident happened 4064 seconds after startup:
      • 5:21:40 - 4:13:56 = 4064 secs.
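      The timestamp arithmetic above can be reproduced with java.time (a small sketch; the class name GcOffset is our own choice):

```java
import java.time.Duration;
import java.time.LocalTime;

public class GcOffset {
    public static void main(String[] args) {
        LocalTime serverStart = LocalTime.of(4, 13, 56);  // first server.log entry
        LocalTime oomTime     = LocalTime.of(5, 21, 40);  // OutOfMemoryError entry
        long elapsedSecs = Duration.between(serverStart, oomTime).getSeconds();
        System.out.println(elapsedSecs + " secs");        // prints "4064 secs"
    }
}
```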
      gc.log

      Searching for the Full GC events, we have found that something went wrong about 3167 seconds after the server started.

      3167.301: [Full GC3168.441:
        [SoftReference, 786 refs, 0.0001560 secs]3168.442:
        [WeakReference, 25548 refs, 0.0041550 secs]3168.446:
        [FinalReference, 7977 refs, 0.0024050 secs]3168.448:
        [PhantomReference, 40 refs, 0.0000070 secs]3168.448:
        [JNI Weak Reference, 0.0000080 secs]
        [PSYoungGen: 100268K->23067K(349568K)]
        [ParOldGen: 2713398K->2752509K(2752512K)]
        2813666K->2775576K(3102080K)
        [PSPermGen: 210744K->210744K(393216K)], 2.8599650 secs]
        [Times: user=18.52 sys=0.03, real=2.85 secs]

      3170.809: [Full GC3171.973:
        [SoftReference, 991 refs, 0.0002140 secs]3171.974:
        [WeakReference, 26280 refs, 0.0041760 secs]3171.978:
        [FinalReference, 12136 refs, 0.0142580 secs]3171.992:
        [PhantomReference, 34 refs, 0.0000100 secs]3171.992:
        [JNI Weak Reference, 0.0000080 secs]
        [PSYoungGen: 197915K->32131K(349568K)]
        [ParOldGen: 2752509K->2752511K(2752512K)]
        2950424K->2784643K(3102080K)
        [PSPermGen: 210744K->210744K(393216K)], 2.7171030 secs]
        [Times: user=17.79 sys=0.04, real=2.72 secs]

      We have noticed that a series of Full GCs happened after the 3167.301-second mark; the output above shows the first two such events. The second Full GC was triggered only about 3.5 seconds after the first. Normally, consecutive Full GCs are interleaved with multiple Minor GCs, but that was not the case here.

      From examining the GC output, we have concluded that:
      • Full GCs happened too frequently (about every 3 to 4 seconds)
      • Old generation space was full and remained full after garbage collection.
      However,  at this time, the exception
      • java.lang.OutOfMemoryError: GC overhead limit exceeded
      was not thrown yet.

      Based on our calculation, we know the exception was thrown about 4064 seconds after the server started. Here is the garbage collector's output around that time:

      4064.456: [Full GC4065.667:
        [SoftReference, 2380 refs, 0.0003730 secs]4065.667:
        [WeakReference, 47627 refs, 0.0091360 secs]4065.677:
        [FinalReference, 3007 refs, 0.0007760 secs]4065.677:
        [PhantomReference, 42 refs, 0.0000070 secs]4065.677:
        [JNI Weak Reference, 0.0000090 secs]
        [PSYoungGen: 174848K->172682K(349568K)]
        [ParOldGen: 2752509K->2752508K(2752512K)]
        2927357K->2925191K(3102080K)
        [PSPermGen: 210598K->210598K(393216K)], 2.3630820 secs]
        [Times: user=16.01 sys=0.05, real=2.37 secs]

      4066.837: [Full GC4068.051:
        [SoftReference, 2355 refs, 0.0003750 secs]4068.051:
        [WeakReference, 43590 refs, 0.0081260 secs]4068.060:
        [FinalReference, 6859 refs, 0.0012270 secs]4068.061:
        [PhantomReference, 42 refs, 0.0000220 secs]4068.061:
        [JNI Weak Reference, 0.0000090 secs]
        [PSYoungGen: 174848K->171709K(349568K)]
        [ParOldGen: 2752508K->2752510K(2752512K)]
        2927356K->2924219K(3102080K)
        [PSPermGen: 210598K->210598K(393216K)], 2.8005630 secs]
        [Times: user=18.47 sys=0.02, real=2.81 secs]

      Heap Size Adjustment

      When a Full GC happens, you need to determine whether it was the occupancy of the old generation space or the occupancy of the permanent generation space that triggered it. In our case, the trigger was not the occupancy of the permanent generation space:
      • [PSPermGen: 210598K->210598K(393216K)]
      From the above, we know the permanent generation capacity was 393216K. After the Full GC, its occupancy remained unchanged at 210598K, well under that capacity.

      However, there was a serious issue with the old generation space:
      • [ParOldGen: 2752508K->2752510K(2752512K)]
      It became full and remained full after space reclamation.
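
      To spot this pattern mechanically across a long gc.log, one can parse the ParOldGen occupancy figures from each Full GC record. A minimal sketch (the class name is our own choice; the regex and sample line follow the log format shown above):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OldGenCheck {
    // Matches occupancy lines such as: [ParOldGen: 2752508K->2752510K(2752512K)]
    static final Pattern OLD_GEN =
            Pattern.compile("\\[ParOldGen: (\\d+)K->(\\d+)K\\((\\d+)K\\)\\]");

    public static void main(String[] args) {
        String line = "[ParOldGen: 2752508K->2752510K(2752512K)]";
        Matcher m = OLD_GEN.matcher(line);
        if (m.find()) {
            long after    = Long.parseLong(m.group(2));  // occupancy after the Full GC
            long capacity = Long.parseLong(m.group(3));  // total old generation space
            // Old gen still nearly 100% full after a Full GC is the signature
            // of the "GC overhead limit exceeded" condition discussed above
            System.out.printf("old gen %.2f%% full after Full GC%n",
                              100.0 * after / capacity);
        }
    }
}
```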

      Conclusion

      If you observe an OutOfMemoryError in the garbage collection logs, try increasing the Java heap size (covering both the young and old generations) or the permanent generation space.  If the issue comes from the old generation, try increasing the Java heap size up to 80% of the physical memory available to the JVM, taking into consideration the memory needed by the OS, the memory needed by other applications running concurrently, and so on.

      Depending on whether the old generation space or the permanent generation space is running out of memory, adjust the heap space sizes as follows:
      • For old generation space OutOfMemoryErrors
        • increase -Xms and -Xmx
      • For permanent generation OutOfMemoryErrors
        • increase -XX:PermSize and -XX:MaxPermSize
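
      After adjusting -Xms and -Xmx, a quick sanity check from inside the application can confirm the new limits took effect. A minimal sketch (the class name HeapCheck is our own choice):

```java
public class HeapCheck {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() reflects -Xmx; totalMemory() is the currently committed heap
        System.out.printf("max heap:  %d MB%n", rt.maxMemory() / (1024 * 1024));
        System.out.printf("committed: %d MB%n", rt.totalMemory() / (1024 * 1024));
    }
}
```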

      References

      1. Understanding Garbage Collector Output of Hotspot VM 
      2. Java Tuning White Paper 
      3. Java 2 Platform, Standard Edition 5.0 "Troubleshooting and Diagnostic Guide"
      4. Which JVM?
