Wednesday, September 4, 2013

How to Troubleshoot High CPU Usage of Java Applications?

A Java application that constantly maxes out the CPU can sometimes be a good thing [3]. For instance, for a batch application that is computationally bound, the best-case scenario is normally for it to complete as soon as possible. Idle CPU can also be a waste and should be avoided. The CPU sits idle when the system:
  • Needs to wait for locks or external resources
    • The application might be blocked on a synchronization primitive and unable to execute until that lock is released
    • The application might be waiting for something, such as a response to come back from a call to the database
  • Has no threads available to handle the work in a multithreaded, multi-CPU case
  • Has nothing to do
Looking at the % CPU utilization is a first step in understanding your application's performance, but it is only a first step. Use it to see whether you are using all the CPU you expect, or whether it points to a synchronization or resource issue.

Normally, some over-provisioning is needed to keep an application responsive. If the CPU usage is very high (e.g., consistently over 95%), you may want to invest in better hardware, or review the data structures and algorithms employed by the application.

In this article, we will show you how to investigate where all those CPU cycles are being spent in your Java applications.

How to Troubleshoot High CPU Usage?


The easiest approach is to generate a sequence of thread dumps to see what's keeping the processor busy. A single thread dump tells you little, because it cannot show which threads stay busy over time, so take several dumps at short intervals.
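As a rough illustration of taking several dumps in a row, the sketch below samples the stack traces of all live threads from inside the JVM using Thread.getAllStackTraces(). The class name ThreadDumpSampler and its methods are hypothetical, and the output is only similar in spirit to a real thread dump: it prints the Java thread id, not the native thread id that an external dump shows.

```java
import java.util.Arrays;
import java.util.Map;

// Hypothetical helper: samples the stack traces of all live threads in this JVM.
class ThreadDumpSampler {

    /** Formats one snapshot of all live threads as text. */
    static String dumpOnce() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" id=").append(t.getId())
              .append(" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    /** Takes up to `count` snapshots, `intervalMillis` apart; stops early if interrupted. */
    static String[] sample(int count, long intervalMillis) {
        String[] dumps = new String[count];
        for (int i = 0; i < count; i++) {
            dumps[i] = dumpOnce();
            if (i < count - 1) {
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return Arrays.copyOf(dumps, i + 1);  // return what we have so far
                }
            }
        }
        return dumps;
    }

    public static void main(String[] args) {
        // Three snapshots, one second apart (both values are arbitrary choices).
        for (String dump : sample(3, 1000)) {
            System.out.println("===== thread dump =====");
            System.out.print(dump);
        }
    }
}
```

Threads that appear in the same hot method across consecutive snapshots are the ones worth a closer look.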

Thread dumps generated at high CPU times are the most useful. To monitor CPU usage, you can use tools like top on Linux [2] or prstat on Solaris [4] to see which threads are consuming the most CPU, and take thread dumps at the same time. Then you can map the thread IDs reported by the tool to the threads in the dump (HotSpot thread dumps list the native thread ID as a hexadecimal nid value). If memory pressure is high, it may turn out that GC is consuming the CPU. In that case, you also need to gather GC logs for further analysis.
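To make the ID mapping concrete: top reports a thread (LWP) ID in decimal, while a HotSpot thread dump shows it as nid=0x... in hexadecimal. The sketch below does that conversion and looks up the matching header line in dump text. The class name NidMapper is hypothetical, and it assumes the HotSpot dump format.

```java
// Hypothetical helper for matching an OS thread id to a thread-dump entry.
class NidMapper {

    /** Converts a decimal LWP id (as shown by top) to the nid=0x... form used in HotSpot dumps. */
    static String toNid(long lwpId) {
        return "nid=0x" + Long.toHexString(lwpId);
    }

    /** Returns the first line of the dump that carries this thread's nid, or null if none does. */
    static String findThread(String threadDump, long lwpId) {
        String nid = toNid(lwpId);
        for (String line : threadDump.split("\n")) {
            int idx = line.indexOf(nid);
            if (idx < 0) continue;
            int end = idx + nid.length();
            // Reject prefix matches such as nid=0x3039 inside nid=0x30390.
            if (end == line.length() || Character.digit(line.charAt(end), 16) < 0) {
                return line;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Usage: java NidMapper <decimal thread id from top>
        long lwpId = Long.parseLong(args[0]);
        System.out.println(toNid(lwpId));
    }
}
```

For example, a thread that top lists as 12345 appears in the dump as nid=0x3039.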

With the Linux top command, entries are sorted by %CPU by default; run it with the -H option to list individual Java threads (Linux LWPs) rather than whole processes. Pressing Shift+F shows a screen specifying the current sort field, where you can choose a different sort field by typing its field letter. For example, select "n" to sort by memory usage (RES).
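A similar per-thread ranking can be approximated from inside the JVM with the standard ThreadMXBean API. The sketch below samples per-thread CPU time twice and ranks threads by the difference. The class name HotThreads and its helper methods are hypothetical, and the approach only covers Java threads that the bean reports (not, for example, native GC threads), so treat it as a complement to top rather than a replacement.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: ranks Java threads by CPU time consumed during an interval.
class HotThreads {

    /** Maps Java thread id -> cumulative CPU nanoseconds; empty if the JVM cannot measure it. */
    static Map<Long, Long> snapshot(ThreadMXBean mx) {
        Map<Long, Long> cpuById = new HashMap<>();
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return cpuById;
        }
        for (long id : mx.getAllThreadIds()) {
            long cpu = mx.getThreadCpuTime(id); // -1 if the thread is no longer alive
            if (cpu >= 0) cpuById.put(id, cpu);
        }
        return cpuById;
    }

    /** CPU nanoseconds used between two snapshots, for threads present in both. */
    static Map<Long, Long> cpuDelta(Map<Long, Long> before, Map<Long, Long> after) {
        Map<Long, Long> delta = new HashMap<>();
        for (Map.Entry<Long, Long> e : after.entrySet()) {
            Long start = before.get(e.getKey());
            if (start != null) delta.put(e.getKey(), e.getValue() - start);
        }
        return delta;
    }

    /** Thread ids of the n largest CPU consumers, busiest first. */
    static List<Long> topN(Map<Long, Long> cpuNanosById, int n) {
        List<Map.Entry<Long, Long>> entries = new ArrayList<>(cpuNanosById.entrySet());
        entries.sort((x, y) -> Long.compare(y.getValue(), x.getValue()));
        List<Long> ids = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            ids.add(entries.get(i).getKey());
        }
        return ids;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Long, Long> before = snapshot(mx);
        Thread.sleep(1000); // sampling interval, an arbitrary choice
        Map<Long, Long> delta = cpuDelta(before, snapshot(mx));
        for (long id : topN(delta, 5)) {
            ThreadInfo info = mx.getThreadInfo(id);
            String name = (info == null) ? "(exited)" : info.getThreadName();
            System.out.printf("%-30s id=%d cpu=%d ms%n", name, id, delta.get(id) / 1_000_000);
        }
    }
}
```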

User Time vs. System Time


Some Linux commands (e.g., vmstat) [1] can report CPU time spent in either system or user space. User time (including nice time for vmstat) is the percentage of time the CPU is executing application code (including GC code), while system time is the percentage of time the CPU is executing kernel code.

System time could be related to your application too. For example, if your application performs I/O, the kernel will execute the code to read the file from disk, or write the network buffer, and so on. High levels of system time often mean something is wrong, or the application is making many system calls. Investigating the cause of high system time is always worthwhile.
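To see where such percentages come from on Linux, the sketch below computes the user and system share from two samples of the aggregate "cpu" line in /proc/stat, whose leading counters are user, nice, system, idle, iowait, irq, softirq and steal. The class name CpuTimeSplit is hypothetical, and the calculation is deliberately simplified: user time includes nice time as described above, while system time counts only the "system" counter, so the numbers may differ slightly from what a given tool reports.

```java
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: splits CPU time into user and system percentages (Linux only).
class CpuTimeSplit {

    /** Parses the aggregate "cpu" line of /proc/stat into up to 8 tick counters. */
    static long[] parse(String cpuLine) {
        String[] parts = cpuLine.trim().split("\\s+");
        // parts[0] is the label "cpu"; the counters follow it.
        int n = Math.min(parts.length - 1, 8);
        if (n < 4) throw new IllegalArgumentException("unexpected cpu line: " + cpuLine);
        long[] ticks = new long[n];
        for (int i = 0; i < n; i++) {
            ticks[i] = Long.parseLong(parts[i + 1]);
        }
        return ticks;
    }

    /** Returns {user%, system%} for the interval between two samples. */
    static double[] userAndSystemPercent(String before, String after) {
        long[] a = parse(before);
        long[] b = parse(after);
        int n = Math.min(a.length, b.length);
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += b[i] - a[i];
        }
        if (total <= 0) return new double[] {0.0, 0.0};
        double user = 100.0 * ((b[0] - a[0]) + (b[1] - a[1])) / total; // user + nice
        double system = 100.0 * (b[2] - a[2]) / total;                 // kernel mode only
        return new double[] {user, system};
    }

    public static void main(String[] args) throws Exception {
        String before = Files.readAllLines(Paths.get("/proc/stat")).get(0);
        Thread.sleep(1000); // sampling interval, an arbitrary choice
        String after = Files.readAllLines(Paths.get("/proc/stat")).get(0);
        double[] pct = userAndSystemPercent(before, after);
        System.out.printf("user=%.1f%% system=%.1f%%%n", pct[0], pct[1]);
    }
}
```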

CPU Tuning


The goal in performance is always to drive the CPU usage as high as possible (for as short a time as possible). The CPU number is an indication of how effectively the program is using the expensive CPU, and so the higher the number the better. As previously mentioned, in some CPU-bound applications (i.e., the CPU is the limiting factor), for example batch jobs, it is normally a good thing for the system to be completely saturated during the run. However, for a standard server-side application it is probably more beneficial if the system is able to handle some extra load in addition to the expected one.

Based on [8], Oracle has provided the following tuning guidelines (including CPU tuning) for its Fusion Applications:


Metric Category            | Metric Name                                 | Warning Threshold | Critical Threshold | Comments
Disk Activity              | Disk Device Busy                            | >80%              | >95%               |
Filesystems                | Filesystem Space Available                  | <20               | <5                 |
Load                       | CPU in I/O wait                             | >60%              | >80%               |
                           | CPU Utilization                             | >80%              | >95%               |
                           | Run Queue (5 min average)                   | >2                | >4                 | The run queue is normalized by the number of CPU cores.
                           | Swap Utilization                            | >75%              | >90%               |
                           | Total Processes                             | >15000            | >25000             |
                           | Logical Free Memory %                       | <20               | <10                |
                           | CPU in System Mode                          | >20%              | >40%               |
Network Interfaces Summary | All Network Interfaces Combined Utilization | >80%              | >95%               |
Switch/Swap Activity       | Total System Swaps                          | >3                | >5                 | Value is per second.
Paging Activity            | Pages Paged-in (per second)                 |                   |                    |
                           | Pages Paged-out (per second)                |                   |                    | The combined value of Pages Paged-in and Pages Paged-out should be <=1000
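
As a small illustration of how such thresholds might be applied, the sketch below classifies a metric value against a warning and a critical threshold, and normalizes a run-queue value by the number of CPU cores as the table's comment describes. The class ThresholdCheck, its enum and its method names are hypothetical. The main method uses the JVM's load average as a stand-in for the run queue, which is an assumption: the standard API returns a 1-minute average, not the 5-minute average used in the table.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper: compares metric values with warning/critical thresholds.
class ThresholdCheck {

    enum Level { OK, WARNING, CRITICAL }

    /** For metrics where higher is worse, e.g. CPU Utilization (>80% warning, >95% critical). */
    static Level classifyAbove(double value, double warning, double critical) {
        if (value > critical) return Level.CRITICAL;
        if (value > warning) return Level.WARNING;
        return Level.OK;
    }

    /** For metrics where lower is worse, e.g. Logical Free Memory % (<20 warning, <10 critical). */
    static Level classifyBelow(double value, double warning, double critical) {
        if (value < critical) return Level.CRITICAL;
        if (value < warning) return Level.WARNING;
        return Level.OK;
    }

    /** Normalizes a run-queue value by the number of CPU cores. */
    static double normalizedRunQueue(double runQueue, int cores) {
        return runQueue / cores;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        double load = os.getSystemLoadAverage(); // 1-minute average; negative if unavailable
        if (load >= 0) {
            double perCore = normalizedRunQueue(load, os.getAvailableProcessors());
            System.out.printf("load per core = %.2f -> %s%n", perCore, classifyAbove(perCore, 2, 4));
        } else {
            System.out.println("load average not available on this platform");
        }
    }
}
```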


Oracle Performance Tools


To analyze high CPU usage in Java applications, the best approach is to use enterprise profilers. For example, Oracle Solaris Studio [5] can offer more performance details and better measurements. It now runs on the Oracle Solaris, Oracle Linux, and Red Hat Enterprise Linux operating systems.

The Oracle Solaris Studio Performance Analyzer can be extremely useful for identifying bottlenecks and provides advanced profiling for your applications. Its key features include [6]:
  • Low overhead for fast and accurate results
  • Advanced profiling of single-threaded and multithreaded applications
  • Support for multiprocess and system-wide profiling
  • Ability to analyze MPI applications
  • Support for C, C++, Fortran, and Java code
  • Optimized for the latest Oracle systems
If your applications run on JRockit, another good way to profile CPU usage is to capture JRockit Flight Recordings [7]. JFR can provide an extremely detailed level of profiling with little impact on your application's performance. If you use HotSpot, Java Mission Control and Java Flight Recorder (commercial features) are now available for Java SE 7u40 as well as for JRockit [9].

References
