Monday, July 23, 2012

Troubleshooting System Crashes in WebLogic Server

In this article, we will describe how to troubleshoot system crashes in WebLogic Server step by step:
  1. Enabling core dump if it's not done yet
  2. Determining where the crash occurred

Enabling Core Dump

In my test, a fatal error was detected by the Java Runtime Environment, and it tried to write a core dump.  However, it could not:

# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again


You can find the above message in the server log file (in <domain loc>/logs). In most Linux distributions, core file creation is disabled by default for a normal user. To enable writing core files, use the ulimit command, which controls the resources available to processes started by the shell. For example, I've added the following line at the top of my start script:

#!/bin/ksh
ulimit -c unlimited
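
Before restarting the server, you may want to confirm that the limit really is in effect for the shell that launches WebLogic and for the running JVM. A minimal sketch for Linux; the pid used here is only an illustration:

# soft and hard core-file limits in the current shell
ulimit -Sc
ulimit -Hc

# limits of an already running JVM, via /proc (pid is illustrative)
grep "Max core file size" /proc/18181/limits

# kernel pattern that decides where and how core files are named
cat /proc/sys/kernel/core_pattern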


Determining Where the Crash Occurred

Basically, an hs_err file is what you get when something goes wrong during a local Java session, that is, when using a Java Virtual Machine (JVM) locally. The HotSpot VM is the default JVM for the Java SE platform, hence the name of hs_err files: HotSpot Error log. For example, I have found this file in the WLS domain folder:
  • hs_err_pid18181.log
The error log is a text file consisting of the following sections:
  • A header that provides a brief description of the crash
  • A section with thread information
  • A section with process information
  • A section with system information

It contains information obtained at the time of the fatal error.  For detailed descriptions of the above sections, read [4].
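
By default, the hs_err file is written to the working directory of the crashing process. If you prefer a fixed location, HotSpot also accepts an -XX:ErrorFile option, where %p expands to the process id. A hedged example of adding it to a WebLogic start script (the JAVA_OPTIONS variable and the path are assumptions for illustration):

JAVA_OPTIONS="${JAVA_OPTIONS} -XX:ErrorFile=/var/log/wls/hs_err_pid%p.log"
export JAVA_OPTIONS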

A Case Study

From the error log header, you can tell what the problematic frame is.  For example, this is what I have seen in the error log:

# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/HXXXXX/workspace/jdk7u4-2-build-linux-amd64-product/jdk7u4/hotspot/src/share/vm/oops/methodOop.cpp:498), pid=19941, tid=1098848576
#  assert(Klass::cast(k)->is_subclass_of(SystemDictionary::Throwable_klass())) failed: invalid exception class
#
# JRE version: 7.0_04-bXX
# Java VM: Java HotSpot(TM) 64-Bit Server VM (23.0-b21-fastdebug mixed mode linux-amd64 compressed oops)
# Core dump written. Default location: /data/ATG/MXXXXX/user_projects/domains/atgdomain/core or core.19941
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.sun.com/bugreport/crash.jsp
#


---------------  T H R E A D  ---------------

Current thread (0x00002aaac0871000):  JavaThread "[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon [_thread_in_vm, id=19995, stack(0x00000000416f1000,0x00000000417f2000)]

Stack: [0x00000000416f1000,0x00000000417f2000],  sp=0x00000000417ed4d0,  free space=1009k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0xc6f6d2]  VMError::report_and_die()+0x2f2
V  [libjvm.so+0x5dfa44]  report_vm_error(char const*, int, char const*, char const*)+0x84
V  [libjvm.so+0x9fd8c8]  methodOopDesc::resolved_checked_exceptions_impl(methodOopDesc*, Thread*)+0x1c8
V  [libjvm.so+0xb14eb4]  Reflection::get_exception_types(methodHandle, Thread*)+0x24
V  [libjvm.so+0xb19741]  Reflection::new_constructor(methodHandle, Thread*)+0x161
V  [libjvm.so+0x8524f6]  JVM_GetClassDeclaredConstructors+0x626
J  java.lang.Class.getDeclaredConstructors0(Z)[Ljava/lang/reflect/Constructor;

In this case, an Internal Error occurred in a thread executing in the library libjvm.so. From the Thread section, we know a JavaThread failed while in the _thread_in_vm state (meaning that it was executing in Java VM code).
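
Since the header above says that a core dump was written, you can also open it with a native debugger to confirm the same native stack. A rough sketch, assuming gdb is available and using the core file name from the header:

# open the core with the same java binary that produced it
gdb $JAVA_HOME/bin/java /data/ATG/MXXXXX/user_projects/domains/atgdomain/core.19941

# at the gdb prompt, print the native back trace of the crashing thread
(gdb) bt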

Thread Information[2]

The first part of the thread section shows the thread that provoked the fatal error, as follows.
Current thread (0x00002aaac0871000):  JavaThread "[AC...]" [_thread_in_native, id=21139]
                    |                   |         |            |                +-- ID
                    |                   |         |            +---------------- state
                    |                   |         +------------------------------ name
                    |                   +---------------------------------------- type
                    +--------------------------------------------------------- pointer

The thread pointer is the pointer to the Java VM internal thread structure. It is generally of no interest unless you are debugging a live Java VM or core file. The following list shows possible thread types.
  • JavaThread
  • VMThread
  • CompilerThread
  • GCTaskThread
  • WatcherThread
  • ConcurrentMarkSweepThread
The following table shows the important thread states.

Thread State            Description
_thread_uninitialized   Thread is not created. This occurs only in the case of memory corruption.
_thread_new             Thread has been created but it has not yet started.
_thread_in_native       Thread is running native code. The error is probably a bug in native code.
_thread_in_vm           Thread is running VM code.
_thread_in_Java         Thread is running either interpreted or compiled Java code.
_thread_blocked         Thread is blocked.
..._trans               If any of the above states is followed by the string _trans, the thread is changing to a different state.

The thread ID in the output is the native thread identifier. If a Java thread is a daemon thread, then the string daemon is printed before the thread state.
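
To pull the failing thread and its state out of a large hs_err file quickly, a simple grep is often enough. For example, using the file from the case above:

$ grep -A 2 "Current thread" hs_err_pid18181.log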

References

  1. Troubleshooting System Crashes
  2. Thread Information
  3. Description of Java hs_err error log files
  4. Description of Fatal Error Log
  5. Monitoring WebLogic Server Thread Pool at Runtime
  6. Understanding JVM Thread States
  7. Understanding WebLogic Incident and the Diagnostic Framework behind It

Thursday, July 19, 2012

Performance Considerations on NUMA Systems

UMA (Uniform Memory Access) and NUMA (Non-Uniform Memory Access) are shared memory architectures used in parallel computers.

In a UMA architecture, all processors share the physical memory uniformly. In this model, access time to a memory location is independent of which processor makes the request or which memory chip contains the transferred data.

In contrast, a system using the NUMA architecture consists of multiple "nodes" of CPUs (processors) and/or memory which are linked together by a special high-speed interconnect. All CPUs have access to all of the memory, but a CPU's access to memory on its local or a nearby node is faster than access to distant nodes. For more explanation, see [1-3].

In this article, we will examine NUMA support in Linux kernels and Application Servers.

Memory Policy

Most modern OS kernels are NUMA aware and support different memory policies.

NUMA policy is concerned with placing memory allocations on specific nodes so that programs can access that memory as fast as possible. The primary way to do this is to allocate memory for a thread on its local node and keep the thread running there (node affinity). This gives the best memory latency and minimizes traffic over the global interconnect.

There are four different memory policies supported in Linux:
  1. interleave
    • Interleave memory allocations on a set of nodes.
  2. preferred
    • Try to allocate on a node first.
  3. membind
    • Allocate on a specific set of nodes.
  4. localalloc (default)
    • Allocate on the local node.
The difference between membind and preferred is that membind fails the allocation when memory cannot be allocated on the specified nodes, whereas preferred falls back to other nodes. Using membind can therefore lead to running out of memory earlier and to delays caused by extra swapping.
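
Each of these policies maps directly to a numactl option, so you can try them out from the command line; the program name myprog is just a placeholder:

numactl --interleave=all myprog     # interleave allocations across all nodes
numactl --preferred=1 myprog        # try node 1 first, fall back to other nodes
numactl --membind=0,1 myprog        # allocate only on nodes 0 and 1
numactl --localalloc myprog         # the default: allocate on the local node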

To find out how many nodes exist on your Linux system, try:


$ numactl --hardware
available: 2 nodes (0-1)
node 0 size: 48448 MB
node 0 free: 212 MB
node 1 size: 48480 MB
node 1 free: 13148 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10



$ numactl --hardware
available: 1 nodes (0-0)
node 0 size: 32284 MB
node 0 free: 12942 MB
node distances:
node   0
  0:  10


The above outputs are from two different Linux systems.  The first system has two memory nodes (i.e., it is a NUMA system); the second has only one memory node (i.e., it is not a NUMA system).  The node distances at the end of each output give the access cost of reaching node M from node N (10 or 20 units here).  On a NUMA machine you should see an NxN matrix of access costs between the nodes.
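
If numactl is not installed, the same basic layout can be read from the kernel directly, and lscpu also summarizes it; for example:

$ ls /sys/devices/system/node/ | grep ^node
$ lscpu | grep -i numa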

Similarly, you can find out what the current memory policy, CPU bind, and memory bind are by typing:

$ numactl --show
policy: default
preferred node: current
physcpubind: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
cpubind: 0 1
nodebind: 0 1
membind: 0 1


$ numactl --show
policy: default
preferred node: current
physcpubind: 0 1 2 3 4 5 6 7
cpubind: 0
nodebind: 0
membind: 0

Note that the outputs are from the same two systems, in the same order.  The first system has 24 CPUs; the second system has 8 CPUs.

Running Applications with a Specific NUMA Policy

The numactl program allows you to run your application on specific CPUs and memory nodes. It does this by supplying a NUMA memory policy to the operating system before running your program. For example,
   $ numactl --cpunodebind=0 --membind=0,1 myprog
runs the program "myprog" on the CPUs of node 0, using memory from nodes 0 and 1.

NUMA Enhancements in Different VMs


As described in [3], certain applications seem not to scale easily on Non-Uniform Memory Architectures (NUMA) since the addition of CPU cores does not proportionately increase application performance.

To perform well on a NUMA architecture, the garbage collector threads and object allocation should be organized with the NUMA topology in mind. Below we will examine the NUMA enhancements provided in HotSpot and JRockit.

HotSpot


Since JDK6 update 2, the NUMA-aware allocator can be turned on with the -XX:+UseNUMA flag in conjunction with the throughput collector (which is the default in HotSpot, or you can set it explicitly with the -XX:+UseParallelGC flag) [4]. When you turn this feature on, the JVM divides the young generation into separate pools, one pool for each node. When a thread allocates an object, the JVM checks which node that thread is running on and allocates the object from the pool for that node.  By taking advantage of node locality and affinity, applications running on NUMA systems can scale better.
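
For example, a server JVM could be started with the NUMA-aware allocator enabled as sketched below; the heap sizes and the jar name are placeholders to be replaced with your own settings:

java -XX:+UseParallelGC -XX:+UseNUMA -Xms4g -Xmx4g -jar myserver.jar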

JRockit[5]


The flag -XX:BindToCPUs can be used to force JRockit to use only certain CPUs in the machine.

For example, you can set JRockit CPU affinity to CPUs 0 and 2 only:
  • java -XX:BindToCPUs=0,2
For NUMA, a separate affinity flag exists for NUMA nodes (-XX:BindToNumaNodes), as well as a flag that controls the NUMA memory allocation policy. This lets the user specify whether JRockit should interleave allocated pages evenly across all NUMA nodes or bind them to the local node where the memory allocation takes place. A "preferred local" policy can also be selected, meaning that JRockit should try to use the local node, but interleaved allocation is also fine.

For example, you can force local NUMA allocation:
  • java -XX:NumaMemoryPolicy=strictlocal
Other values are preferredlocal and interleave.
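
Putting the JRockit flags together, a NUMA-tuned start command might look like the sketch below; the node number, policy value, and class name are placeholders, and the exact syntax should be checked against the JRockit documentation for your release:

java -XX:BindToNumaNodes=0 -XX:NumaMemoryPolicy=preferredlocal MyApp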

Acknowledgement


Some of the writing here is based on feedback from Igor Veresov, Jon Masamitsu, and David Keenan. However, the author assumes full responsibility for the content.

Tuesday, July 17, 2012

GlassFish Server 3.1.2.2 is Now Available


As previously announced[1] in the GlassFish Forums, GlassFish Server 3.1.2.2 is now available from GlassFish.org[2] and Oracle.com[3]. 

If GlassFish update notification is enabled, you will see a notification of "new updates available". You can click on the notification to update your existing installation to the new version. 
Note: Stay tuned for an update to the Java EE 6 SDK distributions with GlassFish Server 3.1.2.2.

Thursday, July 12, 2012

jstat - Java Virtual Machine Statistics Monitoring Tool

If you suspect something is wrong in a long-running application, one thing to check is whether garbage collection is working healthily.

In this article, we will introduce one way to monitor the health of the JVM's garbage collection activity.

jstat

You can find the Java Virtual Machine Statistics Monitoring Tool named jstat in the following folder:
  • ${JDK_INST}/bin/jstat
The jstat utility uses the built-in instrumentation in the HotSpot VM to provide information on performance and resource consumption of running applications. The tool can be used when diagnosing performance issues, and in particular issues related to heap sizing and garbage collection. The utility does not require that the application be started with any special options as instrumentation is enabled by default.
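
jstat identifies the target JVM by its process id (vmid). The jps tool, shipped in the same bin directory, lists the ids of the running Java processes, so a typical first step looks like this (replace <pid> with the id reported by jps):

$ ${JDK_INST}/bin/jps -l
$ ${JDK_INST}/bin/jstat -class <pid>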

jstat supports a number of options that determine which statistics it displays. To list the options available on a particular platform installation, type:
  • jstat -options
For example, it displays the following supported options for my Linux platform:
  • -class
    • displays statistics on the behavior of the class loader
  • -compiler
    • displays statistics of the behavior of the HotSpot Just-in-Time compiler
  • -gc
    • displays statistics of the behavior of the garbage collected heap
  • -gccapacity
    • displays statistics of the capacities of the generations and their corresponding spaces
  • -gccause
    • displays summary of garbage collection statistics (same as -gcutil), with the cause of the last and current (if applicable) garbage collection events
  • -gcnew
    • displays statistics of the behavior of the new generation
  • -gcnewcapacity
    • displays statistics of the sizes of the new generation and its corresponding spaces
  • -gcold
    • displays statistics of the behavior of the old and permanent generations
  • -gcoldcapacity
    • displays statistics of the sizes of the old generation
  • -gcpermcapacity
    • displays statistics of the sizes of the permanent generation
  • -gcutil
    • displays summary of garbage collection statistics
  • -printcompilation
    • displays HotSpot compilation method statistics

-gcutil Option

In this demonstration, we will choose -gcutil to display garbage collection statistics.  You can find the meaning of displayed columns below:

Summary of Garbage Collection Statistics

Column   Description
S0       Survivor space 0 utilization as a percentage of the space's current capacity.
S1       Survivor space 1 utilization as a percentage of the space's current capacity.
E        Eden space utilization as a percentage of the space's current capacity.
O        Old space utilization as a percentage of the space's current capacity.
P        Permanent space utilization as a percentage of the space's current capacity.
YGC      Number of young generation GC events.
YGCT     Young generation garbage collection time.
FGC      Number of full GC events.
FGCT     Full garbage collection time.
GCT      Total garbage collection time.

The Finding

In our case, we have found the following statistics information:
$ jstat -gcutil 9852
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
  0.00   0.00 100.00 100.00  23.68    447   55.197    11  416.676  471.872
$ jstat -gcutil 9852
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
  0.00   0.00 100.00 100.00  23.68    447   55.197    11  416.676  471.872
$ jstat -gcutil 9852
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
  0.00   0.00 100.00 100.00  23.68    447   55.197    11  416.676  471.872
$ jstat -gcutil 9852
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
  0.00   0.00 100.00 100.00  23.68    447   55.197    11  416.676  471.872

Note that 9852 is the process id from which jstat retrieves statistics. If only one Java process is running on the machine, you can also use `pgrep java` to retrieve the process id directly. For example, you can specify the above command line as:
  • jstat -gcutil `pgrep java` 
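
Instead of re-running the command by hand, you can also ask jstat to sample at a fixed interval for a given number of samples; the following prints the -gcutil summary every 5 seconds (5000 ms), 10 times:

$ jstat -gcutil 9852 5000 10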

Then, after a little while, we print the statistics information again:
$ jstat -gcutil 9852
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
  0.00   0.00 100.00 100.00  23.68    447   55.197    25 1008.989 1064.186
$ jstat -gcutil 9852
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
  0.00   0.00 100.00 100.00  23.68    447   55.197    25 1008.989 1064.186
$ jstat -gcutil 9852
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
  0.00   0.00 100.00 100.00  23.68    447   55.197    25 1008.989 1064.186
$ jstat -gcutil 9852
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
  0.00   0.00 100.00 100.00  23.68    447   55.197    25 1008.989 1064.186

This helped us find out that our process was stuck because it had run out of heap space and all it was doing was garbage collection: the old space (O) and eden space (E) stayed at 100%, while the full GC count (FGC) and full GC time (FGCT) kept growing between the two sets of samples.
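
Once the numbers point to an exhausted heap, a natural next step is to capture a heap dump, or at least a class histogram, and see what is filling the old generation. A hedged sketch using jmap from the same JDK; the dump file path is just an example:

$ jmap -dump:live,format=b,file=/tmp/heap.hprof 9852

# a lighter-weight alternative: a histogram of live objects
$ jmap -histo:live 9852 | head -20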

References

  1. jstat - Java Virtual Machine Statistics Monitoring Tool
  2. Java 2 Platform, Standard Edition 5.0 "Troubleshooting and Diagnostic Guide"
  3. Monitoring Java garbage collection with jstat

Tuesday, July 10, 2012

How to Unlock a User Account in WebLogic Server

If a user unsuccessfully attempts to log in to a WebLogic Server instance more times than the configured number of allowed attempts, the account is locked out from further access.

This article shows you how to unlock a locked user so that they can log in again.

The Symptom

When a user is locked out, you can find an entry like the following in the application server's log:

  <Jul 10, 2012 3:51:38 PM PDT> <Notice> <Security> <BEA-090078> 
  <User psr1 in security realm myrealm has had 5 invalid login 
    attempts, locking account for 30 minutes.>


The Solution

To unlock a user account: 
  1. If you have not already done so, in the Change Center of the Administration Console, click Lock & Edit.
  2. In the left pane, click the name of the domain.
  3. Select Security > Unlock User.
  4. Enter the user name of the user whose account you want to unlock.
  5. Click Save.



User Lockout Management Statistics

After you have unlocked a user account, you can monitor user lockout management statistics for your server by selecting:

  • Servers > CRMDemo_server1 > Monitoring > Security
In our case, there were 6 locked users; after unlocking all of them, the monitoring page reflected the change.



References

  1. How to reset a user who has been locked out of WebLogic due to too many invalid login attempts