Insights on Oracle & Tech: September 2012

Saturday, September 29, 2012

Monitoring WebLogic Server Thread Pool at Runtime

There are two critical areas for WebLogic Server performance tuning:

Thread management
Network I/O tuning

Before tuning anything, you first need to identify where the bottlenecks are. This article focuses on thread management, specifically on monitoring its run-time health.

Thread Pool

The thread pool implementation changed in WLS 9.0:

Previous versions

Multiple pools of threads
See [7] on how to use the WebLogic 8.1 thread pool model for backward compatibility

WLS 9.0 and above

A single dynamically sized pool of threads (or self-tuning thread pool)
Self-tuning work manager^[8,9]

Newer WebLogic Server releases use a single thread pool in which all types of work are executed. Here is a summary of how the new implementation works:

WebLogic Server prioritizes work and allocates threads based on an execution model that takes into account
- Administrator-defined parameters
- Actual run-time performance
The common thread pool changes its size automatically to maximize throughput.

This new strategy makes it easier for administrators to allocate processing resources and manage performance, avoiding the effort and complexity involved in configuring, monitoring, and tuning custom execute queues.
The queue monitors throughput over time and, based on that history, determines whether to adjust the thread count. For example:

Thread count will be increased when

Historical throughput statistics indicate that a higher thread count increased throughput.

Thread count will be reduced when

Statistics indicate that fewer threads did not reduce throughput.

In general, the self-tuning thread pool changes its size automatically to maximize throughput, so in normal cases there is nothing you need to do aside from monitoring it to understand the behavior of your server under different types of load.
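
The adjustment rule described above can be illustrated with a toy model. This is only a sketch of the idea and not WebLogic's actual algorithm: the class name, the one-thread step, and the 5% tolerance are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model of throughput-driven pool sizing; NOT WebLogic's real algorithm. */
class SelfTuningSketch {
    // Hypothetical tolerance: throughput differences under 5% count as "no change".
    static final double TOLERANCE = 0.05;

    /**
     * history maps a thread count that was tried to the throughput
     * (requests per second) observed with it.
     */
    static int nextThreadCount(int current, TreeMap<Integer, Double> history) {
        Double now = history.get(current);
        if (now == null) {
            return current; // nothing observed yet at this size
        }
        Map.Entry<Integer, Double> higher = history.higherEntry(current);
        Map.Entry<Integer, Double> lower = history.lowerEntry(current);

        // A higher thread count gave clearly better throughput: grow.
        if (higher != null && higher.getValue() > now * (1 + TOLERANCE)) {
            return current + 1;
        }
        // Fewer threads did not reduce throughput: shrink.
        if (lower != null && lower.getValue() >= now * (1 - TOLERANCE)) {
            return current - 1;
        }
        return current;
    }

    public static void main(String[] args) {
        TreeMap<Integer, Double> history = new TreeMap<>();
        history.put(8, 950.0);
        history.put(10, 1000.0);
        history.put(12, 1200.0);
        // 12 threads gave 1200 req/s versus 1000 req/s at 10, so the pool grows.
        System.out.println(nextThreadCount(10, history)); // prints 11
    }
}
```

The point of the sketch is that the decision is driven by observed history rather than by a fixed administrator setting, which is why monitoring is usually all that is needed.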

WebLogic Server Monitoring Dashboard

From the Oracle WebLogic Administration Console, you can navigate to

Servers | SalesServer_1 | Monitoring | Threads

to view important statistics about the thread pool and its threads.

Everything from Active Execute Threads to Min Threads Constraints Completed is shown on this page.

From the dashboard, the thread pool runtime can be monitored in real time. The key KPIs to monitor include^[2]:

Hogging Thread Count

The threads that are being held by a request right now. These threads will either be declared as stuck after the configured timeout or will return to the pool before that. The self-tuning mechanism will backfill if necessary.

Pending User Request Count

The number of pending user requests in the priority queue. The priority queue contains requests from internal subsystems and users. This is just the count of all user requests.

Queue Length

The number of requests queued up when you don’t have idle threads.

In a lightly used environment, these should ideally hover around zero.

Another KPI worth monitoring is

Throughput

The Throughput is a single value that denotes the mean number of requests completed per second.

The higher this value is, the better it is.
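
The same KPIs can also be read programmatically over JMX. The following is a minimal sketch under several assumptions that are not taken from the article: that the thread pool runtime MBean matches the ObjectName pattern `com.bea:Type=ThreadPoolRuntime,*`, that the attribute names are the console labels without spaces, and that the code runs inside a server whose runtime MBeans are visible through the platform MBean server. If they are not, a remote JMX connection to the server would be needed instead.

```java
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class ThreadPoolKpiSketch {
    // Assumed attribute names: the console labels with the spaces removed.
    static final String[] KPIS = {
        "HoggingThreadCount", "PendingUserRequestCount", "QueueLength", "Throughput"
    };

    /** Reads the KPIs from every MBean matching the assumed name pattern. */
    static Map<String, Object> readKpis(MBeanServer server) {
        Map<String, Object> result = new LinkedHashMap<>();
        try {
            // Assumed ObjectName pattern for the thread pool runtime MBean.
            ObjectName pattern = new ObjectName("com.bea:Type=ThreadPoolRuntime,*");
            for (ObjectName name : server.queryNames(pattern, null)) {
                for (String kpi : KPIS) {
                    result.put(name.getCanonicalName() + "#" + kpi,
                               server.getAttribute(name, kpi));
                }
            }
        } catch (JMException e) {
            throw new IllegalStateException("Could not read thread pool KPIs", e);
        }
        return result; // empty when no matching MBean is registered
    }

    public static void main(String[] args) {
        // Outside WebLogic no such MBean is registered, so this prints {}.
        System.out.println(readKpis(ManagementFactory.getPlatformMBeanServer()));
    }
}
```

Polling these values periodically and logging them gives a history that can be compared across different load levels.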

Pool Size

In some cases, you do want to fix the range of the pool size. For example, in our Fusion Application benchmarks, our area of interest is the JVM (i.e., not the upper layers). In that case, we want to reduce the variation introduced by thread pool self-tuning, so we set the thread pool size to:

-Dweblogic.threadpool.MinPoolSize=32 -Dweblogic.threadpool.MaxPoolSize=32

The general rule^[4] for pool sizing or other kinds of tuning is to start with no specific tuning and then configure Work Managers only to address specific problems that might arise. Aggressively configuring Work Managers for a specific environment can end up hurting performance when your application, workload, or underlying system changes.

Stuck Threads

If an execute thread is being hogged by a request for much more than the normal execution time (as automatically observed by the scheduler), it's declared as a hogger. These threads will either be declared as stuck after the configured timeout (by default, 10 min of processing time) or will return to the pool before that.

If any thread's state becomes STUCK, it is time to start investigating: does the stuck thread ever recover, or does it stay stuck indefinitely? By default, Oracle Fusion Apps is configured to generate an incident when a STUCK thread is detected. You can find the incidents here:

<DOMAIN_HOME>/servers/<server_name>/adr/diag/ofm/<domain_name>/<server_name>/incident

The incident contains some key diagnostic information that can be used to help understand why the request took so long.
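
When only a thread dump is at hand, stuck threads can be picked out by name. The sketch below assumes that WebLogic marks such threads with a `[STUCK]` prefix in the thread name, in the same way the `[ACTIVE]` prefix appears in the dumps shown later in this article. The sample dump lines in `main` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class StuckThreadScan {
    /** Returns the names of threads whose header line carries the [STUCK] prefix. */
    static List<String> stuckThreads(List<String> dumpLines) {
        List<String> stuck = new ArrayList<>();
        for (String line : dumpLines) {
            // A thread header line starts with the thread name in double quotes.
            if (line.startsWith("\"[STUCK]")) {
                int end = line.indexOf('"', 1);
                if (end > 0) {
                    stuck.add(line.substring(1, end));
                }
            }
        }
        return stuck;
    }

    public static void main(String[] args) {
        // Hypothetical dump excerpt, not taken from a real server.
        List<String> dump = Arrays.asList(
            "\"[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'\" id=15",
            "    at java/lang/Object.wait(J)V(Native Method)",
            "\"[STUCK] ExecuteThread: '3' for queue: 'weblogic.kernel.Default (self-tuning)'\" id=20");
        for (String name : stuckThreads(dump)) {
            System.out.println(name);
        }
    }
}
```

Running the scan on dumps taken a few minutes apart shows whether the same thread stays stuck or eventually returns to the pool.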

What to Expect if Things Work Normally?

The thread below shows what an execute thread in the WebLogic Server self-tuning pool looks like when it has nothing to do.

"[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'" id=15 idx=0x3c tid=3810 prio=5 alive, waiting, native_blocked, daemon
    -- Waiting for notification on: weblogic/work/ExecuteThread@0xa0c21480[fat lock]
    at jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native Method)
    at java/lang/Object.wait(J)V(Native Method)
    at java/lang/Object.wait(Object.java:485)
    at weblogic/work/ExecuteThread.waitForRequest(ExecuteThread.java:162)
    ^-- Lock released while waiting: weblogic/work/ExecuteThread@0xa0c21480[fat lock]
    at weblogic/work/ExecuteThread.run(ExecuteThread.java:183)
    at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
    -- end of trace

The thread below is a timer thread waiting to be notified that it is time to wake up and do whatever it is supposed to do:

"JFR request timer" id=16 idx=0x40 tid=3811 prio=5 alive, waiting, native_blocked, daemon
    -- Waiting for notification on: java/util/TaskQueue@0xa0c20b28[fat lock]
    at jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native Method)
    at java/lang/Object.wait(J)V(Native Method)
    at java/lang/Object.wait(Object.java:485)
    at java/util/TimerThread.mainLoop(Timer.java:483)
    ^-- Lock released while waiting: java/util/TaskQueue@0xa0c20b28[fat lock]
    at java/util/TimerThread.run(Timer.java:462)
    at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
    -- end of trace

What state is the “main” thread in? It should look something like this one:

"main" prio=6 tid=0x000000000224f000 nid=0x2e64 runnable [0x00000000023de000]
   java.lang.Thread.State: RUNNABLE
        at weblogic.i18n.Localizer.prune(Localizer.java:358)
        at weblogic.i18n.Localizer.getObject(Localizer.java:164)
        at weblogic.i18n.Localizer.getDiagnosticVolume(Localizer.java:344)
        at weblogic.i18n.logging.CatalogMessage.<init>(CatalogMessage.java:53)
        at weblogic.kernel.T3SrvrLogger.logServerStateChange(T3SrvrLogger.java:2084)
        at weblogic.t3.srvr.T3Srvr.setState(T3Srvr.java:211)
        - locked <0x00000000e0a30d78> (a weblogic.t3.srvr.T3Srvr)
        at weblogic.t3.srvr.T3Srvr.initializeAdmin(T3Srvr.java:921)
        at weblogic.t3.srvr.T3Srvr.startup(T3Srvr.java:589)
        at weblogic.t3.srvr.T3Srvr.run(T3Srvr.java:471)
        at weblogic.Server.main(Server.java:74)

References

1. Oracle® Fusion Applications Performance and Tuning Guide 11g Release 1 (11.1.4)
2. Oracle SOA Suite 11g Administrator's Handbook
3. Tuning WebLogic Server
4. Oracle WebLogic Server 11g Administration Handbook
5. Controlling Thread Pool Size in WebLogic Server
6. Understanding JVM Thread States
7. Using the WebLogic 8.1 Thread Pool Model
- Describes how to use and tune WebLogic 8.1 thread pools
8. Using Work Managers to Optimize Scheduled Work
9. Understanding WebLogic Work Manager
10. Oracle® Fusion Middleware Tuning Performance of Oracle WebLogic Server 12c (12.2.1)
11. Data Source Connection Pool Sizing (The WebLogic Server Blog)

Tuesday, September 25, 2012

Oracle ADF Essentials

Yesterday Oracle released Oracle ADF Essentials, a free-to-develop-and-deploy packaging of the core technologies at the base of Oracle ADF. Oracle ADF Essentials applications can be deployed on the GlassFish Server.

Oracle ADF Essentials includes:

Oracle ADF Faces Rich Client Components
Oracle ADF Controller
Oracle ADF Binding
Oracle ADF Business Components

Learn more about Oracle ADF Essentials here:

Wednesday, September 19, 2012

HotSpot VM Binaries: 32-Bit vs. 64-Bit

The following option is used to select the 64-bit compiler of HotSpot:

-d64

However, it applies only to the 32/64-bit hybrid JDKs that exist for some platforms (such as HP-PA, HPIA, and Solaris64). This article tries to clarify how the different flavors of the HotSpot compiler implementation are selected on two platforms: Linux and Solaris.

32-Bit vs. 64-Bit

The main difference between 32-bit and 64-bit JVMs is the address space. A 32-bit JVM can address at most 4 GB of memory. However, the actual Java heap space available to a 32-bit HotSpot VM may be further limited depending on the underlying OS^[2]:

Microsoft Windows

~1.5 GB

Linux

~2.5 - 3.0 GB for very recent Linux kernels
~2 GB for less recent Linux kernels

Solaris

~3.3 GB

The actual maximums vary due to the memory address space consumed by both a given Java application and a JVM version.

On Solaris, there are three compilers for the VM:^[4,5]

-client
-server
-d64

They are all installed in the same binary (or same Java Home) and you can switch between them with the above VM options.

On Linux, the 32-bit binary has the -client and -server flavors, while the 64-bit binary has only the -d64 flavor. So, passing the -d64 option to a 64-bit Linux binary is a no-op.

For example, I have a 64-bit binary installed on my Linux box. Here is my JDK and JVM version information:

  java version "1.7.0_04-ea"
  Java(TM) SE Runtime Environment (build 1.7.0_04-ea-b17)
  Java HotSpot(TM) 64-Bit Server VM (build 23.0-b18, mixed mode)

Note that the wording is "64-Bit Server VM" because there is no such thing as a 64-bit client VM on either Linux or Solaris.
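
The data model of a running VM can also be checked from inside a Java program. The sketch below relies on the `sun.arch.data.model` system property, which is HotSpot-specific and may be absent on other VMs, so the code falls back to "unknown". The class name is made up for illustration.

```java
class DataModelCheck {
    /** Returns "32" or "64" on HotSpot, or "unknown" if the property is not defined. */
    static String dataModel() {
        // sun.arch.data.model is not a standard property; other VMs may not set it.
        return System.getProperty("sun.arch.data.model", "unknown");
    }

    public static void main(String[] args) {
        System.out.println("VM name    : " + System.getProperty("java.vm.name"));
        System.out.println("os.arch    : " + System.getProperty("os.arch"));
        System.out.println("data model : " + dataModel());
    }
}
```

This is handy in startup scripts or diagnostics pages where running `java -version` against the right binary is not convenient.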

References

1. Unable to load performance pack (Solaris 10)
2. Java Performance by Charlie Hunt and Binu John
3. weblogic.nodemanager.common.ConfigException Native version is enabled but node manager native library could not be loaded : NativeVersionEnabled
- If you're switching from a 32-bit JVM to a 64-bit JVM in a WebLogic Server environment, the challenge is that the WLS native libraries installed are based on the JVM used to do the installation. That means the native libraries will be 32-bit libraries that do not work with a 64-bit JVM. This will also have performance/scalability implications.
4. Installing WebLogic Server on 64-Bit Platforms Using a 64-Bit JDK
- (UNIX or Linux only) Include the -d64 flag in the installation command when using a 32/64-bit hybrid JDK (such as for the HP-PA, HPIA, and Solaris64 platforms).
5. Which JDK is my FMW 11g WebLogic Domain Configured to Use?
- For 32/64-bit hybrid JVMs on some platforms, you would have to include the -d64 flag to tell the JVM to run in 64-bit mode.

Friday, September 14, 2012

HotSpot Performance Option — SurvivorRatio

As shown in [1], -XX:SurvivorRatio has been classified as a performance option on HotSpot VM.

In this article, we will discuss how this option works and how it can impact an application's performance.

Survivor Spaces

The young generation space in all HotSpot garbage collectors is subdivided into an eden space and two survivor spaces. Eden space is where new Java objects are allocated. One of the survivor spaces is labeled the “from” survivor space, and the other survivor space is labeled the “to” survivor space.

If during a minor garbage collection, the “to” survivor space is not large enough to hold all of the live objects being copied from the eden space and the “from” survivor space, the overflow will be promoted to the old generation space. Overflowing into the old generation space may cause the old generation space to grow more quickly than desired and result in an eventual stop-the-world compacting full garbage collection.

If you are interested in learning more about garbage collectors, read this excellent book: "Java Performance."

-XX:SurvivorRatio

For the following discussions, we will use the following HotSpot command options:

-server -Xms2560m -Xmx2560m -Xmn512m -XX:SurvivorRatio=4

In other words, we have set the young generation size to 512 MB and the survivor ratio to 4. Because we didn't specify which garbage collector to use, the default GC (i.e., -XX:+UseParallelGC) is selected.

When you set the SurvivorRatio to be 4, it means that:

The ratio of eden/survivor space size will be 4. The default value is 8.

Below we will see how the setting of SurvivorRatio will impact the sizes of survivor spaces.

If InitialSurvivorRatio or MinSurvivorRatio is not specified but SurvivorRatio has been set, its value will be set to:

SurvivorRatio + 2

Then the calculation is as follows (Note that young_gen_size is the value specified with -Xmn):


  size_t survivor_size = young_gen_size / InitialSurvivorRatio;

  size_t eden_size = young_gen_size - (2 * survivor_size);

So, for SurvivorRatio = 4, different spaces are derived as follows:


 InitialSurvivorRatio = SurvivorRatio + 2 = 6
 survivor_size = 512m / 6 ≈ 87381K, reported as 87360K in the GC log (Note that young_gen_size = 512m = 524288K; the small difference is presumably due to size alignment)

 eden = young_gen_size - 2 * survivor_size = 524288K - 2 * 87360K = 349568K

 young_gen_total (as reported in the GC printout below) = eden + 1 * survivor = 436928K
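
The arithmetic above can be reproduced with a few lines of Java. This is a worked sketch rather than HotSpot's sizing code: the class and method names are made up, and the 64K alignment step is an assumption, chosen because it turns 524288K / 6 into the 87360K that the GC log reports.

```java
class SurvivorSizing {
    // Assumed alignment granularity in KB; it reproduces the logged 87360K.
    static final long ALIGNMENT_K = 64;

    static long alignDown(long valueK) {
        return valueK - (valueK % ALIGNMENT_K);
    }

    /** Size of one survivor space in KB when only SurvivorRatio is set. */
    static long survivorK(long youngGenK, int survivorRatio) {
        int initialSurvivorRatio = survivorRatio + 2;
        return alignDown(youngGenK / initialSurvivorRatio);
    }

    /** Eden is what remains after the two survivor spaces are carved out. */
    static long edenK(long youngGenK, int survivorRatio) {
        return youngGenK - 2 * survivorK(youngGenK, survivorRatio);
    }

    public static void main(String[] args) {
        long youngGenK = 512L * 1024; // -Xmn512m
        long survivor = survivorK(youngGenK, 4);
        long eden = edenK(youngGenK, 4);
        System.out.println("survivor = " + survivor + "K");          // survivor = 87360K
        System.out.println("eden     = " + eden + "K");              // eden     = 349568K
        System.out.println("total    = " + (eden + survivor) + "K"); // total    = 436928K
    }
}
```

Note that the reported total counts only one survivor space, because only one of the two holds live objects at any time.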

How to Verify?

When we started the application, we specified the following GC-related print options:

-Xloggc:/<your_path>/logs/gc_0.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintReferenceGC

After the application (i.e., a benchmark in our case) ends, you can find the following printouts at the end of gc_0.log:

Heap
 PSYoungGen total 436928K, used 64296K [0x00000007e0000000, 0x0000000800000000, 0x0000000800000000)
  eden space 349568K, 15% used [0x00000007e0000000,0x00000007e3571608,0x00000007f5560000)
  from space 87360K, 10% used [0x00000007f5560000,0x00000007f5eb8cd0,0x00000007faab0000)
  to space 87360K, 0% used [0x00000007faab0000,0x00000007faab0000,0x0000000800000000)
 ParOldGen total 2097152K, used 1457708K [0x0000000760000000, 0x00000007e0000000, 0x00000007e0000000)
  object space 2097152K, 69% used [0x0000000760000000,0x00000007b8f8b2a8,0x00000007e0000000)
 PSPermGen total 393216K, used 246405K [0x0000000748000000, 0x0000000760000000, 0x0000000760000000)
  object space 393216K, 62% used [0x0000000748000000,0x00000007570a14c0,0x0000000760000000)

As you can see:

eden space / from space = 349568K / 87360K = 4
PSYoungGen total = 349568K + 87360K = 436928K
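
The space sizes can also be inspected while the application is running, without waiting for the final heap printout. The following sketch uses the standard `java.lang.management` API to list every memory pool. Pool names depend on the collector in use (for example, names containing "Eden" and "Survivor"), so the code prints all pools instead of assuming specific names. The class name is made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

class YoungGenPools {
    /** One line per memory pool with its committed and used sizes in KB. */
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            lines.add(String.format("%-30s committed=%dK used=%dK",
                    pool.getName(),
                    usage.getCommitted() / 1024,
                    usage.getUsed() / 1024));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describePools()) {
            System.out.println(line);
        }
    }
}
```

Dividing the committed size of the eden pool by that of the survivor pool should give a value close to the configured ratio.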

Why Is It a Performance Option?

Given that physical memory is limited, you can only increase the heap up to a certain size. After that, you can tune the survivor ratio to see whether short-lived objects can be given more time to die in the young generation (that is, whether fewer of them are promoted directly into the old generation), which could help overall response time.

Bottom line: larger survivor spaces give short-lived objects more time to die in the young generation, which helps the application's response time (in other words, there will be fewer long-pause full garbage collections).

Acknowledgement

Some of the writing here is based on feedback from Jon Masamitsu and Scott Oaks. However, the author assumes full responsibility for the content.

References

Wednesday, September 12, 2012

A Case Study of Using Tiered Compilation in HotSpot

HotSpot has two JIT compilers, c1 (the client JIT) and c2 (the server JIT).^[1] The client JIT starts fast but performs fewer optimizations, so it is typically used for GUI applications. The server JIT starts more slowly but produces highly optimized code. The idea of tiered compilation^[2] is to get the best of both compilers: code is first compiled with c1 and then, if it turns out to be really hot, recompiled with c2.

The tiered server runtime is enabled with the following HotSpot VM options:

-server -XX:+TieredCompilation

In this article, we show a case study of using HotSpot (build 23.0-b18, mixed mode) with TieredCompilation off/on.

Comparison

You can tune a JVM's performance by tuning either memory management or code generation. Here we tuned the code generator by turning tiered compilation on. With tiered compilation on, more classes are compiled (or compiled more efficiently), so they execute faster (i.e., +15% on Application Server CPU in the table below).

Using the ATG CRM Demo benchmark, we saw the following KPI changes:

KPI                      -XX:-TieredCompilation   -XX:+TieredCompilation   % Change
Total Footprint          4431MB                   4686MB                   -5.4%
Application Server CPU   23%                      20%                      +15%
Average Response Time    0.234                    0.217                    +7.8%

Notes

Here are our settings for ReservedCodeCacheSize^[2]:
-TieredCompilation: 128MB
+TieredCompilation: 256MB
When we turned on tiered compilation, we also reserved a larger code cache for it (i.e., 256 MB vs. 128 MB). The code cache is allocated out of native memory (vs. heap), and the total footprint shown in the table includes native memory. So, the total footprint is larger when tiered compilation is turned on.
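
To see how much of the reserved code cache is actually in use, the JVM's management beans can be queried at run time. The sketch below uses only the standard `java.lang.management` API. The name filter is an assumption: the pool is called "Code Cache" on older HotSpot builds and is split into several "CodeHeap" pools on newer ones, so the code sums every non-heap pool whose name contains "Code". The class name is made up for illustration.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

class CodeCacheReport {
    /** Sums the used bytes of all non-heap pools that look like code cache pools. */
    static long codeCacheUsedBytes() {
        long used = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // Assumed naming: "Code Cache" or "CodeHeap '...'" depending on the VM.
            if (pool.getType() == MemoryType.NON_HEAP && pool.getName().contains("Code")) {
                MemoryUsage usage = pool.getUsage();
                if (usage != null) {
                    used += usage.getUsed();
                }
            }
        }
        return used;
    }

    public static void main(String[] args) {
        System.out.println("code cache used: " + codeCacheUsedBytes() / 1024 + "K");

        // The compilation bean is null when the VM has no JIT compiler.
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit != null && jit.isCompilationTimeMonitoringSupported()) {
            System.out.println(jit.getName() + ": "
                    + jit.getTotalCompilationTime() + " ms spent compiling");
        }
    }
}
```

Comparing these numbers with tiered compilation off and on shows whether the larger reservation is really needed for a given workload.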

References

Sunday, September 9, 2012

When to use -Xbootclasspath on HotSpot?

As Ted Neward described in his article[1], you can use -Xbootclasspath to tweak the Java Runtime API. For example, we are evaluating a new ArrayList implementation and would like to benchmark its performance. So, we specify

-Xbootclasspath/p:/data/patches/NewArrayList.jar

to load the new ArrayList class from someplace other than the rt.jar file in the jre/lib directory.

-Xbootclasspath

At start-up, the JVM loads its internal classes and the java.* packages from the default boot class path. However, the Java Runtime Environment is very configurable. For example, you can use -Xbootclasspath to substitute, append, or prepend a list of directories for or to the default boot class path using the following options:

-Xbootclasspath:bootclasspath

Specify a semicolon-separated (colon-separated on Linux and Solaris) list of directories, JAR archives, and ZIP archives to search for boot class files. These are used in place of the boot class files included in the Java 2 SDK.
Note: Applications that use this option for the purpose of overriding a class in rt.jar should not be deployed as doing so would contravene the Java 2 Runtime Environment binary code license.

-Xbootclasspath/a:path

Specify a semicolon-separated (colon-separated on Linux and Solaris) path of directories, JAR archives, and ZIP archives to append to the default bootstrap class path.

-Xbootclasspath/p:path

Specify a semicolon-separated (colon-separated on Linux and Solaris) path of directories, JAR archives, and ZIP archives to prepend in front of the default bootstrap class path.
Note: Applications that use this option for the purpose of overriding a class in rt.jar should not be deployed as doing so would contravene the Java 2 Runtime Environment binary code license.

How to Verify

To verify the effect of -Xbootclasspath, you can use the following option:

-verbose:class

Using the above example, you can find the following output in WebLogic Server's log file (see the Note below):

[Opened /data/patches/NewArrayList.jar]
[Opened /data/JVMs/nmt_test/jre/lib/alt-rt.jar]
[Opened /data/JVMs/nmt_test/jre/lib/rt.jar]
[Loaded java.lang.Object from /data/JVMs/nmt_test/jre/lib/rt.jar]
[Loaded java.io.Serializable from /data/JVMs/nmt_test/jre/lib/rt.jar] ...
[Loaded java.lang.NoSuchMethodError from /data/JVMs/nmt_test/jre/lib/rt.jar]
[Loaded java.util.ArrayList from /data/patches/NewArrayList.jar]
[Loaded java.util.Collections from /data/JVMs/nmt_test/jre/lib/rt.jar]

From the lines above, you can see that java.util.ArrayList was indeed loaded from the new JAR file (i.e., NewArrayList.jar).

In summary, to diagnose any class loading issue, you can use -verbose:class. There are other useful options which enable verbose output:

-verbose[:class|gc|jni]
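
When restarting the server with -verbose:class is not practical, the origin of a class can also be looked up from code. The sketch below is one possible approach and not the method used in this article: it asks the system class loader for the location of the `.class` resource. The `sun.boot.class.path` property it prints is HotSpot-specific and may be undefined on other or newer VMs. The class name is made up for illustration.

```java
import java.net.URL;

class ClassOrigin {
    /** Returns the URL the class file would be loaded from, or null if it is not found. */
    static String locate(String className) {
        String resource = className.replace('.', '/') + ".class";
        // The lookup delegates to the parent loaders, including the bootstrap loader.
        URL url = ClassLoader.getSystemResource(resource);
        return url == null ? null : url.toString();
    }

    public static void main(String[] args) {
        System.out.println(locate("java.util.ArrayList"));
        // Not a standard property; prints null where the VM does not define it.
        System.out.println(System.getProperty("sun.boot.class.path"));
    }
}
```

With a prepended JAR in place, the first line would be expected to point into that JAR rather than into the runtime's own class library.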

Note

We have started WebLogic Server with the following line:

bin/startManagedWebLogic.sh CRMDemo_server1 http://myserver:7001 > logs/CRMDemo_server1.log 2>&1 < /dev/null &

References

1. Using the BootClasspath--Tweaking the Java Runtime API
2. WebLogic's Classloading Framework
3. Oracle® JRockit Command-Line Reference Release R28
- -Xbootclasspath directories and zips/jars separated by ; (Windows) or : (Linux and Solaris)
4. java - the Java application launcher
