Saturday, September 29, 2012

Monitoring WebLogic Server Thread Pool at Runtime

There are two critical areas for WebLogic Server performance tuning:
  • Thread management
  • Network I/O tuning
Before tuning anything, you first need to identify where the bottlenecks are.  In this article, we focus on thread management, specifically on monitoring its health at run time.

Thread Pool


The thread pool implementation changed in WLS 9.0: 
  • Previous versions
    • Multiple pools of threads 
    • See [7] on how to use the WebLogic 8.1 thread pool model for backward compatibility
  • WLS 9.0 and above
    • A single dynamically sized pool of threads (or self-tuning thread pool)
    • Self-tuning work manager[8,9]
Since WLS 9.0, WebLogic Server uses a single thread pool in which all types of work are executed. Here is a summary of how the new implementation works:
  • WebLogic Server prioritizes work and allocates threads based on an execution model that takes into account 
    • Administrator-defined parameters 
    • Actual run-time performance 
      • Throughput
  • The common thread pool changes its size automatically to maximize throughput.
    • This strategy makes it easier for administrators to allocate processing resources and manage performance, avoiding the effort and complexity involved in configuring, monitoring, and tuning custom execute queues. 
    • The queue monitors throughput over time and, based on that history, determines whether to adjust the thread count.  For example,
      • The thread count is increased when historical throughput statistics indicate that a higher thread count increased throughput. 
      • The thread count is reduced when statistics indicate that fewer threads did not reduce throughput. 
In general, the self-tuning thread pool changes its size automatically to maximize throughput, so in normal cases there is nothing you need to do aside from monitoring it to understand the behavior of your server under different types of load.
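
As a toy illustration only, the increase/decrease rule described above can be sketched as a lookup against a throughput history. This is not WebLogic's actual algorithm, which is internal to the server; the class name, the history map, and the tie-breaking rule are all assumptions made for this sketch.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of a throughput-driven pool resizer. NOT WebLogic's implementation.
class SelfTuningSketch {

    /**
     * history maps a thread count to the throughput (requests/second)
     * observed while the pool ran at that size (hypothetical data structure).
     */
    static int nextThreadCount(int current, TreeMap<Integer, Double> history) {
        Double now = history.get(current);
        if (now == null) {
            return current; // no data for the current size yet
        }
        Map.Entry<Integer, Double> higher = history.higherEntry(current);
        Map.Entry<Integer, Double> lower = history.lowerEntry(current);
        // Grow when a larger pool was observed to deliver more throughput.
        if (higher != null && higher.getValue() > now) {
            return higher.getKey();
        }
        // Shrink when a smaller pool delivered at least the same throughput.
        if (lower != null && lower.getValue() >= now) {
            return lower.getKey();
        }
        return current;
    }

    public static void main(String[] args) {
        TreeMap<Integer, Double> history = new TreeMap<>();
        history.put(8, 100.0);
        history.put(16, 150.0);
        System.out.println("next size from 8 threads: " + nextThreadCount(8, history));
    }
}
```

The point of the sketch is only that the decision is driven by observed throughput rather than by a fixed configuration value.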

WebLogic Server Monitoring Dashboard


From the Oracle WebLogic Administration Console, you can navigate to

  • Servers | SalesServer_1 | Monitoring | Threads 

to view important statistics related to thread pool and thread pool threads.

Everything from Active Execute Threads to Min Threads Constraints Completed is shown on this page.


From the dashboard, the thread pool runtime can be monitored in real time. The key KPIs to monitor include[2]:
  • Hogging Thread Count 
    • The threads that are being held by a request right now. These threads will either be declared as stuck after the configured timeout or will return to the pool before that. The self-tuning mechanism will backfill if necessary.
  • Pending User Request Count
    • The number of pending user requests in the priority queue. The priority queue contains requests from internal subsystems and users. This is just the count of all user requests.
  • Queue Length
    • The number of requests queued up when no idle threads are available.  
In a lightly loaded environment, these should ideally hover around zero.

Another KPI worth monitoring is
  • Throughput
    • Throughput is a single value that denotes the mean number of requests completed per second. 
The higher this value, the better.
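
The same KPIs can also be read over JMX instead of from the console. The following is a minimal sketch, not a definitive client: the attribute names and the `com.bea:Type=ThreadPoolRuntime,ServerRuntime=...` naming pattern are assumptions based on the ThreadPoolRuntimeMBean, and obtaining a remote `MBeanServerConnection` to a running server is left out.

```java
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import javax.management.Attribute;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

class ThreadPoolKpis {

    // Attribute names assumed to correspond to the KPIs discussed above.
    static final String[] KPIS = {
        "HoggingThreadCount", "PendingUserRequestCount", "QueueLength", "Throughput"
    };

    /** Builds a query pattern for one server's thread pool runtime MBean (assumed naming). */
    static ObjectName poolPattern(String serverName) {
        try {
            return new ObjectName(
                "com.bea:Type=ThreadPoolRuntime,ServerRuntime=" + serverName + ",*");
        } catch (Exception e) {
            throw new IllegalArgumentException("Bad server name: " + serverName, e);
        }
    }

    /** Reads the named attributes from one MBean into an ordered map. */
    static Map<String, Object> read(MBeanServerConnection conn, ObjectName name, String... attrs) {
        try {
            Map<String, Object> out = new LinkedHashMap<>();
            for (Attribute a : conn.getAttributes(name, attrs).asList()) {
                out.put(a.getName(), a.getValue());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("Cannot read " + name, e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Inside a server JVM the platform MBean server may expose the runtime MBeans;
        // a remote client would get its connection from a JMX connector instead.
        MBeanServerConnection conn = ManagementFactory.getPlatformMBeanServer();
        String server = args.length > 0 ? args[0] : "SalesServer_1";
        Set<ObjectName> pools = conn.queryNames(poolPattern(server), null);
        if (pools.isEmpty()) {
            System.out.println("No ThreadPoolRuntime MBean visible for " + server);
        }
        for (ObjectName pool : pools) {
            System.out.println(pool + " -> " + read(conn, pool, KPIS));
        }
    }
}
```

Run outside a WebLogic Server JVM, the sketch simply reports that no matching MBean is visible.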

Pool Size


In some cases, you do want to fix the range of the pool size.  For example, in our Fusion Applications benchmarks, our area of interest is the JVM (i.e., not the upper layers).  In that case, we want to reduce the variation introduced by thread pool self-tuning, so we set the thread pool size to:
  • -Dweblogic.threadpool.MinPoolSize=32 -Dweblogic.threadpool.MaxPoolSize=32
The general rule[4] for pool sizing or other kinds of tuning is to start with no specific tuning and then configure Work Managers only to address specific problems that might arise. Aggressively configuring Work Managers for a specific environment can end up hurting performance when your application, workload, or underlying system changes.

Stuck Threads


If an execute thread is being hogged by a request for much more than the normal execution time (as automatically observed by the scheduler), it's declared as a hogger.  These threads will either be declared as stuck after the configured timeout (by default, 10 min of processing time) or will return to the pool before that.

If any thread's state becomes STUCK, it is time to start investigating: does the stuck thread ever recover, or does it stay stuck indefinitely?  By default, Oracle Fusion Apps is configured to generate an incident when a STUCK thread is detected.  You can find the incidents here:
  • <DOMAIN_HOME>/servers/<server_name>/adr/diag/ofm/<domain_name>/<server_name>/incident
The incident contains some key diagnostic information that can be used to help understand why the request took so long.

What to Expect if Things Work Normally?


The thread below shows what an execute thread in the WebLogic Server self-tuning pool looks like when it has nothing to do.
"[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'" id=15 idx=0x3c tid=3810 prio=5 alive, waiting, native_blocked, daemon
    -- Waiting for notification on: weblogic/work/ExecuteThread@0xa0c21480[fat lock]
    at jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native Method)
    at java/lang/Object.wait(J)V(Native Method)
    at java/lang/Object.wait(Object.java:485)
    at weblogic/work/ExecuteThread.waitForRequest(ExecuteThread.java:162)
    ^-- Lock released while waiting: weblogic/work/ExecuteThread@0xa0c21480[fat lock]
    at weblogic/work/ExecuteThread.run(ExecuteThread.java:183)
    at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
    -- end of trace

The thread below is a timer thread waiting to be notified that it is time to wake up and do whatever it is supposed to do:
"JFR request timer" id=16 idx=0x40 tid=3811 prio=5 alive, waiting, native_blocked, daemon
    -- Waiting for notification on: java/util/TaskQueue@0xa0c20b28[fat lock]
    at jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native Method)
    at java/lang/Object.wait(J)V(Native Method)
    at java/lang/Object.wait(Object.java:485)
    at java/util/TimerThread.mainLoop(Timer.java:483)
    ^-- Lock released while waiting: java/util/TaskQueue@0xa0c20b28[fat lock]
    at java/util/TimerThread.run(Timer.java:462)
    at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
    -- end of trace

What state is the “main” thread in? It should look something like this one:
"main" prio=6 tid=0x000000000224f000 nid=0x2e64 runnable [0x00000000023de000]
   java.lang.Thread.State: RUNNABLE
        at weblogic.i18n.Localizer.prune(Localizer.java:358)
        at weblogic.i18n.Localizer.getObject(Localizer.java:164)
        at weblogic.i18n.Localizer.getDiagnosticVolume(Localizer.java:344)
        at weblogic.i18n.logging.CatalogMessage.<init>(CatalogMessage.java:53)
        at weblogic.kernel.T3SrvrLogger.logServerStateChange(T3SrvrLogger.java:2084)
        at weblogic.t3.srvr.T3Srvr.setState(T3Srvr.java:211)
        - locked <0x00000000e0a30d78> (a weblogic.t3.srvr.T3Srvr)
        at weblogic.t3.srvr.T3Srvr.initializeAdmin(T3Srvr.java:921)
        at weblogic.t3.srvr.T3Srvr.startup(T3Srvr.java:589)
        at weblogic.t3.srvr.T3Srvr.run(T3Srvr.java:471)
        at weblogic.Server.main(Server.java:74)
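
When a dump contains hundreds of threads, it helps to tally the execute threads by the bracketed label in their names. The sketch below counts header lines shaped like the `[ACTIVE] ExecuteThread` line shown above; it assumes that other states (for example a stuck thread) use the same bracketed prefix, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExecuteThreadTally {

    // Matches header lines such as: "[ACTIVE] ExecuteThread: '0' for queue: '...'"
    private static final Pattern HEADER =
        Pattern.compile("^\"\\[(\\w+)\\] ExecuteThread: '\\d+' for queue: '[^']*'\"");

    /** Counts execute threads per bracketed state label found in the dump lines. */
    static Map<String, Integer> tally(List<String> dumpLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : dumpLines) {
            Matcher m = HEADER.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "\"[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'\" id=15",
            "    at java/lang/Object.wait(Object.java:485)",
            "\"JFR request timer\" id=16 idx=0x40 tid=3811 prio=5 alive, waiting");
        System.out.println(tally(sample));
    }
}
```

Non-pool threads such as the timer thread are ignored because their names do not match the execute thread pattern.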

References

  1. Oracle® Fusion Applications Performance and Tuning Guide 11g Release 1 (11.1.4)
  2. Oracle SOA Suite 11g Administrator's Handbook
  3. Tuning WebLogic Server
  4. Oracle WebLogic Server 11g Administration Handbook
  5. Controlling Thread Pool Size in WebLogic Server
  6. Understanding JVM Thread States
  7. Using the WebLogic 8.1 Thread Pool Model
    • Describes how to use and tune WebLogic 8.1 thread pools
  8. Using Work Managers to Optimize Scheduled Work
  9. Understanding WebLogic Work Manager
  10. Oracle® Fusion Middleware Tuning Performance of Oracle WebLogic Server 12c (12.2.1)
  11. Data Source Connection Pool Sizing (The Weblogic Server Blog)

Tuesday, September 25, 2012

Oracle ADF Essentials

Yesterday, Oracle released Oracle ADF Essentials, a free-to-develop-and-deploy packaging of the core technologies at the base of Oracle ADF. Oracle ADF Essentials applications can be deployed on the GlassFish Server.

Oracle ADF Essentials includes:
  • Oracle ADF Faces Rich Client Components
  • Oracle ADF Controller
  • Oracle ADF Binding 
  • Oracle ADF Business Components


Wednesday, September 19, 2012

HotSpot VM Binaries: 32-Bit vs. 64-Bit

The following option selects the 64-bit flavor of HotSpot:
  • -d64
However, it is only meaningful for the 32/64-bit hybrid JDKs that exist for some platforms (such as HP-PA, HPIA, and Solaris64).  This article tries to clarify how the different flavors of the HotSpot implementation are selected on our chosen platforms: Linux and Solaris.

32-Bit vs. 64-Bit


The main difference between 32-bit and 64-bit JVMs is the address space.  For a 32-bit JVM, the addressable memory is limited to 4 GB.  However, the actual Java heap space available to a 32-bit HotSpot VM may be further limited depending on the underlying OS[2]:
  • Microsoft Windows
    • ~1.5 GB
  • Linux 
    • ~2.5 - 3.0 GB for very recent Linux kernels
    • ~2 GB for less recent Linux kernels
  • Solaris
    • ~3.3 GB
The actual maximums vary due to the memory address space consumed by both a given Java application and a JVM version.

On Solaris, there are three compilers for the VM:[4,5]
  • -client
  • -server
  • -d64
They are all installed in the same binary (or same Java Home) and you can switch between them with the above VM options.

On Linux, if you install the 32-bit binary, it has the -client and -server flavors. If you install the 64-bit binary, it has only the -d64 flavor. So, if you pass the -d64 option to a 64-bit Linux binary, it is a no-op.

For example, I have a 64-bit binary installed on my Linux box.  Here is my JDK and JVM version information:

  java version "1.7.0_04-ea"
  Java(TM) SE Runtime Environment (build 1.7.0_04-ea-b17)
  Java HotSpot(TM) 64-Bit Server VM (build 23.0-b18, mixed mode)

Note that the wording is "64-Bit Server VM" because there is no such thing as a 64-bit client VM on either Linux or Solaris.
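
The same information can be checked from inside a running JVM. This is a minimal sketch: `sun.arch.data.model` is a HotSpot-specific system property rather than part of the Java specification, so the code falls back to "unknown" when it is absent, and the class name is hypothetical.

```java
class JvmBitness {

    /** Returns "32", "64", or "unknown"; sun.arch.data.model is HotSpot-specific. */
    static String dataModel() {
        String model = System.getProperty("sun.arch.data.model");
        return ("32".equals(model) || "64".equals(model)) ? model : "unknown";
    }

    public static void main(String[] args) {
        System.out.println("java.vm.name = " + System.getProperty("java.vm.name"));
        System.out.println("os.arch      = " + System.getProperty("os.arch"));
        System.out.println("data model   = " + dataModel());
    }
}
```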

References

  1. Unable to load performance pack (Solaris 10)
  2. Java Performance by Charlie Hunt and Binu John
  3. weblogic.nodemanager.common.ConfigException Native version is enabled but node manager native library could not be loaded : NativeVersionEnabled
    • If you're switching from 32 bit JVM to 64 bit JVM  in WebLogic Server environment, the challenge is that the WLS native libraries installed are based on the JVM used to do the installation.  That means the native libraries will be 32 bit libraries that do not work with a 64 bit JVM.  This will also have performance/scalability implications. 
  4. Installing WebLogic Server on 64-Bit Platforms Using a 64-Bit JDK
    • (UNIX or Linux only) Include the -d64 flag in the installation command when using a 32/64-bit hybrid JDK (such as for the HP-PA, HPIA, and Solaris64 platforms). 
  5. Which JDK is my FMW 11g WebLogic Domain Configured to Use?
    • For 32/64-bit-hybrid JVMs on some platforms, you would have to include the -d64 flag to tell the JVM to run in 64-bit mode.

Friday, September 14, 2012

HotSpot Performance Option — SurvivorRatio

As shown in [1], -XX:SurvivorRatio has been classified as a performance option on HotSpot VM.

In this article, we will discuss how this option works and how it can impact an application's performance.

Survivor Spaces


The young generation space in all HotSpot garbage collectors is subdivided into an eden space and two survivor spaces.  Eden space is where new Java objects are allocated.  One of the survivor spaces is labeled the “from” survivor space, and the other survivor space is labeled the “to” survivor space.

If during a minor garbage collection, the “to” survivor space is not large enough to hold all of the live objects being copied from the eden space and the “from” survivor space, the overflow will be promoted to the old generation space. Overflowing into the old generation space may cause the old generation space to grow more quickly than desired and result in an eventual stop-the-world compacting full garbage collection.

If you are interested in learning more about garbage collectors, read the excellent book "Java Performance."


-XX:SurvivorRatio


For the following discussions, we will use the following HotSpot command options:
  • -server -Xms2560m -Xmx2560m -Xmn512m -XX:SurvivorRatio=4
In other words, we have set the young generation size to 512MB and the survivor ratio to 4.  Since we didn't specify which garbage collector to use, the default GC (i.e., -XX:+UseParallelGC) is selected.

When you set the SurvivorRatio to be 4, it means that:
  • The ratio of eden/survivor space size will be 4. The default value is 8.
Below we will see how the setting of SurvivorRatio will impact the sizes of  survivor spaces.

If InitialSurvivorRatio or MinSurvivorRatio is not specified but SurvivorRatio has been set, their values are set to:
  • SurvivorRatio + 2
Then the calculation is as follows (note that young_gen_size is the value specified with -Xmn):

  size_t survivor_size = young_gen_size / InitialSurvivorRatio;
  size_t eden_size = young_gen_size - (2 * survivor_size);


So, for SurvivorRatio = 4, different spaces are derived as follows:

 InitialSurvivorRatio = SurvivorRatio + 2 = 6
 survivor_size = 512m / 6 ≈ 87381K, which the VM rounds down to 87360K, a multiple of 64K  (Note that young_gen_size = 512m = 524288K)
 eden = young_gen_size - 2 * survivor_size = 512m - 2 * 87360K = 349568K
 young_gen_total (as reported in the GC output below) = eden + 1 * survivor = 436928K
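
The arithmetic above can be reproduced with a few lines of code. This is a sketch of the calculation only, not HotSpot's source: the 64K alignment is an assumption chosen because it reproduces the sizes reported in the GC log, and the class and method names are hypothetical.

```java
class SurvivorSizing {

    // Alignment assumed for this sketch; 64K reproduces the sizes in the GC log.
    static final long ALIGN_K = 64;

    static long alignDown(long valueK) {
        return (valueK / ALIGN_K) * ALIGN_K;
    }

    /** Survivor space size in KB for a young generation of youngK KB. */
    static long survivorK(long youngK, int survivorRatio) {
        int initialSurvivorRatio = survivorRatio + 2;
        return alignDown(youngK / initialSurvivorRatio);
    }

    /** Eden size in KB: whatever is left after the two survivor spaces. */
    static long edenK(long youngK, int survivorRatio) {
        return youngK - 2 * survivorK(youngK, survivorRatio);
    }

    public static void main(String[] args) {
        long youngK = 512L * 1024; // -Xmn512m
        int ratio = 4;             // -XX:SurvivorRatio=4
        long survivor = survivorK(youngK, ratio);
        long eden = edenK(youngK, ratio);
        System.out.println("survivor = " + survivor + "K");
        System.out.println("eden     = " + eden + "K");
        System.out.println("reported = " + (eden + survivor) + "K");
    }
}
```

Changing `ratio` shows how quickly the survivor spaces shrink as SurvivorRatio grows.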

How to Verify?


When we started the application, we specified the following GC-related print options:
  • -Xloggc:/<your_path>/logs/gc_0.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintReferenceGC 

After the application (i.e., a benchmark in our case) ends, you can find the following printouts at the end of gc_0.log:

Heap
PSYoungGen total 436928K, used 64296K [0x00000007e0000000, 0x0000000800000000, 0x0000000800000000) 
  eden space 349568K, 15% used [0x00000007e0000000,0x00000007e3571608,0x00000007f5560000)
  from space 87360K, 10% used [0x00000007f5560000,0x00000007f5eb8cd0,0x00000007faab0000)
  to space 87360K, 0% used [0x00000007faab0000,0x00000007faab0000,0x0000000800000000)
ParOldGen total 2097152K, used 1457708K 
[0x0000000760000000, 0x00000007e0000000, 0x00000007e0000000) 
  object space 2097152K, 69% used [0x0000000760000000,0x00000007b8f8b2a8,0x00000007e0000000)
PSPermGen total 393216K, used 246405K 
[0x0000000748000000, 0x0000000760000000, 0x0000000760000000) 
  object space 393216K, 62% used [0x0000000748000000,0x00000007570a14c0,0x0000000760000000)

As you can see:
  • eden space / from space = 349568K / 87360K ≈ 4
  • PSYoungGen total = 349568K + 87360K = 436928K
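
The check can be automated against the heap summary at the end of the log. The sketch below assumes the line layout shown in the printout above ("eden space 349568K, ..."); the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HeapSummaryCheck {

    // Matches the young generation lines, e.g. "  eden space 349568K, 15% used ..."
    private static final Pattern SPACE =
        Pattern.compile("^\\s*(eden|from|to)\\s+space\\s+(\\d+)K");

    /** Extracts the eden/from/to sizes (in KB) from heap summary lines. */
    static Map<String, Long> youngSpaces(List<String> logLines) {
        Map<String, Long> sizes = new HashMap<>();
        for (String line : logLines) {
            Matcher m = SPACE.matcher(line);
            if (m.find()) {
                sizes.put(m.group(1), Long.parseLong(m.group(2)));
            }
        }
        return sizes;
    }

    /** Ratio of eden to one survivor space. */
    static double edenToSurvivor(Map<String, Long> sizes) {
        return (double) sizes.get("eden") / sizes.get("from");
    }

    public static void main(String[] args) {
        List<String> summary = Arrays.asList(
            "  eden space 349568K, 15% used",
            "  from space 87360K, 10% used",
            "  to space 87360K, 0% used");
        Map<String, Long> sizes = youngSpaces(summary);
        System.out.printf("eden/survivor = %.3f%n", edenToSurvivor(sizes));
    }
}
```

The ratio comes out slightly above 4 because the survivor size was rounded down before eden was computed.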

Why Is It a Performance Option?


Given limited physical memory, you can only increase the heap up to a certain size.  After that, you can tune the survivor ratio to see whether short-lived objects can be given more time to die in the young generation (that is, whether fewer of them are promoted prematurely into the old generation), which could help overall response time.

Bottom line: larger survivor spaces give short-lived objects more time to die in the young generation, which helps the application's response time (in other words, there will be fewer long-pausing full garbage collections).

Acknowledgement


Some of the material here is based on feedback from Jon Masamitsu and Scott Oaks. However, the author assumes full responsibility for the content.

References

  1. Java HotSpot VM Options
  2. The Fault with Defaults
  3. Garbage Collector Ergonomics
  4. Java Performance by Charlie Hunt and Binu John

Wednesday, September 12, 2012

A Case Study of Using Tiered Compilation in HotSpot

HotSpot has two JIT compilers, c1 (i.e., the client JIT) and c2 (i.e., the server JIT).[1] The client JIT starts fast but performs fewer optimizations, so it is used for GUI applications. The server JIT starts more slowly but produces highly optimized code. The idea of tiered compilation[2] is to get the best of both compilers: code is first compiled with c1 and then, if it turns out to be really hot, recompiled with c2.

The tiered server runtime is enabled with the following HotSpot VM options:
  • -server -XX:+TieredCompilation
In this article, we show a case study of using HotSpot (build 23.0-b18, mixed mode) with TieredCompilation off/on.
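
Before comparing runs, it is worth confirming which setting a given JVM was actually started with. The sketch below inspects the JVM's input arguments through the standard `RuntimeMXBean`; it only detects an explicit flag, and it assumes that the last occurrence wins when the flag is repeated. The class and method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

class TieredFlag {

    /** Returns TRUE/FALSE for the last explicit flag, or null when none was passed. */
    static Boolean explicitSetting(List<String> jvmArgs) {
        Boolean setting = null;
        for (String arg : jvmArgs) {
            if (arg.equals("-XX:+TieredCompilation")) {
                setting = Boolean.TRUE;
            } else if (arg.equals("-XX:-TieredCompilation")) {
                setting = Boolean.FALSE;
            }
        }
        return setting;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        Boolean setting = explicitSetting(jvmArgs);
        if (setting == null) {
            System.out.println("TieredCompilation not set explicitly (VM default applies)");
        } else {
            System.out.println("TieredCompilation explicitly " + (setting ? "enabled" : "disabled"));
        }
    }
}
```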

Comparison

You can tune the JVM's performance by tuning either memory management or code generation.  Here we have tuned code generation by turning tiered compilation on.  With tiered compilation on, more classes are compiled (or compiled more efficiently), so they execute faster (for example, application server CPU usage improved by 15%).

Using the ATG CRM Demo benchmark, we saw the following KPI changes:

  KPI                       -XX:-TieredCompilation   -XX:+TieredCompilation   % Change
  Total Footprint           4431MB                   4686MB                   -5.4%
  Application Server CPU    23%                      20%                      +15%
  Average Response Time     0.234                    0.217                    +7.8%
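
The % Change figures appear to be consistent with treating every KPI as lower-is-better and expressing the difference relative to the value measured with tiered compilation on. That formula is inferred from the numbers, not stated in the article, and the class and method names below are hypothetical.

```java
class KpiChange {

    /** Improvement in percent, relative to the 'after' value, for lower-is-better metrics. */
    static double improvementPct(double before, double after) {
        return (before - after) / after * 100.0;
    }

    /** Rounds to one decimal place. */
    static double round1(double value) {
        return Math.round(value * 10.0) / 10.0;
    }

    public static void main(String[] args) {
        System.out.println("Total Footprint:        " + round1(improvementPct(4431, 4686)) + "%");
        System.out.println("Application Server CPU: " + round1(improvementPct(23, 20)) + "%");
        System.out.println("Average Response Time:  " + round1(improvementPct(0.234, 0.217)) + "%");
    }
}
```

Under this convention a negative value means the KPI got worse, which is why the larger footprint shows up as -5.4%.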

Notes


  1. Here is our setting for ReservedCodeCacheSize[2]:
    -TieredCompilation: 128MB
    +TieredCompilation: 256MB
  2. When we turned on tiered compilation, we also reserved a larger code cache for it (i.e., 256 MB vs. 128 MB).  The code cache is allocated out of native memory (vs. the heap).  The total footprint shown in the table includes native memory, so the total footprint is larger when tiered compilation is turned on.

References

  1. HotSpot Glossary of Terms
  2. Performance Tuning with Hotspot VM Option: -XX:+TieredCompilation (Xml and More)

Sunday, September 9, 2012

When to use -Xbootclasspath on HotSpot?

As Ted Neward described in his article[1], you can use -Xbootclasspath to tweak the Java Runtime API.  For example, we are evaluating a new ArrayList implementation and would like to benchmark its performance.  So, we specify
  • -Xbootclasspath/p:/data/patches/NewArrayList.jar
to load the new ArrayList class from someplace other than the rt.jar file in the jre/lib directory.

-Xbootclasspath


At start-up, the JVM loads its internal classes and the java.* packages from the default boot class path.  However, the Java runtime environment is very configurable.  For example, you can use -Xbootclasspath to replace, append to, or prepend to the default boot class path using the following options:

  • -Xbootclasspath:bootclasspath 
    • Specify a list of directories, JAR archives, and ZIP archives to search for boot class files, separated by ; (Windows) or : (Linux and Solaris)[3]. These are used in place of the boot class files included in the Java 2 SDK.
    • Note: Applications that use this option for the purpose of overriding a class in rt.jar should not be deployed as doing so would contravene the Java 2 Runtime Environment binary code license. 
  • -Xbootclasspath/a:path 
    • Specify a path of directories, JAR archives, and ZIP archives, using the same separator, to append to the default bootstrap class path. 
  • -Xbootclasspath/p:path 
    • Specify a path of directories, JAR archives, and ZIP archives, using the same separator, to prepend in front of the default bootstrap class path.
    • Note: Applications that use this option for the purpose of overriding a class in rt.jar should not be deployed as doing so would contravene the Java 2 Runtime Environment binary code license.
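
From inside the JVM, you can ask where the class file for a given class would be found. The sketch below uses `ClassLoader.getSystemResource`, which consults the bootstrap path before the application class path; the `sun.boot.class.path` property it prints is HotSpot-specific and is absent on newer JDKs, so treat its presence as an assumption. The class name is hypothetical.

```java
import java.net.URL;

class ClassOrigin {

    /** Returns the URL where the class file is found, or null if it cannot be located. */
    static URL locate(String className) {
        String resource = className.replace('.', '/') + ".class";
        return ClassLoader.getSystemResource(resource);
    }

    public static void main(String[] args) {
        System.out.println("java.util.ArrayList found at: " + locate("java.util.ArrayList"));
        // May print null on JDKs that no longer define this property.
        System.out.println("sun.boot.class.path = " + System.getProperty("sun.boot.class.path"));
    }
}
```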

How to Verify


To verify the effect of -Xbootclasspath, you can use the following option:
  • -verbose:class
Using the above example, you can find the following output from WebLogic Server's log file[see Note 1]:

[Opened /data/patches/NewArrayList.jar]
[Opened /data/JVMs/nmt_test/jre/lib/alt-rt.jar]
[Opened /data/JVMs/nmt_test/jre/lib/rt.jar]
[Loaded java.lang.Object from /data/JVMs/nmt_test/jre/lib/rt.jar]
[Loaded java.io.Serializable from /data/JVMs/nmt_test/jre/lib/rt.jar] ...
[Loaded java.lang.NoSuchMethodError from /data/JVMs/nmt_test/jre/lib/rt.jar]
[Loaded java.util.ArrayList from /data/patches/NewArrayList.jar]
[Loaded java.util.Collections from /data/JVMs/nmt_test/jre/lib/rt.jar]

From the lines above, you can see that java.util.ArrayList was indeed loaded from the new jar file (i.e., NewArrayList.jar).

In summary, to diagnose any class loading issue, you can use -verbose:class.  There are other useful options which enable verbose output:
  • -verbose[:class|gc|jni]

Note


  1. We have started WebLogic Server with the following line:
    • bin/startManagedWebLogic.sh CRMDemo_server1 http://myserver:7001 > logs/CRMDemo_server1.log 2>&1 < /dev/null &

    • In other words, we have redirected stdout and stderr from the WLS window to CRMDemo_server1.log

References

  1. Using the BootClasspath--Tweaking the Java Runtime API
  2. WebLogic's Classloading Framework
  3. Oracle® JRockit Command-Line Reference Release R28
    • -Xbootclasspath directories and zips/jars separated by ; (Windows) or : (Linux and Solaris)
  4. java - the Java application launcher