Xml and More: Monitoring WebLogic Server Thread Pool at Runtime

There are two critical areas for WebLogic Server performance tuning:

Thread management
Network I/O tuning

Before tuning anything, you need to identify where the bottlenecks are. In this article, we focus on thread management, and specifically on monitoring its health at run time.

Thread Pool

The thread pool implementation changed in WLS 9.0:

Previous versions

- Multiple pools of threads
- See [7] on how to use the WebLogic 8.1 thread pool model for backward compatibility

WLS 9.0 and above

- A single, dynamically sized pool of threads (the self-tuning thread pool)
- Self-tuning Work Manager [8,9]

Newer versions of WebLogic Server use a single thread pool in which all types of work are executed. Here is a summary of how the new implementation works:

WebLogic Server prioritizes work and allocates threads based on an execution model that takes into account:
- Administrator-defined parameters
- Actual run-time performance
The common thread pool changes its size automatically to maximize throughput.

This new strategy makes it easier for administrators to allocate processing resources and manage performance, avoiding the effort and complexity involved in configuring, monitoring, and tuning custom execute queues.
The queue monitors throughput over time and, based on that history, determines whether to adjust the thread count. For example:

- If historical throughput statistics indicate that a higher thread count increased throughput, WebLogic increases the thread count.
- If statistics indicate that fewer threads did not reduce throughput, WebLogic decreases the thread count.

In general, the self-tuning thread pool changes its size automatically to maximize throughput, so in normal cases there is nothing you need to do aside from monitoring it to understand the behavior of your server under different types of load.
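
To make the adjustment rule described above concrete, here is a minimal hill-climbing sketch in Java. It is only an illustration of the idea (grow while more threads help, shrink while fewer threads do not hurt) and is not WebLogic's actual algorithm; the class name, the method name, and the step size of one thread are all assumptions made for this example.

```java
/** Illustrative sketch only; NOT WebLogic's real self-tuning implementation. */
class SelfTuningSketch {

    /**
     * Picks the next thread count from two consecutive samples.
     * (prevThreads, prevThroughput) is the older sample,
     * (curThreads, curThroughput) is the newer one.
     * The result is clamped to the range [min, max].
     */
    static int nextThreadCount(int prevThreads, double prevThroughput,
                               int curThreads, double curThroughput,
                               int min, int max) {
        int next = curThreads; // no recent change in size: hold steady
        if (curThreads > prevThreads) {
            // The last move added threads: keep growing only if throughput improved.
            next = curThroughput > prevThroughput ? curThreads + 1 : curThreads - 1;
        } else if (curThreads < prevThreads) {
            // The last move removed threads: keep shrinking if throughput did not drop.
            next = curThroughput >= prevThroughput ? curThreads - 1 : curThreads + 1;
        }
        return Math.max(min, Math.min(max, next));
    }
}
```

For example, going from 10 to 11 threads while throughput rose from 100 to 120 requests per second would suggest trying 12 threads next. With `min` equal to `max`, the sketch never moves, which mirrors what pinning the pool size does.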

WebLogic Server Monitoring Dashboard

From the Oracle WebLogic Administration Console, you can navigate to

Servers | SalesServer_1 | Monitoring | Threads

to view key statistics for the thread pool and its individual threads.

Everything from Active Execute Threads to Min Threads Constraints Completed is shown on this page.

From the dashboard, you can monitor the thread pool runtime in real time. The KPIs to watch include [2]:

Hogging Thread Count

The threads that are being held by a request right now. These threads will either be declared as stuck after the configured timeout or will return to the pool before that. The self-tuning mechanism will backfill if necessary.

Pending User Request Count

The number of pending user requests in the priority queue. The priority queue contains requests from internal subsystems and users. This is just the count of all user requests.

Queue Length

The number of requests waiting in the queue because no idle thread is available.

In a low-usage environment, all three values should ideally hover around zero.

Another KPI worth monitoring is

Throughput

The mean number of requests completed per second, reported as a single value.

The higher this value, the better.
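
The same KPIs can also be read programmatically over JMX. The sketch below assumes you already hold an `MBeanServerConnection` to the server's runtime MBean server (obtaining one needs WebLogic client libraries and is not shown), and that the thread pool runtime MBean is registered under the object name pattern used here; verify both against your installation. The class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

class ThreadPoolKpis {
    // Attributes of the thread pool runtime MBean that match the KPIs above.
    static final String[] ATTRIBUTES = {
        "HoggingThreadCount", "PendingUserRequestCount", "QueueLength", "Throughput"
    };

    /** Builds the (assumed) object name of the thread pool runtime MBean. */
    static ObjectName threadPoolRuntimeName(String serverName) {
        try {
            return new ObjectName("com.bea:ServerRuntime=" + serverName
                    + ",Name=ThreadPoolRuntime,Type=ThreadPoolRuntime");
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Invalid server name: " + serverName, e);
        }
    }

    /** Reads the KPI attributes through an existing JMX connection. */
    static Map<String, Number> read(MBeanServerConnection conn, String serverName)
            throws Exception {
        ObjectName name = threadPoolRuntimeName(serverName);
        Map<String, Number> kpis = new LinkedHashMap<>();
        for (String attribute : ATTRIBUTES) {
            kpis.put(attribute, (Number) conn.getAttribute(name, attribute));
        }
        return kpis;
    }

    /** Lists the backlog KPIs that are above zero; missing values are skipped. */
    static List<String> nonZeroBacklog(Map<String, Number> kpis) {
        List<String> flagged = new ArrayList<>();
        String[] backlog = {"HoggingThreadCount", "PendingUserRequestCount", "QueueLength"};
        for (String attribute : backlog) {
            Number value = kpis.get(attribute);
            if (value != null && value.longValue() > 0) {
                flagged.add(attribute + "=" + value);
            }
        }
        return flagged;
    }
}
```

Calling `ThreadPoolKpis.read(conn, "SalesServer_1")` on a schedule and logging the output of `nonZeroBacklog` gives a simple view of whether the pool keeps up with the load. Throughput is deliberately left out of the check because it has no "around zero" target.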

Pool Size

In some cases, you do want to constrain the pool size. For example, in our Fusion Applications benchmarks, our area of interest is the JVM (i.e., not the upper layers). There we want to reduce the variation introduced by self-tuning, so we pin the thread pool size with:

-Dweblogic.threadpool.MinPoolSize=32 -Dweblogic.threadpool.MaxPoolSize=32

The general rule [4] for pool sizing or other kinds of tuning is to start with no specific tuning and then configure Work Managers only to address specific problems that might arise. Aggressively configuring Work Managers for a specific environment can end up hurting performance when your application, workload, or underlying system changes.

Stuck Threads

If an execute thread is being hogged by a request for much longer than the normal execution time (as observed automatically by the scheduler), it is declared a hogger. Such a thread will either be declared stuck after the configured timeout (by default, 10 minutes of processing time) or return to the pool before that.

If any thread's state becomes STUCK, it is time to investigate: does the stuck thread ever recover, or does it stay stuck indefinitely? By default, Oracle Fusion Apps is configured to generate an incident when a STUCK thread is detected. You can find the incidents here:

<DOMAIN_HOME>/servers/<server_name>/adr/diag/ofm/<domain_name>/<server_name>/incident

The incident contains some key diagnostic information that can be used to help understand why the request took so long.

What to Expect When Things Work Normally

The trace below shows what an execute thread in the WebLogic Server self-tuning pool looks like when there is nothing for it to do.

"[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'" id=15 idx=0x3c tid=3810 prio=5 alive, waiting, native_blocked, daemon
    -- Waiting for notification on: weblogic/work/ExecuteThread@0xa0c21480[fat lock]
    at jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native Method)
    at java/lang/Object.wait(J)V(Native Method)
    at java/lang/Object.wait(Object.java:485)
    at weblogic/work/ExecuteThread.waitForRequest(ExecuteThread.java:162)
    ^-- Lock released while waiting: weblogic/work/ExecuteThread@0xa0c21480[fat lock]
    at weblogic/work/ExecuteThread.run(ExecuteThread.java:183)
    at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
    -- end of trace
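
When scanning a large thread dump, it can help to pull the state tag, thread index, and queue name out of each execute thread header line. The following is a small sketch that assumes header lines shaped like the one above; the class name and regular expression are written for this example, and other dump formats may need a different pattern. A stuck thread would be expected to carry STUCK where this trace shows ACTIVE.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExecuteThreadHeader {
    // Matches: "[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'"
    private static final Pattern HEADER = Pattern.compile(
            "^\"\\[(\\w+)\\] ExecuteThread: '(\\d+)' for queue: '([^']+)'\"");

    final String state; // e.g. ACTIVE
    final int index;    // the number between the single quotes
    final String queue; // e.g. weblogic.kernel.Default (self-tuning)

    private ExecuteThreadHeader(String state, int index, String queue) {
        this.state = state;
        this.index = index;
        this.queue = queue;
    }

    /** Returns the parsed header, or null if the line is not an execute thread header. */
    static ExecuteThreadHeader parse(String line) {
        Matcher m = HEADER.matcher(line.trim());
        if (!m.find()) {
            return null;
        }
        return new ExecuteThreadHeader(m.group(1), Integer.parseInt(m.group(2)), m.group(3));
    }
}
```

Feeding every line of a dump through `parse` and counting the non-null results by `state` gives a quick tally of how many execute threads are in each state.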

The thread below is a timer thread waiting to be notified that it is time to wake up and do whatever it is supposed to do:

"JFR request timer" id=16 idx=0x40 tid=3811 prio=5 alive, waiting, native_blocked, daemon
    -- Waiting for notification on: java/util/TaskQueue@0xa0c20b28[fat lock]
    at jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native Method)
    at java/lang/Object.wait(J)V(Native Method)
    at java/lang/Object.wait(Object.java:485)
    at java/util/TimerThread.mainLoop(Timer.java:483)
    ^-- Lock released while waiting: java/util/TaskQueue@0xa0c20b28[fat lock]
    at java/util/TimerThread.run(Timer.java:462)
    at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
    -- end of trace

What state is the “main” thread in? It should look something like this one:

"main" prio=6 tid=0x000000000224f000 nid=0x2e64 runnable [0x00000000023de000]
   java.lang.Thread.State: RUNNABLE
        at weblogic.i18n.Localizer.prune(Localizer.java:358)
        at weblogic.i18n.Localizer.getObject(Localizer.java:164)
        at weblogic.i18n.Localizer.getDiagnosticVolume(Localizer.java:344)
        at weblogic.i18n.logging.CatalogMessage.<init>(CatalogMessage.java:53)
        at weblogic.kernel.T3SrvrLogger.logServerStateChange(T3SrvrLogger.java:2084)
        at weblogic.t3.srvr.T3Srvr.setState(T3Srvr.java:211)
        - locked <0x00000000e0a30d78> (a weblogic.t3.srvr.T3Srvr)
        at weblogic.t3.srvr.T3Srvr.initializeAdmin(T3Srvr.java:921)
        at weblogic.t3.srvr.T3Srvr.startup(T3Srvr.java:589)
        at weblogic.t3.srvr.T3Srvr.run(T3Srvr.java:471)
        at weblogic.Server.main(Server.java:74)
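
If you would rather check a thread's state from code than from a dump, the standard `Thread` API exposes the same `java.lang.Thread.State` values. The helper below is a sketch that only sees threads of the JVM it runs in, so it would have to be deployed inside the server to inspect WebLogic's own threads; the class and method names are hypothetical.

```java
import java.util.Map;

class ThreadStates {
    /** Returns the state of the first live thread with the given name, or null if none exists. */
    static Thread.State stateOf(String threadName) {
        Map<Thread, StackTraceElement[]> liveThreads = Thread.getAllStackTraces();
        for (Thread thread : liveThreads.keySet()) {
            if (thread.getName().equals(threadName)) {
                return thread.getState();
            }
        }
        return null;
    }
}
```

For instance, `ThreadStates.stateOf("main")` returns the state of the thread named "main" if one is alive in that JVM.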

References

[1] Oracle® Fusion Applications Performance and Tuning Guide 11g Release 1 (11.1.4)
[2] Oracle SOA Suite 11g Administrator's Handbook
[3] Tuning WebLogic Server
[4] Oracle WebLogic Server 11g Administration Handbook
[5] Controlling Thread Pool Size in WebLogic Server
[6] Understanding JVM Thread States
[7] Using the WebLogic 8.1 Thread Pool Model
    - Describes how to use and tune WebLogic 8.1 thread pools
[8] Using Work Managers to Optimize Scheduled Work
[9] Understanding WebLogic Work Manager
[10] Oracle® Fusion Middleware Tuning Performance of Oracle WebLogic Server 12c (12.2.1)
[11] Data Source Connection Pool Sizing (The WebLogic Server Blog)

Saturday, September 29, 2012
