Saturday, September 29, 2012

Monitoring WebLogic Server Thread Pool at Runtime

There are two critical areas for WebLogic Server performance tuning:
  • Thread management
  • Network I/O tuning
Before tuning anything, you first need to identify where the bottlenecks are. This article focuses on thread management, and specifically on monitoring its health at run time.

Thread Pool


The thread pool implementation changed in WLS 9.0:
  • Previous versions
    • Multiple pools of threads 
    • See [7] on how to use the WebLogic 8.1 thread pool model for backward compatibility
  • WLS 9.0 and above
    • A single dynamically sized pool of threads (or self-tuning thread pool)
    • Self-tuning work manager[8,9]
WebLogic Server 9.0 and later use a single thread pool in which all types of work are executed. Here is a summary of how this implementation works:
  • WebLogic Server prioritizes work and allocates threads based on an execution model that takes into account 
    • Administrator-defined parameters 
    • Actual run-time performance 
      • Throughput
  • The common thread pool changes its size automatically to maximize throughput.
    • This strategy makes it easier for administrators to allocate processing resources and manage performance, and it avoids the effort and complexity of configuring, monitoring, and tuning custom execute queues.
    • The queue monitors throughput over time and, based on that history, determines whether to adjust the thread count. For example:
      • The thread count is increased when historical throughput statistics indicate that a higher thread count increased throughput.
      • The thread count is reduced when the statistics indicate that fewer threads did not reduce throughput.
In general, the self-tuning thread pool changes its size automatically to maximize throughput, so in normal cases there is nothing you need to do aside from monitoring it to understand the behavior of your server under different types of load.
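
The adjustment rule described above can be pictured with a toy model. This is only a sketch under stated assumptions: the class name SelfTuningSketch, the one-thread step size, and the idea of feeding it one throughput sample per interval are all made up for illustration, and it is not WebLogic's actual algorithm.

```java
/** Toy model of a throughput-driven pool sizer (hypothetical, not WebLogic code). */
final class SelfTuningSketch {
    private final int min;
    private final int max;
    private int threadCount;
    private int lastDelta = 1;            // direction of the last adjustment; start by probing upward
    private double lastThroughput = -1.0; // negative means "no sample seen yet"

    SelfTuningSketch(int min, int max, int initial) {
        this.min = min;
        this.max = max;
        this.threadCount = Math.max(min, Math.min(max, initial));
    }

    /** Feeds one throughput sample (requests/second) and returns the new thread count. */
    int onSample(double throughput) {
        if (lastThroughput >= 0) {
            if (lastDelta > 0) {
                // Last step added a thread: keep growing only if throughput went up.
                lastDelta = throughput > lastThroughput ? 1 : -1;
            } else {
                // Last step removed a thread: keep shrinking unless throughput dropped.
                lastDelta = throughput < lastThroughput ? 1 : -1;
            }
        }
        lastThroughput = throughput;
        threadCount = Math.max(min, Math.min(max, threadCount + lastDelta));
        return threadCount;
    }

    public static void main(String[] args) {
        SelfTuningSketch pool = new SelfTuningSketch(1, 10, 4);
        double[] samples = {100, 120, 120, 120, 90};
        for (double s : samples) {
            System.out.println("throughput=" + s + " -> threads=" + pool.onSample(s));
        }
    }
}
```

The point of the model is only that the pool keeps moving in whichever direction the throughput history supports, which is why monitoring is usually all that is needed.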

WebLogic Server Monitoring Dashboard


From the Oracle WebLogic Administration Console, you can navigate to

  • Servers | SalesServer_1 | Monitoring | Threads 

to view important statistics related to thread pool and thread pool threads.

Everything from Active Execute Threads to Min Threads Constraints Completed is shown on this page.


From the dashboard, the thread pool runtime can be monitored in real time. The key KPIs to monitor include [2]:
  • Hogging Thread Count 
    • The threads that are being held by a request right now. These threads will either be declared as stuck after the configured timeout or will return to the pool before that. The self-tuning mechanism will backfill if necessary.
  • Pending User Request Count
    • The number of pending user requests in the priority queue. The priority queue contains requests from internal subsystems and users. This is just the count of all user requests.
  • Queue Length
    • The number of requests queued up when no idle threads are available.
In a low-usage environment, these values should ideally hover around zero.

Another KPI worth monitoring is
  • Throughput
    • Throughput is a single value that denotes the mean number of requests completed per second.
The higher this value, the better.
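
To make the KPIs above concrete, here is a small sketch of how one sampled set of values might be checked. The class ThreadPoolSnapshot and the thresholds passed to it are hypothetical; in a real deployment the numbers would presumably come from the server's thread pool runtime MBean (attributes such as HoggingThreadCount, PendingUserRequestCount, QueueLength, and Throughput), which this example does not connect to.

```java
import java.util.ArrayList;
import java.util.List;

/** One sample of the thread pool KPIs (hypothetical holder class for illustration). */
final class ThreadPoolSnapshot {
    final int hoggingThreadCount;
    final int pendingUserRequestCount;
    final int queueLength;
    final double throughput; // mean requests completed per second

    ThreadPoolSnapshot(int hogging, int pending, int queueLength, double throughput) {
        this.hoggingThreadCount = hogging;
        this.pendingUserRequestCount = pending;
        this.queueLength = queueLength;
        this.throughput = throughput;
    }

    /** Returns one message per KPI that exceeds its caller-chosen threshold. */
    List<String> warnings(int maxHogging, int maxPending, int maxQueue) {
        List<String> out = new ArrayList<>();
        if (hoggingThreadCount > maxHogging) out.add("hogging threads: " + hoggingThreadCount);
        if (pendingUserRequestCount > maxPending) out.add("pending user requests: " + pendingUserRequestCount);
        if (queueLength > maxQueue) out.add("queue length: " + queueLength);
        return out;
    }

    /** Mean throughput derived from two completed-request counts taken 'seconds' apart. */
    static double meanThroughput(long completedBefore, long completedAfter, double seconds) {
        return (completedAfter - completedBefore) / seconds;
    }

    public static void main(String[] args) {
        ThreadPoolSnapshot busy = new ThreadPoolSnapshot(3, 0, 12, 41.5);
        System.out.println(busy.warnings(0, 0, 0));
        System.out.println(meanThroughput(1000, 1600, 60.0) + " requests/second");
    }
}
```

Thresholds of zero match the "hovering around zero" guideline for a low-usage environment; a loaded system would likely need looser limits.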

Pool Size


In some cases, you do want to fix the range of the pool size. For example, in our Fusion Applications benchmarks, our area of interest is the JVM (i.e., not the upper layers). In that case, we want to reduce the variation introduced by thread pool self-tuning, so we pin the thread pool size with:
  • -Dweblogic.threadpool.MinPoolSize=32 -Dweblogic.threadpool.MaxPoolSize=32
The general rule[4] for pool sizing or other kinds of tuning is to start with no specific tuning and then configure Work Managers only to address specific problems that might arise. Aggressively configuring Work Managers for a specific environment can end up hurting performance when your application, workload, or underlying system changes.
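
When pinning the pool size with the two -D flags above, it can be handy to confirm that the JVM actually received them. The sketch below does that with the standard RuntimeMXBean API; the helper class PoolSizeFlags is hypothetical, and it only inspects the JVM it runs in, so it would have to run inside the server JVM to be meaningful.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.OptionalInt;

/** Looks for the pool size flags among the JVM's input arguments (illustrative helper). */
final class PoolSizeFlags {
    /** Finds -D<name>=<int> in the argument list; empty if absent or not an integer. */
    static OptionalInt find(List<String> jvmArgs, String name) {
        String prefix = "-D" + name + "=";
        for (String arg : jvmArgs) {
            if (arg.startsWith(prefix)) {
                try {
                    return OptionalInt.of(Integer.parseInt(arg.substring(prefix.length()).trim()));
                } catch (NumberFormatException e) {
                    return OptionalInt.empty();
                }
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        OptionalInt min = find(jvmArgs, "weblogic.threadpool.MinPoolSize");
        OptionalInt max = find(jvmArgs, "weblogic.threadpool.MaxPoolSize");
        if (min.isPresent() && max.isPresent()) {
            boolean pinned = min.getAsInt() == max.getAsInt();
            System.out.println("min=" + min.getAsInt() + " max=" + max.getAsInt() + " pinned=" + pinned);
        } else {
            System.out.println("Pool size flags not set; self-tuning range is left at its defaults.");
        }
    }
}
```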

Stuck Threads


If an execute thread is held by a request for much longer than the normal execution time (as automatically observed by the scheduler), it is declared a hogger. Such a thread will either be declared stuck after the configured timeout (by default, 10 minutes of processing time) or return to the pool before that.

If you find a thread whose state has become STUCK, it is time to start investigating: does the stuck thread ever recover, or does it stay stuck indefinitely? By default, Oracle Fusion Apps is configured to generate an incident when a STUCK thread is detected. You can find the incidents here:
  • <DOMAIN_HOME>/servers/<server_name>/adr/diag/ofm/<domain_name>/<server_name>/incident
The incident contains some key diagnostic information that can be used to help understand why the request took so long.
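
Besides the incident files, a thread dump can be scanned for execute threads by the state tag at the start of their names. The sketch below assumes that thread names carry a bracketed tag such as [ACTIVE] or [STUCK] before "ExecuteThread", as in the dumps shown in this article; the class ExecuteThreadStates and the sample dump text are made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Counts execute threads in a thread dump by their bracketed state tag (illustrative). */
final class ExecuteThreadStates {
    // Matches header lines such as: "[ACTIVE] ExecuteThread: '0' for queue: ...
    private static final Pattern HEADER =
            Pattern.compile("^\"\\[([A-Z]+)\\] ExecuteThread: '\\d+'");

    static Map<String, Integer> countByState(String dump) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : dump.split("\\R")) {
            Matcher m = HEADER.matcher(line.trim());
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical dump excerpt; only the header lines matter to the counter.
        String dump = String.join("\n",
                "\"[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'\" id=15",
                "    at java/lang/Object.wait(Object.java:485)",
                "\"[STUCK] ExecuteThread: '3' for queue: 'weblogic.kernel.Default (self-tuning)'\" id=22");
        Map<String, Integer> counts = countByState(dump);
        System.out.println(counts);
        if (counts.containsKey("STUCK")) {
            System.out.println("Stuck threads found; check the incident directory.");
        }
    }
}
```

Taking two dumps a few minutes apart and comparing the STUCK entries is one simple way to see whether a stuck thread ever recovers.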

What to Expect if Things Work Normally?


The thread below shows what an execute thread in the WebLogic Server self-tuning pool looks like when there is nothing for it to do.
"[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'" id=15 idx=0x3c tid=3810 prio=5 alive, waiting, native_blocked, daemon
    -- Waiting for notification on: weblogic/work/ExecuteThread@0xa0c21480[fat lock]
    at jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native Method)
    at java/lang/Object.wait(J)V(Native Method)
    at java/lang/Object.wait(Object.java:485)
    at weblogic/work/ExecuteThread.waitForRequest(ExecuteThread.java:162)
    ^-- Lock released while waiting: weblogic/work/ExecuteThread@0xa0c21480[fat lock]
    at weblogic/work/ExecuteThread.run(ExecuteThread.java:183)
    at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
    -- end of trace

The thread below is a timer thread waiting to be notified that it is time to wake up and do whatever it is supposed to do:
"JFR request timer" id=16 idx=0x40 tid=3811 prio=5 alive, waiting, native_blocked, daemon
    -- Waiting for notification on: java/util/TaskQueue@0xa0c20b28[fat lock]
    at jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native Method)
    at java/lang/Object.wait(J)V(Native Method)
    at java/lang/Object.wait(Object.java:485)
    at java/util/TimerThread.mainLoop(Timer.java:483)
    ^-- Lock released while waiting: java/util/TaskQueue@0xa0c20b28[fat lock]
    at java/util/TimerThread.run(Timer.java:462)
    at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
    -- end of trace

What state is the “main” thread in? It should look something like this one:
"main" prio=6 tid=0x000000000224f000 nid=0x2e64 runnable [0x00000000023de000]
   java.lang.Thread.State: RUNNABLE
        at weblogic.i18n.Localizer.prune(Localizer.java:358)
        at weblogic.i18n.Localizer.getObject(Localizer.java:164)
        at weblogic.i18n.Localizer.getDiagnosticVolume(Localizer.java:344)
        at weblogic.i18n.logging.CatalogMessage.<init>(CatalogMessage.java:53)
        at weblogic.kernel.T3SrvrLogger.logServerStateChange(T3SrvrLogger.java:2084)
        at weblogic.t3.srvr.T3Srvr.setState(T3Srvr.java:211)
        - locked <0x00000000e0a30d78> (a weblogic.t3.srvr.T3Srvr)
        at weblogic.t3.srvr.T3Srvr.initializeAdmin(T3Srvr.java:921)
        at weblogic.t3.srvr.T3Srvr.startup(T3Srvr.java:589)
        at weblogic.t3.srvr.T3Srvr.run(T3Srvr.java:471)
        at weblogic.Server.main(Server.java:74)
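
For a quick overview of the states seen in dumps like the ones above, the JVM can also be asked directly. The following is a minimal sketch using only the standard Thread API; the class ThreadStateSummary is hypothetical, and it reports on the JVM it runs in, not on a remote WebLogic Server.

```java
import java.util.EnumMap;
import java.util.Map;

/** Summarizes live threads of the current JVM by java.lang.Thread.State (illustrative). */
final class ThreadStateSummary {
    static Map<Thread.State, Integer> countByState() {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        // getAllStackTraces() returns a snapshot keyed by every live thread.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(t.getState(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // The calling thread is counted as RUNNABLE; idle pool and timer
        // threads typically show up as WAITING or TIMED_WAITING.
        System.out.println(countByState());
    }
}
```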

References

  1. Oracle® Fusion Applications Performance and Tuning Guide 11g Release 1 (11.1.4)
  2. Oracle SOA Suite 11g Administrator's Handbook
  3. Tuning WebLogic Server
  4. Oracle WebLogic Server 11g Administration Handbook
  5. Controlling Thread Pool Size in WebLogic Server
  6. Understanding JVM Thread States
  7. Using the WebLogic 8.1 Thread Pool Model
    • Describes how to use and tune WebLogic 8.1 thread pools
  8. Using Work Managers to Optimize Scheduled Work
  9. Understanding WebLogic Work Manager
  10. Oracle® Fusion Middleware Tuning Performance of Oracle WebLogic Server 12c (12.2.1)
  11. Data Source Connection Pool Sizing (The WebLogic Server Blog)
