Friday, October 26, 2012

HotSpot VM Performance Tuning Tips

In some cases, it may be obvious from benchmarking that parts of an application need to be rewritten using more efficient algorithms[1]. Sometimes it may be enough simply to provide a better runtime environment by tuning the JVM parameters.

In this article, we will show you some of the HotSpot VM performance tuning tips.

What to tune?


You can tune HotSpot performance on multiple fronts:
  • Code generation[8,10]
  • Memory management
In this article, we will focus more on memory management (i.e., the garbage collector).  The goals of tuning the garbage collector include:
  • To make a garbage collector operate efficiently, by
    • Reducing pause time or
    • Increasing throughput
  • To avoid heap fragmentation
    • Different garbage collectors use different compaction strategies to eliminate fragmentation
  • To make it scalable for multithreaded applications on multiprocessor systems
In this article, we will cover the following tuning options:
  1. Client VM or Server VM
  2. 32-bit VM or 64-bit VM
  3. GC strategy
  4. Heap sizing
  5. Further tuning

Client vs. Server VM


The HotSpot Client JVM has been specially tuned to reduce application startup time and memory footprint, making it particularly well suited for client environments. On all platforms, the HotSpot Client JVM is the default.

The Java HotSpot Server VM is similar to the HotSpot Client JVM except that it has been specially tuned to maximize peak operating speed. It is intended for long-running server applications, for which the fastest possible operating speed is generally more important than having the fastest startup time. To invoke the HotSpot Server JVM instead of the default HotSpot Client JVM, use the -server parameter; for example,
  • java -server MyApp
In [7], the authors mention a third HotSpot VM runtime named tiered.   If you are using Java 6 Update 25, Java 7, or later, you may consider using the tiered server runtime as a replacement for the client runtime.   For more details, read [8,10].
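If you want to experiment with the tiered runtime, the invocation might look like this (a sketch only; MyApp is a placeholder application name):

```shell
# Tiered compilation on the server runtime (Java 6u25+ / Java 7).
# MyApp is a placeholder; substitute your own main class.
java -server -XX:+TieredCompilation MyApp
```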

32-Bit or 64-Bit VM


The 32-bit JVM is the default for the HotSpot VM. The choice between a 32-bit and a 64-bit JVM is dictated by the memory footprint required by the application, by whether any third-party software used in the application supports 64-bit JVMs, and by whether there are any native components in the Java application. All native components using the Java Native Interface (JNI) in a 64-bit JVM must be compiled in 64-bit mode.

Running a 64-bit VM has the following advantages[7]:
  • Larger address space
  • Better performance on two fronts
    • 64-bit JVMs can make use of additional CPU registers
    • Helps avoid register spilling
      • Register spilling occurs when there is more live state (i.e., variables) in the application than the CPU has registers. 
and one disadvantage:
  • Increased width for oops
    • Results in fewer oops fitting on a CPU cache line, which decreases CPU cache efficiency. 
    • This negative performance impact can be mitigated by setting:
      • -XX:+UseCompressedOops VM command line option
Note that client runtimes are not available in 64-bit HotSpot VMs.   See [11] for more details.
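As a sketch (the -d64 flag selects the 64-bit VM on platforms that ship both; MyApp is a placeholder):

```shell
# 64-bit server VM with compressed ordinary object pointers (oops),
# which recovers some of the cache efficiency lost to wider oops.
java -d64 -server -XX:+UseCompressedOops MyApp
```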

GC Strategy


JVM performance is usually measured by its GC's effectiveness.  Garbage collection (GC) reclaims heap space previously allocated to objects that are no longer needed. The process of locating and removing those dead objects can stall your Java application and consume as much as 25 percent of throughput.

The Java HotSpot virtual machine includes five garbage collectors.[27] All the collectors are generational.
  • Serial Collector
    • Both young and old collections are done serially (using a single CPU), in a stop-the-world fashion.
    • The old and permanent generations are collected via a mark-sweep-compact collection algorithm. 
      • The sweep phase “sweeps” over the generations, identifying garbage. The collector then performs sliding compaction, sliding the live objects towards the beginning of the old generation space (and similarly for the permanent generation), leaving any free space in a single contiguous chunk at the opposite end.
    • When to use
      • For most applications that are run on client-style machines and that do not have a requirement for low pause times
    • How to select
      • In J2SE 5.0 and above, the serial collector is automatically chosen as the default garbage collector on machines that are not server-class machines. On other machines, the serial collector can be explicitly requested by using the -XX:+UseSerialGC command line option.
  • Parallel Collector (or throughput collector)
    • Young generation collection
      • Uses a parallel version of the young generation collection algorithm utilized by the serial collector
      • It is still a stop-the-world and copying collector, but performing the young generation collection in parallel, using many CPUs, decreases garbage collection overhead and hence increases application throughput.
    • Old generation collection
      • Uses the same serial mark-sweep-compact collection algorithm as the serial collector
    • When to use
      • For applications run on machines with more than one CPU that do not have pause time constraints, since infrequent, but potentially long, old generation collections will still occur. 
      • Examples of applications for which the parallel collector is often appropriate include those that do batch processing, billing, payroll, scientific computing, and so on.
    • How to select
      • In J2SE 5.0 and above, the parallel collector is automatically chosen as the default garbage collector on server-class machines. On other machines, the parallel collector can be explicitly requested by using the -XX:+UseParallelGC command line option.
  • Parallel Compacting Collector
      • Young generation collection
        • Uses the same algorithm as the parallel collector's young generation collection.
      • Old generation collection
        • The old and permanent generations are collected in a stop-the-world, mostly parallel fashion with sliding compaction
        • The collector utilizes three phases (see [5] for more details):
          • Marking phase
          • Summary phase
          • Compaction phase
      • When to use
        • For applications that are run on machines with more than one CPU and applications that have pause time constraints.
      • How to select
        • If you want the parallel compacting collector to be used, you must select it by specifying the command line option -XX:+UseParallelOldGC.
    • Concurrent Mark-Sweep (CMS) Collector[5,6]
        • Young generation collection
          • The CMS collector collects the young generation in the same manner as the parallel collector. 
        • Old generation collection
          • Most of the collection of the old generation using the CMS collector is done concurrently with the execution of the application. 
          • The CMS collector is the only collector that is non-compacting. That is, after it frees the space that was occupied by dead objects, it does not move the live objects to one end of the old generation. 
            • To minimize the risk of fragmentation, CMS performs statistical analysis of object sizes and keeps separate free lists for objects of different sizes.
        • When to use
          • For applications that need shorter garbage collection pauses and can afford to share processor resources with the garbage collector while the application is running. (Due to its concurrency, the CMS collector takes CPU cycles away from the application during a collection cycle.)
          • Typically, applications that have a relatively large set of long-lived data (a large old generation), and that run on machines with two or more processors, tend to benefit from the use of this collector. 
          • Compared to the parallel collector, the CMS collector decreases old generation pauses—sometimes dramatically—at the expense of slightly longer young generation pauses, some reduction in throughput, and extra heap size requirements.
        • How to select
          • If you want the CMS collector to be used, you must explicitly select it by specifying the command line option -XX:+UseConcMarkSweepGC.  
          • If you want it to be run in incremental mode, also enable that mode via the -XX:+CMSIncrementalMode option.
            • This feature is useful when applications that need the low pause times provided by the concurrent collector are run on machines with small numbers of processors (e.g., 1 or 2).
          • ParNewGC is the parallel young generation collector for use with CMS.  To choose it, you can specify the command line option -XX:+UseParNewGC.
      • Garbage-First (G1) Garbage Collector[31]
        • Summary
          • Differs from CMS in the following ways
            • Compacting
              • Reduce fragmentation and is good for long-running applications 
            • Heap is split into regions
              • Easy to allocate and resize
          • Evacuation pauses 
            • For both young and old regions
        • Young generation collection
          • During a young GC, survivors from the young regions are evacuated to either survivor regions or old regions
            • Done with stop-the-world evacuation pauses
            • Performed in parallel
        • Old generation collection
          • Some garbage objects in regions with a very high live ratio may be left in the heap and be collected later
          • Concurrent marking phase
            • Calculates liveness information per region
              • Empty regions can be reclaimed immediately
              • Identifies best regions for subsequent evacuation pauses
            • Remark is done with one stop-the-world pause, while the initial mark is piggybacked on an evacuation pause
            • No corresponding sweeping phase
            • Different marking algorithm than CMS
          • Old regions are reclaimed by
            • Evacuation pauses
              • Using compaction
              • Where most reclamation is done
            • Remark (when totally empty)
        • When to use
          • The G1 collector is a server-style garbage collector, targeted for multi-processor machines with large memories. It meets garbage collection (GC) pause time goals with high probability, while achieving high throughput. 
        • How to select
          • If you want the G1 garbage collector to be used, you must explicitly select it by specifying the command line option -XX:+UseG1GC.  
      Note that the difference between:
      • -XX:+UseParallelOldGC
      and
      • -XX:+UseParallelGC
      is that -XX:+UseParallelOldGC enables both a multithreaded young generation garbage collector and a multithreaded old generation garbage collector, that is, both minor garbage collections and full garbage collections are multithreaded. -XX:+UseParallelGC enables only a multithreaded young generation garbage collector. The old generation garbage collector used with -XX:+UseParallelGC is single threaded. 

      Using -XX:+UseParallelOldGC also automatically enables -XX:+UseParallelGC. Hence, if you want to use both a multithreaded young generation garbage collector and a multithreaded old generation garbage collector, you need only specify -XX:+UseParallelOldGC.

      Note that the above distinction between -XX:+UseParallelOldGC and -XX:+UseParallelGC is no longer true in JDK 7.  In JDK 7, the following three settings are equivalent:

      • Default
      • -XX:+UseParallelGC
      • -XX:+UseParallelOldGC
      They all use multithreaded collectors for both young generation and old generation.
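      Putting the selection options above together, example invocations might look like this (illustrative only; MyApp is a placeholder and the 200 ms pause goal is just a sample value):

```shell
java -XX:+UseSerialGC MyApp                           # serial collector
java -XX:+UseParallelOldGC MyApp                      # parallel young + old generations
java -XX:+UseConcMarkSweepGC -XX:+UseParNewGC MyApp   # CMS with parallel young collection
java -XX:+UseG1GC -XX:MaxGCPauseMillis=200 MyApp      # G1 with a soft pause-time goal
```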

      Heap Sizing


      If a heap size is small, collection will be fast but the heap will fill up more quickly, thus requiring more frequent collections. Conversely, a large heap will take longer to fill up and thus collections will be less frequent, but they may take longer.

      Command line parameters that divide the heap between new and old generations usually cause the greatest performance impact.  If you increase the new generation's size, you often improve the overall throughput; however, you also increase footprint, which may slow down servers with limited memory.
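      For example, a possible starting point might look like the following (the sizes are purely illustrative; MyApp is a placeholder):

```shell
# -Xms/-Xmx set initial and maximum total heap (equal values avoid resize pauses);
# -Xmn fixes the young generation size, which drives minor GC frequency.
java -Xms2g -Xmx2g -Xmn512m MyApp
```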

      For more details, you can read [12-15].

      Further Tuning


      HotSpot's default parameters are effective for most small applications that require fast startup and a small footprint.  However, more often than not, you will find that the default settings are not good enough and that your Java applications need further tuning.  As shown in [16], there are many VM options exposed that can be tuned by brave souls.  I'm not going to discuss such tunings in this article, but I'll keep posting articles on this blog about VM tunings.  Stay tuned!

      References

      1. Java Performance Tips
      2. Pick up performance with generational garbage collection
      3. Java HotSpot™ Virtual Machine Performance Enhancements
      4. Oracle JRockit
      5. Memory Management in the Java HotSpot™ Virtual Machine
      6. Understanding GC pauses in JVM, HotSpot's CMS collector
      7. Java Performance by Charlie Hunt and Binu John
      8. Performance Tuning with Hotspot VM Option: -XX:+TieredCompilation
      9. Java Tuning White Paper
      10. A Case Study of Using Tiered Compilation in HotSpot
      11. HotSpot VM Binaries: 32-Bit vs. 64-Bit
      12. HotSpot Performance Option — SurvivorRatio
      13. A Case Study of java.lang.OutOfMemoryError: GC overhead limit exceeded
      14. Understanding Garbage Collection
      15. Diagnosing Java.lang.OutOfMemoryError
      16. What Are the Default HotSpot JVM Values?
      17. Understanding Garbage Collector Output of Hotspot VM
      18. On Stack Replacement in HotSpot JVM
      19. Professional Oracle WebLogic Server by Robert Patrick, Gregory Nyberg, and Philip Aston
      20. Sun Performance and Tuning: Java and the Internet by Adrian Cockroft and Richard Pettit
      21. Concurrent Programming in Java: Design Principles and Patterns by Doug Lea
      22. Capacity Planning for Web Performance: Metrics, Models, and Methods by Daniel A. Menascé and Virgilio A.F. Almeida
      23. Java Performance Tuning (Michael Finocchiaro)
      24. Diagnosing Heap Stress in HotSpot
      25. Introduction to HotSpot JVM Performance and Tuning
      26. Tuning the JVM (video)
        • Frequency of minor GC dictated by:
          • Application object allocation rate
          • Size of Eden
        • Frequency of object promotion dictated by:
          • Frequency of minor GCs (tenuring)
          • Size of survivor spaces
        • Full GC Frequency dictated by
          • Promotion rate
          • Size of old generation
      27. JEP 173: Retire Some Rarely-Used GC Combinations
      28. G1 GC Glossary of Terms
      29. Learn More About Performance Improvements in JDK 8 
      30. Java SE HotSpot at a Glance
      31. Garbage-First Garbage Collector (JDK 8 HotSpot Virtual Machine Garbage Collection Tuning Guide)
      32. Tuning that was great in old JRockit versions might not be so good anymore
        • Trying to bring over each and every tuning option from a JR configuration to an HS one is probably a bad idea.
        • Even when moving between major versions of the same JVM, we usually recommend going back to the default (just pick a collector and heap size) and then redoing any tuning work from scratch (if even necessary).

      Monday, October 22, 2012

      Oracle releases new ADF Mobile

      Oracle ADF Mobile enables developers to build applications that install and run on both iOS and Android devices from one source code base.

      Development is done with JDeveloper and ADF and leverages Java and HTML5 technologies, while keeping the same visual and declarative approach ADF is known for.

      Redwood Shores, Calif. – October 22, 2012

      News Facts

      • Oracle today announced the general availability of Oracle Application Development Framework (ADF) Mobile, an extension of the Oracle Application Development Framework.
      • Part of Oracle Fusion Middleware, Oracle ADF Mobile is an HTML5- and Java-based framework that enables developers to easily build, deploy, and extend enterprise applications for mobile environments, including iOS and Android, from a single code base. Based on a next-generation hybrid mobile development architecture, Oracle ADF Mobile allows developers to increase productivity, while protecting investments, by enabling code reuse through a flexible, open standards-based architecture.
      • Oracle ADF Mobile based applications enable enterprises across industries to meet frequently changing mobile requirements by allowing developers to rapidly and visually develop applications once, and deploy to multiple devices and platforms. 

      Friday, October 12, 2012

      Dynamically Sizing JDBC Connection Pool in WebLogic Server

      A data source in WebLogic Server has a set of properties that define the initial, minimum, and maximum number of connections in the pool. A data source automatically adds one connection to the pool when all connections are in use. When the pool reaches maxCapacity, the maximum number of connections are open, and they remain open unless you enable automatic shrinking on the data source or manually shrink the data source.

      In this article, we will discuss the trade-offs between memory footprint and CPU utilization in the task of JDBC connection pool sizing. Before you start, you may want to read this companion article first:

      Fixed-Sized vs Dynamically-Sized Pool


      Sometimes you would like to set the initial capacity to the same value as the maximum capacity—this way, the connection pool will have all its physical connections ready when the pool is initialized. However, sometimes it's not possible to estimate in advance what your run-time workloads (either average or peak load) will be, and it could be wasteful to over-allocate connection instances. In that case, a dynamically-sized pool may be the better approach.

      Monitoring JDBC Connection Statistics



      As shown above, you can navigate to:
      • Services -> Data Sources -> ApplicationDB -> Monitoring -> Statistics
      and monitor the connection statistics of a specific data source (i.e., "ApplicationDB").

      In our case, ApplicationDB was deployed to multiple servers. As you can see, the active connections on each server are low (the maximum is 6). However, we have set its Initial Capacity to 20, and all five pools inherit the setting and have a current capacity of 20.

      Also, in our case, only SalesServer_1 will ever need over 20 connections concurrently, so allocating 20 connections for all pools can be wasteful. Based on your own situation, you may want to reduce ApplicationDB's initial capacity appropriately.

      After you estimate your peak load, you can choose a Maximum Capacity for the data source. In this case, initial and maximum capacity will be different. Then you can configure the way the pool can shrink and grow by using two additional properties:
      • Shrink Frequency
        • The number of seconds to wait before shrinking a connection pool that has incrementally increased to meet demand.
        • When set to 0, shrinking is disabled.
      • Minimum Capacity
        • The minimum number of physical connections that this connection pool can contain after it is initialized.
      You may want to drop some connections from the data source when a peak usage period has ended, freeing up WebLogic Server and DBMS resources. When you shrink a data source, WebLogic Server reduces the number of connections in the pool to the greater of either the Minimum Capacity or the number of connections currently in use.
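      That shrink rule can be stated as a one-liner; a sketch with made-up numbers:

```java
public class PoolShrink {
    // WebLogic shrinks the pool to the greater of Minimum Capacity and
    // the number of connections currently in use.
    static int shrinkTarget(int minCapacity, int connectionsInUse) {
        return Math.max(minCapacity, connectionsInUse);
    }

    public static void main(String[] args) {
        System.out.println(shrinkTarget(5, 12));  // 12 still in use: pool stays at 12
        System.out.println(shrinkTarget(5, 2));   // only 2 in use: shrinks to the minimum, 5
    }
}
```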

      For best performance, you should always tune pool sizes based on DataSource statistics.

      References

      1. Monitoring WebLogic JDBC Connection Pool at Runtime
      2. Oracle® Fusion Middleware Configuring and Managing JDBC Data Sources for Oracle WebLogic Server 11g Release 1 (10.3.4)
      3. Configuring JDBC Data Sources in JDeveloper and Oracle WebLogic Server
      4. Monitoring and Tuning Oracle Fusion Applications
      5. Why My WebLogic Managed Server is in ADMIN State?
        • Read this for a good example of when to set Initial Capacity to be zero.
      6. JBO-26061: Error while opening JDBC connection
      7. Tuning Data Sources (12.2.1.3.0) 
      8. Top Tuning Recommendations for WebLogic Server (12.2.1.3.0)

      Thursday, October 11, 2012

      Passivation and Activation in Oracle ADF—jbo.passivationstore

      The passivation/activation implementation in Oracle ADF Business Components[1] is designed to keep transaction states across multiple requests or sessions.

      In this article, we will discuss one aspect of the passivation/activation implementation in Oracle Fusion Applications—the passivation store.

      Passivation and Activation


      There are two kinds of pools in use when running a typical Fusion web application:

      • Application Module (AM) pools
      • Database connection pools

      An application module pool is a collection of instances of a single application module type which are shared by multiple application clients. As for database connection pools, they are usually maintained by the J2EE container. You can read [9] for more details. To tune Fusion Application's performance, you need to understand both pools[8].

      Each time a user accesses a resource and that resource uses an AM to display data, the Application Module pool manager assigns an AM instance to the user session. If the pool runs out of instances, the AM pool manager passivates the state of one of the sessions (either in the database or in a file), thus releasing an instance and assigning it to the new session. When the user whose session was passivated resumes work, ADF activates their state from the configured store. This is done automatically for you.

      Passivation Store


      In order to manage application module pending work, the application module pool asks AM instances to "snapshot" their state to XML at different times. If the value of the jbo.dofailover configuration parameter is true (default), then this XML snapshotting will happen each time the AM instance is released to the pool.

      The AM instance snapshots can be saved either in the database or in a file. To configure it, you can set jbo.passivationstore to be:
      • database
      • file
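      These are typically supplied as system properties; a sketch (the directory path and application name are placeholders; database is the default store):

```shell
# Passivate AM snapshots to files in an explicit directory instead of
# the PS_TXN table; jbo.dofailover=true (the default) snapshots eagerly.
java -Djbo.dofailover=true -Djbo.passivationstore=file -Djbo.tmpdir=/u01/am_passivation MyFusionApp
```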

      File Store


      If you set jbo.passivationstore to be file, by default passivation files go to user.dir. However, you can change the location by setting:
      • -Djbo.tmpdir
      For example, in our CRM Fusion Application, we have selected file to be the passivation store. But we didn't set its location (i.e., jbo.tmpdir). By default, it used "user.dir":
      • <Installation Home>/instance/domains/<Server Name>/CRMDomain

      To find out where "user.dir" points to on Linux, you can do:

      $ls -l /proc/<pid>/cwd
      cwd -> /c1/mt/rup1/instance/domains/myserver/CRMDomain


      So, going to that directory, you can find a bunch of files used by CRM for passivation:

      -rw-r----- 1 mygrp testuser 6380   Oct 11 10:04 BCacc13d9BCD
      -rw-r----- 1 mygrp testuser 263    Oct 11 10:04 BC325e092dBCD
      -rw-r----- 1 mygrp testuser 127186 Oct 11 10:04 BC166a7e7fBCD


      DB Store


      If you set jbo.passivationstore to be database (default), XML snapshots will be written to a BLOB column in a row of the PS_TXN table in the database.

      While the file-based option is a little faster, unless your multiple application server instances share a file system, the database-backed passivation scheme is the most robust for application server-farm and failover scenarios.

      Configuration Parameters


      To summarize, the following configuration parameters are related to this topic:
      • jbo.dofailover
        • Controls whether eager passivation is enabled
      • jbo.passivationstore
        • Dictates the store type
      • jbo.tmpdir
        • Specifies the location for file store
      You can reference [10] for other application module pool configuration parameters.

      References

      1. Oracle ADF Essentials
      2. Reusable ADF Components—Application Modules
      3. Java System Properties
      4. Why is the user.dir system property working in Java?
      5. ADF BC Passivation/Activation and SQL Execution Tuning
      6. Demystifying ADF BC Passivation and Activation
      7. Ensuring that your ADF Application is Passivation/Activation Safe
      8. Understanding Application Module Pooling Concepts and Configuration Parameters
      9. Monitoring WebLogic JDBC Connection Pool at Runtime
      10. What You May Need to Know About Application Module Pool Parameters

      Friday, October 5, 2012

      Tuning WebLogic's Prepared Statement Cache

      In [10], the top 9 tuning recommendations for WebLogic Server include:
      Use the Prepared Statement Cache

      The primary utility of a cached prepared statement is its association with a compiled query plan in the DBMS.  In this article, we will show how to tune Prepared Statement Cache in Oracle WebLogic Server for better web application performance.

      Caching Prepared Statements


      There are two steps to complete a SQL request: 
      • Compiling the SQL statement
      • Executing the SQL statement
      By using prepared statements (java.sql.PreparedStatement), you can reduce unnecessary compilation, saving time.  A prepared statement contains SQL statements that have already been compiled, thus making their execution faster. If you’re going to use a SQL statement more than once, you should use a prepared statement.
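      As a minimal sketch of the API (the JDBC URL, table name, and credentials are all hypothetical; with no database driver on the classpath, the demo simply reports the connection failure):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class PreparedStatementDemo {
    // Returns the employee name for the given id, or a diagnostic string
    // when no database is reachable (e.g. when run outside the server).
    static String lookupName(String url, int id) {
        try (Connection conn = DriverManager.getConnection(url, "scott", "tiger");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT name FROM employees WHERE id = ?")) {
            ps.setInt(1, id);  // bind the parameter; the compiled plan is reused
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString("name") : null;
            }
        } catch (SQLException e) {
            return "no database: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Hypothetical Oracle thin-driver URL; no driver is registered here,
        // so this only demonstrates the API shape.
        System.out.println(lookupName("jdbc:oracle:thin:@//dbhost:1521/orcl", 42));
    }
}
```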

      However, when you use a prepared statement or a callable statement (a callable statement object provides a way to call stored procedures in a standard way across RDBMSs) in an application, there's additional overhead due to the need for communication between WebLogic Server and the database.

      To minimize the processing costs, WebLogic Server can cache prepared and callable statements used in your applications. When an application or EJB calls any of the statements stored in the cache, WebLogic Server reuses the statement stored in the cache. Reusing prepared and callable statements can:
      • Reduce CPU usage on the database server 
      • Improve the statement’s performance on the application server

      Statement Cache


      The statement cache caches statements from a specific physical connection.  Each connection in a data source has its own individual cache of prepared and callable statements used on the connection. However, you configure statement cache options per data source. That is, the statement cache for each connection in a data source uses the statement cache options specified for the data source, but each connection caches its own statements. Statement cache configuration options include:
      • Statement Cache Type—The algorithm that determines which statements to store in the statement cache. See Statement Cache Algorithms.
      • Statement Cache Size—The number of statements to store in the cache for each connection.

      Configuring Statement Cache Size




      You can configure the size of the statement cache from the WebLogic Server Administration Console by navigating to:

      • Services -> Data Sources -> ApplicationDB -> Configuration -> Connection Pool

      The value can be from 0 to 1024 (default: 10).  If you set the size of the statement cache to 0, statement caching is turned off.

      JDBC DataSource Runtime Statistics



      Each connection in the connection pool has its own cache of statements. JDBC DataSource Runtime Statistics shown in the table are the sum of the number of cached statements for all connections in the connection pool.
      • Prep Stmt Cache Access Count 
        • The total number of times the statement cache was accessed
      • Prep Stmt Cache Add Count
        • The total number of statements added to the statement cache for all connections
      • Prep Stmt Cache Current Size
        • The number of prepared and callable statements currently cached in the statement cache
      • Prep Stmt Cache Hit Count 
        • The running count of the number of times the server used a statement from the cache
      • Prep Stmt Cache Miss Count
        • The number of times that a statement request could not be satisfied with a statement from the cache
      Access Count is the sum of Hit Count and Miss Count.  When you tune cache performance, you want to reduce the miss ratio (i.e., Miss Count / Access Count) based on the guideline below.
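      The miss ratio can be computed directly from the runtime statistics; a small sketch (the sample counts are made up):

```java
public class CacheStats {
    // Miss ratio = Miss Count / Access Count,
    // where Access Count = Hit Count + Miss Count.
    static double missRatio(long hitCount, long missCount) {
        long accessCount = hitCount + missCount;
        return accessCount == 0 ? 0.0 : (double) missCount / accessCount;
    }

    public static void main(String[] args) {
        // e.g. 9,000 hits and 1,000 misses -> miss ratio 0.1 (10%)
        System.out.println(missRatio(9_000, 1_000));
    }
}
```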

      Tuning Guideline


      By increasing the statement cache size, you can increase your system performance. However, you must consider how your DBMS handles open prepared and callable statements. In many cases, the DBMS will maintain a cursor for each open statement. This applies to prepared and callable statements in the statement cache.

      If you cache too many statements, you may exceed the limit of open cursors on your database server. If the DBMS is Oracle, you will get an ORA-01000 error.  To avoid exceeding the limit of open cursors for a connection, you can change the limit in your database management system or you can reduce the statement cache size for the data source.

      References

      1. Oracle WebLogic Server 11g Administration Handbook
      2. JDBC statement cache
      3. Why Prepared Statements are important and how to use them "properly"
      4. Statement Cache Algorithms
      5. Monitoring and Tuning Oracle Fusion Applications
      6. Monitoring WebLogic JDBC Connection Pool at Runtime (XML and More)
      7. Oracle JDBC Memory Management (Oracle Database 12c)
      8. Fusion Middleware Performance and Tuning for Oracle WebLogic Server
      9. Oracle® Fusion Middleware Tuning Performance of Oracle WebLogic Server 12c (12.2.1)
      10. Top Tuning Recommendations for WebLogic Server

      Wednesday, October 3, 2012

      Monitoring WebLogic JDBC Connection Pool at Runtime

      Before you start any performance tuning, you need to monitor your application runtime behavior using default application server settings first.

      In this article, we will show you how to monitor the health of WebLogic JDBC connection pool. In a companion article[7], we also show you how to tune Prepared Statement Cache in WebLogic Server for better web application performance.

      Data Sources & JDBC Connection Pool



      WebLogic Server maintains a pool of reusable physical database connections to minimize the overhead involved in connecting to a database. All the connections in a pool connect to the same database and use the same username and password for the connections.

      WebLogic Server also manages your database connectivity through JDBC data sources. WebLogic Server data sources help separate database connection information from your application code.  Each data source that you configure contains a pool of database connections that are created when the data source instance is created—when it is deployed or targeted, or at server startup. The connection pool can grow or shrink dynamically to accommodate the demand.

      At runtime, Java applications perform a lookup of the JNDI tree to find the data source and request database connections using the getConnection method. Once the application is done with a connection, the connection goes back to the data source's connection pool.
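      A minimal sketch of that lookup (assuming a data source bound at the JNDI name jdbc/ApplicationDB; run outside the container, the lookup simply fails and the demo reports it):

```java
import java.sql.Connection;
import java.sql.SQLException;
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

public class DataSourceLookup {
    // Looks up a pooled connection by the data source's JNDI name.
    // Inside WebLogic, new InitialContext() binds to the server's JNDI tree;
    // outside a container the lookup fails, which we report as a string.
    static String fetchConnection(String jndiName) {
        try {
            Context ctx = new InitialContext();
            DataSource ds = (DataSource) ctx.lookup(jndiName);
            try (Connection conn = ds.getConnection()) {
                return "got connection: " + conn;  // close() returns it to the pool
            }
        } catch (NamingException | SQLException e) {
            return "lookup failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchConnection("jdbc/ApplicationDB"));
    }
}
```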

      DataSource Runtime Monitoring


      Using WebLogic Server Administration Console, you can monitor JDBC DataSource statistics by navigating to:
      • Servers --> SalesServer_1 --> Monitoring --> JDBC

There are many KPIs that you can monitor, and you can customize which ones are displayed in the table.  We have listed some important KPIs here:
      • Waiting For Connection High Count
        • Highest number of application requests concurrently waiting for a connection from this instance of the data source
      • Wait Seconds High Count
        • The highest number of seconds that an application waited for a connection (the longest connection reserve wait time) from this instance of the connection pool since the connection pool was instantiated
      • Connection Delay Time
        • The average amount of time, in milliseconds, that it takes to create a physical connection to the database
  • The value is calculated as the sum of all connection times divided by the total number of connections
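As a worked example of that formula, four hypothetical connections that took 120, 95, 140, and 105 ms to create yield an average of (120 + 95 + 140 + 105) / 4 = 115 ms:

```java
// Sketch of how an average connection delay metric is computed:
// the sum of individual connect times divided by the number of connections.
public class ConnectionDelayAverage {
    public static void main(String[] args) {
        long[] connectMillis = {120, 95, 140, 105};  // hypothetical connect times
        long sum = 0;
        for (long t : connectMillis) {
            sum += t;
        }
        double average = (double) sum / connectMillis.length;
        System.out.println("average connection delay (ms): " + average);  // 115.0
    }
}
```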

      Configuring the Connection Pool


      When the WebLogic Server starts up or when you deploy a data source to a new target, the connection pool is registered with the server, meaning that the connection pool and its connections are created at that time. You can configure various settings to control the connection pool size and the way the pool can shrink and grow.  You should tune pool sizes based on DataSource statistics. For example, you should ensure connection wait time is not high.

      We have listed some settings here that you may want to tune for your applications:
      • Initial Capacity
        • Number of connections created when pool is initialized
      • Minimum Capacity
        • Minimum number of connections that will be maintained in the pool
        • Should be tuned for steady load
      • Maximum Capacity
        • Maximum number of connections that pool can have
        • Should be tuned to peak load
      • Shrink Frequency 
        • Should be enabled to drop some connections from the data source when a peak usage period has ended, freeing up WebLogic Server and DBMS resources
      It is common to set the initial capacity to a value that handles your estimated average, but not necessarily the maximum number of connections to the database. Ideally, you want to make sure that you have enough initial connections to match the number of concurrent requests that you expect to have running on any given server instance.

To be on the safe side, you can set the initial capacity to the same value as the maximum capacity; this way, the connection pool will have all of its physical connections ready when the pool is initialized.  However, sometimes you do want to adjust pool size dynamically at runtime; see [5].

The rule of thumb for pool sizing is simply to make sure that the pool is large enough for all server threads to get access to the pooled resources they need concurrently.  In previous versions of WebLogic Server, this was usually simple.  For example, each execute thread needs access to one database connection from each pool, so you always made sure that the maximum capacity of the database connection pool was greater than or equal to the number of execute threads. With the introduction of server self-tuning, the number of execute threads isn't necessarily well defined.  In that case, the tips provided here may be helpful to you.
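The grow-and-shrink behavior described above can be sketched as a toy pool. This is an illustrative analogy only, not WebLogic's implementation; the capacity names merely mirror the console settings discussed earlier:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy sketch of Initial/Minimum/Maximum Capacity and shrink behavior.
// Real WebLogic pools are configured in the console, not coded by hand.
public class ToyConnectionPool {
    private final int minCapacity;
    private final int maxCapacity;
    private final Deque<Object> idle = new ArrayDeque<>();
    private int total;  // idle + reserved connections

    public ToyConnectionPool(int initialCapacity, int minCapacity, int maxCapacity) {
        this.minCapacity = minCapacity;
        this.maxCapacity = maxCapacity;
        for (int i = 0; i < initialCapacity; i++) {
            idle.push(new Object());  // pre-created physical connections
        }
        total = initialCapacity;
    }

    // Reserve a connection, growing the pool on demand up to maxCapacity.
    public Object reserve() {
        if (!idle.isEmpty()) {
            return idle.pop();
        }
        if (total < maxCapacity) {
            total++;
            return new Object();
        }
        throw new IllegalStateException("pool exhausted; a real caller would wait");
    }

    public void release(Object conn) {
        idle.push(conn);  // the connection goes back to the pool
    }

    // Shrink Frequency analogue: drop idle connections back down to minCapacity.
    public void shrink() {
        while (total > minCapacity && !idle.isEmpty()) {
            idle.pop();
            total--;
        }
    }

    public int size() {
        return total;
    }

    public static void main(String[] args) {
        ToyConnectionPool pool = new ToyConnectionPool(2, 2, 5);
        Object a = pool.reserve();  // served from the initial connections
        Object b = pool.reserve();
        Object c = pool.reserve();  // pool grows on demand
        System.out.println("after growth: " + pool.size());   // 3
        pool.release(a);
        pool.release(b);
        pool.release(c);
        pool.shrink();  // peak usage has ended; fall back to minimum capacity
        System.out.println("after shrink: " + pool.size());   // 2
    }
}
```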

      Acknowledgement


Some writings here are based on feedback from Sandeep Mahajan and Stevan Malesevic. However, the author assumes full responsibility for the content.

      References

      1. Professional Oracle WebLogic Server
      2. Oracle WebLogic Server 11gR1 PS2: Administration Essentials
      3. The WebLogic Server Administration Console
      4. Managing WebLogic JDBC Resources
      5. Dynamically Sizing JDBC Connection Pool in WebLogic Server
      6. Configuring JDBC Data Sources in JDeveloper and Oracle WebLogic Server
      7. Tuning WebLogic's Prepared Statement Cache
      8. Data Source Connection Pool Sizing

      Monday, October 1, 2012

Performance Tuning for WebLogic Server: Native Muxers vs. Java Muxers

      There are two critical areas for WebLogic Server (WLS) performance tuning:
      • Thread management
      • Network I/O tuning
      In this article, we will touch upon one aspect of Network I/O tuning—Native Muxers vs. Java Muxers.

      Listen Thread


  Listen Thread --> Listen Thread Queue --> Socket Muxer

      When a server process starts up, it binds itself to a port and assigns a listen thread to the port to listen for incoming requests.  Once the request makes a connection, the server passes the control of that connection to the socket muxer.

      From the thread dump, you can find an entry like this:
        "DynamicListenThread[Default[9]]" daemon prio=10 tid=0x00002aaac921b800 
         nid=0x3bf1 runnable [0x000000004c026000]
      
      
      From the server log file, you can find a matching entry like this:
        <Oct 2, 2012 11:02:28 AM PDT> <Notice> <Server> <BEA-002613>
        <Channel "Default[9]" is now listening on 0:0:0:0:0:0:0:1:9000 
        for protocols iiop, t3, ldap, snmp, http.>

      Socket Muxer


        socket muxer --> execute queue 
Muxers read messages from the network, bundle them into a package of work, and queue them to the Work Manager.  An idle execute thread then picks up a request from the execute queue and may in turn hand off the job of responding to it to special threads.  Finally, socket muxers also make sure the response gets back to the same socket from which the request came. Socket muxers are software modules, and there are two types:
      • Java Muxers
  • Use pure Java to read data from sockets
        • The number of threads is tunable for Java muxers by configuring the Percent Socket Readers parameter setting in the Administration Console
      • Native Muxers 
        • Native muxers use platform-specific native binaries to read data from sockets
          • The majority of all platforms provide some mechanism to poll a socket for data
        • Native muxers provide better performance, especially when scaling to large user bases, because they implement a non-blocking thread model
  • Note that Native IO is not supported for WebLogic clients, which include WLST
The Enable Native IO checkbox in the server's configuration settings tells the server which version to use.  In the above figure, we have selected Native IO and, therefore, the JavaSocketMuxer Socket Readers setting is grayed out.

      In general, the server will determine the correct type of muxer to use and will use the native muxers by default without having to make any specific tuning changes.
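The non-blocking thread model that lets native muxers scale can be illustrated with java.nio, where a single selector thread polls many sockets at once instead of blocking on each one. This is an analogy only, not WebLogic's actual muxer code:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;

// Analogy: one selector thread multiplexing many channels, the same
// non-blocking idea a native muxer relies on.
public class MiniMuxer {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(0));  // bind to any free port
        server.configureBlocking(false);        // required before registering
        server.register(selector, SelectionKey.OP_ACCEPT);

        // One thread can now watch many channels; select() reports how many
        // channels are ready rather than blocking per socket.
        int ready = selector.select(100);       // poll with a 100 ms timeout
        System.out.println("ready channels: " + ready);

        server.close();
        selector.close();
    }
}
```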

      Which Muxer Was Actually Used?


The quickest way is to create a thread dump (for example, using jstack) and search for "Muxer".  In our experimental environment, the Posix Muxer happened to be picked up:

      "ExecuteThread: '2' for queue: 'weblogic.socket.Muxer'" daemon prio=10 tid=0x00002aaae190b800 nid=0x10cf runnable [0x0000000040e13000]
         java.lang.Thread.State: RUNNABLE
              at weblogic.socket.PosixSocketMuxer.poll(Native Method)
              at weblogic.socket.PosixSocketMuxer.processSockets(PosixSocketMuxer.java

      -Dweblogic.SocketReaders


      You can explicitly set the number of socket readers using the following command line option:
      • -Dweblogic.SocketReaders=3
If you set it to 3, you will find entries like the following in the thread dump:
      "ExecuteThread: '2' for queue: 'weblogic.socket.Muxer'" daemon prio=10 tid=0x00002aaac8776000 nid=0x3475 waiting for monitor entry [0x0000000041dbd000]
         java.lang.Thread.State: BLOCKED (on object monitor)
      
      "ExecuteThread: '1' for queue: 'weblogic.socket.Muxer'" daemon prio=10 tid=0x00002aaac8774800 nid=0x3474 waiting for monitor entry [0x0000000041cbc000]
         java.lang.Thread.State: BLOCKED (on object monitor)
      
      "ExecuteThread: '0' for queue: 'weblogic.socket.Muxer'" daemon prio=10 tid=0x00002aaac8770000 nid=0x3473 runnable [0x0000000041877000]
         java.lang.Thread.State: RUNNABLE
      
      
The main reason to do this is that in some releases the number of readers defaults to the number of CPUs available on the system. On some types of hardware this results in as many as 128 reader threads, which is excessive.

Typically you will see good performance with anywhere between 1 and 3 socket reader threads. In some cases, people have used 6, but those are special cases.  Be warned that not having enough readers will result in work not being read from the sockets quickly enough for the server to process.

Using our ATG CRM benchmark, you can see the changes in throughput and response time when the number of SocketReaders is changed from 1 to 3:

                                       SocketReaders=1    SocketReaders=3
  Maximum Running Vusers               400                400
  Total Throughput (bytes)             2,487,087,264      2,496,307,995
  Average Throughput (bytes/second)    1,036,286          1,040,128
  Average Hits per Second              29.786             29.86
  Average Response Time (seconds)      0.248              0.236
  90% Response Time (seconds)          0.209              0.210

      BEA-000438


      In some circumstances, you may see the following error message:
        <BEA-000438> <Unable to load performance pack. Using Java I/O instead.
        Please ensure that libmuxer library is in...
      
For instance, this can happen when you use a 64-bit JVM and libmuxer.so is not on the LD_LIBRARY_PATH.  To resolve it, just add the following path:
      • <Oracle Home>/wlserver_10.3/server/native/linux/x86_64
      to the LD_LIBRARY_PATH.

      Acknowledgement


Some of the writings here are based on feedback from Sandeep Mahajan. However, the author assumes full responsibility for the content.

      References

      1. Oracle WebLogic Server 11g Administration Handbook (Oracle Press)
      2. HotSpot VM Binaries: 32-Bit vs. 64-Bit
      3. Weblogic - Socket Muxers in Thread Dumps