Friday, October 26, 2012

HotSpot VM Performance Tuning Tips

In some cases, it may be obvious from benchmarking that parts of an application need to be rewritten using more efficient algorithms[1]. Sometimes it may be enough simply to provide a better runtime environment by tuning the JVM parameters.

In this article, we will show you some of the HotSpot VM performance tuning tips.

What to tune?


You can tune HotSpot performance on multiple fronts:
  • Code generation[8,10]
  • Memory management
In this article, we will focus more on memory management (i.e., the garbage collector).  The goals of tuning the garbage collector include:
  • To make a garbage collector operate efficiently, by
    • Reducing pause time or
    • Increasing throughput
  • To avoid heap fragmentation
    • Different garbage collectors use different compaction strategies to eliminate fragmentation
  • To make it scalable for multithreaded applications on multiprocessor systems
In this article, we will cover the following tuning options:
  1. Client VM or Server VM
  2. 32-bit VM or 64-bit VM
  3. GC strategy
  4. Heap sizing
  5. Further tuning

Client vs. Server VM


The HotSpot Client JVM has been specially tuned to reduce application startup time and memory footprint, making it particularly well suited for client environments. On all platforms, the HotSpot Client JVM is the default.

The Java HotSpot Server VM is similar to the HotSpot Client JVM except that it has been specially tuned to maximize peak operating speed. It is intended for long-running server applications, for which the fastest possible operating speed is generally more important than having the fastest startup time. To invoke the HotSpot Server JVM instead of the default HotSpot Client JVM, use the -server parameter; for example,
  • java -server MyApp
In [7], the authors mention a third HotSpot VM runtime named tiered.   If you are using Java 6 Update 25, Java 7, or later, you may consider using the tiered server runtime as a replacement for the client runtime.   For more details, read [8,10].
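If you want to experiment with the tiered runtime, the invocation might look like this (a sketch only; MyApp is a placeholder application name):

```shell
# Tiered compilation on the server runtime (Java 6u25+ / Java 7).
# MyApp is a placeholder; substitute your own main class.
java -server -XX:+TieredCompilation MyApp
```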

32-Bit or 64-Bit VM


The 32-bit JVM is the default for the HotSpot VM. The choice between a 32-bit and a 64-bit JVM is dictated by the memory footprint required by the application, by whether any third-party software used in the application supports 64-bit JVMs, and by whether there are any native components in the Java application. All native components using the Java Native Interface (JNI) in a 64-bit JVM must be compiled in 64-bit mode.

Running a 64-bit VM has the following advantages[7]:
  • Larger address space
  • Better performance on two fronts
    • 64-bit JVMs can make use of additional CPU registers
    • Helps avoid register spilling
      • Register spilling occurs when there is more live state (i.e., variables) in the application than the CPU has registers. 
and one disadvantage:
  • Increased width for oops
    • Results in fewer oops fitting on a CPU cache line, which decreases CPU cache efficiency. 
    • This negative performance impact can be mitigated by setting:
      • -XX:+UseCompressedOops VM command line option
Note that client runtimes are not available in 64-bit HotSpot VMs.   See [11] for more details.
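As a sketch (the -d64 flag selects the 64-bit VM on platforms that ship both; MyApp is a placeholder):

```shell
# 64-bit server VM with compressed ordinary object pointers (oops),
# which recovers some of the cache efficiency lost to wider oops.
java -d64 -server -XX:+UseCompressedOops MyApp
```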

GC Strategy


JVM performance is usually measured by its GC's effectiveness.  Garbage collection (GC) reclaims heap space previously allocated to objects that are no longer needed. The process of locating and removing those dead objects can stall your Java application and consume as much as 25 percent of throughput.

The Java HotSpot virtual machine includes five garbage collectors.[27] All the collectors are generational.
  • Serial Collector
    • Both young and old collections are done serially (using a single CPU), in a stop-the-world fashion.
    • The old and permanent generations are collected via a mark-sweep-compact collection algorithm. 
      • The sweep phase “sweeps” over the generations, identifying garbage. The collector then performs sliding compaction, sliding the live objects towards the beginning of the old generation space (and similarly for the permanent generation), leaving any free space in a single contiguous chunk at the opposite end.
    • When to use
      • For most applications that are run on client-style machines and that do not have a requirement for low pause times
    • How to select
      • In J2SE 5.0 and above, the serial collector is automatically chosen as the default garbage collector on machines that are not server-class machines. On other machines, the serial collector can be explicitly requested by using the -XX:+UseSerialGC command line option.
  • Parallel Collector (or throughput collector)
    • Young generation collection
      • Uses a parallel version of the young generation collection algorithm utilized by the serial collector
      • It is still a stop-the-world and copying collector, but performing the young generation collection in parallel, using many CPUs, decreases garbage collection overhead and hence increases application throughput.
    • Old generation collection
      • Uses the same serial mark-sweep-compact collection algorithm as the serial collector
    • When to use
      • For applications run on machines with more than one CPU that do not have pause time constraints, since infrequent, but potentially long, old generation collections will still occur. 
      • Examples of applications for which the parallel collector is often appropriate include those that do batch processing, billing, payroll, scientific computing, and so on.
    • How to select
      • In J2SE 5.0 and above, the parallel collector is automatically chosen as the default garbage collector on server-class machines. On other machines, the parallel collector can be explicitly requested by using the -XX:+UseParallelGC command line option.
  • Parallel Compacting Collector
      • Young generation collection
        • Uses the same algorithm as the parallel collector's young generation collection.
      • Old generation collection
        • The old and permanent generations are collected in a stop-the-world, mostly parallel fashion with sliding compaction
        • The collector utilizes three phases (see [5] for more details):
          • Marking phase
          • Summary phase
          • Compaction phase
      • When to use
        • For applications that are run on machines with more than one CPU and applications that have pause time constraints.
      • How to select
        • If you want the parallel compacting collector to be used, you must select it by specifying the command line option -XX:+UseParallelOldGC.
    • Concurrent Mark-Sweep (CMS) Collector[5,6]
        • Young generation collection
          • The CMS collector collects the young generation in the same manner as the parallel collector. 
        • Old generation collection
          • Most of the collection of the old generation using the CMS collector is done concurrently with the execution of the application. 
          • The CMS collector is the only collector that is non-compacting. That is, after it frees the space that was occupied by dead objects, it does not move the live objects to one end of the old generation. 
            • To minimize the risk of fragmentation, CMS performs statistical analysis of object sizes and keeps separate free lists for objects of different sizes.
        • When to use
          • For applications that need shorter garbage collection pauses and can afford to share processor resources with the garbage collector while the application is running. (Due to its concurrency, the CMS collector takes CPU cycles away from the application during a collection cycle.)
          • Typically, applications that have a relatively large set of long-lived data (a large old generation), and that run on machines with two or more processors, tend to benefit from the use of this collector. 
          • Compared to the parallel collector, the CMS collector decreases old generation pauses—sometimes dramatically—at the expense of slightly longer young generation pauses, some reduction in throughput, and extra heap size requirements.
        • How to select
          • If you want the CMS collector to be used, you must explicitly select it by specifying the command line option -XX:+UseConcMarkSweepGC.  
          • If you want it to be run in incremental mode, also enable that mode via the -XX:+CMSIncrementalMode option.
            • This feature is useful when applications that need the low pause times provided by the concurrent collector are run on machines with small numbers of processors (e.g., 1 or 2).
          • ParNewGC is the parallel young generation collector for use with CMS.  To choose it, you can specify the command line option -XX:+UseParNewGC.
      • Garbage-First (G1) Garbage Collector[31]
        • Summary
          • Differs from CMS in the following ways
            • Compacting
              • Reduce fragmentation and is good for long-running applications 
            • Heap is split into regions
              • Easy to allocate and resize
          • Evacuation pauses 
            • For both young and old regions
        • Young generation collection
          • During a young GC, survivors from the young regions are evacuated to either survivor regions or old regions
            • Done with stop-the-world evacuation pauses
            • Performed in parallel
        • Old generation collection
          • Some garbage objects in regions with a very high live ratio may be left in the heap and be collected later
          • Concurrent marking phase
            • Calculates liveness information per region
              • Empty regions can be reclaimed immediately
              • Identifies best regions for subsequent evacuation pauses
            • Remark is done with one stop-the-world pause, while the initial mark is piggybacked on an evacuation pause
            • No corresponding sweeping phase
            • Different marking algorithm than CMS
          • Old regions are reclaimed by
            • Evacuation pauses
              • Using compaction
              • Where most reclamation is done
            • Remark (when totally empty)
        • When to use
          • The G1 collector is a server-style garbage collector, targeted for multi-processor machines with large memories. It meets garbage collection (GC) pause time goals with high probability, while achieving high throughput. 
        • How to select
          • If you want the G1 garbage collector to be used, you must explicitly select it by specifying the command line option -XX:+UseG1GC.  
      Note that the difference between:
      • -XX:+UseParallelOldGC
      and
      • -XX:+UseParallelGC
      is that -XX:+UseParallelOldGC enables both a multithreaded young generation garbage collector and a multithreaded old generation garbage collector, that is, both minor garbage collections and full garbage collections are multithreaded. -XX:+UseParallelGC enables only a multithreaded young generation garbage collector. The old generation garbage collector used with -XX:+UseParallelGC is single threaded. 

      Using -XX:+UseParallelOldGC also automatically enables -XX:+UseParallelGC. Hence, if you want to use both a multithreaded young generation garbage collector and a multithreaded old generation garbage collector, you need only specify -XX:+UseParallelOldGC.

      Note that the above distinction between -XX:+UseParallelOldGC and -XX:+UseParallelGC is no longer true in JDK 7.  In JDK 7, the following three settings are equivalent:

      • Default
      • -XX:+UseParallelGC
      • -XX:+UseParallelOldGC
      They all use multithreaded collectors for both young generation and old generation.
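      Putting the selection options above together, example invocations might look like this (illustrative only; MyApp is a placeholder and the 200 ms pause goal is just a sample value):

```shell
java -XX:+UseSerialGC MyApp                           # serial collector
java -XX:+UseParallelOldGC MyApp                      # parallel young + old generations
java -XX:+UseConcMarkSweepGC -XX:+UseParNewGC MyApp   # CMS with parallel young collection
java -XX:+UseG1GC -XX:MaxGCPauseMillis=200 MyApp      # G1 with a soft pause-time goal
```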

      Heap Sizing


      If a heap size is small, collection will be fast but the heap will fill up more quickly, thus requiring more frequent collections. Conversely, a large heap will take longer to fill up and thus collections will be less frequent, but they may take longer.

      Command line parameters that divide the heap between new and old generations usually cause the greatest performance impact.  If you increase the new generation's size, you often improve the overall throughput; however, you also increase footprint, which may slow down servers with limited memory.
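      For example, a possible starting point might look like the following (the sizes are purely illustrative; MyApp is a placeholder):

```shell
# -Xms/-Xmx set initial and maximum total heap (equal values avoid resize pauses);
# -Xmn fixes the young generation size, which drives minor GC frequency.
java -Xms2g -Xmx2g -Xmn512m MyApp
```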

      For more details, you can read [12-15].

      Further Tuning


      HotSpot's default parameters are effective for most small applications that require fast startup and a small footprint.  However, more often than not, you will find that the default settings are not good enough and that your Java applications need further tuning.  As shown in [16], there are many VM options exposed that can be tuned by brave souls.  I'm not going to discuss such tunings in this article, but I'll keep posting articles on this blog about VM tunings.  Stay tuned!

      References

      1. Java Performance Tips
      2. Pick up performance with generational garbage collection
      3. Java HotSpot™ Virtual Machine Performance Enhancements
      4. Oracle JRockit
      5. Memory Management in the Java HotSpot™ Virtual Machine
      6. Understanding GC pauses in JVM, HotSpot's CMS collector
      7. Java Performance by Charlie Hunt and Binu John
      8. Performance Tuning with Hotspot VM Option: -XX:+TieredCompilation
      9. Java Tuning White Paper
      10. A Case Study of Using Tiered Compilation in HotSpot
      11. HotSpot VM Binaries: 32-Bit vs. 64-Bit
      12. HotSpot Performance Option — SurvivorRatio
      13. A Case Study of java.lang.OutOfMemoryError: GC overhead limit exceeded
      14. Understanding Garbage Collection
      15. Diagnosing Java.lang.OutOfMemoryError
      16. What Are the Default HotSpot JVM Values?
      17. Understanding Garbage Collector Output of Hotspot VM
      18. On Stack Replacement in HotSpot JVM
      19. Professional Oracle WebLogic Server by Robert Patrick, Gregory Nyberg, and Philip Aston
      20. Sun Performance and Tuning: Java and the Internet by Adrian Cockroft and Richard Pettit
      21. Concurrent Programming in Java: Design Principles and Patterns by Doug Lea
      22. Capacity Planning for Web Performance: Metrics, Models, and Methods by Daniel A. Menascé and Virgilio A.F. Almeida
      23. Java Performance Tuning (Michael Finocchiaro)
      24. Diagnosing Heap Stress in HotSpot
      25. Introduction to HotSpot JVM Performance and Tuning
      26. Tuning the JVM (video)
        • Frequency of minor GC dictated by:
          • Application object allocation rate
          • Size of Eden
        • Frequency of object promotion dictated by:
          • Frequency of minor GCs (tenuring)
          • Size of survivor spaces
        • Full GC Frequency dictated by
          • Promotion rate
          • Size of old generation
      27. JEP 173: Retire Some Rarely-Used GC Combinations
      28. G1 GC Glossary of Terms
      29. Learn More About Performance Improvements in JDK 8 
      30. Java SE HotSpot at a Glance
      31. Garbage-First Garbage Collector (JDK 8 HotSpot Virtual Machine Garbage Collection Tuning Guide)
      32. Tuning that was great in old JRockit versions might not be so good anymore
        • Trying to bring over each and every tuning option from a JR configuration to an HS one is probably a bad idea.
        • Even when moving between major versions of the same JVM, we usually recommend going back to the default (just pick a collector and heap size) and then redoing any tuning work from scratch (if even necessary).

      Monday, October 22, 2012

      Oracle releases new ADF Mobile

      Oracle ADF Mobile enables developers to build applications that install and run on both iOS and Android devices from one source code base.

      Development is done with JDeveloper and ADF and leverages Java and HTML5 technologies, while keeping the same visual and declarative approach ADF is known for.

      Redwood Shores, Calif. – October 22, 2012

      News Facts

      • Oracle today announced the general availability of Oracle Application Development Framework (ADF) Mobile, an extension of the Oracle Application Development Framework.
      • Part of Oracle Fusion Middleware, Oracle ADF Mobile is an HTML5- and Java-based framework that enables developers to easily build, deploy, and extend enterprise applications for mobile environments, including iOS and Android, from a single code base. Based on a next-generation hybrid mobile development architecture, Oracle ADF Mobile allows developers to increase productivity, while protecting investments, by enabling code reuse through a flexible, open standards-based architecture.
      • Oracle ADF Mobile based applications enable enterprises across industries to meet frequently changing mobile requirements by allowing developers to rapidly and visually develop applications once, and deploy to multiple devices and platforms. 

      Friday, October 12, 2012

      Dynamically Sizing JDBC Connection Pool in WebLogic Server

      A data source in WebLogic Server has a set of properties that define the initial, minimum, and maximum number of connections in the pool. A data source automatically adds one connection to the pool when all connections are in use. When the pool reaches maxCapacity, the maximum number of connections are open, and they remain open unless you enable automatic shrinking on the data source or manually shrink the data source.

      In this article, we will discuss the trade-offs between memory footprint and CPU utilization in the task of JDBC connection pool sizing. Before you start, you may want to read this companion article first:

      Fixed-Sized vs Dynamically-Sized Pool


      Sometimes you would like to set the initial capacity to the same value as the maximum capacity—this way, the connection pool will have all its physical connections ready when the pool is initialized. However, sometimes it's not possible to estimate in advance what your run-time workloads (either average or peak load) will be, and it could be wasteful to over-allocate connection instances. In that case, a dynamically-sized pool may be the better approach.

      Monitoring JDBC Connection Statistics



      As shown above, you can navigate to:
      • Services -> Data Sources -> ApplicationDB -> Monitoring -> Statistics
      and monitor the connection statistics of a specific data source (i.e., "ApplicationDB").

      In our case, ApplicationDB was deployed to multiple servers. As you can see, the active connections on each server are low (the maximum is 6). However, we have set its Initial Capacity to 20, and all five pools inherit the setting and have a current capacity of 20.

      Also, in our case, only SalesServer_1 will ever need over 20 connections concurrently, so allocating 20 connections for all pools can be wasteful. Based on your own situation, you may want to reduce ApplicationDB's initial capacity appropriately.

      After you estimate your peak load, you can choose a Maximum Capacity for the data source. In this case, initial and maximum capacity will be different. Then you can configure the way the pool can shrink and grow by using two additional properties:
      • Shrink Frequency
        • The number of seconds to wait before shrinking a connection pool that has incrementally increased to meet demand.
        • When set to 0, shrinking is disabled.
      • Minimum Capacity
        • The minimum number of physical connections that this connection pool can contain after it is initialized.
      You may want to drop some connections from the data source when a peak usage period has ended, freeing up WebLogic Server and DBMS resources. When you shrink a data source, WebLogic Server reduces the number of connections in the pool to the greater of either the Minimum Capacity or the number of connections currently in use.
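      That shrink rule can be stated as a one-liner; a sketch with made-up numbers:

```java
public class PoolShrink {
    // WebLogic shrinks the pool to the greater of Minimum Capacity and
    // the number of connections currently in use.
    static int shrinkTarget(int minCapacity, int connectionsInUse) {
        return Math.max(minCapacity, connectionsInUse);
    }

    public static void main(String[] args) {
        System.out.println(shrinkTarget(5, 12));  // 12 still in use: pool stays at 12
        System.out.println(shrinkTarget(5, 2));   // only 2 in use: shrinks to the minimum, 5
    }
}
```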

      For best performance, you should always tune pool sizes based on DataSource statistics.

      References

      1. Monitoring WebLogic JDBC Connection Pool at Runtime
      2. Oracle® Fusion Middleware Configuring and Managing JDBC Data Sources for Oracle WebLogic Server 11g Release 1 (10.3.4)
      3. Configuring JDBC Data Sources in JDeveloper and Oracle WebLogic Server
      4. Monitoring and Tuning Oracle Fusion Applications
      5. Why My WebLogic Managed Server is in ADMIN State?
        • Read this for a good example of when to set Initial Capacity to be zero.
      6. JBO-26061: Error while opening JDBC connection
      7. Tuning Data Sources (12.2.1.3.0) 
      8. Top Tuning Recommendations for WebLogic Server (12.2.1.3.0)

      Thursday, October 11, 2012

      Passivation and Activation in Oracle ADF—jbo.passivationstore

      The passivation/activation implementation in Oracle ADF Business Components[1] is designed to keep transaction states across multiple requests or sessions.

      In this article, we will discuss one aspect of the passivation/activation implementation in Oracle Fusion Applications—the passivation store.

      Passivation and Activation


      There are two kinds of pools in use when running a typical Fusion web application:

      • Application Module (AM) pools
      • Database connection pools

      An application module pool is a collection of instances of a single application module type which are shared by multiple application clients. As for database connection pools, they are usually maintained by the J2EE container. You can read [9] for more details. To tune Fusion Application's performance, you need to understand both pools[8].

      Each time a user accesses a resource and that resource uses an AM to display data, the Application Module pool manager assigns an AM instance to the user session. If the pool runs out of instances, the AM pool manager passivates the state of one of the sessions (either in the database or in a file), thus releasing an instance and assigning it to the new session. When the user whose session was passivated resumes work, ADF activates their state from the configured store. This is done automatically for you.

      Passivation Store


      In order to manage application module pending work, the application module pool asks AM instances to "snapshot" their state to XML at different times. If the value of the jbo.dofailover configuration parameter is true (default), then this XML snapshotting will happen each time the AM instance is released to the pool.

      The AM instance snapshots can be saved either in the database or in a file. To configure it, you can set jbo.passivationstore to be:
      • database
      • file
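      These are typically supplied as system properties; a sketch (the directory path and application name are placeholders; database is the default store):

```shell
# Passivate AM snapshots to files in an explicit directory instead of
# the PS_TXN table; jbo.dofailover=true (the default) snapshots eagerly.
java -Djbo.dofailover=true -Djbo.passivationstore=file -Djbo.tmpdir=/u01/am_passivation MyFusionApp
```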

      File Store


      If you set jbo.passivationstore to be file, by default passivation files go to user.dir. However, you can change the location by setting:
      • -Djbo.tmpdir
      For example, in our CRM Fusion Application, we have selected file to be the passivation store. But we didn't set its location (i.e., jbo.tmpdir). By default, it used "user.dir":
      • <Installation Home>/instance/domains/<Server Name>/CRMDomain

      To find out where "user.dir" points to on Linux, you can do:

      $ls -l /proc/<pid>/cwd
      cwd -> /c1/mt/rup1/instance/domains/myserver/CRMDomain


      So, going to that directory, you can find a bunch of files used by CRM for passivation:

      -rw-r----- 1 mygrp testuser 6380   Oct 11 10:04 BCacc13d9BCD
      -rw-r----- 1 mygrp testuser 263    Oct 11 10:04 BC325e092dBCD
      -rw-r----- 1 mygrp testuser 127186 Oct 11 10:04 BC166a7e7fBCD


      DB Store


      If you set jbo.passivationstore to be database (default), XML snapshots will be written to a BLOB column in a row of the PS_TXN table in the database.

      While the file-based option is a little faster, unless your multiple application server instances share a file system, the database-backed passivation scheme is the most robust for application server-farm and failover scenarios.

      Configuration Parameters


      To summarize, the following configuration parameters are related to this topic:
      • jbo.dofailover
        • Controls whether eager passivation is enabled
      • jbo.passivationstore
        • Dictates the store type
      • jbo.tmpdir
        • Specifies the location for file store
      You can reference [10] for other application module pool configuration parameters.

      References

      1. Oracle ADF Essentials
      2. Reusable ADF Components—Application Modules
      3. Java System Properties
      4. Why is the user.dir system property working in Java?
      5. ADF BC Passivation/Activation and SQL Execution Tuning
      6. Demystifying ADF BC Passivation and Activation
      7. Ensuring that your ADF Application is Passivation/Activation Safe
      8. Understanding Application Module Pooling Concepts and Configuration Parameters
      9. Monitoring WebLogic JDBC Connection Pool at Runtime
      10. What You May Need to Know About Application Module Pool Parameters

      Friday, October 5, 2012

      Tuning WebLogic's Prepared Statement Cache

      In [10], the top 9 tuning recommendations for WebLogic Server include:
      Use the Prepared Statement Cache

      The primary utility of a cached prepared statement is its association with a compiled query plan in the DBMS.  In this article, we will show how to tune Prepared Statement Cache in Oracle WebLogic Server for better web application performance.

      Caching Prepared Statements


      There are two steps to complete a SQL request: 
      • Compiling the SQL statement
      • Executing the SQL statement
      By using prepared statements (java.sql.PreparedStatement), you can reduce unnecessary compilation, saving time.  A prepared statement contains SQL statements that have already been compiled, thus making their execution faster. If you’re going to use a SQL statement more than once, you should use a prepared statement.
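      As a minimal sketch of the API (the JDBC URL, table name, and credentials are all hypothetical; with no database driver on the classpath, the demo simply reports the connection failure):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class PreparedStatementDemo {
    // Returns the employee name for the given id, or a diagnostic string
    // when no database is reachable (e.g. when run outside the server).
    static String lookupName(String url, int id) {
        try (Connection conn = DriverManager.getConnection(url, "scott", "tiger");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT name FROM employees WHERE id = ?")) {
            ps.setInt(1, id);  // bind the parameter; the compiled plan is reused
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString("name") : null;
            }
        } catch (SQLException e) {
            return "no database: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Hypothetical Oracle thin-driver URL; no driver is registered here,
        // so this only demonstrates the API shape.
        System.out.println(lookupName("jdbc:oracle:thin:@//dbhost:1521/orcl", 42));
    }
}
```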

      However, when you use a prepared statement or a callable statement (a callable statement object provides a way to call stored procedures in a standard way across RDBMSs) in an application, there's additional overhead due to the need for communication between WebLogic Server and the database.

      To minimize the processing costs, WebLogic Server can cache prepared and callable statements used in your applications. When an application or EJB calls any of the statements stored in the cache, WebLogic Server reuses the statement stored in the cache. Reusing prepared and callable statements can:
      • Reduce CPU usage on the database server 
      • Improve the statement’s performance on the application server

      Statement Cache


      The statement cache caches statements from a specific physical connection.  Each connection in a data source has its own individual cache of prepared and callable statements used on the connection. However, you configure statement cache options per data source. That is, the statement cache for each connection in a data source uses the statement cache options specified for the data source, but each connection caches its own statements. Statement cache configuration options include:
      • Statement Cache Type—The algorithm that determines which statements to store in the statement cache. See Statement Cache Algorithms.
      • Statement Cache Size—The number of statements to store in the cache for each connection.

      Configuring Statement Cache Size




      You can configure the size of the statement cache from the WebLogic Server Administration Console by navigating to:

      • Services -> Data Sources -> ApplicationDB -> Configuration -> Connection Pool

      The value can be from 0 to 1024 (default: 10).  If you set the size of the statement cache to 0, statement caching is turned off.

      JDBC DataSource Runtime Statistics



      Each connection in the connection pool has its own cache of statements. JDBC DataSource Runtime Statistics shown in the table are the sum of the number of cached statements for all connections in the connection pool.
      • Prep Stmt Cache Access Count 
        • The total number of times the statement cache was accessed
      • Prep Stmt Cache Add Count
        • The total number of statements added to the statement cache for all connections
      • Prep Stmt Cache Current Size
        • The number of prepared and callable statements currently cached in the statement cache
      • Prep Stmt Cache Hit Count 
        • The running count of the number of times the server used a statement from the cache
      • Prep Stmt Cache Miss Count
        • The number of times that a statement request could not be satisfied with a statement from the cache
      Access Count is the sum of Hit Count and Miss Count.  When you tune cache performance, you want to reduce the miss ratio (i.e., Miss Count / Access Count) based on the guideline below.
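      The miss ratio can be computed directly from the runtime statistics; a small sketch (the sample counts are made up):

```java
public class CacheStats {
    // Miss ratio = Miss Count / Access Count,
    // where Access Count = Hit Count + Miss Count.
    static double missRatio(long hitCount, long missCount) {
        long accessCount = hitCount + missCount;
        return accessCount == 0 ? 0.0 : (double) missCount / accessCount;
    }

    public static void main(String[] args) {
        // e.g. 9,000 hits and 1,000 misses -> miss ratio 0.1 (10%)
        System.out.println(missRatio(9_000, 1_000));
    }
}
```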

      Tuning Guideline


      By increasing the statement cache size, you can increase your system performance. However, you must consider how your DBMS handles open prepared and callable statements. In many cases, the DBMS will maintain a cursor for each open statement. This applies to prepared and callable statements in the statement cache.

      If you cache too many statements, you may exceed the limit of open cursors on your database server. If the DBMS is Oracle, you will get an ORA-01000 error.  To avoid exceeding the limit of open cursors for a connection, you can change the limit in your database management system or you can reduce the statement cache size for the data source.

      References

      1. Oracle WebLogic Server 11g Administration Handbook
      2. JDBC statement cache
      3. Why Prepared Statements are important and how to use them "properly"
      4. Statement Cache Algorithms
      5. Monitoring and Tuning Oracle Fusion Applications
      6. Monitoring WebLogic JDBC Connection Pool at Runtime (XML and More)
      7. Oracle JDBC Memory Management (Oracle Database 12c)
      8. Fusion Middleware Performance and Tuning for Oracle WebLogic Server
      9. Oracle® Fusion Middleware Tuning Performance of Oracle WebLogic Server 12c (12.2.1)
      10. Top Tuning Recommendations for WebLogic Server

      Wednesday, October 3, 2012

      Monitoring WebLogic JDBC Connection Pool at Runtime

      Before you start any performance tuning, you need to monitor your application runtime behavior using default application server settings first.

      In this article, we will show you how to monitor the health of WebLogic JDBC connection pool. In a companion article[7], we also show you how to tune Prepared Statement Cache in WebLogic Server for better web application performance.

      Data Sources & JDBC Connection Pool



      WebLogic Server maintains a pool of reusable physical database connections to minimize the overhead involved in connecting to a database. All the connections in a pool connect to the same database and use the same username and password for the connections.

      WebLogic Server also manages your database connectivity through JDBC data sources. WebLogic Server data sources help separate database connection information from your application code.  Each data source that you configure contains a pool of database connections that are created when the data source instance is created—when it is deployed or targeted, or at server startup. The connection pool can grow or shrink dynamically to accommodate the demand.

      At runtime, Java applications perform a lookup of the JNDI tree to find the data source and request database connections using the getConnection method. Once the application is done with a connection, the connection goes back to the data source's connection pool.
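      A minimal sketch of that lookup (assuming a data source bound at the JNDI name jdbc/ApplicationDB; run outside the container, the lookup simply fails and the demo reports it):

```java
import java.sql.Connection;
import java.sql.SQLException;
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

public class DataSourceLookup {
    // Looks up a pooled connection by the data source's JNDI name.
    // Inside WebLogic, new InitialContext() binds to the server's JNDI tree;
    // outside a container the lookup fails, which we report as a string.
    static String fetchConnection(String jndiName) {
        try {
            Context ctx = new InitialContext();
            DataSource ds = (DataSource) ctx.lookup(jndiName);
            try (Connection conn = ds.getConnection()) {
                return "got connection: " + conn;  // close() returns it to the pool
            }
        } catch (NamingException | SQLException e) {
            return "lookup failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchConnection("jdbc/ApplicationDB"));
    }
}
```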

      DataSource Runtime Monitoring


      Using WebLogic Server Administration Console, you can monitor JDBC DataSource statistics by navigating to:
      • Servers --> SalesServer_1 --> Monitoring --> JDBC

There are many KPIs that you can monitor, and you can customize which ones are displayed in the table.  We have listed some important KPIs here:
      • Waiting For Connection High Count
        • Highest number of application requests concurrently waiting for a connection from this instance of the data source
      • Wait Seconds High Count
        • The highest number of seconds that an application waited for a connection (the longest connection reserve wait time) from this instance of the connection pool since the connection pool was instantiated
      • Connection Delay Time
        • The average amount of time, in milliseconds, that it takes to create a physical connection to the database
  • The value is calculated as the sum of all connection times divided by the total number of connections
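As a worked example of that formula, four hypothetical connections that took 120, 95, 140, and 105 ms to create yield an average of (120 + 95 + 140 + 105) / 4 = 115 ms:

```java
// Sketch of how an average connection delay metric is computed:
// the sum of individual connect times divided by the number of connections.
public class ConnectionDelayAverage {
    public static void main(String[] args) {
        long[] connectMillis = {120, 95, 140, 105};  // hypothetical connect times
        long sum = 0;
        for (long t : connectMillis) {
            sum += t;
        }
        double average = (double) sum / connectMillis.length;
        System.out.println("average connection delay (ms): " + average);  // 115.0
    }
}
```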

      Configuring the Connection Pool


      When the WebLogic Server starts up or when you deploy a data source to a new target, the connection pool is registered with the server, meaning that the connection pool and its connections are created at that time. You can configure various settings to control the connection pool size and the way the pool can shrink and grow.  You should tune pool sizes based on DataSource statistics. For example, you should ensure connection wait time is not high.

      We have listed some settings here that you may want to tune for your applications:
      • Initial Capacity
        • Number of connections created when pool is initialized
      • Minimum Capacity
        • Minimum number of connections that will be maintained in the pool
        • Should be tuned for steady load
      • Maximum Capacity
        • Maximum number of connections that pool can have
        • Should be tuned to peak load
      • Shrink Frequency 
        • Should be enabled to drop some connections from the data source when a peak usage period has ended, freeing up WebLogic Server and DBMS resources
      It is common to set the initial capacity to a value that handles your estimated average, but not necessarily the maximum number of connections to the database. Ideally, you want to make sure that you have enough initial connections to match the number of concurrent requests that you expect to have running on any given server instance.

To be on the safe side, you can set the initial capacity to the same value as the maximum capacity; this way, the connection pool will have all of its physical connections ready when the pool is initialized.  However, sometimes you do want to adjust pool size dynamically at runtime; see [5].

The rule of thumb for pool sizing is simply to make sure that the pool is large enough for all server threads to get access to the pooled resources they need concurrently.  In previous versions of WebLogic Server, this was usually simple.  For example, each execute thread needs access to one database connection from each pool, so you always made sure that the maximum capacity of the database connection pool was greater than or equal to the number of execute threads. With the introduction of server self-tuning, the number of execute threads isn't necessarily well defined.  In that case, the tips provided here may be helpful to you.
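The grow-and-shrink behavior described above can be sketched as a toy pool. This is an illustrative analogy only, not WebLogic's implementation; the capacity names merely mirror the console settings discussed earlier:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy sketch of Initial/Minimum/Maximum Capacity and shrink behavior.
// Real WebLogic pools are configured in the console, not coded by hand.
public class ToyConnectionPool {
    private final int minCapacity;
    private final int maxCapacity;
    private final Deque<Object> idle = new ArrayDeque<>();
    private int total;  // idle + reserved connections

    public ToyConnectionPool(int initialCapacity, int minCapacity, int maxCapacity) {
        this.minCapacity = minCapacity;
        this.maxCapacity = maxCapacity;
        for (int i = 0; i < initialCapacity; i++) {
            idle.push(new Object());  // pre-created physical connections
        }
        total = initialCapacity;
    }

    // Reserve a connection, growing the pool on demand up to maxCapacity.
    public Object reserve() {
        if (!idle.isEmpty()) {
            return idle.pop();
        }
        if (total < maxCapacity) {
            total++;
            return new Object();
        }
        throw new IllegalStateException("pool exhausted; a real caller would wait");
    }

    public void release(Object conn) {
        idle.push(conn);  // the connection goes back to the pool
    }

    // Shrink Frequency analogue: drop idle connections back down to minCapacity.
    public void shrink() {
        while (total > minCapacity && !idle.isEmpty()) {
            idle.pop();
            total--;
        }
    }

    public int size() {
        return total;
    }

    public static void main(String[] args) {
        ToyConnectionPool pool = new ToyConnectionPool(2, 2, 5);
        Object a = pool.reserve();  // served from the initial connections
        Object b = pool.reserve();
        Object c = pool.reserve();  // pool grows on demand
        System.out.println("after growth: " + pool.size());   // 3
        pool.release(a);
        pool.release(b);
        pool.release(c);
        pool.shrink();  // peak usage has ended; fall back to minimum capacity
        System.out.println("after shrink: " + pool.size());   // 2
    }
}
```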

      Acknowledgement


Some writings here are based on feedback from Sandeep Mahajan and Stevan Malesevic. However, the author assumes full responsibility for the content.

      References

      1. Professional Oracle WebLogic Server
      2. Oracle WebLogic Server 11gR1 PS2: Administration Essentials
      3. The WebLogic Server Administration Console
      4. Managing WebLogic JDBC Resources
      5. Dynamically Sizing JDBC Connection Pool in WebLogic Server
      6. Configuring JDBC Data Sources in JDeveloper and Oracle WebLogic Server
      7. Tuning WebLogic's Prepared Statement Cache
      8. Data Source Connection Pool Sizing

      Monday, October 1, 2012

Performance Tuning for WebLogic Server: Native Muxers vs. Java Muxers

      There are two critical areas for WebLogic Server (WLS) performance tuning:
      • Thread management
      • Network I/O tuning
      In this article, we will touch upon one aspect of Network I/O tuning—Native Muxers vs. Java Muxers.

      Listen Thread


  Listen Thread --> Listen Thread Queue --> Socket Muxer

      When a server process starts up, it binds itself to a port and assigns a listen thread to the port to listen for incoming requests.  Once the request makes a connection, the server passes the control of that connection to the socket muxer.

      From the thread dump, you can find an entry like this:
        "DynamicListenThread[Default[9]]" daemon prio=10 tid=0x00002aaac921b800 
         nid=0x3bf1 runnable [0x000000004c026000]
      
      
      From the server log file, you can find a matching entry like this:
        <Oct 2, 2012 11:02:28 AM PDT> <Notice> <Server> <BEA-002613>
        <Channel "Default[9]" is now listening on 0:0:0:0:0:0:0:1:9000 
        for protocols iiop, t3, ldap, snmp, http.>

      Socket Muxer


        socket muxer --> execute queue 
Muxers read messages from the network, bundle them into a package of work, and queue them to the Work Manager.  An idle execute thread then picks up a request from the execute queue and may in turn hand off the job of responding to it to special threads.  Finally, socket muxers also make sure the response gets back to the same socket from which the request came. Socket muxers are software modules, and there are two types:
      • Java Muxers
  • Use pure Java to read data from sockets
        • The number of threads is tunable for Java muxers by configuring the Percent Socket Readers parameter setting in the Administration Console
      • Native Muxers 
        • Native muxers use platform-specific native binaries to read data from sockets
          • The majority of all platforms provide some mechanism to poll a socket for data
        • Native muxers provide better performance, especially when scaling to large user bases, because they implement a non-blocking thread model
  • Note that Native IO is not supported for WebLogic clients, which include WLST
The Enable Native IO checkbox in the server's configuration settings tells the server which version to use.  In the above figure, we have selected Native IO and, therefore, the JavaSocketMuxer Socket Readers setting is grayed out.

      In general, the server will determine the correct type of muxer to use and will use the native muxers by default without having to make any specific tuning changes.
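The non-blocking thread model that lets native muxers scale can be illustrated with java.nio, where a single selector thread polls many sockets at once instead of blocking on each one. This is an analogy only, not WebLogic's actual muxer code:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;

// Analogy: one selector thread multiplexing many channels, the same
// non-blocking idea a native muxer relies on.
public class MiniMuxer {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(0));  // bind to any free port
        server.configureBlocking(false);        // required before registering
        server.register(selector, SelectionKey.OP_ACCEPT);

        // One thread can now watch many channels; select() reports how many
        // channels are ready rather than blocking per socket.
        int ready = selector.select(100);       // poll with a 100 ms timeout
        System.out.println("ready channels: " + ready);

        server.close();
        selector.close();
    }
}
```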

      Which Muxer Was Actually Used?


The quickest way is to create a thread dump (for example, using jstack) and search for "Muxer".  In our experimental environment, the Posix Muxer happened to be picked up:

      "ExecuteThread: '2' for queue: 'weblogic.socket.Muxer'" daemon prio=10 tid=0x00002aaae190b800 nid=0x10cf runnable [0x0000000040e13000]
         java.lang.Thread.State: RUNNABLE
              at weblogic.socket.PosixSocketMuxer.poll(Native Method)
              at weblogic.socket.PosixSocketMuxer.processSockets(PosixSocketMuxer.java

      -Dweblogic.SocketReaders


      You can explicitly set the number of socket readers using the following command line option:
      • -Dweblogic.SocketReaders=3
If you set it to 3, you will find entries like the following in the thread dump:
      "ExecuteThread: '2' for queue: 'weblogic.socket.Muxer'" daemon prio=10 tid=0x00002aaac8776000 nid=0x3475 waiting for monitor entry [0x0000000041dbd000]
         java.lang.Thread.State: BLOCKED (on object monitor)
      
      "ExecuteThread: '1' for queue: 'weblogic.socket.Muxer'" daemon prio=10 tid=0x00002aaac8774800 nid=0x3474 waiting for monitor entry [0x0000000041cbc000]
         java.lang.Thread.State: BLOCKED (on object monitor)
      
      "ExecuteThread: '0' for queue: 'weblogic.socket.Muxer'" daemon prio=10 tid=0x00002aaac8770000 nid=0x3473 runnable [0x0000000041877000]
         java.lang.Thread.State: RUNNABLE
      
      
The main reason to do this is that in some releases the number of readers defaults to the number of CPUs available on the system. On some types of hardware this results in as many as 128 reader threads, which is excessive.

Typically you will see good performance with anywhere between 1 and 3 socket reader threads. In some cases, people have used 6, but those are special cases.  Be warned that not having enough readers will result in work not being read from the sockets quickly enough for the server to process.

Using our ATG CRM benchmark, you can see the changes in throughput and response time when the number of SocketReaders is changed from 1 to 3:

                                       SocketReaders=1    SocketReaders=3
  Maximum Running Vusers               400                400
  Total Throughput (bytes)             2,487,087,264      2,496,307,995
  Average Throughput (bytes/second)    1,036,286          1,040,128
  Average Hits per Second              29.786             29.86
  Average Response Time (seconds)      0.248              0.236
  90% Response Time (seconds)          0.209              0.210

      BEA-000438


      In some circumstances, you may see the following error message:
        <BEA-000438> <Unable to load performance pack. Using Java I/O instead.
        Please ensure that libmuxer library is in...
      
For instance, this can happen when you use a 64-bit JVM and libmuxer.so is not on the LD_LIBRARY_PATH.  To resolve it, just add the following path:
      • <Oracle Home>/wlserver_10.3/server/native/linux/x86_64
      to the LD_LIBRARY_PATH.

      Acknowledgement


Some of the writings here are based on feedback from Sandeep Mahajan. However, the author assumes full responsibility for the content.

      References

      1. Oracle WebLogic Server 11g Administration Handbook (Oracle Press)
      2. HotSpot VM Binaries: 32-Bit vs. 64-Bit
      3. Weblogic - Socket Muxers in Thread Dumps