Friday, May 31, 2013

Understanding String Table Size in HotSpot

JDK-6962930[2] requested that the string table size be made configurable.  The bug was resolved on 04/25/2011 and the fix is available in JDK 7.  Another JDK bug[3] requested that the default size (i.e., 1009) of the string table be increased.

In this article, we will examine the following topics:
  • What string table is
  • How to find the number of interned strings in your applications
  • The tradeoff between memory footprint and lookup cost

String Table


In Java, string interning[1] is a method of storing only one copy of each distinct string value, which must be immutable. Interning strings makes some string processing tasks more time- or space-efficient at the cost of requiring more time when the string is created or interned. The distinct values are stored in a string intern pool, which is the string table in HotSpot.
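Interning can be observed directly from Java code. Below is a minimal sketch (the class and method names are ours): a string built at runtime is a distinct object, but calling intern() returns the single pooled copy, which is the same reference as the compile-time literal.

```java
public class InternDemo {
    // Build the string at runtime so the compiler cannot pre-intern it as a literal.
    public static String build() {
        return new StringBuilder("Hot").append("Spot").toString();
    }

    public static boolean sameReference() {
        String a = build().intern();   // returns the copy in the string pool
        String b = "HotSpot";          // literals are interned automatically
        return a == b;                 // identity comparison: both point at the pool copy
    }

    public static void main(String[] args) {
        System.out.println("interned == literal? " + sameReference());
    }
}
```

Note that build() == "HotSpot" would be false; only the interned reference matches the literal.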

The size of the string table (i.e., a chained hash table) is configurable in JDK 7.  When the overflow chains become long, performance can degrade.  The current default size of the string table is 1009 (or 1009 buckets), which is too small for applications that stress the string table.  Note that the string table itself is allocated in native memory but the strings are Java objects.

Increasing the size improves performance (i.e., reduces lookup cost) but increases the string table's footprint by 16 bytes on 64-bit systems (8 bytes on 32-bit systems) for each additional bucket.  For example, changing the default size to 60013 increases the string table size by about 460 KB on 32-bit systems.
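The footprint math above can be sketched in a few lines (the class and method names are ours; the per-bucket costs are taken from this article):

```java
public class StringTableFootprint {
    /** Extra bytes used when growing the table from oldBuckets to newBuckets. */
    public static long extraBytes(int oldBuckets, int newBuckets, int bytesPerBucket) {
        return (long) (newBuckets - oldBuckets) * bytesPerBucket;
    }

    public static void main(String[] args) {
        // 32-bit: 8 bytes per bucket; 64-bit: 16 bytes per bucket
        long delta = extraBytes(1009, 60013, 8);
        System.out.println(delta / 1024 + " KB");  // ~460 KB, matching the article
    }
}
```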

Finding the Number of Interned Strings in Your Applications


HotSpot provides a product-level option named -XX:+PrintStringTableStatistics which can be used to print hash table statistics[4].  For example, one of our applications (hereafter referred to as JavaApp) prints the following information:

StringTable statistics:
Number of buckets       :   60013
Average bucket size     :       5
Variance of bucket size :       5
Std. dev. of bucket size:       2
Maximum bucket size     :      17

You can find the above output in your managed server's log file in the WebLogic domain.  Note that we set the following option:
  • -XX:StringTableSize=60013
So, there are 60013 buckets in the hash table (or string table).

The JDK also ships with a tool named jmap which can be used to find the number of interned strings in your application.  For example, we found the following information using:

$ jdk-hs/bin/jmap -heap 18974
Attaching to process ID 18974, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 24.0-b43

using thread-local object allocation.
Parallel GC with 18 thread(s)

Heap Configuration:
   MinHeapFreeRatio = 40
   MaxHeapFreeRatio = 70
   MaxHeapSize      = 2147483648 (2048.0MB)
   NewSize          = 1310720 (1.25MB)
   MaxNewSize       = 17592186044415 MB
   OldSize          = 5439488 (5.1875MB)
   NewRatio         = 2
   SurvivorRatio    = 8
   PermSize         = 402653184 (384.0MB)
   MaxPermSize      = 402653184 (384.0MB)
   G1HeapRegionSize = 0 (0.0MB)

Heap Usage:
PS Young Generation

<deleted for brevity>

270145 interned Strings occupying 40429904 bytes.


Therefore, we know there are around 270K interned Strings in the table.
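The jmap count and the PrintStringTableStatistics output above are consistent with each other. A quick sketch (class and method names are ours) of the expected average chain length in a chained hash table:

```java
public class StringTableLoad {
    /** Expected average bucket (chain) size: entries divided by buckets, rounded. */
    public static long averageBucketSize(long entries, long buckets) {
        return Math.round((double) entries / buckets);
    }

    public static void main(String[] args) {
        // 270145 interned strings over 60013 buckets -> average chain of ~5,
        // matching the "Average bucket size : 5" statistics line above.
        System.out.println(averageBucketSize(270145, 60013));
    }
}
```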

Tradeoff Between Memory Footprint and Lookup Cost


Out of curiosity, we set the string table size to 277331 (a prime number) to see how JavaApp performs.  Here are our findings:

  • Average Response Time: +0.75%
  • 90% Response Time: +0.56%

However, the memory footprint has increased:
  • Total Memory Footprint: -1.03%

Finally, here are the hash table statistics based on the new size (i.e., 277331):

StringTable statistics:
Number of buckets       :  277331
Average bucket size     :       1
Variance of bucket size :       1
Std. dev. of bucket size:       1
Maximum bucket size     :       8


The conclusion is that increasing the string table size from 60013 to 277331 helps JavaApp's performance a little bit at the expense of a larger memory footprint.  In this case, the benefit is minimal, and keeping the string table size at 60013 is good enough.

References

  1. String Interning (Wikipedia)
  2. JDK 6962930 : make the string table size configurable
  3. JDK 8009928: Increase default value for StringTableSize
  4. Java GC tuning for strings
  5. All other performance tuning articles on XML and More
  6. G1 GC Glossary of Terms


Wednesday, May 29, 2013

How to Debug Native OutOfMemory in JRockit

This is the first time that I have seen the following messages:

Caused By: java.lang.OutOfMemoryError: CG #210992 (2) weblogic/management/configuration/DomainMBeanImpl$Helper.getChildren()Ljava/util/Iterator; in generate_code (compilerfrontend.c:537).
Attempting to allocate 6G bytes
There is insufficient native memory for the Java Runtime Environment to continue.

In this article, we will discuss what native memory is and how to debug running out of native memory in JRockit.

Native Memory vs. Heap Memory


There are two types of memory used by the JVM and its applications, both of which are allocated from system memory:
  • Java Heap
    • Java heap is the area of memory used by the JVM to do dynamic memory allocation.
    • The amount of memory used for the heap can be controlled by the following command options:
      • –Xms2g
      • –Xmx2g
    • Heap memory can be garbage collected[4].
  • Native Memory
    • Internal JVM memory management is, to a large extent, kept off the Java heap and allocated natively in the operating system, through system calls like malloc.   This non-heap system memory allocated by the JVM is referred to as native memory. 
    • For JRockit, increasing the amount of available native memory is done implicitly by lowering the maximum Java heap size using –Xmx.
If the heap is too large, it may well be the case that not enough native memory is left for JVM internal usage—bookkeeping, code optimizations, and so on. In that case, the JVM may have no other choice than to throw an OutOfMemoryError from native code (for example, from line 537 of compilerfrontend.c in the previous example).

One example is when several parallel threads perform code optimizations in the JVM. Code optimization typically is one of the JVM operations that consumes the largest amounts of native memory, though only when the optimizing JIT is running and only on a per-method basis.

There are also mechanisms that allow the Java program, and not just the JVM, to allocate native memory, for example through JNI calls. If a JNI call executes a native malloc to reserve a large amount of memory, this memory will be unavailable to the JVM until it is freed.
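Besides JNI, the NIO direct-buffer API is another Java-level mechanism that consumes native rather than heap memory. A minimal sketch (class and method names are ours):

```java
import java.nio.ByteBuffer;

public class NativeAlloc {
    public static ByteBuffer allocate(int bytes) {
        // Direct buffers live outside the Java heap; only the small ByteBuffer
        // wrapper object is heap-allocated. The backing storage counts against
        // native memory, much like memory reserved by a native malloc in JNI.
        return ByteBuffer.allocateDirect(bytes);
    }

    public static void main(String[] args) {
        ByteBuffer buf = allocate(1 << 20);  // 1 MB of native memory
        System.out.println(buf.isDirect() + " " + buf.capacity());
    }
}
```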

Code Buffers


JRockit is unique in that it has no bytecode interpreter[1].  The native code is emitted into a code buffer and executed whenever the function it represents is called.  There are two main problems associated with this compile-only strategy:
  • Larger compiled-code size
    • This problem is mitigated by garbage collecting code buffers with methods no longer in use.
  • Long compilation time for large methods
    • This problem is solved by having a sloppy mode for the JIT.
    • Sometimes JRockit will spend a lot of time generating a relatively large method, the typical example being a JSP.
      • However, once finished, the response time for accessing that JSP will be better than that of an interpreted version.
The problem of running out of memory for metadata in JRockit is not that different from the one in HotSpot, except that it is native memory instead of heap memory. There are, however, two differences:
  • Cleaning up stale metadata is always enabled by default in JRockit
    • UseCodeGC = true (default)
      • Allow GC of discarded compiled code
    • FreeEmptyCodeBlocks = true (default)
      • Free unused code memory
  • There is no fixed size limit, by default, for the space used to store metadata

JRCMD[2]

When JRockit runs out of native memory and throws an OOM error, JRCMD can be used for debugging.  JRCMD is a small command-line tool that can be used to interact with a running JRockit instance, and it can be used to track native memory usage.

There is no need to pre-configure the JVM or the application to be able to later attach the tool. Also, the tool adds virtually no overhead, making it suitable for use in live production environments.

The tools.jar in the JDK contains an API for attaching to a running JVM—the Java Attach API. This framework is utilized by JRCMD to invoke diagnostic commands.

For debugging OOM, you can invoke jrcmd with the print_memusage command and the displayMap argument:

$ ./jrcmd 411 print_memusage displayMap
411:
Total mapped                  3641460KB           (reserved=178564KB)
-              Java heap      2097152KB           (reserved=0KB)
-              GC tables        70156KB
-          Thread stacks        45876KB           (#threads=132)
-          Compiled code        65536KB           (used=45010KB)
-               Internal         1672KB
-                     OS       394836KB
-                  Other       544088KB
-            Classblocks        27392KB           (malloced=26718KB #62502)
-        Java class data       393728KB           (malloced=388547KB #294025 in 62502 classes)
- Native memory tracking         1024KB           (malloced=168KB #10)


+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    OS                          *java    r x 0x0000000000400000.(     76KB)
    OS                          *java    rw  0x0000000000612000.(      4KB)
    OS                        *[heap]    rw  0x000000001e8a0000.( 284976KB)
THREAD                      Stack 457    rwx 0x000000004007c000 (      8KB)
THREAD                      Stack 457        0x000000004007e000 (     12KB)

In the header section, the first column contains the name of a memory space (e.g., "Java heap"), and the second column shows how much memory is mapped for that space. The third column contains details.

In the map section, the first column shows the category of memory chunks:
  • THREAD: Thread related, for example thread stacks.
  • INT: Internal use, for example pointer pages.
  • HEAP: Chunk used by JRockit for the Java heap.
  • OS: Mapped directly from the operating system, such as third party DLLs or shared objects.
  • MSP: Memory space. A memory space is a native heap with a specific purpose, for example native memory allocation inside the JVM.
  • GC: Garbage collection related, for example live bits.
  • CODE: compiled code
When tracking native memory leaks, it is useful to look at how much the memory usage changes over time.  You can do this by establishing a baseline first:

$jrcmd 411 print_memusage scale=M baseline

The argument baseline is used to establish a point from which to start measuring.  The scale argument modifies the unit of the amounts of memory in the printout (default is KB).  Once print_memusage is executed with the baseline argument, subsequent calls will include differentials against the baseline.  This can facilitate the monitoring of memory usage changes over time.

References

  1. Oracle JRockit - The Definitive Guide by Marcus Hirt and Marcus Lagergren
  2. Diagnostic Commands (JRCMD)
  3. JNI calls (Wikipedia)
  4. Understanding Garbage Collection (XML and More)

Thursday, May 16, 2013

How to Determine JDBC Driver Version Installed with WebLogic Server?



Updated (08/15/2014)

Since the writing of this blog article, things have changed.[5,6] For example, in WebLogic Server 12.1.2, the following files have moved from wlserver/server/lib to ORACLE_HOME/oracle_common/modules/oracle.jdbc_11.2.0:
  • ojdbc5.jar
  • ojdbc6.jar
  • ojdbc6dms.jar
  • ojdbc5_g.jar
  • ojdbc6_g.jar
Updated (12/09/2014)

Added a new section "List of Thin JDBC Driver Versions"


There are built-in JDBC drivers installed with WebLogic Server. For example, the 11g version of the Oracle Thin driver (ojdbc6.jar for JDK 6) is bundled with WebLogic Server. That driver can be found in:
  • WL_HOME/server/lib
If you plan to use a different version of any of the drivers installed with WebLogic Server, you can replace the driver file in WL_HOME/server/lib with an updated version of the file or add the new file to the front of your CLASSPATH. However, be warned that if you replace the default JDBC driver in WLS, you might miss some enhancements that shipped with it. For example, you should not replace the one from WLS with the one from the Oracle JDBC download (i.e., D:\download\oracle\JDBC\JDBCDrivers\11.2.0.3\ojdbc6.jar) without consultation.

Copies of Oracle Thin drivers and other supporting files (e.g., a debug version named ojdbc6_g.jar) can also be found in
  • WL_HOME/server/ext/jdbc/
There is a subdirectory in this folder for each DBMS. If you need to revert to the version of the driver installed with WebLogic Server, you can copy the file from WL_HOME/server/ext/jdbc/ to WL_HOME/server/lib.

Manifest File


Built-in JDBC driver files are listed in the manifest of weblogic.jar (see below). So they can be loaded when weblogic.jar is loaded (when the server starts). Therefore, you do not need to add them to your CLASSPATH[3].

...
Implementation-Version: 10.3.6.0
Class-Path: ../../../modules/features/weblogic.server.modules_10.3.6.0
.jar schema/weblogic-domain-binding.jar schema/weblogic-domain-bindin
g-compatibility.jar schema/diagnostics-binding.jar schema/diagnostics
-image-binding.jar wlcipher.jar webservices.jar xmlx.jar ojdbc6.jar o
ns.jar ucp.jar aqapi.jar EccpressoAsn1.jar EccpressoCore.jar Eccpress
oJcae.jar mysql-connector-java-commercial-5.1.17-bin.jar wlsqlserver
.jar wldb2.jar wlsybase.jar wlinformix.jar fmwgenerictoken.jar wlw-la
ngx.jar jcom.jar weblogic-L10N.jar
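The wrapped Class-Path lines above follow the standard JAR manifest format: values wrap at 72 bytes, and continuation lines start with a single space, which is why entries like "ojdbc6.jar o / ns.jar" appear split mid-name. java.util.jar.Manifest reassembles them for you. A small sketch (the class name and sample manifest text are ours):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.jar.Manifest;

public class ManifestClassPath {
    /** Reads the Class-Path attribute from manifest text, rejoining wrapped lines. */
    public static String classPathOf(String manifestText) throws Exception {
        Manifest mf = new Manifest(
                new ByteArrayInputStream(manifestText.getBytes(StandardCharsets.UTF_8)));
        return mf.getMainAttributes().getValue("Class-Path");
    }

    public static void main(String[] args) throws Exception {
        // A tiny stand-in manifest; the real weblogic.jar manifest is much longer.
        // Note the continuation line (leading space) splitting "ons.jar" mid-name.
        String text = "Manifest-Version: 1.0\r\n"
                + "Class-Path: ojdbc6.jar o\r\n"
                + " ns.jar\r\n"
                + "\r\n";
        System.out.println(classPathOf(text));
    }
}
```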


If you plan to use a third-party JDBC driver that is not installed with WebLogic Server, you need to update the WebLogic Server's classpath to include the location of the JDBC driver classes. Edit the
  • commEnv.cmd/sh script

in WL_HOME/common/bin and prepend your classes as described in "Modifying the Classpath" in the Oracle Fusion Middleware Command Reference for Oracle WebLogic Server.

How to Determine JDBC Driver Version?


Before you switch to a different version of any driver, you should first find out which version the existing one is.

To find out which Oracle Thin driver version is used, you can do:

$ java -jar wlserver_10.3/server/lib/ojdbc6.jar
Exception in thread "main" java.lang.ClassFormatError: oracle.jdbc.OracleDriver (unrecognized class file version)
at java.lang.VMClassLoader.defineClass(libgcj.so.7rh)
at java.lang.ClassLoader.defineClass(libgcj.so.7rh)
at java.security.SecureClassLoader.defineClass(libgcj.so.7rh)
at java.net.URLClassLoader.findClass(libgcj.so.7rh)
at java.lang.ClassLoader.loadClass(libgcj.so.7rh)
at java.lang.ClassLoader.loadClass(libgcj.so.7rh)
at gnu.java.lang.MainThread.run(libgcj.so.7rh)


However, you need to use the correct version of the java executable. For example, the following default java executable is version 1.4.2 and won't work because our driver was compiled with JDK 6.

$ which java
/usr/bin/java

$ java -version
java version "1.4.2"
gij (GNU libgcj) version 4.1.2 20080704 (Red Hat 4.1.2-46)


So, after choosing the correct java executable, we found the Oracle JDBC Thin driver's version as shown below:

$ /export/home/bench/workload/target_jvm/jdk-hs/bin/java -jar wlserver_10.3/server/lib/ojdbc6.jar
Oracle 11.2.0.3.0 JDBC 4.0 compiled with JDK6 on Fri_Nov_04_08:05:20_PDT_2011
#Default Connection Properties Resource
#Thu May 16 15:03:40 PDT 2013


List of Thin JDBC Driver Versions

In the following table, we have listed the versions of the Oracle JDBC Thin driver[9] that come with WebLogic Server versions 9.2 through 12c.  However, always consult Oracle Support for confirmation.


WLS Version   Date            Thin Jar Version
12.1.2        2012            11.2.0.3
12.1.1        December 2011   11.2.0.3
10.3.6        December 2011   11.2.0.3
10.3.5        April 2011      11.2.0.3
10.3.4        January 2011    11.2.0.2
10.3.3        April 2010      11.1.0.7
10.3.2        November 2009   11.1.0.7
10.3.1        June 2009       11.1.0.7
10.3          August 2008     11.1.0.6.0
10.0 MP2      January 2009    11.1.0.6.0
10.0 MP1      October 2007    10.2.0.2.0
9.2.0         July 2006       10.2.0.2.0


References

  1. Using JDBC Drivers with WebLogic Server (11g)
  2. Oracle® Fusion Middleware Command Reference for Oracle WebLogic Server 11g Release 1 (10.3.1)
  3. Adding Classes to the JAR File's Classpath (The Java Tutorials)
  4. Oracle Database 11g Release 2 JDBC Drivers
  5. What's New in Oracle WebLogic Server 12c
  6. Understanding the Standard Installation Topology (12c)
  7. ojdbc6.jar Download
    • Be warned that upgrading the driver (say from 11.2.0.3 to 11.2.0.4) by downloading is not the right way. You could be missing some bug fixes needed by WebLogic Server. So, always consider applying the patch provided by Oracle to upgrade your WLS JDBC driver appropriately.
  8. Older JDBC Driver Downloads - 10g (Unsupported)
  9. Type 3 vs. Type 4 JDBC Drivers
    • Type 4 JDBC Drivers
      • The JDBC Thin driver (i.e., ojdbc6.jar) is a pure Java, Type IV driver that can be used in applications and applets. It is platform-independent and does not require any additional Oracle software on the client-side. The JDBC Thin driver communicates with the server using SQL*Net to access Oracle Database.
      • The JDBC Thin driver allows a direct connection to the database by providing an implementation of SQL*Net on top of Java sockets. The driver supports the TCP/IP protocol and requires a TNS listener on the TCP/IP sockets on the database server.
      • To use the Oracle Type 4 JDBC drivers, you create a JDBC data source in your WebLogic Server configuration and select the JDBC driver to create the physical database connections in the data source. Applications can then look up the data source on the JNDI tree and request a connection.

Sunday, May 5, 2013

Book Review: Managing Multimedia and Unstructured Data in the Oracle Database

There is a large amount of unstructured data in the real world. In 1998, Merrill Lynch cited a rule of thumb that somewhere around 80-90% of all potentially usable business information may originate in unstructured form[1].

There is no doubt that we need to process this information to extract meaning and create structured data about the information.  One approach is to use Oracle Database to manage such data, which includes multimedia (aka "rich media").  This book aims to help readers understand and manage unstructured data using Oracle Database.


Dealing with Unstructured Data


Unstructured data refers to information that either does not have a pre-defined data model or does not fit well into relational tables[1].

Relational data can be considered a subset of structured data.  Besides relational databases, structured data can also be stored in non-relational formats such as XML, inverted list databases[2], or object databases[4]. XML does not conform to the formal structure of relational data models, but it nonetheless contains tags to separate semantic elements and enforce hierarchies of records and fields within the data.  Therefore, one can consider XML semi-structured.

It's possible to store unstructured data in a column of a relational table, which is structured.  The traditional approach has been to just treat it as a blob (binary large object), but with a greater understanding of the variety of unstructured data types (i.e., video, audio, photographs, documents, etc.) that exist, the need to manage them has grown.

Metadata


To manage unstructured data, metadata is crucial. It is the data that describes the unstructured data and gives meaning to it.   Metadata can be used for
  • Searching (covered in Chapter 4, Searching the Multimedia Warehouse)
  • Annotation
    • Adding meaning to unstructured data objects
  • Relating unstructured data objects (or adding structure)
  • Matching data stored in relational databases
It is envisaged that in the future technology will improve to the point that algorithms will be able to identify objects and people in a video or photo, and understand sounds and complex speech in audio files. When that point is reached, the need for metadata may be reduced or limited to a smaller scope.

Oracle Database


In the past few years, with changes in database technology and improvements in disk performance and storage, it now makes business sense to use the Oracle database to store and manage all of an organization's unstructured data.

For a database management system to correctly handle unstructured data, it must have support for objects[4].  The use of a database that can support objects makes it a lot easier to manage large volumes of digital objects. Though these objects can be stored in a file system, there are now advantages to storing them inside the database.  In Oracle, both relational data and objects are supported.  After adding Online Analytical Processing (OLAP)[5] and XML[6] support, the Oracle database grew from being relational to supporting most structures.

Oracle Multimedia uses BLOBs and new types, which can be accessed and used as required. In addition, it supports a variety of methods that simplify the act of loading and manipulating digital objects. This is covered in Chapter 7, Techniques for Creating a Multimedia Database.

Most databases can store unstructured data, but do not support the management, control, and manipulation of that data. Even though Oracle is a market leader in unstructured data management, there are still a large number of major improvements needed.  This is covered in Chapter 9, Understanding the Limitations of Oracle Products.

Scalability


When working with multimedia and unstructured data, a row in the database can be 10 GB in size, which could be greater than an entire relational database.  Therefore, traditional tuning techniques might fail as the rules regarding them no longer make sense.

For example, in a multimedia warehouse (covered in Chapter 4), achieving logical data consistency is not attempted, as it becomes apparent that fuzzy data forms the bulk of most of the digital objects.  So, novel solutions to tuning problems are needed.

In Oracle Multimedia, there is also built-in support for scalability.  For example, the new 11g SecureFiles BLOBs use parallel techniques that allow files to be loaded much faster than with traditional BLOBs.

On the hardware front, technology also offers help.  With the recent introduction of low-cost terabyte SATA disks, and with the use of low-cost SANs, the ability to store a petabyte is within the reach of a number of organizations.

To handle large amounts of unstructured data, the issues seen and the solutions Oracle provides include:

  • Hitting limits on the image size
    • An Oracle BLOB can be unlimited in size
  • Reaching internal structural limits within the database (max number of files that store data)
    • Oracle's use of tablespaces allows a large number of multimedia files to be stored in them
    • With the ability to control where a blob is stored, files can be split across multiple tablespaces and devices 
  • Dealing with fragmentation 
    • By using locally managed tablespaces, fragmentation is removed as a performance issue. 
  • The efficient management of those images (for example, backup/recovery)
    • Using partitioning on LOBS allows a very large number of multimedia files to be stored and efficiently managed
    • Using RMAN, Oracle can be configured to back up large amounts of data
When dealing with multimedia, one has to look at the different dimensions of scalability to best understand how the Oracle Database handles it.  This includes managing CPU, memory, disk I/O, and network bandwidth. All of the above are covered in Chapter 8, Tuning.

References

  1. Unstructured data (Wikipedia)
  2. ADABAS (Wikipedia)
  3. 3D Printing (Wikipedia)
  4. Object Database (Wikipedia)
  5. Oracle OLAP
  6. Oracle XML DB
  7. Managing Multimedia and Unstructured Data in the Oracle Database (Reviewed Book)