Wednesday, June 25, 2014

Oracle Fusion Apps: The port may already be used by another process

After cloning our benchmark (i.e., CRM FUSE) to new servers, the following issue popped up:
The port may already be used by another process
This happened only after I restarted some managed servers in the WebLogic Domain.  In this article, we will describe what happened and how to fix it.

The Issue


The port in conflict was 9020. As described in a companion article[1], you can use the netstat command on Linux to investigate:
$ netstat -an |grep 9020

tcp  0  0 ::ffff:10.214.10.20:7101    ::ffff:10.214.10.20:9020    ESTABLISHED 
tcp  0  0 ::ffff:10.214.10.20:9020    ::ffff:10.214.10.20:7101    ESTABLISHED 

So port 9020 was indeed in use by another process. We had many Fusion applications running on the same server, so many sockets were being created. One managed server needs to listen on port 9020, but the OS had already handed that port out dynamically as the local end of another connection (here, a connection to port 7101 on the same host).
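
Because a plain grep for 9020 also matches ports such as 19020, a stricter filter can help when the netstat output is long. The following is a minimal sketch, assuming the Linux `netstat -an` column layout shown above (local address in the fourth column, foreign address in the fifth). `port_users` is a hypothetical helper name, not part of any tool.

```shell
# Print TCP lines from "netstat -an" whose local or foreign port equals $1.
# Usage: netstat -an | port_users 9020
port_users() {
    awk -v port="$1" '
        $1 ~ /^tcp/ {
            # The port is whatever follows the last colon of an address.
            n = split($4, l, ":")
            m = split($5, f, ":")
            if (l[n] == port)      print "local  ", $0
            else if (f[m] == port) print "foreign", $0
        }'
}
```

Each match is labeled by the side on which the port appears. To see which process owns the socket, `netstat -anp` or `lsof -i :9020` may help, although both usually need root to show processes owned by other users.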

In [1], we documented a way to work around the port conflict by re-ordering the start-up steps. However, that is not guaranteed to work every time, so this article looks at another approach.

TCP Socket


A socket pair specifies the two endpoints that uniquely identify each TCP connection in an internet. An internet connection can use different transport protocols; here we cover only TCP connections.

Each endpoint, local or remote, is a combination of an IP address and a port number, much like one end of a telephone connection is the combination of a phone number and a particular extension. Based on this address, TCP sockets deliver incoming data packets to the appropriate application process or thread.

A process that opens a listen port allows multiple sockets on that port. For example, when tnslsnr listens on port 1521, there will be many sockets in which one port is 1521. That port is shared, but only among connections to that one process. The OS will never pick it for the dynamic side of a connection, and any attempt by another process to listen on it will fail because the address is already in use.

The other port (picked by the OS) cannot be shared; it is used exclusively by the socket assigned to it. This dynamic port is picked from ip_local_port_range. For example, on our Linux server the range was set to 9000 through 65500:
$ cat /proc/sys/net/ipv4/ip_local_port_range
9000    65500
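
To check ahead of time whether a listen port could be handed out as a dynamic port, it can be compared with this range. A minimal sketch, assuming a POSIX shell: `in_dynamic_range` is a hypothetical helper, and its optional second argument (a file to read the range from) exists only to make it easy to try out.

```shell
# Report whether a listen port lies inside the dynamic port range.
# $1 = port to check
# $2 = file holding "low high" (defaults to the Linux procfs entry)
in_dynamic_range() {
    range_file="${2:-/proc/sys/net/ipv4/ip_local_port_range}"
    # The file holds two numbers separated by whitespace.
    read -r low high < "$range_file" || return 2
    if [ "$1" -ge "$low" ] && [ "$1" -le "$high" ]; then
        echo "port $1 is inside dynamic range $low-$high (conflict possible)"
        return 0
    fi
    echo "port $1 is outside dynamic range $low-$high"
    return 1
}
```

With the range shown above, `in_dynamic_range 9020` reports a possible conflict, while `in_dynamic_range 7101` does not.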

The Solution


The port conflict happened when a connection picked an available port from the dynamic range (9000 to 65500). In our case it picked 9020, which a managed server needed as its listen port.

To prevent such conflicts, we need to raise the lower limit of ip_local_port_range above the listen ports configured for our servers (say, to 11000):
# echo "11000 65500" >/proc/sys/net/ipv4/ip_local_port_range

Note that you need to be the root user to make this change. If you use a Red Hat distribution, read [2] for more details.
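
A value written to /proc this way does not survive a reboot. One common way to keep it is a net.ipv4.ip_local_port_range entry in /etc/sysctl.conf. The sketch below assumes that file location (check [2] for your distribution). `persist_port_range` is a hypothetical helper name, and the optional third argument lets you point it at a scratch file first.

```shell
# Write (or replace) the dynamic port range setting in a sysctl config file.
# $1 = low port, $2 = high port
# $3 = config file (assumed default: /etc/sysctl.conf)
persist_port_range() {
    conf="${3:-/etc/sysctl.conf}"
    tmp="$conf.tmp.$$"
    # Copy every line except an earlier setting of the same key.
    if [ -f "$conf" ]; then
        grep -v '^[[:space:]]*net\.ipv4\.ip_local_port_range' "$conf" > "$tmp" || true
    else
        : > "$tmp"
    fi
    echo "net.ipv4.ip_local_port_range = $1 $2" >> "$tmp"
    # Overwrite in place so the file keeps its owner and permissions.
    cat "$tmp" > "$conf" && rm -f "$tmp"
}
```

Run as root, `persist_port_range 11000 65500 && sysctl -p` would rewrite the entry and reload the settings from /etc/sysctl.conf.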

After the ip_local_port_range change, the system picks dynamic ports from the 11000 to 65500 range when it connects to a remote socket (note that the remote endpoint here happens to be on the same server):
$ netstat -an |grep 7101

tcp  0  0 ::ffff:10.214.10.20:7101   :::*                        LISTEN
tcp  0  0 ::ffff:10.214.10.20:7101   ::ffff:10.214.10.20:19739   ESTABLISHED
tcp  0  0 ::ffff:10.214.10.20:7101   ::ffff:10.214.10.20:20506   ESTABLISHED

Before we end this article, we also want to share two nice-to-know topics:
  • Privileged Ports
  • Which Port Is Configured for AdminServer or Managed Servers

Privileged Ports


The port numbers are divided into three ranges:
  • Well Known Ports: those from 0 through 1023
  • Registered Ports: those from 1024 through 49151
  • Dynamic and/or Private Ports: those from 49152 through 65535
The TCP/IP port numbers below 1024 are special in that normal users are not allowed to run servers on them. This is a security feature, in that if you connect to a service on one of these ports you can be fairly sure that you have the real thing, and not a fake which some hacker has put up for you.

When you run a server as a test from a non-privileged account, you will normally test it on other ports, such as 2784, 5000, 8001 or 8080, not the well-known port (say, 80).
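
The three ranges can be expressed as a small classifier. Note that these are the nominal ranges listed above; the dynamic range a Linux server actually uses is whatever ip_local_port_range says, and on our server (9000 to 65500) it reached well into the registered range. `port_class` is a hypothetical helper name used only for illustration.

```shell
# Classify a TCP/IP port number into one of the three ranges listed above.
port_class() {
    if [ "$1" -lt 0 ] || [ "$1" -gt 65535 ]; then
        echo "invalid"
        return 1
    elif [ "$1" -le 1023 ]; then
        echo "well-known (privileged)"
    elif [ "$1" -le 49151 ]; then
        echo "registered"
    else
        echo "dynamic/private"
    fi
}
```

For example, `port_class 80` prints "well-known (privileged)", while `port_class 9020` and `port_class 8080` both print "registered".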

Which Port Is Configured for AdminServer or Managed Servers


If you are not sure which ports are used by WebLogic Server's Admin Server and Managed Servers, you can verify them in the configuration file $DOMAIN_HOME/config/config.xml. For example, port 9020 was reserved for our CRMAnalyticsServer_1:

    <machine>slcaf977.us.oracle.com</machine>
    <listen-port>9020</listen-port>
    <cluster>CRMAnalyticsCluster</cluster>
    <web-server>
      <name>CRMAnalyticsServer_1</name>
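
To list the listen port of every server in one pass, something like the following awk sketch may help. It is not a real XML parser: it assumes config.xml is laid out one element per line, as in the excerpt above, and it reads only the direct children of each <server> element so that nested <name> or <listen-port> entries (for example under <web-server> or <ssl>) are ignored. `list_listen_ports` is a hypothetical helper name.

```shell
# Print "server-name listen-port" for each <server> element in a config.xml.
# $1 = path to the file, e.g. "$DOMAIN_HOME/config/config.xml"
list_listen_ports() {
    awk '
        # Start of a <server> element: reset what we know about it.
        /<server>/ && !in_server {
            in_server = 1; depth = 0; name = ""; port = "(not set)"; next
        }
        # End of the <server> element: report name and port.
        in_server && depth == 0 && /<\/server>/ {
            print name, port; in_server = 0; next
        }
        in_server {
            # A line holding only an opening tag starts a nested element.
            if ($0 ~ /^[ \t]*<[A-Za-z][^>]*>[ \t]*$/ && $0 !~ /\/>[ \t]*$/) { depth++; next }
            # A line holding only a closing tag ends a nested element.
            if ($0 ~ /^[ \t]*<\/[^>]+>[ \t]*$/) { depth--; next }
            # Only direct children of <server> are of interest.
            if (depth == 0 && /<name>/) {
                name = $0; sub(/.*<name>/, "", name); sub(/<\/name>.*/, "", name)
            }
            if (depth == 0 && /<listen-port>/) {
                port = $0; sub(/.*<listen-port>/, "", port); sub(/<\/listen-port>.*/, "", port)
            }
        }
    ' "$1"
}
```

A server printed with "(not set)" has no listen-port entry of its own in the file; check the WebLogic documentation or the console for the port it actually uses.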

References

  1. How to Investigate: Failed to Bind to Port on Linux
  2. The ip_local_port_range parameters (Redhat edition)
  3. Oracle Products: What Patching, Migration, and Upgrade Mean? (Xml and More)
