Cross Column


Sunday, August 30, 2020

NTP—How to Check It Is Working?

Network Time Protocol (NTP) synchronizes timekeeping among a set of distributed time servers and clients. This synchronization allows events to be correlated when system logs are created and other time-specific events occur. 

Clock Skew


Clock skew is a phenomenon in computers in which the same sourced clock signal arrives at different components at different times. The instantaneous difference between the readings of any two clocks is called their skew.

Causes of Clock Skew could include:
  • No running NTP services 
  • Not properly configured NTP services[1]
  • NTP attack[2,7-9]
    • Some of which result in shifting time on NTP clients
    • Another threat consideration is a malicious insider, who could modify system time in attempts to hide events or manipulate time sensitive transactions.
  • Network is congested or lossy[3]

NTP—How Does It Work?


NTP uses the User Datagram Protocol (UDP) as its transport protocol. All NTP communication uses Coordinated Universal Time (UTC). An NTP network usually receives its time from an authoritative time source, such as a radio clock or an atomic clock attached to a time server. NTP distributes this time across the network. 

NTP is extremely efficient; no more than one packet per minute is necessary to synchronize two machines to within a millisecond of each other. 

Stratum


NTP uses the concept of a “stratum” to describe how many NTP “hops” away a machine is from an authoritative time source. A “stratum 1” time server typically has an authoritative time source (such as a radio or atomic clock, or a GPS time source) directly attached, a “stratum 2” time server receives its time via NTP from a “stratum 1” time server, and so on. 

Was Your NTP Service Properly Configured?


If NTP is not properly configured, your web server’s system time can keep drifting far into the future or the past. Accurate system time is critical for:
  • Application logic
  • Scheduled jobs
  • Logging
    • If the system time is off, log forensics and log correlation of security events across systems become a nightmare
This is especially true for virtual machine based deployments.

How to Sync the Clock on VMs?


If you use Red Hat Enterprise Linux, here are some existing knowledge base documents on how to sync the clock on VMs, such as:

How to check NTP is working?


You can use the following commands to check:
  • ntpq — standard NTP query program
  • ntpstat — show network time synchronisation status
  • timedatectl — show or set info about ntp using systemd

ntpq


The ntpq utility program is used to monitor NTP daemon ntpd operations and determine performance.

$ ntpq
ntpq> pe
     remote      refid         st t when poll reach   delay  offset jitter
==========================================================================
-isipc6.cairn.ne .GPS1.         1 u   18   64   377  65.592  -5.891  0.044
+saicpc-isiepc2. pogo.udel.edu  2 u  241  128   370  10.477  -0.117  0.067
+uclpc.cairn.net pogo.udel.edu  2 u   37   64   177 212.111  -0.551  0.187
*pogo.udel.edu   .GPS1.         1 u   95  128   377   0.607   0.123  0.027

  • *
    • The tattletale symbol at the left margin displays the synchronization status of each peer. The currently selected peer is marked *, while additional peers designated acceptable for synchronization, but not currently selected, are marked +. 
    • Peers marked * and + are included in the weighted average computation to set the local clock; the data produced by peers marked with other symbols are discarded. See ntpq for the meaning of these symbols.
  • remote
    • Correspond to the server and peer entries listed in the configuration file; however, the DNS names might not agree if the names listed are not the canonical DNS names. 
  • refid
    • Shows the current source of synchronization
  • st
    • Reveals the stratum
  • t
    • The type (u = unicast, m = multicast, l = local, - = don't know)
  • when (in secs)
    • Shows the time since the peer was last heard in seconds
  • poll (in secs)
    • The poll interval 
  • reach
    • Shows the status of the reachability register (see RFC-1305) in octal. 
  • delay (in ms)
    • Show the latest round-trip delay
  • offset (in ms)
    • Show the latest offset
      • Offset generally refers to the difference in time between an external timing reference and time on a local machine. 
      • The greater the offset, the more inaccurate the timing source is. Synchronized NTP servers will generally have a low offset. 
  • jitter (in ms)
    • Show the latest jitter (or estimated error) in milliseconds
      • The jitter associated with a timing reference indicates the magnitude of variance, or dispersion, of the signal. Different timing references have different amounts of jitter. The more accurate a timing reference, the lower the jitter value. 
      • Note that in NTP Version 4 what used to be the dispersion column has been replaced by the jitter column.
To avoid possible distractions due to name resolution problems, run the ntpq program using the -n switch.
  • -n
    • Output all host addresses in dotted-quad numeric format rather than converting to the canonical host names.

# ntpq -np
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*xxx.254.169.yyy 192.168.0.151    2 u  141 1024  377    0.545    0.066   0.131
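
The reach column is easier to read once the octal value is unpacked. The reachability register is an 8-bit shift register: each poll shifts it left by one bit, and a bit is set when the poll was answered. The following is a minimal sketch, not part of ntpq itself; the function name reach_successes is made up for this illustration.

```shell
# Count how many of the last eight polls were answered, given the octal
# "reach" value printed by ntpq (for example 377, 370, or 177).
reach_successes() {
    value=$(( 0$1 ))        # a leading 0 makes the shell read the digits as octal
    count=0
    while [ "$value" -gt 0 ]; do
        count=$(( count + (value & 1) ))   # add the lowest bit
        value=$(( value >> 1 ))            # move on to the next bit
    done
    echo "$count"
}

# The three reach values from the first ntpq listing above.
for reach in 377 370 177; do
    echo "reach=$reach: $(reach_successes "$reach") of the last 8 polls answered"
done
```

A value of 377 (all eight bits set) means every one of the last eight polls succeeded; 370 means the three most recent polls got no answer.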


ntpstat


ntpstat is a script which prints a brief summary of the system clock's synchronization status when the ntpd or chronyd daemon is running.

# ntpstat
synchronised to NTP server (xxx.254.169.yyy) at stratum 3
   time correct to within 56 ms
   polling server every 1024 s

# echo $?
0

You can also use the exit status (return value) to check synchronization from a shell script or from the command line. The exit status is:
  • 0 – Clock is synchronized
  • 1 – Clock is not synchronized
  • 2 – If clock state is indefinite or questionable, for example if ntpd is not contactable
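
The exit statuses above are easy to use from a monitoring script. Here is a small sketch; the function name describe_ntpstat_status and the message wording are my own, only the three status values come from ntpstat.

```shell
# Translate an ntpstat exit status into a short message.
describe_ntpstat_status() {
    case "$1" in
        0) echo "clock is synchronized" ;;
        1) echo "clock is not synchronized" ;;
        2) echo "clock state is indeterminate (is ntpd reachable?)" ;;
        *) echo "unexpected exit status: $1" ;;
    esac
}

# Only call ntpstat when it is installed, so the sketch runs anywhere.
if command -v ntpstat >/dev/null 2>&1; then
    status=0
    ntpstat >/dev/null 2>&1 || status=$?
    describe_ntpstat_status "$status"
fi
```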

How Was "time correct to within 56 ms" Calculated?


The query output that ntpstat parses differs between ntpd and chronyd. The discussion below is based on ntpd, and the following excerpt comes from the ntpstat script.


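    distance=$(echo "$delay $disp" | awk '{ printf "%.3f", $1 / 2.0 + $2 }')
    if [ -n "$distance" ]; then
        printf "   time correct to within %.0f ms" "$distance"
    fi

Therefore, distance = delay / 2 + dispersion. With the values reported by ntpq -c rv (see Raw Data):

delay = 0.649 (rootdelay)
dispersion = 55.480 (rootdisp)

distance = 0.649 / 2 + 55.480 = 55.8045, which rounds to 56, and "time correct to within 56 ms" was printed.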
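
To replay the arithmetic yourself, the same awk expression can be wrapped in a small function. This is a sketch that mirrors the excerpt; the function name root_distance_ms is my own.

```shell
# Root distance in the same form as the ntpstat excerpt: delay / 2 + dispersion.
# Both arguments are in milliseconds; the result is rounded to whole ms.
root_distance_ms() {
    echo "$1 $2" | awk '{ printf "%.0f\n", $1 / 2.0 + $2 }'
}

# rootdelay and rootdisp as reported by "ntpq -c rv"
root_distance_ms 0.649 55.480    # prints 56
```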


Raw Data


# ntpq -c rv
associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync,
version="ntpd 4.2.6p5@1.2349-o Tue Jun 23 15:14:56 UTC 2020 (1)",
processor="x86_64", system="Linux/4.14.35-1902.10.8.el7uek.x86_64",
leap=00, stratum=3, precision=-24, rootdelay=0.649, rootdisp=55.480,
refid=xxx.254.169.yyy,
reftime=e2f66690.7a8a26cd  Sun, Aug 30 2020 17:55:28.478,
clock=e2f66814.1623cd2d  Sun, Aug 30 2020 18:01:56.086, peer=35146,
tc=10, mintc=3, offset=-0.116, frequency=13.462, sys_jitter=0.000,
clk_jitter=0.057, clk_wander=0.009

# ntpstat
synchronised to NTP server (xxx.254.169.yyy) at stratum 3
   time correct to within 56 ms
   polling server every 1024 s


timedatectl 


If you are using a systemd-based system, timedatectl can be used to query and change the system clock and its settings.

Run the following command to check the service status:

# timedatectl status
      Local time: Sun 2020-08-30 17:12:19 UTC
  Universal time: Sun 2020-08-30 17:12:19 UTC
        RTC time: Sun 2020-08-30 17:12:20
       Time zone: UTC (UTC, +0000)
     NTP enabled: yes
NTP synchronized: yes
 RTC in local TZ: no
      DST active: n/a
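
In a script you usually only care about the synchronization line. The sketch below reads timedatectl output on standard input; the function name clock_synchronized is my own. Note that newer systemd releases label the field "System clock synchronized" instead of "NTP synchronized", so the pattern accepts both spellings.

```shell
# Succeeds (exit status 0) when the timedatectl output on stdin reports
# a synchronized clock, under either the older or the newer field label.
clock_synchronized() {
    grep -Eq '^ *(NTP|System clock) synchronized: yes'
}

# Sample taken from the output shown above.
sample='     NTP enabled: yes
NTP synchronized: yes
 RTC in local TZ: no'

if printf '%s\n' "$sample" | clock_synchronized; then
    echo "clock is synchronized"
fi

# On a live system: timedatectl status | clock_synchronized
```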

How can I see the Time Difference between Client and Server?


Normally ntpd maintains an estimate of the time offset. To inspect these offsets, you can use the following commands:[5]

  1. ntpq -np will display the offsets for each reachable server in milliseconds
  2. ntpdc -c loopinfo will display the combined offset in seconds, as seen at the last poll. If supported, ntpdc -c kerninfo will display the current remaining correction, just as ntptime does.
The first shows what ntpd currently thinks the offset and jitter are, relative to the preferred/current server. The second tells you something about the estimated offset/error all the way to the stratum 1 source.
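
If you want just the offset of the currently selected server, you can pick it out of the ntpq -np output with awk. This is a sketch that assumes the ten-column peer listing shown earlier; the function name selected_peer_offset is hypothetical.

```shell
# Print the offset (in ms) of the peer ntpd has currently selected, i.e. the
# row that starts with "*". Reads "ntpq -np" output on stdin; the offset is
# the ninth whitespace-separated field.
selected_peer_offset() {
    awk '/^\*/ { print $9 }'
}

# Replay the sample listing from this article.
selected_peer_offset <<'EOF'
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*xxx.254.169.yyy 192.168.0.151    2 u  141 1024  377    0.545    0.066   0.131
EOF

# On a live system: ntpq -np | selected_peer_offset
```

With the sample above this prints 0.066. If no peer is selected yet, nothing is printed.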

References

Sunday, August 2, 2020

How ABRT Avoids Storing Duplicate Crashes: Deduplication

Processes crash for a multitude of reasons and it’s often difficult to understand the root causes that contribute to such crashes.  The Automatic Bug Reporting Tool, commonly abbreviated as ABRT, could offer help for forensic investigation.

ABRT


ABRT consists of the abrtd daemon and a number of system services and utilities to process, analyze, and report detected problems. 

The daemon runs silently in the background most of the time, and springs into action when an application crashes or a kernel oops is detected. The daemon then collects the relevant problem data such as a core file if there is one, the crashing application's command-line parameters, and other data of forensic utility.

Why ABRT?


Previously, when applications crashed, core dumps were generated without any size limit, which could quickly fill up the disk.

A solution is to use ABRT.  For example, it can:
  • Rotate cores within a size limit by deleting the oldest[11]
  • Avoid storing duplicate crashes by deduplication[9]

Elements Collected by ABRT


The table below shows a shortened list of the elements collected by ABRT and their descriptions. For the full list, see [4].  These elements are stored as files in a single directory per detected problem (such a directory is called a 'dump directory').
 
Element               Description
core_backtrace        Machine-readable backtrace with no private data
coredump              Coredump of the crashing process
count                 Number of times this problem occurred
crash_function        Function which crashed
dmesg                 Copy of dmesg
docker_inspect        Output of docker inspect $(container_id)
dso_list              List of dynamic libraries loaded at the time of crash
duphash               Hash of the crash's backtrace
environ               Dump of the process environment variables along with their values
event_log             Messages produced by ABRT tools while processing the detected problem
executable            Executable path of the component which caused the problem
global_pid            Value of %P as passed by kernel to the core_pattern helper (see man core for more details)
hostname              Hostname of the affected machine
kernel                Kernel version string
kernel_log            Results of vmcore crash analysis performed by retrace-server
kernel_tainted_long   Tainted kernel description
kernel_tainted_short  Kernel tainted flags (for more information about tainted flags see [1])
last_occurrence       Time of the last occurrence (unixtime)
 

Deduplication


When ABRT catches a new crash, it compares it to the rest of the stored problems to avoid storing duplicate crashes:
  1. It first checks whether there is a core_backtrace or uuid item in the problem directory it is processing
  2. If there is a core_backtrace
    • It iterates over all other dump directories and computes the similarity to their core backtraces (if any). If one of them is similar enough to be considered a duplicate, event processing is stopped and only the notify-dup event is fired.
  3. Or if there is a uuid item (and no core backtrace)
    • A simple comparison of uuid hashes is used for duplicate detection.
You can read abrt-action-analyze-backtrace for more information.[6]
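
To make the uuid branch concrete, here is a simplified illustration of what "comparing uuid hashes" across dump directories amounts to. It is not ABRT's actual code; the function name find_uuid_duplicates is made up, and the only assumption taken from the article is that each dump directory may contain a file named uuid.

```shell
# Illustration only: print every dump directory under $2 whose uuid file has
# the same content as the uuid file of the new problem directory $1.
find_uuid_duplicates() {
    new_dir=$1
    spool=$2
    [ -r "$new_dir/uuid" ] || return 0      # nothing to compare against
    new_uuid=$(cat "$new_dir/uuid")
    for dir in "$spool"/*/; do
        dir=${dir%/}
        [ "$dir" = "$new_dir" ] && continue  # skip the new problem itself
        [ -r "$dir/uuid" ] || continue       # skip directories without a uuid
        if [ "$(cat "$dir/uuid")" = "$new_uuid" ]; then
            echo "$dir"
        fi
    done
    return 0
}

# Usage, assuming the default spool location mentioned later in this article:
#   find_uuid_duplicates /var/spool/abrt/NEW_PROBLEM_DIR /var/spool/abrt
```

The core_backtrace branch is more involved because it computes a similarity score rather than testing for equality, so it is not sketched here.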


count & last_occurrence


After the forensic investigation, you can use:
  • abrt-cli rm <path to the problem directory>
to remove the specified problem data directory with all its contents.

[abrt]# abrt-cli rm ccpp-2019-08-21-13:59:02-31929
PrivateReports is disabled. Run abrt-cli-root to see all problems detected by ABRT.
rm 'ccpp-2019-08-21-13:59:02-31929'

However, note that ABRT performs a detection of duplicate problems by comparing new problems with all locally saved problems. 

For a repeating crash, ABRT requires you to act upon it only once. But if you delete the crash dump of that problem, the next time this specific problem occurs, ABRT will treat it as a new crash: ABRT will alert you about it, prompt you to fill in a description, and report it. To avoid having ABRT notify you about a recurring problem, do not delete its problem data.

If you didn't remove a specific problem data directory, here is what happens when ABRT catches a new crash:
  • ABRT compares it to the rest of locally stored problems
  • If it's a new problem, a new problem directory will be created
  • Otherwise, ABRT will update the recurring problem by:
    • Incrementing "count" by one
    • Updating "last_occurrence" with a new epoch
[ccpp-2019-08-21-13:59:02-31929]# ls -lrt
total 868572
-rw-r-----. 1 abrt  abrt         3 Aug 21  2019 uid
-rw-r-----. 1 abrt  abrt        10 Aug 21  2019 time
-rw-r-----. 1 abrt  abrt        32 Aug 21  2019 os_release
-rw-r-----. 1 abrt  abrt        30 Aug 21  2019 kernel
-rw-r-----. 1 abrt  abrt        24 Aug 21  2019 hostname
-rw-r-----. 1 abrt   abrt         6 Aug 21  2019 architecture
-rw-r-----. 1 abrt  abrt     70033 Aug 21  2019 maps
-rw-r-----. 1 abrt  abrt      1323 Aug 21  2019 limits
-rw-r-----. 1 abrt  abrt        88 Aug 21  2019 cgroup
-rw-r-----. 1 abrt  abrt         4 Aug 21  2019 type
-rw-r-----. 1 abrt  abrt        90 Aug 21  2019 reason
-rw-r-----. 1 abrt  abrt        39 Aug 21  2019 pwd
-rw-r-----. 1 abrt  abrt         5 Aug 21  2019 pid
-rw-r-----. 1 abrt  abrt      2072 Aug 21  2019 open_fds
-rw-r-----. 1 abrt  abrt        48 Aug 21  2019 executable
-rw-r-----. 1 abrt  abrt     14722 Aug 21  2019 environ
-rw-r-----. 1 abrt  abrt        48 Aug 21  2019 cmdline
-rw-r-----. 1 abrt  abrt         4 Aug 21  2019 analyzer
-rw-r-----. 1 abrt  abrt         5 Aug 21  2019 abrt_version
-rw-r-----. 1 abrt  abrt 886996992 Aug 21  2019 coredump
-rw-r-----. 1 abrt  abrt         7 Aug 21  2019 username
-rw-r-----. 1 abrt  abrt   1846076 Aug 21  2019 sosreport.tar.xz
-rw-r-----. 1 abrt  abrt         0 Aug 21  2019 event_log
-rw-r-----. 1 abrt  abrt        93 Aug 21  2019 machineid
-rw-r-----. 1 abrt  abrt    378414 Aug 21  2019 core_backtrace
-rw-r-----. 1 abrt  abrt        40 Aug 21  2019 uuid
-rw-r-----. 1 abrt  abrt      1424 Aug 21  2019 dso_list
-rw-r-----. 1 abrt  abrt       199 Aug 21  2019 var_log_messages
-rw-r-----. 1 abrt  abrt         2 Jul 25 08:00 count
-rw-r-----. 1 abrt  abrt        10 Jul 25 08:00 last_occurrence

[ccpp-2019-08-21-13:59:02-31929]# cat count
2

[ccpp-2019-08-21-13:59:02-31929]# cat last_occurrence
1595664006

[ccpp-2019-08-21-13:59:02-31929]# date -u -d @1595664006
Sat Jul 25 08:00:06 UTC 2020

[ccpp-2019-08-21-13:59:02-31929]# cat reason
Process /u01/app/xxx/server/bin/yyy was killed by signal 11 (SIGSEGV)
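
The manual steps above (cat count, cat last_occurrence, date) can be folded into one helper. This is a sketch that assumes both files exist in the problem directory and that GNU date is available (it relies on the same -d @EPOCH form used above); the function name problem_summary is my own.

```shell
# Print how often a problem recurred and when it last happened (UTC).
# $1 is a problem directory, e.g. one of the directories under /var/spool/abrt.
problem_summary() {
    dir=$1
    count=$(cat "$dir/count")
    last=$(cat "$dir/last_occurrence")
    echo "count=$count last_occurrence=$(date -u -d "@$last" '+%Y-%m-%d %H:%M:%S') UTC"
}

# Demo with the values from the transcript above, in a throwaway directory.
demo=$(mktemp -d)
echo 2 > "$demo/count"
echo 1595664006 > "$demo/last_occurrence"
problem_summary "$demo"
rm -rf "$demo"
```

With those two values the helper reports a count of 2 and a last occurrence of 2020-07-25 08:00:06 UTC, matching the date output shown above.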

ABRT Configuration Files


Standard ABRT installation currently provides the following ABRT specific configuration files:
  • /etc/abrt/abrt.conf — allows you to modify the behavior of the abrtd service.
  • /etc/abrt/abrt-action-save-package-data.conf — allows you to modify the behavior of the abrt-action-save-package-data program.
  • /etc/abrt/plugins/CCpp.conf — allows you to modify the behavior of ABRT's core catching hook.
For example, the default location where problem data directories are created and in which problem core dumps and all other problem data are stored is:
/var/spool/abrt

[~]# cd /var/spool/abrt

[abrt]# ls -lrt
total 32
-rw-------. 1 root   root   23 Mar  8 05:18 last-via-server
-rw-------. 1 root   root   48 Jul 25 08:00 last-ccpp
drwxr-x---. 2 abrt   abrt 4096 Jul 28 15:22 ccpp-2019-08-21-13:59:02-31929

Read [11] for all the details of ABRT configuration files.

References

  1. ABRT Documentation (Release 2.14)
  2. How to properly delete a report problem in ABRT
  3. AUTOMATIC BUG REPORTING TOOL (ABRT)
  4. Elements collected by ABRT
  5. Basic ABRT components
  6. abrt-action-analyze-backtrace
    • Analyzes C/C++ backtrace, generates duplication hash, backtrace rating, and identifies crash function in problem directory DIR
    • Then it saves this data as new elements global_uuid, rating, crash_function in this problem directory
  7. abrt-backtrace
  8. ABRT FAQ
  9. ABRT Design
  10. backtrace_rating (Red Hat doc)
    • Numerical representation of quality of backtrace based on ratio of unrecognized frames among all backtrace frames
  11. ABRT SPECIFIC CONFIGURATION

Thursday, April 26, 2018

How to Debug "java.io.IOException: Connection reset by peer"?


There are many reasons why WebLogic Server may throw the following exception:
java.io.IOException: Connection reset by peer
In this article, we will discuss one specific case.

Stack Trace


####<Apr 26, 2018, 8:42:37,381 AM UTC> <Error> <HTTP> <myserver> <CloudConsoleServer_MyServices> <[ACTIVE] ExecuteThread: '23' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <XaUWE0Co100000000> <1524732157381> <[severity-value: 8] [rid: 0:1:2] [partition-id: 0] [partition-name: DOMAIN] > <BEA-101019> <[ServletContext@15863685[app:cp-myservices.ear module:mycloud path:null spec-version:3.1 version:_18.2.4.0.0_180422.1400]] Servlet failed with an IOException.
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
        at sun.nio.ch.IOUtil.write(IOUtil.java:65)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
        at weblogic.socket.NIOOutputStream$SingleBufferWrite.writeTo(NIOOutputStream.java:841)
        at weblogic.socket.NIOOutputStream$BlockingWriter.flush(NIOOutputStream.java:455)
        at weblogic.socket.NIOOutputStream$BlockingWriter.write(NIOOutputStream.java:334)
        at weblogic.socket.NIOOutputStream.write(NIOOutputStream.java:220)
        at weblogic.servlet.internal.ChunkOutput.writeChunkTransfer(ChunkOutput.java:625)
        at weblogic.servlet.internal.ChunkOutput.writeChunks(ChunkOutput.java:587)
        at weblogic.servlet.internal.ChunkOutput.flush(ChunkOutput.java:471)
        at weblogic.servlet.internal.ChunkOutput$3.checkForFlush(ChunkOutput.java:757)
        at weblogic.servlet.internal.ChunkOutput.write(ChunkOutput.java:373)
        at weblogic.servlet.internal.ChunkOutputWrapper.write(ChunkOutputWrapper.java:165)
        at weblogic.servlet.internal.ServletOutputStreamImpl.write(ServletOutputStreamImpl.java:186)
        at java.io.ByteArrayOutputStream.writeTo(ByteArrayOutputStream.java:167)
        at oracle.adfinternal.view.faces.caching.filter.ResponseOutputStream.writeContentTo(ResponseOutputStream.java:74)
        at oracle.adfinternal.view.faces.caching.filter.AdfFacesCachingResponse._flushContent(AdfFacesCachingResponse.java:147)
        at oracle.adfinternal.view.faces.caching.filter.AdfFacesCachingResponse.flush(AdfFacesCachingResponse.java:136)

How to Debug


In this case, our server is connected to many applications on other servers. So, the suspect peer could be either a browser or an application running in our infrastructure.

Given the stack trace, the first thing to do is to look for clues in it.  For example, in this case, we saw:

 weblogic.servlet.internal.ServletOutputStreamImpl.write(ServletOutputStreamImpl.java:186)
 java.io.ByteArrayOutputStream.writeTo(ByteArrayOutputStream.java:167)
 oracle.adfinternal.view.faces.caching.filter.ResponseOutputStream.writeContentTo(ResponseOutputStream.java:74)
 oracle.adfinternal.view.faces.caching.filter.AdfFacesCachingResponse._flushContent(AdfFacesCachingResponse.java:147)
 oracle.adfinternal.view.faces.caching.filter.AdfFacesCachingResponse.flush(AdfFacesCachingResponse.java:136)


which means that WebLogic Server was writing the servlet response back to the client when the IOException was thrown.

If this client were a browser, what happened could be:
The browser shut down the connection: either the browser crashed, or it closed the connection explicitly because the user closed that page or cancelled navigation on that page.

If this client were an application (e.g., Selenium or another testing tool), what happened could be:
The tool timed out while waiting for the response and closed the socket. Its logs may show that it closed the socket after some time.


HTTP Keep-Alive


In [1], the author surmised that it could be:
 "Very likely an issue with HTTP keepalive (persistent connections)."
However, this is not our case because:
Keep-alive makes sure the socket stays open between requests. Our failure happened in the middle of a request, so keep-alive was not in use at that time. Conceptually, though, it is a similar situation: if the client (e.g., a browser) decides that the response isn't coming, it closes the socket.

If You Find Out Who's the Peer


Let's assume the client is another Linux application. Here are possible debugging steps:
The only thing to check at the system level is whether the machine was up the entire time: you can check its uptime, and look at dmesg for messages about the link going up or down. Otherwise, the application logs may tell you whether the process restarted or crashed, which is the more likely cause.

Could tcpdump help in this case?  Probably not, because a tcpdump capture will contain too much data unless you know how to filter for what you are looking for.
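
If you do want to try tcpdump, one way to keep the volume down is to capture only TCP segments that carry the RST flag for a single peer and port. The sketch below only builds and prints the command; the function name rst_capture_filter is made up, and the host 192.0.2.10 and port 7001 are placeholders for your own peer and listen port.

```shell
# Build a capture filter that keeps only TCP segments with the RST flag set,
# exchanged with one host on one port.
rst_capture_filter() {
    echo "host $1 and tcp port $2 and tcp[tcpflags] & tcp-rst != 0"
}

# tcpdump needs root, so the command is printed here rather than executed.
echo "tcpdump -n -i any '$(rst_capture_filter 192.0.2.10 7001)'"
```

A capture like this shows which side sent the reset and when, which can then be matched against the timestamp in the WebLogic log.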


References

  1. Possible Causes for "Connection reset by peer" when using NIO

Sunday, April 16, 2017

Idiosyncrasies of ${HOME} that is an NFS Share

NFS is perhaps best for more 'permanent' network mounted directories such as /homedir or regularly accessed shared resources.  In this article, we will cover the following topics:
  • Set up NFS share via automounter
  • Idiosyncrasies of  /homedir that is an NFS share 

Automounter


One drawback to using /etc/fstab is that, regardless of how infrequently a user accesses the NFS mounted file system, the system must dedicate resources to keep the mounted file system in place. This is not a problem with one or two mounts, but when the system is maintaining mounts to many systems at one time, overall system performance can be affected.

An alternative to /etc/fstab is to use the kernel-based automount utility.  An automounter consists of two components:[1]
  • A kernel module
    • implements a file system
  • A user-space daemon
    • performs all of the other functions

The automount utility can mount and unmount NFS file systems automatically (on-demand mounting), thereby saving system resources. It can also be used to mount other file systems, including AFS, SMBFS, CIFS and local file systems.

${HOME}


When your home directory is automounted, it behaves differently from other file systems because it is shared.  For example, you could run into the following two issues:
  • cp: cannot stat  "KeePass-2.14.zip": Permission denied[2]
  • ".bashrc" E509: Cannot create backup file (add ! to override)
In the sections below, we will discuss these two issues in more detail.

cp: cannot stat "KeePass-2.14.zip" : Permission denied


In [2], the author describes an issue in which she tried to copy a file from her home directory to /usr:
$ chmod 777 KeePass-2.14.zip
$ cp KeePass-2.14.zip /usr/keepass/
cp: cannot create regular file `/usr/keepass/KeePass-2.14.zip': Permission denied
$ sudo cp KeePass-2.14.zip /usr/keepass/
cp: cannot stat `KeePass-2.14.zip': Permission denied
The first cp fails because a regular user is not allowed to write to /usr/keepass. The sudo cp then fails for a different reason: it can't stat KeePass-2.14.zip because ${HOME} is on an NFS mount and the NFS server doesn't grant root on your machine permission to the NFS share.

To work around this "cannot stat: Permission denied" issue, copy the file to another directory (e.g., /tmp) first:
cp KeePass-2.14.zip /tmp
sudo cp /tmp/KeePass-2.14.zip /usr/keepass/

".bashrc" E509: Cannot create backup file (add ! to override)"


Once, when I edited and saved my $HOME/.bashrc, the editor (Vim) reported the following error:

".bashrc" E509: Cannot create backup file (add ! to override)
Then I used the df command to check the disk space available in my home directory:

$ df -h .
Filesystem            Size  Used Avail Use% Mounted on
server1:/export/home4/myusername
                      5.0T  1.4T  3.7T  28% /home/myusername
It showed that there was still plenty of space.  However, because the NFS share also holds the home directories of many other users, every user is assigned a disk quota.  To find out how much quota you have been assigned for your home directory, you can run:

$ quota -Q -s
Disk quotas for user myusername (uid 40000):
     Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
server1:/export/home4/myusername

                  1624M   2048M   2048M               0       0       0

So, to resolve this issue, you can simply remove junk files from your home directory to free up space for saving the file.

References

  1. autofs
  2. How to copy a file from my home folder to /usr
  3. Automount mini-Howto
  4. How to configure autofs in Linux and what are its advantages?
  5. Is it feasible to have home folder hosted with NFS?

Saturday, March 19, 2016

Linux: How to Read Large Text File—/var/log/messages

IaaS is the hardware and software that powers cloud services: servers, storage, networks, and operating systems. The Linux (or Windows) servers used in IaaS are becoming more and more powerful, and they also generate more log data.

Very often we run into message files larger than 1 GB. Log files can be viewed with regular text editors, but most text editors cannot handle files above a certain size.

In this article, we will cover how to read large message files (e.g., /var/log/messages) generated on Linux systems.

/var/log/messages


To debug issues in Cloud environments, it's essential for you to know where the log files are and what is contained in each log file. On Linux servers, over a dozen log files are located in /var/log directory. Here we only focus on one of them:
  • /var/log/messages[7]
    • This log aims at storing "general system activity" messages.
      • There are several things that are logged in /var/log/messages including mail, cron, daemon, kern, auth, etc.
      • The severity of messages could be
        • [INFO]
        • [DEBUG]
        • [WARNING]
        • [ERR]
        • etc
    • Older message files are archived periodically with their name annotated with the date.
If your Linux system uses the rsyslogd utility, its configuration file is
/etc/rsyslog.conf
in which you can specify logging rules (i.e., selector + action). For example, the following rule logs anything of level info or higher, except mail, cron, and private authentication messages, to the file /var/log/messages:
*.info;mail.none;authpriv.none;cron.none /var/log/messages

Limitations of Text Editors


Some editors cannot open text files beyond a certain size. For example, the following popular editors on Windows have documented limits:
  • Notepad[3]
    • 64 kilobytes (KB)
  • Wordpad[4]
    • Reportedly has no size limit; the real problem is performance.
    • Depending on the version of Wordpad, some people say it can handle files of up to 20 MB without performance issues.
  • Textpad[8]
    • It can handle file sizes up to the largest contiguous chunk of 32-bit virtual memory.

Solutions


Basically, there are two ways of dealing with large text files:
  1. Find a more capable text editor
  2. Divide and conquer
If you search the web for "large text file", you will find many suggestions for large text file readers. Some editors may be able to open and read large text files, but operations such as searching for a pattern can be slow.

On Linux systems, a good approach is "divide and conquer" using the split command, for example:
split -b1000m messages-20160315T2201 split-messages
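
One thing to keep in mind: -b splits by byte count, so a log line can be cut in half at a chunk boundary. Splitting by line count with -l avoids that. Below is a small sketch using a throwaway file; the function name split_log_by_lines and the file names are made up for the demo.

```shell
# Split a log into chunks of N lines each, so that no line is cut in half.
# $1 = input file, $2 = lines per chunk, $3 = output prefix.
split_log_by_lines() {
    split -l "$2" "$1" "$3"
}

# Demo: a 10-line file split into chunks of 4 lines gives 3 chunks.
demo=$(mktemp -d)
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > "$demo/messages"
split_log_by_lines "$demo/messages" 4 "$demo/split-messages-"
ls "$demo"
# The chunks concatenate back to the original file.
cat "$demo"/split-messages-* | cmp - "$demo/messages" && echo "chunks reassemble to the original"
rm -rf "$demo"
```

Once the file is split, grep -l PATTERN split-messages-* tells you which chunk to open.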

Wednesday, December 30, 2015

Jumbo Frames—Design Considerations for Efficient Network

Each network has some maximum packet size, or maximum transmission unit (MTU). Ultimately there is some limit imposed by the technology, but often the limit is an engineering choice or even an administrative choice.[1]

Many Gigabit Ethernet switches and Gigabit Ethernet network interface cards can support jumbo frames.[2] Enabling jumbo frames (MTU: 9000) brings performance benefits. However, existing transmission links may still impose a smaller MTU (e.g., 1500). This can cause problems along transit paths, which we refer to here as MTU mismatch.

In this article, we will examine the issues caused by MTU mismatch and the related design considerations.

How to Accommodate MTU Differences


When a host on the Internet wants to send some data, it must know how to divide the data into packets. And in particular, it needs to know the maximum size of packet.

Jumbo frames are Ethernet frames with more than 1500 bytes of payload.[3] Conventionally, jumbo frames can carry up to 9000 bytes of payload, but variations exist and some care must be taken using the term. In this article, we will use MTU: 9000 and MTU: 1500 as our examples to discuss MTU-mismatch issues.

Issues

MTU is a maximum: a network device does not drop frames unless they are larger than this maximum. A device with an MTU of 1500 can still communicate with a device with an MTU of 9000. However, when large packets are sent from an MTU-9000 device to an MTU-1500 device, the following happens:
  • If the DF (Don't Fragment) flag is set
    • Packets will be dropped, and the router is required to return an ICMP Destination Unreachable message to the source of the datagram, with the Code indicating "fragmentation needed and DF set"
  • If the DF flag is not set
    • Packets will be fragmented to accommodate the MTU difference, which comes at a cost[4]

How to Test Potential MTU Mismatch


The ping, tracepath, or traceroute (with the --mtu option) commands can be used to test for potential MTU mismatches.

For example, you can verify that the path between two end nodes has at least the expected MTU using the ping command:
ping -M do -c 4 -s 8972
The -M do option causes the DF flag to be set.
The -c option sets the number of pings.
The -s option specifies the number of bytes of padding that should be added to the echo request. In addition to this number there will be 20 bytes for the internet protocol header, and 8 bytes for the ICMP header and timestamp. The amount of padding should therefore be 28 bytes less than the network-layer MTU that you are trying to test (9000 − 28 = 8972).
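
The subtraction is simple, but it is easy to get wrong when testing several MTU sizes. Here is a tiny sketch for IPv4; the function name icmp_payload_for_mtu and the DESTINATION placeholder are my own.

```shell
# ICMP echo payload size that exactly fills an IPv4 packet of the given MTU:
# MTU minus 20 bytes of IPv4 header minus 8 bytes of ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 20 - 8 ))
}

# Print the ping command to use for the two MTUs discussed in this article.
for mtu in 1500 9000; do
    echo "MTU $mtu: ping -M do -c 4 -s $(icmp_payload_for_mtu "$mtu") DESTINATION"
done
```

This gives 1472 for an MTU of 1500 and 8972 for an MTU of 9000.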

If the test is unsuccessful, then you should see an error in response to each echo request:
$ ping -M do -c 4 -s 8972 10.252.136.96
PING 10.252.136.96 (10.252.136.96) 8972(9000) bytes of data.
From 10.249.184.27 icmp_seq=1 Frag needed and DF set (mtu = 1500)
From 10.249.184.27 icmp_seq=1 Frag needed and DF set (mtu = 1500)
From 10.249.184.27 icmp_seq=1 Frag needed and DF set (mtu = 1500)
From 10.249.184.27 icmp_seq=1 Frag needed and DF set (mtu = 1500)


--- 10.252.136.96 ping statistics ---
0 packets transmitted, 0 received, +4 errors
Similarly, you can use the tracepath command to test:
$ tracepath -n -l 9000
The -n option disables host name lookups (i.e., IP addresses are only printed numerically).
The -l option sets the initial packet length to pktlen instead of 65536 for tracepath or 128000 for tracepath6.
The last line of the tracepath output summarizes the whole path to the destination: it shows the detected Path MTU, the number of hops to the destination, and a guess at the number of hops from the destination back to us, which can differ when the path is asymmetric.
/* a packet of length 9000 cannot reach its destination */
$ tracepath -n -l 9000 10.249.184.27
1: 10.241.71.129 0.630ms
2: 10.241.152.60 0.577ms
3: 10.241.152.0 0.848ms
4: 10.246.1.49 1.007ms
5: 10.246.1.106 0.783ms
6: no reply
...
31: no reply
Too many hops: pmtu 9000
Resume: pmtu 9000
/* a packet of length 1500 reached its destination */
$ tracepath -n -l 1500 10.249.184.27
1: 10.241.71.129 0.502ms
2: 10.241.152.62 0.419ms
3: 10.241.152.4 0.543ms
4: 10.246.1.49 0.886ms
5: 10.246.1.106 0.439ms
6: 10.249.184.27 0.292ms reached
Resume: pmtu 1500 hops 6 back 59
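As the first run above shows, the Resume line still reports pmtu 9000 even though the destination was never reached, so the pmtu value alone is not proof of a working path. A minimal sketch of a filter that prints the path MTU only when a hop is marked "reached" follows; pmtu_if_reached is a hypothetical helper name, and the parsing assumes the output layout shown above.

```shell
#!/bin/sh
# Sketch: read tracepath output on stdin and print the pmtu from the
# "Resume:" line, but only if some hop was marked "reached".
# Exits non-zero when the destination was not reached.
pmtu_if_reached() {
    awk '
        /reached/ { reached = 1 }
        $1 == "Resume:" {
            for (i = 2; i < NF; i++)
                if ($i == "pmtu") pmtu = $(i + 1)
        }
        END {
            if (reached && pmtu != "") { print pmtu; exit 0 }
            exit 1
        }
    '
}

# Example with the captured output from the second run above:
printf '6: 10.249.184.27 0.292ms reached\nResume: pmtu 1500 hops 6 back 59\n' | pmtu_if_reached
```

Against a live host this would be used as tracepath -n -l 9000 10.249.184.27 | pmtu_if_reached.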

When to Enable Jumbo Frames?


Enabling jumbo frame mode (for example, on Gigabit Ethernet network interface cards) can offer the following benefits:
  • Less consumption of bandwidth by non-data protocol overhead
    • Hence increased network throughput
  • Reduction of the packet rate
    • Hence reduced server overhead
      • The use of large MTU sizes allows the operating system to send fewer packets of a larger size to reach the same network throughput.
      • For example, you will see a decrease in CPU usage when transferring large files
The above factors are especially important in speeding up NFS or iSCSI traffic, which normally carries larger payloads.
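To make the packet-rate reduction concrete, here is a rough back-of-the-envelope sketch. It assumes TCP over IPv4 with no header options, so 40 bytes of each packet are headers; packets_needed is an illustrative helper, not a system tool.

```shell
#!/bin/sh
# Sketch: how many full-size TCP/IPv4 packets are needed to move a
# payload at a given MTU, assuming 20 bytes IPv4 + 20 bytes TCP headers.
TCP_IP_HEADERS=40

packets_needed() {
    # $1: payload bytes to transfer, $2: MTU in bytes
    mss=$(( $2 - TCP_IP_HEADERS ))
    # Round up: a partial last packet still counts as a packet.
    echo $(( ($1 + mss - 1) / mss ))
}

packets_needed 1000000 1500   # prints 685
packets_needed 1000000 9000   # prints 112
```

Under these assumptions, a 9000-byte MTU moves the same megabyte in roughly one sixth as many packets as a 1500-byte MTU.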

Design Considerations


When jumbo frame mode is enabled, the trade-offs include:
  • Bigger I/O buffer
    • Required for both end nodes and intermediate transit nodes
  • MTU mismatch
    • May cause IP fragmentation or even loss of data
Therefore, some design considerations are required. For example, you can:
  • Avoid situations where hosts with jumbo-frame-enabled NICs talk to hosts with non-jumbo-frame-enabled NICs.
      • One design trick is to send your NFS or iSCSI traffic via a dedicated NIC and your normal host traffic via an interface without jumbo MTU enabled
        • If your workload includes only small messages, then the larger MTU size will not help
      • Be sure to use commands with the Don't fragment bit set to ensure that your hosts which are configured for jumbo frames are able to successfully communicate with each other via jumbo frames.
  • Enable Path MTU Discovery (PMTUD)[18]
    • When possible, use the largest MTU size that the adapter and network support, but constrained by Path MTU
    • Make sure the packet filter on your firewall processes ICMP packets correctly
      • RFC 4821, Packetization Layer Path MTU Discovery, describes a Path MTU Discovery technique which responds more robustly to ICMP filtering.
  • Be aware of extra non-data protocol overhead if you configure encapsulation such as GRE tunneling or IPsec encryption.
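
The encapsulation overhead in the last point can be illustrated with a small calculation. This sketch covers only basic GRE over IPv4 (a 20-byte outer IPv4 header plus a 4-byte GRE header without optional fields); IPsec overhead depends on the mode and ciphers in use and is not modelled. The names GRE_OVERHEAD and inner_mtu are illustrative.

```shell
#!/bin/sh
# Sketch: MTU left for the inner packet after tunnel encapsulation.
# 24 = 20-byte outer IPv4 header + 4-byte basic GRE header (no key,
# checksum, or sequence fields). IPsec is deliberately not covered.
GRE_OVERHEAD=24

inner_mtu() {
    # $1: MTU of the underlying link, $2: encapsulation overhead in bytes
    echo $(( $1 - $2 ))
}

inner_mtu 1500 "$GRE_OVERHEAD"   # prints 1476
inner_mtu 9000 "$GRE_OVERHEAD"   # prints 8976
```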

References

  1. The TCP Maximum Segment Size and Related Topics
  2. Jumbo/Giant Frame Support on Catalyst Switches Configuration Example
  3. Ethernet Jumbo Frames
  4. IP Fragmentation: How to Avoid It? (Xml and More)
  5. The Great Jumbo Frames Debate
  6. Resolve IP Fragmentation, MTU, MSS, and PMTUD Issues with GRE and IPSEC
  7. Sites with Broken/Working PMTUD
  8. Path MTU Discovery
  9. TCP headers
  10. bad TCP checksums
  11. MSS performance consideration
  12. Understanding Routing Table
  13. route (Linux man page)
  14. Docker should set host-side veth MTU #4378
  15. Add MTU to lxc conf to make host and container MTU match
  16. Xen Networking
  17. TCP parameter settings (/proc/sys/net/ipv4)
  18. Change the MTU of a network interface
    • To enable PMTUD on Linux, type:
      • echo 1 > /proc/sys/net/ipv4/tcp_mtu_probing
      • echo 1024 > /proc/sys/net/ipv4/tcp_base_mss
  19. MTU manipulation
  20. Jumbo Frames, the gotcha's you need to know! (good)
  21. Understand container communication (Docker)
  22. calicoctl should allow configuration of veth MTU #488 - GitHub
  23. Linux MTU Change Size
  24. Changing the MTU size in Windows Vista, 7 or 8
  25. Linux Configure Jumbo Frames to Boost Network Performance
  26. Path MTU discovery in practice
  27. Odd tracepath and ping behavior when using a 9000 byte MTU
  28. How to Read a Traceroute

Monday, August 12, 2013

How to Investigate: Failed to Bind to Port on Linux

From the server log file (i.e., CRMCommonServer_1.log) of WebLogic, I have found the following messages:

####<Aug 12, 2013 10:40:43 AM PDT> <Emergency> <Security> <myserver> <CRMCommonServer_1> <[STANDBY] ExecuteThread: '4' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <> <1376329243268> <BEA-090087> <Server failed to bind to the configured Admin port. The port may already be used by another process.>
####<Aug 12, 2013 10:40:43 AM PDT> <Error> <Server> <myserver> <CRMCommonServer_1> <DynamicListenThread[Default]> <<WLS Kernel>> <> <> <1376329243268> <BEA-002606> <Unable to create a server socket for listening on channel "Default". The address 10.241.88.31 might be incorrect or another process is using port 9004: java.net.BindException: Address already in use.>

In this article, I will show you how to investigate: 
  • Which process is using port 9004?
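
When many log files are involved, the port number can be pulled out of such messages mechanically. The following is a minimal sketch that assumes the BEA-002606 wording shown above; port_from_bind_error is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: read WebLogic log lines on stdin and print the port number
# from "another process is using port NNNN" messages.
port_from_bind_error() {
    sed -n 's/.*another process is using port \([0-9][0-9]*\).*/\1/p'
}

# Example with a shortened copy of the message above:
echo 'The address 10.241.88.31 might be incorrect or another process is using port 9004: java.net.BindException: Address already in use.' \
    | port_from_bind_error
```

Applied to a real log, this would be port_from_bind_error < CRMCommonServer_1.log.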

Netstat Command on Linux


To investigate a failed-to-bind-to-port issue, netstat comes in handy on Linux systems. The netstat command can be used to:
  • Print network connections, routing tables, interface statistics, masquerade connections, and multicast memberships

In this detective work, we have used the following options:

   -a, --all
       Show both listening and non-listening sockets.  With the --interfaces option, show
       interfaces that are not marked
   -p, --program
       Show the PID and name of the program to which each socket belongs.

The results are shown below:

$ netstat -ap | grep 9004
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
Proto Recv-Q Send-Q Local Address               Foreign Address             State       PID/Program name
tcp        0      0 myserver.us.ora:interserver myserver.oracle.com:9004    ESTABLISHED 12550/oidldapd
tcp        0      0 myserver.us.oracle.com:9004 myserver.ora:interserver    ESTABLISHED 22328/java


From the output, we know that a Java application (process 22328) is using port 9004. Once a socket is bound to that port, no other socket can be bound to port 9004 as long as the first socket remains open.  To find out which application it is, we check that process's command line:
  • $ vi /proc/22328/cmdline 
On the command line, we have found the following information:
  • -Dweblogic.Name=AdminServer
Also, BIDomain was mentioned there. So, that process is the AdminServer of BIDomain.
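
The arguments in /proc/PID/cmdline are separated by NUL bytes, which makes the file awkward to read in an editor. As an alternative, the following sketch splits the arguments onto separate lines and prints only the value of -Dweblogic.Name; weblogic_name is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: read a NUL-separated command line on stdin and print the
# value of the -Dweblogic.Name JVM argument, if present.
weblogic_name() {
    tr '\0' '\n' | sed -n 's/^-Dweblogic\.Name=//p'
}

# Example with a made-up, shortened command line:
printf 'java\0-Dweblogic.Name=AdminServer\0weblogic.Server\0' | weblogic_name   # prints AdminServer
```

On the affected host this would be weblogic_name < /proc/22328/cmdline.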

Port 7020


Similarly, another server's log file reported that port 7020 was in use:
  • <BEA-002606> <Unable to create a server socket for listening on channel "Default". The address 10.241.88.31 might be incorrect or another process is using port 7020: java.net.BindException: Address already in use.>
When we tried:
    # netstat -ap  |grep 7020

    no entries were returned.   However, with:

    # netstat -an  |grep 7020

    we found one entry:

    tcp        0      0 ::ffff:10.241.88.31:7020    :::*                        LISTEN

    The -an form shows the listening socket but not its owning process, so we combined -p with numeric ports:

    # netstat -ap --numeric-ports |grep 7020
    tcp        0      0 slcag044.us.oracle.com:7020 *:*                         LISTEN      21696/java      

    So, we know process 21696 is using port 7020.  To investigate further, we typed:
    # netstat -ap  |grep  21696
    tcp        0      0 slcag044.us.oracle.:dpserve *:*                         LISTEN      21696/java

    The output shows dpserve in place of 7020. Without a numeric option, netstat translates port numbers into service names, and port 7020 is registered under the service name dpserve[2,3]. That is why our first search returned no entries.
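
    The translation from port number to service name can be reproduced by hand, which tells you what string to grep for when netstat is run without numeric options. This is a minimal sketch, assuming the mapping comes from a file in /etc/services format; service_name_for_port is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: look up the service name registered for a port/protocol pair
# in a services file (defaults to /etc/services).
service_name_for_port() {
    # $1: port number, $2: protocol (tcp or udp), $3: optional services file
    awk -v want="$1/$2" '$1 !~ /^#/ && $2 == want { print $1; exit }' \
        "${3:-/etc/services}"
}

# Live use (the result depends on the local services file):
#   service_name_for_port 7020 tcp
```

    If a name is printed, searching for that name (or using --numeric-ports) finds the socket that a plain grep on the port number misses.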

    Our Solution


    In our case, we needed to re-order our start-up steps (see [4] for another approach). Instead of starting BIDomain first, we need to start it last. To fix our issue, we did the following:
    • Shut down BIDomain 
    • Start up CRMDomain 
    • Start up BIDomain

    References
