Monday, August 28, 2017

AWR—"log file sync" Wait Event Analysis

There are some DB wait events could be caused by poor storage performance in Oracle.  For example,
In this article, I will discuss the log file sync wait event in Oracle AWR reports, which in many cases is caused by poor storage performance.

Top 10 Foreground Events by Total Wait Time


EventWaitsTotal Wait Time (sec)Wait Avg(ms)% DB timeWait Class
DB CPU2590.996.6
SQL*Net break/reset to client5,864,510546.4020.4Application
log file sync19,57525.511.0Commit
SQL*Net message to client15,493,939100.4Network
library cache: mutex X8,042.60.0Concurrency
db file sequential read93.55.0User I/O
direct path read158.32.0User I/O
Disk file operations I/O89.11.0User I/O
SQL*Net more data to client945.10.0Network
cursor: pin S39.11.0Concurrency

Log File Sync


An Oracle user session issuing a commit command must wait until the LGWR (Log Writer) process writes the log entries associated with the user transaction to the log file on the disk. Oracle must commit the transaction’s entries to disk (because it is a persistent layer) before acknowledging the transaction commit. The log file sync wait event represents the time the session is waiting for the log buffers to be written to disk.

Sometimes you can find "log file sync" wait event appears on the top list of an AWR report:
Top 10 Foreground Events by Total Wait Time

What does it mean if log file sync is shown on the list?[1]
  • Is this noticeably slowing down all commits?
    • Disk throughput is only one aspect that affects LGWR. It consumes CPU while executing too. 
      •  If you've maxed out your CPU capacity processing "business transactions", then it will be starved for resource. This can lead to you seeing a lot of "log file sync" waits. 
      • If your datafiles are on the same disks as the redo logs, then DBWR will also be contending for the same disk. 
  • Is it just the top wait event on your system? 
    • Remember that there's always something that has to be "top". 

How to Reduce "log file sync" Wait Time


When a user commits or rolls back data, the LGWR flushes the session's redo from the log buffer to the redo logs. The log file sync process must wait for this to successfully complete.

If log file sync is a problem (e.g., avg wait > 2 ms), try the following solutions based on its causes:[1-4]
  • Slow disk I/O
    • Reduce contention on existing disks
    • Put log files on faster disks and/or increasing the log_buffer size above 10 megabytes
    • Put alternate redo logs on different disks to minimize the effect of archive processes (log files switches)
  • LGWR is not getting enough CPU
    • If the vmstat runqueue column is greater than cpu_count, then the instance is CPU-bound and this can manifest itself in high log file sync waits. The solution is to 
      • Tune SQL (to reduce CPU overhead)
      • Add processors
      • 'Nice' the dispatching priority of the LGWR process
  • High COMMIT activity
    • Review application design, use NOLOGGING operations where appropriate, and reduce the frequency of COMMIT statements in the application
  • LGWR is paged out
    • Check the server for RAM swapping, and add RAM if the instance processes are getting paged-out.

References

  1. Log file sync wait (Ask TOM)
  2. Log file sync wait
  3. Oracle Log File Sync Wait Event
  4. Expert Consolidation in Oracle Database 12c
  5. AWR Wait Events: Free Buffer Waits vs. Buffer Busy Waits (Xml and More)

No comments: