Cross Column


Friday, February 15, 2019

Oracle Fusion Middleware Diagnostic Framework: How to Diagnose Problems

Oracle Fusion Middleware includes a Diagnostic Framework, which aids in detecting, diagnosing, and resolving problems. It targets critical errors in particular, such as those caused by:
  • Code bugs
  • Metadata corruption
  • Customer data corruption
  • Deadlocked threads
  • Inconsistent state
In this article, we will cover what Oracle Fusion Middleware Diagnostic Framework is and how it works.

Problem vs Incident

  • Problem 
    • Is a critical error
    • Has a problem key
      • Is a text string that describes the problem
      • Includes an error code (in the format XXX-nnnnn) and in some cases, other error-specific values.
        • Example: incident 1123 created with problem key "DFW-99998 [weblogic.jdbc.extensions.PoolDisabledSQLException][oracle.security.jps.internal.policystore.rdbms.JpsDBDataManager.executeBaseQuery][bi-contentstorage]"
  • Incident
    • Is a single occurrence of a problem
      • When a problem (critical error) occurs multiple times, an incident is created for each occurrence. Incidents are timestamped and tracked in the ADR
    • Is identified by a numeric incident ID (see 1123 above), which is unique within the ADR home
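
The relationship between a problem and its incidents can be illustrated with a small script. This is only a sketch: it assumes the message text has the same shape as the incident 1123 example above, and the function name group_incidents and the second incident (1124) are hypothetical additions for illustration.

```python
import re
from collections import defaultdict

# Matches message text shaped like: incident <id> created with problem key "<key>"
INCIDENT_RE = re.compile(r'incident (\d+) created with problem key "([^"]+)"')


def group_incidents(messages):
    """Map each problem key to the list of incident IDs reported for it."""
    by_problem = defaultdict(list)
    for message in messages:
        match = INCIDENT_RE.search(message)
        if match:
            incident_id, problem_key = match.groups()
            by_problem[problem_key].append(int(incident_id))
    return dict(by_problem)


KEY = (
    "DFW-99998 [weblogic.jdbc.extensions.PoolDisabledSQLException]"
    "[oracle.security.jps.internal.policystore.rdbms.JpsDBDataManager.executeBaseQuery]"
    "[bi-contentstorage]"
)

messages = [
    f'incident 1123 created with problem key "{KEY}"',
    # Hypothetical second occurrence of the same problem, for illustration only.
    f'incident 1124 created with problem key "{KEY}"',
]

problems = group_incidents(messages)
for problem_key, incident_ids in problems.items():
    print(len(incident_ids), "incident(s) for", problem_key)
```

One problem key maps to many incident IDs, which mirrors the one-to-many relationship described above.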

Oracle Fusion Middleware Diagnostic Framework 


When a critical error occurs, it is assigned an incident number, and diagnostic data for the error (such as log files) is immediately captured and tagged with this number. The data is then stored in the Automatic Diagnostic Repository (ADR), where it can later be retrieved by incident number and analyzed. Here is a summary of the Diagnostic Framework's features:
  • Supports an incident detection log filter
    • Implements a java.util.logging filter
    • Inspects each log message to see if an incident should be created, basing its decision on the diagnostic rules for components and applications.
  • Integrated with WebLogic Diagnostics Framework (WLDF)
  • All diagnostic data relating to a critical error is captured and stored as an incident in the Automatic Diagnostic Repository (ADR)
    • Collects diagnostic data, such as log files
  • Provides standardized log formats
    • Uses the Oracle Diagnostic Logging (ODL) log file format across all Oracle Fusion Middleware components
  • Incident flood control
    • Diagnostic Framework applies flood control to incident generation once certain thresholds are reached
      • This avoids generating excessive diagnostic data, which would consume space in the ADR and could slow down your efforts to diagnose and resolve the problem
    • Example:
    • [2019-02-08T23:59:50.082+00:00] [bi_server2] [WARNING] [DFW-40125] [oracle.dfw.incident] [tid: [ACTIVE].ExecuteThread: '62' for queue: 'weblogic.kernel.Default (self-tuning)'] [userId: ] [ecid: 551d9654-1bc1-4b2f-b8d4-cbd3ab71603c-0004765a,0] [partition-name: DOMAIN] [tenant-name: GLOBAL] incident flood controlled with Problem Key "DFW-99998 [weblogic.jdbc.extensions.PoolDisabledSQLException][oracle.security.jps.internal.policystore.rdbms.JpsDBDataManager.executeBaseQuery][bi-contentstorage]"
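
The idea behind flood control can be sketched as a per-problem-key rate limiter. This is not the Diagnostic Framework's implementation: the class name FloodControl, the sliding-window approach, and both threshold values are assumptions chosen for illustration.

```python
from collections import defaultdict, deque


class FloodControl:
    """Allow at most max_incidents per problem key within window_seconds."""

    def __init__(self, max_incidents=5, window_seconds=3600):
        # Both thresholds are illustrative assumptions, not values from the article.
        self.max_incidents = max_incidents
        self.window_seconds = window_seconds
        # problem key -> timestamps (seconds) of incidents that were created
        self._recent = defaultdict(deque)

    def should_create(self, problem_key, now):
        """Return True if an incident may be created at time `now` (in seconds)."""
        recent = self._recent[problem_key]
        # Forget incidents that have fallen out of the sliding window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_incidents:
            return False  # flood controlled: do not create another incident
        recent.append(now)
        return True


fc = FloodControl(max_incidents=2, window_seconds=60)
decisions = [fc.should_create("DFW-99998 [example]", t) for t in (0, 10, 20, 70)]
print(decisions)
```

With a limit of two incidents per 60 seconds, the third occurrence is suppressed, and creation resumes once the earlier incidents age out of the window.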

Integration with WLDF


Oracle Fusion Middleware Diagnostics Framework integrates with the following components of WLDF:
  • WLDF Watch and Notification
    • Watches specific logs and metrics for specified conditions and sends a notification when a condition is met. 
      • Oracle Fusion Middleware Diagnostics Framework integrates with the WLDF Watch and Notification component to create incidents.
    • There are several types of notifications, including JMX notification and a notification to create a Diagnostic Image. 
  • Diagnostic Image Capture
    • Gathers the most common sources of the key server state used in diagnosing problems. 
      • Packages that state into a single artifact, the Diagnostic Image
      • With Oracle Fusion Middleware Diagnostics Framework, it writes the artifact to ADR.
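
The watch-and-notify pattern can be sketched in a few lines. This is a simplified model, not the WLDF API: the LogRecord class and the run_watch function are hypothetical, and the condition mirrors the "UncheckedException" watch rule that appears in the sample server log later in this article.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    """Hypothetical, simplified stand-in for a server log entry."""
    severity: str
    message_id: str
    message: str


# Message IDs taken from the watch rule in the sample server log.
WATCHED_IDS = {
    "WL-101020", "WL-101017", "WL-000802",
    "BEA-101020", "BEA-101017", "BEA-000802",
}


def unchecked_exception_watch(record):
    """Condition: severity is Error and the message ID is one of the watched IDs."""
    return record.severity == "Error" and record.message_id in WATCHED_IDS


def run_watch(records, notify):
    """Call notify(record) for every record that satisfies the watch condition."""
    for record in records:
        if unchecked_exception_watch(record):
            notify(record)


notified = []
run_watch(
    [
        LogRecord("Error", "BEA-101020", "Root cause of ServletException."),
        LogRecord("Notice", "BEA-320068", "Watch has triggered"),
        LogRecord("Error", "BEA-000001", "hypothetical unrelated error"),
    ],
    notified.append,
)
print([r.message_id for r in notified])
```

In the real framework the notification would lead to incident creation; here the callback simply collects the matching records.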
Figure 1 shows the interaction among the incident log detector, the WLDF Diagnostic Image MBean, ADR, and component or application dumps when an incident is detected by the incident log detector.

Figure 1. Incident Creation Generated by Incident Log Detector

Sample WebLogic Server Log

<Feb 8, 2019 11:59:54,143 PM UTC> <Notice> <Diagnostics> <xxxxxxx020308oacpod-bi-2.svcsbnet308.yyyyyyy2.oraclevcn.com> <bi_server2> <[STANDBY] ExecuteThread: '29' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <eed7eff4-508d-4c1d-9c2b-b19d8e8936a6-0007d1d0> <1549670394143> <[severity-value: 32] [rid: 0] [partition-id: 0] [partition-name: DOMAIN] > <BEA-320068> <Watch "UncheckedException" in module "Module-FMWDFW" with severity "Notice" on server "bi_server2" has triggered at Feb 8, 2019 11:59:54 PM UTC. Notification details:
WatchRuleType: Log
WatchRule: (log.severityString == 'Error') and ((log.messageId == 'WL-101020') or (log.messageId == 'WL-101017') or (log.messageId == 'WL-000802') or (log.messageId == 'BEA-101020') or (log.messageId == 'BEA-101017') or (log.messageId == 'BEA-000802'))
WatchData: MESSAGE = [ServletContext@879994790[app:bi-servicelcm-rest module:bi-servicelcm-rest path:null spec-version:3.1]] Root cause of ServletException.
oracle.bi.servicelcm_v2.exceptions.PersistenceBackendException: Unable to create Pod record
at oracle.bi.servicelcm_v2.db.DatabasePodPersistenceManager.getPodImpl(DatabasePodPersistenceManager.java:39) 
... 
<Feb 8, 2019 11:59:58,489 PM UTC> <Emergency> <oracle.dfw.incident> <xxxxxxx020308oacpod-bi-2.svcsbnet308.yyyyyyy2.oraclevcn.com> <bi_server2> <[ACTIVE] ExecuteThread: '68' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <eed7eff4-508d-4c1d-9c2b-b19d8e8936a6-0007d1d1> <1549670398489> <[severity-value: 1] [rid: 0] [partition-id: 0] [partition-name: DOMAIN] > <BEA-000000> <incident 1326 created with problem key "DFW-99998 [weblogic.jdbc.extensions.PoolDisabledSQLException][oracle.bi.servicelcm_v2.db.SqlHelper.doTransaction][bi-servicelcm-rest]">
DFW-99998 is one of the "Uncaught Exception Problem Keys" and its specific format is:
  • DFW-99998 [exception-name][package.class.name][app-name]
In the incident 1326 message above, for instance, [app-name] is bi-servicelcm-rest.
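
A key in this format can be split into its parts with a regular expression. The sketch below assumes the key always has exactly three bracketed fields after DFW-99998; the names UncaughtExceptionKey, parse_key, and the field name location are hypothetical. Note that in the sample log the second field also carries a method name (doTransaction).

```python
import re
from typing import NamedTuple


class UncaughtExceptionKey(NamedTuple):
    exception_name: str
    location: str  # [package.class.name] field; the sample also includes a method name
    app_name: str


# DFW-99998 [exception-name][package.class.name][app-name]
KEY_RE = re.compile(r"^DFW-99998 \[([^\]]+)\]\[([^\]]+)\]\[([^\]]+)\]$")


def parse_key(problem_key):
    """Split a DFW-99998 problem key into its three bracketed fields."""
    match = KEY_RE.match(problem_key)
    if match is None:
        raise ValueError(f"not a DFW-99998 problem key: {problem_key!r}")
    return UncaughtExceptionKey(*match.groups())


key = parse_key(
    "DFW-99998 [weblogic.jdbc.extensions.PoolDisabledSQLException]"
    "[oracle.bi.servicelcm_v2.db.SqlHelper.doTransaction][bi-servicelcm-rest]"
)
print(key.app_name)
```

Grouping incidents by app_name or exception_name in this way can help show which application or exception type accounts for most incidents.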