Sunday, February 17, 2019

OAC―Knowing the Dimensional Modelling Basics (1/2)

Video 1.  Dimensional Modeling – Declaring Dimensions

Operational Processing vs Data Warehousing


One of the most important assets of any organization is its information. This asset is almost always used for two purposes:[1]



Operational Processing Analytical Decision Making
Flavor
Transactional Analytical
Main Data Flow
The operational systems are where you put data in The Data Warehousing and Business Intelligence (DW/BI) systems are where you get the data out
Optimization
Optimized to process transactions quickly Optimized for high-performance queries
# of Transactions in Processing
Almost always deal with one transaction record at a time Often require that many transactions be searched and compressed into an answer set
History Preservation
Typically do not maintain history, but rather update data to reflect the most current state Typically demand that historical context be preserved to accurately evaluate the organization's performance over time


Dimensional Modeling (DW/BI)


Dimensional modeling is one of the methods of data modeling, that is the preferred technique for presenting analytic data.  It helps us store the data in such a way that it is relatively easy to retrieve the data from the data once the data is stored in database.

This is the reason why dimensional modeling is used mostly in data warehouses built for reporting. On the other side, dimensional model is not a good solution if your primary purpose of your data modeling is to reduce storage space requirement, reduce redundancy, speed-up loading time etc.[1]

Figure 1.  Star Schema (left) vs OLAP Cube (right)

3NF Model vs Dimensional Model


Although dimensional models are often instantiated in relational database management systems (RDMS), they are quite different from third normal form (3NF) models which seek to remove data redundancies:
  • 3NF Model (or Normalized Model)
    • Divides data into many discrete entities, each of which becomes a relational table
    • Sometimes are referred as entity-relationship (ER) models
    • Designed to reduce the duplication of data and ensure referential integrity
    • Designed to improve database processing while minimizing storage costs
    • Useful in operational processing because an update or insert transaction touches the database in only one place
      • However, are too complicated for BI queries. 
  • Dimensional Model (Star Schemas and OLAP Cubes)
    • Both stars and cubes have a common logical design with recognizable dimensions; however, the physical implementation differs (see Figure 1):
      • Star Schemas
        • Referred to as star schemas in RDBS because of their resemblance to a star-like structure in RDMS implementation
      • OLAP Cubes
        • Referred to as online analytical processing (OLAP) cubes in multidimensional database platform
        • Cubes can deliver superior query performance because of the precalculations, indexing strategies, and other optimizations
        • The downside is that you pay a load performance price for these capabilities, especially with large data sets
    • Contains the same information as a normalized model, but packages the data in a format that delivers user understandability, query performance, and resilience to change
Both 3NF and dimensional models can be represented in ERDs because both consist of joined relational tables; the key difference between 3NF and dimensional models is the degree of normalization.

Although the capabilities of OLAP technology are continuously improving, we generally recommend that detailed, atomic information be loaded into a star schema; optional OLAP cubes are then populated from the star schema.[1]

Dimensional Modeling Case Study


Consider the business scenario for a fast food chain:[2]
The business objective is to create a data model that can store and report number of burgers and fries sold from a specific McDonalds outlet per day. 

Below are the steps used for dimensional modeling: 
  • Identify the dimensions
    • Dimensions 
      • Describe the “who, what, where, when, how, and why” associated with the business process measurement event (e.g. a sales transaction).
    • In the above scenario, we have 3 dimensions - "food" (e.g. burgers and fries), "store" and "day"
      • Separate dimension tables are created for separate dimensions
        • The dimension tables contain the textual context (normally with set of descriptive nouns that characterize the business process) associated with a measurement event. 
  • Identify the measurement events (or facts)
    • Measurement Events
      • A measurement event in the physical world has a one-to-one relationship to a single row in the corresponding fact table
    • In the above scenario, we have 1 measurement event - "quantity"
      • fact table is created for storing measures and foreign keys to the dimension tables
        • The fact table stores the "number" of food sold in "Quantity" column against a given store, food and day columns. 
        • These store/food/day columns are basically foreign key columns of the primary keys in respective dimension tables. 
  • Identify the attributes or properties of dimensions
    • Attributes (or Properties)
      • Each dimension might have number of different properties, but for a given context, not all of them are relevant for business
    • Knowing the properties let us decide what columns are required to be created in each dimension table.
    • In the above scenario, we could have
      • Food: name (burgers or fries)
      • Store: name, location, etc
      • Day: date
  • Identify the granularity of the measures
    • All the measurement rows in a fact table must be at the same grain (i.e., day or month). 
    • Having the discipline to create fact tables with a single level of detail ensures that measurements aren't inappropriately double-counted.
  • History Preservation (Optional)
    • Identify which dimensions are slowly changing (or fast changing or unchanging) is the last and final step of modeling (see video 1)
    • There are 8 different dimension types, but only 3 are commonly used:[3]
      • Type 0 - Fixed, non changing attribute
      • Type 1 - Changing attribute, no history kept
      • Type 2 - Most complex, keeps historical changes

Figure 2.  Sample rows from a dimension table with denormalized hierarchies

Summary

  • Dimensional Model (cf. Normalized Model )
    • Dimensional schema is simpler and symmetric
      • Business users benefit from the simplicity because the data is easier to understand and navigate
      • Database optimizers process these simple schemas with fewer joins more efficiently
      • Every dimension is equivalent; all dimensions are symmetrically-equal entry points into the fact table.
    • Dimensional models are gracefully extensible to accommodate change
      • With dimensional models, you can add completely new dimensions to the schema as long as a single value of that dimension is defined for each existing fact row.
  • Fact Tables
    • Fact tables tend to be deep in terms of the number of rows, but narrow in terms of the number of columns
    • The most useful facts are numeric and additive, such as dollar sales amount. 
      • Additivity is crucial because BI applications rarely retrieve a single fact table row.
        • However, you will see that facts are sometimes semi-additive (e.g., account balances) or even non-additive (e.g., unit prices). 
    • Facts are often described as continuously valued 
    • Fact tables usually make up 90 percent or more of the total space consumed by a dimensional model. 
    • All fact tables have two or more foreign keys that connect to the dimension tables' primary keys.
    • Fact tables (or bridge table) express many-to-many relationships
  • Dimension Tables
    • Dimension tables tend to be shallow in terms of the number of rows, but wide in terms of the number of columns
    • Each dimension is defined by a single primary key (surrogate key or natural key) , which serves as the basis for referential integrity with any given fact table to which it is joined.
    • Robust dimension attributes deliver robust analytic slicing-and-dicing capabilities.
      • In many ways, the data warehouse is only as good as the dimension attributes; the analytic power of the DW/BI environment is directly proportional to the quality and depth of the dimension attributes.
      • Dimension attributes serve as the primary source of query constraints, groupings, and report labels.
        • You should strive to minimize the use of codes or cryptic abbreviations in dimension tables by replacing them with more verbose textual attributes.
    • Dimension tables often represent hierarchical relationships (See Figure 2)
      • For example, products roll up into brands and then into categories
      • For each row in the product dimension, you should store the associated brand and category description. 
      • The hierarchical descriptive information is stored redundantly in the spirit of ease of use and query performance.
      • You should resist the habitual urge to normalize data (i.e., snowflaking)
        • You should almost always trade off dimension table space for simplicity and accessibility.
        • Because dimension tables typically are geometrically smaller than fact tables, improving storage efficiency by normalizing or snowflaking has virtually no impact on the overall database size. 
  • Fact or Dimension Attribute
    • When triaging operational source data, it is sometimes unclear whether a numeric data element is a fact or dimension attribute.  It is
      • A fact if
        • The column is a measurement that takes on lots of values and participates in calculations
      • A dimension attribute if
        • The column is a discretely valued description that is more or less constant and participates in constraints and row labels
      • Note:
        • Continuously valued numeric observations are almost always facts; discrete numeric observations drawn from a small list are almost always dimension attributes.

References

  1. The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling
  2. What is dimensional modelling?
  3. Dimensional Modeling – Declaring Dimensions (Youtube)

Friday, February 15, 2019

Oracle Fusion Middleware Diagnostic Framework―How to Diagnosing Problems

Oracle Fusion Middleware includes a Diagnostic Framework, which aids in detecting, diagnosing, and resolving problems. The problems that are targeted in particular are critical errors, such as those caused by
  • Code bugs
  • Metadata corruption
  • Customer data corruption
  • Deadlocked threads
  • Inconsistent state
In this article, we will cover what Oracle Fusion Middleware Diagnostic Framework is and how it works.

Problem vs Incident

  • Problem 
    • Is a critical error
    • Has a problem key
      • Is a text string that describes the problem
      • Includes an error code (in the format XXX-nnnnn) and in some cases, other error-specific values.
        • incident 1123 created with problem key "DFW-99998 [weblogic.jdbc.extensions.PoolDisabledSQLException][oracle.security.jps.internal.policystore.rdbms.JpsDBDataManager.executeBaseQuery][bi-contentstorage]"
  • Incident
    • Is a single occurrence of a problem
      • When a problem (critical error) occurs multiple times, an incident is created for each occurrence. Incidents are timestamped and tracked in the ADR
    • Is identified by a numeric incident ID (see 1123 above), which is unique within the ADR home

Oracle Fusion Middleware Diagnostic Framework 


When a critical error occurs, it is assigned an incident number, and diagnostic data for the error (such as log files) are immediately captured and tagged with this number. The data is then stored in the Automatic Diagnostic Repository (ADR), where it can later be retrieved by incident number and analyzed.  Here is the summary of its features:
  • Supports incident detection log filter
    • Implements the java.util.logging filter
    • Inspects each log message to see if an incident should be created, basing its decision on the diagnostic rules for components and applications.
  • Integrated with WebLogic Diagnostics Framework (WLDF)
  • All diagnostic data relating to a critical error is captured and stored as an incident in Automatic Diagnostic Repository (ADR)
    • Collects diagnostic data, such as
  • Provides standardized log formats
    • Using the ODL log file format across all Oracle Fusion Middleware components.
  • Incident flood control
    • Diagnostic Framework applies flood control to incident generation after certain thresholds are reached
      • To avoid generating too much diagnostic data, which would consume too much space in the ADR and could possibly slow down your efforts to diagnose and resolve the problem
    • Example:
    • [2019-02-08T23:59:50.082+00:00] [bi_server2] [WARNING] [DFW-40125] [oracle.dfw.incident] [tid: [ACTIVE].ExecuteThread: '62' for queue: 'weblogic.kernel.Default (self-tuning)'] [userId: ] [ecid: 551d9654-1bc1-4b2f-b8d4-cbd3ab71603c-0004765a,0] [partition-name: DOMAIN] [tenant-name: GLOBAL] incident flood controlled with Problem Key "DFW-99998 [weblogic.jdbc.extensions.PoolDisabledSQLException][oracle.security.jps.internal.policystore.rdbms.JpsDBDataManager.executeBaseQuery][bi-contentstorage]"

Integration with WLDF


Oracle Fusion Middleware Diagnostics Framework integrates with the following components of WLDF:
  • WLDF Watch and Notification
    • Watches specific logs and metrics for specified conditions and sends a notification when a condition is met. 
      • Oracle Fusion Middleware Diagnostics Framework integrates with the WLDF Watch and Notification component to create incidents.
    • There are several types of notifications, including JMX notification and a notification to create a Diagnostic Image. 
  • Diagnostic Image Capture
    • Gathers the most common sources of the key server state used in diagnosing problems. 
      • Packages that state into a single artifact, the Diagnostic Image
      • With Oracle Fusion Middleware Diagnostics Framework, it writes the artifact to ADR.
Figure 1 shows the interaction when the incident is detected by the incident log detector. It shows the interaction among the incident log detector, the WLDF Diagnostic Image MBean, ADR, and component or application dumps when an incident is detected by the incident log detector.
Figure 1.  Incident Creation Generated by Incident Log Detector
Sample WebLogic Server Log

<Feb 8, 2019 11:59:54,143 PM UTC> <Notice> <Diagnostics> <xxxxxxx020308oacpod-bi-2.svcsbnet308.yyyyyyy2.oraclevcn.com> <bi_server2> <[STANDBY] ExecuteThread: '29' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <eed7eff4-508d-4c1d-9c2b-b19d8e8936a6-0007d1d0> <1549670394143> <[severity-value: 32] [rid: 0] [partition-id: 0] [partition-name: DOMAIN] > <BEA-320068> <Watch "UncheckedException" in module "Module-FMWDFW" with severity "Notice" on server "bi_server2" has triggered at Feb 8, 2019 11:59:54 PM UTC. Notification details:
WatchRuleType: Log
WatchRule: (log.severityString == 'Error') and ((log.messageId == 'WL-101020') or (log.messageId == 'WL-101017') or (log.messageId == 'WL-000802') or (log.messageId == 'BEA-101020') or (log.messageId == 'BEA-101017') or (log.messageId == 'BEA-000802'))
WatchData: MESSAGE = [ServletContext@879994790[app:bi-servicelcm-rest module:bi-servicelcm-rest path:null spec-version:3.1]] Root cause of ServletException.
oracle.bi.servicelcm_v2.exceptions.PersistenceBackendException: Unable to create Pod record
at oracle.bi.servicelcm_v2.db.DatabasePodPersistenceManager.getPodImpl(DatabasePodPersistenceManager.java:39) 
... 
<Feb 8, 2019 11:59:58,489 PM UTC> <Emergency> <oracle.dfw.incident> <xxxxxxx020308oacpod-bi-2.svcsbnet308.yyyyyyy2.oraclevcn.com> <bi_server2> <[ACTIVE] ExecuteThread: '68' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <eed7eff4-508d-4c1d-9c2b-b19d8e8936a6-0007d1d1> <1549670398489> <[severity-value: 1] [rid: 0] [partition-id: 0] [partition-name: DOMAIN] > <BEA-000000> <incident 1326 created with problem key "DFW-99998 [weblogic.jdbc.extensions.PoolDisabledSQLException][oracle.bi.servicelcm_v2.db.SqlHelper.doTransaction][bi-servicelcm-rest]">