Cross Column

Showing posts with label Data Warehouse. Show all posts
Showing posts with label Data Warehouse. Show all posts

Sunday, February 17, 2019

OAC―Knowing the Dimensional Modelling Basics (1/2)

Video 1.  Dimensional Modeling – Declaring Dimensions (YouTube link)

Operational Processing vs Data Warehousing


One of the most important assets of any organization is its information. This asset is almost always used for two purposes:[1]



Operational Processing Analytical Decision Making
Flavor
Transactional Analytical
Main Data Flow
The operational systems are where you put data in The Data Warehousing and Business Intelligence (DW/BI) systems are where you get the data out
Optimization
Optimized to process transactions quickly Optimized for high-performance queries
# of Transactions in Processing
Almost always deal with one transaction record at a time Often require that many transactions be searched and compressed into an answer set
History Preservation
Typically do not maintain history, but rather update data to reflect the most current state Typically demand that historical context be preserved to accurately evaluate the organization's performance over time


Dimensional Modeling (DW/BI)


Dimensional modeling is one of the methods of data modeling, that is the preferred technique for presenting analytic data.  It helps us store the data in such a way that it is relatively easy to retrieve the data from the data once the data is stored in database.

This is the reason why dimensional modeling is used mostly in data warehouses built for reporting. On the other side, dimensional model is not a good solution if your primary purpose of your data modeling is to reduce storage space requirement, reduce redundancy, speed-up loading time etc.[1]

Figure 1.  Star Schema (left) vs OLAP Cube (right)

3NF Model vs Dimensional Model


Although dimensional models are often instantiated in relational database management systems (RDMS), they are quite different from third normal form (3NF) models which seek to remove data redundancies:
  • 3NF Model (or Normalized Model)
    • Divides data into many discrete entities, each of which becomes a relational table
    • Sometimes are referred as entity-relationship (ER) models
    • Designed to reduce the duplication of data and ensure referential integrity
    • Designed to improve database processing while minimizing storage costs
    • Useful in operational processing because an update or insert transaction touches the database in only one place
      • However, are too complicated for BI queries. 
  • Dimensional Model (Star Schemas and OLAP Cubes)
    • Both stars and cubes have a common logical design with recognizable dimensions; however, the physical implementation differs (see Figure 1):
      • Star Schemas
        • Referred to as star schemas in RDBS because of their resemblance to a star-like structure in RDMS implementation
      • OLAP Cubes
        • Referred to as online analytical processing (OLAP) cubes in multidimensional database platform
        • Cubes can deliver superior query performance because of the precalculations, indexing strategies, and other optimizations
        • The downside is that you pay a load performance price for these capabilities, especially with large data sets
    • Contains the same information as a normalized model, but packages the data in a format that delivers user understandability, query performance, and resilience to change
Both 3NF and dimensional models can be represented in ERDs because both consist of joined relational tables; the key difference between 3NF and dimensional models is the degree of normalization.

Although the capabilities of OLAP technology are continuously improving, we generally recommend that detailed, atomic information be loaded into a star schema; optional OLAP cubes are then populated from the star schema.[1]

Dimensional Modeling Case Study


Consider the business scenario for a fast food chain:[2]
The business objective is to create a data model that can store and report number of burgers and fries sold from a specific McDonalds outlet per day. 

Below are the steps used for dimensional modeling: 
  • Identify the dimensions
    • Dimensions 
      • Describe the “who, what, where, when, how, and why” associated with the business process measurement event (e.g. a sales transaction).
    • In the above scenario, we have 3 dimensions - "food" (e.g. burgers and fries), "store" and "day"
      • Separate dimension tables are created for separate dimensions
        • The dimension tables contain the textual context (normally with set of descriptive nouns that characterize the business process) associated with a measurement event. 
  • Identify the measurement events (or facts)
    • Measurement Events
      • A measurement event in the physical world has a one-to-one relationship to a single row in the corresponding fact table
    • In the above scenario, we have 1 measurement event - "quantity"
      • fact table is created for storing measures and foreign keys to the dimension tables
        • The fact table stores the "number" of food sold in "Quantity" column against a given store, food and day columns. 
        • These store/food/day columns are basically foreign key columns of the primary keys in respective dimension tables. 
  • Identify the attributes or properties of dimensions
    • Attributes (or Properties)
      • Each dimension might have number of different properties, but for a given context, not all of them are relevant for business
    • Knowing the properties let us decide what columns are required to be created in each dimension table.
    • In the above scenario, we could have
      • Food: name (burgers or fries)
      • Store: name, location, etc
      • Day: date
  • Identify the granularity of the measures
    • All the measurement rows in a fact table must be at the same grain (i.e., day or month). 
    • Having the discipline to create fact tables with a single level of detail ensures that measurements aren't inappropriately double-counted.
  • History Preservation (Optional)
    • Identify which dimensions are slowly changing (or fast changing or unchanging) is the last and final step of modeling (see video 1)
    • There are 8 different dimension types, but only 3 are commonly used:[3]
      • Type 0 - Fixed, non changing attribute
      • Type 1 - Changing attribute, no history kept
      • Type 2 - Most complex, keeps historical changes

Figure 2.  Sample rows from a dimension table with denormalized hierarchies

Summary

  • Dimensional Model (cf. Normalized Model )
    • Dimensional schema is simpler and symmetric
      • Business users benefit from the simplicity because the data is easier to understand and navigate
      • Database optimizers process these simple schemas with fewer joins more efficiently
      • Every dimension is equivalent; all dimensions are symmetrically-equal entry points into the fact table.
    • Dimensional models are gracefully extensible to accommodate change
      • With dimensional models, you can add completely new dimensions to the schema as long as a single value of that dimension is defined for each existing fact row.
  • Fact Tables
    • Fact tables tend to be deep in terms of the number of rows, but narrow in terms of the number of columns
    • The most useful facts are numeric and additive, such as dollar sales amount. 
      • Additivity is crucial because BI applications rarely retrieve a single fact table row.
        • However, you will see that facts are sometimes semi-additive (e.g., account balances) or even non-additive (e.g., unit prices). 
    • Facts are often described as continuously valued 
    • Fact tables usually make up 90 percent or more of the total space consumed by a dimensional model. 
    • All fact tables have two or more foreign keys that connect to the dimension tables' primary keys.
    • Fact tables (or bridge table) express many-to-many relationships
  • Dimension Tables
    • Dimension tables tend to be shallow in terms of the number of rows, but wide in terms of the number of columns
    • Each dimension is defined by a single primary key (surrogate key or natural key) , which serves as the basis for referential integrity with any given fact table to which it is joined.
    • Robust dimension attributes deliver robust analytic slicing-and-dicing capabilities.
      • In many ways, the data warehouse is only as good as the dimension attributes; the analytic power of the DW/BI environment is directly proportional to the quality and depth of the dimension attributes.
      • Dimension attributes serve as the primary source of query constraints, groupings, and report labels.
        • You should strive to minimize the use of codes or cryptic abbreviations in dimension tables by replacing them with more verbose textual attributes.
    • Dimension tables often represent hierarchical relationships (See Figure 2)
      • For example, products roll up into brands and then into categories
      • For each row in the product dimension, you should store the associated brand and category description. 
      • The hierarchical descriptive information is stored redundantly in the spirit of ease of use and query performance.
      • You should resist the habitual urge to normalize data (i.e., snowflaking)
        • You should almost always trade off dimension table space for simplicity and accessibility.
        • Because dimension tables typically are geometrically smaller than fact tables, improving storage efficiency by normalizing or snowflaking has virtually no impact on the overall database size. 
  • Fact or Dimension Attribute
    • When triaging operational source data, it is sometimes unclear whether a numeric data element is a fact or dimension attribute.  It is
      • A fact if
        • The column is a measurement that takes on lots of values and participates in calculations
      • A dimension attribute if
        • The column is a discretely valued description that is more or less constant and participates in constraints and row labels
      • Note:
        • Continuously valued numeric observations are almost always facts; discrete numeric observations drawn from a small list are almost always dimension attributes.

References

  1. The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling
  2. What is dimensional modelling?
  3. Dimensional Modeling – Declaring Dimensions (Youtube)
  4. Learn Modern Data Visualization with Oracle Analytics
  5. Click here for more A-Team Oracle Analytics (OAC) Blogs.

Saturday, July 9, 2011

Book Review: "Oracle Warehouse Builder 11g R2: Getting Started 2011"

What's Data Warehouse

A data warehouse is a database used for reporting. The main source of the data is cleaned, transformed, cataloged and made available for use by managers and other business professionals for data mining, online analytical processing, market research and decision support (Marakas & OBrien 2009).

The data warehouse must handle large amounts of data, and must be simple to query and understand by the end users. A data warehouse maintains its functions in three layers:
  • Staging— used to store raw data for use by developers (analysis and support).
  • Integration—used to integrate data and to have a level of abstraction from users.
  • Access— for getting data out for users


Data Warehouse vs. Transactional Database

A data warehouse (DW) is different from a transactional database (TD). A DW is a database used for reporting where the focus is getting data out of the system. On the other hands, TD is concerned with daily operational processing where the focus is on getting data into the system.

In the design, TD uses normalized approach while DW chooses dimensional approach.

The normalized structure divides data into entities, which creates several tables in a relational database. When applied in large enterprises the result is dozens of tables that are linked together by a web of joins.

In a dimensional approach, transaction data are partitioned into either facts (actual numerical measures) and dimensions (descriptive data about those measures) that place the facts in a context that is understandable to the end-user decision maker.

In this article, we will review a book named Oracle Warehouse Builder 11g R2: Getting Started 2011, in which it addresses Oracle Warehouse Builder (i.e., one of the Oracle's data mining solutions).


Introduction of the Book

If you are new to data warehousing and need a way to get started, this book offers great help. The task of designing a data warehouse is nontrivial and you can use some helps from a good tool. On this, I agree with the author:
Oracle Warehouse Builder is a great application to use to build your warehouse.
This book provides good coverage on both the theory of data warehousing and practical ways of implementing a data warehouse that can be directly applied in the real world. It's the final product of author's 27 years' working experience in the database industry and his thorough and careful research on the Oracle Warehouse Builder.

The following books are perused and referenced by the author:
  • Oracle Warehouse Builder Data Modeling, ETL and Data Quality Guide
  • Oracle Warehouse Builder Concepts Guide
  • Oracle Database SQL Language Reference
  • Oracle Warehouse Builder Users Guide
  • Oracle Database Error Messages 11g Release 2(11.2)
  • The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling


Oracle Warehouse Builder (OWB)

The Oracle Warehouse Builder is a tool provided by Oracle, which can be used at every stage of the implementation of a data warehouse, from initial design and creation of the table structure to the extract, transform, and load (ETL) process and data-quality management.

OWB has the ability to design the objects logically using cubes and dimensions in a dimensional design. It also has the ability to implement them physically in the underlying database as either a relational structure or a dimensional structure simply by checking a box. For a relational implementation (physical implementation), OWB actually provides us two options for implementing the database:
  • Pure relational option
  • Relational OLAP option
In the Figure below, it shows OWB components and its architecture:


ACME Toys and Gizmos

In this book, a fictional organization named ACME Toys and Gizmos is used to demonstrate practical ways of implementing a data warehouse that can be directly applied in the real world. This book provides explanations throughout on how to use the Oracle Warehouse Builder tool to build a data warehouse within the context of this invented company, which is involved in storefront and online Internet sales.

For instance, a customer makes a purchase of a toy with ACME Toys and Gizmos on a particular day over the Internet, which results in a dollar amount of the transaction. The dollar amount becomes the fact and the toy purchased, the customer, and the location of the purchase (the Internet in this case) become the dimensions that provide a scope of the fact measurement and give it a meaning.

The design of the data warehouse is drawn out in a star schema configuration showing the cube, which is surrounded by the dimensions with the individual items of information (attributes) . It looks like the following:


With Star Schema, managers can easily use the data for data mining, online analytical processing, market research and decision support. For example, if a manager for ACME Toys and Gizmos needs to know what products sold well in the last quarter, the query will only involve two tables—the main fact table containing the data on number of items sold and the product dimension table that contains all the information about the product.


Summary

The purpose of this book is to introduce the Oracle Warehouse Builder and use it to design and build your first data warehouse. By following author's carefully designed components and instructions, you will easily achieve that goal.


See Also

  1. Oracle Warehouse Builder (OWB) 11g
  2. Oracle Warehouse Builder 11g R2: Getting Started 2011
  3. The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling
  4. Oracle Warehouse Builder Data Modeling, ETL and Data Quality Guide
  5. Oracle Warehouse Builder Concepts Guide
  6. Oracle Database SQL Language Reference
  7. Oracle Warehouse Builder Users Guide
  8. Oracle Database Error Messages 11g Release 2(11.2)

© Travel for Life Guide. All Rights Reserved.

Analytical Insights on Health, Culture, and Security.